+ All Categories
Home > Documents > Tim Gorman Evergreen Database Technologies, Inc. North Carolina Oracle Users Group Scaling to...

Tim Gorman Evergreen Database Technologies, Inc. North Carolina Oracle Users Group Scaling to...

Date post: 01-Apr-2015
Category:
Upload: karl-avey
View: 214 times
Download: 0 times
Share this document with a friend
Popular Tags:
36
Tim Gorman Evergreen Database Technologies, Inc. http:// www.EvDBT.com North Carolina Oracle Users Group Scaling to Infinity: Partitioning Data Warehouses in Oracle
Transcript
Page 1: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

Tim GormanEvergreen Database Technologies, Inc.

http://www.EvDBT.com

North Carolina Oracle Users Group

Scaling to Infinity:Partitioning Data Warehouses in

Oracle

Page 2: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

Speaker Qualifications

• Tim Gorman (chief, cook, and bottle-washer – EvDBT.com)• Director of RMOUG “Training Days 2007” conference

• Info online at http://www.RMOUG.org

• Co-author (with Gary Dodge)• “Oracle8 Data Warehousing” 1998 John Wiley & Sons• “Essential Oracle8i Data Warehousing” 2000 John Wiley & Sons

• Co-author (with Oak Table Network - http://www.OakTable.net)• “Oracle Insights: Tales of the Oak Table” 2004 Apress

• Oracle data warehousing DBA since 1994• Technical manager at Oracle Consulting• Independent consultant since 1998

Page 3: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

Agenda

• I’ve seen the good, the bad, and the ugly…• In the end, successful data warehouses are a combination of all

three

• But in general, I see three major errors that result in doom…• Ignore basic requirements for DW and design what is familiar• Fail to portray data changes over time• Fail to utilize partitioning from the beginning

Page 4: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

Ignoring the requirements

• Repeat after me -- reporting and analysis applications do not enforce business rules!

• Reporting and analysis applications are responsible for presenting data in the format that works best for end-users and their query/analysis tools• Very often, what end-users seem to want is a simple spreadsheet, hundreds of

columns wide• GIVE THEM WHAT THEY WANT!• Conceal from them what it takes to provide what they want

• Do NOT build a data model to enforce referential integrity and/or business rules

Page 5: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

Third-normal form:1. Eliminate repeating groups

Every attribute is atomic and scalar

2. Eliminate functional dependencies on composite key componentsEvery attribute is functionally dependent on the whole key

3. Eliminate functional dependencies on non key componentsEvery fact/attribute in the entity should rely on the whole key

4. 4th, 5th, and 6th normal forms have been definedBut most entities that are in 3NF are also 4th, 5th, and 6th NF

Intended for use in process-oriented operational systems• enforce data integrity according to business rules

• using referential-integrity constraint mechanisms in application code as well as databases

Ignoring the requirements

Page 6: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

• Data presented in a simplistic dimensional model versus the 3rd-normal-form (3NF) entity-relationship model used by most operational systems

• Ralph Kimball discusses in The Data Warehouse Toolkit• John Wiley & Sons - ISBN #0471153370

• provide immediate, on-demand, and high-performance access to corporate or organizational subject data

• comprised of fact tables containing varying levels of summarized data and dimension tables representing important subject areas

• very simple representation of data• It is a spreadsheet with one degree of normalization for flexibility

• also known as a star schema because diagrams generally represent the fact table as a hub and dimensions as spokes

Ignoring the requirements

Page 7: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

Customers

Orders

Order Lines

Products

Suppliers

Order Facts

Products DimSuppliers Dim

Customers Dim Time Dim

TransactionalOperational

Entity-RelationalModeling

DimensionalModeling

Ignoring the requirements

Page 8: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

Ignoring the requirements• Fact tables

• More volatile• Contain columns for:

• Dimension keys• Measures

• In a spreadsheet or tabular report• Dimension keys don’t appear at all• Measures appear in the “cells” of the report

• Dimension tables• Usually more static

• Although the dimension for people is usually quite volatile

• Contain columns for:• Dimension keys• Attributes

• In a spreadsheet or tabular report• Dimension keys don’t appear at all• Attributes appear as “column headers” or “row headers”

Page 9: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

Time-variant data, who cares?• Two major types of queries from business intelligence

applications to data warehouse databases• Point in time

• What is the present situation? What do the numbers look like now?• “Situational awareness” applications, also known as “dashboards” or

“executive information systems”• Usually uses the present point in time, but could also use any specific point

in time in the past

• Trend analysis• How do things look now versus 3 months ago? A year ago?• How have things changed day-by-day over the past quarter? Week-by-

week over the past year? Month-by-month over the past 7 years?

Page 10: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

Time-variant data, who cares?

• Consider this…• Dimension tables are usually designed to be point-in-time or type-1

• People, items, products, etc.• Locations, time, etc.

• Fact tables are almost always designed to be time-variant or type-2• Transactions

• What happens when you join transactions from years ago with dimensional attributes from the present?• For example, when analyzing purchases by location, does it make

sense to summarize all transactions by a person’s present location?• Or should it reflect the person’s location at the time of the transaction?

Page 11: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

Time-variant data, who cares?

• Every data warehouse has at least one slowly-changing dimension (SCD)• Usually involving “people” (i.e. accounts, customers, employees,

parties, etc)

• Static dimensions do not need to be time-variant• Identifying “static” dimensions: if a change is made to the

dimension, should it be reflected across all time?

• SCDs should be represented as “type-2”• “type-1” views of SCDs can be created as needed• “type-1” views of fact tables can also be created, if necessary, to

support point-in-time tactical reporting

Page 12: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

Time-variant data, who cares?

PERSON_DIM

• Person_key• Eff_dto Last_nameo First_nameo Address_1o Address_2o Cityo …

CURR_PERSON_DIM

• Person_keyo Eff_dto Last_nameo First_nameo Address_1o Address_2o Cityo …

Type-2 Dimension

(time-variant)

Type-1 Dimension

(point-in-time view)

PK PK

Page 13: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

Time-variant data, who cares?

PERSON_DIM

• Person_key• Eff_dt

CURR_PERSON_DIM

• Person_key

TXN_FACT

* …• Person_key• Person_eff_dt• …

Page 14: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

Time-variant data, who cares?• Slowly-changing dimensions should always be “type-2”

• With “type-1” views constructed using the just-loaded “type-2” data

• So, with this in mind…• Why do people so often treat time-variant tables as an after-thought?

• Why do “extraction-transformation-loading” (ETL) processes so often focus on “MERGE” logic (“if row doesn’t exist then INSERT else UPDATE”) on the current point-in-time tables, and then insert change data as an after-thought

• a.k.a. “type-1” or “point-in-time” data

• Instead of…• inserting change data into the time-variant “type-2” table from which point-in-

time “type-1” views (as materialized views?) can be built for any point-in-time?

• Think about it…• If users should be using “type-2” data for SCDs, who usually utilizes the

“type-1” views of the SCDs? What are they good for?

Page 15: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

Four characteristics of a DW• Non-volatile, time-variant, subject-oriented, integrated

• Bill Inmon “Building the Data Warehouse” 3rd Ed 2002 (Wiley)• Think about what these mean?

• Consider the converse of these characteristics? Volatile? Static-image? Process-oriented? Application-specific?

• Time-variant, non-volatile database implies:• Insert, index, and analyze each row of data only once

From an implementation perspective, this is vital to remember! And often ignored completely!!!

• Consider an extreme situation? Analytical database for quantum research in physics 50 Tbytes of data to load every day

Page 16: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

The Virtuous Cycle

• Insert-only processing enables…• Tables and indexes partitioned by time

• Optionally sub-partitioned by other key values

• Partitioned tables/indexes enables…• Partition pruning during queries• Direct-path loads using EXCHANGE PARTITION• Time-variant tables/indexes and tablespaces• Purging using DROP or TRUNCATE partition instead of DELETE

• Partition pruning enables…• Infinite scalability for queries, regardless of how large the

database becomes

• Direct-path (a.k.a. append) loads enable…• Ability to load more data, faster, more efficiently• Table compression

Page 17: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

• Time-variant partitioned tables/indexes enable…• Time-variant tablespaces

• Time-variant tablespaces enable…• READ ONLY tablespaces for older, less-volatile data

• READ ONLY tablespaces enable…• Near-line storage (i.e. NAS, SAMFS/HFS, etc)

• “Right-sizing” of storage to the need, classified by IOPS

• Backup efficiencies• READ WRITE tablespaces scheduled for backup every day or week• READ ONLY tablespaces scheduled for backup every quarter or year

The Virtuous Cycle

Page 18: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

The Virtuous Cycle

• Using EXCHANGE PARTITION for loads enables…• Elimination of ETL “load window” and 24x7 availability for queries• Direct-path loads• Bitmap indexes and bitmap-join indices

• Bitmap indices enable…• Star transformations on “star” (dimensional) schemas

• Star transformations enable…• Bitmap-join indexes• SUCCESS!

• optimal query-execution plan for dimensional data models!

Page 19: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

The Death Spiral

• Volatile data presented in a static-image according to process-oriented concepts leads to…• ETL using “conventional-path” INSERT, UPDATE, and DELETE

operations (including MERGE and multi-table INSERT)

• Conventional-path operations are troubletrouble with:• Bitmap indexes and bitmap-join indexes

• Forcing frequent complete rebuilds until they get too big

• Contention in Shared Pool, Buffer Cache, global structures• Mixing of queries and loads simultaneously on table and indexes• Periodic rebuilds/reorgs of tables if deletions occur

• Full redo logging and undo transaction tracking

• ETL will dominate the workload in the database• Queries will consist mainly of “dumps” or extracts to downstream

systems• Query performance will be abysmal and worsening…

Page 20: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

The Death Spiral

• Without partitioning• Query performance worsens as tables/indexes grow larger• Loads must be performed into “live” tables

• Users must be locked out during “load cycle”• In-flight queries must be killed during “load cycle”• Bitmap indexes must be dropped/rebuilt during “load cycle”• Entire tables must be re-analyzed during “load cycle”

• Entire database must be backed up frequently• Data cannot be “right-sized” to storage options according to IOPS

• Everything just gets harder and harder to do…• …and that stupid Oracle software is to blame…• BRING ON TERADATA OR <insert-flavor-of-the-month>

Page 21: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

Exchange Partition• The basic technique of bulk-loading new data into a

temporary “load table”, which is then indexed, analyzed, and then “published” all at once to end-users using the EXCHANGE PARTITION operation, should be the default load technique for all large tables in a data warehouse• fact tables• slowly-changing or quickly-changing dimensions

• Assumptions for this example:• Composite partitioned fact table named TXN

• Range partitioned on DATE column TXN_DATE• Hash partitioned on NUMBER column ACCT_KEY• Data to be loaded into partition P20040225 on TXN

Page 22: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

22-Feb2004

23-Feb2004

24-Feb2004

(empty) 25-Feb2004

Composite-partitionedtable TXN

Hash-partitionedtable TXN_TEMP

2. BulkLoads

5. EXCHANGE PARTITION

4. Table &Col Stats

3. IndexCreates

Exchange Partition

1. CreateTemp Table

Page 23: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

Exchange Partition

1.Create temporary table TXN_TEMP as a hash-partitioned table

2.Perform parallel, direct-path load of new data into TXN_TEMP

3.Create indexes on the temporary hash-partitioned table TXN_TEMP corresponding to the local indexes on TXN

• using PARALLEL, NOLOGGING, and COMPUTE STATISTICS options

4.Gather CBO statistics on table TXN_TEMP• Only table and columns stats -- leave computed index stats!

5.alter table TXNexchange partition P20040225 with table TXN_TEMPincluding indexes without validation update global indexes;

Page 24: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

Exchange Partition• It is a good idea to encapsulate this logic inside PL/SQL packaged- or stored-procedures:

SQL> execute exchpart.prepare(‘TXN_FACT’,’TMP_’, - 2 ’25-FEB-2004’);SQL> alter session enable parallel dml;SQL> insert /*+ append parallel(n,4) */ 2 into tmp_txn_fact n 3 select /*+ full(x) parallel(x,4) */ * 4 from stage_txn_fact x 5 where load_date >= ‘25-FEB-2004’ 6 and load_date < ‘28-FEB-2004’;SQL> commit;SQL> execute exchpart.finish(‘TXN_FACT’,’TMP_’);

• DDL for “exchpart.sql” posted at http://www.EvDBT.com/tools.htm

Page 25: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

Exchange Partition• Loading time-variant fact and dimension tables is not the only

load activity in most data warehouses• Often, some tables contain current or point-in-time data

• Example: type-1 dimension “snowflaked” from type-2 dimension• This is often an excellent situation for materialized views• But, as is often the case, the refresh mechanisms built in with materialized

views might not be the most efficient

• With each load cycle, the current images need to be updated• Instead of performing transactional MERGE (I.e. Update or Insert) logic

directly on the table• Rebuild the table into a temporary table, then “swap” it in using EXCHANGE

PARTITION

Page 26: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

23-Feb2004

24-Feb2004

25-Feb2004

Composite-partitioned tableACCOUNT_DIM

Composite-partitioned tableCURR_ACCOUNT_DIM

Hash-partitioned table

Merge/build operation

Exchange Partition

Page 27: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

Previous cycle’s current-image data in

composite-partitioned table

CURR_ACCOUNT_DIM, with single partition named

PZERO

New current-image data in hash-

partitioned table CURR_ACCT_DIM_

TEMP

EXCHANGE PARTITION

Exchange Partition

Page 28: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

Exchange PartitionINSERT /*+ append parallel(t, 8) */ INTO TMP_CURR_ACCOUNT_DIM

T

SELECT /*+ full(x) parallel(x, 8) */

0 partkey, acctkey, effdt, …(and so on for all columns)…

FROM (SELECT acctkey, effdt, …(and so on for all columns)…,

row_number() over (partition by acctkey order by effdt) ranking

FROM (SELECT acctkey, effdt, …(and so on for all columns)…

FROM CURR_ACCOUNT_DIM

UNION ALL

SELECT acctkey, effdt, …(and so on for all columns)…

FROM CURR_ACCOUNT_DIM partition (P20040225)

)

)

WHERE RANKING = 1;

Page 29: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

Exchange Partition

ALTER TABLE CURR_ACCOUNT_DIM

exchange partition PZERO

with table TMP_ACCOUNT_DIM

[ with | without ] validation

including indexes

update global indexes;

Page 30: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

Choosing partition keys• The most important decision when partitioning is…

• Choosing the partition key columns• All benefits of partitioning hinges upon this choice!!!

• Which columns to partition upon?• If the table contains time-variant data

• Choose the RANGE partition key DATE column to optimize:• ETL according to load cycles• End-user access through partition pruning

• Choose the HASH or LIST sub-partition key column to optimize:• End-user access through partition pruning

• If the table does NOT contain time-variant data• Choose the RANGE, HASH, or LIST partition key column to optimize:

• End-user access through partition pruning

Page 31: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

• When choosing columns to optimize ETL• Choose a column which distinguishes different load cycles

• Should be a DATE column

• When choosing columns to optimize end-user access• Gather hard facts about usage – don’t guess!

• Oracle STATSPACK and Oracle10g AWR

• Data dictionary table SYS.COL_USAGE$• Populated automatically by cost-based optimizer in Oracle9i and above• DDL script “dba_column_usage.sql” can be downloaded from http://

www.EvDBT.com/tools.htm

• Ambeo Usage Tracker (http://www.ambeo.com) • Teleran iSight (http://www.teleran.com)

Choosing partition keys

Page 32: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

Choosing partition keys

• Example: fact table for credit-card processing• Fact table is time-variant

• Use range partitioning on DATE datatype to optimize ETL and queries

• Use hash- or list-subpartitioning to optimizer queries

• Fact table has four DATE columns• TXN_DT (date on which transaction occurred)• POST_DT (date on which transaction was posted by merchant)• PAID_DT (date on which transaction was paid to merchant)• LOAD_DT (date on which transaction was loaded to DW)

Page 33: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

Choosing partition keys• Which should be chosen? And why?

• LOAD_DT• Optimizes ETL perfectly, but does not benefit queries in any

way…• Data is loaded by LOAD_DT• End-users don’t query on LOAD_DT

• TXN_DT, POST_DT, and PAID_DT• Each benefits a different set of end-user queries• Presents some problems for ETL processing

• Each date loads mostly into the latest partition, then a little into each partition for the previous 2-4 days

• This situation can be handled by iterating through the five steps of the basic EXCHANGE PARTITION algorithm

• Where each iteration processes a different LOAD_DT value

Page 34: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

Summary recommendations

1. Use dimensional data models for the “presentation” to end-users• Don’t “free lance” and confuse the end-users• Understand the purpose of facts and dimensions

2. Base the database design on time-variant data structures• Don’t join “type-2” fact data to “type-1” dimension data• Load “type-2” data first, then rebuild “type-1” data from that

3. Use partitioning• Enable the “virtuous cycle” of Oracle features that cascade from

using partitioning intelligently…

Page 35: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

Questions?

Page 36: Tim Gorman Evergreen Database Technologies, Inc.   North Carolina Oracle Users Group Scaling to Infinity: Partitioning.

Thank You!

• Rocky Mountain Oracle Users Group (www.rmoug.org)• “Training Days 2008”, Denver CO• Tue-Thu 12-14 Feb 2008

• Tues 12-Feb: 4-hour “university sessions”• Wed-Thu 13-14 Feb: main conference

• Thu-Sun 15-18 Feb 2008• Informal ad-hoc ski weekend for attendees who wish to partake!!!

• Tim’s contact info:• Web: http://www.EvDBT.com• Email: [email protected]


Recommended