Oracle 10g for Data Warehousing
Hermann Baer, Oracle Product Management Data Warehousing
Server Technologies
NoCOUG Winter Conference, Feb 8th 2005
Agenda
Oracle10g for data warehousing - short trip back in the history
– Continuous innovation over decades
Adoption trends and drivers
– What do we see in the market
Design and build a Data Warehouse
– Ensure a well-balanced system
– Optimize Oracle
Oracle Database 10gR2 – sneak preview
The way to Oracle10g …
Data Warehousing development started decades ago with Oracle 7.0
– Primary focus on performance and scalability
– Extended with Manageability and the BI platform vision in the Oracle8i time frame
Data Warehousing Imperatives
– Efficient Extract, Transform, Load (ETL)
– Managing Large Data Volumes
– Fast Query Response
– Supporting Large User Population
– Managing Simply
Oracle10g for Data Warehousing: Continuous Innovation
Oracle 7.3
Oracle 8.0
– Partitioned Tables and Indexes
– Partition Pruning
– Parallel Index Scans
– Parallel Insert, Update, Delete
– Parallel Bitmap Star Query
– Parallel ANALYZE
– Parallel Constraint Enabling
– Server Managed Backup/Recovery
– Point-in-Time Recovery
Oracle8i
– Hash and Composite Partitioning
– Resource Manager
– Progress Monitor
– Adaptive Parallel Query
– Server-based Analytic Functions
– Materialized Views
– Transportable Tablespaces
– Direct Loader API
– Functional Indexes
– Partition-wise Joins
– Security Enhancements
Oracle9i
– List and Range-List Partitioning
– Table Compression
– Bitmap Join Index
– Self-Tuning Runtime Memory
– New Analytic Functions
– Grouping Sets
– External Tables
– MERGE
– Multi-Table Insert
– Proactive Query Governing
– System Managed Undo
Oracle10g
– Self-tuning SQL Optimization
– SQL Access Advisor
– Automatic Storage Management
– Self-tuning Memory
– Change Data Capture
– SQL Models
– SQL Frequent Itemsets
– SQL Partition Outer Joins
– Statistical functions
– and much more ...
Main Trends and Drivers
Oracle VLDWs are growing
– Fewer systems, more data
DW systems are consolidated
– Global view of the business
Importance of Data Warehousing increases dramatically
– Growing operational/tactical importance
Cost Effectiveness becomes more important
– Better decisions, lower cost
Oracle VLDWs are growing: Winter 2003 VLDB Survey
Largest Database Size, Decision Support

1998 Survey
Rank  Company          DBMS         Size (TB)
1     Sears            Teradata      4.63
2     HCIA             Informix      4.50
3     Wal-Mart         Teradata      4.42
4     Tele Danmark     DB2           2.84
5     CitiCorp         DB2           2.47
6     MCI              Informix      1.88
7     NDC Health       Oracle        1.85
8     Sprint           Teradata      1.30
9     Ford             Oracle        1.20
10    Acxiom           Oracle        1.13

2001 Survey
Rank  Company          DBMS         Size (TB)
1     SBC              Teradata     10.50
2     First Union      Informix      4.50
3     Dialog           Proprietary   4.25
4     Telecom Italia   DB2           3.71
5     FedEx            Teradata      3.70
6     Office Depot     Teradata      3.08
7     AT&T             Teradata      2.83
8     SK C&C           Oracle        2.54
9     NetZero          Oracle        2.47
10    Telecom Italia   Informix      2.32

2003 Survey
Rank  Company          DBMS         Size (TB)
1     France Telecom   Oracle       29.23
2     AT&T             Daytona      26.27
3     SBC              Teradata     24.81
4     Anonymous        DB2          16.19
5     Amazon.com       Oracle       13.00
6     Kmart            Teradata     12.59
7     Claria           Oracle       12.10
8     HIRA             Sybase IQ    11.94
9     FedEx            Teradata      9.98
10    Vodafone         Teradata      9.91
Oracle VLDWs are growing
Powerful RDBMS functionality becomes more important and visible, e.g.
– Partitioning
– Table compression
– Automatic Storage Management (ASM)
– Parallel processing
Increasing Importance of DW
Latency between operational and analytical data must be minimized
– Intelligence when you need it
Need for new and enhanced analytical capabilities
– More value from your data
“Classical” strengths of an RDBMS become more important
– E.g. Security, Backup/Recovery, Availability, Concurrency
Cost Effectiveness
Save money whenever possible
– Commodity servers
– Commodity disks
– Software manageability
Example Amazon
– 16 low cost Intel boxes replaced one SuperDome
– Low cost storage arrays replaced high end storage arrays
– 2 DBAs
Cost Effectiveness: Pay and Scale Incrementally
[Chart: workload (100% to 300%) over months 3 to 24]
Cost Effectiveness: Pay and Scale Incrementally ... with RAC
[Chart: workload (100% to 300%) over months 3 to 24]
Cost Effectiveness
Commodity components make specific database functionality more important
– RAC for Scalability and Availability
– Resource Manager
– Automatic Storage Management (ASM)
– RMAN / Oracle Backup (Oracle10gR2)
Oracle Database 10g DW Major Feature Summary
• ULDB support
– Database size extended to Exabytes (BIGFILES)
– Unlimited size LOBs
– Hash Partitioned Global Indexes
– ASM removes file system limits
More Value From Your Data
– Many New OLAP Features
– New Data Mining algorithms
– Stand-alone Data Mining Tool
– Advanced Statistics
– SQL Model Clause
– Frequent Item Sets
– Partition Outer Join
Intelligence When You Need It
– Cross Platform Transportable Tablespaces
– Data Pump
– Async Change Data Capture
– Enhancements to MERGE
Reduced Total Cost of Ownership
• Manageability
– Workload Repository
– Automatic SQL Tuning
– Self-Tuning Global Memory
– ASM
Build the foundation for Success
Even after decades of innovation, a computer ‘still’ consists of three main components
– CPU provides the computing power
– Memory stores the transient data for computational operations
– Disks (I/O) store the persistent information
Getting the best performance means finding the right balance of all these components and using them optimally
– Size your system appropriately
– Design your database appropriately
– Use the database appropriately
Data Warehousing is ‘just a special kind of application’
Configuring for your Workload
CPU requirements depend on user workload:
– Concurrency of users, ratio of CPU-related tasks
Memory requirement mostly user-process driven
IO requirements depend on query-mix:
– CPU vs. IO
Relative CPU power for IO related tasks
– Logically Random IOs (predominant in star schema) required for index driven queries, e.g. Index lookups, Index driven joins, Index scans
– Logically Sequential IOs (predominant in 3rd NF schema) required for table scans, e.g. Hash Joins
Find the balance between CPU and IO
Sizing Guidelines: Configuring for Throughput
Oracle can read 300+ MB/sec per GHz of CPU power
– Direct Read, multi-block IO, e.g. parallel full table scan ('lab environment')
An ‘average’ DW system should plan for 75-100 MB/sec per GHz of CPU power
– Typical mixture of IO and CPU intensive operations
– Ballpark number, adjust accordingly
TPC-H plans for approx. 200 MB/sec per 3 GHz Xeon
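As a rough sketch of the sizing rule on this slide, the planned I/O throughput can be computed from the CPU count and clock speed. The function name and the example system (8 CPUs at 3 GHz) are assumptions for illustration; only the 75-100 MB/sec per GHz range comes from the slide.

```python
def planned_io_mb_per_sec(num_cpus, ghz_per_cpu, mb_per_sec_per_ghz):
    """Planned I/O throughput in MB/sec for a given CPU configuration."""
    return num_cpus * ghz_per_cpu * mb_per_sec_per_ghz


# Hypothetical system: 8 CPUs at 3 GHz each.
low = planned_io_mb_per_sec(8, 3.0, 75)    # lower end of the ballpark
high = planned_io_mb_per_sec(8, 3.0, 100)  # upper end of the ballpark
print(f"Plan the I/O subsystem for {low:.0f} to {high:.0f} MB/sec")
```

The result is only a starting point; as the slide says, adjust it to the actual mix of I/O and CPU intensive operations.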
Configuring for Throughput: “The weakest link” defines the throughput
Components to consider:
● CPU: Quantity and speed
● HBA (Host Bus Adapter): Quantity and speed
● Switch speed
● Controller: Quantity and speed
● Disk: Quantity and speed
[Diagram: four servers, each with two HBAs (HBA1, HBA2), connected through FC-Switch1 and FC-Switch2 to Disk Arrays 1 to 8]
Configuring for Throughput: Bit is not Byte
Throughput Performance
Component         Theory (bit/s)   Maximum (byte/s)
HBA               1/2 Gbit/s       100/200 Mbytes/s
16 Port Switch    8 x 2 Gbit/s     1600 Mbytes/s
Fibre Channel     2 Gbit/s         200 Mbytes/s
Disk Controller   2 Gbit/s         200 Mbytes/s
GigE NIC          1 Gbit/s         80 Mbytes/s
Infiniband        10 Gbit/s        890 Mbytes/s
CPU                                200 Mbytes/s
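To make the "bit is not byte" point concrete, the sketch below compares a naive bit-to-byte conversion (divide by 8) with the maximum values quoted in the table. The helper names are hypothetical, and 1 Gbit is taken as 1000 Mbit; the quoted figures are copied from the slide, not measured.

```python
# (component, nominal Gbit/s, maximum MB/s quoted on the slide)
SLIDE_VALUES = [
    ("HBA (2 Gbit)", 2, 200),
    ("Fibre Channel", 2, 200),
    ("Disk Controller", 2, 200),
    ("GigE NIC", 1, 80),
    ("Infiniband", 10, 890),
]


def naive_mb_per_sec(gbit_per_sec):
    """Divide by 8 and ignore any encoding or protocol overhead."""
    return gbit_per_sec * 1000 / 8


def efficiency(gbit_per_sec, quoted_mb_per_sec):
    """Quoted maximum as a fraction of the naive bit-to-byte conversion."""
    return quoted_mb_per_sec / naive_mb_per_sec(gbit_per_sec)


for name, gbit, quoted in SLIDE_VALUES:
    print(f"{name}: naive {naive_mb_per_sec(gbit):.0f} MB/s, "
          f"quoted {quoted} MB/s ({efficiency(gbit, quoted):.0%})")
```

Sizing with the naive number would overestimate every link in the table, which is why the byte column, not the bit column, should feed the throughput plan.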
Configuring for Throughput
[Diagram: four servers, each with two HBAs (HBA1, HBA2), connected through FC-Switch1 and FC-Switch2 to Disk Arrays 1 to 8]
Each switch needs to support 800MB/s to guarantee a total system throughput of 1600 MB/s
Each machine has 2 HBAs = 400MB/s; all 8 HBAs can sustain 8 * 200MB/s = 1600 MB/s
Each machine has 2 CPUs; all four servers drive about 2 * 200MB/s * 4 = 1600 MB/s
Each disk array has one 2Gbit controller; all 8 disk arrays can sustain 8 * 200MB/s = 1600 MB/s
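The "weakest link" reasoning for this configuration can be sketched as taking the minimum over the aggregate capacity of each component class. The per-component numbers are the ones on the slide; the dictionary layout, the function name, and the degraded variant with seven disk arrays are assumptions for illustration.

```python
# Aggregate throughput per component class in MB/s, from the slide.
components = {
    "FC switches": 2 * 800,       # 2 switches at 800 MB/s each
    "HBAs": 8 * 200,              # 8 HBAs at 200 MB/s each
    "CPUs": 4 * 2 * 200,          # 4 servers, 2 CPUs each, 200 MB/s per CPU
    "disk controllers": 8 * 200,  # 8 disk arrays, one 2 Gbit controller each
}


def weakest_link(capacities):
    """Return (name, MB/s) of the component class that caps throughput."""
    name = min(capacities, key=capacities.get)
    return name, capacities[name]


# Balanced system: every class delivers the same aggregate throughput.
print(weakest_link(components))

# Hypothetical variation: with only 7 disk arrays, storage becomes the cap.
degraded = {**components, "disk controllers": 7 * 200}
print(weakest_link(degraded))
```

In the balanced case all classes tie, so adding capacity to any single class would not raise the system throughput.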
Configuring the Storage
Design for throughput, not capacity
Keep it simple
– Try using RAID 0+1
Use S.A.M.E. methodology
– Stripe And Mirror Everything
– At the HW level, if available
– Using ASM capabilities
Leverage ASM whenever possible
– Striping and Mirroring capabilities
– Automatic rebalancing
– Enables low cost storage
Calibrate your System
You can easily compute the theoretical I/O performance of your system
– Typically measured by the minimum of [I/O channel capacity, I/O controller capacity, disk I/O capacity]
Verify the I/O performance limits using OS-level commands
– Do this prior to using the database
Cover basic IO operations and the average future load pattern
– Random single block IO vs. sequential multi block IO
– Concurrency
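A minimal sketch of an OS-level sequential read test, in the spirit of running dd before the database is installed. The function names and the scratch file are hypothetical. A real calibration would read files much larger than memory, or bypass the file system cache, and would run several readers concurrently; this sketch only shows the shape of the measurement.

```python
import os
import tempfile
import time


def sequential_read_mb_per_sec(path, block_size=1024 * 1024):
    """Read a file front to back in large blocks; return (bytes, MB/sec)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    # Guard against a zero interval on very small files.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total, total / (1024 * 1024) / elapsed


def make_scratch_file(size_bytes):
    """Create a temporary file of the given size filled with random bytes."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size_bytes))
    return path


if __name__ == "__main__":
    scratch = make_scratch_file(16 * 1024 * 1024)
    try:
        nbytes, rate = sequential_read_mb_per_sec(scratch)
        print(f"read {nbytes} bytes at {rate:.0f} MB/sec (likely cached)")
    finally:
        os.remove(scratch)
```

Comparing such a number against the theoretical minimum of channel, controller, and disk capacity shows whether the hardware is configured as expected.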
Calibrate your System: Throughput dd vs. ORCL DIRECT READ
[Chart: throughput in MB/s (0 to 500) versus copies of dd / degree of parallelism (1 to 9), for dd and Oracle]
● Oracle drives about 90% of what dd can drive with a table scan
● If you do not get the expected throughput, fix the hardware
Schema – which way to go?
Don’t get lost in theory and academia
– Philosophical discussions won’t help (“Star fights 3NF”)
– Neither of the two extremes will work (RedBrick?, Teradata?)
Design according to your business needs
– Reality shows that most customers are doing a mix and match
3NF more in an ODS layer
‘Denormalized’ 3NF in DW/Stage for general purposes
Dimensional model for subject areas, e.g. sales, marketing (remember shared dimensions!)
* OLAP will not be covered in this presentation
A successful database has to support everything
Schema – which way to go?
The chosen schema approach determines the Oracle functionality used
The chosen schema approach determines the IO pattern
– Logically Random IOs (predominant in star schema) required for index driven queries, e.g. Index lookups, Index driven joins, Index scans
– Logically Sequential IOs (predominant in 3rd NF schema) required for table scans, e.g. Hash Join
Oracle has functionality to both
– Push the IO to the limit
– Optimize the IO requirements
Schema – which way to go? Star Schema
Leading performance for dimensional schemas
Innovative usage of bitmap indexes and bitmap join indexes
– Index access instead of large table access
– Bitmap indexes 3 to 20 times smaller than btree indexes
Support for complex star schemas
– Multiple fact tables
– Snowflake schemas
– Large number of dimensions
Fully integrated Parallel execution Partition Pruning
I/O – Minimize Requests: Partition Pruning
[Diagram: Sales table partitioned by month, 99-Jan to 99-Jun]
Only the relevant partitions will be accessed
Optimizer knows or finds the relevant partitions
– Static pruning with known values in advance
– Dynamic pruning uses internal recursive SQL to find the relevant partitions
Minimizes I/O operations
– Also provides order of magnitude performance gains
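The idea behind static pruning can be illustrated with a small conceptual model: given the partition bounds and a date range known in advance, only overlapping partitions need to be read. This is a sketch of the principle, not of Oracle's optimizer; the partition names follow the diagram, while the bounds, the function name, and the example query range are assumptions.

```python
from datetime import date

# Monthly range partitions of a hypothetical Sales table:
# name -> (inclusive lower bound, exclusive upper bound).
PARTITIONS = {
    "99-Jan": (date(1999, 1, 1), date(1999, 2, 1)),
    "99-Feb": (date(1999, 2, 1), date(1999, 3, 1)),
    "99-Mar": (date(1999, 3, 1), date(1999, 4, 1)),
    "99-Apr": (date(1999, 4, 1), date(1999, 5, 1)),
    "99-May": (date(1999, 5, 1), date(1999, 6, 1)),
    "99-Jun": (date(1999, 6, 1), date(1999, 7, 1)),
}


def prune(partitions, lo, hi):
    """Return names of partitions overlapping the half-open range [lo, hi)."""
    return [name for name, (start, end) in partitions.items()
            if start < hi and lo < end]


# A query restricted to March and April touches two of six partitions.
relevant = prune(PARTITIONS, date(1999, 3, 1), date(1999, 5, 1))
print(relevant)
```

With six equally sized partitions, this example query would read roughly a third of the table instead of all of it.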
I/O – Minimize Requests: Materialized Views
[Diagram labels: Detail, Monthly Sales by Region, Query Rewrite]
Query: What were the sales in the West and South regions for the past three quarters?
A simple rollup Month -> Quarter provides an unprecedented gain in performance with minimal I/O
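The benefit of answering a quarterly question from a monthly summary rather than from detail rows can be sketched as follows. The data values, region names other than West and South, and the function name are invented for illustration; the sketch models only the rollup step, not Oracle's query rewrite mechanism.

```python
from collections import defaultdict

# Hypothetical pre-aggregated summary: (month, region) -> sales amount.
monthly_sales = {
    (1, "West"): 100, (2, "West"): 120, (3, "West"): 110,
    (4, "West"): 130, (1, "South"): 90, (4, "South"): 95,
    (2, "East"): 70,
}


def rollup_to_quarters(monthly):
    """Aggregate (month, region) totals into (quarter, region) totals."""
    quarterly = defaultdict(int)
    for (month, region), amount in monthly.items():
        quarter = (month - 1) // 3 + 1
        quarterly[(quarter, region)] += amount
    return dict(quarterly)


quarterly_sales = rollup_to_quarters(monthly_sales)

# Answer the regional question from the small summary, not the detail rows.
west_south = {key: total for key, total in quarterly_sales.items()
              if key[1] in ("West", "South")}
print(west_south)
```

The rollup touches one row per month and region, which is why the I/O stays minimal however large the detail table is.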
Schema – which way to go? 3NF example
[Diagram: CUSTOMER_ORDERS and CUSTOMER_ORDER_PRODUCTS, each partitioned by month (Jan, Feb, Mar, Apr, ...); the Jan partitions are subdivided into Jan, Hash 1 to Jan, Hash 4]
Example of an optimized parallel partition-wise join of a composite partitioned table
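A conceptual sketch of a partition-wise join: both inputs are hash partitioned on the join key with the same number of buckets, so each bucket pair can be joined independently (and, in a real system, in parallel). The table names echo the diagram; the column names, sample rows, and function names are assumptions, and the sketch says nothing about how Oracle implements the operation.

```python
def hash_partition(rows, key, num_partitions):
    """Split rows into num_partitions buckets by hash of the join key."""
    buckets = [[] for _ in range(num_partitions)]
    for row in rows:
        buckets[hash(row[key]) % num_partitions].append(row)
    return buckets


def partition_wise_join(left, right, key, num_partitions=4):
    """Join two row lists bucket by bucket on an equality key."""
    result = []
    left_parts = hash_partition(left, key, num_partitions)
    right_parts = hash_partition(right, key, num_partitions)
    # Matching rows always land in buckets with the same index, so each
    # (left bucket, right bucket) pair is an independent unit of work.
    for lpart, rpart in zip(left_parts, right_parts):
        lookup = {}
        for row in lpart:
            lookup.setdefault(row[key], []).append(row)
        for row in rpart:
            for match in lookup.get(row[key], []):
                result.append({**match, **row})
    return result


# Hypothetical sample rows for the two tables in the diagram.
customer_orders = [{"order_id": i, "customer": f"c{i % 3}"} for i in range(6)]
customer_order_products = [{"order_id": i, "product": f"p{n}"}
                           for n, i in enumerate((0, 2, 4, 4))]
joined = partition_wise_join(customer_orders, customer_order_products,
                             "order_id")
print(len(joined))
```

Because no bucket pair needs rows from any other pair, the work divides cleanly across parallel query servers.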
Schema – which way to go? Schema Agnostic - Parallel Execution
Use parallelism to enable single process scalability
Unrestricted parallelism
– No data layout requirement or restriction (as in shared nothing systems)
– All operations can be parallelized
[Diagram: data on disk is read by scanner query servers (scan), which feed sorter (aggregator) query servers (sort A-K, sort L-S, sort T-Z); a coordinator dispatches the work]
Schema – which way to go? Schema Agnostic - Parallel Execution
[Diagram sequence: 2 x DOP 2, total 200 MB/sec; 4 x DOP 4, total 400 MB/sec; 8 x DOP 8, total 800 MB/sec; a further 8 x DOP 8, total 1600 MB/sec]
I/O bandwidth requirement increases with single process parallelism and multi-user concurrency
– Plan for your system’s expected I/O throughput based on average concurrent users and parallelism
Schema – which way to go? Oracle’s functionality
Star schema
– Range-partition fact tables by time
– Bitmap indexes on dimension-key columns of fact table
– ‘Star transformation’ for end-user queries
– Materialized views for pre-aggregated cubes
3NF or normalized schema
– Composite range-hash partitioning on large tables
– ‘Partition-wise’ joins and parallel execution are key performance enablers for joining large tables
Hybrid environments
– Use both dogmas concurrently in the same system without affecting each other
Choose what fits your needs best! Oracle provides optimizations for any kind of setup
Init.ora – less is more: Lessons learned from History
Do not de-tune Oracle
– Very often, our performance engineers are getting improvements just by removing parameters
– Results can be poor optimizer plans, wasted memory, and serialization points
Trust Oracle
– Don’t try and second guess the software
– With the exception of buffer and subject area related parameters, the system defaults are usually optimum
Init.ora – less is more: Basic Rules
Ensure that data warehouse relevant parameters are set
– Not all parameters are enabled by default in database releases prior to Oracle10g
Size and set buffer and memory related parameters
– Two parameters are enough
Do not touch other parameters unless necessary
Init.ora – less is more: Data Warehouse relevant parameters
COMPATIBLE
– Database release version to enable new functionality
OPTIMIZER_FEATURES_ENABLE
– Database release version to enable new functionality
DB_FILE_MULTIBLOCK_READ_COUNT
– Maximize multiblock I/O (use multiple of OS I/O size)
DISK_ASYNCH_IO
– Set to TRUE (Only relevant for older Linux versions)
PARALLEL_MAX_SERVERS
– Adjust to system capabilities (default to 5 prior to Oracle10g)
QUERY_REWRITE_ENABLED
– Set to TRUE, enabled by default with Oracle10g
QUERY_REWRITE_INTEGRITY
– ENFORCED by default, can be potentially lowered
STAR_TRANSFORMATION_ENABLED
– Set to TRUE
Build the foundation for Success: Summary
Data Warehousing is ‘just a special kind of application’
Ensure a well-tuned I/O subsystem
– Size for I/O throughput, not for disk capacity
– Use appropriate hardware / storage
Find a schema balance
– Design according to your needs using the appropriate model, not the other way around
Init.ora settings: less is more
ETL Enhancements
DML error logging
• Column values that are too large
• Constraint violations (NOT NULL, unique, referential, check constraints)
• Errors raised during trigger execution
• Type conversion errors
• Partition mapping errors
Distributed Change Data Capture
• Enables 9.2 as source for asynchronous CDC
DML Error Logging (example)
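-- Load rows into sales; failing rows go to the error table instead of aborting the statement
INSERT INTO sales
SELECT product_id, customer_id, TRUNC(sales_date), 3,
       promotion_id, quantity, amount
FROM sales_activity_direct
LOG ERRORS INTO sales_activity_errors('load_20050801')  -- tag identifying this load
REJECT LIMIT UNLIMITED;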
Performance Enhancements
Sort
– ORDER BY statements
– (B-tree) index creation
– Up to 5 times performance improvement
Aggregation
– GROUP BY statements
– Materialized views using aggregations
– Implicit use of aggregations, e.g. statistics gathering
– Two to three times performance improvement
Query rewrite using multiple materialized views
Partitioning Enhancements
Scalability
– Maximum number of partitions 64K -> 1M
– Resource optimization for DROP TABLE of a partitioned table
– Support for partitioning on index-organized tables
– Support for hash-partitioned global indexes
Performance
– Support for ‘Multi dimensional’ partition pruning
Other Enhancements
Manageability
– SQL Access Advisor improvements
– Materialized view refresh improvements
Analytics
– SQL model clause enhancements
Summary
Oracle10g for data warehousing - short trip back in the history
– The most powerful and successful DW platform
Adoption trends and drivers
– Be visionary, though conservative
– Guarantee success and protect investments
Design and build a Data Warehouse
– Ensure a well-balanced system
– Optimize Oracle
Oracle Database 10gR2 Beta – Interested?
Q & A: Questions and Answers