OLAP Implementation Techniques

Date post: 23-Nov-2014
Upload: nawaz-rehan
Page 1: OLAP Implementation Techniques

High Performance Data Warehouse Design and Construction

OLAP Implementation Techniques

Page 2: OLAP Implementation Techniques

Objectives

Provide a robust framework for OLAP techniques for decision support.

Characterize tradeoffs in performance, scalability, flexibility, and complexity associated with various OLAP implementation techniques.

Examine tradeoffs in aggregate construction.

Page 3: OLAP Implementation Techniques

Topics

OLAP framework for decision support.

Physical implementation techniques: MOLAP, ROLAP, HOLAP, and DOLAP.

Star schema design.

Page 4: OLAP Implementation Techniques

Where Does OLAP Fit In?

OLAP = On-line analytical processing.

OLAP is a characterization of applications, not a database design technique.

Idea is to provide very fast response time in order to facilitate iterative decision-making.

Analytical processing requires access to complex aggregations (as opposed to record-level access).

Page 5: OLAP Implementation Techniques

Where Does OLAP Fit In?

Information is conceptually viewed as “cubes” for simplifying the way in which users access, view, and analyze data.

Quantitative values are known as “facts” or “measures.”
– e.g., sales $, units sold, etc.

Descriptive categories are known as “dimensions.”
– e.g., geography, time, product, scenario (budget or actual), etc.

Dimensions are often organized in hierarchies that represent levels of detail in the data (e.g., UPC, SKU, product subcategory, product category, etc.).
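
As a minimal sketch of these terms (all names and figures below are hypothetical, not taken from the slides), a cube can be pictured as a mapping from dimension values to a measure, with a hierarchy used to roll the measure up to a coarser level:

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions (product, region, month) and one measure (sales $).
facts = [
    ("raincoat", "Boston", "2000-01", 120.0),
    ("raincoat", "Boston", "2000-02", 80.0),
    ("umbrella", "Boston", "2000-01", 30.0),
    ("raincoat", "Chicago", "2000-01", 50.0),
]

# Hypothetical one-level hierarchy: product -> product category.
product_category = {"raincoat": "rainwear", "umbrella": "rainwear"}


def roll_up(rows, key_fn):
    """Sum the measure over all rows that share the same key."""
    totals = defaultdict(float)
    for product, region, month, sales in rows:
        totals[key_fn(product, region, month)] += sales
    return dict(totals)


# Two views of the same data at different levels of detail.
by_product_region = roll_up(facts, lambda p, r, m: (p, r))
by_category = roll_up(facts, lambda p, r, m: product_category[p])
```

Here `by_product_region` drops the time dimension, while `by_category` moves up the product hierarchy.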

Page 6: OLAP Implementation Techniques

OLAP FASMI Test

Fast: Delivers information to the user at a fairly constant rate. Most queries should be delivered to the user in five seconds or less.

Analysis: Performs basic numerical and statistical analysis of the data, pre-defined by an application developer or defined ad hoc by the user.

Shared: Implements the security requirements necessary for sharing potentially confidential data across a large user population.

Multi-dimensional: The essential characteristic of OLAP.

Information: Accesses all the data and information necessary and relevant for the application, wherever it may reside and not limited by volume.

...from the OLAP Report by Pendse and Creeth.

Page 7: OLAP Implementation Techniques

OLAP Implementations

MOLAP: OLAP implemented with a multi-dimensional database.

ROLAP: OLAP implemented with a relational database.

HOLAP: OLAP implemented with a hybrid of multi-dimensional and relational database technologies.

DOLAP: OLAP implemented for desktop decision support environments.

Page 8: OLAP Implementation Techniques

MOLAP Implementations

OLAP has historically been implemented through use of multi-dimensional databases (MDDs).

Dimensions are key business factors for analysis:

– geographies (zip, state, region,...)

– products (item, product category, product department,...)

– dates (day, week, month, quarter, year,...)

Very high performance via fast look-up into “cube” data structure to retrieve pre-calculated results.

“Cube” data structures allow pre-calculation of aggregate results for each possible combination of dimensional values.

Use of application programming interface (API) for access via front-end tools.
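
The pre-calculation idea can be sketched as follows. This is an illustrative toy, not how any particular MDD product stores its cubes; the rows, the `build_cube` helper, and the dimension names are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("geography", "product", "date")

# Hypothetical detail rows.
rows = [
    {"geography": "MA", "product": "raincoat", "date": "2000-01-03", "sales": 10.0},
    {"geography": "MA", "product": "umbrella", "date": "2000-01-03", "sales": 4.0},
    {"geography": "CA", "product": "raincoat", "date": "2000-01-04", "sales": 6.0},
]


def build_cube(rows, dimensions):
    """Pre-calculate totals for every subset of the dimensions (2**n groupings)."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(float)
            for row in rows:
                totals[tuple(row[d] for d in dims)] += row["sales"]
            cube[dims] = dict(totals)
    return cube


cube = build_cube(rows, DIMENSIONS)

# Answering a query is now a look-up rather than a scan of the detail rows.
ma_total = cube[("geography",)][("MA",)]
grand_total = cube[()][()]
```

With three dimensions there are 2**3 = 8 groupings; the count doubles with each added dimension, which is the cost side of the fast look-up.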

Page 9: OLAP Implementation Techniques

MOLAP Implementations

Need to consider both maintenance and storage implications when designing strategy for when to build cubes.

Maintenance Considerations: Every data item received into MDD must be aggregated into every cube (assuming “to-date” summaries are maintained).

Storage Considerations: Although cubes get much smaller (i.e., denser) as dimensions get less detailed (e.g., year vs. day), storage implications for building hundreds of cubes can be significant.

Page 10: OLAP Implementation Techniques

MOLAP Implementations

Typically outperform relational database technology because all answers are pre-computed into cubes (and overhead for accessing cubes is very low).

Difficult to scale because of combinatorial explosion in the number and size of cubes when dimensions of significant cardinality are required.

A single dimension with more than tens (sometimes small hundreds) of thousands of entries will break the MOLAP model, because the pre-computed cube model does not work well when individual cells are very sparsely populated.

See www.olapreport.com/DataExplosion.htm
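
A rough way to see the sparsity problem is to compare the number of potential cells with the number actually populated. The cardinalities below are hypothetical and chosen only to illustrate the arithmetic:

```python
def cube_density(dimension_cardinalities, populated_cells):
    """Return (potential cell count, fraction of those cells that hold data)."""
    potential = 1
    for cardinality in dimension_cardinalities:
        potential *= cardinality
    return potential, populated_cells / potential


# Hypothetical cube: 1,000,000 customers x 50,000 products x 730 days,
# with 4 billion populated cells.
potential, density = cube_density([1_000_000, 50_000, 730], 4_000_000_000)
```

Under these assumed numbers the cube has 36.5 trillion potential cells, of which roughly one in nine thousand holds data.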

Page 11: OLAP Implementation Techniques

Virtual Cubes

Virtual cubes are used when there is a need to join information from two dissimilar cubes that share one or more common dimensions.

Similar to a relational view; two (or more) cubes are linked along common dimension(s).

Often used to save space by eliminating redundant storage of information.

Example: Build a list price cube that can be used to compute discounts given across many stores in a retail chain without redundant storage of the list price data through use of a virtual cube.
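
The list price example might look something like the sketch below, where the "virtual cube" is simply computed on demand by linking two stored cubes on their shared item dimension. The cube contents and the `discount_view` helper are hypothetical.

```python
# Stored cube 1: list price, keyed by item only (stored once, not per store).
list_price = {"raincoat": 50.0, "umbrella": 12.0}

# Stored cube 2: sales, keyed by (store, item).
sales = {
    ("store_1", "raincoat"): {"units": 3, "revenue": 135.0},
    ("store_2", "umbrella"): {"units": 5, "revenue": 50.0},
}


def discount_view(sales, list_price):
    """Virtual cube: discount dollars per (store, item), derived rather than stored."""
    view = {}
    for (store, item), cell in sales.items():
        revenue_at_list = list_price[item] * cell["units"]
        view[(store, item)] = revenue_at_list - cell["revenue"]
    return view


discounts = discount_view(sales, list_price)
```

The list price is never copied into the sales cube, which is the space saving the slide describes.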

Page 12: OLAP Implementation Techniques

Partitioned Cubes

One logical cube of data can be spread across multiple physical cubes on separate (or same) servers.

The divide-and-conquer approach of partitioned cubes helps to mitigate the scalability limitations of a MOLAP environment.

Ideal cube partitioning is completely invisible to end users.

Page 13: OLAP Implementation Techniques

ROLAP Implementations

Advances in database technologies and front-end tools have begun to allow deployment of OLAP using ANSI SQL RDBMS implementations.

ROLAP facilitates deployment of much larger dimension tables than MOLAP implementations.

Front-end tools to facilitate GUI access to multi-dimensional analysis capabilities.

Aggregate awareness allows exploitation of pre-built summary tables for some front-end tools.

Star schema designs are often used to facilitate OLAP against relational databases.

Page 14: OLAP Implementation Techniques

Simplified Third Normal Form (Retail)

[Entity-relationship diagram; only the labels survive in this transcript.]

Tables: zip_x_smsa, zip_x_adi, year, quarter, week, store, sale_header, sale_detail, date_x_store_x_weather, item_x_category, item_x_mfctr, category_x_dept.

Column groups: (ZONE, REGION); (ZIP, ZONE); (ZIP, SMSA); (ZIP, ADI); (QTR, YR); (WEEK, QTR); (DATE, WEEK); (STORE #, ADDRESS, ZIP, ...); (RECEIPT #, STORE #, DATE, ...); (STORE #, DATE, WEATHER); (RECEIPT #, ITEM #, ..., $); (ITEM #, CATEGORY); (ITEM #, MFCTR); (DEPT, CATEGORY).

Tables are linked by one-to-many (1:M) relationships.

Page 15: OLAP Implementation Techniques

Simplified Star Schema

[Star schema diagram; each dimension table has a one-to-many (1:M) relationship to the fact table.]

Fact Table: ITEM#, STORE#, DATE, RECEIPT#, ..., $

Product Dimension Table: ITEM#, CATEGORY, DEPT, MFCTR, ...

Geography Dimension Table: STORE#, ADDRESS, ZIP, ADI, SMSA, ZONE, REGION

Calendar Dimension Table: DATE, WEEK, QUARTER, YEAR, ...

Store x Date Dimensional Table: STORE#, DATE, WEATHER

A vastly simplified model ... may even summarize out receipt # .....

Page 16: OLAP Implementation Techniques

Simplified Star Schema

A vastly simplified physical data model!

Collapse dimensional hierarchies into a single table for each dimension and create a single fact table from the header and detail records:

Fewer tables. Fewer joins to get results.
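
Collapsing a hierarchy can be sketched as a one-time denormalization. The table names follow the third normal form slide; the sample rows and the `build_product_dimension` helper are hypothetical.

```python
# Normalized look-up tables (hypothetical contents).
item_x_category = {"1001": "raincoats", "1002": "umbrellas"}
category_x_dept = {"raincoats": "outerwear", "umbrellas": "accessories"}
item_x_mfctr = {"1001": "Acme", "1002": "Globex"}


def build_product_dimension(item_x_category, category_x_dept, item_x_mfctr):
    """Flatten the item hierarchy into one wide row per item."""
    return [
        {
            "item": item,
            "category": category,
            "dept": category_x_dept[category],
            "mfctr": item_x_mfctr[item],
        }
        for item, category in sorted(item_x_category.items())
    ]


product_dimension = build_product_dimension(
    item_x_category, category_x_dept, item_x_mfctr
)
```

Three look-up tables become one dimension table, so a query that filters by department needs a single join to the fact table instead of a chain of joins.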

Page 17: OLAP Implementation Techniques

Star Schema for High Performance

Business question: How many $ in raincoats did I sell in the first week of January through stores in Boston?

Assume:

– 4 billion rows in fact table.

– 20 different kinds (size, color, style) of raincoats (product category) out of 50,000 UPCs in store.

– 8 stores out of 400 are in BOSTON SMSA.

– 2 years of POS history in DBMS.

Page 18: OLAP Implementation Techniques

Star Schema for High Performance

Simple (poor performance) approach to query execution:

1. Join item table with filtering on raincoat product category (very selective) to fact table.

2. Join date table with filtering by week (next most selective) to result table.

3. Join store table with filtering on store to result table from step 2.

4. Aggregate.

Page 19: OLAP Implementation Techniques

Star Schema for High Performance

Advanced (better performance) approach to query execution:

1. Cartesian product join between dimensional tables.
* Result is 20 x 8 x 7 = 1,120 rows.

2. Use composite index on item:store:day into fact table for very selective access.
* Access less than 0.000008 percent (a fraction of about 0.00000008) of data in fact table!

Sophisticated cost-based optimizers will figure this out.
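
The arithmetic behind these figures can be checked with a short calculation. It assumes sales are spread uniformly across items, stores, and days, and that two years of history is 730 days; both are simplifications made for this sketch.

```python
# Figures from the stated assumptions.
raincoat_upcs, total_upcs = 20, 50_000
boston_stores, total_stores = 8, 400
days_in_week, days_of_history = 7, 2 * 365

# Rows produced by the Cartesian product of the filtered dimension tables.
index_probes = raincoat_upcs * boston_stores * days_in_week

# Fraction of the fact table touched, assuming uniformly distributed sales.
fraction = (
    (raincoat_upcs / total_upcs)
    * (boston_stores / total_stores)
    * (days_in_week / days_of_history)
)
percent = fraction * 100
```

The product of the three selectivities is roughly 7.7e-8 as a fraction, which is about 0.0000077 when expressed as a percentage.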

Page 20: OLAP Implementation Techniques

Forcing a Cartesian Product Join

Add an additional “join_value” column in each dimensional table.

Set join_value to same value in all rows of the dimensional tables.

Add additional where clause predicates joining on this column between dimensional tables.

NOTE: This shouldn't be necessary with a “smart” optimizer.

Page 21: OLAP Implementation Techniques

Forcing a Cartesian Product Join

Sample code:

select sum(d_sales_detail.sales_amt)
from d_sales_detail, store, item, period
where d_sales_detail.store_id = store.store_id
  and d_sales_detail.item_id = item.item_id
  and d_sales_detail.day_dt = period.day_dt
  and period.day_dt between '23-NOV-2000' and '24-DEC-2000'
  and item.trade_style_cd = 'BARBIE'
  and store.state_cd = 'CA'
  -- constant join_value predicates tie the dimension tables together
  and store.join_value = period.join_value
  and store.join_value = item.join_value
  and period.join_value = item.join_value;
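
To try the join_value technique end to end, the sketch below runs an equivalent query against an in-memory SQLite database. SQLite stands in for the RDBMS purely for convenience: the sample rows are hypothetical, dates are written in ISO form because SQLite compares them as text, and nothing here guarantees that SQLite's planner will actually choose a Cartesian product. The example only confirms that the extra predicates leave the result unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table store  (store_id integer primary key, state_cd text, join_value integer);
    create table item   (item_id integer primary key, trade_style_cd text, join_value integer);
    create table period (day_dt text primary key, join_value integer);
    create table d_sales_detail (store_id integer, item_id integer, day_dt text, sales_amt real);
    create index sales_idx on d_sales_detail (item_id, store_id, day_dt);

    -- join_value holds the same constant in every dimension row
    insert into store  values (1, 'CA', 1), (2, 'MA', 1);
    insert into item   values (10, 'BARBIE', 1), (11, 'OTHER', 1);
    insert into period values ('2000-11-22', 1), ('2000-11-23', 1), ('2000-12-24', 1);
    insert into d_sales_detail values
        (1, 10, '2000-11-23', 20.0),
        (1, 10, '2000-12-24', 5.0),
        (1, 10, '2000-11-22', 99.0),
        (1, 11, '2000-11-23', 7.0),
        (2, 10, '2000-11-23', 3.0);
""")

query = """
    select sum(d_sales_detail.sales_amt)
    from d_sales_detail, store, item, period
    where d_sales_detail.store_id = store.store_id
      and d_sales_detail.item_id = item.item_id
      and d_sales_detail.day_dt = period.day_dt
      and period.day_dt between '2000-11-23' and '2000-12-24'
      and item.trade_style_cd = 'BARBIE'
      and store.state_cd = 'CA'
      and store.join_value = period.join_value
      and store.join_value = item.join_value
      and period.join_value = item.join_value
"""
total = conn.execute(query).fetchone()[0]
conn.close()
```

Only the two California BARBIE sales inside the date range contribute to `total`.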

Page 22: OLAP Implementation Techniques

Star Schema for High Performance

Problem: What if I want to know raincoat sales in first week of January regardless of store?

Answer: Performance advantage of composite index in traditional RDBMS is severely impaired!

B-tree indexing techniques do not allow for flexibility in the use of dimensions for query purposes.

Bit indexing (and variations thereof) often allows much more generality in achieving high performance from a star schema.
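
The flexibility of bit indexing can be illustrated with a toy bitmap index, using Python integers as bit vectors. The fact rows and helper names are hypothetical; real products use compressed bitmaps and more elaborate variations.

```python
from collections import defaultdict

# Hypothetical fact rows; the row position doubles as the bit position.
fact_rows = [
    {"item": "raincoat", "store": "Boston-1", "week": "2000-W01", "sales": 40.0},
    {"item": "raincoat", "store": "Chicago-1", "week": "2000-W01", "sales": 25.0},
    {"item": "umbrella", "store": "Boston-1", "week": "2000-W01", "sales": 9.0},
    {"item": "raincoat", "store": "Boston-1", "week": "2000-W02", "sales": 31.0},
]


def build_bitmaps(rows, column):
    """One integer bitmap per distinct value; bit i is set when row i has that value."""
    bitmaps = defaultdict(int)
    for i, row in enumerate(rows):
        bitmaps[row[column]] |= 1 << i
    return bitmaps


# One independent index per dimension column.
indexes = {col: build_bitmaps(fact_rows, col) for col in ("item", "store", "week")}


def total_sales(**predicates):
    """AND together the bitmaps for whichever dimensions the query constrains."""
    mask = (1 << len(fact_rows)) - 1  # start with every row selected
    for column, value in predicates.items():
        mask &= indexes[column].get(value, 0)
    return sum(row["sales"] for i, row in enumerate(fact_rows) if (mask >> i) & 1)


all_stores = total_sales(item="raincoat", week="2000-W01")
boston_only = total_sales(item="raincoat", week="2000-W01", store="Boston-1")
```

Because each dimension has its own bitmap, leaving the store out of the query costs nothing, whereas a composite item:store:day B-tree loses much of its selectivity when the middle column is unconstrained.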

Page 23: OLAP Implementation Techniques

Star Schema for High Performance

Bottom Line:

It is not at all unusual to obtain an order of magnitude (or more) in performance advantage using a star schema with advanced indexing versus a more traditional relational database implementation.

Despite what vendors may tell you, star schemas cannot be effectively implemented for all DSS business applications and/or data models.

Page 24: OLAP Implementation Techniques

ROLAP

Relational OLAP often makes heavy use of summary tables to provide near instantaneous access for multi-dimensional queries.

Foundation is usually star schema or snowflake database design.

Allows OLAP with much larger data sets than multi-dimensional database (MDD) products using cube structures (MOLAP).

Page 25: OLAP Implementation Techniques

ROLAP

Number of summary tables can get very large if discipline is not enforced...

Assume a retail database with the following two dimensions on the fact table...

Calendar: Day, Week, Period, Quarter, Year, All Days

Geography: Store, Zone, District, Region, All Stores

Page 26: OLAP Implementation Techniques

ROLAP

            Store   Zone   District   Region   All Stores
All Days     13      19       24        28         30
Year          9      15       22        27         29
Quarter       6      11       18        23         26
Period        4       8       14        21         25
Week          2       5       10        17         20
Day           1       3        7        12         16

Summary tables in a naive implementation require all combinations of the dimensions at each aggregation level...

30 summary tables! ... Add in item, SKU, subcategory, category, and all items...now we are up to 150 pre-aggregates!
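
The counts follow directly from multiplying the number of levels in each dimension, as this short enumeration shows (the level names are taken from the slides):

```python
from itertools import product

calendar = ["Day", "Week", "Period", "Quarter", "Year", "All Days"]
geography = ["Store", "Zone", "District", "Region", "All Stores"]
product_levels = ["Item", "SKU", "Subcategory", "Category", "All Items"]

# Every combination of one level per dimension is a candidate summary table.
two_dimensions = list(product(calendar, geography))
three_dimensions = list(product(calendar, geography, product_levels))

print(len(two_dimensions), len(three_dimensions))
```

Six calendar levels times five geography levels gives 30; five product levels multiply that to 150.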

Page 27: OLAP Implementation Techniques

ROLAP

Summary tables are more of a maintenance issue than a storage issue in most production implementations.

Notice that summary tables get much smaller as dimensions get less detailed (e.g., year vs. day).

Should plan for double the size of the unsummarized data for ROLAP summaries in most environments.

Every detail record that is received into warehouse must aggregate into EVERY summary table (assuming "to-date" summaries are maintained).

Page 28: OLAP Implementation Techniques

ROLAP

Warning: Do not assume that dimensions are always simple hierarchies.

Example: Items are not just category, subcategory, SKU, and atomic item.... what about trade styles or manufacturer?

Now we need summary tables along these lines as well...another 120 summary tables!

Calendar vs. accounting period vs. billing cycle can be even worse...

Page 29: OLAP Implementation Techniques

ROLAP

Many ROLAP products have devised ways to reduce the number of summary tables:

Ability to build summaries on-the-fly as demanded by end-user applications.

Ability to aggregate efficiently from subset of the summary tables.

Tools exist in some products to assist DBAs in selecting the “best” aggregations to build.

HOLAP (Hybrid OLAP) tools allow co-existence of pre-built cubes alongside relational OLAP structures.
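
Aggregating from a subset of the summary tables can be sketched as rolling a stored, finer summary up to a coarser level on demand. The mappings and amounts below are hypothetical, and the approach only applies to additive measures such as sums.

```python
from collections import defaultdict

# Hypothetical stored summary at (week, store) granularity.
week_store = {
    ("W01", "S1"): 100.0,
    ("W01", "S2"): 50.0,
    ("W02", "S1"): 70.0,
}

# Hypothetical hierarchy mappings.
week_to_quarter = {"W01": "Q1", "W02": "Q1"}
store_to_region = {"S1": "East", "S2": "West"}


def derive(summary, time_map, geography_map):
    """Roll a finer summary up to coarser levels instead of storing a separate table."""
    coarser = defaultdict(float)
    for (time_key, geography_key), amount in summary.items():
        coarser[(time_map[time_key], geography_map[geography_key])] += amount
    return dict(coarser)


quarter_region = derive(week_store, week_to_quarter, store_to_region)
```

The quarter-by-region answer comes from three summary rows rather than from the detail data, so that summary table need not be pre-built.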

Page 30: OLAP Implementation Techniques

Intelligent Aggregation Selection

Maximum performance boost implies lots of disk for every pre-calculation.

Minimum performance boost implies no disk with zero pre-calculation.

Strategy is to use meta data to heuristically determine optimum set of aggregates from which all other aggregates can be derived.

Page 31: OLAP Implementation Techniques

Aggregate Wizards

Page 32: OLAP Implementation Techniques

Fact Table Aggregates

Enhance performance on common queries at coarser granularities.

Save space to permit storing more history than possible with finer granularities.

Take advantage of need to store other facts (with similar samples) at a particular granularity.

Page 33: OLAP Implementation Techniques

Aggregate Advice

Coarser granularity decreases potential cardinality, but usually increases density (e.g., daily summary table is typically twice the size of weekly summary table - not seven times).

Strongly consider omitting candidate aggregates where expected cardinality is more than 10% that of next finer granularity stored.

Keep the detail for drill down, even if you deploy aggregates for performance.
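
One possible reading of the 10% guideline is sketched below: walk the candidate levels from finest to coarsest and keep a candidate only if it is at most 10% the size of the finer level actually stored. The row counts and the `select_aggregates` helper are hypothetical.

```python
def select_aggregates(levels, threshold=0.10):
    """levels: (name, expected_rows) pairs ordered from finest to coarsest.

    The first entry is the detail level and is always kept for drill down.
    """
    kept = [levels[0]]
    for name, rows in levels[1:]:
        # Compare against the next finer level that is actually stored.
        if rows <= threshold * kept[-1][1]:
            kept.append((name, rows))
    return [name for name, _ in kept]


# Hypothetical expected cardinalities.
candidates = [
    ("day", 1_000_000),
    ("week", 500_000),
    ("quarter", 80_000),
    ("year", 30_000),
]
chosen = select_aggregates(candidates)
```

With these assumed sizes the weekly table is skipped because it is half the size of the daily table, the quarterly table is kept, and the yearly table is skipped because it is more than 10% the size of the quarterly one.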

Page 34: OLAP Implementation Techniques

Bottom Line

There are many implementation techniques for delivery of an OLAP environment.

Must fully consider the performance, scalability, complexity, and flexibility characteristics when deciding between MOLAP, ROLAP, and HOLAP.

Understand your tools and RDBMS!

