Transbase® Hypercube: A leading-edge ROLAP
Engine supporting multidimensional Indexing and Hierarchy Clustering
Roland Pieringer
Transaction Software GmbH
Thomas-Dehler-Str. 18
81737 München, Germany
www.transaction.de
Transbase® Hypercube, ©2003 Transaction Software GmbH, www.transaction.de
Feb 2003
BTW 2003
Motivation
- Many applications have multidimensional (MD) data
- Multidimensional indexes support retrieval of MD data
- Application field: data warehouses
  - Hierarchically organized dimensions (e.g., year – month – day)
  - Large data volumes
  - Relatively static
  - Mainly retrieval query profile
- MD indexes usually support numeric MD data
  - Encoding for hierarchical data necessary
  - Multidimensional Hierarchical Clustering (MHC)
Theoretical comparison of range query performance
[Figure: range query performance curves for the ideal case, a multidimensional index, multiple B-Trees or bitmap indexes, and a compound primary B-Tree]
UB-Tree: basic concepts
- Combination of B+-Tree and Z-curve
- Z-curve is used to map multidimensional points to one-dimensional values (Z-values)
- Z-values are used as keys in B*-Tree
- Z-curve preserves spatial proximity (symmetric clustering)
[Figure: UB-Tree consisting of an index part and a data part]
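The mapping from a multidimensional point to a Z-value can be sketched as bit interleaving. This is a minimal illustration, not Transbase code: the function name `z_value`, the fixed bit width per dimension, and the rule that the first dimension supplies the most significant bit are assumptions.

```python
def z_value(coords, bits_per_dim):
    """Interleave the bits of all coordinates into one Z-value.

    Assumption: every coordinate is a non-negative integer that fits into
    bits_per_dim bits, and the first coordinate supplies the most
    significant bit of each interleaved group.
    """
    z = 0
    for bit in range(bits_per_dim - 1, -1, -1):  # from high bit to low bit
        for c in coords:
            z = (z << 1) | ((c >> bit) & 1)
    return z


# Order the points of a 4 x 4 grid by Z-value. The sorted sequence is the
# order in which a B-Tree keyed on Z-values would store the points.
grid = [(x, y) for x in range(4) for y in range(4)]
z_order = sorted(grid, key=lambda p: z_value(p, 2))
```

Because all dimensions contribute bits in turn, no single dimension dominates the sort order, which is what the slide calls symmetric clustering.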
Visualized range-queries
[Figure: range queries over a region hierarchy (Germany; Sachsen, Bayern; Freiberg, Leipzig, Dresden, Burgh, München, Passau) and a time axis (Feb 2003, Mar 2003, Jun 2003)]
MHC: Non-clustered hierarchy
[Figure: product dimension hierarchy without clustering]
Hierarchy levels (top to bottom): ALL, Sector, Category, ProductGroup, Item
Sector: Brown Goods, White Goods, ...
Category: Video, Audio, ...
ProductGroup: Camcorder, VCR
Item: TR-780, TRV-30, GR-AX 200, GV-500, SLV-E800
ID: 2, 11, 5, 8, 21
MHC: Clustered hierarchy
[Figure: product dimension hierarchy with clustering surrogates]
Hierarchy levels (top to bottom): ALL, Sector, Category, ProductGroup, Item
Sector: Brown Goods, White Goods, ...
Category: Video, Audio, ...
ProductGroup: Camcorder, VCR
Item: TR-780, TRV-30, GR-AX 200, GV-500, SLV-E800
ID: 2, 11, 5, 8, 21
Bit codes shown at the upper levels: 0, 1; 00, 01; 0, 1
Bit codes shown at the item level: 0000, 0001, 0000, 0001, 0010
Surrogate (binary): 00100000, 00100001, 00110000, 00110001, 00110010
Surrogate (decimal): 32, 33, 48, 49, 50
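The surrogate values in the figure are consistent with concatenating one fixed-width bit field per hierarchy level. The sketch below reproduces them under that reading. The field widths (1, 2, 1, 4), the function name `encode_surrogate`, and the ordinal paths are assumptions inferred from the bit strings shown, not a description of the Transbase implementation.

```python
def encode_surrogate(path, widths):
    """Concatenate per-level ordinals into one integer surrogate.

    path   -- ordinal of the member at each level, top level first
    widths -- number of bits reserved for each level (assumed fixed)
    """
    if len(path) != len(widths):
        raise ValueError("path and widths must have the same length")
    cs = 0
    for ordinal, width in zip(path, widths):
        if not 0 <= ordinal < (1 << width):
            raise ValueError("ordinal does not fit into its bit field")
        cs = (cs << width) | ordinal
    return cs


# Assumed field widths: sector 1 bit, category 2 bits,
# product group 1 bit, item 4 bits (8 bits in total).
WIDTHS = (1, 2, 1, 4)

first = encode_surrogate((0, 1, 0, 0), WIDTHS)   # bit string 0|01|0|0000
last = encode_surrogate((0, 1, 1, 2), WIDTHS)    # bit string 0|01|1|0010
first_bits = format(first, "08b")
```

Items that share a hierarchy prefix receive numerically adjacent surrogates, which is the property the clustering relies on.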
Basic technology of MHC
- MHC: Multidimensional Hierarchical Clustering
- MHC necessary because
  - Hierarchical organization of dimensions in warehouses
  - No intervals for hierarchical restrictions
  - Naive restrictions lead to many point queries instead of one interval on UB-Tree
- Artificial encoding of hierarchies:
  - Mapping of hierarchy restrictions to range restrictions
  - Mapping is used for physical clustering of the fact table
  - Modification of query algorithms necessary
  - Fast computation and space efficient
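The mapping of a hierarchy restriction to a range restriction can be sketched as follows: a restriction on the upper levels fixes the leading bits of the surrogate, and all remaining bits are free. The helper `prefix_to_interval` and the field widths (1, 2, 1, 4) are hypothetical, with the widths chosen to match the bit strings of the clustered-hierarchy figure.

```python
def prefix_to_interval(prefix, widths):
    """Return the closed surrogate interval [lo, hi] covered by a prefix.

    prefix -- ordinals of the restricted upper levels, top level first
    widths -- bits reserved per level (assumed fixed-width encoding)
    """
    if len(prefix) > len(widths):
        raise ValueError("prefix is longer than the hierarchy")
    value = 0
    for ordinal, width in zip(prefix, widths):
        value = (value << width) | ordinal
    free_bits = sum(widths[len(prefix):])   # bits of the unrestricted levels
    lo = value << free_bits                 # free bits all 0
    hi = lo | ((1 << free_bits) - 1)        # free bits all 1
    return lo, hi


WIDTHS = (1, 2, 1, 4)                              # assumed widths
category_range = prefix_to_interval((0, 1), WIDTHS)     # one category
group_range = prefix_to_interval((0, 1, 1), WIDTHS)     # one product group
```

A restriction on one category becomes a single interval instead of one point query per item, which is the motivation stated on the slide.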
Implementation of MHC
- Implementation into Transbase® DBMS kernel
  - Computation and maintenance of MHC encoding
  - Integration into DDL and DML
  - Integration into optimizer
  - Integration into archiving tools
- Transparency to users
  - Physical optimization
  - No extension of the DML
Supported schemata
- Support of star schema and snowflake schema
- Star schemata
  - Conventional complete de-normalization of the dimension tables
  - Foreign key relationships between fact table and dimension tables
- Supported snowflake schemata
  - Inner dimension tables de-normalized with hierarchy attributes
  - Feature attributes can be normalized
  - Fully supported by optimizer
  - More efficient than star schemata (knowledge about hierarchical dependency)
Transbase® DDL extension
Dimension table:

CREATE TABLE dim_segment (
    country_id      INTEGER NOT NULL,
    country_txt     CHAR(*),
    region_id       INTEGER NOT NULL,
    region_txt      CHAR(*),
    micromarket_id  INTEGER NOT NULL,
    micromarket_txt CHAR(*),
    outlet_id       INTEGER NOT NULL,
    outlet_txt      CHAR(*),
    SURROGATE cs_segment COMPOUND (
        country_id     SIBLINGS 16,
        region_id      SIBLINGS 19,
        micromarket_id SIBLINGS 6,
        outlet_id      SIBLINGS 2202),
    PRIMARY KEY (outlet_id)
)
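The slides do not say how the SIBLINGS values are turned into a bit layout. As a rough sketch, assume that SIBLINGS n is an upper bound on the number of children per parent at that level and that each level receives a fixed-width field of ceil(log2(n)) bits. Under that assumption the size of the compound surrogate for dim_segment can be estimated as follows; the function name and the dictionary are illustrative only.

```python
def bits_for_siblings(max_siblings):
    """Bits needed to number max_siblings children as 0 .. max_siblings - 1.

    Assumption: fixed-width fields, one per hierarchy level.
    """
    if max_siblings < 1:
        raise ValueError("a level needs at least one member")
    return (max_siblings - 1).bit_length()


# SIBLINGS values copied from the dim_segment definition.
SIBLINGS = {
    "country_id": 16,
    "region_id": 19,
    "micromarket_id": 6,
    "outlet_id": 2202,
}

bits = {level: bits_for_siblings(n) for level, n in SIBLINGS.items()}
total_bits = sum(bits.values())
```

With these assumptions the whole segment hierarchy path fits into a single small integer, which is in line with the "space efficient" goal stated earlier.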
Transbase® DDL extension (cont.)
Fact table:

CREATE TABLE fact (
    dseg     INTEGER REFERENCES dim_segment(outlet_id) ON UPDATE CASCADE,
    dprod    INTEGER REFERENCES dim_product(item_id) ON UPDATE CASCADE,
    dtime    INTEGER REFERENCES dim_time(day_id) ON UPDATE CASCADE,
    turnover NUMERIC(10,2),
    …
    SURROGATE cs_seg FOR dseg,
    SURROGATE cs_prod FOR dprod,
    SURROGATE cs_time FOR dtime,
    PRIMARY HCKEY (cs_seg, cs_prod, cs_time)
)
DML
- No change of DML statements (SELECT, INSERT, UPDATE, DELETE)
- Conventional star (snowflake) joins (SQL-92 compliant):

SELECT country, department, category, group, year, quarter, month,
       SUM(price), SUM(turnover)
FROM customer c, product p, date d, fact f
WHERE f.custkey = c.customer
  AND f.prodkey = p.item_key
  AND f.datekey = d.day
  AND c.country = 'GERMANY'
  AND c.department = 'SOUTH'
  AND p.category = 'TV'
  AND d.month = '10/2002'
  AND d.year = '2002'
GROUP BY country, department, category, group, year, quarter, month
Conventional query processing
Standard method (non-clustering indexes):
- Index evaluation of dimension restrictions
- Fact table tuple materialization
- Residual join with dimension tables
- Grouping and aggregating
- Sorting
MHC query processing: Overview
- Abstract execution plan (AEP): better understanding, implementation in operator trees
- Three phases:
  - Interval generation (semi-join)
  - Fact table access
  - Grouping and residual join
- Optimization: hierarchical pre-grouping
  - Minimize residual join operations by grouping before joining
AEP - overview
[Figure: abstract execution plan. Interval Generation phase: Create Range operators over the dimension tables Di, Dj. Main Execution Phase: Fact Table Access on Fact, Residual Join with the dimension tables Dk ... Di, Group Select, Having, Order By]
Interval generation
- Mapping of hierarchical restrictions into a number of intervals
- Usage of special hierarchy indexes:
  - DXh index: (ht, ht-1, ..., h1, cs)
  - Efficient interval computation
- Optimization for feature restrictions:
  - Merging many small intervals into fewer, larger intervals
  - Usage of hierarchical dependency for feature attributes, if supported by the schema (snowflake schemata)
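The merging step for feature restrictions can be pictured with a small sketch. It assumes closed integer intervals on surrogate values; the function name `merge_intervals` and the `max_gap` parameter are hypothetical. The slides do not state whether intervals separated by small gaps are also merged; if they are, the tuples from the gaps would have to be removed by postfiltering.

```python
def merge_intervals(intervals, max_gap=0):
    """Merge closed integer intervals that overlap, touch, or lie close.

    max_gap -- largest number of uncovered values between two intervals
               that is still bridged (0 merges only adjacent or
               overlapping intervals). This parameter is an assumption.
    """
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo - merged[-1][1] - 1 <= max_gap:
            prev_lo, prev_hi = merged[-1]
            merged[-1] = (prev_lo, max(prev_hi, hi))
        else:
            merged.append((lo, hi))
    return merged


small = [(48, 50), (32, 33), (34, 40)]
exact = merge_intervals(small)               # only adjacent intervals merged
coarse = merge_intervals(small, max_gap=7)   # the gap 41..47 is bridged
```

Fewer intervals mean fewer range queries against the fact table, at the price of reading some tuples that are filtered out afterwards when gaps are bridged.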
Fact table access
- Combination of intervals of all clustering dimensions forms multidimensional query boxes QBi
- Fact table access with implicit tuple materialization
  - Sequential processing of query boxes
  - Fast retrieval of result tuples
  - Postfiltering can be necessary depending on the UB-Tree dimensions and restrictions
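How the per-dimension intervals combine into query boxes can be sketched as a Cartesian product. The dimension names, the interval values, and the helpers `query_boxes` and `in_box` are illustrative assumptions; a real UB-Tree range query would walk the Z-curve rather than test points one by one.

```python
from itertools import product


def query_boxes(intervals_per_dim):
    """Build one query box per combination of per-dimension intervals.

    intervals_per_dim -- dict mapping a dimension name to a list of
                         closed (lo, hi) surrogate intervals
    """
    dims = list(intervals_per_dim)
    combos = product(*(intervals_per_dim[d] for d in dims))
    return [dict(zip(dims, combo)) for combo in combos]


def in_box(point, box):
    """True if the point lies inside the box in every dimension."""
    return all(lo <= point[d] <= hi for d, (lo, hi) in box.items())


boxes = query_boxes({
    "cs_seg": [(0, 3), (8, 11)],   # two intervals on the segment dimension
    "cs_time": [(5, 6)],           # one interval on the time dimension
})
```

The number of boxes is the product of the interval counts per dimension, which is one reason why merging intervals beforehand pays off.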
Standard AEP
[Figure: standard AEP operator tree; operators shown: Fact Table Access on Fact, Predicate Evaluation, Residual Join with the dimension tables Dk ... Di, Group Select, Having, Order By]
Optimization: Hierarchical pre-grouping
- Basic concept
  - Hierarchy encoding stored in fact table (compound surrogates)
  - Groups of hierarchical GROUP BY attributes built from compound surrogates
  - Grouping not exact for non-prefix path grouping
  - Drastic reduction of fact table result tuples
- Example (for hierarchy year – month – day):
  - Number of fact table result tuples: 100,000
  - Pre-grouping (on month): ca. 3,000 (aggregated) tuples
  - Residual join with 3,000 instead of 100,000 tuples: reduction by a factor of about 30
- Post-grouping may be necessary if the pre-grouping is too fine
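Pre-grouping on a surrogate prefix can be sketched as follows. The example assumes a time surrogate whose lowest 5 bits hold the day, so that shifting them away leaves the year and month prefix; the bit width, the row layout, and the function name `pre_group` are assumptions made for illustration.

```python
from collections import defaultdict


def pre_group(fact_rows, low_bits):
    """Aggregate a measure per surrogate prefix before any join.

    fact_rows -- iterable of (surrogate, measure) pairs
    low_bits  -- number of low-order bits to drop (the finer levels)
    """
    totals = defaultdict(int)
    for surrogate, measure in fact_rows:
        totals[surrogate >> low_bits] += measure
    return dict(totals)


DAY_BITS = 5  # assumed width of the day field

rows = [
    ((5 << DAY_BITS) | 1, 10),   # month prefix 5, day 1
    ((5 << DAY_BITS) | 2, 20),   # month prefix 5, day 2
    ((6 << DAY_BITS) | 1, 5),    # month prefix 6, day 1
]
by_month = pre_group(rows, DAY_BITS)
```

Only one aggregated row per prefix has to be joined with the dimension table afterwards, which is where the reduction in residual join work comes from.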
Hierarchical pre-grouping (cont.)
[Figure: AEP with hierarchical pre-grouping; operators shown: Fact Table Access on Fact, Predicate Evaluation, Pre-Group, Residual Join with De1 ... Dei, Post-Group, Residual Join with Dl1 ... Dln, Having, Order By]
Performance comparison
- Data:
  - Real-world data warehouse of an electronics retailer in Greece
  - 5 dimensions, 49 measures on fact table
  - 3 years of transactions, i.e., 8.5 million fact table tuples (2.8 GB)
- Environment
  - 2-processor Pentium II (400 MHz), 768 MB RAM, Windows 2000
- Queries
  - 22 query classes with 1,320 real-world user queries
- Comparisons
  - MHC versus no multidimensional clustering
  - Conventional grouping versus hierarchical pre-grouping
Perf. comp: MHC – no clustering
Time of fact tuple access in seconds

FT Sel. %    [0.0-0.1]      [0.1-1.0]      [1.0-5.0]
             STAR   AEP     STAR   AEP     STAR   AEP
MIN             0     0       65     2      274    11
MAX            30     6      290     9     1219    47
MEDIAN          1     1      182     8      477    23
STD-DEV         5     1       76     3      346    14
Perf. comp: no pre-grouping – pre-grouping
Comparison of grouping cardinality: no pre-grouping / hierarchical pre-grouping

FT Sel. %      All          [0.0 - 0.25]   [0.25 - 1.0]   [1.0 - 10.0]
MIN            3.6          3.6            21.3           46.0
1. Quartile    245.8        135.1          911.3          816.2
MEDIAN         1,139.5      531.6          2,270.4        5,938.9
3. Quartile    4,708.0      1,905.6        9,747.5        25,409.6
MAX            593,280.0    19,340.0       78,384.0       593,280.0
Perf. comp: no pre-grouping – pre-grouping
Speedup of the time of hierarchical pre-grouping

FT Sel. %      All     [0.0 - 0.25]   [0.25 - 1.0]   [1.0 - 10.0]
MIN            0.3     0.3            0.8            0.6
1. Quartile    3.0     2.4            3.9            4.6
MEDIAN         4.4     3.6            5.8            6.6
3. Quartile    6.5     5.2            7.2            7.8
MAX            25.5    14.3           25.5           12.6
Summary
- MHC: Multidimensional hierarchical clustering
  - Encoding for hierarchy paths, in order to support clustering multidimensional indexes
  - Support of star and snowflake schemata
- Full implementation into Transbase®
  - Integration into the query processor (maintenance of compound surrogates)
  - Integration into the optimizer (interval generation, fact table access, hierarchical pre-grouping)
- Significant speedup of performance:
  - Clustering vs. non-clustering organization: factor 2-20
  - Conventional grouping vs. hierarchical pre-grouping: factor 4-7
Questions?
Everything clear?
Otherwise contact:
Roland Pieringer
Tel: 089/62709-0
Transaction Software GmbH
www.transaction.de