G. Papastefanatos1, P. Vassiliadis2, A. Simitsis3, Y. Vassiliou1
(1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,yv}@dbnet.ece.ntua.gr
(2) University of Ioannina, Ioannina, Hellas (Greece) [email protected]
(3) HP Labs, Palo Alto, California, USA [email protected]
Design Metrics for Data Warehouse Evolution
ER'08, Barcelona, October 2008 2
Outline
• Motivation• Graph-based modeling & DW Evolution• Metrics for data warehouse evolution• Evaluation• Conclusions
ER'08, Barcelona, October 2008 3
Outline
• Motivation• Graph-based modeling & DW Evolution• Metrics for data warehouse evolution• Evaluation• Conclusions
ER'08, Barcelona, October 2008 4
Motivation
WWW Act1
Act2
Act3
Act4
Act5
Data warehouses are evolving environments, e.g.:–A dimension is removed or renamed–The structure of a dimension table is updated–A fact table is completely decoupled from a dimension–The measures of a fact table change–An ETL source is modified, etc.
ER'08, Barcelona, October 2008 5
Evolution Effects
• SW and data artifacts around the warehouse (e.g., ETL activities, materialized views, reports) are affected:– Syntactically – i.e., become invalid– Semantically – i.e., must conform to the new source database
semantics• Adaptation to new semantics
– time-consuming task– treated in most of the cases manually by the
administrators/developers• Evolution-driven design is missing
ER'08, Barcelona, October 2008 6
We would like to know…
• Can we measure and quantify in a principled way the vulnerability of certain parts of a data warehouse environment and find these constructs that are most sensitive to evolution?
• Can we predict and quantify the impact of a change towards the rest system?
• What are the “right” measures for evaluating the quality of the design of a data warehouse, with respect to its evolution capabilities?
ER'08, Barcelona, October 2008 7
Outline
• Motivation• Graph-based modeling & DW Evolution• Metrics for data warehouse evolution• Evaluation• Conclusions
ER'08, Barcelona, October 2008 8
Data Warehouse Schema Evolution Our approach
Mechanism for performing what-if analysis for potential changes of database configurations
Graph based representation of database constructs (i.e., relations, views, constraints, queries)
Annotation of graph with rules for adapting queries to database schema evolution
Evolving databases
QueriesDatabase Schema
Graph-based modeling
for uniform representation
Metrics for Evaluating Evolution
Design
Evolving applications
Rules for Handling Evolution
ER'08, Barcelona, October 2008 9
Graph based representation
map-select
map-select
S
Sgroup by
from
=whereop2
op1
GB group by
W.EMP#.FK
op
op
SS
S SS
S
EMP.PK
op
V
WORKS
EMP
SalNameEmp#
Emp# Hours Proj#
HOURS
Emp#
Module
Module
Module
Module
Qfrom
map-select map-selectSUMT_HOURS
op2
op1>=
50K
AND
wherewhere
S
Emp#
S
from
map-select
SELECT Emp#, SUM(Hours) as T_HOURSFROM VGROUP BY Emp#
CREATE VIEW V AS SELECT Emp#, HoursFROM EMP E, WORKS WWHERE E.Emp# = W.Emp#AND E.Sal >= 50K
WORKS (Emp#, Proj#,Hours)
EMP(Emp#, Name, Sal)
ER'08, Barcelona, October 2008 10
Graph Annotation with rules
We annotate
For reacting toW
ith rule
Set of graph elements· Query Node: Q1· Attribute Node: EMP.E_TITLE· View Node: Emps_Prjs, etc.
Set of rules· Propagate· Block· Prompt
Set of evolution events· Add Attribute· Delete Attribute· Rename View, etc.
1
3
2
ER'08, Barcelona, October 2008 11
Graph Adaptation
Annotated Query GraphEvent
Add attribute Phone to relation EMP
Transformed Query Graph
Q
Name
EID
Name
EID
S
S
EMPS
S
map-select
map-select …
ON attribute addition TO EMP THEN propagate
Q: SELECT EID, Name FROM EMP
Q: SELECT EID, Name, Phone FROM EMP
Q
Name
EID
Name
EID
S
S
EMPS
S
map-select
map-select …
ON attribute addition TO EMP THEN propagate
Phone
S
Phone
S
map-select
ER'08, Barcelona, October 2008 12
Outline
• Motivation• Graph-based modeling & DW Evolution• Metrics for data warehouse evolution• Evaluation• Conclusions
ER'08, Barcelona, October 2008 13
Simple Metrics
Simple: in-degree, out-degree, degreeEMP.Emp# is more
“important” than EMP.SAL, w.r.t. how many nodes depend directly on it
map-select
map-select
S
S
from
=whereop2
op1
W.EMP#.FK
op
op
SS
S SS
S
EMP.PK
op
V
WORKS
EMP
SalNameEmp#
Emp# Hours Proj#
HOURS
Emp#
Module
Module
Module
map-select
op2
op1>=
50K
AND
wherewhere
from
ER'08, Barcelona, October 2008 14
Transitive Metrics
Transitive: in-degree, out-degree, degree
Variant with a view + query is more “complicated” wrt how many nodes are involved in the propagation of EMP.Emp# towards the end
map-select
map-select
S
Sgroup by
from
=whereop2
op1
GB group by
W.EMP#.FK
op
op
SS
S SS
S
EMP.PK
op
V
WORKS
EMP
SalNameEmp#
Emp# Hours Proj#
HOURS
Emp#
Module
Module
Module
Module
Qfrom
map-select map-selectSUMT_HOURS
op2
op1>=
50K
AND
wherewhere
S
Emp#
S
from
map-select
ER'08, Barcelona, October 2008 15
Zoomed-out degrees
41V
WORKS
EMP
Q
3
3
• Only top-level nodes are retained• Only one edge between modules is retained weighted
with the number of edges suppressed
Simple degreesTransitive degrees
ER'08, Barcelona, October 2008 16
Entropy-based metrics
P(v|yk) =
Vy
i
k
i
yvpaths
yvpaths
),(
),(, for all nodes yi V.
V
WORKS
EMP
Q
Probability that a node v is affected by an event occurring on another node yi :
ExamplesP(Q|V) = 1/3, P(Q|EMP) = 1/3, P(V|WORKS) = 1/2
Entropy-based metrics - continued
Entropy of a node v: The “sensitivity” that a node v is affected by a random event on the graph.
ER'08, Barcelona, October 2008 17
Vy
ii
i
yvPyvPvH )|(log)|( 2 , for all nodes yi V.
ER'08, Barcelona, October 2008 18
Outline
• Motivation• Graph-based modeling & DW Evolution• Metrics for data warehouse evolution• Evaluation• Conclusions
Testbed Configuration
ER'08, Barcelona, October 2008 19
TPC-DS benchmark: Web Sales schema with 3 variants–Original (1 fact – 13 dimensions)–Surrounded with views–Customer dimensions merged
ER'08, Barcelona, October 2008 20
Distribution of Evolution Events
Operation Distribution 1 Distribution 2 Rename Measure 29% (15) 0% (0)Add Measure 25% (13) 0% (0)Rename Dimension Attribute 21% (11) 0% (0)Add Dimension Attribute 15% (8) 37% (25)Delete Measure 6% (3) 0% (0)Delete Dimension Attribute 4% (2) 44% (30)Delete FKs 0% 13% (9)Delete Dimension Table 0% 6% (4)
Distr 1: Recorded from the Greek Public sector
Distr 2: Migration to a pure star schema
ER'08, Barcelona, October 2008 21
Evaluating effectiveness
• Effectiveness – how well our metrics can “forecast” the impact of
events over the different constructs of the schema• Configuration
– we used mainly the Distr. 1 of events (real data) – we tested nine configurations based on
• variations of the schema– Web Sales (WS), Web Sales extended with views (WS-views),
star variant of Web Sales (WS-star)• variations of the policy
– Block-All, Propagate-All, Mixture
ER'08, Barcelona, October 2008 22
Events affecting dimensions
(a) WS schema (b) WS-star schema
ER'08, Barcelona, October 2008 23
Total no. events on Views (schema=WS-views, policy=block)
0
10
20
30
40
50
60
70
80
90
100
ALL_SALES
PROMOTIO
NAL_SALE
CUSTOMER_C
UST_ADDRESS
DAYS30
DAYS30_6
0
DAYS60_9
0
DAYS90_1
20
DAYS120
WEBSA
LES_IT
EM_D
ATEDIM AT PT
NumberAffected
StrengthOut
StrengthTotal
WS-views schema
Events affecting views
ER'08, Barcelona, October 2008 24
Events affecting queries
(a) WS schema (b) WS-star schema
ER'08, Barcelona, October 2008 25
Comparison of design configurations
(a) only affected queries (b) all affected nodes
for Distr. 1
ER'08, Barcelona, October 2008 26
Comparison of design configurations
(a) only affected queries (b) all affected nodes
for Distr. 2
ER'08, Barcelona, October 2008 27
Outline
• Motivation• Graph-based modeling & DW Evolution• Metrics for data warehouse evolution• Evaluation• Conclusions
Conclusions
• A framework for handling the impact of changes in a DW environment
• A set of metrics for DW evolution– simple– transitive– entropy-based
• An extensive experimental evaluation based on both, real and synthetic dataset
• Platform: Hecataeus– A tool for visualizing and performing what-if analysis
for evolution scenariosER'08, Barcelona, October 2008 28
ER'08, Barcelona, October 2008 29
Gracias!
Hecataeus: A tool for visualizing and performing what-if analysis
for evolution scenarios
http://www.cs.uoi.gr/~pvassil/projects/hecataeus/index.html
ER'08, Barcelona, October 2008 30
http://www.cs.uoi.gr/~pvassil/projects/architecture_graph/
Questions?
ER'08, Barcelona, October 2008 31
Gracias!
Sources: http://en.wikipedia.org/wiki/Image:Barcelona_-_planol_ciutat_vella_1860.jpg
http://maps.google.com