Date post: | 22-Dec-2015 |
Category: |
Documents |
View: | 220 times |
Download: | 1 times |
1
9
Data WarehouseCSC5301
Hachim Haddouti
2
9
About Me Hachim Haddouti, born in 1969, married, one baby 9 weeks Ph.D. in Computer Science (Database Management Systems) at Technical
University of Munich under Supervision of Prof. Bayer (Inventor of B-Tree) Master in Computer Science (Knowledge Management Systems) at
Techical University of Berlin Project Manager at BMW Munich Germany Senior Consultant and Project Manager at DaimlerChrysler Services (now
called T-Systems, Deutsche Telekom) Research Scientist with Prof. R. Bayer in Technical University of Munich UNESCO Consultant Visiting Scientist at Tsukuba University, Japan, University of Sta. Barbara
University, California; University of Catania, Italy; Beijing Univ China …
Area of Interest: DBMS, Digital Libraries, Document & Content & Knowledge Management, XML databases and Web technologies, Multilinguality etc.
More at www.haddouti.de
3
9
Do You Remember?
OLTP
DSSMD
MD
drill downRollUp
Slice/dice
MOLAPROLAP
Star schema
Data mining
Data cube
Data extractionFact table
4
9
Why DW?
Mining of mobile phone calls:
(Caller, Callee, Time, Duration, Geogr. Location) ~ 100 B/tuple
In Germany107 users * 10 calls/(day*user) * 100 B/call =
= 1010 B/day ~ 3*1012 B/year = 3 TB/year
Scanning data at 107 B/s takes
3*1012/107 = 3*105 s > 3 days
5
9Data Warehouses
“Subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management’s decision-making process” Inmon (AP = analytical processing is missing)
Used for analysis of existing data
Resolves performance issues suffered by operational RDBMSs and OLTPs
6
9Data Warehouse Architecture
Figure 9.7
7
9
Model
• need abstract model with above operations
• suitable datastructures
• very large databases
Relational Model?
• one-dimensional access via primary key
• n*m „relationships“ are 2-dimensional: (FK1, FK2)
8
9
The Multidimensional Data Model
Requirements: must support typical analyses, queries like
Sales of a product group digital cameras in Nov, Dec Jan Feb in Munich area
sorted by sales of each product in € sorted by sales in numbers sorted by shops
9
9
Data model
ER Model a disaster for querying a
huge amount of data (time) not understandable for users
and they can not be navigated usefully by DBMS software.
hard to visualize; many possible connections between tables,
To avoid redundancy
MD Model better performance Better data
organisation Better visualization Business queries
(why, what if)
10
9
Typical DWH Analyses/Queries
What are the consequences of new orders for production capacity w.r. to investment, personnel, maintenance, extra hours, ...
Seasonal adaptions, e.g. when to produce how many skis, bikinis, convertibles, ...
Influence of external financing on profits
11
9
Operations:
• aggregation
• slice
• dice (cube)
• rollup to coarser level
• drill down to more detailed level
• grouping
• sorting
12
9
Data Cube Representation
13
9
Slicing on Time Dimension
14
9
Dicing on Part Dimension
15
9
Steps to build a DWH
Acquisition of data Data cleansing Storage Processing: AP Maintenance, ...
Not possible with classical DB-technology alone
16
9
On-Line Analytical Processing OLTP (online transaction processing) for operational
data of enterprise, e.g. in relational DBMS, IMS, SAP/R3, ...
DSS: Decision Support System to store data/information for strategic management decisions: aggregations, summaries, etc.
Optimized to work with data warehouses
Used to answer questions
Allows users to perceive data as a multidimensional data cube
Data mining
17
9
OLTP versus OLAP
Thematic focus OLTP: many small transactions (microscopic
view of business processes, individual steps at lowest level, single order, delivery)
OLAP: finances in general, personnel in general, ...
OLAP requires integration and unification of many detailed data into big picture
Time orientation Durability: data extracted once, no updates
18
9
Technical Comparison OLTP vs OLAP
OLTP: high rate of updates, several thousand t/s OLAP: read only transactions, very complex, DWH is
loaded at certain time intervals, e.g. after the end of the month, quarter Compute intensive Special systems with new access methods, e.g.
multidimensional data organization and access methods
Special OLAP systems necessary to offload OLTP systems
19
9
ROLAP and MOLAP
Solution 1: ROLAP relational online analytical processing, built on top of relational DBS, additional middleware or client front end (star schema)
Solution 2: MOLAP: multidimensional online analytical processing
new model new data organizations new algorithms new query languages new optimization techniques
20
9Data Warehouse Structure
21
9
Rules for OLAP Systems
Multidimensional conceptual view
Transparency
Accessibility
Consistent reporting performance
Client/server architecture
Generic dimensionality
22
9
Rules for OLAP Systems
Dynamic sparse matrix handling
Multiuser support
Unrestricted, cross-dimensional operations
Intuitive data manipulation
Flexible reporting
Unlimited dimensions and aggregation levels