Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Date post: 07-Jan-2017
Upload: andreas-buckenhofer
Andreas Buckenhofer Data Warehouse (Datenbanken II)
Transcript
Page 1: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Andreas Buckenhofer

Data Warehouse (Datenbanken II)

Page 2: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Daimler TSS GmbH

Overview of the lecture

Data Warehouse / DHBW / Fall 2016 / Page 2

1. Introduction to DWH, DWH Architectures - 20.10.2016

2. Data Modeling, OLAP 1 - 27.10.2016

3. OLAP 2, ETL - 03.11.2016

4. Metadata, DWH Projects, Advanced Topics - 10.11.2016

Page 3: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

What you will learn today

• By the end of this lecture you will be able to

• Understand the necessity for metadata

• Understand the lifecycle of DWH projects

• Describe advanced topics such as Operational BI, DWH appliances, and Cloud BI

Page 4: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Metadata

Page 5: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

What is metadata?

Data about other data

Page 6: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Types of metadata

• Business Metadata

• Definition of business vocabulary and relationships

• Definition of the value range

• Linkage to physical representation

Page 7: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Types of metadata

• Report metadata

• Report definitions

• Data sources

• Column definitions

• Computations

• Logical and physical metadata of data model

• Table structure

• Definition of columns

• Relationships between tables and columns

• Dimension hierarchy

Page 8: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Types of metadata

• ETL metadata

• Job design

• Input-/output tables

• Computations

• Mappings / transformations

• Operational metadata of ETL jobs

• Start time and duration

• Return code
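The operational metadata listed above (start time, duration, return code) can be captured by a thin wrapper around each job. The following is a minimal sketch, not a description of any particular ETL tool: the `JobRun` record, the `run_job` helper, and the job names are hypothetical, and the return codes 0 and 1 are an assumed convention.

```python
import time
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List


@dataclass
class JobRun:
    """Operational metadata for one ETL job execution (hypothetical layout)."""
    job_name: str
    start_time: datetime
    duration_seconds: float
    return_code: int


def run_job(job_name: str, job: Callable[[], None], log: List[JobRun]) -> JobRun:
    """Run `job`, then append its start time, duration, and return code to `log`."""
    start = datetime.now(timezone.utc)
    t0 = time.perf_counter()
    try:
        job()
        return_code = 0      # assumed convention: 0 = success
    except Exception:
        return_code = 1      # assumed convention: 1 = any failure
    run = JobRun(job_name, start, time.perf_counter() - t0, return_code)
    log.append(run)
    return run


def failing_job() -> None:
    raise ValueError("source file missing")


run_log: List[JobRun] = []
ok_run = run_job("load_customers", lambda: None, run_log)
failed_run = run_job("load_orders", failing_job, run_log)
```

In a repository, such records would typically be stored in a table so that questions like "what happened on the last job run?" can be answered by a query.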

Page 9: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

The Areas of Metadata

Page 10: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

The Areas of Metadata Connected

Page 11: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Why a common metadata repository?

• Components of a data warehouse system are interconnected

• BI report user has to know

• the meaning and definitions of the shown measures and „KPIs“ (key performance indicators)

• BI report designer has to know

• the table definitions

• the meaning of the column values

• ETL job designer has to know

• the table definitions

• the exact definition of the measures

• Database administrator has to know

• which tables are used by ETL jobs and reports

Page 12: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Why a common metadata repository?

• Metadata-driven ETL development

• Generate parts of the ETL code

• Increasing interest in Data Vault development projects

• Tools, e.g. MID Innovator, Quipu, AnalytiX DS, Talend, Pentaho, WhereScape, and others

• Common metadata repository ensures consistency across all components

• Many tools involved (DB, ETL, Frontend, …)

• Enables cross component metadata analysis

• Data Lineage

• Impact Analysis

Page 13: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Where does a field of data in this report come from?

• “Data lineage”

• Import & Browse Full BI Report Metadata

• Navigate through report attributes

• Visually navigate through data lineage across tools

• Combines operational & design viewpoint

Page 14: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

What happens if I change this column?

• “Impact Analysis”

• Show complete change impact in graphical or list form

• Includes impact on reports in BI tools

• Visually navigate through impacted objects across tools

• Allows impact analysis on any object type
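Data lineage and impact analysis are the same graph traversal in opposite directions: lineage follows "derived from" edges upstream, impact analysis follows them downstream. The sketch below assumes the repository can export such edges as pairs of object names; the `EDGES` list and all object names in it are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical metadata: each pair (source, target) means
# "target is derived from source".
EDGES: List[Tuple[str, str]] = [
    ("crm.customer.name", "stage.customer.name"),
    ("stage.customer.name", "core.customer.name"),
    ("core.customer.name", "mart.dim_customer.name"),
    ("mart.dim_customer.name", "report.sales_by_customer"),
    ("core.customer.name", "report.customer_list"),
]


def build_index(edges: List[Tuple[str, str]], reverse: bool = False) -> Dict[str, Set[str]]:
    """Adjacency index; reverse=True indexes targets back to their sources."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for source, target in edges:
        if reverse:
            index[target].add(source)
        else:
            index[source].add(target)
    return index


def reachable(start: str, index: Dict[str, Set[str]]) -> Set[str]:
    """Return every object reachable from `start` (iterative depth-first search)."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in index.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def impact(obj: str) -> Set[str]:
    """Downstream objects affected by a change to `obj`."""
    return reachable(obj, build_index(EDGES))


def lineage(obj: str) -> Set[str]:
    """Upstream objects that `obj` is derived from."""
    return reachable(obj, build_index(EDGES, reverse=True))
```

For example, `impact("core.customer.name")` returns the mart column and both reports, while `lineage("report.sales_by_customer")` walks back to the CRM source column. This only works across tools if all of them contribute their edges to one common repository.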

Page 15: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

What does this field mean?

• Show relationships between business terms, data model entities, and technical and report fields

• Requires cross-tool mapping of business terms

• Allows field meaning to be understood

• Allows business term relationships to be understood

Page 16: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

What objects does this user own?

• Shows objects that user manages

• Shows stewardship relationships on business terms

• Shows user group associations

Page 17: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

What happened on the last job run?

• Navigation through complete job details

• Navigation of complete operational metadata

Page 18: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Data Warehousing Projects

Page 19: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Top-Down vs Bottom-Up Approach

[Figure: internal and external data sources feed the Data Warehouse (Backend, Frontend) with its layers: Staging Layer (Input Layer), Core Warehouse Layer (Storage Layer), Reporting Layer (Output Layer, Mart Layer). Labels: Top Down (Inmon), Bottom Up (Kimball).]

Page 20: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Top-Down vs Bottom-Up Approach

• Top-Down (Inmon)

• Design Core Warehouse Layer = integrated data model first

• Design data marts afterwards

• Bottom-Up (Kimball)

• Design data marts first

• Combine the data marts afterwards

• DWH Bus architecture

• Conformed dimensions integrate different data marts / fact tables

Page 21: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Think big, start local

• Both approaches have their downsides

• Top-Down takes enormous initial effort to build the data model for the Core Warehouse Layer

• Bottom-Up is risky because the central / integrated focus is lost

→ Think big, start local

• Small iterations

• A waterfall approach taking 8-12 months or longer often fails or does not deliver in time

• Always think about how to achieve flexible data integration in the Core Warehouse Layer

• Data marts can be dropped and reloaded from the data in the Core Warehouse Layer

• The Core Warehouse Layer cannot be dropped: its data (history) would be lost
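The asymmetry between the two layers can be illustrated with a small in-memory SQLite database: the mart is derived data and can be rebuilt at any time, while the core table is the only place where the history lives. This is a sketch under assumptions; the table names `core_sales` and `mart_sales_by_product` and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Core Warehouse Layer: keeps the full history (hypothetical table)
    CREATE TABLE core_sales (sale_date TEXT, product TEXT, amount REAL);
    INSERT INTO core_sales VALUES
        ('2016-10-01', 'A', 10.0),
        ('2016-10-01', 'B', 5.0),
        ('2016-11-01', 'A', 7.5);
""")


def rebuild_mart(conn: sqlite3.Connection) -> None:
    """Drop the mart table and reload it completely from the core layer."""
    conn.executescript("""
        DROP TABLE IF EXISTS mart_sales_by_product;
        CREATE TABLE mart_sales_by_product AS
            SELECT product, SUM(amount) AS total_amount
            FROM core_sales
            GROUP BY product;
    """)


rebuild_mart(conn)
rebuild_mart(conn)   # safe to repeat: the mart is derived data
rows = conn.execute(
    "SELECT product, total_amount FROM mart_sales_by_product ORDER BY product"
).fetchall()
```

Running `rebuild_mart` twice yields the same mart, whereas dropping `core_sales` would leave nothing to rebuild from.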

Page 22: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Why do DWH projects fail?

Page 23: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Critical success factors for building a data warehouse

• Answer most important questions of participating business units

• Provide high-quality data

• Introduction in time

• Usage of modern technology

• Business orientation

• Easy to use

• Executive sponsor

• Patience – user acceptance evolves over time

• “Quick wins”

Page 24: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

DWH project phases

Page 25: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

1. Project start

• Describe future situations and scenarios

• No technical details

• Develop multiple solutions and discuss their advantages and disadvantages

• Maybe start with a Proof of Concept (PoC)

• Estimate expected amount of data

Page 26: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

2. Analysis/Technical Concept

• Information requirements analysis

• Logical modeling of data / information

• Transform knowledge from interviews into logical data schemas (represented by Multidimensional or Star Schemas)

• Define transformation and unification rules (from data in operative systems to the data warehouse)

• Identify Frontend requirements

• Define dimensions and measures

• Define reports (layout, prompts, output fields, filter, etc)

• Analyze operative data sources

• Very important task to get an understanding of source data, structures of the data, data quality

Page 27: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

2. Analysis/Technical Concept

• Data and Architectural Concept

• Important: Scalability

• Top-down design

• Transform the abstract data model into the world of hardware (e.g. separate servers for DB, ETL, Frontend), software, scalability, response times, etc.

• Ensure that the data warehouse works together with other IT systems

• Tool Selection / Evaluation

• Choose tools: ETL tool, database, Frontend tools

• Requires detailed knowledge of your own tool requirements

• Aspects: performance, availability, and uniformity (interfaces, query languages, etc.)

Page 28: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

3. System Design

• Transition from business view to technical view

• Transform requirements into actual solutions

• Describe how to implement the system

• Create catalog of actual technical and other requirements

Page 29: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Possible DWH Analysis and Design work products

• How to document / identify requirements?

• Must be easy for non-technical users to understand during the Analysis/Technical Concept phase

• Must provide sufficient information for the System Design phase

• The following slides show some example work products that are produced during the Analysis/Technical Concept phase and may be refined during the System Design phase

Page 30: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Possible DWH Analysis and Design work products

Source: Lawrence Corr: Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to Star Schema

BEAM = Business Event Analysis and Modeling

Page 31: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Possible DWH Analysis and Design work products

Source: Lawrence Corr: Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to Star Schema

Page 32: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Possible DWH Analysis and Design work products

Source: Lawrence Corr: Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to Star Schema

Page 33: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Possible DWH Analysis and Design work products

Source: Lawrence Corr: Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to Star Schema

Page 34: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Possible DWH Analysis and Design work products

Source: Lawrence Corr: Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to Star Schema

SK = Surrogate Key

BK = Business Key

CV = Current Value (SCD1)

GD = Granular Dimension

NA = Nonadditive fact

FA = Fully Additive fact

SA = Semiadditive fact

PS = Periodic Snapshot

RP = Role-playing

HV = One Historic value (SCD2)
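The codes CV (SCD1) and HV (SCD2) in the legend above differ in how a changed attribute is stored: SCD1 overwrites the value, SCD2 keeps the old version and adds a new one. A minimal sketch follows, assuming a customer dimension with a `city` attribute; the `CustomerRow` layout, the validity columns, and the sample values are illustrative assumptions rather than part of the BEAM notation.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerRow:
    customer_sk: int            # SK: surrogate key, generated in the warehouse
    customer_bk: str            # BK: business key from the source system
    city: str
    valid_from: date
    valid_to: Optional[date]    # None marks the current version


def apply_scd1(rows: List[CustomerRow], bk: str, new_city: str) -> None:
    """SCD1 (CV): overwrite in place; the old value is lost."""
    for row in rows:
        if row.customer_bk == bk:
            row.city = new_city


def apply_scd2(rows: List[CustomerRow], bk: str, new_city: str,
               change_date: date) -> None:
    """SCD2 (HV): close the current version and append a new one."""
    for row in rows:
        if row.customer_bk == bk and row.valid_to is None:
            if row.city == new_city:
                return                      # nothing changed, keep the version
            row.valid_to = change_date
    next_sk = max((r.customer_sk for r in rows), default=0) + 1
    rows.append(CustomerRow(next_sk, bk, new_city, change_date, None))


scd1_rows = [CustomerRow(1, "C-100", "Ulm", date(2016, 1, 1), None)]
apply_scd1(scd1_rows, "C-100", "Stuttgart")

scd2_rows = [CustomerRow(1, "C-100", "Ulm", date(2016, 1, 1), None)]
apply_scd2(scd2_rows, "C-100", "Stuttgart", date(2016, 11, 10))
```

After the change, `scd1_rows` still holds one row, while `scd2_rows` holds two rows with the same business key but different surrogate keys, which is why facts reference the surrogate key.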

Page 35: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

4. Implementation/Realization

• Data storage

• Install and configure database system

• Create physical data schema for all DWH layers

• Usage of database design tools

Page 36: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

4. Implementation/Realization

• Data Integration, ETL

• Transfer data from company-internal and -external sources into the data warehouse

• Connect data sources

• Eliminate mistakes / inconsistencies in data / possible error origins

• Transform data to a uniform coding

• Aggregate data

• Frontend

• Set up front ends, OLAP tools

• Connect to Data Mart Layer

• Create reports or other visualizations

Page 37: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

5. Test & Rollout

• Authorization concept

• Access control

• Not static

• Enable administration

• Production concept

• Concept for initial load and incremental/delta loads

• Concepts to keep the system running even if the amount of data and the number of users increase exponentially

• Define responsibilities

• Educate users

• Classes for different types of users
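The production concept above distinguishes an initial load from incremental/delta loads. One common way to implement the delta is a high-water mark on a last-modified timestamp; this is only one option (log-based change data capture is another), and the row layout and the `load` helper below are assumptions made for the sketch.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical source rows: (id, last_modified, value), timestamps as ISO strings.
SourceRow = Tuple[int, str, str]


def load(source: List[SourceRow], target: Dict[int, SourceRow],
         watermark: Optional[str]) -> Optional[str]:
    """Copy rows modified after `watermark` into `target`; return the new watermark.

    watermark=None means initial load: every source row is copied.
    """
    new_watermark = watermark
    for row in source:
        row_id, modified, _ = row
        if watermark is None or modified > watermark:
            target[row_id] = row                     # insert or update by id
            if new_watermark is None or modified > new_watermark:
                new_watermark = modified
    return new_watermark


source: List[SourceRow] = [
    (1, "2016-11-01T08:00:00", "a"),
    (2, "2016-11-02T09:30:00", "b"),
]
target: Dict[int, SourceRow] = {}
wm_initial = load(source, target, None)              # initial load

source[0] = (1, "2016-11-03T10:00:00", "a2")         # changed row
source.append((3, "2016-11-03T11:00:00", "c"))       # new row
wm_delta = load(source, target, wm_initial)          # delta load
```

The delta run touches only rows 1 and 3. Note that a watermark of this kind cannot detect deleted source rows, which is one reason the load concept has to be defined explicitly.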

Page 38: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

BICC: BI Center of Excellence

• Organizational teams that coordinate and standardize DWH activities within an (end user) organization

• Define standards and create BI portfolio (e.g. which tools/products to use)

• Create DWH architecture and govern BI activities

• Establish processes for business and IT interaction in DWH application development

• Monitor DWH/BI market for new trends

• Determine skills and experience of Business users

Page 39: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Exercises

• List 3 reasons why common metadata is important in the context of warehousing

• Define 3-5 criteria for the evaluation of an ETL tool

• How does a relational DBMS (like Oracle, DB2, MS SQL Server) meet these requirements?

Page 40: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Exercises

• List 3 reasons why common metadata is important in the context of warehousing

• Components of a data warehouse system are interconnected (high complexity!)

• Metadata driven ETL development

• Common metadata repository ensures consistency across all components

• Enables cross component metadata analysis

Page 41: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Exercises

• Define 5 criteria for the evaluation of an ETL tool

• Supplier profile

• Support

• HW/SW requirements

• Costs

• Usability

• Reliability

• Performance and scalability

• Multi-tenant

• Interfaces

• Scheduling

Page 42: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Exercises

• How does a relational DBMS meet these requirements?

• RDBMS provide many of the functionalities, but additional programming is required

• RDBMS are often used for ETL/ELT by programming with SQL, PL/SQL, SQLT, etc.

ETL Tool | Manual ETL

Informatica, Talend, Oracle ODI, etc. | SQL, PL/SQL, SQLT, etc.

Separate license | No additional license

Workflow, error handling, and restart/recovery functionality included | Workflow, error handling, and restart/recovery functionality must be implemented manually

Impact analysis and where-used (lineage) functionality available | Impact analysis and where-used (lineage) functionality difficult

Faster development, easier maintenance | Slower development, more difficult maintenance

Additional (tool) know-how required | Know-how often available
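To give an idea of what "must be implemented manually" means for workflow, error handling, and restart/recovery, here is a minimal sketch of a restartable step runner. The `run_workflow` helper, the step names, and the in-memory checkpoint set are hypothetical; a real job would persist the checkpoint, for example in a database table.

```python
from typing import Callable, Dict, List, Set, Tuple

Step = Tuple[str, Callable[[], None]]


def run_workflow(steps: List[Step], completed: Set[str]) -> bool:
    """Run steps in order, skipping those already in `completed`.

    Returns True when every step has finished. On the first failure the
    function stops and returns False; calling it again resumes at that step.
    """
    for name, action in steps:
        if name in completed:
            continue
        try:
            action()
        except Exception as exc:
            print(f"step {name!r} failed: {exc}")
            return False
        completed.add(name)      # a real job would persist this checkpoint
    return True


calls: List[str] = []
attempts: Dict[str, int] = {"transform": 0}


def extract() -> None:
    calls.append("extract")


def transform() -> None:
    attempts["transform"] += 1
    if attempts["transform"] == 1:
        raise RuntimeError("temporary failure")   # fails on the first attempt only
    calls.append("transform")


def load_step() -> None:
    calls.append("load")


steps: List[Step] = [("extract", extract), ("transform", transform), ("load", load_step)]
done: Set[str] = set()
first_attempt = run_workflow(steps, done)    # stops at "transform"
second_attempt = run_workflow(steps, done)   # resumes without repeating "extract"
```

Even this small sketch has to decide what counts as a checkpoint and whether a step is safe to repeat, which is the kind of effort the comparison attributes to manual ETL.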

Page 43: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Frontend

Page 44: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Interface to the end user

• Reporting (Standard, ad-hoc)

• OLAP

• Dashboards, Scorecards

• Advanced Analytics / Data Mining / Text Mining

• Search & Discovery

Page 45: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Reporting (Standard, ad-hoc)

• Standard Reports

• Prepared static reports that end users can execute on request

• Can also be executed at the end of an ETL process and, e.g., sent by email to end users

• Normally based on fact tables and their dimensions

• Reports are often lists similar to Excel sheets but can also contain graphics (e.g. line charts)

• Ad-hoc Reports

• End users create their own reports („Self service“)

Page 46: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

OLAP

• ROLAP / MOLAP Client Frontend

• Prepared cubes (multidimensional or relational fact tables)

• User can perform interactive analysis of data

Page 47: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Dashboards, Scorecards

• „Progress reports“

• Provide an overall view of KPIs (Key Performance Indicators)

• Combination of several elements from Reporting and/or OLAP (e.g. line charts) into an overall view (like a „cockpit“)

• Dashboard is more focused on operational goals

• High-level overview what is happening

• Scorecard is more focused on strategic goals

• Plan a strategy and identify why something happens

Page 48: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Advanced Analytics / Data Mining / Text Mining

• See Mr. Bollinger‘s lecture

Page 49: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Search & Discovery

• Not just numerical data

• Analysis of new data types gets more and more important

• Text

• GPS coordinates

• Pictures

• Videos

• Data can be available in RDBMS (e.g. text modules/indexes available), Hadoop or SQL DBs

Page 50: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Many graphical elements to use in reports

Page 51: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Many graphical elements to use in reports

Source: https://github.com/d3/d3/wiki/Gallery

Page 52: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Many graphical elements … chamber of horror

Source: Hichert / Faisst, http://www.backup-page.hichert.com/

Page 53: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Information Design

• Information design is the practice of presenting information in a way that fosters efficient and effective understanding of it. (Source: Wikipedia, https://en.wikipedia.org/wiki/Information_design)

• Some authors are well known for their criticism of many graphical representations; they provide rules for good information design

• Edward Tufte

• Stephen Few

• Rolf Hichert

• Define standards, e.g.

• Always use the same colors, and use them with care (red = negative, green = positive)

• Pie charts are rarely useful and should be avoided (a bar chart or line chart is usually better)

• No 3D elements, as they don’t enhance information but introduce clutter

Page 54: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Table with integrated bar charts

Source: Hichert, http://www.hichert.com/de/resource/table-template-02/

Page 55: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

BI end user roles

• Consumers / BI Users

• use reports and dashboards to obtain information

• Power Users

• Use reports and dashboards to obtain information

• Create new reports and dashboards

• Data Scientists

• Statistical / mathematical geeks

• Analyze / explore data

• Need to analyze raw (non-cleansed, non-transformed) data

Page 56: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Visual data discovery and automatic data analysis

Source: Kohlhammer, J., Proff, D.U., Wiener, A.: Visual Business Analytics – Effektiver Zugang zu Daten und Informationen. dpunkt Verlag GmbH, Heidelberg (2013b)

Page 57: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Newer / Advanced Topics

Page 58: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Newer / advanced Topics

1. Operational Data Warehousing

2. Data Warehouse Appliances

3. Cloud BI

Page 59: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Operational Data Warehousing

• „Classical“ Data Warehouses

• Information in the warehouse used to support strategic business decisions

• Kept separate from operational systems

• Load of new data only in larger intervals (mostly weekly or monthly)

• Shorter intervals not required by users

• The ETL process needed so many system resources that it had to run in low-usage periods of the warehouse (such as nights or weekends)

• Near/Real Time Operational Data Warehousing

• Information in the warehouse used for tactical business decisions as well

• Low latency of information in data warehouse therefore needed

• Not only mathematical aggregations

Page 60: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Why operational Data Warehousing?

• With classical data warehouses, users have to access two types of systems to get a complete image of a customer (for instance for CRM applications or in call centers)

• the data warehouse to see what happened in the past

• the OLTP systems to get the most current information

• With an operational data warehouse

• all this information is in one system

• tighter integration with operational systems is easier

• for instance personalized offers → „closing the loop“

Page 61: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Examples of Operational Data Warehousing

New applications and data sources increase demand for an Operational DWH, e.g.

• Industry 4.0 / Smart Factory

• Internet of Things

• Internet of medical things

• Connected Cars

[Figure annotations: replace pen & paper with electronic workflows; decision support for each end user and not only management; increasing demand to publish same content on different devices]

Source: Gluchowski: Analytische Informationssysteme, 5.Aufl., p. 277

Page 62: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

SmartFactory Service Platform

[Figure annotations: workers getting alarms; containing and displaying complex manuals, e.g. during repair; new data source sending lots of data with high speed; real-time data required for automated actions]

Source: Gluchowski: Analytische Informationssysteme, 5.Aufl., p. 279

Page 63: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Challenges for Operational Data Warehousing

• Real time ETL

• Triggered by business transactions in the operational systems

• Executed asynchronously

• Incremental real-time load

• Tighter integration of operational and data warehouse systems

• DWHs become „mission critical“

• Higher requirements on availability and performance

• Higher „transactional“ system load on data warehouse system

• Data warehouse DB has to deal with typical DWH system load and transactional load

• Not just aggregations over large numbers of data rows
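The real-time ETL characteristics listed above (triggered by business transactions, executed asynchronously, incremental load) can be sketched with a queue between the operational side and a warehouse loader. This is a toy model under assumptions: an in-process `queue.Queue` and a thread stand in for a message broker or change data capture stream, and all names are invented for illustration.

```python
import queue
import threading
from typing import Dict

events: queue.Queue = queue.Queue()
warehouse_totals: Dict[str, float] = {}
STOP = object()                       # sentinel that ends the loader thread


def record_transaction(customer: str, amount: float) -> None:
    """Operational side: after the transaction, enqueue an event and return."""
    events.put((customer, amount))    # does not wait for the warehouse


def warehouse_loader() -> None:
    """Warehouse side: apply each event incrementally as it arrives."""
    while True:
        event = events.get()
        if event is STOP:
            break
        customer, amount = event
        warehouse_totals[customer] = warehouse_totals.get(customer, 0.0) + amount


loader = threading.Thread(target=warehouse_loader)
loader.start()
record_transaction("C-100", 20.0)
record_transaction("C-200", 5.0)
record_transaction("C-100", 2.5)
events.put(STOP)
loader.join()
```

Because the operational transaction only enqueues an event, its response time does not depend on the warehouse, while the warehouse sees each change shortly after it happens instead of at the next batch run.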

Page 64: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Comparison classical DWH – Operational DWH

Classical DWH | Operational DWH

Strategic: passive; historical trends | Tactical: execution of strategy; prediction

Batch: e.g. daily batch | Real-Time: up-to-date view

Availability: system can be down for maintenance, and longer response times for some reports are accepted | Availability: system becomes critical and must fulfill high availability and performance requirements

Page 65: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Data Warehouse Appliances

• Setting up and configuring a data warehouse system is a complex task

• Hardware

• Servers

• Storage

• Network

• Connectivity to source systems

• Software

• Database management system

• ETL software

• Reporting and analytics software

• ...

• An optimal performance of the whole system is difficult to achieve

Page 66: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Data Warehouse Appliances

• Data Warehouse Appliances are

• Pre-configured and pre-tested hardware and software configurations developed for running a data warehouse

• Optimized for data warehousing workload

• They are ready to be used after they are delivered to the customer

• Only suited for running OLAP

• In contrast, an RDBMS is „one size fits all“: it is suited for OLTP, OLAP, and mixed workloads

• Products, e.g. Teradata, IBM Netezza (IBM PureData System for Analytics), HP Vertica, Exasol, Oracle Exadata, MS Analytics Platform System

Page 67: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Simplicity (e.g. Netezza)

Page 68: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Typical enhancements

• Move as many operations as possible to the storage cell instead of moving data to the DB server

• E.g. filter data already at storage cell and not at DB server

• Avoid transferring unnecessary data

• Column-oriented In-memory storage with high compression

• Many appliances are based on shared nothing architecture

• Each node is independent

• Each node has its own storage or memory

• Parallel processing is simpler and faster because there is no overhead due to contention
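Two of the ideas above, filtering at the storage cell and shared-nothing parallelism, can be mimicked in a few lines: each node filters and pre-aggregates its own rows, and only one number per node travels to the coordinator. The sketch is purely illustrative; Python threads stand in for independent nodes, and the `NODES` partitions and helper names are assumptions, not how any appliance is implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Row = Tuple[str, int]      # (region, amount)

# Each inner list stands for the private storage of one node.
NODES: List[List[Row]] = [
    [("north", 10), ("south", 4)],
    [("north", 7), ("east", 3)],
    [("south", 8), ("north", 1)],
]


def node_scan(rows: List[Row], predicate: Callable[[Row], bool]) -> int:
    """Runs 'on the node': filter and pre-aggregate locally, return one number."""
    return sum(amount for region, amount in rows if predicate((region, amount)))


def total_for_region(region: str) -> int:
    """Coordinator: send the filter to every node and add up the partial sums."""
    with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
        partials = pool.map(
            lambda rows: node_scan(rows, lambda r: r[0] == region), NODES
        )
        return sum(partials)
```

Calling `total_for_region("north")` moves three integers to the coordinator instead of six rows; with realistic table sizes that difference is what avoids transferring unnecessary data.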

Page 69: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Cloud BI

• BI applications (database, ETL tools, Frontend) are hosted in a public cloud, e.g.

• AWS (Amazon Web Services)

• Microsoft Azure

• …

• Many tools nowadays are available in the cloud first

• Vendors try to force customers to use clouds

• Or even available in the cloud only

• E.g. Microsoft Power BI

• Security concerns for sensitive data

• But new data sources originate on the Internet, so storing their data in a (public) cloud can make sense, e.g.

• Connected Cars, IoT in general

Page 70: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Cloud BI architecture

Source: Lang: Business Intelligence erfolgreich umsetzen, 5.Aufl., p. 185

Page 71: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Cloud BI architecture

• Analytics as a service

• Provide complete BI (Analytics) SW stack including

• data storage

• data integration (ETL)

• data visualization and/or data modeling (Frontend)

• Meta data management

• Data as a service

• Provide quality data for further usage

• Data marketplace

Page 72: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Cloud BI – Data Warehousing services

Source: http://db-engines.com/en/system/Amazon+Redshift%3BSnowflake

Page 73: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Snowflake Architecture

Don’t confuse the Snowflake product with the Snowflake dimensional model from session 2

Page 74: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Snowflake Architecture

• Snowflake Storage

• Snowflake loads data into its internal optimized, compressed, columnar format

• Snowflake itself uses (!) Amazon Web Services’ S3 (Simple Storage Service) cloud storage

• Query Processing

• Each virtual warehouse is an MPP (Massively Parallel Processing) compute cluster composed of multiple compute nodes allocated by Snowflake from Amazon EC2

• Each virtual warehouse is an independent compute cluster that does not share compute resources with other virtual warehouses

Page 75: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Snowflake Architecture

• Cloud Services

• Authentication and access control

• Infrastructure management

• Metadata management

• Query parsing and optimization

• Security

Page 76: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Exercise

• For one of the following companies

• Bank

• Telecommunication company

• Online book store (like Amazon.com)

• Discount furniture store (like IKEA)

• Airline

• Car manufacturer

sketch an application based on a classical and another based on a (near) real time operational data warehouse

Page 77: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Exercise

• Compare lecture 1. Possible solutions

• Standard Data Warehouse Architecture

• Data Vault 2.0 Architecture (Dan Linstedt) including log-based change data capture (CDC) or replication for data extraction

Page 78: Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

Thank you!

Daimler TSS GmbH

Wilhelm-Runge-Straße 11, 89081 Ulm, Germany / Phone +49 731 505-06 / Fax +49 731 505-65 99

[email protected] / Internet: www.daimler-tss.com / Intranet portal code: @TSS

Domicile and Court of Registry: Ulm / Commercial Register No.: 3844 / Management: Christoph Röger (Vorsitzender), Steffen Bäuerle

