A company of Daimler AG LECTURE @DHBW: DATA WAREHOUSE PART X: DWH DATA MODELING INTRO ANDREAS BUCKENHOFER, DAIMLER TSS
Transcript
Page 1: LECTURE @DHBW: DATA WAREHOUSE PART X: DWH DATA …buckenhofer/20182DWH/Bucken... · 2018-11-07 · Employees often get trained in SQL Server, Oracle, Cognos TM1, Tableau, or any other

A company of Daimler AG

LECTURE @DHBW: DATA WAREHOUSE

PART X: DWH DATA MODELING INTRO

ANDREAS BUCKENHOFER, DAIMLER TSS

Page 2:

ABOUT ME

https://de.linkedin.com/in/buckenhofer

https://twitter.com/ABuckenhofer

https://www.doag.org/de/themen/datenbank/in-memory/

http://wwwlehre.dhbw-stuttgart.de/~buckenhofer/

https://www.xing.com/profile/Andreas_Buckenhofer2

Andreas Buckenhofer

Senior DB [email protected]

Since 2009 at Daimler TSS

Department: Big Data

Business Unit: Analytics

Page 3:

ANDREAS BUCKENHOFER, DAIMLER TSS GMBH

Data Warehouse / DHBWDaimler TSS 3

“Forming good abstractions and avoiding complexity is an essential part of a successful data architecture”

Data has always been my main focus during my long career in data integration. I work for Daimler TSS as a Database Professional and Data Architect with over 20 years of experience in Data Warehouse projects. I have been working with Hadoop and NoSQL since 2013. I keep my knowledge up to date, and I learn new things, experiment, and program every day.

I share my knowledge in internal presentations and as a speaker at international conferences. I regularly give a full lecture on Data Warehousing and a seminar on modern data architectures at Baden-Wuerttemberg Cooperative State University (DHBW). I also gained international experience through a two-year project in Greater London and several business trips to Asia.

I'm responsible for In-Memory DB Computing at the independent German Oracle User Group (DOAG) and was honored by Oracle as an ACE Associate. I hold current certifications such as "Certified Data Vault 2.0 Practitioner (CDVP2)", "Big Data Architect", "Oracle Database 12c Administrator Certified Professional", "IBM InfoSphere Change Data Capture Technical Professional", etc.


Page 4:

As a 100% Daimler subsidiary, we give 100 percent, always and never less.

We love IT and pull out all the stops to aid Daimler's development with our expertise on its journey into the future.

Our objective: We make Daimler the most innovative and digital mobility company.

NOT JUST AVERAGE: OUTSTANDING.

Daimler TSS

Page 5:

INTERNAL IT PARTNER FOR DAIMLER

+ Holistic solutions according to the Daimler guidelines

+ IT strategy

+ Security

+ Architecture

+ Developing and securing know-how

+ TSS is a partner who can be trusted with sensitive data

As subsidiary: maximum added value for Daimler

+ Market closeness

+ Independence

+ Flexibility (short decision-making process, ability to react quickly)

Page 6:

Daimler TSS

LOCATIONS

Daimler TSS Germany: 7 locations, 1000 employees*

Ulm (Headquarters), Stuttgart, Berlin, Karlsruhe

Daimler TSS China: Hub Beijing, 10 employees

Daimler TSS Malaysia: Hub Kuala Lumpur, 42 employees

Daimler TSS India: Hub Bangalore, 22 employees

* as of August 2017

Page 7:

After the end of this lecture you will be able to

Understand differences in data modeling between OLTP and OLAP

Understand why data modeling is important

Understand data modeling in the Core Warehouse Layer and Data Mart Layer

• Data Vault

• Dimensional Model / Star schema

Understand dimensions and facts

Understand ROLAP & MOLAP

WHAT YOU WILL LEARN TODAY

Page 8:

Requirements

• Efficient update and delete operations

• Efficient read operations

• Avoid contradiction in the data – don’t store data twice or multiple times

• Easy maintenance of the data model

→ As little redundancy as possible in the data model

DATA MODELING FOR OLTP APPLICATIONS

Page 9:

First Normal Form (1NF):

• A relation/table is in first normal form if

• the domain of each attribute contains only atomic (simple, indivisible) values.

• the value of any attribute in a tuple/row must be a single value from the domain of that attribute, i.e. no attribute values can be sets

CODD'S NORMAL FORMS FOR DB RELATIONS: 1NF

Page 10:

CODD'S NORMAL FORMS FOR DB RELATIONS: 1NF

Not in 1NF (the Titles attribute holds several values):

CD_ID | Album | Founded | Titles
11 | Anastacia – Not that kind | 1999 | 1. Not that kind, 2. I'm outta love, 3. Cowboys & Kisses
12 | Pink Floyd – Wish you were here | 1964 | 1. Shine on you crazy diamond
13 | Anastacia – Freak of Nature | 1999 | 1. Paid my dues

In 1NF:

CD_ID | Album | Performer | Founded | Track | Title
11 | Not that kind | Anastacia | 1999 | 1 | Not that kind
11 | Not that kind | Anastacia | 1999 | 2 | I'm outta love
11 | Not that kind | Anastacia | 1999 | 3 | Cowboys & Kisses
12 | Wish you were here | Pink Floyd | 1964 | 1 | Shine on you crazy diamond
13 | Freak of Nature | Anastacia | 1999 | 1 | Paid my dues
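
The step from the unnormalized table to 1NF can be sketched in Python. This is only an illustration and not part of the lecture: the function name `to_1nf` and the dictionary layout are assumptions, and the parsing assumes that albums are written as "Performer – Album" and that no title contains a comma.

```python
import re

# Unnormalized rows from the example: "titles" holds several values at once.
unnormalized = [
    {"cd_id": 11, "album": "Anastacia – Not that kind", "founded": 1999,
     "titles": "1. Not that kind, 2. I'm outta love, 3. Cowboys & Kisses"},
    {"cd_id": 12, "album": "Pink Floyd – Wish you were here", "founded": 1964,
     "titles": "1. Shine on you crazy diamond"},
]

def to_1nf(rows):
    """Return one row per track so that every attribute value is atomic."""
    result = []
    for row in rows:
        # Split "Performer – Album" into two separate attributes.
        performer, album = [part.strip() for part in row["album"].split("–", 1)]
        for entry in row["titles"].split(","):
            # Each entry looks like "2. I'm outta love"; the dot is optional.
            match = re.match(r"\s*(\d+)\.?\s*(.+)", entry)
            result.append({
                "cd_id": row["cd_id"],
                "album": album,
                "performer": performer,
                "founded": row["founded"],
                "track": int(match.group(1)),
                "title": match.group(2).strip(),
            })
    return result

rows_1nf = to_1nf(unnormalized)
for r in rows_1nf:
    print(r["cd_id"], r["album"], r["performer"], r["founded"], r["track"], r["title"])
```

Each output row carries exactly one track, so album, performer, and founding year repeat per track. That redundancy is what 2NF and 3NF remove.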

Page 11:

Second Normal Form (2NF):

• In 1st normal form

• Every non-key attribute is fully dependent on the key. There are no dependencies between a partial key and a non-key field.

CODD'S NORMAL FORMS FOR DB RELATIONS: 2NF

Page 12:

CODD'S NORMAL FORMS FOR DB RELATIONS: 2NF

In 1NF:

CD_ID | Album | Performer | Founded | Track | Title
11 | Not that kind | Anastacia | 1999 | 1 | Not that kind
11 | Not that kind | Anastacia | 1999 | 2 | I'm outta love
11 | Not that kind | Anastacia | 1999 | 3 | Cowboys & Kisses
12 | Wish you were here | Pink Floyd | 1964 | 1 | Shine on you crazy diamond
13 | Freak of Nature | Anastacia | 1999 | 1 | Paid my dues

In 2NF:

CD_ID | Track | Title
11 | 1 | Not that kind
11 | 2 | I'm outta love
11 | 3 | Cowboys & Kisses
12 | 1 | Shine on you crazy diamond
13 | 1 | Paid my dues

CD_ID | Album | Performer | Founded
11 | Not that kind | Anastacia | 1999
12 | Wish you were here | Pink Floyd | 1964
13 | Freak of Nature | Anastacia | 1999

Page 13:

Third Normal Form (3NF):

• In 2nd normal form

• No functional dependencies between non-key fields: a non-key attribute depends only on the primary key (PK)

CODD'S NORMAL FORMS FOR DB RELATIONS: 3NF

Page 14:

CODD'S NORMAL FORMS FOR DB RELATIONS: 3NF

In 2NF:

CD_ID | Track | Title
11 | 1 | Not that kind
11 | 2 | I'm outta love
11 | 3 | Cowboys & Kisses
12 | 1 | Shine on you crazy diamond
13 | 1 | Paid my dues

CD_ID | Album | Performer | Founded
11 | Not that kind | Anastacia | 1999
12 | Wish you were here | Pink Floyd | 1964
13 | Freak of Nature | Anastacia | 1999

In 3NF (the second table is split):

CD_ID | Album | Performer
11 | Not that kind | Anastacia
12 | Wish you were here | Pink Floyd
13 | Freak of Nature | Anastacia

Performer | Founded
Anastacia | 1999
Pink Floyd | 1964

Page 15:

CODD'S NORMAL FORMS - SUMMARY FROM 1NF TO 3NF

Unnormalized:

CD_ID | Album | Founded | Titles
11 | Anastacia – Not that kind | 1999 | 1. Not that kind, 2. I'm outta love, 3. Cowboys & Kisses
12 | Pink Floyd – Wish you were here | 1964 | 1. Shine on you crazy diamond
13 | Anastacia – Freak of Nature | 1999 | 1. Paid my dues

In 3NF:

CD_ID | Track | Title
11 | 1 | Not that kind
11 | 2 | I'm outta love
11 | 3 | Cowboys & Kisses
12 | 1 | Shine on you crazy diamond
13 | 1 | Paid my dues

CD_ID | Album | Performer
11 | Not that kind | Anastacia
12 | Wish you were here | Pink Floyd
13 | Freak of Nature | Anastacia

Performer | Founded
Anastacia | 1999
Pink Floyd | 1964
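
The 3NF result can be tried out with Python's built-in `sqlite3` module. This is a hedged sketch, not taken from the lecture: the table names `performer`, `cd`, and `track` and the key choices are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Three tables, one per 3NF relation (table names are assumptions).
conn.executescript("""
CREATE TABLE performer (
    performer TEXT PRIMARY KEY,
    founded   INTEGER
);
CREATE TABLE cd (
    cd_id     INTEGER PRIMARY KEY,
    album     TEXT,
    performer TEXT REFERENCES performer(performer)
);
CREATE TABLE track (
    cd_id INTEGER REFERENCES cd(cd_id),
    track INTEGER,
    title TEXT,
    PRIMARY KEY (cd_id, track)
);
""")

conn.executemany("INSERT INTO performer VALUES (?, ?)",
                 [("Anastacia", 1999), ("Pink Floyd", 1964)])
conn.executemany("INSERT INTO cd VALUES (?, ?, ?)",
                 [(11, "Not that kind", "Anastacia"),
                  (12, "Wish you were here", "Pink Floyd"),
                  (13, "Freak of Nature", "Anastacia")])
conn.executemany("INSERT INTO track VALUES (?, ?, ?)",
                 [(11, 1, "Not that kind"),
                  (11, 2, "I'm outta love"),
                  (11, 3, "Cowboys & Kisses"),
                  (12, 1, "Shine on you crazy diamond"),
                  (13, 1, "Paid my dues")])

# Joining the three tables reproduces the 1NF view; each fact is stored once.
rows_joined = conn.execute("""
    SELECT c.cd_id, c.album, c.performer, p.founded, t.track, t.title
    FROM track t
    JOIN cd c ON c.cd_id = t.cd_id
    JOIN performer p ON p.performer = c.performer
    ORDER BY c.cd_id, t.track
""").fetchall()

for row in rows_joined:
    print(row)
```

The founding year of a performer now lives in exactly one row, so an update touches one place. The price is the two joins needed to read the full picture.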

Page 16:

WHY (DATA) MODELING?

Page 17:

“Data modeling is the process of learning about the data, and regardless of technology, this process must be performed for a successful application.”

• Learn about the data and promote collective data understanding

• Derive security classification and measures

• Design for performance

• Accelerate development

• Improve Software quality

• Reduce maintenance costs

• Generate code

• NoSQL Schema-on-read: understand model versions after years

WHY (DATA) MODELING?


Source quote: Steve Hoberman: Data Modeling for MongoDB, Technics Publications 2014

Page 18:

IMPORTANCE OF A GOOD DATABASE DESIGN

Page 19:

Different levels of abstraction:

• Conceptual (domain) model
  • Focus on the (main) entities and their business definitions
  • No attributes

• Logical design
  • Relational data model (independent of a DBMS or technology)
  • The logical design cannot affect performance, so there is no performance optimization on this level

• Physical implementation
  • Representation of a data design for a specific DBMS
  • RDBMS come closest to physical independence

CONCEPTUAL – LOGICAL – PHYSICAL LEVEL

Page 20:

Scott Ambler – Disciplined agile delivery

• Do you need it?

• What do you want to achieve?

• What is the value?

• Which representation do you use: 3NF/UML/Object model/ADAPT/Data Vault?

CONCEPTUAL AND LOGICAL LEVEL

Page 21:

Employees often get trained in SQL Server, Oracle, Cognos TM1, Tableau, or any other tool / product. What about data model training?

DATA MODELING - WHAT ABOUT DATA MODELING TRAINING?


Sources: http://www.dbdebunk.com/2017/06/this-week.html

Page 22:

MEASURING THE QUALITY OF A DATA MODEL

DATA MODEL SCORECARD

Source: Steve Hoberman: Data Model Scorecard, Technics Publications 2015

Page 23:

The diagram shows a typical OLTP data model

• Customers and products have unique IDs and some descriptive attributes

• A customer can place an order on a specific date

• The order contains one or more products

EXERCISE: OLTP DATA MODEL FOR DWH

Page 24:

Now consider DWH requirements like non-volatile and time-variant data

• Customer Bush marries and takes her husband's last name

• Product number 5 gets a price increase

How would you solve such requirements in a data model for the Core Warehouse Layer?

EXERCISE: OLTP DATA MODEL FOR DWH

Page 25:

Possible solutions:

• Add timestamp column as part of the primary key
  • For all tables, not only for specific tables (e.g. product, customer)
  • Composite keys can become inefficient and impractical

• New tables with head and version data to avoid redundancy
  • Head table contains static data that does not change (e.g. customer id, birthdate)
  • Version table contains data that changes (e.g. last name, comments)

• Store every change in log tables
  • Querying tables can become difficult and slow if history is required ("main" table + log tables)

EXERCISE: OLTP DATA MODEL FOR DWH
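
The head/version variant of the exercise can be sketched with Python's `sqlite3` module. Everything specific here is an assumption for illustration: the table and column names (`customer_head`, `customer_version`, `valid_from`), the helper `last_name_as_of`, and all sample dates and the married name are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Head table: static data. Version table: changing data, keyed by a timestamp.
conn.executescript("""
CREATE TABLE customer_head (
    customer_id INTEGER PRIMARY KEY,
    birthdate   TEXT
);
CREATE TABLE customer_version (
    customer_id INTEGER REFERENCES customer_head(customer_id),
    valid_from  TEXT,
    last_name   TEXT,
    PRIMARY KEY (customer_id, valid_from)
);
""")

# Made-up sample data: customer Bush marries and takes a new last name.
conn.execute("INSERT INTO customer_head VALUES (1, '1980-05-17')")
conn.execute("INSERT INTO customer_version VALUES (1, '2010-01-01', 'Bush')")
conn.execute("INSERT INTO customer_version VALUES (1, '2018-06-01', 'Miller')")

def last_name_as_of(customer_id, as_of_date):
    """Return the last name valid on the given ISO date, or None."""
    row = conn.execute("""
        SELECT last_name FROM customer_version
        WHERE customer_id = ? AND valid_from <= ?
        ORDER BY valid_from DESC LIMIT 1
    """, (customer_id, as_of_date)).fetchone()
    return row[0] if row else None

print(last_name_as_of(1, "2015-03-01"))
print(last_name_as_of(1, "2019-01-01"))
```

Old versions are never overwritten, which addresses the non-volatile requirement, and the `valid_from` column makes the data time-variant. Every historical query has to filter on the timestamp, which is part of why such models become harder to query.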

Page 26:

BAD MODELS


Source: Corr / Stagnitto: Agile Data Warehouse Design, DecisionOne Press, 2011, page 5

Create a SQL statement for:

How many "Order Transactions" have been created by "Person/Organisation"?

Page 27:

• 3NF is inefficient for query processing

• 3NF models are difficult to understand

• 3NF gets even more complicated with history added

• 3NF is not suited for "new" data sources (JSON, NoSQL, etc.)

→ DWH needs own data modeling approaches for the Core Warehouse Layer and the Mart Layer

DISADVANTAGES OF 3NF FOR DWH

Page 28:

What are candidates for primary keys?

PRIMARY KEYS

NATURAL KEYS, SEQUENCES, HASH KEYS

Natural Keys

• "Intelligent" keys that have a meaning to the business user

• VIN (vehicle identifier)

• ISO country codes, e.g. DE, US, GB

Generated Keys

• System-generated, unique values, e.g. sequences (increments): 1, 2, 3, 4, 5, etc.

• GUIDs (globally unique identifiers) contain e.g. MAC address + timestamp to make an identifier unique.

Hash Keys

• (Composite) natural key run through a hash function, e.g. MD5(VIN)
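
The three kinds of keys can be put side by side in a few lines of Python. This is a minimal sketch using only the standard library; the VIN is a made-up sample value, and `itertools.count` merely stands in for a database sequence.

```python
import hashlib
import itertools
import uuid

# Natural key: a business identifier (this VIN is a made-up sample value).
natural_key = "WDB1234567F123456"

# Generated key, variant 1: a sequence that simply increments.
sequence = itertools.count(start=1)
sequence_key = next(sequence)

# Generated key, variant 2: a GUID. uuid1() combines a node ID
# (usually the MAC address) with a timestamp.
guid_key = uuid.uuid1()

# Hash key: the natural key run through a hash function, here MD5.
hash_key = hashlib.md5(natural_key.encode("utf-8")).hexdigest()

print(natural_key, sequence_key, guid_key, hash_key)
```

Only the natural key and the hash key can be derived from the business data alone. The sequence value and the GUID depend on the system that generated them.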

Page 29:

NATURAL KEYS

Advantages

• Have a meaning: can be considered as master keys

• Same value across (OLTP) systems: valid across business processes

• Allow parallel loads in a DWH or Big Data system

Disadvantages

• Varying length (can be short or very long)

• Meaning can change over time, e.g. VIN standards changed

• Can be composite (several fields), which would make joins slower and more complex [concatenation would be possible]

• Often sequence-driven in OLTP systems (e.g. customer number; collisions possible when integration into DWH is done)

Page 30:

GENERATED KEYS

Advantages

• Small byte size (sequences): less storage and faster joins

• Always unique

• Good B*Tree index clustering

Disadvantages

• Insert performance can be slow (hot spot on index)

• No business meaning

• Data loads into the DWH cause lookups (Big Data systems often have no sequences, and *sequential* sequence generation would fail there performance-wise)
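
The lookup cost during a load can be illustrated with a small sketch. It is an assumption-laden toy, not a load tool: the dictionary `key_map` stands in for a key lookup table in the DWH, `surrogate_key` is a hypothetical helper, and the order rows are made up.

```python
import itertools

# Lookup structure that maps a natural key to its generated (sequence) key.
key_map = {}
sequence = itertools.count(start=1)

def surrogate_key(natural_key):
    """Return the existing generated key or assign the next sequence value."""
    if natural_key not in key_map:
        key_map[natural_key] = next(sequence)
    return key_map[natural_key]

# Loading orders: every row needs a lookup to translate the customer number.
orders = [("C-100", 25.0), ("C-200", 10.0), ("C-100", 7.5)]
loaded = [(surrogate_key(customer), amount) for customer, amount in orders]

print(loaded)
```

Because each row has to consult and possibly update the shared mapping, the load cannot simply be split across independent workers.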

Page 31:

HASH KEYS

Advantages

• Allow parallel loads in a DWH or Big Data system

• Ability to join across platforms (e.g. RDBMS, NoSQL, Hadoop)

• Deterministic across systems or if data is reloaded

• Data is distributed

Disadvantages

• Computed value can be longer compared to natural keys

• Computed value should be stored as binary instead of char. Some systems only allow char (e.g. Hadoop). Example: MD5 hash is binary(16) or char(32).

• Collisions may occur: collision strategy required

• Bad B*Tree index clustering
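
The binary(16) versus char(32) point and the determinism of hash keys can be checked with Python's `hashlib`. The helper `hash_key` is hypothetical, and both the `|` delimiter and the trim/uppercase normalization are assumptions added for this sketch; the slides do not prescribe them.

```python
import hashlib

def hash_key(*natural_key_parts, delimiter="|"):
    """Hash a (composite) natural key; returns the raw 16-byte MD5 digest."""
    # Assumed normalization: trim and uppercase each part, join with a delimiter
    # so that ("DE", "4711") and ("DE4", "711") do not produce the same input.
    normalized = delimiter.join(str(p).strip().upper() for p in natural_key_parts)
    return hashlib.md5(normalized.encode("utf-8")).digest()

key_binary = hash_key("DE", "4711")   # 16 bytes, fits a binary(16) column
key_char = key_binary.hex()           # 32 hex characters, fits a char(32) column

print(len(key_binary), len(key_char), key_char)
```

The same input produces the same key on every system and on every reload, without any lookup, which is what allows parallel loads. Any two systems that should join on the key must agree on the normalization and the delimiter.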

Page 32:

LOGICAL STANDARD DATA WAREHOUSE ARCHITECTURE

[Figure: logical standard data warehouse architecture. Components shown: internal data sources (OLTP) and external data sources; the Data Warehouse with backend and frontend; Staging Layer (Input Layer); Integration Layer (Cleansing Layer); Core Warehouse Layer (Storage Layer); Aggregation Layer; Mart Layer (Output Layer / Reporting Layer); Metadata Management; Security; DWH Manager incl. Monitor.]

Page 33:

Daimler TSS GmbH

Wilhelm-Runge-Straße 11, 89081 Ulm / Phone +49 731 505-06 / Fax +49 731 505-65 99

[email protected] / Internet: www.daimler-tss.com / Intranet-Portal-Code: @TSS

Domicile and Court of Registry: Ulm / HRB-Nr.: 3844 / Management: Christoph Röger (CEO), Steffen Bäuerle

THANK YOU

