IDQ Summit 2014 - Ronald Damhof - It's all about the data

Post on 08-Jun-2015

description

"It's all about the data, a managerial perspective" - these are the slides of the presentations I gave at Data Modeling Zone 2014 in Hamburg and at the International Data Quality Summit in Richmond (VA) 2014.

transcript

R.D.Damhof – October 2014 – IDQ Summit 2014

It’s all about the data!

A managerial perspective

By Ronald Damhof

R.D.Damhof – Prudenza BV - Copyright - 22 May 2014

I am an opinionated kind of guy…!

Who am I - My Data Manifesto

The X commandments of data management

I. Context is leading
II. Data is the ultimate proprietary asset; it is to be managed and governed in line with morals & ethics, internal and external rules, and legislation
III. Stop putting apps and process at the centre, over data; data first, facts first
IV. It is all about the quality of our product: the data. Get clean, stay clean, get access
V. Thou shalt abstract and separate concerns rigorously

Who am I - My Data Manifesto

The X commandments of data management

VI. a) Thou shalt make a fundamentalistic distinction between Fact and Context
    b) Thou shalt not forsake ‘Time’
VII. Data architecture is not the same as technology architecture
VIII. The science and practice of Information & Data Modeling need to be upheld, improved and taught
IX. Specify, Standardise, Automate & Productise
X. Thou canst not buy your way out of the data misery you are in

Who am I - My Data Manifesto

The X commandments of data management

XI. There is a new saviour in town. Its name is Hadoop and it calls to us from its mountain:

‘We got a lake and thou shalt throw all your data in it. The water will be clean so you can drink it, the water will flow so it will irrigate your lands, grow your stock, feed your kids and of course bring you world peace…’

Nah, kidding ;-)

R.D.Damhof – September 2014 – Data Modeling Zone

Logistics & Manufacturing

The Data Push Pull Point

Push/Supply/Source driven (all facts, fully temporal):
▪ Mass deployment
▪ Control > Agility
▪ Validation of “ingredients”
▪ Repeatable & predictable processes
▪ Standardized processes
▪ High level of automation
▪ Relatively high IT/Data expertise

Pull/Demand/Product driven (truth, interpretation, context):
▪ Piece deployment
▪ Agility > Control
▪ Plausibility
▪ User-friendliness
▪ Relatively low IT expertise
▪ Domain expertise essential

Business Rules Downstream
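One possible reading of "all facts, fully temporal" with "business rules downstream" is sketched below in Python. Everything in it is an assumption made for illustration and not taken from the slides: the `Fact` record, its two time axes (`valid_from`, `recorded_on`), the sample data, and the `is_high_value` rule with its threshold. Facts are stored untouched on the push side; interpretation is applied only when a product is pulled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Fact:
    """Hypothetical fact record; stored as delivered, never overwritten."""
    key: str
    value: float
    valid_from: date    # when the fact became true in the real world
    recorded_on: date   # when the fact was received and stored


# Push side: facts are only ever appended (illustrative sample data).
FACTS = [
    Fact("cust-1", 100.0, date(2014, 1, 1), date(2014, 1, 5)),
    Fact("cust-1", 120.0, date(2014, 6, 1), date(2014, 6, 3)),
]


def as_of(facts, key, valid_on, known_on) -> Optional[Fact]:
    """Return the fact valid on `valid_on`, using only what was known on `known_on`."""
    candidates = [
        f for f in facts
        if f.key == key and f.valid_from <= valid_on and f.recorded_on <= known_on
    ]
    return max(candidates, key=lambda f: (f.valid_from, f.recorded_on), default=None)


def is_high_value(fact: Optional[Fact], threshold: float = 110.0) -> bool:
    """Pull side: a business rule (hypothetical) applied downstream of the facts."""
    return fact is not None and fact.value >= threshold
```

Because the rule lives downstream, changing the threshold changes the interpretation without touching a single stored fact, and `as_of` can still reproduce what was known at any earlier date.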

The Development Style

Systematic:
▪ User and developer are separated
▪ Defensive governance; focus on control and compliance
▪ Strong focus on non-functionals: auditability, robustness, traceability, …
▪ Centralised and organisation-wide information domain
▪ Configured and controlled deployment environment (dev/tst/acc/prod)

Opportunistic:
▪ User and developer are the same person or closely related
▪ Offensive governance; focus on adaptability & agility
▪ Decentralised, personal/workgroup/department/theme information domain
▪ All deployment is done in production

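A Data Deployment Quadrant

Horizontal axis, the Data Push/Pull Point: Push/Supply/Source driven (Facts) versus Pull/Demand/Product driven (Context)

Vertical axis, the Development Style: Systematic versus Opportunistic

                 Push (Facts)    Pull (Context)
Systematic       I               II
Opportunistic    III             IV

Labels in the opportunistic half: Research, Innovation & Design; “Shadow IT, Incubation, Ad-hoc, Once off”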
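The two axes can be written down as a small lookup, sketched here in Python. The enum and function names are hypothetical, and the numbering (I top-left, II top-right, III bottom-left, IV bottom-right, with push on the left and systematic on top) is an assumption read off the order in which the labels appear on the slide.

```python
from enum import Enum


class Side(Enum):
    """Position relative to the Data Push/Pull Point."""
    PUSH = "push/supply/source driven"
    PULL = "pull/demand/product driven"


class Style(Enum):
    """Development style."""
    SYSTEMATIC = "systematic"
    OPPORTUNISTIC = "opportunistic"


# Assumed layout: systematic on top, push on the left.
QUADRANT = {
    (Style.SYSTEMATIC, Side.PUSH): "I",
    (Style.SYSTEMATIC, Side.PULL): "II",
    (Style.OPPORTUNISTIC, Side.PUSH): "III",
    (Style.OPPORTUNISTIC, Side.PULL): "IV",
}


def quadrant(style: Style, side: Side) -> str:
    """Return the quadrant numeral for a data deployment."""
    return QUADRANT[(style, side)]
```

For example, `quadrant(Style.OPPORTUNISTIC, Side.PULL)` places a deployment built by its own user, directly in production and driven by demand, in quadrant IV.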

7 Applications of the Quadrant

(1) How we produce

How we produce, process variants

How we produce, automation

Rephrased - somewhat more nerdy:
• Model-driven, metadata driven
• Declarative instead of imperative

Rephrased - somewhat more popular:
“In Data, the developer is the data modeller”
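A minimal sketch of what "model-driven, declarative" could look like in Python: the model is plain metadata, and the imperative part (the DDL) is generated from it. The `MODEL` structure, the table and column names, and the `generate_ddl` and `deploy` helpers are all hypothetical; SQLite from the Python standard library is used only so the example runs as-is.

```python
import sqlite3

# Declarative part: the data model as metadata (hypothetical example model).
MODEL = {
    "customer": {
        "columns": {"customer_id": "INTEGER", "name": "TEXT"},
        "key": ["customer_id"],
    },
    "orders": {
        "columns": {"order_id": "INTEGER", "customer_id": "INTEGER", "amount": "REAL"},
        "key": ["order_id"],
    },
}


def generate_ddl(model):
    """Generate one CREATE TABLE statement per table described in the model."""
    statements = []
    for table, spec in model.items():
        lines = [f"{name} {sqltype}" for name, sqltype in spec["columns"].items()]
        lines.append("PRIMARY KEY (" + ", ".join(spec["key"]) + ")")
        statements.append(f"CREATE TABLE {table} (\n  " + ",\n  ".join(lines) + "\n);")
    return statements


def deploy(model, connection):
    """Run the generated statements against a database connection."""
    for statement in generate_ddl(model):
        connection.execute(statement)
```

Adding a table or column means editing `MODEL`, not writing more code, which is one way to read "the developer is the data modeller".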

How we produce, production lines

Production-line: Data orientation

Data Products; Information Products

Access to data

Analytical tools

Processing Power

Production-line: Forms orientation

E.g. XBRL

(2) How we organize

To centralize or to decentralize

(3) How we govern

How we govern, products

How we govern, accountability

Never, never, never ‘ownership’

I II

III IV

Deliverant is Accountable

Demandee is Accountable

Data scientist/Analyst/Researcher responsible

In- and outbound Data Delivery Agreements

(4) How do people excel

(5) How to use technology

(6) How about Technology

▪ Storage: (R)DBMS
▪ Processing: Automation Software
▪ Data Quality: Validation, Profiling
▪ Development: Data Modeling
▪ Accessibility: Data Virtualization

▪ Storage: Pattern based
▪ Processing: Automation/limited ETL
▪ Data Quality: DQ rules/dashboards
▪ User tooling: Reporting, dashboards, Data Visualization

▪ Storage: Analytical
▪ Processing: Prep tools for Data Analyst
▪ User tooling: Advanced Analytics, Data Visualization

(7) Business-, Information- or Data Modeling is key

The Logical Model drives the technical data architecture, design and implementation

Conceptual

Logical

e.g. Data Vault, Anchor Model

e.g. Dimensional, hierarchical, flat

Ontology; Facts

Relational

Oh… data warehouse?

The classic distinction between the ‘operational data environment’ and the ‘informational data environment’ is fading.

Modern-day data warehouses have been split up: the ‘fact’ part (Q1) has moved into the operational side.

Although data warehouses have evolved, operational applications have not, at least not in terms of data architecture. They should, though…

Email: ronald.damhof@prudenza.nl
LinkedIn: nl.linkedin.com/in/ronalddamhof/
Twitter: RonaldDamhof
Blog: prudenza.typepad.com
Website: www.prudenza.nl