Date post: | 13-Jan-2015 |
Upload: | terry-bunio |
Agile Data Warehouse
The Final Frontier
Terry Bunio
@tbunio
bornagainagilist.wordpress.com
www.protegra.com
1. Data Warehouse Architecture
2. Spectre of the Agility
3. The Prime Directive
4. Captain, we need more visualization!
5. The United Federation of Planets
Data Warehouse
• Definition
– “a database used for reporting and data analysis.
It is a central repository of data which is created
by integrating data from multiple disparate
sources. Data warehouses store current as well
as historical data and are commonly used for
creating trending reports for senior management
reporting such as annual and quarterly
comparisons.” – Wikipedia.org
Data Warehouse
• Can refer to:
– Reporting Databases
– Operational Data Stores
– Data Marts
– Enterprise Data Warehouse
– Cubes
– Excel?
– Others
Relational
• Relational Analysis
– Third Normal Form
– OLTP
– Normalized tables optimized for modification
Dimensional
• Dimensional Analysis
– Star Schema
– OLAP
– Facts and Dimensions optimized for retrieval
• Facts – Business events – Transactions
• Dimensions – context for Transactions
– Accounts
– Products
– Date
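A minimal sketch of the Facts and Dimensions idea, using Python's built-in sqlite3 module. The table names (fact_transaction, dim_account, dim_product, dim_date), the columns, and the sample rows are all hypothetical and are not taken from the project described in this deck.

```python
import sqlite3

# Hypothetical star schema: one fact table surrounded by three dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_account (account_key INTEGER PRIMARY KEY, account_name TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
-- Each fact row is one business event (a transaction) carrying a measure.
CREATE TABLE fact_transaction (
    account_key INTEGER REFERENCES dim_account(account_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key    INTEGER REFERENCES dim_date(date_key),
    amount      REAL
);
INSERT INTO dim_account VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_date VALUES (20140101, 2014, 1), (20140401, 2014, 2);
INSERT INTO fact_transaction VALUES
    (1, 1, 20140101, 100.0),
    (1, 2, 20140401, 50.0),
    (2, 1, 20140401, 75.0);
""")


def sales_by_quarter(connection):
    """Aggregate the fact measure using the Date dimension as context."""
    return connection.execute("""
        SELECT d.year, d.quarter, SUM(f.amount)
        FROM fact_transaction AS f
        JOIN dim_date AS d ON d.date_key = f.date_key
        GROUP BY d.year, d.quarter
        ORDER BY d.year, d.quarter
    """).fetchall()


quarterly_totals = sales_by_quarter(conn)
```

The retrieval query needs only one join per dimension, which is the sense in which this layout is optimized for retrieval.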
Relational
Dimensional
Kimball-lytes
• Bottom-up - incremental
– Operational systems feed the Data
Warehouse
– Data Warehouse is a corporate dimensional
model that Data Marts are sourced from
– Data Warehouse is the consolidation of Data
Marts
– Sometimes the Data Warehouse is generated
from Subject area Data Marts
Inmon-ians
• Top-down
– Corporate Information Factory
– Operational systems feed the Data
Warehouse
– Enterprise Data Warehouse is a corporate
relational model that Data Marts are sourced
from
– Enterprise Data Warehouse is the source of
Data Marts
The gist…
• Kimball's approach is easier to implement as
you are dealing with separate subject areas,
but can be a nightmare to integrate
• Inmon's approach has more upfront effort to
avoid these consistency problems, but takes
longer to implement.
Spectre of the Agility
Incremental - Kimball
• In Segments
– Detailed Analysis
– Development
– Deploy
• Long Feedback loop
• Considerable changes
• Rework
• Defects
Waterfall - Inmon
• Detailed Analysis
• Large Development
• Large Deploy
• Long Feedback loop
• Extensive changes
• Many Defects
Data Warehouse
Project
Popular Agile Data Warehouse Pattern
• Analyze data requirements department by
department
• Create Reports and Facts and Dimensions
for each
• Integrate when you do subsequent
departments
The problem
• Conforming Dimensions
– A Dimension conforms when it is
equivalent in structure and content
– Is an account defined by Marketing the same
as Finance?
• Probably not
– If the Dimensions do not conform, this severely
hampers the Data Warehouse
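One way to picture the conformance problem is a small check that compares two departments' versions of the same Dimension. This is only a sketch under a strict reading of "equivalent structure and content"; the DimensionDefinition class, its fields, and the Marketing and Finance sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionDefinition:
    """Hypothetical description of one department's version of a dimension."""
    name: str
    columns: frozenset      # (column_name, data_type) pairs
    member_keys: frozenset  # business keys of the rows it contains


def conformance_issues(a, b):
    """Return the reasons why two dimension definitions do not conform."""
    issues = []
    if a.columns != b.columns:
        # Columns present in only one of the two definitions.
        issues.append(f"structure differs: {sorted(a.columns ^ b.columns)}")
    if a.member_keys != b.member_keys:
        # Members present in only one of the two definitions.
        issues.append(f"content differs: {sorted(a.member_keys ^ b.member_keys)}")
    return issues


marketing_account = DimensionDefinition(
    name="Account (Marketing)",
    columns=frozenset({("account_id", "int"), ("segment", "text")}),
    member_keys=frozenset({"A1", "A2", "A3"}),
)
finance_account = DimensionDefinition(
    name="Account (Finance)",
    columns=frozenset({("account_id", "int"), ("gl_code", "text")}),
    member_keys=frozenset({"A1", "A2"}),
)

issues = conformance_issues(marketing_account, finance_account)
```

An empty list means the two definitions conform; here both the structure and the content differ, which is the "probably not" case above.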
Where is she?
Where is the true Agility?
• Iterations not Increments
• Brutal Visibility/Visualization
• Short Feedback loops
• Just enough requirements
• Working on enterprise priorities – not just for
an individual department
Our Mission
• “Data... the Final Frontier. These are the
continuing voyages of the starship Agile.
Her on-going mission: to explore strange
new projects, to seek out new value and
new clients, to iteratively go where no
projects have gone before.”
The Prime Directive
The Prime Directive
• Is a vision or philosophy that binds the
actions of Starfleet
• Can a Data Warehouse project truly be
Agile without a Vision of either the Business
Domain or Data Domain?
– Essentially it is then just an Ad Hoc Data
Warehouse. Separate components that may fit
together.
– How do we ensure we are working on the right
priorities for the entire enterprise?
A new model
Agile Enterprise Data Model
• Confirms the major entities and the
relationships between them
– 30-50 entities
• Confirms the Business and Data Domains
• Starts the definition of a Data Model that will
be refined over time
– Completed in 1 – 4 weeks
An Agile Enterprise Data Model
• Is just enough to understand the domain so
that the iterations can proceed
• Is not mapping all the attributes
• Is not BDUF
• Is a User Story Map for the Data Domain
• Contains placeholders for refinement
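The "placeholders for refinement" idea can be sketched as a small data structure: entities and relationships are confirmed up front, while attributes are filled in only when an iteration needs them. The Entity and EnterpriseDataModel classes and the sample entity names are hypothetical and serve only to illustrate the shape.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A major business entity; attributes stay empty until an iteration refines it."""
    name: str
    attributes: list = field(default_factory=list)

    @property
    def is_placeholder(self):
        return not self.attributes


@dataclass
class EnterpriseDataModel:
    """Hypothetical container for entities and the relationships between them."""
    entities: dict = field(default_factory=dict)
    relationships: set = field(default_factory=set)

    def add_entity(self, name):
        self.entities.setdefault(name, Entity(name))

    def relate(self, parent, child):
        # Both ends must already be confirmed as entities in the model.
        for name in (parent, child):
            if name not in self.entities:
                raise KeyError(f"unknown entity: {name}")
        self.relationships.add((parent, child))

    def refine(self, name, attributes):
        """Fill in attributes for one entity during an iteration."""
        self.entities[name].attributes.extend(attributes)

    def placeholders(self):
        return sorted(e.name for e in self.entities.values() if e.is_placeholder)


model = EnterpriseDataModel()
for entity_name in ("Account", "Invoice", "Payment"):
    model.add_entity(entity_name)
model.relate("Account", "Invoice")
model.relate("Invoice", "Payment")
model.refine("Invoice", ["invoice_number", "invoice_date", "total"])
remaining = model.placeholders()
```

After the first refinement only Invoice has attributes; Account and Payment remain placeholders on the map until a later iteration reaches them.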
Agile Enterprise Data Model is a Data Map
Agile Enterprise Data Model
• Is
– Our vision
– Our User Story Map for the Data Domain
– Guides our solution
– Our Prime Directive
– A Data Model
Kimball or Inmon?
Spock
• Hybrid approach
– It is only logical
– Needs of the many outweigh the needs of the
few – or the one
Spock Approach
Agile Enterprise
Data Model
Spock Approach
• Agile Enterprise Data Model
• Operational Data Store
• Dimensional Data Warehouse
• Reporting can then be done from:
– Operational Data Store
– Dimensional Data Warehouse
– New Data Marts
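The flow from a relational Operational Data Store into a Dimensional Data Warehouse might look roughly like the sketch below, again using Python's built-in sqlite3 module. The ods_ and dim_/fact_ table names, the columns, and the sample rows are assumptions made for illustration; the project described here used SQL Server and its actual ETL is not shown in this deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical relational (normalized) Operational Data Store tables.
CREATE TABLE ods_account (account_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE ods_payment (
    payment_id INTEGER PRIMARY KEY,
    account_id INTEGER REFERENCES ods_account(account_id),
    paid_on    TEXT,
    amount     REAL
);
INSERT INTO ods_account VALUES (10, 'Acme'), (20, 'Globex');
INSERT INTO ods_payment VALUES
    (1, 10, '2014-01-15', 40.0),
    (2, 10, '2014-02-03', 60.0),
    (3, 20, '2014-02-20', 25.0);

-- Hypothetical dimensional warehouse tables sourced from the ODS.
CREATE TABLE dim_account (
    account_key INTEGER PRIMARY KEY AUTOINCREMENT,
    account_id  INTEGER,
    name        TEXT
);
CREATE TABLE fact_payment (account_key INTEGER, month TEXT, amount REAL);
""")


def load_warehouse(connection):
    """Copy ODS rows into a dimension table and a fact table."""
    connection.execute(
        "INSERT INTO dim_account (account_id, name) "
        "SELECT account_id, name FROM ods_account ORDER BY account_id"
    )
    # Facts look up the surrogate key of the dimension row they belong to.
    connection.execute(
        "INSERT INTO fact_payment "
        "SELECT d.account_key, substr(p.paid_on, 1, 7), p.amount "
        "FROM ods_payment AS p "
        "JOIN dim_account AS d ON d.account_id = p.account_id"
    )


load_warehouse(conn)
monthly = conn.execute(
    "SELECT d.name, f.month, SUM(f.amount) FROM fact_payment AS f "
    "JOIN dim_account AS d ON d.account_key = f.account_key "
    "GROUP BY d.name, f.month ORDER BY d.name, f.month"
).fetchall()
```

Operational reports can still query the ods_ tables directly, while analytical reports read from the fact and dimension tables.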
Benefits of Spock Approach
• Agile Enterprise Data Model
– Validates knowledge of Data Domain
– Ensures later increments don’t uncover data
that was previously unknown and hard to
integrate
• Minimizes rework
– True iterations
• Confirm at high level and then refine
Benefits of Spock Approach
• Operational Data Store
– Model data relationally to provide enterprise
level operational reports
– Consolidate and cleanse data before it is
visible to end-users
– Is used to refine the Agile Enterprise Data
Model
Benefits of Spock Approach
• Dimensional Data Warehouse
– Model data dimensionally to validate domain
– Able to provide data to analytical reports
– Able to provide full historical data and context
for reports
– Able to provide clients with the ability to
generate their own reports easily
How do we work iteratively on
a Data Warehouse?
Increments versus iterations
• Increments
– Series by series – department by department
• Iterations
– Story by story – episode by episode
• Enterprise prioritization
– Work on the highest priority for the enterprise
– Not just within each series/department
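The difference between the two planning styles can be sketched with a toy backlog. The stories, departments, priority numbers, and the capacity of three stories per iteration are all made up for illustration.

```python
# Hypothetical backlog: (department, story, enterprise_priority); 1 is highest.
backlog = [
    ("Finance", "Invoice totals by quarter", 2),
    ("Finance", "Aged receivables", 5),
    ("Marketing", "Campaign revenue by account", 1),
    ("Marketing", "Lead source breakdown", 6),
    ("Operations", "Payment processing volume", 3),
    ("Operations", "Late bill payments", 4),
]


def plan_increments(stories):
    """Increment style: finish one department before starting the next."""
    return sorted(stories, key=lambda s: (s[0], s[2]))


def plan_iterations(stories, capacity):
    """Iteration style: take the highest enterprise priorities, whatever the department."""
    ordered = sorted(stories, key=lambda s: s[2])
    return [ordered[i:i + capacity] for i in range(0, len(ordered), capacity)]


increment_order = plan_increments(backlog)
iterations = plan_iterations(backlog, capacity=3)
first_iteration_departments = {dept for dept, _, _ in iterations[0]}
```

With this data the increment plan starts with a priority 2 and a priority 5 Finance story, whereas the first iteration touches all three departments and covers priorities 1 to 3.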
Captain, we need more Visualization!
His pattern indicates 2 dimensional thinking
Data visualization
Data Visualization
• Is required to:
– Provide a visual data backlog
– Provide a visualization of the data
requirements across dimensions
– Plan iterations
– Lead into the creation of FACT tables and
Dimensions in a Data Warehouse
• For an Agile Data Warehouse we must think
and visualize in more dimensions
• We must create a User Story Map for Data
Requirements that is both intuitive and
informing
– Primarily for clients!
– They should be able to look at it and
understand what is currently being worked on
We need a bigger metaphor
[Figure: Data Hexes labelled Invoices, Sales, Operations, Master Data, Payments, Bills, and Transactions]
Data Hexes
Bill Payment Hex
• Can have up to six dimensions of how
payments are sliced, diced, and aggregated
• Concentric hexes allow for the planning of
iterations
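A Data Hex could be modelled roughly as below. This is a sketch only: the DataHex class, the idea of assigning each dimension to a numbered ring, and the sample dimension names are assumptions, since the deck presents the hexes as a visual planning aid rather than as code.

```python
from dataclasses import dataclass, field

MAX_SIDES = 6  # a hex has six sides, so at most six dimensions


@dataclass
class DataHex:
    """Hypothetical model of a Data Hex: one fact with up to six dimensions."""
    fact: str
    # Maps each dimension name to the ring (iteration) in which it is planned;
    # ring 1 is the innermost hex.
    dimension_rings: dict = field(default_factory=dict)

    def add_dimension(self, dimension, ring):
        is_new = dimension not in self.dimension_rings
        if is_new and len(self.dimension_rings) >= MAX_SIDES:
            raise ValueError(f"{self.fact} hex already has {MAX_SIDES} dimensions")
        if ring < 1:
            raise ValueError("ring numbers start at 1")
        self.dimension_rings[dimension] = ring

    def iteration_plan(self):
        """Group dimensions by ring, innermost first."""
        plan = {}
        by_ring = sorted(self.dimension_rings.items(), key=lambda item: item[1])
        for dimension, ring in by_ring:
            plan.setdefault(ring, []).append(dimension)
        return plan


bill_payment = DataHex("Bill Payment")
for name, ring in [("Date", 1), ("Account", 1), ("Payment Method", 2),
                   ("Biller", 2), ("Channel", 3), ("Region", 3)]:
    bill_payment.add_dimension(name, ring)

plan = bill_payment.iteration_plan()
```

Each key in the plan corresponds to one concentric hex, so the innermost ring lists what the first iteration delivers.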
Cardassian Union
Be careful how you spell that…
Data Modeling Union
• For too long the Data Modelers have not
been integrated with Software Developers
• Data Modelers have been like the
Cardassian Union, not integrated with the
Federation
Issues
• This has led to:
– Holy wars
– Each side expecting the other to follow their
schedule
– Lack of communication and collaboration
• Data Modelers need to join the "United
Federation of Projects"
Tools of the trade
Version Control
Version Control
• If you don't control versions, they will control
you
• Data Models must become integrated with
the source control of the project
– In the same repository of project trunk and
branches
Our Version Experience
• We are using Subversion
• We are using Oracle Data Modeler as our
Modeling tool.
– It has very good integration with Subversion
– Our DBMS is SQL Server
• Unlike other modeling tools, the data model
was able to be integrated in Subversion with
the rest of the project
Shameless plug
• Free
• Subversion Integration
• Supports Logical, Relational, and
Dimensional data models
• Since it is free, the data models can be
shared and refined by all 60+ members of
the development team
• Currently on version 889
Adaptability
Change Tolerant Data Model
• Only add tables and columns when they are
absolutely required
• Leverage Data Domains so that attributes
are created consistently and can be changed
in unison
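The Data Domain idea can be illustrated with a sketch in which attributes refer to a named domain instead of a raw type. The domain names, table definitions, and the render_ddl helper are hypothetical; a modeling tool provides its own mechanism for this.

```python
# Hypothetical data domains: each names a reusable column type.
domains = {
    "money": "DECIMAL(18,2)",
    "short_name": "VARCHAR(50)",
}

# Hypothetical tables whose attributes refer to domains rather than raw types.
tables = {
    "invoice": {"invoice_total": "money", "customer_name": "short_name"},
    "payment": {"payment_amount": "money"},
}


def render_ddl(table_defs, domain_defs):
    """Generate CREATE TABLE statements by resolving each attribute's domain."""
    statements = []
    for table, attributes in table_defs.items():
        columns = ", ".join(
            f"{column} {domain_defs[domain]}" for column, domain in attributes.items()
        )
        statements.append(f"CREATE TABLE {table} ({columns});")
    return statements


before = render_ddl(tables, domains)
# Changing the domain once changes every attribute that uses it.
domains["money"] = "DECIMAL(19,4)"
after = render_ddl(tables, domains)
```

Both money columns pick up the new precision from a single edit, which is what "changed in unison" means in practice.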
Change Tolerant Data Model
• Don't model the data according to the
application's Object Model
• Don't model the data according to source
systems
• These items will change more frequently
than the actual data structure
• Your Data Model and Object Model should
be different!
Re-Factoring – Read It
Create the plan for how you
will re-factor
Plan for:
• Versioning – Major and minor
• Adaptability – How the data model will adapt
to major changes
• Refinement – How will iterations be planned
and executed
• Re-Factoring – How will the data design be
re-factored
Assimilate
Assimilate
• Assimilate Version Control, Adaptability,
Refinement, and Re-Factoring into core
project activities
– Stand ups
– Continuous Integration
– Check outs and Check Ins
• Make them part of the standard activities –
not something on the side
Our experience
Current Stardate
• We are reaching the end of Operational Data
Store ETL Development
– ODS has been refined as we progress – no
major changes
– Data Warehouse Dimensional Model is also
complete
– Initial Reports analysis is complete and work on
the report backlog will start soon
Summary
• Use an Agile Enterprise Data Model to
provide the vision
• Strive for Iterations over Increments – Spock
Approach assists in this
• Use Data Hexes to provide brutal visibility
and Iteration planning
• Plan and Integrate processes for Versioning,
Adaptability, Refinement, and Re-Factoring
What doesn't change?
Leadership
Leadership
• “If you want to build a ship, don't drum up
people together to collect wood and don't
assign them tasks and work, but rather teach
them to long for the endless immensity of the
sea.” ~ Antoine de Saint-Exupery
Leadership
• “[A goalie's] job is to stop pucks, ... Well, yeah, that's
part of it. But you know what else it is? ... You're
trying to deliver a message to your team that things
are OK back here. This end of the ice is pretty well
cared for. You take it now and go. Go! Feel the
freedom you need in order to be that dynamic,
creative, offensive player and go out and score. ...
That was my job. And it was to try to deliver a
feeling.” ~ Ken Dryden