Jim Gallo
Senior Data Warehouse Architect Information Control Corporation
October 21, 2010
Tools and Techniques for Accurately Estimating BI/DW Projects
Agenda
• The Business – IT Conundrum
• Requirements Gathering – the Goldilocks Approach
• Deriving Information for the Estimate
• The Estimating Process
• Building an Estimating Model
• Risk Abatement and Uncertainty
• Improving the Estimating Process
The Business – IT Conundrum
• O&M Funds
• Capital Funds
The color of money matters!
Information to be Gathered – What’s Important?
• Business Definition
  o Goals/Measures
  o Business Problems
  o Questions to Be Answered
• Scope
  o Ad Hoc
  o Canned Reports
  o Self-service Reports
  o KPI/Scorecard
  o Dashboard
• Queries and Reports
  o Define layout and content
  o Get samples, if available
• Audience Profile
  o Target Audience
  o Types of Users
  o Number of Users
  o Frequency of Access/Concurrency
• Analysis
  o Facts and Dimensions
  o Hierarchies
  o Granularity
• Content
  o Entities
  o Attributes
  o Sources (internal and external)
• Output Format
  o Reporting Tool
  o Web (HTML, etc.)
  o Other (.pdf, .xls, .doc, etc.)
• Operational
  o Availability
  o Refresh frequency
• Data Quality
• Security
• History and Retention
The Goldilocks Approach
Jim’s Postulate
The total effort is directly related to 3 and only 3 variables. Any guesses?
1. Number of data elements
2. Number of source files
3. Expectations and reality of data quality
Why is the postulate important?
• Fact: 60% - 80% of BI/DW project hours deal specifically with DATA INTEGRATION.
• Fact: DATA INTEGRATION represents 70% - 90% of project RISK.
Derivations from the 3 Key Variables
If you can adequately estimate the number of fields and source files, you can then derive, with relative certainty, hours for:
• Data modeling
• Data quality assessment and abatement
• ETL design and development
• Physical database design
As well as the majority of hours needed for:
• Business involvement
• Testing
Quantifying the Key Variables
[Figure: example dimension hierarchies, target architectures, and requirements stages]
• Commonwealth, County, Region, City, Zip Code
• Year, Quarter, Month, Week, Day
• Male, Female, Unknown
• 16 - 25, 26 - 35, 36 - 45, 46 - 55, >55
• MOLAP, ROLAP (Star), Full Relational
• Source Systems
• Initial Requirements Definition
• Detailed Requirements Definition
• Focus Here
Business Questions – Retail Bank - Marketing
1. How profitable is the customer?
2. What do we know about our customers and how can we know them better?
3. How can we retain desirable customers?
4. The specific list of questions that the group would like to be able to answer quickly and in a self-service fashion includes:
5. How do we define profitability of sales?
6. What customers are using what channels and why? How profitable are they?
7. Who are my at risk customers?
8. What fees are we refunding at the associate and account level?
9. What fees are we retaining at the associate and account level?
10. What is the customer’s spending behavior and how do we make customers “sticky”?
11. Should we have more quantity or quality of sales?
12. How can we know our customers better across channels?
13. Are we performing transactions accurately?
14. Are customers in the right products?
15. What are my customers’ activities?
16. Who are the best potential customers so that we can focus our sales efforts on them?
17. Which methods are the most/least expensive from a channel perspective?
18. How do we drive customers to the least expensive channel and retain them?
19. What benefits are received by HNB when fees are waived?
20. Are we staffing resources accurately?
21. Are we forecasting and measuring our efficiency?
22. How are we managing costs, forecasting and how do we measure this?
23. What is the customer’s profitability potential?
24. across all mechanisms at customer accounts?
Quantifying the Key Variables
Measure | Time | Geography | Customer | Bank Product | Channel | Bank Organization/Associate | Customer Demographics | Consumer Product Category
Profitability ($): x x x x x x x
Revenue ($): x x x x x x x
Cost ($): x x x x x x x
Products (#): x x x x x x x
Purchase Transactions (#): x x x x x x x x
Purchase Transactions ($): x x x x x x x x
Refunds/Waived Fees ($): x x x x x x x
Retention (#): x x x x x x x
Source System Mapping
Initial or Detailed Requirements Definition.
From here you can estimate the number of sources and “guesstimate” data quality and ETL complexity
The Estimating Process
[Diagram stages: Requirements, Primary Derivations, Secondary Derivations]
• Requirements
  o Questions
  o Measures
  o Goals
• Facts & Dimensions (Facts Qualifier Matrix)
• Source System Map
• Warehouse
  o Tables (DM x3)
  o Attributes (tables x15)
• Marts
  o Tables
  o Attributes (tables x15)
• Cubes
• Initial Load
  o Columns
  o Rows
• History
  o Number years
  o Variability vs. Initial Load
• Data Quality Profile
  o Columns
  o Rows
• Semantic Layers
• Reports
• KPIs
• Dashboard Components
• Data Integration
• Information Delivery
• Business Analysis
• Project Management
• Testing
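The multipliers noted on the diagram ("Tables (DM x3)" and "Attributes (tables x15)") can be turned into a small sizing calculation. The sketch below is one possible reading, not the presenter's actual model: it assumes one data mart table per fact or dimension, three warehouse tables per data mart table, and fifteen attributes per table. The function and field names are hypothetical.

```python
from dataclasses import dataclass

# Rule-of-thumb multipliers taken from the slide labels.
# How they are applied here is an assumption, not a confirmed method.
WAREHOUSE_TABLES_PER_MART_TABLE = 3   # "Tables (DM x3)"
ATTRIBUTES_PER_TABLE = 15             # "Attributes (tables x15)"


@dataclass
class SizeEstimate:
    mart_tables: int
    mart_attributes: int
    warehouse_tables: int
    warehouse_attributes: int


def derive_sizes(fact_count: int, dimension_count: int) -> SizeEstimate:
    """Derive rough table and attribute counts from a facts-and-dimensions tally."""
    if fact_count < 0 or dimension_count < 0:
        raise ValueError("counts must be non-negative")
    # Assumption: one data mart table per fact or dimension.
    mart_tables = fact_count + dimension_count
    warehouse_tables = mart_tables * WAREHOUSE_TABLES_PER_MART_TABLE
    return SizeEstimate(
        mart_tables=mart_tables,
        mart_attributes=mart_tables * ATTRIBUTES_PER_TABLE,
        warehouse_tables=warehouse_tables,
        warehouse_attributes=warehouse_tables * ATTRIBUTES_PER_TABLE,
    )


if __name__ == "__main__":
    # Illustrative counts only; take real ones from your own matrix.
    print(derive_sizes(fact_count=2, dimension_count=8))
```

The attribute counts produced this way are the kind of "number of data elements" figure that the downstream hour estimates depend on, so the multipliers are worth recalibrating against completed projects.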
Build an Estimating Model – ETL Example
[Spreadsheet screenshot: ETL estimating worksheet]
• Work with data modeler
• Target Columns
• Data Sources
• Complexity
• Staging Area
• Populate DW
• Populate DM
• Hours Summary
Tabs
• Total Hours (all)
• Data Modeling
• DBA
• Source System Profile
• Data Quality
• ETL
• BI and Reporting
• Training
• Testing
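A worksheet like this can be approximated in a few lines of code. The sketch below assumes that Target Columns, Data Sources, and Complexity are the inputs and that hours are computed per stage (Staging Area, Populate DW, Populate DM) and then summed. Every rate in it is a hypothetical placeholder, not a figure from the presentation; calibrate them from your own project history.

```python
# Hypothetical hours per target column, by ETL complexity.
HOURS_PER_COLUMN = {"low": 0.5, "medium": 1.0, "high": 2.0}
# Hypothetical fixed hours for connecting to and profiling one data source.
HOURS_PER_SOURCE = 16.0


def etl_hours(target_columns: int, data_sources: int, complexity: str) -> float:
    """Estimate ETL hours for one stage from the three worksheet inputs."""
    if complexity not in HOURS_PER_COLUMN:
        raise ValueError(f"unknown complexity: {complexity!r}")
    if target_columns < 0 or data_sources < 0:
        raise ValueError("counts must be non-negative")
    return (target_columns * HOURS_PER_COLUMN[complexity]
            + data_sources * HOURS_PER_SOURCE)


def hours_summary(stages: dict) -> dict:
    """Compute hours per stage plus a 'Total' entry."""
    summary = {name: etl_hours(**inputs) for name, inputs in stages.items()}
    summary["Total"] = sum(summary.values())
    return summary


if __name__ == "__main__":
    # Illustrative inputs only.
    stages = {
        "Staging Area": {"target_columns": 200, "data_sources": 4, "complexity": "low"},
        "Populate DW": {"target_columns": 200, "data_sources": 1, "complexity": "high"},
        "Populate DM": {"target_columns": 100, "data_sources": 1, "complexity": "medium"},
    }
    for stage, hours in hours_summary(stages).items():
        print(f"{stage}: {hours:.0f} h")
```

Keeping the rates in one place makes it easy to adjust them later when actual hours are compared with the estimate.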
Build an Estimating Model – Summarization
• Break/fix after testing cycles
• Schedule and FTE Approximation
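The summarization step can be illustrated with simple arithmetic: add an allowance for break/fix work after each testing cycle, then convert total hours into an approximate schedule and FTE count. This is a minimal sketch under stated assumptions; the 10% break/fix rate per cycle and the 140 productive hours per FTE-month are hypothetical defaults, not values from the presentation.

```python
import math


def add_break_fix(base_hours: float, testing_cycles: int,
                  break_fix_rate: float = 0.10) -> float:
    """Add a break/fix allowance for each testing cycle (rate is hypothetical)."""
    if testing_cycles < 0 or break_fix_rate < 0:
        raise ValueError("testing_cycles and break_fix_rate must be non-negative")
    return base_hours * (1 + break_fix_rate * testing_cycles)


def schedule_months(total_hours: float, fte: float,
                    hours_per_fte_month: float = 140.0) -> int:
    """Approximate duration in whole months for a given team size."""
    if fte <= 0 or hours_per_fte_month <= 0:
        raise ValueError("fte and hours_per_fte_month must be positive")
    return math.ceil(total_hours / (fte * hours_per_fte_month))


def fte_needed(total_hours: float, months: float,
               hours_per_fte_month: float = 140.0) -> float:
    """Approximate team size needed to finish within a fixed number of months."""
    if months <= 0 or hours_per_fte_month <= 0:
        raise ValueError("months and hours_per_fte_month must be positive")
    return total_hours / (months * hours_per_fte_month)


if __name__ == "__main__":
    total = add_break_fix(base_hours=1000, testing_cycles=2)
    print(f"Hours including break/fix: {total:.0f}")
    print(f"Months with 2 FTE: {schedule_months(total, fte=2)}")
    print(f"FTE needed for 6 months: {fte_needed(total, months=6):.2f}")
```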
Build an Estimating Model – Smoothing and Sequencing
• Estimated Hours
• Planned Hours (smoothed)
• Variance (Estimated vs. Planned)
The Giggle Test and The Law of Big Numbers
• 70% – 77% of hours have been given due consideration
• 70% – 90% of RISK has been thought through
Assume a 1-year project
Risk Abatement and Uncertainty
• Identify Risks
• Identify Unknowns
• Create a Risk Abatement Action Statement
• Insert Hours and Tasks into Project Plan
Risk Abatement and Uncertainty (continued)
So you’re still not feeling good about the plan?
If you still believe that all risks and uncertainties have been accounted for, apply a management contingency.
A 15% - 20% contingency is not uncommon and is a standard practice in most Project Management methodologies.
Therefore, total estimate = Detailed estimate * 1.15 or * 1.20
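The contingency formula above is easy to apply as a range. The sketch below computes the low and high totals for a detailed estimate using the 15% and 20% rates from the slide; the function name and the 10,000-hour input are illustrative assumptions.

```python
def total_estimate(detailed_hours: float, contingency: float) -> float:
    """Apply a management contingency: total = detailed * (1 + contingency)."""
    if detailed_hours < 0:
        raise ValueError("detailed_hours must be non-negative")
    if not 0 <= contingency <= 1:
        raise ValueError("contingency must be a fraction between 0 and 1")
    return detailed_hours * (1 + contingency)


if __name__ == "__main__":
    detailed = 10_000  # illustrative detailed estimate, in hours
    low = total_estimate(detailed, 0.15)
    high = total_estimate(detailed, 0.20)
    print(f"Total estimate: {low:.0f} to {high:.0f} hours")
```

Presenting both bounds keeps the contingency visible as a separate decision rather than burying it in the detailed task hours.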
Improving The Estimation Process
[Diagram: Continuous Improvement Cycle]
• Develop Estimating Model
  o Requirements
  o Tasks
  o Assumptions
  o Risks
• Clarity Initial Estimate
• Project Plan
• Monitor Project
• Compare Actual Hours to Estimate
• Variances
• List Variances by Task and Role
• Determine Cause of Variances
• Adjust Model to Account for Observed Variances
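The comparison and adjustment steps of this cycle can be sketched in code. The example below lists variances by task and role and derives a per-task adjustment factor (actual hours divided by estimated hours) that could be fed back into the estimating model. The record layout, function names, and sample figures are hypothetical, and the ratio-based adjustment is one simple choice rather than the presenter's stated method.

```python
from collections import defaultdict

# Each record: (task, role, estimated_hours, actual_hours). Layout is hypothetical.
Record = tuple[str, str, float, float]


def variances_by_task_and_role(records: list[Record]) -> dict[tuple[str, str], float]:
    """Return actual minus estimated hours for each (task, role) pair."""
    totals: dict[tuple[str, str], list[float]] = defaultdict(lambda: [0.0, 0.0])
    for task, role, estimated, actual in records:
        totals[(task, role)][0] += estimated
        totals[(task, role)][1] += actual
    return {key: actual - estimated for key, (estimated, actual) in totals.items()}


def adjustment_factors(records: list[Record]) -> dict[str, float]:
    """Return actual/estimated per task; multiply model rates by this factor."""
    estimated_by_task: dict[str, float] = defaultdict(float)
    actual_by_task: dict[str, float] = defaultdict(float)
    for task, _role, estimated, actual in records:
        estimated_by_task[task] += estimated
        actual_by_task[task] += actual
    return {
        task: actual_by_task[task] / estimated
        for task, estimated in estimated_by_task.items()
        if estimated > 0  # skip tasks that had no estimate to compare against
    }


if __name__ == "__main__":
    # Illustrative figures only.
    history = [
        ("ETL", "Developer", 100, 130),
        ("ETL", "Developer", 50, 50),
        ("Testing", "QA", 40, 30),
    ]
    print(variances_by_task_and_role(history))
    print(adjustment_factors(history))
```

The factors only say how far off the estimate was; determining the cause of each variance still has to happen before the model is adjusted.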
Contact Information
• If you have further questions or comments:
Jim Gallo
Senior Data Warehouse Architect Information Control Corporation
(614) 523-3070