8/2/2019 David Plotkin
Quality through Data Governance
David Plotkin
Finance Data Quality Manager, Bank of America
Data Quality 2012 Asia Pacific Congress
Last revised: 01/21/2012
Agenda
Introduction
[…] (value add) to the Enterprise.
How to implement Data Governance:
Figuring out what you've got
Adding DG to the Project Methodology
The tools you'll need
Setting up a Communications Plan
Measuring Success
Data Challenges: Data Needs to be Managed
Collecting Data Definitions for isolated databases is not enough:
Definitions written in haste by project staff
Not rationalized across the Enterprise
Formal Enterprise-Wide Data Governance, including a glossary
Ownership at a granular level of detail
Consistent names & definitions across all apps and databases
Data Governance involved in all aspects of Data Quality
Understanding Data Governance
Data Governance is the execution of authority over data management:
It's all about data ownership at the organizational level (Data Governance board)
And decision making at the data element level (data stewardship)
The exercise and enforcement of decision-making authority over the management of data assets and the performance of […].
(Robert Seiner, TDAN and KIK Consulting)
Ensuring that the enterprise's data assets are formally managed.
Coordinating communication to achieve collective goals through collaboration.
(Steven Adler, IBM)
What is Data Governance (Practical)?
Represents the Enterprise in all things data and metadata
Metadata: Mandates capture of this information
Data Quality: Issues, fixes, rules, and projects
Champions data quality improvement projects
Instigates methodology changes to ensure capture of data and metadata
Owns the data and metadata
Driven by relatively high-ranking individuals who can make decisions for the Enterprise.
Data Governance Value
Data Governance must tie back to the universal value drivers:
Increase revenue and value
Manage cost and complexity
Ensure survival through attention to risk, compliance, security, and privacy (Gwen Thomas)
Think about:
How much time is wasted arguing over ill-defined or undefined data elements?
How many bad decisions are made due to undefined elements and poor quality?
Enterprise Data Governance in a Nutshell
Goal: ensure that data is well-defined, accurate, consistent, and meets business needs. Data Governance provides project support along with an evolving set of policies, procedures, and guidelines to achieve these goals.
Diagram: roles are Everyone, the Data Stewardship Committee, the Data Stewards (Business and Project), and the Data Governance Team. The flow runs Identify Data Elements and Issues → Assign Owner → Define, Communicate → Assess, Make Decisions → Process, Decisions, Results. Activities shown:
Ownership is by Business Function; the escalation path is to the Data Governance Council
Inventory shared data, requirements, and issues
Define Data Elements, Valid Values & Derivation Rules
Perform data quality analysis; work with SMEs and Technical Data Stewards; choose DQ remediation
Publish policies, processes and organization; coordinate committees
Publish definitions and valid values to the Business Glossary
Work with project teams to align deliverables to definitions
Publish data quality issues and resolution decisions
The DG SharePoint site facilitates the work.
The DGI Data Governance Framework
Enables overcoming challenges and achieving common goals
Diagram: data problems undermine or inhibit the business goals of Revenue Generation, Cost Reduction / Avoidance, Compliance & Risk, and Strategy & Business. The inhibitors and causes shown include:
Can't easily reduce data errors; high potential remediation costs
Non-compliant with state & Federal regulations
Can't easily customize product offerings and bundles; high infrastructure cost
Can't easily identify high value customers, key relationships & hierarchies, or cross-sell and up-sell opportunities
Can't easily consolidate data from silos; no single view of customer
Ad-hoc data quality methods; lack of data retention policies
Higher than necessary probability of data misuse; tarnished brand reputation
Exposure of personally identifiable information in non-production systems
Courtesy of Steven Adler, IBM
Data Governance and Data Stewardship
A data stewardship program is a key part of an overall data governance program.
It is the operational aspect of data governance.
If data governance is the execution of authority over the management of data, then data stewardship is the formalized accountability for the management of that data. (courtesy of Robert Seiner, KIK Consulting)
What do we mean by a Data Steward?
A key representative in a specific business area who is accountable for the quality and use of that data throughout […].
[…] the data and the decision-makers about the data (Sherry Michaels, Erie Insurance)
Data stewards are the ones who can reach into the organization and pull out the knowledge (and […]).
Data Stewardship is NOT a job; it is the formalizing of data responsibilities that are likely in place in an informal way.
Data Stewardship involves specific tasks for which the stewards must be trained.
Data Stewardship: Needed for Data Quality
A data quality initiative introduces new constraints on the ways that individuals create, access, use, modify, and retire data. To ensure that these constraints are not violated, the data governance and data quality staff […].
Data Quality policies: introduced and monitored
Enough metadata to support the data quality processes
Incorporation of data quality into system design by the developers
[…] data (not just what is needed for the source system).
Identifying important business impacts of poor quality.
Data Stewards are Accountable
Data Stewardship establishes accountability for:
Data definitions and derivations
Data quality rules and their enforcement
Key role in improving data quality
Data-related communications
Data element rationalization
Contributing to data-related policies and procedures.
Understanding the downstream uses of their data and how proposed changes impact those uses.
Their decisions are enforceable
Oversees all data-related work in their business function
Represents their business function as the single point of contact.
What Happens Without Data Governance?
Different parts of the organization:
Use their own definitions for data, so they may enter different values. This leads to bad decisions, numbers that don't match, etc.
Derive their numbers based on different calculations, and then the numbers don't match.
Make different determinations of the data quality, leading to different degrees of confidence in the numbers (or even a […]).
Long arguments about meaning and quality.
Improving Data Quality is very hard except in limited […].
The organization without Data Governance
Data Quality without Data Governance
Data quality deteriorates over time
Data producers are incented to be fast, but not necessarily accurate. Stewards must champion changing the business priorities.
Data quality rules are not defined. Stewards can define the rules and required quality levels.
Individuals make their own corrections. Stewardship exposes this and the costs of these processes.
[…] demand (and demand funding for) enforcement of DQ rules during system loads.
Data Governance Organization
Org chart (Business and IT sides; PT = Part Time, FT = Full Time):
Chief Data Steward
Data Governance Business Sponsor (PT); Data Governance IT Sponsor (PT)
Data Owners; Enterprise Data Steward (FT); Enterprise Application Owner (Delivery Manager)
Business Data Stewards (PT); Technical Data Stewards (PT); Project Data Stewards (FT); Data Domain Stewards (FT); Application Domain Owner (Business Partners) (PT)
Groups: Data Stewardship Council; Data Governance Committee
The Stewardship Organization
Diagram: the Data Stewardship Council and the Enterprise Data Steward, with business functions including Sales, Membership, Insurance, HR, Call Center, Marketing, Financial, IT, Travel, Actuarial, Claims, Underwriting, and Operations.
Data Stewardship Committee
Functional body for data governance program
Apply data standards, policies, and principles.
Participate in and contribute to data governance processes.
Evaluate effectiveness of processes.
Contribute to and ensure completeness of data-related documentation (metadata).
Make decisions on ownership of data.
Communicate data governance vision & objectives to […].
Shape data governance design and implementation; ensure alignment to the business.
Communicate decisions of the committee.
Why Add Data Governance to Project Methodology?
DG tasks benefit from scope limitations of a project.
Limited block of data
Limited number of source systems
Management of tasks and deliverables benefits from professionals (Project Managers).
PMs will bird-dog the deliverables and ensure they get done.
PMs will schedule the tasks and allocate the resources.
Subject matter experts are assigned.
Time is allocated to work on the project tasks.
What needs to be added to Project Methodology?
Integration with Project Management
[…] quality rules)
Solution Evaluation components
QA Components (including Data Quality Assurance)
Data Governance Value to a Project
Collection of data definitions
Building a body of stewarded and understood data definitions benefits all those in the enterprise who use the data and alleviates confusion when discussing the data. This also helps with conversions.
Collection of data derivations
Building a body of stewarded and validated data derivations leads to a common way of calculating numbers. The result is not only that the project delivers results that match the official calculation method, but also that data analysts across the company spend much less time attempting to reconcile reports.
Identification and resolution of data quality issues
Poor data quality can keep a project from going into production. The risk to a project is lessened by early identification (and, where possible, resolution) of data quality issues. Data profiling measures specifics of the data and provides a comparison between what the data looks like and what the data quality rules say it should look like.
Adjust Project Methodology: Data Quality
Collect (during Analysis and Design):
Data Quality issues and rules for measuring quality (meet guidelines)
Data Quality rules: When the data goes bad, how do you know?
Information to verify the issues and quantify severity
Project resources:
Guided by Project Data Steward, collected from business analysts/SMEs
Documented in Mapping document or DQ rule dictionary
Measure and validate rules against data using Data Profiling:
Quantifies the extent of the data quality problem.
Rules may need to be restated if fit to data is poor.
Data is examined and results reported back to the business.
Determination must be made as to fitness for use.
Metrics:
Total DQ rules stated and validated
Fit of data to stated rules
Change in quality of data over time
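The metrics above can be made concrete with a minimal Python sketch. It assumes each stated DQ rule can be written as a predicate over one record and that "fit" means the share of records that pass a rule. The rules, field names, and records are hypothetical.

```python
from typing import Callable, Dict, List

# A record is one row of data; a rule is a predicate returning True when the
# record meets the stated data quality rule (an assumption of this sketch).
Record = Dict[str, object]
Rule = Callable[[Record], bool]


def rule_fit(records: List[Record], rules: Dict[str, Rule]) -> Dict[str, float]:
    """Fraction of records that satisfy each stated rule (1.0 means a perfect fit)."""
    if not records:
        raise ValueError("cannot measure fit on an empty data set")
    return {
        name: sum(1 for rec in records if rule(rec)) / len(records)
        for name, rule in rules.items()
    }


def fit_change(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, float]:
    """Change in fit per rule between two profiling runs (positive = improved)."""
    return {name: after[name] - before[name] for name in before if name in after}


# Hypothetical rules stated by a Project Data Steward for a policy table.
rules: Dict[str, Rule] = {
    "policy_id is mandatory": lambda r: r.get("policy_id") not in (None, ""),
    "state is a 2-letter code": lambda r: isinstance(r.get("state"), str)
    and len(r["state"]) == 2
    and r["state"].isalpha(),
}

# Hypothetical extracts from two profiling runs.
january = [
    {"policy_id": "P1", "state": "CA"},
    {"policy_id": "", "state": "California"},
    {"policy_id": "P3", "state": "NV"},
    {"policy_id": "P4", "state": None},
]
february = [
    {"policy_id": "P1", "state": "CA"},
    {"policy_id": "P2", "state": "CA"},
    {"policy_id": "P3", "state": "NV"},
    {"policy_id": "P4", "state": None},
]

jan_fit = rule_fit(january, rules)
feb_fit = rule_fit(february, rules)
print("Rules stated:", len(rules))
print("Fit in January:", jan_fit)
print("Change in fit:", fit_change(jan_fit, feb_fit))
```

Comparing fit across successive profiling runs gives the "change in quality of data over time" metric. A rule whose fit stays poor may need to be restated, rather than the data being fixed.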
Adjust Project Methodology: QA
QA test cases written using Data Quality rules
Test cases run as part of regular QA process
Data defects tracked in system, prioritized, and worked just like any other defects.
Some business rules and relationships may show up as data defects (e.g., policies without drivers).
QA test cases written using metadata (definitions)
Do valid value sets show values expected based on definitions and stated value sets?
Do screens show multiple fields that are actually the same thing (due to acronyms)?
Has the metadata been entered into the EMR and glossary?
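Data quality rules and stated value sets can be turned into ordinary QA test cases. The sketch below uses Python's standard unittest module. The tables, the status value set, and the "every policy has a driver" relationship rule are hypothetical examples of the kinds of rules described above.

```python
import unittest

# Hypothetical extracts of two related tables, as a QA job might load them.
POLICIES = [
    {"policy_id": "P1", "status": "ACTIVE"},
    {"policy_id": "P2", "status": "LAPSED"},
]
DRIVERS = [
    {"driver_id": "D1", "policy_id": "P1"},
    {"driver_id": "D2", "policy_id": "P2"},
]
# Hypothetical valid value set for policy status, as published with its definition.
VALID_STATUSES = {"ACTIVE", "LAPSED", "CANCELLED"}


def invalid_statuses(policies):
    """Status values present in the data but missing from the stated value set."""
    return sorted({p["status"] for p in policies} - VALID_STATUSES)


def policies_without_drivers(policies, drivers):
    """Relationship rule: every policy should have at least one driver."""
    covered = {d["policy_id"] for d in drivers}
    return sorted(p["policy_id"] for p in policies if p["policy_id"] not in covered)


class PolicyDataQualityTests(unittest.TestCase):
    """Data quality rules expressed as ordinary QA test cases."""

    def test_status_uses_stated_valid_values(self):
        self.assertEqual(invalid_statuses(POLICIES), [])

    def test_every_policy_has_a_driver(self):
        self.assertEqual(policies_without_drivers(POLICIES, DRIVERS), [])
```

These tests can be run with `python -m unittest` alongside the rest of the QA suite. A failure would be logged as a data defect and prioritized like any other defect.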
Data Governance and Data Quality
A primary deliverable for Data Governance is improved data quality
This should go beyond just responding to DQ issues (reactive) and include defining, finding, and fixing DQ issues before the customer does (proactive).
Should include Data Quality Analysis and Reconciliation
Needs to be driven by the Business Impacts of poor quality: some data may be bad, but if it doesn't stop important business processes, MOVE ON.
The Data Quality Improvement Cycle
(1) Identify and measure how poor data quality obstructs business objectives
(2) Define business-related data quality rules & performance targets
(3) Design quality improvement processes that remediate process flaws
(4) Implement quality improvement methods and processes
(5) Monitor data quality against targets
Business Results Metrics Example
Cost of poor quality data to your business:
Calling/Mailing costs: How many times did we contact someone who already had a particular type of policy or who was not eligible for that type of policy? How much postage/time was wasted?
Loss of productivity/opportunity cost: How many policies could have been sold if agents had only contacted eligible policyholders? How much would those policies have been worth?
Loss of business cost: How many policyholders canceled their policies because we didn't understand their needs or didn't appear to value their business (a survey can give you an idea)? What is the lost lifetime value of those customers?
Compliance cost: How much did we spend responding to regulatory or audit requests (demand!)? How much of that was attributable to poor data quality or information not available?
Steps to Data Quality Analysis and Reconciliation
Data Profiling
Reviewing the data quality analysis with Data Stewards to determine acceptable ranges of data quality, associated risk, transformation guidelines, and recommendations on data cleansing.
The development of required ETL processing to cleanse the data.
You only want to do this once, after the process has been fixed. Or that's the theory, anyway.
Collecting the Data Quality Rules
Get the rules from the Data Stewards
Create a template to collect the quality rules:
Mandatory, optional, valid values, valid range, data type, patterns
Relationships between data elements
Relationships between records in different tables
Guide conversations with stewards to gather rules
Helping the business help us define what we mean by good quality for a data element.
Can help to pre-profile the data (do a sample extract) to show the stewards what is actually present now.
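Keeping the rule template as structured records, rather than free text, lets the same rules later drive profiling and QA. The Python sketch below assumes one template row per data element, with the rule types listed above. The class name, field names, and example rules are hypothetical. Relationship rules are left out for brevity.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ElementRule:
    """One row of a hypothetical rule template for a single data element."""
    element: str
    mandatory: bool = False                 # False means the element is optional
    data_type: Optional[type] = None        # expected Python type of the value
    valid_values: Optional[set] = None      # allowed codes, if the element has a value set
    valid_range: Optional[Tuple] = None     # (low, high), inclusive
    pattern: Optional[str] = None           # regular expression the whole value must match

    def violations(self, value) -> List[str]:
        """Describe every rule this value breaks; an empty list means it passes."""
        if value is None or value == "":
            return [f"{self.element}: missing mandatory value"] if self.mandatory else []
        if self.data_type is not None and not isinstance(value, self.data_type):
            # Later checks assume the right type, so stop here.
            return [f"{self.element}: expected {self.data_type.__name__}"]
        problems = []
        if self.valid_values is not None and value not in self.valid_values:
            problems.append(f"{self.element}: {value!r} not in valid values")
        if self.valid_range is not None:
            low, high = self.valid_range
            if not low <= value <= high:
                problems.append(f"{self.element}: {value!r} outside {low}..{high}")
        if self.pattern is not None and not re.fullmatch(self.pattern, str(value)):
            problems.append(f"{self.element}: {value!r} does not match pattern")
        return problems


# Hypothetical template rows.
zip_rule = ElementRule("zip_code", mandatory=True, data_type=str, pattern=r"\d{5}(-\d{4})?")
age_rule = ElementRule("driver_age", data_type=int, valid_range=(16, 120))
status_rule = ElementRule("policy_status", mandatory=True, valid_values={"ACTIVE", "LAPSED"})

print(zip_rule.violations("9410"))
print(age_rule.violations(None))
```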
What is Data Profiling?
Data Profiling is a process whereby one examines the data available in an existing database and collects statistics and information about that data. (Wikipedia, http://en.wikipedia.org/wiki/data_profiling)
Data Profiling is the use of analytical techniques to discover the structure, content, and quality of data.
Data Profiling is a set of algorithms for statistically […] within a data set as well as exploring relationships that exist between data elements or across data sets. (David Loshin, Knowledge Integrity, Inc.)
What is Data Profiling (continued)?
Uses both real data and metadata to determine the quality of data.
Identified source data requires both a detailed analysis of the raw data values currently stored in […] existing metadata, to determine the actual meaning, descriptions and relationships that should be found in the data.
Data profiling should be used whenever data is being converted, migrated, warehoused or mined.
Can help discover business rules embedded within data sets, which can be used for ongoing inspection and monitoring.
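Commercial profiling tools collect many statistics, but a few of the basic column-level ones can be sketched in plain Python. This is an illustration, not any particular tool's algorithm. It assumes the values arrive as a list, treats None and blank strings as missing, and compares min/max as text. It also uses a hypothetical "shape" pattern in which digits become 9 and letters become A.

```python
from collections import Counter
from typing import Any, Dict, List


def value_pattern(value: str) -> str:
    """Reduce a value to its shape: digits become 9, letters become A, other characters stay."""
    return "".join("9" if ch.isdigit() else "A" if ch.isalpha() else ch for ch in value)


def profile_column(values: List[Any]) -> Dict[str, Any]:
    """Collect simple profiling statistics for one column of raw values."""
    present = [v for v in values if v is not None and str(v).strip() != ""]
    text = [str(v) for v in present]
    return {
        "rows": len(values),
        "null_or_blank": len(values) - len(present),
        "distinct": len(set(text)),
        "min": min(text, default=None),   # compared as text, not as numbers or dates
        "max": max(text, default=None),
        "top_values": Counter(text).most_common(3),
        "top_patterns": Counter(value_pattern(t) for t in text).most_common(3),
    }


# Hypothetical phone-number column with mixed formats and a placeholder value.
phones = ["415-555-0100", "4155550101", None, "", "415-555-0100", "N/A"]
for stat, result in profile_column(phones).items():
    print(stat, result)
```

A placeholder such as "N/A" shows up as a pattern of its own. That is the kind of finding to take back to the stewards when stating or restating the rules.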
General Benefits from Data Profiling
Identify or validate availability of information.
Rapid assessment of which fields are consistently populated against model expectations.
Improve predictability of project timelines.
Focus data quality efforts where they are really needed.
Lower the risk of design […].
Improve visibility to quality of […]
[…] decision making.
[…] migration testing support.
[…] transitional data stores.
Support compliance and audit requirements.
Identify transformation rules for migration and integration.
Danette McGilvray, Granite Falls Consulting, Inc.
Benefits: Saves the Programmers' time and effort
Programmers already examine the data to make sure their work doesn't lead to code/load/explode.
If they believe what they are told about the data contents, it invariably leads to code failures.
[…] decide whether to code around the bad data or fix it.
Profiling puts a rigorous process in place to prevent the […].
Real example: 24 defects, $556,000 in development time, $142,000 in QA time, and a 6-month delivery delay because of unexpected data in the feed.
Scope of the Data Profiling Process
Not just done on raw data elements:
Includes counts and aggregations
Other derived values
Can be run on:
Individual columns
Across columns in a table
Across applications and databases
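The three scopes can be illustrated with small checks. The sketch includes one rule applied across the columns of a table, one applied across tables from different applications, and a count reconciliation between a source and a target. The table layouts, the date rule, and the function names are hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical extracts: a policy table and a customer table from another application.
policies = [
    {"policy_id": "P1", "customer_id": "C1", "effective": date(2011, 1, 1), "cancelled": None},
    {"policy_id": "P2", "customer_id": "C2", "effective": date(2011, 6, 1), "cancelled": date(2011, 3, 1)},
    {"policy_id": "P3", "customer_id": "C9", "effective": date(2011, 2, 1), "cancelled": None},
]
customers = [{"customer_id": "C1"}, {"customer_id": "C2"}]


def cross_column_violations(rows):
    """Across columns in one table: a cancellation may not precede the effective date."""
    return [r["policy_id"] for r in rows
            if r["cancelled"] is not None and r["cancelled"] < r["effective"]]


def orphan_keys(child_rows, parent_rows, key):
    """Across applications: key values in the child data with no matching parent row."""
    parent_keys = {p[key] for p in parent_rows}
    return sorted({c[key] for c in child_rows} - parent_keys)


def count_differences(source_rows, target_rows, key):
    """Counts and aggregations: per-key row counts that differ between two systems."""
    src = Counter(r[key] for r in source_rows)
    tgt = Counter(r[key] for r in target_rows)
    return {k: (src[k], tgt[k]) for k in src.keys() | tgt.keys() if src[k] != tgt[k]}


print(cross_column_violations(policies))
print(orphan_keys(policies, customers, "customer_id"))
print(count_differences(policies, policies[:2], "customer_id"))
```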
Using Data Profiling for DQ Assessment
1. Extract data to be profiled
2. Analysts profile the data using a profiling tool and review results
3. Potential anomalies are noted within the tool's repository. Record:
The data element in question
Why it might be an issue
4. Reports are generated from the profiling tool and reviewed by business subject matter experts
5. Issues are reviewed and evaluated, e.g.,
Red: definitely an issue
Green: not an issue
Yellow: requires additional […]
Gray: out of scope
6. Results reviewed.
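The anomaly record from step 3 and the color coding from step 5 can be represented as a simple issue log. The Python sketch below uses hypothetical field names. The wording for Yellow is assumed because the slide text is cut off.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Status(Enum):
    RED = "definitely an issue"
    GREEN = "not an issue"
    YELLOW = "requires additional analysis"  # assumed wording; the slide text is cut off
    GRAY = "out of scope"


@dataclass
class Anomaly:
    element: str                      # step 3: the data element in question
    reason: str                       # step 3: why it might be an issue
    status: Optional[Status] = None   # step 5: set during SME review; None = not yet reviewed


def open_items(log: List[Anomaly]) -> List[Anomaly]:
    """Anomalies that still need work: confirmed, undecided, or not yet reviewed."""
    return [a for a in log if a.status in (Status.RED, Status.YELLOW, None)]


def tally(log: List[Anomaly]) -> Counter:
    """Count anomalies by review status, e.g. for the results review in step 6."""
    return Counter(a.status.name if a.status is not None else "UNREVIEWED" for a in log)


# Hypothetical log entries.
log = [
    Anomaly("birth_date", "too many values on January 1", Status.RED),
    Anomaly("state_code", "mix of upper and lower case", Status.GREEN),
    Anomaly("fax_number", "mostly blank", Status.GRAY),
    Anomaly("zip_code", "5-digit and 9-digit formats mixed"),
]
print([a.element for a in open_items(log)])
print(tally(log))
```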
Data Profiling is also a process
1. Define Rules
2. Profile the data using a Data Profiling tool
3. Review Findings
4. Analyze Issues; determine which issues are worth fixing
5. Set and Enforce Data Quality targets
6. Monitor ongoing Data Quality
Impacts on Metadata
The data quality rules discovered via data profiling are metadata.
The results (quality of the data) are also metadata.
Profiling results in a determination that either:
The metadata (data quality rules) are correct and the data is wrong, or
The data is correct and the metadata (data quality rules) are wrong
Unless they are both wrong
The metadata needs to be recorded
What Data Profiling Achieves
Diagram: Profiling Data takes Metadata (accurate and inaccurate) and Data (accurate and inaccurate) as inputs and produces Accurate Metadata and Inaccurate Data, reported as Data Quality Issues.
Analysis: An example of birthdates
Chart of birthdate values, with the callouts:
Check out the beginning of the year
Looks too high
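The "looks too high" spike at the beginning of the year can also be found programmatically. The approach is to count birthdates per calendar day and flag the outliers. This minimal sketch assumes birthdates are Python date objects. The median-based comparison and the 5× threshold are arbitrary choices for illustration, and the data is synthetic.

```python
from collections import Counter
from datetime import date, timedelta
from statistics import median
from typing import Iterable, List, Tuple


def suspicious_birth_days(birthdates: Iterable[date], factor: float = 5.0) -> List[Tuple[int, int]]:
    """Return (month, day) pairs whose frequency exceeds `factor` times the median day.

    A spike on one calendar day, such as January 1, often indicates a default or
    placeholder value keyed in when the real birthdate was unknown.
    """
    counts = Counter((d.month, d.day) for d in birthdates)
    if not counts:
        return []
    typical = median(counts.values())
    return sorted(day for day, n in counts.items() if n > factor * typical)


# Synthetic data: three people born on every day of the year, plus 40 records
# whose unknown birthdate was entered as January 1.
base = [date(1970, 1, 1) + timedelta(days=i) for i in range(365)] * 3
sample = base + [date(1965, 1, 1)] * 40
print(suspicious_birth_days(sample))  # → [(1, 1)]
```

The median is used instead of the mean so that the spike being hunted for does not inflate the baseline it is compared against.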
Finishing Up
Data Governance is a program that needs corporate support and an organization
Data is an asset that must be defined, managed, stewarded and governed.
Accountability and Communication are crucial.
[…] Data Governance program
[…] corporation is a primary goal of Data Governance
Thank you and any questions?