
David Plotkin

Date post: 05-Apr-2018
Upload: ark-group

Transcript
  • 8/2/2019 David Plotkin

    1/41

    Quality through Data Governance

    David Plotkin

    Finance Data Quality Manager, Bank of America

    Data Quality 2012 Asia Pacific Congress

    Last revised: 01/21/2012

  • 8/2/2019 David Plotkin

    2/41

    Agenda

    Introduction

    value add) to the Enterprise.

    How to implement Data Governance:

    Figuring out what you've got

    Adding DG to the Project Methodology

    The tools you'll need

    Setting up a Communications Plan

    Measuring Success

  • 8/2/2019 David Plotkin

    3/41

    Data Challenges: Data Needs to be Managed

    Collecting Data Definitions for isolated databases is not enough:

    Definitions written in haste by project staff

    Not rationalized across the Enterprise

    Formal Enterprise-Wide Data Governance , ,glossary

    Ownership at a granular level of detail

    Consistent names & definitions across all apps and databases

    Data Governance involved in all aspects of Data Quality

  • 8/2/2019 David Plotkin

    4/41

    Understanding Data Governance

    Data Governance is the execution of authority over data management:

    It's all about data ownership at the organizational level (Data Governance board)

    And decision making at the data element level (data stewardship)

    The exercise and enforcement of decision-making authority over the management of data assets and the performance of data functions.

    (Robert Seiner, TDAN and KIK Consulting)

    Ensuring that the enterprise's data assets are formally managed.

    Coordinating communication to achieve collective goals through collaboration.

    (Steven Adler, IBM)

  • 8/2/2019 David Plotkin

    5/41

    What is Data Governance (Practical)?

    Represents the Enterprise in all things data and metadata

    Metadata: Mandates capture of this information

    Data Quality: Issues, fixes, rules, and projects

    Champions data quality improvement projects

    Instigates methodology changes to ensure capture of data and metadata

    Owns the data and metadata

    Driven by relatively high-ranking individuals who can make decisions for the Enterprise.

  • 8/2/2019 David Plotkin

    6/41

    Data Governance Value

    Data Governance must tie back to the universal value drivers:

    Increase revenue and value

    Manage cost and complexity

    Ensure survival through attention to risk, compliance, security, and privacy (Gwen Thomas)

    n oes n a ou :

    How much time is wasted arguing over ill-defined or undefined data elements.

    How many bad decisions are made due to undefined elements and poor quality.

  • 8/2/2019 David Plotkin

    7/41

    Enterprise Data Governance in a Nutshell

    -defined, accurate, consistent, and meets business needs. Data Governance provides project support along with an evolving set of policies, procedures, and guidelines to achieve these goals.

    Ownership is by Business Function

    Business Data Steward

    Inventory shared data, requirements, and issues

    Project Data Stewards work

    Everyone

    Data Stewardship Committee

    Function

    Escalation path is to Data Governance Council

    DG SharePoint site facilitates the work.

    Identify Data Elements and Issues

    Assign Owner

    Data Governance Team

    Define Data Elements, Valid Values & Derivation Rules

    Publish policies, processes and organization

    Coordinate committees

    Publish definitions, valid

    Define, Communicate

    Data Stewards

    Perform data quality analysis

    Work with SMEs and Technical Data Stewards

    Choose DQ remediation

    Business Glossary

    Work with project teams to align deliverables to definitions

    Publish data quality issues and resolution decisions

    Assess, Make Decisions

    Process, Decisions, Results

  • 8/2/2019 David Plotkin

    8/41

    The DGI Data Governance Framework

  • 8/2/2019 David Plotkin

    9/41

    Enables overcoming challenges and achieving common goals

    [Figure: inhibitors that undermine or inhibit the business goals, and what causes them. Recovered labels follow.]

    Goals: Revenue Generation; Cost Reduction / Avoidance; Compliance & Risk; Strategy & Business

    Inhibitors:

    Can't easily reduce data errors

    High potential remediation costs

    Non-compliant with state & Federal regulations

    Difficult to ... of new business channels

    Can't easily customize product offerings and bundles

    High infrastructure cost

    Can't easily identify high value customers

    Ad-hoc data quality methods

    Higher than necessary probability of data misuse

    Tarnished brand reputation

    Can't easily identify key relationships & hierarchies

    Can't easily identify cross-sell, up-sell opportunities

    Lack of data retention policies

    Exposure of Personally Identifiable Information in non-production

    Can't easily consolidate data from silos

    No single view of customer

    Security monitoring

    Systems quickly (M&A)

    Courtesy of Steven Adler, IBM

  • 8/2/2019 David Plotkin

    10/41

    Data Governance and Data Stewardship

    A data stewardship program is a key part of an overall data governance program.

    It is the operational aspect of data governance.

    If data governance is the execution of authority over the management of data, then data stewardship is formalized accountability for the management of that data. (courtesy of Robert Seiner, KIK Consulting)

  • 8/2/2019 David Plotkin

    11/41

    What do we mean by a Data Steward?

    A key representative in a specific business area that is accountable for quality and use of that data throughout

    the data and the decision-makers about the data (Sherry Michaels, Erie Insurance)

    Data stewards are the ones who can reach into the organization and pull out the knowledge (and

    Data Stewardship is NOT a job; it is the formalizing of data responsibilities that are likely in place in an informal way.

    Data Stewardship involves specific tasks for which the stewards must be trained.

  • 8/2/2019 David Plotkin

    12/41

    Data Stewardship: Needed for Data Quality

    A data quality initiative introduces new constraints on the ways that individuals create, access, use, modify, and retire data. To ensure that these constraints are not violated, the data governance and data quality staff

    Data Quality policies: introduced and monitored

    Enough metadata to support the data quality processes

    Incorporation of data quality into system design by the developers.

    data (not just what is needed for the source system).

    Identifying important business impacts of poor quality.

  • 8/2/2019 David Plotkin

    13/41

    Data Stewards are Accountable

    Data Stewardship establishes accountability for:

    Data definitions and derivations

    Data quality rules and their enforcement

    Key role in improving data quality

    Data-related communications

    Data element rationalization

    Contributing to data-related policies and procedures.

    Understanding the downstream uses of their data and how proposed changes impact those uses.

    Their decisions are enforceable

    Oversees all data-related work in their business function

    Represents their business function as the single point of contact.

  • 8/2/2019 David Plotkin

    14/41

    What Happens Without Data Governance?

    Different parts of the organization:

    Use their own definitions for data, so they may enter different values. Leads to bad decisions, numbers that don't match, etc.

    Derive their numbers based on different calculations, and the numbers don't match.

    Make different determinations of the data quality, leading to

    different degrees of confidence in the numbers (or even a.

    Long arguments about meaning and quality.

    Improving Data Quality is very hard except in limited.

  • 8/2/2019 David Plotkin

    15/41

    The organization without Data Governance

  • 8/2/2019 David Plotkin

    16/41

    Data Quality without Data Governance

    Data quality deteriorates over time

    Data producers are incented to be fast, but not necessarily accurate. Stewards must champion changing the business priorities.

    Data quality rules are not defined. Stewards can define the rules and required quality levels.

    Individuals make their own corrections. Stewardship exposes this and the costs of these processes.

    demand (and demand funding for) enforcement of DQ rules during system loads.

  • 8/2/2019 David Plotkin

    17/41

    Data Governance Organization

    [Organization chart. Recovered labels follow; PT = Part Time, FT = Full Time.]

    Business

    IT

    Chief Data Steward

    Data Governance Business Sponsor PT

    Data Governance IT Sponsor PT

    Data Owners FT

    Enterprise Data Steward FT

    PT Enterprise Application Owner (Delivery Manager)

    PT

    Business Data Stewards PT

    Technical Data Stewards

    Project Data Stewards FT

    Data Domain Stewards FT

    Application Domain Owner (Business Partners) PT

    Data Stewardship Council

    Data Governance Committee

    Technical Data Stewards PT

    Creates

    group

  • 8/2/2019 David Plotkin

    18/41

    The Stewardship Organization

    Data Stewardship Council

    Enterprise Data Steward

    Business Functions: Sales, Membership, Insurance, HR, Call Center, Marketing, Financial Modeling, IT, Financial Transactions, Travel, Product Services, Actuarial, Claims, Underwriting, Operations

  • 8/2/2019 David Plotkin

    19/41

    Data Stewardship Committee

    Functional body for data governance program

    Apply data standards, policies, and principles.

    Participate in and contribute to data governance processes.

    Evaluate effectiveness of processes.

    Contribute to and ensure completeness of data-related documentation (metadata).

    Make decisions on ownership of data.

    Communicate data governance vision & objectives to

    Shape data governance design and implementation; ensure alignment to the business.

    Communicate decisions of the committee.

  • 8/2/2019 David Plotkin

    20/41

    Why Add Data Governance to Project Methodology?

    DG tasks benefit from scope limitations of a project.

    Limited block of data

    Limited number of source systems

    Management of tasks and deliverables benefits from professionals (Project Managers).

    PMs will "bird dog" the deliverables and ensure they get done.

    PMs will schedule the tasks and allocate the resources.

    Subject matter experts are assigned.

    Time is allocated to work on the project tasks.

  • 8/2/2019 David Plotkin

    21/41

    What needs to be added to Project Methodology?

    Integration with Project Management

    , ,

    quality rules).

    Solution Evaluation components

    QA Components (including Data Quality Assurance)

  • 8/2/2019 David Plotkin

    22/41

    Data Governance Value to a Project

    Collection of data definitions

    Building a body of stewarded and understood data definitions benefits all those in the enterprise who use the data and alleviates confusion when discussing the data. This also helps with conversions.

    Collection of data derivations

    Building a body of stewarded and validated data derivations leads to a common way of calculating numbers. The result is not only that the project delivers results that match the official calculation method, but much less time is spent by data analysts across the company attempting to reconcile reports.

    Identification and resolution of data quality issues

    Poor data quality can keep a project from going into production. The risk to a project is lessened by early identification (and where possible, resolution) of data quality issues. Data profiling measures specifics of the data and provides a comparison between what the data looks like and what the data quality rules say it should look like.

  • 8/2/2019 David Plotkin

    23/41

    Adjust Project Methodology: Data Quality

    Collect (during Analysis and Design):

    Data Quality issues and rules for measuring quality (meet guidelines)

    Data Quality rules: When the data goes bad, how do you know?

    Information to verify the issues and quantify severity

    Project resources

    Guided by Project Data Steward, collected from business analysts/SMEs

    Documented in Mapping document or DQ rule dictionary

    Measure and validate rules against data using Data Profiling.

    Quantifies the extent of the data quality problem.

    Rules may need to be restated if fit to data is poor.

    Data is examined and results reported back to the business.

    Determination must be made as to fitness for use.

    Metrics:

    Total DQ rules stated and validated

    Fit of data to stated rules

    Change in quality of data over time
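The "fit of data to stated rules" metric can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DQRule` class, the `rule_fit` function, the two rules, and the sample rows are all hypothetical and do not come from the presentation or from any particular profiling tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DQRule:
    """A stated data quality rule: a name plus a per-row check (hypothetical shape)."""
    name: str
    check: Callable[[Dict], bool]


def rule_fit(rule: DQRule, rows: List[Dict]) -> float:
    """Fraction of rows that satisfy the rule (1.0 means the data fits the rule fully)."""
    if not rows:
        return 1.0  # no rows, so nothing violates the rule
    passed = sum(1 for row in rows if rule.check(row))
    return passed / len(rows)


# Invented sample data, for illustration only.
rows = [
    {"policy_id": "P1", "premium": 1200.0},
    {"policy_id": "P2", "premium": -50.0},   # violates the premium rule
    {"policy_id": "", "premium": 300.0},     # violates the mandatory rule
    {"policy_id": "P4", "premium": 800.0},
]

rules = [
    DQRule("policy_id is mandatory", lambda r: bool(r["policy_id"])),
    DQRule("premium is non-negative", lambda r: r["premium"] >= 0),
]

# One fit value per rule; tracking these over time gives the "change in quality" metric.
fit = {rule.name: rule_fit(rule, rows) for rule in rules}
total_rules_stated = len(rules)
```

A low fit value does not say which side is wrong: either the data is poor or the rule needs to be restated, which is the determination the business has to make.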

  • 8/2/2019 David Plotkin

    24/41

    Adjust Project Methodology: QA

    QA test cases written using Data Quality rules

    Test cases run as part of regular QA process

    Data defects tracked in system and prioritized and worked just like any other defects.

    Some business rules and relationships may show up as data defects (policies without drivers).

    QA test cases written using metadata (definitions)

    Do valid value sets show values expected based on definitions and stated value sets?

    Do screens show multiple fields that are actually the same thing (due to acronyms)?

    Has the metadata been entered into the EMR and glossary?

  • 8/2/2019 David Plotkin

    25/41

    Data Governance and Data Quality

    A primary deliverable for Data Governance is improved data quality

    This should go beyond just response to DQ issues (reactive) and include defining, finding, and fixing DQ issues before the customer does (proactive).

    Should include Data Quality Analysis and Reconciliation

    Needs to be driven by the Business Impacts of poor quality: some data may be bad, but if it doesn't stop important business processes, MOVE ON.

  • 8/2/2019 David Plotkin

    26/41

    The Data Quality Improvement Cycle

    [Cycle diagram. Recovered labels follow; some step text is incomplete in the transcript.]

    Analyze

    (1) Identify and measure how poor data quality ... objectives

    (2) Define business-related data quality rules & performance targets

    (3) Design quality improvement ... remediate process flaws.

    (4) Implement quality improvement methods and processes

    ... quality against targets

  • 8/2/2019 David Plotkin

    27/41

    Business Results Metrics Example

    Cost of poor quality data to your business:

    Calling/Mailing costs: How many times did we contact someone who already had a particular type of policy or who was not eligible for that type of policy? How much postage/time was wasted?

    Loss of productivity/opportunity cost: How many policies could have been sold if agents had only contacted eligible policyholders? How much would those policies have been worth?

    Loss of business cost: How many policyholders canceled their policies because we didn't understand their needs or didn't appear to value their business (a survey can give you an idea)? What is the lost lifetime value of those customers?

    Compliance cost: How much did we spend responding to regulatory or audit requests (demand!)? How much of that was attributable to poor data quality or information not available?

  • 8/2/2019 David Plotkin

    28/41

    Steps to Data Quality Analysis and Reconciliation

    Data Profiling

    Reviewing the data quality analysis with Data Stewards to determine acceptable ranges of data quality, associated risk, transformation guidelines, and recommendations on data cleansing.

    The development of required ETL processing to cleanse the data.

    Only want to do this once, after the process has been fixed. Or that's the theory, anyway

  • 8/2/2019 David Plotkin

    29/41

    Collecting the Data Quality Rules

    Get the rules from the Data Stewards

    Create a template to collect the quality rules:

    Mandatory, optional, valid values, valid range, data type, patterns

    Relationships between data elements

    Relationships between records in different tables

    Guide conversations with stewards to gather rules

    Helping the business help us define what we mean by good quality for a data element.

    Can help to pre-profile the data (do a sample extract) to show the stewards what is actually present now.
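One way to picture the rule-collection template is as a small record with one slot per rule type listed above. The sketch below is an assumption about what such a template could look like, not the template used in the presentation; the `RuleTemplate` class, its field names, and the sample elements are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class RuleTemplate:
    """Single-element quality rules gathered from a steward (hypothetical layout)."""
    element: str
    mandatory: bool = False                  # mandatory vs. optional
    data_type: type = str                    # expected data type
    valid_values: Optional[Set] = None       # enumerated valid values
    valid_range: Optional[Tuple] = None      # inclusive (low, high)
    pattern: Optional[str] = None            # regular expression for strings

    def violations(self, value) -> List[str]:
        """Return the list of rule types that the given value breaks."""
        if value is None or value == "":
            return ["missing mandatory value"] if self.mandatory else []
        if not isinstance(value, self.data_type):
            return ["wrong data type"]
        problems = []
        if self.valid_values is not None and value not in self.valid_values:
            problems.append("not in valid values")
        if self.valid_range is not None:
            low, high = self.valid_range
            if not low <= value <= high:
                problems.append("outside valid range")
        if self.pattern is not None and isinstance(value, str):
            if re.fullmatch(self.pattern, value) is None:
                problems.append("does not match pattern")
        return problems


# Invented example elements, for illustration only.
state = RuleTemplate("state_code", mandatory=True,
                     valid_values={"CA", "NV", "UT"}, pattern=r"[A-Z]{2}")
age = RuleTemplate("insured_age", data_type=int, valid_range=(0, 120))
```

Rules that relate several elements, or records in different tables, do not fit this per-element shape and would need their own entries in the template.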

  • 8/2/2019 David Plotkin

    30/41

    What is Data Profiling?

    Data Profiling is a process whereby one examines the data available in an existing database and collects statistics and information about that data.

    Wikipedia, http://en.wikipedia.org/wiki/data_profiling

    Data Profiling is the use of analytical techniques to discover the structure, content, and quality of data.

    , , .

    Data Profiling is a set of algorithms for statistically

    within a data set as well as exploring relationships that exist between data elements or across data sets.

    David Loshin, Knowledge Integrity, Inc.

  • 8/2/2019 David Plotkin

    31/41

    What is Data Profiling (continued)?

    Uses both real data and metadata to determine the quality of data.

    Identified source data requires both a detailed analysis of the raw data values currently stored in

    , existing metadata, to determine the actual meaning, descriptions and relationships that should be found in the data.

    Data profiling should be used whenever data is being converted, migrated, warehoused or mined.

    Can help discover business rules embedded within data sets, which can be used for ongoing inspection and monitoring.
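As a rough illustration of the kind of statistics a profiling pass collects for one column, consider the following sketch. It is a simplified stand-in, not how any specific profiling tool works; the function name `profile_column` and the sample values are hypothetical.

```python
from collections import Counter


def profile_column(values):
    """Collect basic statistics about one column's raw values (illustrative only)."""
    total = len(values)
    # Treat None and the empty string as "not populated".
    present = [v for v in values if v is not None and v != ""]
    counts = Counter(present)
    return {
        "count": total,
        "null_fraction": (total - len(present)) / total if total else 0.0,
        "distinct": len(counts),
        "most_common": counts.most_common(3),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


# Invented sample column, for illustration only.
state_codes = ["CA", "CA", "NV", None, "", "UT", "CA", "ZZ"]
profile = profile_column(state_codes)
```

Comparing such a profile with the metadata (for example, a documented list of valid state codes) is what surfaces candidates like the unexpected "ZZ" value for review.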

  • 8/2/2019 David Plotkin

    32/41

    General Benefits from Data Profiling

    Identify or validate availability of information.

    Rapid assessment of which fields are consistently populated against model expectations.

    Improve predictability of project timelines.

    Focus data quality efforts where they are really needed.

    Lower the risk of design

    Improve visibility to quality of ... decision making.

    migration testing support.

    transitional data stores.

    Support compliance and audit requirements.

    Identify transformation rules for migration and integration.

    Danette McGilvray, Granite Falls Consulting, Inc.

  • 8/2/2019 David Plotkin

    33/41

    Benefits: Saves the Programmer's time and effort

    Programmers already examine the data to make sure their work doesn't lead to code/load/explode.

    If they believe what they are told about the data contents, it invariably leads to code failures.

    decide whether to code around the bad data or fix it.

    Profiling puts a rigorous process in place to prevent the.

    Real example: 24 defects, $556,000 in development time, $142,000 in QA time, 6 month delivery delay because of unexpected data in the feed.

  • 8/2/2019 David Plotkin

    34/41

    Scope of the Data Profiling Process

    Not just done on raw data elements:

    Includes counts and aggregations

    Other derived values

    Can be run on:

    Individual columns

    Across columns in a table

    Across applications and databases

  • 8/2/2019 David Plotkin

    35/41

    Using Data Profiling for DQ Assessment

    1. Extract data to be profiled

    2. Analysts profile the data using a profiling tool and review results

    3. Potential anomalies are noted within the tool's repository. Record:

    The data element in question

    Why it might be an issue

    4. Reports are generated from the profiling tool and reviewed by business subject matter experts

    5. Issues are reviewed and evaluated, e.g.,

    Red: definitely an issue

    Green: not an issue

    Yellow: requires additional ...

    Gray: Out of scope

    6. Results reviewed.
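The anomaly record from step 3 and the color coding from step 5 could be represented along these lines. This is a minimal sketch: the `Status` and `Anomaly` names, the default status, and the sample findings are assumptions made for illustration, and the meaning of Yellow is only partly legible in the transcript.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    RED = "red"        # definitely an issue
    GREEN = "green"    # not an issue
    YELLOW = "yellow"  # requires additional follow-up (slide text is truncated here)
    GRAY = "gray"      # out of scope


@dataclass
class Anomaly:
    element: str                     # the data element in question
    reason: str                      # why it might be an issue
    status: Status = Status.YELLOW   # assumed default until reviewed


def needs_attention(anomalies):
    """Anomalies that are confirmed issues or still undecided."""
    return [a for a in anomalies if a.status in (Status.RED, Status.YELLOW)]


# Invented findings, for illustration only.
findings = [
    Anomaly("birth_date", "spike of values on January 1", Status.RED),
    Anomaly("middle_name", "mostly empty", Status.GREEN),
    Anomaly("agent_code", "unexpected format"),
    Anomaly("legacy_flag", "unused column", Status.GRAY),
]
open_items = [a.element for a in needs_attention(findings)]
```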

  • 8/2/2019 David Plotkin

    36/41

    Data Profiling is also a process

    [Process diagram with numbered steps. Recovered labels follow.]

    Define Rules

    Profile the data using a Data Profiling tool

    Review Findings

    Analyze Issues

    Determine Issues Worth fixing

    Set and Enforce Data Quality targets

    Monitor ongoing Data Quality

  • 8/2/2019 David Plotkin

    37/41

    Impacts on Metadata

    The data quality rules discovered via data profiling are metadata.

    The results (quality of the data) are also metadata

    Profiling results in a determination that either:

    The metadata (data quality rules) are correct and the data is wrong, or

    The data is correct and the metadata (data quality rules) are wrong

    Unless they are both wrong

    Metadata needs to be recorded

  • 8/2/2019 David Plotkin

    38/41

    What Data Profiling Achieves

    Accurate

    Accurate and Inaccurate Metadata

    Profiling Data:

    Accurate and Inaccurate Data

    Inaccurate

    Data Quality Issues

  • 8/2/2019 David Plotkin

    39/41

    Analysis: An example of birthdates

    Check out the beginning of the year

    Looks too high
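The chart for this slide did not survive in the transcript. Assuming it showed how many birthdates fall on each day of the year, with an unusually tall bar at the start of the year, a check for that kind of spike could be sketched as follows. The function name `suspicious_days`, the `factor` threshold, and the sample dates are all hypothetical.

```python
from collections import Counter
from datetime import date
from statistics import median


def suspicious_days(birthdates, factor=3.0):
    """Return (month, day) pairs whose counts are far above the typical day.

    A day is flagged when its count exceeds `factor` times the median count
    across all days that occur in the data (an arbitrary, assumed threshold).
    """
    by_day = Counter((d.month, d.day) for d in birthdates)
    if not by_day:
        return []
    typical = median(by_day.values())
    return sorted(day for day, n in by_day.items() if n > factor * typical)


# Invented sample: ten records share January 1, as a default value might cause.
birthdates = [date(1960 + i, 1, 1) for i in range(10)] + [
    date(1980, 3, 14),
    date(1975, 3, 14),
    date(1985, 7, 4),
    date(1990, 11, 23),
]
flagged = suspicious_days(birthdates)
```

A flagged day is only a candidate: whether the spike is a defaulted value or genuine data is for the data steward to decide.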


  • 8/2/2019 David Plotkin

    40/41

    Finishing Up

    Data Governance is a program that needs corporate support and an organization

    Data is an asset that must be defined, managed, stewarded and governed.

    Accountability and Communication are crucial.

    Data Governance program

    corporation is a primary goal of Data Governance

  • 8/2/2019 David Plotkin

    41/41

    Thank you and any questions?

