  • DQAF: A Comprehensive Approach to Data Quality Assessment Laura Sebastian-Coleman, Ph.D., IQCP [email protected] Information and Data Quality Conference (IDQ) November 4-7, 2013 Little Rock, AR

    The better we understand our data, the more value we can get from it!


  • Confidential property of Optum. Do not distribute or reproduce without express permission from Optum.

    Presentation Goals & Agenda
    • This presentation will describe a comprehensive approach to data quality assessment, from initial profiling through in-line controls and measurements to periodic measurement. From it, I hope you will:
      – Increase your awareness of how different forms of data assessment enable successful data management
      – Identify specific ideas that you can implement in your environments
    • Agenda
      – Very quick introductions – Optum & me
      – Review initial assumptions and a problem statement
      – Define key terms: data, data quality, measurement, data quality assessment
      – Review how these concepts influence how we manage data and measure data quality
      – Discuss the context of data assessment defined by the Data Quality Assessment Framework (DQAF)


    About Optum
    • Optum is a leading information and technology-enabled health services business dedicated to helping make the health system work better for everyone.
    • With more than 35,000 people worldwide, Optum delivers intelligent, integrated solutions that modernize the health system and help to improve overall population health.
    • Optum solutions and services are used at nearly every point in the health care system, from provider selection to diagnosis and treatment, and from network management, administration and payments to the innovation of better medications, therapies and procedures.
    • Optum clients and partners include those who promote wellness, treat patients, pay for care, conduct research, and develop, manage and deliver medications.
    • With them, Optum is helping to improve the delivery, quality and cost effectiveness of health care.


    About me
    • 10+ years of experience in data quality in the health care industry
    • Working on the IT side of things, in data warehousing
    • Much of my thinking about data quality and governance has been influenced by the demands of data warehousing
      – Processing and storing large data sets according to a schedule
      – Storing data in a relational model
    • I think of IT functions as a form of data stewardship.
      – Data quality (esp. in large data assets) depends on data management practices, which are IT’s responsibility.
    • Have worked in banking, manufacturing, distribution, commercial insurance, and academia. These experiences have influenced my understanding of data, quality, and measurement.
    • Published Measuring Data Quality for Ongoing Improvement (2013). This presentation is based on content from that book.


    Assumptions and Problem Statement
    • Data assessment is fundamental to improving data quality as well as to successful, long-term data management.
      – Data as a product and manufacturing quality assurance
      – Redman: “Hold the gains”; English: “Sustain data quality”
      – Define, Measure, Analyze, Improve, Control (DMAIC) begins and ends with assessment
    • Many of us are familiar with the benefits of profiling data during development projects, but few of us establish clear goals and apply consistent measurement activities at other points in the information lifecycle.
      – In fact, few of us establish goals related to profiling data – or those goals are fuzzy (“data discovery … get to know the data”).
      – Profiling tools focus on collecting data, not on what to do with the results of data collection.
      – Once a customer has signed off, few of us monitor the quality of data to ensure it remains at expected levels.
    • So… the problem is: we know data is a product of defined processes to which principles of quality control can be applied, but we do not apply them effectively.


    Define Key Terms – Data
    • The New Oxford American Dictionary (NOAD) defines data as “facts and statistics collected together for reference or analysis.”
    • ASQ defines data as “a set of collected facts.” ASQ identifies two kinds of numerical data: “measured or variable data … and counted or attribute data.”
    • ISO defines data as “re-interpretable representation of information in a formalized manner suitable for communication, interpretation, or processing” (ISO 11179).
    • I define data as: abstract representations of selected characteristics of real-world objects, events, and concepts, expressed and understood through explicitly definable conventions related to their meaning, collection, and storage.
    • Observations about the concept of data:
      – Data tries to tell the truth about the world (“facts”).
      – Data is formal – it has a shape.
      – Data is created through human choices, so to understand data’s “truth” you need to understand the choices that influence its shape; that is, you need to understand how data effects – brings about – its representation of the world.
      – In today’s world, data is largely digital, stored in information systems; these systems influence data’s shape. In some cases, we shape data to fit the system. Doing so can distort its “truth.”


    Define Key Terms – Data Quality
    • Data Quality / Quality of Data:
      – The level of quality of data represents the degree to which data meets the expectations of data consumers, based on their intended use of the data.
      – Because data also serves a semiotic function (it serves as a sign of something other than itself), data quality is also directly related to the perception of how well data effects (brings about) this representation.
    • Observations:
      – High-quality data meets expectations for use and for representational effectiveness to a greater degree than low-quality data.
      – Assessing the quality of data requires understanding those expectations and determining the degree to which the data meets them.
      – Assessment requires understanding:
        • The processes that create data
        • The systems through which the data is created
        • The concepts the data represents
        • The known and potential uses of the data


    Define Key Terms – Measurement
    • Measurement: the process of measurement is the act of ascertaining the size, amount, or degree of something.
      – Measuring always involves comparison. Measurements are the results of comparison – even if the thing you compare to is your hand or your foot.
      – Measurement is most often a means to quantify comparisons.
    • Observation: measurement is both simple and complex.
      – Simple because we do it all the time and our brains are hard-wired to understand parts of our world we do not know in terms of the parts we do know.
        • “I have measured out my life with coffee spoons…”
      – Complex because, for those things we have not measured before, we often do not have a basis for comparison, the tools to execute the comparison, or the knowledge to evaluate the results.
        • If you don’t believe me, imagine trying to understand the meaning of “temperature” in a world without thermometers.


    Define Key Terms – Data Quality Assessment
    • Assessment is the process of evaluating or estimating the nature, ability, or quality of a thing.
    • Data quality assessment is the process of evaluating data to identify errors and understand their implications (Maydanchik, 2007).
    • Observations:
      – Like measurement, assessment requires comparison. Further, assessment implies drawing a conclusion about – evaluating – the object of the assessment, whereas measurement does not always imply doing so.
      – Most often, data quality assessment evaluates the condition of data in relation to particular expectations, requirements, or purposes in order to draw a conclusion about whether the data is suitable for them.
      – This process always implies the need also to understand how effectively data represents the objects, events, and concepts it is designed to represent.


    Functions in Assessment: Collect, Calculate, Compare, Conclude
    • Conceptually, assessment is simple – even if implementing it can be hard.
    • All assessment has a similar shape.
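The four functions can be sketched as a generic pipeline. This is a minimal illustration, not code from the deck; the field name, sample rows, and the 95% threshold are all assumptions.

```python
# A minimal sketch of the four assessment functions: collect raw
# observations, calculate a measurement, compare it to an expectation,
# and conclude whether the data meets that expectation.
# All names and values here are illustrative, not DQAF-defined.

def collect(rows, field):
    """Collect the values to be assessed from a data set."""
    return [row[field] for row in rows]

def calculate(values):
    """Calculate a measurement -- here, the share of populated values."""
    populated = sum(1 for v in values if v not in (None, ""))
    return populated / len(values) if values else 0.0

def compare(measurement, expectation):
    """Compare the measurement to a defined expectation (a threshold)."""
    return measurement - expectation

def conclude(gap):
    """Draw a conclusion from the comparison."""
    return "meets expectation" if gap >= 0 else "does not meet expectation"

rows = [{"revenue_code": "0450"}, {"revenue_code": ""}, {"revenue_code": "0300"}]
gap = compare(calculate(collect(rows, "revenue_code")), expectation=0.95)
print(conclude(gap))  # two of three values populated -> "does not meet expectation"
```

Any DQAF measurement type fits this shape; only what is collected and what it is compared against changes.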


    Improving Data Quality
    • Producing high quality data requires a combination of technical and business skills and knowledge, including data knowledge.
    • Sustaining high quality data requires documenting and managing knowledge (i.e., managing metadata, including measurement results).
    • Purposeful measurement of objectively defined quality characteristics:
      – Creates a foundation for trust
      – Identifies opportunities for improvement and enables confirmation that improvements stick
      – Provides continuous assurance of data quality
      – Builds explicit data knowledge within an organization
    • In short: measuring data quality is a means of understanding data – building knowledge about it in all of its complexity – in order to manage it.
    • How do we develop that understanding?


    DQAF
    • The Data Quality Assessment Framework (DQAF) is a descriptive taxonomy of 48 measurement types designed to help people measure the quality of their data and use measurement results to manage data.
      – Conceptual and technology-independent (i.e., it is not a tool or software and is not dependent on a particular tool).
      – Generic – it can be applied to any data.
    • Focused on an objective set of data quality dimensions: completeness, timeliness, validity, consistency, integrity.
    • This approach includes:
      – Processes for assessing the condition of data against defined expectations
      – Processes for defining data quality requirements and associating them with measurement types
      – An approach for automating the collection and use of data quality measurement results
      – A common logical data model for storing measurement results
    • Organizations can automate the optimal sub-set of measurement types that best help them manage their data.
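The "common logical data model" idea can be pictured as one result structure shared by all measurement types: because every measurement boils down to a comparison, results of any type can be stored the same way. The fields below are a plausible minimal subset I chose for illustration, not the model from the book.

```python
# Sketch of a shared structure for measurement results. Every DQAF
# measurement type produces a comparison, so one structure can hold
# results of any type. Field names are illustrative assumptions.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class MeasurementResult:
    metric_id: str            # which specific metric produced the result
    measurement_type: str     # e.g., "validity check, single field"
    dimension: str            # completeness, timeliness, validity, ...
    measured_value: float     # the calculated measurement
    expected_value: float     # the defined expectation it is compared to
    measured_at: datetime     # when the measurement was taken

    def meets_expectation(self) -> bool:
        # The conclusion step: compare measured to expected.
        return self.measured_value >= self.expected_value

result = MeasurementResult(
    metric_id="CLAIM_REV_CODE_VALIDITY",
    measurement_type="validity check, single field",
    dimension="validity",
    measured_value=0.997,
    expected_value=0.990,
    measured_at=datetime(2013, 11, 4),
)
print(result.meets_expectation())  # True
```

Storing results this way is what lets measurement results themselves be managed as metadata, as the following slides describe.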


    DQ Dimension, Measurement Type, Specific Metrics

    DIMENSIONS (the WHY of measurement): Completeness, Timeliness, Validity, Consistency, Integrity

    MEASUREMENT TYPES (the HOW of measurement):
    – Completeness: compare summarized data in amount fields to a summarized amount provided in a control record.
    – Timeliness: compare actual time of data delivery to scheduled data delivery.
    – Validity: compare values on incoming data to valid values in a defined domain (reference table, range, or mathematical rule).
    – Consistency: compare the record count distribution of values (column profile) to past instances of data populating the same field.
    – Integrity: confirm record-level (parent/child) referential integrity between tables to identify parentless child (i.e., “orphan”) records.

    SPECIFIC DATA QUALITY METRICS (the WHAT of measurement):
    – Completeness: total dollars on Claim records balances to the total on a control report.
    – Timeliness: Claim file delivery against the time range documented in a service level agreement.
    – Validity: validity of Revenue Codes against the Revenue Code table; all valid procedure codes are on the procedure code table.
    – Consistency: percentage distribution of adjustment codes on the Claim table is consistent with past population of the field.

    Moving from dimensions through measurement types to specific metrics means decreasing abstraction (increasing specificity and concreteness), closer proximity to the data, and increasing ability to understand and interpret measurement results.
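The validity measurement type described above (the "HOW") and its revenue-code metric (the "WHAT") can be sketched together. The code values, table contents, and variable names are illustrative assumptions.

```python
# Sketch of the validity measurement type: compare values on incoming
# data to valid values in a defined domain -- here, a reference table.
# The revenue codes and counts are illustrative assumptions.
from collections import Counter

revenue_code_table = {"0300", "0450", "0510"}   # reference domain

incoming = ["0450", "0450", "0300", "9999", "0510"]

counts = Counter(incoming)
invalid = {code: n for code, n in counts.items() if code not in revenue_code_table}
valid_pct = 1 - sum(invalid.values()) / len(incoming)

print(invalid)     # {'9999': 1}
print(valid_pct)   # 0.8
```

Note that the measurement produces both detailed results (which values are invalid, and how often) and a summary percentage that can be compared to an expectation.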


    DQAF: The Context of Data Assessment

    Goal: Gain Understanding of the Data & Data Environment
    – Assessment category: Initial One-Time Assessment
      • Understand business processes represented by the data
      • Review and understand processing rules
      • Assess data criticality; define measurement requirements
      • Assess the condition of data (profile data)
      • Assess completeness, consistency, and integrity of the data model
      • Assess sufficiency of metadata and reference data

    Goal: Manage Data within and between Data Stores with Controls and Ongoing Measurement
    – Assessment category: Automated Process Controls
      • Ensure correct receipt of data
      • Inspect initial condition of data
      • Take pre-processing timing measurements
      • Take post-processing timing measurements
      • Take pre-processing data set size measurements
      • Take post-processing data set size measurements
    – Assessment category: In-line Measurement
      • Measure data set content
      • Measure data content completeness
      • Measure data validity
      • Measure data consistency

    Goal: Manage Data within a Data Store
    – Assessment category: Periodic Measurement
      • Measure cross-table integrity of data
      • Assess overall sufficiency of database content

    Goal: Support Processes and Skills
      • Assess effectiveness of measurements and controls

    Note: measurement activities correspond roughly to DQAF measurement types. However, not all DQAF measurement types are included in the schematic.
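One of the automated process controls named above compares actual to scheduled delivery. A minimal sketch of that timing check follows; the delivery window, timestamps, and function name are assumptions for illustration.

```python
# Sketch of an automated process control on timeliness: compare the
# actual time of file delivery to the window documented in a service
# level agreement. Times and names are illustrative assumptions.
from datetime import datetime, time

def within_sla(delivered_at: datetime, window_start: time, window_end: time) -> bool:
    """True if the file arrived inside the agreed delivery window."""
    return window_start <= delivered_at.time() <= window_end

delivery = datetime(2013, 11, 5, 6, 42)              # claim file landed at 06:42
print(within_sla(delivery, time(2, 0), time(6, 0)))  # False: 42 minutes late
```

The same pattern (record an actual, compare it to a scheduled expectation) applies to the pre- and post-processing timing and data set size measurements.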


    Results of Data Assessment
    • The following three slides associate deliverables with each of the measurement activities.
    • Through these deliverables…
      – Metadata is produced, including:
        • Expectations related to the quality of data, based on dimensions of quality
        • Objective description of the condition of data compared to those expectations
        • Documentation of the relation of data’s condition to processes and systems – rules, risks, relationships
      – Data and process improvement opportunities can be identified and quantified, so that decisions can be made about which ones to address.

    Initial Assessment

    In-line Measurement

    Periodic Measurement

    Using the DQAF
    • The goal is to implement an optimal set of specific measurements in a specific system (i.e., implementing all the types should never be the goal of any system).
    • Implementing an optimal set of specific measurements requires:
      – Understanding the criticality and risk of data within a system.
      – Associating critical data with measurement types.
      – Building the types that will best serve the system by:
        • Providing data consumers a level of assurance that data is sound, based on defined expectations
        • Providing data management teams information that confirms that data moves through the system in expected condition
    • The different kinds of assessment are related to each other.
      – Initial assessment drives the process by separating data that meets expectations from data that does not and helping identify at-risk and critical data for ongoing measurement.
      – Monitoring and periodic measurement identify data that may cease to meet expectations and data for which there are improvement opportunities.


    Measurement Decision Tree

    Initial assessment produces assessed data sets, which branch as follows:
    • Data meets expectation
      – Critical data → Ongoing monitoring
      – Less critical data → Periodic reassessment
    • Data does not meet expectation
      – Critical data → Improvement projects
      – Less critical data → Periodic reassessment
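One plausible reading of the decision tree is as a routing function over two flags. This is an illustration of that reading, not code from the deck; the routing of less critical data to periodic reassessment on both branches is my interpretation of the tree.

```python
# Sketch of the measurement decision tree: route each assessed data set
# by whether it meets expectations and how critical it is.
# The routing targets mirror the slide; the function is an illustration.

def route(meets_expectation: bool, critical: bool) -> str:
    if meets_expectation:
        return "ongoing monitoring" if critical else "periodic reassessment"
    return "improvement project" if critical else "periodic reassessment"

print(route(meets_expectation=True, critical=True))    # ongoing monitoring
print(route(meets_expectation=False, critical=True))   # improvement project
```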


    Importance of DQ measurement
    • Data is not easy, and organizations struggle to get value out of their data due to lack of trust and lack of knowledge.
    • Producing and managing high quality data requires a combination of technical and business skills and knowledge, including data knowledge.
    • Organizations gain competitive advantage from their data only if they know their data.
    • The DQAF processes intensify that advantage by:
      – Helping organizations learn more about their own data
      – Providing a means through which to communicate and use that knowledge


    Comments & Responses

  • Appendices



    Appendix: DQAF-specific Terms
    • Measurement Type: Within the DQAF, a measurement type is a subcategory of a dimension of data quality that allows for a repeatable pattern of measurement to be executed against any data that fits the criteria required by the type, regardless of specific data content. The measurement results of a particular measurement type can be stored in the same data structure regardless of the data content.
    • Assessment Scenario: In the DQAF, an assessment scenario is a setting and process for particular kinds of data assessment. These are connected with the information life cycle. Assessment scenarios include: initial, one-time assessment of data for purposes of data discovery and understanding; measurement of data within improvement projects to ensure that improvements have had the desired effects; in-line measurement and control of data to sustain and improve data quality; and periodic measurement of data quality, also to sustain and improve it.
    • Assessment Category: In the DQAF, an assessment category is a way of grouping measurement types based on where in the data life cycle the assessment is likely to be taken. Assessment categories pertain to both the frequency of the measurement (periodic or in-line) and the type of assessment involved (control, measurement, assessment). They correspond closely to assessment scenarios and include: initial assessment, process control, in-line measurement, periodic measurement, and periodic assessment.


    Appendix: Sample Measurement Types

    • Dimension of quality: Completeness
      – Measurement type: Data set completeness – summarized amount field data
      – Description: For files, compare summarized data in amount fields to the summarized amount provided in a control record
      – Object of measurement: Receipt of data
      – Assessment category: Process control

    • Dimension of quality: Timeliness
      – Measurement type: Timely delivery of data for processing
      – Description: Compare actual time of data delivery to scheduled data delivery
      – Object of measurement: Process / adherence to schedule
      – Assessment category: In-line measurement

    • Dimension of quality: Validity
      – Measurement type: Validity check, single field, detailed results
      – Description: Compare values on incoming data to valid values in a defined domain (reference table, range, or mathematical rule)
      – Object of measurement: Content / row counts
      – Assessment category: In-line measurement

    • Dimension of quality: Consistency
      – Measurement type: Consistent record counts by aggregated date
      – Description: Reasonability check – compare record counts and the percentage of record counts associated with an aggregated date (such as a month, quarter, or year) to historical counts and percentages
      – Object of measurement: Content / aggregated date
      – Assessment category: Periodic measurement

    • Dimension of quality: Integrity / Consistency
      – Measurement type: Consistent defaults cross-table
      – Description: Assess column properties and data for a consistent default value for fields of the same data type
      – Object of measurement: Data model
      – Assessment category: Initial one-time assessment
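The consistency reasonability check above compares a current distribution to a historical one. Here is a minimal sketch of that comparison; the value codes, percentages, and 5-point flag threshold are illustrative assumptions.

```python
# Sketch of the consistency measurement type: a reasonability check that
# compares the current percentage distribution of a field's values to a
# historical distribution. All values and the threshold are assumptions.
from collections import Counter

def pct_distribution(values):
    """Column profile: percentage of records per distinct value."""
    counts = Counter(values)
    total = len(values)
    return {v: 100.0 * n / total for v, n in counts.items()}

history = {"01": 70.0, "02": 25.0, "03": 5.0}      # past percentage profile
current = pct_distribution(["01"] * 40 + ["02"] * 55 + ["03"] * 5)

# Flag any value whose share shifted more than 5 percentage points.
flags = {v: (history.get(v, 0.0), pct)
         for v, pct in current.items()
         if abs(pct - history.get(v, 0.0)) > 5.0}
print(flags)   # {'01': (70.0, 40.0), '02': (25.0, 55.0)}
```

A shift like this does not prove the data is wrong, only that it is unreasonable against history and worth investigating, which is the point of a reasonability check.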


    Appendix: The Classics of Information Quality
    • English, Larry P. (1999). Improving Data Warehouse and Business Information Quality. Indianapolis, IN: Wiley.
    • English, Larry P. (2009). Information Quality Applied. Indianapolis, IN: Wiley.
    • Loshin, David. (2001). Enterprise Knowledge Management: The Data Quality Approach. Boston, MA: Morgan Kaufmann.
    • Loshin, David. (2011). The Practitioner’s Guide to Data Quality Improvement. Boston, MA: Morgan Kaufmann.
    • Maydanchik, Arkady. (2007). Data Quality Assessment. Bradley Beach, NJ: Technics Publications, LLC.
    • McGilvray, Danette. (2008). Executing Data Quality Projects: Ten Steps to Quality Data and Trusted Information™. Boston, MA: Morgan Kaufmann.
    • Mosley, Mark, Brackett, Michael, Earley, Susan, & Henderson, Deborah (eds.). (2009). The Data Management Body of Knowledge (DAMA-DMBOK Guide). Bradley Beach, NJ: Technics Publications, LLC.
    • Olson, Jack. (2003). Data Quality: The Accuracy Dimension. Boston, MA: Morgan Kaufmann.
    • Redman, Thomas C. (2008). Data Driven: Profiting from Your Most Important Business Asset. Boston, MA: Harvard Business Press.
    • Redman, Thomas C. (2001). Data Quality: The Field Guide. Boston, MA: Digital Press.
    • Redman, Thomas C. (1996). Data Quality for the Information Age. Boston, MA: Artech House.
    • Wang, Richard. (1998, February). A Product Perspective on Total Data Quality Management. Communications of the ACM, 58–65.
    • Wang, Richard, & Strong, Diane. (1996, Spring). Beyond Accuracy: What Data Quality Means to Data Consumers. Journal of Management Information Systems, 5–33.


    Recommended Reading – Will make you think differently about data
    • Chisholm, Malcolm D. (2010). Definitions in Information Management: A Guide to the Fundamental Semantic Metadata. Canada: Design Media.
    • Chisholm, Malcolm D. (2012, August 16). Data Quality is Not Fitness for Use. Information Management. http://www.information-management.com/news/data-quality-is-not-fitness-for-use-10023022-1.html
    • Crease, Robert P. (2011). World in the Balance: The Historic Quest for an Absolute System of Measurement. New York: W. W. Norton Company.
    • Derman, Emanuel. (2011). Models. Behaving. Badly.: Why Confusing Illusion with Reality Can Lead to Disaster, on Wall Street and in Life. New York: Free Press.
    • Gould, Stephen Jay. (1996). The Mismeasure of Man. New York, NY: Norton.
    • Ivanov, Kristo. (1972). Quality-Control of Information: On the Concept of Accuracy of Information in Data-Banks and in Management Information Systems. Stockholm, Sweden: The Royal Institute of Technology and the University of Stockholm.
    • Kent, William. (2000). Data and Reality. Bloomington, IN: 1st Books Library.
    • Taleb, Nassim Nicholas. (2007). The Black Swan: The Impact of the Highly Improbable. New York, NY: Random House.
    • Tufte, Edward R. (1983). The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press.
    • West, Matthew. (2011). Developing High Quality Data Models. Boston, MA: Morgan Kaufmann.
