Date post: 13-Mar-2018
The Management System for Data Quality (IDQ Tutorial) Thomas C. Redman, Ph.D. “the Data Doc” Navesink Consulting Group www.navesinkconsultinggroup.com Information and Data Quality Conference, Nov 4-7, 2013, Little Rock, AR /Redman-IDQ-DQtutorial-Nov2013 T.C. Redman, Page 1 © Navesink Consulting Group LLC, 2000-2013
Page 1

The Management System for Data Quality (IDQ Tutorial)

Thomas C. Redman, Ph.D. “the Data Doc”

Navesink Consulting Group www.navesinkconsultinggroup.com

Information and Data Quality Conference, Nov 4-7, 2013, Little Rock, AR

Page 2

“Customers” for this presentation

o Practitioners: Who want to know what they need to do (and a bit of “how to do it”)

o Managers: Who want to know where they fit

o Senior Leaders: Who wonder what all the chatter is about. And if it is real, what they need to do to get in front.

o Live heads: Who want to understand why we think about data quality the way we do.

o Good hearts/Agents of Change: Who want to sort out what’s best for their companies.

Page 3

The Big Ideas

o Data quality, done right, is a huge win/win!
o You have to get in front! This means you have to find and eliminate the root causes of error.
o Quality is in the eyes of the customer.
o You have to do the technical work tolerably well.
o Data quality is done in the line (e.g., business process).
o You have to overcome organizational momentum, so actively manage change.
o Get responsibility for DQ out of Tech. Build human capability.
o Sooner or later, you’re going to need a data strategy.
o The quality program only goes as far as the senior team (perceived to be) leading the effort insists.

Page 4

Data Quality Done Right

o One data set for a market data vendor
o Quality Improvement in a data-intensive department
o A long-term, company-wide effort

Page 5

Market Data Vendor

Background: Financial service companies purchase market data from companies such as Reuters, Bloomberg, etc.
o Lack of trust causes them to purchase basic data from multiple sources.
o Bank request: Far better data, so it could reduce its vendor base and eliminate downstream costs of bad data.

Work conducted:
o Clear statement of customer needs.
o Measurement against those needs.
o Root causes identified and addressed, one at a time.
o Statistical control.
o In the course of day-in, day-out work.

Page 6

Market Data Example: Results


Each error not made saves an average of $500. Quickly millions!

Summary message: Even in enormous organizations, the day-in, day-out work of data quality management (customer needs, measurement, improvement, control) is conducted at the work group level.

[Figure: “First-time, on-time results.” Control chart of Fraction Perfect Records (y-axis, 0.5 to 1) by Month (x-axis, 0 to 20). Series: accuracy rate, average, lower control limit, upper control limit, target. Annotations: Program start; 1. Rqmts defined; 2. First Meas; 3. Improvements; 4. Control.]
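The chart on this slide plots an average with lower and upper control limits, but the slides do not say how those limits were computed. As a minimal sketch only, assuming a standard 3-sigma p-chart with equal-sized monthly samples (the sample size, the monthly counts, and the function names are all hypothetical), the limits might be derived like this:

```python
import math


def p_chart_limits(perfect_counts, sample_size):
    """Return (center, lower, upper) for a 3-sigma p-chart.

    perfect_counts: number of perfect records found in each monthly sample.
    sample_size: records inspected per month (assumed constant here).
    """
    fractions = [count / sample_size for count in perfect_counts]
    center = sum(fractions) / len(fractions)
    sigma = math.sqrt(center * (1 - center) / sample_size)
    lower = max(0.0, center - 3 * sigma)  # a fraction cannot fall below 0
    upper = min(1.0, center + 3 * sigma)  # or rise above 1
    return center, lower, upper


def out_of_control(perfect_counts, sample_size):
    """Return the indexes of months whose fraction falls outside the limits."""
    _, lower, upper = p_chart_limits(perfect_counts, sample_size)
    return [
        month
        for month, count in enumerate(perfect_counts)
        if not lower <= count / sample_size <= upper
    ]


# Hypothetical data: 100 records sampled per month, perfect records counted.
monthly_perfect = [90, 92, 91, 89, 93, 90, 75, 91]
center, lower, upper = p_chart_limits(monthly_perfect, 100)
flagged = out_of_control(monthly_perfect, 100)
print(f"average={center:.3f} lower={lower:.3f} upper={upper:.3f} flagged={flagged}")
```

A month that falls outside the limits is a signal to look for a special cause; months inside the limits reflect the ordinary variation of the process.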

Page 7

Access Financial Assurance at AT&T

Background: AT&T expenditures for “access” about $20B/yr.
o Access Financial Assurance aims to ensure integrity of access bills, through parallel “billing.”

Key Idea: Get the bill right the first time.

Work conducted:
o Dissatisfied middle manager, seeking a better way.
o Top-down deployment.
o Staff group defined series of deliverables, then audited (regional) compliance.
o Supplier and process management.
o Customer needs, measurement, improvement, control.

Page 8

Results: Access Financial Assurance

o Data accuracy improved 90%.
o Billing errors reduced 98%.
o Cycle time (bill period closure) reduced 67%.
o AT&T costs (of financial assurance) reduced 73% ($100M/year).
o LEC costs (of access billing) reduced 20%.

Summary message: There is much hidden “non-value-added work” built in to accommodate bad data.

Page 9

Enterprise Programme at BT* (British Telecom)
o Revenue: $33 billion/yr
o Employees: 95,000
o Operates in 170 countries
o 22 Million customers (4 Million business)

Enterprise Data Quality Improvement Programme (10-year effort)
o Recognized the inherent complexity of people, process, technology issues (e.g., data quality problems masquerading as “systems issues”).
o Explicit linkage of data (quality improvement) to strategic business objectives (e.g., business transformation).
o Over time, magnitude of DQ problem understood and exposed.
o Governance structure starting at the very top.
o Consolidated expertise in data quality improvement in IT.
o Estimated and delivered benefits vetted by Finance.


*This summary largely courtesy of Nigel Turner, who led BT’s programme and is now at Trillium Software. He has vetted this summary.

Page 10

BT – Results

Enterprise Data Quality Improvement Programme, cont’d
o Dual focus on reducing capital expenditure and the rework that results from searching for “lost network facilities.”
o Problem discovery, measurement, audits, new controls (hold the gains) enhanced by Trillium DQ tool suite.
o Focused on “big improvement projects” (delivered 75 over ten years).
o Dual focus on data clean-up and process improvement.

Business Benefits:
o > $1B (verified and conservative)
o Also improved customer satisfaction, better regulatory compliance, reversed brand damage and revenue leakage, and contributed to business transformation: These benefits not quantified.

Summary Message: More than you might think, data permeate everything. Bad data are “silent killers.”

Page 11

For Data, Only Two Moments Really Matter


The moment of use

The moment of creation

The whole point of data quality management is to connect the two!

Page 12

To Clean Up The Lake, One Must First Eliminate The Sources Of Pollutant

The Non-delegatable Choice

Unmanaged

Page 13

It is so easy for accountability to shift downstream!!!


Here’s how you do number 3!

Page 14

The Management Systems for Data and Data Quality touch every other management system

Other Mgmt Systems

Mgmt Sys for People

Mgmt Sys for Capital Assets

Management System for data

Management System for DQ

Overall management system


The management system for data embraces the totality of effort directed at acquiring data, ensuring their quality, putting them to work, competing with them, etc.

Page 15

Management System for Data

o “Structure follows strategy” (Alfred Chandler)


Objective/Goal/Strategy; Organization*; Process; Technology

Strategy:
• Goals
• Scope
• Comp Advantage
• Achievable

Organization:
• People
• Org Structure
• Governance
• Culture

D4 Process:
• Data
• Discovery
• Delivery
• Dollars
DQ Mgmt
Metadata

Technology:
• to increase scale/decrease unit cost


* After Roberts, The Modern Firm

Page 16

Who Does the Work?*


[Diagram: ten numbered habits, grouped by who does the work: Senior Leadership; Middle Management (Command); Everyone who touches data = Four Basic “Tasks.” Work is highly interconnected.]

1. Focus on the most important needs (of customers)
2. Manage processes that create data (so they do so correctly)
3. Manage “suppliers” (both inside and out) of data
4. Measure quality levels against customer needs
5. Deploy controls, at all levels, to remain error-free*
6. Improvement: Find and eliminate root causes of error
7. Set and meet aggressive targets for improvement: top-to-bottom
8. Formalize management accountabilities for data
9. Broad, informed, demanding leadership
10. Advance a culture that values data and data quality

Taken together, the tasks define an overall “Management System for Data Quality”

*Ten habits of those with the best data from Redman, Data Driven: Profiting from Your Most Important Business Asset, Harvard Business Press, 2008.

Page 17

Quality Means Customer Satisfaction

o Why we define quality the way we do.
o Data Defined
o Understand and communicate the Voice of the Customer.

Page 18

Debates about “what quality means” are age-old


Volkswagen bug:
- Defect-free
- Reliable
- Inexpensive to drive and operate

Cadillac:
- Comfort
- Room for plenty of suitcases
- Status

Circa 1980: Which is of higher quality: The feature-rich Cadillac or the defect-free Volkswagen bug?

Presenter
Presentation Notes
This slide introduces a discussion about what quality means. It illustrates two “schools of thought:” That quality is mostly concerned with the absence of defects. That quality is mostly concerned with having the features that people want. The slide provides, by way of illustration, the arguments that both sides used. Importantly, both cars are iconic classics and, well-told, the story illustrates the passions that each side brought to the argument. I’ve given some thought to more modern examples. But I’ve given this example dozens of times. And almost everyone resonates with it.
Page 19

Dr. Juran resolved the issue

o He recognized that different “customers” had different needs and desires.

o He recognized that each customer makes different tradeoffs between features and freedom from defects.

o He proposed the term “fitness for use” (or “fitness for purpose”) to unite the two concepts.

o Perhaps most importantly, he recognized customers decide whether a product, service, or collection of data is of high quality.

Page 20

Data Quality

Data are of high quality if they are fit for their intended uses (by customers) in operations, decision-making, and planning (after Juran).

Data that’s fit for use:

free of defects (mostly, “are the data right?”):
- accessible
- accurate
- up-to-date
- kept secure
- etc.

possess desired features (largely, “the right data”):
- relevant
- comprehensive
- proper level of detail
- easy-to-interpret
- etc.

Customers are the ultimate arbiters of quality!!

Page 21

Data Quality - aspirational

“Exactly the right data and information in exactly the right place at the right time and in the right format to complete an operation, serve a client, make a decision, or set and execute strategy.”*

*based on Redman, Data Driven: Profiting from Your Most Important Business Asset, Harvard Business Press, 2008

Page 22

Data Quality – day-in, day-out

Meeting the most important needs of the most important clients.

Page 23

In quality management, we use the term “customer” quite broadly

o Paying customers
o Other external customers, including:
  - Shareholders
  - Regulators and government agencies
  - Other “stakeholders,” who may be impacted by the data
o Internal customers, including your manager (next slide).


Presenter
Presentation Notes
We now turn our attention to looking at who the customers are. Introduce the notion that in quality, the term “customer” is used broadly. Go through a list of categories of customers. Most people want to know where the boss fits in. I find it best to just acknowledge “management” as a customer.
Page 24

Internal customers

Most people and their work don’t impact external customers directly. But everyone:
o Impacts internal customers directly.
  - It is critical that everyone understand who their customers are and what they need.
o Impacts external customers indirectly, through a “chain of customers.”
  - It is also important that everyone understand this chain.

Page 25

A chain of customers


1. Set up a master supplier record
2. AP: Authorize and make Payment
3. Supplier: Receive payment
4. Factory: Acquire supplier raw materials
5. End Customer: Receive finished Product


Presenter
Presentation Notes
I think this is straightforward.
Page 26

Let’s dig into the data layer, starting with a data item

An item of data consists of three elements:

(entity, attribute, value)

o Entity: The thing of interest in the real-world
o Attribute: The particular of interest
o Value: The value assigned to the attribute for the entity

Example: (Jane Doe, Service Record Date = July 1, 1996)

Note that, as defined, data are abstract. “Customers” see them as they are presented in tables, databases, graphs, etc, via applications.

(entity, attribute) = data model
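The (entity, attribute, value) triple can be written down directly in code. The following is an illustrative sketch only: the class name `DataItem`, the `data_model` dictionary, and the `blank_record` helper are hypothetical names, not part of the tutorial. It shows the slide's example as a triple, and a data model as entities and attributes without values.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any


@dataclass(frozen=True)
class DataItem:
    """One item of data: (entity, attribute, value)."""
    entity: str      # the thing of interest in the real world
    attribute: str   # the particular of interest
    value: Any       # the value assigned to the attribute for the entity


# The example from the slide.
item = DataItem("Jane Doe", "Service Record Date", date(1996, 7, 1))

# A data model names the entity classes and attributes of interest, with no
# values. The attribute list here is an assumption made for illustration.
data_model = {
    "Employee": ["Date of Birth", "Service Record Date", "Department", "Manager"],
}


def blank_record(entity_class):
    """Return the empty structure the data model defines for one entity."""
    return {attribute: None for attribute in data_model[entity_class]}


print(item)
print(blank_record("Employee"))
```

The blank record is the structure without values: every attribute has a slot, and none is filled in yet.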

Presenter
Presentation Notes
Here’s how I use this slide: First, it is important to recognize that a data item is about something in the real-world. In data language, that something is called an “entity.” You are an entity, Jane Doe is an entity, the company itself is an entity, and so forth. When talking about data, it is important that the specific entity be clearly identified. For example, there is certainly more than one person named Jane Doe. So it is important that we realize we are talking about this specific Jane Doe. Next, data describe some aspect of the entity. In data language, we use the term attribute for each aspect. In the example here, the attribute of interest is “Service Record Date,” the day Jane started her employment with Shell. Finally, a value is assigned to the attribute for this entity. Here, the value assigned to Service Record Date for Jane Doe is July 1, 1996. As a point of reference you (the class) may have heard the term “data model.” Essentially a data model works out the entities and attributes of interest. Without values, it is a bit like a blank appointment calendar. Plenty of space for meetings (the values) but just a structure. I think it important to close the slide with the note that while entities are real, the data per se are not. They are but an abstraction. What we look at is called the data presentation. We’ll say a bit more about them later.
Page 27

Note that “you” are an “entity”

o Your employer is interested in you as an “Employee.” Attributes include:
  - Date of Birth
  - Service Record Date
  - Department
  - Manager
o Your doctor is interested in you as a “Patient.” Attributes include:
  - Gender
  - Height
  - Weight
  - Blood Pressure
o The taxing authority is interested in you as a Taxpayer
o This list can go on and on


Presenter
Presentation Notes
This slide aims to make the concept of an entity clear. It does so by first pointing out that each person is an “entity.” (if needed, the technical definition of entity is “real-world object or concept”). And like most entities, each of us is very complex, with many, many attributes. Various groups are interested in us for different reasons. As previously noted, your employer is interested in you as employees. So it cares about certain attributes (or fields), as listed. Similarly, a doctor. And the tax man. Each is interested in different attributes. It may bear mention that the term “entity class” is a group of similar entities. Thus “Employee,” “Patient,” and “Taxpayer” are all entity classes.
Page 28

The way an organization chooses to “model” the real world is critically important

o An organization must decide:
  - What entities (and entity classes) are most important
  - What attributes it needs about these.
o (After Twain): “The difference between the ‘right data’ and ‘almost right data’ is like the difference between lightning and a lightning bug.”
o From a quality perspective, the data must be:
  - Relevant to the task at hand
  - At the proper level of detail
  - Clearly defined
  - Etc.
o All of us must make sure we understand exactly what the data we’re using mean!


Presenter
Presentation Notes
Now draw the line from “each person is an entity” to the data an organization needs to collect. Propose to start the discussion this way: It is infeasible to know all things about all entities (even as database technologies expand). So the company has to make some choices. Obviously thinking of “you” as a patient does not serve your employer’s interest. It needs to think of you as an employee. This can be involved. Bankers call their customers “accounts,” lawyers call theirs “clients,” tech companies call theirs “users.” Different relationships between the company and customer are implied. So selecting the right one, consistent with the business, is critical. This calls the Twain saying to mind. The last two bullet items are straightforward. But the last bullet item comes at this from a slightly different perspective, turning from data per se to the class participant’s personal accountability.
Page 29

Choosing the right data can be difficult


Football:

And consequential! Data are subtle and nuanced.
1. As noted, people pay more for exactly the right data!
2. Internally they have become our lingua franca.

Presenter
Presentation Notes
This slide aims to point out how hard it can be to establish “the right word” and then to connect to Shell needs for exactly the right data. In the analogy, “word = data.” To start, and on the left-hand side, use “football,” to illustrate the general point. The sport is played almost everywhere on the planet, but the term “football” takes on a different meaning in different countries. Then the right-hand side considers the difficulties in pinning down exactly what is meant by “well,” a term near and dear to Shell. I think the points about data being subtle and nuanced are straightforward enough. One note: I took the HSBC graphic from an ad, some eight or ten years ago. To my knowledge, the ad bore no copyright. But Shell may wish to check. And I’ll change it if needed.
Page 30

Data “presentation”

As we’ve defined them, data are “abstract.” To use them, we must view physical representations in tables and graphs, on paper and computer screens, via reports and applications.


Presenter
Presentation Notes
To fully understand data quality, we have to introduce one more very important concept, that of data presentation. I’m pretty sure the slide is straightforward enough.
Page 31

There are quality dimensions associated with presentation as well


It is easier to understand this:

[Figure: chart of Sales, in $M, US (y-axis, 0 to 3.5) by year (x-axis, 1995 to 2005)]

Than this:

Year   Sales, $M, US
1995   2
1996   1.8
1997   2.2
1998   2
1999   2.4
2000   2.2
2001   2.6
2002   2.4
2003   2.8
2004   2.6
2005   3

The most important is “ease of interpretation”
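To make the table-versus-graph point concrete, here is a small sketch that renders the slide's sales figures as a text bar chart. The function name `text_bar_chart` and the scale of ten characters per $1M are arbitrary choices made for illustration; any plotting tool would serve the same purpose.

```python
# Sales figures taken from the table on this slide ($M, US).
SALES = {
    1995: 2.0, 1996: 1.8, 1997: 2.2, 1998: 2.0, 1999: 2.4, 2000: 2.2,
    2001: 2.6, 2002: 2.4, 2003: 2.8, 2004: 2.6, 2005: 3.0,
}


def text_bar_chart(series, chars_per_unit=10):
    """Render one line per year, with a bar whose length tracks the value."""
    lines = []
    for year, value in series.items():
        bar = "#" * round(value * chars_per_unit)
        lines.append(f"{year} {bar} {value}")
    return "\n".join(lines)


print(text_bar_chart(SALES))
```

Even in this crude form, the zig-zag upward trend is visible at a glance, whereas the table has to be read value by value.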

Presenter
Presentation Notes
The slide aims to point out that the whole idea of presentation is to make it easy to understand, interpret, and put the data to work. Like everything in data quality, there are literally dozens of possible “dimensions.” The slide calls out the most important, using a graph, which is much easier to understand than the corresponding table.
Page 32

Understanding customer needs is a tall order*

o Sometimes it is not even clear who the customers are.
o Customers don’t know what they want.
o Their needs keep changing.
o There are many customers and their needs conflict.
o They don’t have time to spend with you.
o You don’t have time to spend with them.
o Etc.

Of course, as we’ve seen, there is really no choice.

*The Customer Needs Cookbook provides a way of doing so.


Presenter
Presentation Notes
Return to the notion that understanding customers and their needs is critical and really hard. I believe the list of reasons is straightforward. Similarly, the notion that there is no choice. Close the slide by noting that we will provide two good ways to help do so.
Page 33

1. Listen to people. Hear the broader message

Companies use a variety of means to understand customer satisfaction:
o Surveys
o Interviews
o Customer complaints
o Focus groups

Available to you:
o Your line manager
o Co-workers
o Logged errors
o People who ask something of you
o Complaints
o “Thank yous”

Make understanding customers and their needs part of your daily work


Presenter
Presentation Notes
The first way is essentially to listen. There are actually two messages here: You need to make understanding customer needs part of your work. There are many ways to do so. Juxtapose the ways companies can listen with the ways that are available to people in their daily work.
Page 34

2. Hear the “Voice of the Customer”

Those who drive family cars don’t say: “We need the glass to be polarized, .17” thick, tempered for 40 hours in an 800 degree Thermaflex kiln, with a 26% blue-green tint, cut within 14 mils of specification, and with the edges sanded with 500 grit paper.”

They say: “We need to be able to see out in all kinds of weather. We want to be kept safe. We don’t want to be blinded by the sun. We need the door to sound solid when it closes.”


Presenter
Presentation Notes
The second way to understand customers and their needs is to hear the “voice of the customer.” VOC is a useful concept. In its most basic and powerful form, it simply dictates that one hear what the customer is really saying in a non-judgmental fashion. Similarly, don’t pollute the voice of the customer with “what I need to do to satisfy the customer.” That will come later in the workshop. I’ve used the example given here dozens of times, with good results.
Page 35

3. Use the Customer Needs Analysis Cookbook/Communicate the VOC far and wide

1. Name most important customers
2. Learn how they use the data
3. Determine required features and quality levels
4. Document and communicate “Customer Requirements”
5. Identify and prioritize “gaps”


Optional


6. Communicate the “Voice of the Customer”
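Step 5 of the cookbook, identifying and prioritizing “gaps,” lends itself to a small worked example. The sketch below is hypothetical throughout: the `Requirement` fields, the 1-to-5 importance scale, the importance-times-gap ranking, and the sample figures are assumptions made for illustration, not the method the tutorial prescribes.

```python
from dataclasses import dataclass


@dataclass
class Requirement:
    customer: str
    need: str
    importance: int   # hypothetical scale: 1 (low) to 5 (high)
    required: float   # quality level the customer needs
    measured: float   # quality level currently delivered

    @property
    def gap(self):
        """Shortfall against the requirement; zero when the need is met."""
        return max(0.0, self.required - self.measured)


def prioritize_gaps(requirements):
    """Return unmet requirements, largest importance-weighted gap first."""
    unmet = [r for r in requirements if r.gap > 0]
    return sorted(unmet, key=lambda r: r.importance * r.gap, reverse=True)


# Hypothetical requirements gathered in steps 1 to 4.
requirements = [
    Requirement("Factory", "delivery dates on time", 3, 0.95, 0.90),
    Requirement("Accounts Payable", "supplier address accuracy", 5, 0.99, 0.92),
    Requirement("Finance", "invoice completeness", 4, 0.98, 0.99),
]

for r in prioritize_gaps(requirements):
    print(f"{r.customer}: {r.need} (gap {r.gap:.2f}, importance {r.importance})")
```

Requirements that are already met drop out of the list, so attention goes to the most important customers with the largest shortfalls.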

Page 36

You Can’t Manage What You Don’t Measure

o Why measuring accuracy is so difficult.
o General framework for measuring DQ.
o A framework for measuring data accuracy.
o An example using business rules.
o Time series and Pareto plots are the workhorses.
o A word of caution!

Page 37

Five metrics come up frequently

[Diagram labels: Database; Process that creates data]

1. Accuracy of data as created by the process
2. Accuracy of embedded data
3. Timeliness of delivery (to customers)/Process Cycle time
4. Customer satisfaction (especially for meta-data)
5. Cost-of-poor quality $$

To be clear, other metrics (completion of improvement projects, incorporation of new features) are sometimes important
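One common way to approximate the first metric, accuracy of data as created by the process, is to check newly created records against business rules and report the fraction that pass every rule. The sketch below assumes this approach; the rules, field names, measurement date, and sample records are all hypothetical. Passing the rules shows only that no error was detected, not that a record is correct.

```python
from datetime import date

AS_OF = date(2013, 11, 4)                   # hypothetical measurement date
KNOWN_DEPARTMENTS = {"Sales", "Finance", "IT"}  # hypothetical reference list

# Hypothetical business rules: rule name -> check applied to one record.
RULES = {
    "name is present": lambda r: bool(r.get("name")),
    "service date is not in the future": lambda r: (
        r.get("service_date") is not None and r["service_date"] <= AS_OF
    ),
    "department is recognized": lambda r: r.get("department") in KNOWN_DEPARTMENTS,
}


def failed_rules(record):
    """Return the names of the rules this record violates."""
    return [name for name, check in RULES.items() if not check(record)]


def fraction_perfect(records):
    """Fraction of records that violate no rule."""
    if not records:
        raise ValueError("no records to measure")
    perfect = sum(1 for record in records if not failed_rules(record))
    return perfect / len(records)


records = [
    {"name": "Jane Doe", "service_date": date(1996, 7, 1), "department": "Finance"},
    {"name": "", "service_date": date(2001, 3, 15), "department": "Sales"},
    {"name": "John Roe", "service_date": date(2020, 1, 1), "department": "IT"},
    {"name": "Ann Poe", "service_date": date(2010, 5, 2), "department": "Sales"},
]

print(f"fraction perfect: {fraction_perfect(records):.2f}")
```

Tracking this fraction month by month gives the kind of series that can be plotted against a target.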

Page 38

A deeper look at “accuracy”

o All customers want their data to be “accurate enough.”
o “Accuracy is a measure of the degree of agreement between a data value or collection of data values and with a standard, taken as correct (Field Guide, p 221).”
o Most investors are satisfied if: Mega-company’s annual revenue is stated within one million dollars.

/Redman-IDQ-DQtutorial-Nov2013 38 © Navesink Consulting Group LLC, 2000-2013

A deeper look at “accuracy”, contd.

Consider a piece of direct mail sent to:

    Jane Doe
    10 Main Street

instead of:

    Jane Doe
    12 Main Street

Accurate enough?
o YES: The mail will very likely be delivered safe and sound.
o NO: Who would trust a company that sends mail to the wrong address?

Those responsible for data quality measurement must work through such issues!

A deeper look at “accuracy”, contd.

o Note that a data value is “abstract.” It has no physical properties, such as height, acidity, or voltage.
o So there is no “accurometer” in the same sense that there is a ruler, pH meter, or voltmeter.
o Thus all accuracy measurements have limitations.
o Those responsible for data quality measurement must understand these limitations!

Developing and using measurements

Data Quality Measurement Process:

1. Understand customer / customer needs of measurement.
2. Answer basic questions. For accuracy, select from the Accuracy Framework.
3. Document the “protocol” for making the measurement.
4. Take the measurement, according to protocol.
5. Summarize results (on time-series and Pareto plots).
6. Review results / take appropriate actions.

Data Accuracy Measurement Framework

[Diagram: customer-supplier model. Suppliers -> inputs -> business process (with DB) -> outputs -> Customers, with requirements (rqmts) and feedback flowing back.]

Where: As received / At DB entry / On DB “exit” / As delivered / In customer eyes / Across process

What Data to Include:
• Key attributes
• All attributes

Measurement Devices: Expert Opinion / Real-world / Allowed Domains / Complaints / Surveys / Tracking

Reporting Scales:
• Attribute-level
• Record level
• Failed Business Rules
• Customer-Sat
• Six-sigma scale
• COPDQ

There are dozens of candidate accuracy measures based on four factors (e.g., answers to the questions of step 2).

Candidate Accuracy Measurement Devices

o Comparison to business rules: (As previously noted), data values that fail business rules cannot be correct.
o Expert opinion: Those with deep knowledge of the data can spot (many) errors easily.
o Comparison to real-world: Check the value(s) against their real-world counterpart(s).
o Customer complaints: People (often) complain when the data are wrong.
o Surveys: Ask customers.
o Data tracking: “Track” a data value as it moves across the process and support systems.

All have strengths and weaknesses, which we will take up later on.

Presenter notes: This slide highlights the basic idea for each device for making the accuracy measurements.

First-time, On-time measurement

Who’s Leading:
o Owner of a data warehouse who is implementing supplier management for the first time.

Customers/Needs:
Leader:
o Is a selected supplier any good?
o Are the data getting better/worse?
o What are the key areas of improvement?
Supplier:
o Are the data I’m providing any good?

First-time, on-time: Key Ideas

o Make measurements that will help drive accountability back to the supplier.
o Frequent measurement (to help get a feel for variation).
o Repeatable, low-cost, comprehensive measurement.
o Measure both accuracy and on-time delivery.

“First-time, on-time” measurement

[Diagram: customer-supplier model (Supplier -> inputs -> Business Process / DB -> outputs -> Customers, with rqmts and feedback), annotated with the measurement choices below.]

o Where: As data are created/delivered to the DB.
o What data: Key attributes (fields) only.
o Device: “Count” the records that fail business rules or aren’t delivered on schedule.
o Reporting: Summarize at the record level. Identify opportunities at the attribute level.

A data value that fails a business rule cannot be correct

A “business rule” is a rule that constrains one or more data values.

EXAMPLES:
o “SUPPLIER NAME = Null” (rule: required attribute)
o “SEX = X” (rule: Sex = M, F, or NA only)
o REVENUE = 10,000€, EXPENSE = 8,000€, PROFIT = 4,000€ (rule: Profit = Revenue - Expense)
o HEIGHT = 2.6 metres (possibly correct, but can’t tell for sure)
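The first three examples can be expressed as simple predicates over a record. The following is a minimal Python sketch, not a method prescribed in the tutorial; the field names (`supplier_name`, `sex`, `revenue`, `expense`, `profit`), the `RULES` table, the `failed_rules` helper, and the supplier name "Acme" are all hypothetical.

```python
# Hypothetical business rules modeled on the slide's examples.
# Each rule maps a name to a predicate that returns True when the record passes.
RULES = {
    "supplier_name is required": lambda r: r.get("supplier_name") not in (None, ""),
    "sex is M, F, or NA": lambda r: r.get("sex") in {"M", "F", "NA"},
    "profit = revenue - expense": lambda r: r["profit"] == r["revenue"] - r["expense"],
}


def failed_rules(record):
    """Return the names of the business rules that the record fails."""
    return [name for name, passes in RULES.items() if not passes(record)]


# The erred values from the slide, gathered into one illustrative record.
bad = {"supplier_name": None, "sex": "X",
       "revenue": 10_000, "expense": 8_000, "profit": 4_000}
# A record that passes every rule (it is valid, which does not prove it is correct).
good = {"supplier_name": "Acme", "sex": "F",
        "revenue": 10_000, "expense": 8_000, "profit": 2_000}

print(failed_rules(bad))   # all three rules fail
print(failed_rules(good))  # no rule fails
```

The HEIGHT example is deliberately left out: no simple rule settles whether 2.6 metres is wrong.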

To be clear

o Customers generally want “accurate data”:
    Passed business rules ⇒ “valid”
  But
    Valid ⇏ correct
o Indeed,
    DQ (validity) ≥ DQ (correct)
o Keep this in mind when interpreting results.
o Final note: I usually find the depth of business rules and usability of measurements based on them to be highly correlated.

Process for making process accuracy measurements using business rules

1. Develop business rules.
2. Apply the business rules to the data delivered (or expected) in the current time period.
3. Identify failed rules/erred data and record them in a spreadsheet (next slide).
4. Summarize results.

Data Quality Measurement Spreadsheet

[Spreadsheet: rows are records A to J; columns are Rules i to x, grouped under Attributes 1 to 5; an X marks a failed rule. Per-record and per-rule summaries follow.]

Rec#   Failed rules (X)   Attributes passed (of 5)   Record pass
A      1                  4                          n
B      1                  4                          n
C      1                  4                          n
D      1                  4                          n
E      1                  4                          n
F      0                  5                          y
G      1                  4                          n
H      0                  5                          y
I      2                  3                          n
J      1                  4                          n

Rules passed (of 10 attempts):
Rule     i    ii   iii  iv   v    vi   vii  viii ix   x
Passed   9    9    7    10   9    9    10   9    10   9

Accuracy Measure          Calculation                                              Fraction
Valid fields              (4+4+4+4+4+5+4+5+3+4)/50 (next-to-last column)           0.82
All valid records         2 of 10 (see last column)                                0.2
Passed business rules     (9+9+7+10+9+9+10+9+10+9)/100 (last row)                  0.91
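The three accuracy measures can be recomputed from the spreadsheet's summary column and summary row. This is a small Python sketch using the counts shown on the slide; the variable names are illustrative only.

```python
# Per-record counts read from the spreadsheet: attributes passed (of 5).
attributes_passed = {"A": 4, "B": 4, "C": 4, "D": 4, "E": 4,
                     "F": 5, "G": 4, "H": 5, "I": 3, "J": 4}
# Per-rule counts from the last row: records passing each of Rules i to x.
rules_passed = [9, 9, 7, 10, 9, 9, 10, 9, 10, 9]

N_ATTRIBUTES = 5
n_records = len(attributes_passed)

# Fraction of valid fields: passed attribute values over all attribute values.
fraction_valid_fields = sum(attributes_passed.values()) / (n_records * N_ATTRIBUTES)

# Fraction of all-valid records: a record passes only if every attribute passes.
fraction_valid_records = sum(
    1 for passed in attributes_passed.values() if passed == N_ATTRIBUTES
) / n_records

# Fraction of passed business rules: passes over all rule applications.
fraction_rules_passed = sum(rules_passed) / (len(rules_passed) * n_records)

print(fraction_valid_fields, fraction_valid_records, fraction_rules_passed)
```

The record-level measure is the most demanding of the three: one failed rule anywhere fails the whole record, which is why it comes out so much lower than the other two.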

Data Quality Measurement Spreadsheet: “First-time, on-time” measurement

[Spreadsheet image; annotation: all critical attributes.]

DQ measurements are made every time period and plotted as a time-series

[Time-series plot: “Fraction valid records, XYZ Supplier, 1Q2011”. x-axis: week ending; y-axis: fraction perfect records (0.500 to 0.900); series: fraction perfect, ave, rqmt.]

Important features of a time series plot:
1. Clear title and labels on axes.
2. “This way good” indicator.
3. Average line.
4. Requirements line.

Support with a simple Pareto plot that describes “error”

[Pareto chart, weeks 1-25. x-axis: error category, ordered by frequency (attribute 2, attribute 3, late, attribute 1); y-axis: fraction of records (0 to 0.35).]

The Pareto chart aims to help identify “opportunities for improvement.”
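Ordering error categories by frequency is the whole of the Pareto idea, and it takes only a few lines. The sketch below is illustrative: the `pareto_order` helper is hypothetical, and the counts are made up (the chart's actual values are not given), chosen only to reproduce the category order shown.

```python
def pareto_order(error_counts):
    """Return (category, count, cumulative share) rows sorted by descending count."""
    total = sum(error_counts.values())
    ordered = sorted(error_counts.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0
    for category, count in ordered:
        running += count
        rows.append((category, count, running / total))
    return rows


# Hypothetical counts of erred records per category (not the chart's data).
counts = {"attribute 1": 5, "late": 12, "attribute 2": 30, "attribute 3": 21}
for category, count, cumulative in pareto_order(counts):
    print(f"{category:12s} {count:3d} {cumulative:.2f}")
```

The cumulative share column shows how much of the total error the first few categories account for, which is what makes them the natural candidates for improvement projects.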

Now, answer the questions

Leader questions:
o Is a selected supplier any good?
  Answer: Absolutely not.
o Are the data getting better/worse?
  Answer: No, there is considerable week-to-week variation, but no evidence of any trend.
o What are the key areas of improvement?
  Answer: Attributes 2 and 3 and late delivery.

Supplier question:
o Do the data I supply meet the customers’ needs?
  Answer: No. Average performance is well below requirements.

Business Rules

Pros:
o Simple, easy-to-use.
o Once set up, can be applied almost for free.
o Many good, commercially-available tools.
o Business rules can also be used for control.
o When the data are poor (as in this case), there is no need for more elaborate measurement.

Cons:
o People may find ways to beat the business rules.
o This method may overestimate true accuracy. That can become an issue when DQ nears 98%.
o CAUTION: Business rules can seduce!

Top-line pros and cons of various data accuracy measurement instruments

Comparison of Accuracy Measurement Devices

Device                  Advantages                     Disadvantages
Data Tracking           most powerful                  expensive
Expert Opinion          quick and easy                 experts aren’t always right
Compare to Real-World   best accurometer               very expensive
Business Rules          can be applied to entire DB    deceptively hard
Complaints              eyes of customer               hard, if not already set up
Surveys                 can yield potent insights      often hard to interpret

Good Controls Mean Predictable Performance

o Control defined
o Why control matters
o Types of data controls
o Simple controls using business rules
o Trusted data

Control - Background

o Formally, control is the managerial activity of comparing actual performance against standards and taking action on the difference (after Dr. Juran).
o We wish to emphasize the managerial aspect of control. “Controls” without clear management accountability do not qualify.
o There are many, many types of controls, from budget controls, to automatic controls, to quality controls.

The generic control process*

1. Set standard (standard/goal).
2. Compare the data/measurement of interest, taken from the underlying process/data, against the standard.
3. Pass? If yes, done. If no, take (corrective) action.
4. Log results (in the control log).

*Based on the work of Dr. Juran.
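The four elements of the generic control (standard, comparison, corrective action, log) map naturally onto a single function. This is a minimal Python sketch under stated assumptions: the `run_control` function, its parameters, the 0.95 requirement, and the 0.82 measurement are all hypothetical and chosen only for illustration.

```python
def run_control(measure, standard, corrective_action, log):
    """One pass of a control: compare a measurement to the standard,
    act on the difference if it fails, and log the result either way."""
    value = measure()
    passed = standard(value)
    action = None if passed else corrective_action(value)
    log.append({"value": value, "passed": passed, "action": action})
    return passed


# Hypothetical usage: a weekly fraction of valid records against a 0.95 requirement.
control_log = []
run_control(
    measure=lambda: 0.82,
    standard=lambda v: v >= 0.95,
    corrective_action=lambda v: f"notify supplier: {v:.2f} is below requirement",
    log=control_log,
)
print(control_log)
```

Note that the sketch covers only the mechanics; the managerial accountability that the tutorial stresses has to be assigned to a person, not a function.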

A thermostat effects control

o Set standard: Set desired temperature (18°C ± 1°C).
o Measure: Thermometer measures continuously (temperature 18°C).
o Compare: Pass?
    yes: Done = leave furnace or AC alone
    no: Turn furnace or AC on or off

Presenter notes: This slide begins a series of examples. Almost everyone is familiar with a thermostat, so it is a good starting place.

Control in your personal life

Teenager to father: “You’re not the boss of me.”

STEP 1: Consider a situation in your everyday life where you apply a control. Examples: Weight, budget, children’s bedtime.
STEP 2: Describe each element of the control.
STEP 3: Evaluate the control. Does it work? Can you improve it?

Why control matters

As these examples make clear, control is fundamental to all management.
o For data quality, various types of control:
    o Help identify and correct errors
    o Help prevent them
    o Help ensure that the gains of process improvement are sustained
    o Establish a basis for prediction and so help manage processes
    o Ensure that the data quality program is working as designed
o Net, net, good controls help ensure better quality at lower cost.

Data Quality Controls

o “Validation” controls (also called “edit” controls) are based on business rules.
    o In-process controls detect/correct errors and keep them from being passed further downstream.
    o Clean-up controls correct errors that have already been made.
o Statistical process controls establish a basis for predicting future performance.
o Quality Assurance = Audit controls ensure the quality program is working as designed.
o Customer error correction controls ensure that customer-discovered issues are addressed.
o Calibration controls make sure equipment works correctly.

A simple data validation: Age and years with company

o Standard: Age - Years > 18 (labor laws: can’t start work until 18).
o Data of interest: Age minus years with company, from employee data.
o Compare: Pass?
    yes: Done
    no: Alert employee of error. Seek resolution.
o Log error and resolution in the control log.
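The age validation above can be sketched as a single check that logs its failures. This Python example is illustrative only: the record fields (`id`, `age`, `years_with_company`), the `validate_age` function, and the log format are hypothetical. It also makes one labeled assumption about the comparison: the slide writes the rule as "Age - Years > 18", while the sketch uses `>=` so that someone hired at exactly 18 passes.

```python
MIN_START_AGE = 18  # from the slide: "Can't start work until 18"


def validate_age(employee, control_log):
    """Check that age minus years with company is at least MIN_START_AGE.

    Assumption: the slide writes the rule as "Age - Years > 18"; this sketch
    uses >= so that someone hired on their 18th birthday passes.
    """
    start_age = employee["age"] - employee["years_with_company"]
    passed = start_age >= MIN_START_AGE
    if not passed:
        # On failure the slide calls for alerting the employee and logging
        # the error and its resolution; here only the error is logged.
        control_log.append({"employee": employee["id"], "start_age": start_age,
                            "resolution": "pending"})
    return passed


log = []
print(validate_age({"id": 1, "age": 40, "years_with_company": 15}, log))  # True
print(validate_age({"id": 2, "age": 30, "years_with_company": 14}, log))  # False
print(log)
```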

Can attest that data can be trusted when the plot looks like this….

[p-chart, first-time, on-time. x-axis: week (1 to 81); y-axis: fraction passing rules and on-time (0.2 to 1). Annotations: LCL exceeds requirements; long period of stability.]
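The slide does not say how the lower control limit (LCL) is computed. A common convention for a p-chart with a constant sample size is the three-sigma limit around the average fraction, and the Python sketch below assumes that convention. The `p_chart_limits` function, the weekly fractions, the sample size of 400, and the 0.90 requirement are all hypothetical.

```python
import math


def p_chart_limits(fractions, sample_size):
    """Return (LCL, center line, UCL) for a p-chart with a constant sample size,
    using the conventional three-sigma limits."""
    p_bar = sum(fractions) / len(fractions)
    sigma = math.sqrt(p_bar * (1 - p_bar) / sample_size)
    lcl = max(0.0, p_bar - 3 * sigma)
    ucl = min(1.0, p_bar + 3 * sigma)
    return lcl, p_bar, ucl


# Hypothetical weekly fractions of records passing rules and delivered on time.
weekly = [0.96, 0.97, 0.95, 0.98, 0.96, 0.97]
requirement = 0.90  # hypothetical requirement
lcl, center, ucl = p_chart_limits(weekly, sample_size=400)

# The "trusted data" condition on the slide: the LCL exceeds the requirement.
print(lcl > requirement)
```

The other condition on the slide, a long period of stability, still has to be judged from the plot itself; a handful of weeks, as in this toy input, would not be enough.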

Customer Controls

Customer controls aim to make sure questions from customers are answered in a timely fashion. Customer controls are similar to other controls, except the issue is brought by a customer.

o Set standard: e.g., all customer questions answered correctly within 24 hours.
o Log results in the control log.

[Flowchart elements: Customer issue (“This doesn’t look right”); Call/email “Help Desk”; Research issue; Cust corr? (yes/no); Assign resp to address; Correct error/Resolve issue; Advise customer; done; Control Log.]

Quality Improvement

o Quality improvement and the scientific method
o The Quality Improvement Cycle
o A (stunningly real) example
o Common opportunities for improvement

What is Quality Improvement?

Quality improvement is a structured, team-based approach for identifying and completing specific “projects” to eliminate or mitigate root causes of error and keep them from coming back!

We’ll use the term “project” to mean an opportunity scheduled for completion (after Juran).

The Scientific Method

There are many, many specific approaches to improvement.
o The six-sigma technique known as DMAIC (Define, Measure, Analyze, Improve, Control) is most popular.
o All good methods of improvement stem from the scientific method.
o You should use the method that is best suited to your organization.
o I teach the Quality Improvement Cycle, because I like its emphasis on properly chartering projects.

Quality Improvement Cycle (QIC)

1. Select project
2. Form and charter project team
3. Conduct root cause analysis
4. Identify and trial solution
5. Implement solution
6. Hold the gains

Complete project / Dissolve QIT

A Simple QIC Example

Step 1: Select project

Background: A process management team (PMT) reviews 4Q2012 attribute error rates, obtaining the following:

[Bar chart: “Percent Erred Attributes, 4Q2012”. x-axis: Attribute (E, C, D, A, all others); y-axis: 0% to 30%.]

The team decides to address Attribute E.

Step 2: Form and charter the team

The PMT:
o Forms a project team composed of one person from each major step of the process producing this data.
o Selects a junior manager to lead the team.

The PMT and improvement team agree to the following charter:
o “Reduce the error rate in Attribute E by 50% in three months. Virtually eliminate it in three additional months.”

The improvement team:
o Agrees to provide a monthly status report to the PMT.

NOTE: This clarity about roles and responsibilities is the reason I teach QIC.

Step 3: Conduct Root Cause Analysis

o The improvement team learns that Attribute E is added to the data record by a clerk reporting into Finance.
o A member of the improvement team meets with this clerk.
o The clerk admits that “he never knew what Attribute E was” (Note: A white-space problem). So he just entered anything that the “system” would allow.
o Further investigation reveals that:
    o Finance’s procedures do not specify how data is to be populated.
    o The clerk’s manager assumed that “the clerk would get help if it was needed.”

Several candidate solutions were proposed

Two received serious consideration:
1. Move responsibility for entering Attribute E to the Field Engineering Group (currently a field engineer creates Attribute E and emails it to Finance). Clarify exactly when and what is expected.
2. Clarify the clerk’s responsibility. Clarify exactly when and what is expected.

Note: There are pros and cons of both. In this case, the PMT pointed out that proposed solution 2 was simpler.

Step 4: Identify and Trial Solution

o The Improvement Team, working with the clerk, drafted the “Finance Procedure for Creating and Entering Attribute E.”
o The clerk agreed to give it a try.
o One improvement team member agreed to check in with the clerk, answer questions, etc.
o The improvement team conducted a follow-up study near the end of its original three-month deliverable. Results suggested the problem was solved.

[Bar chart: “Attribute E Improvement Project”. x-axis: Before, After; y-axis: error rate (0 to 0.3).]

Step 5: Implement Solution

o The Improvement Team meets with the clerk’s manager to review results and suggest that their draft procedure be made standard.
o She agrees and also points out that clerks turn over rapidly. So training of new clerks is essential.
o The manager maintains a New Clerks’ Introduction Guide. She agrees to include the Attribute E standard in the Guide.
o The manager wonders if there are other similar issues that should be standardized.
o The Improvement Team agrees that this may be the case, and advises her that she is free to charter an improvement project to investigate.

Step 6: Hold the gains

A new control is defined:
o PMT: Check measurements every month. If there is more than a single erred Attribute E, they will advise the manager and clerk directly.

To Complete the Project

o The improvement team makes its final report in the form of a storyboard and delivers it to the PMT.
o The PMT accepts the report and closes out the project.
o The improvement team is disbanded, with thanks.
o While the QIC does not require it, a senior leader should recognize the efforts of the improvement team.

Attribute E Improvement Project: Story Board

Attribute E Improvement Project, April – June, 2013

Step 1: Project selection
Motivation: Eliminate the highest error rate.
[Bar chart: “Percent Erred Attributes, 4Q2012”. x-axis: Attribute (E, C, D, A, all others); y-axis: 0% to 30%.]

Step 2: Charter Project Team
Members: Jim Henson (Finance), Bev Miller (leader), Paul Thorgood (Engineering)
Charter: Reduce Attribute E errors by 50% in three months and virtually eliminate them three months later.

Step 3: Root Cause Analysis
Suspected Cause: Clerk doesn’t understand Attribute E.
Deeper Cause: High turnover and clerks poorly trained.

Step 4: Identify/Trial solutions
Solution, part A: Attribute E Data Creation and Entry Standard
[Bar chart: “Attribute E Improvement Project”. x-axis: Before, After; y-axis: error rate (0 to 0.3).]

Step 5: Implement Solution
Solution, part B: Include standard in “New Clerk Training Manual”

Step 6: Control
PMT to advise manager and clerk if there is ever more than one Attribute E error in a month.

Experience confirms that many improvement projects are quite simple

Look for:
o Lack of communication between customers and creators
o People not understanding what is expected
o Broken interfaces between steps
o Overly complex steps
o Uncalibrated measurement devices

All Work is Part of a Process

o The Customer-Supplier Model helps build communications channels.
o The Process Management Cycle wraps the workhorses (customer needs, measurement, control, improvement).
o Smart process owners focus on communications and interfaces between steps.
o The Supplier Management Cycle extends the thinking to external suppliers.
o Why process and supplier management “work.”

Processes and their importance

A process is a set of inter-related work activities, usually characterized by specific inputs and repeated value-added steps, which produce a specific set of outputs.*

Processes are the means by which businesses deliver added value to their customers, whether the added value is data, a service or a tangible product.

The basic idea is to define, implement, operate, and improve processes that meet customer needs:
    o Consistently
    o In a cost-effective manner
    o In ways that evolve as customer needs evolve.

*Redman, Data Quality: The Field Guide, 2001.

Presenter notes: I think this slide is straightforward, but it is really important. It starts with a definition of process. The key notion is repeatability. Then it connects process to meeting customer needs. Finally it connects process to business success.

The Customer-Supplier Model

[Diagram: Suppliers -> inputs -> Your Process -> outputs -> Customers, with requirements and feedback channels on both sides, all embedded in a larger process. Annotation: data suppliers can be inside or outside the business.]

Presenter notes: The customer-supplier model is perhaps the most potent tool in all of quality management. Don’t be fooled by its apparent simplicity. The model has proven remarkably fungible and scalable. Provide a high-level description here:
o You in the middle, suppliers to the left, customers to the right
o Data flow, left to right
o Requirements and feedback (i.e., communications channels)
o Embedded in larger process
We’ll start by dissecting it piece by piece.

Failure Of Communications Channels

[Diagram: customer-supplier model (Suppliers -> inputs -> Your Process -> outputs -> Customers, with requirements and feedback channels on both sides).]

Failure to understand customer requirements has contributed to every quality issue I’ve ever worked on.

More generally and fundamentally, the root cause is failure to build all needed communications channels.

Page 85: The Management System for Data Quality (IDQ Tutorial) · PDF fileThe Management System for Data Quality (IDQ Tutorial) Thomas C. Redman, Ph.D. “the Data Doc” Navesink Consulting

Process Management Cycle

1. Establish management

responsibilities

2. Understand customer

needs

3. Describe process

4. Establish measurement

system

5. Establish control

& check conformance

to requirements

6. Identify & select

improvement opportunities

7. Make improvements

& sustain gains

The process management cycle provides a powerful, repeatable

means to bring the tasks (or habits) of data quality management to bear

/Redman-IDQ-DQtutorial-Nov2013 T. C. Redman, Page 85 © Navesink Consulting Group LLC, 2000-2013

Presenter
Presentation Notes
This portion of the workshop module provides an overview of the process management cycle, with a focus on its application to data. The working assumption is that attendees have taken the earlier DQ workshops, or are familiar with tools such as flowcharting and DMAIC, and have at least heard of statistical process control. One important point is that the process management cycle pulls familiar methods and tools together into a powerful and repeatable synthesis. The module flows step by step and provides a couple of points on each step.

1. Establish management responsibilities

Generally, the “process team” (executive, owner, manager, support personnel) must:
o Be accountable for overall performance (meeting data customer requirements at reasonable cost) of the process. An excellent starting point is: “Halve the overall error rate every year.”
o Have the authority to effect changes (e.g., budget, staffing, process design).
o Have the skills to do the work called for throughout the process management cycle.
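
The “halve the overall error rate every year” starting point implies a simple geometric schedule of targets. The sketch below is only an illustration of that arithmetic: the function name, the 20% baseline, and the three-year horizon are assumptions chosen for the example, not figures from the tutorial.

```python
def error_rate_targets(baseline_rate: float, years: int) -> list[float]:
    """Return yearly error-rate targets when the rate is halved each year.

    baseline_rate is today's measured error rate as a fraction (e.g. 0.20).
    Index 0 is the baseline; index k is the target at the end of year k.
    """
    if not 0.0 <= baseline_rate <= 1.0:
        raise ValueError("baseline_rate must be a fraction between 0 and 1")
    if years < 0:
        raise ValueError("years must be non-negative")
    return [baseline_rate * 0.5 ** k for k in range(years + 1)]


# Hypothetical baseline: 20% of records contain at least one error.
for year, rate in enumerate(error_rate_targets(0.20, 3)):
    print(f"year {year}: target error rate {rate:.1%}")
```

A schedule like this gives the process team something concrete to be held accountable for at each annual review.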


Composition of a Process Management Team

1. Representatives of departments that contribute to the process.
  o Detailed knowledge (or the ability to get it) of the data their department uses and creates.
  o Take an end-to-end view.
  o Volunteers provided the time to contribute effectively.
  o Empowered representatives of their departments, qualified to evaluate and implement any necessary changes.
2. Staff with expertise in data quality.
3. Formed into sub-teams charged with:
  o Understanding customers and their needs
  o Making and interpreting measurements
  o Starting and completing improvement projects (e.g., lean, DMAIC)
  o Defining and implementing controls
4. Processes that depend on outside data should name a “data supplier manager.” For data created or sourced outside, this should be a formal position.
5. NOTE: Highly “data dependent” processes, “large” processes, and those which “share” data almost certainly require a dedicated, full-time data quality professional.


Talents of the Process Owner

o For data, many liken process management to “herding cats.”
o So the ideal process manager possesses the following skills/traits:
  o The ability to lead through influence, rather than control
  o Nerves of steel and steely optimism
  o Communications, written and verbal
  o Analysis and synthesis
  o Decision-making
  o Ability to negotiate across organizational boundaries
  o The ability to build and coordinate a diverse team
  o Perseverance


Successful process owners…

o Bring data front and center.
o Emphasize communications channels.
o Align the work in the direction of the customer. Importantly, they recognize that people’s bosses are customers.
o Focus less on the details of how departments and groups do their work and more on the interfaces between them.
o Gain needed authority, support, and resources.


Common situation

o It is natural to expect conflict between “line” and “process” management.

o Good process managers avoid most of it by focusing where they can do the most good, usually the interfaces between steps.

o Said differently, good process owners focus less on the work and more on how the work fits together.

o They also focus less on their “span of control” and more on their “span of influence.”


Example: To shorten cycle time, process owners focus first on “queue time” between steps

[Figure: a four-step process (steps 1 to 4) drawn on a timeline starting at T=0, with value-added steps separated by queue (wait) time.]

BEFORE: “Value-added time:” 1 day. “Queue time:” 6 days. Total time: 7 days.

AFTER (cutting wait time): “Value-added time:” 1 day. “Queue time:” 3 days. Total time: 4 days.
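
The arithmetic behind this example can be sketched in a few lines. The class and attribute names below are hypothetical; the numbers are the ones on the slide (1 day of value-added time, with queue time cut from 6 days to 3).

```python
from dataclasses import dataclass


@dataclass
class CycleTime:
    """Cycle time of a process, split into value-added and queue (wait) time."""
    value_added_days: float
    queue_days: float

    @property
    def total_days(self) -> float:
        return self.value_added_days + self.queue_days

    @property
    def queue_share(self) -> float:
        """Fraction of the total cycle time spent waiting between steps."""
        return self.queue_days / self.total_days


before = CycleTime(value_added_days=1, queue_days=6)
after = CycleTime(value_added_days=1, queue_days=3)

print(f"before: {before.total_days} days, {before.queue_share:.0%} waiting")
print(f"after:  {after.total_days} days, {after.queue_share:.0%} waiting")
```

Because waiting accounts for most of the total in the BEFORE case, halving queue time removes three of the seven days without touching any value-added step.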


The Supplier Management Cycle is a tuned-up version for external suppliers

SUPPLIER MANAGEMENT CYCLE
1. Assign Supplier management responsibilities and engage selected Supplier
2. Develop Customer requirements
3. Baseline Supplier performance
4. Define Planning (new feature), Control, and Improvement projects
5. Complete projects
6. Track performance


You can’t expect your suppliers to understand your needs and requirements unless you’ve explained them

“Be honest with me. Have we really done a good job explaining our business and making our needs clear?”

“I think you have. But explain them one more time so I can be sure!”


Presenter
Presentation Notes
Two really important points: You already know, from the last module, that it is difficult to understand your customers’ needs. It is just as difficult for your suppliers to understand yours. Of course, this simply underscores the need to explain your needs to your suppliers.

Why Process and Supplier Management “Work”

o Synthesize four basic tasks (customers, measurement, control, improvement) into a powerful whole.
o Directly address political, organizational, and cultural issues:
  o White space
  o Communication
  o Interfaces
  o “Right things” get measured and improved
  o Best hope (so far) for working across silos


Actively Manage Change

o Everything about data quality management is political!
o My five-step method for addressing social issues.


The Cold Brutal Reality

No data issue is so trivial that it doesn’t generate enormous political heat!


Advancing a culture that values data and data quality

1. Understand the political realities
2. Understand the realities surrounding change (and understand that you MUST be an agent of change)
3. Understand both positives and negatives (you are more likely to succeed building on strengths)
4. “Bring lawyers, guns, and money…”
5. All change is top-down


A-1. Power/data sharing/ownership: In the Information Age, Possession of Data Conveys Power!

“Sweeney’s Database has two terabytes and ours only has one! Get me two more teras!”


A-2. Though it is Universally Praised, Data Sharing is the Exception!

NOTE: Many of The 48 Laws of Power (Greene and Elffers, Viking, 1998) seem to argue against sharing data.

“Of course you can have our data. Just get your 30-11 form signed by the Head of Legal, the Head of Accounting, and the Head of HR! Then we’ll run it up the line here!!”


G-4. Since the Data are “In The Warehouse,” the CIO must be responsible!

“I’ve told that #*%! CIO about these data problems a million times! Why can’t they get them right?”


A Model for Managing Change

Four components for successful change:

Sense of urgency + Clear, shared vision + Capacity for change + Actionable first steps = Successful change

When a component is missing:

o No sense of urgency = Low priority, no action
o No clear, shared vision = Fast start that fizzles, directionless
o No capacity for change = Anxiety, frustration
o No actionable first steps = Haphazard efforts, false starts


Rate each component on the “RAG” (red/amber/green) scale

Sense of urgency + Clear, shared vision + Capacity for change + Actionable first steps = Successful change

People are motivated by fear, fame, fun, and fortune. Bone-numbing fear seems to work best for organizations.

HINT: Attach the data quality program to the organization’s top priorities.

The “shared” portion of this component is the most demanding.

HINT: Engage as many people as possible, as early as possible.

Evaluate “intellectual,” “financial,” and “emotional” capabilities separately.
HINT: Educate, educate, educate.
HINT: After an initial investment, set the DQ effort up to be “self-funding.”
HINT: “Carry the wounded. But shoot the laggards.”

HINT: Think globally, plan regionally, act locally.

HINT: Pilot studies are essential, but no panacea.

HINT: No one ever comes to work saying, “Today, I’m going to do something to foul up a customer.” People want to do a good job. It’s your job to align them to the effort.
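
One way to put the RAG rating to work is to record a rating for each of the four components and list the symptom to expect for any component rated red. The sketch below is an illustration only. The function and variable names are hypothetical, and both the pairing of each symptom with a missing component (read off the change-model slide in order) and the choice to treat “red” as “missing” are assumptions.

```python
# The four components of the change model, in slide order.
COMPONENTS = (
    "sense of urgency",
    "clear, shared vision",
    "capacity for change",
    "actionable first steps",
)

# ASSUMPTION: symptom expected when a component is missing, paired in slide order.
SYMPTOM_IF_MISSING = {
    "sense of urgency": "low priority, no action",
    "clear, shared vision": "fast start that fizzles, directionless",
    "capacity for change": "anxiety, frustration",
    "actionable first steps": "haphazard efforts, false starts",
}

VALID_RATINGS = {"red", "amber", "green"}


def diagnose(ratings: dict[str, str]) -> list[str]:
    """Return the expected symptoms for every component rated red."""
    for name in COMPONENTS:
        if ratings.get(name) not in VALID_RATINGS:
            raise ValueError(f"missing or invalid rating for {name!r}")
    return [SYMPTOM_IF_MISSING[name] for name in COMPONENTS if ratings[name] == "red"]


# Hypothetical assessment of one department.
example = {
    "sense of urgency": "green",
    "clear, shared vision": "red",
    "capacity for change": "amber",
    "actionable first steps": "green",
}
print(diagnose(example))
```

Amber ratings are deliberately left out of the diagnosis here; a real assessment would likely want to track them as risks.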


A Force-Field Analysis can help summarize current state

[Figure: force-field analysis centered on “Quality of critical data.” Forces shown on the chart:]
o Complacency with current results and position
o Entrenched QA group
o Senior management distracted by acquisitions
o Solid record of innovation
o Process initiative gaining traction
o Talented staff. People seem to care about consumers and quality
o Lack of connection between inward- and outward-facing groups
o Lack of quantitative thinking
o Many groups have close customer connections


Step 4: Bring Everything You Can to Bear

o Lead the quality initiative from the business side (Landauer)
o Build organizational capabilities
o Educate, educate, educate
o Build political capital
o Avoid “insurmountable opportunities” barriers
o Persist


Step 5: “Eventually, all change is top-down”

o Probably not literally true, but the advice is solid: Over time, engage increasingly senior leadership.
o Almost all are very smart: “If they don’t get the joke, you’re not telling it right.”
o I’ve not found it to be the case that: “A big success in Department A leads Department B to pick DQ up.”
o On the other hand, a big success in Department A is a prerequisite.


Organizing for Data Quality

o How many people?
o Take responsibility for DQ out of IT
o Fundamental organizational unit for DQ
o Federated structure


The Five Most Common Things I Hear

o “We’re data rich and information poor.”
o “I’ve been in this industry twenty-five years. Trust me. These data are as good as they can possibly be.”
o “Tom, you’ve got to keep in mind that we are much more siloed than the other companies (industries, etc.) you work with.”
o “Of course my customers like what I give them. I’ve still got a job, don’t I?”
o “If it’s in the computer, it must be IT’s responsibility.”


Current State: Today’s organizations are unfit for data

o Lack talent, up and down the organization chart.
o Quality is essentially unmanaged.
o Responsibility for data buried in the bowels of IT. Step one: Move it out!
o Silos impede data sharing. More generally, the politics is brutal.
o Organizations have not thought through how to compete with data, nor gained enough experience to do so in a sensible fashion.


How Many People Will It Take?

Base organization only:
o Informal study of the effort devoted to managing other assets (e.g., people and capital):
  o 1-2%, with most concentrated in the upper ranges
o Informal study of effort applied to quality:
  o 2% overall (data plus)
  o Up to 4% in a “surge”
o Far more for “data providers.”
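
To turn these percentages into a rough staffing range, a small calculation helps. This is a sketch under stated assumptions: the slide does not say what the percentages are a share of, so the code reads them as a share of total headcount, and the function name and the 5,000-person organization are hypothetical.

```python
def dq_staffing_estimate(
    headcount: int,
    base_share: tuple[float, float] = (0.01, 0.02),
    surge_share: float = 0.04,
) -> dict[str, int]:
    """Rough number of people devoted to data quality.

    ASSUMPTION: the 1-2% base range and the 4% surge figure are applied to
    total headcount. Results are rounded to whole people.
    """
    if headcount < 0:
        raise ValueError("headcount must be non-negative")
    low, high = base_share
    return {
        "base_low": round(headcount * low),
        "base_high": round(headcount * high),
        "surge": round(headcount * surge_share),
    }


# Hypothetical organization of 5,000 people.
print(dq_staffing_estimate(5000))
```

The estimate covers the base organization only; the slide notes that “data providers” need far more.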


Who is responsible for data quality? Since the data is “in the warehouse,” it must be the CIO!

“I’ve told that #*%! CIO about these data problems a million times! Why can’t she get it right?”


Several lines of reasoning and the data confirm: STEP ONE: GET LEAD RESPONSIBILITY FOR DATA OUT OF IT!


A Federated Organization Model for Data

Columns: People Management | Data Assets

Day-in, day-out
  People Management: “Regular” people and managers. HR role: Policy setting and admin.
  Data Assets: High-quality data creation and novel use of data is the responsibility of people, processes, and departments. DG role: Policy setting and admin.

Departmental
  People Management (Departmental HR): Help their units find and advance the talent they need.
  Data Assets (Departmental DG): Help their units find and/or create the high-quality data they need. Home for quality facilitators, analysts.

Corporate
  People Management (Corporate HR): Succession planning, pay scales, etc.
  Data Assets (Corporate DG): “Metadata” processes, special provision for unique data, etc.


Fundamental Organizational Unit for Data Quality


[Figure: organization chart. Boxes: Leadership; Tech Management; Supplier Management; Process Management; Requirements Team; Customer Team; Measurement Team (two boxes); Control Team (two boxes); Improvement Teams.]

Data creators and customers work within processes!


Base Capability: Departmental Data Group


[Figure: organization chart. Boxes: Departmental Data Officer; Training Team; Metrics Team; Business Case Team; Strategy Team; Quality Team; Ext Supplier Team; Change Mgmt/Comms Team; Improvement Facilitators; Metadata Processes.]


Federated Structure for Data Quality


[Figure: organization chart. Boxes: Top Data Job; Data Council; Unique Data Team; Change Mgmt/Comms; Strategy Team; Fundamental Orgs for DQ; Quality Team; Policy Team; Corporate Metadata; Departmental DGs.]


Data Quality and Strategy

o Sooner or later, you’re going to need a data strategy.
o Four basic strategies
o Data may be your ultimate proprietary asset


So far, I’ve identified eighteen distinct ways to “put data to work”

Provide (Sell) Content
o New Content
o Re-package
o Informationalization
o Unbundling
o Exploiting Asymmetries
o Closing Asymmetries

Facilitators
o Own the Identifiers
o Infomediation
o Big Data/Advanced Analytics
o Privacy and security
o Training
o New Marketplaces
o Infrastructure technologies
o Information appliances
o Tools

Internally
o Improve operational efficiency
o 360°-view
o Data-Driven Culture

Working out “what’s right for us” is the key challenge for senior leadership!


Four basic strategies (with dozens of variants)

o Innovation (Big Data/Advanced Analytics): Find hidden nuggets in the data and,…

o Content: Provide or exploit content that others don’t have.
  o Informationalization
  o Infomediation (e.g., Google)
  o Asymmetry (e.g., hedge fund)

o Build a Data-Driven Culture: Make better decisions, bottom-to-top and across the company.

o Be the low-cost provider: Superior data quality keeps costs down!


Data Quality and Data Strategy

Each strategy and its connection to data quality:
o Innovation/Big Data: Bad data is highly leveraged in unknown ways.
o Content: Markets demand high-quality data (e.g., Apple Maps).
o Data-Driven Culture: People discount data they don’t trust in favor of their intuitions (rightly so).
o Low-cost Provider: Eliminating the cost of finding and fixing errors provides enormous cost savings.


Collateralized Debt Obligations

o The promise in Finance: “Slice and dice” mortgages to create products with pre-defined risk-reward profiles.
o Billions made in the early 2000s.
o Other benefits: More people could purchase homes, because the mortgage originator didn’t have to hold it.
o Everyone knows what happened.


Content Providers Beware: External customers are less tolerant of poor-quality data than internal ones

o Drivers sent the wrong way blame the whole car


©Jimromensko.com


Data Doc’s Rule of Ten


[Figure: flowchart. Input data reach the decision “Data ok?” If yes: complete our value-added work. If no: fix errors.]

“It costs ten times as much to complete a unit of work when the data are flawed in any way as it does when they are perfect!”

Said differently, if the “straight-through path” costs a dollar, then the “fix errors path” costs ten.
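
The Rule of Ten makes the expected cost of a unit of work easy to estimate once the share of flawed inputs is known. The sketch below is an illustration of that arithmetic. The function name and the sample error rates are assumptions; the factor of ten and the one-dollar straight-through cost come from the slide.

```python
def expected_cost_per_unit(
    error_rate: float,
    clean_cost: float = 1.0,
    multiplier: float = 10.0,
) -> float:
    """Average cost of one unit of work under the Rule of Ten.

    error_rate is the fraction of units whose input data are flawed.
    Clean units cost clean_cost; flawed units cost multiplier * clean_cost.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be a fraction between 0 and 1")
    return clean_cost * ((1.0 - error_rate) + multiplier * error_rate)


# Hypothetical error rates, with the straight-through path costing one dollar.
for rate in (0.0, 0.05, 0.10, 0.20):
    print(f"error rate {rate:.0%}: ${expected_cost_per_unit(rate):.2f} per unit")
```

Under these assumptions, a 10% error rate nearly doubles the average cost per unit, which is the case for the low-cost-provider strategy in miniature.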


“IT Doesn’t Matter” (Carr, HBR, 2003)

Proprietary vs. Infrastructure Technologies

PROPRIETARY
o Can be “owned” by a single organization
o Examples: patented drug, unique process
o Protected
o Basis for sustained advantage

INFRASTRUCTURE
o (Eventually) part of general business infrastructure
o Examples: railroads, electric grid
o Become commoditized
o Not a basis for sustained advantage


Advantage Stems from Scarcity…

o Carr argues that basic IT capabilities (storage, processing, and transport technologies) are now readily available to all.
o Carr does not argue that IT isn’t important. Only that it is not strategic.
o While there are many implications, our primary interest is data!


Data May Be Your “Ultimate Proprietary Technology”

Note that:
o Your dollars are exactly the same as your competitors’.
o Your employees walk out the door every night. And your competitors can hire the best ones away.

Not so with regard to your data: Unless you let someone “steal” them, they are yours and yours alone. Further:
o Data you create are uniquely yours. And you make more each day.
o We have already noted that data are subtle and nuanced (e.g., you can define “football” in your own way).

Of course: Some, maybe most, data become standardized to facilitate communications.


Your Unique Data Merit Special Attention

o Difficult to sustain an advantage from the same stuff everyone else has (e.g., publicly available data)

o Unique data offer opportunity for sustained advantage

o These data merit special attention!

o You must:
  o Know which are “special”
  o Be careful what you allow to become standardized
  o Broaden and deepen the advantage


As a practical matter, the quality program only goes as far as the senior team (perceived to be) leading the effort insists


“Non-delegatable” Roles for the Leaders (at any level)

o Become very intolerant of poor quality in the data they need to manage.
o Become very intolerant of poor quality in the data they need to compete.
o Focus, focus, focus on the most important data.
o Get management accountabilities right.
o Set really high targets.
o Build the organizational capabilities needed to effect the above.
o Advance a “data culture.”


Why They Are “Non-Delegatable”

“They thought they could make the right speeches, establish broad goals, and leave everything else to subordinates... They didn’t realize that fixing quality meant fixing whole companies, a task that can’t be delegated.”
Dr. Juran, 1993

Experience so far is that “data” is even tougher than the factory floor.


Policy for Management Responsibility for Data

“Model” Data Policy:

Don’t take junk data from the guy upstream. And don’t pass junk data on to the next guy!

Said differently:

Be a demanding data customer and a great data creator!


Quality Planning/Targets for improvement

o On one level, targets for improvement are nothing more than:
  “Halve the error rate every year. Forever.”
  “Add two significant new features every year.”

o On another, quality planning may include “positioning”:
  “We aspire to be, and be perceived as, the highest-quality data provider in our industry.”


The Big Ideas

o Data quality, done right, is a huge win/win!
o You have to get in front! This means you have to find and eliminate the root causes of error.
o Quality is in the eyes of the customer.
o You have to do the technical work tolerably well.
o Data quality is done in the line (e.g., business process).
o You have to overcome organizational momentum, so actively manage change.
o Get responsibility for DQ out of Tech. Build human capability.
o Sooner or later, you’re going to need a data strategy.
o The quality program only goes as far as the senior team (perceived to be) leading the effort insists.


Questions?

Thomas C. Redman, Ph.D. “the Data Doc”

+1 732-933-4669 [email protected]

www.navesinkconsultinggroup.com


Thomas C. Redman, “the Data Doc”

o Ph.D., Statistics, Florida State, 1980.

o Conceived and led the Data Quality Lab at AT&T Bell Labs.

o Formed Navesink Consulting Group in 1996.

o Helped dozens of companies think through, define, and advance their data and data quality programs.

o Led development of most of today’s best-practice data quality management methods & techniques.

o Latest and greatest: Data Driven: Profiting from Your Most Important Business Asset, Harvard Business School Press, 2008.

o Known bias: “Data are quite obviously the key asset of the Information Age. Yet today’s organizations are singularly ill-designed for data. This leads me to conclude that organizing for data is THE management challenge of the 21st century.”

o Much current work focuses on “organizing for data.”

