
October 17, 2016

Promoting Accuracy Through Data Quality: The UC Data Validation Framework

University of California Office of the President
Office of Institutional Research & Academic Planning [IRAP]

CAIR 2016 Conference

OLA POPOOLA – Director: Reporting & Analytics


Dealing with bad data?


Agenda
• Desired Presentation Outcomes
• Data Quality
  – Attributes of Data Quality
  – Causes & Costs of Poor Quality Data
• The UC Data Validation Framework
• Creating Your Own Data Quality Management Program
• Final Thoughts

UCOP-IRAP/CAIR 2016 3


Presentation Outcomes


Desired Presentation Outcomes

• A better understanding of data quality from an IR perspective

• Exposure to data quality principles, methods and techniques that enable continuous improvement in data quality

• How to conduct simple data quality audits by implementing a successful Data Quality Program (DQP)


Quality


What is Data Quality?

Definition 1: The state of completeness, validity, consistency, timeliness and accuracy that makes data appropriate for a specific use. (Government of British Columbia)

Definition 2: The quality of a particular dataset or record is to describe the fitness of that dataset or record for a particular use that one may have in mind for the data. (Chrisman, 1991)


Attributes of Data Quality

• Accurate
• Complete
• Flexible
• Timely
• Consistent
• Available


Causes of Poor Quality Data
• Lack of data governance
• User errors – manual data entry
• Lack of identified “authoritative” data sources
• Complex IT infrastructure
• Bad business processes
• Silo-driven solutions
• Multiple disconnected processes
• Tactical initiatives to “re-solve” data accuracy rather than understanding and addressing root cause


The Cost of Poor Data Quality
• Wasted revenue - $3.1 trillion in US alone (2016)
• Mistrust
• Bad or delayed decisions
• Impacted funding
• Constant rework
• Missed opportunities


The UC Implementation


The UC Story

• Challenge associated with data submission from 10 different campus locations, central office, three laboratories and ANR with diverse transactional systems

• Implementation of a new data warehouse called for an extensive review of data quality processes

• Selection of a data quality methodology that involved business practice review and change.


UC Quality Program Guidelines
• UC Applicable
  – For UC business; based on user needs
• Flexible
  – Adaptable to evolving data content areas
• Scalable
  – Could be expanded or reduced in scale
  – Could be deployed across multiple UC locations
• Prudent
  – Minimal implementation costs
• Complementary
  – Compatible with UC Standards


Elements of UC DQM

• Technology
• People
• Processes
• Governance

[Diagram: Define goals; Identify areas for change; Build Solutions; Monitor and plan updates]


UC Data Infrastructure

[Diagram: Input Files; Data Processing; Staging Layer; Reporting Layer; Data Marts; Reporting & Analytics]


UC Data Validation Framework

[Diagram: Standard input files; Data Staging Layer; Reporting Layer; Review data validation reports; Review data audit reports; markers N1 to N6]


Data Collection & File Specs…
• File specifications: data collection instrument
• Proper data collection is instrumental to the integrity of research
• A good file specification:
  – Clarifies how you expect all institutions to submit their data
  – Clarifies the length, format, error levels and valid values
  – Has an accompanying overview and file characteristics that contain:
    • File submission schedule
    • File physical characteristics
    • Any special conventions
  – Has an accompanying code book
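
A file specification of this kind can also be written down in machine-readable form. The following is only a minimal sketch: the field names, widths, patterns, error levels and valid values are hypothetical and are not taken from any UC file specification.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldSpec:
    """One element of a hypothetical file specification."""
    name: str
    length: int                               # exact width of the field
    pattern: str                              # regular expression for the format
    error_level: str                          # severity reported when a check fails
    valid_values: Optional[frozenset] = None  # None means any well-formed value


# Hypothetical spec entries; real names, widths and codes would come
# from the file specification and its code book.
SPEC = [
    FieldSpec("campus_code", 2, r"\d{2}", "S", frozenset({"01", "02", "03"})),
    FieldSpec("student_id", 9, r"\d{9}", "S"),
    FieldSpec("sex_code", 1, r"[A-Z]", "E", frozenset({"F", "M", "U"})),
]


def check_record(record: dict) -> list:
    """Return (field, error_level, reason) tuples for every failed check."""
    problems = []
    for spec in SPEC:
        value = record.get(spec.name, "")
        if len(value) != spec.length:
            problems.append((spec.name, spec.error_level, "bad length"))
        elif not re.fullmatch(spec.pattern, value):
            problems.append((spec.name, spec.error_level, "bad format"))
        elif spec.valid_values is not None and value not in spec.valid_values:
            problems.append((spec.name, spec.error_level, "invalid value"))
    return problems
```

Keeping length, format, error level and valid values together in one entry per field mirrors the bullet above, and lets the same definition drive both the published specification and the automated checks.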


Error Groups
• Error Framework
  – Database Tables
• Rejected Files (R)
  – Header Record Type
• Severe Errors (S)
  – Invalid Campus Code
  – Invalid Student ID
• Element Errors (E)
  – Invalid Sex Code
• Group Errors (G)
  – Campus-College-Major combinations
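
One way to picture how the four error groups could be applied to a submitted file is sketched below. The record layout, the reference codes and the order of the checks are assumptions made for illustration; in particular, treating a bad header as rejecting the whole file and skipping further checks after a severe error are guesses about the intent of the groups, not the documented UC behavior.

```python
from collections import Counter

# Hypothetical reference data; the real codes are not given in the slides.
VALID_CAMPUSES = {"01", "02"}
VALID_SEX_CODES = {"F", "M", "U"}
VALID_COMBINATIONS = {("01", "ENG", "CS"), ("02", "SCI", "BIO")}


def validate_file(header: dict, records: list) -> list:
    """Return (group, record_number, message) tuples for one submitted file.

    Groups follow the slide: R = rejected file, S = severe error,
    E = element error, G = group (cross-field) error.
    """
    # R: assumed here to reject the whole file, so no records are inspected.
    if header.get("record_type") != "H":
        return [("R", 0, "invalid header record type")]

    errors = []
    for number, rec in enumerate(records, start=1):
        # S: assumed to make the record unusable, so later checks are skipped.
        if rec.get("campus") not in VALID_CAMPUSES:
            errors.append(("S", number, "invalid campus code"))
            continue
        if not str(rec.get("student_id", "")).isdigit():
            errors.append(("S", number, "invalid student ID"))
            continue
        # E: a single element holds an invalid value.
        if rec.get("sex") not in VALID_SEX_CODES:
            errors.append(("E", number, "invalid sex code"))
        # G: each element may be valid on its own, but not in combination.
        combo = (rec.get("campus"), rec.get("college"), rec.get("major"))
        if combo not in VALID_COMBINATIONS:
            errors.append(("G", number, "invalid campus-college-major combination"))
    return errors


def summarize(errors: list) -> Counter:
    """Count errors per group, e.g. for a data validation report."""
    return Counter(group for group, _, _ in errors)
```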


Our Toolset


UC Data Quality Toolset

• Atlassian JIRA
• IBM DB2 Database
• IBM DataStage
• IBM Cognos
• Microsoft Excel


Creating Your Data Quality Program


Key Requirements for a DQP
• A Data Quality Vision
• A Data Quality Strategy


Develop A Data Quality Vision
• Every organization needs:
  – A vision with respect to having good quality data
  – An accompanying policy to implement that vision
  – A strategy for implementation
• Every organization should look:
  – For efficiencies in data collection and quality control processes
  – Beyond immediate use and examine user requirements
  – For ways to build networks and partnerships


Define a Data Quality Strategy

[Diagram: Strategy, surrounded by four steps]
• Business Process Review
• Data Quality Assessment
• Review Results
• Business Practice Change


Review Your Business Processes

• How and when is data collected?
• Where is data stored?
• Is the same data stored in more than one system?
• Who creates the data?
• Who uses the data?
• What kind of quality checks already exist?


Do a Data Quality Assessment
• What are the quality criteria?
• What is the acceptable range of values?
• What kind of thresholds should be in place?
• What are your business rules?
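
The questions above map fairly directly onto a small audit routine: each business rule or acceptable range becomes a named check, and a threshold decides whether the observed error rate is tolerable. The sketch below assumes hypothetical fields (`units`, `gpa`), limits and a 5% threshold purely for illustration.

```python
def assess(records, rules, max_error_rate):
    """Apply each named rule to every record and compare error rates to a threshold.

    rules maps a rule name to a function that returns True when a record passes.
    Returns {rule_name: (error_rate, within_threshold)}.
    """
    results = {}
    total = len(records)
    for name, passes in rules.items():
        failures = sum(1 for rec in records if not passes(rec))
        rate = failures / total if total else 0.0
        results[name] = (rate, rate <= max_error_rate)
    return results


# Hypothetical business rules; the fields and limits are illustrative only.
RULES = {
    "units_in_range": lambda r: 0 <= r["units"] <= 30,
    "gpa_in_range": lambda r: 0.0 <= r["gpa"] <= 4.0,
}

records = [
    {"units": 12, "gpa": 3.5},
    {"units": 45, "gpa": 3.9},
    {"units": 15, "gpa": 2.0},
    {"units": 16, "gpa": 3.1},
]
report = assess(records, RULES, max_error_rate=0.05)
```

Here one of four records breaks the units rule, so that rule's error rate exceeds the threshold and would be flagged for review, while the GPA rule passes.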


Review Your Results

• Develop a systematic approach to reviewing results

• Develop a process for data cleaning or correction

• Identify source of data problems

• Communicate!


Implement Necessary Changes
• Implement changes to improve data quality
  – Centralize reference data codes
  – Consolidate data collection and storage
• Adopt ongoing data quality review process
  – Review data regularly
  – Communicate quality improvements
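
Centralizing reference data codes can be as simple as keeping one authoritative code table and mapping every known local variant onto it. This is a sketch under assumed, made-up college codes; none of the codes or names come from UC systems.

```python
# Hypothetical central reference table: one authoritative list of codes,
# plus the local variants that source systems are assumed to send.
CENTRAL_CODES = {"ENG": "Engineering", "SCI": "Science"}
LOCAL_TO_CENTRAL = {"ENGR": "ENG", "COE": "ENG", "SCIENCE": "SCI"}


def standardize(value: str) -> str:
    """Map a source-system value to its central code, or raise if unmapped."""
    key = value.strip().upper()
    if key in CENTRAL_CODES:
        return key
    if key in LOCAL_TO_CENTRAL:
        return LOCAL_TO_CENTRAL[key]
    raise ValueError(f"no central code for {value!r}")
```

Raising on unmapped values, rather than passing them through, surfaces new local codes so they can be added to the central table instead of silently spreading.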



Where are we now?
• Standardizing input file specifications across all content areas
• Implementation of a requirements management tool
• Documenting business-related quality rules
• Promoting data governance through the creation of a data operations website
• Improving communication between the data creators and IRAP
• Improving the relationship between IRAP and IT


Final Thoughts
• Quality data is achievable if you are willing to:
  – Take a critical look at your existing data
  – Implement changes to how you collect and manage data
  – Invest the time to educate and communicate with data creators and users
  – Make data quality improvements an ongoing process


It’s not the things you don’t know that matter, it’s the things you know that aren’t so. Will Rogers, Famous Okie GI specialist


Fast is fine but accuracy is final. Wyatt Earp - Officer of the law, gambler and saloon keeper in the Wild West


Good data are the data you already have. Dr. Edgar Horwood - Founder of the Urban and Regional Information Systems Association
