
2008 Pharmasug, Parallel Validation of Files

Date post: 15-Apr-2017
Page 1: 2008 Pharmasug, Parallel Validation of Files

Comparing Files without Proc Compare

PharmaSUG 2008

Alejandro Jaramillo

Russ Lavery

Jaramillo & Lavery Pharmasug 208 1

Page 2: 2008 Pharmasug, Parallel Validation of Files

• Purpose
  • To present an efficient methodology for comparing and validating files that are expected to have the same data structure and contents.

• Business Case
  • When data is migrated to a new system, business rules may be inadvertently changed. An adequate method and process for independently and efficiently flagging potentially unexpected changes in the data is key to the project's success.

• Parallel File Comparison
  • Parallel file comparison is defined as the process in which an independent team recreates data files from raw data sources and compares them to the files produced by the development team to feed the enterprise or production system. The goal is to detect differences caused by differing interpretation or application of business rules, or by human error.


Page 3: 2008 Pharmasug, Parallel Validation of Files

Scenarios for Parallel Data Comparisons

• Forward: Data is compared and validated independently at every stage before it goes into production to feed enterprise applications.

• Backward: Data is fed to the enterprise application and the results are regenerated independently from raw data. Even if the enterprise results match, the granular data feeding the enterprise application is still validated.

[Diagram: four data stages (Enterprise Results, Pre-summarization, Granular, Raw Data), each with its own Validation step. Arrows labeled "Forward" and "Backward" mark the two directions of comparison.]


Page 4: 2008 Pharmasug, Parallel Validation of Files

The Parallel File Comparison Method

• The method discussed in this presentation is based on using SAS for comparison and validation. However, the method can be applied when using any other system.

• SAS PROC COMPARE provides an excellent way to compare files when they are expected to have no differences. However, when PROC COMPARE shows differences, a more detailed methodology should be used to trace the source of the differences.

• The method for comparing two files that are supposed to have the same data and file structure, but show differences via PROC COMPARE, has the following 6 steps:

1. Produce the files to be compared against development or production data.
2. Start comparing pairs of similar files using PROC COMPARE. If the comparison fails, go to #3.
3. Compare file structure via PROC CONTENTS. => If this fails, stop and get the files to conform to the same structure.
4. Define keys and data.
5. On both files, run summaries on major keys (time, period, product code, market code, etc.).
6. Compare both raw files at the record level with regard to keys and data.

=> If 5 or 6 fails, an inquiry into the file differences, using the raw data, must follow.
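
Step 5 (summaries on major keys) can be sketched outside SAS as well. The Python below is only an illustration under stated assumptions: the sample rows, the column layout, and the helper names `summarize` and `summary_differences` are hypothetical and do not come from the presentation. It counts records and totals the data values per major key in each file, then lists the keys where the two summaries disagree.

```python
from collections import defaultdict

# Hypothetical sample rows laid out as (Store, Reg, Prod, Q1, Q2, Q3).
left = [
    ("AAA", "A", "0P1", 12, 10, 8),
    ("BBB", "A", "0P1", 10, 11, 7),
    ("DDD", "B", "0P2", 10, 15, 2),
]
right = [
    ("AAA", "A", "0P1", 12, 10, 8),
    ("BBB", "A", "0P1", 10, 11, 9),
]

def summarize(rows, key_cols=(1, 2), value_cols=(3, 4, 5)):
    """Return {major key: [record count, sum of data values]}."""
    totals = defaultdict(lambda: [0, 0])
    for row in rows:
        key = tuple(row[i] for i in key_cols)
        totals[key][0] += 1
        totals[key][1] += sum(row[i] for i in value_cols)
    return dict(totals)

def summary_differences(left_rows, right_rows):
    """List the major keys whose counts or totals differ between the files."""
    ls, rs = summarize(left_rows), summarize(right_rows)
    diffs = {}
    for key in sorted(set(ls) | set(rs)):
        l, r = ls.get(key, [0, 0]), rs.get(key, [0, 0])
        if l != r:
            diffs[key] = {"left": l, "right": r}
    return diffs

diffs = summary_differences(left, right)
for key, detail in diffs.items():
    print(key, detail)
```

A summary that agrees does not prove the records agree, since offsetting errors can cancel out, which is why the record-level comparison in step 6 is still needed.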


Page 5: 2008 Pharmasug, Parallel Validation of Files

Early Diagnosis

If PROC COMPARE shows differences, a more detailed analysis is required. Start with PROC CONTENTS.

After confirming that the files have the same structure and number of observations, a more detailed check on the raw data must be conducted.
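
As a rough analogue of the structure check, the Python sketch below compares row counts, column names, and column types of two files held as lists of dicts. It is not PROC CONTENTS; the functions `describe` and `structure_differences` and the sample rows are assumptions made for illustration.

```python
def describe(rows):
    """Summarize a file: number of observations and the types seen in each column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, set()).add(type(value).__name__)
    return {"n_obs": len(rows), "columns": columns}

def structure_differences(left_rows, right_rows):
    """Return a list of human-readable structure problems (empty if none)."""
    left, right = describe(left_rows), describe(right_rows)
    problems = []
    if left["n_obs"] != right["n_obs"]:
        problems.append(f"row count: left={left['n_obs']} right={right['n_obs']}")
    for name in sorted(set(left["columns"]) | set(right["columns"])):
        ltypes = left["columns"].get(name)
        rtypes = right["columns"].get(name)
        if ltypes is None:
            problems.append(f"{name}: only in right file")
        elif rtypes is None:
            problems.append(f"{name}: only in left file")
        elif ltypes != rtypes:
            problems.append(f"{name}: type left={sorted(ltypes)} right={sorted(rtypes)}")
    return problems

# Hypothetical sample: the right file stores Q1 as text and has one row fewer.
left_rows = [{"Store": "AAA", "Reg": "A", "Q1": 12}, {"Store": "BBB", "Reg": "A", "Q1": 10}]
right_rows = [{"Store": "AAA", "Reg": "A", "Q1": "12"}]

problems = structure_differences(left_rows, right_rows)
for problem in problems:
    print(problem)
```

An empty result corresponds to the "same file structure and number of observations" condition, after which the raw-data check can proceed.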


Page 6: 2008 Pharmasug, Parallel Validation of Files

Checking Keys and Data gives exact answers

Left file

Store  Reg  Prod  LQ1  LQ2  LQ3
AAA    A    0P1    12   10    8
BBB    A    0P1    10   11    7
FFF    A    0P1    17   11    8
CCC    A    0P1    12   10    8
DDD    B    0P2    10   15    2
EEE    B    0P3    10   15    2

Right file

Store  Reg  Prod  LQ1  LQ2  LQ3
AAA    A    0P1    12   10    8
BBB    A    0P1    10   11    7
FFF    A    0P1    19   10    6
EEE    B    0P3    10   15    4
NNN    c    1P1    19   15   11
CCZ    A    0P1    12   10    8

Merged file (. = missing, i.e. no record with that key in that file)

Store  Reg  Prod  LQ1  LQ2  LQ3  RQ1  RQ2  RQ3  Logic
AAA    A    0P1    12   10    8   12   10    8  Match
BBB    A    0P1    10   11    7   10   11    7  Match
FFF    A    0P1    17   11    8   19   10    6  Data
CCC    A    0P1    12   10    8    .    .    .  Key
DDD    B    0P2    10   15    2    .    .    .  ODD
EEE    B    0P3    10   15    2   10   15    4  Data
NNN    c    1P1     .    .    .   19   15   11  ODD
CCZ    A    0P1     .    .    .   12   10    8  Key

This method checks EVERY value. Match lines on key and use an array and a loop to compare the data values.

Callout on the slide: Key Error
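
The "match on key, then loop over the data values" idea can be sketched in Python as follows. This is an illustration rather than the authors' SAS code: the data column names `Q1` to `Q3`, the helper names, and the status strings are assumptions, while the sample values are taken from the left and right files on this slide.

```python
KEYS = ("Store", "Reg", "Prod")
DATA = ("Q1", "Q2", "Q3")          # hypothetical names for the three data columns
COLUMNS = KEYS + DATA

def to_records(rows):
    """Turn plain tuples into dicts keyed by column name."""
    return [dict(zip(COLUMNS, row)) for row in rows]

left_file = to_records([
    ("AAA", "A", "0P1", 12, 10, 8), ("BBB", "A", "0P1", 10, 11, 7),
    ("FFF", "A", "0P1", 17, 11, 8), ("CCC", "A", "0P1", 12, 10, 8),
    ("DDD", "B", "0P2", 10, 15, 2), ("EEE", "B", "0P3", 10, 15, 2),
])
right_file = to_records([
    ("AAA", "A", "0P1", 12, 10, 8), ("BBB", "A", "0P1", 10, 11, 7),
    ("FFF", "A", "0P1", 19, 10, 6), ("EEE", "B", "0P3", 10, 15, 4),
    ("NNN", "c", "1P1", 19, 15, 11), ("CCZ", "A", "0P1", 12, 10, 8),
])

def index_by_key(records):
    """Map each key tuple to its record (assumes keys are unique within a file)."""
    return {tuple(rec[k] for k in KEYS): rec for rec in records}

def compare_records(left_records, right_records):
    """Classify every key as match, data mismatch, left only, or right only."""
    left, right = index_by_key(left_records), index_by_key(right_records)
    status = {}
    for key in sorted(set(left) | set(right)):
        if key not in right:
            status[key] = "left only"
        elif key not in left:
            status[key] = "right only"
        else:
            # Loop over every data column so that each value is checked.
            bad = [col for col in DATA if left[key][col] != right[key][col]]
            status[key] = "match" if not bad else "data: " + ", ".join(bad)
    return status

status = compare_records(left_file, right_file)
for key, result in status.items():
    print(key, result)
```

The sketch cannot tell a mistyped key (CCC versus CCZ) from a record that is truly missing in one file (DDD, NNN); both show up as "left only" or "right only", and separating them still takes a look at the raw records.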


Page 7: 2008 Pharmasug, Parallel Validation of Files

Top view for comparing left and right files

Merge the left file and the right file by keys and generate the matching variables. The merge separates the records into three groups:

• both: the record is in both files (good and bad matches)
• bad_left: the record is only in the left file
• Badright: the record is only in the right file

Then:

• Run freqs on the matching variables.
• List and compare a few raw records from the bad files to get an idea of the source of the mismatches.
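
A minimal Python sketch of the three-way split, assuming each file has already been reduced to its list of key tuples and that keys are unique within a file. The function name `split_by_key` and the spelling `bad_right` are illustrative choices; the keys are the ones used in the sample files of this presentation.

```python
def split_by_key(left_keys, right_keys):
    """Split keys into those in both files, only in the left, and only in the right."""
    left_set, right_set = set(left_keys), set(right_keys)
    return {
        "both": sorted(left_set & right_set),
        "bad_left": sorted(left_set - right_set),
        "bad_right": sorted(right_set - left_set),
    }

left_keys = [
    ("AAA", "A", "0P1"), ("BBB", "A", "0P1"), ("FFF", "A", "0P1"),
    ("CCC", "A", "0P1"), ("DDD", "B", "0P2"), ("EEE", "B", "0P3"),
]
right_keys = [
    ("AAA", "A", "0P1"), ("BBB", "A", "0P1"), ("FFF", "A", "0P1"),
    ("EEE", "B", "0P3"), ("NNN", "c", "1P1"), ("CCZ", "A", "0P1"),
]

groups = split_by_key(left_keys, right_keys)
for name, keys in groups.items():
    print(name, len(keys), keys)
```

The "both" group still mixes good and bad matches, so its records need the value-by-value data comparison before they can be called clean.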

Page 8: 2008 Pharmasug, Parallel Validation of Files

Checking Keys and Data gives exact answers

Merged file (. = missing, i.e. no record with that key in that file)

Store  Reg  Prod  LQ1  LQ2  LQ3  RQ1  RQ2  RQ3  Logic
AAA    A    0P1    12   10    8   12   10    8  Match
BBB    A    0P1    10   11    7   10   11    7  Match
FFF    A    0P1    17   11    8   19   10    6  Data
CCC    A    0P1    12   10    8    .    .    .  Key
DDD    B    0P2    10   15    2    .    .    .  ODD
EEE    B    0P3    10   15    2   10   15    4  Data
NNN    c    1P1     .    .    .   19   15   11  ODD
CCZ    A    0P1     .    .    .   12   10    8  Key

mismatch by Left_vs_Right

                  | 1 = Obs in | 10 = Obs in | 11 = Obs in both |
Frequency         | Left only  | Right only  | Left and Right   |  Total
Percent           |            |             |                  |
------------------+------------+-------------+------------------+--------
NO problems       |          0 |           0 |                2 |      2
with key or data  |       0.00 |        0.00 |            25.00 |  25.00
------------------+------------+-------------+------------------+--------
Yes: Problems     |          2 |           2 |                2 |      6
with key or data  |      25.00 |       25.00 |            25.00 |  75.00
------------------+------------+-------------+------------------+--------
Total             |          2 |           2 |                4 |      8
                  |      25.00 |       25.00 |            50.00 | 100.00

Callouts on the slide:
• We are comparing data with missing values.
• Data problem
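
The counts in the frequency table can be reproduced with a short Python sketch. The coding 1 = left only, 10 = right only, 11 = both follows the column labels of the table; the way the code is built (1 for presence in the left file plus 10 for presence in the right file) and the function name `crosstab` are assumptions, not something the slides spell out.

```python
from collections import Counter

# Rows are (Store, Reg, Prod, Q1, Q2, Q3), taken from the sample files in this presentation.
left = [
    ("AAA", "A", "0P1", 12, 10, 8), ("BBB", "A", "0P1", 10, 11, 7),
    ("FFF", "A", "0P1", 17, 11, 8), ("CCC", "A", "0P1", 12, 10, 8),
    ("DDD", "B", "0P2", 10, 15, 2), ("EEE", "B", "0P3", 10, 15, 2),
]
right = [
    ("AAA", "A", "0P1", 12, 10, 8), ("BBB", "A", "0P1", 10, 11, 7),
    ("FFF", "A", "0P1", 19, 10, 6), ("EEE", "B", "0P3", 10, 15, 4),
    ("NNN", "c", "1P1", 19, 15, 11), ("CCZ", "A", "0P1", 12, 10, 8),
]

def crosstab(left_rows, right_rows):
    """Count keys by (problem flag, presence code); assumes unique keys per file."""
    lidx = {row[:3]: row[3:] for row in left_rows}
    ridx = {row[:3]: row[3:] for row in right_rows}
    counts = Counter()
    for key in set(lidx) | set(ridx):
        code = (1 if key in lidx else 0) + (10 if key in ridx else 0)
        # A key has a problem if it is missing from one file or any data value differs.
        problem = code != 11 or lidx[key] != ridx[key]
        counts[("Yes" if problem else "No", code)] += 1
    return counts

counts = crosstab(left, right)
total = sum(counts.values())
for (flag, code), n in sorted(counts.items()):
    print(flag, code, n, f"{100 * n / total:.2f}%")
```

Each of the four non-empty cells holds 2 of the 8 keys, which matches the 25.00 percent entries in the table.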

Page 9: 2008 Pharmasug, Parallel Validation of Files

Checking Keys and Data gives exact answers

mismatch by Sand_vs_ODW

                  | 1 = Obs in | 10 = Obs in | 11 = Obs in both |
Frequency         | Left only  | Right only  | Left and Right   |  Total
Percent           |            |             |                  |
------------------+------------+-------------+------------------+--------
NO problems       |          0 |           0 |                2 |      2
with key or data  |       0.00 |        0.00 |            25.00 |  25.00
------------------+------------+-------------+------------------+--------
Yes: Problems     |          2 |           2 |                2 |      6
with key or data  |      25.00 |       25.00 |            25.00 |  75.00
------------------+------------+-------------+------------------+--------
Total             |          2 |           2 |                4 |      8
                  |      25.00 |       25.00 |            50.00 | 100.00

Callouts on the slide:
• Ideally, all obs should be here (the cell with no problems and observations in both files).
• Keys match, problems with the data (the cell with problems and observations in both files).

Merged file (. = missing, i.e. no record with that key in that file)

Store  Reg  Prod  STrx1  STrx2  STrx3  OTrx1  OTrx2  OTrx3  Logic
AAA    A    0P1      12     10      8     12     10      8  Match
BBB    A    0P1      10     11      7     10     11      7  Match
FFF    A    0P1      17     11      8     19     10      6  Data
CCC    A    0P1      12     10      8      .      .      .  Key
DDD    B    0P2      10     15      2      .      .      .  ODD
EEE    B    0P3      10     15      2     10     15      4  Data
NNN    c    1P1       .      .      .     19     15     11  ODD
CCZ    A    0P1       .      .      .     12     10      8  Key


Page 10: 2008 Pharmasug, Parallel Validation of Files

Timeline

[Timeline diagram; the steps below are listed in the order they appear on the slide.]

Steps run on each file separately (left file and right file):

• Check for duplicates
• Check for bad codes
• Clean the file
• Contents: date & size
• Freq by Prod_code

Steps run on the two files together:

• Merge and calculate differences by Prod_cd
• Merge and calculate high-level differences
• Identify every row with a problem
• Problem analysis: Row
• Key analysis
• Problem analysis: Rx

Outputs shown on the diagram: reports (Rpt) and electronic copies.
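
The two per-file checks at the start of the timeline, duplicates and bad codes, might look like the Python sketch below. Everything in it is illustrative: the sample rows, the helper names, and in particular the list of valid region codes (which treats the lowercase "c" as invalid) are assumptions, not rules stated in the presentation.

```python
from collections import Counter

# Hypothetical rows laid out as (Store, Reg, Prod).
rows = [
    ("AAA", "A", "0P1"),
    ("BBB", "A", "0P1"),
    ("AAA", "A", "0P1"),   # repeated key
    ("NNN", "c", "1P1"),   # region code outside the assumed valid list
]

def duplicate_keys(file_rows, key_cols=(0, 1, 2)):
    """Return {key: count} for every key that occurs more than once."""
    counts = Counter(tuple(row[i] for i in key_cols) for row in file_rows)
    return {key: n for key, n in counts.items() if n > 1}

def bad_codes(file_rows, col, valid):
    """Return the sorted set of values in one column that are not in the valid list."""
    return sorted({row[col] for row in file_rows if row[col] not in valid})

VALID_REGIONS = {"A", "B", "C"}   # assumed code list, for illustration only

dups = duplicate_keys(rows)
bad = bad_codes(rows, col=1, valid=VALID_REGIONS)
print("duplicate keys:", dups)
print("bad region codes:", bad)
```

Running these checks before the merge matters because a merge by keys behaves unpredictably when a key is repeated within one file.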


Page 11: 2008 Pharmasug, Parallel Validation of Files

Timeline QC Process

[Flowchart; the steps below follow the order and durations shown on the slide.]

1. QC programming: write programs, for a series of files, in anticipation of file delivery.
2. A batch of files to be compared is delivered.
3. Run the QC programs on the batch of files.
4. Assemble a report on the batch of files (concurrent with the run).
5. Review and annotate the report (1 day).
6. Arrange a meeting with the responsible group (1 week).
7. Discuss the report with the responsible group and create action items (1 day).

Outcomes:

• File is OK or "close": if the files are close, the user runs reports with the new file and compares the results (1 week).
  • Pass: log the file as done.
  • Fail: treat the file as failed.
• FAIL: investigate and fix the action items (1 week), then create a new version of the files (2 weeks).


Page 12: 2008 Pharmasug, Parallel Validation of Files

Conclusion & Recommendations

• When data sources and processes change, a systematic approach such as the one outlined in this presentation, which compares data at both the top level and the record level, provides an efficient mechanism to track progress and to identify and resolve potential problems.

• Comparison and validation should be included in the project timeline.

• QC metrics should be established for the development team. However, total validation must be conducted independently.

• Differences in data must be accounted for 100% of the time.
