Posted on 23-Feb-2016
Copyright 2007, Information Builders. Slide 1
How well do you know your DATA?
Glenn Wiebe
May 15, 2012
Is Data a Liability?
$$$ for Data Storage
$$$ for Data Backups
$$$ for Data Archiving
$$$ for Data Replication
$$$ for Data Synchronization
$$$ for Disaster Recovery Planning
Is Data an Asset?
Helps in making decisions
Provides a 360-degree view across the enterprise
Helps to understand the customer
Helps in building effective marketing campaigns
Predictive Analysis
Statistical Analysis
Sentiment Analysis
Data Governance Program
People: Organizations need executive sponsorship.
Process: Documented, repeatable processes and procedures.
Technology: Data Integration, Data Quality, Data Synchronization, and Data Management.
(Diagram: Data Governance built on People, Process, and Technology.)
iWay Data Integration Enablement
SFA/CRM: Amdocs/Clarify, BMC/Remedy, MS Dynamics, Oracle/Siebel, Salesforce.com, SAP
Data Warehouse: DB2, ETL, Oracle/Essbase, MS SSAS/OLAP, Netezza, SAP BW, Teradata
B2B: Internet EDI, Legacy EDI, MFT, Online B2B, XML
ERP/Financials: Ariba, I2, JD Edwards, Lawson, Manugistics, Microsoft, Oracle, SAP
Industry: HIPAA, CIDX, HL7, RNIF, SWIFT, 1Sync
Legacy Systems: CICS, IMS, VSAM, .NET, Java, TUXEDO, etc.
300+ adapters
Data Profiling
Statistical Analysis: An overview of summary values, such as extremes, distribution and frequency analysis.
Domain Analysis: A configurable analysis of data types.
Mask and Group Analysis: An overview of value formats, groups and dimensions.
Business Rules: An analysis of the results of user-defined business rules.
Foreign Key and Dependency Analyses: An inside look into complex connections in the data.
Drill Through: The option to display individual records that correspond to aggregated results.
Data Mart: Reporting and analysis across multiple data set analyses; web and/or hardcopy report viewing and distribution.
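Statistical analysis and mask analysis can be illustrated in a few lines of Python. This is a minimal sketch, not iWay's implementation: the function names and the mask convention (9 for a digit, A/a for an upper/lower-case letter, other characters kept) are assumptions. It profiles the SIN column of the "Original data – before cleansing" example later in this deck.

```python
from collections import Counter


def mask(value: str) -> str:
    """Reduce a value to its format mask: 9 = digit, A = upper-case, a = lower-case letter."""
    symbols = []
    for ch in value:
        if ch.isdigit():
            symbols.append("9")
        elif ch.isalpha():
            symbols.append("A" if ch.isupper() else "a")
        else:
            symbols.append(ch)  # punctuation and spaces are kept as-is
    return "".join(symbols)


def profile_column(values):
    """Summary statistics, frequency analysis and mask analysis for one column."""
    present = [v for v in values if v not in (None, "")]
    freq = Counter(present)
    return {
        "count": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(freq),
        "min": min(present) if present else None,  # lexicographic extremes for text
        "max": max(present) if present else None,
        "top_values": freq.most_common(3),
        "masks": Counter(mask(v) for v in present).most_common(),
    }


# SIN column of the "Original data - before cleansing" example ("" = missing)
sin_column = ["000000000", "095-242-434", "SIN095242434", "095242433",
              "095252433", "", "095252433"]

for key, val in profile_column(sin_column).items():
    print(f"{key}: {val}")
```

The mask counts expose three different SIN layouts (999999999, 999-999-999 and AAA999999999), the kind of finding that then drives parsing and standardization rules.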
Data Quality Management Cycle
(Diagram: a cycle of four phases: Data understanding, Data cleansing, Data enhancement, Monitoring and reporting.)
Activities in the cycle: metadata understanding, profiling, content evaluation, identification of issue causes, deviance identification, KPI definition, parsing, format correction, standardization, automatic correction, context-based cleansing, deduplication/identification, unification, association (householding), enrichment, and ongoing monitoring.
iWay Data Quality Center
Parsing: Decomposition of fields into component parts.
Cleansing: Modification of data values to meet domain restrictions, integrity constraints or other business rules that define sufficient data quality for the organization.
Standardization: Formatting of values into consistent layouts based on industry standards, local standards, user-defined business rules and knowledge bases of values and patterns.
Validation: Verification that values conform to domain restrictions, reference data and user-defined business rules.
Enrichment: Enhancing the value of internally held data by appending related attributes from external sources.
Matching: Identification, linking or merging related entries within or across sets of data.
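Parsing, standardization and validation can be sketched on values from the source data example later in this deck. Canadian SINs are commonly checked with the Luhn checksum. This Python sketch is an illustration only, not the Data Quality Center's implementation: the accepted date layouts, the rule that treats 000000000 as a placeholder, and all function names are assumptions.

```python
import re
from datetime import datetime

# Hypothetical list of accepted input layouts, tried in order.
# An ambiguous string such as 05/06/1978 resolves to the first layout that parses.
DATE_FORMATS = ["%m/%d/%Y", "%d.%m.%Y", "%m/%d/%y", "%Y-%m-%d"]


def standardize_date(raw):
    """Standardization: return an ISO 8601 date (YYYY-MM-DD), or None if no layout matches."""
    if not raw:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None  # left for a data steward rather than guessed


def parse_sin(raw):
    """Parsing: keep digits only, so '095-242-434' and 'SIN095242434' both become '095242434'."""
    digits = re.sub(r"\D", "", raw or "")
    return digits if len(digits) == 9 else None


def luhn_ok(number):
    """Luhn checksum: double every second digit from the right, sum the digits, test mod 10."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def validate_sin(raw):
    """Validation: return a clean SIN, or None if malformed, a placeholder, or failing Luhn."""
    sin = parse_sin(raw)
    if sin is None or sin == "000000000":  # assumption: all zeros is a placeholder value
        return None
    return sin if luhn_ok(sin) else None


for raw in ["095-242-434", "SIN095242434", "095242433", "000000000"]:
    print(raw, "->", validate_sin(raw))
for raw in ["12/16/1978", "16.12.1978", "11/16/78", "781612"]:
    print(raw, "->", standardize_date(raw))
```

With these assumed rules, 095242433 and 000000000 are rejected and 781612 is left unparsed, which is consistent with the blank SIN and birth-date cells in the cleansed data later in the deck.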
Mastering Master Data
What is Master Data?
Data describing your main business entities
Data duplicated in multiple systems
Data reused by multiple business processes
Examples: Customer/Citizen/Patient, Company/Partner/Agency, Products/Items/Equipment, Vendors/Suppliers, Cost Centers/Employees, etc.
Master Data – Match & Merge
Unification: identification of the set of records connected to one person, address, vehicle, contact, etc.
Deduplication: golden record creation (the best representation of the identified subject).
Identification: for new data entries, identification of the subject (person, address, etc.) to which the new record is connected (matched).
Complex business rules using sophisticated algorithms and functions, including:
Levenshtein distance, Hamming distance, edit distance, data quality score values, date stamps of last modification, source system originating data, etc.
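Two of the distance functions named above can be sketched in Python. Levenshtein distance counts single-character insertions, deletions and substitutions, so a swap of two adjacent letters costs 2 (the Damerau-Levenshtein variant counts it as 1). Hamming distance only compares equal-length strings position by position, which suits fixed-width codes such as SINs. The `similarity` score is a hypothetical normalization added for illustration, not a documented iWay function.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is defined only for equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def similarity(a: str, b: str) -> float:
    """Hypothetical 0..1 score: 1 minus the case-insensitive Levenshtein distance / length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a.lower(), b.lower()) / longest


print(levenshtein("Smith", "Smtih"))      # 2: a transposition costs two edits
print(levenshtein("John", "Jhon"))        # 2
print(hamming("095242434", "095242433"))  # 1: SINs differing in one digit
print(round(similarity("Smith", "Smiht"), 2))  # 0.6
```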
Data Quality Portal - Complex Exception Handling
(Diagram components: DQ plan, invalid data extraction, exception DB, resolution queue, workflow, exception management, KPI/DQI calculation, reports, portal.)
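The exception-handling flow can be outlined as: a DQ plan applies named rules, records that fail any rule are extracted to an exception store for stewards to resolve, and a KPI is computed from the pass rate. This Python sketch is an illustration, not the portal's design. The rule names, the in-memory list standing in for the exception DB and resolution queue, and the DQI definition (percentage of records passing every rule) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DQRunResult:
    valid: list = field(default_factory=list)
    exceptions: list = field(default_factory=list)  # stand-in for exception DB / resolution queue


def run_dq_plan(records, rules):
    """Apply every named rule to every record; route failures to the exception list."""
    result = DQRunResult()
    for rec in records:
        failed = [name for name, check in rules.items() if not check(rec)]
        if failed:
            result.exceptions.append({"record": rec, "failed_rules": failed})
        else:
            result.valid.append(rec)
    return result


def dqi(result):
    """Hypothetical data quality index: percentage of records that passed every rule."""
    total = len(result.valid) + len(result.exceptions)
    return 100.0 * len(result.valid) / total if total else 100.0


# Hypothetical rules over the SIN and birth-date columns of the cleansed example data
rules = {
    "sin_present": lambda r: bool(r["sin"]),
    "birth_date_present": lambda r: bool(r["birth"]),
}
records = [
    {"sin": None, "birth": "1978-12-16"},
    {"sin": "095242434", "birth": "1978-12-16"},
    {"sin": "095242434", "birth": None},
    {"sin": None, "birth": "1978-11-16"},
    {"sin": "095252433", "birth": "1978-11-16"},
    {"sin": None, "birth": "1978-11-16"},
    {"sin": "095252433", "birth": "1978-11-16"},
]

outcome = run_dq_plan(records, rules)
print(f"DQI: {dqi(outcome):.1f}%")
for item in outcome.exceptions:
    print("queued for stewardship:", item["failed_rules"])
```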
Human Mind vs. Computer Systems
Hahaha raed tihs! i cdnuolt blveiee taht I cluod aulaclty
uesdnatnrd waht I was rdanieg. The phaonemnel pweor of the hmuan mnid, aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it dseno't mtaetr in waht oerdr the ltteres in a wrod are, the olny iproamtnt tihng is taht the frsit and lsat ltteer be in the rghit pclae. The rset can be a taotl mses and you can sitll raed it whotuit a pboerlm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe. Azanmig huh?
Original data – before cleansing
Source data
Name | G | SIN | Birth Date | Address
Dr. John Smith | M | 000000000 | 12/16/1978 | 14618 110 Ave Surrey V3R 2A9
Smtih W. John | M | 095-242-434 | 16.12.1978 | Surrey 14618 110 Ave
Jhon William Simth | - | SIN095242434 | 781612 | 25 Linden Str Toronto M4X 1V5
Dr. J.W. Smith | M | 095242433 | 11/16/78 | -
John Smith | - | 095252433 | 16.11.1978 | 8500 Leslie L3T 7M8 Toronto
Smith Jhon | - | - | 16.11.1978 | 8500 Leslie street Marham
John Smiht | - | 095252433 | 16.11.1978 | -
Prepared data (after cleansing)
Cleansed data
First | Last | G | SIN | Birth Date | Address
John | Smith | M | - | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
John | Smtih | M | 095242434 | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
Jhon | Simth | M | 095242434 | - | M4X 1V5;ON;Toronto;25 Linden Street
- | Smith | M | - | 1978-11-16 | -
John | Smith | M | 095252433 | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str.
Jhon | Smith | M | - | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str.
John | Smiht | - | 095252433 | 1978-11-16 | -
Match
Cleansed data
First | Last | G | SIN | Birth Date | Address
John | Smith | M | - | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
John | Smtih | M | 095242434 | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
Jhon | Smith | M | 095242434 | - | M4X 1V5;ON;Toronto;25 Linden Street
- | Smith | M | - | 1978-11-16 | -
John | Smith | M | 095252433 | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str.
Jhon | Smith | M | - | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str.
John | Smiht | - | 095252433 | 1978-11-16 | -
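Unification is transitive: if record A matches B and B matches C, all three describe one subject even when A and C do not match directly. A common way to compute these groups is a union-find (disjoint-set) structure. The Python sketch below applies it to the cleansed rows of the Match slide. The pairwise rule in `is_match` is an assumption chosen for illustration and uses exact comparisons; a production rule would fold in the fuzzy distances and scores listed on the Match & Merge slide.

```python
def find(parent, i):
    """Return the root of i's set, halving the path as it goes."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def union(parent, i, j):
    """Merge the sets that contain i and j."""
    ri, rj = find(parent, i), find(parent, j)
    if ri != rj:
        parent[rj] = ri


def is_match(a, b):
    """Hypothetical pairwise rule: same SIN, or same birth date plus same address or surname."""
    if a["sin"] and a["sin"] == b["sin"]:
        return True
    if a["birth"] and a["birth"] == b["birth"]:
        same_addr = a["addr"] is not None and a["addr"] == b["addr"]
        return same_addr or a["last"] == b["last"]
    return False


def cluster(records):
    """Unification: group records connected by any chain of pairwise matches."""
    parent = list(range(len(records)))
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if is_match(records[i], records[j]):
                union(parent, i, j)
    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(parent, i), []).append(i)
    return sorted(groups.values())


SURREY = "V3R 2A9;BC;Surrey;14618 110 Avenue"
TORONTO = "M4X 1V5;ON;Toronto;25 Linden Street"
MARKHAM = "L3T 7M8;ON;Markham;8500 Leslie Str."


def rec(last, sin, birth, addr):
    return {"last": last, "sin": sin, "birth": birth, "addr": addr}


# The seven cleansed rows of the Match slide (first name and gender omitted)
records = [
    rec("Smith", None, "1978-12-16", SURREY),
    rec("Smtih", "095242434", "1978-12-16", SURREY),
    rec("Smith", "095242434", None, TORONTO),
    rec("Smith", None, "1978-11-16", None),
    rec("Smith", "095252433", "1978-11-16", MARKHAM),
    rec("Smith", None, "1978-11-16", MARKHAM),
    rec("Smiht", "095252433", "1978-11-16", None),
]
print(cluster(records))  # [[0, 1, 2], [3, 4, 5, 6]]
```

Under this assumed rule the rows split into the two subjects that appear as golden records later in the deck (SIN 095242434 and SIN 095252433). The first and third rows share no SIN, birth date or address, yet end up together through the second row.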
Merge
Cleansed data
First | Last | G | SIN | Birth Date | Address
John | Smith | M | - | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
John | Smtih | M | 095242434 | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
Jhon | Smith | M | 095242434 | - | M4X 1V5;ON;Toronto;25 Linden Street
Golden record
First | Last | G | SIN | Birth Date | Address
John | Smith | M | 095242434 | 1978-12-16 | M4X 1V5;ON;Toronto;25 Linden Street
Address chosen by the rule "the newest permanent address": M4X 1V5;ON;Toronto;25 Linden Street
Address chosen by the rule "the most frequent address": V3R 2A9;BC;Surrey;14618 110 Avenue
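Golden record creation applies a survivorship rule to each field, and the address above depends on which rule wins. A minimal Python sketch of two such rules, "most frequent value" and "value from the newest record". It assumes each record carries a hypothetical `updated` date (the deck shows none) and ignores the "permanent" flag, which the slide does not define.

```python
from collections import Counter


def most_frequent(values):
    """Survivorship rule: the most common non-empty value (the first one seen wins a tie)."""
    present = [v for v in values if v]
    return Counter(present).most_common(1)[0][0] if present else None


def newest(records, attr):
    """Survivorship rule: the non-empty value from the most recently updated record."""
    candidates = [r for r in records if r[attr]]
    return max(candidates, key=lambda r: r["updated"])[attr] if candidates else None


def golden_record(records, address_rule="newest"):
    """Build one golden record from a cluster of matched records."""
    if address_rule == "newest":
        address = newest(records, "addr")
    else:
        address = most_frequent(r["addr"] for r in records)
    return {
        "first": most_frequent(r["first"] for r in records),
        "last": most_frequent(r["last"] for r in records),
        "g": most_frequent(r["g"] for r in records),
        "sin": most_frequent(r["sin"] for r in records),
        "birth": most_frequent(r["birth"] for r in records),
        "addr": address,
    }


SURREY = "V3R 2A9;BC;Surrey;14618 110 Avenue"
TORONTO = "M4X 1V5;ON;Toronto;25 Linden Street"

# The three matched rows of the Merge slide; the "updated" dates are hypothetical
matched = [
    {"first": "John", "last": "Smith", "g": "M", "sin": None,
     "birth": "1978-12-16", "addr": SURREY, "updated": "2009-04-02"},
    {"first": "John", "last": "Smtih", "g": "M", "sin": "095242434",
     "birth": "1978-12-16", "addr": SURREY, "updated": "2010-08-15"},
    {"first": "Jhon", "last": "Smith", "g": "M", "sin": "095242434",
     "birth": None, "addr": TORONTO, "updated": "2011-11-30"},
]
print(golden_record(matched, "newest"))
print(golden_record(matched, "most_frequent")["addr"])
```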
Merged records – before update
Source data
First | Last | G | SIN | Birth Date | Address
John | Smith | M | - | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
John | Smith | M | 095242434 | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
John | Smith | M | 095242434 | - | M4X 1V5;ON;Toronto;25 Linden Street
John | Smith | M | 095252433 | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str.
John | Smith | M | - | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str.
John | Smiht | - | 095252433 | 1978-11-16 | -
Golden record
First | Last | G | SIN | Birth Date | Address
John | Smith | M | 095242434 | 1978-12-16 | M4X 1V5;ON;Toronto;25 Linden Street
John | Smith | M | 095252433 | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str.
Merged records – after update
Source data
First | Last | G | SIN | Birth Date | Address
John | Smith | M | - | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
John | Smith | M | 095242434 | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
John | Smith | M | 095252433 | - | M4X 1V5;ON;Toronto;25 Linden Street
John | Smith | M | 095252433 | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str.
John | Smith | M | - | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str.
John | Smiht | - | 095252433 | 1978-11-16 | -
Golden record
First | Last | G | SIN | Birth Date | Address
John | Smith | M | 095242434 | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
John | Smith | M | 095252433 | 1978-11-16 | M4X 1V5;ON;Toronto;25 Linden Street
One updated source record may cause modifications to several records in the MDC.
Real World Use Case
The Goal: A major hospital group is building a Master Patient Index. It needs to bring in systems from acquisitions and cleanse, standardize, and deduplicate their data.
The Challenge: This work was previously done manually by hiring temporary staff. The current phase was projected to take 20 temporary staff 18 months.
The Strategy: Automate the cleansing, matching, and merging business rules. Data stewardship provides human oversight of the automated process.
The Benefits: Identifies duplicate records according to very complex business rules. The rules are reusable in future phases. Project time is significantly reduced, from 18 months down to 4. An ROI of over 400% is projected.
Real World Use Case
The Goal: Performance management, business intelligence, and a change management process.
The Challenge: 100 locations and 14 systems with out-of-sync master data.
The Strategy: Cleanse, standardize, and match; master data management for Directorate, Borough, Site, Service Type, Service Point, Team, Staff, and Patient; a master data governance workflow.
The Benefits: Dynamic organizational change to support strategic initiatives, and complete visibility into the organization's performance against its goals.
Real World Use Case
The Goal: A services organization supporting the airline industry sells decision support information to industry members.
The Challenge: Data quality was adversely affecting customer satisfaction and new revenue generation opportunities.
The Strategy: Profile analysis according to specific business validation rules; monitoring of a rolling 13-month window comparing monthly data profiles; accumulation and reporting of the analysis to data providers.
The Benefits: Improves customer satisfaction and confidence in the information; increases the reliability of the information as new data sources are added; documents and audits quality-control processes for customer review; reduces the dependency on human resources to detect and correct data quality issues.
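A rolling 13-month comparison can be read as: compare this month's profile metric with the 12 months before it. The Python sketch below flags a month whose value lies more than three standard deviations from the mean of those 12 months. The z-score test, the threshold and the example null-rate series are assumptions for illustration; the use case does not say how its comparison is computed.

```python
from statistics import mean, stdev


def deviates(history, current, threshold=3.0):
    """Flag `current` if it lies more than `threshold` standard deviations from the history mean."""
    if len(history) < 2:
        return False
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > threshold


def rolling_check(monthly, window=13, threshold=3.0):
    """For each month with window-1 prior months available, compare it against them."""
    flags = []
    for i in range(window - 1, len(monthly)):
        history = monthly[i - window + 1:i]  # the window-1 months before month i
        flags.append((i, deviates(history, monthly[i], threshold)))
    return flags


# Hypothetical monthly null rate of one profiled field; the last month jumps
null_rate = [0.020, 0.021, 0.019, 0.020, 0.022, 0.018,
             0.020, 0.021, 0.019, 0.020, 0.020, 0.021, 0.150]
print(rolling_check(null_rate))
```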
Summary of considerations
Access to a variety of data sources
Ability to influence data improvement anywhere in the process
Usable in batch and/or true real-time processing modes
Extensible by customized business rules
Access to third-party data and services
Historical and distributable analysis
Reusability across multiple phases and projects
Integrated data stewardship
Platform flexibility for deployment and licensing
Vendor partnership and support
(Diagram: Information Access, Data Quality, Master Data Management, Data Governance.)
iWay Software Benefits
Integrate All Information: Any Data, Any System, Any Protocol, Any Platform
Any Process Latency: Scheduled, Process Driven, Event Driven, User Driven
Real-time, Online, and Batch: Data Integration, Application Integration, Business Integration, Service Oriented Architecture
Single Solution Platform: Single Engine, Fast and Scalable, Secure and Reliable, Fully Extensible
Questions?