Date posted: 20-Oct-2014
Category: Technology
Resurrecting the Prodigal Son - Data Quality
"Rise from Ashes: Battle of Data Quality Testing"
Speakers
Bhoomika Goyal
Working @ Microsoft for over a year
Engineer from Mumbai
Loves playing chess, solving puzzles and reading
Raj
Working @ Microsoft Business Intelligence COE
5.5+ years of testing experience
Loves watching movies, reading suspense thrillers and playing cricket
Passion: Testing (http://www.itest.co.nr)
www.Test2008.in
Horror Story
NASA Mars Climate Orbiter spacecraft LOST
Loss: $125 million
Reason: unit mismatch in thruster data (pound-force values read as newtons)
Bad, Bad, Bad Data Quality
Erroneous mailings cost US businesses $611 billion in 2002
DQ is not my problem?
Think again!
DQ Hot Candidates
Data Movement
Migrations
Backups
Restore
Import
Export
Data Warehousing
Business Intelligence
OLTP
OLAP
CRM
ERP
DQ Ishikawa Diagram
Effect: Bad decisions (loss of $ and customers)
Causes:
DQ requirements not documented
Lack of white-box testing
Data is dynamic
CRM & ERP implementations
Mergers / takeovers
Data Quality
DQ is an indicator of the health of the data
GOOD Data Quality
DQ is good if the data is fit to use for decision making
Data Quality Testing
Involves validating, monitoring and reporting on attributes of the data such as accuracy, validity and timeliness
DQ Checks
Row Counts
Consistency
Referential Integrity
Redundancy
Usability
Completeness
Domain Integrity
Timeliness
Accuracy
Validity
Row Count Check
Completeness Check
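The slide gives only the title. One common reading of a completeness check is "mandatory fields are actually populated". The sketch below assumes that reading; the table `Emp`, its columns and the sample rows are hypothetical, and SQLite stands in for whatever database is under test.

```python
import sqlite3

def completeness_check(conn, table, required_columns):
    """Return {column: number of NULL or blank values} for each required column."""
    missing = {}
    for col in required_columns:
        # Identifiers cannot be bound as SQL parameters, so they are inlined;
        # pass only trusted table and column names.
        sql = f"SELECT COUNT(*) FROM {table} WHERE {col} IS NULL OR TRIM({col}) = ''"
        missing[col] = conn.execute(sql).fetchone()[0]
    return missing

# Hypothetical sample data: one missing email, one blank name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (id INTEGER, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO Emp VALUES (?, ?, ?)",
    [(1, "Asha", "asha@example.com"), (2, "Ravi", None), (3, "", "x@example.com")],
)
result = completeness_check(conn, "Emp", ["name", "email"])
print(result)
```

A non-zero count for any column can be reported as a failure for that column.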
Dead People Seen Among Voters
US General Election:
4,755 deceased people voted
Consistency Check
A One-House, $400 Million Bubble Goes Pop
A $121,000 house was overvalued at $400 million
Govt. expected $8 million in tax revenue from it
Accuracy Check
Validity Check
CD Mail Fraud
A man received 22,260 CDs at a discounted price by making each mailing address just different enough
David Loshin
123 Main Street
Any town, NY 11787

David Loshin
123 Main Street, Near Wal-Mart
Any town, NY 11787
Redundancy Check
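The two David Loshin records differ only by an extra landmark, so an exact-match duplicate test would miss them. A minimal sketch of a fuzzy redundancy check follows, using Python's standard `difflib`. The 0.8 threshold, the `normalize` helper and the third record are assumptions for illustration, not part of the talk.

```python
from difflib import SequenceMatcher

def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(kept.split())

def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized records."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

THRESHOLD = 0.8  # assumed cut-off; tune it against real data

rec1 = "David Loshin, 123 Main Street, Any town, NY 11787"
rec2 = "David Loshin, 123 Main Street, Near Wal-Mart, Any town, NY 11787"
rec3 = "Priya Nair, 9 Lake Road, Pune 411001"  # hypothetical unrelated record

for other in (rec2, rec3):
    score = similarity(rec1, other)
    verdict = "possible duplicate" if score >= THRESHOLD else "distinct"
    print(f"{score:.2f} {verdict}: {other}")
```

Pairs above the threshold are candidates for review rather than confirmed duplicates; a lower threshold catches more variants at the cost of more false alarms.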
Referential Integrity Check
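A referential integrity check usually looks for child rows that point at a parent row that does not exist. The sketch below assumes that meaning and borrows the `Emp` and `Dept` table names used later in the deck; the column names and sample rows are hypothetical, and SQLite is used only so the example runs anywhere.

```python
import sqlite3

def orphan_count(conn, child, fk, parent, pk):
    """Count child rows whose non-NULL foreign key has no matching parent row."""
    sql = (
        f"SELECT COUNT(*) FROM {child} c "
        f"LEFT JOIN {parent} p ON c.{fk} = p.{pk} "
        f"WHERE c.{fk} IS NOT NULL AND p.{pk} IS NULL"
    )
    return conn.execute(sql).fetchone()[0]

# Hypothetical sample data: employee 12 references a department that is missing.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dept (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE Emp (emp_id INTEGER PRIMARY KEY, dept_id INTEGER)")
conn.executemany("INSERT INTO Dept VALUES (?, ?)", [(1, "HR"), (2, "Finance")])
conn.executemany("INSERT INTO Emp VALUES (?, ?)", [(10, 1), (11, 2), (12, 99)])

orphans = orphan_count(conn, "Emp", "dept_id", "Dept", "dept_id")
print("PASS" if orphans == 0 else f"FAIL: {orphans} orphan row(s)")
```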
Domain Integrity Check
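Domain integrity is commonly taken to mean that every value in a column falls inside its allowed set or range. A small sketch under that assumption follows; the rows, the allowed gender codes and the age range are all made up for illustration.

```python
def domain_violations(rows, column, is_valid):
    """Return the rows whose value in `column` fails the `is_valid` predicate."""
    return [row for row in rows if not is_valid(row[column])]

# Hypothetical records and domains.
rows = [
    {"emp": "Asha", "gender": "F", "age": 34},
    {"emp": "Ravi", "gender": "X1", "age": 29},
    {"emp": "Meera", "gender": "F", "age": 212},
]
ALLOWED_GENDER = {"F", "M", "O"}

bad_gender = domain_violations(rows, "gender", lambda v: v in ALLOWED_GENDER)
bad_age = domain_violations(rows, "age", lambda v: isinstance(v, int) and 18 <= v <= 70)

print([r["emp"] for r in bad_gender])
print([r["emp"] for r in bad_age])
```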
Timeliness
How do we test DQ?
DQ Rule Engine

Metadata
Rule   Tbl1   Tbl2
RC     Emp    Emp
RI     Emp    Dept
DC     HR     HR

Results
Rule   Result   Comment
RC     Pass     -
RI     Fail     10
DC     Pass     -

Row Count Logic (pseudocode)
Create Procedure RowCount (SrcTbl, TgtTbl)
Begin
    Declare SRC, TGT Integer
    Select SRC = Count(*) from SrcTbl
    Select TGT = Count(*) from TgtTbl
    If SRC = TGT Then
        Return "PASS"
    Else
        Return SRC - TGT
    End If
End

Duplicate Logic (pseudocode)
Create Procedure Duplicate (Tbl)
Begin
    Declare Dup Integer
    Select Dup = Count(*) from
        (Select <<ColumnList>> from Tbl Group By <<ColumnList>> Having Count(*) > 1)
    If Dup = 0 Then
        Return "PASS"
    Else
        Return Dup
    End If
End
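The rule engine on this slide reads a metadata table of rules and writes a results table. A runnable sketch of that idea follows, with Python and SQLite standing in for the stored procedures. The table names `EmpSrc`, `EmpTgt` and `HR`, the extra column-list field in the metadata, and the sample rows are assumptions for illustration; only the RC (row count) and DC (duplicate) rules are covered.

```python
import sqlite3

def row_count_check(conn, src, tgt):
    """RC rule: pass if both tables hold the same number of rows."""
    src_n = conn.execute(f"SELECT COUNT(*) FROM {src}").fetchone()[0]
    tgt_n = conn.execute(f"SELECT COUNT(*) FROM {tgt}").fetchone()[0]
    return ("Pass", None) if src_n == tgt_n else ("Fail", src_n - tgt_n)

def duplicate_check(conn, tbl, columns):
    """DC rule: pass if no combination of `columns` occurs more than once."""
    cols = ", ".join(columns)
    sql = (
        f"SELECT COUNT(*) FROM (SELECT {cols} FROM {tbl} "
        f"GROUP BY {cols} HAVING COUNT(*) > 1)"
    )
    dup = conn.execute(sql).fetchone()[0]
    return ("Pass", None) if dup == 0 else ("Fail", dup)

def run_rules(conn, metadata):
    """Run each (rule, tbl1, tbl2, columns) row and collect (rule, result, comment)."""
    results = []
    for rule, tbl1, tbl2, columns in metadata:
        if rule == "RC":
            result, comment = row_count_check(conn, tbl1, tbl2)
        elif rule == "DC":
            result, comment = duplicate_check(conn, tbl1, columns)
        else:
            raise ValueError(f"unsupported rule: {rule}")
        results.append((rule, result, comment))
    return results

# Hypothetical data: the target table is missing one row; HR has no duplicates.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EmpSrc (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE EmpTgt (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE HR (name TEXT, dept TEXT)")
conn.executemany("INSERT INTO EmpSrc VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "C")])
conn.executemany("INSERT INTO EmpTgt VALUES (?, ?)", [(1, "A"), (2, "B")])
conn.executemany("INSERT INTO HR VALUES (?, ?)", [("Asha", "HR"), ("Ravi", "Ops")])

metadata = [
    ("RC", "EmpSrc", "EmpTgt", None),
    ("DC", "HR", "HR", ["name", "dept"]),
]
results = run_rules(conn, metadata)
for row in results:
    print(row)
```

Keeping the rules in a metadata table means a new check on a new table is a new row, not new code. Table and column names are inlined into the SQL here, so the metadata must come from a trusted source.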
You can’t improve what you can’t measure
[Chart: Data Quality plotted against Time, with threshold lines; axis marks at 5 %, 10 % and 100 %]
Red: bad DQ
Yellow: watch it
Green: good DQ
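The chart itself did not survive extraction, so how its 5 % and 10 % marks map to the colours is not certain. The sketch below assumes they are failure-rate thresholds: under 5 % of records failing is green, 5 % up to 10 % is yellow, and 10 % or more is red. The function name and its parameters are hypothetical.

```python
def dq_status(failed, total, watch_pct=5.0, bad_pct=10.0):
    """Classify a check result as Green, Yellow or Red from its failure rate.

    Assumption: thresholds are percentages of failing records.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    pct = 100.0 * failed / total
    if pct >= bad_pct:
        return "Red"
    if pct >= watch_pct:
        return "Yellow"
    return "Green"

for failed in (2, 7, 10):
    print(failed, dq_status(failed, 100))
```

Recording this status per rule on every run gives the measure-over-time view the slide title asks for.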
DQ Testing is your friend!
High data (test) coverage
Automation (manual effort reduction)
High confidence about your data
Accurate decisions