Data Testing with Data Quality Services (MS SQL 12)

Post on 07-Nov-2014

description

Slides for Dmitriy Romanov's talk at the SQADays-14 conference, Lviv, November 8-9, 2013

transcript

Quality Assurance for Data with Data Quality Services (MS SQL 12)

Dmitriy Romanov, Itera Consulting, Kiev

Dmitriy Romanov

dmitriy.romanov@gmail.com

Areas of expertise:

Test Automation for various projects in:
• Business Intelligence
• RIA
• Billing systems

Agenda

• Intro
– Data Quality: what is it about?
– Data Quality in Business Intelligence projects
– Tools selection

• Data Quality Services
– Structure
– Project component
– Data Quality routine

• Conclusions

Typical information flow

Data Quality Components

DATA QUALITY

Validity

Accuracy

Consistency

Integrity

Timeliness

Completeness
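Some of the components above can be expressed as simple ratios over a data set. The sketch below is an illustration only and is not taken from the talk: the sample records, the field names, the reference list of countries, and the 90-day freshness window are all assumptions.

```python
from datetime import date

# Hypothetical sample records; field names are illustrative only.
records = [
    {"id": 1, "email": "anna@example.com", "country": "UA", "updated": date(2013, 11, 1)},
    {"id": 2, "email": "", "country": "UA", "updated": date(2013, 10, 15)},
    {"id": 3, "email": "not-an-email", "country": "XX", "updated": date(2012, 1, 5)},
    {"id": 4, "email": "oleg@example.com", "country": "NO", "updated": date(2013, 11, 7)},
]

KNOWN_COUNTRIES = {"UA", "NO"}  # assumed reference list


def completeness(rows, field):
    """Share of rows where the field is present and non-empty."""
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)


def validity(rows, field, is_valid):
    """Share of non-empty values that pass the given rule."""
    values = [r[field] for r in rows if r.get(field) not in (None, "")]
    if not values:
        return 0.0
    return sum(1 for v in values if is_valid(v)) / len(values)


def timeliness(rows, field, as_of, max_age_days):
    """Share of rows updated within max_age_days of the as_of date."""
    fresh = sum(1 for r in rows if (as_of - r[field]).days <= max_age_days)
    return fresh / len(rows)


email_completeness = completeness(records, "email")
country_validity = validity(records, "country", lambda c: c in KNOWN_COUNTRIES)
freshness = timeliness(records, "updated", date(2013, 11, 8), 90)
print(email_completeness, country_validity, freshness)
```

Tracking such ratios per field over time is one way to make the components measurable; which rules count as "valid" or "timely" is a business decision.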

Data Quality Issues

Before QA:

After QA:

Data Quality: What is it?

Business intelligence (BI) is a set of methodologies, processes, and technologies that transform raw data into meaningful and useful information for business purposes.

Data Quality represents the degree to which data is suitable for business use.

Data Quality: Tools selection

Custom Tools

PROS
• Variety of technologies
• Flexibility
• Accuracy

CONS
• Higher competence level in business area / tech. stack
• Lots of development effort

3rd-party software

PROS
• Established methods, standards, algorithms
• Open / Expandable / Reusable
• Lower entry level for newcomers

CONS
• Scalability / performance issues
• Limitations

Gartner Magic Quadrant for BI platforms

[Figure: quadrant chart. Axes: Completeness of Vision, Ability to Execute. Quadrants: Challengers, Leaders, Niche Players, Visionaries]

Data Quality: tasks

Data Quality Services (DQS) is a knowledge-driven data quality solution enabling data stewards to easily improve the quality of their data

• Cleansing
• Matching
• Profiling
• Monitoring
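The idea behind matching (finding records that refer to the same entity) can be illustrated with a small standalone sketch. It does not use DQS and is not how DQS implements matching; it relies on the Python standard library's `difflib`, and the customer names and the 0.85 threshold are assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(name.lower().split())


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_duplicates(names, threshold=0.85):
    """Return index pairs whose similarity meets the (assumed) threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if similarity(names[i], names[j]) >= threshold:
                pairs.append((i, j))
    return pairs


# Hypothetical customer names: two variants of the first one, plus an unrelated one.
customers = ["Itera Consulting", "itera  consulting", "Itera Consultng", "Acme Ltd"]
duplicates = find_duplicates(customers)
print(duplicates)
```

The pairwise comparison is quadratic in the number of records, so a sketch like this only suits small samples.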

DQS: Knowledgebase creation process

[Diagram labels: Build; Use; DQ Projects; Knowledge Management; Match & De-dupe; Correct & standardize; Manage Knowledge; Connect; Enterprise Data; Reference Data; Cloud Services; Knowledge Base; Discover / Explore Data; Notifications; Progress; Status]

[Diagram labels: Matching; Reference Data; DQ Clients; DQ Server; DQ Projects Store; Common Knowledge Store; Knowledge Base Store; DQ Engine; 3rd Party / Internal; SSIS DQ Component; DQ Active Projects; Published KBs; Knowledge Discovery; Data Profiling & Exploration; Cleansing; Azure Market Place; Reference Data API (Browse, Get, Update…); RD Services API (Browse, Set, Validate…); Data Domains; DQS User Interface]

DQS Structure

DQS Usage

[Diagram labels: Knowledge Base; Reference Data Definition; Values/Rules; New; Suggestions; Correct & Corrected; Invalid; Source; DQS Cleansing Component; SSIS Package; Destination; Reference Data Services; DQS Server; Design; Run; Monitor; Review & Manage; Activity Monitoring; Interactive Cleansing Project]
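The cleansing outcomes named in the usage diagram (New, Suggestions, Correct & Corrected, Invalid) can be illustrated with a toy classifier. This is a sketch of the idea in plain Python, not the DQS engine or its API: the knowledge base contents, the `cleanse` function, and the 0.8 similarity cutoff are all hypothetical.

```python
from difflib import get_close_matches

# Hypothetical knowledge base for a "City" domain.
VALID_VALUES = {"Kiev", "Lviv", "Oslo", "Stockholm"}
CORRECTIONS = {"Kyiv": "Kiev", "Lvov": "Lviv"}  # known variants mapped to a leading value
INVALID_VALUES = {"N/A", "unknown"}


def cleanse(value, cutoff=0.8):
    """Return (status, output_value) for one source value."""
    v = value.strip()
    if v in VALID_VALUES:
        return ("Correct", v)
    if v in CORRECTIONS:
        return ("Corrected", CORRECTIONS[v])
    if not v or v in INVALID_VALUES:
        return ("Invalid", v)
    # Not in the knowledge base: look for a sufficiently similar valid value.
    close = get_close_matches(v, sorted(VALID_VALUES), n=1, cutoff=cutoff)
    if close:
        return ("Suggested", close[0])
    return ("New", v)


results = [cleanse(v) for v in ["Kiev", "Lvov", "Stokholm", "N/A", "Bergen"]]
for status, output in results:
    print(status, output)
```

Values that come back as "Suggested" or "New" are the ones a data steward would review, and the decisions can then be added to the knowledge base for the next run.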

Real Examples

Business Case – Source Data Quality Assurance

[Diagram labels: Source Data (Oracle, DB2, csv); Screen; DQS; Load; KDVH; Confirm Status: “Ready to load”; DQ Reports; ETL]

Data steward: requests fixes to source data

Data steward: manages the data KB, monitors the DQ process
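The "Confirm Status" step in this business case could be thought of as a gate that lets a batch through to ETL only when the DQ report looks acceptable. The sketch below is a guess at such a gate, not the rule used in the project: the status names, the "Needs source fix" label, and the 5% limit on invalid records are all assumptions.

```python
from collections import Counter


def dq_report(statuses):
    """Count records per cleansing status."""
    return Counter(statuses)


def confirm_status(report, max_invalid_share=0.05):
    """Return "Ready to load" when few enough records are invalid.

    The 5% limit is a made-up example value, not one from the talk.
    """
    total = sum(report.values())
    if total == 0:
        return "Needs source fix"
    invalid_share = report.get("Invalid", 0) / total
    if invalid_share <= max_invalid_share:
        return "Ready to load"
    return "Needs source fix"


# Two hypothetical batches of cleansing results.
clean_batch = dq_report(["Correct"] * 97 + ["Corrected"] * 2 + ["Invalid"] * 1)
dirty_batch = dq_report(["Correct"] * 80 + ["Invalid"] * 20)

print(confirm_status(clean_batch))
print(confirm_status(dirty_batch))
```

A batch that fails the gate would go back to the data steward who requests fixes at the source.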

How could DQS help a QA engineer?

• In general, it brings the things data analysts usually deal with closer to QA
• Helps to understand the underlying data better
• Introduces measurement and manageability to DQ matters
• Increases re-use / decreases re-work
• An open and extendable proposal for a new standard for storing and treating Knowledge Bases on an iterative basis

Thank you