Professional Tester: essential for software testers
February 2014, v2.0, number 25
Test effectiveness

Including articles by: Markus Steinhauser (Testbirds), Ingo Nickles (Vector Software), Stefaan Luckermans (Thaste IT), Wayne Yaddow, Anthea Whelan (Grid-Tools), Ariel Aharoni (Testuff)
18 PT - February 2014 - professionaltester.com

Test effectiveness

Successful testing in a data warehousing project requires effective early planning. The role of testing is to assure the quality not only of the work done but of the data itself before, during and after transformation, and to evaluate whether the required business knowledge will be obtained to the required levels of accuracy and usefulness. The complexity of these requirements and the data volumes involved are typically extreme.

If you are not familiar with the importance and concepts of data warehousing I recommend the short papers Data Quality Management: The Most Critical Initiative You Can Implement, Jonathan G. Geiger, 2004, SUGI 29, Intelligent Solutions Inc, http://www2.sas.com/proceedings/sugi29/098-29.pdf and Proven Testing Techniques in Large Data Warehousing Projects, 2012, Syntel Corp, http://syntelinc.com/uploadedFiles/Syntel/Digital_Lounge/White_Papers/SYNT_testing_capability_inputs_BIDW.pdf.

This article will provide planning check-lists, discuss a variety of testing scenarios and explain the concepts and methods used for data verification.

Test targets, objects and types

DWH testing is focused on:

• data completeness and quality

• data transformations, source to target

• referential integrity of DWH facts and dimensions

• performance and scalability

• compliance with applicable standards

Assurance of these is achieved by studying technical artifacts:

• requirements documents, business and technical

• data models

• business rules

• data mapping documents

• DWH loading design logic

The DWH can be viewed as the product, but a single sequential lifecycle based on that idea is not sufficient. Rather, the product is the process that will create the DWH: for testing purposes, it may be helpful to think of that process as an application, although it is in fact a sequence

Warehouse awareness
by Wayne Yaddow

Wayne Yaddow's comprehensive tutorial

Creating trust in DWH content quality


Figure 1: primary disciplines of DWH testing

Data testing validates that:
• application/database interfaces operate correctly
• data has been accessed and updated correctly
• data has been extracted, scrubbed and loaded correctly
• data conversion criteria convert data as expected
• applied calculations and manipulations yield accurate results

Database testing validates:
• stored procedures, triggers, views, constraints
• existing data quality
• referential integrity and data consistency
• data value persistence and retrieval
• use of data inspection tools
• visual inspection of results
• use of in-memory database to speed test execution

GUI and business rule testing validates:
• online delivery of DWH output
• conformance of data and its behaviour with business rules and workflow
• custom reporting
• system performance
• source to staging accuracy
• staging to enterprise accuracy
• warehouse accuracy

Performance testing validates that:
• short transactions minimize long-term locks and improve concurrency
• user interaction is avoided during transactions
• the database is highly normalized and redundancy is reduced
• historical/aggregated data is minimized
• indexes are used carefully
• SQL queries, system configuration and hardware are well tuned

Figure 2: verifying the ETL process
[Diagram: data flows from the data sources through ETL into a staging DB, then through ETL into the DWH/data mart and on to BI tools and reports, with a data comparison step at each stage, all governed by the applicable standards.]

Figure 3: testing methods to support the DWH development process

• validate data acquisition business logic
• compare schema and data between sources and staged data (row counts, new and missing data, mismatched data, missing or invalid constraints, eg primary/foreign keys)
• tune performance of data loading jobs

• validate data integration and transformation logic between staged data and that loaded in the DWH
• validate the dimension model
• tune performance of data staging jobs

• validate data mart design
• compare data between ODS and data marts with SQL queries
• tune performance of data mart access

• validate data on reports with DWH and data marts
• validate report filters and drill downs
• performance tune data retrieval to reports
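The "compare schema and data" steps shown in figure 2 can be sketched as SQL checks. The following minimal illustration uses Python's built-in sqlite3 and invented table names (src_product, stg_product) as a stand-in for the project's actual source and staging databases:

```python
import sqlite3

# Hypothetical source table and its staged copy, for illustration only.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE src_product (id INTEGER, name TEXT, price REAL);
CREATE TABLE stg_product (id INTEGER, name TEXT, price REAL);
INSERT INTO src_product VALUES (1,'bolt',0.10), (2,'nut',0.05), (3,'washer',0.02);
INSERT INTO stg_product VALUES (1,'bolt',0.10), (2,'nut',0.50), (3,'washer',0.02);
""")

def row_count(table):
    return con.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

# Row counts agree, so a count check alone would pass...
assert row_count("src_product") == row_count("stg_product")

# ...but a row-level set difference still finds the mismatched data:
diff = con.execute(
    "SELECT * FROM src_product EXCEPT SELECT * FROM stg_product").fetchall()
print(diff)  # [(2, 'nut', 0.05)] -> staged price does not match the source
```

In practice tools such as QuerySurge automate comparisons of this kind across millions of rows; the principle (count checks plus row-level difference) is the same.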


of operations, which might be thought of as its components, performed using multiple existing and custom-built tools. Therefore integration, user acceptance and especially regression testing need to be performed frequently throughout the project to prevent the development of the process going off track.

Test objectives and activities

Figure 1 (adapted with permission from http://www.virtusa.com/services/independent-software-quality/technology-offerings/specialized-qa/data-warehouse-testing) shows the four primary disciplines of DWH testing and what they aim to validate.

Figure 2 (adapted with permission from http://querysurge.com) shows an example project. Each arrow is a separate development, and therefore testing, activity.

Figure 3 (adapted with permission from Adventures with Testing BI/DW Application: On a crusade to find the Holy Grail, Raj Kamal, 2013, Microsoft Corp) summarizes some typical testing methods as they occur in parallel with and to support the development project.

Special effort must be made in incident management. Continuing work in the presence of a data-affecting defect could cause severe disruption and rework: if exhaustively redundant infrastructure is not available, it could possibly even cause irretrievable loss of data. This means every incident is a showstopper until it is fully understood, so the investigation process must be efficient. It's important that appropriate business and IT operations experts are available to support it.

When reviewing the test strategy, the fol-lowing should be considered carefully.

What data should be used for testing? This question usually arises very early in the project and forms part of the test data acquisition activity. Data sources are often the operational systems (online transaction processing) providing the lowest level of data. Source data can be databases, front-user applications, legacy feeders, modular data models or files (flat files, .xls, ASCII, EBCDIC etc). Multiple data sources potentially introduce a large number of issues such as semantic conflicts, data capturing and synchronization. So the testing objective here is to understand the different sources of data and to check whether all data from heterogeneous systems is properly fed into the target tables (see Meeting the Data Warehouse Business Intelligence Testing Challenges, Abhijit Singh, 2013, L&T Infotech, http://lntinfotech.com/services/testing/documents/Meeting_the_Data_Warehouse_Business_Intelligence_Testing_Challenges.pdf).
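A completeness check of this kind can be sketched as a set difference on business keys. The table names below (src_crm, src_erp, tgt_customers) are invented, standing in for heterogeneous source systems and the target table:

```python
import sqlite3

# Two hypothetical source feeds and one target table, reconciled on a
# shared business key (customer_id).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE src_crm (customer_id INTEGER, name TEXT);
CREATE TABLE src_erp (customer_id INTEGER, name TEXT);
CREATE TABLE tgt_customers (customer_id INTEGER, name TEXT);

INSERT INTO src_crm VALUES (1, 'Ada'), (2, 'Bob');
INSERT INTO src_erp VALUES (3, 'Cy');
-- The row from the ERP feed was never loaded:
INSERT INTO tgt_customers VALUES (1, 'Ada'), (2, 'Bob');
""")

# Keys present in any source but absent from the target; should be empty.
missing = con.execute("""
    SELECT customer_id FROM src_crm
    UNION
    SELECT customer_id FROM src_erp
    EXCEPT
    SELECT customer_id FROM tgt_customers
""").fetchall()
print(missing)  # [(3,)] -> one source row did not reach the target
```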

Is the data accurate? Once the data acquisition phase is complete, the next step is often data cleansing and filtering. Data cleansing is the process of removing unwanted data. After cleansing, data becomes usable and ready to be fed to other work areas. The main testing objective here is to check the data and ensure that there are no corrupting errors. In addition, testing often validates the field values against known lists of entities, and makes sure that the first level of transformation rules is implemented correctly.
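Validating field values against a known list of entities can be expressed as a simple reference-table query. A minimal sketch, assuming a hypothetical staging table (stg_orders) and reference list (ref_country):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE ref_country (code TEXT PRIMARY KEY);      -- known list of entities
CREATE TABLE stg_orders (order_id INTEGER, country TEXT);

INSERT INTO ref_country VALUES ('DE'), ('GB'), ('US');
INSERT INTO stg_orders VALUES (1, 'DE'), (2, 'XX'), (3, NULL);
""")

# Rows whose country is NULL or not on the reference list; after cleansing
# this result should be empty.
bad = con.execute("""
    SELECT order_id, country FROM stg_orders
    WHERE country IS NULL
       OR country NOT IN (SELECT code FROM ref_country)
    ORDER BY order_id
""").fetchall()
print(bad)  # [(2, 'XX'), (3, None)] -> corrupting values still present
```

The explicit IS NULL test matters: in SQL, NULL NOT IN (...) evaluates to NULL rather than true, so null values would otherwise slip through the filter.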

Are the expected data transformations completed? Data transformation is the process of mapping source-to-destination data using the business transformation logic. Mappings include 1-1 lookups, switch cases, DB logic, combinations, truncating, defaulting and null processing. End-to-end testing of data flow is very important to ensure accuracy, completeness (no missing or invalid data) and consistency of the transformed data. The main testing objective is to confirm that business concepts are modelled correctly in technical terms, so every business rule should be validated against its business objective.
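Each such mapping rule can be unit tested directly. A sketch with an invented status-code rule that combines a 1-1 lookup, trimming, defaulting and null processing:

```python
# Hypothetical transformation rule, for illustration only.
STATUS_LOOKUP = {"A": "ACTIVE", "I": "INACTIVE"}  # 1-1 lookup

def transform_status(raw):
    """Map a raw source status code to its target value."""
    if raw is None:                    # null processing
        return "UNKNOWN"
    code = raw.strip().upper()         # normalize before the lookup
    return STATUS_LOOKUP.get(code, "UNKNOWN")  # defaulting for unmapped codes

# One assertion per business rule, each checked against its business objective:
assert transform_status("A") == "ACTIVE"
assert transform_status(" i ") == "INACTIVE"  # trimming and case folding
assert transform_status(None) == "UNKNOWN"    # null processing
assert transform_status("Z") == "UNKNOWN"     # default for unmapped codes
```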

Are all desired targets loaded correctly? Data load phases generally process data into the end target, usually the DWH. Depending on the business requirement, data loads can be full or incremental. Loading can take place daily, weekly or monthly, so testing in this phase needs to assure that the correct data is loaded within the defined duration. The data loaded in the dimension and fact tables presents the final reporting picture of the warehouse. Business intelligence (BI), being the user-facing area, is a very important part, and the facts act as feeders for the reports. Testing should further verify that data flows properly from the data sources through staging into the DWH.
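One basic check for an incremental load is count reconciliation: the target row count after the load should equal the previous count plus the delta. A sketch against a hypothetical fact table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, amount REAL, load_date TEXT);
INSERT INTO fact_sales VALUES (1, 10.0, '2014-02-01'), (2, 20.0, '2014-02-01');
""")

count_before = con.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]

# Simulated daily delta load.
delta = [(3, 15.5, '2014-02-02'), (4, 7.25, '2014-02-02')]
con.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", delta)

count_after = con.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
assert count_after == count_before + len(delta)  # no rows lost or duplicated
print(count_after)  # 4
```

A full load would instead be reconciled against the complete source extract, and real projects layer checksums or column aggregates on top of plain counts.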

Common defects

DWH testing often reports the following types of defects. This list may be used as a source for error guessing, or later for incident investigation.

• inadequate extract, transform, load (ETL) design documents are available for test planning: for example, data models, mapping documents or detailed design documents are missing or not current

• source table field values are unexpectedly null or do not meet data mapping specifications (source data profiling is not complete)

• target DWH table field values are unexpectedly null or do not meet data mapping specifications (source data profiling is not conducted after each load to the DWH)

• source data to populate the DWH is “dirty”, ie does not comply with expectations and requirements

• many ETL errors have been discovered in testing, ie developer unit testing is inadequate (this is particularly common when ETL is developed without the use of integration tools: well-known examples include Informatica and Microsoft SQL Server Integration Services)

• issues with source to target data model and mappings: for example, they have not been reviewed/approved by stakeholders or not maintained consistently during development

• important data is missing or unsuitable because data volumes and/or variation of data types are too large: this often means maintaining historical data has been neglected for too long, making using it in the current project unviable

• field constraints and SQL are not coded correctly for automated ETL

• more ETL logs and messages are being produced than can be acted upon (especially likely to occur early in projects)

Test basis

The following should be reviewed in detail and repaired as necessary before use for test planning and design.

Business requirements documentation: from this, testers develop the project test strategy and high-level test scenarios, notifying business and other stakeholders of potential quality issues identified

Data models for source and DWH target schemas: from these, testers understand the primary and foreign keys and data flows from source to target

ETL or stored procedure design and logic documents: used to develop test cases (functional, structural and ‘grey-box’) to verify the end-to-end ETL process

Production and QA deployment tasks: used to develop test scenarios address-ing the physical deployment and load of the DWH

During test planning and design it is vital to consider the skills that will be needed for test implementation and execution. These will typically include:

• firm understanding of DWH and DB concepts

• advanced use of SQL queries, stored procedures, DB and SQL editors

• understanding of project data used by the business (data sources, data tables, data dictionary, business terminology)

• data profiling methods and tools (see below)

• understanding of data models, data mapping documents, ETL design and ETL coding

• experience with multiple DB systems: Oracle, SQL Server, Sybase, DB2 etc

• troubleshooting automated ETL sessions and workflows

• using automated ETL testing tools such as QuerySurge or Informatica Data Validation Option

• deployment of DB code to databases

• Unix scripting, Autosys, Anthill etc

• using Microsoft Excel and Access for small-scale data analysis

Data profiling

This is sometimes called “source systems analysis”: the examination and assessment of the source systems' data quality, integrity and consistency. Data profiling should be planned immediately after the candidate data sources are identified and executed against the data from each source immediately when it becomes available. Waiting until all sources become available then attempting to profile them as a whole is a common and damaging mistake.

Data profiling reveals defects such as the following examples:

• data elements (fields, columns) used for purposes other than expected

• empty columns

• invalid values

• inconsistent representation of equivalent or similar values within fields

• null values in fields defined “not null”

• violation of structural dependencies

• violation of expected column relationships such as order of date values

• violation of business rules

• unrealistic frequency of occurrence of a specific value in a column
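Several of these defects fall out of simple per-column statistics. A minimal profiling sketch over one hypothetical staged column, counting nulls and value frequencies:

```python
import sqlite3
from collections import Counter

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE stg_customer (id INTEGER, gender TEXT);
INSERT INTO stg_customer VALUES
  (1,'F'), (2,'M'), (3,NULL), (4,'F'), (5,'F'), (6,'F'), (7,'F');
""")

values = [row[0] for row in con.execute("SELECT gender FROM stg_customer")]
nulls = values.count(None)
freq = Counter(v for v in values if v is not None)

print(nulls)                # 1 -> a problem if the target field is "not null"
print(freq.most_common(1))  # [('F', 5)] -> 5 of 7 rows; is that frequency realistic?
```

Dedicated profiling tools report the same kinds of statistics (null ratios, distinct counts, value distributions, pattern frequencies) for every column at once.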

Root cause analysis of these and other defects detected by data profiling should be carried out by business analysts: it is not enough to know what is wrong with the data, someone needs to find out how it was introduced, because it may mean that related data which appears acceptable is in fact wrong!

Failure to carry out data profiling and act on its findings correctly often allows these defects or, worse, their root causes to be carried forward into ETL development, causing significant and very hard to detect quality issues in the final data warehouse (see The Necessity of Data Profiling: A How-to Guide to Getting Started and Driving Value, Matt Austin, 2010, http://tdwi.org/articles/2010/02/03/data-profiling-value.aspx).

Basic ETL verifications checklist

Testing must be designed to verify, for every ETL session, that:

• source-to-target data mappings were correct before and after the session

• all tables and fields were loaded from source to staging

• primary and foreign keys were properly generated using sequence generator

• not-null fields were populated in all target DWH objects

• no data was truncated

• field lengths, data types and data formats are as specified in the design phase

• no duplicate records exist in individual target tables


• data transformations were applied based on business rules

• numeric fields are populated to correct precision

• each session completed with only expected exceptions

• all cleansing, error and exception handling were implemented as planned

• data calculations and aggregations were completed
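Several checklist items translate directly into queries that should return empty results. A sketch of two of them (duplicate detection and not-null population) against a hypothetical target table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dwh_customer (customer_key INTEGER, name TEXT);
-- Deliberately defective load: key 1 twice, one missing name.
INSERT INTO dwh_customer VALUES (1, 'Ada'), (2, NULL), (1, 'Ada');
""")

# No duplicate records exist in the target table (should return no rows):
dupes = con.execute("""
    SELECT customer_key, COUNT(*) FROM dwh_customer
    GROUP BY customer_key HAVING COUNT(*) > 1
""").fetchall()

# Not-null fields were populated (should be zero):
null_names = con.execute(
    "SELECT COUNT(*) FROM dwh_customer WHERE name IS NULL").fetchone()[0]

print(dupes)       # [(1, 2)] -> key 1 was loaded twice
print(null_names)  # 1 -> one row violates the not-null expectation
```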

Performance testing

As we saw at the start of this article, the product of a DWH project is the process which will produce the DWH, and that can be thought of as similar to an application. As such, it needs to perform adequately to produce the required DWH within the necessary timeframe, meaning that all its components (ETLs) need certain responsiveness and scalability. There are usually many and complex non-functional requirements, but here is a typical minimal set:

• execute with peak production volume to show that the ETL process can complete within a specified time

• analyse ETL loading times with a smaller amount of data to predict scalability issues

• measure ETL processing times component by component to identify the most potentially beneficial improvements

• measure the time to reject and manage large volumes of rejected data

• shut down servers during ETL execution and test for restartability (recoverability)

• simulate maximum concurrent users for all periodic BI reports and ad-hoc reports

• assure BI reports can be accessed during simulated ETL production loads
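Measuring processing times component by component, as the third item suggests, needs nothing more than a timing harness around each step. A sketch with stubbed ETL steps (the sleeps stand in for real work):

```python
import time

# Stub ETL components; in a real project these would invoke the actual jobs.
def extract():   time.sleep(0.02)
def transform(): time.sleep(0.05)
def load():      time.sleep(0.01)

timings = {}
for step in (extract, transform, load):
    start = time.perf_counter()
    step()
    timings[step.__name__] = time.perf_counter() - start

slowest = max(timings, key=timings.get)
print(slowest)  # 'transform' -> the most potentially beneficial improvement
```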

Testing and trust

Data warehousing is still thought of by some as rather vague and imprecise: the “big data” concept can suggest either crude aggregation to support glib generalizations or speculative trend-spotting to inspire creative marketing initiatives. This view is now outdated: DWHs are integral to operational and strategic functions such as process monitoring and management, business intelligence, financial reporting and many more. These require that those using the data are highly confident of its quality. Poor quality will ruin their endeavours but will not be apparent in the way failure of, for example, an application usually is. Rather, they need to be able to trust the data with no way of checking it.

For that reason, expect the rigour, effort and skill devoted to DWH testing, and the corresponding investment in them, to continue to increase fast.

Wayne Yaddow ([email protected]) is a senior data warehouse/ETL/BI report QA analyst

