8/18/2019 iWay Data Quality Center User's Guide
iWay Data Quality Center User's Guide
Version 6.0.1 Service Manager (SM)
DN3501942.0709
Cactus, EDA, EDA/SQL, FIDEL, FOCUS, Information Builders, the Information Builders logo, iWay, iWay Software, Parlay, PC/FOCUS, RStat, TableTalk, Web390, and WebFOCUS are registered trademarks, and Magnify is a trademark of Information Builders, Inc.
Due to the nature of this material, this document refers to numerous hardware and software products by their trademarks. In most, if not all cases, these designations are claimed as trademarks or registered trademarks by their respective companies. It is not this publisher’s intent to use any of these names generically. The reader is therefore cautioned to investigate all claimed trademark rights before using any of these names other than to refer to the product described.
Copyright © 2009, by Information Builders, Inc. and iWay Software. All rights reserved. Patent Pending. This manual, or parts thereof, may not be reproduced in any form without the written permission of Information Builders, Inc.
Contents
Preface
    Documentation Conventions
    Related Publications
    Customer Support
    Help Us to Serve You Better
    User Feedback
    iWay Software Training and Professional Services
1. Introducing iWay Data Quality Center
    About iWay Data Quality Center
    Managing Data Quality
    Unifying Records
    Supplied Modules
    Summary of Other Product Features
2. System Requirements and Installation
    System Requirements
    Installation Procedure
    Installing Database Connectivity Drivers
    License Key
3. Getting Started
    Creating a New Project
    Plan File Basics
    Using Input Files
    Running and Debugging a Plan
    Connecting to a Database
4. Configuring Services
    XDDQAgent
    XDDQCBatchExecAgent
        Date Functions
        String Functions
        Bitwise Functions
        MinMax Functions
        Aggregate Functions
        Conditional Expressions
        Conversion and Formatting Functions
        Word Set Operation Functions
        Unclassified Functions
    Regular Expressions
        @" Syntax (Single Escaping)
        Capturing Groups
8. Unifying Records
    Candidate Groups
        Basic Method: SimpleKey
        Symmetric Merging Method: Union
        Hierarchical Merging Method: Hierarchical / ClassicHierarchical
        Hierarchical With Union Merging Method: HierarchicalUnion
    Creating Client Groups
    Unification Roles
    Manual Override
    Group ID Stability
9. Running iWay DQC in Command Line Mode
    Scripts for Command Line Mode
    Return Codes
10. Configuring Run-Time Variables
    Introduction
    Data Sources
    Folder Shortcuts
    Run-Time Components
11. Using Online Services
    Online Server Configuration
    Server Configuration Components
        SecuredWebAccess Component
        HttpDispatcher Component
        OnlineServices Component
    OnlineServices Component Configuration
        ServiceReference Element
        Input and Output Methods
        HttpInputMethod/HttpOutputMethod
    Input and Output Formats
        CSV Format
        XML Format
        SOAP Format
        Multipart Format
    Logging Requests and Responses
    Example: serviceConfig Configuration
    Creating a Simple SOAP Web Service
        Preconditions
        Procedures for Creating the Service
        Sample Input Message
        Sample Output Message
12. Monitoring
    What Is Monitoring?
    File Output Format
    Graphical User Interface
        Batch
        Online Server
        Connection
        Connection Options
        Filtering
        Filtering Options
        Refresh
        Snapshots
        Drill Down
Preface
This document is written for system integrators and application designers who need to ensure data quality control in transactional and analytical applications. It describes how to use iWay Data Quality Center (DQC) in software integration projects to create applications for data quality assurance.
How This Manual Is Organized
This manual includes the following chapters:
Chapter/Appendix                           Contents
1  Introducing iWay Data Quality Center    Provides an overview of iWay Data Quality Center (DQC). It describes the product features used in the management of data quality, and the supplied modules that enable integration with the infrastructure at your site. It also summarizes deployment, operational, and performance features of the product.
2  System Requirements and Installation    Describes the requirements of the two major components of iWay DQC. It also describes how to install iWay DQC as part of iWay Integration Tools (iIT).
3  Getting Started                         Describes iWay DQC Manager, which is a design tool for solving data quality problems.
4  Configuring Services                    Describes how to configure the two predefined services that you can use as part of your iWay DQC projects.
5  Working With Data Types                 Describes the supported data types in iWay DQC records, I/O operations, and step properties.
6  Creating Dictionary Files               Describes the dictionary files that are created and maintained in iWay DQC.
7  Using Expressions                       Describes expressions used in iWay DQC steps.
8  Unifying Records                        Describes unification, which is identifying groups of records that belong to one logical entity (usually called client), based on a certain set of criteria.
9  Running iWay DQC in Command Line Mode   Describes how to run iWay DQC in command line (batch) mode.
10 Configuring Run-Time Variables          Describes how to control certain run-time aspects of iWay DQC by setting variables in the configuration file.
11 Using Online Services                   Describes online services, which provide Service-Oriented Architecture (SOA) functionality in iWay DQC.
12 Monitoring                              Describes how to view the progress of an iWay DQC configuration that is running, or the state of the online server.
A  Best Practices                          Describes best practices that are used in the implementation of iWay DQC. It includes project directory, naming, and scoring conventions.
B  Glossary                                Provides the definition for various terms used in this guide.
Documentation Conventions
The following table lists and describes the conventions that apply in this manual.
Convention                      Description
THIS TYPEFACE or this typeface  Denotes syntax that you must enter exactly as shown.
this typeface                   Represents a placeholder (or variable), a cross-reference, or an important term. It may also indicate a button, menu item, or dialog box option that you can click or select.
underscore                      Indicates a default setting.
this typeface                   Highlights a file name or command.
Key + Key                       Indicates keys that you must press simultaneously.
{ }                             Indicates two or three choices. Type one of them, not the braces.
|                               Separates mutually exclusive choices in syntax. Type one of them, not the symbol.
...                             Indicates that you can enter a parameter multiple times. Type only the parameter, not the ellipsis points (...).
.                               Indicates that there are (or could be) intervening or additional commands.
.
.
Related Publications
To view a current listing of our publications and to place an order, visit our World Wide Web site, http://www.iwaysoftware.com. You can also contact the Publications Order Department at (800) 969-4636.
Customer Support
Do you have questions about iWay Data Quality Center (DQC)?
Join the Focal Point community. Focal Point is our online developer center and more than a message board. It is an interactive network of more than 3,000 developers from almost every profession and industry, collaborating on solutions and sharing tips and techniques.
Access Focal Point at http://forums.informationbuilders.com/eve/forums.
You can also access support services electronically, 24 hours a day, with InfoResponse Online. InfoResponse Online is accessible through our World Wide Web site, http://techsupport.iwaysoftware.com/. You can connect to the tracking system and known problem database at the Information Builders support center. Registered users can open, update, and view the status of cases in the tracking system and read descriptions of reported software issues. New users can register immediately for this service. The technical support section also provides usage techniques, diagnostic tips, and answers to frequently asked questions.
Call Information Builders Customer Support Services (CSS) at (800) 736-6130 or (212) 736-6130. Customer Support Consultants are available Monday through Friday between 8:00 A.M. and 8:00 P.M. EST to address all your questions. Information Builders consultants can also give you general guidance regarding product capabilities and documentation. Be prepared to provide your six-digit site code (xxxx.xx) when you call.
To learn about the full range of available support services, ask your Information Builders representative about InfoResponse Online, or call (800) 969-INFO.
Help Us to Serve You Better
To help our consultants answer your questions effectively, be prepared to provide specifications and sample files and to answer questions about errors and problems.
The following table lists the environment information that our consultants require.
Platform
Operating System
OS Version
JVM Vendor
JVM Version
The following table lists the deployment information that our consultants require.
Adapter Deployment                            For example, JCA, Business Services Provider, iWay Service Manager
Container                                     For example, WebSphere
Version
Enterprise Information System (EIS) - if any
EIS Release Level
EIS Service Pack
EIS Platform
The following table lists iWay-related information needed by our consultants.
iWay Adapter
iWay Release Level
iWay Patch
The following table lists the types of iWay Explorer. Specify the version (and platform, if different than listed previously) in the columns provided.
iWay Explorer Type          Version    Platform
Swing
Servlet
Eclipse™
Embedded in iWay Designer
The following table lists additional questions to help us serve you better.
Request/Question                                                                                                                    Error/Problem Details or Information
Did the problem arise through a service or event?
Provide usage scenarios or summarize the application that produces the problem.
When did the problem start?
Can you reproduce this problem consistently?
Describe the problem.
Describe the steps to reproduce the problem.
Specify the error message(s).
Any change in the application environment: for example, software configuration, EIS/database configuration, or application?
Under what circumstance does the problem not occur?
Following is a list of error/problem files that might be applicable.
Input documents (XML instance, XML schema, non-XML documents)
Transformation files
Error screen shots
Error output files
Trace files
Service Manager package to reproduce problem
Custom functions and services in use
Diagnostic Zip
Transaction log
For information on tracing, see the iWay Service Manager User's Guide.
1. Introducing iWay Data Quality Center
This section provides an overview of iWay Data Quality Center (DQC). It describes the product features used in the management of data quality, and the supplied modules that enable integration with the infrastructure at your site. This section also summarizes deployment, operational, and performance features of the product.
Topics:
About iWay Data Quality Center
Managing Data Quality
Unifying Records
Supplied Modules
Summary of Other Product Features
Parsing and standardization. Parsing is the decomposition of a field into its component parts. Standardization applies consistent formats to field values, based on industry standards, local standards (for example, postal authority standards for address data), user-defined business rules, and knowledge bases that consist of values and patterns.
Cleansing. Cleansing is the modification of data values to satisfy domain restrictions, integrity constraints, or other business rules that define data quality for your organization. With cleansing, inaccurate data from a data source is detected and corrected or removed. Cleansing ensures that a given set of data is complete, accurate, and valid, making the data meaningful and useful. Cleansing minimizes data errors and improves business performance.
Matching. Matching is identifying, then linking or merging, related entries within or across sets of data.
Enrichment. Enrichment is the enhancement of internally stored data by appending related attributes from external sources (for example, consumer demographic attributes or geographic descriptors).
Monitoring. Monitoring is the deployment of controls to ensure ongoing conformity of data to the business rules that define data quality for your organization.
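The parsing, standardization, and matching operations described above can be pictured with a small Java sketch. It is only an illustration of the concepts and does not show how iWay DQC implements them. The class name StandardizationSketch, its methods, and the "last, first" input layout are all hypothetical, and the code assumes a Java 8 or later JDK.

```java
import java.util.Locale;

// Hypothetical illustration only; not part of the iWay DQC product or its API.
final class StandardizationSketch {

    /** Parsing: decompose a "last, first" field into {first, last}. */
    static String[] parseName(String raw) {
        String[] parts = raw.split(",", 2);
        String last = parts[0].trim();
        String first = parts.length > 1 ? parts[1].trim() : "";
        return new String[] { first, last };
    }

    /** Standardization: collapse whitespace and capitalize each word. */
    static String standardize(String value) {
        String collapsed = value.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
        StringBuilder out = new StringBuilder();
        for (String word : collapsed.split(" ")) {
            if (word.isEmpty()) {
                continue;
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(Character.toUpperCase(word.charAt(0))).append(word.substring(1));
        }
        return out.toString();
    }

    /** Matching: link two raw fields when their standardized parts agree. */
    static boolean sameEntity(String rawA, String rawB) {
        String[] a = parseName(rawA);
        String[] b = parseName(rawB);
        return standardize(a[0]).equals(standardize(b[0]))
            && standardize(a[1]).equals(standardize(b[1]));
    }

    public static void main(String[] args) {
        String[] parsed = parseName("  SMITH ,  john  paul ");
        System.out.println(standardize(parsed[0]) + " " + standardize(parsed[1]));
        System.out.println(sameEntity("SMITH, john", "Smith ,  JOHN"));
    }
}
```

Standardizing before comparing is what lets the two differently typed name fields in the usage example be treated as the same entity.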
Unifying Records
One of the main technological capabilities of a data quality management tool is unification of any number of records that contain the same content.
iWay DQC enables data integration from different sources by analyzing the content, applying cleansing rules, and validating data against specified dictionaries. The processed data can then be unified using the iWay DQC hierarchical unification methods.
The process also enables associative pairing, even when different identification key structures exist. Associative pairing includes partially complete records. A single identification key is not required.
When data quality is poor or when insufficient information about the identification key affects unification results, iWay DQC explicitly marks records to allow for manual correction.
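As a rough picture of key-based unification, the following Java sketch groups records from different sources by a normalized key and sets aside records whose key cannot be built, so that they can be reviewed manually. It is a deliberately simplified, single-key illustration and is not the hierarchical unification that iWay DQC provides. The class UnificationSketch, the NEEDS_REVIEW marker, the record layout {id, name, birth date}, and the sample values are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not the iWay DQC unification algorithm.
final class UnificationSketch {

    /** Bucket for records that lack enough key information (assumed name). */
    static final String NEEDS_REVIEW = "NEEDS_REVIEW";

    /** Builds a key from name and birth date; returns null if either is missing. */
    static String matchKey(String name, String birthDate) {
        if (name == null || birthDate == null) {
            return null;
        }
        String n = name.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
        String d = birthDate.trim();
        if (n.isEmpty() || d.isEmpty()) {
            return null;
        }
        return n + "|" + d;
    }

    /** Groups record IDs by key; each record is {id, name, birthDate}. */
    static Map<String, List<String>> group(List<String[]> records) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String[] r : records) {
            String key = matchKey(r[1], r[2]);
            String bucket = key == null ? NEEDS_REVIEW : key;
            groups.computeIfAbsent(bucket, k -> new ArrayList<>()).add(r[0]);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String[]> records = new ArrayList<>();
        records.add(new String[] { "crm-1", "John Smith", "1980-02-01" });
        records.add(new String[] { "erp-7", " john  SMITH ", "1980-02-01" });
        records.add(new String[] { "web-3", "J. Smith", "" });
        System.out.println(group(records));
    }
}
```

In the usage example, the first two records fall into one group despite their different spelling, while the third record, which has no birth date, is kept apart for manual correction instead of being merged on partial evidence.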
Supplied Modules
iWay DQC architecture is customizable. The product is shipped with ready-to-use modules that allow for easy integration with an existing Information Technology (IT) infrastructure.
Data Quality Modules
iWay DQC Base. The core module used in data quality and data flow management. It includes the ability to define business rules.
iWay DQC Profile. Module for advanced data profiling. It includes semantic analysis and the application of business rules.
iWay DQC Reporting. Module for data quality monitoring and reporting.
Business Task Modules
iWay DQC Address. Module for parsing, cleansing, and identifying address records in any form, including unstructured text in a field.
iWay DQC Party. Module for identification and unification of physical persons and legal entities.
iWay DQC Contact. Module for contact information quality management.
iWay DQC Household. Module for implementation of client identification, addresses, and additional information used to identify households.
iWay DQC Car. Module for vehicle data identification.
Technology Modules
iWay DQC Batch. Data interface for batch processing mode.
iWay DQC Online. Data interface for on-demand processing mode. It includes Web service methods and implementation of data quality firewall functionality.
The technology behind iWay DQC is configurable through management applications or metadata. From templates supplied with the product, you can derive new configurations for specific information entities. For example, you can modify the iWay DQC Party configuration template to create new configurations for managing the quality of driver license data.
Summary of Other Product Features
iWay DQC provides the following deployment, operational, and performance features.
Deployment. iWay DQC is compatible with other platforms in the industry. Compatibility is achieved by leveraging proven Java™ technologies. The product technology is easy to integrate with an existing Information System/Information and Communication Technologies (IS/ICT) infrastructure. It integrates with any Enterprise Service Bus (ESB), Service-Oriented Architecture (SOA), or extract, transform, load (ETL) tool, including iWay Service Manager, IBM WebSphere®, Oracle WebLogic®, and SAP NetWeaver®.
Flexibility and open standards. The iWay DQC solution is easily configured using supplied administration applications. Operation does not require any external tools or other third-party applications. iWay DQC is platform independent. It is based on open standards (XML, Web services, and SOA). iWay DQC implements documented conceptual data models that are portable across many existing database platforms.
Core functionality. The core system is composed of a set of algorithms capable of hierarchical unification by identification keys, regardless of internal data structure. By using the defined keys, iWay DQC can perform approximate matching in record unification.
External reference data sources. iWay DQC taps into external data sources, such as national addresses or name registries, to retrieve reference data for parsing, cleansing, and validation. iWay DQC also uses names, organizations, academic titles, phone numbers, and other dictionaries of information to parse and validate input data. You can extend this feature with your own custom lists.
Performance. iWay DQC uses parallel data processing methods to ensure scalability and enable incremental data processing, both in batch and on-demand online processing modes. Online mode can perform the data quality process within less than 0.1 second. Batch mode can process more than 5,000,000 records in an hour. You can embed iWay DQC into business-to-business (B2B), application-to-application (A2A), portal, and extract, transform, load (ETL) processes for both online and batch modes.
2. System Requirements and Installation
This section describes the system requirements of the two major components of iWay Data Quality Center (DQC). It also describes how to install iWay DQC as part of iWay Integration Tools (iIT).
Topics:
System Requirements
Installation Procedure
Installing Database Connectivity Drivers
License Key
System Requirements
iWay DQC consists of two major components: the server engine and the graphical user interface. Each component has a different set of system requirements.
Server Engine (Core)
The code for the server engine is platform-independent. Therefore, you can run the server engine on almost any platform (combination of operating system and processor architecture), as long as there is a suitable Java Runtime Environment (JRE) for that platform.
The server engine requires JRE 1.4 or later. However, JRE 1.5 or later is recommended. In particular, certain advanced features (namely, the Reporting step) are not available if iWay DQC is run on JRE 1.4.
iWay DQC requires a sufficient amount of memory (at least 256 MB). Large configurations may require up to 1 GB. Additional memory may improve performance of the engine.
iWay DQC also requires enough disk space for temporary files and data. Two to three times the amount of memory for the input data is recommended.
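The memory and disk guidance above can be checked on a candidate machine with a few lines of Java. The following is a minimal sketch, not a tool shipped with iWay DQC. The class SizingSketch and the 500 MB input size are hypothetical, and the sketch assumes that the disk recommendation means two to three times the size of the input data.

```java
import java.io.File;

// Hypothetical pre-installation check; not part of the iWay DQC product.
final class SizingSketch {

    static final long MB = 1024L * 1024L;

    /** Minimum memory stated for the server engine (256 MB). */
    static final long MIN_HEAP_BYTES = 256L * MB;

    /** True if the given heap limit reaches the stated 256 MB minimum. */
    static boolean heapMeetsMinimum(long maxHeapBytes) {
        return maxHeapBytes >= MIN_HEAP_BYTES;
    }

    /** Returns {low, high}: two and three times the input size (assumed reading). */
    static long[] recommendedTempBytes(long inputBytes) {
        return new long[] { 2L * inputBytes, 3L * inputBytes };
    }

    public static void main(String[] args) {
        // maxMemory() reports roughly the JVM heap limit (for example, -Xmx).
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("Java version: " + System.getProperty("java.version"));
        System.out.println("Heap meets 256 MB minimum: " + heapMeetsMinimum(maxHeap));

        long inputBytes = 500L * MB; // hypothetical input data size
        long[] range = recommendedTempBytes(inputBytes);
        long usable = new File(System.getProperty("java.io.tmpdir")).getUsableSpace();
        System.out.println("Recommended temporary space: " + range[0] / MB
            + " to " + range[1] / MB + " MB");
        System.out.println("Usable space in temp folder: " + usable / MB + " MB");
    }
}
```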
Graphical User Interface
The iWay DQC Graphical User Interface (GUI) is available for Microsoft Windows®. The GUI is bundled with JRE 1.5. No additional pre-installed packages are required.
For optimum performance, a 2 GHz Intel® Pentium-class processor (or equivalent) with 1 GB of memory, and a screen resolution of at least 1024x768, is recommended.
The installed product requires approximately 400 MB of disk space.
The following table summarizes the requirements.
Component                    iWay DQC Core                                  iWay DQC GUI
Processor                    Any.                                           Intel-compatible. 2 GHz is recommended.
Operating system             Any.                                           Microsoft Windows, 32-bit version only.
Software                     JRE 1.4 or later. JRE 1.5 is recommended.      None.
Memory                       At least 256 MB. 1 GB or more is recommended.  At least 512 MB. 1 GB is recommended.
Disk space for installation  80 MB.                                         400 MB.
Screen resolution            Not applicable.                                At least 1024x768.
Choosing the Correct JRE for the Server Engine
For most platforms, multiple JREs from different vendors are available. Not all JREs are stable enough to allow processing of large amounts of data. As a best practice, it is recommended that you use the Sun JRE on Windows and Linux/UNIX® systems running Intel-compatible processors. Most vendors of commercial UNIX distributions provide JREs that are stable for their platforms.
If available, a commercial JRE with support and regular updates is recommended for production deployments.
Installation Procedure
iWay Data Quality Center (DQC) is currently packaged with iWay Integration Tools (iIT). You must have a valid license key to use iWay DQC with iIT.
iWay DQC is distributed in two bundles:
dqc-core-version.zip   Platform-independent iWay DQC server engine (core).
dqc-version-win32.zip  Graphical user interface with bundled JRE. A copy of iWay DQC core is located in the run-time subdirectory within the archive.
Installation of the product consists of extracting the files to the chosen location (for example, c:\Program Files\DQC on Windows, /opt/DQC on Linux/UNIX), and copying the license file to the user home folder (this folder is usually c:\Documents and Settings\user_name on Windows and ~ on Linux/UNIX).
When you install the GUI, it is recommended that you place a shortcut to dqc.exe in a Start menu folder or on the desktop for easy access.
See License Key for more information on the license file.
Installing Database Connectivity Drivers
iWay DQC uses the Java Database Connectivity (JDBC) API for connecting to databases.
JDBC drivers are available for most database engines and are distributed as components of the database engine, or separately as connectivity components. The licensing terms do not always allow distribution of these drivers with iWay DQC. Therefore, iWay DQC ships with a basic set of drivers for the most common databases. You may install additional drivers.
The following drivers, which are shipped with iWay DQC, are located in the lib/jdbc subfolder of the iWay DQC core installation.
DescriptionDriver
A JDBC driver for Oracle databases. The distribution contains the
9i and 10g versions of the driver.
Oracle
An open-source driver for connecting to both Microsoft SQL Serverand Sybase server.
jTDS
You must install each driver (including those shipped with the product) before you can useit. You can install a driver to the core by copying its .jar file to the lib subfolder of the coreinstallation, and using the dialog Window > Preferences > iWay DQC > DB Drivers in the GU
License Key
By purchasing iWay DQC, you obtain the license key (a file with a .plf extension). When iWay DQC core starts, it looks for this file first in the installation folder, then in the home folder of the current user, and finally in the folder defined by the PURITY_HOME system variable.
Each license file may contain several restrictions, such as the operating system, iWay DQC version, or date validity range. A license file is valid only if all its conditions for use are met. Additionally, a license file may contain a restriction on product functionality. Functionality not covered by the license file is reported as an error by both the GUI and core.
If no matching license key is found, iWay DQC exits with an error.
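The search order described above can be sketched in Python. This is only an illustration of the documented lookup sequence, not the product's implementation: the function name `find_license_file` and its parameters are hypothetical, and the sketch does not evaluate the restrictions stored inside a license file.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_license_file(install_dir: str,
                      home_dir: Optional[str] = None,
                      env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the first .plf file found, using the documented search order."""
    env = os.environ if env is None else env
    home = Path(home_dir) if home_dir else Path.home()

    # Documented order: installation folder, user home folder, PURITY_HOME.
    candidates = [Path(install_dir), home]
    if env.get("PURITY_HOME"):
        candidates.append(Path(env["PURITY_HOME"]))

    for folder in candidates:
        if folder.is_dir():
            matches = sorted(folder.glob("*.plf"))
            if matches:
                return matches[0]
    # No license file found; the product itself exits with an error here.
    return None
```

Passing the folders and the environment as arguments keeps the sketch easy to try against temporary directories.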
3. Getting Started
iWay DQC Manager is a design tool for solving data quality problems. An intuitive drag-and-drop graphical interface allows you to easily build complex data processing logic and quickly diagnose problems. The many included data processing engines allow you to address a wide variety of problems.
iWay DQC Manager uses industry-standard formats, such as Microsoft Excel and JDBC. It is built on top of the Eclipse Integrated Development Environment (IDE) for proven stability and ease of use.
You can also run iWay DQC in command line mode.
Topics:
Creating a New Project
Plan File Basics
Using Input Files
Running and Debugging a Plan
Connecting to a Database
Creating a New Project
To create a new project, select New > Empty Project, Simple Project, or DQ Project by right-clicking the DQ Projects node in the DQC Explorer (or use the File menu or toolbar).
An Empty Project is a project that contains no files or folders by default.
A Simple Project is a project that contains a default Plan file.
A DQ Project is a project with a pre-defined folder structure and Plan file based on available templates.
A Simple Project is automatically created when you first run iWay DQC Manager.
Plan File Basics
The core of any iWay DQC project is a Plan file. A Plan defines the logic and rules to be applied to the input data in order to produce the desired output. Plans are created by placing steps on a canvas and connecting them. Steps can be used to read, write, transform, and analyze data, among other actions.
To create a new Plan file, select New > Plan by right-clicking a project or folder in the DQC Explorer (or use the File menu or toolbar). To start building a Plan, drag a step from the palette and drop it onto the canvas. Connect steps by dragging from the "out" endpoint of one step to the "in" endpoint of another.
You can edit properties for each step by double-clicking the step, or by right-clicking the step and clicking Edit Properties. To easily align and arrange the steps in a Plan, use the auto-layout and alignment buttons above the canvas (or select those options by right-clicking one or more steps).
You can embed Plans in other Plans in order to reuse a series of steps that have already been created. This is done by dragging the New Include object from the palette onto the canvas and selecting the Plan file to include. To connect the Included Plan to other steps in the Plan, right-click the Included Plan and click Add Step reference. Select the appropriate input or output steps from the displayed list of steps in the embedded Plan.
To use the embedded Plan, connect the steps inside the Included Plan to the steps in the containing Plan. Double-clicking the include box opens the Included Plan for editing. To return to the containing Plan, use the tabs at the bottom of the canvas.
Using Input Files
You can add existing files to iWay DQC Manager for use as input data for a Plan. For example, you can add files by dragging and dropping them from the file system to the desired project in the DQC Explorer, or by copying them in the file system into the desired project folder inside the workspace folder.
To use an input file in a Plan, you must first assign it metadata describing the format of the data. When a data file (for example, a .txt or .csv file) is opened for the first time, the Metadata Editor is launched. It presents options on how to read the file, such as the type of delimiter used, the data types of each column, and whether the file contains header rows.
You can preview the resulting data in the lower panel of the editor to assess the results of the metadata settings. Clicking OK in the Metadata Editor opens the data file for viewing. You can edit the file metadata later by right-clicking the file and clicking Edit Metadata.
To use input files inside a Plan, add one of the input steps to the canvas (for example, Text File Reader or Excel File Reader), and type the input file name in the File Name property. For more information on the available steps in iWay DQC Manager, refer to the documentation for each step. Alternatively, you can drag text files from the DQC Explorer directly onto the canvas, where a Text File Reader is generated after the metadata is created.
Running and Debugging a Plan
To run a Plan, click the Run button on the toolbar, or right-click the canvas and click Run.
Errors in the Plan are shown in the Properties panel as the Plan is constructed. Clicking an individual step shows only the warnings and errors for that step. Double-clicking an error in the Properties panel opens the step properties dialog to the field that contains the error.
You can also debug individual steps by clicking the Debug button on the toolbar when a step is selected, or by right-clicking a step and clicking Debug.
Connecting to a Database
The following JDBC database drivers are included with iWay DQC Manager. You can add other drivers in the DB Drivers preferences.
Oracle
Sybase
Microsoft SQL Server
To connect to one of these database types, right-click the Databases node in the DQC Explorer, and click New > Database Connection. Clicking a driver name from the drop-down list populates the URL string field with a template for connecting to the specified database type.
After the database connection has been made, the database is shown in the Databases node in the DQC Explorer. Clicking the table names shows metadata for each table in the Properties panel.
To view the results of an SQL query on a table, right-click a table and click Open in SQL editor. A default query is shown, listing all table entries (grouped in batches if the number of rows is large). To change the query, edit the query text and click the Execute button. To retrieve more results from the query, click Next batch or Read rest (to show all results).
4. Configuring Services
iWay supplies two predefined services that you can use as part of your iWay Data Quality Center (DQC) projects. This topic describes how to configure the supplied services so that you can incorporate them in process flows.
Topics:
XDDQAgent
XDDQCBatchExecAgent
XDDQAgent
The supplied iWay DQC service named com.ibi.agents.XDDQAgent is configured to pass information to the named Data Quality Provider and to retrieve the responses generated by the iWay DQC Plan. Using iWay Integration Tools, you must supply parameters (property values) that define this service.
For details on the use of this service, see the iWay Data Quality Center Getting Started manual.
XDDQCBatchExecAgent
In this section:
Supplying Parameters
Generating a Run-Time Configuration File
How Does the XDDQCBatchExecAgent Work?
Sample Files
Referring to a File Name
The supplied iWay DQC service named com.ibi.agents.XDDQCBatchExecAgent invokes the iWay DQC run-time (batch) execution environment, through the runcif.bat file. This service enables dynamic allocation of external files and data sources. By running the runcif.bat file, the service executes a Plan with a dynamic run-time configuration file.
For details on the runcif.bat file, see Running iWay DQC in Command Line Mode on page 101.
For details on the run-time configuration file, see Configuring Run-Time Variables on page 105.
Supplying Parameters
You must supply parameters that define the XDDQCBatchExecAgent. An inbound document causes the iWay DQC run-time environment to execute, based on the supplied parameters.
The following table describes the XDDQCBatchExecAgent parameters.
| Parameter Name | Description |
|---|---|
| DQC Runtime Command File (required) | Location of the runcif.bat file. By default, the runcif.bat file is located in the DQC_BASE/runtime/bin directory. For example: C:\dqc\runtime\bin |
| Plan File Location (required) | Fully qualified location of the Plan file that the runcif.bat file will execute. For example: C:\dqc\workspace\samples\01_Hello_World\bin\batch_Hello_World.plan |
| Runtime Configuration File Location (required) | Fully qualified location of the default run-time configuration file. This file contains all the static default allocations. |
| Additional Path Variable Name(s) | Comma-separated list of names of additional path variables, or a single name of an additional path variable. Use this parameter to add one or more path variables to the dynamic default run-time configuration file. Use this parameter with the Additional Path Variable Value(s) parameter. For each additional name, there must be a corresponding value. If you supply this parameter, the path variables will be added to the default configuration file. The file will then be used to execute the iWay DQC run-time environment. For example: MyPath. For a detailed example of a run-time configuration file with additional path variable names, see Sample Run-Time Configuration File With Additional Path Variable Names on page 35. You may leave this parameter blank. |
| Additional Path Variable Value(s) | Comma-separated list of additional path variable values. Use this parameter to add path variable values (allocation values) to the preceding list of names. For example: C:/temp |
| Timeout | Time, in seconds, for an iWay DQC timeout. The default value, 0, means no timeout. |
The following guidelines apply.
You may supply values that are discrete strings or Special Register (SREG) references in the format SREG(variableName).
You must specify the iWay DQC base installation location. For example, if iWay DQC is installed in C:\DQC, the required parameter is C:\iway60\etc\dqc\bin.
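The rule that every additional path variable name needs a corresponding value can be illustrated with a short Python sketch. The function `pair_path_variables` is hypothetical and is not part of the service; it only shows how two comma-separated parameter strings line up position by position.

```python
def pair_path_variables(names: str, values: str) -> dict:
    """Pair comma-separated path variable names with their values by position."""
    name_list = [n.strip() for n in names.split(",") if n.strip()]
    value_list = [v.strip() for v in values.split(",") if v.strip()]

    # For each additional name there must be a corresponding value.
    if len(name_list) != len(value_list):
        raise ValueError(
            f"{len(name_list)} name(s) supplied but {len(value_list)} value(s)"
        )
    return dict(zip(name_list, value_list))
```

For example, the names PathOne,PathTwo,PathThree and the values C:/pathOne,c:/pathTwo,c:/pathThree yield three name/value pairs, and leaving both parameters blank yields an empty mapping.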
Generating a Run-Time Configuration File
Example:
Sample Default Run-Time Configuration File
Sample Run-Time Configuration File With Additional Path Variable Names
Other Examples
In the iWay DQC Graphical User Interface (GUI), you can generate a run-time configuration file. Right-click your project, click New, and click iWay Runtime Configuration.
In design time, you can create a path variable. Right-click your project, click New, and click Path Variable.
Example: Sample Default Run-Time Configuration File
Example: Sample Run-Time Configuration File With Additional Path Variable Names
In the Additional Path Variable Name(s) field, specify the following:
PathOne,PathTwo,PathThree
In the Additional Path Variable Value(s) field, specify:
C:/pathOne,c:/pathTwo,c:/pathThree
The resulting run-time configuration file used by the service is shown here. It is based on the default run-time configuration file.
Example: Other Examples
The following table lists other examples of path variable names and their values.
| Additional Path Variable Name | Additional Path Variable Value |
|---|---|
| APath | C:\apath |
| SREG(DQC.pathnames) | SREG(DQC.PathValues) |
How Does the XDDQCBatchExecAgent Work?
The XDDQCBatchExecAgent accepts an XML document and executes the configured Plan. The resulting XML document is the original document with the addition of the attribute DQCResult="0" on the root element.
The following table describes the possible return codes.
| Return Code | Description |
|---|---|
| 0 | iWay DQC execution completed successfully. |
| 16 | iWay DQC execution completed with warnings. |
| 17 | iWay DQC execution completed with errors. |
| 18 | Abnormal iWay DQC execution termination. |
| 19 | No valid license file was found. |
| 20 | Plug-in version check failed. This usually means that the iWay DQC installation is corrupted. Reinstallation is recommended. |
| 21 | Incorrect arguments were given to the runcif script. |
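A downstream consumer might interpret the result document along the following lines. This Python sketch assumes that the return code is carried in the DQCResult attribute of the root element for all codes in the table; the source only shows the value 0. The function name `read_dqc_result` and the sample element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Descriptions taken from the return code table above.
RETURN_CODES = {
    0: "iWay DQC execution completed successfully.",
    16: "iWay DQC execution completed with warnings.",
    17: "iWay DQC execution completed with errors.",
    18: "Abnormal iWay DQC execution termination.",
    19: "No valid license file was found.",
    20: "Plug-in version check failed.",
    21: "Incorrect arguments were given to the runcif script.",
}


def read_dqc_result(xml_text: str):
    """Return (code, description) from the root DQCResult attribute, or None."""
    root = ET.fromstring(xml_text)
    raw = root.get("DQCResult")
    if raw is None:
        return None  # the document has not passed through the service
    code = int(raw)
    return code, RETURN_CODES.get(code, "Unknown return code")
```

Because the service preserves the structure of the original document, only the root attribute needs to be inspected.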
Assume that you have the following XML input file:
After successful execution of the XDDQCBatchExecAgent, the resulting XML file is:
With the XDDQCBatchExecAgent, the structure of the original XML file is preserved.
Sample Files
runcif.bat File
@echo off
rem Start script for DQC - batch mode
rem $Id: runcif.bat 11177 2009-02-06 15:50:18Z pavel.nejedly $
set PURITY_HOME=D:\DQC-5.3.1\runtime
rem preparing classpath
set CLASSPATH=
for %%I in (%PURITY_HOME%\lib\*.jar) do @call %PURITY_HOME%\bin\appendcp.bat %%I
rem echo Using CLASSPATH=%CLASSPATH%
:okJava
"D:\DQC-5.3.1\jre\bin\java" cz.adastra.cif.processor.bin.CifProcessor %*
:end
Run-Time Configuration File
Referring to a File Name
In the iWay DQC Plan, the Text File Reader refers to the location using:
purity://MyVariable/filename
In the iWay DQC Graphical User Interface (GUI), use the path variable as follows. The first image shows the file name in the File Name field for the Text File Reader.
The next image shows the DQC Explorer tree.
To directly refer to a file name, instead of using folder navigation, use the following syntax:
purity://MyFileVariable/
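One way to picture how such a reference relates to a path variable is the following Python sketch. It assumes that the variable value is a folder to which the file name is appended, and that a reference with nothing after the variable name resolves to the variable value itself. This resolution rule, the function `resolve_purity_url`, and the sample values are illustrative assumptions, not documented product behavior.

```python
PREFIX = "purity://"


def resolve_purity_url(url: str, path_variables: dict) -> str:
    """Resolve purity://Variable/rest against a mapping of path variables."""
    if not url.startswith(PREFIX):
        raise ValueError(f"not a purity:// reference: {url!r}")

    # Split 'MyVariable/filename' into the variable name and the remainder.
    variable, _, rest = url[len(PREFIX):].partition("/")
    if variable not in path_variables:
        raise KeyError(f"undefined path variable: {variable!r}")

    base = path_variables[variable]
    if not rest:
        return base  # the variable names the file directly
    return base.rstrip("/") + "/" + rest
```

With MyVariable set to C:/temp, the reference purity://MyVariable/filename resolves to C:/temp/filename under these assumptions.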
5. Working With Data Types
This section provides information on the supported data types in iWay DQC records, input/output (I/O) operations, and step properties.
Topics:
Supported Data Types
Formatting Data Types
Parsing Errors
Data Types in Step Properties
JDBC Data Type Conversions
Supported Data Types
iWay DQC supports the following data types in records:
Integer. Whole number ranging from -2^31 to 2^31 - 1.
Long. Whole number ranging from -2^63 to 2^63 - 1.
Float. Arbitrary-precision signed decimal number. You can control the output precision and the precision of the division operation by the double.scale run-time parameter, which has a value of 10 by default.
String. Sequence of characters that is treated as text.
Day. Calendar date without time fields. For more information, see Parsing Errors on page 40.
Datetime. Calendar date with time fields. For more information, see Parsing Errors on page 40.
Boolean. Logical value that can be true or false.
Formatting Data Types
Formatting rules for parsing input and output data into iWay DQC data types are defined by the data format parameters of the respective input/output processing steps. See the documentation on steps for details.
Parsing Errors
In all cases, if null exists in the input field, then null is written to the related output field without generating an error.
The following errors may occur for each data type:
STRING. Does not generate any errors.
BOOLEAN. When there is a non-null value in the input that cannot be parsed, an UNPARSABLE_FIELD error is generated.
INTEGER. When there is a non-null value in the input that cannot be parsed, an UNPARSABLE_FIELD error is generated.
FLOAT. When there is a non-null value in the input that cannot be parsed, an UNPARSABLE_FIELD error is generated.
LONG. When there is a non-null value in the input that cannot be parsed, an UNPARSABLE_FIELD error is generated.
DAY. If the data parsing ends with an error, an INVALID_DATE error is generated. If the READ_POSSIBLE option is set, the step parses the data again, this time with added leniency towards nonsensical numeric parts of the date. For example, the string 32-13-2000 represents a valid date value that is parsed as 1.2.2001. If even lenient parsing fails, an UNPARSABLE_FIELD error is generated.
DATETIME. Processing is the same as for the DAY data type.
Each step that handles I/O parsing of iWay DQC data types must implement a specific strategy that manages error handling.
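The strict-then-lenient handling described for the DAY type can be sketched in Python as follows. The sketch assumes a day-month-year input format (taken from the 32-13-2000 example) and assumes that the INVALID_DATE error is still reported when lenient parsing succeeds; the function `parse_day` and its `read_possible` flag are hypothetical stand-ins for a step's parsing strategy.

```python
from datetime import date, timedelta


def parse_day(text, read_possible=False):
    """Return (value, errors) for a dd-mm-yyyy string."""
    if text is None:
        return None, []  # null passes through without an error

    try:
        day, month, year = [int(part) for part in text.split("-")]
        numeric = True
    except ValueError:
        numeric = False  # not three integer fields

    if numeric:
        try:
            return date(year, month, day), []  # strict parsing succeeded
        except ValueError:
            pass

    errors = ["INVALID_DATE"]
    if not read_possible:
        return None, errors
    if not numeric:
        return None, errors + ["UNPARSABLE_FIELD"]

    try:
        # Lenient parsing: roll excess months into years and excess days
        # into the following months.
        months = year * 12 + (month - 1)
        first = date(months // 12, months % 12 + 1, 1)
        return first + timedelta(days=day - 1), errors
    except (ValueError, OverflowError):
        return None, errors + ["UNPARSABLE_FIELD"]
```

With lenient parsing enabled, month 13 of 2000 rolls over to January 2001 and day 32 of January rolls over to 1 February 2001, which matches the example above.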
Data Types in Step Properties
You can use the following data types in the definition of step properties:
string
integer
long
date
float
boolean
double
JDBC Data Type Conversions
When data is read from a database type to an internal data type, or when data is written from an internal data type to a database type, a set of predefined conversions is used. The following table shows how data is converted between a database type and an internal data type.
| Internal Data Type | SQL Data Type | JDBC get Method | JDBC set Method |
|---|---|---|---|
| boolean | BIT | getBoolean | setBoolean |
| integer | INTEGER | getInt | setInt |
| long | BIGINT | getBigDecimal | setBigDecimal |
| date | TIMESTAMP | getTimestamp | setTimestamp |
| day | DATE | getDate | setDate |
| float | DECIMAL | getBigDecimal | setBigDecimal |
| string | VARCHAR | getString | setString |
To read data from a database or write data to a database, the JDBC get or set method is used. For example, to read/write a date internal data type from/to a database, the JDBC functions getTimestamp()/setTimestamp() are used. These conversions are used by all JDBC-related steps (such as Jdbc Reader, Jdbc Writer, SQL Execute, and SQL Select).
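For scripts that need to reason about these conversions, the table can be captured as a simple lookup structure. The following Python sketch merely restates the table; the names `JdbcMapping` and `JDBC_CONVERSIONS` are hypothetical, and no JDBC calls are made.

```python
from typing import NamedTuple


class JdbcMapping(NamedTuple):
    sql_type: str  # SQL data type used for the column
    getter: str    # JDBC method used when reading
    setter: str    # JDBC method used when writing


# Keys are the internal data types from the conversion table.
JDBC_CONVERSIONS = {
    "boolean": JdbcMapping("BIT", "getBoolean", "setBoolean"),
    "integer": JdbcMapping("INTEGER", "getInt", "setInt"),
    "long": JdbcMapping("BIGINT", "getBigDecimal", "setBigDecimal"),
    "date": JdbcMapping("TIMESTAMP", "getTimestamp", "setTimestamp"),
    "day": JdbcMapping("DATE", "getDate", "setDate"),
    "float": JdbcMapping("DECIMAL", "getBigDecimal", "setBigDecimal"),
    "string": JdbcMapping("VARCHAR", "getString", "setString"),
}


def jdbc_methods_for(internal_type: str) -> JdbcMapping:
    """Look up the SQL type and JDBC get/set methods for an internal type."""
    return JDBC_CONVERSIONS[internal_type.lower()]
```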
JDBC Internal Conversions
The JDBC specifications define the JDBC capability for inner type conversions (the difference between which JDBC method you use to read/write data and the real database column data type). These specifications are available here. The conversion abilities of certain drivers depend on the JDBC specification version they implement. Base conversions are defined in API 1.0 and extended in 3.0.
Most of the drivers support JDBC 3.0. However, some drivers may not implement these conversions fully, or a database may use its own extra data types. Real conversion abilities are JDBC driver dependent. The previously mentioned JDBC methods used to read/write data from/to a database were chosen taking into consideration maximum compatibility with major databases and their JDBC connectors.
6. Creating Dictionary Files
It is often necessary to use reference data with certain steps (for example, to look up values for matching purposes). The reference data must be placed in dictionary files, which are created and maintained in iWay Data Quality Center (DQC).
The process for creating dictionary files involves:
Reading the reference data from a supported input type (text file, DBF file, or JDBC).
Preparing the data (for example, creating a matching value with the Create Matching Value step).
Generating the dictionary file using the appropriate generator.
Topics:
Dictionary File Types
Dictionary File Type Summary
Information for Specific Steps
Dictionary File Types
In this section:
StringLookup
IndexedTableLookup
MatchingLookup
SelectiveMatchingLookup
iWay DQC uses four types of dictionary files:
StringLookup, which is an indexed list of strings.
IndexedTableLookup, which is an indexed table.
MatchingLookup, which is a lookup file indexed by a matching value that contains real values.
SelectiveMatchingLookup, which is an extension of the MatchingLookup file type, used for selective lookup matching.
StringLookup
This dictionary file is an indexed list of strings, used for getting information about the presence of a string in a dictionary file. This file consists of a single column of strings. Data types other than string are not valid. Other data types must first be converted to string if they are to be used.
Used by: String Lookup step, Validate Email step, Validate Phone Number step, Guess Name Surname step, Experimental Exclude Spaces step
Generator: String Lookup Builder step
IndexedTableLookup
This dictionary file is an indexed table with defined index values, used for looking up records by their corresponding keys. The full record data is contained in the file, as it was defined during the generation of the file.
Used by: Apply Replacement step, Convert Phone Numbers step, Strip Titles step, Transform Legal Forms step, Validate In Res step, Validate SKRZ step, Validate Vat Id step, Validate Vin step, Table Matching step, Value Replacer step
Generator: Indexed Table Builder step
MatchingLookup
This dictionary file is used for looking up a matching value from a real value. The file is indexed by the matching value.
Used by: Guess Name Surname step, Intelligent Swap Name Surname step, Swap Name Surname step, Validate Vin step
Generator: Matching Lookup Builder step
SelectiveMatchingLookup
This dictionary file is an extension and modification of the MatchingLookup file. Other parameters (in addition to the real and matching values) can be used in the lookup. The other parameters provide a lookup of the best variant from the set of variants that fit the pair of matching and real values.
Used by: Selective Res Lookup step
Generator: Selective Matching Lookup step
Dictionary File Type Summary
The following table contains a list of the steps that require dictionary files and details on their use.
| Step | Filename Property | Dictionary File Type | Description |
|---|---|---|---|
| Update Gender | firstNameRatioLookupFileName | IndexedTableLookup | File contains numbers only. Indexed by names. For further information, see below. |
| Update Gender | surnameRatioLookupFileName | IndexedTableLookup | File contains numbers only. Indexed by surnames. |
| Validate Email | tldLookupFileName | StringLookup | File contains all top-level domains in uppercase without dots. |
| Validate Phone Number | idcLookupFileName | StringLookup | File contains all known IDCs. |
| Validate Phone Number | provLookupFileName | StringLookup | File contains prefixes of known Telcos. |
| Transform Legal Forms | legalFormsLookupFileName | IndexedTableLookup | File contains original values with their replacements. Indexed by the original values. |
| Validate In Res | databaseFile | IndexedTableLookup | File contains reference data of companies. Indexed by company registration number. |
| Convert Phone Numbers | conversionTableFileName | IndexedTableLookup | File contains original prefixes with patterns to form a number in the new format. For further information, see below. |
| Guess Name Surname | firstNameLookupFileName | MatchingLookup | File contains known names. |
| Guess Name Surname | lastNameLookupFileName | MatchingLookup | File contains known surnames. |
| Guess Name Surname | multiFirstNameLookupFileName | MatchingLookup | File contains known multi-word names. |
| Guess Name Surname | multiLastNameLookupFileName | MatchingLookup | File contains known multi-word surnames. |
| Intelligent Swap Name Surname | firstNameLookupFileName | MatchingLookup | File contains known names. |
| Intelligent Swap Name Surname | lastNameLookupFileName | MatchingLookup | File contains known surnames. |
| Strip Titles | titleLookupFileName | IndexedTableLookup | File contains matching values with their replacements for known titles. Indexed by matching value. |
| Swap Name Surname | firstNameLookupFileName | MatchingLookup | File contains known names. This step is deprecated. Use the Intelligent Swap Name Surname step instead. |
| Swap Name Surname | lastNameLookupFileName | MatchingLookup | File contains known surnames. This step is deprecated. Use the Intelligent Swap Name Surname step instead. |
| Validate Vat Id | foLookupFileName | IndexedTableLookup | File contains numbers and names of known tax offices. Indexed by numbers. |
| Validate Vat Id | cnLookupFileName | IndexedTableLookup | File contains known company registration numbers and company names. Indexed by numbers. |
| Validate Vin | wmiFileName | IndexedTableLookup | File contains known WMI codes as keys and patterns to match VINs in a second dictionary file. For further information, see below. |
| Validate Vin | vinInfoFileName | IndexedTableLookup | File contains the following columns: patterns for matching input VIN, manufacturer, car model, year that VIN was issued, position of CRC number, and position of year number. Indexed by matching pattern. |
| Validate SKRZ | districtLookupFileName | IndexedTableLookup | File contains Slovak district codes and names. Indexed by district codes. |
| Apply Replacements | replacementsFileName | IndexedTableLookup | File contains original values with their replacements. Indexed by original values. |
| String Lookup | lookupFileName | StringLookup | File contains a list of strings from which to look up. |
| Selective Res Lookup | fileName | SelectiveMatchingLookup | File contains reference data of companies. This includes real and matching values of company names, company registration numbers, and an additional optional field. |
| Table Matching | indexTableFileName | IndexedTableLookup | File contains table from which to look up data. Indexed by keys used for looking up data. |
| Experimental Exclude Spaces | databaseFile | StringLookup | File contains list of known words. |
| Anonymizer | nameLookupFileName | IndexedTableLookup | File contains replacement names (first names and surnames) written only in uppercase. Indexed by original values in uppercase. |
Information for Specific Steps
In this section:
ValidateVINAlgorithm Dictionary Files
Convert Phone Numbers Step Dictionary Files
Update Gender Step Dictionary Files
This topic provides details on steps that require additional explanation or have more complex configuration requirements.
ValidateVINAlgorithm Dictionary Files
Background information about WMI (World Manufacturer Identifier) and VIN (Vehicle Identification Number) codes is not provided here. For information about those codes, refer to the VIN article on Wikipedia at http://www.wikipedia.org.
The Validate VIN step needs two dictionary files in order to execute successfully.
WMI Dictionary File
The first dictionary file, referred to by the wmiFileName property, is of the MatchingLookup file type. It must contain a WMI code as a matching value and a key name for lookup in the VIN dictionary file. The key name is a string that consists of a WMI code and an optional mask, followed by the underscore character (_) and a unified manufacturer name (in uppercase and without accents).
The mask starts at the fourth position of the VIN (the first three characters are for the WMI code) and can consist of up to 11 characters. If no mask is defined, a default mask of *********** (11 asterisks) is used. An asterisk is a wild card that represents any character, as opposed to a specific character.
If a character other than an asterisk is placed in any of the mask fields, the specified character will be used at that position. For example, the mask ***6Y defines characters 6Y at the 7th and 8th positions. The whole key name will then look like, for example, TMB***6Y_SKODA (SKODA is the manufacturer name). It will match VIN TMB1236Y234567890 but not TMB12345234567890.
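The mask rules above can be illustrated with a short Python sketch. It assumes that a mask shorter than 11 characters is padded with asterisks and that a VIN must supply all 11 positions after the WMI code; the function `key_matches_vin` is hypothetical and is not the step's implementation.

```python
MASK_LENGTH = 11  # mask positions follow the three-character WMI code


def key_matches_vin(key_name: str, vin: str) -> bool:
    """Check a key name such as 'TMB***6Y_SKODA' against a VIN."""
    # The part before the underscore holds the WMI code and optional mask.
    pattern, _, _manufacturer = key_name.partition("_")
    wmi, mask = pattern[:3], pattern[3:]
    if len(mask) > MASK_LENGTH:
        raise ValueError("mask can consist of up to 11 characters")

    # Missing mask positions behave like asterisks (default is 11 asterisks).
    mask = mask.ljust(MASK_LENGTH, "*")

    if not vin.startswith(wmi):
        return False
    tail = vin[3:3 + MASK_LENGTH]
    if len(tail) < MASK_LENGTH:
        return False
    # An asterisk accepts any character; anything else must match exactly.
    return all(m == "*" or m == c for m, c in zip(mask, tail))
```

Applied to the example key TMB***6Y_SKODA, the sketch accepts TMB1236Y234567890 and rejects TMB12345234567890, because the latter has 4 and 5 at the 7th and 8th positions.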
VIN Dictionary File
The second dictionary file, referred to by the vinInfoFileName property, is of the IndexedTable file type. It is indexed by the key names (the same values that are in the WMI dictionary file). It contains, in order, these columns: key name, real name of manufacturer, car model, year that VIN was issued (in four-digit format), position of CRC number (if the VIN code contains any), and position of year number (if any).
Convert Phone Numbers Step Dictionary Files
The only dictionary file for this step, referred to by conversionTableFileName, is of the IndexedTable file type. The table is indexed by the source prefix, which consists of the old prefix and the beginning of the original number that is going to be replaced by the step. The table contains the source prefix (the value that was indexed from), the length of the number that will not be replaced, and the new prefix.
Example: You need to convert all numbers with the old prefix 02 that start at number 2 (02 22 93 44 23, 02 23 48 79 67) to a 9-digit national format. The table must have a line indexed with 022 (02 as the original prefix, 2 as the start number) and must contain 022 (source prefix), 7 (number length), and 22 (new prefix). The step then replaces 022 from the beginning of a number with 22 from the new prefix and copies 7 digits from the original phone number.
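The conversion in this example can be sketched in Python as follows. The sketch assumes that the copied digits are the ones that immediately follow the source prefix, and that the longest matching source prefix wins when several apply; both are assumptions made for illustration. The function `convert_phone_number` and the table layout are hypothetical.

```python
def convert_phone_number(number: str, table: dict) -> str:
    """Convert a number using {source_prefix: (kept_length, new_prefix)}."""
    # Ignore spaces and other separators in the original number.
    digits = "".join(ch for ch in number if ch.isdigit())

    # Try longer source prefixes first (assumed tie-breaking rule).
    for prefix in sorted(table, key=len, reverse=True):
        if digits.startswith(prefix):
            kept_length, new_prefix = table[prefix]
            kept = digits[len(prefix):len(prefix) + kept_length]
            return new_prefix + kept
    return digits  # no table entry applies; number is left unchanged


# The table line from the example: 022 -> keep 7 digits, new prefix 22.
conversion_table = {"022": (7, "22")}
```

With this table, 02 22 93 44 23 becomes the 9-digit number 222934423.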
Update Gender Step Dictionary Files
Numbers written in the dictionary files are the ratios of males to females with the corresponding name (names are the indexed value). They are INTEGER values calculated as (male_count*1000)/(male_count+female_count). This corresponds to 0 and small numbers for most female names, and 1000 and large numbers primarily for male names.
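The ratio formula can be worked through in Python. Because the stored values are INTEGER, the sketch assumes that the division truncates to a whole number; the function name `male_ratio` is hypothetical.

```python
def male_ratio(male_count: int, female_count: int) -> int:
    """Compute (male_count*1000)/(male_count+female_count) as an integer."""
    total = male_count + female_count
    if total == 0:
        raise ValueError("the name must occur at least once")
    # Integer division keeps the result in the range 0 to 1000.
    return (male_count * 1000) // total
```

A name carried by 1 male and 3 females gets the value 250, a name carried only by females gets 0, and a name carried only by males gets 1000.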
7. Using Expressions
This section describes expressions used in iWay Data Quality Center (DQC) steps. Places where the expressions may be used are described in the description sections of the appropriate steps.
Topics:
Operands
Handling Null Values
Variables
Operations and Functions
Regular Expressions
Operands
Expression operands may be of a defined column type, such as INTEGER, FLOAT, LONG, STRING, DATETIME, DAY, and BOOLEAN. If a number assigned to either an INTEGER or LONG variable overflows or underflows the interval of permitted values for that type (that is, -2147483648 to +2147483647 for INTEGER, and -9223372036854775808 to +9223372036854775807 for LONG), then the number wraps around the interval. For example, the value 2147483649 assigned to an INTEGER variable is interpreted as -2147483647.
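The wrap-around behavior corresponds to two's-complement arithmetic on 32-bit (INTEGER) and 64-bit (LONG) values. The following Python sketch reproduces the arithmetic; the function `wrap_signed` is a hypothetical helper, since Python integers do not overflow on their own.

```python
def wrap_signed(value: int, bits: int) -> int:
    """Wrap a number into the signed range of the given bit width."""
    modulus = 1 << bits      # size of the interval, for example 2**32
    half = 1 << (bits - 1)   # distance from zero to the lower bound
    return (value + half) % modulus - half


def wrap_integer(value: int) -> int:
    return wrap_signed(value, 32)  # INTEGER interval


def wrap_long(value: int) -> int:
    return wrap_signed(value, 64)  # LONG interval
```

The value 2147483649 lies two above the INTEGER maximum, so it wraps to two above the minimum minus one step, that is, -2147483647.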
Operands are automatically converted to a wider type if needed. This feature is relevant for the numeric data types INTEGER, LONG, and FLOAT (widening INTEGER -> LONG -> FLOAT) and the datetime types DAY and DATETIME (DAY -> DATETIME). In comparisons, and in set and conditional operations, all operands are converted to the most general type before the operation is performed.
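The choice of the most general type can be sketched in Python. The two widening chains are taken from the text; treating any other mix of types as an error is an assumption of this sketch, and the function `common_type` is hypothetical.

```python
NUMERIC_CHAIN = ["INTEGER", "LONG", "FLOAT"]  # INTEGER -> LONG -> FLOAT
DATETIME_CHAIN = ["DAY", "DATETIME"]          # DAY -> DATETIME


def common_type(*types: str) -> str:
    """Return the most general type that all operands widen to."""
    if not types:
        raise ValueError("at least one operand type is required")
    for chain in (NUMERIC_CHAIN, DATETIME_CHAIN):
        if all(t in chain for t in types):
            # The widest type is the one furthest along the chain.
            return max(types, key=chain.index)
    if len(set(types)) == 1:
        return types[0]  # identical types need no conversion
    raise TypeError(f"no common type for {types}")
```

For instance, comparing an INTEGER operand with a LONG operand is performed on LONG values under this rule.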
An operand is any expression with a type corresponding to a valid type of a given operation. Operands can be divided into four categories:
Literals. Numeric constants, string constants, or logical constants (TRUE, FALSE, UNKNOWN - deprecated; all the keywords are case-insensitive). Can also be the NULL literal (case-insensitive).
Columns. Columns are defined by their names and represent their values. If there is a space character in the column name, the name must be enclosed in square brackets []. If the step retrieves data from multiple inputs, the column names are specified using dot notation, that is, input_name.column_name. If the step uses just one input, you can omit the dot notation.
Set. Can be used only in combination with the IN operation, in which the set represents a constant expression. A set can occur only on the right side of the IN operation.
Complex expressions.
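The widening rule described earlier in this section (INTEGER -> LONG -> FLOAT and DAY -> DATETIME) can be sketched as a lookup of the most general type among the operands. This is only an illustration in Python: the function name most_general_type is hypothetical, and the treatment of types outside the two widening chains is an assumption.

```python
NUMERIC_CHAIN = ["INTEGER", "LONG", "FLOAT"]   # widening order for numbers
DATE_CHAIN = ["DAY", "DATETIME"]               # widening order for dates


def most_general_type(types):
    """Return the type to which all operands would be converted."""
    for chain in (NUMERIC_CHAIN, DATE_CHAIN):
        if all(t in chain for t in types):
            # The widest type is the one that appears latest in the chain.
            return max(types, key=chain.index)
    if len(set(types)) == 1:
        # Assumption: identical types outside the chains need no conversion.
        return types[0]
    raise TypeError("operands have no common type: %r" % (types,))


print(most_general_type(["INTEGER", "LONG"]))    # LONG
print(most_general_type(["DAY", "DATETIME"]))    # DATETIME
```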
Handling Null Values
Operations and functions handle arguments with a NULL value according to SQL rules, with one exception for the STRING data type: a NULL string and an empty string are considered equal. As a result, NULL string arguments are handled as empty (zero-length) strings.
Example:
The following are legal comparisons that give a non-null Boolean result:
"abc" == NULL
"abc" > NULL
Respectively, they are analogous to the following comparisons:
"abc" == ""
"abc" > ""
However, in SQL, both of these expressions result in a NULL (UNKNOWN) value.
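The string exception can be modeled in Python by normalizing a null value (None) to an empty string before comparing. This is a minimal sketch of the rule described above, not the product's implementation, and the helper names are hypothetical.

```python
def normalize_string(value):
    """Treat a NULL string (None) as an empty string."""
    return "" if value is None else value


def string_equals(a, b):
    return normalize_string(a) == normalize_string(b)


def string_greater(a, b):
    return normalize_string(a) > normalize_string(b)


# Both comparisons give a non-null Boolean result.
print(string_equals("abc", None))    # False, same as "abc" == ""
print(string_greater("abc", None))   # True, same as "abc" > ""
```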
Variables
The expression can be formed as a sequence of assignment expressions followed by one resulting expression. Multiple expressions are delimited by a semicolon (;). An assignment expression has the following syntax:
variable := expression
The first occurrence of a variable on the left-hand side defines this variable and its type. A reference to a variable in an expression is valid only after its definition. Each following occurrence of a variable, including an occurrence on the left-hand side of the assignment expression, must conform to the variable type.
Example:
a := 2;
b := 4 - a;
3 * b
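To show how such a sequence is evaluated, the following Python sketch processes the assignments in order and returns the value of the final expression. It is a toy model only: the evaluate function is hypothetical, it splits naively on semicolons, and it borrows Python's own arithmetic for the right-hand sides, which is an assumption that holds for simple examples like the one above.

```python
def evaluate(expression: str):
    """Evaluate 'var := expr; ...; result_expr' and return the final value."""
    statements = [s.strip() for s in expression.split(";") if s.strip()]
    *assignments, result = statements
    variables = {}
    for statement in assignments:
        name, _, value_text = statement.partition(":=")
        name = name.strip()
        # A variable can be referenced only after it has been defined,
        # so each right-hand side sees only the variables assigned so far.
        value = eval(value_text.strip(), {"__builtins__": {}}, variables)
        if name in variables and type(value) is not type(variables[name]):
            raise TypeError("variable %s changes its type" % name)
        variables[name] = value
    return eval(result, {"__builtins__": {}}, variables)


print(evaluate("a := 2; b := 4 - a; 3 * b"))   # 6
```

In the example, a is 2, b is 4 - 2 = 2, and the resulting expression 3 * b evaluates to 6.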
Operations and Functions
In this section:
Arithmetic Operations
Logical Operations
Comparison (Relational) Operators
Set Operations
Other Operations
Date Functions
String Functions
Bitwise Functions
MinMax Functions
Aggregate Functions
Conditional Expressions
Conversion and Formatting Functions
Word Set Operation Functions
Unclassified Functions
iWay DQC provides the following operation and function categories:
Arithmetic operations
Logical operations
Comparison operations
Set operations
Other operations
Date functions
String functions
Bitwise functions
MinMax functions
Aggregate functions
Conditional expressions
Conversion and formatting functions
Word set operation functions
Caution: All operations and functions that do not have the locale parameter set or defined use the default iWay DQC locale. The step locale setting does not influence this behavior.
Arithmetic Operations
This category includes common arithmetic operations: addition, subtraction, multiplication, and division. With the exception of the / operator, which returns FLOAT, the result of an arithmetic operation applied to the type INTEGER or LONG is always INTEGER or LONG. The result is type LONG if at least one operand is type LONG.
Note: Type NUMBER stands for data types INTEGER, LONG, or FLOAT in the description of input (operand) and output (result) types.
Name: -
Usage: a - b
Description: Subtraction of numeric operands a and b.
Operand Type: NUMBER, NUMBER
Result Type: NUMBER

Name: -
Usage: -a
Description: Negation of numeric operand a. For example:
-(a*c)
Note: The unary expression operator cannot immediately follow another arithmetical operator unless parenthesized. The following expression is invalid:
a*-b
Instead use either
-b*a
or:
a*(-b)
Operand Type: NUMBER
Result Type: NUMBER
Name: /
Usage: a / b
Description: Division of numeric operands a and b.
Operand Type: NUMBER, NUMBER
Result Type: FLOAT

Name: *
Usage: a * b
Description: Multiplication of numeric operands a and b.
Operand Type: NUMBER, NUMBER
Result Type: NUMBER

Name: %
Usage: a % b
Description: Modulo, the remainder after numerical division of a by b.
Operand Type: INTEGER, INTEGER
Result Type: INTEGER
Operand Type: LONG, LONG
Result Type: LONG
Name: +
Usage: a + b
Description: Addition of numeric operands a and b, or string concatenation.
Operand Type: NUMBER, NUMBER
Result Type: NUMBER
Operand Type: STRING, STRING
Result Type: STRING

Name: div
Usage: a div b
Description: Division of integer operands without a remainder.
Operand Type: INTEGER, INTEGER
Result Type: INTEGER
Operand Type: LONG, LONG
Result Type: LONG
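The difference between the three division-related operations (/, div, and %) can be illustrated with Python equivalents. The function names are hypothetical. The text does not specify how div and % treat negative operands, so this sketch makes claims only for non-negative values.

```python
def divide(a, b) -> float:
    """Model of a / b: the result is always a floating-point number."""
    return float(a) / float(b)


def int_div(a: int, b: int) -> int:
    """Model of a div b: integer division that discards the remainder."""
    return a // b   # assumption: operands are non-negative


def modulo(a: int, b: int) -> int:
    """Model of a % b: the remainder after division of a by b."""
    return a % b    # assumption: operands are non-negative


a, b = 7, 2
print(divide(a, b))    # 3.5
print(int_div(a, b))   # 3
print(modulo(a, b))    # 1
# The quotient and remainder together reconstruct the dividend.
print(int_div(a, b) * b + modulo(a, b) == a)   # True
```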
Logical Operations
Common logical operations are AND, NOT, OR, and XOR (all keywords are case-insensitive).
Name: AND
Usage: a AND b
Description: Logical conjunction
Operand Type: BOOLEAN, BOOLEAN
Result Type: BOOLEAN

Name: NOT
Usage: NOT a
Description: Logical negation
Operand Type: BOOLEAN
Result Type: BOOLEAN

Name: OR
Usage: a OR b
Description: Logical sum
Operand Type: BOOLEAN, BOOLEAN
Result Type: BOOLEAN

Name: XOR
Usage: a XOR b
Description: Exclusive OR
Operand Type: BOOLEAN, BOOLEAN
Result Type: BOOLEAN
Comparison (Relational) Operators
Name: <
Usage: a < b
Description: Tests if the value of a is less than b.
Operand Type: Any two compatible types
Result Type: BOOLEAN
Name: <=
Usage: a <= b
Description: Tests if the value of a is less than or equal to b.
Operand Type: Any two compatible types
Result Type: BOOLEAN

Name: >=
Usage: a >= b
Description: Tests if the value of a is greater than or equal to b.
Operand Type: Any two compatible types
Result Type: BOOLEAN
Set Operations
For sets, a few basic operations are implemented. Set members are literals of types defined for columns or column names themselves.
Name: in
Usage: a in {elem[, elem]...}
Description: Tests whether operand a is a member of the specified set. As opposed to the "is in" operation, if operand a is not a member of the set and a null value is a member of the set, then the result is null.
Operand Type: Any type, set
Result Type: BOOLEAN

Name: is in
Usage: a is in {elem[, elem]...}
Description: Tests whether operand a is a member of the specified set. Always returns TRUE or FALSE.
Operand Type: Any type, set
Result Type: BOOLEAN

Name: is not in
Usage: a is not in {elem[, elem]...}
Description: Tests whether operand a is not a member of the specified set.
Operand Type: Any type, set
Result Type: BOOLEAN

Name: not in
Usage: a not in {elem[, elem]...}
Description: Tests whether operand a is not a member of the specified set. As opposed to the "is not in" operation, if operand a is not a member of the set and a null value is a member of the set, then the result is null.
Operand Type: Any type, set
Result Type: BOOLEAN
Example:
company IN {"Smith inc.", "Smith Moving inc.",
"Speedmover inc.", [candidate column], clear_column}
a IN {1, 2, 5, 10}
b IN {TRUE, FALSE}
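The difference between the null-aware operations (in, not in) and the two-valued operations (is in, is not in) can be sketched in Python, with None standing for a null value. The function names are hypothetical. Returning null from "in" when operand a itself is null follows the SQL convention and is an assumption, because the text describes only the case of a null set member.

```python
def op_in(a, members):
    """Model of 'a in {...}': may return None (null)."""
    if a is None:
        return None                      # assumption: SQL convention
    if any(m is not None and m == a for m in members):
        return True
    if any(m is None for m in members):
        return None                      # not found, but the set contains null
    return False


def op_is_in(a, members):
    """Model of 'a is in {...}': always True or False."""
    return any((m is None and a is None) or (m is not None and m == a)
               for m in members)


def op_not_in(a, members):
    result = op_in(a, members)
    return None if result is None else not result


def op_is_not_in(a, members):
    return not op_is_in(a, members)


print(op_in(3, [1, 2, None]))         # None
print(op_is_in(3, [1, 2, None]))      # False
print(op_not_in(3, [1, 2, None]))     # None
print(op_is_not_in(3, [1, 2, None]))  # True
```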
Other Operations
Name: is
Usage: a is b
Description: Tests if a is equal to b. Null values are allowed as operands. A typical use is:
a is null
Operand Type: Any two compatible types or null
Result Type: BOOLEAN

Name: is not
Usage: a is not b
Description: Tests if a is not equal to b. Null values are allowed as operands. A typical use is:
a is not null
Operand Type: Any two compatible types or null
Result Type: BOOLEAN
Date Functions
In iWay DQC, a date is represented by DAY and DATETIME types. The DAY type represents a date to the detail level of days. DATETIME represents a date to the detail level of milliseconds. The time values that are compatible with each format are described in the following table.
Date Part Name   Range                 Included in Date Type
YEAR             Any positive number   DATETIME, DAY
MONTH            1 - 12                DATETIME, DAY
DAY              1 - max.month         DATETIME, DAY
HOUR             0 - 23                DATETIME
MINUTE           0 - 59                DATETIME
SECOND           0 - 59                DATETIME
A day starts at 00:00:00 and ends at 23:59:59. If a given function requires identification of a date part as a parameter, the identifier is written in the expression in the form of a string literal, for example, "MONTH". Otherwise, the expression is evaluated as incorrect. Identifiers are case-sensitive and must be written in uppercase.
Example:
expression='dateAdd(inDate,10,"DAY")'
All the listed date parts are represented by positive integers. The date functions do not support milliseconds.
Note: Data type DATE-TYPE represents the date type DAY or DATETIME in the description of input (operand) and output (result) types.
Date Function: dateAdd(srcDate, srcValue, fieldName)
Description: Adds the specified srcValue of the type specified by fieldName (YEAR, MONTH, or DAY) to the srcDate. This function allows subtraction, so the srcValue can be negative. The return value is the result of the add (subtract) operation. If any of the operands are invalid, or if an attempt is made to add an unsupported fieldName (HOUR, MINUTE, or SECOND) to the date type DAY, then the expression reports an error.
Operand Type: DATE-TYPE, INTEGER, STRING
Result Type: DATE-TYPE

Date Function: dateDiff(startDate, endDate, fieldName)
Description: Returns the difference between endDate and startDate expressed in fieldName units. If the result exceeds the maximum range of INTEGER, then the value null is returned. If any of the parameters are invalid, the expression reports an error.
A combination of date type DAY and fieldName HOUR, MINUTE, SECOND can be used. The value of these fields is considered to be 0.
Operand Type: DATE-TYPE, DATE-TYPE, STRING
Result Type: INTEGER

Date Function: datePart(srcDate, fieldName)
Description: Returns the value of the field fieldName of srcDate. If any of the parameters are invalid, the expression reports an error. For the fields HOUR, MINUTE, and SECOND set for the date type DAY, the function returns 0.
Operand Type: DATE-TYPE, STRING
Result Type: INTEGER
Date Function: dateTrunc(srcDate, fieldName)
Description: Truncates less important parts of the srcDate up to the level specified by fieldName. Truncation changes values of the fields by the following rules: MONTH and DAY to 1, HOUR, MINUTE, and SECOND to 0.
The function may be used even for the DAY type with the fieldName HOUR, MINUTE, and SECOND. In that case, the function has no effect on the data: result and input values are the same.
If any of the parameters are invalid, the expression reports an error.
Example: For srcDate 5.5.1980 12:35:10 and fieldName HOUR, the function returns 5.5.1980 12:00:00.
Operand Type: DATE-TYPE, STRING
Result Type: DATE-TYPE

Date Function: getDate(srcExpression)
Description: Returns the date in the format defined by the specified srcExpression (type DAY or DATETIME), with the time set to zero (HH:mm:ss:sss).
Operand Type: DATE-TYPE
Result Type: DAY

Date Function: getRequestTime()
Description: Returns the time at which processing of the current request started. This is the iWay DQC application start time in batch mode, and the Web service request time in online mode.
Result Type: DATETIME

Date Function: now()
Description: Returns the current time with the type DATETIME. This function always returns the time when it is evaluated, that is, the current time.
Result Type: DATETIME

Date Function: today()
Description: Returns the current date in type DAY. This function returns the same value for all records (iWay DQC application start date), even if iWay DQC runs past midnight.
Result Type: DAY
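The truncation rules of dateTrunc and the field lookup of datePart can be sketched with Python's standard datetime module. The function names date_trunc and date_part are hypothetical stand-ins, and the sketch models only the DATETIME case described above.

```python
from datetime import datetime

# Date parts ordered from the most to the least significant.
FIELDS = ["YEAR", "MONTH", "DAY", "HOUR", "MINUTE", "SECOND"]
# Values that truncated fields are reset to (MONTH and DAY to 1, the rest to 0).
RESET = {"MONTH": 1, "DAY": 1, "HOUR": 0, "MINUTE": 0, "SECOND": 0}


def date_trunc(src_date: datetime, field_name: str) -> datetime:
    """Reset every field that is less significant than field_name."""
    if field_name not in FIELDS:   # identifiers are case-sensitive, uppercase
        raise ValueError("unknown date part: %r" % field_name)
    level = FIELDS.index(field_name)
    resets = {name.lower(): RESET[name] for name in FIELDS[level + 1:]}
    return src_date.replace(microsecond=0, **resets)


def date_part(src_date: datetime, field_name: str) -> int:
    """Return the value of one field of the date."""
    if field_name not in FIELDS:
        raise ValueError("unknown date part: %r" % field_name)
    return getattr(src_date, field_name.lower())


src = datetime(1980, 5, 5, 12, 35, 10)
print(date_trunc(src, "HOUR"))    # 1980-05-05 12:00:00
print(date_part(src, "MONTH"))    # 5
```

The first call reproduces the dateTrunc example from the table: 5.5.1980 12:35:10 truncated to HOUR gives 5.5.1980 12:00:00.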
String Functions
The following are common functions used for string processing.
String Function: capitalize(srcStr)
Description: Transforms all words in the string srcStr in the following manner: the first character of each word is converted to uppercase and all following characters to lowercase. A word consists of alphabetic characters (letters). All other characters are considered separators.
Operand Type: STRING
Result Type: STRING

String Function: capitalizeWithException(srcStr, exc[, exc]...)
Description: Transforms all words in the string srcStr (with the exception of the words given as the parameters exc) in the following manner: the first character of each word is converted to uppercase and all following characters to lowercase. A word consists of alphabetic characters (letters). All other characters are considered separators.
Operand Type: STRING, STRING[, STRING]...
Result Type: STRING

String Function: containsWord(srcStr, srcWord)
Description: Searches for the occurrence of the word srcWord in the string srcStr. A word is a sequence of letters with no whitespace. Words in the string are defined as sequences of letters separated by a space (' '). Beginning, ending, and multiple spaces are ignored. This function is case-sensitive.
Operand Type: STRING, STRING
Result Type: BOOLEAN

String Function: countNonAsciiLetters(srcStr)
Description: Returns the number of characters in the string srcStr that include diacritical marks.
Operand Type: STRING
Result Type: INTEGER

String Function: cpConvert(str, actualCp, correctCp)
Description: Takes as input a string that was wrongly read using the actualCp charset and transforms it into the correct correctCp charset. An example is a file that is all in windows-1250 charset except for one column, a, which is in the latin2 charset. This file will be read using the windows-1250 charset. For the column named a, the following expression can be used:
cpConvert(a, 'windows-1250', 'latin2')
Operand Type: STRING, STRING, STRING
Result Type: STRING
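The repair performed by cpConvert can be reproduced in Python by encoding the misread text back to bytes with the charset that was actually used for reading, and then decoding those bytes with the correct charset. This is a sketch of the idea only: the function name cp_convert is hypothetical, and cp1250 and iso8859_2 are the Python codec names for windows-1250 and latin2.

```python
def cp_convert(text: str, actual_cp: str, correct_cp: str) -> str:
    """Undo a wrong decoding: recover the raw bytes, then decode them correctly."""
    raw_bytes = text.encode(actual_cp)    # bytes as they were stored in the file
    return raw_bytes.decode(correct_cp)   # the same bytes read with the right charset


original = "Žlutý kůň"
# Simulate the problem: latin2 data read with the windows-1250 charset.
misread = original.encode("iso8859_2").decode("cp1250")
print(misread != original)                                    # True
print(cp_convert(misread, "cp1250", "iso8859_2") == original)  # True
```

The round trip is lossless here because every byte used by the latin2 text is also defined in windows-1250.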
String Function: distinct(srcStr[, srcSeparator[, srcItem[, srcItem]...]])
Description: Returns a string that contains concatenated parts of the original string srcStr. Repeated parts, or parts not listed as srcItem, are omitted. The parameter srcSeparator specifies the separator of the string parts. If srcSeparator is missing or set to NULL, the space character is the separator. The listing of parameters in srcItem restricts the output string parts to the listed items only. If the string srcStr is NULL or empty, the function returns NULL.
Operand Type: STRING
Result Type: STRING
Operand Type: STRING, STRING
Result Type: STRING
Operand Type: STRING, STRING, STRING[, STRING]...
Result Type: STRING
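One possible reading of the distinct function is sketched below in Python. Several details are assumptions that the description does not confirm: parts are kept in the order of their first occurrence, empty parts are skipped, and the result is joined with the same separator that was used for splitting.

```python
def distinct(src_str, src_separator=None, *src_items):
    """Return the unique parts of src_str, optionally restricted to src_items."""
    if not src_str:
        return None                      # NULL or empty input gives NULL
    separator = src_separator or " "     # missing or NULL separator means a space
    allowed = set(src_items)
    kept = []
    for part in src_str.split(separator):
        if not part or part in kept:
            continue                     # skip empty and repeated parts
        if allowed and part not in allowed:
            continue                     # skip parts that are not listed
        kept.append(part)
    return separator.join(kept)


print(distinct("a b a c"))                 # a b c
print(distinct("x,y,x,z", ",", "x", "z"))  # x,z
```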
String Function: doubleMetaphone(srcStr)
Description: Encodes srcStr to a double metaphone primary string. It removes accents from the srcStr before evaluating the double metaphone value. See the Metaphone article on Wikipedia, at http://www.wikipedia.org.
Operand Type: STRING
Result Type: STRING

String Function: doubleMetaphone(srcStr, isAlternate)
Description: Encodes srcStr to a double metaphone secondary string if the parameter isAlternate is true. Otherwise, it returns the primary string. It removes accents from the srcStr before evaluating the double metaphone value. See the Metaphone article on Wikipedia, at http://www.wikipedia.org.
Operand Type: STRING, BOOLEAN
Result Type: STRING
String Function: editDistance(srcStr1, srcStr2[, caseInsensitive])
Description: Returns the edit distance between strings srcStr1 and srcStr2. The parameter caseInsensitive determines whether case-sensitivity should be considered or not. By default, the function is case-insensitive. The difference between Levenshtein and Edit distance lies in the definition of distance of two switched adjacent characters. Levenshtein considers the switch as two changes, whereas Edit distance considers the switch to be one change. If both of the strings are NULL, then the result is 0. If just one of the strings is NULL, then the result is the length of the other string.
Operand Type: STRING, STRING
Result Type: INTEGER
Operand Type: STRING, STRING, BOOLEAN
Result Type: INTEGER
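The contrast between the two distances can be seen in a short Python sketch. It uses the optimal string alignment algorithm, in which a switch of two adjacent characters costs one change. Whether iWay DQC uses exactly this variant is an assumption, and the function names and the transpositions flag are hypothetical.

```python
def edit_distance(src1, src2, case_insensitive=True, transpositions=True):
    """Edit distance in which an adjacent switch counts as one change."""
    a = src1 or ""   # a NULL string behaves like an empty string
    b = src2 or ""
    if case_insensitive:
        a, b = a.lower(), b.lower()
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i              # delete all characters of a
    for j in range(cols):
        d[0][j] = j              # insert all characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent switch
    return d[-1][-1]


def levenshtein(src1, src2, case_insensitive=True):
    """Plain Levenshtein distance: a switch counts as two changes."""
    return edit_distance(src1, src2, case_insensitive, transpositions=False)


print(edit_distance("abcd", "abdc"))   # 1
print(levenshtein("abcd", "abdc"))     # 2
```

Treating a NULL string as empty gives the documented results for NULL inputs: 0 when both strings are NULL, and the length of the other string when only one is NULL.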
Operand Type:
STRING, INTEGER, BOOLEAN
Result Type:
STRING
Removes spaces between separate characters(words of length 1) in string srcStr . The pa