Service Oriented Architecture

white paper

ascential.com

TABLE OF CONTENTS

Enabling Authoritative Source with Service-Oriented Architecture

What is MDM?

Good Data is a Key to Attracting and Retaining Customers

Companies Lack Good Customer Data Today

A Reference Architecture for Master Customer Data Management

A Proven Implementation Process

Profile Source Systems

Cleanse and Standardize Data

Match and Load the Master Cross-Reference

Load Analytical Data into the Data Warehouse

Define a Data Model for the Services

Create your Data Service Requests

Create Update Services

Joint Solution by IBM and BEA Systems

Companies Must Manage Master Customer Data as an Asset

Conclusion

ENABLING AUTHORITATIVE SOURCE WITH SERVICE-ORIENTED ARCHITECTURE

A JOINT WHITE PAPER BY BEA SYSTEMS AND IBM

Customers are the lifeblood of your business. In order to attract and retain them, your

customer-facing processes must be as efficient and effective as possible. Complete

visibility to all current information on the customer will have an enormous impact on

these processes. In addition, every customer interaction should refine the bigger picture

understanding of the customer. But, this information is often locked away in multiple

systems, making it difficult to obtain the complete, accurate, and current view of the

customer required. Traditional solutions to this problem often involve replicating all data in

one or multiple places. But replication is expensive and creates latency related problems,

as well as errors and inconsistency in the data.

This paper describes a Service Oriented Architecture for the federated management

of master customer data. The architecture creates a set of shared data services that

allow for consistent accessing and updating of data across multiple systems, providing

complete, accurate, and current customer data to any process or application at any

time. This approach involves creating cross-reference keys linking multiple systems and

creating services against a logical representation of the data in multiple systems. These

data services on a logical model, combined with “clean” cross-reference keys, create a

solution that can be up to 10X cheaper to build than one based on replicating the data.

WHAT IS MDM?

Every business has elements of core business reference data which are used in multiple

types of applications and business processes. Often this is the most important data

that the business has, since it represents the business’ understanding of its customers,

suppliers, products, inventory, bills of materials, or parts. This type of data is called Master

Data, and it is one of the most important assets that a company owns, although it is often

not treated that way.

Because of its importance to the business, this data is often stored in multiple systems

for different purposes. For example, customer data is captured and stored during the

sales cycle in many business applications, and later it is also stored in support systems to

provide ongoing customer support after the sale. The problem with this is that keeping it

synchronized and aggregated is very difficult. In our example, it is likely that after the sale

is made, the support system will have better information on the customer than the sales

system. If Sales now wants to sell a new product to that customer, it has no way to benefit

from the new information in the support system since this data is kept in separate stove-

piped systems, with little or no visibility across them.

Master data management (MDM) focuses on creating a single logical representation of

this data that can be shared across multiple applications and processes. Rather than

creating separate copies of master data with no way to tie them together, MDM provides

linkages between separate instances of this data, allowing businesses to maintain a

consistent and complete view of it at all times. In effect, MDM allows enterprises to

leverage this data as a consistent and important corporate asset.

GOOD DATA IS A KEY TO ATTRACTING AND RETAINING CUSTOMERS

For many businesses, customer data is the most important information in the business.

This information is vital to understanding customer buying patterns, identifying up-sell

opportunities, providing a higher level of customer service, tailoring and optimizing

marketing activities, and predicting and addressing business triggers like renewals, recalls,

and upgrades. Today many processes and applications have a direct impact on

customers, and most of these are starved for information that could and should have

an impact on their operations and decisions. For example, knowing that a customer has

recently made a large purchase may impact the level of customer service. Unfortunately,

this data is simply not available in an aggregated form in most businesses.

“By 2007/08, 30% of the Global 2000 will have created a comprehensive framework for the

management of reference data.” – META Group

A complete and accurate view of customer data, sometimes referred to as a single or

“360° view,” provides enormous business benefits. These benefits include a reduction in

sales and marketing costs, improved customer satisfaction, higher service renewal rates,

and optimized allocation of resources.

The basic requirements to provide the optimal value from this data are the following:

• Complete – It must represent all known information about customers across all systems

• Accurate – It must be cleansed, de-duplicated, and verified against established business rules

• Current – It must be up-to-date, reflecting the latest relevant business events

• Accessible – It must be easily available to enterprise processes whenever they need it

It is easy to point out everyday examples of the customer service frustration that can

be caused by a lack of integrated customer data. For example, anyone who has had to

repeat their personal information over and over again on the telephone to various support

people in order to get a simple question answered knows this frustration. However, the

true business impact goes well beyond this.

According to the META Group, “Customer data integration can provide real ROI benefits

by improving the underlying quality and real-time accessibility of synchronized customer

data.”

CASE STUDY:

A Global 1000 manufacturing company needed a better understanding of their customers.

While their existing EAI infrastructure and CRM systems were excellent at processing

customer-based transactions, they were unable to consolidate a complete customer

record from their multiple source systems. At the same time, their databases contained

many duplicate entries for the same customer with no way of linking records together

across systems, or even within the same system.

As a result, they were spending too much money on their marketing programs without

seeing the level of results commensurate with the level of investment.

By implementing an MDM solution, they were able to reduce the average cost per

qualified marketing lead by 80%. In addition, they now have a single source for complete

and accurate customer data across all systems which has a positive impact on many

other processes, as well.

COMPANIES LACK GOOD CUSTOMER DATA TODAY

The problem of distributed and duplicated customer data is not a new one. Companies

have used a variety of mechanisms to try to deal with this issue; however, none of

them has completely solved the problem of providing complete, accurate, and current

data that is easily plugged into customer-centric processes. In fact, they are all likely

participants in a customer-oriented master data management solution.

CIF — Traditionally, many companies have used Customer Information Files (CIFs)

to centralize customer data. CIFs are usually created by extracting data from source

systems in batch, and loading customer data into a common location. CIFs fail to

meet the requirements for current, accurate, and complete data, since they are loaded

infrequently and do little to correct data or link records together across systems.

CRM — Probably the most common misconception is that Customer Relationship

Management (CRM) systems solve this problem. These systems allow management of

customer-centric processes like sales, marketing, and customer service, but data accuracy

is often a problem, and they are unable to provide a complete, cross-system view of data.

Data Warehouse — Data warehousing has also been commonly used as a mechanism

to consolidate customer data, as warehouses are excellent sources of complete and accurate

data. However, they fail to meet the requirement of current data since they are incapable

of providing up-to-date data and immediate recognition of events. In addition, warehouses

are usually designed around dimensional schemas that are optimized for multi-conditional

queries, but not well-suited to rapid cross-referencing of core data elements.

ODS — Operational data stores (ODS) are another mechanism commonly thought to

provide customer data consolidation. However, ODS implementations fail to meet the

completeness requirement, since they simply aggregate raw transactions, and do not

provide context around how individual transactions or their underlying data elements are

related, making it impossible to obtain a complete view of any given customer.

More recently, as companies have begun to focus specifically on MDM, other alternatives

have emerged. Many of the enterprise application vendors have created master data

components that augment their base offering. These products can be effective, but they

do not automatically accommodate data completeness or accuracy, so additional software

is required. In addition, these products can be difficult to plug into enterprise processes,

focusing instead on providing an application to access the data. However, achieving the

synchronized view in this way is virtually impossible.

All of these and other similar solutions attempt to create a monolithic database for

customer data. This approach has fundamental limitations as replication involves bulk

movement of data and creates latency, synchronization and inconsistency issues.

A REFERENCE ARCHITECTURE FOR MASTER CUSTOMER DATA MANAGEMENT

The following diagram describes the service-oriented architecture for federated master

data management. In this architecture:

1. A matching service (such as IBM® WebSphere® QualityStage™) is used to create cross-reference

keys across multiple systems. While creating the cross-reference keys, best practices include

data profiling, data cleansing, de-duplication, and standardizing data formats.

2. An ETL tool (such as IBM® WebSphere® DataStage®) is used to perform the initial load of the

cross reference database, and to create “historical” aggregates, which can be stored in a

separate data warehouse.

3. An EII tool (such as BEA Liquid Data) is used to create composite data services that span

all sources, including the cross-reference database and the data warehouse. Note that the

cross-reference keys created in step 1 are essential to link data in different systems. All data access

and update logic, as well as policies like security and caching, are defined in this layer, which

allows for consistent definition of these policies across all systems.

4. Data cleansing and matching functions are also exposed as services by the EII tool. This

means applications can link “Phil” on the phone to “Phillip” in the customer database.

Thus, all of this data is available to calling applications (like dashboards, workflows, and

portals) as a service from one logical source. This dramatically simplifies application

development and solves the traditional problems of data latency and inconsistency.

A PROVEN IMPLEMENTATION PROCESS

Implementing this service oriented architecture is not complex and can be up to 10X

faster than implementing data replication and related synchronization processes.

Following is a suggested approach that we have seen work for mutual customers.

PROFILE SOURCE SYSTEMS

The first step to an MDM strategy is to understand the data in source systems. This

requires the ability to profile the data in source systems to understand its content,

structure, and relationships. Automated profiling tools accelerate this process, allowing

analysis of column values and structures, and uncovering data anomalies, primary and

foreign keys, relationships, and table normalization opportunities. Profiling also helps to

uncover business rules within the data that will later be used to provide ongoing validation

of data quality.

Profiling is a critical activity for accelerating MDM efforts. When choosing a profiling tool,

it is important to select tools that can access and understand the various data sources

you are trying to reach. In addition, the profiling tool needs to automate the process of

understanding data and assist in building a common metadata understanding across

systems. Other considerations for profiling tools include the ability to deal with large

volumes of data. Most tools will suggest profiling on a sample set of data, but often these

approaches will miss important data trends that do not show up in the sample.
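As a rough illustration of what column profiling produces, the following Python sketch summarizes a single column: null count, distinct count, whether it could serve as a key, and the "shapes" of its values. It is not how any particular profiling product works. The function name `profile_column`, the pattern encoding, and the sample phone numbers are assumptions made for this example. It scans every value rather than a sample, in line with the caution above.

```python
import re
from collections import Counter


def profile_column(values):
    """Summarize one column: nulls, distinct values, key candidacy, value shapes."""
    non_null = [v for v in values if v not in (None, "")]
    # Reduce each value to a shape: every digit becomes 9, every letter becomes A.
    patterns = Counter(
        re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", str(v))) for v in non_null
    )
    distinct = len(set(non_null))
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": distinct,
        # A column with no nulls and no repeats is a candidate primary key.
        "is_candidate_key": len(non_null) == len(values) == distinct,
        "patterns": patterns.most_common(),
    }


# Hypothetical phone column with mixed formats, a null, and a duplicate.
phones = ["617-555-0101", "617-555-0102", "6175550103", None, "617-555-0101"]
report = profile_column(phones)
print(report)
```

A minority pattern such as the unformatted number here is the kind of anomaly that later becomes a cleansing rule.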

CLEANSE AND STANDARDIZE DATA

The second step is to clean and standardize the data records. Cleansing and

standardizing involves assigning categories to individual elements within the data and

applying rules to these data categories based on their business content. For example,

the text “100 St. Virginia St.” would be parsed into four fields of data. A lexical analysis

of those fields would determine that the “100” is the street number, the “Virginia” is the

street name, and the second “St.” is the street type. The first “St.” could be mistaken for

a repeated street type, but a contextual analysis would show that it is actually part of the

street name: “St. Virginia”. Once the data is categorized, it can first be cleansed, removing

any unexpected characters or flagging anything that can’t be categorized. Next, it is

standardized to ensure the same abbreviations and standards are applied consistently

to all elements in the same category. In this example, it could be determined that “Street”

will always be abbreviated as “St”. In this case, the period after the second “St” would be

removed. Address verification could then be run to verify that this address is an actual

address according to local postal records.
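The address example above can be sketched in a few lines of Python. This is a deliberately small illustration, not the rule set of any cleansing product. The `STREET_TYPES` table and the single contextual rule (a street-type token counts as the street type only when it is the last token) are assumptions chosen to reproduce the example. Address verification against postal records is not shown.

```python
# Hypothetical standardization table: every spelling maps to one abbreviation.
STREET_TYPES = {"st": "St", "street": "St", "ave": "Ave", "avenue": "Ave"}


def standardize_address(text):
    """Categorize the tokens of a street address and standardize the street type."""
    tokens = text.split()
    parsed = {"number": None, "name": [], "type": None}
    for i, token in enumerate(tokens):
        key = token.rstrip(".").lower()
        is_last = i == len(tokens) - 1
        if parsed["number"] is None and token.isdigit():
            parsed["number"] = token                # lexical rule: first numeric token
        elif key in STREET_TYPES and is_last:
            parsed["type"] = STREET_TYPES[key]      # trailing type, standardized
        else:
            parsed["name"].append(token)            # contextual rule: part of the name
    parsed["name"] = " ".join(parsed["name"])
    return parsed


result = standardize_address("100 St. Virginia St.")
print(result)
```

Under these assumptions the first “St.” stays in the street name and only the trailing one is treated as the street type and loses its period.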

This ensures that the data is as clean as possible, establishes a foundation for ongoing

data cleansing, and lays the groundwork for matching and record-linkage. It’s important to

select a cleansing tool that is flexible enough to handle data other than names and addresses, such

as product and material descriptions. Some tools only work with names and addresses.

Cleansing tools should be able to validate and certify location data on a global basis, so

Unicode support and a global reference file for location data are critical to success.

While cleansing the data, you may also need to create a temporary copy in a staging

database. This staging database can be used to build cleansing, matching, and

cross-reference creation logic, so that heavy interactions will not impact source systems.

MATCH AND LOAD THE MASTER CROSS-REFERENCE

The next step is to create a master cross-reference database. This entails applying

matching and survivorship rules to the data in all pertinent source systems to create a

database that stores the key structures of the various systems involved in each matched

record.

At its most basic level, the master cross reference simply defines the key structure

relationships between systems for any particular customer. This cross-reference must

also contain enough information on the customer to identify a positive match when new

inbound data is received. This may just be name and address information, but it may also

include other information, such as hierarchical information that describes the relationship

of this customer to other customer entities. For example, an employer or parent may be

used to help identify an individual.
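One way to picture a cross-reference entry is as a small record that holds the source-system keys plus just enough identifying data to match new inbound records. The Python sketch below is an assumption for illustration only. The class name, the field names, and the key formats are hypothetical and do not describe the schema used by any product named in this paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CrossReferenceEntry:
    """One matched customer: identifying data plus keys in each source system."""
    master_id: str
    name: str
    address: str
    parent_master_id: Optional[str] = None      # hierarchy, e.g. employer or parent
    source_keys: Dict[str, List[str]] = field(default_factory=dict)

    def link(self, system: str, key: str) -> None:
        """Record that `key` in `system` refers to this customer."""
        keys = self.source_keys.setdefault(system, [])
        if key not in keys:
            keys.append(key)


entry = CrossReferenceEntry("M-0001", "Phillip Jones", "100 St. Virginia St")
entry.link("sales", "S-1842")
entry.link("support", "T-77")
entry.link("support", "T-91")   # a duplicate record within the same system
entry.link("support", "T-77")   # linking twice has no effect
print(entry.source_keys)
```

Note that one system can hold several keys for the same customer, which is how duplicates within a single system are tied together.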

Creating this cross-reference is one of the most challenging aspects of customer MDM. It

requires both a strong understanding of the source systems and a defined set of complex

matching and survivorship rules. The engine used to load the data must be

capable of working through the large amounts of data in all the source systems. It also

must be capable of maintaining a metadata lineage of the sources and processing of the

data.

Matching and linking records between systems involves identifying common elements

across systems, and determining how data will be matched together. It also involves

determining a precedence order for how data elements will be selected when they exist in

multiple systems. Most matching products employ a deterministic matching algorithm to

determine when records match. This mechanism looks for matches across multiple data

sets, or among multiple records within a single data set, using full agreement across a set

of common variables (e.g. name, phone number, birth date). Some matching products

employ probabilistic matching, which also considers the frequency of data values within

the database when determining a match (effectively giving less common entries in the

database a higher weighting – two John Smiths are less likely to match than two Zeke

Durgans). Probabilistic matching is preferable in cases where a reliable and accurate

identifying field is not present in all records of data, as it has been shown to produce

higher match rates and lower chances of false positives in these cases.
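The difference between the two approaches can be shown with a toy Python example. The deterministic function requires full agreement on every field. The probabilistic function is a heavily simplified, frequency-based score in which agreement on a rare value earns more weight than agreement on a common one. The weighting formula, the field list, and the sample records are assumptions for illustration. Production matching engines use considerably richer models.

```python
import math
from collections import Counter

# Hypothetical sample data: a common name and a rare one.
records = [
    {"name": "John Smith", "city": "Boston"},
    {"name": "John Smith", "city": "Boston"},
    {"name": "John Smith", "city": "Austin"},
    {"name": "John Smith", "city": "Boston"},
    {"name": "Zeke Durgan", "city": "Boston"},
    {"name": "Zeke Durgan", "city": "Reno"},
]
FIELDS = ("name", "city")


def deterministic_match(a, b):
    """Match only when every field agrees exactly."""
    return all(a[f] == b[f] for f in FIELDS)


# How often each value occurs in each field across the whole data set.
freq = {f: Counter(r[f] for r in records) for f in FIELDS}


def agreement_weight(field_name, value):
    """Rarer values earn more weight: log2(1 / relative frequency)."""
    return math.log2(len(records) / freq[field_name][value])


def probabilistic_score(a, b):
    """Sum the weights of the fields on which the two records agree."""
    return sum(agreement_weight(f, a[f]) for f in FIELDS if a[f] == b[f])


# Both pairs agree on name only, but the rarer name scores higher.
smith = probabilistic_score(records[0], records[2])
durgan = probabilistic_score(records[4], records[5])
print(round(smith, 3), round(durgan, 3))
```

In practice the score would be compared against thresholds to separate matches, non-matches, and candidates for manual review.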

Once the cross-reference database is in place, an ongoing mechanism must be created

to ensure that new records do not dramatically degrade the quality of the initial load

by creating duplicates and unmatched entries. This is accomplished by packaging as

services the same matching rules used to create the cross-reference database. This

enables the determination of whether or not new customer data actually already exists

in any system. If a match is determined, the complete record can be assembled and

returned. If a match is not determined, the customer is a new record, and can be entered

into the systems. The matching logic can be packaged as a service that can be easily

exposed and reused by other applications and environments. This ensures that the logic is

applied consistently, and that there is only one point of maintenance moving forward.
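The match-or-create behavior described above might look like the following Python sketch. It assumes a very simple exact match on a normalized key. The class `MatchingService`, the nickname table, and the identifier format are hypothetical, and a real service would call the same matching rules used for the initial load.

```python
import itertools

# Hypothetical nickname table used during normalization.
NICKNAMES = {"phil": "phillip", "bob": "robert"}


class MatchingService:
    """Single entry point that every application calls before creating a customer."""

    def __init__(self):
        self._index = {}                    # normalized match key -> master id
        self._ids = itertools.count(1)

    @staticmethod
    def _match_key(name, postal_code):
        first, _, last = name.strip().lower().partition(" ")
        first = NICKNAMES.get(first, first)
        return (first, last, postal_code.replace(" ", ""))

    def resolve(self, name, postal_code):
        """Return (master_id, is_new) for an inbound record."""
        key = self._match_key(name, postal_code)
        if key in self._index:
            return self._index[key], False  # existing customer: reuse the master id
        master_id = f"M-{next(self._ids):04d}"
        self._index[key] = master_id        # new customer: register it
        return master_id, True


service = MatchingService()
first = service.resolve("Phillip Jones", "01581")
second = service.resolve("Phil Jones", "01581")
third = service.resolve("Zeke Durgan", "01581")
print(first, second, third)
```

Because every caller goes through `resolve`, the matching rules live in one place and a second application entering “Phil Jones” does not create a duplicate.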

When choosing a data quality product, the ability to create reusable services from

matching logic should be considered in order to meet this requirement. In addition, the

product must be able to handle high real-time processing volumes, and provide high

availability to ensure that outages will not occur during critical operating hours.

LOAD ANALYTICAL DATA INTO THE DATA WAREHOUSE

Concurrent with loading the cross reference database, many customers also load the related

analytical data into the relevant Data Warehouse for ongoing historical and trend analysis.

Incremental updates to the cross-reference database and the data warehouse are a regular

part of the production schedule. Unlike the cross-reference database, the data warehouse

may receive full record data from the source systems, which may be interesting from an

analytical perspective: for example, a customer’s purchasing habits over the past 90 days.

Loading the data warehouse involves transforming the data into a structure that is

optimized for analysis. Many customers choose a star schema or snowflake schema

for this purpose. This requires the different dimensions of the data to be split out into

separate tables. Aggregates and calculations are also often applied to the data to provide

additional analytical information. These calculations are performed as the data is loaded

into the data warehouse.
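A minimal star-schema load can be sketched with Python's built-in `sqlite3` module. The table names, columns, sample rows, and the fixed reporting date are assumptions for illustration. For brevity, the 90-day aggregate is computed with a query after the load rather than materialized during it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY,
        master_id    TEXT UNIQUE
    );
    CREATE TABLE fact_purchase (
        customer_key  INTEGER REFERENCES dim_customer,
        purchase_date TEXT,
        amount        REAL
    );
""")

# Hypothetical rows extracted from source systems, keyed by master id.
source_rows = [
    ("M-0001", "2004-11-01", 300.0),
    ("M-0001", "2005-01-10", 120.0),
    ("M-0001", "2005-03-02", 80.0),
    ("M-0002", "2005-03-05", 45.5),
]

for master_id, purchase_date, amount in source_rows:
    # Split the customer dimension out into its own table, once per customer.
    conn.execute(
        "INSERT OR IGNORE INTO dim_customer (master_id) VALUES (?)", (master_id,)
    )
    (customer_key,) = conn.execute(
        "SELECT customer_key FROM dim_customer WHERE master_id = ?", (master_id,)
    ).fetchone()
    conn.execute(
        "INSERT INTO fact_purchase VALUES (?, ?, ?)",
        (customer_key, purchase_date, amount),
    )

# Spend per customer over the 90 days before a fixed reporting date.
totals = dict(conn.execute("""
    SELECT d.master_id, SUM(f.amount)
    FROM fact_purchase f JOIN dim_customer d USING (customer_key)
    WHERE f.purchase_date >= date('2005-03-31', '-90 days')
    GROUP BY d.master_id
"""))
print(totals)
```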

When choosing a data integration product, consider its ability to load very large volumes of data

within very short processing windows and to trickle-feed updates as required.

DEFINE A DATA MODEL FOR THE SERVICES

With the cross-reference database and matching services in place, the next step is to

develop data services for your applications. Liquid Data uses a model-based approach

to develop data services. A model helps you to create and maintain data services in an

organized fashion. In this approach, all the complexity of data access, such as transforms,

data integration, validation rules, caching, and security, is hidden behind the data model,

which creates breakthrough productivity for application development.

The data model in Liquid Data is defined based on the application’s data requirements.

The data model is mapped to the underlying physical sources. The data model can

encompass the underlying operational data sources, analytical data sources, and other

XML and non-relational sources. It uses the cross-reference information to link the various

sources. The data model can be used to get data from single or multiple sources. Hence

all the complexity of accessing or updating the data is hidden from the developers.
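To make the idea of a logical model over physical sources concrete, here is a plain-Python sketch of a composite read. It is not Liquid Data syntax and does not use any product API. The in-memory dictionaries stand in for a sales system, a support system, and the cross-reference database, and all names and values are hypothetical.

```python
# Stand-ins for two physical sources and the cross-reference database.
SALES = {"S-1842": {"name": "Phillip Jones", "last_order": "2005-03-02"}}
SUPPORT = {"T-77": {"open_tickets": 1}, "T-91": {"open_tickets": 0}}
CROSS_REFERENCE = {"M-0001": {"sales": ["S-1842"], "support": ["T-77", "T-91"]}}


def get_customer(master_id):
    """Assemble one logical customer record from every linked source record."""
    keys = CROSS_REFERENCE[master_id]
    sales = [SALES[k] for k in keys.get("sales", [])]
    support = [SUPPORT[k] for k in keys.get("support", [])]
    return {
        "master_id": master_id,
        "name": sales[0]["name"] if sales else None,
        "last_order": max((s["last_order"] for s in sales), default=None),
        "open_tickets": sum(t["open_tickets"] for t in support),
    }


view = get_customer("M-0001")
print(view)
```

The caller asks for a customer by master identifier and never sees the source keys, which is the sense in which the model hides the underlying systems.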

The mapping in the data model can be simple or complex. It can map to physical data

sources and also to services such as IBM’s matching engine. For example, in order to

get an accurate response when identification data is either unreliable or not provided,

the mapping in the data model is defined to call out to the matching service to obtain a

definitive match, or a list of probable candidates. Similarly, the data model mapping may

also call out to a service to get the survivorship information for the data model that has

multiple similar potential sources.

CREATE YOUR DATA SERVICE REQUESTS

With the logical data model in place, the application developer can write data services

across the virtual data source. The logical data model hides all the complexity of different

types of sources, different APIs, matching services, survivorship rules, and validation rules

from the application developer. The application developer defines the services against the

one virtual data source, and does not need to understand the source data structures or

how they relate to that view.

The data services can be invoked by multiple types of applications. For Java applications,

an XML/SDO-oriented approach generally works better, while reporting applications

may require JDBC/SQL access to the data. Finally, the queries can also be saved as

WSDLs for SOA-centric environments.

Once you understand the type of services your applications will use, you can configure

caching strategies, security policies and validation rules on the logical data model itself.

Caching allows the data to be held in memory, so that subsequent requests for the same

data do not impact source systems. In most cases, the size and refresh rate of the cache

are configurable. Security allows you to control who can get access to what type of data.

Generally, you will need the ability to specify security by data source and by user. In some cases

of sensitive or financial information, you may implement query-level or field-level security

policies. Validation rules on the logical data model allow you to specify valid updates and

hence make sure only “good” data goes back into your source systems.
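The caching behavior can be illustrated with a small time-to-live cache in Python. This is a generic sketch, not the caching mechanism of any product. The class name `TTLCache`, the 60-second refresh interval, and the injectable clock (used here so the example runs without waiting) are assumptions.

```python
import time


class TTLCache:
    """Hold loaded values in memory and refresh them after `ttl_seconds`."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}      # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]                       # still fresh: no source access
        value = self._loader(key)               # missing or stale: go to the source
        self._entries[key] = (now + self._ttl, value)
        return value


calls = []                                      # records every trip to the source


def load_from_source(master_id):
    calls.append(master_id)
    return {"master_id": master_id}


fake_now = [0.0]                                # controllable clock for the demo
cache = TTLCache(load_from_source, ttl_seconds=60, clock=lambda: fake_now[0])

cache.get("M-0001")     # first request loads from the source
cache.get("M-0001")     # second request is served from memory
fake_now[0] = 61.0
cache.get("M-0001")     # entry has expired, so the source is read again
print(len(calls))
```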

When creating the data services, the key issues are performance tuning and debugging.

Understanding the performance characteristics of a service request that spans multiple

sources and optimizing its execution path requires rich tooling.

CREATE UPDATE SERVICES

When any event occurs that creates or updates customer information, these changes

need to be reflected in source systems and in the cross-reference database. The

logic for updating this information across systems can be very complex. It involves an

understanding of the important data elements across systems, along with the mapping

rules. The design of these processes can leverage metadata and business rules

discovered during profiling to jump start the development process. In addition, these

update processes will likely reuse the existing matching services to ensure that they are

not creating duplicate records.

These update processes can be published as services that are callable from any other

process or application. This ensures that these common business rules will be shared

from project to project rather than re-created. The result is a higher level of consistency

across all processes. Like the query services, these processes need to be secured, to

ensure that only entitled resources can call them.
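An update service along these lines can be sketched in Python as a single function that validates a change and then applies it to every linked source record using per-system mapping rules. The field mappings, the sample records, and the email validation rule are hypothetical. Security checks and transactional rollback are omitted for brevity.

```python
# Hypothetical mapping rules: logical field name -> column name in each system.
FIELD_MAP = {
    "sales": {"email": "email_addr"},
    "support": {"email": "contact_email"},
}
sales_db = {"S-1842": {"email_addr": "old@example.com"}}
support_db = {"T-77": {"contact_email": "old@example.com"}}
SOURCES = {"sales": sales_db, "support": support_db}
CROSS_REFERENCE = {"M-0001": {"sales": ["S-1842"], "support": ["T-77"]}}


def update_customer(master_id, changes):
    """Validate `changes`, then apply them to every linked source record."""
    # Validation rule: reject bad data before it reaches any source system.
    if "email" in changes and "@" not in changes["email"]:
        raise ValueError("invalid email address")
    touched = []
    for system, keys in CROSS_REFERENCE[master_id].items():
        for key in keys:
            for logical_field, value in changes.items():
                column = FIELD_MAP[system].get(logical_field)
                if column is not None:
                    SOURCES[system][key][column] = value
                    touched.append((system, key))
    return touched


touched = update_customer("M-0001", {"email": "new@example.com"})
print(touched)
```

Publishing one function like this as a service is what keeps the mapping and validation rules from being re-created in each project.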

JOINT SOLUTION BY IBM AND BEA SYSTEMS

BEA and IBM are working together on a service-oriented architecture for federated Master

Customer Data Management.

In this joint solution, shown in the picture below, IBM® WebSphere® ProfileStage™ is

used to profile the data sources, IBM® WebSphere® QualityStage™ is used to cleanse,

standardize, and match the data, and WebSphere DataStage is used to create and load

the cross-reference database and the data warehouse.

BEA Liquid Data is used to create a logical model spanning all underlying

data sources, and define composite data services for the applications. All the matching

logic and cross-reference keys generated in the IBM products can be exposed in BEA

Liquid Data.

The combined approach gives you a service-oriented approach to federated master

customer data management.

COMPANIES MUST MANAGE MASTER CUSTOMER DATA AS AN ASSET

Master data management initiatives are significant undertakings for most companies

because so much investment has gone into creating and maintaining separate instances

of reference customer data, and so many processes are linked to this data. The only

way to meet the challenges of providing complete, accurate, and current data that is

accessible to enterprise processes is to take a comprehensive approach to managing

master data.

In this paper, we have described a comprehensive service-oriented architecture and

approach that we see as applicable to most Fortune 500 companies. The advantages of this

approach are:

• Our initial customer successes indicate that this approach is up to 10X cheaper than

approaches based on replicating the data. Replicating the data is expensive due to the cost of data

migration and synchronization. Further, this approach reduces the fundamental limitations of

replication, such as latency and inconsistency.

• Ongoing data quality issues are resolved, as updates are always applied consistently

across multiple sources.

• The shared services-oriented approach allows for reuse in the enterprise. It is not a problem

that every application developer has to solve again and again.

50 Washington Street Westboro, MA 01581

About Ascential Software Ascential Software Corporation, an IBM company, is the leader in enterprise information integration. Customers

and partners worldwide use the IBM WebSphere Enterprise Integration Suite™ to confidently transform data into accurate, reliable and complete business

information to improve operational performance and decision-making across every critical business dimension. Our comprehensive end-to-end solutions

provide on demand information integration complemented by our professional services, industry expertise, and methodologies. Ascential Software is

headquartered in Westboro, Mass., and has customers and partners globally across such industries as financial services and banking, insurance,

healthcare, retail, manufacturing, consumer packaged goods, telecommunications and government. For more information call 1-800-966-9875 (508-366-

3888 if calling from outside the US or Canada ) or visit the Ascential Software website at www.ascential.com.

© 2005 Ascential Software Corporation, an IBM company. All rights reserved. Ascential and Ascential DataStage are trademarks of Ascential Software Corporation or its affiliates and may be registered in the United States or other jurisdictions. IBM, WebSphere, WebSphere Data Integration Suite, and WebSphere DataStage are trademarks of International Business Machines Corporation. Other marks are the property of the owners of those marks.

CONCLUSION

Customer data is the lifeblood of a business. A complete and accurate understanding of

this data across systems is vital to providing top-tier customer service and maximizing

revenue opportunities. With a federated approach, achieving this requires a master data

management strategy that involves data profiling and de-duplication, the creation of a

cross-reference database, and the creation of a virtual queryable view across multiple

source data systems. The ideal architecture for this is a service oriented architecture,

where shared data services provide consistent access and update of data across multiple

systems in real time.

The benefits of implementing this approach are enormous, allowing any new or existing

application, process, or user to get a complete and accurate view of any customer at

any time. The quality processes embedded in the design provide ongoing assurance of

the validity of the data, and the federated approach reduces the risk, latency, and cost of

replicating data across databases.