
Enterprise Information Management: Information Virtualization for a Unified Business View

Technology Concepts and Business Considerations

Abstract

Enterprises need to integrate content from multiple sources to deliver consistent, timely, and meaningful information to their business processes. This is critical to reducing costs, increasing corporate revenue, and improving business agility, while maintaining regulatory compliance. This white paper identifies a spectrum of information integration and analysis use cases—from enterprise performance management and a 360º customer view to master data management and legal discovery—and describes the complex technical challenges that these use cases bring. Enterprise information management (EIM) is a strategic combination of components and services that can meet these challenges, from near-real-time information access and semantically-driven content integration to information virtualization and information as a service. While the big picture can be complex, practical starting points for EIM design and implementation include foundational capabilities in the areas of content management, master data management, information services for applications, information governance, and end-user access and analysis.

August 2008


Copyright © 2008 EMC Corporation. All rights reserved.

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com

All other trademarks used herein are the property of their respective owners.

Part Number H5586

Enterprise Information Management: Information Virtualization for a Unified Business View Technology Concepts and Business Considerations 2


Table of Contents

Executive summary
Introduction
    Author
    Audience
Introduction to enterprise information management
Representative EIM use cases
    Analytic use cases
    Operational use cases
Technical challenges of EIM
    Dealing with varying degrees of structure in information sources
    Dynamically locating information and accessing it securely
    Understanding the meaning of information
    Integrating federated, heterogeneous information
    Facilitating user navigation, analysis, and visualization of information
Information virtualization: the EIM stack
    The historical stack
    The EIM stack
    Information virtualization
Gaining traction: practical starting points for EIM
    IT management
    Governance, risk, and compliance
    Content management
    Application support
    End-user capabilities
Conclusion
Acknowledgements


Executive summary

Most enterprises have hundreds of internal and external information sources scattered across databases, content repositories, e-mail archives, file systems, spreadsheets, digital images, audio files, and more. It is enormously complex to deliver meaningful, combined information to the business processes that need it. From financial services, pharmaceuticals, and manufacturing to retailing, telecommunications, intelligence gathering, and the energy sector, dozens of industries share four mission-critical objectives to:

• Reduce the cost of locating and combining information through automation
• Increase revenue through better decisions based on a broader set of relevant information
• Introduce new products and processes through agility and flexibility in information management
• Comply with information regulations and policies

Presenting enterprise users with a current and unified view of information has multiple technical challenges. Information is spread across many locations, and is represented with varying degrees of structure using different formats and conflicting semantics. It needs to be discovered in a business context, securely accessed, interpreted, and meaningfully integrated. Whether intended for operational or analytic processes, information should be quick to find and easy to navigate, visualize, and analyze. These are the goals of enterprise information management (EIM).

EIM is a strategic combination of components and services that delivers consistent, timely, and meaningful information to analytic and operational business processes. In a sense, EIM virtualizes diverse sources of information to provide a unified business view.

Managing enterprise business performance is an important analytical use case for EIM. It involves a broad range of information sources including competitive websites, analyst reports, and SEC filings in addition to the more traditional databases of revenues and costs by organization, region, and product line. Other analytical use cases include improving product quality, creating individualized data views, assessing experimental drug effectiveness, unifying genomic information, and conducting legal discovery.

On the operational side, use cases include managing master data, tracking real-time inventory levels, creating a 360˚ customer view, identifying complex enterprise events, exchanging information as part of eGovernment, accessing multiple departmental systems, managing digital assets, assessing information compliance and risk, and supporting change and configuration management in a data center.

Historically, integration meant a batch-processed view of structured information. Components included a data warehouse over a DBMS, populated by an extract-transform-load (ETL) tool and accessed by a business intelligence front end. Now, the complex technical challenges presented by EIM use cases have led to an expanded stack of components and services for near-real-time information access, analysis, integration, and governance.

The expanded stack reflects the importance of a metadata-based semantic view across diverse information sources, with varying degrees of structure. This helps automated and human processing of information. Clues to meaning are essential, whether searching for relevant information, reconciling conflicting data, analyzing information in a business context, detecting patterns across sources, or managing compliance.

To provide a unified business view, multiple components of the EIM stack work together to virtualize information independent of source, format, and vocabulary. Database and repository managers, transformation and integration services, content analysis and search capabilities, metadata managers, and providers of information as a service interoperate on behalf of the enterprise to provide a virtual, business-oriented view of information for operational and analytic use.

Depending on your goals, use cases, business processes, and existing investments in IT infrastructure, there are multiple approaches to EIM. Begin by integrating information sources key to the business, looking for a combination of significant business value and manageable complexity. But keep the end goal in mind, so that stepwise progress in architectural and process maturity will extend and scale to a broader solution.

Practical starting points for EIM include foundational capabilities for content management, master data management, information services, information governance, and end-user access and analysis. New directions to consider are an XML repository, query acceleration, an information service bus, enterprise search, mashups, grid operating environments, and information rights management.


Introduction

This white paper begins with business drivers for enterprise information management, followed by representative use cases that illustrate the scope of EIM. It describes the technical challenges associated with EIM and lays out the components and services in the expanded EIM stack that can help you meet these challenges. The paper ends with advice on how to gain traction in EIM and recommendations for practical starting points.

Author

Dave Reiner, Ph.D., is a member of EMC’s CTO Office. Dr. Reiner has a long history of innovation in database and information management technologies, and has been involved in many strategic projects in his 6 years at EMC. These include designing caching models to support storage array design decisions, developing information lifecycle management use cases, simplifying data center resource management, and specifying common software architecture across EMC products. He is currently focused on semantic information integration and access, new business intelligence and enterprise search applications, enterprise information management, and enterprise information governance. Prior to EMC, Dave architected CRM software at NetGenesis and Fidelity Investments, was Chief Scientist for database marketing at Epsilon, invented parallel database technology at Kendall Square Research, and directed database research at Computer Corporation of America. He is a former Editor-in-Chief of IEEE Database Engineering and co-edited Query Processing in Database Systems. Dave holds multiple patents on parallel query and web mining algorithms.

Audience

The intended audience for this white paper is the business IT professional or enterprise architect who wants to gain a strategic view and overall understanding of current and evolving enterprise information management technologies, their impact on business goals, and practical starting points.


Introduction to enterprise information management

A health care industry executive recently recounted that his company had nearly 200 major sources of information across the organization. These sources were used in business processes such as patient admissions, clinical diagnosis, billing, and vendor tracking. He emphasized how enormously complex it was to combine and repackage information in a meaningful way across these sources, which were scattered across databases, content repositories, e-mail archives, file systems, spreadsheets, audio recordings, and more. A patchwork of software tools and manual procedures was barely keeping up with the flood of information [1]. There were difficulties adding new data sources, and high maintenance costs for point-to-point integrations. Unfortunately, the lack of a cohesive information management strategy was limiting the company’s ability to make decisions, improve processes, respond to competitive pressures, and innovate.

From financial services, pharmaceuticals, and manufacturing to retailing, telecommunications, intelligence gathering, and the energy sector, dozens of industries are dealing with similar scenarios. What they also share are four mission-critical objectives to:

• Reduce the cost of locating and combining information through automation
• Increase revenue through better decisions based on a broader set of relevant and more timely information
• Introduce new products and processes through agility and flexibility in information management
• Comply with information regulations and policies

“Customers consistently identify ‘rapid access to relevant information’ among the top two business requirements for IT.” (IDC, 2007 [2])

Presenting enterprise users with a current and unified view of information has multiple technical challenges. Information is spread across many locations, and is represented with varying degrees of structure using different formats and conflicting semantics. It needs to be discovered in a business context, securely accessed, and meaningfully integrated. Whether intended for operational or analytic processes, information should be quick to find and easy to navigate and visualize. These are the goals of enterprise information management (EIM).

EIM is a strategic combination of components and services that weave together and deliver holistic information (consistent, timely, and meaningful) to business processes. In a sense, EIM virtualizes diverse sources of information to provide a unified business view. For the enterprise, it is important to see information independent of source and format, reconciled with overlapping or even conflicting information, and normalized to a domain-specific vocabulary. EIM is aimed directly at the mission-critical objectives of reduced cost, increased revenue, and improved agility, while maintaining compliance.

Given such a broad definition, it’s not surprising that approaches to EIM differ, driven by the particular information landscape, the vertical industry, and the business processes involved. Making EIM a reality generally requires information services that help to:

• Access, reconcile, and synchronize information from diverse sources
• Integrate structured data and unstructured content
• Respond to information queries, searches, or service invocations
• Provide a semantically consistent view of current information
• Support easy visualization and navigation of results
• Protect security and privacy, and comply with regulatory policies

[1] Gantz, John; et al. “The Diverse and Exploding Digital Universe,” IDC, March 2008.
[2] IDC Predictions 2007, #204631.


In the past, “wiring” each business process directly to each information source it needs has been labor-intensive and error-prone. EIM’s approach of exposing and combining information sources through service interfaces is more economical, performant, and scalable. A service-oriented architecture (SOA) allows new processes, sources, and services to be added to the mix without introducing combinatorial complexity.
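
The combinatorial argument can be made concrete with a back-of-the-envelope sketch. It assumes a worst case in which every process needs every source; the function names and the figure of 50 business processes are illustrative assumptions, while 200 sources echoes the health care example earlier in the paper.

```python
def point_to_point_links(processes: int, sources: int) -> int:
    """Worst case: every process is wired directly to every source."""
    return processes * sources


def service_layer_links(processes: int, sources: int) -> int:
    """Each process and each source connects once to a shared service layer."""
    return processes + sources


if __name__ == "__main__":
    # 50 processes is an assumed figure, not one taken from the paper.
    print(point_to_point_links(50, 200))  # 10000
    print(service_layer_links(50, 200))   # 250
```

Under these assumptions, adding one more source costs 50 new integrations in the point-to-point model but only one new connection to the service layer.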

Representative EIM use cases

Information integration is central to business competitiveness and differentiation in numerous industries. Some of the use cases listed probably match your business processes or could help support them. Other use cases may expand your perception of the breadth of EIM for analytic and operational business processes, and for several specific vertical markets.

Analytic use cases

Manage enterprise business performance through a broad range of information sources including competitive websites, analyst reports, and SEC filings in addition to the more traditional databases of revenues and costs by organization, region, and product line

Enable knowledge workers to create short-lived, individualized views by “mashing together” diverse information sources for analysis

Improve product quality by understanding how product test results, component specifications, repair rates, buying patterns, customer attitudes, and public events are related

Create a unified view of information to support legal discovery (eDiscovery) from a heterogeneous combination of e-mails, memos, wikis and other collaborative artifacts, presentations, RSS feeds, SEC filings, invoices, audio and video recordings, work breakdown structures, and contact databases

Create a 360˚ customer view for marketing and customer service from databases and content repositories of prior purchases, offers, interaction history (possibly including website visits and e-mails), demographics, and psychographics

Predict costs over multi-year manufacturing processes (for example, aircraft) through an integrated, semantic view of information systems with different structural, simulation, and financial models; and with varied scope, component names, formats, and change-propagation rules

Assess experimental drug effectiveness by combining pharmaceutical trial outcomes with research publications and peer discussion forums, enhanced through content analysis and pharmaceutical concept descriptions (ontologies)

Locate colleagues with specific expertise from wikis, discussion forums, human resource databases, resumes, citations, and publications

Search reports and unearth relevant facts for enterprise decision making using report metadata and content as well as additional enterprise documents

Operational use cases

Track inventory levels in near real time across multiple suppliers to optimize the supply chain, combining information across systems with different SKUs, product descriptions, and product groups

Manage master data (customers, products, partners, employees) through data cleansing, identity matching, policy definition, and workflow management; make the results available through information services

Identify and classify complex enterprise events from within a multi-source event cloud to accelerate enterprise response to delays in business processes, failures in supply chains, or rapidly evolving perceptions of the enterprise, through inter-event correlation, abstraction, and relationship detection

Manage digital assets (text, images, multimedia) through locating, annotating, cataloging, and storing assets, and managing and monitoring licensing, deployment, and usage

Improve marketing to telecom customers by analyzing transaction details (call patterns, music purchase and use, Web browsing) to understand customer behavior and to adjust product capabilities, offers, and pricing in near real time

Integrate multiple databases and information sources following an acquisition or merger, through source consolidation, use of an information services layer, and support for federated queries and transactions

Provide a single point of entry to government information (eGovernment) to exchange information, provide services, and transact with citizens, businesses, and other government entities in areas such as benefits, grants, services, taxes, and job recruitment

Access multiple, diverse systems in real time, for example, for a 911 emergency call center; rapidly assemble relevant information from systems including state and local police, court records, prisons, taxes, and a national repository of personal and property records

Enable collaborative information stewardship by involving employees in cleansing, tagging, linking, enhancing, and verifying information for which they are responsible

Support ITIL change and configuration management processes in a data center using a federated configuration management database that discovers information on people, business processes, and IT infrastructure from databases, configuration files, resource management repositories, policies, workflows, help desk records, and contracts

Improve data center availability proactively and reactively by indexing, searching, and correlating event information derived from alerts and traps, logs, scripts, code, configuration files, stack traces, activity reports, messages, and metrics

Assess enterprise information performance and risk with respect to compliance, security, and privacy policies and regulations

Almost all of these use cases require accessing and combining structured and unstructured information from a broader and less predetermined set of sources than would have been considered in the past. Many of the use cases have a dynamic, near-real-time flavor. Most necessitate a semantic understanding of information meaning and interrelationships to help create a holistic view in response to a purposeful information request and its context. And all are looking to improve the experience for the end user for whom ease of use and simplicity in accessing, navigating, and distilling information translate into business advantage.
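
As a small illustration of the inventory-tracking use case above, the following sketch combines quantities from suppliers that use different SKUs for the same product. The supplier names, SKU values, canonical product IDs, and the `unified_inventory` helper are all hypothetical; a real deployment would draw the cross-reference from master data rather than a hard-coded table.

```python
from collections import defaultdict

# Hypothetical cross-reference from (supplier, supplier SKU) to a canonical product ID.
SKU_MAP = {
    ("supplier_a", "A-1001"): "WIDGET-STD",
    ("supplier_b", "WG/77"): "WIDGET-STD",
    ("supplier_b", "BL/12"): "BOLT-M8",
}


def unified_inventory(feeds):
    """Sum on-hand quantities per canonical product across supplier feeds.

    `feeds` is an iterable of (supplier, sku, quantity) tuples. Records whose
    SKU has no mapping are returned separately so a data steward can review them.
    """
    totals = defaultdict(int)
    unmapped = []
    for supplier, sku, quantity in feeds:
        product = SKU_MAP.get((supplier, sku))
        if product is None:
            unmapped.append((supplier, sku, quantity))
        else:
            totals[product] += quantity
    return dict(totals), unmapped
```

Returning unmapped records instead of dropping them reflects the point that reconciliation depends on semantic knowledge, which is often incomplete when a new source is first added.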

Technical challenges of EIM

These analytic and operational use cases present complex technical challenges, including:

• Dealing with varying degrees of structure in information sources
• Dynamically locating information and accessing it securely
• Understanding the meaning of information
• Integrating federated, heterogeneous information
• Facilitating user navigation, visualization, and analysis of information

We’ll describe these challenges in more detail and then show how the components and services in the expanded EIM stack help to meet them.

Dealing with varying degrees of structure in information sources

From planning documents, public web pages, and sales figures stored in corporate databases to customer e-mails and RSS feeds, new business information is growing exponentially. Information sources have varying degrees of explicitly defined structure. The most structured sources, where tabular information is regimented into rows with identical typed columns and keys, include relational and legacy databases, multidimensional databases, flat files in record format, and tables in spreadsheets. The least structured sources, for which there is little external structure defining information content, format, or meaning, include most documents, images and graphics, audio and video files, web pages, wikis, blogs, instant messages, source code, and reports. In the middle is a category of semi-structured sources, such as e-mails, XML data, RDF knowledge graphs, EDI documents, and RSS feeds. It’s estimated that more than 85 percent of data growth is in unstructured or semi-structured content [3].

Structured sources such as a relational database have explicit metadata or schemas that describe their information in detail. Semi-structured sources have partial metadata, such as the author and subject associated with an e-mail. A semi-structured XML document may have an associated XML Schema Definition (XSD) file constraining its elements and attributes. Unstructured sources (and the unstructured portions of semi-structured sources, such as the body of an e-mail) do not have explicit metadata, so they must first be indexed, tagged, or otherwise analyzed before they can be filtered and combined with other sources.

Despite the analysis effort required, there’s tremendous value, for example, in taking structured data about customers and being able to link it to relevant unstructured and semi-structured information such as e-mails, scanned hardcopy correspondence, and notes from conversations. It provides rich context that can enable businesses to deliver much more responsive customer service. A database that lists terrorist sightings becomes simultaneously more powerful and more nuanced when it’s combined with photos, video, audio, and anecdotal text.

An application above a single data source, or a set of similar sources, can be designed explicitly for a single information structure. But the breadth of sources required for EIM use cases means that humans and programs that search, query, navigate, and analyze information often have to cope with a hodgepodge of structures.

Dynamically locating information and accessing it securely

Even a straightforward marketing query about customer-spend-to-date above structured sources of customer information may need access to multiple databases to compute an aggregated total. Some preprocessing may be done ahead of time (data warehouses and marts come to mind), but the speed of business has accelerated, and dynamically computed, near-real-time results are desirable. Often, these must be assembled from a combination of SQL, API calls, and information services requests [4] against enterprise systems.

“From a user’s perspective, it is maddeningly frustrating that … to get a complete view of a customer, one must run a database report… scan emails about the customer, and find letters or other documents by or about that customer … all as separate actions.” (Unified Access to Content and Data, IDC, 2006)

The search box is fast becoming ubiquitous in user applications. Executing a search query against unstructured information sources, say research publications on pharmaceutical trial outcomes, is inherently a dynamic approach to information location. It depends on efficient and comprehensive search indexes, on the quality of content tagging, and on access rights to the underlying information sources. End-user navigation of large result sets, described below, is a related challenge.

In a typical corporate litigation, responding to a broad-scope eDiscovery request means sifting through dozens of loosely connected systems, ranging from formal databases and content repositories to individual PCs and their file systems. For example, in a contractual dispute with a vendor, you may need to produce all the e-mails that mention this vendor within a given time period, any presentation made to the vendor, all versions of the contract in dispute including annotations, notes of conversations between individual negotiators, and outstanding work orders that may be housed in the ERP system. Some of this information may no longer be in production systems but archived on a different medium (such as disk or tape); other information may have been very loosely governed (such as evolving spreadsheet versions).
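
Returning to the customer-spend-to-date query described at the start of this section, the following minimal sketch shows one way a total might be computed at request time across several databases. It uses in-memory SQLite databases as stand-ins for separate enterprise systems; the `orders` table, its columns, and both helper functions are assumptions made for illustration, not part of any product described in this paper.

```python
import sqlite3


def make_source(rows):
    """Create an in-memory database standing in for one enterprise system."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn


def spend_to_date(sources, customer_id):
    """Query each source when the request arrives and sum the partial totals."""
    total = 0.0
    for conn in sources:
        # COALESCE turns the NULL that SUM returns for "no rows" into 0.
        (partial,) = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer_id = ?",
            (customer_id,),
        ).fetchone()
        total += partial
    return total
```

A real federated query would also have to reconcile customer identifiers and currencies across systems and present the right credentials to each one, which is where the security and semantic challenges discussed in this paper come in.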

[3] Tee, Phil. “The Data Management Gold Rush,” http://www.dmreview.com/article_sub.cfm?articleID=1088936
[4] Information services are usually incorporated into applications at design time, but auto-discovery of such services at runtime allows more flexibility.


Concerns about security, privacy, and compliance complicate the picture. With respect to security, the required access credentials vary across diverse enterprise systems, and security breaches can bring enormous liability. Use cases such as inventory management may require secure access to supplier information outside of the corporate firewall. Of course, authentication, authorization, and accounting [5] need to be strictly enforced at lower levels of access to databases, repositories, file systems, and information services, and whenever information lifecycle events such as archiving occur. Privacy violations may arise from secondary use or non-obfuscation of personal identifying information (PII), and from combining information across disparate sources. More generally, privacy may be harmed through information collection, processing, and dissemination, as well as through direct invasion [6].
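
To illustrate what obfuscation of PII can look like before sources are combined, here is a deliberately simple sketch that masks e-mail addresses and US Social Security numbers in free text. The patterns and the `obfuscate` helper are illustrative assumptions; production systems typically need much broader detection (names, addresses, account numbers) and policy-driven handling.

```python
import re

# Simplified patterns chosen for illustration; they will miss many real-world variants.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def obfuscate(text: str) -> str:
    """Replace e-mail addresses and SSN-shaped numbers with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    return SSN.sub("[SSN]", text)


if __name__ == "__main__":
    print(obfuscate("Contact jane.doe@example.com, SSN 123-45-6789."))
    # Contact [EMAIL], SSN [SSN].
```

Masking at the point of access, rather than after integration, limits the chance that a combined view discloses something no single source would.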

With respect to compliance, most personal and sensitive information falls under regulatory protection. Privacy legislation such as the Gramm-Leach-Bliley Act requires corporate and government institutions to prevent unauthorized access to non-public, personal information, including the fact that an individual is a customer of a particular financial institution. The Health Insurance Portability and Accountability Act’s (HIPAA) Privacy Rule establishes regulations for the use and disclosure of protected health information (PHI), which includes any individually identifiable health information. This protection extends to written, oral, faxed, and electronically stored information (ESI). Also, when information is being combined across sources, whether by IT tools, searches, or even “mashup” tools, there is an additional compliance issue since the combination may yield a new view, disclosing information that was not previously available (for example, for privacy reasons). Information governance is difficult in this case. A combination of security and compliance concerns is involved in keeping official, immutable records to answer questions about the origin and derivation paths of information. This area of data lineage (or data auditing) has application to document management, proof of legal compliance in eDiscovery, repeatable scientific investigations and statistical algorithms, financial earnings reports, and medical analyses and lab work.
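The idea of data lineage described above can be sketched as a log of immutable records, each naming the items it was derived from, so that the origin of any result can be traced. This is only an illustrative sketch: the `LineageRecord` class, the `origins` function, and the item names are hypothetical, and a production system would also need tamper-evident storage.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: a record cannot be altered after creation
class LineageRecord:
    item: str            # the information item this record describes
    operation: str       # how the item was produced
    inputs: tuple = ()   # items it was derived from (empty for a source)

def origins(item, log):
    """Trace an item back to the source items it ultimately derives from."""
    by_item = {record.item: record for record in log}
    record = by_item.get(item)
    if record is None or not record.inputs:
        return {item}
    found = set()
    for parent in record.inputs:
        found |= origins(parent, log)
    return found

# Hypothetical derivation path for a financial earnings report.
log = [
    LineageRecord("ledger_extract", "extract"),
    LineageRecord("fx_rates", "extract"),
    LineageRecord("revenue_usd", "convert_currency", ("ledger_extract", "fx_rates")),
    LineageRecord("earnings_report", "aggregate", ("revenue_usd",)),
]
report_sources = origins("earnings_report", log)
```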

Understanding the meaning of information

It’s been a longstanding problem to understand the meaning of structured enterprise information used for operations and analysis. Across heterogeneous systems, there are subtle differences in definitions of customers, product categories, cost structures, revenue recognition, claim status, event classification, asset ownership, and so on. The very absence of a data value may be interpreted as unknown, not applicable, zero, or as a default value. Even with entirely structured information, different applications need different views of the combined information and the potential for misinterpretation is huge. Metadata helps query and system builders understand the source, derivation, and intended use of fields, but it may be unavailable or hidden below a business view of information. Information transformation and mapping approaches can clean and normalize data to create integrated views, although sometimes at the cost of burying semantic knowledge deeply in transformation processes.

Where unstructured or semi-structured information is involved, many aspects of meaning need to be derived or extracted in a process often called content analysis. Metadata derived from content such as documents, e-mails, or web pages includes:
• Attributes (such as “document author”)
• Tags (such as “political” or “software”)
• Classifications (such as “mission-critical”)
• Extracted entities (such as “Kenneth Lay” and “Enron”)
• Extracted relationships (such as “Dave Reiner works at EMC”)
• Sentiments (such as “negative” in a customer complaint e-mail)
• Clusters (such as related documents)

5 Authentication, authorization, and accounting (or auditing) are often referred to as security’s three A’s.
6 Solove, Daniel J. “A Taxonomy of Privacy,” University of Pennsylvania Law Review, Vol. 154, No. 3, January, 2006.


• Indexes and facets (for search and navigation)
• Mappings (to related information)
• Information usage patterns (for optimization and governance)

Such metadata adds considerable value to information and enables it to be indexed, cataloged, searched, retrieved, and reused. Automated content analysis and metadata extraction reduce the burden on human users to classify or tag information.

Several key concepts from the semantic web7, 8 help to express the meaning of content in ways that can be understood by software agents as well as people. An ontology is a precisely defined common terminology that corresponds to a specific domain of knowledge such as vehicles. It describes entities (for example, people, vehicles), attributes (model, year, type, mileage), interrelationships (people own vehicles, people drive vehicles), and rules (people must be at least 16 years old to obtain a motorcycle permit in Massachusetts). An ontology provides meaning for a domain and can be used as a basis for classification, inference, search, mapping, and reasoning about entities in the domain. For example, “physicians” and “doctors” may be regarded as equivalent in a medical ontology.

Web Ontology Language (OWL)9 enables knowledge-level interoperation of intelligent agents and allows programs to access information structure and content. OWL extends Resource Description Framework (RDF)10, which supports a more limited repertoire of resource and property description. Because it includes not only classes but also attributes, constraints, rules, and interrelationships, an ontology can portray a domain more fully than a hierarchical taxonomy, whose use is mainly for classification.

To describe web services, the Web Services Description Language (WSDL)11 is foundational. But WSDL is purely a syntactic approach; its recent proposed extension with SAWSDL12 will allow web services to be tied to semantics critical to understanding domain-specific information. SAWSDL extensions take two forms: model references that point to semantic concepts such as ontologies, and schema mappings that specify data transformations between messages’ XML data structures and the associated semantic model. While SAWSDL represents work in progress, its promise is to automate understanding of the meaning of information accessed through information services.
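To make the ontology concepts above more concrete (equivalent terms, interrelationships, and rules), here is a minimal sketch in plain Python. It is not OWL or RDF: the `Ontology` class, its fields, and the rule name are hypothetical, chosen only to mirror the “physicians”/“doctors” and motorcycle-permit examples in the text.

```python
from dataclasses import dataclass, field

@dataclass
class Ontology:
    """A toy ontology holding equivalent terms, relationships, and rules."""
    synonyms: dict = field(default_factory=dict)   # term -> canonical concept
    relations: set = field(default_factory=set)    # (subject, predicate, object)
    rules: dict = field(default_factory=dict)      # rule name -> predicate function

    def canonical(self, term):
        """Map a term to its canonical concept (the term itself if unknown)."""
        key = term.lower()
        return self.synonyms.get(key, key)

    def equivalent(self, a, b):
        """Two terms are equivalent when they share a canonical concept."""
        return self.canonical(a) == self.canonical(b)

    def check(self, rule_name, **facts):
        """Evaluate a named rule against the supplied facts."""
        return self.rules[rule_name](**facts)

medical = Ontology(
    synonyms={"doctor": "physician", "doctors": "physician", "physicians": "physician"}
)
vehicles = Ontology(
    relations={("person", "owns", "vehicle"), ("person", "drives", "vehicle")},
    rules={"motorcycle_permit": lambda age: age >= 16},
)
```

With these definitions, a search for “doctors” could be expanded to records tagged “physicians”, and an application could test `vehicles.check("motorcycle_permit", age=15)` before proceeding.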

Integrating federated, heterogeneous information

If information integration were the only concern of system designers, information sources would be consistently defined and managed. But in the enterprise, multiple, autonomous information systems are built over time to meet the diverse needs of business units and applications. In practice, information sources may have different data models, schemas, naming conventions, attribute domains, data quality, value precision, currency of information, encodings, and constraints. External sources of information also fuel the heterogeneity of structure and semantics.

Federated information refers to information from systems and sources where overall central authority is weak, although partial sharing and coordination are possible. Integrating federated, heterogeneous information is clearly a major technical challenge. The goal is to bring information together using tools

7 Berners-Lee, Tim. Scientific American, May, 2001.
8 World Wide Web Consortium (W3C), “Semantic Web Activity,” http://www.w3.org/2001/sw/
9 World Wide Web Consortium (W3C), “OWL Web Ontology Language Overview,” May 10, 2004, http://www.w3.org/TR/owl-features/
10 World Wide Web Consortium (W3C), “Resource Description Framework (RDF),” 2004, http://www.w3.org/RDF/
11 World Wide Web Consortium (W3C), “Web Services Description Language (WSDL) Version 2.0 Part 0: Primer,” 2007, http://www.w3.org/TR/wsdl20-primer/
12 Kopecky, Jacek; Vitvar, Tomas; Bournez, Carine; Farrell, Joel. “SAWSDL: Semantic Annotations for WSDL and XML Schema,” IEEE Internet Computing, November/December 2007, pp. 60-67.


and processes that deal with ambiguities, simultaneously missing as little as possible and keeping incorrect matches to a minimum. In addition to point-to-point integration, many historical solutions to the integration challenge have their roots in the extract-transform-load (ETL) tools used to build data warehouses. Data is extracted from its sources through queries, interfaces, and adapters. It is transformed, cleansed, and reconciled according to relatively arcane and constantly changing business rules, and then loaded as quickly as possible into a data warehouse. In turn, the data warehouse supplies extracts known as data marts that serve functional or geographic areas of the business. All of this is usually done overnight through a batch process, whose results are intended to reflect a particular point in time such as the end of the business day. In short, ETL tools work against repositories of structured data to feed business intelligence applications with integrated information such as customer purchase behavior over the last 24 hours. However, consolidating information overnight doesn’t meet the needs of business processes, such as supply chain management, where a near-real-time view of federated information is required. Rather than periodically moving and combining data, the challenge is to leave information where it resides and retrieve it on demand. This requires coordination and optimization of queries against federated information sources, which may have different query languages and service interfaces. There are complex tradeoffs to make between the extremes of materializing all the information that might be needed and fetching just what is required on demand. Regardless of the timing, an underlying technical challenge involves matching entities across information sources. For example, information about customers, products, employees, and partners is scattered and often inconsistent. 
But business processes such as revenue recognition, customer service, and financial rollup need to see a unique reference model, or “single version of the truth.” This challenge, which blends data modeling, fuzzy matching, and synchronization aspects with cross-organizational politics, is usually called master data management. For health care, the entities to be matched include patient, provider, and location; for insurance, they include consumer, provider, incident, and claim. A more general, related challenge is change data capture, or the determination that data in a database or other location has changed and may require change propagation or a more complex data synchronization action. Where unstructured or semi-structured content is being integrated, the challenges of integration are mainly at the metadata level. The underlying information is integrated (or at least captured or referenced) based on indexes, associated tags, extracted entities and relationships, and document clusters. To assess relevancy, this metadata needs to be compared to search targets and constraints, and expanded to factor in the initial search context and to apply any domain-specific ontologies that assist the automated interpretation of meaning.
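The entity matching at the heart of master data management, described above, can be sketched with standardization followed by fuzzy string comparison. This is a deliberately simple illustration using Python's standard `difflib`; the company names, the 0.85 threshold, and the function names are assumptions, and real MDM products combine many more rules and attributes.

```python
from difflib import SequenceMatcher

def standardize(name):
    """Lowercase, strip punctuation, and collapse whitespace."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())

def similarity(a, b):
    """Similarity ratio between two standardized names (0.0 to 1.0)."""
    return SequenceMatcher(None, standardize(a), standardize(b)).ratio()

def match_entities(left, right, threshold=0.85):
    """Pair each left record with its best right match at or above the threshold."""
    matches = []
    for candidate in left:
        best = max(right, key=lambda other: similarity(candidate, other))
        if similarity(candidate, best) >= threshold:
            matches.append((candidate, best))
    return matches

# Hypothetical customer names held in two different systems.
crm = ["Acme Corp.", "Globex Corporation"]
billing = ["ACME Corp", "Initech LLC"]
pairs = match_entities(crm, billing)
```

Raising the threshold reduces incorrect matches at the cost of missing true ones, which is the balance between missing as little as possible and keeping incorrect matches to a minimum that the text calls out.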

Facilitating user navigation, analysis, and visualization of information

Delivering holistic information to a user or business process is essential, but that’s not the end of the line. In an incremental fashion, users examine, rearrange, narrow, and analyze their information. It’s challenging to anticipate user information needs, to reformat and repackage results, to support navigation of virtual information views, and to simplify user analysis and visualization of information. User expectations are high for ease of use and performance, even when much more than a search index lookup is happening under the surface.

There is also a community aspect to tagging and learning how to view information. What works for one user should be helpful to others, and more generally the patterns of analysis and interpretation that one enterprise applies can be shared with others.

Anticipating user information needs starts with database design, which has always been a difficult task. It requires understanding the types and scope of user queries, and deciding how to collect, structure, and group information. More generally, EIM entails predicting important information sources and conveying their contents to users. Of course, such planning does not preclude later searching and serendipitous discovery of valuable information.


Navigating information means bridging to related information or drilling up or down with respect to the level of detail. For navigation, a useful generalization is the idea of dimensions of information. Traditionally, structured enterprise information has dimensions such as time, business unit, geographic region, product family, product price, customer segment, and sales channel. Unstructured content also has dimensions, or facets, which depend on content attributes, tags, classifications, and associated metadata. For example, the facets on a consumer product company’s shopping website might include manufacturer, price range, color, size, and average consumer rating. Constraints on dimensions represent a narrowing of context, while navigating dimensions is a natural way for users to link to related information. Dynamic tags generated by social interaction (for example, through del.icio.us) can be valuable as additional dimensions. This is particularly important for rich digital media. Integrating dynamic tags into navigational behavior—without a controlled vocabulary, or explicit metadata hierarchies, or guidelines for tagging—is certainly a new challenge. Information users may also need automated assistance to make sense of their data sets by identifying, interrelating, clustering, and grouping.13 A simple example is to present users with a “disposable” clustering of search results that guides further exploration. A more ambitious example is tracking and identifying terrorists, based on a combination of identifying attributes and observational data about suspected terrorists, their activities, and their communication patterns.14
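The faceted navigation described above, where constraints on dimensions narrow the context, can be sketched in a few lines. The catalog records, facet names, and function names below are hypothetical, loosely following the shopping-website example.

```python
from collections import Counter

# Hypothetical catalog records; each key other than "name" acts as a facet.
catalog = [
    {"name": "Trail Jacket", "manufacturer": "Northpeak", "color": "red", "rating": 4},
    {"name": "Rain Shell", "manufacturer": "Northpeak", "color": "blue", "rating": 5},
    {"name": "City Coat", "manufacturer": "Urbanline", "color": "red", "rating": 3},
]

def facet_counts(items, facet):
    """Count how many items fall under each value of one facet."""
    return Counter(item[facet] for item in items)

def narrow(items, **constraints):
    """Keep only the items whose facets equal every given constraint."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in constraints.items())
    ]

# Constrain one dimension, then show what remains along another.
red_items = narrow(catalog, color="red")
remaining_manufacturers = facet_counts(red_items, "manufacturer")
```

Recomputing the counts after each constraint is what lets a user interface show only the facet values that still lead somewhere.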

Information analysis depends on metrics—periodic, quantitative assessments of business processes—derived from business information. The most important metrics are sometimes called key performance indicators (KPIs). The challenge is to compare past, present, and predicted future metrics, and to detect and understand significant trends and patterns.

“At a time when firms offer similar products and services, business processes are among the last remaining points of differentiation. And analytics competitors wring every last drop of value from those processes. Like other companies, they know what products their customers want, but they also know what prices those customers will pay, how many items each will buy in a lifetime, and what triggers will make people buy more.” Examples: Amazon, Harrah’s, Capital One, Boston Red Sox.
– Thomas Davenport, Harvard Business Review15

Analysis and navigation tools, from simple reporting to data mining, need connectivity to information services and sources through various protocols and access frameworks. Examples are JDBC, ODBC, proprietary APIs, REST16, Ruby on Rails17, and SOAP-based web service invocations. Service interfaces, adapters to information sources, and even “web page scrapers” can hide unnecessary details. For complex event processing applications, such as dashboards, arbitrage, and data center monitoring, the information sources to be analyzed may be near-real-time data streams.

Mashup technologies make it easier for the next generation of knowledge workers to combine information in new ways. They may even be able to modify and correct information sources through the mashup view (unlike read-only portals). Mashups accelerate the pressure on IT to make the information service feeds reliable, accurate, secured, and accompanied by reasonable provenance, and to enable and validate updates.

Finally we come to visualization. Visual representations of abstract information help users to explore and understand it.
The challenge is not to present flashy visual effects but to show meaningful aspects of the data that amplify cognition and engage the pattern detection strengths of the human visual system.18 From

13 Pantel, Patrick; Philpot, Andrew; Hovy, Eduard. “Matching and Integration Across Heterogeneous Data Sources,” June 1, 2006, http://www.patrickpantel.com/Download/Papers/2006/dgo06-01.pdf
14 Ibid.
15 Davenport, Thomas. “Competing on Analytics,” Harvard Business Review, January, 2006.
16 Fielding, Roy; Taylor, Richard. “Principled Design of the Modern Web Architecture,” ACM Transactions on Internet Technology, May, 2002.
17 http://www.rubyonrails.org/
18 Few, Stephen. “A Ménage à Trois of Data, Eyes and Mind,” DM Review, January, 2008.


timelines, radar charts, and treemaps to mindmaps, heat graphs, and mashups, visualizations can evoke the “aha!” response in a way that no spreadsheet or drab report can match.19, 20

Information virtualization—the EIM stack

The expanded EIM stack consists of components and services that support the use cases and meet the technical challenges of EIM. We’ll begin with a walk down memory lane.

The historical stack

In the early days of data warehousing, structured data was managed in a relational DBMS such as Oracle or DB2. Overnight, or less frequently, portions of the data from operational databases were cleansed, standardized, and combined by ETL tools, such as Informatica or Data Junction, to create data warehouses and data marts designed for analysis. Business intelligence tools then queried these warehouses and marts to create reports and graphs for analysis. A simple view of these components is illustrated in Figure 1.

[Figure 1 components: Structured Data; Relational & Legacy DBMSs & File Systems; Extract, Transform & Load (ETL); Analytic Business Intelligence (BI)]

Figure 1. The historical stack

But the historical stack can no longer meet business and user demands for meaningful, current views of enterprise information or the technical challenges of more complex and ambitious information use cases.

The EIM stack

The expanded EIM stack in Figure 2 builds upon what has come before. The historical components that construct and access data warehouses are still relevant. But through careful architectural planning, current enterprise configurations are generally expanding in one or more of six areas:

• Structured information access—near-real-time access to federated sources, acceleration of complex queries, and access to information services encapsulating legacy systems

• Unstructured and semi-structured information processing—content management and integration, access to XML sources, analysis of content semantics, use of domain ontologies and semantic web concepts

• Business analysis and applications—links to business process management, analytic and operational business intelligence, search-based applications and result navigation, and transactional content management

• Coordination and integration—master data management, enterprise search and query brokering, information as a service, workflow orchestration, complex event processing and human collaboration

19 Lengler, Ralph; Eppler, Martin. “A Periodic Table of Visualization Methods,” http://www.visual-literacy.org/periodic_table/periodic_table.html
20 Lima, Manuel. Network visualization approaches (including knowledge networks), http://www.visualcomplexity.com/vc/


• Information governance and management—security; metadata and knowledge management; governance, risk, and compliance; information rights management; and information lifecycle management

• Grid operating environment—scalable and high-performing information ingestion, transformation, and analysis; support for information as a service

© Copyright 2008 EMC Corporation. All rights reserved.

[Figure 2 diagram, titled “Components and Capabilities in Expanded EIM Stack.” Labeled components: Structured Data; Relational DBMSs; File Systems & Other DBMSs; Extract, Transform & Load (ETL); Enterprise Application Integration (EAI); Enterprise Information Integration (EII); Analytic & Operational BI Applications; Application Suites; Unstructured and Semi-Structured Content; Enterprise Content Manager; Enterprise Content Transformation; Enterprise Content Integration; Content Analysis; Transactional Content Mgmt Applications; Enterprise Search and Query Broker; Search Based Applications; XQuery, SPARQL & Other New Languages; Master Data Management (MDM); Semantic View and Information Virtualization; Information Service Management; Orchestration (BPM, Workflow, CEP); Collaboration (including Social Networking); Security; Metadata and Knowledge Management; Information Governance, Risk and Compliance; Grid Operating Environment]

Figure 2. Components and services in the expanded EIM stack

Information virtualization

The new components and services in the stack reflect the importance of a semantic view—a metadata-based underpinning of meaning—across heterogeneous information sources. This helps automated and human processing of information. Clues to the meaning of information are essential, whether searching for relevant information, reconciling conflicting data, analyzing and interpreting information in a business context, detecting patterns and similarities across information sources, or managing compliance and governance. In the stack, multiple components help to create a semantic view, from those that describe and annotate information or that characterize the entities, relationships, and constraints of the domain of interest (for example, ontologies in domains such as financial services, insurance, bioinformatics, manufacturing, or health care) to those that standardize, reconcile, and combine information for purpose. The usefulness of the entire stack is in proportion to the correctness of the metadata and semantic representations that describe the information brought together.

Information virtualization is in harmony with a semantic view of information. To provide a unified business view, multiple components and services of the EIM stack work together to virtualize information independent of source, format, and vocabulary. Database and repository managers, transformation and integration tools, content analysis and search components, information service and metadata management capabilities, and delivery channels, such as mobile devices, interoperate on behalf of the enterprise to provide a virtual, business-oriented view of information. A service-oriented architecture (SOA) facilitates adding new business processes, new information sources, and new information handling services to the mix without a combinatorial increase in complexity. A SOA allows later substitution of service


implementations while preserving the invariance of the service interface. The concept of information as a service means componentizing information at a coarse-grained, business-focused level, and accessing it through a SOA—without binding the architecture to specific sources of data and content.21 Information may be delivered over SOAP and HTTP, or more cohesively through service mediation middleware such as an enterprise service bus. Master data management (MDM) services have emerged as a key layer for information virtualization. Through automated tracking and reconciliation of federated, inconsistent data sources, MDM services ensure the currency, meaning, and quality of enterprise reference data across subject areas and systems.22 Sophisticated entity matching, which employs cleansing, standardization, rules, and fuzzy matching techniques, gives business applications on-demand access to a “single version of the truth” for fundamental enterprise entities such as customers, products, employees, and partners. A unified, consistent view of these entities is needed for billing, customer service, revenue recognition, catalog management, and other applications. Services for change data capture and synchronization are a critical part of MDM as well. Business rules may require that a data change event in one location propagate to others, or that a more complex validation or synchronization action be performed. For example, discharging a patient from a hospital should trigger a sequence of changes to room and equipment inventory, to billing and chargeback systems, and to medical records—all for the correctly identified patient. MDM services can play a central role in the operational coordination of business systems, and thus are well suited for deployment on an enterprise service bus. Another aspect of information virtualization, high up in the stack, is provided by modern rich internet applications (RIAs). 
The highest quality RIAs follow a model-view-controller pattern that decouples data access and business logic from data presentation and user interaction, and are ideally suited for distributed and scalable solutions. The resource access required is provided through simple REST interfaces, where resource requests may be mediated by any number of connectors (for example, clients, servers, caches, tunnels), but each does so without seeing past its own view of the resource identifier and action requested. An application does not need to know whether there are caches, proxies, gateways, firewalls, tunnels, or anything else between it and the server actually holding the information, although it does need to understand the representation of the returned content, whether an HTML or XML document, an image, or anything else. The historic stack did not include search. However, information integration through search can be a nimble and relatively lightweight approach for many use cases. First, index the scattered content to make it accessible and searchable, then analyze and enrich it with meaning through content analysis. Add an enterprise search and query broker to unify searches over multiple local sources, to prevent unnecessary re-indexing, and to honor security policies, and this goes a long way toward piloting a first EIM solution. On the other hand, data analysis scenarios often join together information from disparate sources, where broad data sets need to be grouped and summed. Enterprise search, despite its global reach and utility at ferreting out relevant information, is not strong at joining massive amounts of data among sources. This makes the case for systems such as XML repositories that can collect, manage, locally index, and search data that originated from separate sources. 
XML repositories can function within the confines of data normalization approaches (mandated by the relational approach), but can also be queried more flexibly by XQuery without being bound to a specific schema. These capabilities may also come from RDF triple stores and other knowledge repositories that provide XML and semantic views through which to query their contents. Grid operating environments are another new option for accelerating processes ranging from information transformation and integration to rapid query processing, master data management, and information as a

21 Gilpin, Mike; Yuhanna, Noel. “Information-As-A-Service: What’s Behind This Hot New Trend?” Forrester Research Report, March, 2007.
22 Dyche, J., Levy, E. “Not Your Father’s List Management: MDM Matures, Part 1,” DM Review, March, 2008.


service. While a grid operating environment is not strictly necessary for EIM, the allocation of processing, storage, and other resources in such an environment is flexible and cost-effective. Its high capacity and scalability directly address the explosion in enterprise information and the resource-intensive tasks required to manage this information. The term cloud computing refers to a subset of grid computing where the grid operating environment offloads services from local servers and personal devices. Finally, an emerging area of the expanded EIM stack is information governance, risk, and compliance (GRC). GRC is a policy-driven methodology, increasingly supported by software, which promotes enterprise business viability by unifying corporate strategy, information retention and destruction, compliance with external regulations and internal business rules, security, opportunity discovery, and loss mitigation. The intent is to standardize and automate processes and policies, regardless of underlying implementations, and to integrate technology across business systems and IT infrastructure. One important aspect of GRC is information rights management—a set of services to protect and control the dissemination of sensitive content such as documents and e-mail inside and outside the enterprise firewall. In a model-based approach to GRC, the perceived state of enterprise information systems can be compared to the desired state, with mitigation of differences to close the loop and thereby comply with information policies. In addition to the legal and strategic aspects of GRC, there are several purely practical aspects related to data management. Given that much data ingested by the enterprise needs to be retained for a long time, but still accessed efficiently when needed, minimizing operating costs for these tasks is paramount. 
Technology around compression, data range partitioning, caching, materialized views, data archiving, and, more broadly, information lifecycle management (ILM) becomes critical to making GRC affordable. Applying policies to information abstractly according to metadata describing its business value, type, content, owner, currency, immutability, and other attributes represents another dimension of information virtualization. Ontologies also hold strong promise for GRC, given their ability to express unified domain semantics, constraints, and rules that might otherwise be distributed across separate policies.

The six tables that follow illustrate and clarify the expanded EIM stack. New requirements are matched to examples and to enabling technologies. Of course, your version of the new stack will not include every single component (as discussed in the section “Gaining traction—practical starting points for EIM”). The stack (and corresponding tables) breaks down into the following areas:
• Structured information access
• Unstructured and semi-structured information processing
• Business analysis and applications
• Coordination and integration
• Information governance and management
• Grid operating environment


Table 1. Structured information access

EIM requirement | Example | Enabling technologies

Optimize federated, metadata-driven queries against a wide set of structured information sources, producing near-real-time views | Count how many of a given product are in inventory in a multi-vendor supply chain | Enterprise information integration (EII)

Invoke legacy product capabilities and access application data to create composite applications that support business processes | Notify marketing to follow up with e-mails and phone contacts when new product brochures are downloaded from the corporate website | Enterprise application integration (EAI)

Accelerate complex query processing against large amounts of relational data | How much would revenue over the prior year have increased if customer discounts in the range of 10%-20% had been reduced by 5% for reorders only? | Columnar DBMSs and other new approaches to relational database management (RDBMS) and data warehousing
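To make the federated-query row of Table 1 concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular EII product works: the three in-memory SQLite databases stand in for separate vendor inventory systems, and the names `make_source`, `federated_count`, the `inventory` table, and the SKU values are all hypothetical.

```python
import sqlite3


def make_source(rows):
    """Create an in-memory SQLite database standing in for one vendor's system."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE inventory (sku TEXT, quantity INTEGER)")
    conn.executemany("INSERT INTO inventory VALUES (?, ?)", rows)
    return conn


def federated_count(sources, sku):
    """Push the same aggregate query to every source and combine the partial sums."""
    total = 0
    for conn in sources.values():
        (qty,) = conn.execute(
            "SELECT COALESCE(SUM(quantity), 0) FROM inventory WHERE sku = ?",
            (sku,),
        ).fetchone()
        total += qty
    return total


# Hypothetical vendors and stock levels, for illustration only.
sources = {
    "vendor_a": make_source([("WIDGET-1", 40), ("GADGET-7", 5)]),
    "vendor_b": make_source([("WIDGET-1", 25)]),
    "vendor_c": make_source([("GADGET-7", 12)]),
}

print(federated_count(sources, "WIDGET-1"))  # prints 65
```

The sketch assumes every source exposes the same schema. In practice, an EII layer would also need metadata to map differing schemas and to decide which sources to query, which is omitted here.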


Table 2. Unstructured and semi-structured information processing

EIM requirement | Example | Enabling technologies

Create, capture, store, publish, search, personalize, and present unstructured and semi-structured information such as text, images, compound documents, audio, and video | Capture and route insurance claims forms, and track related multimedia attachments such as photos | Enterprise content management (ECM)

Transform content, including rich digital media, into formats for multiple delivery channels such as the web, print, mobile phones, and video broadcast | Publish a product catalog including videos or photos in multiple formats simultaneously, including web and print | Enterprise content transformation (ECT)

Discover and access heterogeneous content from multiple repositories and content providers, leveraging local indexing, search and query capabilities; organize results for users | Connect knowledge professionals in a commercial law firm to legal content from diverse sources through a common interface | Enterprise content integration (ECI) and enterprise search and query broker (ESQB)

Parse, structure, categorize, and cluster text; extract facts, concepts, relationships, sentiments, and taxonomies from it; also derive semantics from photos, video and audio, including objects, events, and behavior patterns | Tag and summarize incoming RSS news feeds to detect potential merger and acquisition activity | Content analytics (CA)

Express queries against information stored in XML documents or in RDF triples (whether the data is natively stored as RDF or viewed as RDF via middleware); allow queries to be easily extended to unanticipated data sources and applications | In the procedure section of a specific surgery report, what instruments were used in the second incision? | XQuery and SPARQL, respectively

Use precise, machine-interpretable terminology about the semantics of entities and relationships in a domain to locate, transform, reason about, and act upon heterogeneous information | Construct a unified view of financial documents expressed in eXtensible Business Reporting Language (XBRL) [23], sourced from different countries | Ontologies and semantic web mechanisms

[23] http://www.xbrl.org/Home
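The surgery-report example in Table 2 can be sketched as a path query over XML. The sketch below uses the limited XPath subset in Python's standard `xml.etree.ElementTree` module rather than a real XQuery engine, so it is a stand-in for the technique, not an XQuery implementation. The report structure (`surgeryReport`, `section`, `incision`, `instrument`) is a hypothetical schema invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical surgery report; element and attribute names are assumptions.
REPORT = """
<surgeryReport>
  <section name="history"><note>No prior surgery.</note></section>
  <section name="procedure">
    <incision><instrument>scalpel</instrument></incision>
    <incision>
      <instrument>retractor</instrument>
      <instrument>forceps</instrument>
    </incision>
  </section>
</surgeryReport>
"""


def instruments_in_incision(xml_text, n):
    """Return the instruments listed under the n-th incision of the procedure section."""
    root = ET.fromstring(xml_text)
    # Positions in the path are 1-based: incision[2] is the second incision.
    path = f"./section[@name='procedure']/incision[{n}]/instrument"
    return [element.text for element in root.findall(path)]


print(instruments_in_incision(REPORT, 2))  # prints ['retractor', 'forceps']
```

The point of the example is that the query addresses the document by its structure (section, then incision position) rather than by keyword, which is what distinguishes this kind of access from full-text search.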


Table 3. Business analysis and applications

EIM requirement | Example | Enabling technologies

Define, execute, monitor, and optimize transactional and collaborative business processes consistently and reliably across organizations, systems, and applications | Assign and track telecom service requests to minimize delays, lower costs, and improve customer satisfaction | Business process management (BPM)

Enhance business management and performance improvement through analysis of historical, current, and predictive views of business information derived from multiple information sources | Why were Western region customer reorders down in Q4? | Analytic business intelligence (BI)

Drive and optimize business operations and decisions on a daily or hourly basis, anticipating problems and heading them off through a near-real-time view of operational information and a set of applicable business rules | What rerouting of transport aircraft will best speed the handling of a sudden surge in package shipments to Europe? | Operational business intelligence (BI)

Handle essential business processes for enterprise resource planning (ERP), including supply chain management (SCM) and customer relationship management (CRM), which require operational and analytic access to unified databases and repositories | Oversee and track materials, information, and financials through the supply chain from multiple electronics component suppliers to a computer manufacturer (and onwards through wholesalers to retailers to consumers) | Application suites (ERP, SCM, CRM, and so on)

Search, filter, cluster, and navigate a heterogeneous set of information from multiple sources to enhance knowledge and to establish interrelationships, trends, and significant patterns | A financial “watchdog” application detects money laundering and possible financial fraud | Search-based applications

Capture, process, deliver, and archive the paper and electronic documents that drive mission-critical business processes such as invoice processing, incoming purchase orders, and claims handling | Manage the process workflow of invoices for commercial property leases and facilitate their retention and retrieval | Transactional content management (TCM)


Table 4. Coordination and integration

EIM requirement | Example | Enabling technologies

Access, modify, analyze, and govern information sources through a service interface that hides details and permits new applications to be constructed rapidly | Quickly connect a new financial planning application to existing corporate databases and external information sources | Information service management (based on SOA), sometimes called information as a service (IaaS)

Manage and distribute common reference data and global identifiers as an information service including standardizing, cleansing, matching, linking, and synchronizing data (via change data capture) from multiple information systems | Does an online purchaser correspond to an existing customer in our database? | Master data management (MDM)

Discover and access content across the enterprise to be indexed, searched, and displayed; interpret search queries and coordinate access to information sources managed by multiple repositories and content managers | Create a customer service knowledge portal to enable access to diverse internal information about best practices and solutions to customer issues | Enterprise search and query broker

Achieve a multistep process, potentially with decision points and alternate flows, by planning, initiating, and coordinating the activities of service providers | Manage the creation from genome sequences of an energy research knowledge base to understand the functions of microbes and communities in the environment; track modeling, experimentation, sample handling, and analysis | Workflow orchestration

Automatically filter, classify, and correlate events into significant patterns in near real time, and, based on business rules, orchestrate a response to the issue or opportunity discovered | Monitor event clouds in data center operations to discover root causes of problems, anticipate and head off issues, and automate responses | Complex event processing (CEP)

Take advantage of community knowledge and analysis of information, from ratings and social tags to pointers to relevant discussions | What information sources have physicians found credible for pharmaceutical trial results? | Collaborative applications including social networking
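The master data management example in Table 4 (“Does an online purchaser correspond to an existing customer in our database?”) comes down to standardizing records and then matching them. The following Python sketch shows one simple way this might be done, using exact e-mail comparison plus fuzzy name similarity from the standard `difflib` module. The record fields, the function names, and the 0.85 threshold are assumptions chosen for illustration; production MDM systems use considerably richer matching rules.

```python
from difflib import SequenceMatcher


def normalize(record):
    """Standardize a record: lowercase, drop periods, collapse whitespace."""
    name = " ".join(record["name"].lower().replace(".", "").split())
    email = record["email"].strip().lower()
    return name, email


def match_score(a, b):
    """Score two records: identical non-empty e-mails win, else name similarity."""
    name_a, email_a = normalize(a)
    name_b, email_b = normalize(b)
    if email_a and email_a == email_b:
        return 1.0
    return SequenceMatcher(None, name_a, name_b).ratio()


def find_existing_customer(purchaser, customers, threshold=0.85):
    """Return the best-matching customer, or None if no score reaches the threshold."""
    best = max(customers, key=lambda c: match_score(purchaser, c), default=None)
    if best is not None and match_score(purchaser, best) >= threshold:
        return best
    return None


# Hypothetical master customer records.
customers = [
    {"id": 1, "name": "Maria Gonzales", "email": "mgonzales@example.com"},
    {"id": 2, "name": "John Q. Smith", "email": "jsmith@example.com"},
]

purchaser = {"name": "Jon Q Smith", "email": ""}
print(find_existing_customer(purchaser, customers))
```

Here the purchaser has no e-mail on file and a misspelled first name, yet still resolves to customer 2 on name similarity. The threshold trades false matches against missed matches and would need tuning against real data.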


Table 5. Information governance and management

EIM requirement | Example | Enabling technologies

Enforce security (authentication, authorization, and accounting) for access to databases, repositories, file systems, and information services; ensure that security is not compromised during subsequent information integration and dissemination | What users in what roles are able to access enterprise digital assets, such as film masters and original recordings? | Security policies and mechanisms for policy administration and enforcement

Enable application developers to discover and consume information services; information integration processes to locate, transform and combine information; end users to identify, characterize and interrelate relevant content; and enterprises to gather, organize, share, and analyze their knowledge of information resources | Aggregate manufacturing costs across multiple, loosely coupled systems for individual component design | Metadata management and knowledge management

Promote enterprise business viability by unifying corporate strategy, information retention and destruction, compliance with external regulations and internal business rules, opportunity discovery, and loss mitigation. Standardize and automate processes and policies; integrate technology across business systems and IT infrastructure. | Address all compliance and ethics requirements at a financial services company in a unified way, including risk assessment and incident investigation | Information governance, risk, and compliance (GRC), including privacy and information rights management (IRM)

Table 6. Grid operating environment

EIM requirement | Example | Enabling technologies

Provide scalable, high-performance information ingestion, cleansing, standardization, transformation, and reconciliation | Improve marketing to telecom customers by analyzing recent transaction details and adjusting offers and pricing in near real time | Ingestion and transformation capabilities engineered for grid operating environments (including cloud infrastructures)

Provide scalable, high-performance information analysis and mining | Support sophisticated reporting across multiple information sources to characterize customer segment migration patterns | Analysis and master data management engineered for grid operating environments

Access information sources through a high-performing service interface that virtualizes details and permits new applications to be constructed rapidly | Build a new application at an aerospace company to assess changes to total aircraft weight based on current design plans scattered throughout multiple systems | IaaS engineered for grid operating environments


Gaining traction: practical starting points for EIM

Depending on your goals, use cases, business processes, and existing investments in IT infrastructure, there are multiple approaches to integrating and analyzing heterogeneous information. You certainly don’t have to implement every component in the EIM stack quickly, nor could you! Your most pressing concern can help narrow down the starting point for an enterprise information management solution. This might be cost reduction from automating manual processes for information integration, or revenue enhancement from the complete and timely information that leads to better decisions in a particular line of business. It might be the opportunities stemming from increased business agility in assembling and delivering new information combinations, or the assurance of compliance with stringent regulations. Or, a particular use case may be the driving force.

It doesn’t help to invest in dazzling analytics tools until information integration is under control. Nor would it be prudent to focus on, say, SPARQL queries against RDF stores if the pressing business problem is to speed development of new business applications against a combination of existing information sources.

My suggestion is to start with a pilot project and build on its success. Begin by integrating key information sources and add more over time. Strive for quality of integration, not quantity. Look for a combination of significant business value, demonstrable financial impact, and manageable complexity. But keep the end goal in mind, so that stepwise progress in architectural and process maturity will extend and scale to a broader solution. In particular, this means focusing on foundational capabilities for content management, master data management, information services for applications, information governance, and end-user access and analysis.

Here are some practical starting points for EIM, grouped into five areas:

IT management

• Use the EIM stack view as a catalyst for IT stakeholder discussions on the differences among existing information management architectures, policies and processes, and on how to improve information integration and system interoperability

• Reconcile, manage, and distribute common reference data as an information service, using a scalable master data management (or customer data integration) system

• Deploy complex event processing to react to operational events, and head them off if desired, by automatically filtering, classifying, and correlating low-level events into significant patterns in near real time; orchestrate a rules-based response to the issue or opportunity discovered

• Unify information for data center management, through an indexing and search approach or a federated configuration management database
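The complex event processing item above describes a filter, correlate, and respond cycle. The Python sketch below shows one minimal way to express that cycle: it filters for a single event type, correlates occurrences per host within a sliding time window, and returns a rule-based response when a threshold is reached. The event fields (`ts`, `host`, `type`), the `disk_error` type, the thresholds, and the `open_ticket` action are all hypothetical; a real CEP engine would offer a pattern language rather than hand-written classes.

```python
from collections import defaultdict, deque


class FailureBurstDetector:
    """Correlate low-level events into a per-host 'burst' pattern within a time window."""

    def __init__(self, threshold=3, window_seconds=60):
        self.threshold = threshold
        self.window = window_seconds
        self.recent = defaultdict(deque)  # host -> timestamps of recent matching events

    def on_event(self, event):
        # Filter: ignore event types this rule does not care about.
        if event["type"] != "disk_error":
            return None
        times = self.recent[event["host"]]
        times.append(event["ts"])
        # Correlate: keep only timestamps that fall inside the sliding window.
        while times and event["ts"] - times[0] > self.window:
            times.popleft()
        # Respond: a simple business rule triggers once the pattern is complete.
        if len(times) >= self.threshold:
            times.clear()
            return {"action": "open_ticket", "host": event["host"], "at": event["ts"]}
        return None


# Hypothetical event stream from data center monitoring.
events = [
    {"ts": 0, "host": "db1", "type": "disk_error"},
    {"ts": 10, "host": "db1", "type": "login"},
    {"ts": 20, "host": "db1", "type": "disk_error"},
    {"ts": 30, "host": "web1", "type": "disk_error"},
    {"ts": 50, "host": "db1", "type": "disk_error"},
    {"ts": 200, "host": "web1", "type": "disk_error"},
]

detector = FailureBurstDetector()
responses = [r for r in map(detector.on_event, events) if r is not None]
print(responses)  # prints [{'action': 'open_ticket', 'host': 'db1', 'at': 50}]
```

Only `db1` produces a response, because its three disk errors fall within 60 seconds. The two `web1` errors are 170 seconds apart and never form a pattern.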

Governance, risk, and compliance

• Improve security through integrated capabilities for authentication, authorization, and auditing across enterprise information systems

• Introduce information classification and eDiscovery capabilities to improve understanding of information assets and their content for information lifecycle management and legal compliance

• Protect and control sensitive documents and e-mail inside and outside of the enterprise firewall through information rights management

• Address corporate information governance, risk, and compliance issues by unifying classifications, topologies, and events under a policy- and model-based approach spanning infrastructure management disciplines
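A policy- and model-based approach to GRC, as described earlier in this paper, compares the perceived state of information systems to the desired state and mitigates the differences. The Python sketch below illustrates that loop in its simplest form. The system names, the policy settings (`encrypted`, `retention_years`), and both function names are hypothetical, and real mitigation would involve far more than overwriting a dictionary entry.

```python
# Desired state: hypothetical policies, one entry per information system.
DESIRED = {
    "hr_records": {"encrypted": True, "retention_years": 7},
    "email_archive": {"encrypted": True, "retention_years": 3},
}


def plan_mitigations(desired, observed):
    """List (system, setting, found, wanted) for every deviation from policy."""
    actions = []
    for system, policy in sorted(desired.items()):
        actual = observed.get(system, {})
        for setting, wanted in sorted(policy.items()):
            found = actual.get(setting)
            if found != wanted:
                actions.append((system, setting, found, wanted))
    return actions


def apply_mitigations(observed, actions):
    """Close the loop by bringing each deviating setting to its desired value."""
    for system, setting, _found, wanted in actions:
        observed.setdefault(system, {})[setting] = wanted


# Perceived state, as it might be reported by monitoring (hypothetical values).
observed = {
    "hr_records": {"encrypted": True, "retention_years": 5},
    "email_archive": {"encrypted": False, "retention_years": 3},
}

actions = plan_mitigations(DESIRED, observed)
print(actions)
apply_mitigations(observed, actions)
print(plan_mitigations(DESIRED, observed))  # prints []
```

The first call reports two deviations (unencrypted e-mail archive, short HR retention). After mitigation the second call finds none, which is the “closed loop” condition.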


Content management

• Expand a set of established, structured information sources to incorporate unstructured and semi-structured information through content management, transformation, and integration capabilities

• Capture, process, deliver, and archive the paper and electronic documents that drive a mission-critical business process through transactional content management

• Enrich unstructured or semi-structured content for search by applying content analysis to tag, classify, and cluster it

• Deploy an enterprise search and query broker to unify searches over multiple sources

• Integrate multiple information sources through XML representations, which may be stored in an XML repository (or RDBMS) and queried using XQuery

Application support

• Virtualize access to existing enterprise information to speed development of new business applications, through an information service management layer, often called an information service bus or information as a service (IaaS)

• Access information in near real time through the federated queries of enterprise information integration (EII) to improve decision making in business intelligence applications

• Accelerate the processing of complex SQL queries with a columnar DBMS appliance to improve analysis speed and depth

• Develop or apply a domain-specific ontology to locate and reconcile content by interpreting incompatible and overlapping vocabularies across information sources in areas such as bioinformatics, manufacturing, and anti-terrorism (may use an RDF or other semantic repository)

End-user capabilities

• Unify access to information in a vertical area (such as health care) through an enterprise search-based application; include access to new external content sources

• Allow IT and end users to combine and visualize information quickly and serendipitously from varied sources through metadata-driven mashups (usually fed by information services), and to share these individualized mashup views

• Improve end-user capabilities to navigate, analyze, and visualize information through new front-end tools and applications

Conclusion

What’s the bottom line? Enterprises need to integrate data and content from multiple sources to deliver consistent, timely, and meaningful information to their business processes. This is critical to reducing costs, increasing corporate revenue, and improving business agility, while maintaining regulatory compliance. This white paper described a spectrum of information integration scenarios, and detailed the complex technical challenges they bring. EIM is a strategic combination of interoperating components and services that can meet these challenges, from near-real-time information access and semantically-driven content integration to information as a service.

While the big picture can be complex, it makes sense to begin by integrating information sources key to the business and to add more over time. Look for a combination of significant business value and manageable complexity. But keep the end goal in mind, so that stepwise progress in architecture will extend and scale to a broader solution. Depending on your business processes and existing investments in IT infrastructure, practical starting points for EIM design and implementation include foundational capabilities in the areas of content management, master data management, information services for applications, information governance, and end-user access and analysis. New directions to consider are an XML repository, information service bus, enterprise search, mashups, query acceleration, a grid operating environment, and information rights management.

Acknowledgements

I deeply appreciate the detailed knowledge, helpful insights, cogent suggestions, and ongoing encouragement I received in discussions about enterprise information management with my EMC colleagues, and with EMC customers and partners. My heartfelt thanks to EMC CTO Jeff Nick, and to Razmik Abnous, Glenn Arnold, David Black, Marc Brette, Ed Bueché, Pierre-Yves Chevalier, David Choy, Matt Coblentz, Jacques Conan, John Conte, Cornelia Davis, John Field, Patricia Florissi, Bob Goldsand, Steve Graham, Michael Hackney, Heather Healy, Kristen Hughes, Dan Hushon, Burt Kaliski, Una Kearns, Roger Kilday, Leo Leung, Zongliang Li, Su Lim, Tom Maguire, Yves Mahe, Susan McKay, Wayne Pauley, Alex Rankov, Victor Spivak, Edgar St Pierre, Hans Timmerman, Kevin Walter, Fred Wild, and Tyler Yuniarto. Michelle Kerby and Mac McClelland contributed greatly as editors.
