IBM Software Group | DB2 Information Management Software

Date post: 26-Jan-2015
IBM Software Group

1

Nigel Freeman, Content Discovery specialist - IBM Software Group

[email protected]

May 2006

Information is Everywhere: Managing Information for Discovery and Search

2

Agenda

Too much information – drowning or swimming?

IBM is going beyond mere ‘search’…
IBM Content Discovery architecture

Content Integration services: making connections between existing systems
Information Integration Content Edition – overview

Enterprise Search: not the same as Internet search
What do you need from Enterprise Search and text analytics middleware?
OmniFind – overview

Text Analysis: Unstructured Information Management Architecture (UIMA)

Contextual Delivery, Information Accelerators to generate customer solutions
WebSphere Content Discovery Server – overview

IBM Content Discovery products, summary

Customer Examples

3

Drowning in information, or swimming?

Organisations today are faced with an ever-growing abundance of information.

The lack of proper systems to access and manage their collective wisdom can cripple an organisation: not being able to find the relevant information when it is needed, or finding it too late, translates into bad decisions, missed opportunities, and time and money wasted reinventing information that already exists.

“It is clear that we are all drowning in a sea of information. The challenge is to learn to swim in that sea, rather than drown in it.” (from a study by University of California, Berkeley School of Information Management and Systems)

By implementing cutting-edge systems for organizing and accessing information, organisations can promote growth at significantly reduced cost.

“An enterprise with 1,000 knowledge workers wastes $48,000 per week – $2.5 million per year – due to an inability to locate and retrieve information.” The High Cost of Not Finding Information, IDC
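The IDC figures can be checked with simple arithmetic. The sketch below only restates the numbers quoted above; the function names are illustrative, and the 52-week year is an assumption.

```python
WEEKS_PER_YEAR = 52  # assumption: the annual figure is the weekly figure times 52


def annual_cost(weekly_cost: float, weeks: int = WEEKS_PER_YEAR) -> float:
    """Scale a weekly cost up to a yearly cost."""
    return weekly_cost * weeks


def cost_per_worker(total_cost: float, workers: int) -> float:
    """Spread a total cost evenly across a number of workers."""
    return total_cost / workers


weekly = 48_000   # dollars per week, from the IDC quote
workers = 1_000   # knowledge workers, from the IDC quote

yearly = annual_cost(weekly)
print(f"Annual cost: ${yearly:,.0f}")  # roughly the quoted $2.5 million
print(f"Per worker per week: ${cost_per_worker(weekly, workers):,.2f}")
```

On these assumptions the quoted weekly figure works out to $48 per knowledge worker per week.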

IBM w3 advertisement “w3 personalisation…”

4

Information is isolated in multiple silos …

Independent Systems

Customer Service

Council Tax

Social Services

Education

Leisure Services

Planning

Housing

The problem…

5

… and the vast majority is unstructured

Examples
• Office Documents
• Images
• Web pages
• E-mail
• Audio & Video
• Free-form text fields (comments/notes)

Where It Exists
• File servers
• Websites
• Portals
• ECM systems
• Collaborative systems
• Databases (BLOBs and free-form text fields)

6

Typical search experience is not good enough

“Loan”

I need help finding a loan for college

Typical Online Experience

Burden of discovery is on the end user!

7

There is inherent tension between business and IT

Line-of-Business Owners and Project Leads
• Must deliver information to their specific customers, partners and employees to facilitate business process
• Care most about best of breed functionality and direct control over the end user experience

IT Architects and CIOs
• Must make information available from across the enterprise in a secure and standard format
• Care most about achieving leverage and reuse, with a low total cost of ownership

Search App 1 Search App 2 Search App 3

Enterprise Search Infrastructure

8

The IBM Approach: Content Discovery

Information is isolated in multiple silos

Native, bi-directional access ensures all assets are available and content can be continually improved

Much of it is unstructured, limiting its use

Uncovering the inherent meaning of unstructured content can enhance search relevance, giving new levels of business insight

Traditional search is a bottleneck to facilitating action

Understanding user intent and application context allows organizations to get the right information to the right people at the right time

IT wants standards but business wants control

Complete solutions built on a Service Oriented Architecture allow organisations to balance the needs of business and IT

Going Beyond “Search” to “Find”

9

Content Discovery

Analysis & Discovery Services

IBM Content Discovery Architecture

Content Integration Services

Information Accelerators

Search & Indexing

Text Analysis (UIMA)

Contextual Delivery

Extract knowledge and meaning, for greater relevance and insight

Industry vocabularies and solution templates shorten deployment time

Broad content access and native integration for secure read and write access

Scalable search capability with sophisticated indexing and retrieval

Understand user intent and context, to guide action and navigate large result sets

10

Content Discovery

Analysis & Discovery Services

Content Integration Services

Information Accelerators

Search & Indexing

Text Analysis (UIMA)

Contextual Delivery

11

The Problem: Multiple Silos of Content

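[Pie chart: respondents by number of content repositories. Categories: 1 repository, 2-5 repositories, 6-10 repositories, 10-15 repositories, more than 15 repositories, don't know. Values shown: 36%, 14%, 25%, 17%, 5%, 4% (value-to-category pairing lost in extraction)]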

Survey base: 81 North American decision-makers(multiple responses accepted)

“The Future of Content in the Enterprise,” Connie Moore and Robert Markham

12

WebSphere II Content Edition

SOA, enterprise-class integration architecture for “content”

Single interface to multiple content sources and workflow systems

Many “out of the box” connectors and toolkit for custom connectors

Two-way access to expose underlying functionality

Adds cross-repository services such as federated search, event services, single sign-on, etc

“Out of the box” client, development components and APIs for building custom applications

CALL CENTER, COMPLIANCE, SELF-SERVICE, CRM, WEBSITES

Lets you work with content from multiple disparate content sources, as if it were stored in one unified system

13

Display associated metadata with the ability to preview a document and update content or properties

Provide a single point of access to all documents associated with the customer, regardless of where they are stored

Content Integration Services: Seamless Access to Distributed Content from Business Applications

14

WebSphere II Content Edition Integration Services: Many Out-of-the-Box Connectors

Pre-built and fully supported real-time, bi-directional connectors

Exposes content, workflow and functionality of underlying systems

Available for most major commercial systems, including…

Connector SDK for custom systems

INTEGRATION SERVICES

Documentum Content Server, FileNet Content Services, FileNet Image Services, FileNet P8 Content Manager, FileNet P8 Business Process Manager, Hummingbird DM, IBM Content Manager, IBM Content Manager OnDemand, IBM Portal Document Manager, Lotus Domino Document Manager, IBM Lotus Notes, IBM WebSphere MQ Workflow, Interwoven Teamsite Content Server, Microsoft Index Server, OpenText Livelink Enterprise Server, Stellent Content Server, File Systems, Lab Services, Partner Connectors

15

WebSphere II Content Edition Federation Services

Metadata Mapping
• Common schema across different systems

Federated Search
• Single search interface across multiple disparate systems

Virtual Repository
• Single, unified view of distributed content
• Consolidated view of work tasks from multiple workflow systems

Subscription Event Services
• Subscription-based notification of changes to content, across multiple repositories

View Services
• Convert content on-the-fly to browser-readable formats (e.g. PDF, HTML)

Single Sign-On (SSO) authentication
• Native, and integration with LDAP and Active Directory

INTEGRATION SERVICES

FEDERATION SERVICES
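The idea behind metadata mapping and federated search can be sketched in a few lines. This is a minimal illustration, not the WebSphere II Content Edition API: the `Repository` class, the field names and the sample documents are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Repository:
    """A hypothetical content source with its own native field names."""
    name: str
    field_map: dict[str, str]   # native field name -> common schema field
    documents: list[dict]

    def search(self, term: str) -> list[dict]:
        """Return matching documents, translated into the common schema."""
        hits = []
        for doc in self.documents:
            if any(term.lower() in str(value).lower() for value in doc.values()):
                mapped = {self.field_map.get(k, k): v for k, v in doc.items()}
                mapped["source"] = self.name  # remember where the hit came from
                hits.append(mapped)
        return hits


def federated_search(repos: list[Repository], term: str) -> list[dict]:
    """One query, fanned out to every repository, merged into one result list."""
    results = []
    for repo in repos:
        results.extend(repo.search(term))
    return results


repo_a = Repository(
    "repo_a",
    {"doc_title": "title", "cust": "customer"},
    [{"doc_title": "Loan agreement", "cust": "Smith"}],
)
repo_b = Repository(
    "repo_b",
    {"Name": "title", "Client": "customer"},
    [{"Name": "Loan application", "Client": "Jones"},
     {"Name": "Invoice", "Client": "Smith"}],
)

results = federated_search([repo_a, repo_b], "loan")
for hit in results:
    print(hit["source"], hit["title"], hit["customer"])
```

Because every hit is expressed in the same schema (`title`, `customer`), the calling application does not need to know which repository a document came from.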

16

WebSphere II Content Edition Developer Services

Federated Client
• Complete out-of-the-box UI for working with distributed content
• Includes key functionality and a highly usable interface

Web Components
• Accelerates time to market for custom applications
• Development components plug into web applications
• Completely customizable look and feel
• Includes JSR 168 compliant portlets

WebSphere II Content Edition API
• Complete access to content and workflow functionality
• Easy to use Java API and SOAP-based Web Services API

INTEGRATION SERVICES

DEVELOPER SERVICES

FEDERATION SERVICES

17

IBM Federated Records Management
…the application of records management to distributed content

Consists of IBM DB2 Records Manager, WebSphere II Content Edition, FRM Solution Components*

Key Features
• Central policy management on distributed content
• “Touchless” records declaration
• Federated search for discovery operations
• Two-way, consistent UI to content systems

Business Value
• Reduce risk with centralized RM policies
• Accelerate time to compliance
• Reduce discovery costs
• Consolidate over a phased timeframe
• Provide a “future proof” infrastructure

[Diagram: DB2 Records Manager connected to content repositories (DCTM, FILE, OTEX, HUMC, … other content repositories …) and DB2 Content Manager. 1: Leave records in native repository. 2: Move records to strategic repository at declaration.]

*Services Offering

18

Content Discovery

Analysis & Discovery Services

Content Integration Services

Information Accelerators

Search & Indexing

Text Analysis

Contextual Delivery

19

OmniFind: it’s not Google… because Intranet Search is different from Internet Search

Corporate intranets are smaller … but it’s more difficult to return highly relevant results

• Less content in a corporate intranet … lower chance of a perfectly matching document

• Less well linked (fewer links and anchor text cues), so page ranking isn’t the answer

• The heterogeneous nature of the content (both in form and size) makes search precision difficult

20

Q26: For which solutions do you plan to keep your existing tool, and for which would you like the portal to provide?

* Base = Those with portal solutions implemented, planned or under evaluation.

Solution                                                    Intend to keep existing tool    Would like Portal to provide
Search                                                      32%                             68%
Content management                                          39%                             61%
Reporting                                                   40%                             60%
Authentication/single sign on                               41%                             59%
Process automation/workflow                                 42%                             59%
Collaboration                                               43%                             57%
Directory                                                   43%                             57%
Enterprise application integration (EAI)                    46%                             54%
Taxonomy                                                    52%                             48%
Activity Tracking                                           60%                             41%
Application server                                          63%                             37%
Desktop productivity (spreadsheet, word processing, etc.)   68%                             32%
Windows desktop                                             79%                             21%

Search and content management are the top two capabilities expected by 289 Portal customers

Reference: Enterprise Portal Purchase and Usage Characteristics, Final Report, META Group Multi-Client Study, November 2003

21

WebSphere II OmniFind Edition

Crawl Index Search

Excellent search quality

Complements and uses IBM’s offerings in portal, content management, and Information Integration

Crawls a broad range of enterprise data sources

Leverages systems’ own security mechanisms

Open architecture (UIMA) for text analytics and semantic queries

Rich multilingual capabilities

Keyword search

Semantic search

Text analysis

22

Key Technologies

Sources of Enterprise Content

Crawling: Scalable Web crawler, Data Source crawlers, Custom Crawlers

Parsing/Tokenizing: HTML / XML, 200+ Doc Filters, Advanced Linguistics

Categorization (optional)

Text Analytics: Partner Apps, UIMA

Indexing: Global Analysis, Static Ranking, Store

Searching: Dynamic & Admin-influenced ranking, Fielded Search, Parametric Search, Semantic search

Security

Search Applications

23

OmniFind Crawlers

Web content
• HTTP / HTTPS
• News groups (NNTP)
• WebSphere Portal portlets and Portal Document Manager

Collaboration
• Lotus Notes / Domino databases, Domino.Doc, QuickPlace
• MS Exchange public folders

Windows and Unix file systems
• Over 250 file formats: PDF, MS Word / Excel / PowerPoint, Lotus SmartSuite, etc.

Enterprise Content Management systems
• DB2 Content Manager
• Via WebSphere Information Integrator Content Edition: FileNet Content Services, FileNet P8, Documentum, Hummingbird DM, OpenText LiveLink, and more in future

Relational data sources
• DB2 family (DB2, Informix, DB2 for z/OS)
• WS Information Integrator relational data sources (Oracle, Informix, MS SQL Server, Sybase)
• Federated access to LDAP and JDBC

Data Listener API for custom crawlers

[Diagram: crawled sources: II Standard Edition, Content Manager, QuickPlace, Domino, Domino.doc, MS Exchange, Windows File System, Unix File System, Websites, Newsgroups, Data Listener, II Content Edition, SQL Server]

24

OmniFind Security

Security can be set at Collection level or Document level

OmniFind uses the application’s own Access-Control Lists for the following data sources:
• Lotus Notes / Domino
• Domino Document Manager
• QuickPlace
• WebSphere Portal Document Manager
• Portal pages
• FileNet CS
• Windows File System
• Documentum
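Collection-level and document-level security can be pictured as two filters applied to a result list. The following is a rough sketch under stated assumptions, not OmniFind's actual mechanism: `IndexedDoc`, `visible_results` and the sample principals are hypothetical names, and the ACL is assumed to have been copied from the source system at crawl time.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedDoc:
    """A search hit carrying the ACL taken from its source system (assumed)."""
    title: str
    collection: str
    allowed: set[str] = field(default_factory=set)  # users or groups with read access


def visible_results(hits, user_principals, allowed_collections):
    """Keep hits that pass both the collection-level and document-level check."""
    return [
        doc for doc in hits
        if doc.collection in allowed_collections   # collection-level security
        and doc.allowed & user_principals          # document-level security
    ]


hits = [
    IndexedDoc("Council tax FAQ", "public", {"everyone"}),
    IndexedDoc("Case notes", "social-services", {"social-workers"}),
    IndexedDoc("Planning appeal", "planning", {"planners"}),
]

for doc in visible_results(hits, {"everyone", "social-workers"},
                           {"public", "social-services"}):
    print(doc.title)
```

A user only sees a document when they may search its collection and the document's own ACL names them or one of their groups.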

25

Linguistic Support
• The document language is detected automatically and used for language-specific result filtering at search time.
• Language-specific base form computation (e.g. “mouse” for “mice”) is provided.
• Automatic language detection also works for Arabic, Hebrew, Hungarian and Turkish (but no base form support yet).

Basic Support
• Text is segmented using either white space information (for simple text languages) or n-grams (for complex text languages).
• If simple and complex script languages are mixed in one document, the best segmentation strategy (either white space or n-gram) is selected for each individual script range within the document.
• Basic support processing should work for all languages. No language limitation is built into OmniFind.
• IBM tests basic support for the following list of languages:

Simple Text Languages (STL)

Albanian, Bulgarian, Belarusian, Catalan, Croatian, Estonian, Hungarian, Icelandic, Indonesian, Kazakh, Latvian, Lithuanian, Macedonian, Malay, Romanian, Serbian (Cyrillic & Latin), Slovak, Slovenian, Turkish, Ukrainian

Complex Text Languages (CTL)

Arabic, Bengali, Gujarati, Hebrew, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, Telugu, Thai, Vietnamese

Language Support in OmniFind

OmniFind has Linguistic support for: Chinese (Simplified & Traditional), Czech, Danish, Dutch, English, Finnish, French, German, Greek, Italian, Japanese, Korean, Norwegian (Bokmal & Nynorsk), Polish, Portuguese, Russian, Spanish, Swedish

26

Search & Indexing Services: Simple “Google” Style Search for Enterprise Content

Out-of-the-box search application provides “Google”-style results list with paging
• relevancy ranking, date, field values
• site collapse
• customizable look and feel

Configurable ‘Quick links’ provide immediate access to predetermined relevant sites, documents or applications

Broad support for searching across enterprise content sources

“Did you mean?” synonym expansion provides one click access to other potentially relevant queries or can be used for spelling correction
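The spelling-correction use of “Did you mean?” can be approximated with Python's standard `difflib` module. This is a minimal sketch, assuming a small known vocabulary; the vocabulary, the cutoff value and the function name are hypothetical, and it says nothing about how OmniFind itself generates suggestions.

```python
import difflib

# Hypothetical vocabulary, e.g. terms taken from the search index.
VOCABULARY = ["loan", "mortgage", "college", "savings"]


def did_you_mean(query, vocabulary, cutoff=0.75):
    """Suggest a corrected query, or None when no word needs changing."""
    words = []
    changed = False
    for word in query.lower().split():
        if word in vocabulary:
            words.append(word)
            continue
        # get_close_matches returns the best candidates above the cutoff
        match = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        if match:
            words.append(match[0])
            changed = True
        else:
            words.append(word)  # leave unknown words untouched
    return " ".join(words) if changed else None


print(did_you_mean("colege loan", VOCABULARY))
```

Returning `None` for queries that need no correction lets the search page show the suggestion link only when there is something to suggest.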

27

Content Discovery

Analysis & Discovery Services

Content Integration Services

Information Accelerators

Search & Indexing

Text Analysis (UIMA)

Contextual Delivery

Unstructured Information Management Architecture (UIMA)

28

Text Analysis Services: Leveraging Knowledge Buried in Unstructured Information

Most BI implementations ignore knowledge buried within free form text
• They can only report on predefined structured data, such as problem codes…

Problem descriptions, technician comments, call center notes and customer correspondence can contain a lot of the supporting details required for true insights

29

Text Analysis Services: Extract Knowledge From Unstructured Information

Identify concepts, entities and facts buried in unstructured content
• Determine underlying issues or problems, parts referenced and actions from technician or customer service notes, customer surveys, consumer review sites and other sources

PART 1: Fuel Pump
PART 2: Fuel Filter
PART 3: Wiring Harness
PART 4: Wiring Harness Cover

PROBLEM 1: Corrosion; PART 3: Wiring Harness

ACTION 1: Replace; PART 1: Fuel Pump, PART 2: Fuel Filter

ACTION 2: Remove; PART 4: Wiring Harness Cover

Extracted knowledge can now be sent to a search engine, database or delivered as a service to rules processing engines and other business applications

Provide broader access through more simplified search and browse interfaces

30

Text Analysis Services: Leveraging Knowledge Buried in Unstructured Information

Report on facts extracted from unstructured information
• Show other parts referenced, underlying root problems or issues, and actions taken…
• Create alerts to be notified of specified findings or thresholds

Provide simplified search interface extending access to broader set of users
• Easily find information about claims involving a fuel pump…
• See all of the other parts, problems and actions referenced in the warranty claim

31

[Diagram: WebSphere II OmniFind Edition text analysis pipeline. Stages: Identify Language, Find Words & Roots, Categorization, Plug-In Annotators. Extracted Metadata and Facts feed the Search Index, Text Data Warehouse, Rules Engine, Search Application, Reports and ...any Application.]

UIMA

UIMA (Unstructured Information Management Architecture): a “plug and play” framework for advanced text analysis components

UIMA framework allows “Annotators” to add value to text
• find words specific to an industry, from dictionary or by rules
• add further information around these terms, like Latitude/Longitude for places
• allow indexed and annotated results to go to other processes / systems as well as to a Search Engine, for further analysis or semantic search
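The plug-in annotator idea can be illustrated with a small pipeline in which each annotator reads the text and adds typed annotations. This is a conceptual sketch only and does not use the UIMA API (UIMA itself is a Java framework); the `Annotation` class, both annotators, the dictionaries and the coordinates are hypothetical examples.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str                 # e.g. "PART" or "PLACE"
    text: str                 # the covered text
    start: int                # character offsets into the document
    end: int
    features: dict = field(default_factory=dict)


# Hypothetical industry dictionary, echoing the warranty-claim example.
PARTS = ["fuel pump", "fuel filter", "wiring harness"]
# Hypothetical gazetteer; coordinates are illustrative.
GAZETTEER = {"London": (51.5074, -0.1278)}


def part_annotator(text, annotations):
    """Dictionary lookup: mark industry-specific terms."""
    lowered = text.lower()
    for term in PARTS:
        start = lowered.find(term)
        while start != -1:
            end = start + len(term)
            annotations.append(Annotation("PART", text[start:end], start, end))
            start = lowered.find(term, start + 1)


def place_annotator(text, annotations):
    """Enrichment: attach latitude/longitude to known place names."""
    for place, (lat, lon) in GAZETTEER.items():
        start = text.find(place)
        while start != -1:
            annotations.append(Annotation("PLACE", place, start, start + len(place),
                                          {"lat": lat, "lon": lon}))
            start = text.find(place, start + 1)


def run_pipeline(text, annotators):
    """Run each plugged-in annotator in turn over the same text."""
    annotations = []
    for annotate in annotators:
        annotate(text, annotations)
    return sorted(annotations, key=lambda a: a.start)


note = "Replaced fuel pump and fuel filter at the London depot."
for a in run_pipeline(note, [part_annotator, place_annotator]):
    print(a.kind, a.text, a.features)
```

The resulting list of annotations is plain data, so it could be passed to a search index, a database or any other application, which is the point the slide makes about annotated results.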

32

Content Discovery

Analysis & Discovery Services

Content Integration Services

Information Accelerators

Search & Indexing

Text Analysis

Contextual Delivery

WebSphere Content Discovery Server (iPhrase)

WCDS demo on-screen “WCDS Self Service demo.exe”

33

Embed Rich HTML responses within result

Interactive promotion guides action

Understands user intent and provides actionable response

WebSphere Content Discovery for Self Service

34

Contextual Delivery Services: Integration into Contact Centres facilitates faster Problem Resolution

Launch query for possible resolutions directly from Siebel Call Center…

…leverage context and customer info to automatically find most relevant content

Return integration enables creation of new solutions based on findings

Enable agents to easily filter content by source, product and other attributes

35

Empower business managers to easily refine the end-user experience

Monitor end-user behavior and effectiveness of business rules

Contextual Delivery Services: Business User Control

36

IBM Product Offerings

WebSphere Content Edition
• Integrating Content from Multiple Sources into Business Applications
• Content Integration

WebSphere OmniFind Edition
• Infrastructure for Enterprise Search and Text Analytics
• Search & Indexing, Text Analytics

WebSphere Content Discovery Server
• Business Driven Search Applications
• Contextual Delivery

37

Customer Examples

Content Discovery

Analysis & Discovery Services

Content Integration Services

Information Accelerators

Search & Indexing

Text Analysis (UIMA)

Contextual Delivery

38

Growth through Acquisition

Wachovia improved business effectiveness and addressed compliance issues by providing an integrated view of all content

Challenge

Access and work with content from multiple repositories following mergers

Deliver repository independent customer service, brokerage and workflow applications

Benefits

Greater accessibility resulted in 50-fold increase in number of content retrievals

$2.3 million savings within 2 years for a 64% return on initial investment

$1 million savings for each additional business unit implementing content integration services

Business executives have immediate access to newly acquired systems

Content Integration

39

IFPMA makes it easier for doctors and patients to research clinical trial information worldwide

Challenge

Doctors and patients need to find info about all clinical trials sponsored by the pharmaceutical industry

Unstructured information from multiple companies and clinical trials registries

Benefits

Enables searching by disease area, medicine name or trial location

Recognizes medical and geographical synonyms across multiple languages, without manual indexing

Allows doctors and patients to find trials they can join and review summarized results

Search & Indexing

Text Analytics

40

CBI Engineering increased productivity by allowing employees to access Lotus Notes from their intranet search solution

Challenge

Need for improved search relevancy across file system and Lotus Notes to make engineers more productive

Must respect security already defined within Lotus Notes

Benefits

Common search framework for intranet, file system and Lotus Notes content

Engineers able to seamlessly access native Notes documents from intranet search results

Allowed CBI to provide broad content access while honoring stringent native repository security

Search & Indexing

41

IBM Workplace for Customer Support (Lotus Premium Support) increased customer satisfaction and productivity with Content Discovery

Challenge

Revitalize customer interest in using lower cost online support channel

Streamline customer self-sufficiency while continuing to deliver personalized service from IBM support staff

Benefits

Increased customer satisfaction through the delivery of relevant information in 3 clicks or less

Unified content from disparate repositories to simplify problem resolution

Enabled resolution of repetitive product problems in less than five minutes

Decreased number of problem management reports submitted

Personalization enables results to be automatically limited to customer owned products

Customers can escalate and preserve context

Enables searching across multiple content stores and easy user navigation

Contextual Delivery

42

Summary

Getting the right information to the right people at the right time is a key element of achieving Information On Demand

IBM is building this capability around a portfolio of
• Content Integration
• Text Analytics
• Search & Indexing
• Contextual Delivery
• Information Accelerators

IBM Content Discovery brings these capabilities together to help organizations drive measurable results for their business

43

Thank You

Any questions ?

44

The IBM Content Discovery software portfolio

WebSphere Content Discovery Server
• Allows organizations to … Quickly deploy business driven solutions that increase revenue and reduce support costs
• By providing … A rich understanding of user intent and application context to help people quickly find the information they need to make purchases, answer questions, and solve problems
• Example initiatives: eCommerce, Self-Service websites

WebSphere II OmniFind Edition
• Allows organizations to … Implement a single search architecture to underpin enterprise portal and BI initiatives
• By providing … Robust enterprise search capabilities and a text analytics foundation able to uncover the inherent meaning of large volumes of content from around the globe
• Example initiatives: Intranet Search, Issues Analytics

WebSphere II Content Edition
• Allows organizations to … Manage, leverage and extend their enterprise content without painful ripping and replacing
• By providing … Virtual access to dozens of content silos via a single interface to increase productivity, manage risk, and lower development costs
• Example initiatives: Records Management, M&A Content Migration

45

OmniFind - Linguistic Analysis

Linguistic processing when adding document to index
• Determines language of document
• Tokenizes text
• Creates index using tokens

Linguistic processing performed during search
• Query string segmented, analyzed, searched in index
• Stop word removal – removing “a”, “the”, etc.

Character normalization
• Normalization performed in Unicode
• Case normalization – finding documents with “USA” when searching with “usa”
• Umlaut normalization – finding documents with “schoen” when searching with “schön”
• Accent removal – finding documents with “é” when searching for “e”
• Other diacritics removal – finding documents with “ç” when searching for “c”
• Ligature expansion – finding documents with “Æ” when searching for “ae”
• Normalization works in both directions
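The character normalization steps listed above can be sketched with Python's standard `unicodedata` module. This is a minimal illustration, not OmniFind's implementation: the umlaut and ligature tables are assumptions covering only a few characters, and applying the same function to both indexed text and query terms is one simple way to make matching work in both directions.

```python
import unicodedata

# Assumed, deliberately small mapping tables for the example.
UMLAUTS = {"ä": "ae", "ö": "oe", "ü": "ue"}
LIGATURES = {"æ": "ae", "œ": "oe"}


def normalize(term: str) -> str:
    """Case-fold, expand umlauts and ligatures, then strip remaining diacritics."""
    # Case normalization ("USA" -> "usa"); NFC keeps accented letters precomposed
    term = unicodedata.normalize("NFC", term.casefold())
    # Umlaut normalization and ligature expansion
    for src, dst in {**UMLAUTS, **LIGATURES}.items():
        term = term.replace(src, dst)
    # Accent and other diacritics removal: decompose, then drop combining marks
    decomposed = unicodedata.normalize("NFD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(query: str, indexed: str) -> bool:
    """Both sides go through the same normalization, so order does not matter."""
    return normalize(query) == normalize(indexed)


print(normalize("schön"), normalize("É"), normalize("Æ"), normalize("ç"))
print(matches("usa", "USA"), matches("schoen", "schön"))
```

Note that this sketch maps “ö” to “oe” before accent stripping, so a search for “schon” would not match “schön”; whether both variants should match is a design choice the slide does not specify.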

46

OmniFind - Linguistic Analysis

Recognize documents in a wide range of languages:
Arabic, Chinese (traditional and simplified), Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hungarian, Italian, Japanese, Korean, Polish, Portuguese (Brazilian), Russian, Spanish, Swedish, Turkish

Dictionary-based linguistic support for documents in recognized languages
• Word segmentation
• Stemming: find “mice” when searching for “mouse”
• Break contractions into parts: make “wouldn’t” into “would” and “not”
• Clitics, a form of contraction: make “l’avenue” into “le” and “avenue”
• Recognize non-alphabetic characters as part of or separate from a lexical unit, e.g. URLs, dates
• Recognize abbreviations
• Recognize end of sentence for sentence segmentation

Basic support for documents not in a recognized language
• Word segmentation via white space or blanks, and n-gram segmentation
