Lotus Software – WebSphere Portal
B0F2 – Text Search and Portal Integration © 2004 IBM Corporation
IBM IT Training Services
IBM WebSphere Portal and Lotus Workplace Technical Symposium
Session Number: B0F2
Session Title: Text Search and Portal Integration
Speaker's e-mail: [email protected]
Aya Soffer, Manager, Search Technologies Dept.
Agenda
WebSphere Portal Search Engine (PSE)
o Overview and Architecture
o Main functions
o Usage Examples and Planning Guidelines
Common Components: Lotus Workplace Search Demo
Information Resources
Q & A
What is the Portal Search Engine (PSE)?
High-level functional overview
Administrator: indexing / collecting content and documents
o HTTP crawler
o Indexer component
o Text analysis functions (taxonomy, categorizer, language tools, summarizer)
o Simple workflow to control what gets indexed and how
End-user: search
o Web-style search
o High-precision relevance ranking
o Browse through the collection
General information
Originally developed by IBM Research in Israel
Proven technology base with emphasis on search quality
Backed by the joint Research and Software Group program – Institute for Search and Text Analysis
Full-text search technology
100% pure Java implementation
Suitable for server as well as client environments
Emphasis on highly accurate results: constantly benchmarked and evaluated via official forums such as TREC and INEX
Internal interfaces allow for convenient integration in IBM products and solutions
o Rich set of APIs suitable for simple and complex implementations
o Easy to customize and extend: adapt ranking formulas, extend built-in methods, add new document types
IBM strategic component
Portal Search Engine – where it is used
Portal Search Engine portlet application:
o Administer multiple indexes (collections), where each may include multiple sites
o End-user search portlet for both handling search requests and browsing through the documents in the collection
Integrated with Portal Document Manager (PDM)
Integrated with Lotus Workplace 1.1
Integrated with WebSphere Portal Content Publisher (WPCP)
New key features with WebSphere Portal Version 5
Taxonomies and categorization
A taxonomy is a hierarchical representation of a set of categories
It includes rules per category that are applied to a document through a categorizer
Two types of taxonomies are available
o A pre-defined taxonomy allowing for simple manipulation (like renaming categories and defining new categories)
o A rule-based taxonomy which can be built and defined by the user
Categorization – the process of assigning a document to one or more categories
Summarization
The top ‘3’ key sentences are extracted
“The first ‘250’ characters of text” are used for CJK and BiDi languages
Document filters
Supports more than 250 document formats
Technology wrapped into the ‘document conversion services’ (DCS), which add support for additional document formats
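The summarization behavior described above can be illustrated with a small sketch. The slide states only the limits (3 key sentences, or the first 250 characters for CJK and BiDi text); it does not say how PSE picks key sentences. The word-frequency scoring, the script-detection heuristic, and the function names below are therefore assumptions made for illustration, not the product's implementation.

```python
import re
import unicodedata
from collections import Counter

MAX_SENTENCES = 3      # "top 3 key sentences" (from the slide)
FALLBACK_CHARS = 250   # "first 250 characters" (from the slide)


def is_cjk_or_bidi(text: str) -> bool:
    """Heuristic (assumption): any CJK or right-to-left character triggers the fallback."""
    for ch in text:
        cp = ord(ch)
        # Hiragana/Katakana, CJK Unified Ideographs, Hangul syllables
        if 0x3040 <= cp <= 0x30FF or 0x4E00 <= cp <= 0x9FFF or 0xAC00 <= cp <= 0xD7AF:
            return True
        # "R" and "AL" are the Unicode bidi classes for Hebrew- and Arabic-style letters
        if unicodedata.bidirectional(ch) in ("R", "AL"):
            return True
    return False


def summarize(text: str) -> str:
    """Return up to 3 key sentences, or the first 250 characters for CJK/BiDi text."""
    if is_cjk_or_bidi(text):
        return text[:FALLBACK_CHARS]

    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence: str) -> float:
        # Assumed scoring: average document-wide frequency of the sentence's words
        words = re.findall(r"\w+", sentence.lower())
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:MAX_SENTENCES])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)
```

Keeping the selected sentences in their original order is a design choice of this sketch; it tends to read better than ordering them by score.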
Conceptual overview – Index build process
[Figure: index build process. Labels in the diagram: Content, Crawler, Filter; Text analysis components (Categorizer, Summarization, Document filters); Approval Workflow; Indexer; Collection. Callout 1: Metadata injected into original content. Callout 2: Approved set of content (“In-basket”).]
Creating a new document collection is easy
Select the ‘Manage search collections’ portlet
Create a new collection
Specify a web site to collect information/content from
Click on the ‘Start collecting’ icon/text to initiate the index build process
Processing status and status of the index are shown at the bottom of the portlet, for:
o the selected site
o the selected collection (index)
A look at the Manage Collections portlet
Manage Collections Portlet – Options and Status
Select ‘Portal Settings’, then ‘Manage Search Index’
End user – Search portlet – detailed view
Administration – advanced options
Portlet for defining a new collection
Administration – advanced options
Portlet for defining a new site
Administration – advanced options
Portlet for defining a schedule for periodic indexing of a site
Administration – advanced options
Portlet for defining filters for sites
Administration – advanced options
Portlet for defining destination categories for the site
Administration – advanced options
‘Browse document’ portlet
Administration – advanced options
‘Search’ portlet
Administration – advanced options
‘Advanced search’ portlet
Usage example
Goal: provide a community of users with information about competitors in the market
How: catalog information such as news articles and related information from external websites
Additional steps to take:
When creating a collection, select “User-defined” from the taxonomy pull-down
From the main administration portlet choose “Category tree” in the Manage Collections frame
Category Tree portlet
• Build the taxonomy tree
• Then go to ‘Manage Rules’ to define rules for each category
What the rule set looks like
• A ‘rule’ is essentially a search query one would use to find such specific documents
• You can use ‘+’, ‘-’, ‘"’ and ‘*’ within the rule
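To make the idea of a rule as a search query concrete, here is a minimal sketch of a rule-based categorizer. The slide names the operators but not their exact semantics, so the sketch assumes common web-search conventions: ‘+’ marks a required term, ‘-’ an excluded term, double quotes an exact phrase, and ‘*’ a wildcard inside a word; unmarked terms are optional, and at least one must match if any are present. The function names and the category names in the usage example are hypothetical.

```python
import re
from fnmatch import fnmatchcase

# One token = optional sign, then either a quoted phrase or a bare word
TOKEN = re.compile(r'([+-]?)(?:"([^"]+)"|(\S+))')


def _normalize(text: str) -> str:
    """Lowercase and reduce text to words separated by single spaces."""
    return " ".join(re.findall(r"\w+", text.lower()))


def _term_matches(term: str, is_phrase: bool, text: str, words: set) -> bool:
    if is_phrase:
        # Pad with spaces so the phrase only matches on whole-word boundaries
        return f" {_normalize(term)} " in f" {text} "
    if "*" in term:
        return any(fnmatchcase(w, term) for w in words)
    return term in words


def matches(rule: str, document: str) -> bool:
    """Apply one rule to one document under the assumed operator semantics."""
    text = _normalize(document)
    words = set(text.split())
    has_optional = False
    optional_hit = False
    for sign, phrase, word in TOKEN.findall(rule.lower()):
        hit = _term_matches(phrase or word, bool(phrase), text, words)
        if sign == "+" and not hit:
            return False   # a required term is missing
        if sign == "-" and hit:
            return False   # an excluded term is present
        if sign == "":
            has_optional = True
            optional_hit = optional_hit or hit
    return optional_hit or not has_optional


def categorize(document: str, rules: dict) -> list:
    """Return every category whose rule matches the document."""
    return [category for category, rule in rules.items() if matches(rule, document)]
```

For the competitor-tracking usage example, a rule set might look like this:

```python
rules = {
    "Competitors/Pricing": '+price* -rumor "market share"',
    "Competitors/Products": "+launch product*",
}
print(categorize("Acme cuts prices to win market share.", rules))
```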
Last step – assign categories to each website
Result: search and browse
Planning numbers, performance
Index size information
o Approximately 40% to 60% of the textual content size of the indexed documents/pages
Indexing throughput
o Crawling/indexing rate between 100 and 200 documents per minute
Search responsiveness
o Typically a search result page is completed and ready for transmission in less than 0.5 seconds
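The planning numbers above can be turned into a quick back-of-the-envelope estimate. The sketch below applies only the ranges quoted on the slide (40% to 60% index size, 100 to 200 documents per minute); the function name and the sample inputs are illustrative assumptions, and real figures will depend on content and hardware.

```python
from typing import NamedTuple


class Estimate(NamedTuple):
    index_mb_low: float
    index_mb_high: float
    minutes_best: float
    minutes_worst: float


def estimate_collection(text_mb: float, documents: int) -> Estimate:
    """Estimate index size and build time from the slide's planning ranges."""
    return Estimate(
        index_mb_low=text_mb * 40 / 100,    # index is roughly 40% of the text size
        index_mb_high=text_mb * 60 / 100,   # up to roughly 60% of the text size
        minutes_best=documents / 200,       # at 200 documents per minute
        minutes_worst=documents / 100,      # at 100 documents per minute
    )


if __name__ == "__main__":
    # Hypothetical collection: 1,000 MB of text across 60,000 documents
    print(estimate_collection(1000, 60000))
```

For that hypothetical collection the estimate is an index of 400 to 600 MB and a build time of 300 to 600 minutes, that is, 5 to 10 hours.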
Additional Information and Resources
IBM Resources:
WebSphere Portal - http://www-3.ibm.com/software/genservers/portal/
WebSphere Portal Catalog - http://www-3.ibm.com/software/genservers/portal/portlet/catalog
WebSphere Portal Developer’s Zone - http://www-106.ibm.com/developerworks/websphere/zones/portal/
WebSphere Portal Toolkit - http://www-3.ibm.com/software/info1/websphere/index.jsp?tab=products/portaltoolkit
Documentation - http://www-3.ibm.com/software/genservers/portal/library/
Education - http://www-3.ibm.com/software/genservers/portal/education/
WebSphere Commerce Portal - http://www-3.ibm.com/software/genservers/commerce/portal/
IBM Lotus Workplace - http://www.lotus.com/engine/jumpages.nsf/wdocs/ondemand