Cover Page

Verity Integration Guide 10g Release 3 (10.1.3.3.1)

May 2007

Verity Integration Guide, 10g Release 3 (10.1.3.3.1)

Copyright © 2007, Oracle. All rights reserved.

Contributing Author: Karen Johnson

Contributors: Sebastian Celis, Samuel White

The Programs (which include both the software and documentation) contain proprietary information; they are provided under a license agreement containing restrictions on use and disclosure and are also protected by copyright, patent, and other intellectual and industrial property laws. Reverse engineering, disassembly, or decompilation of the Programs, except to the extent required to obtain interoperability with other independently created software or as specified by law, is prohibited.

The information contained in this document is subject to change without notice. If you find any problems in the documentation, please report them to us in writing. This document is not warranted to be error-free. Except as may be expressly permitted in your license agreement for these Programs, no part of these Programs may be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose.

If the Programs are delivered to the United States Government or anyone licensing or using the Programs on behalf of the United States Government, the following notice is applicable:

U.S. GOVERNMENT RIGHTS Programs, software, databases, and related documentation and technical data delivered to U.S. Government customers are "commercial computer software" or "commercial technical data" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, use, duplication, disclosure, modification, and adaptation of the Programs, including documentation and technical data, shall be subject to the licensing restrictions set forth in the applicable Oracle license agreement, and, to the extent applicable, the additional rights set forth in FAR 52.227-19, Commercial Computer Software--Restricted Rights (June 1987). Oracle USA, Inc., 500 Oracle Parkway, Redwood City, CA 94065.

The Programs are not intended for use in any nuclear, aviation, mass transit, medical, or other inherently dangerous applications. It shall be the licensee's responsibility to take all appropriate fail-safe, backup, redundancy and other measures to ensure the safe use of such applications if the Programs are used for such purposes, and we disclaim liability for any damages caused by such use of the Programs.

Oracle, JD Edwards, PeopleSoft, and Siebel are registered trademarks of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.

The Programs may provide links to Web sites and access to content, products, and services from third parties. Oracle is not responsible for the availability of, or any content provided on, third-party Web sites. You bear all risks associated with the use of such content. If you choose to purchase any products or services from a third party, the relationship is directly between you and the third party. Oracle is not responsible for: (a) the quality of third-party products or services; or (b) fulfilling any of the terms of the agreement with the third party, including delivery of products or services and warranty obligations related to purchased products or services. Oracle is not responsible for any loss or damage of any sort that you may incur from dealing with any third party.

Table of Contents

Chapter 1: Introduction
    Overview
    About Verity
        Features
        Considerations
    About This Guide
    Audience
    Conventions

Chapter 2: Installation and Configuration
    Overview
    Pre-Installation Tasks and Considerations
        VDK6 Component
        Checking the Verity Version
    Installation Overview
    Installing Content Server
    Installing Verity
    Upgrading Verity
    Additional Configuration Tasks
        Configuring for the Universal Locale
        Configuring for a Double Byte Language
        SearchIndexerEngineName Setting
        Transitioning Between Search Indexer Engines
    Updating or Rebuilding the Search Index
    Uninstalling the Verity Component

Chapter 3: Searching With Verity Integration
    Overview
    Searching Metadata
        About Metadata Searching
        Performing a Metadata Search
        Metadata Search Operators
        Metadata Search Case Sensitivity
        Metadata Wildcards
    Searching Full-Text
        About Full-Text Searching
        Performing a Full-Text Search
        Full-Text Search Rules
        Full-Text Search Case Sensitivity
        Full-Text Wildcards
        Full-Text Search Operators
    Search Pages
        Quick Search Field
        Home Page Search Fields
        Search Results Page

Chapter 4: Configuring Metadata
    Overview
    Configuring Metadata for Optimal Verity Searching
        Using Individual Data Tables
        Housekeeping of Verity Part Files
        Setting up Individual Verity Data Tables
        Zone Indexing a Metadata Field
        Setting Zone Searchable Fields
            Zone Searching Example
    Interface
        Define Filter Screen
        Update Database Design Screen
        Advanced Search Design Screen
        Advanced Options for FieldName Screen

Chapter 5: Indexing
    Overview
    Full-Text Indexing with Verity
        About Full-Text Indexing
        Supported File Formats
        Date Storage
    Customizing Indexing and Searching
        Setting Accent Insensitivity
        Setting Tokenization
        Customizing Verity for PDF Files
        Indexing Structured Documents
            XML Documents
            HTML Documents
            SGML Documents
            Internet Message Format Documents
    About Writing Queries
        Basic Verity Query Script
        Basic Verity Script Examples
        Verity and Query Links
    Understanding Verity Collections
        About Verity Collections
        Verity Partitions
            Partition File Extensions
            Merging Partitions
        Limiting the Verity Search Index on UNIX
            Example: Open File Limit
    Performance Tuning

Chapter 6: International Configuration
    Overview
    Verity Locales
    Supported Verity Locales
    Setting the Verity Locale
    Considerations

Appendix A: Troubleshooting
    Overview
    Accessing Log Files
    Console Server Logs
    Console Output Logs
    Search Engine Issues
        Considerations
        Manually Rebuilding the Verity Index
    Search Issues
        Accented Letters Are Not Found
        Documents With Asian Characters Not Found
        Text in PDF Files Cannot Be Found
        Microsoft Word Documents with Embedded Links

Appendix B: Third Party Licenses
    Overview
    Apache Software License
    W3C® Software Notice and License
    Zlib License
    General BSD License
    General MIT License
    Unicode License
    Miscellaneous Attributions

Glossary

Index

Chapter 1. INTRODUCTION

OVERVIEW

This section contains the following topics:

About Verity

Features

Considerations

About This Guide

Audience

Conventions

ABOUT VERITY

The Verity VDK6 component provides an alternative content search and retrieval solution when installed with Content Server. Verity provides metadata and full-text indexing and search capabilities: every word in a file is indexed, not only its metadata, so all of the information can be searched.

Verity uses the Verity Developer Kit (VDK) to provide metadata and full-text search capabilities. When content is checked in to Content Server, the content and metadata are passed through the embedded search engine and are indexed.

When users search for content by metadata or keywords, a query is issued against the search index, not the database. The results can be sorted by any of the metadata fields or based on a relevancy score assigned by the search engine.

Verity provides a number of search engine locales, which ensure that special characters in text are indexed correctly. Verity locales cover a number of Western European languages (for example, English, German, French, and Spanish) as well as Asian languages (for example, Japanese and Korean).

Features

Verity includes the following features:

Automated indexing

Notification after document completes indexing

Search caching available

Metadata searching available

Ranges can be specified for returned answers on search results

Dynamic drill down available but not exposed

PDF highlighting of results available

Spell checking and synonym matching are available but not exposed

Zone and scope searching is available

Considerations

The following items should be considered when using Verity:

Index rebuilds are required when metadata changes

Index must be on the same server as Content Server (a benefit or consideration, depending on infrastructure)

Minimum system requirements for installing Verity on a supported operating system platform that is running Content Server are:

• 512 MB RAM

• 3 GB disk space

• 800 MHz or higher processor

• 400 MB of space designated for /tmp or /var/tmp (UNIX), or user-defined temporary directory (Windows)
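The disk and temporary-space minimums above can be checked before installation with a short script. The following is a minimal sketch, not part of the product: it assumes Python 3 is available on the server, treats "GB" and "MB" as binary units, and uses the hypothetical helper names has_enough_space and check_space. It does not check RAM or processor speed, which the Python standard library does not report portably.

```python
import shutil
import tempfile

GIB = 1024 ** 3
MIB = 1024 ** 2

# Thresholds taken from the minimum requirements listed above.
# Interpreting GB/MB as binary units is an assumption of this sketch.
MIN_DISK_BYTES = 3 * GIB
MIN_TEMP_BYTES = 400 * MIB


def has_enough_space(free_bytes, required_bytes):
    """Return True when the free space meets or exceeds the requirement."""
    return free_bytes >= required_bytes


def check_space(install_dir):
    """Report whether the install and temp locations satisfy the minimums."""
    disk_free = shutil.disk_usage(install_dir).free
    temp_free = shutil.disk_usage(tempfile.gettempdir()).free
    return {
        "disk_ok": has_enough_space(disk_free, MIN_DISK_BYTES),
        "temp_ok": has_enough_space(temp_free, MIN_TEMP_BYTES),
    }


if __name__ == "__main__":
    # "." stands in for the Content Server installation directory.
    print(check_space("."))
```

Run the script from the intended installation directory, or pass that directory to check_space, so that the correct file system is measured.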

ABOUT THIS GUIDE

This guide includes the following chapters:

Chapter 1 (Introduction) provides general information about Verity and the content of this guide. This information is intended for all audiences.

Chapter 2 (Installation and Configuration) provides installation and configuration information. This information is intended for administrators and system integrators.

Chapter 3 (Searching With Verity Integration) provides information on how to use Verity search functions. This information is intended for all users.

Chapter 4 (Configuring Metadata) provides information on how to manage repository content and metadata with Verity. This information is intended for administrators.

Chapter 5 (Indexing) provides information on configuring system settings for Verity. This information is intended for system integrators and administrators.

Chapter 6 (International Configuration) provides information on languages supported by Verity and on setting Verity locales. This information is intended for system integrators and administrators.

Appendix A (Troubleshooting) provides information on troubleshooting problems with Verity indexing and searching. This information is provided for system integrators and administrators.

AUDIENCE

This guide is intended to address the needs of system integrators, administrators, and users for planning, installing, configuring, optimizing, and using Verity on Content Server 10g Release 3 (10.1.3.3.1).

Note: For additional information about requirements for Verity locales and gateways, see the Verity Locales Release Notes, the Verity Gateways Release Notes, and other related documents available from www.autonomy.com.

CONVENTIONS

The following conventions are used throughout this guide:

The notation <Install_Dir>/ is used to refer to the location on your system where the content server instance is installed.

Forward slashes (/) are used to separate the directory levels in a path name. A forward slash will always appear after the end of a directory name.

Notes, technical tips, important notices, and cautions use these conventions:

Symbols Description

This is a note. It is used to bring special attention to information.

This is a technical tip. It is used to identify information that can be used to make your tasks easier.

This is an important notice. It is used to identify a required step or required information.

This is a caution. It is used to identify information that might cause loss of data or serious system problems.

Chapter 2. INSTALLATION AND CONFIGURATION

OVERVIEW

This section covers the following topics:

Concepts

Pre-Installation Tasks and Considerations

Installation Overview

Additional Configuration Tasks

Tasks

Installing Content Server

Installing Verity

Upgrading Verity

Configuring for the Universal Locale

Configuring for a Double Byte Language

Transitioning Between Search Indexer Engines

Updating or Rebuilding the Search Index

Uninstalling the Verity Component

PRE-INSTALLATION TASKS AND CONSIDERATIONS

Please note the following pre-installation tasks and considerations for Verity:

The VDK6 component must be installed on the same system as Content Server.

The VDK6 component includes Verity 6.1.2.

If you have been using Verity on an earlier version of Content Server, you can continue to use VDK 4.5 or VDK 5.5 with Content Server 10gR3.

If you upgrade Verity from an earlier release that used VDK 4.5 or VDK 5.5, the search engine that was being used in the earlier release will continue to be used unless you configure the Content Server to use VDK6.

If you choose to continue using VDK 4.5 or VDK 5.5 with Content Server version 10g Release 3 (10.1.3.3.1), note that upgrading to VDK6 is advantageous: VDK6 includes new features and increased platform support, and VDK 4.5 is incompatible with Internet Explorer 7. If you wish to run VDK and Internet Explorer 7 on the same machine, it is recommended that you upgrade to VDK6.

VDK6 Component

The distribution media contains the VDK6 component in a .zip file. The component file contains the following .zip files. Which file you install depends on the operating system on which your Content Server is installed and on your Verity language locale requirements. For information about Verity locales, see Chapter 6 (International Configuration).

vdk6win32 Microsoft Windows operating system

vdk6solaris Sun Solaris operating system

vdk6linux Linux operating system

vdk6hpux HP-UX operating system

vdk6aix IBM AIX operating system

vdk6asian Asian language package

vdk6german German language package
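The choice of operating system package can be expressed as a simple lookup. The following is an illustrative sketch only: the package names come from the list above, while the dictionary keys are the values that Python's platform.system() reports on each operating system, and the function name vdk6_package_for is hypothetical. The language packages (vdk6asian, vdk6german) depend on locale requirements rather than on the platform, so they are not covered here.

```python
import platform

# Package names are taken from the list above. The keys are the strings
# returned by platform.system() on each operating system.
PACKAGE_BY_SYSTEM = {
    "Windows": "vdk6win32",
    "SunOS": "vdk6solaris",
    "Linux": "vdk6linux",
    "HP-UX": "vdk6hpux",
    "AIX": "vdk6aix",
}


def vdk6_package_for(system_name):
    """Return the VDK6 package name listed for the given operating system."""
    try:
        return PACKAGE_BY_SYSTEM[system_name]
    except KeyError:
        raise ValueError(
            "No VDK6 package listed for %r" % system_name
        ) from None


if __name__ == "__main__":
    try:
        print(vdk6_package_for(platform.system()))
    except ValueError as exc:
        # Reached on platforms that the list above does not cover.
        print(exc)
```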

Checking the Verity Version

Administrators can check the version of Verity running on a Content Server instance using any of the following methods:

To determine which VDK version (4.5.1, 5.5, or 6.1) is currently running with a particular Content Server instance, use the following procedure:

1. Launch a web browser and log on to Content Server as a user with administrator rights.

2. Check a new document into the content server, or (temporarily) update the metadata of an existing document (for example, its title).

3. Go to the Administration page.

4. Click Admin Server.

5. Click the appropriate server button to the right of the “start–stop–restart” icons.

6. In the options menu on the left, click View Server Output.

7. The output will show a number of lines that start with the word ‘indexer’. One of these lines will contain the version of the Verity engine used.

Check the version string of the VDK6 component on Content Server by selecting Administration—Configuration, then clicking Enabled Component Details.

Check the VDK version in the <Install_Dir>/custom/VDK6/vdk/common/patchinfo_vdk.txt file.
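If the server output from step 7 of the procedure above is saved to a text file or copied from the browser, the relevant lines can be isolated with a small filter. The following is a hedged sketch: the function name indexer_lines is hypothetical, the sample text is invented purely for illustration (the real output format is not shown in this guide), and matching the word 'indexer' without regard to case is an assumption.

```python
from typing import List


def indexer_lines(server_output: str) -> List[str]:
    """Return the lines of the server output that start with 'indexer'."""
    return [
        line
        for line in server_output.splitlines()
        # Case-insensitive matching is an assumption of this sketch.
        if line.lstrip().lower().startswith("indexer")
    ]


# Invented sample text; real server output will look different.
SAMPLE = """\
system: startup complete
indexer: (hypothetical) starting update cycle
indexer: (hypothetical) engine version line would appear here
"""

if __name__ == "__main__":
    for line in indexer_lines(SAMPLE):
        print(line)
```

One of the returned lines should contain the version of the Verity engine in use; locating it among them is left to the reader because the exact wording of that line is not documented here.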

INSTALLATION OVERVIEW

After completing the pre-installation tasks, installation and configuration of Verity consists of the following tasks:

Installing Content Server

Installing Verity

Additional Configuration Tasks

Updating or Rebuilding the Search Index

Note: Indexer tracing must be turned on for the View Server Output procedure in Checking the Verity Version to function.

INSTALLING CONTENT SERVER

Install Content Server version 10g Release 3 (10.1.3.3.1) before installing the VDK6 component.

Refer to Content Server Installation Guide for Microsoft Windows or Content Server Installation Guide for UNIX for instructions.

INSTALLING VERITY

After successfully installing Content Server, review the applicable Pre-Installation Tasks and Considerations (page 2-2) before installing the Verity component.

Component Manager

1. Open a new browser window and log in to Content Server as a system administrator.

2. Go to the Administration applets page and click Admin Server.

3. On the Admin Server page click the button of the content server instance on which to install the component.

The status page for the content server instance is displayed.

4. In the option list for the server instance, click Component Manager.

The Component Manager page is displayed.

5. Click Browse.

6. Navigate to the appropriate VDK6 component .zip file, and select it.

7. Click Open.

The path for the component is displayed in the Install New Component field.

8. Click Install.

A list of component items that will be installed is displayed.

9. Click Continue.

A message is displayed that asks if you want to immediately enable the VDK6 component or return to the Component Manager.

10. Click the option to enable the VDK6 component.

The component is enabled and the Component Manager page is displayed.

11. Restart the content server.

12. In a text editor, open Content Server’s /config/config.cfg file: <Install_Dir>/config/config.cfg

13. Set VerityLocale=<language> where <language> is a supported Verity language.

For information on supported language locales see Chapter 6 (International Configuration). For information on specifying the universal locale or a double byte language see Additional Configuration Tasks (page 2-7).

14. Save and close the config.cfg file.

15. Restart Content Server.

16. Rebuild the search index.
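Steps 12 through 14 above amount to setting one name=value entry in config.cfg. The following is a minimal sketch of that edit, assuming the file consists of plain name=value lines: the function name set_config_value is hypothetical, and the locale value englishv is only an illustrative guess based on the "languagev" naming pattern (see Chapter 6 for the supported values). The function works on text, so reading the file, backing it up, and writing it back are left to the caller.

```python
def set_config_value(cfg_text, key, value):
    """Return cfg_text with key=value set.

    An existing entry for key is replaced; otherwise the entry is appended.
    Lines that do not define key (including commented lines) are kept as-is.
    """
    new_line = "%s=%s" % (key, value)
    lines = cfg_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        name, sep, _ = line.partition("=")
        if sep and name.strip() == key:
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file content and locale value, for illustration only.
    before = "SomeOtherSetting=1\nVerityLocale=placeholder\n"
    print(set_config_value(before, "VerityLocale", "englishv"))
```

After writing the changed text back to <Install_Dir>/config/config.cfg, continue with the restart and index rebuild steps as described.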

Note: If you do not have access to a text editor, you can modify the configuration using the General Configuration section of the Admin Server.

Caution: Depending on the quantity and size of your files, rebuilding the search index can be a time-consuming process. When rebuilding is necessary, it is recommended that you rebuild the index during off-peak hours.

Component Wizard

1. Start the Component Wizard:

In Windows, choose Start > Programs > Stellent Content Server > <instance> > Utilities > Component Wizard.

In UNIX, navigate to the <Install_Dir>/bin directory. At the command prompt, type ComponentWizard.

The Component Wizard main screen and the Component List screen are displayed.

2. Click Install.

The Install dialog is displayed.

3. Click Select and navigate to the appropriate VDK6 component .zip file.

4. Double-click the zip file or click Open.

The Install list displays the files that will be installed.

5. Click OK.

All required files are installed on Content Server.

6. After all files have been copied and installed, you are prompted to confirm enabling the listed components. Click Yes.

7. Close the Component Wizard.

8. Restart Content Server.

9. In a text editor, open Content Server’s /config/config.cfg file: <Install_Dir>/config/config.cfg

10. Set VerityLocale=<language> where <language> is a supported Verity language.

For more information on supported language locales see Chapter 6 (International Configuration). For information on specifying the universal locale or a double byte language see Additional Configuration Tasks (page 2-7).

11. Save and close the config.cfg file.

12. Restart Content Server.

13. Rebuild the search index.

Note: If you do not have access to a text editor, you can modify the configuration using the General Configuration section of the Admin Server.

Caution: Depending on the quantity and size of your files, rebuilding the search index can be a time-consuming process. When rebuilding is necessary, it is recommended that you rebuild the index during off-peak hours.

UPGRADING VERITY

To continue using Verity, first review the Pre-Installation Tasks and Considerations (page 2-2) to determine whether to upgrade Verity. After installing or upgrading to Content Server version 10gR3, follow the instructions for Installing Verity (page 2-4) and for Additional Configuration Tasks (page 2-7).

Note: Upgrading to VDK6 from VDK 5.5 or VDK 4.5.1 always requires a rebuild of the search index.

Caution: Depending on the quantity and size of your files, rebuilding the search index can be a time-consuming process. When rebuilding is necessary, it is recommended that you rebuild the index during off-peak hours.

Note the following configuration issues:

If you are already using Verity, you can choose to continue to use Verity VDK4.5 or VDK5.5 with Content Server 10gR3. See Pre-Installation Tasks and Considerations (page 2-2) for information about the advantages of upgrading to VDK6.

All of the “languagex” locales have been changed to “languagev” in VDK6. After upgrading to VDK6, you must change your VerityLocale setting in the config.cfg file and rebuild your collection.

ADDITIONAL CONFIGURATION TASKS

Depending on your requirements, Verity may require the following additional configuration:

Configuring for the Universal Locale (page 2-7)

Configuring for a Double Byte Language (page 2-8)

SearchIndexerEngineName Setting (page 2-8)

Transitioning Between Search Indexer Engines (page 2-9)

Configuring for the Universal Locale

After installing and enabling the Verity VDK6 component on Content Server, to use the ‘uni’ (universal) locale, you need to perform the following configuration steps:

1. As a system administrator, use a browser to access Content Server and install and enable the following component:

VerityUniversalCollectionBundle.zip

This component adds an xLanguage field to Content Server that specifies the language of a document. Without this component, VDK does not know which language a document is written in, and it cannot search using a given language.

2. In a text editor, open Content Server’s /bin/intradoc.cfg file:

<Install_Dir>/bin/intradoc.cfg

3. Set FileEncoding=UTF8.

Note: If you do not have access to a text editor, you can modify the configuration using the General Configuration section of the Admin Server.


4. Save and close the intradoc.cfg file.

5. In a text editor, open Content Server’s /config/config.cfg file:

<Install_Dir>/config/config.cfg

6. Set SearchLocale=uni.

7. Save and close the config.cfg file.

8. Restart Content Server.

9. Rebuild the search index.

Configuring for a Double Byte Language

After installing and enabling the Verity VDK6 component on Content Server, if you wish to use a double byte language such as Japanese or Korean, you need to perform the following configuration steps:

1. In a text editor, open Content Server’s /bin/intradoc.cfg file:

<Install_Dir>/bin/intradoc.cfg

2. Set FileEncoding=UTF8.

3. Save and close the intradoc.cfg file.

4. Restart Content Server.

5. Rebuild the search index (see Updating or Rebuilding the Search Index (page 2-9)).

Note: If you do not have access to a text editor, you can modify the configuration using the General Configuration section of the Admin Server.

Caution: Depending on the quantity and size of your files, rebuilding the search index can be a time-consuming process. When rebuilding is necessary, it is recommended that you rebuild the index during off-peak hours.

SearchIndexerEngineName Setting

The SearchIndexerEngineName setting specifies the active search engine on a Content Server instance. The specific setting for Content Server running Verity VDK6 is VERITY.VDK.6. If you are using Verity 4 or 5, the setting would be VERITY.VDK.4.

Note: This setting is case sensitive. Use all upper case for the value.

Transitioning Between Search Indexer Engines

The new configuration setting SearchIndexerEngineName:Rebuild, in the config.cfg file, is provided to help transition between search engines. To transition to Verity VDK6, an administrator can set SearchIndexerEngineName:Rebuild to VERITY.VDK.6 while SearchIndexerEngineName is still set to the former search engine. Users continue to search with the former search engine binaries until the administrator performs a rebuild, and the former search engine remains the active search engine during the rebuild. After the rebuild is complete, change the SearchIndexerEngineName setting to the new search engine and restart Content Server. VERITY.VDK.6 then becomes the active search engine.
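The two phases of the transition can be sketched as edits to the text of config.cfg. The following Python sketch is illustrative only: the helper set_cfg_value is hypothetical (it is not part of Content Server), and it assumes that config.cfg holds one name=value entry per line. The setting names and values are the ones given above.

```python
def set_cfg_value(cfg_text: str, key: str, value: str) -> str:
    """Return cfg_text with key set to value (hypothetical helper).

    Assumes one name=value entry per line; the key is appended if absent.
    """
    lines = cfg_text.splitlines()
    for i, line in enumerate(lines):
        name, sep, _ = line.partition("=")
        if sep and name.strip() == key:
            lines[i] = f"{key}={value}"
            break
    else:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# Starting point: the former engine is active.
cfg = "SearchIndexerEngineName=VERITY.VDK.4\n"

# Phase 1 (before the rebuild): stage VDK6 for the rebuild only.
staged = set_cfg_value(cfg, "SearchIndexerEngineName:Rebuild", "VERITY.VDK.6")

# Phase 2 (after the rebuild completes): make VDK6 the active engine.
switched = set_cfg_value(staged, "SearchIndexerEngineName", "VERITY.VDK.6")

print(staged)
print(switched)
```

In practice the same two changes can be made by hand in a text editor or through the Admin Server; Content Server must still be restarted after the second change.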

Note: The SearchIndexerEngineName:Rebuild setting can also be used to switch between other search engines such as DATABASE, DATABASEFULLTEXT, and FAST with minimum downtime.

Note: This setting is case sensitive. Use all upper case for the value.

UPDATING OR REBUILDING THE SEARCH INDEX

Administrators (not subadministrators) can use the Indexer tab on the Repository Manager screen in Content Server to perform the following tasks:

Update the Search Index: Incrementally updates the index database. This is usually not necessary because the index is automatically updated approximately every five minutes by the server.

Rebuild the Collection: The search index is entirely rebuilt, and the old index collection is replaced with a new index collection.

Suspend an Update or a Rebuild: Stops the update or rebuild temporarily. You can restart the process by clicking the appropriate Start button.

Cancel Update Search: Index update process terminates, and only files processed to that point are accessible to the search engine.

Cancel Rebuild Collection: Index rebuild process terminates, and the previous index database continues to be used by the search engine.

Caution: Rebuilding the search index is necessary only when you change or add metadata fields. Depending on the quantity and size of your files, this can be a time-consuming process. If rebuilding is necessary, rebuild at times of non-peak system usage. A rebuild is not required for adding or changing metadata fields if you use database search and index.

To access the Indexer functions, log in to Content Server, select Administration—Admin Applets—Repository Manager, and click the Indexer tab.

UNINSTALLING THE VERITY COMPONENT

To uninstall a component, perform these steps using either Component Wizard or Component Manager:

1. Disable the component.

2. Restart Content Server.

3. Click Remove or Uninstall.

4. Restart Content Server.

Note: Uninstalling a component means that Content Server no longer recognizes the component, but the component files are not deleted from the file system.


Chapter 3. SEARCHING WITH VERITY INTEGRATION

OVERVIEW

This section provides information on using Verity to search for files.

Concepts

About Metadata Searching (page 3-2)

Metadata Search Operators (page 3-3)

Metadata Search Case Sensitivity (page 3-4)

Metadata Wildcards (page 3-5)

About Full-Text Searching (page 3-6)

Full-Text Search Rules (page 3-7)

Full-Text Search Case Sensitivity (page 3-8)

Full-Text Wildcards (page 3-8)

Full-Text Search Operators (page 3-9)

Tasks

Performing a Metadata Search (page 3-2)

Performing a Full-Text Search (page 3-7)


Interface

Quick Search Field (page 3-12)

Home Page Search Fields (page 3-12)

Search Results Page (page 3-13)

SEARCHING METADATA

The Verity search engine provides metadata search capabilities for Content Server. This section covers the following topics:

About Metadata Searching (page 3-2)

Performing a Metadata Search (page 3-2)

Metadata Search Operators (page 3-3)

Metadata Search Case Sensitivity (page 3-4)

Metadata Wildcards (page 3-5)

About Metadata Searching

Metadata searching is similar to finding a book in a library by searching for its author, title, or subject. When you search by metadata, you specify as much information as you know about a file or a group of files. For example, if you want to find all files written by your supervisor for your department that were released on or after 1/1/2002, you would specify the following on the search page:

Author: supervisor’s user name

Department: department name

Release Date From: 1/1/2002
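The example above amounts to filtering content items on three metadata values at once. The following Python sketch illustrates the idea on a small in-memory list. The records, the field names (author, department, released), and the function metadata_search are hypothetical and serve only to show how the three criteria combine; they are not Content Server or Verity interfaces.

```python
from datetime import date

# Hypothetical content items; field names are illustrative only.
items = [
    {"title": "Budget Plan", "author": "jsmith",
     "department": "Finance", "released": date(2002, 3, 15)},
    {"title": "Old Budget", "author": "jsmith",
     "department": "Finance", "released": date(2001, 11, 2)},
    {"title": "Hiring Guide", "author": "mlee",
     "department": "HR", "released": date(2003, 1, 10)},
]


def metadata_search(items, author, department, released_from):
    """Return items that satisfy all three metadata criteria."""
    return [
        item for item in items
        if item["author"] == author
        and item["department"] == department
        and item["released"] >= released_from  # "on or after" the date
    ]


results = metadata_search(items, "jsmith", "Finance", date(2002, 1, 1))
print([item["title"] for item in results])
```

Every criterion narrows the result set, which is why specifying as much information as you know gives the most precise results.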

Performing a Metadata Search

Use the following procedure to search for files using metadata as the search criteria:

1. Display the home page search fields (page 3-12) or the Search tray.

Note: When searching for metadata, case sensitivity and wildcard options will vary depending on how the system administrator has configured Content Server. See your system administrator for your specific configuration.


2. Enter your search criteria in the metadata search fields.

• Select the appropriate metadata search operators (page 3-3).

• Use metadata wildcards as necessary (page 3-5).

• Keep metadata search sensitivity in mind (page 3-4).

3. Select the results options for displaying the results.

4. Click Search.

The files that match your search criteria are displayed on the search results page (page 3-13).

Metadata Search Operators

On the advanced search page, search operators can be used to refine the search criteria for a number of metadata fields. These operators are listed as options in lists to the left of each field.

Operator: Substring
Description: Finds content items with the specified string anywhere in the metadata field. This has the same effect as placing a wildcard before and after the search term. This is the most commonly used operator and is the default operator.
Example: When form is typed in the Title field, the search returns items with words such as forms, performance, and reform in their title.

Operator: Contains
Description: Verity Only: Finds items with the specified whole word or phrase in the metadata field. No wildcard is placed before or after the specified value.
Example: When form is typed in the Title field, the search returns items with the word form in their title, but does not return items with the words performance or reform.

Operator: Matches
Description: Finds items with the exact specified value in the metadata field.
Example: When address change form is typed in the Title field, the search returns items with the exact title of Address Change Form.

Operator: Starts
Description: Finds items with the specified value at the beginning of the metadata field. This has the same effect as placing a wildcard after the search term when using the Matches operator.
Example: When form is typed in the Title field, the search returns all items with titles that begin with the word form, including forms, forming, etc.

Operator: Ends
Description: Finds items with the specified value at the end of the metadata field. This has the same effect as placing a wildcard before the search term when using the Matches operator.
Example: When form is typed in the Title field, the search returns all items with titles that end with the word form, such as form, perform, chloroform, etc.

Operator: Has Word
Description: Verity Zoned Searches Only: Finds items with the specified word in the metadata field. No wildcard is placed before or after the specified value.
Example: When form is typed in the Title field, the search returns items with the word form in their title, but does not return items with the words performance, reform, formed, or forming.

Operator: Has Word Prefix
Description: Verity Zoned Searches Only: Finds items with the specified value at the beginning of the metadata field. This has the same effect as placing a wildcard after the search term when using the Matches operator.
Example: When form is typed in the Title field, the search returns items with the word form, formed, or forming in their title, but does not return items with the words performance or reform.

Operator: Not Has Word
Description: Verity Zoned Searches Only: Finds items that do not contain the word in the metadata field. No wildcard is placed before or after the specified value.
Example: When user1 is selected from the Author field, the search returns items authored by anyone except user1.

Metadata Search Case Sensitivity

Case sensitivity for metadata searches varies depending on how your system administrator has configured Content Server.

Note: See your system administrator for your specific configuration.

When Content Server is using the Verity search engine, metadata searches are not case-sensitive unless your system administrator has identified a metadata field for zoned searching. For zoned searching, metadata searches use the following rules for case-sensitivity:

When you use all lowercase or all uppercase letters in search expressions, metadata searches are not case-sensitive.

For example, a search for either SERVER or server will find SERVER, server, Server, and SerVer.

When you use mixed-case letters in search expressions, metadata searches are case-sensitive.

When you use the “Matches” search operator, the metadata searches are case-sensitive. For details, refer to Metadata Search Operators (page 3-3).

Metadata Wildcards

A wildcard substitutes for unknown or unpredictable characters in the search term. With Verity, you can use the following wildcards in conjunction with certain operators for metadata searches.

An asterisk (*) stands for zero or more alphanumeric characters. For example:

• form* matches form and formula

• *orm matches form and reform

• *form* matches form, formula, reform, and performance

A question mark (?) stands for one alphanumeric character. For example:

• form? matches forms and form1, but not form or formal

• ??form matches reform but not perform
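The wildcard rules above can be checked with a short Python sketch that translates a wildcard pattern into a regular expression. This is an illustration of the stated rules only, not Verity's implementation: the function names are hypothetical, the match is applied to a single word, and case sensitivity is not modeled.

```python
import re


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate '*' and '?' wildcards into a regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append("[A-Za-z0-9]*")  # zero or more alphanumeric characters
        elif ch == "?":
            parts.append("[A-Za-z0-9]")   # exactly one alphanumeric character
        else:
            parts.append(re.escape(ch))   # every other character is literal
    return re.compile("".join(parts))


def wildcard_match(pattern: str, word: str) -> bool:
    """Return True if the whole word matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(word) is not None


for pattern, word in [("form*", "formula"), ("form?", "form"),
                      ("??form", "reform"), ("??form", "perform")]:
    print(pattern, word, wildcard_match(pattern, word))
```

Because the question mark stands for exactly one character, ??form requires exactly two characters before form, which is why reform matches and perform does not.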

Tech Tip: Generally, you should use all lowercase search strings to ensure that you find all of the files that match your search expression. Use mixed-case search strings only if you are looking for a specific combination of lower case and upper case.

Note: When your system is configured to use Verity, wildcards used in the Quick Search field are evaluated as wild when searching the full-text index, and literally when searching metadata. See also the Quick Search Field (page 3-12).

SEARCHING FULL-TEXT

The Verity search engine provides full-text search capabilities for SQL Server, Oracle, and DB2 databases used with Verity. This section covers the following topics:

About Full-Text Searching (page 3-6)

Performing a Full-Text Search (page 3-7)

Full-Text Search Rules (page 3-7)

Full-Text Search Case Sensitivity (page 3-8)

Full-Text Wildcards (page 3-8)

Full-Text Search Operators (page 3-9)

About Full-Text Searching

Full-text searching enables you to find a content item based on the text contained in the file itself. When a content item is checked into the content server, the indexer stores all of the words in the web-viewable version of the content item (PDF, HTML, text, or other supported file formats) in an index. When you perform a full-text search, the search expression is compared with the index, and any content items and discussions that contain your search text are returned in the search results.

A full-text search expression can include the following elements:

Strings—partial words (such as addr)

Words—individual whole words (such as addresses)

Phrases—multiple-word phrases (such as new addresses)

Operators—logic applied to words and phrases (such as news AND addresses)

See Full-Text Search Rules (page 3-7) for more information.

Performing a Full-Text Search

Use the following procedure to perform a full-text search:

1. Display the Quick Search field, home page search fields, Search tray, or advanced search page.

2. Enter your search terms in the full-text search field.

• Take the full-text search rules into account.

• Keep full-text search case sensitivity in mind.

• Use the Verity full-text search options as necessary.

3. Select the results options for displaying the results.

4. Click Search.

The files that match your search criteria are displayed on the search results page or in the Results tab under the Search tray in the Portal Navigation Bar.

Full-Text Search Rules

The following Verity search rules will help you refine your full-text search criteria:

Full-text searches using Verity are case-sensitive. See Full-Text Search Case Sensitivity (page 3-8).

You can use wildcards in full-text search queries. See Full-Text Wildcards (page 3-8).

You can use search operators in full-text search queries. See Full-Text Search Operators (page 3-9).

When you perform a full-text search, the search finds the word you specify and words that have the same stem. For example, searching for the word address finds files with the word address, addressing, addresses, and addressed in them. If you want to limit the search to the word you specify, place the word in double quotes (for example, “address”).

Note: Search term context highlighting is not available.

Note: Stemming (the ability to search using the ‘root’ of a word) cannot be used with mixed case searches (capital and lower case letters) in Verity. In order to achieve the best results, query terms must be explicitly specified if searching for upper or lower case occurrences.

See the Content Server User Guide for other full-text search rules that apply to all search engines.

Full-Text Search Case Sensitivity

Full-text searches with the Verity search engine use the following rules for case sensitivity:

When you use all lowercase or all uppercase letters in search expressions, full-text searches are not case-sensitive.

For example, a search for either SERVER or server will find SERVER, server, Server, and SerVer.

When you use mixed-case search expressions, full-text searches are case-sensitive.

For example, a search for the word Server will find Server, but not SERVER or server.

With the Verity full-text indexing engine, you can use the <CASE> full-text search operator to restrict a full-text search expression to lower case or upper case. For details refer to Full-Text Search Operators (page 3-9).
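The case-sensitivity rules in this section can be summarized in a few lines of Python. This is a minimal sketch of the rules as stated, not of Verity itself: the function case_rule_match is hypothetical, it compares a single query term with a single indexed word, and it ignores stemming, the <CASE> operator, and the ‘uni’ locale.

```python
def case_rule_match(query: str, word: str) -> bool:
    """Apply the stated case rules to one query term and one word."""
    if query.islower() or query.isupper():
        # All-lowercase or all-uppercase queries are not case-sensitive.
        return query.lower() == word.lower()
    # Mixed-case queries are case-sensitive.
    return query == word


for query in ("server", "SERVER", "Server"):
    found = [w for w in ("SERVER", "server", "Server", "SerVer")
             if case_rule_match(query, w)]
    print(query, found)
```

This is the reason all-lowercase search strings are the safest choice when you want every variation of a word.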

Full-Text Wildcards

A wildcard substitutes for unknown or unpredictable characters in the search term. The following wildcards can be used in Verity full-text search fields:

An asterisk (*) stands for zero or more alphanumeric characters. For example:

• form* matches form, formal, and formula

• *form matches form and reform

• *form* matches form, formula, reform, and performance

A question mark (?) stands for one alphanumeric character. For example:

• form? matches forms and form1, but not form or formal

• ??form matches reform but not perform

Note: If Verity is configured to use the ‘uni’ locale, full-text searches are not case-sensitive.

Note: When your system is configured to use Verity, wildcards used in the Quick Search field are evaluated as wild when searching the full-text index, and literally when searching metadata. See also the Quick Search Field (page 3-12).

Full-Text Search Operators

The following operators can be used to refine your Verity full-text search expression.

Important: AND, OR, and NOT are treated as operators by default and do not require angle brackets (< >). If you use more than one of these operators in a search expression, place each operator in angle brackets. If you want to use these terms as search terms, place them in single quotes (for example, ‘and’). You must place all other operators in angle brackets.

Note: For clarity, the operators are shown in upper case, but they can be in lower case as well.

Note: Stemming (the ability to search using the “root” of a word) is available but cannot be used with mixed case searches (upper and lower case letters). In order to achieve the best results, query terms must be explicitly specified if searching for upper or lower case occurrences.
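The bracketing rule in the Important note can be sketched as a small Python helper that joins search terms with one operator. The function join_terms is hypothetical and reflects one reading of the rule: AND, OR, and NOT may appear without angle brackets when the operator occurs only once in the expression, and every other case uses angle brackets.

```python
BARE_OPERATORS = {"AND", "OR", "NOT"}  # usable without angle brackets


def join_terms(terms, operator):
    """Join search terms with a single full-text operator."""
    op = operator.upper()          # operators may be typed in lower case
    occurrences = len(terms) - 1   # how often the operator will appear
    if op in BARE_OPERATORS and occurrences <= 1:
        separator = op
    else:
        separator = f"<{op}>"
    return f" {separator} ".join(terms)


print(join_terms(["address", "name"], "AND"))
print(join_terms(["safety", "security", "protection"], "or"))
print(join_terms(["internet", "server"], "near"))
```

Always placing operators in angle brackets is also valid and avoids having to count how many operators an expression contains.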

Operator: AND
Description: Finds content items that contain all of the specified terms.
Example: address AND name returns content items that contain both specified words.

Operator: OR
Description: Finds content items that contain at least one of the specified terms.
Example: safety <OR> security <OR> protection returns content items that contain at least one of the three words.

Operator: , (comma)
Description: Finds content items that contain at least one of the specified terms.
Example: safety, security, protection returns content items that contain at least one of the three words.

Operator: NOT
Description: Finds content items that contain the term that precedes the operator (if any), and ignores content items that contain the term that follows it.
Example: NOT server returns content items that do not contain the word server. internet NOT server returns content items that contain the word internet and do not contain the word server.

Operator: " (double quotation mark) or <WORD>
Description: Finds content items that contain only the specified term, not any variations of the stem of the word.
Example: “address” returns content items that contain address as a whole word, but does not return content items that contain addresses, addressing, or addressed.

Operator: ‘ (single quotation mark) or <STEM>
Description: Finds content items that contain variations of the term.
Example: ‘address’ returns content items that contain variations of the word, such as addresses, addressing, or addressed.

Operator: ` (back quotation mark)
Description: Allows for proper parsing of date format when used within a scripted full-text search.
Example: dInDate > `<$dateCurrent(-7)$>` returns all content checked in and released within the last seven days. Without the back quotes, the current date format may not properly span multiple years.

Operator: <ACCRUE>
Description: Finds content items that contain at least one of the specified words.
Example: <ACCRUE> internet server returns content items that contain either the word internet or server, but items that contain both terms are ranked higher.

Operator: <NEAR>
Description: Finds content items that contain the specified words in close proximity to each other. Terms that are closer together receive a higher score.
Example: internet <NEAR> server returns content items that contain the specified words close to one another.

Operator: <SENTENCE>
Description: Finds content items that contain the specified terms within the same sentence.
Example: internet <SENTENCE> server returns content items that contain internet and server in the same sentence.

Operator: <PARAGRAPH>
Description: Finds content items that contain the specified terms within the same paragraph.
Example: internet <PARAGRAPH> server returns content items that contain internet and server in the same paragraph.

Operator: <THESAURUS>
Description: Finds content items that contain words that are synonyms for the specified word.
Example: <THESAURUS> talk returns content items that contain talk, speak, and say.

Operator: <CASE>
Description: Finds content items that contain the specified term in the specified case.
Example: <CASE> ADDRESS returns content items that contain ADDRESS, but does not return items that contain address or Address.

Operator: <TYPO>
Description: Finds content items that contain all words that are spelled similarly to the word that follows this operator.
Example: <TYPO> word returns content items that contain word, ward, and worn.

Operator: <SOUNDEX>
Description: Finds all content items that contain all words that sound or have a letter pattern similar to the word that follows this operator. You must build your own sound-alike index to use this operator.
Example: <SOUNDEX> near returns content items that contain near, dear, and here.

SEARCH PAGES

Verity provides additional functionality for content item searches from the following places in the Content Server interface:

Quick Search Field (page 3-12)

Home Page Search Fields (page 3-12)

Search Results Page (page 3-13)


Quick Search Field

The Quick Search field enables you to perform a search regardless of what page is displayed in the content area. The Quick Search field searches the title and content ID metadata, as well as the full-text index.

Different search configurations used with Content Server use different wildcards and evaluate them differently for full-text and metadata. Because the Quick Search field searches both full-text and metadata, search results from wildcards used in the Quick Search field will depend on your system configuration. For information about wildcards used with Verity, see Metadata Wildcards (page 3-5) and Full-Text Wildcards (page 3-8). (For information about wildcards used with other search configurations and for more information about the Quick Search field, see the Content Server User Guide.)

Home Page Search Fields

The home page search fields enable you to perform a metadata search, full-text search, or a combination of both from the content server home page. Only the most commonly used search fields are available from the home page.

The Score option in the Sort By list is provided only with Verity. (For information about the other home page search fields, see the Content Server User Guide.)

Note: If you have changed your layout to the Classic layout, you need to enable the Quick Search field by selecting the Quick Search check box on your System Links page.


Search Results Page

The Search Results page displays a list of content items that match the criteria specified during a search. The search results page is slightly different depending on which search engine your system uses. The following table describes the features on the search results page specifically provided by the Verity search engine. (For information about the other Search Results page fields, see the Content Server User Guide.)

Feature: Sort By list
Description: Specifies the field that the search results will be sorted on:
Release Date (default): Sorts by the Release Date metadata field.
Title: Sorts alphabetically by the Title metadata field.
Score (Verity only): Sorts by the number of occurrences of search terms, or the proximity of search terms when a proximity operator such as <NEAR> is used. Applies only to full-text searches when Content Server is configured to use the Verity search engine.

Feature: Found x items matching the query
Description: Shows the number of content items that match the search criteria.

Feature: Items x-y of z
Description: Shows the number of content items being displayed on the current search results page. This is displayed only when more than one page of content items is returned.

Feature: “Page x of y” choice list
Description: Enables navigation to any page of content items in the search results list. This feature is displayed at the top and bottom of the search results page only when more than one page of content items is returned.

Feature: Arrow buttons
Description:
Forward: Advances to the next search results page in a series.
Back: Returns to the previous search results page in a series.


Chapter 4. CONFIGURING METADATA

OVERVIEW

Verity content metadata and indexing can be configured in the Content Server repository to optimize searching functions. This section contains the following topics:

Concepts

Using Individual Data Tables (page 4-2)

Housekeeping of Verity Part Files (page 4-2)

Tasks

Setting up Individual Verity Data Tables (page 4-3)

Zone Indexing a Metadata Field (page 4-3)

Interface

Define Filter Screen (page 4-6)

Update Database Design Screen (page 4-7)

Advanced Search Design Screen (page 4-9)

Advanced Options for FieldName Screen (page 4-10)


CONFIGURING METADATA FOR OPTIMAL VERITY SEARCHING

There are several ways to interact with Verity to optimize searching:

Using Individual Data Tables (page 4-2)

Housekeeping of Verity Part Files (page 4-2)

Setting up Individual Verity Data Tables (page 4-3)

Zone Indexing a Metadata Field (page 4-3)

Using Individual Data Tables

During indexing, Verity places the results of its indexing in a “parts” subdirectory for the collection. For each text metadata field, you can configure Verity to place a specific field in its own data table or with other fields in a shared data table. Placing a field in its own data table, using the Advanced Search Design Screen (page 4-9), can make a search against that field ten times faster; however, it also has these drawbacks:

Indexing is slower.

More part files are opened during a search, so opening a connection takes longer. On a network share, this can create a 10 second lag time.

If the number of part files is too large, the operating system may cause a connection failure.

Housekeeping of Verity Part Files

When Verity initially indexes content, it creates new part files for each batch load indexed. Then Verity goes through a housekeeping phase. During this phase, Verity rolls up some of the part files into larger part files. The number of files rolled up is based on an internal Verity algorithm, which tries to limit the total number of part files for each data table. If you have five connections open, 30 separate data tables, and 12 part files per data table, then you have 1800 open files, which is over the standard 1024 open file limit for Solaris.
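The arithmetic behind this example is a simple product, shown here as a short Python sketch. The function name is hypothetical; the figures (five connections, 30 data tables, 12 part files per table, and the 1024 open file limit) are the ones quoted above.

```python
def open_part_files(connections: int, data_tables: int,
                    part_files_per_table: int) -> int:
    """Each connection opens every part file of every data table."""
    return connections * data_tables * part_files_per_table


OPEN_FILE_LIMIT = 1024  # standard Solaris limit quoted in the text

total = open_part_files(connections=5, data_tables=30, part_files_per_table=12)
print(total, total > OPEN_FILE_LIMIT)
```

Because the three factors multiply, reducing the number of separate data tables or the number of part files per table lowers the total quickly.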

Note: Because of the costly indexing issues, it is important to limit the number of fields that you place in separate tables.


Verity can time out a specific index request if there are many separate data tables or part files. Therefore, you may need to increase the timeout value for housekeeping. By default, this is ten minutes.

If you are concerned that you have too many part files, you can force a full housekeeping each time an indexing operation is performed. This stops the creation of extra part files; however, it slows indexing down considerably. In config.cfg, enter:

AdditionalIndexBuildParams=-optimize maxmerge-squeeze

To force a full housekeeping only occasionally, use the configuration entry:

DoAutoMaxMerge

Generally, the larger the collection, the more infrequent the maxmerge should be.

Setting up Individual Verity Data Tables

To set up an individual Verity data table, perform the following steps:

1. On the Configuration Manager screen, click Update Design.

The Advanced Search Design Screen (page 4-9) is displayed.

2. Select a text field for which to set up a separate data table.

3. Click Edit.

The Advanced Options for FieldName Screen (page 4-10) is displayed.

4. Enable the Has a separate index data table option.

5. Click OK.

Zone Indexing a Metadata Field

Verity search performance can be improved by zone indexing a metadata field for quick search. This enables full-text queries to search against the value in the quick search field, using the <in> operator. This causes minimal overhead in Verity and returns good performance. However, you must train your users to search in this manner.

To set up zone indexing for a Verity metadata field, perform the following steps:

1. On the Configuration Manager screen, click Update Advanced Search Design.

The Advanced Search Design Screen (page 4-9) is displayed.

2. Check Enable the zone quick search.


3. Select a text field to zone index for quick searches.

4. Click Edit.

The Advanced Options for FieldName Screen (page 4-10) is displayed.

5. Enable the Is zone indexed option and, optionally, the Is zone searched on the standard query page option.

6. Click OK.

Setting Zone Searchable Fields

To improve search performance, you can identify metadata fields as search zones in Verity. An added benefit of setting a field to be zone searchable is that the field is full-text indexed. Therefore, full-text searches will include the specified metadata fields in the search, and search terms entered in the specified fields on a Search page will be considered full-text search criteria.

By default, the Security Group (dSecurityGroup) field is zone searchable. To set any other metadata fields to be zone searchable, complete the following steps:

1. In Content Server, add the following line to the Additional Configuration Variables field in the Admin Server or to the <Install_Dir>/config/config.cfg file, where the value is a comma-separated list of field names:

ZonedSecurityFields=dDocTitle,xComments,dDocAccount

2. Save your changes.

3. Restart Content Server.

4. Rebuild your Index collection.

Caution: Depending on the quantity and size of your files, rebuilding the search index can take up to a couple of days. When rebuilding is necessary, rebuild the index during off-peak hours.

Zone Searching Example

When you enter a search term in a zone searchable metadata field, the following syntax is implied (the <IN> operator specifies a full-text search):

(search term) <IN> zone_name

For example, if you enter the term stellent in the Title field, the following full-text search will be performed:

(stellent) <IN> dDocTitle
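As a minimal sketch of the implied syntax, the following Python function assembles the query string that the text describes. The function name is hypothetical, and the sketch only formats a string; it does not call Verity or Content Server.

```python
def implied_zone_query(search_term: str, zone_name: str) -> str:
    """Build the full-text query implied by a term typed into a zone searchable field."""
    return f"({search_term}) <IN> {zone_name}"


# The Title field corresponds to the dDocTitle zone in the example above.
title_query = implied_zone_query("stellent", "dDocTitle")
print(title_query)
```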

INTERFACE

This section covers the following topics:

Define Filter Screen (page 4-6)

Update Database Design Screen (page 4-7)

Advanced Search Design Screen (page 4-9)

Advanced Options for FieldName Screen (page 4-10)


Define Filter Screen

The Define Filter screen is used to narrow the list of revisions, users, and so forth that is displayed on several administration application screens. If a revision failed, the cause may be a Verity error; the failure is listed in the Conversion Status field of this screen. For more information about the Define Filter screen, see the Managing Repository Content guide.

Feature Description

Conversion Status field

The conversion status of the revision:

Converted: The revision was converted successfully and the web-viewable file is available.

Processing: The revision is being converted by the Inbound Refinery.

Failed: The revision is deleted, locked, or corrupted, or a Verity Integration error occurred.

MetaData Only: Full-text indexing was bypassed and only the revision’s metadata was indexed.

Refinery PassThru: Inbound Refinery failed to convert the revision and passed the native file through to the web.

Incomplete Conversion: An error occurred in the conversion after a valid web-viewable file was produced and the file was full-text indexed.

Update Database Design Screen

The Update Database Design screen is used to add or delete metadata fields in the content server database. To access this screen, in Content Server go to the Configuration Manager: Information Fields tab, add or delete a custom metadata field, and then click Update Database Design.


Feature Description

Info field(s) that will be added

Lists the metadata fields that were added since the last time the database was updated.

Info field(s) to delete check boxes

Lists the metadata fields that were deleted since the last time the database was updated.

Selected: The metadata field will be deleted from the database.

Clear: The metadata field will not be deleted from the database. The field remains hidden on the Configuration Manager screen and checkin and search pages, but it still exists in the database.

The following table lists the events after which a database update or search index rebuild is required when you are using the Verity index system:

Event                                                            Action Required
Add metadata field                                               Update database
Edit metadata field                                              Update database*
Delete metadata field                                            Update database
Enable or disable Enable for Search Index for metadata field     Rebuild search index
Add metadata field with Enable for Search Index selected         Rebuild search index

* Changes to the Require Value, Option List Default Value, Option List Key, and OptionList values do not require a database update.

Caution: Depending on the size of your search index and available system resources, the search index rebuild process may take several days. If rebuilding is necessary, rebuild at times of non-peak system usage.


Advanced Search Design Screen

The Advanced Search Design Screen is used to tune Verity for optimal searching. To access this screen, click Advanced Search Design on the Configuration Manager screen. This option is not available for other search engines used with Content Server.

Feature Description

Name

The Verity field name. For example, dDocName or dInDate.

Caption

The displayed metadata field name, corresponding to the Verity field name.

Options

Specifies whether the field is in the Search Results and whether it has a corresponding data table.

Enable the zone quick search

For users performing a quick search, the values of the zoned fields will be searched in addition to the full text, which is normally searched. For more information, see Zone Indexing a Metadata Field (page 4-3).

Edit

Select a field to enable this button, which opens the Advanced Options for FieldName Screen (page 4-10). Use that screen for advanced search tuning options.

OK

Enable the changes.

Cancel

Cancel changes.

Help

Invoke help.

Advanced Options for FieldName Screen

The Advanced Options for FieldName screen is used to perform advanced tuning on Verity Integration for optimal searching. To access this screen, click Edit on the Advanced Search Design Screen (page 4-9). This option is not available for other search engines used with Content Server.

Feature Description

Current state

The current state of the Verity field. For example, indexed, has a data table, or missing.


Has a separate index data table

If a field has its own data table, then searches against that field can be up to ten times faster, and you can sort on that field. There are two drawbacks to using a separate index data table:

Indexing is slower.

The number of part files to open during a search is increased. Therefore, the time to open a connection is greater.

It is recommended that you limit the number of fields with separate index data tables. dDocTitle is in a separate data table by default. For more information, see Using Individual Data Tables (page 4-2).

Is zone indexed

Zone indexing a field is a way to boost search performance by adding specific metadata fields to be full-text searched. By default, this is not on the standard query page. For more information, see Zone Indexing a Metadata Field (page 4-3).

Is zone searched on the standard query page

If a field has zone indexing enabled, you can select this option to enable zone searching on the standard query page. For more information, see Zone Indexing a Metadata Field (page 4-3).

Is returned in the search results

This field is returned in the search results if this option is selected.

OK

Enable the changes.

Cancel

Cancel changes.

Help

Invoke help.


Chapter 5. INDEXING

OVERVIEW

This section covers the following topics:

Concepts

About Full-Text Indexing (page 5-2)

Supported File Formats (page 5-3)

Date Storage (page 5-3)

About Writing Queries (page 5-9)

About Verity Collections (page 5-11)

Verity Partitions (page 5-12)

Performance Tuning (page 5-15)

Tasks

Setting Accent Insensitivity (page 5-4)

Setting Tokenization (page 5-4)

Customizing Verity for PDF Files (page 5-6)

Indexing Structured Documents (page 5-8)

Limiting the Verity Search Index on UNIX (page 5-14)


Indexing

FULL-TEXT INDEXING WITH VERITY

This section covers the following topics:

About Full-Text Indexing (page 5-2)

Supported File Formats (page 5-3)

Date Storage (page 5-3)

About Full-Text Indexing

Full-text indexing means that every word in a file is indexed, not only its metadata. The system applies Verity full-text indexing in one of these ways:

By default, full-text indexing is applied to all converted files.

By default, Content Server full-text indexes files that are passed through in any of the formats listed in Supported File Formats (page 5-3).

For example, if you want to convert your Microsoft Word (.doc) files to text files instead of PDF, you can specify this in the Configuration Manager. Then the text file is fully indexed before it is passed to the web site.

You can enable contributors to specify whether to full-text index a file by enabling the format override feature in System Properties.

For example, if you have set Corel WordPerfect (.wpd) files to be passed through as text and a contributor selects the use default option in the Format field on the checkin page, the file will be converted to text and full-text indexed. If the contributor selects Corel WordPerfect Document, the file will be passed through in its native format and will not be full-text indexed.

You must rebuild the index if you have configured locales that Verity supports.

Tech Tip: You can use multi-byte characters in the names of Content IDs, security groups, content types, and accounts. However, if you are using Dynamic Converter and searching for a multi-byte filename, the search will fail. This is a limitation of the program in this release.


Supported File Formats

By default, Verity full-text indexes files that are passed through in any of the following formats:

• pdf
• html
• htm
• text
• txt
• xml
• msword
• doc
• rtf
• ms-powerpoint
• ppt
• ms-excel
• xls

If you define a file format to PASSTHRU in the native format, and the format name contains one of the types listed above (such as application/ms-excel.native), the passed through native file will be full-text indexed by default.

Date Storage

In Content Server 6.1 and earlier, Verity stored dates as the number of seconds since 1904, up to the year 2037 (1904/1/1 through 2037/12/31). As of version 6.2, Verity stores dates in an extended date format, as the number of minutes since the year 1000 through the year 2999 (1000/1/1 through 2999/12/31).

Content Server is designed to store dates with four-digit years until the year 10,000 AD. Dates with two-digit years are translated and stored according to the following rules:

OS/Database/Indexing Engine Configuration                                 Two-digit year   Translated as
Windows/SQL Server/Verity, Windows/Oracle/Verity, Solaris/Oracle/Verity   00-69            2000-2069
Windows/SQL Server/Verity, Windows/Oracle/Verity, Solaris/Oracle/Verity   70-99            1970-1999
Windows/Access/Verity                                                     00-29            2000-2029
Windows/Access/Verity                                                     30-99            1930-1999
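The two-digit year rules in the table can be expressed as a pivot year per configuration. The Python sketch below is an illustration of the table only: the configuration keys and function name are hypothetical, and the sketch does not reflect how Content Server implements the translation internally.

```python
# Pivot years taken from the table: values below the pivot map to 20xx, the rest to 19xx.
PIVOT_YEAR = {
    "sqlserver_or_oracle": 70,  # 00-69 -> 2000-2069, 70-99 -> 1970-1999
    "access": 30,               # 00-29 -> 2000-2029, 30-99 -> 1930-1999
}


def expand_two_digit_year(two_digit_year: int, configuration: str) -> int:
    """Translate a two-digit year into a four-digit year for the given configuration."""
    if not 0 <= two_digit_year <= 99:
        raise ValueError("expected a year between 0 and 99")
    pivot = PIVOT_YEAR[configuration]
    century = 2000 if two_digit_year < pivot else 1900
    return century + two_digit_year


print(expand_two_digit_year(45, "sqlserver_or_oracle"))
print(expand_two_digit_year(45, "access"))
```

The same two-digit value can therefore land in different centuries depending on the database in use, which is a good reason to enter four-digit years wherever possible.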


CUSTOMIZING INDEXING AND SEARCHING

The Verity indexing and search engine can be customized in the following ways:

Setting Accent Insensitivity (page 5-4)

Setting Tokenization (page 5-4)

Customizing Verity for PDF Files (page 5-6)

Indexing Structured Documents (page 5-8)

Setting Accent Insensitivity

You can specify that searches are accent-insensitive. For example, the search term bibliotheque would find documents containing “bibliotheque” or “bibliothèque”.
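To illustrate what accent-insensitive matching means, the following Python sketch folds accented characters to their base letters before comparing. This is a conceptual example using Python's standard unicodedata module; it is an assumption for illustration and not a description of how Verity performs the comparison.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Remove combining accent marks so that 'è' compares equal to 'e'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_insensitive_match(term: str, word: str) -> bool:
    """Compare a search term and an indexed word, ignoring accents and case."""
    return fold_accents(term).casefold() == fold_accents(word).casefold()


print(accent_insensitive_match("bibliotheque", "bibliothèque"))
```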

To enable accent insensitivity:

1. Make a backup copy of the <Install_Dir>/custom/VDK6/vdk/common/<locale_name>/loc.prm file (Verity locale file) where <locale_name> is the appropriate setting for the language used at your site (for example, englishv).

2. Open the original loc.prm file in a text editor.

3. Comment out the following line in the loc.prm file:

$define LOC_ACCENT_SENSITIVE

4. Save the loc.prm file.

5. Restart the content server.

6. Rebuild your index collection.

Note: By default, accent-insensitivity will not apply to PDF files. See Customizing Verity for PDF Files (page 5-6).

Caution: Depending on the size of your index collection and available system resources, the rebuild process can take up to a couple of days. If rebuilding is necessary, rebuild at times of non-peak system usage.

Setting Tokenization

When Verity indexes a document, the tokenizer breaks words at whitespace and punctuation characters into shorter search terms or tokens. For example, www.company.com/mypage.htm can be indexed into five searchable tokens: www, company, com, mypage, and htm. By default, the multilanguage locale uses simple-tokens behavior. For other locales, tokenization must be enabled.

If you enable the tokenization function, you can also:

Exclude specific symbols from functioning as token delimiters. For example, if you don’t want to have e-mail addresses split into tokens, you could exclude the @ symbol.

Specify symbols that should be treated as search terms. For example, users can search for $ and © symbols.
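The tokenization behavior described above can be approximated in a few lines of Python. This is a rough sketch under stated assumptions: it treats ASCII punctuation and whitespace as delimiters, and the keep parameter is a hypothetical stand-in for excluding symbols from the delimiter set. Verity's actual tokenizer is configured through uni.cfg and may behave differently.

```python
import re
import string
from typing import List


def simple_tokens(text: str, keep: str = "") -> List[str]:
    """Split text at whitespace and ASCII punctuation, except characters listed in keep."""
    delimiters = "".join(
        ch for ch in string.punctuation + string.whitespace if ch not in keep
    )
    pattern = "[" + re.escape(delimiters) + "]+"
    # Drop empty strings produced by leading or trailing delimiters.
    return [token for token in re.split(pattern, text) if token]


print(simple_tokens("www.company.com/mypage.htm"))
print(simple_tokens("user@example", keep="@"))
```

With the @ symbol excluded from the delimiters, the second call keeps user@example together as one token, while the period would still split a full e-mail address unless it were excluded as well.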

To enable tokenization, complete the following steps:

1. Make a backup copy of the <Install_Dir>/custom/VDK6/vdk/common/<locale_name>/uni.cfg file (Verity locale file) where <locale_name> is the appropriate setting for the language used at your site (for example, englishv).

2. Open the original uni.cfg file in a text editor.

3. Find the following line and change the variable’s value to yes:

simple_tokens: yes

4. To exclude specific symbols from functioning as token delimiters and instead treat them as punctuation, locate the treat_as_punctuation block within the global post-process section. (If the block is commented out, uncomment the parts of it you are going to use.)

To specify individual characters, follow the chars: label with a space-separated list of character codes. All character codes must be Unicode.

To specify a range of characters, follow the range: label with the first and last character codes in the range, separated by a space. All character codes must be Unicode. You can have more than one range: statement.

Note: By default, tokenization behavior will not apply to PDF files. See Customizing Verity for PDF Files (page 5-6).

Note: If you have enabled the accent-insensitivity function, the tokenization function is already enabled, with the following defaults:

The hyphen (-), underscore (_), and ampersand (&) are excluded.

The following symbols are searchable: # $ % © ® ¢ £ ¥

See Setting Accent Insensitivity (page 5-4).


5. To specify characters as symbols and treat them as search terms, locate the treat_as_symbol block within the global post-process section. Modify the block in the same way as in the previous step, using individual codes or code ranges.

6. To treat symbol characters as regular alphabetic characters, specify them in the treat_as_alphabetic block. Modify the block in the same way as in the previous steps, using individual codes or code ranges.

7. To specify symbol characters as alphabetic characters, but only when they do not occur at the beginning of a word, specify them in the not_allowed_leading_char block.

8. To specify symbol characters as alphabetic, but only when they do not occur at the end of a word, specify them in the not_allowed_trailing_char block.

9. Save and close the uni.cfg file.

10. Restart the content server.

11. Rebuild your index collection.

Note: If you specify a symbol character in the not_allowed_leading_char block and also in the treat_as_symbol or treat_as_alphabetic block, then treat_as_symbol or treat_as_alphabetic takes precedence.

Caution: Depending on the size of your index collection and available system resources, the rebuild process can take up to a couple of days. If rebuilding is necessary, rebuild at times of non-peak system usage.

Customizing Verity for PDF Files

Content Server uses a different Verity filter for PDF files than for all other file formats. Consider the following items for additional configuration.

Verity 6: The unique Verity filter (flt_pdf) allows keyword highlighting of PDF files; however, the accent-insensitivity and tokenization functions in the Verity 6.1 VDK (see Setting Accent Insensitivity (page 5-4) and Setting Tokenization (page 5-4)) are not available with this filter.

If you want to enable the accent-insensitivity and tokenization functions and disable keyword highlighting in PDF files, you can specify the standard filter (flt_kv) as follows:

1. Create a directory called vdk6_custom_style in the <install_dir>/search/ directory. (The directory must be named correctly to work.)

2. Copy all of the files from the <install_dir>/custom/VDK6/style directory to the new vdk6_custom_style directory.

3. Add the following line to the Additional Configuration Variables field in the Admin Server or to the <install_dir>/config/config.cfg file:

UseVdk6CustomStyle=1

This tells Content Server to use the files in the new directory you created.

4. Open the <install_dir>/search/vdk6_custom_style/style.uni file in a text editor.

5. Find the type: "application/pdf" section of the file.

6. Move the # comment symbol from the line that specifies the flt_kv filter to the line that specifies the flt_pdf filter. The code should look like this:

type: "application/pdf"
# /format-filter = "flt_pdf"
/format-filter = "flt_kv"

7. Save and close the file.

8. Restart Content Server.

9. Rebuild your index collection.
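If you prefer to script the style.uni change rather than edit the file by hand, the edit amounts to swapping which format-filter line is commented out in the PDF section. The Python sketch below works on the file contents as a string. The function name is hypothetical, and the sketch assumes that every section of style.uni begins with a type: line, which this guide shows only for the PDF section; review the result against your own file before using it, and apply it only to the copy in the custom style directory.

```python
def use_keyview_pdf_filter(style_text: str) -> str:
    """Comment out the flt_pdf line and uncomment the flt_kv line in the PDF section."""
    result = []
    in_pdf_section = False
    for line in style_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("type:"):
            # Assumption: a new "type:" line starts a new section.
            in_pdf_section = '"application/pdf"' in stripped
        elif in_pdf_section:
            uncommented = stripped.lstrip("#").strip()
            if uncommented == '/format-filter = "flt_pdf"':
                line = "# " + uncommented
            elif uncommented == '/format-filter = "flt_kv"':
                line = uncommented
        result.append(line)
    return "\n".join(result)


sample = "\n".join([
    'type: "application/pdf"',
    '/format-filter = "flt_pdf"',
    '# /format-filter = "flt_kv"',
])
print(use_keyview_pdf_filter(sample))
```

Note that the sketch drops any leading indentation on the two lines it rewrites and leaves all other lines unchanged.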

Caution: Depending on the size of your index collection and available system resources, the rebuild process can take up to a couple of days. If rebuilding is necessary, rebuild at times of non-peak system usage.

Caution: Do not modify the original Verity style files; always make changes in the custom style files.

Verity 4.5 only: Full-text indexing of PDF files is not always handled correctly for multi-byte and bidirectional languages if the default Verity settings are used. To fix this, you must edit Verity’s style.uni to use the keyview filter for PDFs:

1. Add the entry UseVdk4CustomStyle=true to <CS_Instance_Dir>/config/config.cfg.

2. Save the file, and restart the content server.

3. Create a directory called vdk4_custom_style under the <CS_Instance_Dir>/search directory.

4. Copy all files from the <CS_Instance_Dir>/shared/search/style/basic directory to the <CS_Instance_Dir>/search/vdk4_custom_style directory.

5. Start a text editor and open style.uni in the <CS_Instance_Dir>/search/vdk4_custom_style directory.

6. Comment out the line /format-filter = "flt_pdf" under type: "application/pdf" and uncomment /format-filter = "flt_kv", for example:

type: "application/pdf"
# /format-filter = "flt_pdf"
/format-filter = "flt_kv"

7. Save and rebuild the search index.

Note: Even with the flt_kv filter, PDF files are not always full-text indexed correctly for multi-byte characters.

Indexing Structured Documents

When Verity indexes certain types of structured documents (such as XML and HTML), it includes zone information in the collection’s full-text index. Zones are specific regions of a document to which searches can be limited, such as a particular field or tag. Zone searching improves speed and efficiency because the searches are limited to specified portions of documents.

By default, Verity applies a zone filter when indexing the following types of structured documents:

XML Documents (page 5-8)

HTML Documents (page 5-9)

SGML Documents (page 5-9)

Internet Message Format Documents (page 5-9)

Tech Tip: If you are updating from a pre-6.0 release of Verity, you will need to rebuild the index to use the structured document filters.

XML Documents

Each XML tag in a well-formed XML document is indexed as a zone, with the zones given the same name as the XML tags. The content in an XML tag can also be indexed as a searchable metadata field. META tags are automatically indexed as fields unless they are in a suppressed region.

The default XML filter can be customized using a style.xml file. You can specify field names, ignore tags and just index the content, or suppress certain elements altogether.

HTML Documents

Verity Integration’s zone filter includes built-in support for standard HTML tags. Each standard HTML tag is indexed as a zone.

SGML Documents

To index SGML documents using zones, you need to define how the tags are to be indexed.

Internet Message Format Documents

Verity’s zone filter includes built-in support for both standard e-mail and Usenet news messages. These documents must conform to the RFC822 standard.

ABOUT WRITING QUERIES

You can write custom query expressions when you define query links as part of building a web site. The method that you use to write custom queries varies depending on the kind of query that you write. Basic Verity query information is provided in the following topics.

Basic Verity Query Script (page 5-9)

Basic Verity Script Examples (page 5-11)

Verity and Query Links (page 5-11)

Basic Verity Query Script

To write directory custom queries, you can use Verity script and Idoc Script. Idoc Script is Content Server’s scripting language, which is described in detail in Getting Started with the Software Developer’s Kit (SDK) and the Idoc Script Reference Guide. Verity script consists of words and operators; operators are words that show logical relationships between the words in your query. The following table contains some basic operators and their use.

Operator Use

, (comma)

Returns files that contain one or more of the specified words, ranking them by the frequency of the word use.

AND

Returns files that contain all of the words it links.

OR

Returns files that contain at least one of the words it links.

NOT

Returns files containing the word that precedes it and not the word that follows it.

<ACCRUE>

Returns files that contain at least one of the words entered.

<NEAR>

Returns files that contain the specified words if they are in the same general location. They do not need to be next to each other.

'word' (single quotation mark) or <STEM>

Returns variations of the word. For example, worded, wording, words.

"word" (double quotation mark) or <WORD>

Returns the word with no variations. For example, word but not wording, worded, and so forth.

<TYPO>

Returns files that contain words spelled similarly to the word that follows this operator.

<SOUNDEX>

Returns files that contain words that sound like the word that follows this operator.

See Also: Full-Text Search Operators (page 3-9)


Basic Verity Script Examples

Finds all files with the word summary in the title that were authored by sysadmin:

dDocTitle <contains> 'summary' AND dDocAuthor <matches> 'sysadmin'

Finds all files with the HR substring in the Comments section that do not have the word benefit in the Comments section:

xComments <substring> 'HR' NOT xComments <contains> 'benefit'
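When queries like these are generated by a program rather than typed by hand, it can help to assemble them from field clauses. The Python sketch below reproduces the two example queries as plain strings. The helper name is hypothetical, and the sketch does no quoting or escaping of values, so it is suitable only for simple values that contain no quotation marks.

```python
def field_clause(field: str, operator: str, value: str) -> str:
    """Format one clause such as: dDocTitle <contains> 'summary'."""
    return f"{field} <{operator}> '{value}'"


summary_by_sysadmin = " AND ".join([
    field_clause("dDocTitle", "contains", "summary"),
    field_clause("dDocAuthor", "matches", "sysadmin"),
])

hr_without_benefit = " NOT ".join([
    field_clause("xComments", "substring", "HR"),
    field_clause("xComments", "contains", "benefit"),
])

print(summary_by_sysadmin)
print(hr_without_benefit)
```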

Verity and Query Links

When you create a query link and enter information into the Query Link Definition screen, Verity can cause different results with the text entered into the Text 1 and Text 2 fields. When adding Idoc Script and HTML tags to these fields, keep in mind that any resulting HTML tags can affect the display of the search results page.

For example, the VDKSUMMARY variable retrieves a summary of the full-text index of a content item. If a content item contains HTML tags, these tags could be included in the full-text index, and then would be included in the Text 1 or Text 2 field when <$VDKSUMMARY$> is specified.

To prevent formatting errors, use the xml Idoc Script function to escape the HTML syntax in Idoc Script variables. For example, <$xml(VDKSUMMARY)$>.
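The effect of escaping can be seen with a small Python analogy. Python's html.escape is used here only to show what happens to HTML syntax when it is escaped; it is not the Idoc Script xml function, and the exact set of characters that xml escapes may differ. The sample summary text is invented for illustration.

```python
import html

# Hypothetical summary text that contains HTML tags from the indexed document.
summary = "Quarterly <b>results</b> & outlook"

# After escaping, the tags are displayed as text instead of being rendered.
escaped_summary = html.escape(summary)
print(escaped_summary)
```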

For more information, see Managing System Settings and Processes and the Idoc Script Reference Guide.

UNDERSTANDING VERITY COLLECTIONS

This section covers these topics:

About Verity Collections (page 5-11)

Verity Partitions (page 5-12)

Limiting the Verity Search Index on UNIX (page 5-14)

About Verity Collections

A Content Server instance uses a single Verity collection. When you add documents to a content server, they are added to the same collection. When you rebuild a collection, the content server will create a new collection, but only one collection is active at any given time.

Verity stores collections in two index directories: <install_dir>/search/index1/ and /index2/. The index1 directory stores the active collection, and the index2 directory is in reserve for a rebuild function. You can tell which index is active by looking in the /search/activeindex.hda file, which records which of the two index subdirectories contains the active index (verity.1 or verity.2).

Each Verity collection is made up of several Verity Partitions (page 5-12). The partitions that make up the collection are stored in the intradocbasic/parts/ subdirectory in each of the indexes.

Verity Partitions

A partition is a set of files within a Verity collection. Every time Content Server asks Verity to index documents, a new partition is created. The Content Server indexing process is asynchronous to the checkin process. When a document is checked in either by batch load or an HTTP upload, the indexing process is notified, and an indexing cycle is initiated. The indexing cycle then groups a reasonable number of documents to be indexed together. The number of documents depends on configuration parameters and the file sizes of the documents being indexed. During batch loads, it is common to have 25 files per indexing cycle, while during everyday checkin activities, it is common to have 1 file per indexing cycle.

Verity partitions are located in the search/indexX/intradocbasic/parts/ subdirectory. Each file in a partition has the same file name with different extensions. The file name is an eight-digit number, and the extensions are three characters in sequence. For example, files 00000008.dva and 00000008.dvb are files that make up the same partition.
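As a rough illustration of this naming scheme, the following Python sketch groups the files in a parts directory by their eight-digit base name, so that each group corresponds to one partition. The function name is hypothetical and is not part of Content Server or Verity; only the naming pattern comes from the description above.

```python
import re
from collections import defaultdict
from pathlib import Path

# Eight-digit base name followed by a three-character extension,
# as described above (for example, 00000008.dva).
PARTITION_FILE = re.compile(r"^(\d{8})\.(\w{3})$")


def group_partition_files(parts_dir):
    """Map each partition number to the sorted list of its file extensions."""
    partitions = defaultdict(list)
    for entry in Path(parts_dir).iterdir():
        match = PARTITION_FILE.match(entry.name)
        if entry.is_file() and match:
            partitions[match.group(1)].append(match.group(2))
    return {name: sorted(exts) for name, exts in partitions.items()}
```

Calling the function on a copy of the search/indexX/intradocbasic/parts/ directory gives a quick count of partitions and of files per partition, which are the two quantities that matter for the open file calculation later in this chapter.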

Partition File Extensions

Content Server automatically creates a file in the <install_dir>/search/indexX/intradocbasic/style/ directory called style.ufl. This file contains information for Verity about how to index documents. By default, the content server puts each non-numeric (non-date and non-integer) metadata field into its own file within the partition. The file name extension starts at .dva and goes to .dvz. If there are more than 26 metadata fields, Verity starts over at .dva and stores multiple fields in the same file extension. Because date and integer fields are not indexed, certain file extensions are skipped. Ultimately, the number of files per partition depends on the number of metadata fields in your system.

Note: The logic Verity uses to generate the file extensions is not entirely straightforward. Sometimes Verity will roll up multiple parts files into a single file. In this case, you won’t see predictable file extensions. If many different parts files are created, then it is more likely that many different extensions are also generated.

Merging Partitions

Verity manages the partitions by periodically merging several partitions into one new partition and deleting the old partitions. The best way to understand this merge process is by example:

Assume that a large batch load is occurring and 25 documents are being indexed for each indexing cycle. In addition, assume that the content server has no content items.

25 Documents get indexed into Partition 00000001

25 Documents get indexed into Partition 00000002

25 Documents get indexed into Partition 00000003

25 Documents get indexed into Partition 00000004

When there are four partitions at the same level, they are merged into a single partition. In this case, there are four partitions that have never been merged (level 0), so Verity merges these four partitions into a new partition:

100 Documents get merged into Partition 00000005

This new partition is at level 1, meaning that it contains one merged set of documents. The pattern continues as follows:

25 Documents get indexed into Partition 00000006

25 Documents get indexed into Partition 00000007

25 Documents get indexed into Partition 00000008

25 Documents get indexed into Partition 00000009

100 Documents get merged into Partition 00000010

Thus, you will have two active partitions (number 5 and 10) with 100 documents each. Soon you will have four level 1 partitions that will be merged as follows:

100 Documents in Partition 5

100 Documents in Partition 10

100 Documents in Partition 15

100 Documents in Partition 20

Now a merge to level 2 takes place. 400 documents are merged into Partition 21, and Partitions 5, 10, 15 and 20 are no longer in use. It is common to have between 5 and 15 partitions at one time.
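The merge pattern in this example can be reproduced with a short simulation. The following Python sketch is only a model of the example: it assumes that exactly four partitions at the same level trigger a merge and that merges cascade immediately, which is what the example shows but is not necessarily how Verity schedules merges in every case. The function and variable names are hypothetical.

```python
from collections import Counter


def simulate_merges(cycles, docs_per_cycle=25, fan_in=4):
    """Return the active partitions as (number, level, documents) tuples."""
    partitions = []
    next_number = 1
    for _ in range(cycles):
        # Each indexing cycle creates a new, never-merged (level 0) partition.
        partitions.append((next_number, 0, docs_per_cycle))
        next_number += 1
        # Merge as long as some level holds fan_in partitions.
        while True:
            counts = Counter(level for _, level, _ in partitions)
            full_levels = [lvl for lvl, n in counts.items() if n >= fan_in]
            if not full_levels:
                break
            level = min(full_levels)
            group = [p for p in partitions if p[1] == level][:fan_in]
            partitions = [p for p in partitions if p not in group]
            merged_docs = sum(docs for _, _, docs in group)
            partitions.append((next_number, level + 1, merged_docs))
            next_number += 1
    return partitions
```

With these assumptions, eight indexing cycles leave partitions 5 and 10 active with 100 documents each, and sixteen cycles leave a single level 2 partition, number 21, holding 400 documents, which matches the walkthrough above.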

Limiting the Verity Search Index on UNIX

On some versions of certain flavors of UNIX, a process may only open 1024 files at a time. Some levels of some UNIX operating systems allow you to configure this number, while others do not. When the open file limit is reached, the content server will crash. To determine the number of open files you require, use the following equation:

Open Files Required = (# of Verity Partitions) x (# of Files per Partition) x (# of Concurrent Verity Requests)
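As a worked illustration of this equation, the following Python sketch multiplies the three factors and compares the result with a limit. The sample figures (10 partitions, 20 files per partition, 6 concurrent requests) are invented for the example and are not measurements from any system.

```python
def open_files_required(partitions, files_per_partition, concurrent_requests):
    """Apply the equation above: partitions x files per partition x requests."""
    return partitions * files_per_partition * concurrent_requests


def exceeds_limit(required, limit=1024):
    """Return True if the required number of open files is above the limit."""
    return required > limit


# Hypothetical figures, chosen only to show the arithmetic.
required = open_files_required(10, 20, 6)
print(required, exceeds_limit(required))
```

Here 10 x 20 x 6 gives 1200 open files, which is above a 1024-file limit, so at least one of the three factors would have to be reduced or the limit raised.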

The following is a list of things you can do to make sure that enough files can be opened on the UNIX system:

Increase the number of files that are allowed to be opened: Contact your UNIX System Administrator to increase the number of files. See also Example: Open File Limit (page 5-15).

Control the number of partitions: You can periodically run a MaxMerge, Squeeze, and MaxClean against the collection. This can either be done once a night, once a week, or for every indexing cycle. (See the AdditionalIndexBuildParams configuration variable in the Idoc Script Reference Guide for more information.)

Control the number of files per partition: This approach requires changes to the code in the style.ddd file, which controls the creation of the style.ufl file. Possible changes include:

• Putting all metadata fields into a single file extension.

• Putting highly searched metadata fields into their own file extension, enabling you to turn on indexing for these files. The indexing of certain metadata fields can dramatically increase the speed of searches. This in turn can cause the number of concurrent search requests to decrease.

Reduce the number of concurrent Verity requests from the content server: Set the MaxSearchConnections setting to a lower number. (See the MaxSearchConnections configuration variable in the Idoc Script Reference Guide for more information.)

Example: Open File Limit

One of the default values in the UNIX startup scripts is the open file descriptor, INTRADOCMS_FDLIMIT=1024. This value sets the number of open file descriptors that Content Server will be allowed to use. If you want to increase this number, you need to change the value in the Content Server configuration file, <install_dir>/etc/config.

To increase the number of open file descriptors available to Content Server (in this example, to 4096):

1. Create a file called config in the <install_dir>/etc/ directory.

2. In the config file, add the following line:

INTRADOCMS_FDLIMIT=4096

3. Make sure that the kernel is configured to allow the number of open file descriptors per process that you set in the config file.
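One way to approach the check in step 3 is to compare the requested value with the hard limit that the operating system reports for a process. The following Python sketch uses the standard `resource` module, which is available on UNIX-like systems only. It reads the limit of the process that runs the script, which is an assumption about how the check would be carried out; the limit that applies to the Content Server process may differ and may also depend on kernel settings that this sketch does not inspect. The function name is hypothetical.

```python
import resource


def descriptor_limit_allows(requested, limits=None):
    """Return True if the requested descriptor count fits the hard limit.

    limits is a (soft, hard) pair; when omitted, the limits of the current
    process are read with resource.getrlimit.
    """
    if limits is None:
        limits = resource.getrlimit(resource.RLIMIT_NOFILE)
    _soft, hard = limits
    return hard == resource.RLIM_INFINITY or requested <= hard


if __name__ == "__main__":
    # 4096 is the value used in the example above.
    print(descriptor_limit_allows(4096))
```

If the function returns False for the value set in INTRADOCMS_FDLIMIT, the system limit would need to be raised by the UNIX system administrator before the new setting can take effect.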

PERFORMANCE TUNING

Consider the following options for tuning Verity to achieve better performance.

One of the simplest ways to tune the Verity index is to use options on the mkvdk.exe command. This command is used to establish parameters for the Verity index. See the Idoc Script Reference Manual.

Another way to tune Verity is to use parts files. Parts files enable you to place commonly searched metadata fields into separate files. For example, security group, document title, or author are often used as search terms. These fields can be put into separate files for faster searching. If you have many data files, searching will be fast but indexing may be slower.

The ability to configure the data file placement of metadata fields is a part of Content Server. See the Managing Repository Content Guide for details about parts file use.

You can customize Verity indexing by setting accent insensitivity, using tokens (splitting terms with punctuation or symbols into shorter search terms), keyword highlighting for PDF files, and setting options for indexing structured documents (such as XML, HTML, and SGML documents). See Customizing Indexing and Searching (page 5-4) for details.

It’s important to maintain a clean, small index wherever possible. The following indexer variables are useful for optimizing Verity for consumption (searching). These settings may not always be optimal for contribution. Always consider a balanced approach based on the usage at your site.

• AdditionalIndexBuildParams: Adds build parameters to every Verity indexer (mkvdk.exe) execution. This can be used to force optimization to occur after every indexing bulk load instead of every few bulk loads.

• DoAutoMaxMerge: The “auto tune-up” feature automatically adds cleanup build parameters to Verity indexer execution at calculated intervals. This can improve searching efficiency and indexing performance.

• MaxMergeBaseCount: Sets the frequency at which the Verity indexer “auto tune-up” feature is executed.

• TimeoutPerOneMegInSec: Sets the timeout for indexing files into the Verity collection. Setting this option too low causes index problems and setting it too high causes you to wait longer before killing a stuck process.

You can minimize potentially slow batch load performance when checking in a large number of content items by making adjustments to the Batch Loader utility. For the Verity index system, you could optimize the search collection prior to inserting the batch load file. For more information about Verity optimization scripts and parameters, see the AdditionalIndexBuildParams configuration variable in the Idoc Script Reference Guide.

Chapter 6. INTERNATIONAL CONFIGURATION

OVERVIEW

This section covers the following topics:

Concepts

Verity Locales (page 6-1)

Supported Verity Locales (page 6-2)

Considerations (page 6-5)

Tasks

Setting the Verity Locale (page 6-4)

VERITY LOCALES

Verity is a search engine that can be used with Content Server to provide full-text and metadata search capability. There are a number of “language locales” for Verity, which are used to make sure language-specific characters in text are indexed correctly. If you choose to use Verity, the Verity locale is set during the component installation, but you can also change it at a later time.

There are Verity locales for a number of Western European languages (for example, English, German, French, Spanish, and so forth) as well as for Asian languages (for example, Japanese and Korean). Each of the Western European locales can handle special characters in all Western European languages, including ä, Ö, ß, and so forth (German), é, â, ç, and so forth (French), and ó, ñ, á, and so forth (Spanish). The Asian locales are not interchangeable. This means that the Japanese locale can only handle Japanese texts and not Korean or Chinese.

SUPPORTED VERITY LOCALES

The following table shows the Verity locales that are supported:

Verity locale          Languages that Verity index engine can handle

english                English (2)
englishv (= default)   All Western European languages (1)
frenchv                All Western European languages (1)
germanv                All Western European languages (1)
spanishv               All Western European languages (1)
italianv               All Western European languages (1)
portugv                All Western European languages (1)
dutchv                 All Western European languages (1)
danishv                All Western European languages (1)
swedishv               All Western European languages (1)
bokmalv                All Western European languages (1)
finnishv               All Western European languages (1)
japanb                 Japanese + English (2)
koreab                 Korean + English (2)
simpcb                 Simplified Chinese + English (2)
tradcb                 Traditional Chinese + English (2)
uni (3)                All languages and language combinations. In addition, this includes Arabic, Czech, Greek, Hebrew, Hungarian, Nynorskv, Polish, Russian, and Turkish.

Notes:

1. ‘Western European languages’ in the table refers to the following languages (in addition to English): German, French, Spanish, Italian, Portuguese, Dutch, Danish, Swedish, and Finnish (all sharing the ISO-8859-1 encoding). The differences between the locales are related to the language in which Verity-related messages are presented and the way results are presented in the search results pages, and so forth.

2. ‘English’ in the table essentially refers to all ASCII characters, that is, a-z, A-Z, 0-9, and common punctuation marks (comma, colon, question mark, and so forth), but not special letters such as é, Å, ö, ñ, ß, and so forth.

3. The ‘uni’ locale is not supported by Verity VDK 4.5.1; it requires VDK 5.x or higher. If required, contact Oracle support for assistance.

Note: All Content Server instances (including proxied servers) have their own Verity locales. You can only use one Verity locale per Content Server instance. This means Verity cannot handle both Japanese and Western European languages in the same Verity instance.

Note: As shown in the table above, the ‘uni’ Verity locale (UTF-8) provides the most comprehensive indexing and searching capabilities. If you anticipate that the content server will handle multilingual content from various language groups, it is recommended that you use this Verity locale.

Caution: If you want to change the search index locale, you need to rebuild the search index. This can be a very time-consuming process, depending on the number of content items managed by the Verity system.

SETTING THE VERITY LOCALE

When the Verity component is installed and enabled, a locale must be specified. Specifying this configuration setting is part of Installing Verity (page 2-4). The same procedure can be used to change the locale.

To specify a language locale, use the following procedure:

1. Go to the file <Install_Dir>/<instance>/config/config.cfg (where <Install_Dir> is the installation directory of the Content Server instance), and open it in a text editor.

2. Add the following entry:

VerityLocale=<value>

where <value> is any of the Verity locales in the table in Supported Verity Locales (page 6-2), for example:

VerityLocale=spanishv

3. Save the modified configuration file and exit the text editor. (If a warning is displayed about saving the file to a text-only format, go ahead and confirm the action.)

4. Restart the Content Server instance.
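For sites that script their configuration changes, steps 1 to 3 can be sketched as follows. This Python example works on the text of the configuration file rather than on a fixed path, validates the value against the locale names listed in Supported Verity Locales, and either replaces an existing VerityLocale entry or appends a new one. The function name is hypothetical, and the assumption that the entry sits on its own line in KEY=value form is based on the example entry above. The sketch does not restart Content Server and does not rebuild the index.

```python
# Locale names as listed in the Supported Verity Locales table.
SUPPORTED_LOCALES = {
    "english", "englishv", "frenchv", "germanv", "spanishv", "italianv",
    "portugv", "dutchv", "danishv", "swedishv", "bokmalv", "finnishv",
    "japanb", "koreab", "simpcb", "tradcb", "uni",
}


def set_verity_locale(config_text, locale):
    """Return config_text with VerityLocale set to the given locale."""
    if locale not in SUPPORTED_LOCALES:
        raise ValueError(f"unsupported Verity locale: {locale}")
    entry = f"VerityLocale={locale}"
    lines = config_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if line.strip().startswith("VerityLocale="):
            lines[i] = entry  # overwrite the existing entry
            replaced = True
    if not replaced:
        lines.append(entry)  # no entry yet, so add one at the end
    return "\n".join(lines) + "\n"
```

A caller would read the configuration file, pass its contents through the function, write the result back, and then restart the Content Server instance as described in step 4.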

Note: The language setting for Verity is separate from language configuration for Content Server.

Note: Asian and German language settings require that you install and enable the VDK6asian or VDK6german component. In order to use the ‘uni’ locale, you must install and enable the VDK6, VDK6asian, and VDK6german components. For details see Considerations (page 6-5).

Caution: If the new Verity locale does not use the same encoding scheme as the old one (for example, from ‘englishv’ to ‘japanb’), you need to rebuild the search index. This may be a very time-consuming process, depending on the number of content items managed by your Content Server instance. It is therefore recommended that you perform the index rebuild during off-peak hours of Content Server use (typically at night or on the weekend).

CONSIDERATIONS

Please note the following important considerations with regard to the Verity search engine:

Two sets of languages require installation of an additional VDK6 component:

• VDK6asian.zip for Japanese and Korean

• VDK6german.zip for German

You can install and enable the components using the Component Wizard or Component Manager, similar to installing the Verity component (see Installing Verity (page 2-4)). After the component is installed, follow the procedure in Setting the Verity Locale (page 6-4).

You can only use one Verity locale per Content Server instance. If you want Verity to handle content in languages that do not belong to the same language group, you must use ‘uni’ as the Verity locale. For example:

• Combinations of Western European languages and Asian languages (for example, English and Japanese)

• Combinations of Western and Eastern European languages (for example, English, German, and Russian)

• Combinations of Asian languages (for example, Japanese and Korean)

The content ID must be supplied in single-byte characters. Multi-byte characters for the content ID are not supported by Verity. It is recommended that you autogenerate your content IDs (to configure this, choose Administration > Admin Server > <Instance Name> button > General Configuration > Automatically assign a content ID on check-in).

You can also hide the content ID field on the content check-in page by adding the following parameter to <Install_Dir>/config/config.cfg (where <Install_Dir> is the installation directory of the Content Server instance):

dDocName:isHidden=1
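Where content IDs are still supplied by hand or by a batch load file, a simple pre-check can catch multi-byte IDs before check-in. The following Python sketch treats an ID as single-byte if every character encodes to exactly one byte in a chosen encoding. Using ASCII as the default encoding is an assumption made for this example; the appropriate encoding depends on the system locale in use. The function name is hypothetical.

```python
def is_single_byte(content_id, encoding="ascii"):
    """Return True if every character of content_id encodes to one byte."""
    try:
        encoded = content_id.encode(encoding)
    except UnicodeEncodeError:
        # At least one character cannot be represented in this encoding.
        return False
    return len(encoded) == len(content_id)
```

For example, an ID such as DOC_000123 passes the check, while an ID that contains Japanese characters does not.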

You can use search operators to broaden or narrow your Verity full-text search. Some common search operators include AND, OR, and NOT. Verity supports localized (that is, non-English) search operators. However, you need to enclose these in angle brackets, for example: “système <ET><SAUF> gestion”. If you use multiple English search operators, you also need to enclose them in single brackets.

The table below shows some localized versions of the AND, OR, and NOT search operators.

Language      Search Operator

English       AND      OR       NOT
French        <ET>     <OU>     <SAUF>
German        <UND>    <ODER>   <NICHT>
Spanish       <Y>      <O>      <EXCEPTO>
Portuguese    <E>      <OU>     <SALVO>

Tech Tip: If you are using a non-English Verity locale, you can still use English search operators by enclosing them in angle brackets and using a hash symbol (for example, “système <#AND><#NOT> gestion”).

If you are having problems finding full-size and half-size Japanese katakana in your checked-in documents, you may need a patch from Verity and/or some modifications to the Verity configuration.

If PDF bookmarks for Word documents with multi-byte characters in their headings are not displayed correctly, make sure that you enable the “Unicode signatures for PDF bookmarks” option in Inbound Refinery (local configuration setting). This setting specifies that Unicode character coding should be used when creating PDF bookmarks for Microsoft Word documents rather than ASCII. Unicode uses 16 bits, which means that non-ASCII characters can be used in the PDF bookmarks.

Verity 4.5 only: Search word highlighting is not supported for multi-byte locales. This feature should be turned off in Verity if your system locale is a multi-byte language (for example, Japanese or Korean).

To do this, proceed as follows:

1. UNIX: Start the System Properties utility by running the System_Properties script, which is located in the bin subdirectory of the Verity installation directory.

Windows: Choose Start > Programs > Verity > <Instance Name> > Tools > System Properties (make sure you have administrator rights).

2. Make sure the Options tab is opened.

3. Make sure the “Enable search keyword highlighting” check box is cleared, and click OK.

4. Restart the content server instance.
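The search operator conventions described earlier in this section (localized operators in angle brackets, and English operators with a hash symbol when a non-English locale is in use) can be summarized in a small lookup. The following Python sketch only formats operator tokens from the values in the search operator table; it does not build or validate complete Verity queries, and the function name is hypothetical.

```python
# Localized operators, taken from the search operator table in this section.
LOCALIZED_OPERATORS = {
    "French": {"AND": "ET", "OR": "OU", "NOT": "SAUF"},
    "German": {"AND": "UND", "OR": "ODER", "NOT": "NICHT"},
    "Spanish": {"AND": "Y", "OR": "O", "NOT": "EXCEPTO"},
    "Portuguese": {"AND": "E", "OR": "OU", "NOT": "SALVO"},
}


def operator_token(operator, language=None):
    """Format AND, OR, or NOT as a bracketed search operator token.

    With a language, the localized operator is returned (for example <ET>).
    Without one, the English operator is returned in the hashed form that
    can be used with a non-English Verity locale (for example <#AND>).
    """
    op = operator.upper()
    if op not in ("AND", "OR", "NOT"):
        raise ValueError(f"unknown operator: {operator}")
    if language is None:
        return f"<#{op}>"
    return f"<{LOCALIZED_OPERATORS[language][op]}>"


query = "système " + operator_token("and", "French") + operator_token("not", "French") + " gestion"
print(query)
```

The printed query has the same form as the French example given with the search operator description.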

Appendix A. TROUBLESHOOTING

OVERVIEW

This section covers the following topics:

Accessing Log Files (page A-1)

Console Server Logs (page A-2)

Console Output Logs (page A-2)

Search Engine Issues (page A-2)

Accented Letters Are Not Found (page A-6)

Documents With Asian Characters Not Found (page A-7)

Text in PDF Files Cannot Be Found (page A-7)

Microsoft Word Documents with Embedded Links (page A-8)

ACCESSING LOG FILES

The log files of Content Server are normally accessed from the Administration page of the content server. When your Verity locale has been specified in the config.cfg file, Content Server restarted, and the index rebuilt, Verity will be included in the log files.

Note: You must be logged in to the content server as an administrator to be able to view the log files.

You can also view log files on the file system of the content server computer. To view Verity log files, go to the following directory:

<Install_Dir>/<Instance>/weblayout/groups/secure/logs/verity

CONSOLE SERVER LOGS

Content Server logs are listed by date and time. One file is generated for each day. Entries are added to the file throughout the day as events occur. For more information, see the Troubleshooting Guide.

CONSOLE OUTPUT LOGS

When Verity is running as a Windows service, the console output logs are created automatically when Verity is launched and properly configured. In the event of a server crash, this feature enables the capture of output from the Java Virtual Machine (VM), which includes logging output from any enabled tracing facilities such as Verity output. For more information on console output logs, see the Troubleshooting Guide.

SEARCH ENGINE ISSUES

This section details some of the ways to troubleshoot issues related to the Verity search engine and the content server.

Considerations (page A-3)

Manually Rebuilding the Verity Index (page A-4)

Tech Tip: The error “Error with path to collection. Directory 'verity.1' does not exist.” might appear in the content server or Verity logs every few minutes or irregularly. The system may otherwise seem to be functioning correctly. To solve this problem, locate and delete the file <Install_Dir>/search/activeindex.hda. The system will recreate it automatically and the error should go away.

Considerations

Before rebuilding the search index to solve an indexing problem, do the following (beyond the usual checks):

Make sure all stray mkvdk processes are gone.

Reboot the machine.

Perform a ‘-repair’ operation on the search collection.

Look for timeout errors in the server output or content server logs and increase the timeouts if necessary until these errors go away. You can use the standard configuration entry TimeoutPerOneMegInSec for this (for example, increase it to 1800). In Content Server 7.x (but not 7.1.1), this entry is ignored if all the files have a file size of zero. In Content Server 7.1.1, this was fixed, and a new configuration entry, TimeoutForIndexingHousekeepingInSec, was added, which can be set as a global timeout for all indexing sessions. Its current default is 600 (10 minutes).

If none of the above actions resolve the issue, any of the following could cause the problem:

The operating system thinks a file is locked. You can check this by seeing if you can rename the search directory (with no content server process running).

The timeouts are not set long enough (this is a very common problem).

A third-party virus scanner is locking critical files.

There are too many parts files causing “too many file handles open” issues on UNIX.

Performance is bad due to too few parts files.

The AdditionalIndexBuildParams configuration parameter is over-used.

The disk drive is dying or has a bad driver. Usually Verity is the first to discover this because it is by far the most disk-intensive activity going on.

Performance is bad because zone-indexed fields are not being used.

There are performance issues because complicated Verity queries could be replaced with simple optimized database queries.

A hanging mkvdk process is blocking all other successful indexing sessions.

Verity Integration is over-used to create a navigational structure, when navigational structures should be prebuilt by a background thread.

A hanging network connection is locking a critical collection file.

A network card is bad.

An incorrectly implemented firewall solution is blocking critical network ports.

Search connection performance is bad because of a slow network share.

A network failure (NFS) creates an unkillable mkvdk process.

Manually Rebuilding the Verity Index

When the index files for Verity become corrupted, you can do a complete rebuild of the index to correct the problem. However, first closely examine possible reasons for the corruption and whether a simpler solution is available (see Considerations (page A-3)).

A complete rebuild of the Verity index files is initiated by removing key Verity directories and files before performing a Collection Rebuild. When Verity detects that the key files are absent, it creates new versions of the files. When the Collection Rebuild completes, the problems associated with the corrupted index are usually resolved.

Before applying this solution for a collection rebuild, consider the number of documents in the content server. If the content server has 5000 or more documents, a rebuild of the index may take considerable time and a Verity cleanup may provide a quicker solution.

1. Log in to the content server using a login with administrator rights.

2. Open the Repository Manager and select the Indexer tab.

3. Select the Configure button in the Automatic Update Cycle window.

4. Clear the Enabled box for Indexer Auto Updates.

Important: You will need to enable this selection again after the successful rebuild of the index.

5. Stop the content server.

Caution: Normally a complete rebuild of this type is a last resort. Depending on the size of the index and available system resources, the rebuild process can take days. If rebuilding is necessary, rebuild at times of non-peak system usage. Also, if your system is a production system, you might want to keep the current index active as shown in the activeindex.hda file. If you rename both index collections, nothing will be searchable while doing a rebuild. It is also recommended that you do not rename the activeindex.hda file unless there is reason to believe it is corrupted.

6. Navigate to your Content Server search directory, <instance>\search\, and rename the directories and files shown below as backups. If you do not have index2, you may skip that directory.

index1 to index1_bak

index2 to index2_bak

activeindex.hda to activeindex_back.hda

7. Navigate to the rebuild directory, <instance>\search\rebuild\, and rename the files below as backups. If you do not have a rebuild directory, you may skip this step.

changes.txt to changes_bak.txt

state.hda to state_bak.hda

8. Navigate to the lock directory, <instance>\search\lock\, and rename the lock files below. If you do not have these lock files, you may skip this step.

rebuild_lock.dat to rebuild_lock_bak.dat

rebuild_suspect.dat to rebuild_suspect_bak.dat

9. Start the content server.

10. Log in to the content server using a login with administrator rights.

11. Open the Repository Manager and select the Indexer tab.

12. Start the rebuild cycle by clicking the Start button in the Collection Rebuild Cycle window. The rebuild process time of the index will depend on how many files you have on the content server.

13. When the rebuild is complete, test it by searching for content or by testing the issue that was present before the rebuild.

14. Select the Configure button in the Automatic Update Cycle window.

15. Check the Enabled box for Indexer Auto Updates.

16. After a successful rebuild of the index, you can delete any backed up files (named in previous steps).
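The renaming in steps 6 to 8 can be scripted. The following Python sketch renames each listed item if it exists and skips it otherwise, as the steps allow. The function name is hypothetical, the source and backup names are taken from the steps above, and the sketch assumes it is run only while the content server is stopped (step 5). It renames both index directories and activeindex.hda, so on a production system the caution above about keeping the active index searchable still applies.

```python
from pathlib import Path

# (path relative to <instance>/search, backup name) pairs from steps 6 to 8.
BACKUP_RENAMES = [
    ("index1", "index1_bak"),
    ("index2", "index2_bak"),
    ("activeindex.hda", "activeindex_back.hda"),
    ("rebuild/changes.txt", "changes_bak.txt"),
    ("rebuild/state.hda", "state_bak.hda"),
    ("lock/rebuild_lock.dat", "rebuild_lock_bak.dat"),
    ("lock/rebuild_suspect.dat", "rebuild_suspect_bak.dat"),
]


def rename_for_rebuild(search_dir):
    """Rename the items listed in steps 6 to 8 and return what was renamed."""
    search_dir = Path(search_dir)
    renamed = []
    for relative, backup_name in BACKUP_RENAMES:
        source = search_dir / relative
        if not source.exists():
            continue  # the steps allow skipping items that are absent
        target = source.with_name(backup_name)
        if target.exists():
            raise FileExistsError(f"backup already exists: {target}")
        source.rename(target)
        renamed.append((source, target))
    return renamed
```

The returned list can be kept as a record of what was renamed, which makes it easier to delete the backups in step 16 or to restore them if the rebuild does not succeed.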

SEARCH ISSUES

This section provides solutions to several search issues.

Accented Letters Are Not Found (page A-6)

Documents With Asian Characters Not Found (page A-7)

Text in PDF Files Cannot Be Found (page A-7)

Microsoft Word Documents with Embedded Links (page A-8)

Accented Letters Are Not Found

Symptom

If end users do a full-text search for certain letters, the accented varieties of those letters are not found. For example, searching for ‘e’ does not find ‘é’, ‘ê’, ‘ë’, ‘É’, etc.

Problem

By default, Verity will not find accented versions of characters. It can be set up to make searches accent-insensitive, but you need to do this manually.

Alternatively, if you upgraded the Verity software from a release prior to 5.1, your Verity locale may still be set to ‘english’. With this Verity locale, accented characters are indexed as question marks (?), which means end users cannot search for words containing them.

Recommendation

To specify that searches should be accent-insensitive, you need to change some settings on the content server. See Setting Accent Insensitivity (page 5-4).

Also, if your Verity locale is still set to ‘english’, change it to ‘englishv’. For details on Verity locales, refer to Chapter 6 (International Configuration).

Note: Once you set up accent-insensitive searches, all searches need to be performed in lower case. In other words, a search for ‘e’ will find ‘é’, ‘ê’, ‘ë’, and ‘É’, but a search for ‘E’ will only find ‘E’.

Documents With Asian Characters Not Found

SymptomIf end users do a full-text search for Asian characters (for example, Japanese or Korean), nothing is found.

ProblemFor Asian characters to be found successfully, the correct Verity locale needs to be set up. The Verity locale must be configured for Verity. Asian languages require their own Verity component (VDK6Asian) be installed for Content Server version 10g Release 3 (10.1.3.3.1). Earlier versions of Verity require a specific locale, such as ‘japanb’.

Recommendation
Make sure that the Verity locale has been set up to support the correct Asian language (for example, ‘japanb’ for Japanese or ‘koreab’ for Korean). For instructions on how to do that, refer to Setting the Verity Locale (page 6-4). If you want to search for multiple Asian languages, you should use the Verity locale ‘uni’.
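
The locale choice described above can be written as a small lookup. The following Python sketch is illustrative only: the helper name `choose_verity_locale` is hypothetical, and the table contains just the two locales named in this section (other locales are covered in Chapter 6).

```python
# Locale names taken from this section; other languages are not covered here.
ASIAN_LOCALES = {"japanese": "japanb", "korean": "koreab"}


def choose_verity_locale(languages):
    """Return a Verity locale name for the Asian languages that must be searchable."""
    wanted = {language.lower() for language in languages}
    if not wanted:
        raise ValueError("at least one language is required")
    if len(wanted) > 1:
        # Searching across several Asian languages calls for the 'uni' locale.
        return "uni"
    (language,) = wanted
    if language not in ASIAN_LOCALES:
        raise ValueError(f"no locale listed in this section for {language!r}")
    return ASIAN_LOCALES[language]


print(choose_verity_locale(["Japanese"]))
print(choose_verity_locale(["Japanese", "Korean"]))
```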

Text in PDF Files Cannot Be Found

Symptom
Text in PDF files cannot be found, but the indexer log file shows no indexing errors.

Problem
Text in PDF files in bidirectional languages (Hebrew, Arabic) is sent to Verity for full-text indexing, but Verity does not handle it correctly (it basically indexes the text backwards). Since the text was indexed, no errors are reported.

Recommendation
Use the XMLIndexerExport component. This component extracts the text in the PDF files to XML files, which are subsequently full-text indexed by Verity correctly.


Microsoft Word Documents with Embedded Links

Microsoft Word documents with embedded links might not be included in the search index. This means that these files will not be found in search queries. The easiest way to ensure that these files are included in the search index is to remove all embedded hyperlinks from the documents. If this is not an option, add the following line to the <Install_Dir>/config/config.cfg file and restart Content Server:

CheckMkvdkDocCount=true

The configuration entry will ensure that the Word files are included in the search index.

Note: Only the metadata will be included in the index, not the full text.
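
If the setting has to be applied on several servers, the edit can be scripted. The Python sketch below is a hypothetical helper, not part of Content Server: it assumes config.cfg consists of simple `name=value` lines and works on the file contents as a string, so reading and writing the file (and restarting Content Server) are left to the administrator. `SomeOtherSetting` is a made-up entry used only for the demonstration.

```python
def ensure_setting(config_text: str, key: str, value: str) -> str:
    """Return config text in which `key` is set to `value` exactly once per existing entry.

    An existing `key=...` line is rewritten in place; if none exists,
    the entry is appended at the end.
    """
    updated = []
    found = False
    for line in config_text.splitlines():
        name = line.split("=", 1)[0].strip()
        if "=" in line and name == key:
            updated.append(f"{key}={value}")
            found = True
        else:
            updated.append(line)
    if not found:
        updated.append(f"{key}={value}")
    return "\n".join(updated) + "\n"


# 'SomeOtherSetting' is a placeholder entry, not a real configuration variable.
sample = "SomeOtherSetting=1\n"
print(ensure_setting(sample, "CheckMkvdkDocCount", "true"))
```

Running the helper twice produces the same text, so it is safe to apply repeatedly.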


Appendix B. THIRD PARTY LICENSES

OVERVIEW

This appendix includes a description of the Third Party Licenses for all the third party products included with this product.

Apache Software License (page B-1)

W3C® Software Notice and License (page B-2)

Zlib License (page B-3)

General BSD License (page B-4)

General MIT License (page B-5)

Unicode License (page B-5)

Miscellaneous Attributions (page B-7)

APACHE SOFTWARE LICENSE

* Copyright 1999-2004 The Apache Software Foundation.

* Licensed under the Apache License, Version 2.0 (the "License");

* you may not use this file except in compliance with the License.

* You may obtain a copy of the License at

* http://www.apache.org/licenses/LICENSE-2.0

*


* Unless required by applicable law or agreed to in writing, software

* distributed under the License is distributed on an "AS IS" BASIS,

* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

* See the License for the specific language governing permissions and

* limitations under the License.

W3C® SOFTWARE NOTICE AND LICENSE

* Copyright © 1994-2000 World Wide Web Consortium,

* (Massachusetts Institute of Technology, Institut National de

* Recherche en Informatique et en Automatique, Keio University).

* All Rights Reserved. http://www.w3.org/Consortium/Legal/

*

* This W3C work (including software, documents, or other related items) is

* being provided by the copyright holders under the following license. By

* obtaining, using and/or copying this work, you (the licensee) agree that

* you have read, understood, and will comply with the following terms and

* conditions:

*

* Permission to use, copy, modify, and distribute this software and its

* documentation, with or without modification, for any purpose and without

* fee or royalty is hereby granted, provided that you include the following

* on ALL copies of the software and documentation or portions thereof,

* including modifications, that you make:

*

* 1. The full text of this NOTICE in a location viewable to users of the

* redistributed or derivative work.

*

* 2. Any pre-existing intellectual property disclaimers, notices, or terms

* and conditions. If none exist, a short notice of the following form

* (hypertext is preferred, text is permitted) should be used within the

* body of any redistributed or derivative code: "Copyright ©

* [$date-of-software] World Wide Web Consortium, (Massachusetts


* Institute of Technology, Institut National de Recherche en

* Informatique et en Automatique, Keio University). All Rights

* Reserved. http://www.w3.org/Consortium/Legal/"

*

* 3. Notice of any changes or modifications to the W3C files, including the

* date changes were made. (We recommend you provide URIs to the location

* from which the code is derived.)

*

* THIS SOFTWARE AND DOCUMENTATION IS PROVIDED "AS IS," AND COPYRIGHT HOLDERS

* MAKE NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT

* NOT LIMITED TO, WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR

* PURPOSE OR THAT THE USE OF THE SOFTWARE OR DOCUMENTATION WILL NOT INFRINGE

* ANY THIRD PARTY PATENTS, COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS.

*

* COPYRIGHT HOLDERS WILL NOT BE LIABLE FOR ANY DIRECT, INDIRECT, SPECIAL OR

* CONSEQUENTIAL DAMAGES ARISING OUT OF ANY USE OF THE SOFTWARE OR

* DOCUMENTATION.

*

* The name and trademarks of copyright holders may NOT be used in advertising

* or publicity pertaining to the software without specific, written prior

* permission. Title to copyright in this software and any associated

* documentation will at all times remain with copyright holders.

*

ZLIB LICENSE

* zlib.h -- interface of the 'zlib' general purpose compression library

version 1.2.3, July 18th, 2005

Copyright (C) 1995-2005 Jean-loup Gailly and Mark Adler

This software is provided 'as-is', without any express or implied

warranty. In no event will the authors be held liable for any damages


arising from the use of this software.

Permission is granted to anyone to use this software for any purpose,

including commercial applications, and to alter it and redistribute it

freely, subject to the following restrictions:

1. The origin of this software must not be misrepresented; you must not

claim that you wrote the original software. If you use this software

in a product, an acknowledgment in the product documentation would be

appreciated but is not required.

2. Altered source versions must be plainly marked as such, and must not be

misrepresented as being the original software.

3. This notice may not be removed or altered from any source distribution.

Jean-loup Gailly

Mark Adler

GENERAL BSD LICENSE

Copyright (c) 1998, Regents of the University of California

All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

* Neither the name of the <ORGANIZATION> nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.


THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

GENERAL MIT LICENSE

Copyright (c) 1998, Regents of the Massachusetts Institute of Technology

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

UNICODE LICENSE

UNICODE, INC. LICENSE AGREEMENT - DATA FILES AND SOFTWARE

Unicode Data Files include all data files under the directories http://www.unicode.org/Public/, http://www.unicode.org/reports/, and http://www.unicode.org/cldr/data/. Unicode Software includes any source code published in the Unicode Standard or under the directories http://www.unicode.org/Public/, http://www.unicode.org/reports/, and http://www.unicode.org/cldr/data/.


NOTICE TO USER: Carefully read the following legal agreement. BY DOWNLOADING, INSTALLING, COPYING OR OTHERWISE USING UNICODE INC.'S DATA FILES ("DATA FILES"), AND/OR SOFTWARE ("SOFTWARE"), YOU UNEQUIVOCALLY ACCEPT, AND AGREE TO BE BOUND BY, ALL OF THE TERMS AND CONDITIONS OF THIS AGREEMENT. IF YOU DO NOT AGREE, DO NOT DOWNLOAD, INSTALL, COPY, DISTRIBUTE OR USE THE DATA FILES OR SOFTWARE.

COPYRIGHT AND PERMISSION NOTICE

Copyright © 1991-2006 Unicode, Inc. All rights reserved. Distributed under the Terms of Use in http://www.unicode.org/copyright.html.

Permission is hereby granted, free of charge, to any person obtaining a copy of the Unicode data files and any associated documentation (the "Data Files") or Unicode software and any associated documentation (the "Software") to deal in the Data Files or Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, and/or sell copies of the Data Files or Software, and to permit persons to whom the Data Files or Software are furnished to do so, provided that (a) the above copyright notice(s) and this permission notice appear with all copies of the Data Files or Software, (b) both the above copyright notice(s) and this permission notice appear in associated documentation, and (c) there is clear notice in each modified Data File or in the Software as well as in the documentation associated with the Data File(s) or Software that the data or software has been modified.

THE DATA FILES AND SOFTWARE ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THE DATA FILES OR SOFTWARE.

Except as contained in this notice, the name of a copyright holder shall not be used in advertising or otherwise to promote the sale, use or other dealings in these Data Files or Software without prior written authorization of the copyright holder.

________________________________________

Unicode and the Unicode logo are trademarks of Unicode, Inc., and may be registered in some jurisdictions. All other trademarks and registered trademarks mentioned herein are the property of their respective owners.


MISCELLANEOUS ATTRIBUTIONS

Adobe, Acrobat, and the Acrobat Logo are registered trademarks of Adobe Systems Incorporated.
FAST Instream is a trademark of Fast Search and Transfer ASA.
HP-UX is a registered trademark of Hewlett-Packard Company.
IBM, Informix, and DB2 are registered trademarks of IBM Corporation.
Jaws PDF Library is a registered trademark of Global Graphics Software Ltd.
Kofax is a registered trademark, and Ascent and Ascent Capture are trademarks of Kofax Image Products.
Linux is a registered trademark of Linus Torvalds.
Mac is a registered trademark, and Safari is a trademark of Apple Computer, Inc.
Microsoft, Windows, and Internet Explorer are registered trademarks of Microsoft Corporation.
MrSID is property of LizardTech, Inc. It is protected by U.S. Patent No. 5,710,835. Foreign Patents Pending.
Oracle is a registered trademark of Oracle Corporation.
Portions Copyright © 1994-1997 LEAD Technologies, Inc. All rights reserved.
Portions Copyright © 1990-1998 Handmade Software, Inc. All rights reserved.
Portions Copyright © 1988, 1997 Aladdin Enterprises. All rights reserved.
Portions Copyright © 1997 Soft Horizons. All rights reserved.
Portions Copyright © 1995-1999 LizardTech, Inc. All rights reserved.
Red Hat is a registered trademark of Red Hat, Inc.
Sun is a registered trademark, and Sun ONE, Solaris, iPlanet and Java are trademarks of Sun Microsystems, Inc.
Sybase is a registered trademark of Sybase, Inc.
UNIX is a registered trademark of The Open Group.
Verity is a registered trademark of Autonomy Corporation plc.


GLOSSARY

administration application
One of the following Java applications that are used for administration purposes: User Admin (page G-5), Workflow Admin (page G-6), Web Layout Editor (page G-6), Repository Manager (page G-4), Configuration Manager (page G-2), Archiver (page G-1). These applications can be run as a Java applet from a Java-enabled browser, or in stand-alone mode from the content server computer. See also administration utility (page G-1).

administration utility
One of the following Java applications that are used for administration purposes: Component Wizard (page G-2), Batch Loader (page G-2), System Properties (page G-5). These utilities can be run only in stand-alone mode from the content server computer. See also administration application (page G-1).

alternate file
A web-viewable version of the primary file, or a version that can be converted to a web-viewable format upon checkin. The alternate file must be specified and checked in at the same time as the primary file. See also primary file (page G-4).

Archiver
An administration application (page G-1) that is used to transfer and reorganize content server files and information.


Batch Loader
An administration utility (page G-1) that is used to check in (insert), delete, or update multiple content items at a time.

component
An optional module that adds to or changes the functionality of the out-of-the-box installation of Content Server.

Component Manager
An Admin Server feature that enables administrators to remotely enable, disable, upload, and download content server components.

Component Wizard
An administration utility (page G-1) that is used to create and manage custom components.

Configuration Manager
An administration application (page G-1) that enables the system administrator to create and manage content types, file formats, and metadata fields.

content
A collective term for the content items in the Verity repository.

content ID
A standard, required metadata field that provides a unique identifier for each content item.

content information
See metadata (page G-3).

content item
A file that has been checked in to the Verity repository. A content item includes a primary file (page G-4) and metadata (page G-3), and can include an alternate file (page G-1).

content profile
A set of one or more rules that can be used to control the display of metadata fields on the check-in and search pages. Content profiles are used to filter what information is displayed on these pages, based on user attributes, content attributes, or a combination of the two. This enables system administrators to make the check-in and search pages less complex and more specifically geared to particular user or content types.


content repository
The place where content files are stored.

external collection
A set of content files that are indexed and stored in a separate Verity collection rather than in the content server database.

full-text indexing
The process of creating a searchable index that includes every word in a file.

full-text search
A search that compares the query expression against every word in a file. See also metadata search (page G-4).

full-text search operator
A word or symbol that refines the query expression for a full-text search (for example, AND, OR, and double quotation marks ").

Idoc Script
Stellent’s proprietary server-side script language that is used to modify the functionality and look-and-feel of Stellent products. Idoc Script tags are in the format <$script$>.

locale
A setting that specifies the language of the content server interface and defines how the content server handles language-specific issues, such as date formatting and full-text indexing. See also system locale (page G-5), user locale (page G-5), and Verity locale (page G-6).

metadata
Information about a content item, such as Title, Author, Security Group, and so on. Metadata is used to describe, find, and group content items. Also referred to as content information.

metadata field
A field on a web page that is used to define metadata during checkin, or to define search criteria. Also referred to as content information field.


metadata search
A search that compares the query expression against metadata field values. See also full-text search (page G-3).

native file
The original file that is checked in to the content server file repository. See also primary file (page G-4).

primary file
The original file that is checked in to the Verity repository. See also native file (page G-4) and alternate file (page G-1).

query
See search (page G-4).

query expression
A statement that specifies the criteria to be matched during a search. See also search criteria (page G-4).

query result page
See search results page (page G-5).

Repository Manager
An administration application (page G-1) that is used to: manage content items (view status, delete revisions, and so on); create criteria subscriptions and assign users to subscriptions; update and rebuild the search index.

search
To retrieve a list of content items that match specified criteria.

search collection
See Verity collection (page G-6).

search criteria
The metadata values and/or full-text words and phrases to be matched during a search. See also query expression (page G-4).


search engine
Software included with Verity that performs metadata and full-text searches. See also search index (page G-5).

search index
A set of files that contain metadata information and the full-text indexes. The search index is created by the Indexer and is read by the search engine (page G-5).

search operator
A word or symbol that can be used in a query expression to refine the search criteria (for example, AND, OR, NOT, Substring, Matches, and so on).

search results
A list of content items that match the search criteria.

search results page
A standard Stellent page that displays the results of a query. Also referred to as query result page.

system locale
A setting that specifies the language of the content server interface and defines how the content server handles language-specific issues on a system-wide basis. See also user locale (page G-5) and Verity locale (page G-6).

System Properties
An administration utility (page G-1) that is used to set global options and customize the content server environment.

User Admin
An administration application (page G-1) that is used to manage content server users and security access.

user locale
A setting that specifies the language of the content server interface and defines how the content server handles language-specific issues for an individual user. See also system locale (page G-5) and Verity locale (page G-6).

Verity Integration
An optional indexer and search engine tool that can be used with Content Server.


Verity collection
A set of files that are created by Verity Integration for indexing and searching. Also referred to as index collection or search collection.

Verity locale
A Verity Integration setting that extends the Verity search function to work with languages other than English. See also system locale (page G-5) and user locale (page G-5).

Web Layout Editor
An administration application (page G-1) that is used to create the Library hierarchy, define reports, modify search result pages, and update the portal page.

Workflow Admin
An administration application (page G-1) that is used to set up and manage workflows.


INDEX

#
* (wildcard)
    Verity
        full-text, 3-8
        metadata, 3-5
? (wildcard)
    Verity, 3-5, 3-8

A
accented characters
    searching for --, A-6
    setting search insensitivity, 5-4
ACCRUE (search operator)
    Verity, 3-10
adding
    metadata fields, 4-11
AND (search operator)
    Verity, 3-9
audience, 1-3

B
Back quotation mark (‘) (search operator), 3-10

C
CASE (search operator)
    Verity, 3-11
case sensitivity
    metadata searches, 3-4
checking the Verity version, 2-3
CheckMkvdkDocCount, A-8
collections
    Verity, 5-11, 5-11
Comma (,) (search operator), 3-9
Contains (metadata search operator - Verity only), 3-3
content IDs
    multibyte --, 6-5
conventions in this guide, 1-4
conversion
    status, 4-7
customizing
    Verity, 5-4
    Verity for PDF files, 5-6

D
date storage, Verity, 5-3
documents with embedded links, Microsoft Word, A-8
Double quotation mark (") (search operator), 3-10
Dynamic Converter, 5-2

E
embedded links, Microsoft Word documents, A-8
enabling
    tokenization, 5-4
Ends (metadata search operator), 3-4
examples
    merging Verity partitions, 5-13
    Verity script, 5-11
excluding symbols from tokenization, 5-5

F
features, 1-2
fields
    adding metadata, 4-11
file extensions, partition, 5-12
file formats
    Verity supported, 5-3
formats
    Verity supported, 5-3
full-text indexing
    Verity, 5-2
full-text searches
    home page search fields, 3-12
    options, 3-6
    quick search field, 3-12
    search engines, 3-6

H
Has Word (metadata search operator), 3-4
Has Word Prefix (metadata search operator), 3-4
headings in Word documents, 6-6
home page search fields, 3-12
HTML documents, 5-9

I
index
    location on server, 1-2
    rebuilds, 1-2
indexing
    structured documents with Verity, 5-8
    Verity full-text, 5-2
Internet message format documents, 5-9

K
keyword highlighting, 5-6

L
limiting Verity search index on UNIX, 5-14
locales, 5-2
    checking the Verity version, 2-3
    special characters, 6-1
localized Verity search operators, 6-5

M
Matches (metadata search operator), 3-3
merging partitions, 5-13
message format documents, 5-9
metadata fields
    adding, 4-11, 4-11
metadata search fields, 3-3
metadata search operators
    Contains, 3-3
    Ends, 3-4
    Has Word, 3-4
    Has Word Prefix, 3-4
    Matches, 3-3
    Not Has Word, 3-4
    Starts, 3-4
    Substring, 3-3
metadata searches
    case sensitivity, 3-4
    home page search fields, 3-12
    search operators, 3-3
    wildcards, 3-5
Microsoft Word documents with embedded links, A-8
multi-byte characters, 5-2
multibyte content IDs, 6-5

N
NEAR (search operator)
    Verity, 3-10
NOT (search operator)
    Verity, 3-9
Not Has Word (metadata search operator), 3-4

O
open file limit on UNIX, 5-15
OR (search operator)
    Verity, 3-9

P
PARAGRAPH (search operator)
    Verity, 3-10
partitions
    file extensions, 5-12
    merging, 5-13
passthru, 5-3
PDF bookmarks, 6-6
PDF files, 5-4, 5-5, 5-6
PDF files, customizing Verity for, 5-6
performance
    search, 4-4

Q
queries
    writing Verity, 5-9
query script
    Verity, 5-9
quick search field, 3-12

S
script
    Verity example, 5-11
search collections
    understanding, 5-11
search engines, 3-6
search fields
    home page, 3-12
    quick search, 3-12
search index, 2-9
    limiting on UNIX, 5-14
search operators
    metadata searches, 3-3
search operators (Verity), 6-5
search performance, 4-4
search zones, 4-4
searching
    zone, 4-4
searching for content
    home page search fields, 3-12
    quick search field, 3-12
SENTENCE (search operator)
    Verity, 3-10
setting
    accent insensitivity, 5-4
    tokenization, 5-4
SGML documents, 5-9
Single quotation mark (’) (search operator), 3-10
SOUNDEX (search operator)
    Verity, 3-11
special characters and Verity locale, 6-1
special characters in search index, 1-2
specifying
    symbols as search terms, 5-5
Starts (metadata search operator), 3-4
structured documents, indexing with Verity, 5-8
Substring (metadata search operator), 3-3
supported file formats, Verity, 5-3
symbols, specifying as search terms, 5-5
system requirements, 1-2

T
tasks
    Indexer, 2-9
THESAURUS (search operator)
    Verity, 3-11
tokenization, 5-4
    defaults, 5-5
    excluding symbols, 5-5
TYPO (search operator)
    Verity, 3-11

U
understanding
    Verity collections, 5-11
UNIX
    limiting Verity search index, 5-14
    open file limit, 5-15
Update Database Design screen
    metadata fields, 4-7

V
Verity
    about, 1-1
    accent insensitivity, 5-4
    checking -- version, 2-3
    collections, 5-11, 5-11
    customizing, 5-4
    customizing for PDF files, 5-6
    date storage, 5-3
    excluding symbols from tokenization, 5-5
    full-text indexing, 5-2
    HTML documents, 5-9
    indexing structured documents, 5-8
    Internet message format documents, 5-9
    limiting search index on UNIX, 5-14
    merging partitions, 5-13
    Microsoft Word documents with embedded links, A-8
    partition file extensions, 5-12
    PDF files, 5-4, 5-5, 5-6
    query script, 5-9
    script example, 5-11
    search operators, 6-5
    SGML documents, 5-9
    specifying symbols as search terms, 5-5
    supported file formats, 5-3
    system requirements, 1-2
    tokenization, 5-4
    wildcards in metadata searches, 3-5
    writing queries, 5-9
    XML documents, 5-8
    zone filter, 5-8
Verity locale, 6-1
Verity search operators
    " (double quotation mark), 3-10
    , (comma), 3-9
    ‘ (back quotation mark), 3-10
    ’ (single quotation mark), 3-10
    ACCRUE, 3-10
    AND, 3-9
    CASE, 3-11
    NEAR, 3-10
    NOT, 3-9
    OR, 3-9
    PARAGRAPH, 3-10
    SENTENCE, 3-10
    SOUNDEX, 3-11
    THESAURUS, 3-11
    TYPO, 3-11

W
wildcards
    * (asterisk)
        Verity (full-text), 3-8
        Verity (metadata), 3-5
    ? (question mark)
        Verity, 3-5
    metadata searches, 3-5
Word documents with embedded links, A-8
Word documents, headings in --, 6-6
writing
    Verity queries, 5-9

X
XML documents, 5-8

Z
zone filter, 5-8
zone searching, 4-4
Zoned Search
    Operators
        Has Word, 3-4
        Has Word Prefix, 3-4
        Not Has Word, 3-4
ZonedSecurityFields, 4-4
zones, 5-8

