PRINT FROM SAP HELP PORTAL

Document: TREX 7.0

URL: http://help.sap.com/erp2005_ehp_06/helpdata/en/40/83505303bd5616e10000000a114cbd/content.htm

Date created: August 18, 2013

© 2013 SAP AG or an SAP affiliate company. All rights reserved. No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG. The information contained herein may be changed without prior notice. Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors. National product specifications may vary. These materials are provided by SAP AG and its affiliated companies ("SAP Group") for informational purposes only, without representation or warranty of any kind, and SAP Group shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP Group products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty. SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and other countries. Please see www.sap.com/corporate-en/legal/copyright/index.epx#trademark for additional trademark information and notices.

Note

This PDF document contains the selected topic and its subtopics (max. 150) in the selected structure. Subtopics from other structures are not included. The selected structure has more than 150 subtopics. This download contains only the first 150 subtopics. You can manually download the missing subtopics.

PUBLIC © 2013 SAP AG or an SAP affiliate company. All rights reserved.

TREX 7.0

Purpose

Applications based on SAP NetWeaver 7.0 or SAP NetWeaver 7.3 can use TREX 7.0 or TREX 7.1.

Documentation Structure

This documentation is organized into the following areas:

● TREX Architecture 

This area contains information about the TREX architecture, the TREX components, and their functions.

● TREX Configuration 

This area contains all relevant procedures that describe how you can configure TREX. The configuration is organized as follows:

○ Post-Installation Configuration 

○ Initial Configuration 

○  Advanced Configuration 

● TREX Administration 

Here you can find information about administrating TREX:

○ Starting and Stopping TREX 

○ TREX Admin Tools 

○ Data Backup and Restore for TREX 

○ Monitoring TREX with CCMS 

TREX Architecture

TREX is based on a client/server architecture. The client component is integrated into the application that uses the TREX functions, and allows communication with the TREX servers. The server component processes the requests; it indexes and classifies documents and answers search queries.

The client component is subdivided into the Java client and ABAP client. The server component is subdivided into the following servers:

● Web server with TREX extension

● RFC server 

● Queue server 

● Preprocessor 

● Index server 

● Name server 

The graphic below shows the individual components and the communication between components:

 

Java Client and ABAP Client

TREX provides programming interfaces (Application Programming Interfaces, APIs) for the languages Java and ABAP. These interfaces are also called the Java client and the ABAP client.

The interfaces allow access to all TREX functions. You can use the interfaces to create indexes and queues, to perform indexing, and to perform searches. In addition, the interfaces provide functions to query the internal status of TREX.

The interfaces are part of the NetWeaver Application Servers (NW AS).

 

Web Server with TREX Extension

The Web server is responsible for the communication between Java applications and the TREX servers. The application sends requests to the Web server in XML format using HTTP/HTTPS. The Web server converts the requests to a TREX-internal format and then forwards them to the responsible TREX servers.

A TREX component that enhances the Web server with TREX-specific functions is installed on the Web server. Technically, this component is implemented as follows:

· On Windows, as an ISAPI server extension for the Microsoft Internet Information Server 

· On UNIX, as a shared library for the Apache Web server 

 

RFC Server

The RFC server is responsible for the communication between an SAP system and the TREX servers.

The SAP system sends requests to an RFC server using an SAP Gateway. The RFC server converts the requests to a TREX-internal format and then forwards them to the responsible TREX servers.

 

Queue Server

The queue server coordinates the processing steps that take place during indexing. It collects incoming documents and triggers preprocessing by the preprocessor and further processing by the index server.

The queue server enables documents to be indexed asynchronously. This has the advantage that you can control the time of indexing. For example, you can schedule indexing for times when the system load is lower because there are fewer search queries.

In addition, the queue server can trigger index replication and integration of the delta index in the main index.

 

Preprocessor

The preprocessor preprocesses documents and search queries.

Document preprocessing comprises the following steps:

· Loading documents

If the application transmits the documents as URIs rather than directly, TREX resolves the URIs. This involves fetching the documents from the repository that the URIs reference.

· Filtering documents

Documents can exist in various formats, such as Microsoft Word, Microsoft PowerPoint, PDF, and so on. The preprocessor extracts textual content from the documents and then converts it into the UTF-8 Unicode format for further processing.

· Analyzing documents linguistically

Linguistic analysis involves splitting text into individual words and reducing words to base forms (stems). The preprocessor uses a lexicon that exists in several languages for this.

During search queries, the preprocessor performs a linguistic analysis. It transmits the results of the analysis to the index server, which continues the processing of the document.

 

Index Server

The index server indexes and classifies documents and answers search queries. The processing takes place in the engines that belong to the index server.

There are the following engines:

· Search engine

This engine is responsible for standard search functions such as the exact, error-tolerant, linguistic, Boolean, and phrase searches.

· Text-mining engine

This engine is responsible for classification, searching for similar documents (‘See Also’ search), the extraction of key words, and so on.

· Attribute engine

This engine is responsible for searching for document attributes such as author, creation date, and change date.

 

Name Server

The name server manages information on the entire TREX system. It makes sure that the TREX servers can communicate with each other and that they receive all necessary information. The name server has the following tasks:

· Managing topology data

The topology data includes information on the central components of a TREX system (TREX servers, indexes, and queues).

· Coordinating replication services

The replication services are only relevant for a distributed TREX system. The name server has information on which TREX server has a particular data

status. It makes sure that changed data is replicated.

· Load-balancing

The name server accepts requests and distributes them to the responsible TREX servers. It is responsible for distributing indexes and search queries.

· Ensuring high availability

The name server launches several watchdogs. They constantly monitor whether the TREX servers are available. If a TREX server is not available, the name server ensures that the server that is down does not receive any requests.
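
The load-balancing and high-availability tasks described above can be illustrated with a small sketch. This is not TREX code: the class name, the round-robin strategy, and the server names are assumptions made purely for illustration. The text only states that requests are distributed and that servers that are down receive none.

```python
from itertools import cycle


class NameServerSketch:
    """Toy model: route requests only to servers a watchdog reports as available."""

    def __init__(self, servers):
        self.available = {server: True for server in servers}
        self._ring = cycle(servers)  # round-robin order (an assumption)
        self._count = len(servers)

    def report(self, server, is_up):
        """Called by a (hypothetical) watchdog after each availability check."""
        self.available[server] = is_up

    def route(self):
        """Return the next available server, skipping servers that are down."""
        for _ in range(self._count):
            server = next(self._ring)
            if self.available[server]:
                return server
        raise RuntimeError("no TREX server available")


ns = NameServerSketch(["index1", "index2", "index3"])
ns.report("index2", False)  # the watchdog found index2 to be down
picked = [ns.route() for _ in range(4)]
```

In this sketch, index2 never appears in the list of picked servers until a later call to report marks it as available again.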

 

TREX Configuration

Purpose

The configuration of Search and Classification (TREX) is organized as follows:

● Post-Installation Configuration

You must work through these steps immediately after the installation, so that a single-host installation of TREX works correctly and can be addressed using an ABAP or Java application. The documentation distinguishes between configuration steps that you have to complete on the TREX server side and configuration steps that you have to complete on the client side, that is, on the side of the application using TREX.

● Initial Configuration

The initial configuration comprises procedures that allow you to check problems that occur and solve them, if necessary. You can also improve TREX

performance. These configuration steps are not required in order for TREX to work correctly in the default configuration or in order to allow applications to use

TREX.

●  Advanced Configuration

 Advanced configuration comprises the following areas:

○ Language Recognition and Processing with TREX

TREX supports the indexing of documents that exist in different languages. When TREX is installed, you select the languages to be identified by language

recognition. You can retrospectively configure TREX to recognize additional languages.

○ File Formats Supported by TREX

Documents whose content and attributes can be indexed and searched by TREX can exist in numerous different file formats. You can configure which file formats

you want to exclude from processing and which parts of XML and HTML files you want to exclude from indexing.

○ Changing Proxy Server Settings

The TREX preprocessor can access documents on Web pages using a proxy server. You can configure the settings for the proxy server.

○  Activating Python Extensions

Some TREX functions are implemented as Python extensions. If the application using TREX uses these functions, you have to activate the Python extensions.

○ Configuration of the TREX Services in the SAP J2EE Engine

The TREX Java client is implemented as a TREX service in the J2EE engine. You can use the Visual Administrator to configure TREX caches and the TREX Java client.

○ Delta Index Configuration

TREX provides the option of activating delta indexes. This allows you to update indexes faster and improve the performance of TREX.

○ Changing the TREX Host Name (Single and Multiple-Host Installation)

You can change the name of the host on which you installed TREX later on, or you can install TREX with a virtual host name. You can do this for both single-host

and multiple-host installations.

○ Configuration of the TREX Security Settings

You can configure secure communication between TREX and the application using it (for example, SAP Enterprise Portal or SAP Customer Relationship

Management). 

Post-Installation Configuration

Purpose

After the Search and Classification (TREX) function has been installed, you perform a number of technical configuration steps. The sections below describe:

· General configuration steps that you carry out for your operating platform.

· Configuration steps that you only carry out if the application in question communicates with TREX using an HTTP or an RFC connection.

 

Server Side

Purpose

The following sections describe the configuration steps that you have to carry out on the server side.

 

Configuring TREX for the System Landscape Directory (SLD)

Use

A modern computing environment consists of a number of hardware and software components that depend on each other with regard to installation, software updates, and demands on interfaces. The SAP System Landscape Directory (SLD) simplifies the administration of your system landscape.

The SLD is a server application that communicates with a client application using the Hypertext Transfer Protocol (HTTP). The SLD server contains component information, a landscape description, and a name reservation, which are based on the standard Common Information Model (CIM). The CIM standard is a general schema for describing the elements in a system landscape. This standard is independent of any implementation.

The component description provides information about all available SAP software modules, as well as their combination options and dependencies. This includes

version numbers, current patch level, and dependencies between landscape components.

For more information about the SAP System Landscape Directory, see SAP Help Portal help.sap.com.

To supply data to the SLD that originates from a system other than a J2EE or ABAP system, the executable sldreg is used. The sldreg sends data in XML format

using a predefined DTD. For this purpose it uses an HTTP connection, as shown in the figure below:

On the TREX host, there is an SLD client, which generates an XML file of this type and which registers itself with the SLD server using sldreg.

Prerequisites

● After the TREX installation, the SLD client and the associated executable files are located on your TREX host.

● The SLD server is running.

● You or your SLD administrator have generated the SLD configuration files slddest.cfg and slddest.cfg.key.

The slddest.cfg.key file is only available if the configuration of sldreg was generated using the -usekeyfile parameter.

● The user specified in the SLD configuration file slddest.cfg belongs to the DataSupplierLD user role, in order to have permission to send the files to the SLD.

Generating SLD Configuration Files

If you generate the SLD configuration files (slddest.cfg and slddest.cfg.key) yourself, you need to know the host, port, user, and password of the SLD server. You generate these configuration files by using the executable files that are located on your TREX host.

1. Set the environment variables required by TREX by executing the following scripts in a command prompt in the directory <TREX_DIR>:

UNIX

○ Bourne shell sh, Bourne-again shell bash, Korn shell ksh:

. TREXSettings.sh

○ C shell csh:

source TREXSettings.csh

Windows

TREXSettings.bat

2. Execute the following commands:

○ Without usekeyfile: sldreg -configure <path>/slddest.cfg

○ With usekeyfile: sldreg -usekeyfile -configure <path>/slddest.cfg
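
The two command variants differ only in the -usekeyfile option. The following minimal sketch builds the argument list for either variant. The function name and the example path are hypothetical. Only the sldreg executable and the -configure and -usekeyfile options come from the text above. The sketch does not run the command.

```python
def sldreg_configure_command(config_path, use_key_file=False):
    """Build the argument list for generating the SLD configuration file."""
    args = ["sldreg"]
    if use_key_file:
        # With this option, a slddest.cfg.key file is generated as well.
        args.append("-usekeyfile")
    args += ["-configure", config_path]
    return args


# Hypothetical target path; substitute the directory you actually use.
cmd = sldreg_configure_command("/tmp/slddest.cfg", use_key_file=True)
print(" ".join(cmd))
```

An argument list of this kind could be passed to a process launcher in a shell session in which the TREX environment variables have already been set.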

Copying the SLD Configuration Files to the Global SLD Directory

To configure TREX for the System Landscape Directory (SLD), you copy the SLD configuration files slddest.cfg and slddest.cfg.key (if available) to the global SLD

directory on your TREX host.

This directory is called <disk_drive>:\usr\sap\<SAPSID>\SYS\global on Windows and /usr/sap/<SAPSID>/SYS/global on UNIX. In the case of a distributed TREX installation on Windows, all TREX instances use the configuration files for the TREX global file system with the first TREX instance as \\<host_central_instance>\sapmnt\<SAPSID>\SYS\global.

Result

By copying the files slddest.cfg and slddest.cfg.key, you have configured TREX for integration in the System Landscape Directory (SLD).

TREX checks every five minutes whether anything has changed in the TREX system landscape and reports any changes automatically to the SLD server. If 

nothing has changed, TREX reports every twelve hours to the SLD server. This allows you to see that this landscape is still active.
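
The reporting rule described above can be sketched as a small decision function. This is an illustration of the stated intervals only, not TREX's actual implementation. The function and constant names are assumptions.

```python
CHECK_INTERVAL_S = 5 * 60          # the landscape is checked every five minutes
HEARTBEAT_INTERVAL_S = 12 * 3600   # without changes, report every twelve hours


def should_report(landscape_changed, seconds_since_last_report):
    """Decide at a five-minute check whether to report to the SLD server."""
    if landscape_changed:
        return True
    return seconds_since_last_report >= HEARTBEAT_INTERVAL_S


# Number of five-minute checks between two reports when nothing changes.
checks_per_heartbeat = HEARTBEAT_INTERVAL_S // CHECK_INTERVAL_S
```

With these intervals, an unchanged landscape goes through 144 checks between two reports.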

Display Results

1. To display the information about TREX systems and services, navigate to the Content Maintenance screen:

○ In the initial screen for the System Landscape Directory → Development: Content Maintenance

○ In the initial screen for the System Landscape Directory → Administration → Content: Content Maintenance

2. In the Content Maintenance screen, navigate to Subset and choose All With Instances in the dropdown list.

3. Navigate to Class. In the dropdown list you can display the TREX services (for example TREX Index Service, TREX Name Service) and TREX systems known by SLD.

Information Transferred to the SLD Server 

TREX transfers the following information to the SLD server:

Information about naming and version

● Software component version (for example, TREX 7.0)

● SAP name (for example, TREX)

● Version (for example, 7.0)

Information about the TREX servers

● Host name, on which the server is running

● Port number that the server is using

● Type of server, for example, indexserver 

● Web server URL (instead of the port)

● RFC destination of the RFC server (instead of the port)

Information about the TREX instances on individual hosts

● System ID

● Instance number 

● Installation directory

● Version information for the TREX software

Information about the TREX configuration

● Name of the TREX hosts (Hosts) that belong to the TREX system landscape

● TREX server roles

○ Roles of the TREX name server (Name Server Mode)

Possible roles are: 1st, 2nd, 3rd Master Name Server, Slave Server 

○ Use as master index server or master queue server 

○ Roles of the master, slave, and backup index servers

● TREX preprocessor mode (Preprocessor Mode)

● Information about the TREX installation directory (Base Path)

● Services that have been started by the TREX daemon (Services)

 

General UNIX Configuration

Purpose

The following sections describe the steps that are necessary after an installation on UNIX.

 

Checking and Changing UNIX Kernel Parameters

Use

Check the following UNIX kernel parameters and modify them if necessary:

· Number of open files per process

On UNIX platforms, each process may only have a certain number of files open at once. If you create a large number of indexes and queues during routine

operation, the TREX processes, in particular the queue server and index server, open a lot of files.

With many UNIX installations, the value for the maximum number of files that the processes are allowed to have open is too low. The parameter must have

the following value:

Operating System             Value
AIX, HP-UX, Sun Solaris      At least 2048
Linux                        At least 1024

· HP-UX only:

○ Process size

The process size should be at least 2 GB. The process size is not limited for AIX and Sun Solaris.

○ Files larger than 2 GB

Since TREX can also use files that are larger than 2 GB, these must be activated at operating system level.

The TREX directory contains a test program that you can use to check whether the kernel parameters are set at a suitable level. If this is not the case, you should

change the kernel parameters.
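
As a complement to the test program, the limit on open files for the current process can also be read with a short script. The following is a minimal sketch that assumes a UNIX-like host with Python available. The function names and the dictionary are illustrative, and the minimum values are copied from the table above. It does not replace portlibtester.x.

```python
MIN_OPEN_FILES = {"AIX": 2048, "HP-UX": 2048, "Sun Solaris": 2048, "Linux": 1024}


def meets_minimum(platform_name, soft_limit):
    """Compare a per-process open-file limit with the documented minimum."""
    return soft_limit >= MIN_OPEN_FILES[platform_name]


def current_soft_limit():
    """Read the soft limit of the current process (UNIX-like systems only)."""
    import resource  # not available on Windows

    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return float("inf")  # no limit configured
    return soft


if __name__ == "__main__":
    limit = current_soft_limit()
    print("soft limit:", limit, "sufficient for Linux:", meets_minimum("Linux", limit))
```

The script reports the limit of the process that runs it, so it should be started by the same user and from the same shell environment as the TREX processes.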

Checking Kernel Parameters

1. Log on with the user <sapsid>adm.

2. Go to the TREX directory.

3. Set the environment variables required by TREX:

○ Bourne shell sh, Bourne-again shell bash, Korn shell ksh:

. TREXSettings.sh

○ C shell csh:

source TREXSettings.csh

4. Test the size and number of open files per process:

portlibtester.x -file

Number of open files:

This command creates test files in the directory /tmp/portlibtester. The test must give a result of at least 1000 files (Linux) or 2000 files for other UNIX platforms. If this is not the case, you should change the kernel parameters.

5. Only HP-UX: Test the possible process size:

portlibtester.x -mem

This command calls upon as much main memory as possible. The test must output a value of at least 1900 MB. If this is not the case, you should change the kernel parameters.

Changing Kernel Parameters

AIX

1. Log on as root.

2. Carry out the following steps as appropriate, depending on whether you are working with or without a Network Information System (NIS).

○ (Without NIS) Execute the following command:

chuser nofiles=2000 trx<instance_number>

○ (With NIS) Add the following entry to the file /etc/security/limits:

trx<instance_number>:

nofiles=2000

3. Restart the host using reboot.

HP-UX

Changing the process size

1. Log on as root.

2. Open the administration tool SAM (/usr/sbin/sam).

3. Set at least the following values in the dialog box kernel configuration/configurable.

Kernel Parameter       Lowest Acceptable Value

Process Size
maxdsiz                0X80000000 or 2147483648
maxdsiz_64bit          0X80000000 or 2147483648
maxtsiz                0X40000000 or 1073741824
maxtsiz_64bit          0X40000000 or 1073741824

Number of Open Files
maxfiles               2048
maxfiles_lim           2048
nfile                  20000

4. Restart the host using reboot.

Activating files larger than 2 GB

1. Log on as root.

2. Execute the following command:

fsadm -o largefiles <mount-point>

In doing this, you activate usage of files larger than 2 GB on a certain file system.
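
The hexadecimal and decimal values listed for the HP-UX process size parameters denote the same numbers. The following short check is a sketch of that arithmetic. The dictionary and function names are illustrative, and the values are copied from the kernel parameter table.

```python
GIB = 1024 ** 3  # one gibibyte in bytes

# Parameter name -> (hexadecimal text, decimal value) as listed for HP-UX.
HPUX_PROCESS_SIZE = {
    "maxdsiz": ("0X80000000", 2147483648),
    "maxdsiz_64bit": ("0X80000000", 2147483648),
    "maxtsiz": ("0X40000000", 1073741824),
    "maxtsiz_64bit": ("0X40000000", 1073741824),
}


def values_agree(table):
    """Return True if every hexadecimal value equals its decimal counterpart."""
    return all(int(hex_text, 16) == decimal for hex_text, decimal in table.values())


data_segment_gib = HPUX_PROCESS_SIZE["maxdsiz"][1] / GIB
text_segment_gib = HPUX_PROCESS_SIZE["maxtsiz"][1] / GIB
```

The data segment values correspond to 2 GB, which matches the required process size of at least 2 GB, and the text segment values correspond to 1 GB.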

Linux

1. Add the following line to the end of the script <TREX_DIR>/TREXSettings.sh:

ulimit -n 1024

2. Add the following line to the end of the script <TREX_DIR>/TREXSettings.csh:

unlimit openfiles

TREXSettings.csh is not relevant for the TREX daemon. It is only relevant if you start the TREX servers manually or execute test scripts.

3. If the TREX daemon is running, restart it.

Sun Solaris

1. Log on as root.

2. Add the following lines to the configuration file /etc/system.

set rlim_fd_max=2048

set rlim_fd_cur=2048

3. Restart the host using reboot.

Result

After making the change, execute portlibtester.x -file again. If the number of open files is still too low, the UNIX system administrator must have restricted this parameter in another way. Contact the UNIX system administrator to remove this restriction.

Note for Linux: If you receive error messages during indexing, the value 1024 for the number of open files may not be sufficient. If this is the case, run TREX as root (you can only raise the parameter value to 2048 as root). Proceed as follows:

· Make sure that the script <TREX_DIR>/TREXSettings.sh contains the following line at the end:

ulimit -n 2048

· Make sure that the script <TREX_DIR>/TREXSettings.csh contains the following line at the end:

unlimit openfiles

TREXSettings.csh is not relevant for the TREX daemon. It is only relevant if you start the TREX servers manually or execute test scripts.

· Add a comment sign to the configuration file <TREX_DIR>/<host_name>/TREXDaemon.ini before the following lines:

#userid = trx<instance_number>

#groupid = <group>

This change causes the TREX daemon to run as root the next time it starts.

 

Configuration of the RFC Connection

Purpose

The following sections describe the steps that you carry out if the application and TREX are communicating using an RFC connection.

Process Flow

1. Define the SAP system users.

2. Determine the SAP system connection data.

3. Configure the RFC connection using the TREX admin tool (stand-alone).

For more information about how you start the TREX admin tool (stand-alone), see Starting the TREX Admin Tool.

Result

For more information about the RFC connection and handling connection and configuration errors, see the documentation on the TREX admin tool (stand-alone). You can find this documentation in the SAP Library at help.sap.com/nw70 → SAP NetWeaver.

 

Creating an SAP System User for the TREX Admin Tool (Standalone)

Use

You must create an SAP user that the TREX admin tool (standalone) can use to log on to the SAP system. In addition, the SAP user is required so that the TREX alert server has permission to regularly test and check the RFC configuration. You can create the user in the default client or in another client. If you use another client, make sure that you enter the associated client for the user during the configuration of the RFC connection in the TREX admin tool.

The TREX admin tool (standalone) is used to configure and monitor TREX. You also use this admin tool to configure the RFC connection between TREX and the ABAP application that is using TREX. To use the TREX admin tool (standalone) to create the RFC destination, the admin tool requires an SAP system user that you create based on the predefined role SAP_BC_TREX_ADMIN. This user then has the authorization required to configure the RFC connection.

For more information on the SAP_BC_TREX_ADMIN role, see SAP Note 766516.

Overview of the Permissions Assigned by the SAP_BC_TREX_ADMIN Role

Type and Scope of the Permission: Permission check for RFC access
Activity: Execute
Explanation: Name of the RFC object to be protected: SYST, TREX_ARW_ADMINISTRATION

Type and Scope of the Permission: Administration for the RFC destination
Activity: Add or generate, change, display, delete, extended maintenance
Explanation: Type of entry in RFCDES: Start of an external program using TCP/IP

Type and Scope of the Permission: Check on the transaction code at transaction launch
Explanation: Transaction code: SM59, TREXADMIN, TREXADMIN_AUTH

Type and Scope of the Permission: Administrating TREX
Activity: Change, display, execute

Type and Scope of the Permission: ABAP: Program run checks
Activity: Schedule programs for background processing, execute ABAP program, maintain variants for and execute ABAP program

Type and Scope of the Permission: ALV standard layout
Activity: Maintain

Type and Scope of the Permission: Application log
Activity: Display, delete

More Information

Configuring and Administrating the RFC Connection

Configuring the RFC Connection in the TREX Admin Tool

Procedure

Create an SAP system user for the TREX admin tool (standalone) and assign the SAP_BC_TREX_ADMIN role to this user.

1. Launch transaction SU01 (user maintenance) or choose Administration → System Administration → User Maintenance → User in the SAP menu. The User Maintenance: Initial Screen appears.

2. Enter a new user name and choose Create.

3. On the Address tab page, enter the personal data for the user.

4. On the Roles tab page, assign the SAP_BC_TREX_ADMIN role and thus the permission to access the SAP system to the SAP system user for the TREX admin tool (standalone).

Result

This user for the TREX admin tool (standalone) now has the authorization required to configure the RFC connection.

 

Determining the SAP System Connection Information

Use

The TREX admin tool (stand-alone) can connect to an SAP system in two ways.

· Through a specific application server of the SAP system (variant A)

· Through the message server of the SAP system (variant B)

This variant uses the load-balancing function for the SAP system. The message server assigns the request from the TREX admin tool to any application

server.

Depending on the variant used, the TREX admin tool requires different connection information for the SAP system. You must determine the connection information

and specify it later in the TREX admin tool.

SAP recommends using variant B. Variant A has the disadvantage that the connection does not work if the application server is not available.

Procedure

1. Open the SAP Logon. SAP Logon is the program that you use to log on to an SAP system.

2. Note the following connection information:

Connection Setup Type: Through an application server (variant A)
Required Connection Information:
· SAP system ID (SID)
· System number
· Application server host name

Connection Setup Type: Through the message server (variant B)
Required Connection Information:
· SAP system ID (SID)
· Logon group, such as PUBLIC
· Message server host name
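
The two variants can be modeled as one record with variant-specific required fields, which makes it easy to see what is still missing before you enter the data in the TREX admin tool. The following is a minimal sketch. The class, the field names, and the example system ID are hypothetical and are not part of any SAP or TREX interface.

```python
from dataclasses import dataclass
from typing import Optional

# Fields required for each connection variant, following the overview above.
REQUIRED = {
    "A": ("system_id", "system_number", "app_server_host"),
    "B": ("system_id", "logon_group", "message_server_host"),
}


@dataclass
class SapConnectionInfo:
    variant: str
    system_id: str
    system_number: Optional[str] = None
    app_server_host: Optional[str] = None
    logon_group: Optional[str] = None
    message_server_host: Optional[str] = None

    def missing_fields(self):
        """List the fields the chosen variant needs but that are not set."""
        return [name for name in REQUIRED[self.variant] if not getattr(self, name)]


info = SapConnectionInfo(variant="B", system_id="ABC", logon_group="PUBLIC")
print(info.missing_fields())  # ['message_server_host']
```

Keeping both variants in one record mirrors the fact that the SAP system ID is needed in either case, while the remaining entries depend on the variant.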

 

Configuring the RFC Connection in the TREX Admin Tool

Use

You work through the steps below using the TREX admin tool (stand-alone).

Configuration of the RFC connection with the TREX admin tool (stand-alone) is only available as of SAP Basis Component SAP_BASIS 6.20 SP58, 6.40 SP16, and 7.0 SP6. If you are using TREX with an SAP system based on an earlier support package, you have to configure the RFC connection manually as described in the SAP NetWeaver 04 Installation Guide for Search and Classification (TREX) 6.1. You can find this guide on the SAP Service Marketplace at service.sap.com/instguides → SAP NetWeaver → Released 04 → Installation → Cross-NW → Installation Guide Search and Classification TREX 6.1.

Creating a Connection

1. In the Landscape RFC window, choose the Create Connection function.

2. Choose connection type A or B. Specify the connection data for the SAP system (see Determining the SAP System Connection Information).

3. Specify the SAP system user, the associated password, and the client that the TREX admin tool is to use to log on (see Creating an SAP System User for the TREX Admin Tool (Standalone)).

If the SAP system user in question exists in the default client, you do not need to specify the client.

 

Creating an RFC Destination

1. In the Landscape RFC window, choose the RFC Destination (SM59) function.

2. Enter the following parameters:

Field: SAP System
Entry: SAP system that you want to set up the connection to. The list contains all SAP systems that you have registered using Create Connection.

Field: RFC Destination
Entry: Name of the RFC destination.

Field: Description
Entry: Meaningful description of the purpose.

The program ID determines under which name the TREX RFC server registers with the SAP gateway. The program ID must be unique for each SAP gateway. The TREX admin tool ensures this by generating the program ID.

 

3. Decide which SAP gateway you want to use. You have the following options:

Option | Comment

Gateway local (default setting) | Use local SAP gateways for the application servers.

Gateway central | Use the central SAP gateway. We advise against using a central SAP gateway for distributed TREX systems. The central SAP gateway is a “single point of failure.”

If you choose the central gateway, enter the following additional parameters:

● Host name (with domain name if necessary) or the IP address of the host on which the gateway is installed.

● Name of the SAP gateway in the form sapgw<instance_number>


 

SAP advises against creating the RFC destination directly in the SAP system. The name of the RFC destination and the program ID must satisfy certain

naming conventions. The TREX admin tool ensures that these are fulfilled.

If you nevertheless create the RFC destination directly in the SAP system, note the following:

● We recommend starting the name of the RFC destination with TREX_.

● Choose the activation type Registered Server Program.

● Choose a program ID that is unique for the SAP gateway used.

● Use the RFC Destinations function to register the RFC destination in the TREX admin tool.

 

Completing the RFC Configuration

1. In the Landscape RFC window, choose the Connect function.

The TREX admin tool creates the connection to all SAP systems that are known to it. Because the RFC configuration is still incomplete, the configuration

status is yellow or red.

 

2. Choose Repair All.

The TREX admin tool completes the RFC configuration and starts the TREX RFC server.

This can take several minutes. During this time, the configuration status remains yellow or red. After completion of the configuration process, the status

changes to green.

 

Do not choose Repair All several times in quick succession. This would trigger the configuration process more than once and delay it.

 

3. Check the progress by choosing Refresh to update the display.

 

Client Side

Purpose

The following sections describe the configuration steps that you have to carry out on the client side.

 

Java Application (HTTP Connection)

If a Java application communicates with TREX, you configure the TREX Java client, which is integrated as a TREX service in the J2EE engine. You also check the client-side proxy settings.

 

Specifying the Address of the TREX Name Server 

Use

TREX provides APIs (Application Programming Interfaces) for the languages Java and ABAP, which allow access to all TREX functions. The Java interface (TREX Java client) is part of the SAP Web AS Java, where it runs as the TREX service. The TREX Java client needs to know the address of the TREX name server in order to communicate with the TREX servers.

The following procedure describes how you determine the TREX name server address and how you specify it in the SAP NetWeaver Visual Administrator.


The TREX Java client communicates with the TREX server by HTTP and TCP/IP. Make sure that the TCP port that the name server uses is open.

Procedure

You have to specify the address of the TREX name server in the SAP NetWeaver Visual Administrator in the form <host_name_of_trex_host>:<name_server_port>:

● <host_name_of_trex_host>: name of the host on which TREX is installed and where the TREX name server runs.

● <name_server_port>: port of the TREX name server

1. You can determine the TREX name server address in two ways:

a. Start the TREX admin tool (see Starting the TREX Admin Tool) and determine the address of the name server using Landscape → Tree → topology → globals → all_masters.

For example: mytrexhost:34801

b. Determine the port of the TREX name server using the following rule: <name_server_port> = 3<instance_number>01

<instance_number> is the TREX instance number that was specified during the TREX installation. It is part of the name of the TREX installation directory:

■ On UNIX /usr/sap/<sapsid>/trx<instance_number>

■ On Windows <disk_drive>:\usr\sap\<SAPSID>\TRX<instance_number>

<host_name_of_trex_host> is the name of the host on which TREX is installed (for example, mytrexhost).

2. Use the user <j2eeadm> to log on to the host on which the J2EE Engine is running.

3. Start the SAP NetWeaver Visual Administrator and log on to the J2EE Engine.

For more information about using SAP NetWeaver Visual Administrator, see SAP Help Portal help.sap.com → Documentation → SAP NetWeaver → SAP Library → SAP NetWeaver Library → SAP NetWeaver by Key Capability → Application Platform by Key Capability → Java Technology → Administration Manual → J2EE Engine Administration Tools → Visual Administrator

4. Click Cluster and navigate to Services → TREX Service.

5. Enter the address of the TREX name server into the parameter nameserver.address.

tcpip://<host_name_of_trex_host>:<name_server_port>

You enter only the host name or the host name and the domain depending on your network environment.

tcpip://mytrexhost:34801 or tcpip://mytrexhost.mydomain:34801

The address of the TREX name server must be configured for all server processes of the cluster. Otherwise the connection between the J2EE Engine and

TREX cannot be established.

6. Save your changes and confirm the restart of the service.
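
The port rule and the address format used in this procedure can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the formatting assumes a two-digit TREX instance number, as in the example port 34801.

```python
def name_server_port(instance_number: int) -> int:
    """Apply the rule 3<instance_number>01 (assumes a two-digit instance number)."""
    if not 0 <= instance_number <= 99:
        raise ValueError("instance number must have at most two digits")
    return int(f"3{instance_number:02d}01")


def name_server_address(host: str, instance_number: int) -> str:
    """Build the value to enter in the parameter nameserver.address."""
    return f"tcpip://{host}:{name_server_port(instance_number)}"


# Values taken from the example in the text: host mytrexhost, instance 48.
print(name_server_address("mytrexhost", 48))
print(name_server_address("mytrexhost.mydomain", 48))
```

Whether you pass the short or the fully qualified host name depends on your network environment, as described in step 5.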

 

Checking Proxy Settings

Use

If an application is unable to communicate with TREX, it may be due to the application trying to access TREX using a proxy server. If this is the case, you have to

change the configuration so that access does not take place using the proxy server.

The procedure depends on the application concerned:

● SAP Enterprise Portal 6.0 with Content Management

● Other Java applications based on J2EE 6.40

Procedure

SAP Enterprise Portal 6.0 with Content Management

Check the settings in the portal at System Administration → System Configuration → Service Configuration → Applications (Content Catalog) → com.sap.portal.ivs.httpservice → Services → proxy.

If a proxy server is entered there, you have to enter the TREX host in the field http – Bypass Proxy Servers.


Other Java applications based on J2EE 6.40

For other Java applications, you have to check the configuration of the J2EE Engine. The proxy settings belong to the Java parameters. If a proxy server is

configured in the Java parameters, enter the TREX host in the parameter nonProxyHosts. You can choose one of the following options:

● Alternative 1: -D"http.nonProxyHosts=<hostname>.<mydomain>|localhost"

For <hostname>.<mydomain>, enter the host name and domain (if necessary) of the TREX host.

● Alternative 2: -D"http.nonProxyHosts=*.<mydomain>|localhost"

You can change the Java parameters using the SAP J2EE Engine GUI Config Tool. For more information about using this tool, see the SAP Library at the Internet address help.sap.com → Documentation → SAP NetWeaver.

Note that you have to specify the name of the TREX host in the same way on the TREX side, in the TREX configuration files (topology.ini, sapprofile.ini), and in the configuration of the J2EE Engine as described above. If you specify the TREX host name as fully qualified (for example, PWDF12345.sap.corp), you have to do so on both sides. Mixing short and fully qualified host names does not work.
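
The following Python sketch illustrates why the host name must be written consistently. It approximates how a nonProxyHosts value (patterns separated by |, with * as a wildcard) is matched against a host name. The function name is hypothetical, and fnmatch is used only as a stand-in for the Java matching logic, so treat the result as an approximation rather than the exact behavior of the J2EE Engine.

```python
from fnmatch import fnmatchcase


def bypasses_proxy(host: str, non_proxy_hosts: str) -> bool:
    """Return True if host matches one of the |-separated patterns."""
    host = host.lower()
    patterns = [p.strip().lower() for p in non_proxy_hosts.split("|") if p.strip()]
    return any(fnmatchcase(host, pattern) for pattern in patterns)


setting = "*.mydomain|localhost"
# The fully qualified name matches the wildcard pattern.
print(bypasses_proxy("mytrexhost.mydomain", setting))
# The short name does not, so the request would go through the proxy.
print(bypasses_proxy("mytrexhost", setting))
```

With the setting from alternative 2, only the fully qualified TREX host name bypasses the proxy in this sketch, which is one way a mixed usage of host names can fail.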

Initial Configuration

The procedures for initial configuration are organized as follows:

· Single-Host System

The initial configuration of a single-host system comprises procedures that you can use to check and, if necessary, solve problems that occur. You can also use them to improve TREX performance. In contrast to the configuration steps that follow the installation, these steps are optional: TREX works correctly in the default single-host configuration and can be used by an application without them.

· Distributed System

TREX consists of a client component and a server component. The server component is based on a flexible architecture that allows distributed installation

and thus modification to suit various different requirements. A minimal system consists of a single host that provides all TREX functions. You then have

numerous options for scaling TREX. You can distribute TREX components among several hosts and install individual components more than once. You can

use a scaled scenario to distribute the search and indexing load among several hosts and to ensure the availability of TREX.

 

Single-Host System

Changing the Index and Queue Directory

Use

SAPinst creates an index directory and a queue directory in the directory <TREX_DIR>. You can change these directories if necessary (for example, if you want

the directories to be located in a different partition).

Procedure

1. Create the index directory or queue directory in the required partition.

We recommend that you use the directory names index or queue.

2. Make sure that the directory permissions match those of the original directory (<TREX_DIR>/index or <TREX_DIR>/queue).

3. Stop TREX (see Starting and Stopping TREX).

4. Edit the configuration file <TREX_DIR>/sapprofile.ini. Change the parameter TREX/IndexServer/basepath/index or TREX/QueueServer/basepath/queue so that it points to the new directory.

Only use forward slashes (/) in paths (even on Windows).

The standard configuration is:


TREX/IndexServer/basepath/index=%(SAP_RETRIEVAL_PATH)/index

TREX/QueueServer/basepath/queue=%(SAP_RETRIEVAL_PATH)/queue

If TREX is running on UNIX, enter the following:

TREX/IndexServer/basepath/index=/my_path/index

TREX/QueueServer/basepath/queue=/my_path/queue

If TREX is running on Windows and the directories are located on a local disk drive, enter the following:

TREX/IndexServer/basepath/index=D:/my_path/index

TREX/QueueServer/basepath/queue=D:/my_path/queue

If TREX is running on Windows and the directories are located on a file server, enter the following:

TREX/IndexServer/basepath/index=//my_server/my_path/index

TREX/QueueServer/basepath/queue=//my_server/my_path/queue

 All remaining paths are only relevant for a distributed system.

5. Start TREX (see Starting and Stopping TREX).
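
As an illustration of step 4, the following Python sketch rewrites the two base path parameters in the text of sapprofile.ini and converts backslashes to forward slashes. The function names are hypothetical, and the sketch assumes that each parameter sits on its own line in the form key=value, as in the examples above. Make the change only while TREX is stopped, and keep a copy of the original file.

```python
def to_forward_slashes(path: str) -> str:
    """TREX paths use forward slashes only, even on Windows."""
    return path.replace("\\", "/")


def set_base_paths(ini_text: str, index_dir: str, queue_dir: str) -> str:
    """Return ini_text with the index and queue base paths replaced."""
    new_values = {
        "TREX/IndexServer/basepath/index": to_forward_slashes(index_dir),
        "TREX/QueueServer/basepath/queue": to_forward_slashes(queue_dir),
    }
    result = []
    for line in ini_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key in new_values:
            line = f"{key}={new_values[key]}"
        result.append(line)  # all other lines are kept unchanged
    return "\n".join(result)


standard = (
    "TREX/IndexServer/basepath/index=%(SAP_RETRIEVAL_PATH)/index\n"
    "TREX/QueueServer/basepath/queue=%(SAP_RETRIEVAL_PATH)/queue"
)
print(set_base_paths(standard, "D:\\my_path\\index", "D:\\my_path\\queue"))
```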

 

Changing the Web Server Address

Use

SAPinst enters the Web server address fully qualified with domain into the configuration file <TREX_DIR>/topology.ini. Your network configuration dictates whether you have to enter the Web server address with or without the domain. If you have to remove the domain from the address, proceed as follows:

Procedure

1. Stop TREX (see Starting and Stopping TREX).

2. Edit the configuration file <TREX_DIR>/<trex_host_number>/topology.ini. Remove the domain from the Web server address:

<httpserver>

<<port>>

Before the change: url=http://mytrexhost.mydomain:<port>/ ...

After the change: url=http://mytrexhost:<port>/ ...

</httpserver>

3. Start TREX (see Starting and Stopping TREX).
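
The change to the url value can be expressed as a small Python helper, shown here only to make the before and after forms concrete. The function name and the path /example are hypothetical; the sketch leaves IP addresses untouched because they have no domain part to remove.

```python
import ipaddress
from urllib.parse import urlsplit, urlunsplit


def strip_domain(url: str) -> str:
    """Return url with the domain removed from the host name."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    try:
        ipaddress.ip_address(host)
        return url  # an IP address has no domain to strip
    except ValueError:
        pass
    short_host = host.split(".", 1)[0]
    netloc = short_host if parts.port is None else f"{short_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(strip_domain("http://mytrexhost.mydomain:34805/example"))
```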

 

Only Windows: Configuring IIS

Use

You have to configure Microsoft IIS as follows:

Version | Configuration

Microsoft IIS 5.x | Set Application Protection to High

Microsoft IIS 6.0 | ● Create a Web service extension ● Create an application pool


Procedure for Microsoft IIS 5.x

1. Choose:

○ Windows 2000: Navigate to Control Panel → Administrative Tools → Computer Management → Services and Applications → Internet Information Services

○ Windows Server 2003: Navigate to Control Panel → Administrative Tools → Computer Management → Services and Applications → Internet Information Services (IIS) Manager.

2. Navigate to the Web site SAP_TREX_<instance_number>.

3. Display the properties of the virtual directory TREXHttpServer. This virtual directory is located beneath the Web site. On the tab Virtual Directory, choose High (Isolated) in the field Application Protection.

4. Restart the Web server.

Procedure for Microsoft IIS 6.0

Choose:

● Windows 2000: Navigate to Control Panel → Administrative Tools → Computer Management → Services and Applications → Internet Information Services

● Windows Server 2003: Navigate to Control Panel → Administrative Tools → Computer Management → Services and Applications → Internet Information Services (IIS) Manager.

Create a Web service extension

1. Choose Web Service Extensions.

2. Create an extension with the following data:

Field | Entry

Extension name | TREXHTTPServer_<instance_number>

Required files | <disk_drive>:\usr\sap\<SAPSID>\TRX<instance_number>\exe\WebServer\TREXISAPIExt.dll

Set extension status to Allowed | Select this field.

Create an application pool

1. Choose Application Pools. Create an application pool with the following ID:

 AppPool_TREX_<instance_number>

You do not need to change the other settings.

2. Display the properties of the application pool you just created and then choose Identity. Select Configurable. Enter the name of the user (<trex_instance_number>) and enter the password twice.

The user <trex_instance_number> must belong to the group IIS_WPG (IIS Worker Process Group).

3. Display the properties of the Web site SAP_TREX_<instance_number>. Choose Home Directory and assign the Web site to the application pool that you just created.

 

Only Windows: Checking Permissions for the TREX Directory

Use

The TREX setup program creates the Web site SAP_TREX_<instance_number> on the Web server. In the process, an anonymous user is defined for access to the Web site. The anonymous user needs certain permissions for the TREX directory:

· IIS 5.X: Full Control

· IIS 6.X: Read & Execute

If an error occurs, determine the anonymous user and correct the settings.

Proceed as follows to do this:

· Determine the anonymous user entered in the Web site SAP_TREX_<instance_number>.

· Give this user the required permission (Full Control for IIS 5.X, Read & Execute for IIS 6.X) for the TREX directory and for all files and subdirectories that it contains.

Determining the Anonymous User 

Microsoft IIS 5.X

1. Choose:

○ Windows 2000: Navigate to Control Panel → Administrative Tools → Computer Management → Services and Applications → Internet Information Services

○ Windows Server 2003: Navigate to Control Panel → Administrative Tools → Computer Management → Services and Applications → Internet Information Services (IIS) Manager.

2. Use the secondary mouse button to click on the SAP_TREX_<instance_number> Web site. Choose Properties → Directory Security.

3. In the Anonymous access and authentication control area, choose Edit.

4. In the Anonymous access area, choose Edit.

5. Select the name that is entered in the Username field, and copy it using CTRL+C.


6. Close the Internet Services Manager.

Now give the determined user full access to the TREX directory on Microsoft IIS 5.X.

Microsoft IIS 6.X

1. Choose:

○ Windows 2000: Navigate to Control Panel → Administrative Tools → Computer Management → Services and Applications → Internet Information Services

○ Windows Server 2003: Navigate to Control Panel → Administrative Tools → Computer Management → Services and Applications → Internet Information Services (IIS) Manager.

2. Use the secondary mouse button to click on the SAP_TREX_<instance_number> Web site. Choose Properties → Directory Security.

3. In the Authentication and access control area, choose Edit.

4. Select the name that is entered in the Username field, and copy it using CTRL+C.

5. Close the Internet Information Services Manager.

Now give the determined user Read & Execute permission for the TREX directory on Microsoft IIS 6.X.

Giving the Determined User Certain Permissions

Windows 2000

1. Use the secondary mouse button to click on the TREX directory. Choose Properties → Security.

2. Choose Add.

3. Select your local host under Look in.

4. Add the copied user name using CTRL+V. Check the validity of the user name using Check Names.

5. Choose OK.

6. Select the user and grant the access permissions:

○ IIS 5.X: Full Control

○ IIS 6.X: Read & Execute

7. Choose Advanced.

8. Select the user again.

9. Select Allow inheritable permissions from parent to propagate to this object and Reset permissions on all child objects and enable propagation of inheritable permissions.

10. Choose OK twice.

Windows Server 2003

1. Use the secondary mouse button to click on the TREX directory. Choose Properties → Security.

2. Choose Add.

3. Select your local host using Locations.

4. Add the copied user name using CTRL+V. Check the validity of the user name using Check Names.

5. Choose OK.

6. Select the user and grant the access permissions:

○ IIS 5.X: Full Control

○ IIS 6.X: Read & Execute

7. Choose Advanced.

8. Select the user again.

9. Select Allow inheritable permissions from the parent to propagate to this object and Replace permission entries on all child objects.

10. Choose OK twice.

 

Creating a Web Site Manually (Only Windows)

Use

This section is only relevant if an application communicates with TREX using HTTP.

The TREX setup program normally creates the Web site SAP_TREX_<instance_number> on the Web server. If an error occurred during this process, you have

to create the Web site manually.

Procedure

1. Open the Internet Information Services (Microsoft IIS 5.0) or the Internet Information Services (IIS) Manager (Microsoft IIS 6.0).

○ Windows 2000: Navigate to Control Panel → Administrative Tools → Computer Management → Services and Applications → Internet Information Services

○ Windows Server 2003: Navigate to Control Panel → Administrative Tools → Computer Management → Services and Applications → Internet Information Services (IIS) Manager.

2. Use the secondary mouse button to click on the TREX Web site (Windows 2003) or the computer icon (Windows 2000), and choose New → Web Site.

3. A wizard that helps you with the creation process is started. Enter the information from the table below, and adopt the default settings for all other fields.


Field | Input

Description | SAP_TREX_<instance_number>, for example SAP_TREX_48

TCP Port | <free_port>. We recommend that you calculate the port as follows: 30000 + 100 * <instance_number> + 5. SAPinst calculates the ports of the TREX servers using this method. The method ensures that the ports do not clash with another TREX instance on the same host. If the instance number is 48, the port is 34805.

Path | <disk_drive>:\usr\sap\<SAPSID>\TRX<instance_number>\exe

Permissions (Read, Run scripts, and so on) | None. Make sure that no field is checked.

4. When you have created the Web site, you have to create a virtual directory. Use the secondary mouse button to click on the Web site SAP_TREX_<instance_number>, and choose New → Virtual Directory.

5. A wizard that helps you with the creation process is started. Enter the following information:

Field | Input

Alias | TREXHTTPServer_<instance_number>

Path | <disk_drive>:\usr\sap\<SAPSID>\TRX<instance_number>\exe

Permissions (Read, Run scripts, and so on) | Select Execute (such as ISAPI applications or CGI). Remove the selection for the other permissions.

6. Display the properties of the virtual directory TREXHTTPServer_<instance_number>. Choose the Virtual Directory tab, and remove the selection for Log visits and Index this resource.

7. Display the properties of the Web site SAP_TREX_<instance_number>. Choose the Web Site tab, and remove the selection for the Enable Logging field.
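
The recommended TCP port for the Web site follows directly from the instance number. The Python sketch below applies the formula 30000 + 100 * <instance_number> + 5 and shows that different instance numbers never produce the same port. The function name is hypothetical, and the range check assumes two-digit instance numbers.

```python
def web_site_port(instance_number: int) -> int:
    """Recommended TCP port: 30000 + 100 * <instance_number> + 5."""
    if not 0 <= instance_number <= 99:
        raise ValueError("instance number must have at most two digits")
    return 30000 + 100 * instance_number + 5


# Instance number 48 from the example in the table.
print(web_site_port(48))

# Each instance number maps to its own port, so instances on one host do not clash.
ports = {web_site_port(n) for n in range(100)}
print(len(ports) == 100)
```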

 

Checking an RFC Connection

Use

If the connection test fails when you create an RFC destination or search server relation, check the following:

● SAP gateway

● RFC destination

● TREX configuration

Checking the Gateway

With UNIX

1. Check that the process gwrd is running:

ps -fu <gwsadm> | grep gwrd

2. Check whether the group to which the user <gwsadm> belongs has the access permission rwx for the directory /usr/sap/<SAPSID>/TRX<instance_number>.

With Windows

1. Use the Task Manager to check whether the process gwrd.exe is running.

2. Check the settings of the gateway service. To do this, choose the following paths:

○ Windows 2000: Start → Settings → Control Panel → Administrative Tools → Services.

○ Windows Server 2003: Start → Administrative Tools → Services.

Start the service SAPGWS_<SAPSYSNR> if it is not already running. If necessary, change the start type of the service so that it starts automatically.

3. Open the SAP Management Console by choosing Start → Programs or All Programs → SAP System Management Console. Check whether the gateway instance has started. If necessary, start the gateway instance using Action → Start.

Checking the RFC Destination

Check the data that you entered when you created the RFC destination. Pay attention to lowercase and uppercase letters in the input parameters.


Checking the TREX Configuration

Check the gateway parameters in the file <TREX_DIR>/<TREX_host_name>/TREXRfcServer.ini:

● Is the host name of the host on which the gateway is installed correct?

● Does the instance number match the number you specified during the gateway installation?

[CONNECTION]

HOST=<local_host_or_host_name>

INSTANCE=sapgw<gw_instance_number>

The values for the parameters HOST and INSTANCE must be entered in lower case.
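
The two checks on the [CONNECTION] section can also be scripted. The following Python sketch is an assumption-laden illustration: the function name is hypothetical, it assumes that TREXRfcServer.ini can be parsed by the standard configparser module, and it assumes a two-digit gateway instance number. It cannot tell whether the host name is the right one, only whether the values are present and in lower case.

```python
import configparser
import re


def check_gateway_settings(ini_text: str) -> list:
    """Return a list of problems found in the [CONNECTION] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section("CONNECTION"):
        return ["section [CONNECTION] is missing"]
    connection = parser["CONNECTION"]
    host = connection.get("HOST", "")
    instance = connection.get("INSTANCE", "")
    problems = []
    if not host:
        problems.append("HOST is missing")
    elif host != host.lower():
        problems.append("HOST must be entered in lower case")
    # Expected form: sapgw<gw_instance_number>, in lower case.
    if not re.fullmatch(r"sapgw\d{2}", instance):
        problems.append("INSTANCE must have the form sapgw<gw_instance_number>")
    return problems


sample = "[CONNECTION]\nHOST=localhost\nINSTANCE=sapgw48\n"
print(check_gateway_settings(sample))
```

An empty list means that the sketch found nothing to report; you still have to compare the host name and instance number with the actual gateway installation.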

 

Creating a Search Server Relation

Use

It might be necessary to create a search server relation for communication between an application and TREX. The installation documentation on the application in

question will contain information on whether you need a search server relation.

Technical background: The need for a search server relation depends on the version of the TREX ABAP client that is used by the application in question.

There are the following versions:

● The SRET package with the function modules SRET*

● The STREX package with the function modules TREX_*

If the application in question uses the SRET package, you must create a search server relation. If the application uses the STREX package, this step is not

required.

Creating a search server relation consists of the following:

1. Creating a search server relation.

2. Testing the search server relation.

Creating a Search Server Relation

1. Choose transaction SRMO in the SAP system.

2. Choose Create SSR.

3. Enter a name for the search server relation in the field Search Server Relation ID (for example, SSR_TREX).

4. Choose Create SSR.

5. Enter the following data:

Field | Entry

Search engine | DRFUZZY. This is the internal name of the TREX search engine. Make sure that you enter DRFUZZY in uppercase and in the format specified.

RFC Destination (TCP/IP) | Name of the RFC destination that you created with the activation type Registration. This entry must match the name that you assigned when you created the RFC destination (see Creating an RFC Destination with Activation Type Registration). Example: TREXDEFAULT_REG

Description | Description of the search server relation, for example, Search Server Relation for Retrieval Service.

6. Save your entries.

You return to the previous dialog box.

7. Select the newly created search server relation in the table.

8. Choose Set SSR as Default.

9. In the confirmation prompt that appears, choose Yes.

The search server relation is then shown as default in the table.


Testing a Search Server Relation

1. Choose the RFC Destinations tab.

Two entries are listed for the search server relation you created: One for action I (indexing) and one for action S (searching).

2. Select the entry with action = S. Choose Connection Test under Search Engine Settings.

The connection with the TREX RFC server and the TREX search engine is established. You can see this in the version information that is shown for the TREX

components.

3. Select the entry with action = I. Choose Connection Test under Search Engine Settings.

The connection with the TREX RFC server and the TREX search engine is established. You can see this in the version information that is shown for the TREX

components.

If the RFC connection still cannot be established, see Checking an RFC Connection.

 

Activating Queue Server Usage

Use

There are two methods for indexing:

· With queue server 

The RFC server sends the documents to be indexed to the queue server. The queue server collects the documents and transmits them to the index server 

according to the conditions defined in the queue parameters. The actual indexing takes place on the index server.

· Without queue server 

The RFC server sends the documents to be indexed to the index server.

The most suitable configuration depends on the application. The version of the TREX ABAP client determines whether you can configure usage of the queue

server in the file TREXRfcServer.ini. SAP Note 658052 contains information on which configuration is most suitable for each application and whether you have to

activate the usage of the queue server in the file TREXRfcServer.ini.

Procedure

1. If you have to activate the usage of the queue server, edit the configuration file <TREX_DIR>/<host_name>/TREXRfcServer.ini.

2. In the [CONNECTION] section, set the USE_QUEUESERVER parameter to YES.

[CONNECTION]

USE_QUEUESERVER=YES
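
If you prefer to script this change, the following Python sketch sets the parameter in the text of TREXRfcServer.ini while leaving all other lines untouched. The function name is hypothetical, and the sketch assumes a conventional layout with section headers in square brackets and one key=value pair per line.

```python
def enable_queue_server(ini_text: str) -> str:
    """Return ini_text with USE_QUEUESERVER=YES in the [CONNECTION] section."""
    setting = "USE_QUEUESERVER=YES"
    result = []
    in_connection = False
    done = False
    for line in ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Leaving [CONNECTION] without having seen the parameter: add it.
            if in_connection and not done:
                result.append(setting)
                done = True
            in_connection = stripped.upper() == "[CONNECTION]"
        elif in_connection and stripped.split("=", 1)[0].strip().upper() == "USE_QUEUESERVER":
            line = setting  # overwrite an existing value such as NO
            done = True
        result.append(line)
    if in_connection and not done:
        result.append(setting)
        done = True
    if not done:
        result.extend(["[CONNECTION]", setting])
    return "\n".join(result) + "\n"


print(enable_queue_server("[CONNECTION]\nHOST=localhost\nINSTANCE=sapgw48\n"))
```

The sketch edits the text line by line instead of re-serializing the file, so that comments and the spelling of the other parameters are preserved.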

Result

The changes take effect when you next start the RFC server. The RFC server is automatically started by the TREX daemon and/or by SAP Gateway.

If you use the queue server, check the queue parameters regularly and set them according to your requirements. Make sure that you configure the intervals at

which the queue server is to transmit documents to the index server. The settings that are suitable depend on how often documents are to be indexed, and how

quickly you want them to be available for the search.

You can configure queue parameters using the Python version of the TREX administration tool, for example. For more information, see the SAP Library at the Internet address help.sap.com → Documentation → SAP NetWeaver.

 

Configuring Queue Parameters

Use

The queue parameters control the interaction between the queue server and the index server. In particular, they specify when the queue server triggers indexing

and optimization of documents. It is important for performance reasons that you have optimum settings for the queue parameters.

When TREX creates a queue, it uses the default settings for the queue parameters. Depending on the document sets that you have to index initially and on the type of documents you index, you may have to change the default settings.


The default settings that TREX uses for new queues are defined in the configuration file TREXQueueServer.ini. You can change the default settings. However, you should only make changes to configuration files after consulting SAP Support or a consultant.

Prerequisites

You have already created indexes.

Procedure

You can change the queue parameters for existing queues as follows:

Tool | Path

TREX admin tool | Queue Admin → Queue Parameters

TREX monitor in the portal | System Administration → Monitoring → Knowledge Management → TREX Monitor → Edit Queue Parameters

TREX Admin Tool in the SAP System | Transaction TREXADMIN → Queue Admin → Set Queue Parameters

For more information about the meaning of the queue parameters, see the SAP Library at help.sap.com.

 

Checking Performance Settings for the Operating System

Use

To optimize the performance of TREX when using the released Windows platform, you need to check your Windows configuration and make changes if necessary.

Optimizing Data Throughput For Network Applications

The Windows installation normally makes caching settings that are optimized for file servers. The operating system then reserves a large part of the main memory

for the caching of files. Since this file-system cache impairs performance when indexing, you ought to change these settings.

1. Use the secondary mouse button to click on My Network Places on the Windows desktop, and choose Properties.

2. Use the secondary mouse button to click on the local network connection and choose Properties.

3. Select the entry File and Printer Sharing for Microsoft Networks and choose Properties.

4. Select Maximize data throughput for network applications.

5. Choose OK twice.

Optimizing Performance for Background Processes

Programs such as Microsoft SQL Server and Microsoft Exchange make the setting described below automatically when they are installed. If you have

installed one of these programs, you do not need to make any changes.

The setting is only relevant if TREX is running as a Windows service.

Windows 2000

1. Use the secondary mouse button to click on My Computer on the Windows desktop, and choose Properties.

2. Choose the Advanced tab, and then choose Performance Options.

3. Under Application Response, choose the Background Services field.

4. Choose OK twice.

Windows Server 2003

1. Use the secondary mouse button to click on My Computer and choose Properties.

2. Choose the Advanced tab, and then choose Settings → Advanced.

3. Select Background services under Adjust for best performance of.

4. Choose OK twice.

 

Distributed TREX Systems (Multiple Host Installation)

Purpose


Search and Classification (TREX) consists of a client component and a server component. The server component is based on a flexible architecture that allows a

distributed installation. A distributed system has the following advantages:

· Load distribution

You can distribute the search and indexing load among several hosts.

· High availability

You can make searching and indexing highly available.

This guide explains how to plan and implement a distributed system. It is aimed at technology consultants.

The guide is structured as follows:

· Naming Conventions contains information on the naming conventions used in this guide.

· Required Documentation lists the documentation that you need to implement a distributed system.

· Fundamentals contains information on the TREX architecture and basic information on distributed systems. You need this information to plan a distributed system. Read this information before you begin to implement your distributed system.

· Setting Up a Distributed System and Delta Index and Index Replication Configuration describe how to implement a distributed system.

· Changing a Distributed System describes changes that you can make to your system after the installation.

· Distributed Preprocessing of Documents describes how to distribute the preprocessing of documents among several hosts. This section is relevant if you want to index documents whose preprocessing takes up a lot of time and system resources. This can be the case if you want to index large PDF files.

· The appendix contains information on stopping and starting a distributed system. It also contains information on starting the TREX admin tool and changing the queue parameters and the Java client parameters.

 

Naming Conventions

The following conventions are valid for this documentation.

Terminology

Term            Meaning
TREX instance   One installation of the TREX server software
TREX host       Host on which the TREX server software is installed
Server          Program that offers services (such as an index server or queue server)
Master host     Host on which a master index server is running
Slave host      Host on which a slave index server is running
Backup host     Host on which a backup index server is running

 

Variables

Variable                  Meaning
<SAPSID>                  System ID in uppercase letters
<sapsid>                  System ID in lowercase letters
<TREX_DIR>                Installation directory for a TREX instance. The path to the directory is:
                          · On UNIX: /usr/sap/<SAPSID>/TRX<instance_number>
                          · On Windows: <disk_drive>:\usr\sap\<SAPSID>\TRX<instance_number>
User <sapsid>adm          Operating system user that you log on with to administer TREX.
User SAPService<SAPSID>   Operating system user under which the TREX processes run.
User <j2eeadm>            Operating system user that you use to log on to the host on which the J2EE Engine is running.
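The path rules for <TREX_DIR> can be illustrated with a short Python sketch. It is not part of the TREX delivery; the function name trex_dir and its parameters are assumptions made for this example only.

```python
from pathlib import PurePosixPath, PureWindowsPath
from typing import Optional, Union


def trex_dir(sapsid: str, instance_number: str,
             disk_drive: Optional[str] = None) -> Union[PurePosixPath, PureWindowsPath]:
    """Build <TREX_DIR> (hypothetical helper, not a TREX tool).

    Without disk_drive the UNIX layout is used; with a drive letter
    such as "D" the Windows layout is used.
    """
    sid = sapsid.upper()                    # <SAPSID> is the system ID in uppercase
    instance_dir = f"TRX{instance_number}"  # TRX<instance_number>
    if disk_drive is None:
        return PurePosixPath("/usr/sap", sid, instance_dir)
    return PureWindowsPath(f"{disk_drive}:\\", "usr", "sap", sid, instance_dir)


print(trex_dir("abc", "01"))       # /usr/sap/ABC/TRX01
print(trex_dir("abc", "01", "D"))  # D:\usr\sap\ABC\TRX01
```

The pure path classes are used so that the Windows form can be built on any operating system without touching the file system.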

 

 Abbreviations

The following abbreviations are used in the graphics.

TREX is based on a client/server architecture. The client component is integrated into the application that uses the TREX functions, and allows communication with the TREX servers. The server component processes the requests; it indexes and classifies documents and answers search queries.

The client component is subdivided into the Java client and ABAP client. The server component is subdivided into the following servers:

● Web server with TREX extension

● RFC server 

● Queue server 

● Preprocessor 

● Index server 

● Name server 

The graphic below shows the individual components and the communication between components:

 

Java Client and ABAP Client

TREX provides programming interfaces (Application Programming Interfaces, APIs) for the languages Java and ABAP. These interfaces are also called the Java client and the ABAP client.

The interfaces allow access to all TREX functions. You can use the interfaces to create indexes and queues, to perform indexing, and to perform searches. In addition, the interfaces provide functions to query the internal status of TREX.

The interfaces are part of the NetWeaver Application Servers (NW AS).

 

Web Server with TREX Extension

The Web server is responsible for the communication between Java applications and the TREX servers. The application sends requests to the Web server in XML format using HTTP/HTTPS. The Web server converts the requests to a TREX-internal format and then forwards them to the responsible TREX servers.

A TREX component that enhances the Web server with TREX-specific functions is installed on the Web server. Technically, this component is implemented as follows:

· On Windows, as an ISAPI server extension for the Microsoft Internet Information Server 

· On UNIX, as a shared library for the Apache Web server 

 

RFC Server

The RFC server is responsible for the communication between an SAP system and the TREX servers.

The SAP system sends requests to an RFC server using an SAP Gateway. The RFC server converts the requests to a TREX-internal format and then forwards them to the responsible TREX servers.

 

Name Server

The name server manages information on the entire TREX system. It makes sure that the TREX servers can communicate with each other and that they receive all necessary information. The name server has the following tasks:

· Managing topology data

The topology data includes information on the central components of a TREX system (TREX servers, indexes, and queues).

· Coordinating replication services

The replication services are only relevant for a distributed TREX system. The name server has information on which TREX server has a particular data status. It makes sure that changed data is replicated.

· Load balancing

The name server accepts requests and distributes them to the responsible TREX servers. It is responsible for distributing indexes and search queries.

· Ensuring high availability

The name server launches several watchdogs. They constantly monitor whether the TREX servers are available. If a TREX server is not available, the name server ensures that the TREX server that is down does not receive any requests.

 

Queue Server

The queue server coordinates the processing steps that take place during indexing. It collects incoming documents and triggers preprocessing by the preprocessor and further processing by the index server.

The queue server enables documents to be indexed asynchronously. This has the advantage that you can control the time of indexing. For example, you can schedule indexing for times when the system load is lower because there are fewer search queries.

In addition, the queue server can trigger index replication and integration of the delta index in the main index.

 

Preprocessor

The preprocessor preprocesses documents and search queries.

Document preprocessing comprises the following steps:

· Loading documents

If the application transmits the documents as URIs rather than directly, TREX resolves the URIs. This involves fetching the documents from the repository that the URIs reference.

· Filtering documents

Documents can exist in various formats, such as Microsoft Word, Microsoft PowerPoint, PDF, and so on. The preprocessor extracts textual content from the documents and then converts it into the UTF-8 Unicode format for further processing.

· Analyzing documents linguistically

Linguistic analysis involves splitting text into individual words and reducing words to base forms (stems). The preprocessor uses a lexicon that exists in several languages for this.

During search queries, the preprocessor performs a linguistic analysis. It transmits the results of the analysis to the index server, which continues the processing of the document.

 

Index Server

The index server indexes and classifies documents and answers search queries. The processing takes place in the engines that belong to the index server.

There are the following engines:

· Search engine

This engine is responsible for standard search functions such as the exact, error-tolerant, linguistic, Boolean, and phrase searches.

· Text-mining engine

This engine is responsible for classification, searching for similar documents (‘See Also’ search), the extraction of key words, and so on.

· Attribute engine

This engine is responsible for searching for document attributes such as author, creation date, and change date.

 

TREX Instances and the TREX System

A TREX instance is an administrative unit that comprises the TREX server components. A TREX instance is started and stopped as a unit.

The following components belong to a TREX instance:

· A name server

· A queue server

· One or more index servers

· One or more preprocessors

· Optionally, one or more RFC servers

· Optionally, one or more Web servers

A TREX instance runs on a host. It is possible for several TREX instances to run on the same host. A TREX instance is identified by a two-character instance number. This instance number must be unique on a host.

A TREX system consists of one or more TREX instances. If it consists of only one TREX instance, it is called a single host system. If it consists of multiple connected TREX instances, it is called a distributed system.
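The two rules above (instance numbers are unique per host, and the number of instances decides the system type) can be checked with a few lines of Python. This is only a sketch for planning purposes; the function classify_trex_system and the host names are hypothetical and do not exist in TREX.

```python
from collections import Counter


def classify_trex_system(instances):
    """Classify a TREX system from (host, instance_number) pairs.

    Hypothetical helper: raises ValueError if an instance number is not
    two characters long or is used twice on the same host.
    """
    if not instances:
        raise ValueError("a TREX system needs at least one TREX instance")
    for host, number in instances:
        if len(number) != 2:
            raise ValueError(f"instance number {number!r} on {host} is not two characters")
    for (host, number), count in Counter(instances).items():
        if count > 1:
            raise ValueError(f"instance number {number} is used {count} times on {host}")
    return "single host system" if len(instances) == 1 else "distributed system"


# The same instance number may be reused on a different host.
print(classify_trex_system([("hostA", "01"), ("hostB", "01")]))  # distributed system
```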

 

Distributed TREX Systems

The sections below explain concepts that are relevant for distributed TREX systems.

Server Tasks

In a distributed system, there are multiple instances of the individual TREX servers (name server, index server, queue server, and so on).

The servers in a distributed system do not have the same rights and have different tasks. The following sections describe these tasks.

Master, Slave, and Backup Index Servers

The index servers in a distributed system have one of the following roles:

· Master index server 

· Slave index servers

· Backup index server 

A master index server is responsible for indexing. In the default configuration, it is not responsible for searching.

A slave index server is responsible only for searching and not for indexing.

The separation of the master index server and slave index servers is beneficial to performance. The indexing functions are separate from the searching functions, so that there is no loss of performance during indexing runs.

A backup index server can replace a master index server if it becomes unavailable. The backup index server is inactive if the master index server is available. When the master index server restarts after becoming unavailable, it takes over its tasks from the backup server again.

You implement backup index servers in order to make indexing highly available. The indexes must be stored centrally, so that both the master and the backup index servers can have write-access to them.

The index servers are the central components of a TREX system. In principle, their role determines the load that a host has to carry. The documentation below therefore refers to the hosts according to the role of the index server: A master, slave, or backup host is a host on which a master, slave, or backup index server is running.

 

Master and Backup Queue Servers

The queue servers in a distributed system have one of the following roles:

· Master queue server 

· Backup queue server 

The master queue server is the primary server for managing the queues.

A backup queue server can replace a master queue server if it becomes unavailable. The backup queue server is inactive if the master queue server is available. When the master queue server restarts after becoming unavailable, it takes over its tasks from the backup queue server again.

You implement backup queue servers in order to make indexing highly available. The queues must be stored centrally, so that both the master and the backup queue servers can have write-access to them.

 

Master and Slave Name Servers

The name servers in a distributed system have one of the following roles:

· Master name servers

· Slave name server 

The master name servers can update the topology data for the system. The slave name servers can only read the topology data.

In a distributed system you need at least two master name servers and cannot define more than three. The system automatically defines an active master. If the active master is unavailable, the next master name server takes over the tasks.
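The takeover rule for master name servers can be pictured with the following Python sketch. It is a simplified model, not the TREX implementation: the function name is hypothetical, and the assumption that "the next master" means the next one in the configured order is made for illustration only.

```python
def active_master_name_server(masters, available):
    """Return the master name server that currently acts as active master.

    masters:   configured master name servers, in an assumed priority order
    available: set of name servers that are currently reachable
    """
    if not 2 <= len(masters) <= 3:
        raise ValueError("a distributed system needs two or three master name servers")
    for server in masters:
        if server in available:
            return server
    return None  # no master name server is reachable


masters = ["ns1", "ns2", "ns3"]
print(active_master_name_server(masters, {"ns1", "ns2", "ns3"}))  # ns1
print(active_master_name_server(masters, {"ns2", "ns3"}))         # ns2
```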

 

Preprocessor Modes

The preprocessors can run in the following different modes:

· search mode

The preprocessor only preprocesses search queries. In this mode the preprocessor runs on the slave hosts by default.

· index mode

The preprocessor only preprocesses documents. In this mode the preprocessor runs on the master hosts by default.

· any mode

The preprocessor's tasks are not restricted.

The modes merely define preferences for the distribution of tasks for the preprocessors. If necessary, a preprocessor carries out all tasks regardless of its mode. For example, in certain circumstances a preprocessor that runs in index mode also processes search queries. This behavior increases the availability of the system, because in principle all preprocessors are able to carry out all tasks.
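The idea that a mode is a preference rather than a restriction can be sketched in Python as follows. The selection order shown here (first a preprocessor whose mode matches the task or is "any", otherwise any running preprocessor) is an assumption for illustration; the names Preprocessor and pick_preprocessor are hypothetical and the actual TREX dispatch logic is not documented in this guide.

```python
from typing import NamedTuple, Optional, Sequence


class Preprocessor(NamedTuple):
    name: str
    mode: str        # "search", "index", or "any"
    available: bool


def pick_preprocessor(task: str, preprocessors: Sequence[Preprocessor]) -> Optional[str]:
    """Pick a preprocessor for a task ("search" or "index")."""
    running = [p for p in preprocessors if p.available]
    for p in running:
        if p.mode in (task, "any"):   # preferred: mode matches the task
            return p.name
    # Fallback: any running preprocessor carries out the task.
    return running[0].name if running else None


pool = [Preprocessor("pp_master", "index", True),
        Preprocessor("pp_slave", "search", False)]
print(pick_preprocessor("search", pool))  # pp_master
```

In the usage example the search-mode preprocessor is down, so the index-mode preprocessor takes over the search query.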

 

Master and Slave Index

A master index is the original version of an index. It is managed by a master index server.

A slave index is a copy of a master index. It is managed by a slave index server. The slave index is created and updated using a replication procedure.

 

Connection to the Application

HTTP Connection

If the TREX system is connected to a Java application, the Java application communicates with both the name server and with the Web server. The Java application asks the name server via TCP/IP for the address of a Web server. It then sends the request to the Web server using HTTP/HTTPS. The Web server forwards the request TREX-internally. The graphic below depicts this communication:

There are multiple Web servers in a distributed TREX system. As soon as the Java application receives the address of one Web server, it communicates with that Web server for as long as it is available. If the Web server does not answer (for example, because it is overloaded), the Java application swaps to another Web server.
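The behavior described above (stay with one Web server while it answers, swap to another one when it does not) can be modeled with a small Python sketch. This is not the TREX Java client; the class WebServerSelector, the transport callable, and the address list (which stands in for the addresses obtained from the name server) are assumptions made for this example.

```python
class WebServerSelector:
    """Sticky selection of a Web server with swap on failure (illustrative only)."""

    def __init__(self, addresses):
        self._addresses = list(addresses)  # stands in for addresses from the name server
        self._current = 0                  # index of the Web server in use

    def send(self, request, transport):
        """Send a request; transport(address, request) returns a response
        or raises ConnectionError if the Web server does not answer."""
        for _ in range(len(self._addresses)):
            address = self._addresses[self._current]
            try:
                return transport(address, request)
            except ConnectionError:
                # Swap to the next Web server and try again.
                self._current = (self._current + 1) % len(self._addresses)
        raise ConnectionError("no Web server answered")
```

Because the selector remembers the last Web server that answered, later requests go directly to that server instead of retrying the one that failed.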

RFC Connection

If the TREX system is connected to an ABAP application (that is, to an SAP system), both systems communicate via an RFC connection. The SAP system sends its requests to an SAP gateway. The SAP gateway sends the requests to a TREX RFC server. The TREX RFC server forwards the requests TREX-internally.

With regard to the SAP gateway, there are two variants:

· Communication takes place using the local SAP gateway of the application server.

· Communication takes place using a central SAP gateway.

In the case of a distributed TREX system, SAP strongly recommends using the local SAP gateways of the application servers. On the TREX side, TREX RFC servers are registered with each local SAP gateway. Each TREX host is connected to each application server of the SAP system.

The graphic below depicts this.

Using the local SAP gateways has the following benefits:

· The local SAP gateways process the requests more quickly than a central SAP gateway.

· The SAP gateway is not a “single point of failure.” If an application server and its local SAP gateway fail, the requests are distributed among the remaining application servers and still continue to reach the TREX system.

If you use a central SAP gateway and the SAP gateway fails, the RFC connection fails too. It is not possible to switch to another central SAP gateway automatically.

Data Storage

In a distributed system you can keep TREX data (indexes, queues, and index snapshots) centrally or on the separate hosts.

Decentralized Data Storage

If data is not kept centrally, each host stores its data in its own directory structure. The data is normally located locally on the hosts.

The following graphic depicts the data and directory structure with decentralized data storage:

The master indexes, corresponding queues, and the index snapshots are located on a master host. The index snapshots are index copies that the system needs for index replication.

The slave indexes are located on a slave host. They are created and updated by index replication. There is no other data on the slave hosts.

You cannot use backup hosts in systems where data storage is decentralized. This means that you cannot make indexing highly available in such systems.

Centralized Data Storage

With centralized data storage, the data is stored so that all TREX hosts can access it.

Centralized data storage can be realized with different hardware solutions: The data can be located on a server that is optimized for file sharing, in a storage area network (SAN), or on a network attached storage server (NAS server). It is important that the connection between the TREX hosts and the data is sufficiently fast. In the following documentation, a central storage location is referred to as a file server regardless of the underlying hardware.

Centralized data storage is necessary if you want indexing to be highly available. You can only move from a master index or queue server to a backup index or queue server if you are using centralized data storage. You can use standard solutions such as RAID systems to make data highly available.

Centralized data storage also has the following advantages if you are only using master and slave hosts:

· Index replication generates less of a network load because the replicated files do not have to be copied onto every slave host.

· Index replication is quicker.

· Less disk space is required for the replicated indexes because all slave hosts share an index copy.

The following graphic depicts the data and directory structure with centralized data storage:

 

Features of a Blade System

If you do not want to implement individual hosts you can install TREX on a blade system. TREX supports blade systems that run on UNIX.

A blade system consists of hosts in the form of server blades. A blade system has the advantage that the initial costs and running costs for maintaining the system are less than if you were using individual hosts.

The server blades are connected to a central disk storage. This is referred to here as a file server, regardless of the underlying hardware.

The special feature of a TREX installation on a blade system is that the TREX software can be stored centrally as well as the TREX data. This means that you only have to install the software once on the file server. Maintaining the system is efficient because you only have to implement software updates once.

All server blades on which TREX is running access the same program files. However, each server blade has its own configuration files. The configuration files in the directory <TREX_DIR> are only used as templates. A script contained in the TREX delivery creates a separate subdirectory for each server blade and copies the configuration files to this subdirectory. For more information, see Activating the Configuration Clones for Server Blades.

Except for the activation of this script, the remaining configuration takes place as for a system with individual hosts.

The graphic below depicts how data, programs, and configuration files might be stored in a blade system.

 

Supported Systems

There are various ways of structuring distributed systems. The table below contains an overview of the systems that are supported. Because the index servers are the central components, the systems are classed by the role of the index server.

Supported Systems

Number and Roles of Index Servers                       Data Storage
Backup              Master      Slave                   Decentralized   Centralized
–                   1           1                       Yes             Yes
–                   2 or more   At least 1 per master   Yes             Yes
1                   1           1                       No              Yes
1 for all masters   2 or more   At least 1 per master   No              Yes
1 per master        2 or more   At least 1 per master   No              Yes

SAP recommends configuring at least two slave index servers for each master index server, to noticeably improve the performance of the TREX search.

However, all other combinations of master and slave index servers are also possible (for example, one master and three slaves). You can also start with a

minimal configuration of one master index server and one slave index server.
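The supported combinations can also be expressed as a small check, which may help when comparing a planned landscape against the table. The Python sketch below is an interpretation of the table and the surrounding text, not an SAP tool: the function is_supported is hypothetical, and it assumes that comparing the total number of slave index servers with the number of masters is a sufficient stand-in for "at least 1 per master".

```python
def is_supported(backups: int, masters: int, slaves: int, storage: str) -> bool:
    """Check a planned index server layout against the supported systems.

    storage is "centralized" or "decentralized".
    """
    if storage not in ("centralized", "decentralized"):
        raise ValueError("storage must be 'centralized' or 'decentralized'")
    if masters < 1 or slaves < masters:
        return False               # at least one slave index server per master
    if backups == 0:
        return True                # both storage types are supported
    if storage != "centralized":
        return False               # backup servers require centralized data storage
    # Either one backup for all masters or one backup per master.
    return backups == 1 or backups == masters


print(is_supported(0, 1, 2, "decentralized"))  # True
print(is_supported(1, 2, 4, "decentralized"))  # False
```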

The following is valid for all supported systems:

· You can install all systems on individual hosts or you can use a blade system.

The graphics below depict systems on individual hosts. However, all graphics are also valid for blade systems.

· You can connect any system to an application using an HTTP connection and/or an RFC connection.

The graphics below depict systems in which Web servers and RFC servers run. If only one type of connection is relevant, only Web servers or only RFC servers can run.

The sections below describe the supported systems in detail. In the details, the recommended ratio of one master index server to two slave index servers for improved TREX performance is assumed.

 

Systems with Master and Slave Index Servers

A system with master and slave index servers has the following advantages:

· Load distribution for search queries

Parallel search queries are distributed among several slave index servers and can therefore be answered more quickly.

· High availability for searching

Each index is available on multiple slave index servers. If one server goes down, the search queries are distributed among the remaining slave index servers. If all slave index servers become unavailable, the master index server processes the search queries.

· Indexing larger data sets

A master index server can only process a certain amount of data. If you use multiple master index servers, you can index more data than in a single host system. The data must be distributed among several indexes.

If a system has no backup servers, you can store the TREX data either centrally or decentrally. The graphics below only depict systems with decentralized data storage.

One Master, Multiple Slave Index Servers

You can build a system with one master and several slave index servers as depicted below.

The master index server carries the entire indexing load in this scenario. The searching load is distributed among the slave index servers. Such a system is

suitable for scenarios where one master index server can cope with the amount of data to be indexed.

The smallest recommended system consists of one master and two slave index servers that run on separate hosts. The host that is configured as the master 

index server is also configured as the master queue server. The graphic below depicts the system.

Multiple Masters, Multiple Slave Index Servers

A master index server can only process a certain amount of data. If large data sets are to be indexed and you can distribute the data among several indexes, you can implement multiple master index servers. Each master index server manages some of the indexes.

You cannot define multiple master index servers to manage the same index.

TREX distributes the indexes among the master index servers using a round robin procedure. TREX also distributes the queues among the master queue servers

using a round robin procedure. Any queue is located on the same host as the master index to which it belongs.
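A round robin distribution of this kind can be sketched in Python as follows. The sketch only illustrates the general procedure; the function name, the index names, and the host names are hypothetical, and the order in which TREX actually assigns indexes is not specified here.

```python
from itertools import cycle


def distribute_round_robin(indexes, master_hosts):
    """Assign each index to a master host in turn (illustrative only).

    The queue that belongs to an index is located on the same host,
    so the result also describes where each queue is placed.
    """
    if not master_hosts:
        raise ValueError("at least one master host is required")
    return dict(zip(indexes, cycle(master_hosts)))


placement = distribute_round_robin(["idx_a", "idx_b", "idx_c"], ["master1", "master2"])
print(placement)  # {'idx_a': 'master1', 'idx_b': 'master2', 'idx_c': 'master1'}
```

With three indexes and two master hosts, the first host receives two indexes. An uneven result like this is one reason why the index distribution may need to be adjusted later.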

The load on a master index server depends on how large the indexes become and how often you update the indexes. If automatic index distribution does not lead

to balanced load distribution, you can change the index distribution later on.

The smallest recommended system with multiple master index servers consists of two masters, each with two slave index servers.

You can realize this system in two ways, according to how many CPUs and how much main memory the hosts have. For information on hardware requirements, see Hardware, Software, and Other Requirements.

One index server per host

If your hosts have few CPUs and not much main memory, only one index server can run per TREX instance. If this is the case, you distribute the master index

servers among multiple hosts.

The graphic below depicts a system with two master index servers that are distributed among two hosts.

Multiple index servers per host

If your hosts have sufficient CPUs and main memory, multiple index servers can run for each TREX instance. The TREX setup checks the hardware resources

and automatically configures the number of index servers.

The same number of index servers must run on the master host and on the corresponding slave hosts. If two index servers run on the master host, two index

servers must run on each slave host.

The following graphic depicts a system with two index servers for each host:

One master queue server per master host is sufficient. This server manages the queues for both master index servers running on the host.

You can build systems with multiple masters and multiple slave hosts and with multiple index servers per host. The graphic below depicts such a system.

 

Systems with Master, Backup, and Slave Index Servers

You can enhance master/slave systems by adding backup servers (backup index servers and backup queue servers). Such enhanced systems offer additional high availability for indexing.

Each master index server manages some of the indexes. If a master index server goes down, indexing does not normally take place for affected indexes. You

implement backup index servers to avoid this. A backup index server can replace a master index server if it becomes unavailable. The backup index server is

inactive if the master index server is available.

The same is true for the queue server: If a master queue server goes down, queuing normally does not take place for the documents affected, and this means that

indexing cannot take place either. You implement backup queue servers to avoid this. A backup queue server can replace a master queue server if it becomes

unavailable. The backup queue server is inactive if the master queue server is available.

The TREX data has to be stored centrally in systems with backup servers. Otherwise the master and backup servers cannot access the data.

You can build systems with backup servers in the following way:

· One backup server per master server 

· One backup server for all master servers

The following factors dictate which variant to choose.

· The number of master servers

· The number of master servers that are expected to be down for maintenance at the same time

One Backup Server per Master Server 

The smallest recommended system with backup servers consists of one file server, one backup server, one master server, and two slave servers.

The graphic below depicts this system.

As many index servers must run on the backup host as run on the master host. If two index servers run on the master host, two index servers must run on the backup host.

The graphic below depicts a larger system with multiple master and backup hosts.

In this system, both master hosts can be down at the same time, because each has its own backup host.

One Backup Server for All Master Servers

You can build systems in which one backup host is assigned to all master hosts. Only one master host can be down at any one time in such systems. If multiple master hosts with a full load go down, one backup host cannot take on the entire load.

The graphic below depicts a system with two master hosts that share a backup host.

 

Summary: High Availability

If you are using TREX productively, the system has to be available for as much of the time as possible. Planned downtimes for maintenance and unplanned downtimes because of software errors should be reduced.

This section summarizes how to make searching and indexing highly available. The measures described apply to the TREX server side and to the connection between TREX and the application. Measures that affect other software components or the hardware (highly available file servers, redundant network connections, and so on) are not described.

High Availability for Searching

Searching is highly available in a system with master and slave servers. If a slave host goes down, TREX forwards search queries to the other slave hosts.

It can take up to a minute to switch to another slave host. During this phase TREX search queries may not be answered. An error message may be returned.

If you have one master and two slave hosts, you can shut down one of the hosts for maintenance purposes (either the master or one of the slave hosts).

Measures on TREX side

· Each master host has at least two slave hosts.

· Each index is available on at least two slave hosts.

· In systems with an HTTP connection: There are at least two Web servers.

· In systems with an RFC connection: See RFC Connection in Connection to the Application.

Measure on the application side

Type of Application   Measure
Java application      The Java client recognizes at least two name servers.
ABAP application      See RFC Connection in Connection to the Application.
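The measures for searching can be turned into a simple checklist routine. The following Python sketch is a planning aid under stated assumptions, not a TREX function: the name search_ha_gaps, its parameters, and the data structures are hypothetical, and the RFC-related measures are left out because they are described in Connection to the Application.

```python
def search_ha_gaps(slaves_per_master, slave_hosts_per_index,
                   web_servers=None, java_client_name_servers=None):
    """Return the high-availability measures for searching that are not met.

    slaves_per_master:        dict master host -> list of slave hosts
    slave_hosts_per_index:    dict index -> list of slave hosts holding a copy
    web_servers:              list of Web servers, or None without HTTP connection
    java_client_name_servers: name servers known to the Java client, or None
    """
    gaps = []
    for master, slaves in slaves_per_master.items():
        if len(set(slaves)) < 2:
            gaps.append(f"master host {master} has fewer than two slave hosts")
    for index, hosts in slave_hosts_per_index.items():
        if len(set(hosts)) < 2:
            gaps.append(f"index {index} is available on fewer than two slave hosts")
    if web_servers is not None and len(web_servers) < 2:
        gaps.append("fewer than two Web servers")
    if java_client_name_servers is not None and len(java_client_name_servers) < 2:
        gaps.append("the Java client recognizes fewer than two name servers")
    return gaps
```

An empty result means that all checked measures are in place; each entry in the list names one measure that still has to be implemented.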

 

High Availability for Indexing (Only with Queue Server)

If indexing takes place using queue servers, you can make indexing highly available. High availability means the following:

· The application can send indexing requests to TREX.

· The system automatically switches to a backup index or queue server if a master index or queue server goes down (failover). Failover is not possible in the following cases:

○ If there are network problems

○ If a file server goes down

○ If there are communication problems (application sends a request and receives no answer)

The switch to a backup index or queue server takes between 15 seconds and one minute. During this phase the system stores indexing requests in a cache and sends them to the backup server after the switch.

Measures on TREX side

· There are at least two master name servers.

· Each master index server has a backup index server (its own or one that it shares with the other master index servers).

· Each master queue server has a backup queue server (its own or one that it shares with the other master queue servers).

· If the integration of the delta index takes place using the Python scheduler: The Python scheduler is running on all hosts that are configured as master name servers.

· If index replication takes place using the Python scheduler: The Python scheduler is running on all hosts that are configured as master name servers.

· In systems with an HTTP connection: There are at least two Web servers.

· In systems with an RFC connection: See RFC Connection in Connection to the Application.

Measure on the application side

Type of Application   Measure
Java application      The Java client recognizes at least two name servers.
ABAP application      See RFC Connection in Connection to the Application.

 

Global File System and TREX Instances

The TREX server software comprises two parts:

· TREX Instances

These are the program files, configuration files, and so on.

· Global TREX file system

This is a directory structure, in which information about the TREX system instances is stored. For example, this information is required by management tools

to start the TREX system.

There is exactly one global TREX file system for a TREX system. When a TREX instance starts, it must have access to the global TREX file system.

Otherwise, it cannot start.

When planning a distributed TREX system, you must decide which host the global TREX file system is to be located on. This decision determines which installation option you choose and which installation steps you work through later. The following installation options are available:

Installation Option | Description

Central TREX instance | Installs a TREX instance together with a global TREX file system

TREX dialog instance | Installs the TREX instance only

Global TREX file system | Installs the global TREX file system only

The global TREX file system can be located on any host, as long as all TREX instances have access to it when they start. It can be located on a host that belongs to the TREX system, but it does not have to be.

SAP recommends the following for a distributed TREX system with centralized data storage:

Place the global TREX file system on the file server that the TREX data (indexes, queues, snapshots) is also stored on.

In order to serve as data storage, the file server and the connection between the TREX hosts and the file server must be highly available. If the global TREX file system is also located on the file server, you can be sure that the TREX instances can access it at all times.

For the installation process this means that you install the global TREX file system on the file server. You install a TREX dialog instance on every master, backup, and slave host.

SAP recommends the following for a distributed TREX system with decentralized data storage:

Place the global TREX file system on a host that is used as the master name server.

For the installation process this means that you install a central TREX instance on the host that is used as the master name server. On all other hosts, you install TREX dialog instances.

The graphic below depicts such a scenario.

 

Hardware, Software, and Other Requirements

This section lists the requirements that are unique to distributed systems. The hardware requirements relate to production systems.

CPU, RAM, Network

Requirement Type | Requirement

CPU | With one index server per TREX instance: at least 2 CPUs (recommended: 4 CPUs). With two index servers per TREX instance: at least 4 CPUs. The supported processors are listed in the TREX installation guide.

RAM | At least 2 GB per CPU

Network connection | At least 100 Mbit/s. With centralized data storage: connection to the file server of at least 1 Gbit/s. SAP recommends that you define a separate subnetwork.

All TREX hosts must be identical as regards the number of CPUs, RAM, and network connection.

Decentralized Data Storage: Disk Space for TREX Data

The formulas specified here are approximate and do not return exact values.

Required disk space on one master host

Disk Space | Only HTML/Text Documents | Mixed Documents (DOC, PDF, and so on)

Index size + queue (permanent) | Document set size x 2 | Document set size x 0.5

Index snapshot size (permanent) | Document set size x 2 x 0.7 | Document set size x 0.5 x 0.7

Temporary disk space | Document set size x 1.5 | Document set size x 0.5

We strongly recommend that you place the master indexes and the index snapshots on different hard disks. This improves performance when indexing and replicating indexes.

Required disk space per slave host

Disk Space | Only HTML/Text Documents | Mixed Documents (DOC, PDF, and so on)

Index size (permanent) | Document set size x 2 | Document set size x 0.5

Index snapshot size (temporary) | Document set size x 2 x 0.7 | Document set size x 0.5 x 0.7

The hard disk capacity and performance must be identical on master and slave hosts.

You have a document set size of 50 GB of HTML/text documents or 50 GB of mixed documents.

The following table presents the required space on the master host.

Master Host | 50 GB HTML/Text Documents | 50 GB Mixed Documents

Index + queue (permanent) | 100 GB (50 GB x 2) | 25 GB (50 GB x 0.5)

Index snapshot (permanent) | 70 GB (50 GB x 2 x 0.7) | 17.5 GB (50 GB x 0.5 x 0.7)

Temporary | 75 GB (50 GB x 1.5) | 25 GB (50 GB x 0.5)

Total | 245 GB | 67.5 GB

 

The following table presents the required space on each slave host.

Slave Host | 50 GB HTML/Text Documents | 50 GB Mixed Documents

Index (permanent) | 100 GB (50 GB x 2) | 25 GB (50 GB x 0.5)

Index snapshot (temporary) | 70 GB (50 GB x 2 x 0.7) | 17.5 GB (50 GB x 0.5 x 0.7)

Total | 170 GB | 42.5 GB

 

Centralized Data Storage: Disk Space for TREX Data

The formulas specified here are approximate and do not return exact values.

Required disk space on the file server

Disk Space | Only HTML/Text Documents | Mixed Documents (DOC, PDF, and so on)

Index size + queue (permanent) | Document set size x 2 | Document set size x 0.5

Index snapshots size (permanent) | Document set size x 2 x 1.4 | Document set size x 0.5 x 1.4

Temporary disk space | Document set size x 1.5 | Document set size x 0.5

You do not need additional disk space for the slave index. The slave index servers use one of the index snapshots as their slave index.

You have a document set size of 50 GB of HTML/text documents or 50 GB of mixed documents.

This results in the following disk requirements on the file server:

File Server | 50 GB HTML/Text Documents | 50 GB Mixed Documents

Index + queue (permanent) | 100 GB (50 GB x 2) | 25 GB (50 GB x 0.5)

Index snapshots (permanent) | 140 GB (50 GB x 2 x 1.4) | 35 GB (50 GB x 0.5 x 1.4)

Temporary | 75 GB (50 GB x 1.5) | 25 GB (50 GB x 0.5)

Total | 315 GB | 85 GB
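The sizing formulas for the master host, the slave host, and the file server can be combined into one small calculation. The following Python sketch is only an illustration: the function name estimate_gb, the dictionary keys, and the constant names are assumptions made for this example and are not part of TREX. The multipliers are the approximate ones from the formula tables, so the results are estimates as well.

```python
"""Rough TREX disk-space estimate from the approximate sizing formulas."""

# Multipliers from the formula tables (hypothetical names, documented values).
FACTORS = {
    "html_text": {"index": 2.0, "temporary": 1.5},
    "mixed": {"index": 0.5, "temporary": 0.5},
}

# Snapshot factor applied on top of the index size.
SNAPSHOT_LOCAL = 0.7        # decentralized storage (master or slave host)
SNAPSHOT_FILE_SERVER = 1.4  # centralized storage (file server)


def estimate_gb(document_set_gb, doc_type, location):
    """Return estimated sizes in GB, including the total.

    doc_type is "html_text" or "mixed".
    location is "master", "slave", or "file_server".
    """
    factors = FACTORS[doc_type]
    index = document_set_gb * factors["index"]
    if location == "file_server":
        snapshot = index * SNAPSHOT_FILE_SERVER
    elif location in ("master", "slave"):
        snapshot = index * SNAPSHOT_LOCAL
    else:
        raise ValueError(f"unknown location: {location!r}")
    # The slave host formulas list no queue and no temporary disk space.
    if location == "slave":
        temporary = 0.0
    else:
        temporary = document_set_gb * factors["temporary"]
    result = {"index": index, "snapshot": snapshot, "temporary": temporary}
    result["total"] = round(index + snapshot + temporary, 1)
    return result


if __name__ == "__main__":
    for location in ("master", "slave", "file_server"):
        for doc_type in FACTORS:
            total = estimate_gb(50, doc_type, location)["total"]
            print(f"{location:12} {doc_type:10} {total} GB")
```

For a document set of 50 GB, the totals match the example tables: 245 GB and 67.5 GB on a master host, 170 GB and 42.5 GB on a slave host, and 315 GB and 85 GB on the file server.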

 

Disk Space for TREX Software and SAPinst

As for a single-host system (see the TREX installation guide).

Software Requirements

Requirement Type | Requirement

Operating system platform | All TREX hosts must run on the same operating system platform. Mixed installations (for example, one TREX host on HP-UX and another on Windows) are not supported. There is no dependency between TREX and the application using TREX with regard to the operating system used: you can install TREX on a different operating system from the one used by the application that accesses TREX.

TREX release | All TREX hosts must have the same TREX release with the same patch level.

The software requirements in the TREX installation guide are also valid.

Operating System User and Permissions

The installation automatically creates the operating system user SAPService<SAPSID>.

In the case of a TREX system with centralized data storage, you must ensure that the user SAPService<SAPSID> has full access permission for the TREX data directory on the file server. Note the following:

· If the user is a network user (domain user), you have to ensure this for this one network user.

· If the user is a local user, you have to ensure this for all local SAPService<SAPSID> users.

In the case of a TREX system with decentralized data storage, there are no special requirements regarding access permission.

System ID

During the TREX installation, you enter a three-character system ID, for example, TRX. You must use the same system ID for all TREX instances that you want to group together as a distributed system.

TREX Instance Number

SAP recommends that you use the same instance number for all TREX instances in order to simplify administration. You define the instance number during the TREX installation.

There is only one TREX installation in a blade system with centralized program storage. The instance number is the same for all server blades. During the installation of TREX you have to choose an instance number that is still free on all the server blades on which TREX is going to run.

TREX Daemon

You only have to change the configuration of the TREX daemon on the individual hosts under certain circumstances. These circumstances are described in this documentation.

Otherwise, you can keep the standard configuration, even if the TREX daemon starts processes that are not used. Such processes do not use up system resources and therefore do not affect performance. If you keep the standard configuration it is easy to change the roles of the hosts.

By default, a queue server runs on each host. The queue server has no function on a slave host and is not used there. You do not need to make configuration changes to the TREX daemon on the slave host.

Connecting TREX to More Than One Application

In principle, you can connect one TREX system to more than one application. Note the following:

· The TREX system must have appropriate dimensions so that it can process the load of all the applications.

· You must take organizational measures to ensure that the applications use separate index namespaces.

 

Constraints

Note the following constraints for distributed systems.

TREX Instances

The TREX instances that form a distributed system must run on different hosts. You cannot combine several TREX instances on the same host to form a distributed system.

Hosts

· SAP recommends using a maximum of 4 master hosts.

· SAP recommends using a maximum of 2 slave hosts per master host.

For information on equipping the hosts, see Hardware, Software, and Other Requirements.

Master name servers

You need at least two master name servers, and cannot define more than three. Keep to the following rules:

· First distribute the master name servers on the master hosts.

· If there is a backup host, distribute the other master name servers there.

· If there is no backup host, distribute the other master name servers on the slave hosts.

Indexes

· You can have a maximum of 50 master indexes per master index server.

· If the master index server has 2 GB working memory per CPU, the maximum size of a master index is 100 GB.
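Several of these constraints can be checked mechanically before installation. The following Python sketch is a planning aid under stated assumptions, not a TREX tool: the function check_plan, the dictionary keys ("name", "role", "master_name_server", "slave_of"), and the constant names are hypothetical. It covers only the host-related limits (distinct hosts, at most 4 master hosts, at most 2 slave hosts per master host, and 2 to 3 master name servers).

```python
"""Check a planned TREX landscape against the documented host limits."""

from collections import Counter

MAX_MASTER_HOSTS = 4        # recommended maximum
MAX_SLAVES_PER_MASTER = 2   # recommended maximum
MIN_MASTER_NAME_SERVERS = 2
MAX_MASTER_NAME_SERVERS = 3

# Hypothetical plan: two masters, one shared backup, two slaves per master.
EXAMPLE_PLAN = [
    {"name": "mytrexmaster1", "role": "master", "master_name_server": True},
    {"name": "mytrexmaster2", "role": "master", "master_name_server": True},
    {"name": "mytrexbackup", "role": "backup", "master_name_server": True},
    {"name": "mytrexslave1", "role": "slave", "slave_of": "mytrexmaster1"},
    {"name": "mytrexslave2", "role": "slave", "slave_of": "mytrexmaster1"},
    {"name": "mytrexslave3", "role": "slave", "slave_of": "mytrexmaster2"},
    {"name": "mytrexslave4", "role": "slave", "slave_of": "mytrexmaster2"},
]


def check_plan(hosts):
    """Return a list of findings; the list is empty if nothing was found."""
    findings = []

    # Each TREX instance of a distributed system needs its own host.
    for name, count in Counter(h["name"] for h in hosts).items():
        if count > 1:
            findings.append(f"{name}: more than one TREX instance on the same host")

    masters = [h["name"] for h in hosts if h["role"] == "master"]
    if len(masters) > MAX_MASTER_HOSTS:
        findings.append(
            f"{len(masters)} master hosts (recommended maximum {MAX_MASTER_HOSTS})"
        )

    slaves = Counter(h.get("slave_of") for h in hosts if h["role"] == "slave")
    for master, count in slaves.items():
        if master not in masters:
            findings.append(f"slave hosts assigned to unknown master {master}")
        elif count > MAX_SLAVES_PER_MASTER:
            findings.append(
                f"{master}: {count} slave hosts "
                f"(recommended maximum {MAX_SLAVES_PER_MASTER})"
            )

    name_servers = sum(1 for h in hosts if h.get("master_name_server"))
    if not MIN_MASTER_NAME_SERVERS <= name_servers <= MAX_MASTER_NAME_SERVERS:
        findings.append(f"{name_servers} master name servers (must be 2 or 3)")

    return findings


if __name__ == "__main__":
    for finding in check_plan(EXAMPLE_PLAN) or ["no findings"]:
        print(finding)
```

The index limits (50 master indexes per master index server, 100 GB per master index) depend on runtime data and are deliberately left out of this sketch.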

 

Planning

Purpose

In the planning phase you analyze your requirements and define the structure of the distributed system. An analysis of your requirements shows you how many hosts you need and the tasks that the hosts will carry out.

To simplify the installation and configuration of the system, you should create the following during the planning phase:

· A graphical depiction of the distributed system

· A table containing the host names and roles of the hosts involved

Example

Graphical depiction of the system:

 

Table with system information:

Host Name | Installation Option To Use | Role | Comment

myfileserver | Global TREX file system | File server | Storage location for the global TREX file system, and for the TREX data (indexes, queues, index snapshots) and topology file

mytrexmaster1 | TREX dialog instance | Master name server, master index server, master queue server | Master host, manages part of the master indexes

mytrexmaster2 | TREX dialog instance | Master name server, master index server, master queue server | Master host, manages part of the master indexes

mytrexbackup | TREX dialog instance | Master name server, backup index server, backup queue server | Backup host for mytrexmaster1 and mytrexmaster2

mytrexslave1 | TREX dialog instance | Slave name server, slave index server | Slave host for mytrexmaster1

mytrexslave2 | TREX dialog instance | Slave name server, slave index server | Slave host for mytrexmaster1

mytrexslave3 | TREX dialog instance | Slave name server, slave index server | Slave host for mytrexmaster2

mytrexslave4 | TREX dialog instance | Slave name server, slave index server | Slave host for mytrexmaster2

 

Setting Up a Distributed System

Purpose

If you are implementing a distributed system, you initially install all server software on each host. You then configure the hosts according to the tasks that each host is to carry out.

The following sections describe how you set up a distributed system from scratch. All tasks that are necessary for the initial configuration of the system are described.

 

Checklists

Purpose

The procedure depends on the following:

· The type of data storage you use (centralized or decentralized)

· The hardware you are using (individual hosts or blade system)

Below are the checklists for the different scenarios.

Individual hosts with central data storage

On the file server:

· Install a TREX global file system (see the TREX installation guide)

· Create a Central TREX Data Directory

On all TREX hosts:

· Install a TREX dialog instance (see the TREX installation guide)

· Only UNIX: Mount the Central TREX Data Directory

· Only Windows: Define a network drive for the central TREX data directory

· Start TREX

On a future master name server:

· Configure the Landscape

With an RFC connection, on all TREX hosts and in the SAP system:

· Configure the RFC Connection

With an HTTP connection, on the J2EE Engine:

· Configure the HTTP Connection

 

Blade system with central program and data storage

On the file server:

· Install TREX (see the TREX installation guide, Single Host)

· Activate the Configuration Clones

On all server blades on which TREX is to run:

· Start TREX

On a future master name server:

· Configure the Landscape

With an RFC connection, on all server blades and in the SAP system:

· Configure the RFC Connection

With an HTTP connection, on the J2EE Engine:

· Configure the HTTP Connection

Individual hosts with decentralized data storage

On a future master name server:

· Install a central TREX instance (see the TREX installation guide)

On all other TREX hosts:

· Install a TREX dialog instance (see the TREX installation guide)

On all TREX hosts:

· Start TREX

On a future master name server:

· Configure the Landscape

With an RFC connection, on all TREX hosts and in the SAP system:

· Configure the RFC Connection

With an HTTP connection, on the J2EE Engine:

· Configure the HTTP Connection

 

Preparing for Centralized Data Storage

Purpose

If you want to store the TREX data centrally on a file server, you have to prepare first. The sections below describe the steps necessary for this.

 

Creating a Central TREX Data Directory

Procedure on UNIX

1. Create a directory for the TREX data on the file server.

2. Make sure that the directory belongs to the user SAPService<SAPSID>.

3. Share the directory so that all TREX hosts have full permission (read, write, and execute) for it.

The exact procedure is described in the documentation for your operating system platform.

Procedure on Windows

1. Create a directory for the TREX data on the file server.

2. Share the directory so that the user SAPService<SAPSID> has full permission for it.

The exact procedure is described in the documentation for your operating system platform.

 

Only UNIX: Mounting the Central TREX Data Directory

Procedure

Use mount to mount the TREX data directory that you created on the file server onto all TREX hosts. Note the following:

· Mount the directory in the same place (mount point) in the file system on all TREX hosts.

You created the directory mytrexdir on the file server. You mount this directory on all hosts at /mymountpoint/mytrexdir.

This mount point must be the same on all hosts. Otherwise, the system cannot switch from a master server to a backup server. Moreover, the slave servers cannot use a common slave index.

· Make sure that the user <sapsid>adm has full permission (read, write, and execute) for this directory.

· Make sure that the directory is mounted automatically when the host is rebooted, before TREX is started.

The exact procedure is described in the documentation for your operating system platform.
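Before starting TREX on a UNIX host, you may want to verify the points above. The following Python sketch is one possible check and is not part of TREX; the function name check_data_directory and the messages are assumptions made for this example. It uses only standard library calls, and the test for the root file system is a heuristic: it reports a directory that does not sit below any separate mount point.

```python
"""Pre-start check for the central TREX data directory on a UNIX host."""

import os
import sys


def check_data_directory(path):
    """Return a list of problems found for the given directory path."""
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory"]

    problems = []
    # Walk up until a mount point is reached. If that mount point is the
    # root directory, the file server share is probably not mounted.
    probe = os.path.abspath(path)
    while not os.path.ismount(probe):
        probe = os.path.dirname(probe)
    if probe == os.path.abspath(os.sep):
        problems.append(f"{path} is on the root file system; is the share mounted?")

    # The user running this check needs read, write, and execute permission.
    if not os.access(path, os.R_OK | os.W_OK | os.X_OK):
        problems.append(f"current user lacks full permission for {path}")
    return problems


if __name__ == "__main__":
    # Example path from this section; pass your own mount point as argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "/mymountpoint/mytrexdir"
    found = check_data_directory(target)
    for problem in found:
        print(problem)
    sys.exit(1 if found else 0)
```

Run the check as the user <sapsid>adm on every TREX host, since the permission test applies to the user that executes it.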

 

Only Windows: Defining the Network Drive

Use

SAP strongly recommends that you define a network drive on all TREX hosts for the central TREX data directory on the file server. Access via a network drive is much quicker than access without a drive.

The procedure below describes the required configuration steps.

Procedure

Edit the configuration file TREXDaemon.ini on all TREX hosts as follows:

TREXDaemon.ini

[mappings]

map_<network_drive_letter>=\\<file_server>\<TREX_data_directory>

Define the same network drive on all TREX hosts: Use the same network drive letter and specify the same directory.

In the standard system, the system uses the user SAPSystem<SAPSID> to access the network drive.

If you want the system to use a different user for access, specify this as follows:

map_<network_drive_letter>=\\<file_server>\<TREX_data_directory>

user_<network_drive_letter>=<user_name>

password_<network_drive_letter>=<password_in_plain_text>

Result

The changes take effect when TREX is next started.

Example

You have created the directory mytrexdir on the file server myfileserver and shared it as mytrexshare. You want to connect the directory on all TREX hosts as the

network drive T:.

Configuration in TREXDaemon.ini:

[mappings]

map_t=\\myfileserver\mytrexshare
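Because the same entries have to be written on every TREX host, it can help to generate them once. The following Python sketch only builds the text of the [mappings] section from the key pattern shown above; the function render_mappings and its parameters are hypothetical, and the sketch does not read or modify TREXDaemon.ini itself.

```python
"""Render the [mappings] section of TREXDaemon.ini for one network drive."""


def render_mappings(drive_letter, file_server, share, user=None, password=None):
    """Return the section text; user and password are optional."""
    letter = drive_letter.rstrip(":").lower()
    if len(letter) != 1 or not (letter.isascii() and letter.isalpha()):
        raise ValueError(f"not a drive letter: {drive_letter!r}")

    lines = ["[mappings]", f"map_{letter}=\\\\{file_server}\\{share}"]
    if user is not None:
        lines.append(f"user_{letter}={user}")
        # The password is written in plain text, so protect the file.
        lines.append(f"password_{letter}={password or ''}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_mappings("T:", "myfileserver", "mytrexshare"))
```

With the values from the example, the output is the two lines shown in the configuration above. Lowercasing the drive letter follows the example (map_t for drive T:); whether TREX accepts other spellings is not stated here.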

 

Activating the Configuration Clones for Server Blades

Use

You can install TREX on a blade system so that the TREX data and program files are stored only once on the file server and are used by all server blades. Every

server blade on which TREX is running needs its own configuration files.

You use a Python script to duplicate the profile files and the configuration to all server blades in your TREX landscape so that each server blade receives its own

configuration files.

You do this in the following steps:

Initial Installation of TREX on a Central File Server

1. Mount the central file server.

/mnt/myfileserver

SAP recommends that you enter the directory /mnt/myfileserver in the configuration file /etc/fstab, so that the directory is automatically remounted when the host is started again.

2. Create a subdirectory, for example, <SAPSID>, in the directory /mnt/myfileserver.

/mnt/myfileserver/<SAPSID>

3. Generate symbolic links (symlinks) that link from the directories /usr/sap/<SAPSID> and /sapmnt/<SAPSID> to the directory /mnt/myfileserver/<SAPSID>.

4. Install TREX

5. Check whether TREX has been started and, if necessary, start TREX.

Duplicating Profile Files and the Configuration to Server Blades

6. Log on with the user root.

7. Mount the central file server.

/mnt/myfileserver

8. Generate symbolic links (symlinks) that link from the directories /usr/sap/<SAPSID> and /sapmnt/<SAPSID> to the directory /mnt/myfileserver/<SAPSID>.

9. Switch to the TREX directory /usr/sap/<SAPSID>/TRX<instance_number>.

10. Set the environment variables required by TREX by executing the shell script that matches your shell.

¡ Bourne shell sh, Bourne-again shell bash, Korn shell ksh:

. TREXSettings.sh

¡ C shell csh:

source TREXSettings.csh

11. Execute the Python script cloneInst.py:

python exe/python_support/cloneInst.py

Result

The Python script cloneInst.py executes the following actions on the server blades that have been added:

· Create the same users on the added server blade as on the initial server blade

· Copy and modify the SAP profile files from the initial server blade

· Copy and modify the configuration files from the initial server blade

· Extend the directories /etc/init.d and /usr/sap/sapservices

· Start TREX
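The symbolic links required in steps 3 and 8 can be prepared with a short script. The following Python sketch is an illustration only: the functions plan_symlinks and create_symlinks are hypothetical helpers, the mount point /mnt/myfileserver is the example value from this section, and the script is independent of cloneInst.py. By default it runs in dry-run mode and only reports what it would do.

```python
"""Plan and create the symlinks that point to the central file server."""

import os
import posixpath


def plan_symlinks(sapsid, mount_point="/mnt/myfileserver"):
    """Return (link_path, target_path) pairs for the given system ID."""
    target = posixpath.join(mount_point, sapsid)
    return [
        (posixpath.join("/usr/sap", sapsid), target),
        (posixpath.join("/sapmnt", sapsid), target),
    ]


def create_symlinks(pairs, dry_run=True):
    """Create each missing link; return a description of each action."""
    actions = []
    for link, target in pairs:
        if os.path.islink(link) and os.readlink(link) == target:
            actions.append(f"ok {link} -> {target}")
            continue
        if os.path.lexists(link):
            # Never overwrite an existing file, directory, or foreign link.
            raise FileExistsError(f"{link} exists and is not the expected link")
        if not dry_run:
            os.makedirs(os.path.dirname(link), exist_ok=True)
            os.symlink(target, link)
        actions.append(f"create {link} -> {target}")
    return actions


if __name__ == "__main__":
    # "TRX" is the example system ID; run as root with dry_run=False to apply.
    for action in create_symlinks(plan_symlinks("TRX"), dry_run=True):
        print(action)
```

Keeping the planning step separate from the creation step makes it possible to review the links on each server blade before anything is changed.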

 

Landscape Configuration

Purpose

You use the TREX admin tool to configure the landscape. This tool has a graphical administration interface.

Prerequisites

TREX has been started on the hosts that form the distributed system.

Process Flow

1. Start the TREX administration tool on one of the future master name servers.

2. Go to the Landscape Configuration window.

3. Define a new landscape.

4. Add the remaining hosts.

5. Define the roles of the hosts.

6. Configure centralized data storage if required.

7. Check and activate the configuration.

Result

You have now defined the structure of a distributed system. You now have to configure the delta index and index replication. For more information, see Delta Index and Index Replication Configuration.

 

Defining a New Landscape

Use

The local host is entered in the Hosts table in the Landscape Configuration window. By default, the local name server has been defined during the TREX installation as the First (1st) Master Name Server (see the column Name Server Mode) and as the Master Index Server and the Master Queue Server. Since the local name server is already preconfigured as the master name server, you can use it as the starting point for configuring your distributed TREX system landscape.

Procedure

Enter a meaningful description for the new TREX system landscape.

 

Adding a Host

Use

You use the preconfigured master name server as the starting point for the configuration of your distributed TREX system landscape and then add the remaining TREX hosts to it. Note the following:

· If multiple TREX instances are running on a host, you can only add one of them.

· You can only add TREX instances that belong to no other distributed system.

· You can only add TREX instances that have the same system ID.

Procedure

1. Choose Add Host.

2. Enter the address of the name server that runs on the host to be added. The name server port is

3<trex_instance_number>01

If the instance number is 48, the name server port is 34801.
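The port pattern can be expressed as a one-line calculation. The following Python sketch assumes that the instance number has two digits (00 to 99), which the example suggests but this section does not state explicitly; the function name name_server_port is hypothetical.

```python
def name_server_port(instance_number):
    """Return the name server port 3<nn>01 for a TREX instance number."""
    number = int(instance_number)
    # Assumption: instance numbers are two-digit values from 00 to 99.
    if not 0 <= number <= 99:
        raise ValueError(f"instance number out of range: {instance_number!r}")
    return int(f"3{number:02d}01")


if __name__ == "__main__":
    print(name_server_port(48))
```

For instance number 48 the function returns 34801, as in the example above. Single-digit numbers are padded with a leading zero, so instance number 5 yields 30501.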

 

Defining the Roles of Hosts

Use

After you have added hosts to the distributed system, you define which roles the hosts are to have. There are the following roles:

· Master name servers

There can be up to three master name servers in a distributed system. At least two must be defined. See Constraints for information on the hosts on which the master name servers must be located.

· Master index servers and master queue servers

· Slave index servers

· Backup index servers and backup queue servers

Defining a Master Name Server 

1. Select the required host in the Hosts table.

2. In the column Name Server Mode, choose 1st master, 2nd master, or 3rd master.

Defining a Master Index Server or Master Queue Server 

1. Select the required host in the Hosts table.

2. Select Master Index/Queue Server.

Defining a Slave Index Server 

1. Select Use Slave Index Servers.

2. Define the slave index server in the Hosts table.

a. Select the host that you want to define as a slave index server.

b. In the column Slave Index Server for…, specify the master index server to which the server belongs.

Defining a Backup Index Server or Backup Queue Server 

1. Select Use Backup Index/Queue Servers.

2. If the master servers are to share a backup server, select Use One Shared Backup Server.

The graphics below only depict the master and backup index servers. The graphics are also valid for master and backup queue servers.

Example 1

The following system has two master servers that are sharing a backup server.

In this case, select Use One Shared Backup Server.

Example 2

In the following system, each master server has its own backup server.

In this case, do not select Use One Shared Backup Server.

3. Define the backup server in the Hosts table.

a. Select the host that you want to define as a backup server.

b. If you have selected Use One Shared Backup Server, you just need to indicate that the host is the backup server. If you did not select this field, specify the master server to which this server belongs in the column Backup Index/Queue Server for…

 

Configuring Centralized Data Storage

Use

If you want to store the TREX data centrally on a file server, specify this fact when configuring the landscape.

If you are using a file server, TREX automatically stores a topology file on the file server. The master name servers then share this topology file and no longer use their local topology files.

This has the advantage that the master name servers do not need to synchronize their local topology files. In some circumstances, synchronization can cause a master name server to use an out-of-date topology file because it did not receive all of the changes.

Example: A host on which a master name server is running has not been in operation for some time. Its local topology file is therefore out-of-date. If you stop all TREX hosts and then start the master name server that has been out of operation first, the system will use its out-of-date topology file. If this happens, update the topology file manually using backup copies.

Prerequisites

You have prepared for centralized data storage.

Procedure

1. Select Use a File Server .

2. On all hosts, change the path specifications so that they reference the central TREX data directory on the file server:

a. Select a host in the Hosts table.

b. Enter the relevant central TREX data directory on the file server (UNIX) or network drive (Windows) in the Base Path column.

Examples of Path Specifications

UNIX: Individual hosts with a file server 

You have created the central TREX data directory mytrexdir on the file server myfileserver. This directory is mounted at /mypath/mytrexdir on all TREX hosts. The path specifications are as follows:

Host Base Path

mytrexhost_1 /mypath/mytrexdir  

... ...

mytrexhost_n /mypath/mytrexdir  

 

UNIX: Blade system with a file server 

You have installed TREX in the directory /usr/sap/trex_<instance_number> on the file server myfileserver. All server blades access this directory. The TREX data should be located in the installation directory. The path specifications are as follows:

Host Base Path

mytrexhost_1 /usr/sap/<SAPSID>/TRX<instance_number>

... ...

mytrexhost_n /usr/sap/<SAPSID>/TRX<instance_number>

You do not have to change the default value for the base path.

Windows: Individual hosts with a file server 

You have created the central TREX data directory mytrexdir on the file server myfileserver. The directory is connected as the network drive T:. The path specifications are as follows:

Host Base Path

mytrexhost_1 T:

... ...

mytrexhost_n T:

 

Checking and Activating the Configuration

Use

You can check whether the landscape configuration is consistent and complete at any time. This allows you to check the effects of the configuration changes without activating them. Activate the configuration when you have made all necessary settings.

Procedure

· To check the configuration, choose Check.

· To activate the configuration, choose Deploy.

Result

When you check the configuration, the output area shows the checks that are carried out. If the configuration is not consistent, the system issues a message telling you so. You can use information in this message to revise your configuration.

The system also checks the configuration when you activate it. If the configuration is consistent, the system updates the configuration files of the affected hosts and restarts the servers if necessary. If the configuration has errors, the output area displays appropriate messages and the system does not update the configuration files.

 

Example Configurations

The following sections depict example systems and the relevant configurations.

Systems with Master and Slave Index Servers

This section depicts the configuration for systems with master and slave index servers.

One Master Index Server, Two Slave Index Servers

TREX admin tool, Landscape Configuration

Areas Scenario, Scenario Details, and Index 

Area | Field | Value

Scenario | Use Backup Index/Queue Servers |

Scenario | Use Slave Index Servers | !

Scenario | Use a File Server |

Scenario Details | Use One Shared Backup Server |

Scenario Details | Assign Existing Indexes/Queues to New Backup/Slave Servers |

Index | Search on Master/Backup Server |

Index | Search Version | majority

Index | Replication Threads | 1

Hosts table (extract 1)

Host Name Server Mode Master Index/Queue Server Slave Index Server for Preprocessor Mode

mytrexmaster1 1st master ! index

mytrexslave1 slave ! mytrexmaster1 search

mytrexslave2 2nd master ! mytrexmaster1 search

Hosts table (extract 2)

Host Base Path

mytrexmaster1 %(SAP_RETRIEVAL_PATH)

mytrexslave1 %(SAP_RETRIEVAL_PATH)

mytrexslave2 %(SAP_RETRIEVAL_PATH)

 

Two Master Index Servers, Two Slave Index Servers Each

One index server per host

TREX admin tool, Landscape Configuration

Areas Scenario, Scenario Details, and Index 

 As in the previous example

Hosts table (extract 1)

Host Name Server Mode Master Index/Queue Server Slave Index Server for Preprocessor Mode

mytrexmaster1 1st master ! index

mytrexmaster2 2nd master ! index

mytrexslave1 slave ! mytrexmaster1 search

mytrexslave2 slave ! mytrexmaster1 search

mytrexslave3 slave ! mytrexmaster2 search

mytrexslave4 slave ! mytrexmaster2 search

Hosts table (extract 2)

Host Base Path

mytrexmaster1 %(SAP_RETRIEVAL_PATH)

mytrexmaster2 %(SAP_RETRIEVAL_PATH)

mytrexslave1 %(SAP_RETRIEVAL_PATH)

mytrexslave2 %(SAP_RETRIEVAL_PATH)

mytrexslave3 %(SAP_RETRIEVAL_PATH)

mytrexslave4 %(SAP_RETRIEVAL_PATH)

Two index servers per host

TREX admin tool, Landscape Configuration

Areas Scenario, Scenario Details, and Index 

 As in the previous example

Hosts table (extract 1)

Host Name Server Mode Master Index/Queue Server Slave Index Server for Preprocessor Mode

mytrexmaster1 1st master ! index

mytrexslave1 slave ! mytrexmaster1 search

mytrexslave2 2nd master ! mytrexmaster1 search

Hosts table (extract 2)

Host Base Path Services

mytrexmaster1 %(SAP_RETRIEVAL_PATH) nameserver, preprocessor1,

indexserver1, queueserver, indexserver2

mytrexslave1 %(SAP_RETRIEVAL_PATH) nameserver, preprocessor1,

indexserver1, queueserver, indexserver2

mytrexslave2 %(SAP_RETRIEVAL_PATH) nameserver, preprocessor1,

indexserver1, queueserver, indexserver2

TREXDaemon.ini on all hosts (extract)

[daemon]

programs=nameserver, preprocessor1, indexserver1, queueserver, indexserver2
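The effect of the programs entry can be checked with a short script. The following is a minimal sketch, assuming the extract shown above is the complete [daemon] section; the helper names daemon_programs and index_server_count are hypothetical and not part of TREX.

```python
import configparser

# Extract from the example above; a real TREXDaemon.ini contains further sections.
DAEMON_INI = """
[daemon]
programs=nameserver, preprocessor1, indexserver1, queueserver, indexserver2
"""


def daemon_programs(ini_text):
    """Return the services listed under [daemon] programs, in order."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    raw = parser.get("daemon", "programs")
    return [name.strip() for name in raw.split(",") if name.strip()]


def index_server_count(programs):
    """Count the entries whose name starts with 'indexserver'."""
    return sum(1 for name in programs if name.startswith("indexserver"))


programs = daemon_programs(DAEMON_INI)
print(programs, index_server_count(programs))
```

In the scenario with two index servers per host, the count should be 2 on every host.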

 

Systems with Master, Backup, and Slave Index Servers

This section depicts the configuration for systems with master, backup, and slave index servers. The systems differ as to the number and assignment of backup index servers. The following variants are taken into account:

· One backup index server per master index server 

¡ One backup index server, one master index server 

¡ One backup index server, two master index servers

· One backup index server for all master index servers

The same specifications are valid for the master and backup queue servers.

One Backup Index Server, One Master Index Server

TREX admin tool, Landscape Configuration

Areas Scenario, Scenario Details, and Index 

 Area Field Value

Scenario Use Backup Index/Queue Servers !

Use Slave Index Servers !

Use a File Server !

Scenario Details Use One Shared Backup Server  

 Assign Existing Indexes/Queues to New

Backup/Slave Servers

 

Index Search on Master/Backup Server  

Search Version majority

Replication Threads 1

Hosts table (extract 1)

Host Name Server Mode Master Index/Queue Server Backup Index/Queue Server for Slave Index Server for

mytrexmaster 1st master !

mytrexbackup 2nd master ! mytrexmaster  

mytrexslave1 slave ! mytrexmaster  

mytrexslave2 slave ! mytrexmaster  


Hosts table (extract 2)

Host Preprocessor Mode Base Path

mytrexmaster index UNIX: /mypath/mytrexdir  

Windows: T:

mytrexbackup index UNIX: /mypath/mytrexdir  

Windows: T:

mytrexslave1 search UNIX: /mypath/mytrexdir  

Windows: T:

mytrexslave2 search UNIX: /mypath/mytrexdir  

Windows: T:

Windows: TREXDaemon.ini on all hosts (extract)

[mappings]

map_t=\\myfileserver\mytrexshare

Two Backup Index Servers, Two Master Index Servers

TREX admin tool, Landscape Configuration

Areas Scenario, Scenario Details, and Index 

 As in the previous example

Hosts table (extract 1)

Host Name Server Mode Master Index/Queue Server Backup Index/Queue Server for Slave Index Server for

mytrexmaster1 1st master !

mytrexmaster2 2nd master !

mytrexbackup1 3rd master ! mytrexmaster1

mytrexbackup2 slave ! mytrexmaster2

mytrexslave1 slave ! mytrexmaster1

mytrexslave2 slave ! mytrexmaster1

mytrexslave3 slave ! mytrexmaster2

mytrexslave4 slave ! mytrexmaster2

Hosts table (extract 2)


Host Preprocessor Mode Base Path

mytrexmaster1 index UNIX: /mypath/mytrexdir  

Windows: T:

mytrexmaster2 index UNIX: /mypath/mytrexdir  

Windows: T:

mytrexbackup1 index UNIX: /mypath/mytrexdir  

Windows: T:

mytrexbackup2 index UNIX: /mypath/mytrexdir  

Windows: T:

mytrexslave1 search UNIX: /mypath/mytrexdir  

Windows: T:

mytrexslave2 search UNIX: /mypath/mytrexdir  

Windows: T:

mytrexslave3 search UNIX: /mypath/mytrexdir  

Windows: T:

mytrexslave4 search UNIX: /mypath/mytrexdir  

Windows: T:

One Backup Index Server for All Master Index Servers

TREX admin tool, Landscape Configuration

Area Scenario Details

 Area Field Value

Scenario Use Backup Index/Queue Servers !

Use Slave Index Servers !

Use a File Server !

Scenario Details Use One Shared Backup Server !

Assign Existing Indexes/Queues to New Backup/Slave Servers

 

Index Search on Master/Backup Server  

Search Version majority

Replication Threads 1

Hosts table (extract 1)

Host Name Server Mode Master Index/Queue Server Backup Index/Queue Server Slave Index Server for  

mytrexmaster1 1st master !

mytrexmaster2 2nd master !

mytrexbackup 3rd master !

mytrexslave1 slave ! mytrexmaster1

mytrexslave2 slave ! mytrexmaster1

mytrexslave3 slave ! mytrexmaster2

mytrexslave4 slave ! mytrexmaster2

Hosts table (extract 2)


Host Preprocessor Mode Base Path

mytrexmaster1 index UNIX: /mypath/mytrexdir  

Windows: T:

mytrexmaster2 index UNIX: /mypath/mytrexdir  

Windows: T:

mytrexbackup index UNIX: /mypath/mytrexdir  

Windows: T:

mytrexslave1 search UNIX: /mypath/mytrexdir  

Windows: T:

mytrexslave2 search UNIX: /mypath/mytrexdir  

Windows: T:

mytrexslave3 search UNIX: /mypath/mytrexdir  

Windows: T:

mytrexslave4 search UNIX: /mypath/mytrexdir  

Windows: T:

Features of a Blade System

If you are using a blade system and the TREX data is located in the installation directory, the column Base Path has the following value:

Hosts table (extract)

Host Base Path

mytrexmaster usr/sap/<SAPSID>/TRX<instance_number>

... ...

The rest of the configuration is the same.

 

Configuration of the RFC Connection

Purpose

If you want to connect the TREX system to an SAP system, you must configure an RFC connection.

Process Flow

1. Define an SAP system user.

2. Determine the connection data for the SAP system.

3. Configure the RFC connection using the TREX admin tool (stand-alone).

For more information about starting the TREX admin tool (stand-alone), see Starting the TREX Admin Tool.

Result

For more information about the RFC connection and handling connection and configuration errors, see the documentation on the TREX admin tool (stand-alone). You can find this documentation in the SAP Library at help.sap.com/nw70 → SAP NetWeaver.

More Information

Connection to the Application 

Creating an SAP System User for the TREX Admin Tool (Standalone)

Use


You must create an SAP user that the TREX admin tool (standalone) can use to log on to the SAP system. The TREX alert server also needs this user so that it has permission to test and check the RFC configuration regularly. You can create the user in the default client or in another client. If you create it in another client, make sure that you enter that client when you configure the RFC connection in the TREX admin tool.

The TREX admin tool (standalone) is used to configure and monitor TREX. You also use this admin tool to configure the RFC connection between TREX and the ABAP application that is using TREX. To create the RFC destination, the TREX admin tool (standalone) requires an SAP system user that you create based on the predefined role SAP_BC_TREX_ADMIN. This user then has the authorization required to configure the RFC connection.

For more information on the SAP_BC_TREX_ADMIN role, see SAP Note 766516.

Overview of the Permissions Assigned by the SAP_BC_TREX_ADMIN Role

Type and Scope of the Permission | Activity | Explanation

Permission check for RFC access | Execute | Name of the RFC object to be protected: SYST, TREX_ARW_ADMINISTRATION

Administration for the RFC destination | Add or generate, change, display, delete, extended maintenance | Type of entry in RFCDES: Start of an external program using TCP/IP

Check on the transaction code at transaction launch | | Transaction code: SM59, TREXADMIN, TREXADMIN_AUTH

Administrating TREX | Change, display, execute |

ABAP: Program run checks | Schedule programs for background processing, execute ABAP program, maintain variants for and execute ABAP program |

ALV standard layout | Maintain |

Application log | Display, delete |

More Information

Configuring and Administrating the RFC Connection

Configuring the RFC Connection in the TREX Admin Tool

Procedure

Create an SAP system user for the TREX admin tool (standalone) and assign the SAP_BC_TREX_ADMIN role to this user.

1. Launch transaction SU01 (user maintenance) or choose Administration → System Administration → User Maintenance → User in the SAP menu. The User Maintenance: Initial Screen appears.

2. Enter a new user name and choose Create.

3. On the Address tab page, enter the personal data for the user.

4. On the Roles tab page, assign the SAP_BC_TREX_ADMIN role to the SAP system user for the TREX admin tool (standalone). This gives the user permission to access the SAP system.

Result

This user for the TREX admin tool (standalone) now has the authorization required to configure the RFC connection.

 

Determining the SAP System Connection Information

Use

The TREX admin tool (stand-alone) can connect to an SAP system in two ways.

· Through a specific application server of the SAP system (variant A)

· Through the message server of the SAP system (variant B)

This variant uses the load-balancing function for the SAP system. The message server assigns the request from the TREX admin tool to any application

server.

Depending on the variant used, the TREX admin tool requires different connection information for the SAP system. You must determine the connection information

and specify it later in the TREX admin tool.

SAP recommends using variant B. Variant A has the disadvantage that the connection does not work if the application server is not available.

Procedure

1. Open the SAP Logon.

SAP Logon is the program that you use to log on to an SAP system.

2. Note the following connection information:


Connection Setup Type Required Connection Information

Through an application server (variant A) · SAP system ID (SID)

· System number 

· Application server host name

Through the message server (variant B) · SAP system ID (SID)

· Logon group, such as PUBLIC

· Message server host name
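The difference between the two variants can be summarized in a small check. The following is only a sketch of the table above: the function connection_info, its parameter names, and the host name in the usage line are hypothetical and do not correspond to any TREX or SAP API.

```python
def connection_info(variant, sid, *, system_number=None, app_server_host=None,
                    logon_group=None, message_server_host=None):
    """Collect the connection data required for variant A or B.

    Raises ValueError if the variant is unknown or a required value is missing.
    """
    required = {
        # Variant A: direct connection through one application server.
        "A": {"sid": sid, "system_number": system_number,
              "app_server_host": app_server_host},
        # Variant B: load-balanced connection through the message server.
        "B": {"sid": sid, "logon_group": logon_group,
              "message_server_host": message_server_host},
    }
    if variant not in required:
        raise ValueError(f"unknown variant: {variant!r}")
    info = required[variant]
    missing = sorted(key for key, value in info.items() if value is None or value == "")
    if missing:
        raise ValueError(f"variant {variant} is missing: {', '.join(missing)}")
    return info


# Placeholder values for illustration only.
print(connection_info("B", "ABC", logon_group="PUBLIC",
                      message_server_host="msgserver.example.com"))
```

Variant B needs no system number or application server host, because the message server selects the application server.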

 

Configuring the RFC Connection in the TREX Admin Tool

Use

You work through the steps below using the TREX admin tool (stand-alone).

Configuration of the RFC connection with the TREX admin tool (stand-alone) is only available as of SAP Basis Component SAP_BASIS 6.20 SP58, 6.40

SP16, and 7.0 SP6. If you are using TREX with an SAP system based on an earlier support package, you have to configure the RFC connection manually as

described in the SAP NetWeaver 04 Installation Guide for Search and Classification (TREX) 6.1. You can find this guide on the SAP Service Marketplace at

service.sap.com/instguides → SAP NetWeaver → Released 04 → Installation → Cross-NW → Installation Guide Search and Classification TREX 6.1.

Creating a Connection

1. In the Landscape RFC window, choose the Create Connection function.

2. Choose connection type A or B. Specify the connection data for the SAP system (see Determining the SAP System Connection Information).

3. Specify the SAP system user, the associated password, and the client that the TREX admin tool is to use to log on (see Creating an SAP System User for the TREX Admin Tool (Standalone)).

If the SAP system user in question exists in the default client, you do not need to specify the client.

 

Creating an RFC Destination

1. In the Landscape RFC window, choose the RFC Destination (SM59) function.

2. Enter the following parameters:

Field Entry

SAP System SAP system that you want to set up the connection to.

The list contains all SAP systems that you have registered using Create

Connection.

RFC Destination Name of the RFC destination.

Description Meaningful description of the purpose

The program ID determines under which name the TREX RFC server registers with the SAP gateway. The program ID must be unique for each SAP

gateway. The TREX admin tool ensures this by generating the program ID.

 

3. Decide which SAP gateway you want to use. You have the following options:

Option Comment

Gateway local

(Default setting)

Use local SAP gateways for the application servers.

Gateway central Use the central SAP gateway.

We advise against using a central SAP gateway for distributed TREX

systems. The central SAP gateway is a “single point of failure.”

If you choose this option, enter the following additional parameters:

● Host name (with domain name if necessary) or the IP address of the host

on which the gateway is installed.

● Name of the SAP gateway in the form sapgw<instance_number>
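The gateway name format sapgw<instance_number> can be sketched as follows. This is a minimal illustration; the function name gateway_service is hypothetical, and the sketch assumes a two-digit instance number between 00 and 99.

```python
def gateway_service(instance_number):
    """Build the SAP gateway name in the form sapgw<instance_number>.

    Assumption: instance numbers are written with two digits (00 to 99).
    """
    number = int(instance_number)
    if not 0 <= number <= 99:
        raise ValueError(f"instance number out of range: {instance_number!r}")
    return f"sapgw{number:02d}"


print(gateway_service("00"))
print(gateway_service(12))
```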


 

SAP advises against creating the RFC destination directly in the SAP system. The name of the RFC destination and the program ID must satisfy certain

naming conventions. The TREX admin tool ensures that these are fulfilled.

If you nevertheless create the RFC destination directly in the SAP system, note the following:

● We recommend starting the name of the RFC destination with TREX_.

● Choose the activation type Registered Server Program.

● Choose a program ID that is unique for the SAP gateway used.

● Use the RFC Destinations function to register the RFC destination in the TREX admin tool.

 

Completing the RFC Configuration

1. In the Landscape RFC window, choose the Connect function.

The TREX admin tool creates the connection to all SAP systems that are known to it. Because the RFC configuration is still incomplete, the configuration

status is yellow or red.

 

2. Choose Repair All.

The TREX admin tool completes the RFC configuration and starts the TREX RFC server.

This can take several minutes. During this time, the configuration status remains yellow or red. After completion of the configuration process, the status

changes to green.

 

Do not choose Repair All several times in quick succession. This would trigger the configuration process more than once and delay it.

 

3. Check the progress by choosing Refresh to update the display.

 

Configuring the HTTP Connection

Use

If you want to connect the TREX system to a Java application, you must register at least one name server with the TREX Java client.

We recommend that you specify all master name servers on the client side. This increases the availability of the connection between the application and

TREX. If the Java client cannot reach one master name server, it can attempt to reach another instead.

The client-side configuration is separate from the server-side configuration. In principle, you can enter any name servers on the client side, regardless of their 

server-side role.

Procedure

1. If you do not know the addresses of the master name servers, look for them in the TREX admin tool at Landscape → Configuration:

<host>:<name_server_port><:name server mode>.

2. Change the Java client parameters for all server processes in the cluster as follows:

More information: Specifying the Address of the TREX Name Server  

a. Enter one master name server in the following parameter:

nameserver.address tcpip://<one_address>

b. Specify the TREX backup name servers in the nameserver.backupserverlist parameter. When doing so, separate the backup name servers using a

comma and use the following format: tcpip://<host1>:<port1>,tcpip://<host2>:<port2>, …

The addresses of the master name servers must be configured for all server processes in the cluster. Otherwise the connection between the J2EE Engine and

TREX cannot be established.
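The two parameter values described in steps a and b can be derived from a list of name server addresses. The following is a minimal sketch; the function java_client_parameters is hypothetical, and the host names and port 30001 in the usage example are placeholders, not values taken from this documentation.

```python
def java_client_parameters(name_servers):
    """Build the values for nameserver.address and nameserver.backupserverlist.

    name_servers is a list of (host, port) tuples. The first entry is used for
    nameserver.address, all further entries form the comma-separated backup list.
    """
    if not name_servers:
        raise ValueError("at least one name server is required")
    addresses = [f"tcpip://{host}:{port}" for host, port in name_servers]
    return {
        "nameserver.address": addresses[0],
        "nameserver.backupserverlist": ",".join(addresses[1:]),
    }


params = java_client_parameters([
    ("mytrexmaster1", 30001),  # placeholder port
    ("mytrexmaster2", 30001),
    ("mytrexbackup", 30001),
])
for key, value in params.items():
    print(key, value)
```

Listing every master name server this way follows the recommendation above: if the Java client cannot reach one master name server, it can try another.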

 


Information on Breakdown of Master Servers

If you have implemented a system with master and backup servers and a master server breaks down, the backup server becomes active automatically. When the master server becomes available again, the system swaps back to the master server.

If you create an index while the master server is unavailable, TREX proceeds as follows:

· The backup server becomes the master server for this index

· This index has no backup server 

The TREX admin tool displays this in the Index Landscape area as follows:

TREX does not change this assignment, even when the master server becomes available again. You have to correct the assignment for this index as follows:

1. Start the TREX admin tool on any host in the distributed system.

2. Make sure that the master and backup servers are available.

3. Go to the Index Landscape window.

4. In the column <light><host_name_master_index_server>:<port>, click on the line for the index in question. Choose Add backup here from the context menu.

5. Click on the same cell again and choose Switch master/backup from the context menu.

The index now has a backup server, and the master and backup servers are assigned to the index correctly.

6. Go to the Queue Landscape window.

7. Carry out the same changes that you just carried out for the index, but this time for the queue.

 

Delta Index and Index Replication Configuration

Purpose

Delta indexes speed up updates of the master indexes. Index replication transfers changes made on master indexes to slave indexes.

Delta indexes and index replication are deactivated by default. The best time for activating them depends on which of the following scenarios you have:

Scenario Procedure

Initial indexing of large data sets (more than 100,000 documents)

1. Create indexes and carry out the initial indexing of the data. During this phase, the system only carries out indexing. It does not replicate data.

2. Activate the delta indexes.

3. Trigger the first index replication manually.

4. Configure regular index replication.

No initial indexing of large data sets

1. Create indexes.

2. Configure regular index replication.

3. Monitor the size of the master indexes during routine operation. Activate the delta indexes when a master index reaches a certain size.

The sections below contain background information on delta indexes and index replication, and describe the configuration required.

 


Delta Index Configuration

Purpose

TREX provides the option of activating delta indexes. This can speed up the update of the index.

This documentation contains:

● General information on the delta index

● Information on activating the delta index

● Information on integrating the delta index into the main index

 

Delta Index

When TREX updates an index, it rewrites the majority of the index files. If the indexes are large, this process can take a long time and generate a high system load.

TREX allows you to activate a delta index in order to speed up the update. The delta index is a separate index that TREX creates in addition to the main index.

The main index and its delta index only differ TREX-internally. Outside of TREX they form a unit.

If the delta index is activated changes flow into the delta index. Because the delta index is smaller than the main index, fewer documents are affected by the

update. The delta index can therefore be updated more quickly.

The delta index is deactivated by default. The following rules are valid for its activation:

· If you have a single host system, the activation is optional. However, it is recommended if the main index has reached a certain size. If you activate the delta index too soon, performance does not improve.

· If you have a distributed TREX system, the activation is obligatory. However, you still only activate it once the main index has reached a certain size.

Activating the delta index does not only speed up the update of the master index. It also enables fast index replication with a low network load.

When index replication takes place, the master index server replicates all changed master index files. Because the delta index consists of fewer files, it has fewer files to replicate. This means that index replication is quicker. Moreover, if you have decentralized data storage, the network load is also lower because TREX has to copy fewer files to the slave hosts.

The delta index only speeds up the update if it is kept small. If it becomes too large, it no longer improves performance. When it reaches a certain size you have

to integrate it in the main index. You can integrate the delta index manually or configure TREX so that TREX regularly integrates it automatically. TREX creates a

new delta index automatically when the integration of the previous delta index is complete.

 

Activating the Delta Index

Use

The delta index is deactivated by default. You can activate it using the TREX admin tool. You activate it per index, not globally.

The best time for activating it depends on your indexing process.

SAP recommends the following:

· Initial indexing of large document sets

 Activate the delta index after the initial indexing run. If you do not do this, the delta index grows too quickly and you have to integrate it into the main index earlier than you would wish. This means that you need twice the indexing time: Firstly to index the documents in the delta index, and then to integrate the delta index into

the main index.

· No initial indexing of large data sets

Monitor the size of the main index during routine operation. Activate the delta index if the main index reaches 100,000 to 1,000,000 documents or 500 MB.
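The recommendation above can be expressed as a simple check. The following is a minimal sketch, assuming that the lower bound of the stated range (100,000 documents) or 500 MB is used as the trigger; the function name and parameters are hypothetical, and the actual activation is carried out in the TREX admin tool.

```python
DOCUMENT_THRESHOLD = 100_000  # lower bound of the 100,000 to 1,000,000 range
SIZE_THRESHOLD_MB = 500


def should_activate_delta_index(document_count, size_mb, initial_indexing_running=False):
    """Return True if activating the delta index looks worthwhile.

    During the initial indexing of a large document set the delta index stays
    off, because it would grow too quickly and require an early integration.
    """
    if initial_indexing_running:
        return False
    return document_count >= DOCUMENT_THRESHOLD or size_mb >= SIZE_THRESHOLD_MB


print(should_activate_delta_index(50_000, 120))
print(should_activate_delta_index(250_000, 120))
print(should_activate_delta_index(250_000, 800, initial_indexing_running=True))
```

Which point in the 100,000 to 1,000,000 range suits a given system depends on its documents and hardware, so the threshold constant is a starting value rather than a rule.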


Procedure

1. Go to the window Index Admin → Index Info in the TREX admin tool.

2. Select the index that you want to activate the delta index for. Choose Delta Index On.

 

Integrating a Delta Index into the Main Index

Use

 A delta index only speeds up the update of the corresponding index if it is small. If it becomes too large, you have to integrate it into the main index. After the

integration has taken place TREX creates a new delta index.

The integration process involves TREX rewriting all main index files. The duration of the integration process depends on the size of the main index. It can last a few

minutes or several hours.

In a distributed system the entire main index has to be replicated after the integration has taken place. This replication takes about the same amount of time as

the initial replication.

The index server cannot index new documents during the integration of the delta index. This has the following effects:

· If indexing takes place with a queue server, the queue server retains the documents until the integration process has been completed. Then the queue server 

transmits the documents to the index server.

· If indexing takes place without a queue server, the application can continue to send indexing requests to the index server. However, the index server only

processes them after the completion of the integration process. This means that it takes longer for indexing requests to be processed and for the application

to receive the relevant response.
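The behavior with a queue server can be pictured as a buffer that holds documents while the integration is running. The following is an illustrative sketch only; the class QueueServerSketch is hypothetical and does not reflect the internal implementation of the TREX queue server.

```python
from collections import deque


class QueueServerSketch:
    """Retains documents while the index server integrates the delta index."""

    def __init__(self):
        self.pending = deque()   # documents waiting in the queue
        self.indexed = []        # documents handed over to the index server
        self.integration_running = False

    def submit(self, document):
        self.pending.append(document)
        self.flush()

    def flush(self):
        # Nothing is transmitted while the integration is in progress.
        if self.integration_running:
            return
        while self.pending:
            self.indexed.append(self.pending.popleft())

    def start_integration(self):
        self.integration_running = True

    def finish_integration(self):
        self.integration_running = False
        self.flush()  # transmit everything that was retained


queue = QueueServerSketch()
queue.submit("doc1")
queue.start_integration()
queue.submit("doc2")
print(queue.indexed, list(queue.pending))
queue.finish_integration()
print(queue.indexed, list(queue.pending))
```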

You can trigger the integration process manually or carry it out at defined time intervals. There are two different procedures for time-dependent integration. The procedure that you use depends on whether indexing takes place with or without a queue server (QS). The table below gives an overview of the procedures.

Procedure | Use with indexing with QS | Use with indexing without QS

Manual | yes | yes

Time-dependent using the queue server | yes | no

Time-dependent using the Python scheduler | no | yes

We recommend the following for the time of the integration:

· Trigger the first integration process if the delta index is bigger than 500 MB. You can find out the size of the delta index in the window Index Admin → Index Info in the TREX admin tool.

· The integration process should take place at times when the system is not too busy.

· Do not carry out the integration process too often. With large indexes, the integration and subsequent replication of the main index takes a corresponding amount

of time.

Integrating the Delta Index Manually

1. Go to the window Index Admin → Index Info in the TREX admin tool.

2. Select the index in question and choose Merge Delta Index.

Integrating the Delta Index Time-Dependently Using the Queue Server 

In the queue parameters enter the time for the integration in Merge Time for Delta Index.

Use All (4:00) to trigger the integration every morning at 4 a.m.

You do not need to coordinate the integration time with other activities carried out by the queue server and index server. If the activities collide, the index server 

coordinates when it carries out which action.

For more information on changing queue parameters, see Configuring Queue Parameters.

Integrating the Delta Index Time-Dependently Using the Python Scheduler 

Change the following configuration files on all master name servers:


Configuration file Change

TREXDaemon.ini 1. Activate the Python scheduler by changing the TREX configuration file TREXDaemon.ini in the TREX admin tool, menu path Landscape → Ini, as follows:

[daemon]

programs=<other_sections>,cron

2. Once you have saved the changes, the TREX admin tool asks you

whether it should trigger reconfiguration so that the changes to the configuration

file take effect.

Confirm this query by choosing Yes.

crontab.ini Remove the comment sign from the following line:

<schedule> python mergeDeltaIndex.py silent allIndexes=1 ''

Modify the schedule if necessary. For information on syntax and for examples,

see the configuration file.
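The change to the programs entry in TREXDaemon.ini amounts to appending cron to the existing list. The following sketch only illustrates the resulting value; the function enable_cron is hypothetical, and the documented way to make the change remains the TREX admin tool.

```python
def enable_cron(programs_value):
    """Return the programs value with 'cron' appended if it is not yet listed."""
    programs = [name.strip() for name in programs_value.split(",") if name.strip()]
    if "cron" not in programs:
        programs.append("cron")
    return ",".join(programs)


current = "nameserver, preprocessor1, indexserver1, queueserver"
print("programs=" + enable_cron(current))
```

Applying the function to a value that already contains cron leaves the list unchanged, so the entry is not duplicated.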

 

Index Replication Configuration

Purpose

Index replication transfers changes made on master indexes to slave indexes. The sections below describe the process and configuration of index replication.

 

Index Replication Process

Index replication takes place in a system with master and slave index servers. The master index server manages the original indexes and the slave index servers access index copies. Replication makes sure that changes to the master indexes are transferred to the index copies.

Replication takes place in different ways depending on the type of data storage.

Replication with Decentralized Data Storage

The initial replication of an index takes place as follows:

1. The master index server generates an index snapshot. The name server tells the slave index servers that the index snapshot is available. The slave index servers request the snapshot from the master index server.

2. When all slave index servers have the index snapshot, they integrate it into their index one after the other. The slave index server currently integrating the

files has the status ‘inactive’. This means that it is not available for searching. It receives the status ‘active’ again as soon as the integration has been completed.


Because all index files are copied for the initial replication, the process can take a long time if the index in question is large.

In subsequent replications the system only replicates the changed index files. This is normally a smaller amount of data than for the initial replication, and

subsequent replications are therefore faster. The process flow is as follows:

1. The master index server compares the master index and the index snapshot in order to determine changed index files. It then updates the index snapshot.

The name server tells the slave index servers that a new index version is available. The slave index servers request the changed index files from the master 

index server.

2. When all slave index servers have the changed index files, they integrate them into their index one after the other. The slave index server currently integrating

the files has the status ‘inactive’. This means that it is not available for searching. It receives the status ‘active’ again as soon as the integration has been

completed.
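The comparison between master index and index snapshot in step 1 can be illustrated with a file-level comparison. The following is a sketch under the assumption that changed files are detected by comparing content hashes; how TREX actually determines changed index files is not described here, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digests(directory):
    """Map each file below directory (relative path) to its SHA-256 digest."""
    root = Path(directory)
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(master_dir, snapshot_dir):
    """Return the files that are new or different in the master index."""
    master = file_digests(master_dir)
    snapshot = file_digests(snapshot_dir)
    return sorted(name for name, digest in master.items() if snapshot.get(name) != digest)
```

Only the files returned by changed_files would need to be copied to the slave hosts, which is why subsequent replications transfer less data than the initial one.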

 

Replication with Centralized Data Storage

The initial replication of an index takes place as follows:

1. The master index server generates a complete copy of the index (index snapshot).

2. The slave index servers connect to the index snapshot and use this as their slave index.

If the master index changes and replication needs to take place again, the following occurs:

1. The master index server generates a second index snapshot.

2. The slave index servers change to the second index snapshot.

 All subsequent replications take place as follows:

1. The master index server determines the changed index files by comparing the master index with the index snapshot that the slave index servers are not currently using. It then updates this index snapshot.

2. The slave index servers change to the updated index snapshot.
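The alternation between two index snapshots can be sketched as follows. This is an illustration of the process described above, not TREX code: the class CentralSnapshots is hypothetical, and an index is modeled as a dictionary that maps file names to versions.

```python
class CentralSnapshots:
    """Two snapshots on central storage; the slaves always read the active one."""

    def __init__(self, master_index):
        # Initial replication: complete copy that the slaves connect to.
        self.snapshots = [dict(master_index), None]
        self.active = 0

    def replicate(self, master_index):
        """Update the snapshot the slaves are not using, then switch to it.

        Returns the names of the files that were copied.
        """
        inactive = 1 - self.active
        if self.snapshots[inactive] is None:
            # Second replication: generate the second complete snapshot.
            self.snapshots[inactive] = dict(master_index)
            copied = sorted(master_index)
        else:
            snapshot = self.snapshots[inactive]
            copied = sorted(name for name, version in master_index.items()
                            if snapshot.get(name) != version)
            for name in copied:
                snapshot[name] = master_index[name]
            for name in set(snapshot) - set(master_index):
                del snapshot[name]
        self.active = inactive
        return copied

    def slave_view(self):
        return self.snapshots[self.active]


master = {"f1": 1, "f2": 1}
snapshots = CentralSnapshots(master)
master["f1"] = 2
print(snapshots.replicate(master), snapshots.slave_view())
```

Because the master index server only writes to the snapshot that is not in use, the slave index servers can keep searching on the other snapshot until the switch.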


 

Triggering Index Replication

Use

By default index replication is deactivated. You can trigger replication in various ways. The table below gives an overview of the methods and of when you should

use them.

Procedure | Effect | Use with indexing with QS* | Use with indexing without QS*

1. Manual replication | Index changes are available for searching when you have replicated the index. You use this method for the initial replication. | yes | yes

2. Automatic replication following optimization in the queue server (Replicate after Optimize) | Following the optimization of documents in the queue server, index changes are available for searching. | yes | no

3. Automatic replication following every index update | All index changes are available quickly for searching. | yes | yes

4. Time-dependent replication using the queue server | Index changes are available for searching when the next replication has taken place. Replication takes place regularly according to a defined schedule. | yes | no

5. Time-dependent replication using the Python scheduler | Index changes are available for searching when the next replication has taken place. Replication takes place regularly according to a defined schedule. | no | yes

6. Replication triggered by the application using TREX | TREX provides the (ABAP/Java) application with methods for triggering replication. | yes | yes

*QS = queue server 

The system replicates the entire index for the initial replication. In subsequent replications the system only replicates the changed index files. The duration of the

replication and the generated system load depends on the following factors:

● Are the indexes stored centrally or are they distributed?

With decentralized data storage the replication generates a higher network load because the system has to copy the indexes to the slave hosts.

● How often is the index updated?

● How many index files need to be replicated? This depends on the size of the index or delta index.

● How many indexes need to be replicated?

● How large are the indexes? What type of documents are indexed (documents with attributes only, documents with attributes and text content, or documents with text content only)? Does the index contain text-mining information?

● How quickly should the updated information be available for searching?

In order to determine the optimum time for replication, you have to weigh up the required topicality against the system load generated.

We recommend that you carry out the initial replication manually, since it can last a lot longer than subsequent replications.


If large indexes need to be replicated frequently, it may not be possible for the system to keep to your configured interval for replication. If this is the case, the

system carries out automatic replication at the next possible point in time.

1. Replicating an Index Manually

1. Go to the Index Landscape window in the TREX admin tool.

2. Carry out one of the following steps:

○ To replicate all indexes, choose Replicate All.

○ To replicate a single index, select the index in question and choose Replicate Index from its context menu.

2. Replicating the Index Automatically Following Optimization in the Queue Server (Replicate After Optimize)

Set the Replicate After Optimize queue parameter to On.

For more information on changing queue parameters, see Configuring Queue Parameters.

We recommend that you arrange for the index to be replicated automatically after every update - as described in point 3 - rather than using the Replicate After 

Optimize procedure. Replicate After Optimize only replicates changes involving the TREX queue server. Changes made without involving the queue server,

such as changes to index properties and taxonomies, are not replicated.

3. Replicating an Index Automatically Immediately After Every Update

When you create an index, you can arrange for it to be replicated automatically immediately after every update. To do so, proceed as follows.

1. Go to the Index area in the Landscape → Configuration window of the TREX admin tool.

2. Activate automatic index replication by selecting the Auto Replication checkbox.

You can change this setting later on if you want.

3. Go to the Index → Landscape window in the TREX admin tool.

4. Use the secondary mouse button to click on the index whose index replication settings you want to change.

5. Choose Landscape Configuration and then Enable Auto Replication or Disable Auto Replication.

This way of triggering index replication is particularly important in scenarios that do not use a TREX queue server.

4. Replicating the Delta Index Time-Dependently

Enter the time at which the replication is to take place in the Replication Time queue parameter.

Use All-3 to trigger replication every three hours. Use All (3:00) to trigger replication every morning at 3am.

For more information on changing queue parameters, see Configuring Queue Parameters.

5. Using the Python Scheduler to Schedule Index Replication

Change the following configuration files on all master name servers:

Configuration file Change

TREXDaemon.ini If the Python scheduler is not yet active, activate it now:

[daemon]

programs=<other_sections>,cron

crontab.ini Remove the comment sign from the following line:

<scheduler> python replicate.py silent allIndexes=1 ''

The default setting causes the system to check for changes to an index every

5 minutes. If there are no changes, the system takes no further action. If 

changes have taken place, they are replicated.

Modify the schedule if necessary. For information on syntax and for examples,

see the configuration file.
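As a rough illustration of the TREXDaemon.ini change, the following Python sketch appends cron to the programs parameter in the [daemon] section if it is not yet listed. It assumes that the file is plain INI text that Python's standard configparser module can read, which may not hold for every TREXDaemon.ini. The function name enable_cron and the sample content are hypothetical.

```python
import configparser
import io


def enable_cron(ini_text):
    """Return ini_text with 'cron' appended to [daemon] programs if missing.

    Assumption: the text is plain INI that configparser can parse.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep parameter names exactly as written
    parser.read_string(ini_text)

    # Split the comma-separated program list and drop empty entries.
    raw = parser["daemon"]["programs"]
    programs = [p.strip() for p in raw.split(",") if p.strip()]
    if "cron" not in programs:
        programs.append("cron")
    parser["daemon"]["programs"] = ",".join(programs)

    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


# Hypothetical file content, used only for illustration.
sample = "[daemon]\nprograms=nameserver,indexserver1\n"
print(enable_cron(sample))
```

Note that configparser drops comment lines when it writes the file back, so keep a backup copy of the original file, or make the change by hand in a text editor.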

Result

You can monitor index replication in the TREX admin tool (stand-alone) in the Index Landscape window. If necessary, you can terminate replications in progress

there.

 


Controlling the Replication Load

Use

You can define how many indexes the system replicates in parallel. This allows you to influence the following:

· The system load on the master index server 

· If you are using decentralized data storage, the network load that arises due to copying the changed index files

The higher the number of indexes replicated in parallel, the greater the load. A lower number reduces the load but makes replication take longer.

Procedure

1. Go to the window Landscape Configuration in the TREX admin tool.

2. In the field Replication Threads, enter how many indexes the system is to replicate in parallel.
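The trade-off between load and duration can be pictured with a small Python sketch in which a thread pool stands in for the Replication Threads setting. This only illustrates bounded parallelism and is not how TREX implements replication. The function replicate_index and the index names are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
import threading
import time

_lock = threading.Lock()
_running = 0
peak_parallel = 0  # highest number of simultaneous replications observed


def replicate_index(name):
    """Placeholder for copying one index; it only sleeps briefly."""
    global _running, peak_parallel
    with _lock:
        _running += 1
        peak_parallel = max(peak_parallel, _running)
    time.sleep(0.01)  # stands in for the actual copy work
    with _lock:
        _running -= 1
    return name


def replicate_all(index_names, replication_threads):
    """Replicate all indexes, at most replication_threads at a time."""
    with ThreadPoolExecutor(max_workers=replication_threads) as pool:
        return list(pool.map(replicate_index, index_names))


done = replicate_all(["idx_a", "idx_b", "idx_c", "idx_d"], replication_threads=2)
print(done, "peak parallel:", peak_parallel)
```

With a larger replication_threads value the four placeholder jobs finish sooner but more of them run at once, which corresponds to the higher load described above.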

 

Configuring Topicality of Search Results

Use

You can define how up-to-date you want the searched index to be. There are the following options:

Option Meaning

majority The search takes place on the index version available on the majority of slave index servers. If two index versions are equally available, TREX uses the more up-to-date of the two.

Advantage: The search queries are distributed. This setting gives the highest availability for the search because during replication TREX only switches to the new version from the old version when the majority of the slave index servers have the new version.

Disadvantage: The search may not take place using the most up-to-date data.

majority is the default setting.

latest The search takes place using the most up-to-date index that has been released for replication.

Advantage: The search takes place using the most up-to-date data.

Disadvantage: This setting can hamper search performance. TREX always uses the up-to-date version, even if only a few (or even no) slave index servers have the most up-to-date version. If no slave index server has the most up-to-date version, the master index server receives the search queries - even if it is locked for searching. This ensures that search queries are always answered and the application receives no error message.
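The selection rules for the two options can be sketched in Python as follows. This is a simplified model of the behavior described in the table, not TREX code. It assumes that index versions are integers that increase with each update and that at least one slave index server exists. The function name pick_search_version and its parameters are hypothetical.

```python
from collections import Counter


def pick_search_version(slave_versions, released_version, option="majority"):
    """Return (version, answered_by) for a search request.

    slave_versions: index version held by each slave index server.
    released_version: most up-to-date version released for replication.
    """
    if option == "latest":
        if released_version in slave_versions:
            return released_version, "slaves"
        # No slave has the newest version yet, so the master answers.
        return released_version, "master"

    # option == "majority": use the version held by most slaves.
    counts = Counter(slave_versions)
    top = max(counts.values())
    # If two versions are equally available, prefer the more up-to-date one.
    version = max(v for v, n in counts.items() if n == top)
    return version, "slaves"


print(pick_search_version([1, 1, 2], released_version=2))
print(pick_search_version([1, 1, 1], released_version=2, option="latest"))
```

The first call models a replication in progress under majority: two of three slaves still hold the old version, so the search stays on it. The second call models latest when no slave has received the new version yet.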

You can change the standard configuration in the following two ways:

· For all new master indexes

· For existing master indexes

Changing the setting for all new master indexes

1. Go to the window Landscape Configuration in the TREX admin tool.

2. Choose the required setting for Search Version.

Changing the setting for an existing master index

1. Go to the Index Landscape window in the TREX admin tool.

2. Select the index in question and choose Landscape Configuration from the context menu.

3. Choose the required setting for Search Version.

 

Changing a Distributed System


Purpose

The sections below describe the changes that you can make to a distributed system after the installation. For all changes, note the Constraints that are relevant for 

distributed systems.

 

Adding and Removing Hosts

Features

You can use the TREX admin tool (stand-alone) to add or remove a host (server or blade server) to/from a TREX landscape. You do this if you have configured a

distributed TREX landscape.

Prerequisites

Make sure that you will still have enough CPU capacity and memory for your TREX landscape after removing a host.

Process Flow

● Removing a Host

○ Removing a host temporarily

○ Removing a host permanently

●  Adding a Host

 

Removing a Host

Use

You can use the TREX admin tool (stand-alone) to remove a host from a TREX landscape temporarily or permanently.

Removing a Host Temporarily

1. Go to the Landscape → Configuration window in the TREX admin tool (stand-alone).

2. Remove the Master Index/Queue Server indicator for the host that you want to remove from your TREX landscape temporarily.

3. Choose Check and then Deploy to save your change.

4. In the Landscape → Reorg window, go to the Plan tab page.

5. Choose Start Reorg to start the required reorganization of your TREX landscape.

The reorganization process distributes indexes that are located on the removed host to other hosts. When the reorganization is finished, there are no more

indexes on the host in question.

If you select the Split/Merge Indexes checkbox before performing the reorganization, the system not only reorganizes the indexes but also distributes and

splits the logical indexes again. During this type of reorganization, the system also recalculates the number of parts of which a logical index consists.

Note that this reorganization can cause a complete reindexing process that can last as long as the initial indexing run. During this period, the system cannot

perform indexing runs and searching is limited.

 

To add the host to your landscape again, proceed as described in  Adding a Host.

Removing a Host Permanently

1. Stop TREX on the host that you want to remove from your landscape.

The host is highlighted in red as soon as you have stopped it.

2. Go to the Landscape → Configuration window in the TREX admin tool (stand-alone).


3. Select the host that you want to remove permanently.

4. Choose Remove Host.

You are asked whether you want the indexes located on this host to be moved automatically.

5. Choose Move if you want this to happen.

The system removes all the indexes from the host in question.

After permanently removing a host, do not simply carry out a reorganization. For performance reasons, you should completely redistribute the indexes. To do so, select the Split/Merge Index checkbox in the Landscape → Reorg window of the TREX admin tool (stand-alone) and then start the reorganization. During this type of reorganization, the system also recalculates the number of parts of which a logical index consists.

Note that this reorganization can cause a complete reindexing process that can last as long as the initial indexing run. During this period, the system cannot

perform indexing runs and searching is limited.

 

Adding a Host

Use

You use the TREX admin tool (stand-alone) to add a new host (server or server blade) to your TREX landscape.

Procedure

1. Start TREX on the host that you want to add to your TREX system landscape.

○ Install a TREX instance on the server 

If you have not yet installed a TREX instance on the host that you want to add to your TREX landscape, do so before continuing with the procedure.

For more information about the installation of TREX, see the SAP NetWeaver 7.0 Search and Classification (TREX) Single Host installation guide. The guide is

located in the SAP Service Marketplace at service.sap.com/installNW70.

○ Install a TREX instance on the server blade

For a distributed TREX installation with server blades, use the cloneInst.py script to generate a new TREX instance on the server blade.

See: Activating the Configuration Clones for Server Blades

2. Go to the Landscape → Configuration window in the TREX admin tool (stand-alone).

3. Add the server or server blade to your TREX landscape as follows:

○ Following the installation of an additional TREX instance on a server, execute the Add host command (see Adding a Host)

○ The cloneInst.py script automatically adds the server blade to the landscape

4. Select the Master Index/Queue Server indicator for the host that you want to add to your TREX landscape.

5. Choose Check and then Deploy to save your change.

6. In the Landscape → Reorg window, go to the Plan tab page.

7. Choose Start Reorg to start the required reorganization of your TREX landscape.

 After adding a host (server or server blade) to your TREX landscape, do not simply carry out a reorganization. For performance reasons, you should completely

redistribute the indexes. To do so, select the Split/Merge Index checkbox in the Landscape → Reorg window of the TREX admin tool (stand-alone) and then

start the reorganization. During this type of reorganization, the system also recalculates the number of parts of which a logical index consists.

Note that this reorganization can cause a complete reindexing process that can last as long as the initial indexing run. During this period, the system cannot

perform indexing runs and searching is limited.

 

Changing Hosts

Purpose

You can make the following changes in a distributed system:

· Add master, backup, and slave hosts


· Remove backup and slave hosts

· Replace backup or slave hosts

For details, see the relevant sections.

 

Reorganization Function in the TREX Admin Tool

You can use the reorganization function in the TREX admin tool (see Landscape → Reorg) to automatically redistribute and optimize your TREX system landscape.

This function redistributes the TREX indexes among the master hosts and removes any backup and slave hosts that are no longer required (see Optimizing the

Landscape Using the Reorg Function). 

Adding a TREX Index Server 

Use

You can add additional index servers to a TREX host on which an index server is already running. You do this to distribute large indexes over multiple index

servers.

Each index server can use a maximum of 2GB of memory. Do not configure more index servers than can be supported by the memory on your host.

Prerequisites

1. Open the TREXDaemon configuration file in a text editor.

This file is located in the <TREX_DIR>/<trex_host_name> directory.

2. In the [daemon] section, add one or more index servers beneath the programs parameter: programs=nameserver,preprocessor1,indexserver1,indexserver<next_number>,queueserver,alertserver.

Depending on the hardware of your host, one or two index servers are entered in the file by default.

3. Copy the [indexserver1] section and rename the copied section as [indexserver<next_number>].

Repeat this procedure for each of the index servers that you want to add. Choose a new value for the port number of the additional index server (arguments=-port <index_server_port> parameter).

Determine the port of the first index server according to the following convention: <index_server_port>=3<TREX_instance_number>03. Increase the values

for the port numbers in steps of ten to avoid conflicts.

If your TREX instance number is 47:

[indexserver1]

arguments=-port 34703

...

[indexserver2]

arguments=-port 34713

...

[indexserver3]

arguments=-port 34723

...

[indexserver4]

arguments=-port 34733

...

4. Stop and start TREX so that your changes take effect.
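The port convention from step 3 can be checked with a short Python sketch. It assumes a two-digit TREX instance number and prints only the section name and the arguments line; the other parameters that you copy from [indexserver1] are omitted. The function name index_server_ports is hypothetical.

```python
def index_server_ports(instance_number, count):
    """Return the ports for `count` index servers of one TREX instance.

    First port: 3<TREX_instance_number>03; each further port is 10 higher.
    Assumption: the instance number has two digits (for example 47).
    """
    first_port = int(f"3{instance_number:02d}03")
    return [first_port + 10 * i for i in range(count)]


# Reproduce the example for TREX instance number 47 with four index servers.
for n, port in enumerate(index_server_ports(47, 4), start=1):
    print(f"[indexserver{n}]")
    print(f"arguments=-port {port}")
```

For instance number 47 this yields the ports 34703, 34713, 34723, and 34733, which match the example sections above.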

 


Optimizing the Landscape Using the Reorg Function in the TREX Admin Tool

Use

You can use the reorganization function in the TREX admin tool (see Landscape → Reorg) to automatically redistribute and optimize your TREX system landscape.

This function redistributes the TREX indexes on the master hosts and removes backup and slave hosts that are no longer required.

For details about using the Reorg functions in the TREX admin tool, see Reorganization of the TREX System Landscape.

Procedure

1. Check the roles of the servers in the TREX admin tool at Landscape → Configuration. If necessary, correct the roles and then choose Deploy.

2. Switch to Landscape → Reorg.

The system automatically calculates a new, optimized distribution of your TREX system landscape according to the newly-defined roles. The new distribution of the servers is then displayed at Landscape → Reorg → Plan and at Landscape → Reorg → Usage By Service.

3. Start the reorganization of the landscape by choosing Start Reorg in Landscape → Reorg → Summary.

The progress of the reorganization is displayed at Landscape → Reorg → Plan.

Result

The TREX system landscape has been reorganized and optimized according to your settings.

 

Reorganization of the TREX System Landscape

Use

You can use the Reorg function to distribute the indexes in a TREX system landscape among the available hosts to optimize their memory requirements.

The reorganization aims to achieve a balanced memory load and CPU load for the TREX system landscape.

 

Integration

This function is available in the TREX admin tool (stand-alone).

You can also launch it from the BI Accelerator Monitor. However, several screens are used in this case.

The TREX alert server contains the reorg check. When this check runs, you are automatically informed by e-mail (if configured), if the system recommends a

reorganization.

 

Features

Based on different key figures, TREX calculates whether a reorganization should be performed. The key figures are summarized on the Summary tab page and

displayed in detail on the Usage by Service (I) and (II) and Usage by Index tab pages.

 

The Landscape: Reorg window contains the following tab pages:

Overview of Tab Pages


Tab Page Description

Summary Displays whether TREX recommends a reorganization.

The lower table displays key figures from which TREX calculates the

percentage improvement that could be achieved by a reorganization. These

estimates are compared with fixed program-internal values. If the estimated

improvement is high enough, TREX recommends the reorganization.

If TREX recommends a reorganization (summary = yes), you can start the

reorganization immediately or at a later time.

To start the reorganization immediately, choose Start Reorg.

To start the reorganization later, specify the date and time in the following

format: YYYY-MM-DD HH:MM:SS. Then choose Start Reorg.

Plan Displays the various steps that TREX would perform during a reorganization or that are being performed during a reorganization.

In addition, you see the steps and the status of the latest reorganization.

Usage by Service (I) Displays the memory load and CPU load of the hosts in graphical form.

Based on these values and irrespective of the selected algorithm, TREX

calculates whether a reorganization is necessary.

For example, TREX recommends a reorganization if the distribution of memory

is unbalanced (identifiable by the different heights of the bar displays).

You can use a filter to show and hide the CPU load.

You cannot perform any other activities on this tab page, it is mainly for 

information purposes.

Usage by Service (II) Displays various key figures for the hosts in table form.

You cannot perform any activities on this tab page, it is mainly for information

purposes.

Usage by Index Displays various key figures for all indexes in table form.

You can use a filter to define which indexes should be displayed.

You cannot perform any other activities on this tab page, it is mainly for 

information purposes.

Interactive Reorg Displays various current key figures in table and graphical form.

On the graphic view, you can distribute the indexes manually using

Drag&Drop. This function is intended for experts.

Options You can define the algorithm and various parameters to be used for the

reorganization.

We recommend that you use the memory algorithm. All other algorithms are

used for test purposes.

Normally, you do not have to make any changes on this tab page.
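If you prepare the start time for a later reorganization in a script before entering it on the Summary tab page, the required format YYYY-MM-DD HH:MM:SS can be produced and checked with Python's standard datetime module. This is only a convenience sketch; the function names are hypothetical and nothing here talks to TREX.

```python
from datetime import datetime

# strftime/strptime pattern for YYYY-MM-DD HH:MM:SS
REORG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def format_reorg_start(when):
    """Format a datetime in the layout expected for a delayed reorganization."""
    return when.strftime(REORG_TIME_FORMAT)


def is_valid_reorg_start(text):
    """Return True if text can be parsed with the expected layout."""
    try:
        datetime.strptime(text, REORG_TIME_FORMAT)
        return True
    except ValueError:
        return False


print(format_reorg_start(datetime(2013, 8, 18, 3, 0, 0)))
print(is_valid_reorg_start("18.08.2013 03:00"))
```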

 

 A reorganization can be necessary in the following circumstances:

● You make changes to your TREX system landscape. For example, you remove index servers or add new ones.

● The current size of indexes does not match the initial size estimates.

 

If indexes are moved during the reorganization, no update of the affected indexes is possible. Indexing is interrupted for the duration of the reorganization. The

affected indexes are displayed with a yellow traffic light in the Index: Landscape window.

 

Parameter Overview

The following parameters are available on the Options tab page. Normally, you do not have to change the default settings.

Reorganization Parameters

Parameter Description

Split Indexes Specifies whether indexes are split into logical indexes with more than one

part if the defined size is exceeded.

This specification is in KB.

This parameter is deactivated by default.

Merge Indexes Specifies whether parts of indexes are merged if the size falls below a defined

value.

This specification is in KB.

This parameter is deactivated by default.

Small Indexes Specifies whether small indexes are distributed equally among the available

hosts if the size falls below a defined value.

This specification is in KB.

This parameter is activated by default and has the size 1,000 KB.

Remove Temporary Indexes If it is activated, temporary indexes are deleted during the reorganization.

This parameter is activated by default.


 

Activities

Start the TREX admin tool (stand-alone) and navigate to the Landscape: Reorg → Summary window. If TREX recommends a reorganization (summary = yes), start

the function by choosing Start Reorg. The display switches automatically to the Plan tab page. The window displays which steps are being performed. You can

choose the F5 button to update the display.

To cancel the reorganization, choose Cancel Reorg.

The reorganization is complete once all planned steps have been performed. The Summary tab page displays the status done.

 

Adding a Master Host

Use

You can add a new master host to a distributed system. You need to do this if the capacity of the existing master hosts is insufficient.

This has the following effect on the distribution of the indexes:

· The assignment of existing indexes remains unchanged.

· The new master index server receives all new indexes until all master index servers have the same number of indexes. TREX then distributes the new

indexes among all master index servers according to a round robin procedure.

The same principle is used for queues.
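The distribution rule for new indexes can be modeled with a few lines of Python. The sketch assigns each new index to the master index server that currently has the fewest indexes, which first fills up the new server and then alternates between the servers. It is a simplified model under that assumption, not the TREX implementation; the function name assign_new_index and the host names are hypothetical.

```python
def assign_new_index(index_counts):
    """Pick the master index server for a new index and update the counts.

    index_counts: dict mapping host name to its current number of indexes.
    The host with the fewest indexes is chosen; ties go to the first host
    in the dict's insertion order.
    """
    host = min(index_counts, key=index_counts.get)
    index_counts[host] += 1
    return host


# master1 already holds three indexes; master2 has just been added.
counts = {"master1": 3, "master2": 0}
placements = [assign_new_index(counts) for _ in range(5)]
print(placements)
```

In this run the first three new indexes go to master2. Once both servers hold three indexes, the next new indexes alternate between master1 and master2.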

If you previously had one master host and add another, the indexes are distributed as follows:

If you are using backup hosts, the new master host needs to receive a backup host.

· If there is one backup host for all master hosts, this backup host is automatically made backup host for the new master host.

· If each master host has its own backup host, you have to add a new backup host for the new master host.

Procedure

1. Install TREX on the host that you want to add.

2. If you are using centralized data storage: Mount the central TREX data directory (UNIX) or define it as a network drive (Windows).

3. Start TREX on the new host.

4. Start the TREX admin tool on a host that is already configured in the distributed system.

5. Go to the Landscape Configuration window.

6. Use Add Host to add the new host.

7. Configure the new host in the Hosts table as follows:

a. Mark it as a master index/master queue server.

b. If you are using centralized data storage: In the column Base Path enter the central TREX data directory on the file server.

8. If you are using backup hosts and every master has its own backup host: Add a new backup host.

9. Check the configuration. If the check does not find any errors, activate the configuration using Deploy.

 

Adding a Backup Host

Use

You can add a new backup host to a distributed system. You need to do this if you have added a new master host and want it to have its own backup host.


Procedure

1. Install TREX on the host that you want to add.

2. Mount the central TREX data directory (UNIX) or define it as a network drive (Windows).

3. Start TREX on the new host.

4. Start the TREX admin tool on a host that is already configured in the distributed system.

5. Go to the Landscape Configuration window.

6. Use Add Host to add the new host.

7. Select Assign Existing Indexes/Queues to New Slave/Backup Servers.

8. Configure the new host in the Hosts table as follows:

a. In the Backup Index/Queue Server for… column, specify the master host to which the host belongs.

b. In the Base Path column, enter the central TREX data directory on the file server.

9. Check the configuration. If the check does not find any errors, activate the configuration using Deploy.

 

Adding a Slave Host

Procedure

1. Install TREX on the host that you want to add.

2. If you are using centralized data storage: Mount the central TREX data directory (UNIX) or define it as a network drive (Windows).

3. Start TREX on the new host.

4. Start the TREX admin tool on a host that is already configured in the distributed system.

5. Go to the Landscape Configuration window.

6. Use Add Host to add the new host.

7. Select Assign Existing Indexes/Queues to New Slave/Backup Servers. Otherwise the new slave host does not receive existing indexes.

8. Configure the new host in the Hosts table as follows:

a. In the Slave Index Server for… column, specify the master host to which the host belongs.

b. If you are using centralized data storage: In the column Base Path enter the central TREX data directory on the file server.

9. Check the configuration. If the check does not find any errors, activate the configuration using Deploy.

 

Removing a Backup Host

Use

You can remove a backup host from a distributed system. You may want to do this if you used the host for test purposes and no longer need it in the distributed

system.

Procedure

1. Start the TREX admin tool on any host in the distributed system.

2. Go to the Index Landscape window. Check the column <light><host_name>:<index_server_port> for the host that you want to remove.

a. Check which indexes the host is assigned to as the backup index server.

If you remove this host, TREX automatically removes these assignments. The affected indexes then no longer have a backup index server. If you want indexing to

be highly available, you need to assign these indexes to another backup index server.

b. Check whether the backup index server is currently active.

This is displayed in the column using the entry +backup. You should not remove the host if the backup index server is active. If you remove the host anyway, the

system does not switch automatically to using the master index server. The master index server is only assigned to the affected indexes when it is next started.

c. Check whether the host is assigned to any indexes as the master index server.

If this is the case, you have to assign another master index server to the indexes before removing the host.

d. Check whether the host is assigned to any indexes as the slave index server.

If you remove the host, TREX removes these assignments automatically. You have to assign these indexes to another slave index server too.

See also: Changing Index Assignments

3. Go to the Queue Landscape window. Check the same things for the queues as you just checked for the indexes.

See also: Changing Queue Assignments

4. If you are sure that you want to remove the host, go to the Landscape Configuration window.

5. Select the host in the Hosts table and then choose Remove Host.

6. Check the configuration. If the check does not find any errors, activate the configuration using Deploy.


Result

The TREX instance is still installed on the removed backup host. The host may still contain configuration data with information on the distributed system. However,

since these configuration files are not consistent, the TREX instance on this host will normally not start any longer. You should therefore deinstall this TREX

instance.

 

Removing a Slave Host

Use

You can remove a slave host from a distributed system. You may want to do this if you used the host for test purposes and no longer need it in the distributed

system.

Procedure

1. Start the TREX admin tool on any host in the distributed system.

2. Go to the Landscape Configuration window.

3. Select the slave host that you want to remove in the Hosts table. Remove the selection in the column Slave Index Server for.

4. Remove the host from the landscape using Remove Host.

5. Check the configuration. If the check does not find any errors, activate the configuration using Deploy.

Result

TREX is still installed on the removed slave host. The host may still contain index copies and configuration files with information on the distributed system.

However, since these configuration files are not consistent, the TREX instance on this host will normally not start any longer. You should therefore deinstall this

TREX instance.

 

Replacing a Backup or Slave Host

Use

You can replace a backup or slave host with a new host. You may want to do this if the current host needs to be maintained and will therefore be unavailable for a

while.

Replacing a Backup Host

1. Add a new backup host to the distributed system.

2. Remove the previous backup host from the distributed system.

Replacing a Slave Host

1. Remove the previous slave host from the distributed system.

2. Add a new slave host to the distributed system.

 

Changing Index Assignments

Use

When you create an index, TREX assigns a master index server and slave index server to it. If you are using backup index servers, TREX also assigns a

backup index server to the index.

You can change these assignments if necessary. You may want to do this if you need to remove a host from the distributed system and assign the indexes to

other servers first.


Prerequisites

You are using centralized data storage.

Procedure

1. Go to the Index Landscape window.

2. In the table select the index whose assignment you want to change.

3. Click in a column relating to a host. Choose the required function from the context menu.

Function Description

Move master here Assigns another master index server to the index.

Switch master/backup Switches the master and backup index servers. The master index server is

then used as the backup index server for this index (and vice versa).

Remove this backup Removes the assignment of index to backup index server.

 Add backup here Assigns a backup index server to the index.

Remove this slave Removes the assignment of index to slave index server.

 Add slave here Assigns a slave index server to the index.

The functions only change the assignment of index to server. The indexes are not physically moved.

You can also change the assignments if the currently assigned master, backup, or slave index server is active at that point in time. The currently assigned server completes its current activity before the change takes effect.

 

Changing Queue Assignments

Use

When you create an index, TREX automatically creates a corresponding queue and assigns the queue to a master queue server. If you are using backup queue

servers, TREX also assigns a backup queue server to the queue.

You can change these assignments if necessary. You may want to do this if you need to remove a host from the distributed system and assign the queues to other 

servers first.

Prerequisites

You are using centralized data storage.

Procedure

1. Go to the Queue Landscape window.

2. In the table select the queue whose assignment you want to change.

3. Click in a column relating to a host. Choose the required function from the context menu.

Function Description

Move master here Assigns another master queue server to the queue.

Switch master/backup Switches the master and backup queue servers. The master queue server is

then used as the backup queue server for this queue (and vice versa).

Remove this backup Removes the assignment of queue to backup queue server.

 Add backup here Assigns another backup queue server to the queue.

The functions only change the assignment of queue to server. The queues are not physically moved.

You can also change the assignments if the currently assigned master or backup queue server is active at that point in time. The currently assigned master or 

backup server completes its current activity before the change takes effect.

 


Allowing Searching on Master Indexes

Use

In a distributed system, you cannot search on the master indexes by default. Search requests are answered only by slave index servers, and not by the master 

index servers.

The default configuration has the following advantages:

· Faster indexing

The resources on the master index server do not need to be shared among indexing and searching processes.

· Less main memory requirement

 A write variant and a read variant exist for each master index. If the master index server only carries out indexing, only the write variant has to be loaded to

the main memory. If the master index server carries out indexing and searching, both variants have to be loaded to the main memory.

You can change the default configuration so that the master index servers are also used for searching. This makes sense in the following cases:

· You are able to ensure that the master index servers do not index and search at the same time. This may be the case, for example, if indexing always takes

place at night when there are no users using the system for searching.

· You have static indexes. These are indexes that you have created and intend to update rarely (for example, every three months).

You can change the standard configuration in the following two ways:

· For all new master indexes

· For existing master indexes

Changing the setting for all new master indexes

1. Start the TREX admin tool on any host in the distributed system.

2. Go to the Landscape Configuration window.

3. Select Search on Master/Backup Server.

4. Activate this change by choosing Deploy.

Changing the setting for an existing master index

1. Start the TREX admin tool on any host in the distributed system.

2. Go to the Index Landscape window.

3. Select the index in question and choose Index Properties from the context menu.

4. Select Search on Indexer (Master/Backup).

 

Changing Default Directories for Indexes, Snapshots, or Queues

Use

There are default directories for indexes, snapshots, and queues. TREX creates new data in these directories. You can change the default directories. You may

want to do this if you are running out of disk space in the existing default directories.

This change has no effect on existing indexes, snapshots, or queues. They remain in the previous default directories. TREX creates new data in the new default

directories.

If you want to move existing indexes, snapshots, or queues, contact SAP Support.

Procedure with Centralized Data Storage

On UNIX

1. Create a new TREX data directory on the file server.

2. Mount the new directory.

3. Start the TREX admin tool on any host in the distributed system.

4. Go to the Landscape Configuration window.

5. Specify the new directory for all hosts in the Basepath column of the Hosts table.

6. Activate this change by choosing Deploy.

On Windows

1. Create a new TREX data directory on the file server.

2. Edit the configuration file TREXDaemon.ini on all hosts that belong to the distributed system. Define the new directory as a network drive.


[mappings]

map_t=\\myfileserver\myoldtrexshare

map_u=\\myfileserver\mynewtrexshare

For more information, see Defining the Network Drive.

3. Stop TREX on all hosts that belong to the distributed system. Restart TREX.

4. Start the TREX admin tool on any host in the distributed system.

5. Go to the Landscape Configuration window.

6. Specify the new network drive, or a subdirectory thereof, for all hosts in the Basepath column of the Hosts table.

7. Activate this change by choosing Deploy.
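The [mappings] fragment from step 2 can be checked before it is rolled out to all hosts. The following is a minimal sketch in Python, not an SAP tool. It assumes that the suffix after map_ names the drive letter (so map_u stands for drive U:) and that the fragment can be read with the standard configparser module. The helper name read_mappings is hypothetical.

```python
import configparser

# Fragment as shown in step 2; the share names come from the example above.
FRAGMENT = r"""
[mappings]
map_t=\\myfileserver\myoldtrexshare
map_u=\\myfileserver\mynewtrexshare
"""


def read_mappings(ini_text: str) -> dict:
    """Return {drive: UNC share} for every entry in the [mappings] section."""
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(ini_text)
    drives = {}
    for key, share in config["mappings"].items():
        # Assumption: "map_<letter>" maps the share to drive "<LETTER>:".
        letter = key.split("_", 1)[1].upper()
        drives[f"{letter}:"] = share
    return drives


for drive, share in read_mappings(FRAGMENT).items():
    print(drive, "->", share)
```

A check like this only confirms that the fragment is syntactically readable and that both the old and the new share are mapped. It does not verify that the shares are reachable from the host.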

Procedure with Decentralized Data Storage

1. Create a new TREX data directory on the host in question.

2. UNIX only: Make sure that the directory belongs to the user SAPService<SAPSID>.

3. Start the TREX admin tool on any host in the distributed system.

4. Go to the Landscape Configuration window.

5. Specify the new directory for the affected host in the Basepath column of the Hosts table.

6. Activate this change by choosing Deploy.

 

Distributed Preprocessing of Documents

Purpose

Indexing is a complex process consisting of several phases. One phase is the preprocessing of documents by the preprocessor. Preprocessing includes the

following steps:

· Loading documents if the application transmitted them as URIs.

· Filtering

· Carrying out a linguistic analysis

Preprocessing can take a similar amount of time and use similar system resources to the actual indexing process. The filtering of a large number of large

documents that are not in text or HTML form can be particularly time- and resource-consuming (for example, large PDFs).

In order to increase throughput in preprocessing, you can distribute the preprocessing among multiple hosts. For example, you can use one host (or more than

one) exclusively for preprocessing documents. You do this if there are a large number of documents to be preprocessed for the initial indexing run.

The following sections contain information on the distributed preprocessing of documents.

· The section Fundamentals explains the preprocessing flow for indexing. It also tells you about distribution options and how to control load distribution and

performance.

· The section Configuration explains how to configure distributed preprocessing.

The preprocessor is involved in processing search and text-mining requests as well as in indexing. In all of these processes, the preprocessor has the task of

preparing the actual preprocessing.

The sections below only relate to the preprocessing of documents for indexing. The role of the preprocessor in processing search and text-mining requests is

not described.

 

Fundamentals

The following sections provide fundamental information on the topics below.

· Preprocessing Flow

· Distributing Preprocessing

· Load Distribution and Performance

 


 

Preprocessing Flow

The graphic below depicts the most important steps that take place immediately before, during, and after preprocessing.

The graphic depicts the process flow of the application transmitting the URI of a document to TREX. If the application transmits the document directly, the step

‘Load document (HTTP/HTTPS Get)’ does not take place.

The application sends indexing requests to the TREX Web server or TREX RFC server. This server then forwards the requests to the queue server. The queue

server assigns the requests to the correct queues and distributes the requests among one or more preprocessors. The actual preprocessing of documents then

takes place on the preprocessor(s).

When the preprocessing has been completed, the preprocessor passes the analyzed document to the queue server. The queue server collects the documents

and, depending on its configuration, triggers further processing on the index server.

How Does the Distribution of Documents Take Place?

The distribution of documents among the preprocessors is controlled by the name server. The distribution takes place according to a round robin procedure that

takes the number of times that a preprocessor has been accessed into account. Preprocessors that have been accessed less often are preferred when

distributing documents.

The process flow is as follows:

1. When a queue server receives a document it assigns it to a preprocessor client.

2. The preprocessor client asks the name server for the address of a preprocessor.

3. The name server returns the preprocessor that has been accessed least often.

4. The preprocessor client forwards the document to the preprocessor and waits for a response. Preprocessor clients are busy while waiting for a response.

They receive no further documents from the queue server during this time.

5. When the preprocessing of the documents is over, the preprocessor client receives a response from the preprocessor, and returns its own response to the

queue server.

6. Only then is the preprocessor client free to receive further documents from the queue server.
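The selection rule in step 3 can be illustrated with a small simulation. This is a sketch only, not TREX code: the class NameServer, the function distribute, and the preprocessor names are hypothetical, and breaking ties by registration order is an assumption that the text does not confirm.

```python
from dataclasses import dataclass, field


@dataclass
class NameServer:
    """Toy name server that counts how often each preprocessor was handed out."""
    access_counts: dict = field(default_factory=dict)

    def register(self, preprocessor: str) -> None:
        self.access_counts.setdefault(preprocessor, 0)

    def pick_preprocessor(self) -> str:
        # Return the preprocessor that has been accessed least often.
        # Assumption: ties are broken by registration order.
        chosen = min(self.access_counts, key=self.access_counts.get)
        self.access_counts[chosen] += 1
        return chosen


def distribute(documents, name_server: NameServer) -> dict:
    """Map each document to a preprocessor, one name server lookup per document."""
    return {doc: name_server.pick_preprocessor() for doc in documents}


ns = NameServer()
for name in ("preprocessor1@hostA", "preprocessor1@hostB", "preprocessor2@hostB"):
    ns.register(name)

assignment = distribute([f"doc{i}" for i in range(6)], ns)
print(assignment)
print(ns.access_counts)
```

With three registered preprocessors and six documents, every preprocessor is chosen twice, which is the even spread that the round robin procedure aims for.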

 

Distributing Preprocessing

The preprocessing of documents is carried out by preprocessors running in either any or index mode. If you set up the system according to Landscape Configuration, these are:

· The preprocessors that run on the master hosts

· If you are using backup hosts, the preprocessors that run on the backup hosts

For more information on the meaning of the modes, see Preprocessor Modes.

If the preprocessing capacity of the master and backup hosts is insufficient, you can use one host or multiple hosts exclusively for preprocessing. Preprocessing

then takes place on additional preprocessors, allowing more documents to be preprocessed in parallel. This increases throughput for preprocessing.

On a host used exclusively for preprocessing, one or more preprocessors run in index mode, and a name server also runs. Such a host is referred to as a

preprocessor host.


The graphic below depicts a system with one master host, two slave hosts, and one preprocessor host. The preprocessor host supports the master host in

preprocessing.

 

Load Distribution and Performance

You have to configure the preprocessors and queue server if you want to use distributed preprocessing. The following parameters are important for load distribution and performance:

● Number of Preprocessors and Preprocessor Threads

● Preprocessor Threads and Queue Server Pool Size

You can only improve performance by taking all parameters into account together. Changing one individual parameter cannot improve performance.

 

Number of Preprocessors and Preprocessor Threads

The following parameters in the preprocessor and queue server are important for performance:

● The number of preprocessors running on a host

(Number of preprocessors per host)

● The number of threads in a preprocessor process

(Number of threads per preprocessor)

● Number of preprocessor clients in the queue server 

(Pool size per queue server)

You can use the pool size for the queue servers to directly influence the number of preprocessor threads. The number of preprocessor threads and the pool size are connected as follows:

<queue server pool size> = <number of preprocessor threads>

For more information, see Preprocessor Threads and Queue Server Pool Size.

Configuration Rules for Preprocessor and Queue Server 

You must take into account the following relationships and configuration rules for a high-performance configuration of distributed preprocessing:

● <maximum number of preprocessors per host> = <number of CPUs>

That is, a maximum of one preprocessor per CPU.

● <maximum number of threads per preprocessor> = 3

That is, a maximum of three threads per preprocessor and per CPU.

● <total pool size of all queue servers> = <total number of CPUs for all preprocessor hosts> * 3

These relationships are explained in more detail below.
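The three rules can be written down as simple arithmetic. The sketch below is an illustration of the upper limits only. The function names are hypothetical, and the CPU counts in the usage lines are made-up inputs rather than a recommended landscape.

```python
# Upper limit from the configuration rules: three threads per preprocessor.
MAX_THREADS_PER_PREPROCESSOR = 3


def max_preprocessors_on_host(cpus: int) -> int:
    """At most one preprocessor per CPU (assuming main memory is sufficient)."""
    return cpus


def max_threads_on_host(cpus: int) -> int:
    """One preprocessor per CPU, three threads per preprocessor."""
    return max_preprocessors_on_host(cpus) * MAX_THREADS_PER_PREPROCESSOR


def max_total_pool_size(cpus_per_preprocessing_host: list) -> int:
    """Total pool size of all queue servers = total number of CPUs * 3."""
    return sum(cpus_per_preprocessing_host) * MAX_THREADS_PER_PREPROCESSOR


# Hypothetical landscape: two hosts that preprocess documents, with 2 and 4 CPUs.
print(max_preprocessors_on_host(4))
print(max_threads_on_host(4))
print(max_total_pool_size([2, 4]))
```

These values are ceilings. A host that also runs a queue server or index server may be configured with fewer preprocessors than it has CPUs, in which case the pool size follows the number of preprocessors actually running.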

How Many Preprocessors Can Run On a Host?

The number of preprocessors that can run on a host is limited by the available main memory and the number of CPUs.

Each preprocessor process has its own main memory area. If there are multiple preprocessors running, they need a correspondingly large amount of main

memory. The main memory requirement of a preprocessor depends on the following factors:

● How big are the documents?

● What format do the documents have (PDF, HTML, and so on)?

● For how many languages is language recognition activated?

The main memory requirement for one language is between 30 and 40 MB per preprocessor. If there are more languages, the main memory requirement is

normally around 100 MB per preprocessor.


In some cases, the main memory requirement may be between 500 MB and 1 GB. The worst case scenario can occur if language recognition is activated for all

languages and a large number of preprocessor threads are processing large documents at the same time.

If the host has enough main memory, the following upper limit is valid:

<Maximum number of preprocessors on a host> = <number of CPUs>

What Is the Maximum Possible Number of Preprocessor Threads?

 A preprocessor process can consist of one or more threads. If there are multiple threads, the preprocessor can distribute the requests among the threads and

process the requests in parallel. The preprocessor automatically starts the number of threads that is required for processing.

For each preprocessor process, a maximum of three preprocessor threads per CPU should be started:

<number of preprocessor threads per preprocessor process> = 3

Since only one preprocessor per CPU and only three threads per preprocessor should be started, this results in the following relationship:

<maximum number of all preprocessor threads running on a host> = <number of CPUs> * 3

You use the queue server pool size to indirectly configure the number of preprocessor threads (see Preprocessor Threads and Queue Server Pool Size).

If the preprocessor uses the maximum number of threads it is also using the maximum amount of system resources. You will have almost complete CPU load.

If you want the preprocessor to have fewer system resources, you can choose to have a smaller number of threads. However, you ought not to choose to have

a greater number of threads, since this can cause performance to drop.

The more threads that are invoked in parallel, the longer the operating system takes to manage the threads (to trigger, stop, and monitor them). If the number of threads invoked in parallel is too great, the operating system is overwhelmed by thread management.

More Preprocessors or More Threads?

If you want to optimize preprocessing performance, you need to decide whether to increase the number of preprocessors or the number of preprocessor threads.

Your decision depends on the following factors:

● Required load distribution among the hosts

● System resources of the hosts (number of CPUs and available main memory)

If only one host is preprocessing documents, it makes no difference whether one preprocessor is running with multiple threads or several with one thread each.

If several hosts are preprocessing documents, the parameters have the following effect:

● Load balancing

The number of preprocessors running on each host controls the load distribution among the hosts.

The more preprocessors running on a host, the more load that host receives.

Example: Preprocessing takes place on the master host and on a preprocessor host. Because the master host also carries out indexing, you want it to receive a smaller preprocessing load. There is therefore only one preprocessor on the master host, but two preprocessors on the preprocessor host.

The load is distributed between the two hosts in the ratio 1:2.

● Performance

The number of preprocessor threads controls the performance on one host.

The more threads there are, the more documents a preprocessor can process in parallel.

You cannot use the pool size on the queue server to increase the number of preprocessor threads (see Preprocessor Threads and Queue Server Pool Size) and the

number of preprocessors without restriction. The maximum number depends on the available system resources.
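The 1:2 ratio from the load balancing item above follows from the fact that documents are spread evenly over preprocessor processes, not over hosts. The following sketch reproduces that ratio. It is a simplified model that assumes every document costs the same and that the least-accessed preprocessor is always chosen. The function name simulate_load and the host labels are hypothetical.

```python
from collections import Counter


def simulate_load(preprocessors_per_host: dict, documents: int) -> Counter:
    """Count how many documents each host receives under least-accessed selection."""
    # One entry per preprocessor process, identified by (host, index on that host).
    accesses = {
        (host, index): 0
        for host, count in preprocessors_per_host.items()
        for index in range(count)
    }
    load = Counter()
    for _ in range(documents):
        chosen = min(accesses, key=accesses.get)  # least-accessed preprocessor
        accesses[chosen] += 1
        load[chosen[0]] += 1
    return load


load = simulate_load({"master": 1, "preprocessor_host": 2}, documents=300)
print(dict(load))
```

In this model the master host receives 100 of the 300 documents and the preprocessor host receives 200, which matches the ratio 1:2.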

Availability

 Availability can also play a part when deciding on the number of preprocessors and preprocessor threads.

Using multiple preprocessors increases the availability of the system. This is because different processes (preprocessors) have less impact on one another than

do the different threads of a process. If a thread hangs, this can affect other threads of the same process but not of another process.

However, using multiple preprocessors also requires more main memory (see the How Many Preprocessors Can Run On a Host? section above).

 

Preprocessor Threads and Queue Server Pool Size

The pool size is important for achieving optimum integration between the queue servers and preprocessors. The pool size determines how many documents a queue server can distribute to the preprocessors at once.

From a technical point of view, the pool size determines how many preprocessor clients a queue server instantiates at startup. The preprocessor client is an

internal component of the queue server. The queue server uses the preprocessor clients to communicate with the preprocessors and to use their services.

Depending on the number of preprocessor clients started in the queue server (= pool size), the corresponding number of preprocessor threads are started by a

central worker thread management. You can use the pool size in the queue server to control the number of preprocessor threads on the hosts that preprocessors

are running on.


The following relationship applies:

<queue server pool size> = <number of preprocessor threads>

Example

For example, if you set the pool size in the queue server to the value 6, six preprocessor threads are started on the host that the preprocessor(s) are running on. If two preprocessor processes are running there, the threads are distributed between the two preprocessor processes, which corresponds to three threads per process.

Example of the relationship between the pool size and the number of preprocessor threads
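The example above can be expressed as a short calculation. This is a sketch under one assumption that the text does not state: when the pool size is not evenly divisible by the number of preprocessor processes, the remaining threads are given to the first processes. The function name threads_per_process is hypothetical.

```python
def threads_per_process(pool_size: int, processes: int) -> list:
    """Split the threads implied by the pool size across preprocessor processes."""
    base, extra = divmod(pool_size, processes)
    # Assumption: leftover threads go to the first `extra` processes.
    return [base + 1 if index < extra else base for index in range(processes)]


print(threads_per_process(6, 2))  # pool size 6, two preprocessor processes
```

For a pool size of 6 and two processes, the result is three threads per process, as in the example.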

What Value Should the Queue Server Pool Size Have?

You must determine the optimum pool size and thus the number of preprocessor threads individually for your system. For each preprocessor process, a

maximum of three preprocessor threads should be started using the entry for the pool size.

Thus, the following relationship applies:

<number of preprocessor threads per preprocessor process> = 3

Since a maximum of only one preprocessor should be started per CPU (see Number of Preprocessors and Preprocessor Threads), this results in the following

relationship for a distributed system landscape with multiple queue servers and preprocessor hosts:

<total pool size of all queue servers> = <total number of CPUs for all preprocessor hosts> * 3

If the pool size is too low, the preprocessor can have unnecessary idle times and not have a full load, although resources are still available. If the pool size is too

large, the host on which the queue server is running uses too many system resources to manage the pool.

You should check the CPU load for the preprocessors for a while. If system resources are still available, you can increase the pool size to improve performance.

However, if you increase the pool size beyond the recommendations, you gain no performance benefits and might actually cause performance to drop.

The pool size of queue servers is configured in the file TREXQueueServer.ini.
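The tuning advice above (observe the CPU load, increase the pool size while resources remain, never exceed the recommended maximum) can be sketched as a small decision function. The 90 percent threshold, the step of one, and the function names are assumptions chosen for illustration. They are not values from the TREX documentation.

```python
def recommended_max_pool_size(total_preprocessor_cpus: int) -> int:
    """Upper limit from the text: total number of CPUs * 3."""
    return total_preprocessor_cpus * 3


def next_pool_size(current: int, cpu_load_percent: float,
                   total_preprocessor_cpus: int,
                   busy_threshold: float = 90.0) -> int:
    """Suggest the next pool size to try after observing the CPU load for a while."""
    limit = recommended_max_pool_size(total_preprocessor_cpus)
    if current >= limit:
        # Going beyond the recommendation brings no benefit and may hurt.
        return limit
    if cpu_load_percent < busy_threshold:
        # Resources are still available: increase in small steps.
        return current + 1
    return current


print(next_pool_size(current=6, cpu_load_percent=55.0, total_preprocessor_cpus=4))
```

Each suggested value still has to be entered in TREXQueueServer.ini and observed under real load before the next increase.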

 

Configuration

Purpose

The sections below explain how to set up distributed preprocessing with a preprocessor. They also contain information on how to increase the number of preprocessors and preprocessor threads if necessary.

 

Configuration Recommendations

To achieve high-performance preprocessing that does not hamper the other TREX servers, use the following configuration.

Preprocessor hosts

● In accordance with the configuration rules that are specified in Number of Preprocessors and Preprocessor Threads, start the required number of

preprocessors on the preprocessor host.

● Monitor the load on the host during preprocessing. If system resources are still available, you can use the pool size in the queue server to increase the number of preprocessor threads up to the recommended maximum.

Master Host

We recommend that you keep the default configuration for the preprocessor on a master host.

If you give the preprocessor additional system resources, the performance of the queue server and index server suffers. Preprocessing will be faster, but

subsequent processing steps will be slower.

Backup host

If the master index server and master queue server are active, there is little load on the backup hosts. If you want a backup host to take on more of the preprocessing load, you can start more preprocessors on it, provided the hardware allows this (note the configuration rules that are specified in Number of Preprocessors and Preprocessor Threads). This allows you to make better use of the system resources on the backup host. However, the performance of the


indexing is lower if either the backup index server or the backup queue server is actually active.

Example

Example 1

The following hosts preprocess documents:

● Master host:

2 CPUs – one preprocessor – one queue server 

● Preprocessor host:

2 CPUs – two preprocessors

Both hosts have two CPUs. Because the preprocessor host only preprocesses documents, it should take more of the load than the master host. The preprocessor 

host therefore has two preprocessors. The only queue server in the system and a preprocessor are running on the master host. Therefore, a total of three

preprocessors are running. Since each preprocessor process should process a maximum of three threads per CPU in parallel, you can calculate the maximum

pool size as follows:

pool size = number of all preprocessors * 3 = 3 * 3 = 9

Thus the pool size must be set to the value 9 in the configuration file TREXQueueServer.ini on the master host. As a result, the queue server makes available a total of nine preprocessor clients for the preprocessors, and in turn a total of nine threads are started in the preprocessors.

Example 2

The following hosts preprocess documents:

● Master host 1

2 CPUs – one preprocessor – one queue server 

● Master host 2

2 CPUs – one preprocessor – one queue server 

● Backup host

2 CPUs – one preprocessor – one queue server 

● Preprocessor host

2 CPUs – two preprocessors

The preprocessor host therefore has 2 preprocessors, as in example 1, but no queue server. Since the system consists of two master hosts and one backup

host, there are a total of three queue servers. We can assume that two of these three queue servers are always active: Either both master queue servers, or one

master queue server and one backup queue server.

The pool size for two active queue servers is determined as follows:

pool size = number of all preprocessors * 3 = 5 * 3 = 15

This pool size divided by the number of active queue servers gives a pool size of 7 or 8 per queue server. This is the pool size that you enter in the configuration

file TREXQueueServer.ini of all queue servers.
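The arithmetic of both examples can be checked with a few lines of Python. This is a sketch only. The function name pool_sizes is hypothetical, and giving the remainder to the first queue servers is one possible way to arrive at the "7 or 8" split mentioned above.

```python
def pool_sizes(preprocessors: int, active_queue_servers: int,
               threads_per_preprocessor: int = 3):
    """Return (total pool size, pool size per active queue server)."""
    total = preprocessors * threads_per_preprocessor
    base, extra = divmod(total, active_queue_servers)
    per_server = [base + 1] * extra + [base] * (active_queue_servers - extra)
    return total, per_server


# Example 1: three preprocessors, one queue server.
print(pool_sizes(preprocessors=3, active_queue_servers=1))

# Example 2: five preprocessors, two queue servers active at any time.
print(pool_sizes(preprocessors=5, active_queue_servers=2))
```

Example 1 yields a total of 9 for the single queue server. Example 2 yields a total of 15, split as 8 and 7 across the two active queue servers.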

 

Setting Up Distributed Preprocessing

Use

The procedure below explains how to implement distributed preprocessing. The description assumes that:

● You have set up a distributed system with at least one master host.

● You want to connect a host that exclusively preprocesses documents (preprocessor host). You want the preprocessors on this host to have as many system

resources as possible.

Adding a Preprocessor Host to the Distributed System

1. Install TREX on the preprocessor host. During the installation, specify the number of preprocessors to run on the host.

2. If TREX is not running, start it.

3. Start the TREX admin tool on a host that is already configured in the distributed system.

4. Go to the Landscape Configuration window.

5. Use Add Host to add the new preprocessor host.

Configuring Preprocessor Hosts

1. Choose the preprocessor mode index for the preprocessor host.

2. Configure the TREX daemon on the preprocessor host so that only the name server and preprocessors run there:

a. Select the host in question and choose Edit Services.

b. Change the programs parameter as follows:

[daemon]

programs = nameserver, preprocessor1, ..., preprocessor<n>

3. Go to the Landscape Services window.

4. Select one of the servers to run on the preprocessor host. Choose Start New/Stop Removed Services@<hostname>(*) from the context menu.

Configuring Master and Backup Hosts


1. Go to the Landscape Ini window.

2. Establish the maximum possible number of preprocessor threads for all hosts that preprocess documents. Take into account all hosts on which a

preprocessor is running in either any or index mode.

For more information about the calculation, see Preprocessor Threads and Queue Server Pool Size.

3. Calculate the pool size for each queue server.

For more information, see Preprocessor Threads and Queue Server Pool Size.

4. Edit the configuration file TREXQueueServer.ini for all queue servers. Enter the calculated value in the parameter poolsize.

5. Go to the Landscape Services window.

6. Select a queue server whose configuration you have changed. Choose Restart queueserver@<host_name>:<port> from the context menu. Carry out this step for all other queue servers.

The queue servers are automatically restarted by the TREX daemon.

Result

You can check whether the preprocessors are receiving as many system resources as possible by looking at the CPU load for the hosts in question in the TREX

admin tool. When documents are being preprocessed, the CPU usage should be at the upper limit.

 

Example Configuration

This section shows the configuration for a system in which preprocessing takes place on one master host and one preprocessor host.

The configuration is only specified for mytrexmaster and mytrexpreprocessor, and only where distributed preprocessing is involved.

 Assumptions:

● Both hosts have two CPUs each.

● Two preprocessors should run on mytrexpreprocessor.

● The only queue server in the system is running on mytrexmaster.

● Only searches take place on mytrexslave1/2. There is no preprocessing on these hosts.

TREX admin tool, Landscape Configuration

Hosts table (extract 1)

Host | Name Server Mode | Master Index/Queue Server | Slave Index Server for | Preprocessor Mode
mytrexmaster | 1st master | ! | | index
mytrexpreprocessor | slave | | | index
mytrexslave1/2 | slave | | | search
...

Hosts table (extract 2)

Host | Base Path | Services
mytrexmaster | ... | ...
mytrexpreprocessor | /usr/sap/<SAPSID>/TRX<instance_number> | nameserver, preprocessor1, preprocessor2
...

TREXDaemon.ini for ‘mytrexpreprocessor’ (extract)

[daemon]

programs = nameserver, preprocessor1, preprocessor2

 

TREX admin tool, Landscape Ini 

TREXQueueServer.ini for ‘mytrexmaster’

[preprocessor]

poolsize=9

The pool size is calculated as follows:

2 * <preprocessor-threads> on <mytrexpreprocessor> + <preprocessor-threads> on <mytrexmaster> = 2 * 3 + 3 = 9
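The calculation and the resulting fragment can be reproduced with Python's standard configparser module. This is a sketch that writes only the [preprocessor] section shown above. A real TREXQueueServer.ini contains further settings that are not modeled here, and the helper name render_queue_server_fragment is hypothetical.

```python
import configparser
import io

THREADS_PER_PREPROCESSOR = 3


def render_queue_server_fragment(pool_size: int) -> str:
    """Render the [preprocessor] section with the given pool size as INI text."""
    config = configparser.ConfigParser()
    config["preprocessor"] = {"poolsize": str(pool_size)}
    buffer = io.StringIO()
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


# Two preprocessors on mytrexpreprocessor, one on mytrexmaster.
pool_size = 2 * THREADS_PER_PREPROCESSOR + 1 * THREADS_PER_PREPROCESSOR
fragment = render_queue_server_fragment(pool_size)
print(fragment)
```

The printed fragment contains the section header [preprocessor] followed by poolsize=9, matching the extract above.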


 

Increasing the Number of Preprocessors

Use

If necessary you can increase the number of preprocessors running on a host. See Number of Preprocessors and Preprocessor Threads for information on when

this is recommended.

Procedure

1. Start the TREX admin tool on any host in the distributed system.

2. Go to the Landscape Configuration window.

3. Select the host in question and choose Edit Services.

4. Add an entry for the new preprocessor to the parameter programs.

[daemon]

programs = ..., preprocessor<new_number>

5. Make sure that there is a section with the same name (preprocessor<new_number>) containing the start parameter for the preprocessor. If there is no such section, copy an existing section and rename it as follows:

[preprocessor<new_number>]

Windows: executable=TREXPreprocessor.exe

UNIX: executable=TREXPreprocessor.x

. . .

6. Go to the Landscape Services window.

7. Select any TREX server running on the host in question. Choose Start New/Stop Removed Services@<hostname>(*) from the context menu.

8. Modify the pool size of all master and backup queue servers.

For information on calculating the pool size, see Pool Size of Queue Servers. For information on the procedure, see the section Master and Backup Hosts in

Setting Up Distributed Preprocessing.
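Steps 4 and 5 of this procedure can be sketched as a text transformation on the daemon configuration. The example below is an illustration, not a supported way of editing TREX configuration: in practice you make the change through Edit Services in the TREX admin tool. The sample content is hypothetical and reduced to the executable entry, because the remaining start parameters are elided in the extract above. It is also an assumption that the real file can be read by Python's configparser.

```python
import configparser
import io

# Hypothetical, heavily reduced daemon configuration for a UNIX host.
SAMPLE = """\
[daemon]
programs = nameserver, preprocessor1, preprocessor2

[preprocessor1]
executable = TREXPreprocessor.x

[preprocessor2]
executable = TREXPreprocessor.x
"""


def add_preprocessor(ini_text: str, new_number: int) -> str:
    """Append preprocessor<new_number> to programs and create its section."""
    config = configparser.ConfigParser()
    config.optionxform = str  # keep option names exactly as written
    config.read_string(ini_text)

    name = f"preprocessor{new_number}"
    programs = [p.strip() for p in config["daemon"]["programs"].split(",")]
    if name not in programs:
        programs.append(name)
    config["daemon"]["programs"] = ", ".join(programs)

    if not config.has_section(name):
        # Step 5: copy an existing preprocessor section under the new name.
        template = next(s for s in config.sections() if s.startswith("preprocessor"))
        config[name] = dict(config[template])

    out = io.StringIO()
    config.write(out)
    return out.getvalue()


print(add_preprocessor(SAMPLE, 3))
```

After a change like this, steps 6 to 8 still apply: the new service has to be started and the pool size of the queue servers adjusted.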

 

Appendix

Information on Stopping/Starting Distributed Systems

There are no special rules to take into account when stopping a distributed system. You can stop TREX in any order on the individual hosts.

When you start a distributed system, the type of data storage dictates whether there is a defined sequence.

· If you are using centralized data storage, there is no special sequence.

· If you are using decentralized data storage, you must first start a master name server that was running just before the system was stopped. This ensures

that the system is based on an up-to-date topology file.

Example: The hosts mytrexhost1, mytrexhost2, and mytrexhost3 are configured as master name servers. mytrexhost3 has not been operating for a while, which means

that its topology file is not up-to-date. Changes that have been made since (such as new indexes) are not known to this host.


You also stop TREX on the remaining hosts for maintenance reasons. If you now want to restart TREX, you have to start it on mytrexhost1 or mytrexhost2 first. These master name servers have up-to-date topology files.

If you were to start TREX on mytrexhost3 first, the system would be based on an out-of-date topology file.

The master name servers compare their topology files at startup. If the files are different, the master name server saves the files as topology.<date>.old and

topology.<date>.new. This allows the correct topology to be restored even if the required start sequence is not observed.

If this happens in your system, contact SAP Support.
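The start sequence rule for decentralized data storage can be sketched as a selection over the master name servers. Everything in this sketch is hypothetical: TREX does not provide the timestamps in this form, and the five-minute tolerance for "running just before the system was stopped" is an assumption made for illustration.

```python
from datetime import datetime, timedelta


def hosts_safe_to_start_first(last_running: dict,
                              tolerance: timedelta = timedelta(minutes=5)) -> list:
    """Return the master name servers that were running shortly before shutdown."""
    latest = max(last_running.values())
    return sorted(
        host for host, seen in last_running.items()
        if latest - seen <= tolerance
    )


# Hypothetical record of when each master name server was last running.
last_running = {
    "mytrexhost1": datetime(2013, 8, 18, 10, 0),
    "mytrexhost2": datetime(2013, 8, 18, 10, 2),
    "mytrexhost3": datetime(2013, 8, 1, 9, 0),  # not operated for a while
}
print(hosts_safe_to_start_first(last_running))
```

With these inputs the sketch selects mytrexhost1 and mytrexhost2, in line with the example above in which mytrexhost3 holds an out-of-date topology file.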

 

Starting the TREX Admin Tool

Prerequisites

On UNIX: Since the TREX admin tool has a graphical interface, you need an X server. You cannot use a terminal program that only supports text mode, such as

telnet.

Procedure

1. Log on with the user <sapsid>adm.

2. Carry out one of the following steps:

On UNIX

Enter the following:

cd <TREX_DIR>

./TREXAdmin.sh

On Windows

Choose Start → Programs or All Programs → SAP TREX → Instance <instance_number> → Tools → TREX Administration.

You can also start the TREX admin tool by double-clicking <TREX_DIR>\TREXAdmin.bat in Windows Explorer.

 

Configuring Queue Parameters

Use

The queue parameters control the interaction between the queue server and the index server. In particular, they specify when the queue server triggers indexing

and optimization of documents. It is important for performance reasons that you have optimum settings for the queue parameters.

When TREX creates a queue, it uses the default settings for the queue parameters. Depending on the document sets that you have to index initially and on the

type of documents you index, you may have to change the default settings.

The default settings that TREX uses for new queues are defined in the configuration file TREXQueueServer.ini. You can change the default settings. However,

you should only make changes to configuration files after consulting SAP Support or a consultant.


Prerequisites

You have already created indexes.

Procedure

You can change the queue parameters for existing queues as follows:

Tool                                 Path

TREX admin tool                      Queue Admin → Queue Parameters

TREX monitor in the portal           System Administration → Monitoring → Knowledge Management → TREX Monitor → Edit Queue Parameters

TREX Admin Tool in the SAP System    Transaction TREXADMIN → Queue Admin → Set Queue Parameters

For more information about the meaning of the queue parameters, see the SAP Library at help.sap.com.

 

Changing Java Client Parameters

Use

You change the Java client parameters using the SAP J2EE Engine Visual Administrator Tool.

Procedure

1. Log on to the host on which the SAP J2EE Engine is running. Use the user <j2eeadm>.

2. Start the SAP J2EE Engine Visual Administrator Tool and log on to the SAP J2EE Engine.

For information on using this tool, see the SAP Library at help.sap.com.

3. Choose Cluster → Services → TREX Service.

4. Make the required changes.

5. Save your changes and confirm the restart of the service.

6. Repeat the last three steps for all other server processes of the cluster.

 

Advanced Configuration

Advanced configuration comprises the following areas:

● Language Recognition and Processing with TREX

TREX supports the indexing of documents that exist in different languages. When TREX is installed, you select the languages to be identified by language

recognition. You can retrospectively configure TREX to recognize additional languages.

● File Formats Supported by TREX

Documents whose content and attributes can be indexed and searched by TREX can exist in numerous different file formats. You can configure which file

formats you want to exclude from processing and which parts of XML and HTML files you want to exclude from indexing.

● Changing Proxy Server Settings

The TREX preprocessor can access documents on Web pages using a proxy server. You can configure the settings for the proxy server.

●  Activating Python Extensions

Some TREX functions are implemented as Python extensions. If the application using TREX uses these functions, you have to activate the Python

extensions.

● Configuration of the TREX Services in the SAP J2EE Engine

The TREX Java client is implemented as a TREX service in the J2EE engine. You can use the Visual Administrator to configure TREX caches and the TREX

Java client.

● Delta Index Configuration

TREX provides the option of activating delta indexes. This allows you to update indexes faster and improve the performance of TREX.

● Changing the TREX Host Name (Single and Multiple-Host Installation)  

You can change the name of the host on which you installed TREX later on, or you can install TREX with a virtual host name right from the start. You can do

this for both single-host and multiple-host installations.

● Configuration of the TREX Security Settings

You can configure secure communication between TREX and the application using it (for example, SAP Enterprise Portal or SAP Customer Relationship


Management).

 

Language Recognition and Processing with TREX

Use

Search and Classification (TREX) supports the indexing of documents in different languages. When TREX is installed, you select the languages to be identified by

language recognition. You can configure TREX to recognize additional languages later on (see Configuring Language Recognition).

Language processing takes place after the language recognition process. This involves generating terms that are significant as regards creating an index, and is

done using various text operations.

 

Integration

TREX can process all languages supported by SAP. However, the functionality differs depending on the language. For more information, see:

· Supported Languages

These languages are recognized by TREX and supported without restriction. You can use all TREX functions including search, retrieval, text-mining, and classification.

· Supported Languages with Restricted Functionality

These languages are recognized by TREX and supported with restrictions. Text-mining functions are particularly restricted.

· Languages that TREX Can Process

TREX cannot recognize these languages directly, but it can process them. This is done by mapping these languages to languages that TREX does support.

 

Language Recognition and Processing Function

[Figure: Language Recognition and Processing Interaction]

 

Language Recognition

Documents can exist in various languages and file formats. The TREX preprocessor converts the documents into UTF-8 encoded HTML so that they can be

processed by TREX. If there is no information on the document language, the preprocessor also carries out a language recognition process before processing the

document further. You can specify the languages to be recognized by the preprocessor in the configuration file std.langid-config both during the TREX installation


and later on. For more information, see Configuring Language Recognition. The language of a document is needed so that it can be placed in the correct language

version of the index.

Language recognition is based on statistical methods: Because the frequency of certain combinations of letters is a characteristic of a language, these

combinations can be used in order to identify it with a reasonable degree of probability. A frequency file exists for each of the languages supported by TREX. It

contains frequency ratings and weightings for letter combinations that are typical for the language in question. The TREX preprocessor checks the text document it

is identifying to see whether it contains these combinations. It is then assigned to the language to which it is most similar.

Because the language of documents with only a small amount of text cannot be reliably identified, TREX preprocessor language recognition is only activated if at least 7 terms (default value) can be recognized per document. Once the language has been identified, the subsequent term recognition process can use user-specific dictionaries to increase the number of terms recognized.
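The statistical approach described above can be illustrated with a small sketch. It is not the TREX implementation: the trigram profiles stand in for the frequency files, the overlap score is one simple similarity measure among many, and the sample sentences and all function names are assumptions made for illustration. Only the threshold of 7 terms comes from the text.

```python
from collections import Counter

MIN_VALID_TOKENS = 7  # default threshold named in the text


def trigram_profile(text):
    """Relative frequencies of letter trigrams (a toy 'frequency file')."""
    letters = "".join(ch.lower() if ch.isalpha() else " " for ch in text)
    padded = " ".join(letters.split())
    grams = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    total = sum(grams.values()) or 1
    return {gram: count / total for gram, count in grams.items()}


def similarity(doc_profile, lang_profile):
    """Overlap score: sum of the shared trigram weights."""
    return sum(min(w, lang_profile.get(g, 0.0)) for g, w in doc_profile.items())


def identify_language(text, lang_profiles, min_tokens=MIN_VALID_TOKENS):
    """Return the most similar language, or None if the text is too short."""
    if len(text.split()) < min_tokens:
        return None
    doc = trigram_profile(text)
    return max(lang_profiles, key=lambda lang: similarity(doc, lang_profiles[lang]))


# Hypothetical sample texts; real frequency files are built from large corpora.
PROFILES = {
    "english": trigram_profile(
        "the quick brown fox jumps over the lazy dog and then "
        "the cat sleeps in the house with the others"
    ),
    "german": trigram_profile(
        "der schnelle braune fuchs springt über den faulen hund und dann "
        "schläft die katze in dem haus mit den anderen"
    ),
}

if __name__ == "__main__":
    print(identify_language("the dog and the cat are in the house today", PROFILES))
    print(identify_language("too short", PROFILES))
```

Returning None for short texts mirrors the rule that recognition is only attempted once enough terms are available.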

Language Processing

Not all words that appear in a piece of document text are equally significant as regards representing that document in an index. This is why language processing

takes place after the language recognition process. This involves generating terms that are significant as regards creating an index, and is done using various text

operations.

Text Operations for Language Processing

· Tokenization: Determining words and sentence boundaries

· Normalization: Normalizing orthography

· Tagging: Determining word types

· Stemming: Reducing words to their stem form (for example, mice → mouse)

· Stop words: Eliminating frequent words (such as "and" and "or")
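The text operations listed above can be chained into a small pipeline. This is a toy sketch under stated assumptions: the stop word list, the irregular stem lookup, and the plural rule are invented for illustration, tagging is omitted because it needs a real lexicon, and none of the function names belong to TREX.

```python
import re

STOP_WORDS = {"and", "or", "the", "a", "of"}            # illustrative list
IRREGULAR_STEMS = {"mice": "mouse", "geese": "goose"}   # illustrative lookup


def tokenize(text):
    """Tokenization: split the text into word tokens."""
    return re.findall(r"[^\W\d_]+", text)


def normalize(token):
    """Normalization: unify orthography (here only lower-casing)."""
    return token.lower()


def stem(token):
    """Stemming: reduce a word to a base form using toy rules."""
    if token in IRREGULAR_STEMS:
        return IRREGULAR_STEMS[token]
    if token.endswith("s") and len(token) > 3 and not token.endswith("ss"):
        return token[:-1]
    return token


def index_terms(text):
    """Run tokenization, normalization, stop word removal, and stemming."""
    terms = []
    for token in tokenize(text):
        token = normalize(token)
        if token in STOP_WORDS:
            continue
        terms.append(stem(token))
    return terms


if __name__ == "__main__":
    print(index_terms("Mice and cats"))
```

Stop words are removed before stemming so that the lookup works on the normalized surface form.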

 

Supported Languages

Use

TREX currently supports the following languages fully (May 2006/external software version 3.7.3):

● Arabic

● Chinese (simplified)

● Chinese (traditional)

● Danish

● German

● English

● Finnish

● French

● Dutch

● Italian

● Japanese


● Catalan

● Korean

● Croatian

● Norwegian (Bokmal)

● Norwegian (Nynorsk)

● Portuguese

● Russian

● Swedish

● Serbian

● Slovakian

● Slovenian

● Spanish

● Czech

You can find the most up-to-date information about the languages that TREX supports on SAP Service Marketplace service.sap.com/pam in the Platform

 Availability Matrix (PAM).

For a detailed description of how to subsequently activate other languages, see Configuring Language Recognition.

Integration

The languages that TREX fully supports and the languages of the SAP applications using TREX (such as Knowledge Management in SAP Enterprise Portal) can differ.

 

Supported Languages with Restricted Functionality

Use

TREX supports several other languages for which restrictions currently apply as regards TREX functionality. TREX currently supports the following additional languages with restrictions (May 2006/external software version 3.7.3):

● Greek

● Hebrew

● Polish

● Romanian

● Thai

● Turkish

● Hungarian

You can find the most up-to-date information about the languages that TREX supports on SAP Service Marketplace service.sap.com/pam in the Platform

 Availability Matrix (PAM).

For a detailed description of how to activate these additional languages, see Configuring Language Recognition.

Constraints

Certain restrictions apply because the linguistic processing development for these additional languages is still at a relatively early stage. TREX functions such as search, attribute query, query-based classification, and other functions that rarely or never use text-mining functions work at the same level of quality as for fully supported languages.

However, the linguistic text-mining functions sometimes deliver results of lower quality than is the case for the fully supported languages. Results of poor quality

can occur in the following areas:

● Feature extraction

Automatically calculated document and/or class features may be of poor quality.

● Example-based classification

When using this classification method, the elements may be classified less precisely.

● Linguistic search

Incomplete or unexpected grammatical variations of a search term may be returned in the search results list.

 


Languages that TREX Can Process

Use

TREX cannot recognize these languages directly, but it can process them. The SAP application using TREX sends documents to be processed to TREX and

delivers information on the document language at the same time. Using a list, the document language transmitted is mapped to a language to which it is related

and which TREX can process.

The application using TREX sends TREX a document written in Bulgarian. Bulgarian is mapped to the related language Russian and indexed in a Bulgarian

index. The document is further processed in Russian.

 

TREX can currently process the following languages by mapping them to a language that it supports:

Document Language    Processing Language

Afrikaans            Dutch
Bulgarian            Russian
Estonian             Finnish
Indonesian           English
Icelandic            Danish
Latvian              Polish
Lithuanian           Polish
Malaysian            English
Norwegian            Norwegian (Bokmal)
Serbian (Latin)      Czech
Ukrainian            Russian
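The mapping in the table can be expressed as a simple lookup. The sketch below only restates the table and the Bulgarian example from the text; the function name resolve_languages is hypothetical, and the fallback for unmapped languages (process in the document language itself) is an assumption.

```python
# Document language -> processing language, as listed in the table above.
PROCESSING_LANGUAGE = {
    "Afrikaans": "Dutch",
    "Bulgarian": "Russian",
    "Estonian": "Finnish",
    "Indonesian": "English",
    "Icelandic": "Danish",
    "Latvian": "Polish",
    "Lithuanian": "Polish",
    "Malaysian": "English",
    "Norwegian": "Norwegian (Bokmal)",
    "Serbian (Latin)": "Czech",
    "Ukrainian": "Russian",
}


def resolve_languages(document_language):
    """Return (index language, processing language) for a document.

    The document stays in the index of its own language; only the
    linguistic processing uses the mapped language.
    """
    processing = PROCESSING_LANGUAGE.get(document_language, document_language)
    return document_language, processing


if __name__ == "__main__":
    print(resolve_languages("Bulgarian"))
```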

 

Configuring Language Recognition

Use

Language recognition takes place first using the lexicon software of third-party providers and then using the TREX preprocessor. You can configure both types of 

language recognition.

Naming Convention

● Central directory for executable files <CENTRAL_DIR>

○ On UNIX: usr/SAP/<SAPSID>/SYS/exe/nuc/<OS>

○ On Windows: <drive>:usr\SAP\<SAPSID>\SYS\exe\nuc\<OS>

 As part of the CPE (Central Patch Environment), the sapcpe program takes on the automatic synchronization of executable files and copies them from the central

directory for executable files, <CENTRAL_DIR>, into the local directory for executable files, <TREX_DIR>\exe. When you restart TREX, the system automatically

launches the sapcpe program. During all subsequent starts, sapcpe checks whether or not the local executable files are up-to-date and copies new or changed

executable files from the central directory to the local directory, <TREX_DIR>\exe.

● TREX installation directory <TREX_INSTALL>

○ UNIX: /usr/sap/<sapsid>/trx<instance_number>/<TREX_host_name>

○ Windows: <disk_drive>:\usr\sap\<SAPSID>\

TRX<instance_number>\<trex_hostname>
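The synchronization that sapcpe performs between the central and the local directory can be pictured as follows. This is a conceptual sketch only and says nothing about how sapcpe is actually implemented: the function name sync_executables is hypothetical, and comparing file contents byte by byte is an assumed way of detecting changed files.

```python
import shutil
from pathlib import Path


def sync_executables(central_dir, local_dir):
    """Copy new or changed files from central_dir to local_dir.

    Returns the relative paths of the files that were copied. Files in
    local_dir that differ from the central copy are overwritten, which
    is why local-only edits do not survive a restart.
    """
    central, local = Path(central_dir), Path(local_dir)
    copied = []
    for src in sorted(p for p in central.rglob("*") if p.is_file()):
        relative = src.relative_to(central)
        dst = local / relative
        if not dst.exists() or src.read_bytes() != dst.read_bytes():
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
            copied.append(relative.as_posix())
    return copied
```

The practical consequence is the same as described in this section: a file edited only in the local directory is replaced whenever it differs from the central copy.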

Modifying Language Recognition with Lexicon Software

Language recognition with lexicon software includes the following areas:

● Configuring additional languages

You can retrospectively configure language recognition for additional languages. When TREX is installed, you select the languages to be identified by

language recognition.


Only activate the languages that appear in your documents and that you also want to index. Doing this optimizes the performance of the language recognition

procedure and of indexing in general. Moreover, the fewer languages used, the better the results of language recognition.

● Disregarding parts of HTML or XML documents

You can configure the system so that certain parts of HTML or XML documents are ignored when the language recognition procedure takes place.

Documents that are to be indexed and are in HTML or XML format often contain elements (such as JavaScript programs) that damage the performance of the

language recognition procedure.

● Changing the number of characters for language recognition

Language recognition using lexicon software is set up so that only a certain number of characters are taken into consideration. This is usually set to its

optimum value at delivery. If it turns out that languages are not being recognized correctly, you can increase the quantity of text that is taken into

consideration.

In certain cases, language recognition might not deliver the correct language. In particular, problems can occur when processing documents that are very short

or that contain a large number of abbreviations or words loaned from another language.

You modify the lexicon software language recognition by editing the std.langid-config configuration file on the TREX preprocessor. The settings are valid for all

indexes on the preprocessor. If you are using more than one TREX preprocessor, you need to modify the configuration file of each preprocessor.

1. Open the std.langid-config configuration file in the central directory for executable files, <CENTRAL_DIR>\lexicon, in a text editor.

2. In the section <encodings-languages-covered>, check the list of languages to be taken into consideration for the language recognition procedure. The list is

under <list key = "utf_8">.

Delete languages that you do not need, or flag them using <!-- -->.

You can add more languages to the list as needed as long as the languages in question are supported by the language recognition service. The following list

shows which languages you can use, and gives the entry that you enter into the list for each language.

 

Languages supported by TREX

Language Entry

Chinese (simplified) <item key = "simplified-chinese" />

Chinese (traditional) <item key = "traditional-chinese" />

Danish <item key = "danish" />

German <item key = "german" />

English <item key = "english" />

Finnish <item key = "finnish" />

French <item key = "french" />

Dutch <item key = "dutch" />

Italian <item key = "italian" />

Japanese <item key = "japanese" />

Korean <item key = "korean" />

Norwegian (Bokmal) <item key = "bokmal" />

Norwegian (Nynorsk) <item key = "nynorsk" />

Portuguese <item key = "portuguese" />

Swedish <item key = "swedish" />

Spanish <item key = "spanish" />

 

Languages supported by TREX with limited functionality

Language Entry

 Arabic <item key = "arabic" />

Greek <item key = "greek" />

Hebrew <item key = "hebrew" />

Polish <item key = "polish" />

Romanian <item key = "romanian" />

Russian <item key = "russian" />

Thai <item key = "thai" />

Czech <item key = "czech" />

Turkish <item key = "turkish" />

Hungarian <item key = "hungarian" />

Only limited text-mining functions are currently available for these additional languages. For more information about these languages, see Supported Languages

with Restricted Functionality.


In the following example, English, French, and German are taken into consideration. Italian is not.

<encodings-languages-covered>

<list key = "utf_8">

<item key = "english" />

<item key = "french" />

<item key = "german" />

<!-- item key = "italian" -->

...

All documents are converted to UTF-8 Unicode format before language recognition takes place. Therefore only the section <list key = "utf_8"> is relevant in

the language list. The other types of coding do not need to be modified.
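To check which languages are active after editing, the language list can be read programmatically. The sketch below is an assumption-laden illustration: the sample fragment completes the elided example with closing tags that are not shown in this document, so the real file layout may differ, and the function name active_languages is hypothetical. Entries that are commented out with <!-- --> are skipped automatically by the XML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, completed version of the fragment shown above.
SAMPLE = """
<encodings-languages-covered>
  <list key = "utf_8">
    <item key = "english" />
    <item key = "french" />
    <item key = "german" />
    <!-- item key = "italian" -->
  </list>
</encodings-languages-covered>
"""


def active_languages(xml_text, encoding_key="utf_8"):
    """Return the language keys listed for the given encoding list."""
    root = ET.fromstring(xml_text)
    for language_list in root.iter("list"):
        if language_list.get("key") == encoding_key:
            return [item.get("key") for item in language_list.findall("item")]
    return []


if __name__ == "__main__":
    print(active_languages(SAMPLE))
```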

3. The section <remove-markup-content> contains a list of markings that are ignored when language recognition takes place. All texts with these markings are

ignored.

The following example shows a section of the list.

<remove-markup-content>

<item key = "applet" />

<item key = "code" />

<item key = "script" />

...

<item key = "title" />

For example, if JavaScript programs (marked by <script>) occur in HTML documents, they are ignored when language recognition takes place.

If necessary, you can add more elements to the list, or remove existing elements from the list.

You have French documents that also contain a short summary in English. The summary is marked with the tag <English-Abstract>: <English-Abstract> This

is the abstract in English. ... </English-Abstract>. Add the line <item key = "english-abstract" /> to the list mentioned above.

On the other hand, if you want text marked with <title> to be taken into consideration for language recognition, you need to remove the line <item key = "title" />

from the list.
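The effect of the <remove-markup-content> list can be sketched as a preprocessing step that drops the listed elements, including their text, before the language is detected. This is an illustration only, not the lexicon software's algorithm: the regular-expression approach, the function name, and the assumption that element names are matched case-insensitively are all hypothetical. The list reuses the keys shown above plus the english-abstract entry from the example.

```python
import re

# Keys as they would appear in <remove-markup-content>.
REMOVE_MARKUP_CONTENT = ["applet", "code", "script", "title", "english-abstract"]


def strip_for_language_detection(markup, ignored=REMOVE_MARKUP_CONTENT):
    """Return only the text that should be used for language recognition."""
    text = markup
    for tag in ignored:
        # Remove the whole element, including the text it encloses.
        pattern = re.compile(
            r"<{0}\b[^>]*>.*?</{0}\s*>".format(re.escape(tag)),
            flags=re.IGNORECASE | re.DOTALL,
        )
        text = pattern.sub(" ", text)
    # Remaining tags are dropped, but the text they enclose is kept.
    text = re.sub(r"<[^>]+>", " ", text)
    return " ".join(text.split())


if __name__ == "__main__":
    document = (
        "<html><title>Report</title><body>"
        "<English-Abstract>This is the abstract in English.</English-Abstract>"
        "<p>Ceci est le texte.</p>"
        "<script>var x = 1;</script></body></html>"
    )
    print(strip_for_language_detection(document))
```

With the English abstract removed, only the French body text remains as input for language recognition.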

4. The section <detection-buffer-size> determines the quantity of text that is taken into consideration when a document is subjected to the language recognition

procedure. You can increase this value if you think that the quantity of text is too small.

However, this should only be done in exceptional circumstances. The larger the quantity of text, the longer language recognition, and therefore indexing, takes.

The value in the section <detection-buffer-size> cannot be greater than the value in the section <langid-buffer-size>.

5. Save the file and close the text editor.

6. Restart TREX.

For the changes to the std.langid-config configuration file in the <CENTRAL_DIR>\lexicon directory to take effect, you must restart TREX. When you restart

TREX, the sapcpe program copies the changed configuration files from the central directory for executable files, <CENTRAL_DIR>\lexicon, to the local TREX

directory, <TREX_DIR>\exe\lexicon, and overwrites the std.langid-config configuration file there.

You can also use the TREX admin tool (stand-alone), area Landscape → Ini, to change the std.langid-config configuration file and then have the changes take

effect by restarting the TREX preprocessor. Note that only the file in the <TREX_DIR>\exe\lexicon directory is changed if you use this method. If you have

changed the std.langid-config configuration file in the central directory, <CENTRAL_DIR>\lexicon, as described above and restarted TREX, the system

overwrites the changed file in the local directory, <TREX_DIR>\exe\lexicon, during the automatic synchronization by the CPE and the changes are lost.

Modifying Language Recognition with the TREX Preprocessor

The language of documents with only a small amount of text cannot be reliably identified. Therefore, TREX preprocessor language recognition is only activated if at

least seven terms (default value) can be recognized for each document. You can change this value if you are using TREX in a scenario with very short sentences.

You make modifications for TREX preprocessor language recognition in the TREXPreprocessor.ini configuration file.


1. Open the <TREX_INSTALL>\TREXPreprocessor.ini configuration file with a text editor.

2. In the section [lexicon], change the min_valid_tokens parameter.

The default value of this parameter is 7. Choose a lower value if you want the TREX preprocessor to try to identify the language of documents with fewer terms

per document.

3. Restart the TREX preprocessor.

You need to stop and restart the preprocessor for the new settings to take effect. You do this with the TREX admin tool (stand-alone), using the function for

starting and stopping the TREX servers. Note that the TREX daemon automatically restarts the server after it has been stopped. The settings are valid for all

documents indexed after the TREX preprocessor is restarted.

The new settings do not affect documents that have already been indexed. This means that if, for example, a document that has already been indexed has

been assigned to the wrong language, it must be reindexed.
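The edit in step 2 can also be scripted. The following sketch assumes that TREXPreprocessor.ini follows ordinary INI syntax that Python's configparser can read, which is not confirmed by this document; the function name set_min_valid_tokens is hypothetical. Note that configparser discards comments when writing, so a manual edit in a text editor remains the safer route for a production file.

```python
import configparser
import io


def set_min_valid_tokens(ini_text, value):
    """Return ini_text with [lexicon] min_valid_tokens set to value."""
    if value < 1:
        raise ValueError("min_valid_tokens must be a positive integer")
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep parameter names exactly as written
    parser.read_string(ini_text)
    if not parser.has_section("lexicon"):
        parser.add_section("lexicon")
    parser.set("lexicon", "min_valid_tokens", str(value))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    original = "[lexicon]\nmin_valid_tokens = 7\n"
    print(set_min_valid_tokens(original, 3))
```

As described above, the preprocessor still has to be restarted before the new value is used.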

 

File Formats Supported by TREX

Use

Documents whose content and attributes can be indexed and searched by TREX can exist in numerous different file formats. The TREX preprocessor converts

the document text and attributes of the different file formats into UTF-8 encoded HTML. The file filters of a special filter software are used to enable the subsequent

searching and indexing of all prevalent file formats such as MS Word, MS PowerPoint, PDF, and HTML.

 

Features

The table below lists all file formats that are currently supported by TREX.

Supported File Formats (May 2006/Version 8.1 of Filter Software)


File formats for text processing – generic Versions

 ASCII Text (7 & 8 bit versions available) All versions

 ANSI Text (7 & 8 bit) All versions

EBCDIC (Extended Binary Coded Decimal Interchange Code) All versions

HTML Versions up to and including 3.0

IBM Revisable Form Text All versions

IBM FFT All versions

Microsoft Rich Text Format (RTF) All versions

MHTML (MIME Encapsulation of Aggregate HTML Documents) No specific version

Text Mail (MIME) No specific version

Unicode Text All versions

UUEncode

WML Compatible with WML specification 5.2

XML No specific version

Special Features of HTML Files and XML Files

TREX processes HTML files and XML files without filtering, because the conversion to HTML is not necessary. In principle, the lexicon software integrated in

TREX ignores the text of the mark-up elements of the actual HTML and XML code, which is located between the tag brackets (<...>). In this way, texts such as

“font size”, “color”, and so on within the tag <font size="7" color="#FF0000"> are not passed on for indexing, because this information occurs in many HTML files and thus is not characteristic for the respective document content.

Using the mark-up elements, you can configure which texts within HTML and XML documents should not be indexed. For example, this makes sense in the case of JavaScript program code, which is marked in HTML by the tags <script type="text/javascript"...> ... </script>. The JavaScript program code itself does not

contain any characteristic content for the document in question and can thus be ignored.

For more information, see Excluding Parts of XML and HTML Files From Indexing.
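The behavior described in this section can be approximated with Python's standard HTML parser: text inside the tag brackets never reaches the index, and the content of excluded elements such as script is skipped. This is a sketch of the idea, not the lexicon software; the class name IndexableText and the default exclusion list are assumptions.

```python
from html.parser import HTMLParser


class IndexableText(HTMLParser):
    """Collect document text, ignoring markup and excluded elements."""

    def __init__(self, excluded=("script",)):
        super().__init__()
        self.excluded = set(excluded)
        self.depth = 0      # greater than 0 while inside an excluded element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Attribute names and values (for example size="7") are never stored.
        if tag in self.excluded:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.excluded and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth and data.strip():
            self.chunks.append(data.strip())


def indexable_text(html):
    """Return the text of an HTML document that would be passed on for indexing."""
    parser = IndexableText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    page = (
        '<font size="7" color="#FF0000">Annual report</font>'
        '<script type="text/javascript">var color = 1;</script>'
    )
    print(indexable_text(page))
```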

 


File formats for text processing - DOS Versions

DEC WPS Plus (DX) Versions up to and including 4.0

DEC WPS Plus (WPL) Versions up to and including 4.1

DisplayWrite 2 & 3 (TXT) All versions

DisplayWrite 4 & 5 Versions up to and including Release 2.0

Enable Versions 3.0, 4.0, and 4.5

First Choice Versions up to and including 3.0

Framework Version 3.0

IBM Writing Assistant Version 1.01

Lotus Manuscript Versions up to and including 2.0

MASS11 Versions up to and including 8.0

Microsoft Word Versions up to and including 6.0

Microsoft Works Versions up to and including 2.0

MultiMate Versions up to and including 4.0

Navy DIF All versions

Nota Bene Version 3.0

Novell WordPerfect Versions up to and including 6.1

Office Writer Versions 4.0 to 6.0

PC-File Letter Versions up to and including 5.0

PC-File+ Letter Versions up to and including 3.0

PFS:Write Versions A, B, and C

Professional Write Versions up to and including 2.1

Q&A Version 2.0

Samna Word Versions up to and including Samna Word IV+

SmartWare II Version 1.02

Sprint Version 1.0

Total Word Version 1.2

Volkswriter 3 & 4 Versions up to and including 1.0

Wang PC (IWP) Versions up to and including 2.6

WordMARC Versions up to and including Composer Plus

WordStar Versions up to and including 7.0

WordStar 2000 (DOS) Versions up to and including 3.0

XyWrite Versions up to and including III Plus

 

File formats for text processing - Windows Versions

 Adobe FrameMaker (MIF) Up to and including version 6.0

Corel/Novell WordPerfect for Windows Versions up to and including 10

Corel WordPerfect Suite for Windows Version 12.0

Hangul Version 97, 2002 (text only)

JustSystems Ichitaro Versions 5.0, 6.0, 8.0, 9.0, 10.0, 13.0, and 2004

JustWrite Versions up to and including 3.0

Legacy Versions up to and including 1.1

Lotus AMI/AMI Professional Versions up to and including 3.1

Lotus Word Pro (non-Windows) Version 96 -- Millennium Edition 9.6, text only

Lotus Word Pro (non-Windows)

Microsoft Works for Windows Versions up to and including 4.0

Microsoft Windows Write Versions up to and including 3.0

Microsoft Word for Windows Versions up to and including 2003

Microsoft WordPad All versions

Novell Perfect Works Version 2.0

Professional Write Plus Version 1.0

Q&A Write for Windows Version 3.0

StarOffice Writer for Windows and UNIX Version 5.2, 6.X, 7.X; text only

OpenOffice Version 1.1

WordStar for Windows Version 1.0


 

File formats for text processing - Macintosh Versions

MacWrite II Version 1.1

Microsoft Word for Mac Versions 3.0 - 4.0, 98, 2001, 2004, and v.X

Microsoft Works for Mac Versions up to and including 2.0

Novell WordPerfect Version 1.02 up to and including 3.0

 

Table Calculation Formats Versions

Enable Versions 3.0, 4.0, and 4.5

First Choice Versions up to and including 3.0

Framework Version 3.0

Lotus 1-2-3 (DOS & Windows) Versions up to and including 5.0

Lotus 1-2-3 (OS/2) Versions up to and including 2.0

Lotus 1-2-3 Charts (DOS & Windows) Versions up to and including 5.0

Lotus 1-2-3 for SmartSuite SmartSuite 97, Millennium and Millennium 9.6

Lotus Symphony Versions 1.0, 1.1, and 2.0

Microsoft Excel Charts Versions 2.x - 7.0

Microsoft Excel Macintosh Versions 3.0 – 98, 2004, and v.X

Microsoft Excel Windows Version 2.2 up to and including 2003

Microsoft Multiplan Version 4.0

Microsoft Works (DOS) Versions up to and including 2.0

Microsoft Works (Mac) Versions up to and including 2.0

Microsoft Works for Windows Versions up to and including 4.0

Mosaic Twin Version 2.5

Novell Perfect Works Version 2.0

PFS:Professional Plan Version 1.0

QuattroPro for DOS Versions up to and including 5.0

QuattroPro for Windows Versions up to and including version 12

SmartWare II Version 1.02

StarOffice Calc for Windows and UNIX Version 5.2, 6.X, 7.X; text only

OpenOffice Version 1.1

SuperCalc 5 Version 4.0

VP Planner 3D Version 1.0

 

Database Formats Versions

 Access Versions up to and including 2.0

dBASE Versions up to and including 5.0

DataEase Version 4.x

dBXL Version 1.3

Enable Versions 3.0, 4.0, and 4.5

First Choice Versions up to and including 3.0

FoxBase Version 2.1

Framework Version 3.0

Microsoft Works (DOS) Versions up to and including 2.0

Microsoft Works (Mac) Versions up to and including 2.0

Microsoft Works for Windows Versions up to and including 4.0

Paradox (DOS) Versions up to and including 4.0

Paradox (Windows) Versions up to and including 1.0

Personal R:BASE Version 1.0

R:BASE 5000 Versions up to and including 3.1

R:BASE System V Version 1.0

Reflex Version 2.0

Q & A Versions up to and including 2.0

SmartWare II Version 1.02

 


Presentation Formats Versions

Corel/Novell Presentations Versions up to and including 12

Harvard Graphics for DOS Versions 2.x & 3.x

Harvard Graphics for Windows Windows versions

Freelance for Windows Versions up to and including Millennium Edition 9.6

Freelance for OS/2 Versions up to and including 2.0

Microsoft PowerPoint for Macintosh Versions 4.0 up to and including 2004 and v.X

Microsoft PowerPoint for Windows Versions 3.0 up to and including 2003

StarOffice Impress for Windows and UNIX Versions 5.2 (text only), 6.X - 7.X (full support)

OpenOffice Version 1.1 (text only)

 

Graphic Formats Versions

In most cases, only the graphic type and name of the file is displayed for 

graphic formats. Only maintained properties are indexed for some graphic

formats. Text inside graphics cannot be indexed.

 

 Adobe FrameMaker Graphics (FMV) Version 5.0

 Adobe Illustrator Versions up to and including 9.0

 Adobe Photoshop (PSD) Version 4.0

 Adobe Portable Document Format (PDF)

Text inside PDF documents can normally be indexed. Text inside

graphics cannot be indexed. Some postscript fonts for text inside PDFs

cannot be indexed.

For more information, see SAP Note 622419: Embedded Fonts in PDF and

Postscript Documents.

Versions up to and including 6.0 (including PDF 1.5)

 

 AmiDraw (SDW) Ami Draw

AutoCAD Interchange and Native Drawing Formats (DXF and DWG) V. 2.5 - 2.6, 9.0 - 14.0, 2000i - 2002

 AutoShade Rendering (RND) Version 2.0

Binary Group 3 Fax All versions

Bitmap (BMP, RLE, ICO, CUR, OS/2, DIB & WARP) Windows

CALS Raster (GP4) Type I and Type II

Corel Clipart format (CMX) Versions 5 - 6

Corel Draw (CDR) Versions 3.0 - 8.0

Corel Draw (CDR with TIFF header) Versions 2.0 - 9.0

Computer Graphics Metafile (CGM) ANSI, CALS NIST version 3.0

Encapsulated PostScript (EPS) TIFF header only

GEM Paint (IMG) All versions

Graphics Environment Mgr. (GEM) Bitmap & Vector  

Graphics Interchange Format (GIF) All versions

Hewlett Packard Graphics Language (HPGL) Version 2

IBM Graphics Data Format (GDF) Version 1.0


IBM Picture Interchange Format (PIF) Version 1.0

Initial Graphics Exchange Spec (IGES) Version 5.1

JBIG2 (Joint Bi-level Image Experts Group) JBIG2 graphic embeddings in PDF

JFIF (JPEG not in TIFF format) All versions

JPEG (incl. EXIF) All versions

Kodak Flash Pix (FPX) All versions

Kodak Photo CD (PCD) Version 1.0

Lotus PIC All versions

Lotus Snapshot All versions

Macintosh PICT1 & PICT2 Bitmap only

MacPaint (PNTG) No specific version

MacroMedia Flash Macromedia Flash 6.x and 7.x, and Macromedia Flash Lite


Micrografx Draw (DRW) Versions up to and including 4.0

Micrografx Designer (DW) Versions up to and including 3.1

Micrografx Designer (DSF) Windows 95, Version 6.0

Novell PerfectWorks (Draw) Version 2.0

OS/2 Bitmap All versions

OS/2 PM Metafile (MET) Version 3.0

Paint Shop Pro (PSP) Versions 5.0 and 5.01

Paint Shop Pro 6 (PSP) Win32 only

PC Paintbrush (PCX and DCX) No specific version

Portable Bitmap (PBM) All versions

Portable Graymap (PGM) No specific version

Portable Network Graphics (PNG) Version 1.0

Portable Pixmap (PPM) No specific version

Postscript (PS) Levels 1 - 2

Progressive JPEG No specific version

StarOffice Draw for Windows and UNIX Versions 2, 6.x, 7.x

Sun Raster (SRS) No specific version

TIFF Versions up to and including 6

TIFF CCITT Group 3 & 4 Versions up to and including 6

Truevision TGA (TARGA) Version 2

Visio (Preview) Version 4

Visio Versions 5, 2000, 2002, and 2003

WBMP No specific version

Windows Enhanced Metafile (EMF) No specific version

Windows Metafile (WMF) No specific version

WordPerfect Graphics (WPG & WPG2) Versions up to and including 2.0

X-Windows Bitmap (XBM) x10 compatible

X-Windows Dump (XDM) x10 compatible

X-Windows Pixmap (XPM) x10 compatible

 

Compressed File Formats Versions

GZIP No specific version

LZA Self Extracting Compress No specific version

LZH Compress No specific version

Microsoft Binder Versions 7.0-97

MIME-encoded mail messages No specific version

UNIX Compress No specific version

UNIX TAR No specific version

ZIP PKWARE versions up to and including 2.04g

Special Features of Compressed File Formats (Archives)

The document content of files in an archive can only be indexed if TREX knows their file format. The filter software identifies the type of each file in the archive and filters its content accordingly. All files in an archive are handled as one large document.

The filter software may sometimes misidentify a file in an archive whose type it does not recognize and filter it as the wrong type. For example, if the content of binary files (*.bin) is filtered by accident and then indexed, the index fills with a large number of meaningless terms.

You can respond to this issue in two ways:

1. You can exclude compressed file formats (archives) from processing by the preprocessor by removing the corresponding MIME type (for example, application/zip) from the TREXValidMimeTypes.ini configuration file.

For more information about this procedure, see Excluding File Formats from Processing.

2. You can modify the filter software configuration file, default.tpt, so that the names of the files in the archive are indexed, but not their content.

For more information about this procedure, see SAP Note 900742.

 

Other File Formats Versions

Executables (EXE, DLL) No specific version

Executables for Windows NT No specific version

Microsoft Office 2003 for Windows Version 2003

Microsoft Outlook Message (MSG) Text and HTML; codepage CP1252 (ISO 8859-1) and Unicode

Microsoft Project Versions 2003/2002/2000/1998, text only

MP3 ID3 (Identify an MP3) Information

Signature Version 1.0

vCalendar No specific version

vCard Electronic Business Card Version 2.1

Yahoo! Instant Messenger Versions 6.x and 7.x

 

Excluding File Formats from Processing

Use

You can use a list of MIME types in the configuration file TREXValidMimeTypes.ini to control which file formats TREX processes. MIME types for graphic formats such as image/jpeg, image/gif, and image/bmp are not listed in the configuration file, although the filter software integrated into TREX supports these formats (see Supported File Formats). This exclusion keeps TREX from being burdened unnecessarily with these formats, since it does not normally make sense to index images and graphics. There may be other scenarios where it makes sense to exclude certain file formats.

Example: A company archives its financial statements as PDF files. These files contain mostly figures, with hardly any relevant text. Processing these large files would slow TREX down without benefiting the indexing of the content. It therefore makes sense to exclude these files from processing.

Procedure

You exclude the document content of a particular file format from being processed by TREX by removing the corresponding MIME types from the configuration file

TREXValidMimeTypes.ini. Proceed as follows to do this.

1. Stop TREX.

2. Open the configuration file <TREX_installation_directory>\TREXValidMimeTypes.ini with a text editor.

The configuration file TREXValidMimeTypes.ini is located in the TREX installation directory. The path to the directory is:

○ On UNIX: /usr/sap/trex_<instance_number>

○ On Windows: <disk_drive>:\usr\sap\trex_<instance_number>

3. Remove the entry for the file format that you want to exclude from the list.

You do not want TREX to process PDF files because such files contain no relevant text information for your scenario. You remove the entry application/pdf from

the list of MIME types in the configuration file TREXValidMimeTypes.ini.

4. Save the file.

5. Start TREX.
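
Step 3 of the procedure above can be sketched in Python. This is only an illustration: it assumes that TREXValidMimeTypes.ini lists one MIME type per line, which this documentation does not confirm, and the function name remove_mime_type is hypothetical. Check the actual layout of your file before you script any change, and stop TREX first as described in step 1.

```python
def remove_mime_type(ini_text, mime_type):
    """Return ini_text without the line that names mime_type.

    Assumption (not confirmed by the documentation): the file lists
    one MIME type per line. The comparison ignores case and
    surrounding whitespace.
    """
    wanted = mime_type.strip().lower()
    kept = [line for line in ini_text.splitlines()
            if line.strip().lower() != wanted]
    return "\n".join(kept) + "\n"


# Hypothetical file content, used only for illustration.
sample = "application/msword\napplication/pdf\ntext/html\n"
print(remove_mime_type(sample, "application/pdf"))
```

Working on the file content as a string keeps the sketch independent of the installation path; reading and writing the real file is left to the caller.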

 

List of MIME Types in the Configuration File TREXValidMimeTypes.ini

MIME Type File Extension Application

application/andrew-inset ec

application/dca-rft rft IBM Revisable Form Text

application/excel xls MS EXCEL

application/macwriteii MWII MacWrite II

application/msword doc,dot MS Word

application/oda oda CALS Raster (GP4)

application/pdf pdf Adobe PDF

application/powerpoint ppt MS Powerpoint

application/rtf rtf Rich Text Format

application/smil smil, smi

application/vnd.lotus-1-2-3 123, w4, w3, w1 Lotus 1-2-3

application/vnd.lotus-freelance prz, pre Lotus Freelance

application/vnd.lotus-wordpro lwp, sam Lotus WordPro

application/vnd.ms-excel xls, xlb MS EXCEL

application/vnd.ms-powerpoint ppt, pps, pot MS PowerPoint

application/vnd.ms-wpl wpl DEC WPS Plus (WPL)

application/wordperfect5.1 wp5 Word Perfect 5.1

application/x-123 w1, wk3, wk4, wks Lotus 1-2-3 (DOS & Windows)

application/x-cdlink vcd

application/x-chess-pgn pgn

application/x-compress UNIX compress

application/x-csh csh UNIX CShell Script

application/x-dvi dvi

application/x-freelance pre Freelance for Windows

application/x-gtar gtar GNU UNIX tar archive

application/x-gzip gz, tgz GNU Zip compressed data

application/x-httpd-php

application/x-javascript js JavaScript

application/x-latex latex LaTex

application/x-maker frm, maker, frame, rm, fb, book, fbdoc Adobe FrameMaker  

application/x-mif mif Adobe FrameMaker (MIF)

application/x-msdos-program dll Dynamic Link Library

application/x-msexcel xls, xlb MS EXCEL

application/x-msmetafile wmf MS Metafile

application/x-netcdf nc, cdf  

application/x-ns-proxy-autoconfig pac Netscape Proxy Auto Config

application/x-perl pl, pm Perl Program

application/x-sh sh UNIX Bourne Shell Script

application/x-tar tar UNIX tar Archive

application/x-tcl tcl TCL Script

application/x-tex tex

application/x-texinfo texinfo, texi

application/x-troff t, tr, troff UNIX troff document

application/x-troff-man man UNIX man page

application/x-troff-me me UNIX troff document

application/x-troff-ms ms UNIX troff document

application/x-ustar ustar  

application/x-wais-source src

application/xlc xlc

application/zip zip

File formats of the MIME types text/*, including HTML, XML, and plain text formats such as *.txt and *.rtf, are processed by TREX without being filtered.

text/asp asp Active Server Pages

text/css css Cascading Style Sheets

text/html html, htm, shtml Hypertext Markup Language

text/plain txt, c, ec, cpp, h, hpp, eml, sap

text/richtext rtx

text/rtf rtf  

text/src-c c

text/src-c++ cpp

text/src-java java

text/src-perl perl

text/src-tcl tcl

text/tab-separated-values tsv

text/thtml

text/vnd.wap.wml wml

text/wiki

text/wml wml

text/x-asm

text/x-setext

text/x-sgml

text/x-ssi-html

text/x-uil

text/x-uuencode

text/x-vCalendar 

text/x-vCard

text/xml xml Extensible Markup Language

 

Excluding Parts of XML and HTML Files From Indexing

Use

XML (Extensible Markup Language) and HTML (Hypertext Markup Language) are mark-up languages, which structure and display the text in a document using mark-up elements. In the <TREX_installation_directory>\Lexicon\std.html-config file, you can use these mark-up elements to define which texts within HTML and XML documents should not be indexed.

This makes sense in the following cases:

· Excluding technical information from indexing

For example, you can exclude JavaScript program code from indexing, which is marked in HTML by the tags <script type="text/javascript"...> ... </script>. The code marked by these tags does not contain any content that is characteristic of the document and can therefore be ignored during the indexing run.

· Exclude redundant text parts from indexing

You can exclude text parts from indexing if they are identical in more than one XML or HTML file and thus do not contain any information about the respective

document content.

Excluding Technical Information From Indexing

1. Open the <TREX_Installation_Directory>\Lexicon\std.html-config configuration file with a text editor.

You must change entries in the sections <remove-region> and <multimedia-markup> in the std.html-config file. In each of these sections, you can find a list of 

mark-up elements for XML or HTML code. The texts that are marked by these elements in the XML or HTML file are not taken into account during indexing. In

the case of HTML, these are mark-up elements that contain technical information about processing and displaying HTML files.

The following examples each contain an extract from these lists:

○ <remove-region>

<item key = "applet" />

<item key = "code" />

<item key = "script" />

...

<item key = "title" />

○ <multimedia-markup>

<item key = "applet" />

<item key = "code" />

<item key = "script" />

<item key = "server" />

...

<item key = "title" />

 

Special features of XML files

The selection of mark-up elements in the std.html-config configuration file is based on the fact that all HTML language elements are standardized and defined at an international level. For HTML, it is therefore guaranteed that the mark-up elements listed (<applet>, <script>, <code>, and so on) contain only technical information and no text relevant to the document content.

In XML, however, you can use a DTD (Document Type Definition) or an XML schema to define your own language elements. Their names can be identical to those of HTML language elements, and they can contain text that is relevant to the document content. If you are indexing XML files, you therefore need to check whether any of your elements have the same names as elements in the list, and remove the affected elements from the list so that TREX processes them.

2. Remove an element from the list or add an element to the list:

○ Remove an element from the list if you want the system to index the text that is marked by this element.

You do this, for example, by deleting the line <item key = "applet" /> from the list.

○ Add an element to the list if you do not want the system to index the text that is marked by this element.

You do this by adding the line <item key = "Markup_Element" /> to the above list, replacing Markup_Element with the element that you want to exclude from processing.

Note that the list in the std.html-config configuration file contains certain default elements that are not taken into account during indexing.

3. Save the file and close the text editor.

4. Stop the TREX preprocessor and restart it, so that the new settings take effect. You start and stop the preprocessor using the function for starting and stopping

the TREX servers in the TREX admin tool (stand-alone).

Note that the TREX daemon automatically restarts the server after it has been stopped. The settings are valid for all documents indexed after the TREX

preprocessor is restarted. The new settings do not affect documents that have already been indexed.

Exclude Redundant Text Parts From Indexing

To exclude redundant text parts from indexing, proceed as follows:

1. Flag these text parts in the relevant XML or HTML documents using a dedicated mark-up element (for example, <trexignore> ... </trexignore>).

Note that, in the case of an XML file, you must define the new mark-up element in the associated DTD or XML schema; otherwise the XML document is not valid. In the case of HTML, the browser ignores the new mark-up element when displaying the document, because it is not part of the HTML standard.

2. Add the newly defined mark-up element (for example, trexignore) to the two sections <remove-region> and <multimedia-markup> in the std.html-config file as <item key = "trexignore" />, as described in the procedure Excluding Technical Information From Indexing (see above).
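
To see what excluding a region means for the text that reaches the index, the following Python sketch may help. It is not the TREX filter and makes no claim about how TREX works internally; it uses the standard library HTML parser to drop all text nested inside a set of excluded elements. The class RegionFilter, the function indexable_text, and the element list are hypothetical names chosen for this illustration.

```python
from html.parser import HTMLParser


class RegionFilter(HTMLParser):
    """Collects text and skips anything nested inside excluded elements."""

    def __init__(self, excluded):
        super().__init__(convert_charrefs=True)
        self.excluded = {name.lower() for name in excluded}
        self.depth = 0      # greater than 0 while inside an excluded element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.excluded:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.excluded and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep text only when no excluded element is currently open.
        if self.depth == 0 and data.strip():
            self.parts.append(data.strip())


def indexable_text(markup, excluded=("applet", "code", "script", "trexignore")):
    """Return the text that remains after the excluded regions are removed."""
    parser = RegionFilter(excluded)
    parser.feed(markup)
    parser.close()
    return " ".join(parser.parts)


page = ('<p>Annual report</p>'
        '<script type="text/javascript">var x = 1;</script>'
        '<trexignore>Standard footer</trexignore>'
        '<p>Revenue grew.</p>')
print(indexable_text(page))
```

In this sketch the JavaScript code and the flagged footer never reach the result, which mirrors the effect the entries in <remove-region> are meant to have during indexing.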

 

Changing Proxy Server Settings

Use

The TREX preprocessor prepares documents for indexing by the TREX engines. The application using TREX (for example, Content Management in SAP

Enterprise Portal) transmits the documents to be indexed to the preprocessor in the form of URIs that reference the storage location of the documents in question.

The preprocessor resolves these URIs and collects the actual documents from a Web server using HTTP.

Access to Web pages can take place using a proxy server, regardless of whether the pages are on the Internet or on an intranet. If you want to index documents that can only be accessed using a proxy server, you have to register the proxy server with the TREX preprocessor.

There might also be documents in your environment that can be accessed without a proxy server, for example, documents on local servers or your enterprise’s

external homepage. You can inform the preprocessor of the servers it can access without a proxy server. This speeds up the processing of documents on these

servers.

You specified settings for the proxy server when you installed TREX. If you want to change this later on, modify the TREXPreprocessor.ini configuration file on the

server on which the TREX preprocessor is running.

The graphic below shows a portal scenario. Some of the documents to be indexed are located on servers on the intranet, others on servers on the Internet. The

documents on the Internet can only be reached using a proxy server. The proxy server is not needed for documents on the intranet.

Enter the proxy server into the section [httpclient] in the configuration file TREXPreprocessor.ini so that TREX can load external documents. Enter exclusion

rules for internal documents into the section [proxyrules].

Procedure

1. Open the configuration file <TREX_Installation_Directory>\TrexPreprocessor.ini on the server on which the TREX preprocessor is running. Use a text editor to do this.

2. Modify the following parameters in the [httpclient] and [proxyrules] sections:

[httpclient]

proxyhost=<name_of_proxy> (host name and domain of the proxy server)

Example: proxy.mylocation.mycompany.com

proxyport=<proxy port>

Example: 8080

proxyuser=<user_for_the_proxy> (optional)

You only need to fill in the proxyuser line if a user ID is needed to access the proxy server.

proxypassword=<password_for_user> (optional)

You only need to fill in the proxypassword line if a password is also needed for the user ID.

You can specify the password for the proxy user during the installation of TREX. You can use a script to change this password later on or to define a password if you did not enter one when installing TREX. For more information, see Configuring TREX Security Settings → Specifying the Password for the Proxy Server.

The list of parameters must not contain empty lines. Keep to the format outlined above. The system distinguishes between lowercase and uppercase.

[proxyrules]

Specify the addresses for which the proxy server is not to be used. You normally enter one or more character strings with which the addresses in your intranet end.

Example: mycompany.com or mylocation.mycompany.com

Do not use the asterisk (*) as a placeholder. Lines that begin with # or ! are treated as comments and are therefore ignored. This is also true for IP addresses.

To exclude the IP address space 10.10.0.0-10.10.255.255, add the line 10.10. to the [proxyrules] section. This ensures that no proxy is used for URLs that contain IP addresses in this space.

3. Save the file and close the text editor.

4. You have to stop and restart the TREX preprocessor so that it recognizes the changes to the configuration file TREXPreprocessor.ini. You do this using the TREX admin tool (stand-alone). For more information, see Starting and Stopping the TREX Servers. Note that the TREX daemon automatically restarts the servers after they have been stopped.
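
The exclusion rules in the [proxyrules] section can be tried out with a small Python sketch before you edit the configuration file. The sketch rests on assumptions drawn from the description above, not on the preprocessor's actual code: a host name bypasses the proxy if it ends with a rule, an IPv4 address bypasses it if it begins with a rule, and lines that start with # or ! are comments. The function names parse_rules and bypasses_proxy are hypothetical.

```python
from urllib.parse import urlsplit


def parse_rules(section_text):
    """Return the rule strings from the body of a [proxyrules] section."""
    rules = []
    for line in section_text.splitlines():
        line = line.strip()
        # Empty lines and comment lines (# or !) carry no rule.
        if not line or line.startswith(("#", "!")):
            continue
        rules.append(line)
    return rules


def bypasses_proxy(url, rules):
    """Assumed matching: suffix match for host names, prefix match for IPv4."""
    host = urlsplit(url).hostname or ""
    is_ipv4 = host.replace(".", "").isdigit()
    for rule in rules:
        if is_ipv4 and host.startswith(rule):
            return True
        if not is_ipv4 and host.endswith(rule):
            return True
    return False


rules = parse_rules("# intranet hosts\nmycompany.com\n! address space\n10.10.\n")
print(bypasses_proxy("http://portal.mylocation.mycompany.com/doc.html", rules))
print(bypasses_proxy("http://www.example.org/", rules))
```

If the real preprocessor matches addresses differently, for example by substring, only bypasses_proxy would need to change.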

 

Activating Python Extensions

Use

Some TREX functions are implemented as Python extensions. If the application that uses TREX relies on these functions, you have to activate the Python extensions.

The installation documentation for the application in question contains information on whether you have to activate any Python extensions.

The following Python extensions are available:

Extension    Description

XML attribute extraction    Extracts the attributes to be indexed from XML files. This extension is required if the texts to be indexed consist only of attributes and the attributes are transmitted to TREX as XML files.

Expansion of linguistic search queries    Enhances linguistic search queries so that TREX can carry out an exact search as well as a linguistic search.

Metadata extraction    Extracts metadata from HTML documents.

Topic maps    Uses topic maps to determine terms that have a semantic relationship to the search term. The semantic relationships involved depend on the structure of the topic map. In most cases the topic map stores synonyms, hypernyms, and hyponyms (superordinate and subordinate terms).

Semantic search    Uses topic maps to enhance search queries with additional search terms. This extension allows you to include lists of synonyms in the search, for example.

 

The following procedure explains how you activate the Python extensions globally for all indexes.

If you need to activate Python extensions locally for your application, the relevant information can be found in SAP Note 700771.

The global activation consists of the following two steps:

1. Activate the Python extension handler.

2. Register the required Python extensions.

Activating the Python extension handler

1. Edit the configuration file <TREX_DIR>/TREXExtensions.ini.

2. Check that the [activate] section has the structure below, and modify the section if necessary.

[activate]

imsapi=search, thesaurus, admin

preprocessor

3. In the [extensionhandlers] section, add the line trexxpy or, if the line is already present but commented out, remove the comment sign (#).

[extensionhandlers]

trexxpy
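
After editing the file, you may want to confirm that the handler entry is really active and not still commented out. The Python sketch below does this with the standard library configparser module. It assumes that TREXExtensions.ini can be read as an ordinary INI file in which entries such as trexxpy are keys without values; the documentation does not guarantee this, and the function name python_handler_enabled is hypothetical.

```python
import configparser


def python_handler_enabled(ini_text):
    """Return True if [extensionhandlers] contains an active trexxpy entry."""
    # allow_no_value accepts bare entries such as "trexxpy" and "preprocessor".
    parser = configparser.ConfigParser(allow_no_value=True)
    parser.read_string(ini_text)
    return (parser.has_section("extensionhandlers")
            and parser.has_option("extensionhandlers", "trexxpy"))


active = """[activate]
imsapi=search, thesaurus, admin
preprocessor

[extensionhandlers]
trexxpy
"""
print(python_handler_enabled(active))
print(python_handler_enabled(active.replace("trexxpy", "#trexxpy")))
```

A line that begins with # is treated as a comment by configparser, so a commented-out entry is reported as inactive.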

Registering the Python extensions

The directory <TREX_DIR>\extensions\example contains the file _extensions.py. This serves as a template for the configuration file extensions.py.

1. Copy the file _extensions.py to the TREX installation directory <TREX_DIR> and rename it to extensions.py.

2. Edit the configuration file extensions.py.

3. In the relevant section, change the entry if 0: to if 1:. You identify the extensions by the class name.

Extension Class

XML attribute extraction XmlExtractor  

Expansion of linguistic search queries LinguistFix

Metadata extraction AttributeExtractor  

Topic maps XtmExpander  

Semantic search SemanticSearch

 

Register XML attribute extraction:

# XML attribute extractor extension

# --------------------

if 1:

    sys.path.append(os.path.join(os.getenv('SAP_RETRIEVAL_PATH'),
                                 'extensions', 'attribute-extractor'))

    from xmlextractor import XmlExtractor

    trexx.registerExtension(trexx.EXTCLASS_INDEXING,
                            XmlExtractor(debug=0, mimetypes=['text/xml']))

Result

The changes take effect when you next start the TREX daemon.

If you want to use the semantic search or topic maps, you must carry out further configuration steps. If necessary, contact SAP Support.

If errors occur during routine operation and the required functions are not available, check the trace file (<TREX_DIR>/trace/PythonExtension.log). This contains

information on the incorrect entries in the TREX configuration files. If you cannot solve the problem, contact SAP support.

 

Configuration of the TREX Services in the SAP J2EE Engine

Use

TREX provides programming interfaces (Application Programming Interfaces, APIs) for the languages Java and ABAP that allow access to all TREX functions.

The Java interface (TREX Java client) is part of the SAP Web AS Java as a TREX service.

The graphic below shows the TREX Java client as the interface between the TREX servers and the Java application that uses TREX (for example, Knowledge

Management (KM)):

 

The configuration of the TREX service in the SAP J2EE Engine comprises the following areas.

● TREX Caches

TREX uses caches in the portal to store search results temporarily, for example. You use the configuration of the caches to display the caches and modify

them to your requirements.

There are the following TREX caches:

○ Administration Cache

○ Memory Cache

● TREX Java Client

Java applications use TCP/IP and HTTP/XML to access the TREX search and text-mining functions through the TREX Java client that is part of the SAP

Web AS as a TREX service. The TREX Java client needs to know the address of the TREX name server in order to communicate with the TREX servers.

You configure this during the TREX installation. You sometimes have to configure the other parameters for this communication too.

For more information about configuring the TREX name server, see Specifying the Address of the TREX Name Server .

The following areas of the TREX service configuration are displayed in the SAP J2EE Visual Administrator :

○ TCP/IP communication

○ SSL

○ Name server 

○ Cache for search queries

Note that this documentation only describes those parameters and values for the TREX service that you have to configure in specific circumstances.

 

TREX Caches

Use

The TREX caches are used by the TREX infrastructure of the portal to store search results temporarily, for example. You use the configuration of the caches to

display the caches and modify them to your requirements. Normally, you do not change these settings. You use the Visual Administrator to display the TREX

cache in the SAP J2EE Engine.

Features

The TREX caches comprise the following cache types:

● Administration cache

● Memory cache

Note that only those parameters and values for the TREX service are described that you have to configure in specific circumstances.

Administration Cache

The administration cache is an in-memory cache for objects. It stores TREX commands that are initiated by the TREX administration control.

Key    Value    Description

cache.trexadmin.capacity    100    Capacity of the cache; this value depends on the number of different search requests.

cache.trexadmin.defaulttimetolive    300    Expiry time for the cache; specifies in seconds how long the cache entry is to exist.

 

Memory Cache

The memory cache is an in-memory cache for objects. It stores search queries and the associated responses.

Key    Value    Description

cache.trexmemory.capacity    100    Capacity of the cache; this value depends on the number of different search requests.

cache.trexmemory.defaulttimetolive    300    Expiry time for the cache; specifies in seconds how long the cache entry is to exist.

The required caches have already been selected. Do not change these settings.
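
The interplay of the two parameters, capacity and defaulttimetolive, can be illustrated with a short Python sketch. It is a generic cache written for this explanation and not the cache of the TREX service: the class name TtlCache and the eviction rule (drop the oldest entry when the cache is full) are assumptions. Only the default values 100 and 300 are taken from the tables above.

```python
import time
from collections import OrderedDict


class TtlCache:
    """Generic cache with a maximum size and a per-entry lifetime in seconds."""

    def __init__(self, capacity=100, default_time_to_live=300,
                 clock=time.monotonic):
        self.capacity = capacity
        self.time_to_live = default_time_to_live
        self.clock = clock
        self._entries = OrderedDict()   # key -> (expires_at, value)

    def put(self, key, value):
        self._entries.pop(key, None)
        if len(self._entries) >= self.capacity:
            # Assumed policy: evict the oldest entry when the cache is full.
            self._entries.popitem(last=False)
        self._entries[key] = (self.clock() + self.time_to_live, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            # The entry has outlived its time to live.
            del self._entries[key]
            return None
        return value


cache = TtlCache(capacity=100, default_time_to_live=300)
cache.put("query: annual report", ["doc1", "doc7"])
print(cache.get("query: annual report"))
```

A larger capacity keeps results for more distinct queries, while a longer time to live keeps each result available for longer at the risk of returning outdated hits.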

 

TREX Java Client

Use

The TREX Java client is an interface that Java applications can use to access Search and Classification (TREX) functions. Communication between a Java

application and TREX can take place directly using TCP/IP or using a TREX HTTP server and HTTP/XML.

Integration

Java applications use TCP/IP and HTTP/XML to access the TREX search and text-mining functions through the TREX Java client that is part of the SAP Web AS

as a TREX service. The TREX Java client needs to know the address of the TREX name server in order to communicate with the TREX servers. You configure this

during the TREX installation. You sometimes have to configure the other parameters for this communication too.

Features

In order to communicate with TREX, the individual TREX servers and their parameters must be registered with the TREX Java client. You do this by configuring

the name server, which then carries out further steps itself.

Overview of Areas and Parameters in the TREX Java Client

Function    Description

TCP/IP Communication    Parameters of TCP/IP communication between the Java client and the TREX servers.

SSL    Parameters for secure communication using SSL/HTTPS between the TREX components, the portal, and Content Management.

Name Server    Parameters for configuration of the TREX name server.

Cache for Search Queries    Display of caches that are used to store TREX search queries temporarily.

 

TCP/IP Communication

Use

The TREX Java client can access the TREX servers directly using TCP/IP communication. If required, you must configure the parameters for TCP/IP

communication specified below for secure communications.

Features

Parameters Relevant for TCP/IP Communication

Key    Value    Description

communication.issecure    false    You must change this value to true if TREXNet has been configured for secure communication.

For more information about the configuration of TREXNet for secure communication, see Configuring TREXNet for Secure Communication.

 

SSL

Use

Here you configure the parameters for secure communication (with HTTPS) between the TREX Java client, which is integrated in the Web AS Java as a TREX service, and the TREX Web server. The Java client and Web server both need a certificate issued by the same certification authority (CA) in order to communicate with one another securely.

● The Java client needs a client certificate.

● The Web server needs a server certificate.

● Both components need the root certificate of the CA that issues the other two certificates.

For more information about the configuration of secure communication between the TREX Java client and the TREX Web server, see Configuration of the TREX Security Settings → Providing the Certificates for the Java Client.

 

Features

Relevant SSL Parameters

Key    Value    Description

default.keystore    TREXKeyStore    Keystores in which the certificates for secure communication between the Java client and CM are stored (public key and private key certified by the CA).

default.truststore    TrustedCAs    Keystores in which the certificates of certification authorities (CAs) that you trust are stored.

 

Name Server 

Use

The TREX name server stores and coordinates system-wide information on the TREX installation and on communication between the TREX servers and CM. The

name server settings automatically determine the parameters of the HTTP server, queue server, and index server. There can be scenarios that implement more

than one name server. If this is the case, they are listed here.

The TREX Java client communicates with the central name server directly using TCP/IP, not through the HTTP server using HTTP/XML.

Features

Overview of Parameters for TREX Name Servers

Key    Value    Description

nameserver.address    tcpip://<nameserver>:<nameserverport> (by default, the name server port is predefined)    Address of the central name server currently being used. The name server manages the topology of a TREX installation.

nameserver.backupserverlist    tcpip://<nameserver>:<nameserverport1>, <nameserverport2>, <nameserverport3>, ... (multiple name servers are separated by commas)    List of all available name servers.
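
A value for nameserver.address can be checked for the tcpip://<nameserver>:<nameserverport> form with a few lines of Python. This is only a validation sketch for administrators. The function name parse_name_server_address, the host name, and the port 30001 are hypothetical and not taken from a TREX installation.

```python
from urllib.parse import urlsplit


def parse_name_server_address(value):
    """Split a tcpip://host:port value into (host, port) or raise ValueError."""
    parts = urlsplit(value.strip())
    if parts.scheme != "tcpip" or parts.hostname is None or parts.port is None:
        raise ValueError(f"not a tcpip://host:port address: {value!r}")
    return parts.hostname, parts.port


# Hypothetical host and port, used only for illustration.
print(parse_name_server_address("tcpip://trexhost.mycompany.com:30001"))
```

The sketch deliberately leaves nameserver.backupserverlist aside, because the exact list syntax should be taken from a working configuration.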