Date post: | 14-Apr-2018 |
Category: |
Documents |
Upload: | ravibabu1620 |
7/28/2019 TREX7.0
http://slidepdf.com/reader/full/trex70 1/105
PRINT FROM SAP HELP PORTAL
Document:TREX 7.0
URL:http://help.sap.com/erp2005_ehp_06/helpdata/en/40/83505303bd5616e10000000a114cbd/content.htm
Date created: August 18, 2013
© 2013 SAP AG or an SAP affiliate company. All rights reserved. No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG. The information contained herein may be changed without prior notice. Some software products marketed by SAP AG and its distributors contain proprietary
software components of other software vendors. National product specifications may vary. These materials are provided by SAP AG and its affiliated companies ("SAP Group") for
informational purposes only, without representation or warranty of any kind, and SAP Group shall not be liable for errors or omissions with respect to the materials. The only
warranties for SAP Group products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein
should be construed as constituting an additional warranty. SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or
registered trademarks of SAP AG in Germany and other countries. Please see www.sap.com/corporate-en/legal/copyright/index.epx#trademark for additional trademark information
and notices.
Note
This PDF document contains the selected topic and its subtopics (max. 150) in the selected structure. Subtopics from other structures are not included. The selected structure has more than 150 subtopics. This download contains only the first 150 subtopics. You can manually download the missing subtopics.
PUBLIC© 2013 SAP AG or an SAP affiliate company. All rights reserved.
Page 1 of 105
TREX 7.0
Purpose
Applications based on SAP NetWeaver 7.0 or SAP NetWeaver 7.3 can use TREX 7.0 or TREX 7.1.
Documentation Structure
This documentation is organized into the following areas:
● TREX Architecture
This area contains information about the TREX architecture, the TREX components, and their functions.
● TREX Configuration
This area contains all relevant procedures that describe how you can configure TREX. The configuration is organized as follows:
○ Post-Installation Configuration
○ Initial Configuration
○ Advanced Configuration
● TREX Administration
Here you can find information about administrating TREX:
○ Starting and Stopping TREX
○ TREX Admin Tools
○ Data Backup and Restore for TREX
○ Monitoring TREX with CCMS
TREX Architecture
TREX is based on a client/server architecture. The client component is integrated into the application that uses the TREX functions, and allows communication
with the TREX servers. The server component processes the requests; it indexes and classifies documents and answers search queries.
The client component is subdivided into the Java client and ABAP client. The server component is subdivided into the following servers:
● Web server with TREX extension
● RFC server
● Queue server
● Preprocessor
● Index server
● Name server
The graphic below shows the individual components and the communication between components:
Java Client and ABAP Client
TREX provides programming interfaces (Application Programming Interfaces, APIs) for the languages Java and ABAP. These interfaces are also called the Java
client and the ABAP client.
The interfaces allow access to all TREX functions. You can use the interfaces to create indexes and queues, to perform indexing, and to perform searches. In
addition, the interfaces provide functions to query the internal status of TREX.
The interfaces are part of the NetWeaver Application Servers (NW AS).
Web Server with TREX Extension
The Web server is responsible for the communication between Java applications and the TREX servers. The application sends requests to the Web server in
XML format using HTTP/HTTPS. The Web server converts the requests to a TREX-internal format and then forwards them to the responsible TREX servers.
A TREX component that enhances the Web server with TREX-specific functions is installed on the Web server. Technically, this component is implemented as
follows:
· On Windows, as an ISAPI server extension for the Microsoft Internet Information Server
· On UNIX, as a shared library for the Apache Web server
RFC Server
The RFC server is responsible for the communication between an SAP system and the TREX servers.
The SAP system sends requests to an RFC server using an SAP Gateway. The RFC server converts the requests to a TREX-internal format and then forwards
them to the responsible TREX servers.
Queue Server
The queue server coordinates the processing steps that take place during indexing. It collects incoming documents and triggers preprocessing by the preprocessor
and further processing by the index server.
The queue server enables documents to be indexed asynchronously. This has the advantage that you can control the time of indexing. For example, you can
schedule indexing for times when the system load is lower because there are fewer search queries.
In addition, the queue server can trigger index replication and integration of the delta index in the main index.
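The asynchronous behavior described above can be illustrated with a minimal Python sketch. It is not TREX code: the class ToyQueueServer and its methods submit and flush are hypothetical names, and the preprocessing and indexing steps are simple stand-ins. The sketch only shows that documents are collected first and processed later, at a time chosen by the caller.

```python
from collections import deque


class ToyQueueServer:
    """Hypothetical stand-in: collects documents and processes them only on flush()."""

    def __init__(self, preprocess, index):
        self._pending = deque()        # documents waiting to be indexed
        self._preprocess = preprocess  # stand-in for the preprocessor
        self._index = index            # stand-in for the index server

    def submit(self, doc_id, text):
        # Collecting a document does not index it yet.
        self._pending.append((doc_id, text))

    def flush(self):
        # Trigger preprocessing, then indexing, for every collected document.
        count = 0
        while self._pending:
            doc_id, text = self._pending.popleft()
            self._index(doc_id, self._preprocess(text))
            count += 1
        return count


indexed = {}
queue = ToyQueueServer(preprocess=str.lower, index=indexed.__setitem__)
queue.submit("doc1", "Hello TREX")
queue.submit("doc2", "Queue Server")
indexed_before_flush = len(indexed)  # nothing has been indexed at this point
flushed = queue.flush()              # for example, scheduled for a low-load period
```

Calling flush from a scheduler at times of low search load corresponds to the scheduling advantage mentioned above.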
Preprocessor
The preprocessor preprocesses documents and search queries.
Document preprocessing comprises the following steps:
· Loading documents
If the application transmits the documents as URIs rather than directly, TREX resolves the URIs. This involves fetching the documents from the repository that
the URIs reference.
· Filtering documents
Documents can exist in various formats, such as Microsoft Word, Microsoft PowerPoint, PDF, and so on. The preprocessor extracts textual content from the
documents and then converts it into the UTF-8 Unicode format for further processing.
· Analyzing documents linguistically
Linguistic analysis involves splitting text into individual words and reducing words to base forms (stems). The preprocessor uses a lexicon that exists in
several languages for this.
During search queries, the preprocessor performs a linguistic analysis. It transmits the results of the analysis to the index server, which continues the processing
of the document.
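The three document preprocessing steps (loading, filtering, linguistic analysis) can be pictured as a small pipeline. The following Python sketch is an illustration only and makes several assumptions: the in-memory REPOSITORY, the repo:// URI, and the two-entry LEXICON are invented for the example, and real filtering of Word or PDF files is replaced by a plain byte-to-text decoding step.

```python
import re

# Hypothetical repository: maps a URI to the raw bytes of a document.
REPOSITORY = {"repo://docs/1": b"Indexing documents quickly"}

# Hypothetical toy lexicon: maps word forms to base forms (stems).
LEXICON = {"indexing": "index", "documents": "document"}


def load(uri):
    """Step 1: resolve the URI by fetching the document from the repository."""
    return REPOSITORY[uri]


def filter_text(raw, encoding="utf-8"):
    """Step 2: extract text (here only a decode; real filters handle many formats)."""
    return raw.decode(encoding)


def analyze(text):
    """Step 3: split the text into words and reduce each word to its base form."""
    words = re.findall(r"\w+", text.lower())
    return [LEXICON.get(word, word) for word in words]


def preprocess(uri):
    """Run the three steps in order."""
    return analyze(filter_text(load(uri)))


stems = preprocess("repo://docs/1")
```

Words that are not in the toy lexicon pass through unchanged, which is a simplification chosen for the sketch.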
Index Server
The index server indexes and classifies documents and answers search queries. The processing takes place in the engines that belong to the index server.
There are the following engines:
· Search engine:
This engine is responsible for standard search functions such as the exact, error-tolerant, linguistic, Boolean, and phrase searches.
· Text-mining engine
This engine is responsible for classification, searching for similar documents (‘See Also’ search), the extraction of key words, and so on.
· Attribute engine
This engine is responsible for searching for document attributes such as author, creation date, and change date.
Name Server
The name server manages information on the entire TREX system. It makes sure that the TREX servers can communicate with each other and that they receive
all necessary information. The name server has the following tasks:
· Managing topology data
The topology data includes information on the central components of a TREX system (TREX servers, indexes, and queues).
· Coordinating replication services
The replication services are only relevant for a distributed TREX system. The name server has information on which TREX server has a particular data
status. It makes sure that changed data is replicated.
· Load-balancing
The name server accepts requests and distributes them to the responsible TREX servers. It is responsible for distributing indexes and search queries.
· Ensuring high availability
The name server launches several watchdogs. They constantly monitor whether the TREX servers are available. If a TREX server is not available, the name server
ensures that the server that is down does not receive any requests.
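The interplay between availability monitoring and request distribution can be sketched as follows. This is a hypothetical Python illustration, not the TREX implementation: the class ToyNameServer, its methods report and route, and the round-robin strategy are assumptions made for the example. The only behavior taken from the text is that a server reported as down receives no requests.

```python
class ToyNameServer:
    """Hypothetical sketch: routes requests only to servers reported as available."""

    def __init__(self, servers):
        self._available = {server: True for server in servers}
        self._next = 0  # counter for a simple round-robin choice (an assumption)

    def report(self, server, is_up):
        # A watchdog would call this after each availability check.
        self._available[server] = is_up

    def route(self):
        # Pick the next available server; servers that are down are skipped.
        up = [server for server, ok in self._available.items() if ok]
        if not up:
            raise RuntimeError("no TREX server available")
        server = up[self._next % len(up)]
        self._next += 1
        return server


name_server = ToyNameServer(["indexserver1", "indexserver2"])
name_server.report("indexserver2", False)  # watchdog found this server down
routes = [name_server.route() for _ in range(3)]
```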
TREX Configuration
Purpose
The configuration of Search and Classification (TREX) is organized as follows:
● Post-Installation Configuration
You must work through these steps immediately after the installation, so that a single-host installation of TREX works correctly and can be addressed using
an ABAP or Java application. The documentation distinguishes between configuration steps that you have to complete on the TREX server side and configuration steps that you have to complete on the client side, that is, on the side of the application using TREX.
● Initial Configuration
The initial configuration comprises procedures that allow you to check problems that occur and solve them, if necessary. You can also improve TREX
performance. These configuration steps are not required in order for TREX to work correctly in the default configuration or in order to allow applications to use
TREX.
● Advanced Configuration
Advanced configuration comprises the following areas:
○ Language Recognition and Processing with TREX
TREX supports the indexing of documents that exist in different languages. When TREX is installed, you select the languages to be identified by language
recognition. You can retrospectively configure TREX to recognize additional languages.
○ File Formats Supported by TREX
Documents whose content and attributes can be indexed and searched by TREX can exist in numerous different file formats. You can configure which file formats
you want to exclude from processing and which parts of XML and HTML files you want to exclude from indexing.
○ Changing Proxy Server Settings
The TREX preprocessor can access documents on Web pages using a proxy server. You can configure the settings for the proxy server.
○ Activating Python Extensions
Some TREX functions are implemented as Python extensions. If the application using TREX uses these functions, you have to activate the Python extensions.
○ Configuration of the TREX Services in the SAP J2EE Engine
The TREX Java client is implemented as a TREX service in the J2EE engine. You can use the Visual Administrator to configure TREX caches and the TREX Java
client.
○ Delta Index Configuration
TREX provides the option of activating delta indexes. This allows you to update indexes faster and improve the performance of TREX.
○ Changing the TREX Host Name (Single and Multiple-Host Installation)
You can change the name of the host on which you installed TREX later on, or you can install TREX with a virtual host name. You can do this for both single-host
and multiple-host installations.
○ Configuration of the TREX Security Settings
You can configure secure communication between TREX and the application using it (for example, SAP Enterprise Portal or SAP Customer Relationship
Management).
Post-Installation Configuration
Purpose
After the Search and Classification (TREX) function has been installed, you perform a number of technical configuration steps. The sections below describe:
· General configuration steps that you carry out for your operating platform.
· Configuration steps that you only carry out if the application in question communicates with TREX using an HTTP or an RFC connection.
Server Side
Purpose
The following sections describe the configuration steps that you have to carry out on the server side.
Configuring TREX for the System Landscape Directory (SLD)
Use
A modern computing environment consists of a number of hardware and software components that depend on each other with regard to installation, software
updates, and demands on interfaces. The SAP System Landscape Directory (SLD) simplifies the administration of your system landscape.
The SLD is a server application that communicates with a client application using the Hypertext Transfer Protocol (HTTP). The SLD server contains component
information, a landscape description, and a name reservation, which are based on the standard Common Information Model (CIM). The CIM standard is a general
schema for describing the elements in a system landscape. This standard is independent of any implementation.
The component description provides information about all available SAP software modules, as well as their combination options and dependencies. This includes
version numbers, current patch level, and dependencies between landscape components.
For more information about the SAP System Landscape Directory, see SAP Help Portal help.sap.com.
To supply data to the SLD that originates from a system other than a J2EE or ABAP system, the executable sldreg is used. The sldreg sends data in XML format
using a predefined DTD. For this purpose it uses an HTTP connection, as shown in the figure below:
On the TREX host, there is an SLD client, which generates an XML file of this type and which registers itself with the SLD server using sldreg.
Prerequisites
● After the TREX installation, the SLD client and the associated executable files are located on your TREX host.
● The SLD server is running.
● You or your SLD administrator have generated the SLD configuration files slddest.cfg and slddest.cfg.key.
The slddest.cfg.key file is only available if the configuration of sldreg was generated using the -usekeyfile parameter.
● The user specified in the SLD configuration file slddest.cfg belongs to the DataSupplierLD user role, in order to have permission to send the files to the SLD.
Generating SLD Configuration Files
If you generate the SLD configuration files (slddest.cfg and slddest.cfg.key) yourself, you need to know the host, port, user, and password of the SLD
server. You generate these configuration files by using the executable files that are located on your TREX host.
1. Set the environment variables required by TREX by executing the following scripts in a command prompt in the directory <TREX_DIR>:
UNIX
○ Bourne shell sh, Bourne-again shell bash, Korn shell ksh:
. TREXSettings.sh
○ C shell csh:
source TREXSettings.csh
Windows
TREXSettings.bat
2. Execute the following commands:
○ Without usekeyfile: sldreg -configure <path>/slddest.cfg
○ With usekeyfile: sldreg -usekeyfile -configure <path>/slddest.cfg
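If you want to script step 2, the two command variants can be assembled as follows. This is a minimal Python sketch: the function name sldreg_configure_command and the path /tmp/slddest.cfg are hypothetical, and only the options -configure and -usekeyfile named above are used. The sketch builds the argument list and does not execute sldreg.

```python
def sldreg_configure_command(config_path, use_key_file=False):
    """Build the sldreg argument list for generating the SLD configuration files."""
    args = ["sldreg"]
    if use_key_file:
        # Variant that also produces slddest.cfg.key.
        args.append("-usekeyfile")
    args += ["-configure", config_path]
    return args


# /tmp/slddest.cfg is a placeholder for <path>/slddest.cfg.
plain = sldreg_configure_command("/tmp/slddest.cfg")
with_key = sldreg_configure_command("/tmp/slddest.cfg", use_key_file=True)
```

The resulting list could be passed to a process launcher in an environment in which the TREX settings script from step 1 has already been executed.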
Copying the SLD Configuration Files to the Global SLD Directory
To configure TREX for the System Landscape Directory (SLD), you copy the SLD configuration files slddest.cfg and slddest.cfg.key (if available) to the global SLD
directory on your TREX host.
This directory is called <disk_drive>:\usr\sap\<SAPSID>\SYS\global on Windows and /usr/sap/<SAPSID>/SYS/global on UNIX. In the case of a distributed
TREX installation on Windows, all TREX instances use the configuration files in the TREX global file system of the first TREX instance, which is \\<host_central_instance>\sapmnt\<SAPSID>\SYS\global.
Result
By copying the files slddest.cfg and slddest.cfg.key, you have configured TREX for integration in the System Landscape Directory (SLD).
TREX checks every five minutes whether anything has changed in the TREX system landscape and reports any changes automatically to the SLD server. If
nothing has changed, TREX reports every twelve hours to the SLD server. This allows you to see that this landscape is still active.
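The reporting rule described above (a check every five minutes, a report on change, and otherwise a report every twelve hours) can be expressed as a small decision function. The following Python sketch is an illustration under assumptions: the function should_report and its parameters are hypothetical names, and it does not describe how TREX implements the check internally.

```python
from datetime import datetime, timedelta

CHECK_INTERVAL = timedelta(minutes=5)    # how often the landscape is checked
HEARTBEAT_INTERVAL = timedelta(hours=12)  # report interval when nothing changed


def should_report(now, last_report, landscape_changed):
    """Report on any change, or when twelve hours have passed since the last report."""
    return landscape_changed or (now - last_report) >= HEARTBEAT_INTERVAL


last = datetime(2013, 8, 18, 0, 0)
unchanged_after_check = should_report(last + CHECK_INTERVAL, last, False)
changed_after_check = should_report(last + CHECK_INTERVAL, last, True)
unchanged_after_12h = should_report(last + HEARTBEAT_INTERVAL, last, False)
```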
Display Results
1. To display the information about TREX systems and services, navigate to the screen Content Maintenance
○ In the initial screen for the System Landscape Directory → Development: Content Maintenance
○ In the initial screen for the System Landscape Directory → Administration → Content: Content Maintenance
2. In the screen Content Maintenance navigate to Subset and choose All With Instances in the dropdown list.
3. Navigate to Class. In the dropdown list you can display the TREX Services (for example TREX Index Service, TREX Name Service) and TREX systems
known by SLD.
Information Transferred to the SLD Server
TREX transfers the following information to the SLD server:
Information about naming and version
● Software component version (for example, TREX 7.0)
● SAP name (for example, TREX)
● Version (for example, 7.0)
Information about the TREX servers
● Host name, on which the server is running
● Port number that the server is using
● Type of server, for example, indexserver
● Web server URL (instead of the port)
● RFC destination of the RFC server (instead of the port)
Information about the TREX instances on individual hosts
● System ID
● Instance number
● Installation directory
● Version information for the TREX software
Information about the TREX configuration
● Name of the TREX hosts (Hosts) that belong to the TREX system landscape
● TREX server roles
○ Roles of the TREX name server (Name Server Mode)
Possible roles are: 1st, 2nd, 3rd Master Name Server, Slave Server
○ Use as master index server or master queue server
○ Roles of the master, slave, and backup index servers
● TREX preprocessor mode (Preprocessor Mode)
● Information about the TREX installation directory (Base Path)
● Services that have been started by the TREX daemon (Services)
General UNIX Configuration
Purpose
The following sections describe the steps that are necessary after an installation on UNIX.
Checking and Changing UNIX Kernel Parameters
Use
Check the following UNIX kernel parameters and modify them if necessary:
· Number of open files per process
On UNIX platforms, each process may only have a certain number of files open at once. If you create a large number of indexes and queues during routine
operation, the TREX processes, in particular the queue server and index server, open a lot of files.
With many UNIX installations, the value for the maximum number of files that the processes are allowed to have open is too low. The parameter must have
the following value:
Operating System Value
AIX, HP-UX, Sun Solaris At least 2048
Linux At least 1024
· HP-UX only:
○ Process Size
The process size should be at least 2 GB.
The process size is not limited for AIX and Sun Solaris.
○ Files larger than 2 GB
Since TREX can also use files that are larger than 2 GB, these must be activated at operating system level.
The TREX directory contains a test program that you can use to check whether the kernel parameters are set at a suitable level. If this is not the case, you should
change the kernel parameters.
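As a quick complement to the TREX test program, the current per-process limit for open files can also be read with the Python standard library on UNIX. The following sketch is an assumption-laden illustration and does not replace the test program: the function names required_open_files and open_files_ok are hypothetical, and the thresholds are the values from the table above (1024 for Linux, 2048 for AIX, HP-UX, and Sun Solaris).

```python
import resource
import sys


def required_open_files(platform=sys.platform):
    """Return the minimum from the table: 1024 on Linux, 2048 on other UNIX platforms."""
    return 1024 if platform.startswith("linux") else 2048


def open_files_ok(soft_limit, platform=sys.platform):
    """Check a soft limit for open files against the documented minimum."""
    if soft_limit == resource.RLIM_INFINITY:
        return True  # no limit is configured
    return soft_limit >= required_open_files(platform)


# Read the limits that apply to the current process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", soft, "hard limit:", hard, "sufficient:", open_files_ok(soft))
```

Run the check with the same user and environment that start the TREX processes, because limits can differ between users and shells.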
Checking Kernel Parameters
1. Log on with the user <sapsid>adm.
2. Go to the TREX directory.
3. Set the environment variables required by TREX:
○ Bourne shell sh, Bourne-again shell bash, Korn shell ksh:
. TREXSettings.sh
○ C shell csh:
source TREXSettings.csh
4. Test the size and number of open files per process:
portlibtester.x –file
Number of open files:
This command creates test files in the directory /tmp/portlibtester. The test must give a result of at least 1000 files (Linux) or 2000 files for other UNIX
platforms. If this is not the case, you should change the kernel parameters.
5. Only HP-UX – Test the possible process size:
portlibtester.x –mem
This command calls upon as much main memory as possible. The test must output a value of at least 1900 MB. If this is not the case, you should change the kernel parameters.
Changing Kernel Parameters
AIX
1. Log on as root.
2. Carry out the following steps as appropriate, depending on whether you are working with or without a Network Information System (NIS).
○ (Without NIS) Execute the following command:
chuser nofiles=2000 trx<instance_number>
○ (With NIS) Add the following entry to the file /etc/security/limits:
trx<instance_number>:
nofiles=2000
3. Restart the host using reboot.
HP-UX
Changing the process size
1. Log on as root.
2. Open the administration tool SAM (/usr/sbin/sam).
3. Set at least the following values in the dialog box kernel configuration/configurable.
Kernel Parameter Lowest Acceptable Value
Process Size
maxdsiz 0X80000000 or 2147483648
maxdsiz_64bit 0X80000000 or 2147483648
maxtsiz 0X40000000 or 1073741824
maxtsiz_64bit 0X40000000 or 1073741824
Number of Open Files
maxfiles 2048
maxfiles_lim 2048
nfile 20000
4. Restart the host using reboot.
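The table lists each process size value in hexadecimal and in decimal notation. The following short Python check confirms that the two notations are equal and expresses the values in GB (2 to the power of 30 bytes). The dictionary name HPUX_PROCESS_SIZE_MINIMUMS is a hypothetical name used only for this illustration.

```python
# Hexadecimal and decimal minimum values as listed in the table.
HPUX_PROCESS_SIZE_MINIMUMS = {
    "maxdsiz": (0x80000000, 2147483648),
    "maxdsiz_64bit": (0x80000000, 2147483648),
    "maxtsiz": (0x40000000, 1073741824),
    "maxtsiz_64bit": (0x40000000, 1073741824),
}

GIB = 1024 ** 3  # bytes per GB (binary)

# Both notations describe the same number of bytes.
notations_match = all(hexa == dec for hexa, dec in HPUX_PROCESS_SIZE_MINIMUMS.values())

# Size of each parameter in GB.
sizes_in_gib = {name: dec / GIB for name, (_, dec) in HPUX_PROCESS_SIZE_MINIMUMS.items()}
```

The data segment values (maxdsiz) correspond to 2 GB, which matches the minimum process size stated for HP-UX, and the text segment values (maxtsiz) correspond to 1 GB.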
Activating files larger than 2 GB
1. Log on as root.
2. Execute the following command:
fsadm -o largefiles <mount-point>
In doing this, you activate usage of files larger than 2 GB on a certain file system.
Linux
1. Add the following line to the end of the script <TREX_DIR>/TREXSettings.sh:
ulimit -n 1024
2. Add the following line to the end of the script <TREX_DIR>/TREXSettings.csh:
unlimit openfiles
TREXSettings.csh is not relevant for the TREX daemon. It is only relevant if you start the TREX servers manually or execute test scripts.
3. If the TREX daemon is running, restart it.
Sun Solaris
1. Log on as root.
2. Add the following lines to the configuration file /etc/system.
set rlim_fd_max=2048
set rlim_fd_cur=2048
3. Restart the host using reboot.
Result
After making the change, execute portlibtester.x –file again. If the number of open files is still too low, the UNIX system administrator must have restricted this
parameter in another way. Contact the UNIX system administrator to remove this restriction.
Note for Linux: If you receive error messages during indexing, the value 1024 for the number of open files may not be sufficient. If this is the case, run TREX
as root (you can only raise the parameter value to 2048 as root). Proceed as follows:
· Make sure that the script <TREX_DIR>/TREXSettings.sh contains the following line at the end:
ulimit -n 2048
· Make sure that the script <TREX_DIR>/TREXSettings.csh contains the following line at the end:
unlimit openfiles
TREXSettings.csh is not relevant for the TREX daemon. It is only relevant if you start the TREX servers manually or execute test scripts.
· Add a comment sign to the configuration file <TREX_DIR>/<host_name>/TREXDaemon.ini before the following lines:
#userid = trx<instance_number>
#groupid = <group>
This change causes the TREX daemon to run as root the next time it starts.
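The change to TREXDaemon.ini can be illustrated with a short Python sketch that places a comment sign before the userid and groupid lines. The sketch rests on assumptions: the function name comment_out_user_settings is hypothetical, and the sample content (the section name, the instance number 48, the group sapsys, and the key other_setting) is invented for the example. It works on a string and leaves the real file untouched.

```python
def comment_out_user_settings(ini_text):
    """Prefix the userid and groupid lines with a comment sign; keep other lines."""
    result = []
    for line in ini_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key in ("userid", "groupid"):
            line = "#" + line  # lines that are already commented do not match
        result.append(line)
    return "\n".join(result)


# Hypothetical sample content; a real TREXDaemon.ini contains more entries.
sample = "[daemon]\nuserid = trx48\ngroupid = sapsys\nother_setting = 1"
changed = comment_out_user_settings(sample)
```

Applying the function a second time leaves the text unchanged, so the edit can be repeated safely.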
Configuration of the RFC Connection
Purpose
The following sections describe the steps that you carry out if the application and TREX are communicating using an RFC connection.
Process Flow
1. Define the SAP system users.
2. Determine the SAP system connection data.
3. Configure the RFC connection using the TREX admin tool (stand-alone).
For more information about how you start the TREX admin tool (stand-alone), see Starting the TREX Admin Tool.
Result
For more information about the RFC connection and handling connection and configuration errors, see the documentation on the TREX admin tool (stand-alone). You
can find this documentation in the SAP Library at help.sap.com/nw70 → SAP NetWeaver.
Creating an SAP System User for the TREX Admin Tool (Standalone)
Use
You must create an SAP user that the TREX admin tool (standalone) can use to log on to the SAP system. In addition, the SAP user is required so that the TREX
alert server has permission to regularly test and check the RFC configuration. You can create the user in the default client or in another
client. If you create it in another client, make sure that you enter the associated client for the user during the configuration of the RFC connection in the TREX admin tool.
The TREX admin tool (standalone) is used to configure and monitor TREX. You also use this admin tool to configure the RFC connection between TREX and the
ABAP application that is using TREX. To use the TREX admin tool (standalone) to create the RFC destination, the admin tool requires an SAP system user that you
create based on the predefined role SAP_BC_TREX_ADMIN. This user then has the authorization required to configure the RFC connection.
For more information on the SAP_BC_TREX_ADMIN role, see SAP Note 766516.
Overview of the Permissions Assigned by the SAP_BC_TREX_ADMIN Role
Type and Scope of the Permission | Activity | Explanation
Permission check for RFC access | Execute | Name of the RFC object to be protected: SYST, TREX_ARW_ADMINISTRATION
Administration for the RFC destination | Add or generate, change, display, delete, extended maintenance | Type of entry in RFCDES: Start of an external program using TCP/IP
Check on the transaction code at transaction launch | | Transaction code: SM59, TREXADMIN, TREXADMIN_AUTH
Administrating TREX | Change, display, execute |
ABAP: Program run checks | Schedule programs for background processing, execute ABAP program, maintain variants for and execute ABAP program |
ALV standard layout | Maintain |
Application log | Display, delete |
More Information
Configuring and Administrating the RFC Connection
Configuring the RFC Connection in the TREX Admin Tool
Procedure
Create an SAP system user for the TREX admin tool (standalone) and assign the SAP_BC_TREX_ADMIN role to this user.
1. Launch transaction SU01 (user maintenance) or choose Administration → System Administration → User Maintenance → User in the SAP menu. The User
Maintenance: Initial Screen appears.
2. Enter a new user name and choose Create.
3. On the Address tab page, enter the personal data for the user.
4. On the Roles tab page, assign the SAP_BC_TREX_ADMIN role and thus the permission to access the SAP system to the SAP system user for the TREX
admin tool (standalone).
Result
This user for the TREX admin tool (standalone) now has the authorization required to configure the RFC connection.
Determining the SAP System Connection Information
Use
The TREX admin tool (stand-alone) can connect to an SAP system in two ways.
· Through a specific application server of the SAP system (variant A)
· Through the message server of the SAP system (variant B)
This variant uses the load-balancing function for the SAP system. The message server assigns the request from the TREX admin tool to any application
server.
Depending on the variant used, the TREX admin tool requires different connection information for the SAP system. You must determine the connection information
and specify it later in the TREX admin tool.
SAP recommends using variant B. Variant A has the disadvantage that the connection does not work if the application server is not available.
Procedure
1. Open the SAP Logon.
SAP Logon is the program that you use to log on to an SAP system.
2. Note the following connection information:
Connection Setup Type Required Connection Information
Through an application server (variant A) · SAP system ID (SID)
· System number
· Application server host name
Through the message server (variant B) · SAP system ID (SID)
· Logon group, such as PUBLIC
· Message server host name
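The table above can be turned into a simple completeness check before you enter the data in the TREX admin tool. The following Python sketch is an illustration only: the field names (sid, system_number, application_server_host, logon_group, message_server_host), the function missing_connection_fields, and the sample values are hypothetical names chosen for the example and are not part of any TREX interface.

```python
# Required connection information per variant, following the table above.
REQUIRED_FIELDS = {
    "A": ("sid", "system_number", "application_server_host"),
    "B": ("sid", "logon_group", "message_server_host"),
}


def missing_connection_fields(variant, info):
    """Return the required fields that are absent or empty for the given variant."""
    return [field for field in REQUIRED_FIELDS[variant] if not info.get(field)]


# Hypothetical notes taken from SAP Logon; the message server host is still missing.
noted = {"sid": "ABC", "logon_group": "PUBLIC"}
missing_for_b = missing_connection_fields("B", noted)
```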
Configuring the RFC Connection in the TREX Admin Tool
Use
You work through the steps below using the TREX admin tool (stand-alone).
Configuration of the RFC connection with the TREX admin tool (stand-alone) is only available as of SAP Basis Component SAP_BASIS 6.20 SP58, 6.40
SP16, and 7.0 SP6. If you are using TREX with an SAP system based on an earlier support package, you have to configure the RFC connection manually as
described in the SAP NetWeaver 04 Installation Guide for Search and Classification (TREX) 6.1. You can find this guide on the SAP Service Marketplace at
service.sap.com/instguides → SAP NetWeaver → Released 04 → Installation → Cross-NW → Installation Guide Search and Classification TREX 6.1.
Creating a Connection
1. In the Landscape RFC window, choose the Create Connection function.
2. Choose connection type A or B. Specify the connection data for the SAP system (see Determining the SAP System Connection Information).
3. Specify the SAP system user, the associated password, and the client that the TREX admin tool is to use to log on (see Creating an SAP System User for the
TREX Admin Tool (Stand-Alone)).
If the SAP system user in question exists in the default client, you do not need to specify the client.
Creating an RFC Destination
1. In the Landscape RFC window, choose the RFC Destination (SM59) function.
2. Enter the following parameters:
Field Entry
SAP System SAP system that you want to set up the connection to.
The list contains all SAP systems that you have registered using Create
Connection.
RFC Destination Name of the RFC destination.
Description Meaningful description of the purpose
The program ID determines under which name the TREX RFC server registers with the SAP gateway. The program ID must be unique for each SAP
gateway. The TREX admin tool ensures this by generating the program ID.
3. Decide which SAP gateway you want to use. You have the following options:
Option Comment
Gateway local
(Default setting)
Use local SAP gateways for the application servers.
Gateway central Use the central SAP gateway.
We advise against using a central SAP gateway for distributed TREX
systems. The central SAP gateway is a “single point of failure.”
If you choose this option, enter the following additional parameters:
● Host name (with domain name if necessary) or the IP address of the host
on which the gateway is installed.
● Name of the SAP gateway in the form sapgw<instance_number>
SAP advises against creating the RFC destination directly in the SAP system. The name of the RFC destination and the program ID must satisfy certain
naming conventions. The TREX admin tool ensures that these are fulfilled.
If you nevertheless create the RFC destination directly in the SAP system, note the following:
● We recommend starting the name of the RFC destination with TREX_.
● Choose the activation type Registered Server Program.
● Choose a program ID that is unique for the SAP gateway used.
● Use the RFC Destinations function to register the RFC destination in the TREX admin tool.
Completing the RFC Configuration
1. In the Landscape RFC window, choose the Connect function.
The TREX admin tool creates the connection to all SAP systems that are known to it. Because the RFC configuration is still incomplete, the configuration
status is yellow or red.
2. Choose Repair All.
The TREX admin tool completes the RFC configuration and starts the TREX RFC server.
This can take several minutes. During this time, the configuration status remains yellow or red. After completion of the configuration process, the status
changes to green.
Do not choose Repair All several times in quick succession. This would trigger the configuration process more than once and delay it.
3. Check the progress by choosing Refresh to update the display.
Client Side
Purpose
The following sections describe the configuration steps that you have to carry out on the client side.
Java Application (HTTP Connection)
If a Java application communicates with TREX, you configure the TREX Java client, which is integrated as a TREX service in the J2EE engine. You also check
the client-side proxy settings.
Specifying the Address of the TREX Name Server
Use
TREX provides APIs (Application Programming Interfaces) for the languages Java and ABAP, which allow access to all TREX functions. The Java interface
(TREX Java client) is part of the SAP Web AS Java as TREX service. The TREX Java client needs to know the address of the TREX name server in order to
communicate with the TREX servers.
The following procedure describes how you determine the TREX name server address and how you specify it in the SAP NetWeaver Visual Administrator.
The TREX Java client communicates with the TREX server by HTTP and TCP/IP. Make sure that the TCP port that the name server uses is open.
Procedure
You have to specify the address of the TREX name server in the SAP NetWeaver Visual Administrator in the form
<host_name_of_trex_host>:<name_server_port>, where:
● <host_name_of_trex_host>: name of the host on which TREX is installed and where the TREX name server runs
● <name_server_port>: port of the TREX name server
1. You can determine the TREX name server address in two ways:
a. Start the TREX admin tool (see Starting the TREX Admin Tool) and determine the address of the name server using Landscape → Tree → topology → globals
→ all_masters.
For example: mytrexhost:34801
b. Determine the port of the TREX name server by means of the following rule: <name_server_port>: 3<instance_number>01
The value <instance_number> is the TREX instance number that was specified during the TREX installation. It is part of the name of the
installation directory for TREX:
■ On UNIX /usr/sap/<sapsid>/trx<instance_number>
■ On Windows <disk_drive>:\usr\sap\<SAPSID>\TRX<instance_number>
The value for <host_name_of_trex_host> is the name of the host on which TREX is installed (for example, mytrexhost).
2. Use the user <j2eeadm> to log onto the host on which the J2EE Engine is running.
3. Start the SAP NetWeaver Visual Administrator and log on to the J2EE Engine.
For more information about using SAP NetWeaver Visual Administrator, see SAP Help Portal help.sap.com → Documentation → SAP NetWeaver → SAP
Library → SAP NetWeaver Library → SAP NetWeaver by Key Capability → Application Platform by Key Capability → Java Technology → Administration
Manual → J2EE Engine Administration Tools → Visual Administrator
4. Click Cluster and navigate to Services → TREX Service.
5. Enter the address of the TREX name server into the parameter nameserver.address.
tcpip://<host_name_of_trex_host>:<name_server_port>
Depending on your network environment, enter either the host name only or the host name with the domain, for example:
tcpip://mytrexhost:34801 or tcpip://mytrexhost.mydomain:34801
The address of the TREX name server must be configured for all server processes of the cluster. Otherwise the connection between the J2EE Engine and
TREX cannot be established.
6. Save your changes and confirm the restart of the service.
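The port rule and the address format used in this procedure can be sketched in Python as follows. This is only an illustration: the function names are hypothetical and not part of TREX, and the sketch assumes a two-digit instance number.

```python
def name_server_port(instance_number: int) -> int:
    """Apply the rule 3<instance_number>01 from step 1b."""
    if not 0 <= instance_number <= 99:
        raise ValueError("instance number must be between 00 and 99")
    return int(f"3{instance_number:02d}01")


def name_server_address(host: str, instance_number: int) -> str:
    """Build the value for the nameserver.address parameter (step 5)."""
    return f"tcpip://{host}:{name_server_port(instance_number)}"


# Instance number 48 matches the example address used in this section.
print(name_server_address("mytrexhost", 48))           # tcpip://mytrexhost:34801
print(name_server_address("mytrexhost.mydomain", 48))  # tcpip://mytrexhost.mydomain:34801
```

Whether you pass the short or the fully qualified host name depends on your network environment, as described in step 5.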
Checking Proxy Settings
Use
If an application is unable to communicate with TREX, it may be due to the application trying to access TREX using a proxy server. If this is the case, you have to
change the configuration so that access does not take place using the proxy server.
The procedure depends on the application concerned:
● SAP Enterprise Portal 6.0 with Content Management
● Other Java applications based on J2EE 6.40
Procedure
SAP Enterprise Portal 6.0 with Content Management
Check the settings in the portal at System Administration → System Configuration → Service Configuration → Applications (Content Catalog) →
com.sap.portal.ivs.httpservice → Services → proxy.
If a proxy server is entered there, you have to enter the TREX host in the field http – Bypass Proxy Servers.
Other Java applications based on J2EE 6.40
For other Java applications, you have to check the configuration of the J2EE Engine. The proxy settings belong to the Java parameters. If a proxy server is
configured in the Java parameters, enter the TREX host in the parameter nonProxyHosts. You can choose one of the following options:
● Alternative 1: -Dhttp.nonProxyHosts=<hostname>.<mydomain>|localhost
For <hostname>.<mydomain>, enter the host name and domain (if necessary) of the TREX host.
● Alternative 2: -Dhttp.nonProxyHosts=*.<mydomain>|localhost
You can change the Java parameters using the SAP J2EE Engine GUI Config Tool. For more information about using this tool, see the SAP Library at the
Internet address help.sap.com → Documentation → SAP NetWeaver
Note that you have to specify the name of the TREX host in the same way both on TREX side in the TREX configuration files (topology.ini, sapprofile.ini) and in
the configuration of the J2EE Engine as described above. In case you specify the TREX host name as fully qualified (e.g. PWDF12345.sap.corp) you have to
do so on both sides. A mixed usage of host names does not work.
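To see which host names a given nonProxyHosts value would cover, the following Python sketch approximates the matching. It is not the JVM implementation: it assumes that the value is a list of patterns separated by |, that * acts as a wildcard, and that host names are compared without regard to case. The function name is hypothetical.

```python
from fnmatch import fnmatchcase


def bypasses_proxy(host: str, non_proxy_hosts: str) -> bool:
    """Return True if host matches one of the '|'-separated patterns."""
    host = host.lower()
    patterns = [p.strip().lower() for p in non_proxy_hosts.split("|") if p.strip()]
    return any(fnmatchcase(host, pattern) for pattern in patterns)


setting = "*.mydomain|localhost"  # Alternative 2 with an example domain
print(bypasses_proxy("mytrexhost.mydomain", setting))  # True
print(bypasses_proxy("mytrexhost", setting))           # False
```

The second call shows that a short host name is not covered by a domain-based pattern, which is one more reason to write the TREX host name in the same way everywhere.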
Initial Configuration
The procedures for initial configuration are organized as follows:
· Single-Host System
The initial configuration of the single-host system comprises procedures that allow you to check problems that occur and solve them, if necessary. You can
also improve TREX performance. In contrast to the configuration steps following installation, these configuration steps are not necessary in order for TREX to
work correctly in the default configuration as a single-host system and allow use by an application.
· Distributed System
TREX consists of a client component and a server component. The server component is based on a flexible architecture that allows distributed installation
and thus modification to suit various different requirements. A minimal system consists of a single host that provides all TREX functions. You then have
numerous options for scaling TREX. You can distribute TREX components among several hosts and install individual components more than once. You can
use a scaled scenario to distribute the search and indexing load among several hosts and to ensure the availability of TREX.
Single-Host System
Changing the Index and Queue Directory
Use
SAPinst creates an index directory and a queue directory in the directory <TREX_DIR>. You can change these directories if necessary (for example, if you want
the directories to be located in a different partition).
Procedure
1. Create the index directory or queue directory in the required partition.
We recommend that you use the directory names index or queue.
2. Make sure that the directory permissions match with those of the original directory (<TREX_DIR>/index or <TREX_DIR>/queue).
3. Stop TREX (see Starting and Stopping TREX).
4. Edit the configuration file <TREX_DIR>/sapprofile.ini. Change the parameter TREX/IndexServer/basepath/index or TREX/QueueServer/basepath/queue so that
the relevant parameter now points to the new directory.
Only use forward slashes (/) in paths (even on Windows).
The standard configuration is:
TREX/IndexServer/basepath/index=%(SAP_RETRIEVAL_PATH)/index
TREX/QueueServer/basepath/queue=%(SAP_RETRIEVAL_PATH)/queue
If TREX is running on UNIX, enter the following:
TREX/IndexServer/basepath/index=/my_path/index
TREX/QueueServer/basepath/queue=/my_path/queue
If TREX is running on Windows and the directories are located on a local disk drive, enter the following:
TREX/IndexServer/basepath/index=D:/my_path/index
TREX/QueueServer/basepath/queue=D:/my_path/queue
If TREX is running on Windows and the directories are located on a file server, enter the following:
TREX/IndexServer/basepath/index=//my_server/my_path/index
TREX/QueueServer/basepath/queue=//my_server/my_path/queue
All remaining paths are only relevant for a distributed system.
5. Start TREX (see Starting and Stopping TREX).
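Because only forward slashes are allowed in these paths, it can help to generate the two entries instead of typing them. The following Python sketch is an illustration under that assumption; the function name is hypothetical, and the sketch only produces the text of the entries without touching sapprofile.ini.

```python
def basepath_entries(index_dir: str, queue_dir: str) -> list[str]:
    """Return the two sapprofile.ini lines with forward slashes only."""
    def normalize(path: str) -> str:
        # Replace Windows backslashes and drop a trailing slash.
        return path.replace("\\", "/").rstrip("/")

    return [
        f"TREX/IndexServer/basepath/index={normalize(index_dir)}",
        f"TREX/QueueServer/basepath/queue={normalize(queue_dir)}",
    ]


# A local Windows drive for the index and a file server share for the queue.
for line in basepath_entries(r"D:\my_path\index", r"\\my_server\my_path\queue"):
    print(line)
```

The printed lines have the same form as the Windows examples in step 4.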
Changing the Web Server Address
Use
SAPinst enters the Web server address fully qualified with domain into the configuration file <TREX_DIR>/topology.ini. Your network configuration dictates whether you have to enter the Web server address with or without the domain. If you have to remove the domain from the address, proceed as follows:
Procedure
1. Stop TREX (see Starting and Stopping TREX).
2. Edit the configuration file <TREX_DIR>/<trex_host_number>/topology.ini. Remove the domain from the Web server address:
<httpserver>
<<port>>
Before the change: url=http://mytrexhost.mydomain:<port>/ ...
After the change: url=http://mytrexhost:<port>/ ...
…
</httpserver>
3. Start TREX (see Starting and Stopping TREX).
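The change to the url value amounts to reducing the host name to its first label while keeping the port and path. A minimal Python sketch of that transformation, assuming the address uses a host name rather than an IP address (the function name and the port 34805 are illustrative only):

```python
from urllib.parse import urlsplit, urlunsplit


def strip_domain(url: str) -> str:
    """Return url with the host reduced to its first label; port and path are kept."""
    parts = urlsplit(url)
    short_host = parts.hostname.split(".")[0]  # not suitable for IP addresses
    netloc = short_host if parts.port is None else f"{short_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(strip_domain("http://mytrexhost.mydomain:34805/"))  # http://mytrexhost:34805/
```

The sketch only shows the expected before and after values; the edit itself is still made in topology.ini while TREX is stopped.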
Only Windows: Configuring IIS
Use
You have to configure Microsoft IIS as follows:
Version Configuration
Microsoft IIS 5.x Set Application Protection to High
Microsoft IIS 6.0 ● Create a Web service extension
● Create an application pool
Procedure for Microsoft IIS 5.x
1. Choose:
○ Windows 2000: Navigate to Control Panel → Administrative Tools → Computer Management → Services and Application → Internet Information Services
○ Windows Server 2003: Navigate to Control Panel → Administrative Tools → Computer Management → Services and Application → Internet Information Services
(IIS) Manager.
2. Navigate to the Web site SAP_TREX_<instance_number>.
3. Display the properties of the virtual directory TREXHttpServer. This virtual directory is located beneath the Web site. On the tab Virtual Directory, choose
High (Isolated) in the field Application Protection.
4. Restart the Web server.
Procedure for Microsoft IIS 6.0
Choose:
● Windows 2000: Navigate to Control Panel → Administrative Tools → Computer Management → Services and Application → Internet Information Services
● Windows Server 2003: Navigate to Control Panel → Administrative Tools → Computer Management → Services and Application → Internet Information
Services (IIS) Manager.
Create a Web service extension
1. Choose Web Service Extensions.
2. Create an extension with the following data:
Field Entry
Extension name TREXHTTPServer_<instance_number>
Required files <disk_drive>:\usr\sap\<SAPSID>\TRX<instance_number>\exe\WebServer\TREXISAPIExt.dll
Set extension status to Allowed Select this field.
Create an application pool
1. Choose Application Pools. Create an application pool with the following ID:
AppPool_TREX_<instance_number>
You do not need to change the other settings.
2. Display the properties of the application pool you just created and then choose Identity. Select Configurable. Enter the name of the user
(<trex_instance_number>) and enter the password twice.
The user <trex_instance_number> must belong to the group IIS_WPG (IIS Worker Process Group).
3. Display the properties of the Web site SAP_TREX_<instance_number>. Choose Home Directory and assign the Web site to the application pool that you
just created.
Only Windows: Checking Permissions for the TREX Directory
Use
The TREX setup program creates the Web site SAP_TREX_<instance_number> on the Web server. This causes an anonymous user for access to the Web site
to be defined. The anonymous user needs certain permissions for the TREX directory:
· IIS 5.X: Full Control
· IIS 6.X: Read & Execute
If an error occurs, find out the anonymous user and correct the settings.
Proceed as follows to do this:
· Determine the anonymous user entered in the Web site SAP_TREX_<instance_number>.
· Give this user the required access (Full Control for IIS 5.X, Read & Execute for IIS 6.X) to the TREX directory and to all contained files and sub-directories.
Determining the Anonymous User
Microsoft IIS 5.X
1. Choose:
○ Windows 2000: Navigate to Control Panel → Administrative Tools → Computer Management → Services and Application → Internet Information Services
○ Windows Server 2003: Navigate to Control Panel → Administrative Tools → Computer Management → Services and Application → Internet Information Services
(IIS) Manager.
2. Use the secondary mouse button to click on the SAP_TREX_<instance_number> Web site. Choose Properties → Directory Security.
3. In the Anonymous access and authentication control area, choose Edit.
4. In the Anonymous access area, choose Edit.
5. Select the name that is entered in the Username field, and copy it using CTRL+C.
6. Close the Internet Services Manager.
Now give the determined user full access to the TREX directory on Microsoft IIS 5.X.
Microsoft IIS 6.X
1. Choose:
○ Windows 2000: Navigate to Control Panel → Administrative Tools → Computer Management → Services and Application → Internet Information Services
○ Windows Server 2003: Navigate to Control Panel → Administrative Tools → Computer Management → Services and Application → Internet Information Services
(IIS) Manager.
2. Use the secondary mouse button to click on the SAP_TREX_<instance_number> Web site. Choose Properties → Directory Security.
3. In the Authentication and access control area, choose Edit.
4. Select the name that is entered in the Username field, and copy it using CTRL+C.
5. Close the Internet Information Services Manager.
Now give the determined user Read & Execute permission for the TREX directory on Microsoft IIS 6.X.
Giving the Determined User Certain Permissions
Windows 2000
1. Use the secondary mouse button to click on the TREX directory. Choose Properties → Security.
2. Choose Add.
3. Select your local host under Look in.
4. Add the copied user name using CTRL+V. Check the validity of the user name using Check Names.
5. Choose OK.
6. Select the user and grant the access permissions:
○ IIS 5.X: Full Control
○ IIS 6.X: Read & Execute
7. Choose Advanced.
8. Select the user again.
9. Select Allow inheritable permissions from parent to propagate to this object and Reset permissions on all child objects and enable propagation of inheritable
permissions.
10. Choose OK twice.
Windows Server 2003
1. Use the secondary mouse button to click on the TREX directory. Choose Properties → Security.
2. Choose Add.
3. Select your local host using Locations.
4. Add the copied user name using CTRL+V. Check the validity of the user name using Check Names.
5. Choose OK.
6. Select the user and grant the access permissions:
○ IIS 5.X: Full Control
○ IIS 6.X: Read & Execute
7. Choose Advanced.
8. Select the user again.
9. Select Allow inheritable permissions from the parent to propagate to this object and Replace permission entries on all child objects.
10. Choose OK twice.
Creating a Web Site Manually (Only Windows)
Use
This section is only relevant if an application communicates with TREX using HTTP.
The TREX setup program normally creates the Web site SAP_TREX_<instance_number> on the Web server. If an error occurred during this process, you have
to create the Web site manually.
Procedure
1. Open the Internet Information Services (Microsoft IIS 5.0) or the Internet Information Services (IIS) Manager (Microsoft IIS 6.0).
○ Windows 2000: Navigate to Control Panel → Administrative Tools → Computer Management → Services and Application → Internet Information Services
○ Windows Server 2003: Navigate to Control Panel → Administrative Tools → Computer Management → Services and Application → Internet Information Services
(IIS) Manager.
2. Use the secondary mouse button to click on the TREX Web site (Windows 2003) or the computer icon (Windows 2000), and choose New → Web Site.
3. A wizard that helps you with the creation process is started. Enter the information from the table below, and adopt the default settings for all other fields.
Field Input
Description SAP_TREX_<instance_number>, for example SAP_TREX_48
TCP Port <free_port>
We recommend that you calculate the port as follows:
30000 + 100 * <instance_number> + 5
SAPinst calculates the ports of the TREX servers using this method. The
method ensures that the ports do not clash with another TREX instance on
the same host.
If the instance number is 48, the port is 34805.
Path <disk_drive>:\usr\sap\<SAPSID>\TRX<instance_number>\exe
Permissions (Read, Run scripts, and so on) None. Make sure that no field is checked.
4. When you have created the Web site, you have to create a virtual directory. Use the secondary mouse button to click on the Web site
SAP_TREX_<instance_number>, and choose New → Virtual Directory.
5. A wizard that helps you with the creation process is started. Enter the following information:
Field Input
Alias TREXHTTPServer_<instance_number>
Path <disk_drive>:\usr\sap\<SAPSID>\TRX<instance_number>\exe
Permissions (Read, Run scripts, and so on) Select Execute (such as ISAPI applications or CGI). Remove the selection for
the other permissions.
6. Display the properties of the virtual directory TREXHTTPServer_<instance_number>. Choose the Virtual Directory tab, and remove the selection for Log
visits and Index this resource.
7. Display the properties of the Web site SAP_TREX_<instance_number>. Choose the Web Site tab, and remove the selection for the Enable Logging field.
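The recommended TCP port from the table in step 3 can be calculated as in the following Python sketch. The function name is hypothetical, and the sketch assumes a two-digit instance number; it does not check whether the port is actually free on the host.

```python
def trex_web_site_port(instance_number: int) -> int:
    """Recommended port for the Web site: 30000 + 100 * <instance_number> + 5."""
    if not 0 <= instance_number <= 99:
        raise ValueError("instance number must be between 00 and 99")
    return 30000 + 100 * instance_number + 5


print(trex_web_site_port(48))  # 34805
```

Because each instance number gets its own block of 100 ports, two TREX instances on the same host cannot receive the same value from this formula.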
Checking an RFC Connection
Use
If the connection test fails when you create an RFC destination or search server relation, check the following:
● SAP gateway
● RFC destination
● TREX configuration
Checking the Gateway
With UNIX
1. Check that the process gwrd is running:
ps -fu <gwsadm> | grep gwrd
2. Check whether the group to which the user <gwsadm> belongs has the access permission rwx for the directory
/usr/sap/<SAPSID>/TRX<instance_number>.
With Windows
1. Use the Task Manager to check whether the process gwrd.exe is running.
2. Check the settings of the gateway service. To do this, choose the following paths:
○ Windows 2000: Start → Settings → Control Panel → Administrative Tools → Services.
○ Windows Server 2003: Start → Administrative Tools → Services.
Start the service SAPGWS_<SAPSYSNR> if it is not already running. If necessary, change the start type of the service so that it starts automatically.
3. Open the SAP Management Console by choosing Start → Programs or All Programs → SAP System Management Console. Check whether the gateway
instance has started. If necessary, start the gateway instance using Action → Start.
Checking an RFC destination:
Check the data that you entered when you created the RFC destination. Pay attention to lowercase and uppercase letters in the input parameters.
Checking the TREX Configuration
Check the gateway parameters in the file <TREX_DIR>/<TREX_host_name>/TREXRfcServer.ini:
● Is the host name of the host on which the gateway is installed correct?
● Does the instance number match the number you specified during the gateway installation?
[CONNECTION]
HOST=<local_host_or_host_name>
INSTANCE=sapgw<gw_instance_number>
The values for the parameters HOST and INSTANCE must be entered in lower case.
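The two checks above can be automated with a short Python sketch. It is only an illustration: the function name is hypothetical, the sketch assumes that TREXRfcServer.ini can be read by Python's configparser module and that the gateway instance number has two digits, and it does not verify that the host name itself is correct.

```python
import configparser
import re


def check_connection_section(ini_text: str) -> list[str]:
    """Return a list of problems found in the [CONNECTION] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section("CONNECTION"):
        return ["section [CONNECTION] is missing"]
    problems = []
    host = parser.get("CONNECTION", "HOST", fallback="")
    instance = parser.get("CONNECTION", "INSTANCE", fallback="")
    if not host or host != host.lower():
        problems.append("HOST must be set and written in lowercase")
    if not re.fullmatch(r"sapgw\d{2}", instance):
        problems.append("INSTANCE must have the form sapgw<gw_instance_number> in lowercase")
    return problems


sample = """
[CONNECTION]
HOST=mytrexhost
INSTANCE=SAPGW48
"""
for problem in check_connection_section(sample):
    print(problem)
```

For the sample text, only the INSTANCE value is reported because it is written in uppercase.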
Creating a Search Server Relation
Use
It might be necessary to create a search server relation for communication between an application and TREX. The installation documentation on the application in
question will contain information on whether you need a search server relation.
Technical background: The need for a search server relation depends on the version of the TREX ABAP client that is used by the application in question.
There are the following versions:
● The SRET package with the function modules SRET*
● The STREX package with the function modules TREX_*
If the application in question uses the SRET package, you must create a search server relation. If the application uses the STREX package, this step is not
required.
Creating a search server relation consists of the following:
1. Creating a search server relation.
2. Testing the search server relation.
Creating a Search Server Relation
1. Choose transaction SRMO in the SAP system.
2. Choose Create SSR.
3. Enter a name for the search server relation in the field Search Server Relation ID (for example, SSR_TREX).
4. Choose Create SSR.
5. Enter the following data:
Field Entry
Search engine DRFUZZY
This is the internal name of the TREX search engine.
Make sure that you enter DRFUZZY in uppercase and in the format
specified.
RFC Destination (TCP/IP) Name of the RFC destination that you created with the activation type
Registration. This entry must match the name that you assigned when you
created the RFC destination (see Creating an RFC Destination with Activation
Type Registration).
Example: TREXDEFAULT_REG
Description Description of the search server relation, for example, Search Server Relation
for Retrieval Service.
6. Save your entries.
You return to the previous dialog box.
7. Select the newly created search server relation in the table.
8. Choose Set SSR as Default.
9. In the confirmation prompt that appears, choose Yes.
The search server relation is then shown as default in the table.
Testing a Search Server Relation
1. Choose the RFC Destinations tab.
Two entries are listed for the search server relation you created: One for action I (indexing) and one for action S (searching).
2. Select the entry with action = S. Choose Connection Test under Search Engine Settings.
The connection with the TREX RFC server and the TREX search engine is established. You can see this in the version information that is shown for the TREX
components.
3. Select the entry with action = I. Choose Connection Test under Search Engine Settings.
The connection with the TREX RFC server and the TREX search engine is established. You can see this in the version information that is shown for the TREX
components.
If the RFC connection still cannot be established, see Checking an RFC Connection.
Activating Queue Server Usage
Use
There are two methods for indexing:
· With queue server
The RFC server sends the documents to be indexed to the queue server. The queue server collects the documents and transmits them to the index server
according to the conditions defined in the queue parameters. The actual indexing takes place on the index server.
· Without queue server
The RFC server sends the documents to be indexed to the index server.
The most suitable configuration depends on the application. The version of the TREX ABAP client determines whether you can configure usage of the queue
server in the file TREXRfcServer.ini. SAP Note 658052 contains information on which configuration is most suitable for each application and whether you have to
activate the usage of the queue server in the file TREXRfcServer.ini.
Procedure
1. If you have to activate the usage of the queue server, edit the configuration file <TREX_DIR>/<host_name>/TREXRfcServer.ini.
2. In the [CONNECTION] section, set the USE_QUEUESERVER parameter to YES.
[CONNECTION]
USE_QUEUESERVER=YES
…
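If you prefer to script this change, the following Python sketch shows one possible approach. It is an assumption-laden illustration, not an SAP tool: the function name is hypothetical, and the sketch works on the text of the file line by line so that all other entries are left untouched. If the file has no [CONNECTION] section, the text is returned unchanged.

```python
def enable_queue_server(ini_text: str) -> str:
    """Set USE_QUEUESERVER=YES in [CONNECTION], adding the line if needed."""
    out, in_section, done = [], False, False
    for line in ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Leaving [CONNECTION] without having seen the parameter: add it.
            if in_section and not done:
                out.append("USE_QUEUESERVER=YES")
                done = True
            in_section = stripped.upper() == "[CONNECTION]"
        elif in_section and stripped.split("=")[0].strip().upper() == "USE_QUEUESERVER":
            line, done = "USE_QUEUESERVER=YES", True
        out.append(line)
    if in_section and not done:
        out.append("USE_QUEUESERVER=YES")
    return "\n".join(out) + "\n"


before = "[CONNECTION]\nHOST=mytrexhost\nUSE_QUEUESERVER=NO\n"
print(enable_queue_server(before), end="")
```

Keep a copy of the original file before writing the result back.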
Result
The changes take effect when you next start the RFC server. The RFC server is automatically started by the TREX daemon and/or by SAP Gateway.
If you use the queue server, check the queue parameters regularly and set them according to your requirements. Make sure that you configure the intervals at
which the queue server is to transmit documents to the index server. The settings that are suitable depend on how often documents are to be indexed, and how
quickly you want them to be available for the search.
You can configure queue parameters using the Python version of the TREX administration tool, for example. For more information, see the SAP Library at the
Internet address help.sap.com → Documentation → SAP NetWeaver.
Configuring Queue Parameters
Use
The queue parameters control the interaction between the queue server and the index server. In particular, they specify when the queue server triggers indexing
and optimization of documents. It is important for performance reasons that you have optimum settings for the queue parameters.
When TREX creates a queue, it uses the default settings for the queue parameters. Depending on the document sets that you have to index initially and on the type of documents you index, you may have to change the default settings.
The default settings that TREX uses for new queues are defined in the configuration file TREXQueueServer.ini. You can change the default settings. However,
you should only make changes to configuration files after consulting SAP support or a consultant.
Prerequisites
You have already created indexes.
Procedure
You can change the queue parameters for existing queues as follows:
Tool Path
TREX admin tool Queue Admin → Queue Parameters
TREX monitor in the portal System Administration → Monitoring → Knowledge Management → TREX Monitor → Edit Queue Parameters
TREX Admin Tool in the SAP System Transaction TREXADMIN → Queue Admin → Set Queue Parameters
For more information about the meaning of the queue parameters, see the SAP Library at help.sap.com.
Checking Performance Settings for the Operating System
Use
To optimize the performance of TREX when using the released Windows platform, you need to check your Windows configuration and make changes if necessary.
Optimizing Data Throughput For Network Applications
The Windows installation normally makes caching settings that are optimized for file servers. The operating system then reserves a large part of the main memory
for the caching of files. Since this file-system cache impairs performance when indexing, you ought to change these settings.
1. Use the secondary mouse button to click on My Network Places on the Windows desktop, and choose Properties.
2. Use the secondary mouse button to click on the local network connection and choose Properties.
3. Select the entry File and Printer Sharing for Microsoft Networks and choose Properties.
4. Select Maximize data throughput for network applications.
5. Choose OK twice.
Optimizing Performance for Background Processes
Programs such as Microsoft SQL Server and Microsoft Exchange make the setting described below automatically when they are installed. If you have
installed one of these programs, you do not need to make any changes.
The setting is only relevant if TREX is running as a Windows service.
Windows 2000
1. Use the secondary mouse button to click on My Computer on the Windows desktop, and choose Properties.
2. Choose the Advanced tab, and then choose Performance Options.
3. Under Application Response, choose the Background Services field.
4. Choose OK twice.
Windows Server 2003
1. Use the secondary mouse button to click on My Computer and choose Properties.
2. Choose the Advanced tab, and then choose Settings → Advanced.
3. Select Background services under Adjust for best performance of.
4. Choose OK twice.
Distributed TREX Systems (Multiple Host Installation)
Purpose
Search and Classification (TREX) consists of a client component and a server component. The server component is based on a flexible architecture that allows a
distributed installation. A distributed system has the following advantages:
· Load distribution
You can distribute the search and indexing load among several hosts.
· High availability
You can make searching and indexing highly available.
This guide explains how to plan and implement a distributed system. It is aimed at technology consultants.
The guide is structured as follows:
· Naming Conventions contains information on the naming conventions used in this guide.
· Required Documentation lists the documentation that you need to implement a distributed system.
· Fundamentals contains information on the TREX architecture and basic information on distributed systems. You need this information to plan a distributed
system. Read this information before you begin to implement your distributed system.
· Setting Up a Distributed System and Delta Index and Index Replication Configuration describe how to implement a distributed system.
· Changing a Distributed System describes changes that you can make to your system after the installation.
· Distributed Preprocessing of Documents describes how to distribute the preprocessing of documents among several hosts. This section is relevant if you
want to index documents whose preprocessing takes up a lot of time and system resources. This can be the case if you want to index large PDF files.
· The appendix contains information on stopping and starting a distributed system. It also contains information on starting the TREX admin tool and changing the
queue parameters and the Java client parameters.
Naming Conventions
The following conventions are valid for this documentation.
Terminology
Term Meaning
TREX instance One installation of the TREX server software
TREX host Host on which the TREX server software is installed
Server Program that offers services (such as an index server or queue server)
Master host Host on which a master index server is running
Slave host Host on which a slave index server is running
Backup host Host on which a backup index server is running
Variables
Variable Meaning
<SAPSID> System ID in uppercase letters
<sapsid> System ID in lowercase letters
<TREX_DIR> Installation directory for a TREX instance. The path to the directory is:
· On UNIX /usr/sap/<SAPSID>/TRX<instance_number>
· On Windows <disk_drive>:\usr\sap\<SAPSID>\TRX<instance_number>
User <sapsid>adm Operating system user that you log on with to administrate TREX.
User SAPService<SAPSID> Operating system user under which the TREX processes run.
User <j2eeadm> Operating system user that you use to log on to the host on which the J2EE
Engine is running.
Abbreviations
The following abbreviations are used in the graphics.
TREX is based on a client/server architecture. The client component is integrated into the application that uses the TREX functions, and allows communication
with the TREX servers. The server component processes the requests; it indexes and classifies documents and answers search queries.
The client component is subdivided into the Java client and ABAP client. The server component is subdivided into the following servers:
● Web server with TREX extension
● RFC server
● Queue server
● Preprocessor
● Index server
● Name server
The graphic below shows the individual components and the communication between components:
Java Client and ABAP Client
TREX provides programming interfaces (Application Programming Interfaces, APIs) for the languages Java and ABAP. These interfaces are also called the Java
client and the ABAP client.
The interfaces allow access to all TREX functions. You can use the interfaces to create indexes and queues, to perform indexing, and to perform searches. In
addition, the interfaces provide functions to query the internal status of TREX.
The interfaces are part of the NetWeaver Application Servers (NW AS).
Web Server with TREX Extension
The Web server is responsible for the communication between Java applications and the TREX servers. The application sends requests to the Web server in
XML format using HTTP/HTTPS. The Web server converts the requests to a TREX-internal format and then forwards them to the responsible TREX servers.
A TREX component that enhances the Web server with TREX-specific functions is installed on the Web server. Technically, this component is implemented as
follows:
· On Windows, as an ISAPI server extension for the Microsoft Internet Information Server
· On UNIX, as a shared library for the Apache Web server
RFC Server
The RFC server is responsible for the communication between an SAP system and the TREX servers. The SAP system sends requests to an RFC server
using an SAP Gateway. The RFC server converts the requests to a TREX-internal format and then forwards
them to the responsible TREX servers.
Name Server
The name server manages information on the entire TREX system. It makes sure that the TREX servers can communicate with each other and that they receive
all necessary information. The name server has the following tasks:
· Managing topology data
The topology data includes information on the central components of a TREX system (TREX servers, indexes, and queues).
· Coordinating replication services
The replication services are only relevant for a distributed TREX system. The name server has information on which TREX server has a particular data
status. It makes sure that changed data is replicated.
· Load-balancing
The name server accepts requests and distributes them to the responsible TREX servers. It is responsible for distributing indexes and search queries.
· Ensuring high availability
The name server launches several watchdogs. They constantly monitor whether the TREX servers are available. If a TREX server is not available, the name server
ensures that the server that is down does not receive any requests.
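The watchdog behavior described above can be pictured with a minimal Python sketch. The class `Topology`, its method names, and the round-robin choice among available servers are illustrative assumptions and not TREX interfaces; the source only states that a server that is down receives no requests.

```python
from itertools import cycle


class Topology:
    """Hypothetical stand-in for the name server's view of the TREX servers."""

    def __init__(self, servers):
        self.available = {name: True for name in servers}
        self._ring = cycle(servers)
        self._size = len(servers)

    def report(self, name, is_up):
        # A watchdog would call this after each availability check.
        self.available[name] = is_up

    def route(self):
        # Hand out the next server that is currently marked as available.
        for _ in range(self._size):
            candidate = next(self._ring)
            if self.available[candidate]:
                return candidate
        raise RuntimeError("no TREX server available")
```

Once a server is reported as down, `route()` skips it until a later `report()` call marks it as available again.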
Queue Server
The queue server coordinates the processing steps that take place during indexing. It collects incoming documents, triggers preprocessing by the preprocessor,
and then triggers further processing by the index server.
The queue server enables documents to be indexed asynchronously. This has the advantage that you can control the time of indexing. For example, you can
schedule indexing for times when the system load is lower because there are fewer search queries.
In addition, the queue server can trigger index replication and integration of the delta index in the main index.
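As a rough illustration of asynchronous indexing, the following Python sketch separates the moment a document is submitted from the moment it is indexed. The class `DocumentQueue` and the two callables that stand in for the preprocessor and the index server are hypothetical; they are not part of any TREX API.

```python
from collections import deque


class DocumentQueue:
    """Hypothetical queue that decouples document arrival from indexing."""

    def __init__(self, preprocess, index):
        self._pending = deque()
        self._preprocess = preprocess  # stands in for the preprocessor
        self._index = index            # stands in for the index server

    def submit(self, document):
        # The application returns immediately; nothing is indexed yet.
        self._pending.append(document)

    def flush(self):
        # Called at a time of the administrator's choosing, e.g. off-peak.
        processed = 0
        while self._pending:
            self._index(self._preprocess(self._pending.popleft()))
            processed += 1
        return processed
```

Calling `flush()` from a scheduled job corresponds to scheduling indexing for times when fewer search queries arrive.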
Preprocessor
The preprocessor preprocesses documents and search queries.
Document preprocessing comprises the following steps:
· Loading documents
If the application transmits the documents as URIs rather than directly, TREX resolves the URIs. This involves fetching the documents from the repository that
the URIs reference.
· Filtering documents
Documents can exist in various formats, such as Microsoft Word, Microsoft PowerPoint, PDF, and so on. The preprocessor extracts textual content from the
documents and then converts it into the UTF-8 Unicode format for further processing.
· Analyzing documents linguistically
Linguistic analysis involves splitting text into individual words and reducing words to base forms (stems). The preprocessor uses a lexicon that exists in
several languages for this.
During search queries, the preprocessor performs a linguistic analysis. It transmits the results of the analysis to the index server, which continues the processing
of the document.
Index Server
The index server indexes and classifies documents and answers search queries. The processing takes place in the engines that belong to the index server.
There are the following engines:
· Search engine
This engine is responsible for standard search functions such as the exact, error-tolerant, linguistic, Boolean, and phrase searches.
· Text-mining engine
This engine is responsible for classification, searching for similar documents (‘See Also’ search), the extraction of key words, and so on.
· Attribute engine
This engine is responsible for searching for document attributes such as author, creation date, and change date.
TREX Instances and the TREX System
A TREX instance is an administrative unit that comprises the TREX server components. A TREX instance is started and stopped as a unit.
The following components belong to a TREX instance:
· A name server
· A queue server
· One or more index servers
· One or more preprocessors
· Optionally, one or more RFC servers
· Optionally, one or more Web servers
A TREX instance runs on a host. It is possible for several TREX instances to run on the same host. A TREX instance is identified by a two-character instance
number. This instance number must be unique on a host.
A TREX system consists of one or more TREX instances. If it consists of only one TREX instance, it is called a single host system. If it consists of multiple
connected TREX instances, it is called a distributed system.
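The rules for instance numbers and system types can be summarized in a short Python sketch. The names `TrexInstance` and `validate_system` are hypothetical and serve only to restate the rules above: a two-character instance number, unique per host, and a system that is either single host or distributed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrexInstance:
    host: str
    instance_number: str  # two characters, for example "01"


def validate_system(instances):
    """Check the instance numbers and classify the system."""
    if not instances:
        raise ValueError("a TREX system needs at least one instance")
    seen = set()
    for inst in instances:
        if len(inst.instance_number) != 2:
            raise ValueError(f"instance number must have two characters: {inst}")
        key = (inst.host, inst.instance_number)
        if key in seen:
            raise ValueError(f"duplicate instance number on host: {inst}")
        seen.add(key)
    return "single host" if len(instances) == 1 else "distributed"
```

The same instance number may appear on different hosts; only a repeat on the same host is rejected.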
Distributed TREX Systems
The sections below explain concepts that are relevant for distributed TREX systems.
Server Tasks
In a distributed system, there are multiple instances of the individual TREX servers (name server, index server, queue server, and so on).
The servers in a distributed system do not have the same rights and have different tasks. The following sections describe these tasks.
Master, Slave, and Backup Index Servers
The index servers in a distributed system have one of the following roles:
· Master index server
· Slave index servers
· Backup index server
A master index server is responsible for indexing. In the default configuration, it is not responsible for searching.
A slave index server is responsible only for searching and not for indexing.
The separation of the master index server and slave index servers is beneficial to performance. The indexing functions are separate from the searching functions,
so that there is no loss of performance during indexing runs.
A backup index server can replace a master index server if it becomes unavailable. The backup index server is inactive if the master index server is available.
When the master index server restarts after becoming unavailable, it takes over its tasks from the backup server again.
You implement backup index servers in order to make indexing highly available. The indexes must be stored centrally, so that both the master and the backup
index servers can have write-access to them.
The index servers are the central components of a TREX system. In principle, their role determines the load that a host has to carry. The documentation below
therefore refers to the hosts according to the role of the index server: A master, slave, or backup host is a host on which a master, slave, or backup index server
is running.
Master and Backup Queue Servers
The queue servers in a distributed system have one of the following roles:
· Master queue server
· Backup queue server
The master queue server is the primary server for managing the queues.
A backup queue server can replace a master queue server if it becomes unavailable. The backup queue server is inactive if the master queue server is
available. When the master queue server restarts after becoming unavailable, it takes over its tasks from the backup queue server again.
You implement backup queue servers in order to make indexing highly available. The queues must be stored centrally, so that both the master and the backup
queue servers can have write-access to them.
Master and Slave Name Servers
The name servers in a distributed system have one of the following roles:
· Master name server
· Slave name server
The master name servers can update the topology data for the system. The slave name servers can only read the topology data.
In a distributed system you need at least two master name servers, and cannot define more than three. The system automatically defines an active master. If the
active master is unavailable, the next master name server takes over the tasks.
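A minimal Python sketch of these rules follows. The function name `active_master`, the `is_available` callable, and the assumption that "the next master" means the next entry in a configured list are all hypothetical; the source does not describe how TREX orders the master name servers.

```python
def active_master(master_name_servers, is_available):
    """Return the first available master name server.

    master_name_servers: ordered list of host names (ordering is an assumption).
    is_available: callable that maps a host name to True or False.
    """
    if not 2 <= len(master_name_servers) <= 3:
        raise ValueError("a distributed system needs two or three master name servers")
    for host in master_name_servers:
        if is_available(host):
            return host
    raise RuntimeError("no master name server available")
```

If the first master in the list is down, the function returns the second one, which mirrors the takeover described above.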
Preprocessor Modes
The preprocessors can run in the following different modes:
· search mode
The preprocessor only preprocesses search queries. In this mode the preprocessor runs on the slave hosts by default.
· index mode
The preprocessor only preprocesses documents. In this mode the preprocessor runs on the master hosts by default.
· any mode
The preprocessor's tasks are not restricted.
The modes merely define preferences for the distribution of tasks for the preprocessors. If necessary, a preprocessor carries out all tasks regardless of its mode.
For example, in certain circumstances a preprocessor that runs in index mode also processes search queries. This behavior increases the availability of the
system, because in principle all preprocessors are able to carry out all tasks.
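The idea that modes are preferences rather than restrictions can be sketched in Python as follows. The function `pick_preprocessor` and the tuple representation are hypothetical, and choosing the first match is a simplification; the source does not describe the actual selection logic.

```python
def pick_preprocessor(preprocessors, task):
    """Pick a preprocessor for a task, honoring mode preferences.

    preprocessors: list of (name, mode) with mode in {"search", "index", "any"}.
    task: "search" or "index".
    """
    preferred = [name for name, mode in preprocessors if mode in (task, "any")]
    if preferred:
        return preferred[0]
    # No preprocessor prefers this task: fall back to any preprocessor at all.
    if preprocessors:
        return preprocessors[0][0]
    raise RuntimeError("no preprocessor available")
```

With only an index-mode preprocessor left, a search task is still served, which is the availability benefit described above.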
Master and Slave Index
A master index is the original version of an index. It is managed by a master index server.
A slave index is a copy of a master index. It is managed by a slave index server. The slave index is created and updated using a replication procedure.
Connection to the Application
HTTP Connection
If the TREX system is connected to a Java application, the Java application communicates with both the name server and with the Web server. The Java
application asks the name server via TCP/IP for the address of a Web server. It then sends the request to the Web server using HTTP/HTTPS. The Web server
forwards the request TREX-internally. The graphic below depicts this communication:
There are multiple Web servers in a distributed TREX system. As soon as the Java application receives the address of one Web server, it communicates with
that Web server for as long as it is available. If the Web server does not answer (for example, because it is overloaded), the Java application swaps to another
Web server.
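The client behavior described here, staying with one Web server and swapping only when it stops answering, might look like the following Python sketch. `WebServerSelector` and the `transport` callable are hypothetical; the real Java client and its interfaces are not shown in this document.

```python
class WebServerSelector:
    """Hypothetical client-side helper: stay on one Web server until it fails."""

    def __init__(self, addresses):
        if not addresses:
            raise ValueError("at least one Web server address is required")
        self._addresses = list(addresses)
        self._current = 0

    def send(self, request, transport):
        # transport(address, request) is an assumed callable that raises
        # ConnectionError when the Web server does not answer.
        for _ in range(len(self._addresses)):
            address = self._addresses[self._current]
            try:
                return transport(address, request)
            except ConnectionError:
                # Swap to the next Web server and retry there.
                self._current = (self._current + 1) % len(self._addresses)
        raise ConnectionError("no Web server answered")
```

After a swap, later requests go directly to the new Web server without retrying the one that failed.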
RFC Connection
If the TREX system is connected to an ABAP application (that is, to an SAP system), both systems communicate via an RFC connection. The SAP system
sends its requests to an SAP gateway. The SAP gateway sends the requests to a TREX RFC server. The TREX RFC server forwards the requests TREX-
internally.
With regard to the SAP gateway, there are two variants:
· Communication takes place using the local SAP gateway of the application server.
· Communication takes place using a central SAP gateway.
In the case of a distributed TREX system, SAP strongly recommends using the local SAP gateways of the application servers. On the TREX side, TREX RFC
servers are registered with each local SAP gateway. Each TREX host is connected to each application server of the SAP system.
The graphic below depicts this.
Using the local SAP gateways has the following benefits:
· The local SAP gateways process the requests more quickly than a central SAP gateway.
· The SAP gateway is not a “single point of failure.” If an application server and its local SAP gateway fail, the requests are distributed among the remaining
application servers and still continue to reach the TREX system.
If you use a central SAP gateway and the SAP gateway fails, the RFC connection fails too. It is not possible to switch to another central SAP gateway
automatically.
Data Storage
In a distributed system you can keep TREX data (indexes, queues, and index snapshots) centrally or on the separate hosts.
Decentralized Data Storage
If data is not kept centrally, each host stores its data in its own directory structure. The data is normally located locally on the hosts.
The following graphic depicts the data and directory structure with decentralized data storage:
The master indexes, corresponding queues, and the index snapshots are located on a master host. The index snapshots are index copies that the system needs
for index replication.
The slave indexes are located on a slave host. They are created and updated by index replication. There is no other data on the slave hosts.
You cannot use backup hosts in systems where data storage is decentralized. This means that you cannot make indexing highly available in such systems.
Centralized Data Storage
With centralized data storage, the data is stored so that all TREX hosts can access it.
Centralized data storage can be realized with different hardware solutions: The data can be located on a server that is optimized for file sharing, in a storage area
network (SAN), or on a network attached storage server (NAS server). It is important that the connection between the TREX hosts and the data is sufficiently fast. In
the following documentation, a central storage location is referred to as a file server regardless of the underlying hardware.
Centralized data storage is necessary if you want indexing to be highly available. You can only move from a master index or queue server to a backup index or
queue server if you are using centralized data storage. You can use standard solutions such as the RAID system to make data highly available.
Centralized data storage also has the following advantages if you are only using master and slave hosts:
· Index replication generates less of a network load because the replicated files do not have to be copied onto every slave host.
· Index replication is quicker.
· Less disk space is required for the replicated indexes because all slave hosts share an index copy.
The following graphic depicts the data and directory structure with centralized data storage:
Features of a Blade System
If you do not want to implement individual hosts you can install TREX on a blade system. TREX supports blade systems that run on UNIX.
A blade system consists of hosts in the form of server blades. A blade system has the advantage that the initial costs and running costs for maintaining the
system are less than if you were using individual hosts.
The server blades are connected to a central disk storage. This is referred to here as a file server, regardless of the underlying hardware.
The special feature of a TREX installation on a blade system is that the TREX software can be stored centrally as well as the TREX data. This means that you only
have to install the software once on the file server. Maintaining the system is efficient because you only have to implement software updates once.
All server blades on which TREX is running access the same program files. However, each server blade has its own configuration files. The configuration files in
the directory <TREX_DIR> are only used as templates. A script contained in the TREX delivery creates a separate subdirectory for each server blade and copies
the configuration files to this subdirectory. For more information, see Activating the Configuration Clones for Server Blades.
Except for the activation of this script, the remaining configuration takes place as for a system with individual hosts.
The graphic below depicts how data, programs, and configuration files might be stored in a blade system.
Supported Systems
There are various ways of structuring distributed systems. The table below contains an overview of the systems that are supported. Because the index servers
are the central components, the systems are classed by the role of the index server.
Supported Systems
Number and Roles of Index Servers                       Data Storage
Backup              Master       Slave                  Decentralized   Centralized
–                   1            1                      Yes             Yes
–                   2 or more    At least 1 per master  Yes             Yes
1                   1            1                      No              Yes
1 for all masters   2 or more    At least 1 per master  No              Yes
1 per master        2 or more    At least 1 per master  No              Yes
SAP recommends configuring at least two slave index servers for each master index server, to noticeably improve the performance of the TREX search.
However, all other combinations of master and slave index servers are also possible (for example, one master and three slaves). You can also start with a
minimal configuration of one master index server and one slave index server.
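The supported combinations can be expressed as a small consistency check. The Python sketch below is an interpretation of the table and of the statement that backup servers require centralized data storage; the function `check_system` and its parameters are hypothetical, and the rule "at least as many slaves as masters" is a simplification of "at least 1 per master".

```python
def check_system(masters, slaves, backups, storage):
    """Return a list of problems; an empty list means the layout is supported.

    masters, slaves, backups: numbers of index servers in each role.
    storage: "centralized" or "decentralized".
    """
    problems = []
    if masters < 1:
        problems.append("at least one master index server is required")
    if slaves < masters:
        problems.append("each master needs at least one slave index server")
    if backups not in (0, 1, masters):
        problems.append("use no backup, one shared backup, or one backup per master")
    if backups > 0 and storage != "centralized":
        problems.append("backup servers require centralized data storage")
    return problems
```

For example, two masters with four slaves and one shared backup pass the check only when `storage` is `"centralized"`.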
The following is valid for all supported systems:
· You can install all systems on individual hosts or you can use a blade system.
The graphics below depict systems on individual hosts. However, all graphics are also valid for blade systems.
· You can connect any system to an application using an HTTP connection and/or an RFC connection.
The graphics below depict systems in which Web servers and RFC servers run. If only one type of connection is relevant, only Web servers or only RFC
servers can run.
The sections below describe the supported systems in detail. In the details, the recommended ratio of one master index server to two slave index servers for
improved TREX performance is assumed.
Systems with Master and Slave Index Servers
A system with master and slave index servers has the following advantages:
· Load distribution for search queries
Parallel search queries are distributed among several slave index servers and can therefore be answered more quickly.
· High availability for searching
Each index is available on multiple slave index servers. If one server goes down, the search queries are distributed among the remaining slave index
servers. If all slave index servers become unavailable, the master index server processes the search queries.
· Indexing larger data sets
A master index server can only process a certain amount of data. If you use multiple master index servers, you can index more data than in a single host
system. The data must be distributed among several indexes.
If a system has no backup servers, you can store the TREX data either centrally or decentrally. The graphics below only depict systems with decentralized data
storage.
One Master, Multiple Slave Index Servers
You can build a system with one master and several slave index servers as depicted below.
The master index server carries the entire indexing load in this scenario. The searching load is distributed among the slave index servers. Such a system is
suitable for scenarios where one master index server can cope with the amount of data to be indexed.
The smallest recommended system consists of one master and two slave index servers that run on separate hosts. The host that is configured as the master
index server is also configured as the master queue server. The graphic below depicts the system.
Multiple Masters, Multiple Slave Index Servers
A master index server can only process a certain amount of data. If large data sets are to be indexed and you can distribute the data among several indexes, you
can implement multiple master index servers. Each master index server manages some of the indexes.
You cannot define multiple master index servers to manage the same index.
TREX distributes the indexes among the master index servers using a round robin procedure. TREX also distributes the queues among the master queue servers
using a round robin procedure. Any queue is located on the same host as the master index to which it belongs.
The load on a master index server depends on how large the indexes become and how often you update the indexes. If automatic index distribution does not lead
to balanced load distribution, you can change the index distribution later on.
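Round-robin distribution of indexes among master hosts can be sketched in a few lines of Python. The function `distribute_round_robin` and the host and index names are hypothetical; the sketch only shows the general round-robin technique and is not the TREX implementation.

```python
from itertools import cycle


def distribute_round_robin(indexes, master_hosts):
    """Assign each index (and its queue) to a master host in turn."""
    if not master_hosts:
        raise ValueError("at least one master host is required")
    assignment = {}
    hosts = cycle(master_hosts)
    for index_name in indexes:
        # The queue of an index is located on the same host as its master index.
        assignment[index_name] = next(hosts)
    return assignment
```

Round robin balances the number of indexes per host, not their size or update frequency, which is why a later manual redistribution can be necessary.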
The smallest recommended system with multiple master index servers consists of two masters, each with two slave index servers.
You can realize this system in two ways, according to how many CPUs and how much main memory the hosts have. For information on hardware requirements,
see Hardware, Software, and Other Requirements.
One index server per host
If your hosts have few CPUs and not much main memory, only one index server can run per TREX instance. If this is the case, you distribute the master index
servers among multiple hosts.
The graphic below depicts a system with two master index servers that are distributed among two hosts.
Multiple index servers per host
If your hosts have sufficient CPUs and main memory, multiple index servers can run for each TREX instance. The TREX setup checks the hardware resources
and automatically configures the number of index servers.
The same number of index servers must run on the master host and on the corresponding slave hosts. If two index servers run on the master host, two index
servers must run on each slave host.
The following graphic depicts a system with two index servers for each host:
One master queue server per master host is sufficient. This server manages the queues for both master index servers running on the host.
You can build systems with multiple masters and multiple slave hosts and with multiple index servers per host. The graphic below depicts such a system.
Systems with Master, Backup, and Slave Index Servers
You can enhance master/slave systems by adding backup servers (backup index servers and backup queue servers). Such enhanced systems offer additional
high availability for indexing.
Each master index server manages some of the indexes. If a master index server goes down, indexing does not normally take place for affected indexes. You
implement backup index servers to avoid this. A backup index server can replace a master index server if it becomes unavailable. The backup index server is
inactive if the master index server is available.
The same is true for the queue server: If a master queue server goes down, queuing normally does not take place for the documents affected, and this means that
indexing cannot take place either. You implement backup queue servers to avoid this. A backup queue server can replace a master queue server if it becomes
unavailable. The backup queue server is inactive if the master queue server is available.
The TREX data has to be stored centrally in systems with backup servers. Otherwise the master and backup servers cannot access the data.
You can build systems with backup servers in the following way:
· One backup server per master server
· One backup server for all master servers
The following factors dictate which variant to choose.
· The number of master servers
· The number of master servers that are expected to be down for maintenance at the same time
One Backup Server per Master Server
The smallest recommended system with backup servers consists of one file server, one backup server, one master server, and two slave servers.
The graphic below depicts this system.
As many index servers must run on the backup host as run on the master host. If two index servers run on the master host, two index servers must run on the
backup host.
The graphic below depicts a larger system with multiple master and backup hosts.
In this system, both master hosts can be down at the same time, because each has its own backup host.
One Backup Server for All Master Servers
You can build systems in which one backup host is assigned to all master hosts. Only one master host can be down at any one time in such systems. If multiple
master hosts with a full load go down, one backup host cannot take on the entire load.
The graphic below depicts a system with two master hosts that share a backup host.
Summary: High Availability
If you are using TREX productively, the system has to be available for as much of the time as possible. Planned downtimes for maintenance and unplanned
downtimes because of software errors should be reduced.
This section summarizes how to make searching and indexing highly available. The depicted measures are made on TREX server side and for the connection
between TREX and the application. Measures that affect other software components or the hardware (highly available file servers, redundant network connection
and so on) are not depicted.
High Availability for Searching
Searching is highly available in a system with master and slave servers. If a slave host goes down, TREX forwards search queries to the other slave hosts.
It can take up to a minute to switch to another slave host. During this phase TREX search queries may not be answered. An error message may be returned.
If you have one master and two slave hosts, you can shut down one of the hosts for maintenance purposes (either the master or one of the slave hosts).
Measures on TREX side
· Each master host has at least two slave hosts.
· Each index is available on at least two slave hosts.
· In systems with an HTTP connection: There are at least two Web servers.
· In systems with an RFC connection: See RFC Connection in Connection to the Application.
Measure on the application side
Type of Application    Measure
Java application       The Java client recognizes at least two name servers.
ABAP application       See RFC Connection in Connection to the Application.
High Availability for Indexing (Only with Queue Server)
If indexing takes place using queue servers, you can make indexing highly available. High availability means the following:
· The application can send indexing requests to TREX.
· The system automatically switches to a backup index or queue server if a master index or queue server goes down (failover). Failover is not possible in the
following cases:
  ○ If there are network problems
  ○ If a file server goes down
  ○ If there are communication problems (the application sends a request and receives no answer)
The switch to a backup index or queue server takes between 15 seconds and one minute. During this phase the system stores indexing requests in a cache and sends them to the backup server after the switch.
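The caching of indexing requests during a switch can be illustrated with a minimal Python sketch. The class `FailoverIndexer`, its methods, and the explicit `begin_switch` and `complete_switch` calls are hypothetical; in TREX the switch is automatic, and the source does not describe where the cache resides.

```python
class FailoverIndexer:
    """Hypothetical view of a master/backup pair during failover."""

    def __init__(self, send_to_master, send_to_backup):
        self._targets = {"master": send_to_master, "backup": send_to_backup}
        self._active = "master"
        self._switching = False
        self._cache = []

    def begin_switch(self):
        # The master went down; requests are held until the backup is ready.
        self._switching = True

    def complete_switch(self):
        # The backup is ready: replay the cached requests in arrival order.
        self._active = "backup"
        self._switching = False
        for request in self._cache:
            self._targets["backup"](request)
        self._cache.clear()

    def index(self, request):
        if self._switching:
            self._cache.append(request)
        else:
            self._targets[self._active](request)
```

Requests that arrive during the switch are not lost; they reach the backup server once `complete_switch()` has run.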
Measures on TREX side
· There are at least two master name servers.
· Each master index server has a backup index server (its own or one that it shares with the other master index servers).
· Each master queue server has a backup queue server (its own or one that it shares with the other master queue servers).
· If the integration of the delta index takes place using the Python scheduler: The Python scheduler is running on all hosts that are configured as master name
servers.
· If index replication takes place using the Python scheduler: The Python scheduler is running on all hosts that are configured as master name servers.
· In systems with an HTTP connection: There are at least two Web servers.
· In systems with an RFC connection: See RFC Connection in Connection to the Application.
Measure on the application side
Type of Application    Measure
Java application       The Java client recognizes at least two name servers.
ABAP application       See RFC Connection in Connection to the Application.
Global File System and TREX Instances
The TREX server software comprises two parts:
· TREX Instances
These are the program files, configuration files, and so on.
· Global TREX file system
This is a directory structure, in which information about the TREX system instances is stored. For example, this information is required by management tools
to start the TREX system.
There is exactly one global TREX file system for a TREX system. When a TREX instance starts, it must have access to the global TREX file system.
Otherwise, it cannot start.
When planning a distributed TREX system, you must decide which host the global TREX file system should be located on. This determines the installation steps
that you work through later and which installation option you choose respectively. The following installation options are available:
Installation Option        Description
Central TREX instance      Installs a combination of a TREX instance and a global TREX file system
TREX dialog instance       Installs the TREX instance only
Global TREX file system    Installs the global TREX file system only
The global TREX file system can be located on any host, as long as all TREX instances have access to it when they start. It can be located on a host that belongs to the TREX system, but it does not have to be.
SAP recommends the following for a distributed TREX system with centralized data storage:
Place the global TREX file system on the file server that the TREX data (indexes, queues, snapshots) is also stored on.
In order to serve as data storage, the file server and the connection between the TREX hosts and the file server must be highly available. If the global TREX
file system is also located on the file server, you can be sure that the TREX instances can access it at all times.
For the installation process this means that you install the global TREX file system on the file server. You install a TREX dialog instance on every master,
backup, and slave host.
SAP recommends the following for a distributed TREX system with decentralized data storage:
Place the global TREX file system on a host that is used as the master name server.
For the installation process this means that you install a central TREX instance on the host that is used as the master name server. On all other hosts, you
install TREX dialog instances.
The graphic below depicts such a scenario.
Hardware, Software, and Other Requirements
This section lists the requirements that are unique to distributed systems. These hardware requirements relate to production systems.
CPU, RAM, Network
Requirement Type      Requirement
CPU                   With one index server per TREX instance:
                      · At least 2 CPUs
                      · Recommended: 4 CPUs
                      With two index servers per TREX instance: At least 4 CPUs.
                      The supported processors are listed in the TREX installation guide.
RAM                   At least 2 GB per CPU
Network connection    At least 100 Mbit
                      With centralized data storage: Connection to the file server: At least 1 Gbit
                      SAP recommends that you define a separate subnetwork.
All TREX hosts must be identical as regards the number of CPUs, RAM, and network connection.
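The minimum requirements in the table can be turned into a simple check for one host. The Python function below is a sketch; the name `check_host` and its parameters are hypothetical, and it covers only the minimum CPU, RAM, and general network values, not the recommended values or the file server connection.

```python
def check_host(cpus, ram_gb, network_mbit, index_servers=1):
    """Return the minimum-requirement violations for one TREX host."""
    if index_servers not in (1, 2):
        # The table only states requirements for one or two index servers.
        raise ValueError("requirements are only stated for one or two index servers")
    problems = []
    min_cpus = 2 if index_servers == 1 else 4
    if cpus < min_cpus:
        problems.append(f"needs at least {min_cpus} CPUs")
    if ram_gb < 2 * cpus:
        problems.append("needs at least 2 GB RAM per CPU")
    if network_mbit < 100:
        problems.append("needs at least a 100 Mbit network connection")
    return problems
```

Because all TREX hosts must be identical, running the check for one host is sufficient if the others match it.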
Decentralized Data Storage: Disk Space for TREX Data
The formulas specified here are approximate and do not return exact values.
Required disk space on one master host
                                   Only HTML/Text Documents       Mixed Documents (DOC, PDF, and so on)
Index size + queue (permanent)     Document set size x 2          Document set size x 0.5
Index snapshot size (permanent)    Document set size x 2 x 0.7    Document set size x 0.5 x 0.7
Temporary disk space               Document set size x 1.5        Document set size x 0.5
We strongly recommend that you place the master indexes and the index snapshots on different hard disks. This improves performance when indexing and
replicating indexes.
Required disk space per slave host
                                   Only HTML/Text Documents       Mixed Documents (DOC, PDF, and so on)
Index size (permanent)             Document set size x 2          Document set size x 0.5
Index snapshot size (temporary)    Document set size x 2 x 0.7    Document set size x 0.5 x 0.7
The hard disk capacity and performance must be identical on master and slave hosts.
You have a document set size of 50 GB of HTML/text documents or 50 GB of mixed documents.
The following table presents the required space on the master host.
Master Host | 50 GB HTML/Text Documents | 50 GB Mixed Documents
Index + queue (permanent) | 100 GB (50 GB x 2) | 25 GB (50 GB x 0.5)
Index snapshot (permanent) | 70 GB (50 GB x 2 x 0.7) | 17.5 GB (50 GB x 0.5 x 0.7)
Temporary | 75 GB (50 GB x 1.5) | 25 GB (50 GB x 0.5)
Total | 245 GB | 67.5 GB
The following table presents the required space on each slave host.
Slave Host | 50 GB HTML/Text Documents | 50 GB Mixed Documents
Index (permanent) | 100 GB (50 GB x 2) | 25 GB (50 GB x 0.5)
Index snapshot (temporary) | 70 GB (50 GB x 2 x 0.7) | 17.5 GB (50 GB x 0.5 x 0.7)
Total | 170 GB | 42.5 GB
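The sizing formulas for decentralized data storage can be sketched as a small calculation. The function name and the dictionary keys below are illustrative assumptions, not part of TREX; the factors are the approximate ones given in this section, so the results are estimates only.

```python
def decentralized_disk_space_gb(document_set_gb, mixed_documents=False):
    """Estimate disk space in GB for one master host and one slave host.

    Hypothetical helper: the factors are the approximate values from the
    sizing formulas for decentralized data storage.
    """
    # HTML/text documents produce an index about twice the document set size;
    # mixed documents (DOC, PDF, and so on) produce about half.
    index_factor = 0.5 if mixed_documents else 2.0
    temporary_factor = 0.5 if mixed_documents else 1.5
    snapshot_factor = 0.7  # snapshot size relative to the index size

    index = document_set_gb * index_factor
    snapshot = index * snapshot_factor
    temporary = document_set_gb * temporary_factor

    master = {
        "index_and_queue": index,
        "snapshot": snapshot,
        "temporary": temporary,
        "total": index + snapshot + temporary,
    }
    # A slave host holds the index and, temporarily, one index snapshot.
    slave = {"index": index, "snapshot": snapshot, "total": index + snapshot}
    return {"master": master, "slave": slave}


if __name__ == "__main__":
    for mixed in (False, True):
        estimate = decentralized_disk_space_gb(50, mixed_documents=mixed)
        label = "mixed" if mixed else "HTML/text"
        print(label, estimate["master"]["total"], estimate["slave"]["total"])
```

For a 50 GB document set this reproduces the totals in the tables above, up to floating-point rounding.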
Centralized Data Storage: Disk Space for TREX Data
The formulas specified here are approximate and do not return exact values.
Required disk space on the file server
Only HTML/text documents:
· Index size + queue (permanent) = Document set size x 2
· Index snapshots size (permanent) = Document set size x 2 x 1.4
· Temporary disk space = Document set size x 1.5
Mixed documents (DOC, PDF, and so on):
· Index size + queue (permanent) = Document set size x 0.5
· Index snapshots size (permanent) = Document set size x 0.5 x 1.4
· Temporary disk space = Document set size x 0.5
You do not need additional disk space for the slave index. The slave index servers use one of the index snapshots as their slave index.
You have a document set size of 50 GB of HTML/text documents or 50 GB of mixed documents.
This results in the following disk requirements on the file server:
File Server | 50 GB HTML/Text Documents | 50 GB Mixed Documents
Index + queue (permanent) | 100 GB (50 GB x 2) | 25 GB (50 GB x 0.5)
Index snapshots (permanent) | 140 GB (50 GB x 2 x 1.4) | 35 GB (50 GB x 0.5 x 1.4)
Temporary | 75 GB (50 GB x 1.5) | 25 GB (50 GB x 0.5)
Total | 315 GB | 85 GB
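The file server sizing for centralized data storage can be sketched in the same way. The function name and keys are illustrative assumptions; the factors are the approximate ones stated above, and no slave term appears because the slave index servers use one of the index snapshots as their slave index.

```python
def centralized_disk_space_gb(document_set_gb, mixed_documents=False):
    """Estimate disk space in GB on the file server (centralized storage).

    Hypothetical helper based on the approximate sizing formulas for
    centralized data storage.
    """
    index_factor = 0.5 if mixed_documents else 2.0
    temporary_factor = 0.5 if mixed_documents else 1.5
    snapshots_factor = 1.4  # all index snapshots relative to the index size

    index = document_set_gb * index_factor
    snapshots = index * snapshots_factor
    temporary = document_set_gb * temporary_factor
    return {
        "index_and_queue": index,
        "snapshots": snapshots,
        "temporary": temporary,
        "total": index + snapshots + temporary,
    }


if __name__ == "__main__":
    print(centralized_disk_space_gb(50))
    print(centralized_disk_space_gb(50, mixed_documents=True))
```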
Disk Space for TREX Software and SAPinst
As for a single-host system (see the TREX installation guide).
Software Requirements
Requirement Type | Requirement
Operating system platform | All TREX hosts must run on the same operating system platform. Mixed installations (for example, one TREX host on HP-UX and another on Windows) are not supported. There is no dependency between TREX and the application using TREX with regard to the operating system used. You can install TREX on a different operating system from the application that accesses TREX.
TREX release | All TREX hosts must have the same TREX release with the same patch level.
The software requirements in the TREX installation guide are also valid.
Operating System User and Permissions
The installation automatically creates the operating system user SAPService<SAPSID>.
In the case of a TREX system with centralized data storage, you must ensure that the user SAPService<SAPSID> has full access permission for the TREX data
directory on the file server. Note the following:
· If the user is a network user (domain user), you have to ensure this for this one network user.
· If the user is a local user, you have to ensure this for all local SAPService<SAPSID> users.
In the case of a TREX system with decentralized data storage, there are no special requirements regarding access permission.
System ID
During the TREX installation, you enter a three-character system ID, for example, TRX. You must use the same system ID for all TREX instances that you want to
group together as a distributed system.
TREX Instance Number
SAP recommends that you use the same instance number for all TREX instances in order to simplify administration. You define the instance number during the
TREX installation.
There is only one TREX installation in a blade system with centralized program storage. The instance number is the same for all server blades. During the
installation of TREX you have to choose an instance number that is still free on all the server blades on which TREX is going to run.
TREX Daemon
You only have to change the configuration of the TREX daemon on the individual hosts under certain circumstances. These circumstances are described in this
documentation.
Otherwise, you can keep the standard configuration, even if the TREX daemon starts processes that are not used. Such processes do not use up system
resources and therefore do not affect performance. If you keep the standard configuration it is easy to change the roles of the hosts.
By default, a queue server runs on each host. On a slave host, the queue server has no function and is not used. You do not need to make configuration changes to the TREX daemon on the slave host.
Connecting TREX to More Than One Application
In principle, you can connect one TREX system to more than one application. Note the following:
· The TREX system must have appropriate dimensions so that it can process the load of all the applications.
· You must take organizational measures to ensure that the applications use separate index namespaces.
Constraints
Note the following constraints for distributed systems.
TREX Instances
The TREX instances that form a distributed system must run on different hosts. You cannot combine several TREX instances on the same host to form a
distributed system.
Hosts
· SAP recommends using a maximum of 4 master hosts.
· SAP recommends using a maximum of 2 slave hosts per master host.
For information on equipping the hosts, see Hardware, Software, and Other Requirements.
Master name servers
You need at least two master name servers, and cannot define more than three. Keep to the following rules:
· First distribute the master name servers on the master hosts.
· If there is a backup host, distribute the other master name servers there.
· If there is no backup host, distribute the other master name servers on the slave hosts.
Indexes
· You can have a maximum of 50 master indexes per master index server.
· If the master index server has 2 GB working memory per CPU, the maximum size of a master index is 100 GB.
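As a planning aid, the numeric constraints above can be checked against a draft landscape. This is a minimal sketch: the function, its parameters, and the severity labels are illustrative assumptions and not part of the TREX admin tool. Recommendations are reported as warnings and fixed limits as errors.

```python
MAX_MASTER_HOSTS = 4          # recommended maximum
MAX_SLAVES_PER_MASTER = 2     # recommended maximum
MIN_MASTER_NAME_SERVERS = 2   # required minimum
MAX_MASTER_NAME_SERVERS = 3   # required maximum
MAX_INDEXES_PER_MASTER = 50   # maximum master indexes per master index server


def check_landscape(slaves_by_master, master_name_servers, indexes_by_master):
    """Return a list of (severity, message) findings; empty if nothing is violated.

    slaves_by_master:    dict mapping master host -> list of slave hosts
    master_name_servers: list of hosts that run a master name server
    indexes_by_master:   dict mapping master host -> number of master indexes
    """
    findings = []
    if len(slaves_by_master) > MAX_MASTER_HOSTS:
        findings.append(("warning", f"{len(slaves_by_master)} master hosts planned"))
    for master, slaves in slaves_by_master.items():
        if len(slaves) > MAX_SLAVES_PER_MASTER:
            findings.append(("warning", f"{master} has {len(slaves)} slave hosts"))
    count = len(master_name_servers)
    if not MIN_MASTER_NAME_SERVERS <= count <= MAX_MASTER_NAME_SERVERS:
        findings.append(("error", f"{count} master name servers defined"))
    for master, index_count in indexes_by_master.items():
        if index_count > MAX_INDEXES_PER_MASTER:
            findings.append(("error", f"{master} has {index_count} master indexes"))
    return findings


if __name__ == "__main__":
    plan = {
        "mytrexmaster1": ["mytrexslave1", "mytrexslave2"],
        "mytrexmaster2": ["mytrexslave3", "mytrexslave4"],
    }
    name_servers = ["mytrexmaster1", "mytrexmaster2", "mytrexbackup"]
    print(check_landscape(plan, name_servers, {"mytrexmaster1": 20, "mytrexmaster2": 20}))
```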
Planning
Purpose
In the planning phase you analyze your requirements and define the structure of the distributed system. An analysis of your requirements shows you how many
hosts you need and the tasks that the hosts will carry out.
To simplify the installation and configuration of the system, you should create the following during the planning phase:
· A graphical depiction of the distributed system
· A table containing the host names and roles of the hosts involved
Example
Graphical depiction of the system:
Table with system information:
Host Name | Installation Option To Use | Role | Comment
myfileserver | Global TREX file system | File server | Storage location for the global TREX file system, the TREX data (indexes, queues, index snapshots), and the topology file
mytrexmaster1 | TREX dialog instance | Master name server, master index server, master queue server | Master host, manages part of the master indexes
mytrexmaster2 | TREX dialog instance | Master name server, master index server, master queue server | Master host, manages part of the master indexes
mytrexbackup | TREX dialog instance | Master name server, backup index server, backup queue server | Backup host for mytrexmaster1 and mytrexmaster2
mytrexslave1 | TREX dialog instance | Slave name server, slave index server | Slave host for mytrexmaster1
mytrexslave2 | TREX dialog instance | Slave name server, slave index server | Slave host for mytrexmaster1
mytrexslave3 | TREX dialog instance | Slave name server, slave index server | Slave host for mytrexmaster2
mytrexslave4 | TREX dialog instance | Slave name server, slave index server | Slave host for mytrexmaster2
Setting Up a Distributed System
Purpose
If you are implementing a distributed system, you initially install all server software on each host. You then configure the hosts according to the tasks that each
host is to carry out.
The following sections describe how you set up a distributed system from scratch. All tasks that are necessary for the initial configuration of the system are
described.
Checklists
Purpose
The procedure depends on the following:
· The type of data storage you use (centralized or decentralized)
· The hardware you are using (individual hosts or blade system)
Below are the checklists for the different scenarios.
Individual hosts with central data storage
! Action
On the file server:
Install a TREX global file system (see the TREX installation guide)
Create a Central TREX Data Directory
On all TREX hosts
Install a TREX dialog instance (see the TREX installation guide)
Only UNIX: Mount the Central TREX Data Directory
Only Windows: Define a network drive for the central TREX data directory
Start TREX
On a future master name server
Configure the Landscape
With an RFC connection: On all TREX hosts and in the SAP system
Configure the RFC Connection
With an HTTP connection: On the J2EE Engine
Configure the HTTP Connection
Blade system with central program and data storage
! Action
On the file server:
Install TREX (see the TREX installation guide Single Host)
Activate the Configuration Clones
On all server blades on which TREX is to run
Start TREX
On a future master name server
Configure the Landscape
With an RFC connection: On all server blades and in the SAP system
Configure the RFC Connection
With an HTTP connection: On the J2EE Engine
Configure the HTTP Connection
Individual hosts with decentralized data storage
! Action
On a future master name server
Install a central TREX instance (see the TREX installation guide)
On all other TREX hosts
Install a TREX dialog instance (see the TREX installation guide)
On all TREX hosts
Start TREX
On a future master name server
Configure the Landscape
With an RFC connection: On all TREX hosts and in the SAP system
Configure the RFC Connection
With an HTTP connection: On the J2EE Engine
Configure the HTTP Connection
Preparing for Centralized Data Storage
Purpose
If you want to store the TREX data centrally on a file server, you have to prepare first. The sections below describe the steps necessary for this.
Creating a Central TREX Data Directory
Procedure on UNIX
1. Create a directory for the TREX data on the file server.
2. Make sure that the directory belongs to the user SAPService<SAPSID>.
3. Share the directory so that all TREX hosts have full permission (read, write, and execute) for it.
The exact procedure is described in the documentation for your operating system platform.
Procedure on Windows
1. Create a directory for the TREX data on the file server.
2. Share the directory so that the user SAPService<SAPSID> has full permission for it.
The exact procedure is described in the documentation for your operating system platform.
Only UNIX: Mounting the Central TREX Data Directory
Procedure
Use mount to mount the TREX data directory that you created on the file server onto all TREX hosts. Note the following:
· Mount the directory in the same place (mount point) in the file system on all TREX hosts.
You created the directory mytrexdir on the file server. You mount this directory on all hosts at /mymountpoint/mytrexdir.
This mount point must be the same on all hosts. Otherwise, the system cannot swap from a master server to a backup server. Moreover, the slave servers
cannot use a common slave index.
· Make sure that the user <sapsid>adm has full permission (read, write, and execute) for this directory.
· Make sure that the directory is mounted automatically when the host is rebooted, before TREX starts.
The exact procedure is described in the documentation for your operating system platform.
Only Windows: Defining the Network Drive
Use
SAP strongly recommends that you define a network drive on all TREX hosts for the central TREX data directory on the file server. Access via a network drive is
much quicker than access without a drive.
The procedure below describes the required configuration steps.
Procedure
Edit the configuration file TREXDaemon.ini on all TREX hosts as follows:
TREXDaemon.ini
[mappings]
map_<network_drive_letter>=\\<file_server>\<TREX_data_directory>
Define the same network drive on all TREX hosts: Use the same network drive letter and specify the same directory.
In the standard system, the system uses the user SAPSystem<SAPSID> to access the network drive.
If you want the system to use a different user for access, specify this as follows:
map_<network_drive_letter>=\\<file_server>\<TREX_data_directory>
user_<network_drive_letter>=<user_name>
password_<network_drive_letter>=<password_in_plain_text>
Result
The changes take effect when TREX is next started.
Example
You have created the directory mytrexdir on the file server myfileserver and shared it as mytrexshare. You want to connect the directory on all TREX hosts as the
network drive T:.
Configuration in TREXDaemon.ini:
[mappings]
map_t=\\myfileserver\mytrexshare
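Because the same mapping has to be entered on every TREX host, it can help to generate the [mappings] section once and copy it to each host. The helper below is a hypothetical sketch, not an SAP tool; it only renders the keys described in the procedure above (map_, user_, password_). The password is stored in plain text, as the procedure states.

```python
def render_mappings(drive_letter, file_server, share, user=None, password=None):
    """Render the [mappings] section of TREXDaemon.ini as text.

    Hypothetical helper. If user is given, password must be given as well;
    the resulting file contains the password in plain text.
    """
    letter = drive_letter.rstrip(":").lower()
    if len(letter) != 1 or not letter.isalpha():
        raise ValueError(f"not a drive letter: {drive_letter!r}")
    if (user is None) != (password is None):
        raise ValueError("specify user and password together")

    # UNC path of the share: \\<file_server>\<share>
    lines = ["[mappings]", f"map_{letter}=\\\\{file_server}\\{share}"]
    if user is not None:
        lines.append(f"user_{letter}={user}")
        lines.append(f"password_{letter}={password}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_mappings("T:", "myfileserver", "mytrexshare"))
```

Called with the values from the example, the helper renders the same two lines shown in the configuration above.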
Activating the Configuration Clones for Server Blades
Use
You can install TREX on a blade system so that the TREX data and program files are stored only once on the file server and are used by all server blades. Every
server blade on which TREX is running needs its own configuration files.
You use a Python script to duplicate the profile files and the configuration to all server blades in your TREX landscape so that each server blade receives its own
configuration files.
You do this in the following steps:
Initial Installation of TREX on a Central File Server
1. Mount the central file server.
/mnt/myfileserver
SAP recommends that you enter the directory /mnt/myfileserver in the configuration file /etc/fstab, so that the directory is automatically remounted when the
host is started again.
2. Create a subdirectory, for example, <SAPSID>, in the directory /mnt/myfileserver.
/mnt/myfileserver/<SAPSID>
3. Generate symbolic links (symlinks) that link from the directories /usr/sap/<SAPSID> and /sapmnt/<SAPSID> to the directory
/mnt/myfileserver/<SAPSID>.
4. Install TREX
5. Check whether TREX has been started and, if necessary, start TREX.
Duplicating Profile Files and the Configuration to Server Blades
6. Log on with the user root.
7. Mount the central file server.
/mnt/myfileserver
8. Generate symbolic links (symlinks) that link from the directories /usr/sap/<SAPSID> and /sapmnt/<SAPSID> to the directory
/mnt/myfileserver/<SAPSID>.
9. Switch to the TREX directory /usr/sap/<SAPSID>/TRX<instance_number>.
10. Set the environment variables required by TREX by executing one of the following shell scripts.
- Bourne shell sh, Bourne-again shell bash, Korn shell ksh:
. TREXSettings.sh
- C shell csh:
source TREXSettings.csh
11. Execute the Python script cloneInst.py:
python exe/python_support/cloneInst.py
Result
The Python script cloneInst.py executes the following actions on the server blades that have been added:
· Create the same users on the added server blade as on the initial server blade
· Copy and modify the SAP profile files from the initial server blade
· Copy and modify the configuration files from the initial server blade
· Extend the directories /etc/init.d and /usr/sap/sapservices
· Start TREX
Landscape Configuration
Purpose
You use the TREX admin tool to configure the landscape. This tool has a graphical administration interface.
Prerequisites
TREX has been started on the hosts that form the distributed system.
Process Flow
1. Start the TREX administration tool on one of the future master name servers.
2. Go to the Landscape Configuration window.
3. Define a new landscape.
4. Add the remaining hosts.
5. Define the roles of the hosts.
6. Configure centralized data storage if required.
7. Check and activate the configuration.
Result
You have now defined the structure of a distributed system. You now have to configure the delta index and index replication. For more information, see Delta Index
and Index Replication Configuration.
Defining a New Landscape
Use
The local host is entered in the Hosts table in the Landscape Configuration window. By default, the local name server has been defined during the TREX
installation as the First (1st) Master Name Server (see the column Name Server Mode) and as the Master Index Server and the Master Queue Server. Since the
local name server is already preconfigured as the master name server, you can use it as the starting point for configuring your distributed TREX system
landscape.
Procedure
Enter a meaningful description for the new TREX system landscape.
Adding a Host
Use
You use the preconfigured master name server as the starting point for the configuration of your distributed TREX system landscape and then add the remaining
TREX hosts to it. Note the following:
· If multiple TREX instances are running on a host, you can only add one of them.
· You can only add TREX instances that belong to no other distributed system.
· You can only add TREX instances that have the same system ID.
Procedure
1. Choose Add Host.
2. Enter the address of the name server that runs on the host to be added. The name server port is
3<trex_instance_number>01
If the instance number is 48, the name server port is 34801.
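The port rule 3<trex_instance_number>01 can be sketched as a one-line calculation. The function name is illustrative, and the sketch assumes a two-digit instance number (00 to 99), which this section does not state explicitly.

```python
def name_server_port(instance_number):
    """Return the name server port 3<NN>01 for a TREX instance number.

    Assumption: instance numbers are two digits (0 to 99) and are
    zero-padded when the port is built.
    """
    if not 0 <= instance_number <= 99:
        raise ValueError(f"instance number out of range: {instance_number}")
    return int(f"3{instance_number:02d}01")


if __name__ == "__main__":
    print(name_server_port(48))
```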
Defining the Roles of Hosts
Use
After you have added hosts to the distributed system, you define which roles the hosts are to have. There are the following roles:
· Master name servers
There can be up to three master name servers in a distributed system. At least two must be defined. See Constraints for information on the hosts on which
the master name servers must be located.
· Master index servers and master queue servers
· Slave index servers
· Backup index servers and backup queue servers
Defining a Master Name Server
1. Select the required host in the Hosts table.
2. In the column Name Server Mode, choose 1st master, 2nd master, or 3rd master.
Defining a Master Index Server or Master Queue Server
1. Select the required host in the Hosts table.
2. Select Master Index/Queue Server.
Defining a Slave Index Server
1. Select Use Slave Index Servers.
2. Define the slave index server in the Hosts table.
a. Select the host that you want to define as a slave index server.
b. In the column Slave Index Server for… specify the master index server to which the server belongs.
Defining a Backup Index Server or Backup Queue Server
1. Select Use Backup Index/Queue Servers.
2. If the master servers are to share a backup server, select Use One Shared Backup Server.
The graphics below only depict the master and backup index servers. The graphics are also valid for master and backup queue servers.
Example 1
The following system has two master servers that are sharing a backup server.
In this case, select Use One Shared Backup Server.
Example 2
In the following system, each master server has its own backup server.
In this case, do not select Use One Shared Backup Server.
3. Define the backup server in the Hosts table.
a. Select the host that you want to define as a backup server.
b. If you have selected Use One Shared Backup Server, you just need to indicate that the host is the backup server. If you did not select this field, specify the
master server to which this server belongs in the column Backup Index/Queue Server for…
Configuring Centralized Data Storage
Use
If you want to store the TREX data centrally on a file server, specify this fact when configuring the landscape.
If you are using a file server, TREX automatically stores a topology file on the file server. The master name servers then share this topology and no longer use
their local topology files.
This has the advantage that the master name servers do not need to synchronize their local topology files. In some circumstances, synchronization can cause
a master name server to use an out-of-date topology file because it did not receive all of the changes.
Example: A host on which a master name server is running has not been in operation for some time. Its local topology file is therefore out-of-date. If you stop all
TREX hosts and then start the master name server that has been out of operation first, the system will use its out-of-date topology file. If this happens, update
the topology file manually using backup copies.
Prerequisites
You have prepared for centralized data storage.
Procedure
1. Select Use a File Server.
2. On all hosts, change the path specifications so that they reference the central TREX data directory on the file server:
a. Select a host in the Hosts table.
b. Enter the relevant central TREX data directory on the file server (UNIX) or network drive (Windows) in the Base Path column.
Examples of Path Specifications
UNIX: Individual hosts with a file server
You have created the central TREX data directory mytrexdir on the file server myfileserver. This directory is mounted at /mypath/mytrexdir on all TREX hosts. The
path specifications are as follows:
Host Base Path
mytrexhost_1 /mypath/mytrexdir
... ...
mytrexhost_n /mypath/mytrexdir
UNIX: Blade system with a file server
You have installed TREX in the directory /usr/sap/trex_<instance_number> on the file server myfileserver. All server blades access this directory. The TREX data
should be located in the installation directory. The path specifications are as follows:
Host Base Path
mytrexhost_1 /usr/sap/<SAPSID>/TRX<instance_number>
... ...
mytrexhost_n /usr/sap/<SAPSID>/TRX<instance_number>
You do not have to change the default value for the base path.
Windows: Individual hosts with a file server
You have created the central TREX data directory mytrexdir on the file server myfileserver. The directory is connected as the network drive T:. The path
specifications are as follows:
Host Base Path
mytrexhost_1 T:
... ...
mytrexhost_n T:
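In all three examples, every host uses the same base path. Before entering the values in the Hosts table, a planned assignment can be checked for deviations with a short script. The function name and input format are illustrative assumptions.

```python
from collections import Counter


def hosts_with_deviating_base_path(base_path_by_host):
    """Return the hosts whose base path differs from the most common one.

    base_path_by_host: dict mapping host name -> planned base path.
    An empty result means that all hosts use the same base path.
    """
    if not base_path_by_host:
        return []
    most_common_path, _ = Counter(base_path_by_host.values()).most_common(1)[0]
    return sorted(
        host for host, path in base_path_by_host.items() if path != most_common_path
    )


if __name__ == "__main__":
    planned = {
        "mytrexhost_1": "/mypath/mytrexdir",
        "mytrexhost_2": "/mypath/mytrexdir",
        "mytrexhost_3": "/mnt/mytrexdir",  # deliberately inconsistent
    }
    print(hosts_with_deviating_base_path(planned))
```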
Checking and Activating the Configuration
Use
You can check whether the landscape configuration is consistent and complete at any time. This allows you to check the effects of the configuration changes
without activating them. Activate the configuration when you have made all necessary settings.
Procedure
· To check the configuration, choose Check.
· To activate the configuration, choose Deploy.
Result
When you check the configuration, the output area shows the checks that are carried out. If the configuration is not consistent, the system issues a message telling
you so. You can use information in this message to revise your configuration.
The system also checks the configuration when you activate it. If the configuration is consistent, the system updates the configuration files of the affected hosts
and restarts the servers if necessary. If the configuration has errors, the output area displays appropriate messages and does not update the configuration files.
Example Configurations
The following sections depict example systems and the relevant configurations.
Systems with Master and Slave Index Servers
This section depicts the configuration for systems with master and slave index servers.
One Master Index Server, Two Slave Index Servers
TREX admin tool, Landscape Configuration
Areas Scenario, Scenario Details, and Index
Area Field Value
Scenario Use Backup Index/Queue Servers
Use Slave Index Servers !
Use a File Server
Scenario Details Use One Shared Backup Server
Assign Existing Indexes/Queues to New
Backup/Slave Servers
Index Search on Master/Backup Server
Search Version majority
Replication Threads 1
Hosts table (extract 1)
Host Name Server Mode Master Index/Queue Server Slave Index Server for Preprocessor Mode
mytrexmaster1 1st master ! index
mytrexslave1 slave ! mytrexmaster1 search
mytrexslave2 2nd master ! mytrexmaster1 search
Hosts table (extract 2)
Host Base Path
mytrexmaster1 %(SAP_RETRIEVAL_PATH)
mytrexslave1 %(SAP_RETRIEVAL_PATH)
mytrexslave2 %(SAP_RETRIEVAL_PATH)
Two Master Index Servers, Two Slave Index Servers Each
One index server per host
TREX admin tool, Landscape Configuration
Areas Scenario, Scenario Details, and Index
As in the previous example
Hosts table (extract 1)
Host Name Server Mode Master Index/Queue Server Slave Index Server for Preprocessor Mode
mytrexmaster1 1st master ! index
mytrexmaster2 2nd master ! index
mytrexslave1 slave ! mytrexmaster1 search
mytrexslave2 slave ! mytrexmaster1 search
mytrexslave3 slave ! mytrexmaster2 search
mytrexslave4 slave ! mytrexmaster2 search
Hosts table (extract 2)
Host Base Path
mytrexmaster1 %(SAP_RETRIEVAL_PATH)
mytrexmaster2 %(SAP_RETRIEVAL_PATH)
mytrexslave1 %(SAP_RETRIEVAL_PATH)
mytrexslave2 %(SAP_RETRIEVAL_PATH)
mytrexslave3 %(SAP_RETRIEVAL_PATH)
mytrexslave4 %(SAP_RETRIEVAL_PATH)
Two index servers per host
TREX admin tool, Landscape Configuration
Areas Scenario, Scenario Details, and Index
As in the previous example
Hosts table (extract 1)
Host Name Server Mode Master Index/Queue Server Slave Index Server for Preprocessor Mode
mytrexmaster1 1st master ! index
mytrexslave1 slave ! mytrexmaster1 search
mytrexslave2 2nd master ! mytrexmaster1 search
Hosts table (extract 2)
Host | Base Path | Services
mytrexmaster1 | %(SAP_RETRIEVAL_PATH) | nameserver, preprocessor1, indexserver1, queueserver, indexserver2
mytrexslave1 | %(SAP_RETRIEVAL_PATH) | nameserver, preprocessor1, indexserver1, queueserver, indexserver2
mytrexslave2 | %(SAP_RETRIEVAL_PATH) | nameserver, preprocessor1, indexserver1, queueserver, indexserver2
TREXDaemon.ini on all hosts (extract)
[daemon]
programs=nameserver, preprocessor1, indexserver1, queueserver, indexserver2
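To confirm that every host starts the same services, the [daemon] extract can be read with Python's standard configparser module. This is a sketch under assumptions: the function names are illustrative, and a real TREXDaemon.ini may contain constructs that this simple parser setup does not handle.

```python
import configparser


def configured_programs(ini_text):
    """Return the program names listed under [daemon] programs=..."""
    # interpolation=None keeps values such as %(SAP_RETRIEVAL_PATH) literal.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    raw = parser.get("daemon", "programs")
    return [name.strip() for name in raw.split(",") if name.strip()]


def index_server_count(programs):
    """Count the index server entries (indexserver1, indexserver2, ...)."""
    return sum(1 for name in programs if name.startswith("indexserver"))


if __name__ == "__main__":
    extract = (
        "[daemon]\n"
        "programs=nameserver, preprocessor1, indexserver1, queueserver, indexserver2\n"
    )
    programs = configured_programs(extract)
    print(programs, index_server_count(programs))
```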
Systems with Master, Backup, and Slave Index Servers
This section depicts the configuration for systems with master, backup, and slave index servers. The systems differ as to the number and assignment of backup
index servers. The following variants are taken into account:
· One backup index server per master index server
- One backup index server, one master index server
- One backup index server, two master index servers
· One backup index server for all master index servers
The same specifications are valid for the master and backup queue servers.
One Backup Index Server, One Master Server
TREX admin tool, Landscape Configuration
Areas Scenario, Scenario Details, and Index
Area Field Value
Scenario Use Backup Index/Queue Servers !
Use Slave Index Servers !
Use a File Server !
Scenario Details Use One Shared Backup Server
Assign Existing Indexes/Queues to New
Backup/Slave Servers
Index Search on Master/Backup Server
Search Version majority
Replication Threads 1
Hosts table (extract 1)
Host Name Server Mode Master Index/Queue Server Backup Index/Queue Server for Slave Index Server for
mytrexmaster 1st master !
mytrexbackup 2nd master ! mytrexmaster
mytrexslave1 slave ! mytrexmaster
mytrexslave2 slave ! mytrexmaster
Hosts table (extract 2)
Host Preprocessor Mode Base Path
mytrexmaster index UNIX: /mypath/mytrexdir
Windows: T:
mytrexbackup index UNIX: /mypath/mytrexdir
Windows: T:
mytrexslave1 search UNIX: /mypath/mytrexdir
Windows: T:
mytrexslave2 search UNIX: /mypath/mytrexdir
Windows: T:
Windows: TREXDaemon.ini on all hosts (extract)
[mappings]
map_t=\\myfileserver\mytrexshare
Two Backup Index Servers, Two Master Index Servers
TREX admin tool, Landscape Configuration
Areas Scenario, Scenario Details, and Index
As in the previous example
Hosts table (extract 1)
Host Name Server Mode Master Index/Queue Server Backup Index/Queue Server for Slave Index Server for
mytrexmaster1 1st master !
mytrexmaster2 2nd master !
mytrexbackup1 3rd master ! mytrexmaster1
mytrexbackup2 slave ! mytrexmaster2
mytrexslave1 slave ! mytrexmaster1
mytrexslave2 slave ! mytrexmaster1
mytrexslave3 slave ! mytrexmaster2
mytrexslave4 slave ! mytrexmaster2
Hosts table (extract 2)
Host Preprocessor Mode Base Path
mytrexmaster1 index UNIX: /mypath/mytrexdir
Windows: T:
mytrexmaster2 index UNIX: /mypath/mytrexdir
Windows: T:
mytrexbackup1 index UNIX: /mypath/mytrexdir
Windows: T:
mytrexbackup2 index UNIX: /mypath/mytrexdir
Windows: T:
mytrexslave1 search UNIX: /mypath/mytrexdir
Windows: T:
mytrexslave2 search UNIX: /mypath/mytrexdir
Windows: T:
mytrexslave3 search UNIX: /mypath/mytrexdir
Windows: T:
mytrexslave4 search UNIX: /mypath/mytrexdir
Windows: T:
One Backup Index Server for All Master Index Servers
TREX admin tool, Landscape Configuration
Area Scenario Details
Area Field Value
Scenario Use Backup Index/Queue Servers !
Use Slave Index Servers !
Use a File Server !
Scenario Details Use One Shared Backup Server !
Assign Existing Indexes/Queues to New
Backup/Slave Servers
Index Search on Master/Backup Server
Search Version majority
Replication Threads 1
Hosts table (extract 1)
Host Name Server Mode Master Index/Queue Server Backup Index/Queue Server Slave Index Server for
mytrexmaster1 1st master !
mytrexmaster2 2nd master !
mytrexbackup 3rd master !
mytrexslave1 slave ! mytrexmaster1
mytrexslave2 slave ! mytrexmaster1
mytrexslave3 slave ! mytrexmaster2
mytrexslave4 slave ! mytrexmaster2
Hosts table (extract 2)
Host Preprocessor Mode Base Path
mytrexmaster1 index UNIX: /mypath/mytrexdir
Windows: T:
mytrexmaster2 index UNIX: /mypath/mytrexdir
Windows: T:
mytrexbackup index UNIX: /mypath/mytrexdir
Windows: T:
mytrexslave1 search UNIX: /mypath/mytrexdir
Windows: T:
mytrexslave2 search UNIX: /mypath/mytrexdir
Windows: T:
mytrexslave3 search UNIX: /mypath/mytrexdir
Windows: T:
mytrexslave4 search UNIX: /mypath/mytrexdir
Windows: T:
Features of a Blade System
If you are using a blade system and the TREX data is located in the installation directory, the column Base Path has the following value:
Hosts table (extract)
Host Base Path
mytrexmaster /usr/sap/<SAPSID>/TRX<instance_number>
... ...
The rest of the configuration is the same.
Configuration of the RFC Connection
Purpose
If you want to connect the TREX system to an SAP system, you must configure an RFC connection.
Process Flow
1. Define an SAP system user.
2. Determine the connection data for the SAP system.
3. Configure the RFC connection using the TREX admin tool (stand-alone).
For more information about starting the TREX admin tool (stand-alone), see Starting the TREX Admin Tool.
Result
For more information about the RFC connection and handling connection and configuration errors, see the documentation on the TREX admin tool (stand-alone). You
can find this documentation in the SAP Library at help.sap.com/nw70 → SAP NetWeaver.
More Information
Connection to the Application
Creating an SAP System User for the TREX Admin Tool (Standalone)
Use
You must create an SAP user that the TREX admin tool (standalone) can use to log on to the SAP system. The SAP user is also required so that the TREX
alert server has permission to test and check the RFC configuration regularly. You can create the user in the default client or in another client. If you create it in
another client, make sure that you enter that client when you configure the RFC connection in the TREX admin tool.
The TREX admin tool (standalone) is used to configure and monitor TREX. You also use this admin tool to configure the RFC connection between TREX and the
ABAP application that is using TREX. To create the RFC destination, the TREX admin tool (standalone) requires an SAP system user that you
create based on the predefined role SAP_BC_TREX_ADMIN. This user then has the authorization required to configure the RFC connection.
For more information on the SAP_BC_TREX_ADMIN role, see SAP Note 766516.
Overview of the Permissions Assigned by the SAP_BC_TREX_ADMIN Role
Type and Scope of the Permission | Activity | Explanation
Permission check for RFC access | Execute | Name of the RFC object to be protected: SYST, TREX_ARW_ADMINISTRATION
Administration for the RFC destination | Add or generate, change, display, delete, extended maintenance | Type of entry in RFCDES: Start of an external program using TCP/IP
Check on the transaction code at transaction launch | - | Transaction code: SM59, TREXADMIN, TREXADMIN_AUTH
Administrating TREX | Change, display, execute | -
ABAP: Program run checks | Schedule programs for background processing, execute ABAP program, maintain variants for and execute ABAP program | -
ALV standard layout | Maintain | -
Application log | Display, delete | -
More Information
Configuring and Administrating the RFC Connection
Configuring the RFC Connection in the TREX Admin Tool
Procedure
Create an SAP system user for the TREX admin tool (standalone) and assign the SAP_BC_TREX_ADMIN role to this user.
1. Launch transaction SU01 (user maintenance) or choose Administration → System Administration → User Maintenance → User in the SAP menu. The User
Maintenance: Initial Screen appears.
2. Enter a new user name and choose Create.
3. On the Address tab page, enter the personal data for the user.
4. On the Roles tab page, assign the SAP_BC_TREX_ADMIN role to the SAP system user for the TREX admin tool (standalone). This gives the user
permission to access the SAP system.
Result
This user for the TREX admin tool (standalone) now has the authorization required to configure the RFC connection.
Determining the SAP System Connection Information
Use
The TREX admin tool (stand-alone) can connect to an SAP system in two ways:
· Through a specific application server of the SAP system (variant A)
· Through the message server of the SAP system (variant B)
This variant uses the load-balancing function for the SAP system. The message server assigns the request from the TREX admin tool to any application
server.
Depending on the variant used, the TREX admin tool requires different connection information for the SAP system. You must determine the connection information
and specify it later in the TREX admin tool.
SAP recommends using variant B. Variant A has the disadvantage that the connection does not work if the application server is not available.
Procedure
1. Open the SAP Logon.
SAP Logon is the program that you use to log on to an SAP system.
2. Note the following connection information:
Connection Setup Type | Required Connection Information
Through an application server (variant A) | SAP system ID (SID); system number; application server host name
Through the message server (variant B) | SAP system ID (SID); logon group, such as PUBLIC; message server host name
Configuring the RFC Connection in the TREX Admin Tool
Use
You work through the steps below using the TREX admin tool (stand-alone).
Configuration of the RFC connection with the TREX admin tool (stand-alone) is only available as of SAP Basis Component SAP_BASIS 6.20 SP58, 6.40
SP16, and 7.0 SP6. If you are using TREX with an SAP system based on an earlier support package, you have to configure the RFC connection manually as
described in the SAP NetWeaver 04 Installation Guide for Search and Classification (TREX) 6.1. You can find this guide on the SAP Service Marketplace at
service.sap.com/instguides → SAP NetWeaver → Released 04 → Installation → Cross-NW → Installation Guide Search and Classification TREX 6.1.
Creating a Connection
1. In the Landscape RFC window, choose the Create Connection function.
2. Choose connection type A or B. Specify the connection data for the SAP system (see Determining the SAP System Connection Information).
3. Specify the SAP system user, the associated password, and the client that the TREX admin tool is to use to log on (see Creating an SAP System User for the
TREX Admin Tool (Standalone)).
If the SAP system user in question exists in the default client, you do not need to specify the client.
Creating an RFC Destination
1. In the Landscape RFC window, choose the RFC Destination (SM59) function.
2. Enter the following parameters:
Field | Entry
SAP System | SAP system that you want to set up the connection to. The list contains all SAP systems that you have registered using Create Connection.
RFC Destination | Name of the RFC destination.
Description | Meaningful description of the purpose.
The program ID determines under which name the TREX RFC server registers with the SAP gateway. The program ID must be unique for each SAP
gateway. The TREX admin tool ensures this by generating the program ID.
3. Decide which SAP gateway you want to use. You have the following options:
Option | Comment
Gateway local (default setting) | Use local SAP gateways for the application servers.
Gateway central | Use the central SAP gateway. We advise against using a central SAP gateway for distributed TREX systems. The central SAP gateway is a “single point of failure.”
If you choose Gateway central, enter the following additional parameters:
● Host name (with domain name if necessary) or the IP address of the host on which the gateway is installed.
● Name of the SAP gateway in the form sapgw<instance_number>
SAP advises against creating the RFC destination directly in the SAP system. The name of the RFC destination and the program ID must satisfy certain
naming conventions. The TREX admin tool ensures that these are fulfilled.
If you nevertheless create the RFC destination directly in the SAP system, note the following:
● We recommend starting the name of the RFC destination with TREX_.
● Choose the activation type Registered Server Program.
● Choose a program ID that is unique for the SAP gateway used.
● Use the RFC Destinations function to register the RFC destination in the TREX admin tool.
Completing the RFC Configuration
1. In the Landscape RFC window, choose the Connect function.
The TREX admin tool creates the connection to all SAP systems that are known to it. Because the RFC configuration is still incomplete, the configuration
status is yellow or red.
2. Choose Repair All.
The TREX admin tool completes the RFC configuration and starts the TREX RFC server.
This can take several minutes. During this time, the configuration status remains yellow or red. After completion of the configuration process, the status
changes to green.
Do not choose Repair All several times in quick succession. This would trigger the configuration process more than once and delay it.
3. Check the progress by choosing Refresh to update the display.
Configuring the HTTP Connection
Use
If you want to connect the TREX system to a Java application, you must register at least one name server with the TREX Java client.
We recommend that you specify all master name servers on the client side. This increases the availability of the connection between the application and
TREX. If the Java client cannot reach one master name server, it can attempt to reach another instead.
The client-side configuration is separate from the server-side configuration. In principle, you can enter any name servers on the client side, regardless of their
server-side role.
Procedure
1. If you do not know the addresses of the master name servers, look for them in the TREX admin tool at Landscape → Configuration:
<host>:<name_server_port><:name server mode>.
2. Change the Java client parameters for all server processes in the cluster as follows:
More information: Specifying the Address of the TREX Name Server
a. Enter one master name server in the following parameter:
nameserver.address tcpip://<one_address>
b. Specify the TREX backup name servers in the nameserver.backupserverlist parameter. When doing so, separate the backup name servers using a
comma and use the following format: tcpip://<host1>:<port1>,tcpip://<host2>:<port2>, …
The addresses of the master name servers must be configured for all server processes in the cluster. Otherwise the connection between the J2EE Engine and
TREX cannot be established.
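The two parameter values described above can be assembled from a list of master name servers. The following Python lines are only a sketch of the string format: the host names come from the example landscape, the port 30001 is a made-up placeholder, and the helper functions are hypothetical and not part of TREX or the J2EE Engine.

```python
def format_address(host: str, port: int) -> str:
    """Return one name server address in the tcpip://<host>:<port> form."""
    return f"tcpip://{host}:{port}"


def java_client_parameters(masters: list[tuple[str, int]]) -> dict[str, str]:
    """Split the master name servers into the two client parameters.

    The first entry goes into nameserver.address. The remaining entries
    are joined with commas for nameserver.backupserverlist.
    """
    if not masters:
        raise ValueError("at least one master name server is required")
    first, *rest = masters
    return {
        "nameserver.address": format_address(*first),
        "nameserver.backupserverlist": ",".join(
            format_address(host, port) for host, port in rest
        ),
    }


# Hypothetical port number, for illustration only.
params = java_client_parameters(
    [("mytrexmaster1", 30001), ("mytrexmaster2", 30001), ("mytrexbackup", 30001)]
)
for name, value in params.items():
    print(f"{name} = {value}")
```

The sketch only builds the values. You still enter them in the Java client parameters for every server process in the cluster.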
Information on Breakdown of Master Servers
If you have implemented a system with master and backup servers and a master server breaks down, the backup server becomes active automatically. When
the master server becomes available again, the system swaps back to the master server.
If you create an index while the master server is unavailable, TREX proceeds as follows:
· The backup server becomes the master server for this index
· This index has no backup server
The TREX admin tool displays this in the Index Landscape area as follows:
TREX does not change this assignment, even when the master server becomes available again. You have to correct the assignment for this index as follows:
1. Start the TREX admin tool on any host in the distributed system.
2. Make sure that the master and backup servers are available.
3. Go to the Index Landscape window.
4. In the column <light><host_name_master_index_server>:<port>, click on the line for the index in question. Choose Add backup here from the context menu.
5. Click on the same cell again and choose Switch master/backup from the context menu.
The index now has a backup server, and the master and backup servers are assigned to the index correctly.
6. Go to the Queue Landscape window.
7. Carry out the same changes that you just carried out for the index, but this time for the queue.
Delta Index and Index Replication Configuration
Purpose
Delta indexes speed up updates of the master indexes. Index replication transfers changes made on master indexes to slave indexes.
Delta indexes and index replication are deactivated by default. The best time for activating them depends on which of the following scenarios you have:
Scenario: Initial indexing of large data sets (more than 100,000 documents)
1. Create indexes and carry out the initial indexing of the data.
During this phase, the system only carries out indexing. It does not replicate data.
2. Activate the delta indexes.
3. Trigger the first index replication manually.
4. Configure regular index replication.
Scenario: No initial indexing of large data sets
1. Create indexes.
2. Configure regular index replication.
3. Monitor the size of the master indexes during routine operation. Activate the delta indexes when a master index reaches a certain size.
The sections below contain background information on delta indexes and index replication, and describe the configuration required.
Delta Index Configuration
Purpose
TREX provides the option of activating delta indexes. This can speed up the update of the index.
This documentation contains:
● General information on the delta index
● Information on activating the delta index
● Information on integrating the delta index into the main index
Delta Index
When TREX updates an index, it rewrites the majority of the index files. If the indexes are large this process can take a long time and generate a high system
load.
TREX allows you to activate a delta index in order to speed up the update. The delta index is a separate index that TREX creates in addition to the main index.
The main index and its delta index only differ TREX-internally. Outside of TREX they form a unit.
If the delta index is activated changes flow into the delta index. Because the delta index is smaller than the main index, fewer documents are affected by the
update. The delta index can therefore be updated more quickly.
The delta index is deactivated by default. The following rules are valid for its activation:
· If you have a single host system the activation is optional. However, it is recommended if the main index has reached a certain size. If you activate the delta index too soon, performance does not improve.
· If you have a distributed TREX system the activation is obligatory. However, you still only activate it once the main index has reached a certain size.
Activating the delta index not only speeds up the update of the master index but also enables fast index replication with a low network load.
When index replication takes place the master index server replicates all changed master index files. Because the delta index consists of fewer files, it
naturally has fewer files to replicate. This means that index replication is quicker. Moreover, if you have decentralized data storage the network load is also
lower because TREX has to copy fewer files to the slave hosts.
The delta index only speeds up the update if it is kept small. If it becomes too large, it no longer improves performance. When it reaches a certain size you have
to integrate it in the main index. You can integrate the delta index manually or configure TREX so that TREX regularly integrates it automatically. TREX creates a
new delta index automatically when the integration of the previous delta index is complete.
Activating the Delta Index
Use
The delta index is deactivated by default. You can activate it using the TREX admin tool. You activate it per index, not globally.
The best time for activating it depends on your indexing process.
SAP recommends the following:
· Initial indexing of large document sets
Activate the delta index after the initial indexing run. If you activate it before the initial indexing run, the delta index grows too quickly and you have to integrate it into the main index earlier than you would wish. This means that you need twice the indexing time: first to index the documents in the delta index, and then to integrate the delta index into
the main index.
· No initial indexing of large data sets
Monitor the size of the main index during routine operation. Activate the delta index if the main index reaches 100,000 to 1,000,000 documents or 500 MB.
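The activation rule above can be written as a simple check. This is a minimal sketch, assuming that you read the document count and size of the main index yourself (for example from the Index Info window). The function name is hypothetical, and using the lower end of the 100,000 to 1,000,000 range as the trigger is an assumption made for this example.

```python
DOC_THRESHOLD = 100_000    # lower end of the 100,000 to 1,000,000 range (assumption)
SIZE_THRESHOLD_MB = 500    # size threshold named in the recommendation


def should_activate_delta_index(doc_count: int, size_mb: float) -> bool:
    """Return True if the main index is large enough to activate the delta index."""
    return doc_count >= DOC_THRESHOLD or size_mb >= SIZE_THRESHOLD_MB


print(should_activate_delta_index(doc_count=20_000, size_mb=120))   # False
print(should_activate_delta_index(doc_count=250_000, size_mb=310))  # True
print(should_activate_delta_index(doc_count=40_000, size_mb=650))   # True
```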
Procedure
1. Go to the window Index Admin → Index Info in the TREX admin tool.
2. Select the index that you want to activate the delta index for. Choose Delta Index On.
Integrating a Delta Index into the Main Index
Use
A delta index only speeds up the update of the corresponding index if it is small. If it becomes too large, you have to integrate it into the main index. After the
integration has taken place TREX creates a new delta index.
The integration process involves TREX rewriting all main index files. The duration of the integration process depends on the size of the main index. It can last a few
minutes or several hours.
In a distributed system the entire main index has to be replicated after the integration has taken place. This replication takes about the same amount of time as
the initial replication.
The index server cannot index new documents during the integration of the delta index. This has the following effects:
· If indexing takes place with a queue server, the queue server retains the documents until the integration process has been completed. Then the queue server
transmits the documents to the index server.
· If indexing takes place without a queue server, the application can continue to send indexing requests to the index server. However, the index server only
processes them after the completion of the integration process. This means that it takes longer for indexing requests to be processed and for the application
to receive the relevant response.
You can trigger the integration process manually or carry it out at defined time intervals. There are two different procedures for time-dependent integration. The
procedure that you use depends on whether indexing takes place with or without a queue server (QS). The table below gives an overview of the procedures.
Procedure | Use with: Indexing with QS | Use with: Indexing without QS
Manual | yes | yes
Time-dependent using the queue server | yes | -
Time-dependent using the Python scheduler | - | yes
We recommend the following for the time of the integration:
· Trigger the first integration process if the delta index is bigger than 500 MB. You can find out the size of the delta index in the window Index Admin → Index Info
in the TREX admin tool.
· The integration process should take place at times when the system is not too busy.
· Do not carry out the integration process too often. With large indexes, the integration and subsequent replication of the main index takes a corresponding amount
of time.
Integrating the Delta Index Manually
1. Go to the window Index Admin → Index Info in the TREX admin tool.
2. Select the index in question and choose Merge Delta Index.
Integrating the Delta Index Time-Dependently Using the Queue Server
In the queue parameters enter the time for the integration in Merge Time for Delta Index.
Use All (4:00) to trigger the integration every morning at 4 a.m.
You do not need to coordinate the integration time with other activities carried out by the queue server and index server. If the activities collide, the index server
coordinates when it carries out which action.
For more information on changing queue parameters, see Configuring Queue Parameters.
Integrating the Delta Index Time-Dependently Using the Python Scheduler
Change the following configuration files on all master name servers:
Configuration file: TREXDaemon.ini
1. Activate the Python scheduler by changing the TREX configuration file TREXDaemon.ini in the TREX admin tool (menu path Landscape → Ini) as follows:
[daemon]
programs=<other_sections>,cron
2. Once you have saved the changes, the TREX admin tool asks you whether it should trigger reconfiguration so that the changes to the configuration file take effect. Confirm this query by choosing Yes.
Configuration file: crontab.ini
Remove the comment sign from the following line:
<schedule> python mergeDeltaIndex.py silent allIndexes=1 ''
Modify the schedule if necessary. For information on syntax and for examples,
see the configuration file.
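The change to the programs entry amounts to appending cron to a comma-separated list without disturbing the existing entries. The following Python sketch shows only that string operation. The helper name and the sample entries nameserver and indexserver are hypothetical, and the sketch does not replace editing the file in the TREX admin tool.

```python
def with_program(programs_value: str, program: str = "cron") -> str:
    """Append program to a comma-separated programs value if it is missing."""
    entries = [entry.strip() for entry in programs_value.split(",") if entry.strip()]
    if program not in entries:
        entries.append(program)
    return ",".join(entries)


# Hypothetical existing value of programs= in the [daemon] section.
current_value = "nameserver,indexserver"
new_value = with_program(current_value)
print(f"programs={new_value}")

# Applying the helper again leaves the value unchanged.
print(with_program(new_value) == new_value)
```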
Index Replication Configuration
Purpose
Index replication transfers changes made on master indexes to slave indexes. The sections below describe the process and configuration of index replication.
Index Replication Process
Index replication takes place in a system with master and slave index servers. The master index server manages the original indexes and the slave index
servers access index copies. Replication makes sure that changes to the master indexes are transferred to the index copies.
Replication takes place in different ways depending on the type of data storage.
Replication with Decentralized Data Storage
The initial replication of an index takes place as follows:
1. The master index server generates an index snapshot. The name server tells the slave index servers that the index snapshot is available. The slave index
servers request the snapshot from the master index server.
2. When all slave index servers have the index snapshot, they integrate it into their index one after the other. The slave index server currently integrating the
files has the status ‘inactive’. This means that it is not available for searching. It receives the status ‘active’ again as soon as the integration has been completed.
Because all index files are copied for the initial replication, the process can take a long time if the index in question is large.
In subsequent replications the system only replicates the changed index files. This is normally a smaller amount of data than for the initial replication, and
subsequent replications are therefore faster. The process flow is as follows:
1. The master index server compares the master index and the index snapshot in order to determine changed index files. It then updates the index snapshot.
The name server tells the slave index servers that a new index version is available. The slave index servers request the changed index files from the master
index server.
2. When all slave index servers have the changed index files, they integrate them into their index one after the other. The slave index server currently integrating
the files has the status ‘inactive’. This means that it is not available for searching. It receives the status ‘active’ again as soon as the integration has been
completed.
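The comparison step in subsequent replications can be pictured as a diff between two file listings. The sketch below is illustrative only: it assumes that each index file is described by a name and a checksum, which is an assumption of this example and not a statement about how TREX detects changes internally. The file names and checksums are invented.

```python
def diff_index_files(
    master: dict[str, str], snapshot: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Compare file-name -> checksum listings of the master index and the snapshot.

    Returns the files that are new or changed in the master index and the
    files that exist only in the snapshot.
    """
    changed = sorted(
        name for name, digest in master.items() if snapshot.get(name) != digest
    )
    removed = sorted(name for name in snapshot if name not in master)
    return changed, removed


# Hypothetical file names and checksums.
master = {"terms.bin": "a1", "docs.bin": "b7", "attrs.bin": "c3"}
snapshot = {"terms.bin": "a1", "docs.bin": "b2", "old.bin": "d9"}

changed, removed = diff_index_files(master, snapshot)
print(changed)  # ['attrs.bin', 'docs.bin']
print(removed)  # ['old.bin']
```

Only the files in the first list would need to be transferred to the slave index servers, which is why subsequent replications are faster than the initial one.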
Replication with Centralized Data Storage
The initial replication of an index takes place as follows:
1. The master index server generates a complete copy of the index (index snapshot).
2. The slave index servers connect to the index snapshot and use this as their slave index.
If the master index changes and replication needs to take place again, the following occurs:
1. The master index server generates a second index snapshot.
2. The slave index servers change to the second index snapshot.
All subsequent replications take place as follows:
1. The master index server determines the changed index files by comparing the master index with the index snapshot that the slave index servers are not
currently using. It then updates this index snapshot.
2. The slave index servers change to the updated index snapshot.
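With centralized data storage, the process described above alternates between two snapshots: the master index server updates the one that is not in use, and the slave index servers then switch to it. The following Python class is a simplified model of that alternation. The class and its attributes are hypothetical, and copying the whole index stands in for updating only the changed files.

```python
class SnapshotPair:
    """Simplified model of two index snapshots used alternately."""

    def __init__(self) -> None:
        self.snapshots: list[dict[str, str] | None] = [None, None]
        self.active: int | None = None  # slot the slave index servers read from

    def replicate(self, master_index: dict[str, str]) -> int:
        """Update the snapshot that is not in use, then switch the slaves to it."""
        target = 0 if self.active is None else 1 - self.active
        self.snapshots[target] = dict(master_index)
        self.active = target
        return target


pair = SnapshotPair()
used_slots = [
    pair.replicate({"docs.bin": "v1"}),  # initial replication
    pair.replicate({"docs.bin": "v2"}),  # second snapshot is created
    pair.replicate({"docs.bin": "v3"}),  # first snapshot is updated and reused
]
print(used_slots)  # [0, 1, 0]
```

Because the slave index servers always read from a snapshot that is not being written to, they can keep answering search requests while the other snapshot is updated.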
Triggering Index Replication
Use
By default index replication is deactivated. You can trigger replication in various ways. The table below gives an overview of the methods and of when you should
use them.
Procedure | Effect | Use with: Indexing with QS* | Use with: Indexing without QS*
1. Manual replication | Index changes are available for searching when you have replicated the index. You use this method for the initial replication. | yes | yes
2. Automatic replication following optimization in the queue server (Replicate after Optimize) | Following the optimization of documents, index changes are available in the queue server for the search. | yes | -
3. Automatic replication following every index update | All index changes are available quickly for the search. | yes | yes
4. Time-dependent replication using the queue server | Index changes are available for searching when the next replication has taken place. Replication takes place regularly according to a defined schedule. | yes | -
5. Time-dependent replication using the Python scheduler | Index changes are available for searching when the next replication has taken place. Replication takes place regularly according to a defined schedule. | - | yes
6. Replication triggering by the application using TREX | TREX provides the (ABAP/Java) application with methods for triggering replication. | yes | yes
*QS = queue server
The system replicates the entire index for the initial replication. In subsequent replications the system only replicates the changed index files. The duration of the
replication and the generated system load depends on the following factors:
● Are the indexes stored centrally or are they distributed?
With decentralized data storage the replication generates a higher network load because the system has to copy the indexes to the slave hosts.
● How often is the index updated?
● How many index files need to be replicated? This depends on the size of the index or delta index.
● How many indexes need to be replicated?
● How large are the indexes? What type of documents are indexed (documents with attributes only, documents with attributes and text content, or only
documents with text content)? Does the index contain text-mining information?
● How quickly should the updated information be available for searching?
In order to determine the optimum time for replication, you have to weigh up the required topicality against the system load generated.
We recommend that you carry out the initial replication manually, since it can last a lot longer than subsequent replications.
If large indexes need to be replicated frequently, it may not be possible for the system to keep to your configured interval for replication. If this is the case, the
system carries out automatic replication at the next possible point in time.
1. Replicating an Index Manually
1. Go to the Index Landscape window in the TREX admin tool.
2. Carry out one of the following steps:
○ To replicate all indexes, choose Replicate All.
○ To replicate a single index, select the index in question and choose Replicate Index from its context menu.
2. Replicating the Index Automatically Following Optimization in the Queue Server (Replicate After Optimize)
Set the Replicate After Optimize queue parameter to On.
For more information on changing queue parameters, see Configuring Queue Parameters.
We recommend that you arrange for the index to be replicated automatically after every update - as described in point 3 - rather than using the Replicate After
Optimize procedure. Replicate After Optimize only replicates changes involving the TREX queue server. Changes made without involving the queue server,
such as changes to index properties and taxonomies, are not replicated.
3. Replicating an Index Automatically Immediately After Every Update
When you create an index, you can arrange for it to be replicated automatically immediately after every update. To do so, proceed as follows.
1. Go to the Index area in the Landscape → Configuration window of the TREX admin tool.
2. Activate automatic index replication by selecting the Auto Replication checkbox.
You can change this setting later on if you want.
3. Go to the Index → Landscape window in the TREX admin tool.
4. Use the secondary mouse button to click on the index whose index replication settings you want to change.
5. Choose Landscape Configuration and then Enable Auto Replication or Disable Auto Replication.
This way of triggering index replication is particularly important in scenarios that do not use a TREX queue server.
4. Replicating the Index Time-Dependently Using the Queue Server
Enter the time at which the replication is to take place in the Replication Time queue parameter.
Use All-3 to trigger replication every three hours. Use All (3:00) to trigger replication every morning at 3am.
For more information on changing queue parameters, see Configuring Queue Parameters.
5. Using the Python Scheduler to Schedule Index Replication
Change the following configuration files on all master name servers:
Configuration file: TREXDaemon.ini
If the Python scheduler is not yet active, activate it now:
[daemon]
programs=<other_sections>,cron
Configuration file: crontab.ini
Remove the comment sign from the following line:
<schedule> python replicate.py silent allIndexes=1 ''
The default setting causes the system to check for changes to an index every 5 minutes. If there are no changes, the system takes no further action. If
changes have taken place, they are replicated.
Modify the schedule if necessary. For information on syntax and for examples,
see the configuration file.
Result
You can monitor index replication in the TREX admin tool (stand-alone) in the Index Landscape window. If necessary, you can terminate replications in progress
there.
Controlling the Replication Load
Use
You can define how many indexes the system replicates in parallel. This allows you to influence the following:
· The system load on the master index server
· If you are using decentralized data storage, the network load that arises due to copying the changed index files
The higher the number of indexes replicated in parallel, the greater the load. The lower the number, the lower the load. However, this causes replication to take
longer.
Procedure
1. Go to the window Landscape Configuration in the TREX admin tool.
2. In the field Replication Threads, enter how many indexes the system is to replicate in parallel.
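The effect of the Replication Threads value can be compared to a worker pool with a fixed size: the pool size limits how many indexes are handled at the same time. The Python sketch below models this with the standard library. The replicate function is a stand-in that only sleeps briefly, and the index names are invented. It does not call TREX.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def replicate(index_name: str) -> str:
    """Stand-in for replicating one index. Here it only sleeps briefly."""
    time.sleep(0.01)
    return index_name


def replicate_all(index_names: list[str], replication_threads: int = 1) -> list[str]:
    """Replicate indexes with at most replication_threads running in parallel."""
    with ThreadPoolExecutor(max_workers=replication_threads) as pool:
        return list(pool.map(replicate, index_names))


# Hypothetical index names. A value of 1 replicates one index after the other.
done = replicate_all(["idx_a", "idx_b", "idx_c"], replication_threads=2)
print(done)  # ['idx_a', 'idx_b', 'idx_c']
```

A larger pool finishes sooner but keeps more replications running at once, which mirrors the trade-off between replication time and load described above.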
Configuring Topicality of Search Results
Use
You can define how up-to-date you want the searched index to be. There are the following options:
Option: majority (default setting)
Meaning: The search takes place on the index version available on the majority of slave index servers. If two index versions are equally available, TREX uses the
more up-to-date of the two.
Advantage: The search queries are distributed. This setting gives the highest availability for the search because during replication TREX only switches to
the new version from the old version when the majority of the slave index servers have the new version.
Disadvantage: The search may not take place using the most up-to-date data.
Option: latest
Meaning: The search takes place using the most up-to-date index that has been released for replication.
Advantage: The search takes place using the most up-to-date data.
Disadvantage: This setting can hamper search performance. TREX always uses the up-to-date version, even if only a few (or even no) slave index
servers have the most up-to-date version. If no slave index server has the most up-to-date version, the master index server receives the search queries,
even if it is locked for searching. This ensures that search queries are always
answered and the application receives no error message.
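The difference between the two options can be illustrated with a small selection function. This is a sketch under stated assumptions: index versions are modelled as increasing integers, "majority" is implemented as the version held by the most slave index servers with ties going to the newer version, and all names are hypothetical. It is not the TREX implementation.

```python
from collections import Counter


def search_version(
    slave_versions: list[int], latest_released: int, mode: str = "majority"
) -> int:
    """Pick the index version to search on for the given Search Version mode."""
    if mode == "latest":
        return latest_released
    if mode != "majority":
        raise ValueError(f"unknown mode: {mode}")
    if not slave_versions:
        raise ValueError("no slave index servers given")
    counts = Counter(slave_versions)
    most_common = max(counts.values())
    # If two versions are equally available, use the more up-to-date one.
    return max(version for version, n in counts.items() if n == most_common)


def answered_by_master(slave_versions: list[int], version: int) -> bool:
    """True if no slave index server holds the chosen version."""
    return version not in slave_versions


# Four slave index servers during a replication. Version 4 is the newest.
print(search_version([3, 3, 3, 4], latest_released=4))            # 3
print(search_version([3, 3, 4, 4], latest_released=4))            # 4
print(search_version([3, 3, 3, 4], latest_released=4, mode="latest"))  # 4
print(answered_by_master([3, 3, 3, 3], 4))                        # True
```

The last line corresponds to the case described for latest in which the master index server has to answer the search queries itself.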
You can change the standard configuration in the following two ways:
· For all new master indexes
· For existing master indexes
Changing the setting for all new master indexes
1. Go to the window Landscape Configuration in the TREX admin tool.
2. Choose the required setting for Search Version.
Changing the setting for an existing master index
1. Go to the Index Landscape window in the TREX admin tool.
2. Select the index in question and choose Landscape Configuration from the context menu.
3. Choose the required setting for Search Version.
Changing a Distributed System
Purpose
The sections below describe the changes that you can make to a distributed system after the installation. For all changes, note the Constraints that are relevant for
distributed systems.
Adding and Removing Hosts
Features
You can use the TREX admin tool (stand-alone) to add or remove a host (server or blade server) to/from a TREX landscape. You do this if you have configured a
distributed TREX landscape.
Prerequisites
Make sure that you will still have enough CPU capacity and memory for your TREX landscape after removing a host.
Process Flow
● Removing a Host
○ Removing a host temporarily
○ Removing a host permanently
● Adding a Host
Removing a Host
Use
You can use the TREX admin tool (stand-alone) to remove a host from a TREX landscape temporarily or permanently.
Removing a Host Temporarily
1. Go to the Landscape → Configuration window in the TREX admin tool (stand-alone).
2. Remove the Master Index/Queue Server indicator for the host that you want to remove from your TREX landscape temporarily.
3. Choose Check and then Deploy to save your change.
4. In the Landscape → Reorg window, go to the Plan tab page.
5. Choose Start Reorg to start the required reorganization of your TREX landscape.
The reorganization process distributes indexes that are located on the removed host to other hosts. When the reorganization is finished, there are no more
indexes on the host in question.
If you select the Split/Merge Indexes checkbox before performing the reorganization, the system not only reorganizes the indexes but also distributes and
splits the logical indexes again. During this type of reorganization, the system also recalculates the number of parts of which a logical index consists.
Note that this reorganization can cause a complete reindexing process that can last as long as the initial indexing run. During this period, the system cannot
perform indexing runs and searching is limited.
To add the host to your landscape again, proceed as described in Adding a Host.
Removing a Host Permanently
1. Stop TREX on the host that you want to remove from your landscape.
The host is highlighted in red as soon as you have stopped it.
2. Go to the Landscape → Configuration window in the TREX admin tool (stand-alone).
3. Select the host that you want to remove permanently.
4. Choose Remove Host.
You are asked whether you want the indexes located on this host to be moved automatically.
5. Choose Move if you want this to happen.
The system removes all the indexes from the host in question.
After permanently removing a host, do not simply carry out a reorganization. For performance reasons, you should completely redistribute the indexes. To do so,
select the Split/Merge Index checkbox in the Landscape → Reorg window of the TREX admin tool (stand-alone) and then start the reorganization. During this type of reorganization, the system also recalculates the number of parts of which a logical index consists.
Note that this reorganization can cause a complete reindexing process that can last as long as the initial indexing run. During this period, the system cannot
perform indexing runs and searching is limited.
Adding a Host
Use
You use the TREX admin tool (stand-alone) to add a new host (server or server blade) to your TREX landscape.
Procedure
1. Start TREX on the host that you want to add to your TREX system landscape.
○ Install a TREX instance on the server
If you have not yet installed a TREX instance on the host that you want to add to your TREX landscape, do so before continuing with the procedure.
For more information about the installation of TREX, see the SAP NetWeaver 7.0 Search and Classification (TREX) Single Host installation guide. The guide is
located in the SAP Service Marketplace at service.sap.com/installNW70.
○ Install a TREX instance on the server blade
For a distributed TREX installation with server blades, use the cloneInst.py script to generate a new TREX instance on the server blade.
See: Activating the Configuration Clones for Server Blades
2. Go to the Landscape → Configuration window in the TREX admin tool (stand-alone).
3. Add the server or server blade to your TREX landscape as follows:
○ Following the installation of an additional TREX instance on a server, execute the Add host command (see Adding a Host)
○ The cloneInst.py script automatically adds the server blade to the landscape
4. Select the Master Index/Queue Server indicator for the host that you want to add to your TREX landscape.
5. Choose Check and then Deploy to save your change.
6. In the Landscape → Reorg window, go to the Plan tab page.
7. Choose Start Reorg to start the required reorganization of your TREX landscape.
After adding a host (server or server blade) to your TREX landscape, do not simply carry out a reorganization. For performance reasons, you should completely
redistribute the indexes. To do so, select the Split/Merge Index checkbox in the Landscape → Reorg window of the TREX admin tool (stand-alone) and then
start the reorganization. During this type of reorganization, the system also recalculates the number of parts of which a logical index consists.
Note that this reorganization can cause a complete reindexing process that can last as long as the initial indexing run. During this period, the system cannot
perform indexing runs and searching is limited.
Changing Hosts
Purpose
You can make the following changes in a distributed system:
· Add master, backup, and slave hosts
· Remove backup and slave hosts
· Replace backup or slave hosts
For details, see the relevant sections.
Reorganization Function in the TREX Admin Tool
You can use the reorganization function in the TREX admin tool (see Landscape → Reorg) to automatically redistribute and optimize your TREX system landscape.
This function redistributes the TREX indexes among the master hosts and removes any backup and slave hosts that are no longer required (see Optimizing the
Landscape Using the Reorg Function).
Adding a TREX Index Server
Use
You can add additional index servers to a TREX host on which an index server is already running. You do this to distribute large indexes over multiple index
servers.
Each index server can use a maximum of 2GB of memory. Do not configure more index servers than can be supported by the memory on your host.
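As a rough aid for the memory rule above, the following Python sketch estimates how many index servers a host can support. It is simple arithmetic under stated assumptions: "2GB" is read as 2 GiB, and the `reserved_bytes` parameter (memory set aside for the operating system and the other TREX services) is a hypothetical knob that is not part of TREX.

```python
GIB = 1024 ** 3
# Per-index-server limit from the text above, interpreted here as 2 GiB (assumption).
INDEX_SERVER_LIMIT_BYTES = 2 * GIB


def max_index_servers(host_memory_bytes, reserved_bytes=0):
    """Upper bound on index servers that fit into the host's memory."""
    usable = max(host_memory_bytes - reserved_bytes, 0)
    return usable // INDEX_SERVER_LIMIT_BYTES
```

For a host with 8 GiB of memory of which 2 GiB are reserved, this gives three index servers. Treat the result as an upper bound rather than a sizing recommendation.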
Prerequisites
1. Open the TREXDaemon configuration file in a text editor.
This file is located in the <TREX_DIR>/<trex_host_name> directory.
2. In the [daemon] section, add one or more index servers beneath the programs parameter: programs=nameserver,preprocessor1,indexserver1,indexserver<next_number>,queueserver,alertserver.
Depending on the hardware of your host, one or two index servers are entered in the file by default.
3. Copy the [indexserver1] section and rename the copied section as [indexserver<next_number>].
Repeat this procedure for each of the index servers that you want to add. Choose a new value for the port number of the additional index server (arguments=-port <index_server_port> parameter).
Determine the port of the first index server according to the following convention: <index_server_port>=3<TREX_instance_number>03. Increase the values
for the port numbers in steps of ten to avoid conflicts.
If your TREX instance number is 47:
[indexserver1]
arguments=-port 34703
...
[indexserver2]
arguments=-port 34713
...
[indexserver3]
arguments=-port 34723
...
[indexserver4]
arguments=-port 34733
...
4. Stop and start TREX so that your changes take effect.
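The port convention from step 3 can be checked with a small Python helper. This is a sketch only: the function names are made up for the example, and the generated text contains just the section header and the port argument, whereas a real copied [indexserver1] section carries further parameters that must be kept.

```python
def index_server_port(instance_number, server_number):
    """Port of the n-th index server: 3<instance_number>03, then steps of ten."""
    if not 0 <= instance_number <= 99:
        raise ValueError("instance number must be between 00 and 99")
    if server_number < 1:
        raise ValueError("index servers are numbered from 1")
    first_port = int(f"3{instance_number:02d}03")
    return first_port + 10 * (server_number - 1)


def index_server_section(instance_number, server_number):
    """Minimal section text for one additional index server (port line only)."""
    port = index_server_port(instance_number, server_number)
    return f"[indexserver{server_number}]\narguments=-port {port}\n"


if __name__ == "__main__":
    for n in range(1, 5):
        print(index_server_section(47, n))
```

For instance number 47 the helper yields the ports 34703, 34713, 34723, and 34733, matching the example above.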
Optimizing the Landscape Using the Reorg Function in the TREX Admin Tool
Use
You can use the reorganization function in the TREX admin tool (see Landscape → Reorg) to automatically redistribute and optimize your TREX system landscape.
This function redistributes the TREX indexes on the master hosts and removes backup and slave hosts that are no longer required.
For details about using the Reorg functions in the TREX admin tool, see Reorganization of the TREX System Landscape.
Procedure
1. Check the roles of the servers in the TREX admin tool at Landscape → Configuration. If necessary, correct the roles and then choose Deploy.
2. Switch to Landscape → Reorg:
The system automatically calculates a new, optimized distribution of your TREX system landscape according to the newly-defined roles. The new distribution of the
servers is then displayed at Landscape → Reorg → Plan and at Landscape → Reorg → Usage By Service.
3. Start the reorganization of the landscape by choosing Start Reorg in Landscape → Reorg → Summary.
The progress of the reorganization is displayed at Landscape → Reorg → Plan.
Result
The TREX system landscape has been reorganized and optimized according to your settings.
Reorganization of the TREX System Landscape
Use
You can use the Reorg function to distribute the indexes in a TREX system landscape among the available hosts to optimize their memory requirements.
The reorganization aims to achieve a balanced memory load and CPU load for the TREX system landscape.
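To make the goal of a balanced memory load concrete, the sketch below places indexes on hosts with a simple greedy heuristic (largest index first, always onto the least-loaded host). This is not the algorithm that TREX uses; the function name, the input format, and the heuristic itself are assumptions chosen only to illustrate what "balanced" means here.

```python
def plan_balanced_distribution(index_sizes_kb, hosts):
    """Greedy illustration: assign each index to the currently least-loaded host.

    index_sizes_kb -- dict mapping index name to its size in KB
    hosts          -- list of host names
    Returns (plan, loads): index -> host, and host -> total KB.
    """
    loads = {host: 0 for host in hosts}
    plan = {}
    # Largest indexes first; ties are broken by name for a stable result.
    for name, size in sorted(index_sizes_kb.items(), key=lambda kv: (-kv[1], kv[0])):
        target = min(hosts, key=lambda host: loads[host])
        plan[name] = target
        loads[target] += size
    return plan, loads
```

Three indexes of 500 KB, 300 KB, and 200 KB on two hosts end up as 500 KB on one host and 300 KB + 200 KB on the other, so both hosts carry the same load.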
Integration
This function is available in the TREX admin tool (stand-alone).
You can also launch it from the BI Accelerator Monitor. However, several screens are used in this case.
The TREX alert server contains the reorg check. When this check runs, you are automatically informed by e-mail (if configured), if the system recommends a
reorganization.
Features
Based on different key figures, TREX calculates whether a reorganization should be performed. The key figures are summarized on the Summary tab page and
displayed in detail on the Usage by Service (I) and (II) and Usage by Index tab pages.
The Landscape: Reorg window contains the following tab pages:
Overview of Tab Pages
Tab Page Description
Summary Displays whether TREX recommends a reorganization.
The lower table displays key figures from which TREX calculates the
percentage improvement that could be achieved by a reorganization. These
estimates are compared with fixed program-internal values. If the estimated
improvement is high enough, TREX recommends the reorganization.
If TREX recommends a reorganization (summary = yes), you can start the
reorganization immediately or at a later time.
To start the reorganization immediately, choose Start Reorg.
To start the reorganization later, specify the date and time in the following
format: YYYY-MM-DD HH:MM:SS. Then choose Start Reorg.
Plan Displays the various steps that TREX would perform during a reorganization or
that are being performed during a reorganization
In addition, you see the steps and the status of the latest reorganization.
Usage by Service (I) Displays the memory load and CPU load of the hosts in graphical form.
Based on these values and irrespective of the selected algorithm, TREX
calculates whether a reorganization is necessary.
For example, TREX recommends a reorganization if the distribution of memory
is unbalanced (identifiable by the different heights of the bar displays).
You can use a filter to show and hide the CPU load.
You cannot perform any other activities on this tab page, it is mainly for
information purposes.
Usage by Service (II) Displays various key figures for the hosts in table form.
You cannot perform any activities on this tab page, it is mainly for information
purposes.
Usage by Index Displays various key figures for all indexes in table form.
You can use a filter to define which indexes should be displayed.
You cannot perform any other activities on this tab page, it is mainly for
information purposes.
Interactive Reorg Displays various current key figures in table and graphical form.
On the graphic view, you can distribute the indexes manually using
Drag&Drop. This function is intended for experts.
Options You can define the algorithm and various parameters to be used for the
reorganization.
We recommend that you use the memory algorithm. All other algorithms are
used for test purposes.
Normally, you do not have to make any changes on this tab page.
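The Summary tab page expects a deferred start time in the format YYYY-MM-DD HH:MM:SS. If you generate this value in a script, a minimal Python sketch looks as follows; the helper names are illustrative and not part of TREX.

```python
from datetime import datetime

# strftime/strptime pattern corresponding to YYYY-MM-DD HH:MM:SS
REORG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def format_reorg_start(when):
    """Render a datetime in the format expected for a deferred reorganization."""
    return when.strftime(REORG_TIME_FORMAT)


def parse_reorg_start(text):
    """Parse a value in that format; raises ValueError if it does not match."""
    return datetime.strptime(text, REORG_TIME_FORMAT)
```

Parsing the value before entering it is a cheap way to catch typing errors such as a missing seconds field.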
A reorganization can be necessary in the following circumstances:
● You make changes to your TREX system landscape. For example, you remove index servers or add new ones.
● The current size of indexes does not match the initial size estimates.
If indexes are moved during the reorganization, no update of the affected indexes is possible. Indexing is interrupted for the duration of the reorganization. The
affected indexes are displayed with a yellow traffic light in the Index: Landscape window.
Parameter Overview
The following parameters are available on the Options tab page. You do not have to make any changes by default.
Reorganization Parameters
Parameter Description
Split Indexes Specifies whether indexes are split into logical indexes with more than one
part if the defined size is exceeded.
This specification is in KB.
This parameter is deactivated by default.
Merge Indexes Specifies whether parts of indexes are merged if the size falls below a defined
value.
This specification is in KB.
This parameter is deactivated by default.
Small Indexes Specifies whether small indexes are distributed equally among the available
hosts if the size falls below a defined value.
This specification is in KB.
This parameter is activated by default and has the size 1,000 KB.
Remove Temporary Indexes If it is activated, temporary indexes are deleted during the reorganization.
This parameter is activated by default.
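The defaults in the table can be summarized as a small data structure. The Python sketch below is a model for illustration only: the class and field names are invented, `None` stands for "deactivated", and the function merely reports which parameters would apply to an index of a given size. How TREX combines these rules internally is not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReorgOptions:
    # Thresholds in KB; None means the parameter is deactivated.
    split_above_kb: Optional[int] = None   # Split Indexes: deactivated by default
    merge_below_kb: Optional[int] = None   # Merge Indexes: deactivated by default
    small_below_kb: Optional[int] = 1000   # Small Indexes: activated, 1,000 KB
    remove_temporary: bool = True          # Remove Temporary Indexes: activated


def applicable_rules(size_kb, is_temporary, options=None):
    """List the reorganization parameters that apply to one index."""
    options = options or ReorgOptions()
    rules = []
    if options.remove_temporary and is_temporary:
        rules.append("remove_temporary")
    if options.split_above_kb is not None and size_kb > options.split_above_kb:
        rules.append("split")
    if options.merge_below_kb is not None and size_kb < options.merge_below_kb:
        rules.append("merge")
    if options.small_below_kb is not None and size_kb < options.small_below_kb:
        rules.append("small")
    return rules
```

With the defaults, a 500 KB index is only treated as a small index, and a 5,000 KB index is not affected by any of the parameters.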
Activities
Start the TREX admin tool (stand-alone) and navigate to the Landscape: Reorg → Summary window. If TREX recommends a reorganization (summary = yes), start
the function by choosing Start Reorg. The display switches automatically to the Plan tab page. The window displays which steps are being performed. You can
choose the F5 button to update the display.
To cancel the reorganization, choose Cancel Reorg.
The reorganization is complete once all planned steps have been performed. The Summary tab page displays the status done.
Adding a Master Host
Use
You can add a new master host to a distributed system. You need to do this if the capacity of the existing master hosts is insufficient.
This has the following effect on the distribution of the indexes:
· The assignment of existing indexes remains unchanged.
· The new master index server receives all new indexes until all master index servers have the same number of indexes. TREX then distributes the new
indexes among all master index servers according to a round robin procedure.
The same principle is used for queues.
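The distribution rule described above can be modeled in a few lines of Python: a new index always goes to the master index server that currently has the fewest indexes, which fills up a newly added master first and then behaves like a round robin. The function name and the dictionary of counters are assumptions for this sketch, not TREX internals.

```python
def assign_new_index(index_counts):
    """Pick the master index server for a new index and update its counter.

    index_counts -- dict mapping master host name to its current number of
                    indexes; the dict's insertion order decides ties.
    """
    target = min(index_counts, key=index_counts.get)
    index_counts[target] += 1
    return target
```

Starting with two indexes on master1 and none on the newly added master2, the next four new indexes go to master2, master2, master1, and master2.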
If you previously had one master host and add another, the indexes are distributed as follows:
If you are using backup hosts, the new master host needs to receive a backup host.
· If there is one backup host for all master hosts, this backup host is automatically made backup host for the new master host.
· If each master host has its own backup host, you have to add a new backup host for the new master host.
Procedure
1. Install TREX on the host that you want to add.
2. If you are using centralized data storage: Mount the central TREX data directory (UNIX) or define it as a network drive (Windows).
3. Start TREX on the new host.
4. Start the TREX admin tool on a host that is already configured in the distributed system.
5. Go to the Landscape Configuration window.
6. Use Add Host to add the new host.
7. Configure the new host in the Hosts table as follows:
a. Mark it as a master index/master queue server.
b. If you are using centralized data storage: In the column Base Path enter the central TREX data directory on the file server.
8. If you are using backup hosts and every master has its own backup host: Add a new backup host.
9. Check the configuration. If the check does not find any errors, activate the configuration using Deploy.
Adding a Backup Host
Use
You can add a new backup host to a distributed system. You need to do this if you have added a new master host and want it to have its own backup host.
Procedure
1. Install TREX on the host that you want to add.
2. Mount the central TREX data directory (UNIX) or define it as a network drive (Windows).
3. Start TREX on the new host.
4. Start the TREX admin tool on a host that is already configured in the distributed system.
5. Go to the Landscape Configuration window.
6. Use Add Host to add the new host.
7. Select Assign Existing Indexes/Queues to New Slave/Backup Servers.
8. Configure the new host in the Hosts table as follows:
a. In the Backup Index/Queue Server for… column, specify the master host to which the host belongs.
b. In the Base Path column, enter the central TREX data directory on the file server.
9. Check the configuration. If the check does not find any errors, activate the configuration using Deploy.
Adding a Slave Host
Procedure
1. Install TREX on the host that you want to add.
2. If you are using centralized data storage: Mount the central TREX data directory (UNIX) or define it as a network drive (Windows).
3. Start TREX on the new host.
4. Start the TREX admin tool on a host that is already configured in the distributed system.
5. Go to the Landscape Configuration window.
6. Use Add Host to add the new host.
7. Select Assign Existing Indexes/Queues to New Slave/Backup Servers. Otherwise the new slave host does not receive existing indexes.
8. Configure the new host in the Hosts table as follows:
a. In the Slave Index Server for… column, specify the master host to which the host belongs.
b. If you are using centralized data storage: In the column Base Path enter the central TREX data directory on the file server.
9. Check the configuration. If the check does not find any errors, activate the configuration using Deploy.
Removing a Backup Host
Use
You can remove a backup host from a distributed system. You may want to do this if you used the host for test purposes and no longer need it in the distributed
system.
Procedure
1. Start the TREX admin tool on any host in the distributed system.
2. Go to the Index Landscape window. Check the column <light><host_name>:<index_server_port> for the host that you want to remove.
a. Check which indexes the host is assigned to as the backup index server.
If you remove this host, TREX automatically removes these assignments. The affected indexes then no longer have a backup index server. If you want indexing to
be highly available, you need to assign these indexes to another backup index server.
b. Check whether the backup index server is currently active.
This is displayed in the column using the entry +backup. You should not remove the host if the backup index server is active. If you remove the host anyway, the
system does not switch automatically to using the master index server. The master index server is only assigned to the affected indexes when it is next started.
c. Check whether the host is assigned to any indexes as the master index server.
If this is the case, you have to assign another master index server to the indexes before removing the host.
d. Check whether the host is assigned to any indexes as the slave index server.
If you remove the host, TREX removes these assignments automatically. You have to assign these indexes to another slave index server too.
See also: Changing Index Assignments
3. Go to the Queue Landscape window. Check the same things for the queues as you just checked for the indexes.
See also: Changing Queue Assignments
4. If you are sure that you want to remove the host, go to the Landscape Configuration window.
5. Select the host in the Hosts table and then choose Remove Host.
6. Check the configuration. If the check does not find any errors, activate the configuration using Deploy.
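The checks in step 2 amount to looking up every role that the host holds. The following Python sketch models this on a plain dictionary; the data layout and function names are hypothetical, since the real information is read from the Index Landscape window rather than from such a structure.

```python
def roles_of_host(host, assignments):
    """Collect the indexes for which `host` acts as master, backup, or slave.

    assignments -- dict: index name -> {"master": str, "backup": str or None,
                   "slaves": list of str}  (illustrative layout)
    """
    report = {"master": [], "backup": [], "slave": []}
    for index_name, servers in sorted(assignments.items()):
        if servers.get("master") == host:
            report["master"].append(index_name)
        if servers.get("backup") == host:
            report["backup"].append(index_name)
        if host in servers.get("slaves", ()):
            report["slave"].append(index_name)
    return report


def must_reassign_first(host, assignments):
    """Indexes that need another master index server before the host is removed."""
    return roles_of_host(host, assignments)["master"]
```

Backup and slave assignments are removed automatically when the host is removed, so only the master assignments block the removal; the other two lists show which indexes lose a backup or slave index server.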
Result
The TREX instance is still installed on the removed backup host. The host may still contain configuration data with information on the distributed system. However,
since these configuration files are not consistent, the TREX instance on this host will normally not start any longer. You should therefore deinstall this TREX
instance.
Removing a Slave Host
Use
You can remove a slave host from a distributed system. You may want to do this if you used the host for test purposes and no longer need it in the distributed
system.
Procedure
1. Start the TREX admin tool on any host in the distributed system.
2. Go to the Landscape Configuration window.
3. Select the slave host that you want to remove in the Hosts table. Remove the selection in the column Slave Index Server for.
4. Remove the host from the landscape using Remove Host.
5. Check the configuration. If the check does not find any errors, activate the configuration using Deploy.
Result
TREX is still installed on the removed slave host. The host may still contain index copies and configuration files with information on the distributed system.
However, since these configuration files are not consistent, the TREX instance on this host will normally not start any longer. You should therefore deinstall this
TREX instance.
Replacing a Backup or Slave Host
Use
You can replace a backup or slave host with a new host. You may want to do this if the current host needs to be maintained and will therefore be unavailable for a
while.
Replacing a Backup Host
1. Add a new backup host to the distributed system.
2. Remove the previous backup host from the distributed system.
Replacing a Slave Host
1. Remove the previous slave host from the distributed system.
2. Add a new slave host to the distributed system.
Changing Index Assignments
Use
When you create an index, TREX assigns a master index server and slave index server to it. If you are using backup index servers, TREX also assigns a
backup index server to the index.
You can change these assignments if necessary. You may want to do this if you need to remove a host from the distributed system and assign the indexes to
other servers first.
Prerequisites
You are using centralized data storage.
Procedure
1. Go to the Index Landscape window.
2. In the table select the index whose assignment you want to change.
3. Click in a column relating to a host. Choose the required function from the context menu.
Function Description
Move master here Assigns another master index server to the index.
Switch master/backup Switches the master and backup index servers. The master index server is
then used as the backup index server for this index (and vice versa).
Remove this backup Removes the assignment of index to backup index server.
Add backup here Assigns a backup index server to the index.
Remove this slave Removes the assignment of index to slave index server.
Add slave here Assigns a slave index server to the index.
The functions only change the assignment of index to server. The indexes are not physically moved.
You can also change the assignments if the currently assigned master, backup, or slave index server is active at that point in time. The currently assigned
server completes its current activity before the change takes effect.
Changing Queue Assignments
Use
When you create an index, TREX automatically creates a corresponding queue and assigns the queue to a master queue server. If you are using backup queue
servers, TREX also assigns a backup queue server to the queue.
You can change these assignments if necessary. You may want to do this if you need to remove a host from the distributed system and assign the queues to other
servers first.
Prerequisites
You are using centralized data storage.
Procedure
1. Go to the Queue Landscape window.
2. In the table select the queue whose assignment you want to change.
3. Click in a column relating to a host. Choose the required function from the context menu.
Function Description
Move master here Assigns another master queue server to the queue.
Switch master/backup Switches the master and backup queue servers. The master queue server is
then used as the backup queue server for this queue (and vice versa).
Remove this backup Removes the assignment of queue to backup queue server.
Add backup here Assigns another backup queue server to the queue.
The functions only change the assignment of queue to server. The queues are not physically moved.
You can also change the assignments if the currently assigned master or backup queue server is active at that point in time. The currently assigned master or
backup server completes its current activity before the change takes effect.
Allowing Searching on Master Indexes
Use
In a distributed system, you cannot search on the master indexes by default. Search requests are answered only by slave index servers, and not by the master
index servers.
The default configuration has the following advantages:
· Faster indexing
The resources on the master index server do not need to be shared among indexing and searching processes.
· Less main memory requirement
A write variant and a read variant exist for each master index. If the master index server only carries out indexing, only the write variant has to be loaded to
the main memory. If the master index server carries out indexing and searching, both variants have to be loaded to the main memory.
You can change the default configuration so that the master index servers are also used for searching. This makes sense in the following cases:
· You are able to ensure that the master index servers do not index and search at the same time. This may be the case, for example, if indexing always takes
place at night when there are no users using the system for searching.
· You have static indexes. These are indexes that you have created and intend to update rarely (for example, every three months).
You can change the standard configuration in the following two ways:
· For all new master indexes
· For existing master indexes
Changing the setting for all new master indexes
1. Start the TREX admin tool on any host in the distributed system.
2. Go to the Landscape Configuration window.
3. Select Search on Master/Backup Server
4. Activate this change by choosing Deploy.
Changing the setting for an existing master index
1. Start the TREX admin tool on any host in the distributed system.
2. Go to the Index Landscape window.
3. Select the index in question and choose Index Properties from the context menu.
4. Select Search on Indexer (Master/Backup)
Changing Default Directories for Indexes, Snapshots, or Queues
Use
There are default directories for indexes, snapshots, and queues. TREX creates new data in these directories. You can change the default directories. You may
want to do this if you are running out of disk space in the existing default directories.
This change has no effect on existing indexes, snapshots, or queues. They remain in the previous default directories. TREX creates new data in the new default
directories.
If you want to move existing indexes, snapshots, or queues, contact SAP Support.
Procedure with Centralized Data Storage
On UNIX
1. Create a new TREX data directory on the file server.
2. Mount the new directory.
3. Start the TREX admin tool on any host in the distributed system.
4. Go to the Landscape Configuration window.
5. Specify the new directory for all hosts in the Basepath column of the Hosts table.
6. Activate this change by choosing Deploy.
On Windows
1. Create a new TREX data directory on the file server.
2. Edit the configuration file TREXDaemon.ini on all hosts that belong to the distributed system. Define the new directory as a network drive.
[mappings]
map_t=\\myfileserver\myoldtrexshare
map_u=\\myfileserver\mynewtrexshare
For more information, see Defining the Network Drive.
3. Stop TREX on all hosts that belong to the distributed system. Restart TREX.
4. Start the TREX admin tool on any host in the distributed system.
5. Go to the Landscape Configuration window.
6. Specify the new network drive, or a subdirectory thereof, for all hosts in the Basepath column of the Hosts table.
7. Activate this change by choosing Deploy.
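Because step 2 has to be repeated on every host, the edit to the [mappings] section can be scripted. The Python sketch below uses the standard configparser module and works on the file content as a string. It assumes that TREXDaemon.ini is plain INI syntax that configparser can read, which may not hold for every real file, and writing the file back this way discards comments. The function name is made up for the example.

```python
import configparser
import io


def add_network_mapping(ini_text, drive_key, unc_path):
    """Return ini_text with drive_key=unc_path added to the [mappings] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(ini_text)
    if not parser.has_section("mappings"):
        parser.add_section("mappings")
    parser.set("mappings", drive_key, unc_path)
    buffer = io.StringIO()
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


if __name__ == "__main__":
    original = "[mappings]\nmap_t=\\\\myfileserver\\myoldtrexshare\n"
    print(add_network_mapping(original, "map_u", "\\\\myfileserver\\mynewtrexshare"))
```

Run it against a copy of the file first and compare the result with the original before replacing anything on a productive host.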
Procedure with Decentralized Data Storage
1. Create a new TREX data directory on the host in question.
2. UNIX only: Make sure that the directory belongs to the user SAPService<SAPSID>.
3. Start the TREX admin tool on any host in the distributed system.
4. Go to the Landscape Configuration window.
5. Specify the new directory for the affected host in the Basepath column of the Hosts table.
6. Activate this change by choosing Deploy.
Distributed Preprocessing of Documents
Purpose
Indexing is a complex process consisting of several phases. One phase is the preprocessing of documents by the preprocessor. Preprocessing includes the
following steps:
· Loading documents if the application transmitted them as URIs.
· Filtering
· Carrying out a linguistic analysis
Preprocessing can take a similar amount of time and use similar system resources to the actual indexing process. The filtering of a large number of large
documents that are not in text or HTML form can be particularly time- and resource-consuming (for example, large PDFs).
In order to increase throughput in preprocessing, you can distribute the preprocessing among multiple hosts. For example, you can use one host (or more than
one) exclusively for preprocessing documents. You do this if there are a large number of documents to be preprocessed for the initial indexing run.
The following sections contain information on the distributed preprocessing of documents.
· The section Fundamentals explains the preprocessing flow for indexing. It also tells you about distribution options and how to control load distribution and
performance.
· The section Configuration explains how to configure distributed preprocessing.
The preprocessor is involved in processing search and text-mining requests as well as in indexing. In all of these processes, the preprocessor has the task of
preparing the actual preprocessing.
The sections below only relate to the preprocessing of documents for indexing. The role of the preprocessor in processing search and text-mining requests is
not described.
Fundamentals
The following sections provide fundamental information on the topics below.
· Preprocessing Flow
· Distributing Preprocessing
· Load Distribution and Performance
Preprocessing Flow
The graphic below depicts the most important steps that take place immediately before, during, and after preprocessing.
The graphic depicts the process flow of the application transmitting the URI of a document to TREX. If the application transmits the document directly, the step
‘Load document (HTTP/HTTPS Get)’ does not take place.
The application sends indexing requests to the TREX Web server or TREX RFC server. This server then forwards the requests to the queue server. The queue
server assigns the requests to the correct queues and distributes the requests among one or more preprocessors. The actual preprocessing of documents then
takes place on the preprocessor(s).
When the preprocessing has been completed, the preprocessor passes the analyzed document to the queue server. The queue server collects the documents
and, depending on its configuration, triggers further processing on the index server.
How Does the Distribution of Documents Take Place?
The distribution of documents among the preprocessors is controlled by the name server. The distribution takes place according to a round robin procedure that
takes the number of times that a preprocessor has been accessed into account. Preprocessors that have been accessed less often are preferred when
distributing documents.
The process flow is as follows:
1. When a queue server receives a document it assigns it to a preprocessor client.
2. The preprocessor client asks the name server for the address of a preprocessor.
3. The name server returns the preprocessor that has been accessed least often.
4. The preprocessor client forwards the document to the preprocessor and waits for a response. Preprocessor clients are busy while waiting for a response.
They receive no further documents from the queue server during this time.
5. When the preprocessing of the documents is over, the preprocessor client receives a response from the preprocessor, and returns its own response to the
queue server.
6. Only then is the preprocessor client free to receive further documents from the queue server.
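The distribution rule in steps 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration only, not TREX code: the class `NameServer`, the function `dispatch`, and the preprocessor addresses are hypothetical names, and the tie-breaking rule (first registered wins) is an assumption.

```python
from collections import Counter


class NameServer:
    """Toy name server that counts how often each preprocessor was handed out."""

    def __init__(self, preprocessors):
        # Maps a (hypothetical) preprocessor address to its access count.
        self.access_counts = {p: 0 for p in preprocessors}

    def get_preprocessor(self):
        # Return the preprocessor accessed least often. On a tie, min()
        # returns the first entry in insertion order (an assumption here).
        chosen = min(self.access_counts, key=self.access_counts.get)
        self.access_counts[chosen] += 1
        return chosen


def dispatch(documents, name_server):
    """Pair each document with the preprocessor the name server returns."""
    return [(doc, name_server.get_preprocessor()) for doc in documents]


ns = NameServer(["master:pp1", "pphost:pp1", "pphost:pp2"])
assignments = dispatch([f"doc{i}" for i in range(6)], ns)
per_preprocessor = Counter(address for _, address in assignments)
```

With six documents and three preprocessors, every preprocessor ends up with two documents.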
Distributing Preprocessing
The preprocessing of documents is carried out by preprocessors running in any or index mode. If you set up the system according to Landscape Configuration,
these are:
· The preprocessors that run on the master hosts
· If you are using backup hosts, the preprocessors that run on the backup hosts
For more information on the meaning of the modes, see Preprocessor Modes.
If the preprocessing capacity of the master and backup hosts is insufficient, you can use one host or multiple hosts exclusively for preprocessing. Preprocessing
then takes place on additional preprocessors, allowing more documents to be preprocessed in parallel. This increases throughput for preprocessing.
On a host used exclusively for preprocessing, one or more preprocessors run in index mode, and a name server also runs. Such a host is referred to as a
preprocessor host.
The graphic below depicts a system with one master host, two slave hosts, and one preprocessor host. The preprocessor host supports the master host in
preprocessing.
Load Distribution and Performance
You have to configure the preprocessors and queue server if you want to use distributed preprocessing. The following parameters are important for load distribution
and performance:
● Number of Preprocessors and Preprocessor Threads
● Preprocessor Threads and Queue Server Pool Size
You can only improve performance by taking all parameters into account together. Changing one individual parameter cannot improve performance.
Number of Preprocessors and Preprocessor Threads
The following parameters in the preprocessor and queue server are important for performance:
● The number of preprocessors running on a host
(Number of preprocessors per host)
● The number of threads in a preprocessor process
(Number of threads per preprocessor)
● Number of preprocessor clients in the queue server
(Pool size per queue server)
You can use the pool size for the queue servers to directly influence the number of preprocessor threads. The number of preprocessor threads and the pool
size are connected as follows: <queue server pool size> = <number of preprocessor threads> For more information, see Preprocessor Threads and Queue
Server Pool Size.
Configuration Rules for Preprocessor and Queue Server
You must take into account the following relationships and configuration rules for a high-performance configuration of distributed preprocessing:
● <maximum number of preprocessors per host> = <number of CPUs>
That is, a maximum of one preprocessor per CPU.
● <maximum number of threads per preprocessor> = 3
That is, a maximum of three threads per preprocessor and per CPU.
● <total pool size of all queue servers> = <total number of CPUs for all preprocessor hosts> * 3
These relationships are explained in more detail below.
How Many Preprocessors Can Run On a Host?
The number of preprocessors that can run on a host is limited by the available main memory and the number of CPUs.
Each preprocessor process has its own main memory area. If there are multiple preprocessors running, they need a correspondingly large amount of main
memory. The main memory requirement of a preprocessor depends on the following factors:
● How big are the documents?
● What format do the documents have (PDF, HTML, and so on)?
● For how many languages is language recognition activated?
The main memory requirement for one language is between 30 and 40 MB per preprocessor. If there are more languages, the main memory requirement is
normally around 100 MB per preprocessor.
In some cases, the main memory requirement may be between 500 MB and 1 GB. The worst case scenario can occur if language recognition is activated for all
languages and a large number of preprocessor threads are processing large documents at the same time.
If the host has enough main memory, the following upper limit is valid:
<Maximum number of preprocessors on a host> = <number of CPUs>
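As a rough planning aid, the CPU limit and the memory limit can be combined in one calculation. The sketch below is an illustration under assumptions: the function name `max_preprocessors` is hypothetical, and dividing the free main memory by a per-preprocessor estimate is a simplification that is not part of TREX. Only the rule of one preprocessor per CPU comes from the text.

```python
def max_preprocessors(num_cpus, free_memory_mb, memory_per_preprocessor_mb):
    """Estimate the upper limit of preprocessors on one host.

    The CPU rule (one preprocessor per CPU) is taken from the text.
    The memory rule (free memory divided by an estimated requirement
    per preprocessor) is an illustrative assumption.
    """
    if min(num_cpus, free_memory_mb, memory_per_preprocessor_mb) <= 0:
        raise ValueError("all arguments must be positive")
    limit_by_memory = free_memory_mb // memory_per_preprocessor_mb
    return min(num_cpus, limit_by_memory)


# Enough main memory: the CPU count is the limit.
plenty = max_preprocessors(num_cpus=4, free_memory_mb=8192,
                           memory_per_preprocessor_mb=100)

# Large documents and many languages: main memory becomes the limit.
tight = max_preprocessors(num_cpus=4, free_memory_mb=1024,
                          memory_per_preprocessor_mb=500)
```

The per-preprocessor figure has to be estimated from your own document sizes, formats, and activated languages.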
What Is the Maximum Possible Number of Preprocessor Threads?
A preprocessor process can consist of one or more threads. If there are multiple threads, the preprocessor can distribute the requests among the threads and
process the requests in parallel. The preprocessor automatically starts the number of threads that is required for processing.
For each preprocessor process, a maximum of three preprocessor threads per CPU should be started:
<number of preprocessor threads per preprocessor process> = 3
Since only one preprocessor per CPU and only three threads per preprocessor should be started, this results in the following relationship:
<maximum number of all preprocessor threads running on a host> =
<number of CPUs> * 3
You use the queue server pool size to indirectly configure the number of preprocessor threads (see Preprocessor Threads and Queue Server Pool Size).
If the preprocessor uses the maximum number of threads it is also using the maximum amount of system resources. You will have almost complete CPU load.
If you want the preprocessor to have fewer system resources, you can choose to have a smaller number of threads. However, you ought not to choose to have
a greater number of threads, since this can cause performance to drop.
The more threads invoked in parallel, the longer the operating system takes to administrate the threads (to trigger, stop, and monitor them). If the number of
threads invoked in parallel is too great, the operating system is overwhelmed by thread administration.
More Preprocessors or More Threads?
If you want to optimize preprocessing performance, you need to decide whether to increase the number of preprocessors or the number of preprocessor threads.
Your decision depends on the following factors:
● Required load distribution among the hosts
● System resources of the hosts (number of CPUs and available main memory)
If only one host is preprocessing documents, it makes no difference whether one preprocessor is running with multiple threads or several with one thread each.
If several hosts are preprocessing documents, the parameters have the following effect:
● Load balancing
The number of preprocessors running on each host controls the load distribution among the hosts.
The more preprocessors running on a host, the more load that host receives.
Preprocessing takes place on the master host and on a preprocessor host. Because the master host also carries out indexing you want it to receive a smaller
preprocessing load. There is therefore only one preprocessor on the master host, but two preprocessors on the preprocessor host.
The load is distributed among the two hosts in the ratio 1:2.
● Performance
The number of preprocessor threads controls the performance on one host.
The more threads there are, the more documents a preprocessor can process in parallel.
You cannot use the pool size on the queue server to increase the number of preprocessor threads (see Preprocessor Threads and Queue Server Pool Size) and the
number of preprocessors without restriction. The maximum number depends on the available system resources.
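The load-balancing rule above (the more preprocessors on a host, the more load it receives) amounts to a simple proportion. The following sketch computes the expected share per host; the function name `load_shares` and the host names are hypothetical.

```python
from fractions import Fraction


def load_shares(preprocessors_per_host):
    """Expected share of the preprocessing load per host.

    Assumes every preprocessor receives the same number of documents,
    so a host's share is proportional to its number of preprocessors.
    """
    total = sum(preprocessors_per_host.values())
    if total == 0:
        raise ValueError("at least one preprocessor is required")
    return {host: Fraction(count, total)
            for host, count in preprocessors_per_host.items()}


shares = load_shares({"master_host": 1, "preprocessor_host": 2})
```

For one preprocessor on the master host and two on the preprocessor host, the shares are 1/3 and 2/3, which is the ratio 1:2.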
Availability
Availability can also play a part when deciding on the number of preprocessors and preprocessor threads.
Using multiple preprocessors increases the availability of the system. This is because different processes (preprocessors) have less impact on one another than
do the different threads of a process. If a thread hangs, this can affect other threads of the same process but not of another process.
However, using multiple preprocessors also requires more main memory (see the How Many Preprocessors Can Run On a Host? section above).
Preprocessor Threads and Queue Server Pool Size
The pool size is important for achieving optimum integration between the queue servers and preprocessors. The pool size determines how many documents a
queue server can distribute to the preprocessors at once.
From a technical point of view, the pool size determines how many preprocessor clients a queue server instantiates at startup. The preprocessor client is an
internal component of the queue server. The queue server uses the preprocessor clients to communicate with the preprocessors and use their services.
Depending on the number of preprocessor clients started in the queue server (= pool size), the corresponding number of preprocessor threads are started by a
central worker thread management. You can use the pool size in the queue server to control the number of preprocessor threads on the hosts that preprocessors
are running on.
The following relationship applies:
<queue server pool size> = <number of preprocessor threads>
Example
For example, if you set the pool size in the queue server to the value 6, the corresponding number of preprocessor threads are started on the host that the
preprocessor(s) are running on. If two preprocessor processes are running there, the threads are distributed between the two preprocessor processes, which
corresponds to three threads per process.
Example of the relationship between the pool size and the number of preprocessor threads
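The arithmetic of this example can be checked with a short sketch. It assumes that the threads are spread evenly across the preprocessor processes, as in the example. The function names are hypothetical; the limit of three threads per preprocessor is the recommendation from the text.

```python
MAX_THREADS_PER_PREPROCESSOR = 3  # recommendation from the text


def threads_per_preprocessor(pool_size, num_preprocessors):
    """Average number of threads per process, assuming an even split."""
    if num_preprocessors < 1:
        raise ValueError("at least one preprocessor is required")
    return pool_size / num_preprocessors


def within_recommendation(pool_size, num_preprocessors):
    """True if the pool size keeps each process at three threads or fewer."""
    average = threads_per_preprocessor(pool_size, num_preprocessors)
    return average <= MAX_THREADS_PER_PREPROCESSOR


example = threads_per_preprocessor(pool_size=6, num_preprocessors=2)
ok = within_recommendation(pool_size=6, num_preprocessors=2)
too_many = within_recommendation(pool_size=9, num_preprocessors=2)
```

A pool size of 6 with two processes gives three threads each, while a pool size of 9 would exceed the recommendation.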
What Value Should the Queue Server Pool Size Have?
You must determine the optimum pool size and thus the number of preprocessor threads individually for your system. For each preprocessor process, a
maximum of three preprocessor threads should be started using the entry for the pool size.
Thus, the following relationship applies:
<number of preprocessor threads per preprocessor process> = 3
Since a maximum of only one preprocessor should be started per CPU (see Number of Preprocessors and Preprocessor Threads), this results in the following
relationship for a distributed system landscape with multiple queue servers and preprocessor hosts:
<total pool size of all queue servers> =
<total number of CPUs for all preprocessor hosts> * 3
If the pool size is too low, the preprocessor can have unnecessary idle times and not have a full load, although resources are still available. If the pool size is too
large, the host on which the queue server is running uses too many system resources to manage the pool.
You should check the CPU load for the preprocessors for a while. If system resources are still available, you can increase the pool size to improve performance.
However, if you increase the pool size beyond the recommendations, you gain no performance benefits and might actually cause performance to drop.
The pool size of queue servers is configured in the file TREXQueueServer.ini.
Configuration
Purpose
The sections below explain how to set up distributed preprocessing with a preprocessor. They also contain information on how to increase the number of
preprocessors and preprocessor threads if necessary.
Configuration Recommendations
To achieve high-performance preprocessing that does not hamper the other TREX servers, use the following configuration.
Preprocessor hosts
● In accordance with the configuration rules that are specified in Number of Preprocessors and Preprocessor Threads, start the required number of
preprocessors on the preprocessor host.
● Monitor the load on the host during preprocessing. If system resources are still available, you can use the pool size in the queue server to increase the
preprocessor threads up to the maximum number recommended.
Master Host
We recommend that you keep the default configuration for the preprocessor on a master host.
If you give the preprocessor additional system resources, the performance of the queue server and index server suffers. Preprocessing will be faster, but
subsequent processing steps will be slower.
Backup host
If the master index server and master queue server are active, there is little load on the backup hosts. If you want to use more load for preprocessing on a backup
host, you can start more preprocessors on it, provided the hardware allows this (when doing this, note the configuration rules that are specified in Number of
Preprocessors and Preprocessor Threads.) This allows you to make better use of the system resources on the backup host. However, the performance of the
indexing is not so high if either the backup index server or backup queue server is actually active.
Example
Example 1
The following hosts preprocess documents:
● Master host:
2 CPUs – one preprocessor – one queue server
● Preprocessor hosts:
2 CPUs – two preprocessors
Both hosts have two CPUs. Because the preprocessor host only preprocesses documents, it should take more of the load than the master host. The preprocessor
host therefore has two preprocessors. The only queue server in the system and a preprocessor are running on the master host. Therefore, a total of three
preprocessors are running. Since each preprocessor process should process a maximum of three threads per CPU in parallel, you can calculate the maximum
pool size as follows:
pool size = number of all preprocessors * 3 = 3 * 3 = 9
Thus the pool size must be set to the value 9 in the configuration file TREXQueueServer.ini on the master host. As a result, the queue server makes available a
total of nine preprocessor clients for the preprocessors and in turn a total of nine threads are started in the preprocessors.
Example 2
The following hosts preprocess documents:
● Master host 1
2 CPUs – one preprocessor – one queue server
● Master host 2
2 CPUs – one preprocessor – one queue server
● Backup host
2 CPUs – one preprocessor – one queue server
● Preprocessor host
2 CPUs – two preprocessors
The preprocessor host therefore has 2 preprocessors, as in example 1, but no queue server. Since the system consists of two master hosts and one backup
host, there are a total of three queue servers. We can assume that two of these three queue servers are always active: Either both master queue servers, or one
master queue server and one backup queue server.
The pool size for two active queue servers is determined as follows:
pool size = number of all preprocessors * 3 = 5 * 3 = 15
This pool size divided by the number of active queue servers gives a pool size of 7 or 8 per queue server. This is the pool size that you enter in the configuration
file TREXQueueServer.ini of all queue servers.
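Both examples can be recomputed with a short sketch. The function names and the representation of a host as a `(cpus, preprocessors)` tuple are hypothetical. The rules themselves (at most one preprocessor per CPU, three threads per preprocessor, total divided by the number of active queue servers) come from the text.

```python
import math

THREADS_PER_PREPROCESSOR = 3  # maximum recommended in the text


def total_pool_size(hosts):
    """Total pool size for a list of (cpus, preprocessors) tuples."""
    for cpus, preprocessors in hosts:
        if preprocessors > cpus:
            raise ValueError("more than one preprocessor per CPU")
    return sum(p for _, p in hosts) * THREADS_PER_PREPROCESSOR


def pool_size_per_queue_server(total, active_queue_servers):
    """Lower and upper candidate for the pool size of each queue server."""
    exact = total / active_queue_servers
    return math.floor(exact), math.ceil(exact)


# Example 1: master host (1 preprocessor) and preprocessor host (2).
example1 = total_pool_size([(2, 1), (2, 2)])

# Example 2: two master hosts, one backup host, one preprocessor host.
example2 = total_pool_size([(2, 1), (2, 1), (2, 1), (2, 2)])
per_server = pool_size_per_queue_server(example2, active_queue_servers=2)
```

Example 1 yields 9 and Example 2 yields 15, which gives 7 or 8 per queue server when two queue servers are active.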
Setting Up Distributed Preprocessing
Use
The procedure below explains how to implement distributed preprocessing. The description assumes that:
● You have set up a distributed system with at least one master host.
● You want to connect a host that exclusively preprocesses documents (preprocessor host). You want the preprocessors on this host to have as many system
resources as possible.
Adding a Preprocessor Host to the Distributed System
1. Install TREX on the preprocessor host. During the installation, specify the number of preprocessors to run on the host.
2. If TREX is not running, start it.
3. Start the TREX admin tool on a host that is already configured in the distributed system.
4. Go to the Landscape Configuration window.
5. Use Add Host to add the new preprocessor host.
Configuring Preprocessor Hosts
1. Choose the preprocessor mode index for the preprocessor host.
2. Configure the TREX daemon on the preprocessor host so that only the name server and preprocessors run there:
a. Select the host in question and choose Edit Services.
b. Change the programs parameter as follows:
[daemon]
programs = nameserver, preprocessor1, ..., preprocessor<n>
3. Go to the Landscape Services window.
4. Select one of the servers to run on the preprocessor host. Choose Start New/Stop Removed Services@<hostname>(*) from the context menu.
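The value of the programs parameter from step 2 follows a fixed pattern, so it can be generated for any number of preprocessors. The helper below is a hypothetical convenience for illustration, not part of TREX. The entry is normally maintained through Edit Services in the TREX admin tool.

```python
def daemon_programs(num_preprocessors):
    """Build the programs line for a host that only preprocesses documents."""
    if num_preprocessors < 1:
        raise ValueError("at least one preprocessor is required")
    names = ["nameserver"]
    names += [f"preprocessor{i}" for i in range(1, num_preprocessors + 1)]
    return "programs = " + ", ".join(names)


line = daemon_programs(2)
```

For two preprocessors, the result is the line `programs = nameserver, preprocessor1, preprocessor2`.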
Configuring Master and Backup Hosts
1. Go to the Landscape Ini window.
2. Establish the maximum possible number of preprocessor threads for all hosts that preprocess documents. Take into account all hosts on which a
preprocessor is running in either any or index mode.
For more information about the calculation, see Preprocessor Threads and Queue Server Pool Size.
3. Calculate the pool size for each queue server.
For more information, see Preprocessor Threads and Queue Server Pool Size.
4. Edit the configuration file TREXQueueServer.ini for all queue servers. Enter the calculated value in the parameter poolsize.
5. Go to the Landscape Services window.
6. Select a queue server whose configuration you have changed. Choose Restart queueserver@<host_name>:<port> from the context menu.
Carry out this step for all other queue servers.
The queue servers are automatically restarted by the TREX daemon.
Result
You can check whether the preprocessors are receiving as many system resources as possible by looking at the CPU load for the hosts in question in the TREX
admin tool. When documents are being preprocessed, the CPU usage should be at the upper limit.
Example Configuration
This section shows the configuration for a system in which preprocessing takes place on one master host and one preprocessor host.
The configuration is only specified for mytrexmaster and mytrexpreprocessor, and only where distributed preprocessing is involved.
Assumptions:
● Both hosts have two CPUs each.
● Two preprocessors should run on mytrexpreprocessor.
● The only queue server in the system is running on mytrexmaster.
● Only searches take place on mytrexslave1/2. There is no preprocessing on these hosts.
TREX admin tool, Landscape Configuration
Hosts table (extract 1)
Host | Name Server Mode | Master Index/Queue Server | Slave Index Server for | Preprocessor Mode
mytrexmaster | 1st master | ! | | index
mytrexpreprocessor | slave | | | index
mytrexslave1/2 | slave | | | search
...
Hosts table (extract 2)
Host | Base Path | Services
mytrexmaster | ... | ...
mytrexpreprocessor | /usr/sap/<SAPSID>/TRX<instance_number> | nameserver, preprocessor1, preprocessor2
...
TREXDaemon.ini for ‘mytrexpreprocessor’ (extract)
[daemon]
programs = nameserver, preprocessor1, preprocessor2
TREX admin tool, Landscape Ini
TREXQueueServer.ini for ‘mytrexmaster’
[preprocessor]
poolsize=9
The pool size is calculated as follows:
2 * <preprocessor-threads> on <mytrexpreprocessor> + <preprocessor-threads> on <mytrexmaster> = 2 * 3 + 3 = 9
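If you want to check a configured value against this calculation, the extract above can be read like a standard ini file. The sketch uses Python's `configparser` on the extract as an in-memory string. That the complete TREXQueueServer.ini file can be parsed this way is an assumption.

```python
import configparser

# Extract from TREXQueueServer.ini as shown above.
ini_text = """
[preprocessor]
poolsize=9
"""

parser = configparser.ConfigParser()
parser.read_string(ini_text)
configured = parser.getint("preprocessor", "poolsize")

# Two preprocessors on mytrexpreprocessor and one on mytrexmaster,
# each with three threads.
expected = 2 * 3 + 3
matches = configured == expected
```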
Increasing the Number of Preprocessors
Use
If necessary you can increase the number of preprocessors running on a host. See Number of Preprocessors and Preprocessor Threads for information on when
this is recommended.
Procedure
1. Start the TREX admin tool on any host in the distributed system.
2. Go to the Landscape Configuration window.
3. Select the host in question and choose Edit Services.
4. Add an entry for the new preprocessor to the parameter programs.
[daemon]
programs = ..., preprocessor<new_number>
5. Make sure that there is a section with the same name (preprocessor<new_number>) containing the start parameter for the preprocessor. If there is no such
section, copy an existing section and rename it as follows:
[preprocessor<new_number>]
Windows: executable=TREXPreprocessor.exe
UNIX: executable=TREXPreprocessor.x
. . .
6. Go to the Landscape Services window.
7. Select any TREX server running on the host in question. Choose Start New/Stop Removed Services@<hostname>(*) from the context menu.
8. Modify the pool size of all master and backup queue servers.
For information on calculating the pool size, see Preprocessor Threads and Queue Server Pool Size. For information on the procedure, see the section Configuring Master and Backup Hosts in
Setting Up Distributed Preprocessing.
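Steps 4 and 5 can be pictured as two changes to the daemon configuration: extending the programs parameter and copying an existing preprocessor section. The sketch below performs both on a small in-memory extract with Python's `configparser`. The function `add_preprocessor` is hypothetical and the extract contains only the executable parameter shown above. In practice you make these changes with Edit Services in the TREX admin tool.

```python
import configparser


def add_preprocessor(parser, new_number, template_number=1):
    """Add preprocessor<new_number> to programs and copy a template section."""
    new_name = f"preprocessor{new_number}"
    programs = [p.strip() for p in parser.get("daemon", "programs").split(",")]
    if new_name not in programs:
        programs.append(new_name)
        parser.set("daemon", "programs", ", ".join(programs))
    if not parser.has_section(new_name):
        # Copy every parameter of the existing section unchanged.
        parser.add_section(new_name)
        template = f"preprocessor{template_number}"
        for key, value in parser.items(template, raw=True):
            parser.set(new_name, key, value)
    return parser


# interpolation=None keeps values such as paths with "%" untouched.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string("""
[daemon]
programs = nameserver, preprocessor1

[preprocessor1]
executable = TREXPreprocessor.x
""")
add_preprocessor(parser, 2)
```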
Appendix
Information on Stopping/Starting Distributed Systems
There are no special rules to take into account when stopping a distributed system. You can stop TREX in any order on the individual hosts.
When you start a distributed system, the type of data storage dictates whether there is a defined sequence.
· If you are using centralized data storage, there is no special sequence.
· If you are using decentralized data storage, you must first start a master name server that was running just before the system was stopped. This ensures
that the system is based on an up-to-date topology file.
The hosts mytrexhost1, mytrexhost2, and mytrexhost3 are configured as master name servers. mytrexhost3 has not been operating for a while, which means
that its topology file is not up-to-date. Changes that have been made since (such as new indexes) are not known to this host.
You also stop TREX on the remaining hosts for maintenance reasons. If you now want to restart TREX, you have to start it on mytrexhost1 or mytrexhost2 first.
These master name servers have up-to-date topology files.
If you were to start TREX on mytrexhost3 first, the system would be based on an out-of-date topology file.
The master name servers compare their topology files at startup. If the files are different, the master name server saves the files as topology.<date>.old and
topology.<date>.new. This allows the correct topology to be restored even if the required start sequence is not observed.
If this happens in your system, contact SAP Support.
Starting the TREX Admin Tool
Prerequisites
On UNIX: Since the TREX admin tool has a graphical interface, you need an X server. You cannot use a terminal program that only supports text mode, such as
telnet.
Procedure
1. Log on with the user <sapsid>adm.
2. Carry out one of the following steps, depending on the operating system:
UNIX: Enter the following:
cd <TREX_DIR>
./TREXAdmin.sh
Windows: Choose Start → Programs or All Programs → SAP TREX → Instance <instance_number> → Tools → TREX Administration.
You can also start the TREX admin tool by double-clicking <TREX_DIR>\TREXAdmin.bat in Windows Explorer.
Configuring Queue Parameters
Use
The queue parameters control the interaction between the queue server and the index server. In particular, they specify when the queue server triggers indexing
and optimization of documents. It is important for performance reasons that you have optimum settings for the queue parameters.
When TREX creates a queue, it uses the default settings for the queue parameters. Depending on the document sets that you have to index initially and on the
type of documents you index, you may have to change the default settings.
The default settings that TREX uses for new queues are defined in the configuration file TREXQueueServer.ini. You can change the default settings. However,
you should only make changes to configuration files after consulting SAP Support or a consultant.
Prerequisites
You have already created indexes.
Procedure
You can change the queue parameters for existing queues as follows:
Tool | Path
TREX admin tool | Queue Admin → Queue Parameters
TREX monitor in the portal | System Administration → Monitoring → Knowledge Management → TREX Monitor → Edit Queue Parameters
TREX Admin Tool in the SAP System | Transaction TREXADMIN → Queue Admin → Set Queue Parameters
For more information about the meaning of the queue parameters, see the SAP Library at help.sap.com.
Changing Java Client Parameters
Use
You change the Java client parameters using the SAP J2EE Engine Visual Administrator Tool.
Procedure
1. Log on to the host on which the SAP J2EE Engine is running. Use the user <j2eeadm>.
2. Start the SAP J2EE Engine Visual Administrator Tool and log on to the SAP J2EE Engine.
For information on using this tool, see the SAP Library at help.sap.com.
3. Choose Cluster → Services → TREX Service.
4. Make the required changes.
5. Save your changes and confirm the restart of the service.
6. Repeat the last three steps for all other server processes of the cluster.
Advanced Configuration
Advanced configuration comprises the following areas:
● Language Recognition and Processing with TREX
TREX supports the indexing of documents that exist in different languages. When TREX is installed, you select the languages to be identified by language
recognition. You can retrospectively configure TREX to recognize additional languages.
● File Formats Supported by TREX
Documents whose content and attributes can be indexed and searched by TREX can exist in numerous different file formats. You can configure which file
formats you want to exclude from processing and which parts of XML and HTML files you want to exclude from indexing.
● Changing Proxy Server Settings
The TREX preprocessor can access documents on Web pages using a proxy server. You can configure the settings for the proxy server.
● Activating Python Extensions
Some TREX functions are implemented as Python extensions. If the application using TREX uses these functions, you have to activate the Python
extensions.
● Configuration of the TREX Services in the SAP J2EE Engine
The TREX Java client is implemented as a TREX service in the J2EE engine. You can use the Visual Administrator to configure TREX caches and the TREX
Java client.
● Delta Index Configuration
TREX provides the option of activating delta indexes. This allows you to update indexes faster and improve the performance of TREX.
● Changing the TREX Host Name (Single and Multiple-Host Installation)
You can change the name of the host on which you installed TREX later on, or you can install TREX with a virtual host name right from the start. You can do
this for both single-host and multiple-host installations.
● Configuration of the TREX Security Settings
You can configure secure communication between TREX and the application using it (for example, SAP Enterprise Portal or SAP Customer Relationship
Management).
Language Recognition and Processing with TREX
Use
Search and Classification (TREX) supports the indexing of documents in different languages. When TREX is installed, you select the languages to be identified by
language recognition. You can configure TREX to recognize additional languages later on (see Configuring Language Recognition).
Language processing takes place after the language recognition process. This involves generating terms that are significant as regards creating an index, and is
done using various text operations.
Integration
TREX can process all languages supported by SAP. However, the functionality differs depending on the language. For more information, see:
· Supported Languages
These languages are recognized by TREX and supported without restriction. You can use all TREX functions including search, retrieval, text-mining, and classification.
· Supported Languages with Restricted Functionality
These languages are recognized by TREX and supported with restrictions. Text-mining functions are particularly restricted.
· Languages that TREX Can Process
TREX cannot recognize these languages directly, but it can process them. This is done by mapping these languages to languages that TREX does support.
Language Recognition and Processing Function
Language Recognition and Processing Interaction
Language Recognition
Documents can exist in various languages and file formats. The TREX preprocessor converts the documents into UTF-8 encoded HTML so that they can be
processed by TREX. If there is no information on the document language, the preprocessor also carries out a language recognition process before processing the
document further. You can specify the languages to be recognized by the preprocessor in the configuration file std.langid-config both during the TREX installation
and later on. For more information, see Configuring Language Recognition. The language of a document is needed so that it can be placed in the correct language
version of the index.
Language recognition is based on statistical methods: Because the frequency of certain combinations of letters is a characteristic of a language, these
combinations can be used in order to identify it with a reasonable degree of probability. A frequency file exists for each of the languages supported by TREX. It
contains frequency ratings and weightings for letter combinations that are typical for the language in question. The TREX preprocessor checks the text document it
is identifying to see whether it contains these combinations. It is then assigned to the language to which it is most similar.
Because the language of documents with only a small amount of text cannot be reliably identified, TREX preprocessor language recognition is only activated if at
least 7 terms (default value) can be recognized per document. When the language has been identified the term recognition process, which takes place after the
language recognition process, can improve the number of terms recognized using user-specific dictionaries.
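The statistical principle can be illustrated with a small letter-trigram comparison. This is a generic sketch of frequency-based language identification and not the algorithm or the frequency files that TREX uses. The function names, the two sample sentences used as language profiles, and counting whitespace-separated words as terms are assumptions. Only the default threshold of 7 terms comes from the text.

```python
from collections import Counter

MIN_TERMS = 7  # default threshold mentioned in the text


def trigram_profile(text):
    """Relative frequencies of the letter trigrams in a text."""
    letters = "".join(ch.lower() if ch.isalpha() else " " for ch in text)
    padded = " ".join(letters.split())
    counts = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    total = sum(counts.values()) or 1
    return {gram: n / total for gram, n in counts.items()}


def similarity(doc_profile, lang_profile):
    """Overlap score: sum of the smaller frequency of each shared trigram."""
    return sum(min(freq, lang_profile[gram])
               for gram, freq in doc_profile.items() if gram in lang_profile)


def identify_language(text, language_profiles):
    """Return the most similar language, or None if the text is too short."""
    if len(text.split()) < MIN_TERMS:
        return None
    doc = trigram_profile(text)
    return max(language_profiles,
               key=lambda lang: similarity(doc, language_profiles[lang]))


profiles = {
    "en": trigram_profile("the quick brown fox jumps over the lazy dog and "
                          "the cat sits on the mat with the other animals"),
    "de": trigram_profile("der schnelle braune fuchs springt über den faulen "
                          "hund und die katze sitzt auf der matte mit den "
                          "anderen tieren"),
}
guess = identify_language(
    "the dog and the cat are sitting on the mat over there", profiles)
too_short = identify_language("the dog", profiles)
```

Real frequency files are built from large text collections. The one-sentence profiles here only serve to keep the sketch self-contained.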
Language Processing
Not all words that appear in a piece of document text are equally significant as regards representing that document in an index. This is why language processing
takes place after the language recognition process. This involves generating terms that are significant as regards creating an index, and is done using various text
operations.
Text Operations for Language Processing
· Tokenization: Determining words and sentence boundaries
· Normalization: Normalizing orthography
· Tagging: Determining word types
· Stemming: Reducing words to their stem form (for example, mice → mouse)
· Stop words: Eliminating frequent words (such as "and" and "or")
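The text operations listed above can be pictured as a small pipeline. The following is a rough sketch, not the lexicon software's implementation: the stop word set, the irregular stem table, and the naive plural rule are hypothetical, and tagging (determining word types) is omitted because it needs a real lexicon.

```python
import re

STOP_WORDS = {"and", "or", "the", "a", "of"}            # illustrative only
IRREGULAR_STEMS = {"mice": "mouse", "geese": "goose"}   # illustrative only


def tokenize(text: str) -> list[str]:
    """Split text into word tokens (runs of letters)."""
    return re.findall(r"[^\W\d_]+", text)


def normalize(token: str) -> str:
    """Normalize orthography; in this sketch that is only lowercasing."""
    return token.lower()


def stem(token: str) -> str:
    """Reduce a word to a stem using a lookup table and a naive plural rule."""
    if token in IRREGULAR_STEMS:
        return IRREGULAR_STEMS[token]
    if token.endswith("s") and len(token) > 3:
        return token[:-1]
    return token


def index_terms(text: str) -> list[str]:
    """Tokenize, normalize, drop stop words, then stem what remains."""
    tokens = (normalize(t) for t in tokenize(text))
    return [stem(t) for t in tokens if t not in STOP_WORDS]
```

For example, `index_terms("The mice and the cats")` yields `["mouse", "cat"]`: the stop words are dropped and the remaining words are reduced to their stems.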
Supported Languages
Use
TREX currently supports the following languages fully (May 2006/external software version 3.7.3):
● Arabic
● Chinese (simplified)
● Chinese (traditional)
● Danish
● German
● English
● Finnish
● French
● Dutch
● Italian
● Japanese
● Catalan
● Korean
● Croatian
● Norwegian (Bokmal)
● Norwegian (Nynorsk)
● Portuguese
● Russian
● Swedish
● Serbian
● Slovakian
● Slovenian
● Spanish
● Czech
You can find the most up-to-date information about the languages that TREX supports on SAP Service Marketplace service.sap.com/pam in the Platform
Availability Matrix (PAM).
For a detailed description of how to subsequently activate other languages, see Configuring Language Recognition.
Integration
The languages that TREX fully supports and the languages of the SAP applications using TREX (such as Knowledge Management in SAP Enterprise Portal) can
differ.
Supported Languages with Restricted Functionality
Use
TREX supports several other languages for which restrictions currently apply as regards TREX functionality. TREX currently supports the following additional
languages with restrictions (May 2006/external software version 3.7.3):
● Greek
● Hebrew
● Polish
● Romanian
● Thai
● Turkish
● Hungarian
You can find the most up-to-date information about the languages that TREX supports on SAP Service Marketplace service.sap.com/pam in the Platform
Availability Matrix (PAM).
For a detailed description of how to activate these additional languages, see Configuring Language Recognition.
Constraints
Certain restrictions apply because the linguistic processing development for these additional languages is still at a relatively early stage. TREX functions such as
search, attribute query, query-based classification, and other functions that rarely or never use text-mining functions work at the same level of quality as for fully
supported languages.
However, the linguistic text-mining functions sometimes deliver results of lower quality than is the case for the fully supported languages. Results of poor quality
can occur in the following areas:
● Feature extraction
Automatically calculated document and/or class features may be of poor quality.
● Example-based classification
When using this classification method, the elements may be classified less precisely.
● Linguistic search
Incomplete or unexpected grammatical variations of a search term may be returned in the search results list.
Languages that TREX Can Process
Use
TREX cannot recognize these languages directly, but it can process them. The SAP application using TREX sends documents to be processed to TREX and
delivers information on the document language at the same time. Using a list, the document language transmitted is mapped to a language to which it is related
and which TREX can process.
For example, the application using TREX sends TREX a document written in Bulgarian. Bulgarian is mapped to the related language Russian. The document is
indexed in a Bulgarian index but is processed linguistically as Russian.
TREX can currently process the following languages by mapping them to a language that it supports:
Document Language    Processing Language
Afrikaans            Dutch
Bulgarian            Russian
Estonian             Finnish
Indonesian           English
Icelandic            Danish
Latvian              Polish
Lithuanian           Polish
Malaysian            English
Norwegian            Norwegian (Bokmal)
Serbian (Latin)      Czech
Ukrainian            Russian
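The mapping in the table above amounts to a simple lookup. The sketch below is only an illustration of that idea; the function name `resolve_languages` and the fallback behavior for unmapped languages are assumptions, not part of TREX.

```python
# Document language -> processing language, copied from the table above.
PROCESSING_LANGUAGE = {
    "Afrikaans": "Dutch",
    "Bulgarian": "Russian",
    "Estonian": "Finnish",
    "Indonesian": "English",
    "Icelandic": "Danish",
    "Latvian": "Polish",
    "Lithuanian": "Polish",
    "Malaysian": "English",
    "Norwegian": "Norwegian (Bokmal)",
    "Serbian (Latin)": "Czech",
    "Ukrainian": "Russian",
}


def resolve_languages(document_language: str) -> tuple[str, str]:
    """Return (index language, processing language) for a document.

    The index keeps the document language; linguistic processing uses the
    related language. Languages without a mapping are assumed (hypothetically)
    to be processed as themselves.
    """
    processing = PROCESSING_LANGUAGE.get(document_language, document_language)
    return document_language, processing
```

For the Bulgarian example, `resolve_languages("Bulgarian")` returns `("Bulgarian", "Russian")`.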
Configuring Language Recognition
Use
Language recognition takes place first using the lexicon software of third-party providers and then using the TREX preprocessor. You can configure both types of
language recognition.
Naming Convention
● Central directory for executable files <CENTRAL_DIR>
○ On UNIX: usr/SAP/<SAPSID>/SYS/exe/nuc/<OS>
○ On Windows: <drive>:usr\SAP\<SAPSID>\SYS\exe\nuc\<OS>
As part of the CPE (Central Patch Environment), the sapcpe program takes on the automatic synchronization of executable files and copies them from the central
directory for executable files, <CENTRAL_DIR>, into the local directory for executable files, <TREX_DIR>\exe. When you restart TREX, the system automatically
launches the sapcpe program. During all subsequent starts, sapcpe checks whether or not the local executable files are up-to-date and copies new or changed
executable files from the central directory to the local directory, <TREX_DIR>\exe.
● TREX installation directory <TREX_INSTALL>
○ UNIX: /usr/sap/<sapsid>/trx<instance_number>/<TREX_host_name>
○ Windows: <disk_drive>:\usr\sap\<SAPSID>\
TRX<instance_number>\<trex_hostname>
Modifying Language Recognition with Lexicon Software
Language recognition with lexicon software includes the following areas:
● Configuring additional languages
You can retrospectively configure language recognition for additional languages. When TREX is installed, you select the languages to be identified by
language recognition.
Only activate the languages that appear in your documents and that you also want to index. Doing this optimizes the performance of the language recognition
procedure and of indexing in general. Moreover, the fewer languages used, the better the results of language recognition.
● Disregarding parts of HTML or XML documents
You can configure the system so that certain parts of HTML or XML documents are ignored when the language recognition procedure takes place.
Documents that are to be indexed and are in HTML or XML format often contain elements (such as JavaScript programs) that impair the performance of the
language recognition procedure.
● Changing the number of characters for language recognition
Language recognition using lexicon software is set up so that only a certain number of characters are taken into consideration. This is usually set to its
optimum value at delivery. If it turns out that languages are not being recognized correctly, you can increase the quantity of text that is taken into
consideration.
In certain cases, language recognition might not deliver the correct language. In particular, problems can occur when processing documents that are very short
or that contain a large number of abbreviations or words loaned from another language.
You modify the lexicon software language recognition by editing the std.langid-config configuration file on the TREX preprocessor. The settings are valid for all
indexes on the preprocessor. If you are using more than one TREX preprocessor, you need to modify the configuration file of each preprocessor.
1. Open the std.langid-config configuration file in the central directory for executable files, <CENTRAL_DIR>\lexicon, in a text editor.
2. In the section <encodings-languages-covered>, check the list of languages to be taken into consideration for the language recognition procedure. The list is
under <list key = "utf_8">.
Delete languages that you do not need, or comment them out using <!-- -->.
You can add more languages to the list as needed as long as the languages in question are supported by the language recognition service. The following list
shows which languages you can use, and gives the entry that you enter into the list for each language.
Languages supported by TREX
Language Entry
Chinese (simplified) <item key = "simplified-chinese" />
Chinese (traditional) <item key = "traditional-chinese" />
Danish <item key = "danish" />
German <item key = "german" />
English <item key = "english" />
Finnish <item key = "finnish" />
French <item key = "french" />
Dutch <item key = "dutch" />
Italian <item key = "italian" />
Japanese <item key = "japanese" />
Korean <item key = "korean" />
Norwegian (Bokmal) <item key = "bokmal" />
Norwegian (Nynorsk) <item key = "nynorsk" />
Portuguese <item key = "portuguese" />
Swedish <item key = "swedish" />
Spanish <item key = "spanish" />
Languages supported by TREX with limited functionality
Language Entry
Arabic <item key = "arabic" />
Greek <item key = "greek" />
Hebrew <item key = "hebrew" />
Polish <item key = "polish" />
Romanian <item key = "romanian" />
Russian <item key = "russian" />
Thai <item key = "thai" />
Czech <item key = "czech" />
Turkish <item key = "turkish" />
Hungarian <item key = "hungarian" />
Only limited text-mining functions are currently available for these additional languages. For more information about these languages, see Supported Languages
with Restricted Functionality.
In the following example, English, French, and German are taken into consideration. Italian is not.
<encodings-languages-covered>
<list key = "utf_8">
<item key = "english" />
<item key = "french" />
<item key = "german" />
<!-- item key = "italian" -->
...
All documents are converted to UTF-8 Unicode format before language recognition takes place. Therefore only the section <list key = "utf_8"> is relevant in
the language list. The lists for the other encodings do not need to be modified.
3. The section <remove-markup-content> contains a list of mark-up elements that are ignored when language recognition takes place. All text marked by these
elements is ignored.
The following example shows a section of the list.
<remove-markup-content>
<item key = "applet" />
<item key = "code" />
<item key = "script" />
...
<item key = "title" />
For example, if JavaScript programs (marked by <script>) occur in HTML documents, they are ignored when language recognition takes place.
If necessary, you can add more elements to the list, or remove existing elements from the list.
You have French documents that also contain a short summary in English. The summary is marked with the tag <English-Abstract>: <English-Abstract> This
is the abstract in English. ... </English-Abstract>. Add the line <item key = "english-abstract" /> to the list mentioned above.
On the other hand, if you want text marked with <title> to be taken into consideration for language recognition, you need to remove the line <item key = "title" />
from the list.
4. The section <detection-buffer-size> determines the quantity of text that is taken into consideration when a document is subjected to the language recognition
procedure. You can increase this value if you think that the quantity of text is too small.
However, this should only be done in exceptional circumstances. The larger the quantity of text, the longer language recognition, and therefore indexing, takes.
The value in the section <detection-buffer-size> cannot be greater than the value in the section <langid-buffer-size>.
5. Save the file and close the text editor.
6. Restart TREX.
For the changes to the std.langid-config configuration file in the <CENTRAL_DIR>\lexicon directory to take effect, you must restart TREX. When you restart
TREX, the sapcpe program copies the changed configuration files from the central directory for executable files, <CENTRAL_DIR>\lexicon, to the local TREX
directory, <TREX_DIR>\exe\lexicon, and overwrites the std.langid-config configuration file there.
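To double-check an edited configuration before restarting, the sections discussed in steps 2 to 4 can be read programmatically. The following is a sketch under stated assumptions: the sample text is a hypothetical excerpt, the wrapping `<config>` root element and the closing tags are invented so that the fragment parses, and the real std.langid-config file may be structured differently. Only the section names, the `utf_8` list key, and the item keys are taken from the steps above.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt: the <config> root element and closing tags are
# assumptions; the real file layout may differ.
SAMPLE_CONFIG = """
<config>
  <encodings-languages-covered>
    <list key = "utf_8">
      <item key = "english" />
      <item key = "french" />
      <item key = "german" />
      <!-- item key = "italian" -->
    </list>
  </encodings-languages-covered>
  <remove-markup-content>
    <item key = "script" />
    <item key = "title" />
  </remove-markup-content>
</config>
"""

ROOT = ET.fromstring(SAMPLE_CONFIG.strip())


def active_languages(root: ET.Element) -> list[str]:
    """Return the languages listed for UTF-8; commented-out items are skipped."""
    path = "./encodings-languages-covered/list[@key='utf_8']/item"
    return [item.get("key") for item in root.findall(path)]


def ignored_markup(root: ET.Element) -> list[str]:
    """Return the mark-up elements whose text is ignored during recognition."""
    return [item.get("key") for item in root.findall("./remove-markup-content/item")]


def buffer_sizes_valid(detection_buffer_size: int, langid_buffer_size: int) -> bool:
    """Check the rule from step 4: detection size must not exceed langid size."""
    return detection_buffer_size <= langid_buffer_size
```

Because the XML parser discards comments, a language that has been commented out (Italian in the sample) does not appear in the result of `active_languages(ROOT)`.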
You can also use the TREX admin tool (stand-alone), area Landscape → Ini, to change the std.langid-config configuration file and then have the changes take
effect by restarting the TREX preprocessor. Note that only the file in the <TREX_DIR>\exe\lexicon directory is changed if you use this method. If you have
changed the std.langid-config configuration file in the central directory, <CENTRAL_DIR>\lexicon, as described above and restarted TREX, the system
overwrites the changed file in the local directory, <TREX_DIR>\exe\lexicon, during the automatic synchronization by the CPE and the changes are lost.
Modifying Language Recognition with the TREX Preprocessor
The language of documents with only a small amount of text cannot be reliably identified. Therefore, TREX preprocessor language recognition is only activated if at
least seven terms (default value) can be recognized for each document. You can change this value if you are using TREX in a scenario with very short sentences.
You make modifications for TREX preprocessor language recognition in the TREXPreprocessor.ini configuration file.
1. Open the <TREX_INSTALL>\TREXPreprocessor.ini configuration file with a text editor.
2. In the section [lexicon], change the min_valid_tokens parameter.
The default value of this parameter is 7. Choose a lower value if you want the TREX preprocessor to try to identify the language of documents with fewer terms
per document.
3. Restart the TREX preprocessor.
You need to stop and restart the preprocessor for the new settings to take effect. You do this with the TREX admin tool (stand-alone), using the function for
starting and stopping the TREX servers. Note that the TREX daemon automatically restarts the server after it has been stopped. The settings are valid for all
documents indexed after the TREX preprocessor is restarted.
The new settings do not affect documents that have already been indexed. This means that if, for example, a document that has already been indexed has
been assigned to the wrong language, it must be reindexed.
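The change in step 2 can be sketched with Python's standard `configparser` module. This is a hedged illustration only: the sample text is a hypothetical two-line excerpt, and it is an assumption that the real TREXPreprocessor.ini can be parsed this way. Only the section name `[lexicon]`, the parameter `min_valid_tokens`, and its default of 7 come from the procedure above.

```python
import configparser
import io

# Hypothetical excerpt of TREXPreprocessor.ini (assumption).
SAMPLE_INI = """
[lexicon]
min_valid_tokens = 7
"""


def set_min_valid_tokens(ini_text: str, value: int) -> str:
    """Return the INI text with [lexicon] min_valid_tokens set to value."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    if not parser.has_section("lexicon"):
        parser.add_section("lexicon")
    parser.set("lexicon", "min_valid_tokens", str(value))
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


def language_recognition_active(recognized_terms: int, min_valid_tokens: int = 7) -> bool:
    """Recognition is only attempted if enough terms were recognized."""
    return recognized_terms >= min_valid_tokens
```

Note that `configparser` drops comments when it rewrites a file, so editing the real file by hand in a text editor, as the procedure describes, remains the safer route.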
File Formats Supported by TREX
Use
Documents whose content and attributes can be indexed and searched by TREX can exist in numerous file formats. The TREX preprocessor converts
the document text and attributes of the different file formats into UTF-8 encoded HTML. The file filters of special filter software make it possible to index and
search all prevalent file formats, such as MS Word, MS PowerPoint, PDF, and HTML.
Features
The table below lists all file formats that are currently supported by TREX.
Supported File Formats (May 2006/Version 8.1 of Filter Software)
File formats for text processing – generic Versions
ASCII Text (7 & 8 bit versions available) All versions
ANSI Text (7 & 8 bit) All versions
EBCDIC (Extended Binary Coded Decimal Interchange Code) All versions
HTML Versions up to and including 3.0
IBM Revisable Form Text All versions
IBM FFT All versions
Microsoft Rich Text Format (RTF) All versions
MHTML (MIME Encapsulation of Aggregate HTML Documents) No specific version
Text Mail (MIME) No specific version
Unicode Text All versions
UUEncode
WML Compatible with WML specification 5.2
XML No specific version
Special Features of HTML Files and XML Files
TREX processes HTML files and XML files without filtering, because the conversion to HTML is not necessary. In principle, the lexicon software integrated in
TREX ignores the text of the mark-up elements of the actual HTML and XML code, which is located between the tag brackets (<...>). In this way, texts such as
“font size”, “color”, and so on within the tag <font size="7" color="#FF0000"> are not passed on for indexing, because this information occurs in many HTML files
and thus is not characteristic for the respective document content.
Using the mark-up elements, you can configure which texts within HTML and XML documents should not be indexed. For example, this makes sense in the case
of JavaScript program code, which is marked in HTML by the tags <script type="text/javascript"...> ... </script>. The JavaScript program code itself does not
contain any characteristic content for the document in question and can thus be ignored.
For more information, see Excluding Parts of XML and HTML Files From Indexing.
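The two behaviors described in this section, dropping the text inside the tag brackets and skipping the content of configured mark-up elements, can be sketched with Python's standard HTML parser. This is an illustration of the idea, not the lexicon software's implementation; the class name `VisibleTextExtractor`, the function `indexable_text`, and its default list of ignored elements are hypothetical.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text outside of tags, skipping configured mark-up elements."""

    def __init__(self, ignored_elements):
        super().__init__(convert_charrefs=True)
        self.ignored = {name.lower() for name in ignored_elements}
        self.depth = 0      # > 0 while inside an ignored element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Attribute text such as size="7" is never collected.
        if tag in self.ignored:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.ignored and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())


def indexable_text(html: str, ignored_elements=("script",)) -> str:
    """Return the text that would remain for indexing in this sketch."""
    parser = VisibleTextExtractor(ignored_elements)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Passing a custom element name, for example `("english-abstract",)`, mirrors the idea of adding further elements to the list of ignored mark-up.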
File formats for text processing - DOS Versions
DEC WPS Plus (DX) Versions up to and including 4.0
DEC WPS Plus (WPL) Versions up to and including 4.1
DisplayWrite 2 & 3 (TXT) All versions
DisplayWrite 4 & 5 Versions up to and including Release 2.0
Enable Versions 3.0, 4.0, and 4.5
First Choice Versions up to and including 3.0
Framework Version 3.0
IBM Writing Assistant Version 1.01
Lotus Manuscript Versions up to and including 2.0
MASS11 Versions up to and including 8.0
Microsoft Word Versions up to and including 6.0
Microsoft Works Versions up to and including 2.0
MultiMate Versions up to and including 4.0
Navy DIF All versions
Nota Bene Version 3.0
Novell WordPerfect Versions up to and including 6.1
Office Writer Versions 4.0 to 6.0
PC-File Letter Versions up to and including 5.0
PC-File+ Letter Versions up to and including 3.0
PFS:Write Versions A, B, and C
Professional Write Versions up to and including 2.1
Q&A Version 2.0
Samna Word Versions up to and including Samna Word IV+
SmartWare II Version 1.02
Sprint Version 1.0
Total Word Version 1.2
Volkswriter 3 & 4 Versions up to and including 1.0
Wang PC (IWP) Versions up to and including 2.6
WordMARC Versions up to and including Composer Plus
WordStar Versions up to and including 7.0
WordStar 2000 (DOS) Versions up to and including 3.0
XyWrite Versions up to and including III Plus
File formats for text processing - Windows Versions
Adobe FrameMaker (MIF) Up to and including version 6.0
Corel/Novell WordPerfect for Windows Versions up to and including 10
Corel WordPerfect Suite for Windows Version 12.0
Hangul Version 97, 2002 (text only)
JustSystems Ichitaro Versions 5.0, 6.0, 8.0, 9.0, 10.0, 13.0, and 2004
JustWrite Versions up to and including 3.0
Legacy Versions up to and including 1.1
Lotus AMI/AMI Professional Versions up to and including 3.1
Lotus Word Pro (non-Windows) Version 96 -- Millennium Edition 9.6, text only
Lotus Word Pro (non-Windows)
Microsoft Works for Windows Versions up to and including 4.0
Microsoft Windows Write Versions up to and including 3.0
Microsoft Word for Windows Versions up to and including 2003
Microsoft WordPad All versions
Novell Perfect Works Version 2.0
Professional Write Plus Version 1.0
Q&A Write for Windows Version 3.0
StarOffice Writer for Windows and UNIX Version 5.2, 6.X, 7.X; text only
OpenOffice Version 1.1
WordStar for Windows Version 1.0
File formats for text processing - Macintosh Versions
MacWrite II Version 1.1
Microsoft Word for Mac Versions 3.0 - 4.0, 98, 2001, 2004, and v.X
Microsoft Works for Mac Versions up to and including 2.0
Novell WordPerfect Version 1.02 up to and including 3.0
Table Calculation Formats Versions
Enable Versions 3.0, 4.0, and 4.5
First Choice Versions up to and including 3.0
Framework Version 3.0
Lotus 1-2-3 (DOS & Windows) Versions up to and including 5.0
Lotus 1-2-3 (OS/2) Versions up to and including 2.0
Lotus 1-2-3 Charts (DOS & Windows) Versions up to and including 5.0
Lotus 1-2-3 for SmartSuite SmartSuite 97, Millennium and Millennium 9.6
Lotus Symphony Versions 1.0, 1.1, and 2.0
Microsoft Excel Charts Versions 2.x - 7.0
Microsoft Excel Macintosh Versions 3.0 – 98, 2004, and v.X
Microsoft Excel Windows Version 2.2 up to and including 2003
Microsoft Multiplan Version 4.0
Microsoft Works (DOS) Versions up to and including 2.0
Microsoft Works (Mac) Versions up to and including 2.0
Microsoft Works for Windows Versions up to and including 4.0
Mosaic Twin Version 2.5
Novell Perfect Works Version 2.0
PFS:Professional Plan Version 1.0
QuattroPro for DOS Versions up to and including 5.0
QuattroPro for Windows Versions up to and including version 12
SmartWare II Version 1.02
StarOffice Calc for Windows and UNIX Version 5.2, 6.X, 7.X; text only
OpenOffice Version 1.1
SuperCalc 5 Version 4.0
VP Planner 3D Version 1.0
Database Formats Versions
Access Versions up to and including 2.0
dBASE Versions up to and including 5.0
DataEase Version 4.x
dBXL Version 1.3
Enable Versions 3.0, 4.0, and 4.5
First Choice Versions up to and including 3.0
FoxBase Version 2.1
Framework Version 3.0
Microsoft Works (DOS) Versions up to and including 2.0
Microsoft Works (Mac) Versions up to and including 2.0
Microsoft Works for Windows Versions up to and including 4.0
Paradox (DOS) Versions up to and including 4.0
Paradox (Windows) Versions up to and including 1.0
Personal R:BASE Version 1.0
R:BASE 5000 Versions up to and including 3.1
R:BASE System V Version 1.0
Reflex Version 2.0
Q & A Versions up to and including 2.0
SmartWare II Version 1.02
Presentation Formats Versions
Corel/Novell Presentations Versions up to and including 12
Harvard Graphics for DOS Versions 2.x & 3.x
Harvard Graphics for Windows Windows versions
Freelance for Windows Versions up to and including Millennium Edition 9.6
Freelance for OS/2 Versions up to and including 2.0
Microsoft PowerPoint for Macintosh Versions 4.0 up to and including 2004 and v.X
Microsoft PowerPoint for Windows Versions 3.0 up to and including 2003
StarOffice Impress for Windows and UNIX Versions 5.2 (text only), 6.X - 7.X (full support)
OpenOffice Version 1.1 (text only)
Graphic Formats Versions
In most cases, only the graphic type and name of the file is displayed for
graphic formats. Only maintained properties are indexed for some graphic
formats. Text inside graphics cannot be indexed.
Adobe FrameMaker Graphics (FMV) Version 5.0
Adobe Illustrator Versions up to and including 9.0
Adobe Photoshop (PSD) Version 4.0
Adobe Portable Document Format (PDF) Versions up to and including 6.0 (including PDF 1.5)
Text inside PDF documents can normally be indexed. Text inside
graphics cannot be indexed. Some postscript fonts for text inside PDFs
cannot be indexed.
For more information, see SAP Note 622419: Embedded Fonts in PDF and
Postscript Documents.
AmiDraw (SDW) Ami Draw
AutoCAD Interchange and Native Drawing Formats (DXF and DWG) V. 2.5 - 2.6, 9.0 - 14.0, 2000i - 2002
AutoShade Rendering (RND) Version 2.0
Binary Group 3 Fax All versions
Bitmap (BMP, RLE, ICO, CUR, OS/2, DIB & WARP) Windows
CALS Raster (GP4) Type I and Type II
Corel Clipart format (CMX) Versions 5 - 6
Corel Draw (CDR) Versions 3.0 - 8.0
Corel Draw (CDR with TIFF header) Versions 2.0 - 9.0
Computer Graphics Metafile (CGM) ANSI, CALS NIST version 3.0
Encapsulated PostScript (EPS) TIFF header only
GEM Paint (IMG) All versions
Graphics Environment Mgr. (GEM) Bitmap & Vector
Graphics Interchange Format (GIF) All versions
Hewlett Packard Graphics Language (HPGL) Version 2
IBM Graphics Data Format (GDF) Version 1.0
IBM Picture Interchange Format (PIF) Version 1.0
Initial Graphics Exchange Spec (IGES) Version 5.1
JBIG2 (Joint Bi-level Image Experts Group) JBIG2 graphic embeddings in PDF
JFIF (JPEG not in TIFF format) All versions
JPEG (incl. EXIF) All versions
Kodak Flash Pix (FPX) All versions
Kodak Photo CD (PCD) Version 1.0
Lotus PIC All versions
Lotus Snapshot All versions
Macintosh PICT1 & PICT2 Bitmap only
MacPaint (PNTG) No specific version
Macromedia Flash Macromedia Flash 6.x and 7.x, and Macromedia Flash Lite
Micrografx Draw (DRW) Versions up to and including 4.0
Micrografx Designer (DW) Versions up to and including 3.1
Micrografx Designer (DSF) Windows 95, Version 6.0
Novell PerfectWorks (Draw) Version 2.0
OS/2 Bitmap All versions
OS/2 PM Metafile (MET) Version 3.0
Paint Shop Pro (PSP) Versions 5.0 and 5.01
Paint Shop Pro 6 (PSP) Win32 only
PC Paintbrush (PCX and DCX) No specific version
Portable Bitmap (PBM) All versions
Portable Graymap (PGM) No specific version
Portable Network Graphics (PNG) Version 1.0
Portable Pixmap (PPM) No specific version
Postscript (PS) Levels 1 - 2
Progressive JPEG No specific version
StarOffice Draw for Windows and UNIX Versions 2, 6.x, 7.x
Sun Raster (SRS) No specific version
TIFF Versions up to and including 6
TIFF CCITT Group 3 & 4 Versions up to and including 6
Truevision TGA (TARGA) Version 2
Visio (Preview) Version 4
Visio Versions 5, 2000, 2002, and 2003
WBMP No specific version
Windows Enhanced Metafile (EMF) No specific version
Windows Metafile (WMF) No specific version
WordPerfect Graphics (WPG & WPG2) Versions up to and including 2.0
X-Windows Bitmap (XBM) x10 compatible
X-Windows Dump (XDM) x10 compatible
X-Windows Pixmap (XPM) x10 compatible
Compressed File Formats Versions
GZIP No specific version
LZA Self Extracting Compress No specific version
LZH Compress No specific version
Microsoft Binder Versions 7.0-97
MIME-encoded mail messages No specific version
UNIX Compress No specific version
UNIX TAR No specific version
ZIP PKWARE versions up to and including 2.04g
Special Features of Compressed File Formats (Archives)
The document content of files that are contained in an archive can only be indexed if TREX knows the file format of the files in question. The system uses the filter
software to identify the type of files in the archive and filters the file content according to the file type identified. All files in an archive are handled as one large
document.
The filter software may sometimes assign files in an archive whose type it does not recognize to the wrong file type and filter them as such. For example,
binary files (*.bin) whose content was filtered by accident and then indexed fill the index with a large number of terms that make no sense.
You can respond to this issue in two ways:
1. You can exclude compressed file formats (archives) from processing by the preprocessor by removing the corresponding MIME type (for example,
application/zip) from the TREXValidMimeTypes.ini configuration file.
For more information about this procedure, see Excluding File Formats from Processing.
2. You can modify the filter software configuration file, default.tpt, in such a way that the names, but not the content, of the files that the archive contains are
indexed.
For more information about this procedure, see SAP Note 900742.
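The effect of the second option, indexing the names of archived files but not their content, can be pictured with the following sketch. It does not show how the filter software or default.tpt work; it only uses Python's standard `zipfile` module on an in-memory sample archive, and both function names are hypothetical.

```python
import io
import zipfile


def build_sample_archive() -> bytes:
    """Create a small in-memory ZIP with one text file and one binary file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("report.txt", "Quarterly figures")
        archive.writestr("firmware.bin", bytes(range(256)))
    return buffer.getvalue()


def archive_member_names(data: bytes) -> list[str]:
    """Return only the file names in the archive, never the file content."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.namelist()
```

Indexing only these names keeps the meaningless terms of a misidentified binary file such as `firmware.bin` out of the index while still making the archive findable by its members.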
Other File Formats Versions
Executables (EXE, DLL) No specific version
Executables for Windows NT No specific version
Microsoft Office 2003 for Windows Version 2003
Microsoft Outlook Message (MSG) Text and HTML; codepage CP1252 (ISO 8859-1) and Unicode
Microsoft Project Versions 2003/2002/2000/1998, text only
MP3 ID3 (Identify an MP3) Information
Signature Version 1.0
vCalendar No specific version
vCard Electronic Business Card Version 2.1
Yahoo! Instant Messenger Versions 6.x and 7.x
Excluding File Formats from Processing
Use
You can use a list of MIME types in the configuration file TREXValidMimeTypes.ini to control which file formats are to be processed by TREX. MIME types for
graphic formats such as image/jpeg, image/gif, and image/bmp are not listed in the configuration file although these formats are supported by the filter software
integrated into TREX (see Supported File Formats). This exclusion prevents TREX from being unnecessarily burdened by the processing of these formats, since
it is not normally sensible to index images and graphics. There may be other scenarios where it makes sense to exclude certain file formats.
A company archives its financial statements in the form of PDF files. These files contain mostly figures, with hardly any relevant text information. The
processing of these large files would unnecessarily hamper the performance of TREX but not simplify the indexing of the content. It therefore makes sense to
exclude these files from processing.
Procedure
You exclude the document content of a particular file format from being processed by TREX by removing the corresponding MIME types from the configuration file
TREXValidMimeTypes.ini. Proceed as follows to do this.
1. Stop TREX.
2. Open the configuration file <TREX_installation_directory>\TREXValidMimeTypes.ini with a text editor.
The configuration file TREXValidMimeTypes.ini is located in the TREX installation directory. The path to the directory is:
○ On UNIX: /usr/sap/trex_<instance_number>
○ On Windows: <disk_drive>:\usr\sap\trex_<instance_number>
3. Remove the entry for the file format that you want to exclude from the list.
You do not want TREX to process PDF files because such files contain no relevant text information for your scenario. You remove the entry application/pdf from
the list of MIME types in the configuration file TREXValidMimeTypes.ini.
4. Save the file.
5. Start TREX.
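The effect of step 3 can be sketched as a filter over a list of MIME types. The layout of TREXValidMimeTypes.ini is not shown in this document, so the sketch assumes, purely for illustration, one MIME type per line; the function names are hypothetical.

```python
def remove_mime_type(lines: list[str], mime_type: str) -> list[str]:
    """Return the entries without the given MIME type (case-insensitive)."""
    target = mime_type.strip().lower()
    return [line for line in lines if line.strip().lower() != target]


def is_processed(mime_type: str, valid_mime_types: list[str]) -> bool:
    """A document is processed only if its MIME type is still listed."""
    listed = {entry.strip().lower() for entry in valid_mime_types}
    return mime_type.strip().lower() in listed


# Assumed one-entry-per-line excerpt of the configuration (hypothetical).
SAMPLE_ENTRIES = ["application/msword", "application/pdf", "text/html"]
AFTER_REMOVAL = remove_mime_type(SAMPLE_ENTRIES, "application/pdf")
```

After the removal, PDF files are no longer processed, and image formats such as image/jpeg are skipped as well because they were never listed.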
List of MIME Types in the Configuration File TREXValidMimeTypes.ini
MIME Type File Extension Application
application/andrew-inset ec
application/dca-rft rft IBM Revisable Form Text
application/excel xls MS EXCEL
application/macwriteii MWII MacWrite II
application/msword doc,dot MS Word
application/oda oda CALS Raster (GP4)
application/pdf pdf Adobe PDF
application/powerpoint ppt MS Powerpoint
application/rtf rtf Rich Text Format
application/smil smil, smi
application/vnd.lotus-1-2-3 123, w4, w3, w1 Lotus 1-2-3
application/vnd.lotus-freelance prz, pre Lotus Freelance
application/vnd.lotus-wordpro lwp, sam Lotus WordPro
application/vnd.ms-excel xls, xlb MS EXCEL
application/vnd.ms-powerpoint ppt, pps, pot MS PowerPoint
application/vnd.ms-wpl wpl DEC WPS Plus (WPL)
application/wordperfect5.1 wp5 Word Perfect 5.1
application/x-123 w1, wk3, wk4, wks Lotus 1-2-3 (DOS & Windows)
application/x-cdlink vcd
application/x-chess-pgn pgn
application/x-compress UNIX compress
application/x-csh csh UNIX CShell Script
application/x-dvi dvi
application/x-freelance pre Freelance for Windows
application/x-gtar gtar GNU UNIX tar archive
application/x-gzip gz, tgz GNU Zip compressed data
application/x-httpd-php
application/x-javascript js JavaScript
application/x-latex latex LaTex
application/x-maker frm, maker, frame, rm, fb, book, fbdoc Adobe FrameMaker
application/x-mif mif Adobe FrameMaker (MIF)
application/x-msdos-program dll Dynamic Link Library
application/x-msexcel xls, xlb MS EXCEL
application/x-msmetafile wmf MS Metafile
application/x-netcdf nc, cdf
application/x-ns-proxy-autoconfig pac Netscape Proxy Auto Config
application/x-perl pl, pm Perl Program
application/x-sh sh UNIX Bourne Shell Script
application/x-tar tar UNIX tar Archive
application/x-tcl tcl TCL Script
application/x-tex tex
application/x-texinfo texinfo, texi
application/x-troff t, tr, troff UNIX troff document
application/x-troff-man man UNIX man page
application/x-troff-me me UNIX troff document
application/x-troff-ms ms UNIX troff document
application/x-ustar ustar
application/x-wais-source src
application/xlc xlc
application/zip zip
File formats of the MIME types text/*, including HTML, XML, and plain text formats such as *.txt and *.rtf, are processed by TREX without being filtered.
text/asp asp Active Server Pages
text/css css Cascading Style Sheets
text/html html, htm, shtml Hypertext Markup Language
text/plain txt, c, ec, cpp, h, hpp, eml, sap
text/richtext rtx
text/rtf rtf
text/src-c c
text/src-c++ cpp
text/src-java java
text/src-perl perl
text/src-tcl tcl
text/tab-separated-values tsv
text/thtml
text/vnd.wap.wml wml
text/wiki
text/wml wml
text/x-asm
text/x-setext
text/x-sgml
text/x-ssi-html
text/x-uil
text/x-uuencode
text/x-vCalendar
text/x-vCard
text/xml xml Extensible Markup Language
Excluding Parts of XML and HTML Files From Indexing
Use
XML (Extensible Markup Language) and HTML (Hypertext Markup Language) are mark-up languages, which structure and display the text in a
document using mark-up elements. In the <TREX-installation_directory>\Lexicon\std.html-config file, you can use these mark-up elements to define
which texts within HTML and XML documents should not be indexed.
This makes sense in the following cases:
· Excluding technical information from indexing
For example, you can exclude JavaScript program code from indexing. In HTML, this code is marked by the tags <script
type="text/javascript" ...> ... </script>. It does not contain any content that is characteristic of the
respective document and can therefore be ignored during the indexing run.
· Excluding redundant text parts from indexing
You can exclude text parts from indexing if they are identical in more than one XML or HTML file and thus do not contain any information about the respective
document content.
Excluding Technical Information From Indexing
1. Open the <TREX_Installation_Directory>\Lexicon\std.html-config configuration file with a text editor.
You must change entries in the sections <remove-region> and <multimedia-markup> in the std.html-config file. In each of these sections, you can find a list of
mark-up elements for XML or HTML code. The texts that are marked by these elements in the XML or HTML file are not taken into account during indexing. In
the case of HTML, these are mark-up elements that contain technical information about processing and displaying HTML files.
The following examples each contain an extract from these lists:
○ <remove-region>
<item key = "applet" />
<item key = "code" />
<item key = "script" />
...
<item key = "title" />
○ <multimedia-markup>
<item key = "applet" />
<item key = "code" />
<item key = "script" />
<item key = "server" />
...
<item key = "title" />
Special features of XML files
The selection of mark-up elements in the std.html-config configuration file is based on the fact that all HTML language elements are standardized and
defined at an international level. For HTML, this guarantees that the mark-up elements listed (<applet>, <script>, <code>, and so on) contain only
technical information and not text that is relevant to the document content.
However, in XML you can use a DTD (Document Type Definition) or an XML schema to define your own XML language elements. Their names can be
identical to those of HTML language elements, and they can contain text that is relevant to the document content. If you are indexing XML files, you
therefore need to check whether any of your elements have the same names as elements in the lists, and remove the affected elements from the lists so that TREX processes their text.
2. Remove an element from the list or add an element to the list:
○ Remove an element from the list if you want the system to index the text that is marked by this element.
You do this, for example, by deleting the line <item key = "applet" /> from the list.
○ Add an element to the list if you do not want the system to index the text that is marked by this element.
You do this by adding the line <item key = "Markup_Element" /> to the above list. In doing this, you replace Markup_Element with the element that you want to
exclude from processing.
Note that the list in the std.html-config configuration file contains certain default elements that are not taken into account during indexing.
3. Save the file and close the text editor.
4. Stop the TREX preprocessor and restart it, so that the new settings take effect. You start and stop the preprocessor using the function for starting and stopping
the TREX servers in the TREX admin tool (stand-alone).
Note that the TREX daemon automatically restarts the server after it has been stopped. The settings are valid for all documents indexed after the TREX
preprocessor is restarted. The new settings do not affect documents that have already been indexed.
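The effect of the <remove-region> list can be illustrated with a small sketch. This is not how the TREX preprocessor is implemented; it only shows, under that assumption, what "not taken into account during indexing" means for the text inside listed elements. The class and function names are hypothetical, and the element set is just the subset shown in the extract above.

```python
from html.parser import HTMLParser

# Subset of the <remove-region> entries shown in the extract above.
REMOVE_REGION = {"applet", "code", "script", "title"}


class IndexableText(HTMLParser):
    """Collects text that lies outside all excluded elements (illustration only)."""

    def __init__(self, remove_region):
        super().__init__()
        self.remove_region = remove_region
        self.depth = 0      # greater than 0 while inside an excluded element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.remove_region:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.remove_region and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth == 0 and text:
            self.chunks.append(text)


def indexable_text(html, remove_region=REMOVE_REGION):
    """Return the text that would remain after dropping the excluded regions."""
    parser = IndexableText(set(remove_region))
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Calling indexable_text() on a page with a <script> block returns only the body text; adding a further element name to the set excludes its text as well.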
Exclude Redundant Text Parts From Indexing
To exclude redundant text parts from indexing, proceed as follows:
1. Flag these text parts within the XML or HTML code in the relevant XML or HTML documents using a dedicated mark-up element (for example, <trexignore> ...
</trexignore>).
Note that, in the case of XML files, you must define the new mark-up element in the associated DTD or XML schema; otherwise the XML document is not
valid against that DTD or schema. In the case of HTML, the new mark-up element is ignored by the browser when displaying the document, because it is not part of the HTML standard.
2. Add the newly defined mark-up element (for example, <trexignore> ... </trexignore>) to the two sections <remove-region> and <multimedia-markup> in the
std.html-config file as <item key = "trexignore" />, as described in the procedure Excluding Technical Information From Indexing (see above).
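If many installations need the same entry, step 2 could be scripted. The following is a minimal sketch that assumes the two sections are well-formed XML elements inside a single root element; the source shows only extracts of std.html-config, so the <config> root and the sample content are hypothetical. Work on a copy of the file and check the result by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for std.html-config; the real file's overall
# structure is not shown in this documentation.
SAMPLE = """<config>
  <remove-region>
    <item key="applet" />
    <item key="script" />
  </remove-region>
  <multimedia-markup>
    <item key="applet" />
    <item key="script" />
  </multimedia-markup>
</config>"""


def add_excluded_element(xml_text, element_name,
                         sections=("remove-region", "multimedia-markup")):
    """Add <item key="element_name" /> to each section unless already present."""
    root = ET.fromstring(xml_text)
    for section_name in sections:
        section = next(root.iter(section_name), None)
        if section is None:
            raise ValueError("section not found: " + section_name)
        keys = {item.get("key") for item in section.findall("item")}
        if element_name not in keys:
            ET.SubElement(section, "item", {"key": element_name})
    return ET.tostring(root, encoding="unicode")
```

The function is idempotent: running it twice with the same element name does not create duplicate entries.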
Changing Proxy Server Settings
Use
The TREX preprocessor prepares documents for indexing by the TREX engines. The application using TREX (for example, Content Management in SAP
Enterprise Portal) transmits the documents to be indexed to the preprocessor in the form of URIs that reference the storage location of the documents in question.
The preprocessor resolves these URIs and collects the actual documents from a Web server using HTTP.
Access to Web pages can take place using a proxy server, regardless of whether the pages are on the Internet or on an intranet. If you want to index documents
that can only be accessed using a proxy server, you have to register the proxy server with the TREX preprocessor.
There might also be documents in your environment that can be accessed without a proxy server, for example, documents on local servers or your enterprise’s
external homepage. You can inform the preprocessor of the servers it can access without a proxy server. This speeds up the processing of documents on these
servers.
You specified settings for the proxy server when you installed TREX. If you want to change this later on, modify the TREXPreprocessor.ini configuration file on the
server on which the TREX preprocessor is running.
The graphic below shows a portal scenario. Some of the documents to be indexed are located on servers on the intranet, others on servers on the Internet. The
documents on the Internet can only be reached using a proxy server. The proxy server is not needed for documents on the intranet.
Enter the proxy server into the section [httpclient] in the configuration file TREXPreprocessor.ini so that TREX can load external documents. Enter exclusion
rules for internal documents into the section [proxyrules].
Procedure
1. Open the configuration file <TREX_Installation_Directory>\TrexPreprocessor.ini on the server on which the TREX preprocessor is running. Use a text editor to
do this.
2. Modify the following parameters in the [httpclient] and [proxyrules] sections:
[httpclient]
proxyhost=<name_of_proxy> (hostname and domain of the proxy server)
Example: proxy.mylocation.mycompany.com
proxyport=<proxy port>
Example: 8080
proxyuser=<user_for_the_proxy> (optional)
You only need to fill in the proxyuser line if a user ID is needed to access the proxy server.
proxypassword=<password_for_user> (optional)
You only need to fill in the proxypassword line if a password is also needed for the user ID.
You can specify the password for the proxy user during the installation of TREX. You can use a script to change this password later on or to define a password
if you did not enter one when installing TREX. For more information, see Configuring TREX Security Settings → Specifying the Password for the Proxy Server.
The list of parameters must not contain empty lines. Keep to the format outlined above. The system distinguishes between lowercase and uppercase.
[proxyrules]
Specify the addresses for which the proxy server is not to be used. You normally enter one or more character strings with which the addresses in your intranet
end.
Example: mycompany.com or mylocation.mycompany.com
Do not use the asterisk (*) as a placeholder. Lines that begin with # or ! are treated as comments and are therefore ignored. This is also true for IP addresses.
To exclude the IP address space 10.10.0.0-10.10.255.255, add the line 10.10. to the [proxyrules] section. This ensures that no proxy is used for URLs that
contain IP addresses in this space.
3. Save the file and close the text editor.
4. You have to stop and restart the TREX preprocessor so that it recognizes the changes to the configuration file TREXPreprocessor.ini. You do this using the
TREX admin tool (stand-alone). For more information, see Starting and Stopping the TREX Servers. Note that the TREX daemon automatically restarts the servers
after they have been stopped.
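The exclusion rules can be sanity-checked before restarting the preprocessor. The sketch below reads a configuration text in the format described above and decides whether a host would go through the proxy. It is an illustration under assumptions, not the preprocessor's own logic: host names are matched by their ending and IP addresses by their beginning, as the text suggests, and the function names and sample content are hypothetical.

```python
import configparser

# Hypothetical sample in the format described above (no empty lines).
SAMPLE_INI = """\
[httpclient]
proxyhost=proxy.mylocation.mycompany.com
proxyport=8080
[proxyrules]
# intranet hosts
mycompany.com
! private address space
10.10.
"""


def load_proxy_settings(text):
    """Return the [httpclient] settings and the list of [proxyrules] entries."""
    parser = configparser.ConfigParser(
        allow_no_value=True,           # rule lines have no "=value" part
        comment_prefixes=("#", "!"),   # both prefixes mark comment lines
        delimiters=("=",),
        interpolation=None,
    )
    parser.optionxform = str           # keep case; the file is case-sensitive
    parser.read_string(text)
    return dict(parser["httpclient"]), list(parser["proxyrules"].keys())


def is_ip(host):
    parts = host.split(".")
    return len(parts) == 4 and all(p.isdigit() for p in parts)


def needs_proxy(host, rules):
    """Assumed matching: IP addresses by prefix, host names by suffix."""
    for rule in rules:
        if is_ip(host):
            if host.startswith(rule):
                return False
        elif host.endswith(rule):
            return False
    return True
```

With the sample above, portal.mycompany.com and 10.10.200.7 bypass the proxy, while www.example.org and 10.11.0.1 use it.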
Activating Python Extensions
Use
Some TREX functions are implemented as Python extensions. If the application that uses TREX uses these functions, you have to activate the Python extensions.
The installation documentation for the application in question contains information on whether you have to activate any Python extensions.
The following Python extensions are available:
Extension | Description
XML attribute extraction | Extracts the attributes to be indexed from XML files. This extension is required if the texts to be indexed consist only of attributes and the attributes are transmitted to TREX as XML files.
Expansion of linguistic search queries | Enhances linguistic search queries so that TREX can carry out an exact search as well as a linguistic search.
Metadata extraction | Extracts metadata from HTML documents.
Topic maps | Uses topic maps to determine terms that have a semantic relationship to the search term. The semantic relationships involved depend on the structure of the topic map. In most cases the topic map stores synonyms, hypernyms, and hyponyms (superordinate and subordinate terms).
Semantic search | Uses topic maps to enhance search queries with additional search terms. This extension allows you to include lists of synonyms in the search, for example.
The following procedure explains how you activate the Python extensions globally for all indexes.
If you need to activate Python extensions locally for your application, the relevant information can be found in SAP Note 700771.
The global activation consists of the following two steps:
1. Activate the Python extension handler.
2. Register the required Python extensions.
Activating the Python extension handler
1. Edit the configuration file <TREX_DIR>/TREXExtensions.ini.
2. Check that the [activate] section has the structure below, and modify the section if necessary.
[activate]
imsapi=search, thesaurus, admin
preprocessor
3. In the [extensionhandlers] section, add the line trexxpy or, if the line is already present but commented out, remove the comment sign (#).
[extensionhandlers]
trexxpy
Registering the Python extensions
The directory <TREX_DIR>\extensions\example contains the file _extensions.py. This serves as a template for the configuration file extensions.py.
1. Copy the file _extensions.py to the TREX installation directory <TREX_DIR> and rename it to extensions.py.
2. Edit the configuration file extensions.py.
3. In the relevant section, change the entry if 0: to if 1:. You identify the extensions by the class name.
Extension | Class
XML attribute extraction | XmlExtractor
Expansion of linguistic search queries | LinguistFix
Metadata extraction | AttributeExtractor
Topic maps | XtmExpander
Semantic search | SemanticSearch
Register XML attribute extraction:
# XML attribute extractor extension
# --------------------
if 1:
    sys.path.append(os.path.join(os.getenv('SAP_RETRIEVAL_PATH'),
                                 'extensions', 'attribute-extractor'))
    from xmlextractor import XmlExtractor
    trexx.registerExtension(trexx.EXTCLASS_INDEXING,
                            XmlExtractor(debug=0, mimetypes=['text/xml']))
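Changing if 0: to if 1: for a given class can also be done with a short helper, which may be convenient when the same extensions have to be enabled on several hosts. The sketch below is only an illustration: the template text is a shortened, hypothetical stand-in for extensions.py (the second block in particular is invented), and the helper assumes that each section consists of an if 0: or if 1: line followed by the lines that instantiate the class.

```python
import re

# Shortened, hypothetical stand-in for extensions.py. The second block's
# module name is a placeholder and is not taken from the TREX template.
TEMPLATE = """\
# XML attribute extractor extension
if 0:
    from xmlextractor import XmlExtractor
    trexx.registerExtension(trexx.EXTCLASS_INDEXING, XmlExtractor(debug=0))

# Metadata extraction extension (placeholder module name)
if 0:
    from placeholder_module import AttributeExtractor
    trexx.registerExtension(trexx.EXTCLASS_INDEXING, AttributeExtractor())
"""


def enable_extension(source, class_name):
    """Switch the guard above the block that instantiates class_name to 'if 1:'."""
    lines = source.splitlines()
    guard = None  # index of the most recent "if 0:" / "if 1:" line
    pattern = re.compile(r"\b" + re.escape(class_name) + r"\(")
    for i, line in enumerate(lines):
        if line.strip() in ("if 0:", "if 1:"):
            guard = i
        elif guard is not None and pattern.search(line):
            lines[guard] = lines[guard].replace("if 0:", "if 1:")
    return "\n".join(lines) + "\n"
```

Review the changed file before restarting the TREX daemon, since the real template may be laid out differently from this stand-in.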
Result
The changes take effect when you next start the TREX daemon.
If you want to use the semantic search or topic maps, you must carry out further configuration steps. If necessary, contact SAP Support.
If errors occur during routine operation and the required functions are not available, check the trace file (<TREX_DIR>/trace/PythonExtension.log). This contains
information on the incorrect entries in the TREX configuration files. If you cannot solve the problem, contact SAP support.
Configuration of the TREX Services in the SAP J2EE Engine
Use
TREX provides programming interfaces (Application Programming Interfaces, APIs) for the languages Java and ABAP that allow access to all TREX functions.
The Java interface (TREX Java client) is part of the SAP Web AS Java as a TREX service.
The graphic below shows the TREX Java client as the interface between the TREX servers and the Java application that uses TREX (for example, Knowledge
Management (KM)):
The configuration of the TREX service in the SAP J2EE Engine comprises the following areas.
● TREX Caches
TREX uses caches in the portal to store search results temporarily, for example. You use the configuration of the caches to display the caches and modify
them to your requirements.
There are the following TREX caches:
○ Administration Cache
○ Memory Cache
● TREX Java Client
Java applications use TCP/IP and HTTP/XML to access the TREX search and text-mining functions through the TREX Java client that is part of the SAP
Web AS as a TREX service. The TREX Java client needs to know the address of the TREX name server in order to communicate with the TREX servers.
You configure this during the TREX installation. You sometimes have to configure the other parameters for this communication too.
For more information about configuring the TREX name server, see Specifying the Address of the TREX Name Server .
The following areas of the TREX service configuration are displayed in the SAP J2EE Visual Administrator :
○ TCP/IP communication
○ SSL
○ Name server
○ Cache for search queries
Note that this documentation only describes those parameters and values for the TREX service that you have to configure in specific circumstances.
TREX Caches
Use
The TREX caches are used by the TREX infrastructure of the portal to store search results temporarily, for example. You use the configuration of the caches to
display the caches and modify them to your requirements. Normally, you do not change these settings. You use the Visual Administrator to display the TREX
cache in the SAP J2EE Engine.
Features
The TREX caches comprise the following cache types:
● Administration cache
● Memory cache
Note that only those parameters and values for the TREX service are described that you have to configure in specific circumstances.
Administration Cache
The administration cache is a memory cache that objects are stored in. The administration cache is used to store TREX commands that are initiated by the TREX
administration control.
Key | Value | Description
cache.trexadmin.capacity | 100 | Capacity of the cache; this value depends on the number of different search requests.
cache.trexadmin.defaulttimetolive | 300 | Expiry time for the cache; specifies in seconds how long the cache entry is to exist.
Memory Cache
The memory cache is a cache in which objects are stored. It is used to store search queries and the associated responses.
Key | Value | Description
cache.trexmemory.capacity | 100 | Capacity of the cache; this value depends on the number of different search requests.
cache.trexmemory.defaulttimetolive | 300 | Expiry time for the cache; specifies in seconds how long the cache entry is to exist.
The required caches have already been selected. Do not change these settings.
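To make the two parameters concrete, the following sketch shows a generic cache with a capacity and a default time to live. It is not the TREX cache implementation; the class name is hypothetical, and the choice to drop the oldest entry when the capacity is reached is an assumption, since the documentation does not state the eviction policy.

```python
import time
from collections import OrderedDict


class TtlCache:
    """Generic capacity/time-to-live cache (illustration only)."""

    def __init__(self, capacity=100, default_time_to_live=300,
                 clock=time.monotonic):
        self.capacity = capacity            # maximum number of entries
        self.ttl = default_time_to_live     # lifetime of an entry in seconds
        self.clock = clock
        self._entries = OrderedDict()       # key -> (expires_at, value)

    def put(self, key, value):
        self._entries.pop(key, None)
        if len(self._entries) >= self.capacity:
            self._entries.popitem(last=False)   # assumed policy: drop oldest
        self._entries[key] = (self.clock() + self.ttl, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:          # entry has expired
            del self._entries[key]
            return None
        return value
```

With the default values shown in the tables, such a cache would hold at most 100 entries, and each entry would be served for 300 seconds after it was stored.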
TREX Java Client
Use
The TREX Java client is an interface that Java applications can use to access Search and Classification (TREX) functions. Communication between a Java
application and TREX can take place directly using TCP/IP or using a TREX HTTP server and HTTP/XML.
Integration
Java applications use TCP/IP and HTTP/XML to access the TREX search and text-mining functions through the TREX Java client that is part of the SAP Web AS
as a TREX service. The TREX Java client needs to know the address of the TREX name server in order to communicate with the TREX servers. You configure this
during the TREX installation. You sometimes have to configure the other parameters for this communication too.
Features
In order to communicate with TREX, the individual TREX servers and their parameters must be registered with the TREX Java client. You do this by configuring
the name server, which then carries out further steps itself.
Overview of Areas and Parameters in the TREX Java Client
Function | Description
TCP/IP Communication | Parameters of TCP/IP communication between the Java client and the TREX servers.
SSL | Parameter for secure communication using SSL/HTTPS between the TREX components, the portal, and Content Management.
Name Server | Parameters for configuration of the TREX name server.
Cache for Search Queries | Display of caches that are used to store TREX search queries temporarily.
TCP/IP Communication
Use
The TREX Java client can access the TREX servers directly using TCP/IP communication. If required, you must configure the parameters for TCP/IP
communication specified below for secure communications.
Features
Parameters Relevant for TCP/IP Communication
Key | Value | Description
communication.issecure | false | You must change this value to true if TREXNet has been configured for secure communication.
For more information about the configuration of TREXNet for secure communication, see Configuring TREXNet for Secure Communication.
SSL
Use
Here you configure the parameters for secure communication (with HTTPS) between the TREX Java client, which is integrated in the Web AS Java as a TREX service,
and the TREX Web server. The Java client and Web server both need a certificate issued by the same certification authority (CA) in order to be able to
communicate with one another securely.
● The Java client needs a client certificate.
● The Web server needs a server certificate.
● Both components need the root certificate of the CA that issues the other two certificates.
For more information about the configuration of secure communication between the TREX Java client and the TREX Web server, see Configuration of the TREX
Security Settings → Providing the Certificates for the Java Client.
Features
Relevant SSL Parameters
Key | Value | Description
default.keystore | TREXKeyStore | Keystores in which the certificates for secure communication between the Java client and CM are stored (public key and private key certified by the CA).
default.truststore | TrustedCAs | Keystores in which the certificates of certification authorities (CAs) that you trust are stored.
Name Server
Use
The TREX name server stores and coordinates system-wide information on the TREX installation and on communication between the TREX servers and CM. The
name server settings automatically determine the parameters of the HTTP server, queue server, and index server. There can be scenarios that implement more
than one name server. If this is the case, they are listed here.
The TREX Java client communicates with the central name server directly over TCP/IP, not through the HTTP server over HTTP/XML.
Features
Overview of Parameters for TREX Name Servers
Key | Value | Description
nameserver.address | tcpip://<nameserver>:<nameserverport> (by default, the name server port is predefined) | Address of the central name server currently being used. The name server manages the topology of a TREX installation.
nameserver.backupserverlist | tcpip://<nameserver>:<nameserverport1>, <nameserverport2>, <nameserverport3>, ... (multiple name servers are separated by commas) | List of all available name servers.
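A value entered for nameserver.address can be checked against the tcpip://<nameserver>:<nameserverport> pattern before it is saved. The sketch below is a simple format check and not part of TREX; the function name is hypothetical, and the host name and port in the usage note are made-up examples.

```python
from urllib.parse import urlsplit


def parse_nameserver_address(value):
    """Split tcpip://<nameserver>:<nameserverport> into (host, port)."""
    parts = urlsplit(value.strip())
    # parts.port raises ValueError itself if the port is not a valid number.
    if parts.scheme != "tcpip" or not parts.hostname or parts.port is None:
        raise ValueError(
            "expected tcpip://<nameserver>:<nameserverport>, got %r" % value)
    return parts.hostname, parts.port
```

For example, parse_nameserver_address("tcpip://trexhost.mycompany.com:30001") yields the host name and the port as an integer, while a value with a different scheme or without a port raises ValueError.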