ABBYY Recognition Server 4 Feature List Release 5 (rs4_r5_feature_list.pdf)

Date post: 22-Mar-2018
Upload: phungkiet
© ABBYY. All rights reserved. Page 1 of 53 ABBYY Recognition Server 4 Feature List Release 5

ABBYY Recognition Server 4 Feature List Release 5


Table of contents

Introduction
About the document
About the product
Release 5 – Key features and enhancements
Release 4 – Key features and enhancements
Release 3 - Key features and enhancements
Release 2 - Key features and enhancements
Release 1 Multilingual - Key features and enhancements
Release 1 (English and Russian User Interface) - Key features and enhancements
Installing the new version
Licensing
New Features and Improvements

Release 5
1. Improved processing of email messages
1.1. Import of email messages in EML format
1.2. Processing of attached email messages
1.3. Properties of output files produced by converting email messages
1.4. Email body processing improvements
1.5. Ability to send output emails to recipients in the “To:” field of the input email
2. Saving word confidence values to output ALTO XML files
3. Microsoft SharePoint 2016 support
4. Microsoft Failover Cluster support
5. ABBYY Recognition Server 4 IFilter improvements

Release 4
1. Support of Microsoft SharePoint Online
2. Improvements in SharePoint document libraries processing
2.1. Ability to delete source documents in document libraries
2.2. Backup of SharePoint source documents
2.3. Output file creating options “If file exists” for SharePoint document library
2.4. Keeping correspondence between input and output files
2.5. Exporting document types to SharePoint library
2.6. Indexing of documents stored in SharePoint libraries
3. Processing of digitally created documents
3.1. Built-in OpenOffice for conversion of office file formats
3.2. Processing station roles
3.3. Ability to detect files that do not require processing
4. Mailbox messages processing
4.1. Processing email body with MS Office/LibreOffice handlers
4.2. Attachment of source files to notification email of failed job
4.3. Using index fields in output message title and body
5. Import
5.1. Exclude mask in input settings
6. Recognition


6.1. Thorough barcode recognition
6.2. Improved Arabic OCR
6.3. Support of formal language OCR-A
7. Indexing
7.1. Import of index fields format constraints from SharePoint library
8. Export
8.1. Element for using index field in output file naming and path schema
8.2. Image smoothing mode of MRC
8.3. Correction of page orientation of PDF when adding a text layer
8.4. Saving coordinates of retyped words into ALTO XML
8.5. Writing barcode type into XML file
8.6. Using lossless export to JPEG2000
9. Licensing
9.1. Information on Gothic pages quantity
10. Installation
10.1. Windows 10 support
10.2. Patching procedure
11. API
11.1. Compatibility with the previous version

Release 3
1. Import
1.1. Conversion of Office file formats
1.2. Import Event Handlers
1.3. Processing the entire SharePoint portal with child sites within one workflow
1.4. Ability to process files only after an XML ticket is added
2. Processing
2.1. Support of Burmese OCR
2.2. Support of user patterns created in FineReader 12
2.3. Extracting text from pictures
2.4. Ability to use third-party engines for extracting separation barcodes
2.5. Preserving the original PDF quality when merging several files
2.6. Workaround for “Not enough memory” issues
3. Indexing
3.1. Fast loading of large documents on an Indexing station
3.2. Event handlers for document types
3.3. Hidden and read-only index fields
4. Export
4.1. Saving output files in input folders
4.2. Writing original documents as attachments to PDF/A and PDF documents
4.3. Improvements in export to ALTO XML
4.3.1. Saving of source image file name
4.3.2. Writing original image coordinates to ALTO XML
4.3.3. Support of ALTO XML version 3.0
4.3.4. Improved saving of word coordinates for CJK languages
4.3.5. Splitting an ALTO XML file into several files at export
5. Administration Console
5.1. Sending notifications to Administrator via an SMTP server
5.2. Logging the administrator's actions
5.3. Extended information in the job log


5.4. Restart export of failed jobs
5.5. Administration Console UI Improvements
6. API
6.1. Support of WEB API versioning

Releases 1-2
1. Server Features
1.1. Separate workflow queues
1.2. Easy recovery after failure without data loss
1.3. Support working on Failover cluster
1.4. Internal database
1.5. Server exceptions folder
2. Administration Console
2.1. User rights management
2.1.1. Usage of Active Directory groups
2.2. Logs and reports
2.2.1. Improved logging
2.2.2. Saving information about the operator who edited or rejected the document
2.2.3. Correspondence between input and output files
2.3. Notifications
2.3.1. Including server and workflow names into the text of notification messages
2.3.2. Notification about near license expiry
2.4. Job rejection without loss of files
2.5. Interface improvements
2.5.1. Main window of Administration Console
2.5.2. Workflow status pane
2.6. Soft stop of the workflow processing
3. Workflow settings
3.1. Document Library workflow type
3.1.1. Periodical crawling of document libraries
3.2. Input settings
3.2.1. Processing SharePoint libraries
3.2.2. Using IFilter for processing PDF files in MS SharePoint
3.2.3. Filtering files for processing and settings for unprocessed files
3.2.4. Using the SSL protocol for data protection
3.3. Processing settings
3.3.1. Special mode for processing technical drawings
3.3.2. Despeckle images option
3.3.3. Setup the color of filling the document edges after deskew
3.3.4. Additional fonts
3.3.5. To speed up processing, text in pictures is not recognized by default
3.3.6. Blank page detection settings
3.4. PDF processing options
3.4.1. Improved MRC compression method of output PDF files
3.4.2. Version, format, and other parameters of an output PDF file
3.4.3. Export to PDF/A-3 format
3.4.4. Tagged PDF enabled by default
3.4.5. Possibility to skip processing PDFs with a text layer
3.4.6. Ability to embed a text layer and keep the image and all PDF file properties
3.4.7. Enabling and disabling Fast Web View for PDF files


3.4.8. Using PDF text layer for recognition results improvement
3.4.9. Using PDF text layer for generating quality output files of different formats
3.5. Output settings
3.5.1. Overwriting files in an output folder
3.5.2. Export format compatible with FineReader Engine 11
3.5.3. KeepPages parameter
3.5.4. Export to specific column types in SharePoint
3.5.5. Export to ePub3 format
3.5.6. Settings of units measurement for export to ALTO XML
4. Document processing
4.1. Improved recognition of Arabic texts
4.2. Ability to limit the number of processed pages in input files
4.3. Support of new barcode type - USPS-4CB (Intelligent Mail Barcode)
4.4. Disabled image compression of lossy JBIG2 type
5. Scanning Station
5.1. Sending registration parameters values to index fields
6. Verification and Indexing Stations
6.1. Manual selection of documents for verification and indexing
6.2. Saving documents
6.3. Timeout of inactivity
6.4. Improved work with document types and index fields on Indexing Stations
6.4.1. Import of index fields from files
6.4.2. Quick input of index fields
6.4.3. Possibility to combine values from several regions into a one index field
6.5. User interface changes
6.5.1. Verification Station
6.5.2. Indexing Station
7. Operating systems
7.1. Support for Windows Server 2012 Release 2
7.2. Discontinued support for Windows XP and Windows Server 2003
8. Scripting
8.1. Access to subsequent pages from the document assembly script
8.2. Detecting the workflow name by script
9. Changes in the COM-based API and Web API
9.1. Namespace changes
9.2. Compatible API
9.3. Automatic API deployment on 64x operating systems
9.4. Added objects
9.4.1. Correspondence between input and output files
9.4.2. Support of the recognition service scenario (for NLC)
9.4.3. Deleting of jobs
10. UI and Documentation localization


Introduction

About the document

This document describes new features that are implemented in ABBYY Recognition Server 4.

About the product

ABBYY Recognition Server 4 provides new technology, including significantly improved recognition of Arabic texts, new export settings, and other technology improvements. The main server features, such as stability, performance, and auto-recovery, have been revised and improved. The new version can also process document libraries stored in read-only or editable folders. Other improvements include advanced logging, GUI changes, and bug fixes. The main changes are described in the history below.

Release 5 – Key features and enhancements

Part #: 1135/20, build # 4.0.6.118, OCR Technologies build # 13.0.28.117, release date: November 28, 2016.

· Improved processing of emails

· Support of Microsoft SharePoint 2016

· Microsoft Failover Cluster support

· Bug fixes

Release 4 – Key features and enhancements

Part #: 1135/14, build # 4.0.5.5022, OCR Technologies build # 13.0.24.96, release date: February 02, 2016.

· Support of Microsoft SharePoint Online (Office 365)

· SharePoint libraries processing improvements

· Office-independent conversion of digitally created office documents

Release 3 - Key features and enhancements

Part #: 1135/9, build # 4.0.4.1425, OCR Technologies build # 13.0.20.54, release date: 15/06/2015.

• Conversion of Office file formats
• Processing the entire SharePoint portal with child sites within one workflow
• Saving output files in input folders
• Writing original documents as attachments to PDF/A and PDF documents
• Improvements in export to ALTO XML
• Sending notifications to the Administrator via an SMTP server

Release 2 - Key features and enhancements

Part #: 1135/6, build # 4.0.3.1167, OCR Technologies build # 13.0.15.131, release date: 14/11/2014


New features and changes in Release 2 are marked in blue here and in the document below.

The major features:

• Improved MRC compression method
• Using IFilter for processing PDF files in MS SharePoint
• Processing the SharePoint document libraries:

o Crawling of the complete SharePoint site (including multiple libraries and folders)
o Periodical re-crawling settings

• Export to specific column types in SharePoint (support of Date, Number, and selected other formats)
• Export to PDF/A-3

Other improvements:

• Improved e-mail notifications:

o Advance notifications about license expiry
o Server name included in the message text

• Sending registration parameters values from Scanning Station to index fields
• Soft stop of the workflow processing
• Support of failover cluster
• Using PDF text layer for generating output files
• Blank page detection parameters
• New barcode type - USPS-4CB (Intelligent Mail Barcode)
• New export format: ePub3
• Measurement unit settings for export to ALTO XML
• Disabled image compression of lossy JBIG2 type
• Tagged PDF enabled by default
• Possibility to combine values from several areas into one index field
• Access to subsequent pages from the document assembly script
• Detecting the workflow name by script

Release 1 Multilingual - Key features and enhancements

Part #: 1135/5, build # 4.0.2.952, OCR Technologies build number 13.0.13.21, release date: 14/08/2014

• Translation of UI and help into the following languages:

o French
o German
o Italian
o Spanish
o Chinese
o Portuguese (Brazil)
o Czech
o Hungarian
o Polish

Release 1 (English and Russian User Interface) - Key features and enhancements

Part #: 1135/4, build # 4.0.2.943, OCR Technologies build number 13.0.13.15, release date: May 27, 2014

• Improved fault tolerance and logging
• Processing documents in "read-only" mode
• Processing of documents in SharePoint libraries
• Enhanced work with PDF files
• Better support for construction drawings
• Faster recognition of Arabic texts
• User management via Active Directory


Installing the new version
Recognition Server 4 can be installed on the same computer as Recognition Server 3.5 or earlier versions. The configuration of a previous version of ABBYY Recognition Server can be imported into ABBYY Recognition Server 4. For further information, please see the System Administrator's Guide, "Upgrade from the previous versions of ABBYY Recognition Server."

Note. Please be aware that some changes have been made to the XML result file schema and the corresponding API object. These changes may require modifications to custom code written to integrate ABBYY Recognition Server with third-party systems. Please find details below in this document or in the XMLResult description article in the help file.

Licensing
Recognition Server 4 requires licenses generated specifically for this version of the product. It cannot work with a license generated for Recognition Server 3.5 or earlier.

New Features and Improvements

Release 5

1. Improved processing of email messages

1.1. Import of email messages in EML format
Processing of email messages stored in EML format is now supported. EML messages can be imported within any workflow type: from shared folders, from SharePoint, from mailboxes (as files attached to email messages), etc.

More information on configuring “Mail” workflows is provided below.

1.2. Processing of attached email messages
Recognition Server can process email messages (including their body and any attachments inside them) that are attached to email messages imported from a POP3 or Microsoft Exchange mailbox.

This may be useful in scenarios such as case registration, customer relationship management, data investigation, and multi-channel input of incoming data, where the full content of the correspondence matters or previous emails are resent.

The following formats of attached messages are supported: Outlook email messages and email files of the EML type.

To enable the processing of both Outlook messages and EML files, select the Process attached messages option.

If email messages in EML format should not be processed, add this format to the list of excluded formats by specifying the *.eml file extension in the Exclude field.

Note: If the processing of attached email messages is enabled, all input settings are applied to both the main and the attached email messages.

· Selecting the All messages (and add message body to exported document) option enables the processing of email messages both with and without attachments.

· If an attached message itself contains attached files or messages, the file extensions specified in the Process attachments and Exclude fields define which of them are processed.

1.3. Properties of output files produced by converting email messages
When an email message is processed and its body, attached files, and attached email messages are saved into separate documents, it is useful to know the source of each output document. For example, it helps to know whether a document was obtained from the body of the main message (e.g. the request text) or from the body of an attached message (e.g. supplementary information). This information can be used to route documents to the desired locations in the back-end system.

The following output document properties are available in the XML result file:

· IsMailBodyFile // Shows whether the file was created from a message body (=True) or from an attachment (=False)

· IsMailAttachedMessageFile // Shows whether the file was created from an attached message (=True) or from the main message (=False)
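As an illustration of the routing use case described above, the Python sketch below combines the two flags into one of four source categories. It is a minimal sketch under stated assumptions: the flags are read as the literal strings True/False, the category names are hypothetical, and the meaning given to the fourth combination (a file created from an attachment of an attached message) is an interpretation that this document does not spell out. Reading the values out of the XML result file is not shown, because the file's schema is described in the help file rather than here.

```python
from enum import Enum


class Origin(Enum):
    """Hypothetical routing categories for output documents created from email."""
    MAIN_BODY = "body of the main message"
    MAIN_ATTACHMENT = "file attached to the main message"
    ATTACHED_BODY = "body of an attached message"
    ATTACHED_ATTACHMENT = "file attached to an attached message"  # interpretation


def parse_flag(value: str) -> bool:
    """Read a flag value, assuming it is written as True or False."""
    normalized = value.strip().lower()
    if normalized not in ("true", "false"):
        raise ValueError(f"unexpected flag value: {value!r}")
    return normalized == "true"


def classify(is_mail_body_file: str, is_mail_attached_message_file: str) -> Origin:
    """Combine IsMailBodyFile and IsMailAttachedMessageFile into one category."""
    from_body = parse_flag(is_mail_body_file)
    from_attached_message = parse_flag(is_mail_attached_message_file)
    if from_attached_message:
        return Origin.ATTACHED_BODY if from_body else Origin.ATTACHED_ATTACHMENT
    return Origin.MAIN_BODY if from_body else Origin.MAIN_ATTACHMENT


# A request text (main body) versus supplementary information (attached body).
print(classify("True", "False").value)
print(classify("True", "True").value)
```

A back-end integration could then map each category to a different target location.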

1.4. Email body processing improvements
Email content processing has been improved:

1. An email header is now added to email body text.

Output files (or document pages) with the email body now contain an email header with the email date, sender, recipient, and subject. The language of the field names in the header depends on the Recognition Server interface language. The date and time in the Sent field of the email header are converted to UTC.

2. Email subjects are used as names for output files created from email bodies.

Output files produced by converting email bodies are now named using the text of their subject fields. If a subject field is empty, the output file name is Email Message.<ext>.

For emails with subjects longer than 50 characters, the file name is truncated and the removed characters are replaced with a single ~ character.

For example, the subject IIM National Conference 2016 -' Big Information for Better Information Governance will result in a file named IIM National Conference 2016 -' Big Informatio~.pdf.

3. Source emails are saved as HTML files or ZIP archives.

If the workflow output parameters require source files to be saved into a folder, imported emails are saved as HTML files. If an email contains pictures in its body, an HTML file and PNG picture files are created and packed into a ZIP archive.
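The header date conversion and the subject-based naming rule described in items 1 and 2 can be sketched in Python as follows. This is not ABBYY code: the UTC output format is an arbitrary choice, treating a Date header without a usable offset as UTC is an assumption, and the truncation keeps exactly 49 characters plus ~ so that the truncated name part is 50 characters long. The product's exact cut-off point may differ (the example above keeps fewer characters), and characters that are not allowed in Windows file names are not handled here.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime

MAX_SUBJECT_LENGTH = 50  # limit stated in the text; the exact cut-off is assumed


def sent_as_utc(date_header: str) -> str:
    """Convert an RFC 5322 Date header value to a UTC timestamp string."""
    sent = parsedate_to_datetime(date_header)
    if sent.tzinfo is None:
        # No usable offset in the header: treat the value as UTC (assumption).
        sent = sent.replace(tzinfo=timezone.utc)
    return sent.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")


def output_file_name(subject: str, extension: str) -> str:
    """Derive an output file name from an email subject."""
    stem = subject.strip()
    if not stem:
        return f"Email Message.{extension}"
    if len(stem) > MAX_SUBJECT_LENGTH:
        # Replace the excess characters with a single "~".
        stem = stem[: MAX_SUBJECT_LENGTH - 1] + "~"
    return f"{stem}.{extension}"


print(sent_as_utc("Tue, 14 Jun 2016 17:30:00 +0300"))  # 2016-06-14 14:30 UTC
print(output_file_name("", "pdf"))                      # Email Message.pdf
```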

1.5. Ability to send output emails to recipients in the “To:” field of the input email
The email conversion service for employees has been improved by adding the ability to send output emails to recipients in the “To:” field of the input email. This is useful when a user scans a document on an MFP and sends an email with the scanned file to himself or herself (the user’s address is in the “To:” field).

To enable this feature:

· Select the Reply to all option in the workflow properties;

· Modify the Configuration.xml file as follows:

o Set the ReplyToAddressesInToList parameter to true;

o Specify excluded addresses (ExcludedAddresses parameter) if necessary.

To avoid also sending the resulting email to the mail address that Recognition Server uses as an import source, it is strongly recommended that you add the Recognition Server address to the list of excluded addresses.

For example:

<ExportFormat ... OutputFlowType="Smtp" EmailSelectionMethod="ReplyToSender" ReplyToAddressesInToList="true" ...>
    ...
    <ExcludedAddresses>[email protected]</ExcludedAddresses>
    ...
</ExportFormat>
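The effect of ReplyToAddressesInToList and ExcludedAddresses on the recipient list can be modelled roughly as below (Python). This is a hypothetical sketch, not the product's implementation: it assumes that the sender still receives the reply, that addresses are compared case-insensitively, and that duplicates are dropped. The function name and the example addresses are placeholders.

```python
from typing import List, Sequence


def reply_recipients(sender: str, to_list: Sequence[str], excluded: Sequence[str],
                     reply_to_addresses_in_to_list: bool = True) -> List[str]:
    """Hypothetical model of recipient selection for the ReplyToSender method."""
    blocked = {address.strip().lower() for address in excluded}
    candidates = [sender]
    if reply_to_addresses_in_to_list:
        candidates.extend(to_list)
    recipients: List[str] = []
    seen = set()
    for address in candidates:
        key = address.strip().lower()
        # Skip empty entries, excluded addresses, and duplicates.
        if key and key not in blocked and key not in seen:
            seen.add(key)
            recipients.append(address.strip())
    return recipients


# A scan is emailed to the user and to the Recognition Server input mailbox.
print(reply_recipients("mfp@example.com",
                       ["alice@example.com", "rs-input@example.com"],
                       ["rs-input@example.com"]))
```

Without the exclusion, the input mailbox would receive the result and could import it again, which is why excluding the Recognition Server address is recommended above.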

2. Saving word confidence values to output ALTO XML files
An additional attribute, WC, which contains the word recognition confidence value, can now be exported to the output ALTO XML file.

To enable this feature, modify the Configuration.xml file by setting the WriteWordConfidence parameter to True.
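Once WriteWordConfidence is enabled, downstream tools can read the WC attribute of each ALTO String element, for example to flag words that may need manual checking. A minimal Python sketch using only the standard library, assuming WC follows the ALTO convention of a value between 0 and 1; the 0.5 threshold, the function name, and the sample document are illustrative.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def low_confidence_words(alto_xml: str, threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return (CONTENT, WC) pairs for ALTO String elements with WC below threshold."""
    root = ET.fromstring(alto_xml)
    words = []
    for element in root.iter():
        # Compare local names so that any ALTO namespace version matches.
        if element.tag.rsplit("}", 1)[-1] != "String":
            continue
        wc = element.get("WC")
        if wc is None:
            continue  # attribute absent, e.g. WriteWordConfidence not enabled
        if float(wc) < threshold:
            words.append((element.get("CONTENT", ""), float(wc)))
    return words


SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String CONTENT="Invoice" WC="0.98"/>
    <String CONTENT="N0." WC="0.41"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""

print(low_confidence_words(SAMPLE))  # [('N0.', 0.41)]
```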

3. Microsoft SharePoint 2016 support
Microsoft SharePoint 2016 is now supported.

Export to Microsoft SharePoint libraries, library crawling, saving documents with index attributes, and other scenarios are available in both the on-premises and online versions of Microsoft SharePoint 2016.

4. Microsoft Failover Cluster support
ABBYY Recognition Server can be deployed in a Microsoft Failover Cluster to ensure automatic switching to a functioning server and to reduce downtime.

To implement this functionality, additional improvements have been made. It is now possible to store the server configuration file and the temporary files of running tasks in a shared folder. This folder is available to any node with Recognition Server installed and allows quick switching to another server for uninterrupted processing.

Instructions for installing and setting up Recognition Server in a Microsoft Failover Cluster are available in the Administrator’s Guide (currently available in English and Russian only).

5. ABBYY Recognition Server 4 IFilter improvements
ABBYY Recognition Server 4 IFilter has been optimized for better performance. The timeouts of communication between IFilter components have been fine-tuned, and logging of IFilter events has been disabled by default.

To track IFilter operation, or when consulting with the ABBYY Technical Support team, you need to enable IFilter logging.

To enable logging:
1. Create a folder for storing the log files, for example, C:\RS_Logs.

2. Enable IFilter front-end logging in the Windows registry:

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\ABBYY\RecognitionServer\4.0\Log\App]
"IsEnabled"="true"
"FilePath"="C:\\RS_Logs"
"IsEventLogEnabled"="false"

On 32-bit versions of Windows, omit the Wow6432Node part of the path.

3. (Optional) Enable Recognition Server logging by modifying the DeveloperSettings.xml file:

a. Edit the C:\ProgramData\ABBYY Recognition Server 4.0\DeveloperSettings.xml file.
b. In the Path parameters, specify the path to the folder where logs will be kept.
c. Set the IsEnabled parameter to True.


4. Send the log files from the folder to the ABBYY Technical Support team.
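If the registry change in step 2 has to be applied on several machines, a .reg file can be generated from the values listed above. The Python sketch below is a convenience helper, not an ABBYY tool; the function name and its parameters are hypothetical. It only produces the file text (backslashes in string values are doubled, as .reg syntax requires); the resulting file could then be imported with regedit or reg import.

```python
def ifilter_logging_reg(log_dir: str, windows_64bit: bool = True) -> str:
    """Build .reg file text that enables IFilter front-end logging."""
    wow_node = "Wow6432Node\\" if windows_64bit else ""  # omitted on 32-bit Windows
    key = ("HKEY_LOCAL_MACHINE\\SOFTWARE\\" + wow_node +
           "ABBYY\\RecognitionServer\\4.0\\Log\\App")
    escaped_dir = log_dir.replace("\\", "\\\\")  # .reg strings escape backslashes
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        '"IsEnabled"="true"',
        f'"FilePath"="{escaped_dir}"',
        '"IsEventLogEnabled"="false"',
        "",
    ]
    return "\r\n".join(lines)


print(ifilter_logging_reg(r"C:\RS_Logs"))
```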

Release 4

1. Support of Microsoft SharePoint Online
The online version of Microsoft SharePoint (Office 365 or SharePoint Online) is now fully supported.

This functionality allows Recognition Server to be used in the same scenarios as with previously supported versions: export to a SharePoint Online library, import of files from a SharePoint Online library, processing of files within a SharePoint Online portal with child sites and libraries, etc.

To enable this capability, select the SharePoint Online option in the SharePoint Authentication settings (Log In… button) when specifying the parameters of the connection to SharePoint libraries in the workflow properties.

2. Improvements in SharePoint document library processing
Previous releases added the ability to process documents stored in document libraries:

· crawling the selected set of sites and document libraries,
· detecting files that require processing,
· saving the recognition results into the source library.

The previously implemented functionality had a number of limitations that have been overcome in Release 4. The following features have been added to the Document Library workflow type:

1. Ability to replace the initial files with the converted ones by deleting the source files

2. Ability to copy the original MS SharePoint documents into backup folders so that files can be safely replaced with the conversion results (applicable to MS SharePoint processing workflows)

3. Ability to select how documents are saved if file names coincide, and to keep the correspondence between input and output files

4. Ability to change and add attributes of MS SharePoint documents (applicable to MS SharePoint processing workflows)

5. Ability to create SharePoint content types based on Recognition Server document types and index fields

Based on these improvements, Recognition Server can be used to normalize the content of SharePoint (or folder-based) document libraries: files stored in multiple libraries can be replaced with their searchable copies suitable for long-term archiving and search (e.g. PDF/A).

Please see the details of the improvements below.

2.1. Ability to delete source documents in document libraries

A new option for deleting the source documents has been added to the Document Library workflow properties. It is applicable to workflows of any Document Library type: Shared folder, SharePoint, FTP. Document folders can now be converted into normalized and accessible content storages without duplicating files.

To enable this feature, select the Delete source files option on the 1. Input tab of the workflow properties. The source files are deleted only after the conversion results have been published successfully.

Note: To keep the documents safe, it is recommended to move the original files into backup folders, at least when testing the configuration prior to production.

2.2. Backup of SharePoint source documents
It is now possible to move the original files into a special backup folder of the same SharePoint library that is being processed.

This mode is recommended for the scenario in which Recognition Server crawls SharePoint libraries and replaces the original files with the converted ones. It helps to preserve the original files and ensures quick access to the source files, which can be restored and modified if needed.

If the source files backup mode is enabled, the original files are copied to the _Backup folder of every processed library. A link to the original file in the backup folder is placed into a column named Source Files.

To enable this functionality, select the Back up source files to output library option in the Output Format Settings dialog.

The backup mode requires creating a new content type named RS Source File in the root site (or in the library) of the MS SharePoint portal. This content type is added to every library that is selected for processing by Recognition Server and is assigned to the source documents that are copied to the _Backup folder.

The RS Source File content type can be created either during document export, if a processing station has the necessary permissions, or during workflow setup by clicking the Create backup content types… button.

If the set of libraries is fixed and does not change, it is recommended to use the Create backup content types… button, because in this case the necessary changes to the SharePoint site configuration are made once during setup and do not require granting higher permissions to processing stations.

Note: To create content types during export, a processing station must have the necessary SharePoint permissions (Full, Edit, Design).

2.3. “If file exists” output file creation options for SharePoint document libraries
For workflows that process SharePoint document libraries and save the results into the source library, an If file exists option has been added.

Note: These options were already available for workflows in which a SharePoint library serves as the export destination for converted files from any input source, and for folder-based document libraries.

In previous releases, a new file was always created by default, which led to duplicate files in SharePoint document libraries. For example, when image-only PDF files stored in SharePoint libraries were supplied with a text layer, both the image-only PDF file and the new searchable PDF file were kept.

These options now help to manage output file creation and naming when a name conflict arises while saving a file.

· Create new name: If a file with this name already exists, a 4-digit number is appended to the file name. Use this mode when the original files are preserved and kept in the same folder.

· Overwrite file: If a file with this name already exists, it is replaced with the published one. This mode is useful for replacing scanned PDF files with their converted copies. Note: please also see the description in section 2.4.

· Use SharePoint versioning options: If a file with this name already exists and the document library supports versioning, the file is saved as a new version of the existing document. This mode can help when source PDF documents must be preserved as previous versions of the processed documents.

Note: By default, the option is set to Create new name. However, if the customer upgrades from Recognition Server 4 Release 3 Patch 1 with file rewriting enabled (RewriteIfFileExists="TRUE") in Configuration.xml, the If file exists option is set to Overwrite file after the upgrade.
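The Create new name behavior can be illustrated with a short Python sketch. It assumes that the counter starts at 0000 and increases until a free name is found, which is consistent with the Contract0000.pdf and Contract0001.pdf names in the example in section 2.4 but is not confirmed as the product's exact algorithm. The function name is hypothetical, and case-insensitive name comparison, as used by SharePoint, is not modelled.

```python
from typing import AbstractSet


def create_new_name(file_name: str, existing: AbstractSet[str]) -> str:
    """Return file_name, or file_name with a 4-digit counter if the name is taken."""
    if file_name not in existing:
        return file_name
    stem, dot, extension = file_name.rpartition(".")
    if not dot:
        # No extension: append the counter to the whole name.
        stem, extension = file_name, ""
    counter = 0
    while True:
        candidate = f"{stem}{counter:04d}{dot}{extension}"
        if candidate not in existing:
            return candidate
        counter += 1


print(create_new_name("Contract.pdf", {"Contract.pdf"}))  # Contract0000.pdf
```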

2.4. Keeping correspondence between input and output files
Statistics on “Document Library” workflow processing are now kept in the internal database, which stores the full names of the input files with links to the corresponding output files. This helps to:

1. Avoid overwriting documents that were created from different input files but resulted in output files with the same names (this behavior is maintained regardless of the selected “Overwrite file” option in order to prevent document loss);

For example, a document library contains Contract.docx, Contract.pdf, and Contract.tif, which have to be converted to searchable PDF/A. Since every file results in the Contract.pdf file name but has different content, several files are created to prevent document loss: Contract0000.pdf, Contract.pdf, and Contract0001.pdf, respectively.

2. Overwrite the previous result if the corresponding source file has been changed and taken into processing again (this applies to the office file conversion scenario, e.g. if the source *.docx file has been modified by the user).

The information on the correspondence between input and output files is cleared if the workflow is restarted.

This can be done manually by using the Restart command on the toolbar of the Administration Console.

In addition, the workflow can be restarted automatically if the workflow properties have been changed and the new settings must be applied to all documents in the source folders (in this case, it is recommended to use the Overwrite file option). When saving the workflow, the user receives a warning message and selects whether the workflow should be restarted.
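The input-to-output correspondence can be pictured as a table keyed by the full input file name. The in-memory Python class below is a hypothetical model of the behavior described in this section, not the internal database itself: a source file that is processed again keeps (and therefore overwrites) its previous output, a different source file that would produce the same name gets a numbered name instead, and a workflow restart clears the table. Which file receives the plain name depends on processing order, so the numbering can differ from the example above.

```python
from typing import Dict


class CorrespondenceTable:
    """Hypothetical model: full input file name -> output file name."""

    def __init__(self) -> None:
        self._output_by_input: Dict[str, str] = {}

    def output_for(self, input_path: str, proposed_name: str) -> str:
        # Re-processed source (e.g. a modified .docx): reuse and overwrite its output.
        if input_path in self._output_by_input:
            return self._output_by_input[input_path]
        taken = set(self._output_by_input.values())
        stem, dot, extension = proposed_name.rpartition(".")
        if not dot:
            stem, extension = proposed_name, ""
        name, counter = proposed_name, 0
        # Never overwrite an output that belongs to a different input file.
        while name in taken:
            name = f"{stem}{counter:04d}{dot}{extension}"
            counter += 1
        self._output_by_input[input_path] = name
        return name

    def clear(self) -> None:
        """Restarting the workflow clears the correspondence information."""
        self._output_by_input.clear()


table = CorrespondenceTable()
print(table.output_for("lib/Contract.docx", "Contract.pdf"))  # Contract.pdf
print(table.output_for("lib/Contract.tif", "Contract.pdf"))   # Contract0000.pdf
print(table.output_for("lib/Contract.docx", "Contract.pdf"))  # Contract.pdf
```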

2.5. Exporting document types to SharePoint libraries
Document types configured in Recognition Server can now be exported to Microsoft SharePoint libraries as SharePoint content types.

This can help in the scenario of creating a structured archive in SharePoint. The archive structure can be described in the workflow settings and applied to a newly created SharePoint library, which normally contains only the default content type named Documents.

To enable this feature:
a. on the 5. Indexing tab of the workflow properties, create the necessary document types and index fields;

b. click the Actions… button and select the Export document types to SharePoint action;

c. specify the connection settings for the SharePoint library where the processed documents will be stored, select the document types and fields to be exported, and click the OK button.

The necessary content types and columns are created in the SharePoint site and added to the selected library.

2.6. Indexing of documents stored in SharePoint libraries
Documents that are processed within the SharePoint portal and saved into the source libraries can now be supplied with attributes (index fields).

(In previous versions, only documents exported into a single SharePoint library could be saved with attributes.)

In addition to supplying SharePoint documents with new index field values, existing attributes can be shown and modified during indexing in Recognition Server.

Note: If the indexing step is not enabled, the attributes of the source document are automatically inherited by the converted file.

After documents are converted and supplied with index field values, they are published to the source libraries and assigned the appropriate content types. The document columns are filled with the index field values.

To enable this feature, it is necessary to synchronize the document types in the Recognition Server workflow properties with the SharePoint content types.

This can be done by:
a) importing the appropriate content types from the SharePoint document library using the Import document types from SharePoint command (Actions… button on the 5. Indexing tab of the workflow properties),
or
b) exporting the document types configured in Recognition Server to the SharePoint document library using the Export document types to SharePoint command (Actions… button on the 5. Indexing tab of the workflow properties).

Note: Using the Indexing step is not recommended if whole sites or multiple document libraries with various content types are selected for processing. This may lead to inconsistency between Recognition Server document types and SharePoint content types and result in failed jobs.

To convert documents within multiple libraries with indexing, assign the same set of content types to all document libraries when configuring the SharePoint server.

3. Processing of digitally created documents

3.1. Built-in OpenOffice for conversion of office file formats
It is now possible to process digitally created files without installing MS Office or LibreOffice on the server. The Apache OpenOffice open-source software suite has been integrated into Recognition Server.

Advantages of this functionality compared to conversion via the MS Office suite:

· allows processing of digitally created files in pure server environments, where installation of any client application is prohibited

· does not require purchasing additional MS Office licenses
· does not require manually restarting MS Office if it hangs

The disadvantage is the lower quality of converted files. Apache OpenOffice cannot correctly open certain files, so some document formatting may be lost or changed. Complex objects such as charts, or objects in documents created in the latest versions of MS Office, may be significantly lost or modified. Please see the RS4_R4_Known_Issues.docx document for more details.

Native conversion quality with full object support can be achieved by using the MS Office application, which can be enabled by configuring the input event handlers (Handlers… button, 1. Input tab). In this case, the selected application is used for conversion instead of the built-in OpenOffice converter.

How it works:

A special module, “Digital Born Documents”, should be selected when installing the processing stations. This module works with the standard Recognition Server license and enables office files to be processed by the built-in Apache OpenOffice libraries.

If the Digital Born Documents module is installed and the input files mask (Process files: setting) allows office files to be processed, digitally created files are converted.

The following file formats can be processed: doc, docx, odt, html, htm, txt, rtf; xls, xlsx, ods, csv; ppt, pptx, odp.
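The conditions above can be summarized in a small Python check. It is only a sketch: the function and parameter names are hypothetical, the input files mask is reduced to a single boolean, and the extension list is copied from the paragraph above. A configured MS Office handler is treated as taking precedence, as described for the input event handlers.

```python
from pathlib import PureWindowsPath

# Formats listed for the built-in converter of the Digital Born Documents module.
OFFICE_EXTENSIONS = frozenset({
    "doc", "docx", "odt", "html", "htm", "txt", "rtf",
    "xls", "xlsx", "ods", "csv",
    "ppt", "pptx", "odp",
})


def uses_builtin_converter(file_name: str, module_installed: bool,
                           allowed_by_input_mask: bool,
                           office_handler_configured: bool = False) -> bool:
    """True if the file would be converted by the built-in OpenOffice libraries."""
    extension = PureWindowsPath(file_name).suffix.lower().lstrip(".")
    return (module_installed and allowed_by_input_mask
            and not office_handler_configured
            and extension in OFFICE_EXTENSIONS)


print(uses_builtin_converter(r"C:\Inbox\Report.DOCX", True, True))  # True
```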

3.2. Processing station roles
It is possible to set the role that a processing station is granted for handling certain tasks:

· Preprocessing,
· Processing,
· Preprocessing and Processing.

This can be specified in the Processing Station Properties window.

By default, a newly added station handles both preprocessing and processing tasks (the Preprocessing and Processing role is selected).

The Preprocessing role has been added to allow a particular processing station to preprocess documents by means of the built-in office file converter or third-party tools defined in event handlers (office suite settings or a custom script).

The roles can be changed when multiple processing stations are used in the Recognition Server environment and the workflows are configured to process office formats.

Third-party applications for converting office files (e.g. Microsoft Office) may not be installed on all machines. That is why only certain stations can be assigned a role that includes Preprocessing. The remaining stations must have the Processing role to perform recognition, export, indexing, and other steps.

Note: At least one station must have the Preprocessing or the Preprocessing and Processing role in order to process input documents in office formats. Otherwise, the workflow displays the No preprocessing stations warning when started, and input files in office formats are not processed.
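How the roles determine which stations can take which tasks, and when the No preprocessing stations warning appears, is sketched below in Python. The station names, the task labels, and the function names are hypothetical; only the three roles and the warning condition come from the text above.

```python
from enum import Enum
from typing import Dict, List, Optional


class Role(Enum):
    PREPROCESSING = "Preprocessing"
    PROCESSING = "Processing"
    PREPROCESSING_AND_PROCESSING = "Preprocessing and Processing"


CAN_PREPROCESS = {Role.PREPROCESSING, Role.PREPROCESSING_AND_PROCESSING}
CAN_PROCESS = {Role.PROCESSING, Role.PREPROCESSING_AND_PROCESSING}


def stations_for(task: str, stations: Dict[str, Role]) -> List[str]:
    """Names of stations whose role allows the task ('preprocessing' or 'processing')."""
    if task not in ("preprocessing", "processing"):
        raise ValueError(f"unknown task: {task!r}")
    allowed = CAN_PREPROCESS if task == "preprocessing" else CAN_PROCESS
    return sorted(name for name, role in stations.items() if role in allowed)


def start_warning(stations: Dict[str, Role], office_input: bool) -> Optional[str]:
    """Warning shown on workflow start when office files cannot be preprocessed."""
    if office_input and not stations_for("preprocessing", stations):
        return "No preprocessing stations"
    return None


farm = {"OCR-1": Role.PROCESSING, "OFFICE-1": Role.PREPROCESSING}
print(stations_for("preprocessing", farm))  # ['OFFICE-1']
```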

3.3. Ability to detect files that do not require processing
When processing document storages with files of various types that should be converted to editable formats such as *.docx, *.xlsx, or others, it is now possible to skip processing of imported files whose input format coincides with the output format.

This helps to avoid unnecessary conversion and recognition steps for files that were originally created digitally. For example, a *.docx document is not converted to PDF and then recreated in order to publish it in the *.docx format selected as the output format for processed documents.

Note: If the input and output formats coincide but the workflow processing settings require additional analysis and recognition of the entire document, such as “Extract text from pictures”, “Special processing for technical drawings”, “Verify all documents”, and assembly settings, the original office file is converted to PDF, recognized, and published.
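The decision described in this section can be sketched as a single Python predicate. The keyword parameters are hypothetical stand-ins for the workflow settings named in the note; assembly settings are reduced to one flag, and the product may take further settings into account.

```python
from pathlib import PureWindowsPath


def needs_processing(input_file: str, output_format: str, *,
                     extract_text_from_pictures: bool = False,
                     technical_drawings: bool = False,
                     verify_all_documents: bool = False,
                     assembly_settings: bool = False) -> bool:
    """Decide whether an imported file must be converted, recognized and published."""
    input_format = PureWindowsPath(input_file).suffix.lower().lstrip(".")
    if input_format != output_format.lower().lstrip("."):
        return True
    # Same format: only settings that need full analysis force processing.
    return any((extract_text_from_pictures, technical_drawings,
                verify_all_documents, assembly_settings))


print(needs_processing("letter.docx", "docx"))  # False
```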

4. Mailbox message processing

4.1. Processing email body with MS Office/LibreOffice handlers
The functionality for processing documents imported from Exchange/POP3 mailboxes has been extended with the ability to add the email body to the imported attachments. This feature requires MS Office to be installed on the machine.

This helps to use Recognition Server in scenarios such as receiving applications, claims, invoices, and other documents in the email body. The email body is converted to PDF and can be processed like any other imported file.

Note: There is a known issue: the email header (From, To, Subject, Date) is not saved into the output file of the email body.

To process the email body, it is necessary to:
a) select the All messages (and add message body to exported document) option on the 1. Input tab of the workflow properties;
b) configure the input event handler (Handlers… button) and specify the settings for opening the files (When Office File is Received event).

The email body and the imported attachments are processed as separate documents and assembled into the output files according to the document separation properties.

If the imported email contains attachments, the output file contains the email body page(s) followed by the pages from the attachments.

If the imported email does not contain any attachments, the output document contains only the email body page(s).

4.2. Attachment of source files to the notification email of a failed job
The source files of a failed job can now be attached to the output notification message. This helps to inform the user about document conversion problems and to keep the files sent by email in the ad hoc scenario.

To use this feature, enable the Attach source files option in the Email Subject and Text dialog in the Output Format Settings of the workflow properties.

4.3. Using index fields in output message title and body
It is now possible to include the value of an index field in the subject or text of the output email message.

This can be done by specifying the necessary index field tag after clicking the Insert tag… button.

5. Import

5.1. Exclude mask in input settings
The input files mask has been extended with the ability to specify the file types that should NOT be imported by Recognition Server. In some cases, it is easier to exclude certain file extensions than to enumerate the file formats that should be taken into processing.

The list of extensions that do not require processing must be specified in the Exclude: list on the 1. Input tab of the workflow properties. If several file extensions are entered, they should be separated by semicolons.
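The combined effect of the include mask and the semicolon-separated Exclude list can be approximated in Python as below. This is a sketch, not the product's matcher: the function names are hypothetical, matching is made case-insensitive by lower-casing, and it relies on Python's fnmatch semantics, which differ from Windows wildcard rules in edge cases (for example, *.* does not match a name without a dot here).

```python
from fnmatch import fnmatchcase
from typing import List


def parse_mask(mask: str) -> List[str]:
    """Split a semicolon-separated mask such as '*.eml; *.tmp' into patterns."""
    return [pattern.strip().lower() for pattern in mask.split(";") if pattern.strip()]


def is_imported(file_name: str, include_mask: str = "*.*", exclude_mask: str = "") -> bool:
    """A file is imported if it matches the include mask and no exclude pattern."""
    name = file_name.lower()  # Windows file names are case-insensitive
    included = any(fnmatchcase(name, p) for p in parse_mask(include_mask))
    excluded = any(fnmatchcase(name, p) for p in parse_mask(exclude_mask))
    return included and not excluded


print(is_imported("Message.EML", "*.*", "*.eml"))  # False
```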

6. Recognition

6.1. Thorough barcode recognition
If the processing mode is set to Quality (Optimize OCR for slider on the 2. Process tab of the workflow properties), the thorough barcode recognition mode is enabled. This may help to increase the detection and recognition accuracy for barcodes of poor quality or small size.

6.2. Improved Arabic OCR
Recognition Server 4 Release 4 uses a newer version of OCR technologies that provides better accuracy for Arabic OCR. The total number of errors per document is 10-13% lower than in the previous release.

Arabic OCR quality in Recognition Server is on the same level as in FineReader Engine SDK.

6.3. Support of formal language OCR-A

The OCR-A formal language has been added to the list of recognition languages supported by Recognition Server. Among its specific character types, the OCR-A alphabet contains three special characters (⑀ ⑁ ⑂) that are widely used on bank cheques.

To enable recognition of OCR-A text, select OCR-A among the document languages on the 2. Process tab of the workflow properties and select the OCR-A print type in the Advanced Processing Settings window.
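When post-processing recognized cheque text, it can help to know how the three special characters are encoded. Assuming they are output as the Unicode characters U+2440 (OCR HOOK), U+2441 (OCR CHAIR) and U+2442 (OCR FORK) from the Unicode Optical Character Recognition block, a small Python sketch can locate them. Whether a given export format uses these code points is an assumption, and the helper name is hypothetical.

```python
import unicodedata

# Assumed code points of the three OCR-A special characters (⑀ ⑁ ⑂).
OCR_A_SPECIALS = {"\u2440", "\u2441", "\u2442"}


def ocr_a_specials(text: str) -> list[tuple[int, str]]:
    """Return (index, Unicode character name) for each OCR-A special character."""
    return [(i, unicodedata.name(ch)) for i, ch in enumerate(text) if ch in OCR_A_SPECIALS]


print(ocr_a_specials("\u2440123456789\u2440 0012\u2441"))
# [(0, 'OCR HOOK'), (10, 'OCR HOOK'), (16, 'OCR CHAIR')]
```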

7. Indexing

7.1. Import of index field format constraints from SharePoint library

When content types are imported from a SharePoint library (5. Indexing tab > Actions… button > Import document types from SharePoint), the format constraints of SharePoint document attributes are now assigned to the Recognition Server index fields.

These settings are used to validate the index field values entered by an operator on the Indexing Station and help prevent the export of invalid values that could cause document publishing to fail.

8. Export

8.1. Element for using index field in output file naming and path schema

To make it easier to add index fields to the output file naming rule, a new element, IndexField, has been added to the list of elements. Index field values can be used for file naming and for subfolder creation in the File Naming and Output Path dialog box.

When this element is used, angle brackets (<>) are inserted at the desired place in the naming schema. Type the index field name inside these angle brackets.

Note: There is no need to add the document type name; only index field names are required.

If an index field value was left empty during indexing, or the field does not exist in the workflow settings, this element is omitted from the file name or path of the processed document.

Several index fields can be specified in the output rule, including index fields from different document types, for instance to sort output documents into subfolders. To do this, insert several IndexField elements and specify their names.

Example:

Document type 1: Contract; Index fields: CompanyName, Date
Document type 2: Letter; Index fields: Sender, Date

Rule: <DocumentType>\<CompanyName><Sender>\<Date>\<FileName>.<Ext>
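The omission rule can be sketched in Python to show how the example rule would expand for each document type. This is an illustration, not the product's implementation: it assumes that every <Name> element is replaced by its value, that empty or unknown elements become empty strings, and that the empty path segments this leaves behind are collapsed. The function name and sample values are hypothetical.

```python
import re


def expand_rule(rule: str, values: dict[str, str]) -> str:
    """Replace <Name> elements with values; unknown or empty elements are omitted."""
    expanded = re.sub(r"<([^<>]+)>", lambda m: values.get(m.group(1), ""), rule)
    # Assumption: drop empty path segments left by omitted elements.
    return "\\".join(segment for segment in expanded.split("\\") if segment)


rule = r"<DocumentType>\<CompanyName><Sender>\<Date>\<FileName>.<Ext>"
contract = {"DocumentType": "Contract", "CompanyName": "Acme", "Date": "2016-05-01",
            "FileName": "scan001", "Ext": "pdf"}
letter = {"DocumentType": "Letter", "Sender": "JSmith", "Date": "2016-05-02",
          "FileName": "scan002", "Ext": "pdf"}
print(expand_rule(rule, contract))  # Contract\Acme\2016-05-01\scan001.pdf
print(expand_rule(rule, letter))    # Letter\JSmith\2016-05-02\scan002.pdf
```

Because a Contract has no Sender field and a Letter has no CompanyName field, placing <CompanyName><Sender> side by side yields one subfolder level per document type.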

8.2. Image smoothing mode of MRC

A special mode for smoothing text on the images of compressed PDF documents (MRC, Mixed Raster Content compression) has been added to improve the visual quality of generated PDF files. Smoothed text characters do not have sharp borders and are easier to read.

To enable this mode, change the value of the MRCMode parameter in the Recognition Server settings (Configuration.xml file) to "Smoothing": MRCMode="Smoothing"

By default, the value is set to “Normal”.

PDF file with smoothing mode enabled:

PDF file with standard compression mode:

Notes:

1. The smoothing mode increases the output file size, e.g. from 16 KB to 24 KB, or from 296 KB to 335 KB.

2. It is not recommended to enable this mode together with the Special processing for technical drawings option for documents that contain many pictures, as this may blur the pictures.

The MRCMode parameter can have the following values:

MRCMode="Legacy" // compatibility with the MRC settings of previous releases
MRCMode="Normal" // standard compression, the default value
MRCMode="Smoothing" // smoothing

An additional value can be tested when processing poor-quality images with poor text detection:
MRCMode="TextMask" // compresses text objects only, so that text objects detected as pictures are not blurred
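The section only names the MRCMode parameter, not the element of Configuration.xml that carries it, so the following Python sketch avoids guessing: it sets the attribute on every element that already has it and rejects values outside the four listed above. The element names in the sample are hypothetical. Keep a backup of the file, because xml.etree.ElementTree drops XML comments when parsing and may reformat the output.

```python
import xml.etree.ElementTree as ET

# Values listed in this section for the MRCMode parameter.
VALID_MRC_MODES = {"Legacy", "Normal", "Smoothing", "TextMask"}


def set_attribute_everywhere(config_xml: str, name: str, value: str) -> tuple[str, int]:
    """Set `name` to `value` on every element that already has that attribute.

    Returns the re-serialized XML and the number of elements changed.
    """
    root = ET.fromstring(config_xml)
    changed = 0
    for elem in root.iter():
        if name in elem.attrib:
            elem.set(name, value)
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


def set_mrc_mode(config_xml: str, mode: str) -> str:
    if mode not in VALID_MRC_MODES:
        raise ValueError(f"unsupported MRCMode value: {mode!r}")
    updated, changed = set_attribute_everywhere(config_xml, "MRCMode", mode)
    if changed == 0:
        raise LookupError("no element with an MRCMode attribute was found")
    return updated


# Hypothetical excerpt: the real element that carries MRCMode may differ.
sample = '<Configuration><ProcessingSettings MRCMode="Normal"/></Configuration>'
print(set_mrc_mode(sample, "Smoothing"))
```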

8.3. Correction of page orientation of PDF when adding a text layer

In scenarios where a text layer is added to input image-only PDF files, the orientation of document images is now detected and corrected (aligned with the text direction) automatically. This applies to the processing mode enabled by the Modify text layer only option in the output PDF format settings.

By default, automatic correction of page orientation is enabled in the Recognition Server settings (Configuration.xml file): ForbidCorrectOrientationWhileReplacingTextLayer="FALSE".

To disable this mode, change the value of the ForbidCorrectOrientationWhileReplacingTextLayer parameter to "TRUE".

8.4. Saving coordinates of retyped words into ALTO XML

A special key can now be used to save word coordinates to ALTO XML even if the words were completely retyped during verification. This is very important in projects digitizing documents in complex languages such as Arabic, where recognition may be inaccurate and some words have to be deleted and typed anew.

To enable this mode, change the value of the parameter in the Recognition Server settings (Configuration.xml file) to WriteRenderedImageCoordinates="TRUE".

Note: This feature is disabled by default (WriteRenderedImageCoordinates="FALSE"), as it may slow down processing.

8.5. Writing barcode type into XML file

A document exported to XML format now contains the barcode type in addition to the stored barcode value.

A new child element named barcodeInfo has been added to the Block element of a detected barcode.

If a document contains more than one barcode (for separation or with encoded data), it is often useful to have information on the types and values of all barcodes to manage the documents in the final export destination.

8.6. Using lossless export to JPEG 2000

Document images can now be saved to JPEG 2000 without loss. This mode is required in digital library scenarios (books, artworks, magazines, newspapers with small text), where the images stored and presented to library users must be of high quality.

This mode is enabled if the Quality parameter in the Output Format Settings for export to JPEG 2000 is set to 100%.

9. Licensing

9.1. Information on Gothic pages quantity

The license parameters displayed in the Remote Administration Console have been extended with the total number of Gothic pages and the number of Gothic pages available for processing. This applies to licenses of the FineReader XIX type.

This information is useful in Gothic book digitization projects to estimate the number of pages allowed for processing.

If the license allows processing of both standard and Gothic pages, the details pane of the Licensing node in the Administration Console shows two columns: Pages left and Gothic pages left.

Note: If no FineReader XIX licenses have been added to the server, the Gothic pages left column is hidden.

In addition, information on Gothic page limits and balance can be viewed in the License Properties window.

10. Installation

10.1. Windows 10 support

Recognition Server 4 and its components can now be installed and run properly under the Microsoft Windows 10 operating system.

10.2. Patching procedure

Starting with Release 4, Recognition Server 4 supports applying patches with certain changes without the need to upgrade or reinstall the program. A patch can be issued by ABBYY on request to fix specific bugs reported by a particular customer.

A patch can be applied to Release 4 or to any patch released after Release 4. It contains the binary difference between the released assembly and the updated assembly.

To install a patch, double-click the provided patch file (*.MSP).

If the patch does not solve the problem, it is possible to roll back to the previously installed version. To remove an installed patch, uninstall it from the Control Panel.

11. API

11.1. Compatibility with the previous version

Versioning of the WEB API has been supported since the previous release. The principle of usage is described in the Product Info document.

The version of the WEB API in Release 4 is fully compatible with Release 3. Therefore, the version of the virtual application stays the same (Recognition4WS.v3) and can be used when creating a new application.

Release 3

1. Import

1.1. Conversion of Office file formats

General description

It is now possible to process digitally created documents (e.g. in Microsoft Office file formats) together with images and PDF files.

This feature enables simultaneous input of documents in various formats. Any digital library can be normalized and made searchable and ready for long-term storage.
Imported documents are processed according to the workflow settings. The most common scenario is to import various files and convert them to PDF or PDF/A; however, other output formats supported by Recognition Server can also be used.
This kind of conversion requires the corresponding Microsoft Office 2007+ (or LibreOffice 4.2+) or another third-party application to be installed on the computer running the Recognition Server components (server and/or station). Files converted to PDF via Microsoft Office have superior visual quality and a text layer inherited from the original document.

How to enable

Step 1. Configure an input handler that opens Office files and converts them into PDF suitable for further processing by Recognition Server. Input handlers can be selected in the Input Handlers dialog box (1. Input tab, Handlers... button).

Converted by        Supported formats
Microsoft Office    DOC, DOCX, RTF, TXT, HTML, HTM, XLS, XLSX, PPT, PPTX
LibreOffice         DOC, DOCX, RTF, ODT, XLS, XLSX, ODS, PPT, PPTX, ODP

In the handler properties, the user can specify:
• an application for converting the files (Microsoft Office or LibreOffice)
• a processing component of Recognition Server (server or station)
• (for Microsoft Office only) user account permissions to run the application, if required.

User credentials should be specified if Microsoft Office runs under a user account different from the account used to run the Recognition Server service.

For conversion via Microsoft Office or LibreOffice, the event handler named When Office File Is Received (Settings) should be enabled. For conversion via a third-party application, input handlers based on a custom script should be used.

Step 2. For new workflows of the Hot Folder type, conversion of all files is allowed by default (*.* is specified in the "Process files:" mask). For new workflows of the Document Library type, the desired Office file extensions must be added to the mask manually (to prevent documents from being converted by mistake).
To specify the Office file formats to be processed by Recognition Server, users can also use the Configuration.xml file (<OnFileReceivedCustomOffice>). In this parameter, the user can disable the conversion of Office files or specify the extensions of files that should be processed.

Note: Import of Office files is not available in Microsoft Search IFilter and Google Search Appliance Connector workflows. These files do not require any processing before being indexed by search engines.

Implemented in: Release 3.
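Because the two converters support different format lists, a mixed input folder may need a particular application installed. The following Python sketch, built only from the table in this section, reports which converters can handle a given file; the dictionary and function names are hypothetical and not part of the product.

```python
# Extensions per converter, copied from the table in this section.
CONVERTERS = {
    "Microsoft Office": {"doc", "docx", "rtf", "txt", "html", "htm",
                         "xls", "xlsx", "ppt", "pptx"},
    "LibreOffice": {"doc", "docx", "rtf", "odt", "xls", "xlsx",
                    "ods", "ppt", "pptx", "odp"},
}


def converters_for(file_name: str) -> list[str]:
    """Names of the converters whose format list includes the file's extension."""
    _, dot, ext = file_name.rpartition(".")
    if not dot:  # no extension at all
        return []
    ext = ext.lower()
    return [name for name, exts in CONVERTERS.items() if ext in exts]


for f in ["budget.xlsx", "page.htm", "report.ODT", "scan.tif"]:
    print(f, converters_for(f))
```

For example, TXT and HTML files are listed only for Microsoft Office, while the OpenDocument formats (ODT, ODS, ODP) are listed only for LibreOffice.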

1.2. Import Event Handlers

New scripts have been added to handle import events managed by the server and/or the station (1. Input tab, Handlers... button). Users can fine-tune document preprocessing using scripts that alter or improve input files.
There are two script-based event handlers: When File Is Received by Server (Custom Script) and When File Is Received by Station (Custom Script). A script is run separately for each file.
By means of scripts, the user can analyze the input file (name, extension), then preprocess the file and send it to processing, or exclude it from processing (mark it as processed) and add a notification to the Event Log.
For instance, a document can be preprocessed by an external application (converted to a format suitable for further processing by Recognition Server), or its resolution can be changed prior to recognition.
The third type of event handler enables the preprocessing of Office files by Microsoft Office or LibreOffice. It does not require scripting.

Implemented in: Release 3.

1.3. Processing the entire SharePoint portal with child sites within one workflow

Processing of an entire SharePoint portal, including its multiple child sites, can now be configured within a single workflow. In the previous release, the user had to create several workflows to access individual child sites.
After the connection settings for the SharePoint portal are specified, the complete structure of the portal is shown in the Select SharePoint Libraries dialog box, where any child sites and their libraries can be selected for processing.
Note: Processing Microsoft SharePoint libraries now requires .NET Framework 4.5.2, which can be installed separately or by enabling the Microsoft SharePoint Support option when installing the Server Manager.

Implemented in: Release 3.

1.4. Ability to process files only after an XML ticket is added

Recognition Server supports a new mode that always requires an XML ticket before documents are processed.

This is useful when documents are placed into an input folder before the XML ticket arrives. Waiting for the XML ticket prevents files from being processed incorrectly.

To enable this feature, modify the input file mask so that it allows only XML files (*.xml). Documents will not be processed until an XML ticket is placed into the input folder.

Implemented in: Release 3.

2. Processing

2.1. Support of Burmese OCR

A new recognition language, Burmese, is now supported. It has been added to the list of languages without dictionary support.

Implemented in: Release 3.

2.2. Support of user patterns created in FineReader 12

In the Advanced Processing Settings, it is now possible to load user pattern files created and trained in FineReader 12 (*.fbt format). Files created in earlier versions (*.ptn format) can still be used. User patterns are used to improve recognition results for rare and decorative fonts.

To load a user pattern file, open the Advanced Processing Settings dialog box and click the Browse… button under Apply user patterns.

Implemented in: Release 3.

2.3. Extracting text from pictures

The advanced processing settings (workflow properties, 2. Process tab) now contain a new option that enables aggressive text detection. When this option is selected, the program attempts to recognize text in all document zones, including picture zones (which may contain charts, diagrams, screenshots, etc.).
This option helps extract more textual information, which can then be used in document searches.
Previously, this option was only available in the Configuration.xml file and was disabled by default (ProhibitHiddenTextDetection="true").

Implemented in: Release 3.

2.4. Ability to use third-party engines for extracting separation barcodes

External engines can now be used to recognize the barcodes used in the document separation step. This enables correct separation of documents based on barcodes even if those barcodes are not supported by Recognition Server.

A document page can now be accessed and sent to an external engine via a separation script. After the necessary barcode is found and recognized, its value can be written into the RecognizedPage: BarcodeText property, and the page with the barcode can be used as the starting page of the document.

Additionally, the value of the RecognizedPage: BarcodeText property can be overwritten by a script that analyzes the automatically detected barcodes (not necessarily detected by an external engine). This is useful when a page contains several barcodes and the user has to define rules for selecting the document separation barcode.

Implemented in: Release 3.
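The kind of selection rule described above can be sketched in Python. This only models the decision logic; in the product the chosen value would be written to the RecognizedPage: BarcodeText property from a separation script, whose language and object model are not shown here. The "SEP-" prefix rule, the function names and the page data are all hypothetical.

```python
import re
from typing import Optional


def choose_separation_barcode(values: list[str], pattern: str = r"SEP-\d+") -> Optional[str]:
    """Return the first barcode value that fully matches the separation rule."""
    rule = re.compile(pattern)
    for value in values:
        if rule.fullmatch(value):
            return value
    return None


def document_start_pages(pages: list[list[str]], pattern: str = r"SEP-\d+") -> list[int]:
    """0-based indices of pages that would start a new document."""
    return [i for i, values in enumerate(pages)
            if choose_separation_barcode(values, pattern) is not None]


# Barcode values detected on each page (hypothetical data).
pages = [["INV-2231", "SEP-001"], [], ["SEP-002"], ["4006381333931"]]
print(document_start_pages(pages))  # [0, 2]
```

Page 0 carries both a data barcode and a separation barcode; the rule picks the separation one and ignores the other, which is the multi-barcode case this section describes.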

2.5. Preserving the original PDF quality when merging several files

When several files (e.g. PDF and TIFF) need to be merged into a single PDF file, the quality of the original PDF file with a text layer is preserved. Only the image file is recognized prior to merging.

Implemented in: Release 3.

2.6. Workaround for "Not enough memory" issues

This information can be helpful to customers who encounter "Not enough memory" errors when processing large multi-page files.
A parameter in the Configuration.xml file allows large files to be split into smaller fragments (once recognized, the fragments are merged again). If "Not enough memory" errors occur, it is recommended to change the value of the MaxProcessorProcessedPages parameter to 1 (the default value is "5000").
Note: This workaround does not guarantee a complete resolution of the problem, but it can help in certain cases.

Implemented in: Release 3.

3. Indexing

3.1. Fast loading of large documents on an Indexing Station

Manual indexing of large multi-page documents is now faster. In many cases, the valuable data are located at the beginning of the document, so by default only the first five pages are loaded for indexing on the Indexing Station.

If required, the next set of pages or the entire document can be loaded. New toolbar buttons have been added to the Indexing Station to load the next portion of pages or the whole document: Load More Document Pages (CTRL+M) and Load All Document Pages (CTRL+A).

The default number of pages to be loaded is 5, but this value can be changed in the server settings.

There is a new parameter, IndexingStationPagesSlice, in the IndexingSettings tag of the Configuration.xml file. This value specifies the maximum number of pages to be loaded at once. To disable this mode and always load all pages of a document, change this value to -1.

Implemented in: Release 3.
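The effect of IndexingStationPagesSlice can be modeled with a small Python sketch: the first slice is what the station shows initially, and each Load More Document Pages command would add the next one. This is an illustration of the described behavior, not station code; the function name and the 0-based page indices are assumptions.

```python
def page_slices(total_pages: int, pages_slice: int = 5) -> list[range]:
    """Split 0-based page indices into the portions that are loaded one after another."""
    if total_pages <= 0:
        return []
    if pages_slice == -1:  # -1 disables partial loading: all pages at once
        return [range(total_pages)]
    if pages_slice < 1:
        raise ValueError("pages_slice must be a positive number or -1")
    return [range(start, min(start + pages_slice, total_pages))
            for start in range(0, total_pages, pages_slice)]


print([list(s) for s in page_slices(12)])
# [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11]]
```

With the default value of 5, a 12-page document opens with five pages, and two Load More Document Pages commands would load the rest.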

3.2. Event handlers for document types

Special types of scripts have been added to the document type properties. These scripts modify the content of index fields in response to events such as a change of document type, a change of an attribute value, or completion of document indexing.

Using this feature, the following scenarios can be implemented:

• Database lookup (pick a single field from the image and pull the accompanying data from a database or a file). See the description of the OnDocumentAttributeValueChanged event handler.

• Creating dynamic lists of values (change the value or list of values based on the value of the currently selected field or under certain conditions). See the description of the OnDocumentAttributeValueChanged event handler.

• Predefining the values of index fields for several document types (predefine the values of several fields and let the operator pick the remaining data manually at a station). See the description of the OnDocumentTypeChanged event handler.

• Exporting service data into SharePoint (write the processing statistics, workflow name, and operator names into hidden index fields). See the description of the OnDocumentIndexingFinished event handler.

Event handler scripts can be specified in the Document Type dialog box.
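The database lookup scenario listed above can be sketched in Python to show the logic an OnDocumentAttributeValueChanged script might implement. This is only an illustration: the real handler is a script configured in the Document Type dialog box, and its scripting object model is not described here. The function name mirrors the event but is hypothetical, as are the field names and the CSV data standing in for a database.

```python
import csv
import io

# Hypothetical lookup source; a real script might query a database instead.
COMPANY_CSV = """CompanyName,Address,VATNumber
Acme Ltd,1 Main St,GB123
Globex,42 Elm Rd,DE987
"""


def load_lookup(csv_text: str) -> dict[str, dict[str, str]]:
    """Index CSV rows by company name."""
    return {row["CompanyName"]: row for row in csv.DictReader(io.StringIO(csv_text))}


def on_attribute_value_changed(fields: dict[str, str], changed_field: str,
                               lookup: dict[str, dict[str, str]]) -> dict[str, str]:
    """Fill company detail fields after the operator changes CompanyName."""
    if changed_field != "CompanyName":
        return fields
    record = lookup.get(fields.get("CompanyName", ""))
    if record is None:  # unknown company: leave the fields as entered
        return fields
    updated = dict(fields)
    updated.update({k: v for k, v in record.items() if k != "CompanyName"})
    return updated


lookup = load_lookup(COMPANY_CSV)
print(on_attribute_value_changed({"CompanyName": "Globex"}, "CompanyName", lookup))
```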

The following event handlers are available:

OnDocumentTypeChanged

The script is executed when a certain document type is assigned to the document. This can occur when another script changes a document's type, or when a document's type is changed by an operator on an Indexing Station.

It can be used for predefining the values of index fields. For example, it can be used to set default values, to load a list of suggested values from an external source (a file or database), or to fill hidden and read-only fields with the necessary service information (operator names, workflow names, processing statistics, etc.).

OnDocumentAttributeValueChanged

The script is executed when an attribute value is changed by an operator.

It can be used to change field values based on the value of the currently selected field or under certain conditions.

For example, the operator can select the company name on the image, and the fields that contain company details will be populated automatically with data from an external source (a file or database).

OnDocumentIndexingFinished

The script is executed when indexing is completed (i.e. when the document is accepted or rejected by an operator).

It can be used to modify field values after editing of the index fields is finished. For example, it can be used to overwrite the value of some fields or to write the name of the indexing operator into an index field.

Implemented in: Release 3.

3.3. Hidden and read-only index fields

It is now possible to hide index fields that are filled in automatically (by a script) or to prohibit their editing on the Indexing Station. This can be useful for index fields that contain service, sensitive, or any other kind of data that does not require the operator's attention or should not be modified. For example, the name of the verification/indexing operator or recognition statistics can be stored in these fields and saved as document attributes upon export.

These options can be selected in the Field Properties dialog box.

Implemented in: Release 3.

4. Export

4.1. Saving output files in input folders

Processed documents can now be saved in the input folder. This feature is available for the Document Library workflow type (shared folder, FTP folder, SharePoint library) and is useful for normalizing document storage when documents should be kept in their initial folders.
Output files can be placed into the input folder. If the names of source and output files are identical, the source files can be overwritten.
To save files in their initial locations, select Save output file in source library in the Output Format Settings dialog box when specifying the workflow properties.

Note: In previous releases, the option of exporting files to their source SharePoint libraries was available in the Input parameters (Select SharePoint Libraries dialog box). This option has been removed; the same behaviour can now be achieved by configuring the settings in the Output Format Settings dialog box as described above.

Implemented in: Release 3.

4.2. Writing original documents as attachments to PDF/A and PDF documents

Source files can now be attached to output documents when exporting to PDF/A-3 or PDF (version 1.7). The original files can be in image, PDF, or any other supported import format.

This may be useful when creating a digital library where access to the original documents (viewing, restoring) is important or required.

To enable this feature, select the Attach source files option in the Output Format Settings dialog box when specifying the workflow properties.

Implemented in: Release 3.

4.3. Improvements in export to ALTO XML

4.3.1. Saving of source image file name

Information about source document files is now written into output ALTO XML files. This information can be used to link the coordinates of recognized text to the original image file.
<sourceImageInformation>
  <fileName>MNTN_Contract(2015).pdf</fileName>
</sourceImageInformation>

Implemented in: Release 3.
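A downstream application can read the source file name back from an exported ALTO XML file. In the ALTO schema, sourceImageInformation sits inside the Description element; the Python sketch below matches local element names only, so it works whether or not the file declares an ALTO namespace (the sample uses the ALTO v3 namespace). The function and sample names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def local(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def source_file_name(alto_xml: str) -> Optional[str]:
    """Return the text of sourceImageInformation/fileName, or None if absent."""
    root = ET.fromstring(alto_xml)
    for elem in root.iter():
        if local(elem.tag) == "sourceImageInformation":
            for child in elem:
                if local(child.tag) == "fileName":
                    return (child.text or "").strip()
    return None


ALTO_SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Description>
    <sourceImageInformation>
      <fileName>MNTN_Contract(2015).pdf</fileName>
    </sourceImageInformation>
  </Description>
</alto>"""
print(source_file_name(ALTO_SAMPLE))  # MNTN_Contract(2015).pdf
```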

4.3.2. Writing original image coordinates to ALTO XML

A new option, Write original image coordinates, has been added.
Selecting this option writes to ALTO XML the coordinates of text objects relative to the original image file. Use this option if the original source files are saved into a digital library together with the coordinates of recognized text.
By default, the coordinates are calculated based on the preprocessed image that is used for recognition. This image can be saved after processing in a format of your choice.

Implemented in: Release 3.

4.3.3. Support of ALTO XML version 3.0

ALTO XML 3.0 is now supported. The necessary schema can be selected in the parameters of the output ALTO XML file.

Implemented in: Release 3.

4.3.4. Improved saving of word coordinates for CJK languages

The specifics of CJK (Chinese, Japanese, Korean) languages, where one word equals one character, are now taken into account when exporting image coordinates. Every word is exported as a separate text object with its own coordinates.

[Figure: ALTO XML excerpt for a Chinese (zh-CN) text line, in which every character is exported as a separate String element with its own WIDTH, HEIGHT, HPOS and VPOS attributes]

Note: If the text of a document contains numbers, enabling the Digits recognition language is recommended for correct detection of the coordinates.

Implemented in: Release 3.
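Since every CJK character becomes its own String element, per-character bounding boxes can be read directly from the ALTO output. The Python sketch below collects the CONTENT, HPOS, VPOS, WIDTH and HEIGHT attributes of each String element. The coordinates in the sample echo the figure above, but the two characters are invented placeholders, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def string_boxes(alto_fragment: str) -> list[tuple[str, int, int, int, int]]:
    """Return (CONTENT, HPOS, VPOS, WIDTH, HEIGHT) for every String element."""
    root = ET.fromstring(alto_fragment)
    boxes = []
    for elem in root.iter():
        if elem.tag.rsplit("}", 1)[-1] == "String":  # ignore any namespace prefix
            a = elem.attrib
            boxes.append((a["CONTENT"], int(a["HPOS"]), int(a["VPOS"]),
                          int(a["WIDTH"]), int(a["HEIGHT"])))
    return boxes


# Illustrative fragment: placeholder characters, coordinates as in the figure.
LINE_SAMPLE = """<TextLine HPOS="356" VPOS="564" WIDTH="1194" HEIGHT="40" LANG="zh-CN">
  <String CONTENT="中" HPOS="356" VPOS="566" WIDTH="37" HEIGHT="37"/>
  <String CONTENT="文" HPOS="398" VPOS="567" WIDTH="37" HEIGHT="37"/>
</TextLine>"""
for box in string_boxes(LINE_SAMPLE):
    print(box)
```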

4.3.5. Splitting an ALTO XML file into several files at export

An output ALTO XML file for a large multi-page document can be split into separate output files at the export step. This makes the output files easier to manage, and they open faster.
To split an ALTO XML file, edit the Configuration.xml file and add the PagesPerFile property with the necessary value.
For example:
<ExportFormat Id="{A67E164B-2FB3-4EA6-9899-F665E311F764}" OutputFileFormat="ALTO"
    KeepLastModifiedDate="false" RewriteIfFileExists="false" OutputFlowType="SharedFolder"
    FormatVersion="3_0" CoordinatesParticularity="Words" FormattingMode="Plain"
    MeasurementUnit="Pixel" WriteOriginalImageCoordinates="true" PagesPerFile="1">

Implemented in: Release 3.

5. Administration Console

5.1. Sending notifications to Administrator via an SMTP server

Notifications can now be sent to the Administrator via an SMTP server. This feature can be useful if Microsoft Exchange cannot be installed or used on the server.
To configure notifications, open the Recognition Server Properties dialog box, select the desired notification types, and then select the Send via SMTP server option.



In the E-mail Account parameters, the server address, port, sender address, and other options can be configured.
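The E-mail Account parameters map onto an ordinary SMTP delivery. A minimal sketch with Python's standard smtplib and email modules can be used, for example, to test a relay before entering it in the dialog box. The host names, addresses, port 587 and the use of STARTTLS are assumptions; this document does not state how Recognition Server itself negotiates TLS or authentication, and both functions are hypothetical helpers.

```python
import smtplib
from email.message import EmailMessage
from typing import Optional


def build_notification(sender: str, recipient: str, server_name: str,
                       reason: str, body: str) -> EmailMessage:
    """Build an administrator notification using the subject layout
    "ABBYY Recognition Server (<Server Name>): <Reason of notification>"."""
    msg = EmailMessage()
    msg["Subject"] = f"ABBYY Recognition Server ({server_name}): {reason}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(body)
    return msg


def send_notification(msg: EmailMessage, host: str, port: int = 587,
                      user: Optional[str] = None,
                      password: Optional[str] = None) -> None:
    """Deliver the message through an SMTP server.

    Assumes the server accepts STARTTLS on the given port; drop the
    starttls() call for a plain relay inside a trusted network.
    """
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.starttls()
        if user is not None and password is not None:
            smtp.login(user, password)
        smtp.send_message(msg)
```

For instance, `send_notification(build_notification("rs@example.com", "admin@example.com", "RS-01", "Workflow stopped", "..."), "smtp.example.com")` would send one test message through a hypothetical relay.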

Implemented in: Release 3.

5.2. Logging the administrator's actions

The event log contains information about certain actions performed by the administrator in the Remote Administration Console. These actions include changing workflow settings, activating/changing a license, modifying user accounts, and deleting the event log.

Extended logging helps to monitor the administrator's actions and to track any unsanctioned changes to the product configuration that could disrupt its normal operation.

When the Administrator changes workflow parameters and clicks OK, this action is recorded in the event log. If the Administrator clicks Cancel, the action is not logged.


Implemented in: Release 3.

5.3. Extended information in the job log

The job log now contains information about actions performed by operators on documents (e.g. document was loaded, changed, accepted, rejected, etc.). The name of the operator (user name) is added to each such log entry.

This may be useful for tracking the job and document processing history and identifying who verified or indexed a particular document in a job.

Implemented in: Release 3.

5.4. Restart export of failed jobs

It is now possible to export failed jobs manually.

This allows publishing documents that were rejected because there was no connection to the export destination or the connection timed out (e.g. SharePoint connection failure, inaccessible folders, etc.) without handling the exception and repeating the job from scratch.

If the connection to the export destination is lost, the jobs will have the Publishing paused status in the Job Log. After the export destination becomes accessible, the user can resume publishing by selecting the Resume Publishing Job command.

This information will also be added to the XML result file (InformationMessage parameter).

Implemented in: Release 3.

5.5. Administration Console UI Improvements

Some minor improvements have been made in the GUI of the Recognition Server Administration Console which make the work of the administrator easier.

• Workflow properties can now be opened by double-clicking the workflow name (the input folder can be opened from the context menu).

• The workflow toolbar now includes additional buttons for the most common actions (refresh, show/hide details pane, open properties, open input folder, start/stop workflow, create new workflow).

• The Recognition Server details pane now contains links which open the Recognition Server options, registration options, search connector's parameters (if the corresponding components were selected during installation), and the diagnostic tool.

Implemented in: Release 3.


6. API

6.1. Support of WEB API versioning

WEB API versioning is now supported. Custom applications can now refer to a particular version of the WEB API.

To use a particular version, the user has to connect to the Recognition4WS.vN virtual application. The following versions are available:

• Recognition4WS (the latest release; for the current release, Recognition4WS equals Recognition4WS.v3)
• Recognition4WS.v1 (Release 1, build # 4.0.2.943)
• Recognition4WS.v2 (Release 2, build # 4.0.3.1167)
• Recognition4WS.v3 (Release 3, build # 4.0.4.1425)

When creating a new application, it is recommended to specify a particular version, e.g. Recognition4WS.v3. This ensures compatibility with future releases without the need to recompile the custom application.

Note: If the custom application was created for one of the previous releases (e.g. Release 2, where Recognition4WS is used in the code), the customer can upgrade to the newer version of Recognition Server by modifying the code (Recognition4WS -> Recognition4WS.v2) and recompiling the custom application. If the custom application cannot be changed, the user can delete the Recognition4WS virtual application in IIS Manager and rename the Recognition4WS.v2 virtual application to Recognition4WS.
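The recommendation to pin a version can be captured in a small helper that builds the base address of the virtual application. This is a sketch: the host name, the http scheme and the helper itself are hypothetical, and the service endpoint that follows the virtual application name is not covered in this document, so append whatever endpoint your client already uses.

```python
from typing import Optional
from urllib.parse import urlunsplit

# Versions listed for this release; the unversioned name tracks the latest one.
KNOWN_VERSIONS = {1, 2, 3}


def web_api_base_url(server: str, version: Optional[int] = None,
                     scheme: str = "http") -> str:
    """Return the base URL of a WEB API virtual application.

    version=None selects "Recognition4WS", which always follows the latest
    release; an integer pins "Recognition4WS.v<N>".
    """
    if version is None:
        app = "Recognition4WS"
    elif version in KNOWN_VERSIONS:
        app = f"Recognition4WS.v{version}"
    else:
        raise ValueError(f"unknown WEB API version: {version}")
    return urlunsplit((scheme, server, f"/{app}", "", ""))
```

Pinning the version in one place keeps a custom application on Recognition4WS.v3 even after a later release changes what the unversioned Recognition4WS points to.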

Implemented in: Release 3.


Releases 1-2

1. Server Features

1.1. Separate workflow queues

Each workflow now has a separate queue, which prevents other workflows from being stopped if one of them has an overloaded queue. The default number of jobs in the queue of each workflow is 50. This number can be changed in the Configuration.xml file: MaxJobsCount="50". Prior to this change, this key set the total number of jobs in the server queue. Now this key sets the number of jobs in the queue of each workflow.
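When MaxJobsCount is edited by script rather than by hand, a sketch like the following reads and updates it with Python's xml.etree.ElementTree. Only the attribute name comes from this document; the element names in the sample and the function names are hypothetical, and the lookup simply takes the first element that carries the attribute.

```python
import xml.etree.ElementTree as ET

DEFAULT_MAX_JOBS = 50  # documented per-workflow default


def get_max_jobs_count(config_xml: str) -> int:
    """Return the first MaxJobsCount value found anywhere in the document."""
    root = ET.fromstring(config_xml)
    for elem in root.iter():
        value = elem.get("MaxJobsCount")
        if value is not None:
            return int(value)
    # Assumption: an absent attribute means the documented default applies.
    return DEFAULT_MAX_JOBS


def set_max_jobs_count(config_xml: str, count: int) -> str:
    """Return a copy of the configuration with MaxJobsCount set to count.

    ElementTree drops comments and the XML declaration when parsing,
    so keep a backup of the original file.
    """
    if count < 1:
        raise ValueError("MaxJobsCount must be a positive integer")
    root = ET.fromstring(config_xml)
    for elem in root.iter():
        if "MaxJobsCount" in elem.attrib:
            elem.set("MaxJobsCount", str(count))
            return ET.tostring(root, encoding="unicode")
    raise KeyError("no element with a MaxJobsCount attribute")
```

Since the limit now applies per workflow, a server with four workflows can hold up to 4 × 50 = 200 queued jobs with the default value.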

Implemented in: release 1 for 3A

1.2. Easy recovery after failure without data loss

Recovery after failure is now smoother and does not require manual copying of files. GUIDs are no longer used in file names, so it is always possible to find a file by its name. When Recognition Server processes jobs, files are stored in the folder %programdata%\ABBYY Recognition Server 4.0\RS4WF\Images\<Workflow name>. File names are the same as the names of the source files, except that the job ID is added at the beginning of the name.
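Because a staged file keeps the source name with the job ID in front, locating it after a failure reduces to a name match in the workflow's Images folder. A sketch in Python; the exact job-ID format and separator are not documented here, so the digits-plus-optional-separator pattern and both function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Iterable, List

# Hypothetical prefix format: the document only says the job ID is added at
# the beginning of the name, so we assume digits, optionally followed by "_" or "-".
JOB_ID_PREFIX = r"\d+[_-]?"


def match_staged_names(names: Iterable[str], original_name: str,
                       job_id_prefix: str = JOB_ID_PREFIX) -> List[str]:
    """Return the names that are original_name preceded by a job-ID prefix."""
    # Windows file names are case-insensitive, hence IGNORECASE.
    pattern = re.compile(job_id_prefix + re.escape(original_name), re.IGNORECASE)
    return sorted(name for name in names if pattern.fullmatch(name))


def find_staged_files(images_dir: Path, original_name: str) -> List[Path]:
    """List staged copies of original_name in a workflow's Images folder."""
    files = {p.name: p for p in images_dir.iterdir() if p.is_file()}
    return [files[name] for name in match_staged_names(files, original_name)]
```

The full-match pattern keeps a file such as 17_old_invoice.tif from being mistaken for a staged copy of invoice.tif.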

Implemented in: release 1 for 3A

1.3. Support working on Failover cluster

Failover clusters are now supported. Recognition Server instances can be installed on separate nodes of a failover cluster. All Recognition Server settings can be stored in a shared folder available to the cluster. Please note: this feature has not been tested. Testing can be done upon request. (Instructions with details on installing Recognition Server on a failover cluster will be provided later.)

Implemented in: Release 2.

1.4. Internal database

The current system state is now stored in the internal SQLite database. This database is installed together with Recognition Server and is invisible to users.

Implemented in: Arabic Edition

1.5. Server exceptions folder

A new folder with server exceptions is now created in C:\ProgramData\ABBYY Recognition Server 4.0\RS4WF\Exceptions. This folder contains jobs which failed due to the faulty operation of the server or server flows. Jobs may fail if, for example, the database or the configuration file becomes corrupted.

Implemented in: release 1 for 3A


2. Administration Console

2.1. User rights management

2.1.1. Usage of Active Directory groups

In the Users node of the Administration Console it is now possible to add groups of users from Active Directory. The full name of a group should be specified, including the domain name. A role (e.g. Verifier or Indexer) can be assigned to a group, and all members of the group will have the rights corresponding to the assigned role. Any users added to the group will automatically receive the rights required to work with Recognition Server.

Implemented in: release 1 for 3A, release 1

When a user adds a new group, the application checks if this group exists in Active Directory and displays a warning if the group cannot be found. The user can still add a group with this name.

Implemented in: release 1

2.2. Logs and reports

2.2.1. Improved logging

The Job log contains records about every finished job in Recognition Server. The details pane has two tabs: the Files tab shows the input and output files of the job and the paths to these files, and the Details tab shows detailed information about the job, including Processing notes.

The job log can now contain more than 500 records. The number of records is limited only by the size of the log or by the maximum number of days for which data is logged. These values can be changed in the Job Log Properties dialog box, which can be opened by clicking the Options button.

The Find button allows users to search for records by input or output file name or by error text. Wildcard searches are supported. The job log can be saved to a *.csv file by clicking Export to CSV File on the shortcut menu.


By default, the Job Log provides two views: all jobs without filtering and failed jobs. It is also possible to create custom views by applying a custom filter to the log. To create a custom view, select the corresponding item from the shortcut menu or click the Create Custom View button and specify a view name and filtering settings. The custom log will appear in the tree below the Job Log node.

Implemented in: Arabic edition

2.2.2. Saving information about the operator who edited or rejected the document

The XML result file now contains information about the operator who verified or indexed the document. This information is available in the verificationUserName and indexingUserName fields inside the <XmlResult> and <JobDocument> tags. If indexing and verification are switched off, these fields will remain empty. The XML result file now also contains information about the time of document indexing and verification. The job log contains information about rejected jobs in Processing notes (who rejected a job and on which station).

Implemented in: release 1 for 3A

2.2.3. Correspondence between input and output files

The XML result file allows you to establish a correspondence between the original and the resulting files: in the log, you can see the input and output files for each job.

Changes in the XML result file:

• The attribute "Id" has been added to the <InputFile> tag. This is the identifier of the input file.
• An embedded <Page> tag has been added to the <InputFile> tag. It has the following parameters: Id - the page identifier of the input document; PageNumber - the number of the page in the input file.
• A <Pages> tag with embedded <FileId> and <PageId> tags has been added to the <JobDocument> tag. <FileId> is the input file identifier and <PageId> is the page identifier which indicates the page of the input file that is the origin of the current processed page.
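To trace each processed page back to its source, a script can join the <Page> entries of each <InputFile> with the <FileId>/<PageId> pairs of each <JobDocument>. This sketch uses Python's xml.etree.ElementTree and rests on assumptions the description above does not settle: Id and PageNumber are attributes of <Page>, the FileId and PageId values are element text, and they appear in matching order inside <Pages>. Namespaces, if present, are ignored; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def _local(tag: str) -> str:
    """Drop a namespace prefix, if present: '{ns}Page' -> 'Page'."""
    return tag.rsplit("}", 1)[-1]


def trace_output_pages(xml_text: str) -> List[List[Tuple[str, int]]]:
    """For each <JobDocument>, return (input file Id, input page number)
    for every processed page, in document order."""
    root = ET.fromstring(xml_text)

    # Index the input pages: (file Id, page Id) -> page number in that file.
    page_numbers: Dict[Tuple[str, str], int] = {}
    for elem in root.iter():
        if _local(elem.tag) == "InputFile":
            file_id = elem.get("Id", "")
            for page in elem:
                if _local(page.tag) == "Page":
                    key = (file_id, page.get("Id", ""))
                    page_numbers[key] = int(page.attrib["PageNumber"])

    documents: List[List[Tuple[str, int]]] = []
    for elem in root.iter():
        if _local(elem.tag) != "JobDocument":
            continue
        traced: List[Tuple[str, int]] = []
        for pages in elem:
            if _local(pages.tag) != "Pages":
                continue
            # Assumption: FileId/PageId values line up pairwise, one pair per page.
            file_ids = [(c.text or "").strip() for c in pages if _local(c.tag) == "FileId"]
            page_ids = [(c.text or "").strip() for c in pages if _local(c.tag) == "PageId"]
            for file_id, page_id in zip(file_ids, page_ids):
                traced.append((file_id, page_numbers[(file_id, page_id)]))
        documents.append(traced)
    return documents
```

Keying the index by both file Id and page Id keeps the lookup correct whether page identifiers are unique per job or only per input file.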

Changes in the log:

The log now has a Files tab which shows input and output files for each job.


Implemented in: Arabic edition

2.3. Notifications

2.3.1. Including server and workflow names into the text of notification messages

The server name and the workflow names are now included in the text of the notification messages sent by email to the administrator. This makes it easier to manage servers and workflows and to troubleshoot problems.

The subject of the email message has the following structure (which can be used for filtering the emails):
ABBYY Recognition Server (<Server Name>): <Reason of notification>
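Because the subject follows a fixed pattern, a mail rule or script can extract the server name and the reason with one regular expression. A minimal Python sketch; the function and type names are hypothetical, and the reason texts used in the examples are illustrative rather than the product's exact wording.

```python
import re
from typing import NamedTuple, Optional

# "ABBYY Recognition Server (<Server Name>): <Reason of notification>"
SUBJECT_RE = re.compile(
    r"^ABBYY Recognition Server \((?P<server>[^)]*)\): (?P<reason>.+)$")


class Notification(NamedTuple):
    server: str
    reason: str


def parse_subject(subject: str) -> Optional[Notification]:
    """Split a notification subject into server name and reason,
    or return None if the subject does not follow the pattern."""
    m = SUBJECT_RE.match(subject.strip())
    if m is None:
        return None
    return Notification(m.group("server"), m.group("reason"))
```

A mailbox rule can then, for example, route messages by `parse_subject(subject).server` into one folder per Recognition Server.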

Implemented in: Release 2.

2.3.2. Notification about near license expiry

New notifications warning about approaching license expiry have been added.

Notifications can be sent based on the following event notification options:
• percentage of remaining pages in the license;
• number of days left before the license expires.
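The two criteria can be pictured as a simple check run against the current license state. This is a hypothetical sketch: the LicenseState type, the default thresholds (10% of pages, 14 days) and the message texts are illustrative and are not the product's values.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class LicenseState:
    pages_total: int
    pages_remaining: int
    expires_on: Optional[date]  # None for a license without an expiry date


def expiry_warnings(state: LicenseState, today: date,
                    min_pages_percent: float = 10.0,
                    min_days_left: int = 14) -> List[str]:
    """Return the warnings that the two notification options would trigger."""
    warnings: List[str] = []
    if state.pages_total > 0:
        percent_left = 100.0 * state.pages_remaining / state.pages_total
        if percent_left <= min_pages_percent:
            warnings.append(f"{percent_left:.1f}% of licensed pages remaining")
    if state.expires_on is not None:
        days_left = (state.expires_on - today).days
        if days_left <= min_days_left:
            warnings.append(f"{days_left} day(s) left before the license expires")
    return warnings
```

The checks are independent, so a license that is both nearly used up and close to its expiry date produces two warnings.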

Implemented in: Release 2.



2.4. Job rejection without loss of files

Now it is possible to reject or delete a job without deleting the files.


The new commands Reject Job and Reject All Jobs are used to reject a job or all jobs. The files will be saved to the Exceptions folder of the corresponding workflow. The commands Delete Job and Delete All Jobs are used to delete a job or all jobs. The files will be placed into the Exceptions folder of the server.

Implemented in: release 1 for 3A

2.5. Interface improvements

2.5.1. Main window of Administration Console

The interface of the main window has been changed. New toolbars, panes, and buttons have been added. The order of nodes is also slightly different. The stations are now gathered in the Stations node.

2.5.2. Workflow status pane

The Workflow status pane displays the current state of the selected workflow. The available information depends on the workflow type. The status pane displays the following information:

• State: started or stopped
• Start time
• Stop time (if the workflow was stopped)
• Duration
• Total number of jobs
• Number of processed jobs
• Number of copied files
• Number of failed jobs
• Paths to Output folders
• Path to Exceptions folder

For a Document Library workflow which has been started, the status pane also displays a progress bar with the percentage completed.

For a workflow with errors, the reason for the failure is given in the status pane.

2.6. Soft stop of the workflow processing

It is now possible to stop the processing of jobs using the so-called "soft" stop mechanism. It allows all current jobs to finish processing, while new jobs are not taken into processing. After the results of all current jobs are published, the workflow is stopped. To perform a "soft" stop manually, use the Stop command. If processing runs on a schedule, workflows are always stopped "softly".

If processing must be interrupted and the current jobs must be postponed without completion, use the manual Stop immediately command. It frees the computing power at once. The postponed jobs are finished when the workflow is started again.

Implemented in: Release 2.


3. Workflow settings

3.1. Document Library workflow type

New Recognition Server functionality allows users to process document libraries which should not be modified.

Users no longer need to copy files to the Hot Folder. Instead, they can simply specify the root folder of a document library as the input folder, along with an output folder, an output format, and processing settings. The document library will be recognized and the processed files will appear in the specified output folder. The structure of the original document library will be preserved.

Files which do not require recognition can be skipped, or moved to the output folder if you need to preserve the entire structure of the document library. Unlike with the Hot Folder, the input files will not be deleted. A new workflow type has been created especially for processing document libraries.

The Document Library workflow will be stopped after all files in the indicated library are processed. If the user places new files into the library, they must restart the workflow. As all processed files are registered, only new files will be processed.

If the workflow settings have been changed and all files must be reprocessed, use the Restart command (click the arrow next to the Start button to see the command).
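The register of processed files, together with the Restart command, can be modelled in a few lines. This is a hypothetical sketch of the idea rather than Recognition Server's implementation: it keys files by case-folded path, as Windows paths compare case-insensitively, and detects only new paths, not modified files, since the text above only promises that new files are picked up.

```python
from typing import Iterable, List, Set


class LibraryRegister:
    """Hypothetical model of the register a Document Library workflow keeps
    of processed files, so that a new crawl only picks up new files."""

    def __init__(self) -> None:
        self._processed: Set[str] = set()

    def crawl(self, library_files: Iterable[str]) -> List[str]:
        """Return the files not processed yet and register them."""
        new_files: List[str] = []
        for path in sorted(library_files):
            key = path.lower()  # Windows paths compare case-insensitively
            if key not in self._processed:
                self._processed.add(key)
                new_files.append(path)
        return new_files

    def restart(self) -> None:
        """Forget the register, as the Restart command does, so that the
        next crawl reprocesses every file."""
        self._processed.clear()
```

For example, a second crawl after one new scan arrives returns only that scan, while a crawl after `restart()` returns the whole library again.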

As a document library might be quite big and take a long time to process, workflows of the Document Library type have a progress bar. See Workflow status pane for details.

Implemented in: Arabic edition, modified in release 1 for 3A

3.1.1. Periodical crawling of document libraries

A crawling frequency can now be set up for workflows of the Document Library type to ensure fast processing of new files. To do so, enable the new option Crawl for new files in library every:. The crawling period can be selected from the drop-down list (from 10 minutes to 11 hours) or typed manually, e.g. "2 hours", "12 hours", etc.

After periodical crawling is enabled and the workflow is started, the system monitors the document library and counts down the time until the next crawl.

If the Crawl for new files in library every: option is not enabled, the library is crawled only once. The start time of crawling depends on the Workflow Activity settings (General tab).

Settings of periodical crawling of document libraries can also be specified in the configuration file (Configuration.xml). The parameter EnablePeriodicCrawling enables or disables periodical crawling; the possible values are True and False (the default is False). The parameter CrawlingInterval sets the crawling interval in milliseconds (the default value is 7200000 ms, i.e. 2 hours).
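CrawlingInterval is given in milliseconds, while the GUI accepts values such as "2 hours", so a conversion helper avoids arithmetic slips when editing Configuration.xml by hand. A sketch: the helper names are hypothetical, only minutes and hours are parsed, and the element that carries these attributes is not specified in this document.

```python
import re
from typing import Dict, Optional

_UNITS_MS = {"minute": 60_000, "hour": 3_600_000}


def crawl_interval_ms(text: str) -> int:
    """Convert "10 minutes" or "2 hours" to milliseconds."""
    m = re.fullmatch(r"\s*(\d+)\s*(minute|hour)s?\s*", text, re.IGNORECASE)
    if m is None:
        raise ValueError(f"unrecognized interval: {text!r}")
    return int(m.group(1)) * _UNITS_MS[m.group(2).lower()]


def crawling_attributes(interval_text: Optional[str] = None) -> Dict[str, str]:
    """Return the configuration attributes for the given interval;
    None means periodical crawling stays disabled."""
    if interval_text is None:
        return {"EnablePeriodicCrawling": "False"}
    return {
        "EnablePeriodicCrawling": "True",
        "CrawlingInterval": str(crawl_interval_ms(interval_text)),
    }
```

The documented default follows from the same arithmetic: 2 × 60 × 60 × 1000 = 7200000 ms.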

Implemented in: Release 2.

3.2. Input settings

3.2.1. Processing SharePoint libraries

SharePoint libraries can now be indicated as a source for a Document Library workflow. Users can indicate the input source: a site, a particular library or several libraries, a folder or several folders.

If the Export output files to source library option is enabled when configuring the MS SharePoint input source, the output parameters will always include an output file with the SharePoint source libraries as the export destination. Output files are saved into the same libraries/folders as the input files. The format and naming schema of a file can be configured. By default, the output files are saved under the same names as the input files. If a file already exists, a new version is created.

If the Export output files to source library option is not enabled, then the output settings can be configured as usual, including saving the files into any SharePoint library/folder. Only one library/folder can be selected.

If the same folder or library is indicated as both input and output, files can be overwritten, files with new names can be created, or a new version of the file can be created. The behavior is determined by the option selected from the If file exists drop-down list in the Output Format Settings dialog box. See also Overwriting files in the output folder.


Limitations:
1. If the input library is the same as the output library, the option For each folder cannot be used; you can only create a job for each file.
2. Only one site, including all its libraries, can be processed within one workflow. For child sites, separate workflows should be created.

Implemented in: release 1 for 3A. The possibility to indicate several libraries as input was implemented in Release 2.

3.2.2. Using IFilter for processing PDF files in MS SharePoint

Microsoft Search IFilter for SharePoint 2013 can again be used for indexing PDF files, as Microsoft has lifted its ban. To enable this possibility, the cumulative update package for SharePoint Server 2013 should be installed. Link to install it: http://support2.microsoft.com/default.aspx?scid=kb;EN-US;2882989

Please note: The update for MS SharePoint should be installed before the installation of Recognition Server 4 Release 2. If Recognition Server 4 Release 2 has already been installed, install the update for MS SharePoint, then run the installation of Recognition Server 4 Release 2 again and use the Repair command to modify the installation.

Implemented in: Release 2.

3.2.3. Filtering files for processing and settings for unprocessed files

It is now possible to filter the files to be processed using a "mask" (i.e. a template) for file names. If you specify a name mask, the program will process only files whose names and extensions fit the mask. Files can be selected in the workflow properties: Input tab, Select files to process.

You can use the "?" and "*" symbols in the mask. "?" stands for any single character and "*" stands for any number of any characters. For instance, the mask *.* will select all files, the mask *.tiff will select only files with the ".tiff" extension, and the image*.* mask will select files of all types whose names start with "image".

For workflows of the Hot Folder and Mail types, the default mask is *.*, i.e. all files from the Input folder will be processed. For workflows of the Document Library type, the default mask selects files in all of the supported image formats (*.bmp, *.dib, *.rle, *.dcx, *.djvu, *.djv, *.gif, *.jb2, *.jbig2, *.jp2, *.j2k, *.jpf, *.jpx, *.jpc, *.jpg, *.jpeg, *.pcx, *.pdf, *.png, *.tif, *.tiff, *.wdp, *.wmp). You can specify any other mask that suits your needs. For instance, you may wish to have a mask that processes image files but ignores files with the ".tmp" extension, which may be created in the input folder when scanning documents.

Under Other files, you can specify which actions should be performed on files that do not fit the mask:

• Exceptions folder - Any files that do not fit the mask will be placed into the Exceptions folder. Use this option when only files of certain types must be processed.

• Output folders - Any files that do not fit the mask will be placed into an output folder. Use this option for processing archives where all documents must be preserved together with the folder structure. Processed image files will be converted to images with a text layer and all other files will be copied or moved to an output folder "as is."

• No action - Any files that do not fit the mask will be ignored. Use this option when only files of certain types must be processed. Note: We do not recommend using the No action option for workflows of the Hot Folder type, as this may fill up the folder with unprocessed files.

Note: A separate job is always created for unprocessed files. If the workflow must create one job per folder and a folder contains both processed and unprocessed files, the workflow will create one job for the processed files and another job for the unprocessed files.

1 Read-only folder. The user might need to recreate the structure of the input folder in the output folder. Only images should be processed, and the other files must be moved to the output folder.


The mask option is useful in the following scenarios:

• Hot Folder. Sometimes scanners create *.tmp files besides *.tiff files and place both kinds of files in the same folder. Only *.tiff files should be processed, and the *.tmp files should be ignored.


• Mail. Besides an attached image file, a letter may contain a logo or signature in GIF format. Only the attached image file should be processed, and the GIF logos and signatures should be ignored.
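Masks use the familiar "?" and "*" wildcards, so Python's fnmatch module can model how a file list splits into files to process and "other files" in the scenarios above. A sketch under stated assumptions: names are compared case-insensitively, as on Windows; fnmatch additionally treats [...] as a character class; and unlike the Windows convention, fnmatch requires a literal dot for *.*, so a name without an extension does not match *.* here. The function name is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Tuple


def split_by_mask(file_names: Iterable[str],
                  masks: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split file names into (files to process, other files).

    A name is processed if it fits any of the masks; matching is
    case-insensitive. "Other files" would then go to the Exceptions
    folder, to the output folders, or be left alone, per the chosen option.
    """
    patterns = [m.lower() for m in masks]
    to_process: List[str] = []
    other: List[str] = []
    for name in file_names:
        lowered = name.lower()
        if any(fnmatchcase(lowered, p) for p in patterns):
            to_process.append(name)
        else:
            other.append(name)
    return to_process, other
```

In the Hot Folder scenario, the mask *.tiff keeps the scanner's *.tmp files out of processing; in the Mail scenario, masks for the expected attachment types leave GIF logos and signatures among the other files.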

The input files of failed jobs can now be moved to output folders, moved to the Exceptions folder, or ignored. To tell the program what it should do with failed jobs, use the Save failed jobs to option on the Quality control tab of the Workflow Properties dialog box.

Note: If the user chooses to move unprocessed or failed files to output folders and the workflow contains several output folders, the unprocessed or failed files will appear in all output folders.

Implemented in: Arabic Edition, modified in release 1 for 3A

3.2.4. Using the SSL protocol for data protection

Communicating with a POP3 server over the SSL protocol is now supported. If POP3 E-mail Server is selected as the source type, the option Use SSL becomes available. Port 995 should be specified in the Port number field.
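Before entering the server in the workflow settings, it can help to confirm that the mailbox answers over SSL on port 995. A minimal sketch with Python's standard poplib; the host and credentials are placeholders, and the function is a hypothetical diagnostic, not part of Recognition Server.

```python
import poplib


def count_messages(host: str, user: str, password: str,
                   port: int = poplib.POP3_SSL_PORT) -> int:
    """Connect over SSL (port 995 by default), log in, and return the
    number of messages in the mailbox."""
    mailbox = poplib.POP3_SSL(host, port, timeout=30)
    try:
        mailbox.user(user)
        mailbox.pass_(password)
        count, _size = mailbox.stat()  # (message count, mailbox size in bytes)
        return count
    finally:
        mailbox.quit()
```

A call such as `count_messages("pop.example.com", "scans", "...")` that succeeds shows that the host, port 995 and credentials are usable with the Use SSL option.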

Implemented in: release 1

3.3. Processing settings

3.3.1. Special mode for processing technical drawings

Working with technical drawings such as construction blueprints has been significantly improved. Since the processing of technical drawings requires settings different from those required for regular documents, users should enable the Processing mode for technical drawings option on the 2. Process tab of the Workflow Settings dialog box. It is recommended to enable this mode for documents that contain a lot of fine details. The graphical objects will remain unchanged and the text will be recognized.

Recognition in this mode is done in three directions:
• The direction of the principal orientation, which is automatically detected
• Rotated clockwise relative to the principal orientation
• Rotated counterclockwise relative to the principal orientation

In the XML output file, the orientation of the text will be indicated in the orientation attribute:
• RotatedClockwise
• RotatedCounterclockwise
• If not indicated, the orientation is "normal" (i.e. the text is oriented horizontally)

Note: Using this mode can slow down image processing.

Implemented in: release 1 for 3A

3.3.2. Despeckle images option

The Despeckle option is now available in the product GUI (Workflow properties, 2. Process tab, Advanced Processing Settings). This option removes noise from the image. Noise can be introduced by scanning, and it is recommended that it be removed for better data recognition. During despeckling, the program also removes background dots or boundary lines of raster forms.

By default, the option is switched off, because in some cases it can adversely affect recognition (the program may even fail to recognize some text fragments). We recommend switching the option on only if you are certain that it will help to remove noise from your images (please try it first on several sample images). The corresponding API method is RemoveGarbage.

Implemented in: release 1 for 3A

3.3.3. Set up the color for filling the document edges after deskew

It is now possible to select the color used to fill the document edges ("triangles") left after automatic deskewing of the image. By default, grayscale colors are used and the color intensity is calculated automatically based on the whole image. However, in specific cases it is necessary to define the color manually: black, white, or a custom color. This can now be done by means of a new object added to the workflow parameters in the Configuration.xml file:

<BackgroundColorDetectionParams BackgroundColorDetectionType="Auto" red="255" green="255" blue="255"/>

The BackgroundColorDetectionType parameter can have the following values:

• Auto - default; grayscale colors are used and the color intensity is calculated automatically based on the whole image.
• FillBlack - black will be used to fill the edges.
• FillWhite - white will be used to fill the edges.
• Custom - a custom color can be specified in the RGB color model: red, green, and blue values (0-255) should be specified.

Please note: The red, green, and blue color components are taken into account only if the parameter's value is set to Custom. In other cases, the system ignores these values. For grayscale images, the manually defined color is converted to a grey of the same intensity.
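When the element is generated by script, validating the values first avoids a silently ignored or out-of-range color. A sketch using Python's xml.etree.ElementTree; the function name is hypothetical, and where exactly the element goes inside the workflow parameters is not described here.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

DETECTION_TYPES = ("Auto", "FillBlack", "FillWhite", "Custom")


def background_color_params(detection_type: str = "Auto",
                            rgb: Optional[Tuple[int, int, int]] = None) -> ET.Element:
    """Build a <BackgroundColorDetectionParams> element.

    The red/green/blue attributes only matter for "Custom"; for the other
    types any rgb argument is ignored and 255 is written, matching the
    documented example.
    """
    if detection_type not in DETECTION_TYPES:
        raise ValueError(f"unknown BackgroundColorDetectionType: {detection_type}")
    if detection_type == "Custom":
        if rgb is None:
            raise ValueError("Custom requires a (red, green, blue) tuple")
        if any(not 0 <= c <= 255 for c in rgb):
            raise ValueError("color components must be in the range 0-255")
        red, green, blue = rgb
    else:
        red, green, blue = 255, 255, 255
    return ET.Element("BackgroundColorDetectionParams", {
        "BackgroundColorDetectionType": detection_type,
        "red": str(red), "green": str(green), "blue": str(blue),
    })
```

`ET.tostring(background_color_params("Custom", (200, 230, 255)), encoding="unicode")` then yields the element text to paste into the configuration.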


The location of the Configuration.xml file is: %PROGRAMDATA%\ABBYY Recognition Server 4.0\Configuration.xml. See the Help file for instructions on using the configuration file.

Implemented in: Release 2 Patch 1

3.3.4. Additional fonts

This setting is only available in the configuration file. By default, Recognition Server uses only a limited number of fonts so that the result does not depend on the set of fonts installed on each processing station. These fonts may not be enough to display Chinese, Korean, Japanese, Thai, or Arabic text correctly. To solve this problem, a new parameter, AllowedFontsMode, is available in the RecognitionParams section of the configuration file (Configuration.xml). Possible values are:

• Default - In this mode, only the following fonts will be used: Arial, Times New Roman, and Courier New.

• All - All possible fonts will be used. Please note that processing will take longer. It is also important that the user has the same set of fonts on all the processing stations; otherwise the result might be different on different computers.

Users can also use a custom font set in addition to the main font set. In this case, a list of additional fonts can be added under the RecognitionParams section using the AdditionalAllowedFont element. This example illustrates adding the font AngsanaUPC to the set of main fonts:

<RecognitionParams RecognitionQuality="Fast" LookForBarcodes="true" VerificationMode="AlwaysVerify" RecognitionMode="FullPage" TextExtractionMode="false" AllowedFontsMode="Default">
  <AdditionalAllowedFont>AngsanaUPC</AdditionalAllowedFont>
</RecognitionParams>

Implemented in: release 1
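If several additional fonts are needed, a small script can append the AdditionalAllowedFont elements to a RecognitionParams fragment without creating duplicates. A sketch using Python's xml.etree.ElementTree; it works on a RecognitionParams fragment like the example above rather than on the whole Configuration.xml, whose surrounding structure is not described here, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def add_allowed_fonts(recognition_params_xml: str, fonts: Iterable[str]) -> str:
    """Append an <AdditionalAllowedFont> child for each font not yet listed."""
    params = ET.fromstring(recognition_params_xml)
    if params.tag != "RecognitionParams":
        raise ValueError("expected a <RecognitionParams> element")
    existing = {(e.text or "").strip() for e in params.findall("AdditionalAllowedFont")}
    for font in fonts:
        if font not in existing:
            ET.SubElement(params, "AdditionalAllowedFont").text = font
            existing.add(font)
    return ET.tostring(params, encoding="unicode")
```

Existing attributes such as AllowedFontsMode="Default" are kept, so the added fonts extend the main font set instead of replacing it.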

3.3.5. To speed up processing, text in pictures is not recognized by default

To speed up processing, recognition of text in pictures is now disabled by default. If you need to recognize text in pictures, you can enable this feature in the configuration file; this is only possible in the quality recognition mode. The parameter is named ProhibitHiddenTextDetection, and its default value is true, so setting it to false enables recognition of text in pictures.

Implemented in: release 1

3.3.6. Blank page detection settings

Settings for flexible detection of empty pages have been added. They help to avoid incorrect blank page detection for low-quality images, images with noise left after scanning, images with non-textual objects, etc. The margins, the percentage of blackness, and the objects allowed on a page for it to be considered empty can be specified in the Document Separation parameters.

Implemented in: Release 2.

3.4. PDF processing options

3.4.1. Improved MRC compression method of output PDF files

The quality of output PDF files compressed with the MRC method has been significantly improved. The enhanced MRC compression now delivers noticeably better visual quality while keeping the file size almost as small as before. MRC compression of output files now achieves the same file size reduction and visual quality as our competitors (incl. CVISION). The improved compression method is now used by default in all new and existing workflows with compressed PDF output enabled (Enhanced compression (MRC) option).

To disable the updated MRC method and use the previous compression mode, set the LegacyMRCMode flag to True in the Configuration.xml file of ABBYY Recognition Server.
To manage the quality/size balance of output files, one of the profiles ranging from Max Quality through Balanced to Min Size can be selected. These profiles let you choose the desired output quality/size and configure the settings automatically. For instance, when the Min Size profile is selected, the quality parameter is set to 30% and MRC compression is enabled.

Implemented in: Release 2.

3.4.2. Version, format, and other parameters of an output PDF file

Export settings for PDF and PDF/A have been expanded: it is now possible to specify a version for output PDF files and select a PDF/A standard. The list of available PDF/A standards includes PDF/A-1a, PDF/A-1b, PDF/A-2a, PDF/A-2b, and PDF/A-2u.

Implemented in: release 1 for 3A

3.4.3. Export to PDF/A-3 format

Export of output files to PDF/A-3 format is now supported. The PDF/A-3a, PDF/A-3b, or PDF/A-3u standard can be selected.
Please note: attachments cannot be written into the output PDF/A-3 file.

Implemented in: Release 2.

3.4.4. Tagged PDF enabled by default

When a new output format for saving documents to PDF files is added, the option Enable tagged PDF (compatible with Adobe Acrobat 5.0 or above) is now enabled by default. This helps to avoid excess spaces within words and ensures correct search within the PDF file.
Please note: this option may increase the file size by up to 10%.

Implemented in: Release 2.

3.4.5. Possibility to skip processing PDFs with a text layer

It is now possible to skip processing of PDF files that already have a text layer. If the user selects the option Do not modify files with high-quality text layer, such PDF files are moved to an output folder. The user can also select a detection mode:

• In Fast mode, the application looks for a text layer in the file. If a text layer is detected, the file is moved to an output folder and the other export settings are ignored. The application does not treat the pages in this file as OCRed; however, please note that if output folders with formats other than PDF are also specified, OCR will be performed, which affects the page counter.

• In Thorough mode, the application compares the text layer of a PDF file with OCR results (a piece of text on each page is compared). If the text in the text layer and the text obtained through OCR are identical, the file is moved to an output folder. In this case, the pages are considered OCRed, which affects the page counter.

When a text layer is compared to OCR results, the default threshold is 5%. This means that the program uses the OCR results if the texts differ by more than 5%.

This threshold can be changed in the Configuration.xml file, for example:
SkipRecognizePdfsWithTextLayerCoefficient="25"
This setting is located in the ExportFormat node and appears in the file when output to PDF is set up.

Note:
1. Files skipped in Fast mode will not be sent to operator stages (i.e., indexing or verification).
2. The setting is only applicable to source files in PDF format.

Implemented in: release 1 for 3A
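The threshold rule can be sketched as follows. The function is hypothetical, and the way the "difference between the texts" is measured is an assumption: here it is 100 * (1 - similarity), using the character-matching ratio from Python's difflib. The sketch also assumes that the value of SkipRecognizePdfsWithTextLayerCoefficient is a percentage, which the 5% default and the example value of 25 suggest but the document does not state.

```python
from difflib import SequenceMatcher


def use_ocr_result(text_layer: str, ocr_text: str, threshold_percent: float = 5.0) -> bool:
    """Return True if the OCR result should be used instead of the text layer.

    Assumption: the difference is modelled as 100 * (1 - similarity), where
    similarity is difflib's ratio() of matching characters. Recognition
    Server's own comparison metric is not documented.
    """
    similarity = SequenceMatcher(None, text_layer, ocr_text).ratio()
    difference_percent = (1.0 - similarity) * 100.0
    return difference_percent > threshold_percent


layer = "The quick brown fox"
ocr = "The quick brown fax"
print(use_ocr_result(layer, ocr))                          # True (about 5.3% difference)
print(use_ocr_result(layer, ocr, threshold_percent=25.0))  # False
```

A single differing character in a short sample is already enough to cross the 5% default, which is one reason a larger coefficient such as 25 may be preferred when text layers are expected to be reliable.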

3.4.6. Ability to embed a text layer and keep the image and all PDF file properties

Sometimes PDF files don't have a good text layer but have bookmarks, attachments, or other properties that must be preserved. It is now possible to preserve all attributes of a PDF file and embed only the recognized text. The option Modify text layer only is available on the Format Settings tab for PDF and PDF/A.
Note: The option is only applicable to source files in PDF format.

Implemented in: release 1 for 3A


3.4.7. Enabling and disabling Fast Web View for PDF files

The option Fast Web View is available on the Format Settings tab for PDF and PDF/A. If the option is enabled, the file is saved so that it can be opened quickly on websites.

Implemented in: release 1 for 3A

3.4.8. Using PDF text layer for recognition results improvement

When Recognition Server OCRs PDF files that have a text layer, the source text layer is used to improve the recognition results. For example, characters recognized with low confidence are checked against the text layer and copied from it.
Implemented in: release 1 for 3A

3.4.9. Using PDF text layer for generating quality output files of different formats

If an imported PDF file contains a text layer, it can be reused to create quality output files in PDF and other formats, for example, PDF/A, ALTO XML, etc.
When imported files are OCRed, the original text layer is detected. The quality of each original text character is evaluated before it is copied to the resulting file. This algorithm ensures that the output file has the same or better quality compared to the original file.
Please note that the page counter is decremented even if the original files contain a text layer.

Implemented in: Release 2.


3.5. Output settings

3.5.1. Overwriting files in an output folder

It is now possible to overwrite an output file if it already exists in an output folder. If the option Overwrite if file exist is not selected, a 4-digit index will be added to the file name.

In the XML result file, the attribute RewriteIfFileExists has been added to the <FormatSettings> tag. The value true indicates that the files in the output folder were overwritten.

Implemented in: Arabic edition

When you save output files in a SharePoint library, you have a choice of the following options:

• Create new name - The output file will be given a new name.

• Overwrite file - The output file will replace the original file.

• Use SharePoint versioning options - The output file will replace the original file, and a new version number will be calculated using the current settings of SharePoint versioning.

Implemented in: release 1 for 3A

3.5.2. Export format compatible with FineReader Engine 11

Recognition Server 4 supports export to an internal FineReader format which is compatible with FineReader Engine 11. To export to this format, select FineReader Internal format (*.layout, *.image) as the output format. As a result, two files will be created with the *.layout and *.image extensions.
This feature is useful for complex image processing in FRE: instead of building a distributed system, Recognition Server can be used to create the text layer.

Implemented in: release 1 for 3A

3.5.3. KeepPages parameter

This setting is only available in the configuration file.
The new parameter KeepPages controls page breaks in the doc, rtf, and docx output formats. The parameter is available in the export settings inside the ExportFormat tag of the configuration file (Configuration.xml). Possible values are true and false (the default value is false).
Usage scenario:
The size of a text fragment on a page can decrease if the font size is decreased. To keep the page breaks as in the source document, set the parameter to true; otherwise, content from the beginning of one page may be moved to the preceding page.
In other cases, the size of a text fragment can increase, and if you keep the page breaks, the end of the text fragment from one page may be moved to the following page. In this case, we recommend setting the KeepPages parameter to false.
Implemented in: release 1 for 3A

3.5.4. Export to specific column types in SharePoint

Export of index fields to specific SharePoint column types is now supported:

• Single line of text
• Multiple lines of text
• Choice (menu to choose from)
• Number
• Currency
• Date and Time
• Yes/No (checkbox)
• Hyperlink or Picture
• Managed Metadata

The document attributes (index fields) should be mapped to the appropriate content types imported from the selected SharePoint library.

To configure the mapping, click the Settings button after selecting the SharePoint document library in the output parameters. In the Mapping Document Attributes to SharePoint Columns window, establish the links between the RS document types (created on the Indexing tab) and the SharePoint content types (retrieved from the selected library). After the appropriate SharePoint content type is selected, the RS document attributes (index fields) can be mapped to the SharePoint columns.

Implemented in: Release 2.

3.5.5. Export to ePub3 format

Export of output files to ePub 3 format is now supported.

Implemented in: Release 2.

3.5.6. Unit of measurement settings for export to ALTO XML

A unit of measurement (pixels, inches, or millimeters) can be selected when configuring export to ALTO XML format.

Implemented in: Release 2.
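Selecting a unit changes only how coordinates are expressed. As an illustration, the sketch below converts a pixel coordinate into the measurement units defined by the ALTO standard (pixel, mm10 for tenths of a millimeter, inch1200 for 1/1200 of an inch) at a given scan resolution. The function name is hypothetical, and how Recognition Server's "inches" and "mms" options map onto these ALTO unit names is an assumption.

```python
# Conversion factors from inches to the ALTO MeasurementUnit values.
UNITS_PER_INCH = {
    "inch1200": 1200.0,  # 1/1200 of an inch
    "mm10": 254.0,       # 1/10 of a millimeter: 1 inch = 25.4 mm = 254 tenths
}


def pixels_to_alto_units(pixels: float, dpi: float, unit: str) -> float:
    """Convert a pixel coordinate to an ALTO measurement unit at the given dpi."""
    if unit == "pixel":
        return float(pixels)
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    try:
        factor = UNITS_PER_INCH[unit]
    except KeyError:
        raise ValueError(f"unsupported ALTO unit: {unit}") from None
    return pixels / dpi * factor


# A word box starting 600 px from the left edge of a 300-dpi scan:
for unit in ("pixel", "inch1200", "mm10"):
    print(unit, pixels_to_alto_units(600, 300, unit))
```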


4. Document processing

4.1. Improved recognition of Arabic texts

Recognition Server 4 uses a new version of the OCR technologies in which Arabic OCR has been significantly improved. In addition to technology tests, the performance of Recognition Server 4 was measured on 2,500 pages of Arabic text exported to RTF. This test showed Recognition Server 4 to be 17-20% faster than Recognition Server 3.5.
Implemented in: Arabic edition

4.2. Ability to limit the number of processed pages in input files

In many scenarios that involve searching document libraries, text from the first few pages is sufficient to find a document. In such cases, clients would like to save time and pages in the page counter by limiting processing to the first N pages of each file.
This feature can be switched on for IFilter and GSA connectors in the workflow settings via the GUI. For other workflows, it can only be switched on using the XML ticket.
Example of enabling this feature in the XML ticket:

<XmlTicket PageNumToRecognizeForSingleInputFile="2">
  <InputFile Name="50.pdf" />
  <ExportParams>
    <ExportFormat OutputFileFormat="Text" OutputFlowType="SharedFolder">
      <OutputLocation>D:\Output Folder</OutputLocation>
    </ExportFormat>
  </ExportParams>
</XmlTicket>

This feature works only if there is no document assembly (the option Create one document for each file in job is selected); otherwise, the setting is ignored.
This setting has the following effect:

• Only processed pages are counted, for any output files.
• Processing time is reduced, as only the specified number of pages is processed.
• The setting is ignored if the output format is PDF.
• Output files in text formats contain only N pages.
• Output files in image formats contain all pages, but the page counter is decremented by only N pages for each file.
• If an operator station is included in the processing, all pages can be opened on this station, but only the first N pages are available for indexing and editing. The operator can recognize other pages on the Verification Station if necessary. In this case, the page counter is decremented by the number of recognized pages.

Implemented in: release 1
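Applications that submit XML tickets can generate the ticket shown above instead of hard-coding it. The Python sketch below builds it with the same element and attribute names and adds a simplified model of the page-counter rules listed above. The function names are hypothetical, how the ticket is submitted to Recognition Server is not shown, and pages recognized later on a Verification Station are not modelled.

```python
import xml.etree.ElementTree as ET


def build_page_limited_ticket(input_file: str, output_folder: str, max_pages: int) -> str:
    """Build an XML ticket that limits processing to the first max_pages pages."""
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    ticket = ET.Element("XmlTicket", PageNumToRecognizeForSingleInputFile=str(max_pages))
    ET.SubElement(ticket, "InputFile", Name=input_file)
    export_params = ET.SubElement(ticket, "ExportParams")
    export_format = ET.SubElement(
        export_params, "ExportFormat", OutputFileFormat="Text", OutputFlowType="SharedFolder"
    )
    ET.SubElement(export_format, "OutputLocation").text = output_folder
    return ET.tostring(ticket, encoding="unicode")


def pages_charged(total_pages: int, max_pages: int, pdf_output: bool) -> int:
    """Pages deducted from the page counter for one input file (simplified model).

    The limit is ignored for PDF output; otherwise at most max_pages pages
    are counted, even for image formats that still contain every page.
    """
    if pdf_output:
        return total_pages
    return min(total_pages, max_pages)


print(build_page_limited_ticket("50.pdf", r"D:\Output Folder", 2))
print(pages_charged(total_pages=40, max_pages=2, pdf_output=False))
```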

4.3. Support for a new barcode type: USPS-4CB (Intelligent Mail Barcode)

Extraction of USPS-4CB barcodes, which are used on mail in the USA and are required by the US Postal Service, is now supported.
USPS-4CB (Intelligent Mail Barcode) barcodes can be recognized in documents and can also be selected as the barcode type for document separation in the workflow settings.

Implemented in: Release 2.

4.4. Disabled lossy JBIG2 image compression

Lossy JBIG2 image compression has been removed from the UI and from the internal compression parameters, as it produced low-quality output files.

5. Scanning Station

5.1. Sending registration parameters values to index fields

When scanning a batch, the registration parameters entered for a document can be sent as the values of the document's index fields. The lists of index fields (document types and their attributes) must be preconfigured in the workflow properties (Indexing tab).
At the Scanning Station, in the batch type settings, specify the batch sending parameters: select the desired workflow and import the list of index fields by clicking the Import Registration Parameters button.

When creating a batch, select the desired Batch Type, assemble the documents, and assign the Document Types in the Registration Parameters window.
After the batch is processed in Recognition Server, the documents with pre-filled index field values are shown at the Indexing Station. The indexing stage can be skipped by using the following code in the indexing script: "SkipManualIndexing = true;". In this case, the index field values will be exported according to the workflow settings. The values of document registration parameters can be obtained from the indexing or export script by using the standard Attributes object.

They are also accessible from the XML result file, for example:

<Attributes>
  <DocumentType>Contract</DocumentType>
  <Field Type="SingleLine" IsRequired="false" IsDefined="true">
    <Name>Number</Name>
    <Value>CT5942</Value>
  </Field>
  <Field Type="DateTime" IsRequired="true" IsDefined="true">
    <Name>Date</Name>
    <Value>14.10.2014 00:00:00.000000</Value>
  </Field>
</Attributes>

Please note:

• Only the values of parameters imported from the workflow can be sent to Recognition Server as index fields, even though more registration parameters can be created in the batch type settings at the Scanning Station.

• The types of entered values should match the types of index fields specified in the workflow properties.

Implemented in: Release 2.

6. Verification and Indexing Stations

6.1. Manual selection of documents for verification and indexing

Operators of Verification and Indexing Stations can now select documents manually from the queue. This feature can be very useful if an operator needs to speed up the processing of recently added urgent documents.

The Get documents Automatically button on both stations toggles between manual and automatic modes of receiving the next document.

The Select Document button on both stations opens the Select Document for Verification or Select Document for Indexing dialog box. In this dialog box, the operator can find the required document, sort documents by name, priority, or creation date, and select the document, which will then be opened for verification or indexing.
A new information pane displays the number of documents in the queue and allows starting verification or indexing and selecting documents manually. This pane appears:

• Between tasks in manual mode
• When the connection with the server is lost
• When the current document is returned to the queue because the timeout has been reached


Implemented in: release 1


6.2. Saving documents

It is now possible to save changes in the current document on both Verification and Indexing Stations, so that verification results are not lost if a failure occurs during verification. The results are saved when the operator selects Document > Save or presses Ctrl+S.
When the station is closed during document verification, the operator is asked to save the results. The current document with the saved changes is returned to the server and becomes available to other operators.
On Indexing Stations, results can only be saved after the document type is selected.
Implemented in: release 1

6.3. Inactivity timeout

To prevent documents from remaining indefinitely on operators' stations, documents are returned to the queue after a timeout is reached. In previous versions, the timeout was fixed at 120 minutes. That proved insufficient for verifying large documents (for example, books). A 120-minute timeout is also unsuitable for companies that allow operators to leave the current document open when they go home after work or take a lunch break.
The timeout value can now be changed in the Recognition Server Properties dialog box or in the configuration file Configuration.xml (change the value of OperatorStationInactiveTimeoutInMinutes="120" in the QueueManager node).
Important! This timeout applies to all workflows and to all jobs on Verification and Indexing Stations.
Implemented in: release 1
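For scripted deployments, the configuration-file route could look like the following Python sketch, which sets OperatorStationInactiveTimeoutInMinutes on the QueueManager node. The function name and the Configuration root element in the sample are assumptions, and the document does not say whether the change takes effect without restarting the service; the Recognition Server Properties dialog box remains the documented way to change the value.

```python
import xml.etree.ElementTree as ET

TIMEOUT_ATTRIBUTE = "OperatorStationInactiveTimeoutInMinutes"


def set_operator_timeout(config_xml: str, minutes: int) -> str:
    """Set the operator-station inactivity timeout on every QueueManager node."""
    if minutes <= 0:
        raise ValueError("the timeout must be a positive number of minutes")
    root = ET.fromstring(config_xml)
    nodes = list(root.iter("QueueManager"))
    if not nodes:
        raise LookupError("no QueueManager node found in the configuration")
    for node in nodes:
        node.set(TIMEOUT_ATTRIBUTE, str(minutes))
    return ET.tostring(root, encoding="unicode")


# Hypothetical, simplified configuration; the real file contains many more nodes.
sample = f'<Configuration><QueueManager {TIMEOUT_ATTRIBUTE}="120" /></Configuration>'
print(set_operator_timeout(sample, 8 * 60))  # e.g. a full working day
```

Raising an error when no QueueManager node exists avoids silently writing back a file in which nothing changed.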

6.4. Improved work with document types and index fields on Indexing Stations

6.4.1. Import of index fields from files

The ability to import document types, index fields, and values from an XML or CSV file has been added. This feature is useful if the same field needs to be used in different workflows.
The feature is available on the Indexing tab of the Workflow Properties dialog box (click the Import... button).
Imported files should have the following structure:

• XML


Indexing.xml


• CSV

DocumentType  FieldName  IsObligitary  FieldType      PossibleValues         IsDefault
type1         bbb                      List           Field1;Field2;Field3   TRUE
type1         ccc                      SingleLine     Don't say
type2         test       TRUE          MultipleLines  Do this 1; test twice

There are some other changes on the Indexing tab of the Workflow Properties dialog box:

• The order of document types can be changed using the Up and Down buttons.

• The default document type can be selected using the Default type checkbox.

Implemented in: release 1

6.4.2. Quick input of index fields

When the operator starts typing an index field value, values starting with the typed letters are automatically selected from the list of allowed values.

Implemented in: release 1

6.4.3. Possibility to combine values from several regions into one index field

Several regions can now be used as the source of a single index field value. This is useful, for example, for setting a multi-line text as an index field value.
To combine the values, hold the CTRL key and click the regions that contain the values to be used as a single index field. The values are combined automatically and separated with spaces.

Implemented in: Release 2.


6.5. User interface changes

6.5.1. Verification Station

The main toolbar on the Verification Station has been changed:

• The new Warnings button allows the user to hide/show the warnings pane. The button also displays the number of issued warnings.

• The number of low-confidence characters is displayed on the Check Spelling button.

• The new Select Document button allows selecting documents manually from the verification queue.

• The new Get documents Automatically button allows switching between automatic and manual document selection.

• The Reject All Documents button is hidden; this command is only available in the menu*.

*Note: The Reject All Documents command should be used sparingly because it rejects all documents of the job while the operator works on the current document only. The Reject command returns only the current document to the queue.

Information about the number of documents in the queue is now displayed in the status bar.

6.5.2. Indexing Station

The main toolbar on the Indexing Station has been changed:

• The new Select Document button allows selecting documents manually from the indexing queue.

• The new Get documents Automatically button allows switching between automatic and manual document selection.

• The Reject All Documents button is hidden; this command is only available in the menu*.

*Note: The Reject All Documents command should be used sparingly because it rejects all documents of the job while the operator works on the current document only. The Reject command returns only the current document to the queue.

Information about the number of documents in the queue is now displayed in the status bar, for example: Documents in queue: 0 high of 22

Implemented in: release 1

7. Operating systems

7.1. Support for Windows Server 2012 R2

Recognition Server 4 can be installed and run on Windows Server 2012 R2.
Implemented in: release 1


7.2. Discontinued support for Windows XP and Windows Server 2003

We no longer support Windows XP and Windows Server 2003. Recognition Server 4 cannot be installed on these operating systems.

8. Scripting

8.1. Access to subsequent pages from the document assembly script

The property RecognizedPage: UserProperty can now be used to assemble documents based on the analysis of subsequent pages.
Whether a page belongs to a document can be decided using information from the previous pages. For example, all pages of a document should carry the same ID value.
Implemented in: Release 2.
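The decision logic described above can be modelled outside Recognition Server. The Python sketch below is not a Recognition Server script: the Page class and its extracted_id field are hypothetical, and user_property only plays the role of RecognizedPage: UserProperty by carrying each page's ID forward so that the next page can be compared with it.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical stand-in for a recognized page."""
    number: int
    extracted_id: str        # ID value read from the page (hypothetical field)
    user_property: str = ""  # plays the role of RecognizedPage: UserProperty


def assemble_by_id(pages: list[Page]) -> list[list[int]]:
    """Group consecutive pages that carry the same ID into one document each."""
    documents: list[list[int]] = []
    previous = None
    for page in pages:
        # Store the ID on the page so that the next page can compare against it.
        page.user_property = page.extracted_id
        if previous is None or previous.user_property != page.user_property:
            documents.append([])  # the ID changed, so this page starts a new document
        documents[-1].append(page.number)
        previous = page
    return documents


pages = [Page(1, "A"), Page(2, "A"), Page(3, "B"), Page(4, "B"), Page(5, "C")]
print(assemble_by_id(pages))  # [[1, 2], [3, 4], [5]]
```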

8.2. Detecting the workflow name by script

A new property, RecognizedPage: WorkflowName, has been added to the page object. It returns the name of the workflow in which the page is being processed.
This makes it possible to copy scripts to several workflows without manual modifications.
Implemented in: Release 2.

9. Changes in the COM-based API and Web API

9.1. Namespace changes

The namespace of the COM API has been changed from ABBYYRecognitionServer3 to ABBYYRecognitionServer.
The namespace of the Web API has been changed from RSSoapService3 to RSSoapService.

Implemented in: Arabic edition

9.2. Compatible API

By default, the API is not fully compatible with the previous version, which allows Recognition Server 4 to be installed and run on the same computer where a previous version is installed.
If you need a fully compatible API without recompiling your applications, you can achieve this by following simple instructions. This feature is available by request only; please contact ABBYY HQ for the instructions.

9.3. Automatic API deployment on 64-bit operating systems

Both the Web API and the COM API are automatically deployed by the installer on 64-bit operating systems without any additional manual setup.

9.4. Added objects

The goal of adding new objects to the API is to support these scenarios:
1. Establishing a correspondence between input and output files.
2. Deleting jobs after asynchronous processing (for Ricoh).
3. Setting up the recognition service when Recognition Server is accessed and its settings are changed by a user working on the same computer (for NLC).


9.4.1. Correspondence between input and output files

The following objects have been added to the COM-based and Web-based API to support establishing a correspondence between input and output files.

InputFile
This object represents one input image file and the results of processing this file.
Properties:

Name    Type               Description
Pages   Pages, read-only   Returns a collection of pages of the input file.
ID      String, read-only  Unique identifier of the input file generated by RS.

Pages
This object represents a collection of Page objects.

Page
This object represents a page of the input file. This is a child object of InputFile.
Properties:

Name    Type               Description
ID      String, read-only  Unique page identifier generated by RS.
Number  String, read-only  Page number in the input file.

JobDocument
This object represents one output document.
Properties:

Name           Type                      Description
PagePositions  PagePositions, read-only  Returns a collection of pages of the output document with information about the position of each page in the input file.

PagePositions
This object represents a collection of PagePosition objects.

PagePosition
This object represents a page in the output document and information about the position of this page in the input file. This is a child object of JobDocument.
Properties:

Name    Type               Description
FileId  String, read-only  ID of the input file to which the page belongs.
PageId  String, read-only  ID of the page in that input file.

Implemented in: Arabic edition

9.4.2. Support of the recognition service scenario (for NLC)

In this scenario, Recognition Server works as a service which is almost invisible to the user and is called if documents processed with NLC are in an image file format and must be recognized first.
Recognition Server is installed silently and uses the default workflow. However, its settings are available on the same computer, and the user can change them. In this situation, it is necessary to be able to check whether the job can be processed and to cancel the job if it cannot be processed at the moment.

With the new API methods, you can now:

• Check whether the workflow is started or stopped
• Check whether there is a connection with the server
• Check whether indexing and/or verification is switched on in the workflow, and change the indexing or verification settings

The following members have been added to the COM-based API.
It is now possible to get the state of a workflow: the property WorkflowState has been added to the IWorkflow interface.

Interface  Name           Type                          Description
IWorkflow  WorkflowState  WorkflowStateEnum, read-only  Returns the current state of the workflow.

WorkflowStateEnum is a constant enumeration that defines the possible workflow states.

Name | Description
WS_ApplyingSettings | The state of a workflow after it has been started and before processing has begun. At this stage, the program checks whether it can access the folder that contains the input documents. This state is very short and is not shown in the console (the word "Starting" is displayed instead).
WS_Crawling | At this stage, the program checks the folders of the Document Library workflow. It counts the files, adds them to the database, and prepares to process them. The word "Crawling" is displayed in the console.
WS_Finishing | The state of a workflow when processing is coming to an end. At this stage, the program writes the files for the last time and finishes publishing large files. The words "Finishing Processing" are displayed in the console.
WS_NotAvailable | The state of a workflow that cannot be accessed. The words "Not Available" are displayed in the console, together with the reason why the workflow cannot be accessed.
WS_Processing | The main state of a workflow, in which files are received, processed, and recognized. The word "Processing" is displayed in the console.
WS_StartingProcess | The state of a workflow after the start command has been executed and before information about the start of processing has been returned. The word "Starting" is displayed in the console.
WS_Suspended | The state of a workflow that has been stopped. The word "Stopped" is displayed in the console.
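As an illustration of how a client might present WorkflowState values, the Python sketch below mirrors the WorkflowStateEnum table above and maps each state to the text the console shows. It is not the COM type itself: the member values are placeholders because the real numeric constants are not documented here, and the helper is_running is a hypothetical convenience that assumes every state other than WS_Suspended and WS_NotAvailable counts as "started".

```python
from enum import Enum, auto


class WorkflowState(Enum):
    """Python mirror of WorkflowStateEnum; member values are placeholders,
    not the real COM constants."""
    WS_ApplyingSettings = auto()
    WS_Crawling = auto()
    WS_Finishing = auto()
    WS_NotAvailable = auto()
    WS_Processing = auto()
    WS_StartingProcess = auto()
    WS_Suspended = auto()


# Text displayed in the console for each state, per the table above.
CONSOLE_LABEL = {
    WorkflowState.WS_ApplyingSettings: "Starting",  # too brief to show on its own
    WorkflowState.WS_Crawling: "Crawling",
    WorkflowState.WS_Finishing: "Finishing Processing",
    WorkflowState.WS_NotAvailable: "Not Available",
    WorkflowState.WS_Processing: "Processing",
    WorkflowState.WS_StartingProcess: "Starting",
    WorkflowState.WS_Suspended: "Stopped",
}


def is_running(state: WorkflowState) -> bool:
    """Assumption: any state except Suspended or NotAvailable means 'started'."""
    return state not in (WorkflowState.WS_Suspended, WorkflowState.WS_NotAvailable)
```

Keeping the labels in a dictionary rather than in the enum makes it easy to check that every state has a console label.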

Besides workflow states, it is also possible to get the state of the server.

Interface | Name | Description
IClient | Connect(string serverName) | Establishes a connection with the server. If the server is stopped, a COMException is thrown with this text: "ABBYY Recognition Server is not available: The client has successfully connected to the server, but the server is not running."
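A client usually needs to tell the "server is stopped" case apart from other connection failures. The Python sketch below shows one way to do that by matching the documented exception text. It is an assumption-laden illustration: connect stands in for IClient.Connect, any Python exception plays the role of the COMException, and the function name connect_or_report is hypothetical.

```python
# Documented message fragment raised when the server is stopped.
SERVER_STOPPED_TEXT = (
    "The client has successfully connected to the server, "
    "but the server is not running."
)


def connect_or_report(connect, server_name):
    """Call a Connect-like function and classify the documented failure.

    `connect` is a hypothetical stand-in for IClient.Connect; any exception
    it raises plays the role of the COMException described above.
    """
    try:
        connect(server_name)
    except Exception as exc:
        if SERVER_STOPPED_TEXT in str(exc):
            return "server stopped"
        raise  # some other failure: let the caller handle it
    return "connected"
```

Matching on message text is fragile (it breaks if the wording is localized), so a real client may prefer an error code if one is available.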

A method that deletes a job together with all of its images has been added to the IClient interface.

Interface | Name | Description
IClient | DeleteJob(string jobId) | Deletes a job with all of its images.

It is now possible to get the server's Exceptions folder via the IClient interface.

Interface | Name | Type | Description
IClient | ServerExceptionsFolder | string, read-only | Returns the server's Exceptions folder.


9.4.3. Deleting of jobs

10. UI and Documentation localization

Localization of ABBYY Recognition Server 4 is done according to the table below.

It is now possible to switch verification on or off using IXmlTicket.

Interface | Name | Type | Description
IRecognitionParams | VerificationMode | VerificationModeEnum | Specifies the verification type, that is, whether verification will be performed.
IRecognitionParams | VerificationModeThreshold | double | Sets the verification threshold.

VerificationModeEnum is a constant enumeration that defines the verification types.

Name | Description
DVM_DoNotVerify | Verification is switched off.
DVM_VerifyAlways | Documents are always verified.
DVM_VerifyIfThresholdExceeded | Documents in which the number of low-confidence characters exceeds the threshold (VerificationModeThreshold) are verified.

Implemented in: release 1
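The three verification modes reduce to a simple decision rule. The Python sketch below restates it. The enum values are placeholders, the function name needs_verification is hypothetical, and it assumes VerificationModeThreshold is compared directly with the count of low-confidence characters; the server may interpret the threshold differently (for example, as a percentage).

```python
from enum import Enum


class VerificationMode(Enum):
    """Python mirror of VerificationModeEnum (placeholder values)."""
    DVM_DoNotVerify = "DoNotVerify"
    DVM_VerifyAlways = "VerifyAlways"
    DVM_VerifyIfThresholdExceeded = "VerifyIfThresholdExceeded"


def needs_verification(mode, low_confidence_chars, threshold):
    """Decide whether a document should be sent to verification.

    Assumes the threshold is compared directly with the number of
    low-confidence characters in the document.
    """
    if mode is VerificationMode.DVM_DoNotVerify:
        return False
    if mode is VerificationMode.DVM_VerifyAlways:
        return True
    # DVM_VerifyIfThresholdExceeded: verify only when strictly above the threshold.
    return low_confidence_chars > threshold
```

With a threshold of 10.0, a document with 11 low-confidence characters is verified, while one with exactly 10 is not.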

The following method has been added to the COM-based and Web-based API so that a job can be deleted after asynchronous processing. It deletes a job together with all of its images and has been added to the IClient interface.

Interface | Name | Description
IClient | DeleteJob(string jobId) | Deletes a job with all of its images.

Implemented in: release 1


Implemented in: Release 1 Multilingual, Release 2.

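Component | English | Russian | French | German | Italian | Spanish | Chinese | Portuguese (Brazil) | Czech | Hungarian | Polish
Resources: Console | + | + | + | + | + | + | + | + | + | + | +
Resources: Indexing Station | + | + | + | + | + | + | + | + | + | + | +
Resources: Verification Station | + | + | + | + | + | + | + | + | + | + | +
Resources: Scanning Station | + | + | + | + | + | + | + | + | + | + | +
Resources: Protection | + | + | + | + | + | + | + | + | + | + | +
Help: Console | + | + | + | + | + | + | - | - | - | - | -
Help: Indexing Station | + | + | + | + | + | + | - | - | - | - | -
Help: Verification Station | + | + | + | + | + | + | - | - | - | - | -
Help: Scanning Station | + | + | + | + | - | - | - | - | - | - | -
Help: Open API | + | - | - | - | - | - | - | - | - | - | -
Help: Admin Guide | + | + | + | + | + | + | - | - | - | - | -
EULA | + | + | + | + | + | + | + | + | + | + | +
Installer: Recognition Server | + | + | + | + | + | + | + | + | + | + | +
Installer: IFilter | + | + | + | + | + | + | + | + | + | + | +
Installer: Autorun | + | + | + | + | + | + | + | + | + | + | +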


Recommended