ABBYY FlexiCapture & Designing a High-Performance System at #ABBYYSummit16

Date post: 08-Feb-2017
Page 1: ABBYY FlexiCapture & Designing a High-Performance System at #ABBYYSummit16

ABBYY Technology Summit 2016

ABBYY NAHQ, 2016

Pierre van der Westhuizen

© ABBYY Confidential

Introduction: Designing High Performance Systems

• Scale up FlexiCapture, identify bottlenecks, and optimize system performance

• Performance Metrics and testing

• FlexiCapture Performance at a glance

– Scaling

– Up to 3 million pages per day

– High Fault Tolerance and Availability

Agenda: Designing High Performance Systems

• Introduction

• *Architecture of FlexiCapture

• Component Interaction Walkthrough

• Defining Performance Metrics

• Scaling of Systems (Demo, Medium, Large)

• *Optimizing: Processing Stations, Scanning Stations and Workflow

• Optimal Values of and Limitations on System Performance

• System Monitoring and Bottleneck Detection

• Performance testing

• *Improving your current system

Architecture

Architecture – Server Side

• Application Level

– Application Server

– Licensing Server

• Processing Level

– Processing Server

• Data Storage

– Database

– File Storage

Architecture – Client Side

• User Stations

– Scanning

– Verification

• Processing Stations

• Administration/Monitoring Web Console

• Project Setup Station

Component Interaction

Performance Metrics

Defining Performance Metrics

We measure performance in volumes processed per period of time.

Define target performance using performance metrics:

• The required processing time

• Processing volumes

Parameters that shape workload

• Average batch size in pages

• Image color mode: color, grayscale, black-and-white

• Pages per day (i.e. 24 hours), average/peak

• Pages per hour, average/peak

• Average document size in pages

• Number of scanning operators

• Number of verification operators

• Document storage time

SCALING

Demo System

Medium System

Medium System: Application Server

1. Fast network

2. Fast connection to FileStorage and Database

3. Fast CPU

4. RAM

Medium System: Processing, Licensing Servers

For redundancy, see the FlexiCapture System Administrator Guide.

Medium System: Database Server

• More RAM

• Fast HDD

• Avoid Mirroring

• Separate Data and Logs

• Index Updates

Medium System: File Storage

Read-write and capacity requirements depend on:

• Average and peak number of pages processed per day. Speed required for 10,000 pages/hr: 10,000 / 3,600 ≈ 2.8 pages/sec, and 2.8 x 3 MB = 8.4 MB/sec

• Amount of time that documents are stored, e.g. 16 x 100,000 grayscale images x 3 MB (average file size for a grayscale image) = 4.8 TB of data

NOTE: We strongly recommend RAID 10 or a comparable configuration
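
The File Storage arithmetic above can be redone for other volumes with a short Python sketch. The function names are illustrative and not part of FlexiCapture; decimal units (1 TB = 1,000,000 MB) are assumed because they reproduce the 4.8 TB figure on the slide.

```python
def required_throughput_mb_per_sec(pages_per_hour: float, avg_page_mb: float) -> float:
    """Sustained write speed needed to keep up with the incoming pages."""
    pages_per_sec = pages_per_hour / 3600
    return pages_per_sec * avg_page_mb


def required_capacity_tb(stored_pages: int, avg_page_mb: float) -> float:
    """Storage needed for all pages kept in the system (decimal units assumed)."""
    return stored_pages * avg_page_mb / 1_000_000


if __name__ == "__main__":
    # Slide example: 10,000 pages/hr at 3 MB per grayscale page.
    print(round(required_throughput_mb_per_sec(10_000, 3), 1), "MB/sec")
    # Slide example: 16 x 100,000 stored grayscale images at 3 MB each.
    print(required_capacity_tb(16 * 100_000, 3), "TB")
```

Without the intermediate rounding to 2.8 pages/sec, the throughput comes out slightly below the slide's 8.4 MB/sec; for sizing, the peak hourly volume is the safer input than the daily average.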

Large System

Optimizing Processing Stations

Processing Stations

• Tune each station

• Add more stations

Processing Station: Hardware

• 1 process per core

• 16 cores max

• 1 GB RAM per core

Processing speed depends heavily on CPU speed and hard disk read-write speed.

Processing Station: TEMP Folder

Scenario: 100 page batches on 8-core Station

• 100 pages x 3 MB = 300 MB

• 8 cores means 8 simultaneous executive processes

• TEMP folder space required is 300 MB x 8 = 2.4 GB

• Allocate 2 GB per core + 2.4 GB for TEMP: 2 GB x 8 + 2.4 GB = 18.4 GB RAM
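
The TEMP folder scenario can be parameterized as below. This is a sketch under two assumptions: the TEMP folder is held in RAM (which is what adding it to the RAM total implies), and 1 GB is counted as 1,000 MB as on the slide. The function names are illustrative.

```python
def temp_space_gb(batch_pages: int, page_mb: float, cores: int) -> float:
    """TEMP space if every core works on one full batch at the same time."""
    return batch_pages * page_mb * cores / 1000


def station_ram_gb(batch_pages: int, page_mb: float, cores: int,
                   ram_per_core_gb: float = 2.0) -> float:
    """Per-core RAM allowance plus the TEMP space (assumed to live in RAM)."""
    return ram_per_core_gb * cores + temp_space_gb(batch_pages, page_mb, cores)


if __name__ == "__main__":
    # Slide scenario: 100-page batches, 3 MB per page, 8-core station.
    print(temp_space_gb(100, 3, 8), "GB TEMP")
    print(round(station_ram_gb(100, 3, 8), 1), "GB RAM")
```

Larger batches or color images raise the TEMP requirement linearly, so the same station may need considerably more memory for a different project.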

Calculate Number of Processing Stations

Estimate the required number of processing cores

Measure how long it takes one core to process one batch

Test setup: 8-core Processing Station

Process:

1. Create 24 copies of a typical batch

2. Put all batches in the FlexiCapture hotfolder

3. Start the timer when the first import task is created

4. Stop the timer after the last result has been exported to the backend

Result: 15 minutes elapsed

• Each core has processed 3 batches

• Time to process 1 batch is about 5 minutes

• If a batch has 69 pages, it takes about 4.35 seconds to process 1 page (300 seconds / 69 pages)
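
The per-page time can be derived from the benchmark figures with a few lines of Python. The sketch assumes the batches are spread evenly across the cores, as in the slide example; the function name is illustrative.

```python
def seconds_per_page(elapsed_minutes: float, batches: int, cores: int,
                     pages_per_batch: int) -> float:
    """Average time one core needs for one page, from a timed test run."""
    batches_per_core = batches / cores                  # 24 / 8 = 3
    seconds_per_batch = elapsed_minutes * 60 / batches_per_core
    return seconds_per_batch / pages_per_batch


if __name__ == "__main__":
    # Slide example: 24 batches of 69 pages on 8 cores in 15 minutes.
    print(round(seconds_per_page(15, 24, 8, 69), 2), "seconds per page")
```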

Estimate the required number of processing cores (continued)

Estimate desired number of cores

Assume you need to process P pages in T time. From the measurement above, 1 core needs t time to process 1 page. Hence, you need N = (P x t) / T cores.

Example: 200,000 pages in 8 hours = 28,800 seconds

• 1 core takes 4.35 seconds to process 1 page

• 200,000 x 4.35 / 28,800 ≈ 30.2, rounded up to 31 cores

• 4 Processing Stations with 8 cores each (32 cores in total) will be sufficient
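
The formula N = (P x t) / T translates directly into code. The sketch below rounds up to whole cores and whole stations; the function names are illustrative.

```python
import math


def required_cores(pages: int, seconds_per_page: float, window_seconds: float) -> int:
    """N = (P x t) / T, rounded up to a whole number of cores."""
    return math.ceil(pages * seconds_per_page / window_seconds)


def required_stations(cores: int, cores_per_station: int) -> int:
    """Number of identical Processing Stations needed to supply the cores."""
    return math.ceil(cores / cores_per_station)


if __name__ == "__main__":
    # Slide example: 200,000 pages in 8 hours at 4.35 seconds per page.
    cores = required_cores(200_000, 4.35, 8 * 3600)
    print(cores, "cores,", required_stations(cores, 8), "stations")
```

The result carries no headroom for peaks or failed stations, so it is a lower bound rather than a recommended size.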

Processing Cores – Limiting Factors

• The total load on the infrastructure that may result in bottlenecks

– Server Hardware

– Network

– Shared Resources (Database, External Services)

• The number of processing cores that can be served by the Processing Server

– Max 120 cores

Monitor Free Processing Cores on Processing Server

Optimizing: Scanning Stations

Scanning Stations

• Performance Limits

– Scanner Speed

– Data Transfer Bandwidth

• Separate Network Interface for Scanning

• Set up scan settings

– Color Mode

– Remove Blanks

• Schedule Image Uploads

Optimizing: Workflow

Workflow

• Avoid too many stages

• The slowest stage limits the performance

• Do not produce tasks that are too small when parallelizing processing at a stage
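
The point that the slowest stage limits performance can be illustrated with a small sketch. The stage names and hourly rates below are invented for the example and are not measurements from any FlexiCapture installation.

```python
def workflow_bottleneck(stage_rates: dict[str, float]) -> tuple[str, float]:
    """Return the slowest stage and its rate, which caps the whole workflow."""
    slowest = min(stage_rates, key=stage_rates.get)
    return slowest, stage_rates[slowest]


if __name__ == "__main__":
    # Hypothetical pages-per-hour capacity of each workflow stage.
    rates = {
        "import": 12_000,
        "recognition": 9_000,
        "verification": 4_000,
        "export": 15_000,
    }
    stage, rate = workflow_bottleneck(rates)
    print(f"Workflow is limited to {rate} pages/hr by the {stage} stage")
```

Adding capacity to any stage other than the one returned here would leave overall throughput unchanged.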

Optimal Values and Limits

Optimal Values of and Limitations on the System Performance 1

Factor: System performance in pages per 24 hours

Optimal values & limitations, by system size:

• Demo: able to process up to 20,000 black-and-white or up to 1,000 color pages per 24 hours

• Medium: up to 1 million black-and-white or up to 300,000 color pages per 24 hours, using a farm of regular computers

• Large: up to 3 million black-and-white or up to 1 million color pages per 24 hours

Optimal Values of and Limitations on the System Performance 2

Factor: optimal values & limitations

• Number of scanning operators: FlexiCapture is able to host 1,000 scanning operators.

• Number of verification operators: FlexiCapture is able to host 300 verification operators.

• Number of Processing Stations: we used up to 120 cores in total for all Processing Stations.

• Number of cores per Processing Station: regular disk drive, up to 8 cores; fast disk drive, up to 16 cores; RAM drive, up to 32 cores.
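
The per-station and total core figures can be combined into a simple planning check. In this sketch the disk-type keys and the function name are illustrative, and the 120-core total is treated as a hard ceiling, which is an assumption: the slide only says that up to 120 cores were used.

```python
import math

# Cores per Processing Station by TEMP disk type, from the slide.
MAX_CORES_PER_STATION = {"regular": 8, "fast": 16, "ram": 32}
# Treated as a hard ceiling for this sketch (assumption).
MAX_TOTAL_CORES = 120


def plan_stations(required_cores: int, disk: str) -> int:
    """Stations needed for the required cores, given the disk type."""
    if required_cores > MAX_TOTAL_CORES:
        raise ValueError(
            f"{required_cores} cores exceeds the {MAX_TOTAL_CORES}-core total"
        )
    return math.ceil(required_cores / MAX_CORES_PER_STATION[disk])


if __name__ == "__main__":
    for disk in MAX_CORES_PER_STATION:
        print(disk, plan_stations(31, disk))
```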

Optimal Values of and Limitations on the System Performance 3

Factor: optimal values & limitations

• Number of pages in a Batch: optimal value is from 10 to 1,000 pages in a batch

• Number of pages in a Document: optimal value is up to 100 pages in a document

• Number of pages, documents, and batches in the system: this depends heavily on the hardware used. For a Large configuration, up to 100,000 batches, or 1 million documents, or 10 million pages is normal.

Optimal Values of and Limitations on the System Performance 4

Factor: optimal values & limitations

• Data storage time: typically, pages, documents, batches and event log records are stored in the system for up to 2 weeks. Statistics for reporting can be stored for years with no impact on performance.

Performance Testing

• Single Entry Point Project

– Import from Scanning Station

– Pre-processing

– Recognition

– Export

– Processed

System Monitoring and Bottleneck Detection

• Document processing monitoring via the Administration and Monitoring Console

• Hardware monitoring for each FlexiCapture server component using various Windows Performance Monitor counters:

– Memory

– CPU

– Hard Disk

– Network

– IIS

– SQL Server

Setting up Performance Counters

• Monitor FlexiCapture state and search for bottlenecks

– Performance Monitor utility

• Recorded by Processing Server

• mmc /32 perfmon.msc

• Enable Counters
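
For unattended logging, the hardware counters can also be sampled from a script. The sketch below is not the procedure from the presentation, which uses the Performance Monitor console; it swaps in the Windows `typeperf` command-line tool and a handful of generic Windows counters. FlexiCapture, IIS and SQL Server counters are left out because their names depend on the installation, and the output file name is a placeholder.

```python
import platform
import subprocess

# Generic Windows counters for memory, CPU, disk and network.
COUNTERS = [
    r"\Memory\Available MBytes",
    r"\Processor(_Total)\% Processor Time",
    r"\PhysicalDisk(_Total)\Avg. Disk Queue Length",
    r"\Network Interface(*)\Bytes Total/sec",
]


def build_typeperf_command(counters: list[str], interval_sec: int = 5,
                           samples: int = 12,
                           out_file: str = "counters.csv") -> list[str]:
    """Build a typeperf call that writes the samples to a CSV file."""
    return [
        "typeperf", *counters,
        "-si", str(interval_sec),   # seconds between samples
        "-sc", str(samples),        # number of samples to collect
        "-f", "CSV",
        "-o", out_file,
        "-y",                       # overwrite the output file without asking
    ]


if __name__ == "__main__" and platform.system() == "Windows":
    subprocess.run(build_typeperf_command(COUNTERS), check=True)
```

Running the same collection on each server role during a load test makes it easier to see which machine saturates first.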

Improving Your Current System

• Separate your Servers

– App Server

– Processing Server/License Server

– Processing Station

– Database Server

• Database

– More RAM is better (at least 4 GB)

– Fast Drives

– Data and Logs on Separate Drives

– Autogrowth – 100MB Increments

– Simple Recovery Model

– Maintenance Plans

  • Backups (Truncate Transaction Log)

  • Indexes

• Application Server

– IIS Logs

– Caching

– Fast Hard Drive

– App Server Recycling Pool

– Number of Threads

– 2 NICs 1GB/s

Consider Load Balancing

• File Storage

– Disable Search indexing and anti-virus scanning of FileStorage

– Do not store images in SQL database

– 1GB/s access

• Batches

– Purging (2 weeks or less)

– Limit big batches (fewer than 100 pages per batch)

• Input and Output

– Put input directory on Server

– Separate location for Export and Hotfolders

• Networking

– Network Speed

– Switching

– VLANs

– Network Interfaces

Final Summary

• Architecture

– Medium System

• How to Optimize Processing Stations

– RAM

• Improving your current system

– Separate your Servers!

Questions
