FineReader Engine Overview & New Features in V10

Semyon SerguninABBYY HeadquartersSeptember 2010

Michael FuchsABBYY Europe GmbHSeptember 2010

FineReader Engine – Processing Steps

Step 1: Image/Document Input

Step 2: Image Pre-processing Algorithms

Step 3: Document & Layout Analysis

Step 4: Recognition

Step 5: Verification of the Recognition Results

Step 6: Synthesis & Export

Step 1: Image Input

Step 1. Input: Opening existing images

Load images from disc or memory:
● BMP, PCX, DCX, GIF, PNG, DjVu
● JPEG and JPEG2000 (part 1)
● TIFF
  ● B&W (uncompressed, CCITT3, CCITT3FAX, CCITT4, PackBits, ZIP, LZW)
  ● Grayscale (uncompressed, PackBits, JPEG, ZIP, LZW)
  ● Colour (uncompressed, JPEG, ZIP, LZW)
● PDF
  ● Adobe PDF Library 9.0
  ● Access to internal data (Metadata, Annotations, Text Objects, etc.)

Memory image formats: Raw, Bitmap [HBITMAP], DIB
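When images arrive as in-memory buffers rather than files with extensions, an application usually has to recognise the format from the first bytes. The sketch below is a hypothetical helper (`sniff_format` is not part of FineReader Engine) that checks the publicly documented signatures of several of the formats listed above. It says nothing about how the engine itself detects formats.

```python
# Hypothetical helper: guess an image format from the first bytes of a buffer.
# The signatures are the published "magic numbers" of each file format.
_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JPEG2000"),  # JP2 signature box
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF"),          # little-endian TIFF
    (b"MM\x00*", "TIFF"),          # big-endian TIFF
    (b"%PDF-", "PDF"),
    (b"AT&TFORM", "DjVu"),
    (b"\xb1\x68\xde\x3a", "DCX"),  # 0x3ADE68B1 stored little-endian
    (b"BM", "BMP"),
]


def sniff_format(data: bytes) -> str:
    """Return a format name for an image buffer, or 'unknown'."""
    for magic, name in _SIGNATURES:
        if data.startswith(magic):
            return name
    # PCX has only a weak signature (manufacturer byte 0x0A followed by a
    # version byte 0, 2, 3, 4 or 5), so it is checked last.
    if len(data) >= 2 and data[0] == 0x0A and data[1] in (0, 2, 3, 4, 5):
        return "PCX"
    return "unknown"
```

In use, reading the first 16 bytes (`sniff_format(f.read(16))`) covers every signature in the table.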

Load images from digital cameras: advanced image pre-processing algorithms are available in FRE!

Screenshot Reader: capture any area of the screen, in any format (including Flash)

Step 1. Input: Scanning documents (TWAIN)

Scanning via the TWAIN interface:
● ADF (Automatic Document Feeder)
● Manual paper feeder
● Scanner settings:
  ● Brightness
  ● Colour
  ● Resolution
  ● Image compression
  ● Define scanning area (zone)
  ● Simplex / Duplex
  ● Orientation / automatic rotation / manual rotation
  ● Paper format
  ● Paper Top/Bottom/Left/Right
  ● Etc.

Visual Component:

Alternatively, the original dialogue of the scanner driver can be used.

Step 2: Image Pre-processing

Step 2. Image pre-processing: Available Options

● Noise removal
● Despeckling
● Scaling (i.e. interpolation of low-resolution images)
● Rotation (90°, 180° and 270°)
● Automatic deskewing
● Automatic image splitting
● Straightening of text lines
● Cropping
● Automatic rotation
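Despeckling on a black-and-white image is commonly done by deleting connected groups of black pixels that are too small to be characters. The sketch below shows that generic technique under stated assumptions: the image is a list of rows of 0/1 values (1 = black), pixels are 8-connected, and `min_size` is an illustrative threshold. It is not ABBYY's implementation.

```python
from collections import deque


def despeckle(img, min_size=3):
    """Remove black (1) connected components smaller than min_size pixels.

    img is a list of lists of 0/1 values (1 = black); 8-connectivity is used.
    Returns a new image and leaves the input unchanged.
    """
    h = len(img)
    w = len(img[0]) if h else 0
    out = [row[:] for row in img]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search collects one connected component.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Components below the size threshold are treated as specks.
            if len(component) < min_size:
                for cy, cx in component:
                    out[cy][cx] = 0
    return out
```

The threshold trades noise removal against losing genuine small marks such as dots on "i" or full stops, which is why real engines combine size with other cues.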

Step 2. Image pre-processing: Binarisation Overview

Intelligent background filtering

Adaptive Binarisation
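Adaptive binarisation picks a threshold per pixel from its neighbourhood instead of using one global threshold, so uneven lighting or tinted backgrounds do not swallow the text. The sketch below uses a classic local-mean rule in the spirit of Bradley and Roth's integral-image method. It is a stand-in for illustration, not the binarisation ABBYY uses; `window` and `t` are assumed example values.

```python
def adaptive_binarise(gray, window=15, t=0.15):
    """Binarise a greyscale image (list of lists, 0 = black .. 255 = white).

    A pixel becomes black (1) when it is more than t (here 15 %) darker than
    the mean of the window x window neighbourhood around it, else white (0).
    """
    h = len(gray)
    w = len(gray[0]) if h else 0
    # Integral image with a zero border: ii[y][x] = sum of gray[0:y][0:x].
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    half = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - half), min(h, y + half + 1)
        for x in range(w):
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            area = (y1 - y0) * (x1 - x0)
            # Window sum in O(1) from four integral-image lookups.
            total = ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
            # pixel < mean * (1 - t), rearranged to avoid a division.
            if gray[y][x] * area < total * (1.0 - t):
                out[y][x] = 1
    return out
```

The integral image makes each window sum constant-time, so the cost does not grow with the window size.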

Step 2. Image pre-processing. New V10: New Binarisation

[Figure: Original scan | Previous binarisation | New binarisation]

Step 2. Image pre-processing. New V10: Binarisation, textured background optimisations

[Figure: Original scan | Previous binarisation | New binarisation]

Step 2. Image pre-processing. New V10: Binarisation for the IMPACT project

[Figure: Original | Previous binarisation | New binarisation. With the new binarisation, no text from the other page shows through.]

Step 2. Image pre-processing. New V10: Colour filtering (stamps and marks)
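Colour filtering removes coloured stamps and handwritten marks before OCR while keeping black or grey text. A minimal sketch of one common approach, assuming an RGB image held as rows of `(r, g, b)` tuples: pixels that are strongly saturated and whose hue falls in assumed "stamp" ranges (roughly red and blue here) are painted white. The hue ranges and thresholds are illustrative guesses, and this is not the filter FineReader Engine applies.

```python
import colorsys


def drop_coloured_marks(rgb,
                        hue_ranges=((0.0, 0.05), (0.95, 1.0), (0.55, 0.70)),
                        min_saturation=0.4, min_value=0.2):
    """Replace strongly coloured pixels (e.g. red or blue stamps) with white.

    rgb is a list of rows of (r, g, b) tuples with channels 0..255.
    Hue ranges are fractions of the colour wheel: red wraps around 0/1,
    blue sits near 2/3. Low-saturation pixels such as black text are kept.
    """
    out = []
    for row in rgb:
        new_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            coloured = s >= min_saturation and v >= min_value
            if coloured and any(lo <= h <= hi for lo, hi in hue_ranges):
                new_row.append((255, 255, 255))
            else:
                new_row.append((r, g, b))
        out.append(new_row)
    return out
```

Working in HSV separates "how colourful" (saturation) from "which colour" (hue), which makes the keep-black-text rule a single comparison.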

Step 2. Image pre-processing: Camera OCR. New V10: Automatic correction of 3D perspective distortions

[Figure: Before | After]
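Correcting perspective on a photographed page amounts to finding the projective transform (homography) that maps the four detected page corners onto a rectangle. The sketch below solves for that 3x3 matrix from four point pairs with plain Gaussian elimination. It assumes the corners are already known (corner detection is out of scope) and is a textbook method, not ABBYY's algorithm.

```python
def solve(a, b):
    """Solve the square system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography(src, dst):
    """3x3 homography H (with h33 = 1) mapping 4 src points onto 4 dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), linearised; same for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def map_point(hm, x, y):
    """Map point (x, y) through homography hm, including the projective division."""
    w = hm[2][0] * x + hm[2][1] * y + hm[2][2]
    return ((hm[0][0] * x + hm[0][1] * y + hm[0][2]) / w,
            (hm[1][0] * x + hm[1][1] * y + hm[1][2]) / w)
```

To rectify a whole photo, one would compute `homography(dst, src)` and, for each pixel of the output rectangle, sample the photo at the mapped position (with interpolation).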

Step 2. Image pre-processing: Camera OCR. New V10: Correction of blurred images

[Figure: Before | After]
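One simple, widely used way to make slightly blurred text crisper is unsharp masking: subtract a blurred copy from the image and add the difference back, which steepens edges. The sketch below is that generic technique on a greyscale list-of-lists image with an assumed 3x3 box blur. Stronger deblurring usually relies on deconvolution, and nothing here reflects how ABBYY's blur correction works.

```python
def unsharp_mask(gray, amount=1.0):
    """Sharpen a greyscale image (list of lists, 0..255) by unsharp masking.

    blurred = 3x3 box blur (border pixels average only existing neighbours)
    result  = gray + amount * (gray - blurred), clamped to 0..255
    """
    h = len(gray)
    w = len(gray[0]) if h else 0
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [gray[ny][nx]
                    for ny in range(max(0, y - 1), min(h, y + 2))
                    for nx in range(max(0, x - 1), min(w, x + 2))]
            blurred = sum(vals) / len(vals)
            sharp = gray[y][x] + amount * (gray[y][x] - blurred)
            out[y][x] = max(0, min(255, round(sharp)))
    return out
```

On a soft edge such as `[[50, 50, 200, 200]]` the pixels on either side of the step are pushed apart (to 0 and 250), while flat regions stay unchanged.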

Step 2. Image pre-processing: Camera OCR. New V10: ISO noise reduction

[Figure: Before | After]
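High-ISO camera shots show grainy, isolated bright and dark pixels. A median filter is one standard way to suppress such noise while keeping edges sharper than averaging would. The sketch below is a generic 3x3 median filter on a greyscale list-of-lists image, offered as an illustration only and not as ABBYY's ISO noise reduction. Border pixels use the neighbours that exist, and `median_low` keeps results as integers.

```python
from statistics import median_low


def median_filter(gray):
    """3x3 median filter for a greyscale image (list of lists of ints).

    Each output pixel is the (low) median of its neighbourhood; at borders
    only existing neighbours are used. Isolated specks are replaced by the
    surrounding value because they never reach the middle of the sorted list.
    """
    h = len(gray)
    w = len(gray[0]) if h else 0
    return [[median_low(gray[ny][nx]
                        for ny in range(max(0, y - 1), min(h, y + 2))
                        for nx in range(max(0, x - 1), min(w, x + 2)))
             for x in range(w)]
            for y in range(h)]
```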

Step 3: Document & Layout Analysis

Step 3. Document & Layout Analysis: Detecting sections of a document, analysing the layout and finding barcodes

Step 3. Document & Layout Analysis

3 layout analysis modes are available:

● Document Analysis – Normal: returns text, tables, graphics (pictures), barcodes & patchcodes, lines (separators)
● Document Analysis for full text indexing: graphics & pictures are OCRed as well. Returns text, tables, graphics (pictures), text inside pictures and diagrams, barcodes & patchcodes, lines (separators)
● Document Analysis for invoices (DAI): optimised for small fonts. Returns text, tables as plain text, text inside pictures and diagrams, barcodes & patchcodes, lines (separators)
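The three modes differ mainly in which kinds of blocks they return. The sketch below restates that list as data so an application could, for example, check which mode yields text found inside pictures. The enum members and mode keys are hypothetical labels chosen for this illustration, not FineReader Engine API identifiers.

```python
from enum import Enum


class BlockType(Enum):
    TEXT = "text"
    TABLE = "table"
    TABLE_AS_TEXT = "table as plain text"
    PICTURE = "graphics (pictures)"
    TEXT_IN_PICTURE = "text inside pictures and diagrams"
    BARCODE = "barcodes & patchcodes"
    SEPARATOR = "lines (separators)"


# Hypothetical summary of the three analysis modes listed on the slide.
MODES = {
    "normal": {BlockType.TEXT, BlockType.TABLE, BlockType.PICTURE,
               BlockType.BARCODE, BlockType.SEPARATOR},
    "full_text_indexing": {BlockType.TEXT, BlockType.TABLE, BlockType.PICTURE,
                           BlockType.TEXT_IN_PICTURE, BlockType.BARCODE,
                           BlockType.SEPARATOR},
    "invoices": {BlockType.TEXT, BlockType.TABLE_AS_TEXT,
                 BlockType.TEXT_IN_PICTURE, BlockType.BARCODE,
                 BlockType.SEPARATOR},
}


def modes_returning(block):
    """Sorted names of the modes whose output includes the given block type."""
    return sorted(name for name, kinds in MODES.items() if block in kinds)
```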

Step 3. Document & Layout Analysis. New V10: Improved detection of charts and graphics

Improved detection of pictures (photographs)

[Figures: Old Technology vs. V10 Technology, for charts/graphics and for pictures]

Step 3. Document & Layout Analysis. New V10: Improvements for magazine-style pages

[Figure: Old Technology (wrong detection of image and text blocks) vs. V10 Technology (correct detection of image and text blocks)]

Step 4: Recognition

Step 4. Recognition: After line detection, character recognition is applied with several different classifiers:

● Raster classifier
● Contour classifier
● Feature differentiating classifier
● Structure classifier
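The slide does not say how the four classifiers' opinions are merged. As a hedged illustration of the general idea, the sketch below uses weighted score fusion: each classifier reports candidate characters with confidences, and the candidate with the highest weighted sum wins. The function name, the input shape and the weighting are assumptions for this example, not ABBYY's method.

```python
from collections import defaultdict


def combine(votes, weights=None):
    """Fuse character hypotheses from several classifiers.

    votes maps a classifier name to {candidate_char: confidence in 0..1}.
    A candidate's fused score is the weighted sum of its confidences
    (weight 1.0 unless given). Returns (best_char, ranking), or (None, [])
    when there are no hypotheses.
    """
    weights = weights or {}
    scores = defaultdict(float)
    for clf, hyps in votes.items():
        w = weights.get(clf, 1.0)
        for char, conf in hyps.items():
            scores[char] += w * conf
    if not scores:
        return None, []
    # Highest score first; ties broken alphabetically for determinism.
    ranking = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranking[0][0], ranking
```

For example, "O" (letter) versus "0" (digit) is a typical confusion where one classifier alone may be wrong but the fused score is more reliable.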

Step 4. Recognition: Balancing processing speed and accuracy

The “old conflict” between recognition accuracy and processing speed still exists. Engine 10 “solves” it with different approaches:

● Significant speed increase on good-quality images in a new, enhanced Fast Mode
● Slightly improved accuracy in Normal Mode
● New Accurate Mode for low-resolution/low-quality images (slightly slower)

Image quality does matter!
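Because the right mode depends on image quality, an application might choose it per document. The sketch below is a hypothetical policy: the enum, the function and the 200 dpi cut-off are illustrative assumptions, not ABBYY recommendations or API names. It only encodes the trade-offs stated on the slide (Fast for good images, Accurate for low-resolution scans and faxes, Normal otherwise).

```python
from enum import Enum


class RecognitionMode(Enum):
    FAST = "Fast"
    NORMAL = "Normal"
    ACCURATE = "Accurate"


def choose_mode(dpi, is_fax=False, prefer_speed=False, low_dpi=200):
    """Pick a recognition mode from simple image-quality hints.

    low_dpi (200) is an illustrative threshold, not an ABBYY recommendation.
    """
    if is_fax or dpi < low_dpi:
        return RecognitionMode.ACCURATE  # low-resolution scans and faxes
    if prefer_speed:
        return RecognitionMode.FAST      # good-quality images, speed first
    return RecognitionMode.NORMAL
```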

Step 4. Recognition. New V10: Accurate mode for low-resolution scans

● Additional classifier trained on low-resolution scans and faxes
● About 20% more accurate for low-resolution scans
● About 10% slower than Normal mode

Step 4. Recognition: Accuracy improvements, FRE10 Normal mode vs. FRE9 Normal mode*

*based on ABBYY internal tests; number of recognition errors normalized relative to FRE9_R1 values
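Normalising error counts to the FRE9_R1 baseline means dividing each version's errors on a test batch by FRE9_R1's errors on the same batch, so 1.0 is the baseline and lower is better. The sketch below shows that arithmetic. The function name and the example counts in the test are made up for illustration and are not ABBYY's results.

```python
def normalise(errors_by_batch, baseline="FRE9_R1"):
    """Divide each version's error count by the baseline's count on the same batch.

    errors_by_batch: {batch: {version: error_count}}. A result below 1.0
    means fewer errors than the baseline (0.8 = 20 % fewer errors).
    """
    result = {}
    for batch, counts in errors_by_batch.items():
        base = counts[baseline]
        result[batch] = {version: n / base for version, n in counts.items()}
    return result
```

Normalising per batch keeps easy and hard batches comparable on one chart, which is why such values only make sense relative to the baseline.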

Step 4. Recognition: Speed improvements, important notes*

*based on ABBYY internal tests

The speed and accuracy values are meaningful only for comparing ABBYY OCR technologies under these particular conditions and on these particular test batches.

Please DO NOT use these numbers as absolute values or compare them with results of other OCR technologies obtained on different batches!

Step 4. Recognition: Speed comparison of FRE 8, 9 and 10 modes*

*based on ABBYY internal tests

Step 4. Recognition: Increased speed for European languages*

*based on ABBYY internal tests

Chinese Simplified

[Chart: Recognition test, Chinese Simplified, Books 79; versions compared: FRE9_R1, FRE9_R7, FRE10_R1]

Step 4. Recognition: Speed improvements through multi-core support* & tuned profiles

Built-in multi-core support for multi-page documents: added in V9, improved in V10

New V10: Newly tuned processing profiles increase overall performance for specific scenarios

2 sessions tomorrow!

*based on ABBYY internal tests

[Chart: Recognition performance increase rate for multi-core systems compared to a one-core system. X-axis: pages in a document (2 to 20); y-axis: rate, times (1.0 to 4.0); series: 2 cores, 4 cores]
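The chart's "rate" reads naturally as a speed-up ratio, T(1 core) / T(n cores); that interpretation is an assumption. The sketch below shows page-level parallelism for one document plus an idealised bound. If whole pages are the unit of work and every page takes the same time, n cores finish in ceil(pages / cores) rounds, so a document with fewer pages than cores cannot use every core. `recognise_page` is a hypothetical stand-in for the engine call, and nothing here describes how FineReader Engine schedules work internally.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def recognise_document(pages, recognise_page, workers=4):
    """Recognise the pages of one document in parallel, keeping page order.

    recognise_page is a hypothetical per-page callable. Threads only give a
    speed-up if it releases the GIL (as a native OCR library typically
    would); for pure-Python work a ProcessPoolExecutor would be needed.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(recognise_page, pages))


def speedup(t_single_core, t_multi_core):
    """Performance increase rate: time on one core divided by time on n cores."""
    return t_single_core / t_multi_core


def ideal_speedup(pages, cores):
    """Idealised bound with equal-cost pages processed in ceil(pages/cores) rounds."""
    return pages / math.ceil(pages / cores)
```

Under this model, four cores give 2.0 for a 2-page document, 2.5 for 5 pages and 4.0 for 8 pages, one plausible reason a measured rate rises with page count.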

Step 6: Synthesis & Export

Step 6. Document Export: New API for PDF Export in FineReader Engine 10

FRE 9.0 – 25 parameters:
Author, Bw Format, Color Format, Creator, Embed Fonts, Encryption Info, Export Mode, Font Mode, Gray Format, Keep Text And Background Color, Keywords, Paper Height, Paper Width, PDF Version, Picture Format, Picture Resolution, Producer, Quality, Replace Uncertain Words With Image, Running Title Mode, Set Page Size By Layout Size, Subject, Title, Write Links, Write Tagged PDF; MRC Params (READ ONLY)

FRE10 – 7 parameters:
Scenario, MRC Mode, PDFA Compliance Mode, Resolution, Resolution Type, Colority, Text Export Mode; PDF Features (READ ONLY), Picture Compression Params (READ ONLY)

PDF Features:
Embed Fonts, Encryption Info, Meta Data Writing Params, Paper Size, PDF Version, Replace Uncertain Words With Image, Running Title Mode, Write Links, Write Tagged PDF

Scenario profiles:
● MAX PDF Quality
● MIN PDF Size
● MAX Export Speed
● Balanced Quality-Size-Speed

Fast and easy adjustment of PDF export, with the ability to set any of the parameters individually
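The FRE10 design replaces many independent switches with a scenario profile that sets sensible defaults, while still allowing individual overrides. The sketch below illustrates that "preset plus overrides" pattern. The field names are loosely modelled on the slide, and the preset values are invented assumptions, not ABBYY defaults or API members. "MRC" stands for Mixed Raster Content compression.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PdfExportSettings:
    # Hypothetical fields loosely named after the slide's parameters.
    scenario: str
    resolution_dpi: int
    colour: bool
    mrc: bool           # Mixed Raster Content compression on/off
    pdfa: bool = False  # PDF/A compliance


# Illustrative presets; the values are assumptions, not ABBYY's defaults.
PROFILES = {
    "max_quality": PdfExportSettings("max_quality", 300, True, False),
    "balanced": PdfExportSettings("balanced", 200, True, True),
    "min_size": PdfExportSettings("min_size", 150, False, True),
    "max_speed": PdfExportSettings("max_speed", 150, True, False),
}


def settings_for(scenario, **overrides):
    """Start from a scenario profile, then override individual parameters."""
    return replace(PROFILES[scenario], **overrides)
```

Freezing the dataclass keeps the shared presets immutable; `dataclasses.replace` returns a modified copy, and an unknown parameter name raises `TypeError` instead of being silently ignored.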

Step 6. Synthesis & Export: 2nd Generation of ADRT® (Adaptive Document Recognition Technology)

New elements and enhancements compared with the previous ADRT®:
● New elements
● Overall enhancement of how ADRT 1.0 works

Engine 10 offers a new API to the internal ADRT results.

Step 6. Synthesis & Export: New XML Output Formats

E-book readers: PDFs can be displayed, but the new formats allow much more flexible rendering when switching from portrait to landscape mode: FB2*, ePub*

Libraries: AltoXML*

OpenDocument Text format (.odt*): ISO standard, XML-based export format, increasingly required in public projects

*planned for a Maintenance Release of FRE 10
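Libraries use ALTO XML because it stores each recognised word together with its position on the page image. The sketch below builds a deliberately simplified ALTO-style document (Layout, Page, PrintSpace, TextBlock, TextLine, String, SP) with Python's standard `xml.etree.ElementTree`. It omits the ALTO namespace, the `<Description>` section, styles and several required attributes, so it will not validate as-is. Treat it as an illustration of the structure, not of FineReader Engine's exporter; `to_alto` and its input shape are hypothetical.

```python
import xml.etree.ElementTree as ET


def to_alto(page_w, page_h, lines):
    """Build a simplified ALTO-style XML string for one page.

    lines: list of text lines, each a list of (word, hpos, vpos, width, height).
    Omits the ALTO namespace, <Description> and styles; real output should be
    validated against the official ALTO schema.
    """
    alto = ET.Element("alto")
    layout = ET.SubElement(alto, "Layout")
    page = ET.SubElement(layout, "Page", ID="P1",
                         WIDTH=str(page_w), HEIGHT=str(page_h))
    space = ET.SubElement(page, "PrintSpace")
    block = ET.SubElement(space, "TextBlock", ID="TB1")
    for i, words in enumerate(lines, start=1):
        line = ET.SubElement(block, "TextLine", ID=f"TL{i}")
        for j, (text, x, y, w, h) in enumerate(words):
            if j:
                ET.SubElement(line, "SP")  # space between consecutive words
            ET.SubElement(line, "String", CONTENT=text, HPOS=str(x),
                          VPOS=str(y), WIDTH=str(w), HEIGHT=str(h))
    return ET.tostring(alto, encoding="unicode")
```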

FineReader Engine 10 – Jumpstart Samples and Source Code for Developers

FineReader Engine 10 – The must-have SDK!

ABBYY made significant technology optimisations in Engine 10:

Image Pre-processing: New Binarisation = better OCR = better Results

Speed Improvements: New Fast Mode, improved Multi-core Support

Quality Improvements: New mode for low resolution images, improved Fraktur OCR

New and Improved Language Support

Improved Document Analysis and ADRT

New API Calls and Optimised Processing Profiles

New and Improved Export formats

Any questions?

Thank you for your attention!
