
The Theory of Everything in Documents Processing - ABBYY

Date post: 02-Jun-2020
ABBYY 3A PARTNER SUMMIT © ABBYY Confidential The Theory of Everything... in Documents Processing Oleg Sazhin, PhD, MBA ABBYY 3A
Transcript
Page 1: The Theory of Everything in Documents Processing - ABBYY · The Theory of Everything... in Documents Processing Oleg Sazhin, PhD, MBA ABBYY 3A. ABBYY 3A PARTNER SUMMIT Theory. ABBYY

ABBYY 3A PARTNER SUMMIT © ABBYY Confidential

The Theory of Everything... in Documents Processing

Oleg Sazhin, PhD, MBA ABBYY 3A

Page 2

Theory

Page 3

About 30% of revenue is invested in R&D, ~1.5 times more than the market average

More than 500 engineers work in ABBYY’s R&D, one of the largest research centers in artificial intelligence

Founded: Department of Image Recognition & Text Processing, two Computational Linguistics Departments

Resident of Skolkovo Innovation Center

Page 4

Leveraging Synergies

ABBYY develops technologies in 4 key areas:

Text recognition

Document analysis

Lexicography (dictionary creation)

Natural language processing

Page 5

[Diagram: Paper → Image → Black box (?) → DATA → ECM, DMS, CRM, DATABASE]

Page 6

Self-learning classification. OCR/ICR/OMR

Page 8

Page analysis detects zones that contain text, pictures, and tables.

Page 9

Text blocks are parsed into lines

Vertical histogram is used

Page 10

Lines are parsed into words, and words into letters

Horizontal histogram is used:

• larger gaps are assumed to be spaces between words

• smaller gaps are interpreted as spaces between letters
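A rough sketch of the gap rule, assuming the "horizontal histogram" is the ink count per pixel column of a single text line. The cut-off `word_gap_min` is a hypothetical parameter; the slides do not say how the boundary between "larger" and "smaller" gaps is chosen.

```python
def column_histogram(line_image):
    """Count ink pixels (value 1) in every pixel column of a binary line image."""
    return [sum(column) for column in zip(*line_image)]


def find_gaps(histogram):
    """Return (start, width) for each run of empty columns between inked columns."""
    gaps = []
    start = None
    seen_ink = False
    for x, value in enumerate(histogram):
        if value == 0:
            if seen_ink and start is None:
                start = x                       # gap begins after some ink
        else:
            if start is not None:
                gaps.append((start, x - start))  # gap closed by new ink
                start = None
            seen_ink = True
    return gaps                                  # leading/trailing margins are ignored


def classify_gaps(gaps, word_gap_min):
    """Label each gap 'word' if it is at least word_gap_min wide, else 'letter'."""
    return [
        (start, width, "word" if width >= word_gap_min else "letter")
        for start, width in gaps
    ]


# Toy line image: two letters, a wide gap, two more letters.
line = [
    [1, 0, 1, 0, 0, 0, 1, 0, 1],
    [1, 0, 1, 0, 0, 0, 1, 0, 1],
]
labels = classify_gaps(find_gaps(column_histogram(line)), word_gap_min=3)
```

A fixed pixel threshold is the simplest choice; a real system would more plausibly derive it from the font size or the gap-width distribution of the line.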

Structural classifier

Differentiating classifier

Page 11

OCR – Optical character recognition

202 languages, including:

• European languages (Latin, Cyrillic, Armenian, Greek alphabets)

• CJK: Chinese (Simplified and Traditional), Japanese, and Korean

• Thai, Vietnamese, Hebrew, Burmese

• Arabic

ICR – Intelligent character recognition

136 languages

OMR – Optical mark recognition

BCR – Barcode recognition

Page 12

Mobile

Recognition from Photo

Real Time Recognition from Video stream

Augmented Reality

Page 13

Fuzzy Logic & Hypothesis trees

Page 14

Fuzzy Logic

Hypotheses are characterized by their quality (value from 0 to 1)

How long is the address field?

0 or 1 (No / Yes)

0 … 1
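The contrast between a yes/no answer and a graded quality can be illustrated with the address-field example. This is only a sketch under stated assumptions: the linear fall-off and the parameters `ideal_max` and `hard_max` are hypothetical and do not come from the slides.

```python
def crisp_length_check(length, expected_max):
    """Binary logic: the field either fits the expected length (1) or not (0)."""
    return 1 if length <= expected_max else 0


def fuzzy_length_quality(length, ideal_max, hard_max):
    """Fuzzy logic: quality in [0, 1].

    Assumed shape: 1.0 up to ideal_max, falling linearly to 0.0 at hard_max.
    """
    if length <= ideal_max:
        return 1.0
    if length >= hard_max:
        return 0.0
    return (hard_max - length) / (hard_max - ideal_max)


# A 50-character candidate for an address field expected to be ~40 characters.
crisp = crisp_length_check(50, expected_max=40)
fuzzy = fuzzy_length_quality(50, ideal_max=40, hard_max=60)
```

The crisp check discards the slightly long candidate outright, while the fuzzy quality keeps it alive as a weaker hypothesis that other evidence can still confirm or reject.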

Page 15

Fuzzy Logic & Hypothesis trees

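[Figure: hypothesis tree for an ambiguous image fragment. Candidate readings: "n" 50%, "ri" 35%, "a" 15%. Hypothesis qualities shown: Q=0.5, Q=0.35, Q=0.09, Q=1. In dictionary: Cambridge]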
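One level of such a hypothesis tree might look like the sketch below. The candidate readings and their qualities are taken from the slide; everything else is an assumption made for illustration: the surrounding letters `Camb` and `dge`, the one-word `DICTIONARY`, and the rule that a dictionary match raises the quality to 1 (one possible reading of the Q=1 on the slide).

```python
# Hypothetical one-word dictionary, for illustration only.
DICTIONARY = {"cambridge"}


def word_hypotheses(prefix, alternatives, suffix):
    """Build one word hypothesis per alternative reading of the unclear fragment."""
    return [(prefix + reading + suffix, quality) for reading, quality in alternatives]


def rescore(hypotheses, dictionary):
    """Assumed rule: a dictionary match lifts the quality to 1.0; best first."""
    rescored = []
    for word, quality in hypotheses:
        if word.lower() in dictionary:
            quality = 1.0
        rescored.append((word, quality))
    return sorted(rescored, key=lambda hypothesis: hypothesis[1], reverse=True)


# Image-level readings of the unclear fragment, as listed on the slide.
alternatives = [("n", 0.50), ("ri", 0.35), ("a", 0.15)]

hypotheses = word_hypotheses("Camb", alternatives, "dge")
ranked = rescore(hypotheses, DICTIONARY)
best_word, best_quality = ranked[0]
```

The point of the example is that the image alone prefers "n", yet the word-level check overturns that choice in favour of "ri".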

Page 16

Semantics - understanding natural language

Page 17

Semantics (in the form of a language-independent hierarchy of concepts)

While people speak different languages, they think using largely identical concepts. Relationships among these concepts are the same in all languages, and they can be pictured as a conceptual tree, with the thicker boughs representing more general notions (e.g. “furniture”) and the thinner twigs representing more specific notions (e.g. “table”).

Understanding natural language
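The conceptual tree can be sketched as a small data structure. This is a toy illustration with assumed data: the concept identifiers, the `CHAIR` entry, and the German and French words are hypothetical examples, not taken from ABBYY's hierarchy.

```python
# Child concept -> parent concept (thinner twig -> thicker bough).
PARENT = {
    "TABLE": "FURNITURE",
    "CHAIR": "FURNITURE",   # hypothetical sibling concept
}

# (language, word) -> language-independent concept.
WORD_TO_CONCEPT = {
    ("en", "table"): "TABLE",
    ("de", "Tisch"): "TABLE",
    ("fr", "table"): "TABLE",
    ("en", "furniture"): "FURNITURE",
    ("de", "Möbel"): "FURNITURE",
}


def concept_of(language, word):
    """Map a word in a given language to its concept, or None if unknown."""
    return WORD_TO_CONCEPT.get((language, word))


def ancestors(concept):
    """Walk up the tree and list every more general concept."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def is_a(concept, general):
    """True if `concept` equals `general` or lies below it in the tree."""
    return concept == general or general in ancestors(concept)


same_concept = concept_of("en", "table") == concept_of("de", "Tisch")
table_is_furniture = is_a(concept_of("de", "Tisch"), "FURNITURE")
```

Because the tree is keyed by concepts rather than words, the same relation ("a table is furniture") holds regardless of the language the word came from.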

Page 18

Syntax

The syntax component detects how concepts are related to one another within one or more sentences. The system analyzes a text and builds a tree of syntactic relations.

Morphology is essential for this analysis.

Joint use of the above components enables the system to “understand”

Understanding natural language

Page 19

Understanding natural language

Resolving ambiguities:

Page 20

Semantic Exploration – How it Works

• Broadening

Instead of searching for „layoff“ OR „terminate“ OR „redundant“ OR „fire“, search for the common meaning of all these words: (TO_CAUSE_TO_LEAVE)

• Narrowing

Allows searching for a topic with a restricted meaning.

„apple“ finds all documents where this word appears with any meaning.

apple(COMPANY_BY_NAME) restricts the search to that one meaning.

Understanding natural language
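Broadening and narrowing can be sketched over a tiny annotated collection. This is an assumption-laden toy: it presumes documents have already been tagged with (word, concept) pairs, and the document IDs, the index layout, and the concept names `FLAME` and `FRUIT` are hypothetical. Only `TO_CAUSE_TO_LEAVE` and `COMPANY_BY_NAME` appear on the slide.

```python
# Hypothetical index: document id -> list of (word, concept) annotations.
DOCS = {
    "doc1": [("fire", "TO_CAUSE_TO_LEAVE"), ("apple", "COMPANY_BY_NAME")],
    "doc2": [("layoff", "TO_CAUSE_TO_LEAVE")],
    "doc3": [("fire", "FLAME"), ("apple", "FRUIT")],
}


def search_word(word):
    """Plain keyword search: the word in any meaning."""
    return sorted(
        doc_id for doc_id, tokens in DOCS.items()
        if any(w == word for w, _ in tokens)
    )


def search_concept(concept):
    """Broadening: any word that carries the given meaning."""
    return sorted(
        doc_id for doc_id, tokens in DOCS.items()
        if any(c == concept for _, c in tokens)
    )


def search_word_as(word, concept):
    """Narrowing: the word, but only where it carries the given meaning."""
    return sorted(
        doc_id for doc_id, tokens in DOCS.items()
        if (word, concept) in tokens
    )


broad = search_concept("TO_CAUSE_TO_LEAVE")           # matches "fire" and "layoff"
any_apple = search_word("apple")                      # fruit and company alike
company_apple = search_word_as("apple", "COMPANY_BY_NAME")
```

Note that the broadened query skips doc3, where "fire" means a flame, even though a keyword search for "fire" would have returned it.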

Page 22

Convolutional Neural Networks

Page 23

Convolutional Neural Networks

[Figure from "Neural Networks and Deep Learning" by Michael Nielsen: filter / feature map]

Page 24

[Figure: filter applied to a mouse part; response 0]

Convolutional Neural Networks

Page 25

[Figure: filter applied to a mouse part; response 6600]

Convolutional Neural Networks
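The two mouse slides contrast a filter response of 0 with a response of 6600. A minimal sketch of how such a response is computed is given below: the filter is laid over an image patch, overlapping values are multiplied, and the products are summed. The 3x3 arrays are made-up toy data and do not reproduce the slide's 6600; `filter_response` is a hypothetical helper name.

```python
def filter_response(patch, kernel):
    """Sum of element-wise products of an image patch and a same-sized filter."""
    return sum(
        pixel * weight
        for patch_row, kernel_row in zip(patch, kernel)
        for pixel, weight in zip(patch_row, kernel_row)
    )


# Toy filter that responds to a diagonal stroke (bottom-left to top-right).
diagonal_filter = [
    [0, 0, 30],
    [0, 30, 0],
    [30, 0, 0],
]

# Patch containing the same diagonal stroke: every inked pixel meets a weight.
matching_patch = [
    [0, 0, 50],
    [0, 50, 0],
    [50, 0, 0],
]

# Patch with ink only where the filter has zeros: nothing lines up.
other_patch = [
    [50, 0, 0],
    [0, 0, 0],
    [0, 0, 50],
]

high = filter_response(matching_patch, diagonal_filter)  # 3 * (50 * 30)
low = filter_response(other_patch, diagonal_filter)
```

Sliding the same filter across every position of the image and recording each response yields the feature map; training by back propagation adjusts the filter weights rather than having them hand-written as here.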

Page 26

Filter 1 Filter 2 Filter 3 Filter 4

Back Propagation

Convolutional Neural Networks

Page 27

Used for:

- image preprocessing

- document classification

- searching for text/image/table blocks

- OCR

- RTR (mobile real-time recognition)

- searching for barcodes

- searching for certain lines (on receipts, invoices, etc.)

- etc.

Convolutional Neural Networks

Page 28

2016-17 World Wide Document Capture Software

Leading Capture Software Vendors Report

By Harvey Spencer, Mike Spang, Ken Chin

Harvey Spencer Associates Inc.

“A major source of ABBYY's strength lies in its R&D …. Russian developed neural networks and other recognition algorithms are widely acknowledged for their strength and accuracy.”

“A FlexiCapture Distributed system can be used to build sophisticated transaction capture solutions …”

Page 29

Page 30

?

