ABBYY FlexiCapture 12: Machine Learning at #ABBYYSummit17

Post on 21-Jan-2018

ABBYY Technology Summit 2017

© ABBYY Confidential

ABBYY NAHQ, 2017

FlexiCapture Technical Track

Machine Learning

Chip VonBurg

Cloud

Web

“Laser”

Agenda

• What is machine learning?

• Machine Learning Algorithms and Scenarios

• Machine learning in FlexiCapture

• Q&A

What is Machine Learning

• Machine learning is a method of data analysis that automates analytical model building

• Using algorithms that iteratively learn from data, machine learning allows computers to find hidden insights without being explicitly programmed where to look

• Evolved from the study of pattern recognition and computational learning theory

• Machine learning is closely related to (and often overlaps with) computational statistics

What is Machine Learning

Machine Learning enables applications to execute logic that wasn’t explicitly built

What is Machine Learning

• How is machine learning used?

– Biology/Genetics

– Search and recommendations

  • Music Services (song/station suggestions)

  • Movie Services (Netflix/Amazon, etc.)

    – Thumbs Up/Down is an example of reinforcement learning and a feedback loop

  • Ad Services

– Spam detection

– Anywhere pattern matching can be applied

What is Machine Learning

Machine Learning requires a few basic items:

• Sample Data

• An iterative process

• Machine Learning Algorithms

– Naive Bayes

– Support Vector Machine (SVM)

– Genetic Algorithms

– Deep Learning
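
The three ingredients above (sample data, an iterative process, an algorithm) can be sketched in a few lines of Python. This is only an illustration: it uses a perceptron, which is not one of the algorithms named on the slide, because it is short enough to read in full. The `SAMPLES` data (a logical OR) and the function names are assumptions made for this example.

```python
# Sample data: (features, label) pairs for a logical OR; labels are 0 or 1.
# This toy data set is an assumption made for the example.
SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]


def predict(weights, bias, features):
    """Return 1 if the weighted sum of the features plus the bias is positive."""
    total = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if total > 0 else 0


def train(samples, max_passes=20, rate=1.0):
    """Iterate over the samples, correcting the model after each mistake."""
    weights, bias = [0.0, 0.0], 0.0
    for passes in range(1, max_passes + 1):
        errors = 0
        for features, label in samples:
            delta = label - predict(weights, bias, features)
            if delta != 0:
                errors += 1
                # Nudge the weights and bias toward the correct answer.
                weights = [w + rate * delta * x for w, x in zip(weights, features)]
                bias += rate * delta
        if errors == 0:  # a full pass with no mistakes: stop iterating
            return weights, bias, passes
    return weights, bias, max_passes


weights, bias, passes = train(SAMPLES)
```

Nothing in `train` encodes what OR means. The rule is recovered only from the samples, by repeating the pass until no sample is misclassified.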

The structure of a typical Machine Learning Process

Let's see an example: http://regex.inginf.units.it/

Machine Learning Algorithms and Scenarios

Naive Bayes

• Bayes' theorem was developed in the 1700s by Thomas Bayes

– Provides a means for directly calculating the probability of a statement being true based on the available evidence

• A Naive Bayes classifier is a simple probabilistic classifier that applies Bayes' theorem with strong (naive) independence assumptions between the features

• Does an item belong or not belong based on the features within it?

• Benefits of Naive Bayes classifiers

– Depending on the exact nature of the probabilistic model, Naive Bayes classifiers can be trained very efficiently.

– Despite their very simple design, Naive Bayes classifiers often work well in many complex tasks.

– An advantage of the Naive Bayes classifier is the small amount of training data it needs.
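
As a rough illustration of the points above, here is a minimal word-count Naive Bayes classifier in plain Python. It is a sketch under stated assumptions, not how FlexiCapture implements classification: the categories ("invoice", "order"), the training phrases, the function names and the add-one smoothing are all choices made for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """labeled_docs is a list of (list_of_words, label) pairs."""
    label_counts = Counter()            # how many documents per label
    word_counts = defaultdict(Counter)  # word frequencies per label
    vocabulary = set()
    for words, label in labeled_docs:
        label_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return label_counts, word_counts, vocabulary


def classify(model, words):
    """Pick the label with the highest log P(label) + sum of log P(word | label)."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[label].values())
        for word in words:
            # Add-one smoothing keeps unseen words from zeroing the product.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data: two tiny "documents" per category.
TRAINING = [
    ("invoice total amount due".split(), "invoice"),
    ("invoice number payment due".split(), "invoice"),
    ("purchase order ship quantity".split(), "order"),
    ("order quantity ship date".split(), "order"),
]
model = train_naive_bayes(TRAINING)
```

The "naive" part is the inner loop: each word contributes its own probability independently of the others. Four short samples are enough for `classify(model, "amount due on invoice".split())` to pick the invoice category, which matches the point about small training sets.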

Support Vector Machine (SVM)

• SVMs are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis

• Given a set of training examples, each marked as belonging to one or the other of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other

• In addition to performing linear classification, SVMs can efficiently perform non-linear classification

• The method is widely used across many kinds of classification tasks. It is often treated as a baseline: more complex methods are used only if they provide a significant advantage over the support vector machine.
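
The two-category training described above can be sketched as a linear SVM trained by sub-gradient descent on the hinge loss. This is a simplified illustration, not a production solver: the one-feature data set, the learning rate, the regularization strength and the function names are assumptions chosen to keep the example small and deterministic.

```python
def decision_value(weights, bias, features):
    """Signed score relative to the separating boundary."""
    return bias + sum(w * x for w, x in zip(weights, features))


def train_linear_svm(samples, epochs=100, rate=0.1, reg=0.01):
    """Sub-gradient descent on the L2-regularized hinge loss."""
    dims = len(samples[0][0])
    weights, bias = [0.0] * dims, 0.0
    for _ in range(epochs):
        for features, label in samples:  # label is +1 or -1
            margin = label * decision_value(weights, bias, features)
            # Regularization step: shrink the weights slightly.
            weights = [w * (1 - rate * reg) for w in weights]
            if margin < 1:
                # The sample is inside the margin or misclassified: push it out.
                weights = [w + rate * label * x for w, x in zip(weights, features)]
                bias += rate * label
    return weights, bias


def predict(weights, bias, features):
    return 1 if decision_value(weights, bias, features) >= 0 else -1


# Hypothetical training examples with one feature, each marked -1 or +1.
SAMPLES = [((-2.0,), -1), ((-1.0,), -1), ((1.0,), 1), ((2.0,), 1)]
weights, bias = train_linear_svm(SAMPLES)
```

Once trained, `predict(weights, bias, (3.0,))` assigns a new example to the positive category and `predict(weights, bias, (-3.0,))` to the negative one. A non-linear SVM would replace the dot product in `decision_value` with a kernel function, which this sketch does not attempt.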

Deep learning

• Deep learning is a form of neural network with multiple layers of units, each combining weighted inputs with a bias

• Learning can be supervised, partially supervised or unsupervised.

• Some representations are loosely based on interpretation of information processing and communication patterns in a biological nervous system, such as neural coding that attempts to define a relationship between various stimuli and associated neuronal responses in the brain.

• Deep learning models have been applied to fields including computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation and bioinformatics, where they have produced results comparable to and in some cases superior to human experts
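
To make the idea of layers, weights and biases concrete, here is a minimal forward pass through a two-layer network in plain Python. The weights and biases are hand-picked for this example so that the network computes XOR. In real deep learning they would be learned from data, and the networks would be far larger. All names and values below are assumptions for illustration.

```python
def relu(value):
    """Rectified linear unit: pass positive values through, clamp the rest to 0."""
    return max(0.0, value)


def identity(value):
    return value


def layer(inputs, weights, biases, activation):
    """One fully connected layer: activation(bias + weighted sum) per unit."""
    return [
        activation(bias + sum(w * x for w, x in zip(row, inputs)))
        for row, bias in zip(weights, biases)
    ]


# Hand-picked (not learned) parameters, chosen so the network computes XOR.
HIDDEN_WEIGHTS = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_BIASES = [0.0, -1.0]
OUTPUT_WEIGHTS = [[1.0, -2.0]]
OUTPUT_BIASES = [0.0]


def forward(inputs):
    """Run the inputs through the hidden layer, then the output layer."""
    hidden = layer(inputs, HIDDEN_WEIGHTS, HIDDEN_BIASES, relu)
    return layer(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES, identity)[0]
```

No single linear unit can compute XOR, so even this tiny case needs the hidden layer. The second hidden unit fires only because its bias of -1 delays it until both inputs are on.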

Machine Learning Scenarios

• Supervised Learning (on labeled data)

– The main learning scenario. It requires manually labeled data. This is useful for tasks that do not require a lot of labeled data or where such data emerge naturally.

• Unsupervised Learning (on unlabeled data)

– This scenario makes it possible to find patterns in large arrays of raw data (big data analysis). It can be used in combination with supervised learning in order to significantly reduce the amount of manual labeling.

• Reinforcement Learning (by correcting errors on the fly)

– This scenario makes it possible to improve results during system operation, including via interaction with the user
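
The unsupervised scenario can be illustrated with a small clustering sketch. It uses k-means on one-dimensional values, a technique that is not named in the slides and is used here only because it is short. The data, the starting centers and the function name are assumptions for this example.

```python
def k_means_1d(values, centers, max_iters=100):
    """Group unlabeled numbers around the given number of centers."""
    centers = list(centers)
    clusters = []
    for _ in range(max_iters):
        # Assignment step: each value joins its nearest center.
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:  # nothing moved: the grouping is stable
            break
        centers = new_centers
    return centers, clusters


# Unlabeled data: no category is given for any value.
VALUES = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centers, clusters = k_means_1d(VALUES, centers=[1.0, 11.0])
```

No labels are supplied, yet the values separate into a low group and a high group. A person could then label each group once instead of labeling every value, which is the reduction in manual labeling mentioned above.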

Machine Learning Practical Examples

Sort these documents into like categories:

• Here are examples of each category

– Example of supervised learning

– Example of standard training process used for classification or extraction

– Algorithm must determine what features all of the included documents have in common that excluded documents don’t have
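
The last point can be expressed directly with set operations. This is a deliberately simplified sketch: real classifiers weigh features statistically rather than requiring an exact match, and the document contents and function name below are made up for this example. It assumes at least one included document.

```python
def distinguishing_features(included, excluded):
    """Features present in every included document and in no excluded one."""
    # Features shared by all included documents (requires at least one).
    common = set.intersection(*(set(doc) for doc in included))
    # Every feature that appears in any excluded document.
    seen_in_excluded = set().union(*(set(doc) for doc in excluded))
    return common - seen_in_excluded


# Hypothetical documents, each reduced to a list of the words found on it.
INCLUDED = [
    ["invoice", "total", "date"],
    ["invoice", "total", "vendor"],
]
EXCLUDED = [
    ["order", "date", "total"],
]
features = distinguishing_features(INCLUDED, EXCLUDED)
```

Here "total" appears in every included document but also in the excluded one, so it does not help. Only "invoice" separates the category from the rest.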

Machine Learning in FlexiCapture

Machine Learning in FlexiCapture: Classification

Machine Learning in FlexiCapture: Field Training

[Diagram labels: Input, Recognition, Verification, Automatic Processing, Export, Auto-Learning]

Machine Learning at ABBYY

• In addition to product-facing changes, machine learning processes continue to help ABBYY produce and refine technology internally:

– Real-world classifiers

– Recognition engine training

– NLP model training

– Etc.

Why do we care?

[Chart: Value across project phases (Project Start, Development, Production), comparing "Previous Approach" with "With Machine Learning"; additional label: "automatic"]

Summary

• Machine learning is an automated, analytical approach to pattern matching

• Machine learning can be used to augment and improve traditional rule-based processes

• Machine learning helps flip the traditional development cycle and speed projects to value

Questions?
