ABBYY FlexiCapture 12: Machine Learning at #ABBYYSummit17

ABBYY Technology Summit 2017

ABBYY NAHQ, 2017

FlexiCapture Technical Track

Machine Learning

Chip VonBurg

“Laser”

Agenda

• What is machine learning?

• Machine Learning Algorithms and Scenarios

• Machine learning in FlexiCapture

• Q&A

What is Machine Learning

• Machine learning is a method of data analysis that automates analytical model building

• Using algorithms that iteratively learn from data, machine learning allows computers to find hidden insights without being explicitly programmed where to look

• Evolved from the study of pattern recognition and computational learning theory

• Machine learning is closely related to (and often overlaps with) computational statistics

Confidential 8

Machine Learning enables applications to execute logic that wasn’t explicitly built

• How is machine learning used?

– Biology/Genetics

– Search and recommendations

  • Music Services (song/station suggestions)

  • Movie Services (Netflix/Amazon, etc.)

    – Thumbs Up/Down is an example of Reinforcement learning and a feedback loop

  • Ad Services

– SPAM detection

– Anywhere pattern matching can be applied

Machine Learning requires a few basic items:

• Sample Data

• An iterative process

• Machine Learning Algorithms

– Naive Bayes

– Support Vector Machine (SVM)

– Genetic Algorithms

– Deep learning
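
The three items above (sample data, an iterative process, an algorithm) can be sketched in a few lines. This is only an illustration under stated assumptions: the sample values are made up, and gradient descent on a straight-line model is chosen here for brevity; it is not one of the algorithms listed on the slide. The names `samples`, `train`, and `learning_rate` are hypothetical.

```python
# Sample data: made-up points that roughly follow y = 2x + 1.
samples = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8), (4.0, 9.1)]


def mean_squared_error(w, b):
    """Average squared gap between the model's guess (w*x + b) and the data."""
    return sum((w * x + b - y) ** 2 for x, y in samples) / len(samples)


def train(steps=2000, learning_rate=0.01):
    """Iteratively nudge w and b in the direction that reduces the error."""
    w, b = 0.0, 0.0  # start with no knowledge of the data
    n = len(samples)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in samples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in samples) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = train()
print(f"learned model: y = {w:.2f} * x + {b:.2f}")
```

Nothing in the code states that the slope is about 2; the loop finds it from the samples alone.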

The structure of a typical Machine Learning Process

Let's see an example: http://regex.inginf.units.it/

Machine Learning Algorithms and Scenarios

Naive Bayes

• Bayes' theorem was developed in the 1700s by Thomas Bayes

– Provides a means for directly calculating the probability of a statement being true based on the available evidence

• Naive Bayes classifiers are simple probabilistic classifiers that apply Bayes' theorem with strong (naive) independence assumptions between the features

• Does an item belong or not belong based on the features within it?

• Benefits of Naive Bayes classifiers

– Depending on the exact nature of the probabilistic model, Naive Bayes classifiers can be trained very efficiently.

– Despite their very simple assumptions, Naive Bayes classifiers often work well in many complex tasks.

– An advantage of the Naive Bayes classifier is the small amount of training data needed.
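
A minimal sketch of a Naive Bayes text classifier follows, assuming word counts as the features and add-one smoothing. The training documents, the labels, and the function names `train_naive_bayes` and `classify` are made up for illustration and are not taken from FlexiCapture.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Count documents per label and words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log P(label) + sum of log P(word | label)."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Add-one smoothing keeps unseen (word, label) pairs from zeroing the score.
            smoothed = word_counts[label][word] + 1
            score += math.log(smoothed / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training set: four short documents, two categories.
training_docs = [
    ("invoice number total amount due", "invoice"),
    ("invoice date payment due", "invoice"),
    ("purchase order ship to quantity", "purchase_order"),
    ("order number quantity unit price", "purchase_order"),
]
model = train_naive_bayes(training_docs)
print(classify("total amount due on invoice", model))
print(classify("quantity to ship for order", model))
```

The "naive" part is the inner loop: each word contributes its own log-probability independently of every other word in the document.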

Support Vector Machine (SVM)

• SVMs are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis

• Given a set of training examples, each marked as belonging to one or the other of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other

• In addition to performing linear classification, SVMs can efficiently perform non-linear classification

• The method is widely used across a broad range of tasks. It is considered a basic method; more complex methods are used only if they provide a significant advantage over the support vector machine.
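
The two-category training described above can be sketched as a linear SVM trained by sub-gradient descent on the hinge loss. This is a simplified illustration, not a production SVM solver and not a kernel (non-linear) SVM. The data points, the function names, and the `learning_rate` and `reg` values are assumptions made for the example.

```python
def train_linear_svm(points, labels, epochs=200, learning_rate=0.01, reg=0.01):
    """Fit w and b so that label * (w . x + b) >= 1 for as many points as possible."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            margin = y * (w[0] * x1 + w[1] * x2 + b)
            # Regularization step: shrink the weights slightly on every sample.
            w[0] -= learning_rate * reg * w[0]
            w[1] -= learning_rate * reg * w[1]
            # Hinge-loss step: only points inside the margin move the boundary.
            if margin < 1:
                w[0] += learning_rate * y * x1
                w[1] += learning_rate * y * x2
                b += learning_rate * y
    return w, b


def predict(w, b, point):
    """Assign a new example to category +1 or -1."""
    return 1 if w[0] * point[0] + w[1] * point[1] + b >= 0 else -1


# Made-up training examples, each marked as belonging to one of two categories.
points = [(-2, -1), (-1, -2), (-2, -2), (2, 1), (1, 2), (2, 2)]
labels = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(points, labels)
print(predict(w, b, (3, 3)), predict(w, b, (-3, -3)))
```

Only points that fall inside the margin change the model, which is why the examples nearest the boundary (the support vectors) end up determining it.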

Deep learning

• Deep learning is a form of machine learning based on neural networks with multiple layers

• Learning can be supervised, partially supervised or unsupervised.

• Some representations are loosely based on interpretation of information processing and communication patterns in a biological nervous system, such as neural coding that attempts to define a relationship between various stimuli and associated neuronal responses in the brain.

• Deep learning models have been applied to fields including computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation and bioinformatics, where they have produced results comparable to, and in some cases superior to, human experts
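
To make the idea of layers concrete, here is a sketch of the forward pass of a tiny two-layer network that computes XOR, something a single neuron cannot do. The weights and biases are hand-picked for illustration; a real deep learning system would learn them from data and would use many more layers and neurons. All names are hypothetical.

```python
def step(value):
    """Simplest possible activation: the neuron fires (1) or stays silent (0)."""
    return 1 if value > 0 else 0


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    return step(sum(i * w for i, w in zip(inputs, weights)) + bias)


def xor_network(a, b):
    # Hidden layer: two neurons looking at the raw inputs.
    h_or = neuron([a, b], [1, 1], -0.5)   # fires if a OR b is set
    h_and = neuron([a, b], [1, 1], -1.5)  # fires only if a AND b are set
    # Output layer: combines the hidden neurons ("OR but not AND").
    return neuron([h_or, h_and], [1, -2], -0.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_network(a, b))
```

The output neuron never sees the raw inputs, only the intermediate features produced by the hidden layer. Stacking more such layers is what makes a network "deep".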

Machine Learning Scenarios

• Supervised Learning (on labeled data)

– The main learning scenario. It requires manually labeled data. This is useful for tasks that do not require a lot of labeled data or where such data emerge naturally.

• Unsupervised Learning (on unlabeled data)

– This scenario makes it possible to find patterns in large arrays of raw data (big data analysis). It can be used in combination with supervised learning in order to significantly reduce the amount of manual labeling.

• Reinforcement learning (by correcting errors on the fly)

– This scenario allows results to improve during system operation, including via interaction with the user

Machine Learning Practical Examples

Sort these documents into like categories:

• Here are examples of each category

– Example of supervised learning

– Example of standard training process used for classification or extraction

– Algorithm must determine what features all of the included documents have in common that excluded documents don’t have

• You figure out what those categories are

– Example of unsupervised learning

– Example of Clustering

– Algorithm must determine how to group objects based on like features
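
Grouping objects by like features can be sketched with k-means clustering. This is one common clustering method, chosen here as an illustration; the slide does not name a specific algorithm. Each document is assumed to be described by two made-up numeric features, and the starting centroids are simply the first `k` points.

```python
def k_means(points, k, iterations=10):
    """Group 2-D points into k clusters without using any labels."""
    centroids = list(points[:k])  # naive initialization, for illustration only
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = (
                    sum(p[0] for p in cluster) / len(cluster),
                    sum(p[1] for p in cluster) / len(cluster),
                )
    return clusters, centroids


# Hypothetical feature vectors for six unlabeled documents.
documents = [(1, 1), (8, 9), (1, 2), (9, 8), (2, 1), (9, 9)]
clusters, centroids = k_means(documents, k=2)
print(clusters)
```

No category names are supplied anywhere; the algorithm only reports which documents belong together.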

• Here is feedback on correct or incorrect choices that have been made

– Example of Reinforcement learning

– Example of a “Feedback Loop”

– Allows for the refinement of the model through information from the user
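
A feedback loop of this kind can be sketched as a classifier that adjusts its word weights whenever a user corrects it. The perceptron-style update below is chosen for illustration only; the class name `FeedbackClassifier`, the labels, and the sample texts are hypothetical and do not describe how FlexiCapture implements its training.

```python
from collections import defaultdict


class FeedbackClassifier:
    """Keeps one weight per (label, word) and refines it from user corrections."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.weights = {label: defaultdict(float) for label in self.labels}

    def predict(self, text):
        words = text.lower().split()
        scores = {
            label: sum(self.weights[label][w] for w in words)
            for label in self.labels
        }
        # On a tie, the first label in the list wins.
        return max(self.labels, key=lambda label: scores[label])

    def feedback(self, text, correct_label):
        """Called when a user confirms or corrects the predicted label."""
        predicted = self.predict(text)
        if predicted == correct_label:
            return  # the choice was right, nothing to refine
        for word in text.lower().split():
            self.weights[correct_label][word] += 1.0
            self.weights[predicted][word] -= 1.0


clf = FeedbackClassifier(["invoice", "receipt"])
first_guess = clf.predict("store receipt total")   # untrained, so a blind guess
clf.feedback("store receipt total", "receipt")     # user corrects the mistake
clf.feedback("invoice total due", "invoice")       # user corrects a second one
print(first_guess, "->", clf.predict("store receipt total"))
```

The model starts with no knowledge and improves only through the corrections it receives during operation.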

Machine Learning in FlexiCapture

Machine Learning in FlexiCapture: Classification

Machine Learning in FlexiCapture: Field Training

INPUT

RECOGNITION

VERIFICATION

AUTOMATIC PROCESSING

EXPORT

AUTO-LEARNING

Machine Learning at ABBYY

• In addition to product-facing changes, Machine Learning processes continue to help ABBYY produce and refine technology internally:

– Real world classifiers

– Recognition engine training

– NLP Model training

– Etc

Why do we care?

automatic

Previous Approach

With Machine Learning

Summary

• Machine learning is an automated, analytical approach to pattern matching

• Machine learning can be used to augment and improve traditional rule-based processes

• Machine learning helps flip the traditional development cycle and speeds projects to value

Questions?