ABBYY Technology Summit 2017
© ABBYY Confidential
ABBYY NAHQ, 2017
FlexiCapture Technical Track
Machine Learning
Chip VonBurg
Cloud
Web
“Laser”
Agenda
• What is machine learning?
• Machine Learning Algorithms and Scenarios
• Machine learning in FlexiCapture
• Q&A
What is Machine Learning
• Machine learning is a method of data analysis that automates analytical model building
• Using algorithms that iteratively learn from data, machine learning allows computers to find hidden insights without being explicitly programmed where to look
• Evolved from the study of pattern recognition and computational learning theory
• Machine learning is closely related to (and often overlaps with) computational statistics
What is Machine Learning
Machine Learning enables applications to execute logic that wasn’t explicitly built
What is Machine Learning
• How is machine learning used?
– Biology/Genetics
– Search and recommendations
  • Music services (song/station suggestions)
  • Movie services (Netflix/Amazon, etc.)
    – Thumbs up/down is an example of reinforcement learning and a feedback loop
  • Ad services
– SPAM detection
– Anywhere pattern matching can be applied
What is Machine Learning
Machine Learning requires a few basic items:
• Sample Data
• An iterative process
• Machine Learning Algorithms
– Naive Bayes
– Support Vector Machine (SVM)
– Genetic Algorithms
– Deep learning
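The three ingredients above (sample data, an iterative process and a learning algorithm) can be seen together in a minimal sketch. It is an illustration under stated assumptions: the sample data are hypothetical, and plain linear regression trained by gradient descent stands in for the algorithms listed on the slide simply because it is the shortest learner to show. The program is never told the rule behind the data; it recovers it by repeatedly adjusting two parameters to reduce its error on the samples.

```python
# Hypothetical sample data: points generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

def fit_line(xs, ys, lr=0.02, iterations=5000):
    """Learn w and b in y = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # start with no knowledge of the relationship
    n = len(xs)
    for _ in range(iterations):
        # Errors of the current model on the sample data
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # One small step downhill; repeating it is the iterative process
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = fit_line(xs, ys)
print(f"learned w = {w:.3f}, b = {b:.3f}")  # should approach w = 2, b = 1
```

The learning rate and iteration count are arbitrary choices that happen to converge for this small data set; with a learning rate that is too large, the same loop diverges instead of learning.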
The structure of a typical Machine Learning Process
Let's see an example: http://regex.inginf.units.it/
Machine Learning Algorithms and Scenarios
Naive Bayes
• Bayes' theorem was developed in the 1700s by Thomas Bayes
– Provides a means for directly calculating the probability of a statement being true based on the available evidence
• A Naive Bayes classifier is a simple probabilistic classifier that applies Bayes' theorem with strong (naive) independence assumptions between the features
• Does an item belong or not belong, based on the features within it?
• Benefits of Naive Bayes classifiers
– Depending on the exact nature of the probabilistic model, Naive Bayes classifiers can be trained very efficiently.
– Despite their naive design and simplistic assumptions, Naive Bayes classifiers often work well in many complex tasks.
– They need only a small amount of training data.
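A minimal sketch of a multinomial Naive Bayes text classifier, assuming documents are split into lowercase words and using add-one (Laplace) smoothing. By Bayes' theorem, P(class | document) is proportional to P(class) × P(document | class), and the naive independence assumption turns P(document | class) into a product of per-word probabilities. The class name, training documents and labels are hypothetical; this is not a description of how FlexiCapture implements classification.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # used for the prior P(c)
        self.word_counts = defaultdict(Counter)   # class -> word -> count
        self.vocab = set()
        for doc, label in zip(docs, labels):
            words = tokenize(doc)
            self.word_counts[label].update(words)
            self.vocab.update(words)
        self.total_docs = len(docs)
        return self

    def log_posterior(self, doc, label):
        # log P(c) + sum of log P(w | c); the evidence P(d) is the same for
        # every class, so it can be ignored when comparing classes.
        score = math.log(self.class_counts[label] / self.total_docs)
        total_words = sum(self.word_counts[label].values())
        vocab_size = len(self.vocab)
        for word in tokenize(doc):
            # Naive assumption: each word contributes independently given the class
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (total_words + vocab_size))
        return score

    def predict(self, doc):
        return max(self.class_counts, key=lambda c: self.log_posterior(doc, c))

# Hypothetical training data
docs = [
    "invoice total amount due",
    "invoice number amount tax",
    "dear sir thank you for your letter",
    "dear madam regards sincerely",
]
labels = ["invoice", "invoice", "letter", "letter"]

clf = NaiveBayes().fit(docs, labels)
print(clf.predict("amount due on invoice"))  # → invoice
print(clf.predict("dear sir regards"))       # → letter
```

Working in log space avoids numeric underflow when many small probabilities are multiplied; smoothing keeps an unseen word from forcing a class probability to zero.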
Support Vector Machine (SVM)
• SVMs are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis
• Given a set of training examples, each marked as belonging to one or the other of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other
• In addition to performing linear classification, SVMs can efficiently perform non-linear classification using the kernel trick
• The method is widely used across many tasks. It is often treated as a baseline: more complex methods are used only if they provide a significant advantage over the SVM.
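The linear case can be sketched by minimising the regularised hinge loss with plain full-batch sub-gradient descent. This is only one simple way to train an SVM (dedicated solvers are normally used in practice); the hyperparameters `lam`, `lr` and `epochs` are arbitrary assumptions, the two-class data are hypothetical, and non-linear classification with kernels is not shown.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=1000):
    """Minimise lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b)))
    using full-batch sub-gradient descent. Labels must be +1 or -1."""
    dim = len(points[0])
    n = len(points)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regulariser
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge loss is active
                for i in range(dim):
                    grad_w[i] -= y * x[i] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Assign x to the +1 or -1 side of the learned hyperplane."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical two-class training examples
points = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0),
          (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
print([predict(w, b, p) for p in points])  # → [1, 1, 1, -1, -1, -1]
```

Only the examples that fall inside the margin contribute to each update; these are the "support vectors" that give the method its name.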
Deep learning
• Deep learning is based on neural networks with multiple layers between the input and the output
• Learning can be supervised, partially supervised or unsupervised.
• Some representations are loosely based on interpretation of information processing and communication patterns in a biological nervous system, such as neural coding that attempts to define a relationship between various stimuli and associated neuronal responses in the brain.
• Deep learning architectures have been applied to fields including computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation and bioinformatics, where they have produced results comparable to, and in some cases superior to, human experts
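To make "multiple layers" concrete, here is a minimal sketch of a small fully connected network with two hidden layers, trained by backpropagation on XOR, a problem that a single linear layer cannot solve. Everything here (sigmoid activations, squared-error loss, layer sizes, learning rate, seed and epoch count) is an illustrative assumption; real deep learning work relies on dedicated frameworks and far larger models.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class Layer:
    """Fully connected layer with a sigmoid activation."""

    def __init__(self, n_in, n_out, rng):
        self.w = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        self.b = [0.0] * n_out

    def forward(self, x):
        self.x = x  # cached for the backward pass
        self.out = [sigmoid(sum(wij * xj for wij, xj in zip(row, x)) + bi)
                    for row, bi in zip(self.w, self.b)]
        return self.out

    def backward(self, grad_out, lr):
        # Chain rule through the sigmoid: d(sigmoid)/dz = s * (1 - s)
        delta = [g * s * (1.0 - s) for g, s in zip(grad_out, self.out)]
        # Gradient for the previous layer, computed before the weights change
        grad_in = [sum(delta[i] * self.w[i][j] for i in range(len(delta)))
                   for j in range(len(self.x))]
        for i, d in enumerate(delta):
            for j, xj in enumerate(self.x):
                self.w[i][j] -= lr * d * xj
            self.b[i] -= lr * d
        return grad_in

class DeepNet:
    """A stack of layers; 'deep' means more than one hidden layer."""

    def __init__(self, sizes, seed=0):
        rng = random.Random(seed)
        self.layers = [Layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

    def predict(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x

    def train_step(self, x, target, lr):
        out = self.predict(x)
        # Gradient of the loss 0.5 * sum((out - target)^2) w.r.t. the output
        grad = [o - t for o, t in zip(out, target)]
        for layer in reversed(self.layers):  # backpropagate layer by layer
            grad = layer.backward(grad, lr)

def mean_loss(net, data):
    return sum(0.5 * sum((o - t) ** 2 for o, t in zip(net.predict(x), y))
               for x, y in data) / len(data)

# XOR: the output is 1 only when exactly one input is 1
XOR = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
       ([1.0, 0.0], [1.0]), ([1.0, 1.0], [0.0])]

net = DeepNet([2, 4, 4, 1], seed=0)  # two hidden layers of four units each
before = mean_loss(net, XOR)
for _ in range(3000):
    for x, y in XOR:
        net.train_step(x, y, lr=0.5)
after = mean_loss(net, XOR)
print(f"mean loss before training: {before:.4f}, after: {after:.4f}")
```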
Machine Learning Scenarios
• Supervised Learning (on labeled data)
– The main learning scenario. It requires manually labeled data. This is useful for tasks that do not require a lot of labeled data or where such data emerge naturally.
• Unsupervised Learning (on unlabeled data)
– This scenario makes it possible to find patterns in large arrays of raw data (big data analysis). It can be combined with supervised learning to significantly reduce the amount of manual labeling.
• Reinforcement learning (by correcting errors on the fly)
– This scenario makes it possible to improve results during system operation, including via interaction with the user.
Machine Learning Practical Examples
Sort these documents into like categories:
• Here are examples of each category
– Example of supervised learning
– Example of standard training process used for classification or extraction
– Algorithm must determine what features all of the included documents have in common that excluded documents don’t have
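The rule in the last bullet can be sketched in its strictest form: keep the words that appear in every included example and in none of the excluded ones, then use them to decide whether a new document belongs. The function names and sample documents are hypothetical, and real classifiers weigh features statistically rather than demanding an exact match.

```python
def words(doc):
    return set(doc.lower().split())

def discriminating_features(included, excluded):
    """Words present in every included document and absent from all excluded ones."""
    common = set.intersection(*(words(doc) for doc in included))
    seen_elsewhere = set().union(*(words(doc) for doc in excluded))
    return common - seen_elsewhere

def belongs(doc, features):
    """A new document belongs if it contains every discriminating feature."""
    return features <= words(doc)

# Hypothetical labeled examples
included = [
    "Invoice 1001 remit to ACME total due",
    "Invoice 1002 remit to ACME total due",
    "Invoice 1003 remit to Globex total due",
]
excluded = [
    "Purchase order 77 total due",
    "Letter to ACME customer service",
]

features = discriminating_features(included, excluded)
print(sorted(features))                                  # → ['invoice', 'remit']
print(belongs("INVOICE 2001 remit payment", features))   # → True
print(belongs("Purchase order 78", features))            # → False
```

Note how "total" and "due" are dropped even though every invoice contains them: they also occur in an excluded document, so they do not help tell the categories apart.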
Machine Learning Practical Examples
Sort these documents into like categories:
• You figure out what those categories are
– Example of unsupervised learning
– Example of Clustering
– Algorithm must determine how to group objects based on like features
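Clustering can be illustrated with k-means, one common clustering algorithm (the slide does not name a specific one). This minimal sketch assumes documents have already been turned into numeric feature vectors; the 2-D points below are hypothetical stand-ins for such vectors, and the number of clusters `k` must be chosen up front.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Group points into k clusters by alternating assignment and update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # nothing moved: converged
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical feature vectors forming two natural groups
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5),
          (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]

centroids, clusters = kmeans(points, k=2)
for cluster in clusters:
    print(sorted(cluster))
```

No labels are supplied: the grouping emerges only from which points lie close together, which is what distinguishes this from the supervised case.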
Machine Learning Practical Examples
Sort these documents into like categories:
• Here is feedback on correct or incorrect choices that have been made
– Example of Reinforcement learning
– Example of a “Feedback Loop”
– Allows for the refinement of the model through information from the user
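A feedback loop can be sketched as a classifier that changes only when the user corrects it. This minimal sketch assumes documents are bags of lowercase words and uses a mistake-driven update (the multi-class perceptron rule): on a wrong guess, the document's words are pushed toward the correct category and away from the guessed one. The class, categories and documents are hypothetical.

```python
from collections import defaultdict

class FeedbackClassifier:
    """Scores categories by learned word weights and refines them from user feedback."""

    def __init__(self, categories):
        self.categories = list(categories)
        # weights[category][word]: how strongly a word points to a category
        self.weights = {c: defaultdict(float) for c in self.categories}

    def score(self, doc, category):
        return sum(self.weights[category][w] for w in doc.lower().split())

    def predict(self, doc):
        # Ties (e.g. an untrained model) go to the first category listed
        return max(self.categories, key=lambda c: self.score(doc, c))

    def feedback(self, doc, correct):
        """The user confirms or corrects a prediction; only mistakes change the model."""
        guess = self.predict(doc)
        if guess != correct:
            for w in doc.lower().split():
                self.weights[correct][w] += 1.0  # reward the right category
                self.weights[guess][w] -= 1.0    # penalise the wrong guess
        return guess

clf = FeedbackClassifier(["invoice", "letter"])
first_guess = clf.feedback("dear sir regards", "letter")  # wrong guess, user corrects it
clf.feedback("invoice total due", "invoice")              # right guess, no change
print(first_guess)                          # → invoice
print(clf.predict("dear madam regards"))    # → letter
```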
Machine Learning in FlexiCapture
Machine Learning in FlexiCapture: Classification
Machine Learning in FlexiCapture: Field Training
[Diagram: processing stages Input, Recognition, Verification, Automatic Processing and Export, with Auto-Learning]
Machine Learning at ABBYY
• In addition to product-facing changes, machine learning processes continue to help ABBYY produce and refine technology internally:
– Real-world classifiers
– Recognition engine training
– NLP model training
– Etc.
Why do we care?
[Chart: value across the Project Start, Development and Production phases, comparing the Previous Approach with the approach using Machine Learning]
Summary
• Machine learning is an automated, analytical approach to pattern matching
• Machine learning can be used to augment and improve traditional rule based processes
• Machine learning helps invert the traditional development cycle and speed projects to value
Questions?