Performance Evaluation of GANs
in a semi-supervised OCR Use Case
Florian Wilhelm, London, 2018-10-11
Special Interests
• Mathematical Modelling
• Recommendation Systems
• Data Science in Production
• Python Data Stack
• Maintainer of PyScaffold
Dr. Florian Wilhelm
Principal Data Scientist @ inovex
@FlorianWilhelm
FlorianWilhelm
florianwilhelm.info
Florian Tanten: Master Thesis @ inovex, October 2017 - May 2018
IT project house for digital transformation:
• Agile Development & Management
• Web · UI/UX · Replatforming · Microservices
• Mobile · Apps · Smart Devices · Robotics
• Big Data & Business Intelligence Platforms
• Data Science · Data Products · Search · Deep Learning
• Data Center Automation · DevOps · Cloud · Hosting
• Trainings & Coachings
Using technology to inspire our clients. And ourselves.
inovex offices in Karlsruhe · Cologne · Munich · Pforzheim · Hamburg · Stuttgart.
www.inovex.de
Agenda
1. Use Case
2. Text Spotting
3. Data and Pipeline
4. Generative Adversarial Networks
5. Semi-supervised Learning
6. Results
Source: https://www.autocheck.com/vehiclehistory/autocheck/en/vinbasics
Vehicle Identification Number (VIN)
Unique identifier of a vehicle, like a fingerprint
VIN segments shown in the figure:
• Country
• Manufacturer
• Details
• Security code
• Model year
• Assembly plant
• Serial number
• Flexible fuel vehicles
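To make the segment structure concrete, here is a minimal Python sketch that checks the syntax of a VIN (17 characters; digits and capital letters except I, O and Q) and slices it into the segments labelled in the figure. It assumes the North American position layout used by the figure's source (check digit at position 9, model year at 10, plant at 11); ISO 3779 itself only fixes the WMI (positions 1-3), VDS (4-9) and VIS (10-17), so European VINs like the ones in this talk need not follow the finer split. `split_vin` and the segment names are hypothetical, not part of the talk.

```python
import re

# Digits and capital letters; I, O and Q are not allowed in VINs.
VIN_PATTERN = re.compile(r"[A-HJ-NPR-Z0-9]{17}")

# Assumption: North American position layout (0-based slices).
SEGMENTS = {
    "wmi": slice(0, 3),          # country + manufacturer
    "details": slice(3, 8),      # vehicle attributes
    "check_digit": slice(8, 9),  # the "security code"
    "model_year": slice(9, 10),
    "plant": slice(10, 11),      # assembly plant
    "serial": slice(11, 17),     # serial number
}


def split_vin(vin: str) -> dict:
    """Validate the VIN syntax and return its segments (hypothetical helper)."""
    vin = vin.strip().upper()
    if not VIN_PATTERN.fullmatch(vin):
        raise ValueError(f"not a syntactically valid VIN: {vin!r}")
    return {name: vin[part] for name, part in SEGMENTS.items()}


print(split_vin("WF0DXXGAKDEJ37385"))
# {'wmi': 'WF0', 'details': 'DXXGA', 'check_digit': 'K', 'model_year': 'D', 'plant': 'E', 'serial': 'J37385'}
```

Because the VIN alphabet excludes I, O and Q, a syntax check like this could flag OCR readings such as a letter "O" in place of the digit "0".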
Use Case
Spotting the vehicle identification number (VIN) in images of vehicle registration documents
VIN: WF0DXXGAKDEJ37385 → VIN decoder → information about the car:
• Manufacturer: BMW
• Model: X3
• Year: 2013-03-21
• Engine power: 143 PS
• Equipment: xenon lights, ...
OCR Libraries
Commercial software · Open source tools (e.g. PyOCR)
OCR with Tesseract
„VSSZZZGJZHR03G533“ → ???
2. Text Spotting
Methodology in Text Spotting
Spotting = Detection + Recognition
Character detection & extraction:
• Sliding window: SVM, learning with HOG, CNN
• Computer vision tools: connected components, stroke width transform, edge detection
• Others: region proposal, hypotheses CNN pooling
Character recognition (character or word level):
• CNN
• CNN + RNN
• SVM
• Nearest neighbor
CNN = Convolutional Neural Network · SVM = Support Vector Machine · HOG = Histogram of Oriented Gradients · RNN = Recurrent Neural Network · RL = Reinforcement Learning
Source: Girshick et al. (2014), „Region-Based Convolutional Networks for Accurate Object Detection and Segmentation“
Convolutional Neural Network
Convolution with a 3x3 kernel and stride = 1
Max pooling with a 2x2 filter and stride = 2
Sources: https://en.wikipedia.org/wiki/Convolutional_neural_network; http://intellabs.github.io/ParallelJavaScript/
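The two operations from the slide can be written out in a few lines of NumPy. This is an illustrative sketch with plain loops for readability, not how the models in the talk were implemented (the framework is not stated); like most deep learning libraries, the "convolution" here is a cross-correlation, i.e. the kernel is not flipped. The function names are hypothetical.

```python
import numpy as np


def conv2d(image: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
    """'Valid' 2-D convolution as used in CNNs (cross-correlation, kernel not flipped)."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out


def max_pool2d(feature_map: np.ndarray, size: int = 2, stride: int = 2) -> np.ndarray:
    """Keep the maximum of each size x size window."""
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = window.max()
    return out


image = np.arange(36, dtype=float).reshape(6, 6)
features = conv2d(image, np.ones((3, 3)), stride=1)  # 6x6 -> 4x4
pooled = max_pool2d(features)                        # 4x4 -> 2x2
print(features.shape, pooled.shape)  # (4, 4) (2, 2)
```

With stride 1 a 3x3 kernel shrinks each side by 2 pixels, while 2x2 pooling with stride 2 halves each side.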
3. Data and Pipeline
Objectives
1. Implementation of a text spotting prototype (example output: „XLG0H200NA0A10348“)
2. Comparison of classifiers
a) Supervised method
b) Semi-supervised method
Dataset: ~170 images of vehicle registration documents
End-to-End Text Spotting Pipeline
Image of a vehicle registration document → Region of Interest Extractor → image depicting only the VIN
→ Sliding window → all windows
→ Character Detector (2 classes) → all windows with characters
→ Non-Maximum Suppression → only one window per character
→ Character Recognizer (36 classes) → recognized characters:
X L G 0 H 2 0 N A 10 04 43 80
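The sliding window and non-maximum suppression steps of the pipeline can be sketched in plain Python. This assumes axis-aligned boxes `(x1, y1, x2, y2)` and the common greedy, IoU-based NMS variant; the talk does not say which NMS variant, window size, step or overlap threshold it uses, so all of these and the detector scores below are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def sliding_windows(width: int, height: int, win_w: int, win_h: int, step: int) -> Iterator[Box]:
    """Yield all fixed-size windows that fit into the image, moved by `step` pixels."""
    for y in range(0, height - win_h + 1, step):
        for x in range(0, width - win_w + 1, step):
            yield (x, y, x + win_w, y + win_h)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def non_maximum_suppression(boxes: Sequence[Box], scores: Sequence[float],
                            iou_threshold: float = 0.3) -> List[int]:
    """Greedy NMS: visit boxes by descending score and keep a box only if it
    overlaps no already kept box by more than `iou_threshold`."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical detector scores for three windows; the first two cover the same character.
boxes = [(0, 0, 10, 20), (2, 0, 12, 20), (20, 0, 30, 20)]
scores = [0.9, 0.8, 0.95]
print(non_maximum_suppression(boxes, scores))  # [2, 0]
```

In the pipeline, only the windows the Character Detector classifies as „character“ would be passed to NMS, and the surviving windows go on to the Character Recognizer.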
Small dataset: what to do about that?
1. Data Generation
2. Data Augmentation
Data Augmentation
Original image, labeled manually as „0“ → data augmentation → datasets:
• Character Recognizer (36 classes): label „0“
• Character Detector (2 classes): labels „character“ and „no character“
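The slide does not list which transformations were used, so the following NumPy sketch is only one plausible set under stated assumptions: small random shifts, brightness scaling and Gaussian noise applied to a grayscale character crop (32x32 is a hypothetical size). Every copy inherits the manual label of the original crop. `augment` and its parameters are hypothetical; note that `np.roll` wraps pixels around the border, which is a simplification.

```python
import numpy as np


def augment(crop: np.ndarray, rng: np.random.Generator, n: int = 10,
            max_shift: int = 2, noise_std: float = 8.0) -> list:
    """Return n randomly shifted, brightness-scaled, noisy copies of a grayscale crop."""
    variants = []
    for _ in range(n):
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        shifted = np.roll(crop, shift=(dy, dx), axis=(0, 1))  # wraps around at the border
        brightness = rng.uniform(0.8, 1.2)
        noisy = shifted * brightness + rng.normal(0.0, noise_std, size=crop.shape)
        variants.append(np.clip(noisy, 0, 255).astype(np.uint8))
    return variants


rng = np.random.default_rng(seed=0)
crop = np.full((32, 32), 128, dtype=np.uint8)  # stand-in for a crop labeled „0“
copies = augment(crop, rng, n=50)
print(len(copies))  # 50
```

Multiplying each manually labeled crop this way is one route from a few hundred labeled characters to datasets of several thousand images.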
Datasets
170 images of vehicle registration documents, split into a training set (85 images) and a testing set (85 images):
• Training sets of the classifiers (after data augmentation): Detector ~42000 images, 2 classes; Recognizer ~8000 images, 36 classes
• Testing sets of the classifiers (after data augmentation): Detector ~42000 images, 2 classes; Recognizer ~8000 images, 36 classes
• Testing set of the pipeline: 85 images
Classifiers
1. Supervised Convolutional Neural Network: input → feature extraction → classification
2. Semi-supervised Generative Adversarial Network: generator and discriminator
4. Generative Adversarial Networks
Yann LeCun, Director of Facebook AI Research, Professor at NYU:
“... (GANs) and the variations that are now being proposed is the most interesting idea in the last 10 years in ML, in my opinion.”
Ian J. Goodfellow @ Google Brain
Generative Adversarial Network
Generator (G) Discriminator (D)
Goal: Generate images that look realistic
Goal: Differentiate between fake and real images
Generative Adversarial Network
Generator (G) → generated image → Discriminator (D) ← real (labeled) images
D's verdict: „D classified the generated image as 10% real“
Is D correct? „yes“
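Source: Goodfellow et al. (2014), „Generative Adversarial Networks“
Mathematical formulation
Objective function:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]$$
• D(x): discriminator output for real images
• D(G(z)): discriminator output for fake images
• The discriminator calculates the likelihood, in [0, 1], that an image is real
• Training alternates between maximizing V with respect to D (discriminator step) and minimizing V with respect to G (generator step)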
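The objective can be estimated directly from discriminator outputs on a batch. A minimal NumPy sketch, assuming D's outputs for real and generated images are already available as probabilities in (0, 1); the function names are hypothetical and the talk does not show its training code.

```python
import math

import numpy as np


def gan_value(d_real: np.ndarray, d_fake: np.ndarray) -> float:
    """Sample estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) for real images, values in (0, 1)
    d_fake: discriminator outputs D(G(z)) for generated images, values in (0, 1)
    """
    return float(np.mean(np.log(d_real)) + np.mean(np.log1p(-d_fake)))


def discriminator_loss(d_real: np.ndarray, d_fake: np.ndarray) -> float:
    # D maximizes V, which is the same as minimizing -V
    return -gan_value(d_real, d_fake)


def generator_loss(d_fake: np.ndarray) -> float:
    # G minimizes V; only the second term depends on G
    return float(np.mean(np.log1p(-d_fake)))


# D is confident and correct: real images rated close to 1, a fake rated "10% real"
print(discriminator_loss(np.array([0.9, 0.8]), np.array([0.1])))
# D cannot tell real from fake (D = 1/2 everywhere): V = 2 log(1/2) = -log 4
print(gan_value(np.full(4, 0.5), np.full(4, 0.5)), -math.log(4))
```

The value -log 4 is the one Goodfellow et al. (2014) derive for the global optimum, where the generated distribution matches the data and D outputs 1/2. The same paper notes that early in training it is often better for G to maximize log D(G(z)) instead of minimizing log(1 - D(G(z))), because the latter saturates when D rejects fakes confidently.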
Example of generated images
Training images vs. images generated during the learning process
5. Semi-supervised Learning
Semi-supervised Learning
Supervised Learning
Unsupervised Learning
Semi-supervised Learning
• Makes use of unlabeled data
• Combines supervised and unsupervised learning
Semi-supervised GAN for Character Detection
Real labeled images → Discriminator
Real unlabeled images → Discriminator
Generator → generated images → Discriminator
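One widely used way to let a discriminator learn from all three inputs is the K+1-class formulation of Salimans et al. (2016, „Improved Techniques for Training GANs“): the discriminator predicts the K real classes (here „character“ and „no character“) plus an extra „fake“ class. The talk does not state that exactly this loss was used, so treat the NumPy sketch below as an assumption-laden illustration; the function names are hypothetical and the generator's loss is omitted. Fixing the fake logit to 0 gives p(fake | x) = 1 / (1 + Z(x)) with Z(x) = sum_k exp(l_k(x)).

```python
import numpy as np


def logsumexp(logits: np.ndarray) -> np.ndarray:
    """Row-wise log(sum(exp(logits))), computed stably."""
    m = logits.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True))).ravel()


def softplus(x: np.ndarray) -> np.ndarray:
    """log(1 + exp(x)), computed stably."""
    return np.logaddexp(0.0, x)


def discriminator_losses(logits_labeled, labels, logits_unlabeled, logits_fake):
    """Return (supervised, unsupervised) discriminator losses.

    Each logits array has shape (batch, K) for the K real classes; the logit
    of the extra "fake" class is fixed to 0.
    """
    # Cross-entropy over the K real classes for the labeled images
    lse_labeled = logsumexp(logits_labeled)
    picked = logits_labeled[np.arange(len(labels)), labels]
    supervised = np.mean(lse_labeled - picked)
    # Real unlabeled images should not be "fake": -log(1 - p_fake) = softplus(-lse)
    real_term = np.mean(softplus(-logsumexp(logits_unlabeled)))
    # Generated images should be "fake": -log(p_fake) = softplus(lse)
    fake_term = np.mean(softplus(logsumexp(logits_fake)))
    return float(supervised), float(real_term + fake_term)


# Untrained discriminator (all logits 0) with K = 2 classes: character / no character
sup, unsup = discriminator_losses(
    np.zeros((4, 2)), np.array([0, 1, 0, 1]), np.zeros((8, 2)), np.zeros((8, 2))
)
print(sup, unsup)  # log(2) ≈ 0.693 and log(1.5) + log(3) = log(4.5) ≈ 1.504
```

Only the supervised term needs labels; the unsupervised term is where the real unlabeled images and the generator's output enter, which is what makes the approach attractive when few labeled images are available.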
6. Results
Character Detector (2 classes)
[Figure: accuracy (60% to 100%) vs. size of labeled training set (20, 50, 100, 200, 400, 700, 1000, 5000, 15000, 30000, 42000); series: DCNN, DCNN pretrained]
Pretraining of the DCNN with manually generated „character“ and „no character“ images, created with CAPTCHA methods
Character Detector (2 classes)
[Figure: accuracy (60% to 100%) vs. size of labeled training set (20 to 42000); series: DCNN, DCNN pretrained, Supervised GAN]
Supervised GAN: the Discriminator receives real labeled images and images from the Generator and classifies them as „character“, „no character“ or „fake“
Character Detector (2 classes)
[Figure: accuracy (60% to 100%) vs. size of labeled training set (20 to 42000); series: DCNN, DCNN pretrained, Supervised GAN, Semi-supervised GAN]
Semi-supervised GAN: the Discriminator additionally receives real unlabeled images
Character Recognizer (36 classes)
[Figure, two panels of accuracy vs. size of labeled training set; Character Detector: 20 to 42000 labeled images, accuracy 60% to 100%; Character Recognizer: 36, 72, 108, 200, 300, 400, 600, 800, 1000, 5000, 8000 labeled images, accuracy 0% to 100%; series: DCNN, DCNN pretrained, Supervised GAN]
End-to-End Text Spotting Pipeline
Region of Interest Extractor → Sliding window → Character Detector (2 classes) → Non-Maximum Suppression → Character Recognizer (36 classes)
Evaluated on 85 test images (1., 2., ..., 85.): Accuracy = 99.94%
Our Approach vs. Google Cloud Vision API
Evaluated on 85 images of VINs:
• Our approach (Region of Interest Extractor → Sliding window → Character Detector (2 classes) → Non-Maximum Suppression → Character Recognizer (36 classes)): mean Levenshtein distance = 0.011
• Google Cloud Vision API: mean Levenshtein distance = 4.49
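Levenshtein distance: the minimum number of single-character insertions, deletions and substitutions needed to turn the classification into the label, e.g.
Classification „AYZ33“, label „XYZ321“ → Levenshtein distance = 3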
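The comparison metric can be computed with the classic dynamic-programming recurrence, keeping only one row of the table at a time. This is a generic sketch of the Levenshtein distance and its mean over a test set, not the evaluation code from the thesis; the function names are hypothetical.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to every prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def mean_levenshtein(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Average distance over pairs of (classification, label)."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need the same, non-zero number of predictions and labels")
    return sum(levenshtein(p, t) for p, t in zip(predictions, labels)) / len(labels)


print(levenshtein("AYZ33", "XYZ321"))  # 3
```

Unlike per-character accuracy, this metric also penalizes missing or extra characters, which matters when a text spotter merges or splits characters.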
Key Learnings
• Custom solutions can clearly outperform off-the-shelf software in a specific use case (mean Levenshtein distance 0.011 vs. 4.49 for the Google Cloud Vision API)
• Semi-supervised GANs can be successfully applied in use cases with little labeled data
• With simple data augmentation techniques, a small dataset might already be enough
Bibliography
- Krizhevsky et al. (2012), „ImageNet Classification with Deep Convolutional Neural Networks“
- Girshick et al. (2014), „Region-Based Convolutional Networks for Accurate Object Detection and Segmentation“
- Girshick (2015), „Fast R-CNN“
- Ren et al. (2015), „Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks“
- He et al. (2017), „Mask R-CNN“
- Goodfellow et al. (2014), „Generative Adversarial Networks“
Thank you!
Florian Wilhelm
Principal Data Scientist
inovex GmbH
Schanzenstraße 6-20, Kupferhütte 1.13, 51063 Köln