End-to-End Text Recognition with Convolutional Neural Networks
Tao Wang*, David J. Wu*, Adam Coates, Andrew Y. Ng
Computer Science Department
Stanford University
* Denotes equal contribution
Tao Wang
Scene Text Recognition Overview
• Text “in the wild” is hard to recognize
• Wide range of variations in backgrounds, textures, fonts, and lighting conditions
Street View Text Dataset (K. Wang et al., 2011)
ICDAR 2003 Dataset (S. Lucas et al., 2003)
Two-Stage Framework
Stage 1: Detection/Classification
Stage 2: High-level Inference
Example output: “HOTEL”
Works | Classification and detection | High-level inference
Neumann and Matas, 2012 | MSER + SVM with RBF kernel | Exhaustive graph search
Mishra et al., 2012 | HOG + SVM with RBF kernel | CRF + N-gram model
K. Wang et al., 2011 | HOG + Random Ferns | Pictorial structure
Weinman et al., 2008 | Appearance + Geometry | Semi-Markov CRF
Works | Classification and detection | High-level inference
Our approach | Learnt features + 2-layer CNN | Simple off-the-shelf heuristics
Most other approaches | Hand-designed features + off-the-shelf classifier | Graph based inference models
Various Benchmarks
Detection/Classification:
• ICDAR 62-way cropped character classification (SOTA)
End-to-end system after high-level inference, using a lexicon:
• ICDAR and SVT cropped word recognition (SOTA on ICDAR)
• ICDAR and SVT end-to-end text recognition (SOTA)
Unsupervised Feature Learning
Contrast Normalization + ZCA whitening
K-Means
Coates et al., 2011
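The three steps named on this slide can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the function names, the `eps` constants, the patch size and the use of plain K-means (rather than whatever variant Coates et al. used) are all assumptions made for the example.

```python
import numpy as np

def contrast_normalize(patches, eps=1e-2):
    """Subtract each patch's mean and divide by its standard deviation."""
    centered = patches - patches.mean(axis=1, keepdims=True)
    return centered / np.sqrt(centered.var(axis=1, keepdims=True) + eps)

def zca_whiten(patches, eps=1e-2):
    """Return ZCA-whitened patches, the data mean and the whitening matrix."""
    mean = patches.mean(axis=0)
    centered = patches - mean
    cov = centered.T @ centered / centered.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eps keeps the inverse square root finite for near-zero eigenvalues.
    whitener = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return centered @ whitener, mean, whitener

def kmeans_filters(patches, k, iters=10, seed=0):
    """Plain K-means (assumed variant); the centroids serve as a filter bank."""
    rng = np.random.default_rng(seed)
    centroids = patches[rng.choice(len(patches), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every patch to every centroid.
        dists = ((patches ** 2).sum(axis=1)[:, None]
                 - 2.0 * patches @ centroids.T
                 + (centroids ** 2).sum(axis=1)[None, :])
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = patches[labels == j]
            if len(members):  # leave empty clusters where they are
                centroids[j] = members.mean(axis=0)
    return centroids

# Random numbers stand in for flattened 8x8 grayscale patches (hypothetical size).
demo_rng = np.random.default_rng(0)
demo_patches = demo_rng.normal(size=(500, 64))
demo_whitened, _, _ = zca_whiten(contrast_normalize(demo_patches))
demo_filters = kmeans_filters(demo_whitened, k=16)
```

Each row of `demo_filters` can be reshaped to the patch size and used as a first-layer convolution filter.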
Convolution → Spatial Pooling → Convolution → Spatial Pooling → L2-SVM Classifier → ✓ Text / × Non-Text
Backpropagation
Large representation but not enough data. Overfitting?
Layer sizes: 96 (1st layer), 256 (2nd layer)
~10K parameters for detection
~50K parameters for classification
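A forward pass through two convolution and spatial pooling stages, followed by an L2-SVM objective, might look like the sketch below. Only the layer sizes 96 and 256 come from the slide. The 32x32 input, the filter and pooling sizes, the ReLU activation and average pooling are assumptions chosen for illustration, and all function names are hypothetical.

```python
import numpy as np

def convolve_valid(image, filters):
    """Valid cross-correlation of an (H, W, C) input with (K, h, w, C) filters."""
    H, W, _ = image.shape
    K, h, w, _ = filters.shape
    out = np.empty((H - h + 1, W - w + 1, K))
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            window = image[i:i + h, j:j + w, :]
            # Dot product of every filter with the current window.
            out[i, j] = np.tensordot(filters, window, axes=([1, 2, 3], [0, 1, 2]))
    return out

def average_pool(maps, size):
    """Non-overlapping average pooling over size x size blocks."""
    H, W, K = maps.shape
    H2, W2 = H // size, W // size
    trimmed = maps[:H2 * size, :W2 * size, :]
    return trimmed.reshape(H2, size, W2, size, K).mean(axis=(1, 3))

def forward(image, filters1, filters2, pool1, pool2):
    """Two convolution + pooling stages; returns a flat feature vector."""
    a1 = np.maximum(convolve_valid(image, filters1), 0.0)  # ReLU is an assumption
    p1 = average_pool(a1, pool1)
    a2 = np.maximum(convolve_valid(p1, filters2), 0.0)
    return average_pool(a2, pool2).ravel()

def l2_svm_loss(features, labels, weights, bias, c=1.0):
    """Squared hinge loss of a linear SVM; labels are +1 (text) or -1 (non-text)."""
    margins = np.maximum(0.0, 1.0 - labels * (features @ weights + bias))
    return 0.5 * weights @ weights + c * np.sum(margins ** 2)

# Hypothetical sizes: 32x32 grayscale input, 8x8 then 2x2 filters.
demo_rng = np.random.default_rng(0)
demo_image = demo_rng.normal(size=(32, 32, 1))
demo_f1 = demo_rng.normal(size=(96, 8, 8, 1))
demo_f2 = demo_rng.normal(size=(256, 2, 2, 96))
demo_features = forward(demo_image, demo_f1, demo_f2, pool1=5, pool2=2)
```

With these assumed sizes the feature vector has 2 x 2 x 256 = 1024 entries, which is what the L2-SVM would consume.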
Synthetic Data
Color Statistics
Synthetic “hard negatives”
[Figure labels: Real; Synthetic; Unrealistic Synthetic Data; Real Data]
Java.Font + Natural backgrounds
Detector Performance
Text Line Bounding boxes
Candidate spaces
Classifier Performance
62-way classification accuracy (%) on ICDAR cropped characters (on ICDAR-Sample characters); higher is better.
Method | Accuracy (%)
Yokobayashi et al., 2006 | 81.4
Coates et al., 2011 | 81.7
K. Wang et al., 2011 | 64
Our approach | 83.9
Human | 89
[Figure: character class (vertical axis) against sliding window position (horizontal axis)]
Word Recognition
Lexicon: …, MAKE, SERIES, ESTATE, POKER, …
Word scores (max ∑): -5.45, 7.82, -1.74, -9.02
Example word: S E R I E S
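One way to read the “max ∑” on this slide is as an alignment problem: place the letters of a lexicon word at increasing sliding-window positions so that the summed classifier responses are as large as possible, then pick the lexicon word with the best total. The sketch below does this with dynamic programming. It is an assumed simplification for illustration, and `word_score`, `recognize` and the toy responses are hypothetical.

```python
def word_score(responses, word):
    """Best summed response for placing the letters of `word` left to right.

    responses[ch][p] is the classifier score for character ch at window
    position p. Letters must occupy strictly increasing positions.
    Returns -inf if the word has more letters than there are positions.
    """
    n_pos = len(next(iter(responses.values())))
    neg_inf = float("-inf")
    # prev[p]: best score for the letters handled so far using positions < p.
    prev = [0.0] * (n_pos + 1)
    for ch in word:
        row = responses[ch]
        cur = [neg_inf] * (n_pos + 1)
        for p in range(n_pos):
            placed = prev[p] + row[p]      # put ch at position p
            cur[p + 1] = max(cur[p], placed)
        prev = cur
    return prev[n_pos]

def recognize(responses, lexicon):
    """Return the lexicon word with the highest alignment score."""
    return max(lexicon, key=lambda w: word_score(responses, w))

# Toy responses for two characters over four window positions.
demo_responses = {"A": [1.0, 5.0, 0.0, 0.0], "B": [4.0, 0.0, 2.0, 3.0]}
demo_best = recognize(demo_responses, ["AB", "BA"])
```

In the toy example, "AB" scores 8.0 (A at position 1, B at position 3) and "BA" scores 9.0 (B at position 0, A at position 1), so `demo_best` is "BA".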
Cropped Word Recognition Accuracy
Accuracy (%) on cropped words benchmarks; higher is better.
Method | ICDAR-WD-50 | ICDAR-WD-FULL | SVT-WD
K. Wang et al., 2011 | 76 | 62 | 57
Mishra et al., 2012 | 82 | - | 73
Our approach | 90 | 84 | 70
…
Candidate spaces generated by detector
[Equation: a maximum over j, involving M, Seg and S]
End-to-end text recognition results
F-score on end-to-end benchmarks; higher is better.
Method | ICDAR-5 | ICDAR-20 | ICDAR-50 | ICDAR-FULL | SVT
K. Wang et al., 2011 | 0.72 | 0.70 | 0.68 | 0.51 | 0.38
Our approach | 0.76 | 0.74 | 0.72 | 0.67 | 0.46
Sample Output Images from SVT
Sample Output Images from ICDAR-FULL
$n = \max(c)$
$m = n - \max(\{c \setminus n\})$ -- “confidence margin”
PEOSTEL PEOST
POST POS
Hunspell
POSE POST
PEOPLE PISTOL
…
LEXICON
Suggested Words
Our F-score: 0.38
Neumann and Matas, 2010: 0.40
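The “confidence margin” on this slide can be illustrated with a few lines of Python. The reading assumed here is that c holds the character-class scores at one window position, n is the top score, and m is the gap between n and the best of the remaining scores. The function name and the example scores are hypothetical.

```python
def confidence_margin(scores):
    """Return (n, m): the top score and its gap to the runner-up.

    Assumes m = n - (best of the other scores). If the top score is
    tied, the margin is 0.
    """
    if len(scores) < 2:
        raise ValueError("need at least two class scores")
    ordered = sorted(scores, reverse=True)
    top = ordered[0]
    return top, top - ordered[1]

# Hypothetical class scores at a single sliding-window position.
demo_top, demo_margin = confidence_margin([0.2, 3.0, 1.5])
```

A large margin suggests the classifier clearly prefers one character at that position. A small margin marks an ambiguous reading, which is where spelling suggestions are most likely to help.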
Conclusion
• Learnt features + 2-layer CNN for character detection and classification
• Simple heuristics to build an end-to-end scene text recognition system
• State-of-the-art performance on
- ICDAR cropped character classification
- ICDAR cropped word recognition
- Lexicon based end-to-end recognition on ICDAR and SVT
• Extensible to more general lexicon with off-the-shelf spelling checker