Semantic Keyword Extraction via Adaptive Text Binarization of Unstructured Unsourced Video
Michele Merler and John R. Kender
Computer Science Department, Columbia University
{mmerler,jrk}@cs.columbia.edu
Example of the indexing function: the word "Energy" is recognized in slides across 4 different presentations
8 presentation videos, 1 h 45 min, ~13 slides each
• Studies have assessed the reliability of slides as a summarization tool [He et al. 2000]
[Figure: example slide with recognized text "Completed Tasks", "Rcscarch", "Interview with Client", "{ i sited Project SpaceQiant House Resident Association Meeting"]
Slide text detection pipeline: LoG edges → connected components → constraints
Empirically validated thresholds are applied to prune non-text regions.
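The connected-components stage of this pipeline can be sketched in pure Python. This is a minimal sketch, assuming the LoG stage has already produced a binary edge map (a list of rows of 0/1) and that 8-connectivity is used. The function name `edge_components` and the bounding-box output format are illustrative choices, not taken from the poster. Each returned box is a candidate text region that would then be checked against the constraints.

```python
from collections import deque


def edge_components(edges):
    """Label 8-connected components of a binary edge map (list of rows of 0/1).

    Returns one inclusive bounding box (top, left, bottom, right) per component,
    in raster-scan order of each component's first pixel.
    """
    h = len(edges)
    w = len(edges[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not edges[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited edge pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and edges[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


edges = [
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
print(edge_components(edges))  # [(0, 0, 1, 1), (1, 4, 2, 4)]
```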
Constraints (F = frame, R = candidate text region):
• Geometric constraints on the height, width, and area of R with respect to F
• Edge density constraint
• Alignment and region merge constraints
Binarization: 54 detected regions, 2,177,154 annotated pixels
N. Ground Truth Characters | N. Recognized Characters | Precision | Recall
13804 | 7376 | 0.5343 | 0.7446

N. Ground Truth Words | N. Recognized Words | Precision | Recall
2276 | 1126 | 0.4947 | 0.6651
[Figure: number of recognized characters by recognition method, Tesseract vs. LAO + Tesseract]
$N_{cor(c)} = N_{gt(c)} - ED(s_{gt}, s_r)$, where $ED$ is the edit distance between the ground-truth string $s_{gt}$ and the recognized string $s_r$
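A minimal sketch of the character-level count $N_{cor(c)} = N_{gt(c)} - ED(s_{gt}, s_r)$, assuming unit-cost Levenshtein distance (insertions, deletions, substitutions). Clamping the count at zero is an added assumption for recognized strings much longer than the ground truth. The example compares the misrecognized string "Rcscarch" from the slide example with an assumed ground truth "Research".

```python
def edit_distance(a, b):
    """Levenshtein distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def correct_chars(gt, rec):
    """N_cor = N_gt - ED(s_gt, s_r); the clamp at zero is an assumption of this sketch."""
    return max(len(gt) - edit_distance(gt, rec), 0)


print(edit_distance("Research", "Rcscarch"))  # 2
print(correct_chars("Research", "Rcscarch"))  # 6
```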
Videos of presentations are employed in a large variety of systems for different purposes:
• Distance or e-learning
• Automatic generation of conference proceedings
• Student presentations
Challenges
Result: a fully automatic method for summarizing and indexing unstructured presentation videos based on text extracted from the projected slides
Integration into summarization and presentation tools such as the VAST MultiMedia Browser¹ (see image above)
1. www.aquaphoenix.com/research/vastmm
2. http://code.google.com/p/tesseract-ocr
3. http://en.wikipedia.org/wiki/Letter_frequencies
Introduction
Local Adaptive Otsu (LAO) Binarization
Proposed Approach
Results
In each sliding window, find the threshold T that maximizes the between-class variance:
$\sigma^2_{between}(T) = n_B(T)\,n_F(T)\,[\mu_B(T) - \mu_F(T)]^2$
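The sliding-window optimization can be sketched as a naive local Otsu: for every pixel, build the histogram of the window around it and pick the threshold that maximizes $\sigma^2_{between}$. This is a sketch under stated assumptions, not the authors' code. The window half-size `half`, the output polarity (1 = brighter than the local threshold) and the treatment of single-valued windows are choices made here. Each window histogram is recomputed from scratch, which is exactly the cost an integral histogram removes.

```python
def otsu_threshold(hist):
    """Threshold T maximizing n_B(T) * n_F(T) * (mu_B(T) - mu_F(T))**2.

    hist[p] counts pixels with value p; class B is p <= T, class F is p > T.
    Weights are normalized to fractions, which does not change the argmax.
    Returns None when only one gray value occurs (no valid split).
    """
    total = sum(hist)
    total_sum = sum(p * c for p, c in enumerate(hist))
    n_b = sum_b = 0
    best_t, best_var = None, -1.0
    for t, count in enumerate(hist):
        n_b += count
        sum_b += t * count
        n_f = total - n_b
        if n_b == 0 or n_f == 0:
            continue
        mu_b = sum_b / n_b
        mu_f = (total_sum - sum_b) / n_f
        var_between = (n_b / total) * (n_f / total) * (mu_b - mu_f) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def lao_binarize(img, half=7, levels=256):
    """Local adaptive Otsu sketch: Otsu threshold of the window centred on each pixel.

    img is a list of rows of integers in [0, levels). Returns 1 for pixels brighter
    than their local threshold, 0 otherwise; single-valued windows give 0
    (an assumption of this sketch).
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            hist = [0] * levels
            for yy in range(max(0, y - half), min(h, y + half + 1)):
                for xx in range(max(0, x - half), min(w, x + half + 1)):
                    hist[img[yy][xx]] += 1
            t = otsu_threshold(hist)
            out[y][x] = 1 if t is not None and img[y][x] > t else 0
    return out


img = [[10, 10, 200, 200]] * 3
print(lao_binarize(img, half=3))  # [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1]]
```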
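Sauvola's threshold [Sauvola et al. 00]: $T(x,y) = \mu(x,y,W)\left[1 + k\left(\frac{\sigma(x,y,W)}{R} - 1\right)\right]$, where $\mu$ and $\sigma$ are the mean and standard deviation of the window $W$ around $(x,y)$ and $R$ is the dynamic range of the standard deviation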
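For comparison, Sauvola's rule can be sketched directly from its formula. The values k = 0.5 (as in the results table) and R = 128 (a common choice for 8-bit images) are parameters of this sketch. Marking pixels at or below T as text assumes dark text on a light background.

```python
import math


def sauvola_binarize(img, half=7, k=0.5, R=128.0):
    """Sauvola sketch: T = mean * (1 + k * (std / R - 1)) over the window at each pixel.

    Returns 1 for pixels at or below the local threshold (dark text), 0 otherwise.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[yy][xx]
                    for yy in range(max(0, y - half), min(h, y + half + 1))
                    for xx in range(max(0, x - half), min(w, x + half + 1))]
            mean = sum(vals) / len(vals)
            std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
            t = mean * (1 + k * (std / R - 1))
            out[y][x] = 1 if img[y][x] <= t else 0
    return out


page = [[200, 200, 200], [200, 20, 200], [200, 200, 200]]
print(sauvola_binarize(page, half=1))  # [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
```

Unlike the local Otsu rule, the result here depends on the choice of k.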
$\sigma^2_{within}(T) = n_B(T)\,\sigma^2_B(T) + n_F(T)\,\sigma^2_F(T)$
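When the class weights $n_B(T)$ and $n_F(T)$ are taken as fractions summing to 1, $\sigma^2_{within}(T) + \sigma^2_{between}(T)$ equals the total variance for every T. Maximizing the between-class variance and minimizing the within-class variance therefore select the same threshold. The sketch below checks this numerically on a small made-up histogram.

```python
def class_stats(hist, t):
    """(weight, mean, variance) of class B (p <= t) and class F (p > t).

    Weights are fractions of all pixels, so n_B(T) + n_F(T) = 1.
    """
    total = sum(hist)

    def stats(pairs):
        n = sum(c for _, c in pairs)
        if n == 0:
            return 0.0, 0.0, 0.0
        mu = sum(p * c for p, c in pairs) / n
        var = sum(c * (p - mu) ** 2 for p, c in pairs) / n
        return n / total, mu, var

    items = list(enumerate(hist))
    return stats(items[: t + 1]), stats(items[t + 1 :])


def var_between(hist, t):
    (n_b, mu_b, _), (n_f, mu_f, _) = class_stats(hist, t)
    return n_b * n_f * (mu_b - mu_f) ** 2


def var_within(hist, t):
    (n_b, _, var_b), (n_f, _, var_f) = class_stats(hist, t)
    return n_b * var_b + n_f * var_f


hist = [2, 6, 3, 1, 1, 4, 7, 2]
candidates = range(len(hist) - 1)  # t = 0..6; both classes non-empty for this histogram
t_max_between = max(candidates, key=lambda t: var_between(hist, t))
t_min_within = min(candidates, key=lambda t: var_within(hist, t))
print(t_max_between, t_min_within)  # 3 3
```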
• Many videos are already archived
• Low quality and lack of structure:
  • Lack of additional sources of information (e.g. electronic copies of slides)
  • Not recorded by professional cameramen
  • Shots cannot be used as a clue
  • Not edited
  • Unconstrained camera movements
  • Slides clipped
  • Compression
1. Segment video into semantically distinct shots based on slides
2. Slide Text Detection
Region merge constraint: $merge(R_i, R_j) = 0$ if $|x(R_i) - x(R_j)| > 10$
Edge density constraint: threshold of 0.2 on $E_{density}$
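The constraints can be combined into a prune-then-merge pass over candidate regions. This is a minimal sketch with several assumptions. Edge density is taken as edge pixels divided by box area, and regions below the 0.2 threshold are discarded. $x(R)$ is taken to be the region's left coordinate. The vertical-gap condition (`max_gap`) is hypothetical, added so that only neighbouring lines are merged.

```python
from dataclasses import dataclass


@dataclass
class Region:
    top: int
    left: int
    bottom: int       # exclusive
    right: int        # exclusive
    edge_pixels: int  # edge pixels inside the box

    def area(self):
        return (self.bottom - self.top) * (self.right - self.left)

    def edge_density(self):
        return self.edge_pixels / self.area() if self.area() > 0 else 0.0


def passes_density(r, min_density=0.2):
    # Assumption: regions whose edge density is below 0.2 are pruned as non-text.
    return r.edge_density() >= min_density


def can_merge(a, b, max_dx=10, max_gap=5):
    # merge(R_i, R_j) = 0 if |x(R_i) - x(R_j)| > 10, taking x as the left edge (assumption).
    if abs(a.left - b.left) > max_dx:
        return False
    # Hypothetical extra condition: boxes overlap or are at most max_gap rows apart.
    gap = max(a.top, b.top) - min(a.bottom, b.bottom)
    return gap <= max_gap


def union(a, b):
    # Edge counts are summed, which over-counts if the boxes overlap.
    return Region(min(a.top, b.top), min(a.left, b.left),
                  max(a.bottom, b.bottom), max(a.right, b.right),
                  a.edge_pixels + b.edge_pixels)


def prune_and_merge(regions):
    regions = [r for r in regions if passes_density(r)]
    changed = True
    while changed:
        changed = False
        for i in range(len(regions)):
            for j in range(i + 1, len(regions)):
                if can_merge(regions[i], regions[j]):
                    regions[i] = union(regions[i], regions[j])
                    del regions[j]
                    changed = True
                    break
            if changed:
                break
    return regions


lines = [Region(0, 20, 10, 120, 300),   # bullet line, density 0.30
         Region(13, 24, 23, 124, 260),  # next line, left edge 4 px away
         Region(40, 200, 50, 300, 50)]  # density 0.05, pruned
print(prune_and_merge(lines))
# [Region(top=0, left=20, bottom=23, right=124, edge_pixels=560)]
```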
3. Slide Text Recognition
• Double the size of text regions with bilinear interpolation
• LAO binarization
• Tesseract² OCR engine
  • Training with 15 character sets
  • Height 30 pt
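The first recognition step can be sketched as a 2x bilinear upscaling of a grayscale region. The half-pixel mapping used here is one common convention; the poster does not say which convention was used.

```python
def upscale2_bilinear(img):
    """Double width and height of a grayscale image (list of rows) by bilinear interpolation.

    Output pixel centres are mapped back with the half-pixel convention
    src = (dst + 0.5) / 2 - 0.5, clamped to the image border.
    """
    h, w = len(img), len(img[0])

    def source(coord, size):
        s = min(max((coord + 0.5) / 2 - 0.5, 0.0), float(size - 1))
        i0 = int(s)
        return i0, min(i0 + 1, size - 1), s - i0

    out = []
    for oy in range(2 * h):
        y0, y1, fy = source(oy, h)
        row = []
        for ox in range(2 * w):
            x0, x1, fx = source(ox, w)
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


print(upscale2_bilinear([[0, 100]]))
# [[0.0, 25.0, 75.0, 100.0], [0.0, 25.0, 75.0, 100.0]]
```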
Algorithm | Precision | Recall | F1* | t (sec)
Otsu | 0.8611 | 0.8555 | 0.8583 | 0.539
Sauvola (k = 0.5) | 0.9003 | 0.8759 | 0.8879 | 0.626
LAO | 0.8831 | 0.9278 | 0.9049 | 2.126
LAO + Int. Hist. | 0.8831 | 0.9278 | 0.9049 | 1.29
* $F_1 = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}$
[Figure: binarization example]
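The F1 column can be reproduced from the precision and recall columns with the footnote formula:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


table = {
    "Otsu": (0.8611, 0.8555),
    "Sauvola (k = 0.5)": (0.9003, 0.8759),
    "LAO": (0.8831, 0.9278),
}
for name, (p, r) in table.items():
    print(f"{name}: F1 = {f1(p, r):.4f}")
# Otsu: F1 = 0.8583
# Sauvola (k = 0.5): F1 = 0.8879
# LAO: F1 = 0.9049
```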
• Most popular fonts used in presentation slides
• Text reflecting English letter frequencies³
Slide Text Recognition: 2276 words, 13804 characters
Method | Precision | Recall
SIMPLE | 0.71213 | 0.85914
REFINED | 0.88584 | 0.68046
• SIMPLE: use all candidate text regions
• REFINED: prune candidate text regions based on OCR output
Slide Text Detection: 500 frames (400 with text, 100 with no text)
$Recall = \frac{TA_{GT \cap E}}{TA_{GT}}$, $Precision = \frac{TA_{GT \cap E}}{TA_{E}}$
where $TA_{GT}$ is the ground-truth text regions area and $TA_{E}$ is the extracted text regions area.
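The area-based measures can be sketched by representing text regions as axis-aligned boxes and counting covered pixels, so overlapping boxes are not counted twice. The box representation and the 0.0 value for an empty denominator (for example, a frame with no ground-truth text) are assumptions of this sketch.

```python
def covered_pixels(boxes):
    """Pixels (y, x) covered by boxes (top, left, bottom, right), exclusive ends."""
    return {(y, x)
            for top, left, bottom, right in boxes
            for y in range(top, bottom)
            for x in range(left, right)}


def area_precision_recall(gt_boxes, extracted_boxes):
    ta_gt = covered_pixels(gt_boxes)
    ta_e = covered_pixels(extracted_boxes)
    ta_both = len(ta_gt & ta_e)
    # Returning 0.0 for an empty denominator is an assumption of this sketch.
    recall = ta_both / len(ta_gt) if ta_gt else 0.0
    precision = ta_both / len(ta_e) if ta_e else 0.0
    return precision, recall


gt = [(0, 0, 10, 10)]  # 100 ground-truth text pixels
ext = [(0, 0, 10, 5)]  # 50 extracted pixels, all inside the ground truth
print(area_precision_recall(gt, ext))  # (1.0, 0.5)
```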
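[Figure: text vs. non-text and foreground vs. background; histogram $h(p)$ of pixel value $p$, split at threshold $T$ into background $B$ and foreground $F$ with variances $\sigma^2_B$ and $\sigma^2_F$]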
Character Recognition and Word Recognition (c = characters, w = words):
$Recall_{c/w} = \frac{N_{cor(c/w)}}{N_{r(c/w)}}$, $Precision_{c/w} = \frac{N_{cor(c/w)}}{N_{gt(c/w)}}$
• No electronic copies of the slides
• Changes in the extracted text are used to detect slide changes
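One way to turn changes in text into slide boundaries is to compare the word sets of consecutive frames. The Jaccard similarity and the 0.5 threshold below are assumptions, not the authors' criterion. In the example, a single OCR error ("Rcscarch") does not trigger a new slide, while a different slide does.

```python
def word_set(text):
    return {w.lower() for w in text.split()}


def jaccard(a, b):
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def slide_boundaries(frame_texts, threshold=0.5):
    """Indices of frames whose OCR text differs enough from the previous frame's."""
    if not frame_texts:
        return []
    boundaries = [0]
    prev = word_set(frame_texts[0])
    for i, text in enumerate(frame_texts[1:], start=1):
        cur = word_set(text)
        if jaccard(prev, cur) < threshold:
            boundaries.append(i)
        prev = cur
    return boundaries


frames = ["Completed Tasks Research", "Completed Tasks Research",
          "Completed Tasks Rcscarch", "Interview with Client"]
print(slide_boundaries(frames))  # [0, 3]
```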
Recognized text containing "energy":
• The clients desire to generate wind energy
• Grey Water system/Hydroelectric Energy@;gm;y,g[g_,bjns and energy efficient light bulbs
• electrical/thermal energy
• Reallstlc energy
• PV and energy storage system
• hall energy needs
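The indexing function can be sketched as an inverted index from recognized words to (video, slide) pairs. Splitting on runs of letters is an assumption that lets OCR debris such as "Energy@;gm" still yield "energy"; a fused token such as "energyPV" would not be found by an exact lookup. The video ids and slide numbers in the example are made up.

```python
import re
from collections import defaultdict


def build_index(slides):
    """Map each lower-cased word to the set of (video, slide) pairs containing it.

    slides is an iterable of (video_id, slide_number, recognized_text).
    Words are maximal runs of ASCII letters a-z after lower-casing.
    """
    index = defaultdict(set)
    for video, slide, text in slides:
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add((video, slide))
    return index


slides = [  # toy data: video ids and slide numbers are made up
    ("talk1", 3, "The clients desire to generate wind energy"),
    ("talk2", 5, "Grey Water system/Hydroelectric Energy@;gm;y,g[g_,bjns"),
    ("talk3", 2, "Reallstlc energy"),
    ("talk4", 7, "Interview with Client"),
]
index = build_index(slides)
print(sorted(index["energy"]))  # [('talk1', 3), ('talk2', 5), ('talk3', 2)]
```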
Optimal version of Sauvola’s algorithm
Localized version of Otsu’s algorithm [Otsu 79]
• Assumption: bimodal distribution (foreground/background)
• Fast implementation with Integral Histogram [Porikli et al. 05]
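A minimal sketch of an integral histogram [Porikli et al. 05]: each cell stores the histogram of the rectangle above and to the left of it, so any window histogram costs four lookups per bin, independent of the window size. It assumes pixel values are already quantized to `bins` levels. The results table reports a lower running time for LAO with the integral histogram (1.29 s vs. 2.126 s).

```python
def integral_histogram(img, bins):
    """H[y][x][b] = number of pixels equal to b in img[0:y][0:x] (exclusive ends)."""
    h, w = len(img), len(img[0])
    H = [[[0] * bins for _ in range(w + 1)] for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            cell = H[y + 1][x + 1]
            up, left, diag = H[y][x + 1], H[y + 1][x], H[y][x]
            for b in range(bins):
                cell[b] = up[b] + left[b] - diag[b]
            cell[img[y][x]] += 1
    return H


def window_histogram(H, top, left, bottom, right):
    """Histogram of rows top..bottom-1 and columns left..right-1 in O(bins)."""
    return [H[bottom][right][b] - H[top][right][b] - H[bottom][left][b] + H[top][left][b]
            for b in range(len(H[0][0]))]


img = [[0, 1, 1], [2, 1, 0]]
H = integral_histogram(img, bins=3)
print(window_histogram(H, 0, 0, 2, 3))  # [2, 3, 1]
print(window_histogram(H, 0, 1, 2, 3))  # [1, 3, 0]
```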
[Sauvola et al. 00] [Otsu 79]
The dependency on k is removed!