Detecting Cartoons: a Case Study in Automatic Video-Genre Classification
Tzvetanka Ianeva, Arjen de Vries, Hein Röhrig
Outline
• Goal: remove cartoons from search results in TREC-2002 video track
• Our Approach: extract Image Descriptors & SVM Machine Learning
• Related work
• Novel Descriptors from Granulometry
• SVM Learning
• Experimental Results
TREC-2002 video track
• TREC: workshops for large-scale evaluation of information retrieval technology
• CWI participation: Probabilistic Multimedia Retrieval Model
• The model does not sufficiently distinguish “cartoons”
Example of undesirable ‘cartoon’ results
[Figure: a query and the best matches returned]
Related work
• M. Roach et al. Motion-based classification of cartoons (2001)
• B.T. Truong et al. Automatic genre identification for content-based video categorization (2000)
• J.R. Smith et al. Searching for images and videos on the World Wide Web
• N.C. Rowe et al. Automatic caption localization for photographs on WWW pages
• V. Athitsos et al. [ASF] Distinguishing photographs and graphics on the WWW
Cartoons
• What is a cartoon?
– Cartoons do not contain any photographic material
– Photos: taken with a photographic camera
• Appears easy to find cartoons
– Few, simple, strong colors; patches of uniform color; strong black edges; text
Quiz: Cartoon or Photo?
Examples not so Typical
Photos like cartoons
“Cartoons” like photos
Artificial photos
Small cues
Overlapping Frames
Mixed
Shadow & Sparkle
Image Descriptors
• greater correlation
• normalized
• Example: avg. sat., thresh. brightness
[Figure: each input image (240x352x3) is mapped to a vector of 148 image descriptors; the two example images yield (0.6231, 0.9266, …) and (0.2880, 0.4125, …)]
Overview of all our image descriptors
Image descriptor               Dimension
average saturation             1
threshold brightness           1
color histogram                45
edge-direction histogram       40
compression ratio              1
multi-scale pattern spectrum   60
Brightness and Saturation
• HSV color model
• Cartoons are brighter => use % of pixels with Value > 0.4
• Cartoons have strong colors => use average Saturation
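The two scalar descriptors above can be sketched as follows. This is a minimal illustration, assuming an RGB image stored as a float array with values in [0, 1]; the function names are hypothetical, and only the 0.4 threshold comes from the slide.

```python
import numpy as np

def saturation_and_value(rgb):
    """Return the HSV saturation (S) and value (V) channels of an RGB image in [0, 1]."""
    v = rgb.max(axis=-1)                      # V is the largest of R, G, B
    span = v - rgb.min(axis=-1)               # chroma: max minus min
    # S = chroma / V, defined as 0 for black pixels (V == 0)
    s = np.divide(span, v, out=np.zeros_like(v), where=v > 0)
    return s, v

def average_saturation(rgb):
    """Mean of the S channel over all pixels."""
    s, _ = saturation_and_value(rgb)
    return float(s.mean())

def threshold_brightness(rgb, threshold=0.4):
    """Fraction of pixels whose V channel exceeds the threshold."""
    _, v = saturation_and_value(rgb)
    return float((v > threshold).mean())
```

Both functions return a value in [0, 1], so the descriptors are already normalized.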
Saturation in cartoon and photo images
[Figure: two example images, each shown in RGB and as its S channel (HSV); average saturation 0.2880 and 0.6231]
Brightness in cartoon and photo images
[Figure: two example images, each shown in RGB and as its V channel (HSV); threshold brightness 0.9266 and 0.4125]
Histograms
• Image $I : X \times Y \to \mathbb{R}^c$
• Filter $F : I \to I'$
• Bins $B_k$: a partition of $\mathbb{R}^c$
• $h_k = \#\{ (x,y) : I'(x,y) \in B_k \}$
• E.g. brightness metric: $I$ grayscale, $c = 1$, $B_1 = [0, 0.4]$, $B_2 = (0.4, 1]$, return $h_2$
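The general histogram definition above can be sketched for the single-channel case (c = 1). This is an illustration under assumptions: `histogram_descriptor` and its parameters are hypothetical names, and `np.histogram` uses half-open bins, so a pixel exactly equal to 0.4 lands in the second bin rather than the first.

```python
import numpy as np

def histogram_descriptor(image, bin_edges, image_filter=None):
    """Count pixels of the (optionally filtered) image that fall into each bin.

    image        : 2-D array, the image I with c = 1 channel
    bin_edges    : increasing sequence of edges defining the bins B_k
    image_filter : optional callable F mapping I to I'
    """
    filtered = image if image_filter is None else image_filter(image)
    counts, _ = np.histogram(filtered, bins=bin_edges)
    return counts  # counts[k] is h_(k+1) in the slide's notation

# Brightness metric from the slide: two bins split at 0.4, return h_2.
gray = np.array([[0.1, 0.5], [0.9, 0.3]])
h = histogram_descriptor(gray, [0.0, 0.4, 1.0])
h2_normalized = h[1] / gray.size
```

Dividing by the pixel count, as in the last line, gives the normalized form used for the descriptors.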
Color Histogram
• More general than brightness & saturation
• Again HSV color space
• Partition HSV into 3x3x5 = 45 bins
• Cartoons have fewer colors => color histogram descriptor
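A 45-bin HSV histogram along these lines could look like the sketch below. The slide does not say which channel receives 5 bins, so the split used here (3 for hue, 3 for saturation, 5 for value) is an assumption, as is the function name. The per-pixel `colorsys` conversion is slow but keeps the example dependency-light.

```python
import colorsys
import numpy as np

def color_histogram(rgb, bins=(3, 3, 5)):
    """Normalized joint H/S/V histogram of an RGB image with values in [0, 1].

    The (3, 3, 5) split over (H, S, V) is an assumed assignment of the 3x3x5 partition.
    """
    pixels = np.asarray(rgb, dtype=float).reshape(-1, 3)
    # colorsys returns H, S and V each in [0, 1]
    hsv = np.array([colorsys.rgb_to_hsv(*px) for px in pixels])
    hist, _ = np.histogramdd(hsv, bins=bins, range=((0, 1), (0, 1), (0, 1)))
    return hist.ravel() / len(pixels)  # 45 values that sum to 1
```

An image with few distinct colors concentrates its mass in a few of the 45 bins, which is the property the descriptor is meant to expose.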
[Figures: color histograms of two example images in the 45-bin HSV space]
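Edge detection
• Cartoons have strong black edges => use an edge histogram
• Approx. total derivative of intensity:
\[ \nabla I(x,y) = \left( \frac{\partial I(x,y)}{\partial x},\ \frac{\partial I(x,y)}{\partial y} \right) \]
• Approx. $|\nabla I|$ and histogram of $(\theta, |\nabla I|)$, where $\theta$ is the gradient direction
– 5 intervals for $|\nabla I|$ in 0 … sqrt(20)
– 8 intervals for $\theta$ in 0 … $2\pi$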
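The 5 x 8 = 40-bin edge histogram can be sketched as below. The slides do not name the derivative filter; a 3x3 Sobel operator is assumed here because its largest possible gradient magnitude on intensities in [0, 1] is sqrt(20), which matches the stated range. Function names are hypothetical.

```python
import numpy as np

def sobel_gradients(gray):
    """Approximate dI/dx and dI/dy with 3x3 Sobel filters (edge-replicated border)."""
    p = np.pad(gray, 1, mode="edge")
    # right column minus left column, centre row weighted by 2
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    # bottom row minus top row, centre column weighted by 2
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return gx, gy

def edge_histogram(gray, n_mag=5, n_ang=8):
    """Normalized joint histogram of gradient magnitude and direction (40 values)."""
    gx, gy = sobel_gradients(gray)
    max_mag = np.sqrt(20.0)
    mag = np.clip(np.hypot(gx, gy), 0.0, max_mag)     # guard against rounding above the range
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)       # direction mapped to [0, 2*pi)
    hist, _, _ = np.histogram2d(mag.ravel(), ang.ravel(), bins=(n_mag, n_ang),
                                range=((0.0, max_mag), (0.0, 2 * np.pi)))
    return hist.ravel() / gray.size
```

Pixels in flat regions have zero gradient and fall into the first magnitude bin, so strong edges show up as mass in the higher magnitude bins.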
Edge angles & edge magnitudes
Edge histogram
Compressibility
• Cartoons: simpler composition
• Detect complexity by measuring the compression ratio
• Theory: “Kolmogorov complexity”
• Our application: use lossless PNG compression
• Lossy JPEG not useful
[Figure: two example images with compression ratios 0.13548 and 0.23365]
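The compressibility idea can be illustrated with a small sketch. It does not write PNG files: as a stand-in it applies zlib's DEFLATE, the lossless scheme PNG uses internally, directly to the raw pixel bytes, so the numbers will differ from real PNG ratios. The function name and the ratio convention (compressed size divided by raw size) are assumptions.

```python
import zlib
import numpy as np

def compression_ratio(image_u8):
    """Compressed size divided by raw size for an 8-bit image (lower = simpler image)."""
    raw = np.ascontiguousarray(image_u8, dtype=np.uint8).tobytes()
    return len(zlib.compress(raw, 9)) / len(raw)

# A flat image compresses far better than random noise.
rng = np.random.default_rng(0)
flat = np.zeros((64, 64, 3), dtype=np.uint8)
noise = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
ratio_flat = compression_ratio(flat)
ratio_noise = compression_ratio(noise)
```

A lossy codec would discard exactly the detail that distinguishes complex images, which is consistent with the slide's remark that JPEG is not useful here.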
Granulometries
• Idea: measure size distribution of objects
• How? openings by structuring element of growing scale
• Normalized size distribution
• Derivative = pattern spectrum
Openings
• Opening = erosion then dilation with the same SE: $\gamma_B(f) = \delta_{\hat{B}}[\varepsilon_B(f)]$
Structuring Elements
• Non-flat parabola better(?) than flat disk
• Parabola: efficient computation, symmetry
\[ [\varepsilon_B(f)](x,y) = \min_{(x',y') \in B} \{ f(x+x',\, y+y') - B(x',y') \} \]
Small-scale pattern spectrum descriptors
SE: disk with radii $r_i = i$, $i = 1, \dots, 20$
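The granulometry pipeline (openings at growing scale, normalized size distribution, derivative) can be sketched as follows. For brevity it uses a flat square structuring element of side 2r + 1 as a stand-in for the disk and parabola discussed in the slides, and all function names are hypothetical. It needs NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _window_reduce(img, radius, reduce_fn):
    """Apply min or max over a (2r+1) x (2r+1) window around every pixel."""
    padded = np.pad(img, radius, mode="edge")
    side = 2 * radius + 1
    windows = sliding_window_view(padded, (side, side))
    return reduce_fn(windows, axis=(-2, -1))

def opening(img, radius):
    """Flat opening: erosion (window minimum) followed by dilation (window maximum)."""
    eroded = _window_reduce(img, radius, np.min)
    return _window_reduce(eroded, radius, np.max)

def pattern_spectrum(img, max_radius=20):
    """Discrete derivative of the normalized size distribution for radii 1..max_radius."""
    total = img.sum()
    volumes = [total] + [opening(img, r).sum() for r in range(1, max_radius + 1)]
    # fraction of image volume removed by the opening at each scale (0 at scale 0)
    size_distribution = 1.0 - np.array(volumes, dtype=float) / total
    return np.diff(size_distribution)
```

For an image containing a single bright 5x5 square, the spectrum has all its mass at radius 3, the first scale whose 7x7 window no longer fits inside the square.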
SVM Learning
• Simplest case: linear separator
• SVM finds the hyperplane with the largest margin
• Closest points = Support Vectors
SVM Learning: nonseparable
• Noisy data: no separating hyperplane at all!
• Solution: penalty C for points inside the margin
• C-SVM machines
SVM = quadratic programming
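SVM task:
\[ \min_{w,\,b,\,\xi}\ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} \xi_i \quad \text{subject to: } y_i(\langle w, x_i \rangle + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i = 1, \dots, l \]
Equivalent dual problem:
\[ \max_{\alpha}\ \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle \quad \text{subject to: } 0 \le \alpha_i \le C,\ \ i = 1, \dots, l,\ \ \sum_{i=1}^{l} \alpha_i y_i = 0 \]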
SVM with kernels
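Feature map and kernel:
\[ \Phi : \mathbb{R}^n \to F, \qquad k(x, \hat{x}) = \langle \Phi(x), \Phi(\hat{x}) \rangle \]
SVM task:
\[ \min_{w,\,b,\,\xi}\ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} \xi_i \quad \text{subject to: } y_i(\langle w, \Phi(x_i) \rangle + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i = 1, \dots, l \]
Equivalent dual problem:
\[ \max_{\alpha}\ \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j\, k(x_i, x_j) \quad \text{subject to: } 0 \le \alpha_i \le C,\ \ i = 1, \dots, l,\ \ \sum_{i=1}^{l} \alpha_i y_i = 0 \]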
SVM kernels
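RBF kernels:
\[ k(x, \hat{x}) = \exp\left( -\frac{\|x - \hat{x}\|^2}{2\sigma^2} \right) \]
Polynomial kernels:
\[ k(x, \hat{x}) = (\langle x, \hat{x} \rangle + 1)^q \]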
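The two kernel families can be written out directly. This is a minimal sketch assuming the standard forms exp(-||x - x̂||² / (2σ²)) for the RBF kernel and (⟨x, x̂⟩ + 1)^q for the polynomial kernel; the function and parameter names are illustrative.

```python
import numpy as np

def rbf_kernel(x, x_hat, sigma2):
    """RBF kernel with variance parameter sigma2 (the sigma squared of the slides)."""
    diff = np.asarray(x, dtype=float) - np.asarray(x_hat, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma2)))

def polynomial_kernel(x, x_hat, q):
    """Polynomial kernel of degree q."""
    return float((np.dot(x, x_hat) + 1.0) ** q)
```

A smaller sigma2 makes the RBF kernel fall off faster with distance, so each support vector influences a smaller neighbourhood.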
SVM with kernels: decision function
Decision function (with $\alpha$ and $b$ from the kernelized SVM task and its dual):
\[ f(x) = \operatorname{sgn}\left( \sum_{i=1}^{l} \alpha_i y_i\, k(x_i, x) + b \right) \]
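Evaluating the decision function f(x) = sgn(Σ α_i y_i k(x_i, x) + b) is straightforward once the dual has been solved. The sketch below assumes the multipliers and offset are already available from some solver; the toy values are invented for illustration, and mapping a score of exactly zero to +1 is an arbitrary convention.

```python
import numpy as np

def rbf_kernel(x, x_hat, sigma2):
    """RBF kernel exp(-||x - x_hat||^2 / (2 * sigma2))."""
    diff = np.asarray(x, dtype=float) - np.asarray(x_hat, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma2)))

def decision_function(x, support_vectors, labels, alphas, b, kernel):
    """Return +1 or -1 according to the sign of the kernel expansion plus offset."""
    score = sum(a * y * kernel(sv, x)
                for sv, y, a in zip(support_vectors, labels, alphas))
    return 1 if score + b >= 0 else -1

# Toy model (hypothetical values): one support vector per class.
svs = [np.array([1.0, 0.0]), np.array([-1.0, 0.0])]
ys = [+1, -1]
alphas = [1.0, 1.0]
kernel = lambda u, v: rbf_kernel(u, v, sigma2=0.5)
label_a = decision_function([0.8, 0.0], svs, ys, alphas, 0.0, kernel)
label_b = decision_function([-0.5, 0.3], svs, ys, alphas, 0.0, kernel)
```

Only training points with nonzero multipliers contribute to the sum, which is why the support vectors alone determine the classifier.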
Experimental Data
• Key frames from TREC 2002 Video Track
• 13,026 photographic images
• 1,620 cartoons
• Manually classified
• Experiments 1-3: train on (random) 3908 photos and 486 cartoons
Experiment 1: individual performance
Descriptor             Error photos   Error cartoons   Total error   RBF kernel σ²
average saturation     0.0027         0.9541           0.108         σ² = 0.1
threshold brightness   0              1                0.1106        0.05 < σ² < 0.5
color histogram        0.0095         0.754            0.0919        σ² = 0.07
edge histogram         0              1                0.1106        0.05 < σ² < 0.5
compression ratio      0              1                0.1106        0.05 < σ² < 0.5
pattern spectrum       0.0002         0.9497           0.1052        σ² = 0.07

Total error:
\[ E_t = E_p \frac{|p|}{|p|+|c|} + E_c \frac{|c|}{|p|+|c|} \]
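The total error is the class-size-weighted mean of the photo error and the cartoon error. The sketch below checks this against the reported figures. It assumes the test set is everything not used for training, that is 13,026 - 3,908 = 9,118 photos and 1,620 - 486 = 1,134 cartoons; the slides do not state this split explicitly.

```python
def total_error(err_photos, err_cartoons, n_photos, n_cartoons):
    """Weighted total error E_t = E_p*|p|/(|p|+|c|) + E_c*|c|/(|p|+|c|)."""
    n = n_photos + n_cartoons
    return err_photos * n_photos / n + err_cartoons * n_cartoons / n

# Assumed test-set sizes: all images not used for training.
N_PHOTOS, N_CARTOONS = 13026 - 3908, 1620 - 486

e_saturation = total_error(0.0027, 0.9541, N_PHOTOS, N_CARTOONS)
e_all_cartoons_missed = total_error(0.0, 1.0, N_PHOTOS, N_CARTOONS)
```

Under that assumption a classifier that labels every image as a photo already reaches a total error of about 0.11, so the total error alone says little about how well cartoons are detected.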
Experiment 2: “convergence” of SVM learning
[Figure: error (y-axis, ticks from 0.1020 to 0.1120) against σ² for the pattern spectrum descriptor (x-axis, σ² = 1/2, 1/4, …, 1/18)]
Experiment 3: combined performance
Descriptors                  Error photos   Error cartoons   Total error
all - average saturation     0.0068         0.6914           0.0825
all - threshold brightness   0.0111         0.657            0.0825
all - color histogram        0.0068         0.7734           0.0916
all - edge histogram         0.009          0.672            0.0823
all - compression ratio      0.0098         0.6684           0.0826
all - pattern spectrum       0.011          0.7046           0.0884
all                          0.0111         0.6437           0.0811

σ² = 0.06
Experiment 4: web-image classifier on our data
[Figure: error (y-axis, 0.0 to 0.5) against training set (x-axis, 100 to 600) for our classifier (“we”) and [ASF]]
Test set: random 1,000 photos and 1,000 cartoons
Experiment 5: Performance on web images
[Figure: error (y-axis, 0 to 0.1); legend: we, [ASF], + dimension and file type features]
Comparison on 14,039 photographic and 9,512 graphical images harvested from the WWW; train on (random) 4239 photographic and 2826 graphical images
Conclusions
• Hard task: good classifier
• Use dynamics/spatio-temporal relations?
• Semantic gap?
• Combine classifiers?
• Granulometry not enough