Detecting Cartoons: A Case Study in Automatic Video-Genre Classification

Tzvetanka Ianeva, Arjen de Vries, Hein Röhrig

Outline

• Goal: remove cartoons from search results in TREC-2002 video track
• Our Approach: extract Image Descriptors & SVM Machine Learning
• Related work
• Novel Descriptors from Granulometry
• SVM Learning
• Experimental Results

TREC-2002 video track

• TREC: workshops for large-scale evaluation of information retrieval technology
• CWI participation: Probabilistic Multimedia Retrieval Model
• The model does not sufficiently distinguish “cartoons”

Example of undesirable ‘cartoon’

[Figure: a query image and the best matches returned]

Related work

• M. Roach et al. Motion based classification of cartoons (2001)
• B.T. Truong et al. Automatic genre identification for content-based video categorization (2000)
• J.R. Smith et al. Searching for images and videos on the world wide web
• N.C. Rowe et al. Automatic caption localization for photographs on www pages
• V. Athitsos et al. [ASF] Distinguishing photographs and graphics on the www

Cartoons

• What is a Cartoon?
  – Cartoons do not contain any photographic material
  – Photos: taken with a photographic camera
• Appears easy to find cartoons
  – Few, simple, strong colors, patches of uniform colors, strong black edges, text

Quiz: Cartoon or Photo?

Examples not so Typical

• Photos like cartoons
• “Cartoons” like photos
• Artificial photos
• Small cues
• Overlapping Frames
• Mixed
• Shadow & Sparkle

Image Descriptors

• greater correlation
• normalized
• Example: average saturation, threshold brightness

[Figure: each input image (240x352x3) is mapped to a vector of 148 image descriptors; the two example vectors begin 0.6231, 0.9266, … and 0.2880, 0.4125, …]

Overview of all our image descriptors

Image descriptor                Dimension
average saturation              1
threshold brightness            1
color histogram                 45
edge-direction histogram        40
compression ratio               1
multi-scale pattern spectrum    60
total                           148

Brightness and Saturation

• HSV color model
• Cartoons brighter => use % pixels with Value > 0.4
• Cartoons have strong colors => use average Saturation
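The two scalar descriptors above can be sketched in a few lines. This is only an illustration under stated assumptions: pixels are given as a flat list of RGB triples in [0, 1], the function name `saturation_and_brightness` is hypothetical, and the slides do not say which HSV conversion was actually used (Python's `colorsys` is used here).

```python
import colorsys

def saturation_and_brightness(pixels, value_threshold=0.4):
    """Return (average saturation, fraction of pixels with V > threshold).

    pixels: non-empty list of (r, g, b) tuples with components in [0, 1].
    """
    hsv = [colorsys.rgb_to_hsv(r, g, b) for r, g, b in pixels]
    avg_saturation = sum(s for _, s, _ in hsv) / len(hsv)
    bright_fraction = sum(1 for _, _, v in hsv if v > value_threshold) / len(hsv)
    return avg_saturation, bright_fraction

# Hypothetical two-pixel "image": one pure red pixel, one dark gray pixel.
example = [(1.0, 0.0, 0.0), (0.2, 0.2, 0.2)]
avg_sat, bright = saturation_and_brightness(example)
```

Both values lie in [0, 1], which matches the "normalized" property listed for the descriptors.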

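Saturation in cartoon and photo images

[Figure: RGB image and its S channel (HSV) for each example; average saturation values 0.2880 and 0.6231]

Brightness in cartoon and photo images

[Figure: RGB image and its V channel (HSV) for each example; threshold brightness values 0.9266 and 0.4125]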

Histograms

• Image $I : X \times Y \to \mathbb{R}^c$
• Filter $F : I \mapsto I'$
• Bins $B_k$: a partition of $\mathbb{R}^c$
• $h_k = \#\{ (x,y) : I'(x,y) \in B_k \}$
• E.g. brightness metric: $I$ grayscale, $c = 1$, $B_1 = [0, 0.4]$, $B_2 = (0.4, 1]$, return $h_2$
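As a rough sketch of the histogram definition, the counts $h_k$ can be computed by testing each filtered pixel against the bins. The representation is an assumption made for illustration: bins are passed as predicates, the filter is applied per pixel (the slide defines it on the whole image), and the name `histogram` is hypothetical.

```python
def histogram(image, bins, filt=lambda value: value):
    """Count filtered pixel values per bin.

    image: 2D list of pixel values.
    bins:  list of predicates, one per bin B_k, assumed to partition the range.
    filt:  per-pixel stand-in for the filter F (identity by default).
    """
    counts = [0] * len(bins)
    for row in image:
        for value in row:
            filtered = filt(value)
            for k, in_bin in enumerate(bins):
                if in_bin(filtered):
                    counts[k] += 1
                    break  # bins form a partition, so stop at the first hit
    return counts

# Brightness metric from the slide: B1 = [0, 0.4], B2 = (0.4, 1], return h2.
brightness_bins = [lambda v: v <= 0.4, lambda v: v > 0.4]
gray = [[0.1, 0.5], [0.9, 0.4]]  # hypothetical 2x2 grayscale image
h = histogram(gray, brightness_bins)
h2 = h[1]
```

Dividing `h2` by the number of pixels would give the percentage form used for the brightness descriptor.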

Color Histogram

• More general than brightness & saturation
• Again HSV color space
• Partition HSV into 3x3x5 = 45 bins
• Cartoons have fewer colors => color histogram descriptor

[Figure: color histograms of two example images in the 45-bin HSV space]
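A 45-bin HSV histogram along these lines might look as follows. The slides give only the 3x3x5 partition, not which channel gets five intervals, so the (H, S, V) = (3, 3, 5) assignment below is an assumption, as are the uniform interval boundaries and the function name `hsv_histogram`.

```python
import colorsys

def hsv_histogram(pixels, bins=(3, 3, 5)):
    """Normalized joint HSV histogram of a non-empty list of RGB triples in [0, 1].

    bins: number of intervals for (H, S, V); the default split is an assumption.
    """
    n_h, n_s, n_v = bins
    hist = [0.0] * (n_h * n_s * n_v)
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        # Clamp so that a component equal to 1.0 falls into the last interval.
        i = min(int(h * n_h), n_h - 1)
        j = min(int(s * n_s), n_s - 1)
        k = min(int(v * n_v), n_v - 1)
        hist[(i * n_s + j) * n_v + k] += 1.0
    return [count / len(pixels) for count in hist]

# A uniformly red patch occupies a single bin, the extreme "few colors" case.
flat = hsv_histogram([(1.0, 0.0, 0.0)] * 4)
occupied_bins = sum(1 for value in flat if value > 0)
```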

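Edge detection

• Cartoons have strong black edges => edge-direction histogram
• Approximate the total derivative of intensity $I(x,y)$: $\nabla I(x,y) = \left( \frac{\partial I}{\partial x}(x,y), \frac{\partial I}{\partial y}(x,y) \right)$
• Approximate $|\nabla I|$ and the direction $\theta$, and build a histogram of $(\theta, |\nabla I|)$
  – 5 intervals for $|\nabla I|$ in $0 \dots \sqrt{20}$
  – 8 intervals for $\theta$ in $0 \dots 2\pi$

[Figure: edge angles and edge magnitudes; edge histogram]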
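The 5 x 8 = 40-bin edge histogram can be sketched as below. The slides do not name the derivative approximation; Sobel masks are assumed here because they are a common choice, and the function name `edge_histogram`, the skipped border pixels, and the normalization by pixel count are likewise assumptions.

```python
import math

def edge_histogram(gray, n_mag=5, n_ang=8, max_mag=math.sqrt(20)):
    """Normalized joint histogram of gradient magnitude and direction.

    gray: 2D list of intensities; border pixels are skipped.
    """
    height, width = len(gray), len(gray[0])
    hist = [0] * (n_mag * n_ang)
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Sobel approximations of dI/dx and dI/dy (an assumed choice).
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            magnitude = math.hypot(gx, gy)
            angle = math.atan2(gy, gx) % (2 * math.pi)
            m = min(int(magnitude / max_mag * n_mag), n_mag - 1)
            a = min(int(angle / (2 * math.pi) * n_ang), n_ang - 1)
            hist[m * n_ang + a] += 1
    total = sum(hist)
    return [count / total for count in hist] if total else hist

# Hypothetical 4x4 image with one sharp vertical edge.
step = [[0, 0, 1, 1]] * 4
edges = edge_histogram(step)
```

For this step image every interior pixel has the same strong horizontal gradient, so all mass ends up in one bin of the strongest-magnitude interval.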

Compressibility

• Cartoons: simpler composition
• Detect complexity by measuring compression ratio
• Theory: “Kolmogorov complexity”
• Our application: use lossless PNG compression
• Lossy JPEG not useful

[Figure: two example images with compression ratios 0.13548 and 0.23365]
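The compressibility idea can be illustrated without an image library. PNG is built on DEFLATE, so this sketch uses Python's `zlib` on raw pixel bytes as a stand-in; it omits PNG's filtering step and file structure, so its ratios will not match those on the slide. Defining the ratio as compressed size over raw size is also an assumption, as is the name `compression_ratio`.

```python
import random
import zlib

def compression_ratio(raw_bytes, level=9):
    """Compressed size divided by raw size (smaller means more compressible)."""
    return len(zlib.compress(raw_bytes, level)) / len(raw_bytes)

rng = random.Random(0)  # fixed seed so the example is repeatable
flat_patch = bytes([200]) * 4096                                # uniform patch
noisy_patch = bytes(rng.randrange(256) for _ in range(4096))    # noise-like patch

flat_ratio = compression_ratio(flat_patch)
noisy_ratio = compression_ratio(noisy_patch)
```

A uniform patch compresses to a small fraction of its size, while noise-like data barely compresses at all, which is the contrast the descriptor relies on.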

Granulometries

• Idea: measure size distribution of objects

• How? openings by structuring element of growing scale

• Normalized size distribution

• Derivative = pattern spectrum

Openings

• Opening = erosion then dilation with same SE: $\gamma_B(f) = \delta_{\hat{B}}[\varepsilon_B(f)]$

Structuring Elements

• Non-flat parabola better(?) than flat disk
• Parabola: efficient computation, symmetry

$[\varepsilon_B(f)](x,y) = \min_{(x',y') \in B} \{ f(x + x', y + y') - B(x', y') \}$

Small-scale pattern spectrum descriptors

SE disk, $r_i = i$, $i = 1, \dots, 20$
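The granulometry pipeline (openings of growing scale, normalized size distribution, derivative) can be sketched in plain Python. For brevity this uses a flat square structuring element of half-width r rather than the disk or parabola discussed on the slides, so it shows the technique without reproducing the descriptors actually used; all function names are hypothetical.

```python
def _window_filter(img, r, pick):
    """Apply pick (min or max) over a (2r+1)x(2r+1) window clipped to the image."""
    height, width = len(img), len(img[0])
    return [[pick(img[yy][xx]
                  for yy in range(max(0, y - r), min(height, y + r + 1))
                  for xx in range(max(0, x - r), min(width, x + r + 1)))
             for x in range(width)]
            for y in range(height)]

def opening(img, r):
    """Erosion followed by dilation with the same flat square SE."""
    return _window_filter(_window_filter(img, r, min), r, max)

def pattern_spectrum(img, max_r):
    """Fraction of image volume removed when going from scale r-1 to scale r."""
    volumes = [sum(map(sum, img))]  # scale 0: the image itself
    volumes += [sum(map(sum, opening(img, r))) for r in range(1, max_r + 1)]
    return [(volumes[r - 1] - volumes[r]) / volumes[0]
            for r in range(1, max_r + 1)]

# Hypothetical 7x7 binary image: one isolated pixel and one 3x3 square.
img = [[0] * 7 for _ in range(7)]
img[0][0] = 1
for y in range(2, 5):
    for x in range(2, 5):
        img[y][x] = 1
spectrum = pattern_spectrum(img, 2)
```

The isolated pixel disappears at scale 1 and the 3x3 square at scale 2, so the spectrum records how much image content lives at each object size.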

SVM Learning

• Simplest case: linear separator
• SVM finds hyperplane with largest margin
• Closest points = Support Vectors

SVM Learning: nonseparable

• Noisy data: no separating hyperplane at all!

• Solution: penalty C for points inside the margin

• C-SVM machines

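SVM = quadratic programming

SVM task:

$$\min_{w,b,\xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} \xi_i \quad \text{subject to: } y_i(\langle w, x_i \rangle + b) \ge 1 - \xi_i, \; \xi_i \ge 0, \; i = 1, \dots, l$$

Equivalent dual problem:

$$\max_{\alpha} \; \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle \quad \text{subject to: } 0 \le \alpha_i \le C, \; i = 1, \dots, l, \; \sum_{i=1}^{l} \alpha_i y_i = 0$$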
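To make the role of the penalty C concrete, the soft-margin objective can be evaluated for a given hyperplane. For fixed w and b the smallest feasible slack is $\xi_i = \max(0, 1 - y_i(\langle w, x_i \rangle + b))$, which is what this sketch computes. It evaluates the objective only and does not solve the quadratic program; the function name and the toy data are hypothetical.

```python
def primal_objective(w, b, points, labels, C):
    """Return (objective value, slacks) of the soft-margin SVM for fixed w, b."""
    slacks = [max(0.0, 1.0 - y * (sum(wi * xi for wi, xi in zip(w, x)) + b))
              for x, y in zip(points, labels)]
    objective = 0.5 * sum(wi * wi for wi in w) + C * sum(slacks)
    return objective, slacks

# Hypothetical 2-D data: the second point lies inside the margin.
points = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]]
labels = [1, 1, -1]
value, slacks = primal_objective([1.0, 0.0], 0.0, points, labels, C=10.0)
```

Only the point inside the margin gets a non-zero slack, and C scales how much that violation costs relative to the margin term.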

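SVM with kernels

SVM task:

$$\min_{w,b,\xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} \xi_i \quad \text{subject to: } y_i(\langle w, \Phi(x_i) \rangle + b) \ge 1 - \xi_i, \; \xi_i \ge 0, \; i = 1, \dots, l$$

Equivalent dual problem:

$$\max_{\alpha} \; \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j \, k(x_i, x_j) \quad \text{subject to: } 0 \le \alpha_i \le C, \; i = 1, \dots, l, \; \sum_{i=1}^{l} \alpha_i y_i = 0$$

$$\Phi : \mathbb{R}^n \to F, \qquad k(x, \hat{x}) = \langle \Phi(x), \Phi(\hat{x}) \rangle$$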

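SVM kernels

RBF kernels:

$$k(x, \hat{x}) = \exp\left( -\frac{\|x - \hat{x}\|^2}{2\sigma^2} \right)$$

Polynomial kernels:

$$k(x, \hat{x}) = \left( \langle x, \hat{x} \rangle + 1 \right)^q$$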
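Both kernels are short enough to write out directly. This sketch assumes the standard forms $\exp(-\|x - \hat{x}\|^2 / (2\sigma^2))$ and $(\langle x, \hat{x} \rangle + 1)^q$, with vectors as plain Python lists; the function names are hypothetical.

```python
import math

def rbf_kernel(x, x_hat, sigma2):
    """RBF kernel exp(-||x - x_hat||^2 / (2 * sigma2)), with sigma2 = sigma squared."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return math.exp(-squared_distance / (2.0 * sigma2))

def polynomial_kernel(x, x_hat, q):
    """Polynomial kernel (<x, x_hat> + 1)^q."""
    return (sum(a * b for a, b in zip(x, x_hat)) + 1.0) ** q

same_point = rbf_kernel([0.0, 0.0], [0.0, 0.0], sigma2=0.1)
unit_apart = rbf_kernel([1.0, 0.0], [0.0, 0.0], sigma2=0.5)
poly = polynomial_kernel([1.0, 2.0], [3.0, 4.0], q=2)
```

A smaller σ² makes the RBF kernel fall off faster with distance, which is why σ² is the parameter varied in the experiments.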

SVM with kernels: decision function

Decision function, using the solution $\alpha$ of the dual problem and the offset $b$:

$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{l} \alpha_i y_i \, k(x_i, x) + b \right)$$
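The decision function can be evaluated directly once the multipliers are known. The sketch below assumes an RBF kernel and uses hand-picked support vectors, labels and multipliers purely for illustration; they are hypothetical and do not come from a trained model or from the slides.

```python
import math

def rbf_kernel(x, x_hat, sigma2):
    """RBF kernel exp(-||x - x_hat||^2 / (2 * sigma2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return math.exp(-squared_distance / (2.0 * sigma2))

def decide(x, support_vectors, labels, alphas, b, sigma2):
    """Return +1 or -1 from sgn(sum_i alpha_i * y_i * k(x_i, x) + b)."""
    score = sum(alpha * y * rbf_kernel(sv, x, sigma2)
                for sv, y, alpha in zip(support_vectors, labels, alphas)) + b
    return 1 if score >= 0 else -1

# Hypothetical model with one support vector per class.
support_vectors = [[0.0, 0.0], [2.0, 0.0]]
labels = [1, -1]
alphas = [1.0, 1.0]
near_first = decide([0.1, 0.0], support_vectors, labels, alphas, b=0.0, sigma2=0.5)
near_second = decide([1.9, 0.0], support_vectors, labels, alphas, b=0.0, sigma2=0.5)
```

Only training points with non-zero multipliers contribute to the sum, so prediction cost grows with the number of support vectors rather than with the size of the training set.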

Experimental Data

• Key frames from TREC 2002 Video Track

• 13,026 photographic images
• 1,620 cartoons
• Manually classified
• Experiments 1-3: train on (random) 3,908 photos and 486 cartoons

Experiment 1: individual performance

Descriptor              σ²                 Error photos   Error cartoons   Total error
average saturation      σ² = 0.1           0.0027         0.9541           0.108
threshold brightness    0.05 < σ² < 0.5    0              1                0.1106
color histogram         σ² = 0.07          0.0095         0.754            0.0919
edge histogram          0.05 < σ² < 0.5    0              1                0.1106
compression ratio       0.05 < σ² < 0.5    0              1                0.1106
pattern spectrum        σ² = 0.07          0.0002         0.9497           0.1052

Total error: $E_t = E_p \frac{|p|}{|p|+|c|} + E_c \frac{|c|}{|p|+|c|}$, where $|p|$ and $|c|$ are the numbers of photos and cartoons
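The total error is the class-weighted combination $E_t = E_p \frac{|p|}{|p|+|c|} + E_c \frac{|c|}{|p|+|c|}$ of the two per-class errors. A small sketch, assuming $|p|$ and $|c|$ are the photo and cartoon counts of the data set (13,026 and 1,620); the slides do not state which counts were used, and the function name is hypothetical.

```python
def total_error(err_photos, err_cartoons, n_photos, n_cartoons):
    """Class-weighted total error E_t = E_p * |p|/(|p|+|c|) + E_c * |c|/(|p|+|c|)."""
    n_total = n_photos + n_cartoons
    return (err_photos * n_photos / n_total
            + err_cartoons * n_cartoons / n_total)

# A classifier that labels everything "photo": no photo errors, all cartoons wrong.
trivial = total_error(0.0, 1.0, 13026, 1620)
# Reported per-class errors of the color histogram descriptor.
color_hist = total_error(0.0095, 0.754, 13026, 1620)
```

Under this assumption the trivial classifier comes out at about 0.1106, the total error reported for the three descriptors with zero photo error and a cartoon error of 1. Because photos dominate the data, a low total error says little about how many cartoons are actually caught.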

Experiment 2: “convergence” of SVM learning

[Figure: error (axis from 0.1020 to 0.1120) versus σ² for the pattern spectrum descriptor, with σ² = 1/2, 1/4, 1/6, …, 1/18]

Experiment 3: combined performance

Descriptors                   Error photos   Error cartoons   Total error
all - average saturation      0.0068         0.6914           0.0825
all - threshold brightness    0.0111         0.657            0.0825
all - color histogram         0.0068         0.7734           0.0916
all - edge histogram          0.009          0.672            0.0823
all - compression ratio       0.0098         0.6684           0.0826
all - pattern spectrum        0.011          0.7046           0.0884
all                           0.0111         0.6437           0.0811

σ² = 0.06

Experiment 4: web-image classifier on our data

[Figure: error (axis from 0.0 to 0.5) versus training set size (100 to 600); legend: we, [ASF]]

Test set: random 1,000 photos and 1,000 cartoons

Experiment 5: Performance on web images

[Figure: error (axis from 0 to 0.1); legend: we, [ASF], + dimension and file type features]

Comparison on 14,039 photographic and 9,512 graphical images harvested from the WWW; trained on (random) 4,239 photographic and 2,826 graphical images

Conclusions

• Hard task: good classifier
• Use dynamics/spatio-temporal relations?
• Semantic Gap?
• Combine classifiers?
• Granulometry not enough

Date post:	17-Dec-2015