Transcript
Page 1: Correlative Multi-Label Video Annotation

ACM Multimedia 2007

Guo-Jun Qi, Xian-Sheng Hua, Yong Rui, Jinhui Tang, Tao Mei and Hong-Jiang Zhang

Microsoft Research Asia

September 25, 2007

Page 2: Correlative Multi-Label Video Annotation

Motivation
Correlative Multi-Label Annotation
Modeling correlations
Learning the classifier
Connections to Gibbs Random Field
Experiments
Live Demo

Page 3: Correlative Multi-Label Video Annotation

How many images and videos in the world?

May 2007: 500 million
Aug. 2007: 1 billion
2000 images/minute
Sep. 2007: 84 million

Page 4: Correlative Multi-Label Video Annotation

1970s-80s: Manual labeling
1990s: Pure content-based (QBE)
Now: Automated annotation

[Figure: timeline with a Year axis (1970, 1980, 1990, 2000), moving from Manual to Automatic to Learning-Based annotation]

Page 5: Correlative Multi-Label Video Annotation

Now: Automated annotation (learning-based)

Learning-based video annotation schemes

[Diagram: training samples, features, modeling and learning, classifier. Example concepts: Person, Grass, Tree, Building, Road, Face. New sample: Lake?]

Page 6: Correlative Multi-Label Video Annotation

A typical strategy: Individual Concept Detection

Annotate multiple concepts separately

[Diagram: low-level features feed six independent detectors (Outdoor, Face, Person, People-Marching, Road, Walking-Running), each with output -1 / 1]
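As a minimal sketch of this first strategy, the snippet below runs one independent linear detector per concept, so no detector sees any other detector's output. The detector weights, biases, and the three-dimensional feature vector are hypothetical values chosen for illustration; the slides do not specify the detector type.

```python
from typing import Dict, List, Tuple

# Hypothetical per-concept linear detectors: (weights, bias).
# These numbers are illustrative only and do not come from the talk.
DETECTORS: Dict[str, Tuple[List[float], float]] = {
    "Outdoor": ([1.0, 0.0, 0.5], -0.2),
    "Face": ([0.0, 1.0, 0.0], -0.5),
    "Road": ([0.5, 0.0, 1.0], -0.6),
}


def detect(weights: List[float], bias: float, features: List[float]) -> int:
    """Binary decision of a single detector: +1 (present) or -1 (absent)."""
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if score >= 0 else -1


def annotate_individually(features: List[float]) -> Dict[str, int]:
    """Each concept is decided separately from the same low-level features."""
    return {name: detect(w, b, features) for name, (w, b) in DETECTORS.items()}


labels = annotate_individually([0.9, 0.1, 0.4])
```

Because every decision is made in isolation, nothing in this scheme can use the presence of one concept as evidence for another.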

Page 7: Correlative Multi-Label Video Annotation

√ Person
√ Street
√ Building
× Beach
× Mountain
√ Crowd
√ Outdoor
√ Walking/Running
? Marching → √ Marching

Page 8: Correlative Multi-Label Video Annotation

Another typical strategy: Fusion-Based

Context Based Concept Fusion (CBCF)

[Diagram: low-level features feed six individual detectors (Outdoor, Face, Person, People-Marching, Road, Walking-Running), each with output -1 / 1; the six detector scores form a concept model vector, which is passed to a concept fusion step]
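The two-step structure of a fusion-based scheme can be sketched as follows. This is an illustrative toy, not the CBCF implementation: the linear form of both steps and every weight and bias are assumptions made for the example.

```python
from typing import Dict, List, Tuple

CONCEPTS = ["Outdoor", "Road", "People-Marching"]

# Step 1: hypothetical independent detectors over low-level features.
STEP1: Dict[str, Tuple[List[float], float]] = {
    "Outdoor": ([1.0, 0.0], -0.3),
    "Road": ([0.0, 1.0], -0.3),
    "People-Marching": ([0.2, 0.2], -0.4),
}

# Step 2: hypothetical fusion weights over the concept model vector
# (one weight per entry of CONCEPTS, in the same order).
STEP2: Dict[str, Tuple[List[float], float]] = {
    "Outdoor": ([1.0, 0.0, 0.0], 0.0),
    "Road": ([0.0, 1.0, 0.0], 0.0),
    "People-Marching": ([0.5, 0.5, 1.0], 0.0),
}


def linear_score(weights: List[float], bias: float, vector: List[float]) -> float:
    return sum(w * v for w, v in zip(weights, vector)) + bias


def concept_model_vector(features: List[float]) -> List[float]:
    """First step: one score per concept, computed independently."""
    return [linear_score(*STEP1[c], features) for c in CONCEPTS]


def fuse(model_vector: List[float]) -> Dict[str, int]:
    """Second step: re-decide each concept from all first-step scores."""
    return {c: 1 if linear_score(*STEP2[c], model_vector) >= 0 else -1
            for c in CONCEPTS}


scores = concept_model_vector([0.8, 0.7])
fused = fuse(scores)
```

In this toy setting the first-step score for People-Marching is negative, but the fusion step turns it positive because the Outdoor and Road scores are high. The correlation is used, but only in a separate second step that depends on the first-step scores.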

Page 9: Correlative Multi-Label Video Annotation

[Diagram, repeated from the previous slide: low-level features, individual detectors (Outdoor, Face, Person, People-Marching, Road, Walking-Running) with outputs -1 / 1, scores, concept model vector, concept fusion]

Page 10: Correlative Multi-Label Video Annotation

Our strategy: Integrated Concept Detection

Correlative Multi-Label Learning (CML)

[Diagram: low-level features feed a single integrated model over Outdoor, Face, Person, People-Marching, Road and Walking-Running, each with output -1 / 1]

Page 11: Correlative Multi-Label Video Annotation

Multi-Label Annotation

1st Paradigm: Individual Detectors. No correlation.
2nd Paradigm: Fusion Based. Has correlations, but uses a second step.
3rd Paradigm: Integrated. Models concepts and correlations in one step.

Page 12: Correlative Multi-Label Video Annotation

Our strategy: Integrated Concept Detection

Correlative Multi-Label Learning (CML)

Page 13: Correlative Multi-Label Video Annotation

How to model concepts and the correlations among concepts in a single step?

Page 14: Correlative Multi-Label Video Annotation

Notations

Page 15: Correlative Multi-Label Video Annotation

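Modeling concepts and correlations simultaneously

$x = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6)$
$y = (\text{person}, \text{road}, \text{beach}, \text{car}, \text{tree})$, each label $+1$ (Yes) or $-1$ (No)

Feature/concept terms:
$\theta^{+1}_{1,1} = 0.1,\ \theta^{-1}_{1,1} = 0$
$\theta^{+1}_{1,2} = 0.1,\ \theta^{-1}_{1,2} = 0$
$\theta^{+1}_{1,3} = 0,\ \theta^{-1}_{1,3} = 0.1$
$\theta^{+1}_{2,1} = 0.2,\ \theta^{-1}_{2,1} = 0$
$\theta^{+1}_{2,2} = 0.2,\ \theta^{-1}_{2,2} = 0$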

Page 16: Correlative Multi-Label Video Annotation

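Modeling concepts and correlations simultaneously

$x = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6)$
$y = (\text{person}, \text{road}, \text{beach}, \text{car}, \text{tree})$, each label $+1$ (Y) or $-1$ (N)

Concept/concept terms:
$\theta^{+1,+1}_{1,2} = 1,\ \theta^{+1,-1}_{1,2} = 0,\ \theta^{-1,+1}_{1,2} = 0,\ \theta^{-1,-1}_{1,2} = 0$
$\theta^{+1,+1}_{1,3} = 0,\ \theta^{+1,-1}_{1,3} = 1,\ \theta^{-1,+1}_{1,3} = 0,\ \theta^{-1,-1}_{1,3} = 0$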
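One reading of the two example slides is a joint feature vector with two kinds of entries: a feature/concept entry that equals the feature value $x_d$ when concept $p$ takes label $l$ (and 0 otherwise), and a concept/concept entry that equals 1 when concepts $p$ and $q$ take labels $m$ and $n$ (and 0 otherwise). The sketch below builds such a vector under that assumption. The function name, the dictionary keys, and the labels chosen for car and tree are hypothetical; only person, road and beach are implied by the example values.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def joint_features(x: List[float], y: List[int]) -> Dict[Tuple, float]:
    """Build feature/concept ("fc") and concept/concept ("cc") entries.

    Indices are 1-based to match the subscripts on the slides.
    """
    theta: Dict[Tuple, float] = {}
    num_concepts = len(y)
    # Feature/concept entries: x_d if y_p == l, else 0.
    for d, x_d in enumerate(x, start=1):
        for p in range(1, num_concepts + 1):
            for label in (+1, -1):
                theta[("fc", d, p, label)] = x_d if y[p - 1] == label else 0.0
    # Concept/concept entries: 1 if (y_p, y_q) == (m, n), else 0.
    for p, q in combinations(range(1, num_concepts + 1), 2):
        for m in (+1, -1):
            for n in (+1, -1):
                hit = y[p - 1] == m and y[q - 1] == n
                theta[("cc", p, q, m, n)] = 1.0 if hit else 0.0
    return theta


x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
# person, road, beach, car, tree; the car and tree labels are hypothetical.
y = [+1, +1, -1, -1, -1]
theta = joint_features(x, y)
```

With six features and five concepts, this construction yields 60 feature/concept entries and 40 concept/concept entries, 100 in total.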

Page 17: Correlative Multi-Label Video Annotation

Modeling concepts and correlations

Dimension of the joint feature vector for $D$ low-level features and $K$ concepts: $2K(D + K - 1)$
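A short count shows where an expression of this form can come from. It assumes two indicator entries (Yes/No) for every feature/concept pair and four entries (YY, YN, NY, NN) for every unordered pair of concepts; that layout is an assumption based on the example slides, with $D$ the number of low-level features and $K$ the number of concepts.

```latex
\[
\underbrace{2KD}_{\text{feature/concept}}
\;+\;
\underbrace{4\binom{K}{2}}_{\text{concept/concept}}
\;=\; 2KD + 2K(K-1)
\;=\; 2K(D + K - 1)
\]
```

For the running example with $D = 6$ and $K = 5$, this gives $2 \cdot 5 \cdot (6 + 5 - 1) = 100$ entries.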

Page 18: Correlative Multi-Label Video Annotation

Learning the classifier

Misclassification error
Loss function
Empirical risk
Regularization
Introduce slack variables
Lagrange dual
Find solution by SMO
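The chain from misclassification error to a regularized empirical risk can be illustrated with a small sketch. It is not the solver named on the slide: instead of forming the Lagrange dual and running SMO, it applies plain subgradient descent to a regularized margin-based (hinge) objective. The Hamming distance as misclassification error, the joint feature layout, the hyperparameters, and the toy data are all assumptions made for illustration.

```python
from itertools import combinations, product


def joint_features(x, y):
    """Flattened feature/concept and concept/concept indicator entries."""
    k = len(y)
    feats = [x_d if y[p] == l else 0.0
             for x_d in x for p in range(k) for l in (1, -1)]
    feats += [1.0 if (y[p] == m and y[q] == n) else 0.0
              for p, q in combinations(range(k), 2)
              for m in (1, -1) for n in (1, -1)]
    return feats


def score(w, x, y):
    return sum(wi * fi for wi, fi in zip(w, joint_features(x, y)))


def hamming(y, other):
    """Misclassification error: number of concepts labeled differently."""
    return sum(a != b for a, b in zip(y, other))


def train(samples, K, epochs=50, lr=0.1, lam=0.01):
    dim = len(joint_features(samples[0][0], (1,) * K))
    w = [0.0] * dim
    labelings = list(product((1, -1), repeat=K))  # exhaustive; small K only
    for _ in range(epochs):
        for x, y in samples:
            # Most violating labeling under the loss-augmented score.
            y_hat = max(labelings, key=lambda c: score(w, x, c) + hamming(y, c))
            # Slack: how far the margin constraint is violated.
            slack = score(w, x, y_hat) + hamming(y, y_hat) - score(w, x, y)
            f_true, f_hat = joint_features(x, y), joint_features(x, y_hat)
            for i in range(dim):
                grad = lam * w[i]  # regularization term
                if slack > 0:
                    grad += f_hat[i] - f_true[i]  # hinge subgradient
                w[i] -= lr * grad
    return w


def predict(w, x, K):
    return max(product((1, -1), repeat=K), key=lambda c: score(w, x, c))


# Toy data: two features, two concepts that always agree.
samples = [([1.0, 0.0], (1, 1)), ([0.0, 1.0], (-1, -1))]
w = train(samples, K=2)
```

Enumerating all $2^K$ labelings is only feasible for very small $K$; it is used here to keep the sketch short.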

Page 19: Correlative Multi-Label Video Annotation

Connection to Gibbs Random Field

Rewrite the classifier
Define a random field
The concept labels form a random field
The neighborhood system consists of all adjacent sites, that is, this RF is fully connected
Define energy function
Define GRF

Page 20: Correlative Multi-Label Video Annotation

Connection to Gibbs Random Field

Rewrite the classifier
Define energy function
Intuitive explanation of CML
Define a random field
The concept labels form a random field
The neighborhood system consists of all adjacent sites, that is, this RF is fully connected
Define GRF
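The connection can be illustrated with a toy Gibbs distribution over label vectors. The sketch assumes that the energy of a labeling is the negative of its classifier score, so that the highest-scoring labeling is also the most probable one. The score function, with per-site terms and a single pairwise term over a fully connected two-site field, is hypothetical.

```python
import math
from itertools import product


def gibbs_distribution(score_fn, num_sites):
    """P(y) = exp(-H(y)) / Z with energy H(y) = -score(y) (an assumption)."""
    labelings = list(product((1, -1), repeat=num_sites))
    energies = {y: -score_fn(y) for y in labelings}
    partition = sum(math.exp(-e) for e in energies.values())
    return {y: math.exp(-e) / partition for y, e in energies.items()}


def toy_score(y):
    """Hypothetical score: site potentials plus one pairwise potential."""
    site = 0.5 * y[0] + 0.2 * y[1]
    pair = 0.8 if y[0] == y[1] else 0.0  # rewards agreeing labels
    return site + pair


probs = gibbs_distribution(toy_score, 2)
most_probable = max(probs, key=probs.get)
best_scoring = max(product((1, -1), repeat=2), key=toy_score)
```

Under this assumption, maximizing the classifier score and finding the most probable labeling of the random field are the same problem.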

Page 21: Correlative Multi-Label Video Annotation

Experiments

TRECVID 2005 dataset (170 hours)
39 concepts (LSCOM-Lite)
Training (65%), Validation (16%), Testing (19%)

Page 22: Correlative Multi-Label Video Annotation

Experiments

TRECVID 2005 dataset (170 hours)
39 concepts (LSCOM-Lite)
Training (65%), Validation (16%), Testing (19%)
CML (MAP=0.290) improves on IndSVM (MAP=0.246) by 17% and on CBCF (MAP=0.253) by 14%

[Chart: CML, CBCF, SVM. SVM → CML: ↑17%; CBCF → CML: ↑14%]
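The quoted percentages are consistent with relative (not absolute) MAP gains, truncated to whole numbers. The arithmetic can be checked as follows; the function name is hypothetical, and the MAP values are the ones given on the slide.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


map_cml, map_indsvm, map_cbcf = 0.290, 0.246, 0.253

over_indsvm = relative_improvement(map_cml, map_indsvm)  # about 17.9
over_cbcf = relative_improvement(map_cml, map_cbcf)      # about 14.6
```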

Page 23: Correlative Multi-Label Video Annotation

[Chart: CML, CBCF, SVM. SVM → CML: ↑131%; CBCF → CML: ↑128%]

Page 24: Correlative Multi-Label Video Annotation

[Charts: three panels, each comparing CML, CBCF and SVM]

Page 25: Correlative Multi-Label Video Annotation


Page 26: Correlative Multi-Label Video Annotation


Page 27: Correlative Multi-Label Video Annotation

Correlative Multi-Label Video Annotation

A new paradigm for multi-label annotation
Models correlations and concepts simultaneously
Has a close connection to Gibbs Random Field

Page 28: Correlative Multi-Label Video Annotation

Multi-Instance Multi-Label Annotation

Exploits correlations among concepts and among instances at the same time
Yields region-level annotation as well as image/frame-level annotation

[Figure: example labels Sky, Mountain, Water, Sands, Scenery]

Page 29: Correlative Multi-Label Video Annotation


Page 30: Correlative Multi-Label Video Annotation

Correlative Multi-Label Video Annotation

A new paradigm for multi-label annotation
Models correlations and concepts simultaneously
Has a close connection to Gibbs Random Field

