VIREO-TNO @ TRECVID 2014 Zero-Shot Event Detection and … · 2014. 11. 25. · VIREO-TNO @ TRECVID...


VIREO-TNO @ TRECVID 2014: Zero-Shot Event Detection and Recounting

Speaker: Maaike de Boer (TNO)

Yi-Jie Lu (1), Hao Zhang (1), Chong-Wah Ngo (1)

Maaike de Boer (2), John Schavemaker (2), Klamer Schutte (2), Wessel Kraaij (2)

(1) VIREO Group, City University of Hong Kong, Hong Kong

(2) Netherlands Organization for Applied Scientific Research (TNO), Netherlands

Outline

0-Shot System

– System Overview

– Findings

MER System

– System Workflow

– Results

Semantic Query Generation (SQG)

– Given an event query, SQG translates the query description into a representation of semantic concepts

Event Query (Attempting a Bike Trick) → SQG → Semantic Query

Semantic Query (relevant concepts from the concept bank, each with a relevance score):

< Objects >
• Bike 0.60
• Motorcycle 0.60
• Mountain bike 0.60
< Actions >
• Bike trick 1.00
• Riding bike 0.62
• Flipping bike 0.61
< Scenes >
• Parking lot 0.01

[Diagram: Concept Bank built from TRECVID SIN, Research Collection, HMDB51, UCF101, and ImageNet]

Concept Bank

– Research collection (497 concepts)

– ImageNet ILSVRC’12 (1000 concepts)

– SIN’14 (346 concepts)
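As a rough illustration of how an event query could be mapped onto a concept bank by exact matching, consider the sketch below. It is not the VIREO-TNO implementation: the concept names, the `exact_match` helper, and the rule that a concept matches when all of its words occur in the query text are assumptions made for this example.

```python
import re

def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def exact_match(query_text, concept_bank):
    """Return the concepts whose every word appears in the query text.

    Assumption: this word-containment rule stands in for the exact
    matching mentioned in the slides, whose precise definition is not given.
    """
    query_words = set(tokenize(query_text))
    matched = []
    for concept in concept_bank:
        concept_words = tokenize(concept)
        if concept_words and all(w in query_words for w in concept_words):
            matched.append(concept)
    return matched

# Hypothetical concept bank and event description, for illustration only.
concept_bank = ["bike", "bike trick", "parking lot", "dog show", "honeycomb"]
query = ("Attempting a bike trick: a person performs a trick "
         "on a bike in a parking lot")
semantic_query = exact_match(query, concept_bank)
print(semantic_query)
```

A real system would also attach a relevance score to each matched concept, as in the semantic query shown earlier; the sketch only covers the matching step.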


Event Search

– Ranking according to the SQ and concept responses

Semantic Query $q$:

< Objects >
• Bike 0.60
• Motorcycle 0.60
• Mountain bike 0.60
< Actions >
• Bike trick 1.00
• Riding bike 0.62
• Flipping bike 0.61
< Scenes >
• Parking lot 0.01

Video Ranking

Event Search: $s_i = q \cdot c_i$, where $q$ is the semantic query and $c_i$ is the concept response of video $i$.
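Ranking videos by combining the semantic query with per-video concept responses can be sketched as a weighted sum. The snippet is a minimal illustration under stated assumptions: the video ids and response values are made up, and the helper names `event_score` and `rank_videos` are hypothetical.

```python
def event_score(query_weights, concept_responses):
    """Weighted sum of concept responses over the concepts in the query."""
    return sum(weight * concept_responses.get(concept, 0.0)
               for concept, weight in query_weights.items())

def rank_videos(query_weights, responses_by_video):
    """Return video ids sorted by descending event score."""
    return sorted(
        responses_by_video,
        key=lambda vid: event_score(query_weights, responses_by_video[vid]),
        reverse=True,
    )

# Relevance scores taken from the bike-trick example in the slides.
semantic_query = {"bike trick": 1.00, "riding bike": 0.62, "bike": 0.60}

# Hypothetical detector outputs per video (assumed values).
responses_by_video = {
    "video_a": {"bike trick": 0.1, "riding bike": 0.2, "bike": 0.3},
    "video_b": {"bike trick": 0.8, "riding bike": 0.5, "bike": 0.9},
    "video_c": {"bike": 0.7},
}

ranking = rank_videos(semantic_query, responses_by_video)
print(ranking)
```

Concepts missing from a video's responses are treated as zero here, which is a convenience of the sketch rather than something stated in the slides.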


SQG Experiments

– Exact matching vs. WordNet/ConceptNet matching

– How many concepts are used to represent an event?

– To further improve the weighting: TF-IDF and term specificity

Exact matching vs. WordNet matching

[Figure: average precision per event ID for three methods: WordNet, Exact Matching, and EM-TOP (exact matching but only retaining the top few concepts). Chart annotation: 7%.]

Number of concepts used to represent an event

[Figure: MAP (all events) versus the number of top-k concepts retained, k = 1 to 26.]

The best MAP is reached by retaining only the top 8 concepts.
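Retaining only the top-k concepts of a semantic query is straightforward to sketch. The helper name `retain_top_k` is hypothetical; the default k = 8 follows the best-MAP setting reported on the slide, and the example uses k = 3 only to keep the output short.

```python
def retain_top_k(semantic_query, k=8):
    """Keep the k concepts with the highest relevance scores."""
    ranked = sorted(semantic_query.items(),
                    key=lambda item: item[1],
                    reverse=True)
    return dict(ranked[:k])

# Relevance scores from the bike-trick example in the slides.
semantic_query = {
    "bike": 0.60,
    "motorcycle": 0.60,
    "mountain bike": 0.60,
    "bike trick": 1.00,
    "riding bike": 0.62,
    "flipping bike": 0.61,
    "parking lot": 0.01,
}

top3 = retain_top_k(semantic_query, k=3)
print(top3)
```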

Insights

Event 21: Attempting a bike trick

[Figure: average precision versus top-k concepts (k = 1 to 26) for event 21. Concepts annotated on the curve: Trick, Wheel, Paddle wheel, Car wheel, Potter wheel, Person riding, Jumping.]

Insights

Event 31: Beekeeping

[Figure: average precision versus top-k concepts (k = 1 to 26) for event 31. Concepts annotated on the curve: Honeycomb (ImageNet), Bee (ImageNet), Bee house (ImageNet), Cutting (research collection), Cutting down tree (research collection).]

Insights

Event 23: Dog show

[Figure: average precision versus top-k concepts (k = 1 to 26) for event 23. Concepts annotated on the curve: Brush dog (research collection), Dog show (research collection).]

Improvements by TF-IDF and word specificity

Method                                       MAP (on MED14-Test)
Exact Matching Only                          0.0306
Exact Matching + TF                          0.0420
Exact Matching + TFIDF                       0.0495
Exact Matching + TFIDF + Word Specificity    0.0502

[Figure: bar chart of the MAP values above for EM Only, EM + TF, EM + TFIDF, and EM + TFIDF + Spec.]
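The slides do not spell out how TF-IDF is applied to the matched concepts. One plausible reading, sketched below purely as an assumption, treats each event description as a document and weights a concept by its term frequency in the current description times the log inverse document frequency across all event descriptions. The function name `tfidf_weights` and the three event texts are hypothetical, and only single-word concepts are handled. Word specificity is not covered because the slides do not define it.

```python
import math
import re

def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_weights(concepts, event_text, all_event_texts):
    """Weight single-word concepts by TF-IDF over event descriptions.

    Assumption: tf is the raw count in this event's description and
    idf is log(N / df) over all N event descriptions.
    """
    tokens = tokenize(event_text)
    docs = [set(tokenize(text)) for text in all_event_texts]
    weights = {}
    for concept in concepts:
        tf = tokens.count(concept)
        df = sum(1 for doc in docs if concept in doc)
        idf = math.log(len(docs) / df) if df else 0.0
        weights[concept] = tf * idf
    return weights

# Hypothetical event descriptions, for illustration only.
event_texts = [
    "a person does a bike trick on a bike",
    "a person grooms a dog at a dog show",
    "a person inspects a bee hive",
]

weights = tfidf_weights(["bike", "person"], event_texts[0], event_texts)
print(weights)
```

In this toy example "person" occurs in every description and so receives zero weight, while "bike" is specific to one event and is weighted up.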

Findings

1. Exact matching performs better than matching with WordNet and/or ConceptNet

2. Performance is even better by only retaining the top few exactly matched concepts

3. Adding both TF-IDF and Word Specificity increases performance

Why would ontology-based mapping not work?

[Figure: a sample query in TRECVID 2009]

Why would ontology-based mapping not work?

Dog Show

[Diagram: the concept “dog” and ontology-related concepts: cat, horse, mammal, carnivore, animal, kit fox, red wolf. Concept sources shown: SIN, ImageNet.]

Why would ConceptNet mapping not work?

Tailgating

[Diagram: ConceptNet neighbourhood of “tailgating”. Labels in the diagram: car, food, helmet, team uniform, portable shelter, parking lot, driver, engine, tailgating, desires, bus.]

Findings

It is difficult to

– harness the ontology-based mapping while constraining the mapping by event context

In the Ad-Hoc event “Extinguishing a Fire”

– Key concepts are missing:

Fire extinguisher

Firefighter

Findings

It is reasonable to

– Scale up the number of concepts, thus increasing the chance of exact matching

MED14-Eval-Full Results

PS 000Ex

– Automatic semantic query generation and search

– Fusion of 0-Shot and OCR system

– Achieves a MAP of 5.2

AH 000Ex

– System is the same as in PS 000Ex

– Achieves a MAP of 2.6

– Performance drops due to the lack of key concepts



MER System

In algorithm design, we aim to optimize:

– Concept-to-event relevancy

First, we require that candidate shots are relevant to the event;

Second, we do concept-to-shot alignment.

– Evidence diversity

In concept-to-shot alignment, we recount each shot with a unique concept different from other shots.

– Viewing time of evidential shots

Select only the three most confident shots as key evidence.

Each shot is about 5 seconds long.


Key Evidence Localization

1. Extract keyframes uniformly.

2. Apply concept detectors from the concept bank (TRECVID SIN, Research Collection, HMDB51, UCF101, ImageNet) to obtain concept responses.

3. Choose the keyframes that are most relevant to this event. All concepts in the semantic query are taken into account by calculating the weighted sum $s_i = w \cdot r_i$, where $w$ holds the concept weights of the semantic query and $r_i$ the concept responses of keyframe $i$.

4. Expand keyframes to shots.

5. The top 3 shots are selected as key evidence.

6. The rest are non-key evidence.
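The localization steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: keyframe responses are given as plain dictionaries, each keyframe is expanded to a fixed 5-second window centred on it, overlapping windows are not merged, and the function name `localize_key_evidence` is hypothetical.

```python
def localize_key_evidence(keyframe_times, keyframe_responses, query_weights,
                          num_key=3, shot_length=5.0):
    """Score keyframes by a weighted sum and expand them to shots.

    Returns (key_shots, non_key_shots); each shot is (start, end, score).
    Assumption: a shot is a fixed-length window centred on its keyframe.
    """
    shots = []
    for time, responses in zip(keyframe_times, keyframe_responses):
        score = sum(weight * responses.get(concept, 0.0)
                    for concept, weight in query_weights.items())
        start = max(0.0, time - shot_length / 2)
        shots.append((start, start + shot_length, score))
    # Most relevant shots first; the top num_key become key evidence.
    shots.sort(key=lambda shot: shot[2], reverse=True)
    return shots[:num_key], shots[num_key:]

# Hypothetical keyframes sampled every 5 seconds with made-up responses.
times = [2.5, 7.5, 12.5, 17.5]
responses = [{"bike": 0.1}, {"bike": 0.9}, {"bike": 0.5}, {"bike": 0.7}]

key, non_key = localize_key_evidence(times, responses, {"bike": 1.0})
print(key)
print(non_key)
```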

Concept-to-Shot Alignment

The top concept in the key evidence is selected as the representative concept.

* We choose a unique concept for each shot.

Semantic Query

< Objects >
• Bike
• Motorcycle
• Mountain bike
< Actions >
• Bike trick
• Riding bike
• Flipping bike
< Scenes >
• Parking lot

[Diagram: shots labeled Key or Non-Key, with example per-shot concept rankings “Riding bike, Bike trick, Bike” and “Bike trick, Bike, Riding bike”.]
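Choosing a unique representative concept per shot could be done greedily, as in the sketch below. The greedy order (shots processed from most to least confident) and the helper name `align_concepts` are assumptions; the slides state only that each shot gets its top concept and that concepts are unique across shots.

```python
def align_concepts(shot_responses, query_concepts):
    """Assign each shot its highest-response concept not yet used.

    shot_responses: list of {concept: response}, most confident shot first.
    Returns one concept per shot, or None if all concepts are taken.
    """
    used = set()
    labels = []
    for responses in shot_responses:
        candidates = [c for c in query_concepts if c not in used]
        if not candidates:
            labels.append(None)
            continue
        best = max(candidates, key=lambda c: responses.get(c, 0.0))
        used.add(best)
        labels.append(best)
    return labels

query_concepts = ["bike", "bike trick", "riding bike"]

# Hypothetical concept responses for three key shots.
shot_responses = [
    {"riding bike": 0.9, "bike trick": 0.8, "bike": 0.7},
    {"bike trick": 0.9, "bike": 0.8, "riding bike": 0.7},
    {"bike trick": 0.9, "riding bike": 0.8, "bike": 0.6},
]

labels = align_concepts(shot_responses, query_concepts)
print(labels)
```

The third shot falls back to "bike" because its two stronger concepts were already assigned to earlier shots, which is how the uniqueness constraint produces diverse evidence.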

MER14 Results

The percentage of strongly agree

[Figure: two bar charts, (a) Evidence quality and (b) Event query quality, showing the percentage of “strongly agree” responses per team (y-axis 0% to 30%). Team order on the x-axes: VIREO, Team1, Team2, Team3, Team4, Team6, Team5; and Team2, VIREO, Team4, Team3, Team6, Team1, Team5.]

MER14 Results

The percentage of both agree and strongly agree

[Figure: two bar charts, (a) Evidence quality and (b) Event query quality, showing the percentage of “agree” plus “strongly agree” responses per team (y-axes 0% to 90% and 0% to 70%). Team order on the x-axes: Team1, Team2, Team3, VIREO, Team4, Team5, Team6; and Team2, VIREO, Team4, Team1, Team6, Team5, Team3.]

Summary

0-Shot System

– Simple exact matching performs best

– The quality of the concepts selected to represent an event is more important than their quantity

– How to harness ontology-based mapping remains an open problem

Summary

MER System

– In key evidence localization, we emphasize event relevancy first, then the hot concepts

– We recommend three shots as key evidence, each about 5 seconds long

Thanks!