Reading Between the Lines: Object Localization Using Implicit Cues from Image Tags

Page 1: Reading Between the Lines:  Object Localization Using Implicit Cues from Image Tags

READING BETWEEN THE LINES: OBJECT LOCALIZATION USING IMPLICIT CUES FROM IMAGE TAGS

Sung Ju Hwang and Kristen Grauman, University of Texas at Austin, CVPR 2010

Page 2: Reading Between the Lines:  Object Localization Using Implicit Cues from Image Tags

Detecting tagged objects

Images tagged with keywords clearly tell us which objects to search for.

Example tags: Dog, Black lab, Jasper, Sofa, Self, Living room, Fedora, Explore, #24

Page 3: Reading Between the Lines:  Object Localization Using Implicit Cues from Image Tags

Detecting tagged objects

Previous work using tagged images focuses on the noun ↔ object correspondence: Duygulu et al. 2002; Berg et al. 2004; Fergus et al. 2005; Li et al. 2009.

[Lavrenko et al. 2003, Monay & Gatica-Perez 2003, Barnard et al. 2004, Schroff et al. 2007, Gupta & Davis 2008, Vijayanarasimhan & Grauman 2008, …]

Page 4: Reading Between the Lines:  Object Localization Using Implicit Cues from Image Tags

Our Idea

The list of human-provided tags gives useful cues beyond just which objects are present.

Image 1 tags: Mug, Key, Keyboard, Toothbrush, Pen, Photo, Post-it
Image 2 tags: Computer, Poster, Desk, Bookshelf, Screen, Keyboard, Screen, Mug, Poster

Based on tags alone, can you guess where and what size the mug will be in each image?

Page 5: Reading Between the Lines:  Object Localization Using Implicit Cues from Image Tags

Our Idea

The list of human-provided tags gives useful cues beyond just which objects are present.

Image 1 tags (Mug, Key, Keyboard, Toothbrush, Pen, Photo, Post-it): absence of larger objects; the mug is named first.
Image 2 tags (Computer, Poster, Desk, Bookshelf, Screen, Keyboard, Screen, Mug, Poster): presence of larger objects; the mug is named later.

Page 6: Reading Between the Lines:  Object Localization Using Implicit Cues from Image Tags

Our Idea

We propose to learn the implicit localization cues provided by tag lists to improve object detection.

Page 7: Reading Between the Lines:  Object Localization Using Implicit Cues from Image Tags

Approach overview

Training: Learn an object-specific connection between localization parameters and implicit tag features. Training tag lists such as (Mug, Key, Keyboard, Toothbrush, Pen, Photo, Post-it), (Computer, Poster, Desk, Screen, Mug, Poster), and (Woman, Table, Mug, Ladder) are converted to implicit tag features to model P(location, scale | tags).

Testing: Given a novel image, localize objects based on both tags and appearance. Test tag lists such as (Mug, Eiffel), (Desk, Mug, Office), and (Mug, Coffee) feed the implicit tag features, which are combined with the object detector.



Page 10: Reading Between the Lines:  Object Localization Using Implicit Cues from Image Tags

Feature: Word presence/absence

Presence or absence of other objects affects the scene layout, so we record the bag-of-words frequency of the tags: W = [w_1, ..., w_N], where w_i = count of the i-th word.

Image 1 tags: Mug, Key, Keyboard, Toothbrush, Pen, Photo, Post-it (small objects mentioned)
Image 2 tags: Computer, Poster, Desk, Bookshelf, Screen, Keyboard, Screen, Mug, Poster (large objects mentioned)

        Mug  Pen  Post-it  Toothbrush  Key  Photo  Computer  Screen  Keyboard  Desk  Bookshelf  Poster
W(im1)   1    1     1          1        1     1       0        0        1       0       0         0
W(im2)   1    0     0          0        0     0       1        2        1       1       1         1
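As a concrete illustration of this feature, here is a minimal Python sketch; the vocabulary, function name, and tag lists are illustrative, and the paper's actual feature extraction may differ in details such as how the vocabulary is built.

```python
from collections import Counter

def word_count_feature(tags, vocabulary):
    # Bag-of-words over the tag list: w_i = count of the i-th vocabulary word
    counts = Counter(tags)
    return [counts[word] for word in vocabulary]

vocab = ["Mug", "Pen", "Post-it", "Toothbrush", "Key", "Photo",
         "Computer", "Screen", "Keyboard", "Desk", "Bookshelf", "Poster"]
im1_tags = ["Mug", "Key", "Keyboard", "Toothbrush", "Pen", "Photo", "Post-it"]

print(word_count_feature(im1_tags, vocab))
# [1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0]  -- matches the W(im1) row above
```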


Page 12: Reading Between the Lines:  Object Localization Using Implicit Cues from Image Tags

Feature: Rank of tags

People tag the "important" objects earlier, so we record the rank of each tag compared to its typical rank: R = [r_1, ..., r_N], where r_i = percentile rank of the i-th word.

Image 1 tags: Mug, Key, Keyboard, Toothbrush, Pen, Photo, Post-it (the mug has a relatively high rank)
Image 2 tags: Computer, Poster, Desk, Bookshelf, Screen, Keyboard, Screen, Mug, Poster

        Mug   Computer  Screen  Keyboard  Desk  Bookshelf  Poster  Photo  Pen   Post-it  Toothbrush  Key
R(im1)  0.80     0        0       0.51     0        0        0     0.28   0.72   0.82       0        0.90
R(im2)  0.23    0.62     0.21     0.13    0.48     0.61     0.41    0      0      0         0         0
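A minimal sketch of this rank feature, assuming the percentile is computed against the ranks the word receives in training tag lists; the exact percentile definition, the names used here, and the handling of absent or unseen words are assumptions rather than details taken from the paper.

```python
import numpy as np

def rank_feature(tags, vocabulary, training_ranks):
    # r_i = percentile rank of the i-th word: how early this word is named in
    # this tag list compared to the ranks it typically gets in training lists.
    r = np.zeros(len(vocabulary))
    for i, word in enumerate(vocabulary):
        if word not in tags:
            continue  # absent words keep r_i = 0
        observed = tags.index(word) + 1         # 1-based position in this tag list
        history = training_ranks.get(word, [])  # positions seen for this word in training
        if history:
            # fraction of training ranks that this observation ties or beats
            r[i] = np.mean([observed <= h for h in history])
    return r
```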


Page 14: Reading Between the Lines:  Object Localization Using Implicit Cues from Image Tags

Feature: Proximity of tags

People tend to move their eyes to nearby objects after the first fixation, so we record the proximity of all tag pairs, based on the rank difference between the two tags (objects named close together in the list may be close to each other in the image).

Image 1 tags: 1) Mug 2) Key 3) Keyboard 4) Toothbrush 5) Pen 6) Photo 7) Post-it
Image 2 tags: 1) Computer 2) Poster 3) Desk 4) Bookshelf 5) Screen 6) Keyboard 7) Screen 8) Mug 9) Poster

P(im1)     Mug  Screen  Keyboard  Desk  Bookshelf
Mug         1     0       0.5      0       0
Screen            0        0       0       0
Keyboard                   1       0       0
Desk                               0       0
Bookshelf                                  0

P(im2)     Mug  Screen  Keyboard  Desk  Bookshelf
Mug         1     1       0.5     0.2    0.25
Screen            1        1      0.33    0.5
Keyboard                   1      0.33    0.5
Desk                               1       1
Bookshelf                                  1

Page 15: Reading Between the Lines:  Object Localization Using Implicit Cues from Image Tags

Approach overview

Training: tag lists such as (Mug, Key, Keyboard, Toothbrush, Pen, Photo, Post-it), (Computer, Poster, Desk, Screen, Mug, Poster), and (Woman, Table, Mug, Ladder) are converted to the implicit tag features W, R, P to model P(location, scale | W, R, P).

Testing: tag lists such as (Mug, Eiffel), (Desk, Mug, Office), and (Mug, Coffee) feed the implicit tag features, which are combined with the object detector.

Page 16: Reading Between the Lines:  Object Localization Using Implicit Cues from Image Tags

Modeling P(X|T)

We need a PDF for the location and scale of the target object, given the tag feature: P(X = (scale, x, y) | T = tag feature).

We model it directly using a mixture density network (MDN) [Bishop, 1994]: a neural network maps the input tag feature (Words, Rank, or Proximity) to the mixture model parameters (α, µ, Σ) of each component.
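The slide only names the model class, so the following PyTorch sketch shows what such a mixture density network could look like, with diagonal-covariance Gaussian components over X = (scale, x, y); the layer sizes, number of components, and the diagonal-covariance simplification are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MixtureDensityNetwork(nn.Module):
    """Maps an implicit tag feature T to a Gaussian mixture over X = (scale, x, y):
    P(X | T) = sum_k alpha_k(T) * N(X; mu_k(T), diag(sigma_k(T)^2))."""

    def __init__(self, feat_dim, n_components=4, x_dim=3, hidden=64):
        super().__init__()
        self.n_components, self.x_dim = n_components, x_dim
        self.trunk = nn.Sequential(nn.Linear(feat_dim, hidden), nn.Tanh())
        self.alpha = nn.Linear(hidden, n_components)               # mixture weights
        self.mu = nn.Linear(hidden, n_components * x_dim)          # component means
        self.log_sigma = nn.Linear(hidden, n_components * x_dim)   # diagonal std devs (log)

    def forward(self, t):
        h = self.trunk(t)
        alpha = torch.softmax(self.alpha(h), dim=-1)
        mu = self.mu(h).view(-1, self.n_components, self.x_dim)
        sigma = self.log_sigma(h).exp().view(-1, self.n_components, self.x_dim)
        return alpha, mu, sigma

def mdn_nll(alpha, mu, sigma, x):
    # Negative log-likelihood of the ground-truth localization x under the mixture
    comp = torch.distributions.Normal(mu, sigma)
    log_p = comp.log_prob(x.unsqueeze(1)).sum(-1)   # per-component diagonal Gaussian
    return -torch.logsumexp(torch.log(alpha) + log_p, dim=-1).mean()
```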

Page 17: Reading Between the Lines:  Object Localization Using Implicit Cues from Image Tags

Modeling P(X|T)

Example: the top 30 most likely localization parameters sampled for the object "car", given only the tags. Example tag lists: (Lamp, Car, Wheel, Wheel, Light); (Window, House, House, Car, Car, Road, House, Lightpole); (Car, Windows, Building, Man, Barrel, Car, Truck, Car); (Boulder, Car).
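Given the mixture predicted for one test image, localization hypotheses like the "car" examples above can be drawn from it. A minimal sketch follows; note that the figure shows the 30 most likely samples, whereas this simply draws random samples from the mixture.

```python
import torch

def sample_localizations(alpha, mu, sigma, n_samples=30):
    # alpha: (1, K), mu/sigma: (1, K, 3) from the MDN above; returns (n_samples, 3)
    k = torch.multinomial(alpha.squeeze(0), n_samples, replacement=True)  # pick components
    means, stds = mu.squeeze(0)[k], sigma.squeeze(0)[k]
    return means + stds * torch.randn_like(means)  # (scale, x, y) hypotheses
```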




Page 21: Reading Between the Lines:  Object Localization Using Implicit Cues from Image Tags

Integrating with the object detector

How can we exploit this learned distribution P(X|T)?

1) Use it to speed up the detection process (location priming):
   (a) Sort all candidate windows according to P(X|T) (most likely, less likely, least likely).
   (b) Run the detector only at the most probable locations and scales.
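A minimal sketch of location priming under this scheme; the `tag_prior`, `detector`, and `budget` arguments are illustrative placeholders, not the paper's API.

```python
import numpy as np

def prime_search(candidate_windows, tag_prior, detector, budget=0.3):
    # Score every candidate window under the tag-based prior P(X|T), then run the
    # appearance detector only on the top `budget` fraction of windows.
    prior_scores = np.array([tag_prior(w) for w in candidate_windows])  # density at (scale, x, y)
    order = np.argsort(-prior_scores)                                   # most probable first
    keep = order[: int(budget * len(candidate_windows))]
    return [(candidate_windows[i], detector(candidate_windows[i])) for i in keep]
```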

Page 22: Reading Between the Lines:  Object Localization Using Implicit Cues from Image Tags

Integrating with the object detector

How can we exploit this learned distribution P(X|T)?

1) Use it to speed up the detection process (location priming).
2) Use it to increase detection accuracy (modulate the detector output scores).

Example predictions from the object detector: 0.7, 0.8, 0.9. Predictions based on tag features: 0.3, 0.2, 0.9.

Page 23: Reading Between the Lines:  Object Localization Using Implicit Cues from Image Tags

Integrating with the object detector

How can we exploit this learned distribution P(X|T)?

1) Use it to speed up the detection process (location priming).
2) Use it to increase detection accuracy (modulate the detector output scores).

Modulated scores for the example windows: 0.63, 0.24, 0.18.
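The numbers on this and the previous slide are consistent with multiplying each window's detector score by its tag-based score, so here is a minimal sketch under that assumption; the pairing of scores to windows is inferred from the products.

```python
def modulate_scores(detector_scores, tag_scores):
    # Scale each detector confidence by the tag-based localization prediction
    return [round(d * t, 2) for d, t in zip(detector_scores, tag_scores)]

print(modulate_scores([0.7, 0.8, 0.9], [0.9, 0.3, 0.2]))  # [0.63, 0.24, 0.18]
```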

Page 24: Reading Between the Lines:  Object Localization Using Implicit Cues from Image Tags

Experiments: Datasets

LabelMe:
- Street and office scenes
- Contains ordered tag lists via labels added
- 5 classes
- 56 unique taggers
- 23 tags / image
- Dalal & Triggs' HOG detector

PASCAL VOC 2007:
- Flickr images
- Tag lists obtained on Mechanical Turk
- 20 classes
- 758 unique taggers
- 5.5 tags / image
- Felzenszwalb et al.'s LSVM detector

Page 25: Reading Between the Lines:  Object Localization Using Implicit Cues from Image Tags

Experiments

We evaluate detection speed and detection accuracy.

We compare the raw detector (HOG, LSVM) against the raw detector + our tag features.

We also show results when using Gist [Torralba 2003] as context, for reference.

Page 26: Reading Between the Lines:  Object Localization Using Implicit Cues from Image Tags

PASCAL: Performance evaluation

Accuracy: All 20 PASCAL Classes (precision vs. recall)
- LSVM (AP=33.69)
- LSVM+Tags (AP=36.79)
- LSVM+Gist (AP=36.28)
We know which detection hypotheses to trust most.

Speed: All 20 LabelMe Classes (portion of windows searched vs. detection rate)
- Sliding (0.223)
- Sliding+Tags (0.098)
- Sliding+Gist (0.125)
We search fewer windows to achieve the same detection rate: the naïve sliding window search covers 70% of windows, while we search only 30%.


Page 28: Reading Between the Lines:  Object Localization Using Implicit Cues from Image Tags

PASCAL: Accuracy vs Gist per class

(Bar chart: per-class AP improvement for Tags and for Gist, over pottedplant, cat, sofa, boat, motorbike, train, car, chair, tvmonitor, horse.)

Page 29: Reading Between the Lines:  Object Localization Using Implicit Cues from Image Tags

PASCAL: Example detections (LSVM+Tags (Ours) vs. LSVM alone)

Example tag lists: (Lamp, Person, Bottle, Dog, Sofa, Painting, Table); (Person, Table, Chair, Mirror, Tablecloth, Bowl, Bottle, Shelf, Painting, Food); (Car, License Plate, Building); (Car); (Car, Door, Door, Gear, Steering Wheel, Seat, Seat, Person, Person, Camera). Target objects: Bottle, Car.

Page 30: Reading Between the Lines:  Object Localization Using Implicit Cues from Image Tags

PASCAL: Example detections (LSVM+Tags (Ours) vs. LSVM alone)

Example tag lists: (Dog, Floor, Hairclip); (Dog, Dog, Dog, Person, Person, Ground, Bench, Scarf); (Person, Microphone, Light); (Horse, Person, Tree, House, Building, Ground, Hurdle, Fence). Target objects: Dog, Person.

Page 31: Reading Between the Lines:  Object Localization Using Implicit Cues from Image Tags

PASCAL: Example failure cases (LSVM+Tags (Ours) vs. LSVM alone)

Example tag lists: (Aeroplane, Sky, Building, Shadow); (Person, Person, Pole, Building, Sidewalk, Grass, Road); (Dog, Clothes, Rope, Rope, Plant, Ground, Shadow, String, Wall); (Bottle, Glass, Wine, Table).

Page 32: Reading Between the Lines:  Object Localization Using Implicit Cues from Image Tags

Results: Observations

Often our implicit features predict:
- scale well for indoor objects
- position well for outdoor objects

Gist is usually better for y position, while our tags are generally stronger for scale.

We need to have learned about target objects in a variety of examples with different contexts: visual and tag context are complementary.

Page 33: Reading Between the Lines:  Object Localization Using Implicit Cues from Image Tags

Summary

We want to learn what is implied (beyond which objects are present) by how a human provides tags for an image.

Our approach translates existing insights about human viewing behavior (attention, importance, gaze, etc.) into enhanced object detection.

Novel tag cues enable an effective localization prior, giving significant gains with state-of-the-art detectors on two datasets.

Page 34: Reading Between the Lines:  Object Localization Using Implicit Cues from Image Tags

Future work

- Joint multi-object detection
- From tags to natural language sentences
- Image retrieval applications

