
Visual Recognition with Humans in the...

Date post: 23-May-2020
Serge Belongie UC San Diego Visual Recognition with Humans in the Loop Peter Welinder Pietro Perona Steve Branson Catherine Wah Boris Babenko Florian Schroff
Page 1: Visual Recognition with Humans in the Loopjacobsschool.ucsd.edu/events/google/docs/Google01Nov2011.pdf · Serge Belongie UC San Diego Visual Recognition with Humans in the Loop Peter

Serge Belongie UC San Diego

Visual Recognition with Humans in the Loop

Peter Welinder Pietro Perona

Steve Branson Catherine Wah Boris Babenko Florian Schroff


http://www.cse.ucsd.edu


Outline

• Visipedia project overview
• Relevant Work
• Birds-200 dataset
• “Visual 20 Questions” game
• Results
• Discussion



What Is Visipedia?

http://en.wikipedia.org/wiki/Bird

• The visual counterpart to Wikipedia
• A user-generated encyclopedia of visual knowledge
• An effort to associate articles with large quantities of well-organized, intuitive visual concepts


Motivation

• People will willingly label or organize certain images if:
  – They are interested in a particular subject matter
  – They have the appropriate expertise

Ring-tailed lemur Thruxton Jackaroo


Motivation

• Construct a more comprehensive and intuitive knowledge base of visual objects
• Provide services like better text-to-image search and image-to-article search


Populating Visipedia

• Populate Wikipedia articles with more visual data using large quantities of unlabeled data on the web

World wide web Visipedia



Related Work: Systems

• Botanist’s Field Guide [Belhumeur et al. ’08]
• Oxford Flowers [Nilsback & Zisserman ’08]
• STONEFLY9 [Martínez-Muñoz et al. ’09]
• omoby [IQEngines.com ’10]
• 20 Questions game [20q.net]
• ReCAPTCHA [von Ahn et al. ’08]
• Wikimedia Commons


Related Work: Methods

• Relevance Feedback
• Active Learning
• Expert Systems
• Decision Trees
• Feature Sharing & Taxonomies
• Parts & Attributes
• Crowdsourcing & Human Computation


Attribute-Based Classification

• Train classifiers on attributes instead of objects
• Attributes are shared by different object classes
• Attributes provide the ingredients necessary to recognize each object class

Lampert et al. 2009; Farhadi et al. 2009


Wikimedia Commons

• Multiple ways of organizing sub-categories and visual information
• Sub-categories or clusters are represented by some exemplar image

http://commons.wikimedia.org/wiki/Dog


Motivation (Computer Vision Perspective)

• Need for more training data
  – Beyond the capacity of any one research group
  – Better quality control
• Need for more realistic data
  – Let people define what tasks are important
  – Study tightly-related categories


Dealing With a Large Number of Related Classes

• Standard classification methods fail because:
  – Only a small number of training examples per class is available
  – Variation between classes is small
  – Variation within a class is often still high

Brewer’s Sparrow Vesper Sparrow


Birds-200 Dataset

6033 images over 200 bird species


Image Harvesting

• Flickr: text search on species name
• MTurk: presence/absence and bounding boxes


Attribute Labeling

• attributes from whatbird.com
• 25 visual attributes -> 288 binary attributes
  – similar to “dichotomous key” in biology
• MTurk interface
  – {guessing, probably, definitely}
• 5x redundancy factor
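The labeling scheme on this slide (multi-valued attributes expanded into binary ones, three certainty levels, five workers per image) can be illustrated with a minimal sketch. The slides do not say how the redundant labels were combined, so the certainty weights, the attribute name, and the weighted-vote rule below are all assumptions made for illustration.

```python
# Hypothetical weights for the three certainty levels named on the slide.
CERTAINTY_WEIGHT = {"guessing": 0.5, "probably": 0.75, "definitely": 1.0}


def expand_to_binary(attribute, values):
    """Turn one multi-valued attribute into one binary attribute per value."""
    return [f"{attribute}::{value}" for value in values]


def aggregate(responses):
    """Combine redundant (answer, certainty) labels for one binary attribute.

    Returns the certainty-weighted fraction of "yes" votes, in [0, 1].
    """
    total = sum(CERTAINTY_WEIGHT[certainty] for _, certainty in responses)
    yes = sum(CERTAINTY_WEIGHT[certainty] for answer, certainty in responses if answer)
    return yes / total


# Illustrative attribute and values (not taken from the dataset).
binary_names = expand_to_binary("belly_color", ["yellow", "black", "white"])

# Five workers label "belly_color::yellow" for the same image (5x redundancy).
responses = [
    (True, "definitely"),
    (True, "probably"),
    (False, "guessing"),
    (True, "definitely"),
    (False, "probably"),
]
score = aggregate(responses)
```

A soft score like this keeps the disagreement between workers visible instead of collapsing it into a single hard label.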


Attribute-Based Classification

• Number of attributes is less than number of classes
• Attribute classification tasks might be easier
• Makes it easier to incorporate human knowledge

www.whatbird.com


MTurker Label Certainty


MTurker Feedback

• “These hits were fun. Will you be posting more of them anytime soon? Thanks!”
• “These are Beautiful birds and I am enjoying this hit collection”
• “I really enjoy doing your hits, they are fun and interesting. Thanks.”
• “Love doing these because I'm a bird watcher.”
• “the birds are so cute..hope u can send more kind of birds”
• “I haven't really studied birds, but doing these HITs has made me realize just how beautiful they are. It has also made me aware of the many different types of birds. Thank you”
• “I REALLY LOVE THE COLOR OF THE BIRDS.”
• “Thank you for providing this job. The fact that the images are beautiful to look at make it a lot more enjoyable to do!”
• “Enjoyable to do.”
• Hourly Wage ≈ $1.25


Visual 20 Questions

• “Computer Vision” module = Vedaldi’s VLFeat
• VQ Geometric Blur, color/gray SIFT spatial pyramid
• Multiple Kernel Learning
• Per-Class 1-vs-All SVM
• 15 training examples per bird species
• Choose question to maximize expected Information Gain
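The question-selection rule named in the last bullet can be sketched with the standard definition of expected information gain: the entropy of the current class distribution minus the expected entropy after the answer. The slides do not give the formula used, so this is an illustration rather than the authors' implementation. The class prior would come from the computer vision module; here it is a hard-coded uniform distribution, and the question names and answer probabilities are made up.

```python
import math


def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def expected_information_gain(prior, p_yes_given_class):
    """prior[c] = p(class c); p_yes_given_class[c] = p(answer is yes | class c)."""
    p_yes = sum(pc * py for pc, py in zip(prior, p_yes_given_class))
    p_no_given_class = [1 - py for py in p_yes_given_class]
    gain = entropy(prior)
    for p_answer, likelihood in ((p_yes, p_yes_given_class), (1 - p_yes, p_no_given_class)):
        if p_answer > 0:
            posterior = [pc * l / p_answer for pc, l in zip(prior, likelihood)]
            gain -= p_answer * entropy(posterior)
    return gain


def choose_question(prior, questions):
    """Pick the question whose answer is expected to reduce class entropy most."""
    return max(questions, key=lambda q: expected_information_gain(prior, questions[q]))


# Four hypothetical classes with a uniform prior (2 bits of uncertainty).
prior = [0.25, 0.25, 0.25, 0.25]
questions = {
    "has_yellow_belly": [1.0, 1.0, 0.0, 0.0],  # splits the classes in half
    "has_black_head": [1.0, 0.0, 0.0, 0.0],    # isolates a single class
    "is_a_bird": [1.0, 1.0, 1.0, 1.0],         # answer is already known
}
best = choose_question(prior, questions)
```

With a uniform prior the even split wins. When the prior already concentrates on a few classes, the questions that separate those classes score highest instead.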


General Observations

• User Responses are Stochastic
• Computer Vision Reduces Manual Labor
• User Responses Drive Up Performance
• Computer Vision Improves Overall Performance
• Different Questions are Asked w/ and w/o Computer Vision
• Recognition is not Always Successful


w/o Computer Vision

[Plot: Percent Classified Correctly (0 to 1) vs. Number of Binary Questions Asked (0 to 60). Curves: Deterministic Users, MTurk Users, MTurk Users + Model]

• User Responses are Stochastic
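One way to see why stochastic responses matter is to compare a deterministic user model with a noisy one in a Bayesian update of the class posterior. The sketch below is an assumption-laden illustration, not the model used in the talk: the two classes and the answer probabilities are invented.

```python
def update_posterior(prior, p_yes_given_class, answer):
    """Bayes update of the class distribution after one yes/no answer."""
    likelihood = [py if answer else 1 - py for py in p_yes_given_class]
    unnormalized = [pc * l for pc, l in zip(prior, likelihood)]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]


prior = [0.5, 0.5]  # two hypothetical species

# Deterministic user model: class 0 always answers "yes", class 1 never does.
deterministic = update_posterior(prior, [1.0, 0.0], answer=True)

# Noisy user model: each class gives the "expected" answer 80% of the time.
noisy = update_posterior(prior, [0.8, 0.2], answer=True)
```

Under the deterministic model a single mistaken answer eliminates the true class for good. The noisy model keeps some probability mass on it, so later answers can still recover.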


w/ Computer Vision

[Plot: Percent Classified Correctly (0 to 0.7) vs. Number of Binary Questions Asked (0 to 50). Curves: No CV, 1-vs-all, Attribute]

• Computer Vision Reduces Manual Labor


w/ Computer Vision (cont’d)

[Plot: Percent of Testset Images (0 to 0.2) vs. Number of Binary Questions Asked (0 to 16). Legend: No CV (11.11), 1-vs-all (6.64), Attribute (6.43)]

• User Responses Drive Up Performance


• Computer Vision Improves Overall Performance
• Different Questions are Asked w/ and w/o Computer Vision


• Recognition is not Always Successful


Indigo Bunting Blue Grosbeak


Future Work

• More Birds! More Categories!
• Attribute Induction
• Incorporate Part Localization
• Partner with Wikimedia Foundation


Project Website

• Database, harvesting software, etc.
  – http://visipedia.org


• (extra slides follow)


Part Labels

• Part diagrams give some indication of the spatial configuration of parts, but people will do this only for a small number of images


Object Localization and Shared Parts

• Training a classifier with latent variables (Dollar et al. 2008, Felzenszwalb et al. 2008)

• Latent variables are things like the pose and location of parts

• Objects in related domains share the same types of parts and poses


Shared Parts and Attributes

[Diagram labels: Pine Warbler, Cape May Warbler, Kentucky Warbler, Hornet. Attribute and Part Detectors: Yellow, Beak, Black, Striped, Belly]


Object Localization and Shared Parts

• Train part and attribute classifiers from class descriptions:
  – Part locations $z_i^{\text{belly}}$, $z_i^{\text{head}}$, $z_i^{\text{beak}}$ in image $x_i$ are latent variables

Class descriptions (in slide order):
Pine Warbler: Belly: solid, yellow; Head: yellow; Beak: all-purpose
Cape May Warbler: Belly: striped, yellow, black; Head: black; Beak: all-purpose
Kentucky Warbler: Belly: solid, yellow; Head: black; Beak: all-purpose


Object Localization and Shared Parts

• Training examples for each part/attribute span across different bird classes
  – For each Cape May Warbler image $x_i$

Class descriptions (in slide order):
Pine Warbler: Belly: solid, yellow; Head: yellow; Beak: all-purpose
Cape May Warbler: Belly: striped, yellow, black; Head: black; Beak: all-purpose
Kentucky Warbler: Belly: solid, yellow; Head: black; Beak: all-purpose
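The idea that training examples for a part/attribute span several classes can be made concrete with a small sketch. It assumes the three descriptions on the slide are listed in the same order as the three class names. The image identifiers and the helper `positives_for` are hypothetical, and part localization (the latent variables) is left out entirely.

```python
# Class descriptions as read from the slide; the class-to-description
# pairing is an assumption based on the order in which they appear.
CLASS_DESCRIPTIONS = {
    "Pine Warbler": {"belly": {"solid", "yellow"}, "head": {"yellow"}, "beak": {"all-purpose"}},
    "Cape May Warbler": {"belly": {"striped", "yellow", "black"}, "head": {"black"}, "beak": {"all-purpose"}},
    "Kentucky Warbler": {"belly": {"solid", "yellow"}, "head": {"black"}, "beak": {"all-purpose"}},
}


def positives_for(part, value, images):
    """Return ids of images whose class description has `value` for `part`.

    images: list of (image_id, class_name) pairs with class labels only.
    """
    return [image_id for image_id, cls in images if value in CLASS_DESCRIPTIONS[cls][part]]


# Hypothetical labeled images, one per class.
images = [
    ("img1", "Pine Warbler"),
    ("img2", "Cape May Warbler"),
    ("img3", "Kentucky Warbler"),
]

yellow_belly = positives_for("belly", "yellow", images)    # pooled from all three classes
black_head = positives_for("head", "black", images)        # pooled from two classes
striped_belly = positives_for("belly", "striped", images)  # only one class
```

A "yellow belly" detector thus gets positives from every warbler class, which helps when each class on its own has few training images.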


Objects are More than Class Labels

• Represent objects as parts and attributes
• Model relationships between classes
• Pool training examples from different object classes
• Define building blocks useful to detect new object classes

[Diagram labels: Pine Warbler, Cape May Warbler, Kentucky Warbler, Hornet. Attribute and Part Detectors: Yellow, Beak, Black, Striped, Belly]


Classification Using Multiple Pathways

• Arrange recognition tasks into multiple “pathways”

[Diagram labels: Image; Indoor vs. Outdoor; Graphic vs. Real Image]
[Bird Pathway: Bird Detector; Species Detectors; Parts, Pose, Attributes]
[Face Pathway: Face Detector; Face Recognition; Parts, Attributes]
[Text Pathway: Text Detector; Text Reading]


Classification Using Multiple Pathways

• Place redundant calculations in earlier pathways
• Transfer information from easier tasks to harder ones
• Cascade classification tasks to avoid unnecessary computations


Classification Using Multiple Pathways

• Pathway components:
  – A domain: a set of object classes that often have similar parts or attributes
  – Takes as input an image, information extracted from earlier pathways
  – Algorithms useful for extracting attributes and information relevant to the domain
  – Outputs estimated attributes and part locations
  – Invokes other pathways as necessary
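The pathway components listed above suggest a simple interface, sketched below under loud assumptions: the `Pathway` class, its fields, and the toy "detectors" (which just read precomputed fields from a dictionary standing in for an image) are all hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# Outputs of earlier pathways, keyed by pathway name.
Context = Dict[str, Dict[str, Any]]


@dataclass
class Pathway:
    name: str
    # Domain-specific extractor: (image, earlier outputs) -> estimated attributes.
    extract: Callable[[Dict[str, Any], Context], Dict[str, Any]]
    # Follow-up pathways, each guarded by a predicate on this pathway's output.
    children: List[Tuple[Callable[[Dict[str, Any]], bool], "Pathway"]] = field(default_factory=list)

    def run(self, image, context=None):
        """Run this pathway, then invoke child pathways only where needed."""
        context = dict(context or {})
        output = self.extract(image, context)
        context[self.name] = output
        for should_invoke, child in self.children:
            if should_invoke(output):
                context = child.run(image, context)
        return context


# Toy stand-ins for real detectors.
species = Pathway("species", lambda image, ctx: {"species": image["species_hint"]})
bird = Pathway(
    "bird",
    lambda image, ctx: {"is_bird": image["has_bird"]},
    children=[(lambda out: out["is_bird"], species)],
)

result_bird = bird.run({"has_bird": True, "species_hint": "Pine Warbler"})
result_other = bird.run({"has_bird": False, "species_hint": None})
```

The guard on each child is what produces the cascade: the species pathway never runs on an image the bird detector rejects.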


Clustering and Near Duplicate Detection

Raw Beef Cooked Beef Cow Diagrams

• Improve presentation of data by suppressing duplicate, redundant, or similar images


Clustering and Near Duplicate Detection

• Use similarity metrics in different feature spaces (e.g. bag of words, color histogram, GIST) together with standard methods for clustering and near-duplicate detection
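As a minimal sketch of near-duplicate suppression in one feature space, the code below compares L1-normalized color histograms by histogram intersection and greedily keeps only images that are not too similar to one already kept. The histograms, image names, and the 0.9 threshold are invented for illustration. The slide does not say which similarity measure or clustering method was used.

```python
def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two L1-normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def suppress_near_duplicates(histograms, threshold=0.9):
    """Greedily keep an image only if it is dissimilar to every kept image."""
    kept = []
    for name, hist in histograms.items():
        if all(histogram_intersection(hist, histograms[k]) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy 3-bin color histograms (hypothetical values).
histograms = {
    "raw_beef_1": [0.70, 0.20, 0.10],
    "raw_beef_2": [0.68, 0.22, 0.10],  # nearly identical to raw_beef_1
    "cow_diagram": [0.10, 0.10, 0.80],
}
kept = suppress_near_duplicates(histograms)
```

The same loop works with any similarity function, so a bag-of-words or GIST distance could be swapped in without changing the suppression logic.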


Image Registration

• Bring unlabeled images into correspondence with a labeled one using some matching function, e.g. an affine or perspective transformation or shape matching
• Transfer labels from labeled images to unlabeled ones
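For the affine case, label transfer amounts to mapping each labeled point through the estimated transformation. The sketch below assumes the affine parameters have already been estimated by some matching step (not shown). The part names, coordinates, and transform values are hypothetical.

```python
def apply_affine(A, t, point):
    """Map a 2D point through x' = A x + t."""
    x, y = point
    return (A[0][0] * x + A[0][1] * y + t[0],
            A[1][0] * x + A[1][1] * y + t[1])


def transfer_labels(labels, A, t):
    """Carry part locations from the labeled image into the unlabeled one."""
    return {part: apply_affine(A, t, location) for part, location in labels.items()}


# Part locations annotated on the labeled image (hypothetical pixel coordinates).
labels = {"beak": (3, 4), "belly": (6, 8)}

# Assumed registration result: scale by 2, then shift by (10, 5).
A = [[2, 0], [0, 2]]
t = (10, 5)

transferred = transfer_labels(labels, A, t)
```

A perspective transformation or a shape-matching correspondence would replace `apply_affine` while leaving the transfer step unchanged.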


Visual Knowledge

• Associate categories with predictions of which visual attributes are most representative

Sacred Ibis, Glossy Ibis

Curved Beak


Interactive Labeling Systems

• Speed up the population of image examples for some category:
  – Active learning: intelligently query labeling tasks while incrementally training a category classifier
  – Relevance feedback: use labeled images to re-rank unlabeled images by their relevance to a category
• Semi-supervised segmentation methods, e.g. GrabCut
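One common way to "intelligently query labeling tasks" is uncertainty sampling: ask a human about the image on which the current classifier is least sure. This is only one active-learning strategy, and the slide does not say which was intended. The sketch below uses invented classifier scores. Re-ranking by score is shown as a crude stand-in for relevance feedback.

```python
def most_uncertain(scores):
    """Pick the image whose estimated p(in category) is closest to 0.5."""
    return min(scores, key=lambda image_id: abs(scores[image_id] - 0.5))


def rank_by_relevance(scores):
    """Order unlabeled images from most to least likely to be in the category."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical classifier outputs for three unlabeled images.
scores = {"a": 0.95, "b": 0.55, "c": 0.10}

query = most_uncertain(scores)      # next image to send to a human labeler
ranking = rank_by_relevance(scores)
```

After the human labels the queried image, the classifier would be retrained and the scores recomputed before the next query.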


Combining Knowledge From Text and Images

• Leverage article text and link structure in Wikipedia articles


Connecting Knowledge Between Article Text and Images

• Use article text and link structure to predict class attributes, taxonomical structure, and object parts


Connecting Knowledge Between Article Text and Images

• Add “links” between images and article text

