A Alternate Algorithms For Combining Computer Vision and User Input

In Section 3.1 of the main paper, we introduced a simple and computationally efficient algorithm for combining object recognition algorithms with user input. In this algorithm, estimators for per-class probabilities p(c|x) can be trained offline, and user input U is combined at runtime to estimate p(c|x, U). In this section, we consider algorithms that instead directly estimate p(c|x, U). These algorithms are more computationally expensive, but could be regarded as an upper bound for realistically achievable performance. Suppose we have some multi-class learning algorithm that takes as input a dataset D = {(x_1, y_1), ..., (x_n, y_n)} of image-class training pairs and is capable of producing probabilistic output p(c|x). Then p(c|x, U) could be estimated by dynamically retraining on a refined dataset D_U, which contains only the training images consistent with user response U. Although this procedure is in general computationally intractable, we note that for some learning methods retraining may not actually be necessary; for example, when using a learning algorithm based on 1-vs-1 classifiers for all class pairs, a user response that eliminates some set of classes does not require retraining.
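
For concreteness, the runtime combination used by the offline approach can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name and the fallback for a response that contradicts every class are our own assumptions, and we assume the user response enters through a per-class likelihood p(U|c).

```python
import numpy as np

def combine_with_user_input(p_c_given_x, p_u_given_c):
    """Estimate p(c|x, U) from offline per-class probabilities p(c|x)
    and a likelihood p(U|c) of the observed user response U."""
    unnormalized = p_u_given_c * p_c_given_x  # Bayes' rule, up to a constant
    total = unnormalized.sum()
    if total == 0:
        # Hypothetical fallback: the response ruled out every class,
        # so revert to the vision-only estimate.
        return p_c_given_x
    return unnormalized / total
```

Because p(c|x) is fixed after offline training, the combination costs only a per-class multiplication and a renormalization at runtime, which is what makes this approach cheap relative to retraining.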

Our empirical findings suggest that algorithms based on retraining are not worth their added computational cost. Fig. 1 shows two plots comparing retraining 1-vs-all classifiers on Birds-200 to pre-training 1-vs-all classifiers offline. In the retraining case, we retrain on multiple random class subsets of different sizes and evaluate performance on the same class subsets. In the offline case, training always occurs over all classes; however, at runtime only the appropriate classifiers are used. The experiment corresponds to the case where the user input U eliminates some set of classes from plausibility. We see that both methods get similar results. Surprisingly, retraining does not improve performance at all, and actually hurts performance by 3-4% when the number of classes is less than 50. This probably occurs because the classifiers trained offline have greater generalizability: each 1-vs-all classifier is trained with the same set of positive examples, but also with additional negative examples not used in the retraining-based method. We also experimented with a retraining algorithm based on 1-vs-1 classifiers. We omitted these experiments from our results because the 1-vs-1 method achieved lower base performance than the 1-vs-all method.
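
The "offline" condition in this comparison amounts to masking out the eliminated classifiers at test time. A minimal sketch (with hypothetical names; the paper does not specify this code) is:

```python
import numpy as np

def classify_within_subset(scores, plausible_classes):
    """Apply pre-trained 1-vs-all classifiers restricted to a class subset.

    scores           : (C,) decision values of the C classifiers trained
                       offline on all classes.
    plausible_classes: indices of the K classes not eliminated by the
                       user response U.
    """
    subset = np.asarray(plausible_classes)
    # Predict only among the plausible classes; no retraining required.
    return subset[np.argmax(scores[subset])]
```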


[Plot: Average Class Accuracy vs. Number of Classes (K), comparing "Training Once on All 200 Classes" against "Retraining on Each K-class Subset".]

Fig. 1. No benefit of retraining: We select many random class subsets of size K and compare the performance of retraining 1-vs-all classifiers on each subset versus applying a classifier trained offline on all 200 classes to the appropriate classes. Performance is comparable, and the pre-trained classifier outperforms the retrained classifiers by 3-4% when K < 50. The results suggest that, for this type of classifier, we are unlikely to benefit much from retraining when receiving user-supplied information that eliminates some classes from plausibility.


B User Interface

Fig. 2. User-interface Example 1: The interface shown was used to collect data from Mechanical Turk. The figure shows two different attribute questions, one for beak shape and one for underparts color. In each case, the test image is shown on the left and the question on the right. For each possible answer, a clickable prototypical example is shown. When the question pertains to a particular part, a diagram is used to visually define the part (as shown in the image for underparts color). In the underparts color image, the user is allowed to select multiple colors.


Fig. 3. User-interface Example 2: The interface shown was used to collect data for the head pattern attribute from Mechanical Turk. Although the average person might be unfamiliar with many of the selectable attributes, hovering the mouse over any option shows additional prototypical examples and instructions.


C Animals With Attributes Dataset

Animals With Attributes (AwA) is a dataset of 50 animal classes such as zebras, pandas, and dolphins. Each class is associated with soft labels for 85 binary attributes, based on posing class-level attribute questions to multiple people, effectively encoding a distribution p(u|c). We simulate test performance by randomly selecting a question response based on p(u|c). The dataset also includes hard class-attributes, which were obtained by thresholding the soft labels. While the dataset is not exactly aligned with our goal of recognizing finer-grained categories, it is the most established dataset with the types of annotations required for our application. The dataset is difficult due to large intraclass variation and unaligned images. The results of our experiments using simulated user responses are shown in Fig. 9 of the main paper; results using hard per-class attributes are shown in Fig. 4.
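
A minimal sketch of this simulation, assuming the soft labels are stored as per-class probabilities of a "yes" answer (the function name and seeding are our own):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducible simulations

def simulate_user_response(p_yes_given_c):
    """Sample a binary attribute answer for a class from the AwA soft
    label p(u=1|c), i.e., the fraction of annotators answering 'yes'."""
    return bool(rng.random() < p_yes_given_c)
```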

[Plots for Fig. 4. Left: Percent Classified Correctly vs. Number of Binary Questions Asked (curves: No CV, 1-vs-all, Attribute). Right: Percent of Test Set Images vs. Number of Binary Questions Asked (curves: No CV (3.59), 1-vs-all (1.93), Attribute (2.15)).]

Fig. 4. Performance on Animals With Attributes: In contrast to Fig. 9 of the main paper, which showed simulated results on soft (probabilistic) per-class attributes, performance here is shown on the hard (deterministic) per-class attributes from Lampert et al. Left plot: Classification performance (Method 1) as the number of question responses increases. Right plot: The required number of questions needed to identify the true class (Method 2) drops from 3.59 to 1.93 on average when incorporating computer vision. As can be seen in this figure and Fig. 9, the Animals With Attributes experiments show results similar to Birds-200 in terms of the shape of the curves, the benefit of computer vision in reducing the average number of questions asked, and the effects of stochastic user responses.


D Birds-200: Dataset Statistics

Fig. 5. Attribute Class Counts: Each bar shows the number of bird species that have a particular attribute, according to whatbird.com. The average number of classes per attribute is 41.4 (median 30).

Fig. 6. Class Attribute Counts: Each bar shows the raw number of attributes present for a particular class, according to whatbird.com. The average number of attributes per class is 26.0.


Fig. 7. Class-Attribute Similarities: The matrix shows the normalized Hamming distance (percentage of bits that differ) between each pair of classes when each class is represented by a binary vector encoding attribute presence (according to whatbird.com). Each row and column corresponds to a bird species. We see clusters of bird species with very similar attribute vectors. The largest clusters correspond to warblers (153-184) and sparrows (113-135, Fig. 8 row 1), which are small, perching-type birds that are mostly brown. There are also smaller clusters of highly related classes, corresponding to terns (143-147, Fig. 8 row 2), gulls (58-65, Fig. 8 row 3), and blackbirds (9-13). We also see that blackbirds often have similar attributes to cowbirds and cormorants (which are also usually black).
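
The distance matrix itself is straightforward to reproduce; a sketch, assuming a binary class-attribute matrix A with one row per class:

```python
import numpy as np

def normalized_hamming_matrix(A):
    """Pairwise normalized Hamming distances between class attribute vectors.

    A : binary array of shape (C, M); A[c, m] = 1 if class c has attribute m.
    Returns a (C, C) matrix of fractions of differing bits.
    """
    A = A.astype(bool)
    # XOR marks attribute positions where exactly one of the two classes
    # has the attribute; summing over attributes counts differing bits.
    diff = (A[:, None, :] ^ A[None, :, :]).sum(axis=2)
    return diff / A.shape[1]
```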


Row 1: Baird’s Sparrow, Brewer’s Sparrow, Grasshopper Sparrow, Henslow’s Sparrow, Lincoln’s Sparrow, Clay-colored Sparrow

Row 2: Arctic Tern, Caspian Tern, Common Tern, Elegant Tern, Forster’s Tern, Least Tern

Row 3: California Gull, Glaucous-winged Gull, Heermann’s Gull, Herring Gull, Ring-billed Gull, Slaty-backed Gull

Fig. 8. Clusters of related species: Each row shows a group of 6 different species of sparrows, terns, and gulls that are visually similar and sometimes confused by our system.


E Birds-200: User Responses

Fig. 9. Certainty responses for different attribute questions: The plot shows the relative proportion of Guessing, Probably, and Definitely responses for different attribute questions. Attributes that tend to be answered with the greatest certainty, such as has primary color, has belly color, and has belly pattern, correspond to parts that are prominently visible and unambiguous. Attributes answered with a higher proportion of Guessing responses include attributes that are more ambiguous, like has wing shape (which is difficult to determine unless the wings are open) and has size (which can only be determined from contextual information or prior knowledge), and attributes of parts that are often not visible, like has leg color and has back color.


Fig. 10. Certainty responses as the number of questions increases: The plot shows the relative proportion of Guessing, Probably, and Definitely responses as a function of the number of questions needed to classify each image (Method 2). We see that the relative proportion of Guessing responses tends to increase steadily as the required number of questions increases from 0 to 15. After 10 questions, the results are more erratic due to smaller sample sizes (there are often not many test images that took more than 10 questions).


F Birds-200: Additional Results

Fig. 11. Number of questions required for each class: The plot shows a CDF of the number of questions required to correctly classify images of each bird class (Method 2) using MTurk responses and computer vision. The classes that took the fewest questions, such as Ivory Gull and Painted Bunting, tend to be classes that are visually distinct and have attributes that are reliably identified by MTurkers. The classes that took more questions, such as Bank Swallow and Chuck-will's-widow, tend to be classes that are either visually similar to other classes or have attributes that MTurkers answer erratically.


Fig. 12. Number of questions required for each attribute: The plot shows a CDF of the number of questions required to correctly classify images with a particular attribute (Method 2) using MTurk responses and computer vision. Because certain attributes occur more frequently among the bird classes than others, the CDF values for each attribute are normalized by the number of classes with that attribute. The attributes that took the fewest questions, including has upperparts color white and has primary color white, are generally color-based attributes, which are visually less ambiguous than shape-based attributes. The attributes that took more questions, such as has leg color brown and has tail shape forked tail, tend to be attributes that are often not visible in the image.


G Birds-200: Computer Vision Results

Feature                         Base Accuracy
Global Color Hist               10.94%
Color Hist + Spatial Pyr        11.13%
SIFT + Spatial Pyr              20.35%
HSV SIFT + Spatial Pyr          21.62%
Self-Similarity + Spatial Pyr   11.97%
All Features Combined           27.77%

Fig. 13. Features for computer vision algorithms: Each entry in the table shows the base classification accuracy when using computer vision without any user feedback, for different types of features. All features except the color histogram methods are computed using Andrea Vedaldi's source code. The tag "Spatial Pyr" indicates that the features are vector-quantized and histogrammed over multiple spatial pyramid levels and then used in χ² kernels. The color histograms are computed in HSV space. Two variants of SIFT are shown, one operating on the grayscale images and another operating on each channel of the HSV image. The best performance of any single feature type is achieved by HSV SIFT (21.62%); however, incorporating color and self-similarity features in a multiple kernel learning framework boosts performance to 27.77%.
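
For reference, the exponentiated χ² kernel on histogram features has a standard form; the sketch below shows it together with a fixed-weight kernel combination. The weighting here is a simple stand-in for illustration, not the learned multiple kernel learning weights.

```python
import numpy as np

def chi2_kernel(X, Y, gamma=1.0):
    """Exponentiated chi-squared kernel between histogram rows of X and Y:
    K(x, y) = exp(-gamma * sum_i (x_i - y_i)^2 / (x_i + y_i))."""
    num = (X[:, None, :] - Y[None, :, :]) ** 2
    den = X[:, None, :] + Y[None, :, :]
    # Treat 0/0 bins as contributing zero distance.
    dist = np.where(den > 0, num / np.maximum(den, 1e-12), 0.0).sum(axis=2)
    return np.exp(-gamma * dist)

def combined_kernel(features_X, features_Y, weights):
    """Weighted sum of per-feature chi-squared kernels."""
    return sum(w * chi2_kernel(X, Y)
               for w, (X, Y) in zip(weights, zip(features_X, features_Y)))
```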


Fig. 14. Confusion matrix using only computer vision: The confusion matrix on Birds-200 using a 1-vs-all SVM, without any user feedback. Results were obtained with 15 training examples per class, using the features described in Fig. 13.


H Birds-200: Additional Example Image Results

Magnolia Warbler | Common Yellowthroat

Fig. 15. Example where user input helps: The left image shows a test image of a Magnolia Warbler. Using just computer vision, the image is classified incorrectly as a Common Yellowthroat. After the user responded no to the question HasBreastPatternSolid, the image is classified correctly.


Hooded Oriole | Baltimore Oriole

Fig. 16. Example where user input helps: The left image shows a test image of a Hooded Oriole. Using just computer vision, the image is classified incorrectly as a Baltimore Oriole. After the user responded yes to the question HasUnderpartsColorYellow, the image is classified correctly.

Summer Tanager | Cardinal

Fig. 17. Example where user input helps: The left image shows a test image of a Summer Tanager. Using just computer vision, the image is classified incorrectly as a Cardinal. After the user responded yes to the question HasHeadPatternPlain, the image is classified correctly. Note that HasHeadPatternPlain is not commonly selected as a first question; it was selected because it was a good discriminating feature between the classes that were highly ranked by computer vision.
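
The selection behavior described here is consistent with choosing, at each step, the question that maximizes expected information gain under the current class posterior. The exact criterion is specified in the main paper, not here, so the following is only a sketch under that assumption:

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def next_question(p_c, p_yes_given_c):
    """Pick the binary question with maximum expected information gain.

    p_c           : (C,) current class posterior p(c|x, responses so far).
    p_yes_given_c : (Q, C) probability of a 'yes' answer per question/class.
    """
    h_now = entropy(p_c)
    gains = []
    for q in range(p_yes_given_c.shape[0]):
        p_yes = float((p_yes_given_c[q] * p_c).sum())  # marginal p(u=yes)
        h_next = 0.0
        if p_yes > 0:
            h_next += p_yes * entropy(p_yes_given_c[q] * p_c / p_yes)
        if p_yes < 1:
            h_next += (1 - p_yes) * entropy((1 - p_yes_given_c[q]) * p_c / (1 - p_yes))
        gains.append(h_now - h_next)
    return int(np.argmax(gains))
```

Under such a criterion, a question like HasHeadPatternPlain can become maximally informative once the posterior concentrates on the classes ranked highly by computer vision, even if it is rarely informative a priori.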


Acadian Flycatcher | Least Flycatcher

Fig. 18. Failure case due to highly similar classes: The Acadian Flycatcher and Least Flycatcher are two commonly confused species. All 3 of the Acadian Flycatcher images shown are misclassified as Least Flycatchers.

Indigo Bunting | Blue Grosbeak

Fig. 19. Failure due to missing attributes: The Indigo Bunting test image is misclassified as a Blue Grosbeak, despite the fact that computer vision by itself correctly classifies the image. The two birds are similar in terms of the attributes we are using. They have different shades of blue, and the Blue Grosbeak has brown wingbars, but these are attributes that are missing from our vocabulary.


Groove-billed Ani | Bronzed Cowbird

Has Undertail Color Blue? yes (definitely)

Fig. 20. Example failure case due to a bad user response and missing attributes: The test image of the Groove-billed Ani is misclassified as a Bronzed Cowbird. The misclassification is in part due to the user response that the undertail is definitely blue; some of the Bronzed Cowbird training images have a bluish sheen, whereas none of the Groove-billed Ani images are blue. Although the Groove-billed Ani has some visually distinguishing features, such as a more textured feather pattern and an unusually wide beak, these attributes were not present in our attribute vocabulary.

Cedar Waxwing

Fig. 21. A missing attribute: Although the above image was correctly classified, a distinguishing feature of the Cedar Waxwing is the red wing marking, an attribute that we did not include. Including that attribute could potentially have allowed the image to be classified correctly using fewer questions.

