The Role of Features, Algorithms and Data in Visual Recognition
Reporter: WuBin
Date: 2011.04.29

Outline

  Authors
  Abstract
  Problem
  Experiment & Result
  Discussion
  Conclusion

Authors (Devi Parikh)

  Research Assistant Professor at Toyota Technological Institute (TTI) at Chicago

  Research interests: computer vision, machine learning, pattern recognition

  Education
    Ph.D.: Carnegie Mellon University (2009)
    M.S.: Carnegie Mellon University (2007)
    B.S.: Rowan University (2005)

  Publications: CVPR’11(2), CVPR’10(1), CVPR’09(1), CVPR’08(1), ECCV’08(1), ICCV’07(1), CVPR’07(Best Paper)…

Cornell University (2009)

Microsoft Research (twice)

Intel Research (once)

Authors (C. Lawrence Zitnick)

  Microsoft Research, Redmond

  Research interests: Color Image Single Image Stereo Matching

  Education: PhD student, Robotics Institute, Carnegie Mellon University

  Publications: CVPR’10(2), ACM Trans. Graph(2), ECCV’10(2), CVPR’09(2), CVPR’08(2), ECCV’08(2), PAMI’08(1), IJCV’07(1), CVPR’06(1), ECCV’06(1), ICCV’05(1), PAMI’00(1)…

Abstract

Which factor is critical to humans’ superior performance on visual (scene and object) recognition?
  Learning algorithm
  Amount of training data
  Feature representations

In this work we take a small step towards this goal by performing a series of human studies and machine experiments.

We find no evidence that human pattern matching algorithms are better than standard machine learning algorithms.

We find that humans don’t leverage increased amounts of training data.

Through statistical analysis on the machine experiments and supporting human studies, we find that the main factor impacting accuracies is the choice of features.

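Abstract (translated from the Chinese): Many computer vision algorithms exist for visual recognition of scenes and objects. To improve recognition, some systems focus on sophisticated learning algorithms, some exploit large amounts of training data, and others model more effective features. Unfortunately, all of these systems still fall far short of human recognition ability. If we understood how humans respond on visual recognition tasks, we would gain deeper insight into the effectiveness of these three approaches and discover what actually gives humans their superior recognition ability.

This paper takes a small step in that direction through a series of experiments on human learning and machine learning. We find no evidence that human learning algorithms are better than standard machine learning algorithms. In addition, humans do not rely on large increases in training samples to improve recognition. Based on the experiments in this paper, statistical analysis shows that the most important factor affecting recognition accuracy is the choice of features.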

Problem

What makes humans so much better at these tasks than today’s machines?

Experiment

Machine experiments
  Classifiers
  Datasets
  Feature types
  Dimensionality
  Proportion of noisy features
  Number of training instances

Human studies

Machine experiments (1)

Classifiers
  NN: nearest neighbor
  NCM: nearest class-mean
  LSVM: linear SVM
  QSVM: SVM with a quadratic polynomial kernel
  CSVM: SVM with a cubic polynomial kernel
  RBFSVM: SVM with a Radial Basis Function (RBF) kernel
  DT: decision tree
  NET: a multi-layer perceptron neural network with 1 hidden layer and 20 hidden layer nodes
  BOOST: boosting with linear SVM on individual features as the simple learners
  LDASVM: Principal Component Analysis (PCA), then Linear Discriminant Analysis (LDA), followed by a linear SVM classifier
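The two simplest classifiers in the list above, NN and NCM, can be sketched in a few lines. This is only an illustration under assumptions the slides do not state: Euclidean distance is assumed as the metric, and the feature vectors and the helper names (`nn_predict`, `class_means`, `ncm_predict`) are hypothetical toy values, not the paper's implementation or data.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nn_predict(train_x, train_y, query):
    """NN: return the label of the single closest training instance."""
    best = min(range(len(train_x)), key=lambda i: euclidean(train_x[i], query))
    return train_y[best]


def class_means(train_x, train_y):
    """Compute one mean feature vector per class label."""
    sums, counts = {}, {}
    for x, y in zip(train_x, train_y):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def ncm_predict(means, query):
    """NCM: return the label whose class mean is closest to the query."""
    return min(means, key=lambda y: euclidean(means[y], query))


# Toy 2-D feature vectors; the labels borrow two OSR category names for readability.
train_x = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.2]]
train_y = ["coast", "coast", "forest", "forest"]
means = class_means(train_x, train_y)

print(nn_predict(train_x, train_y, [0.1, 0.0]))
print(ncm_predict(means, [1.0, 1.1]))
```

NCM stores one vector per class while NN stores every training instance, which is one intuition for why the two can react differently to noise and to the amount of training data.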


Datasets
  OSR: eight categories (coast, forest, highway, inside-city, mountain, open-country, street, tall-building) from the outdoor scene recognition dataset
  ISR: eight categories (bathroom, bedroom, dining room, gym, kitchen, living room, movie theater and stairs) from the indoor scene recognition dataset
  PA1: eight categories (bird, bottle, cat, dog, horse, person, pottedplant, sheep) from the PASCAL object recognition dataset
  PA2: eight other categories (aeroplane, bicycle, boat, chair, car, dining table, motorbike, sofa) from the same PASCAL object recognition dataset
  CAL: six categories (aeroplane, car-rear, face, ketch, motorbike, watch) from the Caltech-101 object categories dataset


Feature types
  CH: color histogram computed by assigning all pixels in an image to a pre-computed universal color dictionary computed using k-means
  TH: texture histogram computed over a discretization of multi-scale edge orientations in the image
  GIST: gist descriptor
  BOW: bag-of-words feature descriptor for the CAL dataset
  ATT: binary attributes that indicate whether the objects have certain higher-level attributes such as being round, or furry, or having a head, etc.
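The CH feature described above (assign each pixel to an entry of a color dictionary, then count) might look roughly like the sketch below. Several things here are assumptions rather than details from the slides: the dictionary is hand-picked instead of learned with k-means, squared RGB distance is used for the assignment, the histogram is normalized to sum to 1, and `color_histogram` is a hypothetical name.

```python
def color_histogram(pixels, dictionary):
    """Assign each pixel to its nearest dictionary color and return a normalized histogram."""
    if not pixels:
        raise ValueError("need at least one pixel")
    counts = [0] * len(dictionary)
    for p in pixels:
        # Index of the dictionary color with the smallest squared RGB distance.
        nearest = min(
            range(len(dictionary)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(p, dictionary[k])),
        )
        counts[nearest] += 1
    # Normalizing by the pixel count is an assumption, not stated in the slides.
    return [c / len(pixels) for c in counts]


# A 4-entry dictionary stands in for the k-means dictionary (4 is the smallest CH size listed).
dictionary = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)]
pixels = [(250, 10, 5), (240, 0, 0), (10, 250, 20), (255, 250, 245)]
hist = color_histogram(pixels, dictionary)
print(hist)
```

The length of the returned vector equals the dictionary size, which is how a CH descriptor of a given dimensionality would be obtained in this sketch.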


Dimensionality
  CH: {4; 8; 16; 32; 64; 128; 256}
  GIST: vary the number of edge orientations (o_si) at each of the three scales ([s1; s2; s3]), as well as the number of spatial blocks (n x n) the image is divided into
  TH: vary the number of orientation bins across the three scales to obtain descriptors of different dimensionality
  BOW: the dimensionality was kept fixed at 200 by using a dictionary with 200 SIFT codewords
  ATT: the dimensionality was also kept fixed at 64 for PA2, while PA1 used a 32-bit version in addition to the 64-bit one, obtained by dropping the attributes that were almost always set to zero across the dataset


Proportion of noisy features
  {0%; 25%; 50%; 100%; 200%}
  200% indicates that twice the number of original features are added as noisy features
  Gaussian distribution with the same mean and standard deviation
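One way to read the noisy-feature setup above is sketched here. The slides do not say what "the same mean and standard deviation" is matched to; this sketch assumes the statistics are pooled over all original feature values, which is only a guess. The function name `add_noisy_features` and the toy data are hypothetical.

```python
import random
import statistics


def add_noisy_features(data, proportion, rng):
    """Append Gaussian noise features to every row.

    proportion is relative to the original feature count, so 2.0 means 200%.
    """
    n_features = len(data[0])
    n_noise = round(n_features * proportion)
    # Assumption: match the mean/std pooled over all original feature values.
    all_values = [v for row in data for v in row]
    mu = statistics.mean(all_values)
    sigma = statistics.pstdev(all_values)
    return [row + [rng.gauss(mu, sigma) for _ in range(n_noise)] for row in data]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
data = [[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0], [0.0, 1.0, 2.0, 3.0]]
noisy = add_noisy_features(data, 2.0, rng)
print(len(data[0]), "->", len(noisy[0]))
```

The original features are left untouched at the front of each row, so the same classifier can be trained on both versions to compare accuracy.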


Number of training instances
  Vary the number of training instances used per category in the range {2; 4; 8; 16; 32; 64; 100 (88 for CAL)}

Human Studies (1)

To prevent the use of prior knowledge about images by the subjects, we do not display to them any direct image information such as texture patches or color. Instead, we use abstracted visual patterns as stimuli.

Subjects performed at 34% for Figure 3(a), 47% for (b), 50% for (c), and 47% for (d).

Human Studies (2)

Result

  The Role of Algorithms
  The Role of Data
  The Role of Features

The Role of Algorithms

Fix the set of input features and training data for each set of experiments

The learning algorithm used by humans is not superior to state-of-the-art techniques on these types of problems

The Role of Data (1)

The machine experiments show consistent improvement as the number of training instances increases

It is unlikely that machine accuracies will match those of humans when given the original images

Humans may not be as capable at leveraging large amounts of training data for pattern matching

Humans are very capable of generalizing from a small number of training examples

The Role of Data (2)

For machine experiments significantly more training examples are needed to achieve similar levels of accuracy

Linear SVMs and NCM are less sensitive to noise

Humans are also susceptible to data noise

The Role of Features (1)

Edge- and gradient-based features typically outperform color-based features across the various datasets

Humans are also known to be very sensitive to edge or contour information

We perform human studies on recognition using the same features and training sets as the machine experiments, and they show similar trends

The feature set is critical for recognition


Humans are shown natural images from the outdoor scene recognition dataset under different transformations and asked to select a category name from a list

Don’t provide subjects with any training data

Two transformations
  Block test
    The image is divided into non-overlapping blocks, and the pixels in each block are randomly shuffled
    Maintains the global layout of the scene, but the local statistics are lost
  Puzzle test
    The image is divided into non-overlapping blocks, but the blocks are randomly shuffled in the image while maintaining the pixels’ relative locations in the block
    Local regions of the image are preserved while the global layout is not
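The two transformations above can be sketched on a small grid of pixel values. This is a minimal illustration, assuming the image is a 2-D list whose height and width are multiples of the block size; the function names and the 4 x 4 toy image are hypothetical, and the block sizes used in the study are not given in these slides.

```python
import random


def block_coords(height, width, block):
    """Top-left corners of the non-overlapping block x block tiles."""
    return [(r, c) for r in range(0, height, block) for c in range(0, width, block)]


def block_test(image, block, rng):
    """Shuffle pixels inside each block; blocks stay in place (global layout kept)."""
    out = [row[:] for row in image]
    for r0, c0 in block_coords(len(image), len(image[0]), block):
        cells = [(r0 + dr, c0 + dc) for dr in range(block) for dc in range(block)]
        values = [image[r][c] for r, c in cells]
        rng.shuffle(values)
        for (r, c), v in zip(cells, values):
            out[r][c] = v
    return out


def puzzle_test(image, block, rng):
    """Shuffle whole blocks; pixels keep their positions inside each block."""
    out = [row[:] for row in image]
    targets = block_coords(len(image), len(image[0]), block)
    sources = targets[:]
    rng.shuffle(sources)
    for (tr, tc), (sr, sc) in zip(targets, sources):
        for dr in range(block):
            for dc in range(block):
                out[tr + dr][tc + dc] = image[sr + dr][sc + dc]
    return out


rng = random.Random(0)  # fixed seed so the sketch is repeatable
img = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4 x 4 "image"
blocked = block_test(img, 2, rng)
puzzled = puzzle_test(img, 2, rng)
for row in puzzled:
    print(row)
```

In the block test each tile keeps its set of pixel values but loses their arrangement; in the puzzle test each tile is intact but may land anywhere in the image.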


In both high and low resolution images, human recognition is robust to a significant loss of local statistics. This indicates that humans rely on the global layout of the scene for scene recognition

In high resolution images, human recognition rates are also very robust even when the global layout of the scene is drastically altered, which indicates that humans can also rely on local regions of images


Humans do not rely on a fixed set of features. Depending on the information available to them, humans can adaptively rely on different sets of features during testing. This is true even if similar instances have never been seen before. This ability to adapt during testing is not seen in standard machine learning algorithms

Discussion

The notion of features goes beyond the choice between colors, texture, the need for spatial information, etc. It includes the concepts of incorporating semantic attributes that are shared across categories

Perhaps what makes the human feature representation so powerful is that these feature representations are tuned for high performance at a variety of tasks

It is important to note that in addition to visual features, humans leverage prior knowledge from several non-visual higher-level and semantic features about how the world we live in functions

Conclusion

In this paper we study human responses on visual recognition problems as posed to machines, to gain insight into which of the three factors is critical to humans’ superior performance:
  learning algorithm
  amount of training data
  features

We find no evidence that human pattern matching algorithms are better than standard machine learning algorithms. Moreover, we find that humans don’t leverage increased amounts of training data. We thus hypothesize with the aid of ANOVA analysis that features are the main factor contributing to superior human performance. Future work involves extensive studies to identify which visual features humans rely on to aid in the development of novel machine recognition algorithms.
