What is the Best Multi-Stage Architecture for Object Recognition

Page 1: What is the Best Multi-Stage Architecture for Object Recognition

What is the Best Multi-Stage Architecture for Object Recognition

Kevin Jarrett, Koray Kavukcuoglu, Marc'Aurelio Ranzato and Yann LeCun

Presented by Lingbo Li

ECE, Duke University

Dec. 13th, 2010

Page 2: What is the Best Multi-Stage Architecture for Object Recognition

Outline

• Introduction

• Model Architecture

• Training Protocol

• Experiments: Caltech 101 Dataset, NORB Dataset, MNIST Dataset

• Conclusions

Page 3: What is the Best Multi-Stage Architecture for Object Recognition

Introduction (I)

Feature extraction stages:

• A filter bank

• A non-linear operation

• A pooling operation

Recognition architectures:

•Single stage of features + supervised classifier: SIFT, HoG, etc.

•Two or more successive stages of feature extractors + supervised classifier: convolutional networks

Page 4: What is the Best Multi-Stage Architecture for Object Recognition

Introduction (II)

• Q1: How do the non-linearities that follow the filter banks influence the recognition accuracy?

• Q2: Is there any advantage to using an architecture with two successive stages of feature extraction, rather than with a single stage?

• Q3: Does learning the filter banks in an unsupervised or supervised manner improve the performance over hard-wired filters or even random filters?

Page 5: What is the Best Multi-Stage Architecture for Object Recognition

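Model Architecture (I)

• Filter bank layer $F_{CSG}$

Input: a 3D array $x$ made of $n_1$ 2D feature maps of size $n_2 \times n_3$; $x_i$ is the i-th feature map.

Output: a 3D array $y$ made of $m_1$ feature maps of size $m_2 \times m_3$; $y_j$ is the j-th feature map.

Filter: $k_{ij}$, of size $l_1 \times l_2$, connects input feature map $x_i$ to output feature map $y_j$:

$$y_j = g_j \tanh\Big(\sum_i k_{ij} * x_i\Big)$$

where $*$ is the 2D discrete convolution and $g_j$ is a trainable scalar gain.

A filter bank layer with 64 filters of size 9x9 is denoted $64F_{CSG}^{9\times 9}$.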
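As a rough illustration of what a filter bank layer computes, the sketch below assumes the form given in the original paper, $y_j = g_j \tanh(\sum_i k_{ij} * x_i)$, with a "valid" 2D convolution. The function names (`conv2d_valid`, `filter_bank`) and the tiny sizes are made up for this example, and the plain-Python loops favor readability over speed.

```python
import math

def conv2d_valid(x, k):
    """'Valid' 2D convolution of map x with kernel k (the kernel is flipped)."""
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[r + p][c + q] * k[kh - 1 - p][kw - 1 - q]
                 for p in range(kh) for q in range(kw))
             for c in range(ow)]
            for r in range(oh)]

def filter_bank(x, kernels, gains):
    """x: list of input maps; kernels[j][i]: filter from input map i to output map j."""
    out = []
    for k_j, g_j in zip(kernels, gains):
        maps = [conv2d_valid(x_i, k_ij) for x_i, k_ij in zip(x, k_j)]
        oh, ow = len(maps[0]), len(maps[0][0])
        # Sum the per-input responses, squash with tanh, scale by the gain.
        out.append([[g_j * math.tanh(sum(m[r][c] for m in maps))
                     for c in range(ow)] for r in range(oh)])
    return out

# One 4x4 input map and one 3x3 filter give one 2x2 output map.
x = [[[0.0, 0.0, 0.0, 0.0],
      [0.0, 1.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0]]]
kernels = [[[[0.0, 0.0, 0.0],
             [0.0, 1.0, 0.0],
             [0.0, 0.0, 0.0]]]]
y = filter_bank(x, kernels, gains=[2.0])
```

With a 9x9 filter the same "valid" rule shrinks each side of the input by 8 pixels.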

Page 6: What is the Best Multi-Stage Architecture for Object Recognition

Model Architecture (II)

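• Rectification layer $R_{abs}$: $y_{ijk} = |x_{ijk}|$

• Local contrast normalization layer $N$

Subtractive normalization operation: $v_{ijk} = x_{ijk} - \sum_{ipq} w_{pq} \cdot x_{i,j+p,k+q}$, where $w_{pq}$ is a Gaussian weighting window normalized so that $\sum_{ipq} w_{pq} = 1$.

Divisive normalization operation: $y_{ijk} = v_{ijk} / \max(c, \sigma_{jk})$, where $\sigma_{jk} = \big(\sum_{ipq} w_{pq} \cdot v_{i,j+p,k+q}^2\big)^{1/2}$.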
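The two normalization steps can be sketched as follows. This assumes the definitions from the original paper: subtract a Gaussian-weighted local mean taken over all feature maps, then divide by $\max(c, \sigma_{jk})$, where $\sigma_{jk}$ is the weighted local standard deviation. The 3x3 window, the zero-padding at the borders and all function names are assumptions chosen to keep the example short.

```python
import math

def gaussian_window(size, sigma, n_maps):
    """Gaussian weights w[p][q], scaled so they sum to 1 over all maps."""
    half = size // 2
    w = [[math.exp(-((p - half) ** 2 + (q - half) ** 2) / (2 * sigma ** 2))
          for q in range(size)] for p in range(size)]
    total = n_maps * sum(map(sum, w))
    return [[v / total for v in row] for row in w]

def weighted_sum(x, w):
    """Sum over maps i and offsets (p, q) of w[p][q] * x[i][j+p][k+q], zero-padded."""
    h, wd, half = len(x[0]), len(x[0][0]), len(w) // 2
    out = [[0.0] * wd for _ in range(h)]
    for j in range(h):
        for k in range(wd):
            s = 0.0
            for p in range(len(w)):
                for q in range(len(w)):
                    r, c = j + p - half, k + q - half
                    if 0 <= r < h and 0 <= c < wd:
                        s += w[p][q] * sum(m[r][c] for m in x)
            out[j][k] = s
    return out

def rectify(x):
    """Absolute-value rectification of every component."""
    return [[[abs(v) for v in row] for row in m] for m in x]

def local_contrast_normalize(x, size=3, sigma=1.0, c=1.0):
    w = gaussian_window(size, sigma, len(x))
    mean = weighted_sum(x, w)
    # Subtractive step: remove the weighted local mean.
    v = [[[m[j][k] - mean[j][k] for k in range(len(m[0]))]
          for j in range(len(m))] for m in x]
    sq = [[[val * val for val in row] for row in m] for m in v]
    var = weighted_sum(sq, w)
    # Divisive step: divide by max(c, local standard deviation).
    return [[[m[j][k] / max(c, math.sqrt(var[j][k])) for k in range(len(m[0]))]
             for j in range(len(m))] for m in v]

x = [[[-3.0] * 5 for _ in range(5)]]          # one constant 5x5 map
out = local_contrast_normalize(rectify(x))
```

Away from the borders a constant map is normalized to zero, since the local mean equals the value itself.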

Page 7: What is the Best Multi-Stage Architecture for Object Recognition

Model Architecture (III)

An average pooling layer with 4x4 down-sampling is denoted $P_A^{4\times 4}$.

A max-pooling layer with 4x4 down-sampling is denoted $P_M^{4\times 4}$.
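A minimal sketch of the two pooling variants on a single feature map. The `pool` helper and its `window` and `stride` parameters are illustrative names; the slides only state the down-sampling factor, so treating the window size as a separate argument is an assumption.

```python
def pool(x, window, stride, reduce):
    """Apply `reduce` (e.g. max or mean) over window x window patches of one map."""
    oh = (len(x) - window) // stride + 1
    ow = (len(x[0]) - window) // stride + 1
    return [[reduce([x[r * stride + p][c * stride + q]
                     for p in range(window) for q in range(window)])
             for c in range(ow)]
            for r in range(oh)]

def mean(values):
    return sum(values) / len(values)

x = [[1.0, 2.0, 5.0, 6.0],
     [3.0, 4.0, 7.0, 8.0],
     [0.0, 0.0, 1.0, 1.0],
     [0.0, 4.0, 1.0, 1.0]]
avg_pooled = pool(x, window=2, stride=2, reduce=mean)  # [[2.5, 6.5], [1.0, 1.0]]
max_pooled = pool(x, window=2, stride=2, reduce=max)   # [[4.0, 8.0], [4.0, 1.0]]
```

Setting `window` larger than `stride` gives overlapping pooling regions with the same down-sampling factor.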

Page 8: What is the Best Multi-Stage Architecture for Object Recognition

Model Architecture (IV)

Combining Modules into a Hierarchy

Page 9: What is the Best Multi-Stage Architecture for Object Recognition

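Training Protocol (I)

Optimal sparse coding: given an input vector $Y$ and a dictionary matrix $W$, find the sparsest code $Z$ that reconstructs the input:

$$\min_Z \|Z\|_0 \quad \text{s.t.} \quad Y = WZ$$

Under sparse conditions, this problem can be written as an optimization problem:

$$Z^*_Y = \arg\min_Z L_s(Y, Z, W), \qquad L_s(Y, Z, W) = \|Y - WZ\|_2^2 + \lambda \|Z\|_1$$

Given training samples $Y_i$, learning proceeds as follows:

1) Minimize the loss function $L_s(Y_i, Z^*_{Y_i}, W)$ with respect to $W$.

2) Find $Z^*_{Y_i}$ by running a rather expensive optimization algorithm.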
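To give a feel for why finding the sparse code is the costly step, here is a sketch of one standard iterative solver, ISTA, applied to a loss of the assumed form $\|Y - WZ\|_2^2 + \lambda\|Z\|_1$. The slides do not say which algorithm the authors used, so ISTA is swapped in purely for illustration; the function names and the `step` parameter are likewise assumptions.

```python
def soft_threshold(v, t):
    """Shrink each entry of v towards zero by t."""
    return [max(abs(a) - t, 0.0) * (1 if a > 0 else -1) for a in v]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def sparse_code_ista(y, w, lam, step, n_iter=100):
    """Approximately minimise ||y - w z||^2 + lam * ||z||_1 over z with ISTA.

    `step` should not exceed 1 / (2 * largest eigenvalue of w^T w).
    """
    wt = transpose(w)
    z = [0.0] * len(w[0])
    for _ in range(n_iter):
        residual = [a - b for a, b in zip(matvec(w, z), y)]
        grad = [2.0 * g for g in matvec(wt, residual)]   # gradient of the squared error
        z = soft_threshold([a - step * g for a, g in zip(z, grad)], step * lam)
    return z

w = [[1.0, 0.0], [0.0, 1.0]]   # identity dictionary, so the result is easy to check
z_star = sparse_code_ista([3.0, 0.2], w, lam=1.0, step=0.5)
```

With an identity dictionary the minimizer is the input shrunk by $\lambda/2$, so the small second component is set exactly to zero. Every training sample needs its own run of such a loop.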

Page 10: What is the Best Multi-Stage Architecture for Object Recognition

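Training Protocol (II)

Predictive Sparse Decomposition (PSD)

PSD trains a regressor $C(Y, K)$ to approximate the sparse solution $Z^*_Y$ for all training samples $Y$, where $K$ denotes the parameters (filters) of the regressor.

Learning proceeds by minimizing the loss function

$$L_{PSD}(Y, Z, W, K) = \|Y - WZ\|_2^2 + \lambda \|Z\|_1 + \|Z - C(Y, K)\|_2^2$$

where

$$Z^*_Y = \arg\min_Z L_{PSD}(Y, Z, W, K)$$

Thus, $W$ (dictionary) and $K$ (filters) are simultaneously optimized.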
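The PSD objective can be evaluated as in the sketch below. It assumes a loss of the form $\|Y - WZ\|_2^2 + \lambda\|Z\|_1 + \|Z - C(Y, K)\|_2^2$ and, for brevity, a non-convolutional regressor $C(Y, K) = g \cdot \tanh(KY)$. Both the regressor form and the function names are assumptions made for this example.

```python
import math

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def sq_norm(v):
    return sum(a * a for a in v)

def predict_code(y, k, gains):
    """Feed-forward regressor C(Y, K): a gain times tanh of a linear filter response."""
    return [g * math.tanh(a) for g, a in zip(gains, matvec(k, y))]

def psd_loss(y, z, w, k, gains, lam):
    reconstruction = sq_norm([a - b for a, b in zip(y, matvec(w, z))])
    sparsity = lam * sum(abs(a) for a in z)
    prediction = sq_norm([a - b for a, b in zip(z, predict_code(y, k, gains))])
    return reconstruction + sparsity + prediction

w = [[1.0, 0.0], [0.0, 1.0]]   # dictionary
k = [[0.0, 0.0], [0.0, 0.0]]   # all-zero filters, so C(Y, K) = 0
loss = psd_loss(y=[1.0, 0.0], z=[1.0, 0.0], w=w, k=k, gains=[1.0, 1.0], lam=0.5)
```

Once trained, only `predict_code` is needed at test time, which avoids running the expensive sparse-coding optimization for every input.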

Page 11: What is the Best Multi-Stage Architecture for Object Recognition

Training Protocol (III)

A single letter (e.g. R) denotes an architecture with a single stage of feature extraction followed by a classifier; a double letter (e.g. RR) denotes an architecture with two stages of feature extraction followed by a classifier.

• R and RR: filters are set to random values and kept fixed. Classifiers are trained in supervised mode.

• U and UU: filters are trained using the unsupervised PSD algorithm and kept fixed. Classifiers are trained in supervised mode.

• R+ and R+R+: filters are initialized with random values. The entire system (feature stages + classifier) is trained in supervised mode with gradient descent.

• U+ and U+U+: filters are initialized with the PSD unsupervised learning algorithm. The entire system (feature stages + classifier) is trained in supervised mode with gradient descent.
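The four training protocols differ along two axes only: how the filters are initialized and whether they are refined afterwards. A small lookup table makes that explicit. The letter codes R, U, R+ and U+ follow the naming of the original paper; the `Protocol` class and `parse_architecture` helper are hypothetical names for this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Protocol:
    filter_init: str         # "random" or "psd"
    fine_tune_filters: bool  # True: whole system trained with supervised gradient descent

PROTOCOLS = {
    "R":  Protocol("random", False),
    "U":  Protocol("psd", False),
    "R+": Protocol("random", True),
    "U+": Protocol("psd", True),
}

def parse_architecture(name):
    """Split e.g. 'U+U+' into one Protocol per feature-extraction stage."""
    stages, i = [], 0
    while i < len(name):
        key = name[i:i + 2] if name[i + 1:i + 2] == "+" else name[i]
        stages.append(PROTOCOLS[key])
        i += len(key)
    return stages

two_stage = parse_architecture("U+U+")
```

In every protocol the classifier itself is trained in supervised mode; only the treatment of the filters changes.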

Page 12: What is the Best Multi-Stage Architecture for Object Recognition

Experiments (I) – Caltech 101

• Data pre-processing:

1) Convert to gray-scale and resize to 151x151 pixels;

2) Subtract the image mean and divide by the image standard deviation;

3) Apply subtractive/divisive normalization (N layer with c=1);

4) Zero-pad the shorter side to 143 pixels.
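Steps 2 and 4 of the pre-processing can be sketched on a toy image as below. Gray-scale conversion, resizing and the normalization layer are left out. Centering the image on the padded canvas is an assumption, as are the function names.

```python
import math

def standardize(img):
    """Subtract the image mean and divide by the image standard deviation.

    Assumes the image is not constant (the standard deviation must be non-zero).
    """
    pixels = [v for row in img for v in row]
    mean = sum(pixels) / len(pixels)
    std = math.sqrt(sum((v - mean) ** 2 for v in pixels) / len(pixels))
    return [[(v - mean) / std for v in row] for row in img]

def zero_pad_to(img, size):
    """Centre img on a size x size canvas of zeros (img must fit)."""
    h, w = len(img), len(img[0])
    top, left = (size - h) // 2, (size - w) // 2
    out = [[0.0] * size for _ in range(size)]
    for r in range(h):
        out[top + r][left:left + w] = img[r]
    return out

img = [[1.0, 3.0], [1.0, 3.0]]
norm = standardize(img)        # mean 2, std 1 -> [[-1, 1], [-1, 1]]
padded = zero_pad_to(norm, 4)
```

For the Caltech 101 images the target size passed to `zero_pad_to` would be 143.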

• Recognition rates are averaged over 5 random draws of the training set (30 images per class).

• Hyper-parameters are selected to maximize the performance on the validation set of 5 samples per class taken out of the training sets.

Page 13: What is the Best Multi-Stage Architecture for Object Recognition

Experiments (I) – Caltech 101

• Using a Single Stage of Feature Extraction: 64 26x26 feature maps, classified by multinomial logistic regression or PMK-SVM.

• Using Two Stages of Feature Extraction: 64 26x26 feature maps (first stage), then 256 4x4 feature maps (second stage), classified by multinomial logistic regression or PMK-SVM.

Page 14: What is the Best Multi-Stage Architecture for Object Recognition

Experiments (I) – Caltech 101

Page 15: What is the Best Multi-Stage Architecture for Object Recognition

Experiments (I) – Caltech 101

• Random filters with no filter learning whatsoever can achieve decent performance with the proper non-linearities;

• Supervised fine-tuning improves the performance;

• Two-stage systems are better than their single-stage counterparts;

• With rectification and normalization, unsupervised training does not improve the performance;

• abs rectification is a crucial component for good performance;

• A single-stage system with PMK-SVM reaches the same performance as a two-stage system with logistic regression.

Page 16: What is the Best Multi-Stage Architecture for Object Recognition

Experiments (II) – NORB Dataset

• NORB dataset has 5 object categories;

• 24300 training samples and 24300 test samples (4860 per class); Each image is gray-scale with 96x96 pixels;

• Only consider the protocols;

1) Random filters do not perform as well as learned filters when more labeled samples are available.

2) The use of abs and normalization makes a big difference.

Page 17: What is the Best Multi-Stage Architecture for Object Recognition

Experiments (II) – NORB Dataset

Use gradient descent to find the optimal input patterns in a given architecture.

In the left figure:

• (1-a) random stage-1 filters;

• (1-b) corresponding optimal inputs;

• (2-a) PSD filters;

• (2-b) optimal input patterns;

• (3) subset of stage-2 filters after PSD and supervised refinement on Caltech-101.
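The idea of an optimal input pattern can be sketched with a toy unit: search for the unit-norm input that maximizes the unit's response by following the gradient with respect to the input. The example below uses gradient ascent on the response (equivalent to gradient descent on its negative), a numerical gradient, and a single linear filter followed by abs rectification. The filter values, step size and function names are all assumptions for illustration.

```python
import math

def normalize(x):
    n = math.sqrt(sum(v * v for v in x))
    return [v / n for v in x]

def numerical_grad(f, x, eps=1e-5):
    """Central finite-difference gradient of f at x."""
    grad = []
    for i in range(len(x)):
        hi = x[:i] + [x[i] + eps] + x[i + 1:]
        lo = x[:i] + [x[i] - eps] + x[i + 1:]
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad

def optimal_input(response, x0, step=0.1, n_iter=200):
    """Gradient ascent on the input, re-projected onto the unit sphere each step."""
    x = normalize(x0)
    for _ in range(n_iter):
        g = numerical_grad(response, x)
        x = normalize([v + step * gv for v, gv in zip(x, g)])
    return x

k = [3.0, 4.0]   # hypothetical filter

def unit(x):
    """Filter response followed by abs rectification."""
    return abs(sum(a * b for a, b in zip(k, x)))

x_opt = optimal_input(unit, [1.0, 0.0])
```

For this toy unit the search ends up aligned with the filter itself; for a full stage with normalization and pooling the optimal pattern generally differs from the raw filter.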

Page 18: What is the Best Multi-Stage Architecture for Object Recognition

Experiments (III) – MNIST Dataset

• 60,000 gray-scale 28x28 pixel images for training and 10,000 images for testing;

• Two stages of feature extraction:

The first stage: input image 34x34 → convolution with 50 7x7 filters → 50 28x28 feature maps → max-pooling over 2x2 windows → 50 14x14 feature maps.

The second stage: convolution with 1024 5x5 filters → 64 10x10 feature maps → max-pooling over 2x2 windows → 64 5x5 feature maps.

Classifier: 10-way multinomial logistic regression.
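The feature-map sizes quoted for the MNIST network follow from simple arithmetic, assuming "valid" convolutions and non-overlapping pooling windows. The helper names below are made up for this check.

```python
def conv_out(size, kernel):
    """Side length after a 'valid' convolution."""
    return size - kernel + 1

def pool_out(size, window):
    """Side length after non-overlapping pooling."""
    return size // window

s = 34
s = conv_out(s, 7)    # 28: first-stage feature maps are 28x28
s = pool_out(s, 2)    # 14
s = conv_out(s, 5)    # 10: second-stage feature maps are 10x10
s = pool_out(s, 2)    # 5
classifier_inputs = 64 * s * s   # 64 maps of 5x5 -> 1600 inputs to the classifier
filters_per_output_map = 1024 // 64
```

The ratio 1024 / 64 = 16 would be consistent with each second-stage map combining 16 first-stage maps, although the slides do not state the connection scheme.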

Page 19: What is the Best Multi-Stage Architecture for Object Recognition

Experiments (III) – MNIST Dataset

• Parameters are trained with PSD: the only hyper-parameter is tuned with a validation set of 10,000 training samples.

• The classifier is randomly initialized;

• The whole system is tuned in supervised mode.

• A test error rate of 0.53% was obtained.

Page 20: What is the Best Multi-Stage Architecture for Object Recognition

Conclusions (I)

• Q1: How do the non-linearities that follow the filter banks influence the recognition accuracy?

1) A rectifying non-linearity is the single most important factor.

2) A local normalization layer can also improve the performance.

• Q2: Is there any advantage to using an architecture with two successive stages of feature extraction, rather than with a single stage?

1) Two stages are better than one.

2) The performance of the two-stage system is similar to that of the best single-stage systems based on SIFT and PMK-SVM.

Page 21: What is the Best Multi-Stage Architecture for Object Recognition

Conclusions (II)

• Q3: Does learning the filter banks in an unsupervised or supervised manner improve the performance over hard-wired filters or even random filters?

1) Random filters yield good performance only when the training set is small.

2) The optimal input patterns for a randomly initialized stage are similar to the optimal inputs for a stage that uses learned filters.

3) Global supervised learning of the filters yields a good recognition rate when combined with the proper non-linearities.

4) Unsupervised pre-training followed by supervised refinement yields the best overall accuracy.

