Post on 24-Jun-2015
Lu, Wang-Chou
Image Representation Usage Guide
2014/10/01 @ Linkou
Why Image Representation?
Machine Learning Course @ Caltech
Xi: 500 x 375 pixels = 187,500-D
Low Dimensional Vector
Map of Representation
Robust
Computation Cost
Color Histogram, PCA, Sparse Coding, Bag of Visual Word, DPM, Deep Learning…
Outline
❖ Hand Crafted Features
❖ Machine Learning Approach
❖ Hierarchical Approaches
❖ When to use? Real-time vs Precision
Hand Crafted Features
❖ Color Histogram, Template, Haar Features
❖ Interest Point Detector + HOG
❖ Bag of Visual Word
Simple Features
Histogram Based, Haar Features, Template Based
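As an illustration of the histogram-based case, here is a minimal sketch of a joint RGB color histogram. The function name `color_histogram`, the choice of 8 bins per channel, and the synthetic image are assumptions made for the example, not details from the slides.

```python
import numpy as np

def color_histogram(image, bins_per_channel=8):
    """Return an L1-normalized joint RGB histogram as a flat vector."""
    pixels = image.reshape(-1, 3)
    # Quantize each 8-bit channel into bins_per_channel levels.
    quantized = (pixels.astype(np.int64) * bins_per_channel) // 256
    # Combine the three per-channel indices into one joint bin index.
    index = (quantized[:, 0] * bins_per_channel + quantized[:, 1]) * bins_per_channel + quantized[:, 2]
    hist = np.bincount(index, minlength=bins_per_channel ** 3).astype(np.float64)
    return hist / hist.sum()

# Synthetic 500 x 375 RGB image (random pixels, for illustration only).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(375, 500, 3), dtype=np.uint8)
feature = color_histogram(image)
print(feature.shape)
```

The 187,500-pixel image collapses to a 512-D vector that discards all spatial layout, which is why this representation is cheap but not very discriminative.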
SIFT Like Approach
HOG: 3780-D = 7 x 15 overlapped blocks * (normalized 2 x 2 cells) * 9 bins, N. Dalal et al. [CVPR 2005]
SIFT: 128-D, |V| = 1, 4 x 4 grid * 8 bins, David Lowe [IJCV 2004]
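The two dimensionalities can be checked with a few lines of arithmetic. The 64 x 128 window and 8-pixel cells are assumptions (the usual pedestrian-detection setting); the slide only gives the 7 x 15 and 2 x 2 factors.

```python
# HOG: assumed 64 x 128 detection window with 8 x 8 pixel cells.
cell_size = 8
window_w, window_h = 64, 128
cells_x, cells_y = window_w // cell_size, window_h // cell_size  # 8 x 16 cells

# 2 x 2-cell blocks slid with a stride of one cell overlap their neighbours.
blocks_x, blocks_y = cells_x - 1, cells_y - 1                    # 7 x 15 blocks
hog_dim = blocks_x * blocks_y * (2 * 2) * 9                      # 9 orientation bins per cell

# SIFT: 4 x 4 spatial grid, 8 orientation bins per grid cell.
sift_dim = 4 * 4 * 8

print(hog_dim, sift_dim)
```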
Bag of Visual Word + SPM
SVM
SIFT Descriptor
S. Lazebnik et al. [CVPR 2006]
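A rough sketch of how a bag-of-visual-words vector with a spatial pyramid can be assembled. It assumes the descriptors, their normalized keypoint positions, and a codebook are already available (all random here), and it omits the per-level weights used in the original spatial pyramid matching paper. The helper names `assign_words` and `spatial_pyramid` are made up for this example.

```python
import numpy as np

def assign_words(descriptors, codebook):
    """Index of the nearest codeword (Euclidean distance) for each descriptor."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def spatial_pyramid(words, xy, vocab_size, levels=3):
    """Concatenate per-cell word histograms over 1x1, 2x2 and 4x4 grids."""
    parts = []
    for level in range(levels):
        cells = 2 ** level
        # Grid cell of each keypoint; xy is assumed to lie in [0, 1).
        cx = np.minimum((xy[:, 0] * cells).astype(int), cells - 1)
        cy = np.minimum((xy[:, 1] * cells).astype(int), cells - 1)
        flat = (cy * cells + cx) * vocab_size + words
        parts.append(np.bincount(flat, minlength=cells * cells * vocab_size))
    hist = np.concatenate(parts).astype(np.float64)
    return hist / hist.sum()

rng = np.random.default_rng(0)
descriptors = rng.random((200, 128))  # stand-in for 200 SIFT descriptors
xy = rng.random((200, 2))             # normalized keypoint positions
codebook = rng.random((50, 128))      # in practice learnt, e.g. with k-means
words = assign_words(descriptors, codebook)
feature = spatial_pyramid(words, xy, vocab_size=50)
print(feature.shape)
```

The resulting vector has (1 + 4 + 16) * 50 entries and is the kind of input that would be passed to the SVM on this slide.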
Machine Learning Approach
❖ Dimensionality Reduction
  ❖ PCA, Manifold Learning, Sparse Coding, LSH
❖ Deformable Part Model
❖ Neural Network
❖ Convolutional Neural Network
Principal Component Analysis
M. A. Turk et al. [CVPR 1991]
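A minimal sketch of PCA as a dimensionality reducer, computed with an SVD of the mean-centered data. The function names and the synthetic 20 x 20 "images" are assumptions for illustration; this is not the eigenface pipeline from the cited paper.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the data mean and the top principal directions (as rows)."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_transform(X, mean, components):
    """Project centered data onto the principal directions."""
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
X = rng.random((100, 400))  # 100 flattened 20 x 20 images (synthetic)
mean, components = pca_fit(X, n_components=10)
Z = pca_transform(X, mean, components)
print(Z.shape)
```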
Manifold Learning
[ISOMAP, LLE 2003]
Sparse Coding
H Lee et al. [NIPS 2007]
Objective terms: reconstruction error, sparsity
Y: Input Vector
B: Basis Matrix
Z: weight
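The objective that these labels annotate did not survive in the transcript. A common form of the sparse coding problem, written with the slide's symbols and assuming an L1 sparsity penalty with a trade-off weight \lambda (both are assumptions, not taken from the slide), is:

```latex
\min_{B,\,Z} \;
\underbrace{\lVert Y - BZ \rVert_2^2}_{\text{reconstruction error}}
\; + \;
\lambda \underbrace{\lVert Z \rVert_1}_{\text{sparsity}}
```

A larger \lambda pushes more entries of Z to zero at the cost of a higher reconstruction error.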
Locality Sensitive Hashing Embedding
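The slide does not say which LSH family is meant. As one possible illustration, the sketch below uses random-hyperplane (sign of random projection) hashing, which maps vectors to short binary codes whose Hamming distance tends to grow with the angle between the inputs. All names and sizes are assumptions.

```python
import numpy as np

def lsh_embed(X, planes):
    """Binary code: one bit per random hyperplane (sign of the projection)."""
    return (X @ planes.T > 0).astype(np.uint8)

def hamming(a, b):
    """Number of differing bits between two codes."""
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(0)
planes = rng.standard_normal((64, 128))     # 64 bits for 128-D inputs
x = rng.standard_normal(128)
near = x + 0.05 * rng.standard_normal(128)  # small perturbation of x
far = rng.standard_normal(128)              # unrelated vector
codes = lsh_embed(np.stack([x, near, far]), planes)

print(hamming(codes[0], codes[1]), hamming(codes[0], codes[2]))
```

The perturbed vector should land within a few bits of the original, while the unrelated vector should differ in roughly half of the bits.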
Deformable Part Model
Pedro F. Felzenszwalb et al [PAMI 2010]
Neural Network
Tanh & Sigmoid nonlinear functions
Convolutional Neural Network
LeCun 1989
Krizhevsky et al. [NIPS2012]
ReLU
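For reference, the three nonlinearities named on the neural network slides can be written in a few lines of NumPy. This is a plain definition sketch, not code from any of the cited networks.

```python
import numpy as np

def sigmoid(x):
    """Logistic function, squashes inputs into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    """Rectified linear unit: passes positives, zeroes out negatives."""
    return np.maximum(0.0, x)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))   # saturates towards 0 and 1 for large |x|
print(np.tanh(x))   # saturates towards -1 and 1 for large |x|
print(relu(x))      # does not saturate for positive inputs
```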
State of the Art
GoogLeNet 2014
MSRA 2014
Deepness Table
Convolutional Neural Network & Deformable Part Model use max pooling; the others use sum pooling, i.e. histogram pooling
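The difference between the two pooling styles can be seen on a tiny feature map. The helper `pool` and the 4 x 4 example values are made up for illustration.

```python
import numpy as np

def pool(feature_map, size, reduce):
    """Apply `reduce` over non-overlapping size x size windows."""
    h, w = feature_map.shape
    # Drop any rows or columns that do not fill a complete window.
    blocks = feature_map[:h - h % size, :w - w % size]
    blocks = blocks.reshape(h // size, size, w // size, size)
    return reduce(blocks, axis=(1, 3))

fmap = np.array([[1, 2, 0, 1],
                 [3, 4, 1, 0],
                 [0, 0, 5, 6],
                 [1, 0, 7, 8]], dtype=float)

max_pooled = pool(fmap, 2, np.max)  # keeps the strongest response per window
sum_pooled = pool(fmap, 2, np.sum)  # accumulates responses, like a histogram bin
print(max_pooled)
print(sum_pooled)
```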
Image Representation Usage Guide
[Chart: representation deepness versus the hardware needed to run it]
Vertical axis: Deepness (1 to 5)
Horizontal axis: iPhone 5s, Tegra K1 or PC, GeForce Titan, HPC
Compute labels: 45 gflops, 370 gflops, 5.1 teraflops
Regions: Real Time Application, Interactive Application
Representations plotted: Color Histogram; SIFT/HOG; Bag of Visual Word; BoW + SPM, Deformable Part Model; Convolutional Neural Network
gflops are for single precision; PC: i7 3.5 GHz, 4 cores, parallel computing
TRAINING TIME NOT INCLUDED
Some Tips
❖ GPU ~= 50 CPU Cores
❖ Hand-crafted features are shallow; higher-level feature templates need to be learnt
❖ Do Dimensionality Reduction
❖ Deeper Features, More Training Data
❖ Handle Invariance: Registration vs Spatial Pooling
❖ The Learnt Deep Representation (CNN) is shareable