Image representation usage guide

Lu, Wang-Chou

2014/10/01 @ Linkou (林口)

Why Image Representation?

Machine Learning Course @ Caltech

Xi = 500 x 375 D

Low Dimensional Vector

Map of Representation

Robust

Computation Cost

Color Histogram, PCA, Sparse Coding, Bag of Visual Words, DPM, Deep Learning…

Outline

❖ Hand Crafted Features

❖ Machine Learning Approach

❖ Hierarchical Approaches

❖ When to use? Real-time vs Precision

Hand Crafted Features

❖ Color Histogram, Template, Haar Features

❖ Interest Point Detector + HOG

❖ Bag of Visual Words

Simple Features

Histogram Based

Haar Features

Template Based
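As a rough illustration of the histogram-based feature, the sketch below turns an image into a joint RGB histogram. The function name `color_histogram`, the choice of 8 bins per channel, and the random test image are assumptions for illustration only; the slides do not specify them. With these settings, a 500 x 375 RGB image (562,500 raw values) becomes a 512-D vector.

```python
import numpy as np

def color_histogram(image, bins_per_channel=8):
    """Joint RGB histogram of an H x W x 3 uint8 image, L1-normalized."""
    pixels = image.reshape(-1, 3).astype(np.int64)
    # Map each 0..255 channel value to a bin index 0..bins_per_channel-1.
    idx = pixels * bins_per_channel // 256
    # Combine the three per-channel indices into one joint bin index.
    flat = (idx[:, 0] * bins_per_channel + idx[:, 1]) * bins_per_channel + idx[:, 2]
    hist = np.bincount(flat, minlength=bins_per_channel ** 3).astype(np.float64)
    return hist / hist.sum()

# Hypothetical input: a random image with the 500 x 375 size used earlier.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(375, 500, 3), dtype=np.uint8)
feature = color_histogram(image)
print(feature.shape)  # (512,)
```

The histogram discards pixel positions entirely, which is what makes it cheap and robust to translation, and also what limits its discriminative power.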

SIFT Like Approach

HOG, 3780D: 7 x 15 overlapping blocks * (normalized 2 x 2 cell grid) * 9 bins. N. Dalal et al. [CVPR 2005]

SIFT, 128D, |V| = 1: 4 x 4 grid * 8 bins. David Lowe [IJCV 2004]
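The two descriptor lengths can be checked with a few lines of arithmetic. The 64 x 128 detection window, 8-pixel cells, and 8-pixel block stride used below are assumptions taken from the usual Dalal and Triggs pedestrian setup; the slide only states the 7 x 15, 2 x 2, and 9-bin factors. The helper names are hypothetical.

```python
def hog_length(window=(64, 128), cell=8, block=2, stride=8, bins=9):
    """Length of a HOG descriptor built from overlapping, normalized blocks."""
    block_px = block * cell                            # block side in pixels
    blocks_x = (window[0] - block_px) // stride + 1    # 7 for the defaults
    blocks_y = (window[1] - block_px) // stride + 1    # 15 for the defaults
    return blocks_x * blocks_y * block * block * bins

def sift_length(grid=4, bins=8):
    """Length of a SIFT descriptor: grid x grid spatial cells, one histogram each."""
    return grid * grid * bins

print(hog_length())   # 3780
print(sift_length())  # 128
```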

Bag of Visual Words + SPM

SIFT Descriptor

S. Lazebnik et al. [CVPR 2006]
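A minimal sketch of the pipeline, assuming local descriptors and their normalized image positions are already available: each descriptor is assigned to its nearest visual word, and word histograms are collected over a 1 x 1, 2 x 2, and 4 x 4 grid. The function names and the random data are hypothetical, the codebook is a stand-in for one learnt by k-means, and the per-level weights of the original SPM kernel are omitted for brevity.

```python
import numpy as np

def assign_words(descriptors, codebook):
    """Index of the nearest codeword (Euclidean) for each descriptor."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def spatial_pyramid(words, xy, vocab_size, levels=2):
    """Concatenate per-cell word histograms over grids of 1x1, 2x2, 4x4, ...

    xy holds keypoint positions normalized to [0, 1) in both axes.
    """
    feats = []
    for level in range(levels + 1):
        cells = 2 ** level
        cx = np.minimum((xy[:, 0] * cells).astype(int), cells - 1)
        cy = np.minimum((xy[:, 1] * cells).astype(int), cells - 1)
        cell_id = cy * cells + cx
        # One histogram of visual words per spatial cell, stored back to back.
        hist = np.bincount(cell_id * vocab_size + words,
                           minlength=cells * cells * vocab_size)
        feats.append(hist)
    return np.concatenate(feats).astype(np.float64)

# Hypothetical data: 200 SIFT-like 128-D descriptors with random positions.
rng = np.random.default_rng(0)
descriptors = rng.standard_normal((200, 128))
xy = rng.random((200, 2))
codebook = descriptors[:10].copy()   # stand-in for a k-means codebook
words = assign_words(descriptors, codebook)
feature = spatial_pyramid(words, xy, vocab_size=10, levels=2)
print(feature.shape)  # (210,)
```

Level 0 alone is the plain bag of visual words; the finer levels add back coarse spatial layout at the cost of a 21x longer vector.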

Machine Learning Approach

❖ Dimensionality Reduction

❖ PCA, Manifold Learning, Sparse Coding, LSH

❖ Deformable Part Model

❖ Neural Network

❖ Convolutional Neural Network

Principal Component Analysis

M. A. Turk et al. [CVPR 1991]
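A minimal PCA sketch via the SVD of the centered data matrix, in the spirit of the eigenface approach. The function names and the random stand-in data are assumptions; real use would put one flattened image per row.

```python
import numpy as np

def pca_fit(X, k):
    """Return the data mean and the top-k principal directions (rows)."""
    mean = X.mean(axis=0)
    # Rows of Vt are orthonormal directions sorted by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def pca_project(X, mean, components):
    """Project samples onto the principal directions."""
    return (X - mean) @ components.T

# Hypothetical data: 50 samples of 300-D vectors reduced to 5-D.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 300))
mean, components = pca_fit(X, k=5)
Z = pca_project(X, mean, components)
print(Z.shape)  # (50, 5)
```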

Manifold Learning

[ISOMAP, LLE 2003]

Sparse Coding

H Lee et al. [NIPS 2007]

Objective terms: reconstruction error, sparsity

Y: Input Vector

B: Basis Matrix

Z: Weight
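With Y the input vector, B the basis matrix, and Z the weights, a common way to write the two terms is 0.5 * ||Y - B Z||^2 (reconstruction error) plus lam * ||Z||_1 (sparsity). This exact form, the weight `lam`, and the solver are assumptions: the sketch below uses plain ISTA (iterative soft thresholding) with a fixed basis, which is not the feature-sign search algorithm of the cited paper.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink each entry toward zero by t (proximal operator of the L1 norm)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(Y, B, Z, lam):
    """Reconstruction error plus L1 sparsity penalty."""
    return 0.5 * np.sum((Y - B @ Z) ** 2) + lam * np.sum(np.abs(Z))

def sparse_code(Y, B, lam=0.1, n_iter=200):
    """Estimate sparse weights Z for a fixed basis B with ISTA."""
    step = 1.0 / np.linalg.norm(B, 2) ** 2   # 1 / largest eigenvalue of B^T B
    Z = np.zeros(B.shape[1])
    for _ in range(n_iter):
        grad = B.T @ (B @ Z - Y)             # gradient of the reconstruction term
        Z = soft_threshold(Z - step * grad, step * lam)
    return Z

# Hypothetical data: an overcomplete basis and a signal built from 3 atoms.
rng = np.random.default_rng(0)
B = rng.standard_normal((20, 50))
B /= np.linalg.norm(B, axis=0)               # unit-norm basis vectors
Z_true = np.zeros(50)
Z_true[[3, 17, 42]] = [1.5, -2.0, 1.0]
Y = B @ Z_true
Z = sparse_code(Y, B, lam=0.1)
print(objective(Y, B, Z, 0.1), np.count_nonzero(Z))
```

Learning the basis B itself alternates this step with an update of B; only the coding step is shown here.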

Locality Sensitive Hashing Embedding
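One common LSH family, shown here as an assumption since the slide does not name a specific scheme, is random hyperplane hashing: each bit records on which side of a random hyperplane the vector falls, so vectors separated by a small angle agree on most bits. All names below are hypothetical.

```python
import numpy as np

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplane normals in dim dimensions."""
    return np.random.default_rng(seed).standard_normal((n_bits, dim))

def lsh_code(x, planes):
    """Binary code: one bit per hyperplane, set when x lies on its positive side."""
    return (planes @ x > 0).astype(np.uint8)

def hamming(a, b):
    """Number of differing bits between two codes."""
    return int(np.count_nonzero(a != b))

planes = make_hyperplanes(dim=64, n_bits=16)
x = np.random.default_rng(1).standard_normal(64)
code = lsh_code(x, planes)
print(hamming(code, lsh_code(2.0 * x, planes)))  # same direction: 0
print(hamming(code, lsh_code(-x, planes)))       # opposite direction: 16
```

The resulting short binary codes can be compared with Hamming distance, which is far cheaper than Euclidean distance on the original vectors.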

Deformable Part Model

Pedro F. Felzenszwalb et al [PAMI 2010]

Neural Network

Tanh & Sigmoid nonlinear functions
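A small sketch of the two nonlinearities and of a single fully connected layer that applies one of them. The layer sizes and the names `sigmoid` and `layer` are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    """Logistic function, output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def layer(x, W, b, activation=np.tanh):
    """One fully connected layer: affine map followed by a nonlinearity."""
    return activation(W @ x + b)

# Hypothetical layer: 4 inputs, 3 hidden units.
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))
b = np.zeros(3)
x = rng.standard_normal(4)
hidden = layer(x, W, b)                       # tanh output, in (-1, 1)
hidden_sig = layer(x, W, b, activation=sigmoid)
print(hidden.shape)  # (3,)
```

The two functions are closely related: tanh(x) equals 2 * sigmoid(2x) - 1, so tanh is a rescaled sigmoid centered at zero.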

Convolutional Neural Network

LeCun 1989

Krizhevsky et al. [NIPS2012]
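The core operation of a convolutional layer can be sketched as sliding a small filter over the image and applying a nonlinearity to each response. This is a deliberately naive, single-channel version for illustration; the function name and the toy edge filter are assumptions, and real networks learn many filters per layer.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products.

    As in most CNN code, the kernel is not flipped (cross-correlation).
    """
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy image: dark left half, bright right half.
image = np.array([[0, 0, 1, 1]] * 4, dtype=float)
kernel = np.array([[-1.0, 1.0]])          # responds to left-to-right brightening
response = conv2d_valid(image, kernel)    # 4 x 3 map, 1 in the middle column
feature_map = np.tanh(response)           # nonlinearity applied elementwise
print(response)
```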

State of the Art

GoogLeNet 2014

MSRA 2014

Deepness Table

Convolutional Neural Network & Deformable Part Model use max pooling; others use sum pooling (that is, histogram pooling)
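The difference between the two pooling styles can be seen on a toy response map. The helper `pool2d` and the 4 x 4 example are assumptions for illustration; it pools over non-overlapping windows only.

```python
import numpy as np

def pool2d(fmap, size, reduce):
    """Pool a 2-D map over non-overlapping size x size windows."""
    h2, w2 = fmap.shape[0] // size, fmap.shape[1] // size
    # Split rows and columns into (window index, offset inside window).
    blocks = fmap[:h2 * size, :w2 * size].reshape(h2, size, w2, size)
    return reduce(blocks, axis=(1, 3))

fmap = np.array([[1, 2, 0, 1],
                 [3, 4, 1, 0],
                 [0, 0, 5, 6],
                 [1, 0, 7, 8]], dtype=float)

max_pooled = pool2d(fmap, 2, np.max)   # [[4, 1], [1, 8]]
sum_pooled = pool2d(fmap, 2, np.sum)   # [[10, 2], [1, 26]]
print(max_pooled)
print(sum_pooled)
```

Max pooling keeps only the strongest response in each window, while sum pooling accumulates all of them, which is what a histogram over a region does.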

Image Representation Usage Guide

Hardware (single-precision throughput):

iPhone 5s: 45 GFLOPS

Tegra K1 or PC: 370 GFLOPS

GeForce Titan HPC: 5.1 TFLOPS

Application types: Real Time Application, Interactive Application

Representations, in order of increasing deepness:

Color Histogram

SIFT/HOG

Bag of Visual Words

BoW + SPM, Deformable Part Model

Convolutional Neural Network

GFLOPS figures are for single precision. PC: i7 3.5 GHz, 4 cores, parallel computing.

TRAINING TIME NOT INCLUDED
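To relate the throughput figures to the real-time question, a back-of-the-envelope budget helps: peak FLOPS divided by the target frame rate bounds the arithmetic available per frame. The 30 fps target, the assumption of full utilization, the helper name, and the pairing of each figure with a device in the order listed are all assumptions; real workloads reach only a fraction of peak.

```python
def flop_budget_per_frame(peak_flops, fps=30.0, utilization=1.0):
    """Upper bound on floating-point operations available for one frame."""
    return peak_flops * utilization / fps

# Peak single-precision figures from the guide, paired in listed order.
devices = {
    "iPhone 5s": 45e9,
    "Tegra K1 or PC": 370e9,
    "GeForce Titan HPC": 5.1e12,
}

budgets = {name: flop_budget_per_frame(peak) for name, peak in devices.items()}
for name, budget in budgets.items():
    print(f"{name}: {budget / 1e9:.1f} GFLOP per frame at 30 fps")
```

A representation whose per-image cost exceeds this budget on the target device cannot run in real time there, whatever its accuracy.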

Some Tips

❖ GPU ~= 50 CPU Cores

❖ Hand-crafted features are shallow; higher-level feature templates need to be learnt.

❖ Do Dimensionality Reduction

❖ Deeper features need more training data

❖ Handle Invariance: Registration vs Spatial Pooling

❖ The learnt deep representation (CNN) is shareable
