Page 1

Hands-on course in deep neural networks for vision

Instructors: Michal Irani, Ronen Basri

Teaching Assistants: Ita Lifshitz, Ethan Fetaya, Amir Rosenfeld

Course website: http://www.wisdom.weizmann.ac.il/~vision/courses/2016_1/DNN/index.html

Page 2

Page 3

Schedule (fall semester)

4/11/2015   Introduction
18/11/2015  Tutorial on Caffe, exercise 1
9/12/2015   Q&A on exercise 1
16/12/2015  Submission of exercise 1 (no class)
6/1/2016    Student presentations of recent work
27/1/2016   Project selection

Page 4

Nerve cells under a microscope

Page 5

1836  First microscopic image of a nerve cell (Valentin)
1838  First visualization of axons (Remak)
1862  First description of the neuromuscular junction (Kühne)
1873  Introduction of silver-chromate technique as staining procedure (Golgi)
1888  Birth of the neuron doctrine: the nervous system is made up of independent cells (Cajal)
1891  The term “neuron” is coined (Waldeyer)
1897  Concept of synapse (Sherrington)
1906  Nobel Prize: Cajal and Golgi
1921  Nobel Prize: Sherrington

Page 6

Nerve cell

Page 7

Synapse

Page 8

Action potential

Page 9

• The human brain contains ~86 billion nerve cells
• The DNA cannot encode a different function for each cell
• Therefore,
  • Each cell must perform a simple computation
  • The overall computations are achieved by ensembles of neurons (“connectionism”)
  • These computations can be changed dynamically by learning (“plasticity”)

Page 10

A Logical Calculus of the Ideas Immanent in Nervous Activity

Warren S. McCulloch and Walter Pitts
Bulletin of Mathematical Biophysics, Volume 5, 1943

“We shall make the following physical assumptions for our calculus.
1. The activity of the neuron is an "all-or-none" process.
2. A certain fixed number of synapses must be excited within the period of latent addition in order to excite a neuron at any time, and this number is independent of previous activity and position on the neuron.
3. The only significant delay within the nervous system is synaptic delay.
4. The activity of any inhibitory synapse absolutely prevents excitation of the neuron at that time.
5. The structure of the net does not change with time.”

Page 11

The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain

Frank Rosenblatt
Psychological Review, Vol. 65, No. 6, 1958

Page 12

A simplified view of neural computation

• A nerve cell accumulates electric potential at its dendrites
• This accumulation is due to the flux of neurotransmitters from neighboring (input) neurons
• The strength of this potential depends on the efficacy of the synapse and can be positive (“excitatory”) or negative (“inhibitory”)
• Once sufficient electric potential is accumulated (exceeding a certain threshold), one or more action potentials (spikes) are produced; they then travel through the nerve axon and affect nearby (output) neurons

Page 13

The perceptron

• Input: 𝑥1, …, 𝑥𝑑
• Weights: 𝑤1, …, 𝑤𝑑
• Output: ŷ
• This is a linear classifier
• Implemented with one layer of weights + threshold (Heaviside step activation); a sketch follows the figure below

[Figure: perceptron diagram – inputs 𝑥1, 𝑥2, …, 𝑥𝑑 feeding a single output unit through weights 𝑤1, 𝑤2, …]
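For concreteness, a minimal numpy sketch of a perceptron forward pass, assuming a Heaviside step applied to a weighted sum plus a bias:

```python
import numpy as np

def heaviside(z):
    """Heaviside step activation: 1 if z >= 0, else 0."""
    return np.float64(z >= 0)

def perceptron_forward(x, w, b):
    """Linear classifier: threshold the weighted sum of the inputs."""
    return heaviside(np.dot(w, x) + b)

# Example: a perceptron computing the logical AND of two binary inputs
w, b = np.array([1.0, 1.0]), -1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron_forward(np.array(x, dtype=float), w, b))
```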

Page 14

Multi-layer Perceptron

• Linear classifiers are very limited; e.g., XOR configurations cannot be classified

Page 15

Multi-layer Perceptron

• Linear classifiers are very limited; e.g., XOR configurations cannot be classified

Solution: a multi-layer, feed-forward perceptron

[Figure: multi-layer perceptron – input layer 𝑥1, 𝑥2, …, 𝑥𝑑; hidden layer 1 with units h1, h2, …, h𝑚; hidden layer 2 with units h′1, h′2, …, h′𝑚′; output layer ŷ1, ŷ2, …, ŷ𝑘]

The non-linearity is called the “activation function”; a sketch of the forward pass follows below
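A minimal sketch of the forward pass of such a feed-forward network, assuming fully connected layers and a sigmoid activation (the layer sizes are arbitrary):

```python
import numpy as np

def sigmoid(z):
    """Logistic activation."""
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, params):
    """Forward pass: each layer computes activation(W @ h + b) on the previous layer's output."""
    h = x
    for W, b in params:
        h = sigmoid(W @ h + b)
    return h

# Example: input dimension d = 3, hidden sizes m = 4 and m' = 4, k = 2 outputs
rng = np.random.default_rng(0)
sizes = [3, 4, 4, 2]
params = [(rng.normal(size=(n_out, n_in)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]
print(mlp_forward(rng.normal(size=3), params))
```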

Page 16

Activation functions

• Heaviside (threshold)
• Sigmoid (or logistic): 𝜎(𝑥) = 1 / (1 + 𝑒^(−𝑥))
• Rectified linear unit (ReLU): max(𝑥, 0)
• Max pooling: max over a group of inputs (a numpy sketch of all four follows below)
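A short numpy sketch of these activations; the max-pooling example assumes pooling over non-overlapping groups of inputs:

```python
import numpy as np

def heaviside(z):
    return np.float64(z >= 0)               # threshold at zero

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))         # logistic, squashes to (0, 1)

def relu(z):
    return np.maximum(z, 0.0)               # rectified linear unit

def max_pool_1d(z, pool=2):
    """Max over non-overlapping groups of `pool` consecutive values."""
    return z.reshape(-1, pool).max(axis=1)

z = np.array([-2.0, -0.5, 0.5, 2.0])
print(heaviside(z), sigmoid(z), relu(z), max_pool_1d(z))
```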

Page 17

Multi-layer Perceptron

• A perceptron with one hidden layer (and enough hidden units) can approximate any smooth function to arbitrary accuracy

Page 18

Supervised learning

• Objective: given labeled training examples, learn a map from input to output
• Types of problems:
  • Classification – each input is mapped to one of a discrete set of labels, e.g., {person, bike, car}
  • Regression – each input is mapped to a number, e.g., a viewing angle
• How? Given a network architecture, find a set of weights that minimizes a loss function on the training data, e.g., the −log likelihood of the class given the input
• Generalization vs. overfitting

Page 19

Generalization vs. overfitting

• We want to minimize the loss on the test data, but the test data is not available during training
• Use a validation set to make sure you don’t overfit

[Figure: fitting sample points in the (𝑥, 𝑦) plane – an illustration of generalization vs. overfitting]

Page 20

Generalization vs. overfitting

• We want to minimize the loss on the test data, but the test data is not available during training
• Use a validation set to make sure you don’t overfit (an early-stopping sketch follows below)

[Figure: loss vs. iteration, showing the loss on the training set and the loss on the validation set]
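One common way to act on these curves is early stopping: keep training while the validation loss improves and stop once it stalls. A minimal sketch, with the actual training step and validation loss left as stubs:

```python
import numpy as np

def train_with_early_stopping(train_step, val_loss, max_iters=10_000, patience=5):
    """Stop when the validation loss has not improved for `patience` consecutive checks.

    train_step(): performs one optimization step on the training set.
    val_loss():   returns the current loss on the held-out validation set.
    """
    best, waited = np.inf, 0
    for it in range(max_iters):
        train_step()
        loss = val_loss()
        if loss < best:
            best, waited = loss, 0
        else:
            waited += 1
            if waited >= patience:
                print(f"stopping at iteration {it}, best validation loss {best:.4f}")
                break
    return best

# Toy usage: a "validation loss" that improves for a while and then degrades
state = {"t": 0}
def fake_train_step(): state["t"] += 1
def fake_val_loss(): t = state["t"]; return (t - 50) ** 2 / 2500 + 0.1
print(train_with_early_stopping(fake_train_step, fake_val_loss))
```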

Page 21

Training DNNs: back propagation

• A network is a function of its input, parameterized by a set of weights
• Objective: modify the weights to improve the network’s predictions on the training (and validation) data
• The quality of the predictions is measured via a loss function
• Back propagation: minimize the loss by gradient descent
• The gradient is computed in a backward pass by applying the chain rule

Page 22

Calculating the gradient

• Loss: a function of the network’s outputs and the target labels
• Activation: logistic, 𝜎(𝑧) = 1 / (1 + 𝑒^(−𝑧))
  (recall that 𝜎′(𝑧) = 𝜎(𝑧)(1 − 𝜎(𝑧))); a worked example follows the figure below

[Figure: two-layer network – inputs 𝑥1, …, 𝑥𝑑, first-layer weights 𝑤11, …, 𝑤𝑑𝑑′, hidden units 𝑥′1, …, 𝑥′𝑑′, second-layer weights 𝑤′11, …, 𝑤′𝑑′𝑘, and outputs ŷ1, …, ŷ𝑘]
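A worked example of the chain rule for a single logistic unit, assuming a squared-error loss (an illustrative choice), with a finite-difference check of the analytic gradient:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, x, y):
    """Squared-error loss of one logistic unit: L = 0.5 * (sigmoid(w.x) - y)^2."""
    y_hat = sigmoid(np.dot(w, x))
    return 0.5 * (y_hat - y) ** 2

def grad(w, x, y):
    """Chain rule: dL/dw = (y_hat - y) * sigma'(w.x) * x, with sigma' = sigma * (1 - sigma)."""
    y_hat = sigmoid(np.dot(w, x))
    return (y_hat - y) * y_hat * (1.0 - y_hat) * x

# Check the analytic gradient against central finite differences
rng = np.random.default_rng(1)
w, x, y = rng.normal(size=3), rng.normal(size=3), 1.0
eps = 1e-6
numeric = np.array([(loss(w + eps * e, x, y) - loss(w - eps * e, x, y)) / (2 * eps)
                    for e in np.eye(3)])
print(np.allclose(grad(w, x, y), numeric))
```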

Page 23

Calculating the gradient

• The derivatives are computed recursively, starting from the output layer and propagating backward through the network

[Figure: the same two-layer network, with the gradient propagated backward from the outputs ŷ1, …, ŷ𝑘 through the weights 𝑤′ and 𝑤 toward the inputs]

Page 24

Training algorithm

• Initialize with random weights
• Forward pass: given a training input vector, apply the network to it and store all intermediate results
• Backward pass: starting from the top, recursively use the chain rule to calculate the derivatives at all nodes, and use those to calculate the derivatives with respect to the weights on all edges
• Repeat for all training vectors; the gradient is the sum of these per-example derivatives over all edges
• Weight update: take a gradient-descent step, 𝑤 ← 𝑤 − 𝜂 ∂𝐿/∂𝑤, with learning rate 𝜂 (a sketch of the full loop follows below)
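A compact sketch of this procedure for a one-hidden-layer network, assuming sigmoid activations and a squared-error loss (layer sizes, learning rate, and the XOR toy data are illustrative choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, Y, hidden=8, lr=0.5, epochs=10_000, seed=0):
    """Full-batch gradient descent with hand-written backpropagation."""
    rng = np.random.default_rng(seed)
    d, k = X.shape[1], Y.shape[1]
    W1, b1 = rng.normal(0, 0.5, (hidden, d)), np.zeros(hidden)
    W2, b2 = rng.normal(0, 0.5, (k, hidden)), np.zeros(k)
    for _ in range(epochs):
        # Forward pass: store intermediate results for the backward pass
        H = sigmoid(X @ W1.T + b1)            # hidden activations
        Y_hat = sigmoid(H @ W2.T + b2)        # network outputs
        # Backward pass: chain rule, starting from the squared-error loss
        dZ2 = (Y_hat - Y) * Y_hat * (1 - Y_hat)
        dW2, db2 = dZ2.T @ H, dZ2.sum(axis=0)
        dZ1 = (dZ2 @ W2) * H * (1 - H)
        dW1, db1 = dZ1.T @ X, dZ1.sum(axis=0)
        # Weight update: gradient-descent step
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    return W1, b1, W2, b2

# Toy usage: learn XOR, which a single-layer perceptron cannot represent
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)
W1, b1, W2, b2 = train(X, Y)
print(np.round(sigmoid(sigmoid(X @ W1.T + b1) @ W2.T + b2), 2))   # approximately [[0], [1], [1], [0]]
```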

Page 25

Stochastic gradient descent

• Gradient descent requires computing the gradient of the (full) loss over all of the training data at every step; with a large training set this is expensive
• Approach: compute the gradient over a sample (“mini-batch”), usually by re-shuffling the training set; going once through the entire training data is called an epoch
• If the learning rate decreases appropriately, then under mild assumptions this converges almost surely to a local minimum
• Momentum: pass the gradient from one iteration to the next (with decay), e.g., 𝑣 ← 𝜇𝑣 − 𝜂∇𝐿 and 𝑤 ← 𝑤 + 𝑣 (one standard form; a sketch follows below)
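A minimal sketch of mini-batch SGD with momentum; the batch size, decay 𝜇, learning rate, and the least-squares toy problem are illustrative assumptions:

```python
import numpy as np

def sgd_momentum(grad_fn, w, data, lr=0.01, mu=0.9, batch_size=32, epochs=10, seed=0):
    """Mini-batch SGD with momentum.

    grad_fn(w, batch) returns the gradient of the loss on one mini-batch.
    """
    rng = np.random.default_rng(seed)
    v = np.zeros_like(w)
    n = len(data)
    for _ in range(epochs):
        order = rng.permutation(n)                 # re-shuffle the training set
        for start in range(0, n, batch_size):      # one full pass = one epoch
            batch = data[order[start:start + batch_size]]
            v = mu * v - lr * grad_fn(w, batch)    # momentum: decayed running gradient
            w = w + v
    return w

# Toy usage: fit w to minimize the mean squared error of a linear model on random data
rng = np.random.default_rng(1)
X, w_true = rng.normal(size=(1000, 5)), np.array([1.0, -2.0, 0.5, 3.0, 0.0])
y = X @ w_true + 0.01 * rng.normal(size=1000)
data = np.hstack([X, y[:, None]])
grad_fn = lambda w, b: 2 * b[:, :5].T @ (b[:, :5] @ w - b[:, 5]) / len(b)
print(np.round(sgd_momentum(grad_fn, np.zeros(5), data), 2))
```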

Page 26

Regularization: dropout

• At training time we randomly eliminate half of the nodes in the network
• At test time we use the full network, but each weight is halved
• This spreads the representation of the data over multiple nodes
• Informally, it is equivalent to training many different networks and using their average response (a sketch follows below)
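A sketch of dropout for a single layer's activations; the 0.5 drop probability follows the slide, the rest is illustrative (scaling the activations at test time by the keep probability is equivalent to halving the weights feeding the next layer):

```python
import numpy as np

def dropout_layer(h, train, p_keep=0.5, rng=None):
    """Apply dropout to a layer's activations h.

    Training: each node is kept with probability p_keep (here 0.5) and zeroed otherwise.
    Test: all nodes are kept but scaled by p_keep, which is equivalent to halving the
    weights that feed the next layer.
    """
    if train:
        rng = rng or np.random.default_rng()
        mask = rng.random(h.shape) < p_keep
        return h * mask
    return h * p_keep

h = np.ones(8)
print(dropout_layer(h, train=True, rng=np.random.default_rng(0)))   # roughly half the nodes zeroed
print(dropout_layer(h, train=False))                                # all nodes kept, scaled by 0.5
```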

Page 27

Invariance

• Invariance to translation, rotation, and scale, as well as to some transformations of intensity, is often desired
• It can be achieved by perturbing the training set (data augmentation) – useful for small transformations, but expensive (a sketch follows below)
• Translation invariance is further achieved with max pooling and/or convolution
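A small data-augmentation sketch that perturbs a grayscale image with random shifts, horizontal flips, and intensity scaling; the perturbation ranges are arbitrary assumptions:

```python
import numpy as np

def augment(image, rng, max_shift=2):
    """Return a randomly perturbed copy of a 2-D grayscale image with values in [0, 1]."""
    out = image.copy()
    # Small random translation (np.roll wraps pixels around, acceptable for tiny shifts in this toy)
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.roll(out, (dy, dx), axis=(0, 1))
    # Random horizontal reflection
    if rng.random() < 0.5:
        out = out[:, ::-1]
    # Random intensity scaling
    out = np.clip(out * rng.uniform(0.8, 1.2), 0.0, 1.0)
    return out

rng = np.random.default_rng(0)
image = rng.random((8, 8))
augmented = [augment(image, rng) for _ in range(4)]   # four perturbed training copies
print(augmented[0].shape)
```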

Page 28

Invariance: convolutional neural nets

• Translation invariance can be achieved by conv-nets
• A conv-net learns a linear filter at the first level, and non-linear ones higher up (a 1-D sketch follows the figure below)
• SIFT is an example of a useful non-linear filter, but a learned filter may be more suitable
• Inspired by the local receptive fields in the V1 visual cortex area
• Conv-nets were applied successfully to digit recognition in the 90’s (LeCun et al. 1998), but at the time did not scale well to other kinds of images

[Figure: 1-D convolutional layer – inputs 𝑥1, 𝑥2, …, 𝑥𝑑 connected to units 𝑥′1, 𝑥′2, 𝑥′3, …, 𝑥′𝑑′ through local receptive fields; weight-sharing: edges of the same color carry the same weight]
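A sketch of weight sharing in a 1-D convolutional layer, assuming a 3-tap filter and a ReLU on top (both illustrative):

```python
import numpy as np

def conv1d_layer(x, w, b=0.0):
    """1-D convolutional layer: the same small filter w is applied at every position,
    so all positions share the same few weights; a ReLU is applied on top."""
    return np.maximum(np.convolve(x, w[::-1], mode="valid") + b, 0.0)

x = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0])
w = np.array([-1.0, 0.0, 1.0])        # a learned 3-tap filter would replace this
print(conv1d_layer(x, w))             # one output per valid position, all produced by the same w
```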

Page 29

Alex-net
Krizhevsky, Sutskever, Hinton 2012

• Trained and tested on Imagenet: 1.2M training images, 50K validation, 150K test; 1000 categories
• Loss: softmax – the top layer has one node per category (here 1000), and the softmax function renormalizes the top-layer values into a probability distribution over the categories
• The network maximizes the multinomial logistic regression objective, i.e., the average log-probability of the correct class over the training images (a sketch follows below)
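A sketch of the softmax layer and the corresponding negative log-likelihood loss, assuming the standard softmax formula:

```python
import numpy as np

def softmax(z):
    """Renormalize top-layer scores z into a probability distribution:
    p_i = exp(z_i) / sum_j exp(z_j), computed in a numerically stable way."""
    e = np.exp(z - z.max())
    return e / e.sum()

def nll(z, label):
    """Negative log-likelihood of the correct class; minimizing it maximizes the
    multinomial logistic regression objective."""
    return -np.log(softmax(z)[label])

z = np.array([2.0, 0.5, -1.0])     # scores for 3 categories (1000 in Alex-net)
print(softmax(z), nll(z, label=0))
```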

Page 30

Alex-net

• Activation: ReLU
• Data augmentation
  • Translation
  • Horizontal reflection
  • Gray-level shifts along the principal components of the pixel values
• Dropout
• Momentum

[Figure: comparison of tanh and ReLU activations]

Page 31

Alex-net

Page 32

Alex-net: results

• ILSVRC-2010
• ILSVRC-2012 (table below)
• More recent networks reduced the error to ~7%

Team name                Error (5 guesses)   Description
SuperVision              0.15315             Using extra training from ImageNet 2011
SuperVision              0.16422             Using only supplied training data
ISI                      0.26172             Weighted sum of scores: SIFT+FV, LBP+FV, GIST+FV, and CSIFT+FV
OXFORD_VGG               0.26979             Mixed selection from High-Level SVM and Baseline Scores
XRCE/INRIA               0.27058
University of Amsterdam  0.29576             Baseline: SVM trained on Fisher Vectors over Dense SIFT and Color Statistics

Page 33

Alex-net: results

(Krizhevsky et al. 2012)

Page 34

Applications

• Image classification
• Face recognition
• Object detection
• Object tracking
• Low-level vision
  • Optical flow, stereo
  • Super-resolution, de-blurring
  • Edge detection
  • Image segmentation
• Attention and saliency
• Image and video captioning
• …

Page 35

Face recognition

• Google’s conv-net is trained on 260M face images
• It achieved 99.63% accuracy on LFW (a face comparison database), and some of its mistakes turned out to be labeling mistakes
• Available in Google Photos

(Schroff et al. 2015)

Page 36

Object detection

• The latest methods achieve an average precision of about 60% on PASCAL VOC 2007 and 44% on Imagenet ILSVRC 2014

(He et al. 2015)

Page 37

Unsupervised learning

• Find structure in the data
• Types of problems:
  • Clustering
  • Density estimation
  • Dimensionality reduction

Page 38

Auto-encoders

• Produce a compact representation of the training data (a sketch follows the figure below)
• Analogous to PCA
• Note that the identity transformation may be a valid (but undesired) solution
• Can be initialized by training a Restricted Boltzmann Machine

[Figure: auto-encoder – input 𝑥1, 𝑥2, …, 𝑥𝑑; hidden layer h1, h2, …, h𝑚; output ŷ1, ŷ2, …, ŷ𝑑 trained so that “Output = Input”]
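A minimal auto-encoder sketch, assuming a single linear hidden layer trained with squared reconstruction error (which makes the PCA analogy explicit); the sizes and learning rate are illustrative:

```python
import numpy as np

def train_autoencoder(X, m=2, lr=0.01, epochs=1000, seed=0):
    """Linear auto-encoder: encode to m units, decode back to d, minimize ||X_hat - X||^2."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W_enc, W_dec = rng.normal(0, 0.1, (m, d)), rng.normal(0, 0.1, (d, m))
    for _ in range(epochs):
        H = X @ W_enc.T                        # compact hidden representation
        X_hat = H @ W_dec.T                    # reconstruction of the input
        E = X_hat - X
        W_dec -= lr * E.T @ H / len(X)
        W_enc -= lr * (E @ W_dec).T @ X / len(X)
    return W_enc, W_dec

# Toy usage: 5-D data that actually lies in a 2-D subspace
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5))
W_enc, W_dec = train_autoencoder(X)
print(np.mean((X @ W_enc.T @ W_dec.T - X) ** 2))   # small reconstruction error
```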

Page 39

Recurrent networks: Hopfield net
John Hopfield, 1982

• Fully connected network
• Time dynamics: starting from an input, apply the network repeatedly until convergence
• Update: 𝑥𝑖 ← sign(Σ𝑗 𝑤𝑖𝑗 𝑥𝑗), which minimizes an Ising-model energy
• Weights are set to store preferred states
• “Associative memory” (content-addressable): denoising and completion (a sketch follows below)

[Figure: fully connected network over units 𝑥1, 𝑥2, 𝑥3, 𝑥4, …, 𝑥𝑑]
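A small Hopfield-network sketch, assuming the standard Hebbian storage rule and asynchronous sign updates (one common formulation, not necessarily the slide's exact one):

```python
import numpy as np

def store(patterns):
    """Hebbian rule: w_ij proportional to the correlation of units i and j over the stored states."""
    d = patterns.shape[1]
    W = patterns.T @ patterns / d
    np.fill_diagonal(W, 0.0)                 # no self-connections
    return W

def recall(W, x, iters=20, rng=None):
    """Asynchronous updates x_i <- sign(sum_j w_ij x_j), repeated until (hopefully) convergence."""
    rng = rng or np.random.default_rng(0)
    x = x.copy()
    for _ in range(iters):
        for i in rng.permutation(len(x)):
            x[i] = 1.0 if W[i] @ x >= 0 else -1.0
    return x

# Toy usage: store one +/-1 pattern, then recover it from a corrupted copy (denoising/completion)
rng = np.random.default_rng(2)
pattern = rng.choice([-1.0, 1.0], size=32)
W = store(pattern[None, :])
noisy = pattern * rng.choice([1.0, -1.0], size=32, p=[0.85, 0.15])   # flip ~15% of the bits
print(np.array_equal(recall(W, noisy), pattern))
```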

Page 40

Recurrent networks

• Used, e.g., for image annotation, i.e., producing a descriptive sentence for an input image

[Figure: recurrent network taking the image as input and producing the next word]

Page 41

Recurrent networks (unrolled)

• Used, e.g., for image annotation, i.e., producing a descriptive sentence for an input image

[Figure: the same network unrolled over time – the image is fed in and the network emits the first word, the second word, and so on]

Page 42

(Kiros et al. 2014)

Page 43

Neural network development packages

• Caffe: http://caffe.berkeleyvision.org/

• Matconvnet: http://www.vlfeat.org/matconvnet/

• Torch: https://github.com/torch/torch7/wiki/Cheatsheet

