Lecture 10 recap - Dynamic Vision and Learning
Prof. Leal-Taixé and Prof. Niessner

Transcript


LeNet

• Digit recognition: 10 classes

• Conv -> Pool -> Conv -> Pool -> Conv -> FC
• As we go deeper: width and height decrease, number of filters increases


60k parameters


AlexNet


[Krizhevsky et al. 2012]

• Softmax for 1000 classes


VGGNet

• Striving for simplicity

• CONV = 3x3 filters with stride 1, same convolutions

• MAXPOOL = 2x2 filters with stride 2


[Simonyan and Zisserman 2014]


VGGNet


Conv = 3x3, s=1, same; Maxpool = 2x2, s=2


VGGNet


• Conv -> Pool -> Conv -> Pool -> Conv -> FC
• As we go deeper: width and height decrease, number of filters increases

• Called VGG-16: 16 layers that have weights

• Large, but its simplicity makes it appealing

138M parameters
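The 138M figure can be sanity-checked from the architecture. A minimal sketch, assuming the standard VGG-16 configuration (thirteen 3x3 convolutions plus three fully connected layers on 224x224 inputs):

```python
# Parameter count for VGG-16, assuming the standard layer configuration:
# 'M' = 2x2 max pool (no parameters), numbers = 3x3 conv filter counts.
cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
       512, 512, 512, 'M', 512, 512, 512, 'M']

params, in_ch = 0, 3
for v in cfg:
    if v == 'M':
        continue  # pooling has no parameters
    params += 3 * 3 * in_ch * v + v  # 3x3 kernel weights + biases
    in_ch = v

# Fully connected head: 7x7x512 -> 4096 -> 4096 -> 1000
for n_in, n_out in [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]:
    params += n_in * n_out + n_out

print(f"{params:,}")  # 138,357,544 -> ~138M
```

Note that the vast majority of the parameters sit in the first fully connected layer.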


The problem of depth

• As we add more and more layers, training becomes harder

• Vanishing and exploding gradients

• How can we train very deep nets?


Residual block

• Two layers

Prof. Leal-Taixé and Prof. Niessner 8

(Diagram: input $x_{L-1}$ → linear → non-linearity → $x_L$ → linear → non-linearity → $x_{L+1}$.)

$x_L = f(W_L x_{L-1} + b_L)$

$x_{L+1} = f(W_{L+1} x_L + b_{L+1})$


Residual block

• Two layers

(Diagram: the input $x_{L-1}$ flows through the main path of two linear layers to $x_{L+1}$, while a skip connection carries it around them.)


Residual block

• Two layers

Without the skip connection: $x_{L+1} = f(W_{L+1} x_L + b_{L+1})$

With the skip connection: $x_{L+1} = f(W_{L+1} x_L + b_{L+1} + x_{L-1})$


Residual block

• Two layers

• Usually use a same convolution, since we need the same dimensions

• Otherwise we need to convert the dimensions with a matrix of learned weights or with zero padding
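The residual block can be sketched in a few lines of NumPy. This is an illustrative toy (random weights, hypothetical dimension d=8, f = ReLU), not the full ResNet layer:

```python
import numpy as np

# Minimal sketch of the two-layer residual block from the slides:
#   x_L     = f(W_L x_{L-1} + b_L)
#   x_{L+1} = f(W_{L+1} x_L + b_{L+1} + x_{L-1})   <- skip connection
# f = ReLU; all dimensions are kept equal, so the skip needs no projection.
rng = np.random.default_rng(0)
d = 8
W_L, b_L = rng.normal(size=(d, d)), np.zeros(d)
W_L1, b_L1 = rng.normal(size=(d, d)), np.zeros(d)

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x_prev):
    x_L = relu(W_L @ x_prev + b_L)
    return relu(W_L1 @ x_L + b_L1 + x_prev)  # add the input back in

x = rng.normal(size=d)
print(residual_block(x).shape)  # (8,)
```

If both weight matrices go to zero, the block reduces to relu of its input, which is why the identity is easy to learn.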



Why do ResNets work?

• The identity is easy for the residual block to learn

• Guaranteed it will not hurt performance, can only improve



1x1 convolution

Image 5x5:

-5  3  2 -5  3
 4  3  2  1 -3
 1  0  3  3  5
-2  0  1  4  4
 5  6  7  9 -1

Kernel 1x1: [ 2 ]

What is the output size?


1x1 convolution

Applying the 1x1 kernel to the top-left element of the same image: -5 * 2 = -10, the first entry of the output.


1x1 convolution

Sliding the kernel over every element (the last one: -1 * 2 = -2) gives the output 5x5:

-10  6  4 -10  6
  8  6  4   2 -6
  2  0  6   6 10
 -4  0  2   8  8
 10 12 14  18 -2


1x1 convolution


• For 1 kernel or filter, a 1x1 convolution keeps the spatial dimensions and just scales the input by a number
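This scaling behaviour is easy to verify on the slide's example. A short NumPy check using the 5x5 image and the 1x1 kernel with value 2:

```python
import numpy as np

# The 5x5 image and 1x1 kernel [2] from the slides: a 1x1 convolution
# with a single filter keeps the spatial size and just scales the input.
image = np.array([[-5,  3,  2, -5,  3],
                  [ 4,  3,  2,  1, -3],
                  [ 1,  0,  3,  3,  5],
                  [-2,  0,  1,  4,  4],
                  [ 5,  6,  7,  9, -1]])
kernel = 2  # a 1x1 kernel (for one input channel) is a single number

out = kernel * image  # convolving with a 1x1 kernel == scaling
print(out[0, 0], out.shape)  # -10 (5, 5)
```

With many input channels the 1x1 kernel instead computes a weighted sum across channels at each pixel, which is what makes it useful for shrinking the channel dimension.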


Using 1x1 convolutions


• Use it to shrink the number of channels
• It further adds a non-linearity → one can learn more complex functions

(Diagram: a 32x32x200 volume goes through Conv 1x1x200 + ReLU with 32 filters, giving a 32x32x32 output.)


Inception layer

• Tired of choosing filter sizes?

• Use them all!

• All same convolutions

• 3x3 max pooling uses stride 1


Inception layer: computational cost

(Diagram: a 32x32x200 input goes through Conv 1x1 + ReLU with 16 filters, giving 32x32x16, then through Conv 5x5 + ReLU with 92 filters, giving 32x32x92.)

Multiplications: 1x1x200x32x32x16 + 5x5x16x32x32x92 ≈ 40 million

Reduction of multiplications by ~1/10
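The multiplication counts can be reproduced directly (one multiply per kernel weight per output value, 'same' convolutions assumed):

```python
# Multiplication counts from the slide, on a 32x32x200 input.
H = W = 32

direct = 5 * 5 * 200 * H * W * 92            # 5x5 conv straight on 200 channels
bottleneck = (1 * 1 * 200 * H * W * 16       # 1x1 conv down to 16 channels
              + 5 * 5 * 16 * H * W * 92)     # then the 5x5 conv

print(f"direct: {direct/1e6:.0f}M, bottleneck: {bottleneck/1e6:.0f}M")
print(f"reduction: {direct / bottleneck:.1f}x")  # ~11.5x, i.e. roughly 1/10
```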


Inception layer


Semantic Segmentation (FCN)

[Long et al. 2015] Fully Convolutional Networks for Semantic Segmentation


Transfer learning


(Diagram: a network trained on ImageNet is reused on a new dataset with C classes; the early layers are kept frozen and only the final layers are trained.)

Donahue 2014, Razavian 2014


Now you are:

• Ready to perform image classification on any dataset

• Ready to design your own architecture

• Ready to deal with other problems such as semantic segmentation (Fully Convolutional Network)


Recurrent Neural Networks


RNNs are flexible

Classic Neural Networks for Image Classification


RNNs are flexible

Image captioning


RNNs are flexible

Language recognition


RNNs are flexible

Machine translation


RNNs are flexible

Event classification


Basic structure of an RNN

• Multi-layer RNN (diagram: inputs at the bottom, hidden states in the middle, outputs at the top)


Basic structure of an RNN


The hidden state will have its own internal dynamics

More expressive model!


Basic structure of an RNN

• We want to have a notion of “time” or “sequence”

[Christopher Olah] Understanding LSTMs

(Diagram: the hidden state is computed from the input and the previous hidden state.)


Basic structure of an RNN

• We want to have a notion of “time” or “sequence”

Hidden state update: the weight matrices applied to the input and to the previous hidden state are the parameters to be learned.


Basic structure of an RNN

• We want to have a notion of “time” or “sequence”

Hidden state and output (note: non-linearities ignored for now)


Basic structure of an RNN

• We want to have a notion of “time” or “sequence”

Same parameters for each time step = generalization!


Basic structure of an RNN

• Unrolling RNNs

[Christopher Olah] Understanding LSTMs

Hidden state is the same
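The unrolled computation can be sketched as a loop that reapplies the same weights at every time step. A minimal NumPy sketch with hypothetical sizes (3-dimensional inputs, 5-dimensional hidden state, random weights for illustration):

```python
import numpy as np

# A vanilla RNN unrolled over a sequence: the SAME weight matrices are
# applied at every time step (non-linearity included here: tanh).
rng = np.random.default_rng(0)
d_in, d_h = 3, 5
W_xh = rng.normal(size=(d_h, d_in))  # input -> hidden
W_hh = rng.normal(size=(d_h, d_h))   # hidden -> hidden (recurrence)
b_h = np.zeros(d_h)

def rnn_forward(xs):
    h = np.zeros(d_h)  # initial hidden state
    hs = []
    for x in xs:       # one step per sequence element, shared weights
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        hs.append(h)
    return hs

seq = [rng.normal(size=d_in) for _ in range(4)]
hs = rnn_forward(seq)
print(len(hs), hs[0].shape)  # 4 (5,)
```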



Basic structure of a RNN

• Unrolling RNNs as feedforward nets

(Diagram: the unrolled RNN drawn as a feedforward net; the same weights w1, w2, w3, w4 appear at every time step, applied to the inputs x_t, x_{t+1}, x_{t+2}.)

Weights are the same!


Backprop through an RNN

• Unrolling RNNs as feedforward nets

(Diagram: the same unrolled feedforward net, shared weights at every time step.)

Chain rule, all the way to t=0: add the derivatives at different times for each weight.


Long-term dependencies

I moved to Germany … so I speak German fluently


Long-term dependencies

• Simple recurrence

• Let us forget the input: $A_t = \theta^t A_0$

Same weights are multiplied over and over again


Long-term dependencies

• Simple recurrence

What happens to small weights? Vanishing gradient.

What happens to large weights? Exploding gradient.

$A_t = \theta^t A_0$
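A quick numeric illustration of $A_t = \theta^t A_0$ for a scalar $\theta$ over 50 steps:

```python
# Repeatedly multiplying by the same weight: values below 1 vanish,
# values above 1 explode, only exactly 1 preserves the state.
A0 = 1.0
for theta in (0.9, 1.0, 1.1):
    A50 = theta ** 50 * A0
    print(f"theta={theta}: A_50 = {A50:.3g}")
```

The same effect hits the gradients, since backprop multiplies by the same weights once per time step.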


Long-term dependencies

• Simple recurrence: $A_t = \theta^t A_0$

• If $\theta$ admits an eigendecomposition $\theta = Q \Lambda Q^\top$: the diagonal of $\Lambda$ holds the eigenvalues, $Q$ is the matrix of eigenvectors


Long-term dependencies

• Simple recurrence: $A_t = \theta^t A_0$

• If $\theta$ admits an eigendecomposition $\theta = Q \Lambda Q^\top$

• Orthogonal $Q$ allows us to simplify the recurrence: $A_t = Q \Lambda^t Q^\top A_0$


Long-term dependencies

• Simple recurrence: $A_t = Q \Lambda^t Q^\top A_0$

What happens to eigenvalues with magnitude less than one? Vanishing gradient.

What happens to eigenvalues with magnitude larger than one? Exploding gradient → gradient clipping.
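Gradient clipping can be sketched as rescaling the gradient whenever its norm exceeds a threshold (the threshold 5.0 here is an arbitrary illustrative choice):

```python
import numpy as np

# Gradient clipping by global norm: the standard remedy for exploding
# gradients. The gradient direction is kept, only its length is capped.
def clip_by_norm(grad, max_norm=5.0):
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)  # rescale, keep the direction
    return grad

g = np.array([30.0, 40.0])  # norm 50 -> rescaled down to norm 5
print(clip_by_norm(g))      # [3. 4.]
```

Note that clipping only helps against exploding gradients; vanishing gradients need an architectural fix such as the LSTM below.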


Long-term dependencies

• Simple recurrence: $A_t = \theta^t A_0$

Let us just make a matrix with eigenvalues = 1

Allow the cell to maintain its “state”


Vanishing gradient

• 1. From the weights

• 2. From the activation functions (tanh)

$A_t = \theta^t A_0$


Vanishing gradient

• From the activation functions: the derivative of tanh is at most 1, so gradients shrink further at every step.


Long Short-Term Memory

[Hochreiter and Schmidhuber 1997]


Long Short-Term Memory Units

• Simple RNN has tanh as non-linearity


Long Short-Term Memory Units

• LSTM


Long Short-Term Memory Units

• Key ingredients
• Cell = transports the information through the unit


Long Short-Term Memory Units

• Key ingredients
• Cell = transports the information through the unit
• Gate = remove or add information to the cell state (uses a sigmoid)


LSTM: step by step

• Forget gate

Decides when to erase the cell state

Sigmoid = output between 0 (forget) and 1 (keep)


LSTM: step by step

• Input gate: decides which values will be updated

New cell state, output from a tanh (-1,1)


LSTM: step by step

• Element-wise operations


LSTM: step by step

• Output gate: decides which values will be output, passed through a tanh (-1,1)


LSTM: step by step

• Forget gate: $f_t = \sigma(\theta_{xf} x_t + \theta_{hf} h_{t-1} + b_f)$

• Input gate: $i_t = \sigma(\theta_{xi} x_t + \theta_{hi} h_{t-1} + b_i)$

• Output gate: $o_t = \sigma(\theta_{xo} x_t + \theta_{ho} h_{t-1} + b_o)$

• Cell update: $g_t = \tanh(\theta_{xg} x_t + \theta_{hg} h_{t-1} + b_g)$

• Cell: $C_t = f_t \odot C_{t-1} + i_t \odot g_t$

• Output: $h_t = o_t \odot \tanh(C_t)$
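A single LSTM step following these equations can be sketched in NumPy (random weights and hypothetical sizes, purely for illustration):

```python
import numpy as np

# One LSTM step: theta_x* matrices act on x_t, theta_h* on h_{t-1}.
# Keys f, i, o, g = forget gate, input gate, output gate, cell update.
rng = np.random.default_rng(0)
d_in, d_h = 3, 4
th_x = {k: rng.normal(size=(d_h, d_in)) for k in "fiog"}
th_h = {k: rng.normal(size=(d_h, d_h)) for k in "fiog"}
b = {k: np.zeros(d_h) for k in "fiog"}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, C):
    f = sigmoid(th_x["f"] @ x + th_h["f"] @ h + b["f"])  # forget gate
    i = sigmoid(th_x["i"] @ x + th_h["i"] @ h + b["i"])  # input gate
    o = sigmoid(th_x["o"] @ x + th_h["o"] @ h + b["o"])  # output gate
    g = np.tanh(th_x["g"] @ x + th_h["g"] @ h + b["g"])  # cell update
    C = f * C + i * g          # cell state: elementwise gating
    h = o * np.tanh(C)         # output / new hidden state
    return h, C

h, C = np.zeros(d_h), np.zeros(d_h)
h, C = lstm_step(rng.normal(size=d_in), h, C)
print(h.shape, C.shape)  # (4,) (4,)
```

Note the cell update $C_t = f_t \odot C_{t-1} + i_t \odot g_t$ is additive rather than a repeated matrix product, which is the key to the gradient discussion on the next slide.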


LSTM: vanishing gradients?

• Cell: $C_t = f_t \odot C_{t-1} + i_t \odot g_t$

• 1. From the weights: the forget gate can stay at 1 for important information

• 2. From the activation functions: the cell path applies the identity function


Long Short-Term Memory Units

• Highway for the gradient to flow


RNNs in Computer Vision

• Caption generation

[Xu et al. 2015]

RNNs in Computer Vision

• Caption generation

• Focus is shifted to different parts of the image

[Xu et al. 2015]

RNNs in Computer Vision

• Instance segmentation

[Romera-Paredes et al. 2015]

Final exam


Final exam

• Multiple choice questions
• Series of questions with free answer

• There can be questions related to the exercises → if you did the exercises, it will be easier for you to answer them


Final exam

• Must-know topics:

– Basics of ML → from linear classifier to NN

– Optimization schemes (not necessary to know all the formulas, but to have a good understanding of the differences between them and their behavior)

– Backpropagation: concept, math; hint: be fluent at computing backprop by hand

– Loss functions and activation functions

– CNN: convolution, backprop

– RNN, LSTMs


Admin

• Exam date: July 16th at 08:00

• There will NOT be a retake exam

• No cheat sheet or calculator during the exam


Next semesters: new DL courses


Deep Learning at TUM

• We keep expanding the courses on Deep Learning

• This Introduction to Deep Learning course is the basis for a series of Advanced DL lectures on different topics

• Advanced topics are typically only for Master students


Deep Learning at TUM

• Intro to Deep Learning
• DL for Physics (Thuerey)
• DL for Vision (Niessner, Leal-Taixé)
• DL for Medical Applications (Menze)
• DL in Robotics (Bäuml)
• Machine Learning (Günnemann)


Advanced DL for Computer Vision

• Deep Learning for Vision (WS18/19): syllabus
– Advanced architectures, e.g. Siamese neural networks
– Variational Autoencoders
– Generative models, e.g. GANs
– Multi-dimensional CNNs
– Bayesian Deep Learning


Advanced DL for Computer Vision

• Deep Learning for Vision (WS18/19): 2 V + 5 P
– Must have attended the Intro to DL
– Practical part is a project that will last the whole semester
– Please do not sign up unless you are willing to spend a lot of time on the project!


Detection, Segmentation and Tracking

• New lecture (Prof. Leal-Taixé, SS19)

– Must have attended the Intro to DL

– Common detection and segmentation frameworks (YOLO, Faster-RCNN, Mask-RCNN)

– Extension to videos → tracking

– One project that will last the whole semester


Thank you

Visual Computing (Niessner):
- 3D scanning
- DL in 3D understanding
- 3D reconstruction

Dynamic Vision and Learning (Leal-Taixé):
- Video segmentation
- Object tracking
- Camera localization
