
High Level Computer Vision

Recurrent Neural Networks, June 5, 2019

Bernt Schiele & Mario Fritz

www.mpi-inf.mpg.de/hlcv/ Max Planck Institute for Informatics & Saarland University, Saarland Informatics Campus Saarbrücken

Overview: Today's Lecture

• Recurrent Neural Networks (RNNs)
  ‣ Motivation & flexibility of RNNs (some recap from last week)
  ‣ Language modeling - including the "unreasonable effectiveness of RNNs"
  ‣ RNNs for image description / captioning
  ‣ Standard RNN and a particularly successful RNN: Long Short-Term Memory (LSTM) - including visualizations of RNN cells

Recurrent Networks offer a lot of flexibility:

slide credit: Andrej Karpathy

Sequences in Vision: sequences in the input… (many-to-one)
Example action labels: Running, Jumping, Dancing, Fighting, Eating

Sequences in Vision: sequences in the output… (one-to-many)
Example caption: "A happy brown dog."

Sequences in Vision: sequences everywhere! (many-to-many)
Example description: "A dog jumps over a hurdle."

ConvNets (Krizhevsky et al., NIPS 2012)

Problem #1: fixed-size, static input (a single 224 x 224 image)

Problem #2: the output is a single choice from a fixed list of options (dog, horse, fish, snake, cat, …) rather than a description such as "a happy brown dog", "a big brown dog", "a happy red dog", or "a big red dog"

slide credit: Jeff Donahue

Recurrent Networks offer a lot of flexibility:

slide credit: Andrej Karpathy

Recurrent Neural Network (RNN)

(Simple) Recurrent Neural Network

slide credit: Fei-Fei, Justin Johnson, Serena Yeung
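The recurrence these slides build up can be summarized in a few lines. Below is a minimal NumPy sketch of one vanilla RNN step; the weight names Wxh, Whh, Why and the tanh nonlinearity follow the standard formulation used later in this lecture, and the biases bh, by are my addition:

    import numpy as np

    def rnn_step(x, h_prev, Wxh, Whh, Why, bh, by):
        # Hidden state update: h_t = tanh(Wxh x_t + Whh h_{t-1} + bh)
        h = np.tanh(Wxh @ x + Whh @ h_prev + bh)
        # Output at this step: y_t = Why h_t + by (e.g. unnormalized class scores)
        y = Why @ h + by
        return h, y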

RNN: Computational Graph

RNN: Computational Graph: Many to Many

RNN: Computational Graph: Many to One

RNN: Computational Graph: One to Many

Sequence to Sequence

slide credit: Fei-Fei, Justin Johnson, Serena Yeung
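Sequence to sequence combines the two cases above: a many-to-one encoder folds the input sequence into one vector, and a one-to-many decoder unrolls an output sequence from it. A rough sketch reusing the hypothetical rnn_step above (the parameter dictionary p and the greedy feedback x = y are illustrative assumptions, not from the slides):

    def encode(xs, p):
        # Many-to-one: fold the whole input sequence into one hidden vector.
        h = np.zeros(p["Whh"].shape[0])
        for x in xs:
            h, _ = rnn_step(x, h, p["Wxh"], p["Whh"], p["Why"], p["bh"], p["by"])
        return h

    def decode(h, x_start, p, steps):
        # One-to-many: unroll an output sequence from the encoder's summary vector.
        ys, x = [], x_start
        for _ in range(steps):
            h, y = rnn_step(x, h, p["Wxh"], p["Whh"], p["Why"], p["bh"], p["by"])
            ys.append(y)
            x = y  # assumes decoder inputs and outputs live in the same space
        return ys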

Recurrent Networks offer a lot of flexibility:

slide credit: Andrej Karpathy

Language Models

Recurrent Neural Network Based Language Model [Tomas Mikolov, 2010]

Word-level language model. Similar to:

slide credit: Andrej Karpathy

Suppose we had the training sentence "cat sat on mat".

We want to train a language model: P(next word | previous words), i.e. we want these to be high:
P(cat | [<S>])
P(sat | [<S>, cat])
P(on | [<S>, cat, sat])
P(mat | [<S>, cat, sat, on])
P(<E> | [<S>, cat, sat, on, mat])

First, suppose we had only a finite, 1-word history, i.e. we want these to be high:
P(cat | <S>)
P(sat | cat)
P(on | sat)
P(mat | on)
P(<E> | mat)

slide credit: Andrej Karpathy
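In practice, "wanting these probabilities to be high" means minimizing the negative log-likelihood of each next word. A minimal sketch of that per-sentence loss, assuming the model produces a probability distribution over the vocabulary at every step (the function name is mine):

    import numpy as np

    def sentence_nll(probs_per_step, next_word_ids):
        # probs_per_step[t]: the model's distribution over the vocabulary at step t
        # next_word_ids[t]:  index of the word that actually came next in the sentence
        return -sum(np.log(p[w]) for p, w in zip(probs_per_step, next_word_ids))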

Unrolled RNN over the training sentence "cat sat on mat": inputs x0 = <START>, x1 = "cat", x2 = "sat", x3 = "on", x4 = "mat"; hidden states h0 … h4; outputs y0 … y4 scoring the next word at each step.

Each word in the vocabulary is represented by 300 (learnable) numbers.

Hidden layer (e.g. 500-D vectors). Without recurrence this would just be h4 = tanh(Wxh * x4); the Recurrent Neural Network also feeds in the previous hidden state: h4 = tanh(Wxh * x4 + Whh * h3).

10,001-D class scores: softmax over 10,000 words and a special <END> token, y4 = Why * h4.

slide credit: Andrej Karpathy
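Putting the annotations together, one forward step of this word-level RNN language model might look like the following sketch, with the dimensions from the slide (300-D word embeddings, a 500-D hidden state, 10,001 output classes); the array names and initialization are assumptions:

    import numpy as np

    vocab_size, embed_dim, hidden_dim = 10_000 + 1, 300, 500    # +1 for the <END> token
    Wemb = np.random.randn(vocab_size, embed_dim) * 0.01        # 300 learnable numbers per word
    Wxh  = np.random.randn(hidden_dim, embed_dim) * 0.01
    Whh  = np.random.randn(hidden_dim, hidden_dim) * 0.01
    Why  = np.random.randn(vocab_size, hidden_dim) * 0.01

    def lm_step(word_id, h_prev):
        x = Wemb[word_id]                       # look up the word embedding
        h = np.tanh(Wxh @ x + Whh @ h_prev)     # h_t = tanh(Wxh x_t + Whh h_{t-1})
        scores = Why @ h                        # 10,001-D class scores
        probs = np.exp(scores - scores.max())   # softmax over 10,000 words + <END>
        probs /= probs.sum()
        return h, probs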

Training this on a lot of sentences would give us a language model: a way to predict P(next word | previous words).

slide credit: Andrej Karpathy

Generating Sentences...

To generate, start the trained model with x0 = <START>, compute h0 and the output distribution y0, and sample a word from it (e.g. "cat"). Feed the sampled word back in as x1, compute h1 and y1, sample again ("sat"), and keep going ("on", "mat", …) until the model samples the special <END> token - then we are done.

slide credit: Andrej Karpathy
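As a sketch, the sampling loop that these slides build up step by step could be written as follows, reusing the hypothetical lm_step from above (start_id and end_id stand for the indices of the <START> and <END> tokens):

    def generate_sentence(start_id, end_id, hidden_dim=500, max_len=50):
        h = np.zeros(hidden_dim)
        word_id, sentence = start_id, []
        for _ in range(max_len):
            h, probs = lm_step(word_id, h)                    # forward one step
            word_id = np.random.choice(len(probs), p=probs)   # sample the next word
            if word_id == end_id:                             # sampled <END>? done.
                break
            sentence.append(word_id)
        return sentence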

Example…

Learning via Backpropagation…

slide credit: Fei-Fei, Justin Johnson, Serena Yeung
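The backpropagation slides are not reproduced here, but the idea is: unroll the recurrence over the whole sentence, sum the per-step next-word losses, and backpropagate through the unrolled graph (truncated to chunks for very long sequences). A sketch of the forward/loss half, reusing the hypothetical lm_step from above (an autodiff framework, or hand-derived gradients, would handle the backward pass):

    def sequence_loss(word_ids, hidden_dim=500):
        h, loss = np.zeros(hidden_dim), 0.0
        for t in range(len(word_ids) - 1):
            h, probs = lm_step(word_ids[t], h)          # unrolled forward step
            loss -= np.log(probs[word_ids[t + 1]])      # cross-entropy for the next word
        return loss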

“The Unreasonable Effectiveness of Recurrent Neural Networks”

karpathy.github.io

slide credit: Andrej Karpathy

Character-level language model example

Vocabulary: [h,e,l,o]

Example training sequence: “hello”

slide credit: Andrej Karpathy
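For this example, the training data is simply the character sequence shifted by one: inputs 'h','e','l','l' with targets 'e','l','l','o', each input encoded as a one-hot vector over the four-character vocabulary. A small sketch of that setup (this is not Karpathy's char-rnn code, which is written in Torch7):

    import numpy as np

    vocab = ['h', 'e', 'l', 'o']
    char_to_ix = {ch: i for i, ch in enumerate(vocab)}

    def one_hot(ch):
        v = np.zeros(len(vocab))
        v[char_to_ix[ch]] = 1.0
        return v

    text = "hello"
    inputs  = [one_hot(ch) for ch in text[:-1]]    # h, e, l, l
    targets = [char_to_ix[ch] for ch in text[1:]]  # e, l, l, o (indices for the loss)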

Try it yourself: char-rnn on GitHub (uses Torch7)

Cooking Recipes

Obama Speeches

Learning from Linux Source Code

slide credit: Andrej Karpathy

Yoav Goldberg n-gram experiments

Order-10 n-gram model on Shakespeare:

But on Linux:

slide credit: Andrej Karpathy

Image captioning - training example: "straw hat"

Unrolled diagram: the image is passed through the ConvNet, and its feature vector v is fed into the RNN together with the word sequence <START> straw hat (inputs x0 = <START>, x1 = "straw", x2 = "hat"; hidden states h0, h1, h2; outputs y0, y1, y2).

before: h0 = tanh(Wxh * x0)
now:    h0 = tanh(Wxh * x0 + Wih * v)

slide credit: Andrej Karpathy
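In code, the only change for captioning is that first step: the CNN feature v of the image enters the initial hidden state through an extra weight matrix Wih. A minimal sketch of the modified first step (names and shapes are assumptions consistent with the slide formula):

    def caption_first_step(x_start, v, Wxh, Wih):
        # before (pure language model): h0 = tanh(Wxh @ x_start)
        # now (conditioned on the image): h0 = tanh(Wxh @ x_start + Wih @ v)
        return np.tanh(Wxh @ x_start + Wih @ v)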

Test image

At test time the caption is generated the same way: feed the test image's ConvNet feature into the RNN, start with x0 = <START>, sample from y0 ("straw"), feed the sample back in, sample from y1 ("hat"), and continue until the <END> token is sampled, which finishes the caption.

slide credit: Andrej Karpathy
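The full test-time loop is then the earlier sentence-sampling loop with an image-conditioned first step. A rough sketch reusing the hypothetical Wemb, Wxh, Why, hidden_dim and lm_step from above; Wih and the 4096-D CNN feature size are assumptions for illustration, not from the slides:

    cnn_dim = 4096                                      # assumed CNN feature size (e.g. fc7)
    Wih = np.random.randn(hidden_dim, cnn_dim) * 0.01   # maps the image feature into the hidden state

    def generate_caption(v, start_id, end_id, max_len=20):
        # First step conditions on the image: h0 = tanh(Wxh * x_<START> + Wih * v)
        h = np.tanh(Wxh @ Wemb[start_id] + Wih @ v)
        scores = Why @ h
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        caption = []
        for _ in range(max_len):
            word_id = np.random.choice(len(probs), p=probs)  # sample!
            if word_id == end_id:                            # sampled <END>? finish.
                break
            caption.append(word_id)
            h, probs = lm_step(word_id, h)                   # subsequent steps: plain RNN LM
        return caption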

Wow, I can't believe that worked

Well, I can kind of see it

Not sure what happened there...

slide credit: Jeff Donahue

Vanilla RNN...

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Long Short Term Memory (LSTM)

slide credit: Fei-Fei, Justin Johnson, Serena Yeung
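The LSTM update is shown on the slides as a diagram; as a reference, the common i/f/o/g formulation of the cell (the slides' exact notation may differ) can be sketched as:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x, h_prev, c_prev, Wx, Wh, b):
        # Wx: (4*H, D), Wh: (4*H, H), b: (4*H,) -- all four gates from one matmul
        z = Wx @ x + Wh @ h_prev + b
        H = h_prev.shape[0]
        i = sigmoid(z[0*H:1*H])     # input gate: how much new information to write
        f = sigmoid(z[1*H:2*H])     # forget gate: how much of the old cell to keep
        o = sigmoid(z[2*H:3*H])     # output gate: how much of the cell to reveal
        g = np.tanh(z[3*H:4*H])     # candidate values to write into the cell
        c = f * c_prev + i * g      # additive cell update => uninterrupted gradient flow
        h = o * np.tanh(c)          # hidden state exposed to the rest of the network
        return h, c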

Long Short Term Memory (LSTM): Gradient Flow

Alternatives

Summary

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Visualizing and Understanding Recurrent Networks. Andrej Karpathy*, Justin Johnson*, Li Fei-Fei (on arXiv.org)

slide credit: Andrej Karpathy

Hunting interpretable cells

• quote detection cell
• line length tracking cell
• if statement cell
• quote/comment cell
• code depth cell
• something interesting cell (not quite sure what)

slide credit: Andrej Karpathy