
High Level Computer Vision

Recurrent Neural Networks, June 5, 2019

Bernt Schiele & Mario Fritz

www.mpi-inf.mpg.de/hlcv/ Max Planck Institute for Informatics & Saarland University, Saarland Informatics Campus Saarbrücken

Overview: Today's Lecture

• Recurrent Neural Networks (RNNs)
  ‣ Motivation & flexibility of RNNs (some recap from last week)
  ‣ Language modeling - including the "unreasonable effectiveness of RNNs"
  ‣ RNNs for image description / captioning
  ‣ Standard RNN and a particularly successful RNN: Long Short-Term Memory (LSTM) - including visualizations of RNN cells

Recurrent Networks offer a lot of flexibility:

slide credit: Andrej Karpathy

Sequences in Vision: sequences in the input… (many-to-one)
Example action labels: Running, Jumping, Dancing, Fighting, Eating

Sequences in Vision: sequences in the output… (one-to-many)
Example caption: "A happy brown dog."

Sequences in Vision: sequences everywhere! (many-to-many)
Example description: "A dog jumps over a hurdle."

ConvNets (Krizhevsky et al., NIPS 2012)

Problem #1: fixed-size, static input (a single 224 x 224 image)

Problem #2: the output is a single choice from a fixed list of options (dog, horse, fish, snake, cat, …) rather than a description such as "a happy brown dog", "a big brown dog", "a happy red dog", or "a big red dog"

slide credit: Jeff Donahue

Recurrent Networks offer a lot of flexibility:

slide credit: Andrej Karpathy

Recurrent Neural Network (RNN)

(Simple) Recurrent Neural Network

slide credit: Fei-Fei, Justin Johnson, Serena Yeung
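The recurrence these slides build up can be summarized in a few lines. Below is a minimal NumPy sketch of one vanilla RNN step; the weight names Wxh, Whh, Why and the tanh nonlinearity follow the standard formulation used later in this lecture, and the biases bh, by are my addition:

    import numpy as np

    def rnn_step(x, h_prev, Wxh, Whh, Why, bh, by):
        # Hidden state update: h_t = tanh(Wxh x_t + Whh h_{t-1} + bh)
        h = np.tanh(Wxh @ x + Whh @ h_prev + bh)
        # Output at this step: y_t = Why h_t + by (e.g. unnormalized class scores)
        y = Why @ h + by
        return h, y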

RNN: Computational Graph

RNN: Computational Graph: Many to Many

RNN: Computational Graph: Many to One

RNN: Computational Graph: One to Many

Sequence to Sequence

slide credit: Fei-Fei, Justin Johnson, Serena Yeung
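Sequence to sequence combines the two cases above: a many-to-one encoder folds the input sequence into one vector, and a one-to-many decoder unrolls an output sequence from it. A rough sketch reusing the hypothetical rnn_step above (the parameter dictionary p and the greedy feedback x = y are illustrative assumptions, not from the slides):

    def encode(xs, p):
        # Many-to-one: fold the whole input sequence into one hidden vector.
        h = np.zeros(p["Whh"].shape[0])
        for x in xs:
            h, _ = rnn_step(x, h, p["Wxh"], p["Whh"], p["Why"], p["bh"], p["by"])
        return h

    def decode(h, x_start, p, steps):
        # One-to-many: unroll an output sequence from the encoder's summary vector.
        ys, x = [], x_start
        for _ in range(steps):
            h, y = rnn_step(x, h, p["Wxh"], p["Whh"], p["Why"], p["bh"], p["by"])
            ys.append(y)
            x = y  # assumes decoder inputs and outputs live in the same space
        return ys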

Recurrent Networks offer a lot of flexibility:

slide credit: Andrej Karpathy

Language Models

Recurrent Neural Network Based Language Model [Tomas Mikolov, 2010]

Word-level language model. Similar to:

slide credit: Andrej Karpathy

Suppose we had the training sentence "cat sat on mat".

We want to train a language model: P(next word | previous words), i.e. we want these to be high:
P(cat | [<S>])
P(sat | [<S>, cat])
P(on | [<S>, cat, sat])
P(mat | [<S>, cat, sat, on])
P(<E> | [<S>, cat, sat, on, mat])

First, suppose we had only a finite, 1-word history, i.e. we want these to be high:
P(cat | <S>)
P(sat | cat)
P(on | sat)
P(mat | on)
P(<E> | mat)

slide credit: Andrej Karpathy
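In practice, "wanting these probabilities to be high" means minimizing the negative log-likelihood of each next word. A minimal sketch of that per-sentence loss, assuming the model produces a probability distribution over the vocabulary at every step (the function name is mine):

    import numpy as np

    def sentence_nll(probs_per_step, next_word_ids):
        # probs_per_step[t]: the model's distribution over the vocabulary at step t
        # next_word_ids[t]:  index of the word that actually came next in the sentence
        return -sum(np.log(p[w]) for p, w in zip(probs_per_step, next_word_ids))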

Unrolled RNN over the training sentence "cat sat on mat": inputs x0 = <START>, x1 = "cat", x2 = "sat", x3 = "on", x4 = "mat"; hidden states h0 … h4; outputs y0 … y4 scoring the next word at each step.

Each word in the vocabulary is represented by 300 (learnable) numbers.

Hidden layer (e.g. 500-D vectors). Without recurrence this would just be h4 = tanh(Wxh * x4); the Recurrent Neural Network also feeds in the previous hidden state: h4 = tanh(Wxh * x4 + Whh * h3).

10,001-D class scores: softmax over 10,000 words and a special <END> token, y4 = Why * h4.

slide credit: Andrej Karpathy
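Putting the annotations together, one forward step of this word-level RNN language model might look like the following sketch, with the dimensions from the slide (300-D word embeddings, a 500-D hidden state, 10,001 output classes); the array names and initialization are assumptions:

    import numpy as np

    vocab_size, embed_dim, hidden_dim = 10_000 + 1, 300, 500    # +1 for the <END> token
    Wemb = np.random.randn(vocab_size, embed_dim) * 0.01        # 300 learnable numbers per word
    Wxh  = np.random.randn(hidden_dim, embed_dim) * 0.01
    Whh  = np.random.randn(hidden_dim, hidden_dim) * 0.01
    Why  = np.random.randn(vocab_size, hidden_dim) * 0.01

    def lm_step(word_id, h_prev):
        x = Wemb[word_id]                       # look up the word embedding
        h = np.tanh(Wxh @ x + Whh @ h_prev)     # h_t = tanh(Wxh x_t + Whh h_{t-1})
        scores = Why @ h                        # 10,001-D class scores
        probs = np.exp(scores - scores.max())   # softmax over 10,000 words + <END>
        probs /= probs.sum()
        return h, probs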

Training this on a lot of sentences would give us a language model: a way to predict P(next word | previous words).

slide credit: Andrej Karpathy

Generating Sentences...

To generate, start the trained model with x0 = <START>, compute h0 and the output distribution y0, and sample a word from it (e.g. "cat"). Feed the sampled word back in as x1, compute h1 and y1, sample again ("sat"), and keep going ("on", "mat", …) until the model samples the special <END> token - then we are done.

slide credit: Andrej Karpathy
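As a sketch, the sampling loop that these slides build up step by step could be written as follows, reusing the hypothetical lm_step from above (start_id and end_id stand for the indices of the <START> and <END> tokens):

    def generate_sentence(start_id, end_id, hidden_dim=500, max_len=50):
        h = np.zeros(hidden_dim)
        word_id, sentence = start_id, []
        for _ in range(max_len):
            h, probs = lm_step(word_id, h)                    # forward one step
            word_id = np.random.choice(len(probs), p=probs)   # sample the next word
            if word_id == end_id:                             # sampled <END>? done.
                break
            sentence.append(word_id)
        return sentence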

Example…

Learning via Backpropagation…

slide credit: Fei-Fei, Justin Johnson, Serena Yeung
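The backpropagation slides are not reproduced here, but the idea is: unroll the recurrence over the whole sentence, sum the per-step next-word losses, and backpropagate through the unrolled graph (truncated to chunks for very long sequences). A sketch of the forward/loss half, reusing the hypothetical lm_step from above (an autodiff framework, or hand-derived gradients, would handle the backward pass):

    def sequence_loss(word_ids, hidden_dim=500):
        h, loss = np.zeros(hidden_dim), 0.0
        for t in range(len(word_ids) - 1):
            h, probs = lm_step(word_ids[t], h)          # unrolled forward step
            loss -= np.log(probs[word_ids[t + 1]])      # cross-entropy for the next word
        return loss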

“The Unreasonable Effectiveness of Recurrent Neural Networks”

karpathy.github.io

slide credit: Andrej Karpathy

Character-level language model example

Vocabulary: [h,e,l,o]

Example training sequence: “hello”

slide credit: Andrej Karpathy
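For this example, the training data is simply the character sequence shifted by one: inputs 'h','e','l','l' with targets 'e','l','l','o', each input encoded as a one-hot vector over the four-character vocabulary. A small sketch of that setup (this is not Karpathy's char-rnn code, which is written in Torch7):

    import numpy as np

    vocab = ['h', 'e', 'l', 'o']
    char_to_ix = {ch: i for i, ch in enumerate(vocab)}

    def one_hot(ch):
        v = np.zeros(len(vocab))
        v[char_to_ix[ch]] = 1.0
        return v

    text = "hello"
    inputs  = [one_hot(ch) for ch in text[:-1]]    # h, e, l, l
    targets = [char_to_ix[ch] for ch in text[1:]]  # e, l, l, o (indices for the loss)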

Try it yourself: char-rnn on GitHub (uses Torch7)

Cooking Recipes

Obama Speeches

Learning from Linux Source Code

slide credit: Andrej Karpathy

Yoav Goldberg n-gram experiments

Order-10 n-gram model on Shakespeare:

But on Linux:

slide credit: Andrej Karpathy

Image captioning - training example: "straw hat"

Unrolled diagram: the image is passed through the ConvNet, and its feature vector v is fed into the RNN together with the word sequence <START> straw hat (inputs x0 = <START>, x1 = "straw", x2 = "hat"; hidden states h0, h1, h2; outputs y0, y1, y2).

before: h0 = tanh(Wxh * x0)
now:    h0 = tanh(Wxh * x0 + Wih * v)

slide credit: Andrej Karpathy
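In code, the only change for captioning is that first step: the CNN feature v of the image enters the initial hidden state through an extra weight matrix Wih. A minimal sketch of the modified first step (names and shapes are assumptions consistent with the slide formula):

    def caption_first_step(x_start, v, Wxh, Wih):
        # before (pure language model): h0 = tanh(Wxh @ x_start)
        # now (conditioned on the image): h0 = tanh(Wxh @ x_start + Wih @ v)
        return np.tanh(Wxh @ x_start + Wih @ v)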

Test image

At test time the caption is generated the same way: feed the test image's ConvNet feature into the RNN, start with x0 = <START>, sample from y0 ("straw"), feed the sample back in, sample from y1 ("hat"), and continue until the <END> token is sampled, which finishes the caption.

slide credit: Andrej Karpathy
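The full test-time loop is then the earlier sentence-sampling loop with an image-conditioned first step. A rough sketch reusing the hypothetical Wemb, Wxh, Why, hidden_dim and lm_step from above; Wih and the 4096-D CNN feature size are assumptions for illustration, not from the slides:

    cnn_dim = 4096                                      # assumed CNN feature size (e.g. fc7)
    Wih = np.random.randn(hidden_dim, cnn_dim) * 0.01   # maps the image feature into the hidden state

    def generate_caption(v, start_id, end_id, max_len=20):
        # First step conditions on the image: h0 = tanh(Wxh * x_<START> + Wih * v)
        h = np.tanh(Wxh @ Wemb[start_id] + Wih @ v)
        scores = Why @ h
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        caption = []
        for _ in range(max_len):
            word_id = np.random.choice(len(probs), p=probs)  # sample!
            if word_id == end_id:                            # sampled <END>? finish.
                break
            caption.append(word_id)
            h, probs = lm_step(word_id, h)                   # subsequent steps: plain RNN LM
        return caption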

Wow, I can't believe that worked

Well, I can kind of see it

Not sure what happened there...

slide credit: Jeff Donahue

Vanilla RNN...

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Long Short Term Memory (LSTM)

slide credit: Fei-Fei, Justin Johnson, Serena Yeung
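The LSTM update is shown on the slides as a diagram; as a reference, the common i/f/o/g formulation of the cell (the slides' exact notation may differ) can be sketched as:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x, h_prev, c_prev, Wx, Wh, b):
        # Wx: (4*H, D), Wh: (4*H, H), b: (4*H,) -- all four gates from one matmul
        z = Wx @ x + Wh @ h_prev + b
        H = h_prev.shape[0]
        i = sigmoid(z[0*H:1*H])     # input gate: how much new information to write
        f = sigmoid(z[1*H:2*H])     # forget gate: how much of the old cell to keep
        o = sigmoid(z[2*H:3*H])     # output gate: how much of the cell to reveal
        g = np.tanh(z[3*H:4*H])     # candidate values to write into the cell
        c = f * c_prev + i * g      # additive cell update => uninterrupted gradient flow
        h = o * np.tanh(c)          # hidden state exposed to the rest of the network
        return h, c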

Long Short Term Memory (LSTM): Gradient Flow

Alternatives

Summary

slide credit: Fei-Fei, Justin Johnson, Serena Yeung

Visualizing and Understanding Recurrent Networks. Andrej Karpathy*, Justin Johnson*, Li Fei-Fei (on arXiv.org)

slide credit: Andrej Karpathy

Hunting interpretable cells

• quote detection cell
• line length tracking cell
• if statement cell
• quote/comment cell
• code depth cell
• something interesting cell (not quite sure what)

slide credit: Andrej Karpathy