
Deep-Learning:

Recurrent Neural Networks (RNN)

Pr. Fabien MOUTARDE

Center for Robotics

MINES ParisTech

PSL Université Paris

[email protected]

http://people.mines-paristech.fr/fabien.moutarde


Acknowledgements

During preparation of these slides, I got inspiration and borrowed some slide content from several sources, in particular:

• Fei-Fei Li + J. Johnson + S. Yeung: slides on “Recurrent Neural Networks” from the “Convolutional Neural Networks for Visual Recognition” course at Stanford

http://cs231n.stanford.edu/slides/2019/cs231n_2019_lecture10.pdf

• Yingyu Liang: slides on “Recurrent Neural Networks” from the “Deep Learning Basics” course at Princeton

https://www.cs.princeton.edu/courses/archive/spring16/cos495/slides/DL_lecture9_RNN.pdf

• Arun Mallya: slides “Introduction to RNNs” from the “Trends in Deep Learning and Recognition” course of Svetlana LAZEBNIK at the University of Illinois at Urbana-Champaign

http://slazebni.cs.illinois.edu/spring17/lec02_rnn.pdf

• Tingwu Wang: slides on “Recurrent Neural Network” for a course at the University of Toronto

https://www.cs.toronto.edu/%7Etingwuwang/rnn_tutorial.pdf

• Christopher Olah: online tutorial “Understanding LSTM Networks”

https://colah.github.io/posts/2015-08-Understanding-LSTMs/


Outline

• Standard Recurrent Neural Networks

• Training RNN: BackPropagation Through Time

• LSTM and GRU

• Applications of RNNs


Recurrent Neural Networks (RNN)

[Figure: an RNN with a time delay on each connection (inputs x1, x2, x3, one output), and its equivalent form unfolded over delayed inputs x2(t-1), x3(t-1), x2(t-2)]


Canonical form of RNN

[Figure: canonical form: a non-recurrent network receives the external input U(t) and the state variables X(t-1) (fed back through unit delays), and produces the output Y(t) and the new state variables X(t)]
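In equations, and assuming exactly the wiring shown (state and output both computed by the same non-recurrent network from X(t-1) and U(t)), the canonical form reads:

$X(t) = \Phi\big(X(t-1),\, U(t)\big), \qquad Y(t) = \Psi\big(X(t-1),\, U(t)\big)$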


Time unfolding of RNN

[Figure: time unfolding: copies of the same non-recurrent network at steps t-2, t-1 and t; the copy at step t takes the external input U(t) and the state variables X(t-1), and produces the output Y(t) and the state variables X(t) (and likewise back through Y(t-1), Y(t-2), down to X(t-3))]


Dynamic systems & RNN

A discrete-time dynamical system is defined by a state-update equation

$x^{(t+1)} = f\big(x^{(t)},\, u^{(t+1)}\big)$

If a Neural Net is used for f, this is EXACTLY an RNN!

Figures from Deep Learning, Goodfellow, Bengio and Courville


Standard (“vanilla”) RNN

State vector s ↔ vector h of hidden neurons

$h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t)$

$y_t = W_{hy} h_t$, or $y_t = \mathrm{softMax}(W_{hy} h_t)$
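As a concrete illustration, here is a minimal NumPy sketch of one vanilla-RNN time step with these equations; the sizes and the random toy data are arbitrary:

```python
import numpy as np

def rnn_step(x, h_prev, Wxh, Whh, Why):
    """One vanilla-RNN time step: update hidden state h, then read out y."""
    h = np.tanh(Wxh @ x + Whh @ h_prev)      # h_t = tanh(Whh h_{t-1} + Wxh x_t)
    scores = Why @ h                         # y_t = Why h_t
    y = np.exp(scores - scores.max())        # softMax readout (stabilized)
    return h, y / y.sum()

# Toy sizes: 3-dim input, 5 hidden neurons, 4 output classes
rng = np.random.default_rng(0)
Wxh, Whh, Why = (rng.normal(size=s) for s in [(5, 3), (5, 5), (4, 5)])
h = np.zeros(5)
for x in rng.normal(size=(7, 3)):            # a length-7 input sequence
    h, y = rnn_step(x, h, Wxh, Whh, Why)
```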


Advantages of RNN

The hidden state s of the RNN builds a kind of lossy summary of the past.

RNN totally adapted to processing SEQUENTIAL data (same computation formula applied at each time step, but modulated by the evolving “memory” contained in state s).

Universality of RNNs: any function computable by a Turing Machine can be computed by a finite-size RNN (Siegelmann and Sontag, 1995).


RNN hyper-parameters

• As for MLP, main hyperparameter = size of hidden layer (= size of vector h)
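In PyTorch, for instance, this is the hidden_size argument (the other sizes here are arbitrary):

```python
import torch.nn as nn

# Main hyperparameter: the size of the hidden state vector h
rnn = nn.RNN(input_size=10, hidden_size=64, batch_first=True)
```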


Outline

• Standard Recurrent Neural Networks

• Training RNN: BackPropagation Through Time

• LSTM and GRU

• Applications of RNNs


RNN training

• BackPropagation Through Time (BPTT): gradient update for a whole sequence

• or Real-Time Recurrent Learning (RTRL): gradient update for each frame of a sequence

[Figure: a temporal sequence unrolled over steps t to t+4 with shared weights W(t); the errors (e.g. e3, e4) over a horizon Nt = 4 are accumulated before the update yielding W(t+4)]


BackPropagation THROUGH TIME (BPTT)

• Forward through entire sequence to compute SUM of losses at ALL (or part of) time steps

• Then backprop through ENTIRE sequence to compute gradients


BPTT computation principle

[Figure: the recurrence unrolled as three blocks sharing the same weights W(t), with external inputs U(t), U(t+1), U(t+2), states X(t), X(t+1), X(t+2), and desired outputs D(t+1), D(t+2); the gradients dE/dX flow backwards through the blocks, each block contributing dW1, dW2, dW3]

dW = dW1 + dW2 + dW3


BPTT algorithm

$W(t+N_t) = W(t) - \lambda\, \nabla_W E$, with $E = \sum_t \big(Y_t - D_t\big)^2$

$\forall t, \quad \dfrac{\partial E_t}{\partial W} = \dfrac{\partial E_t}{\partial Y_t}\, \dfrac{\partial Y_t}{\partial X_{t-1}}\, \dfrac{\partial X_{t-1}}{\partial W} \qquad \text{and} \qquad \dfrac{\partial E}{\partial W} = \sum_{t=1}^{T} \dfrac{\partial E_t}{\partial W}$

[Figure: the feedforward network with external input U(t), output Y(t), and state X(t) fed back through a delay as X(t-1)]

$\dfrac{\partial X_t}{\partial W} = \sum_{k=1}^{t-1} \dfrac{\partial X_t}{\partial X_{t-k}}\, \dfrac{\partial X_{t-k}}{\partial W}$

and (chain rule)

$\dfrac{\partial X_t}{\partial X_{t-k}} = \prod_{j=t-k+1}^{t} \dfrac{\partial X_j}{\partial X_{j-1}}$

where each $\partial X_j / \partial X_{j-1}$ is the Jacobian matrix of the feedforward net.
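A hedged NumPy sketch of these formulas for the vanilla RNN of the earlier slide: forward through the whole sequence storing the states, then backward from the last step to the first, each step adding its contribution to the shared-weight gradients (as in dW = dW1 + dW2 + dW3):

```python
import numpy as np

def bptt(xs, ds, Wxh, Whh, Why, h0):
    """BPTT for h_t = tanh(Wxh x_t + Whh h_{t-1}), y_t = Why h_t,
    with summed squared error E = sum_t (y_t - d_t)^2 as on the slides."""
    T, hs, ys = len(xs), {-1: h0}, {}
    for t in range(T):                          # forward through the WHOLE sequence
        hs[t] = np.tanh(Wxh @ xs[t] + Whh @ hs[t - 1])
        ys[t] = Why @ hs[t]
    dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
    dh_next = np.zeros_like(h0)                 # gradient flowing in from step t+1
    for t in reversed(range(T)):                # backward through the ENTIRE sequence
        dy = 2.0 * (ys[t] - ds[t])              # dE_t/dy_t
        dWhy += np.outer(dy, hs[t])
        dh = Why.T @ dy + dh_next               # local + through-time contributions
        dz = (1.0 - hs[t] ** 2) * dh            # through tanh
        dWxh += np.outer(dz, xs[t])             # each step adds its dW_k
        dWhh += np.outer(dz, hs[t - 1])
        dh_next = Whh.T @ dz                    # Jacobian^T propagates the gradient back
    return dWxh, dWhh, dWhy

rng = np.random.default_rng(0)
xs, ds = rng.normal(size=(6, 3)), rng.normal(size=(6, 4))
grads = bptt(xs, ds, rng.normal(size=(5, 3)), rng.normal(size=(5, 5)),
             rng.normal(size=(4, 5)), np.zeros(5))
```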


Vanishing/exploding gradient problem

• If eigenvalues of the Jacobian matrix are > 1, then gradients tend to EXPLODE
⇒ Learning will never converge.

• Conversely, if eigenvalues of the Jacobian matrix are < 1, then gradients tend to VANISH
⇒ Error signals can only affect small time lags ⇒ short-term memory.

⇒ Possible solution for exploding gradients: the CLIPPING trick (see the sketch below)

⇒ Possible solutions for vanishing gradients:
– use ReLU instead of tanh
– change what is inside the RNN!
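As an illustration of the clipping trick, PyTorch provides a built-in clip_grad_norm_ utility; the model and loss below are dummy placeholders just to get gradients flowing:

```python
import torch
import torch.nn as nn

model = nn.RNN(input_size=4, hidden_size=8, batch_first=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(2, 50, 4)        # batch of 2 sequences, 50 steps, 4 features
out, h = model(x)
loss = out.pow(2).sum()          # dummy loss

loss.backward()                  # BPTT through all 50 steps
# CLIPPING trick: rescale gradients whose global norm exceeds 1.0,
# preventing the exploding-gradient updates described above
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```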


Outline

• Standard Recurrent Neural Networks

• Training RNN: BackPropagation Through Time

• LSTM and GRU

• Applications of RNNs


Long Short-Term Memory (LSTM)

Problem of standard RNNs = no actual LONG-TERM memory

LSTM = RNN variant for solving this issue (proposed by Hochreiter & Schmidhuber in 1997)

• Key idea = use “gates” that modulate the respective influences of input and memory

[Figures from https://colah.github.io/posts/2015-08-Understanding-LSTMs/]


LSTM gates

Gate = pointwise multiplication by a factor σ in ]0;1[ ⇒ modulates between “let nothing through” and “let everything through”

• FORGET gate

• INPUT gate

⇒ next state = mix between pure memory and pure new content (see the equations below)

[Figures from https://colah.github.io/posts/2015-08-Understanding-LSTMs/]
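In the notation of Olah’s tutorial cited above (σ = logistic sigmoid, ⊙ = pointwise product), the two gates and the resulting state update read:

$f_t = \sigma\big(W_f \cdot [h_{t-1}, x_t] + b_f\big), \qquad i_t = \sigma\big(W_i \cdot [h_{t-1}, x_t] + b_i\big)$

$\tilde{C}_t = \tanh\big(W_C \cdot [h_{t-1}, x_t] + b_C\big), \qquad C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$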


LSTM summary

• OUTPUT gate

ALL weights Wf, Wi, Wc and Wo (and biases) are LEARNT

[Figure from Deep Learning book by I. Goodfellow, Y. Bengio & A. Courville]
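A minimal NumPy sketch of one complete LSTM step, with the learnt weights named Wf, Wi, Wc, Wo as on the slide; the concatenated [h, x] layout and the toy sizes are assumptions of this sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, C_prev, Wf, Wi, Wc, Wo, bf, bi, bc, bo):
    """One LSTM time step (notation of Olah's tutorial / the slides)."""
    z = np.concatenate([h_prev, x])      # gates read [h_{t-1}, x_t]
    f = sigmoid(Wf @ z + bf)             # FORGET gate: what to keep from C_{t-1}
    i = sigmoid(Wi @ z + bi)             # INPUT gate: what to write from the candidate
    C_tilde = np.tanh(Wc @ z + bc)       # candidate new memory
    C = f * C_prev + i * C_tilde         # mix between pure memory and pure new
    o = sigmoid(Wo @ z + bo)             # OUTPUT gate: what to expose as h_t
    h = o * np.tanh(C)
    return h, C

n_h, n_x = 5, 3
rng = np.random.default_rng(0)
W = lambda: rng.normal(size=(n_h, n_h + n_x))
b = lambda: np.zeros(n_h)
h, C = lstm_step(rng.normal(size=n_x), np.zeros(n_h), np.zeros(n_h),
                 W(), W(), W(), W(), b(), b(), b(), b())
```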


Why does LSTM avoid vanishing gradients?

The cell state $C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$ is updated additively: the backward error signal flows along C modulated by the forget gate, instead of being multiplied at every step by the recurrent weight matrix’s Jacobian, so it no longer shrinks geometrically with the time lag.


Gated Recurrent Unit (GRU)

Simplified variant of LSTM, with only 2 gates: a RESET gate & an UPDATE gate

(proposed by Cho, et al. in 2014)
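Its update equations (as in Olah’s tutorial, biases omitted) merge the cell and hidden states:

$z_t = \sigma\big(W_z \cdot [h_{t-1}, x_t]\big), \qquad r_t = \sigma\big(W_r \cdot [h_{t-1}, x_t]\big)$

$\tilde{h}_t = \tanh\big(W \cdot [r_t \odot h_{t-1}, x_t]\big), \qquad h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$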


Outline

• Standard Recurrent Neural Networks

• Training RNN: BackPropagation Through Time

• LSTM and GRU

• Applications of RNNs


Typical usages of RNNs

[Figure: typical usages with SEQUENCE input and/or output, e.g. sequence to sequence]


Combining RNN with CNN

Input into the RNN the features from the last convolutional layer, for example for image captioning (see the sketch below).
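A hedged PyTorch sketch of this CNN-to-RNN coupling for captioning; the class name, the sizes, and the choice of initializing the hidden state from the CNN features are illustrative assumptions, not the exact architecture of any published captioner:

```python
import torch
import torch.nn as nn

class CaptionRNN(nn.Module):
    """Feed CNN features of the image as the initial hidden state of an RNN
    that then generates the caption word by word."""
    def __init__(self, feat_dim=512, embed_dim=128, hidden=256, vocab=1000):
        super().__init__()
        self.init_h = nn.Linear(feat_dim, hidden)   # CNN features -> initial h_0
        self.embed = nn.Embedding(vocab, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, cnn_features, word_ids):
        h0 = torch.tanh(self.init_h(cnn_features)).unsqueeze(0)  # (1, B, hidden)
        e = self.embed(word_ids)                                 # (B, T, embed_dim)
        out, _ = self.rnn(e, h0)
        return self.out(out)                                     # word scores per step

feats = torch.randn(2, 512)              # e.g. pooled last-conv-layer features
words = torch.randint(0, 1000, (2, 7))   # previous caption words (teacher forcing)
scores = CaptionRNN()(feats, words)      # (2, 7, 1000)
```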


Deep RNNs

Several RNNs stacked (like layers in an MLP), as in the snippet below.
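In PyTorch, for example, stacking is just the num_layers argument (sizes arbitrary):

```python
import torch.nn as nn

# Three recurrent layers stacked, like layers in an MLP
deep_rnn = nn.RNN(input_size=10, hidden_size=32, num_layers=3, batch_first=True)
```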


Bi-directional RNNs

(e.g. for offline classification of a sequence of words)
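Again in PyTorch, bidirectionality is a single flag; the output concatenates the forward-in-time and backward-in-time features:

```python
import torch
import torch.nn as nn

birnn = nn.LSTM(input_size=10, hidden_size=32, bidirectional=True, batch_first=True)
out, _ = birnn(torch.randn(4, 25, 10))   # out: (4, 25, 64) — 2 x 32 features per step
```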


Encoder-decoder RNN
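A hedged PyTorch sketch of the idea (sizes and the use of GRUs are arbitrary choices): the encoder compresses the input sequence into its final state, which initializes the decoder that unrolls the output sequence.

```python
import torch
import torch.nn as nn

enc = nn.GRU(input_size=8, hidden_size=32, batch_first=True)
dec = nn.GRU(input_size=8, hidden_size=32, batch_first=True)

src = torch.randn(2, 12, 8)       # input sequence (e.g. source sentence)
_, h = enc(src)                   # final encoder state summarizes the sequence
tgt_in = torch.randn(2, 9, 8)     # decoder inputs (e.g. shifted target sentence)
out, _ = dec(tgt_in, h)           # decoder unrolls, conditioned on the summary
```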


Applications of RNN/LSTM

Wherever data is intrinsically SEQUENTIAL:

• Speech recognition

• Natural Language Processing (NLP)

– Machine Translation

– Image caption generator

• Gesture recognition

• Music generation

• Potentially any kind of time-series!!


Summary and perspectives on Recurrent Neural Networks

• For SEQUENTIAL data (speech, text, …, gestures, …)

• Impressive results in Natural Language Processing (in particular Automated Real-Time Translation)

• Training of standard RNNs can be tricky (vanishing gradient…)

• LSTM / GRU now more used than standard RNNs


Any QUESTIONS?

