Post on 21-May-2020

Long-Short Term Memory Network

Hien Van Nguyen

University of Houston

11/6/2017

Why recurrent networks?

• Sequential input: the next state depends on the previous state

• Generalizes to inputs of variable length

• Processes the input in smaller chunks, so the model needs fewer parameters
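The parameter argument can be made concrete with a rough count. The sketch below is illustrative only: the layer sizes and both function names are assumptions, not taken from the slides. It compares one dense layer applied to a whole flattened sequence against one recurrent layer that reuses its weights at every step.

```python
def dense_param_count(seq_len, input_dim, hidden_dim):
    """Weights and biases of one dense layer over the flattened sequence."""
    return seq_len * input_dim * hidden_dim + hidden_dim


def rnn_param_count(input_dim, hidden_dim):
    """Input weights, recurrent weights and bias of one recurrent layer."""
    return input_dim * hidden_dim + hidden_dim * hidden_dim + hidden_dim


# Hypothetical sizes: 50-dimensional inputs, 64 hidden units.
for seq_len in (10, 100, 1000):
    print(seq_len, dense_param_count(seq_len, 50, 64), rnn_param_count(50, 64))
```

The dense count grows linearly with the sequence length, while the recurrent count does not depend on it at all.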

11/7/2017 Machine Learning 2

What is a sequence?

Source: https://uvadlc.github.io/lectures/lecture8.pdf

One-hot vector
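A minimal sketch of one-hot encoding, assuming a toy four-word vocabulary invented for illustration (the original slide's example is not recoverable from the transcript). Each word becomes a vector of zeros with a single 1 at the word's index.

```python
import numpy as np


def one_hot(index, vocab_size):
    """Return a vector of zeros with a single 1 at position `index`."""
    vec = np.zeros(vocab_size, dtype=np.float32)
    vec[index] = 1.0
    return vec


# Hypothetical toy vocabulary, for illustration only.
vocab = ["the", "cat", "sat", "down"]
word_to_index = {word: i for i, word in enumerate(vocab)}

sentence = ["the", "cat", "sat"]
encoded = np.stack([one_hot(word_to_index[w], len(vocab)) for w in sentence])
print(encoded.shape)  # one row per word, one column per vocabulary entry
```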

Recurrent networks

Unroll through time
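Unrolling can be sketched as a plain loop that applies the same weights at every timestep. The names `W_xh`, `W_hh`, `b_h` and the tanh nonlinearity are assumptions chosen for this example; the slide's own notation is not recoverable from the transcript.

```python
import numpy as np


def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Unroll a vanilla RNN over `inputs` of shape (T, input_dim).

    The same W_xh, W_hh and b_h are reused at every timestep.
    Returns all hidden states, shape (T, hidden_dim).
    """
    h = np.zeros(W_hh.shape[0])
    states = []
    for x_t in inputs:
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)


rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 8
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

# Two sequences of different length go through the same parameters.
short_states = rnn_forward(rng.normal(size=(3, input_dim)), W_xh, W_hh, b_h)
long_states = rnn_forward(rng.normal(size=(12, input_dim)), W_xh, W_hh, b_h)
print(short_states.shape, long_states.shape)
```

Because the loop length is set by the input, the same function handles sequences of any length without changing the parameter shapes.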

Simple recurrent network

• Linear activation

• Gradient:

• 𝑇 is the number of timesteps considered
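The gradient formula itself did not survive in the transcript, so the following is only a sketch under an assumed form of the linear recurrence, h_t = W h_{t-1} + U x_t (the symbols `W`, `U` and `h` are assumptions). With a linear activation, the Jacobian of the final state with respect to the initial state is W multiplied by itself 𝑇 times, which the code checks against finite differences.

```python
import numpy as np


def linear_rnn_final_state(h0, inputs, W, U):
    """Run h_t = W h_{t-1} + U x_t (no nonlinearity) and return h_T."""
    h = h0
    for x_t in inputs:
        h = W @ h + U @ x_t
    return h


rng = np.random.default_rng(0)
n, d, T = 3, 2, 5
W = rng.normal(scale=0.5, size=(n, n))
U = rng.normal(scale=0.5, size=(n, d))
inputs = rng.normal(size=(T, d))
h0 = rng.normal(size=n)

# Analytic Jacobian of h_T with respect to h_0: the T-th matrix power of W.
analytic = np.linalg.matrix_power(W, T)

# Finite-difference check, one Jacobian column per perturbed coordinate.
eps = 1e-6
base = linear_rnn_final_state(h0, inputs, W, U)
numeric = np.zeros((n, n))
for j in range(n):
    bumped = h0.copy()
    bumped[j] += eps
    numeric[:, j] = (linear_rnn_final_state(bumped, inputs, W, U) - base) / eps

print(np.allclose(analytic, numeric, atol=1e-4))
```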

Problem of Vanishing/Exploding Gradient

• Review of chain rule

• Apply chain rule:

How a change in V at step k affects the loss at step t

On the difficulty of training recurrent networks https://arxiv.org/pdf/1211.5063.pdf
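The slide's equations were lost in extraction. For reference, the chain-rule expansion in the cited paper is written as follows, using that paper's notation (state x_t, input u_t, recurrent matrix W_rec, generic parameters θ), which may differ from the symbols on the original slide, where the caption refers to the parameter as V.

```latex
% Recurrence (Pascanu et al., arXiv:1211.5063)
x_t = W_{rec}\,\sigma(x_{t-1}) + W_{in}\,u_t + b

% Total gradient is a sum over timesteps
\frac{\partial \mathcal{E}}{\partial \theta}
  = \sum_{1 \le t \le T} \frac{\partial \mathcal{E}_t}{\partial \theta}

% Contribution of step k to the loss at step t
\frac{\partial \mathcal{E}_t}{\partial \theta}
  = \sum_{1 \le k \le t}
    \left(
      \frac{\partial \mathcal{E}_t}{\partial x_t}\,
      \frac{\partial x_t}{\partial x_k}\,
      \frac{\partial^{+} x_k}{\partial \theta}
    \right)

% The factor that transports the error from step t back to step k
\frac{\partial x_t}{\partial x_k}
  = \prod_{t \ge i > k} \frac{\partial x_i}{\partial x_{i-1}}
  = \prod_{t \ge i > k} W_{rec}^{T}\,\mathrm{diag}\!\left(\sigma'(x_{i-1})\right)
```

The last line is a product of t - k Jacobians, which is where repeated multiplication enters.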

Problem of Vanishing/Exploding Gradient

• Recall that:

• Using chain rule:
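A small numerical sketch of why a long product of Jacobians is a problem. It assumes a linear activation, so each per-step Jacobian is just the recurrent matrix, and it uses a scaled orthogonal matrix (an assumption made so that every singular value is known exactly). The function name is hypothetical.

```python
import numpy as np


def jacobian_product_norms(W, steps):
    """Spectral norm of W multiplied by itself k times, for k = 1..steps."""
    norms = []
    product = np.eye(W.shape[0])
    for _ in range(steps):
        product = W @ product
        norms.append(np.linalg.norm(product, 2))
    return norms


rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))  # random orthogonal matrix

shrinking = 0.5 * Q  # every singular value is 0.5
growing = 1.5 * Q    # every singular value is 1.5

vanishing_norms = jacobian_product_norms(shrinking, 20)
exploding_norms = jacobian_product_norms(growing, 20)
print(vanishing_norms[-1])  # close to 0.5 ** 20
print(exploding_norms[-1])  # close to 1.5 ** 20
```

After only 20 steps the two norms differ by roughly nine orders of magnitude, even though the per-step factors are 0.5 and 1.5.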

Long-Short Term Recurrent Networks (LSTM)

• Idea: Don’t multiply. Multiplication == vanishing gradients.

Instead of multiplying the previous hidden state by a matrix to get the new state, we add something to the old hidden state to get the new state (called the “cell” rather than the “hidden state” in LSTM language, explained next).
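The contrast can be written side by side. This is a sketch in the notation of the colah.github.io post cited in this deck, with lowercase c for the cell; the vanilla RNN symbols W and U are assumptions.

```latex
% Vanilla RNN: the previous state passes through a matrix and a nonlinearity
h_t = \tanh(W h_{t-1} + U x_t)

% LSTM cell: the previous cell is rescaled elementwise and a new term is added
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
```

Here f_t and i_t are gate vectors with entries between 0 and 1, and \tilde{c}_t is the candidate content to write.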

Long-Short Term Recurrent Networks (LSTM)

• Intuition:

Not everything is useful to remember

Not every input is useful to take

Not necessary to output each instance

Long-Short Term Recurrent Networks (LSTM)

• Comparison of vanilla RNN and LSTM

Vanilla RNN

LSTM

Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/

LSTM-Step by Step
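The step-by-step figures are missing from the transcript. As a stand-in, here is a minimal NumPy sketch of one LSTM timestep that follows the standard gate equations as presented in the colah.github.io post cited in this deck. The parameter layout, the names and the initialisation are assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM timestep.

    `params` maps a gate name to a (W, b) pair; each W acts on the
    concatenation [h_prev, x_t].
    """
    z = np.concatenate([h_prev, x_t])
    W_f, b_f = params["forget"]
    W_i, b_i = params["input"]
    W_c, b_c = params["candidate"]
    W_o, b_o = params["output"]

    f_t = sigmoid(W_f @ z + b_f)      # forget gate: what to keep from the old cell
    i_t = sigmoid(W_i @ z + b_i)      # input gate: how much new content to take
    c_tilde = np.tanh(W_c @ z + b_c)  # candidate content
    c_t = f_t * c_prev + i_t * c_tilde  # additive cell update
    o_t = sigmoid(W_o @ z + b_o)      # output gate: how much of the cell to expose
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t


def init_params(input_dim, hidden_dim, rng):
    """Small random weights and zero biases; illustrative initialisation only."""
    shape = (hidden_dim, hidden_dim + input_dim)
    return {
        name: (rng.normal(scale=0.1, size=shape), np.zeros(hidden_dim))
        for name in ("forget", "input", "candidate", "output")
    }


rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 6
params = init_params(input_dim, hidden_dim, rng)
h = np.zeros(hidden_dim)
c = np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):
    h, c = lstm_step(x_t, h, c, params)
print(h.shape, c.shape)
```

The three gates correspond to the three intuitions stated earlier: what to remember, which input to take, and when to output.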

LSTM-Gradient Flow
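A numerical sketch of the gradient path through the cell state. It rests on a simplifying assumption stated in the code: the gate values are fixed numbers rather than functions of the input and hidden state, which isolates the direct cell-to-cell path. Along that path the derivative of the final cell with respect to the initial cell is the elementwise product of the forget gates, with no matrix multiplication involved.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, T = 4, 30

# Assumption for this sketch: gates are fixed values close to 1, not
# functions of the input and hidden state as in a full LSTM.
forget_gates = rng.uniform(0.95, 1.0, size=(T, hidden_dim))
writes = rng.normal(size=(T, hidden_dim))  # stands in for i_t * candidate


def run_cell(c0):
    """Apply c_t = f_t * c_{t-1} + write_t for T steps and return c_T."""
    c = c0
    for f_t, w_t in zip(forget_gates, writes):
        c = f_t * c + w_t
    return c


# Analytic derivative of c_T with respect to c_0 along the cell path:
# a diagonal whose entries are the products of the forget gates.
analytic = np.prod(forget_gates, axis=0)

# Finite-difference check of the same diagonal entries.
eps = 1e-6
c0 = rng.normal(size=hidden_dim)
base = run_cell(c0)
numeric = np.array([
    (run_cell(c0 + eps * np.eye(hidden_dim)[j])[j] - base[j]) / eps
    for j in range(hidden_dim)
])

print(np.allclose(analytic, numeric, atol=1e-4))
print(analytic.min())  # no smaller than 0.95 ** 30 by construction
```

In a full LSTM the gradient also flows through the hidden state and the gates, so this is only the most favourable path, not the whole picture.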

Learning sequence representation: https://d-nb.info/1082034037/34

http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf

Applications – Machine Translation

Source: https://uvadlc.github.io/lectures/lecture8.pdf

Applications – Machine Translation

Google Pixel Buds

Applications – Image Captioning

Applications – Question Answering

Applications – Visual Question Answering

Source: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf
