Long-Short Term Memory Network - WordPress.com · 2017-11-07 · Long-Short Term Recurrent Networks...


Long-Short Term Memory Network

Hien Van Nguyen

University of Houston

11/6/2017

Why recurrent networks?

• Sequential input, next state depends on previous state

• Generalize to input with variable length

• Consider smaller chunks: fewer parameters in the model


What is a sequence?

Source: https://uvadlc.github.io/lectures/lecture8.pdf

One-hot vector
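A minimal sketch of one-hot encoding, assuming a toy four-word vocabulary (the words and the helper name `one_hot` are illustrative, not from the slides). Each symbol in a sequence becomes a vector with a single 1 at its vocabulary index:

```python
def one_hot(index, size):
    """Return a list of length `size` with a 1 at `index` and 0 elsewhere."""
    vec = [0] * size
    vec[index] = 1
    return vec

# Hypothetical toy vocabulary; the words are illustrative only.
vocab = ["i", "like", "deep", "learning"]
word_to_index = {word: i for i, word in enumerate(vocab)}

# A sequence of words becomes a sequence of one-hot vectors.
sentence = ["i", "like", "learning"]
encoded = [one_hot(word_to_index[w], len(vocab)) for w in sentence]

for word, vec in zip(sentence, encoded):
    print(word, vec)
```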

Recurrent networks

Unroll through time
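Unrolling can be sketched as a plain loop that applies the same step, with the same weights, at every timestep. This is an illustrative sketch with a scalar state and a tanh activation; the names `rnn_step`, `unroll`, `w_h`, `w_x`, and `b` are assumptions, not notation from the slides.

```python
import math

def rnn_step(h_prev, x, w_h, w_x, b):
    """One recurrent step for a scalar state: h_t = tanh(w_h*h_{t-1} + w_x*x_t + b)."""
    return math.tanh(w_h * h_prev + w_x * x + b)

def unroll(xs, h0, w_h, w_x, b):
    """Apply the same step (same weights) at every timestep; return all states."""
    states = []
    h = h0
    for x in xs:
        h = rnn_step(h, x, w_h, w_x, b)
        states.append(h)
    return states

# The same three parameters handle inputs of different lengths.
short = unroll([1.0, 0.5], h0=0.0, w_h=0.5, w_x=1.0, b=0.0)
longer = unroll([1.0, 0.5, -1.0, 0.25, 0.0], h0=0.0, w_h=0.5, w_x=1.0, b=0.0)
print(len(short), len(longer))
```

Because the weights are shared across timesteps, the parameter count does not grow with the sequence length.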


Simple recurrent network

• Linear activation

• Gradient:

• T is the number of timesteps considered
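The slide's own gradient formula did not survive extraction, so the following is only a sketch under stated assumptions: a scalar recurrent network with linear activation, h_t = w*h_{t-1} + u*x_t, and a squared-error loss on the final state. The symbols `w`, `u`, `y` and the choice of loss are hypothetical. The gradient with respect to `w` is accumulated backwards over the T timesteps and checked against a finite difference.

```python
def forward(xs, w, u, h0=0.0):
    """Linear recurrence h_t = w*h_{t-1} + u*x_t; returns [h_0, h_1, ..., h_T]."""
    hs = [h0]
    for x in xs:
        hs.append(w * hs[-1] + u * x)
    return hs

def loss(xs, y, w, u):
    """Assumed loss: squared error on the final state only."""
    return 0.5 * (forward(xs, w, u)[-1] - y) ** 2

def grad_w(xs, y, w, u):
    """Backpropagation through time: sum the contribution of w at every step."""
    hs = forward(xs, w, u)
    T = len(xs)
    dh = hs[-1] - y            # dL/dh_T
    g = 0.0
    for t in range(T, 0, -1):
        g += dh * hs[t - 1]    # direct effect of w at step t
        dh *= w                # propagate to dL/dh_{t-1}
    return g

xs, y, w, u = [1.0, -0.5, 0.25, 2.0], 1.0, 0.9, 0.5
analytic = grad_w(xs, y, w, u)
eps = 1e-6
numeric = (loss(xs, y, w + eps, u) - loss(xs, y, w - eps, u)) / (2 * eps)
print(analytic, numeric)
```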

Problem of Vanishing/Exploding Gradient

• Review of chain rule

• Apply chain rule:

How a change in V at step k affects the loss at step t

Source: On the difficulty of training recurrent networks, https://arxiv.org/pdf/1211.5063.pdf

• Recall that:

• Using chain rule:
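The effect of the repeated chain-rule product can be illustrated numerically. This sketch assumes a scalar linear recurrence with recurrent weight `w` (a hypothetical stand-in, since the slides' notation was lost), for which the factor linking step k to step t is `w` multiplied by itself t - k times:

```python
def gradient_factor(w, steps):
    """Product of `steps` identical per-step derivatives dh_t/dh_{t-1} = w."""
    factor = 1.0
    for _ in range(steps):
        factor *= w
    return factor

for w in (0.9, 1.0, 1.1):
    print(w, gradient_factor(w, 50))
```

With |w| < 1 the factor shrinks towards zero over 50 steps (vanishing), and with |w| > 1 it grows without bound (exploding); only w = 1 leaves it unchanged.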

Long-Short Term Recurrent Networks (LSTM)

• Idea: Don’t multiply. Multiplication == vanishing gradients

Instead of multiplying the previous hidden state by a matrix to get the new state, we add something to the old state to get the new state (in LSTM language this state is called the “cell” rather than the “hidden state”; explained next)

• Intuition:
  - Not everything is useful to remember
  - Not every input is useful to take
  - Not necessary to output each instance
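The three intuitions correspond to the forget, input, and output gates of the standard LSTM formulation. A minimal sketch of one LSTM step with scalar states follows; the parameter dictionary `p` and its key names are hypothetical, and real implementations use weight matrices rather than scalars.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM step with scalar input, hidden state, and cell state."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate: what to remember
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate: what input to take
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate: what to output
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g      # additive update of the cell state
    h = o * math.tanh(c)        # gated output
    return h, c

# Hypothetical parameter values, chosen only for illustration.
params = {"wf": 0.5, "uf": 0.1, "bf": 1.0,
          "wi": 0.4, "ui": 0.2, "bi": 0.0,
          "wo": 0.3, "uo": 0.1, "bo": 0.0,
          "wg": 1.0, "ug": 0.5, "bg": 0.0}

h, c = 0.0, 0.0
for x in [1.0, 0.5, -1.0]:
    h, c = lstm_step(x, h, c, params)
    print(round(h, 4), round(c, 4))
```

Note that the old cell state enters the new one through `f * c_prev + i * g`, a gated sum rather than a matrix multiplication followed by a nonlinearity.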

• Comparison of vanilla RNN and LSTM

Vanilla RNN

Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/


LSTM-Step by Step


LSTM-Gradient Flow

Learning sequence representation: https://d-nb.info/1082034037/34

http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf
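A simplified sketch of gradient flow along the cell state. It assumes the gate values are fixed constants, which ignores the indirect path through the hidden state, so it only illustrates the direct cell-to-cell path. Under that assumption the derivative of the final cell state with respect to the initial one is the product of the forget gates; the names `run_cell`, `forget_gates`, and `writes` are hypothetical.

```python
def run_cell(c0, forget_gates, writes):
    """Cell recurrence c_t = f_t * c_{t-1} + write_t with gates held fixed."""
    c = c0
    for f, w in zip(forget_gates, writes):
        c = f * c + w
    return c

forget_gates = [0.99] * 50   # forget gates close to 1: the cell mostly remembers
writes = [0.1] * 50          # stands in for the input-gate term i_t * g_t

# Analytic derivative dc_T/dc_0 along the cell path: product of forget gates.
analytic = 1.0
for f in forget_gates:
    analytic *= f

# Finite-difference check of the same derivative.
eps = 1e-3
numeric = (run_cell(1.0 + eps, forget_gates, writes)
           - run_cell(1.0 - eps, forget_gates, writes)) / (2 * eps)
print(analytic, numeric)
```

With forget gates near 1, a sizeable fraction of the gradient survives 50 steps, whereas repeated multiplication by a recurrent weight of 0.9 would leave well under 1% of it.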


Applications – Machine Translation

Source: https://uvadlc.github.io/lectures/lecture8.pdf

Applications – Machine Translation

Google Pixel Buds

Applications – Image Captioning

Applications – Question Answering

Applications – Visual Question Answering

Source: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf

