Towards Understanding Long Short Term Memory Networks HMI 1/28/2019 Jordan Rodu Department of Statistics University of Virginia
Page 1: Towards Understanding Long Short Term Memory Networks · hmi.virginia.edu/wp-content/uploads/2018/01/jordan-rodu-slides.pdf

Towards Understanding Long Short Term Memory Networks

HMI

1/28/2019

Jordan Rodu

Department of Statistics

University of Virginia

with Joao Sedoc

University of Pennsylvania

Department of Computer Science

Page 3:

Mapping LSTMs to state space models

• Goal is not to interpret results on specific data

• Rather, map LSTMs onto reasonable models to understand the space of sequences captured by LSTMs

• Preliminary work, basic ideas

Page 7:

Hidden Markov Models

[Figure: hidden states at times t, t+1, t+2; consecutive states are linked by transitions labeled T]

$p(x_{t+1} \mid x_{1:t}) = p(x_{t+1} \mid x_t)$

Page 9:

Hidden Markov Models

[Figure: hidden states at times t, t+1, t+2; each state emits an observation through a link labeled O]

$p(y_t \mid x_t)$
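The two conditional distributions on these slides, $p(x_{t+1} \mid x_t)$ and $p(y_t \mid x_t)$, are enough to generate a sequence. A minimal sketch for the discrete case follows. The function name `sample_hmm`, the initial distribution `pi`, and the column-stochastic layout (`T[i, j]` is the probability of moving to state `i` from state `j`, `O[y, x]` is the probability of emitting `y` from state `x`) are assumptions made for illustration; they do not appear in the slides.

```python
import numpy as np

def sample_hmm(T, O, pi, n_steps, rng):
    """Draw hidden states and observations from a discrete HMM.

    Assumed layout (not from the slides):
      T[i, j] = p(x_{t+1} = i | x_t = j), columns sum to 1
      O[y, x] = p(y_t = y | x_t = x),     columns sum to 1
      pi[i]   = p(x_1 = i)
    """
    k = T.shape[0]
    xs, ys = [], []
    x = rng.choice(k, p=pi)  # initial hidden state
    for _ in range(n_steps):
        xs.append(int(x))
        # Emission depends only on the current hidden state.
        ys.append(int(rng.choice(O.shape[0], p=O[:, x])))
        # Markov property: the next state depends only on the current state.
        x = rng.choice(k, p=T[:, x])
    return xs, ys
```

With a transition matrix that deterministically swaps two states and an identity emission matrix, the sampled states simply alternate, which is an easy sanity check of the layout.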

Page 10:

Hidden Markov Models: flavors

• Output
  • Discrete
  • Continuous
  • Low dimensional
  • High dimensional

• States
  • Discrete
  • Continuous
  • Low dimensional
  • High dimensional

• Time
  • Discrete
  • Continuous

Page 12:

Hidden Markov Models

[Figure: three column vectors with 0/1 entries, $(1, \ldots, 0)^\top$, $(0, \ldots, 1)^\top$, $(0, \ldots, 0)^\top$]

Page 16:

Hidden Markov Models

[Figure: three column vectors, $(b_1, \ldots, b_k)^\top$, $(p_{x_1 \mid y}, \ldots, p_{x_k \mid y})^\top$, and $(\tilde{b}_1, \ldots, \tilde{b}_k)^\top$]

Page 17:

Hidden Markov Model: belief states
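For readers who want a concrete picture of a belief state, the sketch below implements the textbook HMM filtering step for a discrete model: push the current belief through the transition matrix, weight by the likelihood of the new observation, and renormalize. This is a standard construction and may not match the exact figures on the slides. The function name `belief_update` and the matrix layout (`T[i, j]` is the probability of moving to state `i` from state `j`, `O[y, x]` is the probability of observing `y` in state `x`) are assumptions.

```python
import numpy as np

def belief_update(b, T, O, y):
    """One step of discrete HMM filtering.

    Assumed layout (not from the slides):
      b[j]    = current belief that the hidden state is j
      T[i, j] = p(x_{t+1} = i | x_t = j)
      O[y, x] = p(y | x)
    """
    prior = T @ b                  # predict: belief before seeing y
    unnorm = O[y, :] * prior       # correct: weight each state by p(y | x)
    return unnorm / unnorm.sum()   # renormalize so the belief sums to 1
```

The intermediate `prior` uses only the transition matrix, while the returned belief uses both the transition and the observation matrices.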

Page 20:

A few related architectures

[Figure: three columns of nodes, each abbreviated with ⋮]

Page 26:

A few related architectures

[Figure: time-indexed transitions $T_t$, $T_{t+1}$, $T_{t+2}$ and observations $O_t$, $O_{t+1}$, $O_{t+2}$]

Page 30:

LSTM

[Figure: LSTM cell diagram with three σ gates, a tanh layer, elementwise × and + operations, and an output tanh; labels $y_t$ (input) and $h_t$ (hidden state)]

$\sigma(W_h h_{t-1} + W_y y_t + b)$
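The gate expression $\sigma(W_h h_{t-1} + W_y y_t + b)$ can be written out directly. The sketch below is a minimal NumPy version; the function names `sigmoid` and `gate` and the array shapes are assumptions for illustration. In a standard LSTM each of the σ gates has its own parameters, whereas the slide shows a single generic set $W_h$, $W_y$, $b$.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

def gate(W_h, W_y, b, h_prev, y_t):
    """Compute sigma(W_h h_{t-1} + W_y y_t + b).

    Assumed shapes: W_h is (k, k), W_y is (k, d), b and h_prev are (k,),
    y_t is (d,). Every entry of the result lies strictly between 0 and 1.
    """
    return sigmoid(W_h @ h_prev + W_y @ y_t + b)
```

With all weights and biases set to zero, every gate entry is 0.5, regardless of the input.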

Page 31:

Hidden Markov Models: reminder

[Figure: three column vectors, $(b_1, \ldots, b_k)^\top$, $(p_{x_1 \mid y}, \ldots, p_{x_k \mid y})^\top$, and $(\tilde{b}_1, \ldots, \tilde{b}_k)^\top$]

Page 33:

LSTM

[Figure: LSTM cell diagram with three σ gates, a tanh layer, elementwise × and + operations, and an output tanh; labels $y_t$ (input) and $h_t$ (hidden state)]

$\sigma(W_h h_{t-1} + W_y y_t + b)$

prior (T)

Page 34:

LSTM

[Figure: LSTM cell diagram with three σ gates, a tanh layer, elementwise × and + operations, and an output tanh; labels $y_t$ (input) and $h_t$ (hidden state)]

$\sigma(W_h h_{t-1} + W_y y_t + b)$

posterior (T and O)

Page 37:

Hidden states: $2^k$ states for $k$ hidden nodes

[Figure: three columns of nodes, each abbreviated with ⋮]
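The count of $2^k$ follows if each of the $k$ hidden nodes is treated as binary, so a joint hidden state is one on/off pattern across all nodes. The short sketch below enumerates these patterns; the function name `binary_states` and the 0/1 encoding are assumptions for illustration.

```python
from itertools import product

def binary_states(k):
    """List every joint configuration of k binary hidden nodes."""
    return list(product((0, 1), repeat=k))

states = binary_states(3)
print(len(states))  # 8, i.e. 2**3
```

The list grows exponentially in `k`, so explicit enumeration is only practical for small `k`.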

Page 38:

Dependencies specified by weights from $W_h$

[Figure: three columns of nodes, each abbreviated with ⋮]

Page 41:

Incorporating memory cell

[Figure: LSTM cell diagram with three σ gates, a tanh layer, elementwise × and + operations, and an output tanh; labels $y_t$ (input) and $h_t$ (hidden state)]

$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$

$\sigma(W_h h_{t-1} + W_y y_t + b)$
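To see how the memory-cell update $C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$ fits with the gates, here is a minimal sketch of one step of a standard LSTM cell. The slide gives only the cell update and the generic gate form; the per-gate parameter dictionary `params`, the key names `"f"`, `"i"`, `"o"`, `"c"`, and the output rule $h_t = o_t * \tanh(C_t)$ follow the usual textbook formulation and are assumptions here.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(params, h_prev, C_prev, y_t):
    """One step of a standard LSTM cell.

    params maps each of "f", "i", "o", "c" to a tuple (W_h, W_y, b)
    (hypothetical layout). y_t is the input, matching the slides' notation.
    """
    def pre(name):
        W_h, W_y, b = params[name]
        return W_h @ h_prev + W_y @ y_t + b

    f_t = sigmoid(pre("f"))          # forget gate
    i_t = sigmoid(pre("i"))          # input gate
    o_t = sigmoid(pre("o"))          # output gate
    C_tilde = np.tanh(pre("c"))      # candidate cell value
    C_t = f_t * C_prev + i_t * C_tilde   # memory-cell update from the slide
    h_t = o_t * np.tanh(C_t)             # standard output rule (assumed)
    return h_t, C_t
```

With all parameters set to zero, each gate equals 0.5 and the candidate is 0, so the cell simply halves its previous value at every step.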

Page 44:

[Figure: three columns of nodes, each abbreviated with ⋮, with node values $-1, 0, 1$]

State space size $3^k$, with a special symmetry that modulates excitation and inhibition
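One possible reading of the values $-1, 0, 1$, which is an assumption and not stated on the slide, is the fully saturated regime of the standard output rule $h_t = o_t * \tanh(C_t)$: the output gate is near 0 or 1 and the tanh is near $-1$ or $+1$, so each node takes one of three values. The sketch below enumerates that case and counts the resulting joint states; both function names are hypothetical.

```python
from itertools import product

def saturated_hidden_values():
    """Values of o * tanh(C) when o is in {0, 1} and tanh(C) is in {-1, +1}.

    This saturated regime is an illustrative assumption.
    """
    return sorted({o * c for o, c in product((0, 1), (-1, 1))})

def count_states(k):
    """Number of joint states for k nodes under the same assumption."""
    return len(saturated_hidden_values()) ** k

print(saturated_hidden_values())  # [-1, 0, 1]
print(count_states(4))            # 81, i.e. 3**4
```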

Page 45:

Partially Observable Markov Decision Process

[Figure (relaxed visualization): hidden states at times t, t+1, t+2 with transitions labeled T, observations labeled O, and actions $a_t$, $a_{t+1}$, $a_{t+2}$]
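In a POMDP the belief over hidden states is propagated much as in an HMM, except that the transition applied at each step depends on the chosen action. The sketch below shows the textbook belief update for a discrete POMDP. The function name, the array layout (`T[a]` is the transition matrix under action `a`, with `T[a][i, j]` the probability of moving to state `i` from state `j`), and the choice of an action-independent observation matrix `O` are assumptions for illustration and are not taken from the slides.

```python
import numpy as np

def pomdp_belief_update(b, T, O, a, y):
    """Belief update for a discrete POMDP after taking action a and observing y.

    Assumed layout (not from the slides):
      T has shape (num_actions, k, k), T[a][i, j] = p(x' = i | x = j, a)
      O has shape (num_obs, k),        O[y, x]    = p(y | x)
    """
    prior = T[a] @ b               # predict using the action's transition matrix
    unnorm = O[y, :] * prior       # weight by the observation likelihood
    return unnorm / unnorm.sum()   # renormalize
```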

Page 49:

POMDP representation

[Figure: LSTM cell diagram with three σ gates, a tanh layer, elementwise × and + operations, and an output tanh; labels $y_t$ (input) and $h_t$ (hidden state)]

$\sigma(W_h h_{t-1} + W_y y_t + b)$

policy

Recall that our goal here is not to learn a POMDP or to approximate a POMDP using LSTMs, but rather to wrap LSTMs in models for which we have a more robust understanding.

Page 50:

Thanks!

