Deep Generative Language Models (DGM4NLP)

Miguel Rios
University of Amsterdam

May 2, 2019

Outline

1 Language Modelling
2 Variational Auto-encoder for Sentences

Recap: Generative Models of Word Representation

Discriminative embedding models: word2vec

"In the event of a chemical spill, most children know they should evacuate as advised by people in charge."

Place words in R^d so as to answer questions like: "Have I seen this word in this context?"

Fit a binary classifier with positive examples and negative examples.
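A minimal sketch of this binary-classifier view, in the spirit of word2vec's skip-gram with negative sampling. The vocabulary size, dimensions, learning rate, and word ids are illustrative assumptions, not the course materials.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 1000, 50                              # vocabulary size, embedding dimension
W_in = rng.normal(scale=0.1, size=(V, d))    # target-word embeddings
W_out = rng.normal(scale=0.1, size=(V, d))   # context-word embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(target, context, negatives, lr=0.05):
    """One SGD step of logistic regression: the (target, context) pair is a
    positive example (label 1), the (target, negative) pairs are negative
    examples (label 0)."""
    for ctx, label in [(context, 1.0)] + [(int(n), 0.0) for n in negatives]:
        score = W_in[target] @ W_out[ctx]    # "have I seen this word in this context?"
        grad = sigmoid(score) - label        # gradient of the logistic loss w.r.t. the score
        g_in = grad * W_out[ctx]             # computed before W_out is updated
        W_out[ctx] -= lr * grad * W_in[target]
        W_in[target] -= lr * g_in

# one positive (target, context) pair and five random negative contexts
sgns_step(target=17, context=42, negatives=rng.integers(0, V, size=5))
```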

The model processes a sentence and outputs a word representation.

[Figure: plate-notation graphical model for the sentence pair "quickly evacuate the area / deje el lugar rapidamente", with variables x, y, z, a, parameters θ, and plate sizes m, n, and |B|.]

[Figure: architecture of the generative and inference models. Words x are embedded (128d) and fed to a BiRNN (100d); vectors u and s (100d) parametrise the posterior from which z (100d) is sampled; f(z) generates x and g(z, a_j) generates y.]

Introduction

you know nothing, jon x
ground control to major x
the x

the quick brown x
the quick brown fox x
the quick brown fox jumps x
the quick brown fox jumps over x
the quick brown fox jumps over the x
the quick brown fox jumps over the lazy x
the quick brown fox jumps over the lazy dog

Definition

Language models give us the probability of a sentence;
at each time step, they assign a probability to the next word.

Applications

Very useful in different tasks:

Speech recognition;
Spelling correction;
Machine translation.

LMs are useful in almost any task that deals with generating language.

Page 32: Deep Generative Language Models - DGM4NLP · Deep Generative Language Models DGM4NLP Miguel Rios University of Amsterdam May 2, 2019 Rios DGM4NLP 1/46. Language Modelling Variational

Language ModellingVariational Auto-encoder for Sentences

ReferencesReferences

Applications

Very useful on different tasks:

Speech recognition;

Spelling correction;

Machine translation;

LMs are useful in almost any tasks that deals with generatinglanguage.

Rios DGM4NLP 10 / 46

Page 33: Deep Generative Language Models - DGM4NLP · Deep Generative Language Models DGM4NLP Miguel Rios University of Amsterdam May 2, 2019 Rios DGM4NLP 1/46. Language Modelling Variational

Language ModellingVariational Auto-encoder for Sentences

ReferencesReferences

Language Models

N-gram based LMs;

Log-linear LMs;

Neural LMs.

Rios DGM4NLP 11 / 46

Page 34: Deep Generative Language Models - DGM4NLP · Deep Generative Language Models DGM4NLP Miguel Rios University of Amsterdam May 2, 2019 Rios DGM4NLP 1/46. Language Modelling Variational

Language ModellingVariational Auto-encoder for Sentences

ReferencesReferences

Language Models

N-gram based LMs;

Log-linear LMs;

Neural LMs.

Rios DGM4NLP 11 / 46

Page 35: Deep Generative Language Models - DGM4NLP · Deep Generative Language Models DGM4NLP Miguel Rios University of Amsterdam May 2, 2019 Rios DGM4NLP 1/46. Language Modelling Variational

Language ModellingVariational Auto-encoder for Sentences

ReferencesReferences

Language Models

N-gram based LMs;

Log-linear LMs;

Neural LMs.

Rios DGM4NLP 11 / 46

N-gram LM

x is a sequence of words:

x = x_1, x_2, x_3, x_4, x_5 = you, know, nothing, jon, snow

To compute the probability of a sentence:

p(x) = p(x_1, x_2, \dots, x_n)    (1)

We apply the chain rule:

p(x) = \prod_i p(x_i \mid x_1, \dots, x_{i-1})    (2)

We limit the history with a Markov order:

p(x_i \mid x_1, \dots, x_{i-1}) \approx p(x_i \mid x_{i-4}, x_{i-3}, x_{i-2}, x_{i-1})

[Jelinek and Mercer, 1980; Goodman, 2001]

Chain rule:

p(x) = \prod_i p(x_i \mid x_1, \dots, x_{i-1})    (3)

We make a Markov assumption of conditional independence:

p(x_i \mid x_1, \dots, x_{i-1}) \approx p(x_i \mid x_{i-1})    (4)

We make the Markov assumption: p(x_i \mid x_1, \dots, x_{i-1}) \approx p(x_i \mid x_{i-1}).

If we do not observe the bigram, p(x_i \mid x_{i-1}) is 0 and the probability of the sentence will be 0.

MLE:

p_{MLE}(x_i \mid x_{i-1}) = \frac{count(x_{i-1}, x_i)}{count(x_{i-1})}    (5)

Laplace smoothing:

p_{add1}(x_i \mid x_{i-1}) = \frac{count(x_{i-1}, x_i) + 1}{count(x_{i-1}) + V}    (6)
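A small sketch of the estimators in equations (5) and (6); the toy corpus is an assumption for illustration. Note how MLE assigns 0 to an unseen bigram while add-1 smoothing does not.

```python
from collections import Counter

corpus = [["you", "know", "nothing", "jon", "snow"],
          ["you", "know", "jon", "snow"]]

unigrams, bigrams = Counter(), Counter()
vocab = set()
for sent in corpus:
    vocab.update(sent)
    for prev, cur in zip(sent, sent[1:]):
        unigrams[prev] += 1
        bigrams[(prev, cur)] += 1

V = len(vocab)

def p_mle(cur, prev):
    # count(x_{i-1}, x_i) / count(x_{i-1}); 0 for unseen bigrams
    return bigrams[(prev, cur)] / unigrams[prev] if unigrams[prev] else 0.0

def p_add1(cur, prev):
    # (count(x_{i-1}, x_i) + 1) / (count(x_{i-1}) + V)
    return (bigrams[(prev, cur)] + 1) / (unigrams[prev] + V)

print(p_mle("nothing", "know"), p_add1("nothing", "know"))  # seen bigram
print(p_mle("snow", "nothing"), p_add1("snow", "nothing"))  # unseen: 0 under MLE
```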

Log-linear LM

p(y \mid x) = \frac{\exp(w \cdot \phi(x, y))}{\sum_{y' \in V_y} \exp(w \cdot \phi(x, y'))}    (7)

y is the next word and V_y is the vocabulary;
x is the history;
\phi is a feature function that returns an n-dimensional vector;
w are the model parameters.

Example features:

n-gram features: x_{j-1} = the and x_j = puppy;
gappy n-gram features: x_{j-2} = the and x_j = puppy;
class features: x_j belongs to class ABC;
gazetteer features: x_j is a place name.

With features of words and histories we can share statistical weight;
with n-grams, there is no sharing at all.
We also get smoothing for free.
We can add arbitrary features.
We use Stochastic Gradient Descent (SGD).
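A sketch of the log-linear LM of equation (7), using two of the feature templates above (an n-gram and a gappy n-gram feature) and trained by SGD; the tiny vocabulary and data are assumptions for illustration.

```python
import numpy as np

vocab = ["the", "puppy", "dog", "barks"]
V = len(vocab)

def phi(history, y):
    """Feature vector with an n-gram feature (x_{j-1}, x_j)
    and a gappy n-gram feature (x_{j-2}, x_j)."""
    f = np.zeros(2 * V * V)
    prev, prev2 = history[-1], history[-2]
    f[vocab.index(prev) * V + vocab.index(y)] = 1.0
    f[V * V + vocab.index(prev2) * V + vocab.index(y)] = 1.0
    return f

w = np.zeros(2 * V * V)   # model parameters

def p_next(history):
    scores = np.array([w @ phi(history, y) for y in vocab])
    scores -= scores.max()                  # numerical stability
    exp = np.exp(scores)
    return exp / exp.sum()                  # softmax over the vocabulary

def sgd_step(history, y, lr=0.1):
    """Log-likelihood gradient: observed features minus expected features."""
    global w
    p = p_next(history)
    expected = sum(p[i] * phi(history, yi) for i, yi in enumerate(vocab))
    w += lr * (phi(history, y) - expected)

for _ in range(50):
    sgd_step(["the", "the"], "puppy")       # history (x_{j-2}, x_{j-1}), next word
print(dict(zip(vocab, p_next(["the", "the"]).round(3))))
```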

Neural LM

n-gram language models have proven to be effective in various tasks;
log-linear models allow us to share weights through features;
maybe our history is still too limited, e.g. n-1 words;
we need to find useful features.

Feed-forward NLM

With NNs we can exploit distributed representations to allow for statistical weight sharing.

How does it work:
1 each word is mapped to an embedding: an m-dimensional feature vector;
2 a probability function over word sequences is expressed in terms of these vectors;
3 we jointly learn the feature vectors and the parameters of the probability function.

[Bengio et al., 2003]

Similar words are expected to have similar feature vectors: (dog, cat), (running, walking), (bedroom, room).

With this, probability mass is naturally transferred from (1) to (2):

(1) The cat is walking in the bedroom.
(2) The dog is running in the room.

Take-away message: the presence of only one sentence in the training data will increase the probability of a combinatorial number of neighbours in sentence space.

FF-LM

E_{you}, E_{know}, E_{nothing} \in R^{100}

x = [E_{you}; E_{know}; E_{nothing}] \in R^{300}

y = W_3 \tanh(W_1 x + b_1) + W_2 x + b_2    (8)
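A sketch of equation (8) as a PyTorch module, in the spirit of [Bengio et al., 2003]: a 3-word history of 100d embeddings as on the slide, with the tanh layer plus the linear skip connection. The hidden size, vocabulary size, and cross-entropy training are assumptions.

```python
import torch
import torch.nn as nn

class FFLM(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, n_hist=3, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.W1 = nn.Linear(n_hist * emb_dim, hidden)               # W_1 x + b_1
        self.W3 = nn.Linear(hidden, vocab_size, bias=False)         # W_3 tanh(...)
        self.W2 = nn.Linear(n_hist * emb_dim, vocab_size)           # skip connection W_2 x + b_2

    def forward(self, hist):                  # hist: (batch, n_hist) word ids
        x = self.emb(hist).flatten(1)         # concatenated embeddings: (batch, n_hist * emb_dim)
        return self.W3(torch.tanh(self.W1(x))) + self.W2(x)         # logits over the vocabulary

model = FFLM(vocab_size=10000)
hist = torch.tensor([[3, 17, 42]])            # e.g. ids of "you", "know", "nothing"
loss = nn.CrossEntropyLoss()(model(hist), torch.tensor([7]))
loss.backward()                               # end-to-end training on next-word prediction
```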

The non-linear activation functions perform feature combinations that a linear model cannot;
end-to-end training on next-word prediction.

We now have much better generalisation, but still a limited history/context.
Recurrent neural networks have unlimited history.

RNN NLM

RNN-LM

Start: predict the second word from the first.

[Mikolov et al., 2010]
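A minimal RNN-LM sketch in the spirit of [Mikolov et al., 2010]: the recurrent state summarises the entire prefix, so the history is not truncated to n-1 words. All sizes are assumptions.

```python
import torch
import torch.nn as nn

class RNNLM(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden=200):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.RNN(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, words):                 # (batch, time)
        h, _ = self.rnn(self.emb(words))      # state at step t summarises words <= t
        return self.out(h)                    # (batch, time, vocab) next-word logits

model = RNNLM(vocab_size=10000)
sent = torch.tensor([[3, 17, 42, 7]])
logits = model(sent[:, :-1])                  # predict word t+1 from words <= t
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 10000), sent[:, 1:].reshape(-1))
```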

Outline

1 Language Modelling
2 Variational Auto-encoder for Sentences

Sen VAE

Model observations as draws from the marginal of a DGM.

An NN maps from a latent sentence embedding z \in R^{d_z} to a distribution p(x \mid z, \theta) over sentences:

p(x \mid \theta) = \int p(z)\, p(x \mid z, \theta)\, dz = \int N(z \mid 0, I) \prod_{i=1}^{|x|} Cat(x_i \mid f(z, x_{<i}; \theta))\, dz    (9)

[Bowman et al., 2015]

[Figure: plate-notation graphical model with observed sentence x_1^m, latent z, generative parameters θ, variational parameters φ, and plate size N.]

Generative model:
Z \sim N(0, I)
X_i \mid z, x_{<i} \sim Cat(f_\theta(z, x_{<i}))

Inference model:
Z \sim N(\mu_\phi(x_1^m), \sigma_\phi(x_1^m)^2)

Generation is one word at a time without Markov assumptions, but f(\cdot) conditions on z in addition to the observed prefix.

The conditional p(x \mid z, \theta) is the decoder.

p(x \mid \theta) is the marginal likelihood.

We train the model to assign high (marginal) probability to observations, like an LM.

However, the model uses a latent space to exploit neighbourhood and smoothness in latent space, capturing regularities in data space. For example, it may group sentences according to lexical choices, syntactic complexity, lexical semantics, etc.

Approximate Inference

The model has a diagonal Gaussian distribution as variational posterior:

q_\phi(z \mid x) = N(z \mid \mu_\phi(x), diag(\sigma_\phi(x)))

With reparametrisation:

z = h_\phi(\epsilon, x) = \mu_\phi(x) + \sigma_\phi(x) \odot \epsilon, \quad \epsilon \sim N(0, I)

Analytical KL:

KL[q_\phi(z \mid x) \,\|\, p_\theta(z)] = \frac{1}{2} \sum_{d=1}^{D_z} \left( -\log \sigma_\phi^2(x)_d - 1 + \sigma_\phi^2(x)_d + \mu_\phi^2(x)_d \right)
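A sketch of the reparametrised sample and the analytical KL above. Parametrising the encoder by a log-variance is a common convention assumed here, not prescribed by the slides.

```python
import torch

def reparameterize(mu, logvar):
    # z = mu + sigma * eps, with eps ~ N(0, I)
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * logvar) * eps

def gaussian_kl(mu, logvar):
    # KL[q(z|x) || N(0, I)] = 1/2 sum_d (-log sigma^2 - 1 + sigma^2 + mu^2)
    return 0.5 * torch.sum(-logvar - 1.0 + logvar.exp() + mu.pow(2), dim=-1)

mu, logvar = torch.zeros(1, 32), torch.zeros(1, 32)   # toy encoder outputs
z = reparameterize(mu, logvar)
print(gaussian_kl(mu, logvar))                        # 0 when q equals the prior
```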

We jointly estimate the parameters of both the generative and inference models by maximising a lower bound on the log-likelihood (the ELBO):

\mathcal{L}(\theta, \phi \mid x) = E_{q(z \mid x, \phi)}[\log p(x \mid z, \theta)] - KL(q(z \mid x, \phi) \,\|\, p(z))    (10)

Architecture

The Gaussian Sen VAE parametrises a categorical distribution over the vocabulary for each given prefix, conditioning on a latent embedding:

Z \sim N(0, I)
X_i \mid z, x_{<i} \sim Cat(f(z, x_{<i}; \theta))

f(z, x_{<i}; \theta) = softmax(s_i)
e_i = emb(x_i; \theta_{emb})
h_0 = \tanh(affine(z; \theta_{init}))
h_i = GRU(h_{i-1}, e_{i-1}; \theta_{gru})
s_i = affine(h_i; \theta_{out})    (11)
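A sketch of the decoder in equation set (11): z initialises the GRU state through a tanh-affine layer, and each state parametrises a categorical over the vocabulary. Sizes and names are assumptions, not the course code.

```python
import torch
import torch.nn as nn

class SenVAEDecoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden=200, latent=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)          # e_i = emb(x_i)
        self.init = nn.Linear(latent, hidden)                 # h_0 = tanh(affine(z))
        self.gru = nn.GRU(emb_dim, hidden, batch_first=True)  # h_i = GRU(h_{i-1}, e_{i-1})
        self.out = nn.Linear(hidden, vocab_size)              # s_i = affine(h_i)

    def forward(self, z, prefix):
        h0 = torch.tanh(self.init(z)).unsqueeze(0)            # (1, batch, hidden)
        h, _ = self.gru(self.emb(prefix), h0)
        return self.out(h)                                    # logits; softmax(s_i) in the loss

dec = SenVAEDecoder(vocab_size=10000)
z = torch.randn(1, 32)
prefix = torch.tensor([[1, 3, 17]])                           # e.g. <s>, you, know
logits = dec(z, prefix)                                       # next-word distributions
```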

The Strong Decoder Problem

The VAE may ignore the latent variable, given the interaction between the prior and posterior in the KL divergence.

This problem appears when we have strong decoders: conditional likelihoods p(x \mid z) parametrised by high-capacity models.

The model might achieve a high ELBO without using information from z.

An RNN LM is a strong decoder because it conditions on all previous context when generating a word.

Page 110: Deep Generative Language Models - DGM4NLP · Deep Generative Language Models DGM4NLP Miguel Rios University of Amsterdam May 2, 2019 Rios DGM4NLP 1/46. Language Modelling Variational

Language ModellingVariational Auto-encoder for Sentences

ReferencesReferences

What to do?

Weakening the decoder: the model then has to rely on the latent variable for the reconstruction of the observed data.

KL annealing: weigh the KL term in the ELBO with a factor that is annealed from 0 to 1 over a fixed number of steps of size γ ∈ (0, 1) (see the sketch after this list).

Word dropout: by dropping a percentage of the input tokens at random, the decoder has to rely on the latent variable to fill in the missing gaps.

Free bits: allows encoding the first r nats of information for free, by replacing the KL term with

max(r, KL(qφ(z|x) ‖ p(z)))
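
A sketch of KL annealing and free bits as drop-in modifications of the KL term in the training loss (the two are usually applied separately; the linear schedule, anneal_steps, and r are illustrative choices):

    import torch

    def vae_loss(log_px_given_z, kl, step, anneal_steps=10000, r=0.0):
        # KL annealing: beta grows linearly from 0 to 1 over anneal_steps updates.
        beta = min(1.0, step / anneal_steps)
        # Free bits: the first r nats are free (r = 0 disables the trick).
        kl_term = torch.clamp(kl, min=r)
        # Negative ELBO, averaged over the batch, to be minimised.
        return -(log_px_given_z - beta * kl_term).mean()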


Metrics

For VAE models, the negative log-likelihood (NLL) has to be estimated, since we do not have access to the exact marginal likelihood.

We use an importance sampling (IS) estimate:

p(x|θ) = ∫ p(z, x|θ) dz = ∫ q(z|x) p(z, x|θ)/q(z|x) dz ≈ (1/S) Σ_{s=1}^{S} p(z(s), x|θ)/q(z(s)|x), where z(s) ∼ q(z|x)

(12)

Perplexity (PPL): the exponentiated average per-word NLL, given N i.i.d. sequences:

PPL = exp(− Σ_{i=1}^{N} log p(xi) / Σ_{i=1}^{N} |xi|)

(13)

For the VAE, perplexity is based on the importance-sampled NLL.
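
A sketch of the IS estimate computed in log space for numerical stability (encoder and log_joint are assumed helpers returning the parameters of q(z|x) and log p(z, x|θ), respectively):

    import math
    import torch

    def is_log_likelihood(x, encoder, log_joint, S=100):
        # Estimate log p(x|theta) = log E_q[ p(z, x|theta) / q(z|x) ].
        mu, logvar = encoder(x)                       # parameters of q(z|x)
        q = torch.distributions.Normal(mu, (0.5 * logvar).exp())
        log_w = []
        for _ in range(S):
            z = q.sample()                            # z^(s) ~ q(z|x)
            log_qz = q.log_prob(z).sum(-1)            # log q(z^(s)|x)
            log_w.append(log_joint(z, x) - log_qz)    # log importance weight
        log_w = torch.stack(log_w)                    # [S, batch]
        # Average the S weights in log space via log-sum-exp.
        return torch.logsumexp(log_w, dim=0) - math.log(S)

Summing these estimates over the test corpus and dividing by the total token count gives the importance-sampled NLL from which PPL is computed.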


Baseline

RNNLM (Dyer et al., 2016)

At each step, an RNNLM parameterises a categorical distribution over the vocabulary, i.e. Xi | x<i ∼ Cat(f(x<i; θ)), where

f(x<i; θ) = softmax(si)
ei = emb(xi; θemb)
hi = GRU(hi−1, ei−1; θgru)
si = affine(hi; θout)

(14)

Embedding layer (emb), one (or more) GRU cell(s) (h0 ∈ θ is a parameter of the model), and an affine layer to map from the dimensionality of the GRU to the vocabulary size.
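
A minimal PyTorch sketch matching these equations; compared with the SenVAE decoder above, the only difference is that h0 is a learned parameter of the model rather than a function of z:

    import torch
    import torch.nn as nn

    class RNNLM(nn.Module):
        def __init__(self, vocab_size, emb_dim, hid_dim):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb_dim)
            self.h0 = nn.Parameter(torch.zeros(1, 1, hid_dim))  # h_0 is in theta
            self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True)
            self.out = nn.Linear(hid_dim, vocab_size)

        def forward(self, x_prev):
            e = self.emb(x_prev)                                 # [batch, len, emb_dim]
            h0 = self.h0.expand(-1, x_prev.size(0), -1).contiguous()
            h, _ = self.gru(e, h0)
            return self.out(h)                                   # logits s_i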


Data

The Wall Street Journal part of the Penn Treebank corpus.

Preprocessing and train-validation-test split as in [Dyer et al., 2016]:

WSJ sections 1-21 for training,

section 23 as the test corpus,

section 24 as the validation corpus.


Results

            NLL↓           PPL↓
RNN-LM      118.7±0.12     107.1±0.46
VAE         118.4±0.09     105.7±0.36
Annealing   117.9±0.08     103.7±0.31
Free-bits   117.5±0.18     101.9±0.77


Samples

Decode greedily from a prior sample; any variability across samples is due to the generator's reliance on the latent sample.

If the VAE ignores z, greedy generation from a prior sample is essentially deterministic.


Samples

Homotopy: decode greedily from points lying between a posterior sample conditioned on the first sentence and a posterior sample conditioned on the last sentence:

zα = α · z1 + (1 − α) · z2, with α ∈ [0, 1]
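
A sketch of the interpolation (z1 and z2 are posterior samples from the inference model; greedy_decode is an assumed helper that runs the decoder above with greedy argmax):

    import torch

    def homotopy(z1, z2, steps=5):
        # Points on the line between two posterior samples z1 and z2.
        for alpha in torch.linspace(0.0, 1.0, steps):
            yield alpha * z1 + (1.0 - alpha) * z2  # decode each with greedy_decode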


References

Yoshua Bengio, Rejean Ducharme, Pascal Vincent, and Christian Janvin. A neural probabilistic language model. Journal of Machine Learning Research, 3:1137–1155, 2003. URL http://dblp.uni-trier.de/db/journals/jmlr/jmlr3.html#BengioDVJ03.

Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, and Samy Bengio. Generating sentences from a continuous space. CoRR, abs/1511.06349, 2015. URL http://arxiv.org/abs/1511.06349.


Chris Dyer, Adhiguna Kuncoro, Miguel Ballesteros, and Noah A. Smith. Recurrent neural network grammars. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 199–209, San Diego, California, June 2016. Association for Computational Linguistics. doi: 10.18653/v1/N16-1024. URL https://www.aclweb.org/anthology/N16-1024.

Joshua T. Goodman. A bit of progress in language modeling. Comput. Speech Lang., 15(4):403–434, October 2001. ISSN 0885-2308. doi: 10.1006/csla.2001.0174. URL http://dx.doi.org/10.1006/csla.2001.0174.


Fred Jelinek and Robert L. Mercer. Interpolated estimation of Markov source parameters from sparse data. In Edzard S. Gelsema and Laveen N. Kanal, editors, Proceedings, Workshop on Pattern Recognition in Practice, pages 381–397. North Holland, Amsterdam, 1980.

Tomas Mikolov, Martin Karafiat, Lukas Burget, Jan Cernocky, and Sanjeev Khudanpur. Recurrent neural network based language model. In Takao Kobayashi, Keikichi Hirose, and Satoshi Nakamura, editors, INTERSPEECH, pages 1045–1048. ISCA, 2010. URL http://dblp.uni-trier.de/db/conf/interspeech/interspeech2010.html#MikolovKBCK10.
