Lecture 12: Sequence to sequence models
Alireza Akhavan Pour, SRTTU, CLASS.VISION
Saturday, 10 Azar 1397
Slides: fall97.class.vision/slides/12.pdf

Sequence to sequence model: Introduction and concepts

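Sequence to sequence model

Input (French): Jane visite l’Afrique en septembre, i.e. $x^{<1>}, x^{<2>}, x^{<3>}, x^{<4>}, x^{<5>}$
Output (English): Jane is visiting Africa in September., i.e. $y^{<1>}, y^{<2>}, y^{<3>}, y^{<4>}, y^{<5>}, y^{<6>}$

Figure: an encoder RNN, starting from $a^{<0>}$, reads $x^{<1>}, \dots, x^{<T_x>}$; a decoder RNN then generates the output sequence.

[Cho et al., 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation]

[Sutskever et al., 2014. Sequence to sequence learning with neural networks]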
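
The encoder/decoder split can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the weights are random and untrained, biases are omitted, and all names (`encode`, `decoder_step`, `HIDDEN`, `VOCAB_IN`, `VOCAB_OUT`) are hypothetical and do not come from the slides.

```python
import math
import random

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_step(W_a, W_in, a_prev, inp):
    """a<t> = tanh(W_a a<t-1> + W_in input<t>); bias omitted for brevity."""
    return [math.tanh(p + q) for p, q in zip(matvec(W_a, a_prev), matvec(W_in, inp))]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def one_hot(i, n):
    return [1.0 if j == i else 0.0 for j in range(n)]

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def encode(enc, xs, hidden):
    """Run the encoder over x<1>, ..., x<Tx> and return its final state."""
    a = [0.0] * hidden                      # a<0>: zero vector
    for x_t in xs:
        a = rnn_step(enc["W_a"], enc["W_x"], a, x_t)
    return a

def decoder_step(dec, a_prev, y_prev):
    """One decoder step: returns (new state, distribution over the next word)."""
    a = rnn_step(dec["W_a"], dec["W_y"], a_prev, y_prev)
    return a, softmax(matvec(dec["W_o"], a))

# Assumed toy sizes; a real system would use learned weights and a large vocabulary.
rng = random.Random(0)
HIDDEN, VOCAB_IN, VOCAB_OUT = 4, 6, 5
enc = {"W_a": rand_matrix(rng, HIDDEN, HIDDEN), "W_x": rand_matrix(rng, HIDDEN, VOCAB_IN)}
dec = {"W_a": rand_matrix(rng, HIDDEN, HIDDEN), "W_y": rand_matrix(rng, HIDDEN, VOCAB_OUT),
       "W_o": rand_matrix(rng, VOCAB_OUT, HIDDEN)}

source_ids = [0, 3, 1, 4, 2]                # stand-in for a 5-word source sentence
state = encode(enc, [one_hot(i, VOCAB_IN) for i in source_ids], HIDDEN)
state, probs = decoder_step(dec, state, [0.0] * VOCAB_OUT)   # first decoder step
```

Here `probs` plays the role of the distribution over the first output word; the decoder is initialised with the encoder's final state rather than with zeros.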


Image captioning

Figure: an AlexNet-style CNN encodes the input image $x$; an RNN decoder generates the caption $\hat{y}^{<1>}, \hat{y}^{<2>}, \dots, \hat{y}^{<T_y>}$.
Example caption: "A cat sitting on a chair" ($y^{<1>}, \dots, y^{<6>}$).
CNN layers shown: 11×11 conv (s = 4) → 55×55×96; 3×3 max-pool (s = 2) → 27×27×96; 5×5 same conv → 27×27×256; 3×3 max-pool (s = 2) → 13×13×256; 3×3 same conv → 13×13×384; 3×3 conv → 13×13×384; 3×3 conv → 13×13×256; 3×3 max-pool (s = 2) → 6×6×256 = 9216; FC 4096; FC 4096; softmax 1000.

[Mao et al., 2014. Deep captioning with multimodal recurrent neural networks]

[Vinyals et al., 2014. Show and tell: Neural image caption generator]

[Karpathy and Li, 2015. Deep visual-semantic alignments for generating image descriptions]


Machine translation as building a conditional language model

Language model: an RNN that starts from $a^{<0>}$ (a zero vector), takes $x^{<1>}$, then $x^{<2>} = \hat{y}^{<1>}$, and so on, and outputs $\hat{y}^{<1>}, \hat{y}^{<2>}, \dots, \hat{y}^{<T_y>}$. It models
$P(y^{<1>}, \dots, y^{<T_y>})$

Machine translation: an encoder first reads $x^{<1>}, \dots, x^{<T_x>}$. The decoder is the same kind of language model, except that its initial state is the state vector produced by the encoder instead of a zero vector. It therefore models
$P(y^{<1>}, \dots, y^{<T_y>} \mid x^{<1>}, \dots, x^{<T_x>})$
that is, a conditional language model.


Finding the most likely translation

French: Jane visite l’Afrique en septembre.

The model assigns a probability $P(y^{<1>}, \dots, y^{<T_y>} \mid x)$ to every English sentence, for example:
• Jane is visiting Africa in September.
• Jane is going to be visiting Africa in September.
• In September, Jane will visit Africa.
• Her African friend welcomed Jane in September.

The goal is to find
$\arg\max_{y^{<1>}, \dots, y^{<T_y>}} P(y^{<1>}, \dots, y^{<T_y>} \mid x)$


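Why not a greedy search?

Goal: $\arg\max_{y} P(y^{<1>}, y^{<2>}, \dots, y^{<T_y>} \mid x)$

Greedy search picks the single most likely word at each step, which does not necessarily maximise the probability of the whole sentence. Compare:
• Jane is visiting Africa in September.
• Jane is going to be visiting Africa in September.

Because $P(\text{Jane is going} \mid x) > P(\text{Jane is visiting} \mid x)$, greedy search commits to "going" as the third word and ends up with the second, more verbose sentence, even if the first one has the higher overall probability.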
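
A small sketch of this effect, assuming a hypothetical toy model whose next-word probabilities are invented for illustration (they are not from the slides). Greedy search takes the locally best word at each step, while an exhaustive search scores every fixed-length sentence.

```python
from itertools import product

# Hypothetical next-word distributions P(y<t> | x, prefix); all numbers are invented.
MODEL = {
    (): {"jane": 1.0},
    ("jane",): {"is": 1.0},
    ("jane", "is"): {"going": 0.6, "visiting": 0.4},
    ("jane", "is", "going"): {"to": 0.5, "home": 0.5},
    ("jane", "is", "visiting"): {"africa": 1.0},
}

def sequence_prob(seq):
    """P(y<1>, ..., y<T> | x) as the product of the per-step conditionals."""
    p = 1.0
    for t, word in enumerate(seq):
        p *= MODEL.get(tuple(seq[:t]), {}).get(word, 0.0)
    return p

def greedy(length):
    """Pick the most likely next word at every step."""
    seq = []
    for _ in range(length):
        dist = MODEL[tuple(seq)]
        seq.append(max(dist, key=dist.get))
    return seq

def exhaustive(length):
    """Score every sentence of the given length (only feasible for toy vocabularies)."""
    vocab = sorted({w for dist in MODEL.values() for w in dist})
    return list(max(product(vocab, repeat=length), key=sequence_prob))

greedy_seq = greedy(4)          # starts with "jane is going"
best_seq = exhaustive(4)        # "jane is visiting africa"
```

In this toy setting greedy search reaches probability 0.3, while the best sentence has probability 0.4. Exhaustive search is only possible here because the vocabulary has seven words; with 10,000 words it is intractable, which motivates an approximate search.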


Beam search


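Beam search algorithm (French → English)
B = 3 (beam width)

Step 1: the encoder reads $x^{<1>}, \dots, x^{<T_x>}$ (starting from $a^{<0>}$), and the first decoder step outputs $P(y^{<1>} \mid x)$ over the 10,000-word vocabulary (a, ..., in, ..., jane, ..., september, ..., zulu). Instead of keeping only the single most likely word, beam search keeps the B most likely first words.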


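Beam search algorithm (B = 3)

Step 1: out of the 10,000-word vocabulary (a, ..., in, ..., jane, ..., september, ..., zulu), the three most likely first words are "in", "jane" and "september".

Step 2: each kept first word is fed as $\hat{y}^{<1>}$ into its own copy of the network, which evaluates every possible second word (a, aaron, ..., september, ..., visiting, ..., zulu):
• $P(y^{<2>} \mid x, \text{"in"})$
• $P(y^{<2>} \mid x, \text{"jane"})$
• $P(y^{<2>} \mid x, \text{"september"})$

Each pair is scored with
$P(y^{<1>}, y^{<2>} \mid x) = P(y^{<1>} \mid x)\, P(y^{<2>} \mid x, y^{<1>})$
and only the best 3 of the 3 × 10,000 = 30,000 candidate pairs are kept.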
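
The two steps above generalise to a loop. The following is a minimal sketch, not the lecture's implementation: `beam_search`, `next_dist` and the toy probabilities in `TOY` are all assumptions made for illustration. It works with sums of log-probabilities and, as one possible design choice, moves hypotheses that end in `<EOS>` out of the beam.

```python
import math

def beam_search(next_dist, beam_width, max_len, eos="<EOS>"):
    """next_dist(prefix) returns {word: P(word | x, prefix)}.
    Returns (log_prob, sequence) pairs, best first."""
    beam = [(0.0, [])]                 # start with the empty prefix, log P = 0
    finished = []
    for _ in range(max_len):
        candidates = []
        for logp, seq in beam:         # expand every hypothesis in the beam
            for word, p in next_dist(tuple(seq)).items():
                if p > 0.0:
                    candidates.append((logp + math.log(p), seq + [word]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = []
        for logp, seq in candidates[:beam_width]:   # keep only the best B
            (finished if seq[-1] == eos else beam).append((logp, seq))
        if not beam:
            break
    return sorted(finished + beam, key=lambda c: c[0], reverse=True)

# Hypothetical toy model; the probabilities are invented.
TOY = {
    (): {"jane": 0.6, "in": 0.4},
    ("jane",): {"is": 0.5, "visits": 0.5},
    ("in",): {"september": 1.0},
    ("jane", "visits"): {"africa": 1.0},
}

def toy_next(prefix):
    # Any prefix not listed above simply ends the sentence.
    return TOY.get(prefix, {"<EOS>": 1.0})

greedy_result = beam_search(toy_next, beam_width=1, max_len=5)[0]
beam_result = beam_search(toy_next, beam_width=2, max_len=5)[0]
```

With `beam_width=1` the search is greedy and returns "jane is <EOS>" (probability 0.3); with `beam_width=2` it also keeps "in" after the first step and finds "in september <EOS>" (probability 0.4).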


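Beam search (B = 3)

After step 2 the beam holds three partial sentences: "in september", "jane is" and "jane visits". For each of these 3 outputs we have also stored its probability $P(y^{<1>}, y^{<2>} \mid x)$.

Step 3: each partial sentence is fed into its own copy of the network to predict $\hat{y}^{<3>}$, and again only the 3 best continuations are kept. The procedure repeats until the end-of-sentence token is produced, for example: jane visits africa in september. <EOS>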


Refinements to beam search


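Length normalization

$P(y^{<1>}, \dots, y^{<T_y>} \mid x) = P(y^{<1>} \mid x)\, P(y^{<2>} \mid x, y^{<1>}) \cdots P(y^{<T_y>} \mid x, y^{<1>}, \dots, y^{<T_y-1>})$

Beam search objective:
$\arg\max_{y} \prod_{t=1}^{T_y} P(y^{<t>} \mid x, y^{<1>}, \dots, y^{<t-1>})$

Equivalent objective in log space (a product of many small probabilities can underflow):
$\arg\max_{y} \sum_{t=1}^{T_y} \log P(y^{<t>} \mid x, y^{<1>}, \dots, y^{<t-1>})$

This score depends on the output length! Every extra word adds another negative term, so short outputs are favoured. Normalized objective:
$\arg\max_{y} \frac{1}{T_y^{\alpha}} \sum_{t=1}^{T_y} \log P(y^{<t>} \mid x, y^{<1>}, \dots, y^{<t-1>})$

$\alpha = 0.7$

$\alpha = 0$? (no normalization) $\alpha = 1$? (full normalization by length, $\frac{1}{T_y}$)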
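
The effect of $\alpha$ can be checked numerically. This is a small sketch with invented per-word probabilities; `normalized_score` is a hypothetical helper name, not something defined in the lecture.

```python
import math

def normalized_score(token_probs, alpha=0.7):
    """(1 / Ty**alpha) * sum over t of log P(y<t> | x, y<1>, ..., y<t-1>)."""
    t_y = len(token_probs)
    return sum(math.log(p) for p in token_probs) / (t_y ** alpha)

# Invented per-word probabilities for two candidate outputs.
short = [0.5, 0.5]      # 2 words, sentence probability 0.25
long = [0.7] * 5        # 5 words, sentence probability about 0.17

for alpha in (0.0, 0.7, 1.0):
    winner = "short" if normalized_score(short, alpha) > normalized_score(long, alpha) else "long"
    print(f"alpha={alpha}: {winner} candidate wins")
```

With $\alpha = 0$ the raw log-probability is used and the short candidate wins; with $\alpha = 0.7$ or $\alpha = 1$ the longer candidate, whose individual words are more probable, is preferred.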


Beam search discussion

Beam width B?
• Large B: better result, slower.
• Small B: worse result, faster.
• In a production setting you might see B = 10.
• B = 100 or B = 1000 are uncommon (sometimes used in research settings).

Unlike exact search algorithms such as BFS (Breadth First Search) or DFS (Depth First Search), beam search runs faster but is not guaranteed to find the exact maximum, $\arg\max_{y} P(y \mid x)$.


Error analysis on beam search


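Example

Jane visite l’Afrique en septembre.

H: Jane visits Africa in September. ($y^*$)

Algorithm: Jane visited Africa last September. ($\hat{y}$)

The system has two components: the RNN (encoder-decoder), which reads $x^{<1>}, \dots, x^{<T_x>}$ starting from $a^{<0>}$, and beam search. To decide which one caused the error, feed the human translation ("Jane visits Africa …") through the RNN to obtain $P(y^* \mid x)$ and compare it with $P(\hat{y} \mid x)$.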


Error analysis on beam search

H: Jane visits Africa in September. ($y^*$)

Algorithm: Jane visited Africa last September. ($\hat{y}$)

Case 1: $P(y^* \mid x) > P(\hat{y} \mid x)$

Beam search chose $\hat{y}$, but $y^*$ attains a higher $P(y \mid x)$.

Conclusion: Beam search is at fault.

Case 2: $P(y^* \mid x) \le P(\hat{y} \mid x)$

$y^*$ is a better translation than $\hat{y}$, but the RNN predicted $P(y^* \mid x) \le P(\hat{y} \mid x)$.

Conclusion: RNN model is at fault.


Error analysis process

Human | Algorithm | $P(y^* \mid x)$ | $P(\hat{y} \mid x)$ | At fault?
Jane visits Africa in September. | Jane visited Africa last September. | | |

Figure out what fraction of errors are "due to" beam search vs. the RNN model.
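
The bookkeeping for this table can be sketched as follows. The probability pairs in `dev_errors` are invented placeholders, and the helper names `at_fault` and `fault_fractions` are hypothetical; only the comparison rule (Case 1 vs. Case 2) comes from the slides.

```python
from collections import Counter

def at_fault(p_star, p_hat):
    """Case 1: P(y*|x) > P(y_hat|x) blames beam search; otherwise (Case 2) the RNN."""
    return "beam search" if p_star > p_hat else "RNN"

def fault_fractions(examples):
    """examples: (P(y*|x), P(y_hat|x)) pairs, one per mistranslated sentence."""
    counts = Counter(at_fault(p_star, p_hat) for p_star, p_hat in examples)
    return {name: counts[name] / len(examples) for name in ("beam search", "RNN")}

# Invented numbers standing in for the two probability columns of the table.
dev_errors = [
    (2.0e-10, 1.0e-10),   # y* more likely than y_hat: beam search missed it
    (1.0e-10, 3.0e-10),   # model prefers the worse translation
    (4.0e-11, 9.0e-11),
    (5.0e-12, 5.0e-12),   # tie counts as Case 2 (<=)
]
fractions = fault_fractions(dev_errors)
```

For these placeholder values, `fractions` attributes a quarter of the errors to beam search and three quarters to the RNN, which would suggest spending effort on the model rather than on a larger beam width.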


References

• https://www.coursera.org/specializations/deep-learning

• https://towardsdatascience.com/sequence-to-sequence-model-introduction-and-concepts-44d9b41cd42d

