Temperature Is All You Need

Massimo Caccia*, Lucas Caccia*, William Fedus, Hugo Larochelle, Joelle Pineau, Laurent Charlin

MILA, Universite de Montreal, RLlab, McGill University & Google Brain

September 1, 2018

Massimo Caccia*, Lucas Caccia*, William Fedus,Hugo Larochelle, Joelle Pineau, Laurent Charlin (MILA, Universite de Montreal, RLlab, McGill University & Google Brain)Temperature Is All You Need September 1, 2018 1 / 19

tl;dr

(lower is better for both metrics)

The most effective way to evaluate NLG models is to compare them in quality/diversity space w.r.t. multiple temperatures

Maximum-likelihood training is superior (for now) to textual GANs

MLE trained models generate poor samples

(1) Researchers are expected to comment on where a scheme is sold , but it is no longer this big name at this point .

(2) We know you ’ re going to build the kind of home you ’ re going to be expecting it can give us a better understanding of what ground test we ’ re on this year, he explained .

Table 1: MLE samples lack global coherence

It is hypothesized that this is in part because of the well-known issue of exposure bias [1], [2].

GANs generate great samples (in images)

Figure 1: GAN-generated images from [3]

GAN visualized

Figure 2: GAN [4] Framework visualized

non-differentiable in output space

Difficult to backpropagate through discrete variables

Straight-through estimator [5]

Gumbel softmax [6]

Reinforcement Learning (in particular REINFORCE [7])
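
As a rough illustration of one of the workarounds listed above, the sketch below draws a relaxed (continuous) sample from a categorical distribution with the Gumbel-softmax trick [6]. It is a plain-Python toy, not the setup used in the talk: the function name `gumbel_softmax_sample`, the example probabilities and the relaxation temperature `tau` are all assumptions made for illustration. It only shows the forward sampling step; actually backpropagating through the relaxed sample would require an automatic-differentiation framework.

```python
import math
import random

def gumbel_softmax_sample(probs, tau, rng):
    """Draw one relaxed one-hot sample from a categorical distribution.

    probs: category probabilities (all positive, summing to 1)
    tau:   relaxation temperature (> 0); smaller values push samples toward one-hot
    rng:   a random.Random instance, passed in for reproducibility
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    perturbed = []
    for p in probs:
        # Gumbel(0, 1) noise: g = -log(-log(u)), with u uniform on (0, 1).
        u = max(rng.random(), 1e-12)  # guard against u == 0
        g = -math.log(-math.log(u))
        perturbed.append((math.log(p) + g) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled scores.
    m = max(perturbed)
    exps = [math.exp(s - m) for s in perturbed]
    z = sum(exps)
    return [e / z for e in exps]

rng = random.Random(0)
probs = [0.7, 0.2, 0.1]  # hypothetical categorical distribution over 3 tokens
print(gumbel_softmax_sample(probs, tau=1.0, rng=rng))   # a soft sample
print(gumbel_softmax_sample(probs, tau=0.05, rng=rng))  # typically much closer to one-hot
```

The output is a point on the probability simplex rather than a discrete token, which is what makes the sampling step differentiable with respect to the probabilities.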

Recent advancements in Sequential GANs

SeqGAN [8]: First sequential GAN (trained with REINFORCE)

Figure 3: A cartoon of SeqGAN

MaliGAN [9]: rescales the reward to alleviate the vanishing gradient

RankGAN [10]: replaces binary classification with a ranking score

LeakGAN [11]: the discriminator leaks information to the generator

TextGAN [12]: MMD loss + latent regularization

and many more: DPGAN, GSGAN, IRLGAN, etc.

flawed evaluation protocol

BLEU [13] is an n-gram overlap metric, most popular in machine translation (MT)

Corpus-level BLEU was designed to advertise textual GANs

Corpus-level BLEU (alone) is a poor metric: it rewards sample quality and ignores diversity

quick fix

Self-BLEU [14]: BLEU score between generated sentences

Figure 4: Results taken from [15]. Lower is better for both metrics.
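
To make the idea concrete, here is a minimal sketch of Self-BLEU: each generated sentence is scored with BLEU against all the other generated sentences, and the scores are averaged. The BLEU used here is a deliberately simplified version (clipped n-gram precision up to bigrams, a brevity penalty, no smoothing), so the numbers will not match Texygen [14] or any standard toolkit; the function names and the toy samples are assumptions made for illustration.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, references, max_n=2):
    """Simplified sentence-level BLEU: clipped precisions, brevity penalty, no smoothing."""
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        total = sum(hyp_counts.values())
        if total == 0:
            return 0.0
        # Clip each n-gram count by its maximum count in any single reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in hyp_counts.items())
        if clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty against the reference closest in length to the hypothesis.
    ref_len = min((len(r) for r in references),
                  key=lambda l: (abs(l - len(hypothesis)), l))
    bp = 1.0 if len(hypothesis) >= ref_len else math.exp(1 - ref_len / len(hypothesis))
    return bp * math.exp(sum(log_precisions) / max_n)

def self_bleu(samples, max_n=2):
    """Average BLEU of each sample against all the other samples."""
    scores = []
    for i, hyp in enumerate(samples):
        refs = samples[:i] + samples[i + 1:]
        scores.append(bleu(hyp, refs, max_n))
    return sum(scores) / len(scores)

# Hypothetical generated samples: a collapsed generator vs. a varied one.
collapsed = [["the", "shares", "rose"]] * 3
varied = [["the", "shares", "rose"], ["police", "said", "nothing"], ["he", "explained", "it"]]
print(self_bleu(collapsed), self_bleu(varied))
```

A high Self-BLEU means the samples overlap heavily with one another, that is, low diversity; a generator that repeats a single fluent sentence can score well on corpus-level BLEU while scoring badly here.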

temperature tuning

$G(y_t \mid y_{1:t-1}) = \mathrm{softmax}(o_t \cdot W / \alpha)$

$o_t$: generator’s pre-logit activation at step $t$

$W$: word embedding matrix

$\alpha$: temperature parameter
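
The effect of dividing the logits by the temperature can be sketched in a few lines of plain Python. This is an illustrative toy, not the talk's implementation: the logits stand in for hypothetical values of the product of the pre-logit activation and the embedding matrix over a three-word vocabulary, the function names are made up, and a temperature of exactly 0 is treated as greedy (argmax) decoding, which is the limit of the softmax as the temperature goes to 0.

```python
import math
import random

def softmax_with_temperature(logits, alpha):
    """Return softmax(logits / alpha) as a list of probabilities (alpha > 0)."""
    if alpha <= 0:
        raise ValueError("alpha must be positive; use argmax for the alpha -> 0 limit")
    scaled = [x / alpha for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def sample_token(logits, alpha, rng):
    """Sample a token index; alpha == 0 is treated as greedy (argmax) decoding."""
    if alpha == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, alpha)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.0]  # hypothetical logits for a 3-word vocabulary
for alpha in (2.0, 1.0, 0.7):
    print(alpha, [round(p, 3) for p in softmax_with_temperature(logits, alpha)])
print(sample_token(logits, 0, random.Random(0)))  # greedy choice: always the top logit
```

Lowering the temperature concentrates probability mass on the highest-scoring tokens, and raising it flattens the distribution toward uniform.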

α = 2

(1) If you go at watch crucial characters putting awareness in Washington , forget there are now unique developments organized personally then why charge .

(2) Front wants zero house blood number places than above spin 5 provide school projects which youth particularly teenager temporary dollars plenty of investors enjoy headed Japan about if federal assets own , at 41 .

α = 0.7

(1) The other witnesses are believed to have been injured , the police said in a statement , adding that there was no immediate threat to any other witnesses .

(2) The company ’ s net income fell to 5 . 29 billion , or 2 cents per share , on the same period last year .

α = 0

(1) The company ’ s shares rose 1 . 5 percent to 1 . 81 percent , the highest since the end of the year .

(2) The company ’ s shares rose 1 . 5 percent to 1 . 81 percent , the highest since the end of the year .

Table 2: Effect of varying the temperature of the softmax layer in an autoregressive language model

Quality/Diversity space w.r.t. temperatures
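
The trade-off traced out in quality/diversity space can be illustrated with a toy calculation. The sketch below is an assumption-heavy stand-in, not the evaluation used in the talk: a single hypothetical next-token distribution `base` plays the role of both the model and the reference, quality is measured as the expected negative log-likelihood of tempered samples under `base`, and diversity as the negative entropy of the tempered distribution, so that lower is better on both axes.

```python
import math

def tempered(probs, alpha):
    """Distribution proportional to probs ** (1 / alpha), i.e. softmax(log(probs) / alpha)."""
    weights = [p ** (1.0 / alpha) for p in probs]
    z = sum(weights)
    return [w / z for w in weights]

def quality_cost(base, sampler):
    """Expected negative log-likelihood under `base` of tokens drawn from `sampler`."""
    return -sum(q * math.log(p) for p, q in zip(base, sampler) if q > 0)

def diversity_cost(sampler):
    """Negative Shannon entropy of `sampler` (lower means more diverse)."""
    return sum(q * math.log(q) for q in sampler if q > 0)

base = [0.6, 0.25, 0.1, 0.05]  # hypothetical next-token distribution
curve = []
for alpha in (0.5, 0.7, 1.0, 1.5, 2.0):
    q = tempered(base, alpha)
    curve.append((alpha, quality_cost(base, q), diversity_cost(q)))

for alpha, quality, diversity in curve:
    print(f"alpha={alpha:.1f}  quality cost={quality:.3f}  diversity cost={diversity:.3f}")
```

Each temperature yields one point, and sweeping the temperature yields a curve: raising it improves the diversity cost while worsening the quality cost. Comparing whole curves, rather than single points, is what allows two models to be ranked fairly.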

New (global) metrics in town

Language Model score (quality): likelihood of the generated sentences under an LM

Reverse LM score [16] (diversity + quality)
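
The two metrics can be sketched with a toy language model. In this sketch the LM score is the average negative log-likelihood of generated sentences under an LM trained on real data, and the reverse LM score swaps the roles: an LM is trained on the generated sentences and evaluated on real data, so a generator that lacks diversity is penalized. The add-one-smoothed bigram model, the function names and the tiny corpora below are all assumptions made for illustration; an actual evaluation would use a much stronger LM and held-out data.

```python
import math
from collections import Counter

def train_bigram_lm(sentences):
    """Count bigrams and their contexts over tokenized sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), {"<unk>"}
    for sent in sentences:
        tokens = ["<s>"] + list(sent) + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            contexts[prev] += 1
            bigrams[(prev, cur)] += 1
    return contexts, bigrams, vocab

def avg_nll(lm, sentences):
    """Average per-token negative log-likelihood with add-one smoothing (lower is better)."""
    contexts, bigrams, vocab = lm
    total, count = 0.0, 0
    for sent in sentences:
        # Map out-of-vocabulary tokens to a shared <unk> symbol.
        tokens = ["<s>"] + [t if t in vocab else "<unk>" for t in sent] + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            prob = (bigrams[(prev, cur)] + 1) / (contexts[prev] + len(vocab))
            total -= math.log(prob)
            count += 1
    return total / count

def lm_score(real, generated):
    """Quality: generated sentences scored by an LM trained on real data."""
    return avg_nll(train_bigram_lm(real), generated)

def reverse_lm_score(real, generated):
    """Diversity + quality: real sentences scored by an LM trained on generated data."""
    return avg_nll(train_bigram_lm(generated), real)

# Hypothetical corpora: the collapsed generator repeats one plausible sentence.
real = [["the", "cat", "sat"], ["the", "dog", "ran"], ["a", "cat", "ran"]] * 20
collapsed = [["the", "dog", "ran"]] * 60
print(lm_score(real, collapsed), reverse_lm_score(real, collapsed))
print(lm_score(real, real), reverse_lm_score(real, real))
```

On these toy corpora the collapsed generator obtains a better (lower) LM score than the real data itself but a much worse reverse LM score, which is the failure mode a quality-only metric would miss.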

Quality/Diversity space w.r.t. temperatures (global)

Figure 5: Lower is better for both metrics.

Conclusion

The most effective way to evaluate NLG models is in quality/diversity space w.r.t. multiple temperatures

MLE training is superior (for now) to adversarial training

Future Work

What about adversarial latent space regularization?

Temperature Is All You Need for Image Generation?

References

[1] Marc’Aurelio Ranzato, Sumit Chopra, Michael Auli, and Wojciech Zaremba. Sequence level training with recurrent neural networks. arXiv preprint arXiv:1511.06732, 2015.

[2] Samy Bengio, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer. Scheduled sampling for sequence prediction with recurrent neural networks. In Advances in Neural Information Processing Systems, pages 1171–1179, 2015.

[3] Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196, 2017.

[4] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.

[5] Yoshua Bengio, Nicholas Leonard, and Aaron Courville. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432, 2013.

[6] Eric Jang, Shixiang Gu, and Ben Poole. Categorical reparameterization with Gumbel-softmax. arXiv preprint arXiv:1611.01144, 2016.

[7] Ronald J Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4):229–256, 1992.

[8] Lantao Yu, Weinan Zhang, Jun Wang, and Yong Yu. SeqGAN: Sequence generative adversarial nets with policy gradient. 2017.

[9] Tong Che, Yanran Li, Ruixiang Zhang, R Devon Hjelm, Wenjie Li, Yangqiu Song, and Yoshua Bengio. Maximum-likelihood augmented discrete generative adversarial networks. arXiv preprint arXiv:1702.07983, 2017.

[10] Kevin Lin, Dianqi Li, Xiaodong He, Zhengyou Zhang, and Ming-Ting Sun. Adversarial ranking for language generation. In Advances in Neural Information Processing Systems, pages 3155–3165, 2017.

[11] Jiaxian Guo, Sidi Lu, Han Cai, Weinan Zhang, Yong Yu, and Jun Wang. Long text generation via adversarial training with leaked information. arXiv preprint arXiv:1709.08624, 2017.

[12] Yizhe Zhang, Zhe Gan, Kai Fan, Zhi Chen, Ricardo Henao, Dinghan Shen, and Lawrence Carin. Adversarial feature matching for text generation. arXiv preprint arXiv:1706.03850, 2017.

[13] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pages 311–318. Association for Computational Linguistics, 2002.

[14] Yaoming Zhu, Sidi Lu, Lei Zheng, Jiaxian Guo, Weinan Zhang, Jun Wang, and Yong Yu. Texygen: A benchmarking platform for text generation models. SIGIR, 2018.

[15] Sidi Lu, Yaoming Zhu, Weinan Zhang, Jun Wang, and Yong Yu. Neural text generation: Past, present and beyond. arXiv preprint arXiv:1803.07133, 2018.

[16] Junbo Jake Zhao, Yoon Kim, Kelly Zhang, Alexander M Rush, and Yann LeCun. Adversarially regularized autoencoders for generating discrete structures. CoRR, abs/1706.04223, 2017.
