
New (Optimization) Perspectives on GANs

Gauthier Gidel


Gauthier Gidel, MSR Seminar, January 29, 2019

I. A Variational Inequality Perspective on GANs.

II. Reducing Noise in GANs with Variance Reduced Methods.


A Variational Inequality Perspective on GANs

Gauthier Gidel*¹, Hugo Berard*¹², Gaëtan Vignoud¹, Pascal Vincent¹², Simon Lacoste-Julien¹

*Equal contribution. ¹Mila, Université de Montréal

² Facebook AI Research (FAIR), Montréal


Hugo Berard

Gaëtan Vignoud

Pascal Vincent

Simon Lacoste-Julien


1. Quick Recap on GANs and two-player games.

2. GAN as a Variational Inequality Problem.

3. Optimization of Variational Inequality.

4. Experimental results.

5. Conclusion.

NB: All the citations in this talk are in my arXiv submission.


Quick recap on Generative Adversarial Networks (GANs)

(and two-player games)


Generative Adversarial Networks (GANs)

[Diagram: Noise → Generator → Fake Data; True Data and Fake Data → Discriminator → Fake or Real?]

[Goodfellow et al. NIPS 2014]


Generative Adversarial Networks (GANs)

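Discriminator vs. generator:

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$

If D is non-parametric, the inner maximization has a closed form and the problem becomes $\min_G \; 2\,\mathrm{JS}(p_{\text{data}} \,\|\, p_G) - \log 4$.

[Goodfellow et al. NIPS 2014]

Non-saturating GAN (“much stronger gradient in early learning”): the generator minimizes $-\mathbb{E}_{z \sim p_z}[\log D(G(z))]$ instead of $\mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$, while the loss of the discriminator is unchanged.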
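To see where the “much stronger gradient in early learning” comes from, here is a minimal sketch assuming a discriminator that outputs D = sigmoid(a) for a logit a (the helper names are hypothetical). It compares the derivative of each generator loss with respect to a when the discriminator confidently rejects a fake sample, i.e. when a is very negative.

```python
import math


def sigmoid(a):
    """Logistic function: maps a discriminator logit a to D = sigmoid(a)."""
    return 1.0 / (1.0 + math.exp(-a))


def generator_loss_grads(logit):
    """Derivatives of the two generator losses w.r.t. the logit a of D(G(z)).

    saturating     (minimize  log(1 - D)):  d/da = -sigmoid(a)
    non-saturating (minimize -log(D)):      d/da = -(1 - sigmoid(a))
    """
    d = sigmoid(logit)
    return -d, -(1.0 - d)


if __name__ == "__main__":
    # Early in training the discriminator confidently rejects fakes: a << 0.
    for a in (-10.0, 0.0):
        sat, non_sat = generator_loss_grads(a)
        print(f"a={a:+.0f}  saturating={sat:.2e}  non-saturating={non_sat:.2e}")
```

For a = -10 the saturating gradient is about 5e-5 while the non-saturating one is close to -1; both losses give the same gradient when D(G(z)) = 0.5.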


Two-player Games

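Player 1 (parameters 𝜃) minimizes $\mathcal{L}_G(\theta, \phi)$ and player 2 (parameters 𝜙) minimizes $\mathcal{L}_D(\theta, \phi)$.

Zero-sum game if $\mathcal{L}_G = -\mathcal{L}_D =: \mathcal{L}$, also called a Saddle Point (SP) problem: $\min_\theta \max_\phi \mathcal{L}(\theta, \phi)$.

Example: WGAN formulation [Arjovsky et al. 2017]: $\min_G \max_{D \text{ 1-Lipschitz}} \; \mathbb{E}_{x \sim p_{\text{data}}}[D(x)] - \mathbb{E}_{z \sim p_z}[D(G(z))]$.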


Two-player Games

● In games we want to converge to the Saddle Point.

● Different from single-objective minimization, where we want to avoid saddle points.

● Saddle point -> Zero-sum game (or Minmax)


Two-player Games

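Non-zero-sum game if we do not have $\mathcal{L}_G = -\mathcal{L}_D$.

Example: Non-saturating GAN [Goodfellow et al. 2014]:

- Loss of generator: $\mathcal{L}_G = -\mathbb{E}_{z \sim p_z}[\log D(G(z))]$
- Loss of discriminator: $\mathcal{L}_D = -\mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] - \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$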


Minmax training is not just hard, it is different!


(You can replace “minmax” with “two-player games”.)


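“Minmax Training is Hard ...”

Example: WGAN with a linear discriminator and a linear generator.

This gives a bilinear saddle point: the objective is linear in 𝜃 and linear in 𝜙, which leads to “cycling behavior” (see right).

Gradient vector field: e.g. for $\min_\theta \max_\phi \theta\phi$, the field $F(\theta, \phi) = (\phi, -\theta)$ rotates around the saddle point $(0, 0)$.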
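A minimal sketch of what cycling means in discrete time, assuming the toy bilinear game $\min_\theta \max_\phi \theta\phi$ (a scalar stand-in for the linear WGAN, chosen for illustration). With simultaneous gradient descent-ascent and step size η, each step multiplies the distance to the saddle point by exactly $\sqrt{1+\eta^2}$, so the iterates spiral outward instead of converging.

```python
import math


def simultaneous_gda(theta, phi, eta, steps):
    """Simultaneous gradient descent-ascent on L(theta, phi) = theta * phi.

    theta descends on L, phi ascends on L, and both updates use the current
    iterate. Returns the list of distances to the saddle point (0, 0).
    """
    dists = [math.hypot(theta, phi)]
    for _ in range(steps):
        grad_theta, grad_phi = phi, theta  # dL/dtheta, dL/dphi
        # Tuple assignment: both right-hand sides use the old (theta, phi).
        theta, phi = theta - eta * grad_theta, phi + eta * grad_phi
        dists.append(math.hypot(theta, phi))
    return dists


if __name__ == "__main__":
    d = simultaneous_gda(1.0, 1.0, eta=0.1, steps=100)
    print(d[0], d[-1])  # the distance grows by sqrt(1 + eta**2) per step
```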


Generative Adversarial Networks as a Variational Inequality Problem (VIP)


GANs as a Variational Inequality

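Nash equilibrium (no player can improve its cost by unilaterally changing its parameters):

$$\theta^* \in \arg\min_{\theta \in \Theta} \mathcal{L}_G(\theta, \phi^*), \qquad \phi^* \in \arg\min_{\phi \in \Phi} \mathcal{L}_D(\theta^*, \phi)$$

Stationary conditions:

$$\nabla_\theta \mathcal{L}_G(\theta^*, \phi^*)^\top (\theta - \theta^*) \ge 0, \qquad \nabla_\phi \mathcal{L}_D(\theta^*, \phi^*)^\top (\phi - \phi^*) \ge 0, \qquad \forall\, (\theta, \phi) \in \Theta \times \Phi$$

Θ and Φ can be constraint sets.

New perspective for GANs:

- Based on stationary conditions.
- Relates to a vast literature with standard algorithms.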


GANs as a Variational Inequality

Nash equilibrium vs. stationary conditions: the same problem seen from a different perspective (joint minimization vs. stationary point).


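GANs as a Variational Inequality

The stationary conditions can be written as:

$$F(\omega^*)^\top (\omega - \omega^*) \ge 0 \quad \forall\, \omega \in \Omega, \qquad \text{where } \omega = (\theta, \phi),\; \Omega = \Theta \times \Phi,\; F(\omega) = \begin{pmatrix} \nabla_\theta \mathcal{L}_G(\theta, \phi) \\ \nabla_\phi \mathcal{L}_D(\theta, \phi) \end{pmatrix}.$$

That is, 𝜔* solves the Variational Inequality.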


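GANs as a Variational Inequality: stationary conditions

- Unconstrained (or 𝜔* in the interior): $F(\omega^*) = 0$.
- Constrained and 𝜔* on the boundary: $-F(\omega^*)$ lies in the normal cone of Ω at 𝜔*, i.e. $F(\omega^*)^\top (\omega - \omega^*) \ge 0$ for all $\omega \in \Omega$.

Figure from [Dunn 1979]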
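A one-dimensional worked instance of the boundary case, chosen for illustration (it is not taken from the slides): a single player minimizing $f(\omega) = \omega$ over $\Omega = [0, 1]$, so that the operator is simply $F = f'$.

```latex
\begin{aligned}
&\Omega = [0, 1], \qquad f(\omega) = \omega, \qquad F(\omega) = f'(\omega) = 1, \\
&\omega^* = 0 \in \partial\Omega: \quad F(\omega^*)\,(\omega - \omega^*) = \omega \ge 0 \quad \forall\, \omega \in [0, 1], \\
&\text{so } \omega^* \text{ solves the VI even though } F(\omega^*) = 1 \neq 0.
\end{aligned}
```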


GANs as a Variational Inequality

Takeaways:

- GANs can be formulated as a Variational Inequality.

- This encompasses most GAN formulations.

- Standard algorithms for Variational Inequalities can be used for GANs.

- Theoretical guarantees (for convex cost functions, also in the stochastic setting).


Techniques to optimize VIP (Batch setting)


Standard Algorithms from Variational Inequality

Method 1: Averaging

- Converges even with “cycling behavior”.

- Easy to implement (outside the training loop).

- Can be combined with any method.

Averaging schemes can be efficiently implemented in an online fashion:


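General online averaging, with weights $\rho_t \in [0, 1]$:

$$\bar\omega_t = (1 - \rho_t)\,\bar\omega_{t-1} + \rho_t\,\omega_t$$

Example 1: Uniform averaging, $\rho_t = 1/t$, which gives $\bar\omega_t = \frac{1}{t}\sum_{s=1}^{t} \omega_s$.

Example 2: Exponential moving averaging (EMA), $\rho_t = 1 - \beta$ with $\beta \in [0, 1)$: $\bar\omega_t = \beta\,\bar\omega_{t-1} + (1 - \beta)\,\omega_t$.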
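A minimal sketch of both online averages, assuming the toy bilinear game $\min_\theta \max_\phi \theta\phi$ and alternating gradient descent-ascent (an illustrative choice: with a small step size its iterates stay on a bounded cycle around the saddle point). The last iterate keeps cycling, while the uniform average and the EMA approach the saddle point (0, 0). The function name and the default β are hypothetical.

```python
import math


def alternating_gda_with_averages(theta, phi, eta, steps, beta=0.999):
    """Alternating GDA on L(theta, phi) = theta * phi, tracking online averages.

    Uniform average: avg_t = (1 - 1/t) * avg_{t-1} + (1/t) * w_t
    EMA:             ema_t = beta * ema_{t-1} + (1 - beta) * w_t
    The averages are updated next to, not inside, the optimizer step.
    """
    avg = (0.0, 0.0)      # overwritten at t = 1 because rho_1 = 1
    ema = (theta, phi)    # EMA initialized at the starting point
    for t in range(1, steps + 1):
        theta = theta - eta * phi   # player 1: descent step
        phi = phi + eta * theta     # player 2: ascent step, uses the new theta
        rho = 1.0 / t
        avg = ((1 - rho) * avg[0] + rho * theta, (1 - rho) * avg[1] + rho * phi)
        ema = (beta * ema[0] + (1 - beta) * theta, beta * ema[1] + (1 - beta) * phi)
    return (theta, phi), avg, ema


if __name__ == "__main__":
    last, avg, ema = alternating_gda_with_averages(1.0, 1.0, eta=0.1, steps=10_000)
    print("last:", math.hypot(*last), "uniform:", math.hypot(*avg), "ema:", math.hypot(*ema))
```

Neither average feeds back into the iterates, which is why averaging can be added to any method without touching the training loop.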



Standard Algorithms from Variational Inequality

Method 1: Averaging

Simple Minmax problem:



Standard Algorithms from Variational Inequality

Method 1: Averaging

Simultaneous vs. alternating updates: developed further in “Negative Momentum for Improved Game Dynamics”, Gidel, Askari Hemmat, Pezeshki, Le Priol, Huang, Lacoste-Julien and Mitliagkas.


Standard Algorithms from Variational Inequality

Method 2: Extragradient

- Step 1 (extrapolation): $\omega_{t+1/2} = P_\Omega[\omega_t - \eta F(\omega_t)]$

- Step 2 (update): $\omega_{t+1} = P_\Omega[\omega_t - \eta F(\omega_{t+1/2})]$

Intuition:

1. Game perspective: look one step into the future and anticipate the adversary's next move.

2. Euler’s method: extrapolation is close to an implicit method.

- Standard in the literature.
- Does not require averaging.
- Theoretically and empirically faster.
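A minimal sketch of the two steps, assuming the unconstrained toy game $\min_\theta \max_\phi \theta\phi$ (so the projection $P_\Omega$ is the identity). On this game each extragradient step multiplies the distance to the saddle point by $\sqrt{1 - \eta^2 + \eta^4}$, which is below 1 for $0 < \eta < 1$, whereas simultaneous gradient descent-ascent multiplies it by $\sqrt{1 + \eta^2} > 1$.

```python
import math


def F(theta, phi):
    """Vector field of L(theta, phi) = theta * phi: (dL/dtheta, -dL/dphi)."""
    return phi, -theta


def extragradient(theta, phi, eta, steps):
    """Unconstrained extragradient (P_Omega is the identity here)."""
    for _ in range(steps):
        # Step 1: extrapolate from w_t using F(w_t).
        g_theta, g_phi = F(theta, phi)
        theta_h, phi_h = theta - eta * g_theta, phi - eta * g_phi
        # Step 2: update from w_t (not from w_{t+1/2}) using F(w_{t+1/2}).
        g_theta, g_phi = F(theta_h, phi_h)
        theta, phi = theta - eta * g_theta, phi - eta * g_phi
    return theta, phi


if __name__ == "__main__":
    t, p = extragradient(1.0, 1.0, eta=0.5, steps=50)
    print(math.hypot(t, p))  # shrinks by sqrt(1 - eta**2 + eta**4) per step
```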


Standard Algorithms from Variational Inequality

Method 2: Extragradient

Intuition: extrapolation is close to an implicit method.

Implicit (backward Euler) update: $\omega_{t+1} = \omega_t - \eta F(\omega_{t+1})$. Here $\omega_{t+1}$ is unknown: computing it requires solving a non-linear system.

Extragradient (unconstrained case): $\omega_{t+1} = \omega_t - \eta F(\omega_{t+1/2})$ with $\omega_{t+1/2} = \omega_t - \eta F(\omega_t)$, so the extrapolated point $\omega_{t+1/2}$ is almost the same as the unknown $\omega_{t+1}$.
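A minimal sketch of this intuition, assuming again the toy game $\min_\theta \max_\phi \theta\phi$: approximating the implicit equation $\omega_{t+1} = \omega_t - \eta F(\omega_{t+1})$ by fixed-point iteration started at $\omega_t$ gives the explicit gradient step after one iteration and exactly the extragradient step after two. The helper names are hypothetical, and the closed-form implicit step is only available here because this $F$ is linear.

```python
import math


def F(w):
    """Vector field of L(theta, phi) = theta * phi, with w = (theta, phi)."""
    theta, phi = w
    return (phi, -theta)


def fixed_point_step(w, eta, iters):
    """Approximate the implicit step w_next = w - eta * F(w_next) by
    fixed-point iteration started at w_next = w.

    iters=1 -> explicit gradient step, iters=2 -> extragradient step,
    large iters -> exact implicit step (the map is a contraction if eta < 1).
    """
    guess = w
    for _ in range(iters):
        g = F(guess)
        guess = (w[0] - eta * g[0], w[1] - eta * g[1])
    return guess


def implicit_step_exact(w, eta):
    """Closed form for this linear F: theta' = theta - eta*phi', phi' = phi + eta*theta'."""
    theta, phi = w
    det = 1 + eta ** 2
    return ((theta - eta * phi) / det, (phi + eta * theta) / det)


def distance(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


if __name__ == "__main__":
    w, eta = (1.0, 0.5), 0.1
    exact = implicit_step_exact(w, eta)
    for k in (1, 2, 30):
        print(k, distance(fixed_point_step(w, eta, k), exact))
```

On this example the error to the implicit step is of order η² for the gradient step and η³ for extragradient.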


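Extrapolation from the past: Re-using the gradients

Problem: extragradient requires computing two gradients at each step.

Solution: extrapolation from the past, which re-uses a gradient.

- Step 1: re-use $F(\omega_{t-1/2})$ from the previous iteration: $\omega_{t+1/2} = P_\Omega[\omega_t - \eta F(\omega_{t-1/2})]$.

- Step 2 (same as extragradient): $\omega_{t+1} = P_\Omega[\omega_t - \eta F(\omega_{t+1/2})]$.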


New method! Related to [Daskalakis et al., 2018].
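A minimal sketch, assuming the unconstrained toy game $\min_\theta \max_\phi \theta\phi$ and initializing the stored gradient at $\omega_0$ (a common choice, assumed here). The point of the method shows up in the call counter: one new evaluation of $F$ per iteration instead of two for extragradient.

```python
import math


def extrapolation_from_the_past(theta, phi, eta, steps):
    """Extrapolation from the past on L(theta, phi) = theta * phi (unconstrained).

    F(w_{t+1/2}) is stored and re-used for the next extrapolation step, so each
    iteration evaluates F only once. Returns the final point and the number
    of F evaluations.
    """
    calls = 0

    def F(t, p):
        nonlocal calls
        calls += 1
        return p, -t  # (dL/dtheta, -dL/dphi)

    g_past = F(theta, phi)  # initialization (assumed): w_{-1/2} := w_0
    for _ in range(steps):
        # Step 1: extrapolate with the stored gradient F(w_{t-1/2}).
        theta_h, phi_h = theta - eta * g_past[0], phi - eta * g_past[1]
        # Step 2: update from w_t with the single new gradient F(w_{t+1/2}).
        g_past = F(theta_h, phi_h)
        theta, phi = theta - eta * g_past[0], phi - eta * g_past[1]
    return (theta, phi), calls


if __name__ == "__main__":
    (t, p), calls = extrapolation_from_the_past(1.0, 1.0, eta=0.2, steps=1000)
    print(math.hypot(t, p), calls)
```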


[Figure: two panels, step-size = 0.2 and step-size = 0.5]


Experimental Results


Experimental Results

Bilinear stochastic objective (with constraints):


Extrapolation (Adam style)

Update (Adam style)


Experimental Results: WGAN on CIFAR10

[Figure: Inception score on CIFAR10 vs. number of generator updates; panels: Extragradient Methods, Averaging]


Experimental Results: WGAN-GP (ResNet) on CIFAR10

[Figure: Inception score vs. number of …; panels: Extragradient Methods, Averaging]


To sum-up

- GANs can be formulated as a Variational Inequality.

- We bring standard methods from the optimization literature to the GAN community.

- Averaging helps improve the inception score (further evidence in [Yazici et al. 2018]).

- Extrapolation is faster and achieves better convergence.

- We introduce extrapolation from the past, a cheaper version of extragradient.

- We can design better algorithms for GANs inspired by Variational Inequalities.


Noise in GANs


Reducing Noise in GAN Training with Variance Reduced Extragradient

Tatjana Chavdarova*¹², Gauthier Gidel*¹, François Fleuret¹², Simon Lacoste-Julien¹

*Equal contribution. ¹Mila, Université de Montréal

² EPFL, IDIAP


Tatjana Chavdarova

François Fleuret

Simon Lacoste-Julien


Reminder: need for averaging and/or extragradient.


No signal from the average iterate.

The green sequence does not stop at the optimum.

We need last-iterate convergence (not convergence of the averaged iterate).

Focus on extragradient.


Issue: We did not consider noise.

Minimization vs. game:

- Minimization: far from the optimum, a noisy gradient still points “approximately” in the right direction.
- Game: even far from the equilibrium, a noisy direction can be “bad”.


Standard methods to solve (bilinear) games:

- Batch setting: the gradient method diverges to ∞; extragradient converges.
- Stochastic setting: no hope for convergence with the gradient method; extragradient: ????


Noise breaks Extragradient.


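Intuition:

Extragradient updates with independently sampled i (extrapolation) and j (update), on the finite-sum bilinear game $\min_\theta \max_\phi \frac{1}{n}\sum_{i=1}^{n} \theta^\top A_i \phi$:

$$\theta_{t+1} = \theta_t - \eta A_j \phi_t - \eta^2 A_j A_i^\top \theta_t, \qquad \phi_{t+1} = \phi_t + \eta A_j^\top \theta_t - \eta^2 A_j^\top A_i \phi_t$$

The $\eta^2$ terms are the extrapolation part. If the $A_i$ are symmetric and $A_i A_j = 0$, there is no extrapolation: the update reduces to a stochastic gradient step and diverges like gradient descent-ascent.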
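A minimal sketch of this failure mode, assuming a hypothetical two-component example with $A_1 = \mathrm{diag}(2, 0)$ and $A_2 = \mathrm{diag}(0, 2)$ (so $A_1 A_2 = 0$ and the averaged game is $\theta^\top \phi$). To make the effect deterministic, the sample indices come from a fixed schedule instead of being drawn at random: when $i \neq j$ every step is a plain gradient step that grows the iterate by $\sqrt{1 + 4\eta^2}$, while $i = j$ recovers a contracting extragradient step.

```python
import math

# Hypothetical finite-sum bilinear game min_theta max_phi (1/n) sum_i theta^T A_i phi
# with n = 2 diagonal components chosen so that A_1 A_2 = 0 (indices 0 and 1 below).
A = [(2.0, 0.0), (0.0, 2.0)]


def F_i(i, theta, phi):
    """Per-sample vector field (A_i phi, -A_i^T theta) for a diagonal A_i."""
    a = A[i]
    return ([a[k] * phi[k] for k in range(2)], [-a[k] * theta[k] for k in range(2)])


def stochastic_extragradient(theta, phi, eta, schedule):
    """Extragradient using sample i for the extrapolation and j for the update."""
    for i, j in schedule:
        g_t, g_p = F_i(i, theta, phi)
        theta_h = [theta[k] - eta * g_t[k] for k in range(2)]
        phi_h = [phi[k] - eta * g_p[k] for k in range(2)]
        g_t, g_p = F_i(j, theta_h, phi_h)
        theta = [theta[k] - eta * g_t[k] for k in range(2)]  # update starts from w_t
        phi = [phi[k] - eta * g_p[k] for k in range(2)]
    return theta, phi


def norm(theta, phi):
    return math.sqrt(sum(x * x for x in theta + phi))


if __name__ == "__main__":
    start = ([1.0, 1.0], [1.0, 1.0])
    mismatched = stochastic_extragradient(*start, 0.2, [(0, 1), (1, 0)] * 50)
    matched = stochastic_extragradient(*start, 0.2, [(0, 0), (1, 1)] * 50)
    print("i != j:", norm(*mismatched), " i == j:", norm(*matched))
```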


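Reducing noise with variance reduction methods.

- Idea: take advantage of the finite sum.

- Finite sum in ML: an expectation over a finite number of samples.

- Generator and discriminator losses can be written as:

$$\mathcal{L}_G(\omega) = \frac{1}{n}\sum_{i=1}^{n} \mathcal{L}_G^{i}(\omega), \qquad \mathcal{L}_D(\omega) = \frac{1}{n}\sum_{i=1}^{n} \mathcal{L}_D^{i}(\omega)$$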


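SVRG estimate of the gradient.

- Full-batch gradient: expensive but tractable.

- Snapshot network $\tilde\omega$ and full gradient at the snapshot network $\mu = \frac{1}{n}\sum_{j=1}^{n} \nabla \mathcal{L}^j(\tilde\omega)$; for a sample $i$ drawn uniformly:

$$d_i(\omega) = \nabla \mathcal{L}^i(\omega) - \nabla \mathcal{L}^i(\tilde\omega) + \mu$$

- Unbiased estimates: $\mathbb{E}_i[d_i(\omega)] = \nabla \mathcal{L}(\omega)$.

- Compute the snapshot only once per pass over the data.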
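A minimal sketch of the SVRG estimator applied to a game vector field, assuming a hypothetical finite-sum bilinear game with four scalar components (the coefficients are arbitrary). The checks confirm unbiasedness (averaging the estimate over all samples recovers the full field) and show that the estimate is exact at the snapshot, so its spread is small while the current point stays close to the snapshot.

```python
import math

# Hypothetical finite-sum bilinear game min_theta max_phi (1/n) sum_i a_i * theta * phi.
COEFFS = [3.0, -1.0, 2.0, 0.0]


def F_i(i, w):
    """Per-sample vector field (dL_i/dtheta, -dL_i/dphi) at w = (theta, phi)."""
    theta, phi = w
    a = COEFFS[i]
    return (a * phi, -a * theta)


def full_F(w):
    """Full-batch vector field: average of F_i over all samples (one pass)."""
    n = len(COEFFS)
    gs = [F_i(i, w) for i in range(n)]
    return (sum(g[0] for g in gs) / n, sum(g[1] for g in gs) / n)


def svrg_estimate(i, w, snapshot, mu):
    """SVRG estimate F_i(w) - F_i(snapshot) + mu, where mu = full_F(snapshot)."""
    g, g_snap = F_i(i, w), F_i(i, snapshot)
    return (g[0] - g_snap[0] + mu[0], g[1] - g_snap[1] + mu[1])


if __name__ == "__main__":
    snapshot = (1.0, -0.5)
    mu = full_F(snapshot)  # computed once per pass
    w = (1.1, -0.4)
    print([svrg_estimate(i, w, snapshot, mu) for i in range(len(COEFFS))])
    print(full_F(w))
```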


Variance Reduced Extragradient: SVRE

- Combines extragradient with variance reduction for finite-sum problems.
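A simplified sketch of the idea, assuming a hypothetical scalar finite-sum bilinear game, a fixed epoch length and one sample per step (the method in the paper may differ in these details, for example in how the epoch length is chosen). Each of the two extragradient evaluations uses an independently sampled SVRG estimate built from a snapshot that is refreshed once per epoch. With a single component there is no noise, so the sketch reduces to deterministic extragradient, which gives a simple sanity check.

```python
import math
import random


def make_game(coeffs):
    """Hypothetical game min_theta max_phi (1/n) sum_i a_i * theta * phi."""
    def F_i(i, w):
        a = coeffs[i]
        return (a * w[1], -a * w[0])

    def full_F(w):
        m = sum(coeffs) / len(coeffs)
        return (m * w[1], -m * w[0])

    return F_i, full_F


def svre(coeffs, w, eta, epochs, epoch_len, seed=0):
    """Simplified SVRE sketch: extragradient with SVRG estimates.

    Assumptions: fixed epoch length, one sample per step, no restarts.
    """
    rng = random.Random(seed)
    F_i, full_F = make_game(coeffs)
    n = len(coeffs)

    def d(i, x, snap, mu):
        """SVRG estimate at x: F_i(x) - F_i(snap) + mu."""
        g, g_s = F_i(i, x), F_i(i, snap)
        return (g[0] - g_s[0] + mu[0], g[1] - g_s[1] + mu[1])

    for _ in range(epochs):
        snap, mu = w, full_F(w)  # snapshot + full field, once per epoch
        for _ in range(epoch_len):
            i, j = rng.randrange(n), rng.randrange(n)  # independent samples
            g = d(i, w, snap, mu)
            w_half = (w[0] - eta * g[0], w[1] - eta * g[1])  # extrapolation
            g = d(j, w_half, snap, mu)
            w = (w[0] - eta * g[0], w[1] - eta * g[1])  # update from w
    return w


def distance_to_saddle(w):
    return math.hypot(*w)


if __name__ == "__main__":
    print(distance_to_saddle(svre([1.0], (1.0, 1.0), eta=0.5, epochs=10, epoch_len=5)))
    print(distance_to_saddle(svre([2.0, 0.0], (1.0, 1.0), eta=0.2, epochs=50, epoch_len=10)))
```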


Variance reduction for strongly monotone games:

SVRG and Acc. SVRG are from [Palaniappan and Bach 2016].


Why is this convergence rate not desirable?

Does not handle the unconstrained case. No restart possible.

Vs.

Handles the unconstrained case. Restart possible.


SVRE on a bilinear game (the exact example where stochastic extragradient breaks):


First point: SVRE effectively reduces the variance.

Blue: Stochastic Extragradient

Brown: SVRE.


Second point: SVRE allows larger step sizes (SVHN).

SE: Stochastic Extragradient.

SVRE: Variance Reduced Extragradient.

-A: Adam

WS: Warm Start.

AVG: Average.

-VRAd (VRam): variant of Adam for SVRE.


Second point: SVRE allows larger step sizes (ImageNet).


To sum-up

- Noise may be an issue in GANs.

- Proposed to combine VR + Extragradient to tackle both game and noise aspects.

- Unlike in single-objective minimization, we observed that variance reduction can improve the performance of deep learning models in GAN training.

- This highlights the difference between game optimization and standard minimization.

