4/1/2020 Benchmarking the ATM Algorithm on the BBOB 2009 Noiseless Function Testbed 1 of 34 Benchmarking The ATM Algorithm on the BBOB 2009 Noiseless Function Testbed Benjamin Bodner Brown University Providence, RI, USA BBOB Workshop GECCO 2019 Prague

Benchmarking The ATM Algorithm

on the BBOB 2009 Noiseless Function Testbed

Benjamin Bodner

Brown University

Providence, RI, USA

BBOB Workshop

GECCO 2019

Prague

Content

01 Introduction
• Motivation
• Intuition

02 Main Components
• Parameters & main equations
• Parameter adaptation
• Resource allocation

03 Results
• BBOB Noiseless
• BBOB Large-scale
• Internal runtime

04 Summary
• Recent progress
• Goals moving forward
• Conclusions

Motivation

• Growing need for optimization methods for very high-dimensional settings
• Problems commonly have 10^5-10^8 optimizable variables [Devlin et al. 2019]

Application areas for optimization algorithms: Deep Learning, Physical Sciences

Image from: https://towardsdatascience.com/why-deep-learning-is-needed-over-traditional-machine-learning-1b6a99177063

Image from GOMC: https://gomc-wsu.github.io/Manual/index.html

Motivation: Deep Learning

Gradient-based optimization methods can create many difficulties [Shalev-Shwartz et al. 2017]:
• Vanishing gradients
• Getting stuck in local minima
• Hyperparameter tuning
• Noise

Current ways of mitigating these issues do not always work:
• Architecture design (image from [He et al. 2015])
• Regularization (image from: Srivastava, Nitish, et al. 2014)
• [Sutskever 2013]

Image from: https://towardsdatascience.com/gradient-descent-algorithm-and-its-variants-10f652806a3

Motivation: Physical Sciences

Examples: Interacting Particles, Protein Folding

• Functions are non-convex
• Notoriously have large numbers of local minima [Nichita 2002]
• Simulated annealing and quasi-Newton methods can be slow
• Do not always converge to the global minimum [Hao et al. 2015]

Image by Thomas Splettstoesser: https://www.behance.net/gallery/10952399/Protein-Folding-Funnel

Image from GOMC: https://gomc-wsu.github.io/Manual/index.html

Image from: https://en.wikibooks.org/wiki/Structural_Biochemistry/Proteins/Protein_Folding_Problem

Motivation

• Existing algorithms have been highly successful in these settings [BIPOP CMA-ES, Hansen 2009]
• These characteristics are intentionally designed into the BBOB function testbeds
• Covariance matrices and Hessians limit the scalability of these algorithms
• Key components and operations are usually of order D^2

Images from: Finck, Hansen, Ros, Auger 2015

Proposal

Eliminate the use of D^2 objects and operations

Adaptive Two Mode (ATM) Algorithm

A black box optimization algorithm which only maintains objects and executes operations of order D
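To make the D versus D^2 distinction concrete, the following back-of-the-envelope sketch compares the memory needed for one dense D x D matrix (for example a covariance matrix or Hessian) with that of a single length-D vector. It assumes 8-byte floating point values, and the function names are hypothetical. It is an illustration only, not part of the ATM code.

```python
def dense_matrix_bytes(dim, bytes_per_value=8):
    """Memory for one dense dim x dim matrix (order D^2 storage)."""
    return dim * dim * bytes_per_value


def vector_bytes(dim, bytes_per_value=8):
    """Memory for one length-dim vector (order D storage)."""
    return dim * bytes_per_value


# Dimensions taken from the motivation slide: 10^5 to 10^8 variables.
for dim in (10**5, 10**8):
    print(
        f"D = 10^{len(str(dim)) - 1}: "
        f"matrix = {dense_matrix_bytes(dim) / 1e9:.3g} GB, "
        f"vector = {vector_bytes(dim) / 1e9:.3g} GB"
    )
```

At D = 10^5 the dense matrix already needs 8 x 10^10 bytes (80 GB), while a vector of the same dimension needs 8 x 10^5 bytes.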

The Adaptive Two Mode Algorithm

Uses a combination of two kinds of search distributions / “modes”:
• Directional distribution: exploitation
• Isotropic distribution: exploration

• The two modes complement each other
• ATM uses a set of rules to control the amplitudes of the modes and the interactions between them

ATM Algorithm

1. Start from an isotropic distribution
2. If a sample leads to improvement: suggest samples in that direction
3. If new samples also lead to improvement: sample in the same direction at exponentially increasing amplitude
4. Once no more “good” samples are found: start over with the isotropic search (using an evolutionary strategy)

Repeat

Figure legend: Best Sample; Best Sample from last step; Regular Sample
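The four steps above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the author's implementation: the isotropic mode is modeled as fixed-radius Gaussian sampling, the directional mode doubles its step after each success, and the names `two_mode_search`, `iso_radius`, `growth`, and `n_iso` are hypothetical. The adaptive amplitude rules and the online parameter tuning described on later slides are left out. Only length-D lists are stored.

```python
import random


def sphere(x):
    """Sphere test function (BBOB f1 without shift), used here as a toy objective."""
    return sum(v * v for v in x)


def two_mode_search(f, x0, iso_radius=0.5, growth=2.0, n_iso=8, budget=2000, seed=0):
    """Alternate between isotropic exploration and directional exploitation."""
    rng = random.Random(seed)
    best_x, best_f = list(x0), f(x0)
    evals = 1
    while evals < budget:
        # Step 1: isotropic samples around the current best point.
        n = min(n_iso, budget - evals)
        samples = [[b + iso_radius * rng.gauss(0.0, 1.0) for b in best_x]
                   for _ in range(n)]
        values = [f(x) for x in samples]
        evals += n
        i = min(range(n), key=values.__getitem__)
        if values[i] >= best_f:
            continue  # no improvement: keep searching isotropically
        # Step 2: an improving sample defines a search direction.
        direction = [s - b for s, b in zip(samples[i], best_x)]
        best_x, best_f = samples[i], values[i]
        # Step 3: keep stepping in that direction with exponentially growing amplitude.
        amplitude = 1.0
        while evals < budget:
            trial = [b + amplitude * d for b, d in zip(best_x, direction)]
            trial_f = f(trial)
            evals += 1
            if trial_f >= best_f:
                break  # Step 4: no more good samples, return to the isotropic mode
            best_x, best_f = trial, trial_f
            amplitude *= growth
    return best_x, best_f, evals


if __name__ == "__main__":
    x, fx, used = two_mode_search(sphere, [3.0] * 5)
    print(f"best value {fx:.4g} after {used} evaluations")
```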

Parameters of the Algorithm

There are (currently) 8 parameters which play several roles in the ATM algorithm:

• Controlling the growth factors of the modes:

$$\text{if } \lVert X_{best}^{t} - X_{best}^{t-1} \rVert^{2} > \Delta X_{min}^{2}: \quad d \leftarrow d + 1,\ r = 0 \qquad \text{else:} \quad r \leftarrow r + 1,\ d = 0$$

• Controlling the amplitudes of the modes:

$$R = R_{max} \exp\left( G_r \left[ \sin\left( \operatorname{mod}\left( \frac{\pi r}{2 T_r}, \frac{\pi}{2} \right) \right) - 1 \right] \right)$$

$$D = R_{max} \exp\left( G_d\, d - D_d\, r \right)$$
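The counter and amplitude rules on this slide can be written out as three small functions. This is a sketch under assumptions: the slide's equations survive only as flattened text, so the grouping of terms in the isotropic amplitude (the sine of a phase wrapped to [0, pi/2), minus one, scaled by G_r inside the exponential) is one reading of the layout rather than a confirmed formula. The function and argument names are hypothetical.

```python
import math


def update_counters(x_best, x_prev_best, d, r, dx_min):
    """Count consecutive directional successes (d) or failures (r).

    A step counts as a success when the best point moved by more than dx_min.
    """
    moved_sq = sum((a - b) ** 2 for a, b in zip(x_best, x_prev_best))
    if moved_sq > dx_min ** 2:
        return d + 1, 0
    return 0, r + 1


def isotropic_amplitude(r, r_max, g_r, t_r):
    """Assumed reading: R = R_max * exp(G_r * (sin(mod(pi*r/(2*T_r), pi/2)) - 1))."""
    phase = math.fmod(math.pi * r / (2.0 * t_r), math.pi / 2.0)
    return r_max * math.exp(g_r * (math.sin(phase) - 1.0))


def directional_amplitude(d, r, r_max, g_d, d_d):
    """D = R_max * exp(G_d * d - D_d * r): grows with successes, decays with failures."""
    return r_max * math.exp(g_d * d - d_d * r)
```

Under this reading the isotropic amplitude never exceeds R_max, and it cycles with period T_r in the failure counter r, while the directional amplitude grows exponentially with the success counter d.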

Parameters of the Algorithm

• Controlling the search distribution along different axes:

$$\mathbf{S} = \beta\, \mathbf{S} + (1 - \beta)\, \operatorname{mean}\left( \frac{\mathbf{O} - O_{Gbest}}{\left( \mathbf{X} - \mathbf{X}_{Gbest}^{t} \right)^{2}} \right)$$

$$\mathbf{A} = \frac{\alpha}{\mathbf{S} + \alpha^{2}}$$

Online Parameter Tuning

Different functions + changing characteristics at different stages: need for online parameter tuning

How to do this?

• 4 intertwined parameter sets
• Parameter sets are optimized by another Two-Mode algorithm
• Objective function designed to reflect the “success” at the task of minimizing the true objective function:

$$OP = \frac{\operatorname{mean}(\Delta OP_{best}) + \min(\Delta OP_{best})}{2}$$

where $\Delta OP_{best}$ is the best change in the true objective function found by the parameter set.
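The parameter-set objective OP on this slide is the average of the mean and the minimum of the best changes recorded for a parameter set. A minimal sketch follows, assuming those changes are collected in a list of recent values (the slide does not say over which window they are taken) and that improvements in a minimization task are negative changes. The function name is hypothetical.

```python
from statistics import mean


def parameter_set_objective(delta_op_best):
    """OP = (mean(dOP_best) + min(dOP_best)) / 2 for one parameter set.

    delta_op_best: recent best changes in the true objective achieved with
    this parameter set (assumed negative when the objective decreased).
    """
    return (mean(delta_op_best) + min(delta_op_best)) / 2.0


# A set that produced larger decreases gets a lower (better) score.
print(parameter_set_objective([-1.0, -3.0, 0.0]))
print(parameter_set_objective([-0.1, -0.2, 0.0]))
```

Averaging the mean with the minimum rewards both consistent progress and a single large improvement.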

Problem with Online Tuning

New parameter sets + changing local search space: good chance for unsuitable sets

Proposal: fewer resources to “bad” parameter sets, more resources to better ones

Figure axes: resources allocated to parameter set; performance of parameter set

Parallel Optimization with Resource Allocation

Given a fixed number of samples $N_{tot}$, distributed among $m$ parameter sets, change the allocation of samples to reflect their performance:

$$\mathbf{N}_{t+1} = \mathbf{N}_{t} - K M^{-1} \boldsymbol{\Delta OP}_{best}^{t} - K_0 M^{-1} \left( \mathbf{N}_{t} - \mathbf{N}_{0} \right)$$

where $\mathbf{N}_{t}$ is the resource allocation vector at iteration $t$.

Parallel Optimization with Resource Allocation – Choice of Matrices

$$K = \begin{pmatrix} (m-1)K & -K & \cdots & -K \\ -K & (m-1)K & \cdots & -K \\ \vdots & \vdots & \ddots & \vdots \\ -K & -K & \cdots & (m-1)K \end{pmatrix}, \qquad K_0 = k_0 I, \qquad M = \mu I$$

• Conserves the total number of samples
• Merit-based allocation system

$$\mathbf{N}_{t+1} = \mathbf{N}_{t} - K M^{-1} \boldsymbol{\Delta OP}_{best}^{t} - K_0 M^{-1} \left( \mathbf{N}_{t} - \mathbf{N}_{0} \right)$$
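The update rule with these matrices can be applied without ever forming an m x m matrix, because row i of K times a vector v equals k(m v_i - sum(v)). The sketch below is an illustration under assumptions: `k`, `k0`, and `mu` stand for the scalar entries of K, K_0, and M; improvements are taken to be negative changes in the objective; allocations are kept real-valued, with rounding and non-negativity left out. The function name is hypothetical.

```python
def allocation_step(n_t, n_0, delta_op_best, k=1.0, k0=0.1, mu=1.0):
    """One step of N_{t+1} = N_t - K M^-1 dOP - K0 M^-1 (N_t - N_0)."""
    m = len(n_t)
    total_delta = sum(delta_op_best)
    n_next = []
    for i in range(m):
        # Row i of K applied to dOP: (m-1)*k*dOP_i - k*sum_{j != i} dOP_j.
        merit = k * (m * delta_op_best[i] - total_delta) / mu
        # Pull back toward the initial allocation N_0.
        restoring = k0 * (n_t[i] - n_0[i]) / mu
        n_next.append(n_t[i] - merit - restoring)
    return n_next


n_0 = [32.0, 32.0, 32.0, 32.0]
delta = [-4.0, -1.0, 0.0, -3.0]  # set 0 improved the objective the most
n_1 = allocation_step(n_0, n_0, delta)
print(n_1, sum(n_1))
```

The merit terms sum to zero over all parameter sets, and the restoring terms sum to zero whenever the current and initial allocations have the same total, so the total number of samples is conserved while sets with larger improvements receive more of them.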

Information Flow Throughout ATM Components

Diagram (repeating loop): Resource allocation; Parameter Sets 1-4; suggestions for samples; Evaluate Samples; values of objective function

ATM Optimization Process

Figure panels: Sum of Different Powers (f14); Sharp Ridge (f13); Rotated Ellipse (f10)

Results on BBOB Testbed - Overview

Succeeds at solving:
• 23/24 in 2D
• 8/24 in 40D

• Underperforms on non-separable functions
• Especially if ill-conditioned and/or noisy

• One of the best optimizers for the separable functions subset (f1-5)

+ Large budget

Results on BBOB Testbed - Successes

• Very effective at optimizing separable functions
• Capable of optimizing functions with “large” convex regions around the global minimum (“large” = comparable to R_max)

Results on BBOB Testbed - Underperformance

• Poor performance on rotated and ill-conditioned functions
• Poor performance on rotated and noisy/multimodal functions

Results from BBOB Large-scale

Budget = 3000D

Ability to Scale to Large Search Spaces

Internal runtime of the ATM algorithm scales linearly as a function of the number of variables in the search space

Results from timing experiment:
• Internal Runtime = Total Runtime – Evaluation time
• 128 function evaluations, averaged over 3 runs
• f1 sphere function
• Number of variables to pass 1.0 sec internal runtime
• Google Colab GPU

Figure: Internal Runtime as a Function of Number of Variables in Search Space. x-axis: number of variables in search space (NU); y-axis: internal runtime (seconds). Curves: ATM; CMA (pip install CMA); BFGS (scipy.optimize); Nelder Mead (scipy.optimize); L-BFGS-B (scipy.optimize)
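The internal-runtime measurement described above (total runtime minus time spent inside the objective function) can be reproduced with a small timing wrapper. The sketch below is an assumption-laden illustration: the optimizer is a stand-in random local search rather than ATM or any of the libraries in the figure, and the names `TimedObjective`, `random_local_search`, and `internal_runtime` are hypothetical. The settings of 128 evaluations and 3 runs follow the slide.

```python
import random
import time


class TimedObjective:
    """Wrap an objective and accumulate the time spent evaluating it."""

    def __init__(self, f):
        self.f = f
        self.eval_time = 0.0
        self.evals = 0

    def __call__(self, x):
        start = time.perf_counter()
        value = self.f(x)
        self.eval_time += time.perf_counter() - start
        self.evals += 1
        return value


def sphere(x):
    return sum(v * v for v in x)


def random_local_search(f, dim, n_evals, rng):
    """Stand-in optimizer (not ATM): accept Gaussian perturbations that improve f."""
    best_x = [rng.uniform(-5.0, 5.0) for _ in range(dim)]
    best_f = f(best_x)
    for _ in range(n_evals - 1):
        x = [b + 0.1 * rng.gauss(0.0, 1.0) for b in best_x]
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f


def internal_runtime(dim, n_evals=128, runs=3, seed=0):
    """Average of (total runtime - evaluation time) over several runs."""
    rng = random.Random(seed)
    internal = []
    for _ in range(runs):
        timed = TimedObjective(sphere)
        start = time.perf_counter()
        random_local_search(timed, dim, n_evals, rng)
        total = time.perf_counter() - start
        internal.append(total - timed.eval_time)
    return sum(internal) / runs


if __name__ == "__main__":
    for dim in (10, 100, 1000):
        print(dim, internal_runtime(dim))
```

Repeating the measurement over increasing dimensions and finding where the internal runtime passes 1.0 second gives the quantity listed in the fourth bullet.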

Recent Progress

• Introduced a primary axis updated by a moving average rule
• Performance of the ATM is improved on rotated functions

Figure: convergence plots for Rotated Ellipse (f10) and Sharp Ridge (f13). x-axis: iterations; y-axis: log10 Δf. Curves: Original ATM; New Version
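The slide does not state the moving average rule itself. One plausible form, shown here purely as an assumption, is an exponential moving average of recent successful step directions, renormalized to unit length. The function name and the smoothing factor `beta` are hypothetical. Like the rest of ATM, this keeps only a length-D vector.

```python
import math


def update_primary_axis(axis, step, beta=0.9):
    """Blend the current axis with the latest successful step and renormalize.

    axis: current unit-length primary axis (length D)
    step: latest displacement of the best point (length D)
    beta: weight on the old axis (assumed exponential moving average)
    """
    blended = [beta * a + (1.0 - beta) * s for a, s in zip(axis, step)]
    norm = math.sqrt(sum(v * v for v in blended))
    if norm == 0.0:
        return list(axis)  # degenerate case: keep the previous axis
    return [v / norm for v in blended]


axis = [1.0, 0.0]
for _ in range(50):
    # Repeated steps along the diagonal gradually rotate the axis toward it.
    axis = update_primary_axis(axis, [1.0, 1.0])
print(axis)
```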

Goals Moving Forward

• Improve performance on rotated and ill-conditioned functions (without using DxD objects)
• Increase performance in noisy environments: use averaging and moving mean
• Add second population with weak restart conditions, for multimodal functions
• Make the ATM more user friendly and customizable

For more information see: https://github.com/BjBodner/ATM-optimization-algorithm

Conclusions: The ATM Algorithm

• Scales linearly with size of the search space: no DxD objects
• Very efficient at optimizing separable functions
• Underperforms on rotated functions, e.g., ill-conditioned and/or noisy functions
• Good candidate for optimizing very high-dimensional problems
• More research is needed

Acknowledgements

Dr Brenda Rubenstein, Brown University, Providence, RI, USA, for her guidance in developing this algorithm. Her contributions and encouragement were essential in advancing this project and getting it to its current form.

Dr Eran Triester, Ben-Gurion University, Beersheva, Israel, for his ongoing collaboration. Working with him is significantly helping improve the performance of the algorithm.

References

• Nikolaus Hansen. Benchmarking a BI-Population CMA-ES on the BBOB-2009 Function Testbed. GECCO '09: Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers, pages 2389-2396.
• Bin Qian, Angel R. Ortiz, David Baker. Improvement of comparative model accuracy by free-energy optimization along principal components of natural structural variation. PNAS, October 26, 2004, vol. 101, no. 43, 1534
• Dan Vladimir Nichita, Susana Gomez, Eduardo Luna. Multiphase equilibria calculation by direct minimization of Gibbs free energy with a global optimization method. Computers and Chemical Engineering 26 (2002) 1703-1724.
• Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805.
• Shai Shalev-Shwartz, Ohad Shamir, Shaked Shammah. Failures of Gradient-Based Deep Learning. ICML '17: Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 3067-3075, 2017.
• Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Deep Residual Learning for Image Recognition. arXiv:1512.03385v1 [cs.CV], 10 Dec 2015.
• Sutskever, I., Martens, J., Dahl, G., Hinton, G. On the importance of initialization and momentum in deep learning. In Proceedings of the 30th International Conference on Machine Learning, Volume 28, ICML '13, III-1139-III-1147 (JMLR.org, 2013).

Thank you!

Questions?

Email: benjamin_bodner@brown.edu

For more information see: https://github.com/BjBodner/ATM-optimization-algorithm