
9th GECCO Workshop on Blackbox Optimization Benchmarking (BBOB):

Welcome and Introduction to COCO/BBOB

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Copyright is held by the owner/author(s).

GECCO '14, Jul 12-16 2014, Vancouver, BC, Canada

ACM 978-1-4503-2881-4/14/07.

http://dx.doi.org/10.1145/2598394.2605339

The BBOBies
https://github.com/numbbo/coco

slides based on previous ones by A. Auger, N. Hansen, and D. Brockhoff

challenging optimization problems appear in many scientific, technological and industrial domains

Optimize f: Ω ⊂ ℝⁿ → ℝᵏ

derivatives not available or not useful

[black box: x ∈ ℝⁿ ↦ f(x) ∈ ℝᵏ]

Numerical Blackbox Optimization

Given: a black box x ∈ ℝⁿ ↦ f(x) ∈ ℝᵏ

Not clear: which of the many algorithms should I use on my problem?

Practical Blackbox Optimization

Deterministic algorithms
• Quasi-Newton with estimation of gradient (BFGS) [Broyden et al. 1970]
• Simplex downhill [Nelder & Mead 1965]
• Pattern search [Hooke and Jeeves 1961]
• Trust-region methods (NEWUOA, BOBYQA) [Powell 2006, 2009]

Stochastic (randomized) search methods
• Evolutionary Algorithms (continuous domain)
  • Differential Evolution [Storn & Price 1997]
  • Particle Swarm Optimization [Kennedy & Eberhart 1995]
  • Evolution Strategies, CMA-ES [Rechenberg 1965, Hansen & Ostermeier 2001]
  • Estimation of Distribution Algorithms (EDAs) [Larrañaga, Lozano, 2002]
  • Cross Entropy Method (same as EDA) [Rubinstein, Kroese, 2004]
  • Genetic Algorithms [Holland 1975, Goldberg 1989]
• Simulated annealing [Kirkpatrick et al. 1983]
• Simultaneous perturbation stochastic approx. (SPSA) [Spall 2000]

Numerical Blackbox Optimizers
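As a side note, here is a minimal sketch (not part of the original slides) of calling two of the deterministic methods listed above through SciPy on a hand-picked smooth test function; the function and settings are chosen for illustration only:

import numpy as np
from scipy.optimize import minimize

def f(x):  # a smooth, non-separable Rosenbrock-type test function
    return sum(100.0 * (x[1:] - x[:-1]**2)**2 + (1.0 - x[:-1])**2)

x0 = np.zeros(5)
for method in ["Nelder-Mead", "BFGS"]:  # simplex downhill vs. quasi-Newton
    res = minimize(f, x0, method=method)
    print(method, res.fun, res.nfev)  # best f-value and number of f-evaluations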


• choice typically not immediately clear
• although practitioners have knowledge about which difficulties their problem has (e.g. multi-modality, non-separability, ...)


• understanding of algorithms

• algorithm selection

• putting algorithms to a standardized test
• simplify judgement

• simplify comparison

• regression test under algorithm changes

Pretty much everybody has to do it (and it is tedious):

• choosing (and implementing) problems, performance measures, visualization, stat. tests, ...

• running a set of algorithms

Need: Benchmarking

that's where COCO and BBOB come into play

Comparing Continuous Optimizers Platform

https://github.com/numbbo/coco

automated benchmarking

Overview of COCO's Structure


How to benchmark algorithms with COCO?

...in 2-3 minutes


requirements & download


installation I: experiments


installation II: postprocessing


coupling algo + COCO


Simplified Example Experiment in Python

import cocoex
import scipy.optimize

### input
suite_name = "bbob"
output_folder = "scipy-optimize-fmin"
fmin = scipy.optimize.fmin

### prepare
suite = cocoex.Suite(suite_name, "", "")
observer = cocoex.Observer(suite_name, "result_folder: " + output_folder)

### go
for problem in suite:  # this loop will take several minutes
    problem.observe_with(observer)  # generates the data for cocopp post-processing
    fmin(problem, problem.initial_solution)

Note: the actual example_experiment.py contains more advanced things like restarts, batch experiments, other algorithms (e.g. CMA-ES), etc.

running the experiment


postprocessing
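A minimal sketch of invoking the postprocessing on the data written by the example experiment above, assuming the observer writes into an exdata/ folder as in the distributed example (the folder on disk may carry a suffix):

import cocopp
cocopp.main("exdata/scipy-optimize-fmin")  # writes browsable output into a ppdata folder

The same can be triggered from the command line via python -m cocopp exdata/scipy-optimize-fmin.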


Result Folder

Automatically Generated Results

Automatically Generated Results

Automatically Generated Results

Automatically Generated Results

doesn't look too complicated, does it?

[the devil is in the details]

so far (i.e. incl. BBOB-2019):

data for 300+ algorithm variants

(some of which on noisy/multiobjective/largescale suites)

143 workshop papers

by 109 authors from 28 countries

Is Benchmarking Not Trivial?

Choose a set of algorithms

Choose a set of (test) functions

Run the algorithms and compare the results!

the devil is in the details…

hence, COCO implements a reasonable, well-founded, and well-documented pre-chosen methodology

On
• real world problems
  • expensive
  • comparison typically limited to certain domains
  • experts have limited interest to publish
• "artificial" benchmark functions
  • cheap
  • controlled
  • data acquisition is comparatively easy
  • problem of representativeness

Measuring Performance

• define the "scientific question"

the relevance can hardly be overestimated

• should represent "reality"

• are often too simple?

think of separability

• account for invariance properties

prediction of performance is based on “similarity”, ideally equivalence classes of functions

Test Functions

Available Test Suites in COCO

bbob              (since 2009)  24 noiseless fcts      200+ data sets
bbob-noisy        (since 2009)  30 noisy fcts          40+ data sets
bbob-biobj        (since 2016)  55 bi-objective fcts   30+ data sets
bbob-largescale   (new)         24 noiseless fcts      11 data sets
bbob-mixint       (new)         24 noiseless fcts      4 data sets
bbob-biobj-mixint (new)         92 bi-objective fcts   -

Close to release:

• constrained test suite (bbob-constrained)

• extended bi-objective suite (bbob-biobj-ext)

Under development/in planning phase:

3-objective suite + real-world problems (see also the game benchmarking workshop)

Easy Data Access

import cocopp

cocopp.bbob.get('BIPOP')

[…]

ValueError: 'BIPOP' has multiple matches in the data archive:

2009/BIPOP-CMA-ES_hansen_noiseless.tgz

2012/BIPOPaCMA_loshchilov_noiseless.tgz

[…]

2017/KL-BIPOP-CMA-ES-Yamaguchi.tgz

Either pick a single match, or use the `get_all` or `get_first` method,

or use the ! (first) or * (all) marker and try again.

cocopp.main('exdata/myfolder bbob/2009/BIPOP!')

[data access also available via command line]
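For illustration, one way to resolve the ambiguity above; the method names come from the error message, but the exact return values are an assumption on my part:

import cocopp
first = cocopp.bbob.get_first('BIPOP')   # first matching data set (local path after download)
all_sets = cocopp.bbob.get_all('BIPOP')  # all matching data sets
cocopp.main('exdata/myfolder ' + first)  # compare own data against that archived data set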

Meaningful quantitative measure
• quantitative on the ratio scale (highest possible)

"algo A is two times better than algo B" is a meaningful statement

• assume a wide range of values

• meaningful (interpretable) with regard to the real world

possible to transfer from benchmarking to real world

How Do We Measure Performance?

runtime or first hitting time is the prime candidate (we don't have many choices anyway)

convergence graphs are all we have to start with...

Measuring Performance Empirically


One fits all:

• single- vs. multiobjective
• unconstrained vs. constrained
• with instances, even deterministic vs. stochastic algos

Difference: which performance measure?

Main Performance Visualization:

Empirical Runtime Distributions

[aka Empirical Cumulative Distribution Function (ECDF) of the Runtime]

[aka data profile]

Convergence Graph of 15 Runs

[figure: 15 convergence graphs over #evaluations, with the target f-value marked]

15 Runs ≤ 15 Runtime Data Points

Empirical CDF

[figure: empirical CDF of the 15 runtime data points; y-axis from 0 to 1]

the ECDF of run lengths to reach the target

● has for each data point a vertical step of constant size

● displays for each x-value (budget) the count of observations to the left (first hitting times)

Empirical Cumulative Distribution

e.g. 60% of the runs need between 2000 and 4000 evaluations
e.g. 80% of the runs reached the target
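To make the construction concrete, a small sketch that computes such an empirical CDF from 15 first-hitting times (the numbers are made up and do not correspond to the figure):

import numpy as np

# first-hitting times of 15 runs; inf means the target was never reached
runtimes = np.array([800, 1200, 1500, 2100, 2400, 2800, 3100, 3300,
                     3600, 3900, 4200, 6000, 9000, np.inf, np.inf])

def ecdf(budget):
    """Fraction of runs that reached the target within `budget` evaluations."""
    return np.mean(runtimes <= budget)

for budget in [1000, 2000, 4000, 8000, 16000]:
    print(budget, ecdf(budget))  # each run contributes a vertical step of height 1/15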

Reconstructing A Single Run

• 50 equally spaced targets
• the empirical CDF makes a step for each star, is monotone, and displays for each budget the fraction of targets achieved within the budget
• the ECDF recovers the monotone graph, discretised and flipped

[figures: a single convergence graph overlaid with 50 target levels and the resulting ECDF; y-axis from 0 to 1]

Aggregation

15 runs × 50 targets → ECDF with 750 steps

50 targets from 15 runs ...integrated in a single graph

area over the ECDF curve = average log runtime (or geometric average runtime) over all targets (difficult and easy) and all runs

Interpretation
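As a small numerical illustration of the statement above (made-up runtimes): the average log runtime is exactly the log of the geometric average runtime.

import numpy as np

runtimes = np.array([1200.0, 2500.0, 4000.0, 9000.0])  # made-up runtimes to reach 4 targets
print(np.exp(np.mean(np.log(runtimes))))                # geometric average runtime
print(np.mean(np.log10(runtimes)),                      # average log10 runtime ...
      np.log10(np.exp(np.mean(np.log(runtimes)))))      # ... equals log10 of the geometric average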

Fixed-target: Measuring Runtime

• Algo Restart A: runtime RT_A^r, with p_s(Algo Restart A) = 1
• Algo Restart B: runtime RT_B^r, with p_s(Algo Restart B) = 1

Fixed-target: Measuring Runtime

• Expected running time of the restarted algorithm:

  E[RT_r] = ((1 − p_s) / p_s) · E[RT_unsuccessful] + E[RT_successful]

• Estimator: average running time (aRT):

  p̂_s = #successes / #runs
  RT_unsucc = average #evals of unsuccessful runs
  RT_succ = average #evals of successful runs

  aRT = total #evals / #successes
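A minimal sketch of the aRT estimator on made-up data (five runs on one problem and one target):

# (reached target?, #evaluations spent) for each of five runs
runs = [(True, 1200), (False, 5000), (True, 900), (False, 5000), (True, 2300)]

n_success = sum(ok for ok, _ in runs)          # here 3, i.e. estimated p_s = 3/5
total_evals = sum(evals for _, evals in runs)  # here 14400
aRT = total_evals / n_success                  # equals ((1 - p_s)/p_s)*RT_unsucc + RT_succ
print(aRT)                                     # 4800.0 evaluations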

ECDFs with Simulated Restarts

What we typically plot are ECDFs of the simulated restarted algorithms:
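A sketch of how one simulated restarted run can be generated from recorded runs (my reading of the procedure, not COCO's implementation; the data is made up): draw runs uniformly with replacement and sum their evaluations until a successful run is drawn.

import random

# recorded runs: (reached target?, #evaluations); at least one run must be successful
runs = [(True, 1200), (False, 5000), (True, 900), (False, 5000), (True, 2300)]

def simulated_restart_runtime(runs, rng=random):
    total = 0
    while True:
        success, evals = rng.choice(runs)
        total += evals
        if success:
            return total  # runtime of one simulated restarted run

simulated = [simulated_restart_runtime(runs) for _ in range(1000)]  # input to the plotted ECDFs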

Worth Noting: ECDFs in COCO

In COCO, ECDF graphs

• never aggregate over dimension

• but often over targets and functions

• can show data of more than 1 algorithm at a time

150 algorithms from BBOB-2009 till BBOB-2015

More Automated Plots...

...but no time to explain them here

A few more Details…

• the idea of instances

• multiobjective optimization

Notion of Instances

• All COCO problems come in the form of instances
  • e.g. as translated/rotated versions of the same function
• Prescribed instances typically change from year to year
  • avoid overfitting
  • 5 instances are always kept the same

Plus:

• the bbob functions are locally perturbed by non-linear transformations

[figures: f10 (Ellipsoid) and f15 (Rastrigin)]
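For illustration, instances can also be inspected directly through the cocoex interface; the option strings below follow the COCO documentation and are an assumption on my part, not taken from these slides:

import cocoex

# the first five instances of function f10 in dimension 2
suite = cocoex.Suite("bbob", "instances: 1-5", "dimensions: 2 function_indices: 10")
for problem in suite:
    print(problem.id)  # e.g. bbob_f010_i01_d02, ..., bbob_f010_i05_d02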

Bi-objective Performance Assessment

algorithm quality =
• normalized* hypervolume (HV) of all non-dominated solutions, if a point dominates the nadir
• closest normalized* negative distance to the region of interest [0,1]², if no point dominates the nadir

* such that ideal = [0,0] and nadir = [1,1]
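A sketch of this quality indicator for a small set of objective vectors; this is my reading of the definition above, not COCO's implementation (hypervolume via the standard 2-D sweep):

import numpy as np

def biobj_quality(F, ideal, nadir):
    """Normalized hypervolume if some point dominates the nadir,
    otherwise the negative distance of the closest point to [0, 1]^2."""
    ideal, nadir = np.asarray(ideal, float), np.asarray(nadir, float)
    F = (np.asarray(F, float) - ideal) / (nadir - ideal)  # ideal -> [0,0], nadir -> [1,1]
    inside = F[(F < 1).all(axis=1)]                       # points dominating the nadir
    if len(inside) == 0:
        return -np.min(np.linalg.norm(np.maximum(F - 1, 0), axis=1))
    nd = np.array([f for f in inside                      # keep non-dominated points only
                   if not any((g <= f).all() and (g < f).any() for g in inside)])
    nd = nd[np.argsort(nd[:, 0])]                         # sort by f1; f2 is then decreasing
    right = np.append(nd[1:, 0], 1.0)                     # right edge of each hypervolume slice
    return float(np.sum((right - nd[:, 0]) * (1 - nd[:, 1])))

print(biobj_quality([[2, 6], [5, 3]], ideal=[0, 0], nadir=[10, 10]))  # -> 0.47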

and now?

BBOB-2019, Session 1: large-scale and multiobjective optimization

08:30 - 09:15  The BBOBies: A Short Introduction to COCO and BBOB
09:15 - 09:40  Konstantinos Varelas*: Benchmarking Large Scale Variants of CMA-ES and L-BFGS-B on the bbob-largescale Testbed
09:40 - 10:05  Paul Dufossé* and Cheikh Touré: Benchmarking MO-CMA-ES and COMO-CMA-ES on the Bi-objective bbob-biobj Testbed
10:05 - 10:20  Dimo Brockhoff* and Tea Tušar: Benchmarking Algorithms from the platypus Framework on the Biobjective bbob-biobj Testbed

Session 2: single-objective noiseless unbounded optimization

10:40 - 10:45  The BBOBies: Introduction to Blackbox Optimization Benchmarking
10:45 - 10:55  Dimo Brockhoff and Nikolaus Hansen: The Impact of Sample Volume in Random Search on the bbob Test Suite
10:55 - 11:20  Benjamin Bodner: Benchmarking the ATM Algorithm on the BBOB 2009 Noiseless Function Testbed
11:20 - 11:45  Louis Faury, Clément Calauzènes, and Olivier Fercoq: Benchmarking GNN-CMA-ES on the BBOB noiseless testbed
11:45 - 12:10  Konstantinos Varelas and Marie-Ange Dahito: Benchmarking Multivariate Solvers of SciPy on the Noiseless Testbed
12:10 - 12:20  Nikolaus Hansen: The COCO data archive and This Year's Results
12:20 - 12:30  The BBOBies: Wrap-up and Open Discussion


http://coco.gforge.inria.fr/