9th GECCO Workshop on Blackbox Optimization Benchmarking (BBOB):
Welcome and Introduction to COCO/BBOB
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this
notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Copyright is held by the owner/author(s).
GECCO '14, Jul 12-16 2014, Vancouver, BC, Canada
ACM 978-1-4503-2881-4/14/07.
http://dx.doi.org/10.1145/2598394.2605339
The BBOBies
https://github.com/numbbo/coco
slides based on previous ones by A. Auger, N. Hansen, and D. Brockhoff
challenging optimization problems appear in many
scientific, technological and industrial domains
Optimize $f: \Omega \subset \mathbb{R}^n \to \mathbb{R}^k$
derivatives not available or not useful
[figure: blackbox mapping $x \in \mathbb{R}^n \mapsto f(x) \in \mathbb{R}^k$]
Numerical Blackbox Optimization
Given: the blackbox problem $x \in \mathbb{R}^n \mapsto f(x) \in \mathbb{R}^k$
Not clear: which of the many algorithms should I use on my problem?
Practical Blackbox Optimization
• Deterministic algorithms
  • Quasi-Newton with estimation of gradient (BFGS) [Broyden et al. 1970]
  • Simplex downhill [Nelder & Mead 1965]
  • Pattern search [Hooke and Jeeves 1961]
  • Trust-region methods (NEWUOA, BOBYQA) [Powell 2006, 2009]
• Stochastic (randomized) search methods
  • Evolutionary Algorithms (continuous domain)
    • Differential Evolution [Storn & Price 1997]
    • Particle Swarm Optimization [Kennedy & Eberhart 1995]
    • Evolution Strategies, CMA-ES [Rechenberg 1965, Hansen & Ostermeier 2001]
    • Estimation of Distribution Algorithms (EDAs) [Larrañaga & Lozano 2002]
    • Cross Entropy Method (same as EDA) [Rubinstein & Kroese 2004]
    • Genetic Algorithms [Holland 1975, Goldberg 1989]
  • Simulated annealing [Kirkpatrick et al. 1983]
  • Simultaneous perturbation stochastic approximation (SPSA) [Spall 2000]
Numerical Blackbox Optimizers
• choice typically not immediately clear
• although practitioners have knowledge about which difficulties their problem has (e.g. multi-modality, non-separability, ...)
Numerical Blackbox Optimizers
• understanding of algorithms
• algorithm selection
• putting algorithms to a standardized test
  • simplify judgement
• simplify comparison
• regression test under algorithm changes
Almost everybody has to do it (and it is tedious):
• choosing (and implementing) problems, performance measures, visualization, stat. tests, ...
• running a set of algorithms
Need: Benchmarking
that's where COCO and BBOB come into play
Comparing Continuous Optimizers Platform
https://github.com/numbbo/coco
automated benchmarking
Overview of COCO's Structure
How to benchmark algorithms with COCO?
...in 2-3 minutes
requirements & download
installation I: experiments
installation II: postprocessing
coupling algo + COCO
Simplified Example Experiment in Python

import cocoex
import scipy.optimize

### input
suite_name = "bbob"
output_folder = "scipy-optimize-fmin"
fmin = scipy.optimize.fmin

### prepare
suite = cocoex.Suite(suite_name, "", "")
observer = cocoex.Observer(suite_name, "result_folder: " + output_folder)

### go
for problem in suite:  # this loop will take several minutes
    problem.observe_with(observer)  # generates the data for cocopp post-processing
    fmin(problem, problem.initial_solution)
Note: the actual example_experiment.py contains more advanced things like
restarts, batch experiments, other algorithms (e.g. CMA-ES), etc.
running the experiment
postprocessing
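A minimal sketch of this step (assuming the experiment above wrote its data into exdata/scipy-optimize-fmin, the folder name derived from the observer's result_folder option):

import cocopp  # COCO's post-processing module

# generate the tables and figures (by default into a ppdata/ folder)
# from the data folder produced by the observer above
cocopp.main("exdata/scipy-optimize-fmin")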
Result Folder
Automatically Generated Results
doesn't look too complicated, does it?
[the devil is in the details ]
so far (i.e. incl. BBOB-2019):
data for 300+ algorithm variants
(some of which on noisy/multiobjective/large-scale suites)
143 workshop papers
by 109 authors from 28 countries
Isn't Benchmarking Trivial?
Choose a set of algorithms
Choose a set of (test) functions
Run the algorithms and compare the results!
the devil is in the details…
hence, COCO implements a
reasonable, well-founded, and
well-documented
pre-chosen methodology
On:
• real-world problems
  • expensive
  • comparison typically limited to certain domains
  • experts have limited interest to publish
• "artificial" benchmark functions
  • cheap
  • controlled
  • data acquisition is comparatively easy
  • problem of representativeness
Measuring Performance
• define the "scientific question"
  • the relevance can hardly be overestimated
• should represent "reality"
• are often too simple?
  • recall separability
• account for invariance properties
  • prediction of performance is based on "similarity", ideally equivalence classes of functions
Test Functions
Available Test Suites in COCO
• bbob (since 2009): 24 noiseless functions, 200+ data sets
• bbob-noisy (since 2009): 30 noisy functions, 40+ data sets
• bbob-biobj (since 2016): 55 bi-objective functions, 30+ data sets
• bbob-largescale (new): 24 noiseless functions, 11 data sets
• bbob-mixint (new): 24 noiseless functions, 4 data sets
• bbob-biobj-mixint (new): 92 bi-objective functions, no data sets yet

Close to release:
• constrained test suite (bbob-constrained)
• extended bi-objective suite (bbob-biobj-ext)

Under development/in planning phase:
• 3-objective suite + real-world problems (see also the game benchmarking workshop)
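As a side note, the installed cocoex module can list the suite names itself; a minimal sketch, assuming the attribute known_suite_names is available in your cocoex version:

import cocoex

# print the names of all test suites known to this cocoex installation
print(cocoex.known_suite_names)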
Easy Data Access

import cocopp
cocopp.bbob.get('BIPOP')
[…]
ValueError: 'BIPOP' has multiple matches in the data archive:
2009/BIPOP-CMA-ES_hansen_noiseless.tgz
2012/BIPOPaCMA_loshchilov_noiseless.tgz
[…]
2017/KL-BIPOP-CMA-ES-Yamaguchi.tgz
Either pick a single match, or use the `get_all` or `get_first` method,
or use the ! (first) or * (all) marker and try again.
cocopp.main('exdata/myfolder bbob/2009/BIPOP!')
[data access also available via command line]
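Following the hint in the error message above, a single data set can also be fetched without ambiguity; a small sketch using the get_first method mentioned there:

import cocopp

# download (if necessary) and return the first archive entry matching 'BIPOP'
data_path = cocopp.bbob.get_first('BIPOP')
print(data_path)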
Meaningful quantitative measure
• quantitative on the ratio scale (highest possible)
"algo A is two times better than algo B" is a meaningful statement
• assume a wide range of values
• meaningful (interpretable) with regard to the real world
possible to transfer from benchmarking to real world
How Do We Measure Performance?
runtime or first hitting time is the prime candidate (we don't have many choices anyway)
convergence graphs are all we have to start with...
Measuring Performance Empirically
One fits all:
• single- vs. multi-objective
• unconstrained vs. constrained
• with instances even deterministic vs. stochastic algorithms
Difference: which performance measure?
Main Performance Visualization:
Empirical Runtime Distributions
[aka Empirical Cumulative Distribution Function (ECDF) of the Runtime]
[aka data profile]
Convergence Graph of 15 Runs
[figure: 15 convergence graphs with a horizontal target line]
15 Runs ≤ 15 Runtime Data Points
Empirical CDF
[figure: empirical CDF of the 15 runtimes, y-axis from 0 to 1]
the ECDF of run lengths to reach the target
● has for each data point a vertical step of constant size
● displays for each x-value (budget) the count of observations to the left (first hitting times)
Empirical Cumulative Distribution
e.g. 60% of the runs need between 2000 and 4000 evaluations
e.g. 80% of the runs reached the target
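To make this concrete, here is a small plain-numpy sketch (not COCO code) of such an empirical CDF, with made-up first-hitting times for 15 runs (np.inf marks runs that never reached the target):

import numpy as np

# made-up first-hitting times (in evaluations) of 15 runs; inf = target never reached
hitting_times = np.array([1500., 1800., 1900., 2200., 2500., 2700., 3000., 3200.,
                          3400., 3600., 3800., 3900., np.inf, np.inf, np.inf])

def ecdf(budget, times=hitting_times):
    """fraction of runs that reached the target within the given budget"""
    return np.sum(times <= budget) / len(times)

print(ecdf(2000))               # 0.2
print(ecdf(4000))               # 0.8: 80% of the runs reached the target
print(ecdf(4000) - ecdf(2000))  # 0.6: 60% of the runs need between 2000 and 4000 evaluations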
Reconstructing A Single Run
50 equally spaced targets
the empirical CDF makes a step for each star, is monotonous, and displays for each budget the fraction of targets achieved within the budget
the ECDF recovers the monotonous graph, discretised and flipped
Aggregation
• 15 runs
• 50 targets
• ECDF with 750 steps
• 50 targets from 15 runs ...integrated in a single graph

Interpretation
area over the ECDF curve = average log runtime (or geometric average runtime) over all targets (difficult and easy) and all runs
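A tiny numpy illustration of this interpretation (made-up runtimes, not COCO code): averaging the log runtimes and exponentiating back gives the geometric average runtime.

import numpy as np

# made-up runtimes (evaluations) to reach a few different targets
runtimes = np.array([500., 1200., 3000., 8000., 20000.])

avg_log_runtime = np.mean(np.log10(runtimes))   # what the area over the ECDF measures
geometric_avg = 10 ** avg_log_runtime           # the same value, expressed as a runtime
print(round(geometric_avg))                     # about 3103 evaluations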
Fixed-target: Measuring Runtime
• Algo Restart A: runtime $RT_A^r$, $p_s(\text{Algo Restart A}) = 1$
• Algo Restart B: runtime $RT_B^r$, $p_s(\text{Algo Restart B}) = 1$
Fixed-target: Measuring Runtime
• Expected running time of the restarted algorithm:
  $E[RT^r] = \frac{1 - p_s}{p_s}\, E[RT_\text{unsuccessful}] + E[RT_\text{successful}]$
• Estimator: average running time (aRT), with
  $\hat{p}_s = \frac{\#\text{successes}}{\#\text{runs}}$,
  $RT_\text{unsucc}$ = average evaluations of unsuccessful runs,
  $RT_\text{succ}$ = average evaluations of successful runs:
  $aRT = \frac{\text{total }\#\text{evals}}{\#\text{successes}}$
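A minimal sketch of the aRT estimator on made-up data for one function/target/dimension pair (plain numpy, not COCO's implementation):

import numpy as np

# made-up runs: evaluations spent per run and whether the target was reached
evals   = np.array([2000, 3500, 5000, 5000, 5000])   # 5000 = budget exhausted
success = np.array([True, True, False, False, False])

p_hat = np.mean(success)               # estimated success probability: 0.4
aRT = np.sum(evals) / np.sum(success)  # total #evals / #successes = 10250.0
print(p_hat, aRT)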
ECDFs with Simulated Restarts
What we typically plot are ECDFs of the simulated restarted algorithms:
Worth Noting: ECDFs in COCO
In COCO, ECDF graphs
• never aggregate over dimension
• but often over targets and functions
• can show data of more than 1 algorithm at a time
150 algorithms from BBOB-2009 till BBOB-2015
More Automated Plots...
...but no time to explain them here
A few more Details…
• the idea of instances
• multiobjective optimization
Notion of Instances
• All COCO problems come in the form of instances
  • e.g. as translated/rotated versions of the same function
• Prescribed instances typically change from year to year
  • to avoid overfitting
  • 5 instances are always kept the same
Plus:
• the bbob functions are locally perturbed by non-linear transformations
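To illustrate the idea only (this is not COCO's actual implementation and the helper names are made up), an instance can be thought of as a translated and rotated copy of a base function:

import numpy as np

def sphere(x):
    """base function: sum of squares"""
    return float(np.sum(np.asarray(x) ** 2))

def make_instance(base_f, dimension, seed):
    """illustrative only: build a translated/rotated variant of base_f"""
    rng = np.random.default_rng(seed)
    x_opt = rng.uniform(-4, 4, dimension)                                # shifted optimum
    rotation, _ = np.linalg.qr(rng.normal(size=(dimension, dimension)))  # random rotation
    return lambda x: base_f(rotation @ (np.asarray(x) - x_opt))

f_instance_1 = make_instance(sphere, 5, seed=1)
f_instance_2 = make_instance(sphere, 5, seed=2)   # same function class, different instance
print(f_instance_1(np.zeros(5)), f_instance_2(np.zeros(5)))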
[figures: example instances of f10 (Ellipsoid) and f15 (Rastrigin)]
Bi-objective Performance Assessment
algorithm quality =
• normalized* hypervolume (HV) of all non-dominated solutions, if a point dominates the nadir
• closest normalized* negative distance to the region of interest $[0,1]^2$, if no point dominates the nadir
* normalized such that ideal = [0,0] and nadir = [1,1]
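A minimal sketch of the "point dominates the nadir" case (plain numpy, assuming minimization of both objectives; not COCO's actual implementation):

import numpy as np

def normalized_hypervolume(points, ideal, nadir):
    """hypervolume of the normalized points w.r.t. the nadir [1, 1] (2-D, minimization)"""
    ideal, nadir = np.asarray(ideal, float), np.asarray(nadir, float)
    pts = (np.asarray(points, float) - ideal) / (nadir - ideal)  # ideal -> [0,0], nadir -> [1,1]
    pts = pts[np.all(pts < 1, axis=1)]          # keep only points dominating the nadir
    if len(pts) == 0:
        return 0.0
    pts = pts[np.argsort(pts[:, 0])]            # sweep along the first objective
    hv, best_f2 = 0.0, 1.0
    for f1, f2 in pts:
        if f2 < best_f2:                        # point is non-dominated so far
            hv += (1.0 - f1) * (best_f2 - f2)
            best_f2 = f2
    return hv

# made-up bi-objective values; result: 0.4
print(normalized_hypervolume([[2., 8.], [4., 5.], [7., 3.]], ideal=[0, 0], nadir=[10, 10]))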
and now?
BBOB-2019
Session 1: large-scale and multiobjective optimization
08:30 - 09:15 The BBOBies: A Short Introduction to COCO and BBOB
09:15 - 09:40 Konstantinos Varelas*: Benchmarking Large Scale Variants of CMA-ES and L-BFGS-B on the bbob-largescale Testbed
09:40 - 10:05 Paul Dufossé* and Cheikh Touré: Benchmarking MO-CMA-ES and COMO-CMA-ES on the Bi-objective bbob-biobj Testbed
10:05 - 10:20 Dimo Brockhoff* and Tea Tušar: Benchmarking Algorithms from the platypus Framework on the Biobjective bbob-biobj Testbed
Session 2: single-objective noiseless unbounded optimization
10:40 - 10:45 The BBOBies: Introduction to Blackbox Optimization Benchmarking
10:45 - 10:55 Dimo Brockhoff and Nikolaus Hansen: The Impact of Sample Volume in Random Search on the bbob Test Suite
10:55 - 11:20 Benjamin Bodner: Benchmarking the ATM Algorithm on the BBOB 2009 Noiseless Function Testbed
11:20 - 11:45 Louis Faury, Clément Calauzènes, and Olivier Fercoq: Benchmarking GNN-CMA-ES on the BBOB noiseless testbed
11:45 - 12:10 Konstantinos Varelas and Marie-Ange Dahito: Benchmarking Multivariate Solvers of SciPy on the Noiseless Testbed
12:10 - 12:20 Nikolaus Hansen: The COCO data archive and This Year's Results
12:20 - 12:30 The BBOBies: Wrap-up and Open Discussion
http://coco.gforge.inria.fr/