Automatic Algorithm Configuration
Thomas Stützle
IRIDIA, CoDE, Université Libre de Bruxelles, Brussels, Belgium
iridia.ulb.ac.be/~stuetzle
Outline
1. Context
2. Automatic algorithm configuration
3. Automatic configuration methods
4. Applications
5. Concluding remarks
The algorithmic solution of hard optimization problems is one of the CS/OR success stories!
- Exact (systematic search) algorithms
  - Branch & Bound, Branch & Cut, constraint programming, ...
  - powerful general-purpose software available
  - guarantees on optimality but often time/memory consuming
- Approximate algorithms
  - heuristics, local search, metaheuristics, hyperheuristics, ...
  - typically special-purpose software
  - rarely provable guarantees but often fast and accurate
Much active research on hybrids between exact and approximate algorithms!
Design choices and parameters everywhere
Today's high-performance optimizers involve a large number of design choices and parameter settings
- exact solvers
  - design choices include alternative models, pre-processing, variable selection, value selection, branching rules, ...
  - many design choices have associated numerical parameters
  - example: SCIP 3.0.1 solver (fastest non-commercial MIP solver) has more than 200 relevant parameters that influence the solver's search mechanism
- approximate algorithms
  - design choices include solution representation, operators, neighborhoods, pre-processing, strategies, ...
  - many design choices have associated numerical parameters
  - example: multi-objective ACO algorithms with 22 parameters (plus several still hidden ones)
Example: Ant Colony Optimization
Probabilistic solution construction

[Figure: construction graph; an ant at node i chooses the next node among j, k, g, ... probabilistically, guided by the pheromone trails τij and the heuristic information ηij]
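The choice probabilities behind this figure follow the standard random-proportional rule of Ant System, stated here for reference (the formula itself is not on the slide). With N_i the feasible neighborhood of node i:

```latex
p_{ij} \;=\; \frac{\tau_{ij}^{\alpha}\,\eta_{ij}^{\beta}}
                  {\sum_{l \in N_i} \tau_{il}^{\alpha}\,\eta_{il}^{\beta}},
\qquad j \in N_i
```

The roles of the parameters α and β are detailed on the next slide.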
ACO design choices and numerical parameters
- solution construction
  - choice of constructive procedure
  - choice of pheromone model
  - choice of heuristic information
  - numerical parameters
    - α, β: influence the weight of pheromone and heuristic information, respectively
    - q0: determines the greediness of the construction procedure
    - m: the number of ants
- pheromone update
  - which ants deposit pheromone and how much?
  - numerical parameters
    - ρ: evaporation rate
    - τ0: initial pheromone level
- local search
- ... many more ...
Parameter types
- categorical parameters → design
  - choice of constructive procedure, choice of recombination operator, choice of branching strategy, ...
- ordinal parameters → design
  - neighborhoods, lower bounds, ...
- numerical parameters → tuning, calibration
  - integer or real-valued parameters
  - weighting factors, population sizes, temperatures, hidden constants, ...
  - numerical parameters may be conditional on specific values of categorical or ordinal parameters

Design and configuration of algorithms involves setting categorical, ordinal, and numerical parameters; a sketch of such a parameter space follows below
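All three parameter types, and conditionality, can be declared together in one parameter-space file. The sketch below uses the parameter-file syntax of irace (introduced later in this lecture), with hypothetical ACO parameters and switches:

```
# parameters.txt -- hypothetical ACO parameter space, sketched after the
# irace documentation (types: c categorical, o ordinal, i integer, r real)
# name        switch            type  domain
algorithm     "--algorithm "    c     (as, mmas, acs)
localsearch   "--localsearch "  o     (none, 2opt, 3opt)
ants          "--ants "         i     (5, 100)
alpha         "--alpha "        r     (0.0, 5.0)
q0            "--q0 "           r     (0.0, 1.0) | algorithm == "acs"
```

The last line illustrates a conditional parameter: q0 is only active when the categorical parameter selects the ACS variant.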
Designing optimization algorithms
Challenges
- many alternative design choices
- nonlinear interactions among algorithm components and/or parameters
- performance assessment is difficult

Traditional design approach
- trial-and-error design guided by expertise/intuition
- prone to over-generalizations, implicit independence assumptions, limited exploration of design alternatives

Can we make this approach more principled and automatic?
Towards automatic algorithm configuration
Automated algorithm configuration
- apply powerful search techniques to design algorithms
- use computation power to explore design spaces
- assist the algorithm designer in the design process
- free human creativity for higher-level tasks
Offline configuration and online parameter control

Offline configuration
- configure the algorithm before deploying it
- configuration on training instances
- related to algorithm design

Online parameter control
- adapt parameter settings while solving an instance
- typically limited to a set of known crucial algorithm parameters
- related to parameter calibration

Offline configuration techniques can be helpful to configure (online) parameter control strategies
Offline configuration

Typical performance measures
- maximize solution quality (within a given computation time)
- minimize computation time (to reach an optimal solution)
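Whatever the measure, an offline configurator only needs the target algorithm as a black box: run it with a candidate configuration on a training instance and read back a scalar cost. A minimal sketch; the solver name and its command-line flags are placeholders, not a real interface:

```python
import subprocess

def target_runner(config, instance, seed, cutoff=60):
    """Run the target algorithm once; return the cost to be minimized.
    './mysolver' and its flags are hypothetical placeholders."""
    cmd = ["./mysolver", "--instance", instance, "--seed", str(seed)]
    for name, value in config.items():
        cmd += [f"--{name}", str(value)]
    out = subprocess.run(cmd, capture_output=True, text=True, timeout=cutoff)
    # Convention assumed here: the solver prints its final solution
    # quality (or run-time) as a single number on stdout.
    return float(out.stdout.strip())
```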
Approaches to configuration
- numerical optimization techniques
  - e.g. MADS [Audet & Orban, 2006], various [Yuan et al., 2012]
- heuristic search methods
  - e.g. meta-GA [Grefenstette, 1985], ParamILS [Hutter et al., 2007, 2009], gender-based GA [Ansótegui et al., 2009], linear GP [Oltean, 2005], REVAC(++) [Eiben & students, 2007, 2009, 2010], ...
- experimental design techniques
  - e.g. CALIBRA [Adenso-Díaz & Laguna, 2006], [Ridge & Kudenko, 2007], [Coy et al., 2001], [Ruiz & Stützle, 2005]
- model-based optimization approaches
  - e.g. SPO [Bartz-Beielstein et al., 2005, 2006, ...], SMAC [Hutter et al., 2011, ...]
- sequential statistical testing
  - e.g. F-race, iterated F-race [Birattari et al., 2002, 2007, ...]

General, domain-independent methods required: (i) applicable to all variable types, (ii) multiple training instances, (iii) high performance
The racing approach

[Figure: a set Θ of candidate configurations evaluated on a sequence of instances i; the set of surviving candidates narrows as the race proceeds]

- start with a set of initial candidates
- consider a stream of instances
- sequentially evaluate candidates
- discard inferior candidates as sufficient evidence is gathered against them
- ... repeat until a winner is selected or until computation time expires (sketched in code below)
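In Python-flavored pseudocode, the generic racing scheme could look as follows; `evaluate` and the statistical test behind `survivors` are deliberately left abstract:

```python
def race(candidates, instances, evaluate, survivors):
    """Generic racing loop (sketch). evaluate(c, inst) returns the cost of
    candidate c on instance inst; survivors(results, alive) applies a
    statistical test and returns the candidates not yet shown inferior."""
    alive = list(candidates)
    results = {c: [] for c in alive}
    for inst in instances:                      # stream of instances
        for c in alive:                         # evaluate survivors only
            results[c].append(evaluate(c, inst))
        alive = survivors(results, alive)       # discard inferior candidates
        if len(alive) == 1:                     # a winner has been selected
            break
    return alive
```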
The F-Race algorithm
Statistical testing
1. family-wise tests for differences among configurations
   - Friedman two-way analysis of variance by ranks
2. if the Friedman test rejects H0, perform pairwise comparisons to the best configuration
   - apply the Friedman post-test
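A hedged sketch of step 1 using SciPy's Friedman test on toy data; the pairwise post-test of step 2 is only indicated by the rank computation:

```python
import numpy as np
from scipy.stats import friedmanchisquare

# results[i, j]: cost of configuration j on instance i (toy data, assumed)
results = np.array([[10.2, 11.0, 10.1],
                    [ 8.7,  9.9,  8.5],
                    [12.3, 12.8, 12.0],
                    [ 9.4, 10.6,  9.2]])

stat, p = friedmanchisquare(*results.T)   # one sample per configuration
if p < 0.05:
    # H0 rejected: rank configurations on each instance; the best has the
    # lowest mean rank. The pairwise post-test itself is omitted here.
    ranks = results.argsort(axis=1).argsort(axis=1) + 1
    print("mean ranks:", ranks.mean(axis=0))
```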
Some applications
International timetabling competition
- winning algorithm configured by F-race
- interactive injection of new configurations

Vehicle routing and scheduling problem
- first industrial application
- improved a commercialized algorithm

F-race in stochastic optimization
- evaluate "neighbours" using F-race (solution cost is a random variable!)
- good performance if the variance of the solution cost is high
Iterated race
F-race is a method for selecting the best configuration; it is independent of the way the set of configurations is sampled

Sampling configurations for F-race
- full factorial design
- random sampling design
- iterative refinement of a sampling model (iterated race)
Iterated race: an illustration
- sample configurations from an initial distribution
- while not terminate():
  1. apply race
  2. modify the distribution
  3. sample new configurations with selection probability
Sampling distributions
Numerical parameter X_d ∈ [x_d^min, x_d^max]
- truncated normal distribution N(μ_d^z, σ_d^i) ∈ [x_d^min, x_d^max]
  - μ_d^z = value of parameter d in elite configuration z
  - σ_d^i decreases with the number of iterations

Categorical parameter X_d ∈ {x_1, x_2, ..., x_{n_d}}
- discrete probability distribution:

      x_j               x_1   x_2   ...   x_{n_d}
      Pr_z{X_d = x_j}   0.1   0.3   ...   0.4

- updated by increasing the probability of the parameter value in the elite configuration and reducing the probabilities of the others
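A minimal sketch of both sampling steps, assuming SciPy; the categorical update rule is a simplified stand-in for the one actually used by iterated race:

```python
from scipy.stats import truncnorm

def sample_numerical(mu, sigma, lo, hi):
    """Sample X_d from a normal N(mu, sigma) truncated to [lo, hi]."""
    a, b = (lo - mu) / sigma, (hi - mu) / sigma   # standardized bounds
    return truncnorm.rvs(a, b, loc=mu, scale=sigma)

def update_categorical(probs, elite_value, rate=0.1):
    """Shift probability mass towards the elite configuration's value."""
    new = {v: (1.0 - rate) * p for v, p in probs.items()}
    new[elite_value] += rate          # probabilities still sum to one
    return new
```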
The irace Package
Manuel López-Ibáñez, Jérémie Dubois-Lacoste, Thomas Stützle, and Mauro Birattari. The irace package, Iterated Race for Automatic Algorithm Configuration. Technical Report TR/IRIDIA/2011-004, IRIDIA, Université Libre de Bruxelles, Belgium, 2011.

http://iridia.ulb.ac.be/irace

- implementation of Iterated Racing in R
- Goal 1: flexible
- Goal 2: easy to use
  - but no knowledge of R necessary
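In practice a run is driven by a scenario file plus the parameter file shown earlier; a minimal sketch, where the option names follow the irace documentation and the values are placeholders:

```
## scenario.txt (minimal sketch)
parameterFile     = "parameters.txt"
trainInstancesDir = "./instances"
maxExperiments    = 1000

## then, from the command line:
##   irace --scenario scenario.txt
```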
Other tools: ParamILS, SMAC
ParamILS
- iterated local search in the configuration space
- requires discretization of numerical parameters
- http://www.cs.ubc.ca/labs/beta/Projects/ParamILS/

SMAC
- surrogate-model-assisted search process
- encouraging results for large configuration spaces
- http://www.cs.ubc.ca/labs/beta/Projects/SMAC/

Capping: an effective speed-up technique when the configuration target is run-time; runs that cannot beat the incumbent's time are terminated early
Applications of automatic configuration tools
- configuration of "black-box" solvers
  - e.g. mixed-integer programming solvers, continuous optimizers
- supporting tool in algorithm engineering
  - e.g. metaheuristics for the probabilistic TSP, re-engineering PSO
- bottom-up generation of heuristic algorithms
  - e.g. heuristics for SAT, FSP, etc.; metaheuristic frameworks
- design of configurable algorithm frameworks
  - e.g. SATenstein, MOACO, UACOR
Example: configuration of "black-box" solvers
Mixed-integer programming solvers
Mixed-integer programming (MIP) solvers
[Hutter, Hoos, Leyton-Brown, Stützle, 2009; Hutter, Hoos, Leyton-Brown, 2010]

- MIP modelling widely used for tackling optimization problems
- powerful commercial (e.g. CPLEX) and non-commercial (e.g. SCIP) solvers available
- large number of parameters (tens to hundreds)

  Benchmark set   Default   Configured          Speedup
  Regions200       72       10.5 (11.4 ± 0.9)     6.8
  Conic.SCH         5.37     2.14 (2.4 ± 0.29)    2.51
  CLS             712       23.4 (327 ± 860)     30.43
  MIK              64.8      1.19 (301 ± 948)    54.54
  QP              969      525   (827 ± 306)      1.85

  FocusedILS, 10 runs, 2 CPU days, 63 parameters
Example: bottom-up generation of algorithms
Automatic design of hybrid SLS algorithms
Automatic design of hybrid SLS algorithms
[Marmion, Mascia, López-Ibáñez, Stützle, 2013]

Approach
- decompose single-point SLS methods into components
- derive a generalized metaheuristic structure
- component-wise implementation of the metaheuristic part

Implementation
- represent possible algorithm compositions by a grammar
- instantiate the grammar using a parametric representation
  - allows the use of standard automatic configuration tools
  - shows good performance when compared to, e.g., grammatical evolution [Mascia, López-Ibáñez, Dubois-Lacoste, Stützle, 2013]
General Local Search Structure: ILS
s0 := initSolution
s* := ls(s0)
repeat
    s' := perturb(s*, history)
    s*' := ls(s')
    s* := accept(s*, s*', history)
until termination criterion met
- many SLS methods are instantiable from this structure (see the Python sketch below)
- abilities
  - hybridization
  - recursion
  - problem-specific implementation at the low level
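A direct transcription of this structure into Python, with all components passed in as functions; a sketch, not the slides' actual implementation:

```python
def ils(init, ls, perturb, accept, terminate):
    """Iterated local search skeleton mirroring the structure above."""
    history = []
    s_best = ls(init())                      # s* := ls(s0)
    while not terminate(history):
        s_pert = perturb(s_best, history)    # s' := perturb(s*, history)
        s_cand = ls(s_pert)                  # s*' := ls(s')
        s_best = accept(s_best, s_cand, history)
        history.append(s_best)
    return s_best
```

Hybridization and recursion fall out naturally: `ls` may itself be another `ils` instance, exactly as the grammar on the next slide allows.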
Grammar
<algorithm> ::= <initialization> <ils>
<initialization> ::= random | <pbs_initialization>
<ils> ::= ILS(<perturb>, <ls>, <accept>, <stop>)
<perturb> ::= none | <initialization> | <pbs_perturb>
<ls> ::= <ils> | <descent> | <sa> | <rii> | <pii> | <vns> | <ig> | <pbs_ls>
<accept> ::= alwaysAccept | improvingAccept <comparator>
| prob(<value_prob_accept>) | probRandom | <metropolis>
| threshold(<value_threshold_accept>) | <pbs_accept>
<descent> ::= bestDescent(<comparator>, <stop>)
| firstImprDescent(<comparator>, <stop>)
<sa> ::= ILS(<pbs_move>, no_ls, <metropolis>, <stop>)
<rii> ::= ILS(<pbs_move>, no_ls, probRandom, <stop>)
<pii> ::= ILS(<pbs_move>, no_ls, prob(<value_prob_accept>), <stop>)
<vns> ::= ILS(<pbs_variable_move>, firstImprDescent(improvingStrictly),
improvingAccept(improvingStrictly), <stop>)
<ig> ::= ILS(<deconst-construct_perturb>, <ls>, <accept>, <stop>)
<comparator> ::= improvingStrictly | improving
<value_prob_accept> ::= [0, 1]
<value_threshold_accept> ::= [0, 1]
<metropolis> ::= metropolisAccept(<init_temperature>, <final_temperature>,
<decreasing_temperature_ratio>, <span>)
<init_temperature> ::= {1, 2,..., 10000}
<final_temperature> ::= {1, 2,..., 100}
<decreasing_temperature_ratio> ::= [0, 1]
<span> ::= {1, 2,..., 10000}
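Under the parametric representation, each nonterminal with alternatives becomes a categorical parameter and each numeric terminal a numerical one, active only under the rule choices that use it. A hypothetical flat encoding of one derivation of the grammar above:

```python
# Hypothetical flat encoding of one derivation of the grammar above:
# nonterminals map to categorical parameters; numeric terminals map to
# numerical parameters conditional on the corresponding rule choices.
configuration = {
    "initialization": "random",            # <initialization>
    "ls": "ig",                            # <ls> -> iterated greedy
    "accept": "metropolis",                # <accept>
    "init_temperature": 4500,              # active only if accept == metropolis
    "final_temperature": 10,
    "decreasing_temperature_ratio": 0.95,
    "span": 100,
}
```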
Flow-shop problem with weighted tardiness
- Automatic configuration:
  - 1, 2 or 3 levels of recursion (r)
  - 80, 127, and 174 parameters, respectively
  - budget: r × 10 000 trials, each of 30 seconds
[Boxplots (six benchmark scenarios): fitness value distributions of the automatically designed algorithms ALS1, ALS2, ALS3 versus the state-of-the-art algorithm soa-IG]

⇒ results are competitive or superior to the state-of-the-art algorithm
Example: design of a configurable algorithm framework
Multi-objective ant colony optimization (MOACO)
Multi-objective Optimization
- many real-life problems are multi-objective
- no a priori knowledge ⇒ Pareto optimality
MOACO framework
[López-Ibáñez, Stützle, 2012]

- algorithm framework for multi-objective ACO algorithms
- can instantiate MOACO algorithms from the literature
- 10 parameters control the multi-objective part
- 12 parameters control the underlying pure "ACO" part

Example of a top-down approach to algorithm configuration
MOACO framework
irace + hypervolume = automatic configuration of multi-objective solvers!
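Since irace minimizes a scalar cost, a multi-objective run can simply be scored by the (negated) hypervolume of the front it returns. A minimal sketch for the bi-objective minimization case with a user-chosen reference point; an assumption, not the slides' exact setup:

```python
def hypervolume_2d(front, ref):
    """Hypervolume of a bi-objective minimization front w.r.t. ref point."""
    # Keep points dominating the reference point; sweep by first objective.
    pts = sorted(p for p in front if p[0] < ref[0] and p[1] < ref[1])
    hv, prev_y = 0.0, ref[1]
    for x, y in pts:
        if y < prev_y:                           # skip dominated points
            hv += (ref[0] - x) * (prev_y - y)    # add the new rectangle
            prev_y = y
    return hv

# irace would then minimize, e.g., -hypervolume_2d(front, ref)
```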
Automatic configuration of multi-objective ACO
[Boxplots: hypervolume distributions of the automatically configured MOACO framework, configurations MOACO (1)-(5), versus mACO-1..mACO-4, PACO, COMPETants, MACS, BicriterionAnt (1 and 3 colonies), and MOAQ on the bi-objective TSP instances euclidAB100.tsp, euclidAB300.tsp, and euclidAB500.tsp]
Automatic configuration of multi-objective ACO
[Boxplots: MOACO-full (1)-(5), MOACO-aco (1)-(5), MOACO (5), BicriterionAnt-aco (1)-(5), and BicriterionAnt (3 col) on euclidAB100.tsp, euclidAB300.tsp, and euclidAB500.tsp]
Why automatic algorithm configuration?
- improvement over manual, ad-hoc methods for tuning
- reduction of development time and human intervention
- increases the number of degrees of freedom that can be considered
- empirical studies, comparisons of algorithms
- support for end users of algorithms

... and it has become feasible due to the increase in computational power!
Configuring configurators
What about automatically configuring the configurator? ... and configuring the configurator of the configurator?

- it can be done (for an example, see [Hutter et al., 2009]), but ...
- it is costly, and iterating further leads to diminishing returns
Towards a shift of paradigm in algorithm design

[Figure: paradigm shift in algorithm design]
Conclusions
Status
- using automatic configuration tools is rewarding in terms of development time and algorithm performance
- interactive use of configurators lets humans focus on the creative part of algorithm design
- many application opportunities also in areas other than optimization

Future work
- more powerful configurators
- more, and more complex, applications
- best practices