Optimizing Metaheuristics and Hyperheuristics through Multi-level Parallelism on a Many-core System
José-Matías Cutillas-Lozano1
Luis-Pedro García2, Domingo Giménez1
1Department of Computing and Systems, University of Murcia, Spain
2Service of Support to Technological Research, Technical University of Cartagena, Murcia, Spain
Workshop on Parallel Computing and Optimization, IPDPS, Chicago, May 23, 2016
Cutillas-Lozano, García, Giménez (SCPP) Hyperheuristics on a Many-core System PCO, May 23, 2016 1 / 28
Contents
1 Motivation
2 Parallel meta- and hyperheuristics
3 Test case problems
4 Experimental results
5 Conclusions
Motivation
Hyperheuristics based on parameterized metaheuristics
Selecting the appropriate values of parameters to apply a satisfactory metaheuristic to a particular problem can be difficult and is computationally demanding.
Parallel hyperheuristics based on a metaheuristic schema are used to select these values by searching in the space of metaheuristics.
Four levels of parallelism are applied, two at the hyperheuristic level and two for the metaheuristics.
Motivation
Parallel meta- and hyperheuristics
For direct application of metaheuristics, auto-tuning techniques are used to optimize the execution time.
A many-core system (Xeon Phi) is used to accelerate the application of hyperheuristics by exploiting multi-level parallelism.
Two problems are used as test cases:
The Maximum Diversity Problem (MDP)
The Protein-Ligand Docking Problem (PLDP)
Parallel meta- and hyperheuristics
Hyperheuristics
The problem to be optimized by the hyperheuristic is to obtain a satisfactory metaheuristic for the problems used as test cases.
In the hyperheuristic, an individual or element is represented by an integer vector MetaheurParam of size 20 that encodes the set of parameters characterizing a metaheuristic.
The fitness value in the hyperheuristic for an element MetaheurParam is the fitness value obtained when the metaheuristic with the parameters in MetaheurParam is applied to a particular problem.
Because a hyperheuristic executes many metaheuristics, the execution time is very large and parallelism is necessary.
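The evaluation described above can be sketched as follows: an element is an integer vector of 20 parameters, and its hyperheuristic fitness is the fitness that the encoded metaheuristic reaches on the problem. This is a minimal illustration with hypothetical names and a toy random-search stand-in for the parameterized metaheuristic, not the authors' code.

```python
import random

NUM_PARAMS = 20  # size of the MetaheurParam vector

def run_metaheuristic(metaheur_param, evaluate, seed=0):
    # Toy stand-in for the parameterized metaheuristic: a random search
    # whose population size and iteration count are read from the first
    # two components of the element.
    rng = random.Random(seed)
    pop_size = max(1, metaheur_param[0])
    iterations = max(1, metaheur_param[1])
    best = float("inf")
    for _ in range(iterations):
        for _ in range(pop_size):
            best = min(best, evaluate(rng.random()))
    return best

def hyperheuristic_fitness(metaheur_param, evaluate):
    # Fitness of an element = fitness reached by the metaheuristic it encodes.
    return run_metaheuristic(metaheur_param, evaluate)

element = [10, 20] + [0] * (NUM_PARAMS - 2)   # one candidate metaheuristic
value = hyperheuristic_fitness(element, lambda x: (x - 0.5) ** 2)
```

Since every element evaluation runs a full metaheuristic like this, the cost of one hyperheuristic iteration is large, which is what motivates the multi-level parallelism.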
Parallel meta- and hyperheuristics
Structure of the hyperheuristic
Parallel meta- and hyperheuristics
Metaheuristic parameters in the shared-memory parameterized schema
Initialize:
  INEIni  Initial Number of Elements
  FNEIni  Final Number of Elements after initialization
  PEIIni  Percentage of Elements to Improve in the initialization
  IIEIni  Intensification in the Improvement of initial Elements
  STMIni  Short-Term Tabu Memory in the improvement of initial elements
EndCondition:
  MNIEnd  Maximum Number of Iterations
  NIREnd  maximum Number of Iterations with Repetition of the best solution
Select:
  NBESel  Number of Best Elements selected for combination
  NWESel  Number of Worst Elements selected for combination
Combine:
  NBBCom  Number of Best-Best element combinations
  NBWCom  Number of Best-Worst element combinations
  NWWCom  Number of Worst-Worst element combinations
Improve:
  PEIImp  Percentage of Elements to Improve after combination
  IIEImp  Intensification in the Improvement of Elements after combination
  SMIImp  Short-term tabu Memory in the Improvement after combination
Diversification:
  PEDImp  Percentage of Elements to Diversify
  IDEImp  Intensification in the Diversification of Elements
  SMDImp  Short-term tabu Memory in the Diversification
Include:
  NBEInc  Number of Best Elements to include in the reference set
  LTMInc  Long-term Tabu Memory between iterations
The basic functions can be implemented in different ways, with a different meaning and number of parameters.
Parallel meta- and hyperheuristics
Metaheuristics and functions
Four pure metaheuristics, GRASP (GR), Genetic Algorithm (GA), Scatter Search (SS) and Tabu Search (TS), and some combinations of the type GR+GA+SS+TS are considered for meta- and hyperheuristics.
The basic functions are similar for meta- and hyperheuristics, with smaller sizes for hyperheuristic sets and parameters due to their higher computational cost.
Combination of elements is made by groups of parameters in hyperheuristics to avoid possible incompatibilities.
For the MDP, fitness was calculated with several problem inputs in one execution (FitSP1E), reducing the dependence on the input and the increase in the execution time.
F. Almeida, J.-M. Cutillas-Lozano, D. Giménez: Hyperheuristics based on parameterized metaheuristic schemes, GECCO, 2015
Parallel meta- and hyperheuristics
Parameterized shared-memory schema
Initialize(S, ParamIni, ThreadsIni)
while (not EndCondition(S, ParamEnd, ThreadsEnd))
  SS = Select(S, ParamSel, ThreadsSel)
  SS1 = Combine(SS, ParamCom, ThreadsCom)
  SS2 = Improve(SS1, ParamImp, ThreadsImp)
  S = Include(SS2, ParamInc, ThreadsInc)
Independent parallelization of the functions, with parallelism parameters (number of threads) for each function.
The optimum value of the parallelism parameters depends on the values of the metaheuristic parameters (the metaheuristic or combination of metaheuristics).
✜ F. Almeida, D. Giménez, J. J. López-Espín, M. Pérez-Pérez: Parameterised schemes of metaheuristics: basic ideas and applications with Genetic algorithms, Scatter Search and GRASP, IEEE SMC, 43 (3), 570-586, 2013
✜ F. Almeida, D. Giménez, J. J. López-Espín: A Parametrized Shared-Memory Scheme for Parametrized Metaheuristics, Journal of Supercomputing, 58 (3), 292-301, 2011
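The schema loop can be sketched in Python on a toy problem (minimizing x²). The function names mirror the schema; the bodies are our illustrative placeholders, not the authors' implementation, and the thread arguments are kept only to mirror the signatures.

```python
import random

def fitness(x):                              # toy objective: minimize x**2
    return x * x

def Initialize(n, threads=1):                # generate n random elements
    rng = random.Random(1)
    return [rng.randint(-50, 50) for _ in range(n)]

def Select(S, n_best, threads=1):            # keep the n_best best elements
    return sorted(S, key=fitness)[:n_best]

def Combine(SS, n_comb, threads=1):          # pairwise averages
    return [(a + b) // 2 for a in SS for b in SS][:n_comb]

def Improve(SS1, threads=1):                 # local step toward the optimum
    return [int(x / 2) for x in SS1]

def Include(S, SS2, size, threads=1):        # rebuild the reference set
    return sorted(set(S) | set(SS2), key=fitness)[:size]

def metaheuristic(max_iters=10):             # EndCondition: iteration budget
    S = Initialize(8)
    for _ in range(max_iters):
        SS = Select(S, 4)
        SS1 = Combine(SS, 6)
        SS2 = Improve(SS1)
        S = Include(S, SS2, 8)
    return min(fitness(x) for x in S)

best = metaheuristic()
```

Because each basic function is a separate call, each one can be parallelized independently with its own number of threads, which is exactly the role of the ThreadsXxx parameters in the schema.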
Parallel meta- and hyperheuristics
Levels of parallelism in the schema
Functions with the same parallel schema are identified.
This allows fine- and coarse-grained parallelism by changing the number of threads at each level.

One-level parallel schema (schema 1):

omp_set_num_threads(threads_one_level(MetaheurParam))
#pragma omp parallel for
loop in elements
  treat element

e.g.: Initialize, Combine...

Two-level parallel schema (schema 2):

first_level(MetaheurParam):
  omp_set_num_threads(threads_first_level(MetaheurParam))
  #pragma omp parallel for
  loop in elements
    second_level(MetaheurParam, threads_first_level)

second_level(MetaheurParam, threads_first_level):
  omp_set_num_threads(threads_second_level(MetaheurParam, threads_first_level))
  #pragma omp parallel for
  loop in neighbors
    treat neighbor

e.g.: Initialize, Improve...
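A simple way to split a fixed thread budget between the two nested levels is to bound the outer level by the number of elements and give the remaining parallelism to the inner neighbor loop. This heuristic is our assumption for illustration; the talk instead selects thread counts with the auto-tuning model.

```python
def two_level_threads(total, num_elements, num_neighbors):
    # Assumed heuristic: at most one first-level thread per element,
    # remaining parallelism to the inner loop, so that p1 * p2 <= total.
    p1 = max(1, min(total, num_elements))
    p2 = max(1, min(total // p1, num_neighbors))
    return p1, p2
```

For example, with 20 threads, 4 elements and 100 neighbors this gives p1 = 4 and p2 = 5, i.e. 4 outer threads each spawning 5 inner threads.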
Parallel meta- and hyperheuristics
Parallelism parameters in the shared-memory parameterized schema
Initialize TGEIni number of Threads for the initial Generation of Elements
TI1Ini number of Threads in the Improvement after initialization, first level
TI2Ini number of Threads in the Improvement after initialization, second level
Combine TCPCom number of Threads for the Combination of Pairs of elements
Improve TR1Imp number of Threads in the Improvement of the Reference set, first level
TR2Imp number of Threads in the Improvement of the Reference set, second level
TC1Imp number of Threads in the Improvement of elements obtained by Combination, first level
TC2Imp number of Threads in the Improvement of elements obtained by Combination, second level
Diversification TR1Div number of Threads in the Diversification of the Reference set, first level
TR2Div number of Threads in the Diversification of the Reference set, second level
TC1Div number of Threads in the Diversification of elements obtained by Combination, first level
TC2Div number of Threads in the Diversification of elements obtained by Combination, second level
Include TIEInc number of Threads for the Inclusion of Elements in the reference set
The parallelism can be implemented in different ways, resulting in different meanings and numbers of parallelism parameters.
J.-M. Cutillas-Lozano, D. Giménez: Parameterized message-passing metaheuristic schemes on a heterogeneous computing system, TPNC 2014
Parallel meta- and hyperheuristics
Modeling and auto-tuning
An auto-tuning methodology is systematically applied to reduce execution time in the parallel parameterized schema.
It has been applied only to the metaheuristics.
It is necessary to select the values of the parallelism parameters (ThreadsIni, ThreadsCom, ThreadsImp, ThreadsDiv and ThreadsInc) at the first and second parallelism levels. A model of the execution time must be obtained for each function.
The optimum number of threads varies with the number of individuals, and we are interested in selecting, at run time, a number of threads close to the optimum.
Parallel meta- and hyperheuristics
Time model in shared-memory
One-level parallelism functions:

$t_{1\text{-}level} = \frac{k_{s1} \cdot NE}{p} + k_p \cdot p, \qquad p_{opt} = \sqrt{\frac{k_{s1} \cdot NE}{k_p}}$

F_1-level   k_s1   NE
Gen-Ini     k_g    INEIni
Combine     k_c    2 · (NBBCom + NBWCom + NWWCom)
Include     k_i    NFEIni + 2 · (NBBCom + NBWCom + NWWCom) − NBEInc

Two-level parallelism functions:

$t_{2\text{-}levels} = \frac{k_{s2} \cdot Param}{p_1 \cdot p_2} + k_{p,1} \cdot p_1 + k_{p,2} \cdot p_2$

$p_{1,opt} = \sqrt[3]{\frac{k_{s2} \cdot k_{p,2}}{k_{p,1}^2} \cdot Param}, \qquad p_{2,opt} = \sqrt[3]{\frac{k_{s2} \cdot k_{p,1}}{k_{p,2}^2} \cdot Param}$

with $Param = \frac{NE \cdot PI \cdot II}{100}$

F_2-levels  k_s2   NE                                PI      II
Imp-Ini     k_ii   INEIni                            PEIIni  IMEIni
Imp-Ref     k_ir   NFEIni                            PEIImp  IIEImp
Imp-Com     k_ic   2 · (NBBCom + NBWCom + NWWCom)    PEIImp  IIEImp
Div         k_d    NFEIni                            PEDImp  IDEImp
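The optimal thread counts follow from setting the derivatives of the models above to zero; a small checker (the functions are our sketch, the constants below are the Gen-Ini values reported in the auto-tuning results, ks1 · 10³ = 0.52 and kp · 10³ = 0.053):

```python
import math

def p_opt_one_level(ks1, NE, kp):
    # Minimizer of t = ks1*NE/p + kp*p  ->  p = sqrt(ks1*NE / kp)
    return math.sqrt(ks1 * NE / kp)

def p_opt_two_levels(ks2, kp1, kp2, NE, PI, II):
    # Minimizers of t = ks2*Param/(p1*p2) + kp1*p1 + kp2*p2,
    # with Param = NE*PI*II/100 as defined above
    param = NE * PI * II / 100.0
    p1 = (ks2 * kp2 * param / kp1 ** 2) ** (1.0 / 3.0)
    p2 = (ks2 * kp1 * param / kp2 ** 2) ** (1.0 / 3.0)
    return p1, p2

# Gen-Ini with NE = INEIni = 100 (the GA setting) gives about 31 threads,
# matching the TGEIni = 31 selected for GA by the auto-tuning.
p_gen_ini = p_opt_one_level(0.52e-3, 100, 0.053e-3)
```

Note that the unconstrained optima can exceed the threads available on the device, so in practice the selected values are also bounded by the hardware.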
Test case problems
The Maximum Diversity Problem (MDP)
A subset of m elements (an individual) is selected from a set of n elements so that the sum of the distances between the chosen elements is maximized.
Each element can be represented by a set of attributes. Considering $s_{ik}$ as the state or value of the k-th attribute of element i, where $k = 1, \ldots, K$, the distance between elements i and j can be defined as:

$d_{ij} = \sqrt{\sum_{k=1}^{K} (s_{ik} - s_{jk})^2}$

Then, with variable $x_i$ taking the value 1 if element i is selected and 0 otherwise, $i = 1, \ldots, n$:

Maximize $\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} d_{ij} x_i x_j$

subject to $\sum_{i=1}^{n} x_i = m, \quad x_i \in \{0, 1\}, \quad 1 \le i \le n$
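The objective above translates directly into code; a minimal sketch (our illustration, encoding the selection as a 0/1 vector as in the formulation):

```python
import math

def distance(si, sj):
    # d_ij: Euclidean distance between the attribute vectors of i and j
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(si, sj)))

def mdp_fitness(attributes, selected):
    # Sum of pairwise distances over the selected elements; the MDP seeks
    # the m-element subset maximizing this sum.
    idx = [i for i, x in enumerate(selected) if x == 1]
    return sum(distance(attributes[i], attributes[j])
               for a, i in enumerate(idx) for j in idx[a + 1:])

# Four elements with two attributes each; select elements 0, 1 and 3 (m = 3)
attrs = [(0, 0), (3, 4), (0, 0), (6, 8)]
fit = mdp_fitness(attrs, [1, 1, 0, 1])      # 5 + 10 + 5
```

Evaluating this sum for a population of candidate subsets is the part that the one- and two-level parallel routines distribute over threads.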
Test case problems
The Protein-Ligand Docking Problem (PLDP)
Protein-ligand docking techniques search among libraries of small molecules (ligands) in order to identify those structures most likely to bind to a drug target (protein).
The fitness or scoring function calculates the binding energy between the atoms of the protein and the ligand.
The search space is determined by the degrees of freedom of the protein and the ligand; in our case six: three for translation and three for the rotation movements of the ligand.
The values of the movements and rotations of the ligand can be approached with metaheuristics.
The computation of the scoring function is the most costly part, so more threads are assigned to this level.
Initial collaboration:
B. Imbernón, J. M. Cecilia, D. Giménez: Enhancing Metaheuristic-based Virtual Screening Methods on Massively Parallel and Heterogeneous Systems, PMAM-PPoPP, 2016
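The six-dimensional search space can be illustrated as follows. The names and the pairwise term are hypothetical and deliberately simplified (rotation is omitted for brevity); real scoring functions are far more elaborate than this sketch.

```python
import math

def score(protein_atoms, ligand_atoms, pose):
    # pose = (tx, ty, tz, rx, ry, rz): the six degrees of freedom.
    # Toy score: translate the ligand (rotation omitted) and sum an
    # attractive -1/r term over all protein-ligand atom pairs.
    tx, ty, tz, rx, ry, rz = pose
    moved = [(x + tx, y + ty, z + tz) for (x, y, z) in ligand_atoms]
    energy = 0.0
    for p in protein_atoms:          # this O(|P| * |L|) double loop is the
        for q in moved:              # costly part that gets the most threads
            r = math.dist(p, q)
            energy += -1.0 / max(r, 1e-6)
    return energy

e = score([(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)],
          (1.0, 0.0, 0.0, 0.0, 0.0, 0.0))
```

The metaheuristic only manipulates pose vectors; every fitness evaluation triggers this all-pairs loop, which is why the scoring level receives the larger share of the thread budget.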
Experimental results
Computational environment
A multicore host machine with two Intel Xeon E5-2620 hexa-core processors running at 2.40 GHz.
A Many Integrated Core (MIC) Intel Xeon Phi, with 57 Pentium-based (x86) cores at 1.1 GHz, each core supporting 4 hardware threads, with a bidirectional ring bus and up to 6 GBytes of GDDR5.
Experimental results
MDP. Hyperheuristic parameters
Hyperheuristics used for the selection of metaheuristics:
Hybrid Hyperheuristic (Hhy)
Reduced Hybrid Hyperheuristic (Hre)
Genetic Algorithm based Hyperheuristic (Hge)
Metaheuristic configurations:
Lower and upper limits of the metaheuristic parameters within the hyperheuristic search.

       INEIni NFEIni PEIIni IMEIni STMIni NBESel NWESel NBBCom NBWCom
Hhy    20     20     50     5      5      10     10     15     20
Hre    5      5      50     3      2      3      2      2      3
Hge    20     20     0      0      0      20     0      10     0
Lower  5      5      0      1      0      2      2      2      5
Upper  100    100    100    10     0      50     50     7      15

       NWWCom PEIImp IIEImp SMIImp PEDImp IDEImp SMDImp NBEInc LTMInc
Hhy    15     50     5      5      10     5      5      10     5
Hre    2      50     3      2      10     5      2      3      5
Hge    0      0      0      0      10     5      0      20     0
Lower  2      0      1      0      0      1      0      2      0
Upper  7      100    10     0      10     5      0      100    15
Experimental results
MDP. Thread affinity in Xeon Phi
Comparison of the execution time (in seconds) for some thread configurations and different affinities when applying a hyperheuristic to the MDP. Balanced is the best (lowest times).

          balanced  scatter  compact
ThT=50    16.03     15.71    22.41
ThT=100   12.39     12.42    17.56
ThT=150   12.17     12.32    17.60
ThT=200   12.48     12.71    17.66
average   13.27     13.29    18.80
Experimental results
MDP. Thread configuration for four levels of parallelism
Example of the numbers of threads at one and two levels of parallelism for the execution of Hhy with a total of ThT=20 threads.
The ThH_ThT series indicates the threads used in the hyperheuristic for the total number of threads of the experiment; the rest of the threads were assigned to the metaheuristics being selected at the low level (ThM), with ThM · ThH = ThT.
Experiments with other numbers of threads (50, 75, 100, ..., 250) were carried out in the same way.

              One-level parallel routines   Two-level parallel routines
series        TGEIni TCPCom TIEInc          TI_Ini TR_Imp TC_Imp TR_Div TC_Div
ThH_20_1  p1  1      1      1               1      1      1      1      1
          p2  -      -      -               2      2      2      1      1
ThH_20_2  p1  2      2      2               1      1      1      2      2
          p2  -      -      -               2      2      2      1      1
ThH_20_3  p1  5      5      5               2      2      2      5      5
          p2  -      -      -               2      2      2      1      1
ThH_20_4  p1  5      5      5               1      1      1      1      1
          p2  -      -      -               5      5      5      2      2
ThH_20_5  p1  5      5      5               1      1      1      2      2
          p2  -      -      -               5      5      5      2      2
ThH_20_6  p1  10     10     10              2      2      2      5      5
          p2  -      -      -               5      5      5      2      2
ThH_20_7  p1  10     10     10              1      1      1      1      1
          p2  -      -      -               10     10     10     5      5
ThH_20_8  p1  10     10     10              1      1      1      2      2
          p2  -      -      -               10     10     10     5      5
ThH_20_9  p1  20     20     20              2      2      2      4      4
          p2  -      -      -               10     10     10     5      5
Experimental results
MDP. Time study with four levels of parallelism
Execution time in seconds for Hhy and various thread combinations applied to a reduced instance of MDP of size m10_n2 on the Xeon Phi. Sequential time on the Xeon Phi: 22.78 seconds.

                   total number of threads (ThT)
                   20     50     75     100    125    150    175    200    225    250
ThH_ThT_1          23.10  21.79  15.11  14.92  15.23  15.04  15.37  15.19  15.15  15.42
ThH_ThT_2          21.80  14.96  12.33  11.53  11.69  11.12  11.70  11.62  11.78  11.88
ThH_ThT_3          15.09  11.56  10.73  10.29  10.41  9.92   10.64  10.50  10.62  10.61
ThH_ThT_4          21.63  21.15  14.96  14.87  15.22  14.60  15.24  15.09  14.97  14.89
ThH_ThT_5          21.22  14.95  12.12  11.41  11.70  11.09  11.71  11.44  11.67  12.05
ThH_ThT_6          14.87  11.54  10.49  10.51  10.55  10.48  10.41  10.41  10.49  10.50
ThH_ThT_7          21.51  21.38  15.10  14.98  14.94  14.82  15.14  15.08  15.21  15.04
ThH_ThT_8          21.19  15.05  12.26  11.75  11.93  11.55  11.97  11.74  11.92  12.09
ThH_ThT_9          14.95  11.93  11.66  11.24  11.54  10.87  11.28  11.29  11.30  11.69
Experimental results
MDP. Metaheuristic parameters
Values of the parameters for the four pure metaheuristics and those selected automatically by the genetic (M-Hge) and the reduced (M-Hre) hyperheuristics to be applied to the MDP.

       INEIni NFEIni PEIIni IMEIni STMIni NBESel NWESel NBBCom NBWCom
GR     150    1      100    3      0      0      0      0      0
GA     100    100    0      0      0      100    0      50     0
SS     75     50     100    3      0      25     25     5      10
TS     150    1      100    3      0      1      0      0      0
M-Hge  96     30     43     3      0      14     9      6      12
M-Hre  65     19     95     2      0      2      1      4      8

       NWWCom PEIImp IIEImp SMIImp PEDImp IDEImp SMDImp NBEInc LTMInc
GR     0      0      0      0      0      0      0      0      0
GA     0      0      0      0      10     5      0      100    0
SS     5      100    5      0      0      0      0      25     0
TS     0      100    5      0      0      0      0      1      3
M-Hge  4      39     2      0      8      4      0      30     7
M-Hre  1      86     2      0      8      1      0      18     11
Experimental results
MDP. Autotuning results for metaheuristics
Constants of the model
              One-level parallel routines   Two-level parallel routines
              Ini    Com    Inc             Imp-Ini Imp-Ref Imp-Com Div-Ref Div-Com
ks · 10^3     0.52   0.34   0.26            337     332     713     320     149
kp,1 · 10^3   0.053  0.03   0.048           5.70    5.74    7.58    26.8    75.5
kp,2 · 10^3   -      -      -               1.52    0.96    22.3    25.7    71.8

Optimum number of threads selected

              One-level parallel routines   Two-level parallel routines
              TGEIni TCPCom TIEInc          TI_Ini TR_Imp TC_Imp TR_Div TC_Div
GR     p1     38     0      2               19     0      0      0      0
       p2     -      -      -               7      0      0      0      0
GA     p1     31     24     23              0      0      0      8      4
       p2     -      -      -               0      0      0      7      4
SS     p1     27     15     19              15     13     30     0      0
       p2     -      -      -               7      7      7      0      0
TS     p1     38     0      0               19     4      0      0      0
       p2     -      -      -               7      7      0      0      0
M-Hge  p1     26     13     17              8      8      15     3      1
       p2     -      -      -               7      7      5      4      1
M-Hre  p1     14     9      12              10     7      16     2      1
       p2     -      -      -               7      7      5      2      1
Experimental results
MDP. Statistical summary of the results
Fitness obtained when applying different metaheuristics to three sizes of the MDP.
[Boxplots: fitness of P-M_seq, M-H_seq, P-M_par and M-H_par for the MDP instances n500 m50, n300 m60 and n400 m40]
M-H: set of metaheuristics obtained from hyperheuristics.
P-M: set of four pure metaheuristics.
Sequential and in parallel.
Time limit: 40 minutes.
The Kruskal-Wallis test revealed statistical differences in the means of the sets of metaheuristics for the problem sizes studied.
The best algorithms for all problem sizes were the metaheuristics selected by hyperheuristics (M-H) in parallel.
Experimental results
PLDP. Metaheuristic parameters
Two hyperheuristics were considered for the PLDP: a GRASP-based Hyperheuristic (Hgr) with reduced values of the parameters, consisting of a set of INEIni=5 individuals to be improved with an intensification of IIEIni=3; and a basic Random Search Hyperheuristic (Hrs) of medium size, INEIni=100.
Values of the parameters for four metaheuristics not selected automatically (M1 to M4) and those selected by a random search (M-Hrs) and a GRASP-based (M-Hgr) hyperheuristic.

       INEIni NEIIni IIEIni NFBEIni NFWEIni NBESel NWESel
M1     64     32     10     16      16      20     12
M2     64     64     10     32      32      40     24
M3     96     96     10     50      46      46     50
M4     128    128    10     64      128     64     64
M-Hrs  86     24     2      7       17      14     10
M-Hgr  96     62     4      27      35      28     27

       NBBCom NWWCom NBWCom NEIImp IIEImp NBEInc
M1     10     5      10     0      0      16
M2     20     15     20     0      0      32
M3     40     25     40     0      0      50
M4     20     15     20     0      0      64
M-Hrs  42     14     45     0      0      6
M-Hgr  36     49     6      0      0      7
Experimental results
PLDP. Time study with four levels of parallelism
Execution time in seconds for Hrs and various thread combinations applied to the PLDP on the Xeon Phi. Sequential time on the multicore: 7983 seconds.

             total number of threads (ThT)
             20     50     100    150    200    250
ThH_ThT_1    795    428    346    367    366    361
ThH_ThT_2    782    508    388    296    327    422
ThH_ThT_3    940    495    354    283    282    338
ThH_ThT_4    765    519    359    339    324    393
ThH_ThT_5    950    469    343    375    390    397
ThH_ThT_6    1021   530    396    401    345    365
ThH_ThT_7    1449   696    398    331    477    357
ThH_ThT_8    1436   734    407    407    308    442
ThH_ThT_9    1512   910    459    416    318    265
Experimental results
PLDP. Speed-up and fitness results
Execution time (in seconds) and speed-up obtained when applying the metaheuristics to the PLDP:

           M1     M2     M3     M4     M-Hrs  M-Hgr
Multicore  83.7   306.1  697.0  268.4  146.9  149.0
Xeon Phi   15.9   49.9   140.6  61.4   131.5  127.2
speed-up   5.3    6.1    5.0    4.4    1.1    1.2

Fitness obtained when applying the metaheuristics to the PLDP sequentially and in parallel. A limit of 1000 seconds was established for each execution:

           M1      M2      M3      M4      M-Hrs   M-Hgr
Multicore  -107.6  -120.5  -96.7   -101.9  -104.3  -100.3
Xeon Phi   -111.2  -127.8  -130.8  -127.5  -125.7  -124.7
Conclusions
Conclusions
Parallel hyperheuristics based on parameterized metaheuristic schemas select suitable metaheuristics for the two test case problems.
Four levels of parallelism are applied (two at the hyperheuristic level and two at the metaheuristic level), with auto-tuning techniques for the optimum selection of threads in the metaheuristics.
The MIC architecture (Xeon Phi) is tested to enhance the performance of the hyperheuristic schema with massive parallelism.
The four-level parallel implementations can be used to obtain faster or better solutions.
Conclusions
Future research
Application of the parameterized schema and the auto-tuning methodology to other optimization problems (data envelopment analysis and determination of chemical components of polymers).
The high computational cost and the importance of the Protein-Ligand Docking Problem make a deeper analysis of the application of the hyperheuristic schema with four parallelism levels advisable.
The inclusion of new basic metaheuristics (Ant Colony Optimization, Particle Swarm Optimization...) would result in a larger number of metaheuristic parameters.
Similar algorithms for GPUs and for heterogeneous clusters comprising nodes of multicores+multiGPU+multiMIC.