An Introduction to Metaheuristics
Chun-Wei Tsai
Electrical Engineering, National Cheng Kung University
Page 2
Outline
Optimization Problem and Metaheuristics
Metaheuristic Algorithms
• Hill Climbing (HC)
• Simulated Annealing (SA)
• Tabu Search (TS)
• Genetic Algorithm (GA)
• Ant Colony Optimization (ACO)
• Particle Swarm Optimization (PSO)
Performance Consideration
Conclusion and Discussion
Page 3
Optimization Problem
The optimization problems
– continuous
– discrete
The combinatorial optimization problem (COP) is a kind of discrete optimization problem
Most COPs are NP-hard
Page 4
The problem definition of COP
The combinatorial optimization problem
P = (S, f) can be defined as:
opt f(s), s ∈ S

where opt is either min or max, x = {x1, x2, . . . , xn} is a set of variables, D1, D2, . . . , Dn are the variable domains, and f : D1 × D2 × · · · × Dn → R+ is the objective function to be optimized. In addition, S = {s | s ∈ D1 × D2 × · · · × Dn} is the search space. To solve P, one has to find a solution s ∈ S with an optimal objective function value.
[Figure: a search space with domains D1 × D2 × D3 × D4; solution 1 = (1, 2, 3, 4) and solution 2 = (2, 2, 3, 3) are two candidate solutions.]
Page 5
Combinatorial Optimization Problem and Metaheuristics (1/3)
Complex Problems
– NP-complete problem (Time)
• The optimal solution cannot be found in a reasonable time with limited computing resources.
• E.g., Traveling Salesman Problem
– Large scale problem (Space)
• In general, this kind of problem cannot be handled efficiently with limited memory space.
• E.g., Data Clustering Problem, astronomy, MRI
Page 6
Combinatorial Optimization Problem and Metaheuristics (2/3)
Traveling Salesman Problem (n! possible tours)
– Shortest Routing Path
[Figure: two example routing paths of different lengths.]
Page 7
Combinatorial Optimization Problem and Metaheuristics (3/3)
Metaheuristics
– Metaheuristics work by guessing the right directions for finding the true or near-optimal solution of a complex problem, so that the space searched, and thus the time required, can be significantly reduced.
[Figure: rather than enumerating every candidate solution s1, . . . , s5 in the space D1 × D2, a metaheuristic evaluates only a promising subset on its way to opt.]
Page 8
The Concept of Metaheuristic Algorithms
The word “meta” means “higher level,” while the word “heuristics” means “to find.” (Glover, 1986)
The operators of metaheuristics
– Transition: searches for new solutions (exploration and exploitation).
– Evaluation: evaluates the objective function value of the problem in question.
– Determination: decides the search directions.
[Figure: one iteration of the three operators: transition generates s1 = (1,1) and s2 = (2,2); evaluation yields o1 = 5 and o2 = 3; determination then picks the search direction toward the better solution.]
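To make the three operators concrete, here is a minimal Python sketch of the loop they form. This is an illustration, not code from the slides; the operator functions transit, evaluate, and determine are assumed to be supplied by the user, and maximization is assumed.

def metaheuristic(transit, evaluate, determine, s0, n_iter=100):
    # Generic loop built from the three operators named on this slide.
    s = s0
    best, best_f = s0, evaluate(s0)
    for _ in range(n_iter):
        candidates = transit(s)                          # transition: generate candidate solutions
        scored = [(evaluate(c), c) for c in candidates]  # evaluation: objective values
        s = determine(scored, s)                         # determination: choose the search direction
        f = evaluate(s)
        if f > best_f:                                   # remember the best solution seen so far
            best, best_f = s, f
    return best, best_f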
Page 9
An Example: Bulls and Cows
Check all candidate solutions
Guess → Feedback → Deduction
– Secret number: 9305
– Opponent's try: 1234 → feedback: 0A1B
– Opponent's try: 5678 → feedback: 0A1B
– Deduction: the digits 0 and 9 must both appear in the secret number
(example from Wikipedia)
Each guess, feedback, and deduction cycle corresponds to one round of the transition, evaluation, and determination operators.
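The feedback plays the role of the evaluation operator. A small Python helper (ours, not from the slides) that scores a guess in the usual xAyB form:

def bulls_and_cows(secret: str, guess: str) -> str:
    # A = right digit in the right position; B = right digit, wrong position.
    bulls = sum(s == g for s, g in zip(secret, guess))
    common = sum(min(secret.count(d), guess.count(d)) for d in set(guess))
    return f"{bulls}A{common - bulls}B"

print(bulls_and_cows("9305", "1234"))  # 0A1B, as on the slide
print(bulls_and_cows("9305", "5678"))  # 0A1B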
Page 10
Classification of Metaheuristics (1/2)
The most important way to classify metaheuristics
– population-based vs. single-solution-based (Blum and Roli, 2003)
The single-solution-based algorithms work on a single solution, thus the name
– Hill Climbing
– Simulated Annealing
– Tabu Search
The population-based algorithms work on a population of solutions, thus the name
– Genetic Algorithm
– Ant Colony Optimization
– Particle Swarm Optimization
Page 11
Classification of Metaheuristics (2/2)
Single-solution-based
– Hill Climbing
– Simulated Annealing
– Tabu Search
Population-based
– Genetic Algorithm
Swarm Intelligence
– Ant Colony Optimization
– Particle Swarm Optimization
Page 12
Hill Climbing (1/2)
A greedy algorithm that uses a heuristic adaptation of the objective function to explore the landscape for better solutions
begin
  t ← 0
  Randomly create a string vc
  Repeat
    Evaluate vc
    Select m new strings from the neighborhood of vc
    Let vn be the best of the m new strings
    If f(vc) < f(vn) then vc ← vn
    t ← t + 1
  Until t ≥ N
end
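A runnable Python version of this pseudocode for a bit-string problem; the neighborhood (single-bit flips) and the one-max objective are our illustrative choices, not specified on the slide:

import random

def hill_climbing(f, n_bits=5, m=3, n_iter=100):
    vc = [random.randint(0, 1) for _ in range(n_bits)]
    for _ in range(n_iter):
        neighbors = []
        for _ in range(m):                 # m new strings from the neighborhood of vc
            vn = vc[:]
            i = random.randrange(n_bits)
            vn[i] = 1 - vn[i]              # flip one random bit
            neighbors.append(vn)
        vn = max(neighbors, key=f)         # the best of the m new strings
        if f(vn) > f(vc):                  # accept only strictly better moves
            vc = vn
    return vc

print(hill_climbing(sum))                  # one-max: f counts the 1 bits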
Page 13
Hill Climbing (2/2)
[Figure: starting from a given point, hill climbing can become trapped at a local optimum of the search space instead of reaching the global optimum.]
Page 14
Simulated Annealing (1/3)
Metropolis et al., 1953
Inspired by the annealing process in thermodynamics and metallurgy
To escape local optima, SA accepts worse moves with a controlled probability governed by a temperature parameter
The temperature is gradually lowered so that the search eventually converges
Page 15
Simulated Annealing (2/3)
begin
  t ← 0
  Randomly create a string vc
  Repeat
    Evaluate vc
    Select 3 new strings from the neighborhood of vc
    Let vn be the best of the 3 new strings
    If f(vc) < f(vn) then vc ← vn
    Else if (T > random()) then vc ← vn
    Update T according to the annealing schedule
    t ← t + 1
  Until t ≥ N
end
Example: vc = 01110, f(vc) = 3 (the number of 1 bits)
Neighbors: n1 = 00110, n2 = 11110, n3 = 01100
Best neighbor: vn = 11110, so vc ← vn = 11110
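A runnable Python sketch of SA for the same bit-string setting. Note one deliberate deviation, on our part, from the pseudocode above: worse moves are accepted with the standard Metropolis probability e^(delta/T) rather than the simplified test T > random():

import math
import random

def simulated_annealing(f, n_bits=5, n_iter=200, temp=1.0, alpha=0.95):
    vc = [random.randint(0, 1) for _ in range(n_bits)]
    for _ in range(n_iter):
        vn = vc[:]
        i = random.randrange(n_bits)
        vn[i] = 1 - vn[i]                      # one random neighbor of vc
        delta = f(vn) - f(vc)
        if delta > 0 or random.random() < math.exp(delta / temp):
            vc = vn                            # accept better moves, or worse ones with prob. e^(delta/T)
        temp *= alpha                          # annealing schedule: cool the temperature
    return vc

print(simulated_annealing(sum))                # one-max, as in the trace above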
Page 16
Simulated Annealing (3/3)
[Figure: by occasionally accepting worse moves, SA can climb out of local optima and reach the global optimum.]
Page 17
Tabu Search (1/3)
Fred W. Glover, 1989
To avoid falling into local optima and re-searching the same solutions, recently visited solutions are saved in a list, called the tabu list (a short-term memory whose size is a parameter). When a new solution is generated, it is inserted into the tabu list and stays there until it is replaced by a newer solution in a first-in-first-out manner.
http://spot.colorado.edu/~glover/
Page 18
Tabu Search (2/3)
begin
  t ← 0
  Randomly create a string vc
  Repeat
    Evaluate vc
    Select 3 new strings from the neighborhood of vc that are not in the tabu list
    Let vn be the best of the 3 new strings
    If f(vc) < f(vn) then vc ← vn
    Update tabu list TL
    t ← t + 1
  Until t ≥ N
end
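A runnable Python sketch of the pseudocode, again on bit strings with single-bit-flip neighbors (our choice); the tabu list is the FIFO short-term memory described on the previous slide:

import random
from collections import deque

def tabu_search(f, n_bits=5, m=3, n_iter=100, tabu_size=7):
    vc = tuple(random.randint(0, 1) for _ in range(n_bits))
    tabu = deque([vc], maxlen=tabu_size)       # short-term memory, FIFO replacement
    best = vc
    for _ in range(n_iter):
        flips = [vc[:i] + (1 - vc[i],) + vc[i + 1:] for i in range(n_bits)]
        admissible = [s for s in flips if s not in tabu]   # neighbors not in the tabu list
        if not admissible:
            break                              # the whole neighborhood is tabu
        vc = max(random.sample(admissible, min(m, len(admissible))), key=f)
        tabu.append(vc)                        # update tabu list TL
        if f(vc) > f(best):
            best = vc
    return best

print(tabu_search(sum))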
Page 19
Tabu Search (3/3)
[Figure: the tabu list keeps the search from cycling back into already-visited local optima, guiding it toward the global optimum.]
Page 20
Genetic Algorithm (1/5)
John H. Holland, 1975
The genetic algorithm is one of the most important population-based algorithms.
Schema Theorem
– Short, low-order, above-average schemata receive exponentially increasing trials in subsequent generations of a genetic algorithm.
David E. Goldberg
– http://www.illigal.uiuc.edu/web/technical-reports/
Page 21
Genetic Algorithm (2/5)
Page 22
Genetic Algorithm (3/5)
Page 23
Genetic Algorithm (4/5)
Initialization operators
Selection operators
– Evaluate the fitness function (or the objective function)
– Determine the search direction
Reproduction operators
Crossover operators
– Recombine the solutions to generate new candidate solutions
Mutation operators
– Help the search avoid getting trapped in local optima
Page 24
Genetic Algorithm (5/5)
begin
  t ← 0
  Initialize Pt
  Evaluate Pt
  While (not terminated) do
  begin
    t ← t + 1
    Select Pt from Pt−1
    Apply crossover and mutation to Pt
    Evaluate Pt
  end
end
Example (one-max, fitness = number of 1 bits):
Population: p1 = 01110, p2 = 01110, p3 = 11100, p4 = 00010
Fitness: f1 = 3, f2 = 3, f3 = 3, f4 = 1
Selection probabilities: s1 = 0.3, s2 = 0.3, s3 = 0.3, s4 = 0.1
After selection, p4 is replaced by 11100 (so s4 = 0.3)
Crossover pairs with cut points: p1 = 011|10, p2 = 01|110, p3 = 111|00, p4 = 11|100
Offspring: c1 = 01100, c2 = 01100, c3 = 11110, c4 = 11110
After mutation: c1 = 01101, c2 = 01110, c3 = 11010, c4 = 11111
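The trace above can be reproduced with a compact GA. The Python below is a generic sketch; the one-max objective, roulette-wheel selection, one-point crossover, and bit-flip mutation are our illustrative choices, and the population size is assumed even:

import random

def genetic_algorithm(f, n_bits=5, pop_size=4, n_gen=50, pm=0.05):
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(n_gen):
        fit = [f(p) + 1e-9 for p in pop]                     # tiny offset avoids all-zero weights
        pop = random.choices(pop, weights=fit, k=pop_size)   # roulette-wheel selection
        nxt = []
        for a, b in zip(pop[0::2], pop[1::2]):               # one-point crossover on pairs
            cut = random.randrange(1, n_bits)
            nxt += [a[:cut] + b[cut:], b[:cut] + a[cut:]]
        pop = [[1 - g if random.random() < pm else g for g in c] for c in nxt]  # bit-flip mutation
    return max(pop, key=f)

print(genetic_algorithm(sum))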
Page 25
Ant Colony Optimization (1/5)
Marco Dorigo, 1992
Ant colony optimization (ACO) is another well-known population-based metaheuristic, originated by Dorigo from an observation of the behavior of ants
Ants are able to find the shortest path between a food source and the nest by exploiting pheromone information
http://iridia.ulb.ac.be/~mdorigo/HomePageDorigo/
Page 26
Ant Colony Optimization (2/5)
[Figure: ants exploring two paths between nest and food; pheromone accumulates faster on the shorter path, which eventually attracts most of the ants.]
Page 27
Ant Colony Optimization (3/5)
Initialize the pheromone value of each path
While the termination criterion is not met
  Create the ant population s = {s1, s2, . . . , sn}
  Each ant si moves one step to the next city according to the pheromone rule
  Update the pheromone
End
Page 28
Ant Colony Optimization (4/5)
Solution construction
– The probability of choosing sub-solution j as the next sub-solution after i is defined as follows:

p(i, j) = [τij]^α [ηij]^β / Σ l∈Ni [τil]^α [ηil]^β, for j ∈ Ni

where Ni is the set of feasible (or candidate) sub-solutions that can be the next sub-solution of i; τij is the pheromone value between the sub-solutions i and j; ηij is a heuristic value, also called the heuristic information; and α and β weight the pheromone and heuristic terms.
Page 29
Ant Colony Optimization (5/5)
Pheromone update is employed for updating the pheromone value on each edge e(i, j), which is defined as follows:

τij ← (1 − ρ) τij + Σ k=1..m Δτij^k, where Δτij^k = Q / Lk if ant k traversed e(i, j) and 0 otherwise

where m is the number of ants; Lk represents the length of the tour created by ant k; ρ is the pheromone evaporation rate; and Q is a constant.
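A small Python sketch of both rules; the values of alpha, beta, rho, and q are typical parameter choices on our part, and tau and eta are assumed to be nested dictionaries indexed by sub-solutions:

import random

def choose_next(i, feasible, tau, eta, alpha=1.0, beta=2.0):
    # Roulette-wheel choice with weight tau[i][j]^alpha * eta[i][j]^beta.
    weights = [tau[i][j] ** alpha * eta[i][j] ** beta for j in feasible]
    r, acc = random.random() * sum(weights), 0.0
    for j, w in zip(feasible, weights):
        acc += w
        if acc >= r:
            return j
    return feasible[-1]

def update_pheromone(tau, tours, lengths, rho=0.5, q=1.0):
    for i in tau:                             # evaporation on every edge
        for j in tau[i]:
            tau[i][j] *= 1.0 - rho
    for tour, length in zip(tours, lengths):  # each ant k deposits q / L_k along its tour
        for i, j in zip(tour, tour[1:]):
            tau[i][j] += q / length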
Page 30
Particle Swarm Optimization (1/4)
James Kennedy and Russ Eberhart, 1995
Particle swarm optimization originates from an observation of social behavior by Kennedy and Eberhart
Key notions: global best, local best, and trajectory
http://clerc.maurice.free.fr/pso/
Page 31
Particle Swarm Optimization (2/4)
IPSO, http://appshopper.com/education/pso
http://abelhas.luaforge.net/
Page 32
Particle Swarm Optimization (3/4)
Create the initial population (particle positions) s = {s1, s2, . . . , sn} and particle velocities v = {v1, v2, . . . , vn}
While the termination criterion is not met
  Evaluate the fitness value fi of each particle si
  For each particle
    Update the particle position and velocity
    If (fi < f′i) update the local best: f′i = fi
    If (fi < fg) update the global best: fg = fi
End
[Figure: a particle's new motion combines its current motion (trajectory), a pull toward its local (personal) best, and a pull toward the global best.]
Page 33
Particle Swarm Optimization (4/4)
The particle's position and velocity update equations:

velocity: vi^(k+1) = w vi^k + c1 r1 (pbi − si^k) + c2 r2 (gb − si^k)
position: si^(k+1) = si^k + vi^(k+1)

where vi^k is the velocity of particle i at iteration k; w, c1, and c2 are weighting factors; r1 and r2 are uniformly distributed random numbers between 0 and 1; si^k is the current position of particle i at iteration k; pbi is the pbest of particle i; and gb is the gbest of the group.

A larger w favors global search ability; a smaller w favors local search ability.
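One particle update implemented directly from these equations; the parameter values are common defaults, not prescribed by the slide:

import random

def pso_step(s, v, pb, gb, w=0.7, c1=2.0, c2=2.0):
    # New velocity = inertia + pull toward personal best + pull toward global best.
    r1, r2 = random.random(), random.random()
    v_new = [w * v[d] + c1 * r1 * (pb[d] - s[d]) + c2 * r2 * (gb[d] - s[d])
             for d in range(len(s))]
    s_new = [s[d] + v_new[d] for d in range(len(s))]   # position update
    return s_new, v_new

# Example: one particle in 2-D.
print(pso_step([0.0, 0.0], [0.1, 0.1], pb=[1.0, 1.0], gb=[2.0, 2.0]))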
Page 34
Summary
Page 35
Performance Consideration
Enhancing the Quality of the End Result
– How to balance intensification and diversification
– Initialization Method
– Hybrid Method
– Operator Enhancement
Reducing the Running Time
– Parallel Computing
– Hybrid Metaheuristics
– Redesigning the procedure of Metaheuristics
Page 36
Large Scale Problem
Methods for solving large scale problems (Xu and Wunsch, 2008)
– random sampling
– data condensation
– density-based approaches
– grid-based approaches
– divide and conquer
– incremental learning
Page 37
How to Balance Intensification and Diversification
Intensification
– Local Search, 2-opt, n-opt
Diversification
– Keeping Diversity
– Fitness Sharing
– Increase the number of individuals
• More computing resource
– Re-create
Too much intensification → local optimum
Too much diversification → random search
[Figure: intensification concentrates the search around opt, while diversification spreads it across the search space.]
Page 38
Reducing the Running Time
Parallel computing
– Parallelism generally does not reduce the total amount of computation, only the elapsed (wall-clock) time.
– master-slave model, fine-grained model (cellular model) and coarse-grained model (island model) [Cantú-Paz, 1998; Cantú-Paz and Goldberg, 2000]
[Figure: the coarse-grained (island) model: the population is divided into sub-population islands 1–4, which exchange individuals through a migration procedure.]
Page 39
Multiple-Search Genetic Algorithm (1/3)
[Figure: the evolutionary process of MSGA.]
Page 40
Multiple-Search Genetic Algorithm (2/3)
Multiple Search Genetic Algorithm (MSGA) vs. Learnable Evolution Model (LEM)
TSP instance pcb442
[Figure: convergence comparison of Tsai's MSGA and Michalski's LEM.]
Page 41
Multiple-Search Genetic Algorithm (3/3)
MSGA may face the premature convergence problem because the diversity of the population may decrease too quickly.
Each iteration may take more computation time than an iteration of the original algorithm.
Page 42
Pattern Reduction Algorithm
Concept
Assumptions and Limitations
The Proposed Algorithm
– Detection
– Compression
– Removal
Page 43
Concept (1/4)
Our observation shows that many of the computations performed by most, if not all, metaheuristic algorithms during their convergence process are redundant.
(Data courtesy of Su and Chang)
Page 44
Concept (2/4)
Page 45
Concept (3/4)
Page 46
Concept (4/4)
[Figure: a population with chromosomes C1 = 0010 and C2 = 1110. A plain metaheuristic processes all s = 4 sub-solutions of each chromosome at every generation g = 1, 2, . . . , n. With pattern reduction, the sub-solutions that no longer change are compressed after the first generation, so generations g = 2, . . . , n process only s = 2 sub-solutions per chromosome.]
Page 47
Assumptions and Limitations
Assumptions
– Some of the sub-solutions at a certain point in the evolution process will eventually end up being part of the final solution (Schema Theorem, Holland 1975)
– Pattern Reduction (PR) is able to detect these sub-solutions as early as possible during the evolution process of metaheuristics.
Limitations
– The proposed algorithm requires that the sub-solutions be integer or binary encoded (i.e., combinatorial optimization problem).
Page 48
Some Results of PR
Page 49
The Proposed Algorithm
Create the initial solutions P = {p1, p2, . . . , pn}
While termination criterion is not met
Apply the transition, evaluation, and determination operators of the metaheuristics in question to P
/* Begin PR */
Detect the sub-solutions R = {r1, r2, . . . , rm} that have a high probability of never being changed again
Compress the sub-solutions in R into a single pattern, say, c
Remove the sub-solutions in R from P; that is, P = P \ R
P = P ∪ {c}
/* End PR */
End
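A minimal Python rendering of the PR pass; detect and compress stand for the detection and compression modules described on the following slides and must be supplied per problem, and treating P as a flat collection of sub-solutions is a simplification on our part:

def pattern_reduction_pass(P, detect, compress):
    R = detect(P)                      # sub-solutions with a high probability of never changing again
    if not R:
        return P
    c = compress(R)                    # compress all of R into a single pattern c
    P = [p for p in P if p not in R]   # P = P \ R
    P.append(c)                        # P = P ∪ {c}
    return P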
Page 50
Detection
Time-Oriented
– Detect patterns that have not changed in a certain number of iterations (aka static patterns)
Space-Oriented
– Detect sub-solutions that are common at certain loci
Problem-Specific
– E.g., for k-means, we assume that patterns near a centroid are unlikely to be reassigned to another cluster.
Space-oriented example: at iteration T1, P1 = 1352476 and P2 = 7352614 share the sub-solution 352 at the same loci; by Tn it has been compressed into C1, giving P1 = 1 C1 476 and P2 = 7 C1 614.
Time-oriented example: T1 = 1352476, T2 = 7352614, T3 = 7352416, . . . ; the sub-solution 352 stays unchanged across iterations, so it is eventually compressed, giving Tn = 7 C1 416.
Page 51
Problem-Specific
[Figure: problem-specific detection for k-means on P = 12 patterns x1, . . . , x12 partitioned into clusters 0, 1, and 2 with their means. Patterns close to a cluster mean are unlikely to be reassigned, so they are detected, compressed, and removed, reducing the number of patterns from P = 12 to P = 9.]
Page 52
Compression and Removal
The compression module compresses all the sub-solutions to be removed, whereas the removal module removes them once they are compressed.
Lossy Method
– May cause a “small” loss in the quality of the end result.
Page 53
An Example
Page 54
Simulation Environment
The empirical analysis was conducted on an IBM X3400 machine with a 2.0 GHz Xeon CPU and 8 GB of memory, using CentOS 5.0 running Linux kernel 2.6.18.
Enhancement in percentage
– ((Tn − To) / To) × 100
TSP
– Traditional Genetic Algorithm (TGA), HeSEA, Learnable Evolution Model (LEM), Ant Colony System (ACS), Tabu Search (TS), Tabu GA, Simulated Annealing (SA).
Clustering
– Standard k-means (KM), Relational k-means (RKM), Kernel k-means (KKM), Scheme Kernel k-means (SKKM), Triangle Inequality k-means (TKM), Genetic k-means Algorithm (GKA), and Particle Swarm Optimization (PSO)
Page 55
The Results of Traveling Salesman Problem (1/2)
Page 56
The Results of Traveling Salesman Problem (2/2)
Page 57
The Results of Data Clustering Problem (1/3)
Data sets for Clustering
Page 58
The Results of Data Clustering Problem (2/3)
Page 59
The Results of Data Clustering Problem (3/3)
Page 60
Time Complexity
Ideally, the running time of “k-means with PR” is independent of the number of iterations.
In reality, however, our experimental result shows that setting the removal bound to 80% gives the best result.
The time complexity of k-means is O(nkld), where n is the number of patterns, k the number of clusters, l the number of iterations, and d the number of dimensions.
Page 61
Conclusion and Discussion (1/2)
In this presentation, we introduced the
– Combinatorial Optimization Problem
– Several Metaheuristic Algorithms
– The Performance Enhancement Method
• MSGA, PREGA and so on.
Future Work
– Developing an efficient algorithm that will not only eliminate all the redundant computations but also guarantee that the quality of the end result of “metaheuristics with PR” is preserved or even enhanced with respect to that of the metaheuristics by themselves.
– Applying the proposed framework to other optimization problems and metaheuristics.
Page 62
Conclusion and Discussion (2/2)
Future Work
– Applying the proposed framework to continuous optimization problems.
– Developing more efficient detection, compression, and removal methods.
Discussion
• E-mail: cwtsai87@gmail.com
• MSN: cwtsai87@yahoo.com.tw
• Web site: http://cwtsai.ee.ncku.edu.tw/
• Chun-Wei Tsai is currently a postdoctoral fellow at the Department of Electrical Engineering, National Cheng Kung University.
• His research interests include evolutionary computation, web information retrieval, e-Learning, and data mining.
Page 64
Framework for metaheuristics
Create the initial solutions s = {s1, s2, . . . , sn}
While the termination criterion is not met
  Transit s to s′
  Evaluate the objective function value of each solution s′i in s′
  Determine s
End

Example: s1 = 01100 and s2 = 10011 with f1 = 2 and f2 = 3 are transited to s′1 = 01110 and s′2 = 10001.
[Figure: the selection probabilities of the two solutions evolve over generations, plotted as fitness versus iteration: g = 1: p1 = 2/5, p2 = 3/5; g = 2: p1 = 2/6, p2 = 4/6; g = 3: p1 = 3/7, p2 = 4/7; . . . ; g = n: p1 = 4/9, p2 = 5/9.]
Page 65
Initialization Methods (1/4)
In general, the initial solutions of metaheuristics are randomly generated, and it may take a tremendous number of iterations, and thus a great deal of time, to converge. Common remedies (a sketch of the greedy method follows the list):
– Sampling
– Dimension reduction
– Greedy method
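As an illustration of the greedy method, a nearest-neighbor construction for TSP can seed a metaheuristic with a reasonable, non-random tour; this Python sketch is ours, not from the slides:

import math

def nearest_neighbor_tour(cities, start=0):
    # Always visit the closest not-yet-visited city.
    unvisited = set(range(len(cities))) - {start}
    tour = [start]
    while unvisited:
        last = cities[tour[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, cities[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

print(nearest_neighbor_tour([(0, 0), (3, 0), (0, 4), (3, 4)]))  # e.g., [0, 1, 3, 2]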
Page 66
Initialization Methods (2/4)
Page 67
Initialization Methods (3/4)
Page 68
Initialization Methods (4/4)
[Figure: fitness versus iteration (average and total) for refined versus random initialization.]
The refinement initialization method can provide a more stable solution and enhance the performance of metaheuristics.
The risk of using refinement initialization methods is that they may cause metaheuristics to fall into local optima.
Page 69
Local Search Methods (1/2)
The local search methods play an important role in fine-tuning the solution found by metaheuristics.
Example: the neighborhood of the permutation 1 3 8 5 6 7 4 9 2 0 obtained by swapping one chosen element (here the 3) with each of the other positions.
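A generator for that swap neighborhood in Python; the choice of which position to swap is a parameter (the slide's example swaps the second element):

def swap_neighborhood(perm, pos=1):
    # All permutations obtained by swapping perm[pos] with every other position.
    out = []
    for j in range(len(perm)):
        if j != pos:
            p = list(perm)
            p[pos], p[j] = p[j], p[pos]
            out.append(p)
    return out

for n in swap_neighborhood([1, 3, 8, 5, 6, 7, 4, 9, 2, 0]):
    print(n)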
Page 70
Local Search Methods (2/2)
We have to take into account how to balance metaheuristics and local search: a longer computation time may provide a better result, but there is no guarantee.
Page 71
Hybrid Methods
A hybrid method combines the pros of different metaheuristic algorithms to enhance the overall performance
– E.g., GA plays the role of global search while SA plays the role of local search.
– Again, we may have to balance the performance of the hybrid algorithm.
Page 72
Data Clustering Problem
– Partitioning the n patterns into k groups or clusters based on some similarity metric.
[Figure: vector quantization with k-means: image1 is encoded through a codebook to reconstruct image2.]
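For reference, a bare-bones k-means in Python; random initialization and Euclidean distance are the usual choices, and an empty cluster keeps its old centroid:

import math
import random

def k_means(points, k, n_iter=20):
    centroids = random.sample(points, k)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:                     # assign each pattern to its nearest centroid
            i = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[i].append(p)
        centroids = [tuple(sum(x) / len(cl) for x in zip(*cl)) if cl else centroids[i]
                     for i, cl in enumerate(clusters)]   # recompute each mean
    return centroids

print(k_means([(0, 0), (0, 1), (5, 5), (6, 5)], k=2))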