Seminar
University of Cyprus
Department of Public and Business Administration
September 2004
An Introduction to
Optimization Heuristics
Manfred Gilli
Department of Econometrics, University of Geneva and FAME
www.unige.ch/ses/metri/gilli
1. Optimization heuristics (an overview)
2. Threshold Accepting
3. Portfolio optimization with TA
References
Winker, P. (2001): Optimization Heuristics in Econometrics. Wiley, Chichester.

Winker, P. and M. Gilli (2004): Applications of optimization heuristics to estimation and modelling problems. Computational Statistics & Data Analysis 47, 211–223. (www.sciencedirect.com/csda/)

Winker, P. and D. Maringer (2005): Threshold Accepting in Economics and Statistics. (to appear in Kluwer Applied Optimization Series)

Gilli, M. and E. Kellezi (2002): Portfolio Optimization with VaR and Expected Shortfall. In Computational Methods in Decision-making, Economics and Finance (Eds. E.J. Kontoghiorghes, B. Rustem and S. Siokos), 165–181, Kluwer Applied Optimization Series.

Gilli, M. and E. Kellezi (2002): The Threshold Accepting Heuristic for Index Tracking. In Financial Engineering, E-Commerce, and Supply Chain (Eds. P. Pardalos and V.K. Tsitsiringos), 1–18, Kluwer Academic Publishers, Boston.

Gilli, M. and P. Winker (2003): A Global Optimization Heuristic for Estimating Agent Based Models. Computational Statistics and Data Analysis 42, 299–312. (www.sciencedirect.com/csda/)
Lecture 1
Optimization heuristics
(an overview)

Outline
• Standard optimization paradigm
• Heuristic optimization paradigm
• Overview of optimization heuristics
– Simulated annealing
– Threshold accepting
– Tabu search
– Genetic algorithm
– Ant colonies
• Elements for a classification
– Basic characteristics
– Hybrid meta-heuristics
The standard optimization paradigm
Optimization problems in estimation and modelling
typically expressed as:
max_{x∈X} f(x)        (1)

search space X ⊂ R^n (possibly discrete)

(1) is often used synonymously with its solution x^opt,
assumed to exist and frequently to be unique !!
McCullough and Vinod (1999, p. 635) state:
‘Many textbooks convey the impression that all one
has to do is use a computer to solve the problem,
the implicit and unwarranted assumptions being
that the computer’s solution is accurate and that
one software package is as good as any other’.
Obviously this assumption is not necessarily met.

Rather than being globally convex and well behaved, objective functions in real applications may look like the examples on the next slides.

[Figure omitted]
Example from statistics:
Least median of squares estimator (LMS)
y_i = x_i^T θ + ε_i ,   i = 1, …, N

θ_LMS = argmin_θ Q_LMS(θ)

Q_LMS(θ) = med_i(r_i^2)   (median of the squared residuals)

r_i^2 = (y_i − x_i^T θ)^2
Example: Objective function for the estimation of
the parameters of an agent based model of financial
markets
[Figure: objective function surface over the parameters ε and δ]
Only objective functions of two-dimensional problems can be illustrated. In real applications it is most likely that we have to optimize with respect to many variables, which makes the problem much more complex.
Classical optimization paradigm understood as:
• solution is identified by means of enumeration
or differential calculus
• existence of (unique) solution presumed
• convergence of classical optimization methods
for the solution of the corresponding first-order
conditions
Many optimization problems in statistics (e.g. OLS
estimation) fall within this category
However many optimization problems resist this
standard approach
Limits of the classical optimization paradigm

• Problems which do not fulfill the requirements of these methods

• Cases where the standard optimization paradigm can be applied, but problem sizes may hinder efficient calculation

Classification (relative to the classical optimization paradigm) of the universe of estimation and modelling problems:
[Diagram: the universe of estimation and modelling problems, split into continuous and discrete problems; regions marked "easy to solve", "tractable by standard approximation methods", and a complementary region where the application of standard methods will probably fail]
• Set X of possible solutions:
  – continuous
  – discrete

• Easy to solve:
  – continuous: (e.g. LS estimation) allowing for an analytical solution
  – discrete: allowing for a solution by enumeration for small-scale problems

• Tractable by standard approximation methods: solution can be approximated reasonably well by standard algorithms (e.g. gradient methods)

• Complementary set: straightforward application of standard methods will, in general, not even provide a good approximation of the global optimum
The heuristic optimization paradigm
Methods:
• Based on concepts found in nature
• Have become feasible as a consequence of
growing computational power
• Although they aim at high-quality solutions, they cannot guarantee the exact solution in every case
Nevertheless, a stochastic high–quality approximation of
a global optimum is probably more valuable than a deter-
ministic poor–quality local minimum provided by a clas-
sical method or no solution at all.
• Easy to adapt to different problems

• Side constraints on the solution can be taken into account at low additional cost
Cross-examination of optimization paradigms

Property            Classical   Heuristic
Availability        √           growing
Deterministic       √           sometimes
Efficiency          √/–         –
Solution quality    √/–         +
Multi purpose       –           √
Overview of optimization heuristics
Two broad classes:
• Construction methods (greedy algorithms)
• Local search methods
Solution space not explored systematically
A particular heuristic is characterized by the
way the walk through the solution domain is
organized
Classical local search for minimization
1: Generate current solution xc
2: while stopping criteria not met do
3: Select xn ∈ N (xc) (neighbor to current sol.)
4: if f(xn) < f(xc) then xc = xn
5: end while
Selection of neighbor xn and criteria for acceptance
define the walk through the solution space
Stopping criteria (a given number of iterations)
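As an illustration, a minimal Python sketch of this skeleton (the objective f, the neighbor function and the starting point are placeholders, not from the slides):

import random

def local_search(f, neighbor, x0, iterations=1000):
    # Classical local search: accept only downhill moves
    xc = x0
    for _ in range(iterations):          # stopping criterion: fixed number of iterations
        xn = neighbor(xc)                # select a neighbor of the current solution
        if f(xn) < f(xc):                # accept only improvements
            xc = xn
    return xc

# Usage: minimize a one-dimensional function with a random-step neighborhood
f = lambda x: (x - 2.0) ** 2
neighbor = lambda x: x + random.uniform(-0.1, 0.1)
print(local_search(f, neighbor, x0=0.0))

Such a search stops in the first local minimum it reaches; the meta-heuristics below differ precisely in how they escape it.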
Classical meta-heuristics:
• Simulated annealing• Tabu search• Genetic algorithms• Ant colonies
Different rules for choice and/or acceptance of
neighbor solution
All (except Tabu search) accept uphill moves
(in order to escape local minima)
Simulated annealing (SA)
• Kirkpatrick, Gelatt and Vecchi (1983)
• Based on an analogy between combinatorial optimization and the annealing process of solids

• Improvement of solution for a move from xc to xn always accepted

• Accepts uphill moves, but only with a given probability (which decreases to zero over a number of rounds)
1: Generate current solution xc, initialize Rmax and T
2: for r = 1 to Rmax do
3: while stopping criteria not met do
4: Compute xn ∈ N (xc) (neighbor to current sol.)
5: Compute ∆ = f(xn) − f(xc) and generate u (uniform random variable)
6: if (∆ < 0) or (e^(−∆/T) > u) then xc = xn
7: end while
8: Reduce T
9: end for
[Figure: acceptance probability e^(−∆/T) as a function of ∆/T, compared with the uniform draw u ∈ (0,1)]
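To make the loop concrete, here is a minimal Python sketch of SA (the cooling factor and the numbers of rounds and steps are illustrative choices, not values from the slides):

import math, random

def simulated_annealing(f, neighbor, x0, T=1.0, rounds=20, steps=500, cooling=0.9):
    xc = x0
    for _ in range(rounds):
        for _ in range(steps):
            xn = neighbor(xc)
            delta = f(xn) - f(xc)
            # downhill moves always accepted; uphill with probability exp(-delta/T)
            if delta < 0 or math.exp(-delta / T) > random.random():
                xc = xn
        T *= cooling                     # statement 8: reduce the temperature
    return xc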
Threshold accepting (TA)
• Dueck and Scheuer (1990)
• Deterministic analog of Simulated Annealing
• Sequence of temperatures T replaced by
sequence of thresholds τ .
• Statement 6 of the SA algorithm becomes:
if ∆ < τ then xc = xn
• Statement 8: threshold τ reduced instead of T
Tabu search (TS)
• Glover and Laguna (1997)
• Designed for exploration of discrete search spaces with a finite set of neighbor solutions

• Avoids cycling (visiting the same solution more than once) by use of a short-term memory (the tabu list of the most recently visited solutions)

• Statement 3: the choice of xn may or may not examine all neighbors of xc

If more than one element is considered, xn corresponds to the best neighbor solution

1: Generate current solution xc and initialize tabu list T = ∅
2: while stopping criteria not met do
3: Compute xn ∈ N(xc) and xn ∉ T
4: if f(xn) < f(xc) then xc = xn and T = T ∪ xn
5: Update memory
6: end while
• Statement 5: a simple way to update the memory is to remove older entries from the tabu list

• Stopping criterion: a given number of iterations or a number of consecutive iterations without improvement
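A minimal Python sketch of this scheme (the fixed-length tabu list and the full enumeration of neighbors are assumptions made for the example):

from collections import deque

def tabu_search(f, neighbors, x0, iterations=1000, tabu_size=50):
    xc = x0
    tabu = deque(maxlen=tabu_size)       # statement 5: older entries drop out automatically
    for _ in range(iterations):          # stopping criterion: fixed number of iterations
        candidates = [x for x in neighbors(xc) if x not in tabu]
        if not candidates:
            break                        # all neighbors are tabu
        xn = min(candidates, key=f)      # best admissible neighbor
        if f(xn) < f(xc):                # statement 4
            xc = xn
            tabu.append(xn)
    return xc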
Genetic algorithm (GA)
• Imitates evolutionary process of species that
sexually reproduce
• Do not operate on a single current solution,
but on a set of current solutions (population)
• New individuals P ′′ generated with cross-over :
combines part of genetic patrimony of each
parent and applies a random mutation
If the new individual (child) inherits good characteristics from its parents → higher probability to survive
1: Generate current population P of solutions
2: while stopping criteria not met do
3: Select P ′ ⊂ P (mating pool), initialize P ′′ = ∅ (children)
4: for i = 1 to n do
5: Select individuals xa and xb at random from P ′
6: Apply cross-over to xa and xb to produce xchild
7: Randomly mutate produced child xchild
8: P ′′ = P ′′ ∪ xchild
9: end for
10: P = survive(P ′, P ′′)
11: end while
Statement 3: Set of starting solutions
Statements 4–10: Construction of neighbor sol.
Survivors P (new population) formed either by:

• the last generated individuals P ′′ (children)
• P ′′ ∪ the fittest from P ′
• only the fittest from P ′′
• the fittest from P ′ ∪ P ′′
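An illustrative Python sketch of one GA generation for binary-encoded solutions (the encoding, the mutation rate and the survivor rule, here the fittest from P ′ ∪ P ′′, are choices made for the example):

import random

def crossover(a, b):
    cut = random.randrange(1, len(a))                 # one-point cross-over
    return a[:cut] + b[cut:]

def mutate(x, rate=0.01):
    return [1 - g if random.random() < rate else g for g in x]

def ga_generation(f, P, n_children):
    # P': mating pool, here the fitter half of the population (minimization)
    mating_pool = sorted(P, key=f)[: len(P) // 2]
    children = []                                      # P''
    for _ in range(n_children):
        xa, xb = random.sample(mating_pool, 2)         # statement 5
        children.append(mutate(crossover(xa, xb)))     # statements 6-7
    # survivors: the fittest from P' ∪ P'', keeping the population size constant
    return sorted(mating_pool + children, key=f)[: len(P)]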
Ant colonies (AC)
• Colorni, Dorigo and Maniezzo (1992)
• Imitates the way ants search for food and find
their way back to their nest
• First an ant explores its neighborhood randomly.
As soon as a source of food is found it starts
to transport food to the nest leaving traces of
pheromone on the ground which guide other
ants to the source
• Intensity of the pheromone traces depends on the quantity and quality of food available at the source, as well as on the distance between source and nest: for a short distance more ants will travel on the same trail in a given time interval.

• As ants preferably travel along important trails, their behavior is able to optimize their collective work
• Pheromone trails evaporate and once a source
of food is exhausted the trails will disappear and
the ants will start to search for other sources
• The search area of the ant corresponds to a
discrete set of solutions
• The amount of food is associated with an objective function

• The pheromone trail is modelled with an adaptive memory
1: Initialize pheromone trail
2: while stopping criteria not met do
3: for all ants do
4: Deposit ant randomly
5: while solution incomplete do
6: Select next element randomly according to
pheromone trail
7: end while
8: end for
9: Update pheromone trail
10: end while
Reinforced process:
• Within the same time more ants can pass along the shorter route
• More pheromone on shorter route
• More ants attracted
Real life ants:
• Leave chemical marks (pheromone)
• Use pheromone for orientation
• Prefer trails with high pheromone
How do ants know where to go?
• Ant at point i
• τij intensity of pheromone trail from i → j
• ηij visibility (constant) from i → j
• probability to go to j (simplest version):

  p_ij = ( τ_ij η_ij ) / ( Σ_k τ_ik η_ik )

Trail update:

• Old pheromone partly evaporates (0 < ρ < 1)

• An ant on route i → j with length ℓ_ij spreads a quantity q of pheromone:

  ∆τ_ij = q / ℓ_ij

• New pheromone trail:

  τ_ij^(t+1) = ρ τ_ij^t + ∆τ_ij^t
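A Python sketch of these two rules (tau and eta are matrices indexed by nodes; the values of ρ and q are illustrative):

import random

def choose_next(tau, eta, i, allowed):
    # next node j chosen with probability tau[i][j]*eta[i][j] / sum_k tau[i][k]*eta[i][k]
    weights = [tau[i][j] * eta[i][j] for j in allowed]
    r, acc = random.random() * sum(weights), 0.0
    for j, w in zip(allowed, weights):
        acc += w
        if acc >= r:
            return j
    return allowed[-1]

def update_trail(tau, routes, lengths, rho=0.5, q=1.0):
    for row in tau:
        for j in range(len(row)):
            row[j] *= rho                        # old pheromone partly evaporates
    for route, length in zip(routes, lengths):
        for i, j in zip(route, route[1:]):
            tau[i][j] += q / length              # deposit inversely proportional to route length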
Applications:

• Travelling salesman problem
• Quadratic assignment problem
• Job scheduling problem
• Graph coloring problem
• Sequential ordering

References:

• Colorni, Dorigo and Maniezzo (1992)
• Overview of different versions and applications: Bonabeau, Dorigo and Theraulaz (1999): Swarm Intelligence
Elements for classification
Meta-heuristic: general skeleton of an algorithm
(applicable to a wide range of problems)
May evolve to a particular heuristic (if specialized
to solve a particular problem)
• Meta-heuristics: made up of different components
• If components from different meta-heuristics
are assembled → hybrid meta-heuristic
Proliferation of heuristic optimization methods → need for a taxonomy or classification
Basic characteristics of the meta-heuristics:
• Trajectory method: current solution slightly
modified by searching within the neighborhood
of the current solution (SA, TS)
• Discontinuous method: the full solution space is available for the new solution. The discontinuity is induced by the generation of new starting solutions (GA, AC) and corresponds to jumps in the search space
• Single agent method:
one solution per iteration processed (SA, TS)
• Multi-agent or population based method: a population of searching agents, all of which contribute to the collective experience (GA, AC)
• Guided search (search with memory usage):
Incorporates additional rules and hints on where
to search (GA: population represents memory
of recent search experience, AC: pheromone
matrix represents adaptive memory of previously visited solutions, TS: tabu list provides
short term memory)
• Unguided search or memoryless method: relies entirely on the search heuristic
Meta-heuristics and their features:

Features                        SA    TA    TS    GA    AC
Trajectory methods              √     √     √    (√)   (√)
Discontinuous methods           no    no    no    √     √
Single agent                    √     √     √     no    no
Population based                no    no    no    √     √
Guided search (memory)          no    no    √     √     √
Unguided search (memoryless)    √     √     no    no    no
Hybrid meta-heuristics (HMH):
Combine elements of classical meta-heuristics → allows one to imagine a large number of new techniques
Motivated by need to achieve tradeoff between:
• capabilities to explore search space
• possibility to exploit experience accumulated
during search
Classification combines hierarchical and flat scheme:
[Diagram: classification scheme]

• Hierarchical: High-level (H) or Low-level (L), combined with Relay (R) or Co-evolutionary (C)
• Flat: Homogeneous vs. Heterogeneous, Global vs. Partial, General vs. Special
Hierarchical classification of hybridizations
• Low-level: replaces component of given MH by
component from another MH
• High-level: different MH are self-contained
• Relay: combines different MH in a sequence
• Co-evolutionary: different MH cooperate
Examples:
• Low-level Relay: (not very common) e.g. SA
where neighbor xn is obtained as: select xi in
larger neighborhood of xc and perform a descent local search. If this point is not accepted
return to xc (not xi).
• Low-level Co-evolutionary: GA and AC perform
well in exploration of search space but weak in
exploitation of solutions found → hybridization
for GA: greedy heuristic for crossover and TS
for mutation
• High-level Relay: e.g. greedy heuristic to generate initial population of GA and/or SA and
TS to improve population obtained by GA
Another ex.: use heuristic to optimize another
heuristic, i.e. find optimal values for parameters
• High-level Co-evolutionary: many self-contained
algorithms cooperate in a parallel search to find
an optimum
Flat classification of hybridizations
• Homogeneous versus Heterogeneous: same MH used versus combination of different MH
• Global versus Partial: all algorithms explore
same solution space versus partitioned solution
space
• Specialist versus General: combination of MH
which solve different problems versus all MH
solve the same problem
e.g. a high-level relay hybrid for the optimization of another heuristic is a specialist hybrid
Lecture 2
Threshold Accepting (TA)

• Introduction
Builds on the tutorial given by P. Winker at the “Computational Management Science” Conference and Workshop on “Computational Econometrics and Statistics”, University of Neuchâtel, Switzerland, 2–5 April 2004.
Basic features of TA
• Similar to Simulated Annealing
• Local search heuristic (suggests slight random modifications to the current solution and thus gradually moves through the search space)
• Suited for problems where solution space has
a local structure and where we can define a
neighborhood around the solution
• Accepts uphill moves to escape local optima (based on a deterministic criterion)
What do we expect from a heuristic?
• A good approximation to the global optimum
• To be robust to changes of the problem and of the tuning parameters

• Easy to adapt to many problem instances
• Local search similar to Simulated Annealing
• Allows uphill moves
• Requires local structure on search space
• Requires a threshold sequence
• Converges asymptotically to global optimum
Implementation involves definition of:
• Neighborhood (Local structure on search space)
• Objective function and constraints
• Threshold sequence
Neighborhood definition
• Ω search space
• for each element x ∈ Ω
we define the neighborhood N(x) ⊂ Ω

(the neighborhood is defined by a rule, not generated explicitly !!)

• for the current solution xc we compute xn ∈ N(xc)

• Neighborhood defined with ε-spheres:

  N(xc) = { xn | xn ∈ Ω , ‖xn − xc‖ < ε }

  ‖·‖ Euclidean or Hamming distance
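A possible neighbor generator based on such an ε-sphere for real-valued solutions (uniform coordinate-wise perturbations, i.e. an ∞-norm sphere, are one simple choice among many):

import random

def neighbor(xc, eps=0.1):
    # random element of the epsilon-sphere around xc
    return [xi + random.uniform(-eps, eps) for xi in xc]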
Local structure
• Objective function should exhibit local behavior
with regard to the chosen neighborhood
For elements in N(xc) the value of the objective function should be close to f(xc) (closer than for randomly selected points)
Trade-off between large neighborhoods, which guarantee non-trivial projections, and small neighborhoods with a real local behavior of the objective function.
• Neighborhood relatively easy to define for functions with real-valued variables (more difficult for combinatorial problems)
Pseudo-code for TA
1: Initialize nR, nS and τr, r = 1,2, . . . , nR
2: Generate current solution xc ∈ X
3: for r = 1 to nR do
4: for i = 1 to nS do
5: Generate xn ∈ N (xc) (neighbor of xc)
6: if f(xn) < f(xc) + τr then
7: xc = xn
8: end if
9: end for
10: end for
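A direct Python transcription of this pseudo-code (the objective, the neighborhood and the threshold sequence are supplied by the user; the experiments reported later used Matlab, so this Python version is only an illustration):

def threshold_accepting(f, neighbor, x0, thresholds, n_steps):
    # thresholds: tau_r, r = 1, ..., nR; n_steps: nS steps per round
    xc = x0
    for tau in thresholds:
        for _ in range(n_steps):
            xn = neighbor(xc)
            if f(xn) < f(xc) + tau:      # deterministic acceptance criterion
                xc = xn
    return xc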
Objective function
• Objective function is problem specific (not necessarily smooth or differentiable)

• Performance depends on fast (and exact) calculation. This can be a problem if the objective function is the result of a Monte Carlo simulation.

• Local updating (to improve performance): directly compute ∆ instead of computing ∆ from f(xn) − f(xc)
Ex: Traveling salesman problem
[Figure: two tours through the cities A, B, C, D; exchanging the positions of B and C replaces the edges A–B, B–C, C–D by A–C, C–B, B–D]

∆ = d(A,C) + d(C,B) + d(B,D) − d(A,B) − d(B,C) − d(C,D)
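A sketch of this local update for a tour stored as a list of cities (the distance function d is assumed given; k must be chosen so that positions k−1 … k+2 exist):

def delta_swap(tour, d, k):
    # Change in tour length when the cities at positions k and k+1 are exchanged;
    # with A, B, C, D the cities at positions k-1 .. k+2 this is the slide's formula.
    A, B, C, D = tour[k - 1], tour[k], tour[k + 1], tour[k + 2]
    return (d(A, C) + d(C, B) + d(B, D)) - (d(A, B) + d(B, C) + d(C, D))

Computing ∆ this way costs six distance evaluations instead of re-evaluating the whole tour.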
Constraints
• Search space Ω is a subspace Ω ⊂ R^k

• If the subspace is not connected or it is difficult to generate elements in Ω:

  – use R^k as search space
  – add a penalty term to the objective function if xn ∉ Ω
  – increase the penalty term during the iterations
Lower bounds
• Theoretical lower bounds on the objective function help to assess the performance of the algorithm

• In particular, if f^sol equals the lower bound, a global optimum is identified
Threshold sequence
• Althöfer and Koschnick (1991) prove convergence for an “appropriate threshold sequence”
• In practice the threshold sequence is computed
from the empirical distribution of a sequence of
∆’s
1: for i = 1 to n do
2: Randomly choose x1 ∈ Ω
3: Compute x2 ∈ N(x1)
4: Compute ∆i = |f(x1) − f(x2)|
5: end for
6: Compute empirical distribution of the trimmed ∆i, i = 1, …, ⌊0.8 n⌋
7: Provide percentiles Pi, i = 1, …, nR
8: Compute corresponding quantiles Qi, i = 1, …, nR

The threshold sequence τi, i = 1, …, nR corresponds to Qi, i = 1, …, nR
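A Python sketch of this construction (random_solution and neighbor are assumed user-supplied; the 80% trimming follows the slide, the decreasing quantile grid is an illustrative choice):

def threshold_sequence(f, random_solution, neighbor, n=1000, n_rounds=10):
    deltas = []
    for _ in range(n):
        x1 = random_solution()           # statement 2
        x2 = neighbor(x1)                # statement 3
        deltas.append(abs(f(x1) - f(x2)))
    deltas.sort()
    deltas = deltas[: int(0.8 * n)]      # trimmed empirical distribution
    # decreasing quantiles of the trimmed distribution, ending at zero
    ps = [1 - r / n_rounds for r in range(1, n_rounds + 1)]
    return [deltas[int(p * (len(deltas) - 1))] for p in ps]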
Empirical distribution of ∆’s
[Figure: empirical distribution of the ∆’s; the 0.3, 0.7 and 0.9 quantiles correspond to 0.18, 1.47 and 3.89 × 10^−5]
Restarting TA
• Stochastic search heuristics like TA can be represented as a stochastic mapping

  TA : Ω → f_min ,   f_min ∼ D_TA(µ, σ)

  f_min is the random realization of the minimum found by the algorithm for a given random number sequence.

  D_TA is truncated from the left by f_min^global = inf{ f(x) | x ∈ Ω }  →  D_TA not normal !!
• Repeated applications (with different seeds) of TA (i = 1, …, R) provide an empirical distribution of the results f_min^i
• Standard procedure reports:

  min{ f_min^i | i = 1, …, R }   and sometimes R
• We suggest providing: the number of restarts R, the empirical mean and standard deviation, or some quantiles
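A sketch of a restart wrapper that reports these statistics (run_once stands for one complete TA run returning the minimum found; R = 20 is an arbitrary choice):

import statistics

def restart_ta(run_once, R=20):
    results = sorted(run_once() for _ in range(R))
    return {
        "restarts": R,
        "best": results[0],
        "mean": statistics.mean(results),
        "sd": statistics.stdev(results),
        "q10": results[int(0.1 * (R - 1))],   # empirical 10% quantile
    }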
Question: How to choose the number of restarts?

For a given amount of computational resources (total iterations):
• The larger R, the better the distribution D_TA(µ1, σ1) is approximated

• Fewer restarts and more iterations result in a lower-quality approximation of a different distribution D_TA(µ2, σ2) with a smaller expectation µ2 < µ1
• Trade-off !!!
The following two tables are from Winker and Maringer (2005)
Traveling salesman problem with 442 points described in
Winker (2001, Ch. 8)
Iterations    100 000    1 000 000    10 000 000
Restarts       10 000        1 000           100
Mean             5317         5170          5138
SD               52.8         28.7          21.8
10%              5251         5135          5112
5%               5234         5125          5107
1%               5204         5110          5098
A higher number of iterations produces lower means
(µ3 < µ1) and quantiles
But the estimation of the lower quantile for 10 000 000 iterations is based on 100 observations only !!!

This does not answer the question of how many restarts to choose for a given amount of resources.
Each sequence of restarts of the previous results is divided into 100 sub-results, i.e. instead of considering 10 000 restarts we consider 100 times 100 restarts, etc.

We then count how often the overall best solution has been found in each of the 100 sub-problems.
Iterations                  100 000    1 000 000    10 000 000
Restarts (per sub-result)       100           10             1
Times best in 100                 0           65            35
Mean deviation from best       73.5          5.2          14.5
The best choice for the number of restarts falls between the two extremes !!

A moderate number of restarts seems a good choice.
Lecture 3
Portfolio optimization with TA

Outline
• Why do we need heuristics for
Portfolio optimization
– Returns and risk measures
– Mean-variance framework
– Mean-downside risk framework
– Index tracking
– Constraints in practice
• Applications
– Benchmarking TA (mean/variance case)
– Computing mean/downside-risk frontiers
– Tracking an artificial index (benchmarking)
– Tracking market indices
• Conclusions
Traditionally portfolio optimization deals with returns and risk.
In the mean-variance framework one wants either
to minimize risk for a given return or maximize return for a given risk. In such a framework classical
optimization methods work efficiently.
More recent approaches use different risk measures, such as VaR (a lower quantile) or expectations conditional on such a quantile (expected shortfall).
Also there are some practical constraints which
in general are not considered by the classical approach.
In these situations classical optimization methods cannot be used any more.
This is where we suggest the use of heuristic optimization methods.
Returns
Returns of financial assets are random variables
• What distributions are appropriate ?
• How to model dependency ?
• Established facts:
– asymmetry
– fat tails
– in general, not normal
[Figure: empirical distribution of asset returns]
Risk measures (1)
Variance
(second central moment):
Var(v) = E[(v − Ev)^2]
• Penalizes negative as well as positive
deviations from the mean
• Does not account for asymmetry
• May not exist (e.g. if tails too fat)
Risk measures (2)
Downside-Risk
• Partial (or conditional) moments of distribution
• Related to losses (rather than gains)
• Measures deviation from a target
– benchmark return
– short term interest rate
– desired return
[Figure: distribution of portfolio value with the lower quantile VaR_p marked, p = 0.05]
Value at Risk: quantile of the distribution of portfolio value

  VaR_p = F^(−1)(p)

Shortfall probability: probability for the value of the portfolio to fall below VaR_p

  p = P(v < VaR_p)

Expected Shortfall: expected (conditional) value for losses below the threshold VaR_p

  ES_p = E(v | v < VaR_p)

Semi-variance:

  E[(v − Ev)^2 | v < Ev]
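As an illustration, the empirical versions of VaR and expected shortfall can be computed from a sample of portfolio values (a minimal sketch in plain Python):

def var_es(values, p=0.05):
    # empirical p-quantile of the portfolio values and mean of the tail below it
    v = sorted(values)
    var = v[int(p * (len(v) - 1))]
    tail = [x for x in v if x < var]
    es = sum(tail) / len(tail) if tail else var
    return var, es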
Value at Risk
Industry standard for measuring market risk
(since Basle accord I (1988))
Weak points:
• Estimates a single point of distribution of losses
• Does not inform about size of losses beyond
VaR, i.e. extreme events (with low probability)
but catastrophic consequences.
• Does not satisfy sub-additivity (the VaR of a portfolio might exceed the sum of the VaRs of the assets in the portfolio).
Expected shortfall appears superior.
Mean-variance framework (1)
• Introduced by Markowitz (1952)
• Principle: a portfolio is optimal if it maximizes
the return for a given level of risk
– One tries to find the most attractive
combination of return and risk
– Return: mean of future expected gains
– Risk: variance of future gains
[Figure: mean-variance efficient frontier, expected return vs. standard deviation]
Mean-variance framework (2)
Advantages
• Optimization can be done efficiently
• Well introduced among practitioners and
academics
Limitations
• Builds on restrictive hypotheses:
normality of returns
existence of first two moments of distribution
• Lack of flexibility: several practical constraints cannot be handled with standard optimization techniques
• Inconveniences of variance as a measure of risk
Mean-Downside Risk framework (1)
A more recent approach for portfolio choice
Introduces downside risk as a criterion for
portfolio choice plus realistic constraints.
Optimization in this framework becomes complex.
Mean – Downside Risk framework (2)
• Mean – VaR : the investor maximizes the future value of the portfolio under the constraint that the probability for the future value to go below VaR does not exceed β

  max_x E(v)
  s.t.  P(v < VaR) ≤ β
        Σ_j x_j = v_0
        x_j^ℓ ≤ x_j ≤ x_j^u ,   j ∈ P

• Mean – Expected Shortfall : the investor constrains the size of losses beyond VaR

  max_x E(v)
  s.t.  E(v | v < VaR) ≥ ν
        Σ_j x_j = v_0
        x_j^ℓ ≤ x_j ≤ x_j^u ,   j ∈ P
Index Tracking (1)
Reproduce performance of a market index by
investing in a small number of assets.
[Timeline: observation period [t1, t2], tracking period [t2, t3]; the tracking portfolio (TPF) is constructed at t2 and rebalanced at t3−]
We consider:
• realistic problem sizes
• realistic constraints
Index Tracking (2)
nA + 1   number of assets in the market

p_it   price of asset i at time t

x_it   quantity of asset i in portfolio at time t

P_t   composition of portfolio at time t

  P_t = { x_it | i = 0, 1, …, nA }

J_t   set of indices of assets in P_t

  J_t = { i | x_it ≠ 0 }

The portfolio is rebalanced at t−
Index Tracking (3)
v_t−   value of portfolio before rebalancing

  v_t− = Σ_{i=0}^{nA} x_{i,t−1} p_it

v_t   value of portfolio after rebalancing

  v_t = Σ_{i=0}^{nA} x_it p_it = Σ_{i∈J_t} x_it p_it

r_t^I   return of index I for period [t−1, t]

  r_t^I = ln( I_t / I_{t−1} )

r_t^P   return of portfolio P for period [t−1, t]

  r_t^P = ln( v_t− / v_{t−1} ) = ln( Σ_{i=0}^{nA} x_{i,t−1} p_it / Σ_{i=0}^{nA} x_{i,t−1} p_{i,t−1} )
Index Tracking (4)
• Tracking error (TE): often defined as the variance of the deviation of portfolio returns from an index. Such a definition allows for zero TE even if the portfolio underperforms the market.
[Figure: S&P 500 index level, Oct 99 – Mar 02]
Objective function
• Tracking error for period [t1, t2]:

  E_{t1,t2} = ( Σ_{t=t1}^{t2} |r_t^P − r_t^I|^α )^(1/α) / (t2 − t1)

• Average of the deviations:

  R_{t1,t2} = Σ_{t=t1}^{t2} (r_t^P − r_t^I) / (t2 − t1)

• Objective function (to be minimized):

  F_{t1,t2} = λ E_{t1,t2} − (1 − λ) R_{t1,t2} ,   λ ∈ [0, 1]
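A sketch of this objective in Python (rP and rI are the series of portfolio and index returns over [t1, t2]; alpha and lam follow the slide's notation):

def tracking_objective(rP, rI, alpha=1.0, lam=1.0):
    # F = lam * E - (1 - lam) * R, to be minimized
    T = len(rP)                                    # number of periods, t2 - t1
    E = sum(abs(p - i) ** alpha for p, i in zip(rP, rI)) ** (1 / alpha) / T
    R = sum(p - i for p, i in zip(rP, rI)) / T
    return lam * E - (1 - lam) * R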
Constraints (all kinds of portfolios)
• Cardinality
#Jt ≤ K
• Size
x_it ≥ 0 ,   i = 0, …, nA

ε_i ≤ ( x_it p_it ) / ( Σ_{i∈J_t} x_it p_it ) ≤ δ_i ,   i ∈ J_t ,   0 ≤ ε_i < δ_i ≤ 1
• Transaction costs
C_t ≤ γ v_t−
• Minimum round lots
x_it = y_it s_i ,   where s_i is the lot size and y_it the number of lots
The optimization problem
min_{P_t1}  F_{t1,t2} = λ E_{t1,t2} − (1 − λ) R_{t1,t2}

s.t.  C_t1 ≤ γ v_t1−

      Σ_{i∈J_t1} p_{i,t1} x_{i,t1} + C_t1 = v_t1−

      ε_i ≤ ( p_{i,t1} x_{i,t1} ) / ( Σ_{i∈J_t1} p_{i,t1} x_{i,t1} ) ≤ δ_i ,   i ∈ J_t1

      #J_t1 ≤ K
Other nonstandard objective functions
• maximize the probability that the return on the portfolio beats the return on the benchmark by a given percentage before going below it by more than another percentage
• minimize the expected time until portfolio beats
the benchmark
• maximize the expected reward obtained upon
reaching the performance goal
• minimize the expected penalty paid upon falling
to a shortfall level
• . . .
Optimization tools
• QP works for:

  – Mean – variance with short-selling constraints, size and class constraints and linear or convex transaction costs

  – Index tracking with the variance as TE and the same constraints

  QP is efficient (and comes with standard software)
• Standard optimization techniques can no longer be used if we add constraints on:
– cardinality (number of assets in portfolio)
– round lots
– buy-in threshold
– non convex transaction costs
For this problem classical methods fail to produce
reliable results and we have to resort to heuristic
optimization
Related work
• Beasley, Meade and Chang (1999)
→ GA
• Chang, Meade, Beasley and Sharaiha
→ SA, GA, TS (cardinality constraints)
• Bertsimas, Darnell and Soucy (1999)
→ MIP
• Mansini and Speranza (1999)
→ LP-based heuristics (roundlots)
• Konno and Wijayanayake (2001)
→ (concave transaction costs)
• Krokhmal, Palmquist and Uryasev (2002)
Rockafellar and Uryasev (2000, 2002)
→ LP (C–VaR)
• Lobo, Fazel and Boyd (2000)
→ MIP-based heuristics
• Jobst, Horniman, Lucas and Mitra (2000)
→ QMIP
• Kleber and Maringer (2001)
→ Ant colonies, GA
Parameters:
– nR number of rounds
– nS steps per round
– τr, r = 1,2, . . . , nR threshold sequence
Implementation steps
Definition of:
– (Objective function f(x))
– Neighborhood x^new ∈ N(x^old) (see the sketch after this list)

  ∗ Draw 2 assets a, b, each with probability 1/nA
∗ Move fraction q from a to b
∗ Check if constraints hold
– Thresholds
∗ Evaluate objective function for a large
number of random portfolios
∗ Compute neighbors and their distances
∗ Compute empirical distribution of distances
∗ Threshold defined by quantiles
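A sketch of this neighborhood for a portfolio stored as a list of holdings (the fraction q is illustrative; checking the constraints remains with the caller):

import random

def portfolio_neighbor(x, q=0.01):
    # move a fraction q of holdings from one randomly drawn asset to another
    xn = list(x)
    a, b = random.sample(range(len(xn)), 2)   # each asset drawn with probability 1/nA
    move = min(q, xn[a])                       # cannot move more than is held
    xn[a] -= move
    xn[b] += move
    return xn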
Benchmarking the TA algorithm
• Data from SMI (1997–99)
• Pentium III 800 MHz, Matlab 5.x
Mean-variance solution with QP and TA
[Figures: left, expected return vs. variance of portfolio, with the starting and the optimized portfolio marked; right, portfolio weights by asset index for the QP and TA solutions]
Returns on optimized portfolios under
shortfall constraints:
Computation of efficient frontiers
• Initial capital v0 = 8000000
• Shortfall probability β = 0.05
• 7700000 ≤ VaR ≤ 8000000
• 7550000 ≤ ES ≤ 7850000
Problem: How to describe the distribution of future returns (prices)?
• generate price scenarios
• define empirical distribution from historical prices
• resample from set of historical prices
Mean-VaR
(ps resampled from historical prices)
min_x  − (1/nS) Σ_{s=1}^{nS} x′ p_s

s.t.  #{ s | x′ p_s < VaR } ≤ β nS

      x′ p_0 = v_0

      ι′ z ≤ K

      ⌈ ω_j^ℓ v_0 z_j / p_{0j} ⌉ ≤ x_j ≤ ⌊ ω_j^u v_0 z_j / p_{0j} ⌋ ,   j = 1, …, nA

      z_j ∈ {0, 1} ,   j = 1, …, nA
Mean-Expected Shortfall
(ps resampled from historical prices)
min_x  − (1/nS) Σ_{s=1}^{nS} v_s

s.t.  ( 1 / #{ s | v_s < VaR } ) Σ_{s | v_s < VaR} v_s ≥ ν

      #{ s | v_s < VaR } ≤ β nS

      x′ p_0 = v_0

      ι′ z ≤ K

      ⌈ ω_j^ℓ v_0 z_j / p_{0j} ⌉ ≤ x_j ≤ ⌊ ω_j^u v_0 z_j / p_{0j} ⌋ ,   j = 1, …, nA

      z_j ∈ {0, 1} ,   j = 1, …, nA
Efficient frontier for Mean-VaR/ES:
• 19 assets of SMI (1997–99)
• for period t, compute 500 two-week returns

• resample from the set of 500 returns

• compute Mean–VaR/ES portfolios v_i^1, i = 1, …, n

v_0 = 8 000 000
for i = 1 to n do
  P(v_i^1 < VaR_i) ≤ 0.05
  E(v_i^1 | v_i^1 < VaR_i) ≥ ES_i
  max number of assets in portfolio: 10
  min/max holding size: [0.005, 0.5] v_0
  transaction cost = 0
  no shortselling
end for
Efficient frontier for Mean-VaR:
[Figure: expected portfolio value vs. VaR]
Efficient frontier for Mean-ES:
[Figure: expected portfolio value vs. expected shortfall]
Index Tracking: Benchmarking with artificial index
• Data set from Beasley: http://mscmga.ms.ic.ac.uk/jeb/orlib/indtrackinfo.html
– Hang Seng (31 assets)
– DAX 100 (85 assets)
– FTSE 100 (89 assets)
– S&P 100 (98 assets)
– Nikkei (225 assets)
– global set (528 assets)
• construct index by randomly choosing K assets
and their weights ε_i = 0.01 ≤ ω_i ≤ δ_i = 1
• find with TA the portfolio that tracks the index
– starting portfolio randomly chosen
– α = 1
– no constraint on transaction costs
– no cardinality restriction
• repeat optimization 1000 times and count
number of times TA finds assets in the index
[Figures: top, weights of the artificial index portfolio and the tracking portfolio by asset index; bottom, values of the artificial index and the tracking portfolio over time]
[Figures: left, expected return vs. tracking error; right, value of the objective function over the optimization steps, from the starting portfolio down to the solution]
Confidence intervals of the TA solution
[Figures: empirical distributions of the TA solutions for DAX (ns = 1000/1000), FTSE (ns = 1000/1000), SP (ns = 1000/1000), Nikkei (ns = 996/1000), Hang Seng (ns = 917/1000) and All Markets (ns = 957/1000)]
Confidence intervals of the TA solution (continued)
[Figure: empirical distribution of the TA solution for All Markets, ns = 998/1000]

• mean 6.4 × 10^−5
• standard deviation 2.1 × 10^−5
Tracking errors and execution times
Index         Number of assets   Tracking error   Time (sec)
Hang Seng            31           1.80 × 10^−5         5
DAX                  85           4.65 × 10^−5         6
FTSE                 89           3.11 × 10^−5         7
S&P                  98           4.85 × 10^−5         7
Nikkei              225           1.80 × 10^−4        13
All markets         528           2.02 × 10^−4        22
Out-of-sample performance
• observe market index in period [100,245]
• find tracking portfolio and look at its
performance in period [245,290]
[Figures: top, weights of the initial and the tracking portfolio by asset index (total transaction costs = 1.81%); bottom, index and tracking portfolio values over time (in-sample TE = 4.50e−003, out-of-sample TE = 7.22e−003)]
Out-of-sample performance (continued)
Rebalancing cost
[Figure: index and tracking portfolio values around the rebalancing date, with the rebalancing cost visible]
Results with various constraints on K and TCmax:

             TCmax = 2.0%                    TCmax = 0.4%
K       TEis           TEos            TEis           TEos
4    8.00 × 10^−3   1.16 × 10^−2    8.00 × 10^−3   1.25 × 10^−2
10   4.50 × 10^−3   7.20 × 10^−3    5.92 × 10^−3   9.00 × 10^−3
20   1.23 × 10^−3   2.02 × 10^−3    4.59 × 10^−3   5.95 × 10^−3
Conclusions
The threshold accepting algorithm:
• makes it easy to deal with all sorts of constraints of practical importance:
– cardinality constraints (integer constraints
limiting the portfolio to a specified number
of assets)
– limits in the proportions held in a given asset
– class constraints
– minimum roundlots
– transaction costs
– . . .
• is computationally efficient
(the larger the problem the more efficient)
• is easy to implement
• provides useful approximations of optima
TA opens new perspectives in the practice of
portfolio management.