GENETIC PROGRAMMING
Finding Perceived Pattern Structures using Genetic Programming
Mehdi Dastani
Dept. of Mathematics
and Computer Science
Free University Amsterdam
The Netherlands
email: [email protected]
Elena Marchiori
Dept. of Mathematics
and Computer Science
Free University Amsterdam
The Netherlands
email: [email protected]
Robert Voorn
Dept. of Mathematics
and Computer Science
Free University Amsterdam
The Netherlands
email: [email protected]
Abstract

Structural information theory (SIT) deals with the perceptual organization, often called the `gestalt' structure, of visual patterns. Based on a set of empirically validated structural regularities, the perceived organization of a visual pattern is claimed to be the most regular (simplest) structure of the pattern. The problem of finding the perceptual organization of visual patterns has relevant applications in multimedia systems, robotics and automatic data visualization. This paper shows that genetic programming (GP) is a suitable approach for solving this problem.
1 Introduction
In principle, a visual pattern can be described in many different ways; however, in most cases it will be perceived as having one particular description. For example, the visual pattern illustrated in Figure 1-A has, among others, the two descriptions illustrated in Figures 1-B and 1-C. Human perceivers usually prefer the description illustrated in Figure 1-B. An empirically supported theory of visual perception is Structural Information Theory (SIT) [Leeuwenberg, 1971, Van der Helm and Leeuwenberg, 1991, Van der Helm, 1994]. SIT proposes a set of empirically validated and perceptually relevant structural regularities and claims that the preferred description of a visual pattern is based on the structure that covers the most regularities in that pattern. Using the formalization of the notions of perceptually relevant structure and simplicity given by SIT, the problem of finding the simplest structure of a visual pattern (the SPS problem) can be formulated mathematically as a constrained optimization problem.
Figure 1: Visual pattern A has two potential structures
B and C.
The SPS problem has relevant applications. For example, multimedia systems and image databases need to analyze, classify, and describe images in terms of the constitutive objects that human users perceive in those images [Zhu, 1999]. Furthermore, autonomous robots need to analyze their visual inputs and construct hypotheses about the objects possibly present in their environments [Kang and Ikeuchi, 1993]. Also, in the field of information visualization, the goal is to generate images that represent information such that human viewers can extract that information by looking at the images [Bertin, 1981]. In all these applications, a model of gestalt perception is indispensable [Mackinlay, 1986, Marks and Reiter, 1990]. We focus on a simple domain of visual patterns and claim that an appropriate model of gestalt perception for this domain is an essential step towards a model of gestalt perception for the more complex visual patterns used in the above-mentioned real-world applications [Dastani, 1998].
Since the search space of possible structures grows exponentially with the complexity of the visual pattern, heuristic algorithms have to be used to solve the SPS problem efficiently. The only algorithm for SPS we are aware of was developed by [Van der Helm and Leeuwenberg, 1986]. This algorithm ignores an important source of the computational complexity of the problem and covers only a subclass of perceptually relevant structures. The central part of this partial algorithm consists of translating the search for a simplest structure into a shortest-route problem. The algorithm is shown to have O(N^4) computational complexity, where N denotes the length of the input pattern. To cover all perceptually relevant structures, not only for the domain of visual line patterns but also for more complex domains of visual patterns, it is argued in [Dastani, 1998] that the computational complexity grows exponentially with the length of the input patterns.
This paper shows that genetic programming [Koza, 1992] provides a natural paradigm for solving the SPS problem using SIT. A novel evolutionary algorithm is introduced whose main features are the use of SIT operators for generating the initial population of candidate structures, and the use of knowledge-based genetic operators in the evolutionary process. The use of GP is motivated by the SIT formalization: structures can be easily described using the standard GP tree representation. However, the GP search is constrained by the fact that all structures have to characterize the same input pattern. In order to satisfy this constraint, knowledge-based operators are used in the evolutionary process.
The paper is organized as follows. In the next section, we briefly discuss the problem of visual perception and explain how SIT predicts the perceived structure of visual line patterns. In Section 3, SIT is used to give a formalization of the SPS problem for visual line patterns. Section 4 describes how the formalization can be used in an automatic procedure for generating structures. Section 5 introduces the GP algorithm for SPS. Section 6 describes implementation aspects of the algorithm and reports some experimental results. The paper concludes with a summary of the contributions and future research directions.
2 SIT: A Theory of Visual Perception
According to structural information theory, the human perceptual system is sensitive to certain kinds of structural regularities within sensory patterns. These are called perceptually relevant structural regularities, and they are specified by means of the ISA operators: Iteration, Symmetry and Alternation [Van der Helm and Leeuwenberg, 1991]. Examples of string patterns that can be specified by these operators are abab, abcba, and abgabpz, respectively. A visual pattern can be described in different ways by applying different ISA operators. In order to disambiguate the set of descriptions and to decide on the perceived organization of the pattern, a simplicity measure, called information load, is introduced. The information load measures the amount of perceptually relevant regularity covered by a pattern description. It is claimed that the description of a visual pattern with the minimum information load reflects its perceived organization [Van der Helm, 1994].
In this paper, we focus on the domain of linear line patterns: turtle-graphics-like line drawings for which the turtle starts somewhere and moves in such a way that the line segments are connected and do not cross each other. A linear line pattern is encoded as a letter string, for which it can be shown that its simplest description represents the perceived organization of the encoded linear line pattern [Leeuwenberg, 1971]. The encoding process consists of two steps. In the first step, the successive line segments and their relative angles in the pattern are traced from the starting point of the pattern, and identical letter symbols are assigned to identical line segments (equal length) as well as to identical angles (relative to the trace movement). In the second step, the letter symbols assigned to line segments and angles are concatenated in the order in which they were visited during the trace of the first step. This results in a letter string that represents the pattern. An example of such an encoding is illustrated in Figure 2.
Figure 2: Encoding of a line pattern into the string axaybxbybxb.
Note that letter strings are themselves perceptual patterns that can be described in many different ways, one of which is usually the perceived description. The determination of the perceived description of string patterns is the central focus of Hofstadter's Copycat project [Hofstadter, 1984].
3 The SPS Problem
In this section, we formally define the class of string descriptions that represent possible perceptually relevant organizations of linear line patterns. Also, a complexity function is defined that measures the information load of those descriptions. In this way, we can encode a linear line pattern into a string, generate the perceptually relevant descriptions of the string, and determine the perceived organization of the line pattern by choosing the string description with the minimum information load.
The class of descriptions that represent possible perceptual organizations for Linear Line Patterns (LLP) is defined over the set E = {a, ..., z} as follows.

1. For all t ∈ E, t ∈ LLP
2. If t ∈ LLP and n is a natural number, then iter(t, n) ∈ LLP
3. If t ∈ LLP, then symeven(t) ∈ LLP
4. If t1, t2 ∈ LLP, then symodd(t1, t2) ∈ LLP
5. If t, t1, ..., tn ∈ LLP, then altleft(t, <t1, ..., tn>) ∈ LLP and altright(t, <t1, ..., tn>) ∈ LLP
6. If t1, ..., tn ∈ LLP, then con(t1, ..., tn) ∈ LLP
The meaning of LLP expressions is defined by the denotational semantics [[.]], which involves the string concatenation (·) and string reflection (reflect(abcde) = edcba) operators.

1. If t ∈ E, then [[t]] = t
2. [[iter(t, n)]] = [[t]] · ... · [[t]]   (n times)
3. [[symeven(t)]] = [[t]] · reflect([[t]])
4. [[symodd(t1, t2)]] = [[t1]] · [[t2]] · reflect([[t1]])
5. [[altleft(t, <t1, ..., tn>)]] = [[t]] · [[t1]] · ... · [[t]] · [[tn]]
6. [[altright(t, <t1, ..., tn>)]] = [[t1]] · [[t]] · ... · [[tn]] · [[t]]
7. [[con(t1, ..., tn)]] = [[t1]] · ... · [[tn]]
The complexity function C on LLP expressions measures the complexity of an expression as the number of individual letters t occurring in it, i.e.

C(t) = 1                                    for t ∈ E
C(f(T1, ..., Tn)) = C(T1) + ... + C(Tn)     for any ISA operator f
During the last 20 years, Leeuwenberg and his co-workers have reported on a number of experiments that tested predictions based on the simplicity principle. These experiments were concerned with the disambiguation of ambiguous patterns. The predictions of the simplicity principle were, on the whole, confirmed by these experiments [Buffart et al., 1981, Van Leeuwen et al., 1988, Boselie and Wouterlood, 1989].
The following LLP expressions describe, among others, four different perceptual organizations of the pattern axaybxbybxb:

- con(a, x, a, y, b, x, b, y, b, x, b)
- con(symodd(a, x), y, symodd(b, x), y, symodd(b, x))
- con(symodd(a, x), iter(con(y, b, x, b), 2))
- con(symodd(a, x), iter(altright(b, <y, x>), 2))

Note that these descriptions reflect four different perceptual organizations of the line pattern illustrated in Figure 2. The information loads of these four descriptions are 11, 8, 6, and 5, respectively. This implies that the last description reflects the perceived organization of the line pattern illustrated in Figure 2.
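The semantics and the information loads above can be checked mechanically. The sketch below is our own illustrative code, not part of the paper's system: it encodes LLP expressions as nested Python tuples and implements the denotational semantics [[.]] and the complexity measure C.

```python
# Illustrative sketch: LLP expressions as nested tuples, with the
# denotational semantics [[.]] and the complexity measure C.

def denote(t):
    """Return the string denoted by an LLP expression."""
    if isinstance(t, str):                        # primitive element of E
        return t
    op = t[0]
    if op == 'iter':                              # n concatenated copies
        return denote(t[1]) * t[2]
    if op == 'symeven':                           # t . reflect(t)
        s = denote(t[1])
        return s + s[::-1]
    if op == 'symodd':                            # t1 . t2 . reflect(t1)
        s = denote(t[1])
        return s + denote(t[2]) + s[::-1]
    if op == 'altleft':                           # t.t1 . t.t2 ... t.tn
        return ''.join(denote(t[1]) + denote(ti) for ti in t[2])
    if op == 'altright':                          # t1.t . t2.t ... tn.t
        return ''.join(denote(ti) + denote(t[1]) for ti in t[2])
    return ''.join(denote(a) for a in t[1:])      # con

def complexity(t):
    """Information load C: the number of primitive letters in t."""
    if isinstance(t, str):
        return 1
    if t[0] in ('altleft', 'altright'):           # args live in a list
        return complexity(t[1]) + sum(complexity(a) for a in t[2])
    # the repetition count of iter does not contribute to C
    return sum(complexity(a) for a in t[1:] if not isinstance(a, int))

d1 = ('con',) + tuple('axaybxbybxb')
d4 = ('con', ('symodd', 'a', 'x'),
      ('iter', ('altright', 'b', ['y', 'x']), 2))
assert denote(d1) == denote(d4) == 'axaybxbybxb'
assert (complexity(d1), complexity(d4)) == (11, 5)
```

Both descriptions denote the pattern of Figure 2, with information loads 11 and 5 as stated above.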
The SPS problem can now be defined as follows. Given a pattern p, find an LLP expression t such that

- [[t]] = p, and
- C(t) = min{ C(s) | s ∈ LLP and [[s]] = p }.
As mentioned in the introduction, the only (partial) algorithm for solving the SPS problem was proposed by Van der Helm [Van der Helm and Leeuwenberg, 1986]. This algorithm finds only a subclass of the perceptually relevant structures of string patterns by first constructing a directed acyclic graph for the given string pattern. If we place an index after each element in the string pattern, starting from the leftmost element, then each node in the graph corresponds to an index, and each link from node i to node j corresponds to a gestalt for the subpattern starting at position i and ending at position j. Given this graph, the SPS problem is translated into a shortest-route problem. Note that this algorithm is designed for one-dimensional string patterns and it is not clear how it can be applied to other domains of perceptual patterns. In contrast, our formalization of the SPS problem can easily be applied to more complex visual patterns by extending LLP with domain-dependent operators such as Euclidean transformations for two-dimensional visual patterns [Dastani, 1998].
4 Generating LLP Expressions
In order to solve the SPS problem using genetic programming, a probabilistic procedure for generating LLP expressions, called BUILD-STRUCT, is used. This procedure takes a string as input and generates a (tree structure of an) LLP expression for that string. The procedure is based on a set of probabilistic production rules.

The production rules are derived from the SIT definition of expressions, and are of the form

α t1 ... tn β → α P(t1, ..., tn) β

where α and β are (possibly empty) sequences of LLP expressions, t1, ..., tn are LLP expressions, and P is an ISA operator (of arity n). The triple (α, t1 ... tn, β) is called a splitting of the sequence.

A snapshot of the set of production rules used in BUILD-STRUCT is given below.
α t t β → α iter(t, 2) β
α t iter(t, n) β → α iter(t, n+1) β
α iter(t, n) t β → α iter(t, n+1) β
α t1 t2 β → α con(t1, t2) β
α con(t1, ..., tn) t β → α con(t1, ..., tn, t) β
α t con(t1, ..., tn) β → α con(t, t1, ..., tn) β
A production rule transforms a sequence of LLP expressions into a shorter one. Consequently, the repeated application of production rules terminates after a finite number of steps and produces a single LLP expression. There are two forms of non-determinism in the algorithm:

1. the choice of which rule to apply when more than one production rule is applicable;
2. the choice of a splitting of the sequence when more than one splitting is possible.

In BUILD-STRUCT both choices are made randomly. BUILD-STRUCT employs a specific data structure which results in a more efficient implementation of the non-determinism described above. The BUILD-STRUCT procedure is used in the initialization of the genetic algorithm and in the mutation operator.
We conclude this section with an example illustrating the application of the production-rule system. The LLP expression iter(con(a, b, a), 2) can be obtained from the pattern abaaba using the above production rules as follows, where a bracketed substring indicates that an ISA operator will be applied to that substring:

[aba] aba → con(a, b, a) aba
con(a, b, a) [aba] → con(a, b, a) con(a, b, a)
[con(a, b, a) con(a, b, a)] → iter(con(a, b, a), 2)

Note that in this example the iter operator is applied to two structurally identical LLP expressions (i.e. con(a, b, a) con(a, b, a) → iter(con(a, b, a), 2)). In general, however, the ISA operators are not applied on the basis of structural identity of LLP expressions, but on the basis of their semantics, i.e. on the basis of the patterns denoted by the LLP expressions (e.g. symodd(a, b) con(a, b, a) → iter(symodd(a, b), 2)).
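The bottom-up rewriting can be sketched as follows. This is our own simplified illustration of a BUILD-STRUCT-style procedure, not the authors' implementation: it covers only the iter and con rules shown above, with both the rule choice and the splitting chosen at random, and applies iter on a semantic basis as described in the preceding note.

```python
import random

# Simplified sketch of BUILD-STRUCT: repeatedly pick a random splitting
# and an applicable rule until a single LLP expression remains.

def denote(t):
    """String denoted by an expression (iter/con subset)."""
    if isinstance(t, str):
        return t
    if t[0] == 'iter':
        return denote(t[1]) * t[2]
    return ''.join(denote(a) for a in t[1:])     # con

def build_struct(pattern, rng=random):
    seq = list(pattern)                          # sequence of expressions
    while len(seq) > 1:
        i = rng.randrange(len(seq) - 1)          # random splitting point
        a, b = seq[i], seq[i + 1]
        # iter applies whenever the neighbours denote the same string
        if denote(a) == denote(b) and rng.random() < 0.5:
            seq[i:i + 2] = [('iter', a, 2)]      # t t -> iter(t, 2)
        else:
            seq[i:i + 2] = [('con', a, b)]       # t1 t2 -> con(t1, t2)
    return seq[0]

# Every run terminates with some LLP expression denoting the input.
expr = build_struct('abaaba', random.Random(0))
assert denote(expr) == 'abaaba'
```

Because every rule preserves the denoted string, any sequence of random choices yields a valid, though not necessarily simplest, structure of the input pattern.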
5 A GP for the SPS Problem
This section introduces a novel evolutionary algorithm for the SPS problem, called GPSPS (Genetic Programming for the SPS problem), which applies GP to SIT. A population of LLP expressions is evolved, using knowledge-based mutation and crossover operators to generate new expressions, and using the SIT complexity measure as fitness function. GPSPS is an instance of the generational scheme, cf. e.g. [Michalewicz, 1996], illustrated below, where P(t) denotes the population at iteration t and |P(t)| its size.

PROCEDURE GPSPS
  t := 0
  initialize P(t)
  evaluate P(t)
  WHILE NOT termination condition DO
    t := t + 1
    select P(t) from P(t-1)
    alter P(t)
    evaluate P(t)
  END

In the selection step, the fittest (i.e. simplest) expressions have the highest probability of being selected. We have also made our GP elitist to guarantee that the best element found so far will be in the actual population.

The main features of GPSPS are described in the rest of this section.
5.1 Representation and Fitness
GPSPS acts on LLP expressions describing the same string. An LLP expression is represented by a tree in the style used in genetic programming, where leaves are primitive elements and internal nodes are ISA operators. The fitness function is the complexity measure C introduced in Section 3.

Thus, the goal of GPSPS is to find a chromosome (representing a structure of a given string) which minimizes C. Given a string, a specific procedure is used to ensure that the initial population contains only chromosomes describing that string. Moreover, novel genetic operators are designed which preserve the semantics of chromosomes.
5.2 Initialization
Given a string, the chromosomes of the initial population are generated using the procedure BUILD-STRUCT. In this way, the initial population contains randomly selected (representations of) LLP expressions of the pattern.
5.3 Mutation
When the mutation operator is applied to a chromosome T, an internal node n of T is randomly selected and the procedure BUILD-STRUCT is applied to the (string represented by the) subtree of T rooted at n. Figure 3 illustrates an application of the mutation operator to an internal node. Observe that each node (except the terminals) has the same chance of being selected; in this way smaller subtrees have a larger chance of being modified.

It is interesting to investigate the effectiveness of the heuristic implemented in BUILD-STRUCT when incorporated into an iterated local search algorithm. Therefore we have implemented an algorithm that mutates one single element for a large number of iterations and returns the best element found over all iterations. Although some regularities are discovered by this algorithm, its performance is rather poor compared with GPSPS, even when the number of iterations is set to be larger than the population size times the number of generations used by GPSPS.
Figure 3: Example of the mutation operator.
5.4 Crossover
The crossover operator cannot simply swap subtrees between two parents, as in standard GP, due to the semantic constraint on chromosomes (i.e. chromosomes have to denote the same string). Therefore, the crossover is designed in such a way that it swaps only subtrees that denote the same string. This is realized by associating with each internal node of the tree the string denoted by the subtree rooted at that node. Then, two nodes of the parents with equal associated strings are randomly selected and the corresponding subtrees are swapped. An example of crossover is illustrated in Figure 4.
Figure 4: Example of the crossover operator: two parents denoting abbacabba exchange subtrees that both denote the substring abba.
When a crossover pair cannot be found, no crossover takes place. Fortunately, this happens only for a small portion of the crossovers; usually there is more than one pair to choose from. This issue is further discussed in the next section.
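The semantics-preserving crossover can be sketched as follows. This is our own illustrative code, not the paper's implementation: every internal node is annotated with the string its subtree denotes, and only subtrees with equal annotations are swapped. Trees are nested lists so that subtrees can be replaced in place; only the symodd and con operators are needed for the example.

```python
import random

# Sketch of semantics-preserving crossover between two LLP trees.

def denote(t):
    if isinstance(t, str):
        return t
    if t[0] == 'symodd':                       # t1 . t2 . reflect(t1)
        s = denote(t[1])
        return s + denote(t[2]) + s[::-1]
    return ''.join(denote(a) for a in t[1:])   # con

def internal_nodes(t, path=()):
    """Yield (path, denoted string) for every internal node of t."""
    if isinstance(t, list):
        yield path, denote(t)
        for i, child in enumerate(t[1:], start=1):
            yield from internal_nodes(child, path + (i,))

def subtree(t, path):
    for i in path:
        t = t[i]
    return t

def crossover(p1, p2, rng=random):
    pairs = [(a, b) for a, sa in internal_nodes(p1)
                    for b, sb in internal_nodes(p2)
                    if sa == sb and a and b]   # proper subtrees only
    if not pairs:
        return False                           # no crossover pair found
    pa, pb = rng.choice(pairs)
    na, nb = subtree(p1, pa), subtree(p2, pb)
    subtree(p1, pa[:-1])[pa[-1]] = nb          # swap the two subtrees
    subtree(p2, pb[:-1])[pb[-1]] = na
    return True

p1 = ['con', ['symodd', 'a', 'b'], 'c']        # denotes abac
p2 = ['con', ['con', 'a', 'b', 'a'], 'c']      # denotes abac
crossover(p1, p2, random.Random(1))
assert denote(p1) == denote(p2) == 'abac'      # semantics preserved
```

Whichever pair is chosen, both offspring still denote the original string, which is exactly the constraint described above.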
5.5 Optimization
As discussed above, the mutation and crossover operators transform subtrees. When these operators are applied, the resulting subtrees may exhibit structures of a form suitable for optimization. For instance, suppose a subtree of the form con(iter(b, 2), a, con(b, b)) is transformed by one of the operators into the subtree con(iter(b, 2), a, iter(b, 2)). This improves the complexity of the subtree. Unfortunately, starting from this new subtree, the expected LLP expression symodd(iter(b, 2), a) cannot be obtained.

The crossover operator only helps with this problem if there is already a subtree that encodes that specific substring with a symodd structure. The problem could in fact be solved by applying the mutation operator to the con structure; however, the probability that an application of the mutation operator generates the symodd structure is small.

In order to solve this problem, a simple optimization procedure is called after each application of the mutation and crossover operators. This procedure uses simple heuristics to optimize con structures. First, the procedure checks whether the (entire) con structure is symmetrical, and changes it into a symodd or symeven structure if possible. If this is not the case, the procedure checks whether neighboring structures that are similar can be combined. For example, a structure of the form con(c, iter(b, 2), iter(b, 3)) can be optimized to con(c, iter(b, 5)). This kind of optimization is also applied to altleft and altright structures.
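The two heuristics for con structures can be sketched as follows. This is an assumed illustration, not the authors' implementation: it merges neighbouring iterations of the same body and rewrites a symmetric argument list into a symodd or symeven structure.

```python
# Sketch of the optimization pass on the argument list of a con.

def optimize_con(args):
    """Return an optimized expression equivalent to con(*args)."""
    # 1. merge neighbours: con(..., iter(b,2), iter(b,3), ...)
    #    becomes con(..., iter(b,5), ...)
    out = []
    for a in args:
        a = a if isinstance(a, tuple) and a[0] == 'iter' else ('iter', a, 1)
        if out and out[-1][1] == a[1]:               # same iterated body
            out[-1] = ('iter', a[1], out[-1][2] + a[2])
        else:
            out.append(a)
    out = [t[1] if t[2] == 1 else t for t in out]    # drop iter(x, 1)
    # 2. a (structurally) palindromic list becomes symodd or symeven
    if len(out) > 1 and out == out[::-1]:
        half = out[:len(out) // 2]
        body = half[0] if len(half) == 1 else ('con',) + tuple(half)
        if len(out) % 2:                             # odd: middle element
            return ('symodd', body, out[len(out) // 2])
        return ('symeven', body)
    return ('con',) + tuple(out)

assert optimize_con(['c', ('iter', 'b', 2), ('iter', 'b', 3)]) == \
       ('con', 'c', ('iter', 'b', 5))
assert optimize_con(['a', 'b', 'a']) == ('symodd', 'a', 'b')
```

The symmetry test here compares subexpressions structurally; a semantic variant would compare the strings they denote, as the crossover operator does.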
6 Experiments
In this section we discuss some preliminary experiments. The example strings we consider are short and are designed to illustrate what type of structures are interesting for this domain. The choice of the GP parameter values used in the experiments is determined by the type of strings considered. Because the strings are short, a small pool size of 50 individuals is used. Making the pool very large would make the GP perform better, but the initialized pool would then probably already contain the most preferred structure. The number of iterations is also kept small, at 150, to avoid generating all possible structures. This allows us to draw preliminary conclusions about the performance of the GP.

Two important parameters of the GP are the mutation and crossover rates. We have done a few test runs to find a setting that produced good results, and have set the mutation rate to 0.6 and the crossover rate to 0.4. The mutation rate is deliberately set higher, because this operator is the most important for discovering structures. The crossover operator is used to swap substructures between good chromosomes.

We have chosen six different short strings that contain structures of interest to our search problem. Moreover, two longer strings are considered. For the two long strings the mutation and crossover rates specified above are used, but the pool size and the number of generations are both set to 300. The eight strings are the codes for the linear line patterns illustrated in Figure 5.
Figure 5: Line drawings used in experiments.
The algorithm is run on each string a number of times using different random seeds. The resulting structures are given in Figure 7, where the structures and fitnesses of the two best elements of the final population are reported. For each string, GPSPS is able to find the optimal structure. The results of runs with different seeds are very similar, indicating the (expected) robustness of the algorithm on these strings.
Figure 6 illustrates how the best fitness and the mean fitness of the population vary in a typical run of GPSPS on line pattern number 7 of Figure 5.

Figure 6: Best and Mean Fitness (fitness against generations, 0-300, for Linear Line Pattern 7).

On this pattern, the algorithm finds a near-optimum of rather good quality after about 50 generations, and spends the remaining 250 generations finding the slightly improved structure. In this experiment, about 12% of the crossovers failed. On average there were about 2.59 possible crossover pairs (with a standard deviation of 1.38) when the crossover operator was applicable.
The structures found are the most preferred structures as predicted by SIT. The system is thus capable of finding the perceived organizations of these line-drawing patterns.
7 Conclusion and Future Research
This paper discussed the problem of human visual perception and introduced a formalization of a theory of visual perception, called SIT. The claim of SIT is that it predicts the perceived organization of visual patterns on the basis of the simplicity principle. It is argued that a full computational model for SIT is computationally intractable and that heuristic methods are needed to compute the perceived organization of visual patterns.

We have applied genetic programming techniques to this formal theory of visual perception in order to compute the perceived organization of visual line patterns. Based on perceptually relevant operators from SIT, a pool of alternative organizations of an input pattern is generated. Motivated by SIT, mutation and crossover operations are defined that can be applied to these organizations to generate new organizations of the input pattern. Finally, a fitness function is defined that determines the appropriateness of the generated organizations. This fitness function is directly derived from SIT and measures the simplicity of organizations.

In this paper, we have focused on a small domain of visual linear line patterns. The next step is to extend our system to compute the perceived organization of more complex visual patterns, such as two-dimensional visual patterns, which are defined in terms of a variety of visual attributes such as color, size, position, texture, and shape.

Finally, we intend to investigate whether the class of structural regularities proposed by SIT is also relevant for finding meaningful organizations within patterns from biological experiments, such as DNA sequences. For this task, we will need to modify GPSPS in order to allow a group of letters to be treated as a primitive element.
References
[Bertin, 1981] Bertin, J. (1981). Graphics and Graphic Information-Processing. Walter de Gruyter, Berlin, New York.

[Boselie and Wouterlood, 1989] Boselie, F. and Wouterlood, D. (1989). The minimum principle and visual pattern completion. Psychological Research, 51:93-101.

[Buffart et al., 1981] Buffart, H., Leeuwenberg, E., and Restle, F. (1981). Coding theory of visual pattern completion. Journal of Experimental Psychology: Human Perception and Performance, 7:241-274.

[Dastani, 1998] Dastani, M. (1998). Ph.D. thesis, University of Amsterdam, The Netherlands.

[Hofstadter, 1984] Hofstadter, D. (1984). The Copycat project: An experiment in nondeterministic and creative analogies. A.I. Memo 755, Artificial Intelligence Laboratory, MIT, Cambridge, Mass.

[Kang and Ikeuchi, 1993] Kang, S. and Ikeuchi, K. (1993). Toward automatic robot instruction from perception: Recognizing a grasp from observation. IEEE Transactions on Robotics and Automation, 9(4):432-443.

[Koza, 1992] Koza, J. (1992). Genetic Programming. MIT Press.

[Leeuwenberg, 1971] Leeuwenberg, E. (1971). A perceptual coding language for visual and auditory patterns. American Journal of Psychology, 84:307-349.

[Mackinlay, 1986] Mackinlay, J. (1986). Automating the design of graphical presentations of relational information. ACM Transactions on Graphics, 5:110-141.

[Marks and Reiter, 1990] Marks, J. and Reiter, E. (1990). Avoiding unwanted conversational implicatures in text and graphics. In Proceedings AAAI, Menlo Park, CA.

[Michalewicz, 1996] Michalewicz, Z. (1996). Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag, Berlin.

[Van der Helm, 1994] Van der Helm, P. (1994). The dynamics of Pragnanz. Psychological Research, 56:224-236.

[Van der Helm and Leeuwenberg, 1986] Van der Helm, P. and Leeuwenberg, E. (1986). Avoiding explosive search in automatic selection of simplest pattern codes. Pattern Recognition, 19:181-191.

[Van der Helm and Leeuwenberg, 1991] Van der Helm, P. and Leeuwenberg, E. (1991). Accessibility: A criterion for regularity and hierarchy in visual pattern codes. Journal of Mathematical Psychology, 35:151-213.

[Van Leeuwen et al., 1988] Van Leeuwen, C., Buffart, H., and Van der Vegt, J. (1988). Sequence influence on the organization of meaningless serial stimuli: economy after all. Journal of Experimental Psychology: Human Perception and Performance, 14:481-502.

[Zhu, 1999] Zhu, S. (1999). Embedding gestalt laws in Markov random fields: a theory for shape modeling and perceptual organization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(11).
Figure 7: The two best structures found for each test string, with their complexities.

1. string: aAaAaAaAaAaAaA
   a) iter(con(a, A), 7)
   b) con(iter(con(a, A), 2), iter(con(a, A), 5))
   complexity: a) 2, b) 4

2. string: aAaBbAbBbAbBaAa
   a) symodd(altleft(a, <...>), B)
   b) symodd(con(symodd(a, A), altright(b, <B, A>)), B)
   complexity: a) 6, b) 6

3. string: aAaBaAaBaAaB
   a) iter(altleft(a, <A, B>), 3)
   b) iter(con(symodd(a, A), B), 3)
   complexity: a) 3, b) 3

4. string: aXaYaXaZbAcBcBc
   a) altleft(symodd(a, X),
Reducing Bloat and Promoting Diversity using Multi-Objective Methods
Edwin D. de Jong(1,2), Richard A. Watson(2), Jordan B. Pollack(2)
{edwin, richardw, pollack} email: [email protected]
(1) Vrije Universiteit Brussel, AI Lab, Pleinlaan 2, B-1050 Brussels, Belgium
(2) Brandeis University, DEMO Lab, Computer Science Dept., Waltham, MA 02454, USA
Category: Genetic Programming
Abstract
Two important problems in genetic programming (GP) are its tendency to find unnecessarily large trees (bloat), and the general evolutionary-algorithms problem that diversity in the population can be lost prematurely. The prevention of these problems is frequently an implicit goal of basic GP. We explore the potential of techniques from multi-objective optimization to aid GP by adding explicit objectives to avoid bloat and promote diversity. The even 3-, 4-, and 5-parity problems were solved efficiently compared to basic GP results from the literature. Even though only non-dominated individuals were selected and populations thus remained extremely small, appropriate diversity was maintained. The size of individuals visited during search consistently remained small, and solutions of what we believe to be the minimum size were found for the 3-, 4-, and 5-parity problems.
Keywords: genetic programming, code growth,
bloat, introns, diversity maintenance, evolutionary
multi-objective optimization, Pareto optimality
1 INTRODUCTION
A well-known problem in genetic programming (GP) is the tendency to find larger and larger programs over time (Tackett, 1993; Blickle & Thiele, 1994; Nordin & Banzhaf, 1995; McPhee & Miller, 1995; Soule & Foster, 1999), called bloat or code growth. This is harmful since it results in larger solutions than necessary. Moreover, it increasingly slows down the rate at which new individuals can be evaluated. Thus, keeping the size of the trees that are visited small is generally an implicit objective of GP.
Another important issue in GP and in other methods of evolutionary computation is how diversity of the population can be achieved and maintained. A population that is spread out over promising parts of the search space has a better chance of finding a solution than one that is concentrated on a single fitness peak. Since members of a diverse population solve parts of the problem in different ways, such a population may also be more likely to discover partial solutions that can be utilized through crossover. Diversity is not an objective in the conventional sense; it applies to the populations visited during the search, not to final solutions. A less obvious idea, then, is to view the contribution of individuals to population diversity as an objective.
Multi-objective techniques are specifically designed for problems in which knowledge about multiple objectives is available; see e.g. Fonseca and Fleming (1995) for an overview. The main idea of this paper is to use multi-objective techniques to add the objectives of size and diversity to the usual objective of a problem-specific fitness measure. A multi-objective approach to bloat appears promising and has been used before (Langdon, 1996; Rodriguez-Vazquez, Fonseca, & Fleming, 1997), but has not become standard practice. The reason may be that basic multi-objective methods, when used with small tree size as an objective, can result in premature convergence to small individuals (Langdon & Nordin, 2000; Ekart, 2001). We therefore investigate the use of a size objective in combination with explicit diversity maintenance.
The remaining sections discuss the n-parity problem (2), bloat (3), multi-objective methods (4), diversity maintenance (5), the ideas behind the approach, called FOCUS (6), algorithmic details (7), results (8), and conclusions (9).
2 THE N-PARITY PROBLEM
The test problems that will be used in this paper are
even n-parity problems, with n ranging from 3 to 5.
A correct solution to this problem takes a binary se-
quence of length n as input and returns true (one) if
[Figure 1 diagram: a tree with root OR, whose children are NOR(X0, X1) and AND(X0, X1).]

Figure 1: A correct solution to the 2-parity problem
the number of ones in the sequence is even, and false
(zero) if it is odd. It is named even to avoid confusion
with the related odd parity problem, which gives the
inverse answer. Trees may use the following boolean
operators as internal nodes: AND, OR, NAND, and
NOR. Each leaf specifies an element of the sequence. The fitness is the fraction of all possible length-n binary sequences for which the program returns the correct answer. Figure 1 shows an example.
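To make the fitness measure concrete, here is a minimal sketch (representation and function names are our own, not from the paper) that scores a candidate tree on all 2^n inputs:

```python
from itertools import product

# Boolean operators allowed as internal nodes.
OPS = {
    "AND":  lambda a, b: a and b,
    "OR":   lambda a, b: a or b,
    "NAND": lambda a, b: not (a and b),
    "NOR":  lambda a, b: not (a or b),
}

def evaluate(tree, bits):
    """Evaluate a tree given as ("OP", left, right); a leaf is an int index
    into the input sequence."""
    if isinstance(tree, int):
        return bool(bits[tree])
    op, left, right = tree
    return OPS[op](evaluate(left, bits), evaluate(right, bits))

def fitness(tree, n):
    """Fraction of all 2^n inputs for which the tree returns the even parity."""
    correct = 0
    for bits in product((0, 1), repeat=n):
        target = sum(bits) % 2 == 0      # even parity: even number of ones
        if evaluate(tree, bits) == target:
            correct += 1
    return correct / 2 ** n

# The tree of Figure 1, OR(NOR(X0, X1), AND(X0, X1)), solves 2-parity.
figure1 = ("OR", ("NOR", 0, 1), ("AND", 0, 1))
```

With this encoding, `fitness(figure1, 2)` evaluates to 1.0, i.e. a correct 2-parity solution.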
The n-parity problem has been selected because it is a difficult problem that has been used by a number of researchers. With increasing order, the problem quickly becomes more difficult. One way to understand its hardness is that for any setting of the bits, flipping any bit inverts the outcome of the parity function. Equivalently, its Karnaugh map (Zissos, 1972) equals a checkerboard function, and thus has no adjacencies.
2.1 SIZE OF THE SMALLEST
SOLUTIONS TO N-PARITY
We believe that the correct solutions to n-parity constructed as follows are of minimal size, but are not able to prove this. The principle is to recursively divide the bit sequence in half, take the parity of each half, and feed these two into a parity function. For subsequences of size one, i.e. single bits, the bit itself is used instead of its parity. When this occurs for one of the two arguments, the outcome would be inverted, and thus the odd 2-parity function is used to obtain the even 2-parity of the bits.
Let S be a binary sequence of length |S| = n >= 2. S is divided in half, yielding two subsequences L and R with, for even n, length n/2 each or, for odd n, lengths (n-1)/2 and (n+1)/2. Then the following recursively defined function P(S) gives a correct expression for the even parity of S for |S| >= 2 in terms of the above operators:

    P(S) = S                   if |S| = 1
    P(S) = ODD(P(L), P(R))     if |S| > 1 and g(L, R)
    P(S) = EVEN(P(L), P(R))    otherwise

where ODD(A, B) = NOR(AND(A, B), NOR(A, B)), EVEN(A, B) = OR(AND(A, B), NOR(A, B)), and

    g(A, B) = TRUE    if (|A| = 1) XOR (|B| = 1)
    g(A, B) = FALSE   otherwise
Table 1: Length of the shortest solution to n-parity using the operators AND, OR, NAND, and NOR.

    n       1   2   3    4    5    6    7
    Length  3   7   19   31   55   79   103
The length |P(S)| of the expression P(S) satisfies:

    |P(S)| = 1                          for |S| = 1
    |P(S)| = 3 + 2|P(L)| + 2|P(R)|      for |S| > 1

For n = 2^i, i > 0, this expression can be shown to equal 2n^2 - 1. Table 1 gives the lengths of the expressions for the first seven even-n-parity problems. For |S| = 1, the shortest expression is NOR(S, S); for |S| > 1, the length is given by the above expression. The rapid growth with increasing order stems from the repeated doubling of the required inputs.
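The recursive construction and the length recurrence can be checked directly. The sketch below is our own code, not the authors'; `P` builds the expression over leaf indices and `size` counts its nodes (each argument of ODD/EVEN appears twice, matching the factor 2 in the recurrence):

```python
def P(indices):
    """Recursive even-parity expression over leaf indices, mirroring the
    definition in the text, using only AND, OR, NAND, and NOR."""
    if len(indices) == 1:
        return indices[0]                      # single bit: the bit itself
    mid = len(indices) // 2                    # split in half ((n-1)/2 for odd n)
    L, R = indices[:mid], indices[mid:]
    a, b = P(L), P(R)
    if (len(L) == 1) != (len(R) == 1):         # g(L, R): exactly one raw bit
        # one argument is inverted relative to its parity: odd 2-parity
        return ("NOR", ("AND", a, b), ("NOR", a, b))
    return ("OR", ("AND", a, b), ("NOR", a, b))  # even 2-parity

def size(tree):
    """Number of nodes in the expression tree (shared subtrees counted twice)."""
    if isinstance(tree, int):
        return 1
    return 1 + size(tree[1]) + size(tree[2])
```

Evaluating `size(P(list(range(n))))` for n = 2..7 reproduces the lengths 7, 19, 31, 55, 79, 103 of Table 1.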
3 THE PROBLEM OF BLOAT
A well-known problem, known as bloat or code growth,
is that the trees considered during a GP run grow
in size and become larger than is necessary to rep-
resent good solutions. This is undesirable because it
slows down the search by increasing evaluation and
manipulation time and, if the growth consists largely
of non-functional code, by decreasing the probability
that crossover or mutation will change the operational
part of the tree. Also, compact trees have been linked
to improved generalization (Rosca, 1996).
Several causes of bloat have been suggested. First, under certain restrictions (Soule, 1998), crossover favors smaller than average subtrees in removal but not in replacement. Second, larger trees are more likely to produce fit (and large) offspring because non-functional code can play a protective role against crossover (Nordin & Banzhaf, 1995) and, if the probability of mutating a node decreases with increasing tree size, against mutation. Third, the search space contains more large than small individuals (Langdon & Poli, 1998).
Nordin and Banzhaf (1995) observed that the length of the effective part of programs decreases over time. However, the total length of the programs in the experiments also increased rapidly, and hence it may be concluded that in those experiments bloat was mainly due to growth of ineffective code (introns).
Finally, it is conceivable that in some circumstances
non-functional code may be useful. It has been sug-
gested that introns may be useful for retaining code
that is not used in the current individual but is a
helpful building block that may be used later (Nordin,
Francone, & Banzhaf, 1996).
Table 2: Properties of the basic GP method used.

    Problem            3-Parity
    Fitness            Fraction of correct answers
    Operators          AND, OR, NAND, and NOR
    Stop criterion     500,000 evaluations or solution
    Initial tree size  Uniform [1..20] internal nodes
    Cycle              Generational
    Population size    1000
    Parent selection   Boltzmann with T = 0.1
    Replacement        Complete
    Uniqueness check   Individuals occur at most once
    P(crossover)       0.9
    P(mutation)        0.1
    Mutation method    Mutate node with P = 1/n
[Figure 2 plot: average tree size (0 to 700 nodes) against number of fitness evaluations (0 to 100,000), with reference lines for the size of the smallest correct tree and the fraction of runs that yielded a solution.]

Figure 2: Average tree sizes of ten different runs (solid lines) using basic GP on the 3-parity problem.
3.1 OBSERVATION OF BLOAT USING
BASIC GP
To confirm that bloat does indeed occur in the test problem of n-parity using basic GP, thirty runs were performed for the 3-parity problem. The parameters of the runs are shown in Table 2. A run ends when a correct solution has been found. Figure 2 shows that average tree sizes increase rapidly in each run. If a solution is not found at an early point in the run, bloating rapidly increases the sizes of the trees in the population, thus increasingly slowing down the search. A single run of 111,054 evaluations already took more than 15 hours on a current PC running Linux due to the increasing amount of processing required per tree as a result of bloat. The population of size-unlimited trees that occurred in the single 4-parity run that was tried (with trees containing up to 6,000 nodes) filled virtually the entire swap space and caused performance to degrade to impractical levels. Clearly, the problem of bloat must be addressed in order to solve these and higher-order versions of the problem in an efficient manner.
[Figure 3 plot: average tree size and fraction of successful runs against number of fitness evaluations (0 to 100,000), with reference lines for the minimum size of a correct tree and the fraction of runs that yielded a solution.]

Figure 3: Average tree sizes and fraction of successful runs in the 3-parity problem using basic GP with a tree size limit of 200. Tree sizes are successfully limited, of course, but the approach is not ideal (see text).
3.2 USING A FIXED TREE SIZE LIMIT
Probably the most common way to avoid bloat is to
simply limit the allowed tree size or depth (Langdon &
Poli, 1998; Koza, 1992), although the latter has been
found to lead to loss of diversity near the root node
when used with crossover (Gathercole & Ross, 1996).
Figure 3 shows the effect of using a limit of 200 on 3-parity. This limit is well above the minimum size of a correct solution, but not too high either, since several larger solutions were found in the unrestricted run. The average tree size is around 140 nodes.
On the 4-parity problem (with a tree size limit of 200), the average tree size varied around 150. However, whereas on 3-parity 90% of the runs found a solution within 100,000 evaluations, on 4-parity only 33% of the runs found a solution within 500,000 evaluations, testifying to the increased difficulty of this order of the parity problem. For 5-parity, basic GP found no solutions within 1,000,000 evaluations for any of the 30 runs. Thus, our version of GP with a fixed tree size limit does not scale up well. Furthermore, a fundamental problem with this method of preventing bloat is that the maximum tree size has to be selected before the search, when it is often unknown.
3.3 WEIGHTED SUM OF FITNESS AND
SIZE
Instead of choosing a fixed tree size limit in advance, one would rather have the algorithm search for trees that can be as large as they need to be, but not much larger. A popular approach that goes some way towards this goal is to include a component in the fitness that rewards small trees or programs. This is mostly done by adding a component to the fitness, thus making fitness a linear combination of a performance measure and a parsimony measure (Koza, 1992; Soule, Foster, & Dickinson, 1996). However, this approach is not without its own problems (Soule & Foster, 1999).
[Figure 4 diagram: non-dominated individuals on a concave tradeoff surface over Objective 1 and Objective 2, with the highest isoclines of the weighted sum that cross an individual and the direction in which the weighted sum increases.]

Figure 4: Schematic rendition of a concave tradeoff surface. This occurs when better performance in one objective means worse performance in the other, and vice versa. The lines mark the maximum fitness individuals for three example weightings (see vectors) using a linear weighting of the objectives. No linear weighting exists that finds the in-between individuals, with reasonable performance in both objectives.
First, the weight of the parsimony measure must be determined beforehand, and so a choice concerning the tradeoff between size and performance is already made before the search. Furthermore, if the tradeoff surface between the two fitness components is concave1 (see Fig. 4), a linear weighting of the two components favors individuals that do well in one of the objectives, but excludes individuals that perform reasonably in both respects (Fleming & Pashkevich, 1985).
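A small numeric sketch (the points below are illustrative, not taken from the paper) shows why no linear weighting can select a compromise point on a concave front:

```python
# Three non-dominated points on a concave (for maximization) tradeoff
# surface: B performs reasonably in both objectives but lies below the
# straight line between the extremes A and C.
A, B, C = (1.0, 0.0), (0.4, 0.4), (0.0, 1.0)
points = [A, B, C]

def best(points, w):
    """Point maximizing the weighted sum w*obj1 + (1 - w)*obj2."""
    return max(points, key=lambda p: w * p[0] + (1 - w) * p[1])

# Sweep the weight over its whole range: the compromise point B is never
# selected, because its weighted sum (always 0.4) is beaten by max(w, 1-w).
winners = {best(points, w / 100) for w in range(101)}
```

For every weight, one of the extreme points wins, so `winners` contains only A and C, although B is Pareto optimal.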
Soule and Foster (1999) have investigated why a linear weighting of fitness and size has yielded mixed results. It was found that a weight value that adequately balances fitness and size is difficult to find. However, if the required balance is different for different regions in objective space, then adequate parsimony pressure cannot be specified using a single weight. If this is the case, then methods should be used that do not attempt to find such a single balance. This idea forms the basis of multi-objective optimization.
4 MULTI-OBJECTIVE METHODS
After several early papers describing the idea of optimizing for multiple objectives in evolutionary computation (Schaffer, 1985; Goldberg, 1989), the approach has recently received increasing attention (Fonseca & Fleming, 1995; Van Veldhuizen, 1999). The basic idea is to search for multiple solutions, each of which satisfies the different objectives to different degrees. Thus, the selection of the final solution with a particular combination of objective values is postponed until a time when it is known what combinations exist.
A key concept in multi-objective optimization is that of dominance. Let individual x_A have values A_i for the n objectives, and individual x_B have objective values B_i. Then A dominates B if

    for all i in [1..n]: A_i >= B_i, and there exists an i with A_i > B_i.

1Since fitness is to be maximized, the tradeoff curve shown is concave.
Multi-objective optimization methods typically strive
for Pareto optimal solutions, i.e. individuals that are
not dominated by any other individuals.
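For maximized objectives, this dominance test can be written down directly; the sketch below uses names of our own choosing:

```python
def dominates(a, b):
    """Pareto dominance for maximization: a dominates b if a is at least
    as good in every objective and strictly better in at least one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))
```

For example, `(1.0, 0.5)` dominates `(0.9, 0.5)`, but two identical vectors do not dominate each other, since no objective is strictly better.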
5 DIVERSITY MAINTENANCE
A key difference between classic search methods and evolutionary approaches is that in the latter a population of individuals is maintained. The idea behind this is that by maintaining individuals in several regions of the search space that look promising (diversity maintenance), there is a higher chance of finding useful material from which to construct solutions.
In order to maintain the existing diversity of a population, evolutionary methods typically keep some or many of the individuals that happen to have been generated and have relatively high fitness, though lower than the best found so far. In the same way, evolutionary multi-
objective methods usually keep some dominated indi-
viduals in addition to the non-dominated individuals
(Fonseca & Fleming, 1993). However, this appears to
be a somewhat arbitrary way of maintaining diversity.
In the following section, we present a more directed
method. The relation to other diversity maintenance
methods is discussed.
6 THE FOCUS METHOD
We propose to do diversity maintenance by using a
basic multi-objective algorithm and including an ob-
jective that actively promotes diversity. To the best
of our knowledge, this idea has not been used in other
work, including multi-objective research. If it works
well, the need for keeping arbitrary dominated indi-
viduals may be avoided. To test this, we use the di-
versity objective in combination with a multi-objective
method that only keeps non-dominated individuals, as
reported in section 8.
The approach strongly directs the attention of the search towards the explicitly specified objectives. We therefore name this method FOCUS, which stands for Find Only and Complete Undominated Sets, reflecting the fact that populations only contain non-dominated individuals, and contain all such individuals encountered so far. Focusing on non-dominated individuals combines naturally with the idea that the objectives are responsible for exploration, and this combination defines the FOCUS method.
The concept of diversity applies to populations, meaning that they are dispersed. To translate this aim into an objective for individuals, a metric has to be defined that, when optimized by individuals, leads to diverse populations. The metric used here is that of average squared distance to the other members of the population. When this measure is maximized, individuals are driven away from each other.
Interestingly, the average distance metric strongly depends on the current population. If the population were centered around a single central peak in the fitness landscape, then individuals that moved away from that peak could survive by satisfying the diversity objective better than the individuals around the fitness peak. It might be expected that this would cause large parts of the population to occupy regions that are merely far away from other individuals but are not relevant to the problem. However, if there are any differences in fitness in the newly explored region of the search space, then the fitter individuals will come to replace individuals that merely performed well on diversity. When more individuals are created in the same region, the potential for scoring highly on diversity for those individuals diminishes, and other areas will be explored. The dynamics thus created are a new way to maintain diversity.
Other techniques that aim to promote diversity in a directed way exist, and include fitness sharing (Goldberg & Richardson, 1987; Deb & Goldberg, 1989), deterministic crowding (Mahfoud, 1995), and fitness derating (Beasley, Bull, & Martin, 1993). A distinguishing feature of the method proposed here is that in choosing the diversity objective, problem-based criteria can be used to determine which individuals should be kept for exploration purposes.
7 ALGORITHM DETAILS
The algorithm selects individuals if and only if they are
not dominated by other individuals in the population.
The population is initialized with 300 randomly cre-
ated individuals of 1 to 20 internal nodes. A cycle
proceeds as follows. A chosen number n of new indi-
viduals (300) is generated based on the current popu-
lation using crossover (90%) and mutation (10%). If
the individual already exists in the population, it is
mutated. If the result also exists, it is discarded. Oth-
erwise it is added to the population. All individuals
are then evaluated if necessary. After evaluation, all
population members are checked against other popu-
lation members, and removed if dominated by any of
them.
A slightly stricter criterion than Pareto's is used: A dominates B if for all i in [1..n]: A_i >= B_i. Of multiple individuals occupying the same point on the tradeoff surface, precisely one will remain, since the removal criterion is applied sequentially. This criterion was used because the Pareto criterion caused a proliferation of individuals occupying the same point on the tradeoff surface when no diversity objective was used2.
2In later experiments including the diversity objective, this proliferation was not observed, and the standard Pareto criterion also worked satisfactorily.
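The sequential removal step with the stricter criterion can be sketched as follows (our own code; `objectives` is assumed to map an individual to its objective vector):

```python
def strictly_dominates(a, b):
    """Stricter criterion used here: a dominates b if a is at least as
    good in every objective. Identical vectors dominate each other."""
    return all(x >= y for x, y in zip(a, b))

def remove_dominated(population, objectives):
    """Remove, one individual at a time, everyone dominated by another
    remaining individual. Of several individuals with identical objective
    vectors, exactly one survives, because removal is sequential."""
    survivors = []
    for i, ind in enumerate(population):
        objs = objectives(ind)
        others = survivors + population[i + 1:]
        if not any(strictly_dominates(objectives(o), objs) for o in others):
            survivors.append(ind)
    return survivors
```

For instance, of two individuals with the identical vector (1.0, 0.2), only one survives, while a non-dominated (0.9, 0.3) is kept alongside it.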
[Figure 5 plot: average tree size and fraction of successful runs against number of fitness evaluations (0 to 100,000), with reference lines for the minimum size of a correct tree and the fraction of runs that yielded a solution.]
Figure 5: Average tree size and fraction of successful runs for the [fitness, size, diversity] objective vector on the 3-parity problem. The trees are much smaller than for basic GP, and solutions are found faster.
The following distance measure is used in the diversity
objective. The distance between two corresponding
nodes is zero if they are identical and one if they are
not. The distance between two trees is the sum of the
distances of the corresponding nodes, i.e. nodes that
overlap when the two trees are overlaid, starting from
the root. The distance between two trees is normalized
by dividing by the size of the smaller tree of the two.
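A sketch of this distance measure, assuming trees are represented as `(label, left, right)` tuples with string leaf labels (our representation, not the paper's):

```python
def tree_size(tree):
    """Number of nodes in a tree; a bare string is a leaf."""
    if isinstance(tree, str):
        return 1
    return 1 + tree_size(tree[1]) + tree_size(tree[2])

def overlap_distance(t1, t2):
    """Sum of node-wise distances over the part where the two trees
    overlap when overlaid from the root: 0 if the labels at a position
    are identical, 1 if they differ."""
    label1 = t1 if isinstance(t1, str) else t1[0]
    label2 = t2 if isinstance(t2, str) else t2[0]
    d = 0 if label1 == label2 else 1
    if not isinstance(t1, str) and not isinstance(t2, str):
        d += overlap_distance(t1[1], t2[1]) + overlap_distance(t1[2], t2[2])
    return d

def distance(t1, t2):
    """Overlap distance normalized by the size of the smaller tree."""
    return overlap_distance(t1, t2) / min(tree_size(t1), tree_size(t2))
```

Identical trees are at distance 0; OR(X0, X1) and AND(X0, X1) differ only at the root, giving distance 1/3 after normalization.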
8 EXPERIMENTAL RESULTS
In the following experiments we use fitness, size, and diversity as objectives. The implementation of the objectives is as follows. Fitness is the fraction of all 2^n input combinations handled correctly. For size, we use 1 over the number of nodes in the tree as the objective value. The diversity objective is the average squared distance to the other population members.
8.1 USING FITNESS, SIZE, AND
DIVERSITY AS OBJECTIVES
Fig. 5 shows the graph of Fig. 3 for the method of using fitness, size, and diversity as objectives. The average tree size remains extremely small. In addition, a glance at the graphs indicates that correct solutions are found more quickly. To determine whether this is indeed the case, we compute the computational effort, i.e. the expected number of evaluations required to yield a correct solution with 99% probability, as described in detail by Koza (1994).
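A sketch of this computation (our own code; the exact definition is Koza's, and we treat each successful run's evaluation count as a checkpoint):

```python
from math import ceil, log

def computational_effort(success_evals, total_runs, z=0.99):
    """Koza-style computational effort: the minimum, over checkpoints e,
    of e * R(e), where R(e) is the number of independent runs of length e
    needed to find a solution with probability z, given the empirical
    success probability P(e) at checkpoint e.

    success_evals: evaluation counts at which successful runs found a
    solution; total_runs: number of runs performed in all.
    """
    best = None
    for e in sorted(success_evals):
        p = sum(1 for s in success_evals if s <= e) / total_runs
        runs_needed = ceil(log(1 - z) / log(1 - p)) if p < 1 else 1
        effort = e * runs_needed
        best = effort if best is None else min(best, effort)
    return best
```

For example, with 4 runs of which two succeed at 10,000 and 20,000 evaluations, the best checkpoint is 20,000 evaluations (P = 0.5, 7 runs needed), giving E = 140,000.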
The impression that correct solutions to 3-parity are found more quickly for the multi-objective approach (see Figure 6) is confirmed by considering the computational effort E; whereas GP with the tree size limit requires 72,044 evaluations, the multi-objective approach requires 42,965 evaluations. For the 4-parity problem, the difference is larger; basic GP needs
[Figure 6 plot: probability of a correct solution P and expected required evaluations I against evaluations, for the multi-objective method and basic GP; GP: E = 72,044, MO: E = 42,965.]

Figure 6: Probability of finding a solution and computational effort for 3-parity using basic GP and the multi-objective method.
[Figure 7 plot: probability of a correct solution P and expected required evaluations I against evaluations, for the multi-objective method and basic GP; MO: E = 238,856, GP: E = 5,410,550.]

Figure 7: Probability of finding a solution and computational effort for 4-parity for basic GP and the multi-objective method. The performance of the multi-objective method is considerably superior.
5,410,550 evaluations, whereas the multi-objective approach requires only 238,856. This is a dramatic improvement, and demonstrates that our method can be very effective.
Finally, experiments have been performed using the even more difficult 5-parity problem. For this problem, basic GP did not find any correct solutions within a million evaluations. The multi-objective method did find solutions, and did so reasonably efficiently, requiring a computational effort of 1,140,000 evaluations.
Table 3 summarizes the results of the experiments. Considering the average size of correct solutions on 3-parity, the multi-objective method outperforms all methods that have been compared, as the first solution it finds has 30.4 nodes on average. What's more, the multi-objective method also requires a smaller number of evaluations to do so than the other methods. Finally, perhaps most surprisingly, it finds correct solutions using extremely small populations, typically containing less than 10 individuals. For example, the average population size over the whole experiment for 3-parity was 6.4, and 8.5 at the end of the experiment,
Table 3: Results of the experiments (GP and Multi-objective rows). For comparison, results of Koza's (1994) set of experiments (population size 16,000) and the best results with other configurations (population size 4,000) found there. E: computational effort, S: average tree size of first solution, Pop: average population size.

    3-parity         E          S      Pop
    GP               72,044     93.67  1000
    Multi-objective  42,965     30.4   6.4
    Koza GP          96,000     44.6   16,000
    Koza GP-ADF      64,000     48.2   16,000

    4-parity         E          S      Pop
    GP               5,410,550  154    1000
    Multi-objective  238,856    68.5   15.8
    Koza GP          384,000    112.6  16,000
    Koza GP-ADF      176,000    60.1   16,000

    5-parity         E          S      Pop
    GP               (1)        n.a.   n.a.
    Multi-objective  1,140,000  218.7  49.7
    Koza GP          6,528,000  299.9  16,000
    Koza GP          1,632,000  299.9  4,000
    Koza GP-ADF      464,000    156.8  16,000
    Koza GP-ADF      272,000    99.5   4,000

(1) No solutions were found for 5-parity using basic GP.
and the highest population size encountered in all 30 runs was 18. This suggests that the diversity maintenance achieved by using this greedy multi-objective method in combination with an explicit diversity objective is effective, since even extremely small populations did not result in premature convergence.
Considering 4- and 5-parity, GP extended with the size and diversity objectives outperforms both the basic GP methods used by Koza (1994) and the basic GP method tested here, both in terms of computational effort and tree size. The Automatically Defined Function (ADF) experiments performed by Koza for these and larger problem sizes perform better. These probably benefit from the inductive bias of ADFs, which favors a modular structure. Therefore, a natural direction for future experiments is to also extend ADFs with size and diversity objectives.
For comparison, we also implemented an evolutionary
multi-objective technique that does keep some domi-
nated individuals. It used the number of individuals by
which an individual is dominated as a rank, similar to
the method described by Fonseca and Fleming (1993).
The results were similar in terms of evaluations, but the method keeping strictly non-dominated individuals worked faster, probably due to the calculation of the distance measure. Since this calculation is quadratic in the population size, the small populations of the strictly non-dominated method save much time (about a factor of 7 for 5-parity), which made it preferable.
As a control experiment, we also investigated whether the diversity objective is really required, by using only fitness and size as objectives with the algorithm that was described. The individuals found are small (around 10 nodes), but their fitness was well below that of basic GP, and hence the diversity objective was indeed performing a useful function in the experiments.
8.2 OBTAINING STILL SMALLER
SOLUTIONS
Finally, we investigate whether the algorithm is able to find smaller solutions after finding the first. After the first correct solution is found, we monitor the smallest correct solution. Although the first solution size of 30 was already low compared to other methods, the algorithm rapidly finds smaller correct solutions. The average size drops to 22 within 4,000 additional evaluations, and converges to around 20. The smallest tree (found in 12 out of 30 runs) was 19, i.e. equalling the presumed minimum size. On 4-parity, solutions dropped in size from the initial 68.5 to 50 in about 10,000 evaluations, and to 41 on average when runs were continued longer (85,000 evaluations). In 12 of the 30 runs, minimum size solutions (31 nodes) were found. Using the same method, a minimum size solution to 5-parity (55 nodes) was also found.
The quick convergence to smaller tree sizes shows that, at least for the problem at hand, the method is effective at finding small solutions when it is continued running after the first correct solutions have been found, in line with the seeding experiments by Langdon and Nordin (2000).
9 CONCLUSIONS
The paper has discussed using multi-objective methods as a general approach to avoiding bloat in GP and to promoting diversity, which is relevant to evolutionary algorithms in general. Since both of these issues are often implicit goals, a straightforward idea is to make them explicit by adding corresponding objectives. In the experiments that are reported, a size objective rewards smaller trees, and a diversity objective rewards trees that are different from other individuals in the population, as calculated using a distance measure.
Strongly positive results are reported regarding both size control and diversity maintenance. The method is successful in keeping the trees that are visited small without requiring a size limit or a relative weighting of fitness and size. It impressively outperforms basic GP on the 3-, 4-, and 5-parity problems both with respect to computational effort and tree size. Furthermore, correct solutions of what we believe to be the minimum size have been found for all problem sizes examined, i.e. the even 3-, 4-, and 5-parity problems.
The effectiveness of the new way of promoting diversity proposed here can be assessed from the following, which concerns the even 3-, 4-, and 5-parity problems. The multi-objective algorithm that was used only maintains individuals that are not dominated by other individuals found so far, and maintains all such individuals (except those with identical objective vectors). Thus, only non-dominated individuals are selected after each generation, and populations hence remained extremely small (6, 16, and 50 on average, respectively). Despite this uncommon degree of greediness or elitism, sufficient diversity was achieved to solve these problems efficiently in comparison with basic GP results, both as obtained here and as found in the literature. Control experiments in which the diversity objective was removed (leaving the fitness and size objectives) failed to maintain sufficient diversity, as would be expected.
The approach that was pursued here is to make desired characteristics of the search into explicit objectives using multi-objective methods. This method is simple and straightforward and performed well on the problem sizes reported, in that it improved the performance of basic GP on 3- and 4-parity. It solved 5-parity reasonably efficiently, even though basic GP found no solutions on 5-parity. For problem sizes of 6 and larger, basic GP is no longer feasible, and more sophisticated methods must be invoked that make use of modularity, such as Koza's Automatically Defined Functions (1994) or Angeline's GLiB (1992). We expect that the multi-objective approach with size and diversity as objectives that was followed here could also be of value when used in combination with these or other existing methods in evolutionary computation.
Acknowledgements
The authors would like to thank Michiel de Jong,
Pablo Funes, Hod Lipson, and Alfonso Renart for use-
ful comments and suggestions concerning this work.
Edwin de Jong gratefully acknowledges a Fulbright
grant.
References
Angeline, P. J., & Pollack, J. B. (1992). The evolutionary induction of subroutines. In Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society (pp. 236-241). Bloomington, Indiana, USA: Lawrence Erlbaum.

Beasley, D., Bull, D. R., & Martin, R. R. (1993). A sequential niche technique for multimodal function optimization. Evolutionary Computation, 1(2), 101-125.

Blickle, T., & Thiele, L. (1994). Genetic programming and redundancy. In J. Hopf (Ed.), Genetic Algorithms within the Framework of Evolutionary Computation (Workshop at KI-94, Saarbrücken) (pp. 33-38). Im Stadtwald, Building 44, D-66123 Saarbrücken, Germany: Max-Planck-Institut für Informatik (MPI-I-94-241).

Deb, K., & Goldberg, D. E. (1989). An investigation of niche and species formation in genetic function optimization. In J. D. Schaffer (Ed.), Proceedings of the 3rd International Conference on Genetic Algorithms (pp. 42-50). George Mason University: Morgan Kaufmann.

Ekart, A. (2001). Selection based on the Pareto nondomination criterion for controlling code growth in genetic programming. Genetic Programming and Evolvable Machines, 2, 61-73.

Fleming, P. J., & Pashkevich, A. P. (1985). Computer-aided control system design using a multiobjective optimization approach. In Proceedings of the IEE International Conference, Control '85 (pp. 174-179). Cambridge, UK.

Fonseca, C. M., & Fleming, P. J. (1993). Genetic algorithms for multiobjective optimization: Formulation, discussion and generalization. In S. Forrest (Ed.), Proceedings of the Fifth International Conference on Genetic Algorithms (ICGA'93) (pp. 416-423). San Mateo, California: Morgan Kaufmann Publishers.

Fonseca, C. M., & Fleming, P. J. (1995). An overview of evolutionary algorithms in multiobjective optimization. Evolutionary Computation, 3(1), 1-16.

Gathercole, C., & Ross, P. (1996). An adverse interaction between crossover and restricted tree depth in genetic programming. In J. R. Koza, D. E. Goldberg, D. B. Fogel, & R. L. Riolo (Eds.), Genetic Programming 1996: Proceedings of the First Annual Conference (pp. 291-296). Stanford University, CA, USA: MIT Press.

Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley.

Goldberg, D. E., & Richardson, J. (1987). Genetic algorithms with sharing for multimodal function optimization. In J. J. Grefenstette (Ed.), Genetic Algorithms and Their Applications: Proceedings of the Second International Conference on Genetic Algorithms (pp. 41-49). Hillsdale, NJ: Lawrence Erlbaum Associates.

Koza, J. R. (1992). Genetic Programming. Cambridge, MA: MIT Press.

Koza, J. R. (1994). Genetic Programming II: Automatic Discovery of Reusable Programs. Cambridge, MA: MIT Press.

Langdon, W. B. (1996). Advances in Genetic Programming 2. In P. J. Angeline & K. Kinnear (Eds.) (pp. 395-414). Cambridge, MA: MIT Press. (Chapter 20)

Langdon, W. B., & Nordin, J. P. (2000). Seeding GP populations. In R. Poli, W. Banzhaf, W. B. Langdon, J. F. Miller, P. Nordin, & T. C. Fogarty (Eds.), Genetic Programming, Proceedings of EuroGP'2000 (Vol. 1802, pp. 304-315). Edinburgh: Springer-Verlag.

Langdon, W. B., & Poli, R. (1998). Fitness causes bloat: Mutation. In W. Banzhaf, R. Poli, M. Schoenauer, & T. C. Fogarty (Eds.), Proceedings of the First European Workshop on Genetic Programming (Vol. 1391, pp. 37-48). Paris: Springer-Verlag.

Mahfoud, S. W. (1995). Niching methods for genetic algorithms. Unpublished doctoral dissertation, University of Illinois at Urbana-Champaign, Urbana, IL, USA. (IlliGAL Report 95001)

McPhee, N. F., & Miller, J. D. (1995). Accurate replication in genetic programming. In L. Eshelman (Ed.), Genetic Algorithms: Proceedings of the Sixth International Conference (ICGA95) (pp. 303-309). Pittsburgh, PA, USA: Morgan Kaufmann.

Nordin, P., & Banzhaf, W. (1995). Complexity compression and evolution. In L. Eshelman (Ed.), Genetic Algorithms: Proceedings of the Sixth International Conference (ICGA95) (pp. 310-317). Pittsburgh, PA, USA: Morgan Kaufmann.

Nordin, P., Francone, F., & Banzhaf, W. (1996). Explicitly defined introns and destructive crossover in genetic programming. In P. J. Angeline & K. E. Kinnear, Jr. (Eds.), Advances in Genetic Programming 2 (pp. 111-134). Cambridge, MA, USA: MIT Press.

Rodriguez-Vazquez, K., Fonseca, C. M., & Fleming, P. J. (1997). Multiobjective genetic programming: A nonlinear system identification application. In J. R. Koza (Ed.), Late Breaking Papers at the 1997 Genetic Programming Conference (pp. 207-212). Stanford University, CA, USA: Stanford Bookstore.

Rosca, J. (1996). Generality versus size in genetic programming. In J. R. Koza, D. E. Goldberg, D. B. Fogel, & R. L. Riolo (Eds.), Genetic Programming 1996: Proceedings of the First Annual Conference (pp. 381-387). Stanford University, CA, USA: MIT Press.
Scha�er, J. D. (1985). Multiple objective optimizationwith vector evaluated genetic algorithms. In J. J. Grefen-stette (Ed.), Proceedings of the 1st international conferenceon genetic algorithms and their applications (pp. 93{100).Pittsburgh, PA: Lawrence Erlbaum Associates.
Soule, T. (1998). Code growth in genetic programming.Unpublished doctoral dissertation, University of Idaho.
Soule, T., & Foster, J. A. (1999). E�ects of code growthand parsimony presure on populations in genetic program-ming. Evolutionary Computation, 6 (4), 293{309.
Soule, T., Foster, J. A., & Dickinson, J. (1996). Codegrowth in genetic programming. In J. R. Koza, D. E. Gold-berg, D. B. Fogel, & R. L. Riolo (Eds.), Genetic program-ming 1996: Proceedings of the �rst annual conference (pp.215{223). Stanford University, CA, USA: MIT Press.
Tackett, W. A. (1993). Genetic programming for featurediscovery and image discrimination. In S. Forrest (Ed.),Proceedings of the 5th international conference on geneticalgorithms, icga-93 (pp. 303{309). University of Illinois atUrbana-Champaign: Morgan Kaufmann.
Van Veldhuizen, D. A. (1999). Multiobjective Evolution-ary Algorithms: Classi�cations, Analyses, and New Inno-vations. Unpublished doctoral dissertation, Departmentof Electrical and Computer Engineering. Graduate Schoolof Engineering. Air Force Institute of Technology, Wright-Patterson AFB, Ohio.
Zissos, D. (1972). Logic design algorithms. London: OxfordUniversity Press.
Adaptive Genetic Programs via Reinforcement Learning
Keith L. Downing
Department of Computer Science
The Norwegian University of Science and Technology (NTNU)
7020 Trondheim, Norway
tele: (+47) 73 59 18 40
email: [email protected]
Abstract
Reinforced Genetic Programming (RGP) enhances standard tree-based genetic programming (GP) [7] with reinforcement learning (RL) [11]. Essentially, leaf nodes of GP trees become monitored action-selection points, while the internal nodes form a decision tree for classifying the current state of the problem solver. Reinforcements returned by the problem solver govern both fitness evaluation and intra-generation learning of the proper actions to take at the selection points. In theory, the hybrid RGP system promises mutual benefits to RL and GP in controller-design applications by, respectively, providing proper abstraction spaces for RL search and accelerating evolutionary progress via Baldwinian or Lamarckian mechanisms. In practice, we demonstrate RGP's improvements over standard GP search on maze-search tasks.
1 Introduction
The benefits of combining evolution and learning, while largely theoretical in the biological sciences, have found solid empirical verification in the field of evolutionary computation (EC). When evolutionary algorithms (EAs) are supplemented with learning techniques, general adaptivity improves such that the learning EA finds solutions faster than the standard EA [3, 16]. These enhancements can stem from biologically plausible mechanisms such as the Baldwin Effect [2, 14], or from disproven phenomena such as Lamarckianism [8, 4].
In most learning EAs, the data structure or program in which learning occurs is divorced from the structure that evolves. For example, a common learning EA is a hybrid genetic-algorithm (GA) and artificial neural network (ANN) system in which the GA encodes a basic ANN topology (plus possibly some initial arc weights), and the ANN then uses backpropagation or Hebbian learning to gradually modify those weights [17, 10, 6]. A Baldwin Effect is often evident in the fact that the GA-encoded weights improve over time, thus reducing the need for learning [1]. Lamarckianism can be added by reversing the morphogenic process and back-encoding the ANN's learned weights into the GA chromosome prior to reproduction [12].
Our primary objective is to realize Baldwinian and Lamarckian adaptivity within standard tree-based genetic programs [7], without the need for a complex morphogenic conversion to a separate learning structure. Hence, as the GP program runs, the tree nodes can adapt, thereby altering (and hopefully improving) subsequent runs of the same program. Thus, the typical problem domain is one in which each GP tree executes many times during fitness evaluation, for example, in control tasks.
2 RGP Overview
Reinforced Genetic Programming combines reinforcement learning [11] with conventional tree-based genetic programming [7]. This produces GP trees with reinforced action-choice leaf nodes, such that successive runs of the same tree exhibit improved performance on the fitness task. These improvements may or may not be reverse-encoded into the genomic form of the tree, thus facilitating tests of both Baldwinian and Lamarckian enhancements to GP.

The basic idea is most easily explained by example. Consider a small control program for a maze-wandering agent:
(if (between 0 x 5)
    (if (between 0 y 5)
        (choice (move-west) (move-north))    ; R1
        (choice (move-east) (move-south)))   ; R2
    (if (between 6 x 8)
        (choice (move-west) (move-east))     ; R3
        (choice (move-north) (move-south)))) ; R4
Figure 1 illustrates the relationship between this program and the 10x10 maze. Variables x and y specify the agent's current maze coordinates, while the choice nodes are monitored action decisions. The between predicate simply tests whether the middle argument lies within the closed range specified by the first and third arguments, while the move functions are discrete one-cell jumps. So if the agent's current location falls within the southwest region, R1, specified by the (between 0 x 5) and (between 0 y 5) predicates of the decision tree, then the agent can choose between a westward and a northward move; whereas the eastern edge gives a north-south option.
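For concreteness, the primitive semantics just described can be sketched in Python (the paper's actual primitives are Lisp functions; the coordinate convention below is an assumption, not stated in the paper):

```python
def between(lo, v, hi):
    """Closed-range test used by the decision tree's internal nodes."""
    return lo <= v <= hi

# Discrete one-cell jumps. Assumed convention: x grows eastward,
# y grows northward, matching the southwest-region reading above.
def move_north(x, y): return x, y + 1
def move_south(x, y): return x, y - 1
def move_east(x, y):  return x + 1, y
def move_west(x, y):  return x - 1, y
```

With this reading, an agent at (3, 3) satisfies both (between 0 x 5) and (between 0 y 5), so execution reaches choice node R1.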
During fitness testing, the agent will execute its tree code on each timestep and perform the recommended action in the maze, which then returns a reinforcement signal. For example, hitting a wall may invoke a small negative signal, while reaching a goal state would garner a large positive payback.
Initially, the choice nodes select randomly among their possible actions, but as the fitness test proceeds, each node accumulates reinforcement statistics as to the relative utility of each action (in the context of the particular location of the choice node in the decision tree, which reflects the location of the agent in the maze). After a fixed number of random free trials, which is a standard parameter in reinforcement-learning systems (RLSs), the node begins making stochastic action choices based on the reinforcement statistics. Hence, the node's initial exploration gives way to exploitation.
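A minimal sketch of this exploration-to-exploitation shift at a choice node might look as follows (hypothetical Python, not the paper's implementation; Section 3 describes the actual Q-learning machinery, and the softmax weighting here is an assumption):

```python
import math
import random

class ChoiceNode:
    """Monitored action-selection leaf: random free trials first,
    then stochastic choices weighted by accumulated reinforcement."""
    def __init__(self, actions, free_trials=10):
        self.actions = list(actions)
        self.free_trials = free_trials
        self.trials = 0
        self.value = {a: 0.0 for a in self.actions}  # running utility estimate
        self.count = {a: 0 for a in self.actions}

    def choose(self):
        self.trials += 1
        if self.trials <= self.free_trials:          # exploration phase
            return random.choice(self.actions)
        # Exploitation: softmax over mean reinforcement per action.
        weights = [math.exp(self.value[a]) for a in self.actions]
        r = random.random() * sum(weights)
        for a, w in zip(self.actions, weights):
            r -= w
            if r <= 0:
                return a
        return self.actions[-1]

    def reinforce(self, action, reward):
        # Incremental mean of reinforcements received after `action`.
        self.count[action] += 1
        self.value[action] += (reward - self.value[action]) / self.count[action]
```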
Along with determining the tree's internal decisions, the evolving genome sets the range for RL exploration by specifying the possible actions to the choice nodes; the RLS then fine-tunes the search. By including alternate forms of choice nodes in GP's primitive set, such as choice-4, choice-2, and choice-1 (direct action), where the integer denotes the number of action arguments, the RGP's learning effort comes under evolutionary control. Over many evolutionary generations, the genomes provide more appropriate decision trees and more restricted (yet more relevant) action options to the RLS.
In the maze domain, learning has an implicit cost due to the nature of the fitness function, which is based on
Figure 1: The genetic program determines a partitioning of the reinforcement-learning problem space.
the average reinforcement per timestep of the agent. So an agent that moves directly to a goal location (or follows a wall without any explorative "bumps" into it) will have higher average reinforcement than one that investigates areas off the optimal path. Initially, explorative learning helps the agent find the goal, but then evolution further hones the controllers to follow shorter paths to the goal, with little or no opportunity for stochastic action choices. Hence, the average reinforcement (i.e., fitness) steadily increases, first as a result of learning (phase I of the Baldwin Effect) and then as a result of genomic hard-wiring (phase II) encouraged by the implicit learning cost [9].
To exploit Lamarckianism, RGP can replace any choice node in the genomic tree with a direct action function for the action that was deemed best for that node. Hence, if the choice node for R1 in Figure 1 learns that north is the best move from this region (while the choices for R2 and R3 find eastward moves most profitable, and R4 learns the advantage of southward moves), then prior to reproduction, the genome can be specialized to:
(if (between 0 x 5)
    (if (between 0 y 5) (move-north) (move-east))
    (if (between 6 x 8) (move-east) (move-south)))
This represents an optimal control strategy for the example, with no time squandered on exploration.
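This back-encoding step can be sketched as a tree rewrite (an illustrative Python version with genomes as nested tuples; the function name and representation are assumptions, not the paper's code):

```python
def lamarckize(tree, best_action):
    """Replace each ('choice', ...) node with its learned best action
    prior to reproduction. `best_action` maps a choice node's tuple of
    candidate actions to the winner learned during fitness testing."""
    if not isinstance(tree, tuple):
        return tree
    if tree and tree[0] == 'choice':
        return best_action[tree[1:]]          # hard-wire the learned move
    return tuple(lamarckize(sub, best_action) for sub in tree)
```

Applied to the example genome with the learned winners for R1 through R4, this yields exactly the specialized tree shown above.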
3 Reinforcement Learning in RGP
Reinforcement Learning comes in many shapes and
forms, and the basic design of RGP supports many of
these variations. However, the examples in this paper
use Q-learning [15] with eligibility traces.
Q-learning is an off-policy temporal-differencing form of RL. In conventional RL terminology, $Q(s, a)$ denotes the value of choosing action $a$ while in state $s$. Temporal differencing implies that the update of $Q(s, a)$ for the current state, $s_t$, and most recent action, $a_t$, utilizes the difference between the current value of $Q(s_t, a_t)$ and the sum of a) the reward, $r_{t+1}$, received after executing action $a$ in state $s$, and b) the discounted value of the new state that results from performing $a$ in $s$. For the new state, $s_{t+1}$, its value, $V(s_{t+1})$, is based on the best possible action that can be taken from $s_{t+1}$, or $\max_a Q(s_{t+1}, a)$. Hence, the complete update equation is:

$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) \right]$ (1)

Here, $\gamma$ is the discount rate and $\alpha$ is the step size or learning rate. The expression in brackets is the temporal-difference error, $\delta_t$. Thus, if performing $a$ in $s$ leads to positive (negative) rewards and good (bad) next states, then $Q(s, a)$ will increase (decrease), with the degree of change governed by $\alpha$ and $\gamma$.
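Equation (1) can be rendered as a single update step (an illustrative Python sketch; the dictionary representation of Q and the parameter defaults are assumptions, not the paper's implementation):

```python
def q_update(Q, s_t, a_t, r_next, s_next, actions, alpha=0.1, gamma=0.9):
    """One Q-learning step per Equation (1).

    Q maps (state, action) pairs to values; `actions` lists the actions
    available in s_next, so max over them gives max_a Q(s_{t+1}, a).
    Returns the temporal-difference error delta_t."""
    v_next = max(Q.get((s_next, a), 0.0) for a in actions)
    delta = r_next + gamma * v_next - Q.get((s_t, a_t), 0.0)  # TD error
    Q[(s_t, a_t)] = Q.get((s_t, a_t), 0.0) + alpha * delta
    return delta
```

Unseen state-action pairs default to a value of 0.0, a common (but not universal) initialization choice.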
To implement these Q(s, a) updates (the core activity of Q-learning) within GP trees, RGP employs qstate objects, one per choice node. Each qstate houses a list of state-action pairs (SAPs), where the value slot of each SAP corresponds to Q(s, a). For each GP tree, a qtable object is generated; it keeps track of all qstates in the tree, as well as those most recently visited and the latest reinforcement signal.
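A minimal sketch of these bookkeeping structures (hypothetical Python; the names mirror the paper's qstate and qtable objects, but the fields and methods are assumptions):

```python
class QState:
    """Per-choice-node store of state-action pairs (SAPs); the value
    slot of each SAP plays the role of Q(s, a) for that node."""
    def __init__(self, node_id, actions):
        self.node_id = node_id
        self.q = {a: 0.0 for a in actions}   # SAP values for this node

class QTable:
    """Per-tree registry of all qstates, plus the recently visited
    choice nodes and the latest reinforcement signal."""
    def __init__(self):
        self.qstates = {}          # node_id -> QState
        self.recent = []           # recently visited (node_id, action) pairs
        self.last_reward = 0.0

    def register(self, node_id, actions):
        self.qstates[node_id] = QState(node_id, actions)
        return self.qstates[node_id]
```

The `recent` list is what would let an eligibility-trace variant propagate the latest reward back over several recently visited choice nodes.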
In conventional RL, all possible states are determined prior to any learning, with each state typically a point in a space whose dimensions are the relevant environmental factors and internal state variables of the agent. So for a maze-wandering robot, the dimensions might be discretized