GENETIC PROGRAMMING

Finding Perceived Pattern Structures using Genetic Programming

Mehdi Dastani
Dept. of Mathematics and Computer Science
Free University Amsterdam
The Netherlands
email: [email protected]

Elena Marchiori
Dept. of Mathematics and Computer Science
Free University Amsterdam
The Netherlands
email: [email protected]

Robert Voorn
Dept. of Mathematics and Computer Science
Free University Amsterdam
The Netherlands
email: [email protected]

Abstract

Structural information theory (SIT) deals with the perceptual organization, often called the `gestalt' structure, of visual patterns. Based on a set of empirically validated structural regularities, the perceived organization of a visual pattern is claimed to be the most regular (simplest) structure of the pattern. The problem of finding the perceptual organization of visual patterns has relevant applications in multi-media systems, robotics and automatic data visualization. This paper shows that genetic programming (GP) is a suitable approach for solving this problem.

1 Introduction

In principle, a visual pattern can be described in many different ways; however, in most cases it will be perceived as having a certain description. For example, the visual pattern illustrated in Figure 1-A may have, among others, the two descriptions illustrated in Figure 1-B and 1-C. Human perceivers usually prefer the description illustrated in Figure 1-B. An empirically supported theory of visual perception is Structural Information Theory (SIT) [Leeuwenberg, 1971, Van der Helm and Leeuwenberg, 1991, Van der Helm, 1994]. SIT proposes a set of empirically validated and perceptually relevant structural regularities and claims that the preferred description of a visual pattern is based on the structure that covers most regularities in that pattern. Using the formalization of the notions of perceptually relevant structure and simplicity given by SIT, the problem of finding the simplest structure of a visual pattern (the SPS problem) can be formulated mathematically as a constrained optimization problem.

Figure 1: Visual pattern A has two potential structures B and C.

The SPS problem has relevant applications. For example, multimedia systems and image databases need to analyze, classify, and describe images in terms of the constitutive objects that human users perceive in those images [Zhu, 1999]. Furthermore, autonomous robots need to analyze their visual inputs and construct hypotheses about objects possibly present in their environments [Kang and Ikeuchi, 1993]. Also, in the field of information visualization the goal is to generate images that represent information such that human viewers extract that information by looking at the images [Bertin, 1981]. In all these applications, a model of gestalt perception is indispensable [Mackinlay, 1986, Marks and Reiter, 1990]. We focus on a simple domain of visual patterns and claim that an appropriate model of gestalt perception for this domain is an essential step towards a model of gestalt perception for the more complex visual patterns used in the above-mentioned real-world applications [Dastani, 1998].

Since the search space of possible structures grows exponentially with the complexity of the visual pattern, heuristic algorithms have to be used for solving the SPS problem efficiently. The only algorithm for SPS we are aware of is the one developed by [Van der Helm and Leeuwenberg, 1986]. This algorithm ignores the important source of computational complexity of the problem and covers only a subclass of perceptually relevant structures. The central part of this partial algorithm consists of translating the search for a simplest structure into a shortest route problem. The algorithm is shown to have O(N^4) computational complexity, where N denotes the length of the input pattern. To cover all perceptually relevant structures, not only for the domain of visual line patterns but also for more complex domains of visual patterns, it is argued in [Dastani, 1998] that the computational complexity grows exponentially with the length of the input patterns.

This paper shows that genetic programming [Koza, 1992] provides a natural paradigm for solving the SPS problem using SIT. A novel evolutionary algorithm is introduced whose main features are the use of SIT operators for generating the initial population of candidate structures, and the use of knowledge-based genetic operators in the evolutionary process. The use of GP is motivated by the SIT formalization: structures can be easily described using the standard GP tree representation. However, the GP search is constrained by the fact that structures have to characterize the same input pattern. In order to satisfy this constraint, knowledge-based operators are used in the evolutionary process.

The paper is organized as follows. In the next section, we briefly discuss the problem of visual perception and explain how SIT predicts the perceived structure of visual line patterns. In Section 3, SIT is used to give a formalization of the SPS problem for visual line patterns. Section 4 describes how the formalization can be used in an automatic procedure for generating structures. Section 5 introduces the GP algorithm for SPS. Section 6 describes implementation aspects of the algorithm and reports some results of experiments. The paper concludes with a summary of the contributions and future research directions.

2 SIT: A Theory of Visual Perception

According to the structural information theory, the human perceptual system is sensitive to certain kinds of structural regularities within sensory patterns. These are called perceptually relevant structural regularities, and they are specified by means of the ISA operators: Iteration, Symmetry and Alternations [Van der Helm and Leeuwenberg, 1991]. Examples of string patterns that can be specified by these operators are abab, abcba, and abgabpz, respectively. A visual pattern can be described in different ways by applying different ISA operators. In order to disambiguate the set of descriptions and to decide on the perceived organization of the pattern, a simplicity measure, called information load, is introduced. The information load measures the amount of perceptually relevant regularity covered by a pattern description. It is claimed that the description of a visual pattern with the minimum information load reflects its perceived organization [Van der Helm, 1994].

In this paper, we focus on the domain of linear line patterns, which are turtle-graphics-like line drawings for which the turtle starts somewhere and moves in such a way that the line segments are connected and do not cross each other. A linear line pattern is encoded as a letter string, for which it can be shown that its simplest description represents the perceived organization of the encoded linear line pattern [Leeuwenberg, 1971]. The encoding process consists of two steps. In the first step, the successive line segments and their relative angles in the pattern are traced from the starting point of the pattern, and identical letter symbols are assigned to identical line segments (equal length) as well as to identical angles (relative to the trace movement). In the second step, the letter symbols assigned to line segments and angles are concatenated in the order they have been visited during the trace of the first step. This results in a letter string that represents the pattern. An example of such an encoding is illustrated in Figure 2.

Figure 2: Encoding of a line pattern into a string (here, the string axaybxbybxb).

Note that letter strings are themselves perceptual patterns that can be described in many different ways, one of which is usually the perceived description. The determination of the perceived description of string patterns is the essential focus of Hofstadter's Copycat project [Hofstadter, 1984].

3 The SPS Problem

In this section, we formally define the class of string descriptions that represent possible perceptually relevant organizations of linear line patterns. Also, a complexity function is defined that measures the information load of those descriptions. In this way, we can encode a linear line pattern into a string, generate the perceptually relevant descriptions of the string, and determine the perceived organization of the line pattern by choosing the string description which has the minimum information load.

The class of descriptions that represent possible perceptual organizations for linear line patterns, LLP, is defined over the set $E = \{a, \ldots, z\}$ as follows.

1. For all $t \in E$: $t \in LLP$.

2. If $t \in LLP$ and $n$ is a natural number, then $iter(t, n) \in LLP$.

3. If $t \in LLP$, then $symeven(t) \in LLP$.

4. If $t_1, t_2 \in LLP$, then $symodd(t_1, t_2) \in LLP$.

5. If $t, t_1, \ldots, t_n \in LLP$, then $altleft(t, \langle t_1, \ldots, t_n \rangle) \in LLP$ and $altright(t, \langle t_1, \ldots, t_n \rangle) \in LLP$.

6. If $t_1, \ldots, t_n \in LLP$, then $con(t_1, \ldots, t_n) \in LLP$.

The meaning of LLP expressions can be defined by the denotational semantics $[\![\,\cdot\,]\!]$, which involves the string concatenation ($\cdot$) and string reflection ($reflect(abcde) = edcba$) operators.

1. If $t \in E$, then $[\![t]\!] = t$.

2. $[\![iter(t, n)]\!] = [\![t]\!] \cdot \ldots \cdot [\![t]\!]$ ($n$ times).

3. $[\![symeven(t)]\!] = [\![t]\!] \cdot reflect([\![t]\!])$.

4. $[\![symodd(t_1, t_2)]\!] = [\![t_1]\!] \cdot [\![t_2]\!] \cdot reflect([\![t_1]\!])$.

5. $[\![altleft(t, \langle t_1, \ldots, t_n \rangle)]\!] = [\![t]\!] \cdot [\![t_1]\!] \cdot \ldots \cdot [\![t]\!] \cdot [\![t_n]\!]$.

6. $[\![altright(t, \langle t_1, \ldots, t_n \rangle)]\!] = [\![t_1]\!] \cdot [\![t]\!] \cdot \ldots \cdot [\![t_n]\!] \cdot [\![t]\!]$.

7. $[\![con(t_1, \ldots, t_n)]\!] = [\![t_1]\!] \cdot \ldots \cdot [\![t_n]\!]$.

The complexity function $C$ on LLP expressions measures the complexity of an expression as the number of individual letters $t$ occurring in it, i.e.

$$C(t) = 1, \qquad C(f(T_1, \ldots, T_n)) = \sum_{i=1}^{n} C(T_i).$$

During the last 20 years, Leeuwenberg and his co-workers have reported on a number of experiments that tested predictions based on the simplicity principle. These experiments were concerned with the disambiguation of ambiguous patterns. The predictions of the simplicity principle were, on the whole, confirmed by these experiments [Buffart et al., 1981, Van Leeuwen et al., 1988, Boselie and Wouterlood, 1989].

The following LLP expressions describe, among others, four different perceptual organizations of the pattern axaybxbybxb:

- con(a, x, a, y, b, x, b, y, b, x, b)
- con(symodd(a, x), y, symodd(b, x), y, symodd(b, x))
- con(symodd(a, x), iter(con(y, b, x, b), 2))
- con(symodd(a, x), iter(altright(b, <y, x>), 2))

Note that these descriptions reflect four different perceptual organizations of the line pattern that is illustrated in Figure 2. The information loads of these four descriptions are 11, 8, 6, and 5, respectively. This implies that the last description reflects the perceived organization of the line pattern illustrated in Figure 2.
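As a concrete illustration of the definitions above, the following minimal Python sketch (not part of the paper; the tuple encoding and the helper names denote and complexity are illustrative choices) computes the denotation and the information load of an LLP expression and checks the fourth description of axaybxbybxb.

def denote(t):
    """Return the string denoted by an LLP expression."""
    if isinstance(t, str):                       # primitive element t in E
        return t
    op, args = t[0], t[1:]
    if op == 'iter':                             # iter(t, n): t repeated n times
        return denote(args[0]) * args[1]
    if op == 'symeven':                          # symeven(t): t . reflect(t)
        s = denote(args[0])
        return s + s[::-1]
    if op == 'symodd':                           # symodd(t1, t2): t1 . t2 . reflect(t1)
        s1, s2 = denote(args[0]), denote(args[1])
        return s1 + s2 + s1[::-1]
    if op == 'altleft':                          # t t1 t t2 ... t tn
        return ''.join(denote(args[0]) + denote(ti) for ti in args[1])
    if op == 'altright':                         # t1 t t2 t ... tn t
        return ''.join(denote(ti) + denote(args[0]) for ti in args[1])
    if op == 'con':                              # plain concatenation
        return ''.join(denote(ti) for ti in args)
    raise ValueError(op)

def complexity(t):
    """Information load C: number of primitive letters occurring in t."""
    if isinstance(t, str):
        return 1
    op, args = t[0], t[1:]
    if op in ('iter', 'symeven'):
        return complexity(args[0])
    if op in ('altleft', 'altright'):
        return complexity(args[0]) + sum(complexity(ti) for ti in args[1])
    return sum(complexity(ti) for ti in args)    # con, symodd

# The fourth description of axaybxbybxb, with information load 5:
expr = ('con', ('symodd', 'a', 'x'), ('iter', ('altright', 'b', ['y', 'x']), 2))
assert denote(expr) == 'axaybxbybxb'
assert complexity(expr) == 5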

The SPS problem can now be defined as follows. Given a pattern $p$, find an LLP expression $t$ such that

- $[\![t]\!] = p$, and
- $C(t) = \min\{C(s) \mid s \in LLP \text{ and } [\![s]\!] = p\}$.

As mentioned in the introduction, the only (partial) algorithm for solving the SPS problem is the one proposed by Van der Helm [Van der Helm and Leeuwenberg, 1986]. This algorithm finds only a subclass of perceptually relevant structures of string patterns by first constructing a directed acyclic graph for the given string pattern. If we place an index after each element in the string pattern, starting from the leftmost element, then each node in the graph corresponds to an index, and each link in the graph from node i to node j corresponds to a gestalt for the subpattern starting at position i and ending at position j. Given this graph, the SPS problem is translated to a shortest route problem. Note that this algorithm is designed for one-dimensional string patterns, and it is not clear how it can be applied to other domains of perceptual patterns. In contrast, our formalization of the SPS problem can easily be applied to more complex visual patterns by extending LLP with domain-dependent operators such as Euclidean transformations for two-dimensional visual patterns [Dastani, 1998].

4 Generating LLP Expressions

In order to solve the SPS problem using genetic programming, a probabilistic procedure for generating LLP expressions, called BUILD-STRUCT, is used. This procedure takes as input a string, and generates a (tree structure of an) LLP expression for that string. The procedure is based on a set of probabilistic production rules.

The production rules are derived from the SIT definition of expressions, and are of the form

$$\alpha \; t_1 \ldots t_n \; \beta \;\longrightarrow\; \alpha \; P(t_1, \ldots, t_n) \; \beta$$

where $\alpha$ and $\beta$ are (possibly empty) sequences of LLP expressions, $t_1, \ldots, t_n$ are LLP expressions, and $P$ is an ISA operator (of arity $n$). The triple $(\alpha, t_1 \ldots t_n, \beta)$ is called a splitting of the sequence.

A snapshot of the set of production rules used in BUILD-STRUCT is given below.

$\alpha \; t \; t \; \beta \longrightarrow \alpha \; iter(t, 2) \; \beta$

$\alpha \; t \; iter(t, n) \; \beta \longrightarrow \alpha \; iter(t, n+1) \; \beta$

$\alpha \; iter(t, n) \; t \; \beta \longrightarrow \alpha \; iter(t, n+1) \; \beta$

$\alpha \; t_1 \; t_2 \; \beta \longrightarrow \alpha \; con(t_1, t_2) \; \beta$

$\alpha \; con(t_1, \ldots, t_n) \; t \; \beta \longrightarrow \alpha \; con(t_1, \ldots, t_n, t) \; \beta$

$\alpha \; t \; con(t_1, \ldots, t_n) \; \beta \longrightarrow \alpha \; con(t, t_1, \ldots, t_n) \; \beta$

A production rule transforms a sequence of LLP expressions into a shorter one. In this way, the repeated application of production rules terminates after a finite number of steps and produces one LLP expression. There are two forms of non-determinism in the algorithm:

1. the choice of which rule to apply when more than one production rule is applicable,

2. the choice of a splitting of the sequence when more splittings are possible.

In BUILD-STRUCT both choices are performed randomly. BUILD-STRUCT employs a specific data structure which results in a more efficient implementation of the above-described non-determinism. The BUILD-STRUCT procedure is used in the initialization of the genetic algorithm and in the mutation operator.

We conclude this section with an example illustrating the application of the production rule system. The LLP expression iter(con(a, b, a), 2) can be obtained using the above production rules, starting from the pattern abaaba, as follows, where an underlined substring indicates that an ISA operator will be applied to that substring:

$\underline{aba}\;aba \longrightarrow con(a, b, a)\;aba$

$con(a, b, a)\;\underline{aba} \longrightarrow con(a, b, a)\;con(a, b, a)$

$\underline{con(a, b, a)\;con(a, b, a)} \longrightarrow iter(con(a, b, a), 2)$

Note in this example that the iter operator is applied to two structurally identical LLP expressions (i.e. $con(a, b, a)\;con(a, b, a) \longrightarrow iter(con(a, b, a), 2)$). In general, the ISA operators are not applied on the basis of structural identity of LLP expressions, but on the basis of their semantics, i.e. on the basis of the patterns denoted by the LLP expressions (e.g. $symodd(a, b)\;con(a, b, a) \longrightarrow iter(symodd(a, b), 2)$).
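A much-simplified sketch of BUILD-STRUCT is given below. It assumes the expression encoding and the denote helper of the previous sketch, covers only three of the production rules, and chooses splittings and rules uniformly at random; the paper's implementation uses a dedicated data structure for this non-determinism.

import random

def build_struct(string):
    """Rewrite the letter sequence into a single LLP expression, bottom up."""
    seq = list(string)                            # initial sequence of primitives
    while len(seq) > 1:
        i = random.randrange(len(seq) - 1)        # random splitting point
        left, right = seq[i], seq[i + 1]
        rules = []
        # alpha t t beta -> alpha iter(t, 2) beta (applied on semantic identity)
        if denote(left) == denote(right):
            rules.append(('iter', left, 2))
        # alpha t iter(t, n) beta -> alpha iter(t, n+1) beta
        if (isinstance(right, tuple) and right[0] == 'iter'
                and denote(left) == denote(right[1])):
            rules.append(('iter', right[1], right[2] + 1))
        # alpha t1 t2 beta -> alpha con(t1, t2) beta (always applicable)
        rules.append(('con', left, right))
        seq[i:i + 2] = [random.choice(rules)]     # apply one applicable rule
    return seq[0]

# One (random) structure of the pattern of Figure 2:
print(build_struct('axaybxbybxb'))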

5 A GP for the SPS Problem

This section introduces a novel evolutionary algorithm for the SPS problem, called GPSPS (Genetic Programming for the SPS problem), which applies GP to SIT. A population of LLP expressions is evolved, using knowledge-based mutation and crossover operators to generate new expressions, and using the SIT complexity measure as fitness function. GPSPS is an instance of the generational scheme, cf. e.g. [Michalewicz, 1996] (sketched below), where P(t) denotes the population at iteration t and |P(t)| its size. Selection is fitness-based, so that the simplest expressions have the highest probability of being selected. We have also made our GP elitist to guarantee that the best element found so far will be in the actual population.

The main features of GPSPS are described in the rest of this section.
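The generational scheme can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' listing; build_struct, complexity, mutate, crossover and optimize refer to the sketches accompanying Section 4 and Sections 5.3-5.5, and the parameter values are those reported in Section 6.

import random

def gpsps(pattern, pop_size=50, generations=150, p_mut=0.6, p_cross=0.4):
    """Elitist generational GP over LLP expressions of the given pattern."""
    population = [build_struct(pattern) for _ in range(pop_size)]    # Section 5.2
    best = min(population, key=complexity)
    for _ in range(generations):
        # Fitness-based selection: simpler expressions are chosen more often.
        weights = [1.0 / complexity(ind) for ind in population]
        offspring = []
        while len(offspring) < pop_size - 1:
            a, b = random.choices(population, weights=weights, k=2)
            if random.random() < p_cross:
                a = crossover(a, b)               # semantics-preserving, Section 5.4
            if random.random() < p_mut:
                a = mutate(a)                     # BUILD-STRUCT based, Section 5.3
            offspring.append(optimize(a))         # local optimization, Section 5.5
        best = min([best] + offspring, key=complexity)
        population = offspring + [best]           # elitism
    return best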

5.1 Representation and Fitness

GPSPS acts on LLP expressions describing the same string. An LLP expression is represented by means of a tree in the style used in genetic programming, where leaves are primitive elements while internal nodes are ISA operators. The fitness function is the complexity measure C as introduced in Section 3.

Thus, the goal of GPSPS is to find a chromosome (representing a structure of a given string) which minimizes C. Given a string, a specific procedure is used to ensure that the initial population contains only chromosomes describing the same pattern. Moreover, novel genetic operators are designed which preserve the semantics of chromosomes.

5.2 Initialization

Given a string, the chromosomes of the initial population are generated using the procedure BUILD-STRUCT. In this way, the initial population contains randomly selected (representations of) LLP expressions of the pattern.

5.3 Mutation

When the mutation operator is applied to a chromosome T, an internal node n of T is randomly selected and the procedure BUILD-STRUCT is applied to the (string represented by the) subtree of T starting at n. Figure 3 illustrates an application of the mutation operator to an internal node. Observe that each node (except the terminals) has the same chance of being selected. In this way smaller subtrees have a larger chance of being modified.
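A minimal sketch of this mutation, under the same encoding assumptions as before (the helper names internal_nodes, subtree_at and replace_at are illustrative):

import random

def internal_nodes(t, path=()):
    """Paths of the internal (operator) nodes of an expression tree;
    terminals and the alternation argument lists are skipped for brevity."""
    if isinstance(t, tuple):
        yield path
        for i, child in enumerate(t[1:], start=1):
            yield from internal_nodes(child, path + (i,))

def subtree_at(t, path):
    for i in path:
        t = t[i]
    return t

def replace_at(t, path, new):
    """Copy of t with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    return t[:i] + (replace_at(t[i], path[1:], new),) + t[i + 1:]

def mutate(t):
    """Re-generate a randomly chosen subtree with BUILD-STRUCT on its string."""
    paths = list(internal_nodes(t))
    if not paths:                                 # a single primitive letter
        return t
    path = random.choice(paths)
    return replace_at(t, path, build_struct(denote(subtree_at(t, path))))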

It is interesting to investigate the effectiveness of the heuristic implemented in BUILD-STRUCT when incorporated into an iterated local search algorithm. Therefore we have implemented an algorithm that mutates one single element for a large number of iterations and returns the best element found over all iterations. Although some regularities are discovered by this algorithm, its performance is rather poor compared with GPSPS, even when the number of iterations is set to be larger than the population size times the number of generations used by GPSPS.

Figure 3: Example of the mutation operator.

5.4 Crossover

The crossover operator cannot simply swap subtrees between two parents, as in standard GP, due to the semantic constraint on chromosomes (i.e. chromosomes have to denote the same string). Therefore, the crossover is designed in such a way that it swaps only subtrees that denote the same string. This is realized by associating with each internal node of the tree the string that is denoted by the subtree starting at that internal node. Then, two nodes of the parents with equal associated strings are randomly selected and the corresponding subtrees are swapped. An example of crossover is illustrated in Figure 4.
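A minimal sketch of this semantics-preserving crossover, reusing the helpers of the mutation sketch; it returns a single offspring, whereas the paper exchanges the subtrees in both parents.

import random

def crossover(t1, t2):
    """Swap a pair of subtrees of t1 and t2 that denote the same string."""
    by_string = {}
    for p2 in internal_nodes(t2):
        by_string.setdefault(denote(subtree_at(t2, p2)), []).append(p2)
    candidates = [(p1, p2)
                  for p1 in internal_nodes(t1)
                  for p2 in by_string.get(denote(subtree_at(t1, p1)), [])
                  if p1 or p2]                    # skip the trivial root-for-root swap
    if not candidates:                            # no crossover-pair: no crossover
        return t1
    p1, p2 = random.choice(candidates)
    return replace_at(t1, p1, subtree_at(t2, p2))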

Figure 4: Example of the crossover operator.

When a crossover-pair cannot be found, no crossover takes place. Fortunately this happens only for a small portion of the crossovers; usually there is more than one pair to choose from. This issue is further discussed in the next section.

5.5 Optimization

As discussed above, the mutation and crossover operators transform subtrees. When these operators are applied, the resulting subtrees may exhibit structures of a form suitable for optimization. For instance, suppose a subtree of the form con(iter(b, 2), a, con(b, b)) is transformed by one of the operators into the subtree con(iter(b, 2), a, iter(b, 2)). This improves the complexity of the subtree. Unfortunately, based on this new subtree the expected LLP expression symodd(iter(b, 2), a) cannot be obtained.

The crossover operator is only helpful for this problem if there is already a subtree that encodes that specific substring with a symodd structure. This problem could in fact be solved by applying the mutation operator to the con structure. However, the probability that the application of the mutation operator will generate the symodd structure is small.

In order to solve this problem, a simple optimization procedure is called after each application of the mutation and crossover operators. This procedure uses simple heuristics to optimize con structures. First, the procedure checks whether the (entire) con structure is symmetrical and changes it into a symodd or symeven structure if possible. If this is not the case, the procedure checks whether neighboring structures that are similar can be combined. For example, a structure of the form con(c, iter(b, 2), iter(b, 3)) can be optimized to con(c, iter(b, 5)). This kind of optimization is also applied to altleft and altright structures.
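A rough sketch of the two con heuristics named above (whole-structure symmetry detection and merging of neighbouring iterations), again under the illustrative encoding used earlier; the altleft/altright variants are omitted.

def optimize(t):
    """Rewrite a con structure as symeven/symodd, or merge neighbouring iters."""
    if not (isinstance(t, tuple) and t[0] == 'con'):
        return t
    args, strings = list(t[1:]), [denote(a) for a in t[1:]]
    k = len(args) // 2
    left = ''.join(strings[:k])
    # 1. The entire con is symmetrical: turn it into symeven or symodd.
    if k > 0 and len(args) % 2 == 0 and ''.join(strings[k:]) == left[::-1]:
        return ('symeven', args[0] if k == 1 else ('con', *args[:k]))
    if k > 0 and len(args) % 2 == 1 and ''.join(strings[k + 1:]) == left[::-1]:
        return ('symodd', args[0] if k == 1 else ('con', *args[:k]), args[k])
    # 2. Merge neighbouring iterations of the same base expression, e.g.
    #    con(c, iter(b, 2), iter(b, 3)) -> con(c, iter(b, 5)).
    merged = [args[0]]
    for a in args[1:]:
        last = merged[-1]
        if (isinstance(last, tuple) and last[0] == 'iter'
                and isinstance(a, tuple) and a[0] == 'iter'
                and denote(last[1]) == denote(a[1])):
            merged[-1] = ('iter', last[1], last[2] + a[2])
        else:
            merged.append(a)
    return ('con', *merged) if len(merged) > 1 else merged[0]

# The example from the text: the symmetric con becomes symodd(iter(b, 2), a).
assert optimize(('con', ('iter', 'b', 2), 'a', ('iter', 'b', 2))) == \
    ('symodd', ('iter', 'b', 2), 'a')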

6 Experiments

In this section we discuss some preliminary experiments. The example strings we consider are short and are designed to illustrate what type of structures are interesting for this domain. The choice of the values of the GP parameters used in the experiments is determined by the type of strings considered. Because the strings are short, a small pool size of 50 individuals is used. Making the pool very large would make the GP perform better, but then the pool would probably already contain the most preferred structure when it is initialized. The number of iterations is also kept small to avoid generating all possible structures, and is therefore set to 150. This allows us to draw preliminary conclusions about the performance of the GP.

Two important parameters of the GP are the mutation and crossover rates. We have done a few test runs to find a setting that produced good results. We have set the mutation rate to 0.6 and the crossover rate to 0.4. The mutation rate is deliberately set higher, because this operator is the most important for discovering structures. The crossover operator is used to swap substructures between good chromosomes.

We have chosen six different short strings that contain structures that are of interest to our search problem. Moreover, two longer strings are considered. For the two long strings the mutation and crossover rates specified above are used, but the pool size and the number of generations are both set to 300. The eight strings are the codes for the linear line patterns illustrated in Figure 5.

Figure 5: Line drawings used in experiments.

The algorithm is run on each string a number of times using different random seeds. The resulting structures are given in Figure 7, where the structures and fitnesses of the two best elements of the final population are reported. For each string GPSPS is able to find the optimal structure. The results of runs with different seeds are very similar, indicating the (expected) robustness of the algorithm on these strings.

Figure 6: Best and mean fitness (fitness versus generations) in a typical run on linear line pattern 7.

Figure 6 illustrates how the best fitness and the mean fitness of the population vary in a typical run of GPSPS on line pattern number 7 of Figure 5. On this pattern, the algorithm is able to find a near optimum of rather good quality after about 50 generations, and it spends the other 250 generations finding the slightly improved structure. In this experiment about 12% of the crossovers failed. On average there were about 2.59 possible crossover-pairs (with a standard deviation of 1.38) when the crossover operator was applicable.

The structures that are found are the most preferred structures as predicted by SIT. The system is thus capable of finding the perceived organizations of these line drawing patterns.

7 Conclusion and Future Research

This paper discussed the problem of human visual perception and introduced a formalization of a theory of visual perception, called SIT. The claim of SIT is to predict the perceived organization of visual patterns on the basis of the simplicity principle. It is argued that a full computational model for SIT is computationally intractable and that heuristic methods are needed to compute the perceived organization of visual patterns.

We have applied genetic programming techniques to this formal theory of visual perception in order to compute the perceived organization of visual line patterns. Based on perceptually relevant operators from SIT, a pool of alternative organizations of an input pattern is generated. Motivated by SIT, mutation and crossover operations are defined that can be applied to these organizations to generate new organizations for the input pattern. Finally, a fitness function is defined that determines the appropriateness of generated organizations. This fitness function is directly derived from SIT and measures the simplicity of organizations.

In this paper, we have focused on a small domain of visual linear line patterns. The next step is to extend our system to compute the perceived organization of more complex visual patterns such as two-dimensional visual patterns, which are defined in terms of a variety of visual attributes such as color, size, position, texture, and shape.

Finally, we intend to investigate whether the class of structural regularities proposed by SIT is also relevant for finding meaningful organizations within patterns from biological experiments, such as DNA sequences. For this task, we will need to modify GPSPS in order to allow a group of letters to be treated as a primitive element.

References

[Bertin, 1981] Bertin, J. (1981). Graphics and Graphic Information-Processing. Walter de Gruyter, Berlin, New York.

[Boselie and Wouterlood, 1989] Boselie, F. and Wouterlood, D. (1989). The minimum principle and visual pattern completion. Psychological Research, 51:93-101.

[Buffart et al., 1981] Buffart, H., Leeuwenberg, E., and Restle, F. (1981). Coding theory of visual pattern completion. Journal of Experimental Psychology: Human Perception and Performance, 7:241-274.

[Dastani, 1998] Dastani, M. (1998). Ph.D. thesis, University of Amsterdam, The Netherlands.

[Hofstadter, 1984] Hofstadter, D. (1984). The Copycat project: An experiment in nondeterministic and creative analogies. A.I. Memo 755, Artificial Intelligence Laboratory, MIT, Cambridge, Mass.

[Kang and Ikeuchi, 1993] Kang, S. and Ikeuchi, K. (1993). Toward automatic robot instruction from perception: Recognizing a grasp from observation. IEEE Transactions on Robotics and Automation, 9(4):432-443.

[Koza, 1992] Koza, J. (1992). Genetic Programming. MIT Press.

[Leeuwenberg, 1971] Leeuwenberg, E. (1971). A perceptual coding language for visual and auditory patterns. American Journal of Psychology, 84:307-349.

[Mackinlay, 1986] Mackinlay, J. (1986). Automating the design of graphical presentations of relational information. ACM Transactions on Graphics, 5:110-141.

[Marks and Reiter, 1990] Marks, J. and Reiter, E. (1990). Avoiding unwanted conversational implicatures in text and graphics. In Proceedings AAAI, Menlo Park, CA.

[Michalewicz, 1996] Michalewicz, Z. (1996). Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag, Berlin.

[Van der Helm, 1994] Van der Helm, P. (1994). The dynamics of Prägnanz. Psychological Research, 56:224-236.

[Van der Helm and Leeuwenberg, 1986] Van der Helm, P. and Leeuwenberg, E. (1986). Avoiding explosive search in automatic selection of simplest pattern codes. Pattern Recognition, 19:181-191.

[Van der Helm and Leeuwenberg, 1991] Van der Helm, P. and Leeuwenberg, E. (1991). Accessibility: A criterion for regularity and hierarchy in visual pattern code. Journal of Mathematical Psychology, 35:151-213.

[Van Leeuwen et al., 1988] Van Leeuwen, C., Buffart, H., and Van der Vegt, J. (1988). Sequence influence on the organization of meaningless serial stimuli: economy after all. Journal of Experimental Psychology: Human Perception and Performance, 14:481-502.

[Zhu, 1999] Zhu, S. (1999). Embedding gestalt laws in Markov random fields: a theory for shape modeling and perceptual organization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(11).

Results for the experimental strings (the two best structures per string, with their complexities):

1. string: aAaAaAaAaAaAaA
   structure: a) iter(con(a,A),7)  b) con(iter(con(a,A),2),iter(con(a,A),5))
   complexity: a) 2  b) 4

2. string: aAaBbAbBbAbBaAa
   structure: a) symodd(altleft(a,),B)  b) symodd(con(symodd(a,A),altright(b,)),B)
   complexity: a) 6  b) 6

3. string: aAaBaAaBaAaB
   structure: a) iter(altleft(a,),3)  b) iter(con(symodd(a,A),B),3)
   complexity: a) 3  b) 3

4. string: aXaYaXaZbAcBcBc
   structure: a) altleft(symodd(a,X),

Reducing Bloat and Promoting Diversity using Multi-Objective Methods

Edwin D. de Jong (1,2), Richard A. Watson (2), Jordan B. Pollack (2)
{edwin, richardw, [email protected]
(1) Vrije Universiteit Brussel, AI Lab, Pleinlaan 2, B-1050 Brussels, Belgium
(2) Brandeis University, DEMO Lab, Computer Science dept., Waltham, MA 02454, USA

Category: Genetic Programming

Abstract

Two important problems in genetic programming (GP) are its tendency to find unnecessarily large trees (bloat), and the general evolutionary algorithms problem that diversity in the population can be lost prematurely. The prevention of these problems is frequently an implicit goal of basic GP. We explore the potential of techniques from multi-objective optimization to aid GP by adding explicit objectives to avoid bloat and promote diversity. The even 3, 4, and 5-parity problems were solved efficiently compared to basic GP results from the literature. Even though only non-dominated individuals were selected and populations thus remained extremely small, appropriate diversity was maintained. The size of individuals visited during search consistently remained small, and solutions of what we believe to be the minimum size were found for the 3, 4, and 5-parity problems.

Keywords: genetic programming, code growth, bloat, introns, diversity maintenance, evolutionary multi-objective optimization, Pareto optimality

1 INTRODUCTION

A well-known problem in genetic programming (GP) is the tendency to find larger and larger programs over time (Tackett, 1993; Blickle & Thiele, 1994; Nordin & Banzhaf, 1995; McPhee & Miller, 1995; Soule & Foster, 1999), called bloat or code growth. This is harmful since it results in larger solutions than necessary. Moreover, it increasingly slows down the rate at which new individuals can be evaluated. Thus, keeping the size of the trees that are visited small is generally an implicit objective of GP.

Another important issue in GP and in other methods of evolutionary computation is how diversity of the population can be achieved and maintained. A population that is spread out over promising parts of the search space has more chance of finding a solution than one that is concentrated on a single fitness peak. Since members of a diverse population solve parts of the problem in different ways, it may also be more likely to discover partial solutions that can be utilized through crossover. Diversity is not an objective in the conventional sense; it applies to the populations visited during the search, not to final solutions. A less obvious idea then is to view the contribution of individuals to population diversity as an objective.

Multi-objective techniques are specifically designed for problems in which knowledge about multiple objectives is available, see e.g. Fonseca and Fleming (1995) for an overview. The main idea of this paper is to use multi-objective techniques to add the objectives of size and diversity to the usual objective of a problem-specific fitness measure. A multi-objective approach to bloat appears promising and has been used before (Langdon, 1996; Rodriguez-Vazquez, Fonseca, & Fleming, 1997), but has not become standard practice. The reason may be that basic multi-objective methods, when used with small tree size as an objective, can result in premature convergence to small individuals (Langdon & Nordin, 2000; Ekart, 2001). We therefore investigate the use of a size objective in combination with explicit diversity maintenance.

The remaining sections discuss the n-parity problem (2), bloat (3), multi-objective methods (4), diversity maintenance (5), the ideas behind the approach, called FOCUS (6), algorithmic details (7), results (8), and conclusions (9).

2 THE N-PARITY PROBLEM

The test problems that will be used in this paper are even n-parity problems, with n ranging from 3 to 5. A correct solution to this problem takes a binary sequence of length n as input and returns true (one) if the number of ones in the sequence is even, and false (zero) if it is odd. It is named even to avoid confusion with the related odd parity problem, which gives the inverse answer. Trees may use the following boolean operators as internal nodes: AND, OR, NAND, and NOR. Each leaf specifies an element of the sequence. The fitness is the fraction of all possible length-n binary sequences for which the program returns the correct answer. Figure 1 shows an example.

Figure 1: A correct solution to the 2-parity problem: OR(AND(X0, X1), NOR(X0, X1)).
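A minimal sketch of this fitness computation (the tree encoding and helper names are illustrative, not the authors' code):

from itertools import product

OPS = {
    'AND':  lambda a, b: a and b,
    'OR':   lambda a, b: a or b,
    'NAND': lambda a, b: not (a and b),
    'NOR':  lambda a, b: not (a or b),
}

def evaluate(tree, bits):
    if isinstance(tree, int):                  # leaf: an element of the sequence
        return bits[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, bits), evaluate(right, bits))

def parity_fitness(tree, n):
    """Fraction of all 2^n input sequences answered correctly (even parity)."""
    correct = 0
    for bits in product([False, True], repeat=n):
        target = sum(bits) % 2 == 0            # true iff the number of ones is even
        correct += evaluate(tree, bits) == target
    return correct / 2 ** n

# An even 2-parity solution (cf. Figure 1): OR(AND(X0, X1), NOR(X0, X1)).
assert parity_fitness(('OR', ('AND', 0, 1), ('NOR', 0, 1)), 2) == 1.0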

The n-parity problem has been selected because it is a difficult problem that has been used by a number of researchers. With increasing order, the problem quickly becomes more difficult. One way to understand its hardness is that for any setting of the bits, flipping any bit inverts the outcome of the parity function. Equivalently, its Karnaugh map (Zissos, 1972) equals a checkerboard function, and thus has no adjacencies.

2.1 SIZE OF THE SMALLEST SOLUTIONS TO N-PARITY

We believe that the correct solutions to n-parity constructed as follows are of minimal size, but we are not able to prove this. The principle is to recursively divide the bit sequence in half, take the parity of each half, and feed these two into a parity function. For subsequences of size one, i.e. single bits, the bit itself is used instead of its parity. When this occurs for one of the two arguments, the outcome would be inverted, and thus the odd 2-parity function is used to obtain the even 2-parity of the bits.

Let S be a binary sequence of length $|S| = n \geq 2$. S is divided in half, yielding two subsequences L and R of length $\frac{n}{2}$ for even n, or of lengths $\frac{n-1}{2}$ and $\frac{n+1}{2}$ for odd n. Then the following recursively defined function P(S) gives a correct expression for the even parity of S for $|S| \geq 2$ in terms of the above operators:

$$P(S) = \begin{cases} S & \text{if } |S| = 1\\ ODD(P(L), P(R)) & \text{if } |S| > 1 \wedge g(L, R)\\ EVEN(P(L), P(R)) & \text{otherwise} \end{cases}$$

where ODD(A, B) = NOR(AND(A, B), NOR(A, B)), EVEN(A, B) = OR(AND(A, B), NOR(A, B)), and

$$g(A, B) = \begin{cases} TRUE & \text{if } (|A| = 1) \text{ XOR } (|B| = 1)\\ FALSE & \text{otherwise} \end{cases}$$

Table 1: Length of the shortest solution to n-parity using the operators AND, OR, NAND, and NOR.

n       1   2   3    4    5    6    7
Length  3   7   19   31   55   79   103

The length $|P(S)|$ of the expression P(S) satisfies:

$$|P(S)| = \begin{cases} 1 & \text{for } |S| = 1\\ 3 + 2|P(L)| + 2|P(R)| & \text{for } |S| > 1 \end{cases}$$

For $n = 2^i$, $i > 0$, this expression can be shown to equal $2n^2 - 1$. Table 1 gives the lengths of the expressions for the first seven even-n-parity problems. For $|S| = 1$, the shortest expression is NOR(S, S); for $|S| > 1$, the length is given by the above expression. The rapid growth with increasing order stems from the repeated doubling of the required inputs.
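The lengths in Table 1 follow directly from the recurrence. A small sketch that checks them, assuming the L/R split described above (function names are illustrative):

def p_length(n):
    """Number of nodes in P(S) for a (sub)sequence of n bits."""
    if n == 1:
        return 1                      # the bit itself is used, not its parity
    left, right = n // 2, n - n // 2
    return 3 + 2 * p_length(left) + 2 * p_length(right)

def shortest_length(n):
    """Length of the shortest known solution to even n-parity (Table 1)."""
    return 3 if n == 1 else p_length(n)          # n = 1: NOR(S, S)

assert [shortest_length(n) for n in range(1, 8)] == [3, 7, 19, 31, 55, 79, 103]
# For n a power of two the length equals 2*n**2 - 1, as stated above.
assert all(p_length(2 ** i) == 2 * (2 ** i) ** 2 - 1 for i in range(1, 6))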

3 THE PROBLEM OF BLOAT

A well-known problem, known as bloat or code growth, is that the trees considered during a GP run grow in size and become larger than is necessary to represent good solutions. This is undesirable because it slows down the search by increasing evaluation and manipulation time and, if the growth consists largely of non-functional code, by decreasing the probability that crossover or mutation will change the operational part of the tree. Also, compact trees have been linked to improved generalization (Rosca, 1996).

Several causes of bloat have been suggested. First, under certain restrictions (Soule, 1998), crossover favors smaller than average subtrees in removal but not in replacement. Second, larger trees are more likely to produce fit (and large) offspring because non-functional code can play a protective role against crossover (Nordin & Banzhaf, 1995) and, if the probability of mutating a node decreases with increasing tree size, against mutation. Third, the search space contains more large than small individuals (Langdon & Poli, 1998).

Nordin and Banzhaf (1995) observed that the length of the effective part of programs decreases over time. However, the total length of the programs in the experiments also increased rapidly, and hence it may be concluded that in those experiments bloat was mainly due to growth of ineffective code (introns).

Finally, it is conceivable that in some circumstances non-functional code may be useful. It has been suggested that introns may be useful for retaining code that is not used in the current individual but is a helpful building block that may be used later (Nordin, Francone, & Banzhaf, 1996).


Table 2: Properties of the basic GP method used.

Problem              3-Parity
Fitness              Fraction of correct answers
Operators            AND, OR, NAND, and NOR
Stop criterion       500,000 evaluations or solution
Initial tree size    Uniform [1..20] internal nodes
Cycle                Generational
Population size      1000
Parent selection     Boltzmann with T = 0.1
Replacement          Complete
Uniqueness check     Individuals occur at most once
P(crossover)         0.9
P(mutation)          0.1
Mutation method      Mutate node with P = 1/n

Figure 2: Average tree sizes of ten different runs (solid lines) using basic GP on the 3-parity problem (average tree size versus number of fitness evaluations; the fraction of runs that yielded a solution and the size of the smallest correct tree are also indicated).

3.1 OBSERVATION OF BLOAT USING BASIC GP

To confirm that bloat does indeed occur in the test problem of n-parity using basic GP, thirty runs were performed for the 3-parity problem. The parameters of the runs are shown in Table 2. A run ends when a correct solution has been found. Figure 2 shows that average tree sizes increase rapidly in each run. If a solution is not found at an early point in the run, bloating rapidly increases the sizes of the trees in the population, thus increasingly slowing down the search. A single run of 111,054 evaluations already took more than 15 hours on a current PC running Linux due to the increasing amount of processing required per tree as a result of bloat. The population of size-unlimited trees that occurred in the single 4-parity run that was tried (with trees containing up to 6,000 nodes) filled virtually the entire swap space and caused performance to degrade to impractical levels. Clearly, the problem of bloat must be addressed in order to solve these and higher order versions of the problem in an efficient manner.

Figure 3: Average tree sizes and fraction of successful runs in the 3-parity problem using basic GP with a tree size limit of 200. Tree sizes are successfully limited, of course, but the approach is not ideal (see text).

3.2 USING A FIXED TREE SIZE LIMIT

Probably the most common way to avoid bloat is to simply limit the allowed tree size or depth (Langdon & Poli, 1998; Koza, 1992), although the latter has been found to lead to loss of diversity near the root node when used with crossover (Gathercole & Ross, 1996). Figure 3 shows the effect of using a limit of 200 on 3-parity. This limit is well above the minimum size of a correct solution, but not too high either, since several larger solutions were found in the unrestricted run. The average tree size is around 140 nodes.

On the 4-parity problem (with a tree size limit of 200), the average tree size varied around 150. However, whereas on 3-parity 90% of the runs found a solution within 100,000 evaluations, on 4-parity only 33% of the runs found a solution within 500,000 evaluations, testifying to the increased difficulty of this order of the parity problem. For 5-parity, basic GP found no solutions within 1,000,000 evaluations for any of the 30 runs. Thus, our version of GP with a fixed tree size limit does not scale up well. Furthermore, a fundamental problem with this method of preventing bloat is that the maximum tree size has to be selected before the search, when it is often unknown.

3.3 WEIGHTED SUM OF FITNESS AND SIZE

Instead of choosing a fixed tree size limit in advance, one would rather like to have the algorithm search for trees that can be as large as they need to be, but not much larger. A popular approach that goes some way towards this goal is to include a component in the fitness that rewards small trees or programs. This is mostly done by adding a component to the fitness, thus making fitness a linear combination of a performance measure and a parsimony measure (Koza, 1992; Soule, Foster, & Dickinson, 1996). However, this approach is not without its own problems (Soule & Foster, 1999).

Figure 4: Schematic rendition of a concave tradeoff surface (axes: Objective 1 and Objective 2; the curve marks the non-dominated individuals). This occurs when better performance in one objective means worse performance in the other, and vice versa. The lines mark the maximum-fitness individuals for three example weightings (see vectors) using a linear weighting of the objectives. No linear weighting exists that finds the in-between individuals, with reasonable performance in both objectives.

First, the weight of the parsimony measure must be determined beforehand, and so a choice concerning the tradeoff between size and performance is already made before the search. Furthermore, if the tradeoff surface between the two fitness components is concave [1] (see Fig. 4), a linear weighting of the two components favors individuals that do well in one of the objectives, but excludes individuals that perform reasonably in both respects (Fleming & Pashkevich, 1985).

[1] Since fitness is to be maximized, the tradeoff curve shown is concave.

Soule and Foster (1999) have investigated why a linear weighting of fitness and size has yielded mixed results. It was found that a weight value that adequately balances fitness and size is difficult to find. However, if the required balance is different for different regions in objective space, then adequate parsimony pressure cannot be specified using a single weight. If this is the case, then methods should be used that do not attempt to find such a single balance. This idea forms the basis of multi-objective optimization.

4 MULTI-OBJECTIVE METHODS

After several early papers describing the idea of optimizing for multiple objectives in evolutionary computation (Schaffer, 1985; Goldberg, 1989), the approach has recently received increasing attention (Fonseca & Fleming, 1995; Van Veldhuizen, 1999). The basic idea is to search for multiple solutions, each of which satisfies the different objectives to different degrees. Thus, the selection of the final solution with a particular combination of objective values is postponed until a time when it is known what combinations exist.

A key concept in multi-objective optimization is that of dominance. Let individual $x_A$ have values $A_i$ for the $n$ objectives, and individual $x_B$ have objective values $B_i$. Then A dominates B if

$$\forall i \in [1..n]: A_i \geq B_i \;\wedge\; \exists i: A_i > B_i$$

Multi-objective optimization methods typically strive for Pareto-optimal solutions, i.e. individuals that are not dominated by any other individuals.
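A minimal sketch of the dominance test and of non-dominated selection (individuals are assumed to be summarized by tuples of objective values, all maximized; the strict variant anticipates the removal criterion of Section 7):

def dominates(a, b):
    """Pareto dominance over objective tuples (all objectives maximized)."""
    return (all(ai >= bi for ai, bi in zip(a, b))
            and any(ai > bi for ai, bi in zip(a, b)))

def weakly_dominates(a, b):
    """The slightly stricter removal criterion of Section 7: >= everywhere."""
    return all(ai >= bi for ai, bi in zip(a, b))

def non_dominated(population, objectives, strict=False):
    """Sequentially drop individuals dominated by a kept population member."""
    dom = weakly_dominates if strict else dominates
    kept = []
    for ind in population:
        if any(dom(objectives(k), objectives(ind)) for k in kept):
            continue
        kept = [k for k in kept if not dom(objectives(ind), objectives(k))]
        kept.append(ind)
    return kept

With strict=True, exactly one of several individuals sharing a point on the tradeoff surface survives, as described in Section 7; in the experiments of Section 8 the objective tuple of an individual is (fitness, 1/size, diversity).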

5 DIVERSITY MAINTENANCE

A key difference between classic search methods and evolutionary approaches is that in the latter a population of individuals is maintained. The idea behind this is that by maintaining individuals in several regions of the search space that look promising (diversity maintenance), there is a higher chance of finding useful material from which to construct solutions.

In order to maintain the existing diversity of a population, evolutionary methods typically keep some or many of the individuals that happen to have been generated and have relatively high fitness, but lower than the best found so far. In the same way, evolutionary multi-objective methods usually keep some dominated individuals in addition to the non-dominated individuals (Fonseca & Fleming, 1993). However, this appears to be a somewhat arbitrary way of maintaining diversity. In the following section, we present a more directed method, and discuss its relation to other diversity maintenance methods.

6 THE FOCUS METHOD

We propose to do diversity maintenance by using a basic multi-objective algorithm and including an objective that actively promotes diversity. To the best of our knowledge, this idea has not been used in other work, including multi-objective research. If it works well, the need for keeping arbitrary dominated individuals may be avoided. To test this, we use the diversity objective in combination with a multi-objective method that only keeps non-dominated individuals, as reported in Section 8.

The approach strongly directs the attention of the search towards the explicitly specified objectives. We therefore name this method FOCUS, which stands for Find Only and Complete Undominated Sets, reflecting the fact that populations only contain non-dominated individuals, and contain all such individuals encountered so far. Focusing on non-dominated individuals combines naturally with the idea that the objectives are responsible for exploration, and this combination defines the FOCUS method.

The concept of diversity applies to populations, meaning that they are dispersed. To translate this aim into an objective for individuals, a metric has to be defined that, when optimized by individuals, leads to diverse populations. The metric used here is the average squared distance to the other members of the population. When this measure is maximized, individuals are driven away from each other.

Interestingly, the average distance metric strongly depends on the current population. If the population were centered around a single central peak in the fitness landscape, then individuals that moved away from that peak could survive by satisfying the diversity objective better than the individuals around the fitness peak. It might be expected that this would cause large parts of the population to occupy regions that are merely far away from other individuals but are not relevant to the problem. However, if there are any differences in fitness in the newly explored region of the search space, then the fitter individuals will come to replace individuals that merely performed well on diversity. When more individuals are created in the same region, the potential for scoring highly on diversity for those individuals diminishes, and other areas will be explored. The dynamics thus created are a new way to maintain diversity.

Other techniques that aim to promote diversity in a directed way exist, and include fitness sharing (Goldberg & Richardson, 1987; Deb & Goldberg, 1989), deterministic crowding (Mahfoud, 1995), and fitness derating (Beasley, Bull, & Martin, 1993). A distinguishing feature of the method proposed here is that in choosing the diversity objective, problem-based criteria can be used to determine which individuals should be kept for exploration purposes.

7 ALGORITHM DETAILS

The algorithm selects individuals if and only if they are not dominated by other individuals in the population. The population is initialized with 300 randomly created individuals of 1 to 20 internal nodes. A cycle proceeds as follows. A chosen number n of new individuals (300) is generated based on the current population using crossover (90%) and mutation (10%). If an individual already exists in the population, it is mutated. If the result also exists, it is discarded. Otherwise it is added to the population. All individuals are then evaluated if necessary. After evaluation, all population members are checked against other population members, and removed if dominated by any of them.

A slightly stricter criterion than Pareto's is used: A dominates B if $\forall i \in [1..n]: A_i \geq B_i$. Of multiple individuals occupying the same point on the tradeoff surface, precisely one will remain, since the removal criterion is applied sequentially. This criterion was used because the Pareto criterion caused a proliferation of individuals occupying the same point on the tradeoff surface when no diversity objective was used [2].

[2] In later experiments including the diversity objective, this proliferation was not observed, and the standard Pareto criterion also worked satisfactorily.

Figure 5: Average tree size and fraction of successful runs for the [fitness, size, diversity] objective vector on the 3-parity problem. The trees are much smaller than for basic GP, and solutions are found faster.

    The following distance measure is used in the diversity

    objective. The distance between two corresponding

    nodes is zero if they are identical and one if they are

    not. The distance between two trees is the sum of the

    distances of the corresponding nodes, i.e. nodes that

    overlap when the two trees are overlaid, starting from

    the root. The distance between two trees is normalized

    by dividing by the size of the smaller tree of the two.
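
The following sketch shows one possible implementation of this distance and of the resulting diversity objective; the simple Node class is our assumption for illustration, not the paper's representation.

    class Node:
        """Minimal GP tree node: a label and a list of child nodes."""
        def __init__(self, label, children=()):
            self.label = label
            self.children = list(children)

        def size(self):
            return 1 + sum(child.size() for child in self.children)

    def overlap_distance(t1, t2):
        """Sum of node-wise differences over the region where the trees overlap."""
        d = 0 if t1.label == t2.label else 1
        for c1, c2 in zip(t1.children, t2.children):   # zip keeps only overlapping children
            d += overlap_distance(c1, c2)
        return d

    def distance(t1, t2):
        """Overlap distance normalized by the size of the smaller tree."""
        return overlap_distance(t1, t2) / min(t1.size(), t2.size())

    def diversity(tree, population):
        """Average squared distance to the other members of the population."""
        others = [other for other in population if other is not tree]
        return sum(distance(tree, other) ** 2 for other in others) / len(others)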

    8 EXPERIMENTAL RESULTS

In the following experiments we use fitness, size, and

diversity as objectives. The implementation of the ob-

jectives is as follows. Fitness is the fraction of all 2^n

    input combinations handled correctly. For size, we use

    1 over the number of nodes in the tree as the objective

    value. The diversity objective is the average squared

    distance to the other population members.
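
Put together, the objective vector of an individual might be assembled as in the sketch below; the program, tree, and target names are placeholders for illustration (for even-n-parity the target would be, e.g., lambda bits: sum(bits) % 2 == 0), and the helpers are those sketched above.

    from itertools import product

    def objectives(ind, population, n, target):
        """[fitness, size, diversity] objective vector (illustrative sketch)."""
        cases = list(product([False, True], repeat=n))
        fitness = sum(ind.program(bits) == target(bits) for bits in cases) / len(cases)
        size_objective = 1.0 / ind.tree.size()     # 1 over the number of nodes
        diversity_objective = diversity(ind.tree, [p.tree for p in population])
        return [fitness, size_objective, diversity_objective]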

    8.1 USING FITNESS, SIZE, AND

    DIVERSITY AS OBJECTIVES

    Fig. 5 shows the graph of Fig. 3 for the method of

using fitness, size, and diversity as objectives. The av-

    erage tree size remains extremely small. In addition,

    a glance at the graphs indicates that correct solutions

    are found more quickly. To determine whether this

    is indeed the case, we compute the computational ef-

    fort, i.e. the expected number of evaluations required

    to yield a correct solution with a 99% probability, as

    described in detail by Koza (1994).
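
As a reminder of that computation, adapted here to evaluation counts rather than Koza's generations (an assumption on our part): estimate P(i), the fraction of runs that have succeeded by i evaluations; the number of independent runs needed to succeed with probability z = 0.99 is then ceil(ln(1 - z) / ln(1 - P(i))), and the effort is the minimum over i of i times that number. A sketch:

    import math

    def computational_effort(first_success, checkpoints, z=0.99):
        """Koza-style computational effort over evaluation counts (sketch).

        first_success[r] is the evaluation count at which run r first found a
        correct solution, or None if it never did; checkpoints lists the
        evaluation counts at which P(success) is estimated.
        """
        effort = math.inf
        for i in checkpoints:
            p = sum(1 for s in first_success if s is not None and s <= i) / len(first_success)
            if p == 0:
                continue
            runs_needed = 1 if p == 1 else math.ceil(math.log(1 - z) / math.log(1 - p))
            effort = min(effort, i * runs_needed)
        return effort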

    The impression that correct solutions to 3-parity are

    found more quickly for the multi-objective approach

(see Figure 6) is confirmed by considering the com-

putational effort E; whereas GP with the tree size

limit requires 72,044 evaluations, the multi-objective

approach requires 42,965 evaluations. For the 4-

parity problem, the difference is larger; basic GP needs



[Figure 6 plot omitted: curves of P(correct solution) and expected required evaluations I versus evaluations, for basic GP (E = 72,044) and the multi-objective (MO) method (E = 42,965).]

Figure 6: Probability of finding a solution and computational effort for 3-parity using basic GP and the multi-objective method.

[Figure 7 plot omitted: curves of P(correct solution) and expected required evaluations I versus evaluations, for basic GP (E = 5,410,550) and the multi-objective (MO) method (E = 238,856).]

Figure 7: Probability of finding a solution and computational effort for 4-parity for basic GP and the multi-objective method. The performance of the multi-objective method is considerably superior.

    5,410,550 evaluations, whereas the multi-objective ap-

    proach requires only 238,856. This is a dramatic im-

    provement, and demonstrates that our method can be

very effective.

    Finally, experiments have been performed using the

even more difficult 5-parity problem. For this prob-

lem, basic GP did not find any correct solutions within

a million evaluations. The multi-objective method did

find solutions, and did so reasonably efficiently, requir-

ing a computational effort of 1,140,000 evaluations.

    Table 3 summarizes the results of the experiments.

    Considering the average size of correct solutions on

    3-parity, the multi-objective method outperforms all

methods that have been compared, as the first solution

it finds has 30.4 nodes on average. What's more, the

    multi-objective method also requires a smaller num-

    ber of evaluations to do so than the other methods.

Finally, perhaps most surprisingly, it finds correct so-

    lutions using extremely small populations, typically

    containing less than 10 individuals. For example, the

    average population size over the whole experiment for

3-parity was 6.4, and 8.5 at the end of the experiment,

and the highest population size encountered in all 30 runs was 18.

Table 3: Results of the experiments (GP and Multi-Objective rows). For comparison, results of Koza's (1994) set of experiments (population size 16,000) and the best results with other configurations (population size 4,000) found there. E: computational effort, S: average tree size of first solution, Pop: average population size.

    3-parity            E           S       Pop
    GP                  72,044      93.67   1000
    Multi-objective     42,965      30.4    6.4
    Koza GP             96,000      44.6    16,000
    Koza GP-ADF         64,000      48.2    16,000

    4-parity            E           S       Pop
    GP                  5,410,550   154     1000
    Multi-objective     238,856     68.5    15.8
    Koza GP             384,000     112.6   16,000
    Koza GP-ADF         176,000     60.1    16,000

    5-parity            E           S       Pop
    GP                  --¹         n.a.    n.a.
    Multi-objective     1,140,000   218.7   49.7
    Koza GP             6,528,000   299.9   16,000
    Koza GP             1,632,000   299.9   4,000
    Koza GP-ADF         464,000     156.8   16,000
    Koza GP-ADF         272,000     99.5    4,000

¹ No solutions were found for 5-parity using basic GP.

This suggests that the diversity main-

    tenance achieved by using this greedy multi-objective

    method in combination with an explicit diversity ob-

jective is effective, since even extremely small popula-

    tions did not result in premature convergence.

    Considering 4 and 5-parity, the GP extended with the

    size and diversity objectives outperforms both basic

    GP methods used by Koza (1994) and the basic GP

    method tested here, both in terms of computational

effort and tree size. The Automatically Defined Func-

    tion (ADF) experiments performed by Koza for these

    and larger problem sizes perform better. These prob-

ably benefit from the inductive bias of ADFs, which

    favors a modular structure. Therefore, a natural di-

    rection for future experiments is to also extend ADFs

    with size and diversity objectives.

    For comparison, we also implemented an evolutionary

    multi-objective technique that does keep some domi-

    nated individuals. It used the number of individuals by

    which an individual is dominated as a rank, similar to

    the method described by Fonseca and Fleming (1993).

    The results were similar in terms of evaluations, but

    the method keeping strictly non-dominated individuals

    worked faster, probably due to the calculation of the

    distance measure. Since this is quadratic in the pop-

ulation size, the small populations of the multi-objective

method save much time (about a factor of 7 for 5-parity), which

    made it preferable.


As a control experiment, we also investigated whether

the diversity objective is really required, by using

only fitness and size as objectives in the algorithm

that was described. The individuals found are small

(around 10 nodes), but the fitness of the individuals

found was well below that of basic GP, and hence the diver-

    sity objective was indeed performing a useful function

    in the experiments.

    8.2 OBTAINING STILL SMALLER

    SOLUTIONS

    Finally, we investigate whether the algorithm is able

to find smaller solutions, after finding the first. Af-

ter the first correct solution is found, we monitor the

smallest correct solution. Although the first solution

size of 30 was already low compared to other methods,

the algorithm rapidly finds smaller correct solutions.

    The average size drops to 22 within 4,000 additional

    evaluations, and converges to around 20. The smallest

    tree (found in 12 out of 30 runs) was 19, i.e. equalling

    the presumed minimum size. On 4-parity, solutions

    dropped in size from the initial 68.5 to 50 in about

    10,000 evaluations, and to 41 on average when runs

    were continued longer (85,000 evaluations). In 12 of

    the 30 runs, minimum size solutions (31 nodes) were

    found. Using the same method, a minimum size solu-

    tion to 5-parity (55 nodes) was also found.

    The quick convergence to smaller tree sizes shows that

at least for the problem at hand, the method is effec-

tive at finding small solutions when it is continued run-

ning after the first correct solutions have been found,

    in line with the seeding experiments by Langdon and

    Nordin (2000).

    9 CONCLUSIONS

    The paper has discussed using multi-objective meth-

    ods as a general approach to avoiding bloat in GP

    and to promoting diversity, which is relevant to evo-

    lutionary algorithms in general. Since both of these

    issues are often implicit goals, a straightforward idea

    is to make them explicit by adding corresponding ob-

    jectives. In the experiments that are reported, a size

    objective rewards smaller trees, and a diversity objec-

tive rewards trees that are different from other individ-

    uals in the population, as calculated using a distance

    measure.

    Strongly positive results are reported regarding both

    size control and diversity maintenance. The method

    is successful in keeping the trees that are visited small

    without requiring a size limit or a relative weighting of

    �tness and size. It impressively outperforms basic GP

on the 3-, 4-, and 5-parity problems, both with respect

    to computational e�ort and tree size. Furthermore,

    correct solutions of what we believe to be the minimum

    size have been found for all problem sizes examined,

    i.e. the even 3, 4, and 5-parity problems.

The effectiveness of the new way of promoting diver-

    sity proposed here can be assessed from the follow-

    ing, which concerns the even 3, 4, and 5-parity prob-

    lems. The multi-objective algorithm that was used

    only maintains individuals that are not dominated by

    other individuals found so far, and maintains all such

    individuals (except those with identical objective vec-

    tors). Thus, only non-dominated individuals are se-

    lected after each generation, and populations (hence)

    remained extremely small (6, 16, and 50 on average,

respectively). Despite this uncommon degree of

greediness or elitism, sufficient diversity was achieved

to solve these problems efficiently in comparison with

the results of the basic GP method, both as obtained here and as

    found in the literature. Control experiments in which

the diversity objective was removed (leaving the fit-

ness and size objectives) failed to maintain sufficient

    diversity, as would be expected.

    The approach that was pursued here is to make de-

    sired characteristics of search into explicit objectives

    using multi-objective methods. This method is simple

    and straightforward and performed well on the prob-

    lem sizes reported, in that it improved the performance

    of basic GP on 3 and 4-parity. It solved 5-parity rea-

sonably efficiently, even though basic GP found no so-

    lutions on 5-parity. For problem sizes of 6 and larger,

    basic GP is no longer feasible, and more sophisticated

    methods must be invoked that make use of modular-

ity, such as Koza's Automatically Defined Functions

    (1994) or Angeline's GLiB (1992). We expect that the

    multi-objective approach with size and diversity as ob-

    jectives that was followed here could also be of value

    when used in combination with these or other existing

    methods in evolutionary computation.

    Acknowledgements

    The authors would like to thank Michiel de Jong,

    Pablo Funes, Hod Lipson, and Alfonso Renart for use-

    ful comments and suggestions concerning this work.

    Edwin de Jong gratefully acknowledges a Fulbright

    grant.

    References

Angeline, P. J., & Pollack, J. B. (1992). The evolutionary induction of subroutines. In Proceedings of the fourteenth annual conference of the cognitive science society (pp. 236-241). Bloomington, Indiana, USA: Lawrence Erlbaum.

Beasley, D., Bull, D. R., & Martin, R. R. (1993). A sequential niche technique for multimodal function optimization. Evolutionary Computation, 1(2), 101-125.

Blickle, T., & Thiele, L. (1994). Genetic programming and redundancy. In J. Hopf (Ed.), Genetic algorithms within the framework of evolutionary computation (workshop at KI-94, Saarbrucken) (pp. 33-38). Im Stadtwald, Building 44, D-66123 Saarbrucken, Germany: Max-Planck-Institut fur Informatik (MPI-I-94-241).

Deb, K., & Goldberg, D. E. (1989). An investigation of niche and species formation in genetic function optimization. In J. D. Schaffer (Ed.), Proceedings of the 3rd international conference on genetic algorithms (pp. 42-50). George Mason University: Morgan Kaufmann.

Ekart, A. (2001). Selection based on the Pareto nondomination criterion for controlling code growth in genetic programming. Genetic Programming and Evolvable Machines, 2, 61-73.

Fleming, P. J., & Pashkevich, A. P. (1985). Computer-aided control system design using a multiobjective optimization approach. In Proceedings of the IEE international conference Control '85 (pp. 174-179). Cambridge, UK.

Fonseca, C. M., & Fleming, P. J. (1993). Genetic algorithms for multiobjective optimization: Formulation, discussion and generalization. In S. Forrest (Ed.), Proceedings of the fifth international conference on genetic algorithms (ICGA'93) (pp. 416-423). San Mateo, California: Morgan Kaufmann Publishers.

Fonseca, C. M., & Fleming, P. J. (1995). An overview of evolutionary algorithms in multiobjective optimization. Evolutionary Computation, 3(1), 1-16.

Gathercole, C., & Ross, P. (1996). An adverse interaction between crossover and restricted tree depth in genetic programming. In J. R. Koza, D. E. Goldberg, D. B. Fogel, & R. L. Riolo (Eds.), Genetic Programming 1996: Proceedings of the first annual conference (pp. 291-296). Stanford University, CA, USA: MIT Press.

Goldberg, D. E. (1989). Genetic algorithms in search, optimization, and machine learning. Addison-Wesley.

Goldberg, D. E., & Richardson, J. (1987). Genetic algorithms with sharing for multimodal function optimization. In J. J. Grefenstette (Ed.), Genetic algorithms and their applications: Proceedings of the second international conference on genetic algorithms (pp. 41-49). Hillsdale, NJ: Lawrence Erlbaum Associates.

Koza, J. R. (1992). Genetic programming. Cambridge, MA: MIT Press.

Koza, J. R. (1994). Genetic programming II: Automatic discovery of reusable programs. Cambridge, MA: MIT Press.

Langdon, W. B. (1996). Advances in genetic programming 2. In P. J. Angeline & K. Kinnear (Eds.) (pp. 395-414). Cambridge, MA: MIT Press. (Chapter 20)

Langdon, W. B., & Nordin, J. P. (2000). Seeding GP populations. In R. Poli, W. Banzhaf, W. B. Langdon, J. F. Miller, P. Nordin, & T. C. Fogarty (Eds.), Genetic Programming, Proceedings of EuroGP'2000 (Vol. 1802, pp. 304-315). Edinburgh: Springer-Verlag.

Langdon, W. B., & Poli, R. (1998). Fitness causes bloat: Mutation. In W. Banzhaf, R. Poli, M. Schoenauer, & T. C. Fogarty (Eds.), Proceedings of the first European workshop on genetic programming (Vol. 1391, pp. 37-48). Paris: Springer-Verlag.

Mahfoud, S. W. (1995). Niching methods for genetic algorithms. Unpublished doctoral dissertation, University of Illinois at Urbana-Champaign, Urbana, IL, USA. (IlliGAL Report 95001)

McPhee, N. F., & Miller, J. D. (1995). Accurate replication in genetic programming. In L. Eshelman (Ed.), Genetic algorithms: Proceedings of the sixth international conference (ICGA95) (pp. 303-309). Pittsburgh, PA, USA: Morgan Kaufmann.

Nordin, P., & Banzhaf, W. (1995). Complexity compression and evolution. In L. Eshelman (Ed.), Genetic algorithms: Proceedings of the sixth international conference (ICGA95) (pp. 310-317). Pittsburgh, PA, USA: Morgan Kaufmann.

Nordin, P., Francone, F., & Banzhaf, W. (1996). Explicitly defined introns and destructive crossover in genetic programming. In P. J. Angeline & K. E. Kinnear, Jr. (Eds.), Advances in genetic programming 2 (pp. 111-134). Cambridge, MA, USA: MIT Press.

Rodriguez-Vazquez, K., Fonseca, C. M., & Fleming, P. J. (1997). Multiobjective genetic programming: A nonlinear system identification application. In J. R. Koza (Ed.), Late breaking papers at the 1997 genetic programming conference (pp. 207-212). Stanford University, CA, USA: Stanford Bookstore.

Rosca, J. (1996). Generality versus size in genetic programming. In J. R. Koza, D. E. Goldberg, D. B. Fogel, & R. L. Riolo (Eds.), Genetic Programming 1996: Proceedings of the first annual conference (pp. 381-387). Stanford University, CA, USA: MIT Press.

Schaffer, J. D. (1985). Multiple objective optimization with vector evaluated genetic algorithms. In J. J. Grefenstette (Ed.), Proceedings of the 1st international conference on genetic algorithms and their applications (pp. 93-100). Pittsburgh, PA: Lawrence Erlbaum Associates.

Soule, T. (1998). Code growth in genetic programming. Unpublished doctoral dissertation, University of Idaho.

Soule, T., & Foster, J. A. (1999). Effects of code growth and parsimony pressure on populations in genetic programming. Evolutionary Computation, 6(4), 293-309.

Soule, T., Foster, J. A., & Dickinson, J. (1996). Code growth in genetic programming. In J. R. Koza, D. E. Goldberg, D. B. Fogel, & R. L. Riolo (Eds.), Genetic Programming 1996: Proceedings of the first annual conference (pp. 215-223). Stanford University, CA, USA: MIT Press.

Tackett, W. A. (1993). Genetic programming for feature discovery and image discrimination. In S. Forrest (Ed.), Proceedings of the 5th international conference on genetic algorithms, ICGA-93 (pp. 303-309). University of Illinois at Urbana-Champaign: Morgan Kaufmann.

Van Veldhuizen, D. A. (1999). Multiobjective evolutionary algorithms: Classifications, analyses, and new innovations. Unpublished doctoral dissertation, Department of Electrical and Computer Engineering, Graduate School of Engineering, Air Force Institute of Technology, Wright-Patterson AFB, Ohio.

Zissos, D. (1972). Logic design algorithms. London: Oxford University Press.


Adaptive Genetic Programs via Reinforcement Learning

    Keith L. Downing

    Department of Computer Science

    The Norwegian University of Science and Technology (NTNU)

    7020 Trondheim, Norway

    tele: (+47) 73 59 18 40

    email: [email protected]

    Abstract

    Reinforced Genetic Programming (RGP) en-

    hances standard tree-based genetic program-

    ming (GP) [7] with reinforcement learning

    (RL)[11]. Essentially, leaf nodes of GP trees

    become monitored action-selection points,

    while the internal nodes form a decision tree

    for classifying the current state of the prob-

    lem solver. Reinforcements returned by the

problem solver govern both fitness evaluation

    and intra-generation learning of the proper

    actions to take at the selection points. In

theory, the hybrid RGP system hints at mu-

tual benefits to RL and GP in controller-

    design applications, by, respectively, provid-

    ing proper abstraction spaces for RL search,

    and accelerating evolutionary progress via

    Baldwinian or Lamarckian mechanisms. In

    practice, we demonstrate RGP's improve-

    ments over standard GP search on maze-

search tasks.

    1 Introduction

    The bene�ts of combining evolution and learning,

    while largely theoretical in the biological sciences,

have found solid empirical verification in the field

    of evolutionary computation (EC). When evolution-

    ary algorithms (EAs) are supplemented with learning

    techniques, general adaptivity improves such that the

learning EA finds solutions faster than the standard

    EA [3, 16]. These enhancements can stem from bi-

    ologically plausible mechanisms such as the Baldwin

Effect [2, 14], or from disproven phenomena such as

    Lamarckianism [8, 4].

    In most learning EAs, the data structure or program

    in which learning occurs is divorced from the structure

    that evolves. For example, a common learning EA is a

hybrid genetic-algorithm (GA) - artificial neural net-

    work (ANN) system in which the GA encodes a basic

    ANN topology (plus possibly some initial arc weights),

and the ANN then uses backpropagation or Hebbian

    learning to gradually modify those weights [17, 10, 6].

A Baldwin Effect is often evident in the fact that the

    GA-encoded weights improve over time, thus reduc-

    ing the need for learning [1]. Lamarckianism can be

    added by reversing the morphogenic process and back-

    encoding the ANN's learned weights into the GA chro-

    mosome prior to reproduction [12].

    Our primary objective is to realize Baldwinian and

    Lamarckian adaptivity within standard tree-based ge-

    netic programs [7], without the need for a complex

    morphogenic conversion to a separate learning struc-

    ture. Hence, as the GP program runs, the tree nodes

    can adapt, thereby altering (and hopefully improving)

    subsequent runs of the same program. Thus, the typi-

    cal problem domain is one in which each GP tree exe-

cutes many times during fitness evaluation, for exam-

    ple, in control tasks.

    2 RGP Overview

    Reinforced Genetic Programming combines reinforce-

    ment learning [11] with conventional tree-based genetic

    programming [7]. This produces GP trees with rein-

    forced action-choice leaf nodes, such that successive

    runs of the same tree exhibit improved performance on

the fitness task. These improvements may or may not

    be reverse-encoded into the genomic form of the tree,

    thus facilitating tests of both Baldwinian and Lamar-

    ckian enhancements to GP.

    The basic idea is most easily explained by exam-

    ple. Consider a small control program for a maze-

    wandering agent:


(if (between 0 x 5)
    (if (between 0 y 5)
        (choice (move-west) (move-north))     ; R1
        (choice (move-east) (move-south)))    ; R2
    (if (between 6 x 8)
        (choice (move-west) (move-east))      ; R3
        (choice (move-north) (move-south))))  ; R4

    Figure 1 illustrates the relationship between this pro-

    gram and the 10x10 maze. Variables x and y specify

the agent's current maze coordinates, while the choice

    nodes are monitored action decisions. The between

    predicate simply tests if the middle argument is within

the closed range specified by the first and third argu-

    ments, while the move functions are discrete one-cell

    jumps. So if the agent's current location falls within

the southwest region, R1, specified by the (between 0

    x 5) and (between 0 y 5) predicates of the decision

    tree, then the agent can choose between a westward

    and a northward move; whereas the eastern edge gives

    a north-south option.

During fitness testing, the agent will execute its tree

    code on each timestep and perform the recommended

    action in the maze, which then returns a reinforcement

    signal. For example, hitting a wall may invoke a small

    negative signal, while reaching a goal state would gar-

    ner a large positive payback.
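
The evaluation loop implied by this description might be sketched as follows; the method names and the particular reward values are illustrative assumptions, not details from the paper.

    def evaluate_fitness(tree, maze, n_steps):
        """Run the GP tree as a maze controller and return the average
        reinforcement per timestep (illustrative sketch)."""
        total_reinforcement = 0.0
        agent = maze.place_agent_at_start()
        for _ in range(n_steps):
            action = tree.execute(agent.position())    # the tree recommends an action
            reinforcement = maze.apply(agent, action)  # e.g. -0.1 for hitting a wall,
                                                       # +10 for reaching the goal
            tree.record(reinforcement)                 # choice nodes accumulate statistics
            total_reinforcement += reinforcement
        return total_reinforcement / n_steps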

    Initially, the choice nodes select randomly among their

    possible actions, but as the �tness test proceeds, each

    node accumulates reinforcement statistics as to the rel-

    ative utility of each action (in the context of the par-

    ticular location of the choice node in the decision tree,

which reflects the location of the agent in the maze).

After a fixed number of random free trials, which is

    a standard parameter in reinforcement-learning sys-

    tems (RLSs), the node begins making stochastic action

    choices based on the reinforcement statistics. Hence,

    the node's initial exploration gives way to exploitation.
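
One plausible form of such a choice node is sketched below. The free-trial count and the value-weighted stochastic rule are assumptions of ours; the paper only states that random selection gives way to stochastic choices based on the accumulated reinforcement statistics.

    import math
    import random

    class ChoiceNode:
        """Action-selection leaf: random free trials, then stochastic exploitation."""
        def __init__(self, actions, free_trials=10):
            self.actions = list(actions)
            self.free_trials = free_trials
            self.trials = 0
            self.value = {a: 0.0 for a in self.actions}   # reinforcement statistics

        def choose(self):
            self.trials += 1
            if self.trials <= self.free_trials:
                return random.choice(self.actions)        # exploration phase
            # Exploitation: stochastic choice weighted by accumulated value.
            weights = [math.exp(self.value[a]) for a in self.actions]
            return random.choices(self.actions, weights=weights)[0]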

    Along with determining the tree's internal decisions,

    the evolving genome sets the range for RL exploration

    by specifying the possible actions to the choice nodes;

the RLS then fine-tunes the search. By including al-

    ternate forms of choice nodes in GP's primitive set,

    such as choice-4, choice-2, choice-1 (direct action),

    where the integer denotes the number of action argu-

ments, the RGP's learning effort comes under evolu-

    tionary control. Over many evolutionary generations,

    the genomes provide more appropriate decision trees

    and more restricted (yet more relevant) action options

    to the RLS.

    In the maze domain, learning has an implicit cost due

    to the nature of the �tness function, which is based on

[Figure 1 diagram omitted: a 10x10 maze with Start and Goal cells and regions R1-R4, shown next to the decision tree built from the (between 0 x 5), (between 0 y 5), and (between 6 x 8) tests and their choice nodes.]

Figure 1: The genetic program determines a partitioning of the reinforcement-learning problem space.

    the average reinforcement per timestep of the agent.

    So an agent that moves directly to a goal location (or

    follows a wall without any explorative "bumps" into it)

    will have higher average reinforcement than one that

investigates areas off the optimal path. Initially, ex-

plorative learning helps the agent find the goal, but

    then evolution further hones the controllers to follow

    shorter paths to the goal, with little or no opportu-

    nity for stochastic action choices. Hence, the average

reinforcement (i.e. fitness) steadily increases, first as

a result of learning (phase I of the Baldwin Effect)

    and then as a result of genomic hard-wiring (phase II)

    encouraged by the implicit learning cost [9].

    To exploit Lamarckianism, RGP can replace any

    choice node in the genomic tree with a direct action

    function for the action that was deemed best for that

    node. Hence, if the choice node for R1 in Figure 1

    learns that north is the best move from this region

    (while choices for R2 and R3 �nd eastward moves most

    pro�table, and R4 learns the advantage of southward

    moves), then prior to reproduction, the genome can be

    specialized to:

(if (between 0 x 5)
    (if (between 0 y 5) (move-north) (move-east))
    (if (between 6 x 8) (move-east) (move-south)))

    This represents an optimal control strategy for the ex-

    ample, with no time squandered on exploration.


3 Reinforcement Learning in RGP

    Reinforcement Learning comes in many shapes and

    forms, and the basic design of RGP supports many of

    these variations. However, the examples in this paper

    use Q-learning [15] with eligibility traces.

Q-learning is an off-policy temporal differencing form of RL. In conventional RL terminology, Q(s,a) denotes the value of choosing action a while in state s. Temporal differencing implies that, to update Q(s,a) for the current state, s_t, and most recent action, a_t, we utilize the difference between the current value of Q(s_t, a_t) and the sum of a) the reward, r_{t+1}, received after executing action a in state s, and b) the discounted value of the new state that results from performing a in s. For the new state, s_{t+1}, its value, V(s_{t+1}), is based on the best possible action that can be taken from s_{t+1}, or max_a Q(s_{t+1}, a). Hence, the complete update equation is:

    Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_{t+1} + γ max_a Q(s_{t+1}, a) - Q(s_t, a_t) ]        (1)

Here, γ is the discount rate and α is the step size or learning rate. The expression in brackets is the temporal-difference error, δ_t. Thus, if performing a in s leads to positive (negative) rewards and good (bad) next states, then Q(s,a) will increase (decrease), with the degree of change governed by α and γ.

    To implement these Q(s,a) updates (the core activity

    of Q-learning) within GP trees, RGP employs qstate

    objects, one per choice node. Each qstate houses a list

    of state-action pairs (SAPs), where the value slot of

    each SAP corresponds to Q(s,a). For each GP tree, a

    qtable object is generated. It keeps track of all qstates

in the tree, as well as those most recently visited and

the latest reinforcement signal.
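
A sketch of such a qstate object and of the update of Equation 1 follows; the class and slot names are ours, chosen for illustration, and eligibility traces are omitted for brevity.

    class QState:
        """Per-choice-node store of state-action values, Q(s,a)."""
        def __init__(self, actions, alpha=0.1, gamma=0.9):
            self.q = {a: 0.0 for a in actions}   # one value slot per action
            self.alpha = alpha                   # step size (learning rate)
            self.gamma = gamma                   # discount rate

        def update(self, action, reward, next_qstate):
            """Apply the Q-learning update of Equation 1 to the chosen action."""
            best_next = max(next_qstate.q.values()) if next_qstate else 0.0
            td_error = reward + self.gamma * best_next - self.q[action]
            self.q[action] += self.alpha * td_error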

In conventional RL, all possible states are deter-

    mined prior to any learning, with each state typically a

    point in a space whose dimensions are the relevant en-

    vironmental factors and internal state variables of the

    agent. So for a maze-wandering robot, the dimensions

    might be discretized

