GENETIC PROGRAMMING

Finding Perceived Pattern Structures using Genetic Programming

Mehdi Dastani
Dept. of Mathematics and Computer Science
Free University Amsterdam
The Netherlands
email: [email protected]

Elena Marchiori
Dept. of Mathematics and Computer Science
Free University Amsterdam
The Netherlands
email: [email protected]

Robert Voorn
Dept. of Mathematics and Computer Science
Free University Amsterdam
The Netherlands
email: [email protected]

Abstract

Structural information theory (SIT) deals with the perceptual organization, often called the `gestalt' structure, of visual patterns. Based on a set of empirically validated structural regularities, the perceived organization of a visual pattern is claimed to be the most regular (simplest) structure of the pattern. The problem of finding the perceptual organization of visual patterns has relevant applications in multi-media systems, robotics and automatic data visualization. This paper shows that genetic programming (GP) is a suitable approach for solving this problem.

1 Introduction

In principle, a visual pattern can be described in many different ways; however, in most cases it will be perceived as having a certain description. For example, the visual pattern illustrated in Figure 1-A may have, among others, the two descriptions illustrated in Figure 1-B and 1-C. Human perceivers usually prefer the description illustrated in Figure 1-B. An empirically supported theory of visual perception is Structural Information Theory (SIT) [Leeuwenberg, 1971, Van der Helm and Leeuwenberg, 1991, Van der Helm, 1994]. SIT proposes a set of empirically validated and perceptually relevant structural regularities and claims that the preferred description of a visual pattern is based on the structure that covers most regularities in that pattern. Using the formalization of the notions of perceptually relevant structure and simplicity given by SIT, the problem of finding the simplest structure of a visual pattern (the SPS problem) can be formulated mathematically as a constrained optimization problem.

Figure 1: Visual pattern A has two potential structures B and C.

The SPS problem has relevant applications. For example, multimedia systems and image databases need to analyze, classify, and describe images in terms of the constitutive objects that human users perceive in those images [Zhu, 1999]. Furthermore, autonomous robots need to analyze their visual inputs and construct hypotheses about objects possibly present in their environments [Kang and Ikeuchi, 1993]. Also, in the field of information visualization the goal is to generate images that represent information such that human viewers extract that information by looking at the images [Bertin, 1981]. In all these applications, a model of gestalt perception is indispensable [Mackinlay, 1986, Marks and Reiter, 1990]. We focus on a simple domain of visual patterns and claim that an appropriate model of gestalt perception for this domain is an essential step towards a model of gestalt perception for the more complex visual patterns used in the above-mentioned real-world applications [Dastani, 1998].

Since the search space of possible structures grows exponentially with the complexity of the visual pattern, heuristic algorithms have to be used for solving the SPS problem efficiently. The only algorithm for SPS we are aware of is the one developed by [Van der Helm and Leeuwenberg, 1986]. This algorithm ignores the important source of computational complexity of the problem and covers only a subclass of perceptually relevant structures. The central part of this partial algorithm consists of translating the search for a simplest structure into a shortest route problem. The algorithm is shown to have O(N^4) computational complexity, where N denotes the length of the input pattern. To cover all perceptually relevant structures, not only for the domain of visual line patterns but also for more complex domains of visual patterns, it is argued in [Dastani, 1998] that the computational complexity grows exponentially with the length of the input patterns.

This paper shows that genetic programming [Koza, 1992] provides a natural paradigm for solving the SPS problem using SIT. A novel evolutionary algorithm is introduced whose main features are the use of SIT operators for generating the initial population of candidate structures, and the use of knowledge-based genetic operators in the evolutionary process. The use of GP is motivated by the SIT formalization: structures can be easily described using the standard GP tree representation. However, the GP search is constrained by the fact that structures have to characterize the same input pattern. In order to satisfy this constraint, knowledge-based operators are used in the evolutionary process.

The paper is organized as follows. In the next section, we briefly discuss the problem of visual perception and explain how SIT predicts the perceived structure of visual line patterns. In Section 3, SIT is used to give a formalization of the SPS problem for visual line patterns. Section 4 describes how the formalization can be used in an automatic procedure for generating structures. Section 5 introduces the GP algorithm for SPS. Section 6 describes implementation aspects of the algorithm and reports some results of experiments. The paper concludes with a summary of the contributions and future research directions.

2 SIT: A Theory of Visual Perception

According to the structural information theory, the human perceptual system is sensitive to certain kinds of structural regularities within sensory patterns. These are called perceptually relevant structural regularities, and they are specified by means of the ISA operators: Iteration, Symmetry and Alternations [Van der Helm and Leeuwenberg, 1991]. Examples of string patterns that can be specified by these operators are abab, abcba, and abgabpz, respectively. A visual pattern can be described in different ways by applying different ISA operators. In order to disambiguate the set of descriptions and to decide on the perceived organization of the pattern, a simplicity measure, called information load, is introduced. The information load measures the amount of perceptually relevant regularity covered by a pattern description. It is claimed that the description of a visual pattern with the minimum information load reflects its perceived organization [Van der Helm, 1994].

In this paper, we focus on the domain of linear line patterns, which are turtle-graphics-like line drawings for which the turtle starts somewhere and moves in such a way that the line segments are connected and do not cross each other. A linear line pattern is encoded as a letter string, for which it can be shown that its simplest description represents the perceived organization of the encoded linear line pattern [Leeuwenberg, 1971]. The encoding process consists of two steps. In the first step, the successive line segments and their relative angles in the pattern are traced from the starting point of the pattern, and identical letter symbols are assigned to identical line segments (equal length) as well as to identical angles (relative to the trace movement). In the second step, the letter symbols assigned to line segments and angles are concatenated in the order they have been visited during the trace of the first step. This results in a letter string that represents the pattern. An example of such an encoding is illustrated in Figure 2.

Figure 2: Encoding of a line pattern into a string (here, the string axaybxbybxb).

Note that letter strings are themselves perceptual patterns that can be described in many different ways, one of which is usually the perceived description. The determination of the perceived description of string patterns is the essential focus of Hofstadter's Copycat project [Hofstadter, 1984].

3 The SPS Problem

In this section, we formally define the class of string descriptions that represent possible perceptually relevant organizations of linear line patterns. Also, a complexity function is defined that measures the information load of those descriptions. In this way, we can encode a linear line pattern into a string, generate the perceptually relevant descriptions of the string, and determine the perceived organization of the line pattern by choosing the string description which has the minimum information load.

The class of descriptions that represent possible perceptual organizations for linear line patterns, LLP, is defined over the set $E = \{a, \ldots, z\}$ as follows.

1. For all $t \in E$: $t \in LLP$.

2. If $t \in LLP$ and $n$ is a natural number, then $iter(t, n) \in LLP$.

3. If $t \in LLP$, then $symeven(t) \in LLP$.

4. If $t_1, t_2 \in LLP$, then $symodd(t_1, t_2) \in LLP$.

5. If $t, t_1, \ldots, t_n \in LLP$, then $altleft(t, \langle t_1, \ldots, t_n \rangle) \in LLP$ and $altright(t, \langle t_1, \ldots, t_n \rangle) \in LLP$.

6. If $t_1, \ldots, t_n \in LLP$, then $con(t_1, \ldots, t_n) \in LLP$.

The meaning of LLP expressions can be defined by the denotational semantics $[\![\,\cdot\,]\!]$, which involves the string concatenation ($\cdot$) and string reflection ($reflect(abcde) = edcba$) operators.

1. If $t \in E$, then $[\![t]\!] = t$.

2. $[\![iter(t, n)]\!] = [\![t]\!] \cdot \ldots \cdot [\![t]\!]$ ($n$ times).

3. $[\![symeven(t)]\!] = [\![t]\!] \cdot reflect([\![t]\!])$.

4. $[\![symodd(t_1, t_2)]\!] = [\![t_1]\!] \cdot [\![t_2]\!] \cdot reflect([\![t_1]\!])$.

5. $[\![altleft(t, \langle t_1, \ldots, t_n \rangle)]\!] = [\![t]\!] \cdot [\![t_1]\!] \cdot \ldots \cdot [\![t]\!] \cdot [\![t_n]\!]$.

6. $[\![altright(t, \langle t_1, \ldots, t_n \rangle)]\!] = [\![t_1]\!] \cdot [\![t]\!] \cdot \ldots \cdot [\![t_n]\!] \cdot [\![t]\!]$.

7. $[\![con(t_1, \ldots, t_n)]\!] = [\![t_1]\!] \cdot \ldots \cdot [\![t_n]\!]$.

The complexity function $C$ on LLP expressions measures the complexity of an expression as the number of individual letters $t$ occurring in it, i.e.

$$C(t) = 1, \qquad C(f(T_1, \ldots, T_n)) = \sum_{i=1}^{n} C(T_i).$$

During the last 20 years, Leeuwenberg and his co-workers have reported on a number of experiments that tested predictions based on the simplicity principle. These experiments were concerned with the disambiguation of ambiguous patterns. The predictions of the simplicity principle were, on the whole, confirmed by these experiments [Buffart et al., 1981, Van Leeuwen et al., 1988, Boselie and Wouterlood, 1989].

The following LLP expressions describe, among others, four different perceptual organizations of the pattern axaybxbybxb:

- con(a, x, a, y, b, x, b, y, b, x, b)
- con(symodd(a, x), y, symodd(b, x), y, symodd(b, x))
- con(symodd(a, x), iter(con(y, b, x, b), 2))
- con(symodd(a, x), iter(altright(b, <y, x>), 2))

Note that these descriptions reflect four different perceptual organizations of the line pattern that is illustrated in Figure 2. The information loads of these four descriptions are 11, 8, 6, and 5, respectively. This implies that the last description reflects the perceived organization of the line pattern illustrated in Figure 2.
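As a concrete illustration of the definitions above, the following minimal Python sketch (not part of the paper; the tuple encoding and the helper names denote and complexity are illustrative choices) computes the denotation and the information load of an LLP expression and checks the fourth description of axaybxbybxb.

def denote(t):
    """Return the string denoted by an LLP expression."""
    if isinstance(t, str):                       # primitive element t in E
        return t
    op, args = t[0], t[1:]
    if op == 'iter':                             # iter(t, n): t repeated n times
        return denote(args[0]) * args[1]
    if op == 'symeven':                          # symeven(t): t . reflect(t)
        s = denote(args[0])
        return s + s[::-1]
    if op == 'symodd':                           # symodd(t1, t2): t1 . t2 . reflect(t1)
        s1, s2 = denote(args[0]), denote(args[1])
        return s1 + s2 + s1[::-1]
    if op == 'altleft':                          # t t1 t t2 ... t tn
        return ''.join(denote(args[0]) + denote(ti) for ti in args[1])
    if op == 'altright':                         # t1 t t2 t ... tn t
        return ''.join(denote(ti) + denote(args[0]) for ti in args[1])
    if op == 'con':                              # plain concatenation
        return ''.join(denote(ti) for ti in args)
    raise ValueError(op)

def complexity(t):
    """Information load C: number of primitive letters occurring in t."""
    if isinstance(t, str):
        return 1
    op, args = t[0], t[1:]
    if op in ('iter', 'symeven'):
        return complexity(args[0])
    if op in ('altleft', 'altright'):
        return complexity(args[0]) + sum(complexity(ti) for ti in args[1])
    return sum(complexity(ti) for ti in args)    # con, symodd

# The fourth description of axaybxbybxb, with information load 5:
expr = ('con', ('symodd', 'a', 'x'), ('iter', ('altright', 'b', ['y', 'x']), 2))
assert denote(expr) == 'axaybxbybxb'
assert complexity(expr) == 5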

The SPS problem can now be defined as follows. Given a pattern $p$, find an LLP expression $t$ such that

- $[\![t]\!] = p$, and
- $C(t) = \min\{C(s) \mid s \in LLP \text{ and } [\![s]\!] = p\}$.

As mentioned in the introduction, the only (partial) algorithm for solving the SPS problem is the one proposed by Van der Helm [Van der Helm and Leeuwenberg, 1986]. This algorithm finds only a subclass of perceptually relevant structures of string patterns by first constructing a directed acyclic graph for the given string pattern. If we place an index after each element in the string pattern, starting from the leftmost element, then each node in the graph corresponds to an index, and each link in the graph from node i to node j corresponds to a gestalt for the subpattern starting at position i and ending at position j. Given this graph, the SPS problem is translated to a shortest route problem. Note that this algorithm is designed for one-dimensional string patterns, and it is not clear how it can be applied to other domains of perceptual patterns. In contrast, our formalization of the SPS problem can easily be applied to more complex visual patterns by extending LLP with domain-dependent operators such as Euclidean transformations for two-dimensional visual patterns [Dastani, 1998].

4 Generating LLP Expressions

In order to solve the SPS problem using genetic programming, a probabilistic procedure for generating LLP expressions, called BUILD-STRUCT, is used. This procedure takes as input a string, and generates a (tree structure of an) LLP expression for that string. The procedure is based on a set of probabilistic production rules.

The production rules are derived from the SIT definition of expressions, and are of the form

$$\alpha \; t_1 \ldots t_n \; \beta \;\longrightarrow\; \alpha \; P(t_1, \ldots, t_n) \; \beta$$

where $\alpha$ and $\beta$ are (possibly empty) sequences of LLP expressions, $t_1, \ldots, t_n$ are LLP expressions, and $P$ is an ISA operator (of arity $n$). The triple $(\alpha, t_1 \ldots t_n, \beta)$ is called a splitting of the sequence.

A snapshot of the set of production rules used in BUILD-STRUCT is given below.

$\alpha \; t \; t \; \beta \longrightarrow \alpha \; iter(t, 2) \; \beta$

$\alpha \; t \; iter(t, n) \; \beta \longrightarrow \alpha \; iter(t, n+1) \; \beta$

$\alpha \; iter(t, n) \; t \; \beta \longrightarrow \alpha \; iter(t, n+1) \; \beta$

$\alpha \; t_1 \; t_2 \; \beta \longrightarrow \alpha \; con(t_1, t_2) \; \beta$

$\alpha \; con(t_1, \ldots, t_n) \; t \; \beta \longrightarrow \alpha \; con(t_1, \ldots, t_n, t) \; \beta$

$\alpha \; t \; con(t_1, \ldots, t_n) \; \beta \longrightarrow \alpha \; con(t, t_1, \ldots, t_n) \; \beta$

A production rule transforms a sequence of LLP expressions into a shorter one. In this way, the repeated application of production rules terminates after a finite number of steps and produces one LLP expression. There are two forms of non-determinism in the algorithm:

1. the choice of which rule to apply when more than one production rule is applicable,

2. the choice of a splitting of the sequence when more splittings are possible.

In BUILD-STRUCT both choices are performed randomly. BUILD-STRUCT employs a specific data structure which results in a more efficient implementation of the above-described non-determinism. The BUILD-STRUCT procedure is used in the initialization of the genetic algorithm and in the mutation operator.

We conclude this section with an example illustrating the application of the production rule system. The LLP expression iter(con(a, b, a), 2) can be obtained using the above production rules, starting from the pattern abaaba, as follows, where an underlined substring indicates that an ISA operator will be applied to that substring:

$\underline{aba}\;aba \longrightarrow con(a, b, a)\;aba$

$con(a, b, a)\;\underline{aba} \longrightarrow con(a, b, a)\;con(a, b, a)$

$\underline{con(a, b, a)\;con(a, b, a)} \longrightarrow iter(con(a, b, a), 2)$

Note in this example that the iter operator is applied to two structurally identical LLP expressions (i.e. $con(a, b, a)\;con(a, b, a) \longrightarrow iter(con(a, b, a), 2)$). In general, the ISA operators are not applied on the basis of structural identity of LLP expressions, but on the basis of their semantics, i.e. on the basis of the patterns denoted by the LLP expressions (e.g. $symodd(a, b)\;con(a, b, a) \longrightarrow iter(symodd(a, b), 2)$).
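A much-simplified sketch of BUILD-STRUCT is given below. It assumes the expression encoding and the denote helper of the previous sketch, covers only three of the production rules, and chooses splittings and rules uniformly at random; the paper's implementation uses a dedicated data structure for this non-determinism.

import random

def build_struct(string):
    """Rewrite the letter sequence into a single LLP expression, bottom up."""
    seq = list(string)                            # initial sequence of primitives
    while len(seq) > 1:
        i = random.randrange(len(seq) - 1)        # random splitting point
        left, right = seq[i], seq[i + 1]
        rules = []
        # alpha t t beta -> alpha iter(t, 2) beta (applied on semantic identity)
        if denote(left) == denote(right):
            rules.append(('iter', left, 2))
        # alpha t iter(t, n) beta -> alpha iter(t, n+1) beta
        if (isinstance(right, tuple) and right[0] == 'iter'
                and denote(left) == denote(right[1])):
            rules.append(('iter', right[1], right[2] + 1))
        # alpha t1 t2 beta -> alpha con(t1, t2) beta (always applicable)
        rules.append(('con', left, right))
        seq[i:i + 2] = [random.choice(rules)]     # apply one applicable rule
    return seq[0]

# One (random) structure of the pattern of Figure 2:
print(build_struct('axaybxbybxb'))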

5 A GP for the SPS Problem

This section introduces a novel evolutionary algorithm for the SPS problem, called GPSPS (Genetic Programming for the SPS problem), which applies GP to SIT. A population of LLP expressions is evolved, using knowledge-based mutation and crossover operators to generate new expressions, and using the SIT complexity measure as fitness function. GPSPS is an instance of the generational scheme, cf. e.g. [Michalewicz, 1996] (sketched below), where P(t) denotes the population at iteration t and |P(t)| its size. Selection is fitness-based, so that the simplest expressions have the highest probability of being selected. We have also made our GP elitist to guarantee that the best element found so far will be in the actual population.

The main features of GPSPS are described in the rest of this section.
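The generational scheme can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' listing; build_struct, complexity, mutate, crossover and optimize refer to the sketches accompanying Section 4 and Sections 5.3-5.5, and the parameter values are those reported in Section 6.

import random

def gpsps(pattern, pop_size=50, generations=150, p_mut=0.6, p_cross=0.4):
    """Elitist generational GP over LLP expressions of the given pattern."""
    population = [build_struct(pattern) for _ in range(pop_size)]    # Section 5.2
    best = min(population, key=complexity)
    for _ in range(generations):
        # Fitness-based selection: simpler expressions are chosen more often.
        weights = [1.0 / complexity(ind) for ind in population]
        offspring = []
        while len(offspring) < pop_size - 1:
            a, b = random.choices(population, weights=weights, k=2)
            if random.random() < p_cross:
                a = crossover(a, b)               # semantics-preserving, Section 5.4
            if random.random() < p_mut:
                a = mutate(a)                     # BUILD-STRUCT based, Section 5.3
            offspring.append(optimize(a))         # local optimization, Section 5.5
        best = min([best] + offspring, key=complexity)
        population = offspring + [best]           # elitism
    return best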

5.1 Representation and Fitness

GPSPS acts on LLP expressions describing the same string. An LLP expression is represented by means of a tree in the style used in genetic programming, where leaves are primitive elements while internal nodes are ISA operators. The fitness function is the complexity measure C as introduced in Section 3.

Thus, the goal of GPSPS is to find a chromosome (representing a structure of a given string) which minimizes C. Given a string, a specific procedure is used to ensure that the initial population contains only chromosomes describing the same pattern. Moreover, novel genetic operators are designed which preserve the semantics of chromosomes.

5.2 Initialization

Given a string, the chromosomes of the initial population are generated using the procedure BUILD-STRUCT. In this way, the initial population contains randomly selected (representations of) LLP expressions of the pattern.

5.3 Mutation

When the mutation operator is applied to a chromosome T, an internal node n of T is randomly selected and the procedure BUILD-STRUCT is applied to the (string represented by the) subtree of T starting at n. Figure 3 illustrates an application of the mutation operator to an internal node. Observe that each node (except the terminals) has the same chance of being selected. In this way smaller subtrees have a larger chance of being modified.
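A minimal sketch of this mutation, under the same encoding assumptions as before (the helper names internal_nodes, subtree_at and replace_at are illustrative):

import random

def internal_nodes(t, path=()):
    """Paths of the internal (operator) nodes of an expression tree;
    terminals and the alternation argument lists are skipped for brevity."""
    if isinstance(t, tuple):
        yield path
        for i, child in enumerate(t[1:], start=1):
            yield from internal_nodes(child, path + (i,))

def subtree_at(t, path):
    for i in path:
        t = t[i]
    return t

def replace_at(t, path, new):
    """Copy of t with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    return t[:i] + (replace_at(t[i], path[1:], new),) + t[i + 1:]

def mutate(t):
    """Re-generate a randomly chosen subtree with BUILD-STRUCT on its string."""
    paths = list(internal_nodes(t))
    if not paths:                                 # a single primitive letter
        return t
    path = random.choice(paths)
    return replace_at(t, path, build_struct(denote(subtree_at(t, path))))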

It is interesting to investigate the effectiveness of the heuristic implemented in BUILD-STRUCT when incorporated into an iterated local search algorithm. Therefore we have implemented an algorithm that mutates one single element for a large number of iterations and returns the best element found over all iterations. Although some regularities are discovered by this algorithm, its performance is rather poor compared with GPSPS, even when the number of iterations is set to be larger than the population size times the number of generations used by GPSPS.

Figure 3: Example of the mutation operator.

5.4 Crossover

The crossover operator cannot simply swap subtrees between two parents, as in standard GP, due to the semantic constraint on chromosomes (i.e. chromosomes have to denote the same string). Therefore, the crossover is designed in such a way that it swaps only subtrees that denote the same string. This is realized by associating with each internal node of the tree the string that is denoted by the subtree starting at that internal node. Then, two nodes of the parents with equal associated strings are randomly selected and the corresponding subtrees are swapped. An example of crossover is illustrated in Figure 4.
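A minimal sketch of this semantics-preserving crossover, reusing the helpers of the mutation sketch; it returns a single offspring, whereas the paper exchanges the subtrees in both parents.

import random

def crossover(t1, t2):
    """Swap a pair of subtrees of t1 and t2 that denote the same string."""
    by_string = {}
    for p2 in internal_nodes(t2):
        by_string.setdefault(denote(subtree_at(t2, p2)), []).append(p2)
    candidates = [(p1, p2)
                  for p1 in internal_nodes(t1)
                  for p2 in by_string.get(denote(subtree_at(t1, p1)), [])
                  if p1 or p2]                    # skip the trivial root-for-root swap
    if not candidates:                            # no crossover-pair: no crossover
        return t1
    p1, p2 = random.choice(candidates)
    return replace_at(t1, p1, subtree_at(t2, p2))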

Figure 4: Example of the crossover operator.

When a crossover-pair cannot be found, no crossover takes place. Fortunately this happens only for a small portion of the crossovers; usually there is more than one pair to choose from. This issue is further discussed in the next section.

5.5 Optimization

As discussed above, the mutation and crossover operators transform subtrees. When these operators are applied, the resulting subtrees may exhibit structures of a form suitable for optimization. For instance, suppose a subtree of the form con(iter(b, 2), a, con(b, b)) is transformed by one of the operators into the subtree con(iter(b, 2), a, iter(b, 2)). This improves the complexity of the subtree. Unfortunately, based on this new subtree the expected LLP expression symodd(iter(b, 2), a) cannot be obtained.

The crossover operator is only helpful for this problem if there is already a subtree that encodes that specific substring with a symodd structure. This problem could in fact be solved by applying the mutation operator to the con structure. However, the probability that the application of the mutation operator will generate the symodd structure is small.

In order to solve this problem, a simple optimization procedure is called after each application of the mutation and crossover operators. This procedure uses simple heuristics to optimize con structures. First, the procedure checks whether the (entire) con structure is symmetrical and changes it into a symodd or symeven structure if possible. If this is not the case, the procedure checks whether neighboring structures that are similar can be combined. For example, a structure of the form con(c, iter(b, 2), iter(b, 3)) can be optimized to con(c, iter(b, 5)). This kind of optimization is also applied to altleft and altright structures.
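A rough sketch of the two con heuristics named above (whole-structure symmetry detection and merging of neighbouring iterations), again under the illustrative encoding used earlier; the altleft/altright variants are omitted.

def optimize(t):
    """Rewrite a con structure as symeven/symodd, or merge neighbouring iters."""
    if not (isinstance(t, tuple) and t[0] == 'con'):
        return t
    args, strings = list(t[1:]), [denote(a) for a in t[1:]]
    k = len(args) // 2
    left = ''.join(strings[:k])
    # 1. The entire con is symmetrical: turn it into symeven or symodd.
    if k > 0 and len(args) % 2 == 0 and ''.join(strings[k:]) == left[::-1]:
        return ('symeven', args[0] if k == 1 else ('con', *args[:k]))
    if k > 0 and len(args) % 2 == 1 and ''.join(strings[k + 1:]) == left[::-1]:
        return ('symodd', args[0] if k == 1 else ('con', *args[:k]), args[k])
    # 2. Merge neighbouring iterations of the same base expression, e.g.
    #    con(c, iter(b, 2), iter(b, 3)) -> con(c, iter(b, 5)).
    merged = [args[0]]
    for a in args[1:]:
        last = merged[-1]
        if (isinstance(last, tuple) and last[0] == 'iter'
                and isinstance(a, tuple) and a[0] == 'iter'
                and denote(last[1]) == denote(a[1])):
            merged[-1] = ('iter', last[1], last[2] + a[2])
        else:
            merged.append(a)
    return ('con', *merged) if len(merged) > 1 else merged[0]

# The example from the text: the symmetric con becomes symodd(iter(b, 2), a).
assert optimize(('con', ('iter', 'b', 2), 'a', ('iter', 'b', 2))) == \
    ('symodd', ('iter', 'b', 2), 'a')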

6 Experiments

In this section we discuss some preliminary experiments. The example strings we consider are short and are designed to illustrate what type of structures are interesting for this domain. The choice of the values of the GP parameters used in the experiments is determined by the type of strings considered. Because the strings are short, a small pool size of 50 individuals is used. Making the pool very large would make the GP perform better, but then the pool would probably already contain the most preferred structure when it is initialized. The number of iterations is also kept small to avoid generating all possible structures, and is therefore set to 150. This allows us to draw preliminary conclusions about the performance of the GP.

Two important parameters of the GP are the mutation and crossover rates. We have done a few test runs to find a setting that produced good results. We have set the mutation rate to 0.6 and the crossover rate to 0.4. The mutation rate is deliberately set higher, because this operator is the most important for discovering structures. The crossover operator is used to swap substructures between good chromosomes.

We have chosen six different short strings that contain structures that are of interest to our search problem. Moreover, two longer strings are considered. For the two long strings the mutation and crossover rates specified above are used, but the pool size and the number of generations are both set to 300. The eight strings are the codes for the linear line patterns illustrated in Figure 5.

Figure 5: Line drawings used in experiments.

The algorithm is run on each string a number of times using different random seeds. The resulting structures are given in Figure 7, where the structures and fitnesses of the two best elements of the final population are reported. For each string GPSPS is able to find the optimal structure. The results of runs with different seeds are very similar, indicating the (expected) robustness of the algorithm on these strings.

Figure 6: Best and mean fitness (fitness versus generations) in a typical run on linear line pattern 7.

Figure 6 illustrates how the best fitness and the mean fitness of the population vary in a typical run of GPSPS on line pattern number 7 of Figure 5. On this pattern, the algorithm is able to find a near optimum of rather good quality after about 50 generations, and it spends the other 250 generations finding the slightly improved structure. In this experiment about 12% of the crossovers failed. On average there were about 2.59 possible crossover-pairs (with a standard deviation of 1.38) when the crossover operator was applicable.

The structures that are found are the most preferred structures as predicted by SIT. The system is thus capable of finding the perceived organizations of these line drawing patterns.

7 Conclusion and Future Research

This paper discussed the problem of human visual perception and introduced a formalization of a theory of visual perception, called SIT. The claim of SIT is to predict the perceived organization of visual patterns on the basis of the simplicity principle. It is argued that a full computational model for SIT is computationally intractable and that heuristic methods are needed to compute the perceived organization of visual patterns.

We have applied genetic programming techniques to this formal theory of visual perception in order to compute the perceived organization of visual line patterns. Based on perceptually relevant operators from SIT, a pool of alternative organizations of an input pattern is generated. Motivated by SIT, mutation and crossover operations are defined that can be applied to these organizations to generate new organizations for the input pattern. Finally, a fitness function is defined that determines the appropriateness of generated organizations. This fitness function is directly derived from SIT and measures the simplicity of organizations.

In this paper, we have focused on a small domain of visual linear line patterns. The next step is to extend our system to compute the perceived organization of more complex visual patterns such as two-dimensional visual patterns, which are defined in terms of a variety of visual attributes such as color, size, position, texture, and shape.

Finally, we intend to investigate whether the class of structural regularities proposed by SIT is also relevant for finding meaningful organizations within patterns from biological experiments, such as DNA sequences. For this task, we will need to modify GPSPS in order to allow a group of letters to be treated as a primitive element.

References

[Bertin, 1981] Bertin, J. (1981). Graphics and Graphic Information-Processing. Walter de Gruyter, Berlin, New York.

[Boselie and Wouterlood, 1989] Boselie, F. and Wouterlood, D. (1989). The minimum principle and visual pattern completion. Psychological Research, 51:93-101.

[Buffart et al., 1981] Buffart, H., Leeuwenberg, E., and Restle, F. (1981). Coding theory of visual pattern completion. Journal of Experimental Psychology: Human Perception and Performance, 7:241-274.

[Dastani, 1998] Dastani, M. (1998). Ph.D. thesis, University of Amsterdam, The Netherlands.

[Hofstadter, 1984] Hofstadter, D. (1984). The Copycat project: An experiment in nondeterministic and creative analogies. A.I. Memo 755, Artificial Intelligence Laboratory, MIT, Cambridge, Mass.

[Kang and Ikeuchi, 1993] Kang, S. and Ikeuchi, K. (1993). Toward automatic robot instruction from perception: Recognizing a grasp from observation. IEEE Transactions on Robotics and Automation, 9(4):432-443.

[Koza, 1992] Koza, J. (1992). Genetic Programming. MIT Press.

[Leeuwenberg, 1971] Leeuwenberg, E. (1971). A perceptual coding language for visual and auditory patterns. American Journal of Psychology, 84:307-349.

[Mackinlay, 1986] Mackinlay, J. (1986). Automating the design of graphical presentations of relational information. ACM Transactions on Graphics, 5:110-141.

[Marks and Reiter, 1990] Marks, J. and Reiter, E. (1990). Avoiding unwanted conversational implicatures in text and graphics. In Proceedings AAAI, Menlo Park, CA.

[Michalewicz, 1996] Michalewicz, Z. (1996). Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag, Berlin.

[Van der Helm, 1994] Van der Helm, P. (1994). The dynamics of Prägnanz. Psychological Research, 56:224-236.

[Van der Helm and Leeuwenberg, 1986] Van der Helm, P. and Leeuwenberg, E. (1986). Avoiding explosive search in automatic selection of simplest pattern codes. Pattern Recognition, 19:181-191.

[Van der Helm and Leeuwenberg, 1991] Van der Helm, P. and Leeuwenberg, E. (1991). Accessibility: A criterion for regularity and hierarchy in visual pattern code. Journal of Mathematical Psychology, 35:151-213.

[Van Leeuwen et al., 1988] Van Leeuwen, C., Buffart, H., and Van der Vegt, J. (1988). Sequence influence on the organization of meaningless serial stimuli: economy after all. Journal of Experimental Psychology: Human Perception and Performance, 14:481-502.

[Zhu, 1999] Zhu, S. (1999). Embedding gestalt laws in Markov random fields: a theory for shape modeling and perceptual organization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(11).

Results for the experimental strings (the two best structures per string, with their complexities):

1. string: aAaAaAaAaAaAaA
   structure: a) iter(con(a,A),7)  b) con(iter(con(a,A),2),iter(con(a,A),5))
   complexity: a) 2  b) 4

2. string: aAaBbAbBbAbBaAa
   structure: a) symodd(altleft(a,),B)  b) symodd(con(symodd(a,A),altright(b,)),B)
   complexity: a) 6  b) 6

3. string: aAaBaAaBaAaB
   structure: a) iter(altleft(a,),3)  b) iter(con(symodd(a,A),B),3)
   complexity: a) 3  b) 3

4. string: aXaYaXaZbAcBcBc
   structure: a) altleft(symodd(a,X),

Reducing Bloat and Promoting Diversity using Multi-Objective Methods

Edwin D. de Jong (1,2), Richard A. Watson (2), Jordan B. Pollack (2)
{edwin, richardw, [email protected]
(1) Vrije Universiteit Brussel, AI Lab, Pleinlaan 2, B-1050 Brussels, Belgium
(2) Brandeis University, DEMO Lab, Computer Science dept., Waltham, MA 02454, USA

Category: Genetic Programming

Abstract

Two important problems in genetic programming (GP) are its tendency to find unnecessarily large trees (bloat), and the general evolutionary algorithms problem that diversity in the population can be lost prematurely. The prevention of these problems is frequently an implicit goal of basic GP. We explore the potential of techniques from multi-objective optimization to aid GP by adding explicit objectives to avoid bloat and promote diversity. The even 3, 4, and 5-parity problems were solved efficiently compared to basic GP results from the literature. Even though only non-dominated individuals were selected and populations thus remained extremely small, appropriate diversity was maintained. The size of individuals visited during search consistently remained small, and solutions of what we believe to be the minimum size were found for the 3, 4, and 5-parity problems.

Keywords: genetic programming, code growth, bloat, introns, diversity maintenance, evolutionary multi-objective optimization, Pareto optimality

1 INTRODUCTION

A well-known problem in genetic programming (GP) is the tendency to find larger and larger programs over time (Tackett, 1993; Blickle & Thiele, 1994; Nordin & Banzhaf, 1995; McPhee & Miller, 1995; Soule & Foster, 1999), called bloat or code growth. This is harmful since it results in larger solutions than necessary. Moreover, it increasingly slows down the rate at which new individuals can be evaluated. Thus, keeping the size of the trees that are visited small is generally an implicit objective of GP.

Another important issue in GP and in other methods of evolutionary computation is how diversity of the population can be achieved and maintained. A population that is spread out over promising parts of the search space has more chance of finding a solution than one that is concentrated on a single fitness peak. Since members of a diverse population solve parts of the problem in different ways, it may also be more likely to discover partial solutions that can be utilized through crossover. Diversity is not an objective in the conventional sense; it applies to the populations visited during the search, not to final solutions. A less obvious idea then is to view the contribution of individuals to population diversity as an objective.

Multi-objective techniques are specifically designed for problems in which knowledge about multiple objectives is available, see e.g. Fonseca and Fleming (1995) for an overview. The main idea of this paper is to use multi-objective techniques to add the objectives of size and diversity to the usual objective of a problem-specific fitness measure. A multi-objective approach to bloat appears promising and has been used before (Langdon, 1996; Rodriguez-Vazquez, Fonseca, & Fleming, 1997), but has not become standard practice. The reason may be that basic multi-objective methods, when used with small tree size as an objective, can result in premature convergence to small individuals (Langdon & Nordin, 2000; Ekart, 2001). We therefore investigate the use of a size objective in combination with explicit diversity maintenance.

The remaining sections discuss the n-parity problem (2), bloat (3), multi-objective methods (4), diversity maintenance (5), the ideas behind the approach, called FOCUS (6), algorithmic details (7), results (8), and conclusions (9).

2 THE N-PARITY PROBLEM

The test problems that will be used in this paper are even n-parity problems, with n ranging from 3 to 5. A correct solution to this problem takes a binary sequence of length n as input and returns true (one) if the number of ones in the sequence is even, and false (zero) if it is odd. It is named even to avoid confusion with the related odd parity problem, which gives the inverse answer. Trees may use the following boolean operators as internal nodes: AND, OR, NAND, and NOR. Each leaf specifies an element of the sequence. The fitness is the fraction of all possible length-n binary sequences for which the program returns the correct answer. Figure 1 shows an example.

Figure 1: A correct solution to the 2-parity problem: OR(AND(X0, X1), NOR(X0, X1)).
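A minimal sketch of this fitness computation (the tree encoding and helper names are illustrative, not the authors' code):

from itertools import product

OPS = {
    'AND':  lambda a, b: a and b,
    'OR':   lambda a, b: a or b,
    'NAND': lambda a, b: not (a and b),
    'NOR':  lambda a, b: not (a or b),
}

def evaluate(tree, bits):
    if isinstance(tree, int):                  # leaf: an element of the sequence
        return bits[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, bits), evaluate(right, bits))

def parity_fitness(tree, n):
    """Fraction of all 2^n input sequences answered correctly (even parity)."""
    correct = 0
    for bits in product([False, True], repeat=n):
        target = sum(bits) % 2 == 0            # true iff the number of ones is even
        correct += evaluate(tree, bits) == target
    return correct / 2 ** n

# An even 2-parity solution (cf. Figure 1): OR(AND(X0, X1), NOR(X0, X1)).
assert parity_fitness(('OR', ('AND', 0, 1), ('NOR', 0, 1)), 2) == 1.0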

The n-parity problem has been selected because it is a difficult problem that has been used by a number of researchers. With increasing order, the problem quickly becomes more difficult. One way to understand its hardness is that for any setting of the bits, flipping any bit inverts the outcome of the parity function. Equivalently, its Karnaugh map (Zissos, 1972) equals a checkerboard function, and thus has no adjacencies.

2.1 SIZE OF THE SMALLEST SOLUTIONS TO N-PARITY

We believe that the correct solutions to n-parity constructed as follows are of minimal size, but we are not able to prove this. The principle is to recursively divide the bit sequence in half, take the parity of each half, and feed these two into a parity function. For subsequences of size one, i.e. single bits, the bit itself is used instead of its parity. When this occurs for one of the two arguments, the outcome would be inverted, and thus the odd 2-parity function is used to obtain the even 2-parity of the bits.

Let S be a binary sequence of length $|S| = n \geq 2$. S is divided in half, yielding two subsequences L and R of length $\frac{n}{2}$ for even n, or of lengths $\frac{n-1}{2}$ and $\frac{n+1}{2}$ for odd n. Then the following recursively defined function P(S) gives a correct expression for the even parity of S for $|S| \geq 2$ in terms of the above operators:

$$P(S) = \begin{cases} S & \text{if } |S| = 1\\ ODD(P(L), P(R)) & \text{if } |S| > 1 \wedge g(L, R)\\ EVEN(P(L), P(R)) & \text{otherwise} \end{cases}$$

where ODD(A, B) = NOR(AND(A, B), NOR(A, B)), EVEN(A, B) = OR(AND(A, B), NOR(A, B)), and

$$g(A, B) = \begin{cases} TRUE & \text{if } (|A| = 1) \text{ XOR } (|B| = 1)\\ FALSE & \text{otherwise} \end{cases}$$

Table 1: Length of the shortest solution to n-parity using the operators AND, OR, NAND, and NOR.

n       1   2   3    4    5    6    7
Length  3   7   19   31   55   79   103

The length $|P(S)|$ of the expression P(S) satisfies:

$$|P(S)| = \begin{cases} 1 & \text{for } |S| = 1\\ 3 + 2|P(L)| + 2|P(R)| & \text{for } |S| > 1 \end{cases}$$

For $n = 2^i$, $i > 0$, this expression can be shown to equal $2n^2 - 1$. Table 1 gives the lengths of the expressions for the first seven even-n-parity problems. For $|S| = 1$, the shortest expression is NOR(S, S); for $|S| > 1$, the length is given by the above expression. The rapid growth with increasing order stems from the repeated doubling of the required inputs.
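The lengths in Table 1 follow directly from the recurrence. A small sketch that checks them, assuming the L/R split described above (function names are illustrative):

def p_length(n):
    """Number of nodes in P(S) for a (sub)sequence of n bits."""
    if n == 1:
        return 1                      # the bit itself is used, not its parity
    left, right = n // 2, n - n // 2
    return 3 + 2 * p_length(left) + 2 * p_length(right)

def shortest_length(n):
    """Length of the shortest known solution to even n-parity (Table 1)."""
    return 3 if n == 1 else p_length(n)          # n = 1: NOR(S, S)

assert [shortest_length(n) for n in range(1, 8)] == [3, 7, 19, 31, 55, 79, 103]
# For n a power of two the length equals 2*n**2 - 1, as stated above.
assert all(p_length(2 ** i) == 2 * (2 ** i) ** 2 - 1 for i in range(1, 6))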

3 THE PROBLEM OF BLOAT

A well-known problem, known as bloat or code growth, is that the trees considered during a GP run grow in size and become larger than is necessary to represent good solutions. This is undesirable because it slows down the search by increasing evaluation and manipulation time and, if the growth consists largely of non-functional code, by decreasing the probability that crossover or mutation will change the operational part of the tree. Also, compact trees have been linked to improved generalization (Rosca, 1996).

Several causes of bloat have been suggested. First, under certain restrictions (Soule, 1998), crossover favors smaller than average subtrees in removal but not in replacement. Second, larger trees are more likely to produce fit (and large) offspring because non-functional code can play a protective role against crossover (Nordin & Banzhaf, 1995) and, if the probability of mutating a node decreases with increasing tree size, against mutation. Third, the search space contains more large than small individuals (Langdon & Poli, 1998).

Nordin and Banzhaf (1995) observed that the length of the effective part of programs decreases over time. However, the total length of the programs in the experiments also increased rapidly, and hence it may be concluded that in those experiments bloat was mainly due to growth of ineffective code (introns).

Finally, it is conceivable that in some circumstances non-functional code may be useful. It has been suggested that introns may be useful for retaining code that is not used in the current individual but is a helpful building block that may be used later (Nordin, Francone, & Banzhaf, 1996).


Table 2: Properties of the basic GP method used.

Problem              3-Parity
Fitness              Fraction of correct answers
Operators            AND, OR, NAND, and NOR
Stop criterion       500,000 evaluations or solution
Initial tree size    Uniform [1..20] internal nodes
Cycle                Generational
Population size      1000
Parent selection     Boltzmann with T = 0.1
Replacement          Complete
Uniqueness check     Individuals occur at most once
P(crossover)         0.9
P(mutation)          0.1
Mutation method      Mutate node with P = 1/n

Figure 2: Average tree sizes of ten different runs (solid lines) using basic GP on the 3-parity problem (average tree size versus number of fitness evaluations; the fraction of runs that yielded a solution and the size of the smallest correct tree are also indicated).

3.1 OBSERVATION OF BLOAT USING BASIC GP

To confirm that bloat does indeed occur in the test problem of n-parity using basic GP, thirty runs were performed for the 3-parity problem. The parameters of the runs are shown in Table 2. A run ends when a correct solution has been found. Figure 2 shows that average tree sizes increase rapidly in each run. If a solution is not found at an early point in the run, bloating rapidly increases the sizes of the trees in the population, thus increasingly slowing down the search. A single run of 111,054 evaluations already took more than 15 hours on a current PC running Linux due to the increasing amount of processing required per tree as a result of bloat. The population of size-unlimited trees that occurred in the single 4-parity run that was tried (with trees containing up to 6,000 nodes) filled virtually the entire swap space and caused performance to degrade to impractical levels. Clearly, the problem of bloat must be addressed in order to solve these and higher order versions of the problem in an efficient manner.

Figure 3: Average tree sizes and fraction of successful runs in the 3-parity problem using basic GP with a tree size limit of 200. Tree sizes are successfully limited, of course, but the approach is not ideal (see text).

3.2 USING A FIXED TREE SIZE LIMIT

Probably the most common way to avoid bloat is to simply limit the allowed tree size or depth (Langdon & Poli, 1998; Koza, 1992), although the latter has been found to lead to loss of diversity near the root node when used with crossover (Gathercole & Ross, 1996). Figure 3 shows the effect of using a limit of 200 on 3-parity. This limit is well above the minimum size of a correct solution, but not too high either, since several larger solutions were found in the unrestricted run. The average tree size is around 140 nodes.

On the 4-parity problem (with a tree size limit of 200), the average tree size varied around 150. However, whereas on 3-parity 90% of the runs found a solution within 100,000 evaluations, on 4-parity only 33% of the runs found a solution within 500,000 evaluations, testifying to the increased difficulty of this order of the parity problem. For 5-parity, basic GP found no solutions within 1,000,000 evaluations for any of the 30 runs. Thus, our version of GP with a fixed tree size limit does not scale up well. Furthermore, a fundamental problem with this method of preventing bloat is that the maximum tree size has to be selected before the search, when it is often unknown.

3.3 WEIGHTED SUM OF FITNESS AND SIZE

Instead of choosing a fixed tree size limit in advance, one would rather like to have the algorithm search for trees that can be as large as they need to be, but not much larger. A popular approach that goes some way towards this goal is to include a component in the fitness that rewards small trees or programs. This is mostly done by adding a component to the fitness, thus making fitness a linear combination of a performance measure and a parsimony measure (Koza, 1992; Soule, Foster, & Dickinson, 1996). However, this approach is not without its own problems (Soule & Foster, 1999).

Figure 4: Schematic rendition of a concave tradeoff surface (axes: Objective 1 and Objective 2; the curve marks the non-dominated individuals). This occurs when better performance in one objective means worse performance in the other, and vice versa. The lines mark the maximum-fitness individuals for three example weightings (see vectors) using a linear weighting of the objectives. No linear weighting exists that finds the in-between individuals, with reasonable performance in both objectives.

First, the weight of the parsimony measure must be determined beforehand, and so a choice concerning the tradeoff between size and performance is already made before the search. Furthermore, if the tradeoff surface between the two fitness components is concave [1] (see Fig. 4), a linear weighting of the two components favors individuals that do well in one of the objectives, but excludes individuals that perform reasonably in both respects (Fleming & Pashkevich, 1985).

[1] Since fitness is to be maximized, the tradeoff curve shown is concave.

Soule and Foster (1999) have investigated why a linear weighting of fitness and size has yielded mixed results. It was found that a weight value that adequately balances fitness and size is difficult to find. However, if the required balance is different for different regions in objective space, then adequate parsimony pressure cannot be specified using a single weight. If this is the case, then methods should be used that do not attempt to find such a single balance. This idea forms the basis of multi-objective optimization.

4 MULTI-OBJECTIVE METHODS

After several early papers describing the idea of optimizing for multiple objectives in evolutionary computation (Schaffer, 1985; Goldberg, 1989), the approach has recently received increasing attention (Fonseca & Fleming, 1995; Van Veldhuizen, 1999). The basic idea is to search for multiple solutions, each of which satisfies the different objectives to different degrees. Thus, the selection of the final solution with a particular combination of objective values is postponed until a time when it is known what combinations exist.

A key concept in multi-objective optimization is that of dominance. Let individual $x_A$ have values $A_i$ for the $n$ objectives, and individual $x_B$ have objective values $B_i$. Then A dominates B if

$$\forall i \in [1..n]: A_i \geq B_i \;\wedge\; \exists i: A_i > B_i$$

Multi-objective optimization methods typically strive for Pareto-optimal solutions, i.e. individuals that are not dominated by any other individuals.
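A minimal sketch of the dominance test and of non-dominated selection (individuals are assumed to be summarized by tuples of objective values, all maximized; the strict variant anticipates the removal criterion of Section 7):

def dominates(a, b):
    """Pareto dominance over objective tuples (all objectives maximized)."""
    return (all(ai >= bi for ai, bi in zip(a, b))
            and any(ai > bi for ai, bi in zip(a, b)))

def weakly_dominates(a, b):
    """The slightly stricter removal criterion of Section 7: >= everywhere."""
    return all(ai >= bi for ai, bi in zip(a, b))

def non_dominated(population, objectives, strict=False):
    """Sequentially drop individuals dominated by a kept population member."""
    dom = weakly_dominates if strict else dominates
    kept = []
    for ind in population:
        if any(dom(objectives(k), objectives(ind)) for k in kept):
            continue
        kept = [k for k in kept if not dom(objectives(ind), objectives(k))]
        kept.append(ind)
    return kept

With strict=True, exactly one of several individuals sharing a point on the tradeoff surface survives, as described in Section 7; in the experiments of Section 8 the objective tuple of an individual is (fitness, 1/size, diversity).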

5 DIVERSITY MAINTENANCE

A key difference between classic search methods and evolutionary approaches is that in the latter a population of individuals is maintained. The idea behind this is that by maintaining individuals in several regions of the search space that look promising (diversity maintenance), there is a higher chance of finding useful material from which to construct solutions.

In order to maintain the existing diversity of a population, evolutionary methods typically keep some or many of the individuals that happen to have been generated and have relatively high fitness, but lower than the best found so far. In the same way, evolutionary multi-objective methods usually keep some dominated individuals in addition to the non-dominated individuals (Fonseca & Fleming, 1993). However, this appears to be a somewhat arbitrary way of maintaining diversity. In the following section, we present a more directed method, and discuss its relation to other diversity maintenance methods.

6 THE FOCUS METHOD

We propose to do diversity maintenance by using a basic multi-objective algorithm and including an objective that actively promotes diversity. To the best of our knowledge, this idea has not been used in other work, including multi-objective research. If it works well, the need for keeping arbitrary dominated individuals may be avoided. To test this, we use the diversity objective in combination with a multi-objective method that only keeps non-dominated individuals, as reported in Section 8.

The approach strongly directs the attention of the search towards the explicitly specified objectives. We therefore name this method FOCUS, which stands for Find Only and Complete Undominated Sets, reflecting the fact that populations only contain non-dominated individuals, and contain all such individuals encountered so far. Focusing on non-dominated individuals combines naturally with the idea that the objectives are responsible for exploration, and this combination defines the FOCUS method.

The concept of diversity applies to populations, meaning that they are dispersed. To translate this aim into an objective for individuals, a metric has to be defined that, when optimized by individuals, leads to diverse populations. The metric used here is the average squared distance to the other members of the population. When this measure is maximized, individuals are driven away from each other.

Interestingly, the average distance metric strongly depends on the current population. If the population were centered around a single central peak in the fitness landscape, then individuals that moved away from that peak could survive by satisfying the diversity objective better than the individuals around the fitness peak. It might be expected that this would cause large parts of the population to occupy regions that are merely far away from other individuals but are not relevant to the problem. However, if there are any differences in fitness in the newly explored region of the search space, then the fitter individuals will come to replace individuals that merely performed well on diversity. When more individuals are created in the same region, the potential for scoring highly on diversity for those individuals diminishes, and other areas will be explored. The dynamics thus created are a new way to maintain diversity.

Other techniques that aim to promote diversity in a directed way exist, and include fitness sharing (Goldberg & Richardson, 1987; Deb & Goldberg, 1989), deterministic crowding (Mahfoud, 1995), and fitness derating (Beasley, Bull, & Martin, 1993). A distinguishing feature of the method proposed here is that in choosing the diversity objective, problem-based criteria can be used to determine which individuals should be kept for exploration purposes.

7 ALGORITHM DETAILS

The algorithm selects individuals if and only if they are not dominated by other individuals in the population. The population is initialized with 300 randomly created individuals of 1 to 20 internal nodes. A cycle proceeds as follows. A chosen number n of new individuals (300) is generated based on the current population using crossover (90%) and mutation (10%). If an individual already exists in the population, it is mutated. If the result also exists, it is discarded. Otherwise it is added to the population. All individuals are then evaluated if necessary. After evaluation, all population members are checked against other population members, and removed if dominated by any of them.

A slightly stricter criterion than Pareto's is used: A dominates B if $\forall i \in [1..n]: A_i \geq B_i$. Of multiple individuals occupying the same point on the tradeoff surface, precisely one will remain, since the removal criterion is applied sequentially. This criterion was used because the Pareto criterion caused a proliferation of individuals occupying the same point on the tradeoff surface when no diversity objective was used [2].

[2] In later experiments including the diversity objective, this proliferation was not observed, and the standard Pareto criterion also worked satisfactorily.

Figure 5: Average tree size and fraction of successful runs for the [fitness, size, diversity] objective vector on the 3-parity problem. The trees are much smaller than for basic GP, and solutions are found faster.

    The following distance measure is used in the diversity

    objective. The distance between two corresponding

    nodes is zero if they are identical and one if they are

    not. The distance between two trees is the sum of the

    distances of the corresponding nodes, i.e. nodes that

    overlap when the two trees are overlaid, starting from

    the root. The distance between two trees is normalized

    by dividing by the size of the smaller tree of the two.
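
The following sketch shows one possible implementation of this distance and of the resulting diversity objective; the simple Node class is our assumption for illustration, not the paper's representation.

    class Node:
        """Minimal GP tree node: a label and a list of child nodes."""
        def __init__(self, label, children=()):
            self.label = label
            self.children = list(children)

        def size(self):
            return 1 + sum(child.size() for child in self.children)

    def overlap_distance(t1, t2):
        """Sum of node-wise differences over the region where the trees overlap."""
        d = 0 if t1.label == t2.label else 1
        for c1, c2 in zip(t1.children, t2.children):   # zip keeps only overlapping children
            d += overlap_distance(c1, c2)
        return d

    def distance(t1, t2):
        """Overlap distance normalized by the size of the smaller tree."""
        return overlap_distance(t1, t2) / min(t1.size(), t2.size())

    def diversity(tree, population):
        """Average squared distance to the other members of the population."""
        others = [other for other in population if other is not tree]
        return sum(distance(tree, other) ** 2 for other in others) / len(others)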

    8 EXPERIMENTAL RESULTS

In the following experiments we use fitness, size, and

diversity as objectives. The implementation of the ob-

jectives is as follows. Fitness is the fraction of all 2^n

    input combinations handled correctly. For size, we use

    1 over the number of nodes in the tree as the objective

    value. The diversity objective is the average squared

    distance to the other population members.
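
Put together, the objective vector of an individual might be assembled as in the sketch below; the program, tree, and target names are placeholders for illustration (for even-n-parity the target would be, e.g., lambda bits: sum(bits) % 2 == 0), and the helpers are those sketched above.

    from itertools import product

    def objectives(ind, population, n, target):
        """[fitness, size, diversity] objective vector (illustrative sketch)."""
        cases = list(product([False, True], repeat=n))
        fitness = sum(ind.program(bits) == target(bits) for bits in cases) / len(cases)
        size_objective = 1.0 / ind.tree.size()     # 1 over the number of nodes
        diversity_objective = diversity(ind.tree, [p.tree for p in population])
        return [fitness, size_objective, diversity_objective]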

    8.1 USING FITNESS, SIZE, AND

    DIVERSITY AS OBJECTIVES

    Fig. 5 shows the graph of Fig. 3 for the method of

using fitness, size, and diversity as objectives. The av-

    erage tree size remains extremely small. In addition,

    a glance at the graphs indicates that correct solutions

    are found more quickly. To determine whether this

    is indeed the case, we compute the computational ef-

    fort, i.e. the expected number of evaluations required

    to yield a correct solution with a 99% probability, as

    described in detail by Koza (1994).
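
As a reminder of that computation, adapted here to evaluation counts rather than Koza's generations (an assumption on our part): estimate P(i), the fraction of runs that have succeeded by i evaluations; the number of independent runs needed to succeed with probability z = 0.99 is then ceil(ln(1 - z) / ln(1 - P(i))), and the effort is the minimum over i of i times that number. A sketch:

    import math

    def computational_effort(first_success, checkpoints, z=0.99):
        """Koza-style computational effort over evaluation counts (sketch).

        first_success[r] is the evaluation count at which run r first found a
        correct solution, or None if it never did; checkpoints lists the
        evaluation counts at which P(success) is estimated.
        """
        effort = math.inf
        for i in checkpoints:
            p = sum(1 for s in first_success if s is not None and s <= i) / len(first_success)
            if p == 0:
                continue
            runs_needed = 1 if p == 1 else math.ceil(math.log(1 - z) / math.log(1 - p))
            effort = min(effort, i * runs_needed)
        return effort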

    The impression that correct solutions to 3-parity are

    found more quickly for the multi-objective approach

(see Figure 6) is confirmed by considering the com-

putational effort E; whereas GP with the tree size

limit requires 72,044 evaluations, the multi-objective

approach requires 42,965 evaluations. For the 4-

parity problem, the difference is larger; basic GP needs



[Figure 6 plot omitted: curves of P(correct solution) and expected required evaluations I versus evaluations, for basic GP (E = 72,044) and the multi-objective (MO) method (E = 42,965).]

Figure 6: Probability of finding a solution and computational effort for 3-parity using basic GP and the multi-objective method.

[Figure 7 plot omitted: curves of P(correct solution) and expected required evaluations I versus evaluations, for basic GP (E = 5,410,550) and the multi-objective (MO) method (E = 238,856).]

Figure 7: Probability of finding a solution and computational effort for 4-parity for basic GP and the multi-objective method. The performance of the multi-objective method is considerably superior.

    5,410,550 evaluations, whereas the multi-objective ap-

    proach requires only 238,856. This is a dramatic im-

    provement, and demonstrates that our method can be

very effective.

    Finally, experiments have been performed using the

even more difficult 5-parity problem. For this prob-

lem, basic GP did not find any correct solutions within

a million evaluations. The multi-objective method did

find solutions, and did so reasonably efficiently, requir-

ing a computational effort of 1,140,000 evaluations.

    Table 3 summarizes the results of the experiments.

    Considering the average size of correct solutions on

    3-parity, the multi-objective method outperforms all

methods that have been compared, as the first solution

it finds has 30.4 nodes on average. What's more, the

    multi-objective method also requires a smaller num-

    ber of evaluations to do so than the other methods.

Finally, perhaps most surprisingly, it finds correct so-

    lutions using extremely small populations, typically

    containing less than 10 individuals. For example, the

    average population size over the whole experiment for

3-parity was 6.4, and 8.5 at the end of the experiment,

and the highest population size encountered in all 30 runs was 18.

Table 3: Results of the experiments (GP and Multi-Objective rows). For comparison, results of Koza's (1994) set of experiments (population size 16,000) and the best results with other configurations (population size 4,000) found there. E: computational effort, S: average tree size of first solution, Pop: average population size.

    3-parity            E           S       Pop
    GP                  72,044      93.67   1000
    Multi-objective     42,965      30.4    6.4
    Koza GP             96,000      44.6    16,000
    Koza GP-ADF         64,000      48.2    16,000

    4-parity            E           S       Pop
    GP                  5,410,550   154     1000
    Multi-objective     238,856     68.5    15.8
    Koza GP             384,000     112.6   16,000
    Koza GP-ADF         176,000     60.1    16,000

    5-parity            E           S       Pop
    GP                  --¹         n.a.    n.a.
    Multi-objective     1,140,000   218.7   49.7
    Koza GP             6,528,000   299.9   16,000
    Koza GP             1,632,000   299.9   4,000
    Koza GP-ADF         464,000     156.8   16,000
    Koza GP-ADF         272,000     99.5    4,000

¹ No solutions were found for 5-parity using basic GP.

This suggests that the diversity main-

    tenance achieved by using this greedy multi-objective

    method in combination with an explicit diversity ob-

jective is effective, since even extremely small popula-

    tions did not result in premature convergence.

    Considering 4 and 5-parity, the GP extended with the

    size and diversity objectives outperforms both basic

    GP methods used by Koza (1994) and the basic GP

    method tested here, both in terms of computational

effort and tree size. The Automatically Defined Func-

    tion (ADF) experiments performed by Koza for these

    and larger problem sizes perform better. These prob-

ably benefit from the inductive bias of ADFs, which

    favors a modular structure. Therefore, a natural di-

    rection for future experiments is to also extend ADFs

    with size and diversity objectives.

    For comparison, we also implemented an evolutionary

    multi-objective technique that does keep some domi-

    nated individuals. It used the number of individuals by

    which an individual is dominated as a rank, similar to

    the method described by Fonseca and Fleming (1993).

    The results were similar in terms of evaluations, but

    the method keeping strictly non-dominated individuals

    worked faster, probably due to the calculation of the

    distance measure. Since this is quadratic in the pop-

ulation size, the small populations of the multi-objective

method save much time (about a factor of 7 for 5-parity), which

    made it preferable.


As a control experiment, we also investigated whether

the diversity objective is really required, by using

only fitness and size as objectives in the algorithm

that was described. The individuals found are small

(around 10 nodes), but the fitness of the individuals

found was well below that of basic GP, and hence the diver-

    sity objective was indeed performing a useful function

    in the experiments.

    8.2 OBTAINING STILL SMALLER

    SOLUTIONS

    Finally, we investigate whether the algorithm is able

to find smaller solutions, after finding the first. Af-

ter the first correct solution is found, we monitor the

smallest correct solution. Although the first solution

size of 30 was already low compared to other methods,

the algorithm rapidly finds smaller correct solutions.

    The average size drops to 22 within 4,000 additional

    evaluations, and converges to around 20. The smallest

    tree (found in 12 out of 30 runs) was 19, i.e. equalling

    the presumed minimum size. On 4-parity, solutions

    dropped in size from the initial 68.5 to 50 in about

    10,000 evaluations, and to 41 on average when runs

    were continued longer (85,000 evaluations). In 12 of

    the 30 runs, minimum size solutions (31 nodes) were

    found. Using the same method, a minimum size solu-

    tion to 5-parity (55 nodes) was also found.

    The quick convergence to smaller tree sizes shows that

at least for the problem at hand, the method is effec-

tive at finding small solutions when it is continued run-

ning after the first correct solutions have been found,

    in line with the seeding experiments by Langdon and

    Nordin (2000).

    9 CONCLUSIONS

    The paper has discussed using multi-objective meth-

    ods as a general approach to avoiding bloat in GP

    and to promoting diversity, which is relevant to evo-

    lutionary algorithms in general. Since both of these

    issues are often implicit goals, a straightforward idea

    is to make them explicit by adding corresponding ob-

    jectives. In the experiments that are reported, a size

    objective rewards smaller trees, and a diversity objec-

tive rewards trees that are different from other individ-

    uals in the population, as calculated using a distance

    measure.

    Strongly positive results are reported regarding both

    size control and diversity maintenance. The method

    is successful in keeping the trees that are visited small

    without requiring a size limit or a relative weighting of

    �tness and size. It impressively outperforms basic GP

on the 3-, 4-, and 5-parity problems, both with respect

    to computational e�ort and tree size. Furthermore,

    correct solutions of what we believe to be the minimum

    size have been found for all problem sizes examined,

    i.e. the even 3, 4, and 5-parity problems.

The effectiveness of the new way of promoting diver-

    sity proposed here can be assessed from the follow-

    ing, which concerns the even 3, 4, and 5-parity prob-

    lems. The multi-objective algorithm that was used

    only maintains individuals that are not dominated by

    other individuals found so far, and maintains all such

    individuals (except those with identical objective vec-

    tors). Thus, only non-dominated individuals are se-

    lected after each generation, and populations (hence)

    remained extremely small (6, 16, and 50 on average,

respectively). Despite this uncommon degree of

greediness or elitism, sufficient diversity was achieved

to solve these problems efficiently in comparison with

the results of the basic GP method, both as obtained here and as

    found in the literature. Control experiments in which

the diversity objective was removed (leaving the fit-

ness and size objectives) failed to maintain sufficient

    diversity, as would be expected.

    The approach that was pursued here is to make de-

    sired characteristics of search into explicit objectives

    using multi-objective methods. This method is simple

    and straightforward and performed well on the prob-

    lem sizes reported, in that it improved the performance

    of basic GP on 3 and 4-parity. It solved 5-parity rea-

sonably efficiently, even though basic GP found no so-

    lutions on 5-parity. For problem sizes of 6 and larger,

    basic GP is no longer feasible, and more sophisticated

    methods must be invoked that make use of modular-

ity, such as Koza's Automatically Defined Functions

    (1994) or Angeline's GLiB (1992). We expect that the

    multi-objective approach with size and diversity as ob-

    jectives that was followed here could also be of value

    when used in combination with these or other existing

    methods in evolutionary computation.

    Acknowledgements

    The authors would like to thank Michiel de Jong,

    Pablo Funes, Hod Lipson, and Alfonso Renart for use-

    ful comments and suggestions concerning this work.

    Edwin de Jong gratefully acknowledges a Fulbright

    grant.

    References

Angeline, P. J., & Pollack, J. B. (1992). The evolutionary induction of subroutines. In Proceedings of the fourteenth annual conference of the cognitive science society (pp. 236-241). Bloomington, Indiana, USA: Lawrence Erlbaum.

Beasley, D., Bull, D. R., & Martin, R. R. (1993). A sequential niche technique for multimodal function optimization. Evolutionary Computation, 1(2), 101-125.

Blickle, T., & Thiele, L. (1994). Genetic programming and redundancy. In J. Hopf (Ed.), Genetic algorithms within the framework of evolutionary computation (workshop at KI-94, Saarbrucken) (pp. 33-38). Im Stadtwald, Building 44, D-66123 Saarbrucken, Germany: Max-Planck-Institut fur Informatik (MPI-I-94-241).

Deb, K., & Goldberg, D. E. (1989). An investigation of niche and species formation in genetic function optimization. In J. D. Schaffer (Ed.), Proceedings of the 3rd international conference on genetic algorithms (pp. 42-50). George Mason University: Morgan Kaufmann.

Ekart, A. (2001). Selection based on the Pareto nondomination criterion for controlling code growth in genetic programming. Genetic Programming and Evolvable Machines, 2, 61-73.

Fleming, P. J., & Pashkevich, A. P. (1985). Computer-aided control system design using a multiobjective optimization approach. In Proceedings of the IEE international conference Control '85 (pp. 174-179). Cambridge, UK.

Fonseca, C. M., & Fleming, P. J. (1993). Genetic algorithms for multiobjective optimization: Formulation, discussion and generalization. In S. Forrest (Ed.), Proceedings of the fifth international conference on genetic algorithms (ICGA'93) (pp. 416-423). San Mateo, California: Morgan Kaufmann Publishers.

Fonseca, C. M., & Fleming, P. J. (1995). An overview of evolutionary algorithms in multiobjective optimization. Evolutionary Computation, 3(1), 1-16.

Gathercole, C., & Ross, P. (1996). An adverse interaction between crossover and restricted tree depth in genetic programming. In J. R. Koza, D. E. Goldberg, D. B. Fogel, & R. L. Riolo (Eds.), Genetic Programming 1996: Proceedings of the first annual conference (pp. 291-296). Stanford University, CA, USA: MIT Press.

Goldberg, D. E. (1989). Genetic algorithms in search, optimization, and machine learning. Addison-Wesley.

Goldberg, D. E., & Richardson, J. (1987). Genetic algorithms with sharing for multimodal function optimization. In J. J. Grefenstette (Ed.), Genetic algorithms and their applications: Proceedings of the second international conference on genetic algorithms (pp. 41-49). Hillsdale, NJ: Lawrence Erlbaum Associates.

Koza, J. R. (1992). Genetic programming. Cambridge, MA: MIT Press.

Koza, J. R. (1994). Genetic programming II: Automatic discovery of reusable programs. Cambridge, MA: MIT Press.

Langdon, W. B. (1996). Advances in genetic programming 2. In P. J. Angeline & K. Kinnear (Eds.) (pp. 395-414). Cambridge, MA: MIT Press. (Chapter 20)

Langdon, W. B., & Nordin, J. P. (2000). Seeding GP populations. In R. Poli, W. Banzhaf, W. B. Langdon, J. F. Miller, P. Nordin, & T. C. Fogarty (Eds.), Genetic Programming, Proceedings of EuroGP'2000 (Vol. 1802, pp. 304-315). Edinburgh: Springer-Verlag.

Langdon, W. B., & Poli, R. (1998). Fitness causes bloat: Mutation. In W. Banzhaf, R. Poli, M. Schoenauer, & T. C. Fogarty (Eds.), Proceedings of the first European workshop on genetic programming (Vol. 1391, pp. 37-48). Paris: Springer-Verlag.

Mahfoud, S. W. (1995). Niching methods for genetic algorithms. Unpublished doctoral dissertation, University of Illinois at Urbana-Champaign, Urbana, IL, USA. (IlliGAL Report 95001)

McPhee, N. F., & Miller, J. D. (1995). Accurate replication in genetic programming. In L. Eshelman (Ed.), Genetic algorithms: Proceedings of the sixth international conference (ICGA95) (pp. 303-309). Pittsburgh, PA, USA: Morgan Kaufmann.

Nordin, P., & Banzhaf, W. (1995). Complexity compression and evolution. In L. Eshelman (Ed.), Genetic algorithms: Proceedings of the sixth international conference (ICGA95) (pp. 310-317). Pittsburgh, PA, USA: Morgan Kaufmann.

Nordin, P., Francone, F., & Banzhaf, W. (1996). Explicitly defined introns and destructive crossover in genetic programming. In P. J. Angeline & K. E. Kinnear, Jr. (Eds.), Advances in genetic programming 2 (pp. 111-134). Cambridge, MA, USA: MIT Press.

Rodriguez-Vazquez, K., Fonseca, C. M., & Fleming, P. J. (1997). Multiobjective genetic programming: A nonlinear system identification application. In J. R. Koza (Ed.), Late breaking papers at the 1997 genetic programming conference (pp. 207-212). Stanford University, CA, USA: Stanford Bookstore.

Rosca, J. (1996). Generality versus size in genetic programming. In J. R. Koza, D. E. Goldberg, D. B. Fogel, & R. L. Riolo (Eds.), Genetic Programming 1996: Proceedings of the first annual conference (pp. 381-387). Stanford University, CA, USA: MIT Press.

Schaffer, J. D. (1985). Multiple objective optimization with vector evaluated genetic algorithms. In J. J. Grefenstette (Ed.), Proceedings of the 1st international conference on genetic algorithms and their applications (pp. 93-100). Pittsburgh, PA: Lawrence Erlbaum Associates.

Soule, T. (1998). Code growth in genetic programming. Unpublished doctoral dissertation, University of Idaho.

Soule, T., & Foster, J. A. (1999). Effects of code growth and parsimony pressure on populations in genetic programming. Evolutionary Computation, 6(4), 293-309.

Soule, T., Foster, J. A., & Dickinson, J. (1996). Code growth in genetic programming. In J. R. Koza, D. E. Goldberg, D. B. Fogel, & R. L. Riolo (Eds.), Genetic Programming 1996: Proceedings of the first annual conference (pp. 215-223). Stanford University, CA, USA: MIT Press.

Tackett, W. A. (1993). Genetic programming for feature discovery and image discrimination. In S. Forrest (Ed.), Proceedings of the 5th international conference on genetic algorithms, ICGA-93 (pp. 303-309). University of Illinois at Urbana-Champaign: Morgan Kaufmann.

Van Veldhuizen, D. A. (1999). Multiobjective evolutionary algorithms: Classifications, analyses, and new innovations. Unpublished doctoral dissertation, Department of Electrical and Computer Engineering, Graduate School of Engineering, Air Force Institute of Technology, Wright-Patterson AFB, Ohio.

Zissos, D. (1972). Logic design algorithms. London: Oxford University Press.


Adaptive Genetic Programs via Reinforcement Learning

    Keith L. Downing

    Department of Computer Science

    The Norwegian University of Science and Technology (NTNU)

    7020 Trondheim, Norway

    tele: (+47) 73 59 18 40

    email: [email protected]

    Abstract

    Reinforced Genetic Programming (RGP) en-

    hances standard tree-based genetic program-

    ming (GP) [7] with reinforcement learning

    (RL)[11]. Essentially, leaf nodes of GP trees

    become monitored action-selection points,

    while the internal nodes form a decision tree

    for classifying the current state of the prob-

    lem solver. Reinforcements returned by the

problem solver govern both fitness evaluation

    and intra-generation learning of the proper

    actions to take at the selection points. In

theory, the hybrid RGP system hints at mu-

tual benefits to RL and GP in controller-

    design applications, by, respectively, provid-

    ing proper abstraction spaces for RL search,

    and accelerating evolutionary progress via

    Baldwinian or Lamarckian mechanisms. In

    practice, we demonstrate RGP's improve-

    ments over standard GP search on maze-

search tasks.

    1 Introduction

    The bene�ts of combining evolution and learning,

    while largely theoretical in the biological sciences,

have found solid empirical verification in the field

    of evolutionary computation (EC). When evolution-

    ary algorithms (EAs) are supplemented with learning

    techniques, general adaptivity improves such that the

learning EA finds solutions faster than the standard

    EA [3, 16]. These enhancements can stem from bi-

    ologically plausible mechanisms such as the Baldwin

Effect [2, 14], or from disproven phenomena such as

    Lamarckianism [8, 4].

    In most learning EAs, the data structure or program

    in which learning occurs is divorced from the structure

    that evolves. For example, a common learning EA is a

hybrid genetic-algorithm (GA) - artificial neural net-

    work (ANN) system in which the GA encodes a basic

    ANN topology (plus possibly some initial arc weights),

and the ANN then uses backpropagation or Hebbian

    learning to gradually modify those weights [17, 10, 6].

A Baldwin Effect is often evident in the fact that the

    GA-encoded weights improve over time, thus reduc-

    ing the need for learning [1]. Lamarckianism can be

    added by reversing the morphogenic process and back-

    encoding the ANN's learned weights into the GA chro-

    mosome prior to reproduction [12].

    Our primary objective is to realize Baldwinian and

    Lamarckian adaptivity within standard tree-based ge-

    netic programs [7], without the need for a complex

    morphogenic conversion to a separate learning struc-

    ture. Hence, as the GP program runs, the tree nodes

    can adapt, thereby altering (and hopefully improving)

    subsequent runs of the same program. Thus, the typi-

    cal problem domain is one in which each GP tree exe-

cutes many times during fitness evaluation, for exam-

    ple, in control tasks.

    2 RGP Overview

    Reinforced Genetic Programming combines reinforce-

    ment learning [11] with conventional tree-based genetic

    programming [7]. This produces GP trees with rein-

    forced action-choice leaf nodes, such that successive

    runs of the same tree exhibit improved performance on

the fitness task. These improvements may or may not

    be reverse-encoded into the genomic form of the tree,

    thus facilitating tests of both Baldwinian and Lamar-

    ckian enhancements to GP.

    The basic idea is most easily explained by exam-

    ple. Consider a small control program for a maze-

    wandering agent:


(if (between 0 x 5)
    (if (between 0 y 5)
        (choice (move-west) (move-north))     ; R1
        (choice (move-east) (move-south)))    ; R2
    (if (between 6 x 8)
        (choice (move-west) (move-east))      ; R3
        (choice (move-north) (move-south))))  ; R4

    Figure 1 illustrates the relationship between this pro-

    gram and the 10x10 maze. Variables x and y specify

the agent's current maze coordinates, while the choice

    nodes are monitored action decisions. The between

    predicate simply tests if the middle argument is within

the closed range specified by the first and third argu-

    ments, while the move functions are discrete one-cell

    jumps. So if the agent's current location falls within

the southwest region, R1, specified by the (between 0

    x 5) and (between 0 y 5) predicates of the decision

    tree, then the agent can choose between a westward

    and a northward move; whereas the eastern edge gives

    a north-south option.

During fitness testing, the agent will execute its tree

    code on each timestep and perform the recommended

    action in the maze, which then returns a reinforcement

    signal. For example, hitting a wall may invoke a small

    negative signal, while reaching a goal state would gar-

    ner a large positive payback.
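
The evaluation loop implied by this description might be sketched as follows; the method names and the particular reward values are illustrative assumptions, not details from the paper.

    def evaluate_fitness(tree, maze, n_steps):
        """Run the GP tree as a maze controller and return the average
        reinforcement per timestep (illustrative sketch)."""
        total_reinforcement = 0.0
        agent = maze.place_agent_at_start()
        for _ in range(n_steps):
            action = tree.execute(agent.position())    # the tree recommends an action
            reinforcement = maze.apply(agent, action)  # e.g. -0.1 for hitting a wall,
                                                       # +10 for reaching the goal
            tree.record(reinforcement)                 # choice nodes accumulate statistics
            total_reinforcement += reinforcement
        return total_reinforcement / n_steps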

    Initially, the choice nodes select randomly among their

    possible actions, but as the �tness test proceeds, each

    node accumulates reinforcement statistics as to the rel-

    ative utility of each action (in the context of the par-

    ticular location of the choice node in the decision tree,

which reflects the location of the agent in the maze).

After a fixed number of random free trials, which is

    a standard parameter in reinforcement-learning sys-

    tems (RLSs), the node begins making stochastic action

    choices based on the reinforcement statistics. Hence,

    the node's initial exploration gives way to exploitation.
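
One plausible form of such a choice node is sketched below. The free-trial count and the value-weighted stochastic rule are assumptions of ours; the paper only states that random selection gives way to stochastic choices based on the accumulated reinforcement statistics.

    import math
    import random

    class ChoiceNode:
        """Action-selection leaf: random free trials, then stochastic exploitation."""
        def __init__(self, actions, free_trials=10):
            self.actions = list(actions)
            self.free_trials = free_trials
            self.trials = 0
            self.value = {a: 0.0 for a in self.actions}   # reinforcement statistics

        def choose(self):
            self.trials += 1
            if self.trials <= self.free_trials:
                return random.choice(self.actions)        # exploration phase
            # Exploitation: stochastic choice weighted by accumulated value.
            weights = [math.exp(self.value[a]) for a in self.actions]
            return random.choices(self.actions, weights=weights)[0]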

    Along with determining the tree's internal decisions,

    the evolving genome sets the range for RL exploration

    by specifying the possible actions to the choice nodes;

the RLS then fine-tunes the search. By including al-

    ternate forms of choice nodes in GP's primitive set,

    such as choice-4, choice-2, choice-1 (direct action),

    where the integer denotes the number of action argu-

ments, the RGP's learning effort comes under evolu-

    tionary control. Over many evolutionary generations,

    the genomes provide more appropriate decision trees

    and more restricted (yet more relevant) action options

    to the RLS.

    In the maze domain, learning has an implicit cost due

    to the nature of the �tness function, which is based on

[Figure 1 diagram omitted: a 10x10 maze with Start and Goal cells and regions R1-R4, shown next to the decision tree built from the (between 0 x 5), (between 0 y 5), and (between 6 x 8) tests and their choice nodes.]

Figure 1: The genetic program determines a partitioning of the reinforcement-learning problem space.

    the average reinforcement per timestep of the agent.

    So an agent that moves directly to a goal location (or

    follows a wall without any explorative "bumps" into it)

    will have higher average reinforcement than one that

investigates areas off the optimal path. Initially, ex-

plorative learning helps the agent find the goal, but

    then evolution further hones the controllers to follow

    shorter paths to the goal, with little or no opportu-

    nity for stochastic action choices. Hence, the average

reinforcement (i.e. fitness) steadily increases, first as

a result of learning (phase I of the Baldwin Effect)

    and then as a result of genomic hard-wiring (phase II)

    encouraged by the implicit learning cost [9].

    To exploit Lamarckianism, RGP can replace any

    choice node in the genomic tree with a direct action

    function for the action that was deemed best for that

    node. Hence, if the choice node for R1 in Figure 1

    learns that north is the best move from this region

    (while choices for R2 and R3 �nd eastward moves most

    pro�table, and R4 learns the advantage of southward

    moves), then prior to reproduction, the genome can be

    specialized to:

(if (between 0 x 5)
    (if (between 0 y 5) (move-north) (move-east))
    (if (between 6 x 8) (move-east) (move-south)))

    This represents an optimal control strategy for the ex-

    ample, with no time squandered on exploration.


3 Reinforcement Learning in RGP

    Reinforcement Learning comes in many shapes and

    forms, and the basic design of RGP supports many of

    these variations. However, the examples in this paper

    use Q-learning [15] with eligibility traces.

Q-learning is an off-policy temporal differencing form of RL. In conventional RL terminology, Q(s,a) denotes the value of choosing action a while in state s. Temporal differencing implies that, to update Q(s,a) for the current state, s_t, and most recent action, a_t, we utilize the difference between the current value of Q(s_t, a_t) and the sum of a) the reward, r_{t+1}, received after executing action a in state s, and b) the discounted value of the new state that results from performing a in s. For the new state, s_{t+1}, its value, V(s_{t+1}), is based on the best possible action that can be taken from s_{t+1}, or max_a Q(s_{t+1}, a). Hence, the complete update equation is:

    Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_{t+1} + γ max_a Q(s_{t+1}, a) - Q(s_t, a_t) ]        (1)

Here, γ is the discount rate and α is the step size or learning rate. The expression in brackets is the temporal-difference error, δ_t. Thus, if performing a in s leads to positive (negative) rewards and good (bad) next states, then Q(s,a) will increase (decrease), with the degree of change governed by α and γ.

    To implement these Q(s,a) updates (the core activity

    of Q-learning) within GP trees, RGP employs qstate

    objects, one per choice node. Each qstate houses a list

    of state-action pairs (SAPs), where the value slot of

    each SAP corresponds to Q(s,a). For each GP tree, a

    qtable object is generated. It keeps track of all qstates

in the tree, as well as those most recently visited and

the latest reinforcement signal.
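
A sketch of such a qstate object and of the update of Equation 1 follows; the class and slot names are ours, chosen for illustration, and eligibility traces are omitted for brevity.

    class QState:
        """Per-choice-node store of state-action values, Q(s,a)."""
        def __init__(self, actions, alpha=0.1, gamma=0.9):
            self.q = {a: 0.0 for a in actions}   # one value slot per action
            self.alpha = alpha                   # step size (learning rate)
            self.gamma = gamma                   # discount rate

        def update(self, action, reward, next_qstate):
            """Apply the Q-learning update of Equation 1 to the chosen action."""
            best_next = max(next_qstate.q.values()) if next_qstate else 0.0
            td_error = reward + self.gamma * best_next - self.q[action]
            self.q[action] += self.alpha * td_error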

In conventional RL, all possible states are deter-

    mined prior to any learning, with each state typically a

    point in a space whose dimensions are the relevant en-

    vironmental factors and internal state variables of the

    agent. So for a maze-wandering robot, the dimensions

    might be discretized

