
HAL Id: hal-01300699
https://hal.archives-ouvertes.fr/hal-01300699

Submitted on 11 Apr 2016


Evolving Neural Networks That Are Both Modular and Regular: HyperNeat Plus the Connection Cost Technique

Joost Huizinga, Jean-Baptiste Mouret, Jeff Clune

To cite this version: Joost Huizinga, Jean-Baptiste Mouret, Jeff Clune. Evolving Neural Networks That Are Both Modular and Regular: HyperNeat Plus the Connection Cost Technique. GECCO '14: Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation, ACM, 2014, Vancouver, Canada. pp. 697-704. hal-01300699

To appear in: Proceedings of the Genetic and Evolutionary Computation Conference. 2014

Evolving Neural Networks That Are Both Modular and Regular: HyperNeat Plus the Connection Cost Technique

Joost Huizinga
Evolving AI Lab
Department of Computer Science
University of Wyoming
[email protected]

Jean-Baptiste Mouret
ISIR, Université Pierre et Marie Curie-Paris 6
CNRS UMR 7222
Paris, France
[email protected]

Jeff Clune
Evolving AI Lab
Department of Computer Science
University of Wyoming
[email protected]

ABSTRACT

One of humanity's grand scientific challenges is to create artificially intelligent robots that rival natural animals in intelligence and agility. A key enabler of such animal complexity is the fact that animal brains are structurally organized in that they exhibit modularity and regularity, amongst other attributes. Modularity is the localization of function within an encapsulated unit. Regularity refers to the compressibility of the information describing a structure, and typically involves symmetries and repetition. These properties improve evolvability, but they rarely emerge in evolutionary algorithms without specific techniques to encourage them. It has been shown that (1) modularity can be evolved in neural networks by adding a cost for neural connections and, separately, (2) that the HyperNEAT algorithm produces neural networks with complex, functional regularities. In this paper we show that adding the connection cost technique to HyperNEAT produces neural networks that are significantly more modular, regular, and higher performing than HyperNEAT without a connection cost, even when compared to a variant of HyperNEAT that was specifically designed to encourage modularity. Our results represent a stepping stone towards the goal of producing artificial neural networks that share key organizational properties with the brains of natural animals.

Categories and Subject Descriptors

I.2.6 [Artificial Intelligence]: Learning—Connectionism and neural nets

Keywords

Artificial Neural Networks; Modularity; Regularity; HyperNEAT; NSGA-II

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

GECCO'14, July 12–16, 2014, Vancouver, BC, Canada.
Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 978-1-4503-2662-9/14/07 ...$15.00.
http://dx.doi.org/10.1145/2576768.2598232

1. INTRODUCTION

An open and ambitious question in the field of evolutionary robotics is how to produce robots that possess the intelligence, versatility, and agility of natural animals. A major enabler of such complexity in animals lies in the fact that their bodies and brains are structurally organized in that they exhibit modularity and regularity, amongst other attributes [22, 30].

A network is considered modular if it contains groups of highly interconnected nodes, called modules, which are only sparsely connected to nodes outside the module [8, 17, 22]. In addition to such topological modularity, an even stronger case for modularity can be made if such topological modules correspond to the performance of sub-functions, a property called functional modularity [22].

Theoretical and empirical evidence suggests that modularity speeds up evolvability, which is the rate of evolutionary adaptation [1, 8, 17–19]. Modularity improves evolvability by allowing building blocks (modules) to be rewired for new functions and, because the effects of mutations tend to be confined within a module, by allowing evolution to tinker with one module without global effects [17].

Regularity can be defined as "the compressibility of the description of the structure" [22]; as a structure becomes more regular, less information is required to describe it. Common forms of regularity include repetition, symmetry, and self-similarity. It has been shown that regularity improves both performance and evolvability [5, 7, 14, 15], and increasingly does so on more regular problems [10]. Legged locomotion, for example, can greatly benefit from the reuse of information [5, 10, 14–16].

Despite the advantages of modularity and regularity, these properties do not naturally emerge in evolutionary algorithms without special techniques to encourage them [8–10, 17]. While these properties enhance evolvability and ultimately performance, such long-term advantages do not typically grant an immediate fitness boost to individuals that exhibit them. Evolution has been shown to forgo long-term benefits in performance and evolvability if there is no short-term benefit [7, 37, 38].

Encouraging the evolution of modularity has been a long-standing interest in the field, yielding a variety of strategies to promote it [8, 12, 14, 15, 17, 31, 32, 35, 36]. One general strategy is to design developmental systems featuring encodings biased towards modular structures [14, 15, 31, 32, 35]. While effective, these heavily biased encodings often produce networks that adhere very closely to a specific structure, leaving little room to adapt when circumstances require a different topology. Another method was demonstrated by Kashtan and Alon [17], who showed that alternating environments with modularly varying goals can give rise to modular networks. Unfortunately, it can be difficult to define modularly varying goals for many tasks. Moreover, the frequency of alternating between these environments must be finely tuned for the effect to emerge [8, 12]. Espinosa-Soto and Wagner [12] found that modularity can be evolved by selecting for a new task while retaining selection for previously acquired functionalities. However, it is non-trivial to decide on a sequence of sub-tasks that will eventually provide a complex, functionally modular solution to a specific task.

In this paper we will build upon a different, recently published method that yields the evolution of modular networks. In 2013, Clune, Mouret and Lipson showed that applying a cost for network connections leads to modular networks [8], and does so in a wider range of environments than a previous leading method [17]. This connection-cost technique (CCT) is biologically plausible, as many connection costs exist in natural networks, such as the cost to build and maintain connections, slower propagation through long connections, and the physical space occupied by long connections [30]. Connection costs may thus help explain the ubiquitous modularity found in natural networks [8, 30]. Furthermore, the CCT is computationally inexpensive and can be easily incorporated into the fitness of any evolutionary algorithm, especially multi-objective algorithms [11].

The most common method for producing regular networks is to use a generative encoding (also called an indirect or developmental encoding) [10, 14–16, 27, 29, 31]. The encoding of an individual defines how its genotype is mapped to its phenotype, and a generative encoding implies an indirect mapping such that elements in the genotype might describe more than just a single element in the phenotype. Generative encodings are often based on natural developmental systems, such as gene regulatory networks, cell division, or chemical gradients, making them more biologically plausible than direct encodings [29]. In generative encodings, compact genomes describe a much larger phenotype via the reuse of genomic information, giving rise to regular structures. In fact, if we consider the genotype as a compression of the phenotype, large phenotypes encoded by small genotypes are regular by definition [22].

To generate regularity we employ the HyperNEAT [27] algorithm, which encodes neural networks with a generative encoding called Compositional Pattern Producing Networks (CPPNs) [26]. CPPNs produce spatial patterns that exhibit regularity with variation (Fig. 1a). These spatial patterns define the connectivity across the geometric layout of nodes, enabling HyperNEAT to produce networks that exhibit structural regularity [9]. This paper demonstrates that the combination of HyperNEAT with the Connection Cost Technique (HyperNEAT-CCT) produces networks that are both modular and regular.

2. METHODS

2.1 HyperNEAT

To generate a network, HyperNEAT requires a geometric layout of nodes (see Fig. 1b-d for the layouts for the problems in this paper).

[Figure 1 appears here: (a) Example CPPN, (b) Retina layout, (c) H-XOR layout, (d) 5-XOR layout. See caption below.]

Figure 1: Example CPPN and geometric layouts. (a) A CPPN example (see section 2.1). (b) Geometric layout for the Retina Problem. (c) Geometric layout for the H-XOR problem. (d) Geometric layout for the 5-XOR problem. Note that the z coordinates of all nodes are 0.

Given one of these layouts and a CPPN, the connectivity of a network is determined by supplying the x, y, and z coordinates of two neurons as inputs to the CPPN, after which the weight of the connection between those neurons is set to the weight output of the CPPN (Fig. 1a). To set the biases of neurons, the x, y, and z coordinates of a single node are supplied to the CPPN together with a null position for the other inputs, and the bias is read from a separate bias output node (Fig. 1a).
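The substrate query can be illustrated with a short sketch. This is a hypothetical interface rather than the authors' implementation: `cppn` is assumed to be a callable that takes the (x, y, z) coordinates of two nodes and returns its named outputs.

```python
# Hedged sketch of the substrate query described above.
# `cppn` is a hypothetical callable: cppn(src_xyz, dst_xyz) -> {'weight': ..., 'bias': ...}

def query_substrate(cppn, nodes):
    """Return connection weights and node biases for a fixed geometric node layout."""
    null = (0.0, 0.0, 0.0)
    weights = {}
    biases = {}
    for src in nodes:
        for dst in nodes:
            if src == dst:
                continue
            out = cppn(src, dst)                 # weight output for this node pair
            weights[(src, dst)] = out['weight']
    for node in nodes:
        out = cppn(node, null)                   # null second position -> read the bias output
        biases[node] = out['bias']
    return weights, biases
```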

Because it has been demonstrated that the original HyperNEAT has trouble creating modular networks [4], we have implemented HyperNEAT with the Link-Expression Output (LEO) [36], an additional output neuron determining whether connections are expressed (Fig. 1a). This extension allows HyperNEAT to separate network connectivity from weight patterns, enhancing its ability to evolve sparsely connected, yet functional, sub-units. Connecting a Gaussian seed to LEO further increases HyperNEAT's ability to produce modular networks [36].

The Gaussian seed consists of an additional hidden node with a Gaussian activation function, added to the network upon initialization. This node is connected to two inputs by one inhibitory and one excitatory connection, such that their sum represents the difference along one axis. Because the Gaussian activation function produces strong activations only for values close to 0, it will 'fire' only when distances between nodes are short, thus encouraging shorter connections, which may help in discovering modular solutions [36].

However, by planting the Gaussian seed inside the CPPN, there is no guarantee that the seed will be preserved throughout evolution. Whenever there is no immediate benefit for shorter connections, which may occur at any point in a run, the Gaussian seed might disappear completely from the population. We believe that, due to its persistence as a selection pressure, the CCT will generally outperform the locality seed. That is because there exist many situations in which the advantages of short connections are not immediate while solving problems in challenging or changing environments. To test that hypothesis we include a treatment featuring HyperNEAT, LEO, and the Gaussian seed.

In our implementation the seed is exclusively employed for the x inputs, which was reported to be the most successful variant [36]. The weights from the inputs to the Gaussian seed are −0.6 and 0.6, respectively. The LEO node starts with a sigmoid activation function (a hyperbolic tangent) and a negative bias (−1). A link is expressed when the LEO node returns a value ≥ 0, which provides a behavior similar to the step function used in [36].
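As an illustration of how these constants interact, the toy sketch below stands in for the evolved CPPN. The unit weight from the seed to the LEO node is an assumption; the other constants are those reported above.

```python
import math

# Toy stand-in for the Gaussian seed and LEO gating (not the evolved CPPN itself).

def gaussian(v):
    return math.exp(-v * v)

def leo_expresses(x1, x2):
    """True if the link-expression output would express a connection for these x positions."""
    seed = gaussian(-0.6 * x1 + 0.6 * x2)   # near 1 only when x1 and x2 are close
    leo = math.tanh(1.0 * seed - 1.0)       # tanh activation with a bias of -1 (assumed unit weight)
    return leo >= 0
```

With x1 = x2 the seed saturates and the link is expressed; as the x-distance grows the seed vanishes and the LEO output drops below 0, suppressing the connection.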

HyperNEAT evolves CPPN genomes via the NeuroEvolution of Augmenting Topologies (NEAT) algorithm [28]. The three important parts of the NEAT algorithm are (1) an intelligent method for crossover between networks, (2) protecting diversity through speciation, and (3) complexification, which means starting with networks that have few nodes and connections, and adding them across evolutionary time. In this paper we have implemented HyperNEAT within the PNSGA algorithm from [8], which is programmed within the Sferes2 platform [23] (all of the source code used to perform these experiments is available on EvolvingAI.com). The resulting algorithm differs from NEAT in two ways. First, speciation, which encourages genomic diversity in NEAT, is replaced by a behavioral diversity objective, an adaptation employed in several other publications [20, 33]. Second, crossover has been removed for simplicity. We follow previous publications by the authors of HyperNEAT in applying the name HyperNEAT to algorithmic variants that have its key components (e.g. CPPNs, complexification, and diversity) [20, 33].

2.2 Experiments

There are four different treatments: (1) HyperNEAT, (2) HyperNEAT with the Gaussian Seed (HyperNEAT-GS) [36], (3) HyperNEAT with the Connection Cost Technique (HyperNEAT-CCT), and (4) a direct encoding with the Connection Cost Technique (DirectEncoding-CCT), which is the main algorithm from [8]. Each HyperNEAT treatment features LEO (explained above). All treatments are evolved according to the same evolutionary algorithm described in section 2.3, and every treatment optimizes at least two objectives: performance on the test problem and behavioral diversity. Treatments employing CCT add minimizing connection costs as a third objective.

Behavioral diversity of an individual is calculated by storing the output for every possible input in a binary vector (< 0 is false, ≥ 0 is true) and then taking the average Hamming distance to the binary vectors of all other individuals in the population. The connection cost is calculated as the sum of squared lengths of all connections in the phenotype [8].
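A minimal sketch of these two auxiliary objectives, assuming each individual's behavior is stored as a list of booleans (one per input pattern) and that the geometric length of each connection is available:

```python
# Sketch of the two auxiliary objectives described above (assumed data layout,
# not the authors' implementation).

def behavioral_diversity(behavior, population_behaviors):
    """Mean Hamming distance from this individual's output vector to all others."""
    others = [b for b in population_behaviors if b is not behavior]
    if not others:
        return 0.0
    distances = [sum(a != b for a, b in zip(behavior, other)) for other in others]
    return sum(distances) / len(distances)

def connection_cost(connection_lengths):
    """Sum of squared connection lengths in the phenotype (to be minimized)."""
    return sum(length ** 2 for length in connection_lengths)
```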

2.3 Evolutionary Algorithm

We incorporate the CCT into the multi-objective PNSGA algorithm [8], an extended version of NSGA-II [11]. These algorithms optimize individuals on several objectives at once, and try to preserve and select for all individuals that represent some unique trade-off between objectives, such as being very good at one task but terrible at the others, or being average at all tasks. PNSGA extends NSGA-II by assigning a probability to an objective, which determines the frequency with which that objective factors into selection. By assigning a lower probability to the connection cost objective, we can implement the intuition that performance on the task is more important than a low connection cost. For these experiments, following [8], the probability of the connection cost factoring into a fitness comparison is 25%.
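The following sketch conveys the spirit of such probabilistic objectives. The objective layout and the point at which the stochastic choice is made are assumptions for illustration, not the published PNSGA implementation.

```python
import random

# Hedged sketch of a dominance comparison in which an objective only factors in
# with some probability, in the spirit of PNSGA [8].

def dominates(a, b, objective_probabilities, rng=random):
    """True if `a` Pareto-dominates `b` on the objectives sampled for this comparison.

    `a` and `b` are tuples of objective values (higher is better);
    `objective_probabilities` gives the chance that each objective is considered."""
    active = [i for i, p in enumerate(objective_probabilities) if rng.random() < p]
    if not active:
        return False
    at_least_as_good = all(a[i] >= b[i] for i in active)
    strictly_better = any(a[i] > b[i] for i in active)
    return at_least_as_good and strictly_better

# Example: performance and diversity always count; connection cost (negated so that
# higher is better) counts 25% of the time.
# dominates((0.9, 0.6, -12.0), (0.8, 0.5, -30.0), [1.0, 1.0, 0.25])  -> True
```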

The population is initialized with randomly generated, fully connected networks without hidden nodes, as is prescribed for NEAT [28]. Parents are chosen via tournament selection (tournament size of 2), where the winner is the one that dominates the other, with ties broken randomly.

Parents are copied and the copies are mutated following [8]. The mutation operators add connection (9%), remove connection (8%), add node (5%), and remove node (4%) are executed at most once. Change weight (10%) and, for CPPNs, change activation function (10%) mutations are performed on a per-connection and per-node basis. Mutation rates were chosen as the result of a preliminary parameter sweep for high performance. For CPPN-based treatments the activation functions are randomly selected from the following set: Gaussian, linear, sigmoid, and sine. Biases are handled by an additional input that always has an activation of 1, meaning the connection between a node and this input determines the bias for that node.
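A sketch of this mutation scheme, using the rates above and a hypothetical genome interface (`add_connection()`, `perturb_weight()`, and so on are assumed method names, not the authors' API):

```python
import random

# Sketch of the mutation scheme described above; the genome interface is hypothetical.

STRUCTURAL_MUTATIONS = [
    ('add_connection', 0.09),
    ('remove_connection', 0.08),
    ('add_node', 0.05),
    ('remove_node', 0.04),
]

def mutate(genome, rng=random):
    # Each structural operator fires at most once per offspring.
    for name, rate in STRUCTURAL_MUTATIONS:
        if rng.random() < rate:
            getattr(genome, name)()
    # Per-connection and per-node mutations.
    for conn in genome.connections:
        if rng.random() < 0.10:
            conn.perturb_weight()
    for node in genome.nodes:
        if rng.random() < 0.10:   # activation-function mutation, CPPN-based treatments only
            node.set_activation(rng.choice(['gaussian', 'linear', 'sigmoid', 'sine']))
    return genome
```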

Survivors were selected from the mixed population of offspring and parents. For all experiments the population size was 1000 and the only stopping condition was the maximum number of generations, which was either 25000 or 50000, depending on the problem.

2.4 Test problems

We have tested all treatments on three modular and regular problems from [8]: the Retina Problem (originally introduced in [17]), the 5-XOR problem, and the Hierarchical XOR problem.

The Retina Problem simulates a simple retina that receives visual input from 8 pixels (Fig. 2a). The left and right halves of the retina may each contain a pattern of interest known as an "object". The patterns, shown in figure 2b, are flattened versions of those from [8] and are defined such that each pattern has a mirror image on the other side of the retina, providing at least one symmetry that can be discovered. The network is tested on all 256 possible patterns and the task for the network is to indicate whether there is (> 0) or is not (< 0) an object present at both the left and the right side of the retina. Note that, while the problem is modularly decomposable, there also exist perfect-performing, non-modular solutions [8, 17].
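A sketch of the evaluation loop, assuming `network(pixels)` returns a scalar output and `is_object(half)` implements the patterns of Fig. 2b (both are hypothetical helpers, not the authors' code):

```python
from itertools import product

# Sketch of the Retina Problem evaluation described above.

def retina_fitness(network, is_object):
    """Fraction of the 256 eight-pixel patterns classified correctly."""
    correct = 0
    for pixels in product([0.0, 1.0], repeat=8):
        left, right = pixels[:4], pixels[4:]
        target = is_object(left) and is_object(right)
        answer = network(pixels) > 0          # > 0 means "object on both sides"
        correct += (answer == target)
    return correct / 256.0
```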

[Figure 2 appears here: (a) Retina, (b) Retina objects, (c) 5-XOR, (d) H-XOR. See caption below.]

Figure 2: Experimental Problems. (a) The general structure of the Retina Problem, where the network has to answer whether there is both a left and a right object present. (b) The patterns that count as objects for the Retina Problem. (c) The 5-XOR problem, which contains 5 separate XOR problems. (d) The H-XOR problem, consisting of 2 identical, hierarchically nested XOR problems.

The 5-XOR problem (Fig. 2c) includes five independent XOR problems that a network must solve in parallel. Performance on this task is the average performance over all five XOR tasks. The problem has regularity because of the repeated XORs and it is modularly decomposable because each XOR can be solved separately.

The Hierarchical XOR problem (H-XOR) (Fig. 2d) consists of two separable, hierarchically nested XOR problems (the XOR of two XORs). As with the 5-XOR problem, separability and repetition mean that both modularity and regularity are expected to be beneficial in this problem.

2.5 Metrics and Visualizations

When reporting the performance across runs we always consider the 'best' individual of the population, where 'best' means the first individual when sorting on performance first and modularity second. Ties are broken arbitrarily.

The structural modularity of our networks is measured by the widely-used modularity Q-score [24]. For functional modularity we use two measures from [8]: modular decomposition and sub-problems solved. To calculate modular decomposition we split the network to maximize modularity, as described in [24], with the maximum number of allowed splits equal to the number of sub-problems, and test whether inputs corresponding to different sub-problems end up in different modules. To calculate the number of sub-problems solved we check, for every sub-problem, whether there exists a node in the network that linearly separates the positive and negative classes of that sub-problem. If such a node exists the sub-problem is considered solved.
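The structural measure can be approximated with standard community-detection tools. The sketch below uses greedy modularity maximization from networkx as a stand-in for the exact splitting method of [24]:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

# Approximate the modularity Q-score of a network given as a list of (u, v) edges.
# Greedy maximization is a stand-in here, not the exact method used in the paper.

def q_score(edges):
    graph = nx.Graph(edges)
    communities = greedy_modularity_communities(graph)   # best split found greedily
    return modularity(graph, communities)                 # Newman's Q for that split
```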

Following [8], we visualize modularity by moving nodes to the location that minimizes the summed length of their connections, while holding inputs and outputs fixed (Fig. 3c). This optimal neural placement (ONP) visualization is inspired by the fact that neurons in some natural organisms are located optimally to minimize the summed length of the connections between them [2, 3]. Nodes in the ONP visualizations are colored according to the best modular split. The maximum number of splits performed depends on the problem: the Retina and H-XOR problems are split in two parts, while the 5-XOR problem is split in 5 parts. Nodes that solve one of the sub-problems are depicted with a large colored border surrounding them. Because modularity differences are not visually apparent at the lower and higher levels (all treatments produce some modular and some non-modular networks), the networks within each treatment are sorted according to their modularity and those around the middle of this list are depicted in this paper.

As mentioned in section 1, regularity can be defined as the compressibility of the data describing a structure. However, since this minimum description length is impossible to calculate exactly [21], we approximate the regularity by compressing the network using the Lempel-Ziv-Welch compression algorithm. To approximate regularity, we write the network weights and biases to an ASCII string, compress it, and test by which fraction the string size was reduced. Because order matters, we repeat this process for 500 different permutations of the weights and biases and take the average as the regularity value.
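A sketch of this compression-based estimate; zlib (DEFLATE) is used here as a stand-in because Python's standard library provides no LZW codec, so the absolute numbers would differ from those reported in the paper.

```python
import random
import zlib

# Sketch of the compression-based regularity estimate described above.

def regularity(values, permutations=500, rng=random):
    """Average fractional size reduction of the ASCII-encoded weights and biases."""
    values = list(values)
    reductions = []
    for _ in range(permutations):
        rng.shuffle(values)                                  # order matters, so permute
        text = ' '.join('%.4f' % v for v in values).encode('ascii')
        compressed = zlib.compress(text)
        reductions.append(1.0 - len(compressed) / len(text))
    return sum(reductions) / len(reductions)
```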

When visualizing regularity we leave nodes in their actual geometric locations so as not to distort regularities (e.g. Fig. 4). In this visualization we color excitatory connections green and inhibitory connections red. The width of the connection indicates the strength of that connection. Similarly, we depict the bias of each node as a circle inside each node, where green circles indicate a positive bias, red circles indicate a negative bias, and the size of the circle indicates the strength of the bias.

Statistical tests are performed with the Mann-Whitney U rank-sum test, unless otherwise specified. Shaded areas in graphs represent 95% bootstrapped confidence intervals of the median, generated by sampling the data 5000 times. Triangles below graphs indicate when values for HyperNEAT-CCT are significantly higher than for the treatment with the corresponding symbol and treatment color (p < 0.05).
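For reference, these statistics can be reproduced with standard scientific-Python tools; the snippet below is a sketch, not the authors' analysis scripts.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Mann-Whitney U rank-sum test between two treatments, and a 5000-sample
# bootstrapped 95% confidence interval of the median.

def compare(treatment_a, treatment_b):
    return mannwhitneyu(treatment_a, treatment_b, alternative='two-sided')

def median_ci(values, samples=5000, rng=np.random.default_rng(0)):
    medians = [np.median(rng.choice(values, size=len(values), replace=True))
               for _ in range(samples)]
    return np.percentile(medians, [2.5, 97.5])
```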

3. RESULTS

3.1 The Retina Problem

In the retina experiment, the performance of HyperNEAT-CCT is significantly higher at nearly every generation than both HyperNEAT and HyperNEAT-GS (Fig. 3a); even after the medians of all treatments have reached perfect performance, lower-performing runs in the HyperNEAT and HyperNEAT-GS treatments make those treatments perform significantly worse than HyperNEAT-CCT. In terms of modularity, the level for HyperNEAT hardly changes over time, while the modularity of HyperNEAT-CCT progressively increases; the difference becomes significant after 12000 generations (Fig. 3b). The modularity of HyperNEAT-GS, on the other hand, spikes during the first few generations, but then it decreases over time to a significantly lower level than HyperNEAT-CCT (Fig. 3b). This behavior is evidence for our hypothesis that the Gaussian seed may not be an effective way to promote modularity in cases where there is no immediate fitness benefit.

To examine functional modularity we look at the best networks produced after 50000 generations. Our test for problem decomposition, which in this case is having the inputs for the left and right sub-problems in different modules (section 2.5), shows that 75% of the HyperNEAT-CCT runs are left-right modular, which is higher than HyperNEAT, for which 64% of the networks are left-right modular, but the difference is not significant (p = 0.124, Fisher's exact test). In addition, when considering the number of sub-problems solved (section 2.5), HyperNEAT-CCT networks solve an average of 0.67 (out of 2) sub-problems, which is significantly (p = 0.024) higher than HyperNEAT networks, which solve an average of 0.41 sub-problems.

The differences in modularity are also visually apparent (Fig. 3c). The networks of HyperNEAT-CCT look more modular, demonstrate left-right modularity more often, and have more nodes that solve sub-problems than the HyperNEAT and HyperNEAT-GS networks.

The reason HyperNEAT-CCT performs better is probably because the problem is modular. Additionally, by guiding evolution towards the space of networks with fewer connections, fewer weights need to be optimized. As analyses in [8] revealed, the reason treatments that select for performance alone do not produce modularity despite its benefits is because the benefits of modularity come in the long term, whereas selection acts on immediate fitness benefits. Interestingly, most of the modularity increases occur after the majority of HyperNEAT-CCT runs have achieved near-perfect performance. That is likely because once performance is perfect, or nearly so, the only way a network can be selected over others is by reducing connection costs, which tends to increase modularity [8].

[Figure 3 appears here: panels (a) and (b) plot performance and modularity over generations; panel (c) shows ONP visualizations for HyperNEAT-CCT, HyperNEAT, and HyperNEAT-GS. See caption below.]

Figure 3: Results for the Retina Problem. HyperNEAT-CCT significantly (a) outperforms and (b) has higher modularity than both HyperNEAT and HyperNEAT-GS. (c) ONP visualizations (see section 2.5) for networks 54 to 60 (out of 100) for HyperNEAT and HyperNEAT-CCT and networks 27 to 33 (out of 50) for HyperNEAT-GS (sorted on modularity). Colored rings indicate neurons that solve the right (orange) and left (blue) sub-problems. HyperNEAT-CCT networks are visually more modular, exhibit left-right modularity more often, and solve significantly more sub-problems than HyperNEAT or HyperNEAT-GS networks.

[Figure 4 appears here: panel (a) compares compressibility; panel (b) shows example HyperNEAT-CCT and DirectEncoding-CCT networks. See caption below.]

Figure 4: HyperNEAT-CCT networks are more regular than DirectEncoding-CCT networks. (a) HyperNEAT-CCT networks compress significantly more (p < 0.00001). (b) Four perfectly-performing networks with the highest modularity scores from HyperNEAT-CCT and DirectEncoding-CCT. All four networks of HyperNEAT-CCT show some form of symmetry with variation, including switching excitatory and inhibitory connections. The DirectEncoding-CCT networks are irregular.

While the results of HyperNEAT-GS are lower than expected, the figures shown are for the highest-performing version of it that we found after experimenting with its key parameters (Fig. 3a, b and c). Initially we ran HyperNEAT-GS with the default settings for the Gaussian seed [36] and a negative bias on the LEO output of −2, which resulted in fewer connections by enforcing stronger spatial constraints. After analyzing our results, we hypothesized that the poor performance of HyperNEAT-GS might have been because the distances between our nodes are greater than in [36], so we switched to settings (described in section 2.2) that compensate for the increased distances. These settings significantly improved performance (p < 0.05 for most generations), but did not improve modularity. We subsequently changed the bias from −2 to −1, which is more suitable when applying locality to only a single axis, and while this change improved modularity to the level reported in Fig. 3b, the overall results are still worse than those reported in [36]. This difference in performance is likely due to differences in the evolutionary algorithm (section 2.1) and in the problem definition (in [36] the sub-problems did not have to be combined into a single answer).

Since changing its seed improved the performance obtained by HyperNEAT-GS, it is possible that it can be improved to a level where it outperforms HyperNEAT-CCT. However, the weights of the Gaussian seed were free to evolve, yet evolution was unable to adapt the seed to improve either performance or modularity, which shows that HyperNEAT-GS can fail to produce modularity in the absence of direct rewards.

The differences between the treatments with and without a connection cost are not as pronounced as those demonstrated by Clune, Mouret and Lipson [8], indicating that the beneficial effects of CCT on HyperNEAT are not as great as was the case for the direct encoding. The reason for this is probably that HyperNEAT, even with the Link-Expression Output, has trouble pruning individual connections.

Comparing HyperNEAT-CCT with DirectEncoding-CCT (the main algorithm from [8]), the direct encoding is significantly higher performing in early generations and significantly more modular throughout evolution. The indirect encoding of HyperNEAT seems to struggle more with the irregularities of this problem and is not as good at pruning connections. That is expected, since removing connections in the direct encoding is easy compared to doing so in HyperNEAT, which has to adapt the patterns produced by the LEO node such that it cuts off the redundant parts while keeping the rest of the network intact.

A main advantage of HyperNEAT is its ability to produce regular patterns [9, 10, 27]. Compression tests (see methods, section 2.5) reveal that HyperNEAT-CCT networks are significantly more regular than the DirectEncoding-CCT networks (Figure 4): the direct encoding with CCT becomes 38% smaller upon compression, but HyperNEAT-CCT compresses further, down by 43%, making it significantly more compressible (p < 0.00001). Thus, HyperNEAT-CCT networks are regular in addition to being modular.

The regularity of HyperNEAT-CCT is also visually apparent (Fig. 4b). In many of its networks, the left side is mirrored on the right side, even though the signs of the connections are sometimes switched. Other networks feature alternating or mirrored patterns in the biases or connections. While HyperNEAT-CCT networks also exhibit some clear variations in each of their patterns, on balance they are much more regular than the DirectEncoding-CCT networks (Fig. 4b), which do not show any discernible patterns.

3.2 The 5-XOR and H-XOR problems

On the 5-XOR problem the treatments differ only slightly until after 25000 generations, when the performance and modularity of HyperNEAT-CCT become significantly better than HyperNEAT (Fig. 5a, b). The visual difference in modularity is also clear for networks of intermediate modularity (Fig. 5c); HyperNEAT-CCT networks perfectly divide the problem into five individual networks while the networks produced by HyperNEAT are entangled.

On the H-XOR problem, HyperNEAT-CCT significantly outperforms HyperNEAT in both performance and modularity for most of the first 15000 generations (Fig. 5d, e). HyperNEAT eventually catches up, erasing the significant differences. Network visualizations reveal clear examples where HyperNEAT-CCT networks modularly decompose the problem, whereas HyperNEAT networks have not managed to disentangle the left and right problems (Figure 5f).

Due to computational and time constraints, we did not test HyperNEAT-GS on the 5-XOR and H-XOR problems. Given the results presented in section 3.1, it is likely that HyperNEAT-GS would underperform HyperNEAT and HyperNEAT-CCT. Also, because splitting the sub-problems is necessary to obtain perfect performance on these problems, there is no need to test for sub-problems solved, as this will be directly reflected in performance.

4. FUTURE WORK

While we have shown the advantages of HyperNEAT-CCT on simple diagnostic problems, the real power of this method lies in its ability to create large-scale, modular networks.

HyperNEAT can create functional neural networks with millions of connections, but the networks and tasks performed were simple [13, 27]. In future research we will increase the scale of networks that have both regularity and modularity and test whether these properties improve the ability to perform more complex tasks.

Another area to explore is to allow HyperNEAT to evolve the number and geometric location of its hidden neurons [25]. Because the patterns of neural connectivity HyperNEAT produces depend on the geometric location of nodes [9], adding the CCT to Evolvable-Substrate HyperNEAT [25] may further increase HyperNEAT's ability to create functional neural modularity and could reduce the need for users to create geometric node layouts that encourage the appropriate modular decomposition of problems.

Lastly, a recent technique showed that the regular patterns produced by a generative encoding, such as HyperNEAT, aid the learning capabilities of networks [34]. We will combine our method with intra-life learning algorithms to investigate whether learning is improved when it occurs in structurally organized neural networks. Such learning may also ameliorate HyperNEAT's inability to cope with irregularity [6, 10].

5. CONCLUSION

One strategy to make robots more intelligent and agile is to evolve neural network controllers that emulate the structural organization of animal brains, including their modularity and regularity. Because these properties do not naturally emerge in evolutionary algorithms, some techniques have to be employed to encourage them to evolve. We have demonstrated how HyperNEAT with the connection-cost technique (HyperNEAT-CCT) can evolve networks that are both modular and regular, which increases performance on modular, regular problems compared to both the default HyperNEAT algorithm and a variant of HyperNEAT specifically designed to encourage modularity. We have also shown that networks produced by HyperNEAT-CCT are more regular than networks produced by adding the CCT to a direct encoding.

While other methods that lead to modular and regular networks exist, this work demonstrates a powerful, general way to promote modularity in the HyperNEAT algorithm, which has recently become one of the leading generative encodings due to its ability to produce complex regularities, evolve extremely large-scale neural networks, and exploit the geometry of problems. Our work thus merges separate lines of research into evolving regularity and modularity, allowing them to be combined into a potentially powerful algorithm that can produce large-scale neural networks that exhibit key properties of structural organization in animal brains, and it represents a step towards the day in which we can evolve computational brains that rival natural brains in complexity and intelligence.

6. ACKNOWLEDGMENTS

We thank Kai Olav Ellefsen and the members of the Evolving AI Lab. JBM is supported by an ANR young researchers grant (Creadapt, ANR-12-JS03-0009).

[Figure 5 appears here: 5-XOR results in panels (a)-(c) and H-XOR results in panels (d)-(f), each shown with and without a connection cost. See caption below.]

Figure 5: A Connection Cost Also Increases Modularity and Regularity on the 5-XOR and H-XOR Problems. (a) Median performance over time for the 5-XOR experiment. HyperNEAT-CCT performs significantly better than HyperNEAT after 25000 generations. (b) Median modularity over time for the 5-XOR problem. HyperNEAT-CCT is significantly more modular than HyperNEAT after around 22000 generations (and for periods prior). (c) A comparison of 5-XOR networks 18-22 (sorted on modularity) showing ONP visualizations (see methods). HyperNEAT-CCT networks decompose the problem into the appropriate five modules while HyperNEAT without a connection cost makes unnecessary connections between the separate XOR problems. (d) On the H-XOR problem, HyperNEAT-CCT performs significantly better than HyperNEAT for most of the run. (e) HyperNEAT-CCT is significantly more modular for most early generations before HyperNEAT catches up. (f) A visual comparison of H-XOR networks 24-28 (sorted on modularity) showing ONP visualizations (see methods). HyperNEAT-CCT splits the problem perfectly while HyperNEAT makes unnecessary connections between the left and the right problems.

7. REFERENCES

[1] S.B. Carroll. Chance and necessity: the evolution of morphological complexity and diversity. Nature, 409(6823):1102–1109, 2001.

[2] C. Cherniak, Z. Mokhtarzada, R. Rodriguez-Esteban, and K. Changizi. Global optimization of cerebral cortex layout. Proceedings of the National Academy of Sciences, 101(4):1081–6, January 2004.

[3] D.B. Chklovskii, T. Schikorski, and C.F. Stevens. Wiring optimization in cortical circuits. Neuron, 34(3):341–347, 2002.

[4] J. Clune, B.E. Beckmann, P.K. McKinley, and C. Ofria. Investigating whether HyperNEAT produces modular neural networks. In Proceedings of the Genetic and Evolutionary Computation Conference, pages 635–642. ACM, 2010.

[5] J. Clune, B.E. Beckmann, C. Ofria, and R.T. Pennock. Evolving coordinated quadruped gaits with the HyperNEAT generative encoding. In Proceedings of the IEEE Congress on Evolutionary Computation, pages 2764–2771, 2009.

[6] J. Clune, B.E. Beckmann, R.T. Pennock, and C. Ofria. HybrID: A hybridization of indirect and direct encodings for evolutionary computation. In Proceedings of the European Conference on Artificial Life, 2009.

[7] J. Clune, D. Misevic, C. Ofria, R.E. Lenski, S.F. Elena, and R. Sanjuan. Natural selection fails to optimize mutation rates for long-term adaptation on rugged fitness landscapes. PLoS Computational Biology, 4(9):e1000187, 2008.

[8] J. Clune, J-B. Mouret, and H. Lipson. The evolutionary origins of modularity. Proceedings of the Royal Society B, 280(20122863), 2013.

[9] J. Clune, C. Ofria, and R.T. Pennock. The sensitivity of HyperNEAT to different geometric representations of a problem. In Proceedings of the Genetic and Evolutionary Computation Conference, pages 675–682, 2009.

[10] J. Clune, K.O. Stanley, R.T. Pennock, and C. Ofria. On the performance of indirect encoding across the continuum of regularity. IEEE Transactions on Evolutionary Computation, 15(4):346–367, 2011.

[11] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182–197, 2002.

[12] C. Espinosa-Soto and A. Wagner. Specialization can drive the evolution of modularity. PLoS Computational Biology, 6(3):e1000719, 2010.

[13] J. Gauci and K.O. Stanley. Generating large-scale neural networks through discovering geometric regularities. In Proceedings of the Genetic and Evolutionary Computation Conference, pages 997–1004. ACM, 2007.

[14] G.S. Hornby, H. Lipson, and J.B. Pollack. Generative representations for the automated design of modular physical robots. IEEE Transactions on Robotics and Automation, 19(4):703–719, 2003.

[15] G.S. Hornby and J.B. Pollack. Evolving L-systems to generate virtual creatures. Computers & Graphics, 25(6):1041–1048, December 2001.

[16] B. Inden, Y. Jin, R. Haschke, and H. Ritter. Exploiting inherent regularity in control of multilegged robot locomotion by evolving neural fields. In 2011 Third World Congress on Nature and Biologically Inspired Computing, pages 401–408, October 2011.

[17] N. Kashtan and U. Alon. Spontaneous evolution of modularity and network motifs. Proceedings of the National Academy of Sciences, 102(39):13773–13778, September 2005.

[18] N. Kashtan, E. Noor, and U. Alon. Varying environments can speed up evolution. Proceedings of the National Academy of Sciences, 104(34):13711–13716, August 2007.

[19] C.P. Klingenberg. Developmental constraints, modules and evolvability. In Variation: A Central Concept in Biology, pages 1–30, 2005.

[20] J. Lehman, S. Risi, D.B. D'Ambrosio, and K.O. Stanley. Encouraging reactivity to create robust machines. Adaptive Behavior, 21(6):484–500, August 2013.

[21] M. Li. An Introduction to Kolmogorov Complexity and Its Applications. Springer, 1997.

[22] H. Lipson. Principles of modularity, regularity, and hierarchy for scalable systems. Journal of Biological Physics and Chemistry, 7(December):125–128, 2007.

[23] J-B. Mouret and S. Doncieux. Sferes v2: Evolvin' in the multi-core world. In IEEE Congress on Evolutionary Computation, pages 1–8, July 2010.

[24] M.E.J. Newman. Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23):8577–8582, 2006.

[25] S. Risi, J. Lehman, and K.O. Stanley. Evolving the placement and density of neurons in the HyperNEAT substrate. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO '10), page 563, 2010.

[26] K.O. Stanley. Compositional pattern producing networks: A novel abstraction of development. Genetic Programming and Evolvable Machines, 8(2):131–162, 2007.

[27] K.O. Stanley, D.B. D'Ambrosio, and J. Gauci. A hypercube-based encoding for evolving large-scale neural networks. Artificial Life, 15(2):185–212, 2009.

[28] K.O. Stanley and R. Miikkulainen. Evolving neural networks through augmenting topologies. Evolutionary Computation, 10(2):99–127, 2002.

[29] K.O. Stanley and R. Miikkulainen. A taxonomy for artificial embryogeny. Artificial Life, 9(2):93–130, 2003.

[30] G.F. Striedter. Principles of Brain Evolution. Sinauer Associates, Sunderland, MA, 2005.

[31] M. Suchorzewski. Evolving scalable and modular adaptive networks with Developmental Symbolic Encoding. Evolutionary Intelligence, 4(3):145–163, September 2011.

[32] M. Suchorzewski and J. Clune. A novel generative encoding for evolving modular, regular and scalable networks. In Proceedings of the Genetic and Evolutionary Computation Conference, pages 1523–1530, 2011.

[33] P. Szerlip and K.O. Stanley. Indirectly encoded Sodarace for artificial life. In Advances in Artificial Life, ECAL 2013, pages 218–225, September 2013.

[34] P. Tonelli and J-B. Mouret. On the relationships between generative encodings, regularity, and learning abilities when evolving plastic, artificial neural networks. PLoS One, to appear, 2013.

[35] V.K. Valsalam and R. Miikkulainen. Evolving symmetric and modular neural networks for distributed control. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO '09), page 731, 2009.

[36] P. Verbancsics and K.O. Stanley. Constraining connectivity to encourage modularity in HyperNEAT. In Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation, pages 1483–1490. ACM, 2011.

[37] G.P. Wagner, J. Mezey, and R. Calabretta. Natural selection and the origin of modules. In Modularity: Understanding the Development and Evolution of Complex Natural Systems. MIT Press, 2001.

[38] G.P. Wagner, M. Pavlicev, and J.M. Cheverud. The road to modularity. Nature Reviews Genetics, 8(12):921–31, December 2007.

