Evolution of Complex Behavior Controllers using Genetic Algorithms


http://www.cs.unr.edu/~gruber

• Kerry Gruber (UNR, now Intel)

• Jason Baurick (UNR)

• Sushil J. Louis (UNR)

• Funded in part by NSF grant number 9624130

G.A.T.O.R.S

Genetic Algorithm Training Of Robot Simulations

General Description

• Use artificial neural networks for control of the simulated robots

• Evolve the weights of the neural networks using a Genetic Algorithm (GA)

Goals

• Develop controllers exhibiting generalized complex behavior

– Perform complex spatially-independent tasks

– Able to perform adequately under varying environmental conditions

• Performance which meets or exceeds that of controllers designed by humans

• Use a minimum of state information

Why use a GA to train the neural network?

• Training is based on actual performance instead of expected performance

– Supervised training models rely on the designer’s understanding of the environment and the expected consequences of input to output relationships

– GA uses whole-run performance instead of single-step input to output relationships

Vacuum Cleaning

(Cover as large an area as possible in the time allotted.)

• Move through the environment without retracing previous steps, and do so with no spatial information of the current or previous locations occupied

• Recharge by locating and accessing energy supplies/outlets (prey capture scenarios)

– Energy supplies may only be used by one unit at a time and are not accessible for a time afterwards (only one vacuum cleaner to an outlet)

• Interact with obstacles in the environment without crashing (obstacle avoidance)

• Negotiate obstacles without crashing (wall following)

Simulator

• Robots

– Predators (Vacuum cleaners)

– Prey (Energy supplies/outlets)

• Environment

– 300X300 Spatially Independent Grid

– Contains Obstacles

• Simulation Process

– 1000 time steps

[Figure: Simulation view, showing obstacles and predators]

Predators

• Two independent motors

– 1 on each side

– 4 possible states per motor

• Battery

– Depleted as the robot moves

– May be recharged by consuming prey

• Five binary touch sensors

– 4 feelers, 1 crash sensor

• Two real-valued hearing sensors

[Figure: Robot Sensor Positions, with labels for the hearing sensor (variable length) and the touch sensor, and dimensions of 10 units and 6 units]

Prey/Battery

• Stationary

• Emit Sound (signal) Audible to Predators

– Inversely proportional to square of distance

– Cut off outside hearing range

• May be Consumed by Predators

– Only a single predator may have access at a time

– May not be re-used for a certain period of time
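The prey signal described above can be sketched as follows. This is a minimal sketch, not the simulator's code: the function name `signal_strength`, the unit source amplitude, the use of Euclidean distance, and the clamp at very small distances are all assumptions. The slides state only the inverse-square falloff and the cut-off outside hearing range.

```python
import math


def signal_strength(predator_xy, prey_xy, hearing_range, amplitude=1.0):
    """Return the sound level heard from one prey (hypothetical helper).

    The level falls off with the square of the distance and is cut off
    (zero) beyond the hearing range. `amplitude` is an assumed constant.
    """
    dx = prey_xy[0] - predator_xy[0]
    dy = prey_xy[1] - predator_xy[1]
    distance = math.hypot(dx, dy)
    if distance > hearing_range:
        return 0.0  # outside hearing range: nothing is heard
    # Clamp tiny distances so the level stays finite on contact (assumption).
    return amplitude / max(distance, 1.0) ** 2
```

For example, a prey 5 units away is heard at 1/25 of the source amplitude, while a prey 50 units away with a hearing range of 20 is not heard at all.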

Environment

• 300X300 Spatially Independent Grid

• 3-10 Obstacles (5-10% of area each)

• 5 Unit Border (Assures entire area not covered)

[Figure: Random Environment, showing obstacles and predators]

Simulation Process

• 1000 Steps

• Each predator randomly given a chance to move

– Provided touch and hearing sensor levels

• New position is determined and battery levels decreased in accordance with motor settings

– If a boundary is crossed, the predator is moved outside of the boundary and a crash is registered (battery still decreased as if no crash occurred)

• If in contact with “live” prey, battery recharged and prey consumed

• New sensor states determined

• If battery depleted, predator considered “dead”

• Sleeping prey awaken if timer expired

Neural Network

• Two-layer fully connected artificial neural network

• Sigmoid activation function

• Each node has a bias

• 10 Inputs

– 5 binary touch sensors

– 2 hearing sensors (Using binary states dependent on side and presence)

– 2 binary hearing states

– Battery level

• 1-10 Hidden nodes

• 4 Output nodes; output threshold of 0.5
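The network described above can be sketched as a plain forward pass. This is a minimal sketch under stated assumptions: the function names (`sigmoid`, `layer`, `motor_bits`) and the weight layout (`weights[j][i]` links input i to node j) are illustrative and not taken from the slides. It covers the two fully connected layers, the per-node bias, the sigmoid activation, and the 0.5 output threshold.

```python
import math


def sigmoid(x):
    """Logistic activation, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def layer(inputs, weights, biases):
    """One fully connected layer: weights[j][i] links input i to node j."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_w, inputs)) + b)
        for node_w, b in zip(weights, biases)
    ]


def motor_bits(inputs, hidden_w, hidden_b, output_w, output_b):
    """Run the two-layer network and threshold each output at 0.5."""
    hidden = layer(inputs, hidden_w, hidden_b)
    outputs = layer(hidden, output_w, output_b)
    return [1 if o > 0.5 else 0 for o in outputs]
```

With 10 inputs, 4 hidden nodes, and 4 outputs, the call returns four bits. One plausible reading, not confirmed by the slides, is that each motor takes two of those bits to select one of its 4 states.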

[Figure: Virtual Prey Location, showing the hearing sensors, the actual prey location, and the virtual prey location]

Hearing Sensor States

Presence of a virtual food source required a minimum of state information in order to find and capture prey.

• Hearing sensor levels used to determine whether any prey could be heard and the side they were on

• Input as two binary levels

• Saved and used as state information during the next step
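The reduction of hearing levels to binary inputs, and their reuse as state, might look like the sketch below. The names `hearing_bits` and `HearingState`, the bit encoding (side = 1 when the right sensor is louder), and the initial state of (0, 0) are assumptions made for illustration. The slides say only that presence and side are input as two binary levels and saved for the next step.

```python
def hearing_bits(left_level, right_level):
    """Reduce two real-valued hearing levels to (presence, side) bits.

    presence: 1 if any prey is heard on either side.
    side: 1 if the right sensor is louder, else 0 (encoding is assumed).
    """
    presence = 1 if (left_level > 0.0 or right_level > 0.0) else 0
    side = 1 if right_level > left_level else 0
    return presence, side


class HearingState:
    """Keeps the previous step's bits so they can be fed back as inputs."""

    def __init__(self):
        self.previous = (0, 0)

    def step(self, left_level, right_level):
        current = hearing_bits(left_level, right_level)
        inputs = list(current) + list(self.previous)  # 4 network inputs
        self.previous = current  # saved for the next time step
        return inputs
```

The four values returned by `step` would account for the two hearing inputs and the two binary hearing states in the network's input list.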

GA-Encoding

• 16-bit binary representation of weights (1024-bit string for 4 hidden nodes)

• Input weights range from -100 to 100

• Biases range from -100*N to 100*N (N=number of inputs)

Chromosome layout, node by node:

| Node 1: W11 W21 ... WN1 B1 | ... | Node N: W1N W2N ... WNN BN |
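The 1024-bit figure can be checked with a short calculation: 4 hidden nodes with 10 weights and a bias each, plus 4 output nodes with 4 weights and a bias each, give 64 values of 16 bits. The sketch below does that count and decodes one 16-bit field. The linear mapping in `decode` is an assumption, since the slides give the ranges but not the mapping; both function names are illustrative.

```python
BITS_PER_WEIGHT = 16


def chromosome_bits(n_inputs, n_hidden, n_outputs):
    """Bits needed when every node stores its input weights plus one bias."""
    hidden_values = n_hidden * (n_inputs + 1)
    output_values = n_outputs * (n_hidden + 1)
    return (hidden_values + output_values) * BITS_PER_WEIGHT


def decode(bits, low, high):
    """Map a 16-character bit string linearly onto [low, high] (assumed)."""
    raw = int(bits, 2)
    return low + (high - low) * raw / (2 ** BITS_PER_WEIGHT - 1)
```

Here `chromosome_bits(10, 4, 4)` gives 1024, matching the slide, and `decode` maps the all-zeros field to the low end of the range and the all-ones field to the high end.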

GA-Initialization

• Input weights randomly initialized over full range

• Operating point initialization

– Biases set at -0.5*Input Weights
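A minimal sketch of operating point initialization follows, reading "-0.5*Input Weights" as minus half the sum of a node's input weights; that reading is an assumption, as are the function names. Under it, any binary input pattern and its complement produce net inputs that sum to zero, so exactly one of each pair drives the sigmoid above 0.5. Barring ties, a 10-input node is then active for 512 of the 1024 patterns.

```python
import itertools
import random


def init_node(n_inputs, rng, w_range=100.0):
    """Random input weights; bias at minus half the weight sum (assumed)."""
    weights = [rng.uniform(-w_range, w_range) for _ in range(n_inputs)]
    bias = -0.5 * sum(weights)
    return weights, bias


def count_active(weights, bias):
    """Count binary input patterns whose net input is positive.

    A positive net input means the sigmoid output is above 0.5.
    """
    active = 0
    for pattern in itertools.product((0, 1), repeat=len(weights)):
        net = bias + sum(w for w, x in zip(weights, pattern) if x)
        if net > 0.0:
            active += 1
    return active
```

By contrast, a bias drawn at random from its full range (-100*N to 100*N) can easily exceed any sum the input weights can reach, which leaves the node stuck on or off for every input.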

[Figure: Transition Probability Distribution. Horizontal axis: Number of Transitions (0 to 1024); logarithmic vertical axis (1.E-05 to 1.E+01); series: Op. Region, Random; labeled values: 0.996, 0.750]

75% of all nodes with the bias set by random generation never have a state transition during the first generation. With operating point initialization, 99.6% of nodes transition for 512 of the 1024 possible input combinations. (10 million randomly generated nodes.)

[Figure: First Generation Coverage Probability Distribution. Horizontal axis: Coverage (0 to 1000); logarithmic vertical axis (1.E-06 to 1.E+00); series: Op. Region, Random]

Operating point initialization more evenly distributes the initial coverage values used for fitness determination. Other feature measurements show similar differences. (250K random networks.)

[Figure: Fitness vs. Initialization Method. Horizontal axis: Generation (0 to 100); vertical axis ticks from 50 to 190; series: Random, Op. Point]

Using operating point initialization, the GA progresses at a higher rate because the initial nodes are actually operational. (Differences in the random environments cause the high fluctuation of values.)

GA-Fitness Function

• Use five features

– Area coverage

– Number of prey consumed

– Distance covered

– Number of crashes

– Number of obstacle touches

• Relative fitness based on the averages and standard deviations of each generation

$F_i = W_f \cdot 2^{(X_{if} - \mu_f)/\sigma_f}$

where:

$F_i$ is the fitness for the ith feature

$W_f$ is the weight of the feature

$X_{if}$ is the score for a given feature

$\mu_f$ is the average value of a feature for a given generation

$\sigma_f$ is the standard deviation of a feature for a given generation
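The relative scoring rule can be sketched as below, reading the "2" in the formula as the base of an exponent so that a score one standard deviation above the generation average earns twice the feature weight. The function name and the zero-deviation fallback are assumptions. The slides do not say how penalized features such as crashes enter the formula, so the sketch applies it as written.

```python
def feature_fitness(score, mean, std, weight):
    """Relative fitness of one feature score: weight * 2 ** z-score."""
    if std == 0.0:
        return weight  # all scores equal: treat as average (assumption)
    return weight * 2.0 ** ((score - mean) / std)
```

With a weight of 100, a generation average of 100, and a deviation of 20, scores of 80, 100, and 120 map to 50, 100, and 200. Because the result depends only on the distance from the current average, the same rule keeps separating individuals as the population improves.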

GA-Fitness Function Cont.

• Only feature scores which indicate operation are included in the average and deviation calculation

– Non-functioning units tend to lower averages

– Non-functioning units increase deviations

– Leads to insufficient selection pressure

• The deviation for crashes is set to the average if the deviation is greater than the average

– There is a clear cut-off for this feature

– High deviations and low averages lead to insufficient selection pressure as the GA matures
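The two adjustments above can be sketched as a statistics helper. The predicate `is_operational` is hypothetical, since the slides do not say how a functioning unit is detected, and the use of the population standard deviation (rather than the sample one) is also an assumption.

```python
import statistics


def generation_stats(scores, is_operational, cap_std_at_mean=False):
    """Mean and standard deviation over operational units only.

    `is_operational` is a hypothetical predicate; the slides do not say
    how a functioning unit is detected.
    """
    kept = [s for s in scores if is_operational(s)]
    if not kept:
        return 0.0, 0.0
    mean = statistics.fmean(kept)
    std = statistics.pstdev(kept)
    if cap_std_at_mean and std > mean:
        std = mean  # rule stated for the crash feature
    return mean, std
```

For the crash feature the helper would be called with `cap_std_at_mean=True`; the other features would use the default.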

Training

• Competitive environment; 3 human-programmed controllers

• Random obstacles, prey, initial positions, and environment variables generated for each generation

• Random variables same for each chromosome

Training Variables

Variable                  Min. Value   Max. Value
Number of Obs.            3            10
Area Coverage per Obs.    5%           10%
Ear Length                1            10
Hearing Range             20           200
Number of Prey            1            4
Prey Length               1            10
Prey Sleep Time           100          1000

Variables are set at the beginning of each generation and maintained for all chromosomes.
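Drawing one set of training variables per generation might look like the sketch below. The ranges are copied from the table; the dictionary keys, the function name, uniform sampling, and the treatment of all variables except the area fraction as integers are assumptions.

```python
import random

# (min, max) ranges copied from the training-variables table.
INT_RANGES = {
    "num_obstacles": (3, 10),
    "ear_length": (1, 10),
    "hearing_range": (20, 200),
    "num_prey": (1, 4),
    "prey_length": (1, 10),
    "prey_sleep_time": (100, 1000),
}
OBSTACLE_AREA_FRACTION = (0.05, 0.10)


def draw_generation_settings(rng):
    """Draw one set of variables, shared by every chromosome this generation."""
    settings = {name: rng.randint(lo, hi) for name, (lo, hi) in INT_RANGES.items()}
    settings["obstacle_area_fraction"] = rng.uniform(*OBSTACLE_AREA_FRACTION)
    return settings
```

Calling the function once per generation and reusing the result for all chromosomes keeps fitness scores comparable within that generation, while the environment still changes from one generation to the next.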

GA Setting for Final Controller

Variable              Setting
Hidden Nodes          4
Generations           1000
Population Size       100
Crossover Rate        100%
Crossover Points      1
Mutation Rate         0.001
Elite Percentage      30%
Consumption Weight    50
Touch Weight          20
Crash Weight          75
Distance Weight       25
Coverage Weight       100
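One generation under these settings can be sketched as follows. Several details are assumptions, not taken from the slides: the elite 30% are copied unchanged, parents are chosen in proportion to fitness, the mutation rate applies per bit, and chromosomes are lists of 0/1 integers. The slides give the rates but not the selection scheme. Fitness values must be positive for the weighted choice to work.

```python
import random

ELITE_PERCENT = 30      # Elite Percentage from the settings table
MUTATION_RATE = 0.001   # applied per bit (assumed)


def next_generation(population, fitnesses, rng):
    """Build the next population from bit-list chromosomes and fitnesses."""
    ranked = sorted(zip(fitnesses, population), key=lambda p: p[0], reverse=True)
    n_elite = len(population) * ELITE_PERCENT // 100
    new_pop = [list(chrom) for _, chrom in ranked[:n_elite]]  # elites survive
    while len(new_pop) < len(population):
        # Fitness-proportional parent choice (assumed selection scheme).
        mom, dad = rng.choices(population, weights=fitnesses, k=2)
        cut = rng.randrange(1, len(mom))  # single crossover point
        child = mom[:cut] + dad[cut:]     # crossover rate is 100%
        child = [b ^ 1 if rng.random() < MUTATION_RATE else b for b in child]
        new_pop.append(child)
    return new_pop
```

With a population of 100 and 1024-bit chromosomes, each call keeps 30 elites and creates 70 children, each expected to carry about one mutated bit.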

Final Selection

• For testing purposes, the final controller was selected by hand based on objective performance in pre-defined environments

• Web demonstration uses the best sum of fitnesses of the top 20 controllers over 10 different environments
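The selection rule used for the web demonstration can be sketched as a sum over environments. The callback `evaluate(controller, environment)` is hypothetical and stands in for one full simulation run returning a fitness value; the function name is likewise illustrative.

```python
def select_final(controllers, environments, evaluate):
    """Pick the controller with the highest fitness summed over environments.

    `evaluate(controller, environment)` is a hypothetical callback returning
    one fitness value for one simulation run.
    """
    totals = [
        sum(evaluate(c, env) for env in environments) for c in controllers
    ]
    best = max(range(len(controllers)), key=totals.__getitem__)
    return controllers[best], totals[best]
```

In the setting described on the slide, `controllers` would hold the top 20 controllers and `environments` the 10 test environments.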

Implementation

• 8-node Beowulf cluster

– PII 400MHz machines

– Red Hat Linux

– LAM version of MPI

• 3 Java interface applets/applications

– Configuration

– Training

– Simulation Display

[Figure: Speed-Up versus Number of Nodes (1 to 8); labeled values: 3.48, 3.37]

Results-Average Area Coverage

[Figure: average area coverage; axis ticks 0 to 100; axis label: Random Environment Set; series: 1, 2, 3, Test Unit]

Averages using 100 different random simulation settings in 100 different environments.

Results-Average Number of Touches

[Figure: average number of touches; axis ticks 0 to 700; series: 1, 2, 3, Test Unit]

Averages using 100 different random simulation settings in 100 different environments.

Results-Average Distance

Test unit covers more area, but less distance. Indicates a slower speed and better energy conservation.

[Figure: average distance; series: 1, 2, 3, Test Unit]

Results-Coverage vs. Prey Sleep Time

Final controller is less affected by variations in prey sleep time. (Sleep time incremented over 100 iterations; results averaged from 100 random environments at each setting.)

[Figure: coverage versus Prey Sleep Time (100 to 1000); series: 1, 2, 3, Test Unit]

Results-Crashes vs. Hearing Range

Final controller performs poorly in relation to crashes as the hearing range is increased. (Hearing range incremented over 100 iterations; results averaged from 100 random environments at each setting.)

[Figure: crashes versus Hearing Range (20 to 200); series: 1, 2, 3, Test Unit]

Results-Crashes vs. Noise Bias

Final controller performs adequately in the presence of noise. (Noise incremented over 100 iterations; results averaged from 100 random environments at each setting.)

[Figure: crashes versus Noise Bias (0 to 100); series: 1, 2, 3, Test Unit]

Results-Coverage for Robots Trained w/wo Noise

Controllers trained in the presence of noise are less susceptible to its effects, but do not reach peak performance. (Noise incremented over 100 iterations; results averaged from 100 random environments at each setting.)

[Figure: coverage versus Noise Bias (0 to 100); series: Test Unit, Noise-Trained]

Conclusions

• The resulting controller surpassed those produced by humans in the areas of coverage and energy conservation

• All stages of the GA's progression must be taken into account in the fitness function to achieve acceptable results

– Operating point initialization guarantees functioning nodes during early generations, and appears to increase GA performance

– Relative scoring functions appear to provide good selection pressure over the life of the GA
