Post on 18-Jan-2016
Evolution of Complex Behavior Controllers using Genetic Algorithms
http://www.cs.unr.edu/~gruber 2
• Kerry Gruber (UNR now Intel)
• Jason Baurick (UNR)
• Sushil J. Louis (UNR)
• Funded in part by NSF grant number 9624130
G.A.T.O.R.S.: Genetic Algorithm Training of Robot Simulations
General Description
• Use artificial neural networks for control of the simulated robots
• Evolve the weights of the neural networks using a Genetic Algorithm (GA)
Goals
• Develop controllers exhibiting generalized complex behavior
– Perform complex, spatially independent tasks
– Perform adequately under varying environmental conditions
• Achieve performance that meets or exceeds that of controllers designed by humans
• Use a minimum of state information
Why use a GA to train the neural network?
• Training is based on actual performance instead of expected performance
– Supervised training models rely on the designer's understanding of the environment and of the expected consequences of input-to-output relationships
– The GA uses whole-run performance instead of single-step input-to-output relationships
Vacuum Cleaning (cover as large an area as possible in the time allotted)
• Move through the environment without retracing previous steps, and do so with no spatial information about the current or previously occupied locations
• Recharge by locating and accessing energy supplies/outlets (prey capture scenarios)
– Energy supplies may be used by only one unit at a time and are inaccessible for a period afterwards (only one vacuum cleaner per outlet)
• Interact with obstacles in the environment without crashing (obstacle avoidance)
• Negotiate obstacles without crashing (wall following)
Simulator
• Robots
– Predators (vacuum cleaners)
– Prey (energy supplies/outlets)
• Environment
– 300x300 spatially independent grid
– Contains obstacles
• Simulation process
– 1000 time steps
[Figure: Simulation, showing obstacles, prey, and predators]
Predators
• Two independent motors
– One on each side
– 4 possible states per motor
• Battery
– Depleted as the robot moves
– May be recharged by consuming prey
• Five binary touch sensors
– 4 feelers, 1 crash sensor
• Two real-valued hearing sensors
[Figure: Robot Sensor Positions, showing the rear, the variable-length hearing sensors, and the touch sensors, with dimensions labeled 10 units and 6 units]
Prey/Battery
• Stationary
• Emit a sound (signal) audible to predators
– Inversely proportional to the square of the distance
– Cut off outside the hearing range
• May be consumed by predators
– Only a single predator may have access at a time
– May not be re-used for a certain period of time
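The prey rules above can be sketched as follows. This is a minimal illustration, not the simulator's code: the function and class names, the `strength` scale factor, and the clamp at distances below 1 unit are assumptions. "Only one predator at a time" is modeled by putting the prey to sleep as soon as one predator consumes it.

```python
import math


def sound_level(prey_xy, sensor_xy, hearing_range, strength=1.0):
    """Signal heard at sensor_xy from a prey at prey_xy.

    Falls off with the inverse square of the distance and is cut off
    beyond hearing_range. `strength` is an assumed scale factor.
    """
    d = math.dist(prey_xy, sensor_xy)
    if d > hearing_range:
        return 0.0
    return strength / max(d, 1.0) ** 2  # clamp so a touching sensor does not divide by zero


class Prey:
    """Stationary energy supply that sleeps after being consumed."""

    def __init__(self, sleep_time):
        self.sleep_time = sleep_time  # steps before the prey can be used again
        self.timer = 0                # 0 means "live"

    @property
    def live(self):
        return self.timer == 0

    def consume(self):
        """Try to recharge from this prey; only the first caller succeeds."""
        if not self.live:
            return False
        self.timer = self.sleep_time  # no other predator can use it while asleep
        return True

    def tick(self):
        """Advance the sleep timer by one simulation step."""
        if self.timer > 0:
            self.timer -= 1
```

For example, a sensor 5 units from a prey within range hears 1/25 of the source strength, and a second predator calling `consume()` on a sleeping prey is refused.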
Environment
• 300x300 spatially independent grid
• 3-10 obstacles (5-10% of area each)
• 5-unit border (ensures the entire area is not covered)
[Figure: Random Environment, showing obstacles, prey, and predators]
Simulation Process
• 1000 steps
• Each predator is given a chance to move, in random order
– Provided with touch and hearing sensor levels
• New position is determined and battery level decreased in accordance with the motor settings
– If a boundary is crossed, the predator is moved back outside the boundary and a crash is registered (battery still decreased as if no crash occurred)
• If in contact with "live" prey, the battery is recharged and the prey consumed
• New sensor states are determined
• If the battery is depleted, the predator is considered "dead"
• Sleeping prey awaken if their timer has expired
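One pass of the loop above might look like this sketch. The movement model (motor states used directly as wheel speeds, a 0.1 turning gain), the energy cost, the contact distance `reach`, and the recharge amount `charge` are all assumptions, and only the arena walls are treated as boundaries (obstacles are omitted). The `sense` callback is a hypothetical stand-in for the touch and hearing computations.

```python
import math
import random

SIZE = 300  # 300x300 arena, as on the Simulator slide


class Predator:
    def __init__(self, x, y, controller, battery=500.0):
        self.x, self.y, self.heading = x, y, 0.0
        self.controller = controller  # hypothetical: sensor tuple -> (left, right), each 0..3
        self.battery = battery
        self.crashes = 0
        self.alive = True
        self.sensors = ()


class Prey:
    def __init__(self, x, y, sleep_time):
        self.x, self.y = x, y
        self.sleep_time = sleep_time
        self.timer = 0  # 0 means "live"


def step(predators, prey, sense, rng, reach=5.0, charge=200.0):
    """One simulation step; movement, energy cost, reach and charge are assumptions."""
    movers = [p for p in predators if p.alive]
    rng.shuffle(movers)  # each living predator gets its turn in random order
    for p in movers:
        left, right = p.controller(p.sensors)  # controller sees touch/hearing levels
        p.heading += 0.1 * (right - left)      # assumed differential-drive turning
        speed = (left + right) / 2
        nx = p.x + speed * math.cos(p.heading)
        ny = p.y + speed * math.sin(p.heading)
        p.battery -= speed  # charged as if no crash occurred
        if not (0 <= nx <= SIZE and 0 <= ny <= SIZE):
            p.crashes += 1  # boundary crossed: move back inside and register a crash
            nx, ny = min(max(nx, 0), SIZE), min(max(ny, 0), SIZE)
        p.x, p.y = nx, ny
        for q in prey:
            if q.timer == 0 and math.dist((p.x, p.y), (q.x, q.y)) <= reach:
                p.battery += charge     # recharge from "live" prey ...
                q.timer = q.sleep_time  # ... which then sleeps
                break
        p.sensors = sense(p, prey)  # new sensor states for the next turn
        if p.battery <= 0:
            p.alive = False  # depleted battery: predator is "dead"
    for q in prey:
        if q.timer > 0:
            q.timer -= 1  # sleeping prey wake up when the timer expires
```

Running `step` 1000 times reproduces the run length on the slide; the per-run counters (`crashes`, `battery`, position history) are what the fitness features would later be computed from.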
Neural Network
• Two-layer fully connected artificial neural network
• Sigmoid activation function
• Each node has a bias
• 10 inputs
– 5 binary touch sensors
– 2 hearing sensors (using binary states dependent on side and presence)
– 2 binary hearing states
– Battery level
• 1-10 hidden nodes
• 4 output nodes; output threshold of 0.5
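A forward pass through this network can be sketched as below. The slides give 4 thresholded outputs and 4 states per motor, so this sketch assumes two output bits per motor (outputs 0-1 for the left motor, 2-3 for the right); that mapping is an assumption, not stated in the source.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def layer(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per node."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def act(inputs, hidden, output):
    """inputs: 10 values (5 touch, 2 hearing, 2 saved hearing states, battery).

    hidden and output are (weights, biases) pairs. Returns (left, right)
    motor states in 0..3, assuming outputs 0-1 drive the left motor and
    outputs 2-3 the right motor (the slides do not give this mapping).
    """
    h = layer(inputs, *hidden)
    bits = [1 if o > 0.5 else 0 for o in layer(h, *output)]  # 0.5 output threshold
    return 2 * bits[0] + bits[1], 2 * bits[2] + bits[3]
```

The overflow-safe sigmoid matters here because, with weights up to ±100 and biases up to ±1000, net inputs far outside the range of `math.exp` are common.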
[Figure: Virtual Prey Location, showing the hearing sensors, the actual prey location, and the virtual prey location]
Hearing Sensor States
The presence of a virtual food source required a minimum of state information in order to find and capture prey.
• Hearing sensor levels are used to determine whether any prey can be heard and on which side it is
• Input as two binary levels
• Saved and used as state information during the next step
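The two binary hearing inputs and their one-step memory can be sketched as follows. Reading the two "hearing sensor" inputs as the current (heard, side) flags and the two "binary hearing states" as the flags saved from the previous step is an interpretation of the slides; the class name and the left/right encoding are hypothetical.

```python
def hearing_state(left_level, right_level):
    """Two binary flags: (any prey heard, loudest signal on the left side)."""
    heard = 1 if (left_level > 0 or right_level > 0) else 0
    on_left = 1 if left_level > right_level else 0
    return heard, on_left


class HearingMemory:
    """Keeps last step's flags so a prey that falls silent is still 'located'."""

    def __init__(self):
        self.previous = (0, 0)

    def inputs(self, left_level, right_level):
        current = hearing_state(left_level, right_level)
        previous, self.previous = self.previous, current  # save for the next step
        return [*current, *previous]  # four of the network's ten inputs
```

When a prey stops being audible, the saved flags still tell the network that something was heard on a given side one step earlier, which is the "virtual prey location" idea in its smallest form.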
GA-Encoding
• 16-bit binary representation of each weight (1024-bit string for 4 hidden nodes)
• Input weights range from -100 to 100
• Biases range from -100*N to 100*N (N = number of inputs)
[Chromosome layout: Node 1 = W11 W21 ... WN1 B1, ..., Node N = W1N W2N ... WNN BN]
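The 1024-bit figure follows from the node counts: with 10 inputs, 4 hidden nodes and 4 outputs there are 4 × (10 + 1) + 4 × (4 + 1) = 64 weights and biases, and 64 × 16 bits = 1024 bits. A decoder might look like the sketch below; the linear mapping of each 16-bit integer onto its range and the hidden-then-output node order are assumptions.

```python
BITS = 16


def gene_to_value(gene, limit):
    """Map a 16-bit gene (string of '0'/'1') linearly onto [-limit, +limit]."""
    return -limit + 2 * limit * int(gene, 2) / (2 ** BITS - 1)


def chromosome_length(n_inputs=10, n_hidden=4, n_outputs=4):
    # each node stores one weight per input plus a bias
    n_genes = n_hidden * (n_inputs + 1) + n_outputs * (n_hidden + 1)
    return n_genes * BITS


def decode(chromosome, n_inputs=10, n_hidden=4, n_outputs=4):
    """Return [(hidden_weights, hidden_biases), (output_weights, output_biases)]."""
    genes = (chromosome[i:i + BITS] for i in range(0, len(chromosome), BITS))
    layers = []
    for n_in, n_nodes in ((n_inputs, n_hidden), (n_hidden, n_outputs)):
        weights, biases = [], []
        for _ in range(n_nodes):  # node layout: W1 W2 ... WN B
            weights.append([gene_to_value(next(genes), 100) for _ in range(n_in)])
            biases.append(gene_to_value(next(genes), 100 * n_in))  # bias range scales with N
        layers.append((weights, biases))
    return layers
```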
GA-Initialization
• Input weights randomly initialized over the full range
• Operating point initialization
– Biases set at -0.5*Input Weights
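One way to read "biases set at -0.5*Input Weights" is b = -0.5 Σ w_i. With binary inputs the net input then becomes z(x) = Σ w_i x_i + b = Σ w_i (x_i - 1/2), so the complementary input pattern 1 - x always gives -z(x) and the node's threshold sits halfway between its all-off and all-on inputs. The sketch below uses that reading (an interpretation, consistent with the 512-of-1024 figure that follows) and contrasts it with a bias drawn over its full range.

```python
import random


def init_node(n_inputs, rng, operating_point=True):
    """Initial weights and bias for one node.

    Weights are drawn uniformly over the full [-100, 100] range. With
    operating_point=True the bias is -0.5 * sum(weights) (our reading of
    the slide); otherwise it is drawn over its full [-100*N, 100*N] range.
    """
    weights = [rng.uniform(-100, 100) for _ in range(n_inputs)]
    if operating_point:
        bias = -0.5 * sum(weights)  # centre the threshold between all-off and all-on inputs
    else:
        bias = rng.uniform(-100 * n_inputs, 100 * n_inputs)
    return weights, bias
```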
Transition Probability Distribution
[Figure: probability (log scale, 1E-05 to 1E+01) vs. number of transitions (0 to 1024) for operating-region and random initialization; highlighted values 0.996 and 0.750]
75% of all nodes whose bias is set by random generation never have a state transition during the first generation. With operating point initialization, 99.6% of nodes transition for 512 of the 1024 possible input combinations. (10 million randomly generated nodes.)
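The transition counts can be checked by brute force over all 2^10 = 1024 binary input patterns. Because sigmoid(z) > 0.5 exactly when z > 0, only the sign of the net input matters. The sketch assumes the operating-point bias is -0.5 times the sum of the weights; under that assumption every pattern and its complement fall on opposite sides of the threshold, giving exactly 512 "on" patterns (barring exact ties). The Monte Carlo helper for random biases is illustrative and uses far fewer nodes than the slide's 10 million.

```python
import itertools
import random


def count_on(weights, bias):
    """Binary input combinations for which the node output exceeds 0.5.

    sigmoid(z) > 0.5 exactly when z > 0, so the sigmoid itself is not needed.
    """
    return sum(
        1
        for x in itertools.product((0, 1), repeat=len(weights))
        if sum(w * xi for w, xi in zip(weights, x)) + bias > 0
    )


def has_transition(weights, bias):
    """True if the node's thresholded output changes anywhere over the input space."""
    return 0 < count_on(weights, bias) < 2 ** len(weights)


def fraction_without_transition(n_nodes, rng, n_inputs=10):
    """Monte Carlo estimate of the share of random-bias nodes that never switch."""
    stuck = 0
    for _ in range(n_nodes):
        w = [rng.uniform(-100, 100) for _ in range(n_inputs)]
        b = rng.uniform(-100 * n_inputs, 100 * n_inputs)
        stuck += not has_transition(w, b)
    return stuck / n_nodes
```

With a large enough `n_nodes`, the estimate from `fraction_without_transition` can be compared against the 75% reported on the slide.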
First Generation Coverage Probability Distribution
[Figure: probability (log scale, 1E-06 to 1E+00) vs. coverage (0 to 1000) for operating-region and random initialization]
Operating point initialization distributes the initial coverage values used for fitness determination more evenly. Other feature measurements show similar differences. (250K random networks.)
Fitness vs. Initialization Method
[Figure: fitness (50 to 190) vs. generation (0 to 100) for random and operating-point initialization]
With operating point initialization, the GA progresses faster because the initial nodes are actually operational. (Differences in the random environments cause the large fluctuations in the values.)
GA-Fitness Function
• Use five features
– Area coverage
– Number of prey consumed
– Distance covered
– Number of crashes
– Number of obstacle touches
• Relative fitness based on the averages and standard deviations of each generation
$F_i = W_f \cdot 2^{(X_{if} - \mu_f)/\sigma_f}$
where $F_i$ is the fitness for the $i$th feature, $W_f$ is the weight of the feature, $X_{if}$ is the score for a given feature, $\mu_f$ is the average value of the feature for a given generation, and $\sigma_f$ is the standard deviation of the feature for a given generation.
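Reading the relative fitness formula as F = W · 2^((X - μ)/σ), an average score earns exactly the feature weight and each standard deviation above average doubles it. The sketch below applies that with the feature weights from the GA settings table. Two points are assumptions: the slides do not say how crashes and touches become penalties (here the score is mirrored so fewer events score higher), and summing the per-feature values into one chromosome fitness is also assumed.

```python
WEIGHTS = {"coverage": 100, "consumed": 50, "distance": 25, "crashes": 75, "touches": 20}
LOWER_IS_BETTER = {"crashes", "touches"}  # assumption: the slides do not say how penalties enter


def feature_fitness(score, mean, std, weight):
    """F = W * 2 ** ((X - mean) / std): an average score earns W, one std above earns 2W."""
    return weight * 2 ** ((score - mean) / std)


def total_fitness(scores, stats):
    """scores: feature -> X for one chromosome; stats: feature -> (mean, std) for the generation."""
    total = 0.0
    for feature, weight in WEIGHTS.items():
        mean, std = stats[feature]
        if feature in LOWER_IS_BETTER:
            score, mean = -scores[feature], -mean  # mirror so that fewer events score higher
        else:
            score = scores[feature]
        total += feature_fitness(score, mean, std, weight)  # assumed: features are summed
    return total
```

The exponential form keeps every fitness positive, which suits fitness-proportional selection, and because the statistics are recomputed per generation the pressure adapts as the population improves.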
GA-Fitness Function Cont.
• Only feature scores that indicate operation are included in the average and deviation calculation
– Non-functioning units tend to lower averages
– Non-functioning units increase deviations
– Both lead to insufficient selection pressure
• The deviation for crashes is set to the average if the deviation is greater than the average
– There is a clear cut-off for this feature
– High deviations and low averages lead to insufficient selection pressure as the GA matures
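The per-generation statistics described above can be sketched as follows. Deciding which units count as "functioning" is left to the caller because the slides do not define it. The use of the population standard deviation and the `min_std` floor, which keeps the fitness exponent finite when every unit scores the same, are assumptions.

```python
import statistics


def generation_stats(values, functional, feature, min_std=1.0):
    """Mean and standard deviation of one feature for one generation.

    values: the feature score of every chromosome; functional: parallel list of
    flags saying whether that unit actually operated (the criterion is up to the
    caller). min_std is a hypothetical floor that avoids dividing by zero.
    """
    active = [v for v, ok in zip(values, functional) if ok]  # drop non-functioning units
    mean = statistics.fmean(active)
    std = statistics.pstdev(active, mu=mean)
    if feature == "crashes" and std > mean:
        std = mean  # cap the crash deviation at the average
    return mean, max(std, min_std)
```

For crash counts such as [0, 1, 10] the raw deviation (about 4.5) exceeds the mean (about 3.7), so the deviation is clipped to the mean. This keeps one very bad controller from flattening the differences among the rest.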
Training
• Competitive environment; 3 human-programmed controllers
• Random obstacles, prey, initial positions, and environment variables generated for each generation
• Random variables are the same for each chromosome
Training Variables
Variable | Min. Value | Max. Value
Number of Obstacles | 3 | 10
Area Coverage per Obstacle | 5% | 10%
Ear Length | 1 | 10
Hearing Range | 20 | 200
Number of Prey | 1 | 4
Prey Length | 1 | 10
Prey Sleep Time | 100 | 1000
Variables are set at the beginning of each generation and maintained for all chromosomes.
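Drawing one environment per generation and sharing it across the population could look like this sketch. The ranges are copied from the table above. Sampling integer-valued ranges as integers, the seeding scheme, and the `simulate` callback are assumptions.

```python
import random

# (min, max) for each variable, copied from the Training Variables table.
# Integer ranges are sampled as integers, float ranges uniformly (an assumption).
RANGES = {
    "num_obstacles": (3, 10),
    "coverage_per_obstacle": (0.05, 0.10),
    "ear_length": (1, 10),
    "hearing_range": (20, 200),
    "num_prey": (1, 4),
    "prey_length": (1, 10),
    "prey_sleep_time": (100, 1000),
}


def sample_environment(generation, base_seed=0):
    """Draw one set of environment variables for a generation (reproducible)."""
    rng = random.Random(base_seed * 1_000_003 + generation)
    return {
        name: rng.uniform(lo, hi) if isinstance(lo, float) else rng.randint(lo, hi)
        for name, (lo, hi) in RANGES.items()
    }


def evaluate_generation(population, generation, simulate):
    """Score every chromosome against the same environment."""
    env = sample_environment(generation)  # drawn once, shared by all chromosomes
    return [simulate(chromosome, env) for chromosome in population]
```

Sharing one environment per generation makes the within-generation comparison fair, while changing it between generations pushes the GA toward controllers that generalize.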
GA Settings for Final Controller
Variable | Setting
Hidden Nodes | 4
Generations | 1000
Population Size | 100
Crossover Rate | 100%
Crossover Points | 1
Mutation Rate | 0.001
Elite Percentage | 30%
Consumption Weight | 50
Touch Weight | 20
Crash Weight | 75
Distance Weight | 25
Coverage Weight | 100
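A single generational step using the settings above (30% elites, single-point crossover applied to every pair, per-bit mutation rate 0.001) might look like this sketch. The slides do not name the selection scheme, so fitness-proportional (roulette-wheel) selection via `random.choices` is an assumption. Copying elites without mutation is another assumption.

```python
import random

ELITE_FRACTION = 0.30
CROSSOVER_RATE = 1.0
MUTATION_RATE = 0.001  # per bit


def mutate(bits, rng):
    """Flip each bit independently with probability MUTATION_RATE."""
    return "".join(
        ("1" if b == "0" else "0") if rng.random() < MUTATION_RATE else b for b in bits
    )


def next_generation(population, fitnesses, rng):
    """population: equal-length bit strings; fitnesses: positive scores in the same order."""
    ranked = sorted(zip(fitnesses, population), key=lambda t: t[0], reverse=True)
    n_elite = round(ELITE_FRACTION * len(population))
    children = [chrom for _, chrom in ranked[:n_elite]]  # elites survive unchanged
    while len(children) < len(population):
        # Assumption: fitness-proportional (roulette-wheel) parent selection.
        a, b = rng.choices(population, weights=fitnesses, k=2)
        if rng.random() < CROSSOVER_RATE:
            cut = rng.randrange(1, len(a))  # single crossover point
            a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
        for child in (a, b):
            if len(children) < len(population):
                children.append(mutate(child, rng))
    return children
```

With a population of 100 and 1024-bit chromosomes, a 0.001 per-bit rate flips about one bit per child on average.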
Final Selection
• For testing purposes, the final controller was selected by hand based on objective performance in pre-defined environments
• The web demonstration uses the controller with the best sum of fitnesses among the top 20 controllers over 10 different environments
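The web-demonstration rule can be sketched in a few lines. How the "top 20" are ranked is not stated, so this sketch assumes training fitness. `evaluate(controller, env)` is a hypothetical callback that runs one simulation and returns a fitness.

```python
def pick_final(candidates, environments, evaluate, top_k=20):
    """Choose the controller with the best summed fitness over several environments.

    candidates: (training_fitness, controller) pairs; evaluate(controller, env)
    returns a fitness. Both the ranking source and evaluate() are hypothetical.
    """
    top = sorted(candidates, key=lambda c: c[0], reverse=True)[:top_k]
    return max(
        (controller for _, controller in top),
        key=lambda controller: sum(evaluate(controller, env) for env in environments),
    )
```

Re-scoring over 10 environments favors controllers that do well consistently rather than those that benefited from one favorable random environment during training.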
Implementation
• 8-node Beowulf cluster
– PII 400 MHz machines
– Red Hat Linux
– LAM version of MPI
• 3 Java interface applets/applications
– Configuration
– Training
– Simulation display
Speed-Up
[Figure: speed-up vs. number of nodes (1 to 8); data labels range from 0.93 to 3.93]
Results-Average Area Coverage
[Figure: coverage (300 to 800) vs. random environment set (0 to 100) for units 1, 2, 3 and the test unit]
Averages using 100 different random simulation settings in 100 different environments.
Results-Average Number of Touches
Averages using 100 different random simulation settings in 100 different environments.
[Figure: number of touches (0 to 700) vs. random environment set (0 to 100) for units 1, 2, 3 and the test unit]
Results-Average Distance
The test unit covers more area but travels less distance, indicating a slower speed and better energy conservation.
[Figure: distance (1000 to 1400) vs. random environment set (0 to 100) for units 1, 2, 3 and the test unit]
Results-Coverage vs. Prey Sleep Time
The final controller is less affected by variations in prey sleep time. (Sleep time incremented over 100 iterations; results averaged from 100 random environments at each setting.)
[Figure: coverage (300 to 800) vs. prey sleep time (100 to 1000) for units 1, 2, 3 and the test unit]
Results-Crashes vs. Hearing Range
The final controller performs poorly with respect to crashes as the hearing range is increased. (Hearing range incremented over 100 iterations; results averaged from 100 random environments at each setting.)
[Figure: crashes (0 to 14) vs. hearing range (20 to 200) for units 1, 2, 3 and the test unit]
Results-Crashes vs. Noise Bias
The final controller performs adequately in the presence of noise. (Noise incremented over 100 iterations; results averaged from 100 random environments at each setting.)
[Figure: crashes (0 to 120) vs. noise bias (0 to 100) for units 1, 2, 3 and the test unit]
Results-Coverage for Robots Trained with and without Noise
Controllers trained in the presence of noise are less susceptible to its effects, but do not reach peak performance. (Noise incremented over 100 iterations; results averaged from 100 random environments at each setting.)
[Figure: coverage (0 to 800) vs. noise bias (0 to 100) for the test unit and the noise-trained unit]
Conclusions
• The resulting controller surpassed those produced by humans in the areas of coverage and energy conservation
• All stages of the GA's progression must be taken into account in the fitness function to achieve acceptable results
– Operating point initialization guarantees functioning nodes during early generations and appears to increase GA performance
– Relative scoring functions appear to provide good selection pressure over the life of the GA