Post on 20-Jul-2020
Distributed Intelligent Systems – W11: Machine-Learning Methods Applied to Distributed Robotic Systems
Outline
• Expensive optimization problems
– Noise resistance
– Evaluation time
• Challenges in multi-robot scenarios
– Credit assignment problems
– Co-adaptation strategies
• Co-adaptation examples
– Co-learning obstacle avoidance
– Co-evolving coordinated motion
Expensive Optimization and Noise Resistance
Expensive Optimization Problems
Two fundamental reasons make robot control design and optimization expensive in terms of time:
1. Time for evaluating a candidate solution (e.g., tens of seconds) >> time for applying metaheuristic operators (e.g., milliseconds)
2. Noisy performance evaluations disrupt the adaptation process and require multiple evaluations to estimate actual performance
Expensive Optimization Problems
1. Time for evaluating a candidate solution >> time for applying metaheuristic operators
• Example: obstacle avoidance
• Robots need to encounter obstacles to learn to avoid them
• Evaluation span: 20-60 s depending on the size of the arena
• Current processors can execute on the order of 10^11 instructions in that time (e.g., an ARM Cortex-A9 at ~5000 MIPS)
[Di Mario and Martinoli, Robotica, 2014]
Expensive Optimization Problems
2. Noisy performance evaluations disrupt the adaptation process and require multiple evaluations to estimate actual performance
[Figure: fitness distribution vs. number of evaluations]
• Multiple evaluations at the same point in the search space yield different results
• Example: fitness distribution for obstacle avoidance
• Noise sources: sensors, actuators, initial conditions, other robots
• Noise decreases convergence speed and leaves a residual error
[Di Mario and Martinoli, Robotica, 2014]
Reducing Evaluation Time
General recipe: exploit more abstracted, calibrated representations (models and simulation tools)
See also the multi-level modeling lectures
Dealing with Noisy Evaluations
• Better information about a candidate solution can be obtained by combining multiple noisy evaluations
• We could systematically evaluate each candidate solution a fixed number of times → not efficient from a computational perspective
• We want to dedicate more computational time to evaluating promising solutions and to eliminate the “lucky” ones as quickly as possible
• Idea: re-evaluate and aggregate → each candidate solution might have been evaluated a different number of times → compare the aggregated values
• In GA, good and robust candidate solutions survive over generations; in PSO they survive in the individual memory
• Use dedicated functions for aggregating multiple evaluations: e.g., minimum and average, or generalized aggregation functions (e.g., quasi-linear weighted means), perhaps combined with a statistical test for comparing the resulting aggregated performances
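The re-evaluate-and-aggregate idea can be sketched as follows. This is a minimal illustration, not the exact scheme of the cited papers: the toy objective, the noise level, and the equal weighting of minimum and average are all assumptions.

```python
import random
import statistics

def noisy_fitness(x, sigma=0.1):
    """Toy noisy evaluation: true fitness plus Gaussian noise (an illustrative
    stand-in for an expensive robot evaluation)."""
    true_fitness = -sum(v * v for v in x)   # maximized; optimum at the origin
    return true_fitness + random.gauss(0.0, sigma)

def aggregate(samples, w=0.5):
    """Conservative aggregation mixing minimum and average, as suggested on the
    slide; the weight w is an assumption, not a value from the lecture."""
    return w * min(samples) + (1.0 - w) * statistics.mean(samples)

def compare(cand_a, cand_b, history):
    """Re-evaluate both candidates once, append to their (possibly different-length)
    evaluation histories, and compare aggregated values."""
    for c in (cand_a, cand_b):
        history.setdefault(c, []).append(noisy_fitness(c))
    better = aggregate(history[cand_a]) >= aggregate(history[cand_b])
    return cand_a if better else cand_b
```

Calling `compare` repeatedly on surviving candidates grows their histories, so solutions that stay in the pool accumulate more evaluations, exactly the asymmetry the slide argues for.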
[Figure: GA and PSO]
Testing Noise-Resistant Algorithms on Benchmarks
• Benchmark 1: Sphere and Generalized Rosenbrock functions
– 30 real parameters [Pugh et al., SIS 2005]
– 24 real parameters [Di Mario et al., CEC 2014]
– Minimize objective function
– Expensive only because of noise
• Benchmark 2: obstacle avoidance on a robot
– 24 real parameters
– Maximize objective function
– Expensive because of noise and evaluation time
Biased results!
Benchmark 1: Gaussian Additive Noise on Generalized Rosenbrock
Fair test: same number of candidate-solution evaluations for all algorithms (i.e., N generations/iterations of the standard versions compared with N/2 of the noise-resistant ones)
[Pugh et al., SIS 2005]
Biased results: low number of runs (20), and population size (20) < search dimension (30)!
Benchmark 1: Functions
• Sphere: f(x) = Σi xi²
• Rosenbrock: f(x) = Σi [100 (xi+1 − xi²)² + (1 − xi)²]
• Normalized and bounded to [0, 1]
• Gaussian noise model
• Bernoulli noise model
[Figure: surface plots of the Sphere and Rosenbrock functions]
[Di Mario et al., CEC 2014]
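The two benchmark functions can be written down directly; the noise-model signatures below (a σ for the Gaussian model, a ± amplitude for the Bernoulli model) are plausible readings of the slide, not the exact parameterization used in [Di Mario et al., CEC 2014].

```python
import random

def sphere(x):
    """Sphere benchmark: f(x) = sum(x_i^2); global minimum 0 at the origin."""
    return sum(v * v for v in x)

def rosenbrock(x):
    """Generalized Rosenbrock; global minimum 0 at (1, ..., 1)."""
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
               for i in range(len(x) - 1))

def gaussian_noise(f, sigma):
    """Additive Gaussian noise model: wrap objective f with N(0, sigma) noise."""
    return lambda x: f(x) + random.gauss(0.0, sigma)

def bernoulli_noise(f, amplitude, p=0.5):
    """Bernoulli noise model: add +amplitude with probability p, else -amplitude
    (signature is an assumption)."""
    return lambda x: f(x) + (amplitude if random.random() < p else -amplitude)
```

Wrapping the deterministic benchmarks this way makes it cheap to study how a search algorithm degrades as σ or the Bernoulli amplitude grows.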
Rosenbrock with Gaussian Noise: Increasing σ
[Figure: results for σ = 0, σ = 0.01, σ = 0.05, σ = 0.1]
[Di Mario et al., CEC 2014]
Increasing Population Size Does Not Help
[Di Mario et al., CEC 2014]
Bernoulli Noise: Positive and Negative Amplitudes
Benchmark 2: Obstacle Avoidance on a Mobile Robot
• Similar to [Floreano and Mondada 1996]
– Discrete-time, single-layer, recurrent artificial neural network controller
– Shaping of neural weights and biases (24 real parameters)
– Fitness function rewards speed, straight movement, and obstacle avoidance: f = V (1 − √Δv) (1 − i), where V = average wheel speed, Δv = difference between wheel speeds, i = value of the most active proximity sensor
• Different from [Floreano and Mondada 1996]
– Environment: bounded open space of 2x2 m instead of a maze
[Pugh J., EPFL PhD Thesis No. 4256, 2008]
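The fitness of [Floreano and Mondada 1996] rewards high speed (large V), straight motion (small Δv), and staying away from obstacles (small i). A direct transcription, assuming all three quantities are normalized to [0, 1]:

```python
import math

def obstacle_avoidance_fitness(v_left, v_right, max_proximity):
    """f = V * (1 - sqrt(dv)) * (1 - i), with normalized wheel speeds and the
    most active proximity sensor reading, all in [0, 1]."""
    V = (abs(v_left) + abs(v_right)) / 2.0   # average absolute wheel speed
    dv = abs(v_left - v_right)               # penalizes spinning/turning
    i = max_proximity                        # penalizes proximity to obstacles
    return V * (1.0 - math.sqrt(dv)) * (1.0 - i)
```

The multiplicative form means any one bad behavior (standing still, spinning in place, or hugging a wall) drives the fitness toward zero.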
Baseline Experiment: Extended-Time Adaptation
• Compare the basic algorithms with their corresponding noise-resistant versions
• Population size 100, 100 iterations, evaluation span 300 s (150 s for noise-resistant algorithms) → 34.7 days
• Fair test: same total evaluation time for all algorithms
• Realistic simulation (Webots)
• Best evolved solutions averaged over 30 runs
• Best candidate solution in the final pool selected based on 5 runs of 30 s each; performance tested over 40 runs of 30 s each
• Similar performance for all algorithms
[Pugh J., EPFL PhD Thesis No. 4256, 2008]
Where Can Noise-Resistant Algorithms Make the Difference?
• Limited adaptation time
• Hybrid adaptation (simulation/hardware in the loop)
• Large amount of noise
Notes:
• all examples from shaping obstacle avoidance behavior
• best learned/evolved solution averaged over multiple runs
• fair tests: same total amount of evaluation time for all the different algorithms (standard and noise-resistant)
Limited-Time Adaptation Trade-Offs
• 1 robot, 24 parameters
• Total adaptation time = 8.3 hours (1/100 of the previous learning time)
• Trade-offs: population size, number of iterations, evaluation span
• Realistic simulation (Webots)
[Figure: varying population size vs. number of iterations; annotations: “good with small populations”, “no advantage”]
[Pugh J., EPFL PhD Thesis No. 4256, 2008]
Hybrid Adaptation with Real Robots
• Move from realistic simulation (Webots) to real robots after 90% of the learning (even faster evolution)
• Compromise between time and accuracy
• Noise resistance helps manage the transition
[Pugh J., EPFL PhD Thesis No. 4256, 2008]
Hybrid Adaptation vs. Only Real Robots
• Noise-resistant PSO
• Hybrid: 30 iterations in simulation, then 30 iterations on real robots
• Achieves similar fitness to running 60 iterations on real robots
• Requires half the real-robot evaluation time
[Di Mario and Martinoli, Robotica, 2014]
Increasing Noise Level – Set-Up
• Scenario 1: one robot learning obstacle avoidance
• Scenario 2: one robot learning obstacle avoidance, one robot running pre-evolved obstacle avoidance
• Scenario 3: two robots co-learning obstacle avoidance
Idea: more robots means more noise (as perceived by an individual robot) because there is no explicit communication between the robots; in scenario 3, however, there is information sharing through the population manager.
1x1 m arena, PSO, 50th iteration, scenario 3
[Pugh et al., SIS 2005]
Increasing Noise Level – Sample Results
[Pugh et al., SIS 2005]
Why Do Noise-Resistant Algorithms Make the Difference?
Standard PSO vs. a-posteriori evaluations
[Figure legend: “PSO gbest”, “avg of 1000 eval”]
[Di Mario et al., CEC 2014]
Why Do Noise-Resistant Algorithms Make the Difference?
Noise-resistant PSO vs. a-posteriori evaluations
[Figure legend: “PSO gbest”, “avg of 1000 eval”]
[Di Mario et al., CEC 2014]
From Single to Multi-Unit Systems: Co-Adaptation in a Shared World
Adaptation in Multi-Robot Scenarios
• Collective: fitness becomes noisy due to partial perception and independent parallel actions
Credit Assignment Problem
With limited communication, no communication at all, or partial perception:
• A robot cannot distinguish the environmental modifications caused by its own actions from those generated by others.
• Punishments and rewards are likely to be inconsistent.
Co-Adaptation in a Collaborative Framework
Co-Shaping Collaborative Behavior
Three orthogonal axes to consider (extremes and balanced solutions are possible):
1. Performance evaluation: individual vs. group fitness or reinforcement
2. Solution sharing: private vs. public policies
3. Team diversity: homogeneous (identical controller and hardware) vs. heterogeneous learning
Policy Performance Sharing Diversity
i-pr-he individual private heterogeneous
i-pr-ho individual private homogeneous
i-pu-he individual public heterogeneous
i-pu-ho individual public homogeneous
g-pr-he group private heterogeneous
g-pr-ho group private homogeneous
g-pu-he group public heterogeneous
g-pu-ho group public homogeneous
Color coding on the slide marks each combination as one of: do not make sense (inconsistent), interesting (consistent), or possible but not scalable.
Search Algorithms for Multi-Robot Systems
Population-Based Search Algorithms for Multi-Robot Systems
Example of collaborative co-learning with binary encoding of 100 candidate solutions and 2 robots
Stick-Pulling Case Study: Homogeneous Learning
• See W10 lecture
• Optimization of a single GTP for the whole team
[Diagram annotations: “viable for exploring heterogeneous solutions”, “not scalable”, “heterogeneity allowed but eventually roughly homogeneous solution via shuffling of candidate solutions”, “homogeneity enforced”]

Stick-Pulling Case Study: Heterogeneous Learning
• See W10 lecture
• Learning to specialize the team members (multiple GTPs)
Co-Adaptation for Obstacle Avoidance

Population-Based Search Algorithms for Multi-Robot Systems
Distributed Robotic Adaptation
• Standard approach: evaluate candidate solutions on the robots but centralize the population manager
• New approach: distribute the population manager on the robots as well, sharing population management through communication channels
• Currently: synchronization at the end of an iteration/generation
• Why PSO:
– appears interesting since this metaheuristic works well with small pools of candidate solutions: candidate pool size ≈ robot team size
– limited particle neighborhood sizes → scalable, on-board operation
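The distributed lbest scheme can be sketched as a local simulation: one particle per robot, a synchronized update per iteration, and personal-best exchange restricted to two ring neighbors. In the real system each particle lives on a different robot and the exchange happens over the communication channel; the toy fitness and initialization range here are assumptions (w, pw, nw are the values quoted later in the lecture).

```python
import random

DIM, W, PW, NW = 24, 0.6, 2.0, 2.0       # parameter values from a later slide

def fitness(x):
    """Stand-in for an on-robot evaluation (each robot scores its own candidate)."""
    return -sum(v * v for v in x)         # maximized; optimum at the origin

class Particle:
    """One particle per robot: candidate pool size = robot team size."""
    def __init__(self):
        self.x = [random.uniform(-1.0, 1.0) for _ in range(DIM)]
        self.v = [0.0] * DIM
        self.pbest_x, self.pbest_f = list(self.x), fitness(self.x)

def step(swarm):
    """One synchronized iteration: personal bests are snapshotted, then each robot
    updates using only its two ring neighbors (limited neighborhood -> scalable)."""
    n = len(swarm)
    pbests = [(p.pbest_f, p.pbest_x) for p in swarm]
    for k, p in enumerate(swarm):
        hood = [pbests[(k - 1) % n], pbests[k], pbests[(k + 1) % n]]
        lbest = max(hood, key=lambda t: t[0])[1]
        for d in range(DIM):
            p.v[d] = (W * p.v[d]
                      + PW * random.random() * (p.pbest_x[d] - p.x[d])
                      + NW * random.random() * (lbest[d] - p.x[d]))
            p.x[d] += p.v[d]
        f = fitness(p.x)
        if f > p.pbest_f:
            p.pbest_x, p.pbest_f = list(p.x), f

random.seed(1)
swarm = [Particle() for _ in range(10)]   # swarm size 10, one particle per robot
for _ in range(50):
    step(swarm)
```

Because each robot only needs its two neighbors' personal bests, communication cost per iteration is constant in the team size, which is the scalability argument made above.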
Varying the Robotic Group Size
• Same control architecture as [Floreano & Mondada, 1996] (ANN, 24 weights to tune; the Khepera III has 9 proximity sensors)
• Same fitness function as [Floreano & Mondada, 1996]
• Similar Webots world to [Pugh et al., 2005] but 3x3 m
• Robot group size: 1, 2, 5, 10
• PSO parameters:
– Swarm size: 10
– pw = nw = 2.0
– w = 0.6
[Pugh and Martinoli, Swarm Intelligence J., 2009]
Varying the Robotic Group Size – Learning vs. Testing Environment
• Gradually increase the number of robots in the team
• Up to 10x faster learning with little performance loss
• Arena: 3x3 m
[Figure legend: learned in a group of 10 robots (10x faster), final evaluation as a single robot; learned as a single robot, final evaluation as a single robot]
[Pugh and Martinoli, Swarm Intelligence J., 2009]
Distributed Adaptation with Real Robots (Pugh, 2008)
[Video: before adaptation (5x speed-up)]
[Video: after adaptation (5x speed-up)]
Increasing Number of Robots: Impact of Noise Resistance
• Webots experiments
• 1x1 m arena (high density!)
• Fair test: same amount of total evaluation time for each bar
• Performance decreases with the number of robots (more difficult to avoid collisions in overcrowded arenas)
• Noise resistance makes the difference in high-density (i.e., noisier) scenarios
[Di Mario and Martinoli, Robotica, 2014]
Impact of Limited-Time Adaptation
• Webots experiments
• 1x1 m arena (high density!)
• Full-time adaptation: 417 h
• Limited-time adaptation: 8 h
• 52 times smaller evaluation time, at most a 17% drop in performance
• Same obstacle avoidance strategy
Recipe:
1. Evaluation span includes at least 1 interaction
2. Swarm size = dimension of the parameter space
3. Use noise-resistant algorithms
4. Dedicate the maximum time budget to iterations
[Di Mario and Martinoli, Robotica, 2014]
Co-Adaptation for Coordinated Motion
The SWARM-BOTS project (2001-2005)
http://www.swarm-bots.org
We call “swarm-bot” an artifact composed of a number of simpler robots, called “s-bots”, capable of self-assembling and self-organizing to adapt to its environment. S-bots can connect to and disconnect from each other to self-assemble and form structures when needed, and disband at will.
The coordinated motion task
• Four s-bots are connected in a swarm-bot formation
• Their chassis are randomly oriented
• The s-bots should be able to:
– collectively choose a direction of motion
– move as far as possible
Coordinated motion: The traction sensor
• Connected s-bots apply pulling/pushing forces to each other when moving
• Each s-bot can measure the traction force acting on its turret/chassis connection
• The traction force indicates the mismatch between:
– the average direction of motion of the group
– the desired direction of motion of the single s-bot
• Simple perceptrons are evolved as controllers (4 traction-sensor inputs, 2 motor outputs)
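A controller of this shape maps the four traction readings to the two track motors. This is a sketch only: tanh is an assumed activation, and the original work's exact input encoding and activation may differ.

```python
import math

def perceptron_controller(traction, weights, biases):
    """Single-layer perceptron: 4 traction-sensor inputs -> 2 motor outputs.
    weights is a 2x4 matrix, biases a length-2 vector; outputs lie in (-1, 1)."""
    assert len(traction) == 4 and len(weights) == 2 and len(biases) == 2
    return [math.tanh(sum(w * t for w, t in zip(weights[m], traction)) + biases[m])
            for m in range(2)]
```

With 2x4 weights plus 2 biases, this controller has 10 real parameters, each of which the evolutionary algorithm on the next slides encodes with 8 bits.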
Coordinated motion: The evolutionary algorithm
• Binary-encoded genotype
– 8 bits per real-valued parameter of the neural controllers
• Generational evolutionary algorithm
– 100 individuals evolved for 100 generations
– The 20 best individuals are allowed to reproduce in each generation
– Mutation (3% per bit) is applied to the offspring
• The perceptron is cloned and downloaded to each s-bot
• Fitness is evaluated by looking at the swarm-bot performance
– Each individual is evaluated with equal starting conditions
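The generational loop above can be sketched as follows. The decode range, the number of parameters, the toy objective, and the noise level are all assumptions, and the real evaluation (noisy swarm-bot simulation trials, averaged) is replaced by a cheap stand-in.

```python
import random

BITS, N_PARAMS = 8, 10                  # 8 bits per parameter; N_PARAMS illustrative
POP, GENS, ELITE, P_MUT = 100, 100, 20, 0.03

def decode(genome):
    """Map each 8-bit chunk to a real parameter in [-1, 1] (range is an assumption)."""
    params = []
    for i in range(0, len(genome), BITS):
        val = int("".join(map(str, genome[i:i + BITS])), 2)
        params.append(-1.0 + 2.0 * val / (2 ** BITS - 1))
    return params

def fitness(genome):
    """Stand-in for the swarm-bot evaluation: average of 5 noisy trials of a toy
    objective (maximized, optimum at all-zero parameters)."""
    true_f = -sum(p * p for p in decode(genome))
    return sum(true_f + random.gauss(0.0, 0.01) for _ in range(5)) / 5

def evolve():
    """Generational EA: rank, keep the 20 best, rebuild the population from
    mutated copies of the elite (3% per-bit mutation, no crossover)."""
    pop = [[random.randint(0, 1) for _ in range(BITS * N_PARAMS)]
           for _ in range(POP)]
    for _ in range(GENS):
        elite = sorted(pop, key=fitness, reverse=True)[:ELITE]
        pop = [[(1 - b) if random.random() < P_MUT else b
                for b in random.choice(elite)]
               for _ in range(POP)]
    return max(pop, key=fitness)
```

Note that ranking on a single noisy `fitness` call per individual is exactly the non-noise-resistant baseline discussed earlier in the lecture; averaging inside `fitness` (here, 5 trials) is what the fitness-evaluation slide prescribes.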
Population-Based Search Algorithms for Multi-Robot Systems
Coordinated motion: Fitness evaluation
• The fitness F of a genotype is given by the normalized distance covered by the group:

F = ||X(150) − X(0)|| / D

where X(t) is the coordinate vector of the center of mass at time t, and D is the maximum distance that can be covered in 150 simulation cycles
• Fitness is evaluated 5 times (a fixed number per candidate solution!), starting from different random initializations
• The resulting average is assigned to the genotype
Coordinated motion: Results

Replication   Performance
1             0.87888
2             0.83959
3             0.88338
4             0.71567
5             0.79573
6             0.75209
7             0.83425
8             0.85848
9             0.87222
10            0.76111

[Figures: average fitness over generations; post-evaluation]
Coordinated motion: Real s-bots
[Videos: default (used for evolution); flexibility]
Coordinated motion: Scalability
[Videos: scalability; flexibility and scalability]
Conclusion
Take Home Messages
• Machine-learning techniques (population-based and hill-climbing algorithms) can be used for the design and optimization of software and hardware features in multi-robot settings
• The cost of an optimization problem is heavily influenced by the amount of noise in the evaluation function, the time needed to evaluate a candidate solution, and the dimension of the parameter space
• Collaborative co-adaptation strategies can be differentiated along three axes: public/private solutions, homogeneous/heterogeneous systems, individual/group performance
• Multi-robot platforms can be exploited for testing multiple candidate solutions in parallel
• One way to bypass the credit assignment problem in multi-robot contexts is to enforce homogeneity and reward group performance
• PSO appears to be well suited for fully distributed on-board operation and fairly robust to small pools of candidate solutions
Additional Literature – Week 11
Books
• T. Balch and L. E. Parker (Eds.), "Robot Teams: From Diversity to Polymorphism". Natick, MA: A K Peters, 2002.
Papers
• Zhang Y., Antonsson E. K., and Martinoli A., "Evolutionary Engineering Design Synthesis of On-Board Traffic Monitoring Sensors". Research in Engineering Design, 19(2-3): 113-125, 2008.
• Pugh J. and Martinoli A., "Multi-Robot Learning with Particle Swarm Optimization". Proc. of the Fifth ACM Int. Joint Conf. on Autonomous Agents and Multi-Agent Systems, Hakodate, Japan, May 2006, pp. 441-448.
• Dorigo M., Trianni V., Şahin E., Groß R., Labella T., Nolfi S., Baldassarre G., Deneubourg J.-L., Mondada F., Floreano D., and Gambardella L., "Evolving Self-Organising Behaviours for a Swarm-bot". Autonomous Robots, 17: 223-245, 2004.
• Murciano A. and Millán J. del R., "Specialization in Multi-Agent Systems Through Learning". Biological Cybernetics, 76: 375-382, 1997.
• Matarić M. J., "Learning in Behavior-Based Multi-Robot Systems: Policies, Models, and Other Agents". Cognitive Systems Research, Special Issue on Multi-disciplinary Studies of Multi-agent Learning (Ron Sun, ed.), 2(1): 81-93, 2001.