Post on 20-Jul-2020
Distributed Intelligent Systems – W11: Machine-Learning Methods Applied to Distributed Robotic Systems
Outline
• Expensive optimization problems
– Noise resistance
– Evaluation time
• Challenges in multi-robot scenarios
– Credit assignment problems
– Co-adaptation strategies
• Co-adaptation examples
– Co-learning obstacle avoidance
– Co-evolving coordinated motion
Expensive Optimization and Noise Resistance
Expensive Optimization Problems
Two fundamental reasons make robot control design and optimization expensive in terms of time:
1. Time for evaluating a candidate solution (e.g., tens of seconds) >> time for applying metaheuristic operators (e.g., milliseconds)
2. Noisy performance evaluations disrupt the adaptation process and require multiple evaluations to estimate actual performance
Expensive Optimization Problems
1. Time for evaluating a candidate solution >> time for applying metaheuristic operators
• Example: obstacle avoidance
• Robots need to encounter obstacles to learn to avoid them
• Evaluation span: 20-60 s depending on the size of the arena
• Current processors can execute on the order of 10^11 instructions in that time (e.g., an ARM Cortex-A9 at ~5000 MIPS)
[Di Mario and Martinoli, Robotica, 2014]
Expensive Optimization Problems
2. Noisy performance evaluations disrupt the adaptation process and require multiple evaluations to estimate actual performance
[Figure: fitness distribution vs. number of evaluations]
• Multiple evaluations at the same point in the search space yield different results
• Example: fitness distribution for obstacle avoidance
• Noise sources: sensors, actuators, initial conditions, other robots
• Noise decreases convergence speed and leaves a residual error
[Di Mario and Martinoli, Robotica, 2014]
Reducing Evaluation Time
General recipe: exploit more abstracted, calibrated representations (models and simulation tools)
See also the multi-level modeling lectures
Dealing with Noisy Evaluations
• Better information about a candidate solution can be obtained by combining multiple noisy evaluations
• We could systematically evaluate each candidate solution a fixed number of times → not efficient from a computational perspective
• We want to dedicate more computational time to evaluating promising solutions and to eliminate the “lucky” ones as quickly as possible
• Idea: re-evaluate and aggregate → each candidate solution might have been evaluated a different number of times → compare the aggregated values
• In GA, good and robust candidate solutions survive over generations; in PSO they survive in the individual memory
• Use dedicated functions for aggregating multiple evaluations: e.g., minimum and average, or generalized aggregation functions (e.g., quasi-linear weighted means), perhaps combined with a statistical test for comparing the resulting aggregated performances
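The re-evaluate-and-aggregate idea can be sketched as follows. This is a minimal illustration, not the exact scheme of the cited papers: the toy objective, the noise level, and the equal weighting of minimum and average are all assumptions.

```python
import random
import statistics

def noisy_fitness(x, sigma=0.1):
    """Toy noisy evaluation: true fitness plus Gaussian noise (an illustrative
    stand-in for an expensive robot evaluation)."""
    true_fitness = -sum(v * v for v in x)   # maximized; optimum at the origin
    return true_fitness + random.gauss(0.0, sigma)

def aggregate(samples, w=0.5):
    """Conservative aggregation mixing minimum and average, as suggested on the
    slide; the weight w is an assumption, not a value from the lecture."""
    return w * min(samples) + (1.0 - w) * statistics.mean(samples)

def compare(cand_a, cand_b, history):
    """Re-evaluate both candidates once, append to their (possibly different-length)
    evaluation histories, and compare aggregated values."""
    for c in (cand_a, cand_b):
        history.setdefault(c, []).append(noisy_fitness(c))
    better = aggregate(history[cand_a]) >= aggregate(history[cand_b])
    return cand_a if better else cand_b
```

Calling `compare` repeatedly on surviving candidates grows their histories, so solutions that stay in the pool accumulate more evaluations, exactly the asymmetry the slide argues for.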
[Figure: GA and PSO]
Testing Noise-Resistant Algorithms on Benchmarks
• Benchmark 1: Sphere and Generalized Rosenbrock functions
– 30 real parameters [Pugh et al., SIS 2005]
– 24 real parameters [Di Mario et al., CEC 2014]
– Minimize objective function
– Expensive only because of noise
• Benchmark 2: obstacle avoidance on a robot
– 24 real parameters
– Maximize objective function
– Expensive because of noise and evaluation time
Biased results!
Benchmark 1: Gaussian Additive Noise on Generalized Rosenbrock
Fair test: same number of candidate-solution evaluations for all algorithms (i.e., N generations/iterations of the standard versions compared with N/2 of the noise-resistant ones)
[Pugh et al., SIS 2005]
Biased results: low number of runs (20), and population size (20) < search dimension (30)!
Benchmark 1: Functions
• Sphere: f(x) = Σi xi²
• Rosenbrock: f(x) = Σi [100 (xi+1 − xi²)² + (1 − xi)²]
• Normalized and bounded to [0, 1]
• Gaussian noise model
• Bernoulli noise model
[Figure: surface plots of the Sphere and Rosenbrock functions]
[Di Mario et al., CEC 2014]
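The two benchmark functions can be written down directly; the noise-model signatures below (a σ for the Gaussian model, a ± amplitude for the Bernoulli model) are plausible readings of the slide, not the exact parameterization used in [Di Mario et al., CEC 2014].

```python
import random

def sphere(x):
    """Sphere benchmark: f(x) = sum(x_i^2); global minimum 0 at the origin."""
    return sum(v * v for v in x)

def rosenbrock(x):
    """Generalized Rosenbrock; global minimum 0 at (1, ..., 1)."""
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
               for i in range(len(x) - 1))

def gaussian_noise(f, sigma):
    """Additive Gaussian noise model: wrap objective f with N(0, sigma) noise."""
    return lambda x: f(x) + random.gauss(0.0, sigma)

def bernoulli_noise(f, amplitude, p=0.5):
    """Bernoulli noise model: add +amplitude with probability p, else -amplitude
    (signature is an assumption)."""
    return lambda x: f(x) + (amplitude if random.random() < p else -amplitude)
```

Wrapping the deterministic benchmarks this way makes it cheap to study how a search algorithm degrades as σ or the Bernoulli amplitude grows.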
Rosenbrock with Gaussian Noise: Increasing σ
[Figure: results for σ = 0, σ = 0.01, σ = 0.05, σ = 0.1]
[Di Mario et al., CEC 2014]
Increasing Population Size Does Not Help
[Di Mario et al., CEC 2014]
Bernoulli Noise: Positive and Negative Amplitudes
Benchmark 2: Obstacle Avoidance on a Mobile Robot
• Similar to [Floreano and Mondada 1996]
– Discrete-time, single-layer, recurrent artificial neural network controller
– Shaping of neural weights and biases (24 real parameters)
– Fitness function rewards speed, straight movement, and obstacle avoidance: f = V (1 − √Δv) (1 − i), where V = average wheel speed, Δv = difference between wheel speeds, i = value of the most active proximity sensor
• Different from [Floreano and Mondada 1996]
– Environment: bounded open space of 2x2 m instead of a maze
[Pugh J., EPFL PhD Thesis No. 4256, 2008]
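The fitness of [Floreano and Mondada 1996] rewards high speed (large V), straight motion (small Δv), and staying away from obstacles (small i). A direct transcription, assuming all three quantities are normalized to [0, 1]:

```python
import math

def obstacle_avoidance_fitness(v_left, v_right, max_proximity):
    """f = V * (1 - sqrt(dv)) * (1 - i), with normalized wheel speeds and the
    most active proximity sensor reading, all in [0, 1]."""
    V = (abs(v_left) + abs(v_right)) / 2.0   # average absolute wheel speed
    dv = abs(v_left - v_right)               # penalizes spinning/turning
    i = max_proximity                        # penalizes proximity to obstacles
    return V * (1.0 - math.sqrt(dv)) * (1.0 - i)
```

The multiplicative form means any one bad behavior (standing still, spinning in place, or hugging a wall) drives the fitness toward zero.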
Baseline Experiment: Extended-Time Adaptation
• Compare the basic algorithms with their corresponding noise-resistant versions
• Population size 100, 100 iterations, evaluation span 300 s (150 s for noise-resistant algorithms) → 34.7 days
• Fair test: same total evaluation time for all algorithms
• Realistic simulation (Webots)
• Best evolved solutions averaged over 30 runs
• Best candidate solution in the final pool selected based on 5 runs of 30 s each; performance tested over 40 runs of 30 s each
• Similar performance for all algorithms
[Pugh J., EPFL PhD Thesis No. 4256, 2008]
Where Can Noise-Resistant Algorithms Make the Difference?
• Limited adaptation time
• Hybrid adaptation (simulation/hardware in the loop)
• Large amount of noise
Notes:
• all examples from shaping obstacle avoidance behavior
• best learned/evolved solution averaged over multiple runs
• fair tests: same total amount of evaluation time for all the different algorithms (standard and noise-resistant)
Limited-Time Adaptation Trade-Offs
• 1 robot, 24 parameters
• Total adaptation time = 8.3 hours (1/100 of the previous learning time)
• Trade-offs: population size, number of iterations, evaluation span
• Realistic simulation (Webots)
[Figure: varying population size vs. number of iterations; annotations: “good with small populations”, “no advantage”]
[Pugh J., EPFL PhD Thesis No. 4256, 2008]
Hybrid Adaptation with Real Robots
• Move from realistic simulation (Webots) to real robots after 90% of the learning (even faster evolution)
• Compromise between time and accuracy
• Noise resistance helps manage the transition
[Pugh J., EPFL PhD Thesis No. 4256, 2008]
Hybrid Adaptation vs. Only Real Robots
• Noise-resistant PSO
• Hybrid: 30 iterations in simulation, then 30 iterations on real robots
• Achieves similar fitness to running 60 iterations on real robots
• Requires half the real-robot evaluation time
[Di Mario and Martinoli, Robotica, 2014]
Increasing Noise Level – Set-Up
• Scenario 1: one robot learning obstacle avoidance
• Scenario 2: one robot learning obstacle avoidance, one robot running pre-evolved obstacle avoidance
• Scenario 3: two robots co-learning obstacle avoidance
Idea: more robots means more noise (as perceived by an individual robot) because there is no explicit communication between the robots; in scenario 3, however, there is information sharing through the population manager.
1x1 m arena, PSO, 50th iteration, scenario 3
[Pugh et al., SIS 2005]
Increasing Noise Level – Sample Results
[Pugh et al., SIS 2005]
Why Do Noise-Resistant Algorithms Make the Difference?
Standard PSO vs. a-posteriori evaluations
[Figure legend: “PSO gbest”, “avg of 1000 eval”]
[Di Mario et al., CEC 2014]
Why Do Noise-Resistant Algorithms Make the Difference?
Noise-resistant PSO vs. a-posteriori evaluations
[Figure legend: “PSO gbest”, “avg of 1000 eval”]
[Di Mario et al., CEC 2014]
From Single to Multi-Unit Systems: Co-Adaptation in a Shared World
Adaptation in Multi-Robot Scenarios
• Collective: fitness becomes noisy due to partial perception and independent parallel actions
Credit Assignment Problem
With limited communication, no communication at all, or partial perception:
• A robot cannot distinguish the environmental modifications caused by its own actions from those generated by others.
• Punishments and rewards are likely to be inconsistent.
Co-Adaptation in a Collaborative Framework
Co-Shaping Collaborative Behavior
Three orthogonal axes to consider (extremes and balanced solutions are possible):
1. Performance evaluation: individual vs. group fitness or reinforcement
2. Solution sharing: private vs. public policies
3. Team diversity: homogeneous (identical controller and hardware) vs. heterogeneous learning
Policy Performance Sharing Diversity
i-pr-he individual private heterogeneous
i-pr-ho individual private homogeneous
i-pu-he individual public heterogeneous
i-pu-ho individual public homogeneous
g-pr-he group private heterogeneous
g-pr-ho group private homogeneous
g-pu-he group public heterogeneous
g-pu-ho group public homogeneous
Color coding on the slide marks each combination as one of: do not make sense (inconsistent), interesting (consistent), or possible but not scalable.
Search Algorithms for Multi-Robot Systems
Population-Based Search Algorithms for Multi-Robot Systems
Example of collaborative co-learning with binary encoding of 100 candidate solutions and 2 robots
Stick-Pulling Case Study: Homogeneous Learning
• See W10 lecture
• Optimization of a single GTP for the whole team
[Diagram annotations: “viable for exploring heterogeneous solutions”, “not scalable”, “heterogeneity allowed but eventually roughly homogeneous solution via shuffling of candidate solutions”, “homogeneity enforced”]

Stick-Pulling Case Study: Heterogeneous Learning
• See W10 lecture
• Learning to specialize the team members (multiple GTPs)
Co-Adaptation for Obstacle Avoidance

Population-Based Search Algorithms for Multi-Robot Systems
Distributed Robotic Adaptation
• Standard approach: evaluate candidate solutions on the robots but centralize the population manager
• New approach: distribute the population manager on the robots as well, sharing population management through communication channels
• Currently: synchronization at the end of an iteration/generation
• Why PSO:
– appears interesting since this metaheuristic works well with small pools of candidate solutions: candidate pool size ≈ robot team size
– limited particle neighborhood sizes → scalable, on-board operation
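The distributed lbest scheme can be sketched as a local simulation: one particle per robot, a synchronized update per iteration, and personal-best exchange restricted to two ring neighbors. In the real system each particle lives on a different robot and the exchange happens over the communication channel; the toy fitness and initialization range here are assumptions (w, pw, nw are the values quoted later in the lecture).

```python
import random

DIM, W, PW, NW = 24, 0.6, 2.0, 2.0       # parameter values from a later slide

def fitness(x):
    """Stand-in for an on-robot evaluation (each robot scores its own candidate)."""
    return -sum(v * v for v in x)         # maximized; optimum at the origin

class Particle:
    """One particle per robot: candidate pool size = robot team size."""
    def __init__(self):
        self.x = [random.uniform(-1.0, 1.0) for _ in range(DIM)]
        self.v = [0.0] * DIM
        self.pbest_x, self.pbest_f = list(self.x), fitness(self.x)

def step(swarm):
    """One synchronized iteration: personal bests are snapshotted, then each robot
    updates using only its two ring neighbors (limited neighborhood -> scalable)."""
    n = len(swarm)
    pbests = [(p.pbest_f, p.pbest_x) for p in swarm]
    for k, p in enumerate(swarm):
        hood = [pbests[(k - 1) % n], pbests[k], pbests[(k + 1) % n]]
        lbest = max(hood, key=lambda t: t[0])[1]
        for d in range(DIM):
            p.v[d] = (W * p.v[d]
                      + PW * random.random() * (p.pbest_x[d] - p.x[d])
                      + NW * random.random() * (lbest[d] - p.x[d]))
            p.x[d] += p.v[d]
        f = fitness(p.x)
        if f > p.pbest_f:
            p.pbest_x, p.pbest_f = list(p.x), f

random.seed(1)
swarm = [Particle() for _ in range(10)]   # swarm size 10, one particle per robot
for _ in range(50):
    step(swarm)
```

Because each robot only needs its two neighbors' personal bests, communication cost per iteration is constant in the team size, which is the scalability argument made above.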
Varying the Robotic Group Size
• Same control architecture as [Floreano & Mondada, 1996] (ANN, 24 weights to tune; the Khepera III has 9 proximity sensors)
• Same fitness function as [Floreano & Mondada, 1996]
• Similar Webots world to [Pugh et al., 2005] but 3x3 m
• Robot group size: 1, 2, 5, 10
• PSO parameters:
– Swarm size: 10
– pw = nw = 2.0
– w = 0.6
[Pugh and Martinoli, Swarm Intelligence J., 2009]
Varying the Robotic Group Size – Learning vs. Testing Environment
• Gradually increase the number of robots in the team
• Up to 10x faster learning with little performance loss
• Arena: 3x3 m
[Figure legend: learned in a group of 10 robots (10x faster), final evaluation as a single robot; learned as a single robot, final evaluation as a single robot]
[Pugh and Martinoli, Swarm Intelligence J., 2009]
Distributed Adaptation with Real Robots (Pugh, 2008)
[Video: before adaptation (5x speed-up)]
[Video: after adaptation (5x speed-up)]
Increasing Number of Robots: Impact of Noise Resistance
• Webots experiments
• 1x1 m arena (high density!)
• Fair test: same amount of total evaluation time for each bar
• Performance decreases with the number of robots (more difficult to avoid collisions in overcrowded arenas)
• Noise resistance makes the difference in high-density (i.e., noisier) scenarios
[Di Mario and Martinoli, Robotica, 2014]
Impact of Limited-Time Adaptation
• Webots experiments
• 1x1 m arena (high density!)
• Full-time adaptation: 417 h
• Limited-time adaptation: 8 h
• 52 times smaller evaluation time, at most a 17% drop in performance
• Same obstacle avoidance strategy
Recipe:
1. Evaluation span includes at least 1 interaction
2. Swarm size = dimension of the parameter space
3. Use noise-resistant algorithms
4. Dedicate the maximum time budget to iterations
[Di Mario and Martinoli, Robotica, 2014]
Co-Adaptation for Coordinated Motion
The SWARM-BOTS project (2001-2005)
http://www.swarm-bots.org
We call “swarm-bot” an artifact composed of a number of simpler robots, called “s-bots”, capable of self-assembling and self-organizing to adapt to its environment. S-bots can connect to and disconnect from each other to self-assemble and form structures when needed, and disband at will.
The coordinated motion task
• Four s-bots are connected in a swarm-bot formation
• Their chassis are randomly oriented
• The s-bots should be able to:
– collectively choose a direction of motion
– move as far as possible
Coordinated motion: The traction sensor
• Connected s-bots apply pulling/pushing forces to each other when moving
• Each s-bot can measure the traction force acting on its turret/chassis connection
• The traction force indicates the mismatch between:
– the average direction of motion of the group
– the desired direction of motion of the single s-bot
• Simple perceptrons are evolved as controllers (4 traction-sensor inputs, 2 motor outputs)
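A controller of this shape maps the four traction readings to the two track motors. This is a sketch only: tanh is an assumed activation, and the original work's exact input encoding and activation may differ.

```python
import math

def perceptron_controller(traction, weights, biases):
    """Single-layer perceptron: 4 traction-sensor inputs -> 2 motor outputs.
    weights is a 2x4 matrix, biases a length-2 vector; outputs lie in (-1, 1)."""
    assert len(traction) == 4 and len(weights) == 2 and len(biases) == 2
    return [math.tanh(sum(w * t for w, t in zip(weights[m], traction)) + biases[m])
            for m in range(2)]
```

With 2x4 weights plus 2 biases, this controller has 10 real parameters, each of which the evolutionary algorithm on the next slides encodes with 8 bits.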
Coordinated motion: The evolutionary algorithm
• Binary-encoded genotype
– 8 bits per real-valued parameter of the neural controllers
• Generational evolutionary algorithm
– 100 individuals evolved for 100 generations
– The 20 best individuals are allowed to reproduce in each generation
– Mutation (3% per bit) is applied to the offspring
• The perceptron is cloned and downloaded to each s-bot
• Fitness is evaluated by looking at the swarm-bot performance
– Each individual is evaluated with equal starting conditions
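The generational loop above can be sketched as follows. The decode range, the number of parameters, the toy objective, and the noise level are all assumptions, and the real evaluation (noisy swarm-bot simulation trials, averaged) is replaced by a cheap stand-in.

```python
import random

BITS, N_PARAMS = 8, 10                  # 8 bits per parameter; N_PARAMS illustrative
POP, GENS, ELITE, P_MUT = 100, 100, 20, 0.03

def decode(genome):
    """Map each 8-bit chunk to a real parameter in [-1, 1] (range is an assumption)."""
    params = []
    for i in range(0, len(genome), BITS):
        val = int("".join(map(str, genome[i:i + BITS])), 2)
        params.append(-1.0 + 2.0 * val / (2 ** BITS - 1))
    return params

def fitness(genome):
    """Stand-in for the swarm-bot evaluation: average of 5 noisy trials of a toy
    objective (maximized, optimum at all-zero parameters)."""
    true_f = -sum(p * p for p in decode(genome))
    return sum(true_f + random.gauss(0.0, 0.01) for _ in range(5)) / 5

def evolve():
    """Generational EA: rank, keep the 20 best, rebuild the population from
    mutated copies of the elite (3% per-bit mutation, no crossover)."""
    pop = [[random.randint(0, 1) for _ in range(BITS * N_PARAMS)]
           for _ in range(POP)]
    for _ in range(GENS):
        elite = sorted(pop, key=fitness, reverse=True)[:ELITE]
        pop = [[(1 - b) if random.random() < P_MUT else b
                for b in random.choice(elite)]
               for _ in range(POP)]
    return max(pop, key=fitness)
```

Note that ranking on a single noisy `fitness` call per individual is exactly the non-noise-resistant baseline discussed earlier in the lecture; averaging inside `fitness` (here, 5 trials) is what the fitness-evaluation slide prescribes.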
Population-Based Search Algorithms for Multi-Robot Systems
Coordinated motion: Fitness evaluation
• The fitness F of a genotype is given by the normalized distance covered by the group:

F = ||X(150) − X(0)|| / D

where X(t) is the coordinate vector of the center of mass at time t, and D is the maximum distance that can be covered in 150 simulation cycles
• Fitness is evaluated 5 times (a fixed number per candidate solution!), starting from different random initializations
• The resulting average is assigned to the genotype
Coordinated motion: Results

Replication   Performance
1             0.87888
2             0.83959
3             0.88338
4             0.71567
5             0.79573
6             0.75209
7             0.83425
8             0.85848
9             0.87222
10            0.76111

[Figures: average fitness over generations; post-evaluation]
Coordinated motion: Real s-bots
[Videos: default (used for evolution); flexibility]
Coordinated motion: Scalability
[Videos: scalability; flexibility and scalability]
Conclusion
Take Home Messages
• Machine-learning techniques (population-based and hill-climbing algorithms) can be used for the design and optimization of software and hardware features in multi-robot settings
• The cost of an optimization problem is heavily influenced by the amount of noise in the evaluation function, the time needed to evaluate a candidate solution, and the dimension of the parameter space
• Collaborative co-adaptation strategies can be differentiated along three axes: public/private solutions, homogeneous/heterogeneous systems, individual/group performance
• Multi-robot platforms can be exploited for testing multiple candidate solutions in parallel
• One way to bypass the credit assignment problem in multi-robot contexts is to enforce homogeneity and reward group performance
• PSO appears to be well suited for fully distributed on-board operation and fairly robust to small pools of candidate solutions
Additional Literature – Week 11
Books
• T. Balch and L. E. Parker (Eds.), "Robot Teams: From Diversity to Polymorphism". Natick, MA: A K Peters, 2002.
Papers
• Zhang Y., Antonsson E. K., and Martinoli A., "Evolutionary Engineering Design Synthesis of On-Board Traffic Monitoring Sensors". Research in Engineering Design, 19(2-3): 113-125, 2008.
• Pugh J. and Martinoli A., "Multi-Robot Learning with Particle Swarm Optimization". Proc. of the Fifth ACM Int. Joint Conf. on Autonomous Agents and Multi-Agent Systems, Hakodate, Japan, May 2006, pp. 441-448.
• Dorigo M., Trianni V., Şahin E., Groß R., Labella T., Nolfi S., Baldassarre G., Deneubourg J.-L., Mondada F., Floreano D., and Gambardella L., "Evolving Self-Organising Behaviours for a Swarm-bot". Autonomous Robots, 17: 223-245, 2004.
• Murciano A. and Millán J. del R., "Specialization in Multi-Agent Systems Through Learning". Biological Cybernetics, 76: 375-382, 1997.
• Matarić M. J., "Learning in Behavior-Based Multi-Robot Systems: Policies, Models, and Other Agents". Cognitive Systems Research, Special Issue on Multi-disciplinary Studies of Multi-agent Learning (Ron Sun, ed.), 2(1): 81-93, 2001.