Post on 12-Jan-2015
Robot Swarms as Ensembles of Cooperating Components
Matthias Hölzl
With contributions from Martin Wirsing, Annabelle Klarl
AWASS, Lucca, June 24, 2013
www.ascens-ist.eu
The Task
Robots cleaning an exhibition area
marXbot
Miniature mobile robot developed by EPFL
Rough-terrain mobility
Robots can dock to other robots
Many sensors
  Proximity sensors
  Gyroscope
  3D accelerometer
  RFID reader
  Cameras
  ...
ARM-based Linux system
Gripper for picking up items
Swarm Robotics
Problems
Noise, sensor resolution
Extracting information from sensor data
Unforeseen situations
Uncertainty about the environment
Performing complex actions when intermediate results are uncertain
...
Action Logics
Logics that can represent change over time
Probabilistic behavior can be modeled (but is cumbersome)
Markov Decision Processes
[Figure: MDP over the grid positions pos = (x,y), pos = (x+1,y), pos = (x,y+1) and pos = (x+1,y+1). Legible edge labels: s / 0.9 / -0.1; n / 0.9 / -0.1; e, w / 0.025 / -0.1; s, n, w, e / 0.05 / -0.1. The remaining edge labels are elided on the slide (e / ..., w / ..., s, n / ...).]
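One plausible reading of the edge labels is action / probability / reward: the intended move succeeds with probability 0.9, the robot slips to either side with probability 0.025 each, and it stays in place with probability 0.05 (these sum to 1). The Python sketch below encodes that reading. The coordinate convention, the slip directions and all function names are assumptions, not taken from the slide.

```python
import random

# Assumed geometry (not stated on the slide): n increases y, e increases x.
MOVES = {"n": (0, 1), "e": (1, 0), "s": (0, -1), "w": (-1, 0)}
# Assumed slip directions: the two moves perpendicular to the intended one.
SLIPS = {"n": ("e", "w"), "s": ("e", "w"), "e": ("n", "s"), "w": ("n", "s")}
STEP_REWARD = -0.1  # every legible transition on the slide carries reward -0.1


def shift(pos, direction):
    """Return the neighbouring position in the given direction."""
    dx, dy = MOVES[direction]
    return (pos[0] + dx, pos[1] + dy)


def transition_distribution(pos, action):
    """Return [(probability, next_pos, reward)] for one state-action pair."""
    outcomes = [(0.9, shift(pos, action), STEP_REWARD)]
    outcomes += [(0.025, shift(pos, slip), STEP_REWARD) for slip in SLIPS[action]]
    outcomes.append((0.05, pos, STEP_REWARD))  # robot stays where it is
    return outcomes


def sample_transition(pos, action, rng):
    """Draw one (next_pos, reward) pair from the distribution above."""
    outcomes = transition_distribution(pos, action)
    threshold = rng.random()
    cumulative = 0.0
    for probability, next_pos, reward in outcomes:
        cumulative += probability
        if threshold < cumulative:
            return next_pos, reward
    # Guard against floating-point rounding in the cumulative sum.
    return outcomes[-1][1], outcomes[-1][2]
```

Calling `sample_transition((2, 3), "e", random.Random(1))` returns one of the four possible successor positions together with the step reward.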
Markov Decision Processes
[Figure: MDP with the states DecideActivity, Watching TV, In Club, Drinking, Dancing Alone, Dancing With Partner, "Oh, oh" and "Oh, no". Actions: watchTV, goToClub, drinkBeer, dance (two outcomes, p = 0.5 each) and flirt (two outcomes, p = 0.05 and p = 0.95).]
MDPs: Strategies
State     TV          CDB          CDF
DA        watchTV     goToClub     goToClub
IC        drinkBeer   drinkBeer    dance
DWP       flirt       flirt        flirt
Utility   0.1         −0.05 (∗)    −1.975 (∗∗)

(∗)  0.05 + (−0.1)
(∗∗) 0.05 + (0.5 × 0.2) + 0.5 × (0.25 + (0.05 × 5) + (0.95 × −5))
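The utilities of the three strategies can be checked by summing expected rewards along each strategy. The Python sketch below does this under two assumptions: the rewards are assigned to transitions as the footnote arithmetic suggests (0.05 for goToClub, −0.1 for drinkBeer, 0.2 and 0.25 for the two dance outcomes, ±5 for the flirt outcomes), and "Oh, oh" is taken to be the p = 0.05 outcome. The slide does not say which flirt outcome is which, and the utilities do not depend on that choice. All Python names are hypothetical.

```python
# TRANSITIONS[state][action] = [(probability, reward, next_state), ...]
# States without an entry are treated as terminal.
TRANSITIONS = {
    "DecideActivity": {
        "watchTV": [(1.0, 0.1, "Watching TV")],
        "goToClub": [(1.0, 0.05, "In Club")],
    },
    "In Club": {
        "drinkBeer": [(1.0, -0.1, "Drinking")],
        "dance": [(0.5, 0.2, "Dancing Alone"),
                  (0.5, 0.25, "Dancing With Partner")],
    },
    "Dancing With Partner": {
        # Assumption: which outcome carries which probability is not
        # recoverable from the slide text.
        "flirt": [(0.05, 5.0, "Oh, oh"), (0.95, -5.0, "Oh, no")],
    },
}

# The three strategies from the table (DA, IC, DWP rows).
STRATEGIES = {
    "TV": {"DecideActivity": "watchTV", "In Club": "drinkBeer",
           "Dancing With Partner": "flirt"},
    "CDB": {"DecideActivity": "goToClub", "In Club": "drinkBeer",
            "Dancing With Partner": "flirt"},
    "CDF": {"DecideActivity": "goToClub", "In Club": "dance",
            "Dancing With Partner": "flirt"},
}


def strategy_utility(strategy, state="DecideActivity"):
    """Expected undiscounted sum of rewards when following the strategy."""
    if state not in TRANSITIONS:  # terminal state
        return 0.0
    action = strategy[state]
    return sum(
        probability * (reward + strategy_utility(strategy, next_state))
        for probability, reward, next_state in TRANSITIONS[state][action]
    )


for name, strategy in STRATEGIES.items():
    print(name, round(strategy_utility(strategy), 3))
```

Up to floating-point rounding, the results agree with the Utility row: 0.1, −0.05 and −1.975.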
Reinforcement Learning
General idea:
  Figure out the expected value of each action in each state
  Pick the action with the highest expected value (most of the time)
  Update the expectations according to the actual rewards
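One common instance of this scheme is tabular Q-learning with ε-greedy action selection. The update rule is shown here as an illustration; the step size α, the discount factor γ and the exploration rate ε are assumed parameters that the slide does not name.

```latex
% Action selection: greedy most of the time, random with probability \varepsilon
a =
\begin{cases}
  \arg\max_{a'} Q(s, a') & \text{with probability } 1 - \varepsilon \\
  \text{a uniformly random action} & \text{with probability } \varepsilon
\end{cases}

% Update after observing reward r and successor state s'
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
```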
How well does this work?
Rather well for small problems
But: state explosion
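A rough count illustrates the state explosion. The numbers are hypothetical: k distinguishable robots on a grid with c free cells, robots allowed to share a cell, and the locations of items ignored.

```latex
|S| = c^{k}, \qquad \text{e.g. } c = 100,\ k = 5 \;\Rightarrow\; |S| = 100^{5} = 10^{10}
```

A flat table with one value per state-action pair is already out of reach at this size, before item locations are added to the state.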
Solutions
Decomposition
Hierarchy
Partial programs
POEM
Action language
First-order reasoning
Hierarchical reinforcement learning
  Learns completions for partial programs
  Concurrency
Reflection / meta-object protocols
...
Iliad: A POEM Implementation
Common Lisp-based programming language
Full first-order reasoning
  Operations on logical theories: U(C)NA, domain closure, ...
  Resolution, hyperresolution, DPLL, etc.
  Conditional answer extraction
  Procedural attachment, constraint solving
Hierarchical reinforcement learning
  Based on Concurrent ALisp
  Partial programs
  Threadwise and temporal state abstraction
  Hierarchically Optimal Yet Local (HOLY) Q-learning
  Hierarchically Optimal Recursively Decomposed (HORD) Q-learning
Planned Contents
Introduction to CALisp/Poem
Simple TD-learning: bandits
Flat reinforcement learning: navigation
Hierarchical reinforcement learning: collecting items individually
Threadwise decomposition for hierarchical reinforcement learning: learning collaboratively
n-armed Bandits
[Figure: a single state S with two actions, search / 1.0 / N(0.1, 1.0) and coll-known / 1.0 / N(0.3, 3.0).]
Choice between n actions
Reward depends probabilistically on the action choice
No long-term consequences
Simplest form of TD-learning
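A minimal Python sketch of such a bandit learner, using sample-average value estimates with ε-greedy selection (a standard bandit method; the slide does not say which method Iliad uses). It assumes that N(μ, σ) on the slide gives mean and standard deviation; if the second parameter is a variance, the `gauss` arguments change accordingly. All Python names are hypothetical.

```python
import random


def run_bandit(arms, steps, epsilon, rng):
    """Estimate the value of each arm.

    arms maps an action name to a function rng -> sampled reward.
    Returns (estimates, counts), both keyed by action name.
    """
    estimates = {name: 0.0 for name in arms}
    counts = {name: 0 for name in arms}
    for _ in range(steps):
        if rng.random() < epsilon:
            name = rng.choice(list(arms))  # explore
        else:
            name = max(arms, key=lambda a: estimates[a])  # exploit
        reward = arms[name](rng)
        counts[name] += 1
        # Incremental sample average: no successor state is involved.
        estimates[name] += (reward - estimates[name]) / counts[name]
    return estimates, counts


# The two actions from the slide, read as normal reward distributions.
ARMS = {
    "search": lambda rng: rng.gauss(0.1, 1.0),
    "coll-known": lambda rng: rng.gauss(0.3, 3.0),
}

estimates, counts = run_bandit(ARMS, steps=5000, epsilon=0.1,
                               rng=random.Random(0))
print(estimates, counts)
```

Because there are no long-term consequences, the update needs no bootstrapping term: each estimate is just the running mean of the rewards observed for that action.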
Flat Learning
[Figure: maze with walls (XX), target TT and robot RR.]
Target: (0 0)
Choices: (N E S W)
Q-values:
#((N (Q -1.8))
  (E (Q -1.8))
  (S (Q -2.25))
  (W (Q 2.76)))
Recommended choice is W
Flat Learning
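;; Top-level program: navigate to the target location of the environment.
(defun simple-robot ()
  (call (nav (target-loc (robot-env)))))

;; Move until the robot is at LOC. The direction is a choice point:
;; DIR is selected from N, E, S, W.
(defun nav (loc)
  (until (equal (robot-loc) loc)
    (with-choice navigate-choice (dir '(N E S W))
      (action navigate-move dir))))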
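For readers without an Iliad installation, the following Python sketch shows what a learner might do at the single choice point of `nav`: tabular Q-learning over (position, direction) pairs. It is a stand-in, not the Iliad implementation. The wall-free 4×4 grid, the rewards (−0.1 per step, 1.0 on reaching the target), the parameter values and all Python names are assumptions; only the target (0 0) and the choices N, E, S, W come from the slides.

```python
import random

# Assumption: N decreases y and W decreases x, so the target (0, 0) is a corner.
MOVES = {"N": (0, -1), "E": (1, 0), "S": (0, 1), "W": (-1, 0)}


def step(pos, direction, size):
    """Deterministic move on a wall-free grid; bumping the border stays put."""
    dx, dy = MOVES[direction]
    x = min(max(pos[0] + dx, 0), size - 1)
    y = min(max(pos[1] + dy, 0), size - 1)
    return (x, y)


def train(size=4, target=(0, 0), episodes=2000, alpha=0.5, gamma=0.95,
          epsilon=0.2, seed=0):
    """Learn Q-values for the navigation choice with tabular Q-learning."""
    rng = random.Random(seed)
    q = {((x, y), d): 0.0
         for x in range(size) for y in range(size) for d in MOVES}
    for _ in range(episodes):
        pos = (rng.randrange(size), rng.randrange(size))
        for _ in range(100):  # cap the episode length
            if pos == target:
                break
            if rng.random() < epsilon:
                direction = rng.choice(list(MOVES))
            else:
                direction = max(MOVES, key=lambda a: q[(pos, a)])
            nxt = step(pos, direction, size)
            reward = 1.0 if nxt == target else -0.1
            best_next = 0.0 if nxt == target else max(q[(nxt, a)] for a in MOVES)
            q[(pos, direction)] += alpha * (
                reward + gamma * best_next - q[(pos, direction)])
            pos = nxt
    return q


def recommended(q, pos):
    """The direction with the highest learned Q-value at pos."""
    return max(MOVES, key=lambda a: q[(pos, a)])


def greedy_path(q, start, target, size, max_steps=50):
    """Follow the recommended choices from start until the target is reached."""
    pos, path = start, [start]
    while pos != target and len(path) <= max_steps:
        pos = step(pos, recommended(q, pos), size)
        path.append(pos)
    return path
```

After `q = train()`, `recommended(q, (3, 3))` plays the role of the "Recommended choice" line on the previous slide, and `greedy_path(q, (3, 3), (0, 0), 4)` traces the route the learned policy takes.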
Hierarchical Learning
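;; Top level: repeatedly choose between sleeping, picking up waste
;; and dropping waste.
(defun waste-removal ()
  (loop
    (choose choose-waste-removal-action
      (action sleep 'SLEEP)
      (call (pickup-waste))
      (call (drop-waste)))))

;; Subroutine: navigate to the waste source, then pick up the waste.
(defun pickup-waste ()
  (call (nav (waste-source)))
  (action pickup-waste 'PICKUP))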
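The structure of such a partial program can be imitated in Python with generators: the program yields its choice points and primitive actions, and a separate driver fills in the choices. In the sketch below a hand-written policy stands in for the learned completion, so no learning takes place. The bounded `rounds` loop, the `drop_waste` subroutine (called but not shown on the slide), the toy environment dictionary and all Python names are assumptions.

```python
MOVES = {"N": (0, -1), "E": (1, 0), "S": (0, 1), "W": (-1, 0)}


def nav(target, env):
    """Partial program: the direction is left open as a choice point."""
    while env["pos"] != target:
        direction = yield ("choose", "navigate-choice", ("N", "E", "S", "W"))
        yield ("action", "navigate-move", direction)


def pickup_waste(env):
    yield from nav(env["waste_source"], env)
    yield ("action", "pickup-waste", "PICKUP")


def drop_waste(env):  # hypothetical: not shown on the slide
    yield from nav(env["waste_target"], env)
    yield ("action", "drop-waste", "DROP")


def waste_removal(env, rounds):
    for _ in range(rounds):  # the slide loops forever
        option = yield ("choose", "choose-waste-removal-action",
                        ("sleep", "pickup", "drop"))
        if option == "sleep":
            yield ("action", "sleep", "SLEEP")
        elif option == "pickup":
            yield from pickup_waste(env)
        else:
            yield from drop_waste(env)


def apply_action(env, name, argument):
    """Toy environment dynamics; "sleep" leaves the environment unchanged."""
    if name == "navigate-move":
        dx, dy = MOVES[argument]
        env["pos"] = (env["pos"][0] + dx, env["pos"][1] + dy)
    elif name == "pickup-waste":
        env["carrying"] = True
    elif name == "drop-waste":
        env["carrying"] = False


def run(program, env, policy):
    """Drive a partial program: resolve choices, execute actions, log them."""
    trace = []
    try:
        event = next(program)
        while True:
            kind, label, payload = event
            if kind == "choose":
                event = program.send(policy(label, payload, env))
            else:
                apply_action(env, label, payload)
                trace.append((label, payload))
                event = next(program)
    except StopIteration:
        return trace


def greedy_policy(label, options, env):
    """Hand-written completion standing in for a learned one."""
    if label == "choose-waste-removal-action":
        return "drop" if env["carrying"] else "pickup"
    target = env["waste_target"] if env["carrying"] else env["waste_source"]
    (x, y), (tx, ty) = env["pos"], target
    if x != tx:
        return "E" if tx > x else "W"
    return "S" if ty > y else "N"
```

The driver is the only place where decisions are made, so replacing `greedy_policy` with a learner that keeps one value table per choice label gives the learner exactly the same view of the program: the fixed structure comes from the programmer, the open choices from experience.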
Multi-Robot Learning
[Figure: grid world with walls (XX), target TT, several robots (RR), and cells marked RW and WW.]