BIBAIST–2001-32115
Bayesian Inspired Brain and Artefacts (BIBA): Using probabilistic logic to understand brain function
and implement life-like behavioural co-ordination
Summer School Proceedings
Deliverable: 5
Workpackage: 6
Month due: July 2002
Contract Start Date: 01.11.2001
Duration: 48 months
Project Co-ordinator: INRIA–UMR-GRAVIR
Partners: CNRS–UMR-GRAVIR; UCL-ARGM; UCAM-DPOL; CNRS-UMR-LPPA; CDF-UMR-LPPA; EPFL; MIT-NSL
Project funded by the European Community under the “Information Society Technologies” Programme (1998-2002)
BIBA SUMMER SCHOOL
30th June – 5th July
Ecole Cantonale d'Agriculture de Grange-Verney – Moudon – Switzerland
Contents
1 Participants
2 Program
3 Sessions
Annex - Presentations
1 Participants
Autonomous Systems Lab (ASL) - Ecole polytechnique fédérale de Lausanne (EPFL)
http://dmtwww.epfl.ch/isr/asl/
Roland SIEGWART, Professor
Dr. Nicola TOMATIS, Engineer
Guy RAMEL, PhD student
Santanu METIA, PhD student

Laplace Group - Institut National de Recherche en Informatique et Automatique (INRIA)
http://www-laplace.imag.fr
Pierre BESSIERE, Researcher
Emmanuel MAZER, Research Director
Olivier AYCARD, Assistant Professor
Hubert ALTHUSER, Research Engineer
Olivier LEBELTEL, Research Engineer
Kamel MEKHNACHA, Research Engineer
Christophe COUE, PhD student
Julien DIARD, PhD student
Carla KOIKE, PhD student
Cédric PRADALIER, PhD student
Francis COLAS, Graduate
Ronan LE HY, Graduate
Frédéric RASPAIL, Graduate
Adriana TAPUS, Graduate
Olivier MALRAIT, Administrative

Laboratoire de Physiologie pour la Perception et l'Action (LPPA) - Collège de France
http://www.college-de-france.fr/chaires/chaire3/index.htm
Alain BERTHOZ, Professor
Frédéric DAVESNE, PostDoc
Jean LAURENS, Graduate
2 Program
Monday 1st July
9h00 – 12h30: Sessions 1 & 4
Session 1 - Biologically plausible probabilistic representation and inference mechanisms
Session 4 - Visual-vestibular interaction in self motion perception and gaze stabilization
14h00 – end of afternoon: Session 3 & PMC
Session 3 - Switching vs Weighing & Sensor selection based (in particular) on contraction theory
Project Management Committee
Tuesday 2nd July
9h00 – 13h00: Session 2
Session 2 - Representation of space, maps and navigation
14h00 – end of afternoon: Sessions 5 & 6
Session 5 - Bayesian Robot programming and modelling
Session 6 - Learning Issues Discussion
Evening: Robots Exhibition Visit
Wednesday 3rd July
9h00 – 12h30: Robot Specification & General Discussion
Afternoon: End
3 Sessions

1 - Biologically plausible probabilistic representation and inference mechanisms
Session length: 2h (presentations + discussion)
* Analytical and Bayesian Models of Head Direction Cells
Francis Colas, Pierre Bessière, 30 mn
* NeuroKhepera: One Step towards Neural Implementation of Bayesian Inference
Jean Laurens, 30 mn
2 - Representation of space, maps and navigation
Session length: 4h (presentations + discussion)
* Bayesian Maps and Navigation
Julien Diard, 1h
* Markov Models Use in Robotics
Adriana Tapus, Olivier Aycard, 1h
* Hybrid Mobile Robot Navigation: A Natural Integration of Metric and Topological
Roland Siegwart, Nicola Tomatis, 1h
3 - Switching vs Weighing & Sensor selection based (in particular) on contraction theory
Session length: 2h30 (presentations + discussion)
* Some Examples of Visuo-Vestibular Interaction Models
Alain Berthoz, 1h
* Selection of relevant sensors and setting of new sensory apparatus
Pierre Bessière, 30 mn
4 - Visual-vestibular interaction in self motion perception and gaze stabilization
Session length: 1h (presentations + discussion)
* Nystagmus and Visuo-Vestibular Interactions Modelling
Jean Laurens, 30 mn
5 - Bayesian Robot programming and modelling
Session length: 2h30 (presentations + discussion)
* Obstacle Avoidance using Proscriptive Programming
Cédric Pradalier, Carla Koike, 30 mn
* A First Step toward Bayesian Learning by Imitation
Pierre Bessière, 30 mn
* Bayesian Programming of Videogames Characters
Ronan Le Hy, Olivier Lebeltel, 30 mn
6 – Learning Issues Discussion
Session length: 2h
Introduced and moderated by Frédéric Davesne & Jean Laurens
Annex - Presentations
Session 1 - Biologically plausible probabilistic representation and inference mechanisms
- Analytical and Bayesian Models of Head Direction Cells
Francis Colas, Graduate, INRIA
Session 2 - Representation of space, maps and navigation
- Hybrid Mobile Robot Navigation: A Natural Integration of Metric and Topological
Dr Nicola Tomatis, Prof. Roland Siegwart, EPFL
- Bayesian Maps and Navigation
Julien Diard, PhD student, INRIA
Session 5 - Bayesian Robot programming and modelling
- Obstacle Avoidance Using Proscriptive Programming
Cédric Pradalier, Carla Koike, PhD students, INRIA
- Bayesian learning by imitation (Apprentissage bayésien par imitation)
Frédéric Raspail, Graduate, INRIA
- Bayesian programming of videogame characters (Programmation bayésienne de personnages de jeux vidéo)
Ronan Le Hy, Graduate, INRIA
Session 6 – Learning Issues Discussion
Introduced and moderated by Jean Laurens PhD student & Frédéric Davesne PostDoc, LPPA
Analytical and Bayesian modelling of head direction cells
Francis COLAS
Tutor : Pierre Bessière
European project BIBA in collaboration with
L.P.P.A. (Collège de France)
Introduction
Angle coding
• Head direction cells:
– firing rate correlated with angle
– coding for actual or anticipated orientation
Taken from [Arleo00]
Brain areas and projections

• Adn (antero-dorsal nucleus): anticipated head direction (≈ 25 ms)
• Dtn (dorsal tegmental nucleus): head angular velocity
• Lmn (lateral mammillary nucleus): anticipated orientation (≈ 95 ms)
• Psc (postsubiculum): present angle

Adapted from [Stackman98]

[Projection diagram: Dtn(ω) → Lmn(θ) → Adn(θ) → Psc(θ)]
Contents
• Introduction
• Previous models
• Analytical modelling
• Bayesian modelling
Previous models

• First model [McNaughton91]
• Neural implementation [Skaggs95]
• Mathematical framework [Zhang96]
• Adn study [Blair95], [Blair97]
• Use of Lmn for integration [Arleo00]
• Modeling attractor deformation in Adn [Goodridge00]
Constraints

1. Integration
2. Respect of the projections
3. Anticipatory time intervals
4. Anatomical lesions
5. Uncertainty
Contents

• Introduction
• Previous models
• Analytical modelling
– Functional dependencies
– Methodology
– Formulas
– Tests
– Behaviour
– Results
• Bayesian modelling
Functional dependencies

Psc_t = g1(Psc_t-1, Adn_t-1)
Adn_t-1 = g2(Psc_t-2, Lmn_t-2)
Lmn_t-2 = g3(Psc_t-3, Lmn_t-3, ω_t-3)

Constraint 2: Projections

[Dependency graph over ω_t-3, Psc_t-3 … Psc_t, Lmn_t-3, Lmn_t-2, Adn_t-1; projection diagram Dtn → Lmn → Adn → Psc]
Methodology

With θ(t) the head direction, dt the time step and τA, τL the anticipatory intervals:
Psc_t = θ(t) ; Psc_t-1 = θ(t - dt) ; Psc_t-2 = θ(t - 2dt) ; Psc_t-3 = θ(t - 3dt)
Adn_t-1 = θ(t + τA - dt)
Lmn_t-2 = θ(t + τL - 2dt) ; Lmn_t-3 = θ(t + τL - 3dt)
ω_t-3 = ω(t - 3dt)

Taylor expansions give:
Psc_t = Psc_t-1 + ω·dt + O(dt²)
Adn_t-1 = Psc_t-1 + ω·τA + O(τA²)
hence ω = (Adn_t-1 - Psc_t-1)/τA + O(τA)
and Psc_t = Psc_t-1 + (dt/τA)·(Adn_t-1 - Psc_t-1) + O(τA·dt + dt²)
Formulas

Psc_t = Psc_t-1 + (dt/τA)·(Adn_t-1 - Psc_t-1) + O(τA·dt + dt²)
Adn_t-1 = Psc_t-2 + ((τA + dt)/τL)·(Lmn_t-2 - Psc_t-2) + O(τA·τL + dt²)
Lmn_t-2 = Lmn_t-3 + (dt/τL)·(Lmn_t-3 - Psc_t-3) + O(ω·dt·(τL + dt))

using ω ≈ (Lmn_t-3 - Psc_t-3)/τL

[Timeline diagram placing Psc_t-3 at t - 3dt and Lmn_t-3, Lmn_t-2 at t + τL - 3dt and t + τL - 2dt on the θ(t) curve]
Angular velocity

• Tests:
– constant velocity
– trapezoidal profile for head angular velocity
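The analytical update rules can be iterated directly to run such a test. A minimal sketch: only τA ≈ 25 ms and τL ≈ 95 ms come from the slides; the time step, run length and profile values are assumptions.

```python
DT = 0.010      # integration step (s) -- assumed value
TAU_A = 0.025   # Adn anticipatory interval (~25 ms, from the slides)
TAU_L = 0.095   # Lmn anticipatory interval (~95 ms, from the slides)

def omega_trapezoid(t, rise=0.5, plateau=1.0, peak=120.0):
    """Trapezoidal head angular-velocity profile (deg/s)."""
    if t < rise:
        return peak * t / rise
    if t < rise + plateau:
        return peak
    if t < 2 * rise + plateau:
        return peak * (2 * rise + plateau - t) / rise
    return 0.0

psc = adn = lmn = 0.0   # present (Psc) and anticipated (Adn, Lmn) angles
t = 0.0
for _ in range(300):
    w = omega_trapezoid(t)
    lmn = lmn + w * DT                                # Lmn integrates omega from Dtn
    adn = psc + ((TAU_A + DT) / TAU_L) * (lmn - psc)  # Adn interpolates Psc -> Lmn
    psc = psc + (DT / TAU_A) * (adn - psc)            # Psc tracks the Adn signal
    t += DT
# the profile integrates to 180 deg; psc should converge to that value
```

With this profile, Psc first lags the anticipated Lmn signal while the head turns, then catches up once the velocity returns to zero, which is the qualitative behaviour the "Behaviour" slide reports.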
Behaviour

Results of an execution

Constraint 1: Integration
Results

1. Integration ✓
2. Respect of the projections ✓
3. Anticipatory time intervals ✓
4. Anatomical lesions ✗
5. Uncertainty ✗
Contents

• Introduction
• Previous models
• Analytical modelling
• Bayesian modelling
– Variables
– Decomposition
– Identification
– Questions
Bayesian model: Variables

• Variables:
– Psc_t-3, Psc_t-2, Psc_t-1, Psc_t : D = [-180; 180[
– Lmn_t-3, Lmn_t-2 : D = [-180; 180[
– Adn_t-1 : D = [-180; 180[
– ω_t-3 : Dω = [-600; 600]
– τA : DA = [-0.010; 0.050]
– τL : DL = [0.060; 0.110]
Bayesian model: Decomposition

P(Psc_t-3 Psc_t-2 Psc_t-1 Psc_t Adn_t-1 Lmn_t-3 Lmn_t-2 ω_t-3 τA τL)
= P(Psc_t-3) × P(Psc_t-2) × P(Psc_t-1) × P(Lmn_t-3) × P(ω_t-3) × P(τA) × P(τL)
× P(Lmn_t-2 | Lmn_t-3 Psc_t-3 ω_t-3 τL)
× P(Adn_t-1 | Lmn_t-2 Psc_t-2 τA τL)
× P(Psc_t | Psc_t-1 Adn_t-1 τA)

[Graphical model over Psc_t-3 … Psc_t, Lmn_t-3, Lmn_t-2, Adn_t-1, ω_t-3, τL, τA]
Bayesian model: Distributions

• P(ω_t-3): uniform;
• P(τL), P(τA): Gaussians matching biological data;
• P(Psc_t-3), P(Psc_t-2), P(Psc_t-1), P(Lmn_t-3): Gaussians around an initialization value;
• For Lmn, Adn and Psc: Gaussians centered around the results of the analytical formulas.
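A crude Monte Carlo sketch of this model: sample the priors, push them through Gaussians centred on the analytical formulas, and read off the question P(Psc_t | ω_t-3). All standard deviations and the clamping of the τ samples below are assumptions, not values from the talk.

```python
import random
from statistics import fmean

random.seed(0)
DT = 0.010      # time step (s), assumed
OMEGA = 120.0   # known head angular velocity (deg/s)
PSC0 = 0.0      # initialization value

samples = []
for _ in range(5000):
    # P(tau_A), P(tau_L): gaussians matching biological data (std assumed)
    tau_a = max(random.gauss(0.025, 0.005), 0.005)
    tau_l = max(random.gauss(0.095, 0.010), 0.020)
    # P(Psc_t-3 .. Psc_t-1), P(Lmn_t-3): gaussians around initialization
    psc3 = random.gauss(PSC0, 2.0)
    psc2 = random.gauss(PSC0 + OMEGA * DT, 2.0)
    psc1 = random.gauss(PSC0 + 2 * OMEGA * DT, 2.0)
    lmn3 = random.gauss(PSC0 + OMEGA * tau_l, 2.0)
    # transitions: gaussians centred on the analytical formulas
    lmn2 = random.gauss(lmn3 + (DT / tau_l) * (lmn3 - psc3), 1.0)
    adn1 = random.gauss(psc2 + ((tau_a + DT) / tau_l) * (lmn2 - psc2), 1.0)
    psc_t = random.gauss(psc1 + (DT / tau_a) * (adn1 - psc1), 1.0)
    samples.append(psc_t)

mean = fmean(samples)   # should sit near PSC0 + 3 * OMEGA * DT
```

The posterior mean tracks the integrated angle, and its spread reflects both the noise of the transitions and the uncertainty on τA and τL, which is what the Bayesian version adds over the analytical one (constraint 5).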
Bayesian model: Program

• Questions:
– P(Lmn_t-2 | ω_t-3)
– P(Psc_t | ω_t-3)

[Graphical model over Psc_t-3 … Psc_t, Lmn_t-3, Lmn_t-2, Adn_t-1, ω_t-3, τL, τA]
Conclusion

1. Integration ✓
2. Respect of the projections ✓
3. Anticipatory time intervals ✓
4. Anatomical lesions ✓
5. Uncertainty ✓
Future work

• Microscopic plausibility of the Bayesian hypothesis
• Extension to place cells
• Use of vision
† Swiss Federal Institute of Technology, Lausanne, Switzerland
‡ The Robotics Institute, CMU, Pittsburgh, USA
Hybrid Mobile Robot Navigation: A Natural Integration of Metric and Topological
BIBA School, Moudon
Nicola Tomatis †
Roland Siegwart †
In collaboration with: Illah Nourbakhsh ‡
BIBA School: Hybrid Mobile Robot Navigation Nicola Tomatis
Contents

• Introduction
• Environmental Modeling
• Localization and Map Building
• Metric
• Topological
• Closing the Loop
• Switching Model
• Experimental Results
• Conclusion and Outlook
Introduction

Motivation

• Mobile robotics for applications:
• Precision with respect to the environment
• Robustness, avoiding human intervention
• Practicability with limited embedded resources
• Ergonomics for the user (man-machine interaction)
Introduction

Motivation

• Assumption: theory is available
• Goal: theory to practice
• Approach:
• Study the literature
• Focus on advantages and disadvantages of existing methods
• Propose a more human- (bio?) inspired approach
• Validate it empirically
Introduction
Related Work: Localization
Introduction
Related Work: SLAM
Environmental Modeling
The idea:
• One global topological map
• Many local metric maps
Metric - Topological
• Features:
• Horizontal lines from the laser scanner
• A local metric map containing the features belonging to the same physical place
Environmental Modeling
Metric Model
Environmental Modeling
• Features:
• Corners
• Openings
• The map is a graph
• Openings correspond to map states
Topological Model
• Map building strategy
• Implementation for an office environment
• Assumption: precision is needed in rooms
• Navigation in hallways is topological, in rooms metric
• Exploration strategy
• Depth-first search in the hallways first
• Then backtracking to visit the rooms
Localization and Map Building
Strategies
Localization and Map Building
Metric: The EKF
• The product rule: P(A B) = P(A) P(B | A)
• The Bayes rule: P(A | B) = P(B | A) P(A) / P(B)
• The Markov assumption: P(x_t | x_0 … x_t-1) = P(x_t | x_t-1)
• The independence assumption: observations are independent given the state
• Errors are Gaussian, error propagation is linear
Localization and Map Building
The EKF is Bayesian
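Under these Gaussian/linear assumptions the Bayesian update has a closed form. The 1-D linear special case can be sketched as follows (the EKF proper adds Jacobian-based linearization of nonlinear models; all numbers here are illustrative):

```python
def predict(mu, var, u, q):
    """Prediction step: state transition x' = x + u, process noise variance q."""
    return mu + u, var + q

def update(mu, var, z, r):
    """Correction step: Bayes/product rule on two gaussians, in closed form."""
    k = var / (var + r)                  # Kalman gain
    return mu + k * (z - mu), (1 - k) * var

mu, var = 0.0, 1.0                       # prior belief on a 1-D position
mu, var = predict(mu, var, u=1.0, q=0.5) # odometry says: moved ~1 m
mu, var = update(mu, var, z=1.2, r=0.5)  # a laser feature says: at 1.2 m
```

The posterior mean lands between the odometric prediction and the observation, weighted by their variances, which is exactly the Bayesian fusion the slide claims.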
Localization and Map Building
• Stochastic Map [Smith88]
• Update:
• Displacement
• New observation
• Re-observation
Metric: Stochastic Map
• Belief state vector
• State transition
• Observation
• Estimation
• Control strategy
• Path planning: graph based
Localization and Map Building
Topological: POMDP
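A belief-state update of the kind a topological POMDP performs can be sketched as follows; the transition and observation numbers are illustrative, not from the experiments:

```python
def belief_update(belief, transition, obs_likelihood):
    """One POMDP step: push the belief through the transition model for
    the chosen action, then apply Bayes with the observation likelihood."""
    n = len(belief)
    predicted = [sum(transition[j][i] * belief[j] for j in range(n))
                 for i in range(n)]
    posterior = [obs_likelihood[i] * predicted[i] for i in range(n)]
    z = sum(posterior)
    return [p / z for p in posterior]

# three topological states along a hallway; action "go forward"
T = [[0.1, 0.9, 0.0],
     [0.0, 0.1, 0.9],
     [0.0, 0.0, 1.0]]
obs = [0.2, 0.7, 0.1]        # likelihood of the current observation per state
b = belief_update([1.0, 0.0, 0.0], T, obs)
```

Because the belief is a full distribution over states, this representation stays multimodal, which is the property the switching-model slide relies on.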
• The “observation graph” permits the detection of new features
• Handles environmental dynamics
Localization and Map Building
Topological Map Building
Localization and Map Building
Closing the Loop
Localization and Map Building
• Topological is multimodal
• Metric is unimodal
• EKF initialization!
• Confidence function
Switching Model
• Performed with the fully autonomous Donald Duck
• Environment closed for a finite exploration
• Experiments:
• Map building
• Localization (tracking)
• Bootstrapping (global localization)
• Closing the loop
Experimental Results
Experiments
Experimental Results
Map Building
Experimental Results
Localization
Experimental Results
Bootstrapping
Experimental Results
Closing the Loop
• Contribution
• Precision: mean error < 10 mm
• Robustness: multimodality in dynamic environments
• Practicability: PowerPC 604e 300 MHz
• Closing the loop
• Limitations
• Switching from topological to metric
Conclusions and Outlook
Conclusions
Bayesian Maps and Navigation
Julien Diard, Pierre BessièreLaplace Team - Sharp project
Gravir Lab/IMAG, Grenoble
http://www-laplace.imag.fr/
12/06/2002
Introduction
• Questions relevant in robotics and biology:
– What is navigation?
– What is a location?
– What is a map?
– What is planning?
– What is localization?
Contents

• Bibliography
• Bayesian Robot Programming
– Example and definition
– Putting bayesian programs together
• Bayesian maps
– Definition and example
– Putting bayesian maps together
Bibliography: the classical approach

• Navigation:
– finding a path for a point in workspace(-time);
– control along the plan
• Applied mathematical problems
• Assumes that
– a precise geometric map is available
– the locations of the robot and of the goal point are precisely known
• These conditions are never met
Bibliography: probabilistic approaches

• Markov models
– POMDPs, HMMs, Markov Localization, (Extended) Kalman Filters, Dynamic Bayes Networks, etc.
• Pros
– Treat incompleteness and uncertainties
• Cons
– Often not hierarchical
– Often associated with geometric, fine-grained models
– Impose dependencies and independencies
Bibliography: bio-inspired approaches

• Kuipers’ Spatial Semantic Hierarchy, among many others
• Pros
– Reflection on the definitions of “localization” and “maps”: cognitive maps
– Hierarchical
• Cons
– Various formalisms
• Consistency, communication between modules, …
Bibliography: analysis

• Use Bayesian Robot Programming
– Unified framework for dealing with incompleteness and uncertainties
– Explicit declaration of assumptions
• Emphasis on the semantics of variables
• Does not impose any dependencies
– Modularity allows
• Incremental development
• Easy building of hierarchies
Contents

• Bibliography
• Bayesian Robot Programming
– Example and definition
– Putting bayesian programs together
• Bayesian maps
– Definition and example
– Putting bayesian maps together
Example: Sensor fusion

• Objective
– Find the position of a light source
• Problem
– The robot does not have a dedicated sensor
• Solution
– A model of each light sensor
– Fusion of the eight models

[Figure: robot and light source; ThetaL = bearing, DistL = distance, Lmi = reading of light sensor i]
Light sensor model (1)

Preliminary knowledge π_sensor

Specification
– Variables: ThetaL, DistL, Lmi
– Decomposition:
P(ThetaL ∧ DistL ∧ Lmi | π_sensor) = P(ThetaL ∧ DistL | π_sensor) × P(Lmi | ThetaL ∧ DistL ∧ π_sensor)
– Parametric forms:
P(ThetaL ∧ DistL | π_sensor) ← Uniform
P(Lmi | ThetaL ∧ DistL ∧ π_sensor) ← Gaussians

Identification
– A priori programming (or learning)

Utilization: inverse questions
P(ThetaL | [Lmi = li] ∧ π_sensor)
P(DistL | [Lmi = li] ∧ π_sensor)

[Figure: mean light reading (0–500) as a function of ThetaL (-180…180) and DistL (0…30)]
Light sensor model (2)

[Figure: the inverse questions P(ThetaL | Lmi Cp_li) and P(DistL | Lmi Cp_li) plotted for readings Lmi = 15, 45, 100, 200, 300, 450, 475 and 500; ThetaL ranges over -180…170 degrees, DistL over 0…25]
Sensor Fusion (1)

Specification
– Variables: ThetaL, DistL, Lm0, …, Lm7
– Decomposition (conditional independence hypothesis):
P(ThetaL ∧ DistL ∧ Lm0 ∧ … ∧ Lm7 | π_fusion)
= P(ThetaL ∧ DistL | π_fusion) × Π_{i=0..7} P(Lmi | ThetaL ∧ DistL ∧ π_fusion)
– Parametric forms:
P(ThetaL ∧ DistL | π_fusion) ← Uniform
P(Lmi | ThetaL ∧ DistL ∧ π_fusion) ← P(Lmi | ThetaL ∧ DistL ∧ π_sensor)

Identification
– No free parameters

Utilization
P(ThetaL ∧ DistL | lm0 ∧ … ∧ lm7 ∧ π_fusion)
Sensor Fusion (2)

[Figure: the eight elementary posteriors P(ThetaL | Lmi Cp_li) for the readings Lm0 = 509 (light sensor at -90°), Lm1 = 480 (-50°), Lm2 = 391 (-10°), Lm3 = 379 (10°), Lm4 = 430 (50°), Lm5 = 503 (90°), Lm6 = 511 (170°), Lm7 = 511 (-170°), and the fused result P(ThetaL | Lm0 … Lm7 Cp_SourceL), sharply peaked at the true source position Theta = 10, Dist = 20]
Bayesian Robot Programming

Specification (preliminary knowledge π)
– Relevant variables X1, …, Xn
• Their range
– Decomposition
• P(X1 ∧ … ∧ Xn) as a product of simple terms
• Dependencies and conditional independence hypotheses
– Parametric forms
• For all terms

Identification
– Learning (with data δ) or a priori programming

Utilization
– P(searched variables | known variables ∧ π ∧ δ)
• General probabilistic inference engine
Contents

• Bibliography
• Bayesian Robot Programming
– Example and definition
– Putting bayesian programs together
• Bayesian maps
– Definition and example
– Putting bayesian maps together
Putting descriptions together

• Bayesian fusion
– Probabilistic subroutine call
• Bayesian program combination
– Probabilistic “if - then - else”
• Bayesian program sequencing
– Probabilistic “;”
• Bayesian program iteration
– Probabilistic “loop”
• Using functions
Scaling up: “The Nightwatchman Khepera”

• Complex behaviour (42 variables, 4 hierarchical levels, …)
• Space is represented, but there is no explicit map

[The slide shows the full decomposition of the joint over Vrot, Vtrans, px0 … px7, lm0 … lm7, veille, feu, obj?, eng, tach_t-1, td_t-1, tempo, tour, dir, prox, dirG, proxG, vtrans_c, dnv, mnv, mld, per as a hierarchy of sub-descriptions Cp_Surveil, Cp_TypDépl, Cp_Tach, Cp_DétectBase, Cp_SL and Cp_PhotoEvit, each contributing a question such as P(ThetaL | C4), P(Base | C1), P(Tach | C2), P(Td | C3), P(H | C5) and P(Vrot Vtrans | C6)]

“The Nightwatchman Khepera”
Contents

• Bibliography
• Bayesian Robot Programming
– Example and definition
– Putting bayesian programs together
• Bayesian maps
– Definition and example
– Putting bayesian maps together

What is navigation? What is a location? What is a map?
Bayesian Map: definition

Specification
– Relevant variables:
• P: perception variable
• Lt: location at time t
• Lt’: location at time t’ (t’ > t)
• A: action variable
– Decomposition: any (e.g. Markov Localization)
– Parametric forms: any

Identification: any

Utilization
– Localization P(Lt | P)   (What is localization?)
– Prediction P(Lt’ | A Lt)
– Control P(A | Lt Lt’)   (What is planning / navigation based on?)
– any

What is a location? What is a map?
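Any joint distribution over (P, Lt, Lt’, A) can answer the three questions above by marginalization; a toy discrete sketch, with all numbers illustrative:

```python
from itertools import product

# toy joint over (P, Lt, Lt2, A); all numbers illustrative
P_VALS, L_VALS, A_VALS = ("near", "far"), ("corner", "wall"), ("stop", "go")
sensor = {("near", "corner"): 0.9, ("far", "corner"): 0.1,
          ("near", "wall"): 0.2, ("far", "wall"): 0.8}
trans = {("corner", "stop", "corner"): 0.95, ("corner", "stop", "wall"): 0.05,
         ("corner", "go", "corner"): 0.2, ("corner", "go", "wall"): 0.8,
         ("wall", "stop", "wall"): 0.95, ("wall", "stop", "corner"): 0.05,
         ("wall", "go", "wall"): 0.4, ("wall", "go", "corner"): 0.6}

def joint(p, lt, lt2, a):
    """P(P Lt Lt' A) = P(Lt) P(A) P(P | Lt) P(Lt' | Lt A), uniform priors."""
    return 0.25 * sensor[(p, lt)] * trans[(lt, a, lt2)]

def ask(searched_idx, known):
    """Answer any question on the map by marginalizing the joint."""
    domains = (P_VALS, L_VALS, L_VALS, A_VALS)
    post = {}
    for vals in product(*domains):
        if all(vals[i] == v for i, v in known.items()):
            key = tuple(vals[i] for i in searched_idx)
            post[key] = post.get(key, 0.0) + joint(*vals)
    z = sum(post.values())
    return {k: v / z for k, v in post.items()}

loc = ask([1], {0: "near"})                 # localization  P(Lt | P)
ctl = ask([3], {1: "wall", 2: "corner"})    # control       P(A | Lt Lt')
```

The same `ask` routine answers localization, prediction, control, or any other question on the map, which is what the "any question" clause of the definition means.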
Example (1/3): variables

• Map of a room based on the proximeters
– P: Px0 ∧ … ∧ Px7
• Relevant features: corners, walls, empty space
– Lt: Sit_t = {corner, wall, empty-space}
– Lt’: Sit_t+Δt
• Motor commands:
– A: Beh = {Stop, Straight, FollowWall, QuitCorner, …}

[2D visualization of the bayesian map: corner, wall, empty regions]

What is a location?
Example (2/3): induced graph

• Can be extracted from a bayesian map
– Can then compute on it:
• diameter, connexity, …
• Can help build a bayesian map
– Comes more intuitively
– Helps talking about the bayesian map
– Nicer than…

[“Graph form” visualization of the bayesian map (excerpt): nodes Corner, Wall, Empty Space; edges labelled FollowW, QuitC, Straight, Stop]

What is planning?
Example (3/3): bayesian map localization module

Specification
– Relevant variables
• Sit_t: {wall, corner, empty-space}, 3 values
• Px0 … Px7: {0, 1, …, 15}, 16 values
– Decomposition of the joint
• P(Px0 … Px7 Sit_t | CPsit) = P(Sit_t | CPsit) Π_i P(Pxi | Sit_t CPsit)
– Parametric form for each term
• P(Sit_t | CPsit) ← Uniform
• P(Pxi | [Sit=empty-space] CPsit) ← question P(Pxi | CPempty)
• P(Pxi | [Sit=wall] CPsit) ← question P(Pxi | CPwall) = Σ_{J,D} P(Pxi | J D CPwall)
• P(Pxi | [Sit=corner] CPsit) ← question P(Pxi | CPcorner) = Σ_{Pos} P(Pxi | Pos CPcorner)
Contents

• Bibliography
• Bayesian Robot Programming
– Example and definition
– Putting bayesian programs together
• Bayesian maps
– Definition and example
– Putting bayesian maps together
Putting maps together: superposition

[Table: the superposed map combines the proximity-based locations (Corner, Wall, Empty) with light-level locations (VHigh, High, Low, VLow) and light-gradient signs (+, -); e.g. the empty-space location splits into E,VH / E,H / E,L / E,VL]

What is navigation? What is a map?
Putting maps together: juxtaposition (1)

• Abstracting maps on the same internal variable

[Figure: Room1, Room2, Room3 and a corridor, connected by a “Pass corridor” behaviour; “any other behaviour” elsewhere]

What is planning? What is a map?
Putting maps together: juxtaposition (2)

• Other examples:
– maps are floors, connectors are stairways or elevators → the new abstraction is a building
– maps are buildings and streets, connectors are doors → the new abstraction is (the whole) street
Putting maps together: abstraction (1)

• Abstracting maps of different natures
• Map of a wall:
– P: Px0 ∧ … ∧ Px7
– Lt, Lt’: Theta ∧ Dist
– A: Rot ∧ Trans

[Figure: wall frame with Theta = -90 … 90, Dist = 0, 1, 2; behaviours backward, forward, stop]

What is a location?
“Wall”: localization

Specification
– Relevant variables
• J (Theta): {-180, -150, …, +150}, 12 values
• D (Dist): {0, 1, 2}, 3 values
• Px0 … Px7: {0, 1, …, 15}, 16 values
– Decomposition of the joint
• P(J D Px0 … Px7) = P(J D) Π_i P(Pxi | J D)
– Parametric form for each term
• P(J D) ← Uniform
• P(Pxi | J D) ← Gaussians

Identification
– Learning
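Localization with this model amounts to maximizing the product of the eight Gaussian terms. A sketch: the variable ranges are those of the slide, but the sensor geometry and response function are hypothetical.

```python
import math

# slide's ranges: J (Theta) has 12 values step 30, D (Dist) has 3,
# readings Px0..Px7 lie in 0..15; the sensor geometry below is assumed
ANGLES = list(range(-180, 180, 30))
DISTS = [0, 1, 2]

def mu(i, theta, dist):
    """Assumed mean reading of proximeter i (pointing at i*45 deg):
    high when it faces the wall, decreasing with distance."""
    facing = math.cos(math.radians(theta - i * 45))
    return max(0.0, 15.0 * facing * (1.0 - dist / 3.0))

def loglik(px, theta, dist, sigma=2.0):
    """log P(Px0..Px7 | J D): sum of the 8 gaussian log-terms."""
    return sum(-((px[i] - mu(i, theta, dist)) ** 2) / (2 * sigma ** 2)
               for i in range(8))

def localize(px):
    """MAP answer to the question P(J D | Px0 ... Px7), uniform prior."""
    return max(((t, d) for t in ANGLES for d in DISTS),
               key=lambda td: loglik(px, *td))

# simulate readings for a wall at Theta = -90, Dist = 1, then relocalize
true_px = [round(mu(i, -90, 1)) for i in range(8)]
estimate = localize(true_px)
```

In the real module the means and variances of P(Pxi | J D) are learned (the identification phase above), rather than given by a formula as here.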
“Wall”: control

Specification
– Relevant variables
• J: {-180, -150, …, +150}, 12 values; D: {0, 1, 2}, 3 values
• Beh: {FollowWall, Away-from-wall}, 2 values
• Vrot, Vtrans
– Decomposition of the joint
• P(J D Beh Vrot Vtrans | CPwall-control)
= P(J D Beh | CPwall-control) × P(Vrot Vtrans | J D Beh CPwall-control)
– Parametric form for each term
• P(J D Beh | CPwall-control) ← Uniform
• P(Vrot Vtrans | J D Beh CPwall-control) ← G_{J,D,Beh}(Vrot, Vtrans)
“Wall” bayesian map

Specification
– Relevant variables
• P: Px0 … Px7 (16 values each)
• Lt: J: {-180, -150, …, +150}, 12 values; D: {0, 1, 2}, 3 values
• Lt’: Beh: {stop, followW, away-from-wall}, 3 values
• A: Vrot, Vtrans
– Decomposition of the joint
• P(Px0 … Px7 J D Beh Vrot Vtrans | CPwall)
= P(Px0 … Px7 Beh | CPwall)
× P(J D | Px0 … Px7 CPwall)
× P(Vrot Vtrans | J D Beh CPwall)
– Parametric form for each term
• P(Px0 … Px7 Beh | CPwall) ← Uniform
• P(J D | Px0 … Px7 CPwall) ← question P(J D | Px0 … Px7 CPwall-loc)
• P(Vrot Vtrans | J D Beh CPwall) ← question P(Vrot Vtrans | J D Beh CPwall-control)
Putting maps together: abstraction (2)

• Map of a corner:
– Lt: Pos = {FrontL, FrontR, …}
• Map of the empty space:
– Lt: ∅
• New abstraction:
– Lt: Sit = {corner, wall, empty}

What is a location?
Loose ends

• Small state spaces
– Necessary for planning; sufficient for most tasks?
• Exploding Lt into Lt1 ∧ … ∧ Ltn
• Planning
– Iteration of the map on P(At At+1 … At+h | Lt Lt+h)
• Learning
– Given the location variable, learn (parts of) the map: easy
– Select a location variable out of several: maybe
– Find a relevant location variable out of the blue: hard!
• Time constant
– Estimate it from the graph diameter
Conclusion

• New framework for (bayesian) mapping and navigating
– Intuitively appealing
– Integrates hierarchies
– Unified formalism
• Where’s the killer experiment?
• Is it biologically plausible?
– Translation of existing models into our scheme?
– Prediction of new data?
Obstacle Avoidance Using Proscriptive Programming
Cycab Vehicle Experiment Description
Cedric Pradalier
Carla Koike
PhD Students
Sharp − GRAVIR/IMAG − CNRS
Motivation

• First practical experiment using a Bayesian program on the Cycab
• Combination of obstacle avoidance and other tasks in a hierarchical manner
Presentation Objectives

• Show several ways to implement obstacle avoidance
• Show how to fuse proscriptive commands with reference values
• Present an example of the incremental design cycle in robotic applications using Bayesian Programming
Contents

• Introduction
• Cycab Vehicle Description and Problem Context
• Proposed Solutions:
1. Zone weighting (prescriptive)
2. Zone fusion (prescriptive)
3. Zone fusion (proscriptive)
• Incremental Design Aspects
• Discussion and Conclusions
Proscriptive Programming
" Prescriptive versus Proscriptive
� Prescriptive tells you what you have to do
� Proscriptive tells you what you cannot do
" Some situations are better modelled in a way or other� Phototaxis is tipically prescriptive
� Obstacle avoidance is easily modelled as proscriptive
5
Prescriptive and Proscriptive Programming
6
Proscriptive in Bayesian Programming

• Bayesian Programming is a very useful tool for creating proscriptive models
– Permission is given by high probabilities
– Interdictions are modelled as low probabilities
• Fusing different command propositions yields a trade-off between desired values and allowed values
Command Fusion

• High-level tasks determine desired (deterministic or probabilistic) values for the command variables
• Obstacle avoidance fuses these desired values with the allowed values in the presence of obstacles

[Diagram: High Level Task and Obstacle Avoidance feed a Probabilistic Fusion stage]
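The trade-off can be sketched with two distributions over the six speed values: the task's desired distribution and the avoidance module's "permission" distribution; their normalized product is the fused command. All numbers below are illustrative.

```python
# distributions over the 6 speed values V = 0..5 (illustrative numbers)
desired = [0.02, 0.03, 0.05, 0.10, 0.30, 0.50]    # high-level task: go fast
permitted = [1.00, 1.00, 1.00, 0.60, 0.10, 0.01]  # avoidance: fast is forbidden

fused = [d * p for d, p in zip(desired, permitted)]
z = sum(fused)
fused = [f / z for f in fused]    # normalized trade-off
best = fused.index(max(fused))    # compromise speed
```

Note that the fused mode is neither the task's preferred speed (5) nor the slowest one: the low "permission" values veto unsafe speeds without dictating a positive choice, which is the proscriptive idea.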
Contents

• Introduction
• Cycab Vehicle Description and Problem Context
• Proposed Solutions:
1. Zone weighting (prescriptive)
2. Zone fusion (prescriptive)
3. Zone fusion (proscriptive)
• Incremental Design Aspects
• Discussion and Conclusions
Cycab Vehicle

• Electric vehicle
• Sensor
– Sick laser scanner
• Vehicle control
– Wheel speed
– Steering angle
Problem Description

• Obstacles are static. If some dynamic obstacles exist, they move at slow speed (a walking person or a manoeuvring car)
• The Cycab shall avoid the obstacles, keeping when possible the desired translation speed and steering angle
• We define a highly dangerous elliptic area in front of the Cycab: any object in this area MUST impose a zero speed
Sensor Variables

• Sensor reading values Di
– Signal preprocessing creates 8 zones in front of the vehicle
– Only the lowest distance in each zone is kept
– Reading values between 0 and 8191 are scaled to 0-200 (obstacle distance between 0 and 2000 cm)

[Diagram: Zone 1 … Zone 8 fanned out in front of the vehicle; D6 is the reading of zone 6]
Motor Variables

• Control values
– Vehicle translation speed V is discretized in 6 values: V = 0..5, a discretization of 0..Vmax. Always positive, since there is only a front sensor.
– Steering angle φ can take 11 values, −5..+5, a discretization of φmin..φmax.
Contents

• Introduction
• Cycab Vehicle Description and Problem Context
• Proposed Solutions
1. Zone weighting (prescriptive)
2. Zone fusion (prescriptive)
3. Zone fusion (proscriptive)
• Incremental Design Aspects
• Discussion and Conclusions
First Version − Zone Weighting

• Each zone proposes probability distributions over V and Φ so as to avoid the nearest obstacle seen in that zone
• A variable H indicates the weight of each zone's proposal in the final values of V and Φ
Zone Model Description
Zone Weighting Description

Program Utilization (First Version)
Zone Weighting = Combination of Descriptions

P(V | D1 … D8) = Σ_{i=1..8} P([H=i] | Di) × Pi(V | Di)

[Diagram: each zone i produces Pi(V | Di) from its reading Di, multiplied by its weight P([H=i] | Di); the eight weighted proposals are summed]
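The weighting scheme is a mixture: each zone's proposal Pi(V | Di) enters with a weight derived from fi(Di). The shapes of the proposals and of the weighting function below are assumptions, not the authors' actual functions.

```python
V_VALS = range(6)   # V = 0..5, as in the slides

def zone_proposal(d):
    """P_i(V | D_i): prefer speeds up to a distance-dependent cap
    (assumed shape)."""
    v_safe = min(5, d // 34)
    w = [1.0 if v <= v_safe else 0.05 for v in V_VALS]
    z = sum(w)
    return [x / z for x in w]

def zone_weight(d):
    """f_i(D_i): nearer obstacles get a larger weight (assumed shape)."""
    return 1.05 - d / 200.0

def weighted_combination(ds):
    """P(V | D1..D8) = sum_i P([H=i] | D_i) * P_i(V | D_i), normalized."""
    ws = [zone_weight(d) for d in ds]
    z = sum(ws)
    out = [0.0] * 6
    for w, d in zip(ws, ds):
        for v, p in enumerate(zone_proposal(d)):
            out[v] += (w / z) * p
    return out

# one close obstacle (zone 3 at distance 30/200), the other zones clear
p = weighted_combination([200, 200, 30, 200, 200, 200, 200, 200])
```

The result shifts probability toward low speeds when any zone reports a near obstacle, but the outcome depends directly on how the fi are tuned, which is the weakness the first-version results point out.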
Results and Comments (First Version)

• Results are coherent but depend a lot on the functions fi(Di), which are not easy to adjust
Second Version − Zone Command Fusion

• The standard deviations of Pi(V | Di) and Pi(φ | Di) are not taken into account
– They carry information redundant with fi(Di)
• Command fusion can use the standard deviations of Pi(V | Di) and Pi(φ | Di) as weights
Zone Model Description

Specification
– Variables
– Decomposition
– Parametric forms

Identification
– A priori
V and φ Distribution Building
Command Fusion Description

Specification
– Variables:
D1, …, D8: 0…200, 201 values
V: 0…5, 6 values
φ: −5…5, 11 values
– Decomposition:
P(V ⊗ φ ⊗ D1 ⊗ … ⊗ D8) = P(V ⊗ φ) × Π_{i=1..8} Pi(Di | V ⊗ φ)
– Parametric forms:
P(V ⊗ φ) = Uniform
Pi(Di | V ⊗ φ) = (1/Z) { Pi(V | Di) × Pi(φ | Di) }   (question to the zone model)

Identification
– A priori

Utilization
P(V ⊗ φ | D1 … D8) = (1/Z) { Π_{i=1..8} Pi(Di | V ⊗ φ) }
Command Fusion = Composition of Descriptions

P(V ⊗ φ | D1 … D8) = (1/Z) { Π_{i=1..8} Pi(Di | V ⊗ φ) }

[Diagram: each reading Di yields a likelihood Pi(Di | V φ); the eight likelihoods are multiplied into P(V φ | D1 … D8)]
Results and Comments (Second Version)

• Results are similar to the previous version, but the parameters are easier to adjust
• Transitions in the speed and steering angle are smoother
Third Version − Proscriptive Programming

• The curves of Pi(V | Di) and Pi(φ | Di) change so as to indicate the prohibited behaviour when an obstacle is identified in a zone
• The command fusion is the same as in the previous version
• When fusing probability curves, a uniform distribution adds no information
Zone Model Description (Bayesian program: Specification − Variables, Decomposition, Parametric Forms, A priori; Identification)
Main Difference between Versions 2 and 3
Version 3: what is allowed / safe
Command Fusion Description

Specification
• Variables:
  D1, ..., D8 : [0...200], 201 values
  V, Vc : [0...5], 6 values
  φ, φc : [−5...5], 11 values
• Decomposition:
  P(Vc V φc φ D1 ... D8) = P(Vc) P(φc) P(V | Vc) P(φ | φc) ∏_{i=1}^{8} Pi(Di | V φ)
• Parametric forms:
  P(Vc) = P(φc) = Unknown
  P(V | Vc) = G(μv(Vc), σv)(V)
  P(φ | φc) = G(μφ(φc), σφ)(φ)
  Pi(Di | V φ) = (1/Z) Pi(V | Di) · Pi(φ | Di)   (question to the zone descriptions)

Identification (a priori)
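A sketch of how the reference values enter the fusion in this version: Gaussian draws towards the commanded (Vc, φc) multiply the proscriptive zone terms. The grids, σ values and zone-term array below are illustrative assumptions, not the actual Cycab parameters:

```python
import numpy as np

V = np.arange(6)        # speeds 0..5
PHI = np.arange(-5, 6)  # steering angles -5..5

def fuse_with_reference(zone_terms, vc, phic, sigma_v=1.0, sigma_phi=1.5):
    """P(V, φ | Vc, φc, D1..D8) ∝ P(V|Vc) P(φ|φc) ∏_i Pi(Di | V, φ).

    zone_terms: (8, 6, 11) array of Pi(Di | V, φ) evaluated at the readings;
    vc, phic: reference speed and steering angle."""
    pull_v = np.exp(-0.5 * ((V - vc) / sigma_v) ** 2)
    pull_phi = np.exp(-0.5 * ((PHI - phic) / sigma_phi) ** 2)
    joint = np.outer(pull_v, pull_phi) * zone_terms.prod(axis=0)
    return joint / joint.sum()
```

With no obstacle (all zone terms uniform), the result simply follows the reference command; a proscriptive zone term near zero vetoes the commands it prohibits.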
Results and Comments − Third Version
Prescriptive: the safest speed is weighted more; the resulting speed is greater than the safest speed.
Proscriptive: the resulting speed is the safest speed; the safest speed is the strongest constraint.
Example of Fusion Result
Situation:
• Two near objects, one on each side: zones 2 and 7
• One object, farther away, in front
Objective: go as fast as possible in a straight line
Reasonable behaviour:
• The vehicle should go straight forward
• It may turn slowly
Summary Table

            Version 1            Version 2            Version 3
Mode        Prescriptive         Prescriptive         Proscriptive
Fusion      Zone Weighting −     Command Fusion −     Command Fusion −
            Combination          Composition          Composition
Values      Reference Values     No Reference Values  No Reference Values
Videos − Third Version
(media clips)
Contents" Introduction
" Cycab Vehicle Description and Problem Context
" Proposed Solutions:
1. Zone weigthing (prescriptive)2. Zone fusion (prescriptive)3. Zone fusion (proscriptive)
" Incremental Design Aspects
" Discussion and Conclusions
36
Incremental Design
• New tasks can be added incrementally and composed with the obstacle avoidance task
• The tasks can supply reference values: deterministic or probabilistic, prescriptive or proscriptive
• Hierarchical combination of tasks: reference values can result from the combination or composition of other tasks
Contents" Introduction
" Cycab Vehicle Description and Problem Context
" Proposed Solutions:
1. Zone weigthing (prescriptive)2. Zone fusion (prescriptive)3. Zone fusion (proscriptive)
" Incremental Design Aspects
" Conclusions and Improvements
38
Comments and Conclusions" Obstacle avoidance task and Bayesian
Programming" Proscriptive programming as a way to
� increase modularity
� model security rules
" Basis for Incremental Design of more complex systems
" One or more versions can be implemented in BibaBot
39
Some Improvements...
• Implement the fourth version
• Implement a wall-following task to supply the reference values of speed and steering angle
• Try different approaches for each zone:
  proscriptive when possible, prescriptive when necessary;
  different joint distributions depending on the importance of the dependence between speed and angle
Thank you!
Questions, Suggestions, Comments, ...
DEA I.S.C.
Bayesian learning by imitation (within the European project BIBA, Bayesian Inspired Brain and Artefacts)
Frédéric Raspail
Supervisor: Pierre Bessière − Laboratoire GRAVIR, SHARP project
19 June 2002
Introduction
• Imitation: an ill-defined concept
• Definition: "Imitation is the process by which the imitator learns some characteristics of the model's behaviour"
• Present in the animal world: education, propagation of behaviours
• Examples: a duckling and its mother; tits
• Future interest for robotics
• Problem: having an internal model of the imitated agent
Introduction
Experiments:
• 1: gradient ascent (towards the source)
• 2: ascent towards one source among several ⇒ perform imitation in a simple way
• 3: ascent towards one source, then towards another ⇒ recognise what the imitated agent is doing
Outline
• Experimental environment
• Bayesian Robot Programming
• Experiment 1
• Experiment 2
• Experiment 3
• Conclusion and perspectives
Experimental environment: Robot
• Koala robot
• Camera with 2 degrees of freedom
• Vrot, Vtrans ∈ [−64...63]
• "Nose": simulated odour sources
• Odeur ∈ [−10...+10]
• Visual tracking (DEA IVR, Coué 2000)
Experimental environment: Setting up imitation
• Learning: ascent towards a source
• Imitating by means of visual tracking
Experimental environment: Setting up imitation
Learning phase
Behaviour: ascent towards a source
Playback of the programmed behaviour
Programme "ascent towards a source"
• Fixed a priori
• Tele-operation
• Imitation (experiment 1)
Question: P(Vrot | [Odeur=o])

Specification
• Variables:
  Odeur : [−10..+10], 21 values
  Vrot : [−64..+63], 128 values
• Decomposition:
  P(Odeur Vrot) = P(Odeur) P(Vrot | Odeur)
• Parametric forms:
  P(Odeur) → Uniform (1)
  P(Vrot | Odeur) → Gaussians (21)

Identification − Utilisation
Experiment 1: Procedure
• Pairs <odeure, vrote>
• One corpus / one learning run:
  one initial position, several initial orientations, the same movement repeated several times
• 3 successive learning runs
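The identification step suggested by this procedure — fitting one Gaussian P(Vrot | Odeur=o) per observed odour value from the imitation pairs — can be sketched as follows; this is a minimal assumption about how the corpus is used, not the actual implementation:

```python
import numpy as np
from collections import defaultdict

def learn_gaussians(pairs):
    """pairs: iterable of (odeur, vrot) examples gathered by imitation.
    Returns one (mean, std) per observed odour value."""
    by_odour = defaultdict(list)
    for o, v in pairs:
        by_odour[o].append(v)
    # small floor on std so a single example still yields a usable Gaussian
    return {o: (float(np.mean(vs)), float(np.std(vs)) + 1e-6)
            for o, vs in by_odour.items()}
```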
Experiment 1: Results
Playback phase (result figures)
Experiment 2: Description
• Learning of the means and standard deviations by imitation
• Three descriptions, one per odour source (k = 1, 2, 3):

Specification
• Variables:
  Odeurk : [−10..+10], 21 values
  Vrot : [−64..+63], 128 values
• Decomposition:
  P(Odeurk Vrot) = P(Odeurk) P(Vrot | Odeurk)
• Parametric forms:
  P(Odeurk) → Uniform (1)
  P(Vrot | Odeurk) → Gaussians (21)
Question: P(Vrot | [Odeurk=o])
Experiment 2: Procedure & results
• One corpus / one learning run:
  several initial positions, several initial orientations, the same movement repeated several times
• Triples of pairs: <odeure1, vrote>, <odeure2, vrote>, <odeure3, vrote>
(result figures)
Experiment 3: Presentation
Learning phase
Experiment 3: Description
Question: P(Vrot | [Fuite=f] [Odeur2=o2] [Odeur1=o1] C-comb)

Specification
• Variables:
  Odeur1, Odeur2, Vrot
  H : {1, 2}, 2 values
  Fuite : {1, 2}, 2 values
• Decomposition:
  P(Odeur1 Odeur2 Fuite H Vrot) = P(Odeur1) P(Odeur2) P(Fuite) P(H | Fuite) P(Vrot | H Odeur2 Odeur1)
• Parametric forms:
  P(Odeur1), P(Odeur2), P(Fuite) → Uniform
  P(H | Fuite) → Laplace
  P(Vrot | [H=1] Odeur2 Odeur1) = P(Vrot | Odeur1 C-rem1)
  P(Vrot | [H=2] Odeur2 Odeur1) = P(Vrot | Odeur2 C-rem1)

Identification
• P(H | Fuite) learned by imitation
Experiment 3: Identification of P(H | Fuite)
• Imitation: 4-tuples <o1e, o2e, fe, ve>
• Question:
  P(H | [Odeur1=o1e] [Odeur2=o2e] [Fuite=fe] [Vrot=ve])
• Pair <fe, h>

P(H | Odeur1 Odeur2 Fuite Vrot) = P(Vrot | Odeur1 Odeur2 Fuite H) / Σ_H P(Vrot | Odeur1 Odeur2 Fuite H)

P([H=1] | Odeur1 Odeur2 Fuite Vrot) = P(Vrot | Odeur1 C-rem1) / ( P(Vrot | Odeur1 C-rem1) + P(Vrot | Odeur2 C-rem1) )
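A small numeric sketch of the posterior above, assuming the two source-following descriptions give Gaussian distributions over Vrot; the means and standard deviations here are hypothetical, not the learned ones:

```python
import math

def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posterior_h1(vrot, mu1, sigma1, mu2, sigma2):
    """P([H=1] | vrot) = P(vrot | Odeur1, C-rem1) /
       (P(vrot | Odeur1, C-rem1) + P(vrot | Odeur2, C-rem1))."""
    p1 = gauss_pdf(vrot, mu1, sigma1)  # likelihood under "follows source 1"
    p2 = gauss_pdf(vrot, mu2, sigma2)  # likelihood under "follows source 2"
    return p1 / (p1 + p2)
```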
Experiment 3: Procedure & results
Corpus construction:
• one initial position
• several initial orientations
• the same movement repeated several times
P([H=h] | [Fuite=f]), first learned table:
        h = 1    h = 2
f = 1   0.729    0.271
f = 2   0.184    0.816

P([H=h] | [Fuite=f]), second learned table:
        h = 1    h = 2
f = 1   0.465    0.535
f = 2   0.327    0.673
Conclusion
• Experiments 1 & 2: simple imitation: conclusive results
• Experiment 3: the imitator recognises the behaviour it is imitating: conclusive results in one configuration. A first step towards recognising what the other agent is doing.
Perspectives
• Technical:
  • assistance to the learning process
  • adapt the visual tracking
  • test the robustness of the behaviours
  • complete the corpora
  • learn more complex behaviours
• Longer term:
  • "training" of the BIBA robot by a human (heel-following)
  • propagation of behaviours (Koala to Koala)
  • recognition of behaviours
9/07/02

Bayesian programming of video game characters
Ronan Le Hy, DEA Sciences Cognitives, 2001-2002
Supervisors: Pierre Bessière, Olivier Lebeltel
Idea
A new programming method for the bots in video games
Objectives
For the development team:
• ease of programming
• limited computation time
• separation between programming and the design of the character's behaviour
• ease of programming different behaviours
For the player:
• "humanity"
• teaching bots how to play
Outline
• Context: objectives, platform, concrete objective
• Bayesian model
• Programming a behaviour
• Learning
• Conclusion
Context: platform
A bot in Unreal Tournament, through Gamebots (ISI & CMU)
Messages: position, speed, health, visible characters, ...
Orders: go to a point, shoot, ...
Context: concrete objective
Loop: read the sensor values; choose a new state; act
States: weapon search, health-bonus search, flight, attack, exploration, danger detection
Formally: knowing the current state Et and the sensory variables Vi, decide the new state Et+1
Bayesian robot programming
Structure of a Bayesian program
Bayesian model: relevant variables
• Bot states: Et, Et+1 ∈ {Attaque, RechercheArme, RechercheVie, Exploration, Fuite, DetectionDanger}
• Sensory variables: Vie, Arme, ArmeAdversaire, Bruit, NombreEnnemis, ProxArme, ProxSanté
• Motor variables
Bayesian model: decomposition
P(Et Et+1 V A Ad B Ne Pa Ps)
= P(Et) P(Et+1 | Et) P(V | Et Et+1) P(A | V Et Et+1) P(Ad | A V Et Et+1) P(B | Ad A V Et Et+1) P(Ne | B Ad A V Et Et+1) P(Pa | Ne B Ad A V Et Et+1) P(Ps | Pa Ne B Ad A V Et Et+1)
Hypothesis: the sensory variables and Et are pairwise independent given Et+1
= P(Et) P(Et+1 | Et) P(V | Et+1) P(A | Et+1) P(Ad | Et+1) P(B | Et+1) P(Ne | Et+1) P(Pa | Et+1) P(Ps | Et+1)
Parametric forms
• P(Et): unknown (not specified)
• P(Et+1 | Et): table
• P(Vsensory | Et+1): tables
Identification: by hand, or by learning
Bayesian model: question
What is the next state, given the current state and the sensory variables? P(Et+1 | Et V A Ad B Ne Pa Ps)
Resolution: P(Et+1 | Et V A Ad B Ne Pa Ps) ∝ P(Et+1 | Et) P(V | Et+1) P(A | Et+1) P(Ad | Et+1) P(B | Et+1) P(Ne | Et+1) P(Pa | Et+1) P(Ps | Et+1)
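The resolution above (next-state posterior proportional to the transition prior times one likelihood column per observed sensor) can be sketched as follows; the table contents and variable names are hypothetical stand-ins, not the actual bot's tables:

```python
import numpy as np

STATES = ["Attaque", "RechercheArme", "RechercheVie",
          "Exploration", "Fuite", "DetectionDanger"]

def next_state_dist(transition, sensor_tables, et, readings):
    """P(Et+1 | Et=et, readings) ∝ P(Et+1|Et=et) · ∏_k P(reading_k | Et+1).

    transition: (6, 6) table, row et = P(Et+1 | Et=et);
    sensor_tables: dict var -> (n_values, 6) table P(var | Et+1);
    readings: dict var -> observed value index."""
    p = transition[et].copy()
    for var, val in readings.items():
        p = p * sensor_tables[var][val]  # one entry per candidate Et+1
    return p / p.sum()
```

Note how a zero in a sensor table vetoes a state, as in the inverse-programming example: if P(Arme=Aucune | Et+1=Attaque) = 0, then Attaque gets posterior probability 0 whenever Arme=Aucune is observed.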
Inverse programming: ease of programming
Classical programming: knowing Et and V1, V2, ..., give Et+1; for |Et| = 6 transitions from a state, partition the sensory space of size |V| = 648 into 6 sets; one transition = one logical formula
Inverse programming: knowing Et+1, give V1, V2, ...; for 7 sensory variables and |Et| = 6 states (i.e. 42 cases), give a distribution; one distribution = one table
14
programmation inverse (2)
si P(Arme=Aucune | Et+1 = Attaque) = 0,
et si Arme = Aucune,
le bot ne peut pas passer en Attaque :
P(Et+1 = Attaque | … [Arme=Aucune] …) = 0
chaque distribution élémentaire contribue à laquestion, qui est une forme de moyenne géométrique
complexité maitrisée : linéaire dans le nombre d’états
linéaire dans le nombre de variables
Ease of development
A form of specification of the automaton that is more condensed, powerful, and cheap in computation time
Programming a behaviour: writing the tables

P(Et+1 | Et): self-maintenance
(states: A = Attaque, RA = RechercheArme, RV = RechercheVie, Ex = Exploration, Fu = Fuite, DD = DetectionDanger; each state is kept with probability 0.95, with probability 0.01 for each transition to another state)

Et+1 \ Et   A      RA     RV     Ex     Fu     DD
A           0.95   0.01   0.01   0.01   0.01   0.01
RA          0.01   0.95   0.01   0.01   0.01   0.01
RV          0.01   0.01   0.95   0.01   0.01   0.01
Ex          0.01   0.01   0.01   0.95   0.01   0.01
Fu          0.01   0.01   0.01   0.01   0.95   0.01
DD          0.01   0.01   0.01   0.01   0.01   0.95

P(Vie | Et+1): management of the health level

Vie \ Et+1  DD     Fu     Ex     RV     RA     A
Haut        0.45   0.1    0.45   0.001  0.45   0.899
Moyen       0.45   0.2    0.45   0.01   0.45   0.1
Bas         0.1    0.7    0.1    0.989  0.1    0.001

P(NombreEnnemis | Et+1): risk management
Specification by hand
• Separation of the development and the design of the character
• Behaviours as data
Programming a second behaviour: adjustability
A second, aggressive behaviour
Programming a behaviour
(films: dodge and berserk)
Learning: teaching by the player
Interface
Tables
Humanity
A subjective criterion; however, a form of Turing test
Conclusion
For the development team:
• limited computation time
• ease of programming
• separation between programming and the design of the character
• ease of programming different behaviours
For the player:
• "humanity"
• teaching the game to bots
Perspectives
• Go further down with the Bayesian approach
• Integrate deliberative aspects into the model
• New learning schemes
Gamebots (1): sensors
Synchronous and asynchronous messages
• Character: id, name, team; position (rotation, location), speed; health, armour, life level; weapon, ammunition
• Other visible characters: id, name, team; position (rotation, location), speed, reachability; weapon, whether firing
• Environment in the field of view: navigation nodes (id, location, reachability); doors, lifts (id, location, reachability, type); objects (id, location, reachability, type)
• Game: scores, flag capture
• Events: object picked up; feet, head or body changing zone (water, lava, ...); weapon change (automatic or requested); collision with a wall, an object or a player; fall; death, injury; death or injury inflicted by oneself; noise (footsteps, lift, shots, object picked up); projectile heading towards oneself; answer to a path or reachability request; message from another player (text or typed)
Gamebots (2): commands
• Movement: walk or run towards a point, a player, an object, a navigation node, ...; run towards a point while facing a point/object (strafe); turn towards a point/object or by a given angle; stop; jump
• Weapons: start firing; stop firing; change weapon
• Requests: path to a point/object; reachability of a point/object
• Message to the other players
Learning issues discussion
Moderated by Jean Laurens and Frédéric Davesne, LPPA
BIBA Summer School, Moudon, 30 June - 5 July 2002
BIBA Summer School - Moudon - July 2nd 2002
Introduction
When is learning useful?
• Learning techniques are used when uncertainty is encountered: lack of prior knowledge
But...
• Each learning technique needs the master to give prior knowledge
Questions about this prior knowledge:
• Feasibility? Certainty? Accuracy?
The core of the discussion:
• What about raising these questions within the Bayesian framework?
Overview
1 - What do we mean by "Learning"?
• Why use learning?
• Short overview of some learning paradigms
• Learning issues (open to discussion)
2 - Learning within the Bayesian framework
• A supervised learning example
• A latent learning example
Why use learning techniques?
Causes of uncertainty:
• Inability to model the "world":
  the environment is unknown or not constrained enough;
  the model of the sensors and/or the effectors is unknown;
  the interaction between the system and its environment is unknown or not constrained enough
• Gaps in the computing process:
  how to reach a goal or achieve a behaviour?
  how to produce useful and reliable data?
The technical side
Perceptual issue:
• the environment is unknown or not constrained enough
• the model of the sensors and/or the actuators is unknown
• the interaction between the system and its environment is unknown or not constrained enough
• how to produce useful and reliable data?
Procedural issue:
• how to reach a goal or achieve a behaviour?
Learning paradigms
• Supervised
• Reinforcement
• Unsupervised
• Latent
Example
(figure: a robot moves towards a target, guided by a visual landmark, at distance Dist from the target)
Sensory variables: landmark perception
Motor variable: "this way!"
Behavioural variable: d(Dist)/dt < 0 → target reaching
Learning paradigms: supervised
Multi-layered neural networks with back-propagation
Hypothesis about prior knowledge:
• Each example is a functional and meaningful relation between input and output data
• Uncertainty = inaccuracy
Learning = best interpolation
(diagram: a set of examples feeds the neural network)
Learning paradigms: reinforcement
AHC or Q-Learning-like methods
Hypothesis about prior knowledge:
• The states are perfectly designed: uncertainty = inaccuracy for the perceptual issue; MDP or POMDP
• The reinforcement value is relevant and perfectly known
• Knowledge about the probability of finding a first solution (relevant internal parameters)
Learning = maximising expected rewards
(diagram: the robot acts on the environment and receives state and reward)
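For concreteness, a minimal tabular Q-learning update, the classic instance of the "maximise expected rewards" paradigm sketched above; the states, actions and rewards here are toy stand-ins:

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: move Q[s, a] towards the target
    r + gamma * max_a' Q[s_next, a']."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    return Q
```

Note how the hypotheses listed above are baked into this update: s and s_next must be well-designed discrete states, and r must be a relevant, perfectly known reinforcement value.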
Learning paradigms: unsupervised
Clustering methods (SOM)
Hypothesis about prior knowledge:
• Unitisation and differentiation postulates (what makes a state exist and be unique?)
• May lead to a supervised learning problem
Learning = fit a probability distribution
(diagram: signals feed the system state)
Learning paradigms: latent [Seward 1949]
Some relevant information can be learned without any conditioning process
• Goal = anticipation - ACS [Stolzmann 1998]
Hypothesis about prior knowledge:
• The states or the symbols used by the system are perfectly designed
Learning = anticipate as well as possible
Learning issues
• Structure of the model
• Curse of dimensionality
• Meaningfulness of the variables
• Specification of the prior knowledge
• Uncertainty about the relevance of the prior knowledge given by the master
Learning issues: structure of the model
Curse of dimensionality: "Almost all the learning techniques are bound to fail if the intrinsic dimensionality of the problem is too big" [Verleysen 2000]
• Input space (supervised, unsupervised): an enormous amount of data is needed
• Search space (reinforcement): an enormous amount of time is needed to discover the goal
WARNING!
• Generally, the intrinsic dimensionality is smaller than the dimensionality of the input space (e.g. a Khepera robot in a structured environment)
Learning issues: structure of the model
Meaningfulness of the variables:
• Hidden variables
• Symbol grounding problem [Harnad 1992]
Learning issues: prior knowledge
Idealistic learning technique:
• Certainty about the reliability of the prior knowledge given by the master, in the robotics context
Consequences of uncertainty about prior knowledge:
• Lack of reliability of the result of the learning process
• Lack of predictability: inability to determine the cause of a failure in a learning process:
  a) learnability of my problem?
  b) correctness of my prior knowledge?
Example: learning the cart-pole balancing problem with reinforcement techniques [Davesne 2002]
Learning issues: prior knowledge
An example of the consequences of wrong prior knowledge:
Cart-pole balancing task with reinforcement learning
The learning seems to be good, but...
Learning issues: prior knowledge
An example of the consequences of wrong prior knowledge [Davesne 2002]:
Cart-pole balancing task with reinforcement learning
If the success criterion is made much stricter, the system fails to learn the task
Initiation of the discussion...
We have shown:
• Some learning paradigms, each of which must be furnished with specific prior knowledge
• Some typical learning issues which must be overcome (if possible)
Now it is time to raise questions:
• With which of these learning paradigms should we associate the Bayesian framework?
• Is Bayesian learning a new learning paradigm?
• What about the hypotheses about the prior knowledge?
• What are the main issues?
A supervised learning example
Behaviour learning by imitation:
• Wall-following, phototaxis, etc. by a Khepera robot
Pattern recognition (WARNING! symbol grounding problem):
• Distinguishing a wall from a corner
An example of latent learning
Bibliography
Davesne, Frédéric (2002). Etude de l'émergence de facultés d'apprentissage fiables et prédictibles d'actions réflexes, à partir de modèles paramétriques soumis à des contraintes internes. Thèse de doctorat, Université d'Evry Val d'Essonne.
Harnad, Steven (1992). Cognition and the symbol grounding problem. Electronic symposium on computation.
Seward, John P. (1949). An Experimental Analysis of Latent Learning. Journal of Experimental Psychology, 39, 177-186.
Stolzmann, Wolfgang (1998). Anticipatory Classifier Systems. In Koza, John R., et al. (eds.), Genetic Programming 1998: Proceedings of the Third Annual Conference, July 22-25, 1998, University of Wisconsin, Madison. San Francisco, CA: Morgan Kaufmann, 658-664.
Verleysen, Michel (2000). Machine learning of high-dimensional data: Local artificial neural networks and the curse of dimensionality. Thèse d'agrégation, Université catholique de Louvain, Belgium.