BIBA Summer School Proceedings (BIBA IST–2001-32115: Bayesian Inspired Brain and Artefacts)


BIBA IST–2001-32115

Bayesian Inspired Brain and Artefacts: Using probabilistic logic to understand brain function and implement life-like behavioural co-ordination

Summer School Proceedings

Deliverable: 5. Workpackage: 6. Month due: July 2002. Contract Start Date: 01.11.2001. Duration: 48 months

Project Co-ordinator: INRIA–UMR-GRAVIR

Partners: CNRS–UMR-GRAVIR; UCL-ARGM; UCAM-DPOL; CNRS-UMR-LPPA; CDF-UMR-LPPA; EPFL; MIT-NSL

Project funded by the European Community under the “Information Society Technologies” Programme (1998-2002)

BIBA SUMMER SCHOOL

30th June – 5th July 2002

Ecole Cantonale d'Agriculture de Grange-Verney, Moudon, Switzerland

Contents

1 Participants

2 Program

3 Sessions

Annex - Presentations

1 Participants

Autonomous Systems Lab (ASL) - Ecole polytechnique fédérale de Lausanne (EPFL)
http://dmtwww.epfl.ch/isr/asl/
Roland SIEGWART, Professor
Dr. Nicola TOMATIS, Engineer
Guy RAMEL, PhD student
Santanu METIA, PhD student

Laplace Group - Institut National de Recherche en Informatique et Automatique (INRIA)
http://www-laplace.imag.fr
Pierre BESSIERE, Researcher
Emmanuel MAZER, Research Director
Olivier AYCARD, Assistant Professor
Hubert ALTHUSER, Research Engineer
Olivier LEBELTEL, Research Engineer
Kamel MEKHNACHA, Research Engineer
Christophe COUE, PhD student
Julien DIARD, PhD student
Carla KOIKE, PhD student
Cédric PRADALIER, PhD student
Francis COLAS, Graduate
Ronan LE HY, Graduate
Frédéric RASPAIL, Graduate
Adriana TAPUS, Graduate
Olivier MALRAIT, Administrative

Laboratoire de Physiologie pour la Perception et l'Action (LPPA) - Collège de France
http://www.college-de-france.fr/chaires/chaire3/index.htm
Alain BERTHOZ, Professor
Frédéric DAVESNE, PostDoc
Jean LAURENS, Graduate

2 Program

Monday 1st July

9h00 – 12h30: Sessions 1 & 4
Session 1 - Biologically plausible probabilistic representation and inference mechanisms

Session 4 - Visual-vestibular interaction in self motion perception and gaze stabilization

14h00 – end of afternoon: Session 3 & PMC
Session 3 - Switching vs Weighting & Sensor selection based (in particular) on contraction theory

Project Management Committee

Tuesday 2nd July

9h00 – 13h00: Session 2
Session 2 - Representation of space, maps and navigation

14h00 – end of afternoon: Sessions 5 & 6
Session 5 - Bayesian Robot programming and modelling

Session 6 - Learning Issues Discussion

Evening: Robots Exhibition Visit

Wednesday 3rd July

9h00 – 12h30: Robot Specification & General Discussion

Afternoon: End

3 Sessions

1 - Biologically plausible probabilistic representation and inference mechanisms

Session length: 2h (presentations + discussion)

* Analytical and Bayesian Models of Head Direction Cells, Francis Colas, Pierre Bessière, 30 mn

* NeuroKhepera: One Step towards Neural Implementation of Bayesian Inference, Jean Laurens, 30 mn

2 - Representation of space, maps and navigation

Session length: 4h (presentations + discussion)

* Bayesian Maps and Navigation, Julien Diard, 1h

* Markov Models Use in Robotics, Adriana Tapus, Olivier Aycard, 1h

* Hybrid Mobile Robot Navigation: A Natural Integration of Metric and Topological, Roland Siegwart, Nicola Tomatis, 1h

3 - Switching vs Weighting & Sensor selection based (in particular) on contraction theory

Session length: 2h30 (presentations + discussion)

* Some Examples of Visuo-Vestibular Interaction Models, Alain Berthoz, 1h

* Selection of relevant sensors and setting of new sensory apparatus, Pierre Bessière, 30 mn

4 - Visual-vestibular interaction in self motion perception and gaze stabilization

Session length: 1h (presentations + discussion)

* Nystagmus and Visuo-Vestibular Interactions Modelling, Jean Laurens, 30 mn

5 - Bayesian Robot programming and modelling

Session length: 2h30 (presentations + discussion)

* Obstacle Avoidance using Proscriptive Programming, Cédric Pradalier, Carla Koike, 30 mn

* A First Step toward Bayesian Learning by Imitation, Pierre Bessière, 30 mn

* Bayesian Programming of Videogame Characters, Ronan Le Hy, Olivier Lebeltel, 30 mn

6 – Learning Issues Discussion

Session length: 2h

Introduced and moderated by Frédéric Davesne & Jean Laurens

Annex - Presentations

Session 1 - Biologically plausible probabilistic representation and inference mechanisms

- Analytical and Bayesian Models of Head Direction Cells, Francis Colas, Graduate, INRIA

Session 2 - Representation of space, maps and navigation

- Hybrid Mobile Robot Navigation: A Natural Integration of Metric and Topological, Dr Nicola Tomatis, Prof. Roland Siegwart, EPFL

- Bayesian Maps and Navigation, Julien Diard, PhD student, INRIA

Session 5 - Bayesian Robot programming and modelling

- Obstacle Avoidance Using Proscriptive Programming, Cédric Pradalier, Carla Koike, PhD students, INRIA

- Bayesian Learning by Imitation (Apprentissage bayésien par imitation), Frédéric Raspail, Graduate, INRIA

- Bayesian Programming of Videogame Characters (Programmation bayésienne de personnages de jeux vidéos), Ronan Le Hy, Graduate, INRIA

Session 6 – Learning Issues Discussion

Introduced and moderated by Jean Laurens PhD student & Frédéric Davesne PostDoc, LPPA

1

1

Analytical and Bayesian modelling of head direction cells

Francis COLAS

Tutor : Pierre Bessière

European project BIBA in collaboration with

L.P.P.A. (Collège de France)

2

Introduction

[Introduction figure: head direction θ and angular velocity ω]

2

3

Angle coding

• Head direction cells:
– firing rate correlated with head direction angle

– coding for actual or anticipated orientation

Taken from [Arleo00]

4

Brains areas and projections

• Adn (antero-dorsal nucleus): anticipated head direction (≈25 ms)
• Dtn (dorsal tegmental nucleus): head angular velocity
• Lmn (lateral mammillary nucleus): anticipated orientation (≈95 ms)
• Psc (postsubiculum): present angle

Adapted from [Stackman98]

[Projection diagram, adapted from [Stackman98]: Dtn (ω), Lmn (θ), Adn (θ), Psc (θ)]

3

5

Contents

• Introduction

• Previous models

• Analytical modelling

• Bayesian modelling

6

Previous models

• First model [McNaughton91]

• Neural implementation [Skaggs95]

• Mathematical framework [Zhang96]

• Adn study [Blair95], [Blair97]

• Use of Lmn for integration [Arleo00]

• Modeling attractor deformation in Adn [Goodridge00]

4

7

Constraints

1. Integration
2. Respect of projections
3. Anticipatory time intervals
4. Anatomical lesions
5. Uncertainty

8

Contents

• Introduction
• Previous models
• Analytical modelling
– Functional dependencies
– Methodology
– Formulas
– Tests
– Behaviour
– Results

• Bayesian modelling

5

9

Functional dependencies

Psct = g1(Psct-1, Adnt-1)
Adnt-1 = g2(Psct-2, Lmnt-2)
Lmnt-2 = g3(ωt-3, Psct-3, Lmnt-3)

[Dependency graph over ωt-3, Lmnt-3, Psct-3, Lmnt-2, Psct-2, Adnt-1, Psct-1, Psct, shown next to the anatomical projection chain Dtn, Lmn, Adn, Psc.]

Constraint 2: Projections

10

Methodology

[Graph: ωt-3, Lmnt-3, Psct-3; Lmnt-2, Psct-2; Adnt-1, Psct-1; Psct]

Definitions:
Psct = θ(t), Psct-1 = θ(t − dt), Psct-2 = θ(t − 2dt), Psct-3 = θ(t − 3dt)
Adnt-1 = θ(t + τA − dt)
Lmnt-2 = θ(t + τL − 2dt), Lmnt-3 = θ(t + τL − 3dt)
ωt-3 = ω(t − 3dt)

First-order expansions:
Psct = Psct-1 + ω·dt + O(dt²)
Adnt-1 = Psct-1 + τA·ω + O(τA²)
ωt-1 = (Adnt-1 − Psct-1)/τA + O(τA)
Psct = Psct-1 + (dt/τA)·(Adnt-1 − Psct-1) + O(dt² + dt·τA)

6

11

Formulas

[Formula slide, garbled in extraction. From the definitions on the previous slide it derives, to first order in dt and the anticipation constants τA and τL:

ωt-3 = (Lmnt-3 − Psct-3)/τL + O(τL)
Lmnt-2 = Lmnt-3 + dt·ωt-3 + O(dt²)
Adnt-1 = Psct-2 + (τA + dt)·ωt-3 + O(dt², dt·τ)

together with a small dependency graph over ωt-3, Psct-3, Lmnt-3 and Lmnt-2.]

12

Angular velocity

• Tests:
– constant velocity

– trapezoidal profile for head angular velocity :

7

13

Behaviour

Results of an execution

Constraint 1: Integration

14

Results

[Checklist (marks lost in extraction): 1. Integration, 2. Respect of projections, 3. Anticipatory time intervals, 4. Anatomical lesions, 5. Uncertainty; three constraints carry one kind of mark and two the other.]

8

15

Contents

• Introduction

• Previous models

• Analytical modelling

• Bayesian modelling
– Variables

– Decomposition

– Identification

– Questions

16

Bayesian model : Variables

• Variables:
– Psct-3, Psct-2, Psct-1, Psct: D = [-180; 180[
– Lmnt-3, Lmnt-2: D = [-180; 180[
– Adnt-1: D = [-180; 180[
– ωt-3: Dω = [-600; 600]
– τA: DA = [-0.010; 0.050]
– τL: DL = [0.060; 0.110]

9

17

Bayesian model : Decomposition

P(Psct-3 Psct-2 Psct-1 Psct Adnt-1 Lmnt-3 Lmnt-2 ωt-3 τA τL)
= P(Psct-3) × P(Psct-2) × P(Psct-1) × P(Lmnt-3) × P(ωt-3) × P(τA) × P(τL)
× P(Lmnt-2 | Lmnt-3 Psct-3 ωt-3 τL)
× P(Adnt-1 | Lmnt-2 Psct-2 τA τL)
× P(Psct | Adnt-1 Psct-1 τA)

[Graphical model: ωt-3, Lmnt-3, Psct-3 feed Lmnt-2; Lmnt-2, Psct-2 feed Adnt-1; Adnt-1, Psct-1 feed Psct; τL and τA condition the corresponding terms.]

18

Bayesian model : Distributions

• P(ωt-3): uniform;

• P(τL), P(τA): Gaussians matching biological data;

• P(Psct-3), P(Psct-2), P(Psct-1), P(Lmnt-3): Gaussians around an initialization value;

• For Lmn, Adn and Psc: Gaussians centered around the results of the analytical formulas.
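To make the decomposition above concrete, here is a minimal Python sketch (not from the original slides): it samples the model forward, with Gaussian terms centred on the analytical formulas, to approximate P(Psct | ωt-3) for a fixed angular velocity. All numerical settings are placeholders.

```python
import numpy as np

# Hypothetical sketch: ancestral sampling of the Bayesian head-direction model,
# approximating P(Psc_t | omega_t-3).  All parameters are placeholders.
rng = np.random.default_rng(0)

def sample_psc_t(w, n=100_000, dt=0.01, sigma=2.0):
    tau_A = np.clip(rng.normal(0.025, 0.010, n), 0.005, 0.050)  # Adn anticipation (s)
    tau_L = np.clip(rng.normal(0.095, 0.015, n), 0.060, 0.110)  # Lmn anticipation (s)
    psc_t3 = rng.normal(0.0, 5.0, n)                 # initialization priors (deg)
    lmn_t3 = rng.normal(psc_t3 + tau_L * w, 5.0)
    # Gaussians centred on the analytical formulas:
    psc_t2 = rng.normal(psc_t3 + w * dt, sigma)
    lmn_t2 = rng.normal(lmn_t3 + w * dt, sigma)
    adn_t1 = rng.normal(psc_t2 + (tau_A + dt) * w, sigma)
    psc_t1 = rng.normal(psc_t2 + w * dt, sigma)
    psc_t = rng.normal(psc_t1 + (dt / tau_A) * (adn_t1 - psc_t1), sigma)
    return psc_t

samples = sample_psc_t(w=200.0)                      # deg/s
print("P(Psc_t | omega): mean =", round(samples.mean(), 1), "deg")
```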

10

19

Bayesian model : Program

• Questions:
– P(Lmnt-2 | ωt-3)
– P(Psct | ωt-3)

[Same graphical model as on the previous slide.]

20

Conclusion

[Checklist: all five constraints are marked: 1. Integration, 2. Respect of projections, 3. Anticipatory time intervals, 4. Anatomical lesions, 5. Uncertainty.]

11

21

Future work

• Microscopic plausibility of the Bayesian hypothesis

• Extension to place cells

• Use of vision

† Swiss Federal Institute of Technology, Lausanne, Switzerland
‡ The Robotics Institute, CMU, Pittsburgh, USA

Hybrid Mobile Robot Navigation: A Natural Integration of Metric and Topological

BIBA School, Moudon

Nicola Tomatis †

Roland Siegwart †

In collaboration with: Illah Nourbakhsh ‡


Contents

• Introduction
• Environmental Modeling
• Localization and Map Building
• Metric
• Topological
• Closing the Loop
• Switching Model
• Experimental Results
• Conclusion and Outlook

Introduction

• Mobile robotics for applications:
• Precision with respect to the environment
• Robustness avoiding human intervention
• Practicability with limited embedded resources
• Ergonomics for the user (man-machine interaction)

Motivation


Introduction

• Assumption: theory is available
• Goal: theory to practice
• Approach:
• Study the literature
• Focus on advantages and disadvantages of existing methods
• Propose a more human (bio?) - inspired approach
• Validate it empirically

Motivation


Introduction

Related Work: Localization


Introduction

Related Work: SLAM


Environmental Modeling

The idea:
• One global topological map
• Many local metric maps

Metric - Topological


• Features
• Horizontal lines from laser scanner
• Local metric map containing the features belonging to the same physical place

Environmental Modeling

Metric Model


Environmental Modeling

• Features:
• Corners
• Openings
• Map is a graph
• Openings correspond to map states

Topological Model


• Map building strategy
• Implementation for office environment
• Assumption: precision is needed in rooms
• Navigation in hallways is topological, in rooms metric
• Exploration strategy
• Depth-first search in the hallways first
• Then backtracking to visit the rooms

Localization and Map Building

Strategies


Localization and Map Building

Metric: The EKF


• The product rule
• The Bayesian rule
• The Markov assumption
• The independence assumption
• Errors are Gaussian, error propagation is linear

Localization and Map Building

The EKF is Bayesian

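The slide's point is that, under the product rule, Bayes' rule, the Markov and independence assumptions, and Gaussian errors with linearized propagation, the EKF update is exactly a Bayesian posterior update. A minimal predict/update sketch (illustrative only, not the system described in the talk):

```python
import numpy as np

# Minimal EKF predict/update sketch (illustrative, not the paper's implementation).

def ekf_predict(x, P, f, F, Q):
    """Propagate mean x and covariance P through motion model f (Jacobian F)."""
    return f(x), F @ P @ F.T + Q

def ekf_update(x, P, z, h, H, R):
    """Condition on observation z with measurement model h (Jacobian H)."""
    y = z - h(x)                         # innovation
    S = H @ P @ H.T + R                  # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    return x + K @ y, (np.eye(len(x)) - K @ H) @ P

# Toy example: 2-D position, odometry displacement u, noisy position fix z.
x, P = np.zeros(2), np.eye(2)
u, Q = np.array([1.0, 0.5]), 0.01 * np.eye(2)
x, P = ekf_predict(x, P, lambda s: s + u, np.eye(2), Q)
x, P = ekf_update(x, P, np.array([1.1, 0.4]), lambda s: s, np.eye(2), 0.05 * np.eye(2))
print(x, np.diag(P))
```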

Localization and Map Building

• Stochastic Map [Smith88]

• Update:
• Displacement
• New observation
• Re-observation

Metric: Stochastic Map


• Belief state vector
• State transition
• Observation
• Estimation
• Control strategy
• Path-planning: graph based

Localization and Map Building

Topological: POMDP


• The “observation graph” permits the detection of new features

• Handle environmental dynamics

Localization and Map Building

Topological Map Building


Localization and Map Building

Closing the Loop


Localization and Map Building

• Topological is multimodal

• Metric is unimodal
• EKF initialization!!!

• Confidence function:

Switching Model


• Performed with the fully autonomous Donald Duck

• Environment closed for a finite exploration

• Experiments:
• Map building
• Localization (tracking)
• Bootstrapping (global localization)
• Closing the loop

Experimental Results

Experiments


Experimental Results

Map Building


Experimental Results

Localization


Experimental Results

Bootstrapping


Experimental Results

Closing the Loop


• Contribution
• Precision: mean error < 10 mm
• Robustness: multimodality in dynamic environments
• Practicability: PowerPC 604e 300 MHz
• Closing the loop
• Limitations
• Switching from topological to metric

Conclusions and Outlook

Conclusions


1

Bayesian Maps and Navigation

Julien Diard, Pierre Bessière
Laplace Team - Sharp project

Gravir Lab/IMAG, Grenoble

http://www-laplace.imag.fr/

12/06/2002

2

Introduction

• Questions relevant in robotics and biology:
– What is navigation?

– What is a location?

– What is a map?

– What is planning?

– What is localization?

2

3

Contents

• Bibliography
• Bayesian Robot Programming
– Example and definition
– Putting Bayesian programs together
• Bayesian maps
– Definition and example
– Putting Bayesian maps together

4

Bibliography: the classical approach

• Navigation:
– Finding a path for a point in workspace(-time);
– control of the plan
• applied mathematical problems
• Assumes that
– A precise geometric map is available
– The locations of the robot and the goal point are precisely known

• These conditions are never met

3

5

Bibliography: probabilistic approaches

• Markov models
– POMDPs, HMMs, Markov Localization, (Extended) Kalman Filters, Dynamic Bayes Networks, etc.
• Pros
– Treat incompleteness and uncertainties
• Cons
– Often not hierarchical
– Often associated with geometric, fine-grained models
– Impose dependencies and independencies

6

Bibliography: bio-inspired approaches

• Kuipers' Spatial Semantic Hierarchy, among many others
• Pros
– Reflection on the definitions of « localization » and « maps »: cognitive maps
– Hierarchical
• Cons
– Various formalisms
• Consistency, communication between modules, …

4

7

Bibliography: analysis

• Use Bayesian Robot Programming
– Unified framework for dealing with incompleteness and uncertainties
– Explicit declaration of assumptions
• Emphasis on the semantics of variables
• Does not impose any dependencies
– Modularity allows
• Incremental development
• Easy building of hierarchies

8

Contents

• Bibliography
• Bayesian Robot Programming
– Example and definition
– Putting Bayesian programs together
• Bayesian maps
– Definition and example
– Putting Bayesian maps together

5

9

Example: Sensor fusion

• Objective
– Find the position of a light source
• Problem
– The robot does not have a dedicated sensor
• Solution
– Model of each sensor
– Fusion of the eight models

[Figure: the light source at angle ThetaL and distance DistL from the robot; Lmi is the reading of light sensor i.]

10

Light sensor model (1)

Bayesian program (preliminary knowledge πsensor):
– Variables: ThetaL, DistL, Lmi
– Decomposition:
P(ThetaL ∧ DistL ∧ Lmi | πsensor) = P(ThetaL ∧ DistL | πsensor) × P(Lmi | ThetaL ∧ DistL ∧ πsensor)
– Parametric forms:
P(ThetaL ∧ DistL | πsensor) ← Uniform
P(Lmi | ThetaL ∧ DistL ∧ πsensor) ← Gaussians
Identification: a priori programming (or learning)
Utilization: inverse questions
P(ThetaL | [Lmi = li] ∧ πsensor)
P(DistL | [Lmi = li] ∧ πsensor)

[Figure: mean sensor reading Lmi (0–500) as a function of ThetaL (−180°…180°) and DistL (0…30).]
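As a concrete illustration of the sensor description and its inverse question, here is a short Python sketch; the sensor response function and all of its parameters are invented for illustration and are not taken from the slides.

```python
import numpy as np

# Hypothetical sketch of one light-sensor description: a Gaussian forward model
# P(Lm_i | ThetaL, DistL) with a uniform prior on (ThetaL, DistL), inverted by
# Bayes' rule to answer the inverse question P(ThetaL | [Lm_i = l_i]).
THETA = np.arange(-180, 180, 10)                   # deg
DIST = np.arange(0, 31, 1)
T, D = np.meshgrid(THETA, DIST, indexing="ij")     # (ThetaL, DistL) grid

def mean_reading(theta, dist, sensor_angle):
    """Assumed response: the reading drops when facing the light and close to it."""
    ang = np.abs(((theta - sensor_angle + 180) % 360) - 180)
    return 500.0 - 450.0 * np.exp(-(ang / 60.0) ** 2) * np.exp(-dist / 15.0)

def likelihood(l_i, sensor_angle, sigma=30.0):
    """P([Lm_i = l_i] | ThetaL, DistL) over the whole grid."""
    return np.exp(-0.5 * ((l_i - mean_reading(T, D, sensor_angle)) / sigma) ** 2)

def p_theta(l_i, sensor_angle):
    """Inverse question P(ThetaL | [Lm_i = l_i]): marginalize DistL, normalize."""
    post = likelihood(l_i, sensor_angle).sum(axis=1)
    return post / post.sum()

print(p_theta(120.0, sensor_angle=-10.0).round(3))
```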

6

11

Light sensor model (2)

[Figure: the distributions P(ThetaL | Lmi Cp_li) and P(DistL | Lmi Cp_li), plotted for Lmi = 15, 45, 100, 200, 300, 450, 475 and 500; the curves themselves are not recoverable from the extracted text.]

12

Sensor Fusion (1)

Bayesian program (preliminary knowledge πFusion):
– Variables: ThetaL, DistL, Lm0, …, Lm7
– Decomposition (conditional independence hypothesis):
P(ThetaL ∧ DistL ∧ Lm0 ∧ … ∧ Lm7 | πFusion) = P(ThetaL ∧ DistL | πFusion) × Π i P(Lmi | ThetaL ∧ DistL ∧ πFusion)
– Parametric forms:
P(ThetaL ∧ DistL | πFusion) ← Uniform
P(Lmi | ThetaL ∧ DistL ∧ πFusion) ← P(Lmi | ThetaL ∧ DistL ∧ πSensor) (question to the sensor model)
Identification: no free parameters
Utilization: P(ThetaL ∧ DistL | lm0 ∧ … ∧ lm7 ∧ πFusion)
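The fusion itself can be sketched by multiplying the eight sensor terms on the (ThetaL, DistL) grid. This reuses the hypothetical likelihood() from the previous sketch; the sensor angles are assumed and the readings are the ones quoted on the next slide.

```python
# Sketch of the fusion description: conditional independence makes the joint
# likelihood a product of the eight single-sensor terms (questions to the
# sensor model); there are no free parameters.
import numpy as np

SENSOR_ANGLES = [-90, -50, -10, 10, 50, 90, 170, -170]   # Khepera light sensors

def p_theta_dist_fused(readings):
    joint = np.ones((len(THETA), len(DIST)))
    for l_i, ang in zip(readings, SENSOR_ANGLES):
        joint *= likelihood(l_i, sensor_angle=ang)        # one term per sensor
    return joint / joint.sum()

readings = [509, 480, 391, 379, 430, 503, 511, 511]       # values from the slide
fused = p_theta_dist_fused(readings)
print("most probable ThetaL:", THETA[fused.sum(axis=1).argmax()], "deg")
```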

7

13

Sensor Fusion (2)

[Figure: the eight individual distributions P(ThetaL | Lmi Cp_li) for one situation (light sensor at −90°: Lm0 = 509; −50°: Lm1 = 480; −10°: Lm2 = 391; 10°: Lm3 = 379; 50°: Lm4 = 430; 90°: Lm5 = 503; 170°: Lm6 = 511; −170°: Lm7 = 511), together with the fused distribution P(ThetaL | Lm0 … Lm7 Cp_SourceL). True source position: Theta = 10, Dist = 20.]

14

Bayesian Robot Programming

Preliminary Knowledge π
– Relevant variables X1, …, Xn
• Their range
– Decomposition
• P(X1 ∧ … ∧ Xn) as a product of simple terms
• Dependencies
• Conditional independence hypotheses
– Parametric forms
• For all terms
Identification
– Learning (with data δ) or a priori programming
Utilization
– P(searched variables | known variables ∧ π ∧ δ)
• General probabilistic inference engine

8

15

Contents

• Bibliography
• Bayesian Robot Programming
– Example and definition
– Putting Bayesian programs together
• Bayesian maps
– Definition and example
– Putting Bayesian maps together

16

Putting descriptions together

• Bayesian fusion
– Probabilistic subroutine calling
• Bayesian program combination
– Probabilistic « if - then - else »
• Bayesian program sequencing
– Probabilistic « ; »
• Bayesian program iteration
– Probabilistic « loop »
• Using functions

9

17

Scaling up: « The Nightwatchman Khepera »

• Complex behaviour (42 variables, 4 hierarchical levels…)
• Space is represented, but no explicit map

[Slide: the full joint decomposition of the 42-variable behaviour (Vrot, Vtrans, px0…px7, lm0…lm7, veille, feu, obj?, eng, tach_t-1, td_t-1, tempo, tour, dir, prox, dirG, proxG, vtrans_c, …) as a hierarchy of sub-descriptions (Cp_SL, Cp_DétectBase, Cp_Tach, Cp_TypDépl, Cp_Surveil, Cp_PhotoEvit) combined by summation over intermediate variables; the exact formula is not recoverable from the extracted text.]

18

« The Nightwatchman Khepera »

10

19

Contents

• Bibliography
• Bayesian Robot Programming
– Example and definition
– Putting Bayesian programs together
• Bayesian maps
– Definition and example
– Putting Bayesian maps together

What is navigation? What is a location? What is a map?

20

Bayesian Map: definition

– Relevant variables:
• P: perception variable
• Lt: location at time t
• Lt': location at time t' (t' > t)
• A: action variable
– Decomposition: any (e.g. Markov Localization)
– Parametric forms: any
– Identification: any
Utilization:
– Localization P(Lt | P)
– Prediction P(Lt' | A Lt)
– Control P(A | Lt Lt')

What is a location? What is localization? What is planning based on? What is navigation based on? What is a map?
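A minimal sketch of what the three questions of a Bayesian map look like over a small discrete location variable. The tables are invented; only the structure (localization, prediction and control asked of one description) follows the definition above.

```python
import numpy as np

# Hypothetical sketch of a Bayesian map over Lt = Sit = {corner, wall, empty}.
SITS = ["corner", "wall", "empty"]
BEHS = ["Stop", "Straight", "FollowWall", "QuitCorner"]

# P(Lt' | A Lt): invented transition tables, indexed [behaviour][Lt][Lt'].
TRANS = np.random.default_rng(1).dirichlet(np.ones(3), size=(len(BEHS), 3))

def localization(p_obs_given_sit):
    """P(Lt | P) for a uniform prior: normalize the observation terms."""
    p = np.asarray(p_obs_given_sit, dtype=float)
    return p / p.sum()

def prediction(beh, p_lt):
    """P(Lt' | A Lt), averaged over the current location belief."""
    return p_lt @ TRANS[BEHS.index(beh)]

def control(p_lt, target):
    """P(A | Lt Lt'): score each behaviour by how likely it reaches the target."""
    scores = np.array([prediction(b, p_lt)[SITS.index(target)] for b in BEHS])
    return scores / scores.sum()

belief = localization([0.1, 0.7, 0.2])          # e.g. from the proximeter model
print(dict(zip(BEHS, control(belief, target="corner").round(2))))
```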

11

21

Example (1/3): variables

• Map of a room based on proximeters
– P: Px0 ∧ … ∧ Px7
• Relevant features: corners, walls, empty space
– Lt: Sitt = {corner, wall, empty-space}
– Lt': Sitt+Δt
• Motor commands:
– A: Beh = {Stop, Straight, FollowWall, QuitCorner, …}

[2D visualization of the Bayesian map: corner, wall, empty space]

What is a location?

22

Example (2/3): induced graph

• Can be extracted from a Bayesian map
– Can then compute on it: diameter, connexity…
• Can help build a Bayesian map
– Comes more intuitively
– Helps talking about the Bayesian map
– Nicer than…

[« Graph form » visualization of the Bayesian map (excerpt): nodes Empty Space, Corner, Wall; edges labelled with behaviours FollowWall, QuitCorner, Straight, Stop.]

What is planning?

12

23

Example (3/3): Bayesian map localization module

– Relevant variables
• Sitt: {wall, corner, empty-space}, 3
• Px0 … Px7: {0, 1, …, 15}, 16
– Decomposition of the joint
• P(Px0 … Px7 Sitt | CPsit) = P(Sitt | CPsit) × Πi P(Pxi | Sitt CPsit)
– Parametric form for each term
• P(Sitt | CPsit) → Uniform
• P(Pxi | [Sitt=empty-space] CPsit) → Question P(Pxi | CPempty)
• P(Pxi | [Sitt=wall] CPsit) → Question P(Pxi | CPwall) = Σ J,D P(Pxi | J D CPwall)
• P(Pxi | [Sitt=corner] CPsit) → Question P(Pxi | CPcorner) = Σ Pos P(Pxi | Pos CPcorner)

24

Contents

• Bibliography
• Bayesian Robot Programming
– Example and definition
– Putting Bayesian programs together
• Bayesian maps
– Definition and example
– Putting Bayesian maps together

13

25

Putting maps together: superposition

[Slide: the corner/wall/empty-space map is superposed with a light-level map (values VHigh, High, Low, VLow); a table gives, for each combination, whether the reading increases (+), decreases (−), or may do either (+,−). The individual cells are not recoverable from the extracted text.]

What is navigation? What is a map?

26

Putting maps together: juxtaposition (1)

• Abstracting maps on the same internal variable

[Figure: Room1, Room2 and Room3 connected by a corridor; behaviours such as “Pass corridor” and “Any other behaviour” link them.]

What is planning? What is a map?

14

27

Putting maps together: juxtaposition (2)

• Other examples:
– Maps are floors, connectors are stairways or elevators → the new abstraction is a building
– Maps are buildings and streets, connectors are doors → the new abstraction is (the whole) street

28

Putting maps together: abstraction (1)

• Abstracting maps of different natures

• Map of a wall:
– P: Px0 ∧ … ∧ Px7
– Lt, Lt': θ ∧ Dist
– A: Rot ∧ Trans

[Figure: the wall map; Theta from −180 to 150, Dist ∈ {0, 1, 2}; behaviours backward, forward, stop.]

What is a location?

15

29

« Wall »: localization

– Relevant variables
• J (= Theta): {-180, -150, …, +150}, 12
• D (= Dist): {0, 1, 2}, 3
• Px0 … Px7: {0, 1, …, 15}, 16
– Decomposition of the joint
– Parametric form for each term
• P(J D) → Uniform
• P(Pxi | J D) → Gaussians
Identification: learning

[Figure: wall geometry, with Theta = −90 / Theta = 90 and the Dist levels.]

30

« Wall »: control

– Relevant variables
• J: {-180, -150, …, +150}, 12; Dist: {0, 1, 2}, 3
• Beh: {FollowWall, Away-from-wall}, 2
• Vrot, Vtrans
– Decomposition of the joint
• P(J D Beh Vrot Vtrans | CPwall-control)
= P(J D Beh | CPwall-control) × P(Vrot Vtrans | J D Beh CPwall-control)
– Parametric form for each term
• P(J D Beh | CPwall-control) → Uniform
• P(Vrot Vtrans | J D Beh CPwall-control) → G_{J,D,Beh}(Vrot, Vtrans)

16

31

« Wall » Bayesian map

– Relevant variables
• P: Px0 … Px7 (16 values each)
• Lt: J: {-180, -150, …, +150}, 12; D: {0, 1, 2}, 3
• Lt': Beh: {stop, followW, away-from-wall}, 3
• A: Vrot, Vtrans
– Decomposition of the joint
• P(Px0 … Px7 J D Beh | CPwall)
= P(Px0 … Px7 Beh | CPwall) × P(J D | Px0 … Px7 CPwall) × P(Vrot Vtrans | J D Beh CPwall)
– Parametric form for each term
• P(Px0 … Px7 Beh | CPwall) → Uniform
• P(J D | Px0 … Px7 CPwall) → Question P(J D | Px0 … Px7 CPwall-loc)
• P(Vrot Vtrans | J D Beh CPmur) → Question P(Vrot Vtrans | J D CP )

32

Putting maps together: abstraction (2)

• Map of a corner:
– Lt: Pos = {FrontL, FrontR, …}
• Map of the empty space:
– Lt: ∆

[Figure: e.g. the FrontRight position relative to a corner.]

• New abstraction: the {corner, wall, empty} map

What is a location?

17

33

Loose ends

• Small state spaces
– Necessary for planning, sufficient for most tasks?
• Exploding Lt into Lt1 ∧ … ∧ Ltn
• Planning
– Iteration of the map on P(At At+1 … At+h | Lt Lt+h)
• Learning
– Given the location variable, learn (parts of) the map: easy
– Select a location variable out of several: maybe
– Find a relevant location variable out of the blue: hard!
• Time constant
– Estimate it from the graph diameter

34

Conclusion

• A new framework for (Bayesian) mapping and navigating
– Intuitively appealing
– Integrates hierarchies
– Unified formalism
• Where's the killer experiment?
• Is it biologically plausible?
– Translation of existing models into our scheme?
– Prediction of new data?

Obstacle Avoidance Using Proscriptive Programming

Cycab Vehicle Experiment Description

Cedric Pradalier

Carla Koike

PhD Students, Sharp − GRAVIR/IMAG − CNRS

1

Motivation

" First practical experiment using a Bayesian program on the Cycab

" Combination of obstacle avoidance and other tasks in a hierarchical manner

2

Presentation Objectives

" Show several ways to implement obstacle avoidance

" Show how to fuse proscriptive commands and reference values

" Present one example of the incremental cycle in robotic applications design using Bayesian Programming

3

Contents
" Introduction
" Cycab Vehicle Description and Problem Context
" Proposed Solutions:
1. Zone weighting (prescriptive)
2. Zone fusion (prescriptive)
3. Zone fusion (proscriptive)

" Incremental Design Aspects

" Discussion and Conclusions

4

Proscriptive Programming

" Prescriptive versus Proscriptive
- Prescriptive tells you what you have to do
- Proscriptive tells you what you cannot do

" Some situations are better modelled one way or the other
- Phototaxis is typically prescriptive
- Obstacle avoidance is easily modelled as proscriptive

5

Prescriptive and Proscriptive Programming

6

Proscriptive in Bayesian Programming

" Bayesian Programming is a very useful tool for creating proscriptive models
- Permission is given by high probabilities
- Interdictions are modelled as low-valued probabilities

" Fusing different command propositions makes it possible to obtain a trade-off between desired values and allowed values

7

Command Fusion

" High level tasks determine desired (deterministic or probabilistic) values for command variables

" Obstacle avoidance fuses these desired values with the allowed situations when in the presence of obstacles

Probabilistic Fusion

High Level Task

Obstacle Avoidance

8

Contents
" Introduction
" Cycab Vehicle Description and Problem Context
" Proposed Solutions:
1. Zone weighting (prescriptive)
2. Zone fusion (prescriptive)
3. Zone fusion (proscriptive)

" Incremental Design Aspects

" Discussion and Conclusions

9

Cycab Vehicle
" Electric vehicle
" Sensor
- SICK laser scanner
" Vehicle control
- Wheel speeds
- Steering angle

10

Problem Description

" Obstacles are static. If dynamic obstacles exist, they move at slow speed (a walking person or a manoeuvring car)

" The Cycab shall avoid the obstacles, keeping when possible the desired values of translation speed and steering angle

" We define a highly dangerous elliptic area located in front of the Cycab. Any object in this area MUST impose a zero speed.

11

Sensor Variables

" Sensor reading values − Di
- Signal preprocessing creates 8 zones in front of the vehicle
- Only the lowest distance in a zone is kept
- Readings between 0~8191 are scaled to 0~200 (obstacle distance between 0 and 2000 cm)

[Figure: Zone 1 … Zone 8 in front of the vehicle; Di is the distance in zone i.]

12

Motor Variables
" Control values
- The vehicle translation speed V is discretized into 6 values: V = 0..5, a discretization of 0..Vmax. Always positive, since there is only a front sensor.
- The steering angle φ can take 11 values, −5..+5, a discretization of φmin..φmax.

[Figure: V and Φ on the vehicle]

13

Contents
" Introduction
" Cycab Vehicle Description and Problem Context
" Proposed Solutions
1. Zone weighting (prescriptive)
2. Zone fusion (prescriptive)
3. Zone fusion (proscriptive)

" Incremental Design Aspects

" Discussion and Conclusions

14

First Version Zone Weighting

" Each zone proposes probability distributions for the variables V and Φ in order to avoid the nearest obstacle seen in this zone

" A variable H is created to indicate the weight of each zone proposal for the final value of V and Φ

15

Zone Model Description

16

Zone Weighting Description

17

Program Utilization, First Version

18

Zone Weighting = Combination of Descriptions

[Diagram: each zone i proposes P_i(V | D_i) with weight P([H=i] | D_i); the weighted proposals are summed to give P(V | D1 … D8).]

19

Combination of Descriptions: Weighting

20

Results and Comments, First Version

" Results are coherent, but depend a lot on the functions f_i(D_i), which are not easy to adjust

21

Second Version − Zone Command Fusion

" The standard deviations of P_i(V | D_i) and P_i(φ | D_i) are not taken into account
- They carry information redundant with f_i(D_i)

" Command fusion can use the standard deviations of P_i(V | D_i) and P_i(φ | D_i) as weights

22

Zone Model Description
(Specification: variables, decomposition, parametric forms; identification: a priori)

23

V and φ Distribution building

24

Command Fusion Description

- Variables: D1, …, D8: 0…200 (201 values); V: 0…5 (6 values); φ: −5…5 (11 values)
- Decomposition:
P(V ∧ φ ∧ D1 ∧ … ∧ D8) = P(V ∧ φ) × Π i=1..8 P_i(D_i | V ∧ φ)
- Parametric forms:
P(V ∧ φ) = Uniform
P_i(D_i | V ∧ φ) = (1/Z) { P_i(V | D_i) × P_i(φ | D_i) }  (question to the zone model)
Identification: a priori
Utilization:
P(V ∧ φ | D1 … D8) = (1/Z) Π i=1..8 P_i(D_i | V ∧ φ)
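A small Python sketch of this version-2 fusion. The zone models below are invented for illustration (on the real Cycab, P_i(V | D_i) and P_i(φ | D_i) are identified a priori); only the product-and-normalize structure follows the description above.

```python
import numpy as np

# Sketch of the version-2 command fusion: each zone i turns its distance D_i
# into distributions P_i(V | D_i) and P_i(phi | D_i); the fused command
# distribution is their normalized product.
V_VALUES = np.arange(0, 6)            # translation speed, 6 values
PHI_VALUES = np.arange(-5, 6)         # steering angle, 11 values

def zone_model(d_i, zone_angle):
    """Hypothetical P_i(V | D_i) and P_i(phi | D_i) for an obstacle at d_i."""
    v_pref = min(5, d_i // 30)                              # closer obstacle -> slower
    p_v = np.exp(-0.5 * ((V_VALUES - v_pref) / 1.0) ** 2)
    phi_pref = -np.sign(zone_angle) * max(0, 5 - d_i // 40)  # steer away from it
    p_phi = np.exp(-0.5 * ((PHI_VALUES - phi_pref) / 1.5) ** 2)
    return p_v / p_v.sum(), p_phi / p_phi.sum()

def fuse(distances, zone_angles):
    joint = np.ones((len(V_VALUES), len(PHI_VALUES)))
    for d_i, a_i in zip(distances, zone_angles):
        p_v, p_phi = zone_model(d_i, a_i)
        joint *= np.outer(p_v, p_phi)                # P_i(D_i | V phi), up to Z
    return joint / joint.sum()

zone_angles = [-70, -50, -30, -10, 10, 30, 50, 70]   # assumed zone bearings (deg)
joint = fuse([200, 200, 40, 200, 200, 200, 25, 200], zone_angles)
v_star, phi_star = np.unravel_index(joint.argmax(), joint.shape)
print("V =", V_VALUES[v_star], " phi =", PHI_VALUES[phi_star])
```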

25

Command Fusion = Composition of Descriptions

[Diagram: each zone contributes one term P_i(D_i | V ∧ φ); the terms are multiplied and normalized: P(V ∧ φ | D1 … D8) = (1/Z) Π i=1..8 P_i(D_i | V ∧ φ).]

26

Results and Comments, Second Version

" Results are similar to the previous version, but the parameters are easier to adjust

" Transitions in the speed variable and steering angle are smoother

27

Third Version − Proscriptive Programming

" The curves of P_i(V | D_i) and P_i(φ | D_i) change so as to indicate the prohibited behaviour when an obstacle is identified in each zone

" The command fusion is the same as in the previous version

" When fusing probability curves, a uniform distribution adds no information

28

Zone Model Description
(Specification: variables, decomposition, parametric forms; identification: a priori)

29

Main Difference between Versions 2 and 3

Version 3 : what is allowed/safe

30

Command Fusion Description

- Variables: D1, …, D8: 0…200 (201 values); V, Vc: 0…5 (6 values); φ, φc: −5…5 (11 values)
- Decomposition:
P(Vc ∧ V ∧ φc ∧ φ ∧ D1 ∧ … ∧ D8) = P(Vc) × P(φc) × P(V | Vc) × P(φ | φc) × Π i=1..8 P_i(D_i | V ∧ φ)
- Parametric forms:
P(V ∧ φ) = Uniform
P(Vc) = P(φc) = Unknown
P(V | Vc) = G(μ = Vc, σ_V)(V)
P(φ | φc) = G(μ = φc, σ_φ)(φ)
P_i(D_i | V ∧ φ) = (1/Z) { P_i(V | D_i) × P_i(φ | D_i) }  (question to the zone model)
Identification: a priori
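For comparison with version 2, a short sketch of the proscriptive variant: the zone terms stay near-uniform and only lower the probability of forbidden commands, while the reference value Vc enters as a Gaussian term. It reuses V_VALUES from the previous sketch; all numbers are illustrative.

```python
# Sketch of the version-3 (proscriptive) fusion: reference value Vc enters as a
# Gaussian P(V | Vc); each zone only lowers the probability of unsafe commands.
import numpy as np

def proscriptive_zone(d_i):
    """P_i(V | D_i): near-uniform, but high speeds forbidden when d_i is small."""
    v_max_safe = min(5, d_i // 30)
    p_v = np.where(V_VALUES <= v_max_safe, 1.0, 1e-3)   # interdiction = low prob.
    return p_v / p_v.sum()

def fuse_with_reference(distances, v_c, sigma_v=1.0):
    p = np.exp(-0.5 * ((V_VALUES - v_c) / sigma_v) ** 2)  # P(V | Vc)
    for d_i in distances:
        p *= proscriptive_zone(d_i)
    return p / p.sum()

print(fuse_with_reference([200, 200, 40, 200, 200, 200, 200, 200], v_c=5).round(3))
```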

31

Results and Comments, Third Version

Prescriptive: the safest speed is weighted more, but the resulting speed is greater than the safest speed.

Proscriptive: the resulting speed is the safest speed; the safest speed is the strongest constraint.

32

Example of Fusion Result

Situation:
" Two near objects, one on each side: zones 2 and 7
" One object, farther away, in front

Objective: go as fast as possible in a straight line

Reasonable behaviour:
" The vehicle should go straight forward
" It may turn slowly

33

Summary Table

Version 1: Prescriptive; Zone weighting (combination of descriptions); no reference values
Version 2: Prescriptive; Command fusion (composition of descriptions); no reference values
Version 3: Proscriptive; Command fusion (composition of descriptions); reference values

34

Videos − Third Version

Media Clip

Media Clip

35

Contents
" Introduction
" Cycab Vehicle Description and Problem Context
" Proposed Solutions:
1. Zone weighting (prescriptive)
2. Zone fusion (prescriptive)
3. Zone fusion (proscriptive)

" Incremental Design Aspects

" Discussion and Conclusions

36

Incremental Design

" New tasks can be added incrementally and composed with the obstacle avoidance task

" The tasks can supply reference values
- Deterministic or probabilistic
- Prescriptive or proscriptive

" Hierarchical combination of tasks
- Reference values can result from the combination or composition of other tasks

37

Contents
" Introduction
" Cycab Vehicle Description and Problem Context
" Proposed Solutions:
1. Zone weighting (prescriptive)
2. Zone fusion (prescriptive)
3. Zone fusion (proscriptive)

" Incremental Design Aspects

" Conclusions and Improvements

38

Comments and Conclusions

" Obstacle avoidance task and Bayesian Programming
" Proscriptive programming as a way to
- increase modularity
- model security rules
" Basis for incremental design of more complex systems
" One or more versions can be implemented on the BibaBot

39

Some Improvements...

" Implement the fourth version
" Implement a wall-following task to supply the reference values of speed and steering angle
" Try different approaches for each zone
- Proscriptive when possible
- Prescriptive when necessary
- Different joint distributions depending on how important the dependence between speed and angle is

40

Thank you!

Questions, Suggestions, Comments, ...

41

1

DEA I.S.C.

Bayesian learning by imitation
(within the European project BIBA, Bayesian Inspired Brain and Artefacts)

Frédéric Raspail

Supervisor: Pierre Bessière, GRAVIR Laboratory, SHARP Project

19 June 2002

1

Introduction

• Imitation: an ill-defined concept
• Definition: “Imitation is the process by which the imitator learns some characteristics of the behaviour of the model”
• Present in the animal world:
• Education, propagation of behaviours
• Examples:
• A duckling and its mother
• Tits
• Future interest for robotics
• Problem: having an internal model of the imitated agent

2

2

Introduction

Experiment:

• 1: gradient climbing (towards the source)

• 2: climbing towards one source among several
=> doing imitation in a simple way

• 3: climbing towards one source, then towards another
=> recognizing what the imitated agent is doing

3

Plan

• Experimental environment
• Bayesian Robot Programming
• Experiment 1
• Experiment 2
• Experiment 3
• Conclusion and perspectives

4

3

Experimental environment: Robot

• Koala robot
• Camera with 2 degrees of freedom
• Vrot, Vtrans ∈ [–64…63]
• “Nose”
• Simulated odour sources
• Odour ∈ [-10…+10]
• Visual tracking
• DEA IVR Coué 2000

5

Experimental environment: Setting up imitation

• Learning: climbing towards a source

• Imitating thanks to visual tracking

6

4

Experimental environment: Setting up imitation

Learning phase

Plan

• Experimental environment
• Bayesian Robot Programming
• Experiment 1
• Experiment 2
• Experiment 3
• Conclusion and perspectives

7

5

Source-climbing behaviour

Replay of the programmed behaviour

“Climbing towards a source” program

• Fixed a priori
• Tele-operation
• Imitation (experiment 1)

Bayesian program:
Specification
• Variables
• Odeur (odour): [-10..+10], 21 values
• Vrot: [-64..+63], 128 values
• Decomposition
P(Odeur Vrot) = P(Odeur) P(Vrot | Odeur)
• Parametric forms
• P(Odeur) → Uniform (1)
• P(Vrot | Odeur) → Gaussians (21)
Identification
Utilization
• Question: P(Vrot | [Odeur=o])

9

6

Plan

• Experimental environment
• Bayesian Robot Programming
• Experiment 1
• Experiment 2
• Experiment 3
• Conclusion and perspectives

10

Experiment 1: Operating procedure

• Pairs <odour_e, vrot_e>
• One corpus / one learning run:
• One initial position
• Several initial orientations
• The same movement several times
• 3 successive learning runs

11

7

Experiment 1: Results

Replay phase

12

Experiment 1: Results

12

8

Experiment 1: Results

12

Experiment 1: Results

12

9

Plan

• Experimental environment
• Bayesian Robot Programming
• Experiment 1
• Experiment 2
• Experiment 3
• Conclusion and perspectives

13

Experiment 2: Description

14

10

Experiment 2: Description

• Learning the means and standard deviations by imitation

Bayesian program:
• Variables
• Odeur1: [-10..+10], 21 values
• Vrot: [-64..+63], 128 values
• Decomposition
P(Odeur1 Vrot) = P(Odeur1) P(Vrot | Odeur1)
• Parametric forms
• P(Odeur1) → Uniform (1)
• P(Vrot | Odeur1) → Gaussians (21)
• Question: P(Vrot | [Odeur1=o])

14

Experiment 2: Description

• Learning the means and standard deviations by imitation

Bayesian program:
• Variables
• Odeur2: [-10..+10], 21 values
• Vrot: [-64..+63], 128 values
• Decomposition
P(Odeur2 Vrot) = P(Odeur2) P(Vrot | Odeur2)
• Parametric forms
• P(Odeur2) → Uniform
• P(Vrot | Odeur2) → Gaussians (21)
• Question: P(Vrot | [Odeur2=o])

14

11

Experiment 2: Description

• Learning the means and standard deviations by imitation

Bayesian program:
• Variables
• Odeur3: [-10..+10], 21 values
• Vrot: [-64..+63], 128 values
• Decomposition
P(Odeur3 Vrot) = P(Odeur3) P(Vrot | Odeur3)
• Parametric forms
• P(Odeur3) → Uniform (1)
• P(Vrot | Odeur3) → Gaussians (21)
• Question: P(Vrot | [Odeur3=o])

14

Experiment 2: Operating procedure & results

• One corpus / one learning run:
• Several initial positions
• Several initial orientations
• The same movement several times

• Pairs <odour1_e, vrot_e>, <odour2_e, vrot_e>, <odour3_e, vrot_e>

15

12

Experiment 2: Operating procedure & results

15

Experiment 2: Operating procedure & results

15

13

Plan

• Experimental environment
• Bayesian Robot Programming
• Experiment 1
• Experiment 2
• Experiment 3
• Conclusion and perspectives

16

Experiment 3: Presentation

17

14

Experiment 3: Presentation

Learning phase

17

Experiment 3: Description

• Question: P(Vrot | [Fuite=f] [Odeur2=o2] [Odeur1=o1] C-comb)

• Variables
• Odeur1, Odeur2, Vrot
• H: {1, 2}, 2
• Fuite (flight): {1, 2}, 2
• Decomposition
P(Odeur1 Odeur2 Fuite H Vrot) = P(Odeur1) P(Odeur2) P(Fuite) P(H | Fuite) P(Vrot | H Odeur2 Odeur1)
• Parametric forms
• P(Odeur1), P(Odeur2), P(Fuite) → Uniform
• P(H | Fuite) → Laplace
• P(Vrot | [H=1] Odeur2 Odeur1) = P(Vrot | Odeur1 C-rem1)
• P(Vrot | [H=2] Odeur2 Odeur1) = P(Vrot | Odeur2 C-rem1)

• Learning P(H | Fuite) by imitation

18

15

Experiment 3: Identification of P(H | Fuite)

• Imitation: 4-tuples <o1_e, o2_e, f_e, v_e>

• Question:
P(H | [Odeur1=o1_e] [Odeur2=o2_e] [Fuite=f_e] [Vrot=v_e])

• Pairs <f_e, h>

P(H | Odeur1 Odeur2 Fuite Vrot) = P(Vrot | Odeur1 Odeur2 Fuite H) / Σ_H P(Vrot | Odeur1 Odeur2 Fuite H)

P([H=1] | Odeur1 Odeur2 Fuite Vrot) = P(Vrot | Odeur1 C-rem1) / ( P(Vrot | Odeur1 C-rem1) + P(Vrot | Odeur2 C-rem1) )

19
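A short sketch of this recognition question: the posterior over H is the ratio of the two sub-model likelihoods, as in the formula above. The learned Gaussians of experiments 1-2 are replaced here by placeholder parameters.

```python
import numpy as np

# Sketch of the recognition question of experiment 3: given an observed pair of
# odour readings and Vrot of the demonstrator, infer which source it is climbing.
def p_vrot_given_odeur(vrot, odeur, mu, sigma):
    """One learned Gaussian P(Vrot | [Odeur = o]) (placeholder parameters)."""
    return np.exp(-0.5 * ((vrot - mu[odeur]) / sigma[odeur]) ** 2) / sigma[odeur]

def p_h_given_example(vrot, o1, o2, mu, sigma):
    """P([H=1] | o1, o2, vrot) = l1 / (l1 + l2), as on the slide."""
    l1 = p_vrot_given_odeur(vrot, o1, mu, sigma)
    l2 = p_vrot_given_odeur(vrot, o2, mu, sigma)
    return l1 / (l1 + l2)

mu = {o: 3.0 * o for o in range(-10, 11)}       # placeholder learned means
sigma = {o: 5.0 for o in range(-10, 11)}
print(p_h_given_example(vrot=20, o1=7, o2=-2, mu=mu, sigma=sigma))
```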

Experiment 3: Operating procedure & results

Corpus construction:
• One initial position
• Several initial orientations
• The same movement several times

20

16

Experiment 3: Operating procedure & results

Two learned tables of P([H=h] | [Fuite=f]):

f = 1: h = 1: 0.729, h = 2: 0.271; f = 2: h = 1: 0.184, h = 2: 0.816
f = 1: h = 1: 0.465, h = 2: 0.535; f = 2: h = 1: 0.327, h = 2: 0.673

20

Plan

• Experimental environment
• Bayesian Robot Programming
• Experiment 1
• Experiment 2
• Experiment 3
• Conclusion and perspectives

21

17

Conclusion

• Experiments 1 & 2: simple imitation: conclusive results

• Experiment 3: the imitator recognizes the behaviour it imitates; conclusive results in one configuration. A first step towards recognizing what the other agent is doing.

22

Perspectives

• Technical
• Assisting the learning
• Adapting the visual tracking
• Testing the robustness of the behaviours
• Completing the corpora
• Learning more complex behaviours

• Longer term
• “Training” of the BIBA robot by a human (heel-following)
• Propagation of behaviours (from Koala to Koala)
• Recognizing behaviours

23

9/07/02

1

Bayesian programming of videogame characters

Ronan Le Hy
DEA Sciences Cognitives, 2001 - 2002
Supervisors: Pierre Bessière, Olivier Lebeltel

2

9/07/02

2

3

Idea

A new programming method for bots in video games

4

Objectives

For the development team:
- ease of programming
- limited computation time
- separation between programming and the design of the character's behaviour
- ease of programming different behaviours

For the player:
- “humanity”
- teaching bots how to play

9/07/02

3

5

Outline

framework
objectives
platform
concrete objective
Bayesian model
programming a behaviour
learning
conclusion

6

Framework: platform

bot

Unreal Tournament

Gamebots (ISI & CMU)

messages:
• position
• velocity
• health
• visible characters…

orders:
• go to a point
• shoot…

9/07/02

4

7

Framework: concrete objective

loop:
read the sensor values
choose a new state
act

States: weapon search, health-pack search, flight, attack, exploration, danger detection

Formally: knowing the current state Et and the sensory variables Vi, decide the new state Et+1

8

Bayesian robot programming

Structure of a Bayesian program

9/07/02

5

9

Bayesian model: relevant variables

Et, Et+1: states of the bot: Attaque (attack), RechercheArme (weapon search), RechercheVie (health search), Exploration, Fuite (flight), DetectionDanger (danger detection)

Sensory variables: Vie (health), Arme (weapon), ArmeAdversaire (opponent's weapon), Bruit (noise), NombreEnnemis (number of enemies), ProxArme (weapon nearby), ProxSanté (health nearby)

Motor variables

10

Bayesian model: decomposition

P(Et Et+1 V A Ad B Ne Pa Ps)
= P(Et) P(Et+1 | Et) P(V | Et Et+1) P(A | V Et Et+1) P(Ad | A V Et Et+1) P(B | Ad A V Et Et+1) P(Ne | B Ad A V Et Et+1) P(Pa | Ne B Ad A V Et Et+1) P(Ps | Pa Ne B Ad A V Et Et+1)

Hypothesis: the sensory variables and Et are pairwise independent given Et+1:

= P(Et) P(Et+1 | Et) P(V | Et+1) P(A | Et+1) P(Ad | Et+1) P(B | Et+1) P(Ne | Et+1) P(Pa | Et+1) P(Ps | Et+1)

9/07/02

6

11

Parametric forms

P(Et): unknown (not specified)
P(Et+1 | Et): table
P(Vsensory | Et+1): tables

Identification: by hand, or by learning

12

Bayesian model: question

What is the next state, knowing the current state and the sensory variables? P(Et+1 | Et V A Ad B Ne Pa Ps)

Resolution:
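A minimal sketch of how this question is answered: the next-state distribution is the transition term multiplied by one inverse table per observed sensory variable, then normalized. Apart from the 0.95/0.01 transition table given later in the talk, the tables are simplified placeholders.

```python
import numpy as np

# Sketch of the inverse-programming question P(E_t+1 | E_t, sensors).
STATES = ["Attaque", "RechercheArme", "RechercheVie", "Exploration", "Fuite", "DetectionDanger"]
N = len(STATES)
P_TRANS = np.full((N, N), 0.01) + np.diag([0.94] * N)      # P(E_t+1 | E_t), 0.95 diagonal

# One inverse table per sensory variable: P(value | E_t+1); placeholder numbers
# loosely following the health table shown later in the talk.
P_VIE = {"Haut": np.array([0.899, 0.45, 0.001, 0.45, 0.1, 0.45]),
         "Bas":  np.array([0.001, 0.10, 0.989, 0.10, 0.7, 0.10])}

def next_state_distribution(e_t, sensors):
    p = P_TRANS[STATES.index(e_t)].copy()
    for table, value in sensors:
        p *= table[value]                 # one multiplicative term per sensor
    return p / p.sum()

p = next_state_distribution("Exploration", [(P_VIE, "Bas")])
print(dict(zip(STATES, p.round(2))))
```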

9/07/02

7

13

Inverse programming: ease of programming

Classical programming:
knowing Et and V1, V2…, give Et+1
for |Et| = 6 transitions from a state, partition the sensory space of size |V| = 648 into 6 sets
one transition = one logical formula

Inverse programming:
knowing Et+1, give V1, V2…
for 7 sensory variables and |Et| = 6 states (i.e. 42 cases), give a distribution
one distribution = one table

14

Inverse programming (2)

If P(Arme=Aucune | Et+1 = Attaque) = 0, and if Arme = Aucune (no weapon), then the bot cannot switch to Attaque:

P(Et+1 = Attaque | … [Arme=Aucune] …) = 0

Each elementary distribution contributes to the question, which is a kind of geometric mean.

Controlled complexity: linear in the number of states, linear in the number of variables.

9/07/02

8

15

Ease of development

A form of specification of the automaton that is more condensed, more powerful, and light in computation time.

16

Programming a behaviour: writing the tables

P(Et+1 | Et), self-maintenance: for the states A (Attaque), RA (RechercheArme), RV (RechercheVie), Ex (Exploration), Fu (Fuite), DD (DetectionDanger), the table puts 0.95 on staying in the same state and 0.01 on each other transition.

P(Vie | Et+1), health-level management, values Haut (high) / Moyen (medium) / Bas (low); most plausibly reconstructed from the extracted cells with columns DD, Fu, Ex, RV, RA, A:
Haut:  0.45  0.1  0.45  0.001  0.45  0.899
Moyen: 0.45  0.2  0.45  0.01   0.45  0.1
Bas:   0.1   0.7  0.1   0.989  0.1   0.001

P(NombreEnnemis | Et+1): risk management (table not recoverable from the extracted text).

9/07/02

9

17

spécification à la main

séparation du développement et de laconception du personnage

les comportements comme desdonnées

18

programmation d’un secondcomportement : ajustabilité second comportement aggressif

9/07/02

10

19

programmation d’uncomportement

(films : odge etberserk)

20

programmation d’uncomportement

(films : odge etberserk)

9/07/02

11

21

apprentissage :enseignement par le joueur

interface

22

tables

9/07/02

12

23

humanité

critère subjectif

cependant : forme de test de Turing

24

Conclusion

For the development team:
- limited computation time
- ease of programming
- separation between programming and character design
- ease of programming different behaviours

For the player:
- “humanity”
- teaching the game to bots

9/07/02

13

25

Perspectives

- push the Bayesian approach further down (lower levels)
- integrate deliberative aspects into the model
- new learning schemes

26

9/07/02

14

27

Gamebots (1): sensors

Synchronous and asynchronous messages:

Character: id, name, team; position (rotation, location), velocity; health, armour, life level; weapon, ammunition

Other visible characters: id, name, team; position (rotation, location), velocity, reachability; weapon, firing

Environment in the field of view: navigation nodes (id, location, reachability); doors, lifts (id, location, reachability, type); objects (id, location, reachability, type)

Game: scores, flag capture

Events: object picked up; feet, head or body change zone (water, lava…); weapon change (automatic or requested); collision with a wall, an object or a player; fall; death, injury; death or injury inflicted by oneself; noise (footsteps, lift, shots, object picked up); projectile heading towards oneself; answer to a path or reachability request; message from another player (text or typed)

28

Gamebots (2): commands

Movement: walk or run towards a point, a player, an object, a navigation node…; run towards a point while facing a point/object (strafe); turn towards a point/object or by an angle; stop; jump

Weapons: start firing; stop firing; change weapon

Requests: path to a point/object; reachability of a point/object; message to the other players

1

Learning issues discussion

Moderated by Jean Laurens and Frédéric Davesne, LPPA

BIBA Summer School, Moudon, 30 June - 5 July 2002

Introduction

When is learning useful?
– Learning techniques are used when uncertainty is encountered: lack of prior knowledge

But ...
– Each learning technique needs the master to give prior knowledge

Questions about this prior knowledge
– Feasibility? Certainty? Accuracy?

The core of the discussion ...
– What about raising these questions within the Bayesian framework?

2

Overview

1 - What do we mean by “Learning”?
– Why use learning?
– Short overview of some learning paradigms
– Learning issues (open to discussion)

2 - Learning within the Bayesian framework
– A supervised learning example
– A latent learning example

Why use learning techniques?

Causes of uncertainty
– Inability to model the “world”
the environment is unknown or is not constrained enough
the model of the sensors and/or the effectors is unknown
the interaction between the system and its environment is unknown or is not constrained enough
– Lack in the computing process
how to reach a goal or to achieve a behaviour?
how to produce useful and reliable data?

3

The technical side

Perceptual issue
the environment is unknown or is not constrained enough
the model of the sensors and/or the actuators is unknown
the interaction between the system and its environment is unknown or is not constrained enough
how to produce useful and reliable data?

Procedural issue
how to reach a goal or to achieve a behaviour?

Learning paradigms

Supervised
Reinforcement
Unsupervised
Latent

4

Example

[Figure: a robot, a visual landmark, a target, a movement, and the distance Dist to the target.]

Sensory variables: landmark perception
Motor variable: “this way!”
Behavioural variable: d(Dist)/dt < 0 -> target reaching

Learning paradigms: supervised

Multi-layered neural networks with back-propagation

Hypothesis about prior knowledge
– Each of the examples is a functional and meaningful relation between input and output data
– Uncertainty = inaccuracy

Learning = best interpolation

[Diagram: a set of examples feeding a neural network]

5

Learning paradigms: reinforcement

AHC or Q-Learning-like methods

Hypothesis about prior knowledge
– The states are perfectly designed
Uncertainty = inaccuracy for the perceptual issue
MDP or POMDP
– The reinforcement value is relevant and perfectly known
– Knowledge about the probability of finding a first solution (relevant internal parameters)

Learning = maximising expected rewards

[Diagram: robot and environment exchanging state, action and reward]
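For reference, a minimal tabular Q-learning sketch of the kind of method named above; as the slide assumes, the states, the actions and the reward signal are supplied by the designer.

```python
import random
from collections import defaultdict

# Minimal tabular Q-learning sketch: learning = maximising expected reward.
Q = defaultdict(float)                     # Q[(state, action)]
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
ACTIONS = ["left", "right", "forward"]

def choose(state):
    if random.random() < EPSILON:          # exploration
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """One Q-learning step: move Q towards the bootstrapped return."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

update("s0", choose("s0"), reward=1.0, next_state="s1")   # one interaction step
```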

Learning paradigms: unsupervised

Clustering methods (SOM)

Hypothesis about prior knowledge
– Unitisation and differentiation postulates (what makes a state exist and be unique?)
– May lead to a supervised learning problem

Learning = fit a probability distribution

[Diagram: signals feeding a system state]

6

Learning paradigms: latent [Seward 1949]

Some relevant information can be learned without any conditioning process
– Goal = anticipation - ACS [Stolzmann 1998]

Hypothesis about prior knowledge
– The states or the symbols used by the system are perfectly designed

Learning = anticipate as well as possible

Learning issues

• Structure of the model
• Curse of dimensionality
• Meaningfulness of the variables
• Specification of the prior knowledge
• Uncertainty about the relevance of the prior knowledge given by the master

7

Learning issues: structure of the model

Curse of dimensionality: “Almost all the learning techniques are bound to fail if the intrinsic dimensionality of the problem is too big” [Verleysen 2000]

– Input space (supervised, unsupervised): need for an enormous amount of data
– Search space (reinforcement): need for an enormous amount of time to discover the goal

WARNING!
– Generally, the intrinsic dimensionality is less than the dimensionality of the input space (e.g. a Khepera robot in a structured environment)

Learning issues: structure of the model

Meaningfulness of the variables
– Hidden variables
– Symbol grounding problem [Harnad 1992]

8

Learning issues: prior knowledge

Idealistic learning technique
• Certainty about the reliability of the prior knowledge given by the master, in the robotics context

Consequences of uncertainty about prior knowledge
• Lack of reliability of the result of the learning process
• Lack of predictability: inability to determine the cause of a failure in a learning process
• a) Learnability of my problem?
• b) Correctness of my prior knowledge?

Example: learning of the cart-pole balancing problem with reinforcement techniques [Davesne 2002]

Learning issues: prior knowledge

An example of the consequences of wrong prior knowledge

Cart-pole balancing task with reinforcement learning

[Plot: the learning seems to be good, but ...]

9

Learning issues: prior knowledge

An example of the consequences of wrong prior knowledge [Davesne 2002]

Cart-pole balancing task with reinforcement learning

[Plot: if the success criterion is made much stricter, the system fails to learn the task.]

Initiation of the discussion ...

We have shown
• Some learning paradigms, each of which must be furnished with specific prior knowledge
• Some typical learning issues which must be overcome (if possible)

Now, it's time to raise questions
• With which of these learning paradigms should we associate the Bayesian framework?
• Is Bayesian learning a new learning paradigm?
• What about the hypotheses about the prior knowledge?
• What are the main issues?

10

2 - Learning within the Bayesian framework: a supervised learning example

Behaviour learning by imitation
• Wall-following, phototaxis, etc., by a Khepera robot

Pattern recognition (WARNING: symbol grounding problem)
• Distinguish a wall from a corner

An example of latent learning

11

Bibliography

Davesne, Frédéric (2002). Etude de l'émergence de facultés d'apprentissage fiables et prédictibles d'actions réflexes, à partir de modèles paramétriques soumis à des contraintes internes. PhD thesis, Université d'Evry Val d'Essonne.

Harnad, Steven (1992). Cognition and the symbol grounding problem. Electronic symposium on computation.

Seward, John P. (1949). An Experimental Analysis of Latent Learning. Journal of Experimental Psychology, 39, 177-186.

Stolzmann, Wolfgang (1998). Anticipatory Classifier Systems. In Koza, John R. et al. (eds.), Genetic Programming 1998: Proceedings of the Third Annual Conference, July 22-25, 1998, University of Wisconsin, Madison, Wisconsin. San Francisco, CA: Morgan Kaufmann, 658-664.

Verleysen, Michel (2000). Machine learning of high-dimensional data: local artificial neural networks and the curse of dimensionality. Thèse d'agrégation, Université catholique de Louvain, Belgium.