BIBA Summer School Proceedings (BIBA IST–2001-32115: Bayesian Inspired Brain and Artefacts)


BIBA IST–2001-32115

Bayesian Inspired Brain and Artefacts: Using probabilistic logic to understand brain function and implement life-like behavioural co-ordination

Summer School Proceedings

Deliverable: 5. Workpackage: 6. Month due: July 2002. Contract Start Date: 01.11.2001. Duration: 48 months

Project Co-ordinator: INRIA–UMR-GRAVIR

Partners: CNRS–UMR-GRAVIR; UCL-ARGM; UCAM-DPOL; CNRS-UMR-LPPA; CDF-UMR-LPPA; EPFL; MIT-NSL

Project funded by the European Community under the “Information Society Technologies” Programme (1998-2002)

BIBA SUMMER SCHOOL

30th June – 5th July 2002

Ecole Cantonale d'Agriculture de Grange-Verney, Moudon, Switzerland

Contents

1 Participants

2 Program

3 Sessions

Annex - Presentations

1 Participants

Autonomous Systems Lab (ASL) - Ecole polytechnique fédérale de Lausanne (EPFL)
http://dmtwww.epfl.ch/isr/asl/
Roland SIEGWART, Professor
Dr. Nicola TOMATIS, Engineer
Guy RAMEL, PhD student
Santanu METIA, PhD student

Laplace Group - Institut National de Recherche en Informatique et Automatique (INRIA)
http://www-laplace.imag.fr
Pierre BESSIERE, Researcher
Emmanuel MAZER, Research Director
Olivier AYCARD, Assistant Professor
Hubert ALTHUSER, Research Engineer
Olivier LEBELTEL, Research Engineer
Kamel MEKHNACHA, Research Engineer
Christophe COUE, PhD student
Julien DIARD, PhD student
Carla KOIKE, PhD student
Cédric PRADALIER, PhD student
Francis COLAS, Graduate
Ronan LE HY, Graduate
Frédéric RASPAIL, Graduate
Adriana TAPUS, Graduate
Olivier MALRAIT, Administrative

Laboratoire de Physiologie pour la Perception et l'Action (LPPA) - Collège de France
http://www.college-de-france.fr/chaires/chaire3/index.htm
Alain BERTHOZ, Professor
Frédéric DAVESNE, PostDoc
Jean LAURENS, Graduate

2 Program

Monday 1st July

9h00 – 12h30: Sessions 1 & 4
Session 1 - Biologically plausible probabilistic representation and inference mechanisms

Session 4 - Visual-vestibular interaction in self motion perception and gaze stabilization

14h00 – end of afternoon: Session 3 & PMC
Session 3 - Switching vs Weighting & Sensor selection based (in particular) on contraction theory

Project Management Committee

Tuesday 2nd July

9h00 – 13h00: Session 2
Session 2 - Representation of space, maps and navigation

14h00 – end of afternoon: Sessions 5 & 6
Session 5 - Bayesian Robot programming and modelling

Session 6 - Learning Issues Discussion

Evening: Robots Exhibition Visit

Wednesday 3rd July

9h00 – 12h30: Robot Specification & General Discussion

Afternoon: End

3 Sessions

1 - Biologically plausible probabilistic representation and inference mechanisms

Session length: 2h (presentations + discussion)

* Analytical and Bayesian Models of Head Direction Cells, Francis Colas, Pierre Bessière, 30 mn

* NeuroKhepera: One Step towards Neural Implementation of Bayesian Inference, Jean Laurens, 30 mn

2 - Representation of space, maps and navigation

Session length: 4h (presentations + discussion)

* Bayesian Maps and Navigation, Julien Diard, 1h

* Markov Models Use in Robotics, Adriana Tapus, Olivier Aycard, 1h

* Hybrid Mobile Robot Navigation: A Natural Integration of Metric and Topological, Roland Siegwart, Nicola Tomatis, 1h

3 - Switching vs Weighting & Sensor selection based (in particular) on contraction theory

Session length: 2h30 (presentations + discussion)

* Some Examples of Visuo-Vestibular Interaction Models, Alain Berthoz, 1h

* Selection of relevant sensors and setting of new sensory apparatus, Pierre Bessière, 30 mn

4 - Visual-vestibular interaction in self motion perception and gaze stabilization

Session length: 1h (presentations + discussion)

* Nystagmus and Visuo-Vestibular Interactions Modelling, Jean Laurens, 30 mn

5 - Bayesian Robot programming and modelling

Session length: 2h30 (presentations + discussion)

* Obstacle Avoidance using Proscriptive Programming, Cédric Pradalier, Carla Koike, 30 mn

* A First Step toward Bayesian Learning by Imitation, Pierre Bessière, 30 mn

* Bayesian Programming of Videogame Characters, Ronan Le Hy, Olivier Lebeltel, 30 mn

6 – Learning Issues Discussion

Session length: 2h

Introduced and moderated by Frédéric Davesne & Jean Laurens

Annex - Presentations

Session 1 - Biologically plausible probabilistic representation and inference mechanisms

- Analytical and Bayesian Models of Head Direction Cells, Francis Colas, Graduate, INRIA

Session 2 - Representation of space, maps and navigation

- Hybrid Mobile Robot Navigation: A Natural Integration of Metric and Topological, Dr Nicola Tomatis, Prof. Roland Siegwart, EPFL

- Bayesian Maps and Navigation, Julien Diard, PhD student, INRIA

Session 5 - Bayesian Robot programming and modelling

- Obstacle Avoidance Using Proscriptive Programming, Cédric Pradalier, Carla Koike, PhD students, INRIA

- Bayesian Learning by Imitation (Apprentissage bayésien par imitation), Frédéric Raspail, Graduate, INRIA

- Bayesian Programming of Videogame Characters (Programmation bayésienne de personnages de jeux vidéos), Ronan Le Hy, Graduate, INRIA

Session 6 – Learning Issues Discussion

Introduced and moderated by Jean Laurens PhD student & Frédéric Davesne PostDoc, LPPA

1

1

Analytical and Bayesian modelling of head direction cells

Francis COLAS

Tutor : Pierre Bessière

European project BIBA in collaboration with

L.P.P.A. (Collège de France)

2

Introduction

[Introduction figure: head direction θ and angular velocity ω]

2

3

Angle coding

• Head direction cells:
– firing rate correlated with head direction angle

– coding for actual or anticipated orientation

Taken from [Arleo00]

4

Brains areas and projections

• Adn (antero-dorsal nucleus): anticipated head direction (≈25 ms)
• Dtn (dorsal tegmental nucleus): head angular velocity
• Lmn (lateral mammillary nucleus): anticipated orientation (≈95 ms)
• Psc (postsubiculum): present angle

Adapted from [Stackman98]

[Projection diagram, adapted from [Stackman98]: Dtn (ω), Lmn (θ), Adn (θ), Psc (θ)]

3

5

Contents

• Introduction

• Previous models

• Analytical modelling

• Bayesian modelling

6

Previous models

• First model [McNaughton91]

• Neural implementation [Skaggs95]

• Mathematical framework [Zhang96]

• Adn study [Blair95], [Blair97]

• Use of Lmn for integration [Arleo00]

• Modeling attractor deformation in Adn [Goodridge00]

4

7

Constraints

1. Integration
2. Respect of projections
3. Anticipatory time intervals
4. Anatomical lesions
5. Uncertainty

8

Contents

• Introduction
• Previous models
• Analytical modelling
– Functional dependencies
– Methodology
– Formulas
– Tests
– Behaviour
– Results

• Bayesian modelling

5

9

Functional dependencies

Psct = g1(Psct-1, Adnt-1)
Adnt-1 = g2(Psct-2, Lmnt-2)
Lmnt-2 = g3(ωt-3, Psct-3, Lmnt-3)

[Dependency graph over ωt-3, Lmnt-3, Psct-3, Lmnt-2, Psct-2, Adnt-1, Psct-1, Psct, shown next to the anatomical projection chain Dtn, Lmn, Adn, Psc.]

Constraint 2: Projections

10

Methodology

[Graph: ωt-3, Lmnt-3, Psct-3; Lmnt-2, Psct-2; Adnt-1, Psct-1; Psct]

Definitions:
Psct = θ(t), Psct-1 = θ(t − dt), Psct-2 = θ(t − 2dt), Psct-3 = θ(t − 3dt)
Adnt-1 = θ(t + τA − dt)
Lmnt-2 = θ(t + τL − 2dt), Lmnt-3 = θ(t + τL − 3dt)
ωt-3 = ω(t − 3dt)

First-order expansions:
Psct = Psct-1 + ω·dt + O(dt²)
Adnt-1 = Psct-1 + τA·ω + O(τA²)
ωt-1 = (Adnt-1 − Psct-1)/τA + O(τA)
Psct = Psct-1 + (dt/τA)·(Adnt-1 − Psct-1) + O(dt² + dt·τA)

6

11

Formulas

[Formula slide, garbled in extraction. From the definitions on the previous slide it derives, to first order in dt and the anticipation constants τA and τL:

ωt-3 = (Lmnt-3 − Psct-3)/τL + O(τL)
Lmnt-2 = Lmnt-3 + dt·ωt-3 + O(dt²)
Adnt-1 = Psct-2 + (τA + dt)·ωt-3 + O(dt², dt·τ)

together with a small dependency graph over ωt-3, Psct-3, Lmnt-3 and Lmnt-2.]

12

Angular velocity

• Tests:
– constant velocity

– trapezoidal profile for head angular velocity :

7

13

Behaviour

Results of an execution

Constraint 1: Integration

14

Results

[Checklist (marks lost in extraction): 1. Integration, 2. Respect of projections, 3. Anticipatory time intervals, 4. Anatomical lesions, 5. Uncertainty; three constraints carry one kind of mark and two the other.]

8

15

Contents

• Introduction

• Previous models

• Analytical modelling

• Bayesian modelling
– Variables

– Decomposition

– Identification

– Questions

16

Bayesian model : Variables

• Variables:
– Psct-3, Psct-2, Psct-1, Psct: D = [-180; 180[
– Lmnt-3, Lmnt-2: D = [-180; 180[
– Adnt-1: D = [-180; 180[
– ωt-3: Dω = [-600; 600]
– τA: DA = [-0.010; 0.050]
– τL: DL = [0.060; 0.110]

9

17

Bayesian model : Decomposition

P(Psct-3 Psct-2 Psct-1 Psct Adnt-1 Lmnt-3 Lmnt-2 ωt-3 τA τL)
= P(Psct-3) × P(Psct-2) × P(Psct-1) × P(Lmnt-3) × P(ωt-3) × P(τA) × P(τL)
× P(Lmnt-2 | Lmnt-3 Psct-3 ωt-3 τL)
× P(Adnt-1 | Lmnt-2 Psct-2 τA τL)
× P(Psct | Adnt-1 Psct-1 τA)

[Graphical model: ωt-3, Lmnt-3, Psct-3 feed Lmnt-2; Lmnt-2, Psct-2 feed Adnt-1; Adnt-1, Psct-1 feed Psct; τL and τA condition the corresponding terms.]

18

Bayesian model : Distributions

• P(ωt-3): uniform;

• P(τL), P(τA): Gaussians matching biological data;

• P(Psct-3), P(Psct-2), P(Psct-1), P(Lmnt-3): Gaussians around an initialization value;

• For Lmn, Adn and Psc: Gaussians centered around the results of the analytical formulas.
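To make the decomposition above concrete, here is a minimal Python sketch (not from the original slides): it samples the model forward, with Gaussian terms centred on the analytical formulas, to approximate P(Psct | ωt-3) for a fixed angular velocity. All numerical settings are placeholders.

```python
import numpy as np

# Hypothetical sketch: ancestral sampling of the Bayesian head-direction model,
# approximating P(Psc_t | omega_t-3).  All parameters are placeholders.
rng = np.random.default_rng(0)

def sample_psc_t(w, n=100_000, dt=0.01, sigma=2.0):
    tau_A = np.clip(rng.normal(0.025, 0.010, n), 0.005, 0.050)  # Adn anticipation (s)
    tau_L = np.clip(rng.normal(0.095, 0.015, n), 0.060, 0.110)  # Lmn anticipation (s)
    psc_t3 = rng.normal(0.0, 5.0, n)                 # initialization priors (deg)
    lmn_t3 = rng.normal(psc_t3 + tau_L * w, 5.0)
    # Gaussians centred on the analytical formulas:
    psc_t2 = rng.normal(psc_t3 + w * dt, sigma)
    lmn_t2 = rng.normal(lmn_t3 + w * dt, sigma)
    adn_t1 = rng.normal(psc_t2 + (tau_A + dt) * w, sigma)
    psc_t1 = rng.normal(psc_t2 + w * dt, sigma)
    psc_t = rng.normal(psc_t1 + (dt / tau_A) * (adn_t1 - psc_t1), sigma)
    return psc_t

samples = sample_psc_t(w=200.0)                      # deg/s
print("P(Psc_t | omega): mean =", round(samples.mean(), 1), "deg")
```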

10

19

Bayesian model : Program

• Questions:
– P(Lmnt-2 | ωt-3)
– P(Psct | ωt-3)

[Same graphical model as on the previous slide.]

20

Conclusion

[Checklist: all five constraints are marked: 1. Integration, 2. Respect of projections, 3. Anticipatory time intervals, 4. Anatomical lesions, 5. Uncertainty.]

11

21

Future work

• Microscopic plausibility of the Bayesian hypothesis

• Extension to place cells

• Use of vision

† Swiss Federal Institute of Technology, Lausanne, Switzerland
‡ The Robotics Institute, CMU, Pittsburgh, USA

Hybrid Mobile Robot Navigation: A Natural Integration of Metric and Topological

BIBA School, Moudon

Nicola Tomatis †

Roland Siegwart †

In collaboration with: Illah Nourbakhsh ‡


Contents

• Introduction
• Environmental Modeling
• Localization and Map Building
• Metric
• Topological
• Closing the Loop
• Switching Model
• Experimental Results
• Conclusion and Outlook

Introduction

• Mobile robotics for applications:
• Precision with respect to the environment
• Robustness avoiding human intervention
• Practicability with limited embedded resources
• Ergonomics for the user (man-machine interaction)

Motivation


Introduction

• Assumption: theory is available
• Goal: theory to practice
• Approach:
• Study the literature
• Focus on advantages and disadvantages of existing methods
• Propose a more human (bio?) - inspired approach
• Validate it empirically

Motivation


Introduction

Related Work: Localization


Introduction

Related Work: SLAM


Environmental Modeling

The idea:
• One global topological map
• Many local metric maps

Metric - Topological


• Features
• Horizontal lines from laser scanner
• Local metric map containing the features belonging to the same physical place

Environmental Modeling

Metric Model


Environmental Modeling

• Features:
• Corners
• Openings
• Map is a graph
• Openings correspond to map states

Topological Model


• Map building strategy
• Implementation for office environment
• Assumption: precision is needed in rooms
• Navigation in hallways is topological, in rooms metric
• Exploration strategy
• Depth-first search in the hallways first
• Then backtracking to visit the rooms

Localization and Map Building

Strategies


Localization and Map Building

Metric: The EKF


• The product rule
• The Bayesian rule
• The Markov assumption
• The independence assumption
• Errors are Gaussian, error propagation is linear

Localization and Map Building

The EKF is Bayesian

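The slide's point is that, under the product rule, Bayes' rule, the Markov and independence assumptions, and Gaussian errors with linearized propagation, the EKF update is exactly a Bayesian posterior update. A minimal predict/update sketch (illustrative only, not the system described in the talk):

```python
import numpy as np

# Minimal EKF predict/update sketch (illustrative, not the paper's implementation).

def ekf_predict(x, P, f, F, Q):
    """Propagate mean x and covariance P through motion model f (Jacobian F)."""
    return f(x), F @ P @ F.T + Q

def ekf_update(x, P, z, h, H, R):
    """Condition on observation z with measurement model h (Jacobian H)."""
    y = z - h(x)                         # innovation
    S = H @ P @ H.T + R                  # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    return x + K @ y, (np.eye(len(x)) - K @ H) @ P

# Toy example: 2-D position, odometry displacement u, noisy position fix z.
x, P = np.zeros(2), np.eye(2)
u, Q = np.array([1.0, 0.5]), 0.01 * np.eye(2)
x, P = ekf_predict(x, P, lambda s: s + u, np.eye(2), Q)
x, P = ekf_update(x, P, np.array([1.1, 0.4]), lambda s: s, np.eye(2), 0.05 * np.eye(2))
print(x, np.diag(P))
```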

Localization and Map Building

• Stochastic Map [Smith88]

• Update:
• Displacement
• New observation
• Re-observation

Metric: Stochastic Map


• Belief state vector
• State transition
• Observation
• Estimation
• Control strategy
• Path-planning: graph based

Localization and Map Building

Topological: POMDP


• The “observation graph” permits the detection of new features

• Handle environmental dynamics

Localization and Map Building

Topological Map Building


Localization and Map Building

Closing the Loop


Localization and Map Building

• Topological is multimodal

• Metric is unimodal
• EKF initialization!!!

• Confidence function:

Switching Model


• Performed with the fully autonomous Donald Duck

• Environment closed for a finite exploration

• Experiments:
• Map building
• Localization (tracking)
• Bootstrapping (global localization)
• Closing the loop

Experimental Results

Experiments


Experimental Results

Map Building


Experimental Results

Localization


Experimental Results

Bootstrapping


Experimental Results

Closing the Loop


• Contribution
• Precision: mean error < 10 mm
• Robustness: multimodality in dynamic environments
• Practicability: PowerPC 604e 300 MHz
• Closing the loop
• Limitations
• Switching from topological to metric

Conclusions and Outlook

Conclusions


1

Bayesian Maps and Navigation

Julien Diard, Pierre Bessière
Laplace Team - Sharp project

Gravir Lab/IMAG, Grenoble

http://www-laplace.imag.fr/

12/06/2002

2

Introduction

• Questions relevant in robotics and biology:
– What is navigation?

– What is a location?

– What is a map?

– What is planning?

– What is localization?

2

3

Contents

• Bibliography
• Bayesian Robot Programming
– Example and definition
– Putting Bayesian programs together
• Bayesian maps
– Definition and example
– Putting Bayesian maps together

4

Bibliography: the classical approach

• Navigation:
– Finding a path for a point in workspace(-time);
– control of the plan
• applied mathematical problems
• Assumes that
– A precise geometric map is available
– The locations of the robot and the goal point are precisely known

• These conditions are never met

3

5

Bibliography: probabilistic approaches

• Markov models
– POMDPs, HMMs, Markov Localization, (Extended) Kalman Filters, Dynamic Bayes Networks, etc.
• Pros
– Treat incompleteness and uncertainties
• Cons
– Often not hierarchical
– Often associated with geometric, fine-grained models
– Impose dependencies and independencies

6

Bibliography: bio-inspired approaches

• Kuipers' Spatial Semantic Hierarchy, among many others
• Pros
– Reflection on the definitions of « localization » and « maps »: cognitive maps
– Hierarchical
• Cons
– Various formalisms
• Consistency, communication between modules, …

4

7

Bibliography: analysis

• Use Bayesian Robot Programming
– Unified framework for dealing with incompleteness and uncertainties
– Explicit declaration of assumptions
• Emphasis on the semantics of variables
• Does not impose any dependencies
– Modularity allows
• Incremental development
• Easy building of hierarchies

8

Contents

• Bibliography
• Bayesian Robot Programming
– Example and definition
– Putting Bayesian programs together
• Bayesian maps
– Definition and example
– Putting Bayesian maps together

5

9

Example: Sensor fusion

• Objective
– Find the position of a light source
• Problem
– The robot does not have a dedicated sensor
• Solution
– Model of each sensor
– Fusion of the eight models

[Figure: the light source at angle ThetaL and distance DistL from the robot; Lmi is the reading of light sensor i.]

10

Light sensor model (1)

Bayesian program (preliminary knowledge πsensor):
– Variables: ThetaL, DistL, Lmi
– Decomposition:
P(ThetaL ∧ DistL ∧ Lmi | πsensor) = P(ThetaL ∧ DistL | πsensor) × P(Lmi | ThetaL ∧ DistL ∧ πsensor)
– Parametric forms:
P(ThetaL ∧ DistL | πsensor) ← Uniform
P(Lmi | ThetaL ∧ DistL ∧ πsensor) ← Gaussians
Identification: a priori programming (or learning)
Utilization: inverse questions
P(ThetaL | [Lmi = li] ∧ πsensor)
P(DistL | [Lmi = li] ∧ πsensor)

[Figure: mean sensor reading Lmi (0–500) as a function of ThetaL (−180°…180°) and DistL (0…30).]
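As a concrete illustration of the sensor description and its inverse question, here is a short Python sketch; the sensor response function and all of its parameters are invented for illustration and are not taken from the slides.

```python
import numpy as np

# Hypothetical sketch of one light-sensor description: a Gaussian forward model
# P(Lm_i | ThetaL, DistL) with a uniform prior on (ThetaL, DistL), inverted by
# Bayes' rule to answer the inverse question P(ThetaL | [Lm_i = l_i]).
THETA = np.arange(-180, 180, 10)                   # deg
DIST = np.arange(0, 31, 1)
T, D = np.meshgrid(THETA, DIST, indexing="ij")     # (ThetaL, DistL) grid

def mean_reading(theta, dist, sensor_angle):
    """Assumed response: the reading drops when facing the light and close to it."""
    ang = np.abs(((theta - sensor_angle + 180) % 360) - 180)
    return 500.0 - 450.0 * np.exp(-(ang / 60.0) ** 2) * np.exp(-dist / 15.0)

def likelihood(l_i, sensor_angle, sigma=30.0):
    """P([Lm_i = l_i] | ThetaL, DistL) over the whole grid."""
    return np.exp(-0.5 * ((l_i - mean_reading(T, D, sensor_angle)) / sigma) ** 2)

def p_theta(l_i, sensor_angle):
    """Inverse question P(ThetaL | [Lm_i = l_i]): marginalize DistL, normalize."""
    post = likelihood(l_i, sensor_angle).sum(axis=1)
    return post / post.sum()

print(p_theta(120.0, sensor_angle=-10.0).round(3))
```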

6

11

Light sensor model (2)

[Figure: the distributions P(ThetaL | Lmi Cp_li) and P(DistL | Lmi Cp_li), plotted for Lmi = 15, 45, 100, 200, 300, 450, 475 and 500; the curves themselves are not recoverable from the extracted text.]

12

Sensor Fusion (1)

Bayesian program (preliminary knowledge πFusion):
– Variables: ThetaL, DistL, Lm0, …, Lm7
– Decomposition (conditional independence hypothesis):
P(ThetaL ∧ DistL ∧ Lm0 ∧ … ∧ Lm7 | πFusion) = P(ThetaL ∧ DistL | πFusion) × Π i P(Lmi | ThetaL ∧ DistL ∧ πFusion)
– Parametric forms:
P(ThetaL ∧ DistL | πFusion) ← Uniform
P(Lmi | ThetaL ∧ DistL ∧ πFusion) ← P(Lmi | ThetaL ∧ DistL ∧ πSensor) (question to the sensor model)
Identification: no free parameters
Utilization: P(ThetaL ∧ DistL | lm0 ∧ … ∧ lm7 ∧ πFusion)
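The fusion itself can be sketched by multiplying the eight sensor terms on the (ThetaL, DistL) grid. This reuses the hypothetical likelihood() from the previous sketch; the sensor angles are assumed and the readings are the ones quoted on the next slide.

```python
# Sketch of the fusion description: conditional independence makes the joint
# likelihood a product of the eight single-sensor terms (questions to the
# sensor model); there are no free parameters.
import numpy as np

SENSOR_ANGLES = [-90, -50, -10, 10, 50, 90, 170, -170]   # Khepera light sensors

def p_theta_dist_fused(readings):
    joint = np.ones((len(THETA), len(DIST)))
    for l_i, ang in zip(readings, SENSOR_ANGLES):
        joint *= likelihood(l_i, sensor_angle=ang)        # one term per sensor
    return joint / joint.sum()

readings = [509, 480, 391, 379, 430, 503, 511, 511]       # values from the slide
fused = p_theta_dist_fused(readings)
print("most probable ThetaL:", THETA[fused.sum(axis=1).argmax()], "deg")
```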

7

13

Sensor Fusion (2)

[Figure: the eight individual distributions P(ThetaL | Lmi Cp_li) for one situation (light sensor at −90°: Lm0 = 509; −50°: Lm1 = 480; −10°: Lm2 = 391; 10°: Lm3 = 379; 50°: Lm4 = 430; 90°: Lm5 = 503; 170°: Lm6 = 511; −170°: Lm7 = 511), together with the fused distribution P(ThetaL | Lm0 … Lm7 Cp_SourceL). True source position: Theta = 10, Dist = 20.]

14

Bayesian Robot Programming

Preliminary Knowledge π
– Relevant variables X1, …, Xn
• Their range
– Decomposition
• P(X1 ∧ … ∧ Xn) as a product of simple terms
• Dependencies
• Conditional independence hypotheses
– Parametric forms
• For all terms
Identification
– Learning (with data δ) or a priori programming
Utilization
– P(searched variables | known variables ∧ π ∧ δ)
• General probabilistic inference engine

8

15

Contents

• Bibliography
• Bayesian Robot Programming
– Example and definition
– Putting Bayesian programs together
• Bayesian maps
– Definition and example
– Putting Bayesian maps together

16

Putting descriptions together

• Bayesian fusion
– Probabilistic subroutine calling
• Bayesian program combination
– Probabilistic « if - then - else »
• Bayesian program sequencing
– Probabilistic « ; »
• Bayesian program iteration
– Probabilistic « loop »
• Using functions

9

17

Scaling up: « The Nightwatchman Khepera »

• Complex behaviour (42 variables, 4 hierarchical levels…)
• Space is represented, but no explicit map

[Slide: the full joint decomposition of the 42-variable behaviour (Vrot, Vtrans, px0…px7, lm0…lm7, veille, feu, obj?, eng, tach_t-1, td_t-1, tempo, tour, dir, prox, dirG, proxG, vtrans_c, …) as a hierarchy of sub-descriptions (Cp_SL, Cp_DétectBase, Cp_Tach, Cp_TypDépl, Cp_Surveil, Cp_PhotoEvit) combined by summation over intermediate variables; the exact formula is not recoverable from the extracted text.]

18

« The Nightwatchman Khepera »

10

19

Contents

• Bibliography
• Bayesian Robot Programming
– Example and definition
– Putting Bayesian programs together
• Bayesian maps
– Definition and example
– Putting Bayesian maps together

What is navigation? What is a location? What is a map?

20

Bayesian Map: definition

– Relevant variables:
• P: perception variable
• Lt: location at time t
• Lt': location at time t' (t' > t)
• A: action variable
– Decomposition: any (e.g. Markov Localization)
– Parametric forms: any
– Identification: any
Utilization:
– Localization P(Lt | P)
– Prediction P(Lt' | A Lt)
– Control P(A | Lt Lt')

What is a location? What is localization? What is planning based on? What is navigation based on? What is a map?
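A minimal sketch of what the three questions of a Bayesian map look like over a small discrete location variable. The tables are invented; only the structure (localization, prediction and control asked of one description) follows the definition above.

```python
import numpy as np

# Hypothetical sketch of a Bayesian map over Lt = Sit = {corner, wall, empty}.
SITS = ["corner", "wall", "empty"]
BEHS = ["Stop", "Straight", "FollowWall", "QuitCorner"]

# P(Lt' | A Lt): invented transition tables, indexed [behaviour][Lt][Lt'].
TRANS = np.random.default_rng(1).dirichlet(np.ones(3), size=(len(BEHS), 3))

def localization(p_obs_given_sit):
    """P(Lt | P) for a uniform prior: normalize the observation terms."""
    p = np.asarray(p_obs_given_sit, dtype=float)
    return p / p.sum()

def prediction(beh, p_lt):
    """P(Lt' | A Lt), averaged over the current location belief."""
    return p_lt @ TRANS[BEHS.index(beh)]

def control(p_lt, target):
    """P(A | Lt Lt'): score each behaviour by how likely it reaches the target."""
    scores = np.array([prediction(b, p_lt)[SITS.index(target)] for b in BEHS])
    return scores / scores.sum()

belief = localization([0.1, 0.7, 0.2])          # e.g. from the proximeter model
print(dict(zip(BEHS, control(belief, target="corner").round(2))))
```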

11

21

Example (1/3): variables

• Map of a room based on proximeters
– P: Px0 ∧ … ∧ Px7
• Relevant features: corners, walls, empty space
– Lt: Sitt = {corner, wall, empty-space}
– Lt': Sitt+Δt
• Motor commands:
– A: Beh = {Stop, Straight, FollowWall, QuitCorner, …}

[2D visualization of the Bayesian map: corner, wall, empty space]

What is a location?

22

Example (2/3): induced graph

• Can be extracted from a Bayesian map
– Can then compute on it: diameter, connexity…
• Can help build a Bayesian map
– Comes more intuitively
– Helps talking about the Bayesian map
– Nicer than…

[« Graph form » visualization of the Bayesian map (excerpt): nodes Empty Space, Corner, Wall; edges labelled with behaviours FollowWall, QuitCorner, Straight, Stop.]

What is planning?

12

23

Example (3/3): Bayesian map localization module

– Relevant variables
• Sitt: {wall, corner, empty-space}, 3
• Px0 … Px7: {0, 1, …, 15}, 16
– Decomposition of the joint
• P(Px0 … Px7 Sitt | CPsit) = P(Sitt | CPsit) × Πi P(Pxi | Sitt CPsit)
– Parametric form for each term
• P(Sitt | CPsit) → Uniform
• P(Pxi | [Sitt=empty-space] CPsit) → Question P(Pxi | CPempty)
• P(Pxi | [Sitt=wall] CPsit) → Question P(Pxi | CPwall) = Σ J,D P(Pxi | J D CPwall)
• P(Pxi | [Sitt=corner] CPsit) → Question P(Pxi | CPcorner) = Σ Pos P(Pxi | Pos CPcorner)

24

Contents

• Bibliography
• Bayesian Robot Programming
– Example and definition
– Putting Bayesian programs together
• Bayesian maps
– Definition and example
– Putting Bayesian maps together

13

25

Putting maps together: superposition

[Slide: the corner/wall/empty-space map is superposed with a light-level map (values VHigh, High, Low, VLow); a table gives, for each combination, whether the reading increases (+), decreases (−), or may do either (+,−). The individual cells are not recoverable from the extracted text.]

What is navigation? What is a map?

26

Putting maps together: juxtaposition (1)

• Abstracting maps on the same internal variable

[Figure: Room1, Room2 and Room3 connected by a corridor; behaviours such as “Pass corridor” and “Any other behaviour” link them.]

What is planning? What is a map?

14

27

Putting maps together: juxtaposition (2)

• Other examples:
– Maps are floors, connectors are stairways or elevators → the new abstraction is a building
– Maps are buildings and streets, connectors are doors → the new abstraction is (the whole) street

28

Putting maps together: abstraction (1)

• Abstracting maps of different natures

• Map of a wall:
– P: Px0 ∧ … ∧ Px7
– Lt, Lt': θ ∧ Dist
– A: Rot ∧ Trans

[Figure: the wall map; Theta from −180 to 150, Dist ∈ {0, 1, 2}; behaviours backward, forward, stop.]

What is a location?

15

29

« Wall »: localization

– Relevant variables
• J (= Theta): {-180, -150, …, +150}, 12
• D (= Dist): {0, 1, 2}, 3
• Px0 … Px7: {0, 1, …, 15}, 16
– Decomposition of the joint
– Parametric form for each term
• P(J D) → Uniform
• P(Pxi | J D) → Gaussians
Identification: learning

[Figure: wall geometry, with Theta = −90 / Theta = 90 and the Dist levels.]

30

« Wall »: control

– Relevant variables
• J: {-180, -150, …, +150}, 12; Dist: {0, 1, 2}, 3
• Beh: {FollowWall, Away-from-wall}, 2
• Vrot, Vtrans
– Decomposition of the joint
• P(J D Beh Vrot Vtrans | CPwall-control)
= P(J D Beh | CPwall-control) × P(Vrot Vtrans | J D Beh CPwall-control)
– Parametric form for each term
• P(J D Beh | CPwall-control) → Uniform
• P(Vrot Vtrans | J D Beh CPwall-control) → G_{J,D,Beh}(Vrot, Vtrans)

16

31

« Wall » Bayesian map

– Relevant variables
• P: Px0 … Px7 (16 values each)
• Lt: J: {-180, -150, …, +150}, 12; D: {0, 1, 2}, 3
• Lt': Beh: {stop, followW, away-from-wall}, 3
• A: Vrot, Vtrans
– Decomposition of the joint
• P(Px0 … Px7 J D Beh | CPwall)
= P(Px0 … Px7 Beh | CPwall) × P(J D | Px0 … Px7 CPwall) × P(Vrot Vtrans | J D Beh CPwall)
– Parametric form for each term
• P(Px0 … Px7 Beh | CPwall) → Uniform
• P(J D | Px0 … Px7 CPwall) → Question P(J D | Px0 … Px7 CPwall-loc)
• P(Vrot Vtrans | J D Beh CPmur) → Question P(Vrot Vtrans | J D CP )

32

Putting maps together: abstraction (2)

• Map of a corner:
– Lt: Pos = {FrontL, FrontR, …}
• Map of the empty space:
– Lt: ∆

[Figure: e.g. the FrontRight position relative to a corner.]

• New abstraction: the {corner, wall, empty} map

What is a location?

17

33

Loose ends

• Small state spaces
– Necessary for planning, sufficient for most tasks?
• Exploding Lt into Lt1 ∧ … ∧ Ltn
• Planning
– Iteration of the map on P(At At+1 … At+h | Lt Lt+h)
• Learning
– Given the location variable, learn (parts of) the map: easy
– Select a location variable out of several: maybe
– Find a relevant location variable out of the blue: hard!
• Time constant
– Estimate it from the graph diameter

34

Conclusion

• A new framework for (Bayesian) mapping and navigating
– Intuitively appealing
– Integrates hierarchies
– Unified formalism
• Where's the killer experiment?
• Is it biologically plausible?
– Translation of existing models into our scheme?
– Prediction of new data?

Obstacle Avoidance Using Proscriptive Programming

Cycab Vehicle Experiment Description

Cedric Pradalier

Carla Koike

PhD Students, Sharp − GRAVIR/IMAG − CNRS

1

Motivation

" First practical experiment using a Bayesian program on the Cycab

" Combination of obstacle avoidance and other tasks in a hierarchical manner

2

Presentation Objectives

" Show several ways to implement obstacle avoidance

" Show how to fuse proscriptive commands and reference values

" Present one example of the incremental cycle in robotic applications design using Bayesian Programming

3

Contents
" Introduction
" Cycab Vehicle Description and Problem Context
" Proposed Solutions:
1. Zone weighting (prescriptive)
2. Zone fusion (prescriptive)
3. Zone fusion (proscriptive)

" Incremental Design Aspects

" Discussion and Conclusions

4

Proscriptive Programming

" Prescriptive versus Proscriptive
- Prescriptive tells you what you have to do
- Proscriptive tells you what you cannot do

" Some situations are better modelled one way or the other
- Phototaxis is typically prescriptive
- Obstacle avoidance is easily modelled as proscriptive

5

Prescriptive and Proscriptive Programming

6

Proscriptive in Bayesian Programming

" Bayesian Programming is a very useful tool for creating proscriptive models
- Permission is given by high probabilities
- Interdictions are modelled as low-valued probabilities

" Fusing different command propositions makes it possible to obtain a trade-off between desired values and allowed values

7

Command Fusion

" High level tasks determine desired (deterministic or probabilistic) values for command variables

" Obstacle avoidance fuses these desired values with the allowed situations when in the presence of obstacles

Probabilistic Fusion

High Level Task

Obstacle Avoidance

8

Contents
" Introduction
" Cycab Vehicle Description and Problem Context
" Proposed Solutions:
1. Zone weighting (prescriptive)
2. Zone fusion (prescriptive)
3. Zone fusion (proscriptive)

" Incremental Design Aspects

" Discussion and Conclusions

9

Cycab Vehicle
" Electric vehicle
" Sensor
- SICK laser scanner
" Vehicle control
- Wheel speeds
- Steering angle

10

Problem Description

" Obstacles are static. If dynamic obstacles exist, they move at slow speed (a walking person or a manoeuvring car)

" The Cycab shall avoid the obstacles, keeping when possible the desired values of translation speed and steering angle

" We define a highly dangerous elliptic area located in front of the Cycab. Any object in this area MUST impose a zero speed.

11

Sensor Variables

" Sensor reading values − Di
- Signal preprocessing creates 8 zones in front of the vehicle
- Only the lowest distance in a zone is kept
- Readings between 0~8191 are scaled to 0~200 (obstacle distance between 0 and 2000 cm)

[Figure: Zone 1 … Zone 8 in front of the vehicle; Di is the distance in zone i.]

12

Motor Variables
" Control values
- The vehicle translation speed V is discretized into 6 values: V = 0..5, a discretization of 0..Vmax. Always positive, since there is only a front sensor.
- The steering angle φ can take 11 values, −5..+5, a discretization of φmin..φmax.

[Figure: V and Φ on the vehicle]

13

Contents
" Introduction
" Cycab Vehicle Description and Problem Context
" Proposed Solutions
1. Zone weighting (prescriptive)
2. Zone fusion (prescriptive)
3. Zone fusion (proscriptive)

" Incremental Design Aspects

" Discussion and Conclusions

14

First Version Zone Weighting

" Each zone proposes probability distributions for the variables V and Φ in order to avoid the nearest obstacle seen in this zone

" A variable H is created to indicate the weight of each zone proposal for the final value of V and Φ

15

Zone Model Description

16

Zone Weighting Description

17

Program Utilization, First Version

18

Zone Weighting = Combination of Descriptions

[Diagram: each zone i proposes P_i(V | D_i) with weight P([H=i] | D_i); the weighted proposals are summed to give P(V | D1 … D8).]

19

Combination of Descriptions: Weighting

20

Results and Comments, First Version

" Results are coherent, but depend a lot on the functions f_i(D_i), which are not easy to adjust

21

Second Version − Zone Command Fusion

" The standard deviations of P_i(V | D_i) and P_i(φ | D_i) are not taken into account
- They carry information redundant with f_i(D_i)

" Command fusion can use the standard deviations of P_i(V | D_i) and P_i(φ | D_i) as weights

22

Zone Model Description
(Specification: variables, decomposition, parametric forms; identification: a priori)

23

V and φ Distribution building

24

Command Fusion Description

- Variables: D1, …, D8: 0…200 (201 values); V: 0…5 (6 values); φ: −5…5 (11 values)
- Decomposition:
P(V ∧ φ ∧ D1 ∧ … ∧ D8) = P(V ∧ φ) × Π i=1..8 P_i(D_i | V ∧ φ)
- Parametric forms:
P(V ∧ φ) = Uniform
P_i(D_i | V ∧ φ) = (1/Z) { P_i(V | D_i) × P_i(φ | D_i) }  (question to the zone model)
Identification: a priori
Utilization:
P(V ∧ φ | D1 … D8) = (1/Z) Π i=1..8 P_i(D_i | V ∧ φ)
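A small Python sketch of this version-2 fusion. The zone models below are invented for illustration (on the real Cycab, P_i(V | D_i) and P_i(φ | D_i) are identified a priori); only the product-and-normalize structure follows the description above.

```python
import numpy as np

# Sketch of the version-2 command fusion: each zone i turns its distance D_i
# into distributions P_i(V | D_i) and P_i(phi | D_i); the fused command
# distribution is their normalized product.
V_VALUES = np.arange(0, 6)            # translation speed, 6 values
PHI_VALUES = np.arange(-5, 6)         # steering angle, 11 values

def zone_model(d_i, zone_angle):
    """Hypothetical P_i(V | D_i) and P_i(phi | D_i) for an obstacle at d_i."""
    v_pref = min(5, d_i // 30)                              # closer obstacle -> slower
    p_v = np.exp(-0.5 * ((V_VALUES - v_pref) / 1.0) ** 2)
    phi_pref = -np.sign(zone_angle) * max(0, 5 - d_i // 40)  # steer away from it
    p_phi = np.exp(-0.5 * ((PHI_VALUES - phi_pref) / 1.5) ** 2)
    return p_v / p_v.sum(), p_phi / p_phi.sum()

def fuse(distances, zone_angles):
    joint = np.ones((len(V_VALUES), len(PHI_VALUES)))
    for d_i, a_i in zip(distances, zone_angles):
        p_v, p_phi = zone_model(d_i, a_i)
        joint *= np.outer(p_v, p_phi)                # P_i(D_i | V phi), up to Z
    return joint / joint.sum()

zone_angles = [-70, -50, -30, -10, 10, 30, 50, 70]   # assumed zone bearings (deg)
joint = fuse([200, 200, 40, 200, 200, 200, 25, 200], zone_angles)
v_star, phi_star = np.unravel_index(joint.argmax(), joint.shape)
print("V =", V_VALUES[v_star], " phi =", PHI_VALUES[phi_star])
```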

25

Command Fusion = Composition of Descriptions

[Diagram: each zone contributes one term P_i(D_i | V ∧ φ); the terms are multiplied and normalized: P(V ∧ φ | D1 … D8) = (1/Z) Π i=1..8 P_i(D_i | V ∧ φ).]

26

Results and Comments, Second Version

" Results are similar to the previous version, but the parameters are easier to adjust

" Transitions in the speed variable and steering angle are smoother

27

Third Version − Proscriptive Programming

" The curves of P_i(V | D_i) and P_i(φ | D_i) change so as to indicate the prohibited behaviour when an obstacle is identified in each zone

" The command fusion is the same as in the previous version

" When fusing probability curves, a uniform distribution adds no information

28

Zone Model Description
(Specification: variables, decomposition, parametric forms; identification: a priori)

29

Main Difference between Versions 2 and 3

Version 3 : what is allowed/safe

30

Command Fusion Description

- Variables: D1, …, D8: 0…200 (201 values); V, Vc: 0…5 (6 values); φ, φc: −5…5 (11 values)
- Decomposition:
P(Vc ∧ V ∧ φc ∧ φ ∧ D1 ∧ … ∧ D8) = P(Vc) × P(φc) × P(V | Vc) × P(φ | φc) × Π i=1..8 P_i(D_i | V ∧ φ)
- Parametric forms:
P(V ∧ φ) = Uniform
P(Vc) = P(φc) = Unknown
P(V | Vc) = G(μ = Vc, σ_V)(V)
P(φ | φc) = G(μ = φc, σ_φ)(φ)
P_i(D_i | V ∧ φ) = (1/Z) { P_i(V | D_i) × P_i(φ | D_i) }  (question to the zone model)
Identification: a priori
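For comparison with version 2, a short sketch of the proscriptive variant: the zone terms stay near-uniform and only lower the probability of forbidden commands, while the reference value Vc enters as a Gaussian term. It reuses V_VALUES from the previous sketch; all numbers are illustrative.

```python
# Sketch of the version-3 (proscriptive) fusion: reference value Vc enters as a
# Gaussian P(V | Vc); each zone only lowers the probability of unsafe commands.
import numpy as np

def proscriptive_zone(d_i):
    """P_i(V | D_i): near-uniform, but high speeds forbidden when d_i is small."""
    v_max_safe = min(5, d_i // 30)
    p_v = np.where(V_VALUES <= v_max_safe, 1.0, 1e-3)   # interdiction = low prob.
    return p_v / p_v.sum()

def fuse_with_reference(distances, v_c, sigma_v=1.0):
    p = np.exp(-0.5 * ((V_VALUES - v_c) / sigma_v) ** 2)  # P(V | Vc)
    for d_i in distances:
        p *= proscriptive_zone(d_i)
    return p / p.sum()

print(fuse_with_reference([200, 200, 40, 200, 200, 200, 200, 200], v_c=5).round(3))
```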

31

Results and Comments, Third Version

Prescriptive: the safest speed is weighted more, but the resulting speed is greater than the safest speed.

Proscriptive: the resulting speed is the safest speed; the safest speed is the strongest constraint.

32

Example of Fusion Result

Situation:
" Two near objects, one on each side: zones 2 and 7
" One object, farther away, in front

Objective: go as fast as possible in a straight line

Reasonable behaviour:
" The vehicle should go straight forward
" It may turn slowly

33

Summary Table

Version 1: Prescriptive; Zone weighting (combination of descriptions); no reference values
Version 2: Prescriptive; Command fusion (composition of descriptions); no reference values
Version 3: Proscriptive; Command fusion (composition of descriptions); reference values

34

Videos − Third Version

Media Clip

Media Clip

35

Contents
" Introduction
" Cycab Vehicle Description and Problem Context
" Proposed Solutions:
1. Zone weighting (prescriptive)
2. Zone fusion (prescriptive)
3. Zone fusion (proscriptive)

" Incremental Design Aspects

" Discussion and Conclusions

36

Incremental Design

" New tasks can be added incrementally and composed with the obstacle avoidance task

" The tasks can supply reference values
- Deterministic or probabilistic
- Prescriptive or proscriptive

" Hierarchical combination of tasks
- Reference values can result from the combination or composition of other tasks

37

Contents
" Introduction
" Cycab Vehicle Description and Problem Context
" Proposed Solutions:
1. Zone weighting (prescriptive)
2. Zone fusion (prescriptive)
3. Zone fusion (proscriptive)

" Incremental Design Aspects

" Conclusions and Improvements

38

Comments and Conclusions

" Obstacle avoidance task and Bayesian Programming
" Proscriptive programming as a way to
- increase modularity
- model security rules
" Basis for incremental design of more complex systems
" One or more versions can be implemented on the BibaBot

39

Some Improvements...

" Implement the fourth version
" Implement a wall-following task to supply the reference values of speed and steering angle
" Try different approaches for each zone
- Proscriptive when possible
- Prescriptive when necessary
- Different joint distributions depending on how important the dependence between speed and angle is

40

Thank you!

Questions, Suggestions, Comments, ...

41

1

DEA I.S.C.

Bayesian learning by imitation
(within the European project BIBA, Bayesian Inspired Brain and Artefacts)

Frédéric Raspail

Supervisor: Pierre Bessière, GRAVIR Laboratory, SHARP Project

19 June 2002

1

Introduction

• Imitation: an ill-defined concept
• Definition: “Imitation is the process by which the imitator learns some characteristics of the behaviour of the model”
• Present in the animal world:
• Education, propagation of behaviours
• Examples:
• A duckling and its mother
• Tits
• Future interest for robotics
• Problem: having an internal model of the imitated agent

2

2

Introduction

Experiment:

• 1: gradient climbing (towards the source)

• 2: climbing towards one source among several
=> doing imitation in a simple way

• 3: climbing towards one source, then towards another
=> recognizing what the imitated agent is doing

3

Plan

• Experimental environment
• Bayesian Robot Programming
• Experiment 1
• Experiment 2
• Experiment 3
• Conclusion and perspectives

4

3

Experimental environment: Robot

• Koala robot
• Camera with 2 degrees of freedom
• Vrot, Vtrans ∈ [–64…63]
• “Nose”
• Simulated odour sources
• Odour ∈ [-10…+10]
• Visual tracking
• DEA IVR Coué 2000

5

Experimental environment: Setting up imitation

• Learning: climbing towards a source

• Imitating thanks to visual tracking

6

4

Experimental environment: Setting up imitation

Learning phase

Plan

• Experimental environment
• Bayesian Robot Programming
• Experiment 1
• Experiment 2
• Experiment 3
• Conclusion and perspectives

7

5

Source-climbing behaviour

Replay of the programmed behaviour

“Climbing towards a source” program

• Fixed a priori
• Tele-operation
• Imitation (experiment 1)

Bayesian program:
Specification
• Variables
• Odeur (odour): [-10..+10], 21 values
• Vrot: [-64..+63], 128 values
• Decomposition
P(Odeur Vrot) = P(Odeur) P(Vrot | Odeur)
• Parametric forms
• P(Odeur) → Uniform (1)
• P(Vrot | Odeur) → Gaussians (21)
Identification
Utilization
• Question: P(Vrot | [Odeur=o])

9

6

Plan

• Experimental environment
• Bayesian Robot Programming
• Experiment 1
• Experiment 2
• Experiment 3
• Conclusion and perspectives

10

Experiment 1: Operating procedure

• Pairs <odour_e, vrot_e>
• One corpus / one learning run:
• One initial position
• Several initial orientations
• The same movement several times
• 3 successive learning runs

11

7

Experiment 1: Results

Replay phase

12

Experiment 1: Results

12

8

Experiment 1: Results

12

Experiment 1: Results

12

9

Plan

• Experimental environment
• Bayesian Robot Programming
• Experiment 1
• Experiment 2
• Experiment 3
• Conclusion and perspectives

13

Experiment 2: Description

14

10

Experiment 2: Description

• Learning the means and standard deviations by imitation

Bayesian program:
• Variables
• Odeur1: [-10..+10], 21 values
• Vrot: [-64..+63], 128 values
• Decomposition
P(Odeur1 Vrot) = P(Odeur1) P(Vrot | Odeur1)
• Parametric forms
• P(Odeur1) → Uniform (1)
• P(Vrot | Odeur1) → Gaussians (21)
• Question: P(Vrot | [Odeur1=o])

14

Experiment 2: Description

• Learning the means and standard deviations by imitation

Bayesian program:
• Variables
• Odeur2: [-10..+10], 21 values
• Vrot: [-64..+63], 128 values
• Decomposition
P(Odeur2 Vrot) = P(Odeur2) P(Vrot | Odeur2)
• Parametric forms
• P(Odeur2) → Uniform
• P(Vrot | Odeur2) → Gaussians (21)
• Question: P(Vrot | [Odeur2=o])

14

11

Experiment 2: Description

• Learning the means and standard deviations by imitation

Bayesian program:
• Variables
• Odeur3: [-10..+10], 21 values
• Vrot: [-64..+63], 128 values
• Decomposition
P(Odeur3 Vrot) = P(Odeur3) P(Vrot | Odeur3)
• Parametric forms
• P(Odeur3) → Uniform (1)
• P(Vrot | Odeur3) → Gaussians (21)
• Question: P(Vrot | [Odeur3=o])

14

Experiment 2: Operating procedure & results

• One corpus / one learning run:
• Several initial positions
• Several initial orientations
• The same movement several times

• Pairs <odour1_e, vrot_e>, <odour2_e, vrot_e>, <odour3_e, vrot_e>

15

12

Experiment 2: Operating procedure & results

15

Experiment 2: Operating procedure & results

15

13

Plan

• Experimental environment
• Bayesian Robot Programming
• Experiment 1
• Experiment 2
• Experiment 3
• Conclusion and perspectives

16

Experiment 3: Presentation

17

14

Experiment 3: Presentation

Learning phase

17

Experiment 3: Description

• Question: P(Vrot | [Fuite=f] [Odeur2=o2] [Odeur1=o1] C-comb)

• Variables
• Odeur1, Odeur2, Vrot
• H: {1, 2}, 2
• Fuite (flight): {1, 2}, 2
• Decomposition
P(Odeur1 Odeur2 Fuite H Vrot) = P(Odeur1) P(Odeur2) P(Fuite) P(H | Fuite) P(Vrot | H Odeur2 Odeur1)
• Parametric forms
• P(Odeur1), P(Odeur2), P(Fuite) → Uniform
• P(H | Fuite) → Laplace
• P(Vrot | [H=1] Odeur2 Odeur1) = P(Vrot | Odeur1 C-rem1)
• P(Vrot | [H=2] Odeur2 Odeur1) = P(Vrot | Odeur2 C-rem1)

• Learning P(H | Fuite) by imitation

18

15

Experiment 3: Identification of P(H | Fuite)

• Imitation: 4-tuples <o1_e, o2_e, f_e, v_e>

• Question:
P(H | [Odeur1=o1_e] [Odeur2=o2_e] [Fuite=f_e] [Vrot=v_e])

• Pairs <f_e, h>

P(H | Odeur1 Odeur2 Fuite Vrot) = P(Vrot | Odeur1 Odeur2 Fuite H) / Σ_H P(Vrot | Odeur1 Odeur2 Fuite H)

P([H=1] | Odeur1 Odeur2 Fuite Vrot) = P(Vrot | Odeur1 C-rem1) / ( P(Vrot | Odeur1 C-rem1) + P(Vrot | Odeur2 C-rem1) )

19
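A short sketch of this recognition question: the posterior over H is the ratio of the two sub-model likelihoods, as in the formula above. The learned Gaussians of experiments 1-2 are replaced here by placeholder parameters.

```python
import numpy as np

# Sketch of the recognition question of experiment 3: given an observed pair of
# odour readings and Vrot of the demonstrator, infer which source it is climbing.
def p_vrot_given_odeur(vrot, odeur, mu, sigma):
    """One learned Gaussian P(Vrot | [Odeur = o]) (placeholder parameters)."""
    return np.exp(-0.5 * ((vrot - mu[odeur]) / sigma[odeur]) ** 2) / sigma[odeur]

def p_h_given_example(vrot, o1, o2, mu, sigma):
    """P([H=1] | o1, o2, vrot) = l1 / (l1 + l2), as on the slide."""
    l1 = p_vrot_given_odeur(vrot, o1, mu, sigma)
    l2 = p_vrot_given_odeur(vrot, o2, mu, sigma)
    return l1 / (l1 + l2)

mu = {o: 3.0 * o for o in range(-10, 11)}       # placeholder learned means
sigma = {o: 5.0 for o in range(-10, 11)}
print(p_h_given_example(vrot=20, o1=7, o2=-2, mu=mu, sigma=sigma))
```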

Experiment 3: Operating procedure & results

Corpus construction:
• One initial position
• Several initial orientations
• The same movement several times

20

16

Experiment 3: Operating procedure & results

Two learned tables of P([H=h] | [Fuite=f]):

f = 1: h = 1: 0.729, h = 2: 0.271; f = 2: h = 1: 0.184, h = 2: 0.816
f = 1: h = 1: 0.465, h = 2: 0.535; f = 2: h = 1: 0.327, h = 2: 0.673

20

Plan

• Experimental environment
• Bayesian Robot Programming
• Experiment 1
• Experiment 2
• Experiment 3
• Conclusion and perspectives

21

17

Conclusion

• Experiments 1 & 2: simple imitation: conclusive results

• Experiment 3: the imitator recognizes the behaviour it imitates; conclusive results in one configuration. A first step towards recognizing what the other agent is doing.

22

Perspectives

• Technical
• Assisting the learning
• Adapting the visual tracking
• Testing the robustness of the behaviours
• Completing the corpora
• Learning more complex behaviours

• Longer term
• “Training” of the BIBA robot by a human (heel-following)
• Propagation of behaviours (from Koala to Koala)
• Recognizing behaviours

23

9/07/02

1

Bayesian programming of videogame characters

Ronan Le Hy
DEA Sciences Cognitives, 2001 - 2002
Supervisors: Pierre Bessière, Olivier Lebeltel

2

9/07/02

2

3

Idea

A new programming method for bots in video games

4

Objectives

For the development team:
- ease of programming
- limited computation time
- separation between programming and the design of the character's behaviour
- ease of programming different behaviours

For the player:
- “humanity”
- teaching bots how to play

9/07/02

3

5

Outline

framework
objectives
platform
concrete objective
Bayesian model
programming a behaviour
learning
conclusion

6

Framework: platform

bot

Unreal Tournament

Gamebots (ISI & CMU)

messages:
• position
• velocity
• health
• visible characters…

orders:
• go to a point
• shoot…

9/07/02

4

7

Framework: concrete objective

loop:
read the sensor values
choose a new state
act

States: weapon search, health-pack search, flight, attack, exploration, danger detection

Formally: knowing the current state Et and the sensory variables Vi, decide the new state Et+1

8

Bayesian robot programming

Structure of a Bayesian program

9/07/02

5

9

Bayesian model: relevant variables

Et, Et+1: states of the bot: Attaque (attack), RechercheArme (weapon search), RechercheVie (health search), Exploration, Fuite (flight), DetectionDanger (danger detection)

Sensory variables: Vie (health), Arme (weapon), ArmeAdversaire (opponent's weapon), Bruit (noise), NombreEnnemis (number of enemies), ProxArme (weapon nearby), ProxSanté (health nearby)

Motor variables

10

Bayesian model: decomposition

P(Et Et+1 V A Ad B Ne Pa Ps)
= P(Et) P(Et+1 | Et) P(V | Et Et+1) P(A | V Et Et+1) P(Ad | A V Et Et+1) P(B | Ad A V Et Et+1) P(Ne | B Ad A V Et Et+1) P(Pa | Ne B Ad A V Et Et+1) P(Ps | Pa Ne B Ad A V Et Et+1)

Hypothesis: the sensory variables and Et are pairwise independent given Et+1:

= P(Et) P(Et+1 | Et) P(V | Et+1) P(A | Et+1) P(Ad | Et+1) P(B | Et+1) P(Ne | Et+1) P(Pa | Et+1) P(Ps | Et+1)

9/07/02

6

11

Parametric forms

P(Et): unknown (not specified)
P(Et+1 | Et): table
P(Vsensory | Et+1): tables

Identification: by hand, or by learning

12

Bayesian model: question

What is the next state, knowing the current state and the sensory variables? P(Et+1 | Et V A Ad B Ne Pa Ps)

Resolution:
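A minimal sketch of how this question is answered: the next-state distribution is the transition term multiplied by one inverse table per observed sensory variable, then normalized. Apart from the 0.95/0.01 transition table given later in the talk, the tables are simplified placeholders.

```python
import numpy as np

# Sketch of the inverse-programming question P(E_t+1 | E_t, sensors).
STATES = ["Attaque", "RechercheArme", "RechercheVie", "Exploration", "Fuite", "DetectionDanger"]
N = len(STATES)
P_TRANS = np.full((N, N), 0.01) + np.diag([0.94] * N)      # P(E_t+1 | E_t), 0.95 diagonal

# One inverse table per sensory variable: P(value | E_t+1); placeholder numbers
# loosely following the health table shown later in the talk.
P_VIE = {"Haut": np.array([0.899, 0.45, 0.001, 0.45, 0.1, 0.45]),
         "Bas":  np.array([0.001, 0.10, 0.989, 0.10, 0.7, 0.10])}

def next_state_distribution(e_t, sensors):
    p = P_TRANS[STATES.index(e_t)].copy()
    for table, value in sensors:
        p *= table[value]                 # one multiplicative term per sensor
    return p / p.sum()

p = next_state_distribution("Exploration", [(P_VIE, "Bas")])
print(dict(zip(STATES, p.round(2))))
```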

9/07/02

7

13

Inverse programming: ease of programming

Classical programming:
knowing Et and V1, V2…, give Et+1
for |Et| = 6 transitions from a state, partition the sensory space of size |V| = 648 into 6 sets
one transition = one logical formula

Inverse programming:
knowing Et+1, give V1, V2…
for 7 sensory variables and |Et| = 6 states (i.e. 42 cases), give a distribution
one distribution = one table

14

Inverse programming (2)

If P(Arme=Aucune | Et+1 = Attaque) = 0, and if Arme = Aucune (no weapon), then the bot cannot switch to Attaque:

P(Et+1 = Attaque | … [Arme=Aucune] …) = 0

Each elementary distribution contributes to the question, which is a kind of geometric mean.

Controlled complexity: linear in the number of states, linear in the number of variables.

9/07/02

8

15

Ease of development

A form of specification of the automaton that is more condensed, more powerful, and light in computation time.

16

Programming a behaviour: writing the tables

P(Et+1 | Et), self-maintenance: for the states A (Attaque), RA (RechercheArme), RV (RechercheVie), Ex (Exploration), Fu (Fuite), DD (DetectionDanger), the table puts 0.95 on staying in the same state and 0.01 on each other transition.

P(Vie | Et+1), health-level management, values Haut (high) / Moyen (medium) / Bas (low); most plausibly reconstructed from the extracted cells with columns DD, Fu, Ex, RV, RA, A:
Haut:  0.45  0.1  0.45  0.001  0.45  0.899
Moyen: 0.45  0.2  0.45  0.01   0.45  0.1
Bas:   0.1   0.7  0.1   0.989  0.1   0.001

P(NombreEnnemis | Et+1): risk management (table not recoverable from the extracted text).

9/07/02

9

17

spécification à la main

séparation du développement et de laconception du personnage

les comportements comme desdonnées

18

programmation d’un secondcomportement : ajustabilité second comportement aggressif

9/07/02

10

19

programmation d’uncomportement

(films : odge etberserk)

20

programmation d’uncomportement

(films : odge etberserk)

9/07/02

11

21

apprentissage :enseignement par le joueur

interface

22

tables

9/07/02

12

23

humanité

critère subjectif

cependant : forme de test de Turing

24

Conclusion

For the development team:
- limited computation time
- ease of programming
- separation between programming and character design
- ease of programming different behaviours

For the player:
- “humanity”
- teaching the game to bots

9/07/02

13

25

Perspectives

- push the Bayesian approach further down (lower levels)
- integrate deliberative aspects into the model
- new learning schemes

26

9/07/02

14

27

Gamebots (1): sensors

Synchronous and asynchronous messages:

Character: id, name, team; position (rotation, location), velocity; health, armour, life level; weapon, ammunition

Other visible characters: id, name, team; position (rotation, location), velocity, reachability; weapon, firing

Environment in the field of view: navigation nodes (id, location, reachability); doors, lifts (id, location, reachability, type); objects (id, location, reachability, type)

Game: scores, flag capture

Events: object picked up; feet, head or body change zone (water, lava…); weapon change (automatic or requested); collision with a wall, an object or a player; fall; death, injury; death or injury inflicted by oneself; noise (footsteps, lift, shots, object picked up); projectile heading towards oneself; answer to a path or reachability request; message from another player (text or typed)

28

Gamebots (2): commands

Movement: walk or run towards a point, a player, an object, a navigation node…; run towards a point while facing a point/object (strafe); turn towards a point/object or by an angle; stop; jump

Weapons: start firing; stop firing; change weapon

Requests: path to a point/object; reachability of a point/object; message to the other players

1

Learning issues discussion

Moderated by Jean Laurens and Frédéric Davesne, LPPA

BIBA Summer School, Moudon, 30 June - 5 July 2002

Introduction

When is learning useful?
– Learning techniques are used when uncertainty is encountered: lack of prior knowledge

But ...
– Each learning technique needs the master to give prior knowledge

Questions about this prior knowledge
– Feasibility? Certainty? Accuracy?

The core of the discussion ...
– What about raising these questions within the Bayesian framework?

2

Overview

1 - What do we mean by “Learning”?
– Why use learning?
– Short overview of some learning paradigms
– Learning issues (open to discussion)

2 - Learning within the Bayesian framework
– A supervised learning example
– A latent learning example

Why use learning techniques?

Causes of uncertainty
– Inability to model the “world”
the environment is unknown or is not constrained enough
the model of the sensors and/or the effectors is unknown
the interaction between the system and its environment is unknown or is not constrained enough
– Lack in the computing process
how to reach a goal or to achieve a behaviour?
how to produce useful and reliable data?

3

The technical side

Perceptual issue
the environment is unknown or is not constrained enough
the model of the sensors and/or the actuators is unknown
the interaction between the system and its environment is unknown or is not constrained enough
how to produce useful and reliable data?

Procedural issue
how to reach a goal or to achieve a behaviour?

Learning paradigms

Supervised
Reinforcement
Unsupervised
Latent

4

Example

[Figure: a robot, a visual landmark, a target, a movement, and the distance Dist to the target.]

Sensory variables: landmark perception
Motor variable: “this way!”
Behavioural variable: d(Dist)/dt < 0 -> target reaching

Learning paradigms: supervised

Multi-layered neural networks with back-propagation

Hypothesis about prior knowledge
– Each of the examples is a functional and meaningful relation between input and output data
– Uncertainty = inaccuracy

Learning = best interpolation

[Diagram: a set of examples feeding a neural network]

5

Learning paradigms: reinforcement

AHC or Q-Learning-like methods

Hypothesis about prior knowledge
– The states are perfectly designed
Uncertainty = inaccuracy for the perceptual issue
MDP or POMDP
– The reinforcement value is relevant and perfectly known
– Knowledge about the probability of finding a first solution (relevant internal parameters)

Learning = maximising expected rewards

[Diagram: robot and environment exchanging state, action and reward]
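For reference, a minimal tabular Q-learning sketch of the kind of method named above; as the slide assumes, the states, the actions and the reward signal are supplied by the designer.

```python
import random
from collections import defaultdict

# Minimal tabular Q-learning sketch: learning = maximising expected reward.
Q = defaultdict(float)                     # Q[(state, action)]
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
ACTIONS = ["left", "right", "forward"]

def choose(state):
    if random.random() < EPSILON:          # exploration
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """One Q-learning step: move Q towards the bootstrapped return."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

update("s0", choose("s0"), reward=1.0, next_state="s1")   # one interaction step
```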

Learning paradigms: unsupervised

Clustering methods (SOM)

Hypothesis about prior knowledge
– Unitisation and differentiation postulates (what makes a state exist and be unique?)
– May lead to a supervised learning problem

Learning = fit a probability distribution

[Diagram: signals feeding a system state]

6

Learning paradigms: latent [Seward 1949]

Some relevant information can be learned without any conditioning process
– Goal = anticipation - ACS [Stolzmann 1998]

Hypothesis about prior knowledge
– The states or the symbols used by the system are perfectly designed

Learning = anticipate as well as possible

Learning issues

• Structure of the model
• Curse of dimensionality
• Meaningfulness of the variables
• Specification of the prior knowledge
• Uncertainty about the relevance of the prior knowledge given by the master

7

Learning issues: structure of the model

Curse of dimensionality: “Almost all the learning techniques are bound to fail if the intrinsic dimensionality of the problem is too big” [Verleysen 2000]

– Input space (supervised, unsupervised): need for an enormous amount of data
– Search space (reinforcement): need for an enormous amount of time to discover the goal

WARNING!
– Generally, the intrinsic dimensionality is less than the dimensionality of the input space (e.g. a Khepera robot in a structured environment)

Learning issues: structure of the model

Meaningfulness of the variables
– Hidden variables
– Symbol grounding problem [Harnad 1992]

8

Learning issues: prior knowledge

Idealistic learning technique
• Certainty about the reliability of the prior knowledge given by the master, in the robotics context

Consequences of uncertainty about prior knowledge
• Lack of reliability of the result of the learning process
• Lack of predictability: inability to determine the cause of a failure in a learning process
• a) Learnability of my problem?
• b) Correctness of my prior knowledge?

Example: learning of the cart-pole balancing problem with reinforcement techniques [Davesne 2002]

Learning issues: prior knowledge

An example of the consequences of wrong prior knowledge

Cart-pole balancing task with reinforcement learning

[Plot: the learning seems to be good, but ...]

9

Learning issues: prior knowledge

An example of the consequences of wrong prior knowledge [Davesne 2002]

Cart-pole balancing task with reinforcement learning

[Plot: if the success criterion is made much stricter, the system fails to learn the task.]

Initiation of the discussion ...

We have shown
• Some learning paradigms, each of which must be furnished with specific prior knowledge
• Some typical learning issues which must be overcome (if possible)

Now, it's time to raise questions
• With which of these learning paradigms should we associate the Bayesian framework?
• Is Bayesian learning a new learning paradigm?
• What about the hypotheses about the prior knowledge?
• What are the main issues?

10

2 - Learning within the Bayesian framework: a supervised learning example

Behaviour learning by imitation
• Wall-following, phototaxis, etc., by a Khepera robot

Pattern recognition (WARNING: symbol grounding problem)
• Distinguish a wall from a corner

An example of latent learning

11

Bibliography

Davesne, Frédéric (2002). Etude de l'émergence de facultés d'apprentissage fiables et prédictibles d'actions réflexes, à partir de modèles paramétriques soumis à des contraintes internes. PhD thesis, Université d'Evry Val d'Essonne.

Harnad, Steven (1992). Cognition and the symbol grounding problem. Electronic symposium on computation.

Seward, John P. (1949). An Experimental Analysis of Latent Learning. Journal of Experimental Psychology, 39, 177-186.

Stolzmann, Wolfgang (1998). Anticipatory Classifier Systems. In Koza, John R. et al. (eds.), Genetic Programming 1998: Proceedings of the Third Annual Conference, July 22-25, 1998, University of Wisconsin, Madison, Wisconsin. San Francisco, CA: Morgan Kaufmann, 658-664.

Verleysen, Michel (2000). Machine learning of high-dimensional data: local artificial neural networks and the curse of dimensionality. Thèse d'agrégation, Université catholique de Louvain, Belgium.