Mental Models & Active Perception

Source: alumni.media.mit.edu/~nmav/misc/ActivePerc_May2004.pdf

Active Perception & Mental Models

Nikolaos Mavridis

Cognitive Machines, MIT Media Lab

Today’s Menu

I. VISION

II. ACTIVE PERCEPTION

III. MENTAL MODELS

IV. FUTURE STEPS

V. CONCLUSION

I. Our Vision

To build intelligent devices that can cooperate with humans in a natural manner

And also: learn about humans!

• Key prerequisites:

– Language

– Mental Models of the world

– Multimodal Active Sensing

• Early examples:

– Ripley the robot, Elvis the lighting system, Intelligent car

General Setting: Internal model of world

[Diagram labels: EXTERNAL REALITY (Fixed Spacetime); SENSORY WORLD (Imperfect, Changing); ACTIVE PERCEPTION (Sensory data & ACTIONS!!); MENTAL MODELS (Partial Descriptions); arrows labelled DATA and ACTIONS; example utterance: "A grey house!"]

II. Active Perception

[Block diagram. Vision blocks: CAPTURE; SEGMENTATION (COLOR-BASED); FACE DETECTION; SALIENT POINT DETECTION; OBJECT RECOGNITION; STEREO DEPTH CALCULATION; 2D REGION PERMANENCE; 2D FACE PERMANENCE; a second CAPTURE channel (SAME AS LEFT CHANNEL). Other input: SPEECH RECOGNITION. Proposal generators feeding the MENTAL MODEL: VISOR, PROPRIOCEPTOR and IMAGINATOR, each issuing PROPOSALS FOR OBJECT INSTANTIATION / UPDATE / DELETION. Two blocks are marked UNDER CONSTRUCTION.]

Ripley’s Perceptual System

Cameras

• ELMO

• Panasonic KX-HCM280 (Pan/Tilt/Zoom)

Segmentation

• Probabilistic color-based

• Requires uniform background & objects :-(

• Replacement: Yair Ghitza’s method

Face Detection

• Paul Viola’s algorithm: cascade of classifiers, simple features

Salient Point Detection

• Koch/Itti algorithm (multiscale color/intensity/edge maps)

• Bottom-up model of human attention, from neuroscience

Object Recognition

• Andre Ribeiro’s algorithm

• Robust to rotations, background…

• Andre will tell you more!

Stereo Depth Calculation

• SRI Small Vision System: stereo engine using area correlation

• Calibration & filtering!
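To make "area correlation" concrete, here is a minimal sketch on a single pair of 1D scanlines. It is not the SRI Small Vision System's implementation: the sum-of-absolute-differences (SAD) cost, the function names and the parameters `window`, `max_disp`, `focal_px` and `baseline_m` are all illustrative assumptions.

```python
def disparity_sad(left, right, x, window=1, max_disp=8):
    """Disparity at column x of the left scanline, found by sliding a small
    window over the right scanline and minimising the SAD cost.
    Assumption: rectified images, so a left pixel at i matches right at i - d."""
    best_d, best_cost = 0, float("inf")
    lo, hi = x - window, x + window + 1
    for d in range(max_disp + 1):
        if lo - d < 0 or hi > len(left):
            continue  # window would fall outside the scanline
        cost = sum(abs(left[i] - right[i - d]) for i in range(lo, hi))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(d, focal_px, baseline_m):
    """Pinhole stereo relation Z = f * B / d (infinite depth at zero disparity)."""
    return float("inf") if d == 0 else focal_px * baseline_m / d


# Toy scanlines: the same 3-pixel pattern appears 3 columns apart.
right_line = [0, 0, 9, 5, 7, 0, 0, 0, 0, 0, 0, 0]
left_line = [0, 0, 0, 0, 0, 9, 5, 7, 0, 0, 0, 0]
d = disparity_sad(left_line, right_line, x=6)
```

The calibration and filtering step on the slide matters in practice: the depth formula is only meaningful once `focal_px` and `baseline_m` come from a calibrated rig, and low-texture windows give unreliable minima that need to be filtered out.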

Region/Face Permanence

• “Objecter”: 2D permanence across frames

• Hysteresis before creation/deletion

• Finds optimal across-frame correspondence, based on color/position/size metric

• Keeps indices across frames
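The bullets above can be sketched roughly as follows. This is an illustration, not the actual "Objecter" code: the L1 metric, the exhaustive assignment search, the `gate` threshold and the class name `Objecter2D` are assumptions made for the example.

```python
from itertools import permutations


def distance(a, b):
    """Hypothetical metric: L1 differences in color, position and size."""
    color = sum(abs(p - q) for p, q in zip(a["color"], b["color"]))
    pos = sum(abs(p - q) for p, q in zip(a["pos"], b["pos"]))
    return color + pos + abs(a["size"] - b["size"])


def best_assignment(old, new):
    """Exhaustive minimum-cost pairing {old index: new index}.
    Factorial time, so only suitable for a handful of regions."""
    swap = len(old) > len(new)
    small, large = (new, old) if swap else (old, new)
    best, best_cost = {}, float("inf")
    for perm in permutations(range(len(large)), len(small)):
        cost = sum(distance(small[i], large[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost = cost
            best = {j: i for i, j in enumerate(perm)} if swap else dict(enumerate(perm))
    return best


class Objecter2D:
    """Keeps persistent indices; creation and deletion are both delayed."""

    def __init__(self, create_after=2, delete_after=2, gate=50.0):
        self.create_after, self.delete_after, self.gate = create_after, delete_after, gate
        self.tracks = {}  # persistent index -> {"region", "hits", "misses"}
        self.next_id = 0

    def update(self, regions):
        ids = list(self.tracks)
        match = best_assignment([self.tracks[t]["region"] for t in ids], regions)
        used = set()
        for k, tid in enumerate(ids):
            track, j = self.tracks[tid], match.get(k)
            if j is not None and distance(track["region"], regions[j]) <= self.gate:
                track.update(region=regions[j], hits=track["hits"] + 1, misses=0)
                used.add(j)
            else:
                track["misses"] += 1
                if track["misses"] >= self.delete_after:
                    del self.tracks[tid]  # hysteresis before deletion
        for j, region in enumerate(regions):
            if j not in used:  # unmatched region starts a candidate track
                self.tracks[self.next_id] = {"region": region, "hits": 1, "misses": 0}
                self.next_id += 1
        # hysteresis before creation: report only tracks seen often enough
        return sorted(t for t, tr in self.tracks.items() if tr["hits"] >= self.create_after)
```

With the default settings a region must be matched in two frames before it is reported, and a reported region survives one missed frame before it is dropped. For more than a few regions, the exhaustive search would need to be replaced by a polynomial-time assignment method.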

Visor: Proposals for 3D object instantiation/update/deletion

• Gets state of the world from mental model

• Compares with evidence, proposes changes

• Stochastic / voxel descriptions, too…

…includes Voxeliser!
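A minimal sketch of the "compare with evidence, propose changes" step, under strong simplifying assumptions: objects are already matched by identifier, each is described only by a 3D position, and `visible_ids` (a hypothetical input) lists the objects that should currently be in the field of view. None of these names come from the Visor itself.

```python
def propose_changes(model, evidence, visible_ids, tol=0.05):
    """Compare the mental model's object positions with sensed positions.

    model, evidence: dict mapping object id -> (x, y, z).
    Returns a list of (kind, object id, position) proposals; the mental
    model, not this function, decides whether to accept them.
    """
    proposals = []
    for oid, pos in evidence.items():
        if oid not in model:
            proposals.append(("instantiate", oid, pos))
        elif max(abs(p - q) for p, q in zip(pos, model[oid])) > tol:
            proposals.append(("update", oid, pos))
    for oid in model:
        # Propose deletion only when the object should be in view but is not
        # seen; objects outside the field of view persist in the model.
        if oid not in evidence and oid in visible_ids:
            proposals.append(("delete", oid, None))
    return proposals


model = {"ball": (0.0, 0.0, 0.0), "cup": (1.0, 0.0, 0.0), "bag": (5.0, 5.0, 0.0)}
evidence = {"ball": (0.2, 0.0, 0.0), "box": (2.0, 2.0, 0.0)}
proposals = propose_changes(model, evidence, visible_ids={"ball", "cup", "box"})
```

Here the out-of-view "bag" is left alone, which is the behaviour object permanence requires.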

Voxeliser

• Shape estimation system using “sculpture” by multiple views (app: spatial domains)
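One common way to "sculpt" a shape from several views is silhouette carving: start from a full block of voxels and remove every voxel that falls outside the object's silhouette in some view. The sketch below assumes idealised orthographic views along the three grid axes; the Voxeliser's actual camera model and data structures are not described in the slides, so everything here is illustrative.

```python
def carve(n, silhouettes):
    """Keep the voxels of an n x n x n grid that project inside every silhouette.

    silhouettes: dict mapping a viewing axis ("x", "y" or "z") to an n x n grid
    of booleans, indexed by the two remaining coordinates in (x, y, z) order.
    Assumption: orthographic views aligned with the grid axes.
    """
    voxels = set()
    for x in range(n):
        for y in range(n):
            for z in range(n):
                projections = {"x": (y, z), "y": (x, z), "z": (x, y)}
                if all(mask[projections[axis][0]][projections[axis][1]]
                       for axis, mask in silhouettes.items()):
                    voxels.add((x, y, z))
    return voxels


# A 3 x 3 silhouette with only the centre pixel occupied.
centre = [[False, False, False],
          [False, True, False],
          [False, False, False]]
one_view = carve(3, {"z": centre})                  # a column along z
two_views = carve(3, {"z": centre, "x": centre})    # a single voxel
```

Each additional view can only remove voxels, so the estimate tightens as views accumulate, which is why moving the camera to new viewpoints improves the shape.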

Active Perception

Bottom-up feed-through vs. on-demand active!

(also integrating bottom-up with top-down)

• Theory: visual routines, next best view etc.

• Next Action: current cost & goal-based utility…

• Two models: Resolver, Spectator

Resolver: To ask or to sense?

Planning to integrate Speech and Sensorimotor Acts (ICMI ’04)

Early motivation: Disambiguating referents

“Hand me the ball!”

Resolver

• Selects the next action: question or sensory measurement

• Probabilistic model with one-step planning: utility (goal-oriented information gain) vs. cost

• Human-like performance, double matching, 25% cost gain!
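The utility-versus-cost trade-off can be sketched as a one-step lookahead over a belief about which object the speaker means. This is a simplified illustration, not the Resolver's model: action outcomes are assumed deterministic given the true referent, utility is measured as expected entropy reduction in bits, and the costs and the `cost_weight` parameter are made-up numbers.

```python
import math


def entropy(dist):
    """Shannon entropy in bits of a dict mapping item -> probability."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def expected_gain(belief, outcome_of):
    """Expected entropy reduction from an action whose outcome is a
    deterministic function (outcome_of) of the true referent."""
    groups = {}
    for obj, p in belief.items():
        groups.setdefault(outcome_of[obj], {})[obj] = p
    expected_posterior = 0.0
    for members in groups.values():
        mass = sum(members.values())
        if mass > 0:
            posterior = {o: p / mass for o, p in members.items()}
            expected_posterior += mass * entropy(posterior)
    return entropy(belief) - expected_posterior


def choose_action(belief, actions, cost_weight=1.0):
    """One-step planning: pick the action with the best gain minus weighted cost."""
    return max(
        actions,
        key=lambda a: expected_gain(belief, a["outcome_of"]) - cost_weight * a["cost"],
    )["name"]


belief = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
actions = [
    {"name": "ask: is it small?", "cost": 0.5,
     "outcome_of": {"a": "yes", "b": "yes", "c": "no", "d": "no"}},
    {"name": "measure weight of a", "cost": 0.1,
     "outcome_of": {"a": "heavy", "b": "other", "c": "other", "d": "other"}},
]
```

With these numbers the question is more informative (1 bit against roughly 0.81 bits), but asking the human is assumed to cost more, so the cheap measurement wins unless `cost_weight` is lowered.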

Resolver: A screenshot

• After: “The heavy one” - “Is it small? No” - measuresize1-3 - “Is it medium?”

Spectator

• Bottom-up attention guiding camera movement

(Alexander Patrikalakis (UROP) & Nikolaos Mavridis)

• Finds & tracks interesting points, zooms in, marks on map, goes on!

III. Mental Models

MOTIVATION:

How are people able to think about things that are

not directly accessible to their senses at the moment?

What is required for a machine to be able to:

talk about things that are out of sight,

talk about things that happened in the past, or

view the world through somebody else’s eyes (and mind)?

What is the machinery required for the comprehension of:

“Give me the green beanbag that was on my left!”

Mental models - why? (p.I)

Goal: Provide an intermediate representation, mediating between perception, language and action.

• In essence:

– an internalized representation of the state of the world as best known so far, in a form convenient for “hooking up” language

(shown below: the revisualisation of the representation)

– and a set of methods for updating this representation given further relevant sensory data, and predicting future states in the absence of such data
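The two halves of this definition (a state estimate, plus methods to update it from data and to predict it without data) can be sketched with a scalar Kalman-style estimator. This is an illustrative assumption, not the update rule used in Ripley's mental model; the class name, the constant-velocity prediction and the noise parameters are all hypothetical.

```python
class Estimate1D:
    """Belief about one scalar property (e.g. an object's x position),
    held as a mean and a variance."""

    def __init__(self, mean, var):
        self.mean, self.var = mean, var

    def predict(self, velocity=0.0, dt=1.0, process_var=0.01):
        """No sensory data: extrapolate the state and grow the uncertainty."""
        self.mean += velocity * dt
        self.var += process_var * dt

    def update(self, measurement, meas_var):
        """New sensory data: blend it in, weighted by relative confidence."""
        gain = self.var / (self.var + meas_var)
        self.mean += gain * (measurement - self.mean)
        self.var *= 1.0 - gain


x = Estimate1D(mean=0.0, var=1.0)
x.update(measurement=2.0, meas_var=1.0)             # mean 1.0, variance 0.5
x.predict(velocity=1.0, dt=2.0, process_var=0.25)   # mean 3.0, variance 1.0
```

The variance is what lets the system "know how much it knows": it shrinks when an object is observed and grows again while the object is out of sight.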

Mental models - why? (p.II)

• But also:

– A useful decomposition of a complex problem:

a practical engineering methodology with reusable components

a theoretical framework (dynamical systems)

– A unified platform for the instantiation of hypothetical scenarios:

planning (goal state descriptions)

instantiation of situations communicated through language etc

– A starting point for experimental simulations of:

Multi-agent systems, Theory of mind, Learning

Ripley’s “Internalised World” (early version: IEEE SMC)

Object Permanence & Viewpoint Switching

[Diagram labels: RED; PROTOTYPES (Coding, tuition); SPACETIME; SENSORY WORLD; WORLD (COMPOUND_AGENT); AGENTs ...; AGENT_RELATIONs ...; AGENT; AGENT_RELATION; BODY (COMPOUND_OBJECT); SOUL; INTERFACE; OBJECTs; OBJECT_RELATIONs; VIEWPOINT; MOVER; MENTAL MODEL; GOALS; AFFECT]

Objects & attributes

3 Layers of Attributes (shape, color, weight… apparent/deep):

– Stochastic: knowing how much you know!!! (for language, curiosity…)

– Deterministic: maximum likelihood

– Categorical: quantized for language (“red”), learnt and context-dependent!

EXAMPLE: STOCHASTIC RADIUS AND POSITION
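A small sketch of how the three layers might relate for one attribute, an object's radius. The histogram representation, the numbers and the category boundaries are invented for illustration; in the system described here the categories are learnt and context-dependent rather than fixed.

```python
def most_likely(dist):
    """Deterministic layer: the maximum-likelihood value of a discrete
    distribution given as a dict mapping value -> probability."""
    return max(dist, key=dist.get)


def variance(dist):
    """One way of 'knowing how much you know': spread of the distribution."""
    mean = sum(v * p for v, p in dist.items())
    return sum(p * (v - mean) ** 2 for v, p in dist.items())


def categorise(value, boundaries, last_label):
    """Categorical layer: quantise a value into a word.
    boundaries: list of (upper bound, label) pairs in ascending order."""
    for upper, label in boundaries:
        if value < upper:
            return label
    return last_label


# Stochastic layer: belief over the radius in metres (hypothetical numbers).
radius_belief = {0.02: 0.2, 0.03: 0.5, 0.04: 0.3}
# Hypothetical, hand-set category boundaries.
size_words = [(0.025, "small"), (0.035, "medium")]

radius = most_likely(radius_belief)
word = categorise(radius, size_words, last_label="large")
```

Language hooks onto the categorical layer ("the medium one"), while the stochastic layer can drive curiosity: a wide distribution is a reason to look again.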

The Architecture

[Block diagram: MENTAL MODELS: Ripley's case. Preliminary block diagram, Sept '03, Nikolaos Mavridis, MIT Media Lab. Blocks:

MENTAL MODEL & RECONCILIATOR (mental_model.exe): W[t] and F

MODALITY-SPECIFIC INSTANTIATORS (visor.exe etc.): (W[t], S[t]) -> Ws[t]

VIRTUAL OBJECT INSTANTIATOR (imaginer.exe): (W[t], H[t]) -> Wh[t]

DYNAMICS PREDICTOR (predictor.exe): Wp[t]

VISUALISER (visualiser.exe)

SENSES: S[t]

HYPOTHESIS GENERATION

VISUAL FEATURE ANALYSIS

LANGUAGE UNDERSTANDING (bishop)

viewpoint selection]

• Modality-specific processes:

– Visor

– Proprioceptor

– Imaginator

• Central processes:

– MM: Processes proposals

– Predictor

• Recent Work: Goals, Affect

• Open Questions:

- Cognitive spacetime

- Comms etc.

Evaluating performance

• Ground truth: Flock of birds sensors

(Stephen Oney (UROP) & Nikolaos Mavridis)

• Measure systematic errors, noise, time delay, dynamics… & calibrate parameters!

IV: Future Steps

• Imaginator: Language to mental model!

• Voxeliser: Better shapes and categories

• Resolver: Full integration & active sensing

• Multiagent, Theory of Mind, Innate vs. learnt…

• Parts of soul: Affect & goal modelling

Multiagent systems

• Prerequisites:

– Action recognition across agents (not strict prereq)

– Thus, useful to start by embedding everything in virtual world wrapper, and cheating on action recognition

– Also, mixed real/virtual agents (Ripley conversing with a non-existent friend)

• Benefits:

– Systematic external examination of effects of different partial world knowledge or structure/methods of mental models (i.e. contents & form of MM), or even different sensory organs.

– For example, differing categorical boundaries and negotiated alignment (methods difference, i.e. update/prediction function)

– Prerequisite for Theory of Mind!

• First preliminary examples:

– Ashwani’s demo for viewpoint-dependent description generation (using the generic MM)

Theory of mind

• Now, each agent’s MM also contains an estimated mental model of each other agent as part of their descriptions…

• Prerequisites:

– Uncertainty

– Multi-agent models

– Action recognition across agents (strict prereq now!, +gaze)

• Benefits:

– Start playing with intention through action recognition

– Interesting coupling with inferred goals etc.

– “Mind reading” is an immense area for experimentation!

– Collaborative tasks

Innate vs. learnt

• Now that we have a clean architecture to start with, how about learning parameters or structures of the architecture, and experimenting with learned vs. innate (predesigned or evolved) tradeoffs?

• Examples:

– Learning predictive dynamics

• Where do I expect the object to be?

• Learning “empirical” Newtonian mechanics

– Learning senses-to-model maps

• Which property of which object does this sensory signal inform me about, and how do its contents alter the property?

– Learning language-to-model maps (example: Deb’s thesis)

• Which property of which object does this utterance inform me about, and how does it alter the property?

– Learning mental model structures

• Which properties should my object descriptions contain?

• How can I get an empirical derivation of 3D position as a crucial non-apparent property of an object?

– Concatenating parts at the input-output equivalence level

• Forget about all the internalised fuss. Can I get an equivalent structure without postulating and enforcing the exact architecture?

• In essence:

– How arbitrary is everything that was hardcoded? Are some things redundant? Can they be learnt? If so, how?

• FINALLY, FOR ALL PREVIOUSLY STATED FUTURE PLANS:

– Relation with how humans perform (cognitive modeling) - categorical level

V: Conclusion

The Picture!

General Setting: Internal model of world

[Diagram labels: EXTERNAL REALITY (Fixed Spacetime); SENSORY WORLD (Imperfect, Changing); ACTIVE PERCEPTION (Sensory data & ACTIONS!!); MENTAL MODELS (Partial Descriptions); arrows labelled DATA and ACTIONS; example utterance: "A grey house!"]

Why Active Perception & Mental Models?

The ultimate goal is clear:

• Let’s make Ripley and co. more fun to interact with!

• And let’s learn more about us on the way…
