Cognitive Neuroscience Robotics
Predictive Learning of Sensorimotor Information as a Key for Cognitive Development
Yukie Nagai Graduate School of Engineering, Osaka University
Open Lecture on Cognitive Interaction Design Kyoto Institute of Technology, July 12, 2015
Development of joint attention [Nagai et al., 2003; 2006; Nagai, 2005]
Imitation based on mirror neuron system [Nagai et al., 2011; Kawai et al., 2012]
Infant-directed action [Nagai & Rohlfing, 2009]
Gaze-head coordination in social interaction [Schillingmann et al., 2015]
Cognitive Developmental Robotics [Asada et al., 2001; 2009; Lungarella et al., 2003]
• Aim at understanding the principles of human cognitive development by means of a constructive approach
– Bridge the gap between neuroscience (micro level) and psychology/cognitive science (macro level)
– Build human-like intelligent robots
Various Cognitive Abilities in Infants
Development as a Continuous Process

(Figure: cognitive development depicted as a growing tree)
Theory of mind, Imitation, Self-other cognition, Cooperation, Joint attention, Goal-directed action, Language use, Understanding intention
What is the underlying mechanism for cognitive development
(i.e., the root of the tree)?
Our Hypothesis [Nagai, in press]
Predictive learning of sensorimotor information (i.e., minimizing prediction error ei(t+1)) leads to cognitive development.
Prediction error ei(t+1) = si(t+1) − ŝi(t+1)
(Diagram: the sensorimotor system receives the sensory state si(t) and motor command aj(t) and returns the sensory feedback si(t+1); the predictor receives an efference copy of aj(t) and outputs the predicted sensory feedback ŝi(t+1) and predicted motor command âj(t+1))
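As a minimal sketch of this loop (a linear plant and an LMS-trained linear predictor; all names and numerical values here are illustrative assumptions, not taken from the talk), minimizing the prediction error ei(t+1) = si(t+1) − ŝi(t+1) looks like:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical plant: the next sensory state s(t+1) is an (unknown to the
# learner) linear function of the current state s(t) and motor command a(t).
W_true = np.array([[0.8, 0.1, 0.3],
                   [0.0, 0.7, -0.2]])

def plant(s, a):
    return W_true @ np.concatenate([s, a])

# Predictor: same linear form; weights are learned from the prediction error.
W_hat = np.zeros_like(W_true)
lr = 0.1

err_history = []
s = rng.normal(size=2)
for t in range(2000):
    a = rng.normal(size=1)            # explore with random motor commands
    x = np.concatenate([s, a])
    s_pred = W_hat @ x                # predicted feedback  ŝ(t+1)
    s_next = plant(s, a)              # actual feedback     s(t+1)
    e = s_next - s_pred               # prediction error    e(t+1)
    W_hat += lr * np.outer(e, x)      # minimize ||e||^2 (LMS rule)
    err_history.append(np.linalg.norm(e))
    s = s_next

print(f"mean |e| first 50 steps: {np.mean(err_history[:50]):.3f}")
print(f"mean |e| last  50 steps: {np.mean(err_history[-50:]):.3f}")
```

The prediction error shrinks as the predictor converges on the plant's dynamics, which is the learning signal the hypothesis builds on.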
Our Hypothesis [Nagai, in press]
Predictive learning of sensorimotor information (i.e., minimizing prediction error ei(t+1)) leads to cognitive development.
(1) Learn the predictor through sensorimotor experiences → self-other cognition, goal-directed action, etc.
(2) Produce an action in response to the other’s action → imitation, altruistic behavior, etc.
(Diagrams: the sensorimotor system and predictor loop with prediction error ei(t+1), shown once for each of mechanisms (1) and (2))
Outline
1. Cognitive development in robots based on predictive learning
– Self-other cognition [Nagai, Kawai, & Asada, ICDL-EpiRob 2011]
– Goal-directed action [Park, Kim, & Nagai, SAB 2014]
– Altruistic behavior [Baraglia, Nagai, & Asada, ICDL-EpiRob 2014]
2. Autism spectrum disorder (ASD) caused by atypical tolerance for prediction error
– Theory of underlying mechanism of ASD [Nagai, in press]
– Simulator of atypical perception in ASD [Qin, Nagai, Kumagaya, Ayaya, & Asada, ICDL-EpiRob 2014]
Infants Start Discriminating Self from Others in First Year of Life
[Rochat & Morgan, 1995]
[Bahrick & Watson, 1985]
[Rochat & Striano, 2002]
Our Hypothesis about Self-Other Cognition
• Spatiotemporal predictability in sensorimotor information discriminates the self from others.
– Self = perfect predictability; others = lower predictability
– Perceptual development leads to the emergence of Mirror Neuron Systems (MNS).

(1) Immature perception → self-other assimilation
(3) Matured perception → self-other correspondence

(Figure: frequency distribution over spatial and temporal predictability; the self and others form separate modes, with (2) as an intermediate stage between (1) and (3))
[Nagai et al., ICDL-EpiRob2011; Kawai et al., IROS2012]
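A toy illustration of this hypothesis (the forward model, noise levels, and tolerance below are my assumptions, not the model of the paper): the self's motion is the predictable consequence of one's own motor command, so thresholding the prediction error separates self from other.

```python
import numpy as np

rng = np.random.default_rng(1)

# Self's motion follows the robot's own command (high predictability);
# another agent's motion is independent of it (low predictability).
def forward_model(command):
    return 2.0 * command                             # learned self model

def observe(source, command):
    if source == "self":
        return 2.0 * command + rng.normal(0, 0.05)   # small sensory noise
    return rng.normal(0, 1.0)                        # other's motion: unrelated

def classify(command, observed, tol=0.5):
    error = abs(observed - forward_model(command))   # prediction error
    return "self" if error < tol else "other"

trials = ["self" if rng.random() < 0.5 else "other" for _ in range(1000)]
correct = 0
for source in trials:
    cmd = rng.normal()
    obs = observe(source, cmd)
    if classify(cmd, obs) == source:
        correct += 1
accuracy = correct / len(trials)
print(f"self/other accuracy: {accuracy:.2f}")
```

High predictability marks the self; anything the forward model cannot predict is attributed to others.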
Computational Model for Emergence of MNS: Early Stage of Development
No differentiation between Self and Others
Motor output
Visual input
[Nagai et al., ICDL-EpiRob 2011]
Computational Model for Emergence of MNS: Later Stage of Development
MNS =
Self’s motion
Others’ motion
Motor output
Visual input
[Nagai et al., ICDL-EpiRob 2011]
Result 1: Self-Other Discrimination through Visual Development
(Figure: with visual development, the representation changes from no differentiation into separate clusters for self’s motion and others’ motion)
[Nagai et al., ICDL-EpiRob 2011]
Result 2: MNS Acquired in Sensorimotor Mapping
(Figure: strength of the sensorimotor mapping from visual input (self’s motion vs. others’ motion) to motor commands; (a) with visual development, an MNS-like high response to others’ motion is acquired, (b) without visual development it is not)
[Nagai et al., ICDL-EpiRob 2011]
Result 3: Imitation Using Acquired MNS
[Nagai et al., ICDL-EpiRob 2011]
(Diagram: the sensorimotor system and predictor loop with prediction error ei(t+1))
Infant Action is Goal-Directed – Why?
[Carpenter et al., 2005]
[Bekkering et al., 2000]
[Excerpt from Bekkering, Wohlschläger, & Gattis (2000), p. 156]

paused for a few seconds before initiating the next trial. Both model and participant returned hands to standard position between items. Each response was followed by encouragement. A video camera placed behind the experimenter, focused on the upper body parts of both the child and the experimenter, recorded the movements for each action. The final hand position was then analysed from the video recording. Although latency data were also collected, they yielded results that mimic those obtained with errors and will therefore not be reported.

Results and Discussion

Children always produced one of the six possible movements, but not always the matching movement. Overall, participants produced errors in 24.5% of the trials, most of which occurred in response to contralateral modelled movements. In 40.0% of the contralateral trials, children produced ipsilateral movements instead, a so-called contra-ipsi error (CI-error). In contrast, ipsilateral movements were usually imitated correctly.

FIG. 1. An illustration of the six hand movements used as target actions in Experiment 1.
[Excerpt: pages F15–F17 of Carpenter et al. (2005), “Infants copy goals”, reproduced on the slides. Infants aged 12 and 18 months watched an experimenter (E) make a toy mouse hop or slide to one of two locations on a mat, either with small houses at the locations (House condition) or without (No House condition). Infants matched E’s action style (hopping vs. sliding) and sound effect significantly more often in the No House condition, whereas they matched E’s location significantly more often in the House condition. That is, when a salient goal (the house) was present, infants reproduced the goal of the action rather than its means. Inter-observer reliability was high (Cohen’s kappas .82–.92).]
Goal-match Means-match
[Excerpt from Bekkering, Wohlschläger, & Gattis (2000), p. 158]

children’s imitation of these gestures. We reasoned that limiting the set of movements to only one ear would eliminate the problem of choosing an ear. As a consequence, another goal, using the correct hand, might be fulfilled.

Method

Subjects

Participants were nine pre-school children, aged 4:0 to 5:11 years (mean age of 4:4 years). Each child was tested individually in a quiet room.

Design and Procedure

Experiment 2 was similar to Experiment 1, with the exception that now only two movements were modelled: an ipsilateral and a contralateral movement, both to the same ear. Right and left ear were counterbalanced between participants. For four children, the model always moved with either the left (thus with the ipsilateral) hand or the right (thus with the contralateral) hand to the left ear; for the other five participants the model always moved to the right ear. The two movements were repeated 12 times in total, in a random order, resulting in 6 ipsi- and 6 contralateral hand movements. This time, all children were simply instructed, “You do what I do.”

FIG. 2. Percentage of errors for the different conditions in Experiments 1–3. The blank bar represents errors on contralateral movement trials, and the striped bar represents errors on ipsilateral movement trials.

Footnote 2: Although in Experiment 1 we instructed the children with the words, “Try to imitate me as if you were my mirror. You do what I do”, in Experiments 2 and 3 we used the minimal instruction “You do what I do”, because in a pilot experiment we observed that for children this automatically implies that they will copy the movements as if they were looking in a mirror, as previously observed by Schofield (1976).
Contralateral
Ipsilateral
[Gleissner et al., 2000]
Our Hypothesis about Hierarchical Representation of Action
• Differences in prediction error produce hierarchical development of actions.
– Goal: large dynamics → larger error → learned first
– Means: small dynamics → smaller error → learned later
(Figure: an action trajectory from its initial state, decomposed into goal and means components)
[Park, Kim, & Nagai, SAB 2014]
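One way to see how a larger prediction error can make the goal component be learned before the means is through multiplicative (deep-linear-style) learning dynamics, where components with larger targets escape the slow initial phase sooner. This is only an illustrative mechanism with assumed targets, rates, and thresholds, not the RNNPB model of the paper:

```python
# Two-layer (multiplicative) learning of two action components: the "goal"
# component has large dynamics (target 1.0), the "means" component small
# dynamics (target 0.1). With a shared learning rate, the component with
# the larger error grows out of the small initialization first, so the
# goal is learned to within 5% of its own scale before the means.
targets = {"goal": 1.0, "means": 0.1}
lr = 0.05
learned_at = {}

for name, A in targets.items():
    a = b = 0.01                        # small symmetric initialization
    for t in range(1, 200001):
        e = A - a * b                   # prediction error for this component
        a, b = a + lr * b * e, b + lr * a * e
        if abs(e) < 0.05 * A:           # learned to within 5% of its scale
            learned_at[name] = t
            break

print(learned_at)
```

The goal component reaches its criterion in far fewer steps than the means component, mirroring the hypothesized goal-before-means ordering.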
Learning Goal-Directed Actions Using RNNPB
• Recurrent Neural Network with Parametric Bias (RNNPB) [Tani & Ito, 2003]
– Represent multiple time series of data using PBs
– Learn with back-propagation through time

(Figure: distribution of PB values (PB1, PB2) before vs. after learning)
[Park, Kim, & Nagai, SAB 2014]
(Diagram: RNNPB network; the current input and context C(t), together with the PB units, produce the predicted input and the next context C(t+1))
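A drastically simplified, non-recurrent sketch of the parametric-bias idea (a linear one-step predictor with a learned 2-D PB per sequence; the architecture and all values are my assumptions, not the RNNPB of [Tani & Ito, 2003]): sequences generated by the same rule end up with similar PB vectors.

```python
import numpy as np

rng = np.random.default_rng(2)

# Four toy sequences from two behavior classes: constant drift along +x
# or along -y. A shared linear predictor s(t+1) ~ W s(t) + V pb, plus a
# per-sequence 2-D parametric bias (PB), is trained jointly by gradient
# descent on the one-step prediction error.
T = 40
drifts = [np.array([0.05, 0.0]), np.array([0.05, 0.0]),
          np.array([0.0, -0.05]), np.array([0.0, -0.05])]
seqs = [np.cumsum(np.tile(d, (T, 1)), axis=0) for d in drifts]

W = 0.1 * rng.normal(size=(2, 2))
V = 0.1 * rng.normal(size=(2, 2))
pbs = 0.1 * rng.normal(size=(4, 2))
lr = 0.02

for epoch in range(1000):
    for k, s in enumerate(seqs):
        for t in range(T - 1):
            e = s[t + 1] - (W @ s[t] + V @ pbs[k])   # prediction error
            W += lr * np.outer(e, s[t])
            V += lr * np.outer(e, pbs[k])
            pbs[k] += lr * V.T @ e                   # PBs are learned too

d_within = max(np.linalg.norm(pbs[0] - pbs[1]), np.linalg.norm(pbs[2] - pbs[3]))
d_between = np.linalg.norm(pbs[0] - pbs[2])
print(f"within-class PB distance: {d_within:.3f}, between-class: {d_between:.3f}")
```

Sequences that follow the same rule converge to nearby PB vectors, so the PB space clusters by behavior, which is the property the PB plots on the slide illustrate.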
Result 1: Developmental Change in PBs & Generated Actions (t = 0; 1,000; 2,000; 10,000; 15,000; 100,000; 200,000)
A, B: Goal; 0, 1, 2: Means
(Figures: desired vs. acquired trajectories at each learning step)
[Park, Kim, & Nagai, SAB 2014]
Result 2: Action Generation by Robot
A1 B1
A1 B1
Middle stage of learning (t = 3,500) → goal only
Later stage of learning (t = 200,000) → goal + means
(Figure legend: desired trajectory)
[Park, Kim, & Nagai, SAB 2014]
Infants Help Others Even Without Reward – Why?
[Warneken & Tomasello, 2006]
Two Theories for Altruistic Behaviors [Paulus, 2014]
• Emotion-sharing theory
– Understand the other person as an intentional agent [Batson, 1991]
– Be motivated to help the other based on empathic concern for the other’s needs [Davidov et al., 2013]
– Self-other differentiation
• Goal-alignment theory
– Understand the other’s goal, but not his/her intention [Barresi & Moore, 1996]
– Take over the other’s goal as if it were the infant’s own
– No self-other discrimination
[Warneken & Tomasello, 2006]
Our Hypothesis about Emergence of Altruistic Behavior
1. Learn the predictor by minimizing the prediction error ei(t+1) through the robot’s own experiences
Cover
Push
[Baraglia, Nagai, & Asada, ICDL-EpiRob 2014]
(Diagram: the sensorimotor system and predictor loop with prediction error ei(t+1))
Our Hypothesis about Emergence of Altruistic Behavior
1. Learn the predictor to minimize the prediction error ei(t+1) through the robot’s own experiences
2. Estimate ei(t+1) while observing other’s action si(t+1)
(Diagram: the robot applies its predictor to the other person’s action si(t+1); the prediction error ei(t+1) increases when the other’s behavior deviates from the robot’s own experience)
[Baraglia, Nagai, & Asada, ICDL-EpiRob 2014]
Push
Our Hypothesis about Emergence of Altruistic Behavior
1. Learn the predictor to minimize the prediction error ei(t+1) through the robot’s own experiences
2. Estimate ei(t+1) while observing other’s action si(t+1)
3. Execute the action âj(t+1) to minimize ei(t+1) if ei(t+1) > threshold
→ Altruistic behavior

(Diagram: the robot executes âj(t+1), which decreases the prediction error ei(t+1))
[Baraglia, Nagai, & Asada, ICDL-EpiRob 2014]
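The three steps above can be sketched as a simple decision rule (the outcome scale, action repertoire, effects, and threshold are illustrative assumptions, not values from the paper):

```python
# Hypothesized trigger for altruistic behavior: while watching another
# agent, the robot predicts the outcome ŝ(t+1) from its own sensorimotor
# experience; if |e(t+1)| = |s(t+1) − ŝ(t+1)| exceeds a threshold, it
# executes the own action â(t+1) that best reduces the error.
PREDICTED = 1.0     # ŝ(t+1): outcome the robot expects (e.g. box covered)
THRESHOLD = 0.3     # how much prediction error is tolerated before acting

# Hypothetical repertoire: each own action and its learned effect on s.
ACTION_EFFECTS = {"push_lid": +0.9, "wave_arm": +0.1, "do_nothing": 0.0}

def respond(observed):
    """Return the helping action, or None if the prediction error is tolerable."""
    error = observed - PREDICTED                 # e(t+1) = s(t+1) − ŝ(t+1)
    if abs(error) <= THRESHOLD:
        return None                              # outcome roughly as predicted
    # choose â(t+1) minimizing the remaining error after acting
    return min(ACTION_EFFECTS,
               key=lambda a: abs(observed + ACTION_EFFECTS[a] - PREDICTED))

print(respond(0.95))   # other succeeds: no help needed → None
print(respond(0.10))   # other fails to cover the box → "push_lid"
```

Helping thus falls out of error minimization alone: the robot acts on the other's unfinished task exactly when the observation violates its own prediction.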
Result: Emergence of Altruistic Behavior
[Baraglia, Nagai, & Asada, ICDL-EpiRob 2014]
Our Hypothesis [Nagai, in press]
Predictive learning of sensorimotor information (i.e., minimizing prediction error ei(t+1)) leads to cognitive development.
(1) Learn the predictor through sensorimotor experiences → self-other cognition, goal-directed action, etc.
(2) Produce an action in response to the other’s action → imitation, altruistic behavior, etc.
(Diagrams: the sensorimotor system and predictor loop with prediction error ei(t+1), shown once for each of mechanisms (1) and (2))
Outline
1. Cognitive development in robots based on predictive learning
– Self-other cognition [Nagai, Kawai, & Asada, ICDL-EpiRob 2011]
– Goal-directed action [Park, Kim, & Nagai, SAB 2014]
– Altruistic behavior [Baraglia, Nagai, & Asada, ICDL-EpiRob 2014]
2. Autism spectrum disorder (ASD) caused by atypical tolerance for prediction error
– Theory of underlying mechanism of ASD [Nagai, in press]
– Simulator of atypical perception in ASD [Qin, Nagai, Kumagaya, Ayaya, & Asada, ICDL-EpiRob 2014]
Autism Spectrum Disorder (ASD)
• Difficulties in communicating with other people [Baron-Cohen, 1995; Charman et al., 1997; Mundy et al., 1986]
– Lack of theory of mind
– Less eye contact
– Difficulty in joint attention, etc.
• Atypical perception & weak central coherence [O’Neill & Jones, 1997; Happé & Frith, 2006; Ayaya & Kumagaya, 2008]
– Hyperesthesia/hypoesthesia
– Weak ability to integrate information, etc.
Perception
Social
Atypical Perception in ASD
[Behrmann et al., 2006]
[Excerpt: Behrmann et al. (2006), Neuropsychologia 44, 110–129, reproduced on the slide. In a global/local (Navon compound-letter) identification task, accuracy was at ceiling for both groups, but reaction times showed a three-way interaction: control participants showed the usual global advantage and global-to-local interference, whereas autistic participants were overall slower, were faster at local than global identification on inconsistent trials, and showed large local-to-global interference, indicating a local bias in their processing. Fig. 2: examples of consistent and inconsistent compound stimuli, and mean RTs per group, level, and consistency.]
[Ayaya & Kumagaya, 2008]
(Images: an original scene vs. its simulated ASD perception)
Our Hypothesis about Mechanism of ASD
• ASD might be caused by an atypical tolerance for prediction error in predictive learning. [Ayaya & Kumagaya, 2008; Nagai, in press]
(Figure: distributions of sensorimotor information; typically developing people have a proper tolerance for prediction error, whereas people with ASD have an atypical tolerance: a smaller tolerance → hyperesthesia, a larger tolerance → hypoesthesia)
[Qin et al., ICDL-EpiRob 2014; Nagai et al., Japanese CogSci 2015]
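A toy rendering of the tolerance hypothesis (the signal, noise level, update rule, and tolerance values are illustrative assumptions, not the simulator of [Qin et al., 2014]): a percept is updated only when the prediction error exceeds a tolerance, so a very small tolerance makes the percept chase sensory noise (hyperesthesia-like) while a very large one makes it miss a real change in the world (hypoesthesia-like).

```python
import numpy as np

rng = np.random.default_rng(3)

# A step change in the world at t = 100, observed through sensory noise.
signal = np.concatenate([np.zeros(100), np.ones(100)])
noisy = signal + rng.normal(0, 0.1, size=200)

def perceive(tolerance):
    percept, out = 0.0, []
    for s in noisy:
        error = s - percept
        if abs(error) > tolerance:     # only intolerable errors update the percept
            percept += 0.5 * error
        out.append(percept)
    return np.array(out)

for tol in (0.0, 0.3, 2.0):
    p = perceive(tol)
    jitter = np.abs(np.diff(p[:100])).mean()   # noise-driven fluctuation
    tracked = p[150:].mean()                   # response to the real change
    print(f"tolerance={tol}: jitter={jitter:.3f}, tracked level={tracked:.2f}")
```

With zero tolerance the percept jitters with every noise sample; with a moderate tolerance it is stable yet still registers the change; with a huge tolerance it never updates at all.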
Simulator of Atypical Perception in ASD
Cognitive Neuroscience Robotics
Conclusion
Our Hypothesis [Nagai, in press]
Predictive learning of sensorimotor information (i.e., minimizing prediction error ei(t+1)) leads to cognitive development.
(1) Learn the predictor through sensorimotor experiences → self-other cognition, goal-directed action, etc.
(2) Produce an action in response to the other’s action → imitation, altruistic behavior, etc.
[Diagram: the predictor receives the sensory state si(t) and motor command aj(t) and outputs the predicted sensory feedback ŝi(t+1) and predicted motor command âj(t+1); the prediction error ei(t+1) is fed back to update the predictor and the sensorimotor system.]
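The learning loop summarized by the diagram can be sketched in a few lines. The linear plant, learning rate, and delta-rule update below are illustrative assumptions, not the lecture's model; the point is only that driving updates with e(t+1) = s(t+1) − ŝ(t+1) makes the predictor recover the sensorimotor contingency.

```python
import numpy as np

# Sketch of the predictive-learning loop: a predictor maps (s(t), a(t))
# to a predicted sensory feedback ŝ(t+1) and is trained to minimise the
# prediction error e(t+1) = s(t+1) − ŝ(t+1).  The linear plant, learning
# rate and delta-rule update are illustrative assumptions.

rng = np.random.default_rng(1)
A_true, B_true = 0.9, 0.5          # hypothetical plant: s(t+1) = 0.9 s(t) + 0.5 a(t)

w = np.zeros(2)                    # predictor parameters [w_s, w_a]
s, lr = 0.0, 0.05
errors = []
for t in range(3000):
    a = rng.uniform(-1, 1)         # motor command (efference copy goes to the predictor)
    s_next = A_true * s + B_true * a
    s_hat = w @ np.array([s, a])   # predicted sensory feedback ŝ(t+1)
    e = s_next - s_hat             # prediction error e(t+1)
    w += lr * e * np.array([s, a]) # gradient step on e² (delta rule)
    errors.append(abs(e))
    s = s_next
```

After a few thousand steps the learned weights approach the true plant parameters and the prediction error has effectively vanished.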
Our Hypothesis about Mechanism of ASD
• ASD might be caused by an atypical tolerance for the prediction error in predictive learning. [Ayaya & Kumagaya, 2008; Nagai, in press]
Sensorimotor information
Typically developing people: proper tolerance for prediction error
People with ASD: atypical tolerance for prediction error
(smaller tolerance → hyperesthesia; larger tolerance → hypoesthesia)
Increasing Interest in Predictive Learning
Predictive coding under the free-energy principle
Karl Friston* and Stefan Kiebel
The Wellcome Trust Centre of Neuroimaging, Institute of Neurology, University College London, Queen Square, London WC1N 3BG, UK

This paper considers prediction and perceptual categorization as an inference problem that is solved by the brain. We assume that the brain models the world as a hierarchy or cascade of dynamical systems that encode causal structure in the sensorium. Perception is equated with the optimization or inversion of these internal models, to explain sensory data. Given a model of how sensory data are generated, we can invoke a generic approach to model inversion, based on a free energy bound on the model's evidence. The ensuing free-energy formulation furnishes equations that prescribe the process of recognition, i.e. the dynamics of neuronal activity that represent the causes of sensory input. Here, we focus on a very general model, whose hierarchical and dynamical structure enables simulated brains to recognize and predict trajectories or sequences of sensory states. We first review hierarchical dynamical models and their inversion. We then show that the brain has the necessary infrastructure to implement this inversion and illustrate this point using synthetic birds that can recognize and categorize birdsongs.

Keywords: generative models; predictive coding; hierarchical; birdsong
1. INTRODUCTION
This paper reviews generic models of our sensorium and a Bayesian scheme for their inversion. We then show that the brain has the necessary anatomical and physiological equipment to invert these models, given sensory data. Critically, the scheme lends itself to a relatively simple neural network implementation that shares many features with real cortical hierarchies in the brain. The basic idea that the brain tries to infer the causes of sensations dates back to Helmholtz (e.g. Helmholtz 1860/1962; Barlow 1961; Neisser 1967; Ballard et al. 1983; Mumford 1992; Kawato et al. 1993; Dayan et al. 1995; Rao & Ballard 1998), with a recent emphasis on hierarchical inference and empirical Bayes (Friston 2003, 2005; Friston et al. 2006). Here, we generalize this idea to cover dynamics in the world and consider how neural networks could be configured to invert hierarchical dynamical models and deconvolve sensory causes from sensory input.
This paper comprises four sections. In §1, we introduce hierarchical dynamical models and their inversion. These models cover most of the models encountered in the statistical literature. An important aspect of these models is their formulation in generalized coordinates of motion, which lends them a hierarchical form in both structure and dynamics. These hierarchies induce empirical priors that provide structural and dynamical constraints, which can be exploited during inversion. In §2, we show how inversion can be formulated as a simple gradient ascent using neuronal networks; in §3, we consider how evoked brain responses might be understood in terms of inference under hierarchical dynamical models of sensory input.
2. HIERARCHICAL DYNAMICAL MODELS
In this section, we look at dynamical generative models p(y, ϑ) = p(y|ϑ)p(ϑ) that entail a likelihood, p(y|ϑ), of getting some data, y, given some causes, ϑ = {x, v, θ}, and priors on those causes, p(ϑ). The sorts of models we consider have the following form:

  y = g(x, v, θ) + z
  ẋ = f(x, v, θ) + w        (2.1)

where the nonlinear functions f and g of the states are parametrized by θ. The states v(t) can be deterministic, stochastic or both, and are variously referred to as inputs, sources or causes. The states x(t) mediate the influence of the input on the output and endow the system with memory. They are often referred to as hidden states because they are seldom observed directly. We assume that the stochastic innovations (i.e. observation noise) z(t) are analytic, such that the covariance of z̃ = [z, z′, z″, …]ᵀ is well defined; similarly for w(t), which represents random fluctuations on the motion of hidden states. Under local linearity assumptions, the generalized motion of the output or response ỹ = [y, y′, y″, …]ᵀ is given by

  y  = g(x, v) + z              x′  = f(x, v) + w
  y′ = g_x x′ + g_v v′ + z′     x″  = f_x x′ + f_v v′ + w′
  y″ = g_x x″ + g_v v″ + z″     x‴  = f_x x″ + f_v v″ + w″
  ⋮                             ⋮        (2.2)

The first (observer) equation shows that the generalized states u = [ṽ, x̃, …]ᵀ are needed to generate a generalized response or trajectory. The second (state) equations enforce a coupling between different orders of the motion of the hidden states and confer memory on the system. We can write these equations compactly as
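Equation (2.1) can be simulated directly as a generative process. The concrete choices of f, g, θ, the input v(t), and the noise levels below are illustrative for this sketch, not the paper's birdsong model:

```python
import numpy as np

# Direct simulation of the generative model in Eq. (2.1):
#   y = g(x, v, θ) + z,   ẋ = f(x, v, θ) + w
# The concrete f, g, θ, input v(t) and noise levels are illustrative
# choices for this sketch, not the paper's birdsong model.

def f(x, v, theta):                      # equation of motion of the hidden state
    return -theta * x + v

def g(x, v, theta):                      # observer function mapping states to data
    return np.tanh(x)

rng = np.random.default_rng(2)
dt, theta, x = 0.01, 0.5, 0.0
xs, ys = [], []
for t in range(1000):
    v = np.sin(0.01 * t)                 # cause / input v(t)
    w_t = 0.05 * rng.standard_normal()   # fluctuation w on the hidden-state motion
    z_t = 0.05 * rng.standard_normal()   # observation noise z
    x += dt * (f(x, v, theta) + w_t)     # Euler step of ẋ = f(x, v, θ) + w
    xs.append(x)
    ys.append(g(x, v, theta) + z_t)      # y = g(x, v, θ) + z
```

Inverting such a model (recovering x and v from y) is the recognition problem the paper solves with the free-energy bound.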
Phil. Trans. R. Soc. B (2009) 364, 1211–1221
doi:10.1098/rstb.2008.0300
One contribution of 18 to a Theme Issue 'Predictions in the brain: using our past to prepare for the future'.
*Author for correspondence ([email protected]).
REVIEW
Predictive coding: an account of the mirror neuron system
James M. Kilner · Karl J. Friston · Chris D. Frith
Received: 21 February 2007 / Revised: 19 March 2007 / Accepted: 21 March 2007 / Published online: 12 April 2007
© Marta Olivetti Belardinelli and Springer-Verlag 2007
Abstract: Is it possible to understand the intentions of other people by simply observing their actions? Many believe that this ability is made possible by the brain's mirror neuron system through its direct link between action and observation. However, precisely how intentions can be inferred through action observation has provoked much debate. Here we suggest that the function of the mirror system can be understood within a predictive coding framework that appeals to the statistical approach known as empirical Bayes. Within this scheme the most likely cause of an observed action can be inferred by minimizing the prediction error at all levels of the cortical hierarchy that are engaged during action observation. This account identifies a precise role for the mirror system in our ability to infer intentions from actions and provides the outline of the underlying computational mechanisms.

Keywords: Mirror neurons · Action observation · Bayesian inference · Predictive coding
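The inference scheme the abstract describes can be caricatured in a few lines: compare the observed movement against the sensory trajectory predicted under each candidate intention and pick the one with the smallest prediction error. The candidate intentions and straight-line trajectories are invented for this sketch; the paper's scheme is a hierarchical empirical-Bayes version of the same idea.

```python
import numpy as np

# Caricature of the proposal: infer the cause (intention) of an observed
# action by minimising prediction error over candidate intentions.  The
# intentions and straight-line hand trajectories are invented for this
# sketch; the paper's scheme is a hierarchical empirical-Bayes version.

def trajectory(goal, steps=20):
    """Predicted hand position over time when reaching toward `goal`."""
    return np.linspace(0.0, goal, steps)

intentions = {"grasp_cup": 1.0, "grasp_phone": 0.4, "wave": -0.6}

rng = np.random.default_rng(3)
observed = trajectory(1.0) + 0.05 * rng.standard_normal(20)  # the actor grasps the cup

# Sum-of-squares prediction error under each hypothesis; the smallest wins.
errors = {name: float(np.sum((observed - trajectory(goal)) ** 2))
          for name, goal in intentions.items()}
inferred = min(errors, key=errors.get)
```

The hypothesis whose predicted trajectory best explains the observation (smallest residual error) is taken as the inferred intention.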
Introduction

The notion that actions are intrinsically linked to perception was proposed by William James, who claimed, "every mental representation of a movement awakens to some degree the actual movement which is its object" (James 1890). The implication is that observing, imagining, or in any way representing an action excites the motor program used to execute that same action (Jeannerod 1994; Prinz 1997). Interest in this idea has grown recently, in part due to the neurophysiological discovery of "mirror" neurons. Mirror neurons discharge not only during action execution but also during action observation, which has led many to suggest that these neurons are the substrate for action understanding.

Mirror neurons were first discovered in the premotor area, F5, of the macaque monkey (Di Pellegrino et al. 1992; Gallese et al. 1996; Rizzolatti et al. 2001; Umilta et al. 2001) and have been identified subsequently in an area of the inferior parietal lobule, area PF (Gallese et al. 2002; Fogassi et al. 2005). Neurons in the superior temporal sulcus (STS) also respond selectively to biological movements, both in monkeys (Oram and Perrett 1994) and in humans (Frith and Frith 1999; Allison et al. 2000; Grossman et al. 2000), but they are not mirror neurons, as they do not discharge during action execution. Nevertheless, they are often considered part of the mirror neuron system (MNS; Keysers and Perrett 2004) and we will consider them as such here. These three cortical areas, which constitute the MNS, the STS, area PF and area F5, are reciprocally connected. In the macaque monkey, area F5 in the premotor cortex is reciprocally connected to area PF (Luppino et al. 1999), creating a premotor–parietal MNS, and STS is reciprocally connected to area PF of the inferior parietal cortex (Harries and Perrett 1991; Seltzer and Pandya 1994), providing a sensory input to the MNS (see Keysers and Perrett 2004 for a review). Furthermore, these reciprocal connections show regional specificity. Although STS has extensive connections with the inferior parietal lobule, area PF is connected to an area of the STS that is specifically activated by observation of complex body movements. An analogous pattern of connectivity between premotor areas and the inferior parietal lobule has also been
J. M. Kilner (✉) · K. J. Friston · C. D. Frith
The Wellcome Trust Centre for Neuroimaging, Institute of Neurology, 12 Queen Square, WC1N 3BG London, UK
e-mail: [email protected]
Cogn Process (2007) 8:159–166
DOI 10.1007/s10339-007-0170-2
only underlies motor but cognitive and social skills as well". Interestingly, they note consolidation of memory traces after the initial acquisition can "result in increased resistance to interference or even improvement in performance following an offline period". This is a fascinating observation that suggests optimisation of the brain's generative model does not necessarily need online sensory data. Indeed, there are current theories about the role of sleep in optimising the brain's generative model, not in terms of its ability to accurately predict data, but in terms of minimising complexity. Mathematically, this is interesting because surprise or model evidence can be decomposed into accuracy and complexity terms; suggesting that model evidence can be increased by removing redundant model components or parameters (Friston, 2010). This provides a nice Bayesian perspective on synaptic pruning and the issues considered by Németh and Janacsek (2012-this issue).
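The accuracy/complexity decomposition mentioned here can be written out explicitly. With q(ϑ) the recognition density and m the model, the standard variational free-energy identity bounds surprise:

```latex
F \;=\; \underbrace{D_{\mathrm{KL}}\big[\,q(\vartheta)\,\|\,p(\vartheta \mid m)\,\big]}_{\text{complexity}}
\;-\; \underbrace{\mathbb{E}_{q(\vartheta)}\big[\ln p(y \mid \vartheta, m)\big]}_{\text{accuracy}}
\;\ge\; -\ln p(y \mid m)
```

Minimising F by removing redundant parameters reduces the complexity term, so model evidence p(y|m) can increase even when accuracy is unchanged; this is the Bayesian reading of synaptic pruning referred to above.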
4. Active inference
As noted above, a simple extension to predictive coding is to consider their suppression by the motor system. In this extension, prediction errors are not just suppressed by optimising top-down or descending predictions but can also be reduced by changing sensory input. This does not necessarily mean visual or auditory input but the proprioceptive input responding to bodily movements. As noted above, the suppression of proprioceptive prediction errors is, of course, just the classical reflex arc. In this view, motor control becomes a function of descending predictions about anticipated or predicted kinematic trajectories. See Fig. 1 for a schematic. The important observation here is that the same sorts of synaptic mechanisms and inferential principles can be applied to both perception and the consequences of action. This nicely accommodates the literature on error related negativity reviewed by Hoffmann and Falkenstein (2012-this issue); who consider the "monitoring of one's own actions" and its role in adjusting behaviour. Again, the focus is on EEG, suggesting that even within single trial recordings, the neurophysiological correlates of behaviour-dependent prediction errors can be observed empirically. In their words: "The initiated response is compared with the desired response and a difference; i.e., mismatch between both representations induces the error negativity". This is not the proprioceptive prediction error that drives reflex arcs but a high level perceptual (or indeed conceptual) prediction error; suggesting that the long-term hierarchical predictions of unfolding sensory and kinematic changes have been violated. In other words, these phenomena speak again to separation of temporal scales and hierarchies in providing multimodal predictions to the peripheral sensory and motor systems.

Active inference means that movements are caused by top-down predictions, which means that the brain must have a model of what caused these movements. This begs the interesting question as to whether there is any sense of agency associated with representations. In other words, if I expect to move my fingers and classical motor reflexes cause them to move, do I need to know that it was me who initiated the movement? Furthermore, can I disambiguate between me as the agent or another. These are deep questions and move us on to issues of self modelling and action observation:
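A one-degree-of-freedom caricature of the active-inference idea: the descending prediction (desired limb position) is held fixed, and a "reflex" moves the limb to suppress the proprioceptive prediction error, rather than revising the prediction. The first-order dynamics and gain are illustrative assumptions.

```python
# Sketch of active inference: the agent holds a fixed top-down prediction
# (desired limb position) and reduces proprioceptive prediction error by
# moving, not by revising the prediction.  Dynamics and gain are
# illustrative assumptions.

desired = 1.0                      # top-down prediction of limb position
position = 0.0                     # actual proprioceptive state
gain, dt = 2.0, 0.05
errors = []
for _ in range(200):
    e = desired - position         # proprioceptive prediction error
    position += dt * gain * e      # "reflex arc": action suppresses the error
    errors.append(abs(e))

# The error decays as action fulfils the prediction.
```

Contrast this with perceptual inference, where the same error would instead drive an update of the prediction; active inference lets both routes minimise the same quantity.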
5. Action observation and agency
In a nice analysis of agency, gait and self consciousness, Kannape and Blanke (2012-this issue) start by acknowledging: "Agency is an important aspect of bodily self consciousness, allowing us to separate own movements from those induced by the environment and to
Fig. 1. This figure illustrates the neuronal architectures that might implement predictive coding and active inference. The left panel shows a schematic of predictive coding schemes in which Bayesian filtering is implemented by neuronal message passing between superficial (red) and deep (black) pyramidal cells encoding prediction errors and conditional predictions or estimates respectively (Mumford 1992). In these predictive coding schemes, top-down predictions conveyed by backward connections are compared with conditional expectations at the lower level to form a prediction error. This prediction error is then passed forward to update the expectations in a Bayes-optimal fashion. In active inference, this scheme is extended to include classical reflex arcs, where proprioceptive prediction errors drive action: alpha motor neurons in the ventral horn of the spinal cord elicit extrafusal muscle contractions and changes in primary sensory afferents from muscle spindles. These suppress prediction errors encoded by Renshaw cells. The right panel presents a schematic of units encoding conditional expectations and prediction errors at some arbitrary level in a cortical hierarchy. In this example, there is a distinction between hidden states x_x that model dynamics and hidden causes x_v that mediate the influence of one level on the level below. The equations correspond to generalized Bayesian filtering or predictive coding in generalized coordinates of motion as described in (Friston, 2010). In this hierarchical form f(i) := f(x_x(i), x_v(i)) corresponds to the equations of motion at the i-th level, while g(i) := g(x_x(i), x_v(i)) links levels. These equations constitute the agent's prior beliefs. D is a derivative operator and Π(i) represents precision or inverse variance. These equations were used in the simulations presented in the next figure.
250 K. Friston / International Journal of Psychophysiology 83 (2012) 248–252
Thank You!
Osaka University
• Minoru Asada
• Jimmy Baraglia
• Yuji Kawai
• Shibo Qin
• Many students

University of Tokyo
• Shinichiro Kumagaya
• Satsuki Ayaya

KAIST
• Jun-Cheol Park
[email protected] http://cnr.ams.eng.osaka-u.a.jp/~yukie/