Cognitive Neuroscience Robotics
Predictive Learning of Sensorimotor Information as a Key for Cognitive Development
Yukie Nagai Graduate School of Engineering, Osaka University
Open Lecture on Cognitive Interaction Design Kyoto Institute of Technology, July 12, 2015
Development of joint attention [Nagai et al., 2003; 2006; Nagai, 2005]
Imitation based on mirror neuron system [Nagai et al., 2011; Kawai et al., 2012]
Infant-directed action [Nagai & Rohlfing, 2009]
Gaze-head coordination in social interaction [Schillingmann et al., 2015]
Cognitive Developmental Robotics [Asada et al., 2001; 2009; Lungarella et al., 2003]
• Aim at understanding the principles of human cognitive development by means of a constructive approach
– Bridge the gap between neuroscience (micro level) and psychology/cognitive science (macro level)
– Build human-like intelligent robots
Various Cognitive Abilities in Infants
Development as a Continuous Process

(Figure: cognitive development depicted as a growing tree)
Theory of mind, Imitation, Self-other cognition, Cooperation, Joint attention, Goal-directed action, Language use, Understanding intention
What is the underlying mechanism for cognitive development
(i.e., the root of the tree)?
Our Hypothesis [Nagai, in press]
Predictive learning of sensorimotor information (i.e., minimizing prediction error ei(t+1)) leads to cognitive development.
Prediction error ei(t+1) = si(t+1) − ŝi(t+1)
(Diagram: the sensorimotor system receives the sensory state si(t) and motor command aj(t) and returns the sensory feedback si(t+1); the predictor receives an efference copy of aj(t) and outputs the predicted sensory feedback ŝi(t+1) and predicted motor command âj(t+1))
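As a minimal sketch of this loop (a linear plant and an LMS-trained linear predictor; all names and numerical values here are illustrative assumptions, not taken from the talk), minimizing the prediction error ei(t+1) = si(t+1) − ŝi(t+1) looks like:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical plant: the next sensory state s(t+1) is an (unknown to the
# learner) linear function of the current state s(t) and motor command a(t).
W_true = np.array([[0.8, 0.1, 0.3],
                   [0.0, 0.7, -0.2]])

def plant(s, a):
    return W_true @ np.concatenate([s, a])

# Predictor: same linear form; weights are learned from the prediction error.
W_hat = np.zeros_like(W_true)
lr = 0.1

err_history = []
s = rng.normal(size=2)
for t in range(2000):
    a = rng.normal(size=1)            # explore with random motor commands
    x = np.concatenate([s, a])
    s_pred = W_hat @ x                # predicted feedback  ŝ(t+1)
    s_next = plant(s, a)              # actual feedback     s(t+1)
    e = s_next - s_pred               # prediction error    e(t+1)
    W_hat += lr * np.outer(e, x)      # minimize ||e||^2 (LMS rule)
    err_history.append(np.linalg.norm(e))
    s = s_next

print(f"mean |e| first 50 steps: {np.mean(err_history[:50]):.3f}")
print(f"mean |e| last  50 steps: {np.mean(err_history[-50:]):.3f}")
```

The prediction error shrinks as the predictor converges on the plant's dynamics, which is the learning signal the hypothesis builds on.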
Our Hypothesis [Nagai, in press]
Predictive learning of sensorimotor information (i.e., minimizing prediction error ei(t+1)) leads to cognitive development.
(1) Learn the predictor through sensorimotor experiences → self-other cognition, goal-directed action, etc.
(2) Produce an action in response to the other’s action → imitation, altruistic behavior, etc.
(Diagrams: the sensorimotor system and predictor loop with prediction error ei(t+1), shown once for each of mechanisms (1) and (2))
Outline
1. Cognitive development in robots based on predictive learning
– Self-other cognition [Nagai, Kawai, & Asada, ICDL-EpiRob 2011]
– Goal-directed action [Park, Kim, & Nagai, SAB 2014]
– Altruistic behavior [Baraglia, Nagai, & Asada, ICDL-EpiRob 2014]
2. Autism spectrum disorder (ASD) caused by atypical tolerance for prediction error
– Theory of underlying mechanism of ASD [Nagai, in press]
– Simulator of atypical perception in ASD [Qin, Nagai, Kumagaya, Ayaya, & Asada, ICDL-EpiRob 2014]
Infants Start Discriminating Self from Others in First Year of Life
[Rochat & Morgan, 1995]
[Bahrick & Watson, 1985]
[Rochat & Striano, 2002]
Our Hypothesis about Self-Other Cognition
• Spatiotemporal predictability in sensorimotor information discriminates the self from others.
– Self = perfect predictability; others = lower predictability
– Perceptual development leads to the emergence of Mirror Neuron Systems (MNS).

(1) Immature perception → self-other assimilation
(3) Matured perception → self-other correspondence

(Figure: frequency distribution over spatial and temporal predictability; the self and others form separate modes, with (2) as an intermediate stage between (1) and (3))
[Nagai et al., ICDL-EpiRob2011; Kawai et al., IROS2012]
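A toy illustration of this hypothesis (the forward model, noise levels, and tolerance below are my assumptions, not the model of the paper): the self's motion is the predictable consequence of one's own motor command, so thresholding the prediction error separates self from other.

```python
import numpy as np

rng = np.random.default_rng(1)

# Self's motion follows the robot's own command (high predictability);
# another agent's motion is independent of it (low predictability).
def forward_model(command):
    return 2.0 * command                             # learned self model

def observe(source, command):
    if source == "self":
        return 2.0 * command + rng.normal(0, 0.05)   # small sensory noise
    return rng.normal(0, 1.0)                        # other's motion: unrelated

def classify(command, observed, tol=0.5):
    error = abs(observed - forward_model(command))   # prediction error
    return "self" if error < tol else "other"

trials = ["self" if rng.random() < 0.5 else "other" for _ in range(1000)]
correct = 0
for source in trials:
    cmd = rng.normal()
    obs = observe(source, cmd)
    if classify(cmd, obs) == source:
        correct += 1
accuracy = correct / len(trials)
print(f"self/other accuracy: {accuracy:.2f}")
```

High predictability marks the self; anything the forward model cannot predict is attributed to others.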
Computational Model for Emergence of MNS: Early Stage of Development
No differentiation between Self and Others
Motor output
Visual input
[Nagai et al., ICDL-EpiRob 2011]
Computational Model for Emergence of MNS: Later Stage of Development
MNS =
Self’s motion
Others’ motion
Motor output
Visual input
[Nagai et al., ICDL-EpiRob 2011]
Result 1: Self-Other Discrimination through Visual Development
(Figure: with visual development, the representation changes from no differentiation into separate clusters for self’s motion and others’ motion)
[Nagai et al., ICDL-EpiRob 2011]
Result 2: MNS Acquired in Sensorimotor Mapping
(Figure: strength of the sensorimotor mapping from visual input (self’s motion vs. others’ motion) to motor commands; (a) with visual development, an MNS-like high response to others’ motion is acquired, (b) without visual development it is not)
[Nagai et al., ICDL-EpiRob 2011]
Result 3: Imitation Using Acquired MNS
[Nagai et al., ICDL-EpiRob 2011]
(Diagram: the sensorimotor system and predictor loop with prediction error ei(t+1))
Infant Action is Goal-Directed – Why?
[Carpenter et al., 2005]
[Bekkering et al., 2000]
[Excerpt from Bekkering, Wohlschläger, & Gattis (2000), p. 156]

paused for a few seconds before initiating the next trial. Both model and participant returned hands to standard position between items. Each response was followed by encouragement. A video camera placed behind the experimenter, focused on the upper body parts of both the child and the experimenter, recorded the movements for each action. The final hand position was then analysed from the video recording. Although latency data were also collected, they yielded results that mimic those obtained with errors and will therefore not be reported.

Results and Discussion

Children always produced one of the six possible movements, but not always the matching movement. Overall, participants produced errors in 24.5% of the trials, most of which occurred in response to contralateral modelled movements. In 40.0% of the contralateral trials, children produced ipsilateral movements instead, a so-called contra-ipsi error (CI-error). In contrast, ipsilateral movements were usually imitated correctly.

FIG. 1. An illustration of the six hand movements used as target actions in Experiment 1.
[Excerpt: pages F15–F17 of Carpenter et al. (2005), “Infants copy goals”, reproduced on the slides. Infants aged 12 and 18 months watched an experimenter (E) make a toy mouse hop or slide to one of two locations on a mat, either with small houses at the locations (House condition) or without (No House condition). Infants matched E’s action style (hopping vs. sliding) and sound effect significantly more often in the No House condition, whereas they matched E’s location significantly more often in the House condition. That is, when a salient goal (the house) was present, infants reproduced the goal of the action rather than its means. Inter-observer reliability was high (Cohen’s kappas .82–.92).]
Goal-match Means-match
[Excerpt from Bekkering, Wohlschläger, & Gattis (2000), p. 158]

children’s imitation of these gestures. We reasoned that limiting the set of movements to only one ear would eliminate the problem of choosing an ear. As a consequence, another goal, using the correct hand, might be fulfilled.

Method

Subjects

Participants were nine pre-school children, aged 4:0 to 5:11 years (mean age of 4:4 years). Each child was tested individually in a quiet room.

Design and Procedure

Experiment 2 was similar to Experiment 1, with the exception that now only two movements were modelled: an ipsilateral and a contralateral movement, both to the same ear. Right and left ear were counterbalanced between participants. For four children, the model always moved with either the left (thus with the ipsilateral) hand or the right (thus with the contralateral) hand to the left ear; for the other five participants the model always moved to the right ear. The two movements were repeated 12 times in total, in a random order, resulting in 6 ipsi- and 6 contralateral hand movements. This time, all children were simply instructed, “You do what I do.”

FIG. 2. Percentage of errors for the different conditions in Experiments 1–3. The blank bar represents errors on contralateral movement trials, and the striped bar represents errors on ipsilateral movement trials.

Footnote 2: Although in Experiment 1 we instructed the children with the words, “Try to imitate me as if you were my mirror. You do what I do”, in Experiments 2 and 3 we used the minimal instruction “You do what I do”, because in a pilot experiment we observed that for children this automatically implies that they will copy the movements as if they were looking in a mirror, as previously observed by Schofield (1976).
Contralateral
Ipsilateral
[Gleissner et al., 2000]
Our Hypothesis about Hierarchical Representation of Action
• Differences in prediction error produce hierarchical development of actions.
– Goal: large dynamics → larger error → learned first
– Means: small dynamics → smaller error → learned later
(Figure: an action trajectory from its initial state, decomposed into goal and means components)
[Park, Kim, & Nagai, SAB 2014]
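One way to see how a larger prediction error can make the goal component be learned before the means is through multiplicative (deep-linear-style) learning dynamics, where components with larger targets escape the slow initial phase sooner. This is only an illustrative mechanism with assumed targets, rates, and thresholds, not the RNNPB model of the paper:

```python
# Two-layer (multiplicative) learning of two action components: the "goal"
# component has large dynamics (target 1.0), the "means" component small
# dynamics (target 0.1). With a shared learning rate, the component with
# the larger error grows out of the small initialization first, so the
# goal is learned to within 5% of its own scale before the means.
targets = {"goal": 1.0, "means": 0.1}
lr = 0.05
learned_at = {}

for name, A in targets.items():
    a = b = 0.01                        # small symmetric initialization
    for t in range(1, 200001):
        e = A - a * b                   # prediction error for this component
        a, b = a + lr * b * e, b + lr * a * e
        if abs(e) < 0.05 * A:           # learned to within 5% of its scale
            learned_at[name] = t
            break

print(learned_at)
```

The goal component reaches its criterion in far fewer steps than the means component, mirroring the hypothesized goal-before-means ordering.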
Learning Goal-Directed Actions Using RNNPB
• Recurrent Neural Network with Parametric Bias (RNNPB) [Tani & Ito, 2003]
– Represent multiple time series of data using PBs
– Learn with back-propagation through time

(Figure: distribution of PB values (PB1, PB2) before vs. after learning)
[Park, Kim, & Nagai, SAB 2014]
(Diagram: RNNPB network; the current input and context C(t), together with the PB units, produce the predicted input and the next context C(t+1))
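A drastically simplified, non-recurrent sketch of the parametric-bias idea (a linear one-step predictor with a learned 2-D PB per sequence; the architecture and all values are my assumptions, not the RNNPB of [Tani & Ito, 2003]): sequences generated by the same rule end up with similar PB vectors.

```python
import numpy as np

rng = np.random.default_rng(2)

# Four toy sequences from two behavior classes: constant drift along +x
# or along -y. A shared linear predictor s(t+1) ~ W s(t) + V pb, plus a
# per-sequence 2-D parametric bias (PB), is trained jointly by gradient
# descent on the one-step prediction error.
T = 40
drifts = [np.array([0.05, 0.0]), np.array([0.05, 0.0]),
          np.array([0.0, -0.05]), np.array([0.0, -0.05])]
seqs = [np.cumsum(np.tile(d, (T, 1)), axis=0) for d in drifts]

W = 0.1 * rng.normal(size=(2, 2))
V = 0.1 * rng.normal(size=(2, 2))
pbs = 0.1 * rng.normal(size=(4, 2))
lr = 0.02

for epoch in range(1000):
    for k, s in enumerate(seqs):
        for t in range(T - 1):
            e = s[t + 1] - (W @ s[t] + V @ pbs[k])   # prediction error
            W += lr * np.outer(e, s[t])
            V += lr * np.outer(e, pbs[k])
            pbs[k] += lr * V.T @ e                   # PBs are learned too

d_within = max(np.linalg.norm(pbs[0] - pbs[1]), np.linalg.norm(pbs[2] - pbs[3]))
d_between = np.linalg.norm(pbs[0] - pbs[2])
print(f"within-class PB distance: {d_within:.3f}, between-class: {d_between:.3f}")
```

Sequences that follow the same rule converge to nearby PB vectors, so the PB space clusters by behavior, which is the property the PB plots on the slide illustrate.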
Result 1: Developmental Change in PBs & Generated Actions (t = 0; 1,000; 2,000; 10,000; 15,000; 100,000; 200,000)
A, B: Goal; 0, 1, 2: Means
(Figures: desired vs. acquired trajectories at each learning step)
[Park, Kim, & Nagai, SAB 2014]
Result 2: Action Generation by Robot
A1 B1
A1 B1
Middle stage of learning (t = 3,500) → goal only
Later stage of learning (t = 200,000) → goal + means
(Figure legend: desired trajectory)
[Park, Kim, & Nagai, SAB 2014]
Infants Help Others Even Without Reward – Why?
[Warneken & Tomasello, 2006]
Two Theories for Altruistic Behaviors [Paulus, 2014]
• Emotion-sharing theory
– Understand the other person as an intentional agent [Batson, 1991]
– Be motivated to help the other based on empathic concern for the other’s needs [Davidov et al., 2013]
– Self-other differentiation
• Goal-alignment theory
– Understand the other’s goal, but not his/her intention [Barresi & Moore, 1996]
– Take over the other’s goal as if it were the infant’s own
– No self-other discrimination
[Warneken & Tomasello, 2006]
Our Hypothesis about Emergence of Altruistic Behavior
1. Learn the predictor by minimizing the prediction error ei(t+1) through the robot’s own experiences
Cover
Push
[Baraglia, Nagai, & Asada, ICDL-EpiRob 2014]
(Diagram: the sensorimotor system and predictor loop with prediction error ei(t+1))
Our Hypothesis about Emergence of Altruistic Behavior
1. Learn the predictor to minimize the prediction error ei(t+1) through the robot’s own experiences
2. Estimate ei(t+1) while observing other’s action si(t+1)
(Diagram: the robot applies its predictor to the other person’s action si(t+1); the prediction error ei(t+1) increases when the other’s behavior deviates from the robot’s own experience)
[Baraglia, Nagai, & Asada, ICDL-EpiRob 2014]
Push
Our Hypothesis about Emergence of Altruistic Behavior
1. Learn the predictor to minimize the prediction error ei(t+1) through the robot’s own experiences
2. Estimate ei(t+1) while observing other’s action si(t+1)
3. Execute the action âj(t+1) to minimize ei(t+1) if ei(t+1) > threshold
→ Altruistic behavior

(Diagram: the robot executes âj(t+1), which decreases the prediction error ei(t+1))
[Baraglia, Nagai, & Asada, ICDL-EpiRob 2014]
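The three steps above can be sketched as a simple decision rule (the outcome scale, action repertoire, effects, and threshold are illustrative assumptions, not values from the paper):

```python
# Hypothesized trigger for altruistic behavior: while watching another
# agent, the robot predicts the outcome ŝ(t+1) from its own sensorimotor
# experience; if |e(t+1)| = |s(t+1) − ŝ(t+1)| exceeds a threshold, it
# executes the own action â(t+1) that best reduces the error.
PREDICTED = 1.0     # ŝ(t+1): outcome the robot expects (e.g. box covered)
THRESHOLD = 0.3     # how much prediction error is tolerated before acting

# Hypothetical repertoire: each own action and its learned effect on s.
ACTION_EFFECTS = {"push_lid": +0.9, "wave_arm": +0.1, "do_nothing": 0.0}

def respond(observed):
    """Return the helping action, or None if the prediction error is tolerable."""
    error = observed - PREDICTED                 # e(t+1) = s(t+1) − ŝ(t+1)
    if abs(error) <= THRESHOLD:
        return None                              # outcome roughly as predicted
    # choose â(t+1) minimizing the remaining error after acting
    return min(ACTION_EFFECTS,
               key=lambda a: abs(observed + ACTION_EFFECTS[a] - PREDICTED))

print(respond(0.95))   # other succeeds: no help needed → None
print(respond(0.10))   # other fails to cover the box → "push_lid"
```

Helping thus falls out of error minimization alone: the robot acts on the other's unfinished task exactly when the observation violates its own prediction.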
Result: Emergence of Altruistic Behavior
[Baraglia, Nagai, & Asada, ICDL-EpiRob 2014]
Our Hypothesis [Nagai, in press]
Predictive learning of sensorimotor information (i.e., minimizing prediction error ei(t+1)) leads to cognitive development.
(1) Learn the predictor through sensorimotor experiences → self-other cognition, goal-directed action, etc.
(2) Produce an action in response to the other’s action → imitation, altruistic behavior, etc.
(Diagrams: the sensorimotor system and predictor loop with prediction error ei(t+1), shown once for each of mechanisms (1) and (2))
Outline
1. Cognitive development in robots based on predictive learning
– Self-other cognition [Nagai, Kawai, & Asada, ICDL-EpiRob 2011]
– Goal-directed action [Park, Kim, & Nagai, SAB 2014]
– Altruistic behavior [Baraglia, Nagai, & Asada, ICDL-EpiRob 2014]
2. Autism spectrum disorder (ASD) caused by atypical tolerance for prediction error
– Theory of underlying mechanism of ASD [Nagai, in press]
– Simulator of atypical perception in ASD [Qin, Nagai, Kumagaya, Ayaya, & Asada, ICDL-EpiRob 2014]
Autism Spectrum Disorder (ASD)
• Difficulties in communicating with other people [Baron-Cohen, 1995; Charman et al., 1997; Mundy et al., 1986]
– Lack of theory of mind
– Less eye contact
– Difficulty in joint attention, etc.
• Atypical perception & weak central coherence [O’Neill & Jones, 1997; Happé & Frith, 2006; Ayaya & Kumagaya, 2008]
– Hyperesthesia/hypoesthesia
– Weak ability to integrate information, etc.
Perception
Social
Atypical Perception in ASD
[Behrmann et al., 2006]
[Excerpt: Behrmann et al. (2006), Neuropsychologia 44, 110–129, reproduced on the slide. In a global/local (Navon compound-letter) identification task, accuracy was at ceiling for both groups, but reaction times showed a three-way interaction: control participants showed the usual global advantage and global-to-local interference, whereas autistic participants were overall slower, were faster at local than global identification on inconsistent trials, and showed large local-to-global interference, indicating a local bias in their processing. Fig. 2: examples of consistent and inconsistent compound stimuli, and mean RTs per group, level, and consistency.]
[Ayaya & Kumagaya, 2008]
(Images: an original scene vs. its simulated ASD perception)
Our Hypothesis about Mechanism of ASD
• ASD might be caused by an atypical tolerance for prediction error in predictive learning. [Ayaya & Kumagaya, 2008; Nagai, in press]
(Figure: distributions of sensorimotor information; typically developing people have a proper tolerance for prediction error, whereas people with ASD have an atypical tolerance: a smaller tolerance → hyperesthesia, a larger tolerance → hypoesthesia)
[Qin et al., ICDL-EpiRob 2014; Nagai et al., Japanese CogSci 2015]
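A toy rendering of the tolerance hypothesis (the signal, noise level, update rule, and tolerance values are illustrative assumptions, not the simulator of [Qin et al., 2014]): a percept is updated only when the prediction error exceeds a tolerance, so a very small tolerance makes the percept chase sensory noise (hyperesthesia-like) while a very large one makes it miss a real change in the world (hypoesthesia-like).

```python
import numpy as np

rng = np.random.default_rng(3)

# A step change in the world at t = 100, observed through sensory noise.
signal = np.concatenate([np.zeros(100), np.ones(100)])
noisy = signal + rng.normal(0, 0.1, size=200)

def perceive(tolerance):
    percept, out = 0.0, []
    for s in noisy:
        error = s - percept
        if abs(error) > tolerance:     # only intolerable errors update the percept
            percept += 0.5 * error
        out.append(percept)
    return np.array(out)

for tol in (0.0, 0.3, 2.0):
    p = perceive(tol)
    jitter = np.abs(np.diff(p[:100])).mean()   # noise-driven fluctuation
    tracked = p[150:].mean()                   # response to the real change
    print(f"tolerance={tol}: jitter={jitter:.3f}, tracked level={tracked:.2f}")
```

With zero tolerance the percept jitters with every noise sample; with a moderate tolerance it is stable yet still registers the change; with a huge tolerance it never updates at all.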
Simulator of Atypical Perception in ASD
Cognitive Neuroscience Robotics
Conclusion
Our Hypothesis [Nagai, in press]
Predictive learning of sensorimotor information (i.e., minimizing prediction error ei(t+1)) leads to cognitive development.
(1) Learn the predictor through sensorimotor experiences → self-other cognition, goal-directed action, etc.
(2) Produce an action in response to the other’s action → imitation, altruistic behavior, etc.
[Diagram: the predictor receives the sensory state si(t) and motor command aj(t) and outputs the predicted sensory feedback ŝi(t+1) and predicted motor command âj(t+1); the prediction error ei(t+1) is fed back to update the predictor and the sensorimotor system.]
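The learning loop summarized by the diagram can be sketched in a few lines. The linear plant, learning rate, and delta-rule update below are illustrative assumptions, not the lecture's model; the point is only that driving updates with e(t+1) = s(t+1) − ŝ(t+1) makes the predictor recover the sensorimotor contingency.

```python
import numpy as np

# Sketch of the predictive-learning loop: a predictor maps (s(t), a(t))
# to a predicted sensory feedback ŝ(t+1) and is trained to minimise the
# prediction error e(t+1) = s(t+1) − ŝ(t+1).  The linear plant, learning
# rate and delta-rule update are illustrative assumptions.

rng = np.random.default_rng(1)
A_true, B_true = 0.9, 0.5          # hypothetical plant: s(t+1) = 0.9 s(t) + 0.5 a(t)

w = np.zeros(2)                    # predictor parameters [w_s, w_a]
s, lr = 0.0, 0.05
errors = []
for t in range(3000):
    a = rng.uniform(-1, 1)         # motor command (efference copy goes to the predictor)
    s_next = A_true * s + B_true * a
    s_hat = w @ np.array([s, a])   # predicted sensory feedback ŝ(t+1)
    e = s_next - s_hat             # prediction error e(t+1)
    w += lr * e * np.array([s, a]) # gradient step on e² (delta rule)
    errors.append(abs(e))
    s = s_next
```

After a few thousand steps the learned weights approach the true plant parameters and the prediction error has effectively vanished.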
Our Hypothesis about Mechanism of ASD
• ASD might be caused by an atypical tolerance for the prediction error in predictive learning. [Ayaya & Kumagaya, 2008; Nagai, in press]
Sensorimotor information
Typically developing people: proper tolerance for prediction error
People with ASD: atypical tolerance for prediction error
(smaller tolerance → hyperesthesia; larger tolerance → hypoesthesia)
Increasing Interest in Predictive Learning
Predictive coding under the free-energy principle
Karl Friston* and Stefan Kiebel
The Wellcome Trust Centre of Neuroimaging, Institute of Neurology, University College London, Queen Square, London WC1N 3BG, UK

This paper considers prediction and perceptual categorization as an inference problem that is solved by the brain. We assume that the brain models the world as a hierarchy or cascade of dynamical systems that encode causal structure in the sensorium. Perception is equated with the optimization or inversion of these internal models, to explain sensory data. Given a model of how sensory data are generated, we can invoke a generic approach to model inversion, based on a free energy bound on the model's evidence. The ensuing free-energy formulation furnishes equations that prescribe the process of recognition, i.e. the dynamics of neuronal activity that represent the causes of sensory input. Here, we focus on a very general model, whose hierarchical and dynamical structure enables simulated brains to recognize and predict trajectories or sequences of sensory states. We first review hierarchical dynamical models and their inversion. We then show that the brain has the necessary infrastructure to implement this inversion and illustrate this point using synthetic birds that can recognize and categorize birdsongs.

Keywords: generative models; predictive coding; hierarchical; birdsong
1. INTRODUCTION
This paper reviews generic models of our sensorium and a Bayesian scheme for their inversion. We then show that the brain has the necessary anatomical and physiological equipment to invert these models, given sensory data. Critically, the scheme lends itself to a relatively simple neural network implementation that shares many features with real cortical hierarchies in the brain. The basic idea that the brain tries to infer the causes of sensations dates back to Helmholtz (e.g. Helmholtz 1860/1962; Barlow 1961; Neisser 1967; Ballard et al. 1983; Mumford 1992; Kawato et al. 1993; Dayan et al. 1995; Rao & Ballard 1998), with a recent emphasis on hierarchical inference and empirical Bayes (Friston 2003, 2005; Friston et al. 2006). Here, we generalize this idea to cover dynamics in the world and consider how neural networks could be configured to invert hierarchical dynamical models and deconvolve sensory causes from sensory input.
This paper comprises four sections. In §1, we introduce hierarchical dynamical models and their inversion. These models cover most of the models encountered in the statistical literature. An important aspect of these models is their formulation in generalized coordinates of motion, which lends them a hierarchical form in both structure and dynamics. These hierarchies induce empirical priors that provide structural and dynamical constraints, which can be exploited during inversion. In §2, we show how inversion can be formulated as a simple gradient ascent using neuronal networks; in §3, we consider how evoked brain responses might be understood in terms of inference under hierarchical dynamical models of sensory input.
2. HIERARCHICAL DYNAMICAL MODELS
In this section, we look at dynamical generative models p(y, ϑ) = p(y|ϑ)p(ϑ) that entail a likelihood, p(y|ϑ), of getting some data, y, given some causes, ϑ = {x, v, θ}, and priors on those causes, p(ϑ). The sorts of models we consider have the following form:

  y = g(x, v, θ) + z
  ẋ = f(x, v, θ) + w        (2.1)

where the nonlinear functions f and g of the states are parametrized by θ. The states v(t) can be deterministic, stochastic or both, and are variously referred to as inputs, sources or causes. The states x(t) mediate the influence of the input on the output and endow the system with memory. They are often referred to as hidden states because they are seldom observed directly. We assume that the stochastic innovations (i.e. observation noise) z(t) are analytic, such that the covariance of z̃ = [z, z′, z″, …]ᵀ is well defined; similarly for w(t), which represents random fluctuations on the motion of hidden states. Under local linearity assumptions, the generalized motion of the output or response ỹ = [y, y′, y″, …]ᵀ is given by

  y  = g(x, v) + z              x′  = f(x, v) + w
  y′ = g_x x′ + g_v v′ + z′     x″  = f_x x′ + f_v v′ + w′
  y″ = g_x x″ + g_v v″ + z″     x‴  = f_x x″ + f_v v″ + w″
  ⋮                             ⋮        (2.2)

The first (observer) equation shows that the generalized states u = [ṽ, x̃, …]ᵀ are needed to generate a generalized response or trajectory. The second (state) equations enforce a coupling between different orders of the motion of the hidden states and confer memory on the system. We can write these equations compactly as
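Equation (2.1) can be simulated directly as a generative process. The concrete choices of f, g, θ, the input v(t), and the noise levels below are illustrative for this sketch, not the paper's birdsong model:

```python
import numpy as np

# Direct simulation of the generative model in Eq. (2.1):
#   y = g(x, v, θ) + z,   ẋ = f(x, v, θ) + w
# The concrete f, g, θ, input v(t) and noise levels are illustrative
# choices for this sketch, not the paper's birdsong model.

def f(x, v, theta):                      # equation of motion of the hidden state
    return -theta * x + v

def g(x, v, theta):                      # observer function mapping states to data
    return np.tanh(x)

rng = np.random.default_rng(2)
dt, theta, x = 0.01, 0.5, 0.0
xs, ys = [], []
for t in range(1000):
    v = np.sin(0.01 * t)                 # cause / input v(t)
    w_t = 0.05 * rng.standard_normal()   # fluctuation w on the hidden-state motion
    z_t = 0.05 * rng.standard_normal()   # observation noise z
    x += dt * (f(x, v, theta) + w_t)     # Euler step of ẋ = f(x, v, θ) + w
    xs.append(x)
    ys.append(g(x, v, theta) + z_t)      # y = g(x, v, θ) + z
```

Inverting such a model (recovering x and v from y) is the recognition problem the paper solves with the free-energy bound.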
Phil. Trans. R. Soc. B (2009) 364, 1211–1221
doi:10.1098/rstb.2008.0300
One contribution of 18 to a Theme Issue 'Predictions in the brain: using our past to prepare for the future'.
*Author for correspondence ([email protected]).
REVIEW
Predictive coding: an account of the mirror neuron system
James M. Kilner · Karl J. Friston · Chris D. Frith
Received: 21 February 2007 / Revised: 19 March 2007 / Accepted: 21 March 2007 / Published online: 12 April 2007
© Marta Olivetti Belardinelli and Springer-Verlag 2007
Abstract: Is it possible to understand the intentions of other people by simply observing their actions? Many believe that this ability is made possible by the brain's mirror neuron system through its direct link between action and observation. However, precisely how intentions can be inferred through action observation has provoked much debate. Here we suggest that the function of the mirror system can be understood within a predictive coding framework that appeals to the statistical approach known as empirical Bayes. Within this scheme the most likely cause of an observed action can be inferred by minimizing the prediction error at all levels of the cortical hierarchy that are engaged during action observation. This account identifies a precise role for the mirror system in our ability to infer intentions from actions and provides the outline of the underlying computational mechanisms.

Keywords: Mirror neurons · Action observation · Bayesian inference · Predictive coding
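The inference scheme the abstract describes can be caricatured in a few lines: compare the observed movement against the sensory trajectory predicted under each candidate intention and pick the one with the smallest prediction error. The candidate intentions and straight-line trajectories are invented for this sketch; the paper's scheme is a hierarchical empirical-Bayes version of the same idea.

```python
import numpy as np

# Caricature of the proposal: infer the cause (intention) of an observed
# action by minimising prediction error over candidate intentions.  The
# intentions and straight-line hand trajectories are invented for this
# sketch; the paper's scheme is a hierarchical empirical-Bayes version.

def trajectory(goal, steps=20):
    """Predicted hand position over time when reaching toward `goal`."""
    return np.linspace(0.0, goal, steps)

intentions = {"grasp_cup": 1.0, "grasp_phone": 0.4, "wave": -0.6}

rng = np.random.default_rng(3)
observed = trajectory(1.0) + 0.05 * rng.standard_normal(20)  # the actor grasps the cup

# Sum-of-squares prediction error under each hypothesis; the smallest wins.
errors = {name: float(np.sum((observed - trajectory(goal)) ** 2))
          for name, goal in intentions.items()}
inferred = min(errors, key=errors.get)
```

The hypothesis whose predicted trajectory best explains the observation (smallest residual error) is taken as the inferred intention.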
Introduction

The notion that actions are intrinsically linked to perception was proposed by William James, who claimed, "every mental representation of a movement awakens to some degree the actual movement which is its object" (James 1890). The implication is that observing, imagining, or in any way representing an action excites the motor program used to execute that same action (Jeannerod 1994; Prinz 1997). Interest in this idea has grown recently, in part due to the neurophysiological discovery of "mirror" neurons. Mirror neurons discharge not only during action execution but also during action observation, which has led many to suggest that these neurons are the substrate for action understanding.

Mirror neurons were first discovered in the premotor area, F5, of the macaque monkey (Di Pellegrino et al. 1992; Gallese et al. 1996; Rizzolatti et al. 2001; Umilta et al. 2001) and have been identified subsequently in an area of the inferior parietal lobule, area PF (Gallese et al. 2002; Fogassi et al. 2005). Neurons in the superior temporal sulcus (STS) also respond selectively to biological movements, both in monkeys (Oram and Perrett 1994) and in humans (Frith and Frith 1999; Allison et al. 2000; Grossman et al. 2000), but they are not mirror neurons, as they do not discharge during action execution. Nevertheless, they are often considered part of the mirror neuron system (MNS; Keysers and Perrett 2004) and we will consider them as such here. These three cortical areas, which constitute the MNS, the STS, area PF and area F5, are reciprocally connected. In the macaque monkey, area F5 in the premotor cortex is reciprocally connected to area PF (Luppino et al. 1999), creating a premotor–parietal MNS, and STS is reciprocally connected to area PF of the inferior parietal cortex (Harries and Perrett 1991; Seltzer and Pandya 1994), providing a sensory input to the MNS (see Keysers and Perrett 2004 for a review). Furthermore, these reciprocal connections show regional specificity. Although STS has extensive connections with the inferior parietal lobule, area PF is connected to an area of the STS that is specifically activated by observation of complex body movements. An analogous pattern of connectivity between premotor areas and the inferior parietal lobule has also been
J. M. Kilner (✉) · K. J. Friston · C. D. Frith
The Wellcome Trust Centre for Neuroimaging, Institute of Neurology, 12 Queen Square, WC1N 3BG London, UK
e-mail: [email protected]
Cogn Process (2007) 8:159–166
DOI 10.1007/s10339-007-0170-2
only underlies motor but cognitive and social skills as well". Interestingly, they note consolidation of memory traces after the initial acquisition can "result in increased resistance to interference or even improvement in performance following an offline period". This is a fascinating observation that suggests optimisation of the brain's generative model does not necessarily need online sensory data. Indeed, there are current theories about the role of sleep in optimising the brain's generative model, not in terms of its ability to accurately predict data, but in terms of minimising complexity. Mathematically, this is interesting because surprise or model evidence can be decomposed into accuracy and complexity terms; suggesting that model evidence can be increased by removing redundant model components or parameters (Friston, 2010). This provides a nice Bayesian perspective on synaptic pruning and the issues considered by Németh and Janacsek (2012-this issue).
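The accuracy/complexity decomposition mentioned here can be written out explicitly. With q(ϑ) the recognition density and m the model, the standard variational free-energy identity bounds surprise:

```latex
F \;=\; \underbrace{D_{\mathrm{KL}}\big[\,q(\vartheta)\,\|\,p(\vartheta \mid m)\,\big]}_{\text{complexity}}
\;-\; \underbrace{\mathbb{E}_{q(\vartheta)}\big[\ln p(y \mid \vartheta, m)\big]}_{\text{accuracy}}
\;\ge\; -\ln p(y \mid m)
```

Minimising F by removing redundant parameters reduces the complexity term, so model evidence p(y|m) can increase even when accuracy is unchanged; this is the Bayesian reading of synaptic pruning referred to above.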
4. Active inference
As noted above, a simple extension to predictive coding is to consider their suppression by the motor system. In this extension, prediction errors are not just suppressed by optimising top-down or descending predictions but can also be reduced by changing sensory input. This does not necessarily mean visual or auditory input but the proprioceptive input responding to bodily movements. As noted above, the suppression of proprioceptive prediction errors is, of course, just the classical reflex arc. In this view, motor control becomes a function of descending predictions about anticipated or predicted kinematic trajectories. See Fig. 1 for a schematic. The important observation here is that the same sorts of synaptic mechanisms and inferential principles can be applied to both perception and the consequences of action. This nicely accommodates the literature on error related negativity reviewed by Hoffmann and Falkenstein (2012-this issue); who consider the "monitoring of one's own actions" and its role in adjusting behaviour. Again, the focus is on EEG, suggesting that even within single trial recordings, the neurophysiological correlates of behaviour-dependent prediction errors can be observed empirically. In their words: "The initiated response is compared with the desired response and a difference; i.e., mismatch between both representations induces the error negativity". This is not the proprioceptive prediction error that drives reflex arcs but a high level perceptual (or indeed conceptual) prediction error; suggesting that the long-term hierarchical predictions of unfolding sensory and kinematic changes have been violated. In other words, these phenomena speak again to separation of temporal scales and hierarchies in providing multimodal predictions to the peripheral sensory and motor systems.

Active inference means that movements are caused by top-down predictions, which means that the brain must have a model of what caused these movements. This begs the interesting question as to whether there is any sense of agency associated with representations. In other words, if I expect to move my fingers and classical motor reflexes cause them to move, do I need to know that it was me who initiated the movement? Furthermore, can I disambiguate between me as the agent or another. These are deep questions and move us on to issues of self modelling and action observation:
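A one-degree-of-freedom caricature of the active-inference idea: the descending prediction (desired limb position) is held fixed, and a "reflex" moves the limb to suppress the proprioceptive prediction error, rather than revising the prediction. The first-order dynamics and gain are illustrative assumptions.

```python
# Sketch of active inference: the agent holds a fixed top-down prediction
# (desired limb position) and reduces proprioceptive prediction error by
# moving, not by revising the prediction.  Dynamics and gain are
# illustrative assumptions.

desired = 1.0                      # top-down prediction of limb position
position = 0.0                     # actual proprioceptive state
gain, dt = 2.0, 0.05
errors = []
for _ in range(200):
    e = desired - position         # proprioceptive prediction error
    position += dt * gain * e      # "reflex arc": action suppresses the error
    errors.append(abs(e))

# The error decays as action fulfils the prediction.
```

Contrast this with perceptual inference, where the same error would instead drive an update of the prediction; active inference lets both routes minimise the same quantity.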
5. Action observation and agency
In a nice analysis of agency, gait and self consciousness, Kannape and Blanke (2012-this issue) start by acknowledging: "Agency is an important aspect of bodily self consciousness, allowing us to separate own movements from those induced by the environment and to
Fig. 1. This figure illustrates the neuronal architectures that might implement predictive coding and active inference. The left panel shows a schematic of predictive coding schemes in which Bayesian filtering is implemented by neuronal message passing between superficial (red) and deep (black) pyramidal cells encoding prediction errors and conditional predictions or estimates respectively (Mumford 1992). In these predictive coding schemes, top-down predictions conveyed by backward connections are compared with conditional expectations at the lower level to form a prediction error. This prediction error is then passed forward to update the expectations in a Bayes-optimal fashion. In active inference, this scheme is extended to include classical reflex arcs, where proprioceptive prediction errors drive action: alpha motor neurons in the ventral horn of the spinal cord elicit extrafusal muscle contractions and changes in primary sensory afferents from muscle spindles. These suppress prediction errors encoded by Renshaw cells. The right panel presents a schematic of units encoding conditional expectations and prediction errors at some arbitrary level in a cortical hierarchy. In this example, there is a distinction between hidden states x_x that model dynamics and hidden causes x_v that mediate the influence of one level on the level below. The equations correspond to generalized Bayesian filtering or predictive coding in generalized coordinates of motion as described in (Friston, 2010). In this hierarchical form f(i) := f(x_x(i), x_v(i)) corresponds to the equations of motion at the i-th level, while g(i) := g(x_x(i), x_v(i)) links levels. These equations constitute the agent's prior beliefs. D is a derivative operator and Π(i) represents precision or inverse variance. These equations were used in the simulations presented in the next figure.
250 K. Friston / International Journal of Psychophysiology 83 (2012) 248–252
Thank You!
Osaka University
• Minoru Asada
• Jimmy Baraglia
• Yuji Kawai
• Shibo Qin
• Many students

University of Tokyo
• Shinichiro Kumagaya
• Satsuki Ayaya

KAIST
• Jun-Cheol Park
[email protected] http://cnr.ams.eng.osaka-u.a.jp/~yukie/