Inference Through Embodied Simulation in Cognitive Robots
Vishwanathan Mohan • Pietro Morasso •
Giulio Sandini • Stathis Kasderidis
Received: 1 October 2012 / Accepted: 3 February 2013 / Published online: 12 March 2013
© Springer Science+Business Media New York 2013
Abstract In Professor Taylor’s own words, the most
striking feature of any cognitive system is its ability to
‘‘learn and reason’’ cumulatively throughout its lifetime,
the structure of its inferences both emerging and con-
strained by the structure of its bodily experiences. Under-
standing the computational/neural basis of embodied
intelligence by reenacting the ‘‘developmental learning’’
process in cognitive robots and in turn endowing them with
primitive capabilities to learn, reason and survive in
‘‘unstructured’’ environments (domestic and industrial) is
the vision of the EU-funded DARWIN project, one of the
last adventures Prof. Taylor embarked upon. This journey
is about a year old at present, and our article describes the
first developments in relation to the learning and reasoning
capabilities of DARWIN robots. The novelty in the com-
putational architecture stems from the incorporation of
recent ideas firstly from the field of ‘‘connectomics’’ that
attempts to explore the large-scale organization of the
cerebral cortex and secondly from recent functional
imaging and behavioral studies in support of the embodied
simulation hypothesis. We show through the resulting behaviors of the robot that, from a computational viewpoint, the former biological inspiration plays a central role
in facilitating ‘‘functional segregation and global integra-
tion,’’ thus endowing the cognitive architecture with
‘‘small-world’’ properties. The latter on the other hand
promotes the incessant interleaving of ‘‘top-down’’ and
‘‘bottom-up’’ information flows (that share computational/
neural substrates) hence allowing learning and reasoning to
‘‘cumulatively’’ drive each other. How the robot learns
about ‘‘objects’’ and simulates perception, learns about
‘‘action’’ and simulates action (in this case learning to
‘‘push,’’ which follows pointing, reaching, and grasping behaviors) are used to illustrate the central ideas. Finally, an example of how simulation of perception and action leads the robot to reason about how its world can change such that it becomes a little more conducive toward the realization of its internal goal (an assembly task) is used to describe
how ‘‘object,’’ ‘‘action,’’ and ‘‘body’’ meet in the Darwin
architecture and how inference emerges through embodied
simulation.
Keywords Brain guidance · Embodied simulation · Small worlds · Body schema · Cognitive robotics · Learning and reasoning · DARWIN project
Introduction
So it was a cold winter night of 2011 in Luxembourg when
we (VM and PM) last met Professor Taylor enjoying his
apple pie and acknowledging the young chef for her cre-
ativity. She was all in smiles and indeed, creativity is
infectious, be it Betty the Caledonian crow [42, 81], Alex
the parrot [58], a capuchin or a chimp [75, 76, 84] or a
human infant playing! How brains ‘‘become’’ creative and
V. Mohan (✉) · P. Morasso · G. Sandini
Robotics, Brain and Cognitive Science Department, Istituto
Italiano di Tecnologia, Via Morego 30,
16163 Genoa, Italy
e-mail: vishwanathan.mohan@iit.it
P. Morasso
e-mail: pietro.morasso@iit.it
G. Sandini
e-mail: giulio.sandini@iit.it
S. Kasderidis
Novocaptis Cognitive Systems and Robotics, Thessaloniki,
Greece
e-mail: kasderidis@novocaptis.com
Cogn Comput (2013) 5:355–382
DOI 10.1007/s12559-013-9205-4
exhibit novelty in behavior through cumulative learning
and effective use of experience is still a mystery. This has
to be unlocked to better understand our own selves and to
create artifacts that can intelligently assist us in the envi-
ronments we inhabit and create. The next morning, we
were asked ‘‘what aspect of cognition needs to be under-
stood further to effectively design artificial cognitive sys-
tems?’’ There was an instantaneous reply: ‘‘the most
striking feature of any cognitive system is its ability to
learn cumulatively forever and use its experiences effec-
tively to survive.’’ It was the negotiation meeting of the
newly funded EU Project DARWIN and Prof. Taylor had
concisely spelled out the mantra to pursue. The underlying
rationale was twofold: firstly to explore the computational/
neural basis of embodied intelligence by reenacting the
infant developmental learning process in cognitive robots
and secondly to create practical systems with ‘‘end user
value’’ that demonstrate cognitive capabilities. This is also
evident from the expansion of the acronym: DARWIN stands for Dexterous Assembler Robots Working with embodied INtelligence (www.darwin-project.eu).
Our journey is about a year old now, and this article pre-
sents the first developments in relation to the learning and
reasoning capabilities of the DARWIN robots.
In general, after the tryst with GOFAI, most current
research in the field of cognitive developmental robotics
appreciates the fact that ‘‘sensorimotor experience precedes
representation’’ and cognition is gradually bootstrapped
through a cumulative process of learning by interaction
(physical and social) within the zone of proximal develop-
ment [78] of the agent. This approach indeed has roots in
Wiener’s Cybernetics [80], Varela et al.’s autopoiesis [73], Chiel and Beer’s neuroethology [14], Clark’s situatedness [15], Hesslow’s simulation hypothesis [32, 33], and Thompson’s enactive cognition [71]. The obvious reason to pursue
this path is because it is impossible to predict and program at
‘‘design time’’ every possible situation in every time instance
to which an artifact may be subjected in the future. Of
course, robot programming approaches work for simple
machines performing targeted functions (like washing machines) but certainly not for general-purpose robotic
companions envisaged to assist us in unstructured settings: housekeeping, workplace automation, industrial assembly, aid for the elderly and physically challenged, to
mention a few. Complementing the extrinsic application-
specific value, the embodied/enactive approach is also rele-
vant from an intrinsic viewpoint of understanding our own
selves; understanding how interactions between the body and the brain shape the mind, action, and reasoning. This is
because unlike the range of direct problems in conventional
physics that involve computing effects of forces on objects,
the brains of animals have to deal with exactly the inverse
problems of learning, reasoning, and choosing actions that
would enable realization of one’s goals and hence ultimately
survive. Strikingly, many of the inverse problems faced by the
brain to learn, reason, and generate goal-directed behavior are
indeed analogous to the ones roboticists must solve to make
their robots act cognitively in the real world. It was this
interleaving of ‘‘extrinsic’’ and ‘‘intrinsic’’ value that fasci-
nated Prof. Taylor and drove him to co-author and work in the DARWIN project. At the same time, it is only fair to say that in
spite of extensive research scattered across multiple scientific
disciplines and prevalence of numerous machine learning
techniques, the present artificial agents still lack much of the
resourcefulness, purposefulness, flexibility, and adaptability
that biological agents so effortlessly exhibit. Certainly, this
points toward the need to develop novel computational
frameworks that go beyond the state of the art and endow
cognitive agents with the capability to learn cumulatively and
use past experience effectively ‘‘to connect the dots’’ when
faced with novel situations [25]. Perhaps a ‘‘humanlike’’
learning touch to machine learning algorithms is the need of
the times ahead!
Looking at the incessant loop of gaining experience and
using experience (as prevalent in most biological systems
that demonstrate cognition), learning and reasoning can be
seen as foreground and background alternating with each other, as intricately depicted in the artistic creations of M. C. Escher [46].
In an intriguing work during the early days of embodied/
enactive cognition, Mark Johnson [41] playfully remarked that ‘‘we are rational animals but we are also rational animals,’’ emphasizing that, like learning, the structure of reasoning and inference does not transcend the structure of bodily experience. The centrality of embodiment directly
influences ‘‘what’’ and ‘‘how’’ things can be meaningful to
us, the ways in which our understanding of the world is
gradually bootstrapped by experience and the ways in which
we reason about them. In essence, we believe that for
cognitive robots foreseen to operate in open-ended
unstructured environments, learning and reasoning must
cumulatively drive each other in a closed loop: more
learning leading to better reasoning and inconsistencies in
reasoning driving new learning. In terms of neural computation, this implies that part of the cortical substrates
activated during perceptual and motor learning (i.e., when
an agent gains experience) are also activated when an agent
reasons and simulates the causal consequences of its
actions. While resonance between top-down and bottom-up
information flows is a measure of the quality of learning,
dissonance is the stepping stone to explore, gain more
experience and learn further. Such neural reuse also makes
sense considering the fact that the brain is a product of evolu-
tion, meant to support the survival of a species in its natural
environments and importantly operates under constraints of
space, time, and energy. A wealth of emerging evidence
from neuroscience substantiates this fact (see [8, 24, 28, 33,
49] for recent reviews). We believe that this aspect must be
an essential design feature in future cognitive robots that
have any chance to survive, cooperate, and assist humans in
the real world. While emerging results from functional
imaging and behavioral studies may serve as a guiding light,
there is still an urgent need to also focus on ‘‘cognitive
computation’’ and look deeper into the underlying compu-
tational principles in order to create artificial cognitive
systems that can both be ‘‘practically useful’’ and in turn
shed deeper insights into the ongoing ‘‘neural computation’’
in the brain. In this context, building on an intriguing review by Hesslow a decade back [32], we believe that
computational architectures driving cognitive robots must
include three basic features that form the core of the
embodied simulation hypothesis.
Simulation of Action and Body Schema
Mounting evidence accumulated from different directions
such as brain imaging studies [21, 28], mirror neuron
systems [61–63], and embodied cognition [23, 24] gener-
ally supports the idea that action ‘‘generation, observation,
imagination and understanding’’ share similar underlying
functional networks in the brain. In general, there is
growing evidence for the fact that neural circuits in the
predominantly motor areas are also activated in other
contexts related to ‘‘action’’ that do not cause any overt
movement. Such neural activity occurs not only during
imagination of movement ([13, 17, 18], several others) but
also during observation and imitation of other’s actions [9,
21, 28, 39] and comprehension of language, that is, both
action-related verbs and nouns [20, 26, 27, 47, 59]. The
neural activation patterns include not only pre-motor and
motor areas such as PMC, SMA, and M1 but also sub-
cortical areas of the cerebellum and the basal ganglia.
During the observation of movements of others, an entire
network of cortical areas called the ‘‘action observation
network’’ that includes the bilateral posterior superior
temporal sulcus (STS), inferior parietal lobule (IPL),
inferior frontal gyrus (IFG), dorsal pre-motor cortex, and
ventral pre-motor cortex is activated in a highly repro-
ducible fashion [28]. The central hypothesis that emerges
out of these results is that motor imagery and motor exe-
cution draw on a shared set of cortical mechanisms
underlying motor cognition. In simple terms, it posits that
one can reason about an action (reach, grasp, push, etc.)
without actually performing the action and yet use the
same neural substrate in the sensory motor system. Based
on the wealth of neurobiological evidence, a preliminary
foundation for such a ‘‘shared’’ computational machinery
for ‘‘execution, simulation and understanding’’ of Action
has been created through the development of the Passive
Motion Paradigm framework (see [51] for a recent review)
and used successfully in a range of tasks like bimanual
coordination, motor skill learning, and tool use in the
humanoid iCub (http://www.icub.org/), one of the robots
used by the DARWIN consortium. The PMP mechanism
basically emulates the animation of a ‘‘body schema’’
intended not as the passive homunculus posited by Penfield but as a multi-referential dynamical system which
deals at the same time with sensorimotor variables in the
end-effector space, joint space, and ‘‘tool space.’’ Note that
the issue of body schema is not as popular in cognitive
robotics in comparison with the concept of embodiment.
These are not the same things. If you have a body schema,
you also have embodiment but not the other way around.
Vernon et al. [74] in their discussion on a roadmap for
cognitive development in humanoid robots present a cata-
log of cognitive architectures, but in none of them is the concept of body schema a key element. More recently,
Hoffmann et al. [34] and Mohan and Morasso [51] review
this concept in robotics, emphasizing the gap between the
idea and its computational implementations. Studies on
tool use in animals by Iriki and Sakura [40] and Umilta
et al. [72] further support this viewpoint. In this article, we
develop these ideas further describing how ‘‘object action
and body’’ are connected in the DARWIN architecture,
how novel actions can be learnt and simulated in the
context of a goal, and what are the underlying advantages.
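To give the flavor of this mechanism in concrete terms, the following is a minimal sketch of a PMP-style relaxation for a planar 2-link arm: a virtual force field attracts the end-effector to the goal, the force is mapped to joint torques through the Jacobian transpose, and the joints passively yield. The link lengths, stiffness, admittance, and time step are illustrative assumptions, not the DARWIN/iCub implementation (see [51] for the actual framework).

```python
# Minimal sketch of a PMP-style relaxation for a planar 2-link arm.
# Link lengths, stiffness K, admittance A, and time step are illustrative
# assumptions, not the DARWIN/iCub implementation.
import numpy as np

L1, L2 = 0.3, 0.25              # link lengths (m)
K = 0.5                         # stiffness of the virtual force field
A = np.eye(2)                   # joint admittance (compliance)
dt = 0.01

def forward(q):
    """End-effector position of the 2-link arm."""
    return np.array([L1*np.cos(q[0]) + L2*np.cos(q[0] + q[1]),
                     L1*np.sin(q[0]) + L2*np.sin(q[0] + q[1])])

def jacobian(q):
    s1, s12 = np.sin(q[0]), np.sin(q[0] + q[1])
    c1, c12 = np.cos(q[0]), np.cos(q[0] + q[1])
    return np.array([[-L1*s1 - L2*s12, -L2*s12],
                     [ L1*c1 + L2*c12,  L2*c12]])

q = np.array([0.3, 0.5])        # initial joint configuration
goal = np.array([0.2, 0.4])     # target in end-effector space

for _ in range(2000):
    F = K * (goal - forward(q)) # virtual force attracting the hand to the goal
    tau = jacobian(q).T @ F     # force mapped to joint torques (J^T F)
    q += dt * (A @ tau)         # joints passively yield to the torque field
print(forward(q))               # hand has relaxed close to the goal
```

Note that no explicit inverse kinematics is computed: the ill-posed inverse problem is resolved by letting the whole kinematic chain relax in the force field, which is what allows the same machinery to be reused for execution, simulation, and reasoning about an action.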
Simulation of Perception and Distributed Organization
of Semantic Memory
Imagining perceiving something is similar to actually per-
ceiving it, the only difference being that the perceptual activity
is generated top down rather than by environmental stimuli.
While this perspective has been emphasized in the reviews of Hesslow [32, 33] and Grush [29], among others, more recent
developments on the organization of semantic knowledge in
the brain (see [16, 48, 49, 57]) provide further insights that
help to constrain computational architectures for cognitive
agents. The main finding emerging from these results is that
conceptual information is grounded in a ‘‘distributed fash-
ion’’ in ‘‘property-specific’’ cortical networks that directly
support perception and action (and that were active during
learning). The same set of cortical areas is known to be active
during real perception, imagination, and lexical processing.
It is also established that ‘‘retrieval’’ or reactivation of the
neural representation can be triggered based on partial cues
coming from ‘‘multiple modalities’’: for example, the sound of a hammer retroactivates its shape representation [43, 50], and the presence of a real object (a banana) or a 2D picture of it can
still activate the complete network associated with the object
(and that was active during learning of it in the first place).
The results indicate that while there is a fine level of
‘‘functional segregation’’ in the higher-level cortical areas
processing sensorimotor information, there is also an
underlying cortical dynamics that facilitates ‘‘cross-modal,
top-down and bottom-up’’ activation of these areas. ‘‘Higher
level’’ is emphasized because there is reason to believe that
both early stages of perception (lower level color, shape
processing) and late stages of action (like muscle activity)
should not be involved in embodied simulation. Otherwise,
it would become impossible to distinguish simulation from
reality (and we believe retaining this distinction has
advantages in computational terms too). There is evidence
from both motor [19] and perceptual studies [49]. In the
sections that follow, we attempt to transform the findings
from neuroscience related to simulation of perception into a
possible computational framework for organization of per-
ceptual systems driving DARWIN robots (and conduct
experiments to understand the resulting benefits in terms of
the inferential capabilities of the robot).
Small Worlds, Hubs, and Global Integration
For any large-scale interconnected system composed of many
millions of contributing elements (neurons, people, comput-
ers, etc.) to work efficiently, mechanisms related to functional
segregation and global integration must be synergistically
coupled, disruption of such synergy often leading to large-
scale systemic breakdown. From the viewpoint of embodied
simulation, global integration gives rise to powerful associa-
tive mechanisms that enable neural activity (coming from
partial cues) to swiftly elicit other context-related neural
activity in various cortical areas, hence resulting in the
emergence of loops of anticipation between perception and
action (most often in relation to the goal at hand). Importantly,
a simulated action must be able to elicit perceptual activity that
resembles the activity that would have occurred if the action
had actually been performed and vice versa, that is, a simu-
lated perception must be able to trigger actions that are ‘‘doable’’ in
the context of the imagined situation, hence revealing how the
world can further be causally transformed (and if it is valuable
in the context of a goal). Integrative mechanisms hence basi-
cally close the loop between simulated perception and simu-
lated action. From a computational perspective, in a large-
scale complex system like the brain, efficient integrative
mechanisms may also help minimize the number of processing steps, ensure efficient wiring (thus costing less area) with low metabolic cost for the transmission of information, and support synchronizability, pattern completion, and conflict resolution. Are there specific principles involved here that can be exploited while creating cognitive artifacts like Darwin?
Recent developments in the fields of network theory [4, 5]
and connectomics [67] provide the guiding light. The point of
intersection is the property of ‘‘small worldness’’ now found to
be prevalent in many large-scale networks. In simple terms,
‘‘small worlds’’ are complex systems where individual mem-
bers form tightly knit local communities (high clustering) but at the same time are characterized by very short path lengths. Since the seminal works of Watts and Strogatz [79] and Barabasi and Albert [6], it is
now established that several complex systems like social
networks, transportation networks, power grids, connectivity
of the internet, gene networks, food webs, patterns in sexually
transmitted diseases (STDs), among several others, exhibit the
‘‘small-world’’ property. Emerging evidence from analysis of
large-scale architecture of the cerebral cortex [30, 67–69]
using techniques like Diffusion Tensor Imaging substantiates
the fact that cortical networks of the brain also exhibit the
small-world property. Basically, they suggest the existence of a
small set of hubs (highly connected cortical patches) that
closely interact to facilitate swift cross-modal, top-down, and
bottom-up iterations between sub-networks involved in
learning, simulating, and representing various sensorimotor
information. This is interesting because the studies mentioned
earlier (in relation to simulation of perception and action) also
point toward the existence of a small set of hubs that facilitate both
‘‘integration and differentiation’’ [16, 49, 57]. Further, with the
recent discovery of the default mode network in the brain [8, 10, 11, 70] and the work of Addis and Schacter [1, 2, 31, 82], it is now also
known that a core network of ‘‘highly connected’’ areas is
consistently activated when subjects perform diverse cogni-
tive functions like recalling past experiences, simulating
possible future events (or prospection), planning possible
actions, and interpreting thoughts and perspectives of other
individuals. In sum, the exciting recent developments from
neuroscience, connectomics, and network science call for
creation of novel computational frameworks for learning and
reasoning that are strongly grounded in the neurobiology of the
brain. As far as we are aware, these recent findings have still
not found a place in the computational architectures driving
‘‘acting, learning, and reasoning’’ robots.
‘‘Brain guidance’’ or the need to ‘‘learn from the
existing solutions’’ was often emphasized by Prof
Taylor in his numerous plenary talks and articles
throughout his illustrious career and also during his
short but inspirational stint while working on the
DARWIN architecture. Recent developments
emerging from multiple fields like connectomics,
network science and neuroscience provide valuable
insights to guide development of novel computational
frameworks that go beyond the existing state-of-the-
art machine learning systems. This would most
probably both increase the sustainability of cognitive
artifacts assisting humans in the real world and
increase their value in the eyes of their end users. At the
same time such a pursuit would lead towards novel
theoretical formulations of embodied intelligence that
is deeply grounded in the biology of the brain. We
believe this article is just a preliminary attempt in this
direction.
The rest of the paper is organized as follows. The next
section deals with simple naming games or the robot learning
about objects in its playground. This section is used to go
deeper into small-world networks, distributed organization of
concepts, network dynamics, and related issues. We describe
how even a small sub-network consisting of just four neural
maps is endowed with its own ‘‘local’’ ability to both ‘‘rea-
son’’ in novel situations and ‘‘resolve’’ contradictions that
may arise between what the system anticipates ‘‘top down’’
and what actually activates the system ‘‘bottom up.’’ ‘‘A Body
Schema for Cognitive Robots: ‘‘Why and What’’’’ section
deals with the issue of body schema and its implementation in
one of the DARWIN robots (iCub) and discusses its potential
utility in the DARWIN architecture in relation to ‘‘simulation
of action.’’ The ‘‘Connecting Object, Action and the Body:
Learning About Action and Simulating Action’’ section
connects ‘‘object, action, and body’’ building up on what has
been presented in ‘‘Naming Games: Learning About Objects
and Simulating Perception’’ and ‘‘A Body Schema for Cog-
nitive Robots: ‘‘Why and What’’’’ sections. How the robot
‘‘learns to push,’’ anticipates its consequences on various objects, and inversely generates goal-directed pushing is used to illustrate the central ideas. Note that ‘‘pushing’’ is an important
‘‘multipurpose’’ action investigated significantly in the field
of animal and infant cognition. ‘‘Simulating ‘Perception’ and
‘Action’ in the Context of a ‘‘Goal’’’’ section demonstrates
how all the learning comes to use in the context of a goal (a
simple assembly task). A discussion concludes.
Naming Games: Learning About Objects
and Simulating Perception
We start with a simple scenario of the robot learning about
various objects in its playground and associating their names
with their perceptual properties. The scenario is used to
describe how object concepts are learnt and organized in the
DARWIN architecture. For clarity, this section is broken into
various subsections that go into the details of various topics
like small worldness, distributed organization of object con-
cepts in Darwin, learning a simple ‘‘color-word-shape’’
small-world network, activity dynamics, and inferential
capabilities of the robot at the end of this learning phase.
Small Worldness
Intuitively, any interconnected system consisting of many
millions of individual members (people, neurons, com-
puters, etc.) is a ‘‘small world’’ if any member can connect
to any other member in a very small number of hops. Small worlds are complex systems whose individual members form tightly knit local communities (high clustering) but at the same time are characterized by short path lengths
(globally accessible). Since the seminal works of Watts and
Strogatz [79] and Barabasi and Albert [6], it is now
established that several complex systems exhibit the
‘‘small-world’’ property [5]. As an analogy, we all like to
connect to the most well-connected people around us, and this shortens our paths through the complex social net-
work. More recent attempts to map the large-scale struc-
tural architecture of the cerebral cortex (Hagmann et al. [30]; see also an exciting recent book by Sporns [67]) have now revealed that the cortical networks in the brain also
exhibit the small-world property. Several highly connected
zones or Hubs have been identified through DTI and
tractography, and there is emerging evidence that disrup-
tion of ‘‘small worldness’’ may play a role in causing
neurological disorders like Alzheimer’s disease, Schizo-
phrenia, and Autism spectrum disorders (see chapter 10 of
Sporns [67] for a recent survey). In any network in general (like the internet, airports, etc.), we must note that it is the
well-connected zones (hubs) that are vulnerable to attack,
hence causing noticeable disruption in the global func-
tioning of the system. While ‘‘self-organization’’ as a
computational principle has been used extensively in the
literature, ‘‘small worldness’’ has seldom been exploited in
the design of cognitive architectures for embodied robots
(as far as we are aware). We explore this idea further while
designing the DARWIN architecture and investigate its
computational advantages.
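The small-world signature discussed above is easy to quantify. The sketch below, assuming the networkx library is available, compares the clustering coefficient and characteristic path length of a Watts–Strogatz graph with those of a random graph of the same size; high clustering combined with near-random path length is precisely the property exploited in the DARWIN architecture.

```python
# Sketch: quantifying "small worldness" by comparing a rewired ring
# lattice (Watts-Strogatz) against a random graph with the same number
# of nodes and edges. Requires the networkx library.
import networkx as nx

n, k, p = 1000, 10, 0.1                          # nodes, neighbours, rewiring prob.
ws = nx.connected_watts_strogatz_graph(n, k, p)
rnd = nx.gnm_random_graph(n, ws.number_of_edges())
if not nx.is_connected(rnd):                     # keep the giant component
    rnd = rnd.subgraph(max(nx.connected_components(rnd), key=len)).copy()

C, L = nx.average_clustering(ws), nx.average_shortest_path_length(ws)
Cr, Lr = nx.average_clustering(rnd), nx.average_shortest_path_length(rnd)

# Small-world regime: clustering far above random, path length comparable,
# so the index below is much greater than 1.
sigma = (C / Cr) / (L / Lr)
print(f"C={C:.3f} (random {Cr:.3f})  L={L:.2f} (random {Lr:.2f})  sigma={sigma:.1f}")
```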
Distributed Organization of Object Concepts Within
a ‘‘Small World’’
Recent functional imaging and behavioral studies shed
light on how conceptual knowledge (and semantic mem-
ory) is organized in the brain and, importantly, that this organization is compatible with the ‘‘hub-based’’ small-world framework [16, 49, 50,
57]. The main finding emerging from this area of investi-
gation is that conceptual information is grounded in a
‘‘distributed fashion’’ in ‘‘property specific’’ cortical net-
works that directly support perception and action (and that
were active during learning). The same set of networks is known to be active during real perception/action, imagi-
nation, and even lexical processing. It is also now well
known that ‘‘retrieval’’ or reactivation of the conceptual
representation can be triggered based on partial cues
coming from ‘‘multiple modalities’’ (sound, 2D picture,
real object, word, etc.).
What computational principles are necessary to create such a brain-inspired framework for representing object concepts in Darwin that will allow both
global learning (in sounds, colors, words, shapes, and movements) and the activation of the complete network from a partial cue (sound, word, picture)?
It is here that we exploit computational principles of ‘‘self-
organization’’ to learn from experience, emerging evidence
related to property-specific distributed networks (to endow
compositionality) and network theory inspired ideas of
‘‘small worldness’’ (to enable multimodal integration and
pattern completion from partial cues). Figure 1 shows the
block diagram that captures the building blocks and
information flows. We briefly summarize the details below.
The sensory streams: At the bottom is the DARWIN
sensory layer that includes the sensors and associated
lower-level communication protocols and algorithms to
analyze properties of the objects mainly color, shape, and
size. The color of objects is analyzed by a color segmen-
tation module based on a recent approach using Markov
random fields [64] developed by one of the partners in the
DARWIN consortium (referred in the acknowledgment
section). This returns a triad of RGB values which forms
the input to the color SOM. At the level of the concept system,
information related to object shape is passed as 120-bit
vector’s unique for each shape (like an abstract identifier
of the object). In this way, the complexity of shape anal-
ysis is abstracted from the concept system. Size-related
information is organized into two different maps, one coding for magnitude (the maximum length of the object along any axis in Cartesian space, say S1) and the other for proportion (i.e., the ratio of the maximum length with respect to the lengths along the other two axes, say S2). S3 relates to ori-
entation that is not a property of the object itself but rather
is relative to the frame of reference of the observer. This
kind of organization of size-related information is partly
inspired by recent evidence related to representation of
magnitude in the parietal cortex [12]. There are several
advantages of this scheme in terms of inferring what can be
done with different objects that may be indistinguishable
through color or shape (for example, consider a green cube
and a green stick: both have the same shape and color; what
distinguishes them is the abstract magnitude and propor-
tion: the former can be used to build a stack the latter as a
tool to pull an unreachable reward). Word information is
the input directly coming from the teacher. Infants often
learn to associate ‘‘words’’ with objects by learning in a
social environment and interacting with the parent/teacher.
It is further possible to exploit compositionality in the
domain of words. For example, consider a ‘‘black apple’’: even though we may never have encountered such an object, we can easily ‘‘imagine’’ what it should be, and this should activate ‘‘top-down’’ the higher-level
Fig. 1 The block diagram that captures the building blocks and
information flows that leads to distributed representation/learning of
object concepts. Growing SOM stands for ‘‘growing self-organizing’’
maps learning, representing, and simulating different perceptual
information about objects color, shape, name, size, etc. The box in the
top left shows 12 (out of 13) possibilities to connect 3 nodes, of which
a particular type of connectivity called ‘‘dual dyad’’ (highlighted)
has been found prevalent in the cortex of several organisms. In the
block diagram, the connectivity between various self-organizing maps
is of dual dyad type, every node representing a neuron in a different
map. The basic computational advantage is to have both functional
segregation and at the same time global integration, hence allowing
the possibility of even a single neuron (in any map) to ‘‘retroactivate’’
a large-scale cognitive network (Color figure online)
areas processing color and shape and not just words as is
known from several studies in brain imaging [16]. At
present, ‘‘word’’-related inputs are entered by the teacher
using the keyboard and converted into vectors on the basis
of letter-usage frequencies in the English language, as is done in [37]. In the present system, a sequence of at most three words describing the object (size-color-shape, for example ‘‘small red cube’’) is considered, and the resulting individual activities are superimposed to get the final activations
in the word SOM. From an ‘‘application perspective,’’ the incorporation of a little linguistics (grounded in the sensorimotor experience of the learner) endows the architecture with a measure of user friendliness. In the future, we
look forward to replacing the ‘‘keyboard’’ input modality with a direct auditory channel (along the lines of
work done in the EU-funded FP7 CHRIS project or other
available speech analysis software).
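For concreteness, the sketch below assembles the kinds of input vectors described above: an RGB triad for color, magnitude/proportion features for size, and a letter-usage vector for words. The exact DARWIN encodings (the 120-bit shape identifiers, the frequency-based word coding of [37]) differ in detail, so every function here is an illustrative stand-in.

```python
# Illustrative stand-ins for the input encodings described above; the
# actual DARWIN encodings (120-bit shape identifiers, the word coding
# of [37]) differ in detail.
import numpy as np

def color_vector(rgb):
    """RGB triad from the colour segmentation module, scaled to [0, 1]."""
    return np.asarray(rgb, dtype=float) / 255.0

def size_features(extents):
    """S1 = magnitude (maximum extent); S2 = proportion (maximum extent
    relative to the mean of the other two axes) -- an assumed reading of
    the ratio described in the text."""
    d = sorted(extents, reverse=True)
    return np.array([d[0], d[0] / max(np.mean(d[1:]), 1e-6)])

def word_vector(word):
    """Fixed-length vector from letter-usage counts, a crude stand-in
    for the letter-frequency coding cited above."""
    v = np.zeros(26)
    for ch in word.lower():
        if ch.isalpha():
            v[ord(ch) - ord('a')] += 1.0
    return v / max(v.sum(), 1.0)

print(color_vector((200, 30, 30)))         # a "red" stimulus
print(size_features((0.30, 0.02, 0.02)))   # a stick: large S1 and S2
print(size_features((0.06, 0.06, 0.06)))   # a cube: S2 close to 1
```

Note how the stick and the cube, indistinguishable by color, separate cleanly in the (S1, S2) space, which is the inferential advantage claimed above.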
Learning a Simple ‘‘Color-Word-Shape’’ Small-World
Network
The information coming from the DARWIN sensory layer
is projected bottom up to a set of growing self-organizing
maps (SOM) learning and representing object properties at
a conceptual level. The first-level neural connectivity
between the sensory layer to property-specific SOMs is
learnt using basic SOM procedure [22, 44]. As we go
higher up in the hierarchy (Fig. 1), the representations
become more multimodal and there is greater integration of
information coming from multiple SOM maps (in layer 1).
Here, we need to go beyond the standard self-organizing
maps and introduce some novel concepts for learning
‘‘layer 1 SOM to hub’’ (and higher up ‘‘hub to hub’’ neural
connectivity). In general, hubs are also self-organizing
maps but higher up in the processing hierarchy and serve
two main purposes:
1. Facilitate multimodal integration of information arriv-
ing bottom up (from the sensory streams through layer
1 SOMs)
2. Enable both ‘‘top-down’’ and ‘‘cross-modal’’ activation
of various ‘‘property-specific’’ maps during reasoning
and resolution of contradictions.
As seen in Fig. 1, we distinguish between two kinds of
hubs: ‘‘provincial hubs’’ that integrate neural activity
coming from small sets of lower-level SOMs and ‘‘con-
nector hubs’’ that integrate information coming from pro-
vincial hubs. An analogy may be that of a team leader (who
works on a specific problem with a group of 3–4 students)
and the director of a department (who is the face of the
organization for the external world). The connectivity
between property-specific maps and hubs is developed
using three additional rules as described below.
1. Preferential Attachment: This idea is simple and just
means that there is a tendency of individuals to ‘‘preferen-
tially’’ connect to other highly connected ‘‘individuals’’
(instead of randomly connecting to anyone in the network).
This has the net effect of reducing path length between any
two individuals in a large-scale complex network. It is well
known from network theory that preferential attachment
gives rise to growing scale-free networks with small-world
properties [6], a feature prevalent in many real-world systems.
In the initial attempts to create growing networks with small-
world properties, preferential attachment of new nodes was
directed toward existing ‘‘highly connected’’ individuals with
greater ‘‘nodal degree,’’ hence modeling a kind of ‘‘rich get
richer’’ phenomenon. In this case, the previously existing
nodes (or senior ones) have a clear advantage over newcomers. If this is the case, then how do newcomers make it in a world where the ‘‘rich get richer’’? Realizing this issue, Barabasi [4] proposed a measure called the ‘‘fitness-connectivity’’ index, hence combining ‘‘fit gets richer’’ with ‘‘rich get richer’’ to create growing networks. ‘‘Fitness’’ is generally ‘‘context dependent’’ and can be attributed to different factors based on the network in question (power grids, internet, and air transport).
Considering that ‘‘space and wiring’’ constraints play a cru-
cial role in the emerging connectivity of the brain, we decided
to have a gradient of ‘‘fitness’’ so as to promote layer 1 neural
SOMs to preferentially connect to ‘‘provincial hubs’’ (and
‘‘provincial hubs’’ to ‘‘connector hubs’’). In the biological
case, we believe it is plausible that evolutionary pressures and genetic factors may play a role in determining the ‘‘fitness’’ of cortical areas to promote preferential attachment (a minimal sketch of this attachment rule appears after this list).
2. Temporal Coincidence: This simply means that if
neurons in different self-organizing maps are concurrently
active (within a temporal window), then they get connected
to each other (not directly) but through the ‘‘provincial
hub’’ in their territory. Note that being connected through
the provincial hub (and not directly) ensures that there is
both functional segregation (between different neural
maps) and at the same time global integration. An analogy
is two doctoral students working on their own problems, collaborating at times and connected through a team leader: there is close contact and at the same time a level of local functional autonomy.
3. Dual Dyad Connectivity: If there are 3 nodes, then
there are 13 ways to connect them (12 of which are shown
in the left panel of Fig. 1). C. elegans is a tiny worm measuring about 1 mm whose brain (with about 302 neurons) has been exquisitely studied for almost 3 decades. Way back in 1985, the overabundance of ‘‘triangular subcircuits’’ of a particular type called ‘‘dual dyad’’ (highlighted in Fig. 1) in the brain of C. elegans was noted by White [83], and this has been confirmed in several subsequent studies. More recently, analysis of the cat and macaque cortex has also revealed that ‘‘dual dyad’’
connectivity is found in significantly high proportions [69].
This implies that this kind of connectivity comes with
advantages (hence being retained by evolution). Guided by
these studies, while connecting neurons belonging to dif-
ferent neural SOMs, we have retained the ‘‘dual dyad’’-
type connectivity. The computational gains of having such
reciprocal connectivity between multiple maps will be
demonstrated gradually in various sections.
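As promised under rule 1, here is a minimal sketch of ‘‘fit gets richer’’ attachment: new nodes connect with probability proportional to degree times fitness, and a fitness gradient (hubs fitter than layer 1 maps) steers new neurons toward the hubs. The node names and fitness values are illustrative assumptions, not DARWIN parameters.

```python
# Sketch: "fit gets richer" preferential attachment. Attachment
# probability is proportional to degree * fitness; the fitness gradient
# (hubs fitter than layer 1 maps) mirrors the text, but the values are
# illustrative assumptions.
import random

fitness = {"colorSOM": 1.0, "shapeSOM": 1.0, "wordSOM": 1.0,
           "provincial_hub": 3.0, "connector_hub": 6.0}
degree = {node: 1 for node in fitness}        # seed degrees
edges = []

def attach(new_node, new_fitness=1.0, m=1):
    """Wire a new node to m existing nodes, biased by degree * fitness."""
    candidates = list(fitness)
    weights = [degree[n] * fitness[n] for n in candidates]
    for target in random.choices(candidates, weights=weights, k=m):
        edges.append((new_node, target))
        degree[target] += 1
    fitness[new_node] = new_fitness
    degree[new_node] = m

random.seed(0)
for i in range(50):                           # new neurons join the network
    attach(f"neuron{i}")
print(sorted(degree.items(), key=lambda kv: -kv[1])[:3])   # hubs dominate
```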
Figure 2 (left panel) shows activity in the color, word,
shape, and provincial hub maps while learning the first steps
of associating names of different objects (given by the user)
with its perceptual properties (color and shape processed
bottom up through the sensory channels). The right panel results will be elaborated in a later subsection after
describing global network dynamics of the ‘‘small world.’’
In this subsection, we merely focus on how the connectivity
between various neural maps is developed (the connectivity
relates to ‘‘color,’’ ‘‘word,’’ ‘‘shape,’’ and provincial hub).
Let N be the number of neurons in any SOM and S be
the dimensionality of the bottom-up input feeding the map.
Fig. 2 (Left panel) Neural activity in the self-organizing maps related
to color, word, shape, and provincial hub while learning a simple
‘‘small-world network’’ that brings together all these functionally
segregated neural maps (all driven bottom up by sensory channel) into a
globally integrated system (at the level of provincial hub which is
analogous to a team leader working with 3 graduate students). Five
different cases are shown in the left panel. In every case, the robot is
presented with a novel object and coincident with a word sequence
(provided by the teacher). Neurons in different property-specific maps
that have sensory weights closest to the incoming input signal start
representing these signals (with their sensory weights gradually adapted
as in standard SOMs). At the same time, connections are developed
between ‘‘property-specific’’ SOMs and the provincial hub due to
preferential attachment and temporal coincidence (for example, winner
the color SOM connects to the winner in the word SOM through the
provincial hub by means of dual dyad connectivity pattern). In the left
panel, the activations in the word and provincial hub are shown twice as
they correspond to activations in response to individual components
(color and shape). The net activation can be considered as the
superposition of activations resulting from individual components (like
in the right panel). Right panel: the local inferential powers of even this
‘‘small patch’’ in the DARWIN architecture. When someone else
mentions the word ‘‘red horse’’ or asks us to grasp a ‘‘black apple,’’ most of us may
be able to anticipate what this new sequence of words may refer to. If a
black apple is eventually kept in front of us, most of us would even grasp
it because we can anticipate top down what a novel object ‘‘could be’’
and if bottom-up sensory input activates the neural maps exactly in the
same way as top down, we can infer this object is indeed the ‘‘black
apple.’’ Action networks could be triggered to initiate the action, though
the goal was unheard of. The same scenario is replicated on the robot:
the user inputs a new word sequence, and we observe how activity in the ‘‘word map’’ retroactivates, gradually in time, the complete network (in a way very different from how it was learnt: right panel). If a blue cube is
indeed kept in front of the robot, ‘‘top-down’’ and ‘‘bottom-up’’ activity
will resonate allowing the robot to infer that the new object is indeed the
blue cube that it has been commanded to grasp (Color figure online)
Then, the connectivity matrix has a dimensionality of N × S. Since we are dealing with multiple maps here, for
clarity, we write N_C, N_S, N_W, and N_PH for the number of
neurons in the color, shape, word, and provincial hub,
respectively. Since color, word, and shape SOM activity
forms the bottom-up input to the provincial hub, the con-
nectivity matrix of the provincial hub has a dimensionality of N_PH × (N_C + N_S + N_W). Since all SOMs are growing,
N itself is a function of time and experience that the robot
acquires. For illustration purposes in Fig. 2, the activity of
9, 9, 30 and 36 neurons in the color, shape, word, and
provincial hub maps, respectively, is shown. Five different
cases are shown in different rows; in each case, the robot is
presented with a new object followed by the linguistic
input of what it is from the teacher. In the first case, the
robot is presented with a yellow cylinder. Color and shape
are analyzed bottom up through the sensory layer and feed
the respective SOMs with sensory vectors SC and SV,
respectively. In the same temporal window of integration,
the teacher inputs the word sequence ‘‘yellow cylinder.’’
The two words are inputted in a sequence, and the activity
in the word SOM and provincial hub in response to indi-
vidual components (in this case word 1 describing color
and word 2 describing shape) is shown separately in Fig. 2
(left panel). The net activation in the word SOM and
provincial hub can be visualized as the superposition of the
individual activations (like in Fig. 2 right panel). The dif-
ferent sensory streams activate bottom up the various layer
1 neural maps that initially have randomly initialized connectivity matrices. Layer 1 maps are trained in parallel using the standard SOM procedure, discussed in detail in numerous references (see [22,
44]). In short, this consists basically of two steps:
1. Finding the neuron ‘i’ that shows maximum activity for the observed sensory stimulus S_t at time t. This also implies that neuron ‘i’ has sensory weights s_i such that ||s_i − S_t||² has the smallest value among all neurons existing in the respective SOM at that instant of time;
2. Adapting the sensory weights of the winner in a
Hebbian fashion by bringing the sensory weights s_i of the winner ‘‘i’’ closer to the stimulus S_t. This simply
has the effect that in future instances the neuron ‘‘i’’
actively codes for the particular sensory stimulus S_t. In
this way, neurons in different property-specific maps
of layer 1 that have sensory weights closest to the
incoming input sensory vector start representing these
signals.
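A minimal sketch of these two steps, plus a naive growth rule since the maps are growing SOMs, might look as follows; the learning rate and the growth threshold are illustrative assumptions, and the actual DARWIN growth criterion may differ.

```python
# Minimal sketch of the two SOM steps above, plus a naive growth rule
# (the real growing-SOM criterion in DARWIN may differ). Learning rate
# and growth threshold are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((9, 3))                 # 9 neurons, 3-dim input (e.g. an RGB triad)
eta, grow_thresh = 0.2, 0.5

def som_step(W, S_t):
    d2 = np.sum((W - S_t) ** 2, axis=1)     # step 1: ||s_i - S_t||^2 for all neurons
    i = int(np.argmin(d2))                  # winner = smallest distance
    if d2[i] > grow_thresh:                 # stimulus too novel: grow the map
        return np.vstack([W, S_t]), len(W)
    W[i] += eta * (S_t - W[i])              # step 2: pull winner toward stimulus
    return W, i

W, winner = som_step(W, np.array([0.9, 0.1, 0.1]))   # a "red" stimulus
print(winner, W[winner])
```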
The net activity in the color, word, and shape SOMs
forms the bottom-up input to the provincial hub. The
connectivity is of dual dyad type, and weights are adjusted
in two identical steps, one relating to the ‘‘color–word’’ association and the other to the ‘‘shape–word’’ association.
This is because the teacher is inputting a sequence of two
words, the first related to color and second related to shape
(of course nothing stops us from training the maps separately
in a distributed organization scheme, for example, while
showing a yellow paper and uttering the word ‘‘yellow,’’
the shape map is just switched off and learning takes place
between hub and color SOMs). However, we chose to train
them together because color and shape maps do not gen-
erally interfere with each other. Haptics has just recently
been incorporated in iCub humanoid (see EU-funded
Roboskin project for details), and work is ongoing to
exploit this modality, but at present, vision is the main
source of sensory information. The learning rule to connect
layer 1 SOMs with the provincial hub is as follows:
if neurons ‘‘i’’ and ‘‘j,’’ winning in the color and
word SOMs, respectively, manage to activate neuron ‘‘k’’
in the provincial hub, make W_ik = 1 and W_jk = 1. This has
a net effect of enabling neuron ‘‘k,’’ ‘‘i,’’ and ‘‘j’’ in three
different SOMs (operating on their own local sensory
streams) to retroactivate each other in ‘‘bottom-up,’’ ‘‘top-
down,’’ and ‘‘cross-modal’’ fashion. The same applies to
adjusting connectivity between shape, word, and hub
SOMs. The internal weights of the provincial hub can
either have random initialization or a winner ‘‘k’’ can be
randomly chosen from the subset of neurons in the pro-
vincial hub that have internal weights zero. The net effect
is that in both cases, there is some neuron in the ‘‘provin-
cial hub’’ that responds to activity in two different SOMs
processing different sensory streams. Activity in any map
can gradually trigger the whole network, hence enabling
‘‘pattern completion’’ from a partial cue.
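A compact sketch of this rule, with map sizes matching the illustration in Fig. 2, is given below; choosing an unused hub neuron as the winner corresponds to the second of the two initialization options just mentioned, and the helper names are assumptions.

```python
# Sketch of the hub learning rule quoted above: coincident winners "i"
# (color SOM) and "j" (word SOM) are bound through a hub neuron "k" with
# W_ik = W_jk = 1, giving the dual dyad pattern. Map sizes follow the
# illustration in Fig. 2; the helper names are assumptions.
import numpy as np

N_color, N_word, N_hub = 9, 30, 36
W_hub = np.zeros((N_hub, N_color + N_word))   # hub's bottom-up connectivity

def bind(i_color, j_word):
    """Connect coincident winners through an unused hub neuron."""
    free = np.where(W_hub.sum(axis=1) == 0)[0]
    k = int(free[0])                          # winner chosen among free neurons
    W_hub[k, i_color] = 1.0                   # W_ik = 1
    W_hub[k, N_color + j_word] = 1.0          # W_jk = 1
    return k

k = bind(i_color=2, j_word=14)                # e.g. "red" percept + word "red"
# W_hub.T carries the top-down / cross-modal path: word activity can now
# retroactivate the color map through hub neuron k, and vice versa.
print(k, W_hub[k].nonzero())
```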
To start with, 5 objects (of different colors and shapes) associated with their names are taught to the robot. The
activity in various SOMs is shown in Fig. 2. The activity in
the ‘‘word’’ and ‘‘provincial hub’’ maps is shown twice for
clarity, because the teacher input consists of a sequence of
two words. As we can see, in every layer 1 SOM,
different neural units start learning and representing dif-
ferent sensory stimuli. In the future, if a similar stimulus is
projected bottom up, then the neuron coding for it is
reactivated. For example as seen in Fig. 3 (right panel, row
1), showing just a red paper to the robot activates the
neuron coding for ‘‘red’’ sensory stimulus in the color SOM
processed bottom up through vision (and experienced first
while the robot was presented with the red pyramid). At the
same time, activity in the color, shape, and word SOMs is
integrated at the level of the provincial hub by means of the
‘‘dual dyad’’ connectivity pattern. This also implies that
showing a ‘‘red paper’’ to the robot should ‘‘cross-mod-
ally’’ activate the word representation ‘‘red’’ in the word
SOM, even though in this case there is no word input from
the teacher. This is indeed the case. In other words, just
perception of color ‘‘bottom up’’ is sufficient to retroacti-
vate the global network learnt during past experience, but
at the same time just associated with the particular ‘‘partial
cue’’ perceived in the present context (in this case, there is
no activation in shape map). This behavior is very common
in infants (show them a ‘‘dog’’ for example and say the
word ‘‘bow bow,’’ next time the child sees a dog, we often
see it playfully pointing to it with the word ‘‘bow bow’’).
Studies in functional imaging go even further, providing evidence that a toy dog, a real dog, a cartoon, or just the word ‘‘bow bow’’ should activate the global network as was experienced during learning [49]. To
further understand how this ‘‘cross-modal’’ and ‘‘top-
down’’ retroactivation takes place when the network is
triggered with a partial cue from the environment, we look
into the global dynamics of the ‘‘small world,’’ which is the
topic of discussion in the next section.
Network Dynamics, Pattern Completion, and Modality
Independence
In the proposed distributed small-world organization, even
the simple ‘‘color-word-shape’’ network consisting of just
four neural maps is endowed with its own ‘‘local’’ ability to
‘‘reason’’ in novel situations, grow, and resolve con-
tradictions that may arise between what the system antic-
ipates ‘‘top down’’ and what actually activates the system
‘‘bottom up.’’ To achieve this objective, the small-world
network has to be complemented with an equally powerful
dynamics that allows neural activity in one map to retro-
activate other relevant networks in ‘‘top-down,’’ ‘‘bottom-
up,’’ and ‘‘cross-modal’’ fashion. The network dynamics
builds upon the idea of neural fields [3] and supplements it
with novel concepts like the introduction of the bifurca-
tion parameter [53] that both brings in computational
Fig. 3 Left panel four cases that demonstrate compositionality,
modality independence, and pattern completion properties depicted
by the ‘‘color-word-shape-provincial hub’’ sub-network composed of
four neural maps. Right panel presents an interesting scenario where
the user issues a goal to reach the ‘‘red container’’ (both a novel word
and at the same time such an object has never been encountered
before). The evolving graphs on the top show the temporal evolution
of activity in different maps when given a ‘‘new word.’’ The graphs at
the bottom show two cases: bottom-up network activity (bifurcation
parameter = 0) when a previously unseen object (green container) is
kept in front of the robot and when another unseen object ‘‘red
container’’ is placed in front of the robot. In the latter case, we can
observe that ‘‘top-down’’ activity correlates with ‘‘bottom-up,’’ even
though in both cases the object has been never encountered before
(and commanded just using linguistic input) (Color figure online)
advantages and is biologically plausible (as will be discussed later). Let h_i be the activity of the i-th neuron in the provincial hub and x_prop be the activity of a neuron in any of the property-specific SOMs connected to the provincial hub (in this case the color, word, and shape SOMs). Let W_prop,hub encode the connections between the property-specific maps and the provincial hub. Basically, W_prop,hub is an N_PH × (N_C + N_S + N_W) matrix learnt as explained in the previous section. Its transpose encodes the backward connectivity from the hub to the individual maps. The network dynamics of the hub neurons and of the neurons in the property-specific maps are governed by Eqs. (1) and (2), respectively:
$$\tau_{hub}\,\dot{h}_i = -h_i + (1-\beta)\sum_{j} W_{prop,hub}\, x_{prop} + \beta\,(\text{top down}) \tag{1}$$

$$\tau_{prop}\,\dot{x}_{prop} = -x_{prop} + (1-\beta)\, S_{prop} + \beta \sum_{j} W_{hub,prop}\, h_{hub} \tag{2}$$

where

$$S_{prop} = \frac{1}{\sqrt{2\pi}\,\sigma_s}\; e^{-\frac{(s_i - S)^2}{2\sigma_s^2}}$$
The instantaneous activation of any neuron in the hub or
the property-specific maps is governed by three different
components: The first term induces an exponential
relaxation to the dynamics. The second term is the net
feed-forward (or alternatively bottom-up) input. Since
property-specific maps are inputs to the provincial hub,
activity of neurons in the property-specific maps (x_prop, evolving through Eq. 2) drives the activity of hub neurons (modulated by the connectivity matrix W_prop,hub). At the
same time, the sensory layer is the bottom-up input to the
property-specific maps. Since the property-specific maps are
trained using standard SOM procedure, a Gaussian kernel is
used to compare the sensory weight s_i of neuron i with the
current sensor activations S in order to determine its bottom-
up activity. So while the sensory layer drives the property-specific
neural maps bottom up, the activity of the neurons in the
individual neural maps drives the provincial hub bottom up.
The third component is the top-down component: for the
property-specific SOMs, the top-down input comes from
the provincial hub to which they are connected. For the
provincial hub, the top-down component comes from the
connector hub (to which it will be linked using exactly
the same principles of preferential attachment and temporal
coincidence: this will become prominent in later sections
hence is just mentioned as ‘‘top down’’ in Eq. 1). So just like
the provincial hub activates the property-specific maps,
activity in the connector hub can activate the provincial hub
(which inversely acts as the bottom-up input to the
connector hub). Thus, there is always a bidirectional flow
of information, as we move upwards, information becomes
more multimodal and integrated, as we move downwards, it
becomes more differentiated (to the level of basic properties
that are sensed by the sensory layer). The top-down input is
also biased by a parameter β, called the bifurcation parameter, proposed originally in [53], that plays the role of
modulating ‘‘how much’’ of the neural activity in a specific
map is governed ‘‘top down’’ and how much ‘‘bottom up.’’
For example, if β = 0 in Eq. 2, the system operates only on
real sensory input and is not modulated by activity coming
from the provincial hub. Recent results from brain imaging
[8] have provided evidence for the existence of such
dynamic switching between endogenous mental activity
and attention-driven exogenous activity mediated by
anterior insula (AI) and anterior cingulate cortex (ACC).
Computationally, the bifurcation parameter has several functions, the main one being the detection of contradictions between one’s anticipations and what is actually perceived.
In simple terms, if the world does not behave the way we
anticipate it should, it may be better to attend to what is
happening in the real world and learn new things.
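To see Eqs. (1) and (2) in action, the following sketch integrates them with a simple Euler scheme, with β mixing bottom-up and top-down drive; the dimensions, time constants, and Gaussian width are illustrative assumptions, not DARWIN parameters.

```python
# Euler-integration sketch of Eqs. (1) and (2); beta mixes bottom-up and
# top-down drive. Dimensions, time constants, and the Gaussian width are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
N_prop, N_hub = 10, 6
W = (rng.random((N_hub, N_prop)) < 0.2).astype(float)   # learnt W_prop,hub
tau_h, tau_x, dt, sigma_s = 0.1, 0.1, 0.01, 0.2

def S_prop(s_i, S):
    """Gaussian comparison of stored sensory weights with current input."""
    return np.exp(-(s_i - S) ** 2 / (2 * sigma_s ** 2)) / (np.sqrt(2 * np.pi) * sigma_s)

s_i = rng.random(N_prop)                      # stored sensory weights of map neurons
S = s_i + 0.05 * rng.standard_normal(N_prop)  # a familiar stimulus, slightly noisy
x, h = np.zeros(N_prop), np.zeros(N_hub)
top_down = np.zeros(N_hub)                    # would come from the connector hub

beta = 0.3                                    # beta = 0: purely stimulus-driven
for _ in range(500):
    h += dt / tau_h * (-h + (1 - beta) * W @ x + beta * top_down)          # Eq. (1)
    x += dt / tau_x * (-x + (1 - beta) * S_prop(s_i, S) + beta * W.T @ h)  # Eq. (2)
print(h.round(2))                             # hub activity after settling
```

Setting beta = 0 reproduces the purely exogenous regime described above, while values close to 1 let hub (top-down) activity dominate, i.e., mental simulation.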
Perception as an Act of ‘‘Memory’’: Bottom Up Versus Top
Down
Figure 2 (right panel) shows an example of the application
of the network dynamics of the ‘‘color-word-shape’’ net-
work in novel situations. The user inputs a sequence of new
words ‘‘blue cube’’ (with no such object present in the
environment). As we see, the activity in the word SOM
gradually propagates to the provincial hub and eventually
activates the color SOM in a way that was learnt when the
robot was presented with a ‘‘blue container,’’ and the shape SOM in a way that was learnt when the robot was presented with a ‘‘green cube.’’ But the overall activity in the global system, that is,
the provincial hub + color-word-shape maps, as a result of the
network dynamics triggered by the utterance of a new word
‘‘blue cube’’ resembles what the robot now anticipates that
a ‘‘blue cube’’ must be. If a blue cube is really kept in front
of the robot, bottom-up sensory input and top-down
anticipation will end up activating the same neurons in
every neural map (and resonance between top down and
bottom up is enough evidence to confirm that the novel
object placed in front of the robot is indeed a ‘‘blue cube’’).
Further, any motor behavior (reaching, grasping, transporting) can also be executed on this novel object (specified just by the user’s linguistic input).
Figure 3 (right panel) presents an interesting scenario
where a user now issues a goal to grasp the ‘‘red con-
tainer’’ (both a novel word and at the same time such an
object has never been encountered before). The graphs on
the top show the temporal evolution of activity in different
maps when given a ‘‘new word.’’ The graphs at the bottom
show two cases: network activity when a previously unseen
object (green container) is kept in front of the robot and
when another unseen object ‘‘red container’’ is placed in
front of the robot. In the latter case, we can observe that
‘‘top-down’’ activity correlates with ‘‘bottom up’’ even
though in both cases the object has been never encountered
before (and commanded just using linguistic input). Even if
the situation is novel, the robot is still able to execute a new
user command in the latter case (but at the same time in the
former case, the robot can infer that there is no ‘‘red con-
tainer’’ placed in front of it, hence quits the goal). The left panel of Fig. 3 shows four additional cases that demonstrate the pattern-completion properties of the network. In sum, even in the
very basic network consisting of just four neural maps, the
results demonstrate three aspects:
1. How novel combinations of neural activity can emerge
by reconstructing relevant past experiences (relevant
meaning triggered by a partial sensory cue). In this sense, perception is also an act of memory and not essentially driven bottom up.
2. Resonance between ‘‘top-down’’ anticipation and
‘‘bottom-up’’ sensation leads to inferential mecha-
nisms that can be used to drive goal-directed action
(here simple cases like reaching, grasping novel
objects, unheard words).
3. Contradictions between ‘‘top down’’ and ‘‘bottom up’’
can be used as a stepping stone to learn further and
grow the neural maps.
Detecting Contradictions: Switching to Attention-Driven
Exploration to Learn Further
A side effect of ‘‘top-down’’ and ‘‘bottom-up’’ activity
being projected on the same neural substrate is the auto-
matic detection of contradictions. This information is cru-
cial and can be used to generate saliency signals to bias
attention toward the anomaly and generate exploratory
behaviors to learn further (to resolve the contradiction).
Such mechanisms are important if the robot has to keep
learning ‘‘cumulatively’’ and gradually build up its under-
standing of how the world works. Results from neurosci-
ence [8] provide support for this idea and suggest that
the anterior insula and the anterior cingulate cortex play an important role in the saliency detection network of the brain. This is perhaps already evident in the example where the user issues the goal to grasp the ''red container'' (Fig. 3 right panel). Comparing the
top-down and bottom-up activity in different neural maps,
it is possible to infer that there is a container in the envi-
ronment but in the first case it is not of the right ‘‘color’’
that was requested by the user, while in the latter case, the
goal is realized. Further, the concept system is inherently
multimodal. Hence, in addition to mismatches between ''top down'' and ''bottom up,'' contradictions can also occur if information coming from different modalities does not resonate with each other. The proposed computational model
also deals with such issues. Figure 4 presents some results.
When presented with a green sphere along with the words ''green container'' provided by the teacher, there is saliency in the shape, hub, and word maps. Note that contradictions are detected locally; in other words, the robot infers that there is something green that correlates with the color perceived visually and the word uttered by the teacher, but it also infers that there is a contradiction between the shape and the word (between what it anticipates should be associated with the presented object and what the teacher calls it). In the second
example, all maps detect saliency. The same applies to the
third scenario where an absolutely new object is presented
to the robot. As seen from the activity in different neural
maps, there is no definitive winner (there are multiple hypotheses, hence greater saliency). Saliency can also be thought of as a measure of how confused the system is, and this applies both when there are ''contradictions'' and when the system is operating in ''novel situations.'' Also note that the global saliency of a network is the cumulative sum of the local saliencies of its individual members. The greater the global saliency, the greater the discomfort in the network and the greater the urgency to learn further. The net effect of saliency in terms
of the network dynamics is to lower the bifurcation
parameter, hence causing the switch from endogenous
mental simulations to attention-driven exogenous explora-
tion. Thus, contradictions can be seen as stepping stones to learn new things. More recently, interesting results are emerging from neuroscience suggesting that delusional behaviors in neurological disorders (like schizophrenia) result from an improper mixing of ''top down'' with ''bottom up.'' Against this background, the bifurcation parameter also acquires a biological basis and plays a significant role in switching the network dynamics between exogenous activity driven by the real world and endogenous mental simulations during reasoning about actions, resolving contradictions by either learning more or reconciling one's beliefs with what has newly been experienced.
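As a hedged illustration of this bookkeeping (the distance measure, the gain, and the linear update rule are our assumptions; the article does not give the exact equations for this mechanism):

    import numpy as np

    def local_saliency(top_down, bottom_up):
        # Contradiction on one map: mismatch between the two activity
        # patterns projected on the same neural substrate.
        return float(np.linalg.norm(top_down - bottom_up))

    def global_saliency(map_activities):
        # Global saliency is the cumulative sum of local saliencies
        # over all maps (color, shape, word, hubs, ...).
        return sum(local_saliency(td, bu) for td, bu in map_activities.values())

    def update_bifurcation(nu, saliency, gain=0.1, floor=0.0):
        # Greater saliency lowers the bifurcation parameter, switching
        # the dynamics from endogenous mental simulation to
        # attention-driven exogenous exploration.
        return max(floor, nu - gain * saliency)

The same scalar thus serves two roles: a measure of the system's confusion and a control knob for when to stop simulating and start exploring.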
A Body Schema for Cognitive Robots: ‘‘Why
and What’’
Why do cognitive robots need a body schema? For the same reason that a human or a chimp needs one: simply put, without one, a robot would be unable to use its ''complex body,''
take advantage of it, and ultimately survive. In general, for
an organism with a complex body inhabiting an unstruc-
tured world, the purpose of ‘‘Action’’ is not just restricted to
shaping motor output to generate movement but also to
provide the self with information on the feasibility, conse-
quence, and understanding of ‘‘potential actions’’ (that
could lead to realization of ‘‘goals’’). As described in the
introduction, mounting evidence from neuroscience sub-
stantiates the fact that neural circuits in predominantly
motor areas are activated in many contexts related to action
that do not cause any overt movement. Hence, overt actions
are just the tip of an iceberg: under the surface is hidden a
vast territory of ‘‘actions without movements’’ (covert
actions) which is essence of motor cognition. But as in the
iceberg metaphor, there must be continuity between what is
above and what is below the surface: the link, we suggest, is
the body schema mechanism. The notion of a body schema is less popular in cognitive robotics than the concept of embodiment. These are not the same things. If
you have a body schema, you also have embodiment but not
the other way around. Vernon et al. [74] in their discussion
on a roadmap for cognitive development in humanoid
robots present a catalog of cognitive architectures, but in none of them is the concept of a body schema a key element.
Hoffmann et al. [34] review this concept in robotics,
emphasizing the gap between the idea and its computational
implementations. A biologically plausible computational
formulation of ‘‘body schema’’ based on the idea of passive
motion paradigm (PMP: [55]) was developed by Mohan and
Morasso [51] and implemented on the 53-DoF humanoid iCub, where it has been successfully applied in a number of contexts related to action. We
refer the interested reader to Mohan and Morasso [51] for
detailed formal analysis and applications in the context of
whole body coordination, skill learning, and tool use.
Considering that the body schema also contributes toward
goal-directed reasoning and embodied simulation, in this
section, we briefly introduce the central ideas to the level of
detail necessary to build up sections on action development
and reasoning.
As seen in Fig. 5a, the body schema is characterized by
different body parts and end points (hands, legs, etc.)
available for connection in the context of ‘‘goal.’’ The links
between body parts are associated with a number of
degrees of freedom (highly redundant in a complex body).
Fig. 4 The ability to detect and resolve contradictions is built in at every local network. While the results in Fig. 3 show how contradictions can be inferred from a mismatch between ''top down'' and ''bottom up,'' Fig. 4 presents results where contradictions are caused by a mismatch between information coming from different sensory modalities. Simply put, show any infant a potato and say that it is an apple; it should naturally be surprised. The first two examples show similar situations with the robot. When presented with a green sphere along with the words ''green container'' provided by the teacher, there is saliency in the shape, hub, and word maps. Note that contradictions are detected locally; in other words, the robot infers that there is something green that correlates with the color perceived visually and the word uttered by the teacher, but it also infers that there is a contradiction between the shape and the word (between what it anticipates should be associated with the presented object and what the teacher calls it). In the second example, all maps detect saliency. In this sense, the global saliency of a network is the cumulative sum of the local saliencies of its individual members. The greater the global saliency, the smaller the bifurcation parameter and the greater the urgency to learn by switching to attention-guided exploration (Color figure online)
In simple terms, the idea behind PMP is that such a schema
can be animated by attaching/detaching ‘‘force fields’’ to
one or more body parts in a ‘‘task-specific’’ fashion. The
animation process is analogous to coordination of a mari-
onette by means of attached strings: as the puppeteer pulls
the task-relevant effector to a target (or along a specific
trajectory), the rest of the body elastically reconfigures so
as to allow the motion to be simulated internally. The idea
is that such simulation process can characterize both
‘‘covert and overt’’ actions. PMP framework has features
that distinguish it from other leading approaches in com-
putational motor control like optimal feedback control
framework and Equilibrium point hypothesis (this has been
discussed in detail in a recent review Mohan and Morasso
[51]). PMP networks are assembled on the fly and operate
in a local, distributed, multi-referential, and goal-directed
fashion. Figure 5b shows the PMP network coordinating
the upper body (left arm-waist-right arm) chain of iCub,
which is relevant for tasks addressed in this paper. As seen,
the network is grouped into the different motor spaces
involved (in this case end effector, arm, and waist). Each
motor space consists of a displacement (blue) and force
node (pink) grouped as a work unit. Vertical links (purple)
within each work unit denote the impedance (stiffness K
and admittance A), while horizontal links (green) between
two work units denote the geometric transformation
between them (Jacobian: J). Note that the links do not carry information, as in a block diagram, but rather a combination of force and motion, that is, (computational) energy. There
are two additional nodes ‘‘sum’’ and ‘‘assignment’’ that add
or assign (forces or displacements) between different ‘‘sub-
networks’’ (in this case connection between tow arms and
the waist). The resulting network is fully connected; con-
nectivity articulated in a fashion that all transformations are
‘‘well posed.’’ This reduces computational cost because it
circumvents the need for kinematic inversions and cost
function computation (as in optimal control approaches).
Starting from an equilibrium state, goals (when switched
‘‘ON’’) basically inject virtual elastic energy in the net-
work, eliciting a reconfiguration of the internal DoFs to a
new equilibrium. The goal can be a point attractor (as in reaching) or a moving point attractor (virtual trajectory), as in
the case of handwriting, tool use, etc., where specific
motion trajectories have to be created using the desired end
effector/tool. The dynamics of the network evoked by the
activation of a goal ($x_T$) is equivalent to integrating nonlinear differential equations that, in the simplest case of just the right arm network and with no additional task-specific constraints, take the following form:
$\dot{x} = J\,A_r J^T K_r\,(x_T - x_r)$  (3)
Whenever a network like that in Fig. 5b is triggered with a goal,
we get four sets of trajectories (as a function of time): (1)
trajectory of joint angles given by the position node in the
joint space (arm and waist); (2) the resulting consequence,
that is, the trajectory of end effectors given by the position
node in end effector space (hands, tools, etc.); (3) the tra-
jectory of torques at the different joints (arm and waist),
given by the force node in the joint space; (4) the resulting
consequence, that is, the trajectory of forces applied by the
end effector given by the force node in the end effector
space. Hence, PMP networks naturally form forward/
inverse models (we always get the motor commands to
coordinate a redundant body and at the same time get the
resulting consequence). If motor commands obtained by
this process of PMP simulation are relevant in the context of
the goal, they can be fed to the actuators and the robot
reproduces the movement. Otherwise, the information
related to consequence of the action predicted by the for-
ward model serves as valuable ‘‘internal event’’ for goal-
directed reasoning. It is here that PMP diverges from the equilibrium point hypothesis. In EPH, the attractor dynamics underlying the production of movement are attributed to the elastic properties of the skeletal neuromuscular system. But this conflicts with emerging results from neuroscience showing that both real and imagined actions activate similar neural substrates in the motor cortex, even though covert actions do not activate the neuromuscular apparatus.
PMP, on the other hand, posits that even real actions are the result of an internal simulation, using attractor dynamics similar to those posited by EPH but at a cortical level. This
could explain the similarity of real and imagined move-
ments because, although in the latter case the attractor
dynamics associated with the neuromuscular system is not
operant, the dynamics due to the interaction among other
brain areas are still at play. If actions generated by anima-
tion of the body schema are perceived to be ‘‘useful,’’ they
can be executed (as motor commands are always synthe-
sized at the intrinsic space in any simulation). Otherwise,
prediction of the forward model is a crucial event to drive
goal-directed reasoning. In this sense, PMP can be consid-
ered a generalization of EPH from action execution (‘‘overt
actions’’) to action planning and reasoning about actions
(‘‘covert actions’’). At the same time, it solves the degrees
of freedom problem [7] and abstracts the complexity of the
‘‘body’’ to the higher-level cognitive networks. As action-
related goals are ‘‘switched on,’’ what we get by the ani-
mation of the body schema is both the motor commands to
‘‘execute’’ a specific movement (inverse model) and at the
same time information on ‘‘feasibility, consequence, and
usefulness’’ of potential movements (forward model).
Since force fields are additive, multiple task-specific
constraints can be incorporated into the PMP relaxation at
run time through superposition of multiple force fields. A
constraint in the extrinsic space could be an obstacle to avoid or the proper wrist pose needed to perform an action
(for example, while grasping or pushing an object); con-
straints in the intrinsic space mainly relate to taking into
account the limited range of motion of a joint, the joint
power, etc. This issue has been dealt with formally, along with several experiments on iCub, in a recent article [51] and hence is not reiterated here. To summarize, we can see PMP
as a mechanism of multiple constraints satisfaction, which
solves implicitly the ‘degrees of freedom problem’ without
any fixed hierarchy between the extrinsic and intrinsic
spaces. The constraints integrated in the system are task-
oriented and can be modified at run time as a function of
performance and success. The relaxation implied by the
Fig. 5 a A graphical representation of the body schema with various
end effectors available for connection with tools, force fields, and
targets. b The network implementation of such a body schema to
coordinate upper body of the humanoid iCub: Work unit: Force node
(pink) plus a displacement node (blue); Geometric causality repre-
sented by Jacobians (green), Elastic causality represented by Admit-
tance and Stiffness (light blue), Branching nodes (black), Timing
signal (yellow). The goal can be a point attractor (as in reaching) or a moving point attractor (virtual trajectory), as in the case of pushing, handwriting, use of tools, etc. The application of the goal causes
incremental elastic reconfigurations in the network analogous to the
coordination of a marionette with attached strings. Panels c–e show
the initial condition, end effector trajectories and the final solution
when the network of panel b is used to generate a bimanual reaching
action coordinating the upper body of the robot. This is a multi-
referential system of action representation and synergy formation,
which integrates a Forward and an Inverse Internal Model (Color
figure online)
PMP model does not require the target to be fixed. It works
as well with moving targets. In this case, the ‘‘attractor
field’’ becomes an ‘‘attracting wave,’’ with a moving equi-
librium point (also in the case of pushing an object to a goal
location, as we will see in the next section). In human perturbation experiments too, there is some evidence of such moving equilibrium points [65].
Connecting Object, Action, and the Body: Learning
About Action and Simulating Action
In an embodied framework, ‘‘Actions’’ are mediated
through the ‘‘Body’’ and directed toward ‘‘Objects’’ in the
environment. Playful interactions with objects give rise to
sensorimotor experience, learning, and the ability to reason; hence the need to connect ''object,'' ''action,'' and the body/body schema. The scheme is shown in Fig. 6 and
directly builds up on the ‘‘object’’ related ‘‘small world’’
created in ‘‘Naming Games: Learning About Objects and
Simulating Perception’’ section. Note that there is a subtle
separation between representation of actions at an abstract
level (‘‘what all can be done with an object/tool’’) and the
procedural memory related to the action itself (‘‘how to
do’’). While the former relates to ‘‘affordance’’ of an
object, the latter relates to the ‘‘skill’’ of using an object.
The abstract layer forms the ‘‘connector hub’’ and consists
of single neurons coding for different actions like reach,
grasp, push, use of different tools, etc., and grows with time
as new skills are learnt. Single neurons in the connector
hub in turn have the capability to trigger the procedural
memory network responsible for generating the action they
code for. The connector hubs in the object space and the action space are interconnected. All the connections are meant to
Fig. 6 This figure builds on Fig. 2 by adding new networks related to ''action'' and ''body schema.'' The connectivity and information flows between the ''object''-, ''action''-, and ''body''-related networks are shown. The information flow is inherently bidirectional and characterized by ''dual dyad''-type connectivity. There is a subtle separation between the representation of actions at an abstract level and the procedural memory network related to the action itself. The abstract layer forms the ''connector hub'' in the action space and consists of single neurons coding for different actions at an abstract level (like reach, grasp, push, tool use, etc.). The abstract action layer is similar to the ''canonical neurons'' found in the pre-motor cortex that are activated at the sight of objects to which specific actions are applicable. Note that these single neurons do not code for the action itself but instead have the capability to trigger the complete procedural memory network responsible for generating the plan to execute the concerned action. All connectivity between the various networks is learnt through explorative sensorimotor experience. Specific functions of the various layers are summarized in the figure (Color figure online)
develop by experience. Connectivity is of the ‘‘dual dyad’’
type, hence allowing bidirectional flow of information
between different neural maps. As a simple example, con-
sider that an object is presented to the robot. Assume for
the sake of discussion that the robot has some past expe-
rience with it too. Then, information from the sensory layer
activates various property-specific maps, their provincial
hubs, finally leading to distributed activity in the object
connector hub (which is a multimodal representation of the
object: like in Figs. 2, and 3). Assuming that the robot
already had experience of performing different actions on
this object, the activity in the ‘‘connector hub’’ of the action
layer basically codes for what all high-level ‘‘actions’’ can
be done with this object (the more the robot learns, the more possibilities it has to exploit an object). In this sense,
single neurons in the top level ‘‘action connector hub’’ are
similar to ‘‘canonical neurons’’ found in the pre-motor
cortex (of monkeys and humans) that are activated at the
sight of objects to which specific actions are applicable. At
the same time, the detailed knowledge itself is learnt/rep-
resented in specialized procedural memory networks which
are triggered by neurons at the action connector hub.
The neural connectivity between top-level object and
action hubs is learnt gradually as the robot tries out and
learns what actions are possible with an object. This can be due to explorative interaction (for example, a small red cylinder may be reached, grasped, or pushed, and it then moves in a specific way) or by observing and imitating a teacher, as in the case of learning to maneuver various tools [52]. Importantly,
motor repertoire is gradually built up by interacting with
various objects. As a last step, actions have to be ultimately
executed by the body and for this we must synthesize the
motor commands in the task-relevant body chain. This is
accomplished by the link between the ‘‘procedural memory
layer’’ and ‘‘body schema.’’ Action plans (or virtual tra-
jectories) synthesized by the procedural memory networks
serve as attractors to the ‘‘body schema,’’ hence triggering
the PMP simulation in the task-relevant body network. As
an example, if the task is to rotate a lever, the desired
trajectory of motion in the extrinsic space is planned by the
procedural memory network. This acts as a moving point
attractor to the task-relevant body network of the PMP (for
example, the right hand-waist chain). PMP simulation
gives out the motor commands which if sent to the actua-
tors produces the desired action. Basic actions like reaching, grasping, directed search through vision, and the use of one tool (a toy crane) to pick up unreachable objects are presently functional with a reasonable level of accuracy. So, to go even
deeper inside the scheme presented in Fig. 6, in the next
section, we describe how the robot learns a new and fairly
important multipurpose action ‘‘pushing’’ (and inversely
learning to predict how objects move when forces are
applied on them).
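As a hypothetical sketch of this object–action coupling (the matrix W_oa, the threshold, and the procedural stubs are illustrative assumptions, not the learnt networks themselves):

    import numpy as np

    def afforded_actions(object_hub, W_oa, action_names, threshold=0.5):
        # Distributed activity in the object connector hub activates single
        # neurons in the action connector hub through learnt connectivity
        # W_oa: "what all can be done with this object."
        action_activity = W_oa.T @ object_hub
        return [name for name, a in zip(action_names, action_activity)
                if a > threshold]

    # Each active action-hub neuron gates the procedural memory network
    # that knows "how to do" the action it codes for (stubs shown here).
    procedural_memory = {
        "reach": lambda obj: "virtual trajectory toward " + obj,
        "grasp": lambda obj: "grasp plan for " + obj,
        "push":  lambda obj: "goal-directed push plan for " + obj,
    }

The abstract hub thus stays small (one neuron per action), while the procedural networks it triggers carry the detailed, skill-specific knowledge.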
The Pushing Sub-Network
‘‘Pushing’’ is an interesting action investigated extensively
in studies related to understanding of ‘‘physical causality’’
in primates and infants [77, 84]. In addition to the multiple
utilities of the ‘‘push/pull’’ action itself in manipulation
tasks, what makes it significant is the sheer range of
physical concepts that have to be ‘‘learnt’’ and ‘‘abstrac-
ted’’ in order to execute this action successfully. For
example, it has to be learnt that contact is necessary to
push, object properties influence pushability (balls roll
faster than cubes, etc.), pushing objects gives rise to paths of motion in specific directions (the inverse applies for goal-directed pushing), pushing can be used to support grasping or bringing objects into proximity, and there can be counterforces that block the pushed object (similar to a goalkeeper). The
requirement to capture/learn such a wide range of physical
concepts through ‘‘playful interactions’’ with different
objects makes this task both interesting and challenging.
Different objects move in different ways when force is exerted on them; some do not move at all. By interacting with
various objects, the goal of the robot is to learn a general
forward/inverse model for ‘‘pushing action’’: that is, being
able to predict how an object will move when pushed
(forward model) and being able to generate goal-directed
pushing actions in order to displace an object to a desired
location. Figure 7 zooms into the push sub-network as
connected to the rest of the system (other neural maps,
hubs) and body schema (PMP). To begin, when presented
with any ‘‘object,’’ different property-specific neural maps
are activated bottom up leading to a distributed represen-
tation of the concerned object in the object connector hub
(as described in ‘‘Naming Games: Learning About Objects
and Simulating Perception’’ section). Since object proper-
ties influence pushing, activity in the object connector hub
influences the pushing forward/inverse model and hence is
bidirectionally connected to it (connectivity learnt by
experience). As seen in Fig. 7, the pushing system is rep-
resented using two neural maps: one that is a growing SOM
learning ‘‘average displacement of an object per unit force’’
and the second that represents a distributed coding of
direction in which the object is moving (there is ample
evidence from studies in neuroscience that such directional
coding exists in the brain and serves many purposes). We
shall justify the choice of this representation shortly, but
before that we wish to summarize the process by which the
robot gains experience (which basically precedes repre-
sentation and learning).
Figure 8 (left panel) zooms further into the information flows and connectivity that are learnt while playing with just ''one'' object. The pushing SOM is empty to start with and gradually grows as the robot interacts with various objects. The new connections that need to be learnt are the ones
between the object connector hub and neurons in the
pushing SOM (‘‘W’’) and the internal weight (Pi) of each
neuron that represents the ‘‘average displacement per unit
force’’ of the object it is representing. The former relates to
perception (as information comes from sensory channels to
activate the connector hub) while the latter relates to action
(or effect of force on the displacement of an object). Note
that both these learnt elements (W and P) complement each
other, that is,
1. Observing a motion, the robot must be able to
anticipate which object it is (by activating ‘‘top down’’
the object connector hub and hence the property-
specific SOMs and the name of the object) and
2. Inversely if it is necessary to ‘‘push’’ a given object to
a target location, the robot must be able to estimate the
force it needs to exert in which direction to realize the
goal. For every novel object the robot is interacting
with, the neural connectivity is learnt as per the
following steps:
1. Growth in the Pushing SOM: To start with, the robot is presented with an ''object.'' This leads to ''bottom-up'' activation of different perceptual maps, ultimately culminating in a distributed representation of the concerned
object in the object connector hub (for novel, unencountered objects, learning and growth take place at the perceptual level too, based on saliency, as described in the ''Naming Games: Learning About Objects and Simulating Perception'' section). For every novel object, we grow
one neuron in the pushing SOM which codes for the
influence of object properties in relation to the ‘‘Pushing
Fig. 7 The left panel zooms into the push sub-network as connected to the rest of the system (object connector hub) and the body schema (PMP). When presented with an object, the different property-specific neural maps are activated bottom up, leading to a distributed representation in the object connector hub (as described in the ''Naming Games: Learning About Objects and Simulating Perception'' section). Since the user goal is to learn to push, the push sub-network is activated (empty to begin with, as there is no experience or knowledge). The push sub-network is represented using two neural maps: one that is a growing SOM learning the ''average displacement of an object per unit force'' and a second that represents a distributed coding of the direction in which the object is moving. The former map is empty to start with and gradually grows as the robot interacts with different objects, growth only taking place if there is a contradiction between ''the robot's anticipation of how an object might move'' and ''how it actually moves in reality.'' All connections indicated with ''L'' are learnt from scratch. Information flow is bidirectional, meaning that it is possible to move to the object hub from the pushing action network, and if the connector hub is active, it is possible to trigger the property-specific maps (as seen in the previous section). The right panel shows how goal-directed pushing actions are generated through incremental iterations between the pushing SOM and the direction between the pushed object and the goal. This gives rise to a virtual trajectory that serves as an attractor to the action generation system (see text for details) (Color figure online)
action.’’ More details on subtleties regarding growth in the
pushing SOM will follow in the next step.
2. Learning Connectivity Between Object Connector
Hub and Pushing SOM (W): Let ‘‘i’’ be the neuron in the
pushing SOM instantiated to represent the behavior of a
novel object presented and xj be the instantaneous activity
of the jth neuron in the connector hub (determined as per
dynamics of Eqs. 1 and 2). Let Wji be the connection
between j neuron in the connector hub and ith neuron (i.e.,
the new neuron) in the pushing SOM. Then, the learning
rule is as follows: for all active neurons in the connector
hub, that is, if xj [ xThreshold, make Wji = 1. For all cases,
we took xThreshold as 0.86. This has the net effect that any
time in future if either the same object or a similar object is
presented, it will end up activating the ith neuron in the
pushing SOM. The word ‘‘similar’’ is relevant here
because note that activity in the ‘‘connector’’ hub itself is a
result of activity in the property-specific maps. So if the
robot has experienced and learnt how a red cube behaves
when pushed and then later it is presented with a ‘‘blue
cube,’’ still the neuron in the pushing SOM that learnt the
‘‘red cube’’ will be active and will be in a position to
‘‘anticipate’’ the behavior of the novel object. This is
because both ‘‘red cube’’ and ‘‘blue cube’’ will activate
some common neurons in the connector hub (because of
similarity), and the activity of such common neurons is
sufficient to activate neurons in the pushing SOM (because
of connectivity learnt in the past). This implies that by
interacting with the red cube, the robot also has some
capability to ‘‘predict’’ how a blue cube might move when
pushed. If the top-down prediction is the same as the observed behavior, there is no contradiction between anticipation and observation, and hence no need to grow the pushing SOM further: the neuron coding for the red cube also codes for the blue cube. Growth in the pushing SOM takes place only when there is a contradiction between the ''anticipated'' and the ''observed'' behavior (which simply indicates that the robot has either no information or incorrect information about the object being manipulated).
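The learning rule of step 2 is simple enough to state directly in code (a sketch; the 0.86 threshold is from the text, while the array shapes and the overlap readout are our assumptions):

    import numpy as np

    X_THRESHOLD = 0.86  # hub activity threshold used in the experiments

    def learn_W(hub_activity, W, i):
        # Step 2: wire every connector-hub neuron active above threshold
        # to the newly grown pushing-SOM neuron i with unit weight.
        W[hub_activity > X_THRESHOLD, i] = 1.0

    def pushing_som_activity(hub_activity, W):
        # Top-down anticipation: a similar object (e.g., a blue cube after
        # a red cube) shares active hub neurons and so re-activates the
        # pushing-SOM neuron learnt for the earlier object.
        active = (hub_activity > X_THRESHOLD).astype(float)
        overlap = W.T @ active                           # shared active inputs
        return overlap / np.maximum(W.sum(axis=0), 1.0)  # fraction matched

Generalization thus falls out of the shared hub code: no pushing-SOM neuron fires for a truly unfamiliar object, which is precisely the trigger for growth.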
3. Learning Average Displacement of an Object Per
Unit Force (P): So far, we described the learning of bot-
tom-up connectivity between connector hub and pushing
SOM that basically code for perceptual properties of the
object the robot is interacting with. Next we need to learn
the ‘‘action’’-related effects that are coded by the internal
weight Pi for the ith neuron in the pushing SOM. To learn
‘‘Pi’’ that basically estimates ‘‘displacement of an object
per unit force exerted on it,’’ the robot has to act on the
object. So the robot is allowed to exert force on different
objects in different directions randomly (see Fig. 8), at the
same time visually observing the consequence, that is, the
displacement of the object as a result of exerting unit force.
Unit force is approximated as an ‘‘attempted’’ movement of
the deployed end effector by 5 cm. We clarify ‘‘attempted’’
movement ‘‘step by step’’ further because this is nontrivial
Fig. 8 Bottom panel: some examples of the robot gaining experience. In addition to different kinds of objects, different end effectors were also used for pushing the same objects (right hand, left hand, and a ''long stick'' as an extension of the arm). Such diverse experience is needed to learn that the ''end effector'' used does not really matter as far as the causal behavior of the object is concerned, using a novel learning rule that will be proposed in the next section. Further, the use of tools as an extension of the arm to push or pull a food reward is one of the widely investigated cases of tool use in animal behavior (Color figure online)
and relates to explorative action generation using the PMP
mechanism. So firstly, the robot reaches for the object of interest (using the network of Fig. 5b that coordinates the
upper body). If reaching is successful (error between goal
and forward model prediction is zero), a virtual trajectory
(a straight line in this case) is synthesized from the current
position of the end effector to a point 5 cm away in a
randomly chosen direction (we chose 8 directions, that is, with 45° separation, as seen in Fig. 7). This virtual trajec-
tory acts as a moving point attractor to the end effector
performing the explorative action, hence causing the end
effector to follow it like the pull of the puppeteer that PMP
mechanism computationally emulates. While the robot
executes this action, it is also physically interacting with
the object it had reached previously and the intrinsic
properties of the object now begin to influence how suc-
cessful the end effector is in following the virtual trajec-
tory. For objects like ‘‘balls’’ as the end effector moves, the
object of interest goes quite far away (perceived and
localized through vision), small cubes move rather uni-
formly in correspondence with the force exerted, for some
other objects like a heavy box, there is relatively small
displacement and so on.
Hence, ‘‘displacement per unit force’’ basically mea-
sures the ‘‘mobility’’ of the object when a certain amount of
force is exerted on it. Inversely, this information allows the
robot to predict how an object will move when force is
exerted on it (useful while generating goal-directed push-
ing). For every object presented, the robot is allowed to
explore displacing it to a distance of 15 cm (i.e., 3 itera-
tions of application of unit force) in eight different direc-
tions. Averaging the result of this experience, the
parameter Pi for the neuron ‘‘i’’ coding for a particular
object is estimated. Cubes, cylinders, and balls of different colors and sizes (some heavy ones) encountered by the robot previously while learning their names (''Naming Games: Learning About Objects and Simulating Perception'' section), as well as a few MECCANO blocks (from the MECCANO 2+ kit for 2-year-olds), were presented gradually.
Figure 8 (right panel) shows some examples of robot
gaining experience. Different end effectors were also used
for pushing the objects (right hand, left hand and also a
‘‘long stick’’ as an extension of the arm). Such diverse
experience is needed to learn that end effector used also
does not really matter as far as the causal behavior of the
object is concerned when pushed (same is the case of color,
but properties like shape and size do matter). This issue of
additionally learning what are the ‘‘causally dominant
properties’’ relevant in a particular task is work in progress.
Still, while gaining experience, we opted to subject the
robot to a diverse set of experiences so that the acquired
sensorimotor data can be utilized to explore other questions
(the discussion section on ongoing work deals with these
issues).
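In code, the estimation of P_i in step 3 reduces to averaging the observed displacements over the explorative pushes (a sketch; the sampling scheme of 8 directions and 3 unit-force iterations is from the text, everything else is illustrative):

    import numpy as np

    def estimate_P(observed_displacements):
        # Step 3: P_i is the average displacement of the object per
        # application of unit force (one attempted 5 cm end-effector move).
        # observed_displacements: visually measured object displacements,
        # one per unit-force application (here 8 directions x 3 iterations
        # = 24 samples per object).
        return float(np.mean(observed_displacements))

    # A ball yields a large P (it rolls far per unit force); a heavy box
    # yields a small P (little displacement per unit force).
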
4. Pushing in a goal-directed fashion: Finally, how does
the distributed coding of direction and a growing SOM
learning ‘‘average displacement of an object per unit force’’
generate goal-directed pushing? Consider that the robot has
to push some object from its initial position to a desired
location.
The sequence of network activations is illustrated in the
right panel of Fig. 7. We clarify the loop in detail below.
Given an object, firstly, there is a distributed representation
of the object in the connector hub. Activity in the connector
hub triggers the neurons in the Pushing SOM (through the
connectivity matrix ‘‘W’’) that code for the learnt behavior of
the object. In case there is no activity in the pushing SOM, it
implies that the object has not been experienced before and
we go back to step 1 (note that we also go back to step 1 in one other case, namely if there is a contradiction between
the predicted and observed behavior of the object when
pushed, because this indicates that more exploration is nee-
ded). This is the static part (bottom up) of the goal-directed
pushing phase (because the object being pushed remains the
same, within the scope of the issued goal). The dynamic/
interactive phase is a closed loop between ‘‘perception–
prediction–action’’ and consists of the following sequence:
(4a) Detect and localize the current position ‘X(x,y,z)’ of
the object and the target ‘XT(xT,yT,zT)’ (where the
object has to be displaced). This process involves
detection of the object through vision (i.e., what) and
3D reconstruction of the location in the egocentric
frame of reference of the robot (i.e., where). The 3D
reconstruction algorithm to localize the object and
the target position is based on the direct linear transform [66] and has been learnt through a motor babbling process (see [54] for details).
(4b) Compute the desired direction ‘‘h’’ to push using
information on X and XT; this activates the neurons
in the motor map responsible for directional coding.
Based on the instantaneously computed direction,
we also see distributed activity the 8 neurons coding
for different directions (see Fig. 9 top panel for 3
such cases).
(4c) If A_i is the activation of the ith neuron in the pushing SOM and P_i is the internal weight representing the ''displacement per unit force'' learnt by the ith neuron, then the average predicted mobility of the object for an incremental iteration in which unit force is applied on it can be computed as $P = \sum_i A_i P_i$.
(4d) Compute the next ‘‘virtual target: VT’’ where the
end effector must be such that the object moves to
the predicted location P. To start with, VT is
initialized as the starting location of the object being
pushed, that is, X(x,y,z), but diverges later in time as
seen in Fig. 9 (because VT also depends on the
mobility of the object being pushed). We can decompose it into three components along the x, y, and z axes (the z component equals the initial condition, as pushing is learnt/executed on a planar surface with an infinitesimal effect of gravity):

$VT_x = VT_x + (1/P)\cos(\theta),\quad VT_y = VT_y + (1/P)\sin(\theta),\quad VT_z = X(z)$  (4)
(4e) Compute the incremental predicted displacement of the object if the end effector is displaced to the location estimated by the virtual trajectory as per Eq. 4:

$X_x = X_x + P\cos(\theta),\quad X_y = X_y + P\sin(\theta),\quad X_z = X(z)$  (5)

Go back to step (4b) until the predicted location of the object is close to the goal (<1 cm). Alternatively, we could feed each newly computed position of the virtual target to the PMP system to move the designated arm to the next incremental location, go back to step (4a) of visual tracking, and continue. However, it is computationally expensive to involve vision in each incremental step (consider a football player taking a penalty kick: he sees the goal post, synthesizes a trajectory, and executes the kick; with robots, we do have the chance to go back to step (4a) and recompute). So we chose to iterate steps ''b–c–d–e'' until the predicted location of the object is close (<1 cm) to the target.
In this way, we basically move from a ‘‘virtual target’’ to
a ‘‘virtual trajectory.’’ Secondly, the predicted end effector
position need not follow the virtual trajectory as it also
depends on the mobility of the object itself (i.e., the learnt
parameter P). This is clearly shown the three different
cases of Fig. 9. While pushing a ball, the end effector
needs to be displaced just by a small amount along an
estimated virtual trajectory (green trajectory), with the expectation that this should be enough to send the ball to the target location. Cubes move more uniformly with the
displacement of the end effector and have to be pushed to
the destination. For large and heavy objects (we took a box
with a bottle of water inside it), as seen in Fig. 9, the
planned virtual trajectory goes beyond the goal position
because much greater force needs to be exerted (i.e., P ≪ 1).
In sum, the loop ‘‘b–c–d’’ generates two sets of trajectories:
(1) the predicted trajectory in which the object will move
toward the goal and (2) the desired motion of the end
effector (or the virtual trajectory) to generate the action.
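The loop ''b–c–d–e'' can be sketched as follows (illustrative only; the clamp on the final step is our addition to guarantee termination, and P is the learnt mobility read out of the pushing SOM as in step 4c):

    import numpy as np

    def plan_push(x0, target, P, tol=0.01):
        # Iterate Eqs. 4-5: build the end-effector virtual trajectory and
        # the predicted object trajectory until the predicted object
        # position is within tol (1 cm) of the target. Motion is planar.
        x = np.asarray(x0, dtype=float)       # predicted object position
        target = np.asarray(target, dtype=float)
        vt = x.copy()                         # VT starts at the object (step 4d)
        virtual_traj, object_traj = [vt.copy()], [x.copy()]
        while np.linalg.norm(target[:2] - x[:2]) > tol:
            theta = np.arctan2(target[1] - x[1], target[0] - x[0])  # step 4b
            step = np.array([np.cos(theta), np.sin(theta), 0.0])
            d = np.linalg.norm(target[:2] - x[:2])
            advance = min(P, d)               # clamp: last step lands on the goal
            vt = vt + (1.0 / P) * step        # Eq. 4: virtual target update
            x = x + advance * step            # Eq. 5: predicted object motion
            virtual_traj.append(vt.copy())
            object_traj.append(x.copy())
        return np.array(virtual_traj), np.array(object_traj)

A ball (large P) terminates in a few iterations with a short virtual trajectory, while a heavy container (P ≪ 1) needs many iterations, so its virtual trajectory extends well beyond the goal, matching Fig. 9.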
Fig. 9 Right panels: the virtual trajectories (attractors) and real trajectories during goal-directed pushing of a cube, a ball, and a large container. Activity in the neurons responsible for the distributed coding of direction during the synthesis of the motor actions is shown at the top for all three cases (see text for details). While pushing a ball, the end effector needs to be displaced just by a small amount along an estimated virtual trajectory (green trajectory), much like kicking a football toward the goal. Cubes move more uniformly with the displacement of the end effector and have to be pushed gradually to the destination. For large and heavy objects (a box with a bottle of water inside it), the planned virtual trajectory goes beyond the goal position because much greater force needs to be exerted (Color figure online)
The final step (e) is to synthesize the motor commands and
execute the action. For this, we feed the virtual trajectory
synthesized in step (d) as an attractor to the relevant body
chain of the PMP system (Fig. 5b). Motor commands syn-
thesized are transmitted to the actuators to generate the
movement of pushing (along a smooth planned trajectory)
with the designated end effector. Perceive the consequence
through vision (steps a–b) to evaluate whether the object is
close to the target (most often it is). The process (a–e) is like
smooth sliding of an object along a planned trajectory
(sometimes less force is needed like in the case of the ball, but
sometimes we need to take it to the destination with a lot of
force like the large container like depicted in Fig. 9). Note
that, time also is implicitly represented in Fig. 9. This is
because every ‘‘dot’’ in the virtual trajectory represents iter-
ation in time: the virtual trajectory is very short while pushing
the ball, almost uniform while pushing the small cube and
longer when pushing the large container. This resonates with
our own experiences, to slide a ball to a target location takes
less time than pushing a heavy cylinder (because intrinsic
properties of the object like shape and mass influence the
mobility of an object).
To summarize, in ‘‘A Body Schema for Cognitive
Robots: ‘‘Why and What’’’’ and ‘‘Connecting Object, Action
and the Body: Learning About Action and Simulating
Action’’ sections, we started with the description of the body
schema for DARWIN robots and its various functions within
the cognitive architecture. The small-world framework for
distributed organization and learning of object concepts was extended to connect ''object, action, and the body schema'' while retaining top-down/bottom-up information flows and small-worldness. Finally, we showed how a forward/
and small worldness. Finally, we showed how a forward/
inverse model for pushing is learnt by the robot by explor-
ative interactions with different objects in its playground.
The inverse problem of generating motor actions to push a
given object to a desired location was also summarized with
results. The system always generates two kinds of trajecto-
ries (for a large set of experienced objects) (Fig. 9): (1) the
virtual trajectory to push the object to a desired location that
basically acts as a moving point attractor to the body schema
to synthesize motor commands for the body chain per-
forming the action and (2) the predicted trajectory in which
the object will move as a consequence of pushing. The latter
is also a crucial piece of information for goal-directed
reasoning.
Simulating ‘‘Perception’’ and ‘‘Action’’ in the Context
of a ‘‘Goal’’
Since the DARWIN robot has gradually developed basic
capabilities to perceive objects, name them (with the help
of teacher/user input), generate primitive actions (like
reach, grasp, push, etc.), and anticipate the sensorimotor consequences of such actions, we recently introduced the robot to ''make and break'' tasks using the MECCANO 2+ toy kit (a toy set for 2- to 3-year-old children). The toy kit
basically consists of various building blocks using which
‘‘composite’’ objects can be assembled. In this section, we
present a rather simple scenario from the MECCANO
assembly task to illustrate how simulation of ‘‘perception
and action’’ enables the robot to generate a novel combi-
nation of actions in order to realize an otherwise unreal-
izable goal. As seen in Fig. 10a, the task is to insert
‘‘object 1’’ (the face attached to a screw) into ‘‘object 2’’
(that has a hole where the screw can be inserted), to
assemble a new composite toy. The standard sequence
consists of two actions: pick up object 1 and insert it into
object 2. However, standard sequences apply to ‘‘well-
defined’’ environments (like a fully programmed industrial
‘‘pick and place’’ set up). In an unstructured world, the
complexity of the environment under which the goal needs
to be realized plays a significant role in the causal sequence
of actions a cognitive agent must generate to realize its
goals. Standard sequences very often may not work.
• Firstly, there is a need to infer this without blindly
executing the standard/default action plan.
• Secondly, in such cases, cognitive agents must effec-
tively use their past experiences to go beyond experi-
ence and generate novel behaviors to realize the goal
or learn something new if unsuccessful. A simple
scenario of this kind and how simulation of perception
and action enables the robot to infer how its world
should be ‘‘causally’’ transformed such that it becomes
little bit more conducive toward realization of its goal
is the subject of discussion in this section.
As seen in Fig. 10a, both objects are randomly placed at
different locations (inside the visual workspace of the stereo
cameras). To begin with, given any goal, the robot first
visually explores its locally available environment to gather
information about the objects that are present and what all
can be done with them. This process basically involves
focusing attention on various objects, activating bottom up
the various neural maps (related to color, shape, etc., in
Fig. 6) ultimately leading to a distributed representation of
the object in the connector hub (indicated as ‘‘what is it’’ in
Fig. 6). Figure 10e shows the running loop of visual pro-
cessing related to identification and localization of objects in
the scene as action takes place. Neural activity in the object
connector hub in turn causes activations in the single neu-
rons in the action hub coding for various motor actions
experienced with the object in the past (indicated as ‘‘what
all can be done with it’’ in Fig. 6). As mentioned in ‘‘A Body
Schema for Cognitive Robots: ‘‘Why and What’’’’ section,
neurons in the action hub are like the canonical neurons
Fig. 10 a–d The task is to insert ‘‘object 1’’ (face) into ‘‘object 2’’ (body),
to assemble a new composite object. b The first 3 virtual actions using the
network of Fig. 5b. In simulations 1 and 2, the robot infers that though the
‘‘face’’ is directly reachable with the right arm, the ‘‘blue body’’ is located so
far that inserting it will not be successful. At the same time, the left arm
network is not coupled to any goal, so is available as a ‘‘tool’’ that could be
exploited. Coupling part 2 as a ‘‘goal’’ to the available left arm, the robot can
infer that it is indeed reachable by the left arm. Exploiting the knowledge of
pushing (learnt in the past and a feasible action here), the robot infers that if
part 2 is slowly displaced close to the ‘‘face,’’ it then becomes reachable by
the right hand and hence allowing the possibility of realizing the goal (2c) in
such an altered world. d The full combination of real and virtual actions that
basically enable the robot to infer how the world can change through ones
actions hence make it more conducive toward realization of its internal
goals. e–j The sequence of actions initiated by the robot to realize the goal
along with perceptual feedback (Color figure online)
found in the pre-motor cortex that are activated at the sight of
objects to which specific actions are applicable. While ''object 1'' affords reach and grasp actions, ''object 2'' affords reach and push actions (the robot has indeed experi-
enced pushing the blue MECCANO blocks and learnt a
forward/inverse model of how the object moves when force
is exerted on it: see Fig. 8). This information is stored in the
working memory. At present, the working memory of
DARWIN robot is fairly simple and keeps track of objects in
the world, their spatial locations (in the egocentric frame of
reference), feasible actions on the objects (activity in the
action hub), and status of utilization of body parts (mainly
end effectors that couple to goals during manipulation).
Such a WM structure does suffice for simple scenarios in the
early stages of development of the robot and we hope to
expand the WM further in the future in line with recent
developments [56].
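For concreteness, the current WM can be pictured as a small structure of this kind (field names are illustrative, not the actual implementation):

    from dataclasses import dataclass, field

    @dataclass
    class TrackedObject:
        # One object held in the working memory of the robot.
        name: str               # e.g., "face" or "blue body"
        location: tuple         # (x, y, z) in the egocentric frame of reference
        feasible_actions: list  # active action-hub neurons, e.g. ["reach", "push"]

    @dataclass
    class WorkingMemory:
        objects: list = field(default_factory=list)
        # Status of utilization of body parts: effector -> the goal it is
        # coupled to (None means free, hence available as a "tool").
        effectors: dict = field(default_factory=lambda: {"right_arm": None,
                                                         "left_arm": None})
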
Once the information about the available world is cap-
tured and stored in the WM, the robot initiates internal
simulation of the default plan. Using the body schema net-
work for the iCub upper body (Fig. 5b), the robot internally
simulates the standard sequence of assembly (i.e., picking up
the face and inserting it in the body) and its resulting con-
sequence (given by the forward model). These are virtual
actions 1–2 of Fig. 10b. As seen from the simulated right-
hand trajectories (1–2), the robot infers that though the
‘‘face’’ is directly reachable with the right arm, the ‘‘blue
body’’ is located so far from reach that inserting it will not be
successful. This leads to the inference that the goal cannot be
directly realized (there is a large error between the attempted
goal and predicted forward model consequence). At the
same time, the left arm network is not coupled to any goal, so it is available as an additional degree of freedom (or tool) that can be used. Coupling part 2 as a goal to the left arm net-
work, the robot can infer that the object 2 is indeed reachable
by the left arm (virtual action 3). Information in the working
memory indicates that pushing is a feasible action supported
by object 2 (and was experienced in the past: Fig. 8). Now
the robot exploits its knowledge of the pushing forward/inverse
model to infer how the world will change if ‘‘object 2’’ is
incrementally pushed (with the left hand). While the inverse
model gives the motor commands to generate goal-directed
pushing, the forward model gives the resulting consequence
(the predicted location of the object as a consequence of
pushing). The predicted consequence of pushing is shown as
virtual action 4 in Fig. 10c. The result of this simulation is an
‘‘imagined environment’’ that allows the goal to be realized
(simulated action 5 shows that the screw can indeed be
assembled to the body in such a modified environment). In
sum, simulated actions 1–5 basically allow the robot to
infer that while the default plan will not work, it is indeed
possible to causally transform the world such that it becomes
more conducive toward realizing the goal at hand. Several
subsystems involved in perception and simulation of per-
ception, body schema, action-related forward/inverse mod-
els, and task-specific working memory structures play a
synergetic role in leading to this inference. Figure 10d shows
the full combination of real (shown in yellow) and simulated
actions. The robot basically uses the left hand to slide the
‘‘body’’ close to the ‘‘face,’’ picks up the face with its right
hand, and inserts it into the body, hence assembling a
composite object and realizing the goal. Figure 10e–j show
snapshots of the real actions executed by the robot.
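The inference chain can be summarized in a control-flow sketch (pmp_simulate, push_model, the tolerance, the near() helper, and the object ordering are all assumed interfaces standing in for the body schema and the learnt pushing model; this builds on the WM sketch above):

    TOL = 0.01  # assumed forward-model error tolerance (meters)

    def near(location, offset=0.05):
        # Hypothetical helper: a point next to the "face" where the
        # "body" would become reachable by the right arm.
        x, y, z = location
        return (x - offset, y, z)

    def simulate_and_plan(wm, pmp_simulate, push_model):
        # pmp_simulate(effector, goal) -> predicted forward-model error
        # push_model(start, goal)      -> predicted object location
        face, body = wm.objects[0], wm.objects[1]
        # Virtual actions 1-2: covertly simulate the default plan.
        if (pmp_simulate("right_arm", face.location) < TOL and
                pmp_simulate("right_arm", body.location) < TOL):
            return ["grasp(face)", "insert(face, body)"]
        # Virtual action 3: a free effector is available as a tool.
        free = [e for e, g in wm.effectors.items() if g is None]
        if (free and "push" in body.feasible_actions and
                pmp_simulate(free[0], body.location) < TOL):
            # Virtual actions 4-5: imagine the world after the push and
            # re-simulate the insertion in that imagined environment.
            imagined = push_model(body.location, near(face.location))
            if pmp_simulate("right_arm", imagined) < TOL:
                return ["push(body, with=" + free[0] + ")",
                        "grasp(face)", "insert(face, body)"]
        return []  # goal not realizable; switch to exploration/learning
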
Summing up, the ability to reason and to orchestrate thought and action in accordance with internal goals, especially when inhabiting an unstructured environment, is a fundamental feature of any kind of cognitive behavior. Coher-
ently integrating the information from bottom up (sensory,
motor) and top down (memories of past learnt experiences,
simulations of various internal models, etc.), cognitive
agents often manage to swiftly exploit the possibilities afforded by the structure in their immediate environment to counteract the limitations (of perception, action, and movement) imposed by their bodies. This scenario is a simple example from the initial phase of the developmental curve of DARWIN that basically demonstrates the power of
embodied simulation in relation to generation of goal-
directed action in unstructured environments.
Concluding Remarks
Affordances are the seeds of ‘‘action.’’ Identifying and
exploiting them opportunistically in the ‘‘context’’ of an
otherwise unrealizable goal is a sign of cognition. The ability
to mentally manipulate the causal structure of their physical
interactions with their environments endows cognitive agents
with the capability to evaluate ‘‘what additional affordances’’
they can create in the world. This in turn enables them to infer
how the world must ‘‘change’’ such that it becomes a little bit
more conducive toward realization of their goals. A major
part of this process of transformation ‘‘from affordance to
action’’ and the inverse is a result of ‘‘inferences emerging
through embodied simulation.’’ Experiments related to
learning about objects and actions, and the underlying computational basis that enables DARWIN robots to demonstrate a preliminary level of embodied intelligence, were presented in
this article. The developmental curve of the DARWIN robot
started with simple tasks like learning to associate names of
objects (presented to it by the teacher) with their perceptual
properties (‘‘Naming Games: Learning About Objects and
Simulating Perception’’ section). The underlying computa-
tional framework incorporated several recent findings related
to large-scale functional organization of the cortex, ‘‘small-
world’’ properties, ‘‘dual dyad’’-type connectivity and pow-
ered by network dynamics that ensures ‘‘bottom-up, top-
down, and cross-modal’’ activation of various growing neural
maps. The computational advantages are numerous: promotion of functional segregation and global integration; minimization of processing steps and efficient wiring, thus ensuring low metabolic cost; synchronizability; pattern completion; and conflict resolution. We showed using various
examples (Figs. 2, 3, 4) how learning about objects in parallel endowed the robot with the capability to ''anticipate'' novel objects, generate primitive actions (like reach, grasp) on
them, detect novelty and contradictions, and trigger new
learning and growth.
The need for cognitive robots to have a ‘‘flexible’’ body
schema and a possible computational implementation of it
was presented (Fig. 5). The PMP formalism gives rise to
‘‘growing’’ network implementations (forward/inverse mod-
els) of the body (and tool) being coordinated that importantly
operate in a local, distributed, multi-referential, and goal-
directed fashion. We suggest that the computational model for
the body schema developed in DARWIN basically acts like a powerful ''middleware'' that interacts with both lower-level motor execution layers (which deal with the complexity/redundancy of the body being coordinated: the degrees of freedom problem of Bernstein [7]) and higher-level reasoning and cognitive layers (which deal with the complexity of the
world and goals that have to be accomplished, that is, when to
do what with which effector/tool and what is the resulting
consequence). In addition, it provides a shared computational
basis for ‘‘execution, imagination, and understanding’’ of
action for which there is resounding neurobiological evi-
dence. Action development in the DARWIN robots started
with primitive actions like pointing, reaching, grasping (still
in progress), recently extended to drawing/scribbling [54],
and learning to use simple tools by imitation [52]. The motor
repertoire was further extended in this paper by learning a
forward/inverse model of an important multipurpose action,
that is, ‘‘pushing’’ (Figs. 6, 7, 8, 9). Both the forward problem
of being able to anticipate how an object will move when
force is exerted on it and the inverse problem of generating
goal-directed pushing action in order to displace an object to a
desired location were learnt. Note that an expanding body
schema network basically blurs the distinction between tool and body during goal-directed coordination, as known
from experiments on animals using tools [40, 72] and reso-
nates with the PMP framework [51]. Learning to use other tools, generally used for extension of reach, amplification of force, coupling objects, etc., that are common in both industrial and domestic environments, is scheduled for the next phase of the developmental learning curve of the DARWIN robots.
Before moving toward these issues, our main emphasis was to
close the loop between perception, action, and reasoning in a
‘‘robust’’ manner (taking inspiration from biology).
An interesting feature in the proposed framework for
connecting ‘‘object, action, and body’’ is the fact that as we
move higher upwards information becomes more and more
integrated and multimodal and as we move downwards
information is more and more differentiated (to the level of
sensed properties). The underlying connectivity and dynam-
ics ensures that activations in any neural map can trigger a
complete network (both higher up in the hierarchy like hubs
or below like property-specific maps and procedural memory
networks for specific actions). We believe that this feature
essentially allows us to go beyond ‘‘object-action’’ to
‘‘property-action.’’ In other words, it enables the robot not only to learn which actions apply to various objects and what their consequences are (as we showed in this paper) but also to learn ‘‘which properties are causally dominant’’ while pursuing specific goals and actions. As a clarifying example: to wipe off a spider web in the topmost corner of a room, it does not matter whether someone uses a red broom, a yellow broom, or even a long stick. Any object that has the specific property ‘‘length’’ will suffice. Similarly, the colors of objects do not affect the way they move when pushed; shape and size do (in specific ways, according to recent ongoing experiments with the robot). Objects really do not matter; it is their properties that matter in the context of the realization of various goals. Note that while we speak about properties, the robot is basically interacting with ‘‘objects’’ in the world. As the robot plays with objects gradually over time, how can it also learn to ‘‘pin down’’ which properties are causally dominant in a particular task by comparing multiple such playful interactions? Going beyond ‘‘object-action’’ while learning and interacting with objects (gradually over time), we believe, has fundamental significance
in terms of analogical reasoning. Humans excel at making analogies, and in many ways this is the essence of their creativity [35, 36]. Even a simple stone may be used as a weapon, as a paperweight, as a blockage to obstruct a flow, as a building block of a house, and so on. Objects do not really matter; it is their properties that do, and this allows them to be exploited for different purposes in different circumstances. Though approaches to analogical reasoning exist in the literature [38, 45], they lack an embodied framework, which limits their reach in common unstructured worlds, where not every object can be experienced and not everything is known precisely.
If a novel object has a property that supports a particular
action in the context of a ‘‘goal,’’ the robot must certainly
attempt to exploit it opportunistically. If it succeeds, what we will see is behavior that is ‘‘novel’’ and ‘‘creative.’’ A property-specific distributed organization of perception and action, further endowed with small-world properties, pushes for an ‘‘embodied approach’’ to analogical reasoning, and experiments are ongoing in this direction in the context of numerous tasks, like pushing, learning to build the tallest stack from a random set of objects, use of tools for assembly, etc. These
issues will be a subject of discussion in the Darwin-related
articles in the near future.
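As a minimal sketch of how such ‘‘causal dominance’’ of properties might be pinned down by comparing many playful interactions, consider the following toy Python example. The property encoding, the synthetic interaction data, and the correlation-based relevance score are our illustrative assumptions, not the mechanism implemented on the DARWIN robots.

```python
import numpy as np

# Each pushing interaction is logged as (object properties, observed outcome).
# Toy world: displacement depends strongly on roundness, weakly on size,
# and not at all on color.
rng = np.random.default_rng(0)
props = rng.uniform(0.0, 1.0, size=(200, 3))            # [color, roundness, size]
displacement = (2.0 * props[:, 1] + 0.5 * props[:, 2]
                + 0.05 * rng.normal(size=200))          # noisy outcome

for j, name in enumerate(["color", "roundness", "size"]):
    r = np.corrcoef(props[:, j], displacement)[0, 1]    # property/outcome link
    print(f"{name:>9}: |corr with displacement| = {abs(r):.2f}")
```

A robot accumulating such statistics across interactions can discover that any sufficiently round object will roll when pushed, whatever its color: precisely the step from ‘‘object-action’’ to ‘‘property-action’’ that supports analogical transfer to novel objects.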
To conclude, as noted by the neuroscientist Ramachandran
[60], with people from several disciplines tinkering around
with different open problems, cognitive science research is
right now entering an exciting ‘‘Faraday era’’ in terms of discovering the general principles related to the structure and function of the ‘‘three-pound jelly’’ that in the first place makes all this ‘‘tinkering and discovering’’ possible. Using a
humanoid robot equipped with state-of-the-art sensory and
motor capabilities playing, learning, and reasoning in a
moderately complex (and changing) world, we have attemp-
ted to develop and better understand computational principles
necessary to drive a cognitive robot to exhibit a preliminary
level of purposefulness, flexibility, and adaptability in its
behavior. The results presented are indeed from the early stages of the developmental curve of the DARWIN robots, and certainly vast oceans lie undiscovered in the quest to better understand the ‘‘forces and the causes’’ that shape our ‘‘reasons and actions’’ and make us as explorative, intuitive, cognitive, expressive, emotional, irrational, unpredictable, and conscious as we really are! Such questions often drove the curiosity of Professor Taylor, and continuing to look for a computational basis for cognitive computation grounded in the biology of the brain is perhaps the most fitting homage younger generations can offer to keep the essence of his teachings alive!
Acknowledgments The research presented in this article is sup-
ported by IIT (Istituto Italiano di Tecnologia, RBCS dept) and by the
EU FP7 project DARWIN (http://www.darwin-project.eu, Grant No:
FP7-270138). We are indebted to the anonymous reviewers for their detailed analysis and suggestions, which made the draft sharper and more reader-friendly. The authors also acknowledge the support of all teams
involved in the DARWIN consortium.
References
1. Addis DR, Schacter DL. The hippocampus and imagining the future:
where do we stand? Front Hum Neurosci. 2012;5:173.
2. Addis DR, Pan L, Vu MA, Laiser N, Schacter DL. Constructive
episodic simulation of the future and the past: distinct subsystems
of a core brain network mediate imagining and remembering.
Neuropsychologia. 2009;47:2222–38.
3. Amari S. Dynamics of pattern formation in lateral-inhibition type neural fields. Biol Cybern. 1977;27:77–87.
4. Barabasi AL. Linked: the new science of networks. Boston: Perseus Books; 2003. ISBN-10: 0738206679.
5. Barabasi A-L. The network takeover. Nat Phys. 2012;8:14–6.
6. Barabasi A-L, Albert R. Emergence of scaling in random net-
works. Science. 1999;286:509–12.
7. Bernstein N. The coordination and regulation of movements.
Oxford: Pergamon Press; 1967.
8. Bressler SL, Menon V. Large-scale brain networks in cognition:
emerging methods and principles. Trends Cogn Sci. 2010;14(6):
277–90.
9. Buccino G, Binkofski F, Fink GR. Action observation activates
premotor and parietal areas in a somatotopic manner: an fMRI
study. Eur J Neurosci. 2001;13:400–4.
10. Buckner RL, Carroll DC. Self-projection and the brain. Trends Cogn Sci. 2007;11(2):49–57.
11. Buckner RL, Andrews-Hanna JR, Schacter DL. The brain’s default network: anatomy, function, and relevance to disease. Ann N Y Acad Sci. 2008;1124:1–38.
12. Bueti D, Walsh V. The parietal cortex and the representation of
time, space, number and other magnitudes. Philos Trans R Soc B
Biol Sci. 2009;364(1525):1831–40.
13. Caeyenberghs K, van Roon D, Swinnen SP, Smits-Engelsman
BC. Deficits in executed and imagined aiming performance in
brain-injured children. Brain Cogn. 2009;69(1):154–61.
14. Chiel HJ, Beer RD. The brain has a body: adaptive behavior
emerges from interactions of nervous system, body and envi-
ronment. Trends Neurosci. 1997;20:553–7.
15. Clark A. Being there: putting brain, body and world together
again. Cambridge: MIT Press; 1997.
16. Damasio A. Self comes to mind: constructing the conscious brain.
New York: Pantheon; 2010.
17. Decety J. Do imagined and executed actions share the same neural substrate? Cogn Brain Res. 1996;3:87–93.
18. Decety J, Sommerville J. Motor cognition and mental simulation.
In: Kosslyn SM, Smith E, editors. Cognitive psychology: mind
and brain. New York: Prentice Hall; 2007. p. 451–81.
19. Desmurget M, Sirigu A. A parietal-premotor network for movement
intention and motor awareness. Trends Cogn Sci. 2009;13:411–9.
20. Feldman J. From molecule to metaphor: a neural theory of lan-
guage. Cambridge, MA: MIT Press; 2006.
21. Frey SH, Gerry VE. Modulation of neural activity during
observational learning of actions and their sequential orders.
J Neurosci. 2006;26:13194–201.
22. Fritzke B. A growing neural gas network learns topologies. In: Tesauro G, Touretzky D, Leen T, editors. Advances in neural information processing systems 7. Cambridge, MA: MIT Press; 1995. p. 625–32.
23. Gallese V, Lakoff G. The brain’s concepts: the role of the sen-
sory-motor system in reason and language. Cogn Neuropsychol.
2005;22:455–79.
24. Gallese V, Sinigaglia C. What is so special with embodied sim-
ulation. Trends Cogn Sci (Oct 7). 2011. http://www.unipr.it/arpa/
mirror/pubs/pdffiles/Gallese/2011/tics_20111007.pdf.
25. Georg Stork H. Towards a scientific foundation for engineering cognitive systems—a European research agenda, its rationale and perspectives. Biol Inspired Cogn Archit. 2012;1:82–91. doi:10.1016/j.bica.2012.04.002.
26. Glenberg AM. What memory is for. Behav Brain Sci. 1997;20:
1–19.
27. Glenberg A, Gallese V. Action-based language: a theory of language
acquisition production and comprehension. Cortex. 2012;48(7):
905–22.
28. Grafton ST. Embodied cognition and the simulation of action to
understand others. Ann N Y Acad Sci. 2009;1156:97–117.
29. Grush R. The emulation theory of representation: motor control,
imagery, and perception. Behav Brain Sci. 2004;27:377–96.
30. Hagmann P, Cammoun L, Gigandet X, Meuli R, Honey CJ,
Wedeen VJ, Sporns O. Mapping the structural core of human
cerebral cortex. PLoS Biol. 2008;6(7):e159.
31. Hassabis D, Maguire EA. The construction system of the brain.
In: Bar M, editor. Predictions in the brain: using our past to
generate a future. New York: Oxford University Press; 2011.
32. Hesslow G. Conscious thought as a simulation of behavior and
perception. Trends Cogn Sci. 2002;6:242–7.
33. Hesslow G, Jirenhed DA. The inner world of a simple robot.
J Conscious Stud. 2007;14:85–96.
34. Hoffmann M, Gravato Marques H, et al. Body schema in robotics: a
review. IEEE Trans Auton Mental Dev. 2010;2:304–24.
35. Hofstadter DR. Godel, Escher, Bach: an eternal golden braid.
NY: Basic Books; 1979.
36. Hofstadter DR. I am a strange loop. NY: Basic Books; 2007.
37. Hopfield JJ. Searching for memories, Sudoku, implicit check bits,
and the iterative use of not-always-correct rapid neural compu-
tation. Neural Comput. 2008;20(5):1119–64.
38. Hummel JE, Holyoak KJ. A symbolic-connectionist theory of
relational inference and generalization. Psychol Rev. 2003;110:
220–64.
39. Iacoboni M. Neurobiology of imitation. Curr Opin Neurobiol. 2009;19(6):661–5.
40. Iriki A, Sakura O. Neuroscience of primate intellectual evolution:
natural selection and passive and intentional niche construction.
Philos Trans R Soc Lond B Biol Sci. 2008;363:2229–41.
41. Johnson M. The body in the mind: the bodily basis of meaning,
imagination and reason. Chicago: University of Chicago Press;
1987.
42. Kacelnik A, Chappell J, Weir AAS, Kenward B. Tool use and
manufacture in birds. In: Bekoff M, editor. Encyclopedia of
animal behavior, vol 3. Westport, CT: Greenwood Publishing
Group; 2004. p. 1067–9.
43. Kohler E, et al. Hearing sounds, understanding actions: action
representation in mirror neurons. Science. 2002;297(5582):
846–8.
44. Kohonen T. Self-organizing maps. Berlin: Springer; 1995.
45. Kokinov BN, Petrov A. Integration of memory and reasoning in analogy-making: the AMBR model. In: Gentner D, Holyoak KJ, Kokinov BN, editors. The analogical mind: perspectives from cognitive science. Cambridge, MA: MIT Press; 2001.
46. Locher JL. The magic of M. C. Escher. New York: Harry N. Abrams; 2000. ISBN 0-8109-6720-0.
47. Marino BFM, Gough PM, Gallese V, Riggio L, Buccino G. How
the motor system handles nouns: a behavioral study. Psychol Res.
2013;77(1):64–73.
48. Martin A. The representation of object concepts in the brain.
Annu Rev Psychol. 2007;58:25–45.
49. Martin A. Circuits in mind: the neural foundations for object
concepts. In: Gazzaniga M, editor. The cognitive neurosciences.
4th ed. Cambridge, MA: MIT Press; 2009. p. 1031–45.
50. Meyer K, Damasio A. Convergence and divergence in a neural
architecture for recognition and memory. Trends Neurosci.
2009;32(7):376–82.
51. Mohan V, Morasso P. Passive motion paradigm: an alternative to
optimal control. Front Neurorobot. 2011;5:4. doi:10.3389/fnbot.
2011.00004.
52. Mohan V, Morasso P. How past experience, imitation and practice can be combined to swiftly learn to use novel ‘‘tools’’: insights from skill learning experiments with baby humanoids. In: International Conference on Biomimetic and Biohybrid Systems: Living Machines 2012; July 9–12, 2012; Barcelona, Spain.
53. Mohan V, Morasso P, Metta G, Kasderidis S. The distribution of rewards in growing sensorimotor maps acquired by cognitive robots through exploration. Neurocomputing. 2011. doi:10.1016/j.neucom.2011.06.009.
54. Mohan V, Morasso P, Zenzeri J, Metta G, Chakravarthy VS,
Sandini G. Teaching a humanoid robot to draw ‘Shapes’. Auton
Robots. 2011;31(1):21–53.
55. Mussa-Ivaldi FA, Morasso P, Zaccaria R. Kinematic networks. A
distributed model for representing and regularizing motor
redundancy. Biol Cybern. 1988;60:1–16.
56. O’Reilly RC, Munakata Y, Frank MJ, Hazy TE, Contributors. Computational cognitive neuroscience. Wiki book, 1st ed. 2012. http://ccnbook.colorado.edu.
57. Patterson K, Nestor PJ, Rogers TT. Where do you know what
you know? The representation of semantic knowledge in the
human brain. Nat Rev Neurosci. 2007;8(12):976–87.
58. Pepperberg IM. The Alex studies: cognitive and communicative abilities of grey parrots. Cambridge, MA: Harvard University Press; 2000. ISBN 0-674-00806-5.
59. Pulvermuller F, Fadiga L. Active perception: sensorimotor cir-
cuits as a cortical basis for language. Nat Rev Neurosci. 2010;
11(5):351–60.
60. Ramachandran VS. The tell-tale brain: a neuroscientist’s quest
for what makes us human. New York: W. W. Norton & Com-
pany; 2011.
61. Rizzolatti G, Sinigaglia C. The functional role of the parieto-
frontal mirror circuit: interpretations and misinterpretations. Nat
Rev Neurosci. 2010;11:264–74.
62. Rizzolatti G, Fadiga L, Matelli M, Bettinardi V, Paulesu E, Perani
D, Fazio F. Localization of grasp representations in humans by
PET: 1. Observation versus execution. Exp Brain Res. 1996;111:
246–52.
63. Rizzolatti G, Fogassi L, Gallese V. Neurophysiological mecha-
nisms underlying action understanding and imitation. Nat Rev
Neurosci. 2001;2:661–70.
64. Rother C, Kolmogorov V, Blake A. GrabCut: Interactive fore-
ground extraction using iterated graph cuts. In: ACM transactions
on graphics (SIGGRAPH). Los Angeles, CA: ACM Press; 2004.
p. 309–14.
65. Shadmehr R, Mussa-Ivaldi FA, Bizzi E. Postural force fields of the human arm and their role in generating multijoint movements. J Neurosci. 1993;13:45–62.
66. Shapiro R. Direct linear transformation method for three-
dimensional cinematography. Res Quart. 1978;49:197–205.
67. Sporns O. Networks of the brain. Cambridge, MA: MIT Press;
2010.
68. Sporns O, Kotter R. Motifs in brain networks. PLoS Biol.
2004;2:1910–8.
69. Sporns O, Honey CJ, Kotter R. Identification and classification of
hubs in brain networks. PLoS ONE. 2007;2:e1049.
70. Suddendorf T, Addis DR, Corballis MC. Mental time travel and
the shaping of the human mind. Philos Trans R Soc B.
2009;364:1317–24.
71. Thompson E. Mind in life: biology, phenomenology and the sciences of mind. 1st ed. Cambridge, MA: Harvard University Press; 2007.
72. Umilta MA, Escola L, Intskirveli I, Grammont F, Rochat M,
Caruana F, Jezzini A, Gallese V, Rizzolatti G. When pliers
become fingers in the monkey motor system. Proc Natl Acad Sci
USA. 2008;105(6):2209–13.
73. Varela FJ, Maturana HR, Uribe R. Autopoiesis: the organization
of living systems, its characterization and a model. Biosystems.
1974;5:187–96.
74. Vernon D, von Hofsten C, Fadiga L. A roadmap for cognitive
development in humanoid robots. Berlin: Springer; 2010.
75. Visalberghi E, Fragaszy D. What is challenging about tool use? The
capuchin’s perspective. In: Wasserman EA, Zentall TR, editors.
Comparative cognition: experimental explorations of animal
intelligence. New York: Oxford University Press; 2006. p. 529–52.
76. Visalberghi E, Limongelli L. Action and understanding: tool use
revisited through the mind of capuchin monkeys. In: Russon A,
Bard K, Parker S, editors. Reaching into thought. The minds of
the great apes. Cambridge: Cambridge University Press; 1996.
p. 57–79.
77. Visalberghi E, Tomasello M. Primate causal understanding in the
physical and in the social domains. Behav Process. 1997;42:
189–203.
78. Vygotsky LS. Mind in society: the development of higher psy-
chological processes. Cambridge, MA: Harvard University Press;
1978.
79. Watts DJ, Strogatz SH. Collective dynamics of ‘small-world’ networks. Nature. 1998;393(6684):440–2.
80. Wiener N. Cybernetics: or control and communication in the animal and the machine. Paris: Hermann & Cie; Cambridge, MA: MIT Press; 1948. ISBN 978-0-262-73009-9.
81. Weir AAS, Chappell J, Kacelnik A. Shaping of hooks in New
Caledonian crows. Science. 2002;297:981–3.
82. Welberg L. Neuroimaging: rats join the ‘default mode’ club. Nat
Rev Neurosci. 2012;13(4):223. doi:10.1038/nrn3224.
83. White JG. Neuronal connectivity in C. elegans. Trends Neurosci.
1985;8:277–83.
84. Whiten A, McGuigan N, Marshall-Pescini S, Hopper LM.
Emulation, imitation, overimitation and the scope of culture for
child and chimpanzee. Philos Trans R Soc B Biol Sci. 2009;364:
2417–28.