Interpretation of Emotional Body Language Displayed by Robots
Aryel Beck1 Antoine Hiolle1 Alexandre Mazel2 Lola Cañamero1
1Adaptive Systems Research Group
School of Computer Science & STRI University of Hertfordshire, UK
+44 (0)1707 286327
{a.beck2,a.hiolle,l.canamero}@herts.ac.uk
2Aldebaran Robotics
168bis-170 rue Raymond Losserand, 75014 Paris, France
+33(0)177371764
ABSTRACT
In order for robots to be socially accepted and to generate empathy, they must display emotions. For robots such as Nao, body language is the best medium available, as they do not have the ability to display facial expressions. Displaying emotional body language that can be interpreted whilst interacting with the robot should greatly improve its acceptance.
This research investigates the creation of an 'Affect Space' [1] for the generation of emotional body language that could be displayed by robots. An Affect Space is generated by 'blending' (i.e. interpolating between) different emotional expressions to create new ones. An Affect Space for body language based on the Circumplex Model of emotions [2] has been created.
The experiment reported in this paper investigated the perception
of specific key poses from the Affect Space. The results suggest
that this Affect Space for body expressions can be used to
improve the expressiveness of humanoid robots.
In addition, early results of a pilot study are described. It revealed that context helps human subjects improve their recognition rate during a human-robot imitation game, and that this recognition in turn leads to a better outcome of the interactions.
General Terms
Experimentation, Human Factors.
Keywords
Human Robot Interactions, Emotional Body Language
1. INTRODUCTION
Expressive robots have already been successfully created. For instance, Kismet expresses emotions through its face [1]. Its expressions are based on nine prototypical facial expressions that 'blend' (interpolate) together along three axes: Arousal, Valence and Stance. Arousal defines the level of energy. Valence specifies how positive or negative the stimulus is. Stance defines how approachable the stimulus is. This method defines an Affect Space in which expressive behaviours span continuously across these three dimensions, creating a wide range of expressions [1].
This research focuses on developing a system to generate
emotional expressions for humanoid robots such as Nao [3].
Whilst such robots cannot display facial expressions, they can
display rich body language postures that portray complex
emotional states [4].
Some research has already focused on achieving responsive behaviours, especially for Virtual Humans. For instance, Gillies et al. have created a method for building responsive virtual humans that generate their own expressions based on motion capture data [5]. However, this method cannot be transferred directly onto robots, as they cannot reproduce motion-captured movements as smoothly as virtual humans, or without falling over. Therefore, at this stage, it was decided to take a simpler approach.
The approach proposed is comparable to the one used to create Kismet's expressions. Kismet uses a small set of facial expressions that 'blend' together. This research investigates whether a similar approach would be effective for bodily expressions. 'Blending' body expressions may result in the intended emotions. However, these types of expressions need to be tested, as the interpretation of an expression may differ from the intended one. For instance, it is not evident that 'blending' two negative body expressions would result in a negative expression.
Therefore, an experiment investigating how such key poses are perceived was conducted.
2. Affect Space for Body Expressions
An algorithm that blends (interpolates) between a defined set of key poses was developed to automatically generate new ones. The algorithm can generate movements from the current joint positions of Nao to new ones over a specified duration.
The postures are generated by calculating the weighted mean of
the joint angles from up to three postures taken from a defined set.
Currently, the blending coefficients and the duration of the movements are specified manually.
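As a concrete illustration, the weighted-mean blending step could look like the following Python sketch. This is not the authors' implementation: the joint list, placeholder angles and function names are assumptions made for the example.

from typing import Dict, List, Tuple

# A few of Nao's joints, for illustration only; the real robot has 25 DOF.
JOINTS = ["HeadPitch", "LShoulderPitch", "RShoulderPitch", "LElbowRoll", "RElbowRoll"]

def blend_poses(poses: List[Tuple[Dict[str, float], float]]) -> Dict[str, float]:
    """Weighted mean of joint angles over up to three (pose, weight) pairs."""
    assert 1 <= len(poses) <= 3, "the system blends up to three postures"
    total = sum(w for _, w in poses)
    return {j: sum(angles[j] * w for angles, w in poses) / total for j in JOINTS}

# Example: 70% Sadness, 30% Fear (pose B in Figure 3), with placeholder angles.
sadness = {j: 0.0 for j in JOINTS}
fear = {j: 1.0 for j in JOINTS}
print(blend_poses([(sadness, 0.7), (fear, 0.3)]))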
To ensure smooth movements and avoid abrupt changes of direction, the movements are interpolated using B-splines. Figure 1 illustrates a situation in which the robot was moving from posture A to posture C when, at point B, a new posture D is entered. The new movement to be performed is then interpolated using B-splines between postures C and D, resulting in the smooth curve shown in Figure 1.
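The smoothing idea can be sketched with an off-the-shelf cubic B-spline, here for a single joint; the waypoint times and angles below are illustrative assumptions, not values from the paper.

import numpy as np
from scipy.interpolate import make_interp_spline

# Waypoints: the robot was heading from A towards C; at B a new target D arrives.
times = np.array([0.0, 0.5, 1.0, 2.0])    # seconds (illustrative)
angles = np.array([0.0, 0.4, 0.6, -0.2])  # joint angle at A, B, C, D (radians)

spline = make_interp_spline(times, angles, k=3)  # cubic B-spline through the waypoints

t = np.linspace(0.0, 2.0, 50)
trajectory = spline(t)  # smooth joint-angle curve, no abrupt change of direction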
This method is interesting as it produces a wide range of different emotional expressions easily and quickly. These animations are fully configurable and use only a small amount of memory. Each key frame is computed 'on the fly'. Another benefit is the ability to change from one emotion to another without the need for a neutral pose. It is also possible to easily add new basic key poses for each emotion, producing an extremely wide range of emotional expressions.
Using this algorithm, an 'Affect Space' based on the circumplex model of emotion [2] was defined for body language. According to Russell's circumplex model of emotions, emotional experiences depend on two major dimensions, Arousal and Valence. The postures were chosen from a study looking at how head position affects the interpretation of the emotion displayed [4]. It was found that head position had a significant effect on the perceived Valence and Arousal [4]. Therefore, the head was positioned to be consistent with these results. Four emotions were chosen from [4] for this study. Happiness was chosen as it was the positive emotion conveying the highest level of Arousal. Pride was chosen because it was the positive emotion conveying the lowest level of Arousal. For the negative emotions, Fear was chosen as it conveyed the highest level of Arousal, and Sadness as it conveyed the lowest. A neutral and stable pose was developed and added to the set.
Finally, the axes of the Affect Space were built by placing the most positive and aroused key pose in opposition to the most negative and non-aroused one (Figure 2). Similarly, the most negative and aroused key pose was placed in opposition to the most positive and non-aroused one (Figure 2). The resulting system created postures based on the two dimensions of the circumplex model (Arousal and Valence).
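To make the construction concrete, the sketch below shows one plausible way to map a target (Valence, Arousal) point onto blend weights over the five key poses. The coordinates and the inverse-distance weighting are assumptions for illustration; the paper states that the blending coefficients are currently specified manually.

import math

KEY_POSES = {                 # (Valence, Arousal), illustrative coordinates
    "Happiness": (1.0, 1.0),
    "Pride":     (1.0, -1.0),
    "Fear":      (-1.0, 1.0),
    "Sadness":   (-1.0, -1.0),
    "Neutral":   (0.0, 0.0),
}

def blend_weights(valence, arousal):
    """Inverse-distance weights over the three nearest key poses."""
    dists = {name: math.hypot(valence - v, arousal - a)
             for name, (v, a) in KEY_POSES.items()}
    nearest = sorted(dists, key=dists.get)[:3]   # the system blends up to three poses
    if dists[nearest[0]] == 0.0:                 # target sits exactly on a key pose
        return {nearest[0]: 1.0}
    inv = {name: 1.0 / dists[name] for name in nearest}
    total = sum(inv.values())
    return {name: w / total for name, w in inv.items()}

print(blend_weights(0.5, 0.5))   # dominated by Happiness and Neutral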
3. The Experiment
3.1 Design
The experiment was designed to test how key poses generated by the Affect Space presented in section 2 are interpreted, and whether these interpretations are consistent with the postures' positions in the model (Figure 2). The experiment used a within-subjects design with one independent variable (Emotion Displayed). Four dependent variables were defined to explore the Affect Space: Primary Emotion, Secondary Emotion, Arousal and Valence (see section 1 for definitions). Primary Emotion and Secondary Emotion were used to test whether it was possible for participants to interpret the key pose displayed. Arousal and Valence were used to investigate the position of each tested key pose in the Affect Space (Figure 2).
The main question tested was:
Is the interpretation of the key poses displayed consistent with
their positions in the Affect Space?
3.2 Participants
23 participants were recruited, mostly students from the University of Portsmouth (7 females and 16 males), ranging in age from 19 to 49 (M=27.22, SD=7.80). In exchange for participation, participants were entered in a raffle to win an iPhone.
3.3 Material and Apparatus
The platform chosen for this experiment was Nao, a humanoid robot with 25 degrees of freedom (Figure 2).
Figure 1: Changes of direction using B-splines.
Figure 2: The model tested. The black dots correspond to the key poses tested; the positions of the dots are symmetrical for illustrative purposes.
Figure 3: Five key poses generated by the system (A: 100% Sadness; B: 70% Sadness, 30% Fear; C: 50% Sadness, 50% Fear; D: 30% Sadness, 70% Fear; E: 100% Fear).
The four key poses from [4] were modified to improve the stability of the robot and to ensure it would not fall on account of a bad combination. Sixteen additional poses were then generated
using the system presented in section 2. Each emotion was 'blended' with its 'neighbors' (Figure 2) at three different levels (100%, 70%/30%, 50%/50%). To limit the number of key poses assessed by each participant, the neutral position was blended with each emotion at 50%/50% only (Figure 2).
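The composition of the sixteen generated stimuli can be written out explicitly. The enumeration below is a sketch based on the description above; using emotion names as dictionary keys is the only assumption.

# The four circumplex edges along which neighbouring emotions are blended.
NEIGHBORS = [("Happiness", "Fear"), ("Happiness", "Pride"),
             ("Sadness", "Fear"), ("Sadness", "Pride")]

generated = []
for a, b in NEIGHBORS:
    for w in (0.7, 0.5, 0.3):                  # 70/30, 50/50 and 30/70 blends
        generated.append({a: w, b: round(1.0 - w, 1)})
for emotion in ("Happiness", "Pride", "Fear", "Sadness"):
    generated.append({emotion: 0.5, "Neutral": 0.5})  # 50/50 blends with Neutral

assert len(generated) == 16                    # the sixteen additional poses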
3.4 Procedure
All participants were tested by the same experimenter in individual sessions. Each session began by obtaining consent; participants then watched and assessed the 20 poses. Each pose was displayed only once, in a randomized order. For each pose, participants were first asked to give an 'open' interpretation. They then had to categorize it by choosing one emotion among Happiness, Pride, Excitement, Fear, Anger, Sadness and Neutral, with the option of adding a secondary emotion. Participants also rated Valence (Negative/Positive) and Arousal (Low Energy/High Energy) on a 10-point Likert scale. Once all the poses had been assessed, participants were fully debriefed regarding the purpose of the study. The whole session took around 30 minutes.
4. Results
4.1 Identification of the Five Key Poses in the Set
Since the original postures were slightly modified and a neutral posture was added, it was necessary to check whether participants could still identify them correctly.
Table 1 confirmed that participants were able to interpret all the
postures used in the set. As in [4], Happiness was most commonly
misinterpreted as Excitement (by 26% of participants). In the
context of this experiment this was a positive result as it
confirmed that the key pose showing Happiness was likely to be
perceived as positive and aroused.
4.2 Interpretations of the Generated Key Poses
Table 2 shows that the interpretations of the key poses displayed were consistent with their positions in the model (Figure 2). The negative key poses that were automatically generated were interpreted as negative, whereas the positive ones were interpreted as positive. Moreover, for most of the key poses, the primary interpretation was consistent with the 'blend' of emotions being displayed (Table 2).
Table 1: Recognition rates for the set of postures used (chance level would be 14%).
Table 2: Postures and their main interpretations. "None" indicates that the question was left unanswered.
4.3 Perceived Valence
In order to investigate how the blended postures were perceived, the different key poses were compared in pairs using two-way repeated measures ANOVAs.
As expected, Happiness was perceived as significantly more
positive (Valence) than Sadness (F(1,22)=69.51, p<0.01, Partial
Eta Squared=0.76) and than Fear (F(1,22)=73.59, p<0.01, Partial
Eta Squared=0.77). Pride was perceived as more positive than
Sadness (F(1,22)=106.55, p<0.01, Partial Eta Squared=0.83) and
than Fear (F(1,22)=164.14, p<0.01, Partial Eta Squared=0.88).
There was no difference between Happiness and Pride
(F(1,22)=0.04, p=0.84, Partial Eta Squared=0.00). Similarly,
there was no difference between Sadness and Fear (F(1,22)=0.68,
p=0.42, Partial Eta Squared=0.03).
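For readers who wish to reproduce this kind of pairwise comparison, the sketch below shows one way to run a repeated measures ANOVA in Python. The paper does not state which software was used; the data frame columns and the placeholder ratings are assumptions.

import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
ratings = pd.DataFrame({
    "participant": np.tile(np.arange(23), 2),            # 23 subjects, 2 poses each
    "pose": np.repeat(["Happiness", "Sadness"], 23),
    "valence": np.concatenate([rng.integers(7, 11, 23),  # placeholder 10-point ratings
                               rng.integers(1, 5, 23)]),
})

# One within-subjects factor (pose) with two levels yields the F(1, 22) contrast.
result = AnovaRM(ratings, depvar="valence", subject="participant", within=["pose"]).fit()
print(result)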
The results of the comparisons made between the different postures are summarized in Figure 4. Overall, the perceived Valence of the 'positive aroused' area (i.e. Happiness) was not affected by the changes in posture, whether elements of Fear or of Pride were added (Figure 4). However, the 'positive non-aroused' (i.e. Pride) area was affected by the changes when elements of Happiness were added (Figure 4).
For the 'negative aroused' (i.e. Fear) and 'negative non-aroused' (i.e. Sadness) areas, the results show that the key poses from the basic set were perceived as the most negative (Valence) ones (Figure 4).
4.4 Perceived Arousal
As expected, Happiness was perceived as significantly more Aroused than Pride (F(1,22)=10.27, p<0.01, Partial Eta Squared=0.32) and than Sadness (F(1,22)=166.84, p<0.01, Partial Eta Squared=0.88). Fear was perceived as more Aroused than Sadness (F(1,22)=47.13, p<0.01, Partial Eta Squared=0.68). Moreover, Pride was perceived as more Aroused than Sadness (F(1,22)=30.55, p<0.01, Partial Eta Squared=0.58). Happiness was perceived as more Aroused than Fear (F(1,22)=5.46, p<0.05, Partial Eta Squared=0.20).
As in section 4.3, the results of the repeated measures ANOVAs are summarized in Figure 5, which shows that perceived Arousal is consistent with the predictions of the model. In other words, 'blending' an aroused emotion with a non-aroused one either decreases the perceived Arousal or does not affect it. Similarly, blending a non-aroused emotion with an aroused one either increases the perceived Arousal or does not affect it (Figure 5). Moreover, for each emotion, there was a decrease in Arousal (significant, or a trend) when it was blended with the neutral key pose.
5. Discussion
5.1 Interpretations
Participants were far better than chance at interpreting the five key poses used as a set. The recognition rates were lower than in [4]. However, this is not surprising, as the questionnaire used in this study had more options and participants watched each key pose only once.
The recognition rates (Table 1 and Table 2) confirmed that it is
possible to interpret emotions displayed by a humanoid robot and
that the lack of facial expression is not a barrier to expressing
emotions.
Figure 4: Results and direction of the two-way repeated measures ANOVAs conducted on Valence (V). (* indicates p<0.05; ** indicates p<0.01.) The positions of the dots are symmetrical for illustrative purposes.
Figure 5: Results and direction of the two-way repeated measures ANOVAs conducted on Arousal (A). (* indicates p<0.05; ** indicates p<0.01.) The positions of the dots are symmetrical for illustrative purposes.
Moreover, the results show that it was possible for participants to successfully recognize the key poses generated by the system. For instance, the key poses created by blending 70%/30% of different
emotions were interpreted in a manner consistent with the primary
emotions being displayed (Table 2). This suggests that it is
possible to create variations of an emotional expression using the
Affect Space while maintaining the way it is perceived. In other
words, this method can be used to automatically generate different
expressions for an emotion.
However, the results suggest that the key poses created by blending emotions at 50%/50% were more difficult to interpret (Table 2). For instance, 50% Happiness 50% Pride was interpreted by 30% of the participants as Neutral and by another 30% as Happiness (Table 2). However, looking at the values of Valence and Arousal, the key pose's position was still consistent with the model (Figures 4 and 5). This was further suggested by the answers to the open question: four participants described the key pose as 'Welcoming', 'Embracing' or 'Wants a hug'. This shows that it was perceived as predicted in Figure 2 (positive, but less aroused than 100% Happiness). Similarly, 50% Fear 50% Sadness was interpreted as Fear by only 35% of the participants (Table 2). Looking at the values of Valence and Arousal, this key pose's position was also still consistent with the model (Figures 4 and 5). This too was suggested by some participants' answers to the open question: the key pose was described as 'Shy', 'Apprehensive', 'Cautious' or 'Reserved'.
The interpretations of the key poses thus suggest that the Affect
Space created can be used to greatly enrich the expressiveness of
the robot. It could also be used to avoid always displaying the
exact same expression for an emotion while still being
understandable.
5.2 Valence
The algorithm did not create 'aberrant' postures. The perceived Valence was always consistent with the emotions being displayed. This confirmed that the interpretations of the emotions were consistent with the intended displays.
However, there were some unexpected results regarding the perceived Valence of the generated negative key poses. The results show that the key poses generated by blending Fear and Sadness were perceived as less negative than the original ones (Figure 4), whereas the model predicted no change in Valence. The key poses 100% Fear and 100% Sadness may have been perceived as extreme occurrences, i.e. prototypical displays, of these emotions (Figures 3A and 3E). This would explain why they were perceived as more negative than the generated ones, which are not prototypical. Nevertheless, the generated key poses were still interpreted as negative. The organization of the Affect Space will be modified to take this into account.
The Affect Space was tested with key poses and it is expected that
movements will further improve the expressivity of the system.
5.3 Arousal
Figure 5 shows that the generated key poses were consistent with the predictions made by the Affect Space. The results show that it is possible to increase or decrease the perceived Arousal by adding elements of an aroused or non-aroused posture. For instance, the key pose 50% Fear 50% Sadness was interpreted as Neutral; it was, however, rated as more aroused than Sadness and less aroused than Fear (Figure 5).
However, the results also show that the anticipated positions of some postures need to be corrected. For instance, 100% Pride was completely misplaced, as it conveyed a higher level of Arousal than anticipated. Because of this, the Affect Space generated for this study did not cover the 'positive non-aroused' area (Figure 2). It will be necessary to complete it with a non-aroused positive posture.
So far, only key poses have been tested, and Arousal is known to be related to the speed of movements [6]. It is therefore expected that the model will benefit from incorporating motions whose speed varies with the robot's Arousal.
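One simple way this could be realized is sketched below, under the assumption of an exponential scaling law relating Arousal to movement duration; neither the law nor the function is specified in the paper.

def movement_duration(base_duration: float, arousal: float) -> float:
    """Scale a movement's duration by arousal in [-1, 1]:
    arousal = +1 halves the duration (faster, more energetic),
    arousal = -1 doubles it (slower, more subdued)."""
    return base_duration * (2.0 ** (-arousal))

print(movement_duration(2.0, 1.0))   # 1.0 s for a fully aroused expression
print(movement_duration(2.0, -1.0))  # 4.0 s for a fully non-aroused one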
6. Application to Human-Robot Interactions
The fact that participants can successfully interpret the emotional postures displayed by a robot suggests that such displays could be used to facilitate human-robot interactions. For instance, the emotional feedback provided by a robot could be used intuitively by humans to establish whether or not an interaction was successful.
This was reinforced by the results of a pilot study in which participants were asked to teach a Nao robot to imitate four distinct movements, based on four different perceptions. Nao sat in front of participants, who used an easily recognizable pink ball to show the robot four different arm movements (moving the right arm up, right arm down, left arm up and left arm down).
Throughout this interaction, the robot was supposed to associate the position of the ball with the appropriate movement to perform. In order to learn the correct associations, the robot only used the rhythm at which participants were changing the ball's position. The experiment investigated whether rhythm can be used as a reward by an autonomous robot during a simple interaction, without any prior knowledge. The underlying mechanism of the learning algorithm is thoroughly described in [7], although there it was only applied to human-computer interaction.
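The gist of a rhythm-based reward can be sketched as follows. This is only a gloss of the idea (the actual learning rule is the one described in [7]); the window size and break threshold are arbitrary assumptions.

def rhythm_reward(intervals, window=5, break_factor=2.0):
    """Positive reward while the human keeps a steady rhythm of ball moves;
    negative reward when the latest interval is much longer than the recent
    average, i.e. the human paused (an implicit 'Not like this!')."""
    if len(intervals) < window + 1:
        return 0.0                       # not enough history yet
    recent = intervals[-(window + 1):-1]
    mean_interval = sum(recent) / len(recent)
    return -1.0 if intervals[-1] > break_factor * mean_interval else 1.0

# Steady interaction, then a long pause after an incorrect action:
print(rhythm_reward([1.0, 1.1, 0.9, 1.0, 1.0, 3.5]))  # -> -1.0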
The rhythm of interaction was chosen because it is a natural
component of every interaction. Its variations convey meaningful
information. For instance, an adult teaching a child how to use a
special toy, or how to play a game, would keep showing how it
can be manipulated and let the child try. If the child performed an
incorrect action, the adult would stop, and could say “Not like
this!”, for instance, and would show the correct action again. This
is an implicit break in the rhythm of the interaction. However, as
natural as this phenomenon can be when adults interact with
children, it is not evident that interacting with a robot will trigger
the same behaviour. Andry et al. also adapted the experiment to a SONY Aibo robot [8]. Their results showed that learning was achievable; however, it was hard to obtain and hard to maintain. These difficulties were partly due to the learning algorithm having a slow memory-consolidation mechanism and an exploration/exploitation trade-off which made the robot try out new actions from time to time. More crucially, the robot did not express any feedback on how it was understanding the human's behaviour, namely the changes in rhythm, which is an important factor in natural interactions.
In order to assess the importance of expressing feedback, a pilot
study with ten participants was conducted. The aim was to assess
whether context would help the recognition of a particular body
posture displayed by the robot, and if the emotional expression
affected participants' behaviour. Therefore, the behaviour of the
robot was modified so that it provided participants with feedback.
If a certain amount of negative reward was experienced by the
robot, it stopped the interaction and displayed a Sad posture for
two seconds. Similarly, if a certain amount of positive reward
was experienced, the robot displayed a Happy posture. The robot
displayed a Bored posture when participants repeated the same
movement time and time again. The postures were indicators of
whether the interaction was successful. In this experiment, the
postures used were not generated by the Affect Space. The body
postures were displayed based on the accumulation of negative or
positive rewards. Moreover, the recognition rates of the body
postures displayed by the robot suggested that they were harder to
identify than the ones used to generate the Affect Space.
Nevertheless, the results show that the recognition rates were
higher when the postures were displayed within the context of the
interaction. These results suggest that the context of the
interaction has a significant impact on the interpretation of a body
posture. Moreover, after the postures were displayed, the behaviour of the subjects who recognized them was altered. Usually, the subjects were surprised at first, wondering why the robot would express sadness (or frustration, or disappointment) at that point in time, and then changed the way they interacted with the robot, leading to more success in the imitation game.
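The feedback rule described above can be summarized by a small decision function. The thresholds below are hypothetical placeholders; the paper only states that postures were triggered by accumulated positive or negative reward and by repetition.

NEG_THRESHOLD = -3.0   # hypothetical accumulated-reward thresholds
POS_THRESHOLD = 3.0
BORED_REPEATS = 5      # hypothetical repetition threshold

def feedback_posture(accumulated_reward, same_move_repetitions):
    """Choose the expressive posture the robot displays, if any."""
    if same_move_repetitions > BORED_REPEATS:
        return "Bored"                  # the human keeps repeating one movement
    if accumulated_reward <= NEG_THRESHOLD:
        return "Sad"                    # displayed for two seconds, interaction stops
    if accumulated_reward >= POS_THRESHOLD:
        return "Happy"
    return None                         # no feedback; keep interacting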
In light of these results, the Affect Space described in this paper could be used as an efficient way of indicating to a human whether an interaction was successful. The robot could enrich the interaction with a wide range of emotional expressions. Using the Affect Space, the robot could display blends of emotions specific to an interaction (rather than only Sadness/Happiness). For instance, when the robot learned to perform a new movement, it could display a posture expressing Pride as well as Happiness. However, it is necessary to formally assess this medium in the context of an interaction.
7. Conclusion
The results show that it is possible to interpret key poses generated by the Affect Space. This suggests that the approach can be used to enrich, at a low cost, the expressiveness of humanoid robots. However, the exact positions in the Affect Space of the generated expressions still need to be assessed more precisely.
The system can generate animations 'on the fly'; however, the interpretations of such animations remain to be tested. The next research step is to use this approach to create animations. The final
version will consider acceleration and curvature, as it has been
established that these parameters are related to arousal and
valence [6]. These additions should improve the expressiveness
of robots.
Moreover, the overall purpose of communicating the emotional state of the robot is to facilitate interactions. The effectiveness of the Affect Space will be assessed during real-time interactions. The evaluation will consider the recognition of the postures being displayed as well as their effect on the interaction. It is expected that widening the range of emotional expressions of the robot will help human partners interact with it intuitively.
8. ACKNOWLEDGMENTS
The authors would like to thank the School of Creative Technologies, University of Portsmouth, for hosting the experiment.
This work is partly funded by the EU FP6 Feelix Growing project
(grant number IST-045169), and partly by the EU FP7 ALIZ-E
project (grant number 248116).
9. REFERENCES
[1] Breazeal, C. Designing Sociable Robots. Intelligent Robotics & Autonomous Agents series. MIT Press, 2002.
[2] Russell, J.A. A circumplex model of affect. Journal of Personality and Social Psychology, 39 (1980), 1161-1178.
[3] Aldebaran Robotics. http://www.aldebaran-robotics.com/. 2010.
[4] Beck, A., Cañamero, L. and Bard, K. Toward an affect space for robots to display body language. In Proceedings of the IEEE International Symposium on Robot and Human Interactive Communication (Ro-Man 2010).
[5] Gillies, M., et al. Responsive listening behavior. Computer Animation and Virtual Worlds, 19 (2008), 579-589.
[6] Saerbeck, M. and Bartneck, C. Perception of affect elicited by robot motion. In Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction (HRI 2010), Osaka, 2010, 53-60.
[7] Andry, P., Gaussier, P., Moga, S., Banquet, J.P. and Nadel, J. Learning and communication via imitation: an autonomous robot perspective. IEEE Transactions on Systems, Man, and Cybernetics, 31, 5 (2001), 431-442.
[8] Andry, P., Garnault, N. and Gaussier, P. Using the interaction rhythm to build an internal reinforcement signal: a tool for intuitive HRI. In Proceedings of the Ninth International Conference on Epigenetic Robotics, 2009.