Interpretation of Emotional Body Language Displayed by Robots
Aryel Beck1 Antoine Hiolle1 Alexandre Mazel2 Lola Cañamero1
1Adaptive Systems Research Group
School of Computer Science & STRI University of Hertfordshire, UK
+44 (0)1707 286327
{a.beck2,a.hiolle,l.canamero}@herts.ac.uk
2Aldebaran Robotics
168bis-170 rue Raymond Losserand, 75014 Paris, France
+33(0)177371764
ABSTRACT
In order for robots to be socially accepted and to generate empathy, they must display emotions. For robots such as Nao, body language is the best medium available, as they do not have the ability to display facial expressions. Displaying emotional body language that can be interpreted whilst interacting with the robot should greatly improve its acceptance.
This research investigates the creation of an 'Affect Space' [1] for the generation of emotional body language that could be displayed by robots. An Affect Space is generated by 'blending' (i.e. interpolating between) different emotional expressions to create new ones. An Affect Space for body language based on the Circumplex Model of emotions [2] has been created.
The experiment reported in this paper investigated the perception
of specific key poses from the Affect Space. The results suggest
that this Affect Space for body expressions can be used to
improve the expressiveness of humanoid robots.
In addition, early results of a pilot study are described. It revealed that context helps human subjects improve their recognition rate during a human-robot imitation game, and that this recognition in turn leads to a better outcome of the interactions.
General Terms
Experimentation, Human Factors.
Keywords
Human Robot Interactions, Emotional Body Language
1. INTRODUCTION
Expressive robots have already been successfully created. For instance, Kismet expresses emotions through its face [1]. Its expressions are based on nine prototypical facial expressions that 'blend' (interpolate) together along three axes: Arousal, Valence and Stance. Arousal defines the level of energy. Valence specifies how positive or negative the stimulus is. Stance defines how approachable the stimulus is. This method defines an Affect Space in which expressive behaviours span continuously across these three dimensions, creating a wide range of expressions [1].
This research focuses on developing a system to generate
emotional expressions for humanoid robots such as Nao [3].
Whilst such robots cannot display facial expressions, they can
display rich body language postures that portray complex
emotional states [4].
Some research has already focused on achieving responsive behaviours, especially for Virtual Humans. For instance, Gillies et al. have created a method for building responsive virtual humans that generate their own expressions based on motion capture data [5]. However, this method cannot be transferred directly onto robots, as they cannot reproduce motion-captured movements as smoothly as virtual humans, or without falling over. Therefore, at this stage, it was decided to take a simpler approach.
The approach proposed is comparable to the one used to create Kismet's expressions. Kismet uses a small set of facial expressions that 'blend' together. This research investigates whether a similar approach would be effective for bodily expressions. 'Blending' body expressions may result in the intended emotions. However, these types of expressions need to be tested, as the interpretation of an expression may differ from the intended one. For instance, it is not evident that 'blending' two negative body expressions would result in a negative expression.
Therefore, an experiment investigating how such key poses are perceived was conducted.
2. Affect Space for Body Expressions
An algorithm that blends (interpolates) between a defined set of key poses was developed to automatically generate new ones. The algorithm can generate movements from the current joint positions of Nao to new ones over a specified duration.
The postures are generated by calculating the weighted mean of
the joint angles from up to three postures taken from a defined set.
Currently, the blending coefficients and the duration of the movements are specified manually.
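As a concrete illustration, the weighted-mean blending step could look like the following Python sketch. This is not the authors' implementation: the joint list, placeholder angles and function names are assumptions made for the example.

from typing import Dict, List, Tuple

# A few of Nao's joints, for illustration only; the real robot has 25 DOF.
JOINTS = ["HeadPitch", "LShoulderPitch", "RShoulderPitch", "LElbowRoll", "RElbowRoll"]

def blend_poses(poses: List[Tuple[Dict[str, float], float]]) -> Dict[str, float]:
    """Weighted mean of joint angles over up to three (pose, weight) pairs."""
    assert 1 <= len(poses) <= 3, "the system blends up to three postures"
    total = sum(w for _, w in poses)
    return {j: sum(angles[j] * w for angles, w in poses) / total for j in JOINTS}

# Example: 70% Sadness, 30% Fear (pose B in Figure 3), with placeholder angles.
sadness = {j: 0.0 for j in JOINTS}
fear = {j: 1.0 for j in JOINTS}
print(blend_poses([(sadness, 0.7), (fear, 0.3)]))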
To ensure smooth movements and avoid abrupt changes of direction, the movements are interpolated using B-splines. Figure 1 illustrates a situation in which the robot was moving from posture A to posture C when, at point B, a new posture D is entered. The new movement to be performed is then interpolated using B-splines between postures C and D, resulting in the smooth curve shown in Figure 1.
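The smoothing idea can be sketched with an off-the-shelf cubic B-spline, here for a single joint; the waypoint times and angles below are illustrative assumptions, not values from the paper.

import numpy as np
from scipy.interpolate import make_interp_spline

# Waypoints: the robot was heading from A towards C; at B a new target D arrives.
times = np.array([0.0, 0.5, 1.0, 2.0])    # seconds (illustrative)
angles = np.array([0.0, 0.4, 0.6, -0.2])  # joint angle at A, B, C, D (radians)

spline = make_interp_spline(times, angles, k=3)  # cubic B-spline through the waypoints

t = np.linspace(0.0, 2.0, 50)
trajectory = spline(t)  # smooth joint-angle curve, no abrupt change of direction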
This method is interesting as it produces a wide range of different emotional expressions easily and quickly. These animations are fully configurable and use only a small amount of memory. Each key frame is computed 'on the fly'. Another benefit is the ability to change from one emotion to another without the need for a neutral pose. It is also possible to easily add new basic key poses for each emotion, producing an extremely wide range of emotional expressions.
Using this algorithm, an 'Affect Space' based on the circumplex model of emotion [2] was defined for body language. According to Russell's circumplex model of emotions, emotional experiences depend on two major dimensions, Arousal and Valence. The postures were chosen from a study looking at how head position affects the interpretation of the emotion displayed [4]. It was found that head position had a significant effect on the perceived Valence and Arousal [4]. Therefore, the head was positioned to be consistent with these results. Four emotions were chosen from [4] for this study. Happiness was chosen as it was the positive emotion conveying the highest level of Arousal. Pride was chosen because it was the positive emotion conveying the lowest level of Arousal. For the negative emotions, Fear was chosen as it conveyed the highest level of Arousal, and Sadness as it conveyed the lowest. A neutral and stable pose was developed and added to the set.
Finally, the axes of the Affect Space were built by placing the most positive and aroused key pose in opposition to the most negative and non-aroused one (Figure 2). Similarly, the most negative and aroused key pose was placed in opposition to the most positive and non-aroused one (Figure 2). The resulting system created postures based on the two dimensions of the circumplex model (Arousal and Valence).
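To make the construction concrete, the sketch below shows one plausible way to map a target (Valence, Arousal) point onto blend weights over the five key poses. The coordinates and the inverse-distance weighting are assumptions for illustration; the paper states that the blending coefficients are currently specified manually.

import math

KEY_POSES = {                 # (Valence, Arousal), illustrative coordinates
    "Happiness": (1.0, 1.0),
    "Pride":     (1.0, -1.0),
    "Fear":      (-1.0, 1.0),
    "Sadness":   (-1.0, -1.0),
    "Neutral":   (0.0, 0.0),
}

def blend_weights(valence, arousal):
    """Inverse-distance weights over the three nearest key poses."""
    dists = {name: math.hypot(valence - v, arousal - a)
             for name, (v, a) in KEY_POSES.items()}
    nearest = sorted(dists, key=dists.get)[:3]   # the system blends up to three poses
    if dists[nearest[0]] == 0.0:                 # target sits exactly on a key pose
        return {nearest[0]: 1.0}
    inv = {name: 1.0 / dists[name] for name in nearest}
    total = sum(inv.values())
    return {name: w / total for name, w in inv.items()}

print(blend_weights(0.5, 0.5))   # dominated by Happiness and Neutral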
3. The Experiment
3.1 Design
The experiment was designed to test how key poses generated by the Affect Space presented in section 2 are interpreted, and whether these interpretations are consistent with the postures' positions in the model (Figure 2). The experiment used a within-subjects design with one independent variable (Emotion Displayed). Four dependent variables were defined to explore the Affect Space: Primary Emotion, Secondary Emotion, Arousal and Valence (see section 1 for definitions). Primary Emotion and Secondary Emotion were used to test whether it was possible for participants to interpret the key pose displayed. Arousal and Valence were used to investigate the position of each tested key pose in the Affect Space (Figure 2).
The main question tested was:
Is the interpretation of the key poses displayed consistent with
their positions in the Affect Space?
3.2 Participants
23 participants were recruited, mostly students from the University of Portsmouth (7 females and 16 males), ranging in age from 19 to 49 (M=27.22, SD=7.80). In exchange for participation, participants were entered in a raffle to win an iPhone.
3.3 Material and Apparatus
The platform chosen for this experiment was Nao, a humanoid robot with 25 degrees of freedom (Figure 2).
Figure 1: Changes of direction using B-splines.
Figure 2: The model tested. The black dots correspond to the key poses tested; the positions of the dots are symmetrical for illustrative purposes.
Figure 3: Five key poses generated by the system (A: 100% Sadness; B: 70% Sadness, 30% Fear; C: 50% Sadness, 50% Fear; D: 30% Sadness, 70% Fear; E: 100% Fear).
The four key poses from [4] were modified to improve the stability of the robot and to ensure it would not fall on account of a bad combination. Sixteen additional poses were then generated
using the system presented in section 2. Each emotion was 'blended' with its 'neighbors' (Figure 2) at three different levels (100%, 70%/30%, 50%/50%). To limit the number of key poses assessed by each participant, the neutral position was blended with each emotion at 50%/50% only (Figure 2).
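The composition of the sixteen generated stimuli can be written out explicitly. The enumeration below is a sketch based on the description above; using emotion names as dictionary keys is the only assumption.

# The four circumplex edges along which neighbouring emotions are blended.
NEIGHBORS = [("Happiness", "Fear"), ("Happiness", "Pride"),
             ("Sadness", "Fear"), ("Sadness", "Pride")]

generated = []
for a, b in NEIGHBORS:
    for w in (0.7, 0.5, 0.3):                  # 70/30, 50/50 and 30/70 blends
        generated.append({a: w, b: round(1.0 - w, 1)})
for emotion in ("Happiness", "Pride", "Fear", "Sadness"):
    generated.append({emotion: 0.5, "Neutral": 0.5})  # 50/50 blends with Neutral

assert len(generated) == 16                    # the sixteen additional poses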
3.4 Procedure
All participants were tested by the same experimenter in individual sessions. Each session began by obtaining consent; participants then watched and assessed the 20 poses. Each pose was displayed only once, in a randomized order. For each pose, participants were first asked to give an 'open' interpretation. They then had to categorize it by choosing one emotion among Happiness, Pride, Excitement, Fear, Anger, Sadness and Neutral, with the option of adding a secondary emotion. Participants also rated Valence (Negative/Positive) and Arousal (Low Energy/High Energy) on a 10-point Likert scale. Once all the poses had been assessed, participants were fully debriefed regarding the purpose of the study. The whole session took around 30 minutes.
4. Results
4.1 Identification of the Five Key Poses in the Set
Since the original postures were slightly modified and a neutral posture was added, it was necessary to check whether participants could still identify them correctly.
Table 1 confirmed that participants were able to interpret all the
postures used in the set. As in [4], Happiness was most commonly
misinterpreted as Excitement (by 26% of participants). In the
context of this experiment this was a positive result as it
confirmed that the key pose showing Happiness was likely to be
perceived as positive and aroused.
4.2 Interpretations of the Generated Key Poses
Table 2 shows that the interpretations of the key poses displayed were consistent with their positions in the model (Figure 2). The negative key poses that were automatically generated were interpreted as negative, whereas the positive ones were interpreted as positive. Moreover, for most of the key poses, the primary interpretation was consistent with the 'blend' of emotions being displayed (Table 2).
Table 1: Recognition rates for the set of postures used (chance level would be 14%).
Table 2: Postures and their main interpretations. "None" indicates that the question was left unanswered.
4.3 Perceived Valence
In order to investigate how the blended postures were perceived, the different key poses were compared in pairs using two-way repeated measures ANOVAs.
As expected, Happiness was perceived as significantly more
positive (Valence) than Sadness (F(1,22)=69.51, p<0.01, Partial
Eta Squared=0.76) and than Fear (F(1,22)=73.59, p<0.01, Partial
Eta Squared=0.77). Pride was perceived as more positive than
Sadness (F(1,22)=106.55, p<0.01, Partial Eta Squared=0.83) and
than Fear (F(1,22)=164.14, p<0.01, Partial Eta Squared=0.88).
There was no difference between Happiness and Pride
(F(1,22)=0.04, p=0.84, Partial Eta Squared=0.00). Similarly,
there was no difference between Sadness and Fear (F(1,22)=0.68,
p=0.42, Partial Eta Squared=0.03).
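For readers who wish to reproduce this kind of pairwise comparison, the sketch below shows one way to run a repeated measures ANOVA in Python. The paper does not state which software was used; the data frame columns and the placeholder ratings are assumptions.

import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
ratings = pd.DataFrame({
    "participant": np.tile(np.arange(23), 2),            # 23 subjects, 2 poses each
    "pose": np.repeat(["Happiness", "Sadness"], 23),
    "valence": np.concatenate([rng.integers(7, 11, 23),  # placeholder 10-point ratings
                               rng.integers(1, 5, 23)]),
})

# One within-subjects factor (pose) with two levels yields the F(1, 22) contrast.
result = AnovaRM(ratings, depvar="valence", subject="participant", within=["pose"]).fit()
print(result)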
The results of the comparisons made between the different postures are summarized in Figure 4. Overall, the perceived Valence of the 'positive aroused' area (i.e. Happiness) was not affected by the changes in posture, whether elements of Fear or of Pride were added (Figure 4). However, the 'positive non-aroused' (i.e. Pride) area was affected by the changes when elements of Happiness were added (Figure 4).
For the 'negative aroused' (i.e. Fear) and 'negative non-aroused' (i.e. Sadness) areas, the results show that the key poses from the basic set were perceived as the most negative (Valence) ones (Figure 4).
4.4 Perceived Arousal
As expected, Happiness was perceived as significantly more Aroused than Pride (F(1,22)=10.27, p<0.01, Partial Eta Squared=0.32) and than Sadness (F(1,22)=166.84, p<0.01, Partial Eta Squared=0.88). Fear was perceived as more Aroused than Sadness (F(1,22)=47.13, p<0.01, Partial Eta Squared=0.68). Moreover, Pride was perceived as more Aroused than Sadness (F(1,22)=30.55, p<0.01, Partial Eta Squared=0.58). Happiness was perceived as more Aroused than Fear (F(1,22)=5.46, p<0.05, Partial Eta Squared=0.20).
As in section 4.3, the results of the repeated measures ANOVAs are summarized in Figure 5, which shows that perceived Arousal is consistent with the predictions of the model. In other words, 'blending' an aroused emotion with a non-aroused one either decreases the perceived Arousal or does not affect it. Similarly, blending a non-aroused emotion with an aroused one either increases the perceived Arousal or does not affect it (Figure 5). Moreover, for each emotion, there was a decrease in Arousal (significant, or a trend) when it was blended with the neutral key pose.
5. Discussion
5.1 Interpretations
Participants were far better than chance at interpreting the five key poses used as a set. The recognition rates were lower than in [4]. However, this is not surprising, as the questionnaire used in this study had more options and participants watched each key pose only once.
The recognition rates (Table 1 and Table 2) confirmed that it is
possible to interpret emotions displayed by a humanoid robot and
that the lack of facial expression is not a barrier to expressing
emotions.
Figure 4: Results and direction of the two-way repeated measures ANOVAs conducted on Valence (V). (* indicates p<0.05; ** indicates p<0.01.) The positions of the dots are symmetrical for illustrative purposes.
Figure 5: Results and direction of the two-way repeated measures ANOVAs conducted on Arousal (A). (* indicates p<0.05; ** indicates p<0.01.) The positions of the dots are symmetrical for illustrative purposes.
Moreover, the results show that it was possible for participants to successfully recognize the key poses generated by the system. For instance, the key poses created by blending 70%/30% of different
emotions were interpreted in a manner consistent with the primary
emotions being displayed (Table 2). This suggests that it is
possible to create variations of an emotional expression using the
Affect Space while maintaining the way it is perceived. In other
words, this method can be used to automatically generate different
expressions for an emotion.
However, the results suggest that the key poses created by blending emotions at 50%/50% were more difficult to interpret (Table 2). For instance, 50% Happiness 50% Pride was interpreted by 30% of the participants as Neutral and by another 30% as Happiness (Table 2). However, looking at the values of Valence and Arousal, the key pose's position was still consistent with the model (Figures 4 and 5). This was further suggested by the answers to the open question: four participants described the key pose as 'Welcoming', 'Embracing' or 'Wants a hug'. This shows that it was perceived as predicted in Figure 2 (positive, but less aroused than 100% Happiness). Similarly, 50% Fear 50% Sadness was interpreted as Fear by only 35% of the participants (Table 2). Looking at the values of Valence and Arousal, this key pose's position was also still consistent with the model (Figures 4 and 5). This too was suggested by some participants' answers to the open question: the key pose was described as 'Shy', 'Apprehensive', 'Cautious' or 'Reserved'.
The interpretations of the key poses thus suggest that the Affect
Space created can be used to greatly enrich the expressiveness of
the robot. It could also be used to avoid always displaying the
exact same expression for an emotion while still being
understandable.
5.2 Valence
The algorithm did not create 'aberrant' postures. The perceived Valence was always consistent with the emotions being displayed. This confirmed that the interpretations of the emotions were consistent with the intended displays.
However, there were some unexpected results regarding the perceived Valence of the generated negative key poses. The results show that the key poses generated by blending Fear and Sadness were perceived as less negative than the original ones (Figure 4), whereas the model predicted no change in Valence. The key poses 100% Fear and 100% Sadness may have been perceived as extreme occurrences, i.e. prototypical displays, of these emotions (Figures 3A and 3E). This would explain why they were perceived as more negative than the generated ones, which are not prototypical. Nevertheless, the generated key poses were still interpreted as negative. The organization of the Affect Space will be modified to take this into account.
The Affect Space was tested with key poses and it is expected that
movements will further improve the expressivity of the system.
5.3 Arousal
Figure 5 shows that the generated key poses were consistent with the predictions made by the Affect Space. The results show that it is possible to increase or decrease the perceived Arousal by adding elements of an aroused or non-aroused posture. For instance, the key pose 50% Fear 50% Sadness was interpreted as Neutral; it was, however, rated as more aroused than Sadness and less aroused than Fear (Figure 5).
However, the results also show that the anticipated positions of some postures need to be corrected. For instance, 100% Pride was completely misplaced, as it conveyed a higher level of Arousal than anticipated. Because of this, the Affect Space generated for this study did not cover the 'positive non-aroused' area (Figure 2). It will be necessary to complete it with a non-aroused positive posture.
So far, only key poses have been tested, and Arousal is known to be related to the speed of movements [6]. It is therefore expected that the model will benefit from incorporating motions whose speed varies with the robot's Arousal.
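One simple way this could be realized is sketched below, under the assumption of an exponential scaling law relating Arousal to movement duration; neither the law nor the function is specified in the paper.

def movement_duration(base_duration: float, arousal: float) -> float:
    """Scale a movement's duration by arousal in [-1, 1]:
    arousal = +1 halves the duration (faster, more energetic),
    arousal = -1 doubles it (slower, more subdued)."""
    return base_duration * (2.0 ** (-arousal))

print(movement_duration(2.0, 1.0))   # 1.0 s for a fully aroused expression
print(movement_duration(2.0, -1.0))  # 4.0 s for a fully non-aroused one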
6. Application to Human-Robot Interactions
The fact that participants can successfully interpret the emotional postures displayed by a robot suggests that such displays could be used to facilitate human-robot interactions. For instance, the emotional feedback provided by a robot could be used intuitively by humans to establish whether or not an interaction was successful.
This was reinforced by the results of a pilot study in which participants were asked to teach a Nao robot to imitate four distinct movements, based on four different perceptions. Nao sat in front of participants, who used an easily recognizable pink ball to show the robot four different arm movements (moving the right arm up, right arm down, left arm up and left arm down).
Throughout this interaction, the robot was supposed to associate the position of the ball with the appropriate movement to perform. In order to learn the correct associations, the robot only used the rhythm at which participants were changing the ball's position. The experiment investigated whether rhythm can be used as a reward by an autonomous robot during a simple interaction, without any prior knowledge. The underlying mechanism of the learning algorithm is thoroughly described in [7], although there it was only applied to human-computer interaction.
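The gist of a rhythm-based reward can be sketched as follows. This is only a gloss of the idea (the actual learning rule is the one described in [7]); the window size and break threshold are arbitrary assumptions.

def rhythm_reward(intervals, window=5, break_factor=2.0):
    """Positive reward while the human keeps a steady rhythm of ball moves;
    negative reward when the latest interval is much longer than the recent
    average, i.e. the human paused (an implicit 'Not like this!')."""
    if len(intervals) < window + 1:
        return 0.0                       # not enough history yet
    recent = intervals[-(window + 1):-1]
    mean_interval = sum(recent) / len(recent)
    return -1.0 if intervals[-1] > break_factor * mean_interval else 1.0

# Steady interaction, then a long pause after an incorrect action:
print(rhythm_reward([1.0, 1.1, 0.9, 1.0, 1.0, 3.5]))  # -> -1.0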
The rhythm of interaction was chosen because it is a natural
component of every interaction. Its variations convey meaningful
information. For instance, an adult teaching a child how to use a
special toy, or how to play a game, would keep showing how it
can be manipulated and let the child try. If the child performed an
incorrect action, the adult would stop, and could say “Not like
this!”, for instance, and would show the correct action again. This
is an implicit break in the rhythm of the interaction. However, as
natural as this phenomenon can be when adults interact with
children, it is not evident that interacting with a robot will trigger
the same behaviour. Andry et al. also adapted the experiment to a SONY Aibo robot [8]. Their results showed that learning was achievable; however, it was hard to obtain and hard to maintain. These difficulties were partly due to the learning algorithm having a slow memory-consolidation mechanism and an exploration/exploitation trade-off which made the robot try out new actions from time to time. More crucially, the robot did not express any feedback on how it was understanding the human's behaviour, namely the changes in rhythm, which is an important factor in natural interactions.
In order to assess the importance of expressing feedback, a pilot
study with ten participants was conducted. The aim was to assess
whether context would help the recognition of a particular body
posture displayed by the robot, and if the emotional expression
affected participants' behaviour. Therefore, the behaviour of the
robot was modified so that it provided participants with feedback.
If a certain amount of negative reward was experienced by the
robot, it stopped the interaction and displayed a Sad posture for
two seconds. Similarly, if a certain amount of positive reward
was experienced, the robot displayed a Happy posture. The robot
displayed a Bored posture when participants repeated the same
movement time and time again. The postures were indicators of
whether the interaction was successful. In this experiment, the
postures used were not generated by the Affect Space. The body
postures were displayed based on the accumulation of negative or
positive rewards. Moreover, the recognition rates of the body
postures displayed by the robot suggested that they were harder to
identify than the ones used to generate the Affect Space.
Nevertheless, the results show that the recognition rates were
higher when the postures were displayed within the context of the
interaction. These results suggest that the context of the
interaction has a significant impact on the interpretation of a body
posture. Moreover, after the postures were displayed, the behaviour of the subjects who recognized them was altered. Usually, the subjects were surprised at first, wondering why the robot would express sadness (or frustration, or disappointment) at that point in time, and then changed the way they interacted with the robot, leading to more success in the imitation game.
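The feedback rule described above can be summarized by a small decision function. The thresholds below are hypothetical placeholders; the paper only states that postures were triggered by accumulated positive or negative reward and by repetition.

NEG_THRESHOLD = -3.0   # hypothetical accumulated-reward thresholds
POS_THRESHOLD = 3.0
BORED_REPEATS = 5      # hypothetical repetition threshold

def feedback_posture(accumulated_reward, same_move_repetitions):
    """Choose the expressive posture the robot displays, if any."""
    if same_move_repetitions > BORED_REPEATS:
        return "Bored"                  # the human keeps repeating one movement
    if accumulated_reward <= NEG_THRESHOLD:
        return "Sad"                    # displayed for two seconds, interaction stops
    if accumulated_reward >= POS_THRESHOLD:
        return "Happy"
    return None                         # no feedback; keep interacting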
In light of these results, the Affect Space described in this paper could be used as an efficient way of indicating to a human whether an interaction was successful. The robot could enrich the interaction with a wide range of emotional expressions. Using the Affect Space, the robot could display blends of emotions specific to an interaction (rather than only Sadness/Happiness). For instance, when the robot learned to perform a new movement, it could display a posture expressing Pride as well as Happiness. However, it is necessary to formally assess this medium in the context of an interaction.
7. Conclusion
The results show that it is possible to interpret key poses generated by the Affect Space. This suggests that the approach can be used to enrich, at a low cost, the expressiveness of humanoid robots. However, the exact positions in the Affect Space of the generated expressions still need to be assessed more precisely.
The system can generate animations 'on the fly'; however, the interpretations of such animations remain to be tested. The next research step is to use this approach to create animations. The final
version will consider acceleration and curvature, as it has been
established that these parameters are related to arousal and
valence [6]. These additions should improve the expressiveness
of robots.
Moreover, the overall purpose of communicating the emotional state of the robot is to facilitate interactions. The effectiveness of the Affect Space will be assessed during real-time interactions. The evaluation will consider the recognition of the postures being displayed as well as their effect on the interaction. It is expected that widening the range of emotional expressions of the robot will help human partners interact with it intuitively.
8. ACKNOWLEDGMENTS
The authors would like to thank the School of Creative Technologies, University of Portsmouth, for hosting the experiment.
This work is partly funded by the EU FP6 Feelix Growing project
(grant number IST-045169), and partly by the EU FP7 ALIZ-E
project (grant number 248116).
9. REFERENCES
[1] Breazeal, C. Designing Sociable Robots. Intelligent Robotics & Autonomous Agents series. MIT Press, 2002.
[2] Russell, J.A. A circumplex model of affect. Journal of Personality and Social Psychology, 39 (1980), 1161-1178.
[3] Aldebaran Robotics. http://www.aldebaran-robotics.com/. 2010.
[4] Beck, A., Cañamero, L. and Bard, K. Toward an affect space for robots to display body language. In Proceedings of the IEEE International Symposium on Robot and Human Interactive Communication (Ro-Man 2010).
[5] Gillies, M., et al. Responsive listening behavior. Computer Animation and Virtual Worlds, 19 (2008), 579-589.
[6] Saerbeck, M. and Bartneck, C. Perception of affect elicited by robot motion. In Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction (HRI 2010), Osaka, 2010, 53-60.
[7] Andry, P., Gaussier, P., Moga, S., Banquet, J.P. and Nadel, J. Learning and communication via imitation: an autonomous robot perspective. IEEE Transactions on Systems, Man, and Cybernetics, 31, 5 (2001), 431-442.
[8] Andry, P., Garnault, N. and Gaussier, P. Using the interaction rhythm to build an internal reinforcement signal: a tool for intuitive HRI. In Proceedings of the Ninth International Conference on Epigenetic Robotics, 2009.