DogMate: Intent Recognition through Anticipation
Eurico Jose Teodoro Doirado
Dissertation for obtaining the degree of Master in Information Systems and Computer Engineering
Jury
President: Dr. Ana Maria Severino de Almeida e Paiva
Supervisor: Dr. Carlos Antonio Roque Martinho
Members: Dr. Manuel Joao da Fonseca
Dr. Pedro Alexandre Simoes dos Santos
November 2009
Acknowledgements
A huge thank you to my parents, who for years and years put up with me, helped me
and made it possible for me to get this far.
I thank my sisters, Alexandra and Nadine. Forgive me, Nadine, you were the one
who suffered the most through all of this.
I thank my colleagues at GAIPS, in particular Guilherme and, above all,
Professor Carlos Martinho, my supervisor.
I thank my training partners, my friends and those who are both, the
"four others": Dante, Xana, Vlad and Mestre.
Finally, I want to leave a very special thank you to the person I will have to call
my best friend, for lack of a better term. There are no words to describe what
you did for me.
There really are fantastic people in this world. Thank you all very much.
Porto Salvo, 21 October 2009
Eurico Jose Teodoro Doirado
to You
Resumo
Sidekicks tend to have difficulties understanding their partners, namely the
player's avatar, which has a negative impact on their believability. In this work,
we present a method to detect some of the player's intentions. We begin by
analysing the question of believability in synthetic characters. We review related
work that focuses on improving character believability through anticipatory
techniques, namely by anticipating the actions of other characters. With the aim
of understanding the reason behind these actions, we propose a model that relates
intentions and actions, and proceed with the elaboration of a framework that
interprets the intentions behind an action based on an anticipatory mechanism. We
implement this framework in a demonstrator whose purpose is to control the
behaviour of the player's sidekick. Finally, we present and analyse the three
aspects addressed in our tests: the recognition of the player's intentions, their
interpretation and the efficiency of our solution. The results indicate that our
solution can be used to detect some of the player's intentions and that, in those
cases, recognition is similar to that of a human observer, with no significant
impact on performance.
Abstract
Sidekicks generally lack the support to understand a fundamental character in their world
— the player’s avatar. This has a negative impact on their behavioural believability. In
this work, we present an approach to detect the intent underlying certain actions of
the player. We begin by introducing synthetic characters and sidekicks in particular,
emphasising their believability. We then review related works which enhance the
believability of characters by anticipating the actions of other characters. To understand
the reason underlying their actions, we model the relation between intention and action
and then proceed to elaborate a framework that can interpret the intent of an action
based on an anticipatory mechanism. We then present a test-case in which our framework
is used to create an architecture controlling the sidekick of the player. Finally, we discuss
three aspects of our evaluation: the player’s intent recognition, its interpretation and the
solution’s efficiency. Our results suggest that our solution can be used to detect certain
intents and, in such cases, perform similarly to a human observer, with no impact on
computational performance.
Keywords
Intentions
Actions
Synthetic Characters
Videogames
Contents
1 Introduction
1.1 Motivation
1.2 The Problem
1.3 Contributions
1.4 Outline
2 Believable Characters
2.1 Synthetic Characters
2.2 Dimensions of a Character
2.3 Interactive Characters
2.4 The Sidekick
2.5 Summary
3 Related Work
3.1 The SOAR Quakebot
3.2 Situated Speech
3.3 Temporal Causality
3.4 Aini — The Synthetic Flower
3.5 Summary
4 Intentions and Actions
4.1 Intentions in Philosophy of Mind
4.2 Intentions in Computer Science
4.3 The Player and the Player-Character
4.4 Model of Intentions
4.5 Components of an Action
4.5.1 Scenario
4.5.2 Movement
4.5.3 Target
4.5.4 Intent
4.6 Summary
5 DogMate
5.1 Emotivector
5.2 Predictors
5.3 Relevance
5.4 Affective Appraisal
5.5 DogMate's Flow
5.6 Summary
6 Test-Case: K9
6.1 The Story
6.2 The Gameplay
6.3 Technological View
6.4 Rusty's Body
6.5 Interacting with DogMate
6.6 Summary
7 Evaluation
7.1 Experiment
7.2 Intent Recognition
7.3 Intent Interpretation
7.4 Limitations
7.5 Computation Impact
7.6 Summary
8 Conclusions
8.1 Concluding Remarks
8.2 Future Work
List of Figures
1.1 Sonic evolution.
2.1 CG films visuals.
2.2 Rusty, our synthetic character.
2.3 The player-character and Rusty, his sidekick.
3.1 Quake 2.
3.2 Duncan, the Highland Terrier.
3.3 Aini, the synthetic flower.
4.1 Model of intentions.
4.2 The entities of our scenario.
4.3 Rusty observes an unexpected moment.
4.4 Alone, the distance is not enough.
5.1 The emotivector.
5.2 Emotivector's affective sensations.
5.3 Rusty, the observer.
5.4 Problem of a normalized range.
5.5 Rusty's interpretation.
5.6 DogMate's update flow.
6.1 Ralph, listening to the radio.
6.2 The Chinese commando.
6.3 Three Dog kills the robot medic.
6.4 Ralph uses Rusty's senses.
6.5 The G.E.C.K., Fallout 3 main modding tool.
6.6 Editing the navmesh.
6.7 Creating a character.
6.8 Rusty's four animations.
6.9 Rusty's thoughts.
6.10 Interaction between K9 and Rusty's brain.
7.1 MOJO: a videogames exposition and demonstration.
7.2 The player is strafing.
7.3 Which distance?
7.4 The problem of avoiding an entity.
7.5 DogMate computational impact.
8.1 Filtering intents.
List of Tables
3.1 Comparative analysis of the presented works.
5.1 The nine salience patterns from two velocity-emotivectors.
5.2 Affective appraisal.
6.1 The animations and subtitles for each intent.
7.1 Matches between Rusty's interpreted intent and the player's intent.
7.2 Matching external observer's interpreted intent and Rusty's recognized intent.
7.3 Matching observer's recognized intent and player's intent.
Chapter 1
Introduction
1.1 Motivation
In the last few years, the graphical quality of synthetic characters has improved tremen-
dously. Despite the fact that interactive applications and videogames in particular offer
stunning visuals, approaching photo-realism in some cases, characters with human-like
behaviour are still nowhere to be found. While graphics have improved continuously and
rapidly, synthetic characters' behaviour has not evolved at the same rate. Two decades
ago, when characters were but a few pixels on the screen, they could already express a
wide range of behaviours. Who cannot remember Sonic the Hedgehog1 tapping his foot
on the ground, expressing his impatience? Or the angry Donkey Kong2 grinning at the
"poor" plumber who was trying to rescue his love, Pauline? While their expressions were
varied, and are even more so nowadays, they were only superficial behaviours. And we
can say that this has not changed much since.
Figure 1.1: From few pixels to stunning visuals, Sonic evolution in a few years.
Imitating human behaviour cannot be reduced to reproducing a fixed set of
animations. Animations (including sound and dialogue) would be the final step in the
long process necessary to emulate such behaviour and express it. Characters need the
right support to express themselves consistently in different contexts without adopting a
behaviour that would look out of place. This support can be based on events (e.g. the
1 Sonic Team (1993): Sonic the Hedgehog. http://www.sega.com/sonic/
2 Nintendo (1981): Donkey Kong. http://en.wikipedia.org/wiki/Donkey_Kong_(video_game)
player triggers a dialogue with another character and the character talks back), on the
character’s personality (e.g. the character is a coward and runs away from any dangerous
situation) or even on its relationship with another being (e.g. the player’s avatar and his
sidekick are comrades; thus, the sidekick will always help that character). Nevertheless,
such examples only constitute a small part of the support a character might need to
simulate the traits of human beings. This highlights the gap between the progress
achieved in characters' behavioural and graphical components (for the latter, we
sometimes have difficulty telling computer graphics apart from reality). Even if the
academic community has, over time, tackled a large variety of problems directly
related to the enhancement of believability (encompassing the previous topics and many
more, like emotions, psychological modelling or even cultural roles), game developers have
yet to incorporate the work of these researchers into their productions.
Videogames still rely heavily on scripts to define the behaviour of their characters and,
as time goes on, those characters tend to be perceived as flat and lifeless. If we were to
meet a stranger in the street and ask him the same question a thousand times, he would
probably start by ignoring us and thinking we are crazy. Eventually, if we insisted, the
"poor" guy would probably begin to feel irritated and ask us to stop bothering him. If we
were to continue, he would probably get angry and things would look bad for us. However,
in most videogames, if we were to ask a character something a thousand times, it would
probably answer us the same way every time. Fallout 33 is an exception, in that it does
display such behaviour.
The problem deepens if we look at other traits like the character’s social status, or
its relationship with another character. In the latter, for instance, there is a character
that suffers tremendously from the lack of support to develop its natural relation with
the player’s character. This character is the player’s sidekick. The sidekick accompanies
the player’s avatar the most. Being almost always together means that both should know
each other relatively well. But then, why after so many adventures can it not guess the
player's preferences? If we spend a lot of time with another being, we naturally begin to
understand him better. Over time, we guess much more accurately how he would react
and feel in different situations. We can judge and interpret his overall intentions much
better than those of someone we have never met before. This, however, does not happen
with sidekicks. Their interaction with the player at the beginning and at the end of the
game remains mostly the same, which does not contribute to the character appearing
believable [33].
3Bethesda Game Studios — Fallout 3 (2008): http://fallout.bethsoft.com/.
1.2 The Problem
As the player advances through the game, he gradually learns more about his sidekick
companion. It can be through the story or, indirectly, through their long and close
relationship. He starts to expect some kind of attitude from the sidekick, some kind of
actions in specific situations. The player begins to understand the how and why of his
companion’s behaviour, or in other words, he begins to understand the intentions behind
its actions. Sidekicks, however, spend the exact same amount of time with the player’s
avatar, yet their behaviour does not reflect it. The underlying intention of the player’s
actions does not seem to matter, as the sidekick ignores it. Understanding another being's
intentions is a natural part of human cognitive capacities, and may reflect the closeness
of two individuals. Dennett's intentional stance [8] tells us that it is an important tool for
understanding our surroundings and anticipating how everyone around us will behave.
In a videogame, it is even more important to understand the intentions behind the
actions of another character. In fact, it may open many doors to new kinds of behaviours,
only limited by the designer’s imagination. A character, by understanding the intentions
of another being, can reflect in his own actions a certain sense of importance and pres-
ence through pro-activeness and anticipation. However, we cannot discard an important
fact: videogames are real-time applications which already use the current generation of
hardware to its limit. To be usable in such conditions, intent detection needs to
happen in real time and be almost costless.
As such, the work presented in the remainder of this document addresses the problem
that sidekicks face when accompanying the player's avatar. We focus on detecting the
player's avatar's actions and analysing their underlying intention, from the point of view
of the sidekick, in a lightweight, real-time manner. Our solution for this problem is based
on the hypothesis that: by anticipating the player's avatar's actions and confronting
them with his real actions, we can understand some of his intentions.
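This hypothesis can be illustrated with a minimal sketch (our own hypothetical simplification, not the actual DogMate implementation; all names, and the constant-velocity predictor, are assumptions made for illustration): a naive predictor extrapolates the avatar's motion, and any deviation of the observed position toward a candidate target, relative to that prediction, is read as a cue to the player's intent.

```python
import math

def predict_next_position(pos, velocity, dt=1.0):
    """Naive constant-velocity prediction of the avatar's next position.
    (Illustrative only; the thesis relies on an anticipatory mechanism,
    the emotivector, rather than this exact predictor.)"""
    return (pos[0] + velocity[0] * dt, pos[1] + velocity[1] * dt)

def guess_intended_target(observed_pos, predicted_pos, targets):
    """Confront the observed move with the anticipated one: the candidate
    target the avatar approached more than expected is taken as a cue to
    the player's intent. Returns None when no target stands out."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    best, best_gain = None, 0.0
    for name, target_pos in targets.items():
        # Positive gain: reality brought the avatar closer to this
        # target than the prediction did.
        gain = dist(predicted_pos, target_pos) - dist(observed_pos, target_pos)
        if gain > best_gain:
            best, best_gain = name, gain
    return best
```

For example, an avatar predicted to keep moving straight ahead but observed veering toward a door would yield "door" as the cued intent.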
1.3 Contributions
In this work we propose a model of intentions that we use to detect actions and their
underlying intents in videogames. We use and remodel an anticipatory mechanism, the
emotivector, to suit certain aspects of interactive environments such as videogames and
to support an objective comparison between multiple expectation violations. We present a
framework that analyses the player's avatar's actions to extract possible intentions from
them and interprets them from the perspective of the sidekick. Its purpose is to serve as a
support mechanism enabling an adequate behaviour for the sidekick, a behaviour that
would match the situation that the player is currently experiencing. This framework is
built taking into consideration the performance constraints peculiar to videogames. As such,
we built a solution that works in real-time and that can be implemented with no noticeable
impact on the usage of hardware resources.
1.4 Outline
This document is organised as follows. In chapter 2, we gradually introduce the notions
that constitute a character and the difficulties it faces to achieve believability in an
interactive environment. We explore the sidekick, its role and its problems in a videogame.
Chapter 3 presents related work that contributes to the problem of believability by using anticipation
to predict the actions of another character. In chapter 4, we model the relation between
intention and action, and discuss how action can be divided into movement, target and
intent. We then discuss how each concept can be applied in the context of virtual worlds.
Afterwards, in chapter 5, we propose an anticipation-based framework (DogMate) to
detect, from a character’s perspective, the intent underlying some of the player’s avatar
actions. The following chapter (6) describes our test-case (K9) that exemplifies how
our framework can be used to control the behaviour of a synthetic character (in this
case, Rusty, the dog companion of the player’s avatar). The evaluation of our test-case
follows in chapter 7, focused on three aspects: intent recognition, intent interpretation
and the solution's efficiency. Finally, we conclude in chapter 8 and present a few directions
to further pursue this work.
Chapter 2
Believable Characters
Achieving believability with synthetic characters is still an open problem. Many researchers
focus their efforts on advancing toward that goal and, while noteworthy progress has been
achieved, we have yet to see a character that could be mistaken for a human being in an
interactive environment.
The field of synthetic characters covers broad topics from diverse fields such as com-
puter science or psychology, but also from artistic areas. The latter represent the
component of synthetic characters which has evolved the fastest. By looking at
computer-generated films such as the recent Kung Fu Panda1 (fig. 2.1(a)) or Monsters vs.
Aliens2 (fig. 2.1(b)), we can safely say that for a character to become visually believable in
an interactive environment, it is now a matter of processing power, i.e. a matter of time.
(a) Kung Fu Panda. Copyright © DreamWorks Animation LLC, All rights reserved.
(b) Monsters vs. Aliens. Copyright © DreamWorks Animation LLC, All rights reserved.
Figure 2.1: Computer-generated films achieve a stunning visual quality.
1 DreamWorks Animation LLC, Kung Fu Panda (2008): http://www.kungfupanda.com/.
2 DreamWorks Animation LLC, Monsters vs. Aliens (2009): http://www.monstersvsaliens.com/.
Comparatively, synthetic characters' behaviour has several noticeable shortcomings.
Typically it is their behaviour that makes characters look less believable, and it is rather
easy to observe. Earlier, we illustrated one such shortcoming: the repetitive behaviour
characters display when the player interacts with them too many times in a row. Sometimes,
it feels as if the character has almost forgotten that we talked to him earlier.
In our research group, GAIPS3, we focus on building synthetic characters that
display a behaviour that humans would like to interact with. Narrowing it down, in this
work we propose a mechanism to support the enhancement of characters' behaviour in
videogame environments. But what elements does a synthetic character possess? Also,
why are videogames such a special environment for synthetic characters? And, most of
all, how can we achieve believability in such an environment? The rest of this chapter is
dedicated to understanding the fundamentals of synthetic characters so that these
questions can be answered.
2.1 Synthetic Characters
Studies on character believability started a long time ago. When the first cartoons
appeared, this was already a major concern. In 1981, Thomas and Johnston published
a book called "The Illusion of Life" [32], in which they discussed techniques for giving
life to hand-drawn characters. Their objective was to permit the viewers' suspension of
disbelief or, in other words, make them forget that what they were seeing was not real. It
has been a starting point for computer scientists to initiate their journey into the complex
problem that is giving virtual characters a life of their own.
Synthetic characters, just like hand-drawn characters, can be divided into two funda-
mental parts. Metaphorically, we can think of them as the body and the mind. On the one
hand, we have the external appearance, including sounds and animations. On the other, we
have the internal states, including personality and feelings. The character we build through-
out this work is a dog, Rusty, created using Fallout 34 and its modding framework. Rusty
can bark loudly and wag his tail in happiness (fig. 2.2). These are part of his physical
attributes. But Rusty is also the player's trustworthy companion and tries to help him
to the best of his abilities — by guiding him, counselling him or even fighting along with
him. This results from his current internal attributes.
Both the body and the mind need to be consistent with each other for the character
to appear believable [27]. Back in 1992, Bates recognised that too much focus was given
to the visual appearance [2], a fact that still persists today. Breaking the equilibrium
between the two parts makes efforts to achieve an overall believable character counter-
productive [33]. While this problem can be concealed for non-interactive synthetic
3 Intelligent Agents and Synthetic Characters Group: http://gaips.inesc-id.pt/gaips/.
4 Bethesda Game Studios — Fallout 3 (2008): http://fallout.bethsoft.com/.
Figure 2.2: Rusty, our synthetic character, is the player’s trustworthy companion.
characters (e.g. characters that appear in computer-generated films), once the user has
the power to interact with characters, it is only a matter of time before their behaviour
breaks the suspension of disbelief [1]. But why is behavioural believability so hard
to achieve? We now analyse the dimensions behind a character’s behaviour and try to
understand their inherent complexity.
2.2 Dimensions of a Character
Sheldon [31] divides a character’s behaviour into a sociological dimension and a psycholog-
ical dimension. The character’s sociological component defines how the character came to
be what it currently is. It takes into consideration its growth, its history, its culture and
even its living environment. All these help to define how a character interacts with
others and how its behaviour might be perceived by them. The psychological dimension
can be thought of as the character's mind. We can consider it the main justification for
any current action made by the character: the strongest reason why the character
would take an action in the first place lies in its psychological attributes.
On the other hand, Rollings and Adams [29] prefer to take into consideration the
character’s emotions. They state that a character’s dimension can range from being
zero-dimensional to three-dimensional. Zero-dimensional characters are those with fixed
emotions but no variability between them. They have emotional states from which they
switch unrealistically (e.g. an enemy switching from killing intent to fear when he decides
to flee). One-dimensional characters make this transition smoother. They have a fixed
axis on which their state slowly progresses. Two-dimensional ones are much the
same but vary on more than one axis, giving rise to more complex behaviours. However,
two-dimensional characters do not have any internal conflict between their axes.
James Bond is an example of a two-dimensional character. He can love and he can be
dishonest. He can even do both at the same time, but his impulses will not conflict, nor will
he have any emotional ambiguity; he would do anything without remorse for Britain's
sake. Finally, we have humans and their natural characteristics such as inner conflicts
(e.g. hating and loving the same person), doubts and complex behaviours, which is also
the definition of three-dimensional characters. Back in 1994, Bates was already defending
that an emotionless character is a lifeless one [3]. He argued that if a character does not
react emotionally to what is happening, if he does not care, then neither will the viewer.
So far, we have talked about sociology, psychology and even emotions. By itself, each
of those words represents a broad and complex area. Yet, as said earlier, if believable
characters are present within computer-generated films, it must mean that behavioural
believability has already been achieved. So why is there still no believable character in
videogames? The problem might be related to user interaction.
2.3 Interactive Characters
Rollings and Adams [29] defend that a believable character should possess three important
characteristics. The character should intrigue the user. It should get the user to like him.
And, finally, the character should change and grow according to its experience. This last
affirmation is what makes it so hard for a character that appears believable in a film to
also appear that way in a videogame. In a film, everything that is going to happen is
known beforehand. The character's reactions, its relationships or even its growth, all
these little details are known and set in stone. In a videogame, or any other interactive
application for that matter, we cannot anticipate the full range of actions that the player
(the user) may take, nor their impact on the characters. With the introduction of interactive
applications, synthetic characters received a whole new set of problems.
Based on the work on the “Illusion of Life” [32], Bates defined a believable agent
as an interactive character that does not disrupt the user’s suspension of disbelief [1].
He continues, saying that in an interactive world, any user will want to be able to do
anything he is allowed to do. Only then will the user feel that the world is a
believable one. For such a believable world to exist, an interactive character should
be able to react to every action that the user makes. Visually, there has been tremendous
progress toward that end. However, for this to happen behaviourally, every one of the
aforementioned dimensions should evolve according to the current user interaction.
Yet, Bates also states something that, at first glance, seems contradictory. He says that
a character does not need to have a complex behaviour to appear believable. He clarifies,
saying that the character should merely appear that way to the user [4]. Quoting
him, "an agent that keeps quiet may appear wise, while one that oversteps its abilities
may destroy the suspension of disbelief". Loyall [18] also argues in the same direction. He
states that it is not necessary for a synthetic character to be fully realistic. Instead, fine
details already have a great impact on how the user may perceive the situation. Indeed,
if the character, at some point, is able to display specific and context-aware behaviour, it
could be enough for the viewer to portray it as having a unique personality, a
distinct behaviour, a life of its own.
A good example is the work of Martinho and Paiva [21]. There, a synthetic char-
acter tries to anticipate what the user will do and reacts emotionally, whether it is
disappointed or excited by what really happened. Although anticipation has rarely been
considered when creating believable agents [22], they show how powerful it can be in
producing the fine details that synthetic characters need to feel alive. And this is what
it is all about: creating small tricks that make the viewer forget that the character is a
synthetic one.
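That anticipatory reaction can be sketched in a few lines (a hypothetical, heavily simplified reduction of the emotivector idea; the actual model in [21] is richer): a predictor commits to an expectation, and the mismatch between the expected and the sensed value is appraised as a rewarding or punishing surprise whose salience grows with the prediction error.

```python
def appraise(expected: float, sensed: float, desired: float):
    """Classify a prediction mismatch as a rewarding or punishing
    surprise, with a salience proportional to the prediction error.
    (Hypothetical simplification of an emotivector-style appraisal.)"""
    salience = abs(sensed - expected)
    if salience == 0:
        return ("as-expected", 0.0)  # prediction confirmed, no surprise
    # The surprise is rewarding when reality landed closer to the
    # desired value than the prediction did, punishing otherwise.
    better = abs(desired - sensed) < abs(desired - expected)
    return ("reward" if better else "punishment", salience)
```

For instance, if the character desires a sensed value of 0, expected 5 but sensed 3, the mismatch is a rewarding surprise of salience 2; sensing 8 instead would be a punishing surprise of salience 3.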
The work of Tomlinson et al. [33] gives a few guidelines to help build such mechanisms. They
identified six aspects necessary to create a believable interactive character, which sum-
marise what we have presented until now. First, they defined several kinds of interactions
that a synthetic character should possess. They defend that, initially, it should be aware
of the user and try to establish a long-term relationship with him. Secondly, they say
that if there is something that can be done in the world then the user would possibly want
to do it. In consequence, the character should be able to react accordingly. The third
aspect is about how the character’s behaviour should be rich and varied enough, so that
users do not feel bored while looking at it. The fourth aspect is related to the character's
growth according to its experience and how it is reflected to the user. The fifth one is
about how the character is used. It should be in a context in which the user can form
expectations and assumptions about what is going on. This can possibly be achieved by
making allusions to existing media, such as using cinematographic clichés. Finally, they
argue that a character's body and mind should be balanced. One should not raise
expectations that the other cannot match. Therefore, if a character is visually believable,
he also needs to be believable behaviourally. Otherwise, it is better to keep the visual
down to avoid breaking the viewer’s suspension of disbelief.
One fact remains: the more a character interacts with a user, the more its flaws will be
exposed. From minions to arch-enemies, a videogame harbours many kinds of characters,
each one with a different purpose, each one intending to interact differently with the user
(in this case the player). Among them lives a special character, one that by the nature of
its interaction with the player requires special attention to its behaviour: the sidekick.
2.4 The Sidekick
In videogames, sidekicks are a specific kind of synthetic character that accompanies the
player through his adventure. In some games, the player’s companion may stay with him
for more than 100 hours (e.g. Fallout 3). Their purpose and usefulness vary from game
to game but they always tend to have a crucial role [11]. A sidekick is probably the most
important character besides the player-character and undoubtedly the one which
interacts the most with the player.
The sidekick may be used to deliver story elements by the writer or be integrated within
the game’s mechanics by the game designer [31]. But the sheer amount of interaction
between the sidekick and the player is what makes him so special and, because the
sidekick has a close relationship with the player-character (as depicted in fig. 2.3), it
would be only normal for them to know each other quite well. The player learns more
about the sidekick as the story unfolds, generally culminating in a strong emotional bond
[11], but what about the opposite? And while a no-name character has to appear
believable for a few seconds, a sidekick has to endure hours upon hours of unpredictable
interaction, and has to do so without boring the player [19]. If he is not able to
understand quite a bit more about the player-character after hours of interaction, is that
not a drawback to believable behaviour?
Figure 2.3: The player-character and Rusty, his sidekick.
There are several ways to give the illusion that a character understands the player. One
example would be to comment on the player’s behaviour (e.g. pointing out the risks of
attacking a particular enemy before the player even engages it, so that he can retreat
unharmed). Another would be to pro-actively take the initiative without an explicit
order from the player. Both these examples are based on an independent and
context-sensitive behaviour that would match the player’s intentions. But to implement
such behaviour we, who are responsible for creating it, would need direct access to the
player’s intentions, something clearly not possible with our current technology. Still, this
is the direction we chose to follow and, in the remainder of this work, we focus on
exploring a possible way to understand the player’s intent so that behaviours such as
those given as examples above can be implemented.
2.5 Summary
Throughout this chapter we saw that a synthetic character possesses both a body and a
mind. We saw that it is important to correctly balance the character’s appearance with
its behaviour and that, to appear believable to the user, it does not need to be fully
realistic. Specific and context-aware details are enough to give the user that sensation.
We saw the problems that arise from its interaction with a user, and we gave an
emphasis to characters that interact with a user for a long period, such as sidekicks. In
these cases, we should expect a deepening relationship between the user’s avatar (i.e. the
player-character, in videogames) and the character and, to that end, it seems important
for sidekicks to be able to understand the player-character’s intentions.

In the next chapter we shift our attention toward this problem and analyse several
works which use anticipation to enhance the believability of synthetic characters by
predicting the next actions of other characters.
Chapter 3
Related Work
Anticipation has been used in several works to enhance a synthetic character’s
behaviour, allowing, for instance, the modelling of emotional reactions (e.g. being
surprised or confused [13, 20]) or the adoption of context-dependent strategies, such as
in the SOAR Quakebot [14]. However, it was identified by Isla et al. [12] as a key area
that still needs more attention when creating artificial intelligence for videogame
characters.
At its base, anticipation permits us to formulate an opinion about how and when we
think an event may occur. Although we are interested in analysing the intentions of a
character when it performs an action, our first step is to actually detect when an action
is performed. The nature of anticipation also provides another important tool: the
capacity to confirm whether our guess was correct, i.e. an expectation confirmation or
violation [13]. These confirmations and violations already serve an important purpose in
permitting the modelling of several emotions [20], but they may also serve as a basis to
understand the reason why an action was, or was not, performed.
Throughout this chapter, we present works that use anticipation to enhance the
behaviour of a synthetic character by predicting the future state of its environment. We
begin by presenting the SOAR quakebot, which takes inspiration from the Theory of
Mind to anticipate the enemy’s decisions. Then, we look at a system that detects the
player’s actions (and his possible intentions) using a notion of plan and uses them to
disambiguate speech-based orders. Afterwards, we review a system able to model
temporal causality. Finally, we review Aini, the synthetic flower, which uses anticipation
to guide the player into solving a task and reacts emotionally based on his performance.
3.1 The SOAR Quakebot
The SOAR — State Operator And Result, a symbolic cognitive architecture [15] —
quakebot anticipates the behaviour of other characters by taking an approach similar to
a psychological theory about the human mind, the Theory of Mind. Theory of Mind
studies the human ability to explain and predict both one’s own actions and those of
others [7]. In other words, it gives a possible explanation of our mental processing and
how it influences our behaviour. Multiple Theory of Mind theories exist but they can
usually be divided into two groups, the Theory theories and the Simulation theories
(refer to Theories of Theories of Mind [7] for more details). The latter defends that we
understand others by simulating them, and their states, within ourselves, as if we were
them. Anticipation was introduced into the SOAR architecture using a similar paradigm.
Laird made the agent simulate that it is another character, consequently giving it the
capacity to predict its enemy’s behaviour [14].
Initially, the SOAR quakebot was built with the objective of controlling a single
character as if it were a human playing in a Quake deathmatch (see fig. 3.1). It works
independently of the level, as it builds an internal map, in its working memory, at
run-time. Alongside, it stores useful information about the game’s items, such as health
packs and weapons’ respawn points, as well as the agent’s own information (e.g.
perceptions, knowledge). Based on these, it applies the operators — primitive actions
that can be applied to the world — whose preconditions match a certain state in the
working memory (i.e. in a reactive manner). The complete update cycle starts by sensing
the world, including the agent’s internal state, thereby updating its working memory.
Then, it checks for valid operators, selects the one to apply, and transfers the
corresponding command to the game engine to be executed.
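This update cycle can be sketched as follows. The operator names, preconditions and working-memory fields below are illustrative assumptions for the sake of the example, not taken from the actual SOAR quakebot.

```python
# Minimal sketch of a SOAR-like reactive decision cycle (illustrative only).
# Each operator pairs a precondition on working memory with a command for the
# game engine; the cycle matches preconditions and selects an operator to apply.

def make_operators():
    return [
        # (name, precondition on working memory, engine command)
        ("get-health", lambda wm: wm["health"] < 30 and wm["health_pack_known"],
         "goto-health-pack"),
        ("attack",     lambda wm: wm["enemy_visible"], "fire-weapon"),
        ("explore",    lambda wm: True, "wander"),  # fallback operator
    ]

def decision_cycle(working_memory, operators):
    """One update cycle: sensing already filled working_memory; match, select, act."""
    for name, precondition, command in operators:
        if precondition(working_memory):
            return name, command
    return None, None

# Low health and a known health pack: the health operator fires first.
wm = {"health": 20, "health_pack_known": True, "enemy_visible": True}
selected, command = decision_cycle(wm, make_operators())
```

Priority here is simply list order; the real architecture resolves operator selection through its own preference mechanism.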
Figure 3.1: A screenshot of Quake 2.
Anticipation was plugged in as an operator, i.e. it is triggered by the desire to predict an
enemy’s behaviour. When selected, it predicts an enemy’s behaviour by creating an
internal representation of that enemy’s known information, structurally similar to the
agent’s own information (e.g. current position, health, weapons). The agent then tries to
apply its own tactics to this set of information. As in the Simulation Theory [7], the
agent uses what it knows about its enemy and simulates what it would do in the enemy’s
place. Thus, the agent can predict the enemy’s action by looking at the final output.
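The essence of this simulation paradigm is that the very same decision procedure is reused on the enemy's state model. A minimal sketch, with invented state fields and tactics:

```python
# Anticipation by simulation (illustrative sketch): the agent applies its own
# tactics, unchanged, to an internal model of the enemy's known state.

def own_tactics(state):
    """The agent's own decision rule."""
    if state["health"] < 25:
        return "retreat-to-health-pack"
    if state["has_powerful_weapon"]:
        return "attack"
    return "collect-weapon"

def predict_enemy_action(enemy_model):
    # Simulation Theory: put ourselves in the enemy's place and see what
    # we would do; the output is our prediction of the enemy's next action.
    return own_tactics(enemy_model)

enemy_model = {"health": 80, "has_powerful_weapon": False}
predicted = predict_enemy_action(enemy_model)
```

The drawback discussed next is visible in the code: `own_tactics` is applied to everyone, so any enemy whose real strategy differs will be mispredicted.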
This, however, has the drawback of assuming that the SOAR quakebot and its enemies
share the same tactical strategy (e.g. which weapon to use, when to look for a health
pack, how to ambush). This gets especially problematic if the enemy is a human player,
as two players would rarely make the same tactical decisions. The situation could
improve if the bot could learn the enemy’s strategy (although it would have to take extra
care to avoid learning the wrong things). Yet, even with a good model of the enemy,
another problem persists — this approach is limited by the model itself. The model only
permits the agent to anticipate decisions previously included in it as tactical decisions.
From reactive rules to plans, these decisions can be as simple or as complex as we want
them to be. They may reflect how a character usually proceeds in the game (e.g. do the
first action, then go somewhere else to do the second one, and so on). This is also how
the next work we present approaches the detection of the player’s intention, i.e. using a
plan.
3.2 Situated Speech
For communication, natural language seems the most practical medium. However, it has
yet to become the main communication medium in videogames. While some games, like
Facade1 [23, 24], have made a huge step toward the use of natural language, most still
rely on interface commands to let the player express himself.
Gorniak and Roy approached this problem using speech recognition. Speech intro-
duces an additional complexity (e.g. the noise in the utterance) not present in a textual
interpretation but, as they point out, it is also more natural for a player to use his hands
to play and his voice to communicate [10]. To understand the meaning behind the player’s
speech, they analysed it against two models.
First, to understand which object a word refers to, they implemented a physical model.
For instance, when the player refers to an object by its name, he has a specific object in
mind, mostly determined by its spatial location, which is not reflected by his speech
alone (i.e. multiple objects might have the same name).
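A toy version of such a physical model might resolve an ambiguous name to the spatially closest matching object. The entity names and positions below are invented for illustration; Gorniak and Roy's actual model is richer than this sketch.

```python
import math

# Toy referent resolution (illustrative): among objects sharing the spoken
# name, pick the one closest to the player, since speech alone cannot
# disambiguate between same-named objects.

def resolve_referent(spoken_name, player_pos, objects):
    candidates = [o for o in objects if o["name"] == spoken_name]
    if not candidates:
        return None
    return min(candidates, key=lambda o: math.dist(player_pos, o["pos"]))

objects = [
    {"id": "door_1", "name": "door",  "pos": (0.0, 5.0)},
    {"id": "door_2", "name": "door",  "pos": (12.0, 1.0)},
    {"id": "chest_1", "name": "chest", "pos": (2.0, 2.0)},
]
# Two doors share the name "door"; spatial proximity picks door_1.
ref = resolve_referent("door", player_pos=(1.0, 1.0), objects=objects)
```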
The second model is an intentional one. Its purpose is to understand a possible intention
behind the player’s speech, also to disambiguate it. The same action might be referred to
by diverse utterances and there is a need to understand their fundamental meaning. If
the player asks to open the door or to let him get out of his room, both refer to the same
intention of having the door opened. This example also reflects that the player might not
always express his intentions at the same level of abstraction. Thus, to cope with this
situation, they implemented a predictive grammar in which symbols correspond to
possible actions for the player. It determines, probabilistically, which action the player
will most likely do next, and uses it to disambiguate the utterance. To construct such a
grammar, they take into consideration the possible ways to achieve a particular goal and
their respective variations.
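The disambiguation step can be sketched as combining the acoustic ambiguity of the utterance with action probabilities from such a grammar. All probabilities and action names below are invented for illustration.

```python
# Illustrative sketch: a predictive grammar assigns probabilities to the
# player's next action; an ambiguous utterance is resolved toward the
# interpretation that the grammar considers most likely.

def disambiguate(utterance_scores, grammar_probs):
    """Combine speech-recognition scores with the grammar's action probabilities."""
    best, best_score = None, -1.0
    for action, speech_score in utterance_scores.items():
        combined = speech_score * grammar_probs.get(action, 0.0)
        if combined > best_score:
            best, best_score = action, combined
    return best

# Speech alone slightly favours "open-drawer"...
utterance_scores = {"open-door": 0.45, "open-drawer": 0.55}
# ...but the grammar expects the player to head for the door next.
grammar_probs = {"open-door": 0.7, "open-drawer": 0.1}
choice = disambiguate(utterance_scores, grammar_probs)
```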
1Procedural Arts: Facade. http://www.interactivestory.net/.
Because a game defines such goals for the player, and assuming that the player typically
plays the game to complete it (or to explore it) [16], it gives us a basis to understand the
player’s actions and to match them against possible intentions. While useful for broader
and well-structured goals for which a plan can be devised, sometimes the game might
provide (or the player himself may create) optional challenges which are up to the player
to beat or to ignore (e.g. defeat an enemy). Including these optional challenges in the
grammar might give it an unnecessary complexity and not always reflect their true
purpose (e.g. is the player defeating enemies to progress toward the completion of the
game or is he just looking for a rare loot?). Although, sometimes, the intention behind
an action might not be relevant (e.g. the fact that the player is attacking an enemy is
sufficient), it might also hide another intention (e.g. defeating an enemy to obtain a key
relevant for the main challenge).
The next work we present departs from the notion of plan and permits the formulation
of relationships between different events in time by modelling temporal causality.
3.3 Temporal Causality
In his master’s thesis [6], Burke presents a system that understands temporal causality
between previously unrelated events. His work, based on ethological studies, gives the
agent a mechanism to represent cause and effect in time. The agent is able to believe
that, under certain conditions, a stimulus might cause another one to appear — even if
both might otherwise be unrelated.
Nevertheless, to achieve proper results, some pruning and reinforcement are needed.
A trial is started each time a stimulus that might predict another one is received. Suc-
cessfully passing a trial reinforces the belief that certain stimuli cause the appearance of
another one. On the other hand, failure leads to more uncertainty and, ultimately, that
possibility is pruned. This system’s predictors encapsulate both Scalar Expectancy
Theory (SET) and Rate Estimation Theory (RET). SET helps the agent to know
when to expect the effect caused by a certain stimulus. RET, on the other hand, helps
it to judge the stimulus’ reliability when trying to predict another one. Burke also shows
that the agent’s drives can help it to select which action might be the more appealing.
Unifying both drives and predictors, he shows how agents can learn to expect specific
events and how they can even be triggered, out of necessity, by the agent.
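The trial-based reinforcement and pruning described above can be sketched as follows. The update amounts and threshold are arbitrary illustrative values, not Burke's actual SET/RET formulation.

```python
# Illustrative sketch of trial-based causal learning: a predictor linking a
# cue stimulus to an effect is reinforced when a trial succeeds and weakened
# (eventually pruned) when trials fail.

class Predictor:
    def __init__(self, cue, effect, reliability=0.5):
        self.cue, self.effect = cue, effect
        self.reliability = reliability  # belief that cue predicts effect

    def record_trial(self, effect_observed):
        # Success strengthens the belief; failure weakens it more sharply.
        if effect_observed:
            self.reliability = min(1.0, self.reliability + 0.1)
        else:
            self.reliability = max(0.0, self.reliability - 0.2)

    def should_prune(self, threshold=0.1):
        # Persistently unreliable predictors are removed from the model.
        return self.reliability <= threshold

p = Predictor(cue="bell", effect="treat")
for observed in (True, True, True):   # three successful trials
    p.record_trial(observed)
```

After the three successes the predictor's reliability has grown from 0.5 to roughly 0.8, so it survives; a predictor failing the same number of trials would fall below the pruning threshold.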
Burke used this system so that agents could understand their surroundings, which
includes the user’s interactions. The system makes the agent able to analyse and
understand the player’s behaviour. Indeed, while it is not used only to that end, it
detects the player’s action patterns and learns how he tends to act. It has been used to
make Duncan, the Highland Terrier (fig. 3.2), learn when to expect a treat from the
user, for instance. At the same time, Duncan can also use the system to his advantage
by better understanding how his world works.

Figure 3.2: Duncan, the Highland Terrier.
Using this system, we could potentially predict local actions independently of their
broader goals. Nevertheless, if we focus on anticipating the player’s behaviour, this
approach would take some time to build a useful set of patterns and, in the meantime,
the player might change the way he plays. More importantly, we would want to prevent
the system from displaying behaviour based on wrongly learnt patterns (patterns that
might have been learnt out of pure luck, and that might take a while to prune), which
might engender non-believable behaviour.
Another complementary approach that could help to recognise which entity could
interest the player — thus allowing the recognition of local actions independently from a
plan — might lie in emulating his attentional system. If we can recognise where the
player is shifting his attention, then we might also formulate some hypotheses about his
next action. The next section presents how such a system was emulated within a
synthetic character.
3.4 Aini — The Synthetic Flower
Aini is a synthetic flower built into a hangman-like game. While the player uncovers the
unknown word by putting letters into their respective slots, Aini watches the scene and
reacts emotionally based on the player’s performance [21] (see fig. 3.3). Aini was built
on top of emotivectors, low-level anticipatory mechanisms [20] (presented in detail in
section 5.1), that help to control her attentional and emotional behaviour. Aini has a
grid of sensors that covers her vision area and each one reports a normalised distance to
the closest letter it detects.
The emotivector’s exogenous salience helps Aini to detect which letters moved most
surprisingly. It is computed as the mismatch between an object’s expected position and
its actual position. Greater differences are translated as surprising events to Aini, events
to which she will most likely pay attention.

Figure 3.3: Aini, the synthetic flower.

This mechanism was compared against the popular games approach, which, in Aini’s
world, corresponds to giving the attentional focus to the letter closest to her. It was
shown that the exogenous salience helps to select more relevant letters and supports a
behaviour more adequate to what the player is actually doing.
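The mismatch computation at the heart of exogenous salience can be sketched as follows. The sensor data is invented and the real emotivector model [20] involves more than raw prediction error; this is only the core idea.

```python
import math

# Illustrative sketch of exogenous salience: each sensed object carries an
# expected position and an observed one; the largest mismatch (prediction
# error) is the most surprising and wins the attentional focus.

def salience(expected_pos, observed_pos):
    """Prediction error as Euclidean mismatch: bigger error = more surprising."""
    return math.dist(expected_pos, observed_pos)

def focus_of_attention(sensed_letters):
    """sensed_letters maps letter -> (expected_pos, observed_pos)."""
    return max(sensed_letters, key=lambda k: salience(*sensed_letters[k]))

sensed = {
    "A": ((1.0, 1.0), (1.1, 1.0)),   # barely moved: unsurprising
    "B": ((4.0, 2.0), (9.0, 7.0)),   # far from where it was expected
    "C": ((3.0, 3.0), (3.0, 3.5)),
}
focus = focus_of_attention(sensed)
```

Contrast this with the distance-to-character heuristic the text mentions: here the winner is the letter that violated expectations the most, not the nearest one.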
This is important to us as it models Aini’s attentional system. It assumes that
surprising events make strong candidates for the character’s focus of attention, and this
was corroborated by the experiment performed. Detecting a character’s focus of
attention is a first step toward understanding its possible actions, as it narrows down
the list of possible entities the character might wish to interact with.
3.5 Summary
In this chapter we looked at works that focused on improving the believability of
synthetic characters’ behaviour through the use of anticipatory techniques. We also saw
that these techniques could be used to predict a character’s (and thus the
player-character’s) next actions, and in the following table (3.1) we briefly review the
presented works. In it, we analyse the type of actions these systems might be suited to
detect, as well as their respective approaches and some of their limitations.
The SOAR quakebot [14] and the situated speech work [10] seem better suited to
detecting behaviour that is part of a plan, i.e. cases in which we want to predict the
character’s higher-level objectives. On the other hand, temporal causality [6] and Aini’s
attentional system [21] seem most useful in detecting the player’s local actions,
regardless of their possible use at a larger scale.
Detecting actions is just the first step; we still need to understand the reason why they
were performed. In the next chapter, we propose a model that relates the intentions of a
character to the actions it performs in its virtual world.
Work                 Possible use                 Approach                    Limitation
SOAR quakebot        Detect actions that may      Guessing the character's    Assumes that the character has the
                     be part of a strategy/plan.  internal state.             same internal model as the agent.
Situated speech      Detect actions that are      Plan-based probabilities.   Does not account for actions
                     part of a plan.                                          outside the plan.
Temporal Causality   Detect correlated actions.   Modelling causality.        Might not adapt quickly enough.
Aini                 Detect the focus of          Quantifying unexpected      Does not take plans into account.
                     attention.                   events.

Table 3.1: Comparative analysis of the presented works.
Chapter 4
Intentions and Actions
Until now, we have been talking about intentions but we have yet to define the scope of
this term. Indeed, we have been using the same word to refer to multiple concepts. An
intention can potentially assume several forms; it can be as simple or as complex as we
want it to be. It can encompass higher-level goals supported by a plan (e.g. invade a
building), tactical decisions (e.g. choosing a safe route over one which is ambush-prone)
or spontaneous intents (e.g. getting out of the way of an enemy). These can even be
combined into hierarchical intents [10]. Consider, for instance, a game session in which
the player wants to invade a building. He has two alternative routes, from which he
chooses the safest. Even the safest route is not necessarily enemy-free, however, and an
enemy suddenly appears in front of him. If he goes toward that enemy, what could
possibly be his intention? The player might want to kill the enemy, but does that mean
that he does not want to invade the building any more? Not necessarily, as he might
just have a short-term intent to dispose of the enemy and keep the building invasion as
a higher-level intent that involves multiple short-term, and spontaneous, intents.
This is, however, a confusing use of the term “intention”; therefore, throughout this
chapter, we elaborate a model of intention that we will use consistently through the
remainder of this work. We first look at Searle’s theory, in which he defines the notion
of intention and relates it to actions. We then look at how computer scientists have
incorporated this notion into agent systems. With both a theoretical and a practical
background, we make a distinction between the player and his controlled character, the
player-character, and we elaborate our model, which relates two kinds of intentions with
the actions a player may perform. We continue by describing how an action can be
divided into three components: movement, target and intent. Finally, we proceed to
detail how the three components of an action can be detected in a virtual world, based
on the matching and mismatching of anticipated behaviour related to distance changes
between entities.
4.1 Intentions in Philosophy of Mind
Philosophers have long ago formalised the notion of intentionality and, in his definition
of intentionality, Searle [30] states that:
“Intentionality is that property of many mental states and events by which
they are directed at or about or of objects and states of affairs in the world.”
As an example, Searle stated that “if I have an intention, it must be an intention to do
something” ([30] p.1). An interesting topic, for our model of intentions, is the
relationship he defined between intention and action ([30] p.79-111). To proceed, we
first need to borrow the definitions of four terms from Searle’s theory:
∙ A prior intention is an intention that would lead to an action (i.e. premeditation).
∙ An action is a causal and intentional transaction between the mind and the world.
∙ An intention in action is the intentional component of the action. It might be
thought of as a volition.
∙ The bodily movement is the element which constitutes the conditions of satisfac-
tion of the intention in action.
Summarizing these definitions, the following relationship can be established ([30] p.94):

    prior intention  --causes-->  [ intention in action  --causes-->  bodily movement ]
                                  \___________________ action ___________________/
Searle states that while a prior intention may lead to an action, an action may exist
without a prior intention ([30] p.85), but it has to contain an intention in action ([30]
p.82). Consider an example in which someone wants to enter a building (prior intention).
First, he needs to open the door (action). The action of opening the door has an intention
in action (the desire to open the door) and a bodily movement (the actual opening of the
door). The relation between intention and action is important as, in interactive virtual
worlds, every change caused by the user to the application is a result of an action of the
avatar the user controls.
Yet, we still feel that there is a need for a definition suited to videogame applications.
Hence, in the next section, we look at the belief-desire-intention (BDI) architecture,
which already implements the notion of intention in software agents.
4.2 Intentions in Computer Science
The BDI architecture finds its roots in philosophy and is based on a human practical
reasoning model of the same name [5]. This architecture is used to model deliberative
agents [28], with a clear separation between choosing what to do and how to do it. In such an
architecture, the agent’s information about its world is represented as beliefs, the desires
represent the agent’s objectives or motivation and its intentions represent its commitment
to achieve a particular desire.
Rao and Georgeff [28] propose an abstract implementation of this model in which two
definitions are notably interesting for our work:
∙ An intention is formed by adopting a certain plan of actions. It is selected according
to the agent’s beliefs, desires and current intentions.
∙ A plan is a sequence of actions, or sub-plans (hence the possibility of hierarchical
plans), whose purpose is to fulfil a desire in a given situation.
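These two definitions translate naturally into data structures. The sketch below is an illustrative Python rendering, not an actual BDI implementation; the plan contents are invented.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Illustrative encoding of the two BDI definitions: a plan is a sequence of
# actions or sub-plans (hence hierarchical plans), and an intention is a
# commitment formed by adopting such a plan.

@dataclass
class Plan:
    purpose: str                                    # the desire this plan fulfils
    steps: List[Union[str, "Plan"]] = field(default_factory=list)

    def flatten(self):
        """Expand sub-plans into the full sequence of primitive actions."""
        actions = []
        for step in self.steps:
            actions.extend(step.flatten() if isinstance(step, Plan) else [step])
        return actions

@dataclass
class Intention:
    plan: Plan   # an intention is formed by adopting a plan of actions

enter = Plan("enter the building", ["walk to door", "open door", "walk inside"])
invade = Plan("invade the building", [enter, "search rooms"])   # hierarchical
intention = Intention(plan=invade)
```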
The BDI definition constitutes a practical application of the notion of intentions, and
some similarities can be found between it and both the SOAR quakebot and the
intention detection in Gorniak and Roy’s work (presented in chapter 3), as both use a
notion of plan to guess a possible intention. This model does not, however, encompass
the notion of an intentional action with no prior intention (or desire, in this case).
Furthermore, for this model to be truly effective, we ought to know the subject’s beliefs,
desires and current intentions. This is not a problem when applied to an agent, as these
are part of the system, but it constitutes a problem with the player.
4.3 The Player and the Player-Character
Before proceeding toward our model of intentions, we need to clarify what we assume
when we refer to the player. Searle’s definition assumes that the subject is a human,
who accomplishes his actions through bodily movements. But what kind of movements
does a player have? He mainly interacts with a physical device, his controller (or keyboard and mouse,
in the case of computer games). The BDI model is also based on humans and, as an
agent architecture, it inherently assumes that all the agent’s internal states are available.
However, the player, while human, is but a sequence of inputs from the point of view of
the system. This sequence of inputs controls a virtual character, the player-character,
from which the system knows nothing of its internal state. It does not know its beliefs,
nor its desires, much less its intentions; these are all in the player’s mind. While we can
try to create a mental model (Gorniak suggests some steps to create an accurate mental
model of the player in [9] (p.79-85)), it will not give us all the variables of the equation,
and doing so is outside the scope of our research.
Because the player is the human who expresses himself through his character, all the
information that we can get from him comes directly from his avatar. Anything that he
wants to express is limited by the videogame mechanics and by his avatar’s capabilities.
In the same way, the intentions that we try to recognise are those of the
player-character, which might not correspond exactly to the intentions of the human
behind it. The logic
behind most videogames is one of mutual interaction between the present entities [25]
(e.g. the player-character and an item or another character). Thus, we suppose that most
of the intentions of the player’s avatar are related to these interactions which might not
be the player’s real intentions — he is a human with an unlimited range of intentions and
not one which can only interact in a limited way with his surroundings.
With these considerations in mind, we proceed to formulate our model of intentions,
which we will use through the remainder of this work to detect the intentions behind
the player-character’s actions.
4.4 Model of Intentions
In a videogame, a player has to obey certain rules. If the player has to invade the enemy’s
headquarters, he will have to do it. If the player cannot kill an innocent villager, there will
be no intention in the world that will make it happen. These rules, or game mechanics,
help us to narrow down the actions that a player can perform and, consequently, their
intentions in action. From here onward, we use the term intent instead of intention in
action, but it still represents the volition in an action’s execution.
Actions, in videogames, are typically performed from a list of available “physical”
movements that an entity may perform on a target. The target can be the player-character,
another character or a world artefact, as long as it is a game entity (i.e. something that
is relevant to the gameplay [29]). Additionally, the movement and the target restrict the
possible intent behind the action based on the target’s stimulus-response compatibility
(affordances) defined by the gameplay rules. Once again, game mechanics are the key
in the sense that if they define the player-character as being solely able to “attack” a
character of the type “enemy” and “greet” those of the type “villager”, then we will only
look for these specific intents when the opportunity arises.
Let us exemplify with a concrete movement : the player displaces the player-character
(e.g. makes it run or walk, one of the actions available for that character) through its
virtual world. While moving, if the player has the intent to attack an enemy, he moves his
character toward it. This action will be different than if he wants to flee from it. In the
first case, the player-character is going toward an entity, while in the second, the player-
character is going away from it. This example also helps to understand that a gameplay
action might not just refer to explicit, well-defined actions typically tied directly to inputs
(e.g. jump, shoot, use an item). In this case, we consider that fleeing from a threat is also
a gameplay action, at the same level as opening a container or a door. These actions have
different intents (e.g. avoid a threat, get some item or progress further into the level),
different movements (e.g. walk/run or “activate” an item) and different targets (e.g. an
enemy character, a container or a door).
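The narrowing of intents by movement and target type, as described above, amounts to an affordance lookup defined by the gameplay rules. The table entries below are invented to mirror the running examples; a real game would derive them from its mechanics.

```python
# Illustrative sketch: game mechanics restrict which intents are compatible
# with a (movement, target-type) pair, i.e. the target's stimulus-response
# compatibility (affordances).

AFFORDANCES = {
    ("approach",  "enemy"):     ["attack"],
    ("move-away", "enemy"):     ["flee"],
    ("approach",  "door"):      ["open", "progress-in-level"],
    ("approach",  "container"): ["open", "get-item"],
    ("approach",  "villager"):  ["greet"],
}

def possible_intents(movement, target_type):
    """Intents behind an action, narrowed down by the gameplay rules."""
    return AFFORDANCES.get((movement, target_type), [])

# Moving toward an enemy leaves a single plausible intent in this toy ruleset.
intents = possible_intents("approach", "enemy")
```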
Finally, we have the prior intention. These intentions are, as defined by Searle,
intentions that may lead to one or more actions or prior intentions – a premeditation.
We thus define these intentions recursively, similarly to the concept of plans in the BDI
architecture and to the suggestion given by Peter Gorniak and Deb Roy [10] (i.e. when
the player tries to achieve his objective, he typically follows some kind of plan). Their
level in the hierarchy makes these intentions suitable for the objectives that the
gameplay may offer to the player. Gameplay objectives provide challenges attainable
through a series of actions (or lesser challenges) but they usually do not force them on
the player, i.e. he is free to realise these actions whenever he wants.
The introduced concepts and their relation to their videogame counterparts are
summarised in the following figure:
Figure 4.1: Our model of intentions mapped into game elements.
We may wonder whether the player will even want to fulfil the game objectives and,
thus, follow our model of intentions. The player is free to do whatever he wants, but if
he is playing the game, Nicole Lazzaro’s work [16] points out that it is typically to have
fun and play along with the rules. Her work analyses the reasons people play games,
and the results are divided into four categories. Some people like to play games as a
means to achieve “altered states”. They play to clear their mind and feel better. Others
play for “the people factor”. They enjoy the activity and the social interaction with
other people, much more than the game itself. However, the remaining two categories
are the most interesting for the kind of problem we are targeting, an open world full of
challenges and places to explore.
First, some people play for the emotions they get from the game’s challenges. They
want to beat the game and feel accomplishment in doing so. These players are
categorised as seeking “hard fun”. In contrast, there are people that like to have “easy
fun”. Such people like to explore their virtual world and be in awe of their discoveries.
They pay attention to details and enjoy experiencing total immersion in the game.
In games such as role-playing games, adventures or even shooters, this means that
players like to explore and/or beat the game. In the first case, the player is not
preoccupied with clearing objectives, so he might perform more actions per se, devoid of
prior intentions. In the second, as he wants to complete the game, he is prone to follow
some kind of plan — although it might happen in an unordered fashion.
Throughout this work, we opted to focus on the detection of the former: actions with
no prior intentions. Such actions might happen anywhere, at any time. No player is
contained strictly in one of the four aforementioned categories, so even players that are
set on beating all the game’s challenges might, at one time or another, do something that
has little purpose in their plan but that is relevant for their own enjoyment.
We now clarify how and when to detect these actions.
4.5 Components of an Action
The logic underlying the game mechanics of a virtual world is inherently constrained by
the graphical structure in which the action takes place. For games dependent on their
graphical logic, Mateas and Wardrip-Fruin [25] state that this logic can often be reduced
to two fundamental components: displacement and collision detection. Therefore, the
concept of distance is fundamental to understanding what is happening in the virtual world.
As an example, consider Rusty trying to bite an enemy. To be able to catch his opponent,
Rusty will need to reduce the distance between them until he is close enough to perform
the action of biting the enemy.
As a first step, we perceive all that is happening in the virtual world by reducing
everything to distance variations between entities. If the distance between two entities
is below a certain threshold, then something happens. These entities can be intentional
or static. Intentional entities are able to move and act on their own and, as such, inherently
have an intent associated with that movement, while static entities can only be acted
upon, and have no intentional state attributed, although displaying certain stimulus-
response compatibility (affordances): if a character approaches a closed door, it probably
intends to open it.
Because we reduce everything to distance variations, detecting the player’s intention is,
in our approach, to understand why the player-character is moving toward or away from
certain entities. Because distance varies generally in a continuous manner, expectations
can be created regarding distance variation. Such expectations, when violated, can be
used as a temporal signal that new intentions may have been generated by an unpredicted
change in the action.
We now discuss how the three components of the action — movement, target, and
intent — are computed in our model using a small scenario extracted from our test-case.
4.5.1 Scenario
Imagine a scenario with four distinct entities: the player (fig. 4.2(a)); a dog, Rusty
(4.2(b)); enemies (4.2(c)); and a building (4.2(d)). The player’s objective is to get into
the building alive. He may choose to dispose of his enemies, but it is not a pre-requisite
to enter the building. In this scenario, Rusty acts as the player’s sidekick and observes
him constantly.
(a) Player-character (b) Rusty (c) Enemy (d) Building
Figure 4.2: The entities of our scenario.
4.5.2 Movement
A first problem in the detection of the player-character’s intent is to detect when a new
action begins. For that effect, we use movement. Consider that the player-character is
moving in the direction of a certain entity (a building’s door) in a predictable manner. At
a certain point, it unexpectedly changes direction and moves towards another entity (an
enemy). While all this happens, Rusty is observing. Figure 4.3 depicts such a situation.
Figure 4.3: When the player-character makes an unexpected movement, Rusty detects one action based on the intentional entity (the enemy) and another action based on the static entity.
We consider that this event marks the moment the player-character decided to interact with
the new entity. While unable to determine the start of any action until that point, Rusty
is now confident that the player-character has started a new action: going toward the enemy.
It is important to note that another action also started: getting away from the door.
In practice, we monitor the distance variation between pairs of entities. When an
unexpected variation occurs, it marks the start of a new action: if the first derivative
of the distance between entities is negative, they are moving closer; if positive, they are
moving away from each other.
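This detection rule can be sketched as follows. The function and its parameters are our own illustrative names, not part of the thesis implementation, and a unit time step is assumed:

```python
# Minimal sketch: a new action starts when the sensed distance deviates from
# the expected one by more than a tolerated error; the sign of the first
# derivative of the distance tells whether the entities are closing in or
# moving apart.
def detect_movement(dist_prev, dist_now, dist_expected, tolerated_error):
    """Return the detected movement, or None while expectations hold."""
    if abs(dist_now - dist_expected) <= tolerated_error:
        return None  # expectation confirmed: no new action detected
    derivative = dist_now - dist_prev  # first derivative over one step
    return "getting closer" if derivative < 0 else "getting away"
```

With the numeric example used later in this document (a distance dropping from 100 to 90 units against a tolerated error of 1 unit), this sketch would report “getting closer”.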
4.5.3 Target
When monitoring the distance between an intentional (e.g. the player-character) and a
static entity, we know that when the distance changes, the intentional (not the static)
entity is moving (fig. 4.4(a)). When both entities are intentional, however, other measurements
are needed to disambiguate agency. Consider, for instance,
that the distance between the player-character and another enemy agent has decreased more
than expected. In this case, there are three possible scenarios (fig. 4.4(b)): the player-
character moved closer to the enemy; the enemy moved closer to the player-character;
both moved closer to each other.
(a) The player is solely responsible for the distance increase between himself and the building.
(b) Who is responsible for the decrease in distance?
Figure 4.4: Distance between intentional entities alone is not enough to determine agency.
To determine agency, we monitor the entities’ relative velocities. When the velocity of
an entity changes unexpectedly, we assume it may also influence the distance between entities
in an unexpected manner. As such, if we observe both an unexpected distance variation
and an unexpected velocity variation of an entity, the entity is considered responsible
for the movement, and the other entity is marked as the target of the movement. In such
cases, four combinations are possible: the player-character is getting {closer to, further
away from} the entity; the entity is getting {closer to, further away from} the player-
character.
4.5.4 Intent
The last component of an action is its intent. The intent results from a natural combina-
tion of movement and target. As discussed previously, the rules underlying virtual worlds
are mostly based on distance variation and collision detection [25]: if the player-character
wants to interact with an entity, it first needs to come within its activation range. By
detecting the movement of “getting closer”, we can derive the intent based on the affordances
of the target, i.e. the type of interactions allowed by the target entity. The same
reasoning can be applied to “getting away from”.
Consider an example in which Rusty notices the player-character moving closer to
(movement) an enemy (target). From the affordances given by the enemy entity, Rusty
assumes it must be to attack it (intent). If the player-character were to move toward the
building’s door, Rusty would assume it would be to open it (the only thing one can do
to a closed door in the virtual world). If the player were to move his avatar away from an
enemy, Rusty would assume he wanted to avoid him.
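The examples above amount to a simple affordance lookup. The table entries below are illustrative assumptions drawn from this scenario, not an exhaustive list from our test-case:

```python
# Illustrative affordance tables (hypothetical entries): the intent is the
# affordance the target offers for the detected movement.
AFFORDANCES = {
    "enemy": {"getting closer": "attack", "getting away": "avoid"},
    "door":  {"getting closer": "open",   "getting away": "leave"},
}

def derive_intent(movement, target_type):
    """Combine the movement with the target's affordances to obtain an intent."""
    return AFFORDANCES.get(target_type, {}).get(movement)
```

A target with no affordance for the detected movement simply yields no intent.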
4.6 Summary
In this section, we provided a model of intentions that we will use to detect the player-
character’s intent. Using Searle’s [30] distinction between prior intention and intention
in action (intent), we then formulate that an action is composed of:
∙ A movement, or one of the available physical movements of an intentional entity
(e.g. “moving” around in the world, “using” an item).
∙ A target, or a game entity in videogames (e.g. the player-character, an enemy, a
building).
∙ An intent, or the affordances the target offers for the given movement (e.g. to
attack, to invade, to flee from).
This model is inspired by Searle’s essay on intentionality [30] and by the BDI architecture,
an existing software model. Our model defines that an action may exist per se, or it
can be included in a hierarchical form, a plan. The intention to achieve or perform
such a plan corresponds to a prior intention, i.e. a premeditated intention for which these actions
represent a necessary condition of completion. However, we restrict the scope of this
work to the detection of actions per se, or in other words, spontaneous actions that may
or may not be part of a prior intention.
We then proceeded to detail how the three components of an action can be detected in
a virtual world, based on the matching and mismatching of anticipated behaviour related
to distance change between entities. Namely, the movement marks the moment an
action is performed and, thus, detected and analysed. The target is the entity involved in
that action whose agency is the lowest (the entity with the highest agency being the one
responsible for the action). Finally, we detect the intent based on the combination of the
movement and the target: it expresses the reason why the entity responsible for the action
would interact with the target through such a movement.
In the following section, we describe DogMate, the anticipation-based framework that
we developed to recognise this model in virtual worlds.
Chapter 5
DogMate
Our objective with this work is to detect the underlying intent of a character’s actions.
More importantly, we want to achieve this detection for a specific character, the player-character
(which is entirely controlled by the player). In chapter 3, we analysed several
approaches to enhancing the believability of characters by predicting the player-character’s
actions. In this chapter, we elaborate a framework (DogMate) in which we apply our model
of intentions to extract the intent of the predicted actions. However, several problems
arise from the application of this model to videogames. Namely, when multiple actions
are detected, which should we consider? Which is the most relevant? Also, as we are
creating a support for believable behaviour, we find it important to include the context in
which an intent is detected, as it may produce a noticeable difference in the responses it
may elicit. For instance, should a character consider that the intent to attack an enemy
is always appropriate? It depends on its own beliefs, as different individuals may interpret
the same situation differently, depending on their own perspective. After looking at
Aini’s architecture (section 3.4), we chose to use the emotivector to support our work, as
it permits solving the new problems we enumerated above.
This process is presented in this chapter as our framework, DogMate. We first provide
a description of the support we use, the emotivector. Then, we discuss the predictors used
to detect the components of an action. We also highlight some problems that led us to change
the original model of the emotivector, and propose a new method to quantify the relevance
of an unexpected event. We then show how it can be used to compare several actions and
their components. Afterwards, we present how the emotivector’s affective appraisal helps
to interpret the detected intents from the character’s perspective. Finally, we finish this
chapter by presenting DogMate’s flow as a whole, ready to be integrated within an agent
architecture.
5.1 Emotivector
The emotivector is an affective anticipatory mechanism [20] designed to operate at the
perception level. It approaches representation of anticipation by giving the agent archi-
tecture a wide range of possible feedbacks about the expectations it may possess. This
approach allows the use of a richer set of behaviours, such as realising that a mistake has
been made or even being confused by what is happening, illustrating important traits of
human behaviour. The emotivector is applied in works such as Aini, the synthetic flower
[21, 20] (presented in section 3.4), or with the Philips iCat robot acting as an affective
game buddy [17]. In both, the emotivector is used to predict the user’s next action and
it was shown to enhance the character’s behaviour. In our work, more than predicting what
the player-character will do, we use the emotivector to understand why he would do it.
The reason why the user did a certain action might change his understanding of his sur-
roundings and consequently change his expectations about the possible consequences of
his act.
The emotivector is a strong support for applying in practice the concepts we formulated
in the last chapter, as it allows us not only to detect the three components of an action, but
also to quantify an action based on its relevance to the player-character and to classify it from the
observer’s perspective (i.e. based on Rusty’s beliefs).
It works by keeping a history of the perception’s value and uses it to predict its future
value. By analysing both the expected value and the sensed one, the emotivector generates
an attentional (salience) and affective (sensation) value associated with the signal (see fig.
5.1). On one hand, the emotivector’s salience helps to inform about the percept’s relevance
and its exogenous component reflects the unexpectedness of external stimulus. On the
other hand, the generated affective sensation helps to classify the mismatch between the
sensed value and the expected value as positive (rewarding) or negative (punishing), based
on whether or not it is converging toward the desired value.
Figure 5.1: The emotivector is attached to a perception channel. The mismatch between the sensed and predicted information produces an attentional salience. Classifying both against the desired value permits the emission of an affective sensation.
By introducing a concept of prediction error, the emotivector models an “as expected”
range. The error-predictor produces an expected prediction error which represents the
tolerated variance for a given sensed value. If the mismatch between the sensed value and
the expected one is within the predicted error’s range, then it is considered an expectation
confirmation. Otherwise, it produces an expectation violation. Combining this concept
with the original affective sensations allows the emotivector to classify a signal within
nine sensations, from (more or less) punishing to (more or less) rewarding, from expected
to unexpected (as shown in figure 5.2).
Figure 5.2: The emotivector’s nine affective sensations. We use the original nomenclature in which R stands for reward and P for punishment. The sensed signal (columns) is represented as a continuous line while the expected one (rows) as a dotted line. The interval represents the expected error.
The emotivector’s predictor is independent from its sensation model and should be the
one that best adapts to the signal it monitors. For instance, throughout this work, we
mostly use examples based on distance. The predictor we opted to use is a moving average
[26] based on the second derivative (i.e. the projected acceleration). The same goes for the
error predictor, where using a predictor that best matches the signal produces more
accurate results.
Let us then exemplify the usage of an emotivector and at the same time define its
underlying concepts. Recall the scenario’s entities presented in section 4.5.1 and imagine
that the only thing the player-character has done recently is to be idling around (fig.
5.3(a)) but suddenly he starts moving and goes toward an enemy (fig. 5.3(b)). Rusty
looks at him happily, as he is going to free the world from a dangerous threat. To model
such a situation using an emotivector, we first attach it to one of Rusty’s sensors that
measures the distance between the player-character and an enemy. Initially, as the player-character
has not been moving much, the predictor estimates the distance to remain
approximately the same.
However, once the player-character started to move toward the enemy, the sensed
distance did not match the expected one and the percept received a high salience from its
exogenous component.
The emotivector’s exogenous component models the unexpectedness of a signal as:

EXOt = ∣xt − x̂t∣ (5.1)
(a) The player is idling. (b) The player suddenly moves towards his enemy.
Figure 5.3: Rusty observes the player-character happily as he suddenly begins to move towards an enemy.
In equation 5.1, xt represents the sensed value at time t and x̂t the expected value for
this time t. Let us consider that the player-character was at a distance of 100 units from
his enemy, which corresponds to the expected value (x̂t). Suddenly, this distance reduces
to 90 units, representing the sensed value (xt). This gives us an exogenous component of
EXOt = ∣90 − 100∣ = 10.
This prediction’s error also corresponds to 10 units, as it is defined similarly to the
exogenous component:
PredictionErrort = ∣xt − x̂t∣ (5.2)
This value exceeds the predicted error, which is just 1 unit (obtained from the error
predictor), and is consequently classified in the unexpected range. Furthermore, because
Rusty wants the player-character to dispose of his enemies, when he observes him going
toward an enemy (the sensed value is converging toward the desired value — 0 in this
case), he classifies that event as rewarding. This combination emits an unexpected reward
(unexpected R in fig. 5.2) and a happy emotion surfaces: Rusty wants the player-character
to move closer to that enemy and that is what he did.
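Under the assumptions of this scenario, the classification just described can be sketched as follows; the function and parameter names are ours, for illustration only:

```python
# Sketch of the appraisal in this example: a mismatch (eq. 5.1) beyond the
# predicted error makes the percept unexpected, and convergence toward the
# desired value makes it rewarding (R) rather than punishing (P).
def appraise_signal(sensed, expected, desired, predicted_error):
    exo = abs(sensed - expected)  # eq. 5.1
    expectedness = "unexpected" if exo > predicted_error else "expected"
    rewarding = abs(sensed - desired) < abs(expected - desired)
    return expectedness + (" R" if rewarding else " P")
```

For the numbers above, `appraise_signal(90, 100, 0, 1)` yields "unexpected R".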
We now present the predictors that could have been used within this scenario. At the
same time, we shift our focus to detecting actions and their three components.
5.2 Predictors
To fully identify an action, we make use of three emotivectors between pairs of entities.
One uses its predictor to estimate the distance variation between the entities (to determine
the movement) and two to estimate each entity’s relative velocity (to determine the agency
and therefore the target). We currently use the Euclidean distance as it represents a
good trade-off between estimation accuracy and computational cost. Our predictors are
implemented using the following equations:
d̂t = d(t−1) + v̂t (5.3)

v̂t = v(t−1) + ât (5.4)

The expected distance at time t (d̂t) depends on the expected velocity (v̂t, eq. 5.3)
which, in turn, is based on the expected acceleration (ât, eq. 5.4), where v and a denote
the first and second derivatives of the distance d. To estimate the acceleration, we use
the following equation:

ât = (1/n) ∑i=1..n (v(t−i) − v(t−(i+1))) (5.5)
We predict the acceleration (eq. 5.5) by applying a moving average [26] to the last n
sensed accelerations. Empirically, we found that a moving window of n = 3 gives adequate
results, as the window is small enough to adapt quickly to acceleration change, while large
enough to mitigate small acceleration variation due to the noise present in the original
signal.
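A possible implementation of this predictor (eqs. 5.3–5.5), assuming a unit time step and our own naming, is sketched below:

```python
# Sketch of the distance predictor: the expected acceleration is a moving
# average of the last n sensed accelerations (eq. 5.5); the expected velocity
# and distance are then extrapolated from it (eqs. 5.4 and 5.3).
class DistancePredictor:
    def __init__(self, n=3):
        self.n = n
        self.distances = []  # history of sensed distances

    def sense(self, distance):
        self.distances.append(distance)

    def predict(self):
        """Expected distance for the next time step."""
        d = self.distances
        if len(d) < 3:  # not enough history to estimate an acceleration
            return d[-1] if d else 0.0
        velocities = [d[i] - d[i - 1] for i in range(1, len(d))]
        accelerations = [velocities[i] - velocities[i - 1]
                         for i in range(1, len(velocities))]
        window = accelerations[-self.n:]
        acc_hat = sum(window) / len(window)  # eq. 5.5 (moving average)
        vel_hat = velocities[-1] + acc_hat   # eq. 5.4
        return d[-1] + vel_hat               # eq. 5.3
```

A steadily approaching entity sensed at distances 100, 98, 96, 94 is then expected at distance 92 on the next step.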
5.3 Relevance
In figure 4.3, Rusty observes an unexpected movement of the player-character which
generates two actions: “getting closer to the enemy” and “getting away from the building’s
door”. However, the player probably only intended to make one of them. While both may
contain relevant information for decision-making, it is important to be able to quantify
and classify their relative importance for Rusty.
To express the relevance of the action for a character at time t (Rt), we use the
following equation, representing the degree of unexpectedness of the sensed value:
R^ent_t = PredictionError^ent_t / ExpectedError^ent_t = err^ent_t / êrr^ent_t = ∣xt − x̂t∣ / êrrt (5.6)
In equation 5.6, xt and x̂t represent the sensed and expected value, respectively. We
based this equation on the concept behind the emotivector’s exogenous model (eq. 5.1),
but by computing relevance as a ratio, we make the prediction independent from the
metric it measures and from its range.
This is important as the signal we measure is the distance and it may pose problems
if we normalize it as the original exogenous model requires. For instance, if we use the
maximum distance of the virtual world to normalise any sensed distance, most of the
range would be unequally used as shown in figure 5.4. First, because the maximum range
would only be used if the player-character stands near one extremity while the other entity
is at the other extremity, which is a limited scenario. Second, visual obstruction does
not permit, most of the time, actually perceiving the whole level at once (the same is true
for both the player and other characters, as they are restricted by their sensors). We would
then be restricted to a sub-range almost all of the time. Also, an alteration to the game
level design would imply an alteration to the normalization function, a process
which might be error-prone if it has to be hand-adjusted. Another drawback is that
it might not be trivial to calculate the maximum distance of a level. It could be the
maximum of the minimum distances but, even then, if the level is dynamic, it is prone to
change, and such a computation at run-time might not be viable at all. These problems made
us use the distance value as-is within the emotivector and led us to this new model
to compute the relevance (conceptually identical to the notion of salience) of a sensed value.
Figure 5.4: In a normalized range, the whole range would hardly be used.
When R^ent_t > 1, we say that the signal of that emotivector is salient and that an
expectation violation occurred. To compute it, however, we need to predict the prediction
error (êrr^ent_t), which we compute using the following equation:

êrr^ent_t = k × err^ent_(t−1) + (1 − k) × êrr^ent_(t−1) (5.7)
The variable k takes values within the [0, 1] interval. A high value for k allows the
estimation to adapt quickly to huge variations, while a low value makes the estimation
more conservative. When an unexpected event occurs, the acceleration predictors need
a certain time-frame to re-adapt to the signal and emit correct predictions once again.
Within this time, it is important to avoid detecting a false unexpected event as a result
of adaptation. After adapting to the new signal, prediction error (and consequently the
expected prediction error) will decrease.
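Equations 5.6 and 5.7 can be sketched directly; `k = 0.5` below is an illustrative value, not the one used in our implementation:

```python
# Relevance as the ratio of the prediction error to the expected error
# (eq. 5.6); the expected error itself is updated by exponential smoothing
# of past prediction errors (eq. 5.7).
def relevance(sensed, expected, expected_error):
    return abs(sensed - expected) / expected_error   # eq. 5.6

def update_expected_error(error, expected_error, k=0.5):
    return k * error + (1 - k) * expected_error      # eq. 5.7
```

The numbers of the worked example below follow directly: `relevance(750, 700, 10)` is 5 and `relevance(300, 400, 10)` is 10.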
Let us exemplify how prediction and relevance work using the scenario from figure
4.3. The player initially moves the player-character simultaneously toward an intentional
entity (enemy) and toward a static entity (the building’s door). Assume that, when the
unexpected movement occurs, the player-character is 700 units away from the door and
400 units away from the enemy. Additionally, because the player-character was walking
toward the door, we expect the distance to the enemy (x̂t) to remain around 400 units,
while the distance to the door is expected to decrease to 650 units. Finally, because our
predictions have been accurate so far, we expect a low prediction error in both cases:
10 units. Now, the user has broken our expectations: the sensed value (xt) shows the
player-character closer to the enemy (300 units away) and further away from the door
(the distance to the door increased to 750 units):
Rdoor = ∣750 − 700∣ / 10 = 5

Renemy = ∣300 − 400∣ / 10 = 10
When comparing both actions resulting from the distance variation to the door and to
the enemy, we can say that the action to go toward the enemy is two times more relevant
(unexpected) than the one to go further away from the door. We assume that the more
unexpected an action is, the more likely the player meant to do that action intentionally
(in this case the action to go toward the enemy).
With this model, we are not only able to distinguish actions and order them by relevance,
but also, by combining relevant changes in the velocity emotivectors (table 5.1),
to decide which entity is responsible for the action (the other entity being considered
the target of the action).
Velocity     | Entity+               | Not salient | Entity−
Player+      | Both seek             | P seeks E   | P seeks E; E avoids P
Not salient  | E seeks P             | None        | E avoids P
Player−      | E seeks P; P avoids E | P avoids E  | Both avoid

Table 5.1: The nine salience patterns from the velocity emotivectors. Plus and minus indicate convergence toward and divergence from the other entity, respectively.
In this case, if the enemy had a relevant change in relative velocity, it would mean the
enemy started to attack the player-character, while if the relevant change occurred with
the relative velocity of the avatar, the player-character would have initiated combat.
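Table 5.1 can be encoded as a direct lookup. The "+", "0" and "-" keys below are our shorthand for salient convergence, no salience, and salient divergence, respectively:

```python
# Sketch of Table 5.1 as a lookup table. Keys are the (player, entity)
# velocity saliences: "+" converging toward the other entity, "-" diverging,
# "0" not salient.
PATTERNS = {
    ("+", "+"): "both seek",
    ("+", "0"): "P seeks E",
    ("+", "-"): "P seeks E; E avoids P",
    ("0", "+"): "E seeks P",
    ("0", "0"): None,
    ("0", "-"): "E avoids P",
    ("-", "+"): "E seeks P; P avoids E",
    ("-", "0"): "P avoids E",
    ("-", "-"): "both avoid",
}

def interpret(player_salience, entity_salience):
    """Map a pair of velocity saliences to the pattern of Table 5.1."""
    return PATTERNS[(player_salience, entity_salience)]
```

For instance, a salient convergence of the player alone yields "P seeks E": the player-character initiated combat.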
5.4 Affective Appraisal
To model the intent interpretation from the character’s perspective, we appraise the intent
using the emotivector’s sensation model. However, this model does not cover cases in
which Rusty wants the player to avoid another entity. Until now, we have been exploring
examples in which Rusty actually knows that there are no risks for the player-character
if he engages in a fight with his enemies. Still, it only makes sense if Rusty clearly sees
that the player is in condition to sustain yet another fight (fig. 5.5(a)). If Rusty thinks
that the player-character might not handle it, then it would not be wise for Rusty to
express a happy emotion, or it would appear out-of-context.
(a) The player is in good condition. It is alright to fight.
(b) The player’s condition is bad. Rusty does not want him to fight.
Figure 5.5: Rusty might interpret the same action differently depending on the status of the player-character.
To allow the modelling of such situations, we needed to have a concept of undesired
value. The difference between having an explicit undesired value and setting the desired
value to a different value is that the desire to avoid may not equal the desire to reach
something else. We might just want to avoid getting close to a particular entity but, at
the same time, not care that much how far we get from it.
Using this concept, we can appraise an intent depending on a character’s perspective.
First, we set desired (and undesired) values for distance, depicting where that character
would like the distance between two entities to be (or move away from). Then, when a
relevant change occurs in the sensed distance between the two entities, we compare the
sensed value, the expected value and the desired/undesired value to produce an affective
state. If the sensed value is closer to the desired value, it is considered a positive sensation.
If it was expected to be even closer, it is a “positive but worse than expected”; if not, it
is a “positive and even better than expected”. The same applies if the sensed value is
diverging from the desired value, but in that case it is considered a negative sensation.
The reasoning for getting closer to or away from undesired values is the negation of the
previous one (e.g. avoiding an undesired value is considered positive). As a result of this
affective appraisal, one of four affective states is generated for each unexpected change.
The affective states will influence the character’s decision making.
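A simplified sketch of this appraisal follows. It is our own condensation of the emotivector's sensation model, with hypothetical names, using the previous sensed value as the baseline for convergence:

```python
# Simplified sketch: an unexpected change is positive when the sensed value
# moved toward the reference (a desired value), and better/worse than
# expected depending on how the expected value compares; for an undesired
# reference both classifications are negated.
def appraise_intent(sensed, expected, previous, reference, undesired=False):
    def gap(value):
        return abs(value - reference)
    positive = gap(sensed) < gap(previous)   # converging toward reference
    better = gap(sensed) < gap(expected)     # closer than anticipated
    if undesired:                            # avoidance: negate both
        positive, better = not positive, not better
    if positive and better:
        return "positive and better than expected"
    if positive:
        return "positive but worse than expected"
    if better:
        return "negative but better than expected"
    return "negative and worse than expected"
```

With an undesired distance of 0 to an enemy, a drop from 100 to 90 units against an expectation of 100 is appraised as "negative and worse than expected", eliciting an angry reaction.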
As an example, consider that Rusty sees the player-character is hurt. Rusty desperately
wants him to avoid any threat, setting an undesired value of 0 for the distance
between the player-character and any enemy. Consequently, if he detects the player-character
has an intent to attack an enemy, he will view that intent as bad for the player
(the sensation will be a negative one). Based on this personal interpretation, Rusty will
assume a context-specific strategy and display an adequate context-aware behaviour, totally
different from the one resulting from appraising the event as positive. Table 5.2
shows an example of such categorisation.
Sensation                          | Affective Reaction | Description
Positive and better than expected | Happy              | The intent fits the situation.
Negative but better than expected | Surprised (Happy)  | Expected a negative intent, but this one is good as it fits the situation.
Negative and worse than expected  | Angry              | The intent does not fit the situation.
Positive but worse than expected  | Confused (Worried) | Expected a positive intent, but this one is bad as it does not fit the situation.

Table 5.2: The generated affective sensations help Rusty to contextualise the detected intent within its current situation.
We now describe our framework’s flow, from the emotivector’s parametrisation to the
appraisal and selection of the resulting intent.
5.5 DogMate’s Flow
We have now described all the elements needed to recognise actions and their underlying
intents, and to appraise and classify them. DogMate (represented in fig. 5.6) is the name
we gave to this framework, and we now review its flow during one update cycle. Let us,
again, turn to the example depicted in figure 4.3 from Rusty’s perspective. Let us also
consider that Rusty knows that the player-character has an objective: to break through
the building to invade it. Rusty also knows that the player-character is healthy enough
to handle a few enemies. These are his current beliefs and we use them to set the desired
value of predictors (1 in figure 5.6) such that reducing the distance toward enemies or the
door is considered as positive within this context. Rusty’s beliefs also contain information
about the position of entities that Rusty can sense (i.e. within a certain radius). This
information is fed to the emotivectors at each cycle.
To represent the possible actions between the player-character and an enemy, we use
one set of three emotivectors (one distance-based and two velocity-based). To represent
the possible actions between the player-character and the building’s door, we use a set
with one emotivector (distance-based) (2). The player is currently moving his avatar
through the world and is reducing his distance to both the door and the enemy. Suddenly,
something unexpected occurs: the avatar drastically reduces its distance to the enemy
while the distance to the door increases. At this point, the distance-based emotivector in
both sets becomes salient (3). In the set of the door (a static entity), we only have one
emotivector so we know that the player-character is getting further away (movement)
from the door (target). In the second set, the emotivector that monitors the relative
velocity from the player-character to the enemy is also salient. This pattern means that
Figure 5.6: DogMate’s update flow: (1-2) emotivector parametrisation; (3-4) intent detection; (5-6) affective appraisal; (7) action selection.
an action was initiated by the player, whose target is the enemy, and his movement is to
get closer to it. Two intents are produced based on these recognised attributes (4).
Because Rusty knows that the player-character can handle another enemy, the distance-
based emotivector leads to the emission of a positive sensation (5) which makes Rusty
interpret that distance reduction (and the corresponding intent to attack) as beneficial
for the player (6). On the other hand, because Rusty wants the player to complete his
objective, he interprets the increased distance to the door as negative and consequently
frowns upon the player’s disinterest in completing his task.
Finally, these intents are compared to each other in the decision-making module
to select the decision that Rusty will adopt. Because the emotivector that monitors
the player-character’s relative velocity to the enemy is the most salient of them all, we
choose to select that intent as the one that will make Rusty react (7). As such, Rusty
encourages the player-character through verbal and non-verbal behaviour, described in
the next chapter.
5.6 Summary
In this section, we presented the emotivector, an affective anticipatory mechanism. We
discussed our predictors for distance and velocity, and discussed how to classify
unexpected values in terms of their relevance. We based the relevance computation on the exogenous model of the
emotivector, although we assume a different usage: we avoid normalizing the monitored
signal (distance) as it could pose some problems. We use the relevance value not only to
rank actions against each other, but also to determine which entity is the target of an action. Next,
we introduced the notion of undesired value in the sensation model of the emotivector to
model the concept of avoidance and presented how we appraise intents from the point of
view of a character.
Afterwards, we presented DogMate and its flow within one update cycle. It begins
by parametrising the emotivectors based on the beliefs of the character. Each entity is
monitored by a set of emotivectors, varying from one distance emotivector for static entities
to three emotivectors for intentional entities (one for the distance, two for the velocities).
The salience formed by each set corresponds to a detected intent which is then appraised
based on the affective sensation generated from the distance-emotivector present in the
set. Finally, we use this information in the decision-making component to select the intent
from the action which is the most relevant for the character, reflecting its judgement of
the situation.
In the next chapter, we present K9, our test-case, in which we use DogMate coupled
with an agent architecture that simulates Rusty's brain and controls him in a game engine.
Chapter 6
Test-Case: K9
K9 (“Canine”) is a small role-playing game (RPG) environment used as a test-case for our
framework. We built it on top of the modding framework of FO3, and it therefore makes
full use of its game mechanics. In this section, we first present the K9 universe, introducing
its story — incorporated within Fallout’s timeline1 — its gameplay, and the relationship
between Rusty and the player-character. We then overview its technological support,
namely the interaction between FO3 and its modding components. Finally, we present
Rusty as a synthetic character and describe his interaction with DogMate.
6.1 The Story
October 23rd, 2076. The world is in chaos.
Resource Wars have been raging for more than two decades now and everyone
can feel that something big will happen soon... something called the Great
Nuclear War2.
After completing their Power Armor3 prototype, the U.S. military plans to
send it to the front in order to keep Chinese forces in check. On their
side, the Chinese Secret Agency has found out about the Enclave4 plans and is
pressuring its research department to finish its human experiments
quickly...
Ralph Canine is a hard worker living in a remote, peaceful town. He is nearing his
thirties and lives with his beloved wife. Recently, however, he has been lost in thought,
and keeps listening to the news on the radio (fig. 6.1). It seems like the Chinese military
forces are roaming through the country, kidnapping citizens on their way. Witnessing his
1 The Fallout setting timeline: http://fallout.wikia.com/wiki/Timeline
2 The Great Nuclear War starts on October 23rd, 2077: http://fallout.wikia.com/wiki/Great_War
3 The Power Armor: http://fallout.wikia.com/wiki/Power_Armor
4 The Enclave: http://fallout.wikia.com/wiki/Enclave
husband in such state, Pat Canine cheers him up and convinces him to take Rusty, their
dog, out for a walk. She tells him to forget about the news, as such things would never
happen in their village anyway.
Figure 6.1: Ralph, listening to the radio.
The moment Ralph puts a foot outside the door, he cannot help but feel anxious. He
feels that something is wrong. Seconds later, a small commando unit appears out of nowhere
and invades the village, bashing the villagers (fig. 6.2). While their aim is not clear,
Ralph and Rusty resist. Their efforts are, however, fruitless, and they are rapidly knocked
out cold.
Figure 6.2: The Chinese commando bashes the villagers and kidnaps them.
Two weeks later, Ralph awakes in a strange and desolate room. While wondering
about his surroundings, a strange robot enters the room and calls him “subject 101”. It
seems to Ralph that the robot is some kind of medic, as it talks about checking his vital
status. In a comical, yet creepy tone, the robot tells him that he is a failed experiment
and that he will not live much longer. With these words, it begins to leave the room.
On its way out, however, the robot is completely destroyed by a strangely dressed man
(fig. 6.3). That man goes by the name of Three Dog, and it seems like his skin is rotting
away, as if he were some kind of ghoul. He tells Ralph to hurry, claiming that they do not
have much time. It seems like he is breaking out of that place and that he wants to take
Ralph with him. Ralph, insecure, follows him as he does not have much choice. Along
their way, Three Dog slowly enlightens Ralph about his current condition. If what Three
Dog says is true, it seems like Ralph was used in a genetic experiment.
Figure 6.3: Three Dog kills the robot medic.
Three Dog does not seem to be lying, however. Indeed, Ralph begins to feel aware of the new
abilities he possesses. He feels stronger and his senses are sharpened. Strangely, he feels
like he is able to understand the sensations of someone else. He soon finds out that the
other being is Rusty, his dog. Three Dog and Ralph rescued the poor animal together, and
it was then that Rusty “talked” to him — or rather, it was Ralph who
could understand him. The conversation is not a happy one, as Rusty delivers him sad
news. Pat, his wife, has passed away.
Seeing Ralph depressed, Three Dog tries to console him. He rapidly understands
that Ralph is not depressed. He is enraged. As he cannot understand what is going
on, Ralph threatens Three Dog, demanding that he explain everything. It is at that moment that the
sorrowful truth is revealed. To face the rising technological prowess of the Americans,
the Chinese army has been working on experiments to create super-soldiers. They created
a nano-technology that allows humans to assimilate the genes of animals. The result is
two-fold. Humans are able to feel the sensorial information of their companion. However,
their genes are being destroyed by the combination. Without new doses of genes, they are
doomed to die. Hence, Three Dog tells him to make sure to keep Rusty around and safe,
at all costs.
On their way out, Three Dog also reveals to Ralph information about the person responsible
for their problems. He tells him that this person, Dr. Wu, might possess a cure for
their disease. He is, however, interrupted by a robot-guard set on sending the
three of them to heaven. Ralph and Rusty barely make it out alive, but Three Dog is
not so lucky. On his deathbed, Three Dog reveals that he is one of the leaders of the
underground resistance and that if Ralph contacts them, they will surely help him out.
Determined to find a cure and to get revenge, Ralph sets out on his journey and soon
discovers Dr. Wu’s whereabouts...
6.2 The Gameplay
Fallout 3 is an action role-playing game, mixing elements of first/third-person shooters
and western role-playing games. The player may wield several weapons and armours,
and his own dexterity influences his performance – hence the shooter categorisation. However,
there is also a part determined by statistics and skills that depends only on the
player’s character level.
As K9 is a FO3 mod, we continued using its core gameplay system. However,
we modified two important aspects of the original game. First, the Vault-Tec Assisted
Targeting System, or V.A.T.S., which allows the player to spend some action points to
pause the game and aim at a specific body part of an enemy. Second, the radiation
system, as K9 takes place before the Great War, a period in which people did not have
to worry about the lethal radiation that enveloped the world. We transformed these
game mechanics to reinforce our story, implementing an adrenalin system and a
degeneration system. Adrenalin acts as action points, permitting the player to use
the V.A.T.S. (albeit renamed for the story’s purpose). Adrenalin rises as the player
fights and falls as he rests or uses the V.A.T.S. The degeneration system is there to pressure
the player. Degeneration inflicts diseases on the player (i.e. statistical modifications,
with some advantages and penalties) and increases continually over time, but can be slowed
by the player’s adrenalin. Indeed, a high enough adrenalin level may stop the degeneration
from worsening.
Fortunately for him, Ralph is not alone. He has Rusty by his side, accompanying
him through his journey and taking an active role in the unfolding of the story. Rusty stays by Ralph’s
side at all times and tries his best to protect him. Rusty also has an important role in
terms of gameplay mechanics. In fact, the only means to lower the degeneration is
to transfer some of Rusty’s genes to Ralph’s body, to feed the nano-technology. Doing
so, however, temporarily weakens the nano-technology and disables the shared abilities
between Rusty and Ralph. These consist principally of sharing their senses (resulting
in a colour alteration of the screen that makes enemies more salient, as shown in fig. 6.4)
and perceptions.
We implemented three levels to represent K9’s story. First, an introductory level,
in which the player gets a first contact with the world. It presents the story up to the
kidnapping. Second, a tutorial level, in which the player meets Three Dog. Three
Dog acts as a mentor, teaching the player new abilities and revealing more about the
story. Finally, we implemented a main level in which the player’s objective is to
Figure 6.4: Ralph uses Rusty’s sensorial information to better discern his enemies.
invade the building in which Dr. Wu’s laboratory is supposed to be. This level is divided
into three parts: the outside (filled with failed experiments, almost completely
degenerated), a first line of defence (filled with partially enhanced soldiers) and a compact
barricade securing the main entrance to the laboratory (filled with heavily armed soldiers).
6.3 Technological View
To implement K9 we used Fallout 3’s modding tool, the Garden of Eden Creation Toolkit5
(G.E.C.K.). The G.E.C.K. allows modifying everything that is considered content (e.g.
characters, levels, quests) by the FO3 game engine. In fig. 6.5 we show its main interface.
Using this tool we created K9’s levels. The process is straightforward: we started
by defining the world characteristics, then proceeded to shape the height-map as we
envisioned it. Texturing the level is as easy as painting in typical image-editing software,
as it uses the same metaphor (i.e. you pick a brush, a texture, and you paint the area you
wish to cover). The biggest advantage of modding such a game is that we have access to
a vast amount of professional and well integrated assets. Filling the level with content
such as trees, roads, or buildings is as simple as dropping them where we want them to
be placed.
The FO3 engine uses the concept of navigation meshes (navmesh, see fig. 6.6) to support its
pathfinding algorithms. Although some automatic tools are available to provide a good-
enough navmesh, it needs, most of the time, some manual correction. This is an important
step if we want autonomous characters to behave correctly as, without correct
5 Bethesda Game Studios, Garden of Eden Creation Kit (2008): http://geck.bethsoft.com/index.php/.
Figure 6.5: The G.E.C.K., Fallout 3 main modding tool.
navmesh information, they would just be lost within the level.
Figure 6.6: Editing the navmesh in the G.E.C.K. A character can only move within its boundaries.
These characters can be created and parameterised using a generic template, as shown
in fig. 6.7. We can define their attributes, skills, or experience, but also some data specific
to their behaviour, such as their aggressiveness or confidence. This allowed us to quickly
create several types of characters that behave differently in specific combat situations.
For their more common behaviour, the G.E.C.K. provides the concept of packages.
Packages are a configurable set of instructions that are executed when a precondition
is matched. They allow us to specify behaviours like “follow”, “lead” or even “eat” and
“sleep”. They have, however, a limitation if we want fine-grained control over the
character itself. Low-level actuators are, for the most part, inaccessible, which makes it
tricky to control a character exactly the way we want. We are restricted to higher-level
control, i.e. restricted to the use of packages.
To modify existing game mechanics, we mostly changed game settings already available.
In the G.E.C.K., these are similar to the concept of global variables. This made the
Figure 6.7: Creating a character.
adaptation of some mechanics, like the radiation or the action points, straightforward.
However, to create new mechanics we have to go through a whole different process. Indeed,
there is no such support within the G.E.C.K. The only possibility is a set of limited
scripting functions, which we used to create both new behaviours for our characters and
new mechanics for K9. To overcome most of the imposed scripting limitations, we opted
to use the Fallout Script Extender6 (F.O.S.E.). The F.O.S.E. injects new instructions at
the initialisation of the game executable, making it possible to create new scripting
functions and make them available to the mod. We can then program these functions in
whatever language we want, as they are self-contained in an external DLL. In this case
we opted for the C++ language and, by extending the source code of the original
F.O.S.E., we created Rusty’s brain.
Before looking at the internals of his brain, we will first describe Rusty as a synthetic
character existing within the FO3 engine, i.e. Rusty’s body.
6.4 Rusty’s Body
Rusty is a dog, one which, in appearance, is similar to Dogmeat — the dog of the original
game. To convey his thoughts, Rusty uses several animations that convey distinct
emotions: being happy, surprised / relieved, confused / fearful, or angry (shown in fig.
6.8).
These animations are coupled with a matching sound component, namely sounds
corresponding to barking and panting. In the happy animation (fig. 6.8(a)), Rusty raises
his tail as he barks happily, to the sky, five times. When he is surprised (fig. 6.8(b)),
with a feeling of being relieved, he pants a few times while wagging his tail horizontally
in a joyful manner. In contrast, when he is confused (fig. 6.8(c)), with a touch of fear,
he lowers his head, puts his tail between his legs and whimpers. Finally, when
6 I. Patterson, S. Abel and P. Connelly, Fallout Script Extender (2008): http://fose.silverlock.org/.
(a) Happy. (b) Surprised / relieved.
(c) Confused / fearful. (d) Angry.
Figure 6.8: The four emotions that Rusty can convey through his animations.
he is angry (fig. 6.8(d)), he assumes a serious and threatening pose in which he growls
ferociously.
Alongside these, we complement his expression with on-screen subtitles, which not only
reinforce his expression but also reflect Rusty’s thoughts (fig. 6.9) — in the fiction of K9,
the player-character can understand Rusty.
Figure 6.9: Rusty’s expression is reinforced through a sound component and on-screen subtitles, which also reflect his thoughts.
The similarity between Rusty and Dogmeat ends here. When and why these animations
are used is almost as important as (if not more important than) the animations themselves.
This is mostly DogMate’s responsibility, which we describe next.
6.5 Interacting with DogMate
Rusty’s behaviour is mostly controlled through an independent module, referred to as his
brain, and interpreted by the scripting component of the FO3 game engine. The interaction
between the module and the engine is technologically limited to a unidirectional invocation
from the scripting component. In fact, from the game engine’s point of view, there is simply
a mod — K9 — whose script invokes yet another scripting function. In reality, F.O.S.E.
intercepts the process and redirects the invocation toward our code, which provides us the
means to implement the functions in an independent (and more convenient) manner. The
interaction is based around two functions: one to update the brain and the other to get
the decision that Rusty will execute in his current situation. This decision is then carried
out within the mod itself. In practice, it corresponds to playing an animation, a sound or
a dialogue, or to activating a specific package. This step is necessary as such actions cannot
be carried out from within the external module.
This component (Rusty’s brain in fig. 6.10) includes DogMate and a world interface
which contains sensors and effectors.
Figure 6.10: Interaction between K9 and Rusty’s brain.
First, let us outline the general flow that corresponds to the brain being updated
(fig. 6.10). It starts from within K9’s scripts (1a). At this point, information about the
world is sensed so that we only consider detected enemies and relevant entities (1b) from
Rusty’s perspective. This information is stored in DogMate’s beliefs. DogMate performs
its update cycle, eventually detecting the intents of relevant intentional entities. As the
result of its cycle, DogMate sends primitives corresponding to the selected behaviour for
Rusty. We store this information in a buffer (1c), the effectors, and at the end of the
K9 cycle, the script checks which decision was made (2). Within the FO3 engine, this
information is used to execute verbal and non-verbal behaviour for Rusty, which consists
of the subtitles and animations presented earlier.
The main purpose of the world interface is to act as an interface between the engine
and the rest of the brain. It is responsible for keeping an up-to-date cache that replicates
the game engine information deemed relevant for our component. This information is
then culled using Rusty’s sensors and transformed into his own beliefs. This information
typically either represents the perception needed to update a predictor or its desired
value. The world interface is also composed of effectors, which consist of a buffer that
stores the decision. Once the decision is retrieved through the “GetDecision” invocation
(2 in fig. 6.10), the engine processes it and makes Rusty behave in the intended manner.
Rusty’s actions are mostly passive. Indeed, he only counsels and verbally supports the
player based on the situation he is witnessing. Hence, the brain’s decision is transformed
into one animation coupled with a matching counsel (i.e. a dialogue appearing as a
subtitle). While the animation, and its respective sound, remains the same for a given
emotion, Rusty’s thoughts may differ. In table 6.1 we show a sample of the text shown to
the player as Rusty’s thoughts and counsels. Note that we use the emotion elicited
by the affective appraisal [20] to select one of the four animations.
Intent            | Player condition | Appraisal | Thought / Counsel
Player attacks    | Good | Happy     | “Oh yeah! Let’s get him!”
Player attacks    | Good | Surprised | “Oh yeah! Being cowards ain’t for us!”
Player attacks    | Bad  | Angry     | “An enemy now? That would be pretty bad...”
Player attacks    | Bad  | Confused  | “Seriously, we should avoid confrontations for now...”
Enemy attacks     | Good | Happy     | “Oh oh! Come and get some!”
Enemy attacks     | Bad  | Angry     | “Now would be a good time to flee...”
Player flees      | Bad  | Happy     | “That’s it! Let’s avoid that one.”
Player flees      | Bad  | Surprised | “Facing hostility in this situation? You had me worried for a moment there...”
Player flees      | Good | Angry     | “A little fight wouldn’t be that bad!”
Player flees      | Good | Confused  | “Seriously, why are you cowering...?”
Want to invade    | —    | Happy     | “Oh yeah! We’ll be there soon!”
Want to invade    | —    | Surprised | “That’s it, no time to waste!”
Avoiding invasion | —    | Angry     | “What are you doing...? We’re getting further away...”
Avoiding invasion | —    | Confused  | “Is there a problem..? We should hurry you know...”
Table 6.1: The animation displayed and the subtitles are based on the selected intent and its affective appraisal.
Some possibilities are not used because, in preliminary tests, they were poorly understood
by the player. An example is an enemy fleeing, noticed by Rusty but not by
the player. Rusty would comment on it, “Oh oh! Scared, are you?”, but most of the time
the player could not understand his reasoning. We could surely adapt this by restricting
such expressions to events that the player also witnessed but, as this is not the scope of our
work, we opted not to include them in Rusty’s final behaviour.
6.6 Summary
In this section, we described K9. As a mod of Fallout 3, it uses most of its game
mechanics, but also adds new ones that suit our goals. The story itself takes place as
a prequel to the Fallout series, a year before the Great War which almost completely
destroyed everything on our planet.
Technologically, we used the G.E.C.K. to create the virtual world and setting, as well
as its characters. Using F.O.S.E. we created an independent module, Rusty’s brain, which
can be invoked from within K9’s scripts.
Rusty’s brain is made up of a world interface and DogMate. The engine provides Rusty’s
brain with game data, which the world interface stores and filters, passing on to DogMate
the information that Rusty can sense. In return, Rusty’s brain provides the
game engine with the behaviour that should be used by Rusty’s virtual body. This
behaviour is a combination of Rusty’s animations, sounds and subtitles, which match the
intent selected by DogMate and its affective appraisal.
In the next section, we describe the evaluation we performed on K9, and the results
we extracted from it.
Chapter 7
Evaluation
In this work, we pursued several objectives. First, we wanted to create a framework
to support the creation of believable behaviour for synthetic characters in general, by
allowing the detection of some of the other characters’ intentions, namely those of the
player-character. We were particularly interested in applying such an approach to
synthetic characters with a particular role, sidekicks, and this interest guided our
development of K9. To evaluate our approach, we focused on analysing whether the
framework could correctly detect the player’s intentions. We were also interested in
understanding how well the recognition of intentions performed when compared to an
external human observer. Finally, we wanted to assess performance. Virtual worlds are
real-time applications in which the greatest emphasis is almost always given to the visual
component. Other components, such as the application’s physics or artificial intelligence,
are left with minimal resources. With such restrictions in mind, we wanted to measure the
computational cost (in terms of CPU consumption) of our solution.
Throughout this section, we first review our experiment, describing the setting and each
step of the process. We then discuss the results of the experiments, analysing the
recognition rate of the player’s intent and the recognition rate of the interpreted intent
(i.e. the player’s intent as interpreted by an external observer). We finish this section
with an analysis of the impact of using this system in terms of computational costs.
7.1 Experiment
To evaluate our test-case, we performed an experiment in which we asked subjects to play
the game. The experiment took place at a public venue, MOJO (“MOntra de JOgos”,
which can be translated as “videogames exhibition”), whose purpose was to demonstrate
videogames created as part of a master’s course. K9 was developed mostly during that
course, while, at the same time, supporting this work as a demonstrator. Although most
of the participants were university students whose ages varied from 19 to 28, MOJO gave
us the chance to evaluate this work with people unfamiliar with our field.
(a) (b)
Figure 7.1: MOJO: a videogames exposition and demonstration.
Before asking them to start the experiment, we made sure that they had a minimal
knowledge of how to play the game, as well as knowledge about K9’s storyline and its
specific mechanics, namely the interaction with Rusty. After the tutorial, we held a short
interview in which we resolved any remaining misunderstandings that participants might
have had about these topics. Once ready, we asked them to play a session of K9’s main
level, in which their objective was to invade the enemy’s headquarters. Each interaction
lasted two to three minutes.
Each game session was recorded with screen-capture software. Immediately after
finishing a session, we asked the participant to annotate their intents. For the
annotation, participants could choose one of five available options: whether they wanted
to attack an enemy or flee from it; whether they wanted to open the door or were avoiding
it; the fifth option was to be used if their intent did not match any of the previous four
options. From this evaluation, we gathered a total of 54 samples (instants in which Rusty
expressed itself based on the user’s intent) from 9 different participants.
Because human observers watching the game would also fail to recognize certain
intents, we wanted to compare the performance of our approach with that of a human
observer. Our motivation is that if the mismatch between the real intent and the
recognized intent can be understood as natural by the player, it might appear believable.
However, for the player to understand Rusty’s reactions, he has to interpret the situation
from Rusty’s point of view. With this idea in mind, we randomly selected 30 samples
from the 54 available. We then published them on a video streaming website. Although
we lost some video quality, it gave us the possibility to pursue the evaluation online.
Our aim was to reach a larger population this time, and each participant was asked to
review the thirty samples and to classify them using the same options previously available
to the players. With this approach we collected 820 valid samples from 30 observers.
We now discuss our results.
7.2 Intent Recognition
We gathered a set of 54 samples from the venue’s participants and we compared each
of them against their respective annotation. We found that the participant’s intent was
recognized 61% of the time (see table 7.1).
Rusty                   | samples | matches | %
player invades building | 17 | 13 | 76.47%
player attacks enemy    | 25 | 17 | 68.00%
“avoid” intents         | 12 | 3  | 25.00%
total                   | 54 | 33 | 61.11%
Table 7.1: Matches between Rusty’s interpreted intent and the player’s intent.
During our tests, we found that participants rarely fled from an enemy (although
this intent was correctly detected several times). They also had a certain difficulty
expressing negative intents such as “avoiding an entity” because, in reality, when they did
so, it was mostly because they had started to do something else (i.e. “to seek” another
entity). Comparatively, we achieved a recognition rate of 71% for intents related to moving
toward an entity (first two rows in table 7.1). These results support our assumption that
if the player moves his avatar toward an entity, he probably has the intent to interact
with it.
The gathered data also suggests that some refinements could help increase the
recognition performance. At the conceptual level, it may be important to take the player’s
field of view into account. Even if the player could be focusing his attention on something
he cannot directly see (e.g. he wants to ambush an enemy by making a detour around
an obstacle to get behind him), such information could be useful to confirm specific
situations and prune others. An example is when the player is strafing, as in
figure 7.2. We should not consider that the player intends to attack the enemy that is
outside his field of view if we notice that he is focusing on keeping a safe distance from
the enemy he is seeing.
Figure 7.2: The player is strafing toward an enemy while looking at another one.
At the implementation level, the notion of distance could be implemented differently.
In K9, we chose to use Euclidean distance as a compromise between accuracy and
performance. However, in a world with a high number of obstacles (e.g. buildings, rocks,
interiors), this value may be inadequate. A more realistic value for distance could be
computed using path-finding algorithms. However, it is a costly process in virtual worlds
of a dynamic nature, and the added value for intent recognition remains to be measured.
Even if we could access the path-finding information, which distance should we consider
in figure 7.3? It is but a simple situation, yet not a trivial one to solve. When one of the
three distances decreases, the others might increase. Given this scenario, should we
consider that the player-character is getting closer to or further away from the enemy?
Figure 7.3: Which of the three distances should we use? The blue line represents the Euclidean distance and the green lines the available paths.
7.3 Intent Interpretation
We do not need realistic behaviour to achieve believability, only to display specific and
context-aware behaviour [18]. When Rusty fails to recognize the user’s intent, we do not
want it to break the user’s suspension of disbelief by displaying inadequate behaviour
based on the failed intent recognition. If we ask a human observer to comment on
the possible intentions of his peer, he too might get some intentions wrong. However,
every time the observer recognizes an intention, he can justify it based on his observation.
Then, if the user uses some theory of mind and puts himself in the shoes of the observer,
limiting his own knowledge to the information the observer had, he might agree with the
observer that the recognized intention could indeed be valid from a certain point of view.
As such, we wanted to verify whether Rusty succeeds or fails at recognizing the same
intents a human observer would succeed or fail to recognize.
We classified each sample according to whether Rusty and/or the external observer
had correctly recognized the user’s intent (see table 7.2) and applied a Pearson’s chi-square
test, χ²(1, N = 820) = 18.31 (p < 0.001). Results suggest that, generally, Rusty would
recognize the same user intents the human observer would, and fail to recognize the
same user intents the external observer would fail to recognize. As such, Rusty and
external human observers seem to perform similarly when it comes to recognizing intents,
leading us to believe that Rusty’s intent reaction may be perceived as believable by the
user.
Rusty ≈ Observer | Observer recognized | Observer not recog. | total
Rusty recognized | 377 | 272 | 649
Rusty not recog. | 68  | 103 | 171
total            | 445 | 375 | 820
χ²(1, N = 820) = 18.31, p < 0.001
Table 7.2: Matching external observer’s interpreted intent and Rusty’s recognized intent.
The 820 gathered samples also reinforce our previous statement about negative intents.
Subjects assuming the role of observers also had difficulties identifying such intents, as
only 6.14% were correctly recognized (compared to 62% positive intent recognition —
see table 7.3). The small difference between Rusty’s performance and the lower human
observer’s performance may be related to the fact that Rusty has access to information
that is not always on the game screen. This is desirable for virtual sidekick agents, as the
user will value the added understanding of the surrounding virtual world, without feeling
that the sidekick has access to information the user simply cannot access.
Observer                | samples | matches | %
player invades building | 325 | 217 | 66.77%
player attacks enemy    | 381 | 221 | 57.40%
“avoid” intents         | 114 | 7   | 6.14%
total                   | 820 | 445 | 54.27%
Table 7.3: Matching observer’s recognized intent and player’s intent.
7.4 Limitations
The experiment helped us identify several limitations in our approach. First and
foremost, not all actions’ intents can be equally well identified. Our results support that
“moving toward an entity” is correlated with the player intending to interact with said
entity. However, our results also show that “moving away from an entity” is not correlated
with the intent to avoid that same entity. As mentioned earlier, a possible explanation
is that participants explained their actions by stating they wanted to do something new
and not by stating they did not want to do an action anymore: if the player stops
attacking and moves away from a virtual enemy agent, he might not be fleeing from it;
he might just be looking for something that suddenly appeared on the floor or attacking
another enemy.
The second limitation is that our approach emits intents that have no past history, as
if each one were the first and only detected intent. This results in duplicated and
correlated intents being emitted with no regard for their possible cause. If the player gets
surprisingly closer to an enemy and, seconds later, his distance decreases surprisingly once
again, two different intents are emitted that refer to the same underlying intent: to attack
the enemy. Also, attacking an enemy might produce an intent suggesting that the player
is fleeing from another one. This lack of history and correlation between intents led to
the emission of some false intents.
Figure 7.4: Avoiding an entity is not just a simple negation of the converging intention.
7.5 Computational Impact
Graphical applications such as videogames usually require most of the hardware re-
sources to be dedicated to the graphics engine. When each CPU cycle is crucial, the most
important concern is to sustain the graphical frame rate, as it is one of the most noticeable
aspects of the interaction for the end-user. While the game logic also needs its share of
resources, in computer and video games the rule is: the lower, the better. With this in
mind, we aimed for a system with low resource requirements: if it is to be added on top
of current-generation game technology, it has to be barely noticeable.
Our first concern was the framework's update rate. An empirical test showed that 2
to 4 hertz tends to produce good results. Below 2 hertz, behaviour seems unnatural, as
too much time passes between the user's action and the intent's detection. Frequencies
higher than 4 hertz also produce unnatural behaviour, this time as if Rusty were reacting
instantly. In a dedicated testing environment, we used 300 predictors divided into 100
sets, solely to monitor virtual enemy agents. This number is exaggerated, but should
cover most of the user's potential centres of focus in a virtual world. Even with more
entities, those that are not relevant to the user's current situation can be culled. Profiling
this instantiation of the framework did not show any significant overhead (0.18%) at 4
hertz.
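The decoupling described above (a low-frequency intent update running alongside a high-frequency render loop) can be sketched as a fixed-timestep accumulator. The scheduler class, the 50 fps frame time and the callback names below are illustrative assumptions, not the actual K9 code:

```python
INTENT_RATE_HZ = 4                         # empirically, 2-4 hertz felt natural
INTENT_PERIOD_MS = 1000 // INTENT_RATE_HZ  # 250 ms between intent updates

class IntentScheduler:
    """Runs a (hypothetical) intent-recognition update at a fixed low rate,
    independently of the graphical frame rate."""

    def __init__(self, update_fn, period_ms=INTENT_PERIOD_MS):
        self.update_fn = update_fn
        self.period_ms = period_ms
        self.accumulator_ms = 0

    def on_frame(self, dt_ms):
        """Called once per rendered frame with the frame's elapsed time."""
        self.accumulator_ms += dt_ms
        while self.accumulator_ms >= self.period_ms:
            self.accumulator_ms -= self.period_ms
            self.update_fn()

# Usage: at 50 fps (20 ms frames), one simulated second of rendering
# triggers the intent update only 4 times.
ticks = []
sched = IntentScheduler(lambda: ticks.append(1))
for _ in range(50):
    sched.on_frame(20)
print(len(ticks))  # -> 4
```

The render loop keeps running at full speed; only the cheap accumulator check happens every frame, which is consistent with the negligible overhead measured above.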
Real-time constraints are mandatory in applications such as computer and video
games. While our solution only recognizes a subset of all possible intents, we would argue
that, with such a low impact on CPU cycles, it recognizes them almost for free. As
such, we argue this approach could be used to help create believable behaviour for
synthetic characters inhabiting the virtual world.

Figure 7.5: DogMate did not take more than 0.18% of the used CPU cycles.
7.6 Summary
We assessed DogMate's success by evaluating K9 in a public venue. We focused on
gathering two types of data: about the recognition of the player's intentions, and about
the interpretation of those intentions. Hence, we interviewed the participants and classi-
fied several samples (instants in which an intent is detected) with them. Furthermore, we
published a randomly selected subset of these samples on the internet for further classi-
fication, this time with the objective of gathering data from external observers.
These classifications allowed us to support our assumption that if the player is moving
toward an entity, it must be to interact with it. We also propose that this assumption
might be refined, namely by looking at the player's field of view and by implementing
other notions of distance, such as analysing several paths. The data also helped us realise
that this assumption cannot simply be negated and remain valid: if the player is not
moving toward an entity, it does not mean that he does not want to interact with it.
Players usually prefer to say that they started to seek something else rather than that
they avoided the said entity. Our analysis suggests that these findings hold whether the
samples are classified by the player himself or by an external observer.
Rusty interprets the player-character's intent from a third-person perspective. While
guessing this intent accurately is important, to remain believable the guess should, at
least, have a logical explanation from an observer's point of view. Thus, we compared
Rusty's detected intents with the external observers' classifications and found that both
were consistent with each other. This suggests that the player would be able to accept
Rusty's recognised intentions as valid in his own context.
Finally, we evaluated our implementation in a dedicated environment with a large group
of entities. Profiling the application showed that the overhead is minimal. This was
expected, as the framework only needs a low refresh rate, 2 to 4 hertz, to produce accept-
able performance. Hence, with such a low overhead, this solution comes almost as free
support for character designers to enhance their creations.
Chapter 8
Conclusions
Synthetic characters generally lack the support to understand a fundamental character
in their world: the player-character. This has a negative impact on their behavioural
believability, as the surrounding characters are unable to provide adequate context-aware
behaviour to the player. In this work, we analysed synthetic characters, and sidekicks in
particular, which typically interact with the player for long hours and are consequently
at greater risk of breaking his suspension of disbelief than other characters. We saw that
they could benefit from the capacity to understand the intention underlying the player-
character's actions. Our objective was therefore to provide a framework for recognizing
some of the intentions underlying the actions of another synthetic character sharing the
same virtual world. By understanding intentions in particular situations, we allow char-
acters to display specific context-aware behavioural reactions and improve their perceived
believability.
We started by analysing several works in which anticipation is used to recognise actions
and, based on Searle's definition of intention, we modelled the relation between intention
and action. We discussed how an action can be divided into three components (movement,
target and intent) and proceeded to detail how these components can be detected in a
virtual world, based on the matching and mismatching of anticipated behaviour related
to distance changes between entities.
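The matching and mismatching of anticipated distance change can be illustrated with a minimal sketch. The moving-average predictor and the surprise threshold below are simplifying assumptions made for illustration; they stand in for the emotivector's actual prediction model:

```python
def detect_surprise(history, sensed, threshold=2.0):
    """Compare the sensed player-entity distance against a naive
    prediction (here: the mean of recent samples) and flag a mismatch."""
    predicted = sum(history) / len(history)
    error = sensed - predicted
    if abs(error) < threshold:
        return None                      # matches anticipation: no intent
    # An unexpected decrease in distance suggests the player is
    # converging on the entity (e.g. intending to interact with it).
    return "converging" if error < 0 else "diverging"

# Player-enemy distance hovering around 10 m, then a sudden approach:
history = [10.2, 10.0, 9.8]
print(detect_surprise(history, 9.9))   # -> None (expected movement)
print(detect_surprise(history, 4.0))   # -> converging (mismatch: intent)
```

Note that, as discussed in the limitations, only the "converging" branch proved reliably tied to an intention; "diverging" cannot safely be read as avoidance.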
To support this work, we used the emotivector and made some alterations to its
models, namely to classify the relevance of an action. We detailed how the saliences it
generates trigger new intents, whose type depends on the emotivectors that become
relevant simultaneously, and how an affective appraisal provides the character with a
personal view of the situation. The whole process constitutes our framework, DogMate.
A test-case (K9) was presented, showing how our framework can be connected to a
current-generation game engine. In K9, DogMate was integrated into an agent architec-
ture to control the behaviour of the player-character's sidekick, Rusty. Rusty interacts
with the player within the virtual world and advises him based on the intents it interprets
from the actions of his avatar. The test-case evaluation suggests that our framework
can be used to identify some of the player’s intents and, at least in some tasks, performs
comparably to a human observer, an encouraging result as we were mostly focused on the
generation of believable behaviour. The results also show that while the framework only
recognizes a subset of possible intents, it has a low computational impact, and as such is
suited for integration.
8.1 Concluding Remarks
Our solution’s evaluation showed satisfactory results in the recognition rate of the player’s
intention based on the movement to get closer to an entity. We view these results as a
confirmation that anticipation is a valid support for the detection of some of the player’s
intention. However, our assumption was not equally valid in all situations. In fact, we
should not have assumed that when the player moves away from an entity, the intentions
should be the same as when he gets close to it but in its negated form. This led us to find
that player’s rarely ever think in the negative terms, e.g. instead of not being interested
in one entity, they affirm to be interested in another one.
This approach also yielded promising results, as it showed that the detected intention
could be valid as an interpreted intention, as if an external observer had made the guess.
This is important because the role of a sidekick, like any other character, is not to be
the player, but to interact with him; sidekicks should act as an external observer would:
both watch and interpret the player's experience from a third-person perspective.
Our last concern was to perform the detection not only in real-time but also efficiently,
so that it could be an appealing method for resource-intensive 3D applications (e.g.
videogames). Here too, the results were very encouraging, as the total processing time
consumed by our framework (in a stress test) did not exceed 0.18% of the total. This
lets us say that, while we may not detect a wide range of intentions, their detection is
almost costless.
8.2 Future Work
We left some details out of our approach that could be further investigated. First and
foremost, we analysed the unexpected actions of the player-character, but left out the
expected actions. Expected actions pose a temporal problem: when does an expected
action start and finish? Similarly, when does one expected action change into the next?
Take for instance the case in which the player-character runs through a world full of
static entities. He always moves in the same direction, at the same speed, in a predictable
manner. After a while, he has passed by a few entities. Did he intend to do anything
with any of them? Probably not, but we only knew it after he passed them by. How can
we know it beforehand?
One possible direction for this work comes from the limitations identified during
evaluation, such as intents being detected without taking into account either their history
or their possible correlation with other intents. We believe these limitations could be
addressed by introducing a higher-level layer that would place detected intents in a
timeline, to recognise duplicates and correlate intents with each other. We could keep
the framework as-is and use the detected intents as input to this higher-level layer, as
shown in figure 8.1. This layer would then act as a filter, preventing false positives from
passing through. Additionally, the framework should be able to detect intentions in
expected movement (e.g. while exploring), by introducing prior intentions into the
framework. All of these are possible directions we are considering for future work.
Figure 8.1: The intent is used as input into a higher-level layer which filters correlated and duplicated intents.
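Such a filtering layer might be sketched as follows; the time window and the criterion for "same intent" (same type and target within the window) are assumptions made purely for illustration:

```python
class IntentFilter:
    """Higher-level layer: keeps a timeline of detected intents and
    suppresses duplicates referring to the same underlying intent."""

    def __init__(self, window=5.0):
        self.window = window   # seconds during which a repeat counts as a duplicate
        self.timeline = []     # entries: (timestamp, intent_type, target)

    def submit(self, t, intent_type, target):
        # Drop entries that fell out of the correlation window.
        self.timeline = [e for e in self.timeline if t - e[0] <= self.window]
        for _, typ, tgt in self.timeline:
            if typ == intent_type and tgt == target:
                return None    # duplicate of a recent intent: filtered out
        self.timeline.append((t, intent_type, target))
        return (intent_type, target)

# Two "attack" detections on the same enemy, seconds apart, yield one intent:
f = IntentFilter()
print(f.submit(0.0, "attack", "enemy_1"))  # -> ('attack', 'enemy_1')
print(f.submit(2.0, "attack", "enemy_1"))  # -> None (duplicate)
print(f.submit(9.0, "attack", "enemy_1"))  # -> ('attack', 'enemy_1')
```

Correlating intents with their possible cause (e.g. suppressing a "fleeing" intent emitted while attacking another enemy) would require richer rules than this duplicate check, but could live in the same layer.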
We should also revise how a set's salience pattern might elicit the detection of an
intention. For instance, we saw that when the player moves further away from an entity,
it does not necessarily mean that he is running away from it. If, however, we measure
other metrics and compare them against a desired value, we might gather a valuable
indication. Take visibility, for instance, and let us consider that a desired value of zero
indicates that the player-character is fully hidden from the enemy. Then, if an unexpected
change occurs in the monitored value of that emotivector, it might suggest that the player
is either hiding from that enemy or getting out of cover, possibly to attack him (depending
on the signal's variation). This is just one example of an idea that might easily be
implemented in the current framework to refine detected intentions.
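As a sketch of this idea (the surprise threshold and the 0-to-1 visibility scale are illustrative assumptions; a desired value of 0 means fully hidden, as above):

```python
def appraise_visibility(previous, sensed, desired=0.0, threshold=0.3):
    """Monitor a visibility metric (0 = fully hidden, 1 = fully exposed)
    against its desired value; an unexpected change refines the intent."""
    change = sensed - previous
    if abs(change) < threshold:
        return None                      # no surprising change: no intent
    # Moving toward the desired value (0) suggests hiding;
    # moving away from it suggests leaving cover, possibly to attack.
    if abs(sensed - desired) < abs(previous - desired):
        return "hiding"
    return "leaving cover"

print(appraise_visibility(0.9, 0.2))  # -> hiding
print(appraise_visibility(0.1, 0.8))  # -> leaving cover
print(appraise_visibility(0.5, 0.6))  # -> None (expected fluctuation)
```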
Finally, there are still other intentions to consider, namely prior intentions. These
might be detected if the system can recognise that the player-character is performing
several actions to achieve the same objective. To that end, however, we do not believe
that looking solely at independent actions and correlating them would be enough. We
should, for instance, try to track every piece of information that the player knows about
the world, relate it to a model of the player, and try to understand whether he performed
a certain action to achieve an objective (effectively detecting a prior intention).

Hopefully, these are questions we will further investigate in the future.
Bibliography
[1] J. Bates. The nature of characters in interactive worlds and the Oz project. Technical report, Carnegie Mellon University, 1992.
[2] J. Bates. Virtual reality, art, and entertainment. Presence: Teleoper. Virtual Environ., 1(1):133–138, 1992.
[3] J. Bates. The role of emotion in believable agents. Commun. ACM, 37(7):122–125, 1994.
[4] J. Bates, B. Loyall, and W. S. Reilly. Broad agents. In Proceedings of AAAI Spring Symposium on Integrated Intelligent Architectures, pages 38–40, 1991.
[5] M. Bratman. Intentions, Plans, and Practical Reason. Harvard University Press, 1987.
[6] R. Burke. It's about time: Temporal representations for synthetic characters. Master's thesis, MIT, The Media Lab, 2001.
[7] P. Carruthers and P. K. Smith. Theories of Theories of Mind. Cambridge University Press, 1996.
[8] D. C. Dennett. The Intentional Stance. MIT Press, 1987.
[9] P. Gorniak. The Affordance-Based Concept. PhD thesis, MIT, 2005.
[10] P. Gorniak and D. Roy. Speaking with your sidekick: Understanding situated speech in computer role playing games. In Proceedings of the First Annual Artificial Intelligence and Interactive Digital Entertainment Conference, pages 385–392, Menlo Park, CA, USA, 2005. AAAI.
[11] K. Isbister. Better Game Characters by Design: A Psychological Approach. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2006.
[12] D. Isla and B. Blumberg. New challenges for character-based AI in games. In Artificial Intelligence and Interactive Entertainment: Papers from the 2002 AAAI Spring Symposium. AAAI Press, 2002.
[13] C. Kline. Observation-based expectation generation and response for behavior-based artificial creatures. Master's thesis, MIT, The Media Lab, 1999.
[14] J. E. Laird. It knows what you're going to do: adding anticipation to a Quakebot. In AGENTS '01: Proceedings of the fifth international conference on Autonomous agents, pages 385–392, New York, NY, USA, 2001. ACM.
[15] J. E. Laird, A. Newell, and P. S. Rosenbloom. Soar: an architecture for general intelligence. Artif. Intell., 33(1):1–64, 1987.
[16] N. Lazzaro. Why we play games: Four keys to more emotion without story. In Game Developers Conference, March 2004.
[17] I. Leite, C. Martinho, A. Pereira, and A. Paiva. iCat: an affective game buddy based on anticipatory mechanisms. In AAMAS '08: Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems, pages 1229–1232, Richland, SC, 2008. International Foundation for Autonomous Agents and Multiagent Systems.
[18] A. B. Loyall and J. Bates. Personality-rich believable agents that use language. In Proceedings of the First International Conference on Autonomous Agents, pages 106–113. ACM Press, 1997.
[19] D. Mark. The art of AI sidekicks: Making sure Robin doesn't suck. http://aigamedev.com/open/article/art-of-sidekicks/, June 2008.
[20] C. Martinho. Emotivector: Affective Anticipatory Mechanism for Synthetic Characters. PhD thesis, Instituto Superior Tecnico, Technical University of Lisbon, Lisbon, Portugal, September 2007.
[21] C. Martinho and A. Paiva. Using anticipation to create believable behaviour. In Proceedings of the AAAI 2006, pages 175–180. AAAI Press, 2006.
[22] C. Martinho and A. Paiva. It's all in the anticipation. In Proceedings of the seventh international conference on Intelligent Virtual Agents, pages 331–338, Paris, France, 2007. Lecture Notes in Computer Science 4722, Springer, 2007.
[23] M. Mateas and A. Stern. Facade: An experiment in building a fully-realized interactive drama, 2003.
[24] M. Mateas and A. Stern. Natural language understanding in Facade: Surface text processing. In Proceedings of the Conference on Technologies for Interactive Digital Storytelling and Entertainment, 2004.
[25] M. Mateas and N. Wardrip-Fruin. Defining operational logics. In Proceedings of the Digital Games Research Association, September 2009.
[26] NIST/SEMATECH e-Handbook of Statistical Methods. http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc4.htm, October 2009.
[27] R. Prada. Teaming Up Humans and Synthetic Characters. PhD thesis, Instituto Superior Tecnico, Technical University of Lisbon, Lisbon, Portugal, December 2005.
[28] A. S. Rao and M. P. Georgeff. BDI agents: From theory to practice. In Proceedings of the first international conference on multi-agent systems (ICMAS-95), pages 312–319, 1995.
[29] A. Rollings and E. Adams. Andrew Rollings and Ernest Adams on Game Design. New Riders Publishing, Indianapolis, 2003.
[30] J. R. Searle. Intentionality, an essay in the philosophy of mind. Cambridge University Press, Cambridge, NY, USA, 1983.
[31] L. Sheldon. Character Development and Storytelling for Games. Premier Press, 2004.
[32] F. Thomas and O. Johnston. The Illusion of Life. Hyperion Press, New York, NY, USA, 1994.
[33] B. Tomlinson, M. Downie, and B. Blumberg. Multiple conceptions of character-based interactive installations. In ITU Recommendation I371, pages 5–11, 2001.