DogMate: Intent Recognition through Anticipation
Eurico Jose Teodoro Doirado
Dissertation for obtaining the degree of Master in Information Systems and Computer Engineering
Jury
President: Dr. Ana Maria Severino de Almeida e Paiva
Supervisor: Dr. Carlos Antonio Roque Martinho
Members: Dr. Manuel Joao da Fonseca
Dr. Pedro Alexandre Simoes dos Santos
November 2009
Acknowledgements
A huge thank you to my parents, who for years and years put up with me, helped me
and made it possible for me to get this far.
I thank my sisters, Alexandra and Nadine. Forgive me, Nadine, you were the one
who suffered the most through all of this.
I thank my colleagues at GAIPS, in particular Guilherme and, above all,
Professor Carlos Martinho, my supervisor.
I thank my training partners, my friends and those who are both, the
"four others": Dante, Xana, Vlad and Mestre.
Finally, I want to leave a very special thank you to the person I will have to call
my best friend, for lack of a better term. There are no words to describe what
you did for me.
There really are fantastic people in this world. Thank you all very much.
Porto Salvo, 21 October 2009
Eurico Jose Teodoro Doirado
to You
Resumo
Sidekicks tend to have difficulties understanding their partners, namely the
player's avatar, which has a negative impact on their believability. In this work,
we present a method to detect some of the player's intentions. We begin by
analysing the question of believability in synthetic characters. We review related
work that focuses on improving character believability through anticipatory
techniques, namely by anticipating the actions of other characters. With the aim
of understanding the reason behind these actions, we propose a model that relates
intentions and actions, and proceed with the elaboration of a framework that
interprets the intentions behind an action based on an anticipatory mechanism. We
implement this framework in a demonstrator whose purpose is to control the
behaviour of the player's sidekick. Finally, we present and analyse the three
aspects addressed in our tests: the recognition of the player's intentions, their
interpretation and the efficiency of our solution. The results indicate that our
solution can be used to detect some of the player's intentions and that, in those
cases, recognition is similar to that of a human observer, with no significant
impact on performance.
Abstract
Sidekicks generally lack the support to understand a fundamental character in their world
— the player’s avatar. This has a negative impact on their behavioural believability. In
this work, we present an approach to detect the intent underlying certain actions of
the player. We begin by introducing synthetic characters and sidekicks in particular,
emphasising their believability. We then review related works which enhance the
believability of characters by anticipating the actions of other characters. To understand
the reason underlying their actions, we model the relation between intention and action
and then proceed to elaborate a framework that can interpret the intent of an action
based on an anticipatory mechanism. We then present a test-case in which our framework
is used to create an architecture controlling the sidekick of the player. Finally, we discuss
three aspects of our evaluation: the player’s intent recognition, its interpretation and the
solution’s efficiency. Our results suggest that our solution can be used to detect certain
intents and, in such cases, perform similarly to a human observer, with no impact on
computational performance.
Keywords
Intentions
Actions
Synthetic Characters
Videogames
Contents
1 Introduction
1.1 Motivation
1.2 The Problem
1.3 Contributions
1.4 Outline
2 Believable Characters
2.1 Synthetic Characters
2.2 Dimensions of a Character
2.3 Interactive Characters
2.4 The Sidekick
2.5 Summary
3 Related Work
3.1 The SOAR Quakebot
3.2 Situated Speech
3.3 Temporal Causality
3.4 Aini — The Synthetic Flower
3.5 Summary
4 Intentions and Actions
4.1 Intentions in Philosophy of Mind
4.2 Intentions in Computer Science
4.3 The Player and the Player-Character
4.4 Model of Intentions
4.5 Components of an Action
4.5.1 Scenario
4.5.2 Movement
4.5.3 Target
4.5.4 Intent
4.6 Summary
5 DogMate
5.1 Emotivector
5.2 Predictors
5.3 Relevance
5.4 Affective Appraisal
5.5 DogMate's Flow
5.6 Summary
6 Test-Case: K9
6.1 The Story
6.2 The Gameplay
6.3 Technological View
6.4 Rusty's Body
6.5 Interacting with DogMate
6.6 Summary
7 Evaluation
7.1 Experiment
7.2 Intent Recognition
7.3 Intent Interpretation
7.4 Limitations
7.5 Computation Impact
7.6 Summary
8 Conclusions
8.1 Concluding Remarks
8.2 Future Work
List of Figures
1.1 Sonic evolution.
2.1 CG films visuals.
2.2 Rusty, our synthetic character.
2.3 The player-character and Rusty, his sidekick.
3.1 Quake 2.
3.2 Duncan, the Highland Terrier.
3.3 Aini, the synthetic flower.
4.1 Model of intentions.
4.2 The entities of our scenario.
4.3 Rusty observes an unexpected moment.
4.4 Alone, the distance is not enough.
5.1 The emotivector.
5.2 Emotivector's affective sensations.
5.3 Rusty, the observer.
5.4 Problem of a normalized range.
5.5 Rusty's interpretation.
5.6 DogMate's update flow.
6.1 Ralph, listening to the radio.
6.2 The Chinese commando.
6.3 Three Dog kills the robot medic.
6.4 Ralph uses Rusty's senses.
6.5 The G.E.C.K., Fallout 3 main modding tool.
6.6 Editing the navmesh.
6.7 Creating a character.
6.8 Rusty's four animations.
6.9 Rusty's thoughts.
6.10 Interaction between K9 and Rusty's brain.
7.1 MOJO: a videogames exposition and demonstration.
7.2 The player is strafing.
7.3 Which distance?
7.4 The problem of avoiding an entity.
7.5 DogMate computational impact.
8.1 Filtering intents.
List of Tables
3.1 Comparative analysis of the presented works.
5.1 The nine salience patterns from two velocity-emotivectors.
5.2 Affective appraisal.
6.1 The animations and subtitles for each intent.
7.1 Matches between Rusty's interpreted intent and the player's intent.
7.2 Matching external observer's interpreted intent and Rusty's recognized intent.
7.3 Matching observer's recognized intent and player's intent.
Chapter 1
Introduction
1.1 Motivation
In the last few years, the graphical quality of synthetic characters has improved tremen-
dously. Despite the fact that interactive applications and videogames in particular offer
stunning visuals, approaching photo-realism in some cases, characters with human-like
behaviour are still nowhere to be found. While graphics have improved continuously and
rapidly, synthetic characters' behaviour has not evolved at the same rate. Two decades
ago, when characters were but a few pixels on the screen, they could already express a
wide range of behaviours. Who cannot remember Sonic the Hedgehog1 tapping his foot
on the ground, expressing his impatience? Or the angry Donkey Kong2 grinning at the
"poor" plumber who was trying to rescue his love, Pauline? While their expressions were
varied, and are even more so nowadays, they were only superficial behaviours. And we
can say that this has not changed much since.
Figure 1.1: From few pixels to stunning visuals, Sonic evolution in a few years.
Imitating human behaviour cannot be reduced to reproducing a fixed set of
animations. Animations (including sound and dialogue) would be the final step in the
long process necessary to emulate such behaviour and express it. Characters need the
right support to express themselves consistently in different contexts without adopting a
behaviour that would look out of place. This support can be based on events (e.g. the
1 Sonic Team (1993): Sonic the Hedgehog. http://www.sega.com/sonic/
2 Nintendo (1981): Donkey Kong. http://en.wikipedia.org/wiki/Donkey_Kong_(video_game)
player triggers a dialogue with another character and the character talks back), on the
character’s personality (e.g. the character is a coward and runs away from any dangerous
situation) or even on its relationship with another being (e.g. the player’s avatar and his
sidekick are comrades; thus, the sidekick will always help that character). Nevertheless,
such examples only constitute a small part of the support a character might need to
simulate the traits of human beings. This highlights the gap between the progress
achieved in characters' behavioural and graphical components (for the latter, we
sometimes have difficulty telling computer graphics apart from reality). Even if the
academic community has, over time, tackled a large variety of problems directly
related to the enhancement of believability (encompassing the previous topics and many
more, like emotions, psychological modelling or even cultural roles), game developers have
yet to incorporate the work of these researchers into their productions.
Videogames still rely heavily on scripts to define the behaviour of their characters and,
as time goes on, those characters tend to be perceived as flat and lifeless. If we were to
meet a stranger in the street and ask him the same question a thousand times, he would
probably start by ignoring us and thinking we are crazy. Eventually, if we insisted, the
"poor" guy would probably begin to feel irritated and ask us to stop bothering him. If we
were to continue, he would probably get angry and things would look bad for us. However,
in most videogames, if we were to ask a character something a thousand times, it would
probably answer us the same way every time. Fallout 33 is an exception, in that it does
display such behaviour.
The problem deepens if we look at other traits like the character’s social status, or
its relationship with another character. In the latter, for instance, there is a character
that suffers tremendously from the lack of support to develop its natural relation with
the player’s character. This character is the player’s sidekick. The sidekick accompanies
the player’s avatar the most. Being almost always together means that both should know
each other relatively well. But then, why after so many adventures can it not guess the
player's preferences? If we spend a lot of time with another being, we naturally begin to
understand him better. Over time, we guess much more accurately how he would react
and feel in different situations. We can judge and interpret his overall intentions much
better than those of someone we have never met before. This, however, does not happen
with sidekicks. Their interaction with the player at the beginning and at the end of the
game remains mostly the same, which does not contribute to the character appearing
believable [33].
3Bethesda Game Studios — Fallout 3 (2008): http://fallout.bethsoft.com/.
1.2 The Problem
As the player advances through the game, he gradually learns more about his sidekick
companion. It can be through the story or, indirectly, through their long and close
relationship. He starts to expect some kind of attitude from the sidekick, some kind of
actions in specific situations. The player begins to understand the how and why of his
companion’s behaviour, or in other words, he begins to understand the intentions behind
its actions. Sidekicks, however, spend the exact same amount of time with the player’s
avatar, yet their behaviour does not reflect it. The underlying intention of the player’s
actions does not seem to matter, as the sidekick ignores it. Understanding another being's
intentions is a natural part of human cognitive capacities, and may reflect the closeness
of two individuals. Dennett's intentional stance [8] tells us that it is an important tool for
understanding our surroundings and anticipating how everyone around us will behave.
In a videogame, it is even more important to understand the intentions behind the
actions of another character. In fact, it may open many doors to new kinds of behaviours,
only limited by the designer’s imagination. A character, by understanding the intentions
of another being, can reflect in his own actions a certain sense of importance and pres-
ence through pro-activeness and anticipation. However, we cannot discard an important
fact: videogames are real-time applications which already use the current generation of
hardware to its limit. To be usable in such conditions, intent detection needs to
happen in real time and be almost costless.
As such, the work presented in the remainder of this document addresses the problem
that sidekicks face when accompanying the player's avatar. We focus on detecting the
player's avatar's actions and analysing their underlying intention, from the point of view
of the sidekick, in a lightweight, real-time manner. Our solution for this problem is based
on the hypothesis that: by anticipating the player's avatar's actions and confronting
them with his real actions, we can understand some of his intentions.
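This hypothesis can be illustrated with a minimal sketch (our own hypothetical simplification, not the actual DogMate implementation; all names, and the constant-velocity predictor, are assumptions made for illustration): a naive predictor extrapolates the avatar's motion, and any deviation of the observed position toward a candidate target, relative to that prediction, is read as a cue to the player's intent.

```python
import math

def predict_next_position(pos, velocity, dt=1.0):
    """Naive constant-velocity prediction of the avatar's next position.
    (Illustrative only; the thesis relies on an anticipatory mechanism,
    the emotivector, rather than this exact predictor.)"""
    return (pos[0] + velocity[0] * dt, pos[1] + velocity[1] * dt)

def guess_intended_target(observed_pos, predicted_pos, targets):
    """Confront the observed move with the anticipated one: the candidate
    target the avatar approached more than expected is taken as a cue to
    the player's intent. Returns None when no target stands out."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    best, best_gain = None, 0.0
    for name, target_pos in targets.items():
        # Positive gain: reality brought the avatar closer to this
        # target than the prediction did.
        gain = dist(predicted_pos, target_pos) - dist(observed_pos, target_pos)
        if gain > best_gain:
            best, best_gain = name, gain
    return best
```

For example, an avatar predicted to keep moving straight ahead but observed veering toward a door would yield "door" as the cued intent.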
1.3 Contributions
In this work we propose a model of intentions that we use to detect actions and their
underlying intents in videogames. We use and remodel an anticipatory mechanism, the
emotivector, to suit certain aspects of interactive environments such as videogames and
to support an objective comparison between multiple expectation violations. We present a
framework that analyses the player's avatar's actions to extract possible intentions from
them and interprets them from the perspective of the sidekick. Its purpose is to serve as a
support mechanism enabling an adequate behaviour for the sidekick, a behaviour that
would match the situation that the player is currently experiencing. This framework is
built taking into consideration the performance constraints peculiar to videogames. As such,
we built a solution that works in real-time and that can be implemented with no noticeable
impact on the usage of hardware resources.
1.4 Outline
This document is organised as follows. In chapter 2, we gradually introduce the notions
that constitute a character and the difficulties it faces to achieve believability in an
interactive environment. We explore the sidekick, its role and its problems in a videogame.
Chapter 3 presents related work that contributes to the problem of believability by using anticipation
to predict the actions of another character. In chapter 4, we model the relation between
intention and action, and discuss how action can be divided into movement, target and
intent. We then discuss how each concept can be applied in the context of virtual worlds.
Afterwards, in chapter 5, we propose an anticipation-based framework (DogMate) to
detect, from a character’s perspective, the intent underlying some of the player’s avatar
actions. The following chapter (6) describes our test-case (K9) that exemplifies how
our framework can be used to control the behaviour of a synthetic character (in this
case, Rusty, the dog companion of the player’s avatar). The evaluation of our test-case
follows in chapter 7, focused on three aspects: intent recognition, intent interpretation
and the solution's efficiency. Finally, we conclude in chapter 8 and present a few directions
to further pursue this work.
Chapter 2
Believable Characters
Achieving believability with synthetic characters is still an open problem. Many researchers
focus their efforts on advancing toward that goal and, while noteworthy progress has been
achieved, we have yet to see a character that could be mistaken for a human being in an
interactive environment.
The field of synthetic characters covers broad topics from diverse fields such as com-
puter science or psychology, but also from artistic areas. The latter represent the
component of synthetic characters which has evolved the fastest. By looking at
computer-generated films such as the recent Kung Fu Panda1 (fig. 2.1(a)) or Monsters vs.
Aliens2 (fig. 2.1(b)), we can safely say that for a character to become visually believable in
an interactive environment, it is now a matter of processing power, i.e. a matter of time.
(a) Kung Fu Panda. Copyright © DreamWorks Animation LLC, All rights reserved.
(b) Monsters vs. Aliens. Copyright © DreamWorks Animation LLC, All rights reserved.
Figure 2.1: Computer-generated films achieve a stunning visual quality.
1 DreamWorks Animation LLC, Kung Fu Panda (2008): http://www.kungfupanda.com/.
2 DreamWorks Animation LLC, Monsters vs. Aliens (2009): http://www.monstersvsaliens.com/.
Comparatively, synthetic characters' behaviour has several noticeable shortcomings.
Typically it is their behaviour that makes characters look less believable, and it is rather
easy to observe. Earlier, we illustrated one such shortcoming: the repetitive behaviour
characters display when the player interacts with them too many times in a row. Sometimes,
it feels as if the character has almost forgotten that we talked to him earlier.
In our research group, GAIPS3, we focus on building synthetic characters that
display a behaviour that humans would like to interact with. Narrowing it down, in this
work we propose a mechanism to support the enhancement of characters' behaviour in
videogame environments. But what elements does a synthetic character possess? Also,
why are videogames such a special environment for synthetic characters? And, most of
all, how can we achieve believability in such an environment? The rest of this chapter is
dedicated to understanding the fundamentals of synthetic characters so that these
questions can be answered.
2.1 Synthetic Characters
Studies on character believability started a long time ago. When the first cartoons
appeared, this was already a major concern. In 1981, Thomas and Johnston published
a book called "The Illusion of Life" [32], in which they discussed techniques for giving
life to hand-drawn characters. Their objective was to permit the viewers' suspension of
disbelief or, in other words, make them forget that what they were seeing was not real. It
has been a starting point for computer scientists to initiate their journey into the complex
problem that is giving virtual characters a life of their own.
Synthetic characters, just like hand-drawn characters, can be divided into two funda-
mental parts. Metaphorically, we can think of them as the body and the mind. On the one
hand, we have the external appearance, including sounds and animations. On the other, we
have the internal states, including personality and feelings. The character we build through-
out this work is a dog, Rusty, created using Fallout 34 and its modding framework. Rusty
can bark loudly and wag his tail in happiness (fig. 2.2). These are part of his physical
attributes. But Rusty is also the player's trustworthy companion and tries to help him
to the best of his abilities — by guiding him, counselling him or even fighting along with
him. This results from his current internal attributes.
Both the body and the mind need to be consistent with each other for the character
to appear believable [27]. Back in 1992, Bates recognised that too much focus was given
to the visual appearance [2], a fact that still persists today. Breaking the equilibrium
between the two parts makes efforts to achieve an overall believable character counter-
productive [33]. While this problem can be concealed for non-interactive synthetic
3 Intelligent Agents and Synthetic Characters Group: http://gaips.inesc-id.pt/gaips/.
4 Bethesda Game Studios — Fallout 3 (2008): http://fallout.bethsoft.com/.
Figure 2.2: Rusty, our synthetic character, is the player’s trustworthy companion.
characters (e.g. characters that appear in computer-generated films), once the user has
the power to interact with characters, it is only a matter of time before their behaviour
breaks the suspension of disbelief [1]. But why is behavioural believability so hard
to achieve? We now analyse the dimensions behind a character’s behaviour and try to
understand their inherent complexity.
2.2 Dimensions of a Character
Sheldon [31] divides a character’s behaviour into a sociological dimension and a psycholog-
ical dimension. The character’s sociological component defines how the character came to
be what it currently is. It takes into consideration its growth, its history, its culture and
even its living environment. All these help to define how a character interacts with
others and how its behaviour might be perceived by them. The psychological dimension
can be thought of as the character's mind. We can consider it the main justification for
any current action made by the character: the strongest reason why the character
would take an action in the first place lies in its psychological attributes.
On the other hand, Rollings and Adams [29] prefer to take into consideration the
character’s emotions. They state that a character’s dimension can range from being
zero-dimensional to three-dimensional. Zero-dimensional characters are those with fixed
emotions but no variability between them. They have emotional states from which they
switch unrealistically (e.g. an enemy switching from killing intent to fear when he decides
to flee). One-dimensional characters make this transition smoother. They have a fixed
axis on which their state slowly progresses. Two-dimensional ones are much the
same but vary on more than one axis, giving rise to more complex behaviours. However,
two-dimensional characters do not have any internal conflict between their axes.
James Bond is an example of a two-dimensional character. He can love and he can be
dishonest. He can even do both at the same time, but his impulses will not conflict, nor will
he have any emotional ambiguity; he would do anything without remorse for Britain's
sake. Finally, we have humans and their natural characteristics such as inner conflicts
(e.g. hating and loving the same person), doubts and complex behaviours, which is also
the definition of three-dimensional characters. Back in 1994, Bates was already defending
that an emotionless character is a lifeless one [3]. He argued that if a character does not
react emotionally to what is happening, if he does not care, then neither will the viewer.
So far, we have talked about sociology, psychology and even emotions. By itself, each
of those words represents a broad and complex area. Yet, as said earlier, if believable
characters are present within computer-generated films, it must mean that behavioural
believability has already been achieved. So why is there still no believable character in
videogames? The problem might be related to user interaction.
2.3 Interactive Characters
Rollings and Adams [29] defend that a believable character should possess three important
characteristics. The character should intrigue the user. It should get the user to like him.
And, finally, the character should change and grow according to its experience. This last
affirmation is what makes it so hard for a character that appears believable in a film to
also appear that way in a videogame. In a film, everything that is going to happen is
known beforehand. The character's reactions, its relationships or even its growth, all
these little details are known and set in stone. In a videogame, or any other interactive
application for that matter, we cannot anticipate the full range of actions that the player
(the user) may take, nor their impact on the characters. With the introduction of interactive
applications, synthetic characters received a whole new set of problems.
Based on the work on the “Illusion of Life” [32], Bates defined a believable agent
as an interactive character that does not disrupt the user’s suspension of disbelief [1].
He continues, saying that in an interactive world, any user will want to be able to do
anything he is allowed to do. Only then will the user feel that the world is a
believable one. For such a believable world to exist, an interactive character should
be able to react to every action that the user makes. Visually, there has been tremendous
progress toward that end. However, for this to happen behaviourally, every one of the
aforementioned dimensions should evolve according to the current user interaction.
Yet, Bates also states something that, at first glance, seems contradictory. He says that
a character does not need to have a complex behaviour to appear believable. He clarifies,
saying that the character should merely appear that way to the user [4]. Quoting
him, "an agent that keeps quiet may appear wise, while one that oversteps its abilities
may destroy the suspension of disbelief". Loyall [18] also argues in the same direction. He
states that it is not necessary for a synthetic character to be fully realistic. Instead, fine
details already have a great impact on how the user may perceive the situation. Indeed,
if the character, at some point, is able to display specific and context-aware behaviour, it
could be enough for the viewer to portray it as having a unique personality, a
distinct behaviour, a life of its own.
A good example is the work of Martinho and Paiva [21]. There, a synthetic char-
acter tries to anticipate what the user will do and reacts emotionally, whether it is
disappointed or excited by what really happened. Although anticipation has rarely been
considered when creating believable agents [22], they show how powerful it can be in
producing the fine details that synthetic characters need to feel alive. And this is what
it is all about: creating small tricks that make the viewer forget that the character is a
synthetic one.
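That anticipatory reaction can be sketched in a few lines (a hypothetical, heavily simplified reduction of the emotivector idea; the actual model in [21] is richer): a predictor commits to an expectation, and the mismatch between the expected and the sensed value is appraised as a rewarding or punishing surprise whose salience grows with the prediction error.

```python
def appraise(expected: float, sensed: float, desired: float):
    """Classify a prediction mismatch as a rewarding or punishing
    surprise, with a salience proportional to the prediction error.
    (Hypothetical simplification of an emotivector-style appraisal.)"""
    salience = abs(sensed - expected)
    if salience == 0:
        return ("as-expected", 0.0)  # prediction confirmed, no surprise
    # The surprise is rewarding when reality landed closer to the
    # desired value than the prediction did, punishing otherwise.
    better = abs(desired - sensed) < abs(desired - expected)
    return ("reward" if better else "punishment", salience)
```

For instance, if the character desires a sensed value of 0, expected 5 but sensed 3, the mismatch is a rewarding surprise of salience 2; sensing 8 instead would be a punishing surprise of salience 3.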
The work of Tomlinson et al. [33] gives a few guidelines to help build such mechanisms. They
identified six aspects necessary to create a believable interactive character, which sum-
marise what we have presented until now. First, they defined several kinds of interactions
that a synthetic character should possess. They defend that, initially, it should be aware
of the user and try to establish a long-term relationship with him. Secondly, they say
that if there is something that can be done in the world then the user would possibly want
to do it. In consequence, the character should be able to react accordingly. The third
aspect is about how the character’s behaviour should be rich and varied enough, so that
users do not feel bored while looking at it. The fourth aspect is related to the character's
growth according to its experience and how it is reflected to the user. The fifth one is
about how the character is used. It should be in a context in which the user can form
expectations and assumptions about what is going on. This can possibly be achieved by
making allusions to existing media, such as using cinematographic clichés. Finally, they
argue that a character's body and mind should be balanced. One should not raise
expectations that the other cannot match. Therefore, if a character is visually believable,
he also needs to be believable behaviourally. Otherwise, it is better to keep the visual
down to avoid breaking the viewer’s suspension of disbelief.
One fact remains: the more a character interacts with a user, the more its flaws will be
exposed. From minions to arch-enemies, a videogame harbours many kinds of characters,
each one with a different purpose, each one intending to interact differently with the user
(in this case the player). Among them lives a special character, one that by the nature of
its interaction with the player requires special attention to its behaviour: the sidekick.
2.4 The Sidekick
In videogames, sidekicks are a specific kind of synthetic character that accompanies the
player through his adventure. In some games, the player’s companion may stay with him
for more than 100 hours (e.g. Fallout 3). Their purpose and usefulness vary from game
to game but they always tend to have a crucial role [11]. A sidekick is probably the most
important character besides the player-character and undoubtedly the one which
interacts the most with the player.
The sidekick may be used to deliver story elements by the writer or be integrated within
the game’s mechanics by the game designer [31]. But the sheer amount of interaction
between the sidekick and the player is what makes him so special and, because the
sidekick has a close relationship with the player-character (as depicted in fig. 2.3), it
would be only normal for them to know each other quite well. The player learns more
about the sidekick as the story unfolds, generally culminating in a strong emotional bond
[11], but what about the opposite? And while a no-name character has to appear
believable for a few seconds, a sidekick has to endure hours upon hours of unpredictable
interaction, and has to do so without boring the player [19]. If he is not able to
understand quite a bit more about the player-character after hours of interaction, is that
not a drawback to believable behaviour?
Figure 2.3: The player-character and Rusty, his sidekick.
There are several ways to give the illusion that a character understands the player. One
example would be to comment on the player’s behaviour (e.g. pointing out the risks of
attacking a particular enemy before the player even engages it, so that he can retreat
unharmed). Another would be to pro-actively take the initiative without an explicit
order from the player. Both these examples are based on an independent and
context-sensitive behaviour that would match the player’s intentions. But to implement
such behaviour we, who are responsible for creating it, would need direct access to the
player’s intentions, something clearly not possible with our current technology. Still, this
is the direction we chose to follow and, in the remainder of this work, we focus on
exploring a possible way to understand the player’s intent so that behaviours such as
those given as examples above can be implemented.
2.5 Summary
Throughout this chapter we saw that a synthetic character possesses both a body and a
mind. We saw that it is important to correctly balance the character’s appearance with
its behaviour and that, to appear believable to the user, it does not need to be fully
realistic. Specific and context-aware details are enough to give the user that sensation.
We saw the problems that arise from its interaction with a user, and we gave an
emphasis to characters that interact with a user for a long period, such as sidekicks. In
these cases, we should expect a deepening relationship between the user’s avatar (i.e. the
player-character, in videogames) and the character and, to that end, it seems important
for sidekicks to be able to understand the player-character’s intentions.

In the next chapter we shift our attention toward this problem and analyse several
works which use anticipation to enhance the believability of synthetic characters by
predicting the next actions of other characters.
Chapter 3
Related Work
Anticipation has been used in several works to enhance a synthetic character’s
behaviour, allowing, for instance, the modelling of emotional reactions (e.g. being
surprised or confused [13, 20]) or the adoption of context-dependent strategies, such as
in the SOAR Quakebot [14]. However, it was identified by Isla et al. [12] as a key area
that still needs more attention when creating artificial intelligence for videogame
characters.
At its base, anticipation permits us to formulate an opinion about how and when we
think an event may occur. Although we are interested in analysing the intentions of a
character when it performs an action, our first step is to actually detect when an action
is performed. The nature of anticipation also provides another important tool: the
capacity to confirm whether our guess was correct, i.e. an expectation confirmation or
violation [13]. These confirmations and violations already serve an important purpose in
permitting the modelling of several emotions [20], but they may also serve as a basis to
understand the reason why an action was, or was not, performed.
Throughout this chapter, we present works that use anticipation to enhance the
behaviour of a synthetic character by predicting the future state of its environment. We
begin by presenting the SOAR quakebot, which takes inspiration from the Theory of
Mind to anticipate the enemy’s decisions. Then, we look at a system that detects the
player’s actions (and his possible intentions) using a notion of plan and uses them to
disambiguate speech-based orders. Afterwards, we review a system able to model
temporal causality. Finally, we review Aini, the synthetic flower, which uses anticipation
to guide the player into solving a task and reacts emotionally based on his performance.
3.1 The SOAR Quakebot
The SOAR — State Operator And Result, a symbolic cognitive architecture [15] —
quakebot anticipates the behaviour of other characters by taking an approach similar to
a psychological theory about the human mind, the Theory of Mind. Theory of Mind
studies the human ability to explain and predict both one’s own actions and those of
others [7]. In other words, it gives a possible explanation of our mental processing and
how it influences our behaviour. Multiple Theory of Mind theories exist but they can
usually be divided into two groups, the Theory theories and the Simulation theories
(refer to Theories of Theories of Mind [7] for more details). The latter defends that we
understand others by simulating them, and their states, within ourselves, as if we were
them. Anticipation was introduced into the SOAR architecture using a similar paradigm.
Laird made the agent simulate that it is another character, consequently giving it the
capacity to predict its enemy’s behaviour [14].
Initially, the SOAR quakebot was built with the objective of controlling a single
character as if it were a human playing in a Quake deathmatch (see fig. 3.1). It works
independently of the level, as it builds an internal map, in its working memory, at
run-time. Alongside, it stores useful information about the game’s items, such as health
packs and weapons’ respawn points, as well as the agent’s own information (e.g.
perceptions, knowledge). Based on these, it applies the operators — primitive actions
that can be applied to the world — whose preconditions match a certain state in the
working memory (i.e. in a reactive manner). The complete update cycle starts by sensing
the world, including the agent’s internal state, thereby updating its working memory.
Then, it checks for valid operators, selects the one to apply, and transfers the
corresponding command to the game engine to be executed.
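This update cycle can be sketched as follows. The operator names, preconditions and working-memory fields below are illustrative assumptions for the sake of the example, not taken from the actual SOAR quakebot.

```python
# Minimal sketch of a SOAR-like reactive decision cycle (illustrative only).
# Each operator pairs a precondition on working memory with a command for the
# game engine; the cycle matches preconditions and selects an operator to apply.

def make_operators():
    return [
        # (name, precondition on working memory, engine command)
        ("get-health", lambda wm: wm["health"] < 30 and wm["health_pack_known"],
         "goto-health-pack"),
        ("attack",     lambda wm: wm["enemy_visible"], "fire-weapon"),
        ("explore",    lambda wm: True, "wander"),  # fallback operator
    ]

def decision_cycle(working_memory, operators):
    """One update cycle: sensing already filled working_memory; match, select, act."""
    for name, precondition, command in operators:
        if precondition(working_memory):
            return name, command
    return None, None

# Low health and a known health pack: the health operator fires first.
wm = {"health": 20, "health_pack_known": True, "enemy_visible": True}
selected, command = decision_cycle(wm, make_operators())
```

Priority here is simply list order; the real architecture resolves operator selection through its own preference mechanism.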
Figure 3.1: A screenshot of Quake 2.
Anticipation was plugged in as an operator, i.e. it is triggered by the desire to predict an
enemy’s behaviour. When selected, it predicts an enemy’s behaviour by creating an
internal representation of that enemy’s known information, structurally similar to the
agent’s own information (e.g. current position, health, weapons). The agent then tries to
apply its own tactics to this set of information. As in the Simulation Theory [7], the
agent uses what it knows about its enemy and simulates what it would do in the enemy’s
place. Thus, the agent can predict the enemy’s action by looking at the final output.
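The essence of this simulation paradigm is that the very same decision procedure is reused on the enemy's state model. A minimal sketch, with invented state fields and tactics:

```python
# Anticipation by simulation (illustrative sketch): the agent applies its own
# tactics, unchanged, to an internal model of the enemy's known state.

def own_tactics(state):
    """The agent's own decision rule."""
    if state["health"] < 25:
        return "retreat-to-health-pack"
    if state["has_powerful_weapon"]:
        return "attack"
    return "collect-weapon"

def predict_enemy_action(enemy_model):
    # Simulation Theory: put ourselves in the enemy's place and see what
    # we would do; the output is our prediction of the enemy's next action.
    return own_tactics(enemy_model)

enemy_model = {"health": 80, "has_powerful_weapon": False}
predicted = predict_enemy_action(enemy_model)
```

The drawback discussed next is visible in the code: `own_tactics` is applied to everyone, so any enemy whose real strategy differs will be mispredicted.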
This, however, has the drawback of assuming that the SOAR quakebot and its enemies
share the same tactical strategy (e.g. which weapon to use, when to look for a health
pack, how to ambush). This gets especially problematic if the enemy is a human player,
as two players would rarely make the same tactical decisions. The situation could
improve if the bot could learn the enemy’s strategy (although it would have to take extra
care to avoid learning the wrong things). Yet, even with a good model of the enemy,
another problem persists — this approach is limited by the model itself. The model only
permits the agent to anticipate decisions previously included in it as tactical decisions.
From reactive rules to plans, these decisions can be as simple or as complex as we want
them to be. They may reflect how a character usually proceeds in the game (e.g. do the
first action, then go somewhere else to do the second one, and so on). This is also how
the next work we present approaches the detection of the player’s intention, i.e. using a
plan.
3.2 Situated Speech
For communication, natural language seems the most practical medium. However, it has
yet to become the main communication medium in videogames. While some games, like
Facade1 [23, 24], have made a huge step toward the use of natural language, most still
rely on interface commands to let the player express himself.
Gorniak and Roy approached this problem using speech recognition. Speech intro-
duces an additional complexity (e.g. the noise in the utterance) not present in a textual
interpretation but, as they point out, it is also more natural for a player to use his hands
to play and his voice to communicate [10]. To understand the meaning behind the player’s
speech, they analysed it against two models.
First, to understand which object a word refers to, they implemented a physical model.
For instance, when the player refers to an object by its name, he has a specific object in
mind, mostly determined by its spatial location, which is not reflected by his speech
alone (i.e. multiple objects might have the same name).
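A toy version of such a physical model might resolve an ambiguous name to the spatially closest matching object. The entity names and positions below are invented for illustration; Gorniak and Roy's actual model is richer than this sketch.

```python
import math

# Toy referent resolution (illustrative): among objects sharing the spoken
# name, pick the one closest to the player, since speech alone cannot
# disambiguate between same-named objects.

def resolve_referent(spoken_name, player_pos, objects):
    candidates = [o for o in objects if o["name"] == spoken_name]
    if not candidates:
        return None
    return min(candidates, key=lambda o: math.dist(player_pos, o["pos"]))

objects = [
    {"id": "door_1", "name": "door",  "pos": (0.0, 5.0)},
    {"id": "door_2", "name": "door",  "pos": (12.0, 1.0)},
    {"id": "chest_1", "name": "chest", "pos": (2.0, 2.0)},
]
# Two doors share the name "door"; spatial proximity picks door_1.
ref = resolve_referent("door", player_pos=(1.0, 1.0), objects=objects)
```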
The second model is an intentional one. Its purpose is to understand a possible intention
behind the player’s speech, also to disambiguate it. The same action might be referred to
by diverse utterances and there is a need to understand their fundamental meaning. If
the player asks to open the door or to let him get out of his room, both refer to the same
intention of having the door opened. This example also reflects that the player might not
always express his intentions at the same level of abstraction. Thus, to cope with this
situation, they implemented a predictive grammar in which symbols correspond to
possible actions for the player. It determines, probabilistically, which action the player
will most likely do next, and uses it to disambiguate the utterance. To construct such a
grammar, they take into consideration the possible ways to achieve a particular goal and
their respective variations.
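The disambiguation step can be sketched as combining the acoustic ambiguity of the utterance with action probabilities from such a grammar. All probabilities and action names below are invented for illustration.

```python
# Illustrative sketch: a predictive grammar assigns probabilities to the
# player's next action; an ambiguous utterance is resolved toward the
# interpretation that the grammar considers most likely.

def disambiguate(utterance_scores, grammar_probs):
    """Combine speech-recognition scores with the grammar's action probabilities."""
    best, best_score = None, -1.0
    for action, speech_score in utterance_scores.items():
        combined = speech_score * grammar_probs.get(action, 0.0)
        if combined > best_score:
            best, best_score = action, combined
    return best

# Speech alone slightly favours "open-drawer"...
utterance_scores = {"open-door": 0.45, "open-drawer": 0.55}
# ...but the grammar expects the player to head for the door next.
grammar_probs = {"open-door": 0.7, "open-drawer": 0.1}
choice = disambiguate(utterance_scores, grammar_probs)
```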
1Procedural Arts: Facade. http://www.interactivestory.net/.
Because a game defines such goals for the player, and assuming that the player typically
plays the game to complete it (or to explore it) [16], it gives us a basis to understand the
player’s actions and to match them against possible intentions. While useful for broader
and well-structured goals for which a plan can be devised, sometimes the game might
provide (or the player himself may create) optional challenges which are up to the player
to beat or to ignore (e.g. defeat an enemy). Including these optional challenges in the
grammar might give it an unnecessary complexity and not always reflect their true
purpose (e.g. is the player defeating enemies to progress toward the completion of the
game or is he just looking for a rare loot?). Although, sometimes, the intention behind
an action might not be relevant (e.g. the fact that the player is attacking an enemy is
sufficient), it might also hide another intention (e.g. defeating an enemy to obtain a key
relevant for the main challenge).
The next work we present departs from the notion of plan and permits the formulation
of relationships between different events in time by modelling temporal causality.
3.3 Temporal Causality
In his master’s thesis [6], Burke presents a system that understands temporal causality
between previously unrelated events. His work, based on ethological studies, gives the
agent a mechanism to represent cause and effect in time. The agent is able to believe
that, under certain conditions, a stimulus might cause another one to appear — even if
both might otherwise be unrelated.
Nevertheless, to achieve proper results, some pruning and reinforcement are needed.
A trial is started each time a stimulus that might predict another one is received. Suc-
cessfully passing a trial reinforces the belief that certain stimuli cause the appearance of
another one. On the other hand, failure leads to more uncertainty and, ultimately, that
possibility is pruned. This system’s predictors encapsulate both Scalar Expectancy
Theory (SET) and Rate Estimation Theory (RET). SET helps the agent to know
when to expect the effect caused by a certain stimulus. RET, on the other hand, helps
it to judge the stimulus’ reliability when trying to predict another one. Burke also shows
that the agent’s drives can help it to select which action might be the more appealing.
Unifying both drives and predictors, he shows how agents can learn to expect specific
events and how they can even be triggered, out of necessity, by the agent.
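The trial-based reinforcement and pruning described above can be sketched as follows. The update amounts and threshold are arbitrary illustrative values, not Burke's actual SET/RET formulation.

```python
# Illustrative sketch of trial-based causal learning: a predictor linking a
# cue stimulus to an effect is reinforced when a trial succeeds and weakened
# (eventually pruned) when trials fail.

class Predictor:
    def __init__(self, cue, effect, reliability=0.5):
        self.cue, self.effect = cue, effect
        self.reliability = reliability  # belief that cue predicts effect

    def record_trial(self, effect_observed):
        # Success strengthens the belief; failure weakens it more sharply.
        if effect_observed:
            self.reliability = min(1.0, self.reliability + 0.1)
        else:
            self.reliability = max(0.0, self.reliability - 0.2)

    def should_prune(self, threshold=0.1):
        # Persistently unreliable predictors are removed from the model.
        return self.reliability <= threshold

p = Predictor(cue="bell", effect="treat")
for observed in (True, True, True):   # three successful trials
    p.record_trial(observed)
```

After the three successes the predictor's reliability has grown from 0.5 to roughly 0.8, so it survives; a predictor failing the same number of trials would fall below the pruning threshold.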
Burke used this system so that agents could understand their surroundings, which
includes the user’s interactions. The system makes the agent able to analyse and
understand the player’s behaviour. Indeed, while it is not used only to that end, it
detects the player’s action patterns and learns how he tends to act. It has been used to
make Duncan, the Highland Terrier (fig. 3.2), learn when to expect a treat from the
user, for instance. At the same time, Duncan can also use the system to his advantage
by better understanding how his world works.

Figure 3.2: Duncan, the Highland Terrier.
Using this system, we could potentially predict local actions independently of their
broader goals. Nevertheless, if we focus on anticipating the player’s behaviour, this
approach would take some time to build a useful set of patterns and, in the meantime,
the player might change the way he plays. More importantly, we would want to prevent
the system from displaying behaviour based on wrongly learnt patterns (patterns that
might have been learnt out of pure luck, and that might take a while to prune), which
might engender non-believable behaviour.
Another complementary approach that could help to recognise which entity could
interest the player — thus allowing the recognition of local actions independently from a
plan — might lie in emulating his attentional system. If we can recognise where the
player is shifting his attention, then we might also formulate some hypotheses about his
next action. The next section presents how such a system was emulated within a
synthetic character.
3.4 Aini — The Synthetic Flower
Aini is a synthetic flower built into a hangman-like game. While the player uncovers the
unknown word by putting letters into their respective slots, Aini watches the scene and
reacts emotionally based on the player’s performance [21] (see fig. 3.3). Aini was built
on top of emotivectors, low-level anticipatory mechanisms [20] (presented in detail in
section 5.1), that help to control her attentional and emotional behaviour. Aini has a
grid of sensors that covers her vision area and each one reports a normalised distance to
the closest letter it detects.
The emotivector’s exogenous salience helps Aini to detect which letters moved most
surprisingly. It is computed as the mismatch between an object’s expected position and
its actual position. Greater differences are translated as surprising events to Aini, events
to which she will most likely pay attention.

Figure 3.3: Aini, the synthetic flower.

This mechanism was compared against the popular games approach, which, in Aini’s
world, corresponds to giving the attentional focus to the letter closest to her. It was
shown that the exogenous salience helps to select more relevant letters and supports a
behaviour more adequate to what the player is actually doing.
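The mismatch computation at the heart of exogenous salience can be sketched as follows. The sensor data is invented and the real emotivector model [20] involves more than raw prediction error; this is only the core idea.

```python
import math

# Illustrative sketch of exogenous salience: each sensed object carries an
# expected position and an observed one; the largest mismatch (prediction
# error) is the most surprising and wins the attentional focus.

def salience(expected_pos, observed_pos):
    """Prediction error as Euclidean mismatch: bigger error = more surprising."""
    return math.dist(expected_pos, observed_pos)

def focus_of_attention(sensed_letters):
    """sensed_letters maps letter -> (expected_pos, observed_pos)."""
    return max(sensed_letters, key=lambda k: salience(*sensed_letters[k]))

sensed = {
    "A": ((1.0, 1.0), (1.1, 1.0)),   # barely moved: unsurprising
    "B": ((4.0, 2.0), (9.0, 7.0)),   # far from where it was expected
    "C": ((3.0, 3.0), (3.0, 3.5)),
}
focus = focus_of_attention(sensed)
```

Contrast this with the distance-to-character heuristic the text mentions: here the winner is the letter that violated expectations the most, not the nearest one.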
This is important to us as it models Aini’s attentional system. It assumes that
surprising events make strong candidates for the character’s focus of attention, and this
was corroborated by the experiment performed. Detecting a character’s focus of
attention is a first step toward understanding its possible actions, as it narrows down
the list of possible entities the character might wish to interact with.
3.5 Summary
In this chapter we looked at works that focused on improving the believability of
synthetic characters’ behaviour through the use of anticipatory techniques. We also saw
that these techniques could be used to predict a character’s (and thus the
player-character’s) next actions, and in the following table (3.1) we briefly review the
presented works. In it, we analyse the type of actions these systems might be suited to
detect, as well as their respective approaches and some of their limitations.
The SOAR quakebot [14] and the situated speech work [10] seem better suited to
detecting behaviour that is part of a plan, i.e. cases in which we want to predict the
character’s higher-level objectives. On the other hand, temporal causality [6] and Aini’s
attentional system [21] seem most useful in detecting the player’s local actions,
regardless of their possible use at a larger scale.
Detecting actions is just the first step; we still need to understand the reason why they
were performed. In the next chapter, we propose a model that relates the intentions of a
character to the actions it performs in its virtual world.
Work                 Possible use                 Approach                    Limitation
SOAR quakebot        Detect actions that may      Guessing the character's    Assumes that the character has the
                     be part of a strategy/plan.  internal state.             same internal model as the agent.
Situated speech      Detect actions that are      Plan-based probabilities.   Does not account for actions
                     part of a plan.                                          outside the plan.
Temporal Causality   Detect correlated actions.   Modelling causality.        Might not adapt quickly enough.
Aini                 Detect the focus of          Quantifying unexpected      Does not take plans into account.
                     attention.                   events.

Table 3.1: Comparative analysis of the presented works.
Chapter 4
Intentions and Actions
Until now, we have been talking about intentions but we have yet to define the scope of
this term. Indeed, we have been using the same word to refer to multiple concepts. An
intention can potentially assume several forms; it can be as simple or as complex as we
want it to be. It can encompass higher-level goals supported by a plan (e.g. invade a
building), tactical decisions (e.g. choosing a safe route over one which is ambush-prone)
or spontaneous intents (e.g. getting out of the way of an enemy). These can even be
combined into hierarchical intents [10]. Consider, for instance, a game session in which
the player wants to invade a building. He has two alternative routes, from which he
chooses the safest. Even the safest route is not necessarily enemy-free, however, and an
enemy suddenly appears in front of him. If he goes toward that enemy, what could
possibly be his intention? The player might want to kill the enemy, but does that mean
that he does not want to invade the building any more? Not necessarily, as he might
just have a short-term intent to dispose of the enemy and keep the building invasion as
a higher-level intent that involves multiple short-term, and spontaneous, intents.
This is, however, a confusing use of the term “intention”; therefore, throughout this
chapter, we elaborate a model of intention that we will use consistently through the
remainder of this work. We first look at Searle’s theory, in which he defines the notion
of intention and relates it to actions. We then look at how computer scientists have
incorporated this notion into agent systems. With both a theoretical and a practical
background, we make a distinction between the player and his controlled character, the
player-character, and we elaborate our model, which relates two kinds of intentions with
the actions a player may perform. We continue by describing how an action can be
divided into three components: movement, target and intent. Finally, we proceed to
detail how the three components of an action can be detected in a virtual world, based
on the matching and mismatching of anticipated behaviour related to distance changes
between entities.
4.1 Intentions in Philosophy of Mind
Philosophers have long ago formalised the notion of intentionality and, in his definition
of intentionality, Searle [30] states that:
“Intentionality is that property of many mental states and events by which
they are directed at or about or of objects and states of affairs in the world.”
As an example, Searle stated that “if I have an intention, it must be an intention to do
something” ([30] p.1). An interesting topic, for our model of intentions, is the
relationship he defined between intention and action ([30] p.79-111). To proceed, we
first need to borrow the definitions of four terms from Searle’s theory:
∙ A prior intention is an intention that would lead to an action (i.e. premeditation).
∙ An action is a causal and intentional transaction between the mind and the world.
∙ An intention in action is the intentional component of the action. It might be
thought of as a volition.
∙ The bodily movement is the element which constitutes the conditions of satisfac-
tion of the intention in action.
Summarizing these definitions, the following relationship can be established ([30] p.94):

    prior intention  --causes-->  [ intention in action  --causes-->  bodily movement ]
                                  \___________________ action ___________________/
Searle states that while a prior intention may lead to an action, an action may exist
without a prior intention ([30] p.85), but it has to contain an intention in action ([30]
p.82). Consider an example in which someone wants to enter a building (prior intention).
First, he needs to open the door (action). The action of opening the door has an intention
in action (the desire to open the door) and a bodily movement (the actual opening of the
door). The relation between intention and action is important as, in interactive virtual
worlds, every change caused by the user to the application is a result of an action of the
avatar the user controls.
Yet, we still feel that there is a need for a definition suited to videogame applications.
Hence, in the next section, we look at the belief-desire-intention (BDI) architecture,
which already implements the notion of intention in software agents.
4.2 Intentions in Computer Science
The BDI architecture finds its roots in philosophy and is based on a human practical
reasoning model of the same name [5]. This architecture is used to model deliberative
agents [28], with a clear separation between choosing what to do and how to do it. In such an
architecture, the agent’s information about its world is represented as beliefs, the desires
represent the agent’s objectives or motivation and its intentions represent its commitment
to achieve a particular desire.
Rao and Georgeff [28] propose an abstract implementation of this model in which two
definitions are notably interesting for our work:
∙ An intention is formed by adopting a certain plan of actions. It is selected according
to the agent’s beliefs, desires and current intentions.
∙ A plan is a sequence of actions, or sub-plans (hence the possibility of hierarchical
plans), whose purpose is to fulfil a desire in a given situation.
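These two definitions translate naturally into data structures. The sketch below is an illustrative Python rendering, not an actual BDI implementation; the plan contents are invented.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Illustrative encoding of the two BDI definitions: a plan is a sequence of
# actions or sub-plans (hence hierarchical plans), and an intention is a
# commitment formed by adopting such a plan.

@dataclass
class Plan:
    purpose: str                                    # the desire this plan fulfils
    steps: List[Union[str, "Plan"]] = field(default_factory=list)

    def flatten(self):
        """Expand sub-plans into the full sequence of primitive actions."""
        actions = []
        for step in self.steps:
            actions.extend(step.flatten() if isinstance(step, Plan) else [step])
        return actions

@dataclass
class Intention:
    plan: Plan   # an intention is formed by adopting a plan of actions

enter = Plan("enter the building", ["walk to door", "open door", "walk inside"])
invade = Plan("invade the building", [enter, "search rooms"])   # hierarchical
intention = Intention(plan=invade)
```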
The BDI definition constitutes a practical application of the notion of intentions, and
some similarities can be found between it and both the SOAR quakebot and the
intention detection in Gorniak and Roy’s work (presented in chapter 3), as both use a
notion of plan to guess a possible intention. This model does not, however, encompass
the notion of an intentional action with no prior intention (or desire, in this case).
Furthermore, for this model to be truly effective, we ought to know the subject’s beliefs,
desires and current intentions. This is not a problem when applied to an agent, as these
are part of the system, but it constitutes a problem with the player.
4.3 The Player and the Player-Character
Before proceeding toward our model of intentions, we need to clarify what we assume
when we refer to the player. Searle’s definition assumes that the subject is a human,
who accomplishes his actions through bodily movements. But what kind of movements
does a player have? He mainly interacts with a physical device, his controller (or keyboard and mouse,
in the case of computer games). The BDI model is also based on humans and, as an
agent architecture, it inherently assumes that all the agent’s internal states are available.
However, the player, while human, is but a sequence of inputs from the point of view of
the system. This sequence of inputs controls a virtual character, the player-character,
from which the system knows nothing of its internal state. It does not know its beliefs,
nor its desires, much less its intentions; these are all in the player’s mind. While we can
try to create a mental model (Gorniak suggests some steps to create an accurate mental
model of the player in [9] (p.79-85)), it will not give us all the variables of the equation,
and doing so is outside the scope of our research.
Because the player is the human who expresses himself through his character, all the
information that we can get from him comes directly from his avatar. Anything that he
wants to express is limited by the videogame mechanics and by his avatar’s capabilities.
In the same way, the intentions that we try to recognise are those of the
player-character, which might not correspond exactly to the intentions of the human
behind it. The logic
behind most videogames is one of mutual interaction between the present entities [25]
(e.g. the player-character and an item or another character). Thus, we suppose that most
of the intentions of the player’s avatar are related to these interactions which might not
be the player’s real intentions — he is a human with an unlimited range of intentions and
not one which can only interact in a limited way with his surroundings.
With these considerations in mind, we proceed to formulate our model of intentions,
which we will use through the remainder of this work to detect the intentions behind
the player-character’s actions.
4.4 Model of Intentions
In a videogame, a player has to obey certain rules. If the player has to invade the enemy’s
headquarters, he will have to do it. If the player cannot kill an innocent villager, there will
be no intention in the world that will make it happen. These rules, or game mechanics,
help us to narrow down the actions that a player can perform and, consequently, their
intentions in action. From here onward, we use the term intent instead of intention in
action, but it still represents the volition in an action’s execution.
Actions, in videogames, are typically performed from a list of available “physical”
movements that an entity may perform on a target. The target can be the player-character,
another character or a world artefact, as long as it is a game entity (i.e. something that
is relevant to the gameplay [29]). Additionally, the movement and the target restrict the
possible intent behind the action based on the target’s stimulus-response compatibility
(affordances) defined by the gameplay rules. Once again, game mechanics are the key
in the sense that if they define the player-character as being solely able to “attack” a
character of the type “enemy” and “greet” those of the type “villager”, then we will only
look for these specific intents when the opportunity arises.
Let us exemplify with a concrete movement : the player displaces the player-character
(e.g. makes it run or walk, one of the actions available for that character) through its
virtual world. While moving, if the player has the intent to attack an enemy, he moves his
character toward it. This action will be different than if he wants to flee from it. In the
first case, the player-character is going toward an entity, while in the second, the player-
character is going away from it. This example also helps to understand that a gameplay
action might not just refer to explicit, well-defined actions typically tied directly to inputs
(e.g. jump, shoot, use an item). In this case, we consider that fleeing from a threat is also
a gameplay action, at the same level as opening a container or a door. These actions have
different intents (e.g. avoid a threat, get some item or progress further into the level),
different movements (e.g. walk/run or “activate” an item) and different targets (e.g. an
enemy character, a container or a door).
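The narrowing of intents by movement and target type, as described above, amounts to an affordance lookup defined by the gameplay rules. The table entries below are invented to mirror the running examples; a real game would derive them from its mechanics.

```python
# Illustrative sketch: game mechanics restrict which intents are compatible
# with a (movement, target-type) pair, i.e. the target's stimulus-response
# compatibility (affordances).

AFFORDANCES = {
    ("approach",  "enemy"):     ["attack"],
    ("move-away", "enemy"):     ["flee"],
    ("approach",  "door"):      ["open", "progress-in-level"],
    ("approach",  "container"): ["open", "get-item"],
    ("approach",  "villager"):  ["greet"],
}

def possible_intents(movement, target_type):
    """Intents behind an action, narrowed down by the gameplay rules."""
    return AFFORDANCES.get((movement, target_type), [])

# Moving toward an enemy leaves a single plausible intent in this toy ruleset.
intents = possible_intents("approach", "enemy")
```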
Finally, we have the prior intention. These intentions are, as defined by Searle,
intentions that may lead to one or more actions or prior intentions – a premeditation.
We thus define these intentions recursively, similarly to the concept of plans in the BDI
architecture and to the suggestion given by Peter Gorniak and Deb Roy [10] (i.e. when
the player tries to achieve his objective, he typically follows some kind of plan). Their
level in the hierarchy makes these intentions suitable for the objectives that the
gameplay may offer to the player. Gameplay objectives provide challenges attainable
through a series of actions (or lesser challenges) but they usually do not force them on
the player, i.e. he is free to realise these actions whenever he wants.
The introduced concepts and their relation to their videogame counterparts are
summarised in the following figure:
Figure 4.1: Our model of intentions mapped into game elements.
We may wonder whether the player will even want to fulfil the game objectives and,
thus, follow our model of intentions. The player is free to do whatever he wants, but if
he is playing the game, Nicole Lazzaro’s work [16] points out that it is typically to have
fun and play along with the rules. Her work analyses the reasons people play games,
and the results are divided into four categories. Some people like to play games as a
means to achieve “altered states”. They play to clear their mind and feel better. Others
play for “the people factor”. They enjoy the activity and the social interaction with
other people, much more than the game itself. However, the remaining two categories
are the most interesting for the kind of problem we are targeting, an open world full of
challenges and places to explore.
First, some people play for the emotions they get from the game’s challenges. They
want to beat the game and feel accomplishment in doing so. These players are
categorised as seeking “hard fun”. In contrast, there are people that like to have “easy
fun”. Such people like to explore their virtual world and be in awe of their discoveries.
They pay attention to details and enjoy experiencing total immersion in the game.
In games such as role-playing games, adventures or even shooters, this means that
players like to explore and/or beat the game. In the first case, the player is not
preoccupied with clearing objectives, so he might perform more actions per se, devoid of
prior intentions. In the second, as he wants to complete the game, he is prone to follow
some kind of plan — although it might happen in an unordered fashion.
Throughout this work, we opted to focus on the detection of the former: actions with
no prior intentions. Such actions might happen anywhere, at any time. No player is
contained strictly in one of the four aforementioned categories, so even players that are
set on beating all the game’s challenges might, at one time or another, do something that
has little purpose in their plan but that is relevant for their own enjoyment.
We now clarify how and when to detect these actions.
4.5 Components of an Action
The logic underlying the game mechanics of a virtual world is inherently constrained by
the graphical structure in which the action takes place. For games dependent on their
graphical logic, Mateas and Wardrip-Fruin [25] state that this logic can often be reduced
to two fundamental components: displacement and collision detection. Therefore, the
concept of distance is fundamental to understanding what is happening in the virtual world.
As an example, consider Rusty trying to bite an enemy. To be able to catch his opponent,
Rusty will need to reduce the distance between them until he is close enough to perform
the action of biting the enemy.
As a first step, we perceive all that is happening in the virtual world by reducing
everything to distance variations between entities. If the distance between two entities
is below a certain threshold, then something happens. These entities can be intentional
or static. Intentional entities are able to move and act on their own and, as such, inherently
have an intent associated with that movement, while static entities can only be acted
upon, and have no intentional state attributed, although displaying certain stimulus-
response compatibility (affordances): if a character approaches a closed door, it probably
intends to open it.
Because we reduce everything to distance variations, detecting the player’s intention is,
in our approach, to understand why the player-character is moving toward or away from
certain entities. Because distance varies generally in a continuous manner, expectations
can be created regarding distance variation. Such expectations, when violated, can be
used as a temporal signal that new intentions may have been generated by an unpredicted
change in the action.
We now discuss how the three components of the action — movement, target, and
intent — are computed in our model using a small scenario extracted from our test-case.
4.5.1 Scenario
Imagine a scenario with four distinct entities: the player (fig. 4.2(a)); a dog, Rusty
(4.2(b)); enemies (4.2(c)); and a building (4.2(d)). The player’s objective is to get into
the building alive. He may choose to dispose of his enemies, but it is not a pre-requisite
to enter the building. In this scenario, Rusty acts as the player’s sidekick and observes
him constantly.
(a) Player-character (b) Rusty (c) Enemy (d) Building
Figure 4.2: The entities of our scenario.
4.5.2 Movement
A first problem in the detection of the player-character’s intent is to detect when a new
action begins. For that effect, we use movement. Consider that the player-character is
moving in the direction of a certain entity (a building’s door) in a predictable manner. At
a certain point, it unexpectedly changes direction and moves towards another entity (an
enemy). While all this happens, Rusty is observing. Figure 4.3 depicts such a situation.
Figure 4.3: When the player-character makes an unexpected movement, Rusty detects one action based on the intentional entity (the enemy) and another action based on the static entity.
We consider that this event marks the moment the player-character decided to interact with
the new entity. While unable to determine the start of any action until that point, Rusty
is now confident that the player-character has started a new action: going toward the enemy.
It is important to note that another action also started: getting away from the door.
In practice, we monitor the distance variation between pairs of entities. When an
unexpected variation occurs, it marks the start of a new action: if the first derivative
of the distance between entities is negative, they are moving closer; if positive, they are
moving away from each other.
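This detection rule can be sketched as follows. The function and its parameters are our own illustrative names, not part of the thesis implementation, and a unit time step is assumed:

```python
# Minimal sketch: a new action starts when the sensed distance deviates from
# the expected one by more than a tolerated error; the sign of the first
# derivative of the distance tells whether the entities are closing in or
# moving apart.
def detect_movement(dist_prev, dist_now, dist_expected, tolerated_error):
    """Return the detected movement, or None while expectations hold."""
    if abs(dist_now - dist_expected) <= tolerated_error:
        return None  # expectation confirmed: no new action detected
    derivative = dist_now - dist_prev  # first derivative over one step
    return "getting closer" if derivative < 0 else "getting away"
```

With the numeric example used later in this document (a distance dropping from 100 to 90 units against a tolerated error of 1 unit), this sketch would report “getting closer”.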
4.5.3 Target
When monitoring the distance between an intentional (e.g. the player-character) and a
static entity, we know that when the distance changes, the intentional (not the static)
entity is moving (fig. 4.4(a)). When both entities are intentional, however, other measurements
are needed to disambiguate agency. Consider, for instance,
that the distance between the player-character and another enemy agent has decreased more
than expected. In this case, there are three possible scenarios (fig. 4.4(b)): the player-
character moved closer to the enemy; the enemy moved closer to the player-character;
both moved closer to each other.
(a) The player is solely responsible for the distance increase between himself and the building.
(b) Who is responsible for the decrease in distance?
Figure 4.4: Distance between intentional entities alone is not enough to determine agency.
To determine agency, we monitor the entities’ relative velocities. When the velocity of
an entity changes unexpectedly, we assume it may also influence the distance between entities
in an unexpected manner. As such, if we observe both an unexpected distance variation
and an unexpected velocity variation of an entity, the entity is considered responsible
for the movement, and the other entity is marked as the target of the movement. In such
cases, four combinations are possible: the player-character is getting {closer to, further
away from} the entity; the entity is getting {closer to, further away from} the player-
character.
4.5.4 Intent
The last component of an action is its intent. The intent results from a natural combina-
tion of movement and target. As discussed previously, the rules underlying virtual worlds
are mostly based on distance variation and collision detection [25]: if the player-character
wants to interact with an entity, it first needs to come within its activation range. By
detecting the movement of “getting closer”, we can derive the intent based on the affordances
of the target, i.e. the type of interactions allowed by the target entity. The same
reasoning can be applied to “getting away from”.
Consider an example in which Rusty notices the player-character moving closer to
(movement) an enemy (target). From the affordances given by the enemy entity, Rusty
assumes it must be to attack it (intent). If the player-character were to move toward the
building’s door, Rusty would assume it would be to open it (the only thing one can do
to a closed door in the virtual world). If the player were to move his avatar away from an
enemy, Rusty would assume he wanted to avoid him.
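The examples above amount to a simple affordance lookup. The table entries below are illustrative assumptions drawn from this scenario, not an exhaustive list from our test-case:

```python
# Illustrative affordance tables (hypothetical entries): the intent is the
# affordance the target offers for the detected movement.
AFFORDANCES = {
    "enemy": {"getting closer": "attack", "getting away": "avoid"},
    "door":  {"getting closer": "open",   "getting away": "leave"},
}

def derive_intent(movement, target_type):
    """Combine the movement with the target's affordances to obtain an intent."""
    return AFFORDANCES.get(target_type, {}).get(movement)
```

A target with no affordance for the detected movement simply yields no intent.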
4.6 Summary
In this section, we provided a model of intentions that we will use to detect the player-
character’s intent. Using Searle’s [30] distinction between prior intention and intention
in action (intent), we then formulate that an action is composed of:
∙ A movement, or one of the available physical movements of an intentional entity
(e.g. “moving” around in the world, “using” an item).
∙ A target, or a game entity in videogames (e.g. the player-character, an enemy, a
building).
∙ An intent, or the affordances the target offers for the given movement (e.g. to
attack, to invade, to flee from).
This model is inspired by Searle’s essay on intentionality [30] and by the BDI architecture,
an existing software model. Our model defines that an action may exist per se, or it
can be included in a hierarchical form, a plan. The intention to achieve or perform
such a plan corresponds to a prior intention, i.e. a premeditated intention for which these actions
represent a necessary condition of completion. However, we restrict the scope of this
work to the detection of actions per se, or in other words, spontaneous actions that may
or may not be part of a prior intention.
We then proceeded to detail how the three components of an action can be detected in
a virtual world, based on the matching and mismatching of anticipated behaviour related
to distance change between entities. Namely, the movement marks the moment an
action is performed and, thus, detected and analysed. The target is the entity involved in
that action whose agency is the lowest (the entity with the highest agency being the one
responsible for the action). Finally, we detect the intent based on the combination of the
movement and the target: it expresses the reason why the entity responsible for the action
would interact with the target through such a movement.
In the following section, we describe DogMate, the anticipation-based framework that
we developed to recognise this model in virtual worlds.
Chapter 5
DogMate
Our objective with this work is to detect the underlying intent of a character’s actions.
More importantly, we want to achieve this detection for a specific character, the player-character
(which is entirely controlled by the player). In chapter 3, we analysed several
approaches to enhancing the believability of characters by predicting the player-character’s
actions. In this chapter, we elaborate a framework (DogMate) in which we apply our model
of intentions to extract the intent of the predicted actions. However, several problems
arise from the application of this model to videogames. Namely, when multiple actions
are detected, which should we consider? Which is the most relevant? Also, as we are
creating a support for believable behaviour, we find it important to include the context in
which an intent is detected, as it may produce a noticeable difference in the responses it
may elicit. For instance, should a character consider that the intent to attack an enemy
is always appropriate? It depends on its own beliefs, as different individuals may interpret
the same situation differently, depending on their own perspective. After looking at
Aini’s architecture (section 3.4), we chose to use the emotivector to support our work, as
it permits solving the new problems we enumerated above.
This process is presented in this chapter as our framework, DogMate. We first provide
a description of the support we use, the emotivector. Then, we discuss the predictors used
to detect the components of an action. We also highlight some problems that led us to change
the original model of the emotivector, and propose a new method to quantify the relevance
of an unexpected event. We then show how it can be used to compare several actions and
their components. Afterwards, we present how the emotivector’s affective appraisal helps
to interpret the detected intents from the character’s perspective. Finally, we finish this
chapter by presenting DogMate’s flow as a whole, ready to be integrated within an agent
architecture.
5.1 Emotivector
The emotivector is an affective anticipatory mechanism [20] designed to operate at the
perception level. It approaches representation of anticipation by giving the agent archi-
tecture a wide range of possible feedbacks about the expectations it may possess. This
approach allows the use of a richer set of behaviours, such as realising that a mistake has
been made or even being confused by what is happening, illustrating important traits of
human behaviour. The emotivector is applied in works such as Aini, the synthetic flower
[21, 20] (presented in section 3.4), or with the Philips iCat robot acting as an affective
game buddy [17]. In both, the emotivector is used to predict the user’s next action and
it was shown to enhance the character’s behaviour. In our work, more than predicting what
the player-character will do, we use the emotivector to understand why he would do it.
The reason why the user did a certain action might change his understanding of his sur-
roundings and consequently change his expectations about the possible consequences of
his act.
The emotivector is a strong support for applying in practice the concepts we formulated
in the last chapter, as it allows us not only to detect the three components of an action, but
also to quantify an action based on its relevance to the player-character and to classify it from the
observer’s perspective (i.e. based on Rusty’s beliefs).
It works by keeping a history of the perception’s value and uses it to predict its future
value. By analysing both the expected value and the sensed one, the emotivector generates
an attentional (salience) and affective (sensation) value associated with the signal (see fig.
5.1). On one hand, the emotivector’s salience helps to inform about the percept’s relevance
and its exogenous component reflects the unexpectedness of external stimulus. On the
other hand, the generated affective sensation helps to classify the mismatch between the
sensed value and the expected value as positive (rewarding) or negative (punishing), based
on whether or not it is converging toward the desired value.
Figure 5.1: The emotivector is attached to a perception channel. The mismatch between the sensed and predicted information produces an attentional salience. Classifying both against the desired value permits the emission of an affective sensation.
By introducing a concept of prediction error, the emotivector models an “as expected”
range. The error-predictor produces an expected prediction error which represents the
tolerated variance for a given sensed value. If the mismatch between the sensed value and
the expected one is within the predicted error’s range, then it is considered an expectation
confirmation. Otherwise, it produces an expectation violation. Combining this concept
with the original affective sensations allows the emotivector to classify a signal within
nine sensations, from (more or less) punishing to (more or less) rewarding, from expected
to unexpected (as shown in figure 5.2).
Figure 5.2: The emotivector’s nine affective sensations. We use the original nomenclature in which R stands for reward and P for punishment. The sensed signal (columns) is represented as a continuous line while the expected one (rows) as a dotted line. The interval represents the expected error.
The emotivector’s predictor is independent from its sensation model and should be the
one that best adapts to the signal it monitors. For instance, throughout this work, we
mostly use examples based on distance. The predictor we opted to use is a moving average
[26] based on the second derivative (i.e. the projected acceleration). The same goes for the
error predictor, where using a predictor that best matches the signal produces more
accurate results.
Let us then exemplify the usage of an emotivector and at the same time define its
underlying concepts. Recall the scenario’s entities presented in section 4.5.1 and imagine
that the only thing the player-character has done recently is to be idling around (fig.
5.3(a)) but suddenly he starts moving and goes toward an enemy (fig. 5.3(b)). Rusty
looks at him happily, as he is going to free the world from a dangerous threat. To model
such a situation using an emotivector, we first attach it to one of Rusty’s sensors that
measures the distance between the player-character and an enemy. Initially, as the player-character
has not been moving much, the predictor estimates the distance to remain
approximately the same.
However, once the player-character started to move toward the enemy, the sensed
distance did not match the expected one and the percept received a high salience from its
exogenous component.
The emotivector’s exogenous component models the unexpectedness of a signal as:

EXOt = ∣xt − x̂t∣ (5.1)
(a) The player is idling. (b) The player suddenly moves towards his enemy.
Figure 5.3: Rusty observes the player-character happily as he suddenly begins to move towards an enemy.
In equation 5.1, xt represents the sensed value at time t and x̂t the expected value for
this time t. Let us consider that the player-character was at a distance of 100 units from
his enemy, which corresponds to the expected value (x̂t). Suddenly, this distance reduces
to 90 units, representing the sensed value (xt). This gives us an exogenous component of
EXOt = ∣90 − 100∣ = 10.
This prediction’s error also corresponds to 10 units, as it is defined similarly to the
exogenous component:
PredictionErrort = ∣xt − x̂t∣ (5.2)
This value exceeds the predicted error, which is just 1 unit (obtained from the error
predictor), and is consequently classified in the unexpected range. Furthermore, because
Rusty wants the player-character to dispose of his enemies, when he observes him going
toward an enemy (the sensed value is converging toward the desired value — 0 in this
case), he classifies that event as rewarding. This combination emits an unexpected reward
(unexpected R in fig. 5.2) and a happy emotion surfaces: Rusty wants the player-character
to move closer to that enemy and that is what he did.
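Under the assumptions of this scenario, the classification just described can be sketched as follows; the function and parameter names are ours, for illustration only:

```python
# Sketch of the appraisal in this example: a mismatch (eq. 5.1) beyond the
# predicted error makes the percept unexpected, and convergence toward the
# desired value makes it rewarding (R) rather than punishing (P).
def appraise_signal(sensed, expected, desired, predicted_error):
    exo = abs(sensed - expected)  # eq. 5.1
    expectedness = "unexpected" if exo > predicted_error else "expected"
    rewarding = abs(sensed - desired) < abs(expected - desired)
    return expectedness + (" R" if rewarding else " P")
```

For the numbers above, `appraise_signal(90, 100, 0, 1)` yields "unexpected R".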
We now present the predictors that could have been used within this scenario. At the
same time, we shift our focus to detecting actions and their three components.
5.2 Predictors
To fully identify an action, we make use of three emotivectors between pairs of entities.
One uses its predictor to estimate the distance variation between the entities (to determine
the movement) and two to estimate each entity’s relative velocity (to determine the agency
and therefore the target). We currently use the Euclidean distance as it represents a
good trade-off between estimation accuracy and computational cost. Our predictors are
implemented using the following equations:
d̂t = d(t−1) + v̂t (5.3)

v̂t = v(t−1) + ât (5.4)

The expected distance at time t (d̂t) depends on the expected velocity (v̂t, eq. 5.3)
which, in turn, is based on the expected acceleration (ât, eq. 5.4), where v and a denote
the first and second derivatives of the distance d. To estimate the acceleration, we use
the following equation:

ât = (1/n) ∑i=1..n (v(t−i) − v(t−(i+1))) (5.5)
We predict the acceleration (eq. 5.5) by applying a moving average [26] to the last n
sensed accelerations. Empirically, we found that a moving window of n = 3 gives adequate
results, as the window is small enough to adapt quickly to acceleration change, while large
enough to mitigate small acceleration variation due to the noise present in the original
signal.
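A possible implementation of this predictor (eqs. 5.3–5.5), assuming a unit time step and our own naming, is sketched below:

```python
# Sketch of the distance predictor: the expected acceleration is a moving
# average of the last n sensed accelerations (eq. 5.5); the expected velocity
# and distance are then extrapolated from it (eqs. 5.4 and 5.3).
class DistancePredictor:
    def __init__(self, n=3):
        self.n = n
        self.distances = []  # history of sensed distances

    def sense(self, distance):
        self.distances.append(distance)

    def predict(self):
        """Expected distance for the next time step."""
        d = self.distances
        if len(d) < 3:  # not enough history to estimate an acceleration
            return d[-1] if d else 0.0
        velocities = [d[i] - d[i - 1] for i in range(1, len(d))]
        accelerations = [velocities[i] - velocities[i - 1]
                         for i in range(1, len(velocities))]
        window = accelerations[-self.n:]
        acc_hat = sum(window) / len(window)  # eq. 5.5 (moving average)
        vel_hat = velocities[-1] + acc_hat   # eq. 5.4
        return d[-1] + vel_hat               # eq. 5.3
```

A steadily approaching entity sensed at distances 100, 98, 96, 94 is then expected at distance 92 on the next step.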
5.3 Relevance
In figure 4.3, Rusty observes an unexpected movement of the player-character which
generates two actions: “getting closer to the enemy” and “getting away from the building’s
door”. However, the player probably only intended to make one of them. While both may
contain relevant information for decision-making, it is important to be able to quantify
and classify their relative importance for Rusty.
To express the relevance of the action for a character at time t (Rt), we use the
following equation, representing the degree of unexpectedness of the sensed value:
R^ent_t = PredictionError^ent_t / ExpectedError^ent_t = err^ent_t / êrr^ent_t = ∣xt − x̂t∣ / êrrt (5.6)
In equation 5.6, xt and x̂t represent the sensed and expected value, respectively. We
based this equation on the concept behind the emotivector’s exogenous model (eq. 5.1),
but by computing relevance as a ratio, we make the prediction independent from the
metric it measures and from its range.
This is important as the signal we measure is the distance and it may pose problems
if we normalize it as the original exogenous model requires. For instance, if we use the
maximum distance of the virtual world to normalise any sensed distance, most of the
range would be unequally used as shown in figure 5.4. First, because the maximum range
would only be used if the player-character stands near one extremity while the other entity
is at the other extremity, which is a limited scenario. Second, visual obstruction does
not permit, most of the time, actually perceiving the whole level at once (the same is true
for both the player and other characters, as they are restricted by their sensors). We would
then be restricted to a sub-range almost all of the time. Also, an alteration to the game
level design would imply an alteration to the normalization function, a process
which might be error-prone if it has to be hand-adjusted. Another drawback is that
it might not be trivial to calculate the maximum distance of a level. It could be the
maximum of the minimum distances but, even then, if the level is dynamic, it is prone to
change, and such a computation at run-time might not be viable at all. These problems made
us use the distance value as-is within the emotivector and led us to this new model
to compute the relevance (conceptually identical to the notion of salience) of a sensed value.
Figure 5.4: In a normalized range, the whole range would hardly be used.
When R^ent_t > 1, we say that the signal of that emotivector is salient and that an
expectation violation occurred. To compute it, however, we need to predict the prediction
error (êrr^ent_t), which we compute using the following equation:

êrr^ent_t = k × err^ent_(t−1) + (1 − k) × êrr^ent_(t−1) (5.7)
The variable k takes values within the [0, 1] interval. A high value for k allows the
estimation to adapt quickly to huge variations, while a low value makes the estimation
more conservative. When an unexpected event occurs, the acceleration predictors need
a certain time-frame to re-adapt to the signal and emit correct predictions once again.
Within this time, it is important to avoid detecting a false unexpected event as a result
of adaptation. After adapting to the new signal, prediction error (and consequently the
expected prediction error) will decrease.
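Equations 5.6 and 5.7 can be sketched directly; `k = 0.5` below is an illustrative value, not the one used in our implementation:

```python
# Relevance as the ratio of the prediction error to the expected error
# (eq. 5.6); the expected error itself is updated by exponential smoothing
# of past prediction errors (eq. 5.7).
def relevance(sensed, expected, expected_error):
    return abs(sensed - expected) / expected_error   # eq. 5.6

def update_expected_error(error, expected_error, k=0.5):
    return k * error + (1 - k) * expected_error      # eq. 5.7
```

The numbers of the worked example below follow directly: `relevance(750, 700, 10)` is 5 and `relevance(300, 400, 10)` is 10.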
Let us exemplify how prediction and relevance work using the scenario from figure
4.3. The player initially moves the player-character simultaneously toward an intentional
entity (enemy) and toward a static entity (the building’s door). Assume that, when the
unexpected movement occurs, the player-character is 700 units away from the door and
400 units away from the enemy. Additionally, because the player-character was walking
toward the door, we expect the distance to the enemy (x̂t) to remain around 400 units,
while the distance to the door is expected to decrease to 650 units. Finally, because our
predictions have been accurate so far, we expect a low prediction error in both cases:
10 units. Now, the user has broken our expectations: the sensed value (xt) shows the
player-character closer to the enemy (300 units away) and further away from the door
(the distance to the door increased to 750 units):
Rdoor = ∣750 − 700∣ / 10 = 5

Renemy = ∣300 − 400∣ / 10 = 10
When comparing both actions resulting from the distance variation to the door and to
the enemy, we can say that the action to go toward the enemy is two times more relevant
(unexpected) than the one to go further away from the door. We assume that the more
unexpected an action is, the more likely the player meant to do that action intentionally
(in this case the action to go toward the enemy).
With this model, we are not only able to distinguish actions and order them by relevance,
but also, by combining relevant changes in the velocity emotivectors (table 5.1),
to decide which entity is responsible for the action (the other entity being considered
the target of the action).
Velocity     | Entity+               | Not salient | Entity−
Player+      | Both seek             | P seeks E   | P seeks E; E avoids P
Not salient  | E seeks P             | None        | E avoids P
Player−      | E seeks P; P avoids E | P avoids E  | Both avoid

Table 5.1: The nine salience patterns from the velocity emotivectors. Plus and minus indicate convergence toward and divergence from the other entity, respectively.
In this case, if the enemy had a relevant change in relative velocity, it would mean the
enemy started to attack the player-character, while if the relevant change occurred with
the relative velocity of the avatar, the player-character would have initiated combat.
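Table 5.1 can be encoded as a direct lookup. The "+", "0" and "-" keys below are our shorthand for salient convergence, no salience, and salient divergence, respectively:

```python
# Sketch of Table 5.1 as a lookup table. Keys are the (player, entity)
# velocity saliences: "+" converging toward the other entity, "-" diverging,
# "0" not salient.
PATTERNS = {
    ("+", "+"): "both seek",
    ("+", "0"): "P seeks E",
    ("+", "-"): "P seeks E; E avoids P",
    ("0", "+"): "E seeks P",
    ("0", "0"): None,
    ("0", "-"): "E avoids P",
    ("-", "+"): "E seeks P; P avoids E",
    ("-", "0"): "P avoids E",
    ("-", "-"): "both avoid",
}

def interpret(player_salience, entity_salience):
    """Map a pair of velocity saliences to the pattern of Table 5.1."""
    return PATTERNS[(player_salience, entity_salience)]
```

For instance, a salient convergence of the player alone yields "P seeks E": the player-character initiated combat.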
5.4 Affective Appraisal
To model the intent interpretation from the character’s perspective, we appraise the intent
using the emotivector’s sensation model. However, this model does not cover cases in
which Rusty wants the player to avoid another entity. Until now, we have been exploring
examples in which Rusty actually knows that there are no risks for the player-character
if he engages in a fight with his enemies. Still, it only makes sense if Rusty clearly sees
that the player is in condition to sustain yet another fight (fig. 5.5(a)). If Rusty thinks
that the player-character might not handle it, then it would not be wise for Rusty to
express a happy emotion, or it would appear out-of-context.
(a) The player is in good condition. It is alright to fight.
(b) The player’s condition is bad. Rusty does not want him to fight.
Figure 5.5: Rusty might interpret the same action differently depending on the status of the player-character.
To allow the modelling of such situations, we needed to have a concept of undesired
value. The difference between having an explicit undesired value and setting the desired
value to a different value is that the desire to avoid may not equal the desire to reach
something else. We might just want to avoid getting close to a particular entity but, at
the same time, not care that much how far we get from it.
Using this concept, we can appraise an intent depending on a character’s perspective.
First, we set desired (and undesired) values for distance, depicting where that character
would like the distance between two entities to be (or move away from). Then, when a
relevant change occurs in the sensed distance between the two entities, we compare the
sensed value, the expected value and the desired/undesired value to produce an affective
state. If the sensed value is closer to the desired value, it is considered a positive sensation.
If it was expected to be even closer, it is a “positive but worse than expected”; if not, it
is a “positive and even better than expected”. The same applies if the sensed value is
diverging from the desired value, but in that case it is considered a negative sensation.
The reasoning for getting closer to or away from undesired values is the negation of the
previous one (e.g. avoiding an undesired value is considered positive). As a result of this
affective appraisal, one of four affective states is generated for each unexpected change.
The affective states will influence the character’s decision making.
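A simplified sketch of this appraisal follows. It is our own condensation of the emotivector's sensation model, with hypothetical names, using the previous sensed value as the baseline for convergence:

```python
# Simplified sketch: an unexpected change is positive when the sensed value
# moved toward the reference (a desired value), and better/worse than
# expected depending on how the expected value compares; for an undesired
# reference both classifications are negated.
def appraise_intent(sensed, expected, previous, reference, undesired=False):
    def gap(value):
        return abs(value - reference)
    positive = gap(sensed) < gap(previous)   # converging toward reference
    better = gap(sensed) < gap(expected)     # closer than anticipated
    if undesired:                            # avoidance: negate both
        positive, better = not positive, not better
    if positive and better:
        return "positive and better than expected"
    if positive:
        return "positive but worse than expected"
    if better:
        return "negative but better than expected"
    return "negative and worse than expected"
```

With an undesired distance of 0 to an enemy, a drop from 100 to 90 units against an expectation of 100 is appraised as "negative and worse than expected", eliciting an angry reaction.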
As an example, consider that Rusty sees the player-character is hurt. Rusty desperately
wants him to avoid any threat, setting an undesired value of 0 for the distance
between the player-character and any enemy. Consequently, if he detects the player-character
has an intent to attack an enemy, he will view that intent as bad for the player
(the sensation will be a negative one). Based on this personal interpretation, Rusty will
assume a context-specific strategy and display an adequate context-aware behaviour, totally
different from the one resulting from appraising the event as positive. Table 5.2
shows an example of such categorisation.
Sensation                          | Affective Reaction | Description
Positive and better than expected | Happy              | The intent fits the situation.
Negative but better than expected | Surprised (Happy)  | Expected a negative intent, but this one is good as it fits the situation.
Negative and worse than expected  | Angry              | The intent does not fit the situation.
Positive but worse than expected  | Confused (Worried) | Expected a positive intent, but this one is bad as it does not fit the situation.

Table 5.2: The generated affective sensations help Rusty to contextualise the detected intent within its current situation.
We now describe our framework’s flow, from the emotivector’s parametrisation to the
appraisal and selection of the resulting intent.
5.5 DogMate’s Flow
We have now described all the elements needed to recognise actions and their underlying
intents, and to appraise and classify them. DogMate (represented in fig. 5.6) is the name
we gave to this framework, and we now review its flow during one update cycle. Let us,
again, turn to the example depicted in figure 4.3 from Rusty’s perspective. Let us also
consider that Rusty knows that the player-character has an objective: to break through
the building to invade it. Rusty also knows that the player-character is healthy enough
to handle a few enemies. These are his current beliefs and we use them to set the desired
value of predictors (1 in figure 5.6) such that reducing the distance toward enemies or the
door is considered as positive within this context. Rusty’s beliefs also contain information
about the position of entities that Rusty can sense (i.e. within a certain radius). This
information is fed to the emotivectors at each cycle.
To represent the possible actions between the player-character and an enemy, we use
one set of three emotivectors (one distance-based and two velocity-based). To represent
the possible actions between the player-character and the building’s door, we use a set
with one emotivector (distance-based) (2). The player is currently moving his avatar
through the world and is reducing his distance to both the door and the enemy. Suddenly,
something unexpected occurs: the avatar drastically reduces its distance to the enemy
while the distance to the door increases. At this point, the distance-based emotivector in
both sets becomes salient (3). In the set of the door (a static entity), we only have one
emotivector so we know that the player-character is getting further away (movement)
from the door (target). In the second set, the emotivector that monitors the relative
velocity from the player-character to the enemy is also salient. This pattern means that
Figure 5.6: DogMate’s update flow: (1-2) emotivector parametrisation; (3-4) intent detection; (5-6) affective appraisal; (7) action selection.
an action was initiated by the player, whose target is the enemy, and his movement is to
get closer to it. Two intents are produced based on these recognised attributes (4).
Because Rusty knows that the player-character can handle another enemy, the distance-
based emotivector leads to the emission of a positive sensation (5) which makes Rusty
interpret that distance reduction (and the corresponding intent to attack) as beneficial
for the player (6). On the other hand, because Rusty wants the player to complete his
objective, he interprets the increased distance to the door as negative and consequently
frowns upon the player’s disinterest in completing his task.
Finally, these intents are compared to each other in the decision-making module
to select the decision that Rusty will adopt. Because the emotivector that monitors
the player-character’s relative velocity to the enemy is the most salient of them all, we
choose to select that intent as the one that will make Rusty react (7). As such, Rusty
encourages the player-character through verbal and non-verbal behaviour, described in
the next chapter.
5.6 Summary
In this section, we presented the emotivector, an affective anticipatory mechanism. We
discussed our predictors for distance and velocity, and discussed how to classify
unexpected values in terms of their relevance. We based the relevance computation on the exogenous model of the
emotivector, although we assume a different usage: we avoid normalizing the monitored
signal (distance) as it could pose some problems. We use the relevance value not only to
rank actions against each other, but also to determine which entity is the target of an action. Next,
we introduced the notion of undesired value in the sensation model of the emotivector to
model the concept of avoidance and presented how we appraise intents from the point of
view of a character.
Afterwards, we presented DogMate and its flow within one update cycle. It begins
by parametrising the emotivectors based on the beliefs of the character. Each entity is
monitored by a set of emotivectors, varying from one distance emotivector for static entities
to three emotivectors for intentional entities (one for the distance, two for the velocities).
The salience formed by each set corresponds to a detected intent which is then appraised
based on the affective sensation generated from the distance-emotivector present in the
set. Finally, we use this information in the decision-making component to select the intent
from the action which is the most relevant for the character, reflecting its judgement of
the situation.
In the next chapter, we present K9, our test-case, in which we use DogMate coupled
with an agent architecture that simulates Rusty's brain and controls him in a game engine.
Chapter 6
Test-Case: K9
K9 (“Canine”) is a small role-playing game (RPG) environment used as a test-case for our
framework. We built it on top of the modding framework of FO3, and it therefore makes
full use of its game mechanics. In this section, we first present the K9 universe, introducing
its story — incorporated within Fallout’s timeline1 — its gameplay, and the relationship
between Rusty and the player-character. We then overview its technological support,
namely the interaction between FO3 and its modding components. Finally, we present
Rusty as a synthetic character and describe his interaction with DogMate.
6.1 The Story
October 23rd, 2076. The world is in chaos.
Resource Wars have been raging for more than two decades now and everyone
can feel that something big will happen soon... something called the Great
Nuclear War2.
After completing their Power Armor3 prototype, the U.S. military plans to
send it to the front in order to keep Chinese forces in check. On their
side, the Chinese Secret Agency has found out about the Enclave4 plans and is
pressuring its research department to finish its human experiments
quickly...
Ralph Canine is a hard worker living in a remote, peaceful town. He is nearing his
thirties and lives with his beloved wife. Recently, however, he has been lost in thought,
and keeps listening to the news on the radio (fig. 6.1). It seems like the Chinese military
forces are roaming through the country, kidnapping citizens on their way. Witnessing his
1 The Fallout setting timeline: http://fallout.wikia.com/wiki/Timeline
2 The Great Nuclear War starts on October 23rd, 2077: http://fallout.wikia.com/wiki/Great_War
3 The Power Armor: http://fallout.wikia.com/wiki/Power_Armor
4 The Enclave: http://fallout.wikia.com/wiki/Enclave
husband in such state, Pat Canine cheers him up and convinces him to take Rusty, their
dog, out for a walk. She tells him to forget about the news, as such things would never
happen in their village anyway.
Figure 6.1: Ralph, listening to the radio.
The moment Ralph puts a foot outside the door, he cannot help but feel anxious. He
feels that something is wrong. Seconds later, a small commando unit appears out of nowhere
and invades the village, bashing the villagers (fig. 6.2). While their aim is not clear,
Ralph and Rusty resist. Their efforts are, however, fruitless, and they are rapidly knocked
out cold.
Figure 6.2: The Chinese commando bashes the villagers and kidnaps them.
Two weeks later, Ralph awakes in a strange and desolate room. While wondering
about his surroundings, a strange robot enters the room and calls him “subject 101”. It
seems to Ralph that the robot is some kind of medic, as it talks about checking his vital
status. In a comical, yet creepy tone, the robot tells him that he is a failed experiment
and that he will not live much longer. With these words, it begins to leave the room.
On its way out, however, the robot is completely destroyed by a strangely dressed man
(fig. 6.3). That man goes by the name of Three Dog, and it seems like his skin is rotting
away, as if he were some kind of ghoul. He tells Ralph to hurry, claiming that they do not
have much time. It seems like he is breaking out of that place and that he wants to take
Ralph with him. Ralph, insecure, follows him as he does not have much choice. Along
their way, Three Dog slowly enlightens Ralph about his current condition. If what Three
Dog says is true, it seems like Ralph was used in a genetic experiment.
Figure 6.3: Three Dog kills the robot medic.
Three Dog does not seem to be lying, however. Indeed, Ralph begins to feel aware of the new
abilities he possesses. He feels stronger and his senses are sharpened. Strangely, he feels
like he is able to understand the sensations of someone else. He soon finds out that the
other being is Rusty, his dog. Three Dog and Ralph rescued the poor animal together, and
it was then that Rusty “talked” to him — or rather, it was Ralph who
could understand him. The conversation is not a happy one, as Rusty delivers him sad
news. Pat, his wife, has passed away.
Seeing Ralph depressed, Three Dog tries to console him. He rapidly understands
that Ralph is not depressed. He is enraged. As he cannot understand what is going
on, Ralph threatens Three Dog, demanding that he explain everything. It is at that moment that the
sorrowful truth is revealed. To face the rising technological prowess of the Americans,
the Chinese army has been working on experiments to create super-soldiers. They created
a nano-technology that allows humans to assimilate the genes of animals. The result is
two-fold. Humans are able to feel the sensorial information of their companion. However,
their genes are being destroyed by the combination. Without new doses of genes, they are
doomed to die. Hence, Three Dog tells him to make sure to keep Rusty around and safe,
at all costs.
On their way out, Three Dog also reveals to Ralph information about the person responsible
for their problems. He tells him that this person, Dr. Wu, might possess a cure for
their disease. He is, however, interrupted by a robot-guard set on sending the
three of them to heaven. Ralph and Rusty barely make it out alive, but Three Dog is
not so lucky. On his deathbed, Three Dog reveals that he is one of the leaders of the
underground resistance and that if Ralph contacts them, they will surely help him out.
Determined to find a cure and to get revenge, Ralph sets out on his journey and soon
discovers Dr. Wu’s whereabouts...
6.2 The Gameplay
Fallout 3 is an action role-playing game, mixing elements of first/third-person shooters
and western role-playing games. The player may wield several weapons and armours,
and his own dexterity influences his performance – hence the shooter categorisation. However,
there is also a part determined by statistics and skills that depends only on the
player’s character level.
As K9 is a FO3 mod, we continued using its core gameplay system. However,
we modified two important aspects of the original game. First, the Vault-Tec Assisted
Targeting System, or V.A.T.S., which allows the player to spend some action points to
pause the game and aim at a specific body part of an enemy. Second, the radiation
system, as K9 takes place before the Great War, a period in which people did not have
to worry about the lethal radiation that enveloped the world. We transformed these
game mechanics to reinforce our story, implementing an adrenalin system and a
degeneration system. Adrenalin acts as action points, permitting the player to use
the V.A.T.S. (albeit renamed for the story’s purpose). Adrenalin rises as the player
fights and falls as he rests or uses the V.A.T.S. The degeneration system is there to pressure
the player. Degeneration inflicts diseases on the player (i.e. statistical modifications,
with some advantages and penalties) and increases continually over time, but can be slowed
by the player’s adrenalin. Indeed, a high enough adrenalin level may stop the degeneration
from worsening.
Fortunately for him, Ralph is not alone. He has Rusty by his side, accompanying
him through his journey and taking an active role in the unfolding of the story. Rusty stays by Ralph’s
side at all times and tries his best to protect him. Rusty also has an important role in
terms of gameplay mechanics. In fact, the only means to lower the degeneration is
to transfer some of Rusty’s genes to Ralph’s body, to feed the nano-technology. Doing
so, however, temporarily weakens the nano-technology and disables the shared abilities
between Rusty and Ralph. These consist principally of sharing their senses (resulting
in a colour alteration of the screen that makes enemies more salient, as shown in fig. 6.4)
and perceptions.
We implemented three levels to represent K9’s story. First, an introductory level,
in which the player gets a first contact with the world. It presents the story up to the
kidnapping. Second, a tutorial level, in which the player meets Three Dog. Three
Dog acts as a mentor, teaching the player new abilities and revealing more about the
story. Finally, we implemented a main level in which the player’s objective is to
Figure 6.4: Ralph uses Rusty’s sensorial information to better discern his enemies.
invade the building in which Dr. Wu’s laboratory is supposed to be. This level is divided
into three parts: the outside (filled with failed experiments, almost completely
degenerated), a first line of defence (filled with partially enhanced soldiers) and a compact
barricade securing the main entrance to the laboratory (filled with heavily armed soldiers).
6.3 Technological View
To implement K9 we used Fallout 3’s modding tool, the Garden of Eden Creation Toolkit5
(G.E.C.K.). The G.E.C.K. allows modifying everything that is considered content (e.g.
characters, levels, quests) by the FO3 game engine. In fig. 6.5 we show its main interface.
Using this tool we created K9’s levels. The process is straightforward: we started
by defining the world characteristics, then proceeded to shape the height-map as we
envisioned it. Texturing the level is as easy as painting in typical image-editing software,
as it uses the same metaphor (i.e. you pick a brush, a texture, and you paint the area you
wish to cover). The biggest advantage of modding such a game is that we have access to
a vast amount of professional and well integrated assets. Filling the level with content
such as trees, roads, or buildings is as simple as dropping them where we want them to
be placed.
The FO3 engine uses the concept of navigation meshes (navmesh, see fig. 6.6) to support its
pathfinding algorithms. Although some automatic tools are available to provide a good-
enough navmesh, it needs, most of the time, some manual correction. This is an important
step if we want autonomous characters to behave correctly as, without correct
5 Bethesda Game Studios, Garden of Eden Creation Kit (2008): http://geck.bethsoft.com/index.php/.
Figure 6.5: The G.E.C.K., Fallout 3 main modding tool.
navmesh information, they would just be lost within the level.
Figure 6.6: Editing the navmesh in the G.E.C.K. A character can only move within its boundaries.
These characters can be created and parameterised using a generic template, as shown
in fig. 6.7. We can define their attributes, skills, or experience, but also some data specific
to their behaviour, such as their aggressiveness or confidence. This allowed us to quickly
create several types of characters that behave differently in specific combat situations.
For their more common behaviour, the G.E.C.K. provides the concept of packages.
Packages are a configurable set of instructions that are executed when a precondition
is matched. They allow us to specify behaviours like “follow”, “lead” or even “eat” and
“sleep”. They have, however, a limitation if we want fine-grained control over the
character itself. Low-level actuators are, for the most part, inaccessible, which makes it
tricky to control a character exactly the way we want. We are restricted to higher-level
control, i.e. restricted to the use of packages.
To modify existing game mechanics, we mostly changed game settings already available.
In the G.E.C.K., these are similar to the concept of global variables. This made the
Figure 6.7: Creating a character.
adaptation of some mechanics, like the radiation or the action points, straightforward.
However, to create new mechanics we have to go through a whole different process. Indeed,
there is no such support within the G.E.C.K. The only possibility is a set of limited
scripting functions, which we used to create both new behaviours for our characters and
new mechanics for K9. To overcome most of the imposed scripting limitations, we opted
to use the Fallout Script Extender6 (F.O.S.E.). The F.O.S.E. injects new instructions at
the initialisation of the game executable, making it possible to create new scripting
functions and make them available to the mod. We can then program these functions in
whatever language we want, as they are self-contained in an external DLL. In this case
we opted for the C++ language and, by extending the source code of the original
F.O.S.E., we created Rusty’s brain.
Before looking at the internals of his brain, we will first describe Rusty as a synthetic
character existing within the FO3 engine, i.e. Rusty’s body.
6.4 Rusty’s Body
Rusty is a dog, one which, in appearance, is similar to Dogmeat — the dog of the original
game. To convey his thoughts, Rusty uses several animations that convey distinct
emotions: being happy, surprised / relieved, confused / fearful, or angry (shown in fig.
6.8).
These animations are coupled with a matching sound component, namely sounds
corresponding to barking and panting. In the happy animation (fig. 6.8(a)), Rusty raises
his tail as he barks happily, to the sky, five times. When he is surprised (fig. 6.8(b)),
with a feeling of being relieved, he pants a few times while wagging his tail horizontally
in a joyful manner. In contrast, when he is confused (fig. 6.8(c)), with a touch of fear,
he lowers his head, puts his tail between his legs and whimpers. Finally, when
6 I. Patterson, S. Abel and P. Connelly, Fallout Script Extender (2008): http://fose.silverlock.org/.
(a) Happy. (b) Surprised / relieved.
(c) Confused / fearful. (d) Angry.
Figure 6.8: The four emotions that Rusty can convey through his animations.
he is angry (fig. 6.8(d)), he assumes a serious and threatening pose in which he growls
ferociously.
Alongside these, we complement his expression with on-screen subtitles, which not only
reinforce his expression but also reflect Rusty’s thoughts (fig. 6.9) — in the fiction of K9,
the player-character can understand Rusty.
Figure 6.9: Rusty’s expression is reinforced through a sound component and on-screen subtitles, which also reflect his thoughts.
The similarity between Rusty and Dogmeat ends here. When and why these animations
are used is almost as important as (if not more important than) the animations themselves.
This is mostly DogMate’s responsibility, which we describe next.
6.5 Interacting with DogMate
Rusty’s behaviour is mostly controlled through an independent module, referred to as his
brain, and interpreted by the scripting component of the FO3 game engine. The interaction
between the module and the engine is technologically limited to a unidirectional invocation
from the scripting component. In fact, from the game engine’s point of view, there is simply
a mod — K9 — whose script invokes yet another scripting function. In reality, F.O.S.E.
intercepts the process and redirects the invocation toward our code, which provides us the
means to implement the functions in an independent (and more convenient) manner. The
interaction is based around two functions: one to update the brain and the other to get
the decision that Rusty will execute in his current situation. This decision is then carried
out within the mod itself. In practice, it corresponds to playing an animation, a sound or
a dialogue, or to activating a specific package. This step is necessary as such actions cannot
be carried out from within the external module.
This component (Rusty’s brain in fig. 6.10) includes DogMate and a world interface
which contains sensors and effectors.
Figure 6.10: Interaction between K9 and Rusty’s brain.
First, let us outline the general flow that corresponds to the brain being updated
(fig. 6.10). It starts from within K9’s scripts (1a). At this point, information about the
world is sensed so that we only consider detected enemies and relevant entities (1b) from
Rusty’s perspective. This information is stored in DogMate’s beliefs. DogMate performs
its update cycle, eventually detecting the intents of relevant intentional entities. As the
result of its cycle, DogMate sends primitives corresponding to the selected behaviour for
Rusty. We store this information in a buffer (1c), the effectors, and at the end of the
K9 cycle, the script checks which decision was made (2). Within the FO3 engine, this
information is used to execute verbal and non-verbal behaviour for Rusty, which consists
of the subtitles and animations presented earlier.
The main purpose of the world interface is to act as an interface between the engine
and the rest of the brain. It is responsible for keeping an up-to-date cache that replicates
the game engine information deemed relevant for our component. This information is
then culled using Rusty’s sensors and transformed into his own beliefs. This information
typically either represents the perception needed to update a predictor or its desired
value. The world interface is also composed of effectors, which consist of a buffer that
stores the decision. Once the decision is retrieved through the “GetDecision” invocation
(2 in fig. 6.10), the engine processes it and makes Rusty behave in the intended manner.
Rusty’s actions are mostly passive. Indeed, he only counsels and verbally supports the
player based on the situation he is witnessing. Hence, the brain’s decision is transformed
into one animation coupled with a matching counsel (i.e. a dialogue appearing as a
subtitle). While the animation, and its respective sound, remains the same for a given
emotion, Rusty’s thoughts may differ. In table 6.1 we show a sample of the text shown to
the player as Rusty’s thoughts and counsels. Note that we use the emotion elicited
by the affective appraisal [20] to select one of the four animations.
Intent            | Player condition | Appraisal | Thought / Counsel
Player attacks    | Good | Happy     | “Oh yeah! Let’s get him!”
Player attacks    | Good | Surprised | “Oh yeah! Being cowards ain’t for us!”
Player attacks    | Bad  | Angry     | “An enemy now? That would be pretty bad...”
Player attacks    | Bad  | Confused  | “Seriously, we should avoid confrontations for now...”
Enemy attacks     | Good | Happy     | “Oh oh! Come and get some!”
Enemy attacks     | Bad  | Angry     | “Now would be a good time to flee...”
Player flees      | Bad  | Happy     | “That’s it! Let’s avoid that one.”
Player flees      | Bad  | Surprised | “Facing hostility in this situation? You had me worried for a moment there...”
Player flees      | Good | Angry     | “A little fight wouldn’t be that bad!”
Player flees      | Good | Confused  | “Seriously, why are you cowering...?”
Want to invade    | —    | Happy     | “Oh yeah! We’ll be there soon!”
Want to invade    | —    | Surprised | “That’s it, no time to waste!”
Avoiding invasion | —    | Angry     | “What are you doing...? We’re getting further away...”
Avoiding invasion | —    | Confused  | “Is there a problem..? We should hurry you know...”
Table 6.1: The animation displayed and the subtitles are based on the selected intent and its affective appraisal.
Some possibilities are not used because, in preliminary tests, they were poorly understood
by the player. An example is an enemy fleeing, noticed by Rusty but not by
the player. Rusty would comment on it, “Oh oh! Scared, are you?”, but most of the time
the player could not understand his reasoning. We could surely adapt this by restricting
such expressions to events that the player also witnessed but, as this is not the scope of our
work, we opted not to include them in Rusty’s final behaviour.
6.6 Summary
In this section, we described K9. As a mod of Fallout 3, it uses most of its game
mechanics, but also adds new ones that suit our goals. The story itself takes place as
a prequel to the Fallout series, a year before the Great War which almost completely
destroyed everything on our planet.
Technologically, we used the G.E.C.K. to create the virtual world and setting, as well
as its characters. Using F.O.S.E. we created an independent module, Rusty’s brain, which
can be invoked from within K9’s scripts.
Rusty’s brain is made up of a world interface and DogMate. The engine provides Rusty’s
brain with game data, which the world interface stores and filters, passing on to DogMate
the information that Rusty can sense. In return, Rusty’s brain provides the
game engine with the behaviour that should be used by Rusty’s virtual body. This
behaviour is a combination of Rusty’s animations, sounds and subtitles, which match the
intent selected by DogMate and its affective appraisal.
In the next section, we describe the evaluation we performed on K9, and the results
we extracted from it.
Chapter 7
Evaluation
In this work, we pursued several objectives. First, we wanted to create a framework
to support the creation of believable behaviour for synthetic characters in general, by
allowing the detection of some of the other characters’ intentions, namely those of the
player-character. We were particularly interested in applying such an approach to
synthetic characters with a particular role, sidekicks, and this interest guided our
development of K9. To evaluate our approach, we focused on analysing whether the
framework could correctly detect the player’s intentions. We were also interested in
understanding how well the recognition of intentions performed when compared to an
external human observer. Finally, we wanted to assess performance. Virtual worlds are
real-time applications in which the greatest emphasis is almost always given to the visual
component. Other components, such as the application’s physics or artificial intelligence,
are left with minimal resources. With such restrictions in mind, we wanted to measure the
computational cost (in terms of CPU consumption) of our solution.
Throughout this section, we first review our experiment, describing the setting and each
step of the process. We then discuss the results of the experiments, analysing the
recognition rate of the player’s intent and the recognition rate of the interpreted intent
(i.e. the player’s intent as interpreted by an external observer). We finish this section
with an analysis of the impact of using this system in terms of computational costs.
7.1 Experiment
To evaluate our test-case, we performed an experiment in which we asked subjects to play
the game. The experiment took place at a public venue, MOJO (“MOntra de JOgos”,
which can be translated as “videogames exhibition”), whose purpose was to demonstrate
videogames created as part of a master’s course. K9 was developed mostly during that
course, while, at the same time, supporting this work as a demonstrator. Although most
of the participants were university students whose ages varied from 19 to 28, MOJO gave
us the chance to evaluate this work with people unfamiliar with our field.
(a) (b)
Figure 7.1: MOJO: a videogames exposition and demonstration.
Before asking them to start the experiment, we made sure that they had a minimal
knowledge of how to play the game, as well as knowledge about K9’s storyline and its
specific mechanics, namely the interaction with Rusty. After the tutorial, we held a short
interview in which we resolved any remaining misunderstandings that participants might
have had about these topics. Once ready, we asked them to play a session of K9’s main
level, in which their objective was to invade the enemy’s headquarters. Each interaction
lasted two to three minutes.
Each game session was recorded with screen-capture software. Immediately after
finishing a session, we asked the participant to annotate their intents. For the
annotation, participants could choose one of five available options: whether they wanted
to attack an enemy or flee from it; whether they wanted to open the door or were avoiding
it; the fifth option was to be used if their intent did not match any of the previous four
options. From this evaluation, we gathered a total of 54 samples (instants in which Rusty
expressed itself based on the user’s intent) from 9 different participants.
Because human observers watching the game would also fail to recognize certain
intents, we wanted to compare the performance of our approach with that of a human
observer. Our motivation is that if the mismatch between the real intent and the
recognized intent can be understood as natural by the player, it might appear believable.
However, for the player to understand Rusty’s reactions, he has to interpret the situation
from Rusty’s point of view. With this idea in mind, we randomly selected 30 samples
from the 54 available. We then published them on a video streaming website. Although
we lost some video quality, it gave us the possibility to pursue the evaluation online.
Our aim was to reach a larger population this time, and each participant was asked to
review the thirty samples and to classify them using the same options previously available
to the players. With this approach we collected 820 valid samples from 30 observers.
We now discuss our results.
7.2 Intent Recognition
We gathered a set of 54 samples from the venue’s participants and we compared each
of them against their respective annotation. We found that the participant’s intent was
recognized 61% of the time (see table 7.1).
Rusty                   | samples | matches | %
player invades building | 17 | 13 | 76.47%
player attacks enemy    | 25 | 17 | 68.00%
“avoid” intents         | 12 | 3  | 25.00%
total                   | 54 | 33 | 61.11%
Table 7.1: Matches between Rusty’s interpreted intent and the player’s intent.
During our tests, we found that participants rarely fled from an enemy (although
this intent was correctly detected several times). They also had a certain difficulty
expressing negative intents such as “avoiding an entity” because, in reality, when they did
so, it was mostly because they had started to do something else (i.e. “to seek” another
entity). Comparatively, we achieved a recognition rate of 71% for intents related to moving
toward an entity (first two rows in table 7.1). These results support our assumption that
if the player moves his avatar toward an entity, he probably has the intent to interact
with it.
The gathered data also suggests that some refinements could help increase the
recognition performance. At the conceptual level, it may be important to take the player’s
field of view into account. Even if the player could be focusing his attention on something
he cannot directly see (e.g. he wants to ambush an enemy by making a detour around
an obstacle to get behind him), such information could be useful to confirm specific
situations and prune others. An example is when the player is strafing, as in
figure 7.2. We should not consider that the player intends to attack the enemy that is
outside his field of view if we notice that he is focusing on keeping a safe distance from
the enemy he is seeing.
Figure 7.2: The player is strafing toward an enemy while looking at another one.
At the implementation level, the notion of distance could be implemented differently.
In K9, we chose to use Euclidean distance as a compromise between accuracy and
performance. However, in a world with a high number of obstacles (e.g. buildings, rocks,
interiors), this value may be inadequate. A more realistic value for distance could be
computed using path-finding algorithms. However, it is a costly process in virtual worlds
of a dynamic nature, and the added value for intent recognition remains to be measured.
Even if we could access the path-finding information, which distance should we consider
in figure 7.3? It is but a simple situation, yet not a trivial one to solve. When one of the
three distances decreases, the others might increase. Given this scenario, should we
consider that the player-character is getting closer to or further away from the enemy?
Figure 7.3: Which of the three distances should we use? The blue line represents the Euclidean distance and the green lines the available paths.
7.3 Intent Interpretation
We do not need realistic behaviour to achieve believability, only to display specific and
context-aware behaviour [18]. When Rusty fails to recognize the user’s intent, we do not
want it to break the user’s suspension of disbelief by displaying inadequate behaviour
based on the failed intent recognition. If we ask a human observer to comment on
the possible intentions of his peer, he too might get some intentions wrong. However,
every time the observer recognizes an intention, he can justify it based on his observation.
Then, if the user uses some theory of mind and puts himself in the shoes of the observer,
limiting his own knowledge to the information the observer had, he might agree with the
observer that the recognized intention could indeed be valid from a certain point of view.
As such, we wanted to verify whether Rusty succeeds or fails at recognizing the same
intents a human observer would succeed or fail to recognize.
We classified each sample according to whether Rusty and/or the external observer
had correctly recognized the user’s intent (see table 7.2) and applied a Pearson’s chi-square
test, χ²(1, N = 820) = 18.31 (p < 0.001). Results suggest that, generally, Rusty would
recognize the same user intents the human observer would, and fail to recognize the
same user intents the external observer would fail to recognize. As such, Rusty and
external human observers seem to perform similarly when it comes to recognizing intents,
leading us to believe that Rusty’s intent reaction may be perceived as believable by the
user.
Rusty ≈ Observer | Observer recognized | Observer not recog. | total
Rusty recognized | 377 | 272 | 649
Rusty not recog. | 68  | 103 | 171
total            | 445 | 375 | 820
χ²(1, N = 820) = 18.31, p < 0.001
Table 7.2: Matching external observer’s interpreted intent and Rusty’s recognized intent.
The 820 gathered samples also reinforce our previous statement about negative intents.
Subjects assuming the role of observers also had difficulties identifying such intents, as
only 6.14% were correctly recognized (compared to 62% positive intent recognition —
see table 7.3). The small difference between Rusty’s performance and the lower human
observer’s performance may be related to the fact that Rusty has access to information
that is not always on the game screen. This is desirable for virtual sidekick agents, as the
user will value the added understanding of the surrounding virtual world, without feeling
that the sidekick has access to information the user simply cannot access.
Observer                | samples | matches | %
player invades building | 325 | 217 | 66.77%
player attacks enemy    | 381 | 221 | 57.40%
“avoid” intents         | 114 | 7   | 6.14%
total                   | 820 | 445 | 54.27%
Table 7.3: Matching observer’s recognized intent and player’s intent.
7.4 Limitations
The experiment helped us identify several limitations in our approach. First and
foremost, not all actions’ intents can be equally well identified. Our results support that
“moving toward an entity” is correlated with the player intending to interact with said
entity. However, our results also show that “moving away from an entity” is not correlated
with the intent to avoid that same entity. As mentioned earlier, a possible explanation
is that participants explained their actions by stating they wanted to do something new
and not by stating they did not want to do an action anymore: if the player stops
attacking and moves away from a virtual enemy agent, he might not be fleeing from it;
he might just be looking for something that suddenly appeared on the floor or attacking
another enemy.
The second limitation is that our approach emits intents that have no past history, as
if each one were the first and only detected intent. This results in duplicated and
correlated intents being emitted with no regard for their possible cause. If the player gets
surprisingly closer to an enemy and, seconds later, his distance decreases surprisingly once
again, two different intents are emitted that refer to the same underlying intent: to attack
the enemy. Also, attacking an enemy might produce an intent suggesting that the player
is fleeing from another one. This lack of history and correlation between intents led to
the emission of some false intents.
Figure 7.4: Avoiding an entity is not just a simple negation of the converging intention.
7.5 Computational Impact
Graphical applications such as videogames usually require most of the hardware re-
sources to be dedicated to the graphics engine. When each CPU cycle is crucial, the most
important concern is to sustain the graphical frame rate, as it is one of the most noticeable
aspects of the interaction for the end-user. While the game logic also needs its share of
resources, in computer and video games the rule is: the lower, the better. With this in
mind, we aimed for a system with low resource requirements: if it is to be added on top
of current-generation game technology, it has to be barely noticeable.
Our first concern was the framework's update rate. An empirical test showed that 2
to 4 hertz tends to produce good results. Below 2 hertz, behaviour seems unnatural, as
too much time passes between the user's action and the intent's detection. Frequencies
higher than 4 hertz also produce unnatural behaviour, this time as if Rusty were reacting
instantly. In a dedicated testing environment, we used 300 predictors divided into 100
sets, solely to monitor virtual enemy agents. This number is exaggerated, but should
cover most of the user's potential centres of focus in a virtual world. Even with more
entities, those that are not relevant to the user's current situation can be culled. Profiling
this instantiation of the framework did not show any significant overhead (0.18%) at 4
hertz.
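The decoupling described above (a low-frequency intent update running alongside a high-frequency render loop) can be sketched as a fixed-timestep accumulator. The scheduler class, the 50 fps frame time and the callback names below are illustrative assumptions, not the actual K9 code:

```python
INTENT_RATE_HZ = 4                         # empirically, 2-4 hertz felt natural
INTENT_PERIOD_MS = 1000 // INTENT_RATE_HZ  # 250 ms between intent updates

class IntentScheduler:
    """Runs a (hypothetical) intent-recognition update at a fixed low rate,
    independently of the graphical frame rate."""

    def __init__(self, update_fn, period_ms=INTENT_PERIOD_MS):
        self.update_fn = update_fn
        self.period_ms = period_ms
        self.accumulator_ms = 0

    def on_frame(self, dt_ms):
        """Called once per rendered frame with the frame's elapsed time."""
        self.accumulator_ms += dt_ms
        while self.accumulator_ms >= self.period_ms:
            self.accumulator_ms -= self.period_ms
            self.update_fn()

# Usage: at 50 fps (20 ms frames), one simulated second of rendering
# triggers the intent update only 4 times.
ticks = []
sched = IntentScheduler(lambda: ticks.append(1))
for _ in range(50):
    sched.on_frame(20)
print(len(ticks))  # -> 4
```

The render loop keeps running at full speed; only the cheap accumulator check happens every frame, which is consistent with the negligible overhead measured above.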
Real-time constraints are mandatory in applications such as computer and video
games. While our solution only recognizes a subset of all possible intents, we would argue
that, with such a low impact on CPU cycles, it recognizes them almost for free. As
such, we argue this approach could be used to help create believable behaviour for
synthetic characters inhabiting the virtual world.

Figure 7.5: DogMate did not take more than 0.18% of the used CPU cycles.
7.6 Summary
We assessed DogMate's success by evaluating K9 in a public venue. We focused on
gathering two types of data: about the recognition of the player's intentions, and about
the interpretation of those intentions. Hence, we interviewed the participants and classi-
fied several samples (instants in which an intent is detected) with them. Furthermore, we
published a randomly selected subset of these samples on the internet for further classi-
fication, this time with the objective of gathering data from external observers.
These classifications allowed us to support our assumption that if the player is moving
toward an entity, it must be to interact with it. We also propose that this assumption
might be refined, namely by looking at the player's field of view and by implementing
other notions of distance, such as analysing several paths. The data also helped us realise
that this assumption cannot simply be negated and remain valid: if the player is not
moving toward an entity, it does not mean that he does not want to interact with it.
Players usually prefer to say that they started to seek something else rather than that
they avoided the said entity. Our analysis suggests that these findings hold whether the
samples are classified by the player himself or by an external observer.
Rusty interprets the player-character's intent from a third-person perspective. While
guessing this intent accurately is important, to remain believable the guess should, at
least, have a logical explanation from an observer's point of view. Thus, we compared
Rusty's detected intents with the external observers' classifications and found that both
were consistent with each other. This suggests that the player would be able to accept
Rusty's recognised intentions as valid in his own context.
Finally, we evaluated our implementation in a dedicated environment with a large group
of entities. Profiling the application showed that the overhead is minimal. This was
expected, as the framework only needs a low refresh rate, 2 to 4 hertz, to produce accept-
able performance. Hence, with such a low overhead, this solution comes almost as free
support for character designers to enhance their creations.
Chapter 8
Conclusions
Synthetic characters generally lack the support to understand a fundamental character
in their world: the player-character. This has a negative impact on their behavioural
believability, as the surrounding characters are unable to provide adequate context-aware
behaviour to the player. In this work, we analysed synthetic characters, and sidekicks in
particular, which typically interact with the player for long hours and are consequently
at greater risk of breaking his suspension of disbelief than other characters. We saw that
they could benefit from the capacity to understand the intention underlying the player-
character's actions. Our objective was therefore to provide a framework for recognizing
some of the intentions underlying the actions of another synthetic character sharing the
same virtual world. By understanding intentions in particular situations, we allow char-
acters to display specific context-aware behavioural reactions and improve their perceived
believability.
We started by analysing several works in which anticipation is used to recognise actions
and, based on Searle's definition of intention, we modelled the relation between intention
and action. We discussed how an action can be divided into three components (movement,
target and intent) and proceeded to detail how these components can be detected in a
virtual world, based on the matching and mismatching of anticipated behaviour related
to distance changes between entities.
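The matching and mismatching of anticipated distance change can be illustrated with a minimal sketch. The moving-average predictor and the surprise threshold below are simplifying assumptions made for illustration; they stand in for the emotivector's actual prediction model:

```python
def detect_surprise(history, sensed, threshold=2.0):
    """Compare the sensed player-entity distance against a naive
    prediction (here: the mean of recent samples) and flag a mismatch."""
    predicted = sum(history) / len(history)
    error = sensed - predicted
    if abs(error) < threshold:
        return None                      # matches anticipation: no intent
    # An unexpected decrease in distance suggests the player is
    # converging on the entity (e.g. intending to interact with it).
    return "converging" if error < 0 else "diverging"

# Player-enemy distance hovering around 10 m, then a sudden approach:
history = [10.2, 10.0, 9.8]
print(detect_surprise(history, 9.9))   # -> None (expected movement)
print(detect_surprise(history, 4.0))   # -> converging (mismatch: intent)
```

Note that, as discussed in the limitations, only the "converging" branch proved reliably tied to an intention; "diverging" cannot safely be read as avoidance.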
To support this work, we used the emotivector and made some alterations to its
models, namely to classify the relevance of an action. We detailed how the saliences it
generates trigger new intents, whose type depends on the emotivectors that become
relevant simultaneously, and how an affective appraisal provides the character with a
personal view of the situation. The whole process constitutes our framework, DogMate.
A test-case (K9) was presented, showing how our framework can be connected to a
current-generation game engine. In K9, DogMate was integrated into an agent architec-
ture to control the behaviour of the player-character's sidekick, Rusty. Rusty interacts
with the player within the virtual world and advises him based on the intents it interprets
from the actions of his avatar. The test-case evaluation suggests that our framework
can be used to identify some of the player’s intents and, at least in some tasks, performs
comparably to a human observer, an encouraging result as we were mostly focused on the
generation of believable behaviour. The results also show that while the framework only
recognizes a subset of possible intents, it has a low computational impact, and as such is
suited for integration.
8.1 Concluding Remarks
Our solution’s evaluation showed satisfactory results in the recognition rate of the player’s
intention based on the movement to get closer to an entity. We view these results as a
confirmation that anticipation is a valid support for the detection of some of the player’s
intention. However, our assumption was not equally valid in all situations. In fact, we
should not have assumed that when the player moves away from an entity, the intentions
should be the same as when he gets close to it but in its negated form. This led us to find
that player’s rarely ever think in the negative terms, e.g. instead of not being interested
in one entity, they affirm to be interested in another one.
This approach also yielded promising results, as it showed that the detected intention
could be valid as an interpreted intention, as if an external observer had made the guess.
This is important because the role of a sidekick, like any other character, is not to be
the player, but to interact with him; sidekicks should act as an external observer would:
both watch and interpret the player's experience from a third-person perspective.
Our last concern was to perform the detection not only in real-time but also efficiently,
so that it could be an appealing method for resource-intensive 3D applications (e.g.
videogames). Here too, the results were very encouraging, as the total processing time
consumed by our framework (in a stress test) did not exceed 0.18% of the total. This
lets us say that, while we may not detect a wide range of intentions, their detection is
almost costless.
8.2 Future Work
We left some details out of our approach that could be further investigated. First and
foremost, we analysed the unexpected actions of the player-character, but left out the
expected actions. Expected actions pose a temporal problem: when does an expected
action start and finish? Similarly, when does one expected action change into the next?
Take for instance the case in which the player-character runs through a world full of
static entities. He always moves in the same direction, at the same speed, in a predictable
manner. After a while, he has passed by a few entities. Did he intend to do anything
with any of them? Probably not, but we only knew it after he passed them by. How can
we know it beforehand?
One possible direction for this work comes from the limitations identified during
evaluation, such as intents being detected without taking into account either their history
or their possible correlation with other intents. We believe these limitations could be
addressed by introducing a higher-level layer that would place detected intents in a
timeline, to recognise duplicates and correlate intents with each other. We could keep
the framework as-is and use the detected intents as input to this higher-level layer, as
shown in figure 8.1. This layer would then act as a filter, preventing false positives from
passing through. Additionally, the framework should be able to detect intentions in
expected movement (e.g. while exploring), by introducing prior intentions into the
framework. All of these are possible directions we are considering for future work.
Figure 8.1: The intent is used as input into a higher-level layer which filters correlated and duplicated intents.
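Such a filtering layer might be sketched as follows; the time window and the criterion for "same intent" (same type and target within the window) are assumptions made purely for illustration:

```python
class IntentFilter:
    """Higher-level layer: keeps a timeline of detected intents and
    suppresses duplicates referring to the same underlying intent."""

    def __init__(self, window=5.0):
        self.window = window   # seconds during which a repeat counts as a duplicate
        self.timeline = []     # entries: (timestamp, intent_type, target)

    def submit(self, t, intent_type, target):
        # Drop entries that fell out of the correlation window.
        self.timeline = [e for e in self.timeline if t - e[0] <= self.window]
        for _, typ, tgt in self.timeline:
            if typ == intent_type and tgt == target:
                return None    # duplicate of a recent intent: filtered out
        self.timeline.append((t, intent_type, target))
        return (intent_type, target)

# Two "attack" detections on the same enemy, seconds apart, yield one intent:
f = IntentFilter()
print(f.submit(0.0, "attack", "enemy_1"))  # -> ('attack', 'enemy_1')
print(f.submit(2.0, "attack", "enemy_1"))  # -> None (duplicate)
print(f.submit(9.0, "attack", "enemy_1"))  # -> ('attack', 'enemy_1')
```

Correlating intents with their possible cause (e.g. suppressing a "fleeing" intent emitted while attacking another enemy) would require richer rules than this duplicate check, but could live in the same layer.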
We should also revise how a set's salience pattern might elicit the detection of an
intention. For instance, we saw that when the player moves further away from an entity,
it does not necessarily mean that he is running away from it. If, however, we measure
other metrics and compare them against a desired value, we might gather a valuable
indication. Take visibility, for instance, and let us consider that a desired value of zero
indicates that the player-character is fully hidden from the enemy. Then, if an unexpected
change occurs in the monitored value of that emotivector, it might suggest that the player
is either hiding from that enemy or getting out of cover, possibly to attack him (depending
on the signal's variation). This is just one example of an idea that might easily be
implemented in the current framework to refine detected intentions.
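As a sketch of this idea (the surprise threshold and the 0-to-1 visibility scale are illustrative assumptions; a desired value of 0 means fully hidden, as above):

```python
def appraise_visibility(previous, sensed, desired=0.0, threshold=0.3):
    """Monitor a visibility metric (0 = fully hidden, 1 = fully exposed)
    against its desired value; an unexpected change refines the intent."""
    change = sensed - previous
    if abs(change) < threshold:
        return None                      # no surprising change: no intent
    # Moving toward the desired value (0) suggests hiding;
    # moving away from it suggests leaving cover, possibly to attack.
    if abs(sensed - desired) < abs(previous - desired):
        return "hiding"
    return "leaving cover"

print(appraise_visibility(0.9, 0.2))  # -> hiding
print(appraise_visibility(0.1, 0.8))  # -> leaving cover
print(appraise_visibility(0.5, 0.6))  # -> None (expected fluctuation)
```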
Finally, there are still other intentions to consider, namely prior intentions. These
might be detected if the system can recognise that the player-character is performing
several actions to achieve the same objective. To that end, however, we do not believe
that looking solely at independent actions and correlating them would be enough. We
should, for instance, try to track every piece of information that the player knows about
the world, relate it to a model of the player, and try to understand whether he performed
a certain action to achieve an objective (effectively detecting a prior intention).

Hopefully, these are questions we will further investigate in the future.
Bibliography
[1] J. Bates. The nature of characters in interactive worlds and the Oz project. Technical report, Carnegie Mellon University, 1992.
[2] J. Bates. Virtual reality, art, and entertainment. Presence: Teleoper. Virtual Environ., 1(1):133–138, 1992.
[3] J. Bates. The role of emotion in believable agents. Commun. ACM, 37(7):122–125, 1994.
[4] J. Bates, B. Loyall, and W. S. Reilly. Broad agents. In Proceedings of AAAI Spring Symposium on Integrated Intelligent Architectures, pages 38–40, 1991.
[5] M. Bratman. Intentions, Plans, and Practical Reason. Harvard University Press, 1987.
[6] R. Burke. It's about time: Temporal representations for synthetic characters. Master's thesis, MIT, The Media Lab, 2001.
[7] P. Carruthers and P. K. Smith. Theories of Theories of Mind. Cambridge University Press, 1996.
[8] D. C. Dennett. The Intentional Stance. MIT Press, 1987.
[9] P. Gorniak. The Affordance-Based Concept. PhD thesis, MIT, 2005.
[10] P. Gorniak and D. Roy. Speaking with your sidekick: Understanding situated speech in computer role playing games. In Proceedings of the First Annual Artificial Intelligence and Interactive Digital Entertainment Conference, pages 385–392, Menlo Park, CA, USA, 2005. AAAI.
[11] K. Isbister. Better Game Characters by Design: A Psychological Approach. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2006.
[12] D. Isla and B. Blumberg. New challenges for character-based AI in games. In Artificial Intelligence and Interactive Entertainment: Papers from the 2002 AAAI Spring Symposium. AAAI Press, 2002.
[13] C. Kline. Observation-based expectation generation and response for behavior-based artificial creatures. Master's thesis, MIT, The Media Lab, 1999.
[14] J. E. Laird. It knows what you're going to do: adding anticipation to a Quakebot. In AGENTS '01: Proceedings of the fifth international conference on Autonomous agents, pages 385–392, New York, NY, USA, 2001. ACM.
[15] J. E. Laird, A. Newell, and P. S. Rosenbloom. Soar: an architecture for general intelligence. Artif. Intell., 33(1):1–64, 1987.
[16] N. Lazzaro. Why we play games: Four keys to more emotion without story. In Game Developers Conference, March 2004.
[17] I. Leite, C. Martinho, A. Pereira, and A. Paiva. iCat: an affective game buddy based on anticipatory mechanisms. In AAMAS '08: Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems, pages 1229–1232, Richland, SC, 2008. International Foundation for Autonomous Agents and Multiagent Systems.
[18] A. B. Loyall and J. Bates. Personality-rich believable agents that use language. In Proceedings of the First International Conference on Autonomous Agents, pages 106–113. ACM Press, 1997.
[19] D. Mark. The art of AI sidekicks: Making sure Robin doesn't suck. http://aigamedev.com/open/article/art-of-sidekicks/, June 2008.
[20] C. Martinho. Emotivector: Affective Anticipatory Mechanism for Synthetic Characters. PhD thesis, Instituto Superior Tecnico, Technical University of Lisbon, Lisbon, Portugal, September 2007.
[21] C. Martinho and A. Paiva. Using anticipation to create believable behaviour. In Proceedings of the AAAI 2006, pages 175–180. AAAI Press, 2006.
[22] C. Martinho and A. Paiva. It's all in the anticipation. In Proceedings of the seventh international conference on Intelligent Virtual Agents, pages 331–338, Paris, France, 2007. Lecture Notes in Computer Science 4722, Springer, 2007.
[23] M. Mateas and A. Stern. Facade: An experiment in building a fully-realized interactive drama, 2003.
[24] M. Mateas and A. Stern. Natural language understanding in Facade: Surface text processing. In Proceedings of the Conference on Technologies for Interactive Digital Storytelling and Entertainment, 2004.
[25] M. Mateas and N. Wardrip-Fruin. Defining operational logics. In Proceedings of the Digital Games Research Association, September 2009.
[26] NIST/SEMATECH e-Handbook of Statistical Methods. http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc4.htm, October 2009.
[27] R. Prada. Teaming Up Humans and Synthetic Characters. PhD thesis, Instituto Superior Tecnico, Technical University of Lisbon, Lisbon, Portugal, December 2005.
[28] A. S. Rao and M. P. Georgeff. BDI agents: From theory to practice. In Proceedings of the first international conference on multi-agent systems (ICMAS-95), pages 312–319, 1995.
[29] A. Rollings and E. Adams. Andrew Rollings and Ernest Adams on Game Design. New Riders Publishing, Indianapolis, 2003.
[30] J. R. Searle. Intentionality, an essay in the philosophy of mind. Cambridge University Press, Cambridge, NY, USA, 1983.
[31] L. Sheldon. Character Development and Storytelling for Games. Premier Press, 2004.
[32] F. Thomas and O. Johnston. The Illusion of Life. Hyperion Press, New York, NY, USA, 1994.
[33] B. Tomlinson, M. Downie, and B. Blumberg. Multiple conceptions of character-based interactive installations. In ITU Recommendation I371, pages 5–11, 2001.