Head movement and facial expressions as game input
Mirja Ilves*, Yulia Gizatdinova, Veikko Surakka, and Esko Vankka
Research Group for Emotions, Sociality, and Computing, Tampere Unit for Computer-Human
Interaction (TAUCHI), School of Information Sciences, University of Tampere, Kanslerinrinne 1,
FIN-33014 University of Tampere, Finland
E-mail addresses: [email protected] (M. Ilves), [email protected] (Y. Gizatdinova),
[email protected] (V. Surakka), [email protected] (E. Vankka)
*Corresponding author: Mirja Ilves, E-mail: [email protected], Tel: +358 50 318 5848, Postal
address: Mirja Ilves, Tampere Unit for Computer-Human Interaction, School of Information
Sciences, University of Tampere, Kanslerinrinne 1, FIN-33014 University of Tampere, Finland
Abstract
This study aimed to develop and test a hands-free video game that utilizes information on the
player’s real-time face position and facial expressions as intrinsic elements of gameplay. Special focus was given to investigating the user’s subjective experiences of utilizing computer vision input in game interaction. The player’s goal was to steer a drunken character home as quickly as
possible by moving their head. Additionally, the player could influence the behavior of game
characters by using the facial expressions of frowning and smiling. The participants played the
game with computer vision and a conventional joystick and rated the functionality of the control
methods and their emotional and game experiences. The results showed that although the
functionality of the joystick steering was rated higher than that of the computer vision method, the
use of head movements and facial expressions enhanced the experiences of game playing in many
ways. The participants rated playing with the computer vision technique as more entertaining,
interesting, challenging, immersive, and arousing than doing so with a joystick. The results
suggested that a high level of experienced arousal in the case of computer vision-based interaction
may be a key factor for better experiences of game playing.
Keywords: Interfaces and interaction techniques, camera-based video game, computer vision, face
detection and tracking, facial expression classification, head movement, gameplay experience,
emotion
1. Introduction
Enjoyment and other emotion-related factors are central motivators for playing video games. People
seek and play video games that are fun and entertaining or elicit other kinds of emotional
experiences. Games with different characteristics elicit various emotional responses (e.g. [1]), but
control devices for playing can also affect a player’s emotions and game experience. Recent
evidence shows that new, handheld but more natural controllers (e.g. the Wii remote, a steering wheel) can lead to higher feelings of spatial presence and game enjoyment than traditional control devices (e.g. the joystick, gamepad, and keyboard) [2]. However, systems that rely on physical controls have inherent limitations; for example, they cannot detect the presence or identity of players.
More natural, active, and playful gaming has become possible because of advances in computer
vision (CV). Through standard video cameras, CV technology provides a low-cost alternative to
handheld devices and allows entirely unobtrusive detection of head and body movements or hand
gestures, for example, and their use as a game input. Automatic analysis of facial information can
help to understand the identities and number of players, as well as their presence and locations. The
recognition of facial expressions is also possible with the help of CV. The human face and facial
expressions provide a rich source of information about human behavior and emotional state. It can
be argued that faces are the main modality in human nonverbal communication, and many
expressions can be performed voluntarily; thus, the use of facial expressions could provide a natural
method of game control. However, research in the area of automatic face analysis has in the past focused mainly on technological aspects, dealing with performance characteristics of different methods such as speed, accuracy, and robustness [3,4]. Generally, the question of how video games can
successfully leverage facial information remains less understood. The literature analysis reveals that
although automatic face analysis has started to be utilized in gaming, few studies have attempted to
systematically evaluate the usability and user experience aspects of face-responsive games.
This study aimed to investigate the functional and experiential aspects of head movements and
facial expressions detected by CV as game input methods. In contrast to earlier studies, we
combined active and continuous face tracking with facial expression classification in real time in
order to enhance the overall experience of gameplay. For this purpose, we designed and
implemented the game “Take Drunkard Home,” where the player’s goal is to use head movements to steer a drunken character home as quickly as possible without letting him fall down. Along
the way, the player needs to pick up various items and avoid collisions with stationary or moving
obstacles. Additionally, the player can influence the behavior of the other game characters by using
facial expressions with two affective meanings—positive for smiling and negative for frowning.
The game solely relies on automatic face processing and therefore supports better accessibility to
video gaming for those players who have difficulties or simply do not want to use physical input
devices. We conducted a user study in which the participants played the game using CV technology
and a conventional joystick. We recorded the game duration; the number of falls; and the number of
picked flowers, beer cans, and hamburgers when the participants played the game. The participants
evaluated the functionality of the control methods, as well as their emotional and game experiences.
This paper reviews recent advances in face-responsive video gaming and introduces the game
design and CV-based control methods used in this study. Then it presents the results of the
empirical evaluation of the game and further discusses issues related to the future development of
video games that utilize facial information.
2. Background
Quite recently, the game industry has added new input devices and techniques to traditional controls
such as the keyboard, mouse, joystick, and gamepad. The controlling of games is not limited
anymore to the use of the hands; new input devices and techniques allow more natural and more
playful, physically active gaming. For example, the motion-sensing capability of Nintendo’s Wii¹ remote enables the detection of acceleration and movement in three dimensions. There are also
technologies that enable game control without any handheld device. For example, Microsoft Kinect²
enables players to control the Xbox 360 using body movements and gestures without touching the
control device. The Microsoft Kinect sensor consists of an infrared laser projector, combined with
two cameras that detect the positions and movements of people in three dimensions. Some game
technologies, such as Sony EyeToy³ for the Sony PlayStation 2, use CV for the gestural control of
games. Because of advances in CV and hardware processing capabilities, movement- and gesture-
based interaction has also become possible in conventional computer systems. CV technology has been improving constantly, and the field has recently demonstrated a number of face analysis
methods that have proven to function well even in challenging conditions [3,5,6]. Considering that
real-time, accurate, and robust measurement of facial visual information is only a matter of time, it
is important to understand how this information can be used in the context of game interaction. At
this point, we emphasize the need for early user studies as an integral part of the development of
CV-based user interfaces and their successful integration into video games.
From the implementation standpoint, controlling video games or any other graphical user interface
is typically based on pointing and activation methods. The pointing method identifies the object of interest, and the activation method triggers a certain action on it in the virtual game world. From
the design standpoint, video games can be controlled in two ways, implicit and explicit. Implicit
control allows for automatic adaptation or adjustment of the game environment and interaction
modalities to the player’s spontaneous behavior. Explicit control means that the player consciously
produces facial expressions, head movements, and body gestures to directly control the game
interaction. This type of gaming replaces traditional gaming with physical input devices that are
primarily based on the point-and-click concept. Furthermore, the use of faces for game control
(implicit or explicit) has two important advantages. First, the human face is highly expressive, with
more than 40 muscles that alone or in combinations produce visually detectable changes in facial
appearance. Therefore, facial expressions, together with head movements, can potentially provide a
diverse, intuitive, and fine-grained means of game control. Second, growing evidence indicates that
the use of physical movements of the head and face can enhance the overall experience of game
playing.
The face-responsive video games proposed to date can be roughly divided into three categories, according to how facial information is utilized in game interaction:
(1) Digital interactive mirror. The avatar explicitly repeats the player’s head movements and
facial expressions (the top left image of Figure 1). Real-time animation of sometimes
impressively photorealistic avatars has become an increasingly popular area of research
(e.g. [7]), partly due to its potential utilization in the film industry. Additional graphical add-ons such as makeup, various headwear items, or emoticons can be drawn on top of the player’s face to enhance the experience of presence and role-playing [8].
--------------------------------
Insert Figure 1 about here
--------------------------------
Figure 1. Screen shots of face-responsive games: (top left) “Maris head” digital interactive mirror, (top right) “eating game” (arrows show the direction of movement), and (bottom) “walking game.”
(2) Viewpoint, directional navigation, and action trigger control. Information on face position
and head orientation is directly used to change the player’s point of view in first-person
games or to steer the avatar in the game environment. Additionally, the detected head
gestures and facial expressions are used to imply a certain action in the game world.
Conventionally, the navigation of the avatar in a two- or three-dimensional world has been
performed with handheld devices such as the keyboard, mouse, joystick, and gamepad. It
has been shown that head rotation and movement can substitute for the use of physical
devices in navigational tasks and provide a more natural and intuitive means of game control
[8,9]. The top right image in Figure 1 shows the “eating game” [10], which transfers the player’s sideways head movement to the horizontal motion of the “eater” character
that is located at the bottom of the game space. The player controls the character’s mouth-
opening movement by opening his or her own mouth. The bottom image in Figure 1 shows a
top-down, strategy-like “walking game” [10], where a circular movement template is used to
move the character from one cell to another in a labyrinth, by means of head gestures. The
player can also produce facial expressions to pick up different items.
(3) Affective control. Facial expressions are naturally utilized to bring affective information to
the gameplay and, depending on the game design, implicitly or explicitly execute emotion-
related or emotion-guided activities. This idea closely relates to the concept of affective
gaming, meaning that the player’s emotional state influences the game’s difficulty level or
aesthetics, for example [11]. Previously, in order to assess the user’s affective state,
information on the player’s heart rate, skin conductance, and respiration had been detected
and further utilized in manipulating the gameplay [12]. In CV-based games, the player’s
spontaneous facial expressions and body gestures are detected fully unobtrusively, without attached physiological sensors, and used to adapt the game to the supposed affective state
of the player.
Our research focus belongs to the last two categories: augmenting video games with information on the player’s real-time head position and facial expressions. We concentrate our literature review on studies involving empirical verification and explain how the proposed game designs influence user experiences.
2.1. Facial information for viewpoint, directional navigation, and action trigger control
Wang et al. [8] investigated how applying real-time information about face position as an essential
element of gameplay would affect game experiences. They found that using face position information in a first-person shooter (FPS) game for peek-and-dodge movements can effectively
enhance the sense of presence. Sko and Gardner [9] also studied the potential of the head gestural
input in FPS games. Their focus group of expert game developers and experienced end users gave positive feedback on head gestural input (e.g. zooming, peeking, spinning) in FPS games. In a further study [13], the head interaction technique was
improved to make it robust to the variable conditions of home use. The feedback from 2,500 users
showed that head tracking improved the game’s immersion and realism. Furthermore, Gorodnichy
and Roth’s study [14] showed that test participants rated playing the game “Aim-n-shoot
BubbleFrenzy” with the hands-free ‘nose as mouse’ technique as more fun and less tiring than
playing the game with a mouse.
2.2. Facial expressions as affective control
In addition to body postures, gestures, and head position, CV can detect changes in facial
expressions. The human face and facial expressions have a significant role in interpersonal
interaction; thus, the use of facial expressions as an input method also provides a potential
communication channel for the gaming environment. Facial expressions are communicative signals
that reflect both voluntary and involuntary activation (e.g. [15,16]). Involuntary facial expressions that occur spontaneously can reflect emotional states such as fear, anger, and happiness, or cognitive activities such as concentration [17,18,19]. Facial expressions can also be used
voluntarily, for example, to affect the mental state or behavior of another person. For example, a
smile can show friendliness, approval, or encouragement; lifting the eyebrows can communicate
wonder; and lowering the eyebrows can demonstrate disapproval or aggression.
Some authors have developed and tested video games that utilize voluntary or involuntary facial
expressions. Obaid, Han, and Billinghurst [20] designed the game “Feed the Fish,” which responds
to a player’s facial expressions by adjusting the game’s difficulty level. A positive expression
changes the game level to a harder one, and a negative expression lowers the game level to an easier
one. They found that people rated the affect-aware game as more enjoyable, exciting, and
challenging than its non-affective version. Bernhaupt et al. [21] developed the game “Emotional
Flowers” for long-term usage (i.e. three to five days) in a working environment. The game’s main
idea is to grow a flower as fast as possible by using positive facial expressions. The facial
expressions are measured periodically, and the flower grows or shrinks depending on the
detected facial expressions. Multiple players can play the game simultaneously. An ambient display
in a public area shows the flowers of all participants. A user study showed that the game influenced
the players’ emotional states and social communication patterns. Lankes et al. [22] redesigned the
game of Bernhaupt et al. to be suitable for a shopping center. In this kind of environment, the
interaction time has to be short, with immediate feedback. In Lankes et al.’s “EmoFlowers” game,
the players’ facial expressions influence the weather status in the game and in this way, the growth
of a virtual flower. The expressions of sadness lead to rain, and the expressions of joy cause
sunshine. The participants of a user study reported that interaction with the game via facial
expressions was natural and easy to learn. Additionally, the majority (92%) of the players reported a positive user experience while playing.
These studies show the potential of facial expressions as a game input method. Because facial expressions have a central role in face-to-face communication in real life, it is logical to study whether they could also serve a similarly natural interaction purpose within a game. Thus, in our game, the purpose of facial expressions was to affect the behavior of the other
game characters. Moreover, our game did not utilize facial expressions only, but combined them
with information on head movements for avatar steering.
2.3. Subjective experience measures of game playing
Computer games are hedonic in nature, that is, people play games to entertain themselves [23].
Hedonic products are consumed mainly for affective or sensory gratification purposes [24].
Affective experiences are often measured through a certain set of dimensional scales (i.e. valence,
arousal, and dominance) that are formed based on a dimensional theory of emotions [25]. The
valence dimension relates to the pleasantness of a certain experience, ranging from unpleasant to
pleasant. The arousal dimension refers to the level of activation, ranging from relaxed to stimulated.
The dominance dimension involves the feelings of control, ranging from being controlled to being
in control.
In the game environment, factors such as immersion, presence, and flow have also been considered
important for a comprehensive understanding of the subjective game experience. IJsselsteijn et al. [26] developed the Game Experience Questionnaire (GEQ), which measures several dimensions of the playing experience. The questionnaire has been used in many game studies globally, and it can
assess the gameplay experience with high reliability [27,28]. There is also evidence that video
games cause motion sickness in many players [29]. Thus, it is essential to evaluate whether the use of head movements increases motion sickness, compared to a more traditional control device. In this study, we compared the CV and joystick control methods in terms of functionality, motion sickness, emotional experiences, and players’ game experiences.
3. Methods
3.1. Game design
“Take Drunkard Home” (see Figure 2) is a third-person-view obstacle course game, where the player’s main goal is to steer a drunken man home as quickly as possible without letting him fall down. The character—a drunken soldier—walks along a dark street (the character’s forward
movement is automatic). Real-time head position information is processed and used for explicit
control of the character’s walking direction. Thus, the player can steer the character to the left or right with sideways head movements. Because the player’s character is drunk, he is
in constant danger of falling down. That is why the player must actively keep the drunken character
balanced with head movements; the player should move the head to the right when the character is
falling to the left and vice versa.
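As an illustration of this steering scheme, the following is a minimal sketch of how horizontal face position could be mapped to a steering value; the function, the normalization, and the dead-zone threshold are our illustrative assumptions, not the game’s actual implementation.

```python
def head_steering(face_x, center_x, frame_width, dead_zone=0.05):
    """Map the face's horizontal position to a steering value in [-1, 1].

    face_x      -- x-coordinate of the detected face center (pixels)
    center_x    -- neutral head position from calibration (pixels)
    frame_width -- width of the camera frame (pixels)
    dead_zone   -- normalized displacement ignored around the neutral
                   position, so small head movements do not steer
    """
    # Displacement from the neutral position, normalized to [-1, 1].
    offset = (face_x - center_x) / (frame_width / 2.0)
    if abs(offset) < dead_zone:
        return 0.0  # within the dead zone: keep walking straight
    # Clamp to the valid steering range.
    return max(-1.0, min(1.0, offset))
```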
--------------------------------
Insert Figure 2 about here
--------------------------------
Figure 2. Screen shots of the game: (top image) neutral facial expression, (bottom left image) smiling facial
expression, and (bottom right image) frowning facial expression.
The character’s initial level of intoxication is adjustable from the menu settings, varying from sober
(balanced, easy to steer) to very drunk (highly unbalanced, difficult to steer). The player should also
try to pick up as many items as possible along the way. Bonus points are represented as flowers, and
the player should pick them up in order to increase the total score of the gameplay. The other items
that influence the gameplay are beer cans, which increase the character’s walking speed at the cost
of also increasing his intoxication level, and hamburgers, which decrease the character’s
intoxication level and walking speed. Along the way, the player should avoid collisions with obstacles such as boxes, as well as with other characters such as stationary or moving cats or people. The
difficulty and length of the obstacle course are adjustable from the menu settings. A general
workflow diagram of the game is shown in Figure 3.
--------------------------------
Insert Figure 3 about here
--------------------------------
Figure 3. A general workflow diagram of the gameplay with key scenes (green), backend judgment (pink),
and user interaction (yellow).
Our particular interest in designing this game was to integrate facial expressions into the gameplay
in an intuitive, easy, and entertaining way. Although the general idea seems straightforward, the use
of voluntary facial expressions as another axis of control for emotion-guided activities is not easy in
the case of active games. In the “Take Drunkard Home” game, the player navigates through the
obstacle course, strategically thinks of positions and amounts of items to be picked up, and at the
same time, actively steers and balances the character with physical movements. Recognizing situations where expressions can be beneficial, and making the physical effort to produce them, imposes an additional cognitive load that can become demanding for the player.
Therefore, the game design should take special care to ease this task for the player by creating situations with a natural connection to the real-world use of facial expressions. We ended up with
a design that assigns two affective meanings, namely, positive and negative affects, to the player’s
expressions with respect to the other characters of the game (see Figure 4). Thus, a smiling
expression stops moving characters, which become friendly and do not collide with the player’s
character. A frowning expression frightens away stationary characters, which become scared by the
expression of disapproval or anger and give way to the player’s character.
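As a concrete reading of this design, the sketch below pairs the two expressions with the two character reactions described above; the data structure and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class GameCharacter:
    moving: bool            # True for moving characters, False for stationary
    friendly: bool = False
    scared: bool = False

def react_to_expression(expression, other):
    """Apply the player's expression to a nearby game character."""
    if expression == "smile" and other.moving:
        other.friendly = True   # smiling stops moving characters: no collision
        other.moving = False
    elif expression == "frown" and not other.moving:
        other.scared = True     # frowning frightens stationary characters away
```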
--------------------------------
Insert Figure 4 about here
--------------------------------
Figure 4. A diagram of key actions of the gameplay such as walking, socio-emotional interactions with other
game characters, avoiding obstacles, falling down, and picking up items.
We enhanced visual feedback about facial expression processing and usage by adding an emoticon at the bottom of the game space (see Figure 2). The emoticon remains inactive during the times
when the game does not expect affective input from the player. When the player approaches a
character that moves or stands in the way, the emoticon activates, indicating that the player can now input affective information to the gameplay by producing one of the two predefined
facial expressions. If the player shows neither of those two expressions, the game proceeds. If the
character falls down because of a stationary or moving character, the game gives the player three
seconds to prepare for the continuation of the game (the display shows a timer). Additionally, there
is a two-second period when the character does not collide with the obstacles but walks by them.
This prevents the character from colliding with the same obstacle repeatedly.
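These two timing rules can be captured in a small amount of game state, as in the sketch below; the class and method names are ours, and only the two durations come from the design described above.

```python
import time

class CollisionGrace:
    FALL_PAUSE = 3.0  # seconds of countdown shown after a fall
    NO_COLLIDE = 2.0  # seconds of collision immunity after resuming

    def __init__(self):
        self.fall_time = None
        self.resume_time = None

    def on_fall(self):
        self.fall_time = time.monotonic()

    def paused(self):
        """True while the post-fall countdown is still running."""
        if self.fall_time is None:
            return False
        if time.monotonic() - self.fall_time < self.FALL_PAUSE:
            return True
        # Countdown finished: resume play and start the no-collision window.
        self.resume_time = time.monotonic()
        self.fall_time = None
        return False

    def collisions_enabled(self):
        return (self.resume_time is None or
                time.monotonic() - self.resume_time >= self.NO_COLLIDE)
```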
3.2. Implementation
The game was created using the XNA⁴ Framework, with the Nuclex⁵ Framework’s input library for
the joystick input. The CV algorithms were developed separately and executed in another
application. The results of face processing were sent to the game application using a socket
connection. The participants were able to observe the face-processing output in a separate window to ensure that the face was tracked properly. If the face was lost, for example, when the player moved out of the camera’s field of view, the game paused until the face was found again.
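On the game side, such a loose coupling could look like the following sketch, assuming a line-based JSON protocol; the host, port, and message fields are illustrative assumptions rather than the format actually used.

```python
import json
import socket

def face_updates(host="127.0.0.1", port=5005):
    """Yield face-processing results streamed by the CV application.

    Each message is assumed to be one JSON line, e.g.
    {"found": true, "x": 412, "expression": "smile"}.
    """
    with socket.create_connection((host, port)) as conn:
        for line in conn.makefile("r"):
            yield json.loads(line)

# The game loop would then pause whenever no face is reported:
# for msg in face_updates():
#     if not msg["found"]:
#         pause_gameplay()          # until the face is found again
#     else:
#         steer_character(msg["x"], msg["expression"])
```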
The CV application used the location of the player’s face in the camera image, the so-called camera mouse, to steer the game’s character. Two different methods were utilized for face detection and tracking.
The Viola-Jones face detector [30,31] was used as a primary method for locating the player’s face
and tracking his or her sideways movements. It is a fairly fast and robust method of detecting faces,
but it frequently fails when the head is tilted or rotated toward near-profile views. The tracking-learning-detection (TLD) method [32], in contrast, is more reliable and can learn changes in facial appearance on the fly. However, in our implementation, TLD was not fast enough to support real-time gameplay. For this reason, TLD was used only in those cases when the Viola-Jones method
failed to find the face. The speed of face detection was ~30 frames per second (fps) for the Viola-
Jones method, dropping to 10–15 fps for the TLD method. A moving average was applied to the
five most recent face locations to remove the jitter from the detection output. Using the
anthropometrical measures of the human face [33], the detected region was further segmented into
the upper face (eye-forehead) and lower face (mouth) areas.
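The sketch below illustrates this two-stage scheme, with OpenCV’s Haar cascade standing in for the Viola-Jones detector and an abstract callable standing in for the TLD fallback; the five-location moving average follows the description above, while parameter values are illustrative.

```python
from collections import deque

import cv2  # ships a Viola-Jones style cascade classifier

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
recent_x = deque(maxlen=5)  # five most recent face locations

def tracked_face_x(frame, tld_fallback):
    """Return the smoothed horizontal face position, or None if no face."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, 1.2, 5)  # primary: Viola-Jones
    if len(faces) > 0:
        x, y, w, h = faces[0]
    else:
        # Fallback when Viola-Jones fails (tilted or near-profile head).
        found, box = tld_fallback(frame)
        if not found:
            return None  # no face at all: the game pauses
        x, y, w, h = box
    recent_x.append(x + w / 2.0)
    # Moving average over the five most recent locations removes jitter.
    return sum(recent_x) / len(recent_x)
```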
Facial expression recognition was performed according to the approach presented in [31,34], which
applies support vector machines for the histogram-based image classification with structural and
textural features. The upper face classifier was trained to differentiate between neutral and frowning
expressions, while the lower face classifier distinguished between neutral and smiling expressions.
The classifiers had been evaluated previously in real-time interaction scenarios. Based on the earlier findings reported for this facial expression classification approach [31,34], the expected misclassification rates of the system, namely, false positives and false negatives, were ~10%.
The speed of upper and lower classifiers working simultaneously was ~10 fps. Considering that an
average duration of facial expressions such as frowning is 500±200 ms [30], we assumed that the
speed of facial classification would support capturing the players’ facial expressions. The
expression classification module was implemented so that it did not slow down the speed of the face
detection/tracking module. We note that the game design is independent of the underlying CV methods; therefore, the selection of methods is not limited to those presented in this paper, and other methods may operate better in a given context.
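The following sketch shows the two-classifier arrangement described above, with scikit-learn’s SVM standing in for the classifier implementation and the histogram-based feature extraction elided; it illustrates the described setup rather than reproducing the authors’ code.

```python
from sklearn import svm

upper_clf = svm.SVC(kernel="linear")  # upper face: neutral (0) vs. frown (1)
lower_clf = svm.SVC(kernel="linear")  # lower face: neutral (0) vs. smile (1)

def train(upper_feats, upper_labels, lower_feats, lower_labels):
    """Fit both classifiers on features from the segmented face regions."""
    upper_clf.fit(upper_feats, upper_labels)
    lower_clf.fit(lower_feats, lower_labels)

def classify(upper_feat, lower_feat):
    """Return 'frown', 'smile', or 'neutral' for one segmented face."""
    if upper_clf.predict([upper_feat])[0] == 1:
        return "frown"
    if lower_clf.predict([lower_feat])[0] == 1:
        return "smile"
    return "neutral"
```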
The game can also be played using a conventional physical input device, for example, steering and
balancing the character by tilting the joystick to the left or right. The expressions are controlled with two buttons, one for smiling and one for frowning. If no button is pressed, the expression is neutral.
4. User study
4.1. Participants
Twenty participants (7 females, 13 males) took part in the study. The participants’ mean age was 33
years, ranging between 24 and 51. All the participants played video games to some extent; 3
participants played at least a couple of times a year, 2 at least once a month, 9 at least once a week,
and 6 on a daily basis. All had some experience using both a joystick and body movements for
controlling games (e.g. Nintendo Wii, Microsoft Kinect, or PlayStation Move).
4.2. Equipment
The experiments were conducted on a PC with a 64-bit Windows 7 Professional operating
system. The computer had an Intel® Core™2 Quad CPU and 4 GB of RAM. The display screen
was a 24-inch Samsung SyncMaster 2443 with a resolution of 1920×1080 pixels. For CV-based controlling, we used an off-the-shelf Creative Live!® CAM Sync web camera with a resolution of 800×600 pixels. The camera was installed on top of the monitor, approximately at eye level. The joystick used was a Logitech Attack 3.
4.3. Experimental procedure and measures
When a participant arrived at the laboratory, the experimenter gave an orientation to the laboratory and asked him or her to fill out a consent form and a background questionnaire. Then the participant
was told to sit on a chair in front of the computer screen. The experimenter introduced the idea of
the game and instructed the participant about his or her task to play the game with two controlling
methods and rate the game and game experiences.
The experiment was counterbalanced so that half of the participants played the game first using a
joystick and the other half did so using the CV-based control method. This study utilized a person-
dependent CV system, meaning that it needed a special calibration procedure to fine-tune the CV
methods for each new participant [31]. A calibration window was presented to the participants (the
window width equalled the screen width, and the window height equalled half of the screen height,
with the window positioned on the top half of the screen). The participants were instructed to
continuously point with their faces, one by one at the four corners of the calibration window. The
corners pointed at were highlighted in red, providing visual feedback to the participants. During the
calibration procedure that lasted 2–3 minutes overall, facial image data were collected and further
used to train the TLD face tracker and face classifiers. The training set for the TLD face tracker was
collected first and consisted of 50 images. The face classifiers were trained next, with image inputs
from the segmented upper and lower parts of the face. First, a training set of 50 upper and 50 lower
face images with a neutral expression was collected, followed by 25 images each of a smiling face
and a frowning face. The participants were asked to produce expressions of high intensity in order
to obtain a representative training set of images. Then the CV system was trained with the collected
image data for 1–2 minutes. Finally, the face detection and expression classification were verified to
ensure that the system operated well.
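The composition of the calibration data can be summarized as in the sketch below, which uses the image counts reported above; the collection loop and naming are hypothetical.

```python
# Image counts per training set, as reported above.
TRAINING_PLAN = [
    ("tld_tracker",   "neutral face, pointing at corners", 50),
    ("upper_neutral", "neutral expression",                50),
    ("lower_neutral", "neutral expression",                50),
    ("lower_smile",   "high-intensity smile",              25),
    ("upper_frown",   "high-intensity frown",              25),
]

def collect_training_data(capture_frame):
    """Collect the calibration images set by set (capture details elided)."""
    data = {}
    for name, prompt, count in TRAINING_PLAN:
        print(f"Please hold: {prompt}")
        data[name] = [capture_frame() for _ in range(count)]
    return data
```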
During the calibration, the participants were familiarized with the idea of the CV-based control
method. It was also explained that the CV technology had certain limitations and that the best
performance of the system could be achieved if the face remained close to the up-frontal position.
The general recommendation was to keep the torso active and make small rotations and tilts of the head
to control the application. The participants were instructed to check now and then whether their
faces were still in the camera’s field of view.
With both input methods, there were four playing fields: a practice field and three actual playing
fields. After the participant had played the game using one or the other method, he or she gave the
ratings. First, the participant rated his or her emotional experience with the control method using
three nine-point bipolar scales: pleasantness, arousability, and dominance. The scales varied from -4 (e.g. unpleasant) to +4 (e.g. pleasant). Zero (0) represented a neutral point (e.g. neither unpleasant nor pleasant) on all scales.
Then the user gave eight different ratings of the game and control methods using nine-point bipolar
scales that have previously been used in many studies investigating new interaction methods
[35,36,37]. The scales were as follows: general evaluation (i.e. varying from bad to good), speed
(i.e. from slow to fast), accuracy (i.e. from inaccurate to accurate), efficiency (i.e. from inefficient
to efficient), difficulty (i.e. from difficult to easy), naturalness (i.e. from unnatural to natural),
amusement (i.e. from boring to fun), and interestingness (i.e. from uninteresting to interesting). The
CV-based method was also rated with four additional scales: the pleasantness of smiling, the
pleasantness of frowning, the functionality of the smile, and the functionality of the frown. These
scales varied from -4 (e.g. bad experience) to +4 (e.g. good experience). Zero (0) represented a
neutral value (e.g. neither bad nor good) on all scales.
Finally, the participant filled out the GEQ [26], which covers seven dimensions of the player
experience: sensory and imaginative immersion, tension, competence, flow, negative affect,
positive affect, and challenge. The ending section of the GEQ had four questions about sickness
symptoms taken from the ITC-Sense of Presence Inventory (ITC-SOPI), where they formed a factor
labeled “negative effects.” The 5-point rating scales varied from 0 (not at all) to 4 (extremely). After filling out the questionnaire, the participant played the game once more using the other technique and gave the ratings again. The total duration of the experiment was approximately 50 minutes.
4.4. Data analysis
The game performance measures were compared between the control techniques using paired t-tests. The ratings for the control techniques were compared using Wilcoxon signed-rank tests.
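The two analyses could be carried out as in the following sketch, with SciPy standing in for the statistics package (which is not named here); joystick and cv are paired per-participant arrays of one measure under each control technique.

```python
from scipy import stats

def compare_conditions(joystick, cv):
    """Paired comparisons of one measure between the two control techniques."""
    t_stat, t_p = stats.ttest_rel(joystick, cv)  # paired t-test (performance)
    w_stat, w_p = stats.wilcoxon(joystick, cv)   # Wilcoxon signed-rank (ratings)
    return (t_stat, t_p), (w_stat, w_p)
```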
5. Results
5.1. Game performance measures
Figures 5 and 6 show the means and standard errors of the mean (SEMs) of the game duration, the
number of falls, and the number of picked flowers, beer cans, and hamburgers. The pairwise
comparisons between the control methods showed statistically significant differences: the participants got through the game more quickly, t(19) = -11.83, p < .001, managed to pick up more flowers, t(19) = 5.23, p < .001, and beer cans, t(19) = 4.01, p < .001, and their drunkard character fell less frequently, t(19) = -8.83, p < .001, when they played the game using the joystick than when they did so with CV. The difference in the number of picked hamburgers was not statistically significant.
--------------------------------
Insert Figure 5 about here
--------------------------------
Figure 5. Mean game durations (and SEMs) for both control techniques.
--------------------------------
Insert Figure 6 about here
--------------------------------
Figure 6. Mean numbers of falls, and picked flowers, beer cans, and hamburgers (and SEMs) for both
control techniques.
The pairwise comparisons between the first and third playing fields showed that practicing improved the game performance to some extent for both control methods. With CV, the participants finished the game more quickly, t(19) = 3.03, p < .01, and their drunkard character fell less frequently, t(19) = 3.56, p < .01, in the last field compared to the first one. In the first playing field, the drunkard fell down approximately every 17 seconds; in the third, approximately every 26 seconds. The differences in the numbers of picked flowers, beer cans, and hamburgers were not statistically significant. With the joystick, the participants’ drunkard character fell less frequently, t(19) = 2.81, p < .05, and they managed to pick up more flowers, t(19) = -3.49, p < .01, in the third playing field than in the first one. The differences in the game duration and the numbers of picked beer cans and hamburgers were not statistically significant.
5.2. Emotional ratings for the control techniques
Figure 7 shows the mean ratings and SEMs for experienced valence, arousal, and dominance in
both control techniques. Wilcoxon signed-rank tests showed statistically significant differences in the ratings for valence (Z = 3.04, p < .01) and dominance (Z = 3.91, p < .001), which were higher after the participants had played the game using the joystick than using the CV-based technique. The ratings for arousal also differed significantly, but in the opposite direction: they were higher after the participants had played the game using the CV-based technique than using the joystick (Z = 3.03, p < .01).
--------------------------------
Insert Figure 7 about here
--------------------------------
Figure 7. Mean ratings (and SEMs) for valence, arousal, and dominance in both control techniques.
5.3. Subjective evaluation of the game
Figures 8 and 9 show the mean ratings and SEMs for the evaluations of the game and control
methods. The Wilcoxon signed-rank test between the CV-based technique and the joystick showed no statistically significant difference in the general evaluation. The steering of the game
using the joystick was rated as significantly faster (Z = 3.03, p < .01), more accurate (Z = 3.03, p <
.01), more efficient (Z = 3.03, p < .01), easier (Z = 3.03, p < .01), and more natural (Z = 3.03, p <
.01) than that with the CV-based method. Playing using the CV-based method was rated as
significantly more entertaining (Z = 3.03, p < .01) and interesting (Z = 3.03, p < .01) than doing so
with the joystick.
Pairwise comparisons between the functionality of frowning and smiling, and between the pleasantness of frowning and smiling, showed no statistically significant differences.
--------------------------------
Insert Figure 8 about here
--------------------------------
Figure 8. Mean subjective ratings (and SEMs) for both control techniques.
--------------------------------
Insert Figure 9 about here
--------------------------------
Figure 9. Mean ratings (and SEMs) for the game experience in both control techniques.
5.4. Game experience
Figure 10 shows the mean ratings and SEMs for the seven factors of the GEQ. The ratings for
immersion (Z = 2.05, p < .05), tension (Z = 2.40, p < .05), and challenge (Z = 3.64, p < .001) were
significantly higher after playing the game with the CV-based method than with the joystick. The
ratings for competence (Z = 3.36, p < .001) and negative affect (Z = 1.98, p < .05) were
significantly higher after playing with the joystick than with CV.
The ratings for flow and positive affect showed no statistically significant differences between the control methods.
--------------------------------
Insert Figure 10 about here
--------------------------------
Figure 10. Mean ratings (and SEMs) for the game experience in both control techniques.
5.5. Physical tiredness and sickness symptoms
No significant differences in sickness symptoms (Cronbach’s α = 0.76) were found between playing the game with the joystick and playing it with CV.
6. Discussion
The present results provided evidence both for and against using body- and face-based interactions.
On the one hand, the findings showed that CV enhanced gaming and entertainment experiences in
many ways, compared with the joystick. Playing using CV was rated as more entertaining and
interesting than gaming with the joystick. The participants also experienced playing with CV as
more challenging and immersive than doing so with the joystick. On the other hand, the joystick
was scored as more functional than the CV technique. The participants rated joystick steering as
more accurate, efficient, natural, faster, and easier than CV steering. The lower functionality of the CV method probably also caused the participants to regard themselves as less competent and to feel more tension after gaming with CV than after doing so with the joystick.
The scores for the affective space indicated higher ratings for valence and dominance with joystick
controlling than with the CV-based counterpart, whereas CV controlling was ranked more arousing
than the joystick-controlled playing. The probable reason for these differences is that the joystick
control function was experienced as better and easier than that of CV; the former also evoked more
pleasant and less arousing experiences. On the other hand, the ratings for arousal were consistent
with Isbister’s findings [38] that movement-based steering led to higher scores for arousal than
those obtained using key commands. Furthermore, the results for the affective ratings are interesting
in the light of the recent findings of Poels et al. [23]. They studied how players’ emotions during
gameplay predict playing behavior at a later stage. Their research revealed that pleasure during the
initial gameplay affected short-term playing time and game preferences, while experienced arousal
predicted long-term game preferences best. Thus, experienced arousal seems to be an important
factor for the long-term success of a video game. The level of experienced arousal may also be associated with the amount of body movement while playing, which in turn could better motivate players for future gaming. Previous research has shown that body movement controlling can
lead to more enjoyable and engaging experiences, compared with traditional controllers [39,40].
Clearly, our CV-based application required more movements than the use of the joystick.
The ratings for valence somewhat contradict those for positive and negative affect in the GEQ.
Although joystick steering evoked higher ratings for valence than CV steering did, the scores for
positive affect were not significantly different between these control methods. Furthermore, the
ratings for negative affect were even higher after joystick controlling than after CV controlling.
When the participants gave their scores for valence, arousal, and dominance, they were instructed to
rate their experiences with the control method. In contrast, the negative and positive affect
dimensions in the GEQ measured comprehensive experience during the game (e.g. “I felt bored”).
Thus, although joystick steering was experienced as more pleasant than CV steering, the overall impression of the gameplay with the joystick was more boring and tiresome than that with CV.
The investigation of players’ game experiences, along with the more functional aspects of the game,
is essential for evaluating the potential success of a video game. Sweetser and Wyeth [41, p. 1]
suggested that “player enjoyment is the single most important goal for computer games.” Moreover,
Sherry et al. [42] proposed that challenge and arousal are among the main reasons why people play
video games. Thus, because the participants experienced game playing as markedly more entertaining, arousing, and challenging when they used their own head movements and facial expressions rather than the joystick, CV seems to have considerable potential as a control method.
Our results suggest that CV-based controlling could be an appealing enhancement to traditional
control devices in the game environment. However, on the basis of this study, it is not possible to
predict which of the gameplay conditions people would prefer over the long term. It is possible that
the novelty of the CV method influenced the ratings for interestingness and entertainment, for
example. Alternatively, CV-based controlling could possibly be more rewarding after the players
have learned to use it better. In a future study, it would be interesting to investigate long-term
playing behavior to discover whether the ratings change over time.
As described above, the participants rated the functionality (or usability) of the joystick as better
than that of CV. Joystick steering was scored as faster and easier, as well as more accurate and
efficient than CV steering. Additionally, although the mean ratings for the functionality of facial
expressions as an input method were on the positive side of the scale, these numbers were quite
low. These combined results indicate some problems with the detection of head movements and
facial expressions using the CV method. By improving the robustness and speed of the CV method,
steering the game with head movements and facial expressions could result in more positive ratings
for game functionality and players’ subjective experiences. In future studies, we will extend the scope of facial behavior so that other expressions will be tested for game control.
In the present study, the participants controlled the game directly and consciously with head
movements and facial expressions. It is also possible to apply other kinds of bodily information as
an interaction method in games. Previous studies have provided evidence that direct or explicit
biofeedback can enhance the game experience. Nacke et al. [43] found that people prefer direct
physiological control over indirect control. In the study of Kuikkaniemi et al. [27], implicit
biofeedback had no effects on player experience, whereas the measures that a player could
manipulate explicitly heightened the feelings of immersion and enjoyment. Furthermore, Dekker
and Champion [12] successfully used players’ biofeedback information to increase the feelings of
terror in a horror game. Thus, in addition to facial expressions, using other kinds of consciously
produced physiological information could enhance the game experience. This aspect will be
considered in future work.
In conclusion, even though the functionality of CV was not experienced as effective as that of the joystick, the new kind of control method evoked significantly higher ratings for entertainment, interestingness, challenge, and immersion, for example. Thus, it can be argued that the CV-based technique enhanced the playing experience compared with the traditional joystick.
In the future, CV could provide a promising, hands-free method for controlling games.
Acknowledgments
This research was funded by the Academy of Finland (project 129354). The authors would like to
thank the following students of the University of Tampere: Tek Prasad Gautam, Reza Ahliaraghi,
Henrik Lehtinen, Anju Thapa, Mirjan Merruko, and especially Yanzhao Wen (who worked as a
summer intern in the project), for implementing face-tracking algorithms, and Anu Leppälampi for
serving as the experimenter.
Footnotes
¹ http://www.nintendo.com/wii
² http://www.microsoft.com/en-us/kinectforwindows/
³ http://us.playstation.com/ps2/accessories/eyetoy-usb-camera-ps2.html
⁴ http://www.microsoft.com/en-us/download/details.aspx?id=23714
⁵ http://nuclexframework.codeplex.com/
References
[1] N. Ravaja, M. Salminen, J. Holopainen, T. Saari, J. Laarni, A. Järvinen, Emotional response
patterns and sense of presence during video games: Potential criterion variables for game design, in:
Proceedings of the third Nordic conference on Human-computer interaction, ACM, New York,
2004, pp. 339-347.
[2] P. Skalski, R. Tamborini, A. Shelton, M. Buncher, P. Lindmark, Mapping the road to fun:
Natural video game controllers, presence, and game enjoyment, New Media & Society 13 (2)
(2011) 224-242.
[3] Z. Zeng, M. Pantic, G.I. Roisman, T.S. Huang, A survey of affect recognition methods: audio,
visual, and spontaneous expressions, IEEE Transactions on Pattern Analysis and Machine
Intelligence 31 (1) (2009) 39-58.
[4] C. Manresa-Yee, P. Ponsa, J. Varona, F.J. Perales, User experience to improve the usability of a
vision-based interface, Interacting with Computers 22 (6) (2010) 594-605.
[5] M. Porta, Vision-based user interfaces: Methods and applications, International Journal of
Human-Computer Studies 57 (1) (2002) 27-73.
[6] M. Yang, D. Kriegman, N. Ahuja, Detecting faces in images: A survey, IEEE Transactions on
Pattern Analysis and Machine Intelligence 24 (1) (2002) 34-58.
[7] J.M. Saragih, S. Lucey, J.F. Cohn, Real-time avatar animation from a single image, in: IEEE
International Conference on Automatic Face and Gesture Recognition (FG’11), 2011, pp. 117-124.
[8] S. Wang, X. Xiong, Y. Xu, C. Wang, W. Zhang, X. Dai, D. Zhang, Face-tracking as an
augmented input in video games: enhancing presence, role-playing and control, in: Proceedings of
the SIGCHI conference on Human Factors in computing systems, ACM, New York, 2006, pp.
1097-1106.
[9] T. Sko, H.J. Gardner, Head tracking in First-Person games: interaction using a web-camera, in:
Proceedings of the 12th IFIP TC 13 International Conference on Human-Computer Interaction
(INTERACT '09): Part I, LNCS 5726, 2009, pp. 342-355.
[10] Y. Gizatdinova, V. Surakka, S. Haniff, E. Mäkinen, R. Raisamo, J. Iso-Tuisku, A. Sand, Emerging application areas and challenges of automatic face analysis, Continuum: Journal of Media & Cultural Studies 27 (4) (2013) 572-584.
[11] K. Gilleade, A. Dix, J. Allanson, Affective videogames and modes of affective gaming: assist
me, challenge me, emote me, in: Proceedings of the Digital Games Research Association DiGRA
2005 Conference: Changing Views – Worlds in Play, 2005.
[12] A. Dekker, E. Champion, Please biofeed the zombies: enhancing the gameplay and display of a
horror game using biofeedback, in: Proceedings of the Digital Games Research Association DiGRA
2007 Conference: Situated Play, 2007, pp. 550-558.
[13] T. Sko, H. Gardner, M. Martin, Studying a head tracking technique for first-person-shooter
games in a home setting, in: Proceedings of the International Conference on Human-Computer
Interaction (INTERACT 2013): Part IV, LNCS 8120, 2013, pp. 246-263.
[14] D.O. Gorodnichy, G. Roth, Nouse ‘use your nose as a mouse’ perceptual vision technology for
hands-free games and interfaces, Image and Vision Computing 22 (12) (2004) 931-942.
[15] M.L. Knapp, Nonverbal communication in human interaction, Holt, Rinehart and Winston,
New York, 1978.
[16] V. Surakka, J.K. Hietanen, Facial and emotional reactions to Duchenne and non-Duchenne
smiles, International Journal of Psychophysiology 29 (1998) 23-33.
[17] J.T. Cacioppo, R.E. Petty, K.J. Morris, Semantic, evaluative, and self-referent processing:
memory, cognitive effort, and somatovisceral activity, Psychophysiology 22 (4) (1985) 371-384.
[18] P. Ekman, An argument for basic emotions, Cognition and Emotion 6 (3/4) (1992) 169-200.
[19] J.K. Hietanen, V. Surakka, I. Linnankoski, Facial electromyographic responses to vocal affect
expressions, Psychophysiology 35 (1998) 530-536.
[20] M. Obaid, C. Han, M. Billinghurst, “Feed the Fish”: an affect aware game, in: Proceedings of
the 5th Australasian Conference on Interactive Entertainment, article No. 6, ACM, New York,
2008.
[21] R. Bernhaupt, A. Boldt, T. Mirlacher, D. Wilfinger, M. Tscheligi, Using emotion in games:
emotional flowers, in: Proceedings of the International Conference on Advances in Computer
Entertainment Technology 2007, ACM, New York, 2007, pp. 41-48.
[22] M. Lankes, S. Riegler, A. Weiss, T. Mirlacher, M. Pirker, M. Tscheligi, Facial expressions as
game input with different emotional feedback conditions, in: Proceedings of the International
Conference on Advances in Computer Entertainment Technology 2008, ACM, New York, 2008,
pp. 253-256.
[23] K. Poels, W. van den Hoogen, W. IJsselsteijn, Y. de Kort, Pleasure to play, arousal to stay: the effect of player emotions on digital game preferences and playing time, Cyberpsychology, Behavior, and Social Networking 15 (1) (2012) 1-6.
[24] D.S. Kempf, Attitude formation from product trial: Distinct roles of cognition and affect for
hedonic and functional products, Psychology & Marketing 16 (1) (1999) 35-50.
[25] M.M. Bradley, P.J. Lang, Measuring emotion: The self-assessment manikin and the semantic differential, Journal of Behavior Therapy and Experimental Psychiatry 25 (1) (1994) 49-59.
[26] W. IJsselsteijn, W. van den Hoogen, C. Klimmt, Y. de Kort, C. Lindley, K. Mathiak, K. Poels,
N. Ravaja, M. Turpeinen, P. Vorderer, Measuring the experience of digital game enjoyment, in:
Proceedings of Measuring Behavior, 2008, pp. 26-29.
[27] K. Kuikkaniemi, T. Laitinen, M. Turpeinen, T. Saari, I. Kosunen, The influence of implicit and
explicit biofeedback in first-person shooter game, in: Proceedings of the SIGCHI Conference on
Human Factors in Computing Systems, ACM, New York, 2010, pp. 859-868.
[28] L. Nacke, C.A. Lindley, Flow and immersion in first-person shooters: Measuring the player’s
gameplay experience, in: Proceedings of the 2008 Conference on Future Play: Research, Play,
Share, ACM, New York, 2008, pp. 81-88.
[29] C.-H. Chang, W.-W. Pan, L.Y. Tseng, T.A. Stoffregen, Postural activity and motion sickness
during video game play in children and adults, Experimental Brain Research 217 (2) (2012) 299-
309.
[30] P. Viola, M. Jones, Robust real-time face detection, International Journal of Computer Vision
57 (2) (2004) 137-154.
[31] Y. Gizatdinova, O. Špakov, V. Surakka, Face typing: Visual gesture-based perceptual interface
for typing with a scrollable virtual keyboard, in: IEEE Workshop on the Applications of Computer
Vision (WACV’12), IEEE Computer Society, 2012, pp. 81-87.
[32] Z. Kalal, K. Mikolajczyk, J. Matas, Tracking-learning-detection, IEEE Transactions on Pattern Analysis and Machine Intelligence 34 (7) (2012) 1409-1422.
[33] L. Farkas, Anthropometry of the Head and Face, second ed., Raven, New York, 1994.
[34] Y. Gizatdinova, O. Špakov, V. Surakka, Comparison of video-based pointing and selection
techniques for hands-free text entry, in: Proceedings of International Working Conference on
Advanced Visual Interfaces (AVI’12), ACM, New York, 2012, pp. 132-139.
[35] V. Surakka, M. Illi, P. Isokoski, Gazing and frowning as a new human-computer interaction
technique, ACM Transactions on Applied Perception 1 (1) (2004) 40-56.
[36] V. Surakka, P. Isokoski, M. Illi, K. Salminen, Is it better to gaze and frown or gaze and smile when controlling user interfaces?, in: Proceedings of HCI International 2005.
[37] O. Tuisku, V. Surakka, T. Vanhala, V. Rantanen, J. Lekkala, Wireless Face Interface: Using
voluntary gaze direction and facial muscle activations for human-computer interaction, Interacting
with Computers 24 (1) (2012) 1-9.
[38] K. Isbister, Emotion and motion: Games as inspiration for shaping the future interface,
Interactions 18 (5) (2011) 24-27.
[39] N. Bianchi-Berthouze, W.W. Kim, D. Patel, Does body movement engage you more in digital
game play? And why?, in: Proceedings of the International Conference of Affective Computing and
Intelligent Interaction (ACII 2007), LNCS 4738, 2007, pp. 102-113.
[40] L.F. Teófilo, P.A. Nogueira, P.B. Silva, GEMINI: A generic multi-modal natural interface
framework for videogames, in: Advances in Information Systems and Technologies
(WorldCIST’13), 2013, pp. 873-884.
[41] P. Sweetser, P. Wyeth, GameFlow: A model for evaluating player enjoyment in games, ACM
Computers in Entertainment 3 (3) (2005) 1-24.
[42] J.L. Sherry, K. Lukas, B. Greenberg, K. Lachlan, Video game uses and gratifications as predictors of use and game preference, in: P. Vorderer, J. Bryant (Eds.), Playing Video Games: Motives, Responses, and Consequences, Lawrence Erlbaum Associates, Mahwah, NJ, 2006, pp. 213-224.
[43] L.E. Nacke, M. Kalyn, C. Lough, R.L. Mandryk, Biofeedback game design: using direct and
indirect physiological control to enhance game interaction, in: Proceedings of the SIGCHI
Conference on Human Factors in Computing Systems, ACM, New York, 2011, pp. 103-112.
Highlights
• We present a game that utilizes information on face position and facial expressions
• A user study was conducted to evaluate the game
• The utilization of face analysis enhanced players’ game experiences in many ways
• Facial information detected by computer vision offers a promising way for game control