SENSING PHYSIOLOGICAL AROUSAL
AND VISUAL ATTENTION DURING
USER INTERACTION
A thesis submitted to the University of Manchester
for the degree of Doctor of Philosophy
in the Faculty of Science and Engineering
2019
Oludamilare Matthews
School of Computer Science
Contents

Abstract
Declaration
Copyright Statement
Acknowledgements

1 Introduction
1.1 Problem statement
1.2 Research questions
1.3 Methodology
1.3.1 Research Paradigm
1.3.2 Rationale for our research methodology
1.4 Contributions and research outputs
1.4.1 Artefacts
1.4.2 Publications
1.5 Research statement
1.6 Thesis structure

2 Background and related work
2.1 Affective computing
2.2 Progress in the lab, limited progress under practical settings
2.3 Affect detection
2.4 Selecting an affect detection mechanism
2.5 Review of affect detection mechanisms
2.5.1 Query Construction
2.5.2 Exclusion criteria
2.5.3 Inclusion Criteria
2.5.4 Quality assessment
2.5.5 Results
2.5.6 Synthesis of related work
2.6 Rationale for pupillary response
2.7 Representing affect
2.8 Applications of affective computing
2.9 Summary

3 Development of AFA algorithm
3.1 Pupillometry
3.2 Exploring pupillary response data
3.3 Description of pupil dilation
3.4 Existing approaches to the analysis of pupil data
3.5 Iterative development of AFA algorithm
3.5.1 Study 1
3.5.2 Study 2
3.6 Implementation
3.7 Visualising the output of AFA algorithm
3.8 Conclusion

4 Evaluating AFA algorithm
4.1 Rationale and motivation
4.2 Sensing emotionally evoked arousal
4.2.1 Experiment
4.2.2 Limitations from our analysis of emotional stimuli
4.2.3 Summary
4.3 Sensing cognition-induced arousal
4.3.1 Experiment
4.3.2 Result
4.3.3 Lessons learnt from the analysis of cognitive stimuli
4.3.4 Summary

5 Sensing frustration-induced arousal on the Web
5.1 Why frustration?
5.2 Related works on sensing frustration in interactive systems
5.3 Research contributions through this study
5.4 Experiment
5.4.1 Participants
5.4.2 Materials and procedure
5.4.3 Method
5.4.4 Analysis
5.5 Results
5.6 Discussion
5.6.1 Limitations of this study
5.7 Conclusion

6 Arousal detection and Scanpath analysis
6.1 Motivation behind our methodology
6.2 Pilot study
6.3 Formation of the methodology
6.3.1 Autism and the web
6.4 Research questions and contributions through this study
6.5 Experiment
6.5.1 Participants
6.5.2 Apparatus
6.5.3 Materials and Method
6.5.4 Analysis
6.5.5 Visualizing our visual behaviour model
6.6 Results
6.6.1 Analysis of the Web pages by their AOIs
6.7 Discussion
6.8 Limitations of the AFA algorithm-STA methodology
6.9 Conclusion

7 Discussion and Conclusion
7.1 Reflection
7.2 Design and methodological implications
7.3 Limitations
7.4 Future work
7.4.1 User evaluation of our visualisation toolkit
7.4.2 The impact of light on AFA algorithm
7.4.3 Combining AFA algorithm with other affect detection mechanisms
7.4.4 Optimizing AFA algorithm for real-time arousal sensing
7.4.5 Extending AFA algorithm for adaptive systems
7.4.6 Utilizing AFA algorithm on mobile devices
7.5 Conclusion

Bibliography

A Sensing Emotionally Evoked Arousal
A.1 Participant information sheet
A.2 Consent form
A.3 Post-study questionnaire

B Sensing Cognitively Induced Arousal
B.1 Participant information sheet
B.2 Consent form

C Sensing Frustration on The Web
C.1 Participant information sheet
C.2 Consent form
C.3 Post-study questionnaire

Word count xxxxx
List of Tables

2.1 Comparison of affect detection mechanisms
2.2 Affect detection mechanisms and their accuracies
2.3 Related work in theoretical findings, applications, methods or sensors used
3.1 Comparison of eye-tracking vendors
3.2 Statistical description of the pupil diameter
3.3 Matrix showing the Pearson's correlation of statistical features (L - left, R - right, W - window, std - standard deviation)
3.4 Matrix comparing our predictor variables (Experience, Accuracy, Time spent, Difficulty) and the total stress score with our outcome variable (No. of arousal points - peaks)
3.5 Expected arousal level (M) vs. computed arousal level (Output)
4.1 IAPS stimuli showing the description, arousal, dominance and valence values of each stimulus
4.2 Correlation between the mean IAPS arousal rating, self-reported rating and the algorithm's arousal level (scaled between 1 and 5)
4.3 Stimuli and expected arousal levels
5.1 Experimental tasks
5.2 Results of Wilcoxon test comparing arousal between each task, with Bonferroni correction α = 0.008
5.3 Results of Wilcoxon test comparing the mode of interaction within each task, with Bonferroni correction α = 0.0125
6.1 Cumulative arousal per AOI on the Apple home page
6.2 Results of Mann-Whitney U test comparing arousal between each group (autistic and neurotypical), with Bonferroni correction α = 0.00625
6.3 Scan path sequence (Seq), participants (n) with change in arousal level per AOI, mean arousal (M) and standard deviation (SD) for the control group and the autistic group (ASD). NB: gaps in the table exist where fewer elements make up the trending scanpath over a website for that group
List of Figures

1.1 The process flow of affective computing in an adaptive system
2.1 Psychological states, by duration [Bakhtiyari et al., 2014]
2.2 Number of results retrieved from Google Scholar, by year
2.3 EEG device
2.4 Facial expressions for discrete emotions
2.5 2D representation of emotion
2.6 Plutchik's emotion wheel and cone representation of Plutchik's emotions
3.1 Setup of eye tracker
3.2 Areas of interest overlaid on a 12-lead ECG
3.3 Distribution plot of the left pupil diameter (mm) of correct participants
3.4 Plot of pupillary response (mm) against time (ms)
3.5 Plot of pupillary response (mm) against time (ms) after applying a smoothing function with window size (d) = 1, 3, 5, 10
3.6 Heatmap illustrating our predictor variables (Experience, Accuracy, Time spent, Difficulty) and the total stress score with our outcome variable (No. of arousal points - peaks)
3.7 Areas of interest overlaid on Protege's UI
3.8 Top - input event, Bottom - probability of change point
3.9 Execution flow of our arousal detection approach
3.10 Graph of arousal level against time (s)
3.11 A comparison of the raw pupil dilation extracted from the eye tracker with the processed arousal signal, after converting to arousal levels
3.12 Arousal explorer tool
3.13 Arousal toolkit
3.14 Modes of visualisation in our arousal toolkit
4.1 Gaze behaviour across all 12 stimuli
4.2 Stimuli against the algorithm's arousal rating, participants' reported feedback, and the IAPS arousal ratings
4.3 Correlation between the accuracy of the algorithm and the minimum task duration per participant
4.4 Correlation between the accuracy of the algorithm and the maximum tasks allowed per participant
4.5 Stimuli against the algorithm's arousal rating, participants' reported feedback, and the IAPS arousal ratings
4.6 Stimuli for the Stroop effect
4.7 Heatmap showing the aggregated fixation on AOIs of each stimulus
4.8 Bar chart showing the mean total fixation count for congruent and incongruent object naming across all stimuli
4.9 Bar chart showing the mean total fixation duration (s) for congruent and incongruent object naming across all stimuli
4.10 Box plot showing the data distribution of the output of the algorithm for each stimulus
4.11 Violin plot showing the data distribution of the output of the algorithm for each stimulus
5.1 Disruption to tasks to elicit frustration: T1. Time-out experienced when booking a trip; T2. Mouse location altered when selecting weather information; T3. Operating system error during Google search; T4. Multiple pop-ups interrupting Wikipedia content lookup
5.2 Violin plot of the data distribution of the level of arousal in all tasks for both groups (disruptive and normal)
5.3 Bar chart with error bars (standard error of the mean) showing the tasks (both modes of interaction combined) vs level of arousal
5.4 Bar chart with error bars (standard error of the mean) showing all tasks vs level of arousal
5.5 Number of fixations vs level of arousal (all observations, n = 75)
6.1 Apple home page segmented into AOIs
6.3 Levels of arousal from the autistic and neurotypical groups per AOI for the WhatsApp Web page
6.4 Levels of arousal from the autistic and neurotypical groups per AOI for the Amazon Web page
6.5 Levels of arousal from the autistic and neurotypical groups per AOI for the WordPress Web page
6.6 Levels of arousal from the autistic and neurotypical groups per AOI for the Netflix Web page
6.7 Levels of arousal from the autistic and neurotypical groups per AOI for the BBC Web page
6.8 Levels of arousal from the autistic and neurotypical groups per AOI for the YouTube Web page
6.9 Levels of arousal from the autistic and neurotypical groups per AOI for the Adobe Web page
6.10 Levels of arousal from the autistic and neurotypical groups per AOI for the Outlook Web page
6.11 Levels of arousal from each group's trending scan path, overlaid on the AOIs of each Website
The University of Manchester
Oludamilare Matthews
Doctor of Philosophy
SENSING PHYSIOLOGICAL AROUSAL AND VISUAL ATTENTION
DURING USER INTERACTION
October 30, 2019
Arousal is a psychophysiological state that is characterised by increased attention and alertness. Arousal detection is paramount during user interaction because arousal influences perception, cognition and performance, all of which have significant impacts on user experience (UX). Self-reported means of measuring arousal are manual and prone to bias. Behavioural modes of sensing arousal, such as the analysis of voice prosody, keystroke dynamics and body gestures, yield inconsistent results when applied in different applications. Physiological sensors for detecting arousal, such as electroencephalograms and galvanic skin response, are sensitive to confounding factors like motion and temperature. Recent studies have leveraged multimodal arousal detection to improve detection accuracy. However, due to the cost of purchasing additional sensors, the skills needed to set them up and the availability of all the sensors, multimodal arousal detection has limited potential for widespread use. These modes of arousal detection also provide limited visual context about users' measure of arousal. We use eye trackers to collect pupillary response and gaze behaviour data. The analysis of pupillary response is used to sense changes in arousal, while gaze detection reveals the visual context, i.e., the user's focal attention during moments of increased arousal. To improve generalisability, our approach was developed and evaluated using multiple eye-tracking datasets containing known causes of arousal. Despite the limitation of our approach (i.e., sensitivity to light changes), results suggest that it can be used to sense several forms of arousal (cognitively induced, emotionally evoked and frustration-induced). Furthermore, our unimodal approach detects users' focal attention during moments of increased arousal. As web cameras with eye-tracking abilities become more accessible, there is increased potential for the widespread use of our technique in the wild. Unobtrusive arousal sensing opens up opportunities for UX researchers, UI designers and software developers in adaptive computing, affective gaming, intelligent tutoring systems, user modelling and recommender systems.
Declaration
No portion of the work referred to in the thesis has been
submitted in support of an application for another degree
or qualification of this or any other university or other
institute of learning.
Copyright Statement
i. The author of this thesis (including any appendices and/or schedules to this thesis) owns certain copyright or related rights in it (the "Copyright") and s/he has given The University of Manchester certain rights to use such Copyright, including for administrative purposes.

ii. Copies of this thesis, either in full or in extracts and whether in hard or electronic copy, may be made only in accordance with the Copyright, Designs and Patents Act 1988 (as amended) and regulations issued under it or, where appropriate, in accordance with licensing agreements which the University has from time to time. This page must form part of any such copies made.

iii. The ownership of certain Copyright, patents, designs, trade marks and other intellectual property (the "Intellectual Property") and any reproductions of copyright works in the thesis, for example graphs and tables ("Reproductions"), which may be described in this thesis, may not be owned by the author and may be owned by third parties. Such Intellectual Property and Reproductions cannot and must not be made available for use without the prior written permission of the owner(s) of the relevant Intellectual Property and/or Reproductions.

iv. Further information on the conditions under which disclosure, publication and commercialisation of this thesis, the Copyright and any Intellectual Property and/or Reproductions described in it may take place is available in the University IP Policy (see http://documents.manchester.ac.uk/DocuInfo.aspx?DocID=487), in any relevant Thesis restriction declarations deposited in the University Library, The University Library's regulations (see http://www.manchester.ac.uk/library/aboutus/regulations) and in The University's Policy on Presentation of Theses.
Acknowledgements
First and foremost, I would like to thank God for seeing me through this programme. I appreciate Dr Simon Harper, my supervisor, and Dr Markel Vigo, my co-supervisor, who gave me their time, shared their vast knowledge, believed in me, and guided me through this great endeavour. I am privileged to be led by both of you towards this great achievement.

I thank the members of the Interaction Analysis and Modelling (IAM) lab for listening to my presentations, providing constructive criticism, sharing ideas, and making the lab a conducive place to study. I would especially like to thank Alan Davies for offering brilliant ideas and providing technical support for my work. Thank you to Julio, Julia and Rob for being co-reviewers for my systematic reviews. I appreciate Aitor for always being resourceful and Manuelle for helping me out with statistics. I would like to thank Dr Sarah Clinch, Dr Jorge Gonçalves and Zhanna for facilitating my research visit to Melbourne, Australia, where I was able to collaborate with and gain exposure to fellow researchers in the domain. To my ever-loving wife, Bubu, thank you for supporting me and being understanding, especially at the times I had to work late at night; I am deeply indebted to you. I especially thank my family, Dr (Gen.) Olusegun Matthew, Mrs Gloria Matthew, Tomi and Luwa, for their encouragement and backing all through my study. I would like to thank the family of Engr. and Mrs Cole for their love, prayers, gifts and visits during the period of my study. I also thank the Puka-Chaps family, as Manchester would not have felt the same without you. I wish to thank the leadership and members of The Grateful Church (TGC) for their prayers, spiritual guidance and communal fellowship.

Finally, I would like to thank the National Information Technology Development Agency (NITDA) for funding my PhD programme.
Chapter 1
Introduction
The quality of user interaction is often measured using metrics such as error rates, task completion times, dwell time, fixations and saccades [Mullins and Treu, 1991]. These metrics do not fully account for the subjective experience of users, i.e., their emotions [Scholtz, 2006]. Affective computing is a domain in HCI and psychology concerned with detecting and understanding users' emotional states in order to improve the quality of their interaction [Picard, 1997]. One application of affective computing is adaptive systems [Dalvand and Kazemifard, 2012]. In adaptive systems, the content, structure or layout of a website is altered based on the user's affective response, to induce a more desirable emotional state [Sommer et al., 2014]. For adaptive computing to be effective, certain events need to initiate the adaptive engine [Sommer et al., 2014]. Consider, for instance, an adaptive system based on physiological sensors: when the arousal level of a user reaches a certain threshold while fixating on text with a small font, the font size could be magnified to ease the user's discomfort [Liu et al., 2014]. Adaptive systems in intelligent tutoring could be utilised in situations where difficult questions cause users to experience frustration-induced arousal [Merrill et al., 1992]. Here, an adaptive difficulty system could be deployed by setting triggers so that the adaptive engine fetches a less difficult question and the user does not drop out [Liu et al., 2009].

Figure 1.1: The process flow of affective computing in an adaptive system

Figure 1.1 illustrates the process flow of an affect-enabled adaptive system. The process begins with affect detection, where emotions are sensed; then, the context in which the user experiences the emotion is identified. Based on this context, an intervention is carried out to transform the user's emotional state into a more desirable one. Most adaptive systems are cyclic because emotions are sensed again, to evaluate the impact of the previous intervention [Dalvand and Kazemifard, 2012]. Since the process often begins and ends with affect detection, detecting affect accurately and in an ecologically valid manner is of great value. Affect detection could be used to prevent undesired outcomes such as fatal human errors, cognitive overload, disinterested users and extreme emotional states (e.g., boredom and frustration) [Liu and Joines, 2012]. The scope of our research is therefore limited to sensing arousal during user interaction.
Emotions can be represented in several forms. Albert Mehrabian proposed the Pleasure-Arousal-Dominance (PAD) psychological model of emotions. In this model, arousal represents the intensity of the emotion, pleasure refers to the hedonic (valence) nature of the emotion, and dominance indicates whether it is a dominant emotion like anger or a submissive one like fear [Mehrabian, 1996]. Arousal can be used as a proxy to sense frustration, stress, anxiety, alertness, attention and interest, all of which are important factors during user interaction [Russell and Pratt, 1980]. We focused our affect detection on the arousal component of emotions due to its impact on the quality of user experience [Van Schaik and Ling, 2008].
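To make the PAD representation concrete, the sketch below shows how an emotion could be held as a simple pleasure-arousal-dominance triple in Python. The coordinate values are illustrative assumptions chosen only to reflect the sign conventions described above (both emotions unpleasant and highly arousing, anger dominant, fear submissive); they are not Mehrabian's published values.

    from dataclasses import dataclass

    @dataclass
    class PAD:
        pleasure: float   # hedonic tone (valence): -1 (unpleasant) to +1 (pleasant)
        arousal: float    # intensity: -1 (calm) to +1 (excited)
        dominance: float  # -1 (submissive) to +1 (dominant)

    # Illustrative coordinates only: anger and fear share negative pleasure and
    # high arousal, but differ in dominance [Mehrabian, 1996].
    anger = PAD(pleasure=-0.5, arousal=0.6, dominance=0.3)
    fear = PAD(pleasure=-0.6, arousal=0.6, dominance=-0.4)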
Psychlopedia, a web-based encyclopaedia of psychology, defines physiological arousal as "neural, hormonal, visceral, and muscular changes that happen in the body when it is emotionally stimulated" [Psychlopedia, 2018]. Physiological arousal is a state of alertness accompanied by increased perception [Pengnate, 2016]. The sympathetic nervous system, responsible for fight or flight, is activated upon an increase in arousal [Bradley et al., 2017]. Therefore, any measure from physiological indicators such as sweat on the skin, respiration, blood circulation and pupil size can serve as a proxy for measuring arousal [Torres-Valencia et al., 2014]. However, there are many challenges in detecting arousal during human interaction.
1.1 Problem statement
Arousal detection has been researched extensively in the affective computing literature, and several studies report high detection accuracy [Calvo and D'Mello, 2010]. In spite of the numerous techniques with high reported accuracy, the challenge lies in integrating these solutions into real-world applications [Reeshad Khan, 2017]. An underlying issue is the choice of the affect detection mechanism; hence, our top-down approach. For example, the simplest approach is self-reported emotions. Self-reported mechanisms are carried out using diaries, questionnaires, surveys and the like. Even in digital self-reported approaches, people are still required to input their emotions [Vega et al., 2018]. This manual approach increases the likelihood of introducing bias, including acquiescence [Ross and Mirowsky, 1984], demand characteristics [Nichols and Maner, 2008], extreme responding [Hui and Triandis, 1989] and social desirability [Fisher, 1993]. There is also an increased likelihood that users drop out in longitudinal use, and limited potential for use in interactive systems [Stieger et al., 2017]. Despite these limitations, self-reports are still regarded as the gold standard for evaluating the accuracy of other affect detection mechanisms in controlled settings [Lang, 2005]. In all our experiments, we adhere to this practice of capturing self-reports [Broekens and Brinkman, 2013]. In addition, we cross-validated our self-reported measures against controlled tasks to assess and confirm the validity of our ground truth. The limitations of self-report mentioned above have led researchers to investigate automatic ways of detecting arousal.
Automatic arousal sensing can be classified into two categories: (1) behavioural/physical approaches and (2) physiological approaches. Both classes require sensors to capture the user's response, and an algorithm running on a computer to extract the affective signal. Behavioural/physical approaches utilise computer peripherals as sensors; examples include the audio interface (voice) [Erdem and Sert, 2014], mouse (mouse motion dynamics), keyboard (keystroke dynamics) [Kolakowska, 2013] and camera (gestures and facial expression) [Setyati et al., 2012]. The challenge is that computer peripherals are used in application-specific ways and may yield inconsistent responses when deployed in other applications, which limits their external validity. For example, not all applications require an audio, mouse or keyboard interface. Also, emotional reactions of low intensity may not be captured by facial expressions and body gestures. Physiological mechanisms make use of less ubiquitous devices such as electroencephalography (EEG) [Gao and Wang, 2015, Torres-Valencia et al., 2014], electromyography (EMG) [Soleymani et al., 2008], heart rate (HR) and galvanic skin response (GSR) sensors [Kosir and Strle, 2017]. Physiological sensors are less likely to become widely adopted than physical/behavioural sensors due to their purchase cost, the skills needed to set them up and their obtrusiveness [Gollan et al., 2016]. They are also sensitive to motion, light and temperature changes. Further, inconsistencies such as individual idiosyncrasies and within-person changes introduce noisy data [Petrantonakis and Hadjileontiadis, 2010]. To address some of these challenges (generalisability, noise, confounding factors), researchers have explored the use of multiple sensors (multimodal affect detection systems) [Zhang et al., 2014]. However, combining multiple sensors increases setup complexity, obtrusiveness and purchase cost [Lu et al., 2015]. As stated earlier, arousal detection has yielded high accuracy in the literature; however, accuracy may be of limited value if there is limited potential for naturalistic use. Therefore, we break down the challenges that have limited the application of affect recognition in naturalistic settings, in the following order:
(PS1) Potential for ubiquitous use
We qualify this requirement with the word 'potential' as a caveat to include arousal detection mechanisms that have more accessible alternatives (low-cost sensors that are ubiquitous for end-users). This caveat includes sensors that can be used in the wild but may not be as accurate as their laboratory-grade counterparts.

(PS2) Generalisability of the solution
Arousal sensing devices should yield consistent results in different interaction contexts (application, stimuli and users).

(PS3) Accuracy of detection
Accuracy will be measured against established ground truths such as self-reports, domain experts' evaluations of interaction, and other proxies of arousal (e.g. cognitive load and task difficulty ratings).

We propose tackling the problem in this order so that, first and foremost, any proposed solution has the potential for ecological validity. Accuracy can be improved iteratively through data-driven approaches, whereas ubiquity has a broader scope: it includes sensor availability, skills to set up and perceived comfort to the user, which we have less control over. Similarly, the generalisability of our solution can be improved through computational techniques, and further evaluations can be done on other stimuli types.
Our research aims to address the research questions stated in the next section.
1.2 Research questions
We aim to develop an approach to sense arousal and identify the visual attention of users during moments of change in arousal, in an ecologically valid way, using a mechanism that has the potential for widespread ubiquitous use. Our research is therefore guided by three broad questions:

RQ1. What method(s) can be used to sense physiological arousal during user interaction, in an ecologically valid way?
Despite the high arousal detection accuracy reported in the literature, most approaches have limited potential for widespread, ubiquitous use [Calandra et al., 2016]. Current limitations include the skills required to set up the sensors, obtrusiveness, and the cost and availability of sensors. We performed a literature review to extract the existing methodologies for affect detection and to select the mechanism that best satisfied these criteria for ecological validity.
RQ2. Given the constraints in RQ1, how much accuracy in arousal detection can be achieved for different causes/forms of arousal?
There are different reasons for increased arousal during user interaction [Raiturkar et al., 2016]. This research question can therefore be broken down by generalisability across various visual stimuli, which we categorise into emotional and cognitive stimuli. We also examine low-intensity arousal; in particular, low-intensity arousal can be triggered when users become frustrated on the Web [Lazar et al., 2006a]. To ensure that our method is generalisable, we evaluated it using emotionally evoked, cognitively induced, and frustration-induced arousal.
RQ3. Can the method selected in RQ1 be used to determine the visual context (i.e. the user's focal attention and visual scan sequence) during moments of increased arousal?
While RQ1 and RQ2 address validity and accuracy, RQ3 evaluates the extensibility of AFA algorithm. In other words, if we identify moments of increased arousal using the method, how much can we learn about the context of the user's interaction, so that adaptations/interventions are possible? We combined our method with an existing algorithm to model users' visual behaviour, not only in terms of their affective response but also according to their aggregated scan paths. For affect detection to have the desired impact, the context of why users feel the way they feel is important information [Michailidou et al., 2008]. Therefore, for adaptive computing and recommender systems, our approach is capable of sensing the user's focal attention when they experienced their affective state. Such added context can be fed into third-party applications in the wild so that smarter (better informed) interventions can be carried out to improve the quality of user interaction [Matthews et al., 2019b]. Some related work has been done in mapping the user's attention to their measure of arousal; for example, Wang et al. made use of mouse motion to detect users' attention [Wang et al., 2019]. Our work complements Wang et al.'s but makes use of users' visual attention, as the sketch after this question illustrates.
1.3 Methodology
This section aims to report our research methodology by considering the application
category, objectives and the insight our research adds to computer science.
1.3.1 Research Paradigm
A current challenge in the field of affective computing is in applying affect detection
in naturalistic settings [Sanchez et al., 2018]. Our research is aimed at progressing the
domain further by developing an algorithm for affect detection that has the potential
for ubiquitous widespread use. According to Baban et al.’s philosophy of research
methodology, our research falls into the category of applied research [Baban et al.,
2009].
Mixed-method approaches have become popular in computer science, especially as the discipline is often a blend of mathematics and engineering [Demeyer, 2011]. Our research has multiple objectives at different phases, including establishing a mathematical model and developing an algorithm to sense arousal based on this model. Therefore, we take a mixed-methods approach [Johnson et al., 2007]. Throughout our research, we made use of descriptive, exploratory, correlational, data collection, explanatory and analytical research methods. We started by identifying and explaining our research problems through our literature review. Next, we took an iterative analytical approach to solve the problem, refining our solution over each iteration. Our analytical approach made use of an iterative data-driven process in which we collected and examined eye-tracking data to establish correlations between pupil dilation and arousal, and between gaze data and visual attention. In a bid to make our approach generalisable, we took an explanatory approach at each stage of our research, explaining the reasons behind our findings to improve our model in subsequent iterations. We also evaluated our model using correlations and explained our results at every phase of evaluation.
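As an example of this style of evaluation, the snippet below correlates per-stimulus output of an algorithm with self-reported arousal ratings, mirroring the comparisons reported in Chapter 4. It is a sketch with placeholder numbers, not our experimental data.

    from scipy.stats import spearmanr

    # Placeholder values: per-stimulus arousal from the algorithm (scaled 1-5)
    # and the corresponding self-reported ratings used as ground truth.
    algorithm_output = [2.1, 3.4, 4.2, 1.8, 3.9]
    self_report = [2.0, 3.0, 4.5, 2.2, 4.0]

    rho, p = spearmanr(algorithm_output, self_report)
    print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")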
In addressing our objectives, we make generalisable deductions. According to Denicolo
et al., our research follows the positivism/post-positivism research paradigm [Denicolo
and Becker, 2012]. In the next subsection, we revisit our problem statements to justify
our choice of methodology.
1.3.2 Rationale for our research methodology
We tackled the challenge of ecological validity in PS1 through the selection of our detection mechanism. Our considerations included the potential for use in interactive systems, obtrusiveness, the skills required to set up, generalisability, the potential for capturing the user's interaction context and the availability of the sensor [Ragot et al., 2017]. Consequently, unimodality was a desirable factor in our selection process. We are aware that technology evolves rapidly; therefore, our emphasis was on the 'potential' for widespread use, rather than current prevalence. For example, web cameras with eye-tracking capabilities exist but are not yet accessible due to their cost. We carried out a review to select an affect detection mechanism that fulfilled those criteria. To improve generalisability (PS2), we developed our analysis technique iteratively, using datasets from different stimuli. For accuracy (PS3), our data-driven methodology accounts for confounding factors like idiosyncratic differences and response lag, while retaining generalisability by re-evaluating our solution over different stimuli types.
Following our systematic selection of an arousal detection mechanism, we decided on the use of pupillary response as a means to sense arousal during user interaction. In visual research, eye trackers can capture the size of the pupil upwards of 50 times per second (50 Hz) [Simola et al., 2015]. Web cameras with the ability to track gaze behaviour could become more accessible in future; hence, the potential for widespread use. Eye trackers also detect fixations, which are prolonged visual gazes on a single location. Contrary to self-reported emotions, pupil dilation is not easily prone to bias because it is a measure of autonomic activity, a non-deliberate response that cannot easily be faked [Bradley et al., 2017]. It is unobtrusive and adds further context to affect detection because the same device that captures pupil dilation can capture the user's focal attention [Tangnimitchok et al., 2018]. Capturing gaze behaviour makes AFA algorithm unique: if we determine the user's visual attention during moments of change in arousal, the designer has a more informed idea of how to improve user interaction. For the analysis, we developed the algorithm to sense arousal by modelling the user's baseline over fixed, non-overlapping windows. Next, we use peak detection to sense increases in the arousal level. We then identify the area of the screen with the most fixation during the moments of increased arousal, while also accounting for the time lag in pupillary response. The fixation duration and the magnitude of change in pupil dilation are used to compute the measure of arousal that the user has experienced [Simola et al., 2015]. To ensure generalisability, we evaluated the algorithm on its ability to sense arousal due to cognitive load, emotional stimuli and frustration during web interaction. We understand that arousal can mean different things depending on the context [Psychlopedia, 2018]. For interaction designers and UX researchers, it is important to consider the user's focal attention, the time, and the measure of arousal experienced [Iqbal et al., 2004, Partala and Surakka, 2004]. Therefore, we designed a visualisation to facilitate hypothesis generation. The results of our analyses of different stimuli using this algorithm show promise. Lab-based eye-tracking studies can input their data into the algorithm and visualise the arousal and focal attention of participants, which complements existing methods in usability studies. With further work, more evaluation, and the advent of high-fidelity web cameras, we anticipate that this algorithm can be used to sense arousal in real time and in naturalistic settings.
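To give a rough sense of this pipeline, the sketch below combines non-overlapping windowing, a baseline statistic and peak detection over a pupil diameter trace. It is a minimal Python illustration, not the AFA implementation: the 50 Hz sampling rate matches the eye trackers discussed above, but the window length, the median baseline, the one-standard-deviation threshold and the 300 ms lag correction are all assumptions.

    import numpy as np
    from scipy.signal import find_peaks

    def arousal_peaks(pupil_mm, rate_hz=50, window_s=1.0, lag_ms=300):
        """Aggregate pupil diameter into fixed, non-overlapping windows, then
        flag windows whose mean rises well above the baseline."""
        pupil_mm = np.asarray(pupil_mm, dtype=float)
        win = int(rate_hz * window_s)                 # samples per window
        n_windows = len(pupil_mm) // win
        means = np.array([pupil_mm[i * win:(i + 1) * win].mean()
                          for i in range(n_windows)])
        baseline = np.median(means)                   # assumed baseline statistic
        peaks, props = find_peaks(means, height=baseline + means.std())
        onsets_ms = peaks * win * 1000 / rate_hz - lag_ms  # lag-corrected onsets
        return onsets_ms, props["peak_heights"] - baseline  # time and magnitude

The returned magnitude echoes the use of the change in pupil dilation as the measure of arousal, while the lag correction reflects the delay between a stimulus and the pupillary response.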
1.4 Contributions and research outputs
Our contributions are to the field of computer science, specifically human-computer
interaction. They are as follows:
1. A mechanism to sense arousal with the potential for unobtrusive widespread use.
Addressing RQ1 in Chapters 2 and 3 led to the selection and implementation of an arousal detection mechanism that has the potential for unobtrusive widespread use. Several studies review existing mechanisms for stress detection [Greene et al., 2016, Sioni and Chittaro, 2015], cognitive load [van der Wel and van Steenbergen, 2018, Einhauser, 2017] or the use of physiological sensors in general [Brouwer et al., 2015]. We focus on arousal sensing and the potential for widespread ubiquitous use.

2. The generalisability of our approach to different visual stimuli.
Addressing RQ2 in Chapters 4 and 5 shows evidence of the generalisability of our approach. Generalisability of affect detection is a known problem in affective computing [Schuller et al., 2010]. Results show that our approach can be used to sense cognitively induced arousal [Matthews et al., 2018d], emotionally evoked arousal [Matthews et al., 2018b] and frustration on the Web (Chapter 5).

3. Sensing arousal with the context of the user's focal attention.
Mapping arousal to the user's focal attention in Chapters 5 and 6 demonstrates that our algorithm detects the user's focal attention during moments of increased arousal.

4. An extensible algorithm to create a richer understanding of visual and affective behaviour on the Web.
We demonstrated in Chapter 6 that our algorithm can be combined with other algorithms to derive a broader understanding of users' affective behaviour and visual scan patterns on the Web. To the best of our knowledge, this is the first work that combines visual scan paths with arousal to model the behaviour of a group of users on the Web [Matthews et al., 2019b].
1.4.1 Artefacts
In the course of our research, we have produced the following data and software artefacts:
1. Working software that analyses pupil data from an eye tracker to generate arousal points and users' visual attention during moments of increased arousal.
2. A tool to visualise the output of AFA algorithm, for hypothesis formulation in usability and UX research.
3. Eye-tracking datasets (105 participants) from three experiments with 24 stimuli that can be used to improve the methods and algorithms in AFA algorithm. The datasets for the three studies can also be used for further exploration in secondary analyses.
1.4.2 Publications
Seven publications (five first-author).
1. Combining Trending Scan Paths with Arousal to Model Visual Be-
haviour on the Web: A Case Study of Neurotypical People vs People
with Autism
Oludamilare Matthews, Sukru Eraslan, Victoria Yaneva, Alan Davies, Yeliz
Yesilada, Markel Vigo, Simon Harper
UMAP 2019, Cyprus [Matthews et al., 2019b]
2. Unobtrusive Arousal Detection on The Web Using Pupillary Response
Oludamilare Matthews, Alan Davies, Markel Vigo, Simon Harper
IJHCS 2019 [Matthews et al., 2019a]
3. Sensing Arousal and Focal Attention During Visual Interaction
Oludamilare Matthews, Markel Vigo, Simon Harper
ICMI 2018, USA [Matthews et al., 2018c]
4. Towards Arousal Sensing With High Fidelity Detection of Visual Focal
Attention
Oludamilare Matthews, Markel Vigo, Simon Harper
Measuring behaviour 2018, United Kingdom [Matthews et al., 2018d]
5. Inferring the Mood of a Community From Their Walking Speed: A
Preliminary Study
Oludamilare Matthews, Zhanna Sarsenbayeva, Weiwei Jiang, Joshua Newn, Eduardo Velloso, Sarah Clinch, Jorge Gonçalves
Ubicomp 2018, Singapore [Matthews et al., 2018a]
6. Moodbook: An Application for Continuous Monitoring of Social Me-
dia Usage and Mood
Heng Zhang, Shkurta Gashi, Hanke Kimm, Elin Hanci, Oludamilare Matthews
Ubicomp 2018, Singapore [Zhang et al., 2018]
7. Ubiquitous Mobile Sensing: Behaviour, Mood, and Environment
Aku Visuri, Kennedy Opoku Asare, Elina Kuosmanen, Yuuki Nishiyama, Denzil Ferreira, Zhanna Sarsenbayeva, Jorge Gonçalves, Niels van Berkel, Greg Wadley, Vassilis Kostakos, Sarah Clinch, Oludamilare Matthews, Simon Harper, Amy Jenkins, Stephen Snow, m. c. schraefel
Ubicomp 2018, Singapore [Visuri et al., 2018]
1.5 Research statement
Our motivation stems from a lack of ecologically valid approaches to sensing arousal during user interaction [Ragot et al., 2017]. Our mission is to develop an algorithm to sense arousal using a mechanism that has the potential for widespread ubiquitous use. Our vision is to utilise our algorithm to drive adaptive systems in making interventions to the context, layout and contents of user interaction, based on users' levels of arousal and their focal attention during moments of undesirable states of arousal. We have developed the AFA (Arousal and Focal Attention) algorithm and evaluated its accuracy. Our results show that our algorithm can be used to extract the arousal levels and focal attention of users from eye-tracking datasets. With future web cameras that have the capabilities of eye trackers, our vision can be accomplished with some optimisations to our algorithm to improve its accuracy and allow it to function in naturalistic settings.
1.6 Thesis structure
In Chapter 2, we start with an overview of the research domain, affective computing, appraising its progress in the lab vs in naturalistic settings. We delve deeper into our scope, affect detection, highlighting the challenges that have limited the potential of sensing affect. Next, we present our criteria for selecting our preferred affect detection mechanism, and the rationale for our choice: pupillary response. Then, we present the results of our literature review. Finally, we discuss the various ways that affect can be represented and the applications of affective computing.

In Chapter 3, we discuss our research methods in detail. We describe pupillometry devices and the nature of pupillary response and pupillometry sensor data. Next, we discuss existing approaches to analysing pupil data. We then present our approach and its iterative development using datasets of different sources and complexity. We present the implementation of our algorithm, which we named AFA algorithm (Arousal and Focal Attention Algorithm). Finally, we discuss how we designed and developed a visualisation toolkit for AFA algorithm.

In Chapter 4, we evaluate AFA algorithm on its ability to sense arousal from static stimuli (e.g., pictures), covering arousal evoked by emotive content as well as cognitively induced forms of arousal. Our results show that AFA algorithm senses arousal with a moderate to strong level of correlation to our ground truths.

In Chapter 5, we evaluate AFA algorithm on sensing arousal on the Web. We discuss how Web interaction data provide a more difficult challenge than static images. Further, we make use of ecologically valid stimuli, as we injected common causes of frustration on the Web. Our results show that AFA algorithm discriminates between normal and frustrating tasks with a strong effect size.

In Chapter 6, we demonstrate that AFA algorithm can be combined with existing methodologies to give a richer understanding of users' behaviour. We combine AFA algorithm with the Scanpath Trend Analysis (STA) algorithm to create a new methodology. We use the case of people with autism vs neurotypical people to demonstrate the novelty of this methodology, as the combination of AFA algorithm and STA algorithm was used to model the affective state and visual scan behaviour of these two groups on the Web.

In Chapter 7, we revisit our research questions and highlight how our aims and objectives led us in addressing them. We also present the limitations of our approach. Further, we propose future work and potential research pathways. Finally, we present concluding remarks.
Chapter 2
Background and related work
This chapter starts by providing an overview of affective computing and clarifying terminologies frequently used or misused in the domain. Section 2.2 then proceeds to an appraisal of progress made in affective computing, generally characterised by many laboratory studies but few ecologically valid methodologies. Following that, affect detection mechanisms are categorised, and 12 of the most common mechanisms are discussed in section 2.3, with the aim of highlighting their applications, pros and cons. Critical evaluation of the existing affect detection mechanisms then leads us to propose pupillary response as our preferred choice for affect detection in section 2.6, after the results of our literature review of affect detection mechanisms are presented in section 2.5. Next, we discuss related work. In section 2.7, theories and methods of representing affect are discussed, highlighting application areas and challenges associated with each one. Subsequently, domain areas in which affective computing has been studied are reviewed, with example applications, in section 2.8. Finally, section 2.9 concludes this chapter.
2.1 Affective computing

Although affective computing is an interdisciplinary field encompassing computer science, neuroscience, psychology and physiology, philosophical questions about emotions (e.g. their definition) were asked long before the inception of the affective computing domain [Ekman, 1992a, Ekman, 2004]. Affective computing revolves around affect detection, the induction of affective states and the expression of emotions by machines [Picard, 1997].
Affect, emotions and feelings have been used interchangeably in the literature, yet they carry important differences that are often misapplied and sometimes misunderstood by neophytes and even experts [Cromby, 2012]. An affective state consists of a set of psychophysiological patterns/states during a defined period [Picard, 2003]. If the subject cognitively recognises an affective state as a unique psychological state, then it can be reported as a feeling [Cromby, 2012]. Hence, we often see people fill logs in diaries saying, 'I felt happy after seeing my test result', or during product reviews say, 'I was disappointed after trying it out' or 'I feel scared' [Feinstein et al., 2011]. If this feeling is made observable through behaviours, gestures, facial expressions, voice prosody or attitudes, then we say that an emotion is expressed [Izard et al., 1987]. An affect may or may not be a familiar feeling, and as human beings become more emotionally intelligent, we know when to express, suppress, fake or exaggerate an emotion [Ashkanasy and Daus, 2002]. As a result, the distinction between feelings and emotions can be understood through machines: they do not have the former but can be made to express the latter [Ekman et al., 1987, Davidson, 2003]. To summarise these differences: everybody experiences affect; not all affective states can be recognised as feelings (notably in infants); and finally, emotions are social expressions of our psychological state (although they can be faked) [Harris et al., 2000].

Other related terms include expressions, autonomic changes, attitudes, self-reported/full-blown emotions, moods, emotional disorders and traits. These terms were differentiated in a study by Bakhtiyari et al. [Bakhtiyari et al., 2014] using their temporal dimension, i.e. how long the psychological states last; Figure 2.1 illustrates this. The self-reported (full-blown) emotion is the psychological state of a person that remains dominant for between a few minutes and a few hours [Matsumoto, 1993]. In practice, it is not easy to isolate the affective state of an individual from other psychological phenomena. A wrong approach to detecting affect is to assume that one affect detection model would fit everybody's affective state [Cowie et al., 2001]. This approach is wrong because individual personality traits, psychological disorders and moods could obscure the affective state of a person [Fragopanagos and Taylor, 2005]. Also, short-lived variations in personal attitude, autonomic changes and expressions could intensify an affective state. For example, a person may be an introvert by nature, going through a psychological disorder and experiencing a negative mood, yet the same individual may express a brief moment of joy while watching a short video clip on the Web. Characterising this state depends on the window of time being examined and the person's baseline, amongst other factors [Oatley et al., 2006]. Hence, when detecting affect, these factors should be taken into consideration to avoid inferring other psychological phenomena that are not purely affective states.
Figure 2.1: Psychological states, by duration [Bakhtiyari et al., 2014]

Some progress has been made in affective computing, and several applications have leveraged the detection of the user's affective state. These include Affective Tutoring Systems (ATS), affect-enhanced gaming, recommender/helper systems and, in computer science, user interfaces. The general concept in these applications is to factor the user's emotional/affective state into the way machines interact with users, because human beings are used to interactions that feature empathy and emotional intelligence in human-human interaction [Picard, 2010]. Other ways affective computing is being applied include the use of computer agents (robots, emoticons/avatars) to induce emotions [Burleson and Picard, 2004], for instance by enabling them to make decisions or express themselves in the ways they would have done if they had feelings, just like human beings [Ahn and Picard, 2005]. The lack of emotional intelligence in machines has somewhat limited the potential of human-computer interaction, and by a wide margin, detecting the human affective state has been the most difficult challenge in empowering machines with emotional intelligence [Picard, 1997]. The next section shows that the study of affective computing is growing steadily; however, how has this translated into real-life applications?
2.2 Progress in the lab, limited progress under prac-
tical settings
A lot of progress has been made in the study of affective computing. Figure 2.2 shows,
for each of the last fifteen years, the number of results that the phrase 'Affective
Computing' returns from the Google Scholar search engine; the year-on-year increase
averages M = 1313.57 (SD = 735.52). It is important to note that these are not results
accumulated up to the given year, but the number of results in that year alone.

Figure 2.2: Number of results retrieved from Google Scholar, by year.

Despite the increase in the volume of literature seen in figure 2.2, a corresponding
increase in end-user applications of affective computing has not manifested, because
affect-aware systems are still rare [Epp et al., 2011]. Many of the studies have focused
on inducing participants with an affect stimulus and then confirming that certain affect
detection mechanisms can find a correlation [Siraj et al., 2006], or classify or detect a
predetermined affective response to the induced stimuli [Abbasi et al., 2010]. Only a
few of these studies use methodologies that are reusable in real-life applications [Epp
et al., 2011].
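To make the statistic concrete, the following minimal sketch (in Python, with invented
yearly counts rather than the actual Google Scholar data) shows how the mean and
standard deviation of the year-on-year increases can be computed:

# Minimal sketch: mean and SD of year-on-year increases in result counts.
# The counts below are invented placeholders, not the data behind figure 2.2.
from statistics import mean, stdev

counts = [1200, 2450, 3900, 5100, 6700]            # results per year
increases = [b - a for a, b in zip(counts, counts[1:])]
print(f"M = {mean(increases):.2f}, SD = {stdev(increases):.2f}")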
To this end, sufficient research already supports claims that affective state can be de-
tected using several methods [Xu et al., 2015]. Hence, future efforts should be diverted
to applying affective computing in ways that can be deployed in real-life applications
to improve human-computer interaction. A few applications have succeeded in using
affective computing to enhance interaction, but their methodologies work only in
limited settings [Ragot et al., 2017]. This limitation is mainly due to their choice of
affect detection mechanism; the next section highlights why this is so.
2.3 Affect detection
Recognising affect ourselves (human recognition) seems intuitive. However, it is done
by observing behavioural cues such as body gestures, facial expressions and voice
prosody. Other cues enable us to clarify and contextualize our judgement about human
emotions, for example the nature of the relationship between the people in a conversation,
or the task being carried out. The innate personality of a person and prior events can
serve as a baseline, and the deviation from this baseline indicates the severity of a
person's emotion. Even after an emotion has been detected, reporting it is non-trivial
[Hassan, 2006] and sometimes prone to bias [Fisher, 1993]. The challenges of
self-reported emotion include representing emotional states in standard, uniform and
detailed ways [Levine and Safer, 2002]. Sometimes our feelings are too subtle to detect,
in the case of low-intensity emotions; at other times they are too entangled, in the case
of mixed states and transitions between psychological states [Bakhtiyari et al., 2014].
Also, the pressure and demand of detecting and reporting an emotion can itself
introduce another affective state [Allen et al., 2001]. This is even more complex when a
third party is required to detect the feeling, as the combination of the challenges above
further reduces the accuracy. Some of these complexities are also faced when
programming machines to detect human affect [Fisher, 1993].
Notwithstanding these difficulties, affect detection through computers offers some
advantages, namely the ability to automate the process and the sensitivity of sensors in
detecting minute changes that are not physically observable by human beings (and
certainly not at the same rate) [Picard, 1997]. Further, machines are not subject to the
same biases as humans. Additional advantages of sensors include data storage,
processing capabilities and their integrability with other computer applications [Zhang
et al., 2014]. These are the reasons that make affect detection a promising approach to
improving human-computer interaction. Self-reported emotions are still regarded as the
gold standard for the accuracy of emotion measurements [Lang, 2005, Mirgain and
Cordova, 2007]. However, when machines achieve accuracies close to humans in
detecting emotions, we can leverage the other advantages of computation, i.e. affective
computing [Prendinger and Ishizuka, 2005]. We discuss the three broad categories of
affect detection below:
• Reported: diaries, product ratings and service feedback such as questionnaires
[Gehricke and Shapiro, 2000, Heller et al., 1997]. Through self-reported emotions,
people communicate their feelings to product owners in product reviews, to service
providers as customer feedback, or to personal records in the case of people who
keep diaries. However, this approach is highly subject to human bias and would
require a lot of effort from the user to produce the granularity of detail required by
most applications of affective computing [Levine and Safer, 2002].
• Physical/Behavioural: gestures, facial recognition [Kumar and Agarwal, 2014,
Xu et al., 2015], Natural Language Processing (NLP), voice prosody, Keystroke
Dynamics (KD) and Mouse Dynamics (MD) [Hernandez-Aguila et al., 2014]. We
can observe physical and behavioural cues to infer emotions, as this is the most
common way of learning each other's emotional state during human-human
interaction [Zaalberg et al., 2004]. It is very natural, but it is difficult for computers
to achieve high precision through this method, because emotional expressions are a
human-human social construct through which people express their feelings to each
other, not to machines [Van Kleef, 2009]. Another limitation of this approach is
that it is context-specific, as people express their emotions in different ways under
different circumstances [Hochschild, 1979]. Also, most of these methods require
high computational power to achieve a satisfactory level of accuracy.
• Physiological: GSR, Electrocardiography (ECG), EEG, EMG, HR, Skin
Temperature (ST) [Zhang et al., 2014] and pupillary response [Eraslan et al., 2014].
In detecting the user's affective state for computational purposes, more sophisticated
instrumentation and analysis techniques can be used to measure changes in
physiological activities that are known to correlate with the user's affective state.
This approach is known as affect detection using physiological correlates of affect.
There are other considerations when selecting an affect detection mechanism, and they
depend on the purpose of the application. In adaptive gaming, the user's gestures may
indicate the affective state of the player, while game progression and user performance
may be used to determine the context in which the user experiences an affective state.
Subsequently, suitable events such as increased or reduced complexity of a game level
can be used to intervene on boredom or frustration respectively [Christian et al., 2014].
In intelligent tutoring systems, emotional cues captured from cameras and features
extracted from voice prosody can be used to detect the user's affective state, while
responses to cognitively challenging tasks can be used to contextualise it. When an
affective state and the context that describes it are known, suitable interventions such
as suggesting breaks, repeating an explanation or setting quizzes can be introduced to
improve the learning process [Sidney et al., 2005]. Contrary to the applications above,
affective responses during interaction with user interfaces are not as intense as during
gaming, hence the need for methods with high sensitivity [Matthews et al., 2018d].
Furthermore, many of the methods are invasive and therefore not suitable for observing
people's affective states during human-computer interaction with user interfaces.
In more detail, twelve (12) well-used affect detection mechanisms are reviewed below,
indicating their applications, advantages and constraints. The list was compiled from
the literature until no mechanism could be found that was not already on the list. The
mechanisms are listed according to the feature, behaviour or physiology being sensed,
rather than the sensors themselves.
1. Self-reported
Self-report remains the gold standard upon which other, implicit affect detection
mechanisms are evaluated [Broekens and Brinkman, 2013]. Self-reported mechanisms
require participants to explicitly rate their emotions on a 5-, 7- or 9-point scale [Zeng
et al., 2008]. These scales are presented to the user pictorially, verbally, through
graphical animations or by filling out questionnaires on paper [Desmet, 2003]. Some of
the well-used self-reported scales include the Self-Assessment Manikin (SAM)
[Geethanjali et al., 2017] for measuring valence, arousal and dominance; the Positive
and Negative Affect Schedule (PANAS) for measuring valence; the Profile Of Mood
States (POMS) for measuring mood [Norcross et al., 1984]; and the Visual Analogue
Scale (VAS) for measuring characteristics of an attitude on a continuous scale
[Crichton, 2001]. To validate our approach, we used the SAM scale because it is simple
to use, measures arousal, and does not complicate the comparison with our algorithm,
as it is also quantified on a scale [Watson et al., 1988]. However, self-report's reliance
on human judgement justifies the need for an automatic approach to detecting affect.
Several individual components add up to make a person's cognitive and intellectual
abilities. These include memory retention, reasoning and creativity, but emotional
intelligence is often overlooked as one of them [Detterman, 1987]. Emotional
intelligence includes the ability to cognitively deduce one's emotions [Peter, 2010].
Varying levels of emotional intelligence can make self-reported emotions inconsistent.
There also seems to be a consensus [Ekman et al., 1987, Cole et al., 2002, Simon and
Nath, 2004] that gender, social, cultural and personality differences affect the perception
and display rules of emotions. Although self-report has been used in product/service
feedback [McKone, 1999, Gamon, 2004] and in personal diaries and logs [Birditt et al.,
2005], it lacks the detail, consistency and ease of use required of an approach to affect
detection, because it is done manually [Levine and Safer, 2002]. This manual approach
has made standardization and consistency difficult; hence, self-report is not an ideal
method of affect detection for use in affective computing [Hassan, 2006].
2. Facial recognition
Facial recognition is a less demanding way of detecting emotions than self-report, as it
has the potential to be fully automated. Recognizing emotional expressions can be done
in four (4) different ways [Kumar and Agarwal, 2014]: 1. Geometry-based: shapes,
directions, regions; 2. Colour-based: the colour of a feature (eye, nose, mouth), though
this is very individualistic and culture/race/skin-colour specific; 3. Appearance-based:
statistical techniques; 4. Template-based: comparing with templates of a feature from a
feature database until there is a match. Paul Ekman, who was at the forefront of
findings using facial expressions, suggested the use of the Facial Action Coding System
(FACS), based on the works of the anatomist CH Hjortsjö [Hjortsjo, 1969, Ekman and
Friesen, 2003]. However, the subjective nature of human emotions has hindered
progress using these techniques. Despite the challenge of individual differences, [Kumar
and Agarwal, 2014] reported that facial expressions have achieved accuracies between
70% and 84% in person-specific emotion recognition, with a maximum of 5% variation
in non-person-specific emotion recognition.
The challenges to facial recognition include its sensitivity and the granularity of
calibration, which influence its accuracy in the temporal dimension [Kolakowska, 2013].
During human-computer interaction, low-intensity emotions are prevalent, and
sensitivity of recognition is a desirable feature which facial recognition does not afford
us. Another challenge is that, under observation, users tend to display what the
observer expects to see; in a real-life situation, under covert settings or with no
observer, the user may behave differently [Zaalberg et al., 2004]. Also, in facial
recognition it is unclear what expressions to expect from mixed emotions, such as
someone transitioning from a joyful experience to surprise and fear. Most facial
expression methods are based on Paul Ekman's theory of basic discrete emotions
[Ekman, 1992b]. Although detecting facial expressions is non-intrusive and fairly cheap,
needing only a camera and decent computational power, recognising an affective state
through facial expression remains a non-sensitive, user-specific and bias-prone way of
detecting user affect [Harms et al., 2010].
3. Gestures
The gestures of users can be used to discern their affective states [Mitra and Acharya,
2007]. This is possible because our hand, head and body movements and postures can
serve as affective cues. Detecting these signals often requires a camera and motion
sensors. Hence, gestures are similar to facial expressions regarding cost and detection
approach, requiring image analysis using the same techniques (geometry and shape) as
metrics [Gunes and Piccardi, 2007]. Some applications have even combined the two
techniques because of their similarities in feature metrics, software design and hardware
requirements [Gunes and Piccardi, 2007, Castellano et al., 2008]. However, gestures are
worse than facial recognition regarding sensitivity and applicability, because
low-intensity emotions are not strongly reflected in user gestures [Mitra and Acharya,
2007, Ward and Marsden, 2004]. Despite these limitations, gestures have been applied
to annotating video content [Hartmann et al., 2005], to affective tutoring systems
[Sarrafzadeh et al., 2006, Sarrafzadeh et al., 2008] and to tutor training systems that
improve a facilitator's body movement [Nguyen et al., 2015].
4. Voice prosody
Voice prosody has been applied to detecting the genre of music, predicting emotions,
detecting emotions, voice chats, etc. [Fritz et al., 2009]. It is cheap (in infrastructure)
and non-intrusive, with accuracies reaching over 80% [Erdem and Sert, 2014]. Prosodic
features (frequency, duration, intensity and timbre) and non-prosodic features have
been used to classify sounds into emotional types [Mion and Poli, 2008]. Despite its
good accuracy, low cost and non-invasiveness, it is unnatural to request that a user
speak during human-computer interaction just so that their current affective state can
be detected; this method is more suited to eliciting emotions than to detecting emotions
during human-computer interaction [Kim and Andre, 2008]. Furthermore, it can pose a
significant challenge for a voice recognition engine to extract a voice while retaining the
prosodic features that are useful for affect detection. Finally, voice processing is
computationally intensive, due to the extensive signal processing necessary for voice
analysis.
5. Natural Language Processing (NLP) and Text Mining
While text mining deals with the extraction of interesting knowledge, including
statistics, clustering and classification, from unstructured or free text, NLP breaks free
text down and evaluates it to discern sentiment, emotion or richer meaning [Kao and
Poteet, 2007]. Some applications of text mining and NLP in affective computing are
musical lyrics classification [Hu et al., 2009] and emotion detection on social media
[Sobkowicz et al., 2012, Gil et al., 2013]. In a literature review, NLP was reported to
have a relatively low accuracy of 77.30% and to be prone to differences in language
interpretation, ambiguity, cultural differences and social display rules [Lee et al., 2012].
NLP as a method of detecting emotions is also not real-time, and it is more suitable for
affect contextualization than for affect detection [Kim et al., 2010].
6. Electro-dermal activity
GSR, Skin Conductance Response/Level (SCR/SCL) and the PsychoGalvanic Reflex
(PGR) are all measures of the electrical activity of the skin [Boucsein, 2012]. Now
formally known as Electro-Dermal Activity (EDA), this is the general term for all
measures of changes in the electrical properties (potential difference, current, resistance,
conductance) of the skin [Critchley, 2002]. The Sympathetic Nervous System (SNS) is
the part of the Autonomic Nervous System (ANS) which controls the involuntary action
of sweating [Critchley, 2002]. When the SNS is stimulated, arousal is reflected in the
amount of sweat deposited in the sweat pores [Chellali and Hennig, 2013]. As sweat is a
conductor of electricity, the amount of sweat in the sweat pores influences the electrical
properties of the skin. EDA is a reflection of the arousal dimension of emotion
[Schlosberg, 1954]. It should be noted that the relationship between emotional arousal
and EDA is not causal, as other factors such as humidity, weather and motor activities
can easily increase or decrease the amount of sweat on the skin [Chellali and Hennig,
2013]. Devices that measure EDA have been miniaturised and made non-intrusive in
the form of wristbands, finger bands, toe bands, etc. [Luharuka et al., 2003]. Despite its
unobtrusiveness, the uncertainty and the difficulty of controlling external factors have
made EDA unsuitable for applications in naturalistic settings. Also, when measuring
EDA, a latency of between 3 and 6 seconds between stimulus and response must be
factored in [Chanel et al., 2006], which is another limitation. Furthermore, EDA
devices are still somewhat expensive and do not ship as standard personal computer
peripherals, and are hence not practicable for everyday affect detection by non-experts
in non-experimental settings [Boucsein, 2012].
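As an illustration of the latency constraint, the sketch below (a minimal Python
example with invented timestamps, not a tool used in this thesis) pairs stimulus onsets
with EDA peaks only when the peak falls 3 to 6 seconds after the stimulus:

# Minimal sketch: pairing stimulus onsets with EDA responses, allowing for
# the 3-6 s latency noted above. Timestamps (seconds) are invented.
stimuli = [10.0, 42.5, 80.0]            # stimulus onset times
eda_peaks = [14.2, 47.0, 60.1, 84.9]    # detected EDA response peaks

LATENCY_MIN, LATENCY_MAX = 3.0, 6.0

pairs = []
for s in stimuli:
    # accept the first peak that falls inside the latency window
    match = next((p for p in eda_peaks
                  if LATENCY_MIN <= p - s <= LATENCY_MAX), None)
    pairs.append((s, match))

print(pairs)  # [(10.0, 14.2), (42.5, 47.0), (80.0, 84.9)]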
7. Electrical-brain activity
Human electroencephalography (EEG) measures the frequency and amplitude of
electrical activity recorded from the scalp [Li et al., 2009]. A comparison of multimodal
affect detection mechanisms comprising EEG, GSR, HR, temperature and respiration
patterns was made using generative models [Torres-Valencia et al., 2014]; whenever
EEG was included in the combination, it yielded the highest accuracy in both the
arousal and valence dimensions of emotion. To utilise EEG, note that the several bands
(alpha, beta, theta, delta), representing different frequency ranges, indicate different
activities in the brain; a careful selection of one band, or a combination of bands, is
therefore necessary [Ahern and Schwartz, 1985]. Because the placement of the probes
on the head affects the stability, and hence the reliability, of the readings, some level of
skill is needed for a successful setup. Some studies claim that EEG is an inexpensive,
non-intrusive, simple, fast and accurate way of studying the brain's reaction to stimuli
[Li et al., 2009]. While its accuracy and responsiveness are undisputed, given the
knowledge needed to set it up, the choice of frequency bands to explore, the unnatural
fitting, and the non-aesthetic and intimidating look seen in figure 2.3, we would argue
that it is not ideal for daily, non-experimental use [Li et al., 2009].

Figure 2.3: EEG device (source: http://www.cns.ppls.ed.ac.uk/eegmain)
8. Heart rate variability
This category includes sensing the heart rate, blood volume pressure and the electrical
activity of the heart with ECG. Several sensors can be used to estimate heart rate and
blood volume pressure, including the photoplethysmogram (PPG) discussed later
[Quazi et al., 2012]. ECG is mostly used as a clinical diagnostic tool for assessing the
health of the heart, but it can also be used as a physiological correlate of affect [Li and
Chen, 2006]. ECG as an affect detection mechanism is based on correlations between
the cardiovascular activity of the heart and changes in the affective state of a person.
When the fibres through the atria and ventricles are activated through the Sympathetic
Nervous System (SNS), the heart rate increases. Conversely, activation of the
Parasympathetic Nervous System (PNS) reduces the workload and hence lowers the
heart rate [Agrafioti et al., 2012]. Catalano [Catalano, 2002] reported that the reacting
effects observable in ECG are automaticity, contractility, conduction rate, excitability
and dilation of the coronary blood vessels. In another study, a Support Vector Machine
(SVM) classifier was used to classify emotions from ECG signals; accuracies of 78.4%
and 61.8% were achieved for three and four classes of emotions respectively [Kim et al.,
2004b]. As is the case with EEG, ECG requires some level of expertise to set up; it is
intrusive, the devices are comparatively costly and, worse still, its accuracy is not high.
These constraints make ECG unsuitable as an affect detection mechanism [Nakamura
et al., 1993].
9. Electrical-muscle activity
EMG is the use of electrical activity to measure changes in muscle activity [Benedek
and Hazlett, 2005]. Its most common use is in measuring changes in facial muscles.
The physiology is that when a stimulus is induced and experienced by a participant, the
affective state changes and the brain instructs the motor nerves to reflect these changes
through the facial muscles, generating the corresponding facial expression [Xu et al.,
2016]. The process is reversed to reset the facial expression or to transition into another
one. These changes can be revealed by observing EMG over a facial muscle, and the
duration of an affective state can also be measured. EMG is more sensitive than other
methodologies for detecting facial expressions: affective changes not detected using
Ekman's FACS were detectable using EMG when emotional intensities were suppressed
[Cacioppo et al., 1992, Cacioppo et al., 1986]. The corrugator muscle is a facial muscle
in the forehead that is responsible for wrinkling the forehead. Activity of the corrugator
muscle is usually used to detect negative emotions like sadness, worry, deep thought and
anger, because the corrugator muscle lowers the eyebrow during those states [Dimberg,
1990, Hazlett and Hazlett, 1999]. The zygomatic major is a facial muscle responsible for
stretching the mouth posteriorly and superiorly, hence controlling smiling and other
mouth expressions that are correlated with joy and pleasurable emotions [Lang et al.,
1993]. Despite its sensitivity and accuracy, EMG, just like EEG and ECG, is intrusive,
because it depends on attaching devices to the body, and it requires some level of skill
to set up [Allen et al., 2001]. As with facial recognition, individual, social and cultural
differences influence the perception and display rules of participants towards affective
stimuli [Cheng and Liu, 2008].
10. Skin Temperature
PPG can be used to measure skin temperature. The SNS works by preparing the body
for a fight-or-flight response to stimuli, while the PNS operates in the opposite way,
regulating this effect by triggering a rest-and-digest response [Nakamura et al., 1993,
Moses et al., 2007]. These activities influence our heartbeat patterns, respiration
patterns and skin temperature [Bousefsaf et al., 2013, Hjortskov et al., 2004]. Recent
work has confirmed that variability in these activities can be measured using a
technique known as PPG [McDuff et al., 2014]. PPG functions by emitting light into
the skin and measuring the amount of light reflected onto a camera or photosensitive
device [Shelley, 2007]. The amount of light reflected is an indicator of changes in the
quantity of blood deposited in the lower layer of the skin and, as explained above, the
blood volume is influenced by the activities of our autonomic nervous system [Quazi
et al., 2012]. Lee et al. [Lee et al., 2011] proposed an algorithm which proved successful
in improving the Signal to Noise Ratio (SNR), and hence accuracy, by 10.9% and 12.7%
for wrist motions and walking on the road respectively. Despite indications that PPG is
a very promising methodology for detecting affect, confounding environmental variables
such as temperature, humidity, motion and individual differences remain issues when
using this technique [Lee et al., 2011]. It is not ideal for use in naturalistic settings.
11. Keystroke Dynamics (KD) and Mouse Dynamics (MD)
KD and MD have been used as biometrics to authenticate users [Bergadano et al., 2002,
Monrose and Rubin, 2000, Dowland and Furnell, 2004, Pusara and Brodley, 2004]. The
same feature sets used by these studies are also used for user profiling and emotion
detection [Alhothali, 2011]. In KD for emotion detection, users' affective states are
correlated with the dynamics of the way the user types free or fixed text: features such
as typing speed, backspace rate, the duration between key presses and the duration
between key-down and key-up events are used as indications of the user's affective state
[Lin et al., 2013]. In MD, features such as mouse speed, click rates, the duration of
button-up and button-down events and scroll rates are used for affect detection [Lin
et al., 2013]. The underlying principle is to associate muscle action and user behaviour
(as manifested in user actions) with the user's affective state [Vizer et al., 2009]. In a
review of studies on emotion detection using keyboard and mouse dynamics, the highest
average accuracy was 93.4%, obtained when KNN (K-nearest neighbour) was used to
classify emotions elicited by audio stories into anger, fear, happiness, sadness, surprise
and neutral [Kolakowska, 2013]. The advantages of this method are that it is cheap,
non-intrusive, computationally light and natural, with no complexity in the device
set-up [Kolakowska, 2013]. KD and MD would be the ideal methodology for detecting
affect if all applications required the same pattern of input during an interaction.
However, reading emails requires little or no keyboard input, while sending emails
requires keyboard input and little mouse input. Relying on a methodology based on KD
and MD to detect user affect in such cases could be erratic [Khan et al., 2012].
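To make the keystroke features concrete, the following minimal sketch (with an
invented event log; not the cited authors' implementation) derives two standard KD
features, dwell time (key-down to key-up) and flight time (key-up to the next
key-down), from raw key events:

# Minimal sketch: extracting dwell and flight times from a key event log.
# Each event is (time_in_seconds, key, "down" or "up"); values are invented.
events = [
    (0.00, "h", "down"), (0.09, "h", "up"),
    (0.21, "i", "down"), (0.32, "i", "up"),
]

down_times = {key: t for t, key, kind in events if kind == "down"}
dwell = [(key, round(t - down_times[key], 3))
         for t, key, kind in events if kind == "up"]

ups = [t for t, _, kind in events if kind == "up"]
downs = [t for t, _, kind in events if kind == "down"]
flight = [round(d - u, 3) for u, d in zip(ups, downs[1:])]

print(dwell)   # [('h', 0.09), ('i', 0.11)]
print(flight)  # [0.12]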
12. Pupillary response
The pupillary response is known to be an indicator of arousal [Bradley et al., 2008b]. It
has been used to detect cognitive load, frustration, attention, boredom and stress
[Partala and Surakka, 2003, Klingner et al., 2011]. The primary function of the pupil is
to regulate the amount of light reaching the retina [Baltaci and Gokcay, 2012].
However, once light is kept relatively constant, the pupil diameter is an indicator of a
person's autonomic activity [Pfleging et al., 2016]. Pupillary response as an indicator of
affect stems from the fact that the sympathetic nervous system is the part of the
autonomic nervous system which, when activated, raises the blood pressure and heart
rate, constricts the blood vessels and, most interestingly, stimulates the radial dilator
muscles, which in turn cause pupil dilation [Koss, 1986]. As sympathetic activity
decreases, this process reverses and the pupil diameter decreases. Conversely, when the
parasympathetic nervous system is activated, the sphincter muscles of the iris are
triggered, which constricts the pupil; this process is reversed whenever the
parasympathetic system is inhibited, which in turn results in pupil dilation [Bruneau
et al., 2002, Schroder et al., 2005]. Visual interaction involves gazing at the content, so
there is an opportunity to capture the eyes. Eye trackers have been used in
experimental set-ups to capture user interaction; however, some studies have begun to
use the web camera to achieve the same under naturalistic settings [Sommer et al.,
2014]. Cumulative measures of the dilations and constrictions of the pupil, computed
using windowing techniques, are known to correlate with a user's affective state. Once
light is controlled or accounted for, and the camera or eye-tracking device has a view of
the eye, we envisage that it will be possible to detect a user's affective state. The
relatively high accuracy seen in table 2.2, the potential for use in natural settings
[Sommer et al., 2014], and the non-intrusive, easy-to-set-up nature of pupillometry seen
in table 2.1 have made pupillary response our preferred choice of affect detection
mechanism [Christian et al., 2014, Partala et al., 2000]. The next section gives a more
detailed exposition of the selection of an affect detection mechanism and of how
pupillary response can be a reliable physiological correlate of affect during
human-computer interaction.
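The windowing idea referred to above can be sketched as follows; this is a minimal
illustration with an invented signal and window length, not the AFA algorithm
developed later in this thesis:

# Minimal sketch: a sliding-window summary of pupil diameter samples.
# The diameters (mm) and the window length are invented for illustration.
diameters = [3.1, 3.2, 3.6, 3.9, 3.8, 3.4, 3.2, 3.1]
window = 4

def window_means(signal, n):
    """Mean pupil diameter over each window of n consecutive samples."""
    return [sum(signal[i:i + n]) / n for i in range(len(signal) - n + 1)]

means = window_means(diameters, window)
baseline = means[0]
# Windows whose mean rises above the first (baseline) window suggest
# increased arousal, once light is controlled for.
print([round(m - baseline, 3) for m in means])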
2.4 Selecting an affect detection mechanism
When selecting an affect detection mechanism, several factors must be considered.
• Purpose & natural fit
The purpose or application area in which the affect detection mechanism is to be
used is crucial in determining a suitable way to detect affect [Brouwer et al., 2015].
In our intended application, user interfaces and visual content, it is quite
challenging to identify an ideal affect detection mechanism. This is because users
interact with a system through its user interface, and these interactions take
different modes [Exposito et al., 2018]. However, most user interfaces have visual
content which enables users to view options and information and to make decisions
[Sanghoon and Roberto, 2005]. It is therefore a reasonable assumption that most
users will have visual contact with user interfaces. One of the established modes of
detecting affect, the pupillary response, can be used to measure the affective state
of users [Sanghoon and Roberto, 2005]. Through eye trackers and web cameras
(the eyes of the computer) we can collect physiological responses from the user
which inform us of the user's affective state. Another natural way to observe users'
interaction is through keystroke and mouse dynamics. However, these methods are
not always applicable, because some user interfaces require less keyboard or mouse
input than others; keystroke or mouse dynamics collected from one interface may
therefore be dissimilar on another interface, even if the user is experiencing the
same affective state on both.
Regarding purpose and natural fit, pupillary response and other camera-based
approaches are more generalizable and accurate means of collecting affective data
for interaction with user interfaces and visual content.
• Accuracy
Accuracy is of high importance to affect detection. With extensive data cleansing
and analysis, several studies have arrived at specific techniques for detecting affect
in certain contexts. As can be seen from table 2.2, nearly all affect detection
mechanisms reach accuracies above 80%, at least in certain contexts. The greater
the number of classes, the lower the expected accuracy. Also, higher accuracies are
easier to attain in classification than in estimation, and there is no standard way
to measure the accuracy of affect estimation. However, for more generalizable
applications, estimation is preferred, because the desired classes are often not
known beforehand. Another advantage of estimation is that it is possible to
convert estimates to classes, but not vice versa (illustrated in the sketch after this
list). In user interfaces, the qualities most important for performance (attention,
interest, adequate cognitive load, stress) are best measured as estimates on the
arousal scale. It is therefore important that an affect detection mechanism can
measure the affective state of a user at the right granularity. Physiological
correlates of affect are more fine-grained than physical/behavioural and
self-reported methods. This inevitably eliminates facial gestures, body gestures
and other camera-based affect detection mechanisms except pupillary response.
• Ease of use
Affect detection is very relevant to user interaction, as established previously.
However, if a mechanism is too costly to purchase, too difficult to set up or too
computationally intensive, its cost may outweigh its value.
As can be seen from table 2.2, most of the studies were experimental; only very
few of the mechanisms can be deployed in real-life applications. Text mining and
voice prosody can be used for social and customer-care applications, while
keyboard and mouse dynamics may be used for specific computer applications.
Gestures and facial recognition are computationally intensive, but they may be
used for visual content, learning or gaming. All the other mechanisms based on
physiological responses, except pupillary response, are not applicable in natural
settings because they do not integrate seamlessly into the systems. Also, if the
detection device does not need to be purchased separately or attached to the body,
it can be considered easy to use.
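The asymmetry noted under Accuracy, that continuous estimates can always be
discretised into classes while the reverse is impossible, can be illustrated with a short
sketch (the estimates and the 0.5 threshold are invented):

# Minimal sketch: discretising continuous arousal estimates into classes.
# The estimates (on a 0-1 scale) and the 0.5 threshold are invented.
estimates = [0.12, 0.48, 0.73, 0.91]
classes = ["high" if e >= 0.5 else "low" for e in estimates]
print(classes)  # ['low', 'low', 'high', 'high']
# The reverse is impossible: the label 'high' alone cannot tell us whether
# the underlying estimate was 0.73 or 0.91.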
Table 2.1 presents a summary of the various affect detection mechanisms: their
granularity (how sensitive they are to affective changes), how easy they are to set up,
their level of obtrusiveness (how intrusive and large they are, and whether they are
attached to the body), the minimum cost of purchase, how much computing resource
they need, their prominent disadvantages and their most suitable applications. Table
2.2 contains the accuracies of the affect detection mechanisms as extracted from the
literature cited in the rightmost column; the accuracy, the application that motivated
the research, the number of classification classes, the method of analysis and the
research novelty are tabulated. Multimodal studies (studies that used multiple affect
detection mechanisms) were not taken into consideration, since it would be difficult to
extract the accuracy of each of the sensors separately.
2.5 Review of affect detection mechanisms
A review was conducted to find papers in affective computing, in order to choose a
suitable affect detection mechanism for user interfaces and visual content.
2.5.1 Query Construction
The most important term here is affective computing, because we do not want studies
that use these same detection mechanisms in purely health, sports or other unrelated
domains. Therefore, if a paper does not mention affective computing in its title or
abstract, it will not be extracted by the query.
We also needed to limit papers to those that relate to the arousal dimension of affect.
The search is therefore narrowed down by the terms arousal 'or' stress. Arousal is a
dimension of affect that deals with the intensity of emotion. Stress is a form of mental
and physical agitation in users and can be observed within a certain range on the
arousal scale. For now, there is no formal agreement on the specific level of arousal
that characterises stress. Despite the ambiguity and the non-specific ways in which the
word stress has been applied, it remains a common term for describing users who are
experiencing some level of discomfort during human-computer interaction. The final
search query constructed was:

(Stress OR Arousal) AND (Detection OR Recognition) AND (Affective Computing)

The query was applied to the following databases: ACM Digital Library, Springer,
Science Direct and IEEE.
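For illustration only, the boolean logic of this query can be expressed as a small Python
filter; the paper records below are invented, and the actual search was performed with
the databases' own search engines rather than code:

# Minimal sketch of the query logic: a paper is retained only if its title
# or abstract mentions affective computing together with a stress/arousal
# term and a detection/recognition term. Records are invented examples.
papers = [
    {"title": "Stress detection for affective computing", "abstract": "..."},
    {"title": "Arousal in sports physiology", "abstract": "..."},
]

def matches(paper):
    text = (paper["title"] + " " + paper["abstract"]).lower()
    return (("stress" in text or "arousal" in text)
            and ("detection" in text or "recognition" in text)
            and "affective computing" in text)

print([p["title"] for p in papers if matches(p)])
# ['Stress detection for affective computing']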
Table 2.1: Comparison of affect detection mechanisms
(Mechanism | Granularity | Ease of setup | Obtrusiveness | Financial cost | Computational cost | Disadvantages | Suitable applications)

Reported:
Self-reported | Low | Low | Low | Low | Low | Potential for bias; not automatable | Voluntarily/solicited feedback; easy; secondary data source

Physical/behavioural:
Facial recognition | Low | Low | Low | Low | High | Differences in display rules; can be faked | E-learning; classification into discrete emotions
Gestures | Low | Low | Low | Low | High | Differences in display rules | Games; learning; experiments; movie classification
Voice prosody | Medium | Low | Low | Low | High | Limited applications | Social networks; learning; lie detection
NLP & text mining | Medium | Low | Low | Low | Medium | Language differences | Social networking; entertainment; contextualizing
KD & MD | Medium | Low | Low | Low | Low | Application dependent | Experimental; specific user interfaces

Physiological correlates:
GSR | High | Medium | Medium | Medium | Medium | Confounds with motion, humidity and temperature; latency | Experiments; medical
EEG | High | High | High | High | Medium | High granularity; complexity | Experiments; cognitive science; medical
ECG | High | High | High | High | Medium | Confounds with motion; low granularity | Experiments; medical
EMG | High | High | High | High | Medium | Can be faked | Classification/estimation of basic emotions; medical
PPG | High | Medium | Low | Low | Medium | Confounds with motion, humidity and temperature; latency | Games; experiments
Pupillary response | High | Low | Low | Low | Medium | Confounds with light; requires user in a fixed position | Analysis of visual stimuli
Table 2.2: Affect detection mechanisms and their accuracies
(Accuracy | Application | Classes | Analysis | Novelty | Citation)

Facial expression:
91.66% (facial gesture detection), 90% (emotion recognition), 94.58% (both) | Drivers | 4 (happy, anger, sad and surprise) | Fuzzy rules | Combining eye and lip gestures and expressions | [Agrawal et al., 2013]
84.80% | Experimental | 4 (normal, pretended happy and pretended sad facial expressions of affective states) | Linear discriminant | Through facial temperature | [Khan et al., 2006]
87.50% | Music therapy | 3 (happy, neutral, sad) | SVM | Music recommendation system | [Rizk et al., 2014]
77.4% (2D), 89.4% (3D) | Experimental | 2 (positive and negative) | SVM | Using consumer depth cameras and luminance data | [Savran et al., 2013]
90.73% | Experimental | binary (happy, angry, sad, surprised, disgusted and fear) | Radial Basis Function Network (RBFN) | Using ASM (Active Shape Model) | [Setyati et al., 2012]
72% | Experimental | 2 (stressed or not stressed) | SVM | Using thermal and visual spectrum | [Sharma et al., 2013]
97.5% (neural networks), 66.66% (regression) | Experimental | 6 (happy, angry, sad, surprised, disgusted and fear) | Regression & NN | Multilayer Perceptron with backpropagation learning | [Siraj et al., 2006]
95.3% (deliberately displayed), 72% (voluntarily displayed) | Experimental | 6 (happy, angry, sad, surprised, disgusted and fear) | SVM & HMM | Temporal modelling of AUs for facial recognition | [Valstar and Pantic, 2012]

Gestures:
97.4% (with self-reported mental state), 83.2% (including unreported cases) | ITS | 6 (stressed, satisfied, tired, thinking, recalling, concentrating) | Dynamic Bayesian network, junction tree algorithm | Classifying mental states that affect learning | [Abbasi et al., 2010]
79% | Adaptive gaming | 3 (levels 1, 2, 3 of sadness, frustration, happiness, joy) | HMM | Using the Xsens motion capture system (http://www.xsens.com/) | [De Silva et al., 2006]

Voice prosody:
77% | Experimental | 2 (neutral vs emotional) | Symmetric Kullback-Leibler distance, logistic regression models, HMM | - | [Busso et al., 2009]
75% for 2 classes (e.g. anger vs. joy), 84% for binary classification (stressed vs. not stressed) | Software library | 2 (stressed vs neutral) | - | Mobile-based library to detect mood and stress; 30% processor usage when idle, 70% while analysing | [Chang et al., 2011]
80.10% | Customer care | 2 (angry and neutral) | Neural networks | Sentiment detection (language and text independent) | [Morrison et al., 2005]

Text mining:
85% | Web contents | 3 (positive, negative and neutral) | Probability distribution, defined rules | - | [Lu et al., 2010]

KD and MD:
77%-88% for binary | Experimental | binary (anger, excitement, confidence, hesitance, nervousness, relaxation, sadness, tiredness) | Decision trees | Non-invasive and cheap | [Epp et al., 2011]
64.72% for valence, 61.02% for arousal ratings | Experimental | 2 (low vs high) | ANN | Using ANNs with KD & MD | [Khan et al., 2012]

GSR:
40% | Experimental | correlation | Statistics | Relationship between EDA and body motion; movement occurs 3 s after the EDA signal | [Chellali and Hennig, 2013]
65.79% | Experimental | 2 (stressed vs. relaxed) | KNN, multilayer perceptron, NB, random forest, JRip | Comparing pupillary response with GSR | [Ren et al., 2013]

EEG:
82% | Experimental | 3 (positive, negative and neutral valence) | QDC, SVM, KNN | Wireless EEG sensor | [Brown et al., 2011]
82% | Experimental | 2 (calm/neutral vs. excited/negative) | SVM | Higher Order Spectrum (HOS) with genetic algorithms for feature extraction | [Hosseini et al., 2010]
84.50% | Experimental | 4 (arousal and valence, high and low) | KNN | Self-organizing maps for boundary selection | [Khosrowabadi et al., 2010]
90% | Music | 6 (relax, happy, surprise, sad, fear and angry) | GMM, Bayesian network, One-Rule | - | [Khosrowabadi et al., 2009]
82.29% | - | 4 (joy, sadness, anger and pleasure) | SVM, multilayer perceptron | Using fewer EEG electrodes | [Lin et al., 2010]
69.69% | Music | 4 (joy, sadness, anger and pleasure) | Multilayer perceptron | - | [Lin et al., 2007]
54.09%, 46.86% and 40.72% for the respective classifiers | Learning | 4 (boredom, confusion, engagement and frustration) | KNN, SVM, multilayer perceptron | - | [Mampusti et al., 2011]
87% negative valence (high vs low arousal), 66% positive valence (high vs low arousal) | Experimental | 2 per valence polarity (high vs low arousal) | Shrinkage Linear Discriminant Analysis (shLDA) | - | [Mathieu et al., 2013]
91.33% | Experimental | 5 (happy, surprise, fear, disgust, neutral) | KNN | - | [Murugappan and Murugappan, 2013]
85.17% | Experimental | 6 (happiness, surprise, anger, fear, disgust and sadness) | SVM | Hybrid adaptive filtering and higher-order crossings analysis | [Petrantonakis and Hadjileontiadis, 2010]

ECG:
78.21% (similar ages), 70.23% (all ages) | Experimental | 6 (happiness, sadness, fear, surprise, disgust and neutral) | Regression tree, KNN and fuzzy KNN (FKNN) | Effect of age groups on emotion classifiers (age groups 9-16, 18-25, 39-68) | [Jerritta et al., 2013]
82.29% | Experimental | 4 (joy, anger, sadness, pleasure) | BP neural network, template machine classifier | - | [Cheng and Liu, 2008]

Pupillary response:
90% | Experimental | 2 (neutral vs aroused) | Decision tree | Periorbital temperature | [Baltaci and Gokcay, 2014]
83.16% average accuracy | Experimental | 2 (relaxed vs stressed) | KNN, multilayer perceptron, NB, random forest, JRip | Comparing pupillary responses to GSR | [Ren et al., 2013]
After extracting the results of the query and downloading the citations, title and
abstract screening were done. The following subsections set out the protocols for the
inclusion and exclusion of papers.
2.5.2 Exclusion criteria
The following papers were excluded:
- Duplicate studies
- Non-empirical studies
- Studies not reported in the English language
- Studies that do not clearly state the experimental conditions and the demographics
  of the participants
- Studies excluded on quality grounds (see section 2.5.4)
- Conference proceedings, unless the contribution is a full paper
- Studies that do not report their findings clearly
- Studies that do not have PDF versions
- Studies that do not mention stress, arousal or emotion detection in their abstracts
2.5.3 Inclusion Criteria
We included:
- Studies published from 2005 to 2014
- Studies that explore methods to detect affect
- Studies that use a single method or a combination of methods
- Studies that improve detection mechanisms through the data cleansing phase, even
  if previous studies used the same detection mechanism
- Studies that improve affect detection mechanisms through the machine learning or
  statistical method used for analysis, even if previous studies used the same affect
  detection mechanism
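The machine-checkable parts of these criteria can be illustrated with a short Python
sketch; the field names and the example record are invented, and criteria such as the
quality assessment below clearly require human judgement:

# Minimal sketch: applying the machine-checkable parts of the inclusion/
# exclusion criteria. Field names and the example record are invented.
def passes_screening(paper, seen_titles):
    if paper["title"] in seen_titles:          # duplicate studies
        return False
    if not 2005 <= paper["year"] <= 2014:      # publication window
        return False
    if paper["language"] != "en":              # English-language only
        return False
    if not paper["has_pdf"]:                   # PDF version required
        return False
    abstract = paper["abstract"].lower()
    return any(t in abstract for t in ("stress", "arousal", "emotion detection"))

paper = {"title": "Detecting arousal on the Web", "year": 2012,
         "language": "en", "has_pdf": True,
         "abstract": "We detect arousal during browsing..."}
print(passes_screening(paper, seen_titles=set()))  # True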
2.5.4 Quality assessment
We excluded fatally flawed studies from the review. These were identified using the
following five questions developed by Dixon-Woods et al. [Dixon-Woods et al., 2006].
1. Are the aims and objectives of the research clearly stated?
2. Is the research design clearly specified and appropriate for the aims and objec-
tives of the research?
3. Do the researchers provide a clear account of the process by which their findings
were produced?
4. Do the researchers display enough data to support their interpretations and
conclusions?
5. Is the method of analysis appropriate and adequately explicated?
2.5.5 Results
A total of 1,255 papers were extracted from the databases, and 741 were excluded after
applying the exclusion and inclusion criteria, leaving 514 papers for full-text screening.
Within these remaining papers, I carried out a text search for the terms "Web",
"internet" or "browse", which yielded seven papers. This suggests that only a few
studies have focused on affect detection on the Web. In the next subsection, we present
a synthesis of the final papers extracted from the review that have either a
methodological or a theoretical relationship to our research.
2.5.6 Synthesis of related work
As early as 1974, Janisse tested the hypothesis that pupil dilation and constriction
occur with attraction and aversion, respectively [Janisse, 1974]. Results showed a linear
correlation between pupil size and affective intensity, and a curvilinear relationship
with valence [Janisse, 1974]. Hence, pupil dilations occur for both positively and
negatively valenced stimuli and are proportional to their intensity [Janisse, 1974]. In
2003, Partala et al. confirmed Janisse's finding that pupil size variation is an indication
of affective processing: participants were presented with ten positive, ten neutral and
ten negatively valenced sounds; results showed that, two seconds after stimulus offset,
participants' pupils were significantly larger for both negative and positive stimuli than
for the neutral stimuli [Partala and Surakka, 2003]. Bradley et al. carried out a study
showing that pupillary response reflects emotional arousal in affective picture-viewing
tasks [Bradley et al., 2008a]. Earlier, in 1966, Kahneman and Beatty showed that pupil
dilation is a measure of cognitive load and task difficulty [Kahneman and Beatty, 1966].
In other early but related work, pupil dilation has been combined with other
mechanisms, for example GSR, Blood Volume Pressure (BVP) and ST, to detect the
presence of stress or increased arousal [Zhai and Barreto, 2006]. Calvo et al. reviewed
affect detection with a broad overview of models, methods and their applications
[Calvo and D'Mello, 2010]. Bremner researched the amplitude changes in pupil size and
the velocity of constriction after exposure to light stimuli [Bremner, 2012]. Partala et
al. provided evidence that positive interventions can indeed trigger a different affective
state when participants experience a negative one, such as during a mouse pointer
failure [Partala and Surakka, 2004]. So far, this section has presented the early related
work (published before 2014) extracted from our review, which provides the theoretical
underpinnings of our research.
Table 2.3 summarises more recent related work published between 2015 and 2019,
during the period of our research. These papers were retrieved by querying Google
Scholar, the ACM digital library, Springer, Science Direct and IEEE using the criteria
(pupil OR eye-tracking) AND (web OR human-computer interaction OR user interface
OR adaptive computing) AND (arousal OR stress) for papers published between 2015
and 2019.
Table 2.3: Related work in theoretical findings, applications, methods or sensors used
(No. | Novelty | Analysis | Sensor | Source)

1 | Model for predicting stress in desk jobs | - | Smartwatch | [Sanchez et al., 2018]
2 | Detecting stress in UI | Patterns | Mouse, Gaze | [Wang et al., 2019]
3 | Emotion intensity-duration model | - | Pupil | [Steephen et al., 2018]
4 | Pupils reflect arousal from clickbait | - | - | [Pengnate, 2016]
5 | Sensing and eliciting emotions from humanoid robots | Average | Pupil, EEG | [Guo et al., 2019]
6 | Arousal-modulated game difficulty | - | GSR | [Amico, 2018]
7 | Stress detection | - | Various | [Sioni and Chittaro, 2015]
8 | Predicting intentions and emotion | - | Gaze, head motion | [He et al., 2018]
9 | Relationship between pupil dilation, task difficulty and habituation | - | Pupil | [Matsumoto et al., 2016]
10 | Detecting emotional valence/brain activation | Average | Pupil, fMRI | [Park and Kim, 2016]
11 | Arousal and valence recognition in gaming | Hilbert transform | Pupil | [Alhargan et al., 2017]
12 | Emotion recognition: lab vs wearable sensors | - | Various | [Ragot et al., 2017]
13 | Affect detection with thermal imaging | - | Thermal imaging | [Latif et al., 2015]
14 | Logging viewers' affective responses to visual attention on paintings | - | Pupil, Gaze | [Calandra et al., 2016]
15 | Affect recognition for e-learning | - | Pupil | [Xing et al., 2016]
16 | Sensing stress from mobile typing | - | Mobile typing pressure | [Exposito et al., 2018]
17 | Relationship between listening duration and arousal | - | Pupil | [McGarrigle et al., 2017]
18 | Relationship between pupil dilation, mental load and light | - | Pupil | [Pfleging et al., 2016]
19 | Impact of pupillary response on gaze estimation | - | Pupil, Gaze | [Choe et al., 2016]
20 | Model relating pupil dilation to cognitive arousal vs light | - | Pupil | [Duchowski et al., 2018]
21 | Relating memory capacity to pupil size variability | - | Pupil | [Aminihajibashi et al., 2019]
22 | Estimating mental load from pupil and gaze data | Bayesian surprise | Pupil, Gaze | [Wolf et al., 2018]
23 | Detecting affect from facial expression and pupil dilation | Average | Pupil, Facial | [Tangnimitchok et al., 2018]
24 | Sensing stress in visual perception | - | Pupil, Gaze | [Chmielewska et al., 2019]
25 | Stationarity of the pupil in predicting workload | - | Pupil | [Buettner et al., 2018]
26 | Pupil dilation as a predictor of emotional engagement | Average | Pupil | [Henderson et al., 2018]
27 | Stress detection with pupil dilation and facial temperature | - | Pupil, Thermal imaging | [Baltaci and Gokcay, 2016]
28 | Detecting emotional valence from EEG and pupil dilation | Average | Pupil, EEG | [Abdrabou et al., 2018]
29 | Detecting emotional valence from pupil dilation | - | Pupil | [Babiker et al., 2015]
30 | Pupillary response in a virtual reality environment | - | Pupil | [Chen et al., 2017]
31 | Decoupling the light reflex from pupillary dilation | - | Pupil | [Raiturkar et al., 2016]
32 | Pupillary responses reflect internal belief states of correctness or error in decision making | Average | Pupil | [Colizoli et al., 2018]
33 | Pupil dilation as a mechanism for attention-aware systems | - | Pupil (glasses) | [Gollan et al., 2016]
34 | Pupil dilation for tracking lapses in attention | - | Pupil | [van den Brink et al., 2016]
35 | Relationship between arousal and pupil dilation, HR and GSR | Average | Pupil, HR, GSR | [Wang et al., 2018]
36 | Pupil dilation and HR as indicators of cognitive load | - | Pupil, HR | [Jercic et al., 2018]
37 | Pupil responds to surprise | - | Pupil | [Alamia et al., 2019]
38 | Pupil dilations to sense arousal | - | Pupil | [Kassem et al., 2017]
39 | The window of cognition in pupil dilation | - | Pupil | [Korn and Bach, 2016]
40 | Pupil size responds to attention and experience | - | Pupil | [Wahn et al., 2016]
41 | Correlating gaze position with pupil size measurements | - | Pupil, Gaze | [Hayes and Petrov, 2016]
42 | Predicting click intention from pupil dilation, EEG and gaze tracking | - | Pupil, EEG, Gaze | [Slanzi et al., 2017]
43 | Relationship between affect and ratings of visual complexity suggests an 'arousal-complexity bias' | - | Pupil | [Madan et al., 2018]
44 | Review of low-cost eye trackers | - | Gaze | [Ferhat and Vilarino, 2016]
45 | Review of pupil dilation in sensing cognitive load | - | Pupil | [Einhauser, 2017]
46 | Gaze tracking, attention and consumer behaviour | - | Pupil, Gaze | [Rosa, 2015]
47 | Effects of brightness, contrast and hedonic content on pupil diameter | - | Pupil | [Bradley et al., 2017]
48 | Effects of music on pupil size | Average | Pupil | [Laeng et al., 2016]
49 | Using the startle eye-blink to measure player affect in games | - | EMG | [Nesbitt et al., 2015]
50 | Pupillary responses to music | - | Pupil | [Gingras et al., 2015]
51 | Recommendations for using neurophysiological signals | - | Various | [Brouwer et al., 2015]
52 | Pupil dilation to measure cognitive effort | - | Pupil | [van der Wel and van Steenbergen, 2018]
53 | Stress detection in health | - | Various | [Greene et al., 2016]
54 | Emotion recognition from pupil dilation and EEG | - | Pupil | [Lu et al., 2015]
55 | Role of image duration, habituation and viewing mode (active/passive) of affective pictures | - | Pupil | [Snowden et al., 2016]
56 | Eye movement and fixation to sense affect in emotive pictures | Average | Pupil, Gaze | [Simola et al., 2015]
Table 2.3 shows that, of the 56 related works, 28 were theoretical studies, while the
other 28 were applications or systems demonstrating the use of affect detection in
different domains. Of the 28 theoretical studies, seven were reviews of and
recommendations for affective computing, while the other 21 were theoretical concepts
with empirical studies. Surprisingly, of the 28 applications or systems implemented for
affect detection, we found only one that was carried out in the wild: logging the
affective state and focal attention of gallery viewers using an eye tracker [Calandra
et al., 2016]. Regarding the analysis methods, 9 studies made use of machine learning,
mostly with multimodal sensors for affect detection, while 34 studies made use of other
analysis methods (statistical, probabilistic or mathematical models). The Novelty
column of table 2.3 summarises each work's contribution relative to our research.
From the seven reviews in Table 2.3: Sioni and Chittaro reviewed various sensors for
stress detection, noting its application areas, including learning, communication and
health, and listing pupillary response as one of the emerging technologies for stress
detection [Sioni and Chittaro, 2015]. Similarly, Greene et al. reviewed sensors for stress
detection with a focus on health applications [Greene et al., 2016]. Brouwer et al.
reviewed neurophysiological signals, especially noting the pitfalls (e.g. accuracy
measures, purpose of use, confounding factors, statistical methods) to consider for an
affect-enabled application [Brouwer et al., 2015]. Rosa reviewed eye-tracking technology
in a bid to reveal the methodological and technical challenges of inferring cognitive and
emotional processing in consumer behaviour [Rosa, 2015]. Einhauser reviewed
techniques, applications and physiological responses of the pupil to cognition, emotion,
attention and memory [Einhauser, 2017]. Another study, by van der Wel and van
Steenbergen, investigated cognitive control tasks and revealed a diverging relationship
between pupil dilation and performance [van der Wel and van Steenbergen, 2018].
Finally, as our contribution aims to select an affect detection mechanism with the
potential for ubiquitous use, Ferhat and Vilarino's review of low-cost eye trackers is
relevant: it highlights their technologies from various perspectives (calibration
strategies, head-pose invariance and gaze estimation techniques) [Ferhat and Vilarino,
2016].
Other empirical studies related to our research includes the work of Matsumoto et al.
who studied the relationship between pupil dilation, task difficulty and habituation
CHAPTER 2. BACKGROUND AND RELATED WORK 59
[Matsumoto et al., 2016]. Similarly, Snowden et al. studied habituation in the viewing
of affective pictures [Snowden et al., 2016]. In studies that performed comparisons
between sensors, Wang et al. compared the relationship between arousal and pupil
dilation, HR and GSR [Wang et al., 2018]. Similarly, Ragot et al. performed a compar-
ison between Biopac MP150 (laboratory sensor) and Empatica E4 (wearable sensor)
trained using machine learning models and found similar accuracies between the two,
thereby, supporting the viability of emotion recognition in the wild [Ragot et al., 2017].
Light represents the largest source of noise when extracting physiological response to
arousal [Bradley et al., 2017]. Hence, there were several studies that researched on
models relating (or decoupling pupil) dilation as an affective response and pupillary
response to light or brightness [Pfleging et al., 2016, Duchowski et al., 2018, Raiturkar
et al., 2016, Bradley et al., 2017]. Some other studies researched the relationship be-
tween pupil dilation and listening, sound for music applications [McGarrigle et al.,
2017, Laeng et al., 2016, Gingras et al., 2015]. Further, empirical studies were done
relating pupil dilation with physiological arousal [McGarrigle et al., 2017] and cog-
nition [Duchowski et al., 2018, Korn and Bach, 2016]. Other more specific variables
that were investigated against pupil dilation include task difficulty [Matsumoto et al.,
2016], mental/ workload [Pfleging et al., 2016, Buettner et al., 2018], confidence of
decision making in tasks [Colizoli et al., 2018], surprise [Alamia et al., 2019], visual
complexity rating [Madan et al., 2018], memory capacity [Aminihajibashi et al., 2019],
attention and experience [Wahn et al., 2016].
Regarding the studies that investigated physiological means for affect detection, all
but one was implemented in a naturalistic setting. Calandra et al. designed a system
called EYECU (pronounced I see you) to capture the affective responses of users as
well as their attention on paintings [Calandra et al., 2016]. Other related works on
applications in the lab that do not include the use of pupillary response were that of
thermal imaging [Latif et al., 2015] and EMG [Nesbitt et al., 2015] for affect detection,
and smartwatches [Sanchez et al., 2018], mobile typing pressure [Exposito et al., 2018],
mouse and gaze tracking [Wang et al., 2019] to detect the presence of stress. Some
applications combined pupillary response with other mechanisms like facial expres-
sion [Tangnimitchok et al., 2018], gaze tracking [Calandra et al., 2016, Choe et al.,
2016, Wolf et al., 2018, Slanzi et al., 2017], thermal imaging [Baltaci and Gokcay,
CHAPTER 2. BACKGROUND AND RELATED WORK 60
2016], EEG [Slanzi et al., 2017], fMRI (functional magnetic resonance imaging) [Park
and Kim, 2016] and HR [Jercic et al., 2018] to sense arousal and cognitive load.
With regards to the analysis methods used, many of the approaches used measures
of central tendency [Tangnimitchok et al., 2018, Henderson et al., 2018, Abdrabou
et al., 2018, Colizoli et al., 2018, Wang et al., 2018, Laeng et al., 2016, Simola et al.,
2015]. Other computational methods/techniques used by these studies include ma-
chine learning/AI [Latif et al., 2015, Xing et al., 2016, Sanchez et al., 2018, He et al.,
2018, Ragot et al., 2017, Chmielewska et al., 2019, Baltaci and Gokcay, 2016, Babiker
et al., 2015, Lu et al., 2015], Bayesian surprise [Wolf et al., 2018], Hilbert transform
[Alhargan et al., 2017] and pattern recognition [Wang et al., 2019].
2.6 Rationale for pupillary response
Pupillary responses in affective computing refer to the changes in pupil diameter that
are related to responses to emotional stimuli [Partala and Surakka, 2004]. The pri-
mary function of the pupil is to regulate the amount of light coming from the cornea
towards the retina 2. However, psychologists have long used pupillometry to measure
changes in autonomic activities of the nervous system [Steinhauer and Hakerem, 1992].
In a controlled experiment where the amount of light could be regulated, the pupillary
response is a useful affect discriminator for arousal [Pfleging et al., 2016]. Also, in
natural settings, the use of web camera’s could be explored due to its availability and
simplicity of use [Calandra et al., 2016].
Eye-tracking is used to observe visual behaviours to understand responses around the
user’s area of interest, user’s scan paths, the timing and variability between the two
[Pantic et al., 2007]. This has been studied in HCI to improve several aspects of user
interaction. Some of these ways include identifying areas of the visual stimulus that
has caused the user to exhibit a particular behaviour such as prolonged eye gaze or
frequent scans along the area [Davies et al., 2016]. It has also been used to observe
how users transition between areas of interest so that user interface elements that
are frequently used will be placed in more conspicuous locations or elements that are
of high value will but hidden will be placed in areas where users record more eye
2http://medical-dictionary.thefreedictionary.com
CHAPTER 2. BACKGROUND AND RELATED WORK 61
gaze [Eraslan et al., 2016b]. Pupillary response conversely will offer the subjective
behaviours of users, such as providing more understanding of what aspects of user
interaction are related to specific affective states [Yaneva et al., 2016a].
This research aims to contribute to the understanding of how pupillary responses cor-
responds to affective states. This will aid in the analysis of visual behaviour, especially
their affective response to stimuli.
2.7 Representing affect
As early as 2001, there were already more than 90 definitions of emotions, and we
would expect in its representation, there are at least as many variations, some only
different in the lexicon or application [Plutchik, 2001]. The oldest representation of
emotion is called discrete emotions. Paul Ekman proposed that there are six discrete
emotions, namely: fear, anger, sadness, happiness, surprise and disgust [Ekman and
Friesen, 1971]. However, Izard and Caroll E. postulated that there are four more
including interest, contempt, shame, guilt [Izard, 1992]. Later on, Allen applied
3
Figure 2.4: Facial expressions for discrete emotions
discrete emotions to product review scales. There have been several other suitable
applications of discrete emotions, but the most notable one is the classification of
facial expressions into discrete emotions by Paul Ekman, considered to be a pioneer
in the study of emotions [Allen et al., 1988]. Discrete emotions is a natural way,
familiar to humans of describing emotions; however, semantic ambiguity often makes
it challenging to quantify and represent emotions in affective computing.
Founded by Russel, the dimensional way of representing emotion is based on the
concept that emotions can be represented along 2 or 3 dimensions [Russell and Pratt,
1980]. The two dimensions are valence/pleasure (how pleasant/unpleasant the emotion
is) and arousal (the intensity of the emotion). Another dimension was proposed by
CHAPTER 2. BACKGROUND AND RELATED WORK 62
Mehrabian and Russel because it was observed that a third dimension, dominance
(the measure of control/submissiveness to emotion) was needed to distinguish between
some emotions with similar plots on the two-dimensional scale [Mehrabian and Russell,
1974, Russell and Mehrabian, 1977]. The dimensional representation is frequently
used in affective computing because it can be applied using mathematical expressions,
formulas and models. Some limitations of this method of representing affect are that
it is too fine-grained, and even though emotions are continuous quantities, they are
not well understood in ways that can effectively utilise in such low granularity scale.
These dimensional scales are also not natural in the way human beings understand
emotions to be.
The component model is another way emotions have been represented. Plutchik
4
Figure 2.5: 2D representation of emotion
proposed the emotion wheel, analogous to the colour wheel in 2D perspective, and
the cone visualisation in 3D to understand relationships between emotions [Plutchik,
1980]. He opined that there are eight emotions known as the basic emotions, and a
mixture of 2 basic emotions could form a dyad. For example, submission = trust +
fear. There is also the notion where emotion is the inverse of another, i.e. each of
the primary emotions, have their opposite states as seen in surprise and anticipation.
Also, those emotions could manifest with different levels of intensity, just like serenity
is of lower intensity to ecstasy. In the conical model (3D), the apex of the cone is
the most neutral state while in the colour wheel, the intensity of emotions reduces
towards the edges. [Ortony et al., 1990] postulated the OCC model, which is similar
CHAPTER 2. BACKGROUND AND RELATED WORK 63
5 6
Figure 2.6: Plutchik’s emotion wheel and cone representation of Plutchik’s emotions
to discrete emotions in that there are 22 emotion states. The difference is that the
emotional states are considered to be a possible consequence of an event, which may
be experienced due to: 1. The consequence of event (good or bad) 2. The action
of the agent and 3. Aspects of objects like appealing or not. Several researchers
have criticised the ambiguity in this model and therefore proffered revised versions
like the one in [Steunebrink et al., 2009]. Despite its complexity, it has been useful in
simulating and predicting emotions in several affect-aware applications [Ko lakowska
et al., 2015].
The next section reviews several applications of affective computing, by domain area.
2.8 Applications of affective computing
All affect detection mechanisms have specific applications they are suitable for regard-
less of their limitations. There is no formally agreed classification of the applications of
affective computing. Rosalind Picard classified applications according to their abilities,
so an application could have the ability to recognise, express and have emotions while
Schwark proposed a more elaborate taxonomy in affective computing where questions
CHAPTER 2. BACKGROUND AND RELATED WORK 64
are answered from the bottom up which then makes up an application definition [Pi-
card, 1997, Schwark, 2015]. The questions are to determine, from bottom to apex, its:
purpose/goal, level of integration, affective understanding, affect generation and the
platform. Regardless of the semantics for our choice of classification, we will discuss
the most common applications of affective computing by domain.
Entertainment is one area affective computing has been applied. An example of its
use in entertainment is in music and movies. These forms of entertainment have be-
come very accessible and portable through music players, computers, internet stream-
ing and radio station. The numerous options available affords listeners and viewers
the opportunity to be selective, but the question is, what criteria should they use
when selecting one for a moment? Bearing in mind that tracks are being released at
a pace that no single person can test all of them to learn by experience which track,
album or artist is suitable for the moment. This has motivated the use of machine
learning approaches to classifying music and movies into the kind of emotions they
elicit. Furthermore, it is prevalent in the studies of affect detection to use sounds and
music to elicit emotions [Yazdani et al., 2012]. Some of the research in this domain are
cited here [Yang and Chen, 2012, Burger et al., 2013, Janssen et al., 2012, Daimi and
Saha, 2014]. One of the applications of affective computing in computer science
is adaptive systems. In adaptive systems, several aspects of interaction are moni-
tored such as attention, planning, learning, memory and decision-making [Dalvand
and Kazemifard, 2012]. Although there exist acceptable standards of design centred
around accessibility, usability and ethical principles, to a large extent, these rules are
agnostic of individual idiosyncrasies of each user [Van Schaik and Ling, 2008]. These
formally agreed principles are guidelines that generated for most users, but there will
always be individual specific desires and preferences regarding application features,
design, presentation and mode of interaction. Just like in human-human interaction,
it is a good idea to start with certain heuristics for interacting with people then as the
interaction progresses and better understanding is developed, the manner of commu-
nication is gradually tailored to suit individual preferences. The idea is not to discard
the already established heuristics because they provide a starting point for user inter-
action but to develop a responsive and dynamic interaction that improves over time
by learning individual preferences.
CHAPTER 2. BACKGROUND AND RELATED WORK 65
When building an adaptive system, it is important that the mode in which the system
learns about the user does not introduce more problems by making the user having
to put on discomforting gadgets, increasing the number of tasks, compromising user
privacy or causing a bias that could skew the intended outcome of the system. In
an adaptive system, the selection of affect detection mechanism goes a long way in
determining its success because as seen in 1.1 affect detection is done twice within a
cycle, i.e. in detecting and evaluating the outcome.
Closely related to adaptive systems is usability and accessibility engineering, where
user interfaces are being tested to understand the visual and psychological behaviour
of users during human-computer interaction. Many applications of adaptive systems
have leveraged on the use of gadgets that contain pre-built sensors for detecting human
affect [Datcu, 2014]. However, user interfaces provide the avenue for users to perform
a variety of tasks such as browsing web pages, to sending emails, reading, watching
multimedia, chatting. Because the computer resources used to perform these tasks
are different, and their expected interaction behaviours are different, we need to make
use of a detection mechanism that factor in context, behaviour and affective state
into recognising and representing the user’s response. The eye tracker is a common
way of achieving this, but to study the subjective aspect of the user’s experience, a
different approach should be used to measure the physiological correlates to the user’s
affective state. Some of the studies in this area include Lunn et al. [Lunn and Harper,
2010a] who used GSR to identify areas of frustration in older web users, Fernandez et
al. [Fernandez et al., 2012] who used carried out a survey on stress during decision
making using stock traders and how stress affects their emotions and decision making
as a case study. Another study based on how cognitive load affects decision-making
was carried out on users of Computer Aided Design (CAD) software [Liu et al., 2014].
EEG, GSR and ECG sensors were used to collect psychophysiological data on the
participants, and fuzzy logic was used to model these physiological responses to frus-
tration, satisfaction, engagement and challenge.
Affect-aware games are games that are designed to adapt to the affective state of a
player. The goal in such applications is to ensure that the user remains challenged
and entertained [Hudlicka, 2009]. To accomplish this, the system needs to detect bore-
dom, loss of focus, frustration and any of such affective states that could cause user
CHAPTER 2. BACKGROUND AND RELATED WORK 66
disengagement [Gilleade et al., 2005]. When unwanted affective states are recognised
by the system, several actions such as increasing or reducing game complexity, reward
and punishment, providing hints can be used to induce a more palatable emotional
state [Liu et al., 2009, Rani et al., 2005]. Another application of affective computing in
games is in empathetic agents. Especially in role-playing games, the characters need
to display an emotion that closely depicts the situation of a game[Kim et al., 2004a].
In the learning domain, affective computing has been called affective learning, Intel-
ligent Tutoring Systems (ITS) and Affective Tutoring Systems (ATS). They are either
concerned with the detection of student’s affective state [Sarrafzadeh et al., 2006],
inducing desirable affective states in students [Alexander et al., 2003], enabling tu-
toring agents with emotions [Merrill et al., 1992] or a combination of those features
[Mao and Li, 2010]. In Medicine, affective computing has been used to study the
correlation between psychosomatic illnesses and the users’ affective state. It has also
been used to monitor the treatment of such illnesses on a longitudinal and non-invasive
basis [Bamidis et al., 2004, el Kaliouby et al., 2006]. Also, affective computing and
psychophysiology are used in medical and health domain to understand the way the
brain and nervous system function in particular through the use of EEGs [Wolpaw
and McFarland, 1994, Neuper et al., 2003].
2.9 Summary
Our literature review reveals a research gap that limits the potential of affecctive com-
puting. The goals of affective computing are for computers to be able to recognise
human emotions and adapt to it by altering aspects of user interaction or by ‘showing’
emotions to simulate human empathy, all in a bid to improve the quality of human-
computer interaction. The current gap in affective computing is the challenge of affect
detection. For affective computing to fulfil its potential, further research is needed on
affect detection mechanisms that have the potential for wide-spread ubiquitous use.
Consequently, methods of detecting affect were critically reviewed to select a suitable
affect detection mechanism for use in both experimental conditions and natural set-
tings during interaction with user interfaces and visual contents.
CHAPTER 2. BACKGROUND AND RELATED WORK 67
Our review and background study shows that pupillary responses provide an unobtru-
sive approach to affect detection during the human-computer interaction. It offers the
opportunity for controlled laboratory studies as well as the potential for use in natu-
ralistic settings through the use of web cameras. Furthermore, through the analysis of
gaze behaviour, we can detect the users’ focal attention during moments of increased
arousal.
The next chapter presents our research methods in detail.
Chapter 3
Development of AFA algorithm
In Chapter 2, we discussed related work and systematically reviewed existing mecha-
nisms for sensing affect. Pupillary response, and gaze behaviour was selected as our
preferred approach to sensing arousal because it is unobtrusive and has the potential
for deployment in the wild. Further, the analysis of gaze behaviour adds context to our
measure of arousal. We start off by discussing the existing pupillometry devices, their
uses and limitations. Next, we explore the pupillary response data through a secondary
analysis of eye-tracking datasets. The findings from our exploration study informed
the next section, where we described the structure and characteristics of pupillary
response data. Next, we describe some of the techniques that have previously been
used to analyse pupillary response to extract affective signals from it. Based on the
lessons we learnt from the exploration of some of these techniques in our secondary
data exploration, we decided on our preferred technique for sensing arousal. Next,
we explain our method for analysing pupillary response to sense changes in arousal
in more detail. We conclude the chapter by laying out the plan for evaluating AFA
algorithm.
3.1 Pupillometry
Pupillometry is the study of changes in the diameter of the pupil as a function of cog-
nitive processing [Sirois and Brisson, 2014]. Pupillometry is of interest in the fields of
medicine, neuroscience, physiology and psychology. We aim to leverage the literature
on the physiology of the pupils, and the psychological principles behind physiological
68
CHAPTER 3. DEVELOPMENT OF AFA ALGORITHM 69
responses, that relate to human-computer interaction. Mathot et al. categorised the
causes of pupillary response into three, namely: the pupil’s light response, the pupil’s
near response and the pupil’s response to arousal/mental effort/cognition [Mathot,
2018]. In the first case, the pupil responds to brightness by constricting in size to
accommodate less light. For the second, whenever an image is too close, the pupil
constricts to maintain focus and to sharpen the image. Finally, of the three cate-
gories, arousal/mental effort and cognitive load is the main factor that causes the
pupils to dilate. See Chapter 2 for a detailed description of the physiology of pupil
dilation. This section aims to list out the mainstream eye-tracking devices. Some
of the most popular manufacturers of eye-tracking equipment include Tobii, Natural
Point, Eyetribe, SMI, Eyelink and Gaze Point. Table 3.1 shows a table comparing
products, their sampling rates - F (Hz), portability, applications and their support for
pupil dilation.
CHAPTER
3.DEVELOPMENT
OFAFA
ALGORITHM
70
Table 3.1: Comparison of eye-tracking vendors
OEMs Product F (Hz) Portable? Application PD?
Tobii Pro glasses 2 50 & 100 YesMobile tracking of attention, engagement, training, skill
transfer and performance enhancementYes
Pro Spectrum 600 & 1200 NoHigh fidelity research with synchronization to external
sources: EEG, GSRYes
Pro X3-120 120 YesIt is designed for detailed research into the timing and
duration of fixations on screen-based stimuliYes
Pro X2 30 & 60 YesIt is ideal for usability and market research studies in the
fieldYes
Pro T60XL 60 No
Measure gaze behaviour over widescreen angles and large
stimuli for a broad range of
psychology and neuroscience research.
Yes
Pro TX-300 300 No
Study occulomotor functions and
capture natural human behavior
without the need for chin or head rest.
Yes
Dynavox PCEye Plus 30 YesTo aid accessibility and control user interaction on the
laptop or desktop using the eyesNo
CHAPTER
3.DEVELOPMENT
OFAFA
ALGORITHM
71
Table 3.1 – Continued from previous page
OEMs Product F (Hz) Portable? Application PD?
PCEye Mini 30 YesTo aid accessibility and control user interaction on the
tablet PC using the eyesNo
Eye Tracker 4C 90 YesUsed as an eye tracking peripheral to improve gaming
experienceNo
Natural Point TrackIR 120 Yes For tracking head movement as a gaming accessory No
SmartNav 3 & 4 NA YesAn assistive technology to aid accessibility for cursor
controlNo
Eyetribe NA NA YesProvides an SDK to either develop applications based on
gaze behaviour or capture the data for research purposes.No
SMI iview X 1250 Novideo-based
tracking with chin restYes
Glasses 120 YesMobile tracking of natural gaze behavior in the wild, with a
virtual reality settingNo
CHAPTER
3.DEVELOPMENT
OFAFA
ALGORITHM
72
Table 3.1 – Continued from previous page
OEMs Product F (Hz) Portable? Application PD?
HTC Vive 250 Yes
To perform immersive scientific grade research. For a totally
controlled naturalistic study by making participants
immersed into the stimuli to understand perception,
visual search, UX studies and clinical research.
No
Eyelink Portable duo 2000 Yes
It can be used for eye-movement research, both in and out
of the lab. Can be programmed with its SDK using several
programming languages and on multiple operating systems.
Yes
100 Plus 2000 YesVideo-based tracking in head-supported or head-free mode.
Device can be configured in different mount modes.Yes
Eyelink II 500 Yes Head mounted video based and scene tracking Yes
Gaze Point GP3 HD 60 & 150 Yes
A research-grade eye tracker for usability
and UX studies and recommended for
programmers who want to develop
gaze-based applications
Yes
Laptop Mount NA Yes For use with laptops and notebooks NA
CHAPTER 3. DEVELOPMENT OF AFA ALGORITHM 73
There are some solutions for low-cost eye-trackers or web-cameras but they are not
yet of commercial grade [Fuhl et al., 2018, San Agustin et al., 2010, Kassner et al.,
2014, Mantiuk et al., 2012]. For all our studies, we made use of the Tobii X-60 and
captured data at 50Hz. Our experimental set up is shown in Figure 1.1.
.
Figure 3.1: Setup of eye tracker
Having itemised the various pupillometric devices, in the next section, we proceed
to explore several features of pupillary response data that can be used to understand
users’ affective response towards visual interaction.
3.2 Exploring pupillary response data
To explore the dynamics of pupillary response, we extracted data from a pre-existing
study. The original aim of this study was to understand the visual behaviours of
medical experts as they interpret ECG scans to improve the accuracy of ECG inter-
pretation. There were 31 participants - 23 (74%) female and 8 (26%) male. Most
of these participants (74.2%) were cardiac physiologists/technicians and students of
cardiac physiology, while the remaining 25.8% were of other health-related professions,
including nurses, doctors, and students. Students make up 12.9% of participants. All
participants had some training on ECG interpretation, although varying in level and
experience, Mdn = 7years(2− 35). The stimuli presented to the participants were 18
ECG scans in random order, without time limit, until they made their interpretation
of each scan. Further details about the initial study can be found elsewhere [Davies
CHAPTER 3. DEVELOPMENT OF AFA ALGORITHM 74
et al., 2016]. For our exploration, we selected two ECG stimuli, both having 12 leads
(grids that represent segments in the heart signal). Those two stimuli were selected
because they had the closest number of correct as incorrect interpretations so that
the analysis is balanced between these groups for statistical validity. Each stimulus
was split into different AOIs that correspond to each of the 12 ECG leads, using the
eye-tracking software see Figure 3.2. This makes it possible to relate areas where
Figure 3.2: Areas of interest overlaid on a 12-lead ECG
participants gazed at, to a specific lead on the ECG. The questions this secondary
data analysis aimed at exploring was, “is there a statistically significant difference in
the pupillary response of those who got it correct from those who got it wrong?”.
If there is, “can we gain a better understanding of affective states by analysing the
pupillary response of people who got it correct against those who got it wrong?”. This
may be possible if we assume that there is increased anxiety and cognitive load in
people who have limited understanding as opposed to people who indicate what the
ECG scan represents. It could also be hypothesised that those who are reputable or
experts at reading ECG scans will experience more stress due to the pressure on them
to have high ECG interpretation accuracies. Notwithstanding these hypotheses, the
final question is, “can we predict using statistical features such as measures of central
tendency (mean, median and mode) and measures of deviation (mean and standard
deviation) of the data, whether a medical practitioner will get an ECG interpretation
correct?”
In this exploration, we used an ECG scan of one of the conditions known as the an-
terior stemi. We used this because the stimulus was almost evenly split amongst the
CHAPTER 3. DEVELOPMENT OF AFA ALGORITHM 75
31 participants (16 of them interpreted the scan correct to be anterior stemi while
15 of them got it wrong). A feature on the eye tracker provides an accuracy level
by considering the number of blinks, tilting of the head and eye gazes that are not
within the eye tracker’s detection range. One of the participants that got it correct
was removed due to low accuracy based on this criteria. Eliminating the participant
left the data equally split into 15 correct, 15 incorrect participants.
The data was extracted from the eye tracker and loaded into Python IDE for explo-
ration. After cleaning the data by removing data points with quality less than 70%
(according to Tobii eye tracker), Table 3.2 shows a statistical description of the data
for the left and right pupil grouped by participants accuracy (correct, incorrect). It
Left pupil Right pupil
Statistical measures Correct Incorrect Correct Incorrect
Total count 125259 160230 125390 158337
Mean (mm) 3.73 3.55 3.72 3.55
Std (mm) 0.60 0.47 0.67 0.5
Min (mm) 1.66 1.30 1.00 1.25
Max (mm) 6.39 6.68 6.88 6.39
25% (mm) 3.26 3.24 3.25 3.15
50% (mm) 3.61 3.58 3.57 3.55
75% (mm) 4.06 3.90 3.97 3.88
Table 3.2: Statistical description of the pupil diameter
could be observed that the mean pupil diameter is higher for the group that got it
correct - Left (M=3.73, SD=0.60), Right (M=3.72, SD=0.67) compared to those that
got it wrong - Left (M=3.55, SD=0.47), Right (M=3.55, SD=0.50). It can also be
observed that the standard deviation of those who got it correct is greater than those
who got it wrong. A possible explanation could be that those who got it wrong were
scanning for clues on the ECG scan, and during their search, their pupil size remained
the same and because they found no information of interest (minimal/no arousal);
hence they have less standard deviation than those who got it correct.
We can see from Figure 3.3 that the data is not normally distributed. Therefore, we
CHAPTER 3. DEVELOPMENT OF AFA ALGORITHM 76
performed a non-parametric test of difference to see if there is a statistically signifi-
cant difference between the correct and incorrect participants in terms of their pupil
dilations.
The result of the parametric test is given below:
Figure 3.3: Distribution plot on the left pupil diameter (mm) of correct participants
Left Eye (correct vs. incorrect) U(8728562295.0), p < 0.01
Right Eye (correct vs. incorrect) U(8811297026.5), p < 0.01.
The result supports our hypothesis that the pupil dilation could be a discriminator of
participants who got the ECG readings correct from those who got it wrong.
To explore modelling techniques to be used as our discriminators, we extract certain
features from our datasets. The aim here is to utilise features that are discriminants of
those who got it correct from those who got it wrong. We carry out some pre-processing
on the data. Firstly, a triangulation-based smoothing technique was implemented on
the dataset. The smoothing technique works by triangulation to compute an average
value so that the resulting data is less noisy. The smoothing function uses a win-
dow size so that within the window size, rise/ fall in the curve will be smooth. The
translation of this regarding the physiology of the body is that several autonomic re-
sponses which could occur within seconds can be aggregated into signals that can be
recognised by the system as affective states. The accumulation of autonomic responses
CHAPTER 3. DEVELOPMENT OF AFA ALGORITHM 77
together makes it possible to understand the user’s attitude, while the accumulation
of patterns in attitudes form an affective state, see 2.1. Another advantage of this
smoothing technique is that it reduces the effects of outliers due to noisy data. Figure
3.4 and 3.5 show the effect of smoothing on our dataset. Selecting the correct window
size depends on the frequency rate of the datasets, i.e. how many data points were
captured per second by the eye tracker.
Using the smoothing function, it is now possible to see the troughs in graph for window
size (d = 10) at time (t = 60, 140, 160), and the peaks at t = 150, 170.
The next question is, are these features sufficient to detect or predict participants who
got it correct or wrong? In answering this, statistical features of the datasets were
extracted from the unrefined data and the resulting data after windowing is applied.
A total of 12 features were extracted: mean, median, the standard deviation of the
left and right pupil in both their unrefined and smoothed/windowed state. To know
the features that have the best relationship (negative or positive) with the accuracy of
the participants, we computed the correlation between each feature and the accuracy.
Table 3.3 shows the matrix of the correlation of features. In the next stage, our goal
was to eliminate the features that least discriminate between the participants who got
it correct and those that got it wrong. The metric that was used is Pearson’s corre-
lation. From the table 3.3, it shows the features that least correlate with accuracy:
the standard deviations of both pupils and the standard deviation of their smoothed
state. All others have a higher correlation of at least 0.19.
Next, we try out machine learning classifiers to discriminate between correct and in-
correct predictions, based on features of the pupil dilation.
Our model was trained using logistic regression, KNN, SVM and linear regression.
The higher accuracy was found in logistic regression (52.50%) while the highest was
KNN with 57.22% (at k = 3). All accuracy tests were done using cross-fold validation
(n = 10).
In summary, the secondary analysis of pupillary response data was used to discrim-
inate between medical practitioners that interpreted ECG scans correctly and those
that got it wrong. our model yielded the highest accuracy of 57.22% when using KNN
classifier. we discovered that measures of central tendency such as mean and median
on both the unrefined and windowed form data had a stronger correlation with the
CHAPTER 3. DEVELOPMENT OF AFA ALGORITHM 78
Figure 3.4: Plot of pupillary response (mm) against time (ms).
Figure 3.5: Plot of pupillary response (mm) against time (ms) after applying smooth-ing function with window size (d) = 1, 3, 5, 10.
accuracy of the participants than measures of variance. Windowing allowed us to
CHAPTER 3. DEVELOPMENT OF AFA ALGORITHM 79
observe an aggregation of data points. Assuming that the data points represent au-
tonomic responses (pupil dilation and constriction) the appropriate window size will
enable us to measure more reliably, the affective states of the participants, window
by window. Recall from the background section, that the aggregation of autonomic
responses represent a physiological reaction (2.1). It is this physiological response that
we plan to correlate with users’ affective state in our research.
3.3 Description of pupil dilation
The average range of the pupil size is estimated to be between 2-4mm in bright light
and 4-8mm in darkness, which means that the pupil, at any given time, could be
between 2-8mm [Walker et al., 1990]. Anatomical differences between individuals of
different demographics (race, gender, age) also mean that this range varies from in-
dividual to individual. Some studies refuted the hypothesis of cultural differences in
valence and dominance of an emotion [Ekman et al., 1987]. However, some others
suggested that the absolute intensity of emotion could vary by culture [Russell, 1994].
A study found that people’s reaction to an emotion change with age but it depends
on the valence of the emotion [Charles et al., 2001]. Other factors such as experience
level, intelligence, personality traits may influence the affective state of a user [Picard,
2003].
More so, the range and rate of change can be influenced by the type of stimulus, pre-
vious state (cognitive, affective, experience) of the individual and their environment
(ambience) [Wilder, 1958]. Even when all these conditions are constant, certain health
conditions prevent the pupils from responding predictably. To compound this, there is
a condition called anisocoria that exists in one out of five people where both pupils do
not follow the same physiological pattern of response [Ettinger et al., 1991]. Further-
more, data from the eye tracker can be noisy due to machine errors caused by blinks,
geometry change (position and distance) between eye tracker and user, and indistinct
pupil colours.
In summary, from the literature, and what we learnt from our data exploration in
Section 3.2, the pupillary response is characterised by a noisy time series data with
variability within and between individuals and between different stimuli types. Next,
CHAPTER 3. DEVELOPMENT OF AFA ALGORITHM 80
we examine how the pupillary response has been analysed by previous works.
3.4 Existing approaches to the analysis of pupil
data
In some studies that used multi-modal approaches, pupil dilation contributed the
greatest effect on the accuracy of their models, compared to the other sources of affec-
tive signal [Baltaci and Gokcay, 2014, Soleymani et al., 2012]. Since we have previously
eliminated multi-modal approaches due to availability constraints and ecological valid-
ity (costs, required skills to set-up, obtrusiveness), we will focus on uni-modal methods
that make use of pupillary responses alone.
Many approaches have been taken to analyse pupillary response data. Wang et al.
suggested three approaches for the analysis of pupil dilation: 1. mean pupil dilation,
2. latency to peak and 3. peak pupil dilation [Wang, 2011]. In controlled settings,
where participants interact with a stimulus for a fixed period, the average pupil
size of one eye can be taken, for each participant. Then, the average pupil dilation is
compared against a baseline. The baseline is often the average pupil size of a partic-
ipant while interacting with a controlled stimulus. The controlled stimuli could be a
grey background for which the participant gazes for about ten seconds. Wang et al.
used the pupillary response to discriminate cognitive workload under the influence of
confounding factors such as luminance conditions and emotional arousal [Wang et al.,
2013]. Wang et al. proposed using machine learning algorithms with pupil dilation
features to classify cognitive workload under the influence of these confounders. Iqbal
et al. discovered that using the percentage change in pupil size (PCPS) between the
task and the baseline by averaging the pupil dilations over time, is not an effective
discriminator of mental workload [Iqbal et al., 2004]. They suggested that this could
be due to longer tasks not having sustained pupillary responses and by including pe-
riods where the pupil size has dropped, it will significantly reduce the effect observed
from the data. After decomposing the tasks into smaller tasks, they were able to
observe differences in pupillary response. Partala et al. induced participants with
emotional sounds vs neutral sounds for a fixed duration [Partala and Surakka, 2004].
They observed that participants had higher average pupil dilations for the emotionally
CHAPTER 3. DEVELOPMENT OF AFA ALGORITHM 81
arousing sounds. Bradley et al. displayed emotionally arousing pictures from the IAPS
for a fixed duration of time and used the average pupil size as a discriminator [Bradley
et al., 2008a]. There was a significant difference in pupil dilation when participants
viewed neutral pictures compared to emotional pictures. The drawback of using the
averaging approach is that in naturalistic settings, interaction lasts for variable peri-
ods that may not be predetermined. Also, peoples’ baselines change depending on the
stimulus type and according to the previous affective state [Wang, 2011]; this is known
as the law of initial value (LIV). Another limitation of taking the average of the pupil
size is that the periods in which the participant do not experience increased arousal
are included in the average, thereby, weakening the strength of the signal at the point
where there is an actual change in arousal. Some pertinent questions that challenge
the viability of this approach in the wild are, how will existing methods of detecting
arousal perform if participants interact with stimulus: 1. As long as they wish, 2.
With stimuli of different brightness, 3. Different stimulus types (cognitive, emotional,
etc.) or 4. Without the opportunity to take the pupil baseline measurements?
Wang et al. proposed using pupil dilations and gaze behaviour as machine learning
features [Wang, 2011]. In this approach, statistical properties of the data such as the
mean, standard deviation, and other gaze metrics like saccades, fixation duration are
input as features to the machine learning algorithm. The disadvantage of this ap-
proach is that machine learning is often unreliable when people exhibit idiosyncratic
behaviours [Savva and Bianchi-Berthouze, 2011]. Also, different stimulus types and
contexts could trigger different reactions. For example, when participants experience
stress due to a certain AOI on the screen, they may have longer fixations and microsac-
cades around the same AOI since the source of stress is located at a specific region.
However, when they are stressed due to a search task, people may exhibit a more ran-
dom fixation pattern around the entire screen because there is no specific region on the
screen causing the stress. In these two cases, saccades and fixation metrics could follow
different trends, thereby, sending inconsistent signals to a machine learning algorithm.
Finally, machine learning does not present a transparent way for exploratory research
because it often involves complex algorithms that are not well suited for explaining
the relationships between the features and the outcome variables, in this case, pupil
dilation and arousal.
CHAPTER 3. DEVELOPMENT OF AFA ALGORITHM 82
In AFA algorithm, we used peak detection to sense moments of increased arousal,
while the amplitude of the peak indicates the strength of the arousal, i.e. the arousal
level (AL). The aim of our method is to analyse pupillary response data from an eye
tracker and generate an output, that is an array of time index, for moments where
users experience an increase in arousal. For each item in the index, we would also
identify the area of interest (AOI) where the user focused their attention on, prior
to experiencing an increase in arousal. Furthermore, for each item in the index, we
quantify the magnitude of increase in arousal that the participant experienced. In the
next section, we discuss how we developed AFA algorithm iteratively using static and
interactive stimuli types.
3.5 Iterative development of AFA algorithm
We took a data-driven methodology and developed our arousal sensing method through
the secondary analysis of 2 different studies: study 1 (3.5.1) - ECG static images and
study 2 (3.5.2) - interactive user interface. In the first instance, we used the ECG
study’s dataset to sense arousal for what we call an atomic stimulus. In this concept
of our design choice, an atomic stimulus would be a stimulus where we expect the
interaction to accomplish a single specific task. In this task, the content remains the
same (even though regions on the screen may elicit different affective states from other
regions). A stimulus is no longer atomic when a task contains multiple objectives or
the entire content of the screen changes during the task (e.g. navigating to another
Web page). This distinction is necessary to preserve the law of initial value (LIV). LIV
was first postulated by Wilder et al., and states that the initial physiological values and
its corresponding change (in response to stimulus) are negatively correlated [Wilder,
2014]. Some other researchers, like Jin et al. challenged this relationship and provided
evidence of spurious effects from confounding variables [Jin, 1992]. As this debate is
outside the scope of our work, to avoid the effects of the initial value, we analyse each
stimulus atomically so that we identify a baseline within an atomic stimulus, then sense
changes within the atomic stimulus. For a picture viewing event, we take each image
as an atomic stimulus. We developed our method for analysing such interaction using
the ECG dataset discussed in Subsection 3.5.1. For user interaction involving multiple
CHAPTER 3. DEVELOPMENT OF AFA ALGORITHM 83
atomic stimuli, we can divide each task, such as web searching and filling in a form
to be different atomic tasks/stimulus. We developed AFA algorithm for segmenting a
session of user interaction into atomic tasks in the second study with a study that used
an interactive user interface as stimuli. We expand more on this in subsection 3.5.2.
3.5.1 Study 1
For this study, we reused the dataset that was described in Section 3.2. The eye tracker
captured data at a frequency of 50Hz, i.e. 50 records per second. The dataset for a
participant viewing a single stimulus over a 30 second dwell time will contain approx-
imately 1500 record, which is much more than the anticipated frequency of change in
arousal that could occur within 30 seconds. The relevance of analysing this data for
AFA algorithm was to use data-driven techniques to find out the optimal data aggre-
gation size and technique that most accurately detects when the participants felt an
increase in arousal. The ground truth we used here was the participants self-reported
arousal, and the medical experts annotation of their thought process regarding where
they looked at that informed their decision making for interpreting the ECG scan. We
explored two different fixed-size windowing techniques: simple moving averages and
non-overlapping windows. We aggregated both using three aggregate sizes of 5, 50
and 100 records. Data was collected at 50Hz, so these are equivalent to 0.1s, 1s and
2s respectively. For participants to make an accurate interpretation of the ECG scan,
there are leads that must be observed, as they reveal the abnormalities in an ECG
signal. Our hypothesis was that: Participants will experience an increase in arousal
when they gaze at these salient leads of the ECG (for instance, H, I and J are salient
leads/AOIs for the anterior stemi condition). To determine the optimal setting (win-
dowing technique and aggregate size), we randomly extracted participants data and
applied the algorithm under different settings. We used statistical specificity (recall)
to select the best configuration. The specificity was computed by
Specificity = Correctly detected peaks (True Positive)correctly detected peaks (True Positive)+ incorrectly rejected peaks (False Negative)
This was a suitable evaluation metric for us (rather than the precision, or F-value)
because our ground truth only stipulates some instances (not all instances) where we
expect increased arousal. Also, since the aim of the algorithm is to sense changes
CHAPTER 3. DEVELOPMENT OF AFA ALGORITHM 84
in arousal, we are not interested in using the algorithm to detect the true negative
rate (i.e. when there should not be a peak). However, to reduce the likelihood of
false positives, we selected the setting with the highest accuracy but fewest number of
peaks, as due to confounding factors, some peaks may not be an increase in arousal.
We selected the fixed non-overlapping window size over moving average because com-
pared to the moving average, this technique also summarises the data. AFA algorithm
performed better using 50Hz (1s) because the specificity was 100% compared to 83%
with 5Hz (0.5s). Although 100Hz (2s) also returned 100% specificity, it was difficult
to identify the most fixated AOI during a particular window because over a longer
period, and there could be multiple fixations on different AOIs. Also, with a window
size of 2s, there is an increased likelihood to miss out a peak in arousal because it
takes between 1 and 3 seconds for the pupil size to reach its maximum response (to
a stimulus). Considering that AFA algorithm was built using the opinion of the re-
searcher (also a medical practitioner) who collected the data, we decided to evaluate
the approach using the participant’s self-reported feedback and other variables. We
explore the following variables:
1. Accuracy: This refers to the correctness of their interpretation of the ECG
signal. It is vital to note here that some of the ECG scans were less specific to a
medical condition, and it was possible for participants to get the interpretation
partially correct. In such instances, participants were classified as incorrect,
as this is also an unsatisfactory situation in the real world. For accuracy, we
assumed that participants who got the interpretation wrong might have found the
task difficult, thereby, applying more mental effort. Our assumption is based on
several studies that show negative correlations between stress and performance
[WELFORD, 1973, Akgun and Ciarrochi, 2003, Lazarus et al., 1952]. Stress
results in increased arousal, while accuracy is a proxy for measuring performance
[Scholtz, 2006]. Therefore, We assign a weighting (w) of +0.5 to participants who
got the interpretation wrong, and -0.5 to participants who got it correct.
2. Time spent on interpretation: This is the total duration spent from the start
of stimuli presented to the end, in milliseconds (ms). We assumed here that
the time spent will increase the level of stress experienced by the participant
CHAPTER 3. DEVELOPMENT OF AFA ALGORITHM 85
because spending long on a problem indicates that they find it difficult, thereby,
demanding more cognitive effort. This correlation is evidenced in the literature
of usability metrics [Hornbæk and Law, 2007]. We assign a weighted value (w)
of +0.5 to participants who are in the first third for this variable, and -0.5 to
participants in the lower third.
3. Participant’s perceived difficulty of the experiment: We transcoded the
participant’s reported difficulty of the experiment on a scale of 1 to 10, with
one representing easy and ten indicating difficult. As evidenced in the literature
[Gellatly and Meyer, 1992], our assumption for this variable is that the difficulty
level would increase the level of arousal. We assign a weighting (w) of +0.5 to
participants who are in the first third for this variable and -0.5 to participants
in the lower third.
4. Experience level of participants: Participants included students, physiolo-
gists, nurses, cardiologists, healthcare assistant, etc. Participants years of expe-
rience ranged from 2-year nursing student to an advanced cardiac physiologist
with 30 years of work experience. Participant’s experience level was inferred by
their job type and years of experience. For example, a third-year medical stu-
dent would be assigned an experience level of 3 while a newly hired Cardiology
registrar will be assigned an experience level of 6, to account for the five years
of training. Consequently, the cardiac physiologist with 30 years of work expe-
rience was assigned an experience level of 35 to include five years in training.
Following evidence in the literature, we assume that experience level would in-
creases the anxiety, thereby increasing their arousal level [Wahn et al., 2016]. We
assign a weighting (w) of +1 to participants who are in the first third in terms
of experience, and -1 to participants in the lower third in terms of experience.
Finally, we sum the scores for each participant for the four variables. This sum
represents the expected level of stress (arousal) for that task. We compare this against
the number of arousal points (peaks) for each participant. Table3.4 shows the correla-
tion matrix and Figure 3.6 illustrates this in the form of a heatmap where the darker
cells indicate a higher correlation than the lighter cells. Figure 3.6 reveals that the
accuracy had the least correlation with arousal, followed by the time spent, then the
CHAPTER 3. DEVELOPMENT OF AFA ALGORITHM 86
Figure 3.6: Heatmap to illustrate our predictor variables(Exerpeince, Accuracy, Timespent, Difficulty), and the total stress score with our outcome variable (No. of Arousalpoints - peaks
participant’s perceived difficulty level. The experience level was the variable that had
the highest correlation to arousal. The moderate correlation between the expected
stress score and the number of arousal points was r(30) = 0.62, p <= 0.01. Using
this study, we developed an approach to evaluate static (atomic) stimuli. In the next
study, we examine the case of interactive stimuli.
3.5.2 Study 2
Having examined static (atomic stimuli), this study aims to develop an approach to
sense arousal from given interactive stimuli. The original aim of this experiment that
we adopted in this study was to evaluate the impact of a plug-in that was built to
support the users in authoring and managing ontologies. Participants were required to
CHAPTER 3. DEVELOPMENT OF AFA ALGORITHM 87
have expertise in ontologies. From the data, twenty-nine (29) participants’ eye-tracking
data were captured; fourteen (14) male, fifteen (15) female, between twenty-two (22)
and fifty-seven (57) years of age (mean 33.28). Participants interacted with Protege,
version 5.0, which is an open-source ontology engineering tool developed by the Uni-
versity of Manchester and Stanford University. Protege was pre-configured with the
inference inspector and a custom developed plugin Protege Survey Tool (PST). For
this study, the user interface was segmented into 5 AOI’s - view, progress, scenario,
question and action; Figure 3 shows this. Figure 3: Protege user interface with
Figure 3.7: Areas of interest overlaid on a Protege’s UI
Areas of Interest overlaid. See [Matentzoglu et al., 2016] for more information about
the study. This dataset poses a unique challenge because it contains user interaction
and the data spans for a longer time compared to Study 1. This is an example of an
interaction that we expect to have multiple atomic stimuli within it. Although it was
possible to manually segment the interaction into the tasks that the participants were
given, we chose not to do so because, in naturalistic settings, AFA algorithm would
not have real-time knowledge of the task start/end times. The relevance of analysing
this dataset was first to decide on a technique to split the data into chunks while re-
specting the temporal order, and secondly, to use AFA algorithm to assign areas on the
screen with arousal levels for the users’ interaction. For the segmentation, we explored
several techniques including fixed-size segmentation, clustering and changepoint de-
tection. The choice of an approach to use depends on the context of the application,
CHAPTER 3. DEVELOPMENT OF AFA ALGORITHM 88
the parameters available, and the end goal.
Fixed-size segmentation would ensure that datasets are divided into fixed sizes. How-
ever, in using this approach, there is an increased likelihood that the same task could
be split into two atomic stimuli. Clustering is another option for grouping together
tasks with similar characteristics, but the temporal order of the interaction needs to
be considered. It will be incorrect to assemble a segment that consists of data points
belonging to different temporal spaces. Another restriction with clustering is that
clustering is usually applied when we have the entire datasets, which means that it is
not a practical solution for real-time applications. By real-time applications, we mean
applications in which new data points are being generated as the user interacts. As
new data points are formed, it will be difficult and sometimes impossible to anticipate
how new clusters should be composed apriori. Pattern recognition of cyclic (repeti-
tive, periodic) changes could also be another approach to segmenting an interaction
into atomic stimuli. This detects when certain events repeat themselves. However,
it is not always the case that tasks have a cyclic repetition pattern. This approach,
when applied in tasks where there are no cyclic or repetitive actions, will not support
segmentation. Another option is changepoint detection. Change point detection algo-
rithm computes the likelihood of change in the statistical properties of a dataset and
the likelihood that the change takes place at a certain record in the dataset. Change-
point detection is implementable in non-real time and real-time data. Inputting a
certain property such as interaction evens into the changepoint algorithm is possible.
Interaction events such as clicks or typing can be fed into the changepoint detection
algorithm. There are univariate and multivariate changepoint detection algorithms.
The univariate takes a single variable to detect a change, while the multivariate de-
tects changes in multiple variables to assign probabilistic values to whether a certain
data point is a change point. The flexibility and robustness of this technique informed
our decision to use this algorithm to split long interaction data (e.g., greater than 5s)
informed our choice of this approach.
In AFA algorithm, we used the Bayesian changepoint detection implemented in Python by Johannes Kulick (MIT licence, 2014). The input to the method is the series of user event durations, and the output is a vector of the same length containing, for each record, the probability that a change in the event pattern occurs at that point. Figure 3.8
shows a sample result of detecting change points between input events (mouse clicks, typing, scrolling) during a task. A cut-off value can be set to accept a point as a change point. For example, the interaction in Figure 3.8 lasted approximately 600s; with the cut-off probability set at 0.8 (80%), we obtain 21 changepoints for that period, roughly two per minute. Using this threshold of 80%, we segmented the interaction into atomic segments.
Figure 3.8: Top - input events; Bottom - probability of a change point
Then, we applied AFA algorithm, discussed in Section 3.6, to each segment. We took the domain expert's rating, on a scale of 1 to 5, of how much arousal is expected per AOI, considering cognitive demand, attention,
anxiety and stress. We computed the mean of all four variables to generate an ex-
pected arousal value for each AOI. Next, we performed our arousal analysis using AFA
algorithm described in Section 3.6, then calculated the sum of all the arousal levels
obtained for each peak, for each participant, on each AOI. The result is presented
in Table 3.5. Following this, we performed Spearman’s correlation test between the
result of AFA algorithm and the arousal rating obtained from the domain expert. We observed a strong positive correlation, r(5) = 0.82, although it did not reach statistical significance (p = 0.089). This result shows promise, and we anticipate that a larger sample would yield a lower p-value.
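To make the segmentation step concrete, the following is a minimal Python sketch. It assumes the changepoint routine has already produced a probability vector (one probability per record, as described above); the function and variable names are ours, purely for illustration, and the library call that produces the probabilities is omitted because its interface varies across versions.

import numpy as np

def segment_by_changepoints(samples, change_prob, cutoff=0.8):
    """Split an interaction into atomic segments at likely change points.

    samples     : records (e.g., user event durations), in temporal order
    change_prob : probability that a change occurs at each record
    cutoff      : minimum probability for a record to count as a change point
    """
    samples = np.asarray(samples)
    change_prob = np.asarray(change_prob)
    boundaries = np.flatnonzero(change_prob >= cutoff)  # indices of change points
    # np.split respects the temporal order: each piece is one atomic segment
    return np.split(samples, boundaries)

# Hypothetical usage: event durations and their change probabilities
durations = [0.4, 0.5, 0.4, 2.1, 0.3, 0.4, 1.9, 0.5]
probs = [0.0, 0.1, 0.2, 0.9, 0.1, 0.1, 0.85, 0.2]
segments = segment_by_changepoints(durations, probs)  # three atomic segments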
3.6 Implementation
The execution flow of AFA algorithm is illustrated in Figure 3.9.
Figure 3.9: Execution flow of our arousal detection approach
Figure 3.10: Graph of Arousal Level against time (s)
Pupillary response, which is the diameter of the pupil in millimetres, is captured by an eye tracker at a specified sampling frequency. For a 50Hz capture rate, the eye tracker records the diameter of the pupil once every 20ms.
Function 1: Non-overlapping window aggregation
Input: Array of pupil dilations (pupilDilation[]), eye-tracking frequency (frequency)
Output: Array of non-overlapping windows (window[])
windowIndex = 0;
index = 0;
for each element in pupilDilation[] do
    append element to tempArray;
    index++;
    if (index is divisible by frequency) then
        window[windowIndex] = median(tempArray);
        windowIndex++;
        clear tempArray;
    end
end
return window[]
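For concreteness, a minimal Python rendering of Function 1 might look as follows (a sketch, not the thesis implementation; it assumes the pupil samples arrive as a flat list and, like the pseudocode, emits only complete windows):

import statistics

def aggregate_windows(pupil_dilation, frequency):
    """Aggregate raw pupil samples into non-overlapping (tumbling) windows.

    pupil_dilation : pupil diameters in mm, one sample per capture tick
    frequency      : eye-tracker capture rate in Hz, so each window spans 1s
    """
    windows = []
    # Step through the samples one full window (1s of data) at a time;
    # a trailing partial window is dropped, as in Function 1
    for start in range(0, len(pupil_dilation) - frequency + 1, frequency):
        chunk = pupil_dilation[start:start + frequency]
        windows.append(statistics.median(chunk))  # the median evens out outliers
    return windows

For a 50Hz tracker, for example, 150 samples yield three one-second windows.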
The analysis is split into three major phases: 1. data transformation (preparation, aggregation, and transforming the pupil data to categorical data that represents each window's level of arousal); 2. peak detection (to detect individual changes in levels of arousal); and 3. inference (to compute the cumulative impact of arousal when participants focus on an area of interest on the screen). In the first phase, the data is cleaned using linear interpolation to replace missing values and blinks. The data for both Study 1 and Study 2 were cleaned using linear interpolation to replace missing values and outliers (values below 2mm and above 8mm, which are outside the range of typical pupil sizes).
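A minimal pandas sketch of this cleaning step, assuming the raw samples are held in a Series (illustrative only; the thesis implementation may differ):

import pandas as pd

def clean_pupil_signal(raw):
    """Replace impossible pupil values and gaps using linear interpolation."""
    s = pd.Series(raw, dtype="float64")
    # Mask blinks and noise: plausible pupil diameters lie between 2 and 8mm
    s = s.mask((s < 2) | (s > 8))
    # Linearly interpolate across the masked and missing samples
    return s.interpolate(method="linear", limit_direction="both")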
After cleaning, the highly granular data is aggregated into fixed-size windows at non-
overlapping contiguous time intervals (tumbling windows) as expressed in Function
1. The window size should be equal to the eye-tracking capture frequency in Hertz
(Hz) so that every window is 1s long. The rationale for this window size is that it takes approximately 400ms for the human pupil to react to a cognitive stimulus, and up to 600ms for an emotional stimulus. However, it takes between 2 and 3 seconds from the time of exposure to a stimulus for a participant to attain a peak in arousal. The aggregation of 1s is short enough that peaks are not split across windows, but long enough to reduce the effects of outliers in each window. This is done for each stimulus for
each participant. The output of Function 1 is passed into a function which transforms each element into categorical data that represents the level of arousal. This function computes the range of the individual's pupil dilation and the unit of change for that participant, so that arousal can be modelled based on each person's pupil characteristics. Figure 3.11a shows the raw pupil signal from the eye tracker. Figure 3.11b shows the signal after aggregating it into windows of size 1s (50 samples per window at 50Hz in our case). Finally, Figure 3.11c shows the arousal levels after transforming the aggregated data according to each participant's physiological characteristics (range and measure of central tendency). The pseudocode for this process is expressed in Function 2.
Function 2: Data transformation
Input: Array of non-overlapping windows (window[]), number of levels (levels)
Output: The level of arousal for each window (window[])
index = 0;
pupilRange = max(window[]) - min(window[]);
unitOfChange = pupilRange / (levels - 1);
med = median(window[]);
for each element in window[] do
    change = window[index] - med;
    window[index] = change / unitOfChange;
    index++;
end
return window[]
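A Python sketch of Function 2 under the same assumptions (the helper name is ours; the final rounding, which yields discrete levels, is our addition, as the pseudocode leaves the values continuous):

import statistics

def to_arousal_levels(windows, levels):
    """Map aggregated pupil windows onto per-participant arousal levels."""
    pupil_range = max(windows) - min(windows)
    # The unit of change is derived from the individual's own pupil range,
    # so the levels are relative to that participant's physiology
    unit_of_change = pupil_range / (levels - 1)
    med = statistics.median(windows)
    return [round((w - med) / unit_of_change) for w in windows]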
Now that the pupil dilations have been modelled into categorical levels of arousal, we must detect when a participant has experienced a change in arousal. The second phase uses a peak detection function to sense increases in arousal levels,
expressed in Function 3. The array returned by Function 3 contains the indices of
windows where the participant has experienced an increase in arousal. To compute
the magnitude of the increase in arousal (V), we calculate the difference between the
arousal level at each peak index and the lowest point before the peak. This magnitude
does not tell us the true impact of this arousal because we know that the more the
participant interacts with an arousing stimulus, the greater we expect the impact of
the stimulus to be [Iqbal et al., 2004]. Therefore, in the third phase, the impact of an area of interest (a) on arousal (A) is computed by:
Function 3: Arousal peak detection
Input: Array of categorical levels of arousal per window (window[])
Output: Array containing indices of peaks detected (peakIndices[])
for index = 1 to length(window[]) - 2 do
    if ((window[index] > window[index - 1]) and (window[index] >= window[index + 1])) then
        append index to peakIndices[];
    end
end
return peakIndices[]
A(a) = (∑V) × t
That is, the arousal magnitudes (V) are summed and multiplied by the participant's total fixation duration (t) on the area of interest (a). The hypothesis is that the arousal magnitude (V) represents the intensity of the stimulus, but the impact of the stimulus on the participant
is compounded by the total duration spent on the stimulus (t). We are aware that
many algorithms for sensing emotion tend to normalise their outputs by time. What that measures is the intensity of the stimulus, rather than its effect on the individual.
Since we aim to use AFA algorithm for sensing arousal during user interaction, for the
purpose of improving user interaction, we need to measure the total impact that a
stimulus has on a participant. Take, for instance, the case of boiling water in a pot. The intensity of the heat (analogous to the intensity of the stimulus) can be measured by the temperature. However, to measure the impact of the heat from a furnace on the water, there would be a difference between a pot placed on the furnace for 1 second and one left there for 10 minutes. Therefore, to compute the impact on the water (analogous to users), we must consider the temperature (intensity) as well as the time spent on the furnace (duration).
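For illustration, the second and third phases can be sketched in Python as follows (a sketch under the definitions above; the helper names are ours, and the per-AOI fixation duration is assumed to be available from the eye tracker):

def detect_peaks(window):
    """Function 3: indices where the arousal level rises to a local maximum."""
    peak_indices = []
    for i in range(1, len(window) - 1):
        # A peak is strictly above the previous window and not below the next
        if window[i] > window[i - 1] and window[i] >= window[i + 1]:
            peak_indices.append(i)
    return peak_indices

def aoi_impact(peak_magnitudes, fixation_duration):
    """Inference phase: A(a) = (sum of V) * t for one area of interest.

    peak_magnitudes   : increases in arousal (V) for peaks attributed to the AOI
    fixation_duration : total time t (in seconds) spent fixating on the AOI
    """
    return sum(peak_magnitudes) * fixation_duration

For example, three peaks of magnitudes 2, 1 and 3 on an AOI fixated on for 12.5s in total give aoi_impact([2, 1, 3], 12.5) = 75.0.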
Henceforth, our project methodology, methods, and the algorithm we developed will be referred to as the Algorithm for sensing Arousal and Focal Attention (AFA). The visualisation of the output of AFA algorithm is presented in the next section.
3.7 Visualising the output of AFA algorithm
The output of AFA algorithm is an array of time indexes where participants experience
an increase in arousal (a peak). For each index, we extract: 1. the magnitude of the increase; 2. the time the increase occurs; and 3. the most fixated area of interest during the
moment of increase. Therefore, with these three variables, we can explore changes
in arousal in the form of temporal trends (time), or as a function of the users’ focal
attention (AOIs). Temporal trends will enable researchers/observers to understand
how the measure of arousal changes with time. For example, participants experienc-
ing a decline in arousal from the first fixation on the web page to when they exit the
web page may indicate a progressive loss of interest or boredom. This is an example of a hypothesis that can be generated and that the researcher could investigate further. Observing patterns of arousal from the perspective of the
users’ visual attention could inform researchers or designers of components/contents
on the web page that cause users stress or frustration. For example, a confusing UI
control may cause users to experience an increase in arousal when they fixate on it.
Our initial prototype was a dynamic and interactive visualisation. It enables users to replay the dynamics of arousal for multiple participants over time and by region of focal attention (AOI) on a stimulus. Figure 3.12 is a screenshot of this prototype.
This prototype is divided into 11 segments, which we will describe further. After car-
rying out a study using an eye tracker, the data is extracted from the eye tracker and
analysed using AFA algorithm. In Segment 2 (the file loading panel), the investigator
uploads the image file of the stimulus (.png, .jpg, .gif), the coordinates that define
the areas of interest (.csv), the aggregated result of the analysis (.csv), and the break-
down of the analysis by participants (.csv). Segment 7 displays the AOIs as coloured grids overlaid onto the stimulus image. The AOIs are also labelled accordingly. In
this example, the stimulus is an image of an ECG scan that was divided into thirteen
AOIs (A-M). In instances where the algorithm is unable to identify the most fixated
AOIs during a moment of arousal, Segment 3 still displays the level of arousal by the
participants. The coloured grids are similar to heatmaps, as the colour changes with
respect to the level of arousal. The key that maps to the levels of arousal (from low to
high) is displayed in Segment 6. The colour scheme in this example ranges from white
(low) → Yellow → Red (high), but there are other schemes that can be selected from
Segment 5, according to the investigator's preference. The arousal levels displayed in Segment 7 are the cumulative arousal for the participants selected in Segment
8. All participants can be selected/deselected using the control in Segment 4. Individ-
ual arousal levels are broken down into participants in Segment 9, where the previous
arousal state (with the label indicating the focal attention) is shown in the first grid,
the current state shown on Segment 7 is shown in the second, and the next state is
shown in the third. As we stated before, the display in Segment 7 changes with time
(in seconds). Therefore, Segment 10 allows the investigator to change (or view) the time frame for which the arousal levels in Segment 7 are shown. Segment 11 allows the
investigator to control the replay (pause and play), and the playing speed.
This tool allows researchers to formulate hypotheses by observing trends temporally and spatially across different AOIs. The ability to select participants also means that outliers can be spotted. Group comparisons can also be made, but not easily achieved
simultaneously since one interface controls the visual exploration. Another limitation
of this approach is that, because the visualisation is replayed as a video, it does not
present an overview of the arousal dynamics at one glance. Therefore, the visuali-
sation offers limited options to communicate a pattern to fellow researchers for print
media (e.g. publications) as taking a screenshot of the visualisation will only reveal
a period of the interaction. This visualisation was built using JavaScript, CSS and HTML5, while AFA algorithm was developed in the Python programming language, which means that there was no software integration between the visualisation and the analysis. Due to the limitations mentioned above, we designed a visualisation toolkit
that enables UX researchers to analyse their data from the eye tracker and view the
dynamics of their participants’ arousal ass a single frame, rather than as a video. Be-
cause this toolkit was developed using the Flask-Django framework for python, as well
as HTML5, CSS and Javascript, it enables researchers to interact with the software
from the data selection, to the analysis, all within the same UI environment. In our
design, we have two modes of interaction: one offers the option of viewing participants' changes in arousal as a function of time, and the other visualises the arousal as a function of their visual attention over the AOIs that they fixated upon. Figure 3.13
shows a screenshot of the user interface of our arousal toolkit.
Segment 1 of the toolkit is the menu. Following the “Enter data” hyperlink, the
researcher can load their eye-tracking dataset, followed by the execution of AFA algo-
rithm. The next hyperlink, “Analysis”, takes the researcher to the page displayed in Figure 3.13, where the visual analysis/exploration takes place. In Segment 2, the researcher selects either the “Arousal timeline” mode or the “Arousal areas” mode. Segment 3 allows the researcher to select the participants to be extracted from the result of the eye-tracking analysis for visualisation. In Segment 4, the researcher selects the stimulus
they are interested in. The result is then visualised in Segment 5 (the work area).
Segment 6 shows the key for interpreting the arousal bubbles in the Arousal timeline
graph. As we stated, there are two modes of visualisation in our arousal toolkit: Figure 3.14a is the arousal timeline mode, and Figure 3.14b is the arousal areas mode. In the first mode, all peaks are plotted on a time scale, and the size of the circle represents the magnitude of the peak (arousal level). It is colour coded by participant in case the interaction lasts a long time and we have to scroll to view its later parts. This visualisation is not so much about where participants are looking, even
though we can see this detail when the mouse pointer hovers on a peak. For example,
we can see that Participant P2M had a peak in arousal with a magnitude (AL) = ‘7’. This peak happened 50 seconds into the interaction while his attention was on
AOI ‘B’.
In the second mode, researchers can visualise the measure of arousal induced by an
area of interest (A, B, C, D) for each participant. The lowest row shows a cumulative
value for all the participants for an area of interest. The darker the cell, the higher the arousal, as indicated by the key on the right side of the figure.
The aim is to be able to input eye-tracking datasets into our tool and generate visualisations of the user's arousal dynamics that can be included in scientific papers, presentations or UX evaluation reports. The toolkit can also be used to formulate hypotheses that can be investigated using other formal statistical or qualitative research methods.
3.8 Conclusion
In this chapter, we explored existing devices for capturing eye-tracking data. For our
studies, we chose a low-end portable eye tracker, as we anticipate that web-cameras
with such fidelity would soon become mainstream. We discussed the rationale for
our technique, especially the use of peak detection to sense arousal. As we said, it
provides the opportunity for an event-based approach, where moments of arousal can
be detected along with the visual attention of users during these moments. This
event-based approach may be used to facilitate third-party applications such that
actions can be triggered when the arousal level reaches a certain threshold or when certain visual elements trigger arousal. In situations where arousal is being tracked to sense attention, third-party applications that utilise AFA algorithm could draw the attention of participants to parts of the screen that they may have missed but that are crucial to them on the platform. Third-party applications could be developed to automatically change the layout of a web page to present less information in the case of high cognitive load. Further possibilities include displaying tooltip text over visual elements that cause stress on a page, or suggesting breaks in times of extreme stress, thereby preventing mental
fatigue. We are aware that there are several causes of arousal, including
responses to emotion-evoking stimuli, cognitive load and frustration. The evaluation
of AFA algorithm aims to cover these sources of arousal to assess the generalisability of
our findings. Therefore, we designed three lab-based experiments to assess the ability
of AFA algorithm to detect arousal induced by these stimuli types. We discuss these
experiments in the next two chapters.
Table 3.3: Matrix showing the Pearson's correlation of statistical features (L - left, R - right, W - window, std - standard deviation)

          Rmean  Lmean  Rstd   Lstd   Rmedian Lmedian RWmean LWmean RWstd  LWstd  RWmedian LWmedian Accuracy
Rmean     1.00   0.94   0.41   0.09   0.99    0.95    1.00   0.94   0.40   0.11   0.99     0.95     0.19
Lmean     0.94   1.00   0.34   0.11   0.93    1.00    0.94   1.00   0.35   0.16   0.93     0.99     0.23
Rstd      0.41   0.34   1.00   0.63   0.35    0.32    0.41   0.34   0.92   0.61   0.36     0.36     -0.07
Lstd      0.09   0.11   0.63   1.00   0.07    0.09    0.10   0.12   0.59   0.94   0.08     0.12     0.05
Rmedian   0.99   0.93   0.35   0.07   1.00    0.94    0.99   0.94   0.33   0.08   1.00     0.94     0.20
Lmedian   0.95   1.00   0.32   0.09   0.94    1.00    0.95   1.00   0.33   0.12   0.94     0.99     0.24
RWmean    1.00   0.94   0.41   0.10   1.00    0.95    1.00   0.94   0.41   0.12   0.99     0.95     0.19
LWmean    0.94   1.00   0.34   0.12   0.94    1.00    0.94   1.00   0.36   0.16   0.94     0.99     0.22
RWstd     0.40   0.35   0.92   0.59   0.33    0.33    0.41   0.36   1.00   0.67   0.34     0.36     -0.08
LWstd     0.11   0.16   0.60   0.94   0.08    0.12    0.12   0.16   0.67   1.00   0.09     0.15     0.02
RWmedian  0.99   0.93   0.36   0.08   1.00    0.94    0.99   0.94   0.34   0.09   1.00     0.95     0.19
LWmedian  0.95   0.99   0.36   0.12   0.94    0.99    0.95   0.99   0.36   0.15   0.95     1.00     0.21
Accuracy  0.19   0.23   -0.07  0.05   0.20    0.24    0.19   0.22   -0.08  0.02   0.19     0.21     1.00
Table 3.4: Matrix comparing our predictor variables (Experience, Accuracy, Time spent, Difficulty) and the total stress score with our outcome variable (No. of arousal points - peaks)

                       Experience  Accuracy  Time spent  Difficulty  Stress score  No. of arousal points
Experience             1.000       -0.027    -0.179      -0.492      0.505         0.454
Accuracy               -0.027      1.000     -0.165      -0.218      -0.757        -0.278
Time spent             -0.179      -0.165    1.000       0.078       -0.097        -0.200
Difficulty             -0.491      -0.218    0.078       1.000       0.012         0.005
Stress score           0.505       -0.757    -0.097      0.012       1.000         0.618
No. of arousal points  0.454       -0.278    -0.200      0.005       0.618         1.000
Table 3.5: Expected arousal level (M) vs. computed arousal level (Output)

                AOI
            Scenario  Action  Progress  View  Question
Cognition   1         2       1         5     4
Attention   1         2       1         5     5
Anxiety     1         1       2         5     5
Stress      1         2       1         5     4
M           1         1.75    1.75      5     4.5
Output      99        533     17        3464  1526
(a) Participant's raw pupil signal from the eye tracker
(b) Participant's aggregated signal (window size = 1s)
(c) Participant's discretised arousal levels after converting to a scale (1-9)
Figure 3.11: A comparison of the raw pupil dilation extracted from the eye tracker with the processed arousal signal, after converting to arousal levels
Figure 3.12: Arousal explorer tool
Figure 3.13: Arousal toolkit
(a) Arousal timeline
(b) Arousal areas as heat map
Figure 3.14: Modes of visualisation in our arousal toolkit
Chapter 4
Evaluating AFA algorithm on emotionally evoked and cognitively induced arousal
In the previous chapter, we proposed our methodology for sensing arousal. As we
said, our methodology works by aggregating pupil data extracted from the eye tracker
to even out the effect of outliers. The aggregated data is then modelled to fit each user's range and measures of central tendency. Further, we detect peaks, which are moments where participants experience an increase in arousal. For each peak, we compute the magnitude of change and identify the user's focal attention during that moment of
increased arousal. Finally, we multiply the magnitude of arousal caused by each AOI
with the duration spent fixating on that AOI, to compute the cumulative measure
of arousal due to the AOI. Recall that in our problem statement (Chapter 1), we
identified possible reasons why affect detection has seen limited ubiquitous, widespread use. The most critical reason was the selection of the affect detection mechanism. Another reason we identified was noise and idiosyncratic physiological responses. Our method (Chapter 3) tackled this through data cleansing, aggregation and fitting the data to individual baselines. We address confounding factors such as participants' previous affective states (the law of initial values, LIV), stimulus-
response specificity and colour intensity through our concept of an atomic stimulus.
By considering each stimulus in isolation, and thereby detecting intra-stimulus changes using our peak detection approach, we were able to circumvent the challenges above.
In this chapter, we focus on another factor that often yields inconsistent results in the
field of affective computing, which is the lack of generalisability of affect detection due
to varying stimulus types. We state why the evaluation we carry out in this chapter
is a vital step towards attaining a generalisable solution.
4.1 Rationale and motivation
Physiological arousal could indicate the presence of stress [Zhai and Barreto, 2006, Sun
et al., 2010], boredom [Chanel et al., 2008], attention [Lang, 1990] and cognitive load
[Shi et al., 2007]. Other states, such as intense joy, sexual feelings, anger and surprise, also result in increased arousal, even though they are considered emotional states. There are yet other states that can be a combination of both cognitive and emotional states, such as anticipation, activation, and frustration. These concepts differ in seman-
tic meaning to individuals who experience them but may share certain physiological
characteristics if they result in increased arousal. However, certain approaches may
be capable of sensing one category of arousal accurately, but fail to sense arousal in
another, as emotional processing and cognitive processing could follow different neu-
rological pathways. The need to treat emotions and cognition as separate concepts is
a source of debate amongst researchers from different domains.
For instance, psychologists and neuroscientists have expressed divided opinions over
the premise that “emotion and cognition are distinct concepts” or whether “emotions
are part of cognition”. Gerrod Parrott et al. argued that for humans to adapt to
emotions, we must anticipate, interpret and perform problem-solving functions, all of
which require cognitive processing [Parrott and Schulkin, 1993]. They also argued that
the brain’s central system of control for its sensory functions means that no part of the
brain is purely emotional [Parrott and Schulkin, 1993]. Joseph E. LeDow’s attempted
to debunk this view using the analogy of the brains mechanism for interpreting visual
perception and reacting accordingly with motor actions [Ledoux, 1993]. When we see
a potentially harmful object, there is some cooperation between the vision and the
motor function of the brain, to avert danger (e.g., by moving away from the object).
Despite this overlap between the visual and motor faculty, the separationist view is
that it is useful to examine the vision and motion functions independently [Ledoux,
1993]. This argument is beyond the scope of our research, and our aim is not to inves-
tigate these theories. However, since people encounter emotionally evoking contents,
as well as cognitively demanding contents during user interaction, it is important to
ensure that AFA algorithm yields consistent results under both circumstances. There-
fore, we decided to evaluate AFA algorithm on pupillary responses to both cognitive
and emotional stimuli. We start by evaluating AFA algorithm on emotionally evoked
arousal.
4.2 Sensing emotionally evoked arousal
As early as 1981, Paul R. Kleinginna et al. identified 92 definitions of emotions from
the literature. In an attempt to come up with a consensual definition, the following
was proposed:
“Emotion is a complex set of interactions among subjective and objective
factors, mediated by neural and hormonal systems, which can (a) give rise
to affective experiences such as feelings of arousal, pleasure/displeasure;
(b) generate cognitive processes such as emotionally relevant perceptual
effects, appraisals, labeling processes; (c) activate widespread phys-
iological adjustments to the arousing conditions; and (d) lead to
behaviour that is often but not always, expressive, goal-directed, and adap-
tive” [Kleinginna and Kleinginna, 1981]
The key point from this definition is found in (c), that emotional experiences could
activate arousal and physiological responses. In another view, Mehrabian opined that most emotional states could be described using three nearly orthogonal
dimensions known as the Pleasure-Arousal-Dominance (PAD) scale [Mehrabian, 1996].
In this view, arousal indicates the strength and intensity of emotion. Therefore, our
expectation is that when people experience an emotion, we would be able to sense
arousal elicited by this emotion, using AFA algorithm, because arousal is the intensity
of emotion. The objective of this section is to evaluate our arousal sensing approach
on its ability to utilise physiological signals to sense arousal from emotionally-evoked
stimuli.
In order to evaluate this, we need to select stimuli that are capable of eliciting a
measured amount of emotional response in the participants. The analysis involves
comparing the self-reported (expected) measures of arousal against the output of AFA
algorithm. We describe our experimental design next.
4.2.1 Experiment
Participants
41 participants (9 female and 32 male) were recruited to take part in the study at the University of Manchester. Participants' ages ranged between 16 and 37 (M = 26, SD = 4.3). All the participants were students, with their highest qualifications
being (4 GCSE, 22 A-level, 9 Bachelors’ and 6 Masters’). Recruitment was done by
word of mouth, participation was voluntary, and withdrawal was possible before,
during and after the experiment. This experiment was approved by the University of
Manchester’s research ethics committee (approval number: 2017-1906-3160). The
participant information sheet and the consent form for this study are appended in
Appendix A.1 and A.2 respectively.
Stimuli and selection criteria
There are several options for databases that contain emotion-evoking stimuli. For
example, the NimStim Face Stimulus Set for studies based on facial stimuli [Totten-
ham et al., 2009], the Geneva Affective PicturE Database (GAPED) [Dan-Glauser and
Scherer, 2011], which includes a rating of normative significance, and the International
Affective Picture System (IAPS) [Lang et al., 1997]. For a review of the use of affective
stimuli datasets, see [Horvat et al., 2013]. Twelve pictures (IDs 7175, 7010, 2513, 2440, 2312, 2359, 8231, 9031, 4597, 1302, 8492, 1321) were selected from the International Affective Picture System (IAPS) database, as it is widely used and contains the most varied selection of images. Due to the terms and conditions of use of IAPS images, these images are not published or described in graphic detail in this thesis. Each picture is accompanied by the mean and standard deviation of the
valence, dominance and arousal ratings using the self-assessment manikin (SAM) scale
[Bradley and Lang, 1994]. These ratings were accumulated from approximately 100
participants per picture who self-reported their emotional response to each stimulus,
see [Lang, 2005] for more information about this. We rounded all the mean arousal ratings of the pictures in the database to their nearest whole numbers. After rounding, all ratings had discrete arousal values between two and seven. We then selected two pictures from each level (2-7) such that one had a relatively high valence and the other a relatively low valence, so that we had a wide spread of valence across the images. To ensure that the images selected elicited a consistent feeling amongst the participants who rated them, we selected pictures whose arousal-rating standard deviations (SD, as given by IAPS) were less than 2. Finally, we ensured that no extremely violent or erotic picture was
selected for ethical reasons. Table 4.1 shows the description and affective ratings of
the pictures that were selected.
Materials and procedures
Participants viewed each stimulus (12 images) at a distance of ∼ 65cm from a 17-inch
monitor. Tobii X2-60 eye tracker was used to capture gaze behaviour at an angle of ∼
30◦ and pupillary response every 20ms (f = 50Hz). Each participant was presented
with each image in counterbalanced order for as long as they needed to view them.
They were instructed to press the space bar on their keyboard to proceed to the next
image. Before the next image was presented, a plain grey image was displayed to them
for 3 seconds so that their pupil sizes could return to baseline before engaging with the next image. At the end of the entire presentation, the participants were instructed to rate each image on a paper-based Likert-type scale according to how aroused they felt while viewing it. To clarify the term arousal for them, we phrased the question using loose
terms that they may associate with arousal. The question was, “Please rate the images
you just viewed according to how much arousal (stress, anxiety, cognitive load, fear,
excitement) you felt”, see Appendix A.3. This method of self-report was adopted for
the ease of explaining to the participants how to self-assess their emotions as, during
the pilot study, we discovered that participants could not interpret the other methods
used, such as the SAM scale, without undergoing some training. After participants
rated their levels of arousal, we took note of comments, feedback or self-reflection from
the participants regarding the study. Other equipment used included a 17-inch monitor, Tobii Studio 3.2 software, a keyboard and a mouse.
Table 4.1: IAPS stimuli showing the description, arousal, dominance and valence values of each stimulus

IAPS ID  Description       Arousal M(SD)  Arousal (rounded)  Valence M(SD)  Dominance M(SD)
7175     Lamp              1.72 (1.26)    2                  4.87 (1.00)    6.47 (2.04)
7010     Basket            1.76 (1.48)    2                  4.94 (1.07)    6.70 (1.48)
2513     Woman             3.29 (1.67)    3                  5.80 (1.29)    5.92 (1.71)
2440     Girl              2.63 (1.70)    3                  4.49 (1.03)    5.97 (1.89)
2312     Mother            4.02 (1.66)    4                  3.71 (1.64)    4.72 (1.73)
2359     Mother and Child  3.94 (1.73)    4                  5.87 (1.41)    5.49 (2.20)
8231     Boxer             5.24 (1.84)    5                  3.77 (1.83)    4.68 (1.91)
9031     Shoe in the mud   4.82 (1.92)    5                  3.01 (1.59)    4.68 (1.81)
4597     Romance           5.91 (1.86)    6                  6.95 (1.65)    5.64 (2.11)
1302     Dog               6.00 (1.87)    6                  4.21 (1.78)    4.04 (2.11)
8492     Roller coaster    7.31 (1.64)    7                  7.21 (2.26)    4.63 (2.41)
1321     Bear              6.64 (1.89)    7                  4.32 (1.87)    3.51 (2.12)
Results
Datasets of 41 participants over the 12 stimuli were extracted from the eye tracker.
Figures 4.1a, 4.1b and 4.1c show the mean time to first fixation, mean fixation count and total fixation duration, respectively, on all images. From Figure 4.1a, we can see that, on average, participants took longer to fixate on image 2440. They also had the fewest fixations and the shortest fixation duration, as shown in Figures 4.1b and 4.1c. This image was that of a girl, and had a low IAPS arousal rating
(M = 2.60, SD = 1.70) and a neutral valence (M = 4.49, SD = 1.03) as shown in
Table 4.1. Recall that the aim of this experiment is to evaluate AFA algorithm against
the self-reported measures, which we consider to be the ground truth for this study.
To find out whether the ground truth correlates with AFA algorithm, we carried out the following data preparation actions, then ran the eye-tracking dataset through AFA algorithm. Firstly, instances where participants viewed a picture for less than 3 seconds were excluded, because it takes 2-3 seconds for the pupil dilation to reach its peak. Also, only the first four stimuli viewed per participant were included in the analysis, to reduce the effect of disinterest. Furthermore, records with less than 70% accuracy (measured by the proportion of times that the eye tracker was able to capture both eyes) were excluded, as this was the recommended filter in Tobii Studio. After data preparation
and running the data through AFA algorithm, we observed the following results. For
example, the same image 2440 that showed evidence of low attention and interest as
observed by the time to first fixation, fixation count and fixation duration, also showed
the least arousal rating from AFA algorithm. Figure 4.2 shows each stimulus against the algorithm's mean arousal, the participants' mean arousal rating and the mean IAPS arousal rating. The correlation between the mean IAPS arousal, participant-reported arousal and the algorithm's output is presented in Table 4.2.

Table 4.2: Correlation between the mean IAPS arousal rating, self-reported rating and the algorithm's arousal level (scaled between 1 and 5)

X         Y          r     p
IAPS      Reported   0.90  <.05
IAPS      Algorithm  0.59  <.05
Reported  Algorithm  0.51  <.05

The strong
positive correlation, r(12) = .90, p <= 0.01 between the mean self-reported arousal
(a) Mean time to first fixation (s)
(b) Total fixation count mean
(c) Total fixation duration mean
Figure 4.1: Gaze behaviour across all 12 stimuli
Figure 4.2: Stimuli against the algorithm's arousal rating, participants' reported feedback, and the IAPS arousal ratings
of participants and the mean ratings of the IAPS validates the dataset as ground
truth. The strong correlation between the IAPS arousal rating and self-reported rat-
ings is expected because both are self-reported measures, but this also validates the participants' reported arousal as a basis for evaluating our output. The correlations
between the mean algorithm arousal and the mean IAPS (r(12) = .59, p <= 0.01) and
between the mean algorithm arousal and the mean participants’ reported feedback
r(12) = .51, p <= 0.01 per stimulus are moderate.
Treating each participant independently rather than averaging all participants per
stimulus, we also observed a moderate correlation (r(47) = .46, p <= 0.01) between
the algorithm’s arousal rating and the participants’ self-reported arousal. Similar to
Oliveira et al., inter-picture colour intensity was accounted for by taking the main colour of each picture to be its most frequently appearing colour [Oliveira et al., 2009]. Next, this colour was converted to perceived brightness using the formula:

Perceived brightness = ((Red value × 299) + (Green value × 587) + (Blue value × 114)) / 1000
There was no correlation (r(47) = 0.03, p = 0.83) between the output of AFA
algorithm and the inter-picture brightness, which indicates that our result is not an artefact of the colour differences between the stimuli.
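A minimal sketch of this brightness computation (assuming, for illustration, that Pillow is available and that the main colour is simply the most frequent pixel value):

from collections import Counter
from PIL import Image

def perceived_brightness(image_path):
    """Estimate a stimulus image's perceived brightness from its main colour."""
    img = Image.open(image_path).convert("RGB")
    # Main colour = the most frequently appearing pixel value
    r, g, b = Counter(img.getdata()).most_common(1)[0][0]
    # Weighted brightness formula given above
    return (r * 299 + g * 587 + b * 114) / 1000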
4.2.2 Limitations from our analysis of emotional stimuli
We postulate that the moderate correlation (rather than a strong correlation) between physiological responses and self-reported measures exists due to limitations in both approaches. The lack of an ideal ground truth [Constantine and Hajj, 2012] makes it difficult to establish the way forward with certainty. For pupillary response, errors could arise
due to eye-tracking accuracy and the analysis technique while for self-report, the main
cause of concern is bias. In the following subsections, we discuss these limitations and
make recommendations.
Limitations from self-reported ground truth
Participants sometimes report their expected feeling rather than their actual feeling
[Nichols and Maner, 2008]. In situations where participants’ engagement is limited or
passive, they would still rate their expected feeling, and this could limit the validity
of self-report. We observed that the correlation between AFA algorithm and self-report was proportional to the participants' minimum dwell time (r(134) = .39, p <= .01); see Figure 4.3. This could mean that the more participants engage with the stimulus, the more accurate their self-report data is. We also observed, when filtering by the maximum number of stimuli viewed, that the more pictures participants viewed, the less accurate the algorithm was (r(134) = -.56, p <= .01); see Figure 4.4. In this case, it could mean that people become disinterested or desensitised after looking at several stimuli. Therefore, their physiological response is reduced. These may not be perceptible to participants, so they end
up reporting their expected emotion rather than their actual one. Indeed, the bias
in self-report lends further credence to the need for an objective measure of affect.
However, AFA algorithm is not without its limitations.
Limitations identified from AFA algorithm in this study
The accuracy of the data collected by the eye tracker is crucial for the performance of the algorithm. The accuracy measure from the Tobii eye tracker refers to the
Figure 4.3: Correlation between the accuracy of the algorithm and the minimum taskduration per participant
Figure 4.4: Correlation between the accuracy of the algorithm and the maximum tasksallowed per participant
proportion of times that the eye tracker can capture both pupils during the study. We
observed from the dataset that the strength of the correlation between AFA algorithm,
and self-report was proportional to the ‘minimum accuracy of participant’ filter value
(r(134) = .59, p <= .01). The higher we set the minimum accuracy for data to be
included in the analysis, the better AFA algorithm can sense movements of arousal,
see Figure 4.5. This shows how much AFA algorithm relies on eye-tracking accuracy.
Therefore, even though the analysis technique can be improved upon, the accuracy
of the eye tracker (which we have limited control over) has a great influence on the accuracy of the algorithm. Drawing on the limitations of self-report and of the algorithm,
we make the following recommendations for usability and user experience researchers.
Figure 4.5: Correlation between the accuracy of the algorithm and the minimum eye-tracking accuracy per participant
Recommendations
Improving the correctness of self-reported approaches is difficult because bias is a
consequence of human nature. Conversely, eye trackers are constantly evolving, and
we can expect improved accuracy. For instance, we used a Tobii X2-60 (60Hz) eye tracker for this study, but there are eye-tracking devices with considerably higher sampling rates and fidelity, e.g., the Tobii Pro Spectrum (1200Hz). Therefore, there seems to be more potential in AFA algorithm, because it is easier to improve technology than to influence the human tendency towards bias on a global scale. Also, because AFA algorithm is automatic, it is more suitable for computation. We propose that
for now, both approaches be used simultaneously to corroborate findings or cancel
out their limitations. At the barest minimum, researchers should be aware of the
limitations of their chosen approach, such as the ones we highlighted.
4.2.3 Summary
We carried out this study with the main objective of evaluating AFA algorithm on its
ability to sense changes in arousal from emotionally evoked stimuli. We obtained our
ground truth data from the IAPS database. Therefore, our dependent variable was
the arousal level measured by AFA algorithm, while our independent variable was the
expected level of arousal from each picture in the IAPS dataset. Other confounding
factors such as learning effect, desensitisation and participants’ lack of interest in-
troduce bias; hence, the need for physiological responses to complement self-reported
means of arousal detection. The moderate correlation (r(47) = .46, p ≤ .01) observed
between the output of AFA algorithm and the ground truth shows that AFA algorithm
has the potential to complement self-reported arousal detection for usability, UX and
other studies of visual behaviour. In addition to measuring the correlation between
our measure of arousal and our ground truth, we observed a direct correlation between
the accuracy of our eye tracker and the accuracy of AFA algorithm. This helps us in
estimating the impact that machine accuracy has on AFA algorithm and establishes
evidence that future enhancements to eye-tracking technology would improve the ac-
curacy of AFA algorithm.
However, as we mentioned at the start of this chapter, arousal can be caused by both
emotional and cognitive factors. Therefore, we need to examine how AFA algorithm
responds to cognitively induced arousal. The next section focuses on evaluating the al-
gorithm’s ability to sense arousal when participants experience increased arousal from
cognitively induced stimuli.
4.3 Sensing cognition-induced arousal
In the previous section, we showed that pupillary response could be used to sense
emotionally evoked arousal. However, cognition has a profound impact on interactive
systems because it is crucial for learning, searching, coordination and assimilation,
all of which influence user experience [Sweller, 1994]. Cognitive load refers to the
amount of cognitive effort that is expended by individuals while carrying out a certain
activity. During user interaction, we perform tasks that demand cognitive efforts
[Paas and Van Merrienboer, 1994]. As we established in our background section, the
literature provides evidence to suggest that performance increases with an increase
in arousal, but only up to a point where performance begins to deteriorate. Lack
of optimal cognitive load can lead to performance reduction and errors [Arent and
Landers, 2003]. In mission-critical systems such as air traffic control, an inappropriate
mental state is a catalyst for failure, which can lead to loss of life and property. In less
critical systems, lack of optimal cognitive load may lead to errors, loss of time, poor
user experience, and ultimately abandonment of the piece of software or system.
In this section, we evaluate AFA algorithm on pupillary responses to cognitive stimuli. To do this, we expose each participant to both cognitively arousing and controlled conditions. Then, we compare the output of AFA algorithm against the known levels of cognitive arousal that we have induced, to evaluate its ability to discriminate between both states. We explain the design of our experiment in further detail next.
4.3.1 Experiment
Participants
27 participants (10 female and 17 male) were recruited to take part in the study at
the University of Manchester. Participants were prospective and active students from
the University of Manchester, as the experiment was conducted during the university’s
open day. Recruitment was done by word of mouth, participation was voluntary, and
withdrawal was possible before, during and after the experiment. This experiment was
approved by the University of Manchester School of Computer Science's ethics committee
(approval number: CS 283). The participant information sheet and the consent form
for this study are appended in Appendix B.1 and B.2 respectively.
Stimuli
Stroop’s effect was used to elicit cognitive arousal in participants. Stroop’s effect
is a psychological effect where participants experience a decrease in cognitive effi-
ciency measured by accuracy and response time when they are distracted by incor-
rectly named objects (incongruent), compared to correctly named objects (congruent)
[Bousefsaf et al., 2014]. 25 congruently named colours (CC), 25 incongruently named colours (IC), 20 congruently named animals (CA) and 20 incongruently named animals (IA) were used to elicit Stroop's effect, as shown in Figure 4.6.
Figure 4.6: Stimuli for stroop’s effect
Materials and Procedure
Tobii X2-60 eye tracker was used to capture gaze behaviour at an angle of ∼ 30◦ and
pupillary response every 20ms (f = 50Hz). Other equipment used included a 17-inch monitor, Tobii Studio 3.2 software, a keyboard and a mouse. Participants viewed each
stimulus image at a distance of ∼ 60cm from a 17-inch monitor angled at ∼ 115◦.
Participants were randomly allocated into two groups which determined the order the
stimuli were displayed. Group A viewed them in the order CC → IC → CA → IA
while group B's stimuli were presented in the order CA → IA → CC → IC; see Table 4.3.

Table 4.3: Stimuli and expected arousal levels

Stimulus                  Expected Arousal Level
Congruent Animals (CA)    Low
Incongruent Animals (IA)  High
Congruent Colours (CC)    Low
Incongruent Colours (IC)  High

The participants were asked to name aloud each object within the stimulus,
irrespective of the textual label. As soon as they named every object in a stimulus, they pressed the space bar to move to the next; there was no time limit for each task or for the entire experiment.
4.3.2 Results
Participants’ gaze behaviour under cognitive stimuli
The 27 participants spent a total of 3557.73s (59.30 minutes) observing all 5 stimuli.
The participants’ mean dwell time was M = 131.77s, SD = 28.55 and ranged between
88.80 and 182.46s. Figures 4.7a, 4.7b, 4.7c and 4.7d show the spread of fixations on the AOIs for the congruent animal naming, incongruent animal naming, congruent colour naming and incongruent colour naming respectively. From these figures, we can see that the heat map appears denser on the incongruent naming than on the congruent naming for each stimulus type (animals and colours), which indicates more fixations on the incongruent naming. Fixation is a proxy for measuring atten-
tion [Corbetta et al., 1998, Pan et al., 2004, Holmqvist et al., 2011], which means that
the incongruent tasks required more attention to complete them.
The total mean fixation count across all media is 22.32. This means that for every 131s, participants had approximately 22 fixations. Figure 4.8 shows the mean fixation count for each medium. As shown in the graph, breaking it down by stimulus type (i.e. animal vs. colour naming), participants fixated more on the incongruent stimuli than the congruent stimuli in both cases. For the animal naming stimuli, the mean fixation count for the congruent stimulus was M=1.94, SD=2.51; for the colour naming stimuli, the mean fixation count for the congruent stimulus was M=2.08, SD=0.11, compared to M=2.83, SD=0.28 for the incongruent stimulus.
(a) Congruent animal naming (b) Incongruent animal naming
(c) Congruent colour naming (d) Incongruent colour naming
Figure 4.7: Heatmap showing the aggregated fixation on AOIs of each stimulus
Figure 4.8: Bar chart showing the total fixation count mean for congruent and incongruent object naming across all stimuli
Similarly, the total mean fixation duration across all media is M = 0.39s, SD = 0.09. This means that during the 131s (on average) that participants spent across all stimuli, they fixated on AOIs for a total duration of approximately 4s. Figure 4.9 shows the mean fixation duration for each medium.

Figure 4.9: Bar chart showing the total fixation duration mean (s) for congruent and incongruent object naming across all stimuli

As shown in the graph, breaking it down by stimulus type (i.e. animal vs. colour naming), participants' fixation duration was longer on the incongruent stimuli than on the congruent stimuli in both cases. For the animal naming stimuli, the mean fixation duration for the congruent stimulus was M=0.53s, SD=0.05; for the colour naming stimuli, the mean fixation duration for the congruent stimulus was M=0.28, SD=0.10, compared to M=0.34, SD=0.02 for the incongruent stimulus.
To minimise the learning effect on our analysis, we randomised the order of presenting the stimuli such that Group A was presented congruent stimuli before incongruent stimuli, while Group B was presented incongruent stimuli before congruent stimuli. Analysis between the two groups shows that there was no statistically significant difference (Mann-Whitney U, p > 0.05) between both groups' average fixation count (Group A: M = 2.23, SD = 0.38; Group B: M = 2.232, SD = 0.49) or average fixation duration (Group A: M = 0.39, SD = 0.09; Group B: M = 0.39, SD = 0.10). Therefore, deducing from the fixation count and duration, the order of stimulus presentation did not appear to influence gaze behaviour.
In summary, considering fixation count and duration as proxies of attention, and given the increased fixation count and duration on incongruent stimuli compared to congruent stimuli, we can conclude that participants needed more attention to complete the more cognitively demanding tasks (incongruent naming). Attention, however, does not tell the full story. Therefore, we look at another index of arousal, namely the analysis of pupil dilation, especially as it is the focus of AFA algorithm.
Pupillary response to cognitive load
A total of 213,985 pupil data points were extracted from the eye tracker. Participants had an average of M = 7925.37, SD = 1696.97 rows of data, which is quite varied, considering the large standard deviation. Figure 4.10 is a box plot that illustrates the distribution of the pupil dilation data grouped by congruent and incongruent naming for each stimulus type. The box plot shows that the distributions are quite similar. Therefore, simply using the mean pupil dilation to discriminate congruent from incongruent naming would not suffice. We elucidate how we utilised AFA algorithm to generate a more distinctive measure of arousal than both the raw pupil data and the gaze behaviour from Section 4.3.2. Analysing this dataset using AFA algorithm requires aggregating the data to eliminate outliers because, as we can see from the box plot, some pupil dilations are as low as 1mm, which is likely to be the effect of noise (the typical pupil size ranges between 2mm and 8mm). Furthermore, we
transform the aggregated data for each participant to fit their range and measure of
central tendency using AFA algorithm explained in Chapter 3. Finally, we apply peak
detection and arousal sensing.
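Purely as a recap, these phases can be chained using the sketch helpers introduced in Chapter 3 (our hypothetical names, not the thesis implementation):

def afa_pipeline(raw_samples, frequency=50, levels=9):
    # Phase 1: clean, aggregate into 1s windows, scale to the individual
    cleaned = clean_pupil_signal(raw_samples)
    windows = aggregate_windows(list(cleaned), frequency)
    arousal = to_arousal_levels(windows, levels)
    # Phase 2: detect the moments where arousal increases
    peaks = detect_peaks(arousal)
    return arousal, peaks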
Therefore, after processing the raw datasets through our peak detection and arousal
sensing algorithm, we generated an output, which is an array of arousal levels for
each participant over each stimulus. The mean arousal level for animal naming in the
congruent task was M = 2.46, SD = 1.67 while participants experienced nearly twice
the amount of arousal (M = 4.89, SD = 2.45) on the incongruent task. The mean
arousal level for colour naming in the congruent task was M = 2.17, SD = 1.95 while
participants experienced nearly thrice the amount of arousal (M = 6.80, SD = 1.90)
on the incongruent task. We plot the distribution of the cumulative arousal that each participant experienced on each stimulus in Figure 4.11 to further illustrate the distribution.

Figure 4.10: Box plot showing the distribution of pupil dilation for each stimulus
To evaluate our result, we performed correlation tests between the output of AFA algorithm and the ground truth (the expected level of arousal under Stroop's effect), correlating the cumulative arousal per stimulus for each participant. As described in Section 4.3.1, there were four stimuli. Congruent stimuli were categorised as level 0 (low arousal) and incongruent stimuli as level 1 (high arousal), as shown in Table 4.3. Using point-biserial correlation (a test of correlation between a continuous variable and a dichotomous variable), we found a moderate correlation, r(76) = .64, p < .01, between the expected arousal level and AFA algorithm's arousal level. Breaking this down by stimulus type, we found a moderate correlation, r(76) = .51, p < .01, between the expected arousal level and AFA algorithm's arousal level for animal naming, and a high correlation, r(76) = .77, p < .01, for colour naming. This suggests clearer discrimination between congruent and incongruent stimuli for colour naming than for animal naming. To examine whether there is a statistical difference, we apply the Mann-Whitney U test.
Figure 4.11: Violin plot showing the data distribution of the output of the algorithm for each stimulus
Results show that the algorithm can discriminate between congruent and incongruent stimuli, in both the animal naming (U(76) = 51, p < .05) and colour naming tasks (U(76) = 142.5, p < .05). We discuss the lessons learnt from this study next.
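For reference, both tests are available in SciPy; a minimal sketch with hypothetical data (0 = congruent/low arousal, 1 = incongruent/high arousal) is:

import numpy as np
from scipy import stats

expected = np.array([0, 0, 0, 0, 1, 1, 1, 1])                  # ground-truth labels
arousal = np.array([2.1, 1.8, 2.6, 2.0, 5.2, 4.7, 6.1, 4.4])   # AFA output per task

r, p = stats.pointbiserialr(expected, arousal)       # dichotomous vs continuous
u, p_u = stats.mannwhitneyu(arousal[expected == 0],  # congruent (control)
                            arousal[expected == 1],  # incongruent
                            alternative="two-sided")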
4.3.3 Lessons learnt from the analysis of cognitive stimuli
From our experimental design, incongruent naming of objects represents increased cog-
nitive load, while congruent naming simulates the controlled condition. The result of
our within-subject design shows that participants required more attention to complete
the more cognitively demanding tasks as seen from the gaze behaviour [Corbetta et al.,
1998, Pan et al., 2004, Holmqvist et al., 2011]. The results of our analysis of the par-
ticipants’ gaze behaviour confirm previous claims in the literature, that people fixate
more and for a longer duration to pay more attention and complete more cognitively
demanding tasks. Cognitive stimuli differ from emotional stimuli in terms of
arousal. For example, fear is an emotion that is characterised by a negative valence
(displeasure), increased arousal, and non-dominant (submissive response). The sub-
missive response means that participants will tend to avoid stimuli that cause fear,
thereby exhibiting low fixation count and duration, in terms of their gaze behaviour.
In our analysis of emotional stimulus in the previous chapter, we showed that AFA
algorithm was able to identify an increase in arousal under different emotional stim-
uli. Despite the potential differences in gaze behaviour between cognitively induced
arousal and emotionally evoked arousal, AFA algorithm was able to discriminate be-
tween congruent and incongruent arousal.
We observed that AFA algorithm was able to identify the arousal signal more clearly in the colour naming task than in the animal naming task. This may suggest that the incongruent colour naming task was more difficult than the incongruent animal naming task. In addition to that possibility, the following quote from participant P14M suggests that some participants were able to adopt coping mechanisms to avoid the incorrect naming of animal objects, whereas this specific coping mechanism was not as effective for the incongruent colour naming task.
“I was able to view the animals passively without looking at the captions.
This was more difficult for the colours because I could not name the colours
themselves while also avoiding the text on the colours. This is why I called
some of the colours the wrong name.”
[P14M]
This phenomenon could be related to the barriers of language retrieval. Language
or word retrieval is the process of recalling a target word from memory [Kambanaros
et al., 2013]. People adopt several strategies while retrieving words from memory,
e.g. by associating semantic relevance of the word to the context or by perceptual
relevance [La Heij, 1988]. In animal naming, there are more visually distinguishing cues to associate the animals with, for example, their shape, size and distinguishing features (e.g., the wings of a bird or the trunk of an elephant) [Martin et al., 1994]. For colours, by contrast, participants have limited features to aid in word retrieval [Shao et al., 2015].
Furthermore, the quote from participant P25M suggested that naming animals in
the English language presented a language barrier to the task.
“As a native Arabic speaker, I could not remember what some of the an-
imals were called in English. For example, I called the spider a scorpion.
That was one level of confusion, as well as the wrong labelling. It was easier
for me to remember the English names for the colours than the animals.”
[P25M]
This is a well-researched phenomenon in the literature on cross-cultural information retrieval [Ballesteros and Croft, 1998]. It could have been taken into consideration, for example, by recruiting only native English speakers or by collecting English language proficiency as part of the participants' information so that it could be accounted for in the analysis.
Therefore, it is likely that some participants experienced increased cognitive load during animal naming because they could not recollect the English word for the animal, thereby making the control task less effective for animal naming than for colour naming.
We also showed that although gaze behaviour is a capable discriminator of increased cognitive arousal, combining it with pupil dilation increases this distinction significantly. In reality, the boundary between the causes of increased arousal may not be determinable. This is why an approach that takes an average of participants' reactions under different stimuli may not be effective, and an event-based approach that is capable of sensing moments of increased arousal is preferable. AFA algorithm, which uses continuous peak detection to sense when a participant experiences an increase in arousal while combining it with the detection of areas of focal attention, can be used to identify the areas on a user interface that induce increased cognitive load. Designers can, therefore, address this by adapting the user interface in real-time or offline, or by providing hints to the user where this is not possible. We acknowledge that the Stroop's effect study evaluates AFA algorithm by comparing a controlled condition with cognitively induced arousal, which may be less realistic. Therefore, the study in the next chapter was carried out using stimuli which the literature has described as some of the common causes of end-user frustration on the Web.
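To make the event-based idea concrete, the following is a minimal sketch of how continuous peak detection over a pupil-dilation trace could be combined with fixation data to attribute arousal events to AOIs. The function name, the prominence threshold and the synthetic data are our own illustrative assumptions, not the thesis implementation.

```python
# Minimal sketch of the event-based idea: detect peaks (moments of increased
# arousal) in the pupil signal, then attribute each peak to the AOI fixated
# at that moment. Thresholds and data are illustrative assumptions.
import numpy as np
from scipy.signal import find_peaks

def arousal_events_per_aoi(pupil, fixated_aoi, prominence=0.1):
    """pupil: 1-D array of pupil diameters (one sample per time step).
    fixated_aoi: AOI label fixated at each time step (same length)."""
    peaks, _ = find_peaks(pupil, prominence=prominence)
    counts = {}
    for t in peaks:
        counts[fixated_aoi[t]] = counts.get(fixated_aoi[t], 0) + 1
    return counts  # AOIs ranked by the arousal events they coincide with

pupil = np.array([3.0, 3.1, 3.6, 3.2, 3.1, 3.0, 3.5, 3.9, 3.4, 3.1])
aois = ["menu", "menu", "menu", "form", "form", "form", "ad", "ad", "ad", "ad"]
print(arousal_events_per_aoi(pupil, aois))  # {'menu': 1, 'ad': 1}
```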
4.3.4 Summary
Cognitive overload is one of the causes of stress during user interaction. The term user-
friendliness indicates that a user interface is intuitive and corresponds to the users’
expected outcome. The presence of cognitive overload, however, is an indication that
the content, presentation, layout or structure of multimedia content may not be fit for
purpose. Arousal can be used as a proxy to sense cognitive overload. In this chapter, we showed that AFA algorithm could be used to discriminate between controlled conditions and cognitively induced arousal. Using Stroop's effect, we induced cognitive arousal in 27 participants. Results showed that AFA algorithm could discriminate between incongruent and congruent tasks for both the animal naming and the colour naming tasks.
In this chapter, we have shown through our studies that AFA algorithm is capable of sensing arousal due to emotional or cognitive stimuli. Both types of stimuli took the form of images and contained effects that are not frequently encountered, especially at such magnitude, during user interaction. Therefore, we need to examine the behaviour of AFA algorithm on common causes of stress during user interaction, to show that AFA algorithm is ecologically valid.
In the next chapter, we evaluate the ability of AFA algorithm to sense frustration-induced arousal on the Web.
Chapter 5
Sensing frustration-induced arousal
on the Web
In the previous chapter, we evaluated AFA algorithm on emotionally-evoked arousal
and cognitively induced arousal. Those evaluations allowed us to examine how the
algorithm would perform on static stimuli as, in both cases, we used images to induce the desired levels of arousal in the participants. However, in addition to visual
engagement and cognitive activities, participants browsing the Web engage in more
complex interactions such as typing on the keyboard, mouse scrolling, mouse-clicking
and mouse hovering. Furthermore, people use these modes of interaction to achieve
specific tasks. For example, people tend to scroll during reading tasks, people use the
keyboard to type characters when filling forms, and people hover their mouse in target
acquisition tasks. Any failure in computer peripherals to aid interaction could result
in frustration. Besides the user’s interaction, other factors deter successful task com-
pletion on the Web. For example, network failure, hardware and software malfunction,
and poor design of user interfaces may prevent users from completing their tasks.
Task completion is a critical factor for the usability of a system. The inability of users
to complete their tasks may result in several emotional and cognitive responses that
hinder the quality of user experience. Since we have previously tested AFA algorithm
on detecting arousal from emotional and cognitive stimuli, in this chapter, we evalu-
ate AFA algorithm in detecting these same psycho-physiological signals when users are
unable to complete their web interaction tasks due to hindrances such as software,
hardware and network failure.
5.1 Why frustration?
Detecting frustration on the Web provides a greater challenge compared to the previ-
ous evaluations in Chapter 4 due to the various modes of interacting with a website,
and the events that could go wrong. Some frustrating events may result in lower levels
of arousal compared to purely emotional or cognitive stimuli like those in Chapter
4. However, it is also crucial to detect lower levels of arousal because the cumulative
effects of undesired states could result in an overall poor user experience towards a
system. Moreover, if frustration is not prevented or accounted for, it can lead to other
negative emotions including anger, aggression, sadness, disappointment, fear, anxiety
and withdrawal from the use of a software application or website [Jeronimus and Laceulle, 2017],
all of which are limiting factors towards good user experience.
Berkowitz et al. defined frustration as an emotional response to delay or hindrance
in achieving a goal [Berkowitz, 1962]. It is a negative affective state that leads to
an increase in physiological arousal [Storms and Spector, 1987]. In interactive systems, frustration is caused mainly by incorrect or unexpected responses to the users' interaction. This may be due to the nature of the task (e.g. difficulty), hardware infrastructure, user interface, software system, network or even the user's mental and cognitive state [Olson and Olson, 2003]. Frustration can lead to reductions in accuracy and speed, as well as other psychological consequences such as loss of motivation, all of which limit the quality of user experience in terms of performance [Lazar et al., 2006a].
Frustration is an idiosyncratic experience, and people respond to its effects differently.
In a study by Szasz et al., it was discovered that people respond differently to frustrat-
ing tasks depending on their coping mechanisms [Szasz et al., 2011]. Some people are
more resilient than others, people have different competency levels using technology,
and some people react differently to the layout and presentation of Web contents, as can be seen in our case study of neurotypical people vs autistic people on the Web (Chapter 6). A single set of heuristics or guidelines to manage frustration may therefore not generalise or be effective for every user. People's prior experience, personality traits and demographic background (e.g. culture, age, gender and location) result in
different tolerances and reactions towards incorrect or unexpected responses to com-
puter usage [Lazar et al., 2006a]. Therefore, interventions need to be administered
to users individually and per case. This concept holds certain similarities to the field
of medicine, where personalised healthcare is used to deliver treatment to each pa-
tient based on the severity of their symptoms, demographic profile, genetics, etc. One
way to accomplish this during user interaction is to detect frustration as it occurs
automatically. Physiological sensors, for example, heart rate and blood pressure were
suggested as a means to examine the effects of frustration on an individual basis [Szasz
et al., 2011].
Affect detection systems have often been evaluated using cognitively induced arousal as well as emotionally evoked arousal, but frustration-induced arousal has been understudied. In the next section, we discuss works that are similar to ours regarding frustration detection in interactive systems.
5.2 Related work on sensing frustration in interactive systems
In 2002, Klein et al. proposed an interactive support affective agent that helps users
manage and recover from negative emotional experiences during computer use by
demonstrating active listening, empathy, and sympathy [Klein et al., 2002]. Despite
the relevance of their work in ameliorating the effects of frustration, a system has to
be in place to sense when frustration occurs. In 2010, Lunn et al. performed a study using galvanic skin response to sense stress levels between older and younger users, and between static and dynamic contents on the Web. Their results showed no significant difference by age, but they observed that older users had a more varied physiological response to dynamic contents compared to static contents. This may indicate hesitancy due to a lack of familiarisation, and cautiousness towards dynamic contents [Lunn and Harper, 2010b]. Lunn et al.'s work identifies patterns of
interaction and physiological responses that can help build heuristics for older users.
Our work improves on this through the analysis of gaze behaviour to detect the users' focal attention in a more generalisable way, applicable not only to older users but also to a wider demography of users. In another study, facial electromyography (EMG)
was used to discriminate between correct and incorrect tasks and between novice and expert users, while the difficulty rating of a website was used as an index of frustration [Hazlett, 2003]. As mentioned earlier, frustration can occur when participants do not achieve the expected outcome of an interaction (correctness and completeness), but other factors cause frustration on the Web besides users' ability to complete tasks. In another
study that was based on the hypothesis that people who are frustrated tend to apply
more pressure to their mouse device, Qi et al. used a mouse mounted with an 8-point
pressure sensor to collect pressure information from participants [Qi et al., 2001]. Qi et
al. further improved the accuracy of this study, using a Bayesian model to sense par-
ticipants’ frustration on an individual basis, achieving an accuracy of 88% [Qi et al.,
2001]. Using the mouse device as a sensor for frustration detection is ideal for use
in applications that make consistent use of the mouse. AFA algorithm complements
their approach, especially in applications that make use of inconsistent/limited mouse
interactions to perform tasks.
A multi-modal approach using a pressure-sensitive mouse, pressure-sensitive chair and
a camera was used to develop a model for predicting frustration in user search tasks
with an accuracy of 65% [Feild et al., 2010]. In the same study, the interaction events were logged and used as predictive features. With the event log data, Feild et al. achieved a higher accuracy of 87%. In another multi-modal study, Kapoor et al. used a camera, posture-sensing chair, pressure mouse, skin conductance and the game state of the user to detect frustration with an accuracy of 79%. Galvanic skin response and gaze data have also been used in combination to sense the frustration and severity of
usability problems [Bruun et al., 2016]. Several other models have been built to predict
both satisfaction and frustration, especially for consumer-based users [Garrett et al.,
2004]. Many of these approaches have limited potential for widespread application in
the wild due to their use of multi-modal sensors. The ideal solution for arousal detec-
tion in interactive systems should be unobtrusive and respond to low-intensity changes.
It should also be generalisable (not specific to a software application). Furthermore, unimodal solutions are more suitable because using multiple sensors decreases the likelihood of ubiquitous use. Our proposed solution fulfils these criteria.
During a frustrating experience, there is an increase in arousal [Hokanson and Burgess,
1964]. Therefore, our evaluation in this chapter aims to extend the validity of AFA algorithm towards its use in sensing frustration-induced arousal on the Web.
5.3 Research contributions through this study
This evaluation also examines the ecological validity of AFA algorithm, since we make use of more practical stimuli. Ecological validity is one of the main challenges in sensing arousal in the field of affective computing in the wild. We aim to answer the following research questions in this study:
RQ1. Can pupillary response be used to sense arousal, induced by frustra-
tion on the Web?
For this, we examine whether we can yield consistent results on the Web, as with emotional and cognitively induced arousal on static images. Frustration may induce arousal of lower intensity compared to emotional images or cognitive load; would AFA algorithm be able to sense such low-intensity signals? This research question partially addresses the research question RQ2 stated in Chapter 1, pertaining to the overall goal of our PhD research.
RQ2. Is there a relationship between participants’ levels of arousal and their
focal attention during moments of frustration?
For this, we examine whether we can localise the cause of arousal to a certain element on the screen. When people feel stress caused by a UI element, can we identify this, so that AFA algorithm can inform potential interventions to ameliorate frustration with techniques such as adaptive computing or recommender systems? This research question partially addresses the research question RQ3 stated in Chapter 1, one of the goals of our PhD research.
To answer these questions, we induced participants with known causes of frustration.
The choice of stimuli was based on a study by Ceaparu et al. where the frequency,
the cause, and the level of severity of frustrating experiences in interactive systems
were researched through the use of diaries and surveys [Ceaparu et al., 2004]. In
our study, we induced participants with these known causes of frustration selected
from Ceaparu et al.’s study. Further, we applied AFA algorithm to sense arousal in
participants and detect their focal attention when the frustrating component could be
segmented on the screen. We describe our study in further detail.
5.4 Experiment
This experiment was approved by The University of Manchester committee on ethics
and informed consent was obtained prior to participants taking part in the study
(approval number: 2018-4365-5934). The participant information sheet and the consent form for this study are appended in Appendices C.1 and C.2, respectively.
5.4.1 Participants
Participants (N =40, Female=13, Male=27) with a median age of 25 years (M =
26.33, SD = 5.72) were recruited for this study. Invitations to take part were promoted via poster advertisements, emails, and word-of-mouth within the university community. All participants identified as having normal or corrected-to-normal vision.
5.4.2 Materials and procedure
A Tobii X2-60 eye-tracker was used to capture pupillary response, fixation location,
and fixation duration of the participants at a frequency of 50Hz. As they carried out the
tasks, a Logitech 1080-pixel Web camera was used to capture video of the participants.
This video was replayed to the participants after the study to aid their recall regarding
episodes of frustration while filling in their self-reported measure of frustration on a
questionnaire (see Appendix C.3 for the questionnaire). After participants rated their
levels of frustration, we took note of comments, feedback or self-reflection from the
participants regarding the study. A mouse and keyboard was used to interact with the
websites, while a 15.6 ′′ monitor at 1366 × 768 pixels resolution was used to view the
instructions and carry out the tasks in full screen mode. The experiment was run on
a Dell Latitude E5530 notebook, Windows 7 operating system with a Mozilla Firefox
Quantum 59.0.2 (64-bit) Web browser.
5.4.3 Method
The study took place in a usability laboratory with a regulated level of illumination to control for changes in pupil dilation due to light. The study used a 4×2 within-subject design with a random order of stimulus presentation. There were two levels of effect for each website presented to participants: 'normal' and 'disruptive' interaction.
During the normal interaction, participants carried out tasks without experiencing
unexpected responses. During the disruptive interaction, participants experienced a
simulated operating system failure, pop-ups, internet time-out and mouse malfunc-
tion on Google, Wikipedia, National Express and BBC websites, respectively. The
disruptions were based on the most common causes of end-user frustration reported
by Ceaparu et al. [Ceaparu et al., 2004].
The websites were selected such that they represented different types of tasks: infor-
mation search, reading, data entry, and pointing tasks. Table 5.1 shows the list of
tasks, descriptions and disruptions associated with each website. The disruptive tasks
were implemented using Javascript and injected into the websites using Violentmonkey
(a browser-based plugin that inserts user scripts into websites at run time) [Gerald,
2018]. Other tools for creating user-scripts include Tampermonkey [Biniok, 2018] and Greasemonkey [Buchanan, 2018]. At the time of writing the user-scripts for this study, we selected Violentmonkey because it is open-source and was compatible with the Web browser version required by our eye-tracking software. Each task ID starts with 'T' followed by the task number [1-4], while the last character stands for 'D' (disruptive) or 'N' (non-disruptive). We describe the design of the four disruptive tasks in further detail:
T1D: A booking task where participants were instructed to book a trip to Manchester.
The rationale behind this task is to simulate what participants experience when
there is a time-out just before completing the task. In reality, a time-out can be
caused by system errors, internet connection, unexpected software behaviours
such as an interrupted system restart for an operating system software update.
When these disruptions occur, users are unable to complete their tasks, and this
leads to frustration due to wasted effort and, consequently, loss of time. Often, they have to redo the task from the start. This effect is similar in retail e-commerce websites, job application websites, and hotel and transportation booking tasks, where participants fill in forms to complete their tasks. Transportation booking sites follow a similar process, so familiarity is less likely to be a biasing condition compared to some of the forms listed above. We selected the National Express website as it is one of the popular coach booking websites in the UK. For the disruptive version of the task, our design was such that the participant would experience a time-out response just before reaching the point of paying for their trip from London to Manchester. After 3 seconds, the participants were redirected to start the booking again, this time without disruption. Our expectation, backed by the literature, is that participants would be frustrated by having to start the process all over again [Lazar et al., 2006b, Qi et al., 2001].
T2D: A pointing task where participants were instructed to check the weather in
Manchester during a distinct time. Participants were required to select the loca-
tion, date and time by hovering, pointing and clicking using their mouse device.
Therefore, it was suitable for the mouse malfunction disruption. Other examples where mouse pointing is frequently used are gaming applications, website search tasks and desktop browsing with the file browser. Since our study is about frustration on the Web, we selected BBC.com as it is ranked as the sixth most popular website in the UK (2018) by https://www.alexa.com/topsites. For the disruptive task, we made the standard mouse pointer invisible. Using CSS and Javascript, we painted a lookalike of the mouse pointer and made the fake pointer disappear and reappear at random intervals and locations, so that it simulated a faulty mouse device. We anticipated that the lack of total control over the mouse pointer would cause the participants to become frustrated. Furthermore, findings in the literature suggest that unexpected outcomes, imprecision and inaccuracy of pointing devices are sources of frustration in interactive systems [Benko and Wigdor, 2010].
T3D: A search task, where participants were instructed to conduct a simple Google
search to find out the current time in Ottawa, Canada. The purpose of this task is
to simulate an operating system failure. An operating system failure can result in a loss of data, communication, time, and even money to the user. Such failures can occur at any point during user interaction and are usually unwelcome events. Therefore, we hypothesised that an unwelcome event such as an operating system crash during the study would induce frustration in the participant. We chose to do this in a simple task such as Web search so that users would not feel that they had caused the system to crash, since they were only carrying out a fairly easy Google search. We used the Windows operating system because operating system crashes are more common on Windows compared to other operating systems.
For this disruptive mode, the user-script is triggered as soon as the search query is executed. The user-script uses Javascript to redirect the page to a picture, in full-screen mode, that resembles the well-known 'Blue Screen of Death' (BSOD, or Blue Screen for short). Since the experiment is presented in full-screen mode, the search bar and taskbars are hidden, so the participants were led to believe that there was an actual operating system failure. After 3 seconds, the participants are redirected back to the default Google search page.
T4D: A reading task in which the participants are instructed to look for Stephen
Hawking’s PhD thesis on his Wikipedia page. The target was at the bottom of
the page, so users were expected to scroll down. To scroll, participants needed
to dismiss the pop-up by checking the button which says “Do not display the
pop-up again". The Wikipedia website was chosen because it is one of the most popular media for looking up biographical information (ranked as the seventh most popular website in the UK (2018) by https://www.alexa.com/topsites). The main aim of this task is to induce frustration through pop-ups. A reading task was chosen because the pop-up hinders both the participants' view and their interaction with the webpage. Furthermore, pop-ups are known to cause frustration on the Web, as reported in the literature [Bahr and Ford, 2011, Sanghoon and Roberto, 2005, Baylor and Rosenberg-Kima, 2006].
For this disruption task, the user-script is launched as soon as the page is loaded to show a pop-up with the content "Invalid user action". The information that participants needed to find was located at a position on the page where they had to scroll down to see it. The participants were then interrupted by pop-ups at one-second intervals.

Figure 5.1: Disruption to tasks to elicit frustration: T1. Time-out experienced when booking a trip, T2. Mouse location altered when selecting weather information, T3. Operating system error during Google search and T4. Multiple pop-ups interrupting Wikipedia content lookup
Figure 5.1 illustrates the tasks and websites.
As stated previously, the aim of the study was to measure the performance of our
proposed approach (pupillary response and eye-tracking) by discriminating between a
temporal period of frustration and normal user interaction. We discuss how we carried
out our analysis next.
Table 5.1: Experimental tasks

Task ID | Website          | Task description                               | Simulated disruption
T1N     | National Express | Book trip from Manchester to London            | None
T2N     | BBC Weather      | Check weather in London                        | None
T3N     | Google           | Check time in Ottawa (Canada)                  | None
T4N     | Wikipedia        | Find the title of Alan Turing's PhD thesis     | None
T1D     | National Express | Book trip from London to Manchester            | Session time-out
T2D     | BBC Weather      | Check weather in Manchester                    | Mouse malfunction
T3D     | Google           | Check time in Canberra (Australia)             | OS failure
T4D     | Wikipedia        | Find the title of Stephen Hawking's PhD thesis | Multiple pop-ups

NB: N = normal, D = disrupted
5.4.4 Analysis
In this study, the entire stimulus (task) was treated as a single AOI. Table 5.1 shows the tasks that were carried out by participants. If any task was incomplete due to network failure or website malfunction (not by our design), that participant's entire dataset was excluded from the analysis to preserve the balance of our within-subject design. Ten participants were excluded based on this criterion. Linear interpolation was performed to replace missing values from the eye-tracker. Furthermore, the pupil (right or left) with the most complete data extracted from the eye-tracker was used as the input to the analysis algorithm described in Chapter 3.
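A minimal sketch of this preprocessing step is shown below, assuming the raw export holds one pupil column per eye with NaN for lost samples; the file and column names are hypothetical, not the study's own pipeline.

```python
# Sketch of the preprocessing described above (column names are hypothetical):
# pick the eye with the most complete data, then linearly interpolate gaps.
import pandas as pd

raw = pd.read_csv("participant_01.csv")  # hypothetical eye-tracker export

# Select the pupil (left or right) with the fewest missing samples ...
best_eye = min(["pupil_left", "pupil_right"], key=lambda col: raw[col].isna().sum())

# ... and replace the remaining missing values by linear interpolation.
pupil = raw[best_eye].interpolate(method="linear", limit_direction="both")
```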
5.5 Results
Results of our analysis show that AFA algorithm can sense arousal in frustrated participants with a large effect size (η2p). We present this in further detail below.
The output of the algorithm described in section 5.4.4 was stored as a dependent vari-
able (DV) in a vector for further statistical analysis against the independent variables
(IVs) - the website, and the mode of interaction (normal or disrupted). The violin
plot in Figure 5.2, grouped by task and mode of interaction, shows the distribution of the DV, which illustrates a difference in the pattern for disruptive (red) vs non-disruptive (blue) tasks.
The experiment utilised a 4×2 within-subject design. Therefore, we performed a two-way repeated measures analysis of variance (ANOVA). The dependent variable was the cumulative arousal reported by the algorithm, and the independent variables were the website and the mode of interaction (with two levels, 'normal' and 'disruptive'). There was a significant difference between the modes of interaction on the algorithm's arousal levels with a large effect (F(1) = 400.303, p < .001, η2p = .690). We also observed a significant difference between the websites on the algorithm's measure of arousal with a large effect (F(3) = 182.669, p < .001, η2p = .849). However, a pairwise comparison (Wilcoxon's test) on the websites revealed that T1 (booking a trip) was significantly different from the other tasks. Table 5.2 shows the pairwise comparison between the websites, while Figure 5.3
Figure 5.2: Violin plot of the data distribution of the level of arousal in all tasks for both groups (disruptive and normal)
illustrates that website T1 appears to elicit more arousal than the others. On further
investigation, excluding T1 from the analysis, the same ANOVA shows that the web-
site had no significant effect on the arousal level (F (2) = 0.292, p > .001, η2p = .009),
while the mode of interaction still has a large effect (F (1) = 100.7, p < .001, η2p = .537).
The average time of completion for task T1 (M = 62.586s, SD = 26.732) was more than twice as long as T2 (M = 24.330s, SD = 10.294), T3 (M = 22.475s, SD = 12.399) and T4 (M = 21.667s, SD = 18.206), which may have influenced the cumulative arousal score in T1 compared to the other tasks.
Table 5.2: Results of Wilcoxon test comparing arousal between each task, with Bonferroni correction α = 0.008

Group 1 | Group 2 | W     | p-value
T1      | T2      | 45.0  | <.001*
T1      | T3      | 38.0  | <.001*
T1      | T4      | 79.0  | <.001*
T2      | T3      | 896.0 | 1.000
T2      | T4      | 720.0 | .907
T3      | T4      | 739.0 | .195
Note: * = p < .008
Figure 5.3: Bar chart with error bars (standard error of the mean) showing the tasks (both modes of interaction combined) vs level of arousal
Similarly, we performed a Tukey HSD test to determine how the measure of arousal differed based on the mode of interaction; the results show that there is a significant difference (M = 1.25, p < .001, CI.95 = [0.69, 1.82]).
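For illustration, the sketch below shows how such a two-way repeated-measures ANOVA can be specified, assuming a long-format table with one row per participant, website and mode; the column names are ours, and this is not the study's own script.

```python
# Sketch of the 4x2 repeated-measures ANOVA, assuming a long-format table
# (one row per participant x website x mode). Column names are illustrative.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

df = pd.read_csv("arousal_long.csv")  # columns: participant, website, mode, arousal

aov = AnovaRM(data=df, depvar="arousal", subject="participant",
              within=["website", "mode"]).fit()
print(aov)  # F and p values for website, mode, and their interaction
```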
We conducted a pairwise comparison (Wilcoxon’s test) between the normal interaction
mode and the disruptive mode for each website. Table 5.3 shows that the algorithm
was able to discriminate arousal in all four tasks. Figure 5.4 also illustrates this. This
result answers RQ1, which pertains to whether AFA algorithm is capable of distinguishing between frustrating tasks and controlled tasks.
Table 5.3: Results of Wilcoxon test comparing the modes of interaction within each task, with Bonferroni correction α = 0.0125

Group 1 | Group 2 | W     | p-value
T1D     | T1N     | 0.0   | <.001*
T2D     | T2N     | 121.0 | .004*
T3D     | T3N     | 28.0  | <.001*
T4D     | T4N     | 1.0   | <.001*
NB: N = normal, D = disrupted; * = p < .0125
Figure 5.4: Bar chart with error bars (standard error of the mean) showing all tasks vs level of arousal
We have presented the analysis of variance of the measure of arousal, considering the website and the mode of interaction as factors. We now consider an additional method to evaluate AFA algorithm against the ground truth, by testing how similar the output of the algorithm (DV) is to the participants' rating and the mode of interaction. A Spearman correlation test between the DV and the participant rating shows a moderate correlation (rs(301) = .325, p < .001). However, the moderate interrater agreement (κ = .600) between both sources of ground truth - the participants' rating and the mode of interaction (normal and disruptive) - suggests that there is notable disagreement. The quotes from participants P01M, P37M and P05M may explain why some participants were not frustrated by the disruptive tasks.
“I am familiar with time-outs, I never expect to fill a form without errors.”
[P01M]
"I didn't even notice that there were pop-ups."
[P37M]
Also, some participants' feedback (P05M and P14M) suggests that the order of presenting the stimuli may have influenced their arousal levels.
“Towards the later tasks, I began to suspect that there was a trick going on
so, I became relaxed.”
[P05M]
“I had problems remembering the instructions earlier on, but it became
easier as the experiment progressed.”
[P14M]
For some participants, their personal experiences may have influenced their response
to the stimuli. For example, a participant who rated both modes of interaction on T1
(booking a trip) as frustrating said:
“Trips are frustrating [...] anyway, I don’t like to see anything that reminds
me about travelling.”
[P31M]
Conversely, another participant rated the normal mode in T1 (booking a trip) as
frustrating, while giving a normal rating for the disruptive mode. The participant’s
feedback was:
“...not happy thinking about a trip to Manchester.”
[P12M]
indicating that previous experiences may have influenced their rating, rather than the stimulus. As previously stated, the purpose of this study is to evaluate the performance of AFA algorithm in discriminating frustration-induced arousal from normal interaction. We have two sources of ground truth to validate our dataset. For more confidence that the intervention group's dataset contains only instances of arousal, the participant must be carrying out a frustrating task and also rate the task as frustrating. For more confidence that the control group's dataset reflects a normal state, the participant must be carrying out a
non-disrupted task and report it as non-frustrating. As a form of secondary analysis, we investigated this concept by carrying out the same test on the subset of the records with perfect agreement. This means that we excluded 109 records where the participant reported frustration on a normal task, or reported normal on a disrupted task, to include only records where the ground truths agree. In our evaluation on emotional stimuli via the picture-viewing study, we learnt that by eliminating confounding factors such as lack of interest and stimuli desensitisation, we could carry out a better evaluation of AFA algorithm. In this particular study, we anticipated that inaccuracy in the participants' reported feedback could be a confounding factor. We also anticipated that since we are using low-intensity stimuli, the effect might be missed by some participants. Therefore, analysing only the data where our two sources of ground truth agree, i.e., participants reporting stress when the stimulus was designed to be stressful, or participants reporting no stress on the controlled stimuli, ensures that our ground truth reflects more accurately what the participants felt. We observed a moderate but higher correlation between the consolidated ground truth and the algorithm (rs(192) = .474, p < .001) after eliminating these confounding factors. We discuss the implications of this increase in correlation in the discussion section. The large effects observed from the ANOVA results and the moderate correlation between the ground truth and the arousal level suggest that the algorithm can detect frustration-induced arousal, with regard to research question RQ1. However, to confirm that this was indeed due to the frustrating component on the screen, as with our research question RQ2, we carried out an analysis on the participants' gaze data to ensure that their attention was focused on the source of frustration during the period of increased arousal.
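As an illustrative sketch of this consolidation step, the code below filters records to the perfect-agreement subset and re-runs the correlation; the file layout, column names and codings are our own assumptions, not the study's analysis script.

```python
# Sketch of the secondary analysis (column names and codings are ours): keep
# only records where the self-report agrees with the mode of interaction,
# then correlate the algorithm's output with the consolidated ground truth.
import pandas as pd
from scipy.stats import spearmanr
from sklearn.metrics import cohen_kappa_score

df = pd.read_csv("records.csv")  # columns: arousal, self_report, mode

reported = (df["self_report"] == "frustrated").astype(int)
disrupted = (df["mode"] == "disruptive").astype(int)

# Interrater agreement between the two sources of ground truth.
print("kappa =", cohen_kappa_score(reported, disrupted))

# Perfect-agreement subset, then Spearman correlation with the algorithm.
agree = reported == disrupted
rho, p = spearmanr(df.loc[agree, "arousal"], disrupted[agree])
print(f"rs = {rho:.3f}, p = {p:.3g}")
```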
The use of fixation metrics as a proxy to measure attention is common practice in eye-tracking studies [Corbetta et al., 1998, Pan et al., 2004, Holmqvist et al., 2011]. Tasks T2D and T3D, which were the mouse pointer malfunction and the operating system crash respectively, did not have specific UI elements on the screen that caused increased arousal. Therefore, we subset only T1D and T4D for this analysis, because the frustrating elements in these tasks were located in a specific area of the screen, i.e., the time-out message in T1 and the pop-up messages in T4. We segmented these tasks into AOIs around the location of the pop-up and time-out message. The mean fixation count (number of fixations) for T1D on the time-out message was 105.027 (SD = 22.514), and for T4D, the mean was 38.210 (SD = 22.193). We performed a Spearman's correlation test and observed a high correlation between the fixation count and the algorithm's measure of arousal for those tasks (rs(75) = .867, p < .001). This result is supported by literature in eye tracking and usability [Ehmke and Wilson, 2007]. Figure 5.5 illustrates that participants who fixated more on the disruptor were more likely to be frustrated.

Figure 5.5: Number of fixations vs level of arousal (all observations n=75)

This suggests that the algorithm is both capable of discriminating between frustration-induced arousal and normal interaction (RQ1), while also detecting the participants' focal attention during these moments (RQ2).
5.6 Discussion
Based on the literature, we selected a list of relevant frustrating events during user
interaction with the assumption that those events would induce an increase in the
participants’ level of arousal. The moderate agreement between feedback from the
participant and expected feedback shows that there is some validity to this assumption,
but also means that not all participants conformed to our expectations. Also, there was a significant increase in the correlation between the ground truth and the output of the algorithm when we excluded records with ground-truth disagreement (between the self-report and the mode of interaction). We discuss possible reasons below.
Individual-response (IR) specificity, where people react idiosyncratically to the same stimulus, has been shown to influence people's autonomic response to stimuli [Engel,
1960, Wenger et al., 1961]. The quotes from participants P01M and P37M suggest that IR specificity affected this study.
The lack of perfect agreement could also be a result of bias in the participants’
self-report. Bias can be caused by exaggerated [Kihlstrom et al., 2000] and under-
stated responses [Levine and Safer, 2002]. Recall bias, where participants are unable to recollect or quantify their experience, is also a common problem in studies that use retrospective self-reported feedback [Hassan, 2006]. To mitigate this, we replayed
the video of the participant during their participation, in the order in which they
performed the tasks to aid them in remembering their level of frustration during the
feedback session. Another factor that could have influenced the moderate interrater agreement is the order in which the tasks were presented to the participants. A consequence of this is that the initial tasks may have had a more frustrating impact than subsequent tasks. For example, the quotes from P05M and P14M suggested that the order may have influenced their susceptibility to the disruptive stimuli and their cognitive states, respectively. The randomised order of the study would have been expected to reduce the effect of the stimuli presentation order in the main results, but this would still have impacted the interrater agreement. For some participants, their personal experiences may have influenced their response to the stimuli, as exemplified by the quotes from P31M and P12M in the results section.
The results suggested that the website contributed to the measure of arousal. Table 5.2 and Figure 5.3 highlight that only T1 (booking a trip) was significantly different from the other tasks. The average duration of this task was also higher than that of the other tasks. Therefore, we speculate that the time, and consequently the effort, required to complete T1 could have aggravated the participants' level of
frustration. This is not to dismiss the type of disruption experienced in task T1 (form time-out) as a contributing factor to the increased frustration level compared to other tasks. If we consider each disruptor by examining the mean difference in arousal between the modes of interaction on each task, results show that T1 had the largest mean difference. According to this, the time-out was the most frustrating disruption, followed by pop-ups, then operating system failure, while the mouse pointer disruptor was the least frustrating in this study. We discuss external validity as a limitation of our experimental design below; considering that this was a controlled study, our evaluation was based on internal validity. Concerning research question RQ1, results suggest that frustration-induced arousal on the Web can be sensed using AFA algorithm. Results also show that participants who fixated more on the area of interest causing arousal experienced more arousal, which is an indication that gaze behaviour can be used to reveal the cause of arousal. In adaptive computing, where user interfaces can be adapted to suit individuals, it is necessary to understand how each component of the UI affects the user. In most affect detection mechanisms, it is difficult to determine the relationship between the visual components and user affect, making them less suitable for adaptive computing or for identifying UX problems. The results provide further credence for the use of AFA algorithm in sensing and quantifying arousal on the Web.
In this controlled study, we showed evidence that common causes of frustration, which may appear subtle to the user but have critical effects on user experience, can be sensed with our proposed approach.
Our findings from this particular study have the potential to impact the way website contents are structured and delivered. Since the algorithm is lightweight (not requiring large computational resources), we propose a system, based on a plugin architecture, in which other third-party systems can receive a trigger from the algorithm whenever the level of arousal exceeds a given threshold. Third-party adaptive systems can, therefore, influence user interaction by altering the presentation (layout, colour, font properties) and contents based on the user's emotional state. This system also has the potential to aid recommender systems in entertainment applications, for example, suggesting music/video playlists and news items, and facilitating digital shopping assistants based on the user's affective profile. One of the causes of users' resistance to software changes is the initial lack of familiarisation with the look and feel. This system could provide the platform for third-party applications to log the user's moments of frustration and their causes, so that developers can leverage this information to improve the UX of future releases. In the future, when Web camera technology improves to offer low-cost eye-tracking, the algorithm could be used in many application domains: in gaming, to alter the game based on the user's emotional state; in tutoring systems, which could be aware of the student's emotional state and offer an experience tailored to fit their emotional profile; and in mission-critical systems such as air-traffic control, where the system may suggest break times for operators based on their affective state.
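A minimal sketch of the proposed trigger is given below; the interface, names and threshold are our own illustrative assumptions about what such a plugin architecture might look like, not a specification.

```python
# Minimal sketch of the proposed plugin trigger (interface and names are
# ours): third-party systems subscribe and are notified whenever the sensed
# arousal level exceeds a threshold.
class ArousalTrigger:
    def __init__(self, threshold):
        self.threshold = threshold
        self.subscribers = []  # adaptive UIs, recommenders, UX loggers, ...

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def update(self, arousal_level, aoi):
        # Called by the arousal-sensing pipeline for each new estimate.
        if arousal_level > self.threshold:
            for notify in self.subscribers:
                notify(arousal_level, aoi)

trigger = ArousalTrigger(threshold=5.0)
trigger.subscribe(lambda level, aoi: print(f"Adapt UI near '{aoi}' (arousal {level})"))
trigger.update(7.5, aoi="booking-form")  # exceeds the threshold, so it fires
```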
5.6.1 Limitations of this study
Several factors beyond the website and the mode of interaction contribute to a user's level of arousal. This was a controlled study in which we examined frustration; on the Web, several other factors contribute to the intensity of frustration, as well as to the overall level of arousal. For example, in the event of a real operating system failure, the recovery may take longer, and the frustration could last longer. Also, when participants enter their payment details while booking a real trip and there is a time-out, the frustration may be experienced differently from experimental conditions, where the participants are aware that the tasks have limited impact on their finances, personal computer or time. To consider other factors that may influence the cause and severity of frustration, a more naturalistic approach should be undertaken. However, besides the technological and ethical limitations of carrying out this study in naturalistic settings, other limitations exist. The algorithm for sensing arousal would need to be optimised to handle real-time data streams. In our methodology, we explored the use of change point detection algorithms, which can function in real-time. Such an algorithm would take in streams of pupil dilation and gaze behaviour data and segment them into fixed windows; AFA algorithm would then be applied to each window to sense a change in arousal. Furthermore, in experimental settings, light conditions such as ambience, monitor brightness and the relative colour difference in the stimulus can be controlled. Developing a module to account for these changes would also improve the algorithm's reliability under naturalistic settings.
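The windowed streaming idea can be sketched as follows; the window size and the analysis callable are placeholders, and this is an illustration of the concept rather than the optimised real-time implementation discussed above.

```python
# Sketch of the windowed streaming idea: buffer incoming pupil samples and
# run the arousal analysis over each fixed, non-overlapping window.
from collections import deque

def stream_arousal(samples, analyse, window_size=300):  # 300 samples = 6 s at 50 Hz
    buffer = deque(maxlen=window_size)
    for sample in samples:            # samples: stream of pupil diameters
        buffer.append(sample)
        if len(buffer) == window_size:
            yield analyse(list(buffer))  # per-window arousal estimate
            buffer.clear()               # start the next window afresh
```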
5.7 Conclusion
Frustration frequently occurs during user interaction and is challenging to prevent due to individual idiosyncrasies and the multiple scenarios that can elicit it. Frustration increases the level of arousal, which we know is a critical factor in performance and user experience. Having evaluated AFA algorithm on emotional and cognitive stimuli, we attempted to sense frustration, which may be of lower intensity. Our stimuli, being Web-based, make this study more ecologically valid, thereby presenting further challenges, such as detecting arousal during ongoing user interaction. We proposed this approach to quantify changes in arousal while also identifying users' focal attention through the analysis of pupillary response and gaze behaviour. The pupillary response was used to sense increases in arousal levels as a proxy for frustration, while the gaze behaviour indicates the user's visual focus and, hence, the likely source of frustration. Results indicate that AFA algorithm offers a feasible method to discriminate between normal tasks and frustration-induced arousal on the Web. We found significant correlations between the user's focal attention on stressful UI elements and the measure of arousal in this study. Therefore, in the next chapter, we apply the concept of relating focal attention to arousal levels in greater detail, to achieve a richer understanding of users' affective behaviour on the Web.
Chapter 6
Arousal detection and visual
attention over users’ scanpath
In previous chapters, we evaluated our arousal-sensing algorithm on emotionally evoked arousal and cognitively induced arousal, both on static stimuli. In the last chapter, we examined its ability to sense low-intensity arousal during more complex interactions on the Web, and we also observed that there is a significant relationship between the cause of arousal (i.e. UI elements on the screen) and the measure of arousal. This means that if we know where participants look when they experience an increase in arousal, we are a step closer to understanding why they experience a certain feeling (e.g., stress). Consequently, we are better equipped to make informed changes to improve the quality of users' experience, both in real-time and as part of a design process.
In this chapter, we demonstrate one of the ways this algorithm can be combined with other recognised methodologies to achieve a richer understanding of users' affective behaviour on the Web. We take the case of the Scanpath Trend Analysis (STA) algorithm by Eraslan et al. [Eraslan et al., 2016a], which is used to summarise users' visual transitions (scan paths) across a Web page. The summary of people's visual transitions tells us where they look and the order in which they scan through a Web page, whereas AFA algorithm can be used to reveal users' affective reactions towards the targets of their visual attention. Together, both algorithms can be used to summarise the users' visual experience. Therefore, in this chapter, we show how we developed and explored our novel methodology, which combines the STA algorithm with our arousal-
sensing technique, to generate our descriptive user model. This methodology is based
on users’ visual behaviour and their affective response to their trending visual paths.
This can be used to understand user interaction better and is therefore applicable in
adaptive systems and intelligent user interfaces. To test this idea of combining scan
paths with arousal sensing, we performed a pilot study, using the Apple home page,
to see whether we can derive a model based on this dataset. After the model was
developed, we evaluated it on datasets from other websites. In this context, we use a case study of neurotypical users vs people with autism on the Web to suggest that AFA algorithm and STA can be used to uncover differences in interest, affective response and visual behaviour.
Autism spectrum disorder (ASD) is a developmental disorder characterised by social
and communication impairments and by restricted interests and repetitive behaviours
[Christensen et al., 2018]. The overall prevalence of autism is estimated to be between
1.1 and 1.2% of the UK populace [Brugha et al., 2012]. The presence of autism relates
to different experiences when interacting with the Web and can elicit different affec-
tive responses [Eraslan et al., 2017, Eraslan et al., 2018]. For example, people with
autism often exhibit idiosyncratic visual attention patterns, which have been shown to
affect their processing of Web pages [Mayes and Calhoun, 2007]. Besides, the strong
preference for structure and familiarity that many individuals on the spectrum have,
may be challenged by changes in the structure or the interface of many applications
[Yaneva et al., 2018]. Last but not least, there are well-documented differences in the
way some people on the spectrum interpret emotions from facial expressions and the
common presence of such faces on Web pages may affect their processing by people
with autism [Harms et al., 2010].
6.1 Motivation behind our methodology
There are many methods of investigating UX problems from both qualitative and
quantitative paradigms, including questionnaires, eye-tracking and physiological com-
puting. Qualitative methods of investigating UX problems often require participants
to communicate their experience. This may be inaccurate for reasons such as a lack of self-reflection with regard to one's differences, as well as difficulties with communication in general, which is also one of the diagnostic criteria for ASD [Paulhus
and Vazire, 2007]. An alternative approach which requires less verbalisation is using
self-reported scales to elicit feedback from users. However, recollecting experiences
requires cognitive processing. People with autism, ADHD and similar developmental
disorders may exhibit cognitive impairment [Volkmar et al., 2014], thereby limiting
the accuracy of their reported scores. Due to these limitations, we argue that meth-
ods that require eliciting intentional feedback from autistic users are less reliable for
understanding the UX challenges that people with autism face. Usability metrics such
as error rates and completion times can be used to detect common problems that users experience on the Web. Predominantly, these metrics answer the question "Is there a problem?", but discovering problems is only the first step to improving UX. Analysis of gaze behaviour with metrics like fixation location/duration and saccades (visual transitions) can be used to answer the more advanced question "Where is the problem?", but this is also limited because knowing where the problem lies may not
one user, but, without knowing why the problem exists, we may not have an under-
standing of which group of users the problem affects. People with autism are not the
only atypical groups of users on the Web. In addition to the idiosyncrasies of typical
users, people with learning disorders, developmental disorders, obsessive-compulsive
disorder (OCD), and other psychiatric/developmental disorders may also require spe-
cial considerations when carrying out studies to identify UX issues specific to these
user groups. The answer to the question - “Why is there a problem there?” provides
more context so that researchers can offer recommendations to overcome the existing
UX issue(s).
Affective computing is computing that relates to, arises from or influences emotions [Picard, 2010]. This is useful, especially when the physiological state of users can be detected and related to their interaction patterns. We show that the combination of visual behaviour and physiological computing presents a rich methodology for identifying and understanding UX issues. Data collected through physiological means are often noisy and may also lack the sample sizes and data distributions required to make findings generalisable.
Our approach provides a descriptive method that aids hypothesis generation about UX issues, which can then be followed up by qualitative feedback from users or further inferential statistical analysis. To explore the feasibility of combining our arousal-sensing approach with scan path analysis, we conducted the following pilot study.
6.2 Pilot study
In this pilot study to develop our methodology combining scan path analysis with arousal sensing, we used a browsing task on the home page of the Apple website. Participants (n=39) were instructed to explore the Apple home page for 30 seconds. Figure 6.1 shows the Apple home page, segmented into AOIs.

Figure 6.1: Apple home page segmented into AOIs

Participants' pupillary response and gaze data were captured using a Tobii eye-tracker by other researchers, so this was a secondary analysis. With permission from the original researchers for the
study, we extracted the dataset and analysed it using our arousal sensing approach to
measure the cumulative arousal levels due to each AOI on the screen. The results are presented in Table 6.1. Since we are interested in combining the STA algorithm with AFA algorithm, we first compare the output of AFA algorithm with the STA scan path: F C H I E. We notice that the AOIs included in the STA scan path were all located in the middle of the screen. This is expected, because the trending scan path should contain items that people mostly fixate on, and since these are located in the middle rather than at the extreme edges, people are more likely to notice them. Another factor worth considering is that this is a browsing task; hence, participants are more likely to focus on the more conspicuous AOIs rather than the obscure ones, since they are under no obligation to consider the less conspicuous elements on the screen. Conversely, we can see from our cumulative arousal scores that the more obscure items at the bottom of the screen, i.e., the footer elements, elicited the most arousal (AOIs K, L and N). However, this is only the case for footer elements that were fixated on; others (e.g. AOIs M, O, P and Q) were not fixated on at all. Therefore, we can hypothesise that if an element is not conspicuous, it may not be fixated upon by most people, but when it is fixated on, it may result in increased arousal. Looking closely at this postulation, there is an exception: the navigation bar and the search bar are also located at an edge of the page (the top), whereas they elicited relatively low levels of arousal, 3.07 and 1, respectively. Hence, we can refine our postulation to mean that the reason for the increased arousal on the footer elements is that participants were stressed by the presentation of the content, i.e., the small fonts used in the footer.
Now that we have seen that it is possible to combine arousal scores with trending scan paths to aid hypothesis generation about the behaviours of Web users, we ask the question: "How can we combine the STA algorithm and our arousal-sensing approach such that they form an affective model for Web users?"
We discuss our design process and the rationale for our proposed methodology below.
Table 6.1: Cumulative arousal per AOI on the Apple home page
AOI   Cumulative arousal   Description        Location on the page
A     1                    Search bar         Top Right
B     3.07                 Navigation bar     Top
C*    5.22                 iPad mini          Middle Left
D     4.45                 Video thumbnail    Middle Left
E*    2.89                 Video thumbnail    Middle Left
F*    4.44                 iPad picture       Middle Right
G     3.13                 iPad thumbnail     Middle Left
H*    2.79                 Mac thumbnail      Middle Left
I*    3.60                 iTunes thumbnail   Middle Right
J     2.91                 iPhone thumbnail   Middle Right
K     7.50                 Footer left        Bottom Left
L     7.50                 Footer right 1     Bottom Right
M     -                    Footer right 2     Bottom Right
N     9                    Footer right 3     Bottom Right
O     -                    Footer right 4     Bottom Right
P     -                    Footer right 5     Bottom Right
Q     -                    Footer right 6     Bottom Right
R     -                    Footer right 7     Bottom Right
Note: AOIs with * were included in the STA trending scan path.
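To make the aggregation concrete, the following is a minimal Python sketch of how cumulative arousal per AOI could be computed from individual arousal events; the data layout and the function name are illustrative assumptions, not the exact implementation used in this thesis.

    # Minimal sketch: aggregate detected arousal events into cumulative
    # arousal per AOI. `events` pairs each arousal episode with the AOI
    # fixated at that moment; names and values are illustrative.
    from collections import defaultdict

    def cumulative_arousal(events):
        """events: iterable of (aoi_label, arousal_score) tuples."""
        totals = defaultdict(float)
        for aoi, score in events:
            totals[aoi] += score
        return dict(totals)

    print(cumulative_arousal([("K", 3.5), ("K", 4.0), ("A", 1.0)]))
    # {'K': 7.5, 'A': 1.0}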
6.3 Formation of the methodology
The purpose of this methodology is to build a model that factors in users’ scan paths, as
well as what they feel (in terms of arousal) when they fixate on the scan path. As stated
earlier, the goal is to aid hypothesis generation, so the model may be descriptive and
qualitative rather than produce a quantitative “matter of fact” result. We formulated
three ideas for this methodology. The first two approaches pertain to fusing one
method into the other, while the last one combines both approaches loosely into one
model.
• Use the measure of arousal as a factor in deriving the STA scan path:
The STA algorithm makes use of gaze patterns, such as fixation durations and counts,
to select the elements that will be included in the final STA scan path. One
approach would be to add a threshold such that an AOI is included in the STA
scan path only when the users' cumulative arousal for it meets this cut-off value.
Semantically, this would mean that the output of the STA scan path would be:
1. UI elements that participants fixated upon, and 2. elements that induced an
increase in arousal. The use case for this would be where UX researchers are
interested in evaluating areas on the screen that people both look at and react to
emotionally. Further investigation can diagnose whether the emotion is positive
or negative, which would then inform the UI designers to either retain the design
in the former case or improve it in the latter. One limitation of this approach is
that it abstracts away the measures of arousal, thereby limiting the potential to
make meaning of the individual affective scores for each element.
• Use the scan path as a weighting factor in calculating the cumulative
arousal: When a UI element is included in the trending scan path, it means
that most people visit the AOI visually. However, if that same element induces a
significantly high measure of arousal, it means that that element is a key factor
in determining the feelings of most users on the web-page. This is applicable
when UX researchers want to generate a hypothesis about why most users find
a website frustrating or interesting. For example, if a UI element is seen to
induce a high level of arousal due to frustration or stress, and most people who
visit the website fixate on that UI element, it means that the impact of that
frustration on the quality of user experience is likely to be high. Conversely, if a
UI element is fixated on by most users, and people have low arousal when they
fixate on it, we may find that the UI element takes up considerable space on
the website while having little to no effect on the emotions of the users. While
sometimes, this is a good thing, other times, this may serve as a useful cue
in improving the level of engagement of a particular website. While there are
other approaches to evaluating engagement such as using usability scales [Obrien
and Toms, 2013, Xu, 2015], and website interaction metrics such as Google
analytics [Kirk et al., 2012, Bijmolt et al., 2010], this approach provides a richer
understanding of “why people feel (dis)engaged?”, rather than only answering
the question “are they engaged?” [Hart et al., 2012]. This approach, however,
comes with some drawbacks. Firstly, the scan path sequence is lost. The order of
the scan path sequence carries rich meaning for understanding the visual scan
behaviour of users, and is especially useful when carrying out website transcoding
to improve visual search behaviour [Harper et al., 2006]. Secondly, when we fuse
the scan paths as weighting factors for the measure of arousal, we tamper with
the semantic meaning of our results: we couple arousal with visual engagement,
which are two fundamentally different behaviours. One person may be visually
engaged yet not experience increased arousal, while another may experience
increased arousal without being visually engaged, as
the third approach where both methods (the STA algorithm, and our arousal
sensing approach) are loosely coupled, yet combined, to have a rich semantic
meaning to users’ behaviours on the Web.
• Combining visual scan paths with our arousal sensing approach: In
the first approach, the arousal measure was used as a selection factor in deriving
the STA scan path (which abstracts away the cumulative arousal scores). Therefore, we would lose
some of the richness in the measures of arousal, which would have been useful in
generating hypotheses regarding participants’ affective behaviour. In the second
approach, where we would have used the scan paths as weighting factors for
the measure of arousal, we would lose the semantic meaning of what the result
means. Therefore, we discuss the benefits of combining both algorithms loosely,
and how we tackle the problem of perception since there is a potential for infor-
mation overload when the results are not as summarised as in the previous two
approaches.
This loosely coupled combination gives us the clarity to generate UX hypotheses.
For example, when we notice that some areas of the screen induce high
arousal, we can refer to the scan path to check whether there was a gradual build-up
leading to a cumulative increase from previous AOIs. If so, we could
generate questions such as, "did people become stressed on a UI element as a
consequence of increased frustration from previous ones?". Another reason for
combining both approaches this way is to retain the semantic meaning: we can
view the scan paths for what they are, a visual scan sequence, and the affective
measure for what it is, a measure of arousal. Deriving a new UX measurement
would require a series of evaluations and ground truth to verify any claims; at
this point, we are interested in understanding behaviour rather than defining new
ways of measuring a certain behaviour. Combining both algorithms does mean
that there is more information to assimilate, which may lead to information
overload. To manage this, we developed a visualisation that represents the
cumulative measure of arousal for each AOI, as well as its sequential order from
the scan path analysis, by superimposing these values upon the visual stimulus,
in this case the Web page (a sketch of this combination follows this list). The
scan path sequence is represented as letters, while the arousal level is represented
as a number within the circle that is superimposed on the AOI of the Web page.
The size of the circle also represents the measure of arousal. This is also useful
when we want to compare the behaviours of different groups of users: we colour
code each user group's visualisation so that the groups are distinguishable. We
discuss this in the analysis.
This approach offers significant contributions to UX research, such as modelling
user groups for adaptive computing and intelligent systems. We discuss our
contribution next.
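As promised above, here is a minimal Python sketch of the loosely coupled combination: the STA trending scan path keeps its sequence, and each AOI carries its own arousal score. The function and field names are illustrative assumptions rather than the thesis implementation; the example values come from the pilot study in Table 6.1.

    # Minimal sketch: pair each AOI in the STA trending scan path with its
    # sequence letter and (rounded) cumulative arousal level.
    def combine(sta_path, arousal_by_aoi):
        """sta_path: ordered AOI labels; arousal_by_aoi: AOI -> arousal."""
        return [
            {"order": chr(ord("A") + i),   # sequence letter shown in the circle
             "aoi": aoi,
             "level": round(arousal_by_aoi.get(aoi, 0))}  # ordinal arousal level
            for i, aoi in enumerate(sta_path)
        ]

    print(combine(["F", "C", "H", "I", "E"],
                  {"F": 4.44, "C": 5.22, "H": 2.79, "I": 3.60, "E": 2.89}))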
To evaluate our methodology, we use the case of two groups for which the literature
has provided evidence of differences in cognitive and affective processing. How autistic
and neurotypical users differ in their cognitive and affective behaviours has previously
been researched. However, our proposed methodology may be used to uncover other
behavioural idiosyncrasies between the two groups. Therefore, in the next section, we
briefly discuss existing research about this population regarding their behaviours on
the Web.
6.3.1 Autism and the web
Previous work showed evidence that atypical visual attention in individuals with
autism may result in unconventional information-searching strategies on the Web
[Eraslan et al., 2018]. Such differences revealed through eye-tracking have also been
used to classify users into autistic and neurotypical groups [Yaneva et al., 2018]. A
more thoroughly researched factor on the Web known to affect the two groups dif-
ferently is textual content. While not specifically presented on web pages, deficits in
reading comprehension among people with autism have been widely researched [Chi-
ang and Lin, 2007, Ricketts et al., 2013], including by means of eye-tracking experi-
ments [Yaneva et al., 2016b, Yaneva and Evans, 2015, Yaneva et al., 2016c, Yaneva,
2016] and in combination with images [Yaneva et al., 2015]. This issue has been
addressed in readability research by attempting to measure the difficulty of text for
readers with autism, specifically based on the difficulties they encounter [Yaneva et al.,
2016a, Yaneva et al., 2017]. To the best of our knowledge, there is currently no method-
ology that combines pupillary response, gaze behaviour and scan path analysis to study
the affective response of this population on the web. Hence, we discuss our research
contributions next.
6.4 Research questions and contributions through
this study
For our methodology, we combine two different algorithms: 1. The Scanpath Trend
Analysis (STA) algorithm, which provides the trending scan path followed by a group
of users [Eraslan et al., 2016a] and 2. Arousal detection through the analysis of
pupillary response to identify moments of increased arousal. Gaze analysis is then
used to identify the user’s visual attention on the screen (i.e., visual element) during
moments of increased arousal [Matthews et al., 2018b]. The individual arousal scores
for each participant are averaged for each visual element. The sequence of each visual
element of the trending scan path and the corresponding arousal score is combined
to produce a visualization. The output of our approach is an aggregate of a group of
users’ scan paths over a visual stimulus, and their arousal response to each element of
the scan path. Based on this approach, our research questions are as follows:
RQ1. Does the combination of scan path analysis, and level of arousal reveal
differences between people with autism and neurotypical people in
Web browsing tasks? This question allows us to apply our approach to other
methods for understanding user behaviours on the Web.
RQ2. Does the combination of scan path analysis and arousal reveal where
the differences in arousal occur between people with autism and neu-
rotypical people in Web browsing tasks? This research question furthers
RQ1 in Chapter 1, pertaining to the overall goal of our research to determine
the focal attention of users during moments of increased arousal.
To test our hypothesis and evaluate our methodology, we use two populations:
19 neurotypical users and 19 users with autism. During a Web browsing task involving
8 Web pages, their pupil dilation and gaze behaviour were tracked. We then apply
the STA algorithm and our arousal sensing algorithm. Finally, we combine them and
visualise the result. We observe differences in their visual behaviours as, in certain
instances, the autistic group exhibits a lower arousal response to affective contents.
While this is consistent with the literature on autism, we confirm this phenomenon on
the Web. Our approach and findings present a novel research methodology to identify
and improve understanding of user interaction problems of user groups with varied
interaction patterns and experiences. Our contributions are as follows:
1. A methodology that combines visual scan path analysis with arousal scores to
provide a more holistic understanding of the users’ experience.
2. The analysis and visualisation of differences in autistic users vs neurotypical
users in Web browsing tasks using our methodology.
The study and data collection were carried out by Yaneva et al. [Eraslan et al., 2018], and
the STA algorithm was developed by Eraslan et al. [Eraslan et al., 2016a]. In the next
section, we describe the experiment in further detail.
6.5 Experiment
In this section, we explain the methodology employed to address our research questions
that are specific to this chapter. Further details about this experiment, the method-
ology and dataset can be found in the original publication of this study by Yaneva et
al. [Eraslan et al., 2018]. We performed a secondary analysis of this study to highlight
our contribution.
6.5.1 Participants
A total of 38 participants, 19 with a formal diagnosis of autism and 19 control-group
participants, were recruited for this study1. None of them had any diagnosed degree
of intellectual disability, nor any reading disorders. The mean age for the ASD group
was M = 41.05 with SD = 14.04, range [21-67], and M = 32.15, SD = 9.93, range
[20-56] for the control group. All participants with ASD were recruited through a UK
autism charity and the student enabling centre at the University of Wolverhampton.
All control-group participants were recruited through snowball sampling. Both the
participants with ASD and the control-group participants were highly able adults,
all of whom were living independently and without relying on a caregiver. From the
ASD group, 11 people had completed a higher education degree, six people had a UK
equivalent of a high-school degree (GCSE or A-levels), and two people preferred not to
answer. From the control group, 15 people had completed a higher education degree,
and three people had completed A-levels (equivalent to high school). All participants
were native speakers of English except four control-group participants, who were highly
fluent, having lived in the UK for many years. All participants reported that they use
the Web daily, with only one ASD participant reporting Web usage “less than once a
month”. All participants identified as having normal or corrected to normal vision.
6.5.2 Apparatus
A Gazepoint GP3 video-based eye-tracker was used to capture pupillary response,
fixation location, and fixation duration of the participants at a frequency of 60Hz. All
questions and answers were exchanged verbally; hence, no mouse or keyboard was used.
The stimuli were presented on a 17” LCD monitor. The experiment was run using the
Gazepoint experimental environment and the laptop used for the experiments had a
Windows 10 operating system.
6.5.3 Materials and Method
Eight Web pages were selected by first exploring the home pages of the top 100 web-
sites listed by Alexa.com, excluding those that were repeated more than once. Pages
1This experiment was approved by the University of Wolverhampton, UK committee on ethics.
that were not in English and were mainly designed for authentication and/or as search
pages were also excluded. We then selected the final eight pages in such a way as to
have a balanced representation of factors such as complexity and space between ele-
ments. The complexity values were obtained using the VICRAM algorithm [Michaili-
dou et al., 2008]. In our final selection, an equal number of pages had a high complexity
(YouTube, Amazon, Adobe and BBC) and low complexity (WordPress, WhatsApp,
Outlook and Netflix), as well as small (Outlook, Netflix, Adobe and BBC) and large
space (WordPress, WhatsApp, YouTube and Amazon) between their elements. Par-
ticipants were presented with screenshots of the pages to ensure consistency in the
look and feel of the web pages. For images of each page, see Figure 6.2a.
There were two types of tasks: browsing and synthesis, presented in counterbal-
anced order for each participant. For the browsing task, the participants were free to
explore each page for 30 seconds. For the synthesis task, each participant had up to
120 seconds to answer two questions per page, with the possibility to move forward
earlier if they had answered the questions. Each question required the participants to
combine information from at least two-page elements to arrive at the third piece of
information not explicitly given on the page. Examples include “What is the cheapest
plan you can get that offers Email & Live Chat support?” for the WordPress page,
where the participant has to identify the plans that offer email and live chat support
and compare their prices or “Which item has the largest price discount measured in
percentage?” for the Amazon page. In this chapter, we selected the browsing task for
our case study so that our analysis is based on a single web page per website.
All experiments were conducted in a quiet room. First, the consent form and
the demographic questionnaire were filled in by the participants. After that, the eye
tracker was calibrated using a nine-point calibration, and the experiment commenced.
All questions and answers were given verbally, and the participants were all given
a break between the tasks. After completing the experiment, all participants were
debriefed.
Figure 6.2: Screenshots of the eight Web pages used as stimuli: (a) WhatsApp, (b) Amazon, (c) WordPress, (d) Netflix, (e) BBC, (f) YouTube, (g) Adobe, (h) Outlook
6.5.4 Analysis
The analysis is carried out in two main stages. The first stage makes use of scan
path analysis, using the STA algorithm to summarise participants' scanpaths into a
single scanpath. Following this, AFA algorithm is applied to each element that
constitutes the trending scanpath for each group (autistic and neurotypical). We
explain this further in the subsections below.
The STA algorithm
The Scanpath Trend Analysis (STA) algorithm identifies the trending path of
multiple users on a Web page in terms of its AOIs. The STA algorithm is a multi-pass
algorithm comprising three core stages: (1) Preliminary Stage, (2) First
Pass and (3) Second Pass. The detailed description of the STA algorithm can be
found in [Eraslan et al., 2016a].
1. Preliminary Stage: This stage first takes a series of fixations for each user on
a particular Web-page and the details of the AOIs of the page. It then matches
each fixation with its corresponding AOI to generate the individual scan paths
in terms of the AOIs of the Web-page (a minimal sketch of this matching step
follows this list).
2. First Pass: Once the individual scan paths are ready for further processing, the
First Pass starts analysing them to identify trending AOIs by selecting the AOIs
which are shared by all the users, or which attract at least the same attention as
the fully shared AOIs based on their total fixation durations and total fixation counts.
3. Second Pass: After identifying the trending AOIs, the Second Pass calculates an
overall sequential priority value for each trending AOI based on its positions
in the individual scan paths. It then combines these AOIs based on their priority
values to discover the trending path, where the trending AOI with the highest
priority is the first one in the trending path.
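The following Python sketch illustrates the fixation-to-AOI matching of the Preliminary Stage under simplifying assumptions: AOIs are axis-aligned rectangles, and a fixation is matched to the first AOI whose bounds contain it. The names are illustrative; this is not the published STA implementation.

    # Minimal sketch of the Preliminary Stage: map raw fixations to AOI
    # labels to form an individual scan path.
    def to_aoi_scanpath(fixations, aois):
        """fixations: [(x, y), ...]; aois: {label: (x0, y0, x1, y1)}."""
        path = []
        for x, y in fixations:
            for label, (x0, y0, x1, y1) in aois.items():
                if x0 <= x <= x1 and y0 <= y <= y1:
                    path.append(label)
                    break
        return path

    aois = {"A": (0, 0, 100, 50), "B": (0, 60, 100, 120)}
    print(to_aoi_scanpath([(10, 20), (30, 80), (50, 30)], aois))  # ['A', 'B', 'A']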
The STA algorithm was evaluated by comparing its resultant paths with the re-
sultant paths of other similar algorithms by using different AOI detection approaches
[Eraslan et al., 2016a, Eraslan et al., 2016b]. The evaluation shows that the resultant
path of the STA algorithm is the most similar one to individual scan paths. Thus it
discovers the most representative path. The detailed results of the evaluation can be
found in [Eraslan et al., 2016a, Eraslan et al., 2016b].
Arousal sensing and detection of focal attention
We developed our arousal sensing algorithm iteratively over different eye-tracking
datasets from different application domains (i.e., medicine and ontological authoring)
and ground truth from domain experts and participants’ self-reported feedback. It has
been evaluated on detecting cognitively induced arousal, using the Stroop effect [MacLeod,
1991] to elicit differential levels of cognitive load. It has also been evaluated on
emotionally induced arousal using datasets from the International Affective Picture
System (IAPS) database [Lang et al., 1997]. Furthermore, by this time, it had been
evaluated on its ability to sense frustration-induced arousal on the Web. Therefore,
the algorithm was suitable for sensing arousal in our Web browsing tasks. Details of
our implementation of this can be found in Section 3.6. From the output of our arousal
sensing algorithm, we generate vectors that represent the measure of arousal elicited
by each AOI on each participant. We merge the two algorithms by computing the
average and standard deviations of the arousal scores for each of the visual segments
(AOIs) that make up the trending scan paths from STA algorithm.
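A minimal Python sketch of this merge step follows; it assumes the per-participant arousal scores are already grouped by AOI, and the names are illustrative rather than the exact implementation.

    # Minimal sketch of the merge: for each AOI in the trending scan path,
    # compute the mean and standard deviation of the per-participant
    # arousal scores (only participants with a change in arousal there).
    from statistics import mean, stdev

    def merge(trending_path, scores_per_aoi):
        """scores_per_aoi: AOI -> list of per-participant arousal scores."""
        summary = {}
        for aoi in trending_path:
            scores = scores_per_aoi.get(aoi, [])
            summary[aoi] = {
                "n": len(scores),
                "M": mean(scores) if scores else None,
                "SD": stdev(scores) if len(scores) > 1 else None,
            }
        return summary

    print(merge(["160", "159"], {"160": [3.1, 2.9, 3.9], "159": [4.0, 3.4]}))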
6.5.5 Visualising our visual behaviour model
Our pilot study revealed some of the factors that may influence arousal changes during
web interaction. They include the position of elements on the screen, the content and
the font size. Our design of the visualisation considered the fact that the structure and
content must be put into perspective if our methodology is to be used to uncover reasons
why people experience a change in arousal or group differences in Web behaviour.
In regards to group differences, it is also necessary that the visualisation is capable
of showing multiple groups within the visualisation. Therefore, our design involved
superimposing the objective measurements from the algorithms (affective measures,
and STA scan path sequence) onto the visual stimulus (after it has been segmented
into AOIs).
To help us determine where the primary differences are located, this visualisation was
utilised for data exploration (Figure 6.11). Both groups are colour coded so that mul-
tiple groups can be easily distinguished. The circles that are superimposed on the
AOIs that elicited the arousal contain a letter and a number. The letter indicates the
sequential order in which the AOI was visited when viewing the Web-page. The
corresponding number represents the arousal level (AL) for each AOI and is rounded to
the nearest whole number so that the size of each circle indicates the ordinal level
of arousal for each group. The arousal levels on this visualisation can be treated as
ordinal measures where 1 to 3 indicate low arousal, 4-6 medium and 7-9 high levels
of arousal. This visualisation was used to generate hypotheses that may explain the
general behaviour of participants in each group.
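A minimal sketch of such an overlay follows, assuming matplotlib and a page screenshot loaded as an image (e.g., via plt.imread); the coordinates, radii and colours are illustrative assumptions rather than the exact design of Figure 6.11.

    # Minimal sketch: superimpose one group's combined model on a screenshot.
    # Each circle carries the sequence letter and the rounded arousal level,
    # and its radius scales with arousal.
    import matplotlib.pyplot as plt
    from matplotlib.patches import Circle

    def overlay(screenshot, model, colour):
        """model: list of {'order', 'level', 'x', 'y'} dicts for one group."""
        fig, ax = plt.subplots()
        ax.imshow(screenshot)
        for item in model:
            radius = 10 + 5 * item["level"]          # size encodes arousal
            ax.add_patch(Circle((item["x"], item["y"]), radius,
                                color=colour, alpha=0.4))
            ax.text(item["x"], item["y"], f"{item['order']}{item['level']}",
                    ha="center", va="center")
        return fig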
6.6 Results
In relation to RQ1, which concerns detecting differences in arousal between both groups
for each task, we computed the mean arousal score per participant for each Website.
Results from Table 6.2 indicate that only the YouTube Website shows a significant
difference in arousal between the trending scan paths of the autistic group and the
neurotypical group.
Table 6.2: Results of the Mann-Whitney U test comparing arousal between the groups (autistic and neurotypical), with Bonferroni correction α = 0.00625

Website     U        p-value
WhatsApp    44.0     .077
Amazon      792.5    .380
WordPress   755.5    .233
Netflix     2132.0   .438
BBC         341.5    .208
YouTube     388.5    .002*
Adobe       1118.0   .211
Outlook     405.0    .231
Note: * = p < .00625
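A minimal sketch of this comparison in Python, assuming lists of per-participant mean arousal scores for one Website; it uses scipy's two-sided Mann-Whitney U test with the Bonferroni-corrected alpha of 0.05 / 8 = 0.00625. Variable names are illustrative.

    # Minimal sketch: nonparametric group comparison per Website with a
    # Bonferroni-corrected significance threshold.
    from scipy.stats import mannwhitneyu

    def compare_groups(control_scores, asd_scores, n_tests=8):
        alpha = 0.05 / n_tests                 # Bonferroni correction
        stat, p = mannwhitneyu(control_scores, asd_scores,
                               alternative="two-sided")
        return stat, p, p < alpha              # U, p-value, significant?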
The mean arousal for the neurotypical group (M =4.00, SD=2.80) for the YouTube
Website was also higher than that of the ASD group (M =2.41, SD=2.24). This is con-
trary to our hypothesis that people with autism would experience higher arousal levels
in the browsing tasks. With regard to RQ2, pertaining to identifying differences in
AOIs between groups, further exploration shows that both groups experience arousal
in different degrees from different elements in their respective scan paths.
We examine the bar charts and error bars from each of the websites, carrying out
statistical tests to analyse these differences.
6.6.1 Analysis of the Web pages by their AOIs
Table 6.3: Scan path sequence (Seq), participants (n) with a change in arousal level
per AOI, mean arousal (M) for those participants and standard deviation (SD) for the
control and autistic (ASD) groups.
NB: The gaps in the table exist where fewer elements make up the trending scanpath
over a website for that group.
Website Seq  Control: AOI n M SD  ASD: AOI n M SD
WhatsApp A 160 13 3.3 2.98 156 4 1.28 0.33
B 159 10 3.68 3.09
C 156 5 5.37 3.61
D 157 3 2.02 1.28
E 161 8 2.8 2.7
Amazon A 263 8 3.82 2.76 263 8 2.95 2.58
B 265 6 5.19 3.59 264 12 6.27 3.22
C 264 9 3.32 3.35 265 11 3.67 3.02
D 266 10 4.11 3.83 266 5 2.36 2.62
E 260 7 3.47 3.31
F 251 7 1.61 0.96
WordPress A 178 14 5.67 3.38 179 10 4.48 3.03
B 179 16 2.96 2.83 178 12 4.74 3.45
C 180 7 2.68 2.93 180 13 3.6 3.27
D 179 10 4.48 3.03
Netflix A 126 6 1.67 1.03 277 6 5.31 4.1
B 125 11 2.8 2.57 279 12 3.97 3.11
C 277 7 4.77 2.86 280 11 2.99 2.53
D 280 4 6.44 3.78 279 12 3.97 3.11
E 279 12 3.83 3.48 126 6 3.25 2.99
F 277 7 4.77 2.86 281 10 2.71 3.01
G 280 4 6.44 3.78
H 118 1 1 -
I 279 12 3.83 3.48
J 283 6 1.33 0.82
K 281 6 4.17 3.82
BBC A 109 17 5.48 3.89 109 17 5.71 3.37
B 110 9 3.09 3.42 110 13 4.19 3.35
YouTube A 198 5 5.16 2.83 199 8 3.93 3.64
B 199 6 5.29 3.11 200 6 2.35 1.31
C 200 7 6.24 3.15 202 9 1.67 1.33
D 203 5 3.5 3.21 203 6 1.54 0.67
E 202 8 3.15 1.88
F 210 3 2.47 2.54
G 206 7 3.57 2.51
H 207 3 1.32 0.33
Adobe A 16 7 3.55 1.54 15 6 4.11 3.45
B 15 6 3.6 2.9 18 8 3.27 2.2
C 18 11 3.38 2.76 16 8 2.52 2.35
D 19 7 3.02 2.88 19 9 3.69 2.49
E 21 13 3.78 3.08 20 5 4.02 3.73
F 22 5 3.8 3.35 21 8 4.55 3.97
G 225 7 6.1 3.69
Outlook A 137 10 4.21 3.42 137 8 2.56 2.01
B 138 16 3.53 3.26 138 16 5.42 3.44
C 136 11 3.47 2.18
Figure 6.3: Levels of arousal from the autistic and neurotypical group per AOI for the WhatsApp Web page
Figure 6.4: Levels of arousal from the autistic and neurotypical group per AOI for the Amazon Web page
Figure 6.5: Levels of arousal from the autistic and neurotypical group per AOI for the WordPress Web page
Figure 6.6: Levels of arousal from the autistic and neurotypical group per AOI for the Netflix Web page
Figure 6.7: Levels of arousal from the autistic and neurotypical group per AOI for the BBC Web page
Figure 6.8: Levels of arousal from the autistic and neurotypical group per AOI for the YouTube Web page
Figure 6.9: Levels of arousal from the autistic and neurotypical group per AOI for the Adobe Web page
Figure 6.10: Levels of arousal from the autistic and neurotypical group per AOI for the Outlook Web page
We start with Figure 6.3, which presents the arousal levels for the WhatsApp Web
page for each group: autistic (red) vs neurotypical (blue). We can see that
AOI 156 (the search bar) elicited a higher magnitude of arousal in the neurotypical
group than in the autistic group. AOI 153 (the WhatsApp logo) also induced more
arousal in the neurotypical group than in the individuals with autism. The only AOI to
elicit more arousal in the autistic group than in the neurotypical group was AOI 163 (an
AOI which describes the features of WhatsApp on Web and Desktop).
Moving on to the Amazon Web page in Figure 6.4, there were four AOIs where the
neurotypical population experienced arousal of a distinctively higher magnitude than
the individuals with autism: AOI 229 (search bar), AOI 254 (Ad feedback), AOI 246
(language selection) and AOI 232 (delivery location). There were three AOIs with a
distinctively higher magnitude of arousal for the individuals with autism compared to
the neurotypical participants: AOI 264 (deal of the day - security system), AOI 260
(product filter), and AOI 240 (navigation item - Today's deal).
Regarding the arousal levels per AOI on the WordPress Web page, Figure 6.5 shows
that only AOI 176 (caption - "choose your WordPress.com flavour") elicited more
arousal in the neurotypical participants than in the individuals with autism. Similarly, only one
AOI, AOI 169 (a navigation item), elicited slightly more arousal in the autistic group than in
the neurotypical population.
On the Netflix Web page (Figure 6.6), only AOI 280 (feature item - Ultra HD available)
elicited more arousal for the neurotypical users than the individuals with autism.
Contrarily, AOI 126 (caption - "Join free for a month"), AOI 117 (picture and caption
- "Pick your price"), AOI 118 (picture and caption - "No commitments, cancel online at
any time"), AOI 124 (header banner of a movie - "Narcos") and AOI 284 (feature item
- "cancel anytime") all induced arousal of higher magnitudes in the individuals with
autism compared to the neurotypical users.
Figure 6.11: Levels of arousal from each group's trending scan path, overlaid on the AOIs of each Website: (a) legend, (b) WhatsApp, (c) Amazon, (d) WordPress, (e) Netflix, (f) BBC, (g) YouTube, (h) Adobe, (i) Outlook
On the BBC Web page, Figure 6.7 shows no AOI where the magnitude of arousal
for the neurotypical group was higher than for the autistic population, whereas three AOIs,
AOI 113 (news caption - "Today's Formula 1"), AOI 276 (page header - sports) and
AOI 115 (programme schedule - "Australian Grand Prix"), induced higher arousal in the
autistic group than in the neurotypical population.
Figure 6.8 shows the results for the YouTube page. Recall that the YouTube Web page
was the only page that showed a statistically significant difference between the overall
cumulative arousal of the neurotypical population and the population of individuals
with autism. Breaking it down by AOIs, there were six AOIs where neurotypical users
experienced a higher magnitude of arousal compared to the individuals with autism,
namely AOI 202, AOI 215, AOI 203, AOI 200, AOI 206 and AOI 212, which are all
video thumbnails and captions. In contrast, AOI 210, AOI 214 and AOI 190 (captions
and video thumbnails), AOI 205 and AOI 204 (category headers), and AOI 287 and
AOI 289 (menu items) all induced more arousal in the individuals with autism than in
the neurotypical participants.
On the Adobe Web page, Figure 6.9 reveals that only AOI 24 (a caption - "Not in
school? See other Creative Cloud plans") induced more arousal in the neurotypical
participants than in the individuals with autism. Similarly, only one AOI, AOI 20 (a
picture and a caption representing Adobe's "Document Cloud"), induced more arousal
in the individuals with autism than in the neurotypical population.
Finally, for the Outlook Web page, Figure 6.10 shows that only AOI 141
(the Endnote logo) induced more arousal in the neurotypical users than in the individuals
with autism, whereas AOI 138 (a region filled with text describing some of the
features of Outlook), AOI 139 (the Skype logo) and AOI 146 (the Giphy logo) induced
more arousal in the individuals with autism than in the neurotypical participants.
Our observation from these results is that textual content is more likely to induce a
greater magnitude of arousal in individuals with autism than in neurotypical
users. Next, we discuss the combined results of our arousal analysis and the output of
the STA algorithm. The averages and standard deviations of the arousal scores that
make up the trending scan paths for each group are given in Table 6.3. For example, in
the first row, 13 participants (out of 19 in the control group) experienced a mean
arousal score of M=3.3, SD=2.98 while looking at AOI 160 of the WhatsApp Website,
whereas no participant from the ASD group experienced an increase in arousal caused
by the same AOI. From our visualisation in Figure 6.11b, there was a longer trending
scan path for the control group, compared to the ASD group, which had only one
element, AOI 156 (the WhatsApp search bar). The control group experienced a level 5
measure of arousal, whereas the ASD group was at level 1. In Figure 6.11e, the control
group experienced a more varied range of arousal, from level 1 to level 6 over nine
different elements, compared to the ASD group, which ranged from level 3 to 5 over five
elements. Similarly, in Figure 6.11g, we can observe that the control group has a
longer scan path and more common visual coverage over the website, whereas the ASD
group has a linear, horizontal scan path across the YouTube Web-page. The arousal
response due to AOI 200 from the control group (AL=6) was greater than that of the ASD
group (AL=2). Interestingly, AOI 200 pertains to a video about Stephen Hawking and
his death. Overall, the ASD group exhibited lower levels of arousal (AL=4 vs AL=5
by the control group) when looking at the UI element (AOI 199). The image displayed
on this AOI was a picture of people laughing in excitement. This difference may be
due to the ASD group exhibiting different affective responses than neurotypical people,
which is consistent with findings in the literature [Gallese, 2006].
These results show that the fusion of the STA algorithm and AFA algorithm summarises
the behaviours of groups of users and the differences between them. The observations made using our approach can
be analysed further using either inferential statistics or qualitative methods to provide
additional supporting evidence for research findings. In the next section, we discuss
how our modelling method can be used to identify possible UX issues, such as cognitive
load, the presence of stress and differences in the visual perception of the Web elements
in relation to physiological arousal.
6.7 Discussion
Regarding RQ1, which pertains to determining whether there is a difference in arousal between
the two groups in browsing tasks, we hypothesised that the ASD group would experience
more arousal than the neurotypical users. However, our assumption did not hold,
perhaps because an increase in arousal can be indicative of interest, anticipation or excitement
at different AOIs, all of which could have confounded our results. A more direct
approach would be to test each UI element against our hypothesis, for example,
by narrowing our hypothesis to specific AOIs where we have more understanding and
control of their emotive rating and the complexity of their content. This was, however,
not possible due to the small and uneven sample size of people who fixated on AOIs,
and the varying order of their fixations, which made statistical group comparisons
inappropriate [Greenland et al., 2016].
With regard to RQ2, identifying differences in AOIs between groups, we were able to
observe differences in visual and physiological patterns between the two groups with our
descriptive approach. For example, we observed in Table 6.3 that the autistic group
showed less arousal than the neurotypical group for several UI elements on the
YouTube page. One element shows people in a happy state, while another contains
the thumbnail for a video regarding the death of the physicist Stephen Hawking. A UX
researcher may relate this to findings that people with autism interpret affective
expressions differently from neurotypical people [Philip et al., 2010]. Baron-Cohen
et al. observed results suggesting that autistic individuals do not recognise bodily
expressions of emotion as well as non-autistic people [Baron-Cohen et al., 1988].
contains a link that is a functionally significant aspect of the website, it must be pre-
sented in a manner that does not rely primarily on facial expressions or emotional cues
[Celani et al., 1999, Cook et al., 2013]. This is one such usability issue that can be
identified using this methodology.
Another behaviour that our methodology can help to uncover is the emotions that
users experience prior to leaving a Web-page. It may be the case that the initial UI
elements that the users engage with elicit higher arousal than the final ones, as is the
case with the BBC and YouTube Web-pages; that participants experience an increase
in arousal in the middle of their interaction compared to the beginning and end, as is
the case with the Amazon website; or that they experience lower arousal levels at the
final UI elements, as with the Adobe Web-page.
could indicate positive feelings such as attraction [Foster et al., 1998], emotionally
neutral ones (i.e. cognitive load [Gellatly and Meyer, 1992]), or negative feelings like
frustration and stress [Mackay et al., 1978]. When participants experience an increase
in arousal towards the end of an interaction, this could imply that they found what
they were looking for, i.e., excitement in completing a task/goal [Wulfert et al., 2005],
or that they were frustrated [Doob and Kirshenbaum, 1973]. Both of these cases could
benefit from the optimisation of the user interface. In the first instance, if users take a
long time to find the item of interest on a page, it means that the user experience may
be improved by repositioning the element to a more visually accessible location, or by
using a more attractive design to draw the attention of users towards that particular
content [Chen et al., 2003]. When users experience frustration with a UI element prior
to leaving a Web-page, it could be an indication that the UI element has a usability
problem [Nakarada-Kordic and Lobb, 2005]. This type of diagnosis is mainly possible
because we combined users' visual scan paths with a measure of their arousal levels.
The implications for design include the aggregation of different user groups and the
potential modelling of their behaviour. For example, on an e-learning website, people
with autism can have a different profile that takes account of and adapts to their
unique traits and requirements [Martin et al., 2007]. Eye-tracking is becoming more
accessible, and we anticipate that web cameras and mobile phone cameras will one day
have eye-tracking capabilities. Our methodology could then be used within social me-
dia and mobile applications. Posts and feeds can be treated as atomic UI elements so
that the characteristics of different posts (sentiment, object classification, colour etc.)
can be investigated against the visual scan sequence and the corresponding affective
states that are elicited.
6.8 Limitations of the AFA algorithm-STA method-
ology
Due to the limited accuracy of eye-tracking technology, our analysis has been based
on group behaviour as opposed to individual behaviour. Therefore, our measures are
aggregated to represent the behaviour of an entire group. Even though the raw data
is easily accessible (as shown in the tabular format in Table 6.3), the information pre-
sented on the visualisation lacks a measure of confidence such as confidence interval
(CI ), standard errors of the mean (SEM ), or the standard deviation (SD) from the
mean arousal level. Also, the visualisation does not show how many users were used
to compute the aggregate, which may be crucial, especially when qualitative results,
such as interviews with the participants are available. When making inferences about
certain trends observed on the visualisation, it is therefore important to look up these
additional factors from the output of the algorithms to obtain more context for the
trend. In future, the transparency or opacity of each circle representing a scanned
element could be used to indicate a confidence level for our measurements; for
instance, increased opacity could indicate higher confidence. This way, the more
reliable signals are emphasised to the observer while the less reliable results are
blurred out.
laboratory settings. Ambient light, inter-colour differences between (and within) stim-
uli and other environmental variables may introduce confounding factors which may
yield different results in the wild. Therefore, our methodology needs to be optimised
to handle these factors dynamically in naturalistic settings.
6.9 Conclusion
Traditional metrics for evaluating the usability of websites have yielded much success
in the past. Many of those metrics tell us what is wrong with a website. However, of
equal importance to UX researchers, is determining the user’s emotional state during
the interaction. Recognising the user’s affective state is important because it reveals
a richer understanding of why users behave in certain ways in the presence of certain
content and tasks. We have demonstrated through our study that it is possible to
combine methods that answer both of these questions: 'how do users interact with
websites?' and 'how do they feel when they interact with Web-pages?'. The former was
achieved by summarising the users' visual scan paths into a trending scan path using
the STA algorithm, while the latter was achieved by generating the arousal scores that each
UI element elicits for each group of users. Furthermore, we created a novel visualisation
that can aid researchers in assimilating results and generating research questions that can be
investigated further using more established statistical or qualitative methods. We
have utilised changes in arousal as our affective metric in this study; in future, we
recommend an approach that also identifies the valence of the users' emotional state.
In domains such as e-learning and gaming, users often drop out due to undesired
affective and cognitive states during their interaction. Our methodology provides
context (users’ affective state, and visual attention) which can be fed back into the
system. Real-time adaptation of user interfaces and contents can then be carried out to
improve the quality of their user experience. Having proposed, implemented and evaluated
AFA algorithm in previous chapters, this chapter extends its use by combining it with
other more established methods. In the next chapter, we discuss the implications for
design, limitations and future work in terms of our research.
Chapter 7
Discussion and Conclusion
In the previous chapter, we extended AFA algorithm by combining it with the analysis
of users' trending scan paths (the STA algorithm) to facilitate a better understanding
of user behaviour on the Web. In this chapter, we take a higher-level overview and
appraisal of our research: sensing arousal and focal attention during user interaction.
In Section 7.2, we consider what AFA algorithm means and how it can be utilised
by UX researchers, UI designers and other affective computing researchers. Next, we
highlight its limitations in Section 7.3. We discuss how similar HCI methodologies
compare with ours, and alternative approaches we could have taken. We also revisit
the visualisation tool introduced in Section 3.7, a dashboard that can aid
users in visualising their eye-tracking datasets to observe participants'
affective signals, either with respect to temporal trends or in relation to their focal
attention during moments of increased arousal. We conclude the chapter with a
summary and appraisal of the entire research.
7.1 Reflection about our work and state of the art
in affect detection
Since we started our research, the sub-domain of affect detection has gained increased
attention [van der Wel and van Steenbergen, 2018]. In 2015, Sioni et al. reported
a review of existing affect detection mechanisms. They recommended that research
should be carried out into affect detection using objects that users make use of daily
as sensors, for example, embedding GSR sensors into a mouse and optical sensors on
mobile phones for sensing BVP. Even though research has shown that this is possible
[Amico, 2018], the devices in question are still not widely used. Smartwatches with HR
and GSR sensors are also becoming popular, but they are mainly used for
fitness and health tracking. Smartwatches are not considered computer peripherals;
therefore, we argue that they are less easily integrated than peripherals like webcams.
The pupillary response was particularly cited as being a noisy source of affective signal
[Klingner et al., 2008, Klingner, 2010]. The idiosyncratic nature of pupillary response
data makes baseline measurements difficult [Van Gerven et al., 2004, Beatty, 1982].
We showed with our computational approach through a dynamic baseline detection
that we can measure relative changes between previous and current states of arousal.
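The following is not the AFA algorithm itself, but a minimal Python sketch of the underlying idea of a dynamic baseline: each new window of pupil diameters is compared against the preceding window, so arousal is expressed as relative change rather than as deviation from a fixed, person-specific baseline. The window size is an illustrative assumption.

    # Minimal sketch of a dynamic baseline: measure relative change between
    # consecutive windows of pupil diameter samples.
    def relative_changes(pupil, window=60):
        """pupil: list of pupil diameters sampled at a fixed rate."""
        changes = []
        for i in range(window, len(pupil), window):
            baseline = sum(pupil[i - window:i]) / window
            current = pupil[i:i + window]
            if current:
                changes.append((sum(current) / len(current) - baseline)
                               / baseline)
        return changes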
Recent work by Wang et al. selected a set of sequences of mouse and gaze patterns
that correspond to participants feeling stressed during user interaction. Their applica-
tion, which is capable of sensing the user’s fixation, was evaluated on Web search and
mental calculation tasks resulting in an accuracy of 74.3%, outperforming the status
quo by 20% [Wang et al., 2019]. Our work is capable of sensing emotional arousal,
as well as stress, but Wang et al.’s work complements ours, as it does not require
devices with pupillometry capabilities. Similarly, EYECU (Emotional eYe trackEr for
Cultural heritage sUpport), a system for sensing arousal in relation to the users’ focal
attention was developed for viewers of an art gallery [Calandra et al., 2016]. Calandra
et al. designed EYECU to log affective arousal data about the visitors of an art gallery,
in relation to the area of the stimulus that was fixated upon [Calandra et al., 2016].
The focus of our research was on sensing arousal. Pupil dilation is an indicator of cog-
nitive and affective arousal [Partala and Surakka, 2003]. However, it would be useful to
distinguish between positive, high arousal states like interest, attention and excitement
from high arousal, negative states like anger, frustration and cognitive overload. This
distinction between positive and negative valence is useful for the adaptive system to
alter the system in the right direction of hedonic polarity [Van Schaik and Ling, 2008].
We have observed new evidence since the start of our work suggesting that features
extracted from the dynamics of pupil dilation can be analysed using machine learn-
ing techniques, to sense affective valence (negative or positive states) [Babiker et al.,
2015, Ragot et al., 2017]. To the best of our knowledge, the results from Babiker et
al. have not been replicated by other studies without the combination of other sensors
like the EEG [Lu et al., 2015]. Sticking to our principle of using a single source of
affect detection, we could enhance our approach by adding object identification. With
object identification, we can extract the content of the stimulus where a user is focused
during moments of increased arousal. After extracting the content, we can apply sen-
timent analysis, to identify the valence categories (positive, neutral or negative) of the
object so that applications that utilise AFA algorithm will have a richer context about
the users’ affective state [Baltaci and Gokcay, 2012].
In recent times, there have been many methodological and theoretical contributions
to the understanding, analysis and modelling of pupillary response to sense affective
signals. Steephen et al. studied the relationship between emotional intensity and duration,
which is useful not only in predicting and sensing an emotion but is also a crucial factor
in the implementation of adaptive systems [Steephen et al., 2018]. Snowden et al., in
addition to the duration, also explored the role of habituation and the mode of view-
ing (passive vs active) in a picture viewing task while sensing arousal [Snowden et al.,
2016]. With regard to methodologies to account for or remove illumination effects in
pupillary response, several solutions have been proposed. Korn et al. proposed a linear
time-invariant (LTI) model to account for light changes in an auditory-oddball task,
an emotional-words task and a visual-detection task [Korn and Bach, 2016]. Pfleging
et al. proposed a model relating pupil diameter to mental workload and lighting
conditions [Pfleging et al., 2016], while Raiturkar et al. proposed a method to
decouple the light reflex from pupillary dilation to measure emotional arousal in videos
[Raiturkar et al., 2016]. Due to a lack of resources (time and data), we were unable
to implement these methodologies for accounting for illumination and brightness,
especially the effect of varying light combined with stimuli that have varying
emotional and cognitive properties. For simplicity and internal validity,
we decided to keep the ambient light constant. Therefore, our results are valid for
intra-colour changes within a stimulus.
Regarding the application of AFA algorithm in naturalistic settings, Ferhat et al. re-
viewed the existing options for low-cost eye-tracking devices [Ferhat and Vilarino,
2016]. Although the emphasis was on gaze tracking rather than pupillometry, they
reveal that gaze accuracy of visible light cameras is comparable to those that use in-
frared cameras [Ferhat and Vilarino, 2016]. Previously, the standard frame rate for
web cameras was 15 frames per second (fps), but at the point of this writing, web
cameras typically range from 30fps to 120fps (which is twice the frequency at which
we captured data in our studies). However, frequency is not the only limitation on web
cameras. Web cameras capture data like pictures, at a particular resolution, for exam-
ple, 720 pixels (p) or 1080p (known as High Definition - HD). Higher resolution images
contain more detail; therefore, there is an increased likelihood for accurate measure-
ments of pupil data. To cope with limitations in hardware, software, the mode of data
transmission (USB 2.0, USB 3.0) from the web camera to the computer and band-
width (over the internet), there is usually a compromise to be made in frequency or
picture resolution. Therefore, web cameras may capture images at higher resolutions
(1080p) but transmit at lower frequencies (30fps) due to lower processing power, mode
of transmission or bandwidth limitation. Eye trackers like Tobii make use of dedicated
Ethernet LAN cables (with RJ45 connectors), and USB interfaces for data transmis-
sion and communication between the device and the computer. Eye trackers also have
inbuilt software for pre-processing data. Furthermore, eye trackers capture data using
infrared, which records light with wavelengths of up to 14,000 nanometres and is
therefore more detailed for pupillometry processing, compared to web cameras, which
capture images meant to be interpreted by the human eye, within a visible wavelength
range of 400 to 700 nanometres. As we discovered from our literature review, gaze
tracking and eye tracking are far more widely accepted and adopted methodologies in
usability studies and HCI than pupillometry. Our work gives credence to the existing
works on pupil dilation. We hope that with more success in the methodology for
analysing pupillary response data, there will be a corresponding success regarding the
use of low-cost eye trackers with pupillometry capabilities for widespread ubiquitous
affect detection.
7.2 Design and methodological implications
Many usability metrics measure performance, accuracy, effectiveness, satisfaction and
efficiency [Scholtz, 2006]. These metrics evaluate usability from the system’s perspec-
tive, i.e. how well does the system deliver its functions to the users [Martin et al.,
2007]. However, there are many more factors that influence the impact that a sys-
tem/product/user interface has on its users. User experience is a more overarching
term that refers to “a person’s perceptions and responses that result from the use or
anticipated use of a product, system or service” [Law et al., 2009]. From this defini-
tion by the ISO, we can see that users’ perception and response count towards user
experience. Therefore, apart from evaluations that take measurements of the system,
recent trends in HCI have exposed the need to take our bearings from the perspective
of the user [Prendinger and Ishizuka, 2005]. In research, gaze detection has been de-
ployed as one of the techniques for investigating users’ perception of a visual stimulus
[San Agustin et al., 2010]. Therefore, eye tracking has long been established as a
methodology for carrying out usability tests. Eye trackers have effectively been used
to answer the question, “how do users behave when presented with a visual stim-
ulus?”. Some of the well-used metrics include the fixation (duration and location).
With fixation metrics, for example, we can measure learnability, an important factor
for efficiency, by comparing the time to the first fixation, on salient components in one
user interface vs another user interface [Simola et al., 2015]. More advanced techniques
such as visual sequences make use of fixations to understand the transition between
different AOIs of a user interface [Eraslan et al., 2016b]; one example is the STA
algorithm, which we combined with our approach in Chapter 6.
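As a concrete instance of such a fixation metric, the following Python sketch computes time to first fixation on a salient AOI, which could then be compared across two user interfaces; the data layout is an illustrative assumption.

    # Minimal sketch: time to first fixation on a target AOI.
    def time_to_first_fixation(fixations, target_aoi):
        """fixations: [(timestamp_ms, aoi_label), ...] in temporal order."""
        for t, aoi in fixations:
            if aoi == target_aoi:
                return t
        return None  # the AOI was never fixated

    print(time_to_first_fixation([(120, "A"), (480, "B")], "B"))  # 480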
Affective states contribute greatly towards users’ perception of a system. Vice-versa,
the use of an interface may influence users’ affective state, and ultimately, their per-
ception of the system/product [Desmet, 2003]. Affective computing, particularly the
use of physiological sensors to measure users' affect, was promised as a domain that
would deliver a greater understanding of human-computer interaction [Picard, 1997]. As
we reported in Chapter 2, several works of research have been invested in affective
computing. Limitations in the applicability of previous research in the wild, cost of
purchase and deployment of the sensors have hindered the progress [Ragot et al., 2017].
Our research focused on leveraging the use of eye-tracking, which helps us know what
the user is doing, but with the combination of pupil dilation (a physiological response),
we can understand how the users feel.
For researchers, the methodological implication is that we now have the opportunity to
extract a richer understanding of users’ behaviour (how users act + how they feel) dur-
ing visual interaction. Our toolkit aids the comprehension and assimilation of results,
as researchers can plug in their eye-tracking datasets and visualise their behaviours
from a spatial (AOIs) or temporal perspective. As our methodology becomes more
popular, we anticipate that researchers will begin to identify patterns and trends that
could be used to predict the positive/negative perception of a system. The merit of
AFA algorithm is that the eye-tracking methodological process flow does not change
a great deal, i.e. we have leveraged an existing methodology (eye-tracking), but we
derive additional insight [Matthews et al., 2019b].
For designers, we envisage that these findings from researchers, as discussed above, would influence the way user interfaces are designed. However, we also envision a more direct impact if future web cameras were to have pupillometric capabilities [Bousefsaf et al., 2014]. Should that future arrive, AFA algorithm could be implemented as a browser plugin, or within a user interface, such that it notifies the system whenever it senses an unwanted affective state. When UI/system designers are equipped with the ability to sense the users' affective state, several ideas could be explored as palliatives to induce a more desirable state. For example, if a user experiences cognitive overload, the system could suggest breaks, or play soothing music to the user [Dalvand and Kazemifard, 2012, Kim and Andre, 2008].
Our work and its potential impacts are promising. If manufacturers of web camera hardware doubted that their consumers would benefit from web cameras with eye-tracking and pupillometric capabilities, our work provides evidence that such technological advancements broaden the realm of possibilities for HCI researchers, UI and system designers. We discuss the limitations of our work in the next section.
7.3 Limitations
We discuss four categories of limitations: methodological (AFA algorithm), technologi-
cal (hardware), application (use cases), and contextual (meaning).
Methodological limitations: Although we showed, through our study of emotional pictures in Chapter 4, that relative colour intensity was not a significant factor in our measure of arousal, we have not tested the effect of ambient lighting, screen brightness, or more intense colour changes on the screen. We have also yet to test AFA algorithm on real-time arousal detection.
We decided to focus on ecological validity; therefore, we have not carried out a study that compares the accuracy of AFA algorithm against that of other approaches. As we discovered from our background study, most other affect detection methodologies are application-specific and, in many cases, designed to function in the lab, with accuracy as the priority. Rather than achieving a context-specific high accuracy, we aimed for
generalisability and consistency in our results by developing and testing AFA algorithm
using multiple datasets, and realistic stimuli types (e.g. frustration on the Web). Even
though our evaluations have been lab-based, the choice of our affect detection mecha-
nism has the potential for use in naturalistic settings, with web cameras.
We propose that AFA algorithm be optimised using the online change point detection algorithm described in Chapter 3 to split real-time streams of pupillary response, so that AFA algorithm can be applied on atomic segments to sense arousal during user interaction, as sketched below. This could then be combined with web cameras that are capable of pupillometry, so that the potential of applying AFA algorithm in naturalistic settings is fulfilled.
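As a rough illustration of this proposal, the sketch below segments a live pupil-diameter stream with a simple CUSUM-style detector. This is not the change point algorithm described in Chapter 3; the detector, thresholds and names are illustrative assumptions only.

    class OnlinePupilSegmenter:
        """Split a live pupil-diameter stream at detected change points
        (illustrative CUSUM-style detector, not the thesis's exact method)."""

        def __init__(self, drift: float = 0.01, threshold: float = 0.5):
            self.drift = drift          # allowed slack around the running mean
            self.threshold = threshold  # cumulative deviation that signals a change
            self._reset()

        def _reset(self):
            self.mean, self.n = 0.0, 0
            self.pos, self.neg = 0.0, 0.0
            self.segment = []

        def feed(self, diameter: float):
            """Consume one sample; return a finished segment at a change point, else None."""
            self.segment.append(diameter)
            self.n += 1
            self.mean += (diameter - self.mean) / self.n   # running mean of this segment
            self.pos = max(0.0, self.pos + diameter - self.mean - self.drift)
            self.neg = max(0.0, self.neg - diameter + self.mean - self.drift)
            if self.pos > self.threshold or self.neg > self.threshold:
                done = self.segment
                self._reset()
                return done   # atomic segment, ready to be passed to AFA algorithm
            return None

Each returned segment would then be analysed by AFA algorithm as if it were an offline recording. We discuss the technological limitations next.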
Technological limitations: As we established in Chapter 3, pupillometry devices are still costly and, in many cases, large, rather than built into personal computers. The accuracy and fidelity of eye trackers have improved over the years; there are now eye trackers that capture data at 1200 Hz. For our study, we used the Tobii X60, which has a maximum frequency of 60 Hz. As we observed in Chapter 4, there is a moderate correlation between the accuracy of AFA algorithm and the accuracy of the eye tracker (measured by the number of times the device captures both pupils over the entire session). Several factors, such as blinks and head movement, interfere with the eye tracker's ability to capture the pupil, but eye trackers with higher frequency rates recover faster from data loss.
Application limitations: The use of pupillary response as an affect detection mechanism relies on participants having normal or corrected-to-normal vision. Visually impaired users could instead make use of affect detection mechanisms such as the galvanic skin response (GSR). Similarly, applications that do not require visual attention, such as listening to music, could benefit from other affect detection mechanisms like the GSR [Kim and Andre, 2008].
Contextual limitations: When we sense arousal, we measure it against the participant's dynamically established baseline and detect relative changes in arousal. From our measurement, arousal ranges from 1-9, where 1-3 may be considered low, 4-6 medium, and 7-9 high levels of arousal (a sketch of one such baseline-relative mapping follows below). Our research could benefit from an extensive study to understand the practical implications of these arousal levels on the user's interaction.
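For illustration only, the following is a minimal sketch of how a baseline-relative 1-9 mapping could be computed; the exact normalisation used by AFA algorithm is the one developed in Chapter 3, and the names and constants here are assumptions.

    import numpy as np

    def arousal_level(segment: np.ndarray, baseline: np.ndarray) -> int:
        """Map a pupil-diameter segment onto a 1-9 scale relative to a
        per-participant baseline (illustrative normalisation only)."""
        mu, sigma = baseline.mean(), baseline.std()
        if sigma == 0:
            return 1
        z = (segment.mean() - mu) / sigma        # deviation from baseline
        squashed = 1.0 / (1.0 + np.exp(-z))      # squash to (0, 1)
        return int(round(1 + 8 * squashed))      # stretch onto 1-9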
Third-party applications could benefit from such a study, as it would tell them how much disruption an intervention is likely to introduce at a given arousal level. Another contextual limitation is that arousal is a proxy that could indicate stress, interest, boredom, alertness, or frustration. In lab studies, qualitative methods such as interviews or reported feedback can be collected from the participants to narrow down the reason for increases in arousal; in the wild, this may not be possible. In AFA algorithm, we detect the user's focal attention during moments of increased arousal. We could, therefore, improve AFA algorithm with automatic identification of objects, which could then be fed through sentiment analysis to detect whether the object has a negative or positive valence, giving us more context. We discuss this further in future work. We discuss the potential solutions to these limitations next.
7.4 Future Research Pathways and the Potential of Our Research
In this section, we discuss future work from the perspective of the research opportunities and potential applications of our research. Based on the setting in which our research can be applied, we categorise our recommendations for future work into two: 1. future work that could enhance the use of AFA algorithm in controlled settings (in-the-lab), and 2. future work to ensure that our research can be applied in naturalistic settings (in-the-wild). The first recommendation discussed below relates to the former, while the other five pertain to the application of our research in the wild.
7.4.1 User evaluation of our visualisation toolkit
The visualisation toolkit for the output of AFA algorithm was presented in Chapter 3. The visualisation toolkit displays the changes in arousal, either as a function of the participant's area of focal attention, or with respect to time as they interact with a visual stimulus. It was proposed as a medium to aid researchers in formulating hypotheses about the arousal dynamics of participants.
The next phase in the advancement of this toolkit is its evaluation. This is necessary to assess the learnability of the tool, and to determine whether salient trends in the arousal dynamics of users can indeed be detected by observing the visualisation. Moreover, we have often claimed in this study that HCI and UX researchers would benefit from understanding the behaviours of users in terms of changes in arousal. While research trends in the literature support our claim [Foglia et al., 2008, Omata et al., 2012], such a study could also assess the technology acceptance of our overall approach among its potential users (HCI and UX researchers).
7.4.2 The impact of light on AFA algorithm
As we highlighted in the limitations above, one of the limitations of our approach is light changes. We observed that intra-colour changes within a stimulus do not have significant effects on our measure of arousal. Therefore, in laboratory settings, researchers may control for light by keeping ambient light constant. However, for AFA algorithm to be utilised in naturalistic settings, changes in perceived brightness need to be accounted for. We propose that studies be conducted under varying light settings, to understand the effect of light on the pupil. Once this relationship between light and the pupil is established, our model of the user's pupil dilation could be modulated such that the amplitude of the affective signal is increased to counter the effect of pupil constriction during increased brightness. Similarly, the amplitude should be reduced for participants experiencing reduced brightness, so that the modulated signal more accurately reflects the participant's affective state. A sketch of this modulation follows below.
7.4.3 Combining AFA algorithm with other affect detection
mechanisms
The scope of our research was limited to sensing arousal during visual interaction. Pupillary response and the analysis of gaze behaviour were our preferred approach within this scope. However, as we stated in the limitations, many eye trackers require participants to have normal or corrected-to-normal vision. Therefore, we propose an effective way to alternate between the use of eye trackers and, for example, GSR sensors, so that sensing arousal remains possible when our approach is not feasible. This is different from combining both technologies simultaneously for affect detection, which would negate our previously established principle (that a multi-sensory approach would decrease the likelihood of widespread use). What we proffer in this case is a provision whereby, when one sensor is unavailable, another could be used to deliver a similar outcome, as sketched below.
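A minimal sketch of such a provision, assuming each sensor exposes an availability check and an arousal reading; the interface and its method names are hypothetical.

    from typing import List, Optional, Protocol

    class ArousalSensor(Protocol):
        def available(self) -> bool: ...
        def arousal(self) -> float: ...

    def sense_arousal(sensors: List[ArousalSensor]) -> Optional[float]:
        """Try sensors in order of preference (e.g. [eye_tracker, gsr]) and
        return the first available reading, so that arousal sensing degrades
        gracefully instead of failing outright."""
        for sensor in sensors:
            if sensor.available():
                return sensor.arousal()
        return None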
7.4.4 Optimizing AFA algorithm for real-time arousal sensing
In developing our algorithm, we used the offline change point detection algorithm to split interaction data into smaller segments. For AFA algorithm to be utilised in naturalistic settings in real time, we propose the use of the online change point detection algorithm (as sketched in the limitations section above). Since this online version has not been evaluated along with AFA algorithm, we have listed it here for future work. Real-time analyses of pupillary dynamics may provide affective context for adaptive computing and recommender systems.
7.4.5 Extending AFA algorithm for adaptive systems
Adaptive computing can be used to influence users' interaction experience. This could be achieved through a change in the user's content or layout, in real time or otherwise. For adaptive computing to be effective, certain events need to initiate the adaptive engine. When AFA algorithm is optimised for real-time detection, triggers can be set; for example, when the arousal level of participants reaches a certain threshold while fixating on text with a small font, the font size could be magnified. Another example is intelligent tutoring systems, where difficult questions in adaptive tests can cause users to experience an increase in arousal; triggers can be set such that the adaptive engine fetches a less difficult question, so that the learner does not drop out. A sketch of such a trigger follows below.
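For illustration, a minimal sketch of such a trigger, assuming real-time arousal levels on the 1-9 scale discussed earlier; the AOI names and adaptation actions are hypothetical.

    from typing import Optional

    def arousal_trigger(arousal_level: int,
                        fixated_aoi: str,
                        threshold: int = 7) -> Optional[str]:
        """Fire an adaptation when arousal is high while a known AOI is fixated
        (illustrative rules; a real adaptive engine would be far richer)."""
        if arousal_level < threshold:
            return None
        if fixated_aoi == "small-font-paragraph":
            return "magnify-font"
        if fixated_aoi == "quiz-question":
            return "fetch-easier-question"
        return None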
7.4.6 Utilizing AFA algorithm on mobile devices
Of all the future work listed, enabling AFA algorithm to function accurately on mobile devices appears to be the most challenging. However, for our approach to become truly ubiquitous, AFA algorithm should be optimised for use on mobile devices. Besides the hardware limitation that mobile devices present, i.e., a camera with eye-tracking capabilities, there are other challenging conditions that currently prevent AFA algorithm from sensing changes in arousal accurately on mobile devices in the wild: varying ambient light, dynamic screen brightness, and relative positional changes between the user's pupil and the camera, among others.
Our work opens up research avenues in affective computing, ubiquitous computing,
usability and UX evaluations.
7.5 Conclusion
We started off our research with the theme, "Physiological correlates of affect". The literature on affective computing has evolved since the inception of the domain, when its foremost researcher, Rosalind Picard, envisioned future computers having the capability to sense the emotions of their users [Picard, 1997]. She opined that sensing user emotions would enable computer systems to deliver content and services such as entertainment, education, information, and software user interfaces in a more intelligent manner [Picard, 1997]. The ambition is to attain a level, similar to humans, where we observe each other's emotional cues, such as facial expressions, voice prosody, and body gestures, to improve social interaction. Like human beings, computers can be equipped with sensors to learn these behavioural cues. Human beings still detect emotions with higher accuracy, but with advances in computational techniques, researchers are making headway in using computers to sense people's affective states. In comparison to human beings, computers have a processing and storage advantage, which makes their inferences less biased. However, human beings are currently better at putting confounding factors into context: for affective sensors, environmental conditions (light, motion, location, interaction scenarios), psychological states (personality traits, moods, previous emotional states) and demographic profiles (previous experience, sex, culture, age) may translate into noisy data, whereas human beings factor such contextual information into their reading of each other's emotional cues.
Our literature review helped us understand the landscape of affective computing and the existing mechanisms for sensing user affect. We observed that much research has focused on in-the-lab affect detection, with limited potential for application in the wild. Therefore, we defined our scope: an approach, and a method, for sensing user emotions in such a way that it has the potential for deployment in the wild. There are many potential HCI-related benefits of, and software applications for, sensing user affect, including affective gaming [Hudlicka, 2009], intelligent tutoring systems [Sidney et al., 2005], and stress detection [Mullins and Treu, 1991]. However, the full benefits of affective computing will only be realised by end-users if the technique for sensing affect has the potential for widespread ubiquitous use.
Therefore, our first objective was to select an affect detection mechanism that has the potential for deployment in the wild. This was our first and most important criterion because, unlike generalisability and accuracy (which can be improved upon), the choice of affect detection mechanism, once made, can only be exchanged wholesale for another. Each affect detection sensor comes with its peculiarities, so a change may require equally thorough research and testing. Through our literature review (Chapter 2), we selected pupillary response and gaze behaviour as our affect detection mechanism.
We defined our research to be applicable to the Web and interactive systems. Therefore, rather than focusing on all affective states, we restricted our scope to sensing arousal. Arousal is a dimension of affect that critically influences user experience. As we stated in previous chapters, arousal is the intensity of an emotion and can be used as a proxy for measuring interest, attention, and cognitive load, all of which are of interest in interactive systems.
Pupillary responses allow us to sense changes in the user's level of arousal, while gaze behaviour enables us to know where users are looking when they experience a change in arousal [Partala and Surakka, 2003]. We use peak detection to sense when these changes occur and compute the most fixated area in the second before the increase in arousal. When a peak is detected, we compute the magnitude of the peak and compound it by the total fixation duration during the moment of the peak, to measure the impact of the peak on the user (a sketch of this step follows below). We took a data-driven methodology to develop AFA algorithm by analysing eye-tracking datasets from different domains and improving our methods iteratively.
In evaluating AFA algorithm, we considered generalisability to different stimulus types (emotional, cognitive, frustrating; static and during user interaction on the Web). Altogether, we developed AFA algorithm using eye-tracking datasets from two independent studies consisting of a total of 60 participants. In the first study, using ECG images as our stimuli, we developed AFA algorithm to sense arousal from static images, while in the other study, we used eye-tracking data from interaction with the Protege user interface. We went on to evaluate AFA algorithm on three independent studies, with 108 participants in total. In the first study, we evaluated AFA algorithm on its ability to sense arousal from emotion-evoking images, and found a moderate correlation between our measure of arousal and the participants' reported arousal ratings. In the second study, participants were asked to say aloud the names of animals in one stimulus, and the names of different colours in the other stimulus. In the controlled condition, the objects (animals/colours) were correctly (congruently) labelled [La Heij, 1988]; in the comparison condition, each object was misnamed (incongruently), to induce the Stroop effect (an increased level of cognitive demand) in the participants. In this second evaluation study, we observed a moderate correlation between the output of AFA algorithm and the expected level of arousal for the animal-naming task, and a strong correlation for the colour-naming task. In the third evaluation study, participants were presented with four Web tasks. In the controlled form, participants carried out the tasks normally, while in the comparison tasks, we injected known causes of frustration into the tasks. AFA algorithm was capable of distinguishing between the two conditions with a strong effect. We also observed a strong correlation between the users' attention (measured by their fixation count) on the components that induced arousal and their measures of arousal. The three studies showed that AFA algorithm can sense arousal and the user's attention during moments of increased arousal.
Next, we extended AFA algorithm to develop a novel methodology for understanding
user behaviours on the Web. For this, we combined AFA algorithm with STA algo-
rithm, which computes the trending visual scan path of viewers of a visual stimulus.
To develop this approach, we performed a secondary analysis of the eye-tracking data
of 41 participants that viewed the Apple home page. We used the secondary analysis
to develop a visualisation that can help to uncover behaviours that are common to a
user group for a visual stimulus. To evaluate our methodology, we explored differences
between people with autism and neurotypical people with regards to how they browse
certain Web pages. Our study yielded results that are consistent with the literature on
people with autism. For example, we discovered cases where emotive images evoked more arousal in the neurotypical users than in the users with autism.
Finally, as an indirect contribution of this research, we also developed an arousal explorer tool and a further visualisation toolkit for observing the results of the analysis of eye-tracking data visually. We propose that these be used to aid hypothesis generation from the output of AFA algorithm. This thesis developed an approach to sensing arousal and visual attention during user interaction. We set out with these three objectives:
1. To sense arousal using an affect detection mechanism that has the po-
tential for future use.
For affect detection mechanisms to become accessible, they need to be cheap, compact, and require minimal skill to set up. After carrying out a literature review, we chose pupillary response on the grounds that it is unobtrusive. Further, prevalent trends in eye-tracking devices suggest that web cameras with eye-tracking capabilities are becoming mainstream. Our research provides motivation for web camera manufacturers to deliver web cameras with pupillometric capabilities. This would, in turn, spur the affective computing community, UX researchers, and UI and system designers to develop applications that can work with AFA algorithm. We conclude that AFA algorithm has the potential for widespread ubiquitous use.
2. To sense arousal in a way that is generalisable in visual interaction
Besides the potential for widespread ubiquitous use, generalisability was of key importance to our research. This is why we took a data-driven approach, developing and evaluating our method using datasets from several visual stimuli. Our stimuli varied from static to interactive content, and from emotional to cognitive stimulus types. Our ground truths were based on participant feedback, the literature, and domain experts. We applied our approach on the Web and extended its use to another existing methodology, the STA algorithm. We therefore conclude that AFA algorithm can be applied to a variety of visual stimulus types and extended to other research methodologies, or systems that work with visual stimuli.
3. To sense arousal accurately
As we stated earlier, this was not our most important criterion, because we believed that the accuracy of our technique could be improved with eye trackers of higher fidelity (frequency of data collection and resolution). In evaluating AFA algorithm, our approach has consistently delivered moderate to strong correlations or effects between the arousal measure of AFA algorithm and our ground truths. Our results show promise, and the correlation between the accuracy of AFA algorithm and the accuracy of the data collected by our eye tracker confirms the potential for improvement.
In addition to our set objectives, we proposed, designed and developed a way to visualise the output of our algorithm, such that researchers can use the visualisation toolkit as a medium to observe the results of AFA algorithm in a way that aids hypothesis formulation. In terms of our research questions, RQ1 was addressed through the first objective, where we selected pupillary response and the analysis of gaze behaviour as our affect detection mechanism. RQ2, which concerns accuracy and generalisability, was addressed in objectives two and three above. As we stated, in all our evaluations, we observed a moderate to strong correlation/effect between our measure of arousal and the ground truth. Finally, RQ3, which concerns determining the focal attention of users during moments of arousal, was addressed by the correlation we observed between the measure of arousal and the component being fixated upon during moments of increased arousal on the Web. RQ3 was addressed further in Chapter 6, where we extended AFA algorithm successfully by combining it with the STA algorithm. In this novel methodology,
we confirmed behavioural patterns that have been observed in the literature, for example, the difference in affective response between people with autism and neurotypical people on the Web. The results of these two studies suggest that AFA algorithm can be used to sense the user's focal attention during moments of increased arousal.
Our aims and objectives for this research have been addressed, and they aided us in answering our research questions. We highlighted the limitations of our research earlier in this chapter and stated alternative approaches, and we expanded on our recommendations for future work and research pathways in sensing arousal and focal attention during user interaction. Finally, we presented our concluding remarks.
Bibliography
[Abbasi et al., 2010] Abbasi, A. R., Dailey, M. N., Afzulpurkar, N. V., and Uno, T.
(2010). Student mental state inference from unintentional body gestures using dy-
namic bayesian networks. Journal on Multimodal User Interfaces, 3(1-2):21–31.
[Abdrabou et al., 2018] Abdrabou, Y., Kassem, K., Salah, J., El-Gendy, R., Morsy,
M., Abdelrahman, Y., and Abdennadher, S. (2018). Exploring the usage of eeg and
pupil diameter to detect elicited valence. In International Conference on Intelligent
Human Systems Integration, pages 287–293. Springer.
[Agrafioti et al., 2012] Agrafioti, F., Hatzinakos, D., and Anderson, A. K. (2012). Ecg
pattern analysis for emotion detection. Affective Computing, IEEE Transactions
on, 3(1):102–115.
[Agrawal et al., 2013] Agrawal, U., Giripunje, S., and Bajaj, P. (2013). Emotion and
gesture recognition with soft computing tool for drivers assistance system in human
centered transportation. In Systems, Man, and Cybernetics (SMC), 2013 IEEE
International Conference on, pages 4612–4616. IEEE.
[Ahern and Schwartz, 1985] Ahern, G. L. and Schwartz, G. E. (1985). Differential
lateralization for positive and negative emotion in the human brain: Eeg spectral
analysis. Neuropsychologia, 23(6):745–755.
[Ahn and Picard, 2005] Ahn, H. and Picard, R. W. (2005). Affective-cognitive learn-
ing and decision making: A motivational reward framework for affective agents. In
International Conference on Affective Computing and Intelligent Interaction, pages
866–873. Springer.
[Akgun and Ciarrochi, 2003] Akgun, S. and Ciarrochi, J. (2003). Learned resourceful-
ness moderates the relationship between academic stress and academic performance.
Educational Psychology, 23(3):287–294.
[Alamia et al., 2019] Alamia, A., VanRullen, R., Pasqualotto, E., Mouraux, A., and
Zenon, A. (2019). Pupil-linked arousal responds to unconscious surprisal. Journal
of Neuroscience, pages 3010–18.
[Alexander et al., 2003] Alexander, S., Sarrafzadeh, A., and Fan, C. (2003). Pay at-
tention! the computer is watching: Affective tutoring systems. In Proceedings of
World Conference on E-Learning in Corporate, Government, Healthcare, and Higher
Education, pages 1463–1466.
[Alhargan et al., 2017] Alhargan, A., Cooke, N., and Binjammaz, T. (2017). Affect
recognition in an interactive gaming environment using eye tracking. In 2017 Sev-
enth International Conference on Affective Computing and Intelligent Interaction
(ACII), pages 285–291. IEEE.
[Alhothali, 2011] Alhothali, A. (2011). Modeling user affect using interaction events. University of Waterloo.
[Allen et al., 1988] Allen, C. T., Machleit, K. A., and Marine, S. S. (1988). On assessing the emotionality of advertising via Izard's differential emotions scale. Advances in Consumer Research, 15(1):226–231.
[Allen et al., 2001] Allen, J. J., Harmon-Jones, E., and Cavender, J. H. (2001). Manip-
ulation of frontal eeg asymmetry through biofeedback alters self-reported emotional
responses and facial emg. Psychophysiology, 38(4):685–693.
[Amico, 2018] Amico, S. (2018). ETNA: a Virtual Reality Game with Affective Dy-
namic Difficulty Adjustment based on Skin Conductance. PhD thesis.
[Aminihajibashi et al., 2019] Aminihajibashi, S., Hagen, T., Foldal, M. D., Laeng, B.,
and Espeseth, T. (2019). Individual differences in resting-state pupil size: Evi-
dence for association between working memory capacity and pupil size variability.
International Journal of Psychophysiology.
[Arent and Landers, 2003] Arent, S. M. and Landers, D. M. (2003). Arousal, anxiety,
and performance: A reexamination of the inverted-u hypothesis. Research quarterly
for exercise and sport, 74(4):436–444.
[Ashkanasy and Daus, 2002] Ashkanasy, N. M. and Daus, C. S. (2002). Emotion in the
workplace: The new challenge for managers. Academy of Management Perspectives,
16(1):76–86.
[Baban et al., 2009] Baban, S. M., Mohammed, P., Baberstock, P., Sankat, C., Boyd,
W., Laukner, B., Lloyd, D., and Baban, S. M. (2009). The Journey from Pondering
to Publishing. University of the West Indies Press.
[Babiker et al., 2015] Babiker, A., Faye, I., Prehn, K., and Malik, A. (2015). Ma-
chine learning to differentiate between positive and negative emotions using pupil
diameter. Frontiers in psychology, 6:1921.
[Bahr and Ford, 2011] Bahr, G. S. and Ford, R. A. (2011). How and why pop-ups don't work: Pop-up prompted eye movements, user affect and decision making. Computers in Human Behavior, 27(2):776–783.
[Bakhtiyari et al., 2014] Bakhtiyari, K., Taghavi, M., and Husain, H. (2014). Hybrid affective computing - keyboard, mouse and touch screen: from review to experiment. Neural Computing and Applications.
[Ballesteros and Croft, 1998] Ballesteros, L. and Croft, W. B. (1998). Resolving am-
biguity for cross-language retrieval. In Sigir, volume 98, pages 64–71.
[Baltaci and Gokcay, 2012] Baltaci, S. and Gokcay, D. (2012). Negative sentiment in
scenarios elicit pupil dilation response: an auditory study. In Proceedings of the 14th
ACM international conference on Multimodal interaction, pages 529–532. ACM.
[Baltaci and Gokcay, 2014] Baltaci, S. and Gokcay, D. (2014). Role of pupil dilation
and facial temperature features in stress detection. In Signal Processing and Com-
munications Applications Conference (SIU), 2014 22nd, pages 1259–1262. IEEE.
[Baltaci and Gokcay, 2016] Baltaci, S. and Gokcay, D. (2016). Stress detection in
human–computer interaction: Fusion of pupil dilation and facial temperature fea-
tures. International Journal of Human–Computer Interaction, 32(12):956–966.
[Bamidis et al., 2004] Bamidis, P. D., Papadelis, C., Kourtidou-Papadeli, C., Pappas,
C., and Vivas, A. B. (2004). Affective computing in the era of contemporary neu-
rophysiology and health informatics. Interacting with Computers, 16(4):715–721.
[Baron-Cohen et al., 1988] Baron-Cohen, R. P., Ouston, J., and Lee, A. (1988). Emo-
tion recognition in autism: Coordinating faces and voices. Psychological medicine,
18(4):911–923.
[Baylor and Rosenberg-Kima, 2006] Baylor, A. L. and Rosenberg-Kima, R. B. (2006).
Interface agents to alleviate online frustration. In Proceedings of the 7th interna-
tional conference on Learning sciences, pages 30–36. International Society of the
Learning Sciences.
[Beatty, 1982] Beatty, J. (1982). Task-evoked pupillary responses, processing load,
and the structure of processing resources. Psychological bulletin, 91(2):276.
[Benedek and Hazlett, 2005] Benedek, J. and Hazlett, R. (2005). Incorporating facial
emg emotion measures as feedback in the software design process. Proc. Human
Computer Interaction Consortium.
[Benko and Wigdor, 2010] Benko, H. and Wigdor, D. (2010). Imprecision, inaccu-
racy, and frustration: The tale of touch input. In Tabletops-Horizontal Interactive
Displays, pages 249–275. Springer.
[Bergadano et al., 2002] Bergadano, F., Gunetti, D., and Picardi, C. (2002). User
authentication through keystroke dynamics. ACM Transactions on Information
and System Security (TISSEC), 5(4):367–397.
[Berkowitz, 1962] Berkowitz, L. (1962). Aggression: A social psychological analysis.
PsycINFO research-info-systems.
[Bijmolt et al., 2010] Bijmolt, T. H., Leeflang, P. S., Block, F., Eisenbeiss, M., Hardie,
B. G., Lemmens, A., and Saffert, P. (2010). Analytics for customer engagement.
Journal of Service Research, 13(3):341–356.
[Biniok, 2018] Biniok, J. (2018). Tampermonkey. https://github.com/Tampermonkey/tampermonkey.
[Birditt et al., 2005] Birditt, K. S., Fingerman, K. L., and Almeida, D. M. (2005).
Age differences in exposure and reactions to interpersonal tensions: a daily diary
study. Psychology and aging, 20(2):330.
[Boucsein, 2012] Boucsein, W. (2012). Electrodermal activity. Springer Science &
Business Media.
[Bousefsaf et al., 2013] Bousefsaf, F., Maaoui, C., and Pruski, A. (2013). Remote as-
sessment of the heart rate variability to detect mental stress. In Pervasive Computing
Technologies for Healthcare (PervasiveHealth), 2013 7th International Conference
on, pages 348–351. IEEE.
[Bousefsaf et al., 2014] Bousefsaf, F., Maaoui, C., and Pruski, A. (2014). Remote
detection of mental workload changes using cardiac parameters assessed with a low-
cost webcam. Computers in biology and medicine, 53:154–163.
[Bradley and Lang, 1994] Bradley, M. M. and Lang, P. J. (1994). Measuring emo-
tion: the self-assessment manikin and the semantic differential. Journal of behavior
therapy and experimental psychiatry, 25(1):49–59.
[Bradley et al., 2008a] Bradley, M. M., Miccoli, L., Escrig, M. A., and Lang, P. J.
(2008a). The pupil as a measure of emotional arousal and autonomic activation.
Psychophysiology, 45(4):602–607.
[Bradley et al., 2008b] Bradley, M. M., Miccoli, L., Escrig, M. a., and Lang, P. J.
(2008b). The pupil as a measure of emotional arousal and autonomic activation.
Psychophysiology, 45(4):602–607.
[Bradley et al., 2017] Bradley, M. M., Sapigao, R. G., and Lang, P. J. (2017). Sym-
pathetic ans modulation of pupil diameter in emotional scene perception: Effects of
hedonic content, brightness, and contrast. Psychophysiology, 54(10):1419–1435.
[Bremner, 2012] Bremner, F. D. (2012). Pupillometric evaluation of the dynamics of
the pupillary response to a brief light stimulus in healthy subjects. Investigative
ophthalmology & visual science, 53(11):7343–7347.
[Broekens and Brinkman, 2013] Broekens, J. and Brinkman, W.-P. (2013). Affectbut-
ton: A method for reliable and valid affective self-report. International Journal of
Human-Computer Studies, 71(6):641–667.
[Brouwer et al., 2015] Brouwer, A.-M., Zander, T. O., Van Erp, J. B., Korteling, J. E.,
and Bronkhorst, A. W. (2015). Using neurophysiological signals that reflect cogni-
tive or affective state: six recommendations to avoid common pitfalls. Frontiers in
neuroscience, 9:136.
[Brown et al., 2011] Brown, L., Grundlehner, B., and Penders, J. (2011). Towards
wireless emotional valence detection from eeg. In Engineering in Medicine and
Biology Society, EMBC, 2011 Annual International Conference of the IEEE, pages
2188–2191. IEEE.
[Brugha et al., 2012] Brugha, T., Cooper, S. A., McManus, S., Purdon, S., Smith, J., Scott, F., Spiers, N., and Tyrer, F. (2012). Estimating the prevalence of autism spectrum conditions in adults: extending the 2007 adult psychiatric morbidity survey. The NHS Information Centre.
[Bruneau et al., 2002] Bruneau, D., Sasse, M. A., and McCarthy, J. (2002). The eyes
never lie: The use of eye tracking data in hci research. In Proceedings of the CHI,
volume 2, page 25. Citeseer.
[Bruun et al., 2016] Bruun, A., Law, E. L.-C., Heintz, M., and Alkly, L. H. (2016).
Understanding the relationship between frustration and the severity of usability
problems: What can psychophysiological data (not) tell us? In Proceedings of the
2016 CHI Conference on Human Factors in Computing Systems, pages 3975–3987.
ACM.
[Buchanan, 2018] Buchanan, J. (2018). Greasemonkey. https://github.com/insin/greasemonkey.
[Buettner et al., 2018] Buettner, R., Scheuermann, I. F., Koot, C., Rossle, M., and Timm, I. J. (2018). Stationarity of a user's pupil size signal as a precondition of pupillary-based mental workload evaluation. In Information Systems and Neuroscience, pages 195–200. Springer.
[Burger et al., 2013] Burger, B., Saarikallio, S., Luck, G., Thompson, M. R., and
Toiviainen, P. (2013). Relationships between perceived emotions in music and music-
induced movement. Music Perception: An Interdisciplinary Journal, 30(5):517–533.
[Burleson and Picard, 2004] Burleson, W. and Picard, R. W. (2004). Affective agents:
Sustaining motivation to learn through failure and a state of stuck. In Workshop
on Social and Emotional Intelligence in Learning Environments.
[Busso et al., 2009] Busso, C., Lee, S., and Narayanan, S. (2009). Analysis of emotion-
ally salient aspects of fundamental frequency for emotion detection. Audio, Speech,
and Language Processing, IEEE Transactions on, 17(4):582–596.
[Cacioppo et al., 1992] Cacioppo, J. T., Bush, L. K., and Tassinary, L. G. (1992).
Microexpressive facial actions as a function of affective stimuli: Replication and
extension. Personality and Social Psychology Bulletin, 18(5):515–526.
[Cacioppo et al., 1986] Cacioppo, J. T., Petty, R. E., Losch, M. E., and Kim, H. S.
(1986). Electromyographic activity over facial muscle regions can differentiate the
valence and intensity of affective reactions. Journal of personality and social psy-
chology, 50(2):260.
[Calandra et al., 2016] Calandra, D. M., Di Mauro, D., D'Auria, D., and Cutugno, F. (2016). Eyecu: an emotional eye tracker for cultural heritage support. In Empowering Organizations, pages 161–172. Springer.
[Calvo and D’Mello, 2010] Calvo, R. A. and D’Mello, S. (2010). Affect detection: An
interdisciplinary review of models, methods, and their applications. IEEE Transac-
tions on Affective Computing, 1(1):18–37.
[Castellano et al., 2008] Castellano, G., Kessous, L., and Caridakis, G. (2008). Emo-
tion recognition through multiple modalities: face, body gesture, speech. In Affect
and emotion in human-computer interaction, pages 92–103. Springer.
[Catalano, 2002] Catalano, J. T. (2002). Guide to ECG analysis. Lippincott Williams
& Wilkins.
[Ceaparu et al., 2004] Ceaparu, I., Lazar, J., Bessiere, K., Robinson, J., and Shnei-
derman, B. (2004). Determining causes and severity of end-user frustration. Inter-
national journal of human-computer interaction, 17(3):333–356.
[Celani et al., 1999] Celani, G., Battacchi, M. W., and Arcidiacono, L. (1999). The
understanding of the emotional meaning of facial expressions in people with autism.
Journal of autism and developmental disorders, 29(1):57–66.
[Chanel et al., 2006] Chanel, G., Kronegg, J., Grandjean, D., and Pun, T. (2006).
Emotion assessment: Arousal evaluation using eegs and peripheral physiological
signals. Multimedia content representation, classification and security, pages 530–
537.
[Chanel et al., 2008] Chanel, G., Rebetez, C., Betrancourt, M., and Pun, T. (2008).
Boredom, engagement and anxiety as indicators for adaptation to difficulty in games.
In Proceedings of the 12th international conference on Entertainment and media in
the ubiquitous era, pages 13–17. ACM.
[Chang et al., 2011] Chang, K.-h., Fisher, D., Canny, J., and Hartmann, B. (2011).
How’s my mood and stress?: an efficient speech analysis library for unobtrusive
monitoring on mobile phones. In Proceedings of the 6th International Conference
on Body Area Networks, pages 71–77. ICST (Institute for Computer Sciences, Social-
Informatics and Telecommunications Engineering).
[Charles et al., 2001] Charles, S. T., Reynolds, C. A., and Gatz, M. (2001). Age-
related differences and change in positive and negative affect over 23 years. Journal
of personality and social psychology, 80(1):136.
[Chellali and Hennig, 2013] Chellali, R. and Hennig, S. (2013). Is it time to rethink
motion artifacts? temporal relationships between electrodermal activity and body
movements in real-life conditions. In Affective Computing and Intelligent Interaction
(ACII), 2013 Humaine Association Conference on, pages 330–335. IEEE.
[Chen et al., 2017] Chen, H., Dey, A., Billinghurst, M., and Lindeman, R. W. (2017).
Exploring pupil dilation in emotional virtual reality environments.
[Chen et al., 2003] Chen, L.-Q., Xie, X., Fan, X., Ma, W.-Y., Zhang, H.-J., and Zhou,
H.-Q. (2003). A visual attention model for adapting images on small displays.
Multimedia systems, 9(4):353–364.
[Cheng and Liu, 2008] Cheng, B. and Liu, G.-Y. (2008). Emotion recognition from
surface emg signal using wavelet transform and neural network. In Proceedings
of the 2nd international conference on bioinformatics and biomedical engineering
(ICBBE), pages 1363–1366.
[Chiang and Lin, 2007] Chiang, H.-M. and Lin, Y.-H. (2007). Reading comprehension
instruction for students with autism spectrum disorders: A review of the literature.
Focus on Autism and Other Developmental Disabilities, 22(4):259–267.
[Chmielewska et al., 2019] Chmielewska, M., Dzienkowski, M., Bogucki, J., Kocki, W., Kwiatkowski, B., Pełka, J., and Tuszynska-Bogucka, W. (2019). Affective computing with eye-tracking data in the study of the visual perception of architectural spaces. In MATEC Web of Conferences, volume 252, page 03021. EDP Sciences.
[Choe et al., 2016] Choe, K. W., Blake, R., and Lee, S.-H. (2016). Pupil size dynamics
during fixation impact the accuracy and precision of video-based gaze estimation.
Vision research, 118:48–59.
[Christensen et al., 2018] Christensen, D. L., Braun, K. V. N., Baio, J., Bilder, D., Charles, J., Constantino, J. N., Daniels, J., Durkin, M. S., Fitzgerald, R. T., Kurzius-Spencer, M., et al. (2018). Prevalence and characteristics of autism spectrum disorder among children aged 8 years - autism and developmental disabilities monitoring network, 11 sites, United States, 2012. MMWR Surveillance Summaries, 65(13):1.
[Christian et al., 2014] Muhl, C., Allison, B., Nijholt, A., and Chanel, G. (2014). A survey of affective brain computer interfaces: principles, state-of-the-art, and challenges. Brain-Computer Interfaces, 1(2):66–84.
[Cole et al., 2002] Cole, P. M., Bruschi, C. J., and Tamang, B. L. (2002). Cultural dif-
ferences in children’s emotional reactions to difficult situations. Child development,
73(3):983–996.
[Colizoli et al., 2018] Colizoli, O., De Gee, J. W., Urai, A. E., and Donner, T. H.
(2018). Task-evoked pupil responses reflect internal belief states. Scientific reports,
8(1):13702.
[Constantine and Hajj, 2012] Constantine, L. and Hajj, H. (2012). A survey of ground-
truth in emotion data annotation. In Pervasive Computing and Communications
Workshops (PERCOM Workshops), 2012 IEEE International Conference on, pages
697–702. IEEE.
[Cook et al., 2013] Cook, R., Brewer, R., Shah, P., and Bird, G. (2013). Alexithymia,
not autism, predicts poor recognition of emotional facial expressions. Psychological
science, 24(5):723–732.
[Corbetta et al., 1998] Corbetta, M., Akbudak, E., Conturo, T. E., Snyder, A. Z.,
Ollinger, J. M., Drury, H. A., Linenweber, M. R., Petersen, S. E., Raichle, M. E.,
Van Essen, D. C., et al. (1998). A common network of functional areas for attention
and eye movements. Neuron, 21(4):761–773.
[Cowie et al., 2001] Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kol-
lias, S., Fellenz, W., and Taylor, J. G. (2001). Emotion recognition in human-
computer interaction. IEEE Signal processing magazine, 18(1):32–80.
[Crichton, 2001] Crichton, N. (2001). Visual analogue scale (vas). J Clin Nurs,
10(5):706–6.
[Critchley, 2002] Critchley, H. D. (2002). Book review: electrodermal responses: what
happens in the brain. The Neuroscientist, 8(2):132–142.
[Cromby, 2012] Cromby, J. (2012). Feeling the way: Qualitative clinical research and
the affective turn. Qualitative Research in Psychology, 9(1):88–98.
[Daimi and Saha, 2014] Daimi, S. N. and Saha, G. (2014). Classification of emotions
induced by music videos and correlation with participants rating. Expert Systems
with Applications, 41(13):6057–6065.
[Dalvand and Kazemifard, 2012] Dalvand, K. and Kazemifard, M. (2012). An adap-
tive user-interface based on user’s emotion. 2012 2nd International eConference on
Computer and Knowledge Engineering, ICCKE 2012, pages 161–166.
[Dan-Glauser and Scherer, 2011] Dan-Glauser, E. S. and Scherer, K. R. (2011). The
geneva affective picture database (gaped): a new 730-picture database focusing on
valence and normative significance. Behavior research methods, 43(2):468.
[Datcu, 2014] Datcu, D. (2014). On the Enhancement of Augmented Reality-based Tele-Collaboration with Affective Computing Technology.
[Davidson, 2003] Davidson, R. J. (2003). Seven sins in the study of emotion: Correc-
tives from affective neuroscience. Brain and Cognition, 52(1):129–132.
[Davies et al., 2016] Davies, A., Horseman, L., Splendiani, B., Harper, S., and Jay, C.
(2016). Data driven analysis of visual behaviour for electrocardiogram interpreta-
tion. Technical Report.
[De Silva et al., 2006] De Silva, P. R., Osano, M., Marasinghe, A., and Madurappe-
ruma, A. P. (2006). Towards recognizing emotion with affective dimensions through
body gestures. In Automatic Face and Gesture Recognition, 2006. FGR 2006. 7th
International Conference on, pages 269–274. IEEE.
[Demeyer, 2011] Demeyer, S. (2011). Research methods in computer science. In ICSM,
page 600.
[Denicolo and Becker, 2012] Denicolo, P. and Becker, L. (2012). Developing research
proposals. Sage.
[Desmet, 2003] Desmet, P. (2003). Measuring emotion: Development and application
of an instrument to measure emotional responses to products. In Funology, pages
111–123. Springer.
[Detterman, 1987] Detterman, D. K. (1987). Theoretical notions of intelligence and
mental retardation. American Journal of Mental Deficiency.
[Dimberg, 1990] Dimberg, U. (1990). Facial electromyography and emotional reac-
tions. Psychophysiology.
[Dixon-Woods et al., 2006] Dixon-Woods, M., Bonas, S., Booth, A., Jones, D. R.,
Miller, T., Sutton, A. J., Shaw, R. L., Smith, J. A., and Young, B. (2006). How can
systematic reviews incorporate qualitative research? a critical perspective. Quali-
tative research, 6(1):27–44.
[Doob and Kirshenbaum, 1973] Doob, A. N. and Kirshenbaum, H. M. (1973). The
effects on arousal of frustration and aggressive films. Journal of Experimental Social
Psychology, 9(1):57–64.
[Dowland and Furnell, 2004] Dowland, P. S. and Furnell, S. M. (2004). A long-term
trial of keystroke profiling using digraph, trigraph and keyword latencies. In Security
and Protection in Information Processing Systems, pages 275–289. Springer.
[Duchowski et al., 2018] Duchowski, A. T., Krejtz, K., Krejtz, I., Biele, C., Niedzielska, A., Kiefer, P., Raubal, M., and Giannopoulos, I. (2018). The index of pupillary activity: Measuring cognitive load vis-à-vis task difficulty with pupil oscillation. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, page 282. ACM.
[Ehmke and Wilson, 2007] Ehmke, C. and Wilson, S. (2007). Identifying web usability
problems from eye-tracking data. In Proceedings of the 21st British HCI Group
Annual Conference on People and Computers: HCI... but not as we know it-Volume
1, pages 119–128. British Computer Society.
[Einhauser, 2017] Einhauser, W. (2017). The pupil as marker of cognitive processes.
In Computational and cognitive neuroscience of vision, pages 141–169. Springer.
[Ekman, 1992a] Ekman, P. (1992a). Are there basic emotions?
[Ekman, 1992b] Ekman, P. (1992b). An argument for basic emotions. Cognition &
emotion, 6(3-4):169–200.
[Ekman, 2004] Ekman, P. (2004). Emotions revealed. BMJ, 328(Suppl S5):0405184.
[Ekman and Friesen, 1971] Ekman, P. and Friesen, W. V. (1971). Constants across
cultures in the face and emotion. Journal of personality and social psychology,
17(2):124.
[Ekman and Friesen, 2003] Ekman, P. and Friesen, W. V. (2003). Unmasking the face:
A guide to recognizing emotions from facial clues. Ishk.
[Ekman et al., 1987] Ekman, P., Friesen, W. V., O’sullivan, M., Chan, A., Diacoyanni-
Tarlatzis, I., Heider, K., Krause, R., LeCompte, W. A., Pitcairn, T., Ricci-Bitti,
P. E., et al. (1987). Universals and cultural differences in the judgments of facial
expressions of emotion. Journal of personality and social psychology, 53(4):712.
[el Kaliouby et al., 2006] el Kaliouby, R., Picard, R., and Baron-Cohen, S. (2006). Affective computing and autism. Annals of the New York Academy of Sciences, 1093(1):228–248.
[Engel, 1960] Engel, B. T. (1960). Stimulus-response and individual-response speci-
ficity. AMA Archives of General Psychiatry, 2(3):305–313.
[Epp et al., 2011] Epp, C., Lippold, M., and Mandryk, R. L. (2011). Identifying emo-
tional states using keystroke dynamics. In Proceedings of the SIGCHI Conference
on Human Factors in Computing Systems, pages 715–724. ACM.
[Eraslan et al., 2017] Eraslan, S., Yaneva, V., Yesilada, Y., and Harper, S. (2017). Do
web users with autism experience barriers when searching for information within
web pages? In Proceedings of the 14th Web for All Conference on The Future of
Accessible Work, W4A ’17, pages 20:1–20:4, New York, NY, USA. ACM.
[Eraslan et al., 2018] Eraslan, S., Yaneva, V., Yesilada, Y., and Harper, S. (2018).
Web users with autism: eye tracking evidence for differences. Behaviour & Infor-
mation Technology.
[Eraslan et al., 2014] Eraslan, S., Yesilada, Y., and Harper, S. (2014). Identifying
patterns in eyetracking scanpaths in terms of visual elements of web pages. In Web
Engineering, pages 163–180. Springer.
[Eraslan et al., 2016a] Eraslan, S., Yesilada, Y., and Harper, S. (2016a). Scanpath
trend analysis on web pages: Clustering eye tracking scanpaths. ACM Trans. Web,
10(4):20:1–20:35.
[Eraslan et al., 2016b] Eraslan, S., Yesilada, Y., and Harper, S. (2016b). Trends in
eye tracking scanpaths: Segmentation effect? In Proceedings of the 27th ACM
Conference on Hypertext and Social Media, HT ’16, pages 15–25, New York, NY,
USA. ACM.
[Erdem and Sert, 2014] Erdem, E. S. and Sert, M. (2014). Efficient recognition of
human emotional states from audio signals. In Multimedia (ISM), 2014 IEEE In-
ternational Symposium on, pages 139–142. IEEE.
[Ettinger et al., 1991] Ettinger, E., Wyatt, H., and London, R. (1991). Anisocoria: variation and clinical observation with different conditions of illumination and accommodation. Investigative ophthalmology & visual science, 32(3):501–509.
[Exposito et al., 2018] Exposito, M., Picard, R. W., and Hernandez, J. (2018). Affec-
tive keys: towards unobtrusive stress sensing of smartphone users. In Proceedings
of the 20th International Conference on Human-Computer Interaction with Mobile
Devices and Services Adjunct, pages 139–145. ACM.
[Feild et al., 2010] Feild, H. A., Allan, J., and Jones, R. (2010). Predicting searcher
frustration. In Proceedings of the 33rd international ACM SIGIR conference on
Research and development in information retrieval, pages 34–41. ACM.
[Feinstein et al., 2011] Feinstein, J. S., Adolphs, R., Damasio, A., and Tranel, D.
(2011). The human amygdala and the induction and experience of fear. Current
biology, 21(1):34–38.
[Ferhat and Vilarino, 2016] Ferhat, O. and Vilarino, F. (2016). Low cost eye tracking.
Computational intelligence and neuroscience, 2016:17.
[Fernandez et al., 2012] Fernandez, J. M., Augusto, J. C., Seepold, R., and Madrid,
N. M. (2012). A sensor technology survey for a stress-aware trading process. IEEE
Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews,
42(6):809–824.
[Fisher, 1993] Fisher, R. J. (1993). Social desirability bias and the validity of indirect
questioning. Journal of consumer research, 20(2):303–315.
[Foglia et al., 2008] Foglia, P., Prete, C. A., and Zanda, M. (2008). Relating gsr
signals to traditional usability metrics: Case study with an anthropomorphic web
assistant. In 2008 IEEE Instrumentation and Measurement Technology Conference,
pages 1814–1818. IEEE.
[Foster et al., 1998] Foster, C. A., Witcher, B. S., Campbell, W. K., and Green, J. D.
(1998). Arousal and attraction: Evidence for automatic and controlled processes.
Journal of Personality and Social Psychology, 74(1):86.
[Fragopanagos and Taylor, 2005] Fragopanagos, N. and Taylor, J. G. (2005). Emotion
recognition in human–computer interaction. Neural Networks, 18(4):389–405.
[Fritz et al., 2009] Fritz, T., Jentschke, S., Gosselin, N., Sammler, D., Peretz, I.,
Turner, R., Friederici, A. D., and Koelsch, S. (2009). Universal recognition of three
basic emotions in music. Current biology, 19(7):573–576.
[Fuhl et al., 2018] Fuhl, W., Castner, N., and Kasneci, E. (2018). Histogram of ori-
ented velocities for eye movement detection. In Proceedings of the Workshop on
Modeling Cognitive Processes from Multimodal Data, page 5. ACM.
[Gallese, 2006] Gallese, V. (2006). Intentional attunement: A neurophysiological per-
spective on social cognition and its disruption in autism. Brain research, 1079(1):15–
24.
[Gamon, 2004] Gamon, M. (2004). Sentiment classification on customer feedback data:
noisy data, large feature vectors, and the role of linguistic analysis. In Proceedings
of the 20th international conference on Computational Linguistics, page 841. Asso-
ciation for Computational Linguistics.
[Gao and Wang, 2015] Gao, Z. and Wang, S. (2015). Emotion recognition from eeg
signals using hierarchical bayesian network with privileged information. In Proceed-
ings of the 5th ACM on International Conference on Multimedia Retrieval, pages
579–582. ACM.
[Garrett et al., 2004] Garrett, S. K., Horn, D. B., and Caldwell, B. S. (2004). Modeling
user satisfaction, frustration, and user goal/website compatibility. In Proceedings
of the Human Factors and Ergonomics Society Annual Meeting, volume 48, pages
1508–1512. SAGE Publications Sage CA: Los Angeles, CA.
[Geethanjali et al., 2017] Geethanjali, B., Adalarasu, K., Hemapraba, A., Pravin Ku-
mar, S., and Rajasekeran, R. (2017). Emotion analysis using sam (self-assessment
manikin) scale. Biomedical Research (0970-938X), 28.
[Gehricke and Shapiro, 2000] Gehricke, J.-G. and Shapiro, D. (2000). Reduced facial
expression and social context in major depression: discrepancies between facial
muscle activity and self-reported emotion. Psychiatry Research, 95(2):157–167.
[Gellatly and Meyer, 1992] Gellatly, I. R. and Meyer, J. P. (1992). The effects of
goal difficulty on physiological arousal, cognition, and task performance. Journal of
Applied Psychology, 77(5):694.
[Gerald, 2018] Gerald (2018). Violentmonkey. https://github.com/violentmonkey/violentmonkey.
[Gil et al., 2013] Gil, G. B., de Jesus, A. B., and Lopez, J. M. M. (2013). Combining
machine learning techniques and natural language processing to infer emotions using
spanish twitter corpus. In Highlights on Practical Applications of Agents and Multi-
Agent Systems, pages 149–157. Springer.
[Gilleade et al., 2005] Gilleade, K., Dix, A., and Agllanson, J. (2005). Affective
videogames and modes of affective gaming: assist me, challenge me, emote me.
DiGRA 2005: Changing Views–Worlds in Play.
[Gingras et al., 2015] Gingras, B., Marin, M. M., Puig-Waldmuller, E., and Fitch,
W. (2015). The eye is listening: Music-induced arousal and individual differences
predict pupillary responses. Frontiers in human neuroscience, 9:619.
[Gollan et al., 2016] Gollan, B., Haslgrubler, M., and Ferscha, A. (2016). Demon-
strator for extracting cognitive load from pupil dilation for attention management
services. In Proceedings of the 2016 ACM International Joint Conference on Per-
vasive and Ubiquitous Computing: Adjunct, pages 1566–1571. ACM.
[Greene et al., 2016] Greene, S., Thapliyal, H., and Caban-Holt, A. (2016). A survey of
affective computing for stress detection: Evaluating technologies in stress detection
for better health. IEEE Consumer Electronics Magazine, 5(4):44–56.
[Greenland et al., 2016] Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B.,
Poole, C., Goodman, S. N., and Altman, D. G. (2016). Statistical tests, p values,
confidence intervals, and power: a guide to misinterpretations. European journal of
epidemiology, 31(4):337–350.
[Gunes and Piccardi, 2007] Gunes, H. and Piccardi, M. (2007). Bi-modal emotion
recognition from expressive face and body gestures. Journal of Network and Com-
puter Applications, 30(4):1334–1345.
[Guo et al., 2019] Guo, F., Li, M., Qu, Q., and Duffy, V. G. (2019). The effect of a humanoid robot's emotional behaviors on users' emotional responses: Evidence from pupillometry and electroencephalography measures. International Journal of Human–Computer Interaction, pages 1–13.
[Harms et al., 2010] Harms, M. B., Martin, A., and Wallace, G. L. (2010). Facial
emotion recognition in autism spectrum disorders: a review of behavioral and neu-
roimaging studies. Neuropsychology review, 20(3):290–322.
[Harper et al., 2006] Harper, S., Bechhofer, S., and Lunn, D. (2006). Sadie:: transcod-
ing based on css. In Proceedings of the 8th international ACM SIGACCESS confer-
ence on Computers and accessibility, pages 259–260. ACM.
[Harris et al., 2000] Harris, P. L., de Rosnay, M., and Pons, F. (2000). Understanding
emotion. Handbook of emotions, 2:281–292.
[Hart et al., 2012] Hart, J., Sutcliffe, A., and De Angeli, A. (2012). Using affect to
evaluate user engagement. In CHI’12 Extended Abstracts on Human Factors in
Computing Systems, pages 1811–1834. ACM.
[Hartmann et al., 2005] Hartmann, B., Mancini, M., and Pelachaud, C. (2005). Imple-
menting expressive gesture synthesis for embodied conversational agents. In gesture
in human-Computer Interaction and Simulation, pages 188–199. Springer.
[Hassan, 2006] Hassan, E. (2006). Recall bias can be a threat to retrospective and
prospective research designs. The Internet Journal of Epidemiology, 3(2):339–412.
[Hayes and Petrov, 2016] Hayes, T. R. and Petrov, A. A. (2016). Mapping and cor-
recting the influence of gaze position on pupil size measurements. Behavior Research
Methods, 48(2):510–527.
[Hazlett, 2003] Hazlett, R. (2003). Measurement of user frustration: a biologic ap-
proach. In CHI’03 extended abstracts on Human factors in computing systems,
pages 734–735. ACM.
[Hazlett and Hazlett, 1999] Hazlett, R. L. and Hazlett, S. Y. (1999). Emotional re-
sponse to television commercials: Facial emg vs. self-report. Journal of Advertising
Research, 39:7–24.
[He et al., 2018] He, H., She, Y., Xiahou, J., Yao, J., Li, J., Hong, Q., and Ji, Y.
(2018). Real-time eye-gaze based interaction for human intention prediction and
emotion analysis. In Proceedings of Computer Graphics International 2018, pages
185–194. ACM.
[Heller et al., 1997] Heller, W., Nitschke, J. B., and Lindsay, D. L. (1997). Neuropsy-
chological correlates of arousal in self-reported emotion. Cognition & Emotion,
11(4):383–402.
[Henderson et al., 2018] Henderson, R. R., Bradley, M. M., and Lang, P. J. (2018).
Emotional imagery and pupil diameter. Psychophysiology, 55(6):e13050.
[Hernandez-Aguila et al., 2014] Hernandez-Aguila, A., Garcia-Valdez, M., and Man-
cilla, A. (2014). Affective states in software programming: Classification of individ-
uals based on their keystroke and mouse dynamics. Intelligent Learning Environ-
ments, page 27.
[Hjortsjo, 1969] Hjortsjo, C.-H. (1969). Man's face and mimic language. Studentlit-
teratur.
[Hjortskov et al., 2004] Hjortskov, N., Rissen, D., Blangsted, A. K., Fallentin, N.,
Lundberg, U., and Søgaard, K. (2004). The effect of mental stress on heart rate
variability and blood pressure during computer work. European journal of applied
physiology, 92(1-2):84–89.
[Hochschild, 1979] Hochschild, A. R. (1979). Emotion work, feeling rules, and social
structure. American journal of sociology, 85(3):551–575.
[Hokanson and Burgess, 1964] Hokanson, J. E. and Burgess, M. (1964). Effects of
physiological arousal level, frustration, and task complexity on performance. The
Journal of Abnormal and Social Psychology, 68(6):698.
[Holmqvist et al., 2011] Holmqvist, K., Nystrom, M., Andersson, R., Dewhurst, R.,
Jarodzka, H., and Van de Weijer, J. (2011). Eye tracking: A comprehensive guide
to methods and measures. OUP Oxford.
[Hornbæk and Law, 2007] Hornbæk, K. and Law, E. L.-C. (2007). Meta-analysis of
correlations among usability measures. In Proceedings of the SIGCHI conference on
Human factors in computing systems, pages 617–626. ACM.
[Horvat et al., 2013] Horvat, M., Popovic, S., and Cosic, K. (2013). Multimedia stim-
uli databases usage patterns: a survey report. In 2013 36th International Convention
on Information and Communication Technology, Electronics and Microelectronics
(MIPRO), pages 993–997. IEEE.
[Hosseini et al., 2010] Hosseini, S. A., Khalilzadeh, M. A., Naghibi-Sistani, M. B.,
and Niazmand, V. (2010). Higher order spectra analysis of eeg signals in emotional
stress states. In Information Technology and Computer Science (ITCS), 2010 Second
International Conference on, pages 60–63. IEEE.
[Hu et al., 2009] Hu, X., Downie, J. S., and Ehmann, A. F. (2009). Lyric text mining
in music mood classification. In Proc. ISMIR.
[Hudlicka, 2009] Hudlicka, E. (2009). Affective game engines: motivation and require-
ments. In Proceedings of the 4th international conference on foundations of digital
games, pages 299–306. ACM.
[Hui and Triandis, 1989] Hui, C. H. and Triandis, H. C. (1989). Effects of culture and
response format on extreme response style. Journal of cross-cultural psychology,
20(3):296–309.
[Iqbal et al., 2004] Iqbal, S. T., Zheng, X. S., and Bailey, B. P. (2004). Task-evoked
pupillary response to mental workload in human-computer interaction. In CHI’04
extended abstracts on Human factors in computing systems, pages 1477–1480. ACM.
[Izard, 1992] Izard, C. E. (1992). Basic emotions, relations among emotions, and
emotion-cognition relations. Journal of personality and social psychology.
[Izard et al., 1987] Izard, C. E., Hembree, E. A., and Huebner, R. R. (1987). Infants’
emotion expressions to acute pain: Developmental change and stability of individual
differences. Developmental Psychology, 23(1):105.
[Janisse, 1974] Janisse, M. P. (1974). Pupil size, affect and exposure frequency. Social
Behavior and personality, 2(2):125–146.
[Janssen et al., 2012] Janssen, J. H., Van Den Broek, E. L., and Westerink, J. H.
D. M. (2012). Tune in to your emotions: A robust personalized affective music
player. User Modeling and User-Adapted Interaction, 22:255–279.
[Jercic et al., 2018] Jercic, P., Sennersten, C., and Lindley, C. (2018). Modeling cog-
nitive load and physiological arousal through pupil diameter and heart rate. Multi-
media Tools and Applications, pages 1–15.
[Jeronimus and Laceulle, 2017] Jeronimus, B. F. and Laceulle, O. M. (2017). Frustra-
tion, pages 1–5. Springer International Publishing, Cham.
[Jerritta et al., 2013] Jerritta, S., Murugappan, M., Wan, K., and Yaacob, S. (2013).
Emotion detection from qrs complex of ecg signals using hurst exponent for differ-
ent age groups. In Affective Computing and Intelligent Interaction (ACII), 2013
Humaine Association Conference on, pages 849–854. IEEE.
[Jin, 1992] Jin, P. (1992). Toward a reconceptualization of the law of initial value.
Psychological Bulletin, 111(1):176.
[Johnson et al., 2007] Johnson, R. B., Onwuegbuzie, A. J., and Turner, L. A. (2007).
Toward a definition of mixed methods research. Journal of mixed methods research,
1(2):112–133.
[Kahneman and Beatty, 1966] Kahneman, D. and Beatty, J. (1966). Pupil diameter
and load on memory. Science, 154(3756):1583–1585.
[Kambanaros et al., 2013] Kambanaros, M., Grohmann, K. K., and Michaelides, M.
(2013). Lexical retrieval for nouns and verbs in typically developing bilectal children.
First language, 33(2):182–199.
[Kao and Poteet, 2007] Kao, A. and Poteet, S. R. (2007). Natural language processing
and text mining. Springer Science & Business Media.
[Kassem et al., 2017] Kassem, K., Salah, J., Abdrabou, Y., Morsy, M., El-Gendy, R.,
Abdelrahman, Y., and Abdennadher, S. (2017). Diva: exploring the usage of pupil
diameter to elicit valence and arousal. In Proceedings of the 16th International
Conference on Mobile and Ubiquitous Multimedia, pages 273–278. ACM.
[Kassner et al., 2014] Kassner, M., Patera, W., and Bulling, A. (2014). Pupil: an
open source platform for pervasive eye tracking and mobile gaze-based interaction.
In Proceedings of the 2014 ACM international joint conference on pervasive and
ubiquitous computing: Adjunct publication, pages 1151–1160. ACM.
[Khan et al., 2006] Khan, M. M., Ward, R. D., and Ingleby, M. (2006). Infrared
thermal sensing of positive and negative affective states. In Robotics, Automation
and Mechatronics, 2006 IEEE Conference on, pages 1–6. IEEE.
[Khan et al., 2012] Khan, M. S., Khan, I. A., and Shafi, M. (2012). Keyboard and
mouse interaction based mood measurement using artificial neural networks. In
Robotics and Artificial Intelligence (ICRAI), 2012 International Conference on,
pages 130–134. IEEE.
[Khosrowabadi et al., 2010] Khosrowabadi, R., Quek, H. C., Wahab, A., and Ang,
K. K. (2010). Eeg-based emotion recognition using self-organizing map for boundary
detection. In Pattern Recognition (ICPR), 2010 20th International Conference on,
pages 4242–4245. IEEE.
[Khosrowabadi et al., 2009] Khosrowabadi, R., Wahab, A., Ang, K. K., and Baniasad,
M. H. (2009). Affective computation on eeg correlates of emotion from musical
and vocal stimuli. In Neural Networks, 2009. IJCNN 2009. International Joint
Conference on, pages 1590–1594. IEEE.
[Kihlstrom et al., 2000] Kihlstrom, J. F., Eich, E., Sandbrand, D., and Tobias, B. A.
(2000). Emotion and memory: Implications for self-report. The science of self-
report: Implications for research and practice, pages 81–99.
[Kim and Andre, 2008] Kim, J. and Andre, E. (2008). Emotion recognition based on
physiological changes in music listening. Pattern Analysis and Machine Intelligence,
IEEE Transactions on, 30(12):2067–2083.
[Kim et al., 2004a] Kim, J., Bee, N., Wagner, J., and Andre, E. (2004a). Emote to
win: Affective interactions with a computer game agent. GI Jahrestagung, 1:159–
164.
[Kim et al., 2004b] Kim, K. H., Bang, S., and Kim, S. (2004b). Emotion recognition
system using short-term monitoring of physiological signals. Medical and biological
engineering and computing, 42(3):419–427.
[Kim et al., 2010] Kim, Y. E., Schmidt, E. M., Migneco, R., Morton, B. G., Richard-
son, P., Scott, J., Speck, J. A., and Turnbull, D. (2010). Music emotion recognition:
A state of the art review. In Proc. ISMIR, pages 255–266. Citeseer.
[Kirk et al., 2012] Kirk, M., Morgan, R., Tonkin, E., McDonald, K., and Skirton, H.
(2012). An objective approach to evaluating an internet-delivered genetics educa-
tion resource developed for nurses: using google analytics to monitor global visitor
engagement. Journal of Research in Nursing, 17(6):557–579.
[Klein et al., 2002] Klein, J., Moon, Y., and Picard, R. W. (2002). This computer re-
sponds to user frustration: Theory, design, and results. Interacting with computers,
14(2):119–140.
[Kleinginna and Kleinginna, 1981] Kleinginna, P. R. and Kleinginna, A. M. (1981). A
categorized list of emotion definitions, with suggestions for a consensual definition.
Motivation and emotion, 5(4):345–379.
[Klingner, 2010] Klingner, J. (2010). Fixation-aligned pupillary response averaging.
In Proceedings of the 2010 Symposium on Eye-Tracking Research & Applications,
pages 275–282. ACM.
[Klingner et al., 2008] Klingner, J., Kumar, R., and Hanrahan, P. (2008). Measuring
the task-evoked pupillary response with a remote eye tracker. In Proceedings of the
2008 symposium on Eye tracking research & applications, pages 69–72. ACM.
[Klingner et al., 2011] Klingner, J., Tversky, B., and Hanrahan, P. (2011). Effects of
visual and verbal presentation on cognitive load in vigilance, memory, and arithmetic
tasks. Psychophysiology, 48(3):323–332.
[Kolakowska, 2013] Kolakowska, A. (2013). A review of emotion recognition methods
based on keystroke dynamics and mouse movements. In Human System Interaction
(HSI), 2013 The 6th International Conference on, pages 548–555. IEEE.
[Kołakowska et al., 2015] Kołakowska, A., Landowska, A., Szwoch, M., Szwoch, W.,
and Wrobel, M. R. (2015). Modeling emotions for affect-aware applications, page 55.
[Korn and Bach, 2016] Korn, C. W. and Bach, D. R. (2016). A solid frame for the
window on cognition: Modeling event-related pupil responses. Journal of Vision,
16(3):28–28.
[Kosir and Strle, 2017] Kosir, A. and Strle, G. (2017). Emotion elicitation in a socially
intelligent service: The typing tutor. Computers, 6(2):14.
[Koss, 1986] Koss, M. C. (1986). Pupillary dilation as an index of central nervous
system α2-adrenoceptor activation. Journal of pharmacological methods, 15(1):1–
19.
[Kumar and Agarwal, 2014] Kumar, A. and Agarwal, A. (2014). Emotion recognition
using anatomical information in facial expressions. In Industrial and Information
Systems (ICIIS), 2014 9th International Conference on, pages 1–6. IEEE.
[La Heij, 1988] La Heij, W. (1988). Components of stroop-like interference in picture
naming. Memory & Cognition, 16(5):400–410.
[Laeng et al., 2016] Laeng, B., Eidet, L. M., Sulutvedt, U., and Panksepp, J. (2016).
Music chills: The eye pupil as a mirror to music's soul. Consciousness and cognition,
44:161–178.
[Lang, 1990] Lang, A. (1990). Involuntary attention and physiological arousal evoked
by structural features and emotional content in tv commercials. Communication
Research, 17(3):275–299.
[Lang, 2005] Lang, P. J. (2005). International affective picture system (iaps): Affective
ratings of pictures and instruction manual. Technical report.
[Lang et al., 1997] Lang, P. J., Bradley, M. M., and Cuthbert, B. N. (1997). Interna-
tional affective picture system (iaps): Technical manual and affective ratings. NIMH
Center for the Study of Emotion and Attention, pages 39–58.
[Lang et al., 1993] Lang, P. J., Greenwald, M. K., Bradley, M. M., and Hamm, A. O.
(1993). Looking at pictures: Affective, facial, visceral, and behavioral reactions.
Psychophysiology, 30(3):261–273.
[Latif et al., 2015] Latif, M. A., Yusof, H. M., Sidek, S. N., and Rusli, N. (2015).
Thermal imaging based affective state recognition. In 2015 IEEE International
Symposium on Robotics and Intelligent Sensors (IRIS), pages 214–219. IEEE.
[Law et al., 2009] Law, E. L.-C., Roto, V., Hassenzahl, M., Vermeeren, A. P., and
Kort, J. (2009). Understanding, scoping and defining user experience: a survey
approach. In Proceedings of the SIGCHI conference on human factors in computing
systems, pages 719–728. ACM.
[Lazar et al., 2006a] Lazar, J., Jones, A., Hackley, M., and Shneiderman, B. (2006a).
Severity and impact of computer user frustration: A comparison of student and
workplace users. Interacting with Computers, 18(2):187–207.
[Lazar et al., 2006b] Lazar, J., Jones, A., and Shneiderman, B. (2006b). Workplace
user frustration with computers: An exploratory investigation of the causes and
severity. Behaviour & Information Technology, 25(03):239–251.
[Lazarus et al., 1952] Lazarus, R. S., Deese, J., and Osler, S. F. (1952). The effects of
psychological stress upon performance. Psychological bulletin, 49(4):293.
[Ledoux, 1993] Ledoux, J. E. (1993). Cognition versus emotion, again-this time in the
brain: A response to parrott and schulkin. Cognition & Emotion, 7(1):61–64.
[Lee et al., 2012] Lee, B., Isenberg, P., Riche, N. H., and Carpendale, S. (2012). Be-
yond mouse and keyboard: Expanding design considerations for information visual-
ization interactions. Visualization and Computer Graphics, IEEE Transactions on,
18(12):2689–2698.
[Lee et al., 2011] Lee, Y.-K., Kwon, O.-W., Shin, H. S., Jo, J., and Lee, Y. (2011).
Noise reduction of ppg signals using a particle filter for robust emotion recognition.
In Consumer Electronics-Berlin (ICCE-Berlin), 2011 IEEE International Confer-
ence on, pages 202–205. IEEE.
[Levine and Safer, 2002] Levine, L. J. and Safer, M. A. (2002). Sources of bias in
memory for emotions. Current Directions in Psychological Science, 11(5):169–173.
[Li and Chen, 2006] Li, L. and Chen, J.-h. (2006). Emotion recognition using physi-
ological signals from multiple subjects. In Intelligent Information Hiding and Mul-
timedia Signal Processing, 2006. IIH-MSP’06. International Conference on, pages
355–358. IEEE.
[Li et al., 2009] Li, M., Chai, Q., Kaixiang, T., Wahab, A., and Abut, H. (2009). Eeg
emotion recognition system. In In-vehicle corpus and signal processing for driver
behavior, pages 125–135. Springer.
[Lin et al., 2013] Lin, T., Li, X., Wu, Z., and Tang, N. (2013). Automatic cognitive
load classification using high-frequency interaction events: An exploratory study.
International Journal of Technology and Human Interaction (IJTHI), 9(3):73–88.
[Lin et al., 2010] Lin, Y.-P., Wang, C.-H., Jung, T.-P., Wu, T.-L., Jeng, S.-K., Duann,
J.-R., and Chen, J.-H. (2010). Eeg-based emotion recognition in music listening.
Biomedical Engineering, IEEE Transactions on, 57(7):1798–1806.
[Lin et al., 2007] Lin, Y.-P., Wang, C.-H., Wu, T.-L., Jeng, S.-K., and Chen, J.-H.
(2007). Multilayer perceptron for eeg signal classification during listening to emo-
tional music. In TENCON 2007-2007 IEEE Region 10 Conference, pages 1–3. IEEE.
[Liu et al., 2009] Liu, C., Agrawal, P., Sarkar, N., and Chen, S. (2009). Dynamic
difficulty adjustment in computer games through real-time anxiety-based affective
feedback. International Journal of Human-Computer Interaction, 25(6):506–529.
[Liu and Joines, 2012] Liu, S. and Joines, S. (2012). Developing a Framework of Guid-
ing Interface Design for Older Adults. Proceedings of the Human Factors and Er-
gonomics Society Annual Meeting, 56:1967–1971.
[Liu et al., 2014] Liu, Y., Ritchie, J. M., Lim, T., Kosmadoudi, Z., Sivanathan, A.,
and Sung, R. C. W. (2014). A fuzzy psycho-physiological approach to enable the
understanding of an engineer's affect status during CAD activities. Computer-Aided
Design, 54:19–38.
[Lu et al., 2010] Lu, C.-Y., Lin, S.-H., Liu, J.-C., Cruz-Lara, S., and Hong, J.-S.
(2010). Automatic event-level textual emotion sensing using mutual action his-
togram between entities. Expert systems with applications, 37(2):1643–1653.
[Lu et al., 2015] Lu, Y., Zheng, W.-L., Li, B., and Lu, B.-L. (2015). Combining eye
movements and eeg to enhance emotion recognition. In Twenty-Fourth International
Joint Conference on Artificial Intelligence.
[Luharuka et al., 2003] Luharuka, R., Gao, R. X., and Krishnamurty, S. (2003). De-
sign and realization of a portable data logger for physiological sensing [gsr]. Instru-
mentation and Measurement, IEEE Transactions on, 52(4):1289–1295.
[Lunn and Harper, 2010a] Lunn, D. and Harper, S. (2010a). Using galvanic skin re-
sponse measures to identify areas of frustration for older web 2.0 users. In Proceed-
ings of the 2010 International Cross Disciplinary Conference on Web Accessibility
(W4A), pages 1–10. ACM.
[Lunn and Harper, 2010b] Lunn, D. and Harper, S. (2010b). Using galvanic skin re-
sponse measures to identify areas of frustration for older web 2.0 users. In Proceed-
ings of the 2010 International Cross Disciplinary Conference on Web Accessibility
(W4A), page 34. ACM.
[Mackay et al., 1978] Mackay, C., Cox, T., Burrows, G., and Lazzerini, T. (1978). An
inventory for the measurement of self-reported stress and arousal. British journal
of social and clinical psychology, 17(3):283–284.
[MacLeod, 1991] MacLeod, C. M. (1991). Half a century of research on the stroop
effect: an integrative review. Psychological bulletin, 109(2):163.
[Madan et al., 2018] Madan, C. R., Bayer, J., Gamer, M., Lonsdorf, T. B., and Som-
mer, T. (2018). Visual complexity and affect: ratings reflect more than meets the
eye. Frontiers in psychology, 8:2368.
[Mampusti et al., 2011] Mampusti, E. T., Ng, J. S., Quinto, J. J. I., Teng, G. L.,
Suarez, M. T. C., and Trogo, R. S. (2011). Measuring academic affective states
of students via brainwave signals. In Knowledge and Systems Engineering (KSE),
2011 Third International Conference on, pages 226–231. IEEE.
[Mantiuk et al., 2012] Mantiuk, R., Kowalik, M., Nowosielski, A., and Bazyluk, B.
(2012). Do-it-yourself eye tracker: Low-cost pupil-based eye tracker for computer
graphics applications. In International Conference on Multimedia Modeling, pages
115–125. Springer.
[Mao and Li, 2010] Mao, X. and Li, Z. (2010). Agent based affective tutoring systems:
A pilot study. Computers & Education, 55(1):202–208.
[Martin et al., 1994] Martin, A., Wiggs, C. L., Lalonde, F., and Mack, C. (1994).
Word retrieval to letter and semantic cues: A double dissociation in normal subjects
using interference tasks. Neuropsychologia, 32(12):1487–1494.
[Martin et al., 2007] Martin, L., Gutierrez y Restrepo, E., Barrera, C., Ascaso, A. R., San-
tos, O. C., and Boticario, J. G. (2007). Usability and accessibility evaluations along
the elearning cycle. In International Conference on Web Information Systems En-
gineering, pages 453–458. Springer.
[Matentzoglu et al., 2016] Matentzoglu, N., Vigo, M., Jay, C., and Stevens, R. (2016).
Making entailment set changes explicit improves the understanding of consequences
of ontology authoring actions. In European Knowledge Acquisition Workshop, pages
432–446. Springer.
[Mathieu et al., 2013] Mathieu, N., Bonnet, S., Harquel, S., Gentaz, E., and Cam-
pagne, A. (2013). Single-trial erp classification of emotional processing. In Neu-
ral Engineering (NER), 2013 6th International IEEE/EMBS Conference on, pages
101–104. IEEE.
[Mathot, 2018] Mathot, S. (2018). Pupillometry: Psychology, physiology, and func-
tion. Journal of Cognition, 1(1).
[Matsumoto et al., 2016] Matsumoto, A., Tange, Y., Nakazawa, A., and Nishida, T.
(2016). Estimation of task difficulty and habituation effect while visual manipulation
using pupillary response. In Video Analytics. Face and Facial Expression Recognition
and Audience Measurement, pages 24–35. Springer.
[Matsumoto, 1993] Matsumoto, D. (1993). Ethnic differences in affect intensity, emo-
tion judgments, display rule attitudes, and self-reported emotional expression in an
american sample. Motivation and emotion, 17(2):107–123.
[Matthews et al., 2019a] Matthews, O., Davies, A., Vigo, M., and Harper, S. (2019a).
Unobtrusive arousal detection on the web using pupillary response. International
Journal of Human-Computer Studies.
[Matthews et al., 2019b] Matthews, O., Eraslan, S., Yaneva, V., Davies, A., Yesilada,
Y., Vigo, M., and Harper, S. (2019b). Combining trending scan paths with arousal
to model visual behaviour on the web: A case study of neurotypical people vs
people with autism. In Proceedings of the 27th ACM Conference on User Modeling,
Adaptation and Personalization, pages 86–94. ACM.
[Matthews et al., 2018a] Matthews, O., Sarsenbayeva, Z., Jiang, W., Newn, J., Vel-
loso, E., Clinch, S., and Goncalves, J. (2018a). Inferring the mood of a community
from their walking speed: A preliminary study. In UbiComp/ISWC Adjunct, pages
1144–1149. ACM.
[Matthews et al., 2018b] Matthews, O., Vigo, M., and Harper, S. (2018b). Sensing
arousal and focal attention during visual interaction. In Proceedings of the 20th
ACM International Conference on Multimodal Interaction, ICMI ’18, pages 263–
267, New York, NY, USA. ACM.
[Matthews et al., 2018c] Matthews, O., Vigo, M., and Harper, S. (2018c). Sensing
arousal and focal attention during visual interaction. In ICMI, pages 263–267.
ACM.
[Matthews et al., 2018d] Matthews, O., Vigo, M., and Harper, S. (2018d). Towards
arousal sensing with high fidelity detection of visual focal attention. Measuring
behaviour.
[Mayes and Calhoun, 2007] Mayes, S. D. and Calhoun, S. L. (2007). Learning, at-
tention, writing, and processing speed in typical children and children with adhd,
autism, anxiety, depression, and oppositional-defiant disorder. Child Neuropsychol-
ogy, 13(6):469–493.
[McDuff et al., 2014] McDuff, D., Gontarek, S., and Picard, R. W. (2014). Improve-
ments in remote cardiopulmonary measurement using a five band digital camera.
Biomedical Engineering, IEEE Transactions on, 61(10):2593–2601.
[McGarrigle et al., 2017] McGarrigle, R., Dawes, P., Stewart, A. J., Kuchinsky, S. E.,
and Munro, K. J. (2017). Pupillometry reveals changes in physiological arousal
during a sustained listening task. Psychophysiology, 54(2):193–203.
[McKone, 1999] McKone, K. E. (1999). Analysis of student feedback improves in-
structor effectiveness. Journal of Management Education, 23(4):396–415.
[Mehrabian, 1996] Mehrabian, A. (1996). Pleasure-arousal-dominance: A general
framework for describing and measuring individual differences in temperament. Cur-
rent Psychology, 14(4):261–292.
[Mehrabian and Russell, 1974] Mehrabian, A. and Russell, J. A. (1974). An approach
to environmental psychology. the MIT Press.
[Merrill et al., 1992] Merrill, D. C., Reiser, B. J., Ranney, M., and Trafton, J. G.
(1992). Effective tutoring techniques: A comparison of human tutors and intelligent
tutoring systems. The Journal of the Learning Sciences, 2(3):277–305.
[Michailidou et al., 2008] Michailidou, E., Harper, S., and Bechhofer, S. (2008). Visual
complexity and aesthetic perception of web pages. In Proceedings of the 26th annual
ACM international conference on Design of communication, pages 215–224. ACM.
[Mion and Poli, 2008] Mion, L. and Poli, G. D. (2008). Score-independent audio fea-
tures for description of music expression. Audio, Speech, and Language Processing,
IEEE Transactions on, 16(2):458–466.
[Mirgain and Cordova, 2007] Mirgain, S. A. and Cordova, J. V. (2007). Emotion skills
and marital health: The association between observed and self-reported emotion
skills, intimacy, and marital satisfaction. Journal of Social and Clinical Psychology,
26(9):983.
[Mitra and Acharya, 2007] Mitra, S. and Acharya, T. (2007). Gesture recognition: A
survey. Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE
Transactions on, 37(3):311–324.
[Monrose and Rubin, 2000] Monrose, F. and Rubin, A. D. (2000). Keystroke dynamics
as a biometric for authentication. Future Generation computer systems, 16(4):351–
359.
[Morrison et al., 2005] Morrison, D., Wang, R., De Silva, L. C., and Xu, W. (2005).
Real-time spoken affect classification and its application in call-centres. In Infor-
mation Technology and Applications, 2005. ICITA 2005. Third International Con-
ference on, volume 1, pages 483–487. IEEE.
[Moses et al., 2007] Moses, Z. B., Luecken, L. J., and Eason, J. C. (2007). Measuring
task-related changes in heart rate variability. In Engineering in Medicine and Biology
Society, 2007. EMBS 2007. 29th Annual International Conference of the IEEE,
pages 644–647. IEEE.
[Mullins and Treu, 1991] Mullins, P. M. and Treu, S. (1991). Measurement of stress
to gauge user satisfaction with features of the computer interface. Behaviour &
Information Technology, 10(4):325–343.
[Murugappan and Murugappan, 2013] Murugappan, M. and Murugappan, S. (2013).
Human emotion recognition through short time electroencephalogram (eeg) signals
using fast fourier transform (fft). In Signal Processing and its Applications (CSPA),
2013 IEEE 9th International Colloquium on, pages 289–294. IEEE.
[Nakamura et al., 1993] Nakamura, Y., Yamamoto, Y., and Muraoka, I. (1993). Au-
tonomic control of heart rate during physical exercise and fractal dimension of heart
rate variability. Journal of Applied Physiology, 74(2):875–881.
[Nakarada-Kordic and Lobb, 2005] Nakarada-Kordic, I. and Lobb, B. (2005). Effect
of perceived attractiveness of web interface design on visual search of web sites. In
Proceedings of the 6th ACM SIGCHI New Zealand chapter’s international conference
on Computer-human interaction: making CHI natural, pages 25–27. ACM.
[Nesbitt et al., 2015] Nesbitt, K., Blackmore, K., Hookham, G., Kay-Lambkin, F.,
and Walla, P. (2015). Using the startle eye-blink to measure affect in players. In
Serious Games Analytics, pages 401–434. Springer.
[Neuper et al., 2003] Neuper, C., Muller, G., Kubler, A., Birbaumer, N., and
Pfurtscheller, G. (2003). Clinical application of an eeg-based brain–computer in-
terface: a case study in a patient with severe motor impairment. Clinical neuro-
physiology, 114(3):399–409.
[Nguyen et al., 2015] Nguyen, A.-T., Chen, W., and Rauterberg, M. (2015). Intelli-
gent presentation skills trainer analyses body movement. In Advances in Computa-
tional Intelligence, pages 320–332. Springer.
[Nichols and Maner, 2008] Nichols, A. L. and Maner, J. K. (2008). The good-subject
effect: Investigating participant demand characteristics. The Journal of general
psychology, 135(2):151–166.
[Norcross et al., 1984] Norcross, J. C., Guadagnoli, E., and Prochaska, J. O. (1984).
Factor structure of the profile of mood states (poms): two partial replications.
Journal of clinical psychology, 40(5):1270–1277.
[Oatley et al., 2006] Oatley, K., Keltner, D., and Jenkins, J. M. (2006). Understanding
emotions. Blackwell publishing.
[Oliveira et al., 2009] Oliveira, F. T., Aula, A., and Russell, D. M. (2009). Discrimi-
nating the relevance of web search results with measures of pupil size. In Proceedings
of the SIGCHI Conference on Human Factors in Computing Systems, pages 2209–
2212. ACM.
[Olson and Olson, 2003] Olson, G. M. and Olson, J. S. (2003). Human-computer in-
teraction: Psychological aspects of the human use of computing. Annual review of
psychology, 54(1):491–516.
[Omata et al., 2012] Omata, M., Moriwaki, K., Mao, X., Kanuka, D., and Imamiya,
A. (2012). Affective rendering: Visual effect animations for affecting user arousal.
In 2012 International Conference on Multimedia Computing and Systems, pages
737–742. IEEE.
[Ortony et al., 1990] Ortony, A., Clore, G. L., and Collins, A. (1990). The cognitive
structure of emotions. Cambridge university press.
[O'Brien and Toms, 2013] O'Brien, H. L. and Toms, E. G. (2013). Examining the gen-
eralizability of the user engagement scale (ues) in exploratory search. Information
Processing & Management, 49(5):1092–1107.
[Paas and Van Merrienboer, 1994] Paas, F. G. and Van Merrienboer, J. J. (1994).
Instructional control of cognitive load in the training of complex cognitive tasks.
Educational psychology review, 6(4):351–371.
[Pan et al., 2004] Pan, B., Hembrooke, H. A., Gay, G. K., Granka, L. A., Feusner,
M. K., and Newman, J. K. (2004). The determinants of web page viewing behav-
ior: an eye-tracking study. In Proceedings of the 2004 symposium on Eye tracking
research & applications, pages 147–154. ACM.
[Pantic et al., 2007] Pantic, M., Pentland, A., Nijholt, A., and Huang, T. S. (2007).
Human computing and machine understanding of human behavior: a survey. In
Artifical Intelligence for Human Computing, pages 47–71. Springer.
[Park and Kim, 2016] Park, S. M. and Kim, D. S. (2016). Human emotion decoding
using eye tracking and fmri. In Organization for Human Brain Mapping 2016.
[Parrott and Schulkin, 1993] Parrott, W. G. and Schulkin, J. (1993). Neuropsychology
and the cognitive nature of the emotions. Cognition & Emotion, 7(1):43–59.
[Partala et al., 2000] Partala, T., Jokiniemi, M., and Surakka, V. (2000). Pupillary
responses to emotionally provocative stimuli. In Proceedings of the 2000 symposium
on Eye tracking research & applications, pages 123–129. ACM.
[Partala and Surakka, 2003] Partala, T. and Surakka, V. (2003). Pupil size variation
as an indication of affective processing. International journal of human-computer
studies, 59(1):185–198.
[Partala and Surakka, 2004] Partala, T. and Surakka, V. (2004). The effects of af-
fective interventions in human–computer interaction. Interacting with computers,
16(2):295–309.
[Paulhus and Vazire, 2007] Paulhus, D. L. and Vazire, S. (2007). The self-report
method. Handbook of research methods in personality psychology, 1:224–239.
[Pengnate, 2016] Pengnate, S. F. (2016). Measuring emotional arousal in clickbait:
eye-tracking approach.
[Peter, 2010] Peter, P. C. (2010). Emotional intelligence. Wiley International Ency-
clopedia of Marketing.
[Petrantonakis and Hadjileontiadis, 2010] Petrantonakis, P. C. and Hadjileontiadis,
L. J. (2010). Emotion recognition from brain signals using hybrid adaptive fil-
tering and higher order crossings analysis. Affective Computing, IEEE Transactions
on, 1(2):81–97.
[Pfleging et al., 2016] Pfleging, B., Fekety, D. K., Schmidt, A., and Kun, A. L. (2016).
A model relating pupil diameter to mental workload and lighting conditions. In
Proceedings of the 2016 CHI conference on human factors in computing systems,
pages 5776–5788. ACM.
[Philip et al., 2010] Philip, R., Whalley, H., Stanfield, A., Sprengelmeyer, R., Santos,
I., Young, A., Atkinson, A., Calder, A., Johnstone, E., Lawrie, S., et al. (2010).
Deficits in facial, body movement and vocal emotional processing in autism spectrum
disorders. Psychological medicine, 40(11):1919–1929.
[Picard, 1997] Picard, R. W. (1997). Affective computing, volume 252. MIT press
Cambridge.
[Picard, 2003] Picard, R. W. (2003). Affective computing: challenges. International
Journal of Human-Computer Studies, 59(1-2):55–64.
[Picard, 2010] Picard, R. W. (2010). Affective computing: from laughter to ieee. IEEE
Transactions on Affective Computing, 1(1):11–17.
[Plutchik, 1980] Plutchik, R. (1980). A general psychoevolutionary theory of emotion.
Theories of emotion, 1:3–31.
[Plutchik, 2001] Plutchik, R. (2001). The nature of emotions human emotions have
deep evolutionary roots, a fact that may explain their complexity and provide tools
for clinical practice. American Scientist, 89(4):344–350.
[Prendinger and Ishizuka, 2005] Prendinger, H. and Ishizuka, M. (2005). The em-
pathic companion: A character-based interface that addresses users' affective states.
Applied Artificial Intelligence, 19(3-4):267–285.
[Psychlopedia, 2018] Psychlopedia (2018).
[Pusara and Brodley, 2004] Pusara, M. and Brodley, C. E. (2004). User re-
authentication via mouse movements. In Proceedings of the 2004 ACM workshop
on Visualization and data mining for computer security, pages 1–8. ACM.
[Qi et al., 2001] Qi, Y., Reynolds, C., and Picard, R. W. (2001). The bayes point
machine for computer-user frustration detection via pressuremouse. In Proceedings
of the 2001 workshop on Perceptive user interfaces, pages 1–5. ACM.
[Quazi et al., 2012] Quazi, M., Mukhopadhyay, S., Suryadevara, N., and Huang, Y.
(2012). Towards the smart sensors based human emotion recognition. In Instru-
mentation and Measurement Technology Conference (I2MTC), 2012 IEEE Interna-
tional, pages 2365–2370. IEEE.
[Ragot et al., 2017] Ragot, M., Martin, N., Em, S., Pallamin, N., and Diverrez, J.-M.
(2017). Emotion recognition using physiological signals: laboratory vs. wearable
sensors. In International Conference on Applied Human Factors and Ergonomics,
pages 15–22. Springer.
[Raiturkar et al., 2016] Raiturkar, P., Kleinsmith, A., Keil, A., Banerjee, A., and Jain,
E. (2016). Decoupling light reflex from pupillary dilation to measure emotional
arousal in videos. In Proceedings of the ACM Symposium on Applied Perception,
pages 89–96. ACM.
[Rani et al., 2005] Rani, P., Sarkar, N., and Liu, C. (2005). Maintaining optimal chal-
lenge in computer games through real-time physiological feedback. In Proceedings
of the 11th international conference on human computer interaction, volume 58.
[Reeshad Khan, 2017] Reeshad Khan, O. S. (2017). A literature review on emotion
recognition using various methods. Global Journal of Computer Science and Tech-
nology.
[Ren et al., 2013] Ren, P., Barreto, A., Gao, Y., and Adjouadi, M. (2013). Affective
assessment by digital processing of the pupil diameter. Affective Computing, IEEE
Transactions on, 4(1):2–14.
[Ricketts et al., 2013] Ricketts, J., Jones, C. R., Happe, F., and Charman, T. (2013).
Reading comprehension in autism spectrum disorders: The role of oral language and
social functioning. Journal of autism and developmental disorders, 43(4):807–816.
[Rizk et al., 2014] Rizk, Y., Safieddine, M., Matchoulian, D., and Awad, M. (2014).
Face2mus: a facial emotion based internet radio tuner application. In Mediterranean
Electrotechnical Conference (MELECON), 2014 17th IEEE, pages 257–261. IEEE.
[Rosa, 2015] Rosa, P. (2015). What do your eyes say? bridging eye movements to
consumer behavior. International Journal of Psychological Research, 8(2):90–103.
[Ross and Mirowsky, 1984] Ross, C. E. and Mirowsky, J. (1984). Socially-desirable
response and acquiescence in a cross-cultural survey of mental health. Journal of
Health and Social Behavior, pages 189–197.
[Russell, 1994] Russell, J. A. (1994). Is there universal recognition of emotion from
facial expression? a review of the cross-cultural studies. Psychological bulletin,
115(1):102.
[Russell and Mehrabian, 1977] Russell, J. A. and Mehrabian, A. (1977). Evidence for
a three-factor theory of emotions. Journal of research in Personality, 11(3):273–294.
[Russell and Pratt, 1980] Russell, J. A. and Pratt, G. (1980). A description of the
affective quality attributed to environments. Journal of personality and social psy-
chology, 38(2):311.
[San Agustin et al., 2010] San Agustin, J., Skovsgaard, H., Mollenbach, E., Barret,
M., Tall, M., Hansen, D. W., and Hansen, J. P. (2010). Evaluation of a low-cost
open-source gaze tracker. In Proceedings of the 2010 Symposium on Eye-Tracking
Research & Applications, pages 77–80. ACM.
[Sanchez et al., 2018] Sanchez, W., Martinez, A., Hernandez, Y., Estrada, H., and
Gonzalez-Mendoza, M. (2018). A predictive model for stress recognition in desk
jobs. Journal of Ambient Intelligence and Humanized Computing, pages 1–13.
[Sanghoon and Roberto, 2005] Sanghoon, A. L. B. D. W. and Roberto, P. E. S. (2005).
The impact of frustration-mitigating messages delivered by an interface agent. Ar-
tificial intelligence in education: supporting learning through intelligent and socially
informed technology, 125:73.
[Sarrafzadeh et al., 2006] Sarrafzadeh, A., Alexander, S., Dadgostar, F., Fan, C., and
Bigdeli, A. (2006). See me, teach me: Facial expression and gesture recognition for
intelligent tutoring systems. In Innovations in Information Technology, 2006, pages
1–5. IEEE.
[Sarrafzadeh et al., 2008] Sarrafzadeh, A., Alexander, S., Dadgostar, F., Fan, C., and
Bigdeli, A. (2008). How do you know that I don't understand? a look at the future
of intelligent tutoring systems. Computers in Human Behavior, 24(4):1342–1363.
[Savran et al., 2013] Savran, A., Gur, R., and Verma, R. (2013). Automatic detection
of emotion valence on faces using consumer depth cameras. In Proceedings of the
IEEE International Conference on Computer Vision Workshops, pages 75–82.
[Savva and Bianchi-Berthouze, 2011] Savva, N. and Bianchi-Berthouze, N. (2011).
Automatic recognition of affective body movement in a video game scenario. In
International Conference on Intelligent Technologies for interactive entertainment,
pages 149–159. Springer.
[Schlosberg, 1954] Schlosberg, H. (1954). Three dimensions of emotion. Psychological
review, 61(2):81.
[Scholtz, 2006] Scholtz, J. (2006). Metrics for evaluating human information interac-
tion systems. Interacting with Computers, 18(4):507–527.
[Schroder et al., 2005] Schroder, H., Berghaus, N., and Zimmermann, G. (2005). Das
Blickverhalten der Kunden als Grundlage fur die Warenplatzierung im Lebensmit-
teleinzelhandel. Der Markt, 44(1):31–43.
[Schuller et al., 2010] Schuller, B., Vlasenko, B., Eyben, F., Wollmer, M., Stuhlsatz,
A., Wendemuth, A., and Rigoll, G. (2010). Cross-corpus acoustic emotion recog-
nition: Variances and strategies. IEEE Transactions on Affective Computing,
1(2):119–131.
[Schwark, 2015] Schwark, J. D. (2015). Toward a taxonomy of affective computing.
International Journal of Human-Computer Interaction, 31(11):761–768.
[Setyati et al., 2012] Setyati, E., Suprapto, Y. K., and Purnomo, M. H. (2012). Fa-
cial emotional expressions recognition based on active shape model and radial basis
function network. In Computational Intelligence for Measurement Systems and Ap-
plications (CIMSA), 2012 IEEE International Conference on, pages 41–46. IEEE.
[Shao et al., 2015] Shao, Z., Roelofs, A., Martin, R. C., and Meyer, A. S. (2015).
Selective inhibition and naming performance in semantic blocking, picture-word
interference, and color–word stroop tasks. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 41(6):1806.
[Sharma et al., 2013] Sharma, N., Dhall, A., Gedeon, T., and Goecke, R. (2013). Mod-
eling stress using thermal facial patterns: A spatio-temporal approach. In Affective
Computing and Intelligent Interaction (ACII), 2013 Humaine Association Confer-
ence on, pages 387–392. IEEE.
[Shelley, 2007] Shelley, K. H. (2007). Photoplethysmography: beyond the calculation
of arterial oxygen saturation and heart rate. Anesthesia & Analgesia, 105(6):S31–
S36.
[Shi et al., 2007] Shi, Y., Ruiz, N., Taib, R., Choi, E., and Chen, F. (2007). Galvanic
skin response (gsr) as an index of cognitive load. In CHI’07 extended abstracts on
Human factors in computing systems, pages 2651–2656. ACM.
[Sidney et al., 2005] Sidney, K. D., Craig, S. D., Gholson, B., Franklin, S., Picard,
R., and Graesser, A. C. (2005). Integrating affect sensors in an intelligent tutoring
system. In Affective Interactions: The Computer in the Affective Loop Workshop
at, pages 7–13.
[Simola et al., 2015] Simola, J., Le Fevre, K., Torniainen, J., and Baccino, T. (2015).
Affective processing in natural scene viewing: Valence and arousal interactions in
eye-fixation-related potentials. NeuroImage, 106:21–33.
[Simon and Nath, 2004] Simon, R. W. and Nath, L. E. (2004). Gender and emotion in
the united states: Do men and women differ in self-reports of feelings and expressive
behavior? American journal of sociology, 109(5):1137–1176.
[Sioni and Chittaro, 2015] Sioni, R. and Chittaro, L. (2015). Stress detection using
physiological sensors. Computer, 48(10):26–33.
[Siraj et al., 2006] Siraj, F., Yusoff, N., and Kee, L. C. (2006). Emotion classification
using neural network. In Computing & Informatics, 2006. ICOCI’06. International
Conference on, pages 1–7. IEEE.
[Sirois and Brisson, 2014] Sirois, S. and Brisson, J. (2014). Pupillometry. Wiley In-
terdisciplinary Reviews: Cognitive Science, 5(6):679–692.
[Slanzi et al., 2017] Slanzi, G., Balazs, J. A., and Velasquez, J. D. (2017). Combining
eye tracking, pupil dilation and eeg analysis for predicting web users click intention.
Information Fusion, 35:51–57.
[Snowden et al., 2016] Snowden, R. J., O’Farrell, K. R., Burley, D., Erichsen, J. T.,
Newton, N. V., and Gray, N. S. (2016). The pupil’s response to affective pic-
tures: Role of image duration, habituation, and viewing mode. Psychophysiology,
53(8):1217–1223.
[Sobkowicz et al., 2012] Sobkowicz, P., Kaschesky, M., and Bouchard, G. (2012).
Opinion mining in social media: Modeling, simulating, and forecasting political
opinions in the web. Government Information Quarterly, 29(4):470–479.
[Soleymani et al., 2008] Soleymani, M., Chanel, G., Kierkels, J. J., and Pun, T.
(2008). Affective ranking of movie scenes using physiological signals and content
analysis. In Proceedings of the 2nd ACM workshop on Multimedia semantics, pages
32–39. ACM.
[Soleymani et al., 2012] Soleymani, M., Pantic, M., and Pun, T. (2012). Multimodal
emotion recognition in response to videos. IEEE transactions on affective computing,
3(2):211–223.
[Sommer et al., 2014] Sommer, N., Hirshfield, L., and Velipasalar, S. (2014). Our
emotions as seen through a webcam. In Foundations of Augmented Cognition. Ad-
vancing Human Performance and Decision-Making through Adaptive Systems, pages
78–89. Springer.
[Steephen et al., 2018] Steephen, J. E., Obbineni, S. C., Kummetha, S., and Bapi,
R. S. (2018). An affective adaptation model explaining the intensity-duration rela-
tionship of emotion. IEEE Transactions on Affective Computing.
[Steinhauer and Hakerem, 1992] Steinhauer, S. R. and Hakerem, G. (1992). The pupil-
lary response in cognitive psychophysiology and schizophrenia. Annals of the New
York Academy of Sciences, 658(1):182–204.
[Steunebrink et al., 2009] Steunebrink, B. R., Dastani, M., and Meyer, J.-J. C. (2009).
The occ model revisited. In Proc. of the 4th Workshop on Emotion and Computing.
[Stieger et al., 2017] Stieger, S., Lewetz, D., and Reips, U.-D. (2017). Can smart-
phones be used to bring computer-based tasks from the lab to the field? a mobile
experience-sampling method study about the pace of life. Behavior Research Meth-
ods.
[Storms and Spector, 1987] Storms, P. L. and Spector, P. E. (1987). Relationships
of organizational frustration with reported behavioural reactions: The moderating
effect of locus of control. Journal of occupational psychology, 60(3):227–234.
[Sun et al., 2010] Sun, F.-T., Kuo, C., Cheng, H.-T., Buthpitiya, S., Collins, P., and
Griss, M. (2010). Activity-aware mental stress detection using physiological sensors.
In International Conference on Mobile Computing, Applications, and Services, pages
282–301. Springer.
[Sweller, 1994] Sweller, J. (1994). Cognitive load theory, learning difficulty, and in-
structional design. Learning and instruction, 4(4):295–312.
[Szasz et al., 2011] Szasz, P. L., Szentagotai, A., and Hofmann, S. G. (2011). The
effect of emotion regulation strategies on anger. Behaviour research and therapy,
49(2):114–119.
[Tangnimitchok et al., 2018] Tangnimitchok, S., Nonnarit, O., Ratchatanantakit, N.,
Barreto, A., Ortega, F. R., Rishe, N. D., et al. (2018). A system for non-intrusive
affective assessment in the circumplex model from pupil diameter and facial ex-
pression monitoring. In International Conference on Human-Computer Interaction,
pages 465–477. Springer.
[Torres-Valencia et al., 2014] Torres-Valencia, C. A., Garcia-Arias, H. F., Lopez, M.
A. A., and Orozco-Gutierrez, A. A. (2014). Comparative analysis of physiological
signals and electroencephalogram (eeg) for multimodal emotion recognition using
generative models. In Image, Signal Processing and Artificial Vision (STSIVA),
2014 XIX Symposium on, pages 1–5. IEEE.
[Tottenham et al., 2009] Tottenham, N., Tanaka, J. W., Leon, A. C., McCarry, T.,
Nurse, M., Hare, T. A., Marcus, D. J., Westerlund, A., Casey, B., and Nelson, C.
(2009). The nimstim set of facial expressions: judgments from untrained research
participants. Psychiatry research, 168(3):242–249.
[Valstar and Pantic, 2012] Valstar, M. F. and Pantic, M. (2012). Fully automatic
recognition of the temporal phases of facial actions. Systems, Man, and Cybernetics,
Part B: Cybernetics, IEEE Transactions on, 42(1):28–43.
[van den Brink et al., 2016] van den Brink, R. L., Murphy, P. R., and Nieuwenhuis,
S. (2016). Pupil diameter tracks lapses of attention. PLoS One, 11(10):e0165274.
[van der Wel and van Steenbergen, 2018] van der Wel, P. and van Steenbergen, H.
(2018). Pupil dilation as an index of effort in cognitive control tasks: A review.
Psychonomic bulletin & review, 25(6):2005–2015.
[Van Gerven et al., 2004] Van Gerven, P. W., Paas, F., Van Merrienboer, J. J., and
Schmidt, H. G. (2004). Memory load and the cognitive pupillary response in aging.
Psychophysiology, 41(2):167–174.
[Van Kleef, 2009] Van Kleef, G. A. (2009). How emotions regulate social life: The
emotions as social information (easi) model. Current directions in psychological
science, 18(3):184–188.
[Van Schaik and Ling, 2008] Van Schaik, P. and Ling, J. (2008). Modelling user ex-
perience with web sites: Usability, hedonic value, beauty and goodness. Interacting
with Computers, 20(3):419–432.
[Vega et al., 2018] Vega, J., Couth, S., Poliakoff, E., Kotz, S., Sullivan, M., Jay, C.,
Vigo, M., and Harper, S. (2018). Back to analogue: Self-reporting for parkinson’s
disease. In Proceedings of the 2018 CHI Conference on Human Factors in Computing
Systems, page 74. ACM.
[Visuri et al., 2018] Visuri, A., Asare, K. O., Kuosmanen, E., Nishiyama, Y., Ferreira,
D., Sarsenbayeva, Z., Goncalves, J., van Berkel, N., Wadley, G., Kostakos, V., Clinch,
S., Matthews, O., Harper, S., Jenkins, A., Snow, S., and m. c. schraefel (2018).
Ubiquitous mobile sensing: Behaviour, mood, and environment. In UbiComp/ISWC
Adjunct, pages 1140–1143. ACM.
[Vizer et al., 2009] Vizer, L. M., Zhou, L., and Sears, A. (2009). Automated stress de-
tection using keystroke and linguistic features: An exploratory study. International
Journal of Human-Computer Studies, 67(10):870–886.
[Volkmar et al., 2014] Volkmar, F., Siegel, M., Woodbury-Smith, M., King, B., Mc-
Cracken, J., State, M., the American Academy of Child and Adolescent Psychiatry,
et al. (2014). Practice parameter for the as-
sessment and treatment of children and adolescents with autism spectrum disorder.
Journal of the American Academy of Child & Adolescent Psychiatry, 53(2):237–257.
[Wahn et al., 2016] Wahn, B., Ferris, D. P., Hairston, W. D., and Konig, P. (2016).
Pupil sizes scale with attentional load and task experience in a multiple object
tracking task. PloS one, 11(12):e0168087.
[Walker et al., 1990] Walker, H. K., Hall, W. D., and Hurst, J. W. (1990). Cranial
nerves iii, iv, and vi: The oculomotor, trochlear, and abducens nerves. Clinical
methods: the history, physical, and laboratory examinations.
[Wang et al., 2018] Wang, C.-A., Baird, T., Huang, J., Coutinho, J. D., Brien, D. C.,
and Munoz, D. P. (2018). Arousal effects on pupil size, heart rate, and skin con-
ductance in an emotional face task. Frontiers in neurology, 9.
[Wang et al., 2019] Wang, J., Fu, E. Y., Ngai, G., Leong, H. V., and Huang, M. X.
(2019). Detecting stress from mouse-gaze attraction. In Proceedings of the 34th
ACM/SIGAPP Symposium on Applied Computing, pages 692–700. ACM.
[Wang, 2011] Wang, J. T.-y. (2011). Pupil dilation and eye tracking. A handbook
of process tracing methods for decision research: A critical review and user's guide,
page 188.
[Wang et al., 2013] Wang, W., Li, Z., Wang, Y., and Chen, F. (2013). Indexing cogni-
tive workload based on pupillary response under luminance and emotional changes.
In Proceedings of the 2013 international conference on Intelligent user interfaces,
pages 247–256. ACM.
[Ward and Marsden, 2004] Ward, R. D. and Marsden, P. H. (2004). Affective comput-
ing: problems, reactions and intentions. Interacting with Computers, 16(4):707–713.
[Watson et al., 1988] Watson, D., Clark, L. A., and Tellegen, A. (1988). Development
and validation of brief measures of positive and negative affect: the panas scales.
Journal of personality and social psychology, 54(6):1063.
[Welford, 1973] Welford, A. T. (1973). Stress and performance. Ergonomics,
16(5):567–580.
[Wenger et al., 1961] Wenger, M. A., Clemens, T., Coleman, D., Cullen, T., and En-
gel, B. T. (1961). Autonomic response specificity. Psychosomatic medicine.
[Wilder, 1958] Wilder, J. (1958). Modern psychophysiology and the law of initial
value. American Journal of Psychotherapy.
[Wilder, 2014] Wilder, J. (2014). Stimulus and response: The law of initial value.
Elsevier.
[Wolf et al., 2018] Wolf, E., Martinez, M., Roitberg, A., Stiefelhagen, R., and Deml,
B. (2018). Estimating mental load in passive and active tasks from pupil and gaze
changes using bayesian surprise. In Proceedings of the Workshop on Modeling Cog-
nitive Processes from Multimodal Data, page 6. ACM.
[Wolpaw and McFarland, 1994] Wolpaw, J. R. and McFarland, D. J. (1994). Mul-
tichannel eeg-based brain-computer communication. Electroencephalography and
clinical Neurophysiology, 90(6):444–449.
[Wulfert et al., 2005] Wulfert, E., Roland, B. D., Hartley, J., Wang, N., and Franco,
C. (2005). Heart rate arousal and excitement in gambling: winners versus losers.
Psychology of Addictive Behaviors, 19(3):311.
[Xing et al., 2016] Xing, B., Zhang, L., Gao, J., Yu, R., and Lyu, R. (2016). Barrier-
free affective communication in mooc study by analyzing pupil diameter variation.
In SIGGRAPH ASIA 2016 Symposium on Education, page 7. ACM.
[Xu et al., 2015] Xu, C., Feng, Z., and Meng, Z. (2015). Affective experience modelling
based on interactive synergetic dependence in big data. Future Generation Computer
Systems.
[Xu et al., 2016] Xu, C., Feng, Z., and Meng, Z. (2016). Affective experience modeling
based on interactive synergetic dependence in big data. Future Generation Computer
Systems, 54:507–517.
[Xu, 2015] Xu, Q. (2015). Examining user engagement attributes in visual information
search. iConference 2015 Proceedings.
[Yaneva, 2016] Yaneva, V. (2016). Assessing text and web accessibility for people with
autism spectrum disorder. PhD thesis, University of Wolverhampton.
[Yaneva and Evans, 2015] Yaneva, V. and Evans, R. (2015). Six good predictors of
autistic text comprehension. In Proceedings of the International Conference Recent
Advances in Natural Language Processing, pages 697–706.
[Yaneva et al., 2016a] Yaneva, V., Evans, R., and Temnikova, I. (2016a). Predicting
reading difficulty for readers with autism spectrum disorder. In Proceedings of
Workshop on Improving Social Inclusion using NLP: Tools and Resources (ISI-NLP)
held in conjunction with LREC.
[Yaneva et al., 2018] Yaneva, V., Ha, L. A., Eraslan, S., Yesilada, Y., and Mitkov, R.
(2018). Detecting autism based on eye-tracking data from web searching tasks. In
Proceedings of the Internet of Accessible Things, page 16. ACM.
[Yaneva et al., 2017] Yaneva, V., Orasan, C., Evans, R., and Rohanian, O. (2017).
Combining multiple corpora for readability assessment for people with cognitive
disabilities. In Proceedings of the 12th Workshop on Innovative Use of NLP for
Building Educational Applications, pages 121–132.
[Yaneva et al., 2015] Yaneva, V., Temnikova, I., and Mitkov, R. (2015). Accessible
texts for autism: An eye-tracking study. In Proceedings of the 17th International
ACM SIGACCESS Conference on Computers & Accessibility, pages 49–57. ACM.
[Yaneva et al., 2016b] Yaneva, V., Temnikova, I. P., and Mitkov, R. (2016b). A corpus
of text data and gaze fixations from autistic and non-autistic adults. In LREC.
[Yaneva et al., 2016c] Yaneva, V., Temnikova, I. P., and Mitkov, R. (2016c). Evaluat-
ing the readability of text simplification output for readers with cognitive disabilities.
In LREC.
[Yang and Chen, 2012] Yang, Y.-H. and Chen, H. H. (2012). Machine recognition of
music emotion: A review. ACM Transactions on Intelligent Systems and Technology
(TIST), 3(3):40.
[Yazdani et al., 2012] Yazdani, A., Lee, J.-S., Vesin, J.-M., and Ebrahimi, T. (2012).
Affect recognition based on physiological changes during the watching of music
videos. ACM Transactions on Interactive Intelligent Systems, 2(1):1–26.
[Zaalberg et al., 2004] Zaalberg, R., Manstead, A., and Fischer, A. (2004). Relations
between emotions, display rules, social motives, and facial behaviour. Cognition and
Emotion, 18(2):183–207.
[Zeng et al., 2008] Zeng, Z., Pantic, M., Roisman, G. I., and Huang, T. S. (2008). A
survey of affect recognition methods: Audio, visual, and spontaneous expressions.
IEEE transactions on pattern analysis and machine intelligence, 31(1):39–58.
[Zhai and Barreto, 2006] Zhai, J. and Barreto, A. (2006). Stress detection in com-
puter users based on digital signal processing of noninvasive physiological variables.
In Engineering in Medicine and Biology Society, 2006. EMBS’06. 28th Annual In-
ternational Conference of the IEEE, pages 1355–1358. IEEE.
[Zhang et al., 2018] Zhang, H., Gashi, S., Kimm, H., Hanci, E., and Matthews, O.
(2018). Moodbook: An application for continuous monitoring of social media usage
and mood. In UbiComp/ISWC Adjunct, pages 1150–1155. ACM.
[Zhang et al., 2014] Zhang, H., Zhu, Y., Maniyeri, J., and Guan, C. (2014). Detection
of variations in cognitive workload using multi-modality physiological sensors and a
large margin unbiased regression machine. In Engineering in Medicine and Biology
Society (EMBC), 2014 36th Annual International Conference of the IEEE, pages
2985–2988. IEEE.
List of Acronyms
AFA Algorithm for sensing Arousal and Focal Attention. 93
BVP Blood Volume Pulse. 51, 188
ECG Electrocardiography. 34, 39, 73–75, 82–84
EEG Electroencephalogram. 18, 34, 39, 60, 189
EMG Electromyography. 18, 34, 40, 59
GSR Galvanic Skin Response. 18, 34, 39, 51, 59, 188
HR Heart Rate. 18, 34, 39, 59, 188
KD Keystroke Dynamics. 33
MD Mouse Dynamics. 33
NLP Natural Language Processing. 33
PANAS Positive Affect and Negative Affect scales. 35
POMS Profile of Mood States. 35
PPG Photoplethysmogram. 39, 41
SAM Self-Assessment Manikin. 35
ST Skin Temperature. 34, 51
STA Scanpath Trending Analysis. 27
VAS Visual analogue scale. 35
02/06/2017, Version 3.0
School of Computer Science
Emotion Sensing Using Pupil Dilation and Eye tracking
Participant Information Sheet
You are being invited to take part in a research study as part of a PhD study to use pupil dilation and
eye tracking to measure emotions in interactive systems. Before you decide, it is important for you to
understand why the research is being done and what it will involve. Please take time to read the
following information carefully and discuss it with others if you wish. Please ask if there is anything
that is not clear or if you would like more information. Take time to decide whether or not you wish to
take part. Thank you for taking the time to read this.
Who will conduct the research?
Oludamilare Matthews
What is the purpose of the research?
The aim of this research is to build an algorithm for measuring emotions. This experiment serves as a
means to evaluate the accuracy of our algorithm using already-rated stimuli.
Why have I been chosen?
We are inviting members of the public to take part in this study so that we can evaluate our algorithm.
What would I be asked to do if I took part?
You will be asked to look at a set of pictures for as long as you would normally do. An eye tracker,
located on the monitor, will capture your pupil size and gaze data. Afterwards, you will be required to
rate the images according to how you feel about them.
What happens to the data collected?
Electronic data will be stored securely on a computer. Written information will be stored in a locked
drawer. The data will be analyzed and the results will be used in preparation for my dissertation.
How is confidentiality maintained?
Data will be made anonymous. The personal data collected is for consent alone, and no one will be
able to match it with the data collected by the eye tracker. Furthermore, the consent form will be
kept in a secure filing cabinet, separate from the eye-tracker data, which will be stored on a secure
server within the University of Manchester.
What happens if I do not want to take part or if I change my mind?
It is up to you to decide whether or not to take part. If you do decide to take part, you will be given this
information sheet to keep and be asked to sign a consent form. If you decide not to take part, it is up
to you and there are no adverse consequences for you. If you decide to take part, you are still free
to withdraw at any point during the experiment without giving a reason and without detriment to
yourself. If at any point (before or during the study) you decide not to take part, your data will
not be stored and no record of your participation will be kept.
02/06/2017, Version 3.0
Emotion Sensing Using Pupil Dilation and Eye tracking
CONSENT FORM
If you are happy to participate please complete and sign the consent form below.
Participant No.:
Gender: Age: Profession: Highest level of qualification:
Please initial box
1. I confirm that I have read the attached information sheet on the above project and have had the opportunity to consider the information and ask questions and had these answered satisfactorily.
2. I understand that my participation in the study is voluntary and that I am free to withdraw at any time without giving a reason and without detriment to my treatment/service/self, and that my data will not be stored.
3. I understand that my personal data will remain confidential and untraceable to the electronic data recorded.
I agree to take part in the above project.
Name of participant    Date    Signature
Name of researcher    Date    Signature
This Project Has Been Approved by the University of Manchester’s Research Ethics Committee [2017-1906-3160].
Please rate the images you just viewed according to how much arousal (stress, anxiety, cognitive load,
fear, excitement) you felt.
Dog
1 - Very low   2 - Low   3 - Medium   4 - High   5 - Very high
Basket
1 - Very low   2 - Low   3 - Medium   4 - High   5 - Very high
Woman
1 - Very low   2 - Low   3 - Medium   4 - High   5 - Very high
Couple
1 - Very low   2 - Low   3 - Medium   4 - High   5 - Very high
Roller coaster
1 - Very low   2 - Low   3 - Medium   4 - High   5 - Very high
Woman carrying baby
1 - Very low   2 - Low   3 - Medium   4 - High   5 - Very high
Dirty foot
1 - Very low   2 - Low   3 - Medium   4 - High   5 - Very high
Fallen boxer
1 - Very low   2 - Low   3 - Medium   4 - High   5 - Very high
Boy
1 - Very low   2 - Low   3 - Medium   4 - High   5 - Very high
Lamp
1 - Very low   2 - Low   3 - Medium   4 - High   5 - Very high
Bear
1 - Very low   2 - Low   3 - Medium   4 - High   5 - Very high
Mother & baby
1 - Very low   2 - Low   3 - Medium   4 - High   5 - Very high
School of Computer Science
Effects of cognitive tasks on Pupil Dilation and Eye movement
Participant Information Sheet
You are being invited to take part in a research study. Before you decide, it is important for you to
understand why the research is being done and what it will involve. Please take time to read the
following information carefully and discuss it with others if you wish. Please ask if there is
anything that is not clear or if you would like more information. Take time to decide whether or not
you wish to take part. Thank you for reading this.
Who will conduct the research?
Oludamilare Matthews
Title of the research.
Effects of cognitive tasks on Pupil Dilation and Eye movement
Why have I been chosen?
We are inviting members of the public to take part in the study so that we can understand how
people react to cognitive activities.
What would I be asked to do if I take part?
You will be asked to view 4 pictures containing animals and coloured texts. Your task is to say
aloud what they are. In some cases, there will be textual cues to what the objects are. You are
still expected to say what the objects or the colours really are and NOT what the text says. You do
not have a time limit to complete the task, but you should do so as quickly as you can.
School of Computer Science
Effects of cognitive tasks on Pupil Dilation and Eye movement
CONSENT FORM
If you are happy to participate please complete and sign the consent form below
Please initial box
1. I confirm that I have read the attached information sheet on the above project and have had the opportunity to consider the information and ask questions and had these answered satisfactorily.
2. I understand that my participation in the study is voluntary and that I am free to withdraw at any time without giving a reason.
3. I understand that the session will be audio and video recorded and an eye-tracker will be used.
4. I agree to the use of anonymous quotes.
I agree to take part in the above project.
Name of participant Date Signature
Name of person taking consent Date Signature
24/04/2018, Version 1.0
School of Computer Science
Sensing frustration in end-user tasks from pupillary response
Participant Information Sheet
You are being invited to take part in a research study as part of a PhD study to use pupil dilation and
eye tracking to measure emotions in interactive systems. Before you decide, it is important for you to
understand why the research is being done and what it will involve. Please take time to read the
following information carefully and discuss it with others if you wish. Please ask if there is anything
that is not clear or if you would like more information. Take time to decide whether or not you wish to
take part. Thank you for taking the time to read this.
Who will conduct the research?
Oludamilare Matthews
What is the purpose of the research?
The aim of this research is to build an algorithm for measuring emotions. This experiment serves as a
means to evaluate the accuracy of our algorithm using real end-user tasks.
Why have I been chosen?
We are inviting members of the public to take part in this study so that we can evaluate our algorithm.
What would I be asked to do if I took part?
You will be asked to perform 4 tasks:
1. To book trips using the National Express web platform
2. To carry out searches on the Google search engine
3. To search for the biographies of some people on Wikipedia
4. To check the weather on the BBC website.
An eye tracker located on the monitor will capture your pupil size and gaze data. After these tasks,
you will be asked to rate the difficulty of your tasks and, optionally, to provide qualitative
feedback on your experience.
What happens to the data collected?
Electronic data will be stored securely on a computer. Written information will be stored in a locked
drawer. The data will be analyzed and the results will be used in preparation for my dissertation.
How is confidentiality maintained?
Data will be made anonymous. The personal data collected is for consent alone, and no one will be
able to match it with the data collected by the eye tracker. Furthermore, the consent form will be
kept in a secure filing cabinet, separate from the eye-tracker data, which will be stored on a secure
server within the University of Manchester.
What happens if I do not want to take part or if I change my mind?
It is up to you to decide whether or not to take part. If you do decide to take part, you will be given this
information sheet to keep and be asked to sign a consent form. If you decide not to take part, it is up
to you and there are no adverse consequences for you. If you decide to take part, you are still free
to withdraw at any point during the experiment without giving a reason and without detriment to
yourself.
24/04/2018, Version 1.0
Sensing frustration in end-user tasks from pupillary response
CONSENT FORM
If you are happy to participate please complete and sign the consent form below.
Participant No.:
Gender: Age: Profession: Highest level of qualification:
Please initial box
1. I confirm that I have read the attached information sheet on the above project and have had the opportunity to consider the information and ask questions and had these answered satisfactorily.
2. I understand that my participation in the study is voluntary and that I am free to withdraw at any time without giving a reason and without detriment to my treatment/service/self, and that my data will not be stored.
3. I understand that my personal data will remain confidential and untraceable to the electronic data recorded.
I agree to take part in the above project.
Name of participant    Date    Signature
Name of researcher    Date    Signature
This Project Has Been Approved by the University of Manchester’s Research Ethics Committee [2018-4365-5934].
How did these tasks make you feel?
Weather in Manchester
Normal Frustrated
Weather in London
Normal Frustrated
Alan Turing's thesis
Normal Frustrated
Stephen Hawking's thesis
Normal Frustrated
Time in Canberra (Australia)
Normal Frustrated
Time in Ottawa (Canada)
Normal Frustrated
Trip to Manchester
Normal Frustrated
Trip to London
Normal Frustrated