SENSING PHYSIOLOGICAL AROUSAL
AND VISUAL ATTENTION DURING
USER INTERACTION
A thesis submitted to the University of Manchester
for the degree of Doctor of Philosophy
in the Faculty of Science and Engineering
2019
Oludamilare Matthews
School of Computer Science
Contents

Abstract
Declaration
Copyright Statement
Acknowledgements

1 Introduction
1.1 Problem statement
1.2 Research questions
1.3 Methodology
1.3.1 Research Paradigm
1.3.2 Rationale for our research methodology
1.4 Contributions and research outputs
1.4.1 Artefacts
1.4.2 Publications
1.5 Research statement
1.6 Thesis structure

2 Background and related work
2.1 Affective computing
2.2 Progress in the lab, limited progress under practical settings
2.3 Affect detection
2.4 Selecting an affect detection mechanism
2.5 Review of affect detection mechanisms
2.5.1 Query Construction
2.5.2 Exclusion criteria
2.5.3 Inclusion Criteria
2.5.4 Quality assessment
2.5.5 Results
2.5.6 Synthesis of related work
2.6 Rationale for pupillary response
2.7 Representing affect
2.8 Applications of affective computing
2.9 Summary

3 Development of AFA algorithm
3.1 Pupillometry
3.2 Exploring pupillary response data
3.3 Description of pupil dilation
3.4 Existing approaches to the analysis of pupil data
3.5 Iterative development of AFA algorithm
3.5.1 Study 1
3.5.2 Study 2
3.6 Implementation
3.7 Visualising the output of AFA algorithm
3.8 Conclusion

4 Evaluating AFA algorithm
4.1 Rationale and motivation
4.2 Sensing emotionally evoked arousal
4.2.1 Experiment
4.2.2 Limitations from our analysis of emotional stimuli
4.2.3 Summary
4.3 Sensing cognition-induced arousal
4.3.1 Experiment
4.3.2 Result
4.3.3 Lessons learnt from the analysis of cognitive stimuli
4.3.4 Summary

5 Sensing frustration-induced arousal on the Web
5.1 Why frustration?
5.2 Related works on sensing frustration in interactive systems
5.3 Research contributions through this study
5.4 Experiment
5.4.1 Participants
5.4.2 Materials and procedure
5.4.3 Method
5.4.4 Analysis
5.5 Results
5.6 Discussion
5.6.1 Limitations of this study
5.7 Conclusion

6 Arousal detection and Scanpath analysis
6.1 Motivation behind our methodology
6.2 Pilot study
6.3 Formation of the methodology
6.3.1 Autism and the web
6.4 Research questions and contributions through this study
6.5 Experiment
6.5.1 Participants
6.5.2 Apparatus
6.5.3 Materials and Method
6.5.4 Analysis
6.5.5 Visualizing our visual behaviour model
6.6 Results
6.6.1 Analysis of the Web pages by their AOIs
6.7 Discussion
6.8 Limitations of the AFA algorithm-STA methodology
6.9 Conclusion

7 Discussion and Conclusion
7.1 Reflection
7.2 Design and methodological implications
7.3 Limitations
7.4 Future work
7.4.1 User evaluation of our visualisation toolkit
7.4.2 The impact of light on AFA algorithm
7.4.3 Combining AFA algorithm with other affect detection mechanisms
7.4.4 Optimizing AFA algorithm for real-time arousal sensing
7.4.5 Extending AFA algorithm for adaptive systems
7.4.6 Utilizing AFA algorithm on mobile devices
7.5 Conclusion

Bibliography

A Sensing Emotionally Evoked Arousal
A.1 Participant information sheet
A.2 Consent form
A.3 Post-study questionnaire

B Sensing Cognitively Induced Arousal
B.1 Participant information sheet
B.2 Consent form

C Sensing Frustration on The Web
C.1 Participant information sheet
C.2 Consent form
C.3 Post-study questionnaire

Word count xxxxx
List of Tables

2.1 Comparison of affect detection mechanisms
2.2 Affect detection mechanisms and their accuracies
2.3 Related work in theoretical findings, applications, methods or sensors used
3.1 Comparison of eye-tracking vendors
3.2 Statistical description of the pupil diameter
3.3 Matrix showing the Pearson's correlation of statistical features (L - left, R - right, W - window, std - standard deviation)
3.4 Matrix comparing our predictor variables (Experience, Accuracy, Time spent, Difficulty) and the total stress score with our outcome variable (No. of arousal points - peaks)
3.5 Expected arousal level (M) vs. computed arousal level (Output)
4.1 IAPS stimuli showing the description, arousal, dominance and valence values of each stimulus
4.2 Correlation between the mean IAPS arousal rating, self-reported rating and the algorithm's arousal level (scaled between 1 and 5)
4.3 Stimuli and expected arousal levels
5.1 Experimental tasks
5.2 Results of Wilcoxon test comparing arousal between each task, with Bonferroni correction α = 0.008
5.3 Results of Wilcoxon test comparing the mode of interaction within each task, with Bonferroni correction α = 0.0125
6.1 Cumulative arousal per AOI on the Apple home page
6.2 Results of Mann-Whitney U test comparing arousal between each group (autistic and neurotypical), with Bonferroni correction α = 0.00625
6.3 Scan path sequence (Seq), participants (n) with change in arousal level per AOI, mean arousal (M) and standard deviation (SD) for the control group and the autistic group (ASD). NB: gaps in the table exist where fewer elements make up the trending scanpath over a website for that group
List of Figures

1.1 The process flow of affective computing in an adaptive system
2.1 Psychological states, by duration [Bakhtiyari et al., 2014]
2.2 Number of results retrieved from Google Scholar, by year
2.3 EEG device
2.4 Facial expressions for discrete emotions
2.5 2D representation of emotion
2.6 Plutchik's emotion wheel and cone representation of Plutchik's emotions
3.1 Setup of eye tracker
3.2 Areas of interest overlaid on a 12-lead ECG
3.3 Distribution plot of the left pupil diameter (mm) of correct participants
3.4 Plot of pupillary response (mm) against time (ms)
3.5 Plot of pupillary response (mm) against time (ms) after applying a smoothing function with window size (d) = 1, 3, 5, 10
3.6 Heatmap illustrating our predictor variables (Experience, Accuracy, Time spent, Difficulty) and the total stress score with our outcome variable (No. of arousal points - peaks)
3.7 Areas of interest overlaid on Protege's UI
3.8 Top - input event, Bottom - probability of change point
3.9 Execution flow of our arousal detection approach
3.10 Graph of arousal level against time (s)
3.11 A comparison of the raw pupil dilation extracted from the eye tracker with the processed arousal signal, after converting to arousal levels
3.12 Arousal explorer tool
3.13 Arousal toolkit
3.14 Modes of visualisation in our arousal toolkit
4.1 Gaze behaviour across all 12 stimuli
4.2 Stimuli against the algorithm's arousal rating, participants' reported feedback, and the IAPS arousal ratings
4.3 Correlation between the accuracy of the algorithm and the minimum task duration per participant
4.4 Correlation between the accuracy of the algorithm and the maximum tasks allowed per participant
4.5 Stimuli against the algorithm's arousal rating, participants' reported feedback, and the IAPS arousal ratings
4.6 Stimuli for the Stroop effect
4.7 Heatmap showing the aggregated fixation on AOIs of each stimulus
4.8 Bar chart showing the mean total fixation count for congruent and incongruent object naming across all stimuli
4.9 Bar chart showing the mean total fixation duration (s) for congruent and incongruent object naming across all stimuli
4.10 Box plot showing the data distribution of the output of the algorithm for each stimulus
4.11 Violin plot showing the data distribution of the output of the algorithm for each stimulus
5.1 Disruption to tasks to elicit frustration: T1. Time-out experienced when booking a trip; T2. Mouse location altered when selecting weather information; T3. Operating system error during Google search; T4. Multiple pop-ups interrupting Wikipedia content lookup
5.2 Violin plot of the data distribution of the level of arousal in all tasks for both groups (disruptive and normal)
5.3 Bar chart with error bars (standard error of the mean) showing the tasks (both modes of interaction combined) vs level of arousal
5.4 Bar chart with error bars (standard error of the mean) showing all tasks vs level of arousal
5.5 Number of fixations vs level of arousal (all observations, n = 75)
6.1 Apple home page segmented into AOIs
6.3 Levels of arousal from the autistic and neurotypical groups per AOI for the WhatsApp Web page
6.4 Levels of arousal from the autistic and neurotypical groups per AOI for the Amazon Web page
6.5 Levels of arousal from the autistic and neurotypical groups per AOI for the WordPress Web page
6.6 Levels of arousal from the autistic and neurotypical groups per AOI for the Netflix Web page
6.7 Levels of arousal from the autistic and neurotypical groups per AOI for the BBC Web page
6.8 Levels of arousal from the autistic and neurotypical groups per AOI for the YouTube Web page
6.9 Levels of arousal from the autistic and neurotypical groups per AOI for the Adobe Web page
6.10 Levels of arousal from the autistic and neurotypical groups per AOI for the Outlook Web page
6.11 Levels of arousal from each group's trending scan path, overlaid on the AOIs of each Website
The University of Manchester
Oludamilare Matthews
Doctor of Philosophy
SENSING PHYSIOLOGICAL AROUSAL AND VISUAL ATTENTION
DURING USER INTERACTION
October 30, 2019
Arousal is a psychophysiological state that is characterised by increased attention and alertness. Arousal detection is paramount during user interaction because arousal influences perception, cognition and performance, all of which have significant impacts on user experience (UX). Self-reported means of measuring arousal are manual and prone to bias. Behavioural modes of sensing arousal, such as the analysis of voice prosody, keystroke dynamics and body gestures, yield inconsistent results when applied in different applications. Physiological sensors for detecting arousal, such as electroencephalograms and galvanic skin response, are sensitive to confounding factors like motion and temperature. Recent studies have leveraged multimodal arousal detection to improve detection accuracy. However, due to the cost of purchasing additional sensors, the skills needed to set them up and the availability of all the sensors, multimodal arousal detection has limited potential for widespread use. These modes of arousal detection also provide limited visual context about users' measure of arousal. We use eye trackers to collect pupillary response and gaze behaviour data. The analysis of pupillary response is used to sense changes in arousal, while gaze detection reveals the visual context, i.e., the user's focal attention during moments of increased arousal. To improve generalisability, our approach was developed and evaluated using multiple eye-tracking datasets containing known causes of arousal. Despite the limitation of our approach (i.e., sensitivity to light changes), results suggest that it can be used to sense several forms of arousal (cognitively induced, emotionally evoked and frustration-induced). Furthermore, our unimodal approach detects users' focal attention during moments of increased arousal. As web cameras with eye-tracking abilities become more accessible, there is increased potential for the widespread use of our technique in the wild. Unobtrusive arousal sensing opens up opportunities for UX researchers, UI designers and software developers in adaptive computing, affective gaming, intelligent tutoring systems, user modelling and recommender systems.
Declaration
No portion of the work referred to in the thesis has been
submitted in support of an application for another degree
or qualification of this or any other university or other
institute of learning.
Copyright Statement
i. The author of this thesis (including any appendices and/or schedules to this thesis) owns certain copyright or related rights in it (the "Copyright") and s/he has given The University of Manchester certain rights to use such Copyright, including for administrative purposes.

ii. Copies of this thesis, either in full or in extracts and whether in hard or electronic copy, may be made only in accordance with the Copyright, Designs and Patents Act 1988 (as amended) and regulations issued under it or, where appropriate, in accordance with licensing agreements which the University has from time to time. This page must form part of any such copies made.

iii. The ownership of certain Copyright, patents, designs, trade marks and other intellectual property (the "Intellectual Property") and any reproductions of copyright works in the thesis, for example graphs and tables ("Reproductions"), which may be described in this thesis, may not be owned by the author and may be owned by third parties. Such Intellectual Property and Reproductions cannot and must not be made available for use without the prior written permission of the owner(s) of the relevant Intellectual Property and/or Reproductions.

iv. Further information on the conditions under which disclosure, publication and commercialisation of this thesis, the Copyright and any Intellectual Property and/or Reproductions described in it may take place is available in the University IP Policy (see http://documents.manchester.ac.uk/DocuInfo.aspx?DocID=487), in any relevant Thesis restriction declarations deposited in the University Library, The University Library's regulations (see http://www.manchester.ac.uk/library/aboutus/regulations) and in The University's Policy on Presentation of Theses.
Acknowledgements
First and foremost, I would like to thank God for seeing me through this programme. I appreciate Dr Simon Harper, my supervisor, and Dr Markel Vigo, my co-supervisor, who gave me their time, shared their vast knowledge, believed in me, and guided me through this great endeavour. I am privileged to be led by both of you towards this great achievement.

I thank the members of the Interaction Analysis and Modelling (IAM) lab for listening to my presentations, providing constructive criticism, sharing ideas, and making the lab a conducive place to study. I would especially like to thank Alan Davies for offering brilliant ideas and providing technical support for my work. Thank you to Julio, Julia and Rob for being co-reviewers for my systematic reviews. I appreciate Aitor for always being resourceful and Manuelle for helping me out with statistics. I would like to thank Dr Sarah Clinch, Dr Jorge Gonçalves and Zhanna for facilitating my research visit to Melbourne, Australia, where I was able to collaborate with and gain exposure to fellow researchers in the domain. To my ever-loving wife, Bubu, thank you for supporting me and being understanding, especially at the times I had to work late at night; I am deeply indebted to you. I especially thank my family, Dr (Gen.) Olusegun Matthew, Mrs Gloria Matthew, Tomi and Luwa, for their encouragement and backing all through my study. I would like to thank the family of Engr. and Mrs Cole for their love, prayers, gifts and visits during the period of my study. I also thank the Puka-Chaps family, as Manchester would not have felt the same without you. I wish to thank the leadership and members of The Grateful Church (TGC) for their prayers, spiritual guidance and communal fellowship.

Finally, I would like to thank the National Information Technology Development Agency (NITDA) for funding my PhD programme.
Chapter 1
Introduction
The quality of user interaction is often measured using metrics such as error rates, task completion times, dwell time, fixations and saccades [Mullins and Treu, 1991]. These metrics do not fully account for the subjective experience of users, i.e., their emotions [Scholtz, 2006]. Affective computing is a domain in HCI and psychology concerned with detecting and understanding users' emotional states in order to improve the quality of their interaction [Picard, 1997]. One application of affective computing is adaptive systems [Dalvand and Kazemifard, 2012]. In adaptive systems, the content, structure or layout of a website is altered based on the user's affective response, to induce a more desirable emotional state [Sommer et al., 2014]. For adaptive computing to be effective, certain events need to initiate the adaptive engine [Sommer et al., 2014]. Consider, for instance, an adaptive system based on physiological sensors: when the arousal level of a user reaches a certain threshold while fixating on text with a small font, the font size could be magnified to ease the user's discomfort [Liu et al., 2014]. Adaptive systems in intelligent tutoring could be utilised in situations where difficult questions cause users to experience frustration-induced arousal [Merrill et al., 1992]. Here, an adaptive difficulty system could be deployed by setting triggers so that the adaptive engine fetches a less difficult question and the user does not drop out [Liu et al., 2009].

Figure 1.1: The process flow of affective computing in an adaptive system

Figure 1.1 illustrates the process flow of an affect-enabled adaptive system. The process begins with affect detection, where emotions are sensed; then, the context in which the user experiences the emotion is identified. Based on this context, an intervention is carried out to transform the user's emotional state into a more desirable one. Most adaptive systems are cyclic because emotions are sensed again, to evaluate the impact of the previous intervention [Dalvand and Kazemifard, 2012]. Since the process often begins and ends with affect detection, detecting affect accurately and in an ecologically valid manner is of great value. Affect detection could be used to prevent undesired outcomes such as fatal human errors, cognitive overload, disinterested users and extreme emotional states (e.g., boredom and frustration) [Liu and Joines, 2012]. The scope of our research is therefore limited to sensing arousal during user interaction.
Emotions can be represented in several forms. Albert Mehrabian proposed the Pleasure-Arousal-Dominance (PAD) psychological model of emotions. In this model, arousal represents the intensity of the emotion, pleasure refers to the hedonic (valence) nature of the emotion, and dominance indicates whether it is a dominant emotion like anger or a submissive one like fear [Mehrabian, 1996]. Arousal can be used as a proxy to sense frustration, stress, anxiety, alertness, attention and interest, all of which are important factors during user interaction [Russell and Pratt, 1980]. We focused our affect detection on the arousal component of emotions due to its impact on the quality of user experience [Van Schaik and Ling, 2008].
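To make the PAD representation concrete, the sketch below shows how an emotion could be held as a simple pleasure-arousal-dominance triple in Python. The coordinate values are illustrative assumptions chosen only to reflect the sign conventions described above (both emotions unpleasant and highly arousing, anger dominant, fear submissive); they are not Mehrabian's published values.

    from dataclasses import dataclass

    @dataclass
    class PAD:
        pleasure: float   # hedonic tone (valence): -1 (unpleasant) to +1 (pleasant)
        arousal: float    # intensity: -1 (calm) to +1 (excited)
        dominance: float  # -1 (submissive) to +1 (dominant)

    # Illustrative coordinates only: anger and fear share negative pleasure and
    # high arousal, but differ in dominance [Mehrabian, 1996].
    anger = PAD(pleasure=-0.5, arousal=0.6, dominance=0.3)
    fear = PAD(pleasure=-0.6, arousal=0.6, dominance=-0.4)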
Psychlopedia, a web-based encyclopaedia of psychology, defines physiological arousal as "neural, hormonal, visceral, and muscular changes that happen in the body when it is emotionally stimulated" [Psychlopedia, 2018]. Physiological arousal is a state of alertness accompanied by increased perception [Pengnate, 2016]. The sympathetic nervous system, responsible for fight or flight, is activated upon an increase in arousal [Bradley et al., 2017]. Therefore, any measure from physiological indicators such as sweat on the skin, respiration, blood circulation and pupil size can serve as a proxy for measuring arousal [Torres-Valencia et al., 2014]. However, there are many challenges in detecting arousal during human interaction.
1.1 Problem statement
Arousal detection has been researched extensively in the affective computing literature, and several studies report high detection accuracy [Calvo and D'Mello, 2010]. In spite of the numerous techniques with high reported accuracy, the challenge lies in integrating these solutions into real-world applications [Reeshad Khan, 2017]. An underlying issue is the choice of the affect detection mechanism; hence, our top-down approach. For example, the simplest approach is self-reported emotions. Self-reported mechanisms are carried out using diaries, questionnaires, surveys and the like. Even in digital self-reported approaches, people are still required to input their emotions [Vega et al., 2018]. This manual approach increases the likelihood of introducing bias, including acquiescence [Ross and Mirowsky, 1984], demand characteristics [Nichols and Maner, 2008], extreme responding [Hui and Triandis, 1989] and social desirability [Fisher, 1993]. There is also an increased likelihood that users drop out in longitudinal use, and limited potential for use in interactive systems [Stieger et al., 2017]. Despite these limitations, self-reports are still regarded as the gold standard for evaluating the accuracy of other affect detection mechanisms in controlled settings [Lang, 2005]. In all our experiments, we adhere to this practice of capturing self-reports [Broekens and Brinkman, 2013]. In addition, we cross-validated our self-reported measures against controlled tasks to assess and confirm the validity of our ground truth. The limitations of self-report mentioned above have led researchers to investigate automatic ways of detecting arousal.
Automatic arousal sensing can be classified into two categories: (1) behavioural/physical approaches and (2) physiological approaches. Both classes require sensors to capture the user's response, and an algorithm running on a computer to extract the affective signal. Behavioural/physical approaches utilise computer peripherals as sensors; examples include the audio interface (voice) [Erdem and Sert, 2014], mouse (mouse motion dynamics), keyboard (keystroke dynamics) [Kolakowska, 2013] and camera (gestures and facial expression) [Setyati et al., 2012]. The challenge is that computer peripherals are used in application-specific ways and may yield inconsistent responses when deployed in other applications, which limits their external validity. For example, not all applications require an audio, mouse or keyboard interface. Also, emotional reactions of low intensity may not be captured by facial expressions and body gestures. Physiological mechanisms make use of less ubiquitous devices such as electroencephalography (EEG) [Gao and Wang, 2015, Torres-Valencia et al., 2014], electromyography (EMG) [Soleymani et al., 2008], heart rate (HR) and galvanic skin response (GSR) sensors [Kosir and Strle, 2017]. Physiological sensors are less likely to become widely adopted than physical/behavioural sensors due to their purchase cost, the skills needed to set them up and their obtrusiveness [Gollan et al., 2016]. They are also sensitive to motion, light and temperature changes. Further, inconsistencies such as individual idiosyncrasies and within-person changes introduce noisy data [Petrantonakis and Hadjileontiadis, 2010]. To address some of these challenges (generalisability, noise, confounding factors), researchers have explored the use of multiple sensors (multimodal affect detection systems) [Zhang et al., 2014]. However, combining multiple sensors increases setup complexity, obtrusiveness and purchase cost [Lu et al., 2015]. As stated earlier, arousal detection has yielded high accuracy in the literature; however, accuracy may be of limited value if there is limited potential for naturalistic use. Therefore, we break down the challenges that have limited the application of affect recognition in naturalistic settings, in the following order:
(PS1) Potential for ubiquitous use
We qualify this requirement with the word 'potential' as a caveat to include arousal detection mechanisms that have more accessible alternatives (low-cost sensors that are ubiquitous for end-users). This caveat includes sensors that can be used in the wild but may not be as accurate as their laboratory-grade counterparts.

(PS2) Generalisability of the solution
Arousal sensing devices should yield consistent results in different interaction contexts (application, stimuli and users).

(PS3) Accuracy of detection
Accuracy will be measured against established ground truths such as self-reports, domain experts' evaluations of interaction, and other proxies of arousal (e.g. cognitive load and task difficulty ratings).

We propose tackling the problem in this order so that, first and foremost, any proposed solution has the potential for ecological validity. Accuracy can be improved iteratively through data-driven approaches, whereas ubiquity has a broader scope: it includes sensor availability, skills to set up and perceived comfort to the user, which we have less control over. Similarly, the generalisability of our solution can be improved through computational techniques, and further evaluations can be done on other stimuli types.
Our research aims to address the research questions stated in the next section.
1.2 Research questions
We aim to develop an approach to sense arousal and identify the visual attention of users during moments of change in arousal, in an ecologically valid way, using a mechanism that has the potential for widespread ubiquitous use. Our research is therefore guided by three broad questions:

RQ1. What method(s) can be used to sense physiological arousal during user interaction, in an ecologically valid way?
Despite the high arousal detection accuracy reported in the literature, most approaches have limited potential for widespread, ubiquitous use [Calandra et al., 2016]. Current limitations include the skills required to set up the sensors, obtrusiveness, and the cost and availability of sensors. We performed a literature review to extract the existing methodologies for affect detection and to select the mechanism that best satisfied these criteria for ecological validity.
RQ2. Given the constraints in RQ1, how much accuracy in arousal detection can be achieved for different causes/forms of arousal?
There are different reasons for increased arousal during user interaction [Raiturkar et al., 2016]. This research question can therefore be broken down by generalisability across various visual stimuli, which we categorise into emotional and cognitive stimuli. We also examine low-intensity arousal; in particular, low-intensity arousal can be triggered when users become frustrated on the Web [Lazar et al., 2006a]. To ensure that our method is generalisable, we evaluated it using emotionally evoked, cognitively induced, and frustration-induced arousal.
RQ3. Can the method selected in RQ1 be used to determine the visual context (i.e. the user's focal attention and visual scan sequence) during moments of increased arousal?
While RQ1 and RQ2 address validity and accuracy, RQ3 evaluates the extensibility of AFA algorithm. In other words, if we identify moments of increased arousal using the method, how much can we learn about the context of the user's interaction, so that adaptations/interventions are possible? We combined our method with an existing algorithm to model users' visual behaviour, not only in terms of their affective response but also according to their aggregated scan paths. For affect detection to have the desired impact, the context of why users feel the way they feel is important information [Michailidou et al., 2008]. Therefore, for adaptive computing and recommender systems, our approach is capable of sensing the user's focal attention when they experienced their affective state. Such added context can be fed into third-party applications in the wild so that smarter (better informed) interventions can be carried out to improve the quality of user interaction [Matthews et al., 2019b]. Some related work has been done in mapping the user's attention to their measure of arousal; for example, Wang et al. made use of mouse motion to detect users' attention [Wang et al., 2019]. Our work complements Wang et al.'s but makes use of users' visual attention, as the sketch after this question illustrates.
1.3 Methodology
This section aims to report our research methodology by considering the application
category, objectives and the insight our research adds to computer science.
1.3.1 Research Paradigm
A current challenge in the field of affective computing is in applying affect detection
in naturalistic settings [Sanchez et al., 2018]. Our research is aimed at progressing the
domain further by developing an algorithm for affect detection that has the potential
for ubiquitous widespread use. According to Baban et al.’s philosophy of research
methodology, our research falls into the category of applied research [Baban et al.,
2009].
Mixed-method approaches have become popular in computer science, especially as the discipline is often a blend of mathematics and engineering [Demeyer, 2011]. Our research has multiple objectives at different phases, including establishing a mathematical model and developing an algorithm to sense arousal based on this model. Therefore, we take a mixed-methods approach [Johnson et al., 2007]. Throughout our research, we made use of descriptive, exploratory, correlational, data collection, explanatory and analytical research methods. We started by identifying and explaining our research problems through our literature review. Next, we took an iterative analytical approach to solve the problem, refining our solution over each iteration. Our analytical approach made use of an iterative data-driven process in which we collected and examined eye-tracking data to establish correlations between pupil dilation and arousal, and between gaze data and visual attention. In a bid to make our approach generalisable, we took an explanatory approach at each stage of our research, explaining the reasons behind our findings to improve our model in subsequent iterations. We also evaluated our model using correlations and explained our results at every phase of evaluation.
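As an example of this style of evaluation, the snippet below correlates per-stimulus output of an algorithm with self-reported arousal ratings, mirroring the comparisons reported in Chapter 4. It is a sketch with placeholder numbers, not our experimental data.

    from scipy.stats import spearmanr

    # Placeholder values: per-stimulus arousal from the algorithm (scaled 1-5)
    # and the corresponding self-reported ratings used as ground truth.
    algorithm_output = [2.1, 3.4, 4.2, 1.8, 3.9]
    self_report = [2.0, 3.0, 4.5, 2.2, 4.0]

    rho, p = spearmanr(algorithm_output, self_report)
    print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")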
In addressing our objectives, we make generalisable deductions. According to Denicolo
et al., our research follows the positivism/post-positivism research paradigm [Denicolo
and Becker, 2012]. In the next subsection, we revisit our problem statements to justify
our choice of methodology.
1.3.2 Rationale for our research methodology
We tackled the challenge of ecological validity in PS1 through the selection of our detection mechanism. Our considerations included the potential for use in interactive systems, obtrusiveness, the skills required to set up, generalisability, the potential for capturing the user's interaction context and the availability of the sensor [Ragot et al., 2017]. Consequently, unimodality was a desirable factor in our selection process. We are aware that technology evolves rapidly; therefore, our emphasis was on the 'potential' for widespread use, rather than current prevalence. For example, web cameras with eye-tracking capabilities exist but are not yet accessible due to their cost. We carried out a review to select an affect detection mechanism that fulfilled those criteria. To improve generalisability (PS2), we developed our analysis technique iteratively, using datasets from different stimuli. For accuracy (PS3), our data-driven methodology accounts for confounding factors like idiosyncratic differences and response lag, while retaining generalisability by re-evaluating our solution over different stimuli types.
Following our systematic selection of an arousal detection mechanism, we decided on the use of pupillary response as a means to sense arousal during user interaction. In visual research, eye trackers can capture the size of the pupil upwards of 50 times per second (50 Hz) [Simola et al., 2015]. Web cameras with the ability to track gaze behaviour could become more accessible in future; hence, the potential for widespread use. Eye trackers also detect fixations, which are prolonged visual gazes on a single location. Contrary to self-reported emotions, pupil dilation is not easily prone to bias because it is a measure of autonomic activity, a non-deliberate response that cannot easily be faked [Bradley et al., 2017]. It is unobtrusive and adds further context to affect detection because the same device that captures pupil dilation can capture the user's focal attention [Tangnimitchok et al., 2018]. Capturing gaze behaviour makes AFA algorithm unique: if we determine the user's visual attention during moments of change in arousal, the designer has a more informed idea of how to improve user interaction. For the analysis, we developed the algorithm to sense arousal by modelling the user's baseline over fixed, non-overlapping windows. Next, we use peak detection to sense increases in the arousal level. We then identify the area of the screen with the most fixation during the moments of increased arousal, while also accounting for the time lag in pupillary response. The fixation duration and the magnitude of change in pupil dilation are used to compute the measure of arousal that the user has experienced [Simola et al., 2015]. To ensure generalisability, we evaluated the algorithm on its ability to sense arousal due to cognitive load, emotional stimuli and frustration during web interaction. We understand that arousal can mean different things depending on the context [Psychlopedia, 2018]. For interaction designers and UX researchers, it is important to consider the user's focal attention, the time, and the measure of arousal experienced [Iqbal et al., 2004, Partala and Surakka, 2004]. Therefore, we designed a visualisation to facilitate hypothesis generation. The results of our analyses of different stimuli using this algorithm show promise. Lab-based eye-tracking studies can input their data into the algorithm and visualise the arousal and focal attention of participants, which complements existing methods in usability studies. With further work, more evaluation, and the advent of high-fidelity web cameras, we anticipate that this algorithm can be used to sense arousal in real time and in naturalistic settings.
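To give a rough sense of this pipeline, the sketch below combines non-overlapping windowing, a baseline statistic and peak detection over a pupil diameter trace. It is a minimal Python illustration, not the AFA implementation: the 50 Hz sampling rate matches the eye trackers discussed above, but the window length, the median baseline, the one-standard-deviation threshold and the 300 ms lag correction are all assumptions.

    import numpy as np
    from scipy.signal import find_peaks

    def arousal_peaks(pupil_mm, rate_hz=50, window_s=1.0, lag_ms=300):
        """Aggregate pupil diameter into fixed, non-overlapping windows, then
        flag windows whose mean rises well above the baseline."""
        pupil_mm = np.asarray(pupil_mm, dtype=float)
        win = int(rate_hz * window_s)                 # samples per window
        n_windows = len(pupil_mm) // win
        means = np.array([pupil_mm[i * win:(i + 1) * win].mean()
                          for i in range(n_windows)])
        baseline = np.median(means)                   # assumed baseline statistic
        peaks, props = find_peaks(means, height=baseline + means.std())
        onsets_ms = peaks * win * 1000 / rate_hz - lag_ms  # lag-corrected onsets
        return onsets_ms, props["peak_heights"] - baseline  # time and magnitude

The returned magnitude echoes the use of the change in pupil dilation as the measure of arousal, while the lag correction reflects the delay between a stimulus and the pupillary response.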
1.4 Contributions and research outputs
Our contributions are to the field of computer science, specifically human-computer
interaction. They are as follows:
1. A mechanism to sense arousal with the potential for unobtrusive widespread use.
Addressing RQ1 in Chapters 2 and 3 led to the selection and implementation of an arousal detection mechanism that has the potential for unobtrusive widespread use. Several studies review existing mechanisms for stress detection [Greene et al., 2016, Sioni and Chittaro, 2015], cognitive load [van der Wel and van Steenbergen, 2018, Einhauser, 2017] or the use of physiological sensors in general [Brouwer et al., 2015]. We focus on arousal sensing and the potential for widespread ubiquitous use.

2. The generalisability of our approach to different visual stimuli.
Addressing RQ2 in Chapters 4 and 5 shows evidence of the generalisability of our approach. Generalisability of affect detection is a known problem in affective computing [Schuller et al., 2010]. Results show that our approach can be used to sense cognitively induced arousal [Matthews et al., 2018d], emotionally evoked arousal [Matthews et al., 2018b] and frustration on the Web (Chapter 5).

3. Sensing arousal with the context of the user's focal attention.
Mapping arousal to the user's focal attention in Chapters 5 and 6 demonstrates that our algorithm detects the user's focal attention during moments of increased arousal.

4. An extensible algorithm to create a richer understanding of visual and affective behaviour on the Web.
We demonstrated in Chapter 6 that our algorithm can be combined with other algorithms to derive a broader understanding of users' affective behaviour and visual scan patterns on the Web. To the best of our knowledge, this is the first work that combines visual scan paths with arousal to model the behaviour of a group of users on the Web [Matthews et al., 2019b].
1.4.1 Artefacts
In the course of our research, we have produced the following data and software artefacts:
1. Working software that analyses pupil data from an eye tracker to generate arousal points and users' visual attention during moments of increased arousal.
2. A tool to visualise the output of AFA algorithm, for hypothesis formulation in usability and UX research.
3. Eye-tracking datasets (105 participants) from three experiments with 24 stimuli that can be used to improve the methods and algorithms in AFA algorithm. The datasets for the three studies can also be used for further exploration in secondary analyses.
1.4.2 Publications
Seven publications (five first-author).
1. Combining Trending Scan Paths with Arousal to Model Visual Be-
haviour on the Web: A Case Study of Neurotypical People vs People
with Autism
Oludamilare Matthews, Sukru Eraslan, Victoria Yaneva, Alan Davies, Yeliz
Yesilada, Markel Vigo, Simon Harper
UMAP 2019, Cyprus [Matthews et al., 2019b]
2. Unobtrusive Arousal Detection on The Web Using Pupillary Response
Oludamilare Matthews, Alan Davies, Markel Vigo, Simon Harper
IJHCS 2019 [Matthews et al., 2019a]
3. Sensing Arousal and Focal Attention During Visual Interaction
Oludamilare Matthews, Markel Vigo, Simon Harper
ICMI 2018, USA [Matthews et al., 2018c]
4. Towards Arousal Sensing With High Fidelity Detection of Visual Focal
Attention
Oludamilare Matthews, Markel Vigo, Simon Harper
Measuring behaviour 2018, United Kingdom [Matthews et al., 2018d]
5. Inferring the Mood of a Community From Their Walking Speed: A
Preliminary Study
Oludamilare Matthews, Zhanna Sarsenbayeva, Weiwei Jiang, Joshua Newn, Eduardo Velloso, Sarah Clinch, Jorge Gonçalves
Ubicomp 2018, Singapore [Matthews et al., 2018a]
6. Moodbook: An Application for Continuous Monitoring of Social Me-
dia Usage and Mood
Heng Zhang, Shkurta Gashi, Hanke Kimm, Elin Hanci, Oludamilare Matthews
Ubicomp 2018, Singapore [Zhang et al., 2018]
7. Ubiquitous Mobile Sensing: Behaviour, Mood, and Environment
Aku Visuri, Kennedy Opoku Asare, Elina Kuosmanen, Yuuki Nishiyama, Denzil Ferreira, Zhanna Sarsenbayeva, Jorge Gonçalves, Niels van Berkel, Greg Wadley, Vassilis Kostakos, Sarah Clinch, Oludamilare Matthews, Simon Harper, Amy Jenkins, Stephen Snow, m. c. schraefel
Ubicomp 2018, Singapore [Visuri et al., 2018]
1.5 Research statement
Our motivation stems from a lack of ecologically valid approaches to sensing arousal during user interaction [Ragot et al., 2017]. Our mission is to develop an algorithm to sense arousal using a mechanism that has the potential for widespread ubiquitous use. Our vision is to utilise our algorithm to drive adaptive systems in making interventions to the context, layout and contents of user interaction, based on users' levels of arousal and their focal attention during moments of undesirable states of arousal. We have developed the AFA (Arousal and Focal Attention) algorithm and evaluated its accuracy. Our results show that our algorithm can be used to extract the arousal levels and focal attention of users from eye-tracking datasets. With future web cameras that have the capabilities of eye trackers, our vision can be accomplished with some optimisations to our algorithm to improve its accuracy and allow it to function in naturalistic settings.
1.6 Thesis structure
In Chapter 2, we start with an overview of the research domain, affective computing, appraising its progress in the lab vs in naturalistic settings. We delve deeper into our scope, affect detection, highlighting the challenges that have limited the potential of sensing affect. Next, we present our criteria for selecting our preferred affect detection mechanism, and the rationale for our choice: pupillary response. Then, we present the results of our literature review. Finally, we discuss the various ways that affect can be represented and the applications of affective computing.

In Chapter 3, we discuss our research methods in detail. We describe pupillometry devices and the nature of pupillary response and pupillometry sensor data. Next, we discuss existing approaches to analysing pupil data. We then present our approach and its iterative development using datasets of different sources and complexity. We present the implementation of our algorithm, which we named AFA algorithm (Arousal and Focal Attention Algorithm). Finally, we discuss how we designed and developed a visualisation toolkit for AFA algorithm.

In Chapter 4, we evaluate AFA algorithm on its ability to sense arousal from static stimuli (e.g., pictures), covering arousal evoked by emotive content as well as cognitively induced forms of arousal. Our results show that AFA algorithm senses arousal with a moderate to strong level of correlation to our ground truths.

In Chapter 5, we evaluate AFA algorithm on sensing arousal on the Web. We discuss how Web interaction data provide a more difficult challenge than static images. Further, we make use of ecologically valid stimuli, as we injected common causes of frustration on the Web. Our results show that AFA algorithm discriminates between normal and frustrating tasks with a strong effect size.

In Chapter 6, we demonstrate that AFA algorithm can be combined with existing methodologies to give a richer understanding of users' behaviour. We combine AFA algorithm with the Scanpath Trend Analysis (STA) algorithm to create a new methodology. We use the case of people with autism vs neurotypical people to demonstrate the novelty of this methodology, as the combination of AFA algorithm and STA algorithm was used to model the affective state and visual scan behaviour of these two groups on the Web.

In Chapter 7, we revisit our research questions and highlight how our aims and objectives led us in addressing them. We also present the limitations of our approach. Further, we propose future work and potential research pathways. Finally, we present concluding remarks.
Chapter 2
Background and related work
This chapter starts by providing an overview of affective computing and clarifying terminologies frequently used or misused in the domain. Section 2.2 then proceeds to an appraisal of progress made in affective computing, generally characterised by many laboratory studies but few ecologically valid methodologies. Following that, affect detection mechanisms are categorised, and 12 of the most common mechanisms are discussed in section 2.3, with the aim of highlighting their applications, pros and cons. Critical evaluation of the existing affect detection mechanisms then leads us to propose pupillary response as our preferred choice for affect detection in section 2.6, after the results of our literature review of affect detection mechanisms are presented in section 2.5. Next, we discuss related work. In section 2.7, theories and methods of representing affect are discussed, highlighting application areas and challenges associated with each one. Subsequently, domain areas in which affective computing has been studied are reviewed, with example applications, in section 2.8. Finally, section 2.9 concludes this chapter.
2.1 Affective computing

Although affective computing is an interdisciplinary field encompassing computer science, neuroscience, psychology and physiology, philosophical questions about emotions (e.g. their definition) were asked long before the inception of the affective computing domain [Ekman, 1992a, Ekman, 2004]. Affective computing revolves around affect detection, the induction of affective states and the expression of emotions by machines [Picard, 1997].
Affect, emotions and feelings have been used interchangeably in the literature, yet they carry important differences that are often misapplied and sometimes misunderstood by neophytes and even experts [Cromby, 2012]. An affective state consists of a set of psychophysiological patterns/states during a defined period [Picard, 2003]. If the subject cognitively recognises an affective state as a unique psychological state, then it can be reported as a feeling [Cromby, 2012]. Hence, we often see people fill logs in diaries saying, 'I felt happy after seeing my test result', or during product reviews say, 'I was disappointed after trying it out' or 'I feel scared' [Feinstein et al., 2011]. If this feeling is made observable through behaviours, gestures, facial expressions, voice prosody or attitudes, then we say that an emotion is expressed [Izard et al., 1987]. An affect may or may not be a familiar feeling, and as human beings become more emotionally intelligent, we know when to express, suppress, fake or exaggerate an emotion [Ashkanasy and Daus, 2002]. As a result, the distinction between feelings and emotions can be understood through machines: they do not have the former but can be made to express the latter [Ekman et al., 1987, Davidson, 2003]. To summarise these differences: everybody experiences affect; not all affective states can be recognised as feelings (notably in infants); and finally, emotions are social expressions of our psychological state (although they can be faked) [Harris et al., 2000].

Other related terms include expressions, autonomic changes, attitudes, self-reported/full-blown emotions, moods, emotional disorders and traits. These terms were differentiated in a study by Bakhtiyari et al. [Bakhtiyari et al., 2014] using their temporal dimension, i.e. how long the psychological states last; Figure 2.1 illustrates this. The self-reported (full-blown) emotion is the psychological state of a person that remains dominant for between a few minutes and a few hours [Matsumoto, 1993]. In practice, it is not easy to isolate the affective state of an individual from other psychological phenomena. A wrong approach to detecting affect is to assume that one affect detection model would fit everybody's affective state [Cowie et al., 2001]. This approach is wrong because individual personality traits, psychological disorders and moods could obscure the affective state of a person [Fragopanagos and Taylor, 2005]. Also, short-lived variations in personal attitude, autonomic changes and expressions could intensify an affective state. For example, a person may be an introvert by nature, going through a psychological disorder and experiencing a negative mood, yet the same individual may express a brief moment of joy while watching a short video clip on the Web. Characterising this state depends on the window of time being examined and the person's baseline, amongst other factors [Oatley et al., 2006]. Hence, when detecting affect, these factors should be taken into consideration to avoid inferring other psychological phenomena that are not purely affective states.
Figure 2.1: Psychological states, by duration [Bakhtiyari et al., 2014]

Some progress has been made in affective computing, and several applications have leveraged the detection of the user's affective state. These include Affective Tutoring Systems (ATS), affect-enhanced gaming, recommender/helper systems and, in computer science, user interfaces. The general concept in these applications is to factor the user's emotional/affective state into the way machines interact with users, because human beings are used to interactions that feature empathy and emotional intelligence in human-human interaction [Picard, 2010]. Other ways affective computing is being applied include the use of computer agents (robots, emoticons/avatars) to induce emotions [Burleson and Picard, 2004], for instance by enabling them to make decisions or express themselves in the ways they would have done if they had feelings, just like human beings [Ahn and Picard, 2005]. The lack of emotional intelligence in machines has somewhat limited the potential of human-computer interaction, and by a wide margin, detecting the human affective state has been the most difficult challenge in empowering machines with emotional intelligence [Picard, 1997]. The next section shows that the study of affective computing is growing steadily; however, how has this translated into real-life applications?
2.2 Progress in the lab, limited progress under prac-
tical settings
A lot of progress has been made in the study of affective computing. Figure 2.2 shows,
for each of the last fifteen years, the number of results that the phrase 'Affective
Computing' returns from the Google Scholar search engine; the year-on-year increase
averages M = 1313.57 (SD = 735.52). It is important to note that these are not results
accumulated up to the given year, but the number of results in that year alone.

Figure 2.2: Number of results retrieved from Google Scholar, by year.

Despite the increase in the volume of literature seen in figure 2.2, a corresponding
increase in end-user applications of affective computing has not manifested, because
affect-aware systems are still rare [Epp et al., 2011]. Many of the studies have focused
on inducing participants with an affect stimulus and then confirming that certain affect
detection mechanisms can find a correlation [Siraj et al., 2006], or classify or detect a
predetermined affective response to the induced stimuli [Abbasi et al., 2010]. Only a
few of these studies use methodologies that are reusable in real-life applications [Epp
et al., 2011].
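To make the statistic concrete, the following minimal sketch (in Python, with invented
yearly counts rather than the actual Google Scholar data) shows how the mean and
standard deviation of the year-on-year increases can be computed:

# Minimal sketch: mean and SD of year-on-year increases in result counts.
# The counts below are invented placeholders, not the data behind figure 2.2.
from statistics import mean, stdev

counts = [1200, 2450, 3900, 5100, 6700]            # results per year
increases = [b - a for a, b in zip(counts, counts[1:])]
print(f"M = {mean(increases):.2f}, SD = {stdev(increases):.2f}")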
To this end, sufficient research already supports claims that affective state can be de-
tected using several methods [Xu et al., 2015]. Hence, future efforts should be diverted
to applying affective computing in ways that can be deployed in real-life applications
to improve human-computer interaction. A few applications have succeeded in using
affective computing to enhance interaction, but their methodologies work only in
limited settings [Ragot et al., 2017]. This limitation is mainly due to their choice of
affect detection mechanism; the next section highlights why this is so.
2.3 Affect detection
Recognising affect ourselves (human recognition) seems intuitive. However, it is done
by observing behavioural cues such as body gestures, facial expressions and voice
prosody. Other cues enable us to clarify and contextualize our judgement about human
emotions, for example the nature of the relationship between the people in a conversation,
or the task being carried out. The innate personality of a person and prior events can
serve as a baseline, and the deviation from this baseline indicates the severity of a
person's emotion. Even after an emotion has been detected, reporting it is non-trivial
[Hassan, 2006] and sometimes prone to bias [Fisher, 1993]. The challenges of
self-reported emotion include representing emotional states in standard, uniform and
detailed ways [Levine and Safer, 2002]. Sometimes our feelings are too subtle to detect,
in the case of low-intensity emotions; at other times they are too entangled, in the case
of mixed states and transitions between psychological states [Bakhtiyari et al., 2014].
Also, the pressure and demand of detecting and reporting an emotion can itself
introduce another affective state [Allen et al., 2001]. This is even more complex when a
third party is required to detect the feeling, as the combination of the challenges above
further reduces the accuracy. Some of these complexities are also faced when
programming machines to detect human affect [Fisher, 1993].
Notwithstanding these difficulties, affect detection through computers offers some
advantages, namely the ability to automate the process and the sensitivity of sensors in
detecting minute changes that are not physically observable by human beings (and
certainly not at the same rate) [Picard, 1997]. Further, machines are not subject to the
same biases as humans. Additional advantages of sensors include data storage,
processing capabilities and their integrability with other computer applications [Zhang
et al., 2014]. These are the reasons that make affect detection a promising approach to
improving human-computer interaction. Self-reported emotions are still regarded as the
gold standard for the accuracy of emotion measurements [Lang, 2005, Mirgain and
Cordova, 2007]. However, when machines achieve accuracies close to humans in
detecting emotions, we can leverage the other advantages of computation, i.e. affective
computing [Prendinger and Ishizuka, 2005]. We discuss the three broad categories of
affect detection below:
• Reported: diaries, product ratings and service feedback such as questionnaires
[Gehricke and Shapiro, 2000, Heller et al., 1997]. Through self-reported emotions,
people communicate their feelings to product owners in product reviews, to service
providers as customer feedback, or to personal records in the case of people who
keep diaries. However, this approach is highly subject to human bias and would
require a lot of effort from the user to produce the granularity of detail required by
most applications of affective computing [Levine and Safer, 2002].
• Physical/Behavioural: gestures, facial recognition [Kumar and Agarwal, 2014,
Xu et al., 2015], Natural Language Processing (NLP), voice prosody, Keystroke
Dynamics (KD) and Mouse Dynamics (MD) [Hernandez-Aguila et al., 2014]. We
can observe physical and behavioural cues to infer emotions, as this is the most
common way of learning each other's emotional state during human-human
interaction [Zaalberg et al., 2004]. It is very natural, but it is difficult for computers
to achieve high precision through this method, because emotional expressions are a
human-human social construct through which people express their feelings to each
other, not to machines [Van Kleef, 2009]. Another limitation of this approach is
that it is context-specific, as people express their emotions in different ways under
different circumstances [Hochschild, 1979]. Also, most of these methods require
high computational power to achieve a satisfactory level of accuracy.
• Physiological: GSR, Electrocardiography (ECG), EEG, EMG, HR, Skin
Temperature (ST) [Zhang et al., 2014] and pupillary response [Eraslan et al., 2014].
In detecting the user's affective state for computational purposes, more sophisticated
instrumentation and analysis techniques can be used to measure changes in
physiological activities that are known to correlate with the user's affective state.
This approach is known as affect detection using physiological correlates of affect.
There are other considerations when selecting an affect detection mechanism, and they
depend on the purpose of the application. In adaptive gaming, the user's gestures may
indicate the affective state of the player, while game progression and user performance
may be used to determine the context in which the user experiences an affective state.
Subsequently, suitable events such as increased or reduced complexity of a game level
can be used to intervene on boredom or frustration respectively [Christian et al., 2014].
In intelligent tutoring systems, emotional cues captured from cameras and features
extracted from voice prosody can be used to detect the user's affective state, while
responses to cognitively challenging tasks can be used to contextualise it. When an
affective state and the context that describes it are known, suitable interventions such
as suggesting breaks, repeating an explanation or setting quizzes can be introduced to
improve the learning process [Sidney et al., 2005]. Contrary to the applications above,
affective responses during interaction with user interfaces are not as intense as during
gaming, hence the need for methods with high sensitivity [Matthews et al., 2018d].
Furthermore, many of the methods are invasive and therefore not suitable for observing
people's affective states during human-computer interaction with user interfaces.
In more detail, twelve (12) well-used affect detection mechanisms are reviewed below,
indicating their applications, advantages and constraints. The list was compiled from
the literature until no mechanism could be found that was not already on the list. The
mechanisms are listed according to the feature, behaviour or physiology being sensed,
rather than the sensors themselves.
1. Self-reported
Self-report remains the gold standard upon which other, implicit affect detection
mechanisms are evaluated [Broekens and Brinkman, 2013]. Self-reported mechanisms
require participants to explicitly rate their emotions on a 5-, 7- or 9-point scale [Zeng
et al., 2008]. These scales are presented to the user pictorially, verbally, through
graphical animations or by filling out questionnaires on paper [Desmet, 2003]. Some of
the well-used self-reported scales include the Self-Assessment Manikin (SAM)
[Geethanjali et al., 2017] for measuring valence, arousal and dominance; the Positive
and Negative Affect Schedule (PANAS) for measuring valence; the Profile Of Mood
States (POMS) for measuring mood [Norcross et al., 1984]; and the Visual Analogue
Scale (VAS) for measuring characteristics of an attitude on a continuous scale
[Crichton, 2001]. To validate our approach, we used the SAM scale because it is simple
to use, measures arousal, and does not complicate the comparison with our algorithm,
as it is also quantified on a scale [Watson et al., 1988]. However, self-report's reliance
on human judgement justifies the need for an automatic approach to detecting affect.
Several individual components add up to make a person's cognitive and intellectual
abilities. These include memory retention, reasoning and creativity, but emotional
intelligence is often overlooked as one of them [Detterman, 1987]. Emotional
intelligence includes the ability to cognitively deduce one's emotions [Peter, 2010].
Varying levels of emotional intelligence can make self-reported emotions inconsistent.
There also seems to be a consensus [Ekman et al., 1987, Cole et al., 2002, Simon and
Nath, 2004] that gender, social, cultural and personality differences affect the perception
and display rules of emotions. Although self-report has been used in product/service
feedback [McKone, 1999, Gamon, 2004] and in personal diaries and logs [Birditt et al.,
2005], it lacks the detail, consistency and ease of use required of an approach to affect
detection, because it is done manually [Levine and Safer, 2002]. This manual approach
has made standardization and consistency difficult; hence, self-report is not an ideal
method of affect detection for use in affective computing [Hassan, 2006].
2. Facial recognition
Facial recognition is a less demanding way of detecting emotions than self-report, as it
has the potential to be fully automated. Recognizing emotional expressions can be done
in four (4) different ways [Kumar and Agarwal, 2014]: 1. Geometry-based: shapes,
directions, regions; 2. Colour-based: the colour of a feature (eye, nose, mouth), though
this is very individualistic and culture/race/skin-colour specific; 3. Appearance-based:
statistical techniques; 4. Template-based: comparing with templates of a feature from a
feature database until there is a match. Paul Ekman, who was at the forefront of
findings using facial expressions, suggested the use of the Facial Action Coding System
(FACS), based on the works of the anatomist CH Hjortsjö [Hjortsjo, 1969, Ekman and
Friesen, 2003]. However, the subjective nature of human emotions has hindered
progress using these techniques. Despite the challenge of individual differences, [Kumar
and Agarwal, 2014] reported that facial expressions have achieved accuracies between
70% and 84% in person-specific emotion recognition, with a maximum of 5% variation
in non-person-specific emotion recognition.
The challenges to facial recognition include its sensitivity and the granularity of
calibration, which influence its accuracy in the temporal dimension [Kolakowska, 2013].
During human-computer interaction, low-intensity emotions are prevalent, and
sensitivity of recognition is a desirable feature which facial recognition does not afford
us. Another challenge is that, under observation, users tend to display what the
observer expects to see; in a real-life situation, under covert settings or with no
observer, the user may behave differently [Zaalberg et al., 2004]. Also, in facial
recognition it is unclear what expressions to expect from mixed emotions, such as
someone transitioning from a joyful experience to surprise and fear. Most facial
expression methods are based on Paul Ekman's theory of basic discrete emotions
[Ekman, 1992b]. Although detecting facial expressions is non-intrusive and fairly cheap,
needing only a camera and decent computational power, recognising an affective state
through facial expression remains a non-sensitive, user-specific and bias-prone way of
detecting user affect [Harms et al., 2010].
3. Gestures
The gestures of users can be used to discern their affective states [Mitra and Acharya,
2007]. This is possible because our hand, head and body movements and postures can
serve as affective cues. Detecting these signals often requires a camera and motion
sensors. Hence, gestures are similar to facial expressions regarding cost and detection
approach, requiring image analysis using the same techniques (geometry and shape) as
metrics [Gunes and Piccardi, 2007]. Some applications have even combined the two
techniques because of their similarities in feature metrics, software design and hardware
requirements [Gunes and Piccardi, 2007, Castellano et al., 2008]. However, gestures are
worse than facial recognition regarding sensitivity and applicability, because
low-intensity emotions are not strongly reflected in user gestures [Mitra and Acharya,
2007, Ward and Marsden, 2004]. Despite these limitations, gestures have been applied
to annotating video content [Hartmann et al., 2005], to affective tutoring systems
[Sarrafzadeh et al., 2006, Sarrafzadeh et al., 2008] and to tutor training systems that
improve a facilitator's body movement [Nguyen et al., 2015].
4. Voice prosody
Voice prosody has been applied to detecting the genre of music, predicting emotions,
detecting emotions, voice chats, etc. [Fritz et al., 2009]. It is cheap (in infrastructure)
and non-intrusive, with accuracies reaching over 80% [Erdem and Sert, 2014]. Prosodic
features (frequency, duration, intensity and timbre) and non-prosodic features have
been used to classify sounds into emotional types [Mion and Poli, 2008]. Despite its
good accuracy, low cost and non-invasiveness, it is unnatural to request that a user
speak during human-computer interaction just so that their current affective state can
be detected; this method is more suited to eliciting emotions than to detecting emotions
during human-computer interaction [Kim and Andre, 2008]. Furthermore, it can pose a
significant challenge for a voice recognition engine to extract a voice while retaining the
prosodic features that are useful for affect detection. Finally, voice processing is
computationally intensive, due to the extensive signal processing necessary for voice
analysis.
5. Natural Language Processing (NLP) and Text Mining
While text mining deals with the extraction of interesting knowledge, including
statistics, clustering and classification, from unstructured or free text, NLP breaks free
text down and evaluates it to discern sentiment, emotion or richer meaning [Kao and
Poteet, 2007]. Some applications of text mining and NLP in affective computing are
musical lyrics classification [Hu et al., 2009] and emotion detection on social media
[Sobkowicz et al., 2012, Gil et al., 2013]. In a literature review, NLP was reported to
have a relatively low accuracy of 77.30% and to be prone to differences in language
interpretation, ambiguity, cultural differences and social display rules [Lee et al., 2012].
NLP as a method of detecting emotions is also not real-time, and it is more suitable for
affect contextualization than for affect detection [Kim et al., 2010].
6. Electro-dermal activity
GSR, Skin Conductance Response/Level (SCR/SCL) and the PsychoGalvanic Reflex
(PGR) are all measures of the electrical activity of the skin [Boucsein, 2012]. Now
formally known as Electro-Dermal Activity (EDA), this is the general term for all
measures of changes in the electrical properties (potential difference, current, resistance,
conductance) of the skin [Critchley, 2002]. The Sympathetic Nervous System (SNS) is
the part of the Autonomic Nervous System (ANS) which controls the involuntary action
of sweating [Critchley, 2002]. When the SNS is stimulated, arousal is reflected in the
amount of sweat deposited in the sweat pores [Chellali and Hennig, 2013]. As sweat is a
conductor of electricity, the amount of sweat in the sweat pores influences the electrical
properties of the skin. EDA is a reflection of the arousal dimension of emotion
[Schlosberg, 1954]. It should be noted that the relationship between emotional arousal
and EDA is not causal, as other factors such as humidity, weather and motor activities
can easily increase or decrease the amount of sweat on the skin [Chellali and Hennig,
2013]. Devices that measure EDA have been miniaturised and made non-intrusive in
the form of wristbands, finger bands, toe bands, etc. [Luharuka et al., 2003]. Despite its
unobtrusiveness, the uncertainty and the difficulty of controlling external factors have
made EDA unsuitable for applications in naturalistic settings. Also, when measuring
EDA, a latency of between 3 and 6 seconds between stimulus and response must be
factored in [Chanel et al., 2006], which is another limitation. Furthermore, EDA
devices are still somewhat expensive and do not ship as standard personal computer
peripherals, and are hence not practicable for everyday affect detection by non-experts
in non-experimental settings [Boucsein, 2012].
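As an illustration of the latency constraint, the sketch below (a minimal Python
example with invented timestamps, not a tool used in this thesis) pairs stimulus onsets
with EDA peaks only when the peak falls 3 to 6 seconds after the stimulus:

# Minimal sketch: pairing stimulus onsets with EDA responses, allowing for
# the 3-6 s latency noted above. Timestamps (seconds) are invented.
stimuli = [10.0, 42.5, 80.0]            # stimulus onset times
eda_peaks = [14.2, 47.0, 60.1, 84.9]    # detected EDA response peaks

LATENCY_MIN, LATENCY_MAX = 3.0, 6.0

pairs = []
for s in stimuli:
    # accept the first peak that falls inside the latency window
    match = next((p for p in eda_peaks
                  if LATENCY_MIN <= p - s <= LATENCY_MAX), None)
    pairs.append((s, match))

print(pairs)  # [(10.0, 14.2), (42.5, 47.0), (80.0, 84.9)]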
7. Electrical-brain activity
Human electroencephalography (EEG) measures the frequency and amplitude of
electrical activity recorded from the scalp [Li et al., 2009]. A comparison of multimodal
affect detection mechanisms comprising EEG, GSR, HR, temperature and respiration
patterns was made using generative models [Torres-Valencia et al., 2014]; whenever
EEG was included in the combination, it yielded the highest accuracy in both the
arousal and valence dimensions of emotion. To utilise EEG, note that the several bands
(alpha, beta, theta, delta), representing different frequency ranges, indicate different
activities in the brain; a careful selection of one band, or a combination of bands, is
therefore necessary [Ahern and Schwartz, 1985]. Because the placement of the probes
on the head affects the stability, and hence the reliability, of the readings, some level of
skill is needed for a successful setup. Some studies claim that EEG is an inexpensive,
non-intrusive, simple, fast and accurate way of studying the brain's reaction to stimuli
[Li et al., 2009]. While its accuracy and responsiveness are undisputed, given the
knowledge needed to set it up, the choice of frequency bands to explore, the unnatural
fitting, and the non-aesthetic and intimidating look seen in figure 2.3, we would argue
that it is not ideal for daily, non-experimental use [Li et al., 2009].

Figure 2.3: EEG device (source: http://www.cns.ppls.ed.ac.uk/eegmain)
8. Heart rate variability
This category includes sensing the heart rate, blood volume pressure and the electrical
activity of the heart with ECG. Several sensors can be used to estimate heart rate and
blood volume pressure, including the photoplethysmogram (PPG) discussed later
[Quazi et al., 2012]. ECG is mostly used as a clinical diagnostic tool for assessing the
health of the heart, but it can also be used as a physiological correlate of affect [Li and
Chen, 2006]. ECG as an affect detection mechanism is based on correlations between
the cardiovascular activity of the heart and changes in the affective state of a person.
When the fibres through the atria and ventricles are activated through the Sympathetic
Nervous System (SNS), the heart rate increases. Conversely, activation of the
Parasympathetic Nervous System (PNS) reduces the workload and hence lowers the
heart rate [Agrafioti et al., 2012]. Catalano [Catalano, 2002] reported that the reacting
effects observable in ECG are automaticity, contractility, conduction rate, excitability
and dilation of the coronary blood vessels. In another study, a Support Vector Machine
(SVM) classifier was used to classify emotions from ECG signals; accuracies of 78.4%
and 61.8% were achieved for three and four classes of emotions respectively [Kim et al.,
2004b]. As is the case with EEG, ECG requires some level of expertise to set up; it is
intrusive, the devices are comparatively costly and, worse still, its accuracy is not high.
These constraints make ECG unsuitable as an affect detection mechanism [Nakamura
et al., 1993].
9. Electrical-muscle activity
EMG is the use of electrical activity to measure changes in muscle activity [Benedek
and Hazlett, 2005]. Its most common use is in measuring changes in facial muscles.
The physiology is that when a stimulus is induced and experienced by a participant, the
affective state changes and the brain instructs the motor nerves to reflect these changes
through the facial muscles, generating the corresponding facial expression [Xu et al.,
2016]. The process is reversed to reset the facial expression or to transition into another
one. These changes can be revealed by observing EMG over a facial muscle, and the
duration of an affective state can also be measured. EMG is more sensitive than other
methodologies for detecting facial expressions: affective changes not detected using
Ekman's FACS were detectable using EMG when emotional intensities were suppressed
[Cacioppo et al., 1992, Cacioppo et al., 1986]. The corrugator muscle is a facial muscle
in the forehead that is responsible for wrinkling the forehead. Activity of the corrugator
muscle is usually used to detect negative emotions like sadness, worry, deep thought and
anger, because the corrugator muscle lowers the eyebrow during those states [Dimberg,
1990, Hazlett and Hazlett, 1999]. The zygomatic major is a facial muscle responsible for
stretching the mouth posteriorly and superiorly, hence controlling smiling and other
mouth expressions that are correlated with joy and pleasurable emotions [Lang et al.,
1993]. Despite its sensitivity and accuracy, EMG, just like EEG and ECG, is intrusive,
because it depends on attaching devices to the body, and it requires some level of skill
to set up [Allen et al., 2001]. As with facial recognition, individual, social and cultural
differences influence the perception and display rules of participants towards affective
stimuli [Cheng and Liu, 2008].
10. Skin Temperature
PPG can be used to measure skin temperature. The SNS works by preparing the body
for a fight-or-flight response to stimuli, while the PNS operates in the opposite way,
regulating this effect by triggering a rest-and-digest response [Nakamura et al., 1993,
Moses et al., 2007]. These activities influence our heartbeat patterns, respiration
patterns and skin temperature [Bousefsaf et al., 2013, Hjortskov et al., 2004]. Recent
work has confirmed that variability in these activities can be measured using a
technique known as PPG [McDuff et al., 2014]. PPG functions by emitting light into
the skin and measuring the amount of light reflected onto a camera or photosensitive
device [Shelley, 2007]. The amount of light reflected is an indicator of changes in the
quantity of blood deposited in the lower layer of the skin and, as explained above, the
blood volume is influenced by the activities of our autonomic nervous system [Quazi
et al., 2012]. Lee et al. [Lee et al., 2011] proposed an algorithm which proved successful
in improving the Signal to Noise Ratio (SNR), and hence accuracy, by 10.9% and 12.7%
for wrist motions and walking on the road respectively. Despite indications that PPG is
a very promising methodology for detecting affect, confounding environmental variables
such as temperature, humidity, motion and individual differences remain issues when
using this technique [Lee et al., 2011]. It is not ideal for use in naturalistic settings.
11. Keystroke Dynamics (KD) and Mouse Dynamics (MD)
KD and MD have been used as biometrics to authenticate users [Bergadano et al., 2002,
Monrose and Rubin, 2000, Dowland and Furnell, 2004, Pusara and Brodley, 2004]. The
same feature sets used by these studies are also used for user profiling and emotion
detection [Alhothali, 2011]. In KD for emotion detection, users' affective states are
correlated with the dynamics of the way the user types free or fixed text: features such
as typing speed, backspace rate, the duration between key presses and the duration
between key-down and key-up events are used as indications of the user's affective state
[Lin et al., 2013]. In MD, features such as mouse speed, click rates, the duration of
button-up and button-down events and scroll rates are used for affect detection [Lin
et al., 2013]. The underlying principle is to associate muscle action and user behaviour
(as manifested in user actions) with the user's affective state [Vizer et al., 2009]. In a
review of studies on emotion detection using keyboard and mouse dynamics, the highest
average accuracy was 93.4%, obtained when KNN (K-nearest neighbour) was used to
classify emotions elicited by audio stories into anger, fear, happiness, sadness, surprise
and neutral [Kolakowska, 2013]. The advantages of this method are that it is cheap,
non-intrusive, computationally light and natural, with no complexity in the device
set-up [Kolakowska, 2013]. KD and MD would be the ideal methodology for detecting
affect if all applications required the same pattern of input during an interaction.
However, reading emails requires little or no keyboard input, while sending emails
requires keyboard input and little mouse input. Relying on a methodology based on KD
and MD to detect user affect in such cases could be erratic [Khan et al., 2012].
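To make the keystroke features concrete, the following minimal sketch (with an
invented event log; not the cited authors' implementation) derives two standard KD
features, dwell time (key-down to key-up) and flight time (key-up to the next
key-down), from raw key events:

# Minimal sketch: extracting dwell and flight times from a key event log.
# Each event is (time_in_seconds, key, "down" or "up"); values are invented.
events = [
    (0.00, "h", "down"), (0.09, "h", "up"),
    (0.21, "i", "down"), (0.32, "i", "up"),
]

down_times = {key: t for t, key, kind in events if kind == "down"}
dwell = [(key, round(t - down_times[key], 3))
         for t, key, kind in events if kind == "up"]

ups = [t for t, _, kind in events if kind == "up"]
downs = [t for t, _, kind in events if kind == "down"]
flight = [round(d - u, 3) for u, d in zip(ups, downs[1:])]

print(dwell)   # [('h', 0.09), ('i', 0.11)]
print(flight)  # [0.12]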
12. Pupillary response
The pupillary response is known to be an indicator of arousal [Bradley et al., 2008b]. It
has been used to detect cognitive load, frustration, attention, boredom and stress
[Partala and Surakka, 2003, Klingner et al., 2011]. The primary function of the pupil is
to regulate the amount of light reaching the retina [Baltaci and Gokcay, 2012].
However, once light is kept relatively constant, the pupil diameter is an indicator of a
person's autonomic activity [Pfleging et al., 2016]. Pupillary response as an indicator of
affect stems from the fact that the sympathetic nervous system is the part of the
autonomic nervous system which, when activated, raises the blood pressure and heart
rate, constricts the blood vessels and, most interestingly, stimulates the radial dilator
muscles, which in turn cause pupil dilation [Koss, 1986]. As sympathetic activity
decreases, this process reverses and the pupil diameter decreases. Conversely, when the
parasympathetic nervous system is activated, the sphincter muscles of the iris are
triggered, which constricts the pupil; this process is reversed whenever the
parasympathetic system is inhibited, which in turn results in pupil dilation [Bruneau
et al., 2002, Schroder et al., 2005]. Visual interaction involves gazing at the content, so
there is an opportunity to capture the eyes. Eye trackers have been used in
experimental set-ups to capture user interaction; however, some studies have begun to
use the web camera to achieve the same under naturalistic settings [Sommer et al.,
2014]. Cumulative measures of the dilations and constrictions of the pupil, computed
using windowing techniques, are known to correlate with a user's affective state. Once
light is controlled or accounted for, and the camera or eye-tracking device has a view of
the eye, we envisage that it will be possible to detect a user's affective state. The
relatively high accuracy seen in table 2.2, the potential for use in natural settings
[Sommer et al., 2014], and the non-intrusive, easy-to-set-up nature of pupillometry seen
in table 2.1 have made pupillary response our preferred choice of affect detection
mechanism [Christian et al., 2014, Partala et al., 2000]. The next section gives a more
detailed exposition of the selection of an affect detection mechanism and of how
pupillary response can be a reliable physiological correlate of affect during
human-computer interaction.
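The windowing idea referred to above can be sketched as follows; this is a minimal
illustration with an invented signal and window length, not the AFA algorithm
developed later in this thesis:

# Minimal sketch: a sliding-window summary of pupil diameter samples.
# The diameters (mm) and the window length are invented for illustration.
diameters = [3.1, 3.2, 3.6, 3.9, 3.8, 3.4, 3.2, 3.1]
window = 4

def window_means(signal, n):
    """Mean pupil diameter over each window of n consecutive samples."""
    return [sum(signal[i:i + n]) / n for i in range(len(signal) - n + 1)]

means = window_means(diameters, window)
baseline = means[0]
# Windows whose mean rises above the first (baseline) window suggest
# increased arousal, once light is controlled for.
print([round(m - baseline, 3) for m in means])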
2.4 Selecting an affect detection mechanism
When selecting an affect detection mechanism, several factors must be considered.
• Purpose & natural fit
The purpose or application area in which the affect detection mechanism is to be
used is crucial in determining a suitable way to detect affect [Brouwer et al., 2015].
In our intended application, user interfaces and visual content, it is quite
challenging to identify an ideal affect detection mechanism. This is because users
interact with a system through its user interface, and these interactions take
different modes [Exposito et al., 2018]. However, most user interfaces have visual
content which enables users to view options and information and to make decisions
[Sanghoon and Roberto, 2005]. It is therefore a reasonable assumption that most
users will have visual contact with user interfaces. One of the established modes of
detecting affect, the pupillary response, can be used to measure the affective state
of users [Sanghoon and Roberto, 2005]. Through eye trackers and web cameras
(the eyes of the computer) we can collect physiological responses from the user
which inform us of the user's affective state. Another natural way to observe users'
interaction is through keystroke and mouse dynamics. However, these methods are
not always applicable, because some user interfaces require less keyboard or mouse
input than others; keystroke or mouse dynamics collected from one interface may
therefore be dissimilar on another interface, even if the user is experiencing the
same affective state on both.
Regarding purpose and natural fit, pupillary response and other camera-based
approaches are more generalizable and accurate means of collecting affective data
for interaction with user interfaces and visual content.
• Accuracy
Accuracy is of high importance to affect detection. With extensive data cleansing
and analysis, several studies have arrived at specific techniques for detecting affect
in certain contexts. As can be seen from table 2.2, nearly all affect detection
mechanisms reach accuracies above 80%, at least in certain contexts. The greater
the number of classes, the lower the expected accuracy. Also, higher accuracies are
easier to attain in classification than in estimation, and there is no standard way
to measure the accuracy of affect estimation. However, for more generalizable
applications, estimation is preferred, because the desired classes are often not
known beforehand. Another advantage of estimation is that it is possible to
convert estimates to classes, but not vice versa (illustrated in the sketch after this
list). In user interfaces, the qualities most important for performance (attention,
interest, adequate cognitive load, stress) are best measured as estimates on the
arousal scale. It is therefore important that an affect detection mechanism can
measure the affective state of a user at the right granularity. Physiological
correlates of affect are more fine-grained than physical/behavioural and
self-reported methods. This inevitably eliminates facial gestures, body gestures
and other camera-based affect detection mechanisms except pupillary response.
• Ease of use
Affect detection is very relevant to user interaction, as established previously.
However, if a mechanism is too costly to purchase, too difficult to set up or too
computationally intensive, its cost may outweigh its value.
As can be seen from table 2.2, most of the studies were experimental; only very
few of the mechanisms can be deployed in real-life applications. Text mining and
voice prosody can be used for social and customer-care applications, while
keyboard and mouse dynamics may be used for specific computer applications.
Gestures and facial recognition are computationally intensive, but they may be
used for visual content, learning or gaming. All the other mechanisms based on
physiological responses, except pupillary response, are not applicable in natural
settings because they do not integrate seamlessly into the systems. Also, if the
detection device does not need to be purchased separately or attached to the body,
it can be considered easy to use.
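The asymmetry noted under Accuracy, that continuous estimates can always be
discretised into classes while the reverse is impossible, can be illustrated with a short
sketch (the estimates and the 0.5 threshold are invented):

# Minimal sketch: discretising continuous arousal estimates into classes.
# The estimates (on a 0-1 scale) and the 0.5 threshold are invented.
estimates = [0.12, 0.48, 0.73, 0.91]
classes = ["high" if e >= 0.5 else "low" for e in estimates]
print(classes)  # ['low', 'low', 'high', 'high']
# The reverse is impossible: the label 'high' alone cannot tell us whether
# the underlying estimate was 0.73 or 0.91.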
Table 2.1 presents a summary of the various affect detection mechanisms: their
granularity (how sensitive they are to affective changes), how easy they are to set up,
their level of obtrusiveness (how intrusive and large they are, and whether they are
attached to the body), the minimum cost of purchase, how much computing resource
they need, their prominent disadvantages and their most suitable applications. Table
2.2 contains the accuracies of the affect detection mechanisms as extracted from the
literature cited in the rightmost column; the accuracy, the application that motivated
the research, the number of classification classes, the method of analysis and the
research novelty are tabulated. Multimodal studies (studies that used multiple affect
detection mechanisms) were not taken into consideration, since it would be difficult to
extract the accuracy of each of the sensors separately.
2.5 Review of affect detection mechanisms
A review was conducted to find papers in affective computing, in order to choose a
suitable affect detection mechanism for user interfaces and visual content.
2.5.1 Query Construction
The most important term here is affective computing, because we do not want studies
that use these same detection mechanisms in purely health, sports or other unrelated
domains. Therefore, if a paper does not mention affective computing in its title or
abstract, it will not be extracted by the query.
We also needed to limit papers to those that relate to the arousal dimension of affect.
The search is therefore narrowed down by the terms arousal 'or' stress. Arousal is a
dimension of affect that deals with the intensity of emotion. Stress is a form of mental
and physical agitation in users and can be observed within a certain range on the
arousal scale. For now, there is no formal agreement on the specific level of arousal
that characterises stress. Despite the ambiguity and the non-specific ways in which the
word stress has been applied, it remains a common term for describing users who are
experiencing some level of discomfort during human-computer interaction. The final
search query constructed was:

(Stress OR Arousal) AND (Detection OR Recognition) AND (Affective Computing)

The query was applied to the following databases: ACM Digital Library, Springer,
Science Direct and IEEE.
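For illustration only, the boolean logic of this query can be expressed as a small Python
filter; the paper records below are invented, and the actual search was performed with
the databases' own search engines rather than code:

# Minimal sketch of the query logic: a paper is retained only if its title
# or abstract mentions affective computing together with a stress/arousal
# term and a detection/recognition term. Records are invented examples.
papers = [
    {"title": "Stress detection for affective computing", "abstract": "..."},
    {"title": "Arousal in sports physiology", "abstract": "..."},
]

def matches(paper):
    text = (paper["title"] + " " + paper["abstract"]).lower()
    return (("stress" in text or "arousal" in text)
            and ("detection" in text or "recognition" in text)
            and "affective computing" in text)

print([p["title"] for p in papers if matches(p)])
# ['Stress detection for affective computing']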
Table 2.1: Comparison of affect detection mechanisms
(Mechanism | Granularity | Ease of setup | Obtrusiveness | Financial cost | Computational cost | Disadvantages | Suitable applications)

Reported:
Self-reported | Low | Low | Low | Low | Low | Potential for bias; not automatable | Voluntarily/solicited feedback; easy; secondary data source

Physical/behavioural:
Facial recognition | Low | Low | Low | Low | High | Differences in display rules; can be faked | E-learning; classification into discrete emotions
Gestures | Low | Low | Low | Low | High | Differences in display rules | Games; learning; experiments; movie classification
Voice prosody | Medium | Low | Low | Low | High | Limited applications | Social networks; learning; lie detection
NLP & text mining | Medium | Low | Low | Low | Medium | Language differences | Social networking; entertainment; contextualizing
KD & MD | Medium | Low | Low | Low | Low | Application dependent | Experimental; specific user interfaces

Physiological correlates:
GSR | High | Medium | Medium | Medium | Medium | Confounds with motion, humidity and temperature; latency | Experiments; medical
EEG | High | High | High | High | Medium | High granularity; complexity | Experiments; cognitive science; medical
ECG | High | High | High | High | Medium | Confounds with motion; low granularity | Experiments; medical
EMG | High | High | High | High | Medium | Can be faked | Classification/estimation of basic emotions; medical
PPG | High | Medium | Low | Low | Medium | Confounds with motion, humidity and temperature; latency | Games; experiments
Pupillary response | High | Low | Low | Low | Medium | Confounds with light; requires user in a fixed position | Analysis of visual stimuli
Table 2.2: Affect detection mechanisms and their accuracies
(Accuracy | Application | Classes | Analysis | Novelty | Citation)

Facial expression:
91.66% (facial gesture detection), 90% (emotion recognition), 94.58% (both) | Drivers | 4 (happy, anger, sad and surprise) | Fuzzy rules | Combining eye and lip gestures and expressions | [Agrawal et al., 2013]
84.80% | Experimental | 4 (normal, pretended happy and pretended sad facial expressions of affective states) | Linear discriminant | Through facial temperature | [Khan et al., 2006]
87.50% | Music therapy | 3 (happy, neutral, sad) | SVM | Music recommendation system | [Rizk et al., 2014]
77.4% (2D), 89.4% (3D) | Experimental | 2 (positive and negative) | SVM | Using consumer depth cameras and luminance data | [Savran et al., 2013]
90.73% | Experimental | binary (happy, angry, sad, surprised, disgusted and fear) | Radial Basis Function Network (RBFN) | Using ASM (Active Shape Model) | [Setyati et al., 2012]
72% | Experimental | 2 (stressed or not stressed) | SVM | Using thermal and visual spectrum | [Sharma et al., 2013]
97.5% (neural networks), 66.66% (regression) | Experimental | 6 (happy, angry, sad, surprised, disgusted and fear) | Regression & NN | Multilayer Perceptron with backpropagation learning | [Siraj et al., 2006]
95.3% (deliberately displayed), 72% (voluntarily displayed) | Experimental | 6 (happy, angry, sad, surprised, disgusted and fear) | SVM & HMM | Temporal modelling of AUs for facial recognition | [Valstar and Pantic, 2012]

Gestures:
97.4% (with self-reported mental state), 83.2% (including unreported cases) | ITS | 6 (stressed, satisfied, tired, thinking, recalling, concentrating) | Dynamic Bayesian network, junction tree algorithm | Classifying mental states that affect learning | [Abbasi et al., 2010]
79% | Adaptive gaming | 3 (levels 1, 2, 3 of sadness, frustration, happiness, joy) | HMM | Using the Xsens motion capture system (http://www.xsens.com/) | [De Silva et al., 2006]

Voice prosody:
77% | Experimental | 2 (neutral vs emotional) | Symmetric Kullback-Leibler distance, logistic regression models, HMM | - | [Busso et al., 2009]
75% for 2 classes (e.g. anger vs. joy), 84% for binary classification (stressed vs. not stressed) | Software library | 2 (stressed vs neutral) | - | Mobile-based library to detect mood and stress; 30% processor usage when idle, 70% while analysing | [Chang et al., 2011]
80.10% | Customer care | 2 (angry and neutral) | Neural networks | Sentiment detection (language and text independent) | [Morrison et al., 2005]

Text mining:
85% | Web contents | 3 (positive, negative and neutral) | Probability distribution, defined rules | - | [Lu et al., 2010]

KD and MD:
77%-88% for binary | Experimental | binary (anger, excitement, confidence, hesitance, nervousness, relaxation, sadness, tiredness) | Decision trees | Non-invasive and cheap | [Epp et al., 2011]
64.72% for valence, 61.02% for arousal ratings | Experimental | 2 (low vs high) | ANN | Using ANNs with KD & MD | [Khan et al., 2012]

GSR:
40% | Experimental | correlation | Statistics | Relationship between EDA and body motion; movement occurs 3 s after the EDA signal | [Chellali and Hennig, 2013]
65.79% | Experimental | 2 (stressed vs. relaxed) | KNN, multilayer perceptron, NB, random forest, JRip | Comparing pupillary response with GSR | [Ren et al., 2013]

EEG:
82% | Experimental | 3 (positive, negative and neutral valence) | QDC, SVM, KNN | Wireless EEG sensor | [Brown et al., 2011]
82% | Experimental | 2 (calm/neutral vs. excited/negative) | SVM | Higher Order Spectrum (HOS) with genetic algorithms for feature extraction | [Hosseini et al., 2010]
84.50% | Experimental | 4 (arousal and valence, high and low) | KNN | Self-organizing maps for boundary selection | [Khosrowabadi et al., 2010]
90% | Music | 6 (relax, happy, surprise, sad, fear and angry) | GMM, Bayesian network, One-Rule | - | [Khosrowabadi et al., 2009]
82.29% | - | 4 (joy, sadness, anger and pleasure) | SVM, multilayer perceptron | Using fewer EEG electrodes | [Lin et al., 2010]
69.69% | Music | 4 (joy, sadness, anger and pleasure) | Multilayer perceptron | - | [Lin et al., 2007]
54.09%, 46.86% and 40.72% for the respective classifiers | Learning | 4 (boredom, confusion, engagement and frustration) | KNN, SVM, multilayer perceptron | - | [Mampusti et al., 2011]
87% negative valence (high vs low arousal), 66% positive valence (high vs low arousal) | Experimental | 2 per valence polarity (high vs low arousal) | Shrinkage Linear Discriminant Analysis (shLDA) | - | [Mathieu et al., 2013]
91.33% | Experimental | 5 (happy, surprise, fear, disgust, neutral) | KNN | - | [Murugappan and Murugappan, 2013]
85.17% | Experimental | 6 (happiness, surprise, anger, fear, disgust and sadness) | SVM | Hybrid adaptive filtering and higher-order crossings analysis | [Petrantonakis and Hadjileontiadis, 2010]

ECG:
78.21% (similar ages), 70.23% (all ages) | Experimental | 6 (happiness, sadness, fear, surprise, disgust and neutral) | Regression tree, KNN and fuzzy KNN (FKNN) | Effect of age groups on emotion classifiers (age groups 9-16, 18-25, 39-68) | [Jerritta et al., 2013]
82.29% | Experimental | 4 (joy, anger, sadness, pleasure) | BP neural network, template machine classifier | - | [Cheng and Liu, 2008]

Pupillary response:
90% | Experimental | 2 (neutral vs aroused) | Decision tree | Periorbital temperature | [Baltaci and Gokcay, 2014]
83.16% average accuracy | Experimental | 2 (relaxed vs stressed) | KNN, multilayer perceptron, NB, random forest, JRip | Comparing pupillary responses to GSR | [Ren et al., 2013]
After extracting the results of the query and downloading the citations, title and
abstract screening were done. The following subsections set out the protocols for the
inclusion and exclusion of papers.
2.5.2 Exclusion criteria
The following papers were excluded:
- Duplicate studies
- Non-empirical studies
- Studies not reported in the English language
- Studies that do not clearly state the experimental conditions and the demographics
  of the participants
- Studies excluded on quality grounds (see section 2.5.4)
- Conference proceedings, unless the contribution is a full paper
- Studies that do not report their findings clearly
- Studies that do not have PDF versions
- Studies that do not mention stress, arousal or emotion detection in their abstracts
2.5.3 Inclusion Criteria
We included:
- Studies published from 2005 to 2014
- Studies that explore methods to detect affect
- Studies that use a single method or a combination of methods
- Studies that improve detection mechanisms through the data cleansing phase, even
  if previous studies used the same detection mechanism
- Studies that improve affect detection mechanisms through the machine learning or
  statistical method used for analysis, even if previous studies used the same affect
  detection mechanism
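The machine-checkable parts of these criteria can be illustrated with a short Python
sketch; the field names and the example record are invented, and criteria such as the
quality assessment below clearly require human judgement:

# Minimal sketch: applying the machine-checkable parts of the inclusion/
# exclusion criteria. Field names and the example record are invented.
def passes_screening(paper, seen_titles):
    if paper["title"] in seen_titles:          # duplicate studies
        return False
    if not 2005 <= paper["year"] <= 2014:      # publication window
        return False
    if paper["language"] != "en":              # English-language only
        return False
    if not paper["has_pdf"]:                   # PDF version required
        return False
    abstract = paper["abstract"].lower()
    return any(t in abstract for t in ("stress", "arousal", "emotion detection"))

paper = {"title": "Detecting arousal on the Web", "year": 2012,
         "language": "en", "has_pdf": True,
         "abstract": "We detect arousal during browsing..."}
print(passes_screening(paper, seen_titles=set()))  # True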
2.5.4 Quality assessment
We excluded fatally flawed studies from the review. These were identified using the
following five questions developed by Dixon-Woods et al. [Dixon-Woods et al., 2006].
1. Are the aims and objectives of the research clearly stated?
2. Is the research design clearly specified and appropriate for the aims and objec-
tives of the research?
3. Do the researchers provide a clear account of the process by which their findings
were produced?
4. Do the researchers display enough data to support their interpretations and
conclusions?
5. Is the method of analysis appropriate and adequately explicated?
2.5.5 Results
A total of 1,255 papers were extracted from the databases, and 741 were excluded after
applying the exclusion and inclusion criteria, leaving 514 papers for full-text screening.
Within these remaining papers, I carried out a text search for the terms "Web",
"internet" or "browse", which yielded seven papers. This suggests that only a few
studies have focused on affect detection on the Web. In the next subsection, we present
a synthesis of the final papers extracted from the review that have either a
methodological or a theoretical relationship to our research.
2.5.6 Synthesis of related work
As early as 1974, Janisse tested the hypothesis that pupil dilation and constriction
occur with attraction and aversion, respectively [Janisse, 1974]. Results showed a linear
correlation between pupil size and affective intensity, and a curvilinear relationship
with valence [Janisse, 1974]. Hence, pupil dilations occur for both positively and
negatively valenced stimuli and are proportional to their intensity [Janisse, 1974]. In
2003, Partala et al. confirmed Janisse's finding that pupil size variation is an indication
of affective processing: participants were presented with ten positive, ten neutral and
ten negatively valenced sounds; results showed that, two seconds after stimulus offset,
participants' pupils were significantly larger for both negative and positive stimuli than
for the neutral stimuli [Partala and Surakka, 2003]. Bradley et al. carried out a study
showing that pupillary response reflects emotional arousal in affective picture-viewing
tasks [Bradley et al., 2008a]. Earlier, in 1966, Kahneman and Beatty showed that pupil
dilation is a measure of cognitive load and task difficulty [Kahneman and Beatty, 1966].
In other early but related work, pupil dilation has been combined with other
mechanisms, for example GSR, Blood Volume Pressure (BVP) and ST, to detect the
presence of stress or increased arousal [Zhai and Barreto, 2006]. Calvo et al. reviewed
affect detection with a broad overview of models, methods and their applications
[Calvo and D'Mello, 2010]. Bremner researched the amplitude changes in pupil size and
the velocity of constriction after exposure to light stimuli [Bremner, 2012]. Partala et
al. provided evidence that positive interventions can indeed trigger a different affective
state when participants experience a negative one, such as during a mouse pointer
failure [Partala and Surakka, 2004]. So far, this section has presented the early related
work (published before 2014) extracted from our review, which provides the theoretical
underpinnings of our research.
Table 2.3 summarises more recent related work published between 2015 and 2019,
during the period of our research. These papers were retrieved by querying Google
Scholar, the ACM digital library, Springer, Science Direct and IEEE using the criteria
(pupil OR eye-tracking) AND (web OR human-computer interaction OR user interface
OR adaptive computing) AND (arousal OR stress) for papers published between 2015
and 2019.
Table 2.3: Related work in theoretical findings, applications, methods or sensors used
(No. | Novelty | Analysis | Sensor | Source)

1 | Model for predicting stress in desk jobs | - | Smartwatch | [Sanchez et al., 2018]
2 | Detecting stress in UI | Patterns | Mouse, Gaze | [Wang et al., 2019]
3 | Emotion intensity-duration model | - | Pupil | [Steephen et al., 2018]
4 | Pupils reflect arousal from clickbait | - | - | [Pengnate, 2016]
5 | Sensing and eliciting emotions from humanoid robots | Average | Pupil, EEG | [Guo et al., 2019]
6 | Arousal-modulated game difficulty | - | GSR | [Amico, 2018]
7 | Stress detection | - | Various | [Sioni and Chittaro, 2015]
8 | Predicting intentions and emotion | - | Gaze, head motion | [He et al., 2018]
9 | Relationship between pupil dilation, task difficulty and habituation | - | Pupil | [Matsumoto et al., 2016]
10 | Detecting emotional valence/brain activation | Average | Pupil, fMRI | [Park and Kim, 2016]
11 | Arousal and valence recognition in gaming | Hilbert transform | Pupil | [Alhargan et al., 2017]
12 | Emotion recognition: lab vs wearable sensors | - | Various | [Ragot et al., 2017]
13 | Affect detection with thermal imaging | - | Thermal imaging | [Latif et al., 2015]
14 | Logging viewers' affective responses to visual attention on paintings | - | Pupil, Gaze | [Calandra et al., 2016]
15 | Affect recognition for e-learning | - | Pupil | [Xing et al., 2016]
16 | Sensing stress from mobile typing | - | Mobile typing pressure | [Exposito et al., 2018]
17 | Relationship between listening duration and arousal | - | Pupil | [McGarrigle et al., 2017]
18 | Relationship between pupil dilation, mental load and light | - | Pupil | [Pfleging et al., 2016]
19 | Impact of pupillary response on gaze estimation | - | Pupil, Gaze | [Choe et al., 2016]
20 | Model relating pupil dilation to cognitive arousal vs light | - | Pupil | [Duchowski et al., 2018]
21 | Relating memory capacity to pupil size variability | - | Pupil | [Aminihajibashi et al., 2019]
22 | Estimating mental load from pupil and gaze data | Bayesian surprise | Pupil, Gaze | [Wolf et al., 2018]
23 | Detecting affect from facial expression and pupil dilation | Average | Pupil, Facial | [Tangnimitchok et al., 2018]
24 | Sensing stress in visual perception | - | Pupil, Gaze | [Chmielewska et al., 2019]
25 | Stationarity of the pupil in predicting workload | - | Pupil | [Buettner et al., 2018]
26 | Pupil dilation as a predictor of emotional engagement | Average | Pupil | [Henderson et al., 2018]
27 | Stress detection with pupil dilation and facial temperature | - | Pupil, Thermal imaging | [Baltaci and Gokcay, 2016]
28 | Detecting emotional valence from EEG and pupil dilation | Average | Pupil, EEG | [Abdrabou et al., 2018]
29 | Detecting emotional valence from pupil dilation | - | Pupil | [Babiker et al., 2015]
30 | Pupillary response in a virtual reality environment | - | Pupil | [Chen et al., 2017]
31 | Decoupling the light reflex from pupillary dilation | - | Pupil | [Raiturkar et al., 2016]
32 | Pupillary responses reflect internal belief states of correctness or error in decision making | Average | Pupil | [Colizoli et al., 2018]
33 | Pupil dilation as a mechanism for attention-aware systems | - | Pupil (glasses) | [Gollan et al., 2016]
34 | Pupil dilation for tracking lapses in attention | - | Pupil | [van den Brink et al., 2016]
35 | Relationship between arousal and pupil dilation, HR and GSR | Average | Pupil, HR, GSR | [Wang et al., 2018]
36 | Pupil dilation and HR as indicators of cognitive load | - | Pupil, HR | [Jercic et al., 2018]
37 | Pupil responds to surprise | - | Pupil | [Alamia et al., 2019]
38 | Pupil dilations to sense arousal | - | Pupil | [Kassem et al., 2017]
39 | The window of cognition in pupil dilation | - | Pupil | [Korn and Bach, 2016]
40 | Pupil size responds to attention and experience | - | Pupil | [Wahn et al., 2016]
41 | Correlating gaze position with pupil size measurements | - | Pupil, Gaze | [Hayes and Petrov, 2016]
42 | Predicting click intention from pupil dilation, EEG and gaze tracking | - | Pupil, EEG, Gaze | [Slanzi et al., 2017]
43 | Relationship between affect and ratings of visual complexity suggests an 'arousal-complexity bias' | - | Pupil | [Madan et al., 2018]
44 | Review of low-cost eye trackers | - | Gaze | [Ferhat and Vilarino, 2016]
45 | Review of pupil dilation in sensing cognitive load | - | Pupil | [Einhauser, 2017]
46 | Gaze tracking, attention and consumer behaviour | - | Pupil, Gaze | [Rosa, 2015]
47 | Effects of brightness, contrast and hedonic content on pupil diameter | - | Pupil | [Bradley et al., 2017]
48 | Effects of music on pupil size | Average | Pupil | [Laeng et al., 2016]
49 | Using the startle eye-blink to measure player affect in games | - | EMG | [Nesbitt et al., 2015]
50 | Pupillary responses to music | - | Pupil | [Gingras et al., 2015]
51 | Recommendations for using neurophysiological signals | - | Various | [Brouwer et al., 2015]
52 | Pupil dilation to measure cognitive effort | - | Pupil | [van der Wel and van Steenbergen, 2018]
53 | Stress detection in health | - | Various | [Greene et al., 2016]
54 | Emotion recognition from pupil dilation and EEG | - | Pupil | [Lu et al., 2015]
55 | Role of image duration, habituation and viewing mode (active/passive) of affective pictures | - | Pupil | [Snowden et al., 2016]
56 | Eye movement and fixation to sense affect in emotive pictures | Average | Pupil, Gaze | [Simola et al., 2015]
Table 2.3 shows that, of the 56 related works, 28 were theoretical studies, while the
other 28 were applications or systems demonstrating the use of affect detection in
different domains. Of the 28 theoretical studies, seven were reviews of and
recommendations for affective computing, while the other 21 were theoretical concepts
with empirical studies. Surprisingly, of the 28 applications or systems implemented for
affect detection, we found only one that was carried out in the wild: logging the
affective state and focal attention of gallery viewers using an eye tracker [Calandra
et al., 2016]. Regarding the analysis methods, 9 studies made use of machine learning,
mostly with multimodal sensors for affect detection, while 34 studies made use of other
analysis methods (statistical, probabilistic or mathematical models). The Novelty
column of table 2.3 summarises each work's contribution relative to our research.
From the seven reviews in Table 2.3: Sioni and Chittaro reviewed various sensors for
stress detection, noting its application areas, including learning, communication and
health, and listing pupillary response as one of the emerging technologies for stress
detection [Sioni and Chittaro, 2015]. Similarly, Greene et al. reviewed sensors for stress
detection with a focus on health applications [Greene et al., 2016]. Brouwer et al.
reviewed neurophysiological signals, especially noting the pitfalls (e.g. accuracy
measures, purpose of use, confounding factors, statistical methods) to consider for an
affect-enabled application [Brouwer et al., 2015]. Rosa reviewed eye-tracking technology
in a bid to reveal the methodological and technical challenges of inferring cognitive and
emotional processing in consumer behaviour [Rosa, 2015]. Einhauser reviewed
techniques, applications and physiological responses of the pupil to cognition, emotion,
attention and memory [Einhauser, 2017]. Another study, by van der Wel and van
Steenbergen, investigated cognitive control tasks and revealed a diverging relationship
between pupil dilation and performance [van der Wel and van Steenbergen, 2018].
Finally, as our contribution aims to select an affect detection mechanism with the
potential for ubiquitous use, Ferhat and Vilarino's review of low-cost eye trackers is
relevant: it highlights their technologies from various perspectives (calibration
strategies, head-pose invariance and gaze estimation techniques) [Ferhat and Vilarino,
2016].
Other empirical studies related to our research includes the work of Matsumoto et al.
who studied the relationship between pupil dilation, task difficulty and habituation
CHAPTER 2. BACKGROUND AND RELATED WORK 59
[Matsumoto et al., 2016]. Similarly, Snowden et al. studied habituation in the viewing
of affective pictures [Snowden et al., 2016]. In studies that performed comparisons
between sensors, Wang et al. compared the relationship between arousal and pupil
dilation, HR and GSR [Wang et al., 2018]. Similarly, Ragot et al. performed a compar-
ison between Biopac MP150 (laboratory sensor) and Empatica E4 (wearable sensor)
trained using machine learning models and found similar accuracies between the two,
thereby, supporting the viability of emotion recognition in the wild [Ragot et al., 2017].
Light represents the largest source of noise when extracting physiological response to
arousal [Bradley et al., 2017]. Hence, there were several studies that researched on
models relating (or decoupling pupil) dilation as an affective response and pupillary
response to light or brightness [Pfleging et al., 2016, Duchowski et al., 2018, Raiturkar
et al., 2016, Bradley et al., 2017]. Some other studies researched the relationship be-
tween pupil dilation and listening, sound for music applications [McGarrigle et al.,
2017, Laeng et al., 2016, Gingras et al., 2015]. Further, empirical studies were done
relating pupil dilation with physiological arousal [McGarrigle et al., 2017] and cog-
nition [Duchowski et al., 2018, Korn and Bach, 2016]. Other more specific variables
that were investigated against pupil dilation include task difficulty [Matsumoto et al.,
2016], mental/ workload [Pfleging et al., 2016, Buettner et al., 2018], confidence of
decision making in tasks [Colizoli et al., 2018], surprise [Alamia et al., 2019], visual
complexity rating [Madan et al., 2018], memory capacity [Aminihajibashi et al., 2019],
attention and experience [Wahn et al., 2016].
Regarding the studies that investigated physiological means for affect detection, all
but one was implemented in a naturalistic setting. Calandra et al. designed a system
called EYECU (pronounced I see you) to capture the affective responses of users as
well as their attention on paintings [Calandra et al., 2016]. Other related works on
applications in the lab that do not include the use of pupillary response were that of
thermal imaging [Latif et al., 2015] and EMG [Nesbitt et al., 2015] for affect detection,
and smartwatches [Sanchez et al., 2018], mobile typing pressure [Exposito et al., 2018],
mouse and gaze tracking [Wang et al., 2019] to detect the presence of stress. Some
applications combined pupillary response with other mechanisms like facial expres-
sion [Tangnimitchok et al., 2018], gaze tracking [Calandra et al., 2016, Choe et al.,
2016, Wolf et al., 2018, Slanzi et al., 2017], thermal imaging [Baltaci and Gokcay,
CHAPTER 2. BACKGROUND AND RELATED WORK 60
2016], EEG [Slanzi et al., 2017], fMRI (functional magnetic resonance imaging) [Park
and Kim, 2016] and HR [Jercic et al., 2018] to sense arousal and cognitive load.
With regards to the analysis methods used, many of the approaches used measures
of central tendency [Tangnimitchok et al., 2018, Henderson et al., 2018, Abdrabou
et al., 2018, Colizoli et al., 2018, Wang et al., 2018, Laeng et al., 2016, Simola et al.,
2015]. Other computational methods/techniques used by these studies include ma-
chine learning/AI [Latif et al., 2015, Xing et al., 2016, Sanchez et al., 2018, He et al.,
2018, Ragot et al., 2017, Chmielewska et al., 2019, Baltaci and Gokcay, 2016, Babiker
et al., 2015, Lu et al., 2015], Bayesian surprise [Wolf et al., 2018], Hilbert transform
[Alhargan et al., 2017] and pattern recognition [Wang et al., 2019].
2.6 Rationale for pupillary response
Pupillary responses in affective computing refer to the changes in pupil diameter that
are related to responses to emotional stimuli [Partala and Surakka, 2004]. The pri-
mary function of the pupil is to regulate the amount of light coming from the cornea
towards the retina 2. However, psychologists have long used pupillometry to measure
changes in autonomic activities of the nervous system [Steinhauer and Hakerem, 1992].
In a controlled experiment where the amount of light could be regulated, the pupillary
response is a useful affect discriminator for arousal [Pfleging et al., 2016]. Also, in
natural settings, the use of web camera’s could be explored due to its availability and
simplicity of use [Calandra et al., 2016].
Eye-tracking is used to observe visual behaviours to understand responses around the
user’s area of interest, user’s scan paths, the timing and variability between the two
[Pantic et al., 2007]. This has been studied in HCI to improve several aspects of user
interaction. Some of these ways include identifying areas of the visual stimulus that
has caused the user to exhibit a particular behaviour such as prolonged eye gaze or
frequent scans along the area [Davies et al., 2016]. It has also been used to observe
how users transition between areas of interest so that user interface elements that
are frequently used will be placed in more conspicuous locations or elements that are
of high value will but hidden will be placed in areas where users record more eye
2http://medical-dictionary.thefreedictionary.com
CHAPTER 2. BACKGROUND AND RELATED WORK 61
gaze [Eraslan et al., 2016b]. Pupillary response conversely will offer the subjective
behaviours of users, such as providing more understanding of what aspects of user
interaction are related to specific affective states [Yaneva et al., 2016a].
This research aims to contribute to the understanding of how pupillary responses cor-
responds to affective states. This will aid in the analysis of visual behaviour, especially
their affective response to stimuli.
2.7 Representing affect
As early as 2001, there were already more than 90 definitions of emotions, and we
would expect in its representation, there are at least as many variations, some only
different in the lexicon or application [Plutchik, 2001]. The oldest representation of
emotion is called discrete emotions. Paul Ekman proposed that there are six discrete
emotions, namely: fear, anger, sadness, happiness, surprise and disgust [Ekman and
Friesen, 1971]. However, Izard and Caroll E. postulated that there are four more
including interest, contempt, shame, guilt [Izard, 1992]. Later on, Allen applied
3
Figure 2.4: Facial expressions for discrete emotions
discrete emotions to product review scales. There have been several other suitable
applications of discrete emotions, but the most notable one is the classification of
facial expressions into discrete emotions by Paul Ekman, considered to be a pioneer
in the study of emotions [Allen et al., 1988]. Discrete emotions is a natural way,
familiar to humans of describing emotions; however, semantic ambiguity often makes
it challenging to quantify and represent emotions in affective computing.
Founded by Russel, the dimensional way of representing emotion is based on the
concept that emotions can be represented along 2 or 3 dimensions [Russell and Pratt,
1980]. The two dimensions are valence/pleasure (how pleasant/unpleasant the emotion
is) and arousal (the intensity of the emotion). Another dimension was proposed by
CHAPTER 2. BACKGROUND AND RELATED WORK 62
Mehrabian and Russel because it was observed that a third dimension, dominance
(the measure of control/submissiveness to emotion) was needed to distinguish between
some emotions with similar plots on the two-dimensional scale [Mehrabian and Russell,
1974, Russell and Mehrabian, 1977]. The dimensional representation is frequently
used in affective computing because it can be applied using mathematical expressions,
formulas and models. Some limitations of this method of representing affect are that
it is too fine-grained, and even though emotions are continuous quantities, they are
not well understood in ways that can effectively utilise in such low granularity scale.
These dimensional scales are also not natural in the way human beings understand
emotions to be.
The component model is another way emotions have been represented. Plutchik
4
Figure 2.5: 2D representation of emotion
proposed the emotion wheel, analogous to the colour wheel in 2D perspective, and
the cone visualisation in 3D to understand relationships between emotions [Plutchik,
1980]. He opined that there are eight emotions known as the basic emotions, and a
mixture of 2 basic emotions could form a dyad. For example, submission = trust +
fear. There is also the notion where emotion is the inverse of another, i.e. each of
the primary emotions, have their opposite states as seen in surprise and anticipation.
Also, those emotions could manifest with different levels of intensity, just like serenity
is of lower intensity to ecstasy. In the conical model (3D), the apex of the cone is
the most neutral state while in the colour wheel, the intensity of emotions reduces
towards the edges. [Ortony et al., 1990] postulated the OCC model, which is similar
CHAPTER 2. BACKGROUND AND RELATED WORK 63
5 6
Figure 2.6: Plutchik’s emotion wheel and cone representation of Plutchik’s emotions
to discrete emotions in that there are 22 emotion states. The difference is that the
emotional states are considered to be a possible consequence of an event, which may
be experienced due to: 1. The consequence of event (good or bad) 2. The action
of the agent and 3. Aspects of objects like appealing or not. Several researchers
have criticised the ambiguity in this model and therefore proffered revised versions
like the one in [Steunebrink et al., 2009]. Despite its complexity, it has been useful in
simulating and predicting emotions in several affect-aware applications [Ko lakowska
et al., 2015].
The next section reviews several applications of affective computing, by domain area.
2.8 Applications of affective computing
All affect detection mechanisms have specific applications they are suitable for regard-
less of their limitations. There is no formally agreed classification of the applications of
affective computing. Rosalind Picard classified applications according to their abilities,
so an application could have the ability to recognise, express and have emotions while
Schwark proposed a more elaborate taxonomy in affective computing where questions
CHAPTER 2. BACKGROUND AND RELATED WORK 64
are answered from the bottom up which then makes up an application definition [Pi-
card, 1997, Schwark, 2015]. The questions are to determine, from bottom to apex, its:
purpose/goal, level of integration, affective understanding, affect generation and the
platform. Regardless of the semantics for our choice of classification, we will discuss
the most common applications of affective computing by domain.
Entertainment is one area affective computing has been applied. An example of its
use in entertainment is in music and movies. These forms of entertainment have be-
come very accessible and portable through music players, computers, internet stream-
ing and radio station. The numerous options available affords listeners and viewers
the opportunity to be selective, but the question is, what criteria should they use
when selecting one for a moment? Bearing in mind that tracks are being released at
a pace that no single person can test all of them to learn by experience which track,
album or artist is suitable for the moment. This has motivated the use of machine
learning approaches to classifying music and movies into the kind of emotions they
elicit. Furthermore, it is prevalent in the studies of affect detection to use sounds and
music to elicit emotions [Yazdani et al., 2012]. Some of the research in this domain are
cited here [Yang and Chen, 2012, Burger et al., 2013, Janssen et al., 2012, Daimi and
Saha, 2014]. One of the applications of affective computing in computer science
is adaptive systems. In adaptive systems, several aspects of interaction are moni-
tored such as attention, planning, learning, memory and decision-making [Dalvand
and Kazemifard, 2012]. Although there exist acceptable standards of design centred
around accessibility, usability and ethical principles, to a large extent, these rules are
agnostic of individual idiosyncrasies of each user [Van Schaik and Ling, 2008]. These
formally agreed principles are guidelines that generated for most users, but there will
always be individual specific desires and preferences regarding application features,
design, presentation and mode of interaction. Just like in human-human interaction,
it is a good idea to start with certain heuristics for interacting with people then as the
interaction progresses and better understanding is developed, the manner of commu-
nication is gradually tailored to suit individual preferences. The idea is not to discard
the already established heuristics because they provide a starting point for user inter-
action but to develop a responsive and dynamic interaction that improves over time
by learning individual preferences.
CHAPTER 2. BACKGROUND AND RELATED WORK 65
When building an adaptive system, it is important that the mode in which the system
learns about the user does not introduce more problems by making the user having
to put on discomforting gadgets, increasing the number of tasks, compromising user
privacy or causing a bias that could skew the intended outcome of the system. In
an adaptive system, the selection of affect detection mechanism goes a long way in
determining its success because as seen in 1.1 affect detection is done twice within a
cycle, i.e. in detecting and evaluating the outcome.
Closely related to adaptive systems is usability and accessibility engineering, where
user interfaces are being tested to understand the visual and psychological behaviour
of users during human-computer interaction. Many applications of adaptive systems
have leveraged on the use of gadgets that contain pre-built sensors for detecting human
affect [Datcu, 2014]. However, user interfaces provide the avenue for users to perform
a variety of tasks such as browsing web pages, to sending emails, reading, watching
multimedia, chatting. Because the computer resources used to perform these tasks
are different, and their expected interaction behaviours are different, we need to make
use of a detection mechanism that factor in context, behaviour and affective state
into recognising and representing the user’s response. The eye tracker is a common
way of achieving this, but to study the subjective aspect of the user’s experience, a
different approach should be used to measure the physiological correlates to the user’s
affective state. Some of the studies in this area include Lunn et al. [Lunn and Harper,
2010a] who used GSR to identify areas of frustration in older web users, Fernandez et
al. [Fernandez et al., 2012] who used carried out a survey on stress during decision
making using stock traders and how stress affects their emotions and decision making
as a case study. Another study based on how cognitive load affects decision-making
was carried out on users of Computer Aided Design (CAD) software [Liu et al., 2014].
EEG, GSR and ECG sensors were used to collect psychophysiological data on the
participants, and fuzzy logic was used to model these physiological responses to frus-
tration, satisfaction, engagement and challenge.
Affect-aware games are games that are designed to adapt to the affective state of a
player. The goal in such applications is to ensure that the user remains challenged
and entertained [Hudlicka, 2009]. To accomplish this, the system needs to detect bore-
dom, loss of focus, frustration and any of such affective states that could cause user
CHAPTER 2. BACKGROUND AND RELATED WORK 66
disengagement [Gilleade et al., 2005]. When unwanted affective states are recognised
by the system, several actions such as increasing or reducing game complexity, reward
and punishment, providing hints can be used to induce a more palatable emotional
state [Liu et al., 2009, Rani et al., 2005]. Another application of affective computing in
games is in empathetic agents. Especially in role-playing games, the characters need
to display an emotion that closely depicts the situation of a game[Kim et al., 2004a].
In the learning domain, affective computing has been called affective learning, Intel-
ligent Tutoring Systems (ITS) and Affective Tutoring Systems (ATS). They are either
concerned with the detection of student’s affective state [Sarrafzadeh et al., 2006],
inducing desirable affective states in students [Alexander et al., 2003], enabling tu-
toring agents with emotions [Merrill et al., 1992] or a combination of those features
[Mao and Li, 2010]. In Medicine, affective computing has been used to study the
correlation between psychosomatic illnesses and the users’ affective state. It has also
been used to monitor the treatment of such illnesses on a longitudinal and non-invasive
basis [Bamidis et al., 2004, el Kaliouby et al., 2006]. Also, affective computing and
psychophysiology are used in medical and health domain to understand the way the
brain and nervous system function in particular through the use of EEGs [Wolpaw
and McFarland, 1994, Neuper et al., 2003].
2.9 Summary
Our literature review reveals a research gap that limits the potential of affecctive com-
puting. The goals of affective computing are for computers to be able to recognise
human emotions and adapt to it by altering aspects of user interaction or by ‘showing’
emotions to simulate human empathy, all in a bid to improve the quality of human-
computer interaction. The current gap in affective computing is the challenge of affect
detection. For affective computing to fulfil its potential, further research is needed on
affect detection mechanisms that have the potential for wide-spread ubiquitous use.
Consequently, methods of detecting affect were critically reviewed to select a suitable
affect detection mechanism for use in both experimental conditions and natural set-
tings during interaction with user interfaces and visual contents.
CHAPTER 2. BACKGROUND AND RELATED WORK 67
Our review and background study shows that pupillary responses provide an unobtru-
sive approach to affect detection during the human-computer interaction. It offers the
opportunity for controlled laboratory studies as well as the potential for use in natu-
ralistic settings through the use of web cameras. Furthermore, through the analysis of
gaze behaviour, we can detect the users’ focal attention during moments of increased
arousal.
The next chapter presents our research methods in detail.
Chapter 3
Development of AFA algorithm
In Chapter 2, we discussed related work and systematically reviewed existing mecha-
nisms for sensing affect. Pupillary response, and gaze behaviour was selected as our
preferred approach to sensing arousal because it is unobtrusive and has the potential
for deployment in the wild. Further, the analysis of gaze behaviour adds context to our
measure of arousal. We start off by discussing the existing pupillometry devices, their
uses and limitations. Next, we explore the pupillary response data through a secondary
analysis of eye-tracking datasets. The findings from our exploration study informed
the next section, where we described the structure and characteristics of pupillary
response data. Next, we describe some of the techniques that have previously been
used to analyse pupillary response to extract affective signals from it. Based on the
lessons we learnt from the exploration of some of these techniques in our secondary
data exploration, we decided on our preferred technique for sensing arousal. Next,
we explain our method for analysing pupillary response to sense changes in arousal
in more detail. We conclude the chapter by laying out the plan for evaluating AFA
algorithm.
3.1 Pupillometry
Pupillometry is the study of changes in the diameter of the pupil as a function of cog-
nitive processing [Sirois and Brisson, 2014]. Pupillometry is of interest in the fields of
medicine, neuroscience, physiology and psychology. We aim to leverage the literature
on the physiology of the pupils, and the psychological principles behind physiological
68
CHAPTER 3. DEVELOPMENT OF AFA ALGORITHM 69
responses, that relate to human-computer interaction. Mathot et al. categorised the
causes of pupillary response into three, namely: the pupil’s light response, the pupil’s
near response and the pupil’s response to arousal/mental effort/cognition [Mathot,
2018]. In the first case, the pupil responds to brightness by constricting in size to
accommodate less light. For the second, whenever an image is too close, the pupil
constricts to maintain focus and to sharpen the image. Finally, of the three cate-
gories, arousal/mental effort and cognitive load is the main factor that causes the
pupils to dilate. See Chapter 2 for a detailed description of the physiology of pupil
dilation. This section aims to list out the mainstream eye-tracking devices. Some
of the most popular manufacturers of eye-tracking equipment include Tobii, Natural
Point, Eyetribe, SMI, Eyelink and Gaze Point. Table 3.1 shows a table comparing
products, their sampling rates - F (Hz), portability, applications and their support for
pupil dilation.
CHAPTER
3.DEVELOPMENT
OFAFA
ALGORITHM
70
Table 3.1: Comparison of eye-tracking vendors
OEMs Product F (Hz) Portable? Application PD?
Tobii Pro glasses 2 50 & 100 YesMobile tracking of attention, engagement, training, skill
transfer and performance enhancementYes
Pro Spectrum 600 & 1200 NoHigh fidelity research with synchronization to external
sources: EEG, GSRYes
Pro X3-120 120 YesIt is designed for detailed research into the timing and
duration of fixations on screen-based stimuliYes
Pro X2 30 & 60 YesIt is ideal for usability and market research studies in the
fieldYes
Pro T60XL 60 No
Measure gaze behaviour over widescreen angles and large
stimuli for a broad range of
psychology and neuroscience research.
Yes
Pro TX-300 300 No
Study occulomotor functions and
capture natural human behavior
without the need for chin or head rest.
Yes
Dynavox PCEye Plus 30 YesTo aid accessibility and control user interaction on the
laptop or desktop using the eyesNo
CHAPTER
3.DEVELOPMENT
OFAFA
ALGORITHM
71
Table 3.1 – Continued from previous page
OEMs Product F (Hz) Portable? Application PD?
PCEye Mini 30 YesTo aid accessibility and control user interaction on the
tablet PC using the eyesNo
Eye Tracker 4C 90 YesUsed as an eye tracking peripheral to improve gaming
experienceNo
Natural Point TrackIR 120 Yes For tracking head movement as a gaming accessory No
SmartNav 3 & 4 NA YesAn assistive technology to aid accessibility for cursor
controlNo
Eyetribe NA NA YesProvides an SDK to either develop applications based on
gaze behaviour or capture the data for research purposes.No
SMI iview X 1250 Novideo-based
tracking with chin restYes
Glasses 120 YesMobile tracking of natural gaze behavior in the wild, with a
virtual reality settingNo
CHAPTER
3.DEVELOPMENT
OFAFA
ALGORITHM
72
Table 3.1 – Continued from previous page
OEMs Product F (Hz) Portable? Application PD?
HTC Vive 250 Yes
To perform immersive scientific grade research. For a totally
controlled naturalistic study by making participants
immersed into the stimuli to understand perception,
visual search, UX studies and clinical research.
No
Eyelink Portable duo 2000 Yes
It can be used for eye-movement research, both in and out
of the lab. Can be programmed with its SDK using several
programming languages and on multiple operating systems.
Yes
100 Plus 2000 YesVideo-based tracking in head-supported or head-free mode.
Device can be configured in different mount modes.Yes
Eyelink II 500 Yes Head mounted video based and scene tracking Yes
Gaze Point GP3 HD 60 & 150 Yes
A research-grade eye tracker for usability
and UX studies and recommended for
programmers who want to develop
gaze-based applications
Yes
Laptop Mount NA Yes For use with laptops and notebooks NA
CHAPTER 3. DEVELOPMENT OF AFA ALGORITHM 73
There are some solutions for low-cost eye-trackers or web-cameras but they are not
yet of commercial grade [Fuhl et al., 2018, San Agustin et al., 2010, Kassner et al.,
2014, Mantiuk et al., 2012]. For all our studies, we made use of the Tobii X-60 and
captured data at 50Hz. Our experimental set up is shown in Figure 1.1.
.
Figure 3.1: Setup of eye tracker
Having itemised the various pupillometric devices, in the next section, we proceed
to explore several features of pupillary response data that can be used to understand
users’ affective response towards visual interaction.
3.2 Exploring pupillary response data
To explore the dynamics of pupillary response, we extracted data from a pre-existing
study. The original aim of this study was to understand the visual behaviours of
medical experts as they interpret ECG scans to improve the accuracy of ECG inter-
pretation. There were 31 participants - 23 (74%) female and 8 (26%) male. Most
of these participants (74.2%) were cardiac physiologists/technicians and students of
cardiac physiology, while the remaining 25.8% were of other health-related professions,
including nurses, doctors, and students. Students make up 12.9% of participants. All
participants had some training on ECG interpretation, although varying in level and
experience, Mdn = 7years(2− 35). The stimuli presented to the participants were 18
ECG scans in random order, without time limit, until they made their interpretation
of each scan. Further details about the initial study can be found elsewhere [Davies
CHAPTER 3. DEVELOPMENT OF AFA ALGORITHM 74
et al., 2016]. For our exploration, we selected two ECG stimuli, both having 12 leads
(grids that represent segments in the heart signal). Those two stimuli were selected
because they had the closest number of correct as incorrect interpretations so that
the analysis is balanced between these groups for statistical validity. Each stimulus
was split into different AOIs that correspond to each of the 12 ECG leads, using the
eye-tracking software see Figure 3.2. This makes it possible to relate areas where
Figure 3.2: Areas of interest overlaid on a 12-lead ECG
participants gazed at, to a specific lead on the ECG. The questions this secondary
data analysis aimed at exploring was, “is there a statistically significant difference in
the pupillary response of those who got it correct from those who got it wrong?”.
If there is, “can we gain a better understanding of affective states by analysing the
pupillary response of people who got it correct against those who got it wrong?”. This
may be possible if we assume that there is increased anxiety and cognitive load in
people who have limited understanding as opposed to people who indicate what the
ECG scan represents. It could also be hypothesised that those who are reputable or
experts at reading ECG scans will experience more stress due to the pressure on them
to have high ECG interpretation accuracies. Notwithstanding these hypotheses, the
final question is, “can we predict using statistical features such as measures of central
tendency (mean, median and mode) and measures of deviation (mean and standard
deviation) of the data, whether a medical practitioner will get an ECG interpretation
correct?”
In this exploration, we used an ECG scan of one of the conditions known as the an-
terior stemi. We used this because the stimulus was almost evenly split amongst the
CHAPTER 3. DEVELOPMENT OF AFA ALGORITHM 75
31 participants (16 of them interpreted the scan correct to be anterior stemi while
15 of them got it wrong). A feature on the eye tracker provides an accuracy level
by considering the number of blinks, tilting of the head and eye gazes that are not
within the eye tracker’s detection range. One of the participants that got it correct
was removed due to low accuracy based on this criteria. Eliminating the participant
left the data equally split into 15 correct, 15 incorrect participants.
The data was extracted from the eye tracker and loaded into Python IDE for explo-
ration. After cleaning the data by removing data points with quality less than 70%
(according to Tobii eye tracker), Table 3.2 shows a statistical description of the data
for the left and right pupil grouped by participants accuracy (correct, incorrect). It
Left pupil Right pupil
Statistical measures Correct Incorrect Correct Incorrect
Total count 125259 160230 125390 158337
Mean (mm) 3.73 3.55 3.72 3.55
Std (mm) 0.60 0.47 0.67 0.5
Min (mm) 1.66 1.30 1.00 1.25
Max (mm) 6.39 6.68 6.88 6.39
25% (mm) 3.26 3.24 3.25 3.15
50% (mm) 3.61 3.58 3.57 3.55
75% (mm) 4.06 3.90 3.97 3.88
Table 3.2: Statistical description of the pupil diameter
could be observed that the mean pupil diameter is higher for the group that got it
correct - Left (M=3.73, SD=0.60), Right (M=3.72, SD=0.67) compared to those that
got it wrong - Left (M=3.55, SD=0.47), Right (M=3.55, SD=0.50). It can also be
observed that the standard deviation of those who got it correct is greater than those
who got it wrong. A possible explanation could be that those who got it wrong were
scanning for clues on the ECG scan, and during their search, their pupil size remained
the same and because they found no information of interest (minimal/no arousal);
hence they have less standard deviation than those who got it correct.
We can see from Figure 3.3 that the data is not normally distributed. Therefore, we
CHAPTER 3. DEVELOPMENT OF AFA ALGORITHM 76
performed a non-parametric test of difference to see if there is a statistically signifi-
cant difference between the correct and incorrect participants in terms of their pupil
dilations.
The result of the parametric test is given below:
Figure 3.3: Distribution plot on the left pupil diameter (mm) of correct participants
Left Eye (correct vs. incorrect) U(8728562295.0), p < 0.01
Right Eye (correct vs. incorrect) U(8811297026.5), p < 0.01.
The result supports our hypothesis that the pupil dilation could be a discriminator of
participants who got the ECG readings correct from those who got it wrong.
To explore modelling techniques to be used as our discriminators, we extract certain
features from our datasets. The aim here is to utilise features that are discriminants of
those who got it correct from those who got it wrong. We carry out some pre-processing
on the data. Firstly, a triangulation-based smoothing technique was implemented on
the dataset. The smoothing technique works by triangulation to compute an average
value so that the resulting data is less noisy. The smoothing function uses a win-
dow size so that within the window size, rise/ fall in the curve will be smooth. The
translation of this regarding the physiology of the body is that several autonomic re-
sponses which could occur within seconds can be aggregated into signals that can be
recognised by the system as affective states. The accumulation of autonomic responses
CHAPTER 3. DEVELOPMENT OF AFA ALGORITHM 77
together makes it possible to understand the user’s attitude, while the accumulation
of patterns in attitudes form an affective state, see 2.1. Another advantage of this
smoothing technique is that it reduces the effects of outliers due to noisy data. Figure
3.4 and 3.5 show the effect of smoothing on our dataset. Selecting the correct window
size depends on the frequency rate of the datasets, i.e. how many data points were
captured per second by the eye tracker.
Using the smoothing function, it is now possible to see the troughs in graph for window
size (d = 10) at time (t = 60, 140, 160), and the peaks at t = 150, 170.
The next question is, are these features sufficient to detect or predict participants who
got it correct or wrong? In answering this, statistical features of the datasets were
extracted from the unrefined data and the resulting data after windowing is applied.
A total of 12 features were extracted: mean, median, the standard deviation of the
left and right pupil in both their unrefined and smoothed/windowed state. To know
the features that have the best relationship (negative or positive) with the accuracy of
the participants, we computed the correlation between each feature and the accuracy.
Table 3.3 shows the matrix of the correlation of features. In the next stage, our goal
was to eliminate the features that least discriminate between the participants who got
it correct and those that got it wrong. The metric that was used is Pearson’s corre-
lation. From the table 3.3, it shows the features that least correlate with accuracy:
the standard deviations of both pupils and the standard deviation of their smoothed
state. All others have a higher correlation of at least 0.19.
Next, we try out machine learning classifiers to discriminate between correct and in-
correct predictions, based on features of the pupil dilation.
Our model was trained using logistic regression, KNN, SVM and linear regression.
The higher accuracy was found in logistic regression (52.50%) while the highest was
KNN with 57.22% (at k = 3). All accuracy tests were done using cross-fold validation
(n = 10).
In summary, the secondary analysis of pupillary response data was used to discrim-
inate between medical practitioners that interpreted ECG scans correctly and those
that got it wrong. our model yielded the highest accuracy of 57.22% when using KNN
classifier. we discovered that measures of central tendency such as mean and median
on both the unrefined and windowed form data had a stronger correlation with the
CHAPTER 3. DEVELOPMENT OF AFA ALGORITHM 78
Figure 3.4: Plot of pupillary response (mm) against time (ms).
Figure 3.5: Plot of pupillary response (mm) against time (ms) after applying smooth-ing function with window size (d) = 1, 3, 5, 10.
accuracy of the participants than measures of variance. Windowing allowed us to
CHAPTER 3. DEVELOPMENT OF AFA ALGORITHM 79
observe an aggregation of data points. Assuming that the data points represent au-
tonomic responses (pupil dilation and constriction) the appropriate window size will
enable us to measure more reliably, the affective states of the participants, window
by window. Recall from the background section, that the aggregation of autonomic
responses represent a physiological reaction (2.1). It is this physiological response that
we plan to correlate with users’ affective state in our research.
3.3 Description of pupil dilation
The average range of the pupil size is estimated to be between 2-4mm in bright light
and 4-8mm in darkness, which means that the pupil, at any given time, could be
between 2-8mm [Walker et al., 1990]. Anatomical differences between individuals of
different demographics (race, gender, age) also mean that this range varies from in-
dividual to individual. Some studies refuted the hypothesis of cultural differences in
valence and dominance of an emotion [Ekman et al., 1987]. However, some others
suggested that the absolute intensity of emotion could vary by culture [Russell, 1994].
A study found that people’s reaction to an emotion change with age but it depends
on the valence of the emotion [Charles et al., 2001]. Other factors such as experience
level, intelligence, personality traits may influence the affective state of a user [Picard,
2003].
More so, the range and rate of change can be influenced by the type of stimulus, pre-
vious state (cognitive, affective, experience) of the individual and their environment
(ambience) [Wilder, 1958]. Even when all these conditions are constant, certain health
conditions prevent the pupils from responding predictably. To compound this, there is
a condition called anisocoria that exists in one out of five people where both pupils do
not follow the same physiological pattern of response [Ettinger et al., 1991]. Further-
more, data from the eye tracker can be noisy due to machine errors caused by blinks,
geometry change (position and distance) between eye tracker and user, and indistinct
pupil colours.
In summary, from the literature, and what we learnt from our data exploration in
Section 3.2, the pupillary response is characterised by a noisy time series data with
variability within and between individuals and between different stimuli types. Next,
CHAPTER 3. DEVELOPMENT OF AFA ALGORITHM 80
we examine how the pupillary response has been analysed by previous works.
3.4 Existing approaches to the analysis of pupil
data
In some studies that used multi-modal approaches, pupil dilation contributed the
greatest effect on the accuracy of their models, compared to the other sources of affec-
tive signal [Baltaci and Gokcay, 2014, Soleymani et al., 2012]. Since we have previously
eliminated multi-modal approaches due to availability constraints and ecological valid-
ity (costs, required skills to set-up, obtrusiveness), we will focus on uni-modal methods
that make use of pupillary responses alone.
Many approaches have been taken to analyse pupillary response data. Wang et al.
suggested three approaches for the analysis of pupil dilation: 1. mean pupil dilation,
2. latency to peak and 3. peak pupil dilation [Wang, 2011]. In controlled settings,
where participants interact with a stimulus for a fixed period, the average pupil
size of one eye can be taken, for each participant. Then, the average pupil dilation is
compared against a baseline. The baseline is often the average pupil size of a partic-
ipant while interacting with a controlled stimulus. The controlled stimuli could be a
grey background for which the participant gazes for about ten seconds. Wang et al.
used the pupillary response to discriminate cognitive workload under the influence of
confounding factors such as luminance conditions and emotional arousal [Wang et al.,
2013]. Wang et al. proposed using machine learning algorithms with pupil dilation
features to classify cognitive workload under the influence of these confounders. Iqbal
et al. discovered that using the percentage change in pupil size (PCPS) between the
task and the baseline by averaging the pupil dilations over time, is not an effective
discriminator of mental workload [Iqbal et al., 2004]. They suggested that this could
be due to longer tasks not having sustained pupillary responses and by including pe-
riods where the pupil size has dropped, it will significantly reduce the effect observed
from the data. After decomposing the tasks into smaller tasks, they were able to
observe differences in pupillary response. Partala et al. induced participants with
emotional sounds vs neutral sounds for a fixed duration [Partala and Surakka, 2004].
They observed that participants had higher average pupil dilations for the emotionally
CHAPTER 3. DEVELOPMENT OF AFA ALGORITHM 81
arousing sounds. Bradley et al. displayed emotionally arousing pictures from the IAPS
for a fixed duration of time and used the average pupil size as a discriminator [Bradley
et al., 2008a]. There was a significant difference in pupil dilation when participants
viewed neutral pictures compared to emotional pictures. The drawback of using the
averaging approach is that in naturalistic settings, interaction lasts for variable peri-
ods that may not be predetermined. Also, peoples’ baselines change depending on the
stimulus type and according to the previous affective state [Wang, 2011]; this is known
as the law of initial value (LIV). Another limitation of taking the average of the pupil
size is that the periods in which the participant do not experience increased arousal
are included in the average, thereby, weakening the strength of the signal at the point
where there is an actual change in arousal. Some pertinent questions that challenge
the viability of this approach in the wild are, how will existing methods of detecting
arousal perform if participants interact with stimulus: 1. As long as they wish, 2.
With stimuli of different brightness, 3. Different stimulus types (cognitive, emotional,
etc.) or 4. Without the opportunity to take the pupil baseline measurements?
Wang et al. proposed using pupil dilations and gaze behaviour as machine learning
features [Wang, 2011]. In this approach, statistical properties of the data such as the
mean, standard deviation, and other gaze metrics like saccades, fixation duration are
input as features to the machine learning algorithm. The disadvantage of this ap-
proach is that machine learning is often unreliable when people exhibit idiosyncratic
behaviours [Savva and Bianchi-Berthouze, 2011]. Also, different stimulus types and
contexts could trigger different reactions. For example, when participants experience
stress due to a certain AOI on the screen, they may have longer fixations and microsac-
cades around the same AOI since the source of stress is located at a specific region.
However, when they are stressed due to a search task, people may exhibit a more ran-
dom fixation pattern around the entire screen because there is no specific region on the
screen causing the stress. In these two cases, saccades and fixation metrics could follow
different trends, thereby, sending inconsistent signals to a machine learning algorithm.
Finally, machine learning does not present a transparent way for exploratory research
because it often involves complex algorithms that are not well suited for explaining
the relationships between the features and the outcome variables, in this case, pupil
dilation and arousal.
CHAPTER 3. DEVELOPMENT OF AFA ALGORITHM 82
In AFA algorithm, we used peak detection to sense moments of increased arousal,
while the amplitude of the peak indicates the strength of the arousal, i.e. the arousal
level (AL). The aim of our method is to analyse pupillary response data from an eye
tracker and generate an output, that is an array of time index, for moments where
users experience an increase in arousal. For each item in the index, we would also
identify the area of interest (AOI) where the user focused their attention on, prior
to experiencing an increase in arousal. Furthermore, for each item in the index, we
quantify the magnitude of increase in arousal that the participant experienced. In the
next section, we discuss how we developed AFA algorithm iteratively using static and
interactive stimuli types.
3.5 Iterative development of AFA algorithm
We took a data-driven methodology and developed our arousal sensing method through
the secondary analysis of 2 different studies: study 1 (3.5.1) - ECG static images and
study 2 (3.5.2) - interactive user interface. In the first instance, we used the ECG
study’s dataset to sense arousal for what we call an atomic stimulus. In this concept
of our design choice, an atomic stimulus would be a stimulus where we expect the
interaction to accomplish a single specific task. In this task, the content remains the
same (even though regions on the screen may elicit different affective states from other
regions). A stimulus is no longer atomic when a task contains multiple objectives or
the entire content of the screen changes during the task (e.g. navigating to another
Web page). This distinction is necessary to preserve the law of initial value (LIV). LIV
was first postulated by Wilder et al., and states that the initial physiological values and
its corresponding change (in response to stimulus) are negatively correlated [Wilder,
2014]. Some other researchers, like Jin et al. challenged this relationship and provided
evidence of spurious effects from confounding variables [Jin, 1992]. As this debate is
outside the scope of our work, to avoid the effects of the initial value, we analyse each
stimulus atomically so that we identify a baseline within an atomic stimulus, then sense
changes within the atomic stimulus. For a picture viewing event, we take each image
as an atomic stimulus. We developed our method for analysing such interaction using
the ECG dataset discussed in Subsection 3.5.1. For user interaction involving multiple
CHAPTER 3. DEVELOPMENT OF AFA ALGORITHM 83
atomic stimuli, we can divide each task, such as web searching and filling in a form
to be different atomic tasks/stimulus. We developed AFA algorithm for segmenting a
session of user interaction into atomic tasks in the second study with a study that used
an interactive user interface as stimuli. We expand more on this in subsection 3.5.2.
3.5.1 Study 1
For this study, we reused the dataset that was described in Section 3.2. The eye tracker
captured data at a frequency of 50Hz, i.e. 50 records per second. The dataset for a
participant viewing a single stimulus over a 30 second dwell time will contain approx-
imately 1500 record, which is much more than the anticipated frequency of change in
arousal that could occur within 30 seconds. The relevance of analysing this data for
AFA algorithm was to use data-driven techniques to find out the optimal data aggre-
gation size and technique that most accurately detects when the participants felt an
increase in arousal. The ground truth we used here was the participants self-reported
arousal, and the medical experts annotation of their thought process regarding where
they looked at that informed their decision making for interpreting the ECG scan. We
explored two different fixed-size windowing techniques: simple moving averages and
non-overlapping windows. We aggregated both using three aggregate sizes of 5, 50
and 100 records. Data was collected at 50Hz, so these are equivalent to 0.1s, 1s and
2s respectively. For participants to make an accurate interpretation of the ECG scan,
there are leads that must be observed, as they reveal the abnormalities in an ECG
signal. Our hypothesis was that: Participants will experience an increase in arousal
when they gaze at these salient leads of the ECG (for instance, H, I and J are salient
leads/AOIs for the anterior stemi condition). To determine the optimal setting (win-
dowing technique and aggregate size), we randomly extracted participants data and
applied the algorithm under different settings. We used statistical specificity (recall)
to select the best configuration. The specificity was computed by
Specificity = Correctly detected peaks (True Positive)correctly detected peaks (True Positive)+ incorrectly rejected peaks (False Negative)
This was a suitable evaluation metric for us (rather than the precision, or F-value)
because our ground truth only stipulates some instances (not all instances) where we
expect increased arousal. Also, since the aim of the algorithm is to sense changes
CHAPTER 3. DEVELOPMENT OF AFA ALGORITHM 84
in arousal, we are not interested in using the algorithm to detect the true negative
rate (i.e. when there should not be a peak). However, to reduce the likelihood of
false positives, we selected the setting with the highest accuracy but fewest number of
peaks, as due to confounding factors, some peaks may not be an increase in arousal.
We selected the fixed non-overlapping window size over moving average because com-
pared to the moving average, this technique also summarises the data. AFA algorithm
performed better using 50Hz (1s) because the specificity was 100% compared to 83%
with 5Hz (0.5s). Although 100Hz (2s) also returned 100% specificity, it was difficult
to identify the most fixated AOI during a particular window because over a longer
period, and there could be multiple fixations on different AOIs. Also, with a window
size of 2s, there is an increased likelihood to miss out a peak in arousal because it
takes between 1 and 3 seconds for the pupil size to reach its maximum response (to
a stimulus). Considering that AFA algorithm was built using the opinion of the re-
searcher (also a medical practitioner) who collected the data, we decided to evaluate
the approach using the participant’s self-reported feedback and other variables. We
explore the following variables:
1. Accuracy: This refers to the correctness of their interpretation of the ECG
signal. It is vital to note here that some of the ECG scans were less specific to a
medical condition, and it was possible for participants to get the interpretation
partially correct. In such instances, participants were classified as incorrect,
as this is also an unsatisfactory situation in the real world. For accuracy, we
assumed that participants who got the interpretation wrong might have found the
task difficult, thereby, applying more mental effort. Our assumption is based on
several studies that show negative correlations between stress and performance
[WELFORD, 1973, Akgun and Ciarrochi, 2003, Lazarus et al., 1952]. Stress
results in increased arousal, while accuracy is a proxy for measuring performance
[Scholtz, 2006]. Therefore, We assign a weighting (w) of +0.5 to participants who
got the interpretation wrong, and -0.5 to participants who got it correct.
2. Time spent on interpretation: This is the total duration spent from the start
of stimuli presented to the end, in milliseconds (ms). We assumed here that
the time spent will increase the level of stress experienced by the participant
CHAPTER 3. DEVELOPMENT OF AFA ALGORITHM 85
because spending long on a problem indicates that they find it difficult, thereby,
demanding more cognitive effort. This correlation is evidenced in the literature
of usability metrics [Hornbæk and Law, 2007]. We assign a weighted value (w)
of +0.5 to participants who are in the first third for this variable, and -0.5 to
participants in the lower third.
3. Participant’s perceived difficulty of the experiment: We transcoded the
participant’s reported difficulty of the experiment on a scale of 1 to 10, with
one representing easy and ten indicating difficult. As evidenced in the literature
[Gellatly and Meyer, 1992], our assumption for this variable is that the difficulty
level would increase the level of arousal. We assign a weighting (w) of +0.5 to
participants who are in the first third for this variable and -0.5 to participants
in the lower third.
4. Experience level of participants: Participants included students, physiolo-
gists, nurses, cardiologists, healthcare assistant, etc. Participants years of expe-
rience ranged from 2-year nursing student to an advanced cardiac physiologist
with 30 years of work experience. Participant’s experience level was inferred by
their job type and years of experience. For example, a third-year medical stu-
dent would be assigned an experience level of 3 while a newly hired Cardiology
registrar will be assigned an experience level of 6, to account for the five years
of training. Consequently, the cardiac physiologist with 30 years of work expe-
rience was assigned an experience level of 35 to include five years in training.
Following evidence in the literature, we assume that experience level would in-
creases the anxiety, thereby increasing their arousal level [Wahn et al., 2016]. We
assign a weighting (w) of +1 to participants who are in the first third in terms
of experience, and -1 to participants in the lower third in terms of experience.
Finally, we sum the scores for each participant for the four variables. This sum
represents the expected level of stress (arousal) for that task. We compare this against
the number of arousal points (peaks) for each participant. Table3.4 shows the correla-
tion matrix and Figure 3.6 illustrates this in the form of a heatmap where the darker
cells indicate a higher correlation than the lighter cells. Figure 3.6 reveals that the
accuracy had the least correlation with arousal, followed by the time spent, then the
CHAPTER 3. DEVELOPMENT OF AFA ALGORITHM 86
Figure 3.6: Heatmap to illustrate our predictor variables(Exerpeince, Accuracy, Timespent, Difficulty), and the total stress score with our outcome variable (No. of Arousalpoints - peaks
participant’s perceived difficulty level. The experience level was the variable that had
the highest correlation to arousal. The moderate correlation between the expected
stress score and the number of arousal points was r(30) = 0.62, p <= 0.01. Using
this study, we developed an approach to evaluate static (atomic) stimuli. In the next
study, we examine the case of interactive stimuli.
3.5.2 Study 2
Having examined static (atomic stimuli), this study aims to develop an approach to
sense arousal from given interactive stimuli. The original aim of this experiment that
we adopted in this study was to evaluate the impact of a plug-in that was built to
support the users in authoring and managing ontologies. Participants were required to
CHAPTER 3. DEVELOPMENT OF AFA ALGORITHM 87
have expertise in ontologies. From the data, twenty-nine (29) participants’ eye-tracking
data were captured; fourteen (14) male, fifteen (15) female, between twenty-two (22)
and fifty-seven (57) years of age (mean 33.28). Participants interacted with Protege,
version 5.0, which is an open-source ontology engineering tool developed by the Uni-
versity of Manchester and Stanford University. Protege was pre-configured with the
inference inspector and a custom developed plugin Protege Survey Tool (PST). For
this study, the user interface was segmented into 5 AOI’s - view, progress, scenario,
question and action; Figure 3 shows this. Figure 3: Protege user interface with
Figure 3.7: Areas of interest overlaid on a Protege’s UI
Areas of Interest overlaid. See [Matentzoglu et al., 2016] for more information about
the study. This dataset poses a unique challenge because it contains user interaction
and the data spans for a longer time compared to Study 1. This is an example of an
interaction that we expect to have multiple atomic stimuli within it. Although it was
possible to manually segment the interaction into the tasks that the participants were
given, we chose not to do so because, in naturalistic settings, AFA algorithm would
not have real-time knowledge of the task start/end times. The relevance of analysing
this dataset was first to decide on a technique to split the data into chunks while re-
specting the temporal order, and secondly, to use AFA algorithm to assign areas on the
screen with arousal levels for the users’ interaction. For the segmentation, we explored
several techniques including fixed-size segmentation, clustering and changepoint de-
tection. The choice of an approach to use depends on the context of the application,
CHAPTER 3. DEVELOPMENT OF AFA ALGORITHM 88
the parameters available, and the end goal.
Fixed-size segmentation would ensure that datasets are divided into fixed sizes. How-
ever, in using this approach, there is an increased likelihood that the same task could
be split into two atomic stimuli. Clustering is another option for grouping together
tasks with similar characteristics, but the temporal order of the interaction needs to
be considered. It will be incorrect to assemble a segment that consists of data points
belonging to different temporal spaces. Another restriction with clustering is that
clustering is usually applied when we have the entire datasets, which means that it is
not a practical solution for real-time applications. By real-time applications, we mean
applications in which new data points are being generated as the user interacts. As
new data points are formed, it will be difficult and sometimes impossible to anticipate
how new clusters should be composed apriori. Pattern recognition of cyclic (repeti-
tive, periodic) changes could also be another approach to segmenting an interaction
into atomic stimuli. This detects when certain events repeat themselves. However,
it is not always the case that tasks have a cyclic repetition pattern. This approach,
when applied in tasks where there are no cyclic or repetitive actions, will not support
segmentation. Another option is changepoint detection. Change point detection algo-
rithm computes the likelihood of change in the statistical properties of a dataset and
the likelihood that the change takes place at a certain record in the dataset. Change-
point detection is implementable in non-real time and real-time data. Inputting a
certain property such as interaction evens into the changepoint algorithm is possible.
Interaction events such as clicks or typing can be fed into the changepoint detection
algorithm. There are univariate and multivariate changepoint detection algorithms.
The univariate takes a single variable to detect a change, while the multivariate de-
tects changes in multiple variables to assign probabilistic values to whether a certain
data point is a change point. The flexibility and robustness of this technique informed
our decision to use this algorithm to split long interaction data (e.g., greater than 5s)
informed our choice of this approach.
In AFA algorithm, we used the Bayesian changepoint detection implemented in Python by Johannes Kulick (MIT licence, 2014). The input to the method is the series of user event durations, and the output is a vector of the same length containing, for each record, the probability that a change in the event pattern occurs at that point. Figure 3.8
shows a sample result of detecting change points between input events (mouse clicks, typing, scrolling) during a task. A cut-off value can be set to accept a point as a change point. For example, the interaction in Figure 3.8 lasted approximately 600s; with the cut-off probability set at 0.8 (80%), we obtain 21 changepoints for that period, roughly two per minute. Using this threshold of 80%, we segmented the interaction into atomic segments.
Figure 3.8: Top - input events; Bottom - probability of a change point
Then, we applied AFA algorithm, discussed in Section 3.6, to each segment. We took the domain expert's rating, on a scale of 1 to 5, of how much arousal is expected per AOI, considering cognitive demand, attention,
anxiety and stress. We computed the mean of all four variables to generate an ex-
pected arousal value for each AOI. Next, we performed our arousal analysis using AFA
algorithm described in Section 3.6, then calculated the sum of all the arousal levels
obtained for each peak, for each participant, on each AOI. The result is presented
in Table 3.5. Following this, we performed Spearman’s correlation test between the
result of AFA algorithm and the arousal rating obtained from the domain expert. We observed a strong positive correlation, r(5) = 0.82, although it did not reach statistical significance (p = 0.089). This result shows promise, and we anticipate that a larger sample would yield a lower p-value.
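To make the segmentation step concrete, the following is a minimal Python sketch. It assumes the changepoint routine has already produced a probability vector (one probability per record, as described above); the function and variable names are ours, purely for illustration, and the library call that produces the probabilities is omitted because its interface varies across versions.

import numpy as np

def segment_by_changepoints(samples, change_prob, cutoff=0.8):
    """Split an interaction into atomic segments at likely change points.

    samples     : records (e.g., user event durations), in temporal order
    change_prob : probability that a change occurs at each record
    cutoff      : minimum probability for a record to count as a change point
    """
    samples = np.asarray(samples)
    change_prob = np.asarray(change_prob)
    boundaries = np.flatnonzero(change_prob >= cutoff)  # indices of change points
    # np.split respects the temporal order: each piece is one atomic segment
    return np.split(samples, boundaries)

# Hypothetical usage: event durations and their change probabilities
durations = [0.4, 0.5, 0.4, 2.1, 0.3, 0.4, 1.9, 0.5]
probs = [0.0, 0.1, 0.2, 0.9, 0.1, 0.1, 0.85, 0.2]
segments = segment_by_changepoints(durations, probs)  # three atomic segments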
3.6 Implementation
The execution flow of AFA algorithm is illustrated in Figure 3.9.
Figure 3.9: Execution flow of our arousal detection approach
Figure 3.10: Graph of Arousal Level against time (s)
Pupillary response, which is the diameter of the pupil in millimetres, is captured by an eye tracker at a specified sampling frequency. For a 50Hz capture rate, the eye tracker records the diameter of the pupil once every 20ms.
Function 1: Non-overlapping window aggregation
Input: Array of pupil dilations (pupilDilation[]), eye-tracking frequency (frequency)
Output: Array of non-overlapping windows (window[])
windowIndex = 0;
index = 0;
for each element in pupilDilation[] do
    append element to tempArray;
    index++;
    if (index is divisible by frequency) then
        window[windowIndex] = median(tempArray);
        windowIndex++;
        clear tempArray;
    end
end
return window[]
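For concreteness, a minimal Python rendering of Function 1 might look as follows (a sketch, not the thesis implementation; it assumes the pupil samples arrive as a flat list and, like the pseudocode, emits only complete windows):

import statistics

def aggregate_windows(pupil_dilation, frequency):
    """Aggregate raw pupil samples into non-overlapping (tumbling) windows.

    pupil_dilation : pupil diameters in mm, one sample per capture tick
    frequency      : eye-tracker capture rate in Hz, so each window spans 1s
    """
    windows = []
    # Step through the samples one full window (1s of data) at a time;
    # a trailing partial window is dropped, as in Function 1
    for start in range(0, len(pupil_dilation) - frequency + 1, frequency):
        chunk = pupil_dilation[start:start + frequency]
        windows.append(statistics.median(chunk))  # the median evens out outliers
    return windows

For a 50Hz tracker, for example, 150 samples yield three one-second windows.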
The analysis is split into three major phases: 1. data transformation (preparation, aggregation, and transforming the pupil data to categorical data that represents each window's level of arousal); 2. peak detection (to detect individual changes in levels of arousal); and 3. inference (to compute the cumulative impact of arousal when participants focus on an area of interest on the screen). In the first phase, the data is cleaned using linear interpolation to replace missing values and blinks. The data for both Study 1 and Study 2 were cleaned using linear interpolation to replace missing values and outliers (values below 2mm and above 8mm, which are outside the range of typical pupil sizes).
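A minimal pandas sketch of this cleaning step, assuming the raw samples are held in a Series (illustrative only; the thesis implementation may differ):

import pandas as pd

def clean_pupil_signal(raw):
    """Replace impossible pupil values and gaps using linear interpolation."""
    s = pd.Series(raw, dtype="float64")
    # Mask blinks and noise: plausible pupil diameters lie between 2 and 8mm
    s = s.mask((s < 2) | (s > 8))
    # Linearly interpolate across the masked and missing samples
    return s.interpolate(method="linear", limit_direction="both")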
After cleaning, the highly granular data is aggregated into fixed-size windows at non-
overlapping contiguous time intervals (tumbling windows) as expressed in Function
1. The window size should be equal to the eye-tracking capture frequency in Hertz
(Hz) so that every window is 1s long. The rationale for this window size is that it takes approximately 400ms for the human pupil to react to a cognitive stimulus, and up to 600ms for an emotional stimulus. However, it takes between 2 and 3 seconds from the time of exposure to a stimulus for a participant to attain a peak in arousal. The aggregation of 1s is short enough that peaks are not split across windows, but long enough to reduce the effects of outliers in each window. This is done for each stimulus for
each participant. The output of Function 1 is passed into a function which transforms each element into categorical data that represents the level of arousal. This function computes the range of the individual's pupil dilation and the unit of change for that participant, so that arousal can be modelled based on each person's pupil characteristics. Figure 3.11a shows the raw pupil signal from the eye tracker. Figure 3.11b shows the signal after aggregating it into windows of size 1s (50 samples per window at 50Hz in our case). Finally, Figure 3.11c shows the arousal levels after transforming the aggregated data according to each participant's physiological characteristics (range and measure of central tendency). The pseudocode for this process is expressed in Function 2.
Function 2: Data transformation
Input: Array of non-overlapping windows (window[]), number of levels (levels)
Output: The level of arousal for each window (window[])
index = 0;
pupilRange = max(window[]) - min(window[]);
unitOfChange = pupilRange / (levels - 1);
med = median(window[]);
for each element in window[] do
    change = window[index] - med;
    window[index] = change / unitOfChange;
    index++;
end
return window[]
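A Python sketch of Function 2 under the same assumptions (the helper name is ours; the final rounding, which yields discrete levels, is our addition, as the pseudocode leaves the values continuous):

import statistics

def to_arousal_levels(windows, levels):
    """Map aggregated pupil windows onto per-participant arousal levels."""
    pupil_range = max(windows) - min(windows)
    # The unit of change is derived from the individual's own pupil range,
    # so the levels are relative to that participant's physiology
    unit_of_change = pupil_range / (levels - 1)
    med = statistics.median(windows)
    return [round((w - med) / unit_of_change) for w in windows]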
Now that the pupil dilations have been modelled into categorical levels of arousal, we must detect when a participant has experienced a change in arousal. The second phase uses a peak detection function to sense increases in arousal levels,
expressed in Function 3. The array returned by Function 3 contains the indices of
windows where the participant has experienced an increase in arousal. To compute
the magnitude of the increase in arousal (V), we calculate the difference between the
arousal level at each peak index and the lowest point before the peak. This magnitude
does not tell us the true impact of this arousal because we know that the more the
participant interacts with an arousing stimulus, the greater we expect the impact of
the stimulus to be [Iqbal et al., 2004]. Therefore, in the third phase, the impact of an area of interest (a) on arousal (A) is computed by:
Function 3: Arousal peak detection
Input: Array of categorical levels of arousal per window (window[])
Output: Array containing indices of peaks detected (peakIndices[])
for index = 1 to length(window[]) - 2 do
    if ((window[index] > window[index - 1]) and (window[index] >= window[index + 1])) then
        append index to peakIndices[];
    end
end
return peakIndices[]
A(a) = (∑V) × t
That is, the arousal magnitudes (V) are summed and multiplied by the participant's total fixation duration (t) on the area of interest (a). The hypothesis is that the arousal magnitude (V) represents the intensity of the stimulus, but the impact of the stimulus on the participant
is compounded by the total duration spent on the stimulus (t). We are aware that
many algorithms for sensing emotion tend to normalise their outputs by time. What that measures is the intensity of the stimulus, rather than its effect on the individual.
Since we aim to use AFA algorithm for sensing arousal during user interaction, for the
purpose of improving user interaction, we need to measure the total impact that a
stimulus has on a participant. Take, for instance, the case of boiling water in a pot. The intensity of the heat (analogous to the intensity of the stimulus) can be measured by the temperature. However, to measure the impact of the heat from a furnace on the water, there would be a difference between a pot placed on the furnace for 1 second and one left there for 10 minutes. Therefore, to compute the impact on the water (analogous to users), we must consider the temperature (intensity) as well as the time spent on the furnace (duration).
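For illustration, the second and third phases can be sketched in Python as follows (a sketch under the definitions above; the helper names are ours, and the per-AOI fixation duration is assumed to be available from the eye tracker):

def detect_peaks(window):
    """Function 3: indices where the arousal level rises to a local maximum."""
    peak_indices = []
    for i in range(1, len(window) - 1):
        # A peak is strictly above the previous window and not below the next
        if window[i] > window[i - 1] and window[i] >= window[i + 1]:
            peak_indices.append(i)
    return peak_indices

def aoi_impact(peak_magnitudes, fixation_duration):
    """Inference phase: A(a) = (sum of V) * t for one area of interest.

    peak_magnitudes   : increases in arousal (V) for peaks attributed to the AOI
    fixation_duration : total time t (in seconds) spent fixating on the AOI
    """
    return sum(peak_magnitudes) * fixation_duration

For example, three peaks of magnitudes 2, 1 and 3 on an AOI fixated on for 12.5s in total give aoi_impact([2, 1, 3], 12.5) = 75.0.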
Henceforth, our project methodology, methods, and the algorithm we developed will be referred to as the Algorithm for sensing Arousal and Focal Attention (AFA). The visualisation of the output of AFA algorithm is presented in the next section.
3.7 Visualising the output of AFA algorithm
The output of AFA algorithm is an array of time indexes where participants experience
an increase in arousal (a peak). For each index, we extract: 1. the magnitude of the increase; 2. the time the increase occurs; and 3. the most fixated area of interest during the
moment of increase. Therefore, with these three variables, we can explore changes
in arousal in the form of temporal trends (time), or as a function of the users’ focal
attention (AOIs). Temporal trends will enable researchers/observers to understand
how the measure of arousal changes with time. For example, participants experienc-
ing a decline in arousal from the first fixation on the web page to when they exit the
web page may indicate a progressive loss of interest or boredom. This is an example of a hypothesis that can be generated and that the researcher could investigate further. Observing patterns of arousal from the perspective of the
users’ visual attention could inform researchers or designers of components/contents
on the web page that cause users stress or frustration. For example, a confusing UI
control may cause users to experience an increase in arousal when they fixate on it.
Our initial prototype was a dynamic and interactive visualisation. It enables users to replay the dynamics of arousal for multiple participants over time and by region of focal attention (AOI) on a stimulus. Figure 3.12 is a screenshot of this prototype.
This prototype is divided into 11 segments, which we will describe further. After car-
rying out a study using an eye tracker, the data is extracted from the eye tracker and
analysed using AFA algorithm. In Segment 2 (the file loading panel), the investigator
uploads the image file of the stimulus (.png, .jpg, .gif), the coordinates that define
the areas of interest (.csv), the aggregated result of the analysis (.csv), and the break-
down of the analysis by participants (.csv). Segment 7 displays the AOIs as coloured grids overlaid onto the stimulus image. The AOIs are also labelled accordingly. In
this example, the stimulus is an image of an ECG scan that was divided into thirteen
AOIs (A-M). In instances where the algorithm is unable to identify the most fixated
AOIs during a moment of arousal, Segment 3 still displays the level of arousal by the
participants. The coloured grids are similar to heatmaps, as the colour changes with
respect to the level of arousal. The key that maps to the levels of arousal (from low to
high) is displayed in Segment 6. The colour scheme in this example ranges from white
(low) → Yellow → Red (high), but there are other schemes that can be selected from
Segment 5, according to the investigator's preference. The arousal levels displayed in Segment 7 are the cumulative arousal for the participants selected in Segment
8. All participants can be selected/deselected using the control in Segment 4. Individ-
ual arousal levels are broken down into participants in Segment 9, where the previous
arousal state (with the label indicating the focal attention) is shown in the first grid,
the current state shown on Segment 7 is shown in the second, and the next state is
shown in the third. As we stated before, the display in Segment 7 changes with time
(in seconds). Therefore, Segment 10 allows the investigator to change (or view) the time frame for which the arousal levels in Segment 7 are shown. Segment 11 allows the
investigator to control the replay (pause and play), and the playing speed.
This tool allows researchers to formulate hypotheses by observing trends temporally and spatially across different AOIs. The ability to select participants also means that outliers can be spotted. Group comparisons can also be made, but not easily achieved
simultaneously since one interface controls the visual exploration. Another limitation
of this approach is that, because the visualisation is replayed as a video, it does not
present an overview of the arousal dynamics at one glance. Therefore, the visuali-
sation offers limited options to communicate a pattern to fellow researchers for print
media (e.g. publications) as taking a screenshot of the visualisation will only reveal
a period of the interaction. This visualisation was built using JavaScript, CSS and HTML5, while AFA algorithm was developed in the Python programming language, which means that there was no software integration between the visualisation and the analysis. Due to the limitations mentioned above, we designed a visualisation toolkit
that enables UX researchers to analyse their data from the eye tracker and view the
dynamics of their participants’ arousal ass a single frame, rather than as a video. Be-
cause this toolkit was developed using the Flask-Django framework for python, as well
as HTML5, CSS and Javascript, it enables researchers to interact with the software
from the data selection, to the analysis, all within the same UI environment. In our
design, we have two modes of interaction: one offers the option of viewing participants' changes in arousal as a function of time, and the other visualises the arousal as a function of their visual attention over the AOIs that they fixated upon. Figure 3.13
shows a screenshot of the user interface of our arousal toolkit.
Segment 1 of the toolkit is the menu. Following the “Enter data” hyperlink, the
researcher can load their eye-tracking dataset, followed by the execution of AFA algo-
rithm. The next hyperlink, “Analysis”, takes the researcher to the page displayed in Figure 3.13, where the visual analysis/exploration takes place. In Segment 2, the researcher selects either the “Arousal timeline” mode or the “Arousal areas” mode. Segment 3 allows the researcher to select the participants to be extracted from the result of the eye-tracking analysis for visualisation. In Segment 4, the researcher selects the stimulus
they are interested in. The result is then visualised in Segment 5 (the work area).
Segment 6 shows the key for interpreting the arousal bubbles in the Arousal timeline
graph. As we stated, there are two modes of visualisation in our arousal toolkit: Figure 3.14a is the arousal timeline mode, and Figure 3.14b is the arousal areas mode. In the first mode, all peaks are plotted on a time scale, and the size of the circle represents the magnitude of the peak (arousal level). It is colour coded by participant in case the interaction lasts a long time and we have to scroll to view its later parts. This visualisation is not so much about where participants are looking, even
though we can see this detail when the mouse pointer hovers on a peak. For example,
we can see that Participant P2M had a peak in arousal with a magnitude (AL) = ‘7’. This peak happened 50 seconds into the interaction while his attention was on
AOI ‘B’.
In the second mode, researchers can visualise the measure of arousal induced by an
area of interest (A, B, C, D) for each participant. The lowest row shows a cumulative
value for all the participants for an area of interest. The darker the cell, the higher the arousal, as indicated by the key on the right side of the figure.
The aim is to be able to input eye-tracking datasets into our tool and generate visualisations of the user's arousal dynamics that can be included in scientific papers, presentations or UX evaluation reports. The toolkit can also be used to formulate hypotheses that can be investigated using other formal statistical or qualitative research methods.
3.8 Conclusion
In this chapter, we explored existing devices for capturing eye-tracking data. For our
studies, we chose a low-end portable eye tracker, as we anticipate that web-cameras
with such fidelity would soon become mainstream. We discussed the rationale for
our technique, especially the use of peak detection to sense arousal. As we said, it
provides the opportunity for an event-based approach, where moments of arousal can
be detected along with the visual attention of users during these moments. This
event-based approach may be used to facilitate third-party applications such that
actions can be triggered when the arousal level reaches a certain threshold or when certain visual elements trigger arousal. In situations where arousal is being tracked to sense attention, third-party applications that utilise AFA algorithm could draw the attention of participants to parts of the screen that they may have missed but that are crucial to them on the platform. Third-party applications could be developed to automatically change the layout of a web page to present less information in the case of high cognitive load. Further possibilities include displaying tooltip text over visual elements that cause stress on a page, or suggesting breaks in times of extreme stress, thereby preventing mental
fatigue. We are aware that there are several causes of arousal, including
responses to emotion-evoking stimuli, cognitive load and frustration. The evaluation
of AFA algorithm aims to cover these sources of arousal to assess the generalisability of
our findings. Therefore, we designed three lab-based experiments to assess the ability
of AFA algorithm to detect arousal induced by these stimuli types. We discuss these
experiments in the next two chapters.
Table 3.3: Matrix showing the Pearson's correlation of statistical features (L - left, R - right, W - window, std - standard deviation)

          Rmean  Lmean  Rstd   Lstd   Rmedian Lmedian RWmean LWmean RWstd  LWstd  RWmedian LWmedian Accuracy
Rmean     1.00   0.94   0.41   0.09   0.99    0.95    1.00   0.94   0.40   0.11   0.99     0.95     0.19
Lmean     0.94   1.00   0.34   0.11   0.93    1.00    0.94   1.00   0.35   0.16   0.93     0.99     0.23
Rstd      0.41   0.34   1.00   0.63   0.35    0.32    0.41   0.34   0.92   0.61   0.36     0.36     -0.07
Lstd      0.09   0.11   0.63   1.00   0.07    0.09    0.10   0.12   0.59   0.94   0.08     0.12     0.05
Rmedian   0.99   0.93   0.35   0.07   1.00    0.94    0.99   0.94   0.33   0.08   1.00     0.94     0.20
Lmedian   0.95   1.00   0.32   0.09   0.94    1.00    0.95   1.00   0.33   0.12   0.94     0.99     0.24
RWmean    1.00   0.94   0.41   0.10   1.00    0.95    1.00   0.94   0.41   0.12   0.99     0.95     0.19
LWmean    0.94   1.00   0.34   0.12   0.94    1.00    0.94   1.00   0.36   0.16   0.94     0.99     0.22
RWstd     0.40   0.35   0.92   0.59   0.33    0.33    0.41   0.36   1.00   0.67   0.34     0.36     -0.08
LWstd     0.11   0.16   0.60   0.94   0.08    0.12    0.12   0.16   0.67   1.00   0.09     0.15     0.02
RWmedian  0.99   0.93   0.36   0.08   1.00    0.94    0.99   0.94   0.34   0.09   1.00     0.95     0.19
LWmedian  0.95   0.99   0.36   0.12   0.94    0.99    0.95   0.99   0.36   0.15   0.95     1.00     0.21
Accuracy  0.19   0.23   -0.07  0.05   0.20    0.24    0.19   0.22   -0.08  0.02   0.19     0.21     1.00
Table 3.4: Matrix comparing our predictor variables (Experience, Accuracy, Time spent, Difficulty) and the total stress score with our outcome variable (No. of arousal points - peaks)

                       Experience  Accuracy  Time spent  Difficulty  Stress score  No. of arousal points
Experience             1.000       -0.027    -0.179      -0.492      0.505         0.454
Accuracy               -0.027      1.000     -0.165      -0.218      -0.757        -0.278
Time spent             -0.179      -0.165    1.000       0.078       -0.097        -0.200
Difficulty             -0.491      -0.218    0.078       1.000       0.012         0.005
Stress score           0.505       -0.757    -0.097      0.012       1.000         0.618
No. of arousal points  0.454       -0.278    -0.200      0.005       0.618         1.000
Table 3.5: Expected arousal level (M) vs. computed arousal level (Output)

                AOI
            Scenario  Action  Progress  View  Question
Cognition   1         2       1         5     4
Attention   1         2       1         5     5
Anxiety     1         1       2         5     5
Stress      1         2       1         5     4
M           1         1.75    1.75      5     4.5
Output      99        533     17        3464  1526
(a) Participant's raw pupil signal from the eye tracker
(b) Participant's aggregated signal (window size = 1s)
(c) Participant's discretised arousal levels after converting to a scale (1-9)
Figure 3.11: A comparison of the raw pupil dilation extracted from the eye tracker with the processed arousal signal, after converting to arousal levels
Figure 3.12: Arousal explorer tool
Figure 3.13: Arousal toolkit
(a) Arousal timeline
(b) Arousal areas as heat map
Figure 3.14: Modes of visualisation in our arousal toolkit
Chapter 4
Evaluating AFA algorithm on emotionally evoked and cognitively induced arousal
In the previous chapter, we proposed our methodology for sensing arousal. As we
said, our methodology works by aggregating pupil data extracted from the eye tracker
to even out the effect of outliers. The aggregated data is then modelled to fit each user's range and measures of central tendency. Further, we detect peaks, which are moments where participants experience an increase in arousal. For each peak, we compute the magnitude of change and identify the user's focal attention during that moment of
increased arousal. Finally, we multiply the magnitude of arousal caused by each AOI
with the duration spent fixating on that AOI, to compute the cumulative measure
of arousal due to the AOI. Recall that in our problem statement (Chapter 1), we
identified possible reasons why affect detection has seen limited ubiquitous, widespread use. The most critical reason was the selection of the affect detection mechanism. Another reason we identified was noise and idiosyncratic physiological responses. Our method (Chapter 3) tackled this through data cleansing, aggregation and fitting the data to individual baselines. We address confounding factors such as participants' previous affective states (the law of initial values, LIV), stimulus-
response specificity and colour intensity through our concept of an atomic stimulus.
By considering each stimulus in isolation, and thereby detecting intra-stimulus changes using our peak detection approach, we were able to circumvent the challenges above.
In this chapter, we focus on another factor that often yields inconsistent results in the
field of affective computing, which is the lack of generalisability of affect detection due
to varying stimulus types. We state why the evaluation we carry out in this chapter
is a vital step towards attaining a generalisable solution.
4.1 Rationale and motivation
Physiological arousal could indicate the presence of stress [Zhai and Barreto, 2006, Sun
et al., 2010], boredom [Chanel et al., 2008], attention [Lang, 1990] and cognitive load
[Shi et al., 2007]. Other states, such as intense joy, sexual feelings, anger and surprise, also result in increased arousal, even though they are considered emotional states. There are yet other states that can be a combination of both cognitive and emotional states, such as anticipation, activation, and frustration. These concepts differ in seman-
tic meaning to individuals who experience them but may share certain physiological
characteristics if they result in increased arousal. However, certain approaches may
be capable of sensing one category of arousal accurately, but fail to sense arousal in
another, as emotional processing and cognitive processing could follow different neu-
rological pathways. The need to treat emotions and cognition as separate concepts is
a source of debate amongst researchers from different domains.
For instance, psychologists and neuroscientists have expressed divided opinions over
the premise that “emotion and cognition are distinct concepts” or whether “emotions
are part of cognition”. Gerrod Parrott et al. argued that for humans to adapt to
emotions, we must anticipate, interpret and perform problem-solving functions, all of
which require cognitive processing [Parrott and Schulkin, 1993]. They also argued that
the brain’s central system of control for its sensory functions means that no part of the
brain is purely emotional [Parrott and Schulkin, 1993]. Joseph E. LeDow’s attempted
to debunk this view using the analogy of the brains mechanism for interpreting visual
perception and reacting accordingly with motor actions [Ledoux, 1993]. When we see
a potentially harmful object, there is some cooperation between the vision and the
motor function of the brain, to avert danger (e.g., by moving away from the object).
Despite this overlap between the visual and motor faculty, the separationist view is
that it is useful to examine the vision and motion functions independently [Ledoux,
1993]. This argument is beyond the scope of our research, and our aim is not to inves-
tigate these theories. However, since people encounter emotionally evoking contents,
as well as cognitively demanding contents during user interaction, it is important to
ensure that AFA algorithm yields consistent results under both circumstances. There-
fore, we decided to evaluate AFA algorithm on pupillary responses to both cognitive
and emotional stimuli. We start by evaluating AFA algorithm on emotionally evoked
arousal.
4.2 Sensing emotionally evoked arousal
As early as 1981, Paul R. Kleinginna et al. identified 92 definitions of emotions from
the literature. In an attempt to come up with a consensual definition, the following
was proposed:
“Emotion is a complex set of interactions among subjective and objective
factors, mediated by neural and hormonal systems, which can (a) give rise
to affective experiences such as feelings of arousal, pleasure/displeasure;
(b) generate cognitive processes such as emotionally relevant perceptual
effects, appraisals, labeling processes; (c) activate widespread phys-
iological adjustments to the arousing conditions; and (d) lead to
behaviour that is often but not always, expressive, goal-directed, and adap-
tive” [Kleinginna and Kleinginna, 1981]
The key point from this definition is found in (c), that emotional experiences could
activate arousal and physiological responses. In another view, Mehrabian opined that most emotional states could be described using three nearly orthogonal
dimensions known as the Pleasure-Arousal-Dominance (PAD) scale [Mehrabian, 1996].
In this view, arousal indicates the strength and intensity of emotion. Therefore, our
expectation is that when people experience an emotion, we would be able to sense
arousal elicited by this emotion, using AFA algorithm, because arousal is the intensity
of emotion. The objective of this section is to evaluate our arousal sensing approach
on its ability to utilise physiological signals to sense arousal from emotionally-evoked
stimuli.
In order to evaluate this, we need to select stimuli that are capable of eliciting a
measured amount of emotional response in the participants. The analysis involves
comparing the self-reported (expected) measures of arousal against the output of AFA
algorithm. We describe our experimental design next.
4.2.1 Experiment
Participants
41 participants (9 female and 32 male) were recruited to take part in the study at the University of Manchester. Participants' ages ranged between 16 and 37 (M = 26, SD = 4.3). All the participants were students, with their highest qualifications
being (4 GCSE, 22 A-level, 9 Bachelors’ and 6 Masters’). Recruitment was done by
word of mouth, participation was voluntary, and withdrawal was possible before,
during and after the experiment. This experiment was approved by the University of
Manchester’s research ethics committee (approval number: 2017-1906-3160). The
participant information sheet and the consent form for this study are appended in
Appendix A.1 and A.2 respectively.
Stimuli and selection criteria
There are several options for databases that contain emotion-evoking stimuli. For
example, the NimStim Face Stimulus Set for studies based on facial stimuli [Totten-
ham et al., 2009], the Geneva Affective PicturE Database (GAPED) [Dan-Glauser and
Scherer, 2011], which includes a rating of normative significance, and the International
Affective Picture System (IAPS) [Lang et al., 1997]. For a review of the use of affective
stimuli datasets, see [Horvat et al., 2013]. Twelve pictures (IDs 7175, 7010, 2513, 2440, 2312, 2359, 8231, 9031, 4597, 1302, 8492, 1321) were selected from the International Affective Picture System (IAPS) database, as it is widely used and contains the most varied selection of images. Due to the terms and conditions of use of IAPS images, these images are not published or described in graphic detail in this thesis. Each picture is accompanied by the mean and standard deviation of the
valence, dominance and arousal ratings using the self-assessment manikin (SAM) scale
[Bradley and Lang, 1994]. These ratings were accumulated from approximately 100
participants per picture who self-reported their emotional response to each stimulus,
see [Lang, 2005] for more information about this. We rounded all the mean arousal ratings of the pictures in the database to their nearest whole numbers. After rounding, all ratings had discrete arousal values between two and seven. We then selected two pictures from each level (2-7) such that one had a relatively high valence and the other a relatively low valence, so that we had a wide spread of valence across the images. To ensure that the images selected elicited a consistent feeling amongst the participants who rated them, we selected pictures whose arousal-rating standard deviations (SD, as given by IAPS) were less than 2. Finally, we ensured that no extremely violent or erotic picture was
selected for ethical reasons. Table 4.1 shows the description and affective ratings of
the pictures that were selected.
Materials and procedures
Participants viewed each stimulus (12 images) at a distance of ∼ 65cm from a 17-inch
monitor. Tobii X2-60 eye tracker was used to capture gaze behaviour at an angle of ∼
30◦ and pupillary response every 20ms (f = 50Hz). Each participant was presented
with each image in counterbalanced order for as long as they needed to view them.
They were instructed to press the space bar on their keyboard to proceed to the next
image. Before the next image was presented, a plain grey image was displayed to them
for 3 seconds so that their pupil sizes could return to baseline before engaging with the next image. At the end of the entire presentation, the participants were instructed to rate each image on a paper-based Likert-type scale according to how aroused they felt while viewing it. To clarify the term arousal for them, we phrased the question using loose
terms that they may associate with arousal. The question was, “Please rate the images
you just viewed according to how much arousal (stress, anxiety, cognitive load, fear,
excitement) you felt”, see Appendix A.3. This method of self-report was adopted for
the ease of explaining to the participants how to self-assess their emotions as, during
the pilot study, we discovered that participants could not interpret the other methods
used, such as the SAM scale, without undergoing some training. After participants
rated their levels of arousal, we took note of comments, feedback or self-reflection from
the participants regarding the study. Other equipment used included a 17-inch monitor, Tobii Studio 3.2 software, a keyboard and a mouse.
Table 4.1: IAPS stimuli showing the description, arousal, dominance and valence values of each stimulus

IAPS ID  Description       Arousal M(SD)  Arousal (rounded)  Valence M(SD)  Dominance M(SD)
7175     Lamp              1.72 (1.26)    2                  4.87 (1.00)    6.47 (2.04)
7010     Basket            1.76 (1.48)    2                  4.94 (1.07)    6.70 (1.48)
2513     Woman             3.29 (1.67)    3                  5.80 (1.29)    5.92 (1.71)
2440     Girl              2.63 (1.70)    3                  4.49 (1.03)    5.97 (1.89)
2312     Mother            4.02 (1.66)    4                  3.71 (1.64)    4.72 (1.73)
2359     Mother and Child  3.94 (1.73)    4                  5.87 (1.41)    5.49 (2.20)
8231     Boxer             5.24 (1.84)    5                  3.77 (1.83)    4.68 (1.91)
9031     Shoe in the mud   4.82 (1.92)    5                  3.01 (1.59)    4.68 (1.81)
4597     Romance           5.91 (1.86)    6                  6.95 (1.65)    5.64 (2.11)
1302     Dog               6.00 (1.87)    6                  4.21 (1.78)    4.04 (2.11)
8492     Roller coaster    7.31 (1.64)    7                  7.21 (2.26)    4.63 (2.41)
1321     Bear              6.64 (1.89)    7                  4.32 (1.87)    3.51 (2.12)
Results
Datasets of 41 participants over the 12 stimuli were extracted from the eye tracker.
Figures 4.1a, 4.1b and 4.1c show the mean time to first fixation, mean fixation count and total fixation duration, respectively, on all images. From Figure 4.1a, we can see that, on average, participants took longer to fixate on image 2440. They also had the fewest fixations and the shortest fixation duration, as shown in Figures 4.1b and 4.1c. This image was that of a girl, and had a low IAPS arousal rating
(M = 2.60, SD = 1.70) and a neutral valence (M = 4.49, SD = 1.03) as shown in
Table 4.1. Recall that the aim of this experiment is to evaluate AFA algorithm against
the self-reported measures, which we consider to be the ground truth for this study.
To find out whether the ground truth correlates with AFA algorithm, we carried out the following data preparation actions, then ran the eye-tracking dataset through AFA algorithm. Firstly, instances where participants viewed a picture for less than 3 seconds were excluded, because it takes 2-3 seconds for the pupil dilation to reach its peak. Also, only the first four stimuli viewed per participant were included in the analysis, to reduce the effect of disinterest. Furthermore, records with less than 70% accuracy (measured by the proportion of times that the eye tracker was able to capture both eyes) were excluded, as this was the recommended filter in Tobii Studio. After data preparation
and running the data through AFA algorithm, we observed the following results. For
example, the same image 2440 that showed evidence of low attention and interest as
observed by the time to first fixation, fixation count and fixation duration, also showed
the least arousal rating from AFA algorithm. Figure 4.2 shows each stimulus against the algorithm's mean arousal, the participants' mean arousal rating and the mean IAPS arousal rating. The correlation between the mean IAPS arousal, participant-reported arousal and the algorithm's output is presented in Table 4.2.

Table 4.2: Correlation between the mean IAPS arousal rating, self-reported rating and the algorithm's arousal level (scaled between 1 and 5)

X         Y          r     p
IAPS      Reported   0.90  <.05
IAPS      Algorithm  0.59  <.05
Reported  Algorithm  0.51  <.05

The strong
positive correlation, r(12) = .90, p <= 0.01 between the mean self-reported arousal
(a) Mean time to first fixation (s)
(b) Total fixation count mean
(c) Total fixation duration mean
Figure 4.1: Gaze behaviour across all 12 stimuli
Figure 4.2: Stimuli against the algorithm's arousal rating, participants' reported feedback, and the IAPS arousal ratings
of participants and the mean ratings of the IAPS validates the dataset as ground
truth. The strong correlation between the IAPS arousal rating and self-reported rat-
ings is expected because both are self-reported measures, but this also validates the participants' reported arousal as a basis for evaluating our output. The correlations
between the mean algorithm arousal and the mean IAPS (r(12) = .59, p <= 0.01) and
between the mean algorithm arousal and the mean participants’ reported feedback
r(12) = .51, p <= 0.01 per stimulus are moderate.
Treating each participant independently rather than averaging all participants per
stimulus, we also observed a moderate correlation (r(47) = .46, p <= 0.01) between
the algorithm’s arousal rating and the participants’ self-reported arousal. Similar to
Oliveira et al., inter-picture colour intensity was accounted for by taking the main colour of each picture to be its most frequently appearing colour [Oliveira et al., 2009]. Next, this colour was converted to perceived brightness using the formula:

Perceived brightness = ((Red value × 299) + (Green value × 587) + (Blue value × 114)) / 1000
There was no correlation (r(47) = 0.03, p = 0.83) between the output of AFA
algorithm and the inter-picture brightness, which indicates that our result is not an artefact of the colour differences between the stimuli.
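A minimal sketch of this brightness computation (assuming, for illustration, that Pillow is available and that the main colour is simply the most frequent pixel value):

from collections import Counter
from PIL import Image

def perceived_brightness(image_path):
    """Estimate a stimulus image's perceived brightness from its main colour."""
    img = Image.open(image_path).convert("RGB")
    # Main colour = the most frequently appearing pixel value
    r, g, b = Counter(img.getdata()).most_common(1)[0][0]
    # Weighted brightness formula given above
    return (r * 299 + g * 587 + b * 114) / 1000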
4.2.2 Limitations from our analysis of emotional stimuli
We postulate that the moderate correlation (rather than a strong correlation) between physiological responses and self-reported measures exists due to limitations in both approaches. The lack of an ideal ground truth [Constantine and Hajj, 2012] makes it difficult to establish the way forward with certainty. For pupillary response, errors could arise
due to eye-tracking accuracy and the analysis technique while for self-report, the main
cause of concern is bias. In the following subsections, we discuss these limitations and
make recommendations.
Limitations from self-reported ground truth
Participants sometimes report their expected feeling rather than their actual feeling
[Nichols and Maner, 2008]. In situations where participants’ engagement is limited or
passive, they would still rate their expected feeling, and this could limit the validity
of self-report. We observed that the correlation between AFA algorithm and self-report was proportional to the participants' minimum dwell time (r(134) = .39, p <= .01); see Figure 4.3. This could mean that the more participants engage with the stimulus, the more accurate their self-report data is. We also observed, when filtering by the maximum number of stimuli viewed, that the more pictures participants viewed, the less accurate the algorithm was (r(134) = -.56, p <= .01); see Figure 4.4. In this case, it could mean that people become disinterested or desensitised after looking at several stimuli. Therefore, their physiological response is reduced. These may not be perceptible to participants, so they end
up reporting their expected emotion rather than their actual one. Indeed, the bias
in self-report lends further credence to the need for an objective measure of affect.
However, AFA algorithm is not without its limitations.
Limitations identified from AFA algorithm in this study
The accuracy of the data collected by the eye tracker is crucial for the performance of the algorithm. The accuracy measure from the Tobii eye tracker refers to the
Figure 4.3: Correlation between the accuracy of the algorithm and the minimum taskduration per participant
Figure 4.4: Correlation between the accuracy of the algorithm and the maximum tasksallowed per participant
proportion of times that the eye tracker can capture both pupils during the study. We
observed from the dataset that the strength of the correlation between AFA algorithm,
and self-report was proportional to the ‘minimum accuracy of participant’ filter value
(r(134) = .59, p <= .01). The higher we set the minimum accuracy for data to be
included in the analysis, the better AFA algorithm can sense movements of arousal,
see Figure 4.5. This shows how much AFA algorithm relies on eye-tracking accuracy.
Therefore, even though the analysis technique can be improved upon, the accuracy
of the eye tracker (which we have limited control over) has a great influence on the accuracy of the algorithm. Drawing on the limitations of self-report and of the algorithm,
we make the following recommendations for usability and user experience researchers.
Figure 4.5: Correlation between the accuracy of the algorithm and the minimum eye-tracking accuracy per participant
Recommendations
Improving the correctness of self-reported approaches is difficult because bias is a
consequence of human nature. Conversely, eye trackers are constantly evolving, and
we can expect improved accuracy. For instance, we used a Tobii X2-60 (60Hz) eye tracker for this study, but there are eye-tracking devices with considerably higher sampling rates and fidelity, e.g., the Tobii Pro Spectrum (1200Hz). Therefore, there seems to be more potential in AFA algorithm, because it is easier to improve technology than to influence the human tendency towards bias on a global scale. Also, because AFA algorithm is automatic, it is more suitable for computation. We propose that
for now, both approaches be used simultaneously to corroborate findings or cancel
out their limitations. At the barest minimum, researchers should be aware of the
limitations of their chosen approach, such as the ones we highlighted.
4.2.3 Summary
We carried out this study with the main objective of evaluating AFA algorithm on its
ability to sense changes in arousal from emotionally evoked stimuli. We obtained our
ground truth data from the IAPS database. Therefore, our dependent variable was
the arousal level measured by AFA algorithm, while our independent variable was the
expected level of arousal from each picture in the IAPS dataset. Other confounding
factors such as learning effect, desensitisation and participants’ lack of interest in-
troduce bias; hence, the need for physiological responses to complement self-reported
means of arousal detection. The moderate correlation (r(47) = .46, p ≤ .01) observed
between the output of AFA algorithm and the ground truth shows that AFA algorithm
has the potential to complement self-reported arousal detection for usability, UX and
other studies of visual behaviour. In addition to measuring the correlation between
our measure of arousal and our ground truth, we observed a direct correlation between
the accuracy of our eye tracker and the accuracy of AFA algorithm. This helps us in
estimating the impact that machine accuracy has on AFA algorithm and establishes
evidence that future enhancements to eye-tracking technology would improve the ac-
curacy of AFA algorithm.
However, as we mentioned at the start of this chapter, arousal can be caused by both
emotional and cognitive factors. Therefore, we need to examine how AFA algorithm
responds to cognitively induced arousal. The next section focuses on evaluating the al-
gorithm’s ability to sense arousal when participants experience increased arousal from
cognitively induced stimuli.
4.3 Sensing cognition-induced arousal
In the previous section, we showed that pupillary response could be used to sense
emotionally evoked arousal. However, cognition has a profound impact on interactive
systems because it is crucial for learning, searching, coordination and assimilation,
all of which influence user experience [Sweller, 1994]. Cognitive load refers to the
amount of cognitive effort that is expended by individuals while carrying out a certain
activity. During user interaction, we perform tasks that demand cognitive efforts
[Paas and Van Merrienboer, 1994]. As we established in our background section, the
literature provides evidence to suggest that performance increases with an increase
in arousal, but only up to a point where performance begins to deteriorate. Lack
of optimal cognitive load can lead to performance reduction and errors [Arent and
Landers, 2003]. In mission-critical systems such as air traffic control, an inappropriate
mental state is a catalyst for failure, which can lead to loss of life and property. In less
critical systems, lack of optimal cognitive load may lead to errors, loss of time, poor
user experience, and ultimately abandonment of the piece of software or system.
In this section, we evaluate AFA algorithm on pupillary responses to cognitive stimuli. To do this, we expose each participant to both cognitively arousing and controlled conditions. Then, we compare the output of AFA algorithm against the known levels of cognitive arousal that we have induced, to evaluate its ability to discriminate between both states. We explain the design of our experiment in further detail next.
4.3.1 Experiment
Participants
27 participants (10 female and 17 male) were recruited to take part in the study at
the University of Manchester. Participants were prospective and active students from
the University of Manchester, as the experiment was conducted during the university’s
open day. Recruitment was done by word of mouth, participation was voluntary, and
withdrawal was possible before, during and after the experiment. This experiment was
approved by the University of Manchester School of Computer Science's ethics committee
(approval number: CS 283). The participant information sheet and the consent form
for this study are appended in Appendix B.1 and B.2 respectively.
Stimuli
Stroop’s effect was used to elicit cognitive arousal in participants. Stroop’s effect
is a psychological effect where participants experience a decrease in cognitive effi-
ciency measured by accuracy and response time when they are distracted by incor-
rectly named objects (incongruent), compared to correctly named objects (congruent)
[Bousefsaf et al., 2014]. 25 congruently named colours (CC), 25 incongruently named colours (IC), 20 congruently named animals (CA) and 20 incongruently named animals (IA) were used to elicit Stroop's effect, as shown in Figure 4.6.
Figure 4.6: Stimuli for stroop’s effect
Materials and Procedure
Tobii X2-60 eye tracker was used to capture gaze behaviour at an angle of ∼ 30◦ and
pupillary response every 20ms (f = 50Hz). Other equipment used included a 17-inch monitor, Tobii Studio 3.2 software, a keyboard and a mouse. Participants viewed each
stimulus image at a distance of ∼ 60cm from a 17-inch monitor angled at ∼ 115◦.
Participants were randomly allocated into two groups which determined the order the
stimuli were displayed. Group A viewed them in the order CC → IC → CA → IA
while group B's stimuli were presented in the order CA → IA → CC → IC; see Table 4.3.

Table 4.3: Stimuli and expected arousal levels

Stimulus                  Expected Arousal Level
Congruent Animals (CA)    Low
Incongruent Animals (IA)  High
Congruent Colours (CC)    Low
Incongruent Colours (IC)  High

The participants were asked to name aloud each object within the stimulus,
irrespective of the textual label. As soon as they named every object in a stimulus, they pressed the space bar to move to the next; there was no time limit for each task or for the entire experiment.
4.3.2 Results
Participants’ gaze behaviour under cognitive stimuli
The 27 participants spent a total of 3557.73s (59.30 minutes) observing all 5 stimuli.
The participants’ mean dwell time was M = 131.77s, SD = 28.55 and ranged between
88.80 and 182.46s. Figures 4.7a, 4.7b, 4.7c and 4.7d show the spread of fixations on the AOIs for the congruent animal naming, incongruent animal naming, congruent colour naming and incongruent colour naming respectively. From these figures, we can see that the heat map appears denser on the incongruent naming than on the congruent naming for each stimulus type (animals and colours), which indicates more fixations on the incongruent naming. Fixation is a proxy for measuring atten-
tion [Corbetta et al., 1998, Pan et al., 2004, Holmqvist et al., 2011], which means that
the incongruent tasks required more attention to complete them.
The total mean fixation count across all media is 22.32. This means that for every 131s, participants had approximately 22 fixations. Figure 4.8 shows the mean fixation count for each medium. As shown in the graph, breaking it down by stimulus type (i.e. animal vs. colour naming), participants fixated more on the incongruent stimuli than the congruent stimuli in both cases. For the animal naming stimuli, the mean fixation count for the congruent stimulus was M=1.94, SD=2.51; for the colour naming stimuli, the mean fixation count for the congruent stimulus was M=2.08, SD=0.11, compared to M=2.83, SD=0.28 for the incongruent stimulus.
(a) Congruent animal naming (b) Incongruent animal naming
(c) Congruent colour naming (d) Incongruent colour naming
Figure 4.7: Heatmap showing the aggregated fixation on AOIs of each stimulus
Figure 4.8: Bar chart showing the total fixation count mean for congruent and incongruent object naming across all stimuli
Similarly, the total mean fixation duration across all media is M = 0.39s, SD = 0.09. This means that during the 131s (on average) that participants spent across all stimuli, they fixated on AOIs for a total duration of approximately 4s. Figure 4.9 shows the mean fixation duration for each medium.

Figure 4.9: Bar chart showing the total fixation duration mean (s) for congruent and incongruent object naming across all stimuli

As shown in the graph, breaking it down by stimulus type (i.e. animal vs. colour naming), participants' fixation duration was longer on the incongruent stimuli than on the congruent stimuli in both cases. For the animal naming stimuli, the mean fixation duration for the congruent stimulus was M=0.53s, SD=0.05; for the colour naming stimuli, the mean fixation duration for the congruent stimulus was M=0.28, SD=0.10, compared to M=0.34, SD=0.02 for the incongruent stimulus.
To minimise the learning effect on our analysis, we randomised the order of presenting the stimuli such that Group A was presented congruent stimuli before incongruent stimuli, while Group B was presented incongruent stimuli before congruent stimuli. Analysis between the two groups shows that there was no statistically significant difference (Mann-Whitney U, p > 0.05) between both groups' average fixation count (Group A: M = 2.23, SD = 0.38; Group B: M = 2.232, SD = 0.49) or average fixation duration (Group A: M = 0.39, SD = 0.09; Group B: M = 0.39, SD = 0.10). Therefore, deducing from the fixation count and duration, the order of stimulus presentation did not appear to influence gaze behaviour.
In summary, considering fixation count and duration as proxies of attention, and given the increased fixation count and duration on incongruent stimuli compared to congruent stimuli, we can conclude that participants needed more attention to complete the more cognitively demanding tasks (incongruent naming). Attention, however, does not tell the full story. Therefore, we look at another index of arousal, namely the analysis of pupil dilation, especially as it is the focus of AFA algorithm.
Pupillary response to cognitive load
A total of 213,985 pupil data points were extracted from the eye tracker. Participants had an average of M = 7925.37, SD = 1696.97 rows of data, which is quite varied, considering the large standard deviation. Figure 4.10 is a box plot that illustrates the distribution of the pupil dilation data grouped by congruent and incongruent naming for each stimulus type. The box plot shows that the distributions are quite similar. Therefore, simply using the mean pupil dilation to discriminate congruent from incongruent naming would not suffice. We elucidate how we utilised AFA algorithm to generate a more distinctive measure of arousal than both the raw pupil data and the gaze behaviour from Section 4.3.2. Analysing this dataset using AFA algorithm requires aggregating the data to eliminate outliers because, as we can see from the box plot, some pupil dilations are as low as 1mm, which is likely to be the effect of noise (the typical pupil size ranges between 2mm and 8mm). Furthermore, we
transform the aggregated data for each participant to fit their range and measure of
central tendency using AFA algorithm explained in Chapter 3. Finally, we apply peak
detection and arousal sensing.
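Purely as a recap, these phases can be chained using the sketch helpers introduced in Chapter 3 (our hypothetical names, not the thesis implementation):

def afa_pipeline(raw_samples, frequency=50, levels=9):
    # Phase 1: clean, aggregate into 1s windows, scale to the individual
    cleaned = clean_pupil_signal(raw_samples)
    windows = aggregate_windows(list(cleaned), frequency)
    arousal = to_arousal_levels(windows, levels)
    # Phase 2: detect the moments where arousal increases
    peaks = detect_peaks(arousal)
    return arousal, peaks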
Therefore, after processing the raw datasets through our peak detection and arousal
sensing algorithm, we generated an output, which is an array of arousal levels for
each participant over each stimulus. The mean arousal level for animal naming in the
congruent task was M = 2.46, SD = 1.67 while participants experienced nearly twice
the amount of arousal (M = 4.89, SD = 2.45) on the incongruent task. The mean
arousal level for colour naming in the congruent task was M = 2.17, SD = 1.95 while
participants experienced nearly thrice the amount of arousal (M = 6.80, SD = 1.90)
on the incongruent task. We plot the distribution of the cumulative arousal that each participant experienced on each stimulus in Figure 4.11 to further illustrate the distribution.

Figure 4.10: Box plot showing the distribution of pupil dilation for each stimulus
To evaluate our result, we performed correlation tests between the output of AFA algorithm and the ground truth (the expected level of arousal under Stroop's effect), correlating the cumulative arousal per stimulus for each participant. As described in Section 4.3.1, there were four stimuli. Congruent stimuli were categorised as level 0 (low arousal) and incongruent stimuli as level 1 (high arousal), as shown in Table 4.3. Using point-biserial correlation (a test of correlation between a continuous variable and a dichotomous variable), we found a moderate correlation, r(76) = .64, p < .01, between the expected arousal level and AFA algorithm's arousal level. Breaking this down by stimulus type, we found a moderate correlation, r(76) = .51, p < .01, between the expected arousal level and AFA algorithm's arousal level for animal naming, and a high correlation, r(76) = .77, p < .01, for colour naming. This suggests clearer discrimination between congruent and incongruent stimuli for colour naming than for animal naming. To examine whether there is a statistical difference, we apply the Mann-Whitney U test.
Figure 4.11: Violin plot showing the data distribution of the output of the algorithm for each stimulus
Results show that the algorithm can discriminate between congruent and incongruent stimuli, in both the animal naming (U(76) = 51, p < .05) and colour naming tasks (U(76) = 142.5, p < .05). We discuss the lessons learnt from this study next.
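For reference, both tests are available in SciPy; a minimal sketch with hypothetical data (0 = congruent/low arousal, 1 = incongruent/high arousal) is:

import numpy as np
from scipy import stats

expected = np.array([0, 0, 0, 0, 1, 1, 1, 1])                  # ground-truth labels
arousal = np.array([2.1, 1.8, 2.6, 2.0, 5.2, 4.7, 6.1, 4.4])   # AFA output per task

r, p = stats.pointbiserialr(expected, arousal)       # dichotomous vs continuous
u, p_u = stats.mannwhitneyu(arousal[expected == 0],  # congruent (control)
                            arousal[expected == 1],  # incongruent
                            alternative="two-sided")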
4.3.3 Lessons learnt from the analysis of cognitive stimuli
From our experimental design, incongruent naming of objects represents increased cog-
nitive load, while congruent naming simulates the controlled condition. The result of
our within-subject design shows that participants required more attention to complete
the more cognitively demanding tasks as seen from the gaze behaviour [Corbetta et al.,
1998, Pan et al., 2004, Holmqvist et al., 2011]. The results of our analysis of the par-
ticipants’ gaze behaviour confirm previous claims in the literature, that people fixate
more and for a longer duration to pay more attention and complete more cognitively
demanding tasks. Cognitive stimuli differ from emotional stimuli in terms of
arousal. For example, fear is an emotion that is characterised by a negative valence
(displeasure), increased arousal, and non-dominant (submissive response). The sub-
missive response means that participants will tend to avoid stimuli that cause fear,
thereby exhibiting low fixation count and duration, in terms of their gaze behaviour.
In our analysis of emotional stimulus in the previous chapter, we showed that AFA
algorithm was able to identify an increase in arousal under different emotional stim-
uli. Despite the potential differences in gaze behaviour between cognitively induced
arousal and emotionally evoked arousal, AFA algorithm was able to discriminate be-
tween congruent and incongruent arousal.
We observed that AFA algorithm was able to identify the arousal signal more clearly in the colour naming task than in the animal naming task. This may suggest that the incongruent colour naming task was more difficult than the incongruent animal naming task. In addition to that possibility, the following quote from participant P14M suggests that some participants were able to adopt coping mechanisms to avoid the incorrect naming of animal objects, whereas this specific coping mechanism was not as effective for the incongruent colour naming task.
“I was able to view the animals passively without looking at the captions.
This was more difficult for the colours because I could not name the colours
themselves while also avoiding the text on the colours. This is why I called
some of the colours the wrong name.”
[P14M]
This phenomenon could be related to the barriers of language retrieval. Language
or word retrieval is the process of recalling a target word from memory [Kambanaros
et al., 2013]. People adopt several strategies while retrieving words from memory,
e.g. by associating semantic relevance of the word to the context or by perceptual
relevance [La Heij, 1988]. In animal naming, there are more visually distinguishing cues to associate the animals with, for example, their shape, size and distinguishing features (e.g., the wings of a bird or the trunk of an elephant) [Martin et al., 1994]. For colours, by contrast, participants have limited features to aid in word retrieval [Shao et al., 2015].
Furthermore, the quote from participant P25M suggested that naming animals in
the English language presented a language barrier to the task.
“As a native Arabic speaker, I could not remember what some of the an-
imals were called in English. For example, I called the spider a scorpion.
That was one level of confusion, as well as the wrong labelling. It was easier
for me to remember the English names for the colours than the animals.”
[P25M]
This is a well-researched phenomenon in the literature on cross-cultural information retrieval [Ballesteros and Croft, 1998]. It could have been taken into consideration, for example, by recruiting only native English speakers or by collecting English language proficiency as part of the participants' information so that it could be accounted for in the analysis.
Therefore, it is likely that some participants experienced increased cognitive load during animal naming because they could not recollect the English word for the animal, thereby making the control task less effective for animal naming than for colour naming.
We also showed that although gaze behaviour is a capable discriminator of increased cognitive arousal, combining it with pupil dilation increases this distinction significantly. In reality, the boundary between the causes of increased arousal may not be determinable. This is why an approach that takes an average of participants' reactions under different stimuli may not be effective, and an event-based approach that is capable of sensing moments of increased arousal is preferable. AFA algorithm, which uses continuous peak detection to sense when a participant experiences an increase in arousal while combining it with the detection of areas of focal attention, can be used to identify the areas on a user interface that induce increased cognitive load. Designers can, therefore, address this by adapting the user interface in real-time or offline, or by providing hints to the user where this is not possible. We acknowledge that the Stroop's effect study evaluates AFA algorithm by comparing a controlled condition with cognitively induced arousal, which may be less realistic. Therefore, the study in the next chapter was carried out using stimuli which the literature has described as some of the common causes of end-user frustration on the Web.
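To make the event-based idea concrete, the following is a minimal sketch of how continuous peak detection over a pupil-dilation trace could be combined with fixation data to attribute arousal events to AOIs. The function name, the prominence threshold and the synthetic data are our own illustrative assumptions, not the thesis implementation.

```python
# Minimal sketch of the event-based idea: detect peaks (moments of increased
# arousal) in the pupil signal, then attribute each peak to the AOI fixated
# at that moment. Thresholds and data are illustrative assumptions.
import numpy as np
from scipy.signal import find_peaks

def arousal_events_per_aoi(pupil, fixated_aoi, prominence=0.1):
    """pupil: 1-D array of pupil diameters (one sample per time step).
    fixated_aoi: AOI label fixated at each time step (same length)."""
    peaks, _ = find_peaks(pupil, prominence=prominence)
    counts = {}
    for t in peaks:
        counts[fixated_aoi[t]] = counts.get(fixated_aoi[t], 0) + 1
    return counts  # AOIs ranked by the arousal events they coincide with

pupil = np.array([3.0, 3.1, 3.6, 3.2, 3.1, 3.0, 3.5, 3.9, 3.4, 3.1])
aois = ["menu", "menu", "menu", "form", "form", "form", "ad", "ad", "ad", "ad"]
print(arousal_events_per_aoi(pupil, aois))  # {'menu': 1, 'ad': 1}
```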
4.3.4 Summary
Cognitive overload is one of the causes of stress during user interaction. The term user-
friendliness indicates that a user interface is intuitive and corresponds to the users’
expected outcome. The presence of cognitive overload, however, is an indication that
the content, presentation, layout or structure of multimedia content may not be fit for
purpose. Arousal can be used as a proxy to sense cognitive overload. In this chapter, we showed that AFA algorithm could be used to discriminate between controlled conditions and cognitively induced arousal. Using Stroop's effect, we induced cognitive arousal in 27 participants. Results showed that AFA algorithm could discriminate between incongruent and congruent tasks for both the animal naming and the colour naming tasks.
In this chapter, we have shown through our studies that AFA algorithm is capable of sensing arousal due to emotional or cognitive stimuli. Both types of stimuli took the form of images and contained effects that are not frequently encountered, especially at such magnitude, during user interaction. Therefore, we need to examine the behaviour of AFA algorithm on common causes of stress during user interaction, to show that AFA algorithm is ecologically valid.
In the next chapter, we evaluate the ability of AFA algorithm to sense frustration-induced arousal on the Web.
Chapter 5
Sensing frustration-induced arousal
on the Web
In the previous chapter, we evaluated AFA algorithm on emotionally-evoked arousal
and cognitively induced arousal. Those evaluations allowed us to examine how the
algorithm would perform on static stimuli as, in both cases, we used images to induce the desired levels of arousal in the participants. However, in addition to visual
engagement and cognitive activities, participants browsing the Web engage in more
complex interactions such as typing on the keyboard, mouse scrolling, mouse-clicking
and mouse hovering. Furthermore, people use these modes of interaction to achieve
specific tasks. For example, people tend to scroll during reading tasks, people use the
keyboard to type characters when filling forms, and people hover their mouse in target
acquisition tasks. Any failure in computer peripherals to aid interaction could result
in frustration. Besides the user’s interaction, other factors deter successful task com-
pletion on the Web. For example, network failure, hardware and software malfunction,
and poor design of user interfaces may prevent users from completing their tasks.
Task completion is a critical factor for the usability of a system. The inability of users
to complete their tasks may result in several emotional and cognitive responses that
hinder the quality of user experience. Since we have previously tested AFA algorithm
on detecting arousal from emotional and cognitive stimuli, in this chapter, we evalu-
ate AFA algorithm in detecting these same psycho-physiological signals when users are
unable to complete their web interaction tasks due to hindrances such as software,
hardware and network failure.
5.1 Why frustration?
Detecting frustration on the Web provides a greater challenge compared to the previ-
ous evaluations in Chapter 4 due to the various modes of interacting with a website,
and the events that could go wrong. Some frustrating events may result in lower levels
of arousal compared to purely emotional or cognitive stimuli like those in Chapter
4. However, it is also crucial to detect lower levels of arousal because the cumulative
effects of undesired states could result in an overall poor user experience towards a
system. Moreover, if frustration is not prevented or accounted for, it can lead to other
negative emotions including anger, aggression, sadness, disappointment, fear, anxiety
and withdrawal from the use of a software application or website [Jeronimus and Laceulle, 2017],
all of which are limiting factors towards good user experience.
Berkowitz et al. defined frustration as an emotional response to delay or hindrance
in achieving a goal [Berkowitz, 1962]. It is a negative affective state that leads to
an increase in physiological arousal [Storms and Spector, 1987]. In interactive systems, frustration is caused mainly by incorrect or unexpected responses to the users' interaction. This may be due to the nature of the task (e.g. difficulty), hardware infrastructure, user interface, software system, network or even the user's mental and cognitive state [Olson and Olson, 2003]. Frustration can lead to reductions in accuracy and speed, as well as other psychological consequences such as loss of motivation, all of which limit the quality of user experience in terms of performance [Lazar et al., 2006a].
Frustration is an idiosyncratic experience, and people respond to its effects differently.
In a study by Szasz et al., it was discovered that people respond differently to frustrat-
ing tasks depending on their coping mechanisms [Szasz et al., 2011]. Some people are
more resilient than others, people have different competency levels using technology,
and some people react differently to the layout and presentation of Web contents, as can be seen in our case study of neurotypical people vs autistic people on the Web (Chapter 6). A single set of heuristics or guidelines to manage frustration may therefore not generalise or be effective for every user. People's prior experience, personality traits and demographic background (e.g. culture, age, gender and location) result in
different tolerances and reactions towards incorrect or unexpected responses to com-
puter usage [Lazar et al., 2006a]. Therefore, interventions need to be administered
to users individually and per case. This concept holds certain similarities to the field
of medicine, where personalised healthcare is used to deliver treatment to each pa-
tient based on the severity of their symptoms, demographic profile, genetics, etc. One
way to accomplish this during user interaction is to detect frustration as it occurs
automatically. Physiological sensors, for example, heart rate and blood pressure were
suggested as a means to examine the effects of frustration on an individual basis [Szasz
et al., 2011].
Affect detection systems have often been evaluated using cognitively induced arousal as well as emotionally evoked arousal, but frustration-induced arousal has been understudied. In the next section, we discuss works that are similar to ours regarding frustration detection in interactive systems.
5.2 Related work on sensing frustration in interactive systems
In 2002, Klein et al. proposed an interactive support affective agent that helps users
manage and recover from negative emotional experiences during computer use by
demonstrating active listening, empathy, and sympathy [Klein et al., 2002]. Despite
the relevance of their work in ameliorating the effects of frustration, a system has to
be in place to sense when frustration occurs. In 2010, Lunn et al. performed a study using galvanic skin response to sense stress levels between older and younger users, and between static and dynamic contents on the Web. Their results showed no significant difference by age, but they observed that older users had a more varied physiological response to dynamic contents compared to static contents. This may indicate hesitancy due to a lack of familiarisation, and cautiousness towards dynamic contents [Lunn and Harper, 2010b]. Lunn et al.'s work identifies patterns of
interaction and physiological responses that can help build heuristics for older users.
Our work improves on this through the analysis of gaze behaviour to detect the users' focal attention in a more generalisable way, applicable not only to older users but also to a wider demography of users. In another study, facial electromyography (EMG)
was used to discriminate between correct and incorrect tasks and between novice and expert users, while the difficulty rating of a website was used as an index of frustration [Hazlett, 2003]. As mentioned earlier, frustration can occur when participants do not achieve the expected outcome of an interaction (correctness and completeness), but other factors cause frustration on the Web besides users' ability to complete tasks. In another
study that was based on the hypothesis that people who are frustrated tend to apply
more pressure to their mouse device, Qi et al. used a mouse mounted with an 8-point
pressure sensor to collect pressure information from participants [Qi et al., 2001]. Qi et
al. further improved the accuracy of this study, using a Bayesian model to sense par-
ticipants’ frustration on an individual basis, achieving an accuracy of 88% [Qi et al.,
2001]. Using the mouse device as a sensor for frustration detection is ideal for use
in applications that make consistent use of the mouse. AFA algorithm complements
their approach, especially in applications that make use of inconsistent/limited mouse
interactions to perform tasks.
A multi-modal approach using a pressure-sensitive mouse, pressure-sensitive chair and
a camera was used to develop a model for predicting frustration in user search tasks
with an accuracy of 65% [Feild et al., 2010]. In the same study, the interaction events were logged and used as predictive features. With the event log data, Feild et al. achieved a higher accuracy of 87%. In another multi-modal study, Kapoor et al. used a camera, posture-sensing chair, pressure mouse, skin conductance and the game state of the user to detect frustration with an accuracy of 79%. Galvanic skin response and gaze data have also been used in combination to sense the frustration and severity of
usability problems [Bruun et al., 2016]. Several other models have been built to predict
both satisfaction and frustration, especially for consumer-based users [Garrett et al.,
2004]. Many of these approaches have limited potential for widespread application in
the wild due to their use of multi-modal sensors. The ideal solution for arousal detec-
tion in interactive systems should be unobtrusive and respond to low-intensity changes.
It should also be generalisable (not specific to a software application). Furthermore, unimodal solutions are more suitable because using multiple sensors decreases the likelihood of ubiquitous use. Our proposed solution fulfils these criteria.
During a frustrating experience, there is an increase in arousal [Hokanson and Burgess,
1964]. Therefore, our evaluation in this chapter aims to extend the validity of AFA algorithm towards its use in sensing frustration-induced arousal on the Web.
5.3 Research contributions through this study
This evaluation also examines the ecological validity of AFA algorithm, since we make use of more practical stimuli. Ecological validity is one of the main challenges in sensing arousal in the field of affective computing in the wild. We aim to answer the following research questions in this study:
RQ1. Can pupillary response be used to sense arousal, induced by frustra-
tion on the Web?
For this, we examine whether we can yield consistent results on the Web, as with emotional and cognitively induced arousal on static images. Frustration may induce arousal of lower intensity compared to emotional images or cognitive load; would AFA algorithm be able to sense such low-intensity signals? This research question partially addresses the research question RQ2 stated in Chapter 1, pertaining to the overall goal of our PhD research.
RQ2. Is there a relationship between participants’ levels of arousal and their
focal attention during moments of frustration?
For this, we examine whether we can localise the cause of arousal to a certain element on the screen. When people feel stress caused by a UI element, can we identify this, so that AFA algorithm can inform potential interventions to ameliorate frustration with techniques such as adaptive computing or recommender systems? This research question partially addresses the research question RQ3 stated in Chapter 1, one of the goals of our PhD research.
To answer these questions, we induced participants with known causes of frustration.
The choice of stimuli was based on a study by Ceaparu et al. where the frequency,
the cause, and the level of severity of frustrating experiences in interactive systems
were researched through the use of diaries and surveys [Ceaparu et al., 2004]. In
our study, we induced participants with these known causes of frustration selected
from Ceaparu et al.’s study. Further, we applied AFA algorithm to sense arousal in
participants and detect their focal attention when the frustrating component could be
segmented on the screen. We describe our study in further detail.
5.4 Experiment
This experiment was approved by The University of Manchester committee on ethics
and informed consent was obtained prior to participants taking part in the study
(approval number: 2018-4365-5934). The participant information sheet and the consent form for this study are appended in Appendices C.1 and C.2, respectively.
5.4.1 Participants
Participants (N =40, Female=13, Male=27) with a median age of 25 years (M =
26.33, SD = 5.72) were recruited for this study. Invitations to take part were promoted via poster advertisements, emails, and word-of-mouth within the university community. All participants identified as having normal or corrected-to-normal vision.
5.4.2 Materials and procedure
A Tobii X2-60 eye-tracker was used to capture pupillary response, fixation location,
and fixation duration of the participants at a frequency of 50Hz. As they carried out the
tasks, a Logitech 1080-pixel Web camera was used to capture video of the participants.
This video was replayed to the participants after the study to aid their recall regarding
episodes of frustration while filling in their self-reported measure of frustration on a
questionnaire (see Appendix C.3 for the questionnaire). After participants rated their
levels of frustration, we took note of comments, feedback or self-reflection from the
participants regarding the study. A mouse and keyboard was used to interact with the
websites, while a 15.6 ′′ monitor at 1366 × 768 pixels resolution was used to view the
instructions and carry out the tasks in full screen mode. The experiment was run on
a Dell Latitude E5530 notebook, Windows 7 operating system with a Mozilla Firefox
Quantum 59.0.2 (64-bit) Web browser.
5.4.3 Method
The study took place in a usability laboratory with a regulated level of illumination to control for changes in pupil dilation due to light. The study used a 4×2 within-subject design with a random order of stimulus presentation. There were two levels of effect for each website presented to participants: 'normal' and 'disruptive' interaction.
During the normal interaction, participants carried out tasks without experiencing
unexpected responses. During the disruptive interaction, participants experienced a
simulated operating system failure, pop-ups, internet time-out and mouse malfunc-
tion on Google, Wikipedia, National Express and BBC websites, respectively. The
disruptions were based on the most common causes of end-user frustration reported
by Ceaparu et al. [Ceaparu et al., 2004].
The websites were selected such that they represented different types of tasks: infor-
mation search, reading, data entry, and pointing tasks. Table 5.1 shows the list of
tasks, descriptions and disruptions associated with each website. The disruptive tasks
were implemented using Javascript and injected into the websites using Violentmonkey
(a browser-based plugin that inserts user scripts into websites at run time) [Gerald,
2018]. Other tools for creating user-scripts include Tampermonkey [Biniok, 2018] and Greasemonkey [Buchanan, 2018]. At the time of writing the user-scripts for this study, we selected Violentmonkey because it is open-source and was compatible with the Web browser version required by our eye-tracking software. Each task ID starts with 'T' followed by the task number [1-4], while the last character stands for 'D' (disruptive) or 'N' (non-disruptive). We describe the design of the four disruptive tasks in further detail:
T1D: A booking task where participants were instructed to book a trip to Manchester.
The rationale behind this task is to simulate what participants experience when
there is a time-out just before completing the task. In reality, a time-out can be
caused by system errors, internet connection, unexpected software behaviours
such as an interrupted system restart for an operating system software update.
When these disruptions occur, users are unable to complete their tasks, and this
leads to frustration due to wasted effort and, consequently, loss of time. Often, they have to redo the task from the start. This effect is similar in retail e-commerce websites, job application websites, and hotel and transportation booking tasks, where participants fill in forms to complete their tasks. Transportation booking sites follow a similar process, so familiarity is less likely to be a biasing condition compared to some of the forms listed above. We selected the National Express website as it is one of the popular coach booking websites in the UK. For the disruptive version of the task, our design was such that the participant would experience a time-out response just before reaching the point of paying for their trip from London to Manchester. After 3 seconds, the participants were redirected to start the booking again, this time without disruption. Our expectation, backed by the literature, is that participants would be frustrated by having to start the process all over again [Lazar et al., 2006b, Qi et al., 2001].
T2D: A pointing task where participants were instructed to check the weather in
Manchester during a distinct time. Participants were required to select the loca-
tion, date and time by hovering, pointing and clicking using their mouse device.
Therefore, it was suitable for the mouse malfunction disruption. Other examples where mouse pointing is frequently used are gaming applications, website search tasks and desktop browsing with the file browser. Since our study is about frustration on the Web, we selected BBC.com as it is ranked as the sixth most popular website in the UK (2018) by https://www.alexa.com/topsites. For the disruptive task, we made the standard mouse pointer invisible. Using CSS and Javascript, we painted a lookalike of the mouse pointer and made the fake pointer disappear and reappear at random intervals and locations, so that it simulated a faulty mouse device. We anticipated that the lack of total control over the mouse pointer would cause the participants to become frustrated. Furthermore, findings in the literature suggest that unexpected outcomes, imprecision and inaccuracy of pointing devices are sources of frustration in interactive systems [Benko and Wigdor, 2010].
T3D: A search task, where participants were instructed to conduct a simple Google
search to find out the current time in Ottawa, Canada. The purpose of this task is
to simulate an operating system failure. An operating system failure can result in a loss of data, communication, time, and even money to the user. Such failures can occur at any point during user interaction and are usually unwelcome events. Therefore, we hypothesised that an unwelcome event such as an operating system crash during the study would induce frustration in the participant. We chose to do this in a simple task such as Web search so that users would not feel that they had caused the system to crash, since they were only carrying out a fairly easy Google search. We used the Windows operating system because operating system crashes are more common on Windows compared to other operating systems.
For this disruptive mode, the user-script is triggered as soon as the search query is executed. The user-script uses Javascript to redirect the page to a picture, in full-screen mode, that resembles the well-known 'Blue Screen of Death' (BSOD, or Blue Screen for short). Since the experiment is presented in full-screen mode, the search bar and taskbars are hidden, so the participants were led to believe that there was an actual operating system failure. After 3 seconds, the participants are redirected back to the default Google search page.
T4D: A reading task in which the participants are instructed to look for Stephen
Hawking’s PhD thesis on his Wikipedia page. The target was at the bottom of
the page, so users were expected to scroll down. To scroll, participants needed
to dismiss the pop-up by checking the button which says “Do not display the
pop-up again". The Wikipedia website was chosen because it is one of the most popular media for looking up biographical information (ranked as the seventh most popular website in the UK (2018) by https://www.alexa.com/topsites). The main aim of this task is to induce frustration through pop-ups. A reading task was chosen because the pop-up hinders both the participants' view and their interaction with the webpage. Furthermore, pop-ups are known to cause frustration on the Web, as reported in the literature [Bahr and Ford, 2011, Sanghoon and Roberto, 2005, Baylor and Rosenberg-Kima, 2006].
For this disruption task, the user-script is launched as soon as the page is loaded to show a pop-up with the content "Invalid user action". The information that participants needed to find was located at a position on the page where they had to scroll down to see it. The participants were then interrupted by pop-ups at one-second intervals.

Figure 5.1: Disruption to tasks to elicit frustration: T1. Time-out experienced when booking a trip, T2. Mouse location altered when selecting weather information, T3. Operating system error during Google search and T4. Multiple pop-ups interrupting Wikipedia content lookup
Figure 5.1 illustrates the tasks and websites.
As stated previously, the aim of the study was to measure the performance of our
proposed approach (pupillary response and eye-tracking) by discriminating between a
temporal period of frustration and normal user interaction. We discuss how we carried
out our analysis next.
Table 5.1: Experimental tasks

Task ID | Website          | Task description                               | Simulated disruption
T1N     | National Express | Book trip from Manchester to London            | None
T2N     | BBC Weather      | Check weather in London                        | None
T3N     | Google           | Check time in Ottawa (Canada)                  | None
T4N     | Wikipedia        | Find the title of Alan Turing's PhD thesis     | None
T1D     | National Express | Book trip from London to Manchester            | Session time-out
T2D     | BBC Weather      | Check weather in Manchester                    | Mouse malfunction
T3D     | Google           | Check time in Canberra (Australia)             | OS failure
T4D     | Wikipedia        | Find the title of Stephen Hawking's PhD thesis | Multiple pop-ups

NB: N = normal, D = disrupted
5.4.4 Analysis
In this study, the entire stimulus (task) was treated as a single AOI. Table 5.1 shows the tasks that were carried out by participants. If any task was incomplete due to network failure or website malfunction (not by our design), that participant's entire dataset was excluded from the analysis to preserve the balance of our within-subject design. Ten participants were excluded based on this criterion. Linear interpolation was performed to replace missing values from the eye-tracker. Furthermore, the pupil (right or left) with the most complete data extracted from the eye-tracker was used as the input to the analysis algorithm described in Chapter 3.
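A minimal sketch of this preprocessing step is shown below, assuming the raw export holds one pupil column per eye with NaN for lost samples; the file and column names are hypothetical, not the study's own pipeline.

```python
# Sketch of the preprocessing described above (column names are hypothetical):
# pick the eye with the most complete data, then linearly interpolate gaps.
import pandas as pd

raw = pd.read_csv("participant_01.csv")  # hypothetical eye-tracker export

# Select the pupil (left or right) with the fewest missing samples ...
best_eye = min(["pupil_left", "pupil_right"], key=lambda col: raw[col].isna().sum())

# ... and replace the remaining missing values by linear interpolation.
pupil = raw[best_eye].interpolate(method="linear", limit_direction="both")
```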
5.5 Results
Results of our analysis show that AFA algorithm can sense arousal in frustrated participants with a large effect size (η2p). We present this in further detail below.
The output of the algorithm described in section 5.4.4 was stored as a dependent vari-
able (DV) in a vector for further statistical analysis against the independent variables
(IVs) - the website, and the mode of interaction (normal or disrupted). The violin
plot in Figure 5.2, grouped by task and mode of interaction, shows the distribution of the DV, which illustrates a difference in the pattern for disruptive (red) vs non-disruptive (blue) tasks.
The experiment utilised a 4×2 within-subject design. Therefore, we performed a two-way repeated measures analysis of variance (ANOVA). The dependent variable was the cumulative arousal reported by the algorithm, and the independent variables were the website and the mode of interaction (with two levels, 'normal' and 'disruptive'). There was a significant difference between the modes of interaction on the algorithm's arousal levels with a large effect (F(1) = 400.303, p < .001, η2p = .690). We also observed a significant difference between the websites on the algorithm's measure of arousal with a large effect (F(3) = 182.669, p < .001, η2p = .849). However, a pairwise comparison (Wilcoxon's test) on the websites revealed that T1 (booking a trip) was significantly different from the other tasks. Table 5.2 shows the pairwise comparison between the websites, while Figure 5.3
Figure 5.2: Violin plot of the data distribution of the level of arousal in all tasks for both groups (disruptive and normal)
illustrates that website T1 appears to elicit more arousal than the others. On further
investigation, excluding T1 from the analysis, the same ANOVA shows that the web-
site had no significant effect on the arousal level (F (2) = 0.292, p > .001, η2p = .009),
while the mode of interaction still has a large effect (F (1) = 100.7, p < .001, η2p = .537).
The average time of completion for task T1 (M = 62.586s, SD = 26.732) was more than twice as long as T2 (M = 24.330s, SD = 10.294), T3 (M = 22.475s, SD = 12.399) and T4 (M = 21.667s, SD = 18.206), which may have influenced the cumulative arousal score in T1 compared to the other tasks.
Table 5.2: Results of Wilcoxon test comparing arousal between each task, with Bonferroni correction α = 0.008

Group 1 | Group 2 | W     | p-value
T1      | T2      | 45.0  | <.001*
T1      | T3      | 38.0  | <.001*
T1      | T4      | 79.0  | <.001*
T2      | T3      | 896.0 | 1.000
T2      | T4      | 720.0 | .907
T3      | T4      | 739.0 | .195
Note: * = p < .008
Figure 5.3: Bar chart with error bars (standard error of the mean) showing the tasks (both modes of interaction combined) vs level of arousal
Similarly, we performed a Tukey HSD test to determine how the measure of arousal differed based on the mode of interaction; the results show that there is a significant difference (M = 1.25, p < .001, CI.95 = [0.69, 1.82]).
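For illustration, the sketch below shows how such a two-way repeated-measures ANOVA can be specified, assuming a long-format table with one row per participant, website and mode; the column names are ours, and this is not the study's own script.

```python
# Sketch of the 4x2 repeated-measures ANOVA, assuming a long-format table
# (one row per participant x website x mode). Column names are illustrative.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

df = pd.read_csv("arousal_long.csv")  # columns: participant, website, mode, arousal

aov = AnovaRM(data=df, depvar="arousal", subject="participant",
              within=["website", "mode"]).fit()
print(aov)  # F and p values for website, mode, and their interaction
```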
We conducted a pairwise comparison (Wilcoxon’s test) between the normal interaction
mode and the disruptive mode for each website. Table 5.3 shows that the algorithm
was able to discriminate arousal in all four tasks. Figure 5.4 also illustrates this. This
result answers RQ1, which pertains to whether AFA algorithm is capable of distinguishing between frustrating tasks and controlled tasks.
Table 5.3: Results of Wilcoxon test comparing the modes of interaction within each task, with Bonferroni correction α = 0.0125

Group 1 | Group 2 | W     | p-value
T1D     | T1N     | 0.0   | <.001*
T2D     | T2N     | 121.0 | .004*
T3D     | T3N     | 28.0  | <.001*
T4D     | T4N     | 1.0   | <.001*
NB: N = normal, D = disrupted; * = p < .0125
Figure 5.4: Bar chart with error bars (standard error of the mean) showing all tasks vs level of arousal
We have presented the analysis of variance of the measure of arousal, considering the website and the mode of interaction as factors. We now consider an additional method to evaluate AFA algorithm against the ground truth, by testing how similar the output of the algorithm (DV) is to the participants' rating and the mode of interaction. A Spearman correlation test between the DV and the participant rating shows a moderate correlation (rs(301) = .325, p < .001). However, the moderate interrater agreement (κ = .600) between both sources of ground truth - the participants' rating and the mode of interaction (normal and disruptive) - suggests that there is notable disagreement. The quotes from participants P01M, P37M and P05M may explain why some participants were not frustrated by the disruptive tasks.
“I am familiar with time-outs, I never expect to fill a form without errors.”
[P01M]
"I didn't even notice that there were pop-ups."
[P37M]
Also, some participants' feedback (P05M and P14M) suggests that the order of presenting the stimuli may have influenced their arousal levels.
“Towards the later tasks, I began to suspect that there was a trick going on
so, I became relaxed.”
[P05M]
“I had problems remembering the instructions earlier on, but it became
easier as the experiment progressed.”
[P14M]
For some participants, their personal experiences may have influenced their response
to the stimuli. For example, a participant who rated both modes of interaction on T1
(booking a trip) as frustrating said:
“Trips are frustrating [...] anyway, I don’t like to see anything that reminds
me about travelling.”
[P31M]
Conversely, another participant rated the normal mode in T1 (booking a trip) as
frustrating, while giving a normal rating for the disruptive mode. The participant’s
feedback was:
“...not happy thinking about a trip to Manchester.”
[P12M]
indicating that previous experiences may have influenced their rating, rather than the stimulus. As previously stated, the purpose of this study is to evaluate the performance of AFA algorithm in discriminating frustration-induced arousal from normal interaction. We have two sources of ground truth to validate our dataset. For more confidence that the intervention group's dataset contains only instances of arousal, the participant must be carrying out a frustrating task and also rate the task as frustrating. For more confidence that the control group's dataset reflects a normal state, the participant must be carrying out a
non-disrupted task and report it as non-frustrating. As a form of secondary analysis, we investigated this concept by carrying out the same test on the subset of the records with perfect agreement. This means that we excluded 109 records where the participant reported frustration on a normal task, or reported normal on a disrupted task, to include only records where the ground truths agree. In our evaluation on emotional stimuli via the picture-viewing study, we learnt that by eliminating confounding factors such as lack of interest and stimuli desensitisation, we could carry out a better evaluation of AFA algorithm. In this particular study, we anticipated that inaccuracy in the participants' reported feedback could be a confounding factor. We also anticipated that since we are using low-intensity stimuli, the effect might be missed by some participants. Therefore, analysing only the data where our two sources of ground truth agree, i.e., participants reporting stress when the stimulus was designed to be stressful, or participants reporting no stress on the controlled stimuli, ensures that our ground truth reflects more accurately what the participants felt. We observed a moderate but higher correlation between the consolidated ground truth and the algorithm (rs(192) = .474, p < .001) after eliminating these confounding factors. We discuss the implications of this increase in correlation in the discussion section. The large effects observed from the ANOVA results and the moderate correlation between the ground truth and the arousal level suggest that the algorithm can detect frustration-induced arousal, with regard to research question RQ1. However, to confirm that this was indeed due to the frustrating component on the screen, as with our research question RQ2, we carried out an analysis on the participants' gaze data to ensure that their attention was focused on the source of frustration during the period of increased arousal.
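As an illustrative sketch of this consolidation step, the code below filters records to the perfect-agreement subset and re-runs the correlation; the file layout, column names and codings are our own assumptions, not the study's analysis script.

```python
# Sketch of the secondary analysis (column names and codings are ours): keep
# only records where the self-report agrees with the mode of interaction,
# then correlate the algorithm's output with the consolidated ground truth.
import pandas as pd
from scipy.stats import spearmanr
from sklearn.metrics import cohen_kappa_score

df = pd.read_csv("records.csv")  # columns: arousal, self_report, mode

reported = (df["self_report"] == "frustrated").astype(int)
disrupted = (df["mode"] == "disruptive").astype(int)

# Interrater agreement between the two sources of ground truth.
print("kappa =", cohen_kappa_score(reported, disrupted))

# Perfect-agreement subset, then Spearman correlation with the algorithm.
agree = reported == disrupted
rho, p = spearmanr(df.loc[agree, "arousal"], disrupted[agree])
print(f"rs = {rho:.3f}, p = {p:.3g}")
```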
The use of fixation metrics as a proxy to measure attention is common practice in eye-tracking studies [Corbetta et al., 1998, Pan et al., 2004, Holmqvist et al., 2011]. Tasks T2D and T3D, which were the mouse pointer malfunction and the operating system crash respectively, did not have specific UI elements on the screen that caused increased arousal. Therefore, we subset only T1D and T4D for this analysis, because the frustrating elements in these tasks were located in a specific area of the screen, i.e., the time-out message in T1 and the pop-up messages in T4. We segmented these tasks into AOIs around the location of the pop-up and time-out message. The mean fixation count (number of fixations) for T1D on the time-out message was 105.027 (SD = 22.514), and for T4D, the mean was 38.210 (SD = 22.193). We performed a Spearman's correlation test and observed a high correlation between the fixation count and the algorithm's measure of arousal for those tasks (rs(75) = .867, p < .001). This result is supported by literature in eye tracking and usability [Ehmke and Wilson, 2007]. Figure 5.5 illustrates that participants who fixated more on the disruptor were more likely to be frustrated.

Figure 5.5: Number of fixations vs level of arousal (all observations n=75)

This suggests that the algorithm is both capable of discriminating between frustration-induced arousal and normal interaction (RQ1), while also detecting the participants' focal attention during these moments (RQ2).
5.6 Discussion
Based on the literature, we selected a list of relevant frustrating events during user
interaction with the assumption that those events would induce an increase in the
participants’ level of arousal. The moderate agreement between feedback from the
participant and expected feedback shows that there is some validity to this assumption,
but also means that not all participants conformed to our expectations. Also, there was a significant increase in the correlation between the ground truth and the output of the algorithm when we excluded records with ground-truth disagreement (between the self-report and the mode of interaction). We discuss possible reasons below.
Individual-response (IR) specificity, where people react idiosyncratically to the same stimulus, has been shown to influence people's autonomic response to stimuli [Engel,
1960, Wenger et al., 1961]. The quotes from participants P01M and P37M suggest that IR specificity affected this study.
The lack of perfect agreement could also be a result of bias in the participants’
self-report. Bias can be caused by exaggerated [Kihlstrom et al., 2000] and under-
stated responses [Levine and Safer, 2002]. Recall bias, where participants are unable to recollect or quantify their experience, is also a common problem in studies that use retrospective self-reported feedback [Hassan, 2006]. To mitigate this, we replayed
the video of the participant during their participation, in the order in which they
performed the tasks to aid them in remembering their level of frustration during the
feedback session. Another factor that could have influenced the moderate interrater agreement is the order in which the tasks were presented to the participants. A consequence of this is that the initial tasks may have had a more frustrating impact than subsequent tasks. For example, the quotes from P05M and P14M suggested that the order may have influenced their susceptibility to the disruptive stimuli and their cognitive states, respectively. The randomised order of the study would have been expected to reduce the effect of the stimuli presentation order in the main results, but this would still have impacted the interrater agreement. For some participants, their personal experiences may have influenced their response to the stimuli, as exemplified by the quotes from P31M and P12M in the results section.
The results suggested that the website contributed to the measure of arousal. Table 5.2 and Figure 5.3 highlight that only T1 (booking a trip) was significantly different from the other tasks. The average duration of this task was also higher than that of the other tasks. Therefore, we speculate that the time, and consequently the effort, required to complete T1 could have aggravated the participants' level of
frustration. This is not to dismiss the type of disruption experienced in task T1 (form time-out) as a contributing factor to the increased frustration level compared to other tasks. If we consider each disruptor by examining the mean difference in arousal between the modes of interaction on each task, results show that T1 had the largest mean difference. According to this, the time-out was the most frustrating disruption, followed by pop-ups, then operating system failure, while the mouse pointer disruptor was the least frustrating in this study. We discuss external validity as a limitation of our experimental design below; considering that this was a controlled study, our evaluation was based on internal validity. Concerning research question RQ1, results suggest that frustration-induced arousal on the Web can be sensed using AFA algorithm. Results also show that participants who fixated more on the area of interest causing arousal experienced more arousal, which is an indication that gaze behaviour can be used to reveal the cause of arousal. In adaptive computing, where user interfaces can be adapted to suit individuals, it is necessary to understand how each component of the UI affects the user. In most affect detection mechanisms, it is difficult to determine the relationship between the visual components and user affect, making them less suitable for adaptive computing or for identifying UX problems. The results provide further credence for the use of AFA algorithm in sensing and quantifying arousal on the Web.
In this controlled study, we showed evidence that common causes of frustration, which may appear subtle to the user but have critical effects on user experience, can be sensed with our proposed approach.
Our findings from this particular study have the potential to impact the way website contents are structured and delivered. Since the algorithm is lightweight (not requiring large computational resources), we propose a system, based on a plugin architecture, in which other third-party systems can receive a trigger from the algorithm whenever the level of arousal exceeds a given threshold. Third-party adaptive systems can, therefore, influence user interaction by altering the presentation (layout, colour, font properties) and contents based on the user's emotional state. This system also has the potential to aid recommender systems in entertainment applications, for example, suggesting music/video playlists and news items, and facilitating digital shopping assistants based on the user's affective profile. One of the causes of users' resistance to software changes is the initial lack of familiarisation with the look and feel. This system could provide the platform for third-party applications to log the user's moments of frustration and their causes, so that developers can leverage this information to improve the UX of future releases. In the future, when Web camera technology improves to offer low-cost eye-tracking, the algorithm could be used in many application domains: in gaming, to alter the game based on the user's emotional state; in tutoring systems, which could be aware of the student's emotional state and offer an experience tailored to fit their emotional profile; and in mission-critical systems such as air-traffic control, where the system may suggest break times for operators based on their affective state.
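A minimal sketch of the proposed trigger is given below; the interface, names and threshold are our own illustrative assumptions about what such a plugin architecture might look like, not a specification.

```python
# Minimal sketch of the proposed plugin trigger (interface and names are
# ours): third-party systems subscribe and are notified whenever the sensed
# arousal level exceeds a threshold.
class ArousalTrigger:
    def __init__(self, threshold):
        self.threshold = threshold
        self.subscribers = []  # adaptive UIs, recommenders, UX loggers, ...

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def update(self, arousal_level, aoi):
        # Called by the arousal-sensing pipeline for each new estimate.
        if arousal_level > self.threshold:
            for notify in self.subscribers:
                notify(arousal_level, aoi)

trigger = ArousalTrigger(threshold=5.0)
trigger.subscribe(lambda level, aoi: print(f"Adapt UI near '{aoi}' (arousal {level})"))
trigger.update(7.5, aoi="booking-form")  # exceeds the threshold, so it fires
```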
5.6.1 Limitations of this study
Several factors beyond the website and the mode of interaction contribute to a user's level of arousal. This was a controlled study in which we examined frustration; on the Web, several other factors contribute to the intensity of frustration, as well as to the overall level of arousal. For example, in the event of a real operating system failure, the recovery may take longer, and the frustration could last longer. Also, when participants enter their payment details while booking a real trip and there is a time-out, the frustration may be experienced differently from experimental conditions, where the participants are aware that the tasks have limited impact on their finances, personal computer or time. To consider other factors that may influence the cause and severity of frustration, a more naturalistic approach should be undertaken. However, besides the technological and ethical limitations of carrying out this study in naturalistic settings, other limitations exist. The algorithm for sensing arousal would need to be optimised to handle real-time data streams. In our methodology, we explored the use of change point detection algorithms, which can function in real-time. Such an algorithm would take in streams of pupil dilation and gaze behaviour data and segment them into fixed windows; AFA algorithm would then be applied to each window to sense a change in arousal. Furthermore, in experimental settings, light conditions such as ambience, monitor brightness and the relative colour difference in the stimulus can be controlled. Developing a module to account for these changes would also improve the algorithm's reliability under naturalistic settings.
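The windowed streaming idea can be sketched as follows; the window size and the analysis callable are placeholders, and this is an illustration of the concept rather than the optimised real-time implementation discussed above.

```python
# Sketch of the windowed streaming idea: buffer incoming pupil samples and
# run the arousal analysis over each fixed, non-overlapping window.
from collections import deque

def stream_arousal(samples, analyse, window_size=300):  # 300 samples = 6 s at 50 Hz
    buffer = deque(maxlen=window_size)
    for sample in samples:            # samples: stream of pupil diameters
        buffer.append(sample)
        if len(buffer) == window_size:
            yield analyse(list(buffer))  # per-window arousal estimate
            buffer.clear()               # start the next window afresh
```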
5.7 Conclusion
Frustration frequently occurs during user interaction and is challenging to prevent due to individual idiosyncrasies and the multiple scenarios that can elicit it. Frustration increases the level of arousal, which we know is a critical factor in performance and user experience. Having evaluated AFA algorithm on emotional and cognitive stimuli, we attempted to sense frustration, which may be of lower intensity. Our stimuli, being Web-based, make this study more ecologically valid, thereby presenting further challenges, such as detecting arousal during ongoing user interaction. We proposed this approach to quantify changes in arousal while also identifying users' focal attention through the analysis of pupillary response and gaze behaviour. The pupillary response was used to sense increases in arousal levels as a proxy for frustration, while the gaze behaviour indicates the user's visual focus and, hence, the likely source of frustration. Results indicate that AFA algorithm offers a feasible method to discriminate between normal tasks and frustration-induced arousal on the Web. We found significant correlations between the user's focal attention on stressful UI elements and the measure of arousal in this study. Therefore, in the next chapter, we apply the concept of relating focal attention to arousal levels in greater detail, to achieve a richer understanding of users' affective behaviour on the Web.
Chapter 6
Arousal detection and visual
attention over users’ scanpath
In previous chapters, we evaluated our arousal-sensing algorithm on emotionally evoked arousal and cognitively induced arousal, both on static stimuli. In the last chapter, we examined its ability to sense low-intensity arousal during more complex interactions on the Web, and we also observed that there is a significant relationship between the cause of arousal (i.e. UI elements on the screen) and the measure of arousal. This means that if we know where participants look when they experience an increase in arousal, we are a step closer to understanding why they experience a certain feeling (e.g., stress). Consequently, we are better equipped to make informed changes to improve the quality of users' experience, both in real-time and as part of a design process.
In this chapter, we demonstrate one of the ways this algorithm can be combined with other recognised methodologies to achieve a richer understanding of users' affective behaviour on the Web. We take the case of the Scanpath Trend Analysis (STA) algorithm by Eraslan et al. [Eraslan et al., 2016a], which is used to summarise users' visual transitions (scan paths) across a Web page. The summary of people's visual transitions tells us where they look and the order in which they scan through a Web page, whereas AFA algorithm can be used to reveal users' affective reactions towards the targets of their visual attention. Together, both algorithms can be used to summarise the users' visual experience. Therefore, in this chapter, we show how we developed and explored our novel methodology, which combines the STA algorithm with our arousal-
sensing technique, to generate our descriptive user model. This methodology is based
on users’ visual behaviour and their affective response to their trending visual paths.
This can be used to understand user interaction better and is therefore applicable in
adaptive systems and intelligent user interfaces. To test this idea of combining scan
paths with arousal sensing, we performed a pilot study, using the Apple home page,
to see whether we can derive a model based on this dataset. After the model was
developed, we evaluated it on datasets from other websites. In this context, we use a case study of neurotypical users vs people with autism on the Web to suggest that AFA algorithm and STA can be used to uncover differences in interest, affective response and visual behaviour.
Autism spectrum disorder (ASD) is a developmental disorder characterised by social
and communication impairments and by restricted interests and repetitive behaviours
[Christensen et al., 2018]. The overall prevalence of autism is estimated to be between
1.1 and 1.2% of the UK populace [Brugha et al., 2012]. The presence of autism relates
to different experiences when interacting with the Web and can elicit different affec-
tive responses [Eraslan et al., 2017, Eraslan et al., 2018]. For example, people with
autism often exhibit idiosyncratic visual attention patterns, which have been shown to
affect their processing of Web pages [Mayes and Calhoun, 2007]. Besides, the strong
preference for structure and familiarity that many individuals on the spectrum have,
may be challenged by changes in the structure or the interface of many applications
[Yaneva et al., 2018]. Last but not least, there are well-documented differences in the
way some people on the spectrum interpret emotions from facial expressions and the
common presence of such faces on Web pages may affect their processing by people
with autism [Harms et al., 2010].
6.1 Motivation behind our methodology
There are many methods of investigating UX problems from both qualitative and
quantitative paradigms, including questionnaires, eye-tracking and physiological com-
puting. Qualitative methods of investigating UX problems often require participants
to communicate their experience. This may be inaccurate for reasons such as a lack of self-reflection with regard to one's differences, as well as difficulties with communication in general, which is also one of the diagnostic criteria for ASD [Paulhus
and Vazire, 2007]. An alternative approach which requires less verbalisation is using
self-reported scales to elicit feedback from users. However, recollecting experiences
requires cognitive processing. People with autism, ADHD and similar developmental
disorders may exhibit cognitive impairment [Volkmar et al., 2014], thereby limiting
the accuracy of their reported scores. Due to these limitations, we argue that meth-
ods that require eliciting intentional feedback from autistic users are less reliable for
understanding the UX challenges that people with autism face. Usability metrics such
as error rates and completion times can be used to detect common problems that users experience on the Web. Predominantly, these metrics answer the question "Is there a problem?", but discovering problems is only the first step to improving UX. Analysis of gaze behaviour with metrics like fixation location/duration and saccades (visual transitions) can be used to answer the more advanced question "Where is the problem?", but this is also limited because knowing where the problem lies may not
one user, but, without knowing why the problem exists, we may not have an under-
standing of which group of users the problem affects. People with autism are not the
only atypical groups of users on the Web. In addition to the idiosyncrasies of typical
users, people with learning disorders, developmental disorders, obsessive-compulsive
disorder (OCD), and other psychiatric/developmental disorders may also require spe-
cial considerations when carrying out studies to identify UX issues specific to these
user groups. The answer to the question - “Why is there a problem there?” provides
more context so that researchers can offer recommendations to overcome the existing
UX issue(s).
Affective computing is computing that relates to, arises from or influences emotions [Picard, 2010]. This is useful, especially when the physiological state of users can be detected and related to their interaction patterns. We show that the combination of visual behaviour and physiological computing presents a rich methodology for identifying and understanding UX issues. Data collected through physiological means are often noisy and may also lack the sample sizes and data distributions required to make findings generalisable.
Our approach provides a descriptive method that aids hypothesis generation about UX issues, which can then be followed up by qualitative feedback from users or further inferential statistical analysis. To explore the feasibility of combining our arousal-sensing approach with scan path analysis, we conducted the following pilot study.
6.2 Pilot study
In this pilot study to develop our methodology combining scan path analysis with arousal sensing, we used a browsing task on the home page of the Apple website. Participants (n=39) were instructed to explore the Apple home page for 30 seconds. Figure 6.1 shows the Apple home page, segmented into AOIs.

Figure 6.1: Apple home page segmented into AOIs

Participants' pupillary response and gaze data were captured using a Tobii eye-tracker by other researchers, so this was a secondary analysis. With permission from the original researchers for the
study, we extracted the dataset and analysed it using our arousal sensing approach to
measure the cumulative arousal levels due to each AOI on the screen. The results are presented in Table 6.1. Since we are interested in combining the STA algorithm with AFA algorithm, we first compare the output of AFA algorithm with the STA scan path: F C H I E. We notice that the AOIs included in the STA scan path were all located in the middle of the screen. This is expected, because the trending scan path should contain items that people mostly fixate on, and since these are located in the middle rather than at the extreme edges, people are more likely to notice them. Another factor worth considering is that this is a browsing task; hence, participants are more likely to focus on the more conspicuous AOIs rather than the obscure ones, since they are under no obligation to consider the less conspicuous elements on the screen. Conversely, we can see from our cumulative arousal scores that the more obscure items at the bottom of the screen, i.e., the footer elements, elicited the most arousal (AOIs K, L and N). However, this is only the case for footer elements that were fixated on; others (e.g. AOIs M, O, P and Q) were not fixated on at all. Therefore, we can hypothesise that if an element is not conspicuous, it may not be fixated upon by most people, but when it is fixated on, it may result in increased arousal. Looking closely at this postulation, there is an exception: the navigation bar and the search bar are also located at an edge of the page (the top), whereas they elicited relatively low levels of arousal, 3.07 and 1, respectively. Hence, we can refine our postulation to mean that the reason for the increased arousal on the footer elements is that participants were stressed by the presentation of the content, i.e., the small fonts used in the footer.
Now that we have seen that it is possible to combine arousal scores with trending scan paths to aid hypothesis generation about the behaviours of Web users, we ask the question: "How can we combine the STA algorithm and our arousal-sensing approach such that they form an affective model for Web users?"
We discuss our design process and the rationale for our proposed methodology below.
Table 6.1: Cumulative arousal per AOI on the Apple home page
AOI   Cumulative arousal   Description        Location on the page
A     1                    Search bar         Top Right
B     3.07                 Navigation bar     Top
C*    5.22                 iPad mini          Middle Left
D     4.45                 Video thumbnail    Middle Left
E*    2.89                 Video thumbnail    Middle Left
F*    4.44                 iPad picture       Middle Right
G     3.13                 iPad thumbnail     Middle Left
H*    2.79                 Mac thumbnail      Middle Left
I*    3.60                 iTunes thumbnail   Middle Right
J     2.91                 iPhone thumbnail   Middle Right
K     7.50                 Footer left        Bottom Left
L     7.50                 Footer right 1     Bottom Right
M     -                    Footer right 2     Bottom Right
N     9                    Footer right 3     Bottom Right
O     -                    Footer right 4     Bottom Right
P     -                    Footer right 5     Bottom Right
Q     -                    Footer right 6     Bottom Right
R     -                    Footer right 7     Bottom Right
Note: AOIs with * were included in the STA trending scan path.
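To make the aggregation concrete, the following is a minimal Python sketch of how cumulative arousal per AOI could be computed from individual arousal events; the data layout and the function name are illustrative assumptions, not the exact implementation used in this thesis.

    # Minimal sketch: aggregate detected arousal events into cumulative
    # arousal per AOI. `events` pairs each arousal episode with the AOI
    # fixated at that moment; names and values are illustrative.
    from collections import defaultdict

    def cumulative_arousal(events):
        """events: iterable of (aoi_label, arousal_score) tuples."""
        totals = defaultdict(float)
        for aoi, score in events:
            totals[aoi] += score
        return dict(totals)

    print(cumulative_arousal([("K", 3.5), ("K", 4.0), ("A", 1.0)]))
    # {'K': 7.5, 'A': 1.0}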
6.3 Formation of the methodology
The purpose of this methodology is to build a model that factors in users’ scan paths, as
well as what they feel (in terms of arousal) when they fixate on the scan path. As stated
earlier, the goal is to aid hypothesis generation, so the model may be descriptive and
qualitative rather than produce a quantitative “matter of fact” result. We formulated
three ideas for this methodology. The first two approaches pertain to fusing one
method into the other, while the last one combines both approaches loosely into one
model.
• Use the measure of arousal as a factor in deriving the STA scan path:
The STA algorithm makes use of gaze patterns, such as fixation durations and counts,
to select the elements that will be included in the final STA scan path. One
approach would be to add a threshold such that an AOI is included in the STA
scan path only when the users' cumulative arousal for it meets this cut-off value.
Semantically, this would mean that the output of the STA scan path would be:
1. UI elements that participants fixated upon, and 2. elements that induced an
increase in arousal. The use case for this would be where UX researchers are
interested in evaluating areas on the screen that people both look at and react to
emotionally. Further investigation can diagnose whether the emotion is positive
or negative, which would then inform the UI designers to either retain the design
in the former case or improve it in the latter. One limitation of this approach is
that it abstracts away the measures of arousal, thereby limiting the potential to
make meaning of the individual affective scores for each element.
• Use the scan path as a weighting factor in calculating the cumulative
arousal: When a UI element is included in the trending scan path, it means
that most people visit the AOI visually. However, if that same element induces a
significantly high measure of arousal, it means that that element is a key factor
in determining the feelings of most users on the web-page. This is applicable
when UX researchers want to generate a hypothesis about why most users find
a website frustrating or interesting. For example, if a UI element is seen to
induce a high level of arousal due to frustration or stress, and most people who
visit the website fixate on that UI element, it means that the impact of that
frustration on the quality of user experience is likely to be high. Conversely, if a
UI element is fixated on by most users, and people have low arousal when they
fixate on it, we may find that the UI element takes up considerable space on
the website while having little to no effect on the emotions of the users. While
sometimes, this is a good thing, other times, this may serve as a useful cue
in improving the level of engagement of a particular website. While there are
other approaches to evaluating engagement such as using usability scales [Obrien
and Toms, 2013, Xu, 2015], and website interaction metrics such as Google
analytics [Kirk et al., 2012, Bijmolt et al., 2010], this approach provides a richer
understanding of “why people feel (dis)engaged?”, rather than only answering
the question “are they engaged?” [Hart et al., 2012]. This approach, however,
comes with some drawbacks. Firstly, the scan path sequence is lost. The order of
the scan path sequence carries rich meaning for understanding the visual scan
behaviour of users, and is especially useful when carrying out website transcoding
to improve visual search behaviour [Harper et al., 2006]. Secondly, when we fuse
the scan paths as weighting factors for the measure of arousal, we tamper with
the semantic meaning of our results: we couple arousal with visual engagement,
which are two fundamentally different behaviours. One person may be visually
engaged yet not experience increased arousal, while another may experience
increased arousal without being visually engaged, as
the third approach where both methods (the STA algorithm, and our arousal
sensing approach) are loosely coupled, yet combined, to have a rich semantic
meaning to users’ behaviours on the Web.
• Combining visual scan paths with our arousal sensing approach: In
the first approach, the arousal measure was used as a selection factor in deriving
the STA scan path (which abstracts away the cumulative arousal scores). Therefore, we would lose
some of the richness in the measures of arousal, which would have been useful in
generating hypotheses regarding participants’ affective behaviour. In the second
approach, where we would have used the scan paths as weighting factors for
the measure of arousal, we would lose the semantic meaning of what the result
means. Therefore, we discuss the benefits of combining both algorithms loosely,
and how we tackle the problem of perception since there is a potential for infor-
mation overload when the results are not as summarised as in the previous two
approaches.
This loosely coupled combination gives us the clarity to generate UX hypotheses.
For example, when we notice that some areas of the screen induce high
arousal, we can refer to the scan path to check whether there was a gradual build-up
leading to a cumulative increase from previous AOIs. If so, we could
generate questions such as, "did people become stressed on a UI element as a
consequence of increased frustration from previous ones?". Another reason for
combining both approaches this way is to retain the semantic meaning: we can
view the scan paths for what they are, a visual scan sequence, and the affective
measure for what it is, a measure of arousal. Deriving a new UX measurement
would require a series of evaluations and ground truth to verify any claims; at
this point, we are interested in understanding behaviour rather than defining new
ways of measuring a certain behaviour. Combining both algorithms does mean
that there is more information to assimilate, which may lead to information
overload. To manage this, we developed a visualisation that represents the
cumulative measure of arousal for each AOI, as well as its sequential order from
the scan path analysis, by superimposing these values upon the visual stimulus,
in this case the Web page (a sketch of this combination follows this list). The
scan path sequence is represented as letters, while the arousal level is represented
as a number within the circle that is superimposed on the AOI of the Web page.
The size of the circle also represents the measure of arousal. This is also useful
when we want to compare the behaviours of different groups of users: we colour
code each user group's visualisation so that the groups are distinguishable. We
discuss this in the analysis.
This approach offers significant contributions to UX research, such as modelling
user groups for adaptive computing and intelligent systems. We discuss our
contribution next.
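As promised above, here is a minimal Python sketch of the loosely coupled combination: the STA trending scan path keeps its sequence, and each AOI carries its own arousal score. The function and field names are illustrative assumptions rather than the thesis implementation; the example values come from the pilot study in Table 6.1.

    # Minimal sketch: pair each AOI in the STA trending scan path with its
    # sequence letter and (rounded) cumulative arousal level.
    def combine(sta_path, arousal_by_aoi):
        """sta_path: ordered AOI labels; arousal_by_aoi: AOI -> arousal."""
        return [
            {"order": chr(ord("A") + i),   # sequence letter shown in the circle
             "aoi": aoi,
             "level": round(arousal_by_aoi.get(aoi, 0))}  # ordinal arousal level
            for i, aoi in enumerate(sta_path)
        ]

    print(combine(["F", "C", "H", "I", "E"],
                  {"F": 4.44, "C": 5.22, "H": 2.79, "I": 3.60, "E": 2.89}))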
To evaluate our methodology, we use the case of two groups for which the literature
has provided evidence of differences in cognitive and affective processing. How autistic
and neurotypical users differ in their cognitive and affective behaviours has previously
been researched. However, our proposed methodology may be used to uncover other
behavioural idiosyncrasies between the two groups. Therefore, in the next section, we
briefly discuss existing research about this population regarding their behaviours on
the Web.
6.3.1 Autism and the web
Previous work showed evidence that atypical visual attention in individuals with
autism may result in unconventional information-searching strategies on the Web
[Eraslan et al., 2018]. Such differences revealed through eye-tracking have also been
used to classify users into autistic and neurotypical groups [Yaneva et al., 2018]. A
more thoroughly researched factor on the Web known to affect the two groups dif-
ferently is textual content. While not specifically presented on web pages, deficits in
reading comprehension among people with autism have been widely researched [Chi-
ang and Lin, 2007, Ricketts et al., 2013], including by means of eye-tracking experi-
ments [Yaneva et al., 2016b, Yaneva and Evans, 2015, Yaneva et al., 2016c, Yaneva,
2016] and in combination with images [Yaneva et al., 2015]. This issue has been
addressed in readability research by attempting to measure the difficulty of text for
readers with autism, specifically based on the difficulties they encounter [Yaneva et al.,
2016a, Yaneva et al., 2017]. To the best of our knowledge, there is currently no method-
ology that combines pupillary response, gaze behaviour and scan path analysis to study
the affective response of this population on the web. Hence, we discuss our research
contributions next.
6.4 Research questions and contributions through
this study
For our methodology, we combine two different algorithms: 1. The Scanpath Trend
Analysis (STA) algorithm, which provides the trending scan path followed by a group
of users [Eraslan et al., 2016a] and 2. Arousal detection through the analysis of
pupillary response to identify moments of increased arousal. Gaze analysis is then
used to identify the user’s visual attention on the screen (i.e., visual element) during
moments of increased arousal [Matthews et al., 2018b]. The individual arousal scores
for each participant are averaged for each visual element. The sequence of each visual
element of the trending scan path and the corresponding arousal score is combined
to produce a visualization. The output of our approach is an aggregate of a group of
users’ scan paths over a visual stimulus, and their arousal response to each element of
the scan path. Based on this approach, our research questions are as follows:
RQ1. Does the combination of scan path analysis, and level of arousal reveal
differences between people with autism and neurotypical people in
Web browsing tasks? This question allows us to apply our approach to other
methods for understanding user behaviours on the Web.
RQ2. Does the combination of scan path analysis and arousal reveal where
the differences in arousal occur between people with autism and neu-
rotypical people in Web browsing tasks? This research question furthers
RQ1 in Chapter 1, pertaining to the overall goal of our research to determine
the focal attention of users during moments of increased arousal.
To test our hypothesis and evaluate our methodology, we use two populations:
19 neurotypical users and 19 users with autism. During a Web browsing task involving
8 Web pages, their pupil dilation and gaze behaviour were tracked. We then apply
the STA algorithm and our arousal sensing algorithm. Finally, we combine them and
visualise the result. We observe differences in their visual behaviours as, in certain
instances, the autistic group exhibits a lower arousal response to affective contents.
While this is consistent with the literature on autism, we confirm this phenomenon on
the Web. Our approach and findings present a novel research methodology to identify
and improve understanding of user interaction problems of user groups with varied
interaction patterns and experiences. Our contributions are as follows:
1. A methodology that combines visual scan path analysis with arousal scores to
provide a more holistic understanding of the users’ experience.
2. The analysis and visualisation of differences in autistic users vs neurotypical
users in Web browsing tasks using our methodology.
The study and data collection were carried out by Yaneva et al. [Eraslan et al., 2018], and
the STA algorithm was developed by Eraslan et al. [Eraslan et al., 2016a]. In the next
section, we describe the experiment in further detail.
6.5 Experiment
In this section, we explain the methodology employed to address our research questions
that are specific to this chapter. Further details about this experiment, the method-
ology and dataset can be found in the original publication of this study by Yaneva et
al. [Eraslan et al., 2018]. We performed a secondary analysis of this study to highlight
our contribution.
6.5.1 Participants
A total of 38 participants, 19 with a formal diagnosis of autism and 19 control-group
participants, were recruited for this study1. None of them had any diagnosed degree
of intellectual disability, nor any reading disorders. The mean age for the ASD group
was M = 41.05 with SD = 14.04, range [21-67], and M = 32.15, SD = 9.93, range
[20-56] for the control group. All participants with ASD were recruited through a UK
autism charity and the student enabling centre at the University of Wolverhampton.
All control-group participants were recruited through snowball sampling. Both the
participants with ASD and the control-group participants were highly able adults,
all of whom were living independently and without relying on a caregiver. From the
ASD group, 11 people had completed a higher education degree, six people had a UK
equivalent of a high-school degree (GCSE or A-levels), and two people preferred not to
answer. From the control group, 15 people had completed a higher education degree,
and three people had completed A-levels (equivalent to high school). All participants
were native speakers of English except four control-group participants, who were highly
fluent, having lived in the UK for many years. All participants reported that they use
the Web daily, with only one ASD participant reporting Web usage “less than once a
month”. All participants identified as having normal or corrected to normal vision.
6.5.2 Apparatus
A Gazepoint GP3 video-based eye-tracker was used to capture pupillary response,
fixation location, and fixation duration of the participants at a frequency of 60Hz. All
questions and answers were exchanged verbally; hence, no mouse or keyboard was used.
The stimuli were presented on a 17” LCD monitor. The experiment was run using the
Gazepoint experimental environment and the laptop used for the experiments had a
Windows 10 operating system.
6.5.3 Materials and Method
Eight Web pages were selected by first exploring the home pages of the top 100 web-
sites listed by Alexa.com, excluding those that were repeated more than once. Pages
1This experiment was approved by the University of Wolverhampton, UK committee on ethics.
that were not in English and were mainly designed for authentication and/or as search
pages were also excluded. We then selected the final eight pages in such a way as to
have a balanced representation of factors such as complexity and space between ele-
ments. The complexity values were obtained using the VICRAM algorithm [Michaili-
dou et al., 2008]. In our final selection, an equal number of pages had a high complexity
(YouTube, Amazon, Adobe and BBC) and low complexity (WordPress, WhatsApp,
Outlook and Netflix), as well as small (Outlook, Netflix, Adobe and BBC) and large
space (WordPress, WhatsApp, YouTube and Amazon) between their elements. Par-
ticipants were presented with screenshots of the pages to ensure consistency in the
look and feel of the web pages. For images of each page, see Figure 6.2a.
There were two types of tasks: browsing and synthesis, presented in counterbal-
anced order for each participant. For the browsing task, the participants were free to
explore each page for 30 seconds. For the synthesis task, each participant had up to
120 seconds to answer two questions per page, with the possibility to move forward
earlier if they had answered the questions. Each question required the participants to
combine information from at least two-page elements to arrive at the third piece of
information not explicitly given on the page. Examples include “What is the cheapest
plan you can get that offers Email & Live Chat support?” for the WordPress page,
where the participant has to identify the plans that offer email and live chat support
and compare their prices or “Which item has the largest price discount measured in
percentage?” for the Amazon page. In this chapter, we selected the browsing task for
our case study so that our analysis is based on a single web page per website.
All experiments were conducted in a quiet room. First, the consent form and
the demographic questionnaire were filled in by the participants. After that, the eye
tracker was calibrated using a nine-point calibration, and the experiment commenced.
All questions and answers were given verbally, and the participants were all given
a break between the tasks. After completing the experiment, all participants were
debriefed.
Figure 6.2: Screenshots of the eight Web pages used as stimuli: (a) WhatsApp, (b) Amazon, (c) WordPress, (d) Netflix, (e) BBC, (f) YouTube, (g) Adobe, (h) Outlook
6.5.4 Analysis
The analysis is carried out in two main stages. The first stage makes use of scan
path analysis, using the STA algorithm to summarise participants' scanpaths into a
single scanpath. Following this, AFA algorithm is applied to each element that
constitutes the trending scanpath for each group (autistic and neurotypical). We
explain this further in the subsections below.
The STA algorithm
The Scanpath Trend Analysis (STA) algorithm identifies the trending path of
multiple users on a Web page in terms of its AOIs. The STA algorithm is a multi-pass
algorithm comprising three core stages: (1) Preliminary Stage, (2) First
Pass and (3) Second Pass. The detailed description of the STA algorithm can be
found in [Eraslan et al., 2016a].
1. Preliminary Stage: This stage first takes a series of fixations for each user on
a particular Web-page and the details of the AOIs of the page. It then matches
each fixation with its corresponding AOI to generate the individual scan paths
in terms of the AOIs of the Web-page (a minimal sketch of this matching step
follows this list).
2. First Pass: Once the individual scan paths are ready for further processing, the
First Pass starts analysing them to identify trending AOIs by selecting the AOIs
which are shared by all the users, or which attract at least the same attention as
the fully shared AOIs based on their total fixation durations and total fixation counts.
3. Second Pass: After identifying the trending AOIs, the Second Pass calculates an
overall sequential priority value for each trending AOI based on its positions
in the individual scan paths. It then combines these AOIs based on their priority
values to discover the trending path, where the trending AOI with the highest
priority is the first one in the trending path.
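The following Python sketch illustrates the fixation-to-AOI matching of the Preliminary Stage under simplifying assumptions: AOIs are axis-aligned rectangles, and a fixation is matched to the first AOI whose bounds contain it. The names are illustrative; this is not the published STA implementation.

    # Minimal sketch of the Preliminary Stage: map raw fixations to AOI
    # labels to form an individual scan path.
    def to_aoi_scanpath(fixations, aois):
        """fixations: [(x, y), ...]; aois: {label: (x0, y0, x1, y1)}."""
        path = []
        for x, y in fixations:
            for label, (x0, y0, x1, y1) in aois.items():
                if x0 <= x <= x1 and y0 <= y <= y1:
                    path.append(label)
                    break
        return path

    aois = {"A": (0, 0, 100, 50), "B": (0, 60, 100, 120)}
    print(to_aoi_scanpath([(10, 20), (30, 80), (50, 30)], aois))  # ['A', 'B', 'A']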
The STA algorithm was evaluated by comparing its resultant paths with the re-
sultant paths of other similar algorithms by using different AOI detection approaches
[Eraslan et al., 2016a, Eraslan et al., 2016b]. The evaluation shows that the resultant
path of the STA algorithm is the most similar one to individual scan paths. Thus it
discovers the most representative path. The detailed results of the evaluation can be
found in [Eraslan et al., 2016a, Eraslan et al., 2016b].
Arousal sensing and detection of focal attention
We developed our arousal sensing algorithm iteratively over different eye-tracking
datasets from different application domains (i.e., medicine and ontological authoring)
and ground truth from domain experts and participants’ self-reported feedback. It has
been evaluated on detecting cognitively induced arousal, using the Stroop effect [MacLeod,
1991] to elicit differential levels of cognitive load. It has also been evaluated on
emotionally induced arousal using datasets from the International Affective Picture
System (IAPS) database [Lang et al., 1997]. Furthermore, by this time, it had been
evaluated on its ability to sense frustration-induced arousal on the Web. Therefore,
the algorithm was suitable for sensing arousal in our Web browsing tasks. Details of
our implementation of this can be found in Section 3.6. From the output of our arousal
sensing algorithm, we generate vectors that represent the measure of arousal elicited
by each AOI on each participant. We merge the two algorithms by computing the
average and standard deviations of the arousal scores for each of the visual segments
(AOIs) that make up the trending scan paths from STA algorithm.
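A minimal Python sketch of this merge step follows; it assumes the per-participant arousal scores are already grouped by AOI, and the names are illustrative rather than the exact implementation.

    # Minimal sketch of the merge: for each AOI in the trending scan path,
    # compute the mean and standard deviation of the per-participant
    # arousal scores (only participants with a change in arousal there).
    from statistics import mean, stdev

    def merge(trending_path, scores_per_aoi):
        """scores_per_aoi: AOI -> list of per-participant arousal scores."""
        summary = {}
        for aoi in trending_path:
            scores = scores_per_aoi.get(aoi, [])
            summary[aoi] = {
                "n": len(scores),
                "M": mean(scores) if scores else None,
                "SD": stdev(scores) if len(scores) > 1 else None,
            }
        return summary

    print(merge(["160", "159"], {"160": [3.1, 2.9, 3.9], "159": [4.0, 3.4]}))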
6.5.5 Visualising our visual behaviour model
Our pilot study revealed some of the factors that may influence arousal changes during
web interaction. They include the position of elements on the screen, the content and
the font size. Our design of the visualisation considered the fact that the structure and
content must be put into perspective if our methodology is to be used to uncover reasons
why people experience a change in arousal or group differences in Web behaviour.
In regards to group differences, it is also necessary that the visualisation is capable
of showing multiple groups within the visualisation. Therefore, our design involved
superimposing the objective measurements from the algorithms (affective measures,
and STA scan path sequence) onto the visual stimulus (after it has been segmented
into AOIs).
To help us determine where the primary differences are located, this visualisation was
utilised for data exploration (Figure 6.11). Both groups are colour coded so that mul-
tiple groups can be easily distinguished. The circles that are superimposed on the
AOIs that elicited the arousal contain a letter and a number. The letter indicates the
sequential order in which the AOI was visited when viewing the Web-page. The
corresponding number represents the arousal level (AL) for each AOI and is rounded to
the nearest whole number so that the size of each circle indicates the ordinal level
of arousal for each group. The arousal levels on this visualisation can be treated as
ordinal measures where 1 to 3 indicate low arousal, 4-6 medium and 7-9 high levels
of arousal. This visualisation was used to generate hypotheses that may explain the
general behaviour of participants in each group.
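A minimal sketch of such an overlay follows, assuming matplotlib and a page screenshot loaded as an image (e.g., via plt.imread); the coordinates, radii and colours are illustrative assumptions rather than the exact design of Figure 6.11.

    # Minimal sketch: superimpose one group's combined model on a screenshot.
    # Each circle carries the sequence letter and the rounded arousal level,
    # and its radius scales with arousal.
    import matplotlib.pyplot as plt
    from matplotlib.patches import Circle

    def overlay(screenshot, model, colour):
        """model: list of {'order', 'level', 'x', 'y'} dicts for one group."""
        fig, ax = plt.subplots()
        ax.imshow(screenshot)
        for item in model:
            radius = 10 + 5 * item["level"]          # size encodes arousal
            ax.add_patch(Circle((item["x"], item["y"]), radius,
                                color=colour, alpha=0.4))
            ax.text(item["x"], item["y"], f"{item['order']}{item['level']}",
                    ha="center", va="center")
        return fig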
6.6 Results
In relation to RQ1, which concerns detecting differences in arousal between both groups
for each task, we computed the mean arousal score per participant for each Website.
Results from Table 6.2 indicate that only the YouTube Website shows a significant
difference in arousal between the trending scan paths of the autistic group and the
neurotypical group.
Table 6.2: Results of the Mann-Whitney U test comparing arousal between the groups (autistic and neurotypical), with Bonferroni correction α = 0.00625

Website     U        p-value
WhatsApp    44.0     .077
Amazon      792.5    .380
WordPress   755.5    .233
Netflix     2132.0   .438
BBC         341.5    .208
YouTube     388.5    .002*
Adobe       1118.0   .211
Outlook     405.0    .231
Note: * = p < .00625
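A minimal sketch of this comparison in Python, assuming lists of per-participant mean arousal scores for one Website; it uses scipy's two-sided Mann-Whitney U test with the Bonferroni-corrected alpha of 0.05 / 8 = 0.00625. Variable names are illustrative.

    # Minimal sketch: nonparametric group comparison per Website with a
    # Bonferroni-corrected significance threshold.
    from scipy.stats import mannwhitneyu

    def compare_groups(control_scores, asd_scores, n_tests=8):
        alpha = 0.05 / n_tests                 # Bonferroni correction
        stat, p = mannwhitneyu(control_scores, asd_scores,
                               alternative="two-sided")
        return stat, p, p < alpha              # U, p-value, significant?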
The mean arousal for the neurotypical group (M =4.00, SD=2.80) for the YouTube
Website was also higher than that of the ASD group (M =2.41, SD=2.24). This is con-
trary to our hypothesis that people with autism would experience higher arousal levels
in the browsing tasks. With regard to RQ2, pertaining to identifying differences in
AOIs between groups, further exploration shows that both groups experience arousal
in different degrees from different elements in their respective scan paths.
We examine the bar charts and error bars from each of the websites, carrying out
statistical tests to analyse these differences.
6.6.1 Analysis of the Web pages by their AOIs
Table 6.3: Scan path sequence (Seq), participants (n) with a change in arousal level
per AOI, mean arousal (M) for those participants and standard deviation (SD) for the
control and autistic (ASD) groups.
NB: The gaps in the table exist where fewer elements make up the trending scanpath
over a website for that group.
Website Seq  Control: AOI n M SD  ASD: AOI n M SD
WhatsApp A 160 13 3.3 2.98 156 4 1.28 0.33
B 159 10 3.68 3.09
C 156 5 5.37 3.61
D 157 3 2.02 1.28
E 161 8 2.8 2.7
Amazon A 263 8 3.82 2.76 263 8 2.95 2.58
B 265 6 5.19 3.59 264 12 6.27 3.22
C 264 9 3.32 3.35 265 11 3.67 3.02
D 266 10 4.11 3.83 266 5 2.36 2.62
E 260 7 3.47 3.31
F 251 7 1.61 0.96
WordPress A 178 14 5.67 3.38 179 10 4.48 3.03
B 179 16 2.96 2.83 178 12 4.74 3.45
C 180 7 2.68 2.93 180 13 3.6 3.27
D 179 10 4.48 3.03
Netflix A 126 6 1.67 1.03 277 6 5.31 4.1
B 125 11 2.8 2.57 279 12 3.97 3.11
C 277 7 4.77 2.86 280 11 2.99 2.53
D 280 4 6.44 3.78 279 12 3.97 3.11
E 279 12 3.83 3.48 126 6 3.25 2.99
F 277 7 4.77 2.86 281 10 2.71 3.01
G 280 4 6.44 3.78
H 118 1 1 -
I 279 12 3.83 3.48
J 283 6 1.33 0.82
K 281 6 4.17 3.82
BBC A 109 17 5.48 3.89 109 17 5.71 3.37
B 110 9 3.09 3.42 110 13 4.19 3.35
YouTube A 198 5 5.16 2.83 199 8 3.93 3.64
B 199 6 5.29 3.11 200 6 2.35 1.31
C 200 7 6.24 3.15 202 9 1.67 1.33
D 203 5 3.5 3.21 203 6 1.54 0.67
E 202 8 3.15 1.88
F 210 3 2.47 2.54
G 206 7 3.57 2.51
H 207 3 1.32 0.33
Adobe A 16 7 3.55 1.54 15 6 4.11 3.45
B 15 6 3.6 2.9 18 8 3.27 2.2
C 18 11 3.38 2.76 16 8 2.52 2.35
D 19 7 3.02 2.88 19 9 3.69 2.49
E 21 13 3.78 3.08 20 5 4.02 3.73
F 22 5 3.8 3.35 21 8 4.55 3.97
G 225 7 6.1 3.69
Outlook A 137 10 4.21 3.42 137 8 2.56 2.01
B 138 16 3.53 3.26 138 16 5.42 3.44
C 136 11 3.47 2.18
Figure 6.3: Levels of arousal from the autistic and neurotypical group per AOI for the WhatsApp Web page
Figure 6.4: Levels of arousal from the autistic and neurotypical group per AOI for the Amazon Web page
Figure 6.5: Levels of arousal from the autistic and neurotypical group per AOI for the WordPress Web page
Figure 6.6: Levels of arousal from the autistic and neurotypical group per AOI for the Netflix Web page
Figure 6.7: Levels of arousal from the autistic and neurotypical group per AOI for the BBC Web page
Figure 6.8: Levels of arousal from the autistic and neurotypical group per AOI for the YouTube Web page
Figure 6.9: Levels of arousal from the autistic and neurotypical group per AOI for the Adobe Web page
Figure 6.10: Levels of arousal from the autistic and neurotypical group per AOI for the Outlook Web page
We start with Figure 6.3, which presents the arousal levels for the WhatsApp Web
page for each group: autistic (red) vs neurotypical (blue). We can see that
AOI 156 (the search bar) elicited a higher magnitude of arousal in the neurotypical
group than in the autistic group. AOI 153 (the WhatsApp logo) also induced more
arousal in the neurotypical group than in the individuals with autism. The only AOI to
elicit more arousal in the autistic group than in the neurotypical group was AOI 163 (an
AOI which describes the features of WhatsApp on Web and Desktop).
Moving on to the Amazon Web page in Figure 6.4, there were four AOIs where the
neurotypical population experienced arousal of a distinctively higher magnitude than
the individuals with autism: AOI 229 (search bar), AOI 254 (Ad feedback), AOI 246
(language selection) and AOI 232 (delivery location). There were three AOIs with a
distinctively higher magnitude of arousal for the individuals with autism compared to
the neurotypical participants: AOI 264 (deal of the day - security system), AOI 260
(product filter), and AOI 240 (navigation item - Today's deal).
Regarding the arousal levels per AOI on the WordPress Web page, Figure 6.5 shows
that only AOI 176 (caption - "choose your WordPress.com flavour") elicited more
arousal in the neurotypical participants than in the individuals with autism. Similarly, only one
AOI, AOI 169 (a navigation item), elicited slightly more arousal in the autistic group than in
the neurotypical population.
On the Netflix Web page (Figure 6.6), only AOI 280 (feature item - Ultra HD available)
elicited more arousal for the neurotypical users than the individuals with autism.
Contrarily, AOI 126 (caption - "Join free for a month"), AOI 117 (picture and caption
- "Pick your price"), AOI 118 (picture and caption - "No commitments, cancel online at
any time"), AOI 124 (header banner of a movie - "Narcos") and AOI 284 (feature item
- "cancel anytime") all induced arousal of higher magnitudes in the individuals with
autism compared to the neurotypical users.
Figure 6.11: Levels of arousal from each group's trending scan path, overlaid on the AOIs of each Website: (a) legend, (b) WhatsApp, (c) Amazon, (d) WordPress, (e) Netflix, (f) BBC, (g) YouTube, (h) Adobe, (i) Outlook
On the BBC Web page, Figure 6.7 shows no AOI where the magnitude of arousal
for the neurotypical group was higher than for the autistic population, whereas three AOIs,
AOI 113 (news caption - "Today's Formula 1"), AOI 276 (page header - sports) and
AOI 115 (programme schedule - "Australian Grand Prix"), induced higher arousal in the
autistic group than in the neurotypical population.
Figure 6.8 shows the results for the YouTube page. Recall that the YouTube Web page
was the only page that showed a statistically significant difference between the overall
cumulative arousal of the neurotypical population and the population of individuals
with autism. Breaking it down by AOIs, there were six AOIs where neurotypical users
experienced a higher magnitude of arousal compared to the individuals with autism,
namely AOI 202, AOI 215, AOI 203, AOI 200, AOI 206 and AOI 212, which are all
video thumbnails and captions. In contrast, AOI 210, AOI 214 and AOI 190 (captions
and video thumbnails), AOI 205 and AOI 204 (category headers), and AOI 287 and
AOI 289 (menu items) all induced more arousal in the individuals with autism than in
the neurotypical participants.
On the Adobe Web page, Figure 6.9 reveals that only AOI 24 (a caption - "Not in
school? See other Creative Cloud plans") induced more arousal in the neurotypical
participants than in the individuals with autism. Similarly, only one AOI, AOI 20 (a
picture and a caption representing Adobe's "Document Cloud"), induced more arousal
in the individuals with autism than in the neurotypical population.
Finally, for the Outlook Web page, Figure 6.10 shows that only AOI 141
(the Endnote logo) induced more arousal in the neurotypical users than in the individuals
with autism, whereas AOI 138 (a region filled with text describing some of the
features of Outlook), AOI 139 (the Skype logo) and AOI 146 (the Giphy logo) induced
more arousal in the individuals with autism than in the neurotypical participants.
Our observation from these results is that textual content is more likely to induce a
greater magnitude of arousal in individuals with autism than in neurotypical
users. Next, we discuss the combined results of our arousal analysis and the output of
the STA algorithm. The averages and standard deviations of the arousal scores that
make up the trending scan paths for each group are given in Table 6.3. For example, in
the first row, 13 participants (out of 19 in the control group) experienced a mean
arousal score of M=3.3, SD=2.98 while looking at AOI 160 of the WhatsApp Website,
whereas no participant from the ASD group experienced an increase in arousal caused
by the same AOI. From our visualisation in Figure 6.11b, there was a longer trending
scan path for the control group, compared to the ASD group, which had only one
element, AOI 156 (the WhatsApp search bar). The control group experienced a level 5
measure of arousal, whereas the ASD group was at level 1. In Figure 6.11e, the control
group experienced a more varied range of arousal, from level 1 to level 6 over nine
different elements, compared to the ASD group, which ranged from level 3 to 5 over five
elements. Similarly, in Figure 6.11g, we can observe that the control group has a
longer scan path and more common visual coverage over the website, whereas the ASD
group has a linear, horizontal scan path across the YouTube Web-page. The arousal
response due to AOI 200 from the control group (AL=6) was greater than that of the ASD
group (AL=2). Interestingly, AOI 200 pertains to a video about Stephen Hawking and
his death. Overall, the ASD group exhibited lower levels of arousal (AL=4 vs AL=5
by the control group) when looking at the UI element (AOI 199). The image displayed
on this AOI was a picture of people laughing in excitement. This difference may be
due to the ASD group exhibiting different affective responses than neurotypical people,
which is consistent with findings in the literature [Gallese, 2006].
These results show that the fusion of the STA algorithm and AFA algorithm summarises
the behaviours of groups of users and the differences between them. The observations made using our approach can
be analysed further using either inferential statistics or qualitative methods to provide
additional supporting evidence for research findings. In the next section, we discuss
how our modelling method can be used to identify possible UX issues, such as cognitive
load, the presence of stress and differences in the visual perception of the Web elements
in relation to physiological arousal.
6.7 Discussion
Regarding RQ1, which pertains to determining whether there is a difference in arousal between
the two groups in browsing tasks, we hypothesised that the ASD group would experience
more arousal than the neurotypical users. However, our assumption did not hold,
perhaps because an increase in arousal can be indicative of interest, anticipation or excitement
at different AOIs, all of which could have confounded our results. A more direct
approach would be to test each UI element against our hypothesis, for example,
by narrowing our hypothesis to specific AOIs where we have more understanding and
control of their emotive rating and the complexity of their content. This was, however,
not possible due to the small and uneven sample size of people who fixated on AOIs,
and the varying order of their fixations, which made statistical group comparisons
inappropriate [Greenland et al., 2016].
With regard to RQ2, identifying differences in AOIs between groups, we were able to
observe differences in visual and physiological patterns between the two groups with our
descriptive approach. For example, we observed in Table 6.3 that the autistic group
showed less arousal than the neurotypical group for several UI elements on the
YouTube page. One element shows people in a happy state, while another contains
the thumbnail for a video regarding the death of the physicist Stephen Hawking. A UX
researcher may relate this to findings that people with autism interpret affective
expressions differently from neurotypical people [Philip et al., 2010]. Baron-Cohen
et al. observed results suggesting that autistic individuals do not recognise bodily
expressions of emotion as well as non-autistic people [Baron-Cohen et al., 1988].
contains a link that is a functionally significant aspect of the website, it must be pre-
sented in a manner that does not rely primarily on facial expressions or emotional cues
[Celani et al., 1999, Cook et al., 2013]. This is one such usability issue that can be
identified using this methodology.
Another behaviour that our methodology can help to uncover is the emotions that
users experience prior to leaving a Web-page. It may be the case that the initial UI
elements that the users engage with elicit higher arousal than the final ones, as is the
case with the BBC and YouTube Web-pages; that participants experience an increase
in arousal in the middle of their interaction compared to the beginning and end, as is
the case with the Amazon website; or that they experience lower arousal levels at the
final UI elements, as with the Adobe Web-page.
could indicate positive feelings such as attraction [Foster et al., 1998], emotionally
neutral ones (i.e. cognitive load [Gellatly and Meyer, 1992]), or negative feelings like
frustration and stress [Mackay et al., 1978]. When participants experience an increase
in arousal towards the end of an interaction, this could imply that they found what
they were looking for, i.e., excitement in completing a task/goal [Wulfert et al., 2005],
or that they were frustrated [Doob and Kirshenbaum, 1973]. Both of these cases could
benefit from the optimisation of the user interface. In the first instance, if users take a
long time to find the item of interest on a page, it means that the user experience may
be improved by repositioning the element to a more visually accessible location, or by
using a more attractive design to draw the attention of users towards that particular
content [Chen et al., 2003]. When users experience frustration with a UI element prior
to leaving a Web-page, it could be an indication that the UI element has a usability
problem [Nakarada-Kordic and Lobb, 2005]. This type of diagnosis is mainly possible
because we combined users' visual scan paths with a measure of their arousal levels.
The implications for design include the aggregation of different user groups and the
potential modelling of their behaviour. For example, on an e-learning website, people
with autism can have a different profile that takes account of and adapts to their
unique traits and requirements [Martin et al., 2007]. Eye-tracking is becoming more
accessible, and we anticipate that web cameras and mobile phone cameras will one day
have eye-tracking capabilities. Our methodology could then be used within social me-
dia and mobile applications. Posts and feeds can be treated as atomic UI elements so
that the characteristics of different posts (sentiment, object classification, colour etc.)
can be investigated against the visual scan sequence and the corresponding affective
states that are elicited.
6.8 Limitations of the AFA algorithm-STA method-
ology
Due to the limited accuracy of eye-tracking technology, our analysis has been based
on group behaviour as opposed to individual behaviour. Therefore, our measures are
aggregated to represent the behaviour of an entire group. Even though the raw data
is easily accessible (as shown in the tabular format in Table 6.3), the information pre-
sented on the visualisation lacks a measure of confidence such as confidence interval
(CI ), standard errors of the mean (SEM ), or the standard deviation (SD) from the
mean arousal level. Also, the visualisation does not show how many users were used
to compute the aggregate, which may be crucial, especially when qualitative results,
such as interviews with the participants are available. When making inferences about
certain trends observed on the visualisation, it is therefore important to look up these
additional factors from the output of the algorithms to obtain more context for the
trend. In future, the transparency or opacity of each circle representing a scanned
element could be used to indicate a confidence level for our measurements; for
instance, increased opacity could indicate higher confidence. This way, the more
reliable signals are emphasised to the observer while the less reliable results are
blurred out.
laboratory settings. Ambient light, inter-colour differences between (and within) stim-
uli and other environmental variables may introduce confounding factors which may
yield different results in the wild. Therefore, our methodology needs to be optimised
to handle these factors dynamically in naturalistic settings.
6.9 Conclusion
Traditional metrics for evaluating the usability of websites have yielded much success
in the past. Many of those metrics tell us what is wrong with a website. However, of
equal importance to UX researchers, is determining the user’s emotional state during
the interaction. Recognising the user’s affective state is important because it reveals
a richer understanding of why users behave in certain ways in the presence of certain
content and tasks. We have demonstrated through our study that it is possible to
combine methods that answer both of these questions: 'how do users interact with
websites?' and 'how do they feel when they interact with Web-pages?'. The former was
achieved by summarising the users' visual scan paths into a trending scan path using
the STA algorithm, while the latter was achieved by generating the arousal scores that each
UI element elicits for each group of users. Furthermore, we created a novel visualisation
that can aid researchers in assimilating results and generating research questions that can be
investigated further using more established statistical or qualitative methods. We
have utilised changes in arousal as our affective metric in this study; in future, we
recommend an approach that also identifies the valence of the users' emotional state.
In domains such as e-learning and gaming, users often drop out due to undesired
affective and cognitive states during their interaction. Our methodology provides
context (users’ affective state, and visual attention) which can be fed back into the
system. Real-time adaptation of user interfaces and contents can then be carried out to
improve the quality of their user experience. Having proposed, implemented and evaluated
AFA algorithm in previous chapters, this chapter extends its use by combining it with
other more established methods. In the next chapter, we discuss the implications for
design, limitations and future work in terms of our research.
Chapter 7
Discussion and Conclusion
In the previous chapter, we extended AFA algorithm by combining it with the analysis
of users' trending scan paths (the STA algorithm) to facilitate a better understanding
of user behaviour on the Web. In this chapter, we take a higher-level overview and
appraisal of our research: sensing arousal and focal attention during user interaction.
In Section 7.2, we consider what AFA algorithm means and how it can be utilised
by UX researchers, UI designers and other affective computing researchers. Next, we
highlight its limitations in Section 7.3. We discuss how similar HCI methodologies
compare with ours, and alternative approaches we could have taken. We also revisit
the visualisation tool introduced in Section 3.7, a dashboard that can aid
users in visualising their eye-tracking datasets to observe participants'
affective signals, either with respect to temporal trends or in relation to their focal
attention during moments of increased arousal. We conclude the chapter with a
summary and appraisal of the entire research.
7.1 Reflection about our work and state of the art
in affect detection
Since we started our research, the sub-domain of affect detection has gained increased
attention [van der Wel and van Steenbergen, 2018]. In 2015, Sioni et al. reported
a review of existing affect detection mechanisms. They recommended that research
should be carried out into affect detection using objects that users make use of daily
as sensors, for example, embedding GSR sensors into a mouse and optical sensors on
mobile phones for sensing BVP. Even though research has shown that this is possible
[Amico, 2018], the devices in question are still not widely used. Smartwatches with HR
and GSR sensors are also becoming popular, but they are mainly used for
fitness and health tracking. Smartwatches are not considered computer peripherals;
therefore, we argue that they are less easily integrated than peripherals like webcams.
The pupillary response was particularly cited as being a noisy source of affective signal
[Klingner et al., 2008, Klingner, 2010]. The idiosyncratic nature of pupillary response
data makes baseline measurements difficult [Van Gerven et al., 2004, Beatty, 1982].
We showed with our computational approach through a dynamic baseline detection
that we can measure relative changes between previous and current states of arousal.
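The following is not the AFA algorithm itself, but a minimal Python sketch of the underlying idea of a dynamic baseline: each new window of pupil diameters is compared against the preceding window, so arousal is expressed as relative change rather than as deviation from a fixed, person-specific baseline. The window size is an illustrative assumption.

    # Minimal sketch of a dynamic baseline: measure relative change between
    # consecutive windows of pupil diameter samples.
    def relative_changes(pupil, window=60):
        """pupil: list of pupil diameters sampled at a fixed rate."""
        changes = []
        for i in range(window, len(pupil), window):
            baseline = sum(pupil[i - window:i]) / window
            current = pupil[i:i + window]
            if current:
                changes.append((sum(current) / len(current) - baseline)
                               / baseline)
        return changes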
Recent work by Wang et al. selected a set of sequences of mouse and gaze patterns
that correspond to participants feeling stressed during user interaction. Their applica-
tion, which is capable of sensing the user’s fixation, was evaluated on Web search and
mental calculation tasks resulting in an accuracy of 74.3%, outperforming the status
quo by 20% [Wang et al., 2019]. Our work is capable of sensing emotional arousal,
as well as stress, but Wang et al.’s work complements ours, as it does not require
devices with pupillometry capabilities. Similarly, EYECU (Emotional eYe trackEr for
Cultural heritage sUpport), a system for sensing arousal in relation to the users’ focal
attention was developed for viewers of an art gallery [Calandra et al., 2016]. Calandra
et al. designed EYECU to log affective arousal data about the visitors of an art gallery,
in relation to the area of the stimulus that was fixated upon [Calandra et al., 2016].
The focus of our research was on sensing arousal. Pupil dilation is an indicator of cog-
nitive and affective arousal [Partala and Surakka, 2003]. However, it would be useful to
distinguish between positive, high arousal states like interest, attention and excitement
from high arousal, negative states like anger, frustration and cognitive overload. This
distinction between positive and negative valence is useful for the adaptive system to
alter the system in the right direction of hedonic polarity [Van Schaik and Ling, 2008].
We have observed new evidence since the start of our work suggesting that features
extracted from the dynamics of pupil dilation can be analysed using machine learn-
ing techniques, to sense affective valence (negative or positive states) [Babiker et al.,
2015, Ragot et al., 2017]. To the best of our knowledge, the results from Babiker et
al. have not been replicated by other studies without the combination of other sensors
like the EEG [Lu et al., 2015]. Sticking to our principle of using a single source of
affect detection, we could enhance our approach by adding object identification. With
object identification, we can extract the content of the stimulus where a user is focused
during moments of increased arousal. After extracting the content, we can apply sen-
timent analysis, to identify the valence categories (positive, neutral or negative) of the
object so that applications that utilise AFA algorithm will have a richer context about
the users’ affective state [Baltaci and Gokcay, 2012].
In recent times, there have been many methodological and theoretical contributions
to the understanding, analysis and modelling of pupillary response to sense affective
signals. Steephen et al. studied the relationship between emotional intensity and duration,
which is useful not only in predicting and sensing an emotion but is also a crucial factor
in the implementation of adaptive systems [Steephen et al., 2018]. Snowden et al., in
addition to the duration, also explored the role of habituation and the mode of view-
ing (passive vs active) in a picture viewing task while sensing arousal [Snowden et al.,
2016]. With regard to methodologies to account for or remove illumination effects in
pupillary response, several solutions have been proposed. Korn et al. proposed a linear
time-invariant (LTI) model to account for light changes in an auditory-oddball task,
an emotional-words task and a visual-detection task [Korn and Bach, 2016]. Pfleging
et al. proposed a model relating pupil diameter to mental workload and lighting
conditions [Pfleging et al., 2016], while Raiturkar et al. proposed a method to
decouple the light reflex from pupillary dilation to measure emotional arousal in videos
[Raiturkar et al., 2016]. Due to a lack of resources (time and data), we were unable
to implement these methodologies for accounting for illumination and brightness,
especially the effect of varying light combined with stimuli that have varying
emotional and cognitive properties. For simplicity and internal validity,
we decided to keep the ambient light constant. Therefore, our results are valid for
intra-colour changes within a stimulus.
Regarding the application of AFA algorithm in naturalistic settings, Ferhat et al. re-
viewed the existing options for low-cost eye-tracking devices [Ferhat and Vilarino,
2016]. Although the emphasis was on gaze tracking rather than pupillometry, they
reveal that gaze accuracy of visible light cameras is comparable to those that use in-
frared cameras [Ferhat and Vilarino, 2016]. Previously, the standard frame rate for
web cameras was 15 frames per second (fps), but at the point of this writing, web
cameras typically range from 30fps to 120fps (which is twice the frequency at which
we captured data in our studies). However, frequency is not the only limitation on web
cameras. Web cameras capture data like pictures, at a particular resolution, for exam-
ple, 720 pixels (p) or 1080p (known as High Definition - HD). Higher resolution images
contain more detail; therefore, there is an increased likelihood for accurate measure-
ments of pupil data. To cope with limitations in hardware, software, the mode of data
transmission (USB 2.0, USB 3.0) from the web camera to the computer and band-
width (over the internet), there is usually a compromise to be made in frequency or
picture resolution. Therefore, web cameras may capture images at higher resolutions
(1080p) but transmit at lower frequencies (30fps) due to lower processing power, mode
of transmission or bandwidth limitation. Eye trackers like Tobii make use of dedicated
Ethernet LAN cables (with RJ45 connectors), and USB interfaces for data transmis-
sion and communication between the device and the computer. Eye trackers also have
inbuilt software for pre-processing data. Furthermore, eye trackers capture data using
infrared, which records light with wavelengths of up to 14,000 nanometres and is
therefore more detailed for pupillometry processing, compared to web cameras, which
capture images meant to be interpreted by the human eye, within a visible wavelength
range of 400 to 700 nanometres. As we discovered from our literature review, gaze
tracking and eye tracking are far more widely accepted and adopted methodologies in
usability studies and HCI than pupillometry. Our work gives credence to the existing
works on pupil dilation. We hope that with more success in the methodology for
analysing pupillary response data, there will be a corresponding success regarding the
use of low-cost eye trackers with pupillometry capabilities for widespread ubiquitous
affect detection.
7.2 Design and methodological implications
Many usability metrics measure performance, accuracy, effectiveness, satisfaction and
efficiency [Scholtz, 2006]. These metrics evaluate usability from the system’s perspec-
tive, i.e. how well does the system deliver its functions to the users [Martin et al.,
2007]. However, there are many more factors that influence the impact that a sys-
tem/product/user interface has on its users. User experience is a more overarching
term that refers to “a person’s perceptions and responses that result from the use or
anticipated use of a product, system or service” [Law et al., 2009]. From this defini-
tion by the ISO, we can see that users’ perception and response count towards user
experience. Therefore, apart from evaluations that take measurements of the system,
recent trends in HCI have exposed the need to take our bearings from the perspective
of the user [Prendinger and Ishizuka, 2005]. In research, gaze detection has been de-
ployed as one of the techniques for investigating users’ perception of a visual stimulus
[San Agustin et al., 2010]. Therefore, eye tracking has long been established as a
methodology for carrying out usability tests. Eye trackers have effectively been used
to answer the question, “how do users behave when presented with a visual stim-
ulus?”. Some of the well-used metrics include the fixation (duration and location).
With fixation metrics, for example, we can measure learnability, an important factor
for efficiency, by comparing the time to the first fixation, on salient components in one
user interface vs another user interface [Simola et al., 2015]. More advanced techniques
such as visual sequences make use of fixations to understand the transition between
different AOIs of a user interface [Eraslan et al., 2016b]; one example is the STA
algorithm, which we combined with our approach in Chapter 6.
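As a concrete instance of such a fixation metric, the following Python sketch computes time to first fixation on a salient AOI, which could then be compared across two user interfaces; the data layout is an illustrative assumption.

    # Minimal sketch: time to first fixation on a target AOI.
    def time_to_first_fixation(fixations, target_aoi):
        """fixations: [(timestamp_ms, aoi_label), ...] in temporal order."""
        for t, aoi in fixations:
            if aoi == target_aoi:
                return t
        return None  # the AOI was never fixated

    print(time_to_first_fixation([(120, "A"), (480, "B")], "B"))  # 480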
Affective states contribute greatly towards users’ perception of a system. Vice-versa,
the use of an interface may influence users’ affective state, and ultimately, their per-
ception of the system/product [Desmet, 2003]. Affective computing, particularly the
use of physiological sensors to measure users' affect, was promised as a domain that
would deliver a greater understanding of human-computer interaction [Picard, 1997]. As
we reported in Chapter 2, several works of research have been invested in affective
computing. Limitations in the applicability of previous research in the wild, cost of
purchase and deployment of the sensors have hindered the progress [Ragot et al., 2017].
Our research focused on leveraging the use of eye-tracking, which helps us know what
the user is doing, but with the combination of pupil dilation (a physiological response),
we can understand how the users feel.
For researchers, the methodological implication is that we now have the opportunity to
extract a richer understanding of users’ behaviour (how users act + how they feel) dur-
ing visual interaction. Our toolkit aids the comprehension and assimilation of results,
as researchers can plug in their eye-tracking datasets and visualise their behaviours
from a spatial (AOIs) or temporal perspective. As our methodology becomes more
popular, we anticipate that researchers will begin to identify patterns and trends that
could be used to predict the positive/negative perception of a system. The merit of
AFA algorithm is that the eye-tracking methodological process flow does not change
a great deal, i.e. we have leveraged an existing methodology (eye-tracking), but we
derive additional insight [Matthews et al., 2019b].
For designers, we envisage that these findings from researchers, as discussed above, would influence the way user interfaces are designed. However, we also envision a more direct impact if future web cameras were to have pupillometric capabilities [Bousefsaf et al., 2014]. Should that future arrive, AFA algorithm could be implemented as a browser plugin, or within a user interface, such that it notifies the system whenever it senses an unwanted affective state. When UI/system designers are equipped with the ability to sense the users' affective state, several ideas could be explored as palliatives to induce a more desirable state. For example, if a user experiences cognitive overload, the system could suggest breaks, or play soothing music to the user [Dalvand and Kazemifard, 2012, Kim and Andre, 2008].
Our work and its potential impacts are promising. If manufacturers of web camera hardware doubted that their consumers would benefit from web cameras with eye-tracking and pupillometric capabilities, our work provides evidence that such technological advancements broaden the realm of possibilities for HCI researchers, UI and system designers. We discuss the limitations of our work in the next section.
7.3 Limitations
We discuss four categories of limitations: methodological (AFA algorithm), technologi-
cal (hardware), application (use cases), and contextual (meaning).
Methodological limitations: Although we showed, through our study of emotional pictures in Chapter 4, that relative colour intensity was not a significant factor in our measure of arousal, we have not tested the effect of ambient lighting, screen brightness, or more intense colour changes on the screen. We have also yet to test AFA algorithm on real-time arousal detection.
We decided to focus on ecological validity; therefore, we have not carried out a study that compares the accuracy of AFA algorithm against that of other approaches. As we discovered from our background study, most other affect detection methodologies are application-specific and, in many cases, designed to function in the lab, with accuracy as the priority. Rather than achieving a context-specific high accuracy, we aimed for
generalisability and consistency in our results by developing and testing AFA algorithm
using multiple datasets, and realistic stimuli types (e.g. frustration on the Web). Even
though our evaluations have been lab-based, the choice of our affect detection mecha-
nism has the potential for use in naturalistic settings, with web cameras.
We propose that AFA algorithm be optimised using the online change point detection algorithm described in Chapter 3 to split real-time streams of pupillary response, so that AFA algorithm can be applied on atomic segments to sense arousal during user interaction, as sketched below. This could then be combined with web cameras that are capable of pupillometry, so that the potential of applying AFA algorithm in naturalistic settings is fulfilled.
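As a rough illustration of this proposal, the sketch below segments a live pupil-diameter stream with a simple CUSUM-style detector. This is not the change point algorithm described in Chapter 3; the detector, thresholds and names are illustrative assumptions only.

    class OnlinePupilSegmenter:
        """Split a live pupil-diameter stream at detected change points
        (illustrative CUSUM-style detector, not the thesis's exact method)."""

        def __init__(self, drift: float = 0.01, threshold: float = 0.5):
            self.drift = drift          # allowed slack around the running mean
            self.threshold = threshold  # cumulative deviation that signals a change
            self._reset()

        def _reset(self):
            self.mean, self.n = 0.0, 0
            self.pos, self.neg = 0.0, 0.0
            self.segment = []

        def feed(self, diameter: float):
            """Consume one sample; return a finished segment at a change point, else None."""
            self.segment.append(diameter)
            self.n += 1
            self.mean += (diameter - self.mean) / self.n   # running mean of this segment
            self.pos = max(0.0, self.pos + diameter - self.mean - self.drift)
            self.neg = max(0.0, self.neg - diameter + self.mean - self.drift)
            if self.pos > self.threshold or self.neg > self.threshold:
                done = self.segment
                self._reset()
                return done   # atomic segment, ready to be passed to AFA algorithm
            return None

Each returned segment would then be analysed by AFA algorithm as if it were an offline recording. We discuss the technological limitations next.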
Technological limitations: As we established in Chapter 3, pupillometry devices are still costly and, in many cases, large, rather than built into personal computers. The accuracy and fidelity of eye trackers have improved over the years; there are now eye trackers that capture data at 1200 Hz. For our study, we used the Tobii X60, which has a maximum frequency of 60 Hz. As we observed in Chapter 4, there is a moderate correlation between the accuracy of AFA algorithm and the accuracy of the eye tracker (measured by the number of times the device captures both pupils over the entire session). Several factors, such as blinks and head movement, interfere with the eye tracker's ability to capture the pupil, but eye trackers with higher frequency rates recover faster from data loss.
Application limitations: The use of pupillary response as an affect detection mechanism relies on participants having normal or corrected-to-normal vision. Visually impaired users could instead make use of affect detection mechanisms such as the galvanic skin response (GSR). Similarly, applications that do not require visual attention, such as listening to music, could benefit from other affect detection mechanisms like the GSR [Kim and Andre, 2008].
Contextual limitations: When we sense arousal, we measure it against the participant's dynamically established baseline and detect relative changes in arousal. From our measurement, arousal ranges from 1-9, where 1-3 may be considered low, 4-6 medium, and 7-9 high levels of arousal (a sketch of one such baseline-relative mapping follows below). Our research could benefit from an extensive study to understand the practical implications of these arousal levels on the user's interaction.
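For illustration only, the following is a minimal sketch of how a baseline-relative 1-9 mapping could be computed; the exact normalisation used by AFA algorithm is the one developed in Chapter 3, and the names and constants here are assumptions.

    import numpy as np

    def arousal_level(segment: np.ndarray, baseline: np.ndarray) -> int:
        """Map a pupil-diameter segment onto a 1-9 scale relative to a
        per-participant baseline (illustrative normalisation only)."""
        mu, sigma = baseline.mean(), baseline.std()
        if sigma == 0:
            return 1
        z = (segment.mean() - mu) / sigma        # deviation from baseline
        squashed = 1.0 / (1.0 + np.exp(-z))      # squash to (0, 1)
        return int(round(1 + 8 * squashed))      # stretch onto 1-9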
Third-party applications could benefit from such a study, as it would tell them how much disruption an intervention is likely to introduce at a given arousal level. Another contextual limitation is that arousal is a proxy that could indicate stress, interest, boredom, alertness, or frustration. In lab studies, qualitative methods such as interviews or reported feedback can be collected from the participants to narrow down the reason for increases in arousal; in the wild, this may not be possible. In AFA algorithm, we detect the user's focal attention during moments of increased arousal. We could, therefore, improve AFA algorithm with automatic identification of objects, which could then be fed through sentiment analysis to detect whether the object has a negative or positive valence, giving us more context. We discuss this further in future work. We discuss the potential solutions to these limitations next.
7.4 Future Research Pathways and the Potential of Our Research
In this section, we discuss future work from the perspective of the research opportunities and potential applications of our research. Based on the setting in which our research can be applied, we categorise our recommendations for future work into two: 1. future work that could enhance the use of AFA algorithm in controlled settings (in-the-lab), and 2. future work to ensure that our research can be applied in naturalistic settings (in-the-wild). The first recommendation discussed below relates to the former, while the other five pertain to the application of our research in the wild.
7.4.1 User evaluation of our visualisation toolkit
The visualisation toolkit for the output of AFA algorithm was presented in Chapter 3. The visualisation toolkit displays the changes in arousal, either as a function of the participant's area of focal attention, or with respect to time as they interact with a visual stimulus. It was proposed as a medium to aid researchers in formulating hypotheses about the arousal dynamics of participants.
The next phase in the advancement of this toolkit is its evaluation. This is necessary to assess the learnability of the tool, and to determine whether salient trends in the arousal dynamics of users can indeed be detected by observing the visualisation. Moreover, we have often claimed in this study that HCI and UX researchers would benefit from understanding the behaviours of users in terms of changes in arousal. While research trends in the literature support our claim [Foglia et al., 2008, Omata et al., 2012], such a study could also assess the technology acceptance of our overall approach among its potential users (HCI and UX researchers).
7.4.2 The impact of light on AFA algorithm
As we highlighted in the limitations above, one of the limitations of our approach is light changes. We observed that intra-colour changes within a stimulus do not have significant effects on our measure of arousal. Therefore, in laboratory settings, researchers may control for light by keeping ambient light constant. However, for AFA algorithm to be utilised in naturalistic settings, changes in perceived brightness need to be accounted for. We propose that studies be conducted under varying light settings, to understand the effect of light on the pupil. Once this relationship between light and the pupil is established, our model of the user's pupil dilation could be modulated such that the amplitude of the affective signal is increased to counter the effect of pupil constriction during increased brightness. Similarly, the amplitude should be reduced for participants experiencing reduced brightness, so that the modulated signal more accurately reflects the participant's affective state. A sketch of this modulation follows below.
7.4.3 Combining AFA algorithm with other affect detection
mechanisms
The scope of our research was limited to sensing arousal during visual interaction. Pupillary response and the analysis of gaze behaviour were our preferred approach within this scope. However, as we stated in the limitations, many eye trackers require participants to have normal or corrected-to-normal vision. Therefore, we propose an effective way to alternate between the use of eye trackers and, for example, GSR sensors, so that sensing arousal remains possible when our approach is not feasible. This is different from combining both technologies simultaneously for affect detection, which would negate our previously established principle (that a multi-sensory approach would decrease the likelihood of widespread use). What we proffer in this case is a provision whereby, when one sensor is unavailable, another could be used to deliver a similar outcome, as sketched below.
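A minimal sketch of such a provision, assuming each sensor exposes an availability check and an arousal reading; the interface and its method names are hypothetical.

    from typing import List, Optional, Protocol

    class ArousalSensor(Protocol):
        def available(self) -> bool: ...
        def arousal(self) -> float: ...

    def sense_arousal(sensors: List[ArousalSensor]) -> Optional[float]:
        """Try sensors in order of preference (e.g. [eye_tracker, gsr]) and
        return the first available reading, so that arousal sensing degrades
        gracefully instead of failing outright."""
        for sensor in sensors:
            if sensor.available():
                return sensor.arousal()
        return None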
7.4.4 Optimizing AFA algorithm for real-time arousal sensing
In developing our algorithm, we used the offline change point detection algorithm to split interaction data into smaller segments. For AFA algorithm to be utilised in naturalistic settings in real time, we propose the use of the online change point detection algorithm (as sketched in the limitations section above). Since this online version has not been evaluated along with AFA algorithm, we have listed it here for future work. Real-time analyses of pupillary dynamics may provide affective context for adaptive computing and recommender systems.
7.4.5 Extending AFA algorithm for adaptive systems
Adaptive computing can be used to influence users' interaction experience. This could be achieved through a change in the user's content or layout, in real time or otherwise. For adaptive computing to be effective, certain events need to initiate the adaptive engine. When AFA algorithm is optimised for real-time detection, triggers can be set; for example, when the arousal level of participants reaches a certain threshold while fixating on text with a small font, the font size could be magnified. Another example is intelligent tutoring systems, where difficult questions in adaptive tests can cause users to experience an increase in arousal; triggers can be set such that the adaptive engine fetches a less difficult question, so that the learner does not drop out. A sketch of such a trigger follows below.
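For illustration, a minimal sketch of such a trigger, assuming real-time arousal levels on the 1-9 scale discussed earlier; the AOI names and adaptation actions are hypothetical.

    from typing import Optional

    def arousal_trigger(arousal_level: int,
                        fixated_aoi: str,
                        threshold: int = 7) -> Optional[str]:
        """Fire an adaptation when arousal is high while a known AOI is fixated
        (illustrative rules; a real adaptive engine would be far richer)."""
        if arousal_level < threshold:
            return None
        if fixated_aoi == "small-font-paragraph":
            return "magnify-font"
        if fixated_aoi == "quiz-question":
            return "fetch-easier-question"
        return None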
7.4.6 Utilizing AFA algorithm on mobile devices
Of all the future work listed, enabling AFA algorithm to function accurately on mobile devices appears to be the most challenging. However, for our approach to become truly ubiquitous, AFA algorithm should be optimised for use on mobile devices. Besides the hardware limitation that mobile devices present, i.e., a camera with eye-tracking capabilities, there are other challenging conditions that currently prevent AFA algorithm from sensing changes in arousal accurately on mobile devices in the wild: varying ambient light, dynamic screen brightness, and relative positional changes between the user's pupil and the camera, among others.
Our work opens up research avenues in affective computing, ubiquitous computing,
usability and UX evaluations.
7.5 Conclusion
We started off our research with the theme, "Physiological correlates of affect". The literature on affective computing has evolved since the inception of the domain, when its foremost researcher, Rosalind Picard, envisioned future computers having the capability to sense the emotions of their users [Picard, 1997]. She opined that sensing user emotions would enable computer systems to deliver content and services such as entertainment, education, information, and software user interfaces in a more intelligent manner [Picard, 1997]. The ambition is to attain a level, similar to humans, where we observe each other's emotional cues, such as facial expressions, voice prosody, and body gestures, to improve social interaction. Like human beings, computers can be equipped with sensors to learn these behavioural cues. Human beings still detect emotions with higher accuracy, but with advances in computational techniques, researchers are making headway in using computers to sense people's affective states. In comparison to human beings, computers have a processing and storage advantage, which makes their inferences less biased. However, human beings are currently better at putting confounding factors into context: for affective sensors, environmental conditions (light, motion, location, interaction scenarios), psychological states (personality traits, moods, previous emotional states) and demographic profiles (previous experience, sex, culture, age) may translate into noisy data, whereas human beings factor such contextual information into their reading of each other's emotional cues.
Our literature review helped us understand the landscape of affective computing and the existing mechanisms for sensing user affect. We observed that much research has focused on in-the-lab affect detection, with limited potential for application in the wild. Therefore, we defined our scope: an approach, and a method, for sensing user emotions in such a way that it has the potential for deployment in the wild. There are many potential HCI-related benefits of, and software applications for, sensing user affect, including affective gaming [Hudlicka, 2009], intelligent tutoring systems [Sidney et al., 2005], and stress detection [Mullins and Treu, 1991]. However, the full benefits of affective computing will only be realised by end-users if the technique for sensing affect has the potential for widespread ubiquitous use.
Therefore, our first objective was to select an affect detection mechanism that has the potential for deployment in the wild. This was our first and most important criterion because, unlike generalisability and accuracy (which can be improved upon), the choice of affect detection mechanism, once made, can only be exchanged wholesale for another. Each affect detection sensor comes with its peculiarities, so a change may require equally thorough research and testing. Through our literature review (Chapter 2), we selected pupillary response and gaze behaviour as our affect detection mechanism.
We defined our research to be applicable to the Web and interactive systems. Therefore, rather than focusing on all affective states, we restricted our scope to sensing arousal. Arousal is a dimension of affect that critically influences user experience. As we stated in previous chapters, arousal is the intensity of an emotion and can be used as a proxy for measuring interest, attention, and cognitive load, all of which are of interest in interactive systems.
Pupillary responses allow us to sense changes in the user's level of arousal, while gaze behaviour enables us to know where users are looking when they experience a change in arousal [Partala and Surakka, 2003]. We use peak detection to sense when these changes occur and compute the most fixated area in the second before the increase in arousal. When a peak is detected, we compute the magnitude of the peak and compound it by the total fixation duration during the moment of the peak, to measure the impact of the peak on the user (a sketch of this step follows below). We took a data-driven methodology to develop AFA algorithm by analysing eye-tracking datasets from different domains and improving our methods iteratively.
In evaluating AFA algorithm, we considered generalisability to different stimulus types (emotional, cognitive, frustrating; static and during user interaction on the Web). Altogether, we developed AFA algorithm using eye-tracking datasets from two independent studies consisting of a total of 60 participants. In the first study, using ECG images as our stimuli, we developed AFA algorithm to sense arousal from static images, while in the other study, we used eye-tracking data from interaction with the Protege user interface. We went on to evaluate AFA algorithm on three independent studies, with 108 participants in total. In the first study, we evaluated AFA algorithm on its ability to sense arousal from emotion-evoking images, and found a moderate correlation between our measure of arousal and the participants' reported arousal ratings. In the second study, participants were asked to say aloud the names of animals in one stimulus, and the names of different colours in the other stimulus. In the controlled condition, the objects (animals/colours) were correctly (congruently) labelled [La Heij, 1988]; in the comparison condition, each object was misnamed (incongruently), to induce the Stroop effect (an increased level of cognitive demand) in the participants. In this second evaluation study, we observed a moderate correlation between the output of AFA algorithm and the expected level of arousal for the animal-naming task, and a strong correlation for the colour-naming task. In the third evaluation study, participants were presented with four Web tasks. In the controlled form, participants carried out the tasks normally, while in the comparison tasks, we injected known causes of frustration into the tasks. AFA algorithm was capable of distinguishing between the two conditions with a strong effect. We also observed a strong correlation between the users' attention (measured by their fixation count) on the components that induced arousal and their measures of arousal. The three studies showed that AFA algorithm can sense arousal and the user's attention during moments of increased arousal.
Next, we extended AFA algorithm to develop a novel methodology for understanding
user behaviours on the Web. For this, we combined AFA algorithm with STA algo-
rithm, which computes the trending visual scan path of viewers of a visual stimulus.
To develop this approach, we performed a secondary analysis of the eye-tracking data
of 41 participants that viewed the Apple home page. We used the secondary analysis
to develop a visualisation that can help to uncover behaviours that are common to a
user group for a visual stimulus. To evaluate our methodology, we explored differences
between people with autism and neurotypical people with regards to how they browse
certain Web pages. Our study yielded results that are consistent with the literature on
people with autism. For example, we discovered cases where emotive images evoked more arousal in the neurotypical users than in the users with autism.
Finally, as an indirect contribution of this research, we also developed an arousal explorer tool and a further visualisation toolkit for observing the results of the analysis of eye-tracking data visually. We propose that these be used to aid hypothesis generation from the output of AFA algorithm. This thesis developed an approach to sensing arousal and visual attention during user interaction. We set out with these three objectives:
1. To sense arousal using an affect detection mechanism that has the po-
tential for future use.
For affect detection mechanisms to become accessible, they need to be cheap, compact, and require minimal skill to set up. After carrying out a literature review, we chose pupillary response on the grounds that it is unobtrusive. Further, prevalent trends in eye-tracking devices suggest that web cameras with eye-tracking capabilities are becoming mainstream. Our research provides motivation for web camera manufacturers to deliver web cameras with pupillometric capabilities. This would, in turn, spur the affective computing community, UX researchers, and UI and system designers to develop applications that can work with AFA algorithm. We conclude that AFA algorithm has the potential for widespread ubiquitous use.
2. To sense arousal in a way that is generalisable in visual interaction
Besides the potential for widespread ubiquitous use, generalisability was of key importance to our research. This is why we took a data-driven approach, developing and evaluating our method using datasets from several visual stimuli. Our stimuli varied from static to interactive content, and from emotional to cognitive stimulus types. Our ground truths were based on participant feedback, the literature, and domain experts. We applied our approach on the Web and extended its use to another existing methodology, the STA algorithm. We therefore conclude that AFA algorithm can be applied to a variety of visual stimulus types and extended to other research methodologies, or systems that work with visual stimuli.
3. To sense arousal accurately
As we stated earlier, this was not our most important criterion, because we believed that the accuracy of our technique could be improved with eye trackers of higher fidelity (frequency of data collection and resolution). In evaluating AFA algorithm, our approach has consistently delivered moderate to strong correlations or effects between the arousal measure of AFA algorithm and our ground truths. Our results show promise, and the correlation between the accuracy of AFA algorithm and the accuracy of the data collected by our eye tracker confirms the potential for improvement.
In addition to our set objectives, we proposed, designed and developed a way to visualise the output of our algorithm, such that researchers can use the visualisation toolkit as a medium to observe the results of AFA algorithm in a way that aids hypothesis formulation. In terms of our research questions, RQ1 was addressed through the first objective, where we selected pupillary response and the analysis of gaze behaviour as our affect detection mechanism. RQ2, which concerns accuracy and generalisability, was addressed in objectives two and three above. As we stated, in all our evaluations, we observed a moderate to strong correlation/effect between our measure of arousal and the ground truth. Finally, RQ3, which concerns determining the focal attention of users during moments of arousal, was addressed by the correlation we observed between the measure of arousal and the component being fixated upon during moments of increased arousal on the Web. RQ3 was addressed further in Chapter 6, where we extended AFA algorithm successfully by combining it with the STA algorithm. In this novel methodology,
we confirmed behavioural patterns that have been observed in the literature, for example, the difference in affective response between people with autism and neurotypical people on the Web. The results of these two studies suggest that AFA algorithm can be used to sense the user's focal attention during moments of increased arousal.
Our aims and objectives for this research have been addressed, and they aided us in answering our research questions. We highlighted the limitations of our research earlier in this chapter and stated alternative approaches, and we expanded on our recommendations for future work and research pathways in sensing arousal and focal attention during user interaction. Finally, we presented our concluding remarks.
Bibliography
[Abbasi et al., 2010] Abbasi, A. R., Dailey, M. N., Afzulpurkar, N. V., and Uno, T.
(2010). Student mental state inference from unintentional body gestures using dy-
namic bayesian networks. Journal on Multimodal User Interfaces, 3(1-2):21–31.
[Abdrabou et al., 2018] Abdrabou, Y., Kassem, K., Salah, J., El-Gendy, R., Morsy,
M., Abdelrahman, Y., and Abdennadher, S. (2018). Exploring the usage of eeg and
pupil diameter to detect elicited valence. In International Conference on Intelligent
Human Systems Integration, pages 287–293. Springer.
[Agrafioti et al., 2012] Agrafioti, F., Hatzinakos, D., and Anderson, A. K. (2012). Ecg
pattern analysis for emotion detection. Affective Computing, IEEE Transactions
on, 3(1):102–115.
[Agrawal et al., 2013] Agrawal, U., Giripunje, S., and Bajaj, P. (2013). Emotion and
gesture recognition with soft computing tool for drivers assistance system in human
centered transportation. In Systems, Man, and Cybernetics (SMC), 2013 IEEE
International Conference on, pages 4612–4616. IEEE.
[Ahern and Schwartz, 1985] Ahern, G. L. and Schwartz, G. E. (1985). Differential
lateralization for positive and negative emotion in the human brain: Eeg spectral
analysis. Neuropsychologia, 23(6):745–755.
[Ahn and Picard, 2005] Ahn, H. and Picard, R. W. (2005). Affective-cognitive learn-
ing and decision making: A motivational reward framework for affective agents. In
International Conference on Affective Computing and Intelligent Interaction, pages
866–873. Springer.
[Akgun and Ciarrochi, 2003] Akgun, S. and Ciarrochi, J. (2003). Learned resourceful-
ness moderates the relationship between academic stress and academic performance.
Educational Psychology, 23(3):287–294.
[Alamia et al., 2019] Alamia, A., VanRullen, R., Pasqualotto, E., Mouraux, A., and
Zenon, A. (2019). Pupil-linked arousal responds to unconscious surprisal. Journal
of Neuroscience, pages 3010–18.
[Alexander et al., 2003] Alexander, S., Sarrafzadeh, A., and Fan, C. (2003). Pay at-
tention! the computer is watching: Affective tutoring systems. In Proceedings of
World Conference on E-Learning in Corporate, Government, Healthcare, and Higher
Education, pages 1463–1466.
[Alhargan et al., 2017] Alhargan, A., Cooke, N., and Binjammaz, T. (2017). Affect
recognition in an interactive gaming environment using eye tracking. In 2017 Sev-
enth International Conference on Affective Computing and Intelligent Interaction
(ACII), pages 285–291. IEEE.
[Alhothali, 2011] Alhothali, A. (2011). Modeling user affect using interaction events. University of Waterloo.
[Allen et al., 1988] Allen, C. T., Machleit, K. A., and Marine, S. S. (1988). On assessing the emotionality of advertising via Izard's differential emotions scale. Advances in Consumer Research, 15(1):226–231.
[Allen et al., 2001] Allen, J. J., Harmon-Jones, E., and Cavender, J. H. (2001). Manip-
ulation of frontal eeg asymmetry through biofeedback alters self-reported emotional
responses and facial emg. Psychophysiology, 38(4):685–693.
[Amico, 2018] Amico, S. (2018). ETNA: a Virtual Reality Game with Affective Dy-
namic Difficulty Adjustment based on Skin Conductance. PhD thesis.
[Aminihajibashi et al., 2019] Aminihajibashi, S., Hagen, T., Foldal, M. D., Laeng, B.,
and Espeseth, T. (2019). Individual differences in resting-state pupil size: Evi-
dence for association between working memory capacity and pupil size variability.
International Journal of Psychophysiology.
[Arent and Landers, 2003] Arent, S. M. and Landers, D. M. (2003). Arousal, anxiety,
and performance: A reexamination of the inverted-u hypothesis. Research quarterly
for exercise and sport, 74(4):436–444.
[Ashkanasy and Daus, 2002] Ashkanasy, N. M. and Daus, C. S. (2002). Emotion in the
workplace: The new challenge for managers. Academy of Management Perspectives,
16(1):76–86.
[Baban et al., 2009] Baban, S. M., Mohammed, P., Baberstock, P., Sankat, C., Boyd,
W., Laukner, B., Lloyd, D., and Baban, S. M. (2009). The Journey from Pondering
to Publishing. University of the West Indies Press.
[Babiker et al., 2015] Babiker, A., Faye, I., Prehn, K., and Malik, A. (2015). Ma-
chine learning to differentiate between positive and negative emotions using pupil
diameter. Frontiers in psychology, 6:1921.
[Bahr and Ford, 2011] Bahr, G. S. and Ford, R. A. (2011). How and why pop-ups don't work: Pop-up prompted eye movements, user affect and decision making. Computers in Human Behavior, 27(2):776–783.
[Bakhtiyari et al., 2014] Bakhtiyari, K., Taghavi, M., and Husain, H. (2014). Hybrid affective computing - keyboard, mouse and touch screen: from review to experiment. Neural Computing and Applications.
[Ballesteros and Croft, 1998] Ballesteros, L. and Croft, W. B. (1998). Resolving am-
biguity for cross-language retrieval. In Sigir, volume 98, pages 64–71.
[Baltaci and Gokcay, 2012] Baltaci, S. and Gokcay, D. (2012). Negative sentiment in
scenarios elicit pupil dilation response: an auditory study. In Proceedings of the 14th
ACM international conference on Multimodal interaction, pages 529–532. ACM.
[Baltaci and Gokcay, 2014] Baltaci, S. and Gokcay, D. (2014). Role of pupil dilation
and facial temperature features in stress detection. In Signal Processing and Com-
munications Applications Conference (SIU), 2014 22nd, pages 1259–1262. IEEE.
[Baltaci and Gokcay, 2016] Baltaci, S. and Gokcay, D. (2016). Stress detection in
human–computer interaction: Fusion of pupil dilation and facial temperature fea-
tures. International Journal of Human–Computer Interaction, 32(12):956–966.
[Bamidis et al., 2004] Bamidis, P. D., Papadelis, C., Kourtidou-Papadeli, C., Pappas,
C., and Vivas, A. B. (2004). Affective computing in the era of contemporary neu-
rophysiology and health informatics. Interacting with Computers, 16(4):715–721.
[Baron-Cohen et al., 1988] Baron-Cohen, R. P., Ouston, J., and Lee, A. (1988). Emo-
tion recognition in autism: Coordinating faces and voices. Psychological medicine,
18(4):911–923.
[Baylor and Rosenberg-Kima, 2006] Baylor, A. L. and Rosenberg-Kima, R. B. (2006).
Interface agents to alleviate online frustration. In Proceedings of the 7th interna-
tional conference on Learning sciences, pages 30–36. International Society of the
Learning Sciences.
[Beatty, 1982] Beatty, J. (1982). Task-evoked pupillary responses, processing load,
and the structure of processing resources. Psychological bulletin, 91(2):276.
[Benedek and Hazlett, 2005] Benedek, J. and Hazlett, R. (2005). Incorporating facial
emg emotion measures as feedback in the software design process. Proc. Human
Computer Interaction Consortium.
[Benko and Wigdor, 2010] Benko, H. and Wigdor, D. (2010). Imprecision, inaccu-
racy, and frustration: The tale of touch input. In Tabletops-Horizontal Interactive
Displays, pages 249–275. Springer.
[Bergadano et al., 2002] Bergadano, F., Gunetti, D., and Picardi, C. (2002). User
authentication through keystroke dynamics. ACM Transactions on Information
and System Security (TISSEC), 5(4):367–397.
[Berkowitz, 1962] Berkowitz, L. (1962). Aggression: A social psychological analysis.
PsycINFO research-info-systems.
[Bijmolt et al., 2010] Bijmolt, T. H., Leeflang, P. S., Block, F., Eisenbeiss, M., Hardie,
B. G., Lemmens, A., and Saffert, P. (2010). Analytics for customer engagement.
Journal of Service Research, 13(3):341–356.
[Biniok, 2018] Biniok, J. (2018). Tampermonkey. https://github.com/Tampermonkey/tampermonkey.
[Birditt et al., 2005] Birditt, K. S., Fingerman, K. L., and Almeida, D. M. (2005).
Age differences in exposure and reactions to interpersonal tensions: a daily diary
study. Psychology and aging, 20(2):330.
[Boucsein, 2012] Boucsein, W. (2012). Electrodermal activity. Springer Science &
Business Media.
[Bousefsaf et al., 2013] Bousefsaf, F., Maaoui, C., and Pruski, A. (2013). Remote as-
sessment of the heart rate variability to detect mental stress. In Pervasive Computing
Technologies for Healthcare (PervasiveHealth), 2013 7th International Conference
on, pages 348–351. IEEE.
[Bousefsaf et al., 2014] Bousefsaf, F., Maaoui, C., and Pruski, A. (2014). Remote
detection of mental workload changes using cardiac parameters assessed with a low-
cost webcam. Computers in biology and medicine, 53:154–163.
[Bradley and Lang, 1994] Bradley, M. M. and Lang, P. J. (1994). Measuring emo-
tion: the self-assessment manikin and the semantic differential. Journal of behavior
therapy and experimental psychiatry, 25(1):49–59.
[Bradley et al., 2008a] Bradley, M. M., Miccoli, L., Escrig, M. A., and Lang, P. J.
(2008a). The pupil as a measure of emotional arousal and autonomic activation.
Psychophysiology, 45(4):602–607.
[Bradley et al., 2008b] Bradley, M. M., Miccoli, L., Escrig, M. a., and Lang, P. J.
(2008b). The pupil as a measure of emotional arousal and autonomic activation.
Psychophysiology, 45(4):602–607.
[Bradley et al., 2017] Bradley, M. M., Sapigao, R. G., and Lang, P. J. (2017). Sym-
pathetic ans modulation of pupil diameter in emotional scene perception: Effects of
hedonic content, brightness, and contrast. Psychophysiology, 54(10):1419–1435.
[Bremner, 2012] Bremner, F. D. (2012). Pupillometric evaluation of the dynamics of
the pupillary response to a brief light stimulus in healthy subjects. Investigative
ophthalmology & visual science, 53(11):7343–7347.
[Broekens and Brinkman, 2013] Broekens, J. and Brinkman, W.-P. (2013). Affectbut-
ton: A method for reliable and valid affective self-report. International Journal of
Human-Computer Studies, 71(6):641–667.
[Brouwer et al., 2015] Brouwer, A.-M., Zander, T. O., Van Erp, J. B., Korteling, J. E.,
and Bronkhorst, A. W. (2015). Using neurophysiological signals that reflect cogni-
tive or affective state: six recommendations to avoid common pitfalls. Frontiers in
neuroscience, 9:136.
[Brown et al., 2011] Brown, L., Grundlehner, B., and Penders, J. (2011). Towards
wireless emotional valence detection from eeg. In Engineering in Medicine and
Biology Society, EMBC, 2011 Annual International Conference of the IEEE, pages
2188–2191. IEEE.
[Brugha et al., 2012] Brugha, T., Cooper, S. A., McManus, S., Purdon, S., Smith, J., Scott, F., Spiers, N., and Tyrer, F. (2012). Estimating the prevalence of autism spectrum conditions in adults: extending the 2007 adult psychiatric morbidity survey. The NHS Information Centre.
[Bruneau et al., 2002] Bruneau, D., Sasse, M. A., and McCarthy, J. (2002). The eyes
never lie: The use of eye tracking data in hci research. In Proceedings of the CHI,
volume 2, page 25. Citeseer.
[Bruun et al., 2016] Bruun, A., Law, E. L.-C., Heintz, M., and Alkly, L. H. (2016).
Understanding the relationship between frustration and the severity of usability
problems: What can psychophysiological data (not) tell us? In Proceedings of the
2016 CHI Conference on Human Factors in Computing Systems, pages 3975–3987.
ACM.
[Buchanan, 2018] Buchanan, J. (2018). Greasemonkey. https://github.com/insin/greasemonkey.
[Buettner et al., 2018] Buettner, R., Scheuermann, I. F., Koot, C., Rossle, M., and Timm, I. J. (2018). Stationarity of a user's pupil size signal as a precondition of pupillary-based mental workload evaluation. In Information Systems and Neuroscience, pages 195–200. Springer.
[Burger et al., 2013] Burger, B., Saarikallio, S., Luck, G., Thompson, M. R., and
Toiviainen, P. (2013). Relationships between perceived emotions in music and music-
induced movement. Music Perception: An Interdisciplinary Journal, 30(5):517–533.
[Burleson and Picard, 2004] Burleson, W. and Picard, R. W. (2004). Affective agents:
Sustaining motivation to learn through failure and a state of stuck. In Workshop
on Social and Emotional Intelligence in Learning Environments.
[Busso et al., 2009] Busso, C., Lee, S., and Narayanan, S. (2009). Analysis of emotion-
ally salient aspects of fundamental frequency for emotion detection. Audio, Speech,
and Language Processing, IEEE Transactions on, 17(4):582–596.
[Cacioppo et al., 1992] Cacioppo, J. T., Bush, L. K., and Tassinary, L. G. (1992).
Microexpressive facial actions as a function of affective stimuli: Replication and
extension. Personality and Social Psychology Bulletin, 18(5):515–526.
[Cacioppo et al., 1986] Cacioppo, J. T., Petty, R. E., Losch, M. E., and Kim, H. S.
(1986). Electromyographic activity over facial muscle regions can differentiate the
valence and intensity of affective reactions. Journal of personality and social psy-
chology, 50(2):260.
[Calandra et al., 2016] Calandra, D. M., Di Mauro, D., D'Auria, D., and Cutugno, F. (2016). Eyecu: an emotional eye tracker for cultural heritage support. In Empowering Organizations, pages 161–172. Springer.
[Calvo and D’Mello, 2010] Calvo, R. A. and D’Mello, S. (2010). Affect detection: An
interdisciplinary review of models, methods, and their applications. IEEE Transac-
tions on Affective Computing, 1(1):18–37.
[Castellano et al., 2008] Castellano, G., Kessous, L., and Caridakis, G. (2008). Emo-
tion recognition through multiple modalities: face, body gesture, speech. In Affect
and emotion in human-computer interaction, pages 92–103. Springer.
[Catalano, 2002] Catalano, J. T. (2002). Guide to ECG analysis. Lippincott Williams
& Wilkins.
[Ceaparu et al., 2004] Ceaparu, I., Lazar, J., Bessiere, K., Robinson, J., and Shnei-
derman, B. (2004). Determining causes and severity of end-user frustration. Inter-
national journal of human-computer interaction, 17(3):333–356.
[Celani et al., 1999] Celani, G., Battacchi, M. W., and Arcidiacono, L. (1999). The
understanding of the emotional meaning of facial expressions in people with autism.
Journal of autism and developmental disorders, 29(1):57–66.
[Chanel et al., 2006] Chanel, G., Kronegg, J., Grandjean, D., and Pun, T. (2006).
Emotion assessment: Arousal evaluation using eegs and peripheral physiological
signals. Multimedia content representation, classification and security, pages 530–
537.
[Chanel et al., 2008] Chanel, G., Rebetez, C., Betrancourt, M., and Pun, T. (2008).
Boredom, engagement and anxiety as indicators for adaptation to difficulty in games.
In Proceedings of the 12th international conference on Entertainment and media in
the ubiquitous era, pages 13–17. ACM.
[Chang et al., 2011] Chang, K.-h., Fisher, D., Canny, J., and Hartmann, B. (2011).
How’s my mood and stress?: an efficient speech analysis library for unobtrusive
monitoring on mobile phones. In Proceedings of the 6th International Conference
on Body Area Networks, pages 71–77. ICST (Institute for Computer Sciences, Social-
Informatics and Telecommunications Engineering).
[Charles et al., 2001] Charles, S. T., Reynolds, C. A., and Gatz, M. (2001). Age-
related differences and change in positive and negative affect over 23 years. Journal
of personality and social psychology, 80(1):136.
[Chellali and Hennig, 2013] Chellali, R. and Hennig, S. (2013). Is it time to rethink
motion artifacts? temporal relationships between electrodermal activity and body
movements in real-life conditions. In Affective Computing and Intelligent Interaction
(ACII), 2013 Humaine Association Conference on, pages 330–335. IEEE.
[Chen et al., 2017] Chen, H., Dey, A., Billinghurst, M., and Lindeman, R. W. (2017).
Exploring pupil dilation in emotional virtual reality environments.
[Chen et al., 2003] Chen, L.-Q., Xie, X., Fan, X., Ma, W.-Y., Zhang, H.-J., and Zhou,
H.-Q. (2003). A visual attention model for adapting images on small displays.
Multimedia systems, 9(4):353–364.
[Cheng and Liu, 2008] Cheng, B. and Liu, G.-Y. (2008). Emotion recognition from
surface emg signal using wavelet transform and neural network. In Proceedings
of the 2nd international conference on bioinformatics and biomedical engineering
(ICBBE), pages 1363–1366.
[Chiang and Lin, 2007] Chiang, H.-M. and Lin, Y.-H. (2007). Reading comprehension
instruction for students with autism spectrum disorders: A review of the literature.
Focus on Autism and Other Developmental Disabilities, 22(4):259–267.
[Chmielewska et al., 2019] Chmielewska, M., Dzienkowski, M., Bogucki, J., Kocki, W., Kwiatkowski, B., Pełka, J., and Tuszynska-Bogucka, W. (2019). Affective computing with eye-tracking data in the study of the visual perception of architectural spaces. In MATEC Web of Conferences, volume 252, page 03021. EDP Sciences.
[Choe et al., 2016] Choe, K. W., Blake, R., and Lee, S.-H. (2016). Pupil size dynamics
during fixation impact the accuracy and precision of video-based gaze estimation.
Vision research, 118:48–59.
[Christensen et al., 2018] Christensen, D. L., Braun, K. V. N., Baio, J., Bilder, D., Charles, J., Constantino, J. N., Daniels, J., Durkin, M. S., Fitzgerald, R. T., Kurzius-Spencer, M., et al. (2018). Prevalence and characteristics of autism spectrum disorder among children aged 8 years - autism and developmental disabilities monitoring network, 11 sites, United States, 2012. MMWR Surveillance Summaries, 65(13):1.
[Christian et al., 2014] Muhl, C., Allison, B., Nijholt, A., and Chanel, G. (2014). A survey of affective brain computer interfaces: principles, state-of-the-art, and challenges. Brain-Computer Interfaces, 1(2):66–84.
[Cole et al., 2002] Cole, P. M., Bruschi, C. J., and Tamang, B. L. (2002). Cultural dif-
ferences in children’s emotional reactions to difficult situations. Child development,
73(3):983–996.
[Colizoli et al., 2018] Colizoli, O., De Gee, J. W., Urai, A. E., and Donner, T. H.
(2018). Task-evoked pupil responses reflect internal belief states. Scientific reports,
8(1):13702.
[Constantine and Hajj, 2012] Constantine, L. and Hajj, H. (2012). A survey of ground-
truth in emotion data annotation. In Pervasive Computing and Communications
Workshops (PERCOM Workshops), 2012 IEEE International Conference on, pages
697–702. IEEE.
[Cook et al., 2013] Cook, R., Brewer, R., Shah, P., and Bird, G. (2013). Alexithymia,
not autism, predicts poor recognition of emotional facial expressions. Psychological
science, 24(5):723–732.
[Corbetta et al., 1998] Corbetta, M., Akbudak, E., Conturo, T. E., Snyder, A. Z.,
Ollinger, J. M., Drury, H. A., Linenweber, M. R., Petersen, S. E., Raichle, M. E.,
Van Essen, D. C., et al. (1998). A common network of functional areas for attention
and eye movements. Neuron, 21(4):761–773.
[Cowie et al., 2001] Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kol-
lias, S., Fellenz, W., and Taylor, J. G. (2001). Emotion recognition in human-
computer interaction. IEEE Signal processing magazine, 18(1):32–80.
[Crichton, 2001] Crichton, N. (2001). Visual analogue scale (vas). J Clin Nurs,
10(5):706–6.
[Critchley, 2002] Critchley, H. D. (2002). Book review: electrodermal responses: what
happens in the brain. The Neuroscientist, 8(2):132–142.
[Cromby, 2012] Cromby, J. (2012). Feeling the way: Qualitative clinical research and
the affective turn. Qualitative Research in Psychology, 9(1):88–98.
[Daimi and Saha, 2014] Daimi, S. N. and Saha, G. (2014). Classification of emotions
induced by music videos and correlation with participants rating. Expert Systems
with Applications, 41(13):6057–6065.
[Dalvand and Kazemifard, 2012] Dalvand, K. and Kazemifard, M. (2012). An adap-
tive user-interface based on user’s emotion. 2012 2nd International eConference on
Computer and Knowledge Engineering, ICCKE 2012, pages 161–166.
[Dan-Glauser and Scherer, 2011] Dan-Glauser, E. S. and Scherer, K. R. (2011). The
geneva affective picture database (gaped): a new 730-picture database focusing on
valence and normative significance. Behavior research methods, 43(2):468.
[Datcu, 2014] Datcu, D. (2014). On the Enhancement of Augmented Reality-based Tele-Collaboration with Affective Computing Technology.
[Davidson, 2003] Davidson, R. J. (2003). Seven sins in the study of emotion: Correc-
tives from affective neuroscience. Brain and Cognition, 52(1):129–132.
[Davies et al., 2016] Davies, A., Horseman, L., Splendiani, B., Harper, S., and Jay, C.
(2016). Data driven analysis of visual behaviour for electrocardiogram interpreta-
tion. Technical Report.
[De Silva et al., 2006] De Silva, P. R., Osano, M., Marasinghe, A., and Madurappe-
ruma, A. P. (2006). Towards recognizing emotion with affective dimensions through
body gestures. In Automatic Face and Gesture Recognition, 2006. FGR 2006. 7th
International Conference on, pages 269–274. IEEE.
[Demeyer, 2011] Demeyer, S. (2011). Research methods in computer science. In ICSM,
page 600.
[Denicolo and Becker, 2012] Denicolo, P. and Becker, L. (2012). Developing research
proposals. Sage.
[Desmet, 2003] Desmet, P. (2003). Measuring emotion: Development and application
of an instrument to measure emotional responses to products. In Funology, pages
111–123. Springer.
[Detterman, 1987] Detterman, D. K. (1987). Theoretical notions of intelligence and
mental retardation. American Journal of Mental Deficiency.
[Dimberg, 1990] Dimberg, U. (1990). Facial electromyography and emotional reac-
tions. Psychophysiology.
[Dixon-Woods et al., 2006] Dixon-Woods, M., Bonas, S., Booth, A., Jones, D. R.,
Miller, T., Sutton, A. J., Shaw, R. L., Smith, J. A., and Young, B. (2006). How can
systematic reviews incorporate qualitative research? a critical perspective. Quali-
tative research, 6(1):27–44.
[Doob and Kirshenbaum, 1973] Doob, A. N. and Kirshenbaum, H. M. (1973). The
effects on arousal of frustration and aggressive films. Journal of Experimental Social
Psychology, 9(1):57–64.
[Dowland and Furnell, 2004] Dowland, P. S. and Furnell, S. M. (2004). A long-term
trial of keystroke profiling using digraph, trigraph and keyword latencies. In Security
and Protection in Information Processing Systems, pages 275–289. Springer.
[Duchowski et al., 2018] Duchowski, A. T., Krejtz, K., Krejtz, I., Biele, C., Niedzielska, A., Kiefer, P., Raubal, M., and Giannopoulos, I. (2018). The index of pupillary activity: Measuring cognitive load vis-à-vis task difficulty with pupil oscillation. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, page 282. ACM.
[Ehmke and Wilson, 2007] Ehmke, C. and Wilson, S. (2007). Identifying web usability
problems from eye-tracking data. In Proceedings of the 21st British HCI Group
Annual Conference on People and Computers: HCI... but not as we know it-Volume
1, pages 119–128. British Computer Society.
[Einhauser, 2017] Einhauser, W. (2017). The pupil as marker of cognitive processes.
In Computational and cognitive neuroscience of vision, pages 141–169. Springer.
[Ekman, 1992a] Ekman, P. (1992a). Are there basic emotions?
[Ekman, 1992b] Ekman, P. (1992b). An argument for basic emotions. Cognition &
emotion, 6(3-4):169–200.
[Ekman, 2004] Ekman, P. (2004). Emotions revealed. BMJ, 328(Suppl S5):0405184.
[Ekman and Friesen, 1971] Ekman, P. and Friesen, W. V. (1971). Constants across
cultures in the face and emotion. Journal of personality and social psychology,
17(2):124.
[Ekman and Friesen, 2003] Ekman, P. and Friesen, W. V. (2003). Unmasking the face:
A guide to recognizing emotions from facial clues. Ishk.
[Ekman et al., 1987] Ekman, P., Friesen, W. V., O’sullivan, M., Chan, A., Diacoyanni-
Tarlatzis, I., Heider, K., Krause, R., LeCompte, W. A., Pitcairn, T., Ricci-Bitti,
P. E., et al. (1987). Universals and cultural differences in the judgments of facial
expressions of emotion. Journal of personality and social psychology, 53(4):712.
[el Kaliouby et al., 2006] el Kaliouby, R., Picard, R., and Baron-Cohen, S. (2006). Affective computing and autism. Annals of the New York Academy of Sciences, 1093(1):228–248.
[Engel, 1960] Engel, B. T. (1960). Stimulus-response and individual-response speci-
ficity. AMA Archives of General Psychiatry, 2(3):305–313.
[Epp et al., 2011] Epp, C., Lippold, M., and Mandryk, R. L. (2011). Identifying emo-
tional states using keystroke dynamics. In Proceedings of the SIGCHI Conference
on Human Factors in Computing Systems, pages 715–724. ACM.
[Eraslan et al., 2017] Eraslan, S., Yaneva, V., Yesilada, Y., and Harper, S. (2017). Do
web users with autism experience barriers when searching for information within
web pages? In Proceedings of the 14th Web for All Conference on The Future of
Accessible Work, W4A ’17, pages 20:1–20:4, New York, NY, USA. ACM.
[Eraslan et al., 2018] Eraslan, S., Yaneva, V., Yesilada, Y., and Harper, S. (2018).
Web users with autism: eye tracking evidence for differences. Behaviour & Infor-
mation Technology.
[Eraslan et al., 2014] Eraslan, S., Yesilada, Y., and Harper, S. (2014). Identifying
patterns in eyetracking scanpaths in terms of visual elements of web pages. In Web
Engineering, pages 163–180. Springer.
[Eraslan et al., 2016a] Eraslan, S., Yesilada, Y., and Harper, S. (2016a). Scanpath
trend analysis on web pages: Clustering eye tracking scanpaths. ACM Trans. Web,
10(4):20:1–20:35.
[Eraslan et al., 2016b] Eraslan, S., Yesilada, Y., and Harper, S. (2016b). Trends in
eye tracking scanpaths: Segmentation effect? In Proceedings of the 27th ACM
Conference on Hypertext and Social Media, HT ’16, pages 15–25, New York, NY,
USA. ACM.
[Erdem and Sert, 2014] Erdem, E. S. and Sert, M. (2014). Efficient recognition of
human emotional states from audio signals. In Multimedia (ISM), 2014 IEEE In-
ternational Symposium on, pages 139–142. IEEE.
[Ettinger et al., 1991] Ettinger, E., Wyatt, H., and London, R. (1991). Anisocoria: variation and clinical observation with different conditions of illumination and accommodation. Investigative ophthalmology & visual science, 32(3):501–509.
[Exposito et al., 2018] Exposito, M., Picard, R. W., and Hernandez, J. (2018). Affec-
tive keys: towards unobtrusive stress sensing of smartphone users. In Proceedings
of the 20th International Conference on Human-Computer Interaction with Mobile
Devices and Services Adjunct, pages 139–145. ACM.
[Feild et al., 2010] Feild, H. A., Allan, J., and Jones, R. (2010). Predicting searcher
frustration. In Proceedings of the 33rd international ACM SIGIR conference on
Research and development in information retrieval, pages 34–41. ACM.
[Feinstein et al., 2011] Feinstein, J. S., Adolphs, R., Damasio, A., and Tranel, D.
(2011). The human amygdala and the induction and experience of fear. Current
biology, 21(1):34–38.
[Ferhat and Vilarino, 2016] Ferhat, O. and Vilarino, F. (2016). Low cost eye tracking.
Computational intelligence and neuroscience, 2016:17.
[Fernandez et al., 2012] Fernandez, J. M., Augusto, J. C., Seepold, R., and Madrid,
N. M. (2012). A sensor technology survey for a stress-aware trading process. IEEE
Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews,
42(6):809–824.
[Fisher, 1993] Fisher, R. J. (1993). Social desirability bias and the validity of indirect
questioning. Journal of consumer research, 20(2):303–315.
[Foglia et al., 2008] Foglia, P., Prete, C. A., and Zanda, M. (2008). Relating gsr
signals to traditional usability metrics: Case study with an anthropomorphic web
assistant. In 2008 IEEE Instrumentation and Measurement Technology Conference,
pages 1814–1818. IEEE.
[Foster et al., 1998] Foster, C. A., Witcher, B. S., Campbell, W. K., and Green, J. D.
(1998). Arousal and attraction: Evidence for automatic and controlled processes.
Journal of Personality and Social Psychology, 74(1):86.
[Fragopanagos and Taylor, 2005] Fragopanagos, N. and Taylor, J. G. (2005). Emotion
recognition in human–computer interaction. Neural Networks, 18(4):389–405.
[Fritz et al., 2009] Fritz, T., Jentschke, S., Gosselin, N., Sammler, D., Peretz, I.,
Turner, R., Friederici, A. D., and Koelsch, S. (2009). Universal recognition of three
basic emotions in music. Current biology, 19(7):573–576.
[Fuhl et al., 2018] Fuhl, W., Castner, N., and Kasneci, E. (2018). Histogram of ori-
ented velocities for eye movement detection. In Proceedings of the Workshop on
Modeling Cognitive Processes from Multimodal Data, page 5. ACM.
[Gallese, 2006] Gallese, V. (2006). Intentional attunement: A neurophysiological per-
spective on social cognition and its disruption in autism. Brain research, 1079(1):15–
24.
[Gamon, 2004] Gamon, M. (2004). Sentiment classification on customer feedback data:
noisy data, large feature vectors, and the role of linguistic analysis. In Proceedings
of the 20th international conference on Computational Linguistics, page 841. Asso-
ciation for Computational Linguistics.
[Gao and Wang, 2015] Gao, Z. and Wang, S. (2015). Emotion recognition from eeg
signals using hierarchical bayesian network with privileged information. In Proceed-
ings of the 5th ACM on International Conference on Multimedia Retrieval, pages
579–582. ACM.
[Garrett et al., 2004] Garrett, S. K., Horn, D. B., and Caldwell, B. S. (2004). Modeling
user satisfaction, frustration, and user goal/website compatibility. In Proceedings
of the Human Factors and Ergonomics Society Annual Meeting, volume 48, pages
1508–1512. SAGE Publications Sage CA: Los Angeles, CA.
[Geethanjali et al., 2017] Geethanjali, B., Adalarasu, K., Hemapraba, A., Pravin Ku-
mar, S., and Rajasekeran, R. (2017). Emotion analysis using sam (self-assessment
manikin) scale. Biomedical Research (0970-938X), 28.
[Gehricke and Shapiro, 2000] Gehricke, J.-G. and Shapiro, D. (2000). Reduced facial
expression and social context in major depression: discrepancies between facial
muscle activity and self-reported emotion. Psychiatry Research, 95(2):157–167.
[Gellatly and Meyer, 1992] Gellatly, I. R. and Meyer, J. P. (1992). The effects of
goal difficulty on physiological arousal, cognition, and task performance. Journal of
Applied Psychology, 77(5):694.
[Gerald, 2018] Gerald (2018). Violentmonkey. https://github.com/violentmonkey/violentmonkey.
[Gil et al., 2013] Gil, G. B., de Jesus, A. B., and Lopez, J. M. M. (2013). Combining
machine learning techniques and natural language processing to infer emotions using
spanish twitter corpus. In Highlights on Practical Applications of Agents and Multi-
Agent Systems, pages 149–157. Springer.
[Gilleade et al., 2005] Gilleade, K., Dix, A., and Agllanson, J. (2005). Affective
videogames and modes of affective gaming: assist me, challenge me, emote me.
DiGRA 2005: Changing Views–Worlds in Play.
[Gingras et al., 2015] Gingras, B., Marin, M. M., Puig-Waldmuller, E., and Fitch,
W. (2015). The eye is listening: Music-induced arousal and individual differences
predict pupillary responses. Frontiers in human neuroscience, 9:619.
[Gollan et al., 2016] Gollan, B., Haslgrubler, M., and Ferscha, A. (2016). Demon-
strator for extracting cognitive load from pupil dilation for attention management
services. In Proceedings of the 2016 ACM International Joint Conference on Per-
vasive and Ubiquitous Computing: Adjunct, pages 1566–1571. ACM.
[Greene et al., 2016] Greene, S., Thapliyal, H., and Caban-Holt, A. (2016). A survey of
affective computing for stress detection: Evaluating technologies in stress detection
for better health. IEEE Consumer Electronics Magazine, 5(4):44–56.
[Greenland et al., 2016] Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B.,
Poole, C., Goodman, S. N., and Altman, D. G. (2016). Statistical tests, p values,
confidence intervals, and power: a guide to misinterpretations. European journal of
epidemiology, 31(4):337–350.
[Gunes and Piccardi, 2007] Gunes, H. and Piccardi, M. (2007). Bi-modal emotion
recognition from expressive face and body gestures. Journal of Network and Com-
puter Applications, 30(4):1334–1345.
[Guo et al., 2019] Guo, F., Li, M., Qu, Q., and Duffy, V. G. (2019). The effect of a humanoid robot's emotional behaviors on users' emotional responses: Evidence from pupillometry and electroencephalography measures. International Journal of Human–Computer Interaction, pages 1–13.
[Harms et al., 2010] Harms, M. B., Martin, A., and Wallace, G. L. (2010). Facial
emotion recognition in autism spectrum disorders: a review of behavioral and neu-
roimaging studies. Neuropsychology review, 20(3):290–322.
[Harper et al., 2006] Harper, S., Bechhofer, S., and Lunn, D. (2006). Sadie:: transcod-
ing based on css. In Proceedings of the 8th international ACM SIGACCESS confer-
ence on Computers and accessibility, pages 259–260. ACM.
[Harris et al., 2000] Harris, P. L., de Rosnay, M., and Pons, F. (2000). Understanding
emotion. Handbook of emotions, 2:281–292.
[Hart et al., 2012] Hart, J., Sutcliffe, A., and De Angeli, A. (2012). Using affect to
evaluate user engagement. In CHI’12 Extended Abstracts on Human Factors in
Computing Systems, pages 1811–1834. ACM.
[Hartmann et al., 2005] Hartmann, B., Mancini, M., and Pelachaud, C. (2005). Imple-
menting expressive gesture synthesis for embodied conversational agents. In gesture
in human-Computer Interaction and Simulation, pages 188–199. Springer.
[Hassan, 2006] Hassan, E. (2006). Recall bias can be a threat to retrospective and
prospective research designs. The Internet Journal of Epidemiology, 3(2):339–412.
[Hayes and Petrov, 2016] Hayes, T. R. and Petrov, A. A. (2016). Mapping and cor-
recting the influence of gaze position on pupil size measurements. Behavior Research
Methods, 48(2):510–527.
[Hazlett, 2003] Hazlett, R. (2003). Measurement of user frustration: a biologic ap-
proach. In CHI’03 extended abstracts on Human factors in computing systems,
pages 734–735. ACM.
[Hazlett and Hazlett, 1999] Hazlett, R. L. and Hazlett, S. Y. (1999). Emotional re-
sponse to television commercials: Facial emg vs. self-report. Journal of Advertising
Research, 39:7–24.
[He et al., 2018] He, H., She, Y., Xiahou, J., Yao, J., Li, J., Hong, Q., and Ji, Y.
(2018). Real-time eye-gaze based interaction for human intention prediction and
emotion analysis. In Proceedings of Computer Graphics International 2018, pages
185–194. ACM.
[Heller et al., 1997] Heller, W., Nitschke, J. B., and Lindsay, D. L. (1997). Neuropsy-
chological correlates of arousal in self-reported emotion. Cognition & Emotion,
11(4):383–402.
[Henderson et al., 2018] Henderson, R. R., Bradley, M. M., and Lang, P. J. (2018).
Emotional imagery and pupil diameter. Psychophysiology, 55(6):e13050.
[Hernandez-Aguila et al., 2014] Hernandez-Aguila, A., Garcia-Valdez, M., and Man-
cilla, A. (2014). Affective states in software programming: Classification of individ-
uals based on their keystroke and mouse dynamics. Intelligent Learning Environ-
ments, page 27.
[Hjortsjo, 1969] Hjortsjo, C.-H. (1969). Man's face and mimic language. Studentlit-
teratur.
[Hjortskov et al., 2004] Hjortskov, N., Rissen, D., Blangsted, A. K., Fallentin, N.,
Lundberg, U., and Søgaard, K. (2004). The effect of mental stress on heart rate
variability and blood pressure during computer work. European journal of applied
physiology, 92(1-2):84–89.
[Hochschild, 1979] Hochschild, A. R. (1979). Emotion work, feeling rules, and social
structure. American journal of sociology, 85(3):551–575.
[Hokanson and Burgess, 1964] Hokanson, J. E. and Burgess, M. (1964). Effects of
physiological arousal level, frustration, and task complexity on performance. The
Journal of Abnormal and Social Psychology, 68(6):698.
[Holmqvist et al., 2011] Holmqvist, K., Nystrom, M., Andersson, R., Dewhurst, R.,
Jarodzka, H., and Van de Weijer, J. (2011). Eye tracking: A comprehensive guide
to methods and measures. OUP Oxford.
[Hornbæk and Law, 2007] Hornbæk, K. and Law, E. L.-C. (2007). Meta-analysis of
correlations among usability measures. In Proceedings of the SIGCHI conference on
Human factors in computing systems, pages 617–626. ACM.
[Horvat et al., 2013] Horvat, M., Popovic, S., and Cosic, K. (2013). Multimedia stim-
uli databases usage patterns: a survey report. In 2013 36th International Convention
on Information and Communication Technology, Electronics and Microelectronics
(MIPRO), pages 993–997. IEEE.
[Hosseini et al., 2010] Hosseini, S. A., Khalilzadeh, M. A., Naghibi-Sistani, M. B.,
and Niazmand, V. (2010). Higher order spectra analysis of eeg signals in emotional
stress states. In Information Technology and Computer Science (ITCS), 2010 Second
International Conference on, pages 60–63. IEEE.
[Hu et al., 2009] Hu, X., Downie, J. S., and Ehmann, A. F. (2009). Lyric text mining
in music mood classification. In Proc. ISMIR.
[Hudlicka, 2009] Hudlicka, E. (2009). Affective game engines: motivation and require-
ments. In Proceedings of the 4th international conference on foundations of digital
games, pages 299–306. ACM.
[Hui and Triandis, 1989] Hui, C. H. and Triandis, H. C. (1989). Effects of culture and
response format on extreme response style. Journal of cross-cultural psychology,
20(3):296–309.
[Iqbal et al., 2004] Iqbal, S. T., Zheng, X. S., and Bailey, B. P. (2004). Task-evoked
pupillary response to mental workload in human-computer interaction. In CHI’04
extended abstracts on Human factors in computing systems, pages 1477–1480. ACM.
[Izard, 1992] Izard, C. E. (1992). Basic emotions, relations among emotions, and
emotion-cognition relations. Journal of personality and social psychology.
[Izard et al., 1987] Izard, C. E., Hembree, E. A., and Huebner, R. R. (1987). Infants’
emotion expressions to acute pain: Developmental change and stability of individual
differences. Developmental Psychology, 23(1):105.
[Janisse, 1974] Janisse, M. P. (1974). Pupil size, affect and exposure frequency. Social
Behavior and personality, 2(2):125–146.
[Janssen et al., 2012] Janssen, J. H., Van Den Broek, E. L., and Westerink, J. H.
D. M. (2012). Tune in to your emotions: A robust personalized affective music
player. User Modeling and User-Adapted Interaction, 22:255–279.
[Jercic et al., 2018] Jercic, P., Sennersten, C., and Lindley, C. (2018). Modeling cog-
nitive load and physiological arousal through pupil diameter and heart rate. Multi-
media Tools and Applications, pages 1–15.
[Jeronimus and Laceulle, 2017] Jeronimus, B. F. and Laceulle, O. M. (2017). Frustra-
tion, pages 1–5. Springer International Publishing, Cham.
[Jerritta et al., 2013] Jerritta, S., Murugappan, M., Wan, K., and Yaacob, S. (2013).
Emotion detection from qrs complex of ecg signals using hurst exponent for differ-
ent age groups. In Affective Computing and Intelligent Interaction (ACII), 2013
Humaine Association Conference on, pages 849–854. IEEE.
[Jin, 1992] Jin, P. (1992). Toward a reconceptualization of the law of initial value.
Psychological Bulletin, 111(1):176.
[Johnson et al., 2007] Johnson, R. B., Onwuegbuzie, A. J., and Turner, L. A. (2007).
Toward a definition of mixed methods research. Journal of mixed methods research,
1(2):112–133.
[Kahneman and Beatty, 1966] Kahneman, D. and Beatty, J. (1966). Pupil diameter
and load on memory. Science, 154(3756):1583–1585.
[Kambanaros et al., 2013] Kambanaros, M., Grohmann, K. K., and Michaelides, M.
(2013). Lexical retrieval for nouns and verbs in typically developing bilectal children.
First language, 33(2):182–199.
[Kao and Poteet, 2007] Kao, A. and Poteet, S. R. (2007). Natural language processing
and text mining. Springer Science & Business Media.
[Kassem et al., 2017] Kassem, K., Salah, J., Abdrabou, Y., Morsy, M., El-Gendy, R.,
Abdelrahman, Y., and Abdennadher, S. (2017). Diva: exploring the usage of pupil
diameter to elicit valence and arousal. In Proceedings of the 16th International
Conference on Mobile and Ubiquitous Multimedia, pages 273–278. ACM.
[Kassner et al., 2014] Kassner, M., Patera, W., and Bulling, A. (2014). Pupil: an
open source platform for pervasive eye tracking and mobile gaze-based interaction.
In Proceedings of the 2014 ACM international joint conference on pervasive and
ubiquitous computing: Adjunct publication, pages 1151–1160. ACM.
[Khan et al., 2006] Khan, M. M., Ward, R. D., and Ingleby, M. (2006). Infrared
thermal sensing of positive and negative affective states. In Robotics, Automation
and Mechatronics, 2006 IEEE Conference on, pages 1–6. IEEE.
[Khan et al., 2012] Khan, M. S., Khan, I. A., and Shafi, M. (2012). Keyboard and
mouse interaction based mood measurement using artificial neural networks. In
Robotics and Artificial Intelligence (ICRAI), 2012 International Conference on,
pages 130–134. IEEE.
[Khosrowabadi et al., 2010] Khosrowabadi, R., Quek, H. C., Wahab, A., and Ang,
K. K. (2010). Eeg-based emotion recognition using self-organizing map for boundary
detection. In Pattern Recognition (ICPR), 2010 20th International Conference on,
pages 4242–4245. IEEE.
[Khosrowabadi et al., 2009] Khosrowabadi, R., Wahab, A., Ang, K. K., and Baniasad,
M. H. (2009). Affective computation on eeg correlates of emotion from musical
and vocal stimuli. In Neural Networks, 2009. IJCNN 2009. International Joint
Conference on, pages 1590–1594. IEEE.
[Kihlstrom et al., 2000] Kihlstrom, J. F., Eich, E., Sandbrand, D., and Tobias, B. A.
(2000). Emotion and memory: Implications for self-report. The science of self-
report: Implications for research and practice, pages 81–99.
[Kim and Andre, 2008] Kim, J. and Andre, E. (2008). Emotion recognition based on
physiological changes in music listening. Pattern Analysis and Machine Intelligence,
IEEE Transactions on, 30(12):2067–2083.
[Kim et al., 2004a] Kim, J., Bee, N., Wagner, J., and Andre, E. (2004a). Emote to
win: Affective interactions with a computer game agent. GI Jahrestagung, 1:159–
164.
[Kim et al., 2004b] Kim, K. H., Bang, S., and Kim, S. (2004b). Emotion recognition
system using short-term monitoring of physiological signals. Medical and biological
engineering and computing, 42(3):419–427.
[Kim et al., 2010] Kim, Y. E., Schmidt, E. M., Migneco, R., Morton, B. G., Richard-
son, P., Scott, J., Speck, J. A., and Turnbull, D. (2010). Music emotion recognition:
A state of the art review. In Proc. ISMIR, pages 255–266. Citeseer.
[Kirk et al., 2012] Kirk, M., Morgan, R., Tonkin, E., McDonald, K., and Skirton, H.
(2012). An objective approach to evaluating an internet-delivered genetics educa-
tion resource developed for nurses: using google analytics to monitor global visitor
engagement. Journal of Research in Nursing, 17(6):557–579.
[Klein et al., 2002] Klein, J., Moon, Y., and Picard, R. W. (2002). This computer re-
sponds to user frustration: Theory, design, and results. Interacting with computers,
14(2):119–140.
[Kleinginna and Kleinginna, 1981] Kleinginna, P. R. and Kleinginna, A. M. (1981). A
categorized list of emotion definitions, with suggestions for a consensual definition.
Motivation and emotion, 5(4):345–379.
[Klingner, 2010] Klingner, J. (2010). Fixation-aligned pupillary response averaging.
In Proceedings of the 2010 Symposium on Eye-Tracking Research & Applications,
pages 275–282. ACM.
[Klingner et al., 2008] Klingner, J., Kumar, R., and Hanrahan, P. (2008). Measuring
the task-evoked pupillary response with a remote eye tracker. In Proceedings of the
2008 symposium on Eye tracking research & applications, pages 69–72. ACM.
[Klingner et al., 2011] Klingner, J., Tversky, B., and Hanrahan, P. (2011). Effects of
visual and verbal presentation on cognitive load in vigilance, memory, and arithmetic
tasks. Psychophysiology, 48(3):323–332.
[Kolakowska, 2013] Kolakowska, A. (2013). A review of emotion recognition methods
based on keystroke dynamics and mouse movements. In Human System Interaction
(HSI), 2013 The 6th International Conference on, pages 548–555. IEEE.
[Kołakowska et al., 2015] Kołakowska, A., Landowska, A., Szwoch, M., Szwoch, W.,
and Wrobel, M. R. (2015). Modeling emotions for affect-aware applications, page 55.
[Korn and Bach, 2016] Korn, C. W. and Bach, D. R. (2016). A solid frame for the
window on cognition: Modeling event-related pupil responses. Journal of Vision,
16(3):28–28.
[Kosir and Strle, 2017] Kosir, A. and Strle, G. (2017). Emotion elicitation in a socially
intelligent service: The typing tutor. Computers, 6(2):14.
[Koss, 1986] Koss, M. C. (1986). Pupillary dilation as an index of central nervous
system α2-adrenoceptor activation. Journal of pharmacological methods, 15(1):1–
19.
[Kumar and Agarwal, 2014] Kumar, A. and Agarwal, A. (2014). Emotion recognition
using anatomical information in facial expressions. In Industrial and Information
Systems (ICIIS), 2014 9th International Conference on, pages 1–6. IEEE.
[La Heij, 1988] La Heij, W. (1988). Components of stroop-like interference in picture
naming. Memory & Cognition, 16(5):400–410.
[Laeng et al., 2016] Laeng, B., Eidet, L. M., Sulutvedt, U., and Panksepp, J. (2016).
Music chills: The eye pupil as a mirror to music's soul. Consciousness and cognition,
44:161–178.
[Lang, 1990] Lang, A. (1990). Involuntary attention and physiological arousal evoked
by structural features and emotional content in tv commercials. Communication
Research, 17(3):275–299.
[Lang, 2005] Lang, P. J. (2005). International affective picture system (iaps): Affective
ratings of pictures and instruction manual. Technical report.
[Lang et al., 1997] Lang, P. J., Bradley, M. M., and Cuthbert, B. N. (1997). Interna-
tional affective picture system (iaps): Technical manual and affective ratings. NIMH
Center for the Study of Emotion and Attention, pages 39–58.
[Lang et al., 1993] Lang, P. J., Greenwald, M. K., Bradley, M. M., and Hamm, A. O.
(1993). Looking at pictures: Affective, facial, visceral, and behavioral reactions.
Psychophysiology, 30(3):261–273.
[Latif et al., 2015] Latif, M. A., Yusof, H. M., Sidek, S. N., and Rusli, N. (2015).
Thermal imaging based affective state recognition. In 2015 IEEE International
Symposium on Robotics and Intelligent Sensors (IRIS), pages 214–219. IEEE.
[Law et al., 2009] Law, E. L.-C., Roto, V., Hassenzahl, M., Vermeeren, A. P., and
Kort, J. (2009). Understanding, scoping and defining user experience: a survey
approach. In Proceedings of the SIGCHI conference on human factors in computing
systems, pages 719–728. ACM.
[Lazar et al., 2006a] Lazar, J., Jones, A., Hackley, M., and Shneiderman, B. (2006a).
Severity and impact of computer user frustration: A comparison of student and
workplace users. Interacting with Computers, 18(2):187–207.
[Lazar et al., 2006b] Lazar, J., Jones, A., and Shneiderman, B. (2006b). Workplace
user frustration with computers: An exploratory investigation of the causes and
severity. Behaviour & Information Technology, 25(03):239–251.
[Lazarus et al., 1952] Lazarus, R. S., Deese, J., and Osler, S. F. (1952). The effects of
psychological stress upon performance. Psychological bulletin, 49(4):293.
[Ledoux, 1993] Ledoux, J. E. (1993). Cognition versus emotion, again-this time in the
brain: A response to parrott and schulkin. Cognition & Emotion, 7(1):61–64.
[Lee et al., 2012] Lee, B., Isenberg, P., Riche, N. H., and Carpendale, S. (2012). Be-
yond mouse and keyboard: Expanding design considerations for information visual-
ization interactions. Visualization and Computer Graphics, IEEE Transactions on,
18(12):2689–2698.
[Lee et al., 2011] Lee, Y.-K., Kwon, O.-W., Shin, H. S., Jo, J., and Lee, Y. (2011).
Noise reduction of ppg signals using a particle filter for robust emotion recognition.
In Consumer Electronics-Berlin (ICCE-Berlin), 2011 IEEE International Confer-
ence on, pages 202–205. IEEE.
[Levine and Safer, 2002] Levine, L. J. and Safer, M. A. (2002). Sources of bias in
memory for emotions. Current Directions in Psychological Science, 11(5):169–173.
[Li and Chen, 2006] Li, L. and Chen, J.-h. (2006). Emotion recognition using physi-
ological signals from multiple subjects. In Intelligent Information Hiding and Mul-
timedia Signal Processing, 2006. IIH-MSP’06. International Conference on, pages
355–358. IEEE.
[Li et al., 2009] Li, M., Chai, Q., Kaixiang, T., Wahab, A., and Abut, H. (2009). Eeg
emotion recognition system. In In-vehicle corpus and signal processing for driver
behavior, pages 125–135. Springer.
[Lin et al., 2013] Lin, T., Li, X., Wu, Z., and Tang, N. (2013). Automatic cognitive
load classification using high-frequency interaction events: An exploratory study.
International Journal of Technology and Human Interaction (IJTHI), 9(3):73–88.
[Lin et al., 2010] Lin, Y.-P., Wang, C.-H., Jung, T.-P., Wu, T.-L., Jeng, S.-K., Duann,
J.-R., and Chen, J.-H. (2010). Eeg-based emotion recognition in music listening.
Biomedical Engineering, IEEE Transactions on, 57(7):1798–1806.
[Lin et al., 2007] Lin, Y.-P., Wang, C.-H., Wu, T.-L., Jeng, S.-K., and Chen, J.-H.
(2007). Multilayer perceptron for eeg signal classification during listening to emo-
tional music. In TENCON 2007-2007 IEEE Region 10 Conference, pages 1–3. IEEE.
[Liu et al., 2009] Liu, C., Agrawal, P., Sarkar, N., and Chen, S. (2009). Dynamic
difficulty adjustment in computer games through real-time anxiety-based affective
feedback. International Journal of Human-Computer Interaction, 25(6):506–529.
[Liu and Joines, 2012] Liu, S. and Joines, S. (2012). Developing a Framework of Guid-
ing Interface Design for Older Adults. Proceedings of the Human Factors and Er-
gonomics Society Annual Meeting, 56:1967–1971.
[Liu et al., 2014] Liu, Y., Ritchie, J. M., Lim, T., Kosmadoudi, Z., Sivanathan, A.,
and Sung, R. C. W. (2014). A fuzzy psycho-physiological approach to enable the
understanding of an engineer's affect status during CAD activities. Computer-Aided
Design, 54:19–38.
[Lu et al., 2010] Lu, C.-Y., Lin, S.-H., Liu, J.-C., Cruz-Lara, S., and Hong, J.-S.
(2010). Automatic event-level textual emotion sensing using mutual action his-
togram between entities. Expert systems with applications, 37(2):1643–1653.
[Lu et al., 2015] Lu, Y., Zheng, W.-L., Li, B., and Lu, B.-L. (2015). Combining eye
movements and eeg to enhance emotion recognition. In Twenty-Fourth International
Joint Conference on Artificial Intelligence.
[Luharuka et al., 2003] Luharuka, R., Gao, R. X., and Krishnamurty, S. (2003). De-
sign and realization of a portable data logger for physiological sensing [gsr]. Instru-
mentation and Measurement, IEEE Transactions on, 52(4):1289–1295.
[Lunn and Harper, 2010a] Lunn, D. and Harper, S. (2010a). Using galvanic skin re-
sponse measures to identify areas of frustration for older web 2.0 users. In Proceed-
ings of the 2010 International Cross Disciplinary Conference on Web Accessibility
(W4A), pages 1–10. ACM.
[Lunn and Harper, 2010b] Lunn, D. and Harper, S. (2010b). Using galvanic skin re-
sponse measures to identify areas of frustration for older web 2.0 users. In Proceed-
ings of the 2010 International Cross Disciplinary Conference on Web Accessibility
(W4A), page 34. ACM.
[Mackay et al., 1978] Mackay, C., Cox, T., Burrows, G., and Lazzerini, T. (1978). An
inventory for the measurement of self-reported stress and arousal. British journal
of social and clinical psychology, 17(3):283–284.
[MacLeod, 1991] MacLeod, C. M. (1991). Half a century of research on the stroop
effect: an integrative review. Psychological bulletin, 109(2):163.
[Madan et al., 2018] Madan, C. R., Bayer, J., Gamer, M., Lonsdorf, T. B., and Som-
mer, T. (2018). Visual complexity and affect: ratings reflect more than meets the
eye. Frontiers in psychology, 8:2368.
[Mampusti et al., 2011] Mampusti, E. T., Ng, J. S., Quinto, J. J. I., Teng, G. L.,
Suarez, M. T. C., and Trogo, R. S. (2011). Measuring academic affective states
of students via brainwave signals. In Knowledge and Systems Engineering (KSE),
2011 Third International Conference on, pages 226–231. IEEE.
[Mantiuk et al., 2012] Mantiuk, R., Kowalik, M., Nowosielski, A., and Bazyluk, B.
(2012). Do-it-yourself eye tracker: Low-cost pupil-based eye tracker for computer
graphics applications. In International Conference on Multimedia Modeling, pages
115–125. Springer.
[Mao and Li, 2010] Mao, X. and Li, Z. (2010). Agent based affective tutoring systems:
A pilot study. Computers & Education, 55(1):202–208.
[Martin et al., 1994] Martin, A., Wiggs, C. L., Lalonde, F., and Mack, C. (1994).
Word retrieval to letter and semantic cues: A double dissociation in normal subjects
using interference tasks. Neuropsychologia, 32(12):1487–1494.
[Martin et al., 2007] Martin, L., Gutierrez y Restrepo, E., Barrera, C., Ascaso, A. R., San-
tos, O. C., and Boticario, J. G. (2007). Usability and accessibility evaluations along
the elearning cycle. In International Conference on Web Information Systems En-
gineering, pages 453–458. Springer.
[Matentzoglu et al., 2016] Matentzoglu, N., Vigo, M., Jay, C., and Stevens, R. (2016).
Making entailment set changes explicit improves the understanding of consequences
of ontology authoring actions. In European Knowledge Acquisition Workshop, pages
432–446. Springer.
[Mathieu et al., 2013] Mathieu, N., Bonnet, S., Harquel, S., Gentaz, E., and Cam-
pagne, A. (2013). Single-trial erp classification of emotional processing. In Neu-
ral Engineering (NER), 2013 6th International IEEE/EMBS Conference on, pages
101–104. IEEE.
[Mathot, 2018] Mathot, S. (2018). Pupillometry: Psychology, physiology, and func-
tion. Journal of Cognition, 1(1).
[Matsumoto et al., 2016] Matsumoto, A., Tange, Y., Nakazawa, A., and Nishida, T.
(2016). Estimation of task difficulty and habituation effect while visual manipulation
using pupillary response. In Video Analytics. Face and Facial Expression Recognition
and Audience Measurement, pages 24–35. Springer.
[Matsumoto, 1993] Matsumoto, D. (1993). Ethnic differences in affect intensity, emo-
tion judgments, display rule attitudes, and self-reported emotional expression in an
american sample. Motivation and emotion, 17(2):107–123.
[Matthews et al., 2019a] Matthews, O., Davies, A., Vigo, M., and Harper, S. (2019a).
Unobtrusive arousal detection on the web using pupillary response. International
Journal of Human-Computer Studies.
[Matthews et al., 2019b] Matthews, O., Eraslan, S., Yaneva, V., Davies, A., Yesilada,
Y., Vigo, M., and Harper, S. (2019b). Combining trending scan paths with arousal
to model visual behaviour on the web: A case study of neurotypical people vs
people with autism. In Proceedings of the 27th ACM Conference on User Modeling,
Adaptation and Personalization, pages 86–94. ACM.
[Matthews et al., 2018a] Matthews, O., Sarsenbayeva, Z., Jiang, W., Newn, J., Vel-
loso, E., Clinch, S., and Goncalves, J. (2018a). Inferring the mood of a community
from their walking speed: A preliminary study. In UbiComp/ISWC Adjunct, pages
1144–1149. ACM.
[Matthews et al., 2018b] Matthews, O., Vigo, M., and Harper, S. (2018b). Sensing
arousal and focal attention during visual interaction. In Proceedings of the 20th
ACM International Conference on Multimodal Interaction, ICMI ’18, pages 263–
267, New York, NY, USA. ACM.
[Matthews et al., 2018c] Matthews, O., Vigo, M., and Harper, S. (2018c). Sensing
arousal and focal attention during visual interaction. In ICMI, pages 263–267.
ACM.
[Matthews et al., 2018d] Matthews, O., Vigo, M., and Harper, S. (2018d). Towards
arousal sensing with high fidelity detection of visual focal attention. Measuring
behaviour.
[Mayes and Calhoun, 2007] Mayes, S. D. and Calhoun, S. L. (2007). Learning, at-
tention, writing, and processing speed in typical children and children with adhd,
autism, anxiety, depression, and oppositional-defiant disorder. Child Neuropsychol-
ogy, 13(6):469–493.
[McDuff et al., 2014] McDuff, D., Gontarek, S., and Picard, R. W. (2014). Improve-
ments in remote cardiopulmonary measurement using a five band digital camera.
Biomedical Engineering, IEEE Transactions on, 61(10):2593–2601.
[McGarrigle et al., 2017] McGarrigle, R., Dawes, P., Stewart, A. J., Kuchinsky, S. E.,
and Munro, K. J. (2017). Pupillometry reveals changes in physiological arousal
during a sustained listening task. Psychophysiology, 54(2):193–203.
[McKone, 1999] McKone, K. E. (1999). Analysis of student feedback improves in-
structor effectiveness. Journal of Management Education, 23(4):396–415.
[Mehrabian, 1996] Mehrabian, A. (1996). Pleasure-arousal-dominance: A general
framework for describing and measuring individual differences in temperament. Cur-
rent Psychology, 14(4):261–292.
[Mehrabian and Russell, 1974] Mehrabian, A. and Russell, J. A. (1974). An approach
to environmental psychology. the MIT Press.
[Merrill et al., 1992] Merrill, D. C., Reiser, B. J., Ranney, M., and Trafton, J. G.
(1992). Effective tutoring techniques: A comparison of human tutors and intelligent
tutoring systems. The Journal of the Learning Sciences, 2(3):277–305.
[Michailidou et al., 2008] Michailidou, E., Harper, S., and Bechhofer, S. (2008). Visual
complexity and aesthetic perception of web pages. In Proceedings of the 26th annual
ACM international conference on Design of communication, pages 215–224. ACM.
[Mion and Poli, 2008] Mion, L. and Poli, G. D. (2008). Score-independent audio fea-
tures for description of music expression. Audio, Speech, and Language Processing,
IEEE Transactions on, 16(2):458–466.
[Mirgain and Cordova, 2007] Mirgain, S. A. and Cordova, J. V. (2007). Emotion skills
and marital health: The association between observed and self-reported emotion
skills, intimacy, and marital satisfaction. Journal of Social and Clinical Psychology,
26(9):983.
[Mitra and Acharya, 2007] Mitra, S. and Acharya, T. (2007). Gesture recognition: A
survey. Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE
Transactions on, 37(3):311–324.
[Monrose and Rubin, 2000] Monrose, F. and Rubin, A. D. (2000). Keystroke dynamics
as a biometric for authentication. Future Generation computer systems, 16(4):351–
359.
[Morrison et al., 2005] Morrison, D., Wang, R., De Silva, L. C., and Xu, W. (2005).
Real-time spoken affect classification and its application in call-centres. In Infor-
mation Technology and Applications, 2005. ICITA 2005. Third International Con-
ference on, volume 1, pages 483–487. IEEE.
[Moses et al., 2007] Moses, Z. B., Luecken, L. J., and Eason, J. C. (2007). Measuring
task-related changes in heart rate variability. In Engineering in Medicine and Biology
Society, 2007. EMBS 2007. 29th Annual International Conference of the IEEE,
pages 644–647. IEEE.
[Mullins and Treu, 1991] Mullins, P. M. and Treu, S. (1991). Measurement of stress
to gauge user satisfaction with features of the computer interface. Behaviour &
Information Technology, 10(4):325–343.
[Murugappan and Murugappan, 2013] Murugappan, M. and Murugappan, S. (2013).
Human emotion recognition through short time electroencephalogram (eeg) signals
using fast fourier transform (fft). In Signal Processing and its Applications (CSPA),
2013 IEEE 9th International Colloquium on, pages 289–294. IEEE.
[Nakamura et al., 1993] Nakamura, Y., Yamamoto, Y., and Muraoka, I. (1993). Au-
tonomic control of heart rate during physical exercise and fractal dimension of heart
rate variability. Journal of Applied Physiology, 74(2):875–881.
[Nakarada-Kordic and Lobb, 2005] Nakarada-Kordic, I. and Lobb, B. (2005). Effect
of perceived attractiveness of web interface design on visual search of web sites. In
Proceedings of the 6th ACM SIGCHI New Zealand chapter’s international conference
on Computer-human interaction: making CHI natural, pages 25–27. ACM.
[Nesbitt et al., 2015] Nesbitt, K., Blackmore, K., Hookham, G., Kay-Lambkin, F.,
and Walla, P. (2015). Using the startle eye-blink to measure affect in players. In
Serious Games Analytics, pages 401–434. Springer.
[Neuper et al., 2003] Neuper, C., Muller, G., Kubler, A., Birbaumer, N., and
Pfurtscheller, G. (2003). Clinical application of an eeg-based brain–computer in-
terface: a case study in a patient with severe motor impairment. Clinical neuro-
physiology, 114(3):399–409.
[Nguyen et al., 2015] Nguyen, A.-T., Chen, W., and Rauterberg, M. (2015). Intelli-
gent presentation skills trainer analyses body movement. In Advances in Computa-
tional Intelligence, pages 320–332. Springer.
[Nichols and Maner, 2008] Nichols, A. L. and Maner, J. K. (2008). The good-subject
effect: Investigating participant demand characteristics. The Journal of general
psychology, 135(2):151–166.
[Norcross et al., 1984] Norcross, J. C., Guadagnoli, E., and Prochaska, J. O. (1984).
Factor structure of the profile of mood states (poms): two partial replications.
Journal of clinical psychology, 40(5):1270–1277.
[Oatley et al., 2006] Oatley, K., Keltner, D., and Jenkins, J. M. (2006). Understanding
emotions. Blackwell publishing.
[Oliveira et al., 2009] Oliveira, F. T., Aula, A., and Russell, D. M. (2009). Discrimi-
nating the relevance of web search results with measures of pupil size. In Proceedings
of the SIGCHI Conference on Human Factors in Computing Systems, pages 2209–
2212. ACM.
[Olson and Olson, 2003] Olson, G. M. and Olson, J. S. (2003). Human-computer in-
teraction: Psychological aspects of the human use of computing. Annual review of
psychology, 54(1):491–516.
[Omata et al., 2012] Omata, M., Moriwaki, K., Mao, X., Kanuka, D., and Imamiya,
A. (2012). Affective rendering: Visual effect animations for affecting user arousal.
In 2012 International Conference on Multimedia Computing and Systems, pages
737–742. IEEE.
[Ortony et al., 1990] Ortony, A., Clore, G. L., and Collins, A. (1990). The cognitive
structure of emotions. Cambridge university press.
[O'Brien and Toms, 2013] O'Brien, H. L. and Toms, E. G. (2013). Examining the gen-
eralizability of the user engagement scale (ues) in exploratory search. Information
Processing & Management, 49(5):1092–1107.
[Paas and Van Merrienboer, 1994] Paas, F. G. and Van Merrienboer, J. J. (1994).
Instructional control of cognitive load in the training of complex cognitive tasks.
Educational psychology review, 6(4):351–371.
[Pan et al., 2004] Pan, B., Hembrooke, H. A., Gay, G. K., Granka, L. A., Feusner,
M. K., and Newman, J. K. (2004). The determinants of web page viewing behav-
ior: an eye-tracking study. In Proceedings of the 2004 symposium on Eye tracking
research & applications, pages 147–154. ACM.
[Pantic et al., 2007] Pantic, M., Pentland, A., Nijholt, A., and Huang, T. S. (2007).
Human computing and machine understanding of human behavior: a survey. In
Artifical Intelligence for Human Computing, pages 47–71. Springer.
[Park and Kim, 2016] Park, S. M. and Kim, D. S. (2016). Human emotion decoding
using eye tracking and fmri. In Organization for Human Brain Mapping 2016.
[Parrott and Schulkin, 1993] Parrott, W. G. and Schulkin, J. (1993). Neuropsychology
and the cognitive nature of the emotions. Cognition & Emotion, 7(1):43–59.
[Partala et al., 2000] Partala, T., Jokiniemi, M., and Surakka, V. (2000). Pupillary
responses to emotionally provocative stimuli. In Proceedings of the 2000 symposium
on Eye tracking research & applications, pages 123–129. ACM.
[Partala and Surakka, 2003] Partala, T. and Surakka, V. (2003). Pupil size variation
as an indication of affective processing. International journal of human-computer
studies, 59(1):185–198.
[Partala and Surakka, 2004] Partala, T. and Surakka, V. (2004). The effects of af-
fective interventions in human–computer interaction. Interacting with computers,
16(2):295–309.
[Paulhus and Vazire, 2007] Paulhus, D. L. and Vazire, S. (2007). The self-report
method. Handbook of research methods in personality psychology, 1:224–239.
[Pengnate, 2016] Pengnate, S. F. (2016). Measuring emotional arousal in clickbait:
eye-tracking approach.
[Peter, 2010] Peter, P. C. (2010). Emotional intelligence. Wiley International Ency-
clopedia of Marketing.
[Petrantonakis and Hadjileontiadis, 2010] Petrantonakis, P. C. and Hadjileontiadis,
L. J. (2010). Emotion recognition from brain signals using hybrid adaptive fil-
tering and higher order crossings analysis. Affective Computing, IEEE Transactions
on, 1(2):81–97.
[Pfleging et al., 2016] Pfleging, B., Fekety, D. K., Schmidt, A., and Kun, A. L. (2016).
A model relating pupil diameter to mental workload and lighting conditions. In
Proceedings of the 2016 CHI conference on human factors in computing systems,
pages 5776–5788. ACM.
[Philip et al., 2010] Philip, R., Whalley, H., Stanfield, A., Sprengelmeyer, R., Santos,
I., Young, A., Atkinson, A., Calder, A., Johnstone, E., Lawrie, S., et al. (2010).
Deficits in facial, body movement and vocal emotional processing in autism spectrum
disorders. Psychological medicine, 40(11):1919–1929.
[Picard, 1997] Picard, R. W. (1997). Affective computing, volume 252. MIT press
Cambridge.
[Picard, 2003] Picard, R. W. (2003). Affective computing: challenges. International
Journal of Human-Computer Studies, 59(1-2):55–64.
[Picard, 2010] Picard, R. W. (2010). Affective computing: from laughter to ieee. IEEE
Transactions on Affective Computing, 1(1):11–17.
[Plutchik, 1980] Plutchik, R. (1980). A general psychoevolutionary theory of emotion.
Theories of emotion, 1:3–31.
[Plutchik, 2001] Plutchik, R. (2001). The nature of emotions human emotions have
deep evolutionary roots, a fact that may explain their complexity and provide tools
for clinical practice. American Scientist, 89(4):344–350.
[Prendinger and Ishizuka, 2005] Prendinger, H. and Ishizuka, M. (2005). The em-
pathic companion: A character-based interface that addresses users' affective states.
Applied Artificial Intelligence, 19(3-4):267–285.
[Psychlopedia, 2018] Psychlopedia (2018).
[Pusara and Brodley, 2004] Pusara, M. and Brodley, C. E. (2004). User re-
authentication via mouse movements. In Proceedings of the 2004 ACM workshop
on Visualization and data mining for computer security, pages 1–8. ACM.
[Qi et al., 2001] Qi, Y., Reynolds, C., and Picard, R. W. (2001). The bayes point
machine for computer-user frustration detection via pressuremouse. In Proceedings
of the 2001 workshop on Perceptive user interfaces, pages 1–5. ACM.
[Quazi et al., 2012] Quazi, M., Mukhopadhyay, S., Suryadevara, N., and Huang, Y.
(2012). Towards the smart sensors based human emotion recognition. In Instru-
mentation and Measurement Technology Conference (I2MTC), 2012 IEEE Interna-
tional, pages 2365–2370. IEEE.
[Ragot et al., 2017] Ragot, M., Martin, N., Em, S., Pallamin, N., and Diverrez, J.-M.
(2017). Emotion recognition using physiological signals: laboratory vs. wearable
sensors. In International Conference on Applied Human Factors and Ergonomics,
pages 15–22. Springer.
[Raiturkar et al., 2016] Raiturkar, P., Kleinsmith, A., Keil, A., Banerjee, A., and Jain,
E. (2016). Decoupling light reflex from pupillary dilation to measure emotional
arousal in videos. In Proceedings of the ACM Symposium on Applied Perception,
pages 89–96. ACM.
[Rani et al., 2005] Rani, P., Sarkar, N., and Liu, C. (2005). Maintaining optimal chal-
lenge in computer games through real-time physiological feedback. In Proceedings
of the 11th international conference on human computer interaction, volume 58.
[Reeshad Khan, 2017] Reeshad Khan, O. S. (2017). A literature review on emotion
recognition using various methods. Global Journal of Computer Science and Tech-
nology.
[Ren et al., 2013] Ren, P., Barreto, A., Gao, Y., and Adjouadi, M. (2013). Affective
assessment by digital processing of the pupil diameter. Affective Computing, IEEE
Transactions on, 4(1):2–14.
[Ricketts et al., 2013] Ricketts, J., Jones, C. R., Happe, F., and Charman, T. (2013).
Reading comprehension in autism spectrum disorders: The role of oral language and
social functioning. Journal of autism and developmental disorders, 43(4):807–816.
[Rizk et al., 2014] Rizk, Y., Safieddine, M., Matchoulian, D., and Awad, M. (2014).
Face2mus: a facial emotion based internet radio tuner application. In Mediterranean
Electrotechnical Conference (MELECON), 2014 17th IEEE, pages 257–261. IEEE.
[Rosa, 2015] Rosa, P. (2015). What do your eyes say? bridging eye movements to
consumer behavior. International Journal of Psychological Research, 8(2):90–103.
[Ross and Mirowsky, 1984] Ross, C. E. and Mirowsky, J. (1984). Socially-desirable
response and acquiescence in a cross-cultural survey of mental health. Journal of
Health and Social Behavior, pages 189–197.
[Russell, 1994] Russell, J. A. (1994). Is there universal recognition of emotion from
facial expression? a review of the cross-cultural studies. Psychological bulletin,
115(1):102.
[Russell and Mehrabian, 1977] Russell, J. A. and Mehrabian, A. (1977). Evidence for
a three-factor theory of emotions. Journal of research in Personality, 11(3):273–294.
[Russell and Pratt, 1980] Russell, J. A. and Pratt, G. (1980). A description of the
affective quality attributed to environments. Journal of personality and social psy-
chology, 38(2):311.
[San Agustin et al., 2010] San Agustin, J., Skovsgaard, H., Mollenbach, E., Barret,
M., Tall, M., Hansen, D. W., and Hansen, J. P. (2010). Evaluation of a low-cost
open-source gaze tracker. In Proceedings of the 2010 Symposium on Eye-Tracking
Research & Applications, pages 77–80. ACM.
[Sanchez et al., 2018] Sanchez, W., Martinez, A., Hernandez, Y., Estrada, H., and
Gonzalez-Mendoza, M. (2018). A predictive model for stress recognition in desk
jobs. Journal of Ambient Intelligence and Humanized Computing, pages 1–13.
[Sanghoon and Roberto, 2005] Sanghoon, A. L. B. D. W. and Roberto, P. E. S. (2005).
The impact of frustration-mitigating messages delivered by an interface agent. Ar-
tificial intelligence in education: supporting learning through intelligent and socially
informed technology, 125:73.
[Sarrafzadeh et al., 2006] Sarrafzadeh, A., Alexander, S., Dadgostar, F., Fan, C., and
Bigdeli, A. (2006). See me, teach me: Facial expression and gesture recognition for
intelligent tutoring systems. In Innovations in Information Technology, 2006, pages
1–5. IEEE.
[Sarrafzadeh et al., 2008] Sarrafzadeh, A., Alexander, S., Dadgostar, F., Fan, C., and
Bigdeli, A. (2008). How do you know that I don't understand? a look at the future
of intelligent tutoring systems. Computers in Human Behavior, 24(4):1342–1363.
[Savran et al., 2013] Savran, A., Gur, R., and Verma, R. (2013). Automatic detection
of emotion valence on faces using consumer depth cameras. In Proceedings of the
IEEE International Conference on Computer Vision Workshops, pages 75–82.
[Savva and Bianchi-Berthouze, 2011] Savva, N. and Bianchi-Berthouze, N. (2011).
Automatic recognition of affective body movement in a video game scenario. In
International Conference on Intelligent Technologies for interactive entertainment,
pages 149–159. Springer.
[Schlosberg, 1954] Schlosberg, H. (1954). Three dimensions of emotion. Psychological
review, 61(2):81.
[Scholtz, 2006] Scholtz, J. (2006). Metrics for evaluating human information interac-
tion systems. Interacting with Computers, 18(4):507–527.
[Schroder et al., 2005] Schroder, H., Berghaus, N., and Zimmermann, G. (2005). Das
Blickverhalten der Kunden als Grundlage fur die Warenplatzierung im Lebensmit-
teleinzelhandel. Der Markt, 44(1):31–43.
[Schuller et al., 2010] Schuller, B., Vlasenko, B., Eyben, F., Wollmer, M., Stuhlsatz,
A., Wendemuth, A., and Rigoll, G. (2010). Cross-corpus acoustic emotion recog-
nition: Variances and strategies. IEEE Transactions on Affective Computing,
1(2):119–131.
[Schwark, 2015] Schwark, J. D. (2015). Toward a taxonomy of affective computing.
International Journal of Human-Computer Interaction, 31(11):761–768.
[Setyati et al., 2012] Setyati, E., Suprapto, Y. K., and Purnomo, M. H. (2012). Fa-
cial emotional expressions recognition based on active shape model and radial basis
function network. In Computational Intelligence for Measurement Systems and Ap-
plications (CIMSA), 2012 IEEE International Conference on, pages 41–46. IEEE.
[Shao et al., 2015] Shao, Z., Roelofs, A., Martin, R. C., and Meyer, A. S. (2015).
Selective inhibition and naming performance in semantic blocking, picture-word
interference, and color–word stroop tasks. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 41(6):1806.
[Sharma et al., 2013] Sharma, N., Dhall, A., Gedeon, T., and Goecke, R. (2013). Mod-
eling stress using thermal facial patterns: A spatio-temporal approach. In Affective
Computing and Intelligent Interaction (ACII), 2013 Humaine Association Confer-
ence on, pages 387–392. IEEE.
[Shelley, 2007] Shelley, K. H. (2007). Photoplethysmography: beyond the calculation
of arterial oxygen saturation and heart rate. Anesthesia & Analgesia, 105(6):S31–
S36.
[Shi et al., 2007] Shi, Y., Ruiz, N., Taib, R., Choi, E., and Chen, F. (2007). Galvanic
skin response (gsr) as an index of cognitive load. In CHI’07 extended abstracts on
Human factors in computing systems, pages 2651–2656. ACM.
[Sidney et al., 2005] Sidney, K. D., Craig, S. D., Gholson, B., Franklin, S., Picard,
R., and Graesser, A. C. (2005). Integrating affect sensors in an intelligent tutoring
system. In Affective Interactions: The Computer in the Affective Loop Workshop
at, pages 7–13.
[Simola et al., 2015] Simola, J., Le Fevre, K., Torniainen, J., and Baccino, T. (2015).
Affective processing in natural scene viewing: Valence and arousal interactions in
eye-fixation-related potentials. NeuroImage, 106:21–33.
[Simon and Nath, 2004] Simon, R. W. and Nath, L. E. (2004). Gender and emotion in
the united states: Do men and women differ in self-reports of feelings and expressive
behavior? American journal of sociology, 109(5):1137–1176.
[Sioni and Chittaro, 2015] Sioni, R. and Chittaro, L. (2015). Stress detection using
physiological sensors. Computer, 48(10):26–33.
[Siraj et al., 2006] Siraj, F., Yusoff, N., and Kee, L. C. (2006). Emotion classification
using neural network. In Computing & Informatics, 2006. ICOCI’06. International
Conference on, pages 1–7. IEEE.
[Sirois and Brisson, 2014] Sirois, S. and Brisson, J. (2014). Pupillometry. Wiley In-
terdisciplinary Reviews: Cognitive Science, 5(6):679–692.
[Slanzi et al., 2017] Slanzi, G., Balazs, J. A., and Velasquez, J. D. (2017). Combining
eye tracking, pupil dilation and eeg analysis for predicting web users click intention.
Information Fusion, 35:51–57.
[Snowden et al., 2016] Snowden, R. J., O’Farrell, K. R., Burley, D., Erichsen, J. T.,
Newton, N. V., and Gray, N. S. (2016). The pupil’s response to affective pic-
tures: Role of image duration, habituation, and viewing mode. Psychophysiology,
53(8):1217–1223.
[Sobkowicz et al., 2012] Sobkowicz, P., Kaschesky, M., and Bouchard, G. (2012).
Opinion mining in social media: Modeling, simulating, and forecasting political
opinions in the web. Government Information Quarterly, 29(4):470–479.
[Soleymani et al., 2008] Soleymani, M., Chanel, G., Kierkels, J. J., and Pun, T.
(2008). Affective ranking of movie scenes using physiological signals and content
analysis. In Proceedings of the 2nd ACM workshop on Multimedia semantics, pages
32–39. ACM.
[Soleymani et al., 2012] Soleymani, M., Pantic, M., and Pun, T. (2012). Multimodal
emotion recognition in response to videos. IEEE transactions on affective computing,
3(2):211–223.
[Sommer et al., 2014] Sommer, N., Hirshfield, L., and Velipasalar, S. (2014). Our
emotions as seen through a webcam. In Foundations of Augmented Cognition. Ad-
vancing Human Performance and Decision-Making through Adaptive Systems, pages
78–89. Springer.
[Steephen et al., 2018] Steephen, J. E., Obbineni, S. C., Kummetha, S., and Bapi,
R. S. (2018). An affective adaptation model explaining the intensity-duration rela-
tionship of emotion. IEEE Transactions on Affective Computing.
[Steinhauer and Hakerem, 1992] Steinhauer, S. R. and Hakerem, G. (1992). The pupil-
lary response in cognitive psychophysiology and schizophrenia. Annals of the New
York Academy of Sciences, 658(1):182–204.
[Steunebrink et al., 2009] Steunebrink, B. R., Dastani, M., and Meyer, J.-J. C. (2009).
The occ model revisited. In Proc. of the 4th Workshop on Emotion and Computing.
[Stieger et al., 2017] Stieger, S., Lewetz, D., and Reips, U.-D. (2017). Can smart-
phones be used to bring computer-based tasks from the lab to the field? a mobile
experience-sampling method study about the pace of life. Behavior Research Meth-
ods.
[Storms and Spector, 1987] Storms, P. L. and Spector, P. E. (1987). Relationships
of organizational frustration with reported behavioural reactions: The moderating
effect of locus of control. Journal of occupational psychology, 60(3):227–234.
[Sun et al., 2010] Sun, F.-T., Kuo, C., Cheng, H.-T., Buthpitiya, S., Collins, P., and
Griss, M. (2010). Activity-aware mental stress detection using physiological sensors.
In International Conference on Mobile Computing, Applications, and Services, pages
282–301. Springer.
[Sweller, 1994] Sweller, J. (1994). Cognitive load theory, learning difficulty, and in-
structional design. Learning and instruction, 4(4):295–312.
[Szasz et al., 2011] Szasz, P. L., Szentagotai, A., and Hofmann, S. G. (2011). The
effect of emotion regulation strategies on anger. Behaviour research and therapy,
49(2):114–119.
[Tangnimitchok et al., 2018] Tangnimitchok, S., Nonnarit, O., Ratchatanantakit, N.,
Barreto, A., Ortega, F. R., Rishe, N. D., et al. (2018). A system for non-intrusive
affective assessment in the circumplex model from pupil diameter and facial ex-
pression monitoring. In International Conference on Human-Computer Interaction,
pages 465–477. Springer.
[Torres-Valencia et al., 2014] Torres-Valencia, C. A., Garcia-Arias, H. F., Lopez, M.
A. A., and Orozco-Gutierrez, A. A. (2014). Comparative analysis of physiological
signals and electroencephalogram (eeg) for multimodal emotion recognition using
generative models. In Image, Signal Processing and Artificial Vision (STSIVA),
2014 XIX Symposium on, pages 1–5. IEEE.
[Tottenham et al., 2009] Tottenham, N., Tanaka, J. W., Leon, A. C., McCarry, T.,
Nurse, M., Hare, T. A., Marcus, D. J., Westerlund, A., Casey, B., and Nelson, C.
(2009). The nimstim set of facial expressions: judgments from untrained research
participants. Psychiatry research, 168(3):242–249.
[Valstar and Pantic, 2012] Valstar, M. F. and Pantic, M. (2012). Fully automatic
recognition of the temporal phases of facial actions. Systems, Man, and Cybernetics,
Part B: Cybernetics, IEEE Transactions on, 42(1):28–43.
[van den Brink et al., 2016] van den Brink, R. L., Murphy, P. R., and Nieuwenhuis,
S. (2016). Pupil diameter tracks lapses of attention. PLoS One, 11(10):e0165274.
[van der Wel and van Steenbergen, 2018] van der Wel, P. and van Steenbergen, H.
(2018). Pupil dilation as an index of effort in cognitive control tasks: A review.
Psychonomic bulletin & review, 25(6):2005–2015.
[Van Gerven et al., 2004] Van Gerven, P. W., Paas, F., Van Merrienboer, J. J., and
Schmidt, H. G. (2004). Memory load and the cognitive pupillary response in aging.
Psychophysiology, 41(2):167–174.
[Van Kleef, 2009] Van Kleef, G. A. (2009). How emotions regulate social life: The
emotions as social information (easi) model. Current directions in psychological
science, 18(3):184–188.
[Van Schaik and Ling, 2008] Van Schaik, P. and Ling, J. (2008). Modelling user ex-
perience with web sites: Usability, hedonic value, beauty and goodness. Interacting
with Computers, 20(3):419–432.
[Vega et al., 2018] Vega, J., Couth, S., Poliakoff, E., Kotz, S., Sullivan, M., Jay, C.,
Vigo, M., and Harper, S. (2018). Back to analogue: Self-reporting for parkinson’s
disease. In Proceedings of the 2018 CHI Conference on Human Factors in Computing
Systems, page 74. ACM.
[Visuri et al., 2018] Visuri, A., Asare, K. O., Kuosmanen, E., Nishiyama, Y., Ferreira,
D., Sarsenbayeva, Z., Goncalves, J., van Berkel, N., Wadley, G., Kostakos, V., Clinch,
S., Matthews, O., Harper, S., Jenkins, A., Snow, S., and m. c. schraefel (2018).
Ubiquitous mobile sensing: Behaviour, mood, and environment. In UbiComp/ISWC
Adjunct, pages 1140–1143. ACM.
[Vizer et al., 2009] Vizer, L. M., Zhou, L., and Sears, A. (2009). Automated stress de-
tection using keystroke and linguistic features: An exploratory study. International
Journal of Human-Computer Studies, 67(10):870–886.
[Volkmar et al., 2014] Volkmar, F., Siegel, M., Woodbury-Smith, M., King, B., Mc-
Cracken, J., State, M., the American Academy of Child and Adolescent Psychiatry,
et al. (2014). Practice parameter for the as-
sessment and treatment of children and adolescents with autism spectrum disorder.
Journal of the American Academy of Child & Adolescent Psychiatry, 53(2):237–257.
[Wahn et al., 2016] Wahn, B., Ferris, D. P., Hairston, W. D., and Konig, P. (2016).
Pupil sizes scale with attentional load and task experience in a multiple object
tracking task. PloS one, 11(12):e0168087.
[Walker et al., 1990] Walker, H. K., Hall, W. D., and Hurst, J. W. (1990). Cranial
nerves iii, iv, and vi: The oculomotor, trochlear, and abducens nerves. Clinical
methods: the history, physical, and laboratory examinations.
[Wang et al., 2018] Wang, C.-A., Baird, T., Huang, J., Coutinho, J. D., Brien, D. C.,
and Munoz, D. P. (2018). Arousal effects on pupil size, heart rate, and skin con-
ductance in an emotional face task. Frontiers in neurology, 9.
[Wang et al., 2019] Wang, J., Fu, E. Y., Ngai, G., Leong, H. V., and Huang, M. X.
(2019). Detecting stress from mouse-gaze attraction. In Proceedings of the 34th
ACM/SIGAPP Symposium on Applied Computing, pages 692–700. ACM.
[Wang, 2011] Wang, J. T.-y. (2011). Pupil dilation and eye tracking. A handbook
of process tracing methods for decision research: A critical review and user's guide,
page 188.
[Wang et al., 2013] Wang, W., Li, Z., Wang, Y., and Chen, F. (2013). Indexing cogni-
tive workload based on pupillary response under luminance and emotional changes.
In Proceedings of the 2013 international conference on Intelligent user interfaces,
pages 247–256. ACM.
[Ward and Marsden, 2004] Ward, R. D. and Marsden, P. H. (2004). Affective comput-
ing: problems, reactions and intentions. Interacting with Computers, 16(4):707–713.
[Watson et al., 1988] Watson, D., Clark, L. A., and Tellegen, A. (1988). Development
and validation of brief measures of positive and negative affect: the panas scales.
Journal of personality and social psychology, 54(6):1063.
[Welford, 1973] Welford, A. T. (1973). Stress and performance. Ergonomics,
16(5):567–580.
[Wenger et al., 1961] Wenger, M. A., Clemens, T., Coleman, D., Cullen, T., and En-
gel, B. T. (1961). Autonomic response specificity. Psychosomatic medicine.
[Wilder, 1958] Wilder, J. (1958). Modern psychophysiology and the law of initial
value. American Journal of Psychotherapy.
[Wilder, 2014] Wilder, J. (2014). Stimulus and response: The law of initial value.
Elsevier.
[Wolf et al., 2018] Wolf, E., Martinez, M., Roitberg, A., Stiefelhagen, R., and Deml,
B. (2018). Estimating mental load in passive and active tasks from pupil and gaze
changes using bayesian surprise. In Proceedings of the Workshop on Modeling Cog-
nitive Processes from Multimodal Data, page 6. ACM.
[Wolpaw and McFarland, 1994] Wolpaw, J. R. and McFarland, D. J. (1994). Mul-
tichannel eeg-based brain-computer communication. Electroencephalography and
clinical Neurophysiology, 90(6):444–449.
[Wulfert et al., 2005] Wulfert, E., Roland, B. D., Hartley, J., Wang, N., and Franco,
C. (2005). Heart rate arousal and excitement in gambling: winners versus losers.
Psychology of Addictive Behaviors, 19(3):311.
[Xing et al., 2016] Xing, B., Zhang, L., Gao, J., Yu, R., and Lyu, R. (2016). Barrier-
free affective communication in mooc study by analyzing pupil diameter variation.
In SIGGRAPH ASIA 2016 Symposium on Education, page 7. ACM.
[Xu et al., 2015] Xu, C., Feng, Z., and Meng, Z. (2015). Affective experience modelling
based on interactive synergetic dependence in big data. Future Generation Computer
Systems.
[Xu et al., 2016] Xu, C., Feng, Z., and Meng, Z. (2016). Affective experience modeling
based on interactive synergetic dependence in big data. Future Generation Computer
Systems, 54:507–517.
[Xu, 2015] Xu, Q. (2015). Examining user engagement attributes in visual information
search. iConference 2015 Proceedings.
[Yaneva, 2016] Yaneva, V. (2016). Assessing text and web accessibility for people with
autism spectrum disorder. PhD thesis, University of Wolverhampton.
[Yaneva and Evans, 2015] Yaneva, V. and Evans, R. (2015). Six good predictors of
autistic text comprehension. In Proceedings of the International Conference Recent
Advances in Natural Language Processing, pages 697–706.
[Yaneva et al., 2016a] Yaneva, V., Evans, R., and Temnikova, I. (2016a). Predicting
reading difficulty for readers with autism spectrum disorder. In Proceedings of
Workshop on Improving Social Inclusion using NLP: Tools and Resources (ISI-NLP)
held in conjunction with LREC.
[Yaneva et al., 2018] Yaneva, V., Ha, L. A., Eraslan, S., Yesilada, Y., and Mitkov, R.
(2018). Detecting autism based on eye-tracking data from web searching tasks. In
Proceedings of the Internet of Accessible Things, page 16. ACM.
[Yaneva et al., 2017] Yaneva, V., Orasan, C., Evans, R., and Rohanian, O. (2017).
Combining multiple corpora for readability assessment for people with cognitive
disabilities. In Proceedings of the 12th Workshop on Innovative Use of NLP for
Building Educational Applications, pages 121–132.
[Yaneva et al., 2015] Yaneva, V., Temnikova, I., and Mitkov, R. (2015). Accessible
texts for autism: An eye-tracking study. In Proceedings of the 17th International
ACM SIGACCESS Conference on Computers & Accessibility, pages 49–57. ACM.
[Yaneva et al., 2016b] Yaneva, V., Temnikova, I. P., and Mitkov, R. (2016b). A corpus
of text data and gaze fixations from autistic and non-autistic adults. In LREC.
[Yaneva et al., 2016c] Yaneva, V., Temnikova, I. P., and Mitkov, R. (2016c). Evaluat-
ing the readability of text simplification output for readers with cognitive disabilities.
In LREC.
[Yang and Chen, 2012] Yang, Y.-H. and Chen, H. H. (2012). Machine recognition of
music emotion: A review. ACM Transactions on Intelligent Systems and Technology
(TIST), 3(3):40.
[Yazdani et al., 2012] Yazdani, A., Lee, J.-S., Vesin, J.-M., and Ebrahimi, T. (2012).
Affect recognition based on physiological changes during the watching of music
videos. ACM Transactions on Interactive Intelligent Systems, 2(1):1–26.
[Zaalberg et al., 2004] Zaalberg, R., Manstead, A., and Fischer, A. (2004). Relations
between emotions, display rules, social motives, and facial behaviour. Cognition and
Emotion, 18(2):183–207.
[Zeng et al., 2008] Zeng, Z., Pantic, M., Roisman, G. I., and Huang, T. S. (2008). A
survey of affect recognition methods: Audio, visual, and spontaneous expressions.
IEEE transactions on pattern analysis and machine intelligence, 31(1):39–58.
[Zhai and Barreto, 2006] Zhai, J. and Barreto, A. (2006). Stress detection in com-
puter users based on digital signal processing of noninvasive physiological variables.
In Engineering in Medicine and Biology Society, 2006. EMBS’06. 28th Annual In-
ternational Conference of the IEEE, pages 1355–1358. IEEE.
[Zhang et al., 2018] Zhang, H., Gashi, S., Kimm, H., Hanci, E., and Matthews, O.
(2018). Moodbook: An application for continuous monitoring of social media usage
and mood. In UbiComp/ISWC Adjunct, pages 1150–1155. ACM.
[Zhang et al., 2014] Zhang, H., Zhu, Y., Maniyeri, J., and Guan, C. (2014). Detection
of variations in cognitive workload using multi-modality physiological sensors and a
large margin unbiased regression machine. In Engineering in Medicine and Biology
Society (EMBC), 2014 36th Annual International Conference of the IEEE, pages
2985–2988. IEEE.
List of Acronyms
AFA Algorithm for sensing Arousal and Focal Attention. 93
BVP Blood Volume Pulse. 51, 188
ECG Electrocardiography. 34, 39, 73–75, 82–84
EEG Electroencephalogram. 18, 34, 39, 60, 189
EMG Electromyography. 18, 34, 40, 59
GSR Galvanic Skin Response. 18, 34, 39, 51, 59, 188
HR Heart Rate. 18, 34, 39, 59, 188
KD Keystroke Dynamics. 33
MD Mouse Dynamics. 33
NLP Natural Language Processing. 33
PANAS Positive Affect and Negative Affect scales. 35
POMS Profile of Mood States. 35
PPG Photoplethysmogram. 39, 41
SAM Self-Assessment Manikin. 35
ST Skin Temperature. 34, 51
STA Scanpath Trending Analysis. 27
VAS Visual analogue scale. 35
02/06/2017, Version 3.0
School of Computer Science
Emotion Sensing Using Pupil Dilation and Eye tracking
Participant Information Sheet
You are being invited to take part in a research study as part of a PhD study to use pupil dilation and
eye tracking to measure emotions in interactive systems. Before you decide, it is important for you to
understand why the research is being done and what it will involve. Please take time to read the
following information carefully and discuss it with others if you wish. Please ask if there is anything
that is not clear or if you would like more information. Take time to decide whether or not you wish to
take part. Thank you for taking the time to read this.
Who will conduct the research?
Oludamilare Matthews
What is the purpose of the research?
The aim of this research is to build an algorithm for measuring emotions. This experiment serves as a
means to evaluate the accuracy of our algorithm using already-rated stimuli.
Why have I been chosen?
We are inviting members of the public to take part in this study so that we can evaluate our algorithm.
What would I be asked to do if I took part?
You will be asked to look at a set of pictures for as long as you would normally do. An eye tracker,
located on the monitor, will capture your pupil size and gaze data. Afterwards, you will be required to
rate the images according to how you feel about them.
What happens to the data collected?
Electronic data will be stored securely on a computer. Written information will be stored in a locked
drawer. The data will be analyzed and the results will be used in preparation for my dissertation.
How is confidentiality maintained?
Data will be made anonymous. The personal data collected is for consent alone, and no one will be
able to match it with the data collected by the eye tracker. Furthermore, the consent form will be
kept in a secure filing cabinet, separate from the eye-tracker data, which will be stored on a secure
server within the University of Manchester.
What happens if I do not want to take part or if I change my mind?
It is up to you to decide whether or not to take part. If you do decide to take part, you will be given this
information sheet to keep and be asked to sign a consent form. If you decide not to take part, it is up
to you and there are no adverse consequences for you. If you decide to take part, you are still free
to withdraw at any point during the experiment without giving a reason and without detriment to
yourself. If at any point (before or during the study) you decide not to take part, your data will
not be stored and no record of your participation will be kept.
02/06/2017, Version 3.0
Emotion Sensing Using Pupil Dilation and Eye tracking
CONSENT FORM
If you are happy to participate please complete and sign the consent form below.
Participant No.:
Gender: Age: Profession: Highest level of qualification:
Please initial box
1. I confirm that I have read the attached information sheet on the above project and have had the opportunity to consider the information and ask questions and had these answered satisfactorily.
2. I understand that my participation in the study is voluntary and that I am free to withdraw at any time without giving a reason and without detriment to my treatment/service/self, and that my data will not be stored.
3. I understand that my personal data will remain confidential and untraceable to the electronic data recorded.
I agree to take part in the above project.
Name of participant    Date    Signature
Name of researcher    Date    Signature
This Project Has Been Approved by the University of Manchester’s Research Ethics Committee [2017-1906-3160].
Please rate the images you just viewed according to how much arousal (stress, anxiety, cognitive load,
fear, excitement) you felt.
Dog
1 - Very low   2 - Low   3 - Medium   4 - High   5 - Very high
Basket
1 - Very low   2 - Low   3 - Medium   4 - High   5 - Very high
Woman
1 - Very low   2 - Low   3 - Medium   4 - High   5 - Very high
Couple
1 - Very low   2 - Low   3 - Medium   4 - High   5 - Very high
Roller coaster
1 - Very low   2 - Low   3 - Medium   4 - High   5 - Very high
Woman carrying baby
1 - Very low   2 - Low   3 - Medium   4 - High   5 - Very high
Dirty foot
1 - Very low   2 - Low   3 - Medium   4 - High   5 - Very high
Fallen boxer
1 - Very low   2 - Low   3 - Medium   4 - High   5 - Very high
Boy
1 - Very low   2 - Low   3 - Medium   4 - High   5 - Very high
Lamp
1 - Very low   2 - Low   3 - Medium   4 - High   5 - Very high
Bear
1 - Very low   2 - Low   3 - Medium   4 - High   5 - Very high
Mother & baby
1 - Very low   2 - Low   3 - Medium   4 - High   5 - Very high
School of Computer Science
Effects of cognitive tasks on Pupil Dilation and Eye movement
Participant Information Sheet
You are being invited to take part in a research study. Before you decide, it is important for you to
understand why the research is being done and what it will involve. Please take time to read the
following information carefully and discuss it with others if you wish. Please ask if there is
anything that is not clear or if you would like more information. Take time to decide whether or not
you wish to take part. Thank you for reading this.
Who will conduct the research?
Oludamilare Matthews
Title of the research.
Effects of cognitive tasks on Pupil Dilation and Eye movement
Why have I been chosen?
We are inviting members of the public to take part in the study so that we can understand how
people react to cognitive activities.
What would I be asked to do if I take part?
You will be asked to view 4 pictures containing animals and coloured texts. Your task is to say
aloud what they are. In some cases, there will be textual cues to what the objects are. You are
still expected to say what the objects or the colours really are and NOT what the text says. You do
not have a time limit to complete the task, but you should do so as quickly as you can.
School of Computer Science
Effects of cognitive tasks on Pupil Dilation and Eye movement
CONSENT FORM
If you are happy to participate please complete and sign the consent form below
Please initial box
1. I confirm that I have read the attached information sheet on the above project and have had the opportunity to consider the information and ask questions and had these answered satisfactorily.
2. I understand that my participation in the study is voluntary and that I am free to withdraw at any time without giving a reason.
3. I understand that the session will be audio and video recorded and an eye-tracker will be used.
4. I agree to the use of anonymous quotes.
I agree to take part in the above project.
Name of participant Date Signature
Name of person taking consent Date Signature
24/04/2018, Version 1.0
School of Computer Science
Sensing frustration in end-user tasks from pupillary response
Participant Information Sheet
You are being invited to take part in a research study as part of a PhD study to use pupil dilation and
eye tracking to measure emotions in interactive systems. Before you decide, it is important for you to
understand why the research is being done and what it will involve. Please take time to read the
following information carefully and discuss it with others if you wish. Please ask if there is anything
that is not clear or if you would like more information. Take time to decide whether or not you wish to
take part. Thank you for taking the time to read this.
Who will conduct the research?
Oludamilare Matthews
What is the purpose of the research?
The aim of this research is to build an algorithm for measuring emotions. This experiment serves as a
means to evaluate the accuracy of our algorithm using real end-user tasks.
Why have I been chosen?
We are inviting members of the public to take part in this study so that we can evaluate our algorithm.
What would I be asked to do if I took part?
You will be asked to perform 4 tasks:
1. To book trips using the National Express web platform
2. To carry out searches on the Google search engine
3. To search for the biographies of some people on Wikipedia
4. To check the weather on the BBC website.
An eye tracker located on the monitor will capture your pupil size and gaze data. After these tasks,
you will be asked to rate the difficulty of your tasks and, optionally, to provide qualitative
feedback on your experience.
What happens to the data collected?
Electronic data will be stored securely on a computer. Written information will be stored in a locked
drawer. The data will be analyzed and the results will be used in preparation for my dissertation.
How is confidentiality maintained?
Data will be made anonymous. The personal data collected is for consent alone, and no one will be
able to match it with the data collected by the eye tracker. Furthermore, the consent form will be
kept in a secure filing cabinet, separate from the eye-tracker data, which will be stored on a secure
server within the University of Manchester.
What happens if I do not want to take part or if I change my mind?
It is up to you to decide whether or not to take part. If you do decide to take part, you will be given this
information sheet to keep and be asked to sign a consent form. If you decide not to take part, it is up
to you and there are no adverse consequences for you. If you decide to take part, you are still free
to withdraw at any point during the experiment without giving a reason and without detriment to
yourself.
24/04/2018, Version 1.0
Sensing frustration in end-user tasks from pupillary response
CONSENT FORM
If you are happy to participate please complete and sign the consent form below.
Participant No.:
Gender: Age: Profession: Highest level of qualification:
Please initial box
1. I confirm that I have read the attached information sheet on the above project and have had the opportunity to consider the information and ask questions and had these answered satisfactorily.
2. I understand that my participation in the study is voluntary and that I am free to withdraw at any time without giving a reason and without detriment to my treatment/service/self, and that my data will not be stored.
3. I understand that my personal data will remain confidential and untraceable to the electronic data recorded.
I agree to take part in the above project.
Name of participant    Date    Signature
Name of researcher    Date    Signature
This Project Has Been Approved by the University of Manchester’s Research Ethics Committee [2018-4365-5934].
How did these tasks make you feel?
Weather in Manchester
Normal Frustrated
Weather in London
Normal Frustrated
Alan Turing's thesis
Normal Frustrated
Stephen Hawking's thesis
Normal Frustrated
Time in Canberra (Australia)
Normal Frustrated
Time in Ottawa (Canada)
Normal Frustrated
Trip to Manchester
Normal Frustrated
Trip to London
Normal Frustrated