
Evaluation of classification algorithms for smooth pursuit eye movements

Evaluating current algorithms for smooth pursuit detection on Tobii Eye Trackers

OLLANTA CUBA GYLLENSTEN

Master’s Thesis at NADA

Supervisor: Inge Frick

Examiner: Stefan Arnborg

TRITA xxx yyyy-nn

Abstract

Eye tracking is a field that has been growing immensely over the last decade. Accompanying this growth is a need for simplified and automatic analysis of eye tracking data. A part of that analysis is eye movement classification, and while there are many adequate classification methods for fixations and saccades, the tools for smooth pursuit classification are still lacking. This thesis gives an overview of the field, and analyses five different methods for classifying smooth pursuits, fixations, and saccades. The analysis also explores evaluation methods that avoid the laborious way of manually tagging data to get a reference classification. Despite earlier reports of decent performance, the overall results for all the analysed algorithms are poor. In particular, the slowest pursuits are consistently misclassified. Most certainly, the inclusion of the slow pursuits has skewed the results, but even disregarding them doesn’t yield particularly impressive results. This raises the question of what concessions one has to make in terms of prerequisites on the data, or qualifiers for the resulting analysis, to achieve adequate performance, and given those, when would such a classification be preferred to something tailored to the problem at hand?

Referat

Evaluation of classification algorithms for smooth pursuit movements

Eye tracking is a field that has grown strongly over the last decade. This growth has also brought a growing need for automatic and simplified analysis of eye-tracking data. One part of such an analysis is the classification of eye movements, and while there are several well-functioning classification methods for fixations and saccades, good alternatives for classifying smooth pursuits are still lacking. This thesis gives an overview of the field and analyses five methods for classifying smooth pursuits, fixations, and saccades. The analysis also examines the possibility of evaluating classification methods without having to take the laborious route of manual classification to obtain a reference classification. Although some of the methods have previously been evaluated with adequate results, all the algorithms analysed here perform poorly overall. The algorithms have the greatest difficulty with the slowest smooth pursuits, which are consistently misclassified. With these slow pursuits included, the results of the remaining analysis were naturally weak, but even without them no impressive results are obtained. This leads one to wonder what concessions need to be made, such as requirements on the input data or limitations on the subsequent analysis, for a classification method to give adequate results. And given these limitations, when is one of these classification methods preferable to an alternative tailored to the specific problem or analysis?

Contents

1 Introduction
    1.1 Scope and Aim
    1.2 Limitations

2 Background
    2.1 Physiology of the eye
    2.2 Movements of the eye
        2.2.1 Saccades
        2.2.2 Fixations
        2.2.3 Smooth Pursuit
    2.3 Classification ideas
        2.3.1 Velocity and Dispersion Thresholds
        2.3.2 Angular Dispersion Threshold
        2.3.3 Autoregressive Kalman Filter
        2.3.4 Attention Focus Kalman Filter
        2.3.5 Data Filtering
    2.4 Evaluation of methods
        2.4.1 Properties of classification
        2.4.2 Stimulus-informed metrics

3 Method
    3.1 Experimental Setup
    3.2 Stimuli
    3.3 Algorithms
        3.3.1 Velocity and Dispersion Thresholds
        3.3.2 Angular Dispersion Threshold
        3.3.3 Attention Focus Kalman Filter
    3.4 Evaluation
        3.4.1 Qualitative and Quantitative scores
        3.4.2 Error Categorisation with Wards Performance Metrics

4 Results
    4.1 I-VVT
    4.2 I-VDT
    4.3 I-VMPStd
    4.4 I-VMPRay
    4.5 I-KF
    4.6 Movement properties histograms

5 Discussion

Appendices

A Glossary

B Kalman Filter

C Additional Figures

Bibliography

Chapter 1

Introduction

Eye-tracking is the process of measuring the movements of the eye. The earliest documented accounts date back to the 19th century, when measurements were done by manual observation. In 1908 Edmund Huey built the first automated eye-tracker, a sort of contact lens connected to an aluminum pointer [1], and the technology has since evolved to enable less intrusive and more precise methods. The most popular eye-trackers of today make use of cameras registering the eye and certain reflections from it, called the Purkinje images, to determine where the eye points.

Eye-tracking is most commonly applied when studying the visual system or in the field of psychology, for example to study how people read or register scenes; however, as the technology evolves it is finding use in more applications. Nowadays eye-tracking is also used in fields as disparate as usability research, product and infant research, and as an assistive aid for the functionally impaired.

In many of the applications of eye-tracking, the exact point at which the eye is directed is not of interest, but the point which holds the attention is. This is an important distinction, because during saccades (the fast movements of the eye that occur when the point of focus is changed) people are virtually blind; no attention is given to the object at which the eye is pointing. This necessitates, for the purposes of automation and simplicity, algorithms for determining attention. This, in turn, is problematic due to an inherent limitation in eye-trackers: with eye-tracker data, attention can only be determined insofar as it follows gaze direction. It is possible for a human to separate these concepts, to have the attention fixed on something that is not in the eye’s focus; this is called covert attention. However, since there is currently no way of directly determining the actual point of attention, and assuming only overt attention is a decent approximation, that issue is best avoided. Hence, algorithms are developed to distinguish between different movements of the eye, something that of course is interesting in and of itself. The most common algorithms distinguish between fixations and non-fixations or saccades and non-saccades; this can be done by, for example, analysing eye velocity. Algorithms for classifying smooth pursuit movements are far less common; the purpose of this thesis was to evaluate some of them.


This thesis was done in collaboration with Tobii Technology, a Swedish company based in Stockholm that develops remote eye-trackers. Remote eye-trackers are non-intrusive and allow for a certain degree of movement of the head, enabling a more natural environment for the subjects. Their trackers collect data at frequencies of 30 to 300 Hz, a range that calls for different algorithms, since, for example, estimates of velocity in short movements, such as saccades, become less reliable at lower frequencies.

1.1 Scope and Aim

The purpose of this study is to evaluate available algorithms for detecting smooth pursuit eye movements: two simple methods based on thresholds on gaze velocity and dispersion; one based on the Kalman filter; and two based on estimating the apparent changes of the gaze trajectory. Note that most of these were originally designed for real-time (or online) detection whereas this thesis evaluates their offline performance.

The focus is the detection and classification of smooth pursuits, but the classification of fixations is also studied. This classification is done on data collected in experiments where participants were asked to follow a preprogrammed stimulus, a circular disk moving in certain patterns to prompt the different eye movements: staying still to prompt fixations, moving steadily to prompt smooth pursuits, and doing immediate jumps in position to prompt saccades. The knowledge of the stimulus movements is used to generate a reference classification to which the results of the algorithms are compared, without any manual classification.

1.2 Limitations

Using the movement of the stimulus as a reference will always result in a slight error due to the inexactness of the human visual system, be it from catch-up saccades or the variable latency of the visual system’s response (which also differs between the different types of movements). The alternative, to construct a reference classification by hand, while considered better, is both laborious and subjective, and it is still affected by system noise. The stimulus used contained only fairly long movements, which made the automatic reference classification a better fit, but there was no measure of how the algorithms handled classification of shorter movements. Furthermore, only a few jumps in the stimulus to prompt saccades were present in the experiments, and the only transitions were between fixation and saccades, and pursuits and fixations, which further reduced the types of eye movements that were analysed. While using a more freely generated stimulus might have simulated real-life situations better, these restrictions meant that results from different subjects were more easily compared, reducing potential bias.


Chapter 2

Background

2.1 Physiology of the eye

Figure 2.1: Diagram of the human eye.

The human eye as an optical system consists roughly of four parts, see figure 2.1. As light enters the eye it is first refracted by the cornea, the protrusion at the front of the eye. With no muscles controlling the shape of the cornea, its focus is fixed.

Behind the cornea is the iris, an opaque structure which controls the size of the pupil, the aperture in its center. Controlling the aperture means controlling the amount of light admitted, as well as the sharpness of the image by controlling how collimated¹ the admitted light is.

As the light continues it passes through the lens. The refractive power of the lens is not as great as that of the cornea, but its shape can be changed by the ciliary muscles to control the focal distance. This controls the distance at which objects have to be in order to give a sharp representation on the eye’s image plane, the retina.

The retina is composed of layers of cells at the rear interior surface of the eye, where the most important cells are the two types of photoreceptors, the rods (sensitive to dim and achromatic light) and the cones (sensitive to brighter chromatic light). The distribution of the photoreceptors across the retina is such that the concentration is the greatest close to the center of the visual axis, a region of the retina called the fovea. Outside of the fovea, the concentration drops quickly, with the concentration of cones declining at a much faster rate than that of rods. This concentration of visual intake from a relatively small region around the visual axis (the “useful” region of the visual field is often said to be 10° compared to our 180° field of vision [2]) means that the eye needs to move in order to accurately take in a larger scene.

¹ Light is collimated when its rays are parallel, meaning they don’t diverge as they propagate, and the more the light rays from an object diverge before they reach the retina, the fuzzier the image on the retina becomes. A small aperture will admit less divergent light rays than a large aperture, and therefore result in a sharper image.

2.2 Movements of the eye

In this thesis, three classes of movements are considered: fixations, saccades, and smooth pursuit. Other types, such as nystagmus, do not usually need to be distinguished for the purposes of eye-tracking.

2.2.1 Saccades

To change its focus from one point to another, the human eye does not, in general, move its gaze smoothly between the two points, but must do so in a quick, jump-like manner; this movement is called a saccade. Saccades can be both voluntary, as in a conscious change of focus, and involuntary, as a reflex. They are generally regarded as ballistic movements (meaning they are preprogrammed in such a way that they do not change during the course of the movement²) and carry with them a refractory period (a period after execution where a new saccade can’t be initiated, reported to be approximately 150 ms [3]).

A saccade is the fastest movement a human can make, with reported peak velocities around 800 deg/s and accelerations around 30000 deg/s², while they typically last for 20 to 100 ms [2]. During a saccade a person is virtually blind, and conscious vision is gradually returned at the end of the saccade [4]. This phenomenon, called saccadic suppression, is an important reason for distinguishing saccades from other movements of the eye. Since the return of full vision and the end of the physical saccade don’t coincide, what constitutes the end of the saccade is debated. In addition, after the main part of the saccade, smaller correctional movements called glissades could occur; it is uncertain what impact these movements have on vision, which can be a problem when restricting classification to saccades and fixations for the purpose of identifying attention. However, this can be partially worked around by classifying the glissades separately, as done recently by M. Nyström [5].

2.2.2 Fixations

A fixation is when a person keeps their focus still on one static spot. Although it may seem counterintuitive that this would involve movement, there are several small involuntary movements included in a fixation. This is partly due to how the retina’s photoreceptors work, responding to differences in the received light and not to an absolute measure; this leads them to saturate after a while if given a constant input, and no longer register that constant light [6]. Although this is one reason for there to be movements during a fixation, the particular reasons for each of the different fixational eye movements are still not known, though many theories exist [7].

² This is not undisputed; it is also argued that they might just appear ballistic because of their speed [2].

For reference, the identified fixational eye movements are: ocular microtremor, a constant high-frequency low-amplitude movement of the eye; ocular drift, a slow movement of the visual axis away from the original point of focus; and microsaccades, small (typically 1 to 25 arc minutes [8]) saccades occurring about once or twice per second. Note that all these movements are involuntary.

2.2.3 Smooth Pursuit

When trying to keep focus on an object that is moving, the eye can follow that movement in a smooth manner, called smooth pursuit. This is, with few exceptions, only possible when there actually exists an object to follow. This pursuit is mostly, as the name suggests, smooth, but the pursuit is not perfect; it tends to drift off the object of focus, and a certain type of correctional saccades, called catch-up saccades, are needed to correct the pursuit. As these movements are governed by different processes, the smooth pursuit does not halt during a catch-up saccade, but the movements are superimposed [9]. For the purpose of classification, it is worth noting that the upper limit of smooth pursuit velocities is somewhere around 90 deg/s [10], depending on the subject, while the peak velocities during saccades surpass that limit for all but the smallest (< 1 deg) saccades [11].

For a more in-depth explanation of the physiology and the movements of the eye, as well as the implications for eye-tracking, see Duchowski’s “Eye tracking methodology: Theory and practice” [2].

2.3 Classification ideas

As previously stated there are many applications for eye-tracking, and different applications will need to process the gaze data differently. There is however most often a need to classify the data into the different classes of eye movements (by analysis or even assumption), and for most purposes of studying overt attention (and many other purposes as well), the three categories in the previous section are sufficient. However, when using fixed stimuli (such as pictures or text), there will be no movement for the eye to pursue, so the smooth pursuit movements can be disregarded. Fixed stimuli being historically common has meant that research on algorithms for automatic classification has mostly been limited to fixations and saccades. This has seen a slight change recently, as researchers interested in gaze-contingent displays (displays providing an interface to be controlled by gaze) have explored the possibility of identifying this kind of eye movement as part of that interaction [12, 13]. However, these algorithms are designed for real-time classification, and so could possibly be improved for offline classification.

The classification of fixational and saccadic eye movements revolves around two main ideas: differentiation based on dispersion and on velocity, respectively. Dispersion-based methods try to find sequences of gaze points spatially close enough, and spanning a time long enough, to be considered fixations; points not belonging to such a sequence are labeled non-fixations, or simply, slightly inaccurately, saccades. A velocity-based method tries to extract the instantaneous velocities from the gaze data, comparing them to a threshold that would separate the fixations from the saccades. There are also other, less commonly used methods, such as hidden Markov models and minimum spanning trees. For a comparison and overview of a couple of these, see [14].

Despite the developments in classification algorithms, the most well regarded method for classifying data, whether it includes smooth pursuit or not, remains that of manually classifying (or tagging) data [15].

In the following sections the analyzed methods for detecting smooth pursuit movements are detailed. Note that not all of these algorithms include a way of distinguishing between fixational and smooth pursuit movements in their original conception. The value of such a method is realized, when considering saccadic suppression, as an attention filter in the sense that it distinguishes movements that maintain visual perception, or attention, namely fixations and smooth pursuits, from those that don’t, namely saccades (the word attention is used here for convenience, as no corresponding term was found in current literature).

2.3.1 Velocity and Dispersion Thresholds

A simple way of distinguishing pursuit from saccadic eye movement is to put a threshold on the velocity: a movement faster than this threshold would be considered a saccade, while a slower movement would be classified as a pursuit or a fixation. This is similar to velocity-based methods for the classification of fixations. This method holds merit by the difference in velocities inherent to saccadic and non-saccadic eye movements. As with its fixational counterpart, problems arise with the velocity filtering of the temporally discrete data; short saccades will appear to be made at much slower velocities, and perhaps be misclassified as pursuits. The higher threshold needed for pursuits, as compared to only fixations, will only make this problem more apparent.

While this method works well for classifying fixations in the absence of smooth pursuits, it is fairly weak in their presence. This is because, as noted earlier, the eye is not still during a fixation: several fixational eye movements exist with non-negligible velocities, while there is no real lower velocity limit of pursuits. If it is known beforehand that the stimulus used does not contain movements slower than a certain velocity, it is fairly certain that the velocities of pursuits will abide by a similar limit (or a slightly slower one, given the imperfect nature of pursuits). If that limit is high enough, and note that this limit should be in angular velocity of the eye, this method could be successfully applied; however, this special case is not considered in this thesis.
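
As a minimal sketch of the velocity threshold described in this section, the following Python fragment labels samples as saccadic or non-saccadic. It assumes the gaze is already given as horizontal and vertical angles in degrees, and it approximates the angular distance between consecutive samples by their Euclidean distance, which is only reasonable for small steps. The function names and the default threshold of 70 deg/s are illustrative assumptions, not parameters taken from the evaluated algorithms.

```python
import math


def angular_speeds(samples, dt):
    """Speed in deg/s for each sample of a list of (x_deg, y_deg) gaze angles.

    dt is the sampling period in seconds. The first sample has no
    predecessor, so it is given the same speed as the second sample.
    """
    steps = [math.hypot(x1 - x0, y1 - y0) / dt
             for (x0, y0), (x1, y1) in zip(samples, samples[1:])]
    if not steps:
        return [0.0] * len(samples)
    return steps[:1] + steps


def classify_by_velocity(samples, dt, saccade_threshold=70.0):
    """Label each sample 'saccade' or 'non-saccade' (threshold is an assumed value)."""
    return ["saccade" if v > saccade_threshold else "non-saccade"
            for v in angular_speeds(samples, dt)]
```

Note that a short saccade sampled at a low frequency contributes only one or two large steps, which is the reason given above for why the velocity estimate becomes less reliable for such movements.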

An expansion of this idea is to limit the velocity threshold to identifying only saccades, and subsequently distinguish between fixations and pursuits with a dispersion-based method, much like I-DT of [14]. That is, finding windows of non-saccades of a minimum length (such as 100 ms), where the distance between any two points does not exceed a certain threshold (in the order of 1 deg of the visual field). This would alleviate problems with short but fast fixational movements, without affecting pursuits of similar velocities. Sufficiently slow pursuits would still be problematic, as any time-limited dispersion threshold implies a velocity threshold, and one can only hope that this threshold is so low as to be inconsequential. In fact, Komogortsev analysed these algorithms [16], calling them I-VVT and I-VDT (the first letter V standing for the velocity threshold to separate saccades, and the second letter for velocity or dispersion for separating fixations and pursuits), and found the former lacking, while the latter performed among the best of the algorithms analysed.
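
The dispersion step can be sketched as follows, assuming saccades have already been flagged by a velocity threshold. The window growth follows the general I-DT pattern; the function names, the minimum length of 10 samples (100 ms at an assumed 100 Hz), and the 1 deg threshold are illustrative assumptions rather than the parameters evaluated in this thesis. Samples that are neither saccades nor part of a compact window are left labelled as pursuits.

```python
import math


def max_pairwise_distance(points):
    """Largest distance between any two points (the dispersion measure used here)."""
    return max((math.dist(p, q)
                for i, p in enumerate(points) for q in points[i + 1:]),
               default=0.0)


def split_fixations_pursuits(samples, is_saccade, min_len=10, max_dispersion=1.0):
    """Label samples 'saccade', 'fixation' or 'pursuit'.

    samples: list of (x_deg, y_deg); is_saccade: one boolean flag per sample.
    """
    labels = ["saccade" if flag else "pursuit" for flag in is_saccade]
    i, n = 0, len(samples)
    while i + min_len <= n:
        j = i + min_len
        if any(is_saccade[i:j]) or max_pairwise_distance(samples[i:j]) > max_dispersion:
            i += 1  # no compact saccade-free window starts here
            continue
        # Grow the window while it stays saccade-free and compact.
        while (j < n and not is_saccade[j]
               and max_pairwise_distance(samples[i:j + 1]) <= max_dispersion):
            j += 1
        labels[i:j] = ["fixation"] * (j - i)
        i = j
    return labels
```

With these assumed values, a pursuit slower than roughly 1 deg per 100 ms stays within the dispersion limit and is labelled a fixation, which is the implied velocity threshold mentioned above.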

2.3.2 Angular Dispersion Threshold

In his PhD thesis [12], S.A. Lopez proposed a new method of distinguishing the different eye movements. The main idea is to separate fixations and pursuits by analyzing the trajectory that the gaze data follows; during a fixation the eye moves seemingly at random, while during a pursuit the trajectory would appear smoother, even with noise. To measure this property of the trajectory, the standard deviation of the change in direction (as measured by the angle between regularly fitted lines) is used.

The first part of the algorithm amounts to separating the saccades from the other eye movements; this is done as in the previous algorithm with a threshold on the angular velocity of the eye. A point in the discretely measured time series with a velocity below this threshold is added to a moving window of a predefined maximum size N. When a point is added to a non-empty window, a line is fitted to all the points in the window, and the direction of this line is compared with the previously fitted line of the window (if such a line exists) to get a change in direction in the form of an angle. The standard deviation is then calculated for the angles in a window. A threshold α is set on this deviation, such that points belonging to a window with a lower standard deviation are considered part of a smooth pursuit, and those with a higher standard deviation are classified as fixations.
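
The direction-change measure can be illustrated with the sketch below. The description above does not fix the line-fitting method or exactly how the window advances, so the sketch makes two assumptions: the line is fitted by total least squares (the principal axis of the points), and the window simply holds the most recent `max_window` non-saccadic points. All names are hypothetical.

```python
import math
import statistics


def line_direction(points):
    """Orientation in radians of the total-least-squares line through the points."""
    mx = sum(x for x, _ in points) / len(points)
    my = sum(y for _, y in points) / len(points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return 0.5 * math.atan2(2 * sxy, sxx - syy)


def direction_change_std(points, max_window=5):
    """Standard deviation of the change in fitted-line direction as points are added."""
    changes, previous = [], None
    for end in range(2, len(points) + 1):
        window = points[max(0, end - max_window):end]
        direction = line_direction(window)
        if previous is not None:
            diff = direction - previous
            # A fitted line has no heading, so wrap the change to [-pi/2, pi/2).
            changes.append((diff + math.pi / 2) % math.pi - math.pi / 2)
        previous = direction
    return statistics.pstdev(changes) if changes else 0.0
```

A window would then be labelled a pursuit when the returned value falls below the threshold α, and a fixation otherwise.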

In the master’s thesis of L. Larsson [17], this algorithm is analysed, and the last part of analysing the angles is discussed in particular. Observing that the circular nature of angles can make statistics on them unreliable, she proposed two alternative methods to analyse the gaze trajectory taking that into account. Instead of analysing the change of angles between subsequently fitted lines, only the angle of the line between subsequent points is analysed, converted to points on the unit circle. The mean of the vectors to all such points in a window is called the circular mean, and defines a mean for the directions of the corresponding lines. The length of this mean vector is a measure of the directional bias, so a threshold on this value could separate fixations in a similar way as Lopez’ original idea. The second alternative proposed by Larsson was to do a Rayleigh test on the mean vector, defined by:

\[ p = \exp\left( \sqrt{1 + 4N + 4N^2 (1 - R^2)} - 1 - 2N \right) \]

where R is the length of the earlier mentioned mean vector, and N the number of points in the window. The resulting value p gives a probability of the angles being uniformly distributed around the unit circle. Setting a threshold on this probability will then equate to identifying directionally random movements, or fixations, with a certain significance, to be separated from directed movements, or pursuits.
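
The two quantities can be computed as in the sketch below. It assumes gaze points in any consistent planar unit, and it passes the number of sample-to-sample directions as N, which is one less than the number of points in the window; whether the original uses the point count or the direction count is not settled by the text above. The function names are hypothetical.

```python
import math


def mean_resultant_length(points):
    """Return (R, n): length of the mean unit vector of the n sample-to-sample directions."""
    angles = [math.atan2(y1 - y0, x1 - x0)
              for (x0, y0), (x1, y1) in zip(points, points[1:])]
    c = sum(math.cos(a) for a in angles) / len(angles)
    s = sum(math.sin(a) for a in angles) / len(angles)
    return math.hypot(c, s), len(angles)


def rayleigh_p(r, n):
    """Approximate Rayleigh-test p-value for mean resultant length r over n directions."""
    return math.exp(math.sqrt(1 + 4 * n + 4 * n * n * (1 - r * r)) - 1 - 2 * n)
```

For perfectly aligned directions R is 1 and p becomes very small, so the window would be treated as directed movement; for R close to 0 the value of p approaches 1, which points to a fixation.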

In her thesis, the methods were analysed, grouping the data in blocks of 50 ms and adding a dispersion threshold to fixations, and both the new methods were found to perform better than the original, with the Rayleigh test being more robust than the mean-vector threshold.

2.3.3 Autoregressive Kalman Filter

This algorithm is based on the Kalman filter; for an overview see appendix B.

As proposed by D. Sauter in 1991 [18], the idea behind this algorithm is that the non-saccadic eye movements could be modelled fairly accurately with some simple model, and that saccadic movements follow a different enough model so that they can be distinguished by hypothesis testing.

Sauter notes that while the point of gaze, during non-saccadic movement, does not necessarily keep a constant mean for short time intervals, it can be considered that the velocity of the eye movement does; this would mean that saccades could be seen as jumps in this mean. Thus the sequence w(t) approximating the velocity of the eye-movement signal, calculated as the backward difference, could be modelled by an AR-process corrupted by a white Gaussian random noise ε(t),

\[ w(t) = a_1 w(t-1) + \dots + a_m w(t-m) + \varepsilon(t). \]

No specific AR-model is proposed; rather, the generation of one is detailed in a three-step process as follows.

1. Determination of the order m of the AR-model by inference of the autocorrelation function.

2. Estimation of the model parameters a_i. This is done by minimizing the following loss function by means of the least-squares method

\[ J = \sum_{t=m}^{N} \hat{\varepsilon}(t)^2, \qquad \hat{\varepsilon}(t) = w(t) - a_1 w(t-1) - \dots - a_m w(t-m). \]

3. Validation of the model by testing the whiteness of the residuals ε̂(t) using their autocorrelation function as a guide. If sufficient whiteness is not verified, the model order m has to be changed and the procedure continued from step 2.
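
Step 2 of the procedure can be sketched in plain Python by solving the normal equations of the least-squares problem. The small Gaussian elimination routine stands in for a linear algebra library, and all function names are hypothetical; this illustrates the estimation step only and is not Sauter's implementation.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ar(w, order):
    """Least-squares estimate of [a_1, ..., a_m] for the velocity sequence w."""
    rows = [[w[t - i] for i in range(1, order + 1)] for t in range(order, len(w))]
    targets = w[order:]
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(order)]
            for i in range(order)]
    rhs = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(order)]
    return solve_linear(gram, rhs)


def residuals(w, coeffs):
    """Residuals w(t) - a_1 w(t-1) - ... - a_m w(t-m) for t = m, m+1, ..."""
    m = len(coeffs)
    return [w[t] - sum(a * w[t - i - 1] for i, a in enumerate(coeffs))
            for t in range(m, len(w))]
```

The residuals returned by the last function are the sequence whose autocorrelation would be inspected in step 3.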

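The resulting AR-model becomes the basis of a model M for the Kalman filter as follows

\[ \begin{cases} X(t+1) = A X(t) + B \varepsilon(t) \\ w_m(t) = C X(t) + \nu(t) \end{cases} \]

where X(t) is the state vector, A the state transition matrix, Bε(t) the process noise, C the measurement matrix, and ν(t) the measurement noise, defined as follows:

\[ X(t) = \begin{pmatrix} x_1(t) & \dots & x_m(t) \end{pmatrix}^T, \qquad x_i(t+1) = w_i(t-m+i) \]

\[
A = \begin{pmatrix}
0 & 1 & 0 & \cdots & 0 \\
\vdots & & \ddots & & \vdots \\
\vdots & & & \ddots & 0 \\
0 & \cdots & \cdots & 0 & 1 \\
a_m & \cdots & \cdots & a_2 & a_1
\end{pmatrix},
\qquad
C^T = B = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}
\]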

To test the validity of the hypothesis that the eye moves according to the given model M, an indicator of the model’s accuracy is needed. To this end, the innovation of the Kalman filter is used; this is defined as the difference between the measured and the modelled signal (as predicted by the Kalman filter). Given an adequate model and non-saccadic movement, the innovation γ(t) would be a sequence of zero mean and a known variance s(t); thus if it shows a different behavior, a saccadic movement is presumed. This is determined with a chi-square test by calculating the statistic l(t, t−NT) defined as

\[ l(t, t - NT) = \sum_{k=0}^{N} \frac{\gamma^2(t - kT)}{s(t - kT)} \]

with T being the sampling period, and comparing this to a chi-square distribution of N degrees of freedom χ_N. A saccade is detected if the cumulative distribution function of χ_N at l is greater than a significance level α.
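
The decision rule can be sketched in a few lines. Instead of evaluating the chi-square cumulative distribution function, the sketch compares the statistic with a critical value supplied by the caller (for example taken from a chi-square table), which is equivalent for a fixed significance level. The function names and the example numbers are illustrative assumptions.

```python
def chi_square_statistic(innovations, variances):
    """Sum of squared innovations, each normalised by its variance, over one window."""
    return sum(g * g / s for g, s in zip(innovations, variances))


def is_saccade(innovations, variances, critical_value):
    """Flag the window as saccadic when the statistic exceeds the critical value."""
    return chi_square_statistic(innovations, variances) > critical_value
```

As a usage example, with a window of five innovations and the tabulated 95 % quantile for 5 degrees of freedom, 11.07, `is_saccade([1, -1, 0.5, -0.5, 1], [1] * 5, 11.07)` evaluates to `False`, since the statistic is 3.5.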

Sauter’s algorithm, in its original form, does not distinguish between fixational eye movements and smooth pursuit, and so is an example of an attention filter.

2.3.4 Attention Focus Kalman Filter

As proposed by Komogortsev and Khan [13], this method builds on Sauter’s, using a chi-square test on innovations to filter out saccades. It differs in that the underlying Kalman filter is based on a position-velocity model, with two state vectors:

\[ x_k = \begin{pmatrix} \Theta_x(k) \\ \dot{\Theta}_x(k) \end{pmatrix}, \qquad y_k = \begin{pmatrix} \Theta_y(k) \\ \dot{\Theta}_y(k) \end{pmatrix}, \]

where Θ_x(k) represents the horizontal (x) position at the discrete time point k, and Θ̇_x(k) the corresponding velocity, and equivalently for the vertical (y) position and velocity. The state-transition matrices A_k and the observation matrices O_k are simply defined as

\[ A_k = \begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix}, \qquad O_k = \begin{bmatrix} 1 & 0 \end{bmatrix}, \]

with Δt the system’s sampling interval. Note that the observation matrix only extracts the position from the state vector; thus the position is the only input to the Kalman filter and the velocity is estimated by the filter itself. This differs from Sauter’s idea, where the velocity, taken as the backward difference, is the input to the Kalman filter. Another subtle difference is the definition of the innovation. Although both calculate it as the difference between the filtered and unfiltered velocities, for Sauter this is the innovation of the Kalman filter, whereas that is not the case for this algorithm (as the underlying model is based on position, not velocity).
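
A minimal one-dimensional sketch of such a position-velocity filter is given below, written out for the 2×2 case without a matrix library. The process noise q (added to both diagonal elements of the covariance), the measurement noise r, and the initial covariance are hypothetical tuning values that the description above does not specify; the same filter would be run separately for the horizontal and the vertical coordinate.

```python
def kalman_position_velocity(measurements, dt, q=1.0, r=1.0):
    """Filter 1-D gaze positions; returns a list of (position, velocity) estimates."""
    pos, vel = measurements[0], 0.0
    p00, p01, p10, p11 = 1.0, 0.0, 0.0, 1.0  # state covariance P (assumed start)
    estimates = []
    for z in measurements:
        # Predict: x = A x and P = A P A^T + Q, with A = [[1, dt], [0, 1]].
        pos = pos + dt * vel
        a00 = p00 + dt * (p10 + p01) + dt * dt * p11 + q
        a01 = p01 + dt * p11
        a10 = p10 + dt * p11
        a11 = p11 + q
        # Update with O = [1, 0]: only the position is observed.
        innovation = z - pos
        s = a00 + r
        k0, k1 = a00 / s, a10 / s
        pos += k0 * innovation
        vel += k1 * innovation
        p00, p01 = (1 - k0) * a00, (1 - k0) * a01
        p10, p11 = a10 - k1 * a00, a11 - k1 * a01
        estimates.append((pos, vel))
    return estimates
```

For a noise-free ramp such as `[2.0 * k for k in range(50)]` with `dt=1.0`, the velocity estimate settles near 2, illustrating that the velocity is recovered from position measurements alone.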

Komogortsev further uses a result from T. Grindinger [19], according to which the filter behaves better if the variance of the system s(t) is kept fixed, and recommends a constant δ² = 1000.

Fixations are then classified as non-saccadic movement with a velocity below a certain threshold for some minimum duration. Smooth pursuits, in turn, are classified as non-saccadic and non-fixational movements not exceeding an upper velocity threshold. Note that this means there may also be non-classified data points.

This algorithm simplifies Sauter’s by specifying a model to be used, thereby simplifying its implementation. However, the velocity thresholds, though fixed in the paper, would be best left as parameters, as the exactness with which they can be specified depends on the peculiarities of the system used, and so complicates the use. Furthermore, for the detection of fixations, the threshold is dependent on the system noise (which would increase the apparent fixational velocities), and would indiscriminately classify slow-enough pursuits as fixations (a flaw inherent to any purely velocity-based detection of fixations). It should be noted though, that this algorithm, as well as Sauter’s, is built for online classification, where the choice of algorithm also depends on computational complexity.

2.3.5 Data Filtering

Classification algorithms are usually designed for data being in the form of actual gaze position or velocity. Naturally, the eye-tracker can only estimate the position at discrete time points, with noise, and only when the eye is visible to the tracker. Thus de-noising or filtering becomes important for these algorithms. Three such topics where filtering is important are given a short overview here.

Angular filtering

The natural way to describe eye movements is to use the angle of the visual axis (compared to the visual axis of the eye at rest). Natural thresholds of eye-movement velocities and fixational dispersion are indeed given in such angles, and their use brings the additional advantage of being independent of the stimulus distance. For display purposes using a computer screen, however, a more practical measure is the gaze position on that screen, which is one probable reason why raw data from eye-trackers is given in such a way.

To bridge this discrepancy and calculate the angle from the gaze position on a screen, the head distance (relative to the screen) as well as the head direction would be needed. The first is given by Tobii eye-trackers, whereas the second poses a technically more difficult problem. Fortunately, for the study of attention the absolute value of the angle is not needed; properties such as velocity and dispersion can be calculated using only differences in angle, which are independent of head direction.
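As an illustration of working with differences in angle, the following minimal sketch computes the visual angle between two on-screen gaze points as seen from the eye. The function names, the millimetre units, and the assumption that the eye position is known in a coordinate system where the screen is the plane z = 0 are all choices made for this example, not part of the Tobii API.

```python
import math

def visual_angle_deg(p1, p2, eye):
    """Angle in degrees between the lines of sight from `eye` to p1 and p2.

    p1, p2: (x, y) points on the screen plane z = 0, in mm (assumed units).
    eye:    (x, y, z) eye position in the same coordinate system, in mm.
    """
    v1 = (p1[0] - eye[0], p1[1] - eye[1], -eye[2])
    v2 = (p2[0] - eye[0], p2[1] - eye[1], -eye[2])
    dot = sum(a * b for a, b in zip(v1, v2))
    cross = (v1[1] * v2[2] - v1[2] * v2[1],
             v1[2] * v2[0] - v1[0] * v2[2],
             v1[0] * v2[1] - v1[1] * v2[0])
    cross_norm = math.sqrt(sum(c * c for c in cross))
    # atan2 stays accurate for the very small angles between consecutive frames.
    return math.degrees(math.atan2(cross_norm, dot))

def angular_speed(p1, p2, eye, dt):
    """Angular speed in deg/s between two gaze samples taken dt seconds apart."""
    return visual_angle_deg(p1, p2, eye) / dt
```

For example, with the eye 600 mm in front of the screen origin, a point 600 mm to the side of the origin lies at a visual angle of 45 deg from it.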

Noise Removal

There exist a myriad of ways to remove noise from a given signal, and their respective performance depends on the characteristics of both the signal and the noise, as well as on what analysis is to be made of the filtered data. For example, a simple moving average filter will attenuate the fast but small movements during a fixation, and thus might be favorable for a dispersion-based classification, but it will also smooth out the longer saccades so that they seem to attain a much lower maximum velocity, which would be devastating for a velocity-based classification. This thesis uses the Savitzky-Golay filter, as used by M. Nyström [5], when filtering is needed. It works by fitting a polynomial of degree n to a window of size m around the point of interest, where n and m are parameters to the filter.

Most filters are designed to remove relatively random and low-amplitude noise, which suits most eye-tracker data, with the exception of blinks. Blinks are essentially data loss, and closed eyes can be recognized as such, but eye-tracking data might also contain frames from half-closed eyes. These might confuse the eye detection of the tracker and lead it to miscalculate the gaze direction. How to detect, interpolate, and classify the affected data is non-trivial, and will not be discussed in depth here. However, a common and easy way to deal with blinks in classification analysis is to simply remove them, when detected, together with some surrounding frames deemed corrupted by them, from the collected data.

Velocity Filtering

The eye velocity and its acceleration are very useful properties for distinguishing between the different eye movements. This information, though, is not readily available from the raw eye-tracker data, as it includes only positional data for certain time points. The problem becomes that of numerical differentiation, and due to the noise present in most eye-tracker data, simple methods such as forward difference do not provide good results³. Better results are instead obtained using filters that take advantage of other properties of the eye-position signal.

A recent study [17] of the performance of such filters was made by L. Larsson, recommending two filters: the Savitzky-Golay filter [20], which works by local polynomial regression, and a filter used by Engbert and Kliegl [21], which takes the moving average of central differences to estimate velocity. Although Engbert’s filter is easier to construct and slightly faster, the Savitzky-Golay filter has the advantage of simultaneously calculating all derivatives and smoothing the original signal.
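The two velocity estimators can be sketched for one coordinate as follows. This is a minimal illustration, not the implementation used in the thesis. For the Savitzky-Golay case it relies on the closed form of the first derivative of a local linear or quadratic least-squares fit on a symmetric window. For the Engbert and Kliegl case it assumes the commonly cited five-sample form of their filter. The function names and the convention of returning `None` at the edges are choices made for this example.

```python
def savgol_velocity(x, half_width, dt):
    """First derivative of a local quadratic least-squares fit (one coordinate).

    On a symmetric window the slope at the centre reduces to
    sum(k * x[i + k]) / (dt * sum(k * k)), for k = -half_width..half_width.
    Samples too close to the edges are returned as None.
    """
    ks = range(-half_width, half_width + 1)
    norm = dt * sum(k * k for k in ks)
    out = [None] * len(x)
    for i in range(half_width, len(x) - half_width):
        out[i] = sum(k * x[i + k] for k in ks) / norm
    return out

def engbert_kliegl_velocity(x, dt):
    """Moving average of central differences over five samples (assumed form)."""
    out = [None] * len(x)
    for i in range(2, len(x) - 2):
        out[i] = (x[i + 2] + x[i + 1] - x[i - 1] - x[i - 2]) / (6 * dt)
    return out
```

With `half_width=1` the first function reduces to the ordinary central difference `(x[i + 1] - x[i - 1]) / (2 * dt)`.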

2.4 Evaluation of methods

The most obvious way of evaluating the performance of a given algorithm is to compare its classification to a known correct classification. The lack of such a reference, however, poses a persistent problem in this kind of evaluation. There is, in fact, no general consensus on exactly what constitutes the different eye movements, especially regarding the onset and offset of saccades. This is especially problematic since the most well-regarded method of classification is manual classification of the data by an expert in the field, and the most well-regarded method of evaluation is, accordingly, a comparison with such manually classified data. The lack of consensus is an issue for the eye-tracking community, but even given one, manual tagging would remain a time-consuming ordeal, and its credibility dependent on the experience of the tagger. There are two general ways of evaluating classification methods while circumventing manual tagging, which will be explained in the following sections.

2.4.1 Properties of classification

Even without any ground truth for the classification, it is possible to indirectly analyse the performance of different algorithms if there are some secondary characteristics that can be extracted, together with some expectations on them. For example, known limits on the velocity of the different eye movements can be compared to the velocities of the movements found by a classification.

Intrinsic properties of eye movements

There have been many studies detailing the properties of the eye and its movements, such as physiological limits of velocities and durations during saccades or fixations, as well as attempts at modelling the human oculomotor system. These properties could be calculated from the gaze data and, with a classification, compared with those intrinsic properties of the corresponding eye movements. If the classification is good, the physiological limits should be upheld, bar system noise.

³ The result is very noisy, since any system noise is amplified by the small time intervals between frames.

Nyström did such an analysis [5] when proposing a new classification algorithm (classifying fixations, saccades and glissades). For data classified with the proposed algorithm, as well as two others (one dispersion-based, one velocity-based, from [14]), he plotted the distributions of fixation and saccade duration, saccade peak velocity and saccade peak acceleration. While the proposed algorithm produced “natural”-looking, smooth distributions, the others didn’t fare as well. The dispersion-based algorithm, for example, produced a fixation duration distribution with a peak at, and a bias towards, the minimum allowed duration (which is a parameter for the algorithm), hinting that the algorithm’s selection criteria are not natural.

General event properties – Implied subject behavior

Other more generic metrics could be built by analysing general properties of the classified events, such as average number or duration of saccades. However, without a naturality condition to appeal to, or any ground truth to compare with, they lack context. If such metrics are compared between different classifications of the same data, however, outliers could be identified and further analysed.

Komogortsev [22] includes average number of fixations and saccades, as well as average fixation duration and saccade amplitude, in an analysis of existing classification algorithms. These metrics were found to be of limited interest, though, and he instead leverages stimulus information to propose a couple of new metrics, which will be discussed in the following section.

2.4.2 Stimulus-informed metrics

Any method that could automatically tag arbitrary data for comparison would actually be a classification algorithm, and as such, will never be a feasible basis for evaluating other algorithms. However, this can be alleviated if the stimuli can be controlled and registered, and the test subject instructed to follow them. The idea is then that the different movements of the stimuli would prompt specific movements of the eye: a fixed stimulus would prompt a fixation, a moving stimulus would prompt a smooth pursuit, and a quick jump would prompt a saccade. If the stimuli are recorded along with the eye-tracking data, a classification of the latter could then be compared to the movements of an ideal observer of the former.

Stimuli as ground truth

The easiest way to use the stimulus information is to make it the ground truth for further evaluation. All the usual performance metrics can then easily be calculated, such as precision, recall, F-score, and so on.
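As a minimal sketch of such a calculation, the function below computes precision, recall and F-score for one movement type from two per-frame label sequences, with the stimulus-derived labels acting as ground truth. The single-character labels ("F", "P", "S") and the function name are assumptions made for the example.

```python
def precision_recall_f1(truth, predicted, label):
    """Per-frame precision, recall and F1 for one movement label.

    truth:     per-frame labels derived from the stimulus (ground truth).
    predicted: per-frame labels produced by a classification algorithm.
    """
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Each movement type is scored separately, so a three-class classification yields three such triples.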

An example of such an evaluation was done by S.A. Lopez, evaluating the angular dispersion algorithm proposed in [12]. He created stimuli where a circle would alternate between staying fixed at one point and moving between points (instantaneously as a jump, or over time in a smooth movement). Using the knowledge of the stimuli, he then measured the delay of onset (how long after the stimulus changed his algorithm registered the new type of movement) and the ratio of detection (how many data points were classified as the same type of movement as the stimuli).

While these measures are simple to produce, it is important to realize that they are difficult to analyze conclusively; even disregarding systemic errors or blinks, a test subject cannot follow a stimulus perfectly. For example, given a jump in the stimulus, prompting a saccade, it can potentially take 200 ms for the test subject to react⁴, and for the saccade (and perhaps a potential glissade) to finish. While this is a problem for producing absolute performance metrics, these methods can still be useful for comparisons on the same data. Presumably, the human tracking error would then affect the performance of the different algorithms or parameters similarly, so that the results can be compared.

Qualitative and Quantitative Scores

Komogortsev and Khan, in a paper trying to find a common ground on which to compare algorithms [22], proposed a set of measures to use on gaze data and stimuli. These so-called qualitative and quantitative scores compared the classification with the stimuli in a way appropriate for saccades and fixations, together with a formula to calculate the ideal score, based on the stimuli and expected eye movement latencies. In a later paper [16], these measures were expanded with scores for smooth pursuits, and the calculation of ideal scores was amended to take into account possible pursuits in the stimuli.

The quantitative scores resemble the sensitivity (or true positive rate). Indeed, the Fixation Quantitative Score (FQnS) and the Smooth Pursuit Quantitative Score (PQnS) are exactly that, given the stimuli as ground truth: the percentage of correctly classified frames out of the total number of frames of the particular movement presented (for each movement separately). For a frame to be considered correctly classified, it should not only be found to be the correct movement, but the gaze should also be close enough to the stimulus to be considered related. The calculation of the ideal scores is a bit involved, but essentially they correspond to a semi-ideal observer that suffers from a transition latency and non-zero saccadic durations. Since saccades are immediate in the stimulus, the sensitivity can’t be calculated in the same way as for fixations and pursuits. Instead, the Saccade Quantitative Score (SQnS) is calculated as the ratio of the total amplitude of all found saccades to that of the saccades in the stimuli. To avoid counting corrective saccades during smooth pursuit, only saccades found in a temporal window around presented saccades are counted. The ideal value for SQnS is simply 100%.

⁴ A value dependent on many things such as age and viewing distance [23], as well as stimuli [24].

The qualitative scores represent more general qualities of the eye movements as they are classified. The Fixation Qualitative Score (FQlS) is a measure of the (frame-by-frame) average distance between correctly identified fixations and the stimuli, where the position of a found fixation is calculated as the centroid of consecutive frames classified as fixations. For smooth pursuits, there are two qualitative scores: the positional PQlS_P, and the velocity-based PQlS_V. The former is similar to FQlS, being the average distance between correctly identified frames of pursuit and the stimuli, while the latter is simply the average absolute difference between the velocities of the stimuli and correctly identified pursuits. The ideal values are 0 deg for the positional scores and 0 deg/s for the velocity score.

In the expansion of these scores to smooth pursuit movements [16], Komogortsev further proposes a last score to deal with what he identifies as the most challenging misclassification in this now multiclass problem, namely distinguishing pursuits from fixations. He calls this the Misclassified Fixation Score (MisFix), which is calculated as the ratio of fixation frames in the stimuli classified as smooth pursuits. The ideal value for this metric takes into account the latency of the termination phase of pursuits prior to a fixation, as well as corrective saccades.

For more detailed information about these metrics, as well as evaluations of a set of algorithms, see Komogortsev’s papers [22] and [16].

Error Categorisation

Most of the metrics presented so far give a rather general score of how good a classification is, but disregard the time-signal nature of the data, and give little information as to the nature of the classification and where it fails. In the related discipline of activity recognition, Ward et al. [25] found a similar problem, and, analysing previous attempts in the field, proposed a set of performance metrics to clarify how classifications lined up with a ground truth, without having to manually study the data.

Their analysis proceeds in a couple of steps to find and categorise misclassifications in a manner that is easily overviewed. First off, to simplify analysis, each class is considered separately as a binary classification problem. Consecutive frames of identical classification are then merged into events. The sequence of events in the classification and ground truth could now be compared, and errors like insertions and deletions could be found, but the aim is to provide a richer categorisation, with overfill, underfill, fragmentation and more. To do that unambiguously, the two sequences of events are joined into a sequence of segments, where each segment is the longest period of time with constant classification and ground truth. Each segment now represents either a true positive, true negative, false positive, or false negative, and the false classifications can then be further categorised as follows:

Insertion, corresponding to an inserted event, is a false positive (FP) segment surrounded by true negative (TN) or false negative (FN) segments

Merge, a failure to separate two distinct events, or a false positive (FP) surrounded by true positives (TP)

Deletion, corresponding to a deleted event, is a FN surrounded by TN or FP

Fragmentation, a partitioning of a single event, or a false negative surrounded by true positives

Underfill, a missed positive classification at the start or end of an event, or a FN preceded by a TN or FP, and followed by a TP, or vice versa

Overfill, an extended positive classification over the edges of an event, or a FP preceded by a TN or FN, and followed by a TP, or vice versa

Under- and overfill can be further qualified with start or end, depending on which end of the event they occur at. For these categorisations to make sense, an important assumption is made that the time shift of event classification is smaller than the length of events.
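The segment categorisation above can be sketched as follows for a single movement type treated as a binary problem. This is an illustrative reading of the definitions given here, not Ward et al.'s reference implementation. The category names as strings, and the choice to treat the start and end of the recording as non-TP neighbours, are assumptions of the example.

```python
from itertools import groupby

KIND = {(True, True): "TP", (False, False): "TN",
        (False, True): "FP", (True, False): "FN"}

# Keys are (previous segment is TP, next segment is TP).
FN_NAMES = {(True, True): "fragmentation", (False, False): "deletion",
            (False, True): "underfill_start", (True, False): "underfill_end"}
FP_NAMES = {(True, True): "merge", (False, False): "insertion",
            (False, True): "overfill_start", (True, False): "overfill_end"}

def categorise_segments(truth, predicted):
    """Return (category, length in frames) for every segment.

    truth, predicted: per-frame booleans (or 0/1) for one movement type.
    A segment is a maximal run of frames with constant (truth, predicted).
    """
    runs = [(KIND[pair], len(list(frames)))
            for pair, frames in groupby(zip(truth, predicted))]
    result = []
    for i, (kind, length) in enumerate(runs):
        if kind in ("TP", "TN"):
            result.append((kind, length))
            continue
        # Assumption: the edges of the recording count as non-TP neighbours.
        prev_tp = i > 0 and runs[i - 1][0] == "TP"
        next_tp = i + 1 < len(runs) and runs[i + 1][0] == "TP"
        names = FN_NAMES if kind == "FN" else FP_NAMES
        result.append((names[(prev_tp, next_tp)], length))
    return result
```

Summing the lengths per category and dividing by the number of positive (or negative) ground truth frames then gives the corresponding frame rates.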

These categorisations of segments are trivially translated to individual frames, and rates can be calculated for each category in a similar way as, for example, true positive rates, by dividing the number of frames of each such error by the total number of frames with the same ground truth. For example, the deletion rate, deletions being a subset of false negatives, is the ratio of frames categorised as deletions to the total number of positive frames in the ground truth.

Furthermore, the events in the ground truth and classification can be classified by analysing the segments that constitute them. Naming ground truth events simply events, and events in the classification returns, they are sorted into the following categories:

Deleted Event (D), a deleted event is an exact match with a deletion segment.

Fragmented Event (F), an event that is fragmented will contain at least one fragmentation segment.

Merged Event (M), a merged event is an event that overlaps in any way with a merging return.

Fragmented & Merged Event (FM), if an event is both fragmented and merged, it will be moved to this category.

Matched Event (C), this will contain the remaining events, which then have a matching return, without being fragmented or merged with another event.

Merging Return (M’), any return with a merge segment will be categorised as merging.

Fragmenting Return (F’), if a return in any way overlaps with a fragmented event it is called fragmenting.

Fragmenting & Merging Return (FM’), any return that is both merging and fragmenting will be moved to this category.

Inserted Return (I’), an inserted return is an exact match with an insertion segment.

With these two sets of error categories, an unambiguous and detailed overview of a classification is given. As an example of the power of this, an event can be seen as correctly matched even if the return suffers from a time delay, and the amount of time delay can be seen in the start underfill statistic. Ideally this analysis is done with a proper ground truth, but the approximation of using the stimuli movements as ground truth could still be useful, if the analysis takes into account the differences between an ideal and an actual observer. For example, a large part of fragmentation, start underfill, and end overfill could probably be attributed to such differences.

Chapter 3

Method

3.1 Experimental Setup

The experiment was conducted with a Tobii T120 eye-tracker, selected out of the Tobii product range for its ability to record data at both 60 Hz and 120 Hz. Each of the 11 subjects was seated in a dimly lit room in front of the tracker, and the tracker was calibrated using a standard 7-point calibration. After the calibration, the stimuli (described in more detail in the next section), a white circle on a black background, were shown. The subject had previously been told to follow the circle for the duration of the experiment, and the tracker data was collected. To alleviate fatigue, the experiment was divided into four segments of roughly a minute and a half each. Participation was voluntary and subjects were informed about the purpose of the experiment.

The program collecting the data was a minimal C# program designed to delegate as much of the work to the Tobii API as possible: both the calibration and the data collection were done using the Tobii API, whereas the stimuli were shown using Windows Forms. The collected data for each segment of a session was saved in plain text for later analysis. The stimuli were displayed smoothly using double buffering, and since the stimuli were defined in angles of the visual axis, the distance between the tracker and the head was used to translate this into screen position. The head distance as given by the eye-tracker was prone to noise, so to make the stimuli smoother it was filtered using a simple AR filter, $\hat{d}_x = (8\hat{d}_{x-1} + 2d_x)/10$, where $d_x$ is the raw distance and $\hat{d}_x$ the filtered one.
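The distance filter is a first-order recursive smoother, which can be sketched in a few lines. The original collector was written in C#; the Python below is only an illustration, and the choice to initialise the filter with the first raw sample is an assumption, since the text does not state the starting value.

```python
def smooth_distance(raw_distances):
    """Apply d_hat[x] = (8 * d_hat[x - 1] + 2 * d[x]) / 10 to a sequence.

    Assumption: the filter starts from the first raw sample.
    """
    filtered = []
    previous = None
    for d in raw_distances:
        previous = d if previous is None else (8 * previous + 2 * d) / 10
        filtered.append(previous)
    return filtered
```

Each new sample thus contributes 20% to the estimate, so a sudden 100 mm jump in the raw distance moves the filtered value by only 20 mm in the first frame.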

3.2 Stimuli

In this thesis, three independent variables and their effect on classification were originally investigated: the frequency of the eye-tracker, 60 as well as 120 Hz; the velocity of the pursuit, where four different pursuit velocities ranging from 0.5 deg/s to 10 deg/s were analyzed (note that these velocities are given as the speed apparent to the eye, that is, dependent on the position of the head); and the angular dispersion of the trajectory, where the moving stimuli would follow a trajectory of constant curvature, with four values in a range from 0 to 0.2 deg⁻¹ (the latter corresponding to the curvature of a circle of radius 5 deg, with the common definition of curvature as the inverse of the radius of curvature). Both the velocity and the curvature were measured using degrees of visual angle. However, due to the amount of data, only the 120 Hz data was fully investigated, and due to its lack of importance, curvature was also not given an in-depth investigation.

The stimuli were composed of a circle appearing on one side of the screen (in the vertical middle and at a fixed horizontal distance of 5 deg of visual angle), which would then move to the opposite point of the screen in a trajectory of constant curvature and velocity. Both these variables were chosen at random from the allowed values, which included an “infinite” velocity (signifying an instant change of position, which would prompt a saccade), with each distinct combination of velocity and curvature appearing once in each test. In between each of these movements, the circle remained still for 1 s in order to separate the data into distinct pursuits, and to study the classification of the resulting fixations.

The main output of the experiments was a time series for gaze position, as measured by the eye tracker, coupled with a definition of the movement of the stimulus. For an ideal observer, the tracker data would coincide with the stimulus position, but because of human physiology and tracker latency, this is impossible in practice.

The analysis was mostly done in “Eye Studio” [26], a filter analysis suite created in a parallel thesis at Tobii. It is capable of reading the stimuli and data saved by the data collector. New algorithms and metrics are added by creating Python modules extending the basic templates. Except for those additions, the only modification to the original code was to extend the support for smooth pursuit movements in classification and analysis.

3.3 Algorithms

The algorithms were as far as possible modelled after the descriptions detailed in the background section. They were implemented in Python as extensions to “Eye Studio”. The following sections will detail where the original descriptions may have been ambiguous, and the choice of static and varying parameters used in the analysis.

3.3.1 Velocity and Dispersion Thresholds

Two algorithms were created based on the methods of thresholds on velocity and dispersion presented in section 2.3.1: one, named I-VVT after [16], using two thresholds on velocity to separate fixations, saccades, and pursuits, and one, named I-VDT, also after [16], using a threshold on velocity to find saccades, and a threshold on dispersion to separate pursuits and fixations.

Both algorithms use the Savitzky-Golay filter to determine the velocity of the eye movement. The recommendation from [5] and [17] is to use a polynomial of order 2 and a filter length of approximately twice the minimum duration of a saccade (or 10 ms). For a frequency of 120 Hz, this means a filter length of 3, which turns the filter into a simple central difference filter, and for 60 Hz the recommended length would be too short for the velocity to be extracted at all. To achieve a better filtering, the calculation is instead made with the ideal¹ duration of the minimal (10 deg) saccade in the stimuli. This yields a length of 2 · ceil (0.043 · 120) = 9 for 120 Hz data and 5 for 60 Hz data.

I-VVT

The implementation of I-VVT was simply a filter on the velocity as given by the Savitzky-Golay filter. Fixations were delimited by an upper fixational velocity threshold, and pursuits by the saccadic velocity threshold; anything faster than that was denoted a saccade. To further refine the result and reduce the noise, two minimum durations were introduced for fixations and pursuits. Any sequence of frames slow enough to be a fixation, but too short according to the fixation duration limit, was reclassified as a pursuit. The same procedure was then repeated for pursuits, with the pursuit duration limit.

To limit the parameter space in the analysis, the saccadic velocity threshold was kept constant at 100 deg/s, and both the duration limits were kept at 80 ms, while the fixation velocity threshold was varied from 2 to 12 deg/s.
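The I-VVT steps described above can be sketched as follows, given per-frame speeds in deg/s. This is a minimal illustration rather than the thesis implementation. The text does not say what short pursuit runs are reclassified as, so labelling them "U" (unclassified) is purely an assumption of this sketch, as are the function names, the single-character labels, and the default of 10 frames (roughly 80 ms at 120 Hz).

```python
from itertools import groupby

def relabel_short_runs(labels, label, min_frames, replacement):
    """Replace every run of `label` shorter than `min_frames` by `replacement`."""
    out = []
    for value, run in groupby(labels):
        n = len(list(run))
        if value == label and n < min_frames:
            value = replacement
        out.extend([value] * n)
    return out

def classify_ivvt(speeds, fix_threshold, sac_threshold=100.0,
                  min_fix_frames=10, min_pur_frames=10):
    """Label each frame F (fixation), P (pursuit), S (saccade) or U."""
    labels = ["F" if v < fix_threshold else "P" if v < sac_threshold else "S"
              for v in speeds]
    # Too-short fixations become pursuits, as described in the text.
    labels = relabel_short_runs(labels, "F", min_fix_frames, "P")
    # Assumption: too-short pursuits are left unclassified ("U").
    labels = relabel_short_runs(labels, "P", min_pur_frames, "U")
    return labels
```

Because the second pass groups the labels anew, a short fixation that was turned into a pursuit is counted together with any neighbouring pursuit frames when the pursuit duration limit is applied.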

I-VDT

For I-VDT, the classification of non-saccadic frames was done with a dispersion threshold after [16], where the dispersion of a sequence is calculated as the sum of the coordinate-wise diameters². If a sequence of frames was longer than a minimum fixation duration, and did not exceed the fixation dispersion threshold, it was classified as a fixation. Since Komogortsev [16] showed that a longer minimum fixation duration made the classification of I-VDT more balanced³, it was increased as compared to I-VVT.

As in I-VVT, the saccadic velocity threshold was kept constant at 100 deg/s, and the minimum pursuit duration was 80 ms. The minimum fixation duration was kept at 160 ms, while the dispersion threshold was varied from 0.1 to 1.6 deg.
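The dispersion measure and the fixation test used by I-VDT can be sketched as below. The function names are hypothetical, the points are assumed to be (x, y) gaze positions in degrees, and the handling of the boundary cases (a sequence of exactly the minimum length, a dispersion exactly at the threshold) is an assumption of the example.

```python
def dispersion(points):
    """Sum of the coordinate-wise diameters: (max x - min x) + (max y - min y)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def is_fixation(points, dispersion_threshold, min_frames):
    """True if the sequence is long enough and compact enough to be a fixation."""
    return (len(points) >= min_frames
            and dispersion(points) <= dispersion_threshold)
```

At 120 Hz, the 160 ms minimum fixation duration corresponds to about 20 frames.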

¹ Carpenter [3] gives a formula for the saccade duration as D = 2.2A + 21 ms, where A is the length of the saccade in degrees.

² That is, the sum of the differences between maximum and minimum value in a sequence for each coordinate respectively.

³ Note that this duration is part of the dispersion definition for I-VDT while its role in I-VVT is only noise reduction.

3.3.2 Angular Dispersion Threshold

Two algorithms based on this method were implemented, one using Lopez’s original idea of measuring angular dispersion with the standard deviation of the change of direction of consecutive fitted lines, and one using one of the suggested improvements in [17], measuring angular dispersion with a Rayleigh test on the angles of the frame-by-frame changes. These were named I-VMPStd and I-VMPRay respectively, taking inspiration from [16].

While Larsson suggests using the gaze acceleration to separate saccades instead of velocity, the reduced frequency used in this thesis makes it more convenient to use the velocity. For both I-VMPStd and I-VMPRay a first-order Savitzky-Golay filter of length 3 was used for this purpose.

I-VMPStd

In accordance with Lopez’s original thesis, I-VMPStd classified movements faster than the saccadic velocity threshold as saccades. Any non-saccadic movement was added to a moving window of a maximum size N, and when the moving window had two or more points in it, a line was fitted to the points in the window with the least squares method. The line was fitted against time, as given by the index of each point in the window, to obtain a direction for the line. This results in the line being defined by a vector in the coordinate plane, and for any subsequent lines, the difference in angle between them could then be calculated through the dot product of those vectors. These angular differences were added to a parallel moving window, and whenever that window contained two or more angles, the standard deviation of the contained angles was calculated. If that standard deviation was below a certain threshold, all frames in the moving window were classified as smooth pursuits, and otherwise as fixations.

This implies that any non-saccadic sequence shorter than four frames can’t be classified⁴, and that the first N non-saccadic frames after a saccade will have the same classification, but that afterwards there is no minimum duration for each movement. This is done to mimic the original implementation; although Lopez does mention minimum fixation and pursuit durations, he doesn’t describe how they would be implemented. A possible way to deal with this is briefly described in section 4.3.

For the analysis, the saccadic velocity threshold was kept constant at 80 deg/s, the maximum size of the moving window was set to 12 (as used by Lopez), while the standard deviation threshold was varied from 10 to 30 deg.
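The two geometric building blocks of I-VMPStd, fitting a direction to the points in the window and measuring the angle between consecutive directions, can be sketched as follows. The function names are hypothetical, and returning 0 deg when one of the directions has zero length is an assumption made so that the example is total. The moving-window bookkeeping and the standard deviation step are left out.

```python
import math

def fit_direction(points):
    """Least-squares slopes of x and y against the frame index.

    points: list of (x, y) gaze positions, at least two of them.
    Returns the direction vector (dx, dy) of the fitted line per frame.
    """
    n = len(points)
    t_mean = (n - 1) / 2
    denom = sum((t - t_mean) ** 2 for t in range(n))
    dx = sum((t - t_mean) * p[0] for t, p in enumerate(points)) / denom
    dy = sum((t - t_mean) * p[1] for t, p in enumerate(points)) / denom
    return dx, dy

def angle_between_deg(u, v):
    """Angle in degrees between two direction vectors, via the dot product."""
    norm = math.hypot(u[0], u[1]) * math.hypot(v[0], v[1])
    if norm == 0:
        return 0.0  # Assumption: a zero-length direction gives no angle change.
    cosine = (u[0] * v[0] + u[1] * v[1]) / norm
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))
```

Because the fit is made against the frame index rather than against x, vertical movements pose no special problem, and the resulting vector also carries the direction of travel along the line.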

I-VMPRay

The implementation of I-VMPRay is a bit simpler than that for I-VMPStd, although they start off the same, by separating saccades by velocity.

⁴ This isn’t very problematic here since that corresponds to 50 ms in 60 Hz data, which is too short for fixations and pursuits anyway.

Each non-saccadic point is added to a moving window of a maximum size N, and each subsequent pair of points in the window defines a vector. These vectors in the window each have an angle from the x-axis that can be analysed with the Rayleigh test as described in section 2.3.2.

In Larsson’s thesis [17], the data was partitioned into blocks of 50 ms to be classified with information from the surrounding blocks. To simulate this behavior, but with added noise reduction, the significance given by the Rayleigh test was calculated for each frame from the window of size N centered on that frame (cut off by any saccadic frames). The frames were then classified in consecutive blocks of size M by comparing the significance threshold to the median of the significances in each block.

This means that I-VMPRay, unlike I-VMPStd, defines an implicit soft minimum duration for fixations and pursuits with the block size M. In fact, most movements that I-VMPRay will find will be of a length that is a multiple of M, the exception being movements at the end of non-saccadic sequences that can’t be split up evenly into M-size blocks.

As for I-VMPStd, the saccadic threshold was kept fixed at 80 deg/s for I-VMPRay. Further, to mimic Larsson, N was set to 18, and M to 6 (corresponding to 150 and 50 ms respectively at 120 Hz), while the significance threshold was varied from 0.1 to 0.9.
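The ingredients of I-VMPRay can be sketched as below: the angles of the frame-to-frame vectors, a Rayleigh test on those angles, and the per-block medians. The p-value uses the common first-order approximation p ≈ exp(−n·R̄²), which may differ from the exact formulation in section 2.3.2, and all function names are hypothetical. How the median is compared with the significance threshold to decide between fixation and pursuit is deliberately left out.

```python
import math
import statistics

def frame_angles(points):
    """Angle from the x-axis (radians) of each vector between consecutive points."""
    return [math.atan2(y1 - y0, x1 - x0)
            for (x0, y0), (x1, y1) in zip(points, points[1:])]

def rayleigh_p_value(angles):
    """Approximate Rayleigh test p-value for uniformity of directions.

    Uses p ~ exp(-n * R^2), where R is the mean resultant length.
    Small values indicate a consistent direction of movement.
    """
    n = len(angles)
    c = sum(math.cos(a) for a in angles)
    s = sum(math.sin(a) for a in angles)
    r_bar = math.hypot(c, s) / n
    return math.exp(-n * r_bar ** 2)

def block_medians(values, block_size):
    """Median of each consecutive block; the last block may be shorter."""
    return [statistics.median(values[i:i + block_size])
            for i in range(0, len(values), block_size)]
```

A straight-line movement gives identical angles and hence a p-value close to zero, whereas directions spread evenly around the circle give a p-value close to one.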

3.3.3 Attention Focus Kalman Filter

This algorithm was implemented straightforwardly from Komogortsev’s original paper [13], as described briefly in 2.3.4. The Kalman filter was run separately for the two coordinates, as degrees of visual angle, using a fixed standard deviation of √1000 deg/s and a window size of 5 for the chi-square test, as recommended by Komogortsev. Identity matrices were used as the two noise covariance matrices, again mimicking Komogortsev. While he also recommended adjusting the observation noise matrix to the system, this was found to have little impact, and hence it was kept unadjusted. The minimum fixation duration was set to 80 ms as in the tangential velocity filter, and the upper smooth pursuit velocity threshold was set to 80 deg/s. Since the analysis focuses on classifying fixations and smooth pursuits, the chi-square threshold (which separates the saccades) was kept fixed at 150, while the fixation velocity threshold was varied from 0.3 to 3 deg/s.

3.4 Evaluation

The evaluation can be split in two groups: evaluation of apparent properties of the eye movements as classified by an algorithm, and evaluation of the classifying performance as compared to a perfect classifier. For the second group, no manual tagging was done in this thesis; instead the classification was compared to the movements that would be natural for the presented stimulus (as done by Lopez [12] and Komogortsev [22]). While this eases the manual workload, it also necessitates interpretation of the results, since the actual eye movements are far from perfectly matching the presented stimulus.

3.4.1 Qualitative and Quantitative scores

Matching gaze data from the tracker with the movement of the stimulus, the qualitative and quantitative scores were calculated as in [16]. To quickly recapitulate, FQnS and PQnS are the true positive rates (or recall) of fixations and pursuits respectively, counting only instances where the gaze is within 5 deg of the stimulus. FQlS and the Smooth Pursuit Position Qualitative Score (PQlS_P) are the average distance of correctly classified points to the stimulus, where the position of a fixation is averaged over its duration. The Smooth Pursuit Velocity Qualitative Score (PQlS_V) is the average absolute difference in frame-by-frame velocity between the gaze and stimulus. MisFix is simply the ratio of fixation frames classified as pursuits. Finally, SQnS is the ratio of the total amplitude (measured as the distance between onset and offset) of saccades found between 100 ms before and 300 ms after a jump in the stimulus, compared with the amplitudes of those stimulus jumps.

Using the same assumptions for onset latencies, saccadic duration, etc. as in [16], the ideal values can be estimated as 86% for FQnS, 100% for PQnS, and 11% for MisFix.
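As a rough sketch of how two of these scores could be computed from per-frame data, the functions below follow the recapitulation above. They are a simplification: the gaze position is compared with the stimulus frame by frame rather than through fixation centroids, and the labels, argument names and percentage scaling are assumptions of the example rather than details taken from [16].

```python
import math

def fqns(stim_labels, labels, gaze, stim_pos, max_dist=5.0):
    """Percentage of stimulus fixation frames classified as fixation
    with the gaze within max_dist degrees of the stimulus."""
    total = hits = 0
    for s, l, g, p in zip(stim_labels, labels, gaze, stim_pos):
        if s != "F":
            continue
        total += 1
        if l == "F" and math.hypot(g[0] - p[0], g[1] - p[1]) <= max_dist:
            hits += 1
    return 100.0 * hits / total if total else 0.0

def misfix(stim_labels, labels):
    """Percentage of stimulus fixation frames classified as pursuit."""
    fixation_frames = [l for s, l in zip(stim_labels, labels) if s == "F"]
    if not fixation_frames:
        return 0.0
    return 100.0 * fixation_frames.count("P") / len(fixation_frames)
```

PQnS follows the same pattern as `fqns` with the pursuit label in place of the fixation label.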

3.4.2 Error Categorisation with Ward’s Performance Metrics

Given a ground truth, these metrics are very straightforwardly implemented after the description in section 2.4.2, or Ward et al.’s original paper [25]. Without manually classifying the data, however, there is no really good ground truth to compare to. Instead, this was approximated with the ideal response given the stimulus, as described earlier.

This is carried out with one exception, namely saccades. When the stimulus jumps, the jump is immediate; that is, there is no 'saccade frame' in between the two stimuli. Since an event is defined as consecutive frames of the same ground truth classification, sequences where the stimulus jumps between two fixations will, unless specially handled, be seen by these metrics as one long fixation. This in turn is problematic, since the complete sequences, and not just the individual frames, are analysed. Since the human subject must make a saccade in between these stimuli, the event would be naturally fragmented. If a saccade frame were inserted at the jump, the inherent latency in the human visual system would instead mean that both events would be naturally merged (and the second fragmented). To remedy this, the gaze data was analysed when the stimulus made such a jump, and a three-frame saccade was inserted such that the middle of the saccade was the first point where the distance to the end of the jump was shorter than the distance to the start.
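The saccade insertion just described can be sketched as follows. This is only an illustration under stated assumptions: gaze positions are given as (x, y) tuples in degrees, the search starts at the frame of the stimulus jump, and the names `insert_saccade`, `jump_frame`, `start_pos` and `end_pos` are hypothetical.

```python
import math


def insert_saccade(truth, gaze, jump_frame, start_pos, end_pos):
    """Return a copy of the ground truth labels with a three-frame saccade.

    The middle saccade frame is the first frame, from the stimulus jump
    onwards, where the gaze is closer to the end of the jump than to its
    start. Searching from jump_frame is an assumption of this sketch.
    """
    labels = list(truth)
    for i in range(jump_frame, len(gaze)):
        if math.dist(gaze[i], end_pos) < math.dist(gaze[i], start_pos):
            for j in (i - 1, i, i + 1):
                if 0 <= j < len(labels):
                    labels[j] = "saccade"
            break
    return labels
```

If the gaze never gets closer to the end position, the labels are returned unchanged, which leaves the two fixations merged in the approximate ground truth.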

There are of course a host of further problems in using such an approximate ground truth, and even more possible solutions. For example, there is no consideration here for corrective saccades, which could probably be handled in a similar manner as the normal saccades just described. Neither is there a consideration for catch-up saccades, but trying to automatically modify the event sequence to take them into account would come close to implementing an actual classification algorithm, and as such would be difficult and would bias the result. In other words, trying to improve the original approximate ground truth is even harder when looking at classification sequences, and so is kept to a minimum here.

To recapitulate the metrics that make up the error categorisation, there are two general classes of metrics: the scores and the rates. An example of the rates can be seen in figure 4.4. The positive frame rates for each movement describe the classification of the frames where that movement is the approximate ground truth. The true positive rate represents correct classifications, and is similar to the quantitative scores but without considering the distance between gaze and stimulus. The underfill start rate measures the number of frames it takes before the first positive classification; similarly, the underfill end rate measures the number of frames after the last positive classification until the ground truth event actually ends. The fragmentation rate represents frames of incorrect classification between correct ones. The deletion rate represents frames of incorrect classification in a ground truth event that is nowhere correctly classified.

The negative frame rates in turn describe the remaining frames, where the movement is not in the ground truth. The true negative rate is not necessarily a correct classification (since only the binary classification of each movement is considered), but a frame where the particular movement was correctly not present. The overfill start and end rates are the duals of the underfill rates for positive frames: frames just outside an event where the positive, there incorrect, classification extends past the event boundary. The merging rate is the dual of the deletion rate, representing sequences of positive classification covering all negative frames between two positive events. The insertion rate is the dual of the fragmentation rate, representing negative frames of positive classification in between negative classifications.
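The positive frame rates can be sketched directly from the description above. The code below is a simplified illustration that follows only the wording in this section, not the full procedure of Ward et al. [25]; it assumes binary per-frame sequences for a single movement, and the names `runs` and `positive_frame_rates` are hypothetical.

```python
def runs(flags):
    """Return (start, stop) index pairs, stop exclusive, for each run of truthy values."""
    out, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(flags)))
    return out


def positive_frame_rates(truth, predicted):
    """Split the ground truth positive frames into the five positive categories."""
    counts = {"tpr": 0, "usr": 0, "uer": 0, "fr": 0, "dr": 0}
    for start, stop in runs(truth):
        hits = [i for i in range(start, stop) if predicted[i]]
        if not hits:
            counts["dr"] += stop - start  # event nowhere correctly classified
            continue
        first, last = hits[0], hits[-1]
        counts["usr"] += first - start      # frames before the first positive
        counts["uer"] += stop - 1 - last    # frames after the last positive
        counts["tpr"] += len(hits)
        counts["fr"] += (last - first + 1) - len(hits)  # gaps between positives
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else counts
```

The keys mirror the legend abbreviations of figure 4.4 (tpr, usr, uer, fr, dr). The negative rates would follow by applying the same idea to the inverted sequences.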

The scores are categorisations of complete events or returns, which are sequences of frames of the same class in the ground truth and in the classification respectively. Intuitively, an event that has no corresponding return is called deleted; a return without a corresponding event is called inserted; an event is fragmented if it is classified as several returns (which in turn are called fragmenting); a return that corresponds to more than one event is called merging, and those events are called merged. Note that merged events and merging returns can also be fragmented (or fragmenting). The rest, where one event corresponds to one return, are simply called correct.
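The event scores can be illustrated with a short sketch, again under assumptions: binary per-frame sequences for one movement, "corresponding" interpreted as overlapping in time, and hypothetical names `runs` and `event_scores`. The returned codes follow the figure legends (D, F, FM, M, C).

```python
def runs(flags):
    """Return (start, stop) index pairs, stop exclusive, for each run of truthy values."""
    out, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(flags)))
    return out


def overlaps(a, b):
    """True if the half-open intervals a and b share at least one frame."""
    return a[0] < b[1] and b[0] < a[1]


def event_scores(truth, predicted):
    """Categorise each ground truth event as D, F, FM, M or C."""
    events, returns = runs(truth), runs(predicted)
    scores = []
    for event in events:
        matched = [r for r in returns if overlaps(event, r)]
        if not matched:
            scores.append("D")  # deleted: no corresponding return
            continue
        fragmented = len(matched) > 1
        # Merged: some matching return also covers another event.
        merged = any(sum(overlaps(r, e) for e in events) > 1 for r in matched)
        if fragmented and merged:
            scores.append("FM")
        elif fragmented:
            scores.append("F")
        elif merged:
            scores.append("M")
        else:
            scores.append("C")
    return scores
```

Return scores (I, F, FM, M, C) can be obtained in the same way by swapping the roles of the two sequences.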

Due to the approximate ground truth, a lot of fragmentation (from saccades and blinks) can be expected, especially for the longer pursuits, while the underfill start and overfill end will include all the system latencies. Since underfill and overfill have no bearing on the event scores, a perfect classification should result in many correct or fragmented events, while the fragmentation rates should remain low (since saccades and blinks are relatively short).


Chapter 4

Results

4.1 I-VVT

Figure 4.1 shows the qualitative and quantitative scores for an algorithm that only uses filtered velocity to discriminate between the different movements. Since the variable that is changed along the x-axis is the fixational velocity threshold, the saccadic quantitative score remains constant at 0.96, which is very close to the ideal 100%.
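A classifier of this kind can be sketched in a few lines. The sketch assumes that the input is an already filtered per-frame speed in deg/s and that two thresholds separate the three movements; the function name `classify_ivvt` and the threshold values in the usage note are hypothetical, since the saccadic threshold is not restated in this chapter.

```python
def classify_ivvt(velocities, fixation_threshold, saccade_threshold):
    """Label each frame by comparing its speed with two velocity thresholds."""
    labels = []
    for velocity in velocities:
        speed = abs(velocity)
        if speed > saccade_threshold:
            labels.append("saccade")
        elif speed > fixation_threshold:
            labels.append("pursuit")
        else:
            labels.append("fixation")
    return labels
```

For example, `classify_ivvt([0.5, 3.0, 8.0, 120.0], 4.0, 70.0)` labels the first two frames as fixation, the third as pursuit and the last as saccade. Any pursuit slower than the fixational threshold ends up as a fixation, which is the weakness examined below.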

[Figure 4.1 plot. x-axis: fixational velocity threshold. Left panel: quantitative score (PQnS, FQnS, MisFix, SQnS). Right panel: qualitative score in deg (FQlS, PQlS_P).]

Figure 4.1: Quantitative and qualitative scores for I-VVT as the velocity threshold that separates fixations from pursuits is varied from 2 to 12 deg/s.

However, the fixational and pursuit quantitative scores fare worse. While the pursuit score is high at low thresholds, the fixational score is simultaneously very low, and there is a clear negative correlation between the two. Interestingly, MisFix remains above PQnS throughout the parameter space, which means that a larger fraction of fixations than of actual pursuits is classified as pursuit. In fact, if pursuits of different velocities are analysed individually (see figure 4.2), the slower pursuits (0.5 and 1 deg/s) are consistently classified as fixations more often than actual fixations are. The difference is not very large though, and could possibly be explained by the latency not being accounted for, as this would disproportionately affect fixations as compared to the slower pursuits (which make up a larger part of the stimulus). Even then, it speaks of the difficulty of distinguishing slow pursuits from fixations.


[Figure 4.2 plot. Two panels: portion of movement classified as fixation, and portion of movement classified as pursuit. One curve per stimulus velocity: 0, 0.5, 1, 5 and 10.]

Figure 4.2: The classification by I-VVT of stimulus movements of different velocities, measured in degrees per second, as the fixational velocity threshold is varied from 2 to 12 deg/s. A stimulus velocity of zero would signify a static stimulus, and should therefore be identified as a fixation.

The positional qualitative scores are better: for most thresholds FQlS stays below 0.45 deg, while PQlS_P stays below 0.55 deg. At higher thresholds PQlS_P starts rising steeply. A similar pattern is seen for PQlS_V, which rises from below 20 to 35 deg/s (see figure C.1), perhaps as a result of slower pursuits (that would decrease these means) being misclassified as fixations.

The error categorisation scores for I-VVT can be seen in figure 4.3. The fixation event scores show the problems of trying to distinguish fixations by velocity, as the number of deleted events remains high until a velocity threshold of 6 to 8 deg/s. In the pursuit event scores, these missed fixations show up as merged events. The remaining events are fragmented or correct, which, depending on the nature of the fragmentation, is not necessarily bad, but in the return scores the number of inserted fixations indicates that a lot of pursuits are misclassified as fixations. The error rates (see figure 4.4) show a similar picture. While a fragmented event could still be mostly correctly classified (with small fragmentations), the pursuit fragmentation rate is actually larger than the true positive rate above 6 deg/s, and the fixation insertion rate is similarly larger than the true negative rate. This means that the fragmentation in pursuits is made up of inserted fixations that, in total, are longer than the correctly classified returns. Note that here, as opposed to the quantitative scores and figure 4.2, the underfill is taken into account, which is important since it includes the reaction latencies of the subjects.



[Figure 4.3 plot. Four panels: Fixation Event Scores, Fixation Return Scores, Smooth pursuit Event Scores, Smooth pursuit Return Scores. Event legend: D, F, FM, M, C. Return legend: I, F, FM, M, C.]

Figure 4.3: The error categorisation scores for I-VVT as the fixational velocity threshold is varied from 2 to 12 deg/s. The event scores for each movement are: deleted, fragmented, fragmented & merged, merged, and correct. The return scores are inserted, fragmenting, fragmenting & merging, merging, and correct.

4.2 I-VDT

The qualitative and quantitative scores for this algorithm are presented in figure 4.5. Since the saccadic velocity threshold remains constant in the investigated parameter space, and since that value is the same as for I-VVT, SQnS remains constant at 0.96.

The similarity continues with the quantitative scores, where MisFix stays slightly higher than PQnS, and any improvement in the quantitative score for either movement comes at the expense of the other.

The qualitative scores are also fairly similar to those of pure velocity thresholds. FQlS stays relatively constant around 0.45 deg at high enough dispersion thresholds, while PQlS_P and PQlS_V (see figure C.1) look good at lower thresholds and increase exponentially at higher ones. Perhaps unexpectedly then, figure 4.6, showing the classification by stimulus velocity, hints at a similar problem with slow pursuits.

The error scores (figure 4.7) reinforce the notion of similarity, with a large number of inserted fixations and fragmented returns. Pursuit detection appears to be less noisy, with fewer pursuit returns and slightly more correct events, perhaps as a result of the increased minimum fixation duration, though the number of fixation returns remaining about the same doesn't seem to support that. However, this seems to be at the cost of more merging pursuit returns, and hence more deleted events. Importantly, there is a larger overlap of thresholds where a significant number of both fixation and pursuit events are deleted. The error rates can be seen in figure C.2, although they don't add much new to the analysis.

[Figure 4.4 plot. Four panels: Fixation Frame Metric Positive Rates, Fixation Frame Metric Negative Rates, Smooth pursuit Frame Metric Positive Rates, Smooth pursuit Frame Metric Negative Rates. Positive legend: dr, fr, uer, usr, tpr. Negative legend: mr, ir, oer, osr, tnr.]

Figure 4.4: The error categorisation rates for I-VVT as the fixational velocity threshold is varied from 2 to 12 deg/s. The positive rates for each movement are deletion rate, fragmentation rate, underfill end rate, underfill start rate, and true positive rate. The negative rates are merging rate, insertion rate, overfill end rate, overfill start rate, and true negative rate.

[Figure 4.5 plot. x-axis: fixational dispersion threshold. Left panel: quantitative score (PQnS, FQnS, MisFix, SQnS). Right panel: qualitative score in deg (FQlS, PQlS_P).]

Figure 4.5: Quantitative and qualitative scores for I-VDT as the fixation dispersion threshold is varied from 0.1 to 1.6 deg.

It is a bit peculiar that some (probably slow and long) pursuits would be deleted, while others, simultaneously, are being merged (over one-second fixations). Since these results are summed over all trials, the most likely explanation is that the difference in noise between the trials has a large effect on what would be a “good” dispersion threshold. A longer minimum fixation duration would help with deleted slow pursuits, but could also delete shorter fixations. Another possibility could be filtering the position before measuring the dispersion. However, even with perfect tracking, due to the noise inherent in fixations, separating them from slow pursuits would be hard without increasing the minimum fixation duration, and thereby punishing shorter fixations.

[Figure 4.6 plot. Two panels: portion of movement classified as fixation, and portion of movement classified as pursuit. One curve per stimulus velocity: 0, 0.5, 1, 5 and 10.]

Figure 4.6: I-VDT classification of different stimulus movements separated by velocity, as the fixation dispersion threshold is varied from 0.1 to 1.6 deg.

[Figure 4.7 plot. Four panels: Fixation Event Scores, Fixation Return Scores, Smooth pursuit Event Scores, Smooth pursuit Return Scores. Event legend: D, F, FM, M, C. Return legend: I, F, FM, M, C.]

Figure 4.7: The error categorisation scores for I-VDT as the fixation dispersion threshold is varied from 0.1 to 1.6 deg.


In the appendix, figures C.3 and C.4 show the classification by stimulus velocity for a version of I-VDT that measures dispersion with the positions filtered by a Savitzky-Golay filter (order 2 and length 9, the same parameters that are used for the velocity), with the dispersion threshold range modified accordingly. The latter also uses a doubled minimum fixation duration of 320 ms. There it is readily seen that, while the performance for faster pursuits is improved, slower pursuits are still not properly distinguished from fixations with either modification.
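To make the idea of measuring dispersion on filtered positions concrete, here is a minimal sketch. It uses the standard tabulated smoothing coefficients for a quadratic Savitzky-Golay filter of length 9. The dispersion measure, (max x - min x) + (max y - min y), is the conventional I-DT definition and is an assumption here, as are the names `savgol_smooth` and `dispersion` and the choice to leave the edge samples unfiltered.

```python
# Smoothing coefficients of a Savitzky-Golay filter, polynomial order 2, length 9.
SG_9_2 = [c / 231 for c in (-21, 14, 39, 54, 59, 54, 39, 14, -21)]


def savgol_smooth(values):
    """Smooth a sequence; the first and last four samples are left as they are."""
    half = len(SG_9_2) // 2
    smoothed = list(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        smoothed[i] = sum(c * v for c, v in zip(SG_9_2, window))
    return smoothed


def dispersion(xs, ys):
    """Dispersion of a window of positions (assumed I-DT style measure)."""
    return (max(xs) - min(xs)) + (max(ys) - min(ys))
```

Since the filter reproduces quadratic trends exactly, a slow drift in position passes through unchanged while sample-to-sample noise is attenuated, so the dispersion of a noisy fixation shrinks more than that of a pursuit.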

4.3 I-VMPStd

The qualitative and quantitative scores for I-VMPStd can be seen in figure 4.8. The positional scores are decent: FQlS stays around 0.46 deg, while PQlS_P quickly decreases to a similar level. The velocity qualitative score PQlS_V (see figure C.1) also stays relatively constant, increasing from 13 to 17 deg/s, which is the lowest of all analysed algorithms.

[Figure 4.8 plot. x-axis: standard deviation threshold. Left panel: quantitative score (PQnS, FQnS, MisFix, SQnS). Right panel: qualitative score in deg (FQlS, PQlS_P).]

Figure 4.8: Quantitative and qualitative scores for I-VMPStd as the standard deviation threshold is varied from 10 to 30 deg.

With the saccadic threshold staying constant, SQnS remains constant at a decent 0.81, while PQnS increases with the angular dispersion threshold as FQnS decreases. Further, while MisFix increases with PQnS, it remains lower. Looking at the classification separated by stimulus velocity (figure 4.9), it seems that while slow pursuits are classified similarly to fixations (leading to the similarity between PQnS and MisFix, as they make up the larger class of pursuits by time), only the slowest (0.5 deg/s) pursuit is misclassified more often than fixations.

While still having problems with slow pursuits, this algorithm seems to be fairly stable in classifying faster pursuits. Note that there is a big difference in sample size between the different pursuit velocities (since the stimulus for each travels the same distance), so system latency would affect the ideal scores unequally. For example, an onset latency of 200 ms would render the ideal portion classified as fixation at roughly 16% for pursuits with a velocity of 10 deg/s, 8% for 5 deg/s, 1.6% for 1 deg/s, 0.8% for 0.5 deg/s, and 80% for fixations.
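The percentages in this example follow from simple arithmetic, which the sketch below reproduces. Two values are assumptions: a travel distance of 12.5 deg per pursuit, inferred here from the quoted percentages, and a fixation duration of one second. The function name `ideal_fixation_portion` is hypothetical.

```python
def ideal_fixation_portion(duration_s, latency_s, is_fixation):
    """Portion of a stimulus segment ideally classified as fixation.

    During the onset latency the eye is assumed to continue the previous
    movement: fixating at the start of a pursuit, and not yet fixating at
    the start of a fixation.
    """
    lag = min(latency_s, duration_s) / duration_s
    return 1.0 - lag if is_fixation else lag


TRAVEL_DEG = 12.5  # assumed travel distance, inferred from the quoted numbers
LATENCY_S = 0.2

for velocity in (10.0, 5.0, 1.0, 0.5):
    duration = TRAVEL_DEG / velocity
    portion = ideal_fixation_portion(duration, LATENCY_S, is_fixation=False)
    print(f"{velocity:4.1f} deg/s: {portion:.1%}")
print(f"fixation: {ideal_fixation_portion(1.0, LATENCY_S, is_fixation=True):.1%}")
```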



[Figure 4.9 plot. Two panels: portion of movement classified as fixation, and portion of movement classified as pursuit. One curve per stimulus velocity: 0, 0.5, 1, 5 and 10.]

Figure 4.9: I-VMPStd classification of different stimulus movements separated by velocity, as the standard deviation threshold is varied from 10 to 30 deg.

The event scores (figure 4.10) show very few correct events, and a lot of fragmented ones. The number of fragmented events could be due to the sheer number of returns for both movements, which stays between 15 and 40 times the number of events. Although this might be alleviated with minimum durations (the average pursuit duration stays below 100 ms throughout most of the dispersion threshold interval), if these short sequences aren't properly classified, they might still result in a fragmented event.


[Figure 4.10 plot. Four panels, laid out as in figure 4.3: fixation event scores, fixation return scores, smooth pursuit event scores, smooth pursuit return scores. Event legend: D, F, FM, M, C. Return legend: I, F, FM, M, C.]

Figure 4.10: Error categorisation scores for I-VMPStd as the standard deviation threshold is varied from 10 to 30 deg.

The error rates (figure 4.11), the fixation insertion rate in particular, show that a large part of the pursuit fragmentation comes from inserted fixations. While there were very few correct fixation events, the fixation fragmentation rate isn't very high, which implies that the fragmenting classification is largely made up of smaller movements (probably pursuits). It is of course also possible that a fixation event could be fragmented if the end of the pursuit leading up to it (which, due to latency, will be seen as the start of the fixation event) is fragmented by fixations. However, the reasonable fixation underfill start rate (and equivalently the pursuit overfill end rate) implies that this isn't the main source of fragmentation. Note that, since all fixations are one second long, the underfill start rate roughly corresponds to the latency of the fixation classification in seconds.


[Figure 4.11 plot. Four panels: Fixation Frame Metric Positive Rates, Fixation Frame Metric Negative Rates, Smooth pursuit Frame Metric Positive Rates, Smooth pursuit Frame Metric Negative Rates. Positive legend: dr, fr, uer, usr, tpr. Negative legend: mr, ir, oer, osr, tnr.]

Figure 4.11: Error categorisation rates for I-VMPStd as the standard deviation threshold is varied from 10 to 30 deg.

A way to reduce the apparent classification noise, and thus the fragmentation from short returns, is to filter the classification with a median, or moving average, filter. In fact, if such a filter is applied repeatedly until a fixed point is reached (where additional filtering would have no effect), an implicit minimum duration of half the median filter size is achieved. The result of applying such a filter to the sequences of fixations and pursuits in the I-VMPStd classification can be seen in figures C.5 and C.6. The size of the filter was 19 frames, which (with 120 Hz data) corresponds to a minimum duration of about 80 ms for pursuits and fixations. For both movements the error scores show far fewer fragmenting returns, resulting in a decrease in fragmented events and a corresponding increase in correct events. However, the error rates show a more nuanced picture: while pursuit insertion has decreased along with fixation fragmentation, increasing the true positive rate of fixations, the true positive rate of pursuits has shrunk, both from fixation insertions and overfill.
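The repeated median filtering can be sketched for a binary label sequence (for instance 0 for fixation and 1 for pursuit). The sketch assumes an odd filter size and pads the sequence by repeating its end values, which is an assumption of this example rather than a detail given in the text; the names `median_filter_binary` and `filter_to_fixed_point` are hypothetical.

```python
def median_filter_binary(labels, size):
    """One pass of a median filter of odd length `size` over 0/1 labels."""
    half = size // 2
    padded = [labels[0]] * half + list(labels) + [labels[-1]] * half
    # The median of a 0/1 window is 1 when more than half of its samples are 1.
    return [1 if sum(padded[i:i + size]) > half else 0 for i in range(len(labels))]


def filter_to_fixed_point(labels, size, max_passes=1000):
    """Apply the median filter until another pass changes nothing."""
    current = list(labels)
    for _ in range(max_passes):
        filtered = median_filter_binary(current, size)
        if filtered == current:
            break
        current = filtered
    return current
```

With a size of 19, any interior run shorter than 10 frames is absorbed by its neighbours, which at 120 Hz corresponds to the roughly 80 ms minimum duration mentioned above.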

4.4 I-VMPRay

Although the response to changes in significance is less linear compared to the standard deviation threshold of I-VMPStd, the quantitative scores seen in figure 4.12 are otherwise quite similar. The case is similar for the qualitative scores in figures 4.12 and C.1, except at the lower significance thresholds, where FQlS rises above 1 deg (most probably from the misclassification of slow pursuits in connection to fixations, since there is no dispersion limit on fixations implemented for I-VMPRay, and the fixation position is averaged over all consecutive fixation frames).

[Figure 4.12 plot. x-axis: significance threshold. Left panel: quantitative score (PQnS, FQnS, MisFix, SQnS). Right panel: qualitative score in deg (FQlS, PQlS_P).]

Figure 4.12: Quantitative and qualitative scores for I-VMPRay as the significance threshold for the Rayleigh test is varied from 0.1 to 0.9.

Looking at the classification by velocity in figure 4.13, it can be seen that while a large part of the low PQnS comes from the bad detection of slower pursuits, the faster pursuits aren't properly detected either. Perhaps counterintuitively, a good classifier would yield a higher portion of correct classification for the slower pursuits than for the faster ones. This is because the latency remains relatively constant, while the duration of the pursuits in the stimuli varies inversely with the velocity, so the portion of misclassification that can be attributed to latency for 10 deg/s pursuits would be roughly double the portion for 5 deg/s pursuits. This means that taking latency into account would only exacerbate the difference in classification seen between 5 and 10 deg/s pursuits.

The error scores and rates (figures 4.14 and 4.15) further explain how the classification varies with the significance threshold. At lower thresholds, there are a lot of correct fixation events, but also merged events, resulting in deleted pursuits. The deletion rate rising more quickly than the percentage of deleted events indicates that the longer (and hence slower) pursuits are deleted. The suspicions about the reasons for the high FQlS seen earlier are here confirmed by the large overfill and merging rates. While there is also a peak in the number of correct pursuits at a significance threshold of 0.2, the large underfill rates and low true positive rate indicate that the corresponding correct return is significantly shorter than the event, and perhaps not preferable to a fragmented event where the corresponding returns are longer and give a higher true positive rate.

[Figure 4.13 plot. Two panels: portion of movement classified as fixation, and portion of movement classified as pursuit. One curve per stimulus velocity: 0, 0.5, 1, 5 and 10.]

Figure 4.13: I-VMPRay classification of different stimulus movements separated by velocity, as the significance threshold for the Rayleigh test is varied from 0.1 to 0.9.

At thresholds above 0.4, the deleted pursuit events disappear, along with the underfill. Here, also, the number of returns increases steeply in the form of fixation insertions, fragmenting the pursuits. However, as the pursuit fragmentation rate shrinks along with the fixation insertion rate, it's likely that the inserted fixation returns are longer returns at lower thresholds, broken into pieces by pursuit returns at higher thresholds.

Since I-VMPRay already defines minimum movement durations, this fragmentation from excessive returns can't be fixed with a moving average filter as for I-VMPStd. Instead, it's possible to change the size of the window around each point where the movement is analysed. In figures C.7 and C.8 the error scores and rates are shown for I-VMPRay when the window size is doubled, to 36 frames or 300 ms. The number of returns is lowered, along with the fragmentation, with only a slight increase in underfill. It's possible though, as the analysed time intervals get larger, that smaller movements would be ignored, none of which exist in the stimulus presented in these trials.

[Figure 4.14 plot. Four panels, laid out as in figure 4.3: fixation event scores, fixation return scores, smooth pursuit event scores, smooth pursuit return scores. Event legend: D, F, FM, M, C. Return legend: I, F, FM, M, C.]

Figure 4.14: Error categorisation scores for I-VMPRay as the significance threshold for the Rayleigh test is varied from 0.1 to 0.9.

When compared with results from other articles, the correlation with the significance threshold might be confusing. In Larsson's original thesis [17], the significance levels ranged from 0.001 to 0.05, much unlike the range here. But since significance is a statistical measure, it's heavily dependent on sample size, and while 150 ms in the 120 Hz data used here corresponds to 18 samples, Larsson (and Komogortsev et al. in [16]) used 1000 Hz data, resulting in more than 8 times the sample size. In fact, in mean vector lengths (see the algorithm description in section 2.3.2 for an explanation), Larsson's parameter interval roughly translates to 0.21 and 0.14 (note the change in direction), which, translated back to significance but with the sample size used here, would be 0.46 and 0.7. This is perhaps also the interval where this algorithm performs best in this analysis. Komogortsev too, in [16], finds 0.2 to be a good mean vector threshold. So while a Rayleigh test takes sample size into account (which could be important around saccades and blinks), mean vector thresholds could be a more universal property between data of different frequencies.
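The translation between significance and mean vector length can be sketched with the common first-order approximation of the Rayleigh test, p ≈ exp(-n R^2), where n is the number of samples and R the mean vector length. Whether this exact approximation was used for the numbers above is an assumption; the function names are hypothetical.

```python
import math


def mean_vector_length(angles_rad):
    """Length R of the mean of unit vectors pointing in the given directions."""
    n = len(angles_rad)
    c = sum(math.cos(a) for a in angles_rad) / n
    s = sum(math.sin(a) for a in angles_rad) / n
    return math.hypot(c, s)


def rayleigh_p(r, n):
    """First-order approximation of the Rayleigh test p-value."""
    return math.exp(-n * r ** 2)


def mean_vector_length_for(p, n):
    """Mean vector length that gives significance p with n samples."""
    return math.sqrt(-math.log(p) / n)


# 150 ms windows: 150 samples at 1000 Hz, 18 samples at 120 Hz.
for p in (0.001, 0.05):
    r = mean_vector_length_for(p, 150)
    print(f"p={p}: R={r:.2f}, p with 18 samples={rayleigh_p(r, 18):.2f}")
```

Under this approximation the interval 0.001 to 0.05 at 150 samples maps to mean vector lengths of about 0.21 and 0.14, and back to significance levels of roughly 0.44 and 0.70 at 18 samples, close to the values quoted above; the small difference is presumably due to rounding or a more refined approximation.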

In fact, if the same algorithm is run on the 60 Hz data for this trial, that is, with the same combination of fixations, saccades, and different speed pursuits, albeit not in the same order, the results are remarkably similar. The error scores and rates of the results on that data can be seen in figures C.9 and C.10 respectively.

4.5 I-KF

For the Kalman filter based algorithm, explained in sections 2.3.4 and 3.3.3, the quantitative and qualitative scores are found in figure 4.16. Once again, there is nothing in the varied parameter (the velocity threshold for fixations) that would affect the classification of saccades, so SQnS stays fixed at 0.74, which is the lowest of the algorithms analysed, but may of course be improved if the chi-square threshold for saccades is adjusted properly. The qualitative score for pursuits is decent and relatively stable throughout the threshold interval, but for fixations it diverges as the velocity threshold rises above 1.0 deg/s. As with the other algorithms, this is most probably because of overfill into pursuits.

The PQnS is very high at the low thresholds, but is lowered significantly as the threshold rises, and above about 1.5 deg/s it remains relatively stable. For an ideal velocity filter this could be expected though, as half of the pursuits (and the majority of time spent in them) would be pursuits of 0.5 and 1 deg/s. However, the FQnS remains low, and only reaches 0.6 at the highest velocity thresholds.

[Figure 4.15 plot. Four panels: Fixation Frame Metric Positive Rates, Fixation Frame Metric Negative Rates, Smooth pursuit Frame Metric Positive Rates, Smooth pursuit Frame Metric Negative Rates. Positive legend: dr, fr, uer, usr, tpr. Negative legend: mr, ir, oer, osr, tnr.]

Figure 4.15: Error categorisation rates for I-VMPRay as the significance threshold for the Rayleigh test is varied from 0.1 to 0.9.

[Figure 4.16 plot. x-axis: fixational velocity threshold. Left panel: quantitative score (PQnS, FQnS, MisFix, SQnS). Right panel: qualitative score in deg (FQlS, PQlS_P).]

Figure 4.16: Quantitative and qualitative scores for I-KF as the fixation velocity threshold is varied from 0.4 to 3.0 deg/s.

The separation of classifications by velocity in figure 4.17 corroborates the theory that the misclassification of the slower pursuits brings down the PQnS at higher velocity thresholds: around a velocity threshold of 0.5 deg/s, pursuits of that speed go from being classified as pursuits to being classified as fixations, and similarly for pursuits of 1 deg/s.

However, the correct classification of fixations remains relatively low, and considering the high misclassification of the slower pursuits, it can't be fully explained by system latency. Ignoring the slower pursuits and increasing the velocity threshold wouldn't seem to help either, as the propensity of the faster 5 deg/s pursuits to be misclassified increases exponentially towards the end of the interval, while correct fixation classification only sees a slow linear increase. This seems to indicate that there's an effect other than noise in the estimated velocity that hampers this approach.


[Figure 4.17 plot. Two panels: portion of movement classified as fixation, and portion of movement classified as pursuit. One curve per stimulus velocity: 0, 0.5, 1, 5 and 10.]

Figure 4.17: I-KF classification of different stimulus movements separated by velocity, as the fixation velocity threshold is varied from 0.4 to 3.0 deg/s.

The error scores of I-KF can be seen in figure 4.18. Interestingly, there are not a lot of fragmented fixation events; instead, events are mostly deleted or correct. As the velocity threshold rises, though, more of the correct returns result in merged events.

As expected, the number of inserted fixation and fragmenting pursuit returns is highest around velocity thresholds that correspond to the slower pursuits. In that interval, around half of the pursuit events are fragmented, and a similar portion are merged. As the velocity threshold increases, the merging disappears, and the number of deleted and correct pursuit events increases.

Note that while the number of merged fixations seems to correlate with the deleted pursuits, as is expected, the reverse doesn't seem to hold, as there remain deleted fixations even when there are no merged pursuits. Since the number of deleted fixations is quite sizable, it seems likely that they could correspond to the fixations following saccades. Even if they were classified as pursuits, they would not result in merged pursuits unless both the saccade and the earlier fixation were classified as pursuits.

The error rates (seen in figure 4.19) further clarify the issues this algorithm has. The pursuit deletion rate, compared with the ratio of deleted pursuit events, shows that the longer (and hence slower) events are the ones being deleted. And while the large amount of fixation overfill probably comes from the misclassified slower pursuits, the large amount of pursuit end overfill, even at higher velocity thresholds, indicates a large classification delay in the transition between faster pursuits and fixations (considering the slower pursuits are already classified as fixations at those thresholds). Similarly, the pursuit overfill start rate is remarkably high, and would fit the idea of fixations being deleted following saccades, if they were instead classified as pursuits.

[Figure 4.18 plot. Four panels, laid out as in figure 4.3: fixation event scores, fixation return scores, smooth pursuit event scores, smooth pursuit return scores. Event legend: D, F, FM, M, C. Return legend: I, F, FM, M, C.]

Figure 4.18: Error categorisation scores for I-KF as the fixation velocity threshold is varied from 0.4 to 3.0 deg/s.

Figure 4.20 illustrates more specifically the problem this algorithm has in the transitions between high speed pursuits and fixations in one of the trials. The segment starts with a fixation, and the velocity is correctly estimated very close to zero, but as the fixation turns into a pursuit and back again into a fixation, the estimated velocity changes only slowly, not reaching the full speed of the pursuit, and not falling below 1 deg/s during the fixation before it ends in a saccade. So clearly, in this case, the issue is not a noisy velocity measure, but the filter being too slow to react to changes in the velocity, which leads to deleted fixations even at reasonable velocity thresholds. The same is true for fixations following saccades, as can be seen towards the end of the segment in the figure.

Figure 4.19: Error categorisation rates for I-KF as the fixation velocity threshold is varied from 0.4 to 3.0 deg/s.

Figure 4.20: Illustration of the performance of the Kalman filter proposed in [13]. The left figure plots the x-coordinate of the gaze in degrees, and the right shows the velocity in the positive x-direction as it is estimated by the Kalman filter for the same time interval. (Axes: time [s] against position [deg] and velocity [deg/s].)

The responsiveness of the Kalman filter to changes in the velocity depends on the noise covariance matrices that define it, describing the process and measurement noise. And while these can be adjusted to a more adequate representation of the actual noise, there will always be a trade-off between responsiveness and carry-over noise in the resulting estimation. For example, Grindinger (in [19]) uses a perhaps more natural definition for these covariance matrices, where notably the process noise matrix takes the data frequency into account such that it models a process with constant velocity and random acceleration. The result of using those covariances can be seen in figure C.11, and while the velocity estimation is more responsive during pursuits and saccades, it is also noisier to such a degree that velocities of up to 3 deg/s are seen during fixations, which would complicate the classification of slower pursuits.
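One common way to write a process noise of this kind, for a discrete constant-velocity model driven by white random acceleration, is sketched here. Whether this is exactly the form used in [19] is an assumption; the symbols $\Delta t$ (the sample interval, i.e. the inverse of the data frequency) and $\sigma_a^2$ (the acceleration variance) are introduced for illustration only.

```latex
A = \begin{pmatrix} 1 & \Delta t \\ 0 & 1 \end{pmatrix},
\qquad
Q = \sigma_a^2
\begin{pmatrix}
\Delta t^4 / 4 & \Delta t^3 / 2 \\
\Delta t^3 / 2 & \Delta t^2
\end{pmatrix}
```

In this form a larger $\sigma_a^2$ lets the filter follow velocity changes more quickly, but also passes more measurement noise into the velocity estimate, which is the trade-off described in the paragraph before this sketch.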

Finally, even if the covariance matrices were perfectly balanced, it's important to note that the modelled process disregards saccades by design, so that they can be found with the chi-square test. Unmodelled dynamics break the optimality of Kalman filters, so this might be the cause of a more fundamental problem for the filter if it is to correctly find saccades while approximating the eye velocity well enough to separate slow pursuits from fixations. The solution would have to be to modify the algorithm, either by trying to include saccades in the model [27], or by using a more robust Kalman filter that can infer the saccades as unknown inputs, such as those proposed in [28] or [29].

4.6 Movement properties histograms

For a select choice of the best performing parameters, plot histograms for the properties of eye movements as classified by the algorithms, i.e. duration, velocity, amplitude. Compare these values to what is natural, and what would correspond to the stimulus.

The previous section has dealt with metrics based on comparisons with the stimulus. As detailed in chapter 2.4.1, another way to evaluate the algorithms is to ignore the stimulus, and analyse what an algorithm's classification implies with regard to properties of the different types of movements. Examples of such properties are duration, velocity, and acceleration, and possible metrics could be averages and variances.

Unfortunately, the data frequencies used in this analysis are too low to reliably infer velocities [17], which leaves duration to be investigated. However, due to the noisy results seen previously in most algorithms, the durations are heavily skewed towards the lowest allowed value, and the variance wouldn't properly reflect the distribution of durations.

Another alternative, which will be used here, is to show the distribution of durations explicitly, as a histogram. Unfortunately, this means that the results for multiple parameter choices can't be clearly displayed simultaneously. Instead, one parameter choice is made for each algorithm to give an idea of how the distribution looks. The following histograms all plot the count against the duration, calculated as the number of frames translated into milliseconds.
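The duration calculation can be sketched as follows. This is a minimal illustration and not the code used for the thesis: the label symbols ("F", "P", "S"), the 60 Hz frequency, the bin width, and the function names are all assumptions made for the example.

```python
from itertools import groupby


def movement_durations_ms(labels, frequency_hz):
    """Return {label: [duration_ms, ...]}, one duration per run of equal labels."""
    durations = {}
    for label, run in groupby(labels):
        n_frames = sum(1 for _ in run)
        # Number of frames translated into milliseconds.
        durations.setdefault(label, []).append(n_frames * 1000.0 / frequency_hz)
    return durations


def histogram(values, bin_width_ms):
    """Count values per bin; keys are the lower bin edges in milliseconds."""
    counts = {}
    for value in values:
        edge = int(value // bin_width_ms) * bin_width_ms
        counts[edge] = counts.get(edge, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical per-frame classification: fixation, pursuit, saccade, ...
labels = ["F"] * 6 + ["P"] * 3 + ["S"] * 2 + ["F"] * 3 + ["P"] * 12
durations = movement_durations_ms(labels, frequency_hz=60)
fixation_histogram = histogram(durations["F"], bin_width_ms=50)
```

A fragmented classification shows up directly in such a histogram as a tall first bin, since every short run of frames counts as its own movement.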

Figure 4.21: Movement duration histogram for I-VVT with a fixational velocity threshold of 6.5 deg/s. (Panels: fixation duration [ms] and smooth pursuit duration [ms], each against count [n].)

In figure 4.21, the durations of fixations and smooth pursuits as they are classified by I-VVT can be seen. The fixational velocity threshold was set to 6.5 deg/s so that there were as few deletions as possible while maintaining a decent classification of at least the faster pursuits. The previously seen fragmentation is evident from the significant skew towards the lowest allowed value. Naturally, if the fixational velocity threshold is increased, longer fixations become more common, but the skew remains significant, and as seen earlier, smooth pursuits start getting deleted.


Figure 4.22: Movement duration histogram for I-VDT with a fixational dispersion threshold of 0.95 deg and filtered position.

For I-VDT, the resulting duration distribution can be seen in figure 4.22. To improve the classification, the position was filtered using a Savitzky-Golay filter of order 2 and size 9. A dispersion threshold of 0.95 deg was chosen, so as to minimise deletions while keeping a reasonable sensitivity for both fixations and the faster pursuits.

While there is a slight increase in medium-length fixations, and the peak for pursuits is smaller, there is still a heavy skew towards the shortest allowed duration for both movements. As with I-VVT, the dispersion threshold can be increased to lower the number of short fixations, but it comes at the cost of deleting pursuits.


Figure 4.23: Movement duration histogram for I-VMPStd with a standard deviation threshold of 16.5 deg.

To reduce the noise in the classification of I-VMPStd, the result was modified with an iterated moving average of length 19 as explained in 4.3. Since deletion is not a big issue for this algorithm, a standard deviation threshold of 16.5 deg was chosen to reduce fragmentation and keep a relatively high true positive ratio for both fixations and the faster pursuits. The distribution of durations can be seen in figure 4.23.

While there is a significant bias for a fixation duration of 100 ms (being half the length of the moving average), the distribution is otherwise much more spread out. The same cannot be said for smooth pursuits, which remain heavily skewed towards the lower end of the spectrum. Since 100 ms is not a hard lower limit of movement duration, a fair few fixations and pursuits exist that are shorter than that. These are mostly made up of segments between saccades or blinks that are shorter than 100 ms, classified in their entirety as fixations or pursuits.


Figure 4.24: Movement duration histogram for I-VMPRay with a Rayleigh test significance of 0.6.

As with I-VMPStd, deletion is not a big issue for I-VMPRay. Instead, sensitivity was weighted against fragmentation, and a significance of 0.6 was used to create the distribution of durations seen in figure 4.24.

The peculiar shape of this distribution, with single-frame peaks separated by intervals of much less common durations, is an effect of I-VMPRay's block-wise classification. Since segments between saccades were classified in blocks of 6 frames, durations that are multiples of that block length are the most common, but since those segments aren't all evenly divisible into such blocks, other durations are also possible. Having noted that, the duration distributions are still heavily skewed towards the minimum, although the fixation duration does exhibit a thicker tail than for I-VVT and I-VDT.

The distribution of durations for I-KF with a velocity threshold of 2 deg/s, seen in figure 4.25, is quite distinct from the other distributions. Though there is no significant skew for fixations, there are a lot of longer fixations (not seen in the figure), some even longer than 10 s, which is quite unnatural given that no fixation in the stimulus is longer than 1 s. Of course, given that the fixational velocity limit was above that of the slower pursuits, large parts of those pursuits would be classified as fixations, and contribute to the longer fixation durations.

Figure 4.25: Movement duration histogram for I-KF with a fixational velocity threshold of 2 deg/s.

While the other algorithms mostly suffer from a noisy classification, the opposite is true for I-KF. And apart from misclassified slow pursuits, the issue here is, as discussed earlier, the corruption of the velocity measure from acceleration that isn't properly modelled, be it saccades or the acceleration to and from faster pursuits. Although flaws with this algorithm were made clear earlier, this is also a good example of how such flaws can be seen even without a reference classification.


Chapter 5

Discussion

To recapitulate the results of the previous section, most metrics showed a bleak picture for all algorithms, but upon closer inspection it was seen that this was for the most part due to the difficulty of classifying the slower pursuits; no algorithm could reliably separate the 0.5 and 1 deg/s pursuits from fixations. Since the stimulus was created to display similar trajectories (with the same curvature and length) for all pursuit velocities, the slower pursuits constituted a larger portion of the total time spent in pursuits. The combination of the large proportion and poor classification of the slower pursuits led to skewed results when this issue wasn't accounted for. Notably, the quantitative scores and the error rates showed a much worse classification result for pursuits than what one could expect in the general case. The presence of this skew in the quantitative and qualitative scores means that there's no direct comparison with Komogortsev's analysis of algorithms in [16]. However, the results by velocity can be compared with Larsson's [17] reports of correctly versus falsely detected pursuits. Larsson's results for I-VMPRay are comparable with the results found here, while I-VMPStd performs much better in this analysis, possibly due to the higher data frequency used by Larsson, which seems to suit I-VMPStd worse.

When disregarding the slower pursuits, it seems the movement pattern based algorithms (I-VMPStd and I-VMPRay) result in a better positive classification rate, while I-VVT and I-VDT struggle to properly classify the 5 deg/s pursuits. Although the I-VDT classification can be significantly improved by increasing the minimum fixation duration that accompanies the dispersion threshold, this results in a trade-off between the abilities to classify short fixations and slow pursuits; in this analysis, for example, proper classification of 5 deg/s pursuits was reached at durations of 300 ms, which is far longer than the natural minimum fixation duration.

The I-VMP variants have another advantage in that they are less prone to outright miss an entire movement, as is evident from the error deletion and merge scores. However, they also exhibit a high degree of fragmentation, something that could be fixed with extra filtering of the classification, or by setting minimum durations for the eye movements. The latter option should be used with care though, so as not to simply classify as unknown those segments where the thresholded variable is noisy; after all, the threshold is not a clear separator between the two classes.

Among the two movement pattern algorithms, I-VMPRay is the more easily tuned; when using a threshold on the mean vector, results seem to translate fairly well between frequencies and window sizes. The results of I-VMPStd, on the other hand, are grounded in the raw window size, which needs to be of a certain size regardless of frequency. This becomes an issue at higher frequencies, when a limited window size means only a short time period will be analysed.

The Kalman filter based approach did not fare well at all in this analysis. Although it sometimes could provide the most accurate velocity filtering (at times being able to separate fixations from slow pursuits), this meant that changes in velocity were very slow to register (meaning transitions from faster to slower movements corrupted the classification). Alternatively, changes in velocity could be handled quickly, but at a cost of accuracy in the velocity filtering (to the point where slower pursuits were indistinguishable from fixations).

The inadequacies of this algorithm lie in the model used, and while it can be tuned for a specific situation, improvement for the general case is probably to be found in more advanced underlying models such as those presented in [29].

Two algorithms not analysed here could be worth mentioning: the movement shape based machine learning approach developed in [30], and the data driven thresholding used in [21, 5]. The former is interesting, and the original paper [30] reports good initial results. But as with most machine learning, it will be an arduous task to analyse its performance, giving it enough good data to learn from while keeping it from overfitting so that it remains performant in the general case. The latter uses statistical assumptions on the velocity data to adaptively set velocity thresholds based on the velocity variance for each individual trial. This is also an interesting approach which has seen good results for the classification of fixations and saccades. But while it's relatively simple to use the velocity variance in that limited scenario (as saccades take up such a small part of the time-series data, yet are so distinct in velocity), using the same approach with pursuits would require additional assumptions on the relative size of fixations and pursuits in the data, and if it remains a purely velocity-based algorithm it would probably still have trouble with the slowest pursuits.
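The data driven thresholding idea can be illustrated with a small sketch. It is not the method of [21] or [5] as published: the median-based spread estimate, the multiplier of 6, the sample values, and the function name `adaptive_velocity_threshold` are assumptions chosen for illustration. It only covers the limited fixation versus saccade scenario described above, where saccade samples are rare and very fast.

```python
from statistics import median


def adaptive_velocity_threshold(velocities, multiplier=6.0):
    """Hypothetical per-trial threshold: multiplier times a median-based spread."""
    med = median(velocities)
    # Median-based variance estimate; the few very fast saccade samples
    # barely move it, unlike an ordinary mean-based variance.
    spread_sq = median([v * v for v in velocities]) - med * med
    return multiplier * max(spread_sq, 0.0) ** 0.5


# Hypothetical signed horizontal velocities [deg/s] for one trial:
# mostly fixational noise, plus two saccade samples at the end.
velocities = [-2.0, -1.0, -1.0, 0.0, 0.0, 1.0, 1.0, 2.0, 300.0, 400.0]
threshold = adaptive_velocity_threshold(velocities)
is_saccade = [abs(v) > threshold for v in velocities]
```

With slow pursuits in the data the same estimate would no longer describe fixational noise alone, which is one way to see why additional assumptions would be needed for pursuit classification.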

Apart from suffering from the pursuit class skew, the quantitative scores served their purpose well, albeit simply. The exception would be MisFix, which was of little interest, being roughly equal to 1 − FQnS for the most part, as pursuits and fixations were the only sizable classes. The qualitative scores were also of limited value: they are greatly affected by misclassifications in transitions to each movement, and they also improve with fragmentation, which is counterproductive. In the end, if positional accuracy is important for fixations, it can be improved in a post-processing step (perhaps by setting a dispersion threshold) independent of the algorithm used. The pursuit qualitative score was of unclear importance. It could perhaps best be regarded as a measure of the classification of catch-up saccades, but system delay means that it rewards classification of slower pursuits to such a degree that effects from catch-up saccades are negligible.

As for the error categorisation, though more convoluted and harder to initially get a feel for, it provided a more comprehensive view of how the data was classified by the different algorithms. In addition to the straightforward detection of deletion and merging, for example, under- and overfill could be analysed to adjust classification accuracy for system delay. And while the error rates did suffer from the pursuit class skew, the skew was also made apparent by a comparison with the error scores.

Using the stimulus as the reference classification worked for the most part, but was problematic for the longer movements. Specifically, the longer pursuits were prone to fragmentation due to catch-up saccades or blinks, though of course, these were also the slow pursuits that were hard to classify anyway.

The intrinsic movement properties, of course, avoided the issue of reference, but still suffered from bias in the stimulus. For example, the presence of hard-to-classify slow pursuits, and to a lesser extent the length of fixations in the stimulus, will skew the distribution of durations and velocity, but the nature of the skew depends on the algorithm. This issue is made worse by these properties not being easily overviewable (over a range of parameters) without severe information loss.

In the end, to understand and properly evaluate the different algorithms, most insights had to be leveraged with knowledge about the makeup of the stimuli. Perhaps the next level of analysis has to properly couple the stimulus information with the analysis. If the metrics could be separated between different velocities and durations in an ad hoc way, one could generate a more general stimulus, and identify problem areas (such as slow pursuits) while still providing metrics for other movements without skew.
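Separating a metric by stimulus velocity could look roughly like the sketch below. It assumes per-frame data where each frame carries the stimulus velocity and the label assigned by an algorithm; the label symbols, the sample values, and the function name are hypothetical and not taken from the analysis code.

```python
from collections import defaultdict


def pursuit_portion_by_velocity(stimulus_velocity, predicted_labels, pursuit_label="P"):
    """Portion of frames classified as pursuit, grouped by stimulus velocity."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for velocity, label in zip(stimulus_velocity, predicted_labels):
        totals[velocity] += 1
        if label == pursuit_label:
            hits[velocity] += 1
    return {velocity: hits[velocity] / totals[velocity] for velocity in sorted(totals)}


# Hypothetical frames: a fixation (0 deg/s), a slow and a fast pursuit.
stimulus_velocity = [0, 0, 0, 0, 0.5, 0.5, 0.5, 0.5, 5, 5, 5, 5]
predicted_labels = ["F", "F", "F", "P", "F", "F", "F", "F", "P", "P", "F", "P"]
portions = pursuit_portion_by_velocity(stimulus_velocity, predicted_labels)
```

Reporting one value per velocity keeps a poorly classified slow pursuit from dominating the score for the faster ones, regardless of how much of the total time it occupies.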


Appendix A

Glossary

FQlS Fixation Qualitative Score.

FQnS Fixation Quantitative Score.

MisFix Misclassified Fixation Score.

PQlS_P Smooth Pursuit Position Qualitative Score.

PQlS_V Smooth Pursuit Velocity Qualitative Score.

PQnS Smooth Pursuit Quantitative Score.

SQnS Saccade Quantitative Score.

I-KF Classification method based on the Kalman filter with an underlying position-velocity model. Saccades are found as deviations in the innovation with a chi-square test, and fixations and pursuits are separated with a velocity threshold on the a posteriori velocity.

I-VDT Classification method based on a velocity threshold between smooth pursuits and saccades, and a dispersion threshold to separate fixations and pursuits.

I-VMPRay Classification method related to I-VMPStd. The angle of change between consecutive gaze points is analysed, and a statistic is calculated on sequences of such angles, to compare them to the uniform distribution.

I-VMPStd Classification method separating fixations and pursuits by finding (and setting a threshold on) the standard deviation of the change in angles between consecutive lines fitted to the raw gaze position, approximating gaze direction.

I-VVT Classification method based on velocity thresholds for fixations, smooth pursuits and saccades.

Appendix B

Kalman Filter

The Kalman filter is a so-called recursive estimator developed by R.E. Kalman in [31]. It approximates the internal state of a partially observable, noisy linear process by estimating an error covariance, and combining it with known process covariances to calculate the Kalman gain. The Kalman gain, in turn, weighs a noisy observed state against a predicted state to create a state estimate with minimum error covariance.

As noted, the process that is to be approximated should be linear, with an internal state $x_k$ that is propagated in time through the state transition matrix $A_k$. This process can involve a known control input $u_k$ that affects the state through a control-input matrix $B_k$, and it can also be subject to a zero-mean noise $w_k$ with known covariance $Q_k$ called the process noise. The state progression is then described by:

\[ x_k = A_k x_{k-1} + B_k u_k + w_k \]

The state should be observable through the observation matrix $H_k$, although this could also be subject to an observation noise $v_k$ with zero mean and known covariance $R_k$. The observed values are then:

\[ y_k = H_k x_k + v_k \]

Internally, the Kalman filter also uses the error covariance matrix $P_k$, a measure of the state estimate error, and the Kalman gain $K_k$, which determines the final state estimate of the filter.

The first part of the Kalman filter is the prediction phase, where the previous estimates are propagated into a priori estimates of the next state and error covariance:

\[ \hat{x}^-_k = A_k \hat{x}_{k-1} + B_k u_k \]

\[ P^-_k = A_k P_{k-1} A_k^T + Q_k \]

Note that these can be calculated before any measurements at $k$ are available, and as such can be useful in themselves for tasks where measurements are sparse.

The next part, called the update phase, calculates the Kalman gain, and uses it to update the error covariance. Here the observed state $y_k$ is also used to calculate the innovation $\gamma_k = y_k - H_k \hat{x}^-_k$, which is weighted with the Kalman gain to produce the a posteriori state estimate.

\[ K_k = P^-_k H_k^T \left( H_k P^-_k H_k^T + R_k \right)^{-1} \]

\[ P_k = P^-_k - K_k H_k P^-_k \]

\[ \hat{x}_k = \hat{x}^-_k + K_k \gamma_k \]

Since the filter is recursive in the state and the error covariance matrix, the initial state $x_0$ and error covariance matrix $P_0$ should ideally be known. This is, however, not entirely necessary in most cases, since the initial state can often be estimated with the first data points, and the error covariance matrix generally converges quickly. Alternatively, since the calculation of the error covariance doesn't depend on the measurements, in many cases, such as for I-KF, where the dependent matrices don't (or rarely) change, the initial error covariance matrix can be approximated by doing a "dry-run" of the estimation calculations until convergence.
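The prediction and update phases, together with the "dry-run" initialisation, can be sketched for a one-dimensional position-velocity model. This is an illustrative sketch under stated assumptions, not the I-KF implementation: it assumes a constant-velocity transition, a position-only observation ($H = [1, 0]$), no control input, a random-acceleration process noise, and hypothetical values for the sample rate (60 Hz), the acceleration variance `q` and the measurement variance `r`.

```python
def covariance_step(P, dt, q, r):
    """One predict/update of the error covariance; returns (P_new, gain)."""
    # Prediction: P^- = A P A^T + Q with A = [[1, dt], [0, 1]].
    p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + q * dt**4 / 4
    p01 = P[0][1] + dt * P[1][1] + q * dt**3 / 2
    p10 = P[1][0] + dt * P[1][1] + q * dt**3 / 2
    p11 = P[1][1] + q * dt**2
    # Gain for H = [1, 0]: K = P^- H^T (H P^- H^T + R)^-1.
    s = p00 + r
    k0, k1 = p00 / s, p10 / s
    # Update: P = P^- - K H P^-.
    P_new = [[p00 - k0 * p00, p01 - k0 * p01],
             [p10 - k1 * p00, p11 - k1 * p01]]
    return P_new, (k0, k1)


def kalman_step(x, P, y, dt, q, r):
    """One full cycle: returns a posteriori state, covariance and innovation."""
    pos_prior = x[0] + dt * x[1]      # a priori state, no control input
    vel_prior = x[1]
    P_new, (k0, k1) = covariance_step(P, dt, q, r)
    innovation = y - pos_prior        # y_k - H x^-_k
    x_new = [pos_prior + k0 * innovation, vel_prior + k1 * innovation]
    return x_new, P_new, innovation


def steady_state_covariance(dt, q, r, tol=1e-10, max_iter=10000):
    """Dry-run the covariance recursion, without measurements, until it settles."""
    P = [[1.0, 0.0], [0.0, 1.0]]
    for iteration in range(1, max_iter + 1):
        P_next, _ = covariance_step(P, dt, q, r)
        change = max(abs(P_next[i][j] - P[i][j]) for i in range(2) for j in range(2))
        P = P_next
        if change < tol:
            return P, iteration
    return P, max_iter


dt = 1.0 / 60.0                       # hypothetical 60 Hz data
P, iterations = steady_state_covariance(dt, q=1000.0, r=0.01)
x = [0.0, 0.0]
for k in range(1, 301):               # noise-free 5 deg/s position ramp
    x, P, innovation = kalman_step(x, P, 5.0 * k * dt, dt, q=1000.0, r=0.01)
```

Because the covariance recursion never touches the measurements, `steady_state_covariance` can be run once before any data arrives, and the resulting matrix used as the initial error covariance.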

The derivation of these equations can be found in the original paper [31], or in numerous introductory articles. There are also various alternative formulations, extensions to non-linear processes, estimations of input variables, and so on.


Appendix C

Additional Figures

This appendix contains a couple of additional figures that did not fit into earlier chapters due to space considerations. While most have an explanatory caption, their proper context is given where they are referred to, so it is not recommended that this chapter is read on its own.

Figure C.1: The velocity qualitative scores (PQlS_V) for the analysed algorithms, as the thresholding parameter is varied from a minimum to a maximum value. For I-VVT the fixational velocity threshold is varied from 2 to 12 deg/s; for I-VDT the fixational dispersion threshold is varied from 0.1 to 1.6 deg; for I-VMPStd the standard deviation threshold is varied from 10 to 30 deg; for I-VMPRay the significance threshold is varied from 0.1 to 0.9; for I-KF the fixational threshold is varied from 0.4 to 3.0 deg/s. (Axes: threshold parameter, min to max, against qualitative score [deg/s]; one curve each for I-VVT, I-VDT, I-VMPStd, I-VMPRay and I-KF.)

Figure C.2: Error categorisation rates for I-VDT as the fixational dispersion threshold is varied from 0.1 to 1.6 deg. (Panels: fixation and smooth pursuit frame metric positive rates and negative rates.)

Figure C.3: I-VDT classification of different stimulus movements separated by velocity, as the fixation dispersion threshold is varied from 0.1 to 1.2 deg. Additionally, the position has been filtered by the same Savitzky-Golay filter as the velocity. (Panels: portion of movement classified as fixation and as pursuit against the fixational dispersion threshold; one curve per stimulus velocity: 0, 0.5, 1, 5 and 10.)

Figure C.4: I-VDT classification of different stimulus movements separated by velocity, as the fixation dispersion threshold is varied from 0.1 to 2.0 deg. Additionally, the position has been filtered by the same Savitzky-Golay filter as the velocity, and the minimum fixation duration was increased to 320 ms. (Panels: portion of movement classified as fixation and as pursuit against the fixational dispersion threshold; one curve per stimulus velocity: 0, 0.5, 1, 5 and 10.)


Figure C.5: Error categorisation scores for I-VMPStd as the standard deviation threshold is varied from 10 to 30 deg. Additionally, the resulting classification was filtered with a moving average (implying a soft minimum duration of 80 ms for pursuits and fixations).

Figure C.6: Error categorisation rates for I-VMPStd as the standard deviation threshold is varied from 10 to 30 deg. Additionally, the resulting classification was filtered with a moving average (implying a soft minimum duration of 80 ms for pursuits and fixations).


Figure C.7: Error categorisation scores for I-VMPRay with a 300 ms window size, as the significance for the Rayleigh test is varied from 0.1 to 0.9.

Figure C.8: Error categorisation rates for I-VMPRay with a 300 ms window size, as the significance for the Rayleigh test is varied from 0.1 to 0.9. (Panels: fixation and smooth pursuit frame metric positive rates and negative rates.)


Figure C.9: Error categorisation scores for I-VMPRay on 60 Hz data, as the significance for the Rayleigh test is varied from 0.1 to 0.9.

Figure C.10: Error categorisation rates for I-VMPRay on 60 Hz data, as the significance for the Rayleigh test is varied from 0.1 to 0.9. (Panels: fixation and smooth pursuit frame metric positive rates and negative rates.)

Figure C.11: Illustration of the performance of the Kalman filter proposed in [19]. The left figure plots the x-coordinate of the gaze in degrees, and the right shows the velocity in the positive x-direction as it is estimated by the Kalman filter for the same time interval. (Axes: time [s] against position [deg] and velocity [deg/s].)

Bibliography

[1] E.B. Huey. The psychology and pedagogy of reading. Macmillan New York, 1908.

[2] A.T. Duchowski. Eye tracking methodology: Theory and practice. Springer-Verlag New York Inc, 2007.

[3] P.A. Carpenter and M.A. Just. Reading comprehension as eyes see it. Cognitive processes in comprehension, pages 109–139, 1977.

[4] Ethel Matin. Saccadic suppression: a review and an analysis. Psychological bulletin, 81(12):899, 1974.

[5] M. Nyström and K. Holmqvist. An adaptive algorithm for fixation, saccade, and glissade detection in eyetracking data. Behavior Research Methods, 42(1):188–204, February 2010.

[6] Lorrin A Riggs, Floyd Ratliff, Janet C Cornsweet, and Tom N Cornsweet. The disappearance of steadily fixated visual test objects. JOSA, 43(6):495–500, 1953.

[7] Susana Martinez-Conde, Stephen L Macknik, and David H Hubel. The role of fixational eye movements in visual perception. Nature Reviews Neuroscience, 5(3):229–240, 2004.

[8] Kenneth J Ciuffreda and Barry Tannen. Eye movement basics for the clinician, volume 18. Mosby St. Louis, 1995.

[9] Sophie de Brouwer, Marcus Missal, Graham Barnes, and Philippe Lefèvre. Quantitative analysis of catch-up saccades during sustained pursuit. Journal of Neurophysiology, 87(4):1772–1780, 2002.

[10] Craig H Meyer, Adrian G Lasker, and David A Robinson. The upper limit of human smooth pursuit velocity. Vision research, 25(4):561–563, 1985.

[11] A Terry Bahill, Michael R Clark, and Lawrence Stark. The main sequence, a tool for studying human eye movements. Mathematical Biosciences, 24(3):191–204, 1975.

[12] J. S.A. Lopez. Off-the-Shelf Gaze Interaction. PhD thesis, IT University of Copenhagen, September 2009.

[13] O. Komogortsev and J. Khan. Kalman filtering in the design of eye-gaze-guided computer interfaces. Human-Computer Interaction. HCI Intelligent Multimodal Interaction Environments, pages 679–689, 2007.

[14] D.D. Salvucci and J.H. Goldberg. Identifying fixations and saccades in eye-tracking protocols. In Proceedings of the 2000 symposium on Eye tracking research & applications, pages 71–78. ACM New York, NY, USA, 2000.

[15] K. Holmqvist, M. Nyström, R. Andersson, R. Dewhurst, H. Jarodzka, and J. van de Weijer. Eye Tracking: A Comprehensive Guide to Methods and Measures. Oxford University Press, 2011.

[16] Oleg V Komogortsev and Alex Karpov. Automated classification and scoring of smooth pursuit eye movements in the presence of fixations and saccades. Behavior research methods, 45(1):203–215, 2013.

[17] L. Larsson. Event detection in eye-tracking data. Master's thesis, Lund University, 2010.

[18] D. Sauter, B.J. Martin, N. Di Renzo, and C. Vomscheid. Analysis of eye tracking movements using innovations generated by a Kalman filter. Medical and Biological Engineering and Computing, 29(1):63–69, 1991.

[19] T. Grindinger. Eye Movement Analysis & Prediction with the Kalman Filter. Master's thesis, Clemson University, 2006.

[20] A. Savitzky and M.J.E. Golay. Smoothing and differentiation of data by simplified least squares procedures. Analytical chemistry, 36(8):1627–1639, 1964.

[21] R. Engbert and R. Kliegl. Microsaccades uncover the orientation of covert attention. Vision Research, 43(9):1035–1045, 2003.

[22] O.V. Komogortsev, S. Jayarathna, D.H. Koh, and S.M. Gowda. Qualitative and Quantitative Scoring and Evaluation of the Eye Movement Classification Algorithms. Technical Reports-Computer Science, page 15, 2009.

[23] Q. Yang, M.P. Bucci, and Z. Kapoula. The latency of saccades, vergence, and combined eye movements in children and in adults. Investigative ophthalmology & visual science, 43(9):2939, 2002.

[24] B. Fischer and E. Ramsperger. Human express saccades: extremely short reaction times of goal directed eye movements. Experimental Brain Research, 57(1):191–195, 1984.

[25] Jamie A Ward, Paul Lukowicz, and Hans W Gellersen. Performance metrics for activity recognition. ACM Transactions on Intelligent Systems and Technology (TIST), 2(1):6, 2011.

[26] G. Larsson. Evaluation methodology of eye movement classification algorithms, 2010.

[27] Wael Abd-Almageed, M Sami Fadali, and George Bebis. A non-intrusive kalman filter-based tracker for pursuit eye movement. In American Control Conference, 2002. Proceedings of the 2002, volume 2, pages 1443–1447. IEEE, 2002.

[28] Chien-Shu Hsieh. Robust two-stage kalman filters for systems with unknown inputs. Automatic Control, IEEE Transactions on, 45(12):2374–2378, 2000.

[29] Mohamed Darouach, Michel Zasadzinski, André Bassong Onana, Samuel Nowakowski, et al. Kalman filtering with unknown inputs via optimal state estimation of singular systems. International Journal of Systems Science, 26(10):2015–2028, 1995.

[30] Mélodie Vidal, Andreas Bulling, and Hans Gellersen. Detection of smooth pursuits using eye movement shape features. In Proceedings of the Symposium on Eye Tracking Research and Applications, pages 177–180. ACM, 2012.

[31] Rudolph Emil Kalman. A new approach to linear filtering and prediction problems. Journal of basic Engineering, 82(1):35–45, 1960.

[32] O.V. Komogortsev and J.I. Khan. Eye movement prediction by Kalman filter with integrated linear horizontal oculomotor plant mechanical model. In Proceedings of the 2008 symposium on Eye tracking research & applications, pages 229–236. ACM, 2008.

[33] DA Robinson. The mechanics of human smooth pursuit eye movement. The Journal of Physiology, 180(3):569, 1965.
