Analysing Eye-Tracking Data


Hayward Godwin, University of Southampton

Outline

Part 1
• Eye-tracking measures – an overview
• Data Viewer reports
• The Organise-Analyse-Visualise approach in R

Part 2
• Try it yourself!

Eye-Tracking Measures: An Overview

for a detailed review, see Rayner (2009)

“Global” versus “Local” measures

• Global measures are computed at the overall (or global) level of a trial and ignore what was being fixated at any point in time
  • e.g., mean fixation duration for a trial
• Local measures are computed for each object or stimulus in a trial, paying attention to what was being fixated at any point in time
  • e.g., mean fixation duration for target words in a reading study
• Many measures can be computed at both a global and a local level

“Search for a blue square target”

Mean Fixation Duration (global)

(Mean duration of fixations)

Mean fixation duration = (130+125+110+90+150+190)/6 = 132.5 ms

Mean Fixation Duration (local)

(Mean duration of fixations on a specific object type)

Mean fixation duration for target = (110+190)/2 = 150 ms

Number of Fixations (global)

(Mean number of fixations)

Number of fixations = 6

Number of Fixations (local)

(Mean number of fixations on a specific object type)

Number of fixations for target = 2

Total Gaze Duration (global)

(Sum of fixation durations)

Total gaze duration = 130+125+110+90+150+190 = 795 ms

Total Gaze Duration (local)

(Sum of fixation durations on a specific object type)

Total gaze duration for target = 110+190 = 300 ms
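The global and local versions of these three measures can be checked with a few lines of R. This is a minimal sketch using the six fixation durations from the example trial; the vector names `durations` and `on_target` are illustrative only, and the target is taken to be fixated on the third and sixth fixations (the 110 ms and 190 ms fixations).

```r
# Fixation durations (ms) from the example trial, in order of occurrence
durations <- c(130, 125, 110, 90, 150, 190)

# TRUE where the fixation landed on the target (the 110 ms and 190 ms fixations)
on_target <- c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE)

# Global measures: use every fixation in the trial
global_mean  <- mean(durations)     # (130+125+110+90+150+190)/6
global_count <- length(durations)
global_total <- sum(durations)

# Local measures: use only the fixations on the target
target_mean  <- mean(durations[on_target])   # (110+190)/2
target_count <- sum(on_target)
target_total <- sum(durations[on_target])    # 110+190

print(c(mean = global_mean, n = global_count, total = global_total))
print(c(mean = target_mean, n = target_count, total = target_total))
```

The only difference between the global and local versions is the logical index that restricts the calculation to fixations on the object type of interest.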

First-pass Gaze Duration

(Sum of fixation durations on the first visit or pass of an object)

First-pass gaze duration for target = 110 ms

(the second fixation of 190 ms duration occurs on the second pass so is excluded)

Single Fixation Duration

(Mean of fixation durations when an object is only ever fixated once)

This is one of the cleanest measures in eye-tracking: when an object is fixated only once, that single fixation captures the time taken to fully process the object

Here, only two objects are ever fixated once. These are highlighted to the left.

Since the target object is fixated twice, this trial would be excluded from the single fixation duration calculations.
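First-pass gaze duration and the single-fixation filter can both be derived from the sequence of fixated objects. The sketch below is only an illustration: the object labels for the non-target fixations are assumptions (the slide figure is not available), chosen to be consistent with the example, where the target is fixated twice and two other objects are fixated once each. A "." marks a fixation that is not on any object.

```r
# Fixation durations (ms) in order, with ASSUMED object labels per fixation
durations <- c(130, 125, 110, 90, 150, 190)
labels    <- c(".", "DISTRACTOR_1", "TARGET", "DISTRACTOR_2", ".", "TARGET")

# A visit (pass) is a run of consecutive fixations on the same object
runs     <- rle(labels)
visit_id <- rep(seq_along(runs$lengths), times = runs$lengths)

# First-pass gaze duration: sum of durations within the first visit to the target
first_target_visit <- min(visit_id[labels == "TARGET"])
first_pass_target  <- sum(durations[visit_id == first_target_visit])

# Single fixation duration: keep only objects that were fixated exactly once
fix_counts       <- table(labels[labels != "."])
single_objects   <- names(fix_counts)[as.vector(fix_counts) == 1]
single_durations <- durations[labels %in% single_objects]

print(first_pass_target)
print(single_objects)
print(single_durations)
```

Because the target has two fixations it does not appear in `single_objects`, which mirrors the exclusion described above.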

Proportion of objects fixated (global)

(Proportion of objects directly fixated)

Proportion fixated = 3 / 5 = 0.6

Proportion of objects fixated (local)

(Proportion of objects directly fixated, broken down by object type)

Proportion of distractors fixated = 2 / 4 = 0.5

Probability of fixating target = 1 / 1 = 1

Saccade onset latency

(Time from display onset to start of first saccade)

If display onset occurs at time 0, then this is 130 ms

Mean number of visits

(Mean number of times each object is visited)

Count up the number of times each object is visited and then divide by the number of objects that were visited

Do NOT include zero values for unvisited objects

(1 + 2 + 1) / 3 = 4 / 3 ≈ 1.33
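The proportion-fixated and mean-visits calculations can be sketched in R as follows. The per-fixation labels and the object names are assumptions made for illustration (a display of one target and four distractors, with "." marking fixations that are not on an object); they are chosen to reproduce the counts used in the example.

```r
# ASSUMED sequence of fixated objects for the example trial
labels      <- c(".", "DISTRACTOR_1", "TARGET", "DISTRACTOR_2", ".", "TARGET")
all_objects <- c("TARGET", paste0("DISTRACTOR_", 1:4))

# Proportion of objects fixated (global): 3 of the 5 objects
fixated      <- unique(labels[labels != "."])
prop_fixated <- length(fixated) / length(all_objects)

# Proportion fixated (local): broken down by object type
distractors      <- setdiff(all_objects, "TARGET")
prop_distractors <- mean(distractors %in% fixated)   # 2 of 4
prop_target      <- mean("TARGET" %in% fixated)      # 1 of 1

# Mean number of visits: count runs per object, visited objects only
runs        <- rle(labels)
visits      <- table(runs$values[runs$values != "."])
mean_visits <- mean(as.vector(visits))               # (1 + 2 + 1) / 3

print(c(prop_fixated, prop_distractors, prop_target, mean_visits))
```

Unvisited objects never appear in `visits`, so they contribute no zeros to the mean, as required.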

Saccade Amplitude

(Mean amplitude of saccades)

Mean length of all saccades = (1.2 + 1.4 + 2.2 + 0.2 + 3.4) / 5 = 1.68

Verification Time

(Time between first fixating the target and the button press)

Find when the button press occurred. If it occurred 150 ms into the second fixation (of 190 ms) on the target, then verification time =

110 + 90 + 150 + 150 = 500 ms

(only the first 150 ms of the final 190 ms fixation is counted, and saccade durations are ignored in this sum)

A better way to do this is to find the time at which the first fixation on the target starts and subtract this value from the RT

Scanpath Ratio

(Sum of saccade lengths to target divided by shortest distance to target)

Scanpath ratio = (1.2 + 1.4 + 2.2 + 0.2 + 3.4) / 5.2 = 8.4 / 5.2 ≈ 1.62
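Both saccade-based measures use the same five saccade lengths, so they can be checked together. This is a small sketch with illustrative variable names; the units are whatever the saccade lengths were measured in.

```r
# Saccade lengths from the example trial, in order of occurrence
saccade_lengths <- c(1.2, 1.4, 2.2, 0.2, 3.4)

# Shortest (straight-line) distance from the starting point to the target
shortest_distance <- 5.2

# Saccade amplitude: mean length of all saccades
mean_amplitude <- mean(saccade_lengths)

# Scanpath ratio: total distance travelled relative to the shortest route
scanpath_ratio <- sum(saccade_lengths) / shortest_distance

print(round(c(mean_amplitude, scanpath_ratio), 2))
```

A scanpath ratio of 1 would mean the eyes took the most direct route to the target; larger values indicate a less efficient route.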

Notes on Measures

• There are many, many measures that can be run
• Just because you can run these, it doesn’t mean that you should
• Focus on running only the measures that address your research questions and avoid doing or reporting additional ones for the sake of it (i.e., avoid fishing!)

Data Viewer Reports

Fixation Report

• One row of data for every fixation in your study (per trial, per participant)
• You will typically need to use the fixation report if you are running visual search/scene perception studies
• Use fixation reports to filter out fixations that coincide with other events, such as display changes, button-press responses, etc.
• This can be done by filtering using the Interest Period (as you’ll see in the tutorials) but often you’ll end up removing some fixations you still want
• Fixation reports can also be used to re-compute the size of interest areas and capture fixations that fell just outside of interest areas

Fixation Report – Important Columns

• RECORDING_SESSION_LABEL: The recording session ID
• TRIAL_INDEX: Trial number
• CURRENT_FIX_INDEX: The fixation ID for the current fixation
• CURRENT_FIX_DURATION: The duration of the current fixation
• CURRENT_FIX_BUTTON_PRESS_X: The time during the current fixation that a button was pressed
• CURRENT_FIX_INTEREST_AREA_LABEL: The interest area label of the current fixation (“.” if the eyes are not on an IA)
• CURRENT_FIX_NEAREST_INTEREST_AREA_LABEL: The nearest IA to the eyes
• CURRENT_FIX_NEAREST_INTEREST_AREA_DISTANCE: The distance to the CENTRE of the nearest IA
• Can also get NEXT_ and PREVIOUS_ versions of all measures

Interest Area Report

• One row of data for every interest area in your study (per trial, per participant)
• Reading researchers typically use this type of report
• They typically change the interest period to be set to the time period of the trial itself, enabling the filtering out of any unnecessary fixations

Interest Area Report – Important Columns

• RECORDING_SESSION_LABEL: The recording session ID
• TRIAL_INDEX: Trial number
• IA_DWELL_TIME - Total time spent on the IA (sum of all fixations on the IA)
• IA_FIRST_FIXATION_DURATION - Often referred to as First Fix Duration in reading research. The duration of the first fixation on the interest area (only on first pass; if the target region is skipped this will have no value)
• IA_FIRST_RUN_DWELL_TIME - Often referred to as Gaze Duration in reading research. The sum of all fixations on the IA during the first pass. You also use this column for calculating Single Fixation Duration, but remove all occurrences where the IA region was fixated more than once.
• IA_ID/IA_LABEL - The ID number and label for the interest area
• IA_REGRESSION_IN - Returns 0 or 1
• IA_REGRESSION_IN_COUNT - Returns the number of regressions in
• IA_REGRESSION_OUT - Returns 0 or 1
• IA_REGRESSION_OUT_COUNT - Returns the number of regressions out
• IA_REGRESSION_PATH_DURATION - Often referred to as Go Past Time in reading research. Sum of all fixations that occur before passing to the right of the target interest area (to a greater numbered IA_ID).
• IA_SKIP - Returns 0 or 1

Message Report

• One row of data for every message that occurred during the study (per trial, per participant)
• If you want an accurate view of when things happened during your study, the message report is the one to use
• This is particularly important for gaze-contingent studies where display changes occur
• You can technically get most of the messages that occur from the fixation report. However, some messages do get missed from the fixation report

Message Report – Important Columns

• RECORDING_SESSION_LABEL: The recording session ID
• TRIAL_INDEX: Trial number
• CURRENT_MSG_LABEL: message text details
• CURRENT_MSG_TEXT: message text details
• CURRENT_MSG_TIME: the time the message occurred

Sample Report

• One row of data for every sample recorded by the eye-tracker during the study (per trial, per participant)
• If you have your EyeLink running at 1000 Hz, that gives you 1,000 rows of data per second of recording
• Sample reports are typically tens of millions of rows in size
• You’ll only need to use a sample report if you have certain highly customised setups (e.g., moving displays) or want to get an idea of millisecond-by-millisecond pupil size (as is the case in pupillometry)

The Organise-Analyse-Visualise Approach in R

Data

• In the past, data could easily be organised in Excel, analysed in SPSS and visualised in SPSS/Excel/Sigmaplot
• With the size and complexity of eye-tracking studies, this is no longer really possible
• We can now do all three steps in R, making the transition between them easier:
  • Organise: data.table
  • Analyse: ezANOVA
  • Visualise: ggplot

Organising your Scripts for Reproducible Results

• However you do things, it’s best to have a consistent approach to organising your R scripts
• I have two types of script:
  • ORGANISE__XYZ.R scripts that organise the data
  • ANALYSE__XYZ.R scripts that analyse and visualise the data
• However you set up your own R scripts, find an approach and stick to it
• This then makes it easier to copy and paste existing scripts, and being consistent means you can go back to old stuff and understand it more easily

Organise: the data.table package

• Why use data.table?
  • It does things very quickly
  • It extends (builds upon) data.frame objects, meaning that everything you can do to a data.frame object, you can do to a data.table
• Now going to go through some examples of what it can do and how to use it
• I’ll be giving out the example code later, so no need to type or run through it now

Create a data.frame

Create a normal data.frame

It will look something like this on the right

It lists different trials for a bunch of participants and gives you their RT (Reaction Time) in ms

Convert data.frame to data.table
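The code shown on these two slides is not preserved in this transcript, so the following is only a sketch of what creating and converting such a table might look like. The values, the trial type labels "A" and "B", and the name `dataDF` are made up for illustration; the column names `ppt`, `trial`, `trialType` and `RT` and the name `dataDT` follow the surrounding text.

```r
library(data.table)

# A small made-up data set: 3 participants x 4 trials (all values hypothetical)
dataDF <- data.frame(
  ppt       = rep(1:3, each = 4),
  trial     = rep(1:4, times = 3),
  trialType = rep(c("A", "B"), times = 6),
  RT        = c(520, 610, 480, 700, 555, 640, 495, 720, 530, 600, 510, 690)
)

# Convert the data.frame to a data.table
dataDT <- data.table(dataDF)   # as.data.table(dataDF) is an equivalent option

print(dataDT)
```

The result is still a data.frame as far as other functions are concerned, which is why it can be passed on to analysis and plotting code unchanged.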

Add Keys

• For large data sets you will want to set keys
• When data are keyed, they can be processed faster
• A key is set on one or more columns in your data.table
• When a column is part of the key, data.table can group the data by that column more rapidly
• In our example, let's set participant id (ppt) and trialType as keys, using the setkey command, so we can group the data by these values more rapidly
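Setting the key might look like the sketch below. The data values are made up for illustration; `setkey()` and `key()` are standard data.table functions.

```r
library(data.table)

# Made-up example data (values and trial type labels are hypothetical)
dataDT <- data.table(
  ppt       = rep(1:3, each = 4),
  trial     = rep(1:4, times = 3),
  trialType = rep(c("A", "B"), times = 6),
  RT        = c(520, 610, 480, 700, 555, 640, 495, 720, 530, 600, 510, 690)
)

# Key the table by participant and trial type.
# setkey() works by reference: it sorts dataDT in place and returns it invisibly.
setkey(dataDT, ppt, trialType)

# Check which columns are currently keyed
print(key(dataDT))
print(dataDT)
```

Note that keying re-orders the rows by the key columns, so the table will no longer be in its original trial order afterwards.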

Basic Syntax

• {WHERE} allows you to select only certain rows. In other words, you can get the command you run to focus only on the data cells WHERE certain conditions are met
• {SELECT} is where you tell data.table what columns or values you want back. In other words, you SELECT certain values
• {GROUPBY} allows you to group the output data in different ways. This is a bit like pivot tables in Excel.
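The three parts map onto the three positions inside the square brackets of a data.table call. The sketch below uses made-up data and a hypothetical filter (RT above 500 ms) purely to show where each part goes.

```r
library(data.table)

# Made-up example data (values and trial type labels are hypothetical)
dataDT <- data.table(
  ppt       = rep(1:3, each = 4),
  trial     = rep(1:4, times = 3),
  trialType = rep(c("A", "B"), times = 6),
  RT        = c(520, 610, 480, 700, 555, 640, 495, 720, 530, 600, 510, 690)
)

# General form:  DT[ {WHERE}, {SELECT}, by = {GROUPBY} ]
result <- dataDT[
  RT > 500,                                  # WHERE: keep only rows with RT above 500
  .(meanRT = mean(RT), nTrials = .N),        # SELECT: the values we want back
  by = trialType                             # GROUPBY: one output row per trial type
]

print(result)
```

Any of the three parts can be left empty; for example, leaving WHERE blank applies SELECT to every row.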

Getting means

• How about the mean RT overall?

• Gives us:

• In other words we are SELECTing the mean of the RT column
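As a sketch of the overall mean, with WHERE left empty and no grouping (the data values are made up for illustration):

```r
library(data.table)

# Made-up example data (values and trial type labels are hypothetical)
dataDT <- data.table(
  ppt       = rep(1:3, each = 4),
  trial     = rep(1:4, times = 3),
  trialType = rep(c("A", "B"), times = 6),
  RT        = c(520, 610, 480, 700, 555, 640, 495, 720, 530, 600, 510, 690)
)

# Empty WHERE, SELECT the mean of RT, no GROUPBY
overallMean <- dataDT[, mean(RT)]

print(overallMean)
```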

Getting means

• Overall RT isn’t that interesting. Let’s GROUP BY trialType:

• Gives us:

• In other words we are SELECTing the mean of the RT column but GROUPING BY the trialType column

Getting means

• Now let's group by participant and trialType:

• Gives us:

• In other words we are SELECTing the mean of the RT column but GROUPING BY the trialType and ppt columns
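Grouping by one column and then by two columns might look like this. It is a sketch on made-up data; the output column name `meanRT` is an illustrative choice.

```r
library(data.table)

# Made-up example data (values and trial type labels are hypothetical)
dataDT <- data.table(
  ppt       = rep(1:3, each = 4),
  trial     = rep(1:4, times = 3),
  trialType = rep(c("A", "B"), times = 6),
  RT        = c(520, 610, 480, 700, 555, 640, 495, 720, 530, 600, 510, 690)
)

# Mean RT grouped by trial type only
byType <- dataDT[, .(meanRT = mean(RT)), by = trialType]

# Mean RT grouped by participant and trial type
byPptType <- dataDT[, .(meanRT = mean(RT)), by = .(ppt, trialType)]

print(byType)
print(byPptType)
```

Adding a column to `by` adds one level of breakdown: two rows for the two trial types become six rows for three participants by two trial types.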

Getting means

• But what if we want to only obtain the means for trials 3 and 4? How do we do that? We use WHERE! (Reminder: “==” means “is equal to”)

• Gives us:

• In other words we are SELECTing the mean of the RT column, GROUPING BY the trialType and ppt columns, but only including values WHERE trial is 3 or 4
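Adding the WHERE condition could be sketched as follows, again on made-up data. The `|` operator means "or", so the condition keeps rows where trial equals 3 or trial equals 4.

```r
library(data.table)

# Made-up example data (values and trial type labels are hypothetical)
dataDT <- data.table(
  ppt       = rep(1:3, each = 4),
  trial     = rep(1:4, times = 3),
  trialType = rep(c("A", "B"), times = 6),
  RT        = c(520, 610, 480, 700, 555, 640, 495, 720, 530, 600, 510, 690)
)

# WHERE trial is 3 or 4, SELECT mean RT, GROUP BY participant and trial type
lateMeans <- dataDT[trial == 3 | trial == 4,
                    .(meanRT = mean(RT)),
                    by = .(ppt, trialType)]

print(lateMeans)
```

An equivalent way to write the condition is `trial %in% c(3, 4)`, which scales better when there are many values to match.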

Adding Columns

• data.table also offers more convenient syntax for adding columns
• If you run:

• You add a newColumn column with a value of 1. You can combine this with WHERE and GROUP BY commands. If you run:

• You get:
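The commands from this slide are not preserved in the transcript. As a sketch, columns are added with data.table's `:=` operator; the second command below (a per-participant mean for one trial type, stored in a hypothetical column `meanRT_B`) is only an illustrative guess at how WHERE and GROUP BY might be combined with it.

```r
library(data.table)

# Made-up example data (values and trial type labels are hypothetical)
dataDT <- data.table(
  ppt       = rep(1:3, each = 4),
  trial     = rep(1:4, times = 3),
  trialType = rep(c("A", "B"), times = 6),
  RT        = c(520, 610, 480, 700, 555, 640, 495, 720, 530, 600, 510, 690)
)

# Add a column called newColumn with a value of 1 on every row
dataDT[, newColumn := 1]

# Combine := with WHERE and GROUP BY: for trial type "B" rows only,
# store each participant's mean RT; other rows are left as NA
dataDT[trialType == "B", meanRT_B := mean(RT), by = ppt]

print(dataDT)
```

The `:=` operator modifies the table in place, so there is no need to assign the result back to `dataDT`.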

Joins and Merges

• Suppose we forgot to include information relating to which condition each participant was in. How do we get that in there?
• We can use a join!
• A join in data science is a special type of operation that combines two datasets
• To do this, create a new data.table, listing the participant id and the condition, and follow the steps in the next slide
• Joins (or merges) hunt down identical column names and then join the data from one table with that from another

Performing the Join

• Create a new data.table containing condition information and set the key

• To perform a join, it’s one simple command
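The join command on the slide is not preserved, so this sketch uses `merge()`, which is one common way to join two data.tables on a shared column; the original may have used a different form. The table name `conditionDT` and the condition labels are hypothetical; `dataDT` and `joinedDT` follow the names used on the slides.

```r
library(data.table)

# Made-up example data (values and trial type labels are hypothetical)
dataDT <- data.table(
  ppt       = rep(1:3, each = 4),
  trial     = rep(1:4, times = 3),
  trialType = rep(c("A", "B"), times = 6),
  RT        = c(520, 610, 480, 700, 555, 640, 495, 720, 530, 600, 510, 690)
)

# New table listing each participant's condition (labels are hypothetical)
conditionDT <- data.table(
  ppt       = 1:3,
  condition = c("control", "experimental", "control")
)

# Key both tables by the shared column
setkey(dataDT, ppt)
setkey(conditionDT, ppt)

# Join: every row of dataDT picks up the condition for its participant
joinedDT <- merge(dataDT, conditionDT, by = "ppt")

print(joinedDT)
```

The join matches rows on the `ppt` column because it has the same name in both tables, which is why consistent column naming matters.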

We then have our joined-up data

[Tables shown on slide: dataDT and joinedDT]


Other Types of Join

• We’ve just done our first join!

• Note that we’ve just joined one column with one other column, but there is no theoretical limit to how many columns you can join by at once

• There are many types of join, which you may want to use (e.g., left, right, natural, outer, full, Cartesian product, etc.)

• The main point is making sure that the column names match in the tables you are trying to join, or else things will go horribly wrong

Analysing Data: Worked Example

Worked Example: Mean Fixation Durations (global)

• Let’s begin by taking data from a fixation report

• We’ll analyse it, compute mean fixation durations (global), run an ANOVA, and then plot a graph

• The data and scripts required are on the website but let’s walk through it together first

Computing Mean Fixation Durations (global): Example from a fixation report

• First we compute the by-trial, by-participant means:

• This gives us the mean fixation duration for each participant and each trial
• Then we take the mean of these to get means by participant:
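The two aggregation steps might look like the sketch below. The fixation data are made up for illustration, the intermediate names (`fixDT`, `trialMeans`, `pptMeans`, `trialMean`, `meanFixDur`) are hypothetical, and the report column names follow those used in this deck.

```r
library(data.table)

# Made-up fixation report: 2 participants, 4 trials each, unequal fixation counts
fixDT <- data.table(
  RECORDING_SESSION_LABEL = rep(c("p1", "p2"), each = 6),
  TRIAL_INDEX             = rep(c(1, 1, 2, 3, 3, 4), times = 2),
  TRIAL_TYPE              = rep(rep(c("present", "absent"), each = 3), times = 2),
  CURRENT_FIX_DURATION    = c(100, 200, 300, 200, 400, 500,
                              120, 180, 250, 300, 300, 400)
)

# Step 1: mean fixation duration for each participant and each trial
trialMeans <- fixDT[, .(trialMean = mean(CURRENT_FIX_DURATION)),
                    by = .(RECORDING_SESSION_LABEL, TRIAL_TYPE, TRIAL_INDEX)]

# Step 2: mean of the trial means for each participant and trial type
pptMeans <- trialMeans[, .(meanFixDur = mean(trialMean)),
                       by = .(RECORDING_SESSION_LABEL, TRIAL_TYPE)]

print(trialMeans)
print(pptMeans)
```

Averaging within trials first gives every trial equal weight. In this made-up data, p1's "present" mean is 225 ms with the two-step approach, whereas pooling all three fixations directly would give 200 ms because the trial with more fixations would count for more.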

Computing Mean Fixation Durations (global): Example from a fixation report

• This is what we now have:

• Each participant (RECORDING_SESSION_LABEL) grouped by TRIAL_TYPE with a DV (mean fixation duration)
• What next?

Computing Mean Fixation Durations (global): Example from a fixation report

• Now we analyse the data using ezANOVA!

• This is from the ez package

• Note: make sure that all columns that are factors in your ANOVA are factors in R before proceeding

Computing Mean Fixation Durations (global): Example from a fixation report

• ezANOVA syntax:

The dependent variable column

A list of within-subjects factors

A list of between-subjects factors

The column containing participant IDs

The data.table name
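The five labelled parts correspond to the `dv`, `within`, `between`, `wid` and `data` arguments of `ezANOVA()`. The sketch below runs a one-factor within-subjects ANOVA on simulated data; the data, the column name `meanFixDur` and the effect size built into the simulation are all made up, and converting to a plain data.frame before the call is a defensive choice here rather than something the slides require.

```r
library(data.table)
library(ez)

# Simulated by-participant means: 10 participants x 2 trial types (made-up values)
set.seed(1)
nPpt <- 10
pptMeans <- data.table(
  RECORDING_SESSION_LABEL = factor(rep(paste0("p", 1:nPpt), each = 2)),
  TRIAL_TYPE              = factor(rep(c("absent", "present"), times = nPpt))
)
pptMeans[, meanFixDur := 200 + 20 * (TRIAL_TYPE == "present") + rnorm(.N, sd = 10)]

result <- ezANOVA(
  data   = as.data.frame(pptMeans),      # the data
  dv     = .(meanFixDur),                # the dependent variable column
  wid    = .(RECORDING_SESSION_LABEL),   # the column containing participant IDs
  within = .(TRIAL_TYPE)                 # a list of within-subjects factors
  # between = .(...)                     # a list of between-subjects factors, if any
)

print(result$ANOVA)
```

Both the participant ID and the factor column are created as factors up front, in line with the note about checking factor columns before running the ANOVA.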

Computing Mean Fixation Durations (global): Example from a fixation report

• Here, we want to see if the within-subjects variable TRIAL_TYPE influences fixation durations. So we do this:

• And get this:

• Most of this should be self-explanatory (it’s significant!)
• Note that ges is generalised eta-squared, a measure of effect size (remember: APA format wants effect sizes now). Cite this paper when you use it: http://www.uv.es/friasnav/Bakeman2005

Computing Mean Fixation Durations (global): Example from a fixation report

• Let’s plot it!
• To produce a plot, we can use ezStats to first get descriptive means
• The nice thing here is that ezStats has the same syntax as ezANOVA (i.e., you can copy/paste)

• Take a look at the values:
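A sketch of the `ezStats()` call, using the same arguments as an `ezANOVA()` call would take. The data are simulated and the names `pptMeans`, `meanFixDur` and `meansDF` are hypothetical.

```r
library(data.table)
library(ez)

# Simulated by-participant means: 10 participants x 2 trial types (made-up values)
set.seed(1)
nPpt <- 10
pptMeans <- data.table(
  RECORDING_SESSION_LABEL = factor(rep(paste0("p", 1:nPpt), each = 2)),
  TRIAL_TYPE              = factor(rep(c("absent", "present"), times = nPpt))
)
pptMeans[, meanFixDur := 200 + 20 * (TRIAL_TYPE == "present") + rnorm(.N, sd = 10)]

# Same arguments as ezANOVA, but returns descriptive statistics per cell
meansDF <- ezStats(
  data   = as.data.frame(pptMeans),
  dv     = .(meanFixDur),
  wid    = .(RECORDING_SESSION_LABEL),
  within = .(TRIAL_TYPE)
)

print(meansDF)
```

The output has one row per level of TRIAL_TYPE, with the cell mean in the `Mean` column, which is the column to plot.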

Computing Mean Fixation Durations (global): Example from a fixation report

• Now, let’s plot it! We use ggplot to do the plotting.

[Annotated code on slide; the annotations read:]

• The data.table containing the means for plotting
• Set up the aesthetics of the plot, with x being the values plotted along the x-axis and y being the value plotted on the y-axis
• Draw points (as opposed to bars/lines)
• Controlling axes and making it APA format
• Save the plot to disk
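The annotated steps could be sketched as below. The means are hypothetical values typed in by hand, the axis limits and labels are arbitrary, and `theme_classic()` is used here only as a rough stand-in for an APA-style look; the original slide's exact settings are not preserved.

```r
library(ggplot2)

# Hypothetical table of means to plot (one row per trial type)
meansDT <- data.frame(
  TRIAL_TYPE = c("absent", "present"),
  Mean       = c(200, 220)
)

p <- ggplot(meansDT, aes(x = TRIAL_TYPE, y = Mean)) +   # data and aesthetics
  geom_point(size = 3) +                                # draw points
  scale_x_discrete(name = "Trial type") +               # control the axes
  scale_y_continuous(name = "Mean fixation duration (ms)",
                     limits = c(0, 300)) +
  theme_classic()                                       # plain, uncluttered theme

# Save the plot to disk (written to a temporary folder in this sketch)
outFile <- file.path(tempdir(), "meanFixDur.pdf")
ggsave(outFile, plot = p, width = 4, height = 4)
```

Swapping `geom_point()` for `geom_bar(stat = "identity")` or `geom_line()` changes how the same means are drawn without touching the rest of the call.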

Graphing with ggplot

• There’s a very large number of options when plotting with ggplot

• We will only cover very basic ones here

• More information can be found at:
  • http://www.cookbook-r.com/Graphs/
  • http://ggplot2.org/
  • And elsewhere online…

Computing Mean Fixation Durations (local): Example from a fixation report

• Next, we want to see if the within-subjects variable TRIAL_TYPE influences fixation durations AND if fixation durations are different for each interest area type
• We have two types of interest area: TARGET and DISTRACTOR
• We therefore run local mean fixation durations, comparing target and distractor fixation durations
• We also now need to remove fixations that did not fall on an interest area
• The column to use is CURRENT_FIX_INTEREST_AREA_LABEL

Computing Mean Fixation Durations (local): Example from a fixation report

• Same process as before: compute by-trial means and then by-ppt means

• The only difference now is that we’re removing fixations that didn’t land on an interest area (i.e., keeping only rows WHERE CURRENT_FIX_INTEREST_AREA_LABEL is not “.”)
• We’re also now GROUPING BY the CURRENT_FIX_INTEREST_AREA_LABEL column
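The local version adds a WHERE filter and one extra grouping column. This is a sketch on made-up fixation data; the intermediate names (`fixDT`, `trialMeans`, `pptMeans`, `trialMean`, `meanFixDur`) are hypothetical, and the report column names follow those used in this deck.

```r
library(data.table)

# Made-up fixation report: 2 participants x 2 trials, 3 fixations per trial
fixDT <- data.table(
  RECORDING_SESSION_LABEL         = rep(c("p1", "p2"), each = 6),
  TRIAL_INDEX                     = rep(c(1, 1, 1, 2, 2, 2), times = 2),
  TRIAL_TYPE                      = "present",
  CURRENT_FIX_INTEREST_AREA_LABEL = rep(c("TARGET", "DISTRACTOR", "."), times = 4),
  CURRENT_FIX_DURATION            = c(200, 150, 90, 220, 170, 80,
                                      210, 160, 100, 230, 180, 110)
)

# Step 1: by-trial means, dropping fixations that were not on an interest area
trialMeans <- fixDT[CURRENT_FIX_INTEREST_AREA_LABEL != ".",
                    .(trialMean = mean(CURRENT_FIX_DURATION)),
                    by = .(RECORDING_SESSION_LABEL, TRIAL_TYPE, TRIAL_INDEX,
                           CURRENT_FIX_INTEREST_AREA_LABEL)]

# Step 2: by-participant means for each trial type and interest area type
pptMeans <- trialMeans[, .(meanFixDur = mean(trialMean)),
                       by = .(RECORDING_SESSION_LABEL, TRIAL_TYPE,
                              CURRENT_FIX_INTEREST_AREA_LABEL)]

print(pptMeans)
```

The filter goes in the first step only, since once the "." rows are gone they cannot reappear in the second aggregation.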

Computing Mean Fixation Durations (local): Example from a fixation report

• Now it’s time to run the ANOVA
• This is done the same as before, just now we have one more within-subjects factor

• But the results are similar: only TRIAL_TYPE is significant

Computing Mean Fixation Durations (local): Example from a fixation report

• Next, we get the means as before:

• Again, we are now adding CURRENT_FIX_INTEREST_AREA_LABEL to our list of within-subjects factor columns

Sneak Peek at the Graph

Note that this graph has two panels – or in ggplot’s language – two facets, one for DISTRACTOR_A objects and one for TARGET objects

How do we get it to do that?

The facet_wrap command will create facets for every level of CURRENT_FIX_INTEREST_AREA_LABEL

You’re not limited to creating facets for only one column. Try out facet_wrap(TRIAL_TYPE~CURRENT_FIX_INTEREST_AREA_LABEL) and see what happens
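A faceted version of the plot might be built as in the sketch below. The means are hypothetical values typed in by hand and the trial type labels are made up; `facet_wrap()` is the ggplot2 function named above.

```r
library(ggplot2)

# Hypothetical cell means: 2 trial types x 2 interest area types
meansDT <- data.frame(
  TRIAL_TYPE                      = rep(c("absent", "present"), times = 2),
  CURRENT_FIX_INTEREST_AREA_LABEL = rep(c("DISTRACTOR_A", "TARGET"), each = 2),
  Mean                            = c(180, 195, 210, 240)
)

p <- ggplot(meansDT, aes(x = TRIAL_TYPE, y = Mean)) +
  geom_point(size = 3) +
  # One panel (facet) for every level of the interest area label
  facet_wrap(~ CURRENT_FIX_INTEREST_AREA_LABEL) +
  theme_classic()

# Save the plot to disk (written to a temporary folder in this sketch)
outFile <- file.path(tempdir(), "meanFixDurLocal.pdf")
ggsave(outFile, plot = p, width = 6, height = 4)
```

Putting a second column on the left of the `~` in the formula creates a panel for every combination of the two columns instead.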

Writing it up

• When writing up eye-tracking data, don’t just assume the reader knows why you examined each measure
• Given the complexity and number of possible measures, it’s vital that you are extremely clear, both in your own head and when you write things up, about why each measure was examined and what that measure is telling you
• If people start complaining that you’ve explained it too much and that it’s bordering on being patronising, then you’re doing it right

Writing it up

From Godwin, Hyde, Taunton, Calver, Blake & Liversedge (2013)

• Simple approach:
  • Begin by stating what the measure has been shown to demonstrate in the past
  • Make a prediction for that measure in your own study
  • Then describe how you examined it
  • Finally describe what it showed
• Don’t just bombard the reader with F and t values

Writing it up

From Sheridan & Reingold (2013)

Writing it up

From Fitzsimmons & Drieghe (2013)

The bigger picture

• This approach forms part of a larger picture when writing up your

• Let’s just note a few pointers before finishing

The bigger picture

• Introduction
  • First paragraph: general context of the work, preview the main points
  • Middle paragraphs: existing research on the topic, highlighting what has been missed or not done (either at all or perfectly) before
  • Ending paragraphs: say how your work will overcome the limitations in previous work, clearly noting how what you have done fills a gap in the existing literature and human knowledge. Tell them why your work is awesome. State your research question(s). Applied relevance also gets noted if relevant
  • Final paragraph(s): make a series of clear and direct predictions. State WHY you are examining each measure and PREDICT what you think each will show you

The bigger picture

• Results
  • First paragraph: describe what you are going to do in your results and why
  • Second paragraph: describe how you cleaned your eye-tracking data
  • Middle paragraphs: go through each of your measures in the same order as you predicted them in your introduction. For each one, state WHY you are analysing that one and WHAT it shows you, and whether it confirms or rejects your predictions

The bigger picture

• Discussion
  • First paragraph: re-state what you did in the study and remind the reader of your goals and research questions.
  • Middle paragraphs: go through each of your measures in the same order as you predicted them in your introduction. For each one, state WHY you analysed that one, what the outcome was, and WHAT THAT MEANS in relation to your predictions
  • Later paragraphs: draw the results together for an overall picture. State applied implications if necessary. Suggest future studies that would be cool.

• Never end by saying something along the lines of “more research is needed.”

The rest of today

• Next up:
• Head to the website (http://wiki.psychwire.co.uk/) and go through the Part 4: Data Viewer section

• Then go through the Part 5: Data Analysis section, which will outline the bits we’ve gone through above and some extra pieces here and there

• That’s it.
