Evaluation of usability tests
Why evaluate?
1. Choose the most suitable data-collection techniques
2. Identify methodological strengths and weaknesses of a user test
Evaluation criteria for data-collection techniques
Utility: how useful are the data?
Costs: what resources are needed?
Objectivity: how much subjective judgement is involved?
Level of detail: is the amount and resolution of the data suitable?
Intrusiveness: does the method interfere with the user's performance?
Observations in real time
Strengths:
  Level of detail: Allows you to experience the context in which performance takes place
Weaknesses:
  Level of detail: Difficult to keep up with the pace of the user
  Objectivity: Based on your own subjective judgement as an observer
Observations from video
Strengths:
  Utility: Allows you to conduct detailed analysis of various usability attributes
  Utility: Can obtain data about the user's reasoning ("Think-aloud")
Weaknesses:
  Costs: Time consuming
  Utility: Lots of data not being used
  Intrusiveness: "Think-aloud" may disturb the user
Observations: Real time or Video?
[Figure: real-time and video observation compared with respect to context, product and level of detail]
Event logs
Strengths:
  Objectivity: The data are collected automatically
  Costs: Automated data collection requires little effort from the test team
Weaknesses:
  Level of detail: Both the amount of data and the resolution can be too high
  Utility: It can be difficult to create useful measures
http://zing.ncsl.nist.gov/WebTools/VisVIP/overview.html
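To illustrate the step from raw event logs to usable measures, here is a minimal sketch that reduces a log to task time and error count per user. The log format and the event names (task_start, task_end, error) are assumptions made for this example and are not taken from any particular logging tool.

```python
from datetime import datetime

# Hypothetical log format: (ISO timestamp, user id, event name).
LOG = [
    ("2024-01-01T10:00:00", "u1", "task_start"),
    ("2024-01-01T10:00:12", "u1", "click"),
    ("2024-01-01T10:00:30", "u1", "error"),
    ("2024-01-01T10:01:05", "u1", "task_end"),
    ("2024-01-01T10:05:00", "u2", "task_start"),
    ("2024-01-01T10:05:40", "u2", "task_end"),
]


def summarise(log):
    """Reduce raw events to task time (seconds) and error count per user."""
    summary = {}
    for stamp, user, event in log:
        entry = summary.setdefault(
            user, {"start": None, "seconds": None, "errors": 0}
        )
        t = datetime.fromisoformat(stamp)
        if event == "task_start":
            entry["start"] = t
        elif event == "task_end" and entry["start"] is not None:
            entry["seconds"] = (t - entry["start"]).total_seconds()
        elif event == "error":
            entry["errors"] += 1
        # All other events (e.g. "click") are ignored on purpose:
        # the high resolution of the log is reduced to two measures.
    return {
        user: {"seconds": e["seconds"], "errors": e["errors"]}
        for user, e in summary.items()
    }


measures = summarise(LOG)
for user, m in measures.items():
    print(user, m)
```

Which events to keep and which to discard is the difficult design decision. The code only makes that decision explicit.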
Questionnaire, self-made
Strengths:
  Level of detail: Can be tailored to fit the purpose of the test
  Utility: Can be used in several settings with different products
  Costs: It does not take long to develop
Weaknesses:
  Objectivity: Based on subjective judgement
  Utility: Difficult to construct good items
Questionnaire, validated
Strengths:
  Utility: Can be used in several settings with different products
  Costs: The data are typically easy to transform into measures
Weaknesses:
  Level of detail: Validated questionnaires may not address the features of the interface you are interested in
  Objectivity: Based on subjective judgement
Summary data-collection techniques
Data-collection technique   Utility   Costs   Objectivity   Level of detail   Intrusiveness
Interview                   -         -       -             +                 -
Questionnaire self-made     ++        ++      -             ++                +
Questionnaire validated     +         -       -             +                 +
Observation real time       +         +       -             -                 +
Observation video           ++        -       +             +                 +
Event logs                  -         -       ++            +                 ++
Physiological               -         --      ++            +                 --
The assessment concerns MEASURES, not use/problem descriptions; ++ = very good; + = good; - = not so good; -- = poor
…Use/problem descriptions
Observation and interviews are the most suitable data-collection techniques for use/problem descriptions
Data-collection technique   Utility
Interview                   ++
Observation real time       ++
Observation video           ++
Event logs                  +
Evaluation of measures
The evaluation criteria of the data-collection techniques
Validity
Reliability
Validity
Do you measure what you believe you measure?
Reliability
Do you obtain the same results when you measure the same thing under similar conditions at different points in time?
Relationship between Validity & Reliability
Evaluating the validity of a measure is primarily based on subjective judgement, while reliability is typically evaluated by means of statistics
It is possible to obtain reliable results that are invalid, but not unreliable results that are valid!
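As an example of evaluating reliability with statistics, the sketch below computes a test-retest correlation (Pearson's r) between two sessions in which the same users performed the same task. The task times are invented for illustration, and Pearson's r is only one of several possible reliability statistics.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined when a list has no variance")
    return cov / (sd_x * sd_y)


# Invented task times in seconds: same five users, same task, two sessions.
session_1 = [62, 75, 48, 90, 55]
session_2 = [65, 72, 50, 88, 58]

r = pearson_r(session_1, session_2)
print(f"test-retest r = {r:.2f}")
```

A value of r close to 1 indicates that the measure ranks users consistently across sessions. It says nothing about whether the measure is valid, which still has to be judged separately.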
How can you avoid invalid results?
Use several measures!
  Triangulation
  Multiple operationalism
Ethical issues
Be well prepared - act professionally!
Create a script
  Introduction
  During test
  Debriefing
Create a consent form
Ethical issues
The product is being tested, not the user!
Respectful treatment: preserve integrity
Informed consent
  Inform the user what will happen, how the collected data will be used, etc.
  Make sure the user understands and agrees
The user may leave whenever she/he wants
Confidentiality
Types of measures
Experience-attitude
Performance
Cognitive
Experience-attitude
Strengths:
  Utility: Can address most usability attributes
  Validity: User-centered; we ask for the user's opinions
Weaknesses:
  Validity/Objectivity: Based on the user's subjective judgement
Performance: completeness
Strengths:
  Utility: Can be used for most tasks and in different settings
  Cost-effective: Quite easy to create a list of activities
Weaknesses:
  Validity/reliability: The user may choose a solution path you didn't think of, but that is nevertheless satisfactory
  Validity (sensitivity): Ceiling or floor effects: the task is too easy or too difficult
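A completeness measure of this kind can be sketched as the share of listed activities that the user carried out, together with a simple check for ceiling and floor effects. The activity names, the session data and the function names below are hypothetical and only serve as an illustration.

```python
def completeness(required, observed):
    """Share of the required activities that appear among the observed ones."""
    required = set(required)
    if not required:
        raise ValueError("the list of required activities is empty")
    return len(required & set(observed)) / len(required)


def sensitivity_warning(scores):
    """Return 'ceiling' or 'floor' if all users score identically at an extreme."""
    values = list(scores.values())
    if not values:
        return None
    if all(v == 1.0 for v in values):
        return "ceiling"  # task probably too easy
    if all(v == 0.0 for v in values):
        return "floor"  # task probably too difficult
    return None


# Hypothetical activity list for a form-filling task.
REQUIRED = ["open form", "enter name", "enter address", "submit"]

# Hypothetical observations: activities each user was seen to perform.
sessions = {
    "u1": ["open form", "enter name", "enter address", "submit"],
    "u2": ["open form", "enter name", "submit"],
    "u3": ["open form", "enter name", "enter address", "submit", "print receipt"],
}

scores = {user: completeness(REQUIRED, acts) for user, acts in sessions.items()}
print(scores)
print(sensitivity_warning(scores))
```

The sketch also shows the validity weakness: a user who reaches a satisfactory result by a path that is not on the list still receives a low score.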
Summary of measures
Metric type                Validity   Construct validity   Utility
Experience-attitude        ++         ++                   ++
Performance time           +          +                    +
Performance completeness   ++         +                    +
Performance failures       -          ++                   +
Situation awareness        -          +                    -
Workload                   -          +                    -
Validity = are we able to measure it; Construct validity = importance to usability; Utility = how useful it currently is for making design decisions
++ = very good; + = good; - = not so good; -- = poor
Relation between data-collection techniques and measures
Data-collection technique   Experience-attitude   Performance time   Performance completeness   Performance failures   Situation awareness   Workload
Interview                   +                     -                  -                          +                      -                     -
Questionnaire               ++                    -                  +                          -                      +                     +
Observation real time       -                     -                  +                          +                      -                     -
Observation video           -                     ++                 ++                         ++                     +                     +
Event log                   -                     ++                 +                          -                      -                     -
Physiological               -                     --                 --                         --                     -                     +
++ = very good; + = good; - = not so good; -- = poor
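The relation table can also be used as a lookup when planning a test. The sketch below encodes the ratings from the table and lists the techniques rated at least "good" for a chosen measure. The data structure and the function name are assumptions made for this example.

```python
MEASURES = [
    "Experience-attitude",
    "Performance time",
    "Performance completeness",
    "Performance failures",
    "Situation awareness",
    "Workload",
]

# Ratings copied from the table, in the order of MEASURES.
RATINGS = {
    "Interview":             ["+",  "-",  "-",  "+",  "-", "-"],
    "Questionnaire":         ["++", "-",  "+",  "-",  "+", "+"],
    "Observation real time": ["-",  "-",  "+",  "+",  "-", "-"],
    "Observation video":     ["-",  "++", "++", "++", "+", "+"],
    "Event log":             ["-",  "++", "+",  "-",  "-", "-"],
    "Physiological":         ["-",  "--", "--", "--", "-", "+"],
}

# Numeric order of the rating symbols, from poor to very good.
ORDER = {"--": 0, "-": 1, "+": 2, "++": 3}


def techniques_for(measure, minimum="+"):
    """List (technique, rating) pairs rated at least `minimum`, best first."""
    column = MEASURES.index(measure)
    hits = [
        (technique, row[column])
        for technique, row in RATINGS.items()
        if ORDER[row[column]] >= ORDER[minimum]
    ]
    # sorted() is stable, so equal ratings keep the table's row order.
    return sorted(hits, key=lambda pair: -ORDER[pair[1]])


print(techniques_for("Performance time"))
print(techniques_for("Workload"))
```

Such a lookup only covers the suitability ratings. The purpose of the test and practical limitations still have to be weighed by the test team.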
Relation between data-collection techniques and measures
Measure
Data-collection technique
Practical limitations
Purpose of test