©2011 www.id-book.com
Introducing Evaluation
Chapter 12
The aims
• How can the usability of a system be evaluated?
• How can usability problems be found and improvements suggested?
Key questions for an evaluation
Iterative design & evaluation is a continuous process that examines:
• Why: to check that users’ requirements are met, that users can use the product, and that they like it.
• What: a conceptual model, early prototypes of a new system and later, more complete prototypes. Lay down usability criteria.
• Where: in natural and laboratory settings.
• When: throughout design; finished products can be evaluated to collect information to inform new products.
Summative evaluation: final quantitative assessment of initially defined criteria.
Formative evaluation: at different times, assess current system against actual requirements.
Bruce Tognazzini tells you why you need to evaluate
“Iterative design, with its repeating cycle of design and testing, is the only validated methodology in existence that will consistently produce successful results. If you don’t have user-testing as an integral part of your design process you are going to throw buckets of money down the drain.”
See AskTog.com for topical discussions about design and evaluation.
Types of evaluation
3 broad categories, depending on the setting, user involvement and level of control.
• Controlled settings involving users
- usability testing & experiments in laboratories and living labs.
• Natural settings involving users
- field studies to see how the product is used in the real world.
• Any settings not involving users
- consultants critique the interface; models and analytics are used to predict, analyze & model aspects of it.
Pros and Cons
• Controlled settings involving users (Lab-based studies)
- Good at revealing usability problems
- Poor at capturing context of use
• Natural settings involving users (Field studies)
- Good at demonstrating how people use technologies in their intended setting
- Expensive and difficult to conduct
• Any settings not involving users (Modelling and predicting approaches)
- Quick and cheap to perform
- Can miss unpredictable usability problems and subtle aspects of the user experience
Living labs
• People’s use of technology in their everyday lives can be evaluated in living labs.
• Such evaluations are too difficult to do in a usability lab.
• E.g., the Aware Home was embedded with a complex network of sensors and audio/video recording devices (Abowd et al., 2000).
Usability testing & field studies can complement each other
Evaluation methods
Method | Controlled settings | Natural settings | Without users
Observing | x | x | -
Asking users | x | x | -
Asking experts | - | x | x
Testing | x | - | -
Modeling | - | - | x
Usability Testing
Usability testing refers to evaluating a product, website, mobile app or system by testing it with representative users, using real-life scenarios and user tasks.
The goal is to identify any usability problems, collect qualitative and quantitative data and determine the participant's satisfaction with the product.
Usability Testing
1. Get representative users
• 5 – 10 participants
2. Define criteria for evaluation
• Time to complete a task.
• Time to complete a task after a specified time away from the product.
• Number and type of errors per task.
• Number of errors per unit of time.
• Number of navigations to online help or manuals.
• Number of users making a particular error.
• Number of users completing a task successfully.
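The criteria above can be computed from simple per-participant records. The following is a minimal sketch only: the `TaskAttempt` record layout, the field names and the sample data are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskAttempt:
    """One participant's attempt at one task (hypothetical record layout)."""
    participant: str
    seconds: float      # time spent on the task
    errors: int         # errors observed during the attempt
    help_visits: int    # navigations to online help or manuals
    completed: bool     # whether the task was completed successfully


def summarize(attempts):
    """Compute the quantitative criteria for a single task."""
    total_minutes = sum(a.seconds for a in attempts) / 60
    total_errors = sum(a.errors for a in attempts)
    return {
        "mean_time_s": mean(a.seconds for a in attempts),
        "errors_per_task": total_errors / len(attempts),
        "errors_per_minute": total_errors / total_minutes,
        "help_visits": sum(a.help_visits for a in attempts),
        "users_with_errors": sum(1 for a in attempts if a.errors > 0),
        "users_completed": sum(1 for a in attempts if a.completed),
    }


# Made-up sample data for three participants.
attempts = [
    TaskAttempt("P1", 120.0, 2, 1, True),
    TaskAttempt("P2", 90.0, 0, 0, True),
    TaskAttempt("P3", 150.0, 4, 2, False),
]
summary = summarize(attempts)
print(summary)
```

Keeping one record per participant per task makes it easy to add further criteria later, for example time to complete a task after a period away from the product.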
Usability Testing
3. Develop test scenario: setup + context + task
• Choose relevant scenarios (typical vs extreme)
• Keep task duration shorter than 30 minutes
• Ensure identical conditions for all participants
4. Consider ethical issues
• De-brief participants, get consent, etc.
5. Run pilot tests & refine design
• Practice with staff and observers
6. Actual testing
• Instruction of participants
• Carry out test and record data
Usability Testing
7. Analysis
• Statistics, e.g., mouse events, menu selection
• Screen design: gaze tracking and course of task completion
• Post task video confrontation and user interview
8. Report results and make recommendations for improvement.
Usability lab with observers watching a user & assistant
Usability testing & research
Usability testing
• Improve products
• Few participants
• Results inform design
• Usually not completely replicable
• Conditions controlled as much as possible
• Procedure planned
• Results reported to developers
Experiments for research
• Discover knowledge
• Many participants
• Results validated statistically
• Must be replicable
• Strongly controlled conditions
• Experimental design
• Scientific report to scientific community
Portable equipment for use in the field
Examples of some of the tests used in the iPad evaluation (adapted from Budiu and Nielsen, 2010)
App or Website | Task
iBook | Download a free copy of Alice's Adventures in Wonderland and read through the first few pages.
eBay | You want to buy a new iPad on eBay. Find one that you could buy from a reputable seller.
Time Magazine | Browse through the magazine and find the best pictures of the week.
Kayak | You are planning a trip to Death Valley in May this year. Find a hotel located in the park or close to the park.
Experiments
Predict the relationship between two or more variables.
Independent variable is manipulated by the researcher.
Dependent variable depends on the independent variable.
Typical experimental designs have one or two independent variables.
Validated statistically & replicable.
Experimental designs
• Different participants - single group of participants is allocated randomly to the experimental conditions.
• Same participants - all participants appear in both conditions.
• Matched participants - participants are matched in pairs, e.g., based on expertise, gender, etc.
Design | Advantages | Disadvantages
Different | No order effects | Many subjects & individual differences a problem
Same | Few individuals, no individual differences | Counter-balancing needed because of ordering effects
Matched | Same as different participants but individual differences reduced | Cannot be sure of perfect matching on all differences
Different, same, matched participant design
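Random allocation (different participants) and counter-balancing (same participants) can be sketched in a few lines. This is an illustration under stated assumptions: the function names, the participant labels and the conditions "A" and "B" are hypothetical, and simple rotation is only one of several possible counter-balancing schemes.

```python
import random


def allocate_between(participants, conditions, seed=0):
    """Different-participants design: shuffle, then deal into conditions."""
    rng = random.Random(seed)  # fixed seed so the allocation is repeatable
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {c: [] for c in conditions}
    for i, person in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(person)
    return groups


def counterbalance_within(participants, conditions):
    """Same-participants design: rotate the condition order per participant."""
    orders = {}
    n = len(conditions)
    for i, person in enumerate(participants):
        shift = i % n
        orders[person] = conditions[shift:] + conditions[:shift]
    return orders


people = ["P1", "P2", "P3", "P4"]
conditions = ["A", "B"]
groups = allocate_between(people, conditions)
orders = counterbalance_within(people, conditions)
print(groups)   # two participants per condition, membership chosen at random
print(orders)   # half see A then B, the other half B then A
```

With two conditions the rotation gives half the participants each order, which is what counter-balancing needs to offset ordering effects.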
Field studies
• Field studies are done in natural settings.
• “in the wild” is a term for prototypes being used freely in natural settings.
• Aim to understand what users do naturally and how technology impacts them.
• Field studies are used in product design to:
- identify opportunities for new technology;
- determine design requirements;
- decide how best to introduce new technology;
- evaluate technology in use.
UbiFit Garden: An in the wild study
Analytical evaluation
• Describe the key concepts associated with inspection methods.
• Explain how to do heuristic evaluation and walkthroughs.
• Explain the role of analytics in evaluation.
• Describe how to perform two types of predictive methods, GOMS and Fitts’ Law.
Inspections
• Several kinds.
• Experts use their knowledge of users & technology to review software usability.
• Expert critiques (crits) can be formal or informal reports.
• Heuristic evaluation is a review guided by a set of heuristics.
• Walkthroughs involve stepping through a pre-planned scenario noting potential problems.
Heuristic evaluation
• Developed by Jakob Nielsen in the early 1990s.
• Based on heuristics distilled from an empirical analysis of 249 usability problems.
• These heuristics have been revised for current technology.
• Heuristics being developed for mobile devices, wearables, virtual worlds, etc.
• Design guidelines form a basis for developing heuristics.
Nielsen’s original heuristics
1. Visibility of system status
2. Match between system and the real world
Speak the user's language, follow real-world conventions, make information appear in a natural and logical order
3. User control and freedom
Provide a clearly marked “emergency exit” to leave an unwanted state (undo and redo)
4. Consistency and standards
Users should not have to wonder whether different words, situations, or actions mean the same thing.
5. Error prevention
6. Recognition rather than recall
7. Flexibility and efficiency of use
Cater to both inexperienced and experienced users; allow users to tailor frequent actions
8. Aesthetic and minimalist design
Provide no irrelevant or rarely needed info
9. Help users recognize, diagnose and recover from errors
Error messages in plain language (no codes), precisely indicate the problem, suggest a solution
10. Help and documentation
Provide help and documentation that is easy to search, focused on the user's task, lists concrete steps to be carried out, and is not too large
Discount evaluation
• Heuristic evaluation is referred to as discount evaluation when 5 evaluators are used.
• Empirical evidence suggests that on average 5 evaluators identify 75-80% of usability problems.
No. of evaluators & problems
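One simple way to reason about why extra evaluators give diminishing returns is to assume that each evaluator independently finds a fixed share of the problems. That independence assumption and the value 0.25 below are illustrative choices, not figures taken from the slides; the function name is hypothetical.

```python
def proportion_found(evaluators, p_single):
    """Expected share of problems found by at least one of n evaluators,
    assuming each evaluator independently finds a share p_single."""
    return 1 - (1 - p_single) ** evaluators


# p_single = 0.25 is an illustrative assumption, not a measured value.
for n in range(1, 11):
    print(n, round(proportion_found(n, 0.25), 2))
```

Under this assumption five evaluators find roughly three quarters of the problems, and each further evaluator adds less than the one before.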
3 stages for doing heuristic evaluation
• Briefing session to tell experts what to do.
• Evaluation period of 1-2 hours in which:
– Each expert works separately;
– Take one pass to get a feel for the product;
– Take a second pass to focus on specific features.
• Debriefing session in which experts work together to prioritize problems.
Advantages and problems
• Few ethical & practical issues to consider because users not involved.
• Can be difficult & expensive to find experts.
• Best experts have knowledge of application domain & users.
• Biggest problems:
– Important problems may get missed;
– Many trivial problems are often identified;
– Experts have biases.
Heuristics for websites focus on key criteria (Budd, 2007)
• Clarity
• Minimize unnecessary complexity & cognitive load
• Provide users with context
• Promote positive & pleasurable user experience
Walkthroughs
Walkthroughs are an alternative to heuristic evaluation for predicting users' problems without doing user testing.
They involve walking through a task with the product and noting problematic usability features. Most walkthrough methods do not involve users.
Cognitive walkthroughs
• Focus on ease of learning.
• Designer presents an aspect of the design & usage scenarios.
• Expert is told the assumptions about user population, context of use, task details.
• One or more experts walk through the design prototype with the scenario.
• Experts are guided by 3 questions.
The 3 questions
• Will the correct action be sufficiently evident to the user?
• Will the user notice that the correct action is available?
• Will the user associate and interpret the response from the action correctly?
As the experts work through the scenario they note problems.
Pluralistic walkthrough
• Variation on the cognitive walkthrough theme.
• Performed by a carefully managed team.
• The panel of experts begins by working separately.
• Then there is managed discussion that leads to agreed decisions.
• The approach lends itself well to participatory design.
Analytics
• A method for evaluating user traffic through a system or part of a system
• Many examples including Google Analytics, Visistat (shown below)
• Times of day & visitor IP addresses
Social action analysis (Perer & Shneiderman, 2008)
Predictive models
• Provide a way of evaluating products or designs without directly involving users.
• Less expensive than user testing.
• Usefulness limited to systems with predictable tasks - e.g., telephone answering systems, mobiles, cell phones, etc.
• Based on expert error-free behavior.
GOMS – Goals, Operators, Methods, Selection rules
• Goals – what the user wants to achieve, e.g., find a website.
• Operators - the cognitive processes & physical actions needed to attain goals, e.g., decide which search engine to use.
• Methods - the procedures to accomplish the goals, e.g., drag mouse over field, type in keywords, press the go button.
• Selection rules - decide which method to select when there is more than one.
GOAL: delete a word in a sentence
Method for accomplishing goal of deleting a word using menu options
Method for accomplishing goal of deleting a word using delete key
Operators to use in the above methods:
Click mouse
Drag cursor over text
Select menu
Move cursor to command
Press key
Selection rules to decide which method to use:
1. Delete text using mouse and selecting from menu if a large amount of text is to be deleted.
2. Delete text using the ‘delete’ key if a small number of letters are to be deleted.
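The delete-a-word example can be written down as data plus one selection rule. This is a rough sketch: the operator sequences for each method and the threshold `LARGE_DELETION` are assumptions made for illustration, since the slide names the operators and rules but gives neither sequences nor a number.

```python
# Operators named on the slide.
CLICK = "Click mouse"
DRAG = "Drag cursor over text"
SELECT_MENU = "Select menu"
MOVE_TO_COMMAND = "Move cursor to command"
PRESS_KEY = "Press key"

# Hypothetical cut-off between "a small number of letters" and
# "a large amount of text"; the slide does not give one.
LARGE_DELETION = 10


def menu_method():
    """Assumed sequence: select the text, open the menu, pick the command."""
    return [CLICK, DRAG, SELECT_MENU, MOVE_TO_COMMAND, CLICK]


def delete_key_method(n_chars):
    """Assumed sequence: place the cursor, then one key press per letter."""
    return [CLICK] + [PRESS_KEY] * n_chars


def choose_method(n_chars):
    """Selection rule: menu for large deletions, delete key for small ones."""
    if n_chars >= LARGE_DELETION:
        return "menu", menu_method()
    return "delete key", delete_key_method(n_chars)


for n in (3, 40):
    name, operators = choose_method(n)
    print(n, name, operators)
```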
Keystroke level model
• GOMS has also been developed to provide a quantitative model - the keystroke level model.
• The keystroke model allows predictions to be made about how long it takes an expert user to perform a task.
Operator | Description | Time (sec)
K | Pressing a single key or button: average skilled typist (55 wpm) | 0.22
K | Pressing a single key or button: average non-skilled typist (40 wpm) | 0.28
K | Pressing shift or control key | 0.08
K | Typist unfamiliar with the keyboard | 1.20
P | Pointing with a mouse or other device on a display to select an object. This value is derived from Fitts’ Law which is discussed below. | 0.40
P1 | Clicking the mouse or similar device | 0.20
H | Bring ‘home’ hands on the keyboard or other device | 0.40
M | Mentally prepare/respond | 1.35
R(t) | The response time is counted only if it causes the user to wait. | t
Response times for keystroke level operators (Card et al., 1983)
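A keystroke level estimate is the sum of the operator times for the sequence an expert would perform. The sketch below uses the operator times listed in the table; the function name, the choice of the 40 wpm value for K, and the example operator sequence are assumptions for illustration.

```python
# Operator times in seconds, taken from the table of keystroke level operators.
KLM_SECONDS = {
    "K": 0.28,   # key press, average non-skilled typist (40 wpm)
    "P": 0.40,   # point with a mouse or other device
    "P1": 0.20,  # click the mouse or similar device
    "H": 0.40,   # home hands on the keyboard or other device
    "M": 1.35,   # mentally prepare/respond
}


def klm_estimate(operators, response_time=0.0, times=KLM_SECONDS):
    """Sum operator times; add system response time only if the user waits."""
    return sum(times[op] for op in operators) + response_time


# Hypothetical sequence: think, reach for the mouse, point, click,
# think again, then press one key.
sequence = ["M", "H", "P", "P1", "M", "K"]
estimate = klm_estimate(sequence)
print(round(estimate, 2))
```

The estimate describes expert, error-free performance, so it is best read as a lower bound when comparing design alternatives.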
Using KLM to calculate time to change gaze (Holleis et al., 2007)
Fitts’ Law (Fitts, 1954)
• Fitts’ Law predicts that the time to point at an object using a device is a function of the distance from the target object & the object’s size.
• The further away & the smaller the object, the longer the time to locate it & point to it.
• Fitts’ Law is useful for evaluating systems for which the time to locate an object is important, e.g., a cell phone or other handheld device.
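The slides state the relationship only qualitatively. One widely used formulation is T = a + b log2(D/W + 1), where D is the distance to the target, W its width, and a and b are constants fitted from experimental data. The sketch below assumes that formulation; the function name and the default values of a and b are placeholders, not measured constants.

```python
import math


def fitts_time(distance, width, a=0.0, b=0.1):
    """Predicted pointing time in seconds: a + b * log2(distance/width + 1).

    a and b are device-specific constants; the defaults are placeholders.
    """
    if distance < 0 or width <= 0:
        raise ValueError("distance must be >= 0 and width must be > 0")
    return a + b * math.log2(distance / width + 1)


near_large = fitts_time(distance=70, width=10)
far_large = fitts_time(distance=310, width=10)
near_small = fitts_time(distance=70, width=5)
print(near_large, far_large, near_small)
```

With the same constants, the farther target and the smaller target both take longer than the near, large one, which matches the second bullet above.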
The language of evaluation
Analytics
Analytical evaluation
Controlled experiment
Expert review or crit
Field study
Formative evaluation
Heuristic evaluation
In the wild evaluation
Living laboratory
Predictive evaluation
Summative evaluation
Usability laboratory
User studies
Usability testing
Users or participants
Key points
• Evaluation & design are closely integrated in user-centered design.
• Some of the same techniques are used in evaluation as for establishing requirements but they are used differently (e.g. observation, interviews & questionnaires).
• Three types of evaluation: laboratory based with users, in the field with users, studies that do not involve users.
• The main methods are: observing, asking users, asking experts, user testing, inspection, modeling users’ task performance, and analytics.
• Dealing with constraints is an important skill for evaluators to develop.
• Usability testing is done in controlled conditions.
• Usability testing is an adapted form of experimentation.
• Experiments aim to test hypotheses by manipulating certain variables while keeping others constant.
• The experimenter controls the independent variable(s) but not the dependent variable(s).
• There are three types of experimental design: different-participants, same-participants, & matched participants.
• Field studies are done in natural environments.
• “In the wild” is a recent term for studies in which a prototype is freely used in a natural setting.
• Typically observation and interviews are used to collect field studies data.
• Data is usually presented as anecdotes, excerpts, critical incidents, patterns and narratives.
Key points
• Inspections can be used to evaluate requirements, mockups, functional prototypes, or systems.
• User testing & heuristic evaluation may reveal different usability problems.
• Walkthroughs are focused so are suitable for evaluating small parts of a product.
• Analytics involves collecting data about users’ activity on a website or product.
• The GOMS and KLM models and Fitts’ Law can be used to predict expert, error-free performance for certain kinds of tasks.