
EPSRC Programme Grant EP/G059063/1

Public Paper no. 152

Combining Human Error Verification and Timing Analysis: A Case Study on an Infusion Pump

Rimvydas Ruksenas, Paul Curzon, Ann Blandford & Jonathan Back

Ruksenas, R., Curzon, P., Blandford, A. E., & Back, J. (2014). Combining human error verification and timing analysis: A case study on an infusion pump. Formal Aspects of Computing, 26, 1033–1076.

PP release date: 14 June 2013

file: WP152.pdf

Under consideration for publication in Formal Aspects of Computing

Combining Human Error Verification and Timing Analysis: A Case Study on an Infusion Pump

Rimvydas Ruksenas¹, Paul Curzon¹, Ann Blandford² and Jonathan Back²

¹ Queen Mary University of London, School of Electronic Engineering & Computer Science, Mile End, London E1 4NS, UK
² University College London, UCL Interaction Centre, MPEB, Gower Street, London WC1E 6BT, UK

Abstract. The design of a human-computer interactive system can be unacceptable for a range of reasons. User performance concerns, for example the likelihood of user errors and the time needed for a user to complete tasks, are important areas of consideration. For safety-critical systems it is vital that tools are available to support the analysis of such properties before expensive design commitment has been made. In this work, we give a unified formal verification framework for integrating two kinds of analysis: 1) predicting bounds for task-completion times via exhaustive state-space exploration, and 2) detecting user-error related design issues. The framework is based on a generic model of cognitively plausible behaviour that captures assumptions about cognitive behaviour decided through a process of interdisciplinary negotiation. Assumptions made in an analysis, including those relating to the performance consequences of users recovering from likely errors, are also investigated in this framework. We further present a novel way of exploring the consequences of cognitive mismatches, on both correctness and performance grounds. We illustrate our analysis approach with a realistic medical device scenario: programming an infusion pump. We explore an initial pump design and then two variations based on features found in real designs, illustrating how the approach identifies both timing and human error issues.

Keywords: Human error, formal verification, performance, medical devices, model checking, SAL.

1. Introduction

A key problem in the design of human-computer interactive systems is how to be explicit about the perspective of the user or operator. The correct operation of any such system depends on the behaviour of both human and computer actors. For a safety-critical system, potential problems with its usability are important hazards to take into account. In hospitals, for example, a significant part of a nurse's role can be to program and monitor the use of interactive devices such as heart monitors, infusion pumps and syringe drivers. There is also a strong move towards developing portable versions of some medical devices, allowing patients to leave hospital, operating or monitoring the devices themselves based on only limited training. Mistakes made in the operation of such devices can and do lead to patient harm. This is therefore an issue that must be taken into account when assessing their reliability, so that designs can be improved to eliminate or ameliorate any problems.

Correspondence and offprint requests to: Rimvydas Ruksenas, School of Electronic Engineering & Computer Science, Queen Mary, University of London, Mile End Road, London E1 4NS, UK. E-mail: [email protected], Fax: +44 (0)20 7882 7997

Much effort is expended on developing ways to verify that safety-critical devices work to specification, i.e., that the computer does as the designer intended. There has been much less work on ways to ensure that the specification takes into account human behaviour. A hidden assumption is often that the users of systems will behave exactly as the designer intended. If they do not and things go wrong, then it is assumed that the user is to blame. However, this is not a realistic position to take if safety-critical errors are to be avoided. Humans do make mistakes. Furthermore, if designers of interactive systems accept this then they can design systems in ways that make some kinds of errors impossible, where the likelihood of others is reduced, and where error recovery is more likely when mistakes are made [BB97]. Verification and validation tools are needed to detect when an interaction or interface design allows such problems to persist so that improvements can be made.

While formal analysis techniques can now contribute realistically to evidence supporting claims about the reliability of a system, the analysis of that system from the perspective of human reliability remains relatively informal, based on subjective assessment using guidelines, checklists, heuristics and expert judgement. This paper continues our ongoing exploration of whether formal techniques can be extended to an understanding of these human factors. In particular we explore an approach involving modelling user behaviour.

People are not deterministic machines. Their behaviour is not fully predictable. However, it is a useful approximation to assume that humans behave in ways that follow certain patterns. For example, people enter interactions with goals and domain knowledge likely to help them achieve their goals. They take actions based on that knowledge and those goals. They also have to work within the resource limitations of their cognitive system, such as having limited working memory, being only able to focus visual attention at one place at a time, and so on. Work in the cognitive sciences is investigating such patterns of behaviour and increasing our understanding of their subtleties. Following Butterworth et al. [BBD00] we refer to such behaviour as 'cognitively plausible behaviour'. Bad consequences should ideally not arise from people behaving in cognitively plausible ways. We have previously shown that formal models based on general results about such behaviour can help to detect designs susceptible to whole classes of persistent, systematic user errors [CRB07]. We build here on that work. In particular, we extend it by including assumptions about the cognitive mismatch between the user's understanding of the device and its actual function, and explore how to integrate the approach with the detection of timing issues.

In many situations, such as on a busy hospital ward, the complexity of achieving tasks is also a crucial issue. If certain tasks take too long to complete then a system may be unworkable in reality. Furthermore, if people perceive a task to be too complex they will start to adopt workarounds, taking short cuts that may compromise safety. Ideally systems should be designed with this in mind using principles relating to cognition that will allow prediction of behaviour so that, once again, good design can remove the likelihood of harm occurring due to human behaviour. Even for everyday systems and devices, the time and/or the number of steps taken to achieve a task goal can be an indication of the usability or otherwise of a particular design. However, design changes intended to speed up a task can inadvertently increase the likelihood of other kinds of systematic error, and vice versa. Ideally these issues need to be considered together, and verification tools should allow them to be analysed in tandem.

Timing analysis is one of the core concerns in the well-established GOMS methodology [JK96a]. A GOMS model predicts the trace of operators and the task completion time. These predictions can be used to compare and evaluate different user interface (UI) designs and improve them by redesign. Tools such as CogTool [JPSK04] allow UI designers to perform timing analysis even with no previous experience in cognitive modelling. However, since GOMS models are deterministic, these predictions assume and apply to a single sequence of operators, usually regarded as the expert or optimal one. Such assumptions may be invalid for everyday interactive systems whose average users are not trained to follow optimal procedures, and may not even be aware of them. Such users might also simply choose a less cognitively demanding method. Moreover, under pressure, even the expert users of safety-critical systems may choose sub-optimal plans of action, either consciously or unconsciously, due to limitations in cognitive resources. This suggests that a timing analysis of interactive systems should include a broader set of cognitively plausible behaviours than just optimal paths.


Considering broader sets of user behaviour is the underlying premise in our approach to cognitive modelling. We previously developed a generic formal model of cognitively plausible user behaviour from abstract cognitive assumptions [CB01, RBCB09], such as entering an interaction with knowledge of the task and its subsidiary goals. We have demonstrated its utility for detecting a range of different kinds of systematic user error. To date we have concentrated on the verification of functional correctness (the user achieving a task goal), usability properties (the absence of particular kinds of systematic errors) and security properties.

In precursor work [RCBB07] to this paper we introduced a way to combine a GOMS-like ability to predict execution times with our verification methodology for detecting the potential in a design for systematic user error. We showed the feasibility of this approach based on simple examples involving cash machines. The main goal of this paper is to extend that work, indicating precisely what cognitive assumptions have been made, and applying them to a more serious safety-critical case study: the design of a typical medical infusion device. We present a scenario where several interface designs for an infusion pump are analysed and compared from both correctness and performance points of view. In doing so we have had to incorporate further cognitive modelling assumptions, applying them to new user behaviour modelling techniques such as a model of cognitive mismatch. Though we presume an expert (trained) user, in our analysis this does not automatically imply error-free and/or optimal behaviour. One of the reasons for this is the possibility of cognitive mismatches, when the beliefs of even expert users about the system state or device behaviour diverge from reality [Rus01].

In our analysis, we use the SAL verification tools [dMOR+04]. The model checking approach allows us to deal with all runs that are cognitively plausible according to our assumptions, rather than considering just a single 'run' as in a conventional GOMS-like analysis. Some of those runs may lead to an error. Thus, correctness (error) analysis is a natural first step in our approach. Once a more resilient design has been developed, fixing any problems found, a broad timing analysis can be undertaken based on the same models. A series of plausible runs generates a range of timings depending on the path taken. We make no attempt, however, to predict the most likely one. Instead, the questions to be asked in our approach are as follows.

• Do the slowest plausible runs fall under an acceptable time bound?
• Do the fastest plausible runs satisfy some lower time bound?
• How does the degree of cognitive mismatch impact those time bounds?
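In SAL terms (anticipating the models described in Section 4), such questions can be phrased as simple invariance properties over the accumulated time t and the termination status finished. The sketch below is only illustrative: the composed module name system and the bound constants MaxTime and MinTime are assumptions made here, not names taken from the case study models.

  % Upper bound: every plausible run that completes the task does so within MaxTime.
  upper_bound: THEOREM system |- G(finished = ok => t <= MaxTime);

  % Lower bound: no plausible run completes the task faster than MinTime.
  lower_bound: THEOREM system |- G(finished = ok => t >= MinTime);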

An advantage of doing this kind of analysis is that the timings can be used to argue that, according to the assumptions, plausible user choices are 'erroneous' on coarse performance grounds: the user model does achieve the goal but very inefficiently. If one potential method for achieving a goal was significantly slower, but task completion could still be proved for it, this might suggest design changes to either disable the possibility of choosing that method or change the design so that, if the method were taken, it would be easier to accomplish the goal. Similarly, a design chosen by users on performance grounds to eliminate a slow path might be rejected by our analysis due to its potential for systematic error discovered by the integrated human error analysis. Again, a design change might be warranted to prevent the potential problem.

We assume the use of timing data such as that used by HCI models like GOMS. Such timings are only estimates, so 'proofs' based on them are not formal guarantees of a particular performance level. They are just proofs that the GOMS-based execution times lead to values within a particular range, not proofs directly giving assurances about human behaviour. They can be of use by giving clear indications of areas of the design that warrant further consideration according to the assumptions made.

To summarise, the main contribution of our work here is to present a framework that combines user-centred timing analysis with a formal verification approach to detect design flaws that lead to systematic user error. We demonstrate its applicability by exploring its use on a realistic safety-critical medical device example. More specifically:

• We show how a set of cognitive assumptions can be used to analyse cognitively plausible ways of performing a task that emerge from a formal model of behaviour, rather than just analysing optimal or 'correct' traces.

• We demonstrate how timing analysis can be performed on all traces proposed by the model, i.e. a form of exhaustive analysis, rather than just simulating single traces at a time as in GOMS.

• We show how system designs that have potential for systematic human error occurring can be detected using the same specification and models used for the timing analysis.

• We extend our previous concept of systematic error in an analysis to include 'erroneous' choices in the sense of choosing an alternative that, whilst eventually achieving the result, is predicted to be slower than acceptable.

• We show how a formalisation of assumptions about cognitive mismatch between the user's understanding of the device and its actual function can be used within our framework. We can use it to analyse how performance and correctness are affected by the degree of mismatch.

• We demonstrate the utility of the approach by applying it to a case study based on a typical, realistic design of a medical infusion pump, illustrating the way that consequences that follow from assumptions can be detected and addressed in alternative designs.

In this paper we consider how the particular encapsulation of cognitive assumptions produces an approach to timing and human error verification and present an underlying 'engine' for doing such analysis. Ultimately, an analysis such as ours would be a part of a user interface design process, complementing other forms of analysis and helping to explore correctness and performance issues and trade-offs between them. A longer term objective of our work is therefore to explore ways that this 'engine' can provide the core for a tool suitable for use with domain specialists, human factors specialists and developers. This is beyond the scope of the present paper, however, and is left as further work.

2. Related Work

There is a large body of work on the formal analysis of interactive systems. Specific aims and focus vary. Here we concentrate on the work most directly linked to our work in this paper. A good review of the literature on combinations of user or use assumptions with models of devices is provided by Bolton et al. [BBS12].

Performance analysis is one of the primary concerns of the GOMS methodology [JK96b]. GOMS analysis usually assumes error-free execution; it does not involve verifying correctness as such. This, however, does not preclude GOMS from being used to analyse erroneous performances in a limited way. As noted by John and Kieras [JK96b], GOMS can be used, for example, to give performance predictions for error recovery times. To do this one simply specifies GOMS models for the task of recovering from error rather than the original task, perhaps comparing predictions for different recovery mechanisms or determining whether recovery can be achieved with minimal effort. With these approaches the analysis would not be able to identify the potential for human error: the specific errors considered must be decided in advance by the analyst.

Beckert and Beuster [BB06] take a step towards combining GOMS modelling and correctness verification. They present a verification environment with a structure similar to our models: connecting a device specification, a user assumption module and a user action module, the latter being based on CMN-GOMS. The selection rules of their GOMS model are driven by the assumption model while the actions drive the device model. This gives a way of exploring the effect of erroneous user behaviour in the form of incorrect selection decisions as specified in the user assumption module. However, the assumption module has no specific structure and, thus, does not provide systematic guidance as to what kind of potential errors to explore. These decisions are left to the analysts of the system. Also, Beckert and Beuster have not specifically focused on predicting performance times, but rather are using GOMS as a formal hierarchical task model.

Observable manifestations of erroneous behaviour are explicitly modelled by Fields [Fie01], who analysed error patterns using model checking. This work is based around the idea of a mistake model that transforms a given ideal interaction into a faulty interaction that includes, for example, duplicated steps. This approach, however, whilst giving specific kinds of errors to explore in the form of the mistake model, lacks discrimination between random and systematic errors. It also implicitly assumes there is a correct plan, from which deviations are errors. A recent example of the approach based on task models is the work of Bolton and others [BBS12]. They use the Enhanced Operator Function Model (EOFM) to describe operator tasks. This task model is combined with a device model as a basis for a model checking analysis using SAL [dMOR+04]. The analysis involves considering variants of the task by automatically inserting “phenotypes of erroneous human behaviour”. These variant models are checked against correctness properties, namely that the combined model would reach specified goals.

Temporal aspects of usability have also been investigated in work based on the task models of user behaviour [FWH96, LPNB02]. Fields et al. [FWH96] focus on the analysis of situations where there are deadlines for completing some actions and where the user may have to perform several simultaneous actions. Their approach is based on Hierarchical Task Analysis and uses the CSP formalism to specify both tasks and system constraints. Lacaze et al. [LPNB02] add quantitative temporal elements to the ICO formalism and use this extension for performance analysis.

A common feature of the above approaches is that they all consider specific interaction scenarios. Thimbleby [Thi02] explores a different approach: he analyses interface complexity relying on probabilistic distributions for the usage of menu functions. This gives an indication of the efficiency of interactions, albeit not in terms of timing but measured as the number of actions needed to reach desired menu options. Though our framework also considers a wider range of plausible interaction scenarios, these are not probabilistic in their nature. Rather, they arise as a result of non-determinism of our cognitive model based on the salience of user actions. A similar approach to ours is taken by Bowman and Faconti [BF99]. They formalise one model of human information processing (Interacting Cognitive Subsystems [BM95]) using the process calculus LOTOS, and then apply a temporal interval logic to analyse constraints, including timing ones, on the information flow and transformation between the different cognitive subsystems. Their approach is more detailed than ours and deals with information flow between the different cognitive subsystems and constraints on the associated transformation processes. As a result, it focusses on reasoning about multi-modal interfaces and analyses whether interfaces based on several simultaneous modes of interaction are compatible with the capabilities of human cognition.

One source of user error is cognitive mismatch between user beliefs about the system state or behaviour and the reality. Rushby et al. [Rus01] focus on mode errors resulting from cognitive mismatch and the ability of pilots to track mode changes. They formalise plausible mental models of systems and analyse them using the Murφ verification tool. Their mental models are essentially abstracted system models; they do not rely upon structure provided by cognitive assumptions as in our work. Furthermore, in addition to error issues, we also consider the effect of cognitive mismatch on timing.

Recently, the formal modelling and analysis of interaction with medical devices has become an active area of research. For example, Bolton and Bass [BB10] describe a model of user interaction with an infusion pump. They develop a detailed model of the pump interface and combine it with a human task behaviour model based on the enhanced operator framework. The model obtained is used to verify several safety properties of pump programming. Whereas, in the work of Bolton and Bass, the model of task behaviour is derived from pump documentation and training materials, we investigate whether pump interfaces sufficiently 'guide' user activities by providing relevant cues.

Sankaranarayanan et al. [SHL11] consider the dependability analysis of programmable medical devices using model checking. They rely on timed and hybrid automata to model the real-time operation of the device and its interactions. Their focus, however, is on detailed modelling of the pump's operation and the drug concentration in the patient's body. They are concerned with the impact of the failure rather than how the details of the design increase or reduce the likelihood of error. The effect of user error is summarised using generic mistake models that transform a given ideal interaction into a faulty interaction in the style of Fields [Fie01].

Campos and Harrison [CH11] develop formal models of several infusion pumps. They verify similar properties relating to the interfaces of these pumps with the aim of discovering potentially unforeseen design consequences. Though the main focus of their approach is on the device, their models of infusion pumps include an 'activity' layer. The latter can be seen as an abstraction of cognitively plausible behaviour.

Based on model-driven engineering, Kim et al. [KAS+11] present a development approach that aims to establish a safety-assured implementation of infusion pump software. They translate the generic pump reference model provided by the U.S. Food and Drug Administration (FDA) into timed automata. Its safety requirements, also provided by the FDA, are then analysed using the UPPAAL model checker. Their analysis, however, does not cover user interaction issues, focusing on the pump operation instead.

3. Generic Model of Cognitively Plausible User Behaviour

Our generic model of cognitively plausible user behaviour (GUM) is a higher-order specification of a set of assumptions about the way people behave. It specifies a form of cognitively plausible user behaviour according to these assumptions [BBD00]. The assumptions, however, do not attempt to completely specify cognitive plausibility. Real users can of course act outside this behaviour, about which the model says nothing. However, behaviour defined by the GUM can be regarded as potentially systematic, and so erroneous behaviour that emerges from the assumptions and a given system is similarly systematic in the design. The predictive power of the GUM is bounded by the situations where people act according to the assumptions specified. It makes it possible to investigate what happens if a person acts in such plausible ways. The behaviour defined is neither 'correct' nor 'incorrect'. It could be either, depending on the environment and task in question. We do not attempt to model the underlying neural architecture nor the higher-level cognitive architecture such as information processing. Instead our model is an abstract specification, intended for ease of automated formal reasoning.

It is generic in the sense that it does not contain details of any specific task or system, but rather general assumptions about behavioural patterns common to many situations. To reason about a specific system the GUM must be instantiated with specific details of that system and task. We discuss this in more detail in Section 7. Here we focus on the generic features of the model.

3.1. Cognitive assumptions

The GUM is based on a set of assumptions that give a behavioural description on the cognitive and knowledge levels in the terms of Newell [New90]. Our assumptions fall into two main groups discussed in more detail below. The first group, of behavioural assumptions, focuses on the role of the internal goals and knowledge of users in directing their behaviour. The second group, of salience assumptions, deals with the salience of cues in choosing plausible actions. The specific assumptions our model is based on can be justified in terms of results from the cognitive science and human-computer interaction literature (e.g., [KWM97, Ras83, AT02, OKM86, BB97, Hol93a, RBCB09]). Grouped according to their subject matter, they are briefly discussed below. Their formalisation is presented in Section 4.

3.2. Behavioural assumptions

These assumptions provide the underlying structure of user behaviour and postulate basic mechanisms of action choice.

Elements of user behaviour. We assume that interactive user behaviour can be structured in terms of actions and activities.

An action is assumed to be an atomic unit of user behaviour. We also assume that an action has two stages: an internal mental action when a person mentally commits to acting (either due to the internal goals or as a response to the interface prompts) and a physical action when the action itself is taken. A similar distinction between a preparation phase and an execution phase is made in the EPIC architecture [KWM97].

An activity is a chunking of user actions that represents the knowledge-based level of user behaviour [Ras83]. It is also a convenient way to structure the model. We assume that each user activity has a goal; the aim of the activity is to achieve that goal. It is this goal that drives an activity. User actions are linked to an activity if, on the knowledge-based level, they are associated with achieving that goal. For example, pressing a key for increasing (or decreasing) the number entered can be linked to the activity of setting the rate of an infusion. Achieving the goal of an activity may reduce the salience of the linked actions.

Choice of actions. We assume that the user's choice of actions is based on their salience (activation) [AT02], which determines the cognitive plausibility of each action. However, any one of several cognitively plausible behaviours might be chosen in any situation. It cannot be assumed that any specific behaviour will be the one that a person will follow where there are alternatives. This assumption of non-determinism allows us to explore a space of plausible behaviours rather than a single 'correct' behaviour. We also assume that one can switch between activities when some action from the new activity is at least as salient as any action in the current activity.

Though the choice of actions is generally based on their salience, we assume that users may also follow certain procedures or (partially) order their activities based on their experience and knowledge of the task. For example, in incremental number entry, expert users will not use the key for making a small increase if the distance to the target value is large. Instead, the key presses associated with a large increase will be taken.

Execution of actions. We assume that there is a delay between the commitment moment and the moment when the physical action is taken. Once a signal has been sent from the brain to the motor system to take an action, it cannot be revoked after a certain 'point of no return' [Bar58], even if the person becomes aware that it is wrong before the action is taken. In cognitive psychology, the mental processes that immediately precede this temporal boundary are known as ballistic processes. Once launched, they 'must proceed to completion, and, upon completion, necessitate the start of overt movement' [OKM86]. To capture this, we assume that a physical action immediately follows the committing mental action.

Task termination. We make two assumptions about the way users may terminate an interaction: voluntary termination and forced termination.

Users intermittently, but persistently, terminate interactions as soon as their main goal has been achieved [BB97], even if subsidiary tasks generated in achieving the main goal have not been completed. For example, a medical device may require a code to be entered to authenticate the user, allowing them to make changes to the programmed settings, but once programmed a logout button must be pressed. In these situations, nurses are likely to sometimes leave having finished the programming without logging out. We refer to such scenarios as voluntary termination.

If there is no apparent action that a person can take that will help to complete the task then the person may terminate the interaction. For example, if, on an infusion pump, the user wishes to enter a rate in mg/hour, but the options presented include nothing obviously about changing the units, then the person might give up, assuming the goal is not achievable. We refer to such situations as forced termination. In practice the person would probably not completely give up but start again. However, we assume forced termination in such situations as it is indicative of a design feature that warrants improvement if a user aim is considered plausible.

3.3. Salience related assumptions

The second group, of salience related assumptions, is more closely related to cognitive phenomena. Compared to the behavioural ones, these assumptions are more detailed and therefore more open to change. In our model, they refine the underlying non-determinism of action choice. Though based on experimental data [RBCB09], they are just a specific set of salience related assumptions that serve as a convenient basis for analysing interactive systems. Different sets of such assumptions (e.g., more coarsely grained) could be considered, if more appropriate, to fit specific scenarios being investigated.

Salience. In our framework, action salience can be seen as an abstraction of activation theory [AT02]. Such an abstraction is needed to make verification tractable. On this abstraction level, we assume that the overall salience of an action is derived from three kinds of cue salience to take that action. Two are internal (in-the-head) forms of salience: procedural salience and cognitive salience, whilst the third kind of salience is in-the-world salience: sensory salience. They are discussed in more detail below. We assume that all three kinds of cues can be affected by various inhibitors (see Section 3.3).

In our abstraction of activation, we postulate several discrete levels of salience that represent a continuous quantity of activation. Only actions with the highest level of salience are assumed to be candidates for execution.

Routines and procedural cues. We assume that procedural salience is afforded to a routine action due to the 'habit' formed from taking actions in sequence. A linkage builds up between them in the person's head so that one cues the next in the commonly performed sequence. Furthermore, we assume that procedural cues apply to both the rule- and skill-based levels of behaviour [Ras83]. It is this kind of salience that leads to capture errors [Hol93b] where a person follows the habitual sequence without intending to. For example, suppose a nurse normally uses infusion pump A, where the sequence of actions is to authenticate, then enter the volume to be infused and then the rate of infusion. On having to use infusion pump B, the authentication step may procedurally cue entering the volume, leading to an error if on this device the rate should be input first.

Task-related goals and cognitive cues. We assume that a user enters an interaction with knowledge of the task and, in particular, task dependent sub-goals that must be discharged. These sub-goals might concern information that must be communicated to the device, such as the dose of a drug to be infused, or items that must be used with the device (such as an identity card that must be swiped to authenticate the user). Given the opportunity, people may attempt to discharge such goals, even when the device is prompting for a different action. Such task-related goals arise from experience and knowledge of the task in hand, independent of the environment in which that task is performed.

Fig. 1. Layers in a concrete user model (diagram: the behavioural assumptions, including timing (Sections 3.2 & 4.1), together with the salience assumptions (Sections 3.3 & 4.2), form the generic user model (GUM); adding concrete actions yields a concrete user model)

According to Rasmussen, actions are goal-controlled at the knowledge-based level of behaviour [Ras83]. We assume that cognitive salience covers the situations where the need to do an action just springs-to-mind due to that action being associated with the goal of the current activity being performed. For example, the need to authenticate may spring-to-mind even without other cues. Actions that spring-to-mind more randomly, that are part of completely distinct activities, are not afforded cognitive cueing with respect to the current activity. While possible due to other kinds of cueing, their cognitive salience is assumed to be below the level of activation sufficient for an action.

Reactive behaviour and sensory cues. Users may react to an external stimulus, doing the action suggested by the stimulus. For example, if a light flashes on the identity card reader a user might, if the light is noticed, react by swiping their card. We assume that sensory salience is afforded to actions due to such external cues in the world. External cues could be related to any of the senses though typically are in the form of visual or audible cues. Clearly others, such as haptic cues leading to haptic salience, are also possible. We do not distinguish between the different kinds of sensory cues, and indeed the salience could be provided by some combination of senses. For example, a low battery warning light is intended to act as a cue to plug a device in to recharge it. The strength and timing of that cue may determine its saliency and so whether it is actually seen and acted upon [CB08].

Specificity. The concept of specificity refers to the dynamic, just-in-time, aspect of cue relevance, i.e., how timely and clearly that cue indicates the action required. Following Hollnagel [Hol93a] (p. 299), we assume that the strength of a cue is relative to its specificity. For example, in number entry, pressing an increase key is irrelevant when the current value is already higher than the target. Thus, we say that the sensory and cognitive cues of the increase key are not specific in such situations. Consequently, the sensory and cognitive salience is reduced for the action of pressing that key.

3.4. The GUM

Our GUM framework is intended to be a flexible basis in which the effects of assumptions about interactive behaviour can be explored. Schematically, the GUM is represented by the diagram in Fig. 1. It shows that the behavioural assumptions are the most stable postulates in the framework. On the other hand, the salience related assumptions are more flexible. They can be viewed as a plugin to the core behavioural assumptions. Taken together, the behavioural and salience assumptions form a generic user model (GUM). In this paper, we consider a particular set of the salience related assumptions and, consequently, one version of the GUM which can be used to analyse a range of interactive devices. The aim is to illustrate our approach. A different set of salience assumptions (GUM) might be more appropriate in substantially different scenarios. The point is that the specific assumptions about behaviour, made in an analysis, are explicit and inspectable. The results are about the consequences of people behaving in that way.

As also shown in Fig. 1, the GUM must be instantiated with a further set of assumptions concerning concrete user actions and their cue strength for the analysis of a specific interactive system.

Next we describe how our framework and the specific cognitive assumptions made have been formalised within the SAL verification environment [dMOR+04].

4. The GUM in SAL

SAL is a verification system that provides a higher-order language for specifying concurrent systems in a compositional way. State machines are specified in SAL as parametrized modules and can be composed either synchronously or asynchronously. SAL support for higher-order specifications is essential for capturing the generic aspects of our model such as the salience and termination rules as well as its structuring into activities and actions. The SAL notation we use here is given in Table 1.

Table 1. A fragment of the SAL language

  Notation            Meaning
  x:T                 x has type T
  λ(x:T): e           a function of x with the value e
  x' = e              an update: the new value of x is that of e
  {x:T | p(x)}        a subset of T such that the predicate p(x) holds
  a[i]                the i-th element of the array a
  r.x                 the field x of the record r
  r WITH .x := e      the record r with its field x updated by e
  g --> upd           if g is true then update according to upd
  c [] d              non-deterministic choice between c and d
  [](i:T): c_i        non-deterministic choice between c_i with i in range T
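As a small, self-contained illustration of this notation (a toy counter, deliberately unrelated to the pump models), the following context defines a module whose two guarded commands are composed by non-deterministic choice:

  toy: CONTEXT =
  BEGIN
    Counter: MODULE =
    BEGIN
      OUTPUT n: [0..10]
      INITIALIZATION n = 0
      TRANSITION
      [
        Inc: n < 10 --> n' = n + 1    % increment while below the bound
        []
        Reset: n = 10 --> n' = 0      % wrap around once the bound is reached
      ]
    END;
  END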

4.1. The behavioural assumptions in SAL

The top level of the SAL specification of a transition relation that defines our GUM (SAL module User) is given in Fig. 2. It represents the core behavioural assumptions shown in the diagram from Fig. 1. We provide more detail on this specification in the following subsections, where we also discuss how it reflects our cognitive assumptions.

The behaviour of the GUM specified in Fig. 2 can be represented as the state machine shown in Fig. 3. In state choose (which is also the initial state), the user model either terminates the interaction (guarded commands ExitTask, Abort) or chooses one of the plausible actions and commits to it (CommitAction). In state execute, the model simply performs the previously chosen action and returns to state choose. Finally, if the user model terminates the interaction, it remains in state terminated forever.

SAL models are transition systems specified as named guarded commands. Each guarded command (i.e. transition) in the specification describes an action that a user could plausibly take. Non-determinism is represented by the non-deterministic choice, [], between the guarded commands. For example, CommitAction in Fig. 2 is the name of a family of transitions indexed by action a and activity n. These transitions model user choice of actions, and the process of committing to one of them, as explained in more detail in Section 4.1.3. In our model, committing to an action leads to its execution in the next step.


transition
[
  ([] (a: ActionRange, n: ActivityRange): CommitAction:
      <Choose some action a and activity n (see Section 4.1.3)>
  )
  [] ExitTask:                                      % Voluntary termination
      not(acommitted) and finished = notf and
      ( Perceived(inp,mem,env) or Failure(inp,mem) )
      -->
      finished' = ok
  [] Abort:                                         % Forced termination
      not(acommitted) and finished = notf and
      not(Perceived(inp,mem,env)) and
      level = 0 and
      turn /= mach
      -->
      finished' = if Wait(inp,mem) then notf else abort endif;
      turn' = mach
  [] ([] (a: ActionRange): PerformAction:
      <Perform previously chosen action a (see Section 4.1.2)>
  )
  [] Idle:
      finished /= notf --> finished' = finished
  [] else --> level' = level - 1
]

Fig. 2. Top level of our GUM in SAL (text after % is commentary, the angle brackets indicate a verbal substitute for a model part).

Fig. 3. GUM behaviour (state machine with states choose, execute and terminated; transitions CommitAction, PerformAction, ExitTask, Abort and Idle)

Execution of actions is specified as the family of transitions PerformAction, discussed in Section 4.1.2. The pairs CommitAction–PerformAction of the corresponding transitions reflect the connection between the mental and physical actions.

Before describing the guarded commands in detail in the remainder of this section, we introduce the underlying structures for user behaviour in our SAL models.


4.1.1. Elements of user behaviour

The GUM's specification of user behaviour is structured in terms of the user actions and activities introduced in Section 3. These elements are specified over the following four components of the GUM state: inp, mem, out and env. The variable inp:Inp represents what the user can perceive in the world, the variable mem:Memory represents their internal memory state such as their beliefs about the state of the system, the variable out:Out represents the actions they can take in the world, and the variable env:Env represents the state of the world. These components are specified using type variables (Inp, Memory, Out) and Env that must be instantiated for each concrete interactive system to be analysed.
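For a concrete device these components are fixed when the model is instantiated (Section 7). Purely as an illustration of their role, and not the definitions used in the case study, the types might be instantiated for a simple rate-setting interface roughly as follows (all field and constant names here are hypothetical):

  Key:    TYPE = {incKey, decKey, confirmKey, noKey};
  Inp:    TYPE = [# rateDisplayed: NATURAL, incKeyVisible: BOOLEAN #];   % what the user perceives
  Memory: TYPE = [# targetRate: NATURAL, rateConfirmed: BOOLEAN #];      % user beliefs and goals
  Out:    TYPE = [# keyPressed: Key #];                                  % actions taken in the world
  Env:    TYPE = [# infusionRunning: BOOLEAN #];                         % state of the world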

Actions. In the GUM, uninstantiated type ActionRange represents the names of possible user actions. To specify an action we need to know in what circumstances it happens and the effect of it being taken. Each user action is therefore specified by two records. The first, of type Cue, collects information related to when the action is cued. The second record, of type Trans, describes the state transition that occurs when the action is executed. These records for all user actions are collected into two arrays, cues and trans, both indexed by the action name (i.e., ActionRange). They are parameters of the User module.

The cueing record of each action details assumptions about when that action will be cued. It specifies the strengths of its procedural and sensory cues (see Section 3.3) and two predicates that may inhibit the choice of that action for execution. The first predicate, spec, indicates when the cognitive and sensory cues for that action are specific, i.e., relevant to the current situation and therefore likely to be taken. If a cue is non-specific, the salience of that action may be reduced. The second predicate, grd, indicates when that action can be performed in principle. For example, if physical resources such as the buttons necessary for taking it are not available then it will not be enabled. If an action is not enabled then it will not be executed whatever the strengths of its procedural and cognitive cues are.

In summary, a cueing record for action a includes the following fields (as functions over the triple of the GUM state components (inp,mem,env)):

proc: A function that, for any action b, returns the strength of the action a as a procedural cue for b.
sens: A function that specifies the combined strength of sensory cues for a.
spec: A specificity predicate for a.
grd: An enabledness predicate for a.

The nature of the execution of an action a is also defined by a record. The latter consists of a time value associated with the duration of the physical stage of a and two relations that specify the physical action itself and the update of the user's memory (beliefs). These components are further discussed in Section 4.1.2.
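Continuing the illustrative types sketched above, a cueing record and a transition record for a hypothetical action PressInc (pressing the increase key) might be written roughly as follows; the action name, the cue strength constants other than NoCue, and all field names are assumptions made only for this sketch:

  pressIncCue: Cue =
    (# proc := LAMBDA (b: ActionRange):
                 LAMBDA (i: Inp, m: Memory, e: Env):
                   IF b = PressInc THEN WeakCue ELSE NoCue ENDIF,   % repeated presses cue each other
       sens := LAMBDA (i: Inp, m: Memory, e: Env):
                 IF i.incKeyVisible THEN StrongCue ELSE NoCue ENDIF,
       spec := LAMBDA (i: Inp, m: Memory, e: Env): i.rateDisplayed < m.targetRate,
       grd  := LAMBDA (i: Inp, m: Memory, e: Env): i.incKeyVisible
    #);

  pressIncTrans: Trans =
    (# time := TimePress,                                           % duration of the physical stage
       tout := LAMBDA (i: Inp, o: Out, m: Memory):
                 LAMBDA (x: Out): x = (o WITH .keyPressed := incKey),
       tmem := LAMBDA (i: Inp, m: Memory, o: Out):
                 LAMBDA (x: Memory): x = m                          % beliefs left unchanged in this sketch
    #);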

Activities. Uninstantiated type ActivityRange represents the names of possible activities in the user model. An activity itself is modelled as a record of type Activity. All activities are collected into an array, activities, indexed by the activity name (i.e., ActivityRange). This activity array is another parameter of the User module so the precise details for a particular scenario are specified when the User module is instantiated.

Each activity is defined by a collection of the actions that are linked to it and a goal that those actions aim to achieve. An action is linked to an activity if it is cognitively cued within the context of that activity. Thus, each activity is specified by its goal and the strength of cognitive cueing of all actions with respect to it.¹ In summary, a record for an activity n includes the following two fields:

goal: A predicate that specifies states when the goal of the activity is achieved.
cog: A function that for each action a returns the strength of cognitive cueing for a within the context of activity n. Action a is not linked to activity n if it is not cognitively cued by n at all.

¹ Note that for clarity of exposition we are omitting from this description, and models described later, the parts related to cognitive load, which are included in the full model. They are not relevant because they have not been used in the work described here. For the omitted detail see [RBCB09].
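In the same illustrative vein (with PressInc, PressDec and MediumCue again hypothetical names, not taken from the case study models), an activity for setting the infusion rate might be specified as:

  setRate: Activity =
    (# goal := LAMBDA (i: Inp, m: Memory, e: Env): i.rateDisplayed = m.targetRate,
       cog  := LAMBDA (a: ActionRange):
                 LAMBDA (i: Inp, m: Memory, e: Env):
                   IF a = PressInc OR a = PressDec THEN MediumCue ELSE NoCue ENDIF
    #);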


4.1.2. Execution of actions

We now look at how the execution of actions is modelled in SAL in more detail. This is specified by the family of guarded commands PerformAction which executes a user action that has been previously committed to (see Fig. 2):

[] (a: ActionRange): PerformAction:
    commit[a] = committed
    -->
    % Physical action and update of user's memory:
    out' in {x: Out | trans[a].tout(inp,out,mem)(x)};
    mem' in {x: Memory | trans[a].tmem(inp,mem,out)(x)};
    % Timing:
    t' = t + CogOverhead + trans[a].time;
    % Commitment:
    commit'[a] = ready;
    acommitted' = false;
    % Bookkeeping:
    turn' = mach;
    level' = -1

This rule consists of a guard and the effect if the rule is followed, corresponding to the left- and right-hand sides of --> respectively. According to this specification, an action a can only be taken as a result of this rule if the guard of PerformAction is true. This means that the rule will only activate if action a has already been committed to, as indicated by the element a in the array commit holding value committed.

If there are several such rules with their guards true then the choice of which fires is non-deterministic. If the rule is chosen to fire, however, then the right hand side specifies the updates to the system state that occur.

The first two updates on the right hand side specify the physical user action and the update of the user's memory, associated with the execution of a. In both cases the new value of the variable (e.g. out') is defined using the set membership operation. This value is chosen non-deterministically from the set of possible values as determined by a dedicated transition relation (e.g. tout). Both transition relations (tout and tmem) are relations from old to new values. They both use the old values of inputs (perceptions), inp, memory, mem, and outputs (action), out, to determine the relevant new value.

The next update deals with the task execution time t. This time is increased by the value predicted for the duration of the physical stage of a (trans[a].time). In addition to this, a time value, CogOverhead, associated with the cognitive overhead of taking the action is added. This is discussed in detail in Section 4.1.5.

The third group of updates concerns the commitment to an action. Thus, the value in array commit is changed from committed to ready and the variable acommitted is set to false. This indicates that there are now no outstanding commitments and the GUM is free to choose an action again.

For each action a, the transition relations tout and tmem, together with the duration value time, are collected in the record trans[a]. The specific values for these components are provided when the GUM is instantiated for a concrete system.

The PerformAction rules are enabled by executing the guarded command for choosing an action, CommitAction, which we describe in the following section.

4.1.3. Choice of actions

The first guarded command (CommitAction in Fig. 2) of the transition relation determines which particular action is chosen. There are a series of conditions specified in the guard that must be true before an action is chosen. We explain them in three groups as below:

[] (a: ActionRange, n: ActivityRange): CommitAction:
    % Model is in a position to choose an action:
    not(acommitted) and                               % no commitments made
    finished = notf and                               % GUM not terminated
    turn /= mach and                                  % user (not device) turn
    % Behaviour restrictions for a are obeyed:
    cues[a].grd(inp,mem,env) and                      % no physical restrictions
    strategy(a,n,s)(inp,mem,env) and                  % a fits the strategy
    activities[n].cog(a)(inp,mem,env) /= NoCue and    % a is linked to activity n
    % Salience assumptions (see Section 4.2 and Appendix A):
    . . .
    -->
    . . .

First of all, the GUM can commit to an action only if it is in a position to choose. The previously committed action must have been taken (condition not(acommitted)). The GUM must not already have terminated the interaction (finished = notf). Finally, the device must also have had an opportunity to respond before the GUM perceives and interprets device outputs again (turn /= mach).

The next group of preconditions restricts the behaviours based on the system state. Condition cues[a].grd(inp,mem,env) refers to the absence or presence of physical restrictions on what is possible with respect to action a. For example, some actions may only be possible if a physical object is held or the person is in a particular position. If the person does not possess an identity card they cannot swipe that non-existent card, however salient a flashing light to do so might be. The predicate strategy(a,n,s)(inp,mem,env) specifies the problem solving strategies to be explored. For example, it is used to eliminate from exploration less likely user behaviours such as increasing the number entered in small steps, even though larger increases are possible and the distance to the target value is large. Condition activities[n].cog(a)(inp,mem,env) /= NoCue ensures that the selected action a is linked to the current activity.
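As an indication of how such a strategy might be expressed (reusing the hypothetical names from Section 4.1.1; the threshold of 10 and the type name Status for the record s are likewise assumptions of this sketch only), a predicate ruling out small-step increases when the target is still far away could look like:

  strategy(a: ActionRange, n: ActivityRange, s: Status): [[Inp, Memory, Env] -> BOOLEAN] =
    LAMBDA (i: Inp, m: Memory, e: Env):
      NOT (a = PressInc AND m.targetRate - i.rateDisplayed >= 10);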

The final group of preconditions concerns salience assumptions. Our assumptions about user choice of actions employ a priority system for the overall salience of actions. Only actions with the highest salience are considered as candidates to be taken. This is described in detail in Section 4.2 and Appendix A.

Once an action is selected the right hand side of the CommitAction rule activates.

[] (a: ActionRange, n: ActivityRange): CommitAction:
    . . .
    -->
    s' = s with .last := a          % log action a as last selected
           with .active := n        % make n current activity
           with .trace[a] := true   % log that a was selected
           with .size := 1;         % log that at least one action was selected
    commit'[a] = committed;
    acommitted' = true;
    t' = t + if cues[s.last].proc(a)(inp,mem,env) /= NoCue
             then CogOverhead else TimeThink endif

It does not specify how the action is taken, but rather logs it as selected. The user model keeps track of selected actions and activities in the record s. This involves a series of book-keeping assignments. s.last records a as the last action selected. s.active then records the activity, n, that the selected action is part of, as the current activity. To indicate that the action a has been selected, the element a of the array s.trace is set to true. Finally, s.size is set to 1 to indicate that at least one action has been selected (this is needed when considering procedural salience).

The change to time is discussed in Section 4.1.5 below.

4.1.4. Task termination

Our two assumptions about the way users may terminate an interaction, voluntary termination and forced termination, are represented as top level rules in Fig. 2.

Voluntary termination, where finished is set to ok, is specified by the ExitTask command in Fig. 2. It simply models 'going away' from the interactive device. This command can be chosen when the main task goal, as the user perceives it (predicate Perceived(inp,mem,env)), has been achieved. Since the choice between guarded commands is non-deterministic, the ExitTask action may still not be taken even if possible. Also, ExitTask is only possible when there is no earlier commitment to another action.

Forced termination, where finished is set to abort, is specified by the guarded command Abort in Fig. 2. It models random user behaviour due to there being no apparent way forward. This condition of forced termination is captured by level = 0, which can only be true if there are no enabled salient actions. In such a case, a possible action that a person could take is to wait. The GUM will only do this if there is some cognitively plausible reason to do so, such as a displayed 'please wait' message. The waiting conditions are represented by the predicate parameter Wait. If Wait is false, finished is set to abort to model a user giving up and terminating the task. Forced termination is only possible if the main task goal has not been achieved.
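For a concrete system, the predicate parameters Perceived and Wait are supplied when the GUM is instantiated. Illustratively, and again only as a sketch with hypothetical field names (pleaseWaitShown, for instance, is not among the fields sketched earlier), they might be defined as:

  Perceived(i: Inp, m: Memory, e: Env): BOOLEAN =
    i.rateDisplayed = m.targetRate AND e.infusionRunning;   % user perceives the pump as programmed and running

  Wait(i: Inp, m: Memory): BOOLEAN = i.pleaseWaitShown;     % a displayed 'please wait' message justifies waiting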

4.1.5. The timing aspects

To allow timing analysis to be performed, we extended our original generic user model with timing information concerning user actions. The way timing information is represented is based on the approach taken in GOMS. The execution time associated with a GOMS operator is specified as a constant in our model. At the level of abstraction we are working at, three GOMS models, KLM, CMN-GOMS and NGOMSL, are similar in their treatment of execution time [JK96a]. They all include similar primitive physical operators and associate similar times with their execution. The main difference is in their interpretation and treatment of the unobservable operators. NGOMSL assigns some cognitive execution time ('cognitive overhead') to every step of its production-rule cycle (constant CogOverhead). KLM and CMN-GOMS do not use cognitive overhead time. All three models, however, include mental operators for such user actions as perceiving information on the screen.

Though there are differences in the placement of mental operators in these three GOMS models, they are largely irrelevant in most design situations [JK96a]. A basic GOMS principle [CMN80, CMN83] is that a mental operator should appear only before a 'cognitive unit' (a highly integrated group of actions) but not before user actions that can be fully anticipated [CMN83]. Though Card et al. acknowledge that the notion of a 'cognitive unit' is sometimes ambiguous [CMN83] (p. 268), it is generally a more closely related chunk of actions than an activity in our framework. We implement this basic GOMS principle as follows. Procedural cueing for an action, a, represents our assumption that a was fully anticipated; otherwise a starts a new cognitive unit. The overall task execution time (variable t) is increased when a is chosen and committed to based on its salience (see the specification of CommitAction in Section 4.1.3). The constant TimeThink, which represents 'mental time', is added to t if there is no procedural cueing for a. Otherwise, action a is fully anticipated, and the constant CogOverhead is added to t.

In our model, GOMS-like physical operators apply to the physical phase of action execution (PerformAction). Depending on the granularity of analysis, our physical action may accumulate several physical GOMS operators as appropriate (see Section 7 for an example). In this work, we use two primitive physical operators: pressing a key or button, defined as the time duration constant TimePress, and homing hands on the keyboard, defined as the constant TimeMove. This set of operators can be extended as needed. The physical execution time for action a is specified by trans[a].time when the concrete actions are defined. This value is used to increase the overall execution time in the PerformAction command. To accommodate time overhead as in NGOMSL, the overall execution time is also increased by the constant CogOverhead in the same command.

The constants CogOverhead, TimeThink, TimeMove and TimePress are parameters of the GUM. They can easily be (re)defined as appropriate. For example, CogOverhead is set to 0 in our case study analysis in Section 5, to reflect the KLM/CMN-GOMS style. The duration values we use for the constants TimeThink and TimeMove are based on Hudson et al. [HJKB99]. These can easily be altered if more accurate times can be found based on research. Note, however, that obtaining precise and reliable execution times is a secondary issue in this paper. Instead, our main focus here is to illustrate our approach to reasoning about and analysis of correctness and performance issues by comparing different designs of an infusion pump.
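For concreteness, a GUM instantiation could declare these parameters along the following lines. Apart from CogOverhead = 0, which is stated above for the case study, the numbers are purely illustrative placeholders (in tenths of a second) rather than the values actually used in our analysis:

CogOverhead: natural = 0    % no per-step overhead (KLM/CMN-GOMS style)
TimeThink: natural = 12     % illustrative mental-operator time
TimeMove: natural = 4       % illustrative time for homing a hand on the keypad
TimePress: natural = 2      % illustrative time for a key or button press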

4.2. Salience related assumptions in SAL

In this work, we rely on a particular set of salience assumptions that we have used before [RBCB09]. These assumptions fall into two groups. The first one concerns a priority system for the choice of actions based on their overall salience. The second one specifies how the overall salience of actions is determined from the strength of their cues.

Fig. 4. From action cues to overall salience for a user action: the raw procedural, sensory and cognitive cues, moderated by cue inhibitors (a non-specific cue, the goal being achieved, high cognitive load), determine the procedural, sensory and cognitive salience values, which combine into the overall salience of the action.

The priority system used here is based on three levels of overall salience: high salience (level 3), medium salience (level 2) and low salience (level 1). In SAL, these salience levels are represented by the boolean variables highSalience, mediumSalience and lowSalience defined in Appendix A. If there are actions of a higher salience, then actions with a lower salience level will not be considered for activation, and so will not be part of any model-checking based exploration. This is specified in the final part of the guard of CommitAction (see Section 4.1.3) as follows:

...
( level = 3 and highSalience or
  level = 2 and mediumSalience or
  level = 1 and lowSalience )
-->
...

Once the highest level of salience that has actions associated with it is determined, those actions become candidates for execution and their consequences are explored during model checking.

More precisely, the overall action salience is checked in three rounds, starting from high salience (level = 3). If there are such actions, the GUM commits non-deterministically to one of them, say a, thus enabling PerformAction. If there are no high salience actions, the variable level is decreased. This is specified in the else rule at the bottom of Fig. 2. The same process is then repeated to check for medium and then possibly low salience actions until a salient action is selected.
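As a sketch only (the actual rule is the else rule in Fig. 2, which may carry additional bookkeeping), this fall-through simply lowers the salience level under consideration:

else --> level' = level - 1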

The overall action salience is derived from several kinds of salience (see Fig. 4). These different kinds of salience are calculated dynamically from the strength of the corresponding cues. We use the term 'salience' when referring to calculated values and 'cues' when referring to the 'raw' strength values. These raw values concern how eye-catching a visual cue is, for example, irrespective of the load on the person. The raw strength of the cues would need to be determined in collaboration with an HCI expert for any specific system. Intuitively, the raw strength of a cue represents a 'base' level for the corresponding kind of salience. This base level can be lowered in our model by various inhibitors such as high cognitive load, non-specificity of a cue, or the goal of an activity being achieved (see Fig. 4). The precise detail of how the salience values are calculated is given in Appendix A.

Our earlier paper [RBCB09] provides more detail about these salience related assumptions, including a discussion of how they stem from experimental data.

5. A GUM Analysis

In the remainder of this paper, we present a design scenario that demonstrates how the GUM can be used to analyse both correctness and timing aspects of an interactive system. We will show how the issues identified by the GUM analysis can prompt redesigns of the device interface. We will then apply the GUM analysis to the redesigned systems, thus exploring trade-offs between timing and correctness. In this section, we give an outline of the models and analysis. They are discussed in detail in the subsequent sections.


Fig. 5. Complete model of an interactive system: the Concrete User Model (with input, memory and output components) is connected to the Device Model via the Interpretation model (device output to user input) and the Effect model (user output to device input), alongside the Mismatch Model.

Device. Our case study is based on an infusion pump design. There are several reasons for choosing this case study. An infusion pump provides a realistic test of the viability of our approach. Infusion pumps are used both in hospital settings and at home to provide intravenous infusions. They are safety-critical devices, since infusing a drug at the wrong rate or volume may seriously harm patients. Therefore, both error-free and efficient interaction with them is important.

We will look at the basic task of setting and starting an infusion with prescribed values of the volume to be infused and the rate of infusion. The interface we use for entering these numbers, as well as the basic modes of operation, are similar to a variety of makes and models of infusion pumps. The general lessons about the applicability of the approach are therefore likely to apply across a range of different pumps as well as other devices using similar number entry mechanisms. Our infusion pump model is described in Section 6.

User. The GUM described in Section 4 specifies the general behavioural and salience assumptions common to a range of situations. To reason about a concrete interactive system, this GUM must be instantiated with specific details of that system and the task analysed (recall Fig. 1), resulting in a concrete user model. The instantiation involves deciding what user activities and actions are relevant for the intended analysis, how those actions are linked to the activities and what goals can be associated with the activities. Once these decisions are taken, the next step is to provide cueing information for each action, and the state update relations associated with their execution. This process would be based on the exploration of interactions with the actual device and a dialogue between the developer of a formal model and an HCI expert. Its goal is to determine model elements such as plausible values for cue strength and the belief updates associated with user actions. In general, where there is doubt over specific values, the model can be used to explore the consequences of each situation.

Furthermore, in this work, we introduce a model of cognitive mismatch that represents discrepancies between the user beliefs about the system state or device behaviour and the reality. When analysing the possibility of such a mismatch and its effect on performance in our scenario, the mismatch model is also added into our overall model of the interactive system.

The instantiation of the GUM for our specific device and scenario, as well as the related mismatch model, are described in Section 7.

Interactive system. Though the GUM analysis is based on a combination of the device model and the concrete user model, these models can, in principle, be developed independently of each other. Furthermore, in an ideal scenario, the device model would be developed by a separate team responsible for the formal analysis of the device. Such independent development is possible, since our approach employs two additional models that connect the state spaces of the device and user models (see Fig. 5). The first one models user perception and interpretation of the device interface, while the second one specifies the effect of user actions on the device (see [RCBB07]). The model of an interactive system is then a suitable composition of the above four models and a cognitive mismatch model. This is described in more detail in Section 8.1.

System analysis. Though various concrete properties of interactive systems can be explored using the GUM, having a model of user behaviour (a 'surrogate' user) allows one to ask more general questions. For example, does the user model always achieve the task goal? What are the time bounds for doing that? An advantage of looking at such more abstract properties is that analysing them can help to identify unforeseen interaction issues.

Fig. 6. Schematic representation of the pump interface: a display showing the mode message, option labels and numerical values; three option keys (opt1, opt2, opt3); four chevron buttons (UP, up, dn, DN); and, further below, the 'infuse' and 'on/off' keys.

In our case study, we focus on a basic correctness property that asks whether the concrete user model is always able to start the infusion with the prescribed values. In general, there are many cognitively plausible behaviours for entering these values, especially in the presence of cognitive mismatch. Furthermore, in some situations it may be critical that the infusion is started sufficiently quickly. Therefore, the second central property in our analysis concerns the performance of the user model. It asks whether, in all plausible behaviours, the infusion is started within a certain time bound. A formal specification of these properties and their analysis for our initial system model is described in Section 8.

Design scenario. Performance analysis for our initial system, as we will show, indicates that number entry is too slow. To address this issue, we then consider a plausible redesign scenario for our pump. The first modification attempts to make number entry faster. However, our correctness analysis for the modified design shows that the user model is prone to error due to cognitive mismatch. To address this issue, we modify the pump design once more by introducing an additional step into the procedure of infusion setting as a reasonable trade-off between correctness and speed. These redesigns and their analysis are described in Section 9.

6. The Device Model: An Infusion Pump

In this section, we first describe the pump interface and the functionality related to the task of setting the infusion parameters (volume to be infused and rate of infusion) and starting the infusion process. The representation of numerical values is then briefly discussed. A full formal specification of pump behaviour and interface, the Pump model, is presented in Appendix B.

Pump interface. The infusion pump (see Fig. 6 for the schematic representation of its interface) has a small display which indicates the mode of its operation (type ModeMessage) and shows numerical values (type Numbers) relevant in that mode, such as the rate and volume to be infused. Just below the display there are three soft option keys that allow a pump user to move between different modes. The functionality of these keys depends on the mode and is indicated by the option labels (type Option) at the bottom of the display.

There are four chevron buttons placed below the option keys; they are used to enter numerical values incrementally. Pressing the 'up' chevron (single up) increases the number entered by a 'smaller' amount, whereas pressing the 'UP' chevron (double up) increases it by a 'larger' amount. The down chevrons 'dn' and 'DN' operate in a similar way, decreasing the value. Other keys are placed further below the chevron buttons; of these, only the 'infuse' and 'on/off' keys are relevant to the analysis presented in this paper.

Numerical values. The pump handles numbers with a single digit after the decimal point. We represent all these numbers as integers from the interval [0..max], modelled as the following type Numbers:

Numbers: type = [0..max]

Here, max is a parameter of our model which can be instantiated to a suitable value, say 999. In this type, for example, the number 47 represents 4.7, because we scale all numbers by a factor of 10. This allows the analysis to work on integers, which are handled by model checkers more easily.

Key presses. The pump responds to the key presses performed by its user. These are modelled as the input variable event of type Events. The event OnOff represents the user pressing the 'on/off' key. Opt1, Opt2 and Opt3 model the three option keys being pressed. Events up and UP model pressing the single and double chevron buttons, respectively, with dn and DN similar. Run and Hold correspond to pressing the infusion and hold keys. Finally, the Tick event represents states where there were no key presses. In the infusion mode, this would also model the infusion process.
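Although the full declaration belongs to the Pump model in Appendix B, the Events type can be sketched directly from the events just listed (the exact ordering is immaterial here):

Events: type = { OnOff, Opt1, Opt2, Opt3, up, UP, dn, DN, Run, Hold, Tick }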

7. The Concrete User Model: A Pump User

Our analysis is based on the combination of the device model and the user model. We introduced the formal model of our infusion pump in the previous section. In this section, we describe a concrete user model for this pump, denoted PumpUser. In doing so we will assume a basic user task of starting an infusion with the prescribed values of rate and volume to be infused. A model of cognitive mismatch between the user beliefs about the system state or device behaviour and the reality will be introduced at the end of this section. In the following sections, we analyse correctness and timing properties subject to this cognitive mismatch criterion.

The pump user model is derived by instantiating the GUM described in Section 4. For this instantiation, the following parameters must be defined for the pump user model:

• the types for specifying its state space
• the names of its activities and actions
• the user model activities and actions themselves

These parameters represent an additional, most specific, set of our assumptions about the behaviour of pump users.

Underlying assumption. We start by noting that our instantiation reflects, in various ways, the following underlying assumption: the system is analysed when used by a user who has full and correct knowledge as regards the pump interface. By this we mean that erroneous actions may be generated only due to wrong user beliefs about the current state, such as the mode of the pump.

7.1. State space

Here we instantiate GUM parameters that define the state of our pump user model. Recall that this state is specified by the following components: inp:Inp giving things the user can perceive in the world, mem:Memory giving their beliefs about the state of the system, and out:Out giving the actions they can take in the world (the environment component env is not used in this particular case study).

Perceptions. The following type Inp represents our assumptions about what the pump users can perceive:

Inp: type =
  [# on: bool, mode: ModeMessage, option: array [1..3] of Option,
     rate: Numbers, vtbi: Numbers, infused: bool,
     rateDisp: bool, vtbiDisp: bool, volumeDisp: bool #]

Thus, we assume that the user model is aware of when the pump is switched on (on field). It can perceive the pump mode (mode field), based on the mode message at the top of the display. The modelled user can also read the labels of the three option keys at the bottom of the display (option field). Finally, the rate of infusion, the volume to be infused and the volume already infused (rate, vtbi and infused fields) can be perceived by the user model, but only when these values are actually displayed, as indicated by the rateDisp, vtbiDisp and volumeDisp fields, respectively. Note that the components of Inp may, but do not have to, be specified in terms of the device model. If they are not, the Interpretation model is used to provide a 'glue' for the mismatching elements.

Beliefs. The memory component mem represents our modelling assumptions about the user beliefs about the state of the pump, as specified by the following type Memory:

Memory: type =
  [# targetRate: Numbers, targetVolume: Numbers, rateSet: bool, volumeSet: bool,
     last: MNumbers, delta: [1..100], over: bool, under: bool #]

Here, the targetRate and targetVolume fields represent the target values for rate and volume to be programmed into the infusion pump. For our analysis here, it would make no difference if these two elements were specified as part of user perception (type Inp). The rateSet and volumeSet fields represent the user's beliefs about whether the rate and volume, respectively, have been set to the target values.

The field last (of type MNumbers, which consists of the same values as Numbers plus the value -1, representing states where the modelled user has no particular belief as regards this value) is relevant in scenarios where the user model does not rely on the actual value potentially perceived as inp.rate or inp.vtbi. Instead, it 'guesses' this value based on the beliefs mem.last and mem.delta, where the delta field represents the amount by which the user expects the number to be changed by pressing a chevron. We will give more details about this when specifying how the user model's memory is updated by executing each action.

Finally, the over and under fields record whether 'jumps' over the target value have already occurred due to the 'up' and 'down' chevron presses, respectively. This is needed to prevent the user model from infinitely repeating sequences such as double 'up' chevron then double 'down' chevron, for example, in a state where the current value is 58 and the target value is 62, leading to an oscillation around the number 62.
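Purely as an illustration of how these flags could be used (the actual chevron action specifications are given in Appendix C), the specificity condition of an 'up' chevron action for the volume might contain a conjunct of roughly this shape, so that further increases cease to be specific once a jump over the target has occurred:

spec := lambda(s:Status): lambda(inp:Inp,mem:Memory,env:Env):
          not(mem.volumeSet) and not(mem.over)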

Outputs. The type Out in the generic model represents the output produced by the user model (i.e., actions taken). In this case, we assume that the outputs are simply key presses. Since the key presses (or their absence) are represented by the Events type specified earlier, we define Out as follows:

Out: type = Events

7.2. User actions and activities

Plausible user actions and activities when setting up an infusion would need to be identified by exploratory interactions with the actual infusion pump and a dialogue with an HCI expert. In our example, we identified three user activities: two of them involve setting the infusion parameters (rate and volume to be infused), while the third one (mainActivity) is the overall activity the user model is attempting to achieve and concerns basic pump functions (switching the pump on/off and starting the infusion). In the instantiated user model, these activities are represented by the enumerated type ActivityRange:

ActivityRange: type = { mainActivity, rateActivity, volumeActivity }

For each of these activities, user actions that are relevant key presses on the pump are taken to be plausible in that activity. We also distinguish two mental actions (markRateAction and markVolumeAction). These, as well as the other actions, will be discussed in more detail when their specifications are given in Appendix C. The possible user actions are represented by the following type ActionRange:

ActionRange: type =
  { startAction, infuseAction, exitAction,
    markRateAction, upRateAction, UpRateAction, downRateAction, DownRateAction,
    chooseVolumeAction, markVolumeAction, confirmVolumeAction,
    upVolumeAction, UpVolumeAction, downVolumeAction, DownVolumeAction }

Next, we give examples of formal SAL specifications for the user activities and actions identified above (a detailed development of the other user activities and actions is given in Appendix C). To define an activity, we specify its goal and the cognitive cueing for all user actions with respect to that activity. This also defines the actions linked to the activity in question. Recall that in the GUM, user activities are represented by the parameter activities. Each element in this array is a record that specifies an activity. Below we show how the corresponding record is defined for the overall user activity mainActivity for setting and starting an infusion. These two aspects are reflected in the goal we associate with this activity. From the user perspective, the infusion parameters are set if the user believes that the prescribed rate and volume values have been correctly entered. These two beliefs are represented in the user model by mem.rateSet and mem.volumeSet, respectively. Similarly, from the user perspective, an infusion has started if the user perceives the relevant mode message (infusingMsg) at the top of the display. Formally, this activity goal is specified as a predicate, denoted PerceivedGoal. It checks whether, in a given state, the modelled user is perceiving the message that the infusion has started, and believes that both the rate and volume have been set:

PerceivedGoal = lambda(inp:Inp,mem:Memory,env:Env):
  inp.mode = infusingMsg and mem.rateSet and mem.volumeSet

To achieve the above goal, the pump must be switched on, the rate and volume set as required and the key for starting the infusion pressed. Since setting the rate and volume values is modelled as two separate activities, we link to mainActivity only such basic actions as switching the pump on/off and starting an infusion. Within the context of starting the infusion, switching the pump on (startAction) and pressing the 'infuse' key (infuseAction) are task-related actions. Thus it is assumed that their cognitive cueing is strong within this activity. On the other hand, exitAction has weak cognitive cueing, since switching the pump off does not seem to bear obvious relevance to the task we consider. All other actions are not part of this activity, so they have no cognitive cueing. Formally, mainActivity is specified as the following record combining the goal with the cognitive cueing strengths of the actions linked to that goal:

(# goal := PerceivedGoal,
   cog := lambda (a:ActionRange): lambda(inp:Inp,mem:Memory,env:Env):
            if a = startAction or a = infuseAction then StrongCue
            elsif a = exitAction then WeakCue
            else NoCue endif #)

One of the infusion parameters to set up is the volume to be infused. As an example of the action specification, let us consider the user choosing the vtbi mode (chooseVolumeAction) by pressing the second option key, which has the vtbi label in the basic mode. Recall that user actions are formally specified by defining the relevant cueing information and the transitions associated with them. The cueing information for an action a is specified as the corresponding element in the array cues (i.e., cues[a]). Similarly, the state updates associated with a are specified as the array element trans[a].

First, the cueing information for chooseVolumeAction is formally specified as the record cues[chooseVolumeAction], defined as follows. Since there is no fixed action the user would take before choosing the vtbi option, we assume that chooseVolumeAction does not have procedural cues:

proc := noCue

We assume that the focus of user attention is the pump display when setting up the infusion parameters. Consequently, the sensory cueing for this action is assumed to be strong, since the vtbi label is shown on the display:

sens := strongCue

Also, we assume that this user action is specific only if the user does not believe that the volume has already been set:

spec := lambda(s:Status): lambda(inp:Inp,mem:Memory,env:Env): not(mem.volumeSet)


Finally, chooseVolumeAction is only possible if the label vtbiOpt is displayed above one of the three option keys (inp.option[j]):

grd := lambda(inp:Inp,mem:Memory,env:Env): (exists(j:[1..3]): inp.option[j] = vtbiOpt)

Next we specify the state transitions associated with chooseVolumeAction. That is, we define the record trans[chooseVolumeAction] as follows. We assume that this action yields an event that represents the vtbi option key being pressed (optAction(j) returns the relevant event):

tout := lambda(inp:Inp,out0:Out,mem:Memory): lambda(out1:Out):
          exists(j:[1..3]): inp.option[j] = vtbiOpt and out1 = optAction(j)

This action is also assumed to clear the memory of any beliefs about the last value entered (last := -1) and the occurrence of 'jumps' over or under the target value:

tmem := lambda(inp:Inp,mem0:Memory,out:Out): lambda(mem1:Memory):
          mem1 = mem0 with .last := -1 with .over := false with .under := false

Finally, we assume that executing this action involves moving a hand to the keypad and pressing a key. Thus, the execution time associated with this physical action is obtained by adding the duration of these two operators:

time := TimeMove + TimePress

7.3. Cognitive mismatches

Based on device manuals, training or previous experience, humans usually create internal 'mental' models of the interactive devices they use. These models may lead to certain beliefs about the device behaviour. Depending on the situation, these beliefs may or may not be justified. This potentially leads to cognitive mismatches between the mental model and the actual state of the system, which can result in erroneous user actions. Rushby has previously explored formal verification techniques to investigate such cognitive mismatches [Rus01].

In general, cognitive mismatches are unavoidable, even when all the required information is present in the device interface. This might be due to high cognitive load, interruptions, habits or other facets of the system being more salient, for example. In our earlier work, the assumptions about how cognitive mismatches are handled by the user were part of the user models. However, such assumptions are usually strongly dependent on the task, its context and on behavioural strategies. We trial here a way to take this into account and make our approach more flexible, by specifying potential cognitive mismatches as a separate 'mismatch' model (see Fig. 5). Here we use a concrete mismatch model specifically for the pump under consideration. In future work we will explore the creation of generic mismatch models that can be instantiated with the detail of a specific device.

The mismatch model. This model, denoted Mismatch, which describes potential cognitive mismatches for the pump user model, is specified in Fig. 7.

Intuitively, we consider as cognitive mismatches those states of our interactive system where the displayed value and the user belief about that value differ in some number entry activity; in this case, that means either in entering the volume or the rate. The degree of mismatch then corresponds to the number of consecutive steps (number entry actions) the user takes without correcting the wrong belief. The model's parameter maxDegree gives the maximal degree of mismatch we are ready to accept in our analysis. The current degree of mismatch is represented by the counter k. When the current degree becomes greater than the maximum in some trace of the system, the boolean stop is set to true. This indicates that the analysis of that trace can be stopped at this point, as a problematic situation has been discovered (see below). If the user corrects the wrong belief at any time, then the current degree of mismatch is restored to zero, if it has not already reached the maximum. The effect of cognitive mismatch on system properties of interest can be explored by choosing different values for the parameter maxDegree.


initialization
  k = 0                                                   % no mismatch initially

definition
  stop = (k > maxDegree)                                  % stop when mismatch is too great

transition
[
  % Increase mismatch for the rate increase actions:
  k <= maxDegree and                                      % mismatch does not exceed the maximum
  mem.last + mem.delta /= inp.rate and                    % believed rate is different from actual
  (s.last = upRateAction or s.last = UpRateAction) and    % last action was rate increase
  turn = usr and not(acommitted) and                      % user turn and no commitment
  level = 3
  --> k' = k + 1
[]
  % Increase mismatch for the rate decrease actions:
  k <= maxDegree and mem.last - mem.delta /= inp.rate and
  (s.last = downRateAction or s.last = DownRateAction) and
  turn = usr and not(acommitted) and level = 3
  --> k' = k + 1
[]
  % Increase mismatch for the vtbi increase actions:
  k <= maxDegree and mem.last + mem.delta /= inp.vtbi and
  (s.last = upVolumeAction or s.last = UpVolumeAction) and
  turn = usr and not(acommitted) and level = 3
  --> k' = k + 1
[]
  % Increase mismatch for the vtbi decrease actions:
  k <= maxDegree and mem.last - mem.delta /= inp.vtbi and
  (s.last = downVolumeAction or s.last = DownVolumeAction) and
  turn = usr and not(acommitted) and level = 3
  --> k' = k + 1
[]
  else -->
]

Fig. 7. Mismatch model for the infusion pump in SAL.

8. System Analysis: Correctness and Performance

So far we have formally specified the pump device model, Pump, the pump user model, PumpUser, and the mismatch model, Mismatch. In this section, we first define a formal model for the interactive system under consideration as a whole. Then we formally define the properties needed to analyse its correctness and efficiency. Finally, we move to the formal analysis itself.

8.1. System model

Our specification of the interactive system as a whole involves combining and connecting the pump model and the pump user model. This requires two additional models (see Fig. 5): firstly, that of user interpretation of the pump interface (Interpretation), and secondly, a model giving the effect of user actions on the pump (Effect). These additional models connect the state spaces of the pump and user models. For our case study, these connectors are in fact trivial: they simply rename appropriate variables. For example, user interpretation of the displayed value for the volume to be infused is specified in Interpretation as follows:

inp.vtbi = val.vtbiDisp

Similarly, the effect of user actions is specified in Effect as below, stating that the input event on the pump (event) is either a do-nothing step (Tick) or whatever key press (out) the pump user model produced:


event = if acommitted or finished /= notf then Tick else out endif

The Tick event is registered by the pump when the user has just committed to some action (i.e., performed that action's mental phase) or has already finished the interaction in some way. In these situations no physical action is taken by the user.

The SAL module IntSystem of the interactive system is then specified as the following composition of all these separate models:

IntSystem: module = (PumpUser || Effect) [] (Pump || Interpretation)

For our analysis, we add the cognitive mismatch model to this system model. The combined system, denoted System, is defined as follows:

System: module = IntSystem || Mismatch

8.2. Properties

The correctness and performance properties under consideration are specified in LTL (linear-time temporal logic).

Correctness. The basic correctness property concerns the functional correctness of our interactive system for the task considered. It asks whether the pump user model is always able to start the infusion with the prescribed values. The formal specification of this property involves two conditions. The first one reflects the user perspective and represents the user's belief that the infusion has started with the prescribed values. We have already specified this condition as the predicate PerceivedGoal in Section 7. In general, however, user beliefs are not necessarily correct. Thus, the second condition reflects a more global perspective. It checks whether the actual system state is consistent with the user beliefs about the infusion parameters. In our case study, this condition, denoted PrescribedSettings, is defined as follows, stating that the volume and rate are actually set to the target values:

PrescribedSettings(mem:Memory, val:Values): bool =
  val.vtbi = mem.targetVolume and val.rate = mem.targetRate

Subject to the cognitive mismatch criterion, functional correctness for System and the task we consider is then formally specified as the following LTL assertion, denoted Correct:

F (not(stop) => PerceivedGoal(inp,mem,env) and PrescribedSettings(mem,val))

where the F operator means 'eventually'. The assertion states that, in any interactive system behaviour such that the degree of cognitive mismatch does not exceed the maximum (not(stop) is true), a state is eventually achieved where both functional correctness conditions hold.

Cognitive mismatches are unavoidable in many situations. A natural question to ask is then whether the user model can ever 'experience' cognitive mismatch of a degree higher than specified as the acceptable maximum. Our second correctness property, denoted MismatchOK, formally expresses this concern:

G (not(stop))

where the G operator means 'always'. Note that, if MismatchOK holds for maxDegree = 0, then the cognitive mismatch, as specified, never arises in our interactive system as modelled.

Functional correctness only guarantees that the main task goal is eventually achieved at some unspecified point in the future. In many situations, especially in the case of various critical systems including infusion pumps, designs can be judged as unacceptable on the grounds of poor performance, even if they are functionally correct. Next we state a system property that supports performance analysis in our approach.

Performance. Functional correctness is a binary property: it is either true or false. While it is important to have assurance that an interactive system is correct, its quantitative aspects such as performance can be of equal interest. Setting parameters and starting an infusion quickly may be critical in, e.g., operating theatres. However, for busy nurses, being able to perform such tasks easily is also important even when not safety critical. Therefore, one needs some assurance that an infusion can be started in a reasonable amount of time, not at some unspecified point in the future. Formal timing verification could provide stronger guarantees that critical interactions are sufficiently efficient by exploring a range of plausible but not necessarily optimal behaviours.

Thus, the second central concern relates to the performance of the user model. Since model checking involves the exploration of all possible behaviours generated by the system model, a natural question to ask is about the time bounds over a range of user behaviours. In this respect, our approach is different from GOMS, which focuses on the precise execution time of a specific sequence of user actions. More specifically, our performance property checks that the execution time does not exceed the specified upper bound for any plausible user behaviour (as defined by our assumptions) that stays within the range of acceptable mismatch and actually starts the infusion with the prescribed values. This property is specified as the following LTL assertion, denoted Performance:

G (not(stop) and PerceivedGoal(inp,mem,env) and PrescribedSettings(mem,val) => t <= T)

Here T is a relevant time bound. Recall that the not(stop) condition actually depends on the degree of cognitive mismatch (i.e., the maxDegree parameter). Choosing different maxDegree values allows one to explore the impact of the degree of cognitive mismatch on user performance.

If the Performance assertion is false, SAL would produce a counter-example violating it in the form of a specific sequence of actions and intermediate states. This can help to determine interface design features that inhibit user performance and to modify them in subsequent designs.

8.3. Formal analysis

Having determined the properties of interest, we next demonstrate how these kinds of correctness and performance properties can be used to analyse user interaction, using the infusion pump specified in Section 6 as an illustration.

Remarks. In our analysis, we will focus on the volume setting activity. By this, we mean that the properties of interest will be checked for all target volume values in the range [1..max] (this is done by requiring only that the condition targetVolume > 0 holds at the initialisation of our pump user model). At the same time the target rate will be fixed, say targetRate = 1. We do so since both the volume and rate setting activities are similar in our pump (setting the rate is simpler in that it does not require changing the pump mode after switching the pump on and confirming the rate value). Thus, any findings about the issues related to volume setting are likely to be relevant to rate setting as well. Furthermore, if necessary, setting the rate can be analysed in the same way. We do not do so here as it would not illustrate any new point. Note that such 'decomposition' of the analysis makes model checking of the relevant properties considerably faster. Finally, to speed up model checking of our examples, we also set max to 300 (instead of 990).
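As an illustrative sketch only, this analysis set-up corresponds to initialising the user model's memory with a fixed rate and an arbitrary positive target volume, roughly as follows:

initialization
  mem.targetRate = 1;                          % rate fixed for this analysis
  mem.targetVolume in { v: Numbers | v > 0 }   % any positive target volume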

Correctness. We start by asking whether cognitive mismatch, as defined in Section 7.3, is possible in the user's interaction with the pump. For that, we verify the assertion MismatchOK with the parameter maxDegree = 0. Model checking shows that this property holds. This means that the modelled user beliefs about the volume and rate values are always consistent with the pump state.

Next we check functional correctness by verifying the assertion Correct. Again, model checking shows that the property holds. This indicates that, with this pump interface design, the user is always able to perform the task of setting the infusion correctly, assuming the user behaviour is consistent with the assumptions represented by our pump user model.

Performance. Now we check how efficient user performance on the task in question is. Let us assume that the upper time bound is T = 300 time units (tenths of a second). Note that we have chosen this value rather arbitrarily to explore what time may be needed to perform the task. With this value of T, model checking the assertion Performance produces a counter-example. Further verification with T = 325 generates a sequence of user actions which shows that the user model takes t = 329 time units to achieve the task goal for targetVolume = 300. Investigation of the trace shows that there are no unnecessary actions taken in that sequence. The reason for the rather slow performance is simply the number of chevron presses (30 presses of the double 'up' chevron) required to set 300 as the volume to be infused. On the other hand, model checking the same assertion for T = 330 shows that this upper bound is never exceeded for behaviours consistent with our model. Each model checking run to verify the Performance property takes a time within the range of 200–400 seconds. The verification times for the correctness and performance properties are discussed in more detail in Section 10.1.

The time of almost 33 seconds to start an infusion may be considered by designers of the system as being too long, especially since our analysis assumed targetRate = 1, which required little time to set. More realistic values of rate would only increase the overall time. In any case, what nurses actually believe to be reasonable times to do such tasks might be determined based on empirical studies.

To summarise, our analysis so far does not indicate any potential for systematic human error while setting and starting an infusion with the initial design of the pump. The pump user model does achieve the goal, while the design does not lead to problems of cognitive mismatch. On the negative side, the pump is slow to operate, since entering larger values requires more key presses. In the next section, we consider a modification to the design of the number entry system that aims to improve user performance.

9. Pump Redesign

This section describes a plausible redesign scenario for our pump. We start by attempting to address the performance issue with the pump design identified during our previous analysis. The initial pump design is first modified to make number entry faster, then analysed using the same set of properties. The correctness analysis of the modified design, as we will see, indicates that the user model is prone to cognitive mismatch and subsequent error when setting the infusion parameters. We therefore modify the pump design once more. By introducing an additional review step into the procedure of infusion setting, this third design facilitates the user model in recovering from a possible error in number entry. While our analysis of performance indicates that user interaction is somewhat slower, the new modification may be viewed as a reasonable trade-off between correctness and speed.

9.1. Faster number entry

First we consider a small modification to the number entry system to make incremental entry more efficient. The design feature we introduce here is actually used in real pumps.

The pump model. Recall that all numbers can be entered with a single digit after the decimal point in our initial pump design. Our modification introduces a threshold of 10 so that the numbers below and above it will be treated differently. The numbers less than 10 can be entered with a single digit after the decimal point. Other numbers can only be entered as integers. This threshold of when numbers switch to integers is specified as the following constant:

threshold: natural = 100

Note that the value 100 stands for the number 10 in the actual pump due to the scaling factor explained earlier.

Next we modify the operation of the chevron keys so that the changes they produce in the number entered are dependent on the current value of that number. Thus, pressing the single 'up' chevron when the current value is less than the threshold 10 would increase it by 0.1. However, if the current value is at or above the threshold, the increase will be by 1. For the double 'up' chevron, the corresponding increases will be by 1 and 10, respectively. Finally, the operation of the 'down' chevrons is modified in the same way.

Let us see how the relevant transitions (vtbi_up and vtbi_UP) are specified for the new style of number entry. A new specification for the vtbi_up transition is as follows:

vtbi_up:
  event = up and m.mode = vtbi and val.vdisp + deltaUp(val.vdisp) <= max -->
  val'.vdisp = val.vdisp + deltaUp(val.vdisp)

Here, the function deltaUp implements the rule (replacing the previous constant value) for determining the increase value for the single 'up' chevron, informally explained above. Its SAL definition is as follows:


deltaUp(value:Numbers): [1..10] = if value < threshold then 1 else 10 endif

The vtbi_UP transition is specified in a similar way. However, in this case, a situation is possible such that increasing the current value by DeltaUp(val.vdisp) would produce a new value that is higher than the threshold even when the initial value was lower than the threshold. For example, pressing the double 'up' chevron when the current value is 9.6 should normally produce 10.6 as the result. However, the pump handles only whole numbers above the threshold. Thus, the actual result is 10, the threshold. Formally, we specify this transition as follows:

vtbi_UP:
  event = UP and m.mode = vtbi and val.vdisp + DeltaUp(val.vdisp) <= max -->
  val'.vdisp = let new:Numbers = val.vdisp + DeltaUp(val.vdisp) in
                 if val.vdisp < threshold and threshold < new then threshold else new endif

The definition of DeltaUp is similar to that of deltaUp. It implements the increase rule for the double 'up' chevron:

DeltaUp(value:Numbers): [10..100] = if value < threshold then 10 else 100 endif

The pump user model. Since our pump modifications do not introduce new functionality, the same pump user model from Section 7 can be used in our formal analysis. Note that the delta functions (deltaUp, DeltaUp, etc.) are used in the user model, which is therefore 'aware' of the delta values. In other words, we still model the user who has correct knowledge of the pump operation.

Formal analysis. We again start our analysis by checking whether cognitive mismatch is possible in user interaction with the modified pump. This time, model checking the assertion MismatchOK with the parameter maxDegree = 0 fails. A counter-example is generated that indicates the following scenario. The user repeatedly presses the double 'up' chevron to increase the volume value. In the state val.vtbiDisp = 100, the user belief about that value (mem.last = 100) is still correct. The user presses the same chevron again (UpVolumeAction). The procedural cueing induced by the repetition of the same action triggers the user belief mem.last = 110 (an increase by the same old delta value). Now, however, the pump increases the volume value by 100, so that val.vtbiDisp = 200, since the threshold has been reached. This matches the common real situation where a person gets into a particular habitual mindset as a result of repetition of an action.

The cognitive mismatch in the above scenario suggests that erroneous behaviours are possible when setting the volume to 110. Indeed, initialising mem.targetVolume to 110 and model checking the assertion Correct with maxDegree = 1, indicating that cognitive mismatches of degree 1 are allowed, confirms our intuition by producing the relevant counter-example. Note that, in reality, this would correspond to a rather serious drug overdose by a factor of almost 2.

Using our approach, this issue can be analysed further. For example, let us assume that the cognitive mismatch does not continue longer than a single number entry action (maxDegree = 1). Then model checking the Correct property for the initialisation mem.targetVolume /= 110 shows that the property holds for all other values of the target volume. This suggests that some additional strong cues are needed when the threshold value is reached in this style of number entry.

What is the impact of our changes on user performance in interaction with the modified pump? We start with a direct comparison to the initial design. Recall that there were no cognitive mismatches arising in our model in that case. Thus, we first check user performance with the modified design assuming there are no cognitive mismatches (maxDegree = 0). It turns out that performance has improved: model checking Performance for different T values shows that it holds for the time bound T = 262. In fact, 262 is the time which our user model might take to achieve the task for the target volume 240.

Next, one might want to analyse user performance in the presence of cognitive mismatches, since they are possible with the new style of number entry. For example, let us assume cognitive mismatches of degree 2 or lower (maxDegree = 2). In this case, it turns out that even the higher time bound T = 300 might not be sufficient. Model checking generates a trace such that T = 300 is exceeded for mem.targetVolume = 190. This corresponds to a scenario where, in a state val.vtbiDisp = 100, the user overshoots 190 by pressing the double 'up' chevron twice due to the procedural cueing, assuming at the critical point that the increase value is only 10 instead of the actual increase of 100.

To summarise, what does our analysis tell us about the modified design? It suggests that the new design is more efficient. In the absence of cognitive mismatches, the upper time bound decreases from 32.9 seconds to 26.2 seconds in the new design. In fact, model checking Performance confirms that T = 304 is an upper time bound even for the behaviours involving cognitive mismatches of degree up to 2 (maxDegree = 2). The upper time bound (T = 329) for the initial design, on the other hand, exceeds this time bound even though no cognitive mismatches arise. However, the possibility of cognitive mismatches with the new design leads to erroneous user behaviours that were not possible with the original pump. Next, we will consider a further modification to the pump that attempts to solve the error issue outlined in this section.

9.2. Review of parameters

The basic idea for the next modification is to facilitate user recovery from an error by prompting the user to review the entered values and correct them if needed. This is achieved by having the pump, in response to the user pressing the 'infuse' key, display the infusion parameters with options to confirm them or cancel. If the parameters are confirmed, the pump starts the infusion. Otherwise, it returns to the basic mode of operation. As with the previous modification, this idea, with similar confirmation steps, is used in the designs of real pumps.

The pump model. First we outline how the review step is reflected in our pump model. Since it can be viewed as an additional mode in the pump behaviour, we redefine our two previously introduced types related to the pump modes by adding an extra element to each (review and reviewMsg, respectively):

BasicMode: type = { off, rate, vtbi, clear, review, infusing }
ModeMessage: type = { noMsg, vtbiMsg, clearMsg, reviewMsg, infusingMsg, onholdMsg }

Here, review is a new mode where the pump waits for the outcome of user review, while reviewMsg is our representation of the message shown at the top of the pump display while it is in that mode.

Most of the transitions specified in the pump model of Section 9.1 remain unchanged. Below we briefly discuss those that require modifications. For example, the run transition now changes the pump mode to review instead of infusing:

run: event = Run and m.mode = basic and not(val.volume) --> m'.mode = review

This mode transition can only occur when the pump is in the basic mode and the whole volume has not already been infused (condition not(val.volume)).

When the pump is waiting for the outcome of user review, two new transitions are possible. The first one, denoted review_ok, represents the pump response to the user pressing the confirmation key (the first option key):

review_ok:
  event = Opt1 and m.mode = review -->
  m'.mode = if val.vtbi > 0 then infusing else basic endif

If the volume has been set to anything greater than zero, the pump starts infusing. Otherwise, the pump returns to the basic mode of operation. The second transition, review_cancel, models the pump response to a negative outcome of the review, namely the user pressing the cancellation key (the third option key). The pump mode is changed to basic in this case:

review_cancel: event = Opt3 and m.mode = review --> m'.mode = basic

Finally, the pump display disp presented to the user in the review mode is defined by the following record:

offDisp with .modeMsg := reviewMsg with .modeMsg := vtbiMsg with .vtbiDisp := true
        with .option1 := okOpt with .option3 := quitOpt

Thus, in this mode, the pump interface displays the entered values for the infusion parameters and a message asking the user to review them. The okOpt label indicates that the first option key can be used to confirm the displayed values and start the infusion, while quitOpt indicates that pressing the third option key will provide the user with an opportunity to change them.


The pump user model. Some changes are needed to the instantiation of the GUM. First of all, we redefine the Memory type by adding an extra element, recover, to this record:

Memory: type =
  [# targetRate: Numbers, targetVolume: Numbers, rateSet: bool, volumeSet: bool,
     last: MNumbers, delta: [1..100], over: bool, under: bool, recover: bool #]

Intuitively, if true, the new recover field represents the user belief that an error has occurred while entering the infusion parameters and that some kind of recovery is necessary. This is discussed in more detail below.

Modifications to our pump user model developed in Section 7 mostly concern the two user actions relevant to the review step. Thus, we redefine the type ActionRange by adding to it two new elements: confirmReviewAction and cancelReviewAction. Our cueing assumptions for the latter are specified as follows. First, we assume that there is no procedural cueing for the cancellation action:

proc := noCue

Next, its sensory cueing is assumed to be weak, since neither the cancellation key nor the confirmation key is more visually prominent than the other:

sens := weakCue

The cancellation action is specific if either rate or volume perceived by the user does not match the target:

spec := lambda(s:Status): lambda(inp:Inp,mem:Memory,env:Env):
          inp.rateDisp and inp.vtbiDisp and
          (inp.rate /= mem.targetRate or inp.vtbi /= mem.targetVolume)

Finally, this action is only possible if there is an option key with a label represented by quitOpt in our model:

grd := lambda(inp:Inp,mem:Memory,env:Env): (exists(j:[1..3]): inp.option[j] = quitOpt)

In terms of state transitions, cancelReviewAction yields a press of the option key labelled by quitOpt:

tout := lambda(inp:Inp,out0:Out,mem:Memory): lambda(out1:Out):
          exists(j:[1..3]): inp.option[j] = quitOpt and out1 = optAction(j)

More importantly, the user beliefs about the parameter settings are updated as well. Thus, by cancelling the current settings, the user assumes that recovery is needed (mem.recover is set to true). Furthermore, the beliefs about the rate and volume settings (mem.rateSet and mem.volumeSet) are updated by comparing their perceived values to the corresponding target values (e.g., inp.rate = mem0.targetRate). Formally:

tmem := lambda(inp:Inp,mem0:Memory,out:Out): lambda (mem1:Memory):
          mem1 = mem0 with .recover := true
                      with .rateSet := (inp.rate = mem0.targetRate)
                      with .volumeSet := (inp.vtbi = mem0.targetVolume)
                      with .last := -1 with .over := false with .under := false

The specification for confirmReviewAction is defined in a similar way. It should be noted, however, that the weak procedural cueing assumed for that action is more debatable than in the case of the cancellation action. It is possible that a user will press the confirmation key out of habit without actually checking the values of the infusion parameters. This problem could be addressed by a more advanced confirmation procedure. For example, instead of the simple key press, the user could be asked to enter the infusion time. The latter would be interpreted as a confirmation only if it is consistent with the time calculated from the volume and rate values by the pump. We do not explore this issue formally here, however.

Formal analysis. As might be expected, adding a review step does not eliminate the cognitive mismatch detected for the previous design in Section 9.1. This is confirmed when the verification of the assertion MismatchOK (maxDegree = 0) fails for the modified design, producing the same counter-example as before. Furthermore, model checking the same assertion for maxDegree = 5 shows how the user could reach a cognitive mismatch of degree 5.

On the other hand, the review step, with our assumption that the user takes it 'seriously', makes an impact on the functional correctness property. For example, model checking with maxDegree = 5 proves that the assertion Correct holds for all target volumes with the new design. Thus, the specified cognitive mismatches of degrees up to 5 do not prevent the pump user model from achieving the task goal. The model is able to recover from mistakes due to the review step and carry on. Of course, this does not formally prove that the same assertion would hold for higher degrees of mismatch (though this could be done, since model checking Correct with, for example, maxDegree = 2 and maxDegree = 5 takes a similar amount of time). However, there is a strong reason to expect that it would, since the review step does not really depend on cognitive mismatches earlier in the task. Obviously, these conclusions could only be applied to real users assuming that (i) the target values are correctly perceived, and (ii) the user is able to determine correctly whether the values entered are the same as the target values before confirming them in the review step.

Let us now see what impact an additional review step makes on user performance. First, we consider scenarios without cognitive mismatch. Recall that, for the previous design, the upper time bound was T = 262. Obviously, an extra step requires some time, so now we verify the assertion Performance for T = 285. Model checking proves this. One can conclude that the price to pay for introducing an extra step is reasonably small.

Our next verification result is perhaps more surprising. Model checking reveals that, for the new design, the upper time bound (T = 323) for the scenarios with cognitive mismatches up to degree 2 is comparable to that (T = 329) of the initial 'slow' design. Presumably, this is because different scenarios require the maximal execution time for the two designs. Indeed, it turns out that, for the latest design, setting the volume to 190 takes the maximal time (323 units). In this scenario, the user, in a state such that val.vtbi = 100, presses the double 'up' chevron twice due to the cognitive mismatch involving the increase value delta. Due to procedural cueing, the user then confirms the current value 300, wrongly believing it is the target volume 190. This is detected during the review step, but the recovery from the error takes some time. Note, however, that the maximal value for the target volume was 300 in this analysis. The new design would obviously be more efficient than the initial 'slow' design for target values higher than 500.

10. Conclusions

For safety-critical systems, both the potential for user errors being made and the time needed for a user to complete tasks are important areas of concern. In this paper, we have presented a formal verification framework that addresses both concerns in an integrated way, based on a generic model of cognitively plausible behaviour. Using the same set of formal specifications and tools, our framework makes it possible to explore user-error related design issues and to predict time bounds for task completion. We have also incorporated a cognitive mismatch model into our framework. This provides a way to analyse the effect and degree of cognitive mismatch on both correctness and performance of user interactions.

10.1. Results

To illustrate modelling and analysis using our approach, we have presented a case study based on a plausible design scenario for a safety-critical medical device – an infusion pump. We have applied both kinds of analysis to an initial design of the pump and then demonstrated how the identified timing and human error issues can guide further redesigns of the pump so that a reasonable trade-off between speed and correctness is achieved. This case study has also shown how timing in the presence of error, and user recovery from that error, can be investigated in our framework.

Based on this analysis, it can be concluded that the task of infusion programming is performed both successfully and more efficiently (compared to the initial design) on the second redesign of the pump. The conclusion about successful performance, however, rests on the following two assumptions: (i) the target values are correctly perceived, and (ii) the review step for checking the infusion parameters is taken seriously. In other words, we assume that the user is able to determine correctly whether the values entered are the same as the target values before confirming them. Though this redesign is slightly less efficient than the first one, it arguably achieves a reasonable trade-off between speed and correctness.

6 Though this could be done since model checking Correct with, for example, maxDegree = 2 and maxDegree = 5 takes a similar amount of time.


Verification times. The verification described here was performed using the symbolic SAL model checker (sal-smc) on a MacBook Pro laptop with a 2.66 GHz dual-core Intel Core i7 processor and 4 GB of RAM running Mac OS X 10.6.8.

It should be noted that the running times were considerably affected by the variable ordering used by the model checker. In some cases, a better-suited variable ordering resulted in verification times at least several times shorter. On the positive side, we have found that a more efficient variable ordering can be discovered using sal-smc by incrementally increasing the state space of the model. This variable ordering can later be used for the actual analysis of the correctness and performance properties.

Generally, the verification time depended on both the model complexity and the property checked. Thus, it increased with each of the redesign steps. At the same time, the verification of the correctness property (Correct) required a substantially longer time than that for the performance property (Performance). More specifically, the verification time range for the correctness property was 1200–1700 seconds, while the time range for the performance property was 220–910 seconds.

10.2. Discussion

The approach described here does not aim at developing a definitive representation (cognitive architecture) of cognitive human behaviour. Rather, it provides a formal framework that allows one to take a set of reasonable (not definitive) cognitive assumptions and then explore their consequences. The GUM described here encapsulates a particular set of assumptions about what is plausible behaviour. These have been based on concepts from the literature together with experimental results. In particular, the set of salience related assumptions used here was formed in parallel with, and justified by, empirical studies [RBCB09]. However, we do not see these as a final set of assumptions, and ongoing research is needed to further validate and extend them. Indeed, the addition of cognitive mismatches incorporated into this work is one such way our assumptions have been extended, allowing the consequences of a new class of behaviour to be analysed. The GUM framework could also be used to explore the consequences of alternative sets of assumptions.

The GUM and salience related assumptions were all formed for earlier case studies and did not need explicit modification for this new case study. We did add the model of cognitive mismatches here. This is, however, a general approach that we would expect to be applicable in other realistic case studies. Our aim in this whole body of work has been to create a generic approach where only specific details of devices and tasks need to be introduced for each new study, removing the need for craft skill in constructing models where possible.

There is clearly still craft skill involved in choosing an appropriate level of abstraction, deciding what is modelled, and instantiating the models (for example, in deciding how salient individual interface elements are). Note, however, that an analyst would only need to provide discrete values for the cue strength – the numeric salience levels and values are generated in the model. That means the analyst has only to provide an indication of whether a cue is strong, weak or not there at all. The latter is clear (e.g., an alarm isn’t sounding at all at that point), so the judgement is a matter of deciding if a cue is strong or not, and ultimately this is relative. Moreover, some cues may not clearly be very strong or very weak. In the case of procedural cueing, a starting point for the analyst would be that someone fully trained in a procedure would have strong procedural cueing for that sequence, whereas a novice would not. However, in those cases where it was a judgement call, at least the assumptions made for the analysis are clear. Furthermore, a sensitivity analysis could be performed, thereby exploring the consequences of different decisions as part of the analysis. For example, this approach might be used to look at the consequences of a novice (e.g., a locum nurse), with only weak procedural cueing for a sequence, attempting the task: does this lead to problems or is the interaction design robust enough?

The main goal of our approach is to predict erroneous behaviours, not cognitively plausible sequences as such. Consequently, our models may generate a wide range of behaviours, both those that people actually and frequently exhibit and those observed less frequently. For the kind of application (medical device use error) we are interested in, very rare but still plausible behaviour that leads to incidents is of importance. Some of these behaviours may not be so easily observed within the confines of an experiment or in real life.

This paper aims to demonstrate, as a proof of concept, the possibility of combining correctness verification and timing analysis within a single modelling framework. We have not made an attempt to come up with reliable timing values for specific actions; therefore the timing predictions in the paper may not be accurate. However, we have shown that, given appropriate timing data, our approach can utilise it. There is a whole body of work (e.g., around GOMS) on determining accurate timings. Our aim is to show in principle that our approach can build on that work.

As with CCT models [KP99] and unlike pure GOMS, we use an explicit device specification that could have its own timings for each machine response. It is likely that most are essentially instantaneous (below the millisecond timing level) and so approximated to zero time. However, where device response operators are used in GOMS, the corresponding times can be assigned to the device specification in our approach. We have already demonstrated the capability of our approach to detect potential user errors resulting from device delays or indirect interface changes without any sort of feedback [CB01]. Though this aspect has not been investigated here, the presented framework opens a way to deal with real-time issues and detect some classes of timing-related usability errors (e.g., when system time-outs are too short, or system delays are too long), whilst still doing time-free error analysis in parallel based on the verification of correctness properties.

Though the GUM was extended with GOMS-style timing information, the key difference from GOMS is that our model is inherently non-deterministic and covers a wide range of plausible behaviours. Consequently, performance analysis in our approach deals with a set of execution times, each reflecting a different possible behaviour. In a sense, this corresponds to a series of GOMS analyses using different procedural/selection rules, but performed within a single automated analysis in our approach.

10.3. Further work

Our main focus here has been to demonstrate our approach to the analysis of correctness and performance issues relevant to infusion pumps. The precision and reliability of the execution times generated by our analysis has been of secondary importance in this paper. This aspect will become more prominent in our future work, where we intend to focus more on the comparison and evaluation of different models of infusion pumps from both performance and correctness perspectives.

This paper describes only our first case study scaling the approach from simple toy examples to a realistic example. Further work is needed to explore the full applicability of the approach to other kinds of incidents. We intend to look at a wider range of real incidents in future work.

The models described in this paper can be viewed as an underlying framework for exploring correctness and performance issues and the trade-offs between them. Currently, they require a certain familiarity with formal methods. Ultimately, however, an analysis such as ours would be part of a UI design process, used and interpreted with human factors or domain expertise. The most obvious necessity is to develop a simpler language for instantiating the GUM, thus making our framework more intuitive and accessible to UI designers and HCI experts. Further layers for higher-level modelling can be built on top of this language. In this respect, a parallel can be drawn with the ACT-R cognitive architecture [AL98], which is used as the underlying computational engine by higher-level cognitive modelling frameworks such as ACT-Simple [SL03]. The latter supports a GOMS-like description language which is then compiled down to ACT-R, where performance analysis is actually carried out. On top of this, CogTool [JPSK04] provides an additional layer which automatically generates an ACT-Simple specification from a demonstration of how the task analysed is performed on a mock-up of the interface under investigation. A longer-term objective of our work is that the framework described would ultimately provide the ‘engine’ for a tool with which domain specialists, human factors specialists and developers can underpin less formal dialogues that have the effect of improving the veracity of the evidence.

Acknowledgements

We are grateful to Harold Thimbleby for providing the original specification of the pump in Mathematica. Michael Harrison provided helpful comments about the manuscript. Anonymous reviewers also helped to improve the paper. This work has been funded by the EPSRC research grant EP/G059063/1: CHI+MED (Computer–Human Interaction for Medical Devices).

References

[AL98] John Robert Anderson and Christian Lebiere. The Atomic Components of Thought. Lawrence Erlbaum Associates, 1998.

[AT02] Erik M. Altmann and J. Gregory Trafton. Memory for goals: an activation-based model. Cognitive Science, 26(1):39–83, 2002.

[Bar58] Frederic Bartlett. Thinking: An Experimental and Social Study. Basic Books, New York, 1958.

[BB97] Michael D. Byrne and Susan Bovair. A working memory model of a common procedural error. Cognitive Science, 21(1):31–61, 1997.

[BB06] Bernhard Beckert and Gerd Beuster. A method for formalizing, analyzing, and verifying secure user interfaces. In Zhiming Liu and Jifeng He, editors, Formal Methods and Software Engineering, volume 4260 of Lecture Notes in Computer Science, pages 55–73. Springer Berlin / Heidelberg, 2006.

[BB10] Matthew L. Bolton and Ellen J. Bass. Formally verifying human-automation interaction as part of a system model: limitations and tradeoffs. Innovations in Systems and Software Engineering, 6:219–231, 2010.

[BBS12] Matthew L. Bolton, Ellen J. Bass, and Radu I. Siminiceanu. Generating phenotypical erroneous human behavior to evaluate human–automation interaction using model checking. International Journal of Human-Computer Studies, 70(11):888–906, 2012.

[BBD00] Richard J. Butterworth, Ann E. Blandford, and David J. Duke. Demonstrating the cognitive plausibility of interactive systems. Formal Aspects of Computing, 12:237–259, 2000.

[BF99] Howard Bowman and Giorgio Faconti. Analysing cognitive behaviour using LOTOS and Mexitl. Formal Aspects of Computing, 11:132–159, 1999.

[BM95] Philip J. Barnard and Jon May. Interactions with advanced graphical interfaces and the deployment of latent human knowledge. In Interactive Systems: Design, Specification, and Verification (DSV-IS’95), pages 15–49. Springer-Verlag, 1995.

[CB01] Paul Curzon and Ann E. Blandford. Detecting multiple classes of user errors. In Reed Little and Laurence Nigay, editors, Proceedings of the 8th IFIP Working Conference on Engineering for Human-Computer Interaction (EHCI’01), volume 2254 of Lecture Notes in Computer Science, pages 57–71. Springer-Verlag, 2001.

[CB08] Phillip H. Chung and Michael D. Byrne. Cue effectiveness in mitigating postcompletion errors in a routine procedural task. International Journal of Human-Computer Studies, 66(4):217–232, 2008.

[CH11] Jose C. Campos and Michael D. Harrison. Modelling and analysing the interactive behaviour of an infusion pump. In Proceedings of the Fourth International Workshop on Formal Methods for Interactive Systems: FMIS 2011, volume 45 of Electronic Communications of the EASST, 2011.

[CMN80] Stuart K. Card, Thomas P. Moran, and Allen Newell. The keystroke-level model for user performance time with interactive systems. Commun. ACM, 23:396–410, July 1980.

[CMN83] Stuart K. Card, Thomas P. Moran, and Allen Newell. The psychology of human-computer interaction. Lawrence Erlbaum Associates, 1983.

[CRB07] Paul Curzon, Rimvydas Ruksenas, and Ann Blandford. An approach to formal verification of human-computer interaction. Formal Aspects of Computing, 19:513–550, 2007.

[dMOR+04] Leonardo de Moura, Sam Owre, Harald Ruess, John Rushby, N. Shankar, Maria Sorea, and Ashish Tiwari. SAL 2. In Rajeev Alur and Doron A. Peled, editors, Computer Aided Verification: CAV 2004, volume 3114 of Lecture Notes in Computer Science, pages 496–500. Springer-Verlag, July 2004.

[Fie01] Robert E. Fields. Analysis of erroneous actions in the design of critical systems. Technical Report YCST 20001/09, University of York, Department of Computer Science, 2001. D.Phil Thesis.

[FWH96] Bob Fields, Peter Wright, and Michael Harrison. Time, tasks and errors. SIGCHI Bull., 28:53–56, April 1996.

[HJKB99] Scott E. Hudson, Bonnie E. John, Keith Knudsen, and Michael D. Byrne. A tool for creating predictive performance models from user interface demonstrations. In UIST ’99: Proceedings of the 12th annual ACM symposium on User interface software and technology, pages 93–102, New York, NY, USA, 1999. ACM Press.

[Hol93a] Erik Hollnagel. Human Reliability Analysis: Context and Control. Academic Press, London, 1993.

[Hol93b] Erik Hollnagel. The phenotype of erroneous actions. International Journal of Man-Machine Studies, 39(1):1–32, 1993.

[HRA+11] Huayi Huang, Rimvydas Ruksenas, Maartje G. A. Ament, Paul Curzon, Anna L. Cox, Ann Blandford, and Duncan Brumby. Capturing the distinction between task and device errors in a formal model of user behaviour. In Proceedings of the Fourth International Workshop on Formal Methods for Interactive Systems: FMIS 2011, volume 45 of Electronic Communications of the EASST, 2011.

[JK96a] B. E. John and D. E. Kieras. The GOMS family of user interface analysis techniques: Comparison and contrast. ACM Trans. Comput.-Hum. Interact., 3(4):320–351, 1996.

[JK96b] Bonnie E. John and David E. Kieras. Using GOMS for user interface design and evaluation: which technique? ACM Trans. Comput.-Hum. Interact., 3:287–319, 1996.

[JPSK04] Bonnie E. John, Konstantine Prevas, Dario D. Salvucci, and Ken Koedinger. Predictive human performance modeling made easy. In Proceedings of the SIGCHI conference on Human factors in computing systems, CHI ’04, pages 455–462, New York, NY, USA, 2004. ACM.

[KAS+11] BaekGyu Kim, Anaheed Ayoub, Oleg Sokolsky, Insup Lee, Paul Jones, Yi Zhang, and Raoul Jetley. Safety-assured development of the GPCA infusion pump software. In Proceedings of the ninth ACM international conference on Embedded software, EMSOFT ’11, pages 155–164, New York, NY, USA, 2011. ACM.

[KP99] David Kieras and Peter G. Polson. An approach to the formal analysis of user complexity. International Journal of Human-Computer Studies, 51(2):405–434, 1999.

[KWM97] David E. Kieras, Scott D. Wood, and David E. Meyer. Predictive engineering models based on the EPIC architecture for a multimodal high-performance human-computer interaction task. ACM Trans. on Computer-Human Interaction, 4(3):230–275, 1997.

[LPNB02] Xavier Lacaze, Philippe Palanque, David Navarre, and Remi Bastide. Performance evaluation as a tool for quantitative assessment of complexity of interactive systems. In Peter Forbrig, Quentin Limbourg, Jean Vanderdonckt, and Bodo Urban, editors, Interactive Systems: Design, Specification, and Verification, volume 2545 of Lecture Notes in Computer Science, pages 208–222. Springer Berlin / Heidelberg, 2002.

[New90] Allen Newell. Unified Theories of Cognition. Harvard University Press, 1990.

[OKM86] Allen Osman, Sylvan Kornblum, and David E. Meyer. The point of no return in choice reaction time: controlled and ballistic stages of response preparation. Journal of Experimental Psychology: Human Perception and Performance, 12(3):243–258, 1986.

[Ras83] Jens Rasmussen. Skills, rules, and knowledge; signals, signs, and symbols, and other distinctions in human performance models. IEEE Transactions on Systems, Man and Cybernetics, SMC-13(3):257–266, 1983.

[RBCB09] Rimvydas Ruksenas, Jonathan Back, Paul Curzon, and Ann Blandford. Verification-guided modelling of salience and cognitive load. Formal Aspects of Computing, 21:541–569, 2009.

[RCBB07] Rimvydas Ruksenas, Paul Curzon, Jonathan Back, and Ann Blandford. Formal modelling of cognitive interpretation. In Gavin Doherty and Ann Blandford, editors, Interactive Systems. Design, Specification, and Verification, volume 4323 of Lecture Notes in Computer Science, pages 123–136. Springer Berlin / Heidelberg, 2007.

[Rus01] John Rushby. Analyzing cockpit interfaces using formal methods. Electronic Notes in Theoretical Computer Science, 43, 2001.

[SHL11] Sriram Sankaranarayanan, Hadjar Homaei, and Clayton Lewis. Model-based dependability analysis of programmable drug infusion pumps. In Uli Fahrenberg and Stavros Tripakis, editors, Formal Modeling and Analysis of Timed Systems, volume 6919 of Lecture Notes in Computer Science, pages 317–334. Springer Berlin / Heidelberg, 2011.

[SL03] Dario D. Salvucci and Frank J. Lee. Simple cognitive modeling in a complex cognitive architecture. In Proceedings of the SIGCHI conference on Human factors in computing systems, CHI ’03, pages 265–272, New York, NY, USA, 2003. ACM.

[Thi02] Harold Thimbleby. Analysis and simulation of user interfaces. In S. McDonald, Y. Waern, and G. Cockton, editors, Human Computer Interaction 2000, volume XIV of BCS Conference on Human-Computer Interaction, pages 221–237. Springer-Verlag, 2002.

Appendix

A. A model of salience related assumptions

In this appendix, we describe our model of the salience related assumptions introduced in Section 3.3. The model has been discussed in more detail in an earlier paper [RBCB09].

A.1. Cue strength and action salience

In this set of salience assumptions, we postulate three possible values for cue strength, modelled as the following SAL type Cueing:

{ StrongCue, WeakCue, NoCue }

On the other hand, we assume here four possible values for salience levels, represented by the following type Salience:

{ HighSLC, ReducedSLC, LowSLC, NoSLC }

Note that our choices here for possible values in both types were guided by an attempt to keep models reasonably simple to facilitate their automatic analysis. In principle, however, other alternatives with a higher number of cueing and salience values can be explored in different sets of salience assumptions.

We specify the inhibition effect as the following function CueSalience:7

CueSalience(cued:boolean, strong:boolean, specific:boolean, goal:boolean): Salience =
  if not(cued) or not(specific) then NoSLC
  elsif strong then
    if not(goal) then HighSLC
    else ReducedSLC endif
  else LowSLC endif

7 Our case study analysis will not use the cognitive load factor. This means that the corresponding load parameter is set to ‘low’ in our model of salience related assumptions. For purposes of exposition, in the rest of this section we present a pre-processed version of our actual specifications whereby all conditions are simplified taking into account that cognitive load is ‘low’.


Here, the boolean cued denotes whether or not the action considered is cued at all (no matter what the cue strength is), strong denotes whether that cue is strong, specific states whether that cue is specific in the current state, and goal states whether the goal of the activity considered has already been achieved. Function CueSalience is used to define each form of salience by providing different arguments. The interpretation of cued may depend on the kind of salience in question. We will say more on this below when considering particular kinds of salience.

Procedural salience. After a person trains on a task some actions become strongly chunked together in a way that proceduralizes them almost as though they were a single action. In this situation, we assume that cued is true when procedural cueing is StrongCue or WeakCue. A procedural cue is considered as strong when the habit to take actions in sequence is primarily cognitive. On the other hand, if such a habit has a more mechanical nature, like repeatedly pressing the same key approximately the number of times required to achieve some goal, we assume that the strength of such a procedural cue is WeakCue. Thus, the procedural salience, procSalience, of action a in the user model state (inp,mem,env) is calculated as follows:

procSalience: Salience =
  let cued: boolean =
        s.size > 0 and cues[s.last].proc(a)(inp,mem,env) /= NoCue,
      strong: boolean =
        s.size > 0 and cues[s.last].proc(a)(inp,mem,env) = StrongCue
  in
    CueSalience(cued, strong, true, false)

Here, s.size > 0 indicates that at least one action has been taken by the user model, while s.last refers to that action. Since procedural cues are linked to habits, we assume that neither the specificity of an action nor the goal of the relevant activity affects them. Hence, specific is true, while goal is false in the calculation above.

Sensory salience. Sensory cues can use any of the senses, though with current technology they are most likely to be visual or auditory cues – flashing lights and messages on screen versus beeps and spoken warnings. The position of a visual cue on the screen will affect its raw sensory cueing strength. If it is not in a position that the person’s attention will naturally be directed to, then it is likely to have a lower cue strength. These sensory cues lead to an action having sensory salience. We assume that action a is sensorily cued only when the raw strength of its sensory cues is StrongCue. Sensory salience can be inhibited by non-specificity of the cue or when the goal of the current (or possible next) activity has been achieved. Hence, the sensory salience, sensSalience, of action a within the context of activity n is calculated as follows:

sensSalience: Salience =
  let strong: boolean = cues[a].sens(inp,mem,env) = StrongCue,
      specific: boolean = cues[a].spec(s)(inp,mem,env),
      goal: boolean = activities[n].goal(inp,mem,env)
  in
    CueSalience(strong, strong, specific, goal)

Cognitive salience. Some actions are relevant to a particular activity being undertaken, while others are nothing to do with it. The former are likely to ‘spring to mind’ when doing other actions whilst the latter are not. We refer to the salience of such actions that spring to mind as cognitive salience.

The strength of cognitive cues is specified with respect to activities during the instantiation of the GUM to a concrete interactive system. If an action is nothing to do with the activity considered, it will be modelled as having no cognitive cueing at all. However, of those actions that are linked to the activity, some will more readily spring to mind than others. For actions that are seen as a central part of a task, irrespective of the device used to perform the task (‘task actions’), the raw strength of cognitive cueing will be StrongCue. On the other hand, the cognitive cueing of device specific actions, that are specific only to the device in question, is WeakCue, even when they are relevant to the activity as a whole (see [HRA+11] for a more detailed discussion of this and its link to experimental work). Hence, cued is true when cognitive cueing is StrongCue or WeakCue. As with sensory cueing, cognitive salience can be inhibited by non-specificity of the cue or when the goal of the current (or possible next) activity has been achieved. Thus, the cognitive salience, cogSalience, of a within the context of activity n is calculated as follows:

cogSalience: Salience =
  let cued: boolean = activities[n].cog(a)(inp,mem,env) /= NoCue,
      strong: boolean = activities[n].cog(a)(inp,mem,env) = StrongCue,
      specific: boolean = cues[a].spec(s)(inp,mem,env),
      goal: boolean = activities[n].goal(inp,mem,env)
  in
    CueSalience(cued, strong, specific, goal)

A.2. Overall salience

Overall salience, as used in the guarded rules of Section 4, is determined from the strength of the three separate kinds of salience considered above. Essentially, we distinguish four levels of ‘overall salience’.8

High salience (level 3). Procedural salience has the strongest effect on overall salience. If it is high then the overall salience will be high. Even if procedural salience is reduced due to inhibitors, overall salience can still be boosted to high if one of the other saliences (sensory or cognitive) is high. Without procedural salience the others cannot give high salience:

highSalience: boolean =
  procSalience = HighSLC
  or
  procSalience = ReducedSLC and (sensSalience = HighSLC or cogSalience = HighSLC)

Medium salience (level 2). If an action does not have high overall salience it may have medium overall salience. This will be so if procedural salience is reduced (in which case neither of the other kinds of salience is high, since otherwise the action would have high overall salience). Alternatively, the overall salience will be medium if procedural salience is low. If no procedural salience is associated with an action, then it will still have medium overall salience if at least one of sensory or cognitive salience is high. This rule reflects the idea that sensory and cognitive salience do not have as strong an effect as procedural salience:9

mediumSalience: boolean =
  procSalience = ReducedSLC or procSalience = LowSLC
  or
  sensSalience = HighSLC or cogSalience = HighSLC

Low salience (level 1). According to the model an action can still have some minimal salience, sufficient to attract attention, even if not of high or medium salience. This will occur only if there is no procedural salience, as otherwise overall salience will have a higher level according to the above rules. In such a case, overall salience will be low only when sensory salience is reduced, cognitive salience is reduced, or cognitive salience is low (indicating a ‘device specific action’):

lowSalience: boolean =
  sensSalience = ReducedSLC or cogSalience = ReducedSLC
  or
  cogSalience = LowSLC and n = s.active

Note that according to this rule (condition n = s.active) device specific actions cannot trigger a new activity, since such actions will satisfy the rule only if they are linked to the current activity (s.active).

8 These should not be confused with the four possible values of salience represented by the type Salience, which is used with respect to the particular kinds of raw salience: procedural, sensory and cognitive salience.
9 This definition, as with the similar one for low salience below, does not exclude by itself some cases where an action has high overall salience. Note, however, that a test for medium salience is taken in the model only if the corresponding test for high salience failed.


No salience (level 0). An action will have no overall salience, and so not be a candidate for being taken at all, if all of the following hold: it is linked to an activity other than the current one, there is no procedural salience, and both its sensory and cognitive salience are low. We do not provide an explicit definition for this salience level. Actions are assigned to it implicitly if they do not satisfy any of the above conditions for higher levels of salience.
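
For illustration, the four levels could be collected into a single value by applying the above tests in decreasing order of priority. The definition below is a sketch only: the name overallSalience and its packaging as a number are assumptions of this illustration, not part of the actual GUM specification, which simply applies the tests in turn.

% Illustrative sketch only: the GUM applies the tests in this order rather
% than computing an explicit numeric level.
overallSalience: [0..3] =
  if highSalience then 3
  elsif mediumSalience then 2
  elsif lowSalience then 1
  else 0 endif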

B. The device model: initial pump design

In this appendix, we present our models of pump behaviour and interface for the initial pump design.

B.1. Interactive pump behaviour

We start from the model of the interactive pump behaviour.

State space. Two main elements in the underlying state space are the records val of type Values and m of type Modes. The type Values has four components. The current values of the rate of infusion and the volume to be infused are represented by rate and vtbi, respectively. The third component, vdisp, stands for the displayed value of the volume to be infused before it has been confirmed by the user (see below). Note that there is no such distinction for the rate parameter, since it does not require confirmation. Since we focus on the user behaviour in setting up an infusion, the infusion process itself is modelled in an abstract way. Thus, the fourth component, the boolean volume, set non-deterministically, simply indicates whether the whole volume has already been infused or not.

The type Modes represents information about the modes of pump operation. It has three components. The basic modes of operation are represented by the component mode, which can take five possible values: off, basic, vtbi, clear and infusing. These are represented by the type BasicMode. When switched off, the pump’s mode is off. After switching on, its mode changes to basic mode, where the rate of infusion can be set. From the basic mode, the pump can move into the vtbi mode (setting the volume to be infused), clear (checking and/or clearing val.volume – the volume that has already been infused) and infusing. Furthermore, the infusion process can be paused. This is represented by the second component, hold, in Modes. Its last component, alarm, models the alarm status of the pump which, for example, can be silent or beep.

Mode transitions. In SAL, each mode transition in pump operation is modelled as a guarded command. For example, the command switch_on models the pump being switched on. It simply states that, in the case of event OnOff and the pump being in off mode, the pump’s mode is changed to basic. One of the options in the basic mode is to move to vtbi mode. This happens when the second option key is pressed. The guarded command vtbi models this mode transition. When the volume to be infused has been entered (vtbi mode), that value can be confirmed by pressing the first option key. The confirmation changes the pump’s mode to basic and sets the volume to be infused to the value shown on the display (val.vdisp). This is modelled as the command vtbi_confirm. Other commands related to the mode transitions are specified in a similar way (see below).

Number entry. Numbers (representing the rate and volume to be infused) are entered by incrementally changing them using the four chevron keys as discussed earlier. As an example, consider how number entry is modelled for the volume to be infused. In this case, the pump must be in the vtbi mode. Pressing the single ‘up’ chevron then increases the value by one tenth of the unit, which is represented as delta = 1 in our model. This transition is specified by the command vtbi_up, which updates the displayed value val.vdisp. It is possible only if the updated value val'.vdisp does not exceed the maximum max. The transition associated with pressing the double ‘up’ chevron is similar: the current value is increased by the whole unit, represented as Delta = 10 in the model. This transition is specified by the command vtbi_UP. Other number entry commands are specified in a similar way.

Modelling pump behaviour. The interactive behaviour of the pump, pump_basic, is specified by the following SAL model:


pump_basic {max:natural} : context =
begin

  Numbers: type = [0..max];
  Values: type = [# rate: Numbers, vtbi: Numbers, vdisp: Numbers, volume: bool #];

  Alarm: type = {silent, beep};

  BasicMode: type = { off, basic, vtbi, clear, infusing };
  Modes: type = [# mode: BasicMode, hold: bool, alarm: Alarm #];

  Events: type = { OnOff, Mute, Opt1, Opt2, Opt3, UP, up, DN, dn, Run, Hold, Tick };

  delta: natural = 1;
  Delta: natural = 10;

  moff(m:Modes): bool = m.mode = off and m.hold = false and m.alarm = silent;

  pump_basic [ init: [[Modes,Values] -> bool] ] : module =
  begin
    input event: Events
    output val: Values, m: Modes
    local vol: bool

    initialization
      [ init(m,val) --> ];
      vol = false;

    transition
    [
      % mode transitions
      switch_on:
        event = OnOff and m.mode = off --> m'.mode = basic
      []
      switch_off:
        event = OnOff and m.mode /= off and m.mode /= infusing -->
          val'.vdisp = val.vtbi; m' in { x: Modes | moff(x) }
      []
      vtbi:
        event = Opt2 and m.mode = basic --> m'.mode = vtbi
      []
      vtbi_confirm:
        event = Opt1 and m.mode = vtbi --> val'.vtbi = val.vdisp; m'.mode = basic
      []
      vtbi_cancel:
        event = Opt3 and m.mode = vtbi --> val'.vdisp = val.vtbi; m'.mode = basic
      []
      clear:
        event = Opt1 and m.mode = basic --> m'.mode = clear
      []
      clear_confirm:
        event = Opt1 and m.mode = clear --> val'.volume = false; m'.mode = basic
      []
      clear_cancel:
        event = Opt3 and m.mode = clear --> m'.mode = basic
      []
      run:
        event = Run and m.mode = basic and not(val.volume) and m.alarm = silent -->
          m'.mode = infusing
      []
      stop:
        event = Run and m.mode = infusing --> m'.mode = basic
      []
      flip_hold:
        event = Hold and m.mode = infusing --> m'.hold = not(m.hold)
      []
      % number entry
      rate_up:
        event = up and m.mode = basic and val.rate + delta <= max -->
          val'.rate = val.rate + delta
      []
      rate_UP:
        event = UP and m.mode = basic and val.rate + Delta <= max -->
          val'.rate = val.rate + Delta
      []
      rate_dn:
        event = dn and m.mode = basic and val.rate - delta >= 0 -->
          val'.rate = val.rate - delta
      []
      rate_DN:
        event = DN and m.mode = basic and val.rate - Delta >= 0 -->
          val'.rate = val.rate - Delta
      []
      vtbi_up:
        event = up and m.mode = vtbi and val.vdisp + delta <= max -->
          val'.vdisp = val.vdisp + delta
      []
      vtbi_UP:
        event = UP and m.mode = vtbi and val.vdisp + Delta <= max -->
          val'.vdisp = val.vdisp + Delta
      []
      vtbi_dn:
        event = dn and m.mode = vtbi and val.vdisp - delta >= 0 -->
          val'.vdisp = val.vdisp - delta
      []
      vtbi_DN:
        event = DN and m.mode = vtbi and val.vdisp - Delta >= 0 -->
          val'.vdisp = val.vdisp - Delta
      []
      % infusion process
      infusion:
        event = Tick and m.mode = infusing and not(m.hold) -->
          vol' in { x: bool | true };
          val'.volume = vol';
          m' = if not(vol') then m
               else m with .mode := basic with .alarm := beep endif
      []
      else -->
    ]
  end;
end

B.2. Pump interface

The state space of our pump model specified above represents internal information as regards pump operation. Some of this information may not be observable on the pump interface. In this section, we formally specify which elements of the pump state are represented by its interface.

Most of the information about the pump state is given on the display. We model the pump display as a record disp of type Display. The modeMsg field in this record represents a message, shown at the top of the display, about the mode of pump operation. The messages correspond to the pump modes discussed above. They are specified by the ModeMessage type. Note that noMsg, representing no message shown on the top line of the display, corresponds to the basic mode. Depending on the pump mode, either the rate, the volume to be infused or the volume already infused is shown in the middle of the display. In our model, the booleans disp.rateDisp, disp.vtbiDisp and disp.volumeDisp are true when the corresponding value is displayed together with the associated label. Finally, the fields option1, option2 and option3 represent the labels of the three option keys (shown at the bottom of the display). These labels are modelled as the type Option. In this type, noOption corresponds to the situations when the relevant option key provides no functionality, and there is no label associated with it. Other labels are discussed below.

We specify the disp record, representing the display, separately for each of the pump modes. As an example, both the rate of infusion and the volume already infused are displayed in the basic mode. The vtbiOpt label indicates that the second option key can be used to change the pump mode to vtbi, while the first option key, labelled volumeOpt, moves the pump into the volume clearing mode. The constant offDisp models the empty display. In the vtbi mode, the volume to be infused is displayed together with the corresponding message about the pump mode. The okOpt label indicates that the first option key is to be used to confirm the displayed value, while quitOpt indicates that the third option key will cancel it.

Finally, the pump interface, interface, and the whole pump device, Pump, are specified by the following SAL model:

pump {max:natural} : context =
begin

  importing pump_basic{max};

  ModeMessage: type = { noMsg, vtbiMsg, clearMsg, infusingMsg, onholdMsg };
  Option: type = { volumeOpt, vtbiOpt, okOpt, quitOpt, yesOpt, noOpt, noOption };

  LED: type = [# on: bool, infusing: bool, onhold: bool #];

  Display: type =
    [# modeMsg: ModeMessage,
       rateDisp: bool, vtbiDisp: bool, volumeDisp: bool,
       option1: Option, option2: Option, option3: Option
    #];

  offDisp: Display =
    (# modeMsg := noMsg,
       rateDisp := false, vtbiDisp := false, volumeDisp := false,
       option1 := noOption, option2 := noOption, option3 := noOption
    #);

  interface: module =
  begin
    input m: Modes
    output disp: Display
    output led: LED
    output alarm: bool

    definition
      alarm = (m.alarm /= silent);
      led = (# on := (m.mode /= off), infusing := (m.mode = infusing), onhold := m.hold #);
      disp =
        if m.hold then
          offDisp with .modeMsg := onholdMsg
                  with .rateDisp := true with .volumeDisp := true
        elsif m.mode = infusing then
          offDisp with .modeMsg := infusingMsg
                  with .rateDisp := true with .volumeDisp := true
        elsif m.mode = clear then
          offDisp with .modeMsg := clearMsg with .volumeDisp := true
                  with .option1 := yesOpt with .option3 := noOpt
        elsif m.mode = vtbi then
          offDisp with .modeMsg := vtbiMsg with .vtbiDisp := true
                  with .option1 := okOpt with .option3 := quitOpt
        elsif m.mode = basic then
          offDisp with .rateDisp := true with .volumeDisp := true
                  with .option1 := volumeOpt with .option2 := vtbiOpt
        else
          offDisp
        endif;
  end;

  Pump [ init: [[Modes,Values] -> bool] ] : module = pump_basic[init] || interface;

end
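
For illustration only, a simple sanity check could be added at the end of the pump context above and run with sal-smc. The initial-state predicate initOff and the property no_overrun below are assumptions made for this sketch; they are not part of the analysis reported in the paper.

  % Hypothetical initial-state predicate: pump switched off, nothing entered yet.
  initOff(m:Modes, v:Values): bool =
    moff(m) and v.rate = 0 and v.vtbi = 0 and v.vdisp = 0 and not(v.volume);

  % Assumed sanity property: the pump is never infusing while it believes the
  % whole volume has already been delivered.
  no_overrun: theorem Pump[initOff] |- G(m.mode = infusing => not(val.volume));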

C. The pump user model

In this appendix, we give additional SAL specifications for the user activities and actions identified in Section 7, where the specifications for mainActivity and chooseVolumeAction have already been provided.

First, we consider the volume setting activity volumeActivity. Its goal is to set the volume to be infused as prescribed. We assume that there are two cases when this goal can be considered as achieved from the user perspective. The first represents the user perceiving the displayed value which is the same as the target volume: inp.vtbiDisp and inp.vtbi = mem.targetVolume. The second represents the user believing that the volume has already been set. We will discuss below how such a belief (mem.volumeSet) can be formed in the user model.

To achieve the above goal, the user has to choose the volume setting mode (chooseVolumeAction), enter the required value by using the four chevrons (upVolumeAction, UpVolumeAction, downVolumeAction and DownVolumeAction), check the entered value (markVolumeAction) and confirm it (confirmVolumeAction). Thus, we link these actions to volumeActivity, which is formally specified as follows:

(# goal := lambda(inp:Inp,mem:Memory,env:Env):
             inp.vtbiDisp and inp.vtbi = mem.targetVolume or mem.volumeSet
 , cog  := lambda (a:ActionRange): lambda(inp:Inp,mem:Memory,env:Env):
             if a = chooseVolumeAction or a = upVolumeAction or a = UpVolumeAction or
                a = downVolumeAction or a = DownVolumeAction or a = markVolumeAction
             then StrongCue
             elsif a = confirmVolumeAction then WeakCue
             else NoCue endif
#)

All but one of the actions linked to this activity seem necessary for achieving its goal. Therefore, they are assumed to have strong cognitive cueing. The main reason why confirmVolumeAction is assumed to have weak cognitive cueing is that it is not obviously necessary to perform this activity – it is essentially a device-dependent step. For example, the pump interface does not include a confirmation step when the rate is entered, so even for this device such a step is not always required.

The specification of the rate setting activity is done in a similar way. Next, we show how other user actions are formally specified in our model.

Switching on. Let us first consider startAction, which models the user switching the pump on. We make the following assumptions about its cueing. This defines the record cues[startAction]:

(# proc := noCue
 , sens := weakCue
 , spec := lambda(s:Status): lambda(inp:Inp,mem:Memory,env:Env):
             not(s.trace[startAction]) and not(inp.on)
 , grd  := lambda(inp:Inp,mem:Memory,env:Env): true
#)

Since switching on is normally the first action the user would take when setting up an infusion, startAction does not have procedural cues. Since the ‘on/off’ key is located a little below the display, startAction does not have particular sensory cues: weakCue, here, is a constant function that returns WeakCue. In this analysis, we only consider interactions that are not restarted again once the pump has been switched off. Thus, the action startAction is specific to the interaction only if it has not been taken before and the user perceives that the pump is off. Since there is no pre-condition associated with the action, grd(inp,mem,env) is always true, meaning no preconditions need to hold for the user model to do the switching on action.
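
The constant cueing functions used in these records are not shown in the excerpts here; the definitions below are an assumed sketch of their likely form, consistent with how they are applied in this appendix (noCue is used in the proc position, which is additionally indexed by actions):

% Assumed sketch of the constant cue-strength functions referred to in the text.
weakCue(inp:Inp, mem:Memory, env:Env): Cueing = WeakCue;
strongCue(inp:Inp, mem:Memory, env:Env): Cueing = StrongCue;
noCue(a:ActionRange): [[Inp,Memory,Env] -> Cueing] =
  lambda(inp:Inp, mem:Memory, env:Env): NoCue;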

Next we specify the state transitions associated with startAction. That is, we define the record trans[startAction]:

(# tout := lambda(inp:Inp,out0:Out,mem:Memory): lambda(out1:Out): out1 = OnOff
 , tmem := lambda(inp:Inp,mem0:Memory,out:Out): lambda(mem1:Memory):
             mem1 = mem0 with .last := -1 with .over := false with .under := false
 , time := TimeMove + TimePress
#)

Thus, we assume that, as a result of startAction, the user model produces the OnOff event, which corresponds to pressing the ‘on/off’ key. This action also clears the memory as explained earlier. Finally, we assume that the execution time associated with this physical action is obtained by adding the durations of the two operators (TimeMove and TimePress).

Starting the infusion. The action infuseAction differs from startAction only in its specificity condition. We assume that infuseAction is specific to the user when the user perceives that the pump is on and believes that the target rate and volume have already been set:

(# proc := noCue
 , sens := weakCue
 , spec := lambda(s:Status): lambda(inp:Inp,mem:Memory,env:Env):
             inp.on and mem.rateSet and mem.volumeSet
 , grd  := lambda(inp:Inp,mem:Memory,env:Env): true
#)

Similarly, the only difference from startAction in the specification of transitions is the output of infuseAction. The user model produces the Run output, which corresponds to pressing the ‘infuse’ key:

(# tout := lambda(inp:Inp,out0:Out,mem:Memory): lambda(out1:Out): out1 = Run
 , tmem := lambda(inp:Inp,mem0:Memory,out:Out): lambda(mem1:Memory):
             mem1 = mem0 with .last := -1 with .over := false with .under := false
 , time := TimeMove + TimePress
#)

Number entry. In response to the user pressing the vtbi option key, the pump changes its mode to vtbi. When in this mode, the four chevrons can be used to enter the target volume as discussed earlier. Pressing each chevron is modelled as a separate user action. Below, we specify the cueing information and transitions for upVolumeAction and UpVolumeAction. The corresponding specifications for the two decrease actions are similar, as are the specifications for the rate entry actions, which are also omitted from the presentation here.

We start from the single ‘up’ chevron. Pressing the latter in vtbi mode corresponds to upVolumeAction. Since all four chevrons are located below the display, the sensory cueing for this and the other number entry actions is weak. Also, the number entry actions are always possible (the chevrons are always available):

(# proc := lambda (a:ActionRange): lambda(inp:Inp,mem:Memory,env:Env):
             if a = markVolumeAction and mem.last + mem.delta = mem.targetVolume then StrongCue
             elsif a = upVolumeAction and mem.last < mem.targetVolume then WeakCue
             else NoCue endif
 , sens := weakCue
 , spec := lambda(s:Status): lambda(inp:Inp,mem:Memory,env:Env):
             inp.vtbiDisp and inp.vtbi < mem.targetVolume and
             (s.active = volumeActivity or inp.mode = vtbiMsg) and
             not(inp.vtbi = mem.last and s.last = upVolumeAction)
 , grd := lambda(inp:Inp,mem:Memory,env:Env): true
#)

We also assume that, when the user believes that the next value (after the chevron press) is the target volume, upVolumeAction serves as a strong procedural cue for markVolumeAction. The latter mentally marks the volume as already set (see below). This cueing models a more conscious user habit (i.e., learned knowledge) about the operation of the pump. Furthermore, upVolumeAction can also serve as a weak procedural cue to itself. This is so when the user believes that the current value (mem.last + mem.delta) is still less than the target volume. Thus, this cueing models a habit of a more mechanical nature – the user pressing the same key an approximately correct number of times. Finally, upVolumeAction is considered as specific when: the current volume is displayed; its perceived value is less than the target volume; and either the user believes they are currently involved in the volume setting activity (s.active = volumeActivity) or can perceive that from the mode message on the display (inp.mode = vtbiMsg).

Executing upVolumeAction yields the up event (single ‘up’ key pressed) and the following user belief updates:

tout := lambda(inp:Inp,out0:Out,mem:Memory): lambda(out1:Out): out1 = up
tmem := lambda(inp:Inp,mem0:Memory,out:Out): lambda(mem1:Memory):
  ( mem1.last = inp.vtbi and mem1.delta = deltaUp(inp.vtbi)
    or
    (out = up or out = UP) and
    mem1.last = mem0.last + mem0.delta and
    mem1.delta = if out = up then mem0.delta else deltaUp(mem1.last) endif
  ) and
  mem1.over =
    (mem0.over or
     mem1.last + mem1.delta > mem0.targetVolume and mem1.last <= mem0.targetVolume) and
  mem1.under =
    (mem0.under or mem1.last < mem0.targetVolume and mem0.last >= mem0.targetVolume)
  ...

The dots ... stand here for the specification of those fields in mem that remain unchanged.

Two possibilities are considered for the belief updates concerning the volume and delta values. The first one, which is always possible, involves scenarios where the user checks, and correctly perceives, the current volume value on the display (inp.vtbi). This value is then used to update the modelled user beliefs about both the current value and delta. The function deltaUp simply returns the constant delta defined in the pump model. This reflects our assumption that the user has correct knowledge as regards pump operation. The second possibility covers only those scenarios where the previous key press was on an increase chevron (out = up or out = UP). In this case, the current delta is used to calculate the belief about the current volume (mem0.last + mem0.delta). Moreover, it is natural to assume that the user may expect that pressing the same key yields the same effect. Thus, the user model keeps the belief about the delta unchanged (mem1.delta = mem0.delta) if the previous action was pressing the single ‘up’ chevron (out = up). Otherwise, it is assumed that the belief about the current delta is formed based on the belief about the latest volume value (deltaUp(mem1.last)).

Furthermore, the user model keeps track of whether jumps beyond the target volume have happened in either direction. These beliefs are represented by the over and under fields in the memory, mem. They rely on the comparison of the current and ‘predicted’ values with the target volume. Note that even upVolumeAction can update the under belief. This could happen when the user corrects a wrong belief (mem0.last + mem0.delta) about the current value by checking the displayed volume (inp.vtbi) and relying on it.

Our specifications for UpVolumeAction are similar to those of upVolumeAction. For the cueing specification, the specificity predicate includes a minor addition:

spec := lambda(s:Status): lambda(inp:Inp,mem:Memory,env:Env):
          inp.vtbiDisp and inp.vtbi < mem.targetVolume and not(mem.over) and
          (s.active = volumeActivity or inp.mode = vtbiMsg) and
          not(inp.vtbi = mem.last and s.last = upVolumeAction)

Thus, UpVolumeAction is specific only if the modelled user believes that a jump over the target volume has not happened (not(mem.over)). We assume that, if the opposite is true, the user is more careful and would not repeat the jump-over again, relying instead on the single ‘up’ chevron.

Marking vtbi as set. This action, denoted markVolumeAction, models the user mentally marking the volume as set. The action can be procedurally cued by the number entry actions as discussed above. It can also be sensorily or cognitively cued. In this case, however, its specificity condition must hold. Formally, cueing for markVolumeAction is specified as follows:

(# proc := lambda (a:ActionRange): lambda(inp:Inp,mem:Memory,env:Env):
             if a = confirmVolumeAction then StrongCue else NoCue endif
 , sens := strongCue
 , spec := lambda(s:Status): lambda(inp:Inp,mem:Memory,env:Env):
             inp.vtbiDisp and inp.vtbi = mem.targetVolume and
             not(mem.volumeSet) and (s.active = volumeActivity or inp.mode = vtbiMsg)
 , grd := lambda(inp:Inp,mem:Memory,env:Env): true
#)

Thus, the action is specific when the modelled user believes it is engaging in the volume setting activity and that the volume has not yet been set. Furthermore, the perceived value must be the same as the target volume. The sensory cueing for this action is assumed to be strong, since the current volume is shown on the display. Also, marking the volume as set serves as a strong procedural cue for confirming (confirmVolumeAction) the current value of the volume.

Since markVolumeAction is a purely mental action, it yields the Tick event, which models the user taking no physical actions. Therefore, the time associated with the physical stage of this action is 0. Finally, the memory is updated so that the belief about the volume being set becomes true:

(# tout := lambda(inp:Inp,out0:Out,mem:Memory): lambda(out1:Out): out1 = Tick
 , tmem := lambda(inp:Inp,mem0:Memory,out:Out): lambda(mem1:Memory):
             mem1 = mem0 with .volumeSet := true
 , time := 0
#)

Confirming vtbi. This action, denoted confirmVolumeAction, models the user confirming the current value of the volume by pressing the relevant confirmation button. It can be procedurally cued by markVolumeAction (see above). However, the latter is not necessary, since confirmVolumeAction is assumed to have strong cognitive cueing. In this case, it is considered as specific when the perceived value is the same as the target volume, and the user believes they are involved in the volume setting activity. Furthermore, this action is only possible if there is an option key with the confirmation label (okOpt):

(# proc := noCue
 , sens := weakCue
 , spec := lambda(s:Status): lambda(inp:Inp,mem:Memory,env:Env):
             inp.vtbiDisp and inp.vtbi = mem.targetVolume and s.active = volumeActivity
 , grd := lambda(inp:Inp,mem:Memory,env:Env): (exists(j:[1..3]): inp.option[j] = okOpt)
#)


Executing this action yields an event that represents the user pressing the confirmation key. The memory is also updated so that the belief about volume setting becomes true. This is in case markVolumeAction was skipped.

tout := lambda(inp:Inp,out0:Out,mem:Memory): lambda(out1:Out):
          exists(j:[1..3]): inp.option[j] = okOpt and out1 = optAction(j)
tmem := lambda(inp:Inp,mem0:Memory,out:Out): lambda(mem1:Memory):
          mem1 = mem0 with .volumeSet := true
                      with .last := -1 with .over := false with .under := false

