
Towards Publishable Event Logs That Reveal Touchscreen Faults

Andrea L. Mascher Paul T. Cotton

Department of Computer Science, The University of Iowa

Iowa City, Iowa
{amascher, pcotton, jones}@cs.uiowa.edu

Douglas W. Jones

Abstract

Federal standards require that electronic voting machines log information about the voting system behavior to support post-election audits and investigations. Our study examines interface issues commonly reported in touchscreen voting systems (miscalibration, insensitivity, etc.) and the voter interaction data that can be collected to allow investigation of these issues while at the same time preserving the right to a secret ballot. We also provide empirically derived metrics that can detect these issues by analyzing these data.

1 Introduction

Electronic voting machines have become prevalent in the wake of the 2000 US Presidential election. Such systems have replaced mechanical and punchcard ballots because they prevent overvotes (the selection of too many candidates in a given contest), have the potential to reduce undervotes (the lack of a selection in a contest), and provide improved accessibility through multilingual and multimodal interfaces.

The Help America Vote Act mandated “at least one direct recording electronic voting system or other voting system equipped for individuals with disabilities at each polling place,” and authorized up to $3.9 billion for implementation of its reforms [1]. Despite the recent ubiquity of these systems in American elections, there are still widespread problems with existing voting machines.

Most concerns regarding electronic voting systems have focused on their security vulnerabilities and lack of verified audit logs, but the 2006 Florida Congressional District 13 election (“CD13”) in Sarasota County has brought increased scrutiny to the user interfaces of touchscreen voting systems. 14.8% of votes cast on the ES&S iVotronic touchscreen systems used in that election had undervotes in the CD13 contest, a rate several times higher than for comparable up-ticket contests such as Senate, Governor, or Attorney General (1.14%, 1.28%, 4.36%, respectively) and more than five times greater than the undervote rate for paper ballots (2.5%) used in the same election [10]. Post-election investigations have proposed that this abnormally high undervote rate was due to user interface issues, namely poor ballot design and touchscreen miscalibration or insensitivity [6, 10, 21], but the existing event logs for CD13 did not record sufficient information to test these hypotheses.

Figure 1: Screenshot of Florida’s 2006 Sarasota CD13 and Gubernatorial contests. Image edited to fit column width.

All electronic voting systems are required to maintain an audit trail, more properly called an event log. The requirement for event logs in voting systems dates back to the original voting system standards promulgated by the Federal Election Commission in 1990:

“All systems shall include capabilities of recording and reporting the date and time of normal and abnormal events, and of maintaining a permanent record of audit information that cannot be turned off. For all systems, provisions shall be made to detect and record significant events (e.g., casting a ballot, error conditions which cannot be disposed of by the system itself, time-dependent or programmed events which occur without the intervention of the voter or a polling place operator)” [8]

Subsequent federal standards have continued to support this requirement [9, 20]. Currently deployed systems rarely record events beyond the minimum listed in the 1990 FEC standard. While these events are useful for a post-election investigation, they are far from sufficient. In many voting system event logs, the only voter interaction recorded is the casting of a ballot. This lack of information recorded in existing event logs hinders investigations into many reported problems, specifically those related to voter experience and intent.

A log of all voter actions should allow easy diagnosis of user interface issues, such as touchscreen miscalibration, but this records too much information. The right to a secret ballot is compromised when it is possible to reconstruct how a person voted from the event log. This balance, between the need to protect ballot secrecy and the desire to collect the maximum amount of meaningful data for post-election investigations, has prompted several questions:

• Which user interface problems can be detected by logging events without revealing voter selections?

• Can different types of problems be differentiated from these event logs?

We have developed a touchscreen voting system, Vote-O-Graph, to be a testbed for experiments intended to answer these questions. The user study described in this paper investigates what user interaction data can be maintained in a voting system event log without threatening ballot secrecy and what measurable differences in behavior exist under a variety of interface issues.

2 Related Work

In many systems that record event logs, the entire history of the system is captured. The level of detail in these logs is sufficient that, given an initial state of the system and the information maintained in the event logs, the final state of the system can be reconstructed. In financial event logs, for example, events typically indicate the amount of money transferred, the source account and the destination account, as well as who authorized the transfer and why. An equivalent event log for a voting system would indicate, at the moment each vote was cast, who cast that vote and for what candidate. Recording such an event log poses obvious threats to the right to a secret ballot. It was observed as early as 1893 that even a sequential record of the votes cast, with no time stamps, is sufficient to allow an observer to determine who cast each vote [17].

Cordero and Wagner proposed using replayable audit logs to create a visual record of all the events in each voting session. By recording touchscreen touches and output events for each voting session, they allow reconstruction of that session in sufficient detail that human-factors problems during voter interaction with the system can be studied in detail [5].

In an effort to anonymize the data, Cordero and Wagner do not store time stamps in the log, and while the sequence of events in each voting session is stored in order, a history independent data structure is needed to store the logs for each voter, so that, after the polls close, it is difficult to tie individual voters to the records of their voting sessions. The lack of time stamps and the use of a history independent data structure mean that Cordero and Wagner’s replayable logs must be stored separately from the conventional event log required by current voting system standards.

Despite the efforts to anonymize voting sessions, voters can easily add personal signatures to their replayable event logs. Consider, for example, a voter who has agreed to sell her vote. The vote buyer and vote seller would agree on a pre-determined ballot signature, such as touching each corner of the screen in some pre-arranged sequence. The vote buyer could examine the replayable log to look for the signature and verify that the vote seller cast a ballot with the agreed selections. Because a ballot signature can be associated with a voter’s candidate selections, public release of such a replayable log is problematic. We believe that event logs that cannot be released for public examination are themselves problematic, so we have sought an alternative.

3 System

Our experimental touchscreen voting system, Vote-O-Graph, is not designed to be an honest voting machine in the traditional sense. Instead, it is designed to simulate commonly reported touchscreen interface issues. Controlled modifications have been applied to impact the ballot layout, perceived touchscreen calibration, perceived touchscreen sensitivity, and summary screen honesty.


Figure 2: Layout of a Vote-O-Graph contest page, showing the contest description, the contest selection buttons (including a write-in option), and the “Previous”/“Next” navigation buttons. The contest shown is from the November 2008 US House election in Iowa’s 2nd district.

Vote-O-Graph is a 1,500 line Java/Swing application designed to work on any touchscreen notebook computer. Our user studies were conducted on an HP tx2510 laptop/tablet running Ubuntu Linux 8.10. This computer has a 12.1” (307 mm) screen running at 1200×800 pixel resolution and was configured as a tablet computer in all experiments. The ballot is specified in an XML file.

The visual design of Vote-O-Graph is based on layouts used in existing commercial and experimental voting systems, such as the ES&S iVotronic or Pvote [25]. Contests are normally presented one per page with the contest description at the top of the screen and candidate selection options presented as a column of adjacent buttons in the middle of the screen. The “Next” and “Previous” navigation buttons are in the lower right and left hand corners, 20 pixels (4.1 mm) from the bottom of the screen. Voters are required to review their selections and may make updates via a series of summary screens before the ballot is cast. On an update page, the “Next” and “Previous” buttons are replaced by a “Return to Summary” button which spans the width of the screen. All buttons had a height of 90 pixels (18.4 mm). An example Vote-O-Graph screen layout is shown in Figure 2.

4 Preserving Anonymity in Event Logs

In US elections, it is crucial to maintain separation between the identity of the voter and the particular selections made in the voter’s cast ballot. When these data can be linked, voters become susceptible to coercion and vote selling.

On the other hand, we want to record as much information about the voter’s interaction as possible to allow diagnosis of interface problems. To address these competing concerns, we propose logging additional events, while keeping the following goals in mind:

Standards compliant  The new events we record contain timestamps and other elements required under current federal standards.

Ease of integration with existing logging systems  The new events we record are conventional timestamped event records comparable to the events already being logged on existing voting systems.

Record novel interaction information  The new events support detecting and ascribing causes of voting system irregularities.

Avoid compromising secret ballot rights  As long as vote data cannot be inferred from the event log, the event logs can be released to the public with little or no modification or redaction.

Unlike Cordero and Wagner, it is not our goal to provide a record or method to recount or verify election results. Rather, our goal is to allow detection of user interface problems. To do this, our interface-based logging system records three types of data: timestamps, button types, and relative locations.

4.1 Timestamps

When the time of an event is recorded in an event log, it is trivial to link that event to the voter who was present at that time. For example, an observer at the polling station could keep track of the times and machines used by voters throughout the day. At some later time, the voters on the observer’s list could be cross-checked against the entries in the event log.

Given the requirements for timestamped, sequential entries in event logs, there can never be guaranteed anonymity of voters’ identities in any system that can be publicly observed. Therefore, to protect ballot secrecy we do not log button identities or absolute touch coordinates.


4.2 Button Type

For each touch event we record the type of button but not the identity. Our event log shows that the voter made a selection, removed a selection, or navigated to another ballot page, but the candidates selected are not recorded. Retaining information about the type of button touched in the event log provides diagnostic information about where in the ballot irregularities occurred. For example, multiple candidate selects followed immediately by deselects on a single ballot page may indicate that the voter had difficulty with the interface. Recording only the button type prevents revealing a voter’s selection, although it can reveal when a voter abstains from a specific contest, depending on how a voter navigates through the ballot. Several different approaches to limit this risk are discussed in Section 8.
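As a sketch of this idea, an anonymized touch event might carry only a category rather than a button identity. The `ButtonType` names and record shape below are illustrative assumptions, not Vote-O-Graph's actual implementation:

```python
from dataclasses import dataclass
from enum import Enum

class ButtonType(Enum):
    SELECT = "select"          # a candidate was selected (which one is not stored)
    DESELECT = "deselect"      # a candidate was deselected
    NAVIGATE = "navigate"      # page navigation ("Next", "Previous", etc.)
    BACKGROUND = "background"  # a touch that hit no button

@dataclass
class TouchEvent:
    timestamp: str             # timestamps are required by federal standards
    button_type: ButtonType    # type only -- the event never names a candidate

log = []
log.append(TouchEvent("00:13.37", ButtonType.SELECT))
log.append(TouchEvent("00:13.90", ButtonType.DESELECT))
log.append(TouchEvent("00:14.55", ButtonType.NAVIGATE))

# A burst of select/deselect pairs on one page may signal interface trouble.
churn = sum(1 for e in log if e.button_type is ButtonType.DESELECT)
print(churn)  # 1
```

Because the record carries no candidate field at all, publishing it cannot reveal a selection, though the select/deselect pattern remains available for diagnosis.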

4.3 Relative Touch Coordinates

We record two types of locations that a voter could touch: a button or the background. A touch on the background does not change the state of the ballot or screen, but an excessive number of background touches may indicate a system or interface issue. It may be the case that a background touch is a miss on a nearby button, so to preserve voter privacy, we only record when a background touch occurs, not where.

The location where a button was touched is recorded as an (x,y) pair relative to the button itself, not to the screen as a whole. This prevents leaking a voter’s selection, since a touch on the same location of any other button would be recorded the same. For example, Figure 3 shows the relative touch coordinates for both “Loebsack” and “Miller-Meeks” recorded as (197,39) even though their absolute coordinates differ. This use of relative touch coordinates allows Vote-O-Graph to record useful information about the voter interaction without revealing the selections the voter made.
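The translation is a simple subtraction of the touched button's screen origin. A minimal sketch; the button origins below are hypothetical, chosen so both touches land at the paper's example coordinate:

```python
def relative_touch(abs_x, abs_y, btn_x, btn_y):
    """Convert an absolute screen touch into coordinates relative to the
    touched button's origin, so the logged pair no longer identifies
    which button was pressed."""
    return (abs_x - btn_x, abs_y - btn_y)

# Voter A touches the Loebsack button, Voter B the Miller-Meeks button.
# Button origins (20, 120) and (20, 210) are hypothetical values.
voter_a = relative_touch(217, 159, 20, 120)
voter_b = relative_touch(217, 249, 20, 210)
assert voter_a == voter_b == (197, 39)  # identical log entries, different buttons
```

The same physical touch gesture on any button yields the same logged pair, which is exactly the property that keeps the log publishable.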

5 User Study

5.1 Participants and Environment

To simulate the election experience as closely as possible, studies were conducted in locations that are, or resemble, actual polling stations. Participants were recruited from passers-by at our study locations in Johnson County, Iowa.

As of publication, 100 participants have completed the study. The age range was 18–75+ years; 51 were female and 49 were male. Computer and internet experience ranged from none to more than 40 hours a week. 22.5%

<!-- Event log entry for contest selection -->
<selection>
  <press> 00:13.37 </press>
  <x-coord> 197 </x-coord>
  <y-coord> 039 </y-coord>
  <screen-update> 00:13:42 </screen-update>
  <release> 00:13.61 </release>
</selection>

Figure 3: Relative touch coordinates. Voter A selects Loebsack (upper left), while Voter B selects Miller-Meeks (upper right). The event log entries for both voters are as shown at bottom because both voters’ touch events were on the same location relative to their selected buttons.
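A minimal serializer for entries of this shape might look as follows. The field names are taken from the figure; the timestamp strings and three-digit zero-padding of coordinates are assumptions about the format:

```python
def selection_entry(press, x, y, update, release):
    """Render one <selection> event-log entry carrying only
    relative coordinates and timestamps (field names from Figure 3)."""
    return (
        "<selection>\n"
        f"  <press> {press} </press>\n"
        f"  <x-coord> {x:03d} </x-coord>\n"
        f"  <y-coord> {y:03d} </y-coord>\n"
        f"  <screen-update> {update} </screen-update>\n"
        f"  <release> {release} </release>\n"
        "</selection>"
    )

entry = selection_entry("00:13.37", 197, 39, "00:13.42", "00:13.61")
assert "<y-coord> 039 </y-coord>" in entry  # zero-padded relative coordinate
```

Note that nothing in the entry names a button or candidate; two voters pressing the same spot on different buttons produce byte-identical entries apart from timestamps.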

of subjects had previous experience with a touchscreenvoting system.

5.2 Procedure

Participants were told that the study was about “how people interact with voting machines,” with no further description of the nature of the study. After demographic data were logged, participants were instructed to vote any way they wished and encouraged to use the system as they normally would in a real election setting. Participants were reminded that their selections would not be recorded.

Participants were free to ask questions, but whenever possible we gave minimal information without looking at or touching the system. After voting, participants were given a questionnaire and notes were taken on any comments made. Voting sessions took 1.33–11.14 minutes (mean=3.78, sd=2.60), depending on the physical and technological abilities of the participant.

5.3 Task

We conducted randomized, double-blind voting sessions with one of the simulated interface issues described in

Group Name                  Abbreviation    Description                                      Subjects
Control                     Cont            No intentional problems, one contest per page      13
Compressed Ballot           Comp            Multiple contests per page                         11
Dishonest Summary           Dis             Presidential selection changed on summary page     15
Delayed Response            Del-100         Screen response to touch events delayed 100 ms     14
                            Del-250         Screen response delayed 250 ms                     20
Touchscreen Miscalibration  Mis Up{amt}     Touch coordinates transformed upward               11
                            Mis Down{amt}   Touch coordinates transformed downward             15

Figure 4: User Study Test Groups and Subject Counts

Figure 4. We wanted high levels of recognition for candidates and ballot measures to give the voting act a sense of importance; a ballot with frivolous choices could lead the participants to forget who they voted for when they reviewed the summary screen. Participants voted on the November 2008 General Election ballot used in Johnson County, Iowa [11], but without the option for straight-party voting. The use of a recent election ensures that many contests (especially top-ticket candidates) are still familiar while avoiding the risk that voters might think they have voted in a real election.

The 2008 Johnson County, Iowa ballot had 24 contests, three of which allowed for multiple selections, for a maximum total of 31 selections per ballot. We created two different ballot designs: standard and compressed. The standard ballot placed only one contest per ballot page and was used in the Control, Delayed Response, Dishonest Summary, and Miscalibration experimental groups. The compressed ballot was designed to minimize the number of ballot pages whenever possible and was used to test our hypotheses about banner blindness.

We hypothesize that the events logged by Vote-O-Graph are sufficient to allow the diagnosis of interface problems, but without experimental evaluation we cannot justify requirements that these events be logged by production voting systems. In many cases we expect to be able to diagnose user interface problems by comparing statistical measures of the event logs against norms derived from experiments. These hypotheses are further detailed in Sections 6.1, 6.2, 6.3, and 6.4.

6 Hypotheses and Results

6.1 Dishonest Summary Screens

Voting machines are complex systems that perform many different functions. As such, they are constructed of multiple layers. A typical voting system consists of firmware that interprets a ballot description that solicits choices through the user interface. Ideally, event logs should be recorded by the lowest system level possible, below all layers that vary from election to election or that are sensitive to candidates or parties.

Figure 5: Additional Navigation Events. Whiskers show the inner-90% range, boxes show inner quartile, the dividing line in the box is the median. In this case, the median for all but the Dishonest Summary group was zero.

Ballot designs are especially vulnerable to attack because small changes have the potential not only to mislead and influence voters’ selections, but also to falsify the record of a vote. For example, a dishonest ballot description could cause a voting machine to record a vote different from the selection shown to the voter [22]. With many voting systems, dishonesty can be effectively accomplished in the ballot description without changing the firmware. This is often used as a hypothesis to explain the phenomenon that the media and activists have called “vote flipping.” Everett’s work showed that approximately one out of three voters verify information on the summary page [7]. From this, we expected that about a third of voters would observe a change in their selections on the summary screen and attempt to correct their misrecorded votes via the update page.

To simulate a dishonest ballot description, we changed a subject’s initial selection in the Presidential contest on the summary screen. Votes for Barack Obama were switched to John McCain; all other presidential selections (including abstentions) were switched to Barack Obama. Subsequent changes made by the subject to their selection for the Presidential contest were not modified.

Dishonesty in the summary screen led to significant changes in navigation behavior. The standard ballot used in these groups required the subject to navigate forward 37 times to complete the ballot. We expected that a subject discovering a problem at the summary screen would perform additional navigation events to update the incorrect selection. Updating each contest requires two navigation events: one to return to the page for a given contest, and another to return to the summary screen. We observed this increase in navigation events to update contests and a sharp increase in the number of navigation events back and forth between review pages.

15 subjects experienced a ballot with a dishonest summary screen. 67% of those subjects noticed the dishonesty and reviewed at least one contest. 33% of subjects made no reviews and completed the ballot with the minimum number of navigation events, as Everett’s results predict.

Despite the fact that many subjects apparently did not notice our dishonesty, the effect of dishonesty is still evident in the event logs. On average, subjects with a dishonest ballot reviewed one contest and performed additional navigation between the summary screens, resulting in an average of 47.8 navigation events, nearly 12 events more than the minimum of 37. In the Control group, the average number of navigation events was 39.6, only 2.6 higher than the minimum. The results for the Dishonest Summary group and those of all other groups except Compressed Ballot are presented in Figure 5; Compressed Ballot is omitted because the minimum number of navigation events for that ballot layout is different.

More than a quarter of subjects in the Dishonest Summary group performed 50% or more additional navigation events. This result points to a technique which can be used to detect voters responding to abnormal results on the summary screen. Setting a threshold based on the number of voters who exceed a certain number of navigation events would also be able to detect dishonesty. We recommend more study before determining the optimal settings for such thresholds.
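One illustrative per-session detector built on this observation might look as follows. The 50% figure comes from the result above, but the function, its defaults, and the idea of flagging single sessions (rather than counting voters over a threshold) are a sketch, not a validated rule:

```python
def excess_navigation(nav_events, ballot_minimum=37):
    """Fraction of navigation events beyond the ballot's minimum
    (37 forward navigations for the standard ballot in this study)."""
    return (nav_events - ballot_minimum) / ballot_minimum

def flags_session(nav_events, ballot_minimum=37, threshold=0.5):
    """Flag a session with >= 50% extra navigation, the level exceeded by
    more than a quarter of Dishonest Summary subjects."""
    return excess_navigation(nav_events, ballot_minimum) >= threshold

assert flags_session(56)      # 56 events is ~51% above the 37-event minimum
assert not flags_session(40)  # near-minimum, typical of the Control group
```

In practice the threshold would be tuned per ballot layout, since each layout has its own minimum navigation count.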

In addition to identifying problems which appear to the voter as a dishonest summary screen, touchscreen miscalibration causes errors which are frequently not corrected until the voter reaches the summary screen. Voters in the Miscalibration groups who corrected contests from the summary screen also performed substantially more navigation events than the Control group. We describe techniques for identifying touchscreen miscalibration in the following section.

6.2 Touchscreen Miscalibration

Touchscreen devices consist of two completely separate components: a display screen, and the touch input device that overlays the screen. Because of this separation, there is no intrinsic relationship between a point on the display screen and the touch sensor directly above it. When the display screen and touch sensor do not correspond, the touchscreen is said to be miscalibrated.

Systems can be deliberately or accidentally miscalibrated by touching the wrong locations during the calibration process, or unintentionally miscalibrated by the voter resting one hand on the screen while voting with the other [12].

If a touchscreen device is miscalibrated by a constant displacement, then all recorded touch coordinates will be offset by the same constant. This offset vector will be the same regardless of whether the coordinate is relative to the screen as a whole or to a target, such as a button, on the screen.

Moffatt discovered that there is a general trend for subjects to tap below the middle of a target, with 82% of target selection errors occurring on the item immediately beneath the intended target. Likewise, a target selected in the top 10% of its height is 11 times more likely to be intended for the item above it than for the selected item itself [18]. From this, we hypothesized that vertical miscalibration would impact the average relative vertical coordinate for button presses.

We simulated miscalibration by intercepting touch events and transforming the coordinates by a constant vertical offset vector. The buttons used in all sessions had a height of 90 pixels (18.4 mm). Offsets were ±15%–30% of button height, resulting in physical offsets of ±13–27 pixels (±2.6–5.5 mm).
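The perturbation itself is a one-line coordinate transform. A sketch, assuming screen coordinates with y increasing downward (so an upward miscalibration subtracts from y); the sign convention is an assumption, not taken from the paper:

```python
def miscalibrate(x, y, offset_px):
    """Shift the y coordinate seen by the UI by a constant vertical offset.
    Positive offset_px simulates upward miscalibration (the system registers
    the touch above where the finger landed); negative simulates downward.
    Assumes screen coordinates with y increasing downward."""
    return (x, y - offset_px)

assert miscalibrate(197, 150, 13) == (197, 137)    # ~15% of a 90 px button, up
assert miscalibrate(197, 150, -27) == (197, 177)   # ~30% of button height, down
```

Because the offset is constant, every button press in a session is perturbed identically, matching the constant-displacement model described above.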

As of publication, 5,713 vertical touch coordinates have been recorded. 4,069 touches were not perturbed. 653 touches were perturbed upwards (the recorded touch was above the physical touch location). 991 touches were perturbed downwards (the recorded touch was below the physical touch location).


Figure 6: Frequency of relative touch positions on a 90 pixel (18.4 mm) button. Out of 4,069 normally calibrated touch events, 3,091 (76%) of these touches fell below the center of the button, while only 29 (0.71%) of these touches were in the top 10% of the button.

The average vertical coordinate for normally calibrated touches was approximately one third of the way up from the bottom of the button (height = 34.28 pixels (7.3 mm), sd = 16.46 pixels (3.5 mm)). (See Figure 6.) Perturbations in average coordinates for the various miscalibration experiments were proportional to the direction and magnitude of their offsets (F(5, 5707) = 360.19, p < 0.001). (See Figure 7.)
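This suggests a simple diagnostic: compare the mean relative vertical coordinate of a session against the empirical norm of 34.28 px. The function below is illustrative; only the baseline constant comes from the data above:

```python
from statistics import mean

BASELINE_MEAN_Y = 34.28  # px above button bottom, from calibrated sessions

def estimated_offset(relative_ys):
    """Estimate vertical miscalibration as the shift of the mean relative
    y-coordinate (measured from the button bottom) away from the norm."""
    return mean(relative_ys) - BASELINE_MEAN_Y

# Touches clustering ~13 px high hint at upward miscalibration of that size.
shift = estimated_offset([45, 48, 50, 47])
assert 10 < shift < 16
```

A real deployment would also need a significance test against the baseline variance (sd = 16.46 px) before declaring a machine miscalibrated.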

These results demonstrate the potential of relative touch coordinates as an anonymity-preserving technique to detect and diagnose touchscreen miscalibration. Our data agree with Moffatt’s findings on the distribution of touches. The tendency to touch targets below the middle was especially pronounced: 3,091 of the 4,069 (76%) unperturbed touches were in the lower half of a button, while only 29 (0.71%) touches were in the top 10% of a button. Perturbed touch coordinates followed similar distributions when readjusted by their initial offset vectors. This consistency in physical touch behavior means that miscalibration that is small with respect to the screen as a whole is still detectable.

The consistent distribution of touch positions allows us to use the average location of a touch within a button to measure the degree of miscalibration. A greater upward miscalibration causes a button touch to be recorded closer to the top of the button. Downward miscalibration causes a touch to be recorded closer to the bottom of the button, as shown in Figure 7. Given the existing tendency to touch the lower half of a button, as the degree of downward miscalibration increases, the median location becomes lower than the mean location, while upward miscalibration maintains a touch frequency distribution similar to that of normally calibrated touches.

Figure 7: Recorded touch positions on a 90 pixel button for all offset vectors tested. Notation as in Figure 5, with circles marking means. Perturbations in average coordinates for miscalibration vectors were proportional to the direction and magnitude of their offsets (F(5, 5707) = 360.19, p < 0.001).

Because a running average gives an inaccurate view of the density of the distribution of touch coordinates, we recommend that each relative coordinate be logged to help identify touchscreen miscalibration.

Our results demonstrate that downward miscalibration strongly affects other aspects of interaction with the system. As discussed in Section 6.1, there is a higher incidence of additional navigation events, indicating that selections need to be re-checked more often. Also, the average number of background touches was significantly higher for subjects in the Downward Miscalibration groups (mean = 24.40, sd = 20.08); see Figure 8. Both numbers indicate that subjects miss their intended targets significantly more often when the touchscreen is downward miscalibrated.


Figure 8: Recorded background touches. Notation as in Figure 5.

6.3 Compressed Ballots

We experimented with compressed ballots in order to investigate banner blindness. Banner blindness refers to a phenomenon where computer users fail to notice banner ads, even if the ads are prominently placed, large, colorful, or animated [19]. The effect is particularly pronounced if the banners are placed at the top of a screen [2]. It has been suggested that banner blindness may have been at least partly to blame for the unusually large percentage of undervotes in CD13, where the contest was placed at the top of the screen, above a highlighted line [6, 10]. (See Figure 1.)

On our compressed ballots, we placed the US Senate and US House of Representatives contests on the same ballot page. We also compressed the 15 judicial contests down to 6 ballot pages. We expected to see two trends with the compressed ballot style. First, we expected to see a decrease in the rate of votes for the US Senate contest because some voters would miss the contest. Second, we expected to see a slight increase in the rate at which voters change their senatorial votes because the review page would be the first time a voter notices the contest.

Out of the 11 subjects who voted on a compressed ballot, only one failed to notice the US Senate contest while voting, but caught the omission on the review screen. Not only was this visible in the event log data, but the voter also commented on the difficulty of finding the contest. The low US Senate omission rate may be because

Figure 9: Hypothetical force-delay relationship. The curve of force versus time is not to any scale.

our screen layouts and designs were not sufficiently misleading. However, 1 out of 11 (9%) is consistent with the increased undervote rate in the CD13 election. A larger sample size is necessary to be conclusive.

6.4 Touchscreen Insensitivity

Touchscreen insensitivity was reported as one possible cause of the problems in Sarasota CD13, with system vendors acknowledging the existence of delay as intentional [6]. Delay in system response can be quite frustrating and has been shown to markedly increase error rates at a 225 ms delay. Shorter, less obvious delays are perceived as tactile: at a 66 ms delay, subjects report that some input devices feel “spongy” [15].

We expected that an increase in delay time would result in greater force being applied to the screen. This effect is illustrated in Figure 9. A number of events occur between the time a subject’s finger touches the screen and releases from the screen. First, as the force between finger and screen crosses the screen’s sensor threshold, the computer is notified of the touch. The Vote-O-Graph program then computes feedback and displays it. The display-feedback time averages 9.6 ms for changes to candidate selection; the median is 5.7 ms. We assume that the subject does not begin releasing until the system response is displayed, and just as the finger pressure increases with time while waiting for a system response, the release is not instantaneous.


Figure 10: Length of touch times from finger touch to finger release. Notation as in Figure 5.

We hypothesize that if a voter must press the touchscreen longer or harder to select a ballot item, the update-release interval will increase. To detect touchscreen insensitivity, we record the time feedback is displayed and the finger release time for each candidate selection and de-selection event. We also recorded the actual times the sensor threshold was crossed at touch and release. We did not record these data for screen updates involved with ballot navigation. Our experimental test involved adding a delay of 100 ms or 250 ms to the display-feedback time in order to simulate varying degrees of touchscreen insensitivity.
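The update-release interval can be computed by pairing each display-feedback event with the finger release that follows it. The sketch below assumes a simple event-tuple log format of our own invention, not the paper's actual log schema:

```python
def update_release_intervals(events):
    """Pair each display-feedback event with the next finger-release
    event and return the intervals in milliseconds.  The (kind,
    timestamp_ms) tuples are an assumed log format for illustration."""
    intervals = []
    last_update = None
    for kind, t in events:
        if kind == "feedback_displayed":
            last_update = t
        elif kind == "finger_release" and last_update is not None:
            intervals.append(t - last_update)
            last_update = None
    return intervals

def mean_update_release(events):
    """Average update-release interval for a session, or None if the
    session contains no paired events."""
    ivals = update_release_intervals(events)
    return sum(ivals) / len(ivals) if ivals else None
```

A per-session mean well above the baseline interval would then suggest that voters are pressing longer while waiting for feedback.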

The update-release interval for the combined Control, Compressed Ballot, and Dishonest Summary groups averaged 155 ms. The 100 ms Delayed Response group had update-release intervals comparable to this, averaging 124 ms, while the 250 ms Delayed Response group averaged 226 ms. The short to average times for the 100 ms group indicate that many subjects did not significantly perceive the added delay, so their behavior did not change. We suspect that this is a result of some subjects not waiting for a screen update before releasing their fingers but instead tapping the screen for a duration of at least 100 ms. Several subjects in the 250 ms Delayed Response group commented that they had to press the

Figure 11: Length of response times from screen update to finger release. Notation as in Figure 5.

screen with unexpectedly high force, confirming that a delayed response is indistinguishable from an insensitive touchscreen.

7 Unexpected Results

7.1 Insensitivity Deters Proofreading

In the 250 ms Delayed Response group, we observed a marked decrease in the number of subjects who review contests compared with the Control group. Of the 12 subjects in the Control group, 50% made two or more extra navigation events and 17% made six or more extra events. The 14 subjects in the 100 ms Delayed Response group had behavior similar to that of the Control group: less than 50% made any extra navigation events, and 25% made four or more.

There are two hypotheses to explain the reduction in contest review behavior. Subjects do not bother reviewing contests because the increased delay makes them more confident that no review is necessary, or they do not review contests because the increased delay is sufficiently annoying that subjects would rather just get the whole thing over with.

We suspect the latter hypothesis is more likely to be correct. Several subjects in the Delayed Response groups complained about the touchscreen, with more vehement complaints in the 250 ms Delayed Response group. This


Figure 12: Contest selection rates. Subjects could make a maximum of 31 selections. Notation as in Figure 5.

echoes our experience testing the Vote-O-Graph. When the Delayed Response mode was turned on, the touchscreen felt gummy and insensitive, so that using it was distasteful. This effect may well have played a role in the Sarasota CD13 contest, where the system vendor acknowledged a delayed response with their touchscreen.

7.2 Compressed Ballots can be Good

We observed that subjects who voted on a compressed ballot voted on more contests than any other group (see Figure 12). The increase in selections was primarily in the compressed judicial contest pages.

This increase contradicts the supposition that multiple contests on a single ballot page will increase the residual vote count [13]. During the voting session and in post-voting comments, subjects who did not receive a compressed ballot spontaneously suggested that they would have preferred a compressed judicial ballot.

8 Mitigations

Logging the number of navigation and candidate selection events for each voter helps identify unexpected results at the summary screen; however, this carries the risk of revealing abstentions from certain contests. This applies to any system that uses a linear navigation model and records both navigation and candidate selection events.

If the event log for a voter contains the minimum number of navigation events needed to cast a ballot, then all of the navigation events represent forward navigation, so it is possible to identify which page the voter was viewing at all times. In this context, every abstention will be signaled by two consecutive navigation events with no intervening candidate selection.
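This signal can be sketched as a single scan over a per-voter event stream. The event names and page-numbering convention here are illustrative assumptions, not the paper's log format:

```python
def abstention_pages(events):
    """Given an ordered event stream from a minimal-navigation session
    (the voter starts on page 1 and each "navigate" moves forward one
    page), return the pages left with no candidate selection -- i.e.,
    pages where two navigations occur with no intervening selection.
    Event names are assumed for illustration."""
    pages = []
    page = 1
    selected = False
    for kind in events:
        if kind == "select":
            selected = True
        elif kind == "navigate":
            if not selected:
                pages.append(page)  # voter left this page untouched
            page += 1
            selected = False
    return pages
```

This is exactly the leak the mitigations below are designed to close: the scan recovers per-contest abstentions from nothing but event ordering.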

An event log that reveals voter abstentions may be acceptable. The right to a secret ballot was instituted to protect voters from coercion, and it is not clear that revealing abstentions subjects voters to the same coercion risk that they would face if their selections were revealed. We take no stand on this public policy question, so in the event that leaking information about abstentions is a problem, we suggest the following mitigation measures.

One way to mask abstentions is to provide an explicit “abstain” option for all contests. Selecting this option would record a selection event identical to other candidate selections. In addition to obscuring voter abstentions, this scheme ensures that there are no unintentional undervotes.

Another option is to record only backward navigation events, since all such events are extra events. The number of backward navigation events with our navigation model is half the difference between the total number of navigation events in a voting session and the minimum number of navigation events per voted ballot. This option still leaks the total number of candidates selected on a ballot, while hiding the contests involved. If we want to guarantee the right of a voter to anonymously cast a completely blank ballot, this option will not suffice.
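The count described above follows from the fact that every backward navigation must be undone by exactly one extra forward navigation, so backward events account for half of the surplus over the minimum. As a one-function sketch:

```python
def backward_navigations(total_nav_events, min_nav_events):
    """Backward navigation count under a linear navigation model:
    each backward step forces one extra forward step to return, so
    backward events are half the surplus over the minimum needed to
    traverse the ballot."""
    surplus = total_nav_events - min_nav_events
    assert surplus >= 0 and surplus % 2 == 0, "inconsistent event log"
    return surplus // 2
```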

If we opt not to force explicit selection of an abstention option and we wish to avoid leaking information about which voters cast blank ballots, we must not record candidate selection events. In this case, we can still learn about touchscreen miscalibration from two classes of events: candidate deselection events and background touches. We could also record, for each voting session, the average position within candidate selection and navigation buttons, provided that all buttons have the same dimensions. Similarly, we can record, for each session, the average time between displaying feedback and button release.
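Under these constraints, a per-session summary record might look like the following sketch. The field names and the record layout are our own illustrative assumptions, not a proposed log standard:

```python
from dataclasses import dataclass

@dataclass
class SessionSummary:
    """Privacy-preserving per-session aggregates: no per-contest
    selection events are retained, only counts and averages that do
    not reveal which contests a voter skipped."""
    deselection_count: int
    background_touch_count: int
    mean_button_rel_y: float       # avg touch height within buttons, px
    mean_update_release_ms: float  # avg feedback-to-release time

def summarize(rel_ys, update_release_ms, deselections, background):
    """Collapse raw per-touch measurements into one summary record."""
    return SessionSummary(
        deselection_count=deselections,
        background_touch_count=background,
        mean_button_rel_y=sum(rel_ys) / len(rel_ys),
        mean_update_release_ms=sum(update_release_ms) / len(update_release_ms),
    )
```

Such a record still supports the miscalibration and insensitivity metrics described earlier, while discarding the event ordering that would expose abstentions.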

Note that all of these mitigation options except for explicit abstention eliminate the opportunity to estimate which contest was the source of trouble in a problem ballot. It is not always possible to detect which page is problematic with the primary technique we proposed, since we do not differentiate forward and backward navigation. However, in cases where a subject navigates back, removes a selection, and then makes a new selection, it can be inferred that they must have navigated backward, since they could not remove a selection on a contest the first time they see it.


9 Future Work

Further examination of the impact on selection rates using compressed ballots is needed. Several authors have suggested that displaying multiple contests on a screen can increase undervotes [3, 13]. However, some have noted that undervote rates increased on longer ballots, suggesting that some voters may become fatigued. Our results suggest that compressing ballots is indeed valuable, but we hesitate to make specific ballot design recommendations without further work.

Some of our experimental groups are too small to allow us to draw firm conclusions. We only had one subject who clearly exhibited banner blindness. It would be useful to enlarge this experimental group significantly.

In testing the impact of touchscreen sensitivity, it would be useful to use a force-sensing screen so that we could directly measure the force-versus-time behavior during touch events. This would be particularly valuable in a study of the conditions under which some users tap the screen without waiting for a response, while others hold their touch until they see a response.

We required voters to navigate linearly through the entire ballot before visiting the summary screen, which served as a menu for navigating back to contests to correct errors. We do not know how many voters found the change of navigation scheme from linear to menu-based to be confusing. Experiments with other approaches to ballot navigation are clearly needed.

Once voters had selected the maximum number of candidates permitted in a contest, we required explicit deselection of a candidate before another could be selected. We noted that this caused difficulty for some voters attempting to change their selections. There are other models. Consider, for example, first-in-first-out selection, where, after a voter selects the maximum permitted number of candidates, additional selections cause deselection of the oldest previous selection. The impact of such alternative models should be explored.
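The first-in-first-out model can be sketched in a few lines; this is an illustrative model of the alternative described above, not the Vote-O-Graph implementation:

```python
from collections import deque

class FifoContest:
    """First-in-first-out selection: once a contest's maximum is
    reached, a new selection silently evicts the oldest one instead
    of requiring explicit deselection."""
    def __init__(self, max_selections):
        self.max_selections = max_selections
        self.selections = deque()

    def select(self, candidate):
        if candidate in self.selections:
            return  # already selected; ignore repeated touches
        if len(self.selections) == self.max_selections:
            self.selections.popleft()  # evict the oldest selection
        self.selections.append(candidate)
```

One open usability question with this model is whether voters notice the silent eviction, which is exactly the kind of behavior such experiments would need to probe.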

10 Conclusion

Our study demonstrates several types of changes to voter behavior under different circumstances, which lead us to make new recommendations for both event logs and ballot layout. The increase in votes cast on judicial retention contests in the Compressed Ballot group demonstrates that placing multiple contests on a page can decrease undervotes in some circumstances, particularly when the contests are of the same type.

The changes we observed in voter behavior when using a malfunctioning system can be used to develop a set of decision rules to help identify those malfunctions, which should lead to new requirements for voting system event logs that increase the likelihood that a post-election audit could properly identify abnormalities in voting system behavior.

Recording the frequency at which voters navigate back to certain contests from the review screen helps identify contests which were undervoted due to poor ballot design. It could also indicate the presence of a dishonest ballot design. When voters navigate back to contests they have previously visited, this indicates that something is wrong. Touchscreen miscalibration, particularly downward miscalibration, leads to an increased backward navigation rate, but a dishonest presentation on the summary screen leads to a far greater effect.

We can distinguish the effect of touchscreen miscalibration from a dishonest voting machine by looking at the background touch rate and changes in the average touch location in the direction of the miscalibration. In addition, the average vertical position of a touch relative to the button touched is a sensitive measure of the quality of touchscreen calibration.

The interval of time between the visual feedback from a touch and the finger release is an effective measure of the sensitivity of the touchscreen.

While we have not proposed specific decision rules for diagnosing problems with touchscreen voting systems, our results support a requirement that voting machine event logs include records of touch duration, location relative to the touched button, background touches (with no location information), and backward navigation events. The changes recommended in this work could significantly strengthen the routine audits of voting systems and provide investigators more tools to diagnose reported problems in elections, while preserving voters’ right to a secret ballot.

11 Acknowledgments

This material is based upon work supported by the National Science Foundation under A Center for Correct, Usable, Reliable, Auditable and Transparent Elections (ACCURATE), Grant Number CNS-0524745.

References

[1] 107TH U.S. CONGRESS. Help America Vote Act of 2002, Oct. 2002.

[2] BENWAY, J. P. Banner Blindness: What Searching Users Notice and Do Not Notice on the World Wide Web. PhD thesis, Rice University, Houston, TX, USA, Apr. 1999.

[3] COMMISSION ON LAW AND AGING, STANDING COMMITTEE ON ELECTION LAW, AND COMMISSION ON MENTAL AND PHYSICAL DISABILITY LAW. Report to the House of Delegates — Approved by the ABA House of Delegates on August 13, 2007. American Bar Association, Aug. 2007.


[4] COMMITTEE ON NATIONAL SECURITY SYSTEMS. National information assurance glossary. Tech. Rep. 4009, June 2006.

[5] CORDERO, A., AND WAGNER, D. Replayable voting machine audit logs. In Proceedings of the 2008 USENIX/ACCURATE Electronic Voting Technology Workshop (July 2008).

[6] DILL, D. L., AND WALLACH, D. S. Stones unturned: Gaps in the investigation of Sarasota’s disputed congressional election. Apr. 2007.

[7] EVERETT, S. P. The Usability of Electronic Voting Machines and How Votes Can Be Changed Without Detection. PhD thesis, Rice University, Houston, TX, USA, May 2007.

[8] FEDERAL ELECTION COMMISSION. Performance and test standards for punchcard, marksense and direct recording electronic voting systems. Tech. rep., Federal Election Commission, Jan. 1990.

[9] FEDERAL ELECTION COMMISSION. Voting systems performance and test standards. Tech. rep., Federal Election Commission, 2002.

[10] FRISINA, L., HERRON, M. C., HONAKER, J., AND LEWIS, J. B. Ballot formats, touchscreens, and undervotes: A study of the 2006 midterm elections in Florida. Election Law Journal 7, 1 (Mar. 2008), 25–47.

[11] JOHNSON COUNTY IOWA AUDITOR’S OFFICE. November 4, 2008 Presidential Election, Nov. 2008.

[12] JONES, D. W. Observations and recommendations on pre-election testing in Miami-Dade County. Sept. 2004.

[13] KIMBALL, D. C., AND KROPF, M. Voting technology, ballot measures, and residual votes. American Politics Research 36, 4 (2008), 479–509.

[14] KOHNO, T., STUBBLEFIELD, A., RUBIN, A. D., AND WALLACH, D. S. Analysis of an electronic voting system. In Proceedings of the 2004 IEEE Symposium on Security and Privacy (May 2004).

[15] MACKENZIE, I. S., AND WARE, C. Lag as a determinant of human performance in interactive systems. In Proceedings of the INTERACT ‘93 and CHI ‘93 Conference on Human Factors in Computing Systems (Apr. 1993), pp. 488–493.

[16] MCCLURE, N. L., WIELAND, R. D., BABBITT, V. L., AND NICHOLS, R. A. Precinct voting system, June 2006.

[17] MCTAMMANY, J. Balloting device, Aug. 1893. US Patent no. 502,743.

[18] MOFFATT, K. A., AND MCGRENERE, J. Slipping and drifting: Using older users to uncover pen-based target acquisition difficulties. In Proceedings of the 9th International ACM SIGACCESS Conference on Computers and Accessibility (Oct. 2007), pp. 11–18.

[19] PAGENDARM, M., AND SCHAUMBURG, H. Why are users banner-blind? The impact of navigation style on the perception of web banners. Journal of Digital Information 2, 1 (2001).

[20] U.S. ELECTION ASSISTANCE COMMISSION. Voluntary voting system guidelines. Tech. rep., 2005.

[21] YASINSAC, A., WAGNER, D., BISHOP, M., BAKER, T., DE MEDEIROS, B., TYSON, G., SHAMOS, M., AND BURMESTER, M. Software review and security analysis of the ES&S iVotronic 8.0.1.2 voting machine firmware, final report. Tech. rep., Security and Assurance in Information Technology Laboratory, Florida State University, Feb. 2007.

[22] YEE, K.-P. Prerendered user interfaces for higher-assurance electronic voting. In Proceedings of the 2006 USENIX/ACCURATE Electronic Voting Technology Workshop (Aug. 2006).

[23] YEE, K.-P. Building Reliable Voting Machine Software. PhD thesis, University of California at Berkeley, Berkeley, CA, USA, Dec. 2007.

[24] YEE, K.-P. Extending prerendered-interface voting software to support accessibility and other ballot features. In Proceedings of the 2007 USENIX/ACCURATE Electronic Voting Technology Workshop (Aug. 2007).

[25] YEE, K.-P. Pvote. http://pvote.org/, Mar. 2007. Version 1.0 (beta).

