On the Viability of CAPTCHAs for Use in Telephony Systems: A Usability Field Study

Niharika Sachdeva*, Nitesh Saxena, Ponnurangam Kumaraguru*

University of Alabama, Birmingham; *IIIT-Delhi
Information Security Conference, 2013 (Nov 13 – 15)

Overview
- Motivation
- Research question
- Study design
- Experimental setup
- Participants
- Results
- Guidelines


Some Attacks

CAPTCHA

- Completely Automated Public Turing test to tell Computers and Humans Apart

Google ReCAPTCHA

Yahoo Math Function

Is It Really Useful?

- Frustrating
- Lack of incentive
- Hard to recognize
- Difficult to solve
- Native language


E. Bursztein, S. Bethard, C. Fabry, J. Mitchell, and D. Jurafsky. How Good Are Humans at Solving CAPTCHAs? A Large Scale Evaluation. SP ’10, pages 399–413.
J. Yan and A. Ahmad. Usability of CAPTCHAs Or usability issues in CAPTCHA design. In Symposium On Usable Privacy and Security, pages 44–52, 2008.

But CAPTCHA Continues to Rule

CAPTCHA for RoboCalls


Audio CAPTCHA: A Solution?

- Yahoo
- Google
- Patent CAPTCHA


Research Question

- Quantify the amount of inconvenience CAPTCHAs cause to users.

- How do different CAPTCHA features, e.g., duration, size, and character set, influence users' performance?

- H1: Users' responses will be close to the expected (correct) answers even though the overall CAPTCHA-solving accuracy is low.

- H2: Accuracy of answering the CAPTCHA correctly on telephony decreases as the number of key presses required increases.

- H3: Users will take more time responding to a CAPTCHA that requires more key presses than to one requiring fewer key presses.


Study Design


Latin Square

Polakis, G. Kontaxis, and S. Ioannidis. CAPTCHuring Automated (Smart) Phone Attacks. In SYSSEC, 2011.
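The slide names a Latin square design but not how it was built. A common construction is cyclic rotation, which puts every scheme in every serial position exactly once. The sketch below assumes the nine schemes from the features table and a round-robin assignment of participants to rows; `cyclic_latin_square` and `order_for_participant` are illustrative names, not the study's code.

```python
"""Cyclic Latin square for counterbalancing CAPTCHA order (illustrative sketch)."""

# Assumption: one row per presentation order over the nine schemes in the
# features table; the slides do not say how the study built its square.
SCHEMES = ["Google", "Ebay", "Yahoo", "Recaptcha", "Slashdot",
           "CD", "Math-function", "RPC", "C+CD"]


def cyclic_latin_square(items):
    """Row r is the item list rotated left by r positions.

    Each item appears exactly once in every row and every column,
    so each scheme occupies each serial position once per square.
    """
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


def order_for_participant(participant_index, square):
    """Hypothetical round-robin assignment of participants to rows."""
    return square[participant_index % len(square)]


if __name__ == "__main__":
    square = cyclic_latin_square(SCHEMES)
    # With 90 participants and 9 rows, each order would be used 90 / 9 = 10 times.
    for p in range(3):
        print(p, order_for_participant(p, square))
```

A cyclic square balances position but not which scheme immediately precedes which; if carry-over effects matter, a Williams design is the usual refinement.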

CAPTCHA Features

CAPTCHA        Char. set  Word  Repeat  Duration (s)  Noise  Voice  Beep  Min length  Max length
Google         0-9        No    Yes     34.4          Yes    M      Yes   5           15
Ebay           0-9        No    No      3.7           Yes    V      No    6           6
Yahoo          0-9        No    No      18.0          Yes    Child  No    6           8
Recaptcha      a-z        Yes   No      10.6          Yes    F      No    6           6
Slashdot       a-z        Yes   No      2.9           No     M      No    1           1
CD             1-5        No    No      14            Yes    M      No    1           1
Math-function  0-9        No    No      6.0           No     M      No    4           3
RPC            0-9        No    No      20.0          No     M      No    3           2
C+CD           0-9        No    No      14.0          No     M      No    4           3

M = Male; F = Female; V = Various voices

Deployment

[Architecture diagram. Components: source (legitimate or malicious); PSTN, cellular network, and VoIP (IP phone); Linksys Gateway SPA 3102; Linux server acting as CAPTCHA shield (with FreeSWITCH); Java application; database; file system; IVRS playing the CAPTCHA.]
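As its name suggests, the CAPTCHA shield challenges a caller before the call goes through. The slide does not specify the screening rules, so the sketch below assumes a random six-digit challenge, exact-match verification, and up to three attempts. `screen_call` and the `play_and_collect` callback are hypothetical stand-ins for the IVRS step; they do not correspond to any FreeSWITCH API.

```python
"""Simulated CAPTCHA-shield screening loop (sketch, not a FreeSWITCH API)."""
import random
import string


def screen_call(play_and_collect, rng, length=6, max_attempts=3):
    """Challenge a caller; return True if the call should be forwarded.

    `play_and_collect(expected)` is a hypothetical stand-in for the IVR step
    that plays an audio rendering of `expected` and returns the DTMF digits
    the caller pressed ("" if the caller skipped or timed out).
    """
    for _ in range(max_attempts):
        expected = "".join(rng.choice(string.digits) for _ in range(length))
        response = play_and_collect(expected)
        if response == expected:
            return True   # pass: connect the call
    return False          # fail: block the call (e.g., a likely robocall)


if __name__ == "__main__":
    rng = random.Random(42)
    print(screen_call(lambda expected: expected, rng))  # ideal human caller
    print(screen_call(lambda expected: "", rng))        # caller that never answers
```

Passing the verifier in as a callback keeps the screening policy testable without a telephony stack; a real deployment would replace it with the IVR's play-and-collect step.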

Participants

- 90 participants
- Five cities: Delhi, Mumbai, Chennai, Noida, Vellore

- Real-world deployment

Results: Accuracy


CAPTCHA        Category         Accuracy (%)  Skip (%)
CD             Telephony        18.71         35.67
Math-Function  Telephony        17.47         26.51
RPC            Telephony        15.47         40.33
C+CD           Telephony         4.57         40.10
Ebay           Web (Number)      8.75         13.13
Google         Web (Number)      0.00         43.83
Yahoo          Web (Number)      7.74         20.24
ReCaptcha      Web (Alphabet)    0.00         46.07
Slashdot       Web (Alphabet)   13.73         30.06
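The slide lists per-scheme figures only. As a reading aid, the sketch below averages them per category, weighting each scheme equally. These are not per-participant means and are not reported in the study. With these numbers, the telephony schemes average about 14% accuracy versus about 6% for the web schemes.

```python
"""Unweighted per-category means of the accuracy table (reading aid only)."""
from statistics import mean

# (scheme, category, accuracy %, skip %) copied from the table above
ROWS = [
    ("CD", "Telephony", 18.71, 35.67),
    ("Math-Function", "Telephony", 17.47, 26.51),
    ("RPC", "Telephony", 15.47, 40.33),
    ("C+CD", "Telephony", 4.57, 40.10),
    ("Ebay", "Web", 8.75, 13.13),
    ("Google", "Web", 0.00, 43.83),
    ("Yahoo", "Web", 7.74, 20.24),
    ("ReCaptcha", "Web", 0.00, 46.07),
    ("Slashdot", "Web", 13.73, 30.06),
]


def category_means(rows):
    """Average accuracy and skip rate over schemes, each scheme weighted equally."""
    groups = {}
    for _scheme, category, accuracy, skip in rows:
        groups.setdefault(category, []).append((accuracy, skip))
    return {
        category: (mean(a for a, _ in vals), mean(s for _, s in vals))
        for category, vals in groups.items()
    }


if __name__ == "__main__":
    for category, (acc, skip) in category_means(ROWS).items():
        print(f"{category}: accuracy {acc:.2f}%, skip {skip:.2f}%")
```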

Results: Time taken


CAPTCHA        Category         Time (s)
CD             Telephony         96.11
Math-Function  Telephony         90.23
RPC            Telephony        147.44
C+CD           Telephony        109.59
Ebay           Web (Number)      80.25
Google         Web (Number)     123.49
Yahoo          Web (Number)      95.88
ReCaptcha      Web (Alphabet)   120.64
Slashdot       Web (Alphabet)   122.57

Results: H1

- H1: Users' responses will be close to the expected (correct) answers even though the overall CAPTCHA-solving accuracy is low.

[Figure: number of CAPTCHAs (0-50) by edit distance (1-7) between the response and the correct answer, for Yahoo!, eBay, Google, Slashdot, RPC, Math-Function, CD, and C+CD.]
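The H1 figure bins responses by edit distance, but the slides do not say which edit distance was used. The sketch below assumes the standard Levenshtein distance over the digit strings a caller enters. The sample pairs are hypothetical, not study data.

```python
"""Levenshtein edit distance between an expected answer and a keypad response."""
from collections import Counter


def edit_distance(expected: str, response: str) -> int:
    """Minimum number of single-key insertions, deletions, or substitutions."""
    prev = list(range(len(response) + 1))  # distances from the empty prefix
    for i, e in enumerate(expected, start=1):
        curr = [i]
        for j, r in enumerate(response, start=1):
            curr.append(min(
                prev[j] + 1,             # expected digit the caller missed
                curr[j - 1] + 1,         # extra digit the caller pressed
                prev[j - 1] + (e != r),  # match, or wrong digit pressed
            ))
        prev = curr
    return prev[-1]


def distance_histogram(pairs):
    """Count responses by edit distance, as on the H1 figure's x-axis."""
    return Counter(edit_distance(e, r) for e, r in pairs)


if __name__ == "__main__":
    # Hypothetical (expected, entered) pairs, not study data.
    sample = [("482913", "482913"), ("482913", "48291"), ("7", "77"), ("36", "63")]
    print(distance_histogram(sample))
```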


Math function, but we noticed a negative relationship (correlation coefficient r = -0.47) for web-based captcha. Finally, we found a significant difference (t-test, t-value = 5.30, p-value < 0.001) between the expected key presses (Average DTMF) and accuracy; the statistical results show that these two were independent of each other. The results mentioned above do not support our hypothesis H2.
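The text reports correlation coefficients without naming the method. The sketch below assumes the usual Pearson r. The usage block applies it to the per-scheme aggregates in Table 3. The paper's r = -0.47 was likely computed on individual responses, so these aggregates need not reproduce it.

```python
"""Pearson correlation coefficient (assumed to be the r reported above)."""
from math import sqrt


def pearson_r(xs, ys):
    """Sample Pearson r between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a sequence is constant")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Web-based schemes from Table 3: Avg. DTMF vs. accuracy (%), per scheme.
    avg_dtmf = [6.00, 6.36, 7.09, 15.34, 64.93]
    accuracy = [8.75, 0.00, 7.74, 13.73, 0.00]
    print(round(pearson_r(avg_dtmf, accuracy), 2))
```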

Table 3: Average number of DTMF inputs expected for each captcha (Avg. DTMF), accuracy, time, and average number of DTMF inputs entered by users (Avg. User DTMF) for each captcha.

Scheme         Category   Avg. DTMF  Accuracy (%)  Time (s)  Avg. User DTMF
CD             Telephony   1.00      18.71          96.11     1.76
Math-function  Telephony   2.05      17.47          90.23     2.71
RPC            Telephony   3.00      15.47         147.44     3.92
C + CD         Telephony   2.06       4.57         109.59     2.65
Ebay           Web         6.00       8.75          80.25     3.85
Google         Web         6.36       0.00         123.49     4.68
Yahoo          Web         7.09       7.74          95.88     4.99
Slashdot       Web        15.34      13.73         120.64     6.02
ReCaptcha      Web        64.93       0.00         122.57    10.97

H3 – Time vs. number of key presses: Table 3 shows that users spent varying amounts of time submitting a comparable number of DTMF responses. For example, the average time spent on Google was 123.49 seconds (min: 17.15, max: 341.21), whereas for Yahoo it was 95.88 seconds (min: 25.88, max: 278.00), although both had the same average number of DTMF inputs (5). There was a significant difference between the time taken to solve Google and Yahoo! (t-test, t-value = -12.39, p-value < 0.01). Further, we found a correlation (r = 0.85) between time spent and DTMF input for the Math-function captcha, suggesting that the increase in time was proportional to the DTMF input. However, this correlation dropped to r = 0.56 for web-based captcha, implying the absence of any strong relationship between time and DTMF input. Overall, the results from our study suggest a lack of any strong relationship between the time participants spent solving a captcha and the number of DTMF inputs they entered: the correlation between the time spent answering the captcha and the users' DTMF responses was 0.36 across all the captcha used in our study. We found a significant difference (t-test, t-value = 4.33, p-value = 0.00045) between the number of key presses (Average User DTMF) and accuracy, suggesting that these two were independent of each other. We further tested whether the duration for which a captcha is played influences accuracy, but found that exposing users to a captcha for longer did not improve solving accuracy. Figure 5 shows that the average play time of the numeric web-based captcha (eBay, Yahoo, Google) varied from as low as 3.7 to 34.4 seconds, even though all of them required a similar number of DTMFs to be recognized. The Google captcha repeated the challenge in each attempt without users explicitly asking for it; despite this, the correct response rate was 0% for Google, compared with 8.75% for eBay.
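The paragraph reports t-values but does not say which t-test was used. For two independent groups of solving times, Welch's unequal-variance test is one reasonable choice, and the sketch below assumes it. If the same participants solved both CAPTCHAs, a paired test would be more appropriate. For a p-value, SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same test.

```python
"""Welch's two-sample t statistic and degrees of freedom (sketch)."""
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for independent samples a and b with unequal variances."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical solving times in seconds, not study data.
    google = [118.2, 131.0, 97.5, 150.3, 120.9]
    yahoo = [92.4, 101.7, 88.0, 99.3, 97.1]
    print(welch_t(google, yahoo))
```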


Results: H2

- H2: Accuracy of answering the CAPTCHA correctly on telephony decreases as the number of key presses required increases.

Results: H3

- H3: Users will take more time responding to a CAPTCHA that requires more key presses than to one requiring fewer key presses.

[Figure: key presses (#), accuracy (%), and average play time (s) for the web-based CAPTCHAs eBay, Google, Yahoo, Slashdot, and ReCaptcha; y-axis 0-80.]


User Experience

Figure 6: Users reported the system to be complex and not usable, suggesting the need for technical help when using captcha over telephony. [Chart: share of responses (0-100%) from Strongly disagree to Strongly agree for Complexity, Frequently use, Confidence, and Technical help.]

types. Users took the least time, 80.25 seconds, to solve the eBay captcha. There was no significant difference between the time users took to solve web-based and telephony-based captcha. We found that the most time-consuming captcha on telephony was the RPC captcha, with an average solving time of 147.44 seconds. Among the web-based captcha, Google, ReCaptcha, and Slashdot were the most time-consuming, with means greater than 120 seconds (min: 120.64 seconds, max: 123.49 seconds). On analyzing existing studies on the web, we found that Google and ReCaptcha were the most time-consuming, with a mean value greater than 25 seconds. However, users took on average 12 seconds to solve the Slashdot captcha on the web [8], compared with 122.57 seconds on telephony. This increase in solving time indicates that the inconvenience users face in solving captcha on telephony is greater than on the web.

6.1.3 User Experience

As the ultimate assessment metric for the inconvenience caused, we study the feedback provided by the participants on the different captcha types. About 50% of the participants found the system (which users called to answer captcha in our study) extremely complex to use, and only 17% felt confident using it (as shown in Figure 6). Users' feedback suggests that they did not like the alphabet captcha at all. When asked which captcha they would prefer (numeric, alphabetic, or math function), only 14.44% of the users preferred the alphabet audio challenge. Most of the users (52.22%) preferred numeric captcha (Contextual Degradation, Random menu captcha, and numeric web-based captcha), and 33.33% favored captcha involving math functions, i.e., math-function and math-function with contextual degradation. We also analyzed whether age has any effect on the participants' captcha preference. Participants in the age group of 36 to 50 did not wish to use the alphabet captcha at all, whereas numeric captcha was appreciated the most across all age groups.
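The three preference shares sum to 99.99% because of rounding. If all 90 participants answered the question, which the source does not state explicitly, the shares correspond to whole-person counts. The sketch checks this round trip, and the same check works for the other percentages in this section. For example, 70% agreeing on headphones is 63 of 90.

```python
"""Convert reported preference percentages back to participant counts."""

N_PARTICIPANTS = 90  # assumption: every participant answered this question

REPORTED = {"numeric": 52.22, "math function": 33.33, "alphabet": 14.44}


def to_counts(percentages, n):
    """Nearest whole number of participants behind each percentage."""
    return {label: round(p * n / 100) for label, p in percentages.items()}


def round_trip_ok(percentages, counts, n):
    """Check that each count reproduces its percentage to two decimals."""
    return all(round(100 * counts[k] / n, 2) == p for k, p in percentages.items())


if __name__ == "__main__":
    counts = to_counts(REPORTED, N_PARTICIPANTS)
    print(counts, sum(counts.values()))
    print(round_trip_ok(REPORTED, counts, N_PARTICIPANTS))
```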

Next, we wanted to analyze whether participants felt that using speakerphones or headphones would help them solve the captcha better. We found that 30% of the users disagreed and 8.89% strongly disagreed, feeling that speakerphones would be of no use. A participant mentioned, “the voice recording was not clear, therefore any audio accessory would not have helped,” disagreeing with the use of speakerphones. Some participants supported the use of speakerphones, with 24.44% agreeing and 8.89% strongly agreeing. We also asked the participants whether they felt the use of headphones would help them solve the captcha better. The results show that 15.56% of the users strongly agreed and 70% agreed that using headphones would help them respond better to the challenge. The users also complained about the audio captcha not being clear and being less audible.

Figure 7: Participants in various age groups found the numeric captcha to be most usable, whereas the alphabet captcha was least preferred by the participants. [Chart: participants (%) per age group (18-24, 25-35, 36-50, 51-65) for Numeric, Math function, and Alphabets.]

In order to understand the user perspective on the mode of input, we asked the users whether they would prefer to respond verbally to the captcha challenge; 36.67% of the users agreed with the statement and 28.89% strongly agreed. This suggests that entering responses using a keypad is difficult and causes trouble for respondents; this might be one of the primary reasons for errors in the captcha responses, leading to the low solving accuracy in the study [25]. We therefore suggest further research to investigate what makes an audio captcha easier to answer: a verbal response or a keypad touch.

System Usability Scale (SUS). We calculated the SUS score for the telephony captcha as 38.42, which is extremely low.⁹ We further analyzed the raw comments from the users. The users mentioned the following problems with audio captcha on telephony: accent, clarity of the voice, and noise. Participants (35.71%) complained about the voice not being clear; one participant commented, “It sounded like ghost voices. I was not able to understand almost any utterance,” with another user feeling that “The voice is not clear and the words are not distinctly spoken.” Around 8.92% explicitly complained about the accent in the voice captcha, stating, “at least the accent should be better,” and another participant mentioned, “with Accent better, the system is good to use.” Another issue mentioned was noise: participants (17.85%) complained that the noise in the audio was very disturbing. A participant commented, “too many disturbances and the accent was bad and 80% of time I couldn’t understand,” and another

⁹ The SUS score of an average usable system is 68 (http://www.measuringusability.com/sus.php).
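The excerpt gives only the final SUS score (38.42) and the benchmark of 68, not item-level responses. The sketch below shows the standard SUS scoring recipe. It assumes the study used the usual ten-item, five-point questionnaire, which the excerpt does not state. It is the general method, not a reproduction of the study's computation.

```python
"""Standard System Usability Scale (SUS) scoring (sketch)."""
from statistics import mean


def sus_score(responses):
    """Score one completed SUS questionnaire: ten items rated 1-5.

    Odd-numbered items are positively worded (contribute rating - 1);
    even-numbered items are negatively worded (contribute 5 - rating).
    The 0-40 total is scaled by 2.5 to a 0-100 score.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS expects ten ratings on a 1-5 scale")
    total = sum((r - 1) if item % 2 == 1 else (5 - r)
                for item, r in enumerate(responses, start=1))
    return total * 2.5


def study_sus(all_responses):
    """Mean SUS score across participants."""
    return mean(sus_score(r) for r in all_responses)
```

Because the score is a mean of per-participant scores, a study-level 38.42 says little about its spread; reporting the distribution alongside it is a common practice.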


- User friendliness


User Experience

- User-preferred scheme

Guidelines

- One-time instruction
- Loss/error tolerant
- Feedback
- Verbal responses
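The "Loss/error tolerant" guideline could be realized in several ways. One possibility, in the spirit of H1, is to accept a response that is within one keypad error of the answer: one missed, extra, or wrong digit. The sketch below is a hypothetical illustration; `within_one_edit` is not from the study. Tolerance has a security cost. For a six-digit numeric challenge, a random six-digit guess is accepted for 1 + 6 × 9 = 55 of the 10^6 possible answers instead of 1.

```python
"""One-error-tolerant answer check for keypad CAPTCHAs (hypothetical sketch)."""


def within_one_edit(expected: str, response: str) -> bool:
    """True if response equals expected up to one insertion, deletion, or substitution."""
    short, long_ = sorted((expected, response), key=len)
    if len(long_) - len(short) > 1:
        return False
    i = j = edits = 0
    while i < len(short) and j < len(long_):
        if short[i] == long_[j]:
            i += 1
            j += 1
            continue
        edits += 1
        if edits > 1:
            return False
        if len(short) == len(long_):
            i += 1  # substitution: advance in both strings
        j += 1      # an extra digit in the longer string is skipped
    edits += len(long_) - j  # a trailing extra digit counts as one edit
    return edits <= 1


if __name__ == "__main__":
    print(within_one_edit("482913", "48291"))   # one missed digit
    print(within_one_edit("482913", "319284"))  # far from the answer
```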

Thank you! Questions?

For any further information, please write to

[email protected] precog.iiitd.edu.in
