Date posted: 06-May-2015
On the Viability of CAPTCHAs for Use in Telephony Systems: A Usability Field Study
Niharika Sachdeva*, Nitesh Saxena, Ponnurangam Kumaraguru*
University of Alabama, Birmingham; *IIIT-Delhi
Information Security Conference, 2013 (Nov 13 – 15)
Overview
• Motivation
• Research question
• Study Design
• Experimental setup
• Participants
• Results
• Guidelines
Some Attacks
CAPTCHA
• Completely Automated Public Turing test to tell Computers and Humans Apart
Google ReCAPTCHA
Yahoo Math Function
Is it Really Useful?
• Frustrating
• Lack of incentive
• Hard to recognize
• Difficult to solve
• Native language
E. Bursztein, S. Bethard, C. Fabry, J. Mitchell, and D. Jurafsky. How Good Are Humans at Solving CAPTCHAs? A Large Scale Evaluation. SP '10, pages 399–413.
J. Yan, A. Ahmad. Usability of CAPTCHAs Or usability issues in CAPTCHA design. In Symposium On Usable Privacy and Security, pages 44–52, 2008.
But CAPTCHA continues to Rule
CAPTCHA for RoboCalls
Audio CAPTCHA a solution?
• Yahoo
• Google
• Patent CAPTCHA
Research Question
• Quantify the amount of inconvenience CAPTCHA causes to users.
• How do different features of a CAPTCHA, e.g. duration, size, and character set, influence users' performance?
- H1: Users' responses are close to the expected (correct) answers even though the overall CAPTCHA solving accuracy is low.
- H2: Accuracy of answering the CAPTCHA correctly on telephony decreases as the number of key presses required increases.
- H3: Users will take more time responding to a CAPTCHA that requires more key presses than to one requiring fewer key presses.
Study Design
Latin Square
Polakis, G. Kontaxis, and S. Ioannidis. CAPTCHuring Automated (Smart) Phone Attacks. In SYSSEC, 2011.
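The slide names a Latin square design but does not show how it was built. As a minimal sketch, assuming a simple cyclic construction and an assumed rule that participant k receives row k mod n, the presentation order of the nine schemes could be counterbalanced like this (the function names are illustrative, not taken from the study):

```python
def latin_square(items):
    """Return a cyclic Latin square: row i is `items` rotated left by i."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


# Scheme names as listed in the CAPTCHA Features table.
SCHEMES = ["Google", "Ebay", "Yahoo", "Recaptcha", "Slashdot",
           "CD", "Math-function", "RPC", "C+CD"]


def order_for_participant(participant_index, schemes=SCHEMES):
    """Assumed assignment rule: participant k gets row k mod n of the square."""
    square = latin_square(schemes)
    return square[participant_index % len(schemes)]


if __name__ == "__main__":
    for k in range(3):
        print(k, order_for_participant(k))
```

Every scheme appears exactly once in each row and once in each column, so no scheme is systematically heard first or last. Under this assumed rule, 90 participants and 9 schemes would use each row 10 times.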
CAPTCHA Features
CAPTCHA        Char. set  Word  Repeat  Duration (s)  Noise  Voice  Beep  Min length  Max length
Google         0-9        No    Yes     34.4          Yes    M      Yes   5           15
Ebay           0-9        No    No      3.7           Yes    V      No    6           6
Yahoo          0-9        No    No      18.0          Yes    Child  No    6           8
Recaptcha      a-z        Yes   No      10.6          Yes    F      No    6           6
Slashdot       a-z        Yes   No      2.9           No     M      No    1           1
CD             1-5        No    No      14            Yes    M      No    1           1
Math-function  0-9        No    No      6.0           No     M      No    4           3
RPC            0-9        No    No      20.0          No     M      No    3           2
C+CD           0-9        No    No      14.0          No     M      No    4           3
M = Male; F = Female; V = Various voices
Deployment
[Architecture diagram. Components shown: source (legitimate or malicious); PSTN, cellular network, and VoIP (IP phone); Linksys Gateway SPA 3102; Linux server acting as CAPTCHA shield (with FreeSWITCH); Java application; database; file system; IVRS playing the CAPTCHA.]
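The diagram lists the components but not the screening logic. A minimal sketch of what a CAPTCHA shield might do for a numeric challenge is given below. It does not use the FreeSWITCH API: `collect_dtmf` is a hypothetical callback standing in for the IVR step that plays the audio and returns the keys pressed, and the retry limit is an assumption.

```python
import secrets

DTMF_DIGITS = "0123456789"


def make_challenge(length=4):
    """Generate a random digit string to be read out to the caller."""
    return "".join(secrets.choice(DTMF_DIGITS) for _ in range(length))


def screen_call(challenge, collect_dtmf, max_attempts=3):
    """Return True (let the call through) if the caller keys in the challenge.

    `collect_dtmf(challenge)` is a hypothetical stand-in for the IVR playing
    the challenge and returning the DTMF keys pressed ("" if skipped).
    `max_attempts` is an assumed retry limit, not a value from the study.
    """
    for _ in range(max_attempts):
        if collect_dtmf(challenge) == challenge:
            return True
    return False


if __name__ == "__main__":
    challenge = make_challenge()
    # Simulated caller who always answers correctly.
    print(screen_call(challenge, lambda played: played))
```

Passing the IVR step in as a callback keeps the decision logic testable without a telephony stack.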
Participants
• 90 participants
• Five cities
  - Delhi
  - Mumbai
  - Chennai
  - Noida
  - Vellore
• Real world deployment
Results: Accuracy
CAPTCHA        Category        Accuracy (%)  Skip (%)
CD             Telephony       18.71         35.67
Math-Function  Telephony       17.47         26.51
RPC            Telephony       15.47         40.33
C+CD           Telephony       4.57          40.10
Ebay           Web (Number)    8.75          13.13
Google         Web (Number)    0.00          43.83
Yahoo          Web (Number)    7.74          20.24
ReCaptcha      Web (Alphabet)  0.00          46.07
Slashdot       Web (Alphabet)  13.73         30.06
Results: Time taken
CAPTCHA        Category        Time (s)
CD             Telephony       96.11
Math-Function  Telephony       90.23
RPC            Telephony       147.44
C+CD           Telephony       109.59
Ebay           Web (Number)    80.25
Google         Web (Number)    123.49
Yahoo          Web (Number)    95.88
ReCaptcha      Web (Alphabet)  120.64
Slashdot       Web (Alphabet)  122.57
Results: H1
• H1: Users' responses are close to the expected (correct) answers even though the overall CAPTCHA solving accuracy is low.
[Figure: number of CAPTCHAs (y-axis, 0–50) by edit distance (x-axis, 1–7) for Yahoo!, eBay, Slashdot, RPC, Math-Function, CD, and C+CD.]
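H1 is evaluated with the edit distance between the expected answer and what the user keyed in. As a sketch, assuming the standard Levenshtein definition (the slides do not say which variant was used), the distance and a per-distance count like the one plotted can be computed as follows; the sample responses are made up for illustration:

```python
from collections import Counter


def edit_distance(expected, entered):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(entered) + 1))
    for i, exp_ch in enumerate(expected, start=1):
        curr = [i]
        for j, ent_ch in enumerate(entered, start=1):
            cost = 0 if exp_ch == ent_ch else 1
            curr.append(min(prev[j] + 1,          # drop a char of `expected`
                            curr[j - 1] + 1,      # insert a char of `entered`
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def distance_histogram(pairs):
    """Count responses per edit distance for (expected, entered) pairs."""
    return Counter(edit_distance(exp, ent) for exp, ent in pairs)


if __name__ == "__main__":
    # Hypothetical responses, not data from the study.
    sample = [("483920", "483920"), ("483920", "483928"), ("483920", "48392")]
    print(sorted(distance_histogram(sample).items()))
```

A response at distance 1 or 2 counts as wrong under exact matching, which is why accuracy can be low while answers are still close to correct.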
Math function, but we noticed a negative relationship with correlation coefficient r = -0.47 for web-based captcha. Finally, we found a significant difference (t-test, t-value = 5.30, p-value < 0.001) between expected key presses (Average DTMF) and accuracy; the statistical results show that these two were independent of each other. The results mentioned above do not support our hypothesis H2.
Table 3: Average DTMF expected for each captcha (Avg. DTMF), accuracy, time, and average DTMF input by users (Avg. User DTMF).
Scheme         Category   Avg. DTMF  Accuracy (%)  Time (s)  Avg. User DTMF
CD             Telephony  1.00       18.71         96.11     1.76
Math-function  Telephony  2.05       17.47         90.23     2.71
RPC            Telephony  3.00       15.47         147.44    3.92
C + CD         Telephony  2.06       4.57          109.59    2.65
Ebay           Web        6.00       8.75          80.25     3.85
Google         Web        6.36       0.00          123.49    4.68
Yahoo          Web        7.09       7.74          95.88     4.99
Slashdot       Web        15.34      13.73         120.64    6.02
ReCaptcha      Web        64.93      0.00          122.57    10.97
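The H2 analysis relies on correlation coefficients between key presses and accuracy. The sketch below shows how a Pearson coefficient is computed, applied to the four telephony rows of Table 3. It works on per-scheme averages only, whereas the paper presumably used per-response data, so it is not expected to reproduce the reported coefficients.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Telephony rows of Table 3: (scheme, Avg. DTMF, accuracy in %).
TELEPHONY = [
    ("CD", 1.00, 18.71),
    ("Math-function", 2.05, 17.47),
    ("RPC", 3.00, 15.47),
    ("C + CD", 2.06, 4.57),
]


def dtmf_vs_accuracy(rows=TELEPHONY):
    """Correlate expected key presses with accuracy across schemes."""
    return pearson_r([row[1] for row in rows], [row[2] for row in rows])


if __name__ == "__main__":
    print(f"r = {dtmf_vs_accuracy():.2f}")
```

With only four aggregated points the coefficient is a rough illustration of the method, not evidence for or against H2.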
H3 – Time vs. number of key presses: Table 3 shows that users spent varying amounts of time submitting a comparable number of DTMF responses. For example, the average time spent on Google was 123.49 seconds (min: 17.15, max: 341.21), whereas for Yahoo it was 95.88 seconds (min: 25.88, max: 278.00), although both had the same average DTMF (5) to input. There was a significant difference between the time taken to solve Google vs. Yahoo! (t-test, t-value = -12.39, p-value < 0.01). Further, we found a correlation (r = 0.85) between time spent and DTMF input for the Math-function captcha, suggesting that the increase in time was proportionate to DTMF input. However, this correlation dropped to r = 0.56 for web-based captcha, implying the absence of any strong relationship between time and DTMF input. The results from our study suggest a lack of any strong relationship between the time participants spent solving a captcha and the number of DTMF inputs from them. We found that the correlation between the time spent answering the captcha and the DTMF responses from users was 0.36 across all the captcha used in our study. We found a significant difference (t-test, t-value = 4.33, p-value = 0.00045) between the number of key presses (Average User DTMF) and accuracy, suggesting that these two were independent of each other.
We further tested whether the duration for which a captcha is played influences accuracy, but found that exposing users to a captcha for longer did not improve solving accuracy. Figure 5 shows that the average playtime of the numeric web-based captcha (eBay, Yahoo, Google) varied from as low as 3.7 to 34.4 seconds, although all of these required a similar number of DTMFs to be recognized. The Google captcha repeated the challenge in each attempt without users asking for it explicitly; irrespective of this, the correct response rate was 0% for Google and 8.75% for eBay.
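The comparison of solving times (for example Google vs. Yahoo!) is reported as a t-test, but the excerpt does not say which variant was used. As an illustration only, assuming Welch's unequal-variance form for two independent samples, the statistic can be computed as below; the timing samples are invented placeholders, not study data.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    var_a = variance(sample_a)  # sample variance, n - 1 denominator
    var_b = variance(sample_b)
    std_err = sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / std_err


if __name__ == "__main__":
    # Hypothetical solving times in seconds for two schemes.
    times_a = [118.0, 131.5, 125.2, 140.8, 109.9]
    times_b = [92.4, 101.3, 88.7, 97.6, 95.0]
    print(f"t = {welch_t(times_a, times_b):.2f}")
```

If the same participants solved both schemes, a paired test on per-participant differences would be the more natural choice; that is a design question the excerpt leaves open.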
Results: H2
• H2: Accuracy of answering the CAPTCHA correctly on telephony decreases as the number of key presses required increases.
Results: H3
• H3: Users will take more time responding to a CAPTCHA that requires more key presses than to one requiring fewer key presses.
[Chart: key presses (#), accuracy (%), and average play time (sec) for Ebay, Google, Yahoo, Slashdot, and Recaptcha; axis range 0–80.]
User Experience
Figure 6: Users reported the system to be complex and not usable, suggesting the need for technical help for using captcha over telephony. [Chart: share of responses (0%–100%) from Strongly disagree to Strongly agree for Complexity, Frequently use, Confidence, and Technical help.]
types. Users took the least time, 80.25 seconds, to solve the eBay captcha. There was no significant difference in the time taken by users to solve web-based captcha versus telephony-based captcha. We found that the most time-consuming captcha on telephony was the RPC captcha, with an average solving time of 147.44 seconds. Among the web-based captcha, Google, ReCaptcha, and Slashdot were the most time consuming, with means greater than 120 seconds (min: 120.64 seconds and max: 123.49 seconds). On analyzing the existing studies on the web, we found that Google and ReCaptcha were the most time consuming, with a mean value greater than 25 seconds. However, users took on average 12 seconds to solve Slashdot captcha on the web [8], in comparison to 122.57 seconds on telephony. The increase in solving time indicates that the inconvenience caused to users by solving captcha on telephony is greater than on the web.
6.1.3 User Experience
As the ultimate assessment metric to understand the inconvenience caused, we studied the feedback provided by the participants for different captcha types. About 50% of the participants found the system (on which users called to answer captcha in our study) extremely complex to use, and only 17% felt confident using this system (as shown in Figure 6). Users' feedback suggests that they did not like alphabet captcha at all. When asked which captcha they would prefer (numeric, alphabetic, or math function), only 14.44% of the users preferred the alphabet audio challenge. Most of the users (52.22%) preferred numeric captcha (Contextual Degradation, Random menu captcha, and numeric web-based captcha), and 33.33% of the users favored captcha involving math functions, i.e., math-function and math-function with contextual degradation. We also analyzed whether age has any effect on the captcha preference of the participants. Participants in the age group of 36 to 50 did not wish to use the alphabet captcha at all, whereas numeric captcha was appreciated the most among all age groups.
Next, we wanted to analyze whether participants felt that using speakerphones or headphones would help them solve the captcha better. We found that 30% of the users disagreed and 8.89% strongly disagreed, feeling that speakerphones would be of no use. A participant mentioned, "the voice recording was not clear therefore any audio accessory would not have helped," disagreeing with the use of speakerphones. Some participants supported the use of speakerphones, with 24.44% agreeing and 8.89% strongly agreeing. We also asked the participants whether they felt the use of headphones would help them solve the captcha better. The results show that 15.56% of the users strongly agreed and 70% agreed that using headphones would help them respond better to the challenge. The users also complained about audio captcha not being clear and being less audible.
Figure 7: Participants in various age groups found the numeric captcha to be most usable, whereas alphabet captcha was least preferred by the participants. [Chart: participants (%) by age group (18-24, 25-35, 36-50, 51-65) for numeric, math-function, and alphabet captcha.]
In order to understand the user perspective about the mode of input, we asked the users if they would prefer to respond verbally to the captcha challenge; 36.67% of the users agreed with the statement and 28.89% strongly agreed. This suggests that entering responses using a keypad is difficult and causes trouble for respondents; this might be one of the primary reasons for errors in the captcha responses, leading to low solving accuracy in the study [25]. Further, we suggest the need for further research to investigate what makes an audio captcha easier to answer: verbal response or keypad touch.
System Usability Scale (SUS).
We calculated the SUS score for the telephony captcha as 38.42, which is extremely low (footnote 9). We further analyzed the raw comments from the users. The users mentioned the following problems with audio captcha on telephony: accent, clarity of the voice, and noise. Participants (35.71%) complained about the voice not being clear; a participant commented, "It sounded like ghost voices. I was not able to understand almost any utterance," with another user feeling "The voice is not clear and the words are not distinctly spoken." Around 8.92% explicitly complained about the accent in the voice captcha, stating, "at least the accent should be better"; another participant mentioned, "with Accent better, the system is good to use." Another issue that was mentioned was noise. Participants (17.85%) complained about the noise in the audio being very disturbing. A participant commented, "too many disturbances and the accent was bad and 80% of time I couldn't understand," and another
Footnote 9: Given that SUS is 68 for an average usable system. http://www.measuringusability.com/sus.php
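For readers unfamiliar with how a SUS score such as 38.42 arises, the sketch below applies the standard SUS scoring rule to ten 1–5 Likert responses: odd-numbered items contribute (response - 1), even-numbered items contribute (5 - response), and the sum is scaled by 2.5 to a 0–100 range. Averaging the per-participant scores into a study-level figure is an assumption about how the reported number was aggregated, and the sample responses are made up.

```python
from statistics import mean


def sus_score(responses):
    """Score one participant's ten SUS answers (each 1..5) on a 0-100 scale."""
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")
    total = 0
    for item, response in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (response - 1) if item % 2 == 1 else (5 - response)
    return total * 2.5


def study_sus(all_responses):
    """Assumed aggregation: mean of the per-participant SUS scores."""
    return mean(sus_score(r) for r in all_responses)


if __name__ == "__main__":
    # Hypothetical answers from two participants, not study data.
    print(study_sus([[2, 4, 2, 4, 3, 4, 2, 5, 2, 4],
                     [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]]))
```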
• User friendliness
User Experience
• User preferred scheme
Guidelines
• One-time instruction
• Loss / error tolerant
• Feedback
• Verbal responses
Thank you! Questions?