IMAGINATION: A Robust Image-based CAPTCHA Generation System
Ritendra Datta, Jia Li, and James Z. Wang
The Pennsylvania State University, University Park
ACM International Conference on Multimedia, November 2005
What are CAPTCHAs? [1, 2]
Completely Automated Public Turing test to tell Computers and Humans Apart.
Web-based protection mechanisms
  Only humans are allowed to perform certain tasks
    Opening e-mail accounts
    Voting online, etc.
  Prevent automated attacks by bots
    To avoid eating up resources
    To avoid biasing results, etc.
Most current systems are text-based.
Figure: Text-based CAPTCHAs
1. L. von Ahn et al., CACM, 2004.
2. The CAPTCHA Project, http://www.captcha.net
Why image-based CAPTCHAs?
Computer vision techniques [1, 2, 3] have broken text-based CAPTCHAs
  Over 90% accuracy
  Makes these systems vulnerable
Solution
  More noise: harder for humans too
  Natural image-based CAPTCHAs
    Present an image to the user
    User labels content
Hard to attack
  Image recognition is a hard problem
  Hence more secure CAPTCHAs!
1. G. Mori et al., CVPR, 2003.
2. A. Thayananthan et al., CVPR, 2004.
3. G. Moy et al., CVPR, 2004.
Image-based CAPTCHAs
(Courtesy: The Captcha Project, CMU)
What’s the problem?
CBIR systems (e.g. SIMPLIcity) and automated annotation systems (e.g. ALIP) may be used to attack image-based CAPTCHAs
Solution: generate CAPTCHA images that
  Humans can easily label
  Automated systems fail on in most cases
How?
  Use systematic distortions on images
    Dithering, noise, quantizing, etc.
    Maintain low perceptual degradation
  Test using state-of-the-art automated systems
  Optimize attack rate and perceptual quality
  Generate word choices systematically to reduce ambiguity and attack chance
Figure: SIMPLIcity and ALIP (pictures courtesy Corel)
The IMAGINATION System
Image Generation for Internet Authentication.
Exploits the difference between human perception and current level of machine perception.
Generates a CAPTCHA based on a hard AI problem.
Breaking IMAGINATION, though highly unlikely, would in turn advance the state-of-the-art in AI.
Uses a two-phase click-and-annotate process to achieve a very low chance of a successful attack.
  Click phase: select the center of an image
  Annotate phase: select the best label from a list
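To see why chaining the two phases lowers the attack chance, here is a rough back-of-the-envelope sketch in Python. It assumes a bot that clicks uniformly at random and then picks a label uniformly at random; the function name, the accepted-click area fraction, and the number of word choices are all hypothetical illustration values, not figures from the IMAGINATION system.

```python
def random_guess_success(accept_area_fraction: float,
                         num_choices: int,
                         rounds: int = 1) -> float:
    """Chance that a uniformly random bot passes `rounds` click-and-annotate rounds.

    accept_area_fraction: fraction of the composite image area in which a click
        counts as "near the center of an image" (hypothetical parameter).
    num_choices: length of the word choice list shown in the annotate phase.
    """
    if not 0.0 <= accept_area_fraction <= 1.0:
        raise ValueError("accept_area_fraction must lie in [0, 1]")
    if num_choices < 1 or rounds < 1:
        raise ValueError("num_choices and rounds must be positive")
    click_ok = accept_area_fraction      # click phase, random click
    label_ok = 1.0 / num_choices         # annotate phase, random label
    # Both phases must succeed in every round, so the chances multiply.
    return (click_ok * label_ok) ** rounds


# Hypothetical setting: 10% of the area is accepted, 10 word choices.
one_round = random_guess_success(0.10, 10)
two_rounds = random_guess_success(0.10, 10, rounds=2)
```

Under these made-up numbers a single round is passed by chance about 1 time in 100, and two rounds about 1 time in 10,000. A real attacker would use CBIR or annotation tools rather than random guessing, which is what the distortions are meant to counter.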
The IMAGINATION System: Architecture
Composite Image Generation
Composite image generation by re-partitioning and dithering using different randomly chosen base colors
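The re-partition-and-dither idea can be sketched in a few lines of Python. This is a simplified illustration, not the system's actual procedure: it works on a grayscale grid of floats in [0, 1], cuts it into four rectangles at a random point, and uses Floyd-Steinberg dithering with a different random set of gray levels in each rectangle (standing in for the "randomly chosen base colors"). All function names and parameters are assumptions made for the example.

```python
import random


def dither_region(img, top, left, bottom, right, levels):
    """Floyd-Steinberg dither rows [top, bottom) x cols [left, right) onto `levels`."""
    work = [row[left:right] for row in img[top:bottom]]
    h, w = len(work), len(work[0])
    for r in range(h):
        for c in range(w):
            old = work[r][c]
            new = min(levels, key=lambda lv: abs(lv - old))  # nearest base level
            work[r][c] = new
            err = old - new
            # Push the quantization error onto unvisited neighbours in the region.
            if c + 1 < w:
                work[r][c + 1] += err * 7 / 16
            if r + 1 < h:
                if c > 0:
                    work[r + 1][c - 1] += err * 3 / 16
                work[r + 1][c] += err * 5 / 16
                if c + 1 < w:
                    work[r + 1][c + 1] += err * 1 / 16
    for r in range(h):
        img[top + r][left:right] = work[r]


def repartition_and_dither(img, rng, levels_per_region=3):
    """Return a copy of `img` (at least 2x2) cut into four random rectangles,
    each dithered with its own randomly chosen gray levels."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    row_cut = rng.randint(1, h - 1)  # new partition, independent of image borders
    col_cut = rng.randint(1, w - 1)
    for top, bottom in ((0, row_cut), (row_cut, h)):
        for left, right in ((0, col_cut), (col_cut, w)):
            levels = [rng.random() for _ in range(levels_per_region)]
            dither_region(out, top, left, bottom, right, levels)
    return out


# Usage on a small synthetic gradient standing in for a tiled composite image.
gradient = [[(r + c) / 14 for c in range(8)] for r in range(8)]
composite = repartition_and_dither(gradient, random.Random(0))
```

Because the new partition does not follow the borders of the original images, the dithering boundaries give an attacker no direct hint about where one image ends and the next begins.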
Composite Distortion Selection
How can we smartly choose the distortions applied to the images?
  Use state-of-the-art CBIR and related systems, which are potential attack weapons
  Enforce probabilistic constraints on what makes a good distortion
  Make some realistic assumptions
  Generate many distortions
  Choose a subset that satisfies these constraints
  Include that subset in the IMAGINATION system
Figure: A tiger image distorted by four acceptable composite distortions
Composite Distortions: Probabilistic Constraints
An image distortion is considered acceptable if, probabilistically, potential attack algorithms are unable to significantly reduce the uncertainty associated with labeling the distorted images.
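The slides do not give the exact form of the constraint, so the following Python sketch shows only one possible reading: measure "uncertainty" as Shannon entropy over candidate labels, and accept a distortion if an attack system's output on distorted test images lowers that entropy by no more than a threshold. The function names, the uniform prior, and the threshold are assumptions for illustration.

```python
import math


def entropy_bits(probs):
    """Shannon entropy (in bits) of a discrete label distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def is_acceptable(attacker_posteriors, max_reduction_bits):
    """Accept a distortion if the attacker gains little information on average.

    attacker_posteriors: one label distribution per distorted test image, as
        produced by some attack system (hypothetical input format).
    max_reduction_bits: largest tolerated average drop in label entropy
        (hypothetical threshold).
    """
    num_labels = len(attacker_posteriors[0])
    prior = math.log2(num_labels)  # uniform prior: attacker knows nothing
    mean_posterior = (sum(entropy_bits(p) for p in attacker_posteriors)
                      / len(attacker_posteriors))
    return prior - mean_posterior <= max_reduction_bits


# Attacker stays clueless on the distorted images: distortion accepted.
good = is_acceptable([[0.25, 0.25, 0.25, 0.25]] * 3, max_reduction_bits=0.5)
# Attacker is nearly certain of the label: distortion rejected.
bad = is_acceptable([[0.97, 0.01, 0.01, 0.01]] * 3, max_reduction_bits=0.5)
```

In this toy setting `good` is `True` and `bad` is `False`, which matches the intent stated above: a distortion is only kept if attack tools remain about as uncertain after seeing the image as before.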
Composite Distortions in IMAGINATION
Schematic view of the four composite distortions satisfying the probabilistic constraints and hence chosen for the IMAGINATION system
Word Choice Generation
User chooses instead of types
  Avoids spelling mistakes, polysemy, etc.
  More user-friendly (critical)
  But leads to a higher attack chance!
Three issues with choice list generation
  Ambiguity (e.g. Dog and Wolf)
  Attack using the word choices themselves (odd-one-out)
  Multiple valid labels
Solution
  Use the WordNet ontology
  Solve heuristically by constructing a word hyper-tetrahedron
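Figure: A word hyper-tetrahedron (K=4). The vertices are the word choices W1, W2, W3, W4; each pair of vertices (Wi, Wj) is joined by an edge labeled di,j.
Wk = word choice, k ∈ {1, …, K}
di,j = WordNet distance between Wi and Wj
Constraint: di,j ≈ δ for all pairs (i, j), i ≠ j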
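The constraint above can be illustrated with a small Python sketch that picks K words whose pairwise distances all lie as close to δ as possible. It is a brute-force search over a toy vocabulary, not the heuristic used in the system, and the distance table holds made-up numbers rather than real WordNet distances; `pick_word_choices` and `toy_distance` are hypothetical names.

```python
from itertools import combinations

# Made-up symmetric distances standing in for WordNet distances.
TOY_DISTANCES = {
    frozenset(("dog", "wolf")): 1, frozenset(("dog", "car")): 5,
    frozenset(("dog", "tree")): 5, frozenset(("dog", "river")): 6,
    frozenset(("wolf", "car")): 5, frozenset(("wolf", "tree")): 6,
    frozenset(("wolf", "river")): 6, frozenset(("car", "tree")): 5,
    frozenset(("car", "river")): 4, frozenset(("tree", "river")): 3,
}


def toy_distance(a, b):
    """Look up the illustrative distance between two distinct words."""
    return TOY_DISTANCES[frozenset((a, b))]


def pick_word_choices(words, dist, k, delta):
    """Return the k-subset (k >= 2) whose pairwise distances stay closest to delta."""
    def worst_gap(group):
        # Largest deviation from delta over all edges of the hyper-tetrahedron.
        return max(abs(dist(a, b) - delta) for a, b in combinations(group, 2))
    return min(combinations(words, k), key=worst_gap)


vocabulary = ["dog", "wolf", "car", "tree", "river"]
choices = pick_word_choices(vocabulary, toy_distance, k=3, delta=5)
```

With these toy numbers the search returns dog, car, and tree: every pair is exactly δ apart, so no word is an obvious odd-one-out, and near-synonyms such as dog and wolf never appear together. Exhaustive search grows combinatorially with the vocabulary, which is presumably why the slides call for a heuristic.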
Conclusions
New form of CAPTCHA
  Likely to be more robust against attacks
Some issues
  Needs more rigorous testing against many attack scenarios
  User-friendliness is critical and needs large-scale testing
Provided these issues are addressed
  Promise of a more secure Internet
  More reliable Web servers
  Potential for commercialization