Date posted: 10-Jun-2015
Category: Engineering
Uploaded by: aarushi-mukhi
SECURITY ANALYSIS BY RECOGNITION OF CAPTCHA
CAPTCHAS
HOW DOES IT WORK?
CAPTCHA works on a simple principle: the challenge should be solvable only by humans. A computer is assumed to be unable to process the distorted characters in the image, while a human can read the CAPTCHA text easily. This made it a successful scheme in which a user has to enter the characters in order to proceed on a website.
While many types of CAPTCHA exist, the most common is the text-based CAPTCHA, in which a random combination of characters of varying length is distorted into an image that, presumably, cannot be processed and solved by a computer script but can be read and understood by a human.
Once the human enters the CAPTCHA characters, the input is matched at the backend against the already known solution, and the user can proceed only if it matches exactly. Cracking CAPTCHAs has been a challenge for the AI research community, and to date no system has been developed that achieves 100% accuracy and efficiency.
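The backend check described above can be sketched as follows. This is a minimal illustration, not the code of any particular CAPTCHA system: the character set, the length range, and the function names `new_challenge` and `verify` are assumptions made for the example.

```python
import hmac
import secrets
import string

# Assumed character set; real schemes often drop look-alikes such as O and 0.
ALPHABET = string.ascii_uppercase + string.digits


def new_challenge(min_len=5, max_len=8):
    """Return a random solution string of varying length."""
    length = min_len + secrets.randbelow(max_len - min_len + 1)
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def verify(expected, answer):
    """Accept only an exact, character-for-character match."""
    return hmac.compare_digest(expected.encode(), answer.strip().encode())


solution = new_challenge()
print(verify(solution, solution))       # exact match is accepted
print(verify(solution, solution[:-1]))  # any deviation is rejected
```

The server keeps `solution` and sends only the distorted image to the client, so the comparison never depends on anything the client controls.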
CAPTCHAs have practical security applications such as:
• Preventing Comment Spam in Blogs: Bots post comments stuffed with index words in order to raise a site's ranking in search engines. A CAPTCHA ensures that this does not happen.
• Protecting Website Registration: Everyone uses email, and several websites offer sign-ups that are meant for humans. Several email services and sign-up websites found that registration bots had created millions of fake accounts overnight.
• Protecting Email Addresses From Scrapers: Spammers crawl the Web in search of email addresses posted in clear text. CAPTCHAs provide an effective mechanism to hide your email address from Web scrapers. The idea is to require users to solve a CAPTCHA before showing them your email address.
• Preventing Dictionary Attacks: One way to break into someone's email or registration account is to try millions of password combinations against the right user ID. A CAPTCHA prevents this by appearing after a number of failed login attempts. Since a bot cannot solve the CAPTCHA, further attempts are not possible and the account is not affected in any way.
• Search Engine Bots: It is sometimes desirable to keep web pages unindexed to prevent others from finding them easily. There is an HTML tag to prevent search engine bots from reading web pages. The tag, however, doesn't guarantee that bots won't read a web page; it only serves to say "no bots, please." Search engine bots, since they usually belong to large companies, respect web pages that don't want to allow them in. However, to truly guarantee that bots won't enter a web site, CAPTCHAs are needed.
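As an illustration of the dictionary-attack case from the list above, the sketch below requires a CAPTCHA once an account has accumulated a number of failed logins. The class name `LoginGuard`, the threshold of three misses, and the in-memory counter are assumptions for the example; a real site would persist this state and combine it with other defenses.

```python
from collections import defaultdict


class LoginGuard:
    """Require a CAPTCHA once an account has too many failed logins."""

    def __init__(self, max_misses=3):  # threshold is an assumed value
        self.max_misses = max_misses
        self.misses = defaultdict(int)

    def captcha_required(self, user_id):
        return self.misses[user_id] >= self.max_misses

    def record_attempt(self, user_id, password_ok, captcha_ok=False):
        """Return True if this login attempt is allowed to succeed."""
        if self.captcha_required(user_id) and not captcha_ok:
            return False  # blocked until the CAPTCHA is solved
        if password_ok:
            self.misses[user_id] = 0  # a successful login resets the counter
            return True
        self.misses[user_id] += 1
        return False


guard = LoginGuard()
for guess in ("123456", "password", "qwerty"):
    guard.record_attempt("alice", password_ok=False)
print(guard.captcha_required("alice"))
```

After the third miss, every further attempt is rejected unless it arrives with a solved CAPTCHA, which is the step a bot is assumed to be unable to complete.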
GOALS TO ACHIEVE
• Web interface for the CAPTCHA system: Given a web page, we construct a plug-in so that, at the click of a button, the CAPTCHA is captured and passed to a recognizer, and the result is returned and filled into the CAPTCHA text box. The result is then checked to see whether the CAPTCHA was filled in correctly. If so, we record the CAPTCHA and its answer in a database for future research. The recognition rate is also calculated for analysis.
• Segmentation Engine: The JCAPTCHA is segmented here, with different modes of segmentation implemented. The segmentation algorithms are based on invariants observed across hundreds of JCAPTCHAs.
• Recognition Engine: Build a recognition engine that identifies the best possible answer for the segmented JCAPTCHA characters.
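The recognition rate mentioned in the first goal is simply the fraction of submitted answers that the website accepted. A minimal sketch, assuming each attempt is logged as a (predicted text, accepted) pair; the function name and the sample log are made up for illustration and are not results from this project:

```python
def recognition_rate(attempts):
    """Return the fraction of attempts whose answer was accepted.

    `attempts` is a list of (predicted_text, accepted) pairs.
    """
    if not attempts:
        return 0.0
    correct = sum(1 for _, accepted in attempts if accepted)
    return correct / len(attempts)


# Hypothetical log entries, for illustration only.
log = [("XK7P", True), ("M2QD", False), ("9TRA", True), ("BB4Z", True)]
print(f"{recognition_rate(log):.0%}")
```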
A BRIEF FLOW:
• A CAPTCHA recognition framework consists of 3 main components:
• The front-end plug-in, which detects the CAPTCHA on the webpage.
• The segmentation engine, which segments the characters of the CAPTCHA.
• The recognizer, which is responsible for identifying each segmented character.
The diagram below demonstrates the framework for CAPTCHA recognition:
JCAPTCHA Recognizer Engine
• The Recognizer Engine forms the core of the JCAP
1. Collecting files and removing artifacts
We observed that the JCAPTCHA image file saved by the plug-in had a 2-pixel blue border. This border was not in the original image; it was an artifact created when the plug-in software, iMacros, selected the image to take a screenshot. The border is cropped off the image, and the new image is saved in the Recognizer folder.
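The cropping step can be sketched as below. To stay self-contained, the example treats the image as a row-major list of pixel rows, which is an assumption; the function name `crop_border` is also hypothetical. With an imaging library such as Pillow, the equivalent would be a single `img.crop((2, 2, width - 2, height - 2))` call.

```python
BORDER = 2  # border width in pixels, as observed in the saved files


def crop_border(pixels, border=BORDER):
    """Drop `border` rows and columns from every edge of a pixel grid."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    if height <= 2 * border or width <= 2 * border:
        raise ValueError("image is too small to crop")
    return [row[border:width - border] for row in pixels[border:height - border]]


# A 6x7 grid: "B" marks the blue border, "x" the original image content.
grid = [["B"] * 7 for _ in range(6)]
for y in range(2, 4):
    for x in range(2, 5):
        grid[y][x] = "x"

for row in crop_border(grid):
    print("".join(row))
```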
2. Segmentation
• There are three modes of segmentation that are configurable by the user:
1. Fast Pixel Array mode
2. Slow Pixel Array mode
3. Connected Components mode
3. Recognition
• As introduced in the theory, our approach to character recognition is based on template matching. Although the implementation of the OCR closely follows the explanation given in the theory, I'd like to walk you through the flow of the code and discuss some of the challenges I experienced while building each function.
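To make steps 2 and 3 concrete, here is a minimal sketch of a connected-components segmentation followed by template matching. It is not the project's code: it assumes a binarized image (1 = ink, 0 = background), 4-connectivity, characters that do not touch each other, and templates that are compared by pixel agreement only when they have the same size as the glyph. All function names are hypothetical.

```python
from collections import deque


def connected_components(img):
    """Return bounding boxes (left, top, right, bottom) of 4-connected ink blobs."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if img[y0][x0] != 1 or seen[y0][x0]:
                continue
            # Breadth-first flood fill from an unvisited ink pixel.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            left, top, right, bottom = x0, y0, x0, y0
            while queue:
                x, y = queue.popleft()
                left, right = min(left, x), max(right, x)
                top, bottom = min(top, y), max(bottom, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < w and 0 <= ny < h and img[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return sorted(boxes)  # sort by left edge: left-to-right reading order


def crop(img, box):
    left, top, right, bottom = box
    return [row[left:right + 1] for row in img[top:bottom + 1]]


def match(glyph, templates):
    """Return the label of the same-sized template with the most agreeing pixels."""
    best_label, best_score = None, -1.0
    for label, tpl in templates.items():
        if len(tpl) != len(glyph) or len(tpl[0]) != len(glyph[0]):
            continue  # a fuller version would rescale instead of skipping
        agree = sum(a == b for g_row, t_row in zip(glyph, tpl) for a, b in zip(g_row, t_row))
        score = agree / (len(tpl) * len(tpl[0]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def recognize(img, templates):
    """Segment the image, then label each segment; '?' marks an unmatched glyph."""
    return "".join(match(crop(img, box), templates) or "?" for box in connected_components(img))


# Toy templates and a toy image containing "T", "L" and "I" separated by gaps.
templates = {
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0], [1, 0], [1, 1]],
    "I": [[1], [1], [1]],
}
image = [
    [1, 1, 1, 0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1, 1, 0, 1],
]
print(recognize(image, templates))
```

Connected components split cleanly only when characters are separated by background pixels, which is presumably why the segmentation engine also offers pixel-array modes for other cases.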
Screenshots
1. Image extraction using iMacros
2. Extracted CAPTCHA in the specified folder
3. Pre-processed images
4. Segmentation
THANK YOU!