Major Project- Security analysis by recognition of captcha

SECURITY ANALYSIS BY RECOGNITION OF CAPTCHA

CAPTCHAS

HOW DOES IT WORK?

CAPTCHA works on a simple principle: it should be solvable only by humans. The assumption is that computers cannot process the characters in the image, while a human can easily read the CAPTCHA text. It therefore became a widely used scheme in which a user has to enter the characters in order to proceed on a website.

While there are many types of CAPTCHA, the most common is the text-based CAPTCHA, in which a random combination of characters of varying length is distorted into an image that, presumably, cannot be processed and solved by a computer script but can be read and understood by a human.

Once the human enters the CAPTCHA characters, the input is matched at the backend against the already known solution, and only if it matches exactly can the user proceed with the task.

Cracking CAPTCHAs has been a challenge for the AI research community, and to date no system has been developed that achieves a 100% accuracy and efficiency rate.
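The backend check described above can be illustrated with a minimal sketch. It is not the implementation of any particular CAPTCHA service: the character set, the length range, and the function names `new_challenge` and `verify` are assumptions made for illustration, and the step that distorts the solution into an image is left out.

```python
import secrets
import string

# Assumed character set; real schemes often drop look-alikes such as O and 0.
ALPHABET = string.ascii_uppercase + string.digits


def new_challenge(min_len=4, max_len=8):
    """Return a random solution string of varying length (hypothetical helper)."""
    length = min_len + secrets.randbelow(max_len - min_len + 1)
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def verify(expected, submitted):
    """Accept only an exact match with the solution stored at the backend."""
    return secrets.compare_digest(expected.encode(), submitted.encode())
```

A single wrong or missing character makes `verify` return `False`, which is the all-or-nothing behaviour the text describes.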

CAPTCHAs have applications in practical security, such as:

• Preventing Comment Spam in Blogs: Comment spam is used to raise a ranking in search engines. Bots flood blog comments with keywords intended to push a page higher in the search results. CAPTCHA ensures that this does not happen.

• Protecting Website Registration: Everyone uses email, and several websites offer sign-ups. Only humans are supposed to sign up; however, because of registration bots, several email services and sign-up websites found that millions of accounts had been created overnight, all of them fake and generated by bots.

• Protecting Email Addresses From Scrapers: Spammers crawl the Web in search of email addresses posted in clear text. CAPTCHAs provide an effective mechanism to hide your email address from Web scrapers. The idea is to require users to solve a CAPTCHA before showing your email address.

• Preventing Dictionary Attacks: One way to hack someone’s email or registration account is to try millions of password combinations along with the right user ID. A CAPTCHA prevents this by appearing after a number of failed login attempts. Since a bot cannot solve the CAPTCHA, further attempts are not possible, and the account is not affected in any way.

• Search Engine Bots: It is sometimes desirable to keep web pages unindexed to prevent others from finding them easily. There is an HTML tag to prevent search engine bots from reading web pages. The tag, however, doesn't guarantee that bots won't read a web page; it only serves to say "no bots, please." Search engine bots, since they usually belong to large companies, respect web pages that don't want to allow them in. However, in order to truly guarantee that bots won't enter a web site, CAPTCHAs are needed.
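The dictionary-attack defence in the list above can be sketched as a small failure counter. This is only an illustration under stated assumptions: the threshold of three free attempts, the in-memory counter, and the class name `LoginGuard` are hypothetical, and the password and CAPTCHA checks are passed in as booleans rather than performed here.

```python
from collections import defaultdict

MAX_FREE_ATTEMPTS = 3  # assumed threshold; the source does not give a number


class LoginGuard:
    """Require a solved CAPTCHA once a user ID has too many failed logins."""

    def __init__(self, max_free_attempts=MAX_FREE_ATTEMPTS):
        self.max_free_attempts = max_free_attempts
        self.failures = defaultdict(int)  # user ID -> consecutive failures

    def captcha_required(self, user_id):
        return self.failures[user_id] >= self.max_free_attempts

    def attempt(self, user_id, password_ok, captcha_ok=False):
        """Return True if the login succeeds."""
        if self.captcha_required(user_id) and not captcha_ok:
            # Rejected without evaluating the password, so a bot that
            # cannot solve the CAPTCHA gains nothing from further guesses.
            return False
        if password_ok:
            self.failures[user_id] = 0
            return True
        self.failures[user_id] += 1
        return False
```

Unlike locking the account after repeated failures, this leaves the legitimate owner able to log in by solving the CAPTCHA.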

GOALS TO ACHIEVE

• Web interface for the CAPTCHA system: Given a web page, we construct a plug-in so that, when a button is clicked, the CAPTCHA is captured and passed to a recognizer, and the returned result is filled into the CAPTCHA text box. The result is then checked to see whether the CAPTCHA was filled in correctly. If so, we record the CAPTCHA and the answer in a database for future research. The recognition rate is also calculated for analysis.

• Segmentation Engine: The JCAPTCHA image is segmented here, using different modes of segmentation. The segmentation algorithms are based on invariants observed across hundreds of JCAPTCHA images.

• Recognition Engine: Build a recognition engine for the segmented JCAPTCHA characters that identifies the best possible answer.
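The bookkeeping in the first goal (recording answers and calculating the recognition rate) might look roughly like the sketch below. The in-memory list stands in for the database, and the names `ResultLog`, `record`, `solved`, and `recognition_rate` are assumptions for illustration, not the project's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class ResultLog:
    """In-memory stand-in for the results database (hypothetical)."""

    records: list = field(default_factory=list)

    def record(self, image_name, answer, accepted):
        # 'accepted' is True when the website accepted the filled-in answer.
        self.records.append((image_name, answer, accepted))

    def solved(self):
        """CAPTCHA/answer pairs worth keeping for future research."""
        return [(name, answer) for name, answer, ok in self.records if ok]

    def recognition_rate(self):
        """Fraction of attempts that the website accepted."""
        if not self.records:
            return 0.0
        return sum(1 for _, _, ok in self.records if ok) / len(self.records)
```

Keeping failed attempts in the log as well is a design choice made here so that the rate can be computed over all attempts; only the accepted pairs would be written to the database.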

A BRIEF FLOW:

• A CAPTCHA recognition framework consists of 3 main features:

• The front end plug-in that is used to detect the CAPTCHA on the webpage.

• The segmentation engine which segments the characters of the CAPTCHA.

• The recognizer, which is responsible for identifying the segmented characters.
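To make the segmentation stage concrete, here is a minimal sketch of connected-component labelling, the general technique behind the Connected Components mode named later in these slides. It is not the project's code: it assumes the image has already been binarised into rows of 0 (background) and 1 (ink), treats diagonal neighbours as connected, and the function name is hypothetical.

```python
from collections import deque


def connected_components(image):
    """Return bounding boxes (left, top, right, bottom) of ink blobs,
    sorted left to right. 'image' is a list of rows of 0/1 values."""
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            queue = deque([(x, y)])
            seen[y][x] = True
            left = right = x
            top = bottom = y
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for dx in (-1, 0, 1):
                    for dy in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return sorted(boxes)
```

Each bounding box can then be cut out and handed to the recognizer as one candidate character. Characters that touch each other end up in a single box, which is one reason a framework may offer more than one segmentation mode.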

The diagram below demonstrates the framework for CAPTCHA recognition:

JCAPTCHA Recognizer Engine

• The Recognizer Engine forms the core of the JCAP

1. Collecting files and removing artifacts

We observed that the JCAPTCHA image file saved by the plug-in had a 2-pixel blue border. This border was not in the original image; it was an artifact created when the plug-in software, iMacros, selected the image to take a screenshot. The border is cropped off the image, and the new image is saved in the Recognizer folder.
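The cropping step can be sketched without any imaging library by treating the image as a list of pixel rows. The function name `crop_border` and the row-list representation are assumptions for illustration; loading the screenshot and saving the result to the Recognizer folder are not shown.

```python
def crop_border(pixels, border=2):
    """Return a copy of 'pixels' with 'border' pixels removed on every side.

    'pixels' is a list of rows; each row is a list of pixel values.
    """
    if border < 0:
        raise ValueError("border must be non-negative")
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    if height <= 2 * border or width <= 2 * border:
        raise ValueError("image is too small to crop")
    return [row[border:width - border]
            for row in pixels[border:height - border]]
```

With the Pillow library, the equivalent operation would be `Image.crop` with the box `(2, 2, width - 2, height - 2)`.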

2. Segmentation

• There are three modes of segmentation, configurable by the user:

   1. Fast Pixel Array mode
   2. Slow Pixel Array mode
   3. Connected Components mode

3. Recognition

• As introduced in the theory, our approach to character recognition is based on template matching. Although the implementation of the OCR closely follows the explanation given in the theory, I’d like to walk you through the flow of the code and talk about some of the challenges I experienced while building each function.
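As a rough illustration of template matching in general (not the project's OCR code), the sketch below scores a segmented character against each stored template by the fraction of pixels that agree and returns the best label. It assumes that glyphs and templates are binary grids already scaled to the same size; the names `match_score` and `recognize` are hypothetical.

```python
def match_score(glyph, template):
    """Fraction of pixels that agree between two same-sized binary grids."""
    if len(glyph) != len(template) or any(
            len(g_row) != len(t_row) for g_row, t_row in zip(glyph, template)):
        raise ValueError("glyph and template must have the same dimensions")
    total = sum(len(row) for row in glyph)
    agree = sum(g == t
                for g_row, t_row in zip(glyph, template)
                for g, t in zip(g_row, t_row))
    return agree / total if total else 0.0


def recognize(glyph, templates):
    """Return (label, score) for the template that matches 'glyph' best.

    'templates' maps a character label to its binary grid.
    """
    return max(((label, match_score(glyph, grid))
                for label, grid in templates.items()),
               key=lambda pair: pair[1])
```

Running `recognize` on each segment in left-to-right order and joining the labels gives the candidate answer. The per-character scores could also be used to reject low-confidence results.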

Screenshots

1. Image extraction using iMacros

2. Extracted CAPTCHA in the specified folder

3. Pre-processed images

4. Segmentation

THANK YOU!

Date post:	10-Jun-2015
Category:	Engineering
Upload:	aarushi-mukhi