Page 1: Verma - ICISS 2014 R easoning M ining NLP Defense Rakesh M. Verma (rverma@uh.edu) ReMiND Laboratory Catching Classical and Hijack-based Phishing Attacks.

Verma - ICISS 2014

Reasoning Mining NLP Defense

Rakesh M. Verma (rverma@uh.edu)

ReMiND Laboratory

Catching Classical and Hijack-based Phishing Attacks

12/20/2014


Digital Identity and Phishing

Classical Phishing Attacks

Send email containing
• Bad link, and
• Loss, urgency, or incentive

Plant a link
• Internet forums
• Social networks
• Chat or bulletin boards


Hijack-based Attacks

Hijack a legitimate server and plant a phishing page

Install malware and, when the user types a legitimate target URL, interpose a phishing page

Note: The URL in the address bar is legitimate

Motivation for Phishing

Phishing causes losses of time, productivity and money that run to billions of dollars.

Despite advances in phishing protection research, the number of phishing victims is increasing every year.

Source: Gartner, Anti-Phishing Working Group, 2014.


Phishing Detection Dimensions

Web site and address (URL)

Web site only

(e.g. “Account quota exceeded”)


This Paper

Evolving Phishing Trends

Phishing patterns are constantly evolving.

So we want to detect phishing patterns based on the fundamental characteristics of a phishing website.

Characteristics of Phishing Website

URL

Content

Behavior

URL Characteristics

Disguise URL with Targets (APWG: 45 - 50%)

Top Level Domain (TLD) gets misplaced

Content Characteristics

External sources of images, styles from target site, to mimic the appearance.

Page Contents (Text) resemble target site

Unencrypted sessions

Behavior Characteristics


Objective

Distinguish characteristics of classical and hijack based phishing sites

Develop an algorithm for detection

Approach

1. Develop Algorithm
• To detect characteristics

2. Test Algorithm
• Dataset from PhishTank, Alexa and DMOZ

3. Evaluate Algorithm
• Against Google Safe Browsing (GSB) phishing detection

DEVELOPING THE ALGORITHM

Algorithm

Decision Model
• URL Classifiers
• Content Classifiers
• Behavior Classifier

The Classification Classes: Phishing, Legitimate

URL Classifiers

URL is checked for presence of target domain and extra top level domain (TLD) at non-TLD place

U1 - Targets in URL
• Targets are whitelisted domains (n=5000) and some popular targets identified in security blogs (n=50), for a total of 5050.
• Applies regular expression on URL to detect targets
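
The U1 check can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the three-entry `TARGETS` list is a hypothetical stand-in for the 5050 targets, the naive two-label `registered_domain` helper is an assumption, and so is the rule that a target name counts only when the URL's own registered domain is not that target.

```python
import re
from urllib.parse import urlsplit

# Hypothetical stand-in for the 5050 target names described on the slide.
TARGETS = ["paypal", "ebay", "bankofamerica"]
TARGET_RE = re.compile("|".join(re.escape(t) for t in TARGETS), re.IGNORECASE)


def registered_domain(host):
    """Naive guess at the registered domain: the last two host labels."""
    return ".".join(host.split(".")[-2:])


def u1_targets_in_url(url):
    """Return True if a target name appears in a URL the target does not own."""
    host = (urlsplit(url).hostname or "").lower()
    owner = registered_domain(host).split(".")[0]
    if owner in TARGETS:
        return False  # assumed rule: the target's own domain is not flagged
    return TARGET_RE.search(url) is not None
```

Under these assumptions, `http://paypal.com.evil.example/login` is flagged while `https://www.paypal.com/signin` is not.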

URL Classifiers

U2 - Misplaced Top-Level Domains (TLDs)
• 7 most targeted TLDs (.com, .net, .org, .gov, .edu, .info, .biz)
• Applies regular expression to detect additional TLDs.
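
A possible reading of the U2 check is sketched below. The slides do not give the actual regular expression, so the pattern and the split into a host check and a path check are assumptions. One known limitation of this sketch: legitimate country-code names such as `example.com.au` would also be flagged.

```python
import re
from urllib.parse import urlsplit

# The seven TLDs listed on the slide.
TLDS = ("com", "net", "org", "gov", "edu", "info", "biz")
# A TLD token such as ".com" that is not followed by more letters or digits.
TLD_TOKEN = re.compile(r"\.(?:%s)(?![a-z0-9])" % "|".join(TLDS), re.IGNORECASE)


def u2_misplaced_tld(url):
    """Return True if one of the TLDs shows up anywhere but the end of the host."""
    parts = urlsplit(url)
    labels = (parts.hostname or "").lower().split(".")
    # Any host label before the last one that looks like a TLD is misplaced.
    in_host = any(label in TLDS for label in labels[:-1])
    # A TLD token in the path or query string is also treated as misplaced.
    in_path = TLD_TOKEN.search(parts.path + "?" + parts.query) is not None
    return in_host or in_path
```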

URL Classifiers

U3 - General Characteristics of URL

Detects Phishing URL based on the following features:

Length of the domain

Number of @ symbols, hyphens, punctuation symbols, top-level domains, target words, suspicious words

Whether or not the URL is an IP address, and

Euclidean and Kolmogorov-Smirnov (KS) distances between the distribution of characters in the URL and the distribution of characters in standard English text

Development:

Used PART algorithm to set optimal thresholds for the features.

Dataset: 10600 Random Alexa URLs with 9640 PhishTank URLs

10-fold cross-validation: TPR = 94.66%, FPR = 2.04%
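
The two distance features can be illustrated with a short sketch. It is not the authors' code. The reference frequencies in `ENGLISH_FREQ` are approximate, commonly cited values chosen for illustration (the slides do not say which English reference text was used), and restricting the comparison to the letters a-z is also an assumption.

```python
import math
import string
from collections import Counter

# Approximate English letter frequencies in percent (illustrative assumption).
ENGLISH_FREQ = {
    "e": 12.70, "t": 9.06, "a": 8.17, "o": 7.51, "i": 6.97, "n": 6.75,
    "s": 6.33, "h": 6.09, "r": 5.99, "d": 4.25, "l": 4.03, "c": 2.78,
    "u": 2.76, "m": 2.41, "w": 2.36, "f": 2.23, "g": 2.02, "y": 1.97,
    "p": 1.93, "b": 1.49, "v": 0.98, "k": 0.77, "j": 0.15, "x": 0.15,
    "q": 0.095, "z": 0.074,
}


def letter_distribution(text):
    """Relative frequency of each letter a-z in text (all zeros if no letters)."""
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    counts = Counter(letters)
    total = len(letters)
    if total == 0:
        return [0.0] * 26
    return [counts[ch] / total for ch in string.ascii_lowercase]


def english_distribution():
    """Reference distribution, normalised so the 26 values sum to 1."""
    total = sum(ENGLISH_FREQ.values())
    return [ENGLISH_FREQ[ch] / total for ch in string.ascii_lowercase]


def euclidean_distance(p, q):
    """Square root of the summed squared differences between two distributions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def ks_distance(p, q):
    """Kolmogorov-Smirnov distance: largest gap between the cumulative sums."""
    cum_p = cum_q = worst = 0.0
    for a, b in zip(p, q):
        cum_p += a
        cum_q += b
        worst = max(worst, abs(cum_p - cum_q))
    return worst
```

A URL would then be scored with `euclidean_distance(letter_distribution(url), english_distribution())` and the matching `ks_distance` call; how those scores are thresholded is left to the learned rules.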

Content Classifier

C1 – More Redirection: Ratio of Internal Links to Total Links in source
• Anchor tags, Link tags
• Script tags, Image tags

C2 – Copy Detection: Compare given page with targets
• Terms (words)
• Top Terms (~21)
• Random Top Terms (~11)
• IDs used for tags

C3 – Unsecure Password Handling
• Checks SSL on submit page and result page
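
The C1 ratio could be computed along the following lines. This is a sketch under assumptions: a link is treated as internal when it resolves to the same host as the page, and the threshold that turns the ratio into a Yes/No answer is not given in the slides, so none is applied here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects link targets from anchor, link, script and image tags."""

    ATTR = {"a": "href", "link": "href", "script": "src", "img": "src"}

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = self.ATTR.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.links.append(value)


def internal_link_ratio(page_url, html):
    """Ratio of same-host links to all links, or None if the page has no links."""
    collector = LinkCollector()
    collector.feed(html)
    if not collector.links:
        return None
    page_host = urlsplit(page_url).hostname
    internal = sum(
        1 for link in collector.links
        if urlsplit(urljoin(page_url, link)).hostname == page_host
    )
    return internal / len(collector.links)
```

A page that pulls most of its images and styles from the target site would show a low ratio.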

Behavior Classifier

B1 – Real-time Form analysis
• Extracts action URL from forms with password fields
• Analyzes contents of action URL page
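
The first B1 step, extracting the action URL of forms that contain a password field, might look like the sketch below. The class and function names are hypothetical. The second step (fetching and analyzing the action URL page) is not shown because the slides do not describe it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PasswordFormFinder(HTMLParser):
    """Records the action attribute of every form that has a password input."""

    def __init__(self):
        super().__init__()
        self.actions = []
        self._in_form = False
        self._action = ""
        self._has_password = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._in_form = True
            self._action = attrs.get("action") or ""
            self._has_password = False
        elif tag == "input" and self._in_form:
            if (attrs.get("type") or "").lower() == "password":
                self._has_password = True

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            if self._has_password:
                self.actions.append(self._action)
            self._in_form = False


def password_form_actions(page_url, html):
    """Absolute action URLs of all password forms found in html."""
    finder = PasswordFormFinder()
    finder.feed(html)
    finder.close()
    # An empty action resolves to the page URL itself.
    return [urljoin(page_url, action) for action in finder.actions]
```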

TESTING OF ALGORITHM

Testing of Algorithm

Algorithm applied on dataset from PhishTank, Alexa and DMOZ

Preprocessing of data was done before algorithm was applied.

Dataset

Phishing Set: 17200 PhishTank URLs

Legitimate Set: 17200 DMOZ

Whitelist: Top-5000 domains from Alexa

Preprocessing

Remove URLs redirecting to any URL already in the dataset

Remove offline (including 404 response) and other inaccessible URLs (timeout > 10 seconds)

If response is 200, read final landing page URL and HTML contents.
• Check landing URL against whitelist
• Check for password input field in body

Metrics

                    Classified as Phishing    Classified as Legitimate
Phishing pages      TP                        FN
Legitimate pages    FP                        TN

True Positive Rate (TPR) = $\frac{TP}{TP + FN}$

False Positive Rate (FPR) = $\frac{FP}{FP + TN}$

Precision (PR) = $\frac{TP}{TP + FP}$

F1 Score = $\frac{2 \cdot PR \cdot TPR}{PR + TPR}$
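
For reference, the four metrics can be computed from the confusion-matrix counts using their standard definitions. The function names and the counts used in the usage note are made up for illustration.

```python
def rates(tp, fn, fp, tn):
    """Standard rates from confusion-matrix counts, as fractions in [0, 1]."""
    tpr = tp / (tp + fn)          # share of phishing pages caught
    fpr = fp / (fp + tn)          # share of legitimate pages flagged
    precision = tp / (tp + fp)    # share of flagged pages that are phishing
    f1 = 2 * precision * tpr / (precision + tpr)
    return {"TPR": tpr, "FPR": fpr, "PR": precision, "F1": f1}


def f1_from(precision, tpr):
    """Harmonic mean of precision and TPR (same unit in, same unit out)."""
    return 2 * precision * tpr / (precision + tpr)
```

For example, `rates(tp=90, fn=10, fp=5, tn=95)` gives a TPR of 0.9 and an FPR of 0.05. The F-scores reported in the results are consistent with this definition: `f1_from(92.76, 87.64)` rounds to 90.13, the value listed for the And combination without search-based filtering.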

Algorithm

URL Classifier
• U1 - Targets in URL
• U2 - Misplaced TLD
• U3 - General Characteristics of URL

Content Classifier
• C1 - More Redirection
• C2 - Copy Detection
• C3 - Unsecure Password Handling

Behavioral Classifier
• B1 - Real-time Form Analysis

Models

[Flowchart: the URL is checked by U1 – Target in URL, U2 – Misplaced TLD, U3 – Gen. Characteristics of URL, C1 – More Redirection, C2 – Copy Detection, C3 – Unsecure Pwd. Handling and B1 – Realtime Form Analysis, each with a Yes/No outcome.]

Combination    Phishing URL Condition
OR             (U1 OR U2 OR U3) OR (C1 OR C2 OR C3 OR B1)
AND            (U1 OR U2 OR U3) AND (C1 OR C2 OR C3 OR B1)
Potential      Yes >= 2
Site only      (C1 OR C2 OR C3 OR B1)
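
The combination rules on this slide can be written as a small function over the seven Yes/No outcomes. This is a sketch, not the authors' code: reading the Potential condition "Yes >= 2" as "at least two of the seven classifiers answered Yes" is an assumption, and the `combine` name and dictionary layout are hypothetical.

```python
URL_CLASSIFIERS = ("U1", "U2", "U3")
SITE_CLASSIFIERS = ("C1", "C2", "C3", "B1")


def combine(votes):
    """Map seven boolean classifier outcomes to the four combination verdicts."""
    url_hit = any(votes[name] for name in URL_CLASSIFIERS)
    site_hit = any(votes[name] for name in SITE_CLASSIFIERS)
    # Assumed reading of "Yes >= 2": count Yes answers across all classifiers.
    yes_count = sum(1 for name in URL_CLASSIFIERS + SITE_CLASSIFIERS if votes[name])
    return {
        "Or": url_hit or site_hit,
        "And": url_hit and site_hit,
        "Potential": yes_count >= 2,
        "Site only": site_hit,
    }
```

Under this reading every page flagged by And is also flagged by Potential, and every page flagged by Potential is also flagged by Or.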

Performance of Classifiers on the dataset

Results

                Search Based Filtering = OFF       Search Based Filtering = ON
Combinations    TPR    FPR   PR     F-score        TPR    FPR   PR     F-score
Or              99.97  3.50  88.25  93.75          93.37  0.54  97.84  95.55
And             87.64  1.80  92.76  90.13          82.30  0.22  98.98  89.88
Pot.            97.94  2.48  91.24  94.47          91.55  0.36  98.52  94.91
Site only       99.31  3.44  88.37  93.52          92.84  0.53  97.88  95.30

Discussion

The Or combination effectively combines URL and content-based classifiers and achieves the highest detection rate of 99.97% with an FPR of 3.5%.

With the Potential scheme and search-based filtering, the FPR drops to 0.36% at a TPR of 91.55%.

The And combination has the lowest FPR (0.22%) with a detection rate of 82.30%, both with search-based filtering.

With search-based filtering, the Site only method has an FPR of 0.53% and the second-highest detection rate (92.84%).

Advantages of the Approach

Can be used effectively in a zero-hour environment

Can handle hijack-based attacks, because it includes behavioral analysis

Content language independent.

EVALUATION OF ALGORITHM

Existing Methods

Related phishing algorithms
• Blacklisting
• Xiang et al - hierarchical adaptive probabilistic approach
• CANTINA
• CANTINA+
• Google Safe Browsing

Good performance, but could not compare with my algorithm
• Closed source
• No API

So used publicly available Google Safe Browsing for evaluation.

Google Safe Browsing

Large-scale automatic phishing website detection

Analyzes both URL and content

Claims accuracy of 90% and FPR of 0.1%

Direct Comparison

                      Search Based Filtering = OFF       Search Based Filtering = ON
Model Combinations    TPR    FPR   PR     F-score        TPR    FPR   PR     F-score
Ours 1                99.97  3.50  88.25  93.75          93.37  0.54  97.84  95.55
Ours 2                87.64  1.80  92.76  90.13          82.30  0.22  98.98  89.88
Ours 3                97.94  2.48  91.24  94.47          91.55  0.36  98.52  94.91

GSB (TPR, FPR, PR, F-score): 51.46  0.03  99.80  67.91

Security Analysis

If phishers get hold of this work, they might adapt to hide from the detection techniques.

Buying a genuine domain or SSL certificate, or using self-signed certificates or OpenSSL, can hamper some of the classifiers, but it adds to the phishers' effort and reduces their profit.

If phishers somehow manage to get a good page rank and a higher position in search results, they can escape detection.

They can change the behavior of the page to hide it, but this could alarm users, and responsible users will report the URL.

Conclusion

Efficient algorithms based on the fundamental characteristics of phishing websites were developed.

The algorithms have efficacy comparable to or better than other established phishing detection algorithms.

A novel approach to handling hijack-based attacks was presented.

Future Work

Improve the Behavior classifier to include other phishing website behaviors.

Deploy as a browser extension to test in-field performance.


Thank You

Questions?

Hijack Based Phishing Attacks

Agency for the Safety of Aerial Navigation in Africa and Madagascar (ASECNA)

April 2014

Redirected to PayPal

