Analyzing Social and Stylometric Features to Identify Spear phishing Emails

Date post: 17-Dec-2014
Upload: precog
Description:
Targeted social engineering attacks in the form of spear phishing emails are often the main gimmick used by attackers to infiltrate organizational networks and implant state-of-the-art Advanced Persistent Threats (APTs). Spear phishing is a complex targeted attack in which an attacker harvests information about the victim prior to the attack. This information is then used to create sophisticated, genuine-looking attack vectors, drawing the victim to compromise confidential information. What makes spear phishing different from, and more powerful than, normal phishing is this contextual information about the victim. Online social media services can be one such source for gathering vital information about an individual. In this paper, we characterize and examine a true positive dataset of spear phishing, spam, and normal phishing emails from Symantec’s enterprise email scanning service. We then present a model to detect spear phishing emails sent to employees of 14 international organizations, by using social features extracted from LinkedIn. Our dataset consists of 4,742 targeted attack emails sent to 2,434 victims, and 9,353 non-targeted attack emails sent to 5,912 non-victims, along with publicly available information from their LinkedIn profiles. We applied various machine learning algorithms to this labeled data, and achieved an overall maximum accuracy of 97.76% in identifying spear phishing emails. We used a combination of social features from LinkedIn profiles, and stylometric features extracted from email subjects, bodies, and attachments. However, we achieved a slightly better accuracy of 98.28% without the social features. Our analysis revealed that social features extracted from LinkedIn do not help in identifying spear phishing emails.
To the best of our knowledge, this is one of the first attempts to make use of a combination of stylometric features extracted from emails, and social features extracted from an online social network to detect targeted spear phishing emails.
Transcript
Page 1: Analyzing Social and Stylometric Features to Identify Spear phishing Emails

Unifying the Global Response to Cybercrime

Analyzing Social and Stylometric Features to Identify Spearphishing Emails

Prateek Dewan, Anand Kashyap, Ponnurangam Kumaraguru

Indraprastha Institute of Information Technology – Delhi (IIITD), India

Page 2: Analyzing Social and Stylometric Features to Identify Spear phishing Emails

Overview

•  What is spearphishing?
•  Spearphishing and Online Social Media
•  Challenges and dataset
•  Feature extraction
•  Classification results
•  Discussion

Page 3: Analyzing Social and Stylometric Features to Identify Spear phishing Emails

What is spearphishing?

•  Targeted phishing attack
•  Contains contextual content instead of random messages
•  Harder to detect, since spearphishing emails look more genuine
•  Victims are asked to
    •  Download malicious attachments
    •  Reply with sensitive information
    •  Click on URLs
    •  …

Page 4: Analyzing Social and Stylometric Features to Identify Spear phishing Emails

Why study spearphishing?

•  Victims are 4.5 times more likely to fall for spear phishing than normal phishing [1].
•  One of the main entry points for Advanced Persistent Threats.
•  Causes losses worth millions.

[1] M. Jakobsson. Modeling and preventing phishing attacks. In Financial Cryptography, volume 5. Citeseer, 2005.

Page 5: Analyzing Social and Stylometric Features to Identify Spear phishing Emails

Spearphishing and social media

•  Social media profiles can be a good source for the “context” part of spear phishing emails
•  FBI warning on July 04, 2013¹
    •  “…emails typically contain accurate information about victims obtained from data posted on social networking sites…”

¹ http://www.computerweekly.com/news/2240187487/FBI-warns-of-increased-spear-phishing-attacks

Page 6: Analyzing Social and Stylometric Features to Identify Spear phishing Emails

Data

•  Emails
    •  Spear phishing emails (Symantec)
    •  Spam / phishing emails (Symantec)
    •  Benign emails (Enron)
•  LinkedIn profiles
    •  Recipients of emails in the three datasets mentioned above
    •  LinkedIn People Search API

Page 7: Analyzing Social and Stylometric Features to Identify Spear phishing Emails

Challenges (social features)

•  Limited information about victim to identify her on social media
    •  Only first name, last name, organization available from victim’s email ID
•  Hard to find victim on Facebook, Twitter, Google+
    •  Too many profiles with same first name, last name
    •  Work field not searchable.

Page 8: Analyzing Social and Stylometric Features to Identify Spear phishing Emails

Challenges (social features) contd.

•  LinkedIn – Only network which provides searching using work field
•  People search API access restricted.
    •  We requested access under their Vetted API access scheme.
•  Rate limited
    •  Only 100 requests per day per app

Page 9: Analyzing Social and Stylometric Features to Identify Spear phishing Emails

Dataset

•  Emails sent to employees of 14 international organizations
•  SPEAR (Targeted spear phishing emails from Symantec)
    •  4,742 emails → 2,434 victims / LinkedIn profiles
•  SPAM (Spam / phishing emails from Symantec)
    •  9,353 emails → 5,912 victims / LinkedIn profiles
•  BENIGN (Sample from Enron email corpus)
    •  6,601 emails → 1,240 victims / LinkedIn profiles

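To put the LinkedIn rate limit in perspective, the dataset sizes can be combined with the limit of 100 requests per day per app. The sketch below assumes exactly one People Search request per recipient and a single app; both are simplifying assumptions for illustration, not figures stated in the talk.

```python
import math

# Figures from the Dataset slide: (emails, recipients / LinkedIn profiles).
DATASETS = {
    "SPEAR": (4742, 2434),
    "SPAM": (9353, 5912),
    "BENIGN": (6601, 1240),
}

# Rate limit quoted on the Challenges slide.
REQUESTS_PER_DAY_PER_APP = 100


def days_needed(num_profiles: int, apps: int = 1) -> int:
    """Days to look up every profile, assuming one request per profile (assumption)."""
    return math.ceil(num_profiles / (REQUESTS_PER_DAY_PER_APP * apps))


total_emails = sum(emails for emails, _ in DATASETS.values())
total_profiles = sum(profiles for _, profiles in DATASETS.values())

print(total_emails)                 # 20696
print(total_profiles)               # 9586
print(days_needed(total_profiles))  # 96
```

Under these assumptions, a single app would need roughly three months of daily quota to cover all recipients.
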
Page 10: Analyzing Social and Stylometric Features to Identify Spear phishing Emails

Feature set creation

•  SPAM, SPEAR, BENIGN emails → Stylometric features from emails
•  Recipient email address → 1. firstName 2. lastName 3. organization → http://api.linkedin.com/v1/people-search → LinkedIn Profile(s) → Social features from LinkedIn
•  Stylometric features from emails + Social features from LinkedIn → Final feature vector

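The first step of the pipeline, deriving search fields from the recipient's email address, might look like the sketch below. The function name `search_fields_from_email` is hypothetical, and the code assumes addresses of the form `first.last@organization.tld`; the talk does not say how the addresses were actually parsed, and no LinkedIn call is made here.

```python
from typing import Dict, Optional


def search_fields_from_email(address: str) -> Optional[Dict[str, str]]:
    """Derive search fields from an address assumed to look like first.last@org.tld."""
    local, sep, domain = address.strip().lower().partition("@")
    if not sep or not local or not domain:
        return None  # not a usable email address
    # Treat "." and "_" as separators between name parts (assumption).
    name_parts = [part for part in local.replace("_", ".").split(".") if part]
    if len(name_parts) < 2:
        return None  # cannot separate first name from last name
    return {
        "firstName": name_parts[0],
        "lastName": name_parts[-1],
        # Naive choice: first label of the domain stands in for the organization.
        "organization": domain.split(".")[0],
    }


print(search_fields_from_email("jane.doe@example.com"))
```

Addresses that do not follow the assumed pattern (for example role accounts such as `admin@...`) return `None` and would have to be skipped or handled separately.
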
Page 11: Analyzing Social and Stylometric Features to Identify Spear phishing Emails

Stylometric Features

•  Subject based (7)
    •  Num. words, Num. characters, Richness
    •  Has words: “bank”, “verify”
    •  isReply, isForwarded
•  Attachment based (2)
    •  Length of attachment name
    •  Attachment size
•  Body based (9)
    •  Num. words, Num. characters, Num. unique words
    •  Has words: “attach”, “suspension”, “verify your account”
    •  Num. newlines, Richness, function words

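A minimal sketch of the seven subject-based features follows. Two points are assumptions rather than details from the slides: richness is taken to be the ratio of words to characters, and isReply / isForwarded are detected from the usual "Re:" and "Fw:" / "Fwd:" subject prefixes.

```python
import re
from typing import Dict


def subject_features(subject: str) -> Dict[str, float]:
    """Compute the seven subject-based features named on the slide."""
    num_words = len(subject.split())
    num_chars = len(subject)
    lowered = subject.lower()
    return {
        "num_words": num_words,
        "num_chars": num_chars,
        # Assumed definition of richness: words per character.
        "richness": num_words / num_chars if num_chars else 0.0,
        "has_bank": int("bank" in lowered),
        "has_verify": int("verify" in lowered),
        # Assumed heuristics: rely on conventional subject prefixes.
        "is_reply": int(re.match(r"\s*re:", lowered) is not None),
        "is_forwarded": int(re.match(r"\s*fwd?:", lowered) is not None),
    }


features = subject_features("Re: Please verify your bank details")
print(features)
```

The body-based features could be computed the same way, with the keyword list swapped for “attach”, “suspension”, and “verify your account”.
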
Page 12: Analyzing Social and Stylometric Features to Identify Spear phishing Emails

Social Features

•  Location
•  Connections
•  Summary based (5)
    •  Num. words, Num. characters, Num. unique words
    •  Length, Richness
•  Profession based (2)
    •  Job Level (0-7)
    •  Job Type (0-9)

Page 13: Analyzing Social and Stylometric Features to Identify Spear phishing Emails

Results (SPEAR v/s SPAM)

Feature Set (num. features)   Metric         Random Forest   J48 Decision Tree   Naïve Bayes
Subject (7)                   Accuracy (%)   83.91           83.10               58.87
                              FP Rate        0.208           0.227               0.371
Attachment (2)                Accuracy (%)   97.86           96.69               69.15
                              FP Rate        0.035           0.046               0.218
All email (9)                 Accuracy (%)   98.28           97.32               68.69
                              FP Rate        0.024           0.035               0.221
Social (9)                    Accuracy (%)   81.73           76.63               65.85
                              FP Rate        0.229           0.356               0.445
Email + Social (18)           Accuracy (%)   96.47           95.90               69.35
                              FP Rate        0.052           0.054               0.232

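For readers unfamiliar with the two reported metrics, the sketch below computes accuracy and false positive rate from true and predicted labels. It treats SPEAR as the positive class; the slides do not state how the FP rate was aggregated (for example, whether it is averaged across classes), so this is one plausible reading and not necessarily the authors' exact computation. The labels in the example are made up.

```python
from typing import Sequence, Tuple


def accuracy_and_fp_rate(
    actual: Sequence[str], predicted: Sequence[str], positive: str = "SPEAR"
) -> Tuple[float, float]:
    """Return (accuracy in percent, false positive rate) for one positive class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fp = tn = fn = 0
    for truth, guess in zip(actual, predicted):
        if guess == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    total = tp + fp + tn + fn
    accuracy = 100.0 * (tp + tn) / total if total else 0.0
    # FP rate: share of non-positive emails wrongly flagged as positive.
    fp_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fp_rate


# Made-up labels for illustration only.
actual = ["SPEAR"] * 4 + ["SPAM"] * 4
predicted = ["SPEAR", "SPEAR", "SPEAR", "SPAM", "SPAM", "SPAM", "SPAM", "SPEAR"]
print(accuracy_and_fp_rate(actual, predicted))  # (75.0, 0.25)
```
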
Page 14: Analyzing Social and Stylometric Features to Identify Spear phishing Emails

Results (SPEAR v/s SPAM) contd.

•  Most informative features
    •  Attachment size
    •  Length of attachment name
    •  Subject Richness
    •  No. of characters in subject
    •  Location (from LinkedIn profile)
    •  No. of words in subject
    •  LinkedIn connections
    •  …

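The slides do not say how features were ranked as "most informative". One common approach, assumed here purely for illustration, is information gain: the reduction in class-label entropy after splitting on a feature. The sketch handles discrete features only, and the toy data is invented.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Entropy of the labels minus the weighted entropy after splitting on values."""
    total = len(labels)
    if total == 0:
        return 0.0
    remainder = 0.0
    for value in set(values):
        subset = [label for v, label in zip(values, labels) if v == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


# Invented toy data: one feature separates the classes perfectly, one not at all.
labels = ["SPEAR", "SPEAR", "SPAM", "SPAM"]
has_attachment = [1, 1, 0, 0]
is_reply = [1, 0, 1, 0]
print(information_gain(has_attachment, labels))
print(information_gain(is_reply, labels))
```

Continuous features such as attachment size would first need to be discretized (binned) before this calculation applies.
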
Page 15: Analyzing Social and Stylometric Features to Identify Spear phishing Emails

Results (SPEAR v/s SPAM) contd.

Page 16: Analyzing Social and Stylometric Features to Identify Spear phishing Emails

SPEAR v/s SPAM subjects

← Spam / phishing    Spear phishing →

Page 17: Analyzing Social and Stylometric Features to Identify Spear phishing Emails

Results (SPEAR v/s BENIGN)

Feature Set (num. features)   Metric         Random Forest   J48 Decision Tree   Naïve Bayes
Subject (7)                   Accuracy (%)   81.19           81.11               61.75
                              FP Rate        0.210           0.217               0.489
Body (9)                      Accuracy (%)   97.17           95.62               53.81
                              FP Rate        0.031           0.048               0.338
All email (16)                Accuracy (%)   97.39           95.84               54.14
                              FP Rate        0.029           0.044               0.334
Social (9)                    Accuracy (%)   94.48           91.79               69.76
                              FP Rate        0.067           0.103               0.278
Email + Social (25)           Accuracy (%)   97.04           95.28               57.27
                              FP Rate        0.032           0.052               0.316

Page 18: Analyzing Social and Stylometric Features to Identify Spear phishing Emails

Results (SPEAR v/s BENIGN) contd.

•  Most informative features
    •  Body richness
    •  No. of characters in body
    •  No. of words in body
    •  No. of unique words in body
    •  Location (from LinkedIn)
    •  No. of newlines in body
    •  Subject richness
    •  …

Page 19: Analyzing Social and Stylometric Features to Identify Spear phishing Emails

Results (SPEAR v/s SPAM + BENIGN)

Feature Set (num. features)   Metric         Random Forest   J48 Decision Tree   Naïve Bayes
Subject (7)                   Accuracy (%)   86.48           86.35               77.99
                              FP Rate        0.333           0.352               0.681
Social (9)                    Accuracy (%)   88.04           84.69               74.46
                              FP Rate        0.241           0.371               0.454
Email + Social (16)           Accuracy (%)   89.86           88.38               73.97
                              FP Rate        0.202           0.248               0.381

Page 20: Analyzing Social and Stylometric Features to Identify Spear phishing Emails

Results (SPEAR v/s SPAM + BENIGN) contd.

•  Most informative features
    •  Subject richness
    •  No. of characters in subject
    •  Location (from LinkedIn)
    •  LinkedIn connections
    •  No. of words in subject
    •  Email forwarded? (True / false)
    •  Email is a reply? (True / false)
    •  …

Page 21: Analyzing Social and Stylometric Features to Identify Spear phishing Emails

Discussion

•  Social features (from LinkedIn) did not help in distinguishing spear phishing emails from non spear phishing emails.
    •  Stylometric features from emails suffice to do so.
•  Real world scenarios may be much different
    •  Attackers may use information from other sources / social networks, viz. Facebook, Twitter, etc.
•  Dataset limitation
    •  It is possible that no spear phishing mails in our dataset were crafted using LinkedIn features
    •  We cannot conclude that such behavior would not be found outside our dataset, or in the future.

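The effect of adding social features can be read off the Random Forest columns of the three results tables. The short sketch below only restates those reported accuracies as differences; the dictionary keys and the helper `social_delta` are names chosen for illustration. Note that in the SPEAR v/s SPAM + BENIGN experiment the email-only set consists of the seven subject features alone, so that row is not directly comparable with the other two.

```python
# Random Forest accuracy (%) copied from the three results tables.
RESULTS = {
    "SPEAR v/s SPAM": {"email_only": 98.28, "email_plus_social": 96.47},
    "SPEAR v/s BENIGN": {"email_only": 97.39, "email_plus_social": 97.04},
    # Email-only here means the seven subject features only.
    "SPEAR v/s SPAM + BENIGN": {"email_only": 86.48, "email_plus_social": 89.86},
}


def social_delta(experiment: str) -> float:
    """Change in accuracy (percentage points) when social features are added."""
    row = RESULTS[experiment]
    return round(row["email_plus_social"] - row["email_only"], 2)


for name in RESULTS:
    print(f"{name}: {social_delta(name):+.2f}")
```
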
Page 22: Analyzing Social and Stylometric Features to Identify Spear phishing Emails

Thanks!

Prateek Dewan E: [email protected]

W: http://precog.iiitd.edu.in/people/prateek

Page 23: Analyzing Social and Stylometric Features to Identify Spear phishing Emails

Backup slides…

Page 24: Analyzing Social and Stylometric Features to Identify Spear phishing Emails

Results (SPEAR v/s SPAM) contd.

Page 25: Analyzing Social and Stylometric Features to Identify Spear phishing Emails

Attachment names

Page 26: Analyzing Social and Stylometric Features to Identify Spear phishing Emails

Results (SPEAR v/s BENIGN) contd.

← Benign emails    Spear phishing →

Page 27: Analyzing Social and Stylometric Features to Identify Spear phishing Emails

Attachment types

Page 28: Analyzing Social and Stylometric Features to Identify Spear phishing Emails

Details of organizations

