Page 1: Analyzing Social and Stylometric Features to Identify Spear phishing Emails

Unifying the Global Response to Cybercrime

Analyzing Social and Stylometric Features to Identify Spearphishing Emails

Prateek Dewan, Anand Kashyap, Ponnurangam Kumaraguru

Indraprastha Institute of Information Technology – Delhi (IIITD), India

Overview

•  What is spearphishing?
•  Spearphishing and Online Social Media
•  Challenges and dataset
•  Feature extraction
•  Classification results
•  Discussion

What is spearphishing?
•  Targeted phishing attack
•  Contains content tailored to the victim's context instead of generic, random messages
•  Harder to detect, since spearphishing emails look more genuine
•  Victims are asked to:
   •  Download malicious attachments
   •  Reply with sensitive information
   •  Click on URLs
   •  …

Why study spearphishing?
•  Victims are 4.5 times more likely to fall for spear phishing than for normal phishing [1].
•  One of the main entry points for Advanced Persistent Threats.
•  Causes losses worth millions.

[1] M. Jakobsson. Modeling and preventing phishing attacks. In Financial Cryptography, volume 5. Citeseer, 2005.

Spearphishing and social media
•  Social media profiles can be a good source for the “context” part of spear phishing emails
•  FBI warning on July 4, 2013¹
   •  “…emails typically contain accurate information about victims obtained from data posted on social networking sites…”

¹ http://www.computerweekly.com/news/2240187487/FBI-warns-of-increased-spear-phishing-attacks

Data
•  Emails
   •  Spear phishing emails (Symantec)
   •  Spam / phishing emails (Symantec)
   •  Benign emails (Enron)
•  LinkedIn profiles
   •  Recipients of emails in the three datasets mentioned above
   •  LinkedIn People Search API

Challenges (social features)
•  Limited information about the victim with which to identify her on social media
   •  Only first name, last name and organization are available from the victim's email address
•  Hard to find the victim on Facebook, Twitter, Google+
   •  Too many profiles with the same first name and last name
   •  Work field not searchable

Challenges (social features) contd.
•  LinkedIn: the only network that supports searching by work field
•  People Search API access is restricted
   •  We requested access under LinkedIn's Vetted API access scheme.
•  Rate limited
   •  Only 100 requests per day per app
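
With only 100 requests per day per app, profile lookups have to be spread over many days. Below is a minimal Python sketch of that scheduling. It assumes one People Search request per recipient. The helper names `daily_batches` and `days_needed` are hypothetical and are not part of any LinkedIn SDK.

```python
from math import ceil

DAILY_QUOTA = 100  # requests per day per app, as stated on the slide

def daily_batches(lookups, quota=DAILY_QUOTA):
    """Split pending profile lookups into per-day batches of at most `quota`.

    Assumption (hypothetical helper): each recipient costs exactly one
    People Search request.
    """
    if quota <= 0:
        raise ValueError("quota must be positive")
    return [lookups[i:i + quota] for i in range(0, len(lookups), quota)]

def days_needed(num_lookups, quota=DAILY_QUOTA):
    # Minimum number of days to finish all lookups under the daily quota.
    return ceil(num_lookups / quota)
```

Under the same one-request-per-recipient assumption, the 2,434 + 5,912 + 1,240 = 9,586 victims listed on the Dataset slide would need `days_needed(9586)`, which is 96 days with a single app.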

Dataset
•  Emails sent to employees of 14 international organizations
•  SPEAR (targeted spear phishing emails from Symantec)
   •  4,742 emails → 2,434 victims / LinkedIn profiles
•  SPAM (spam / phishing emails from Symantec)
   •  9,353 emails → 5,912 victims / LinkedIn profiles
•  BENIGN (sample from the Enron email corpus)
   •  6,601 emails → 1,240 victims / LinkedIn profiles

Feature set creation
•  SPAM, SPEAR and BENIGN emails → stylometric features from emails
•  Recipient email address → http://api.linkedin.com/v1/people-search (1. firstName 2. lastName 3. organization) → LinkedIn profile(s) → social features from LinkedIn
•  Stylometric features + social features → final feature vector
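
The pipeline derives the three search fields from the recipient's email address and then concatenates the two feature groups. Below is a minimal sketch of both steps. It assumes addresses follow a `firstname.lastname@organization.tld` convention. Many real addresses do not, which is part of the challenge described earlier. `search_fields` and `final_feature_vector` are hypothetical names, and the sketch does not call the LinkedIn API.

```python
def search_fields(email_address):
    """Guess firstName, lastName and organization from an email address.

    Assumption: the local part is 'first.last' and the organization is the
    first label of the domain (e.g. 'acme' in 'acme.com').
    """
    local, _, domain = email_address.strip().lower().partition("@")
    if not domain:
        raise ValueError(f"not an email address: {email_address!r}")
    first, _, last = local.partition(".")
    return {
        "firstName": first.capitalize(),
        "lastName": last.capitalize(),
        "organization": domain.split(".")[0],
    }

def final_feature_vector(stylometric, social):
    # The final vector is the stylometric features followed by the social ones.
    return list(stylometric) + list(social)
```

For SPEAR v/s SPAM, concatenating 9 email features with 9 social features gives the 18 features of the "Email + Social (18)" row in the results.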

Stylometric Features
•  Subject based (7)
   •  Num. words, Num. characters, Richness
   •  Has words: “bank”, “verify”
   •  isReply, isForwarded
•  Attachment based (2)
   •  Length of attachment name
   •  Attachment size
•  Body based (9)
   •  Num. words, Num. characters, Num. unique words
   •  Has words: “attach”, “suspension”, “verify your account”
   •  Num. newlines, Richness, function words
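
Below is a minimal sketch of how the 7 subject, 2 attachment and 9 body features above could be computed. The slides do not define "richness" or list the function words. This sketch therefore assumes richness is the type-token ratio (unique words / total words) and uses a small hypothetical function-word list. The regex tokenizer is also an assumption, not the authors' implementation.

```python
import re

# Hypothetical function-word list; the slides do not say which words were used.
FUNCTION_WORDS = {"a", "an", "the", "of", "to", "in", "and", "or", "for", "is", "your"}

def tokenize(text):
    # Lowercase word tokenizer (assumption, not the authors' tokenizer).
    return re.findall(r"[a-z0-9']+", text.lower())

def richness(words):
    # Assumed definition: type-token ratio (unique words / total words).
    return len(set(words)) / len(words) if words else 0.0

def subject_features(subject):
    words = tokenize(subject)
    prefix = subject.strip().lower()
    return {  # 7 subject-based features
        "num_words": len(words),
        "num_chars": len(subject),
        "richness": richness(words),
        "has_bank": "bank" in words,
        "has_verify": "verify" in words,
        "is_reply": prefix.startswith("re:"),
        "is_forwarded": prefix.startswith(("fw:", "fwd:")),
    }

def attachment_features(name, size_bytes):
    # 2 attachment-based features.
    return {"name_length": len(name), "size_bytes": size_bytes}

def body_features(body):
    words = tokenize(body)
    return {  # 9 body-based features
        "num_words": len(words),
        "num_chars": len(body),
        "num_unique_words": len(set(words)),
        "has_attach": "attach" in body.lower(),  # substring: also "attached", "attachment"
        "has_suspension": "suspension" in words,
        "has_verify_your_account": "verify your account" in " ".join(words),
        "num_newlines": body.count("\n"),
        "richness": richness(words),
        "num_function_words": sum(1 for w in words if w in FUNCTION_WORDS),
    }
```

For example, `subject_features("RE: Please verify your bank details")` marks the subject as a reply containing both "bank" and "verify".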

Social Features
•  Location
•  Connections
•  Summary based (5)
   •  Num. words, Num. characters, Num. unique words
   •  Length, Richness
•  Profession based (2)
   •  Job Level (0-7)
   •  Job Type (0-9)
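
Job level and job type are stored as small integer codes. The slides give only their ranges (0-7 and 0-9) and not the mapping. The sketch below is therefore a hypothetical keyword-based encoding of a LinkedIn job title into a 0-7 level, intended only to illustrate an ordinal profession feature. The keyword table and `job_level` are illustrative assumptions.

```python
import re

# Hypothetical ordinal scale, highest level first; not the authors' mapping.
JOB_LEVELS = [
    (7, {"chief", "ceo", "cfo", "cto", "founder", "owner"}),
    (6, {"president", "vp"}),
    (5, {"director"}),
    (4, {"manager", "head"}),
    (3, {"lead", "principal"}),
    (2, {"senior"}),
    (1, {"engineer", "analyst", "associate", "consultant", "specialist"}),
]

def job_level(title):
    """Map a free-text job title to an ordinal level in 0-7 (0 = unknown)."""
    tokens = set(re.findall(r"[a-z]+", title.lower()))
    for level, keywords in JOB_LEVELS:
        if tokens & keywords:  # first (highest) level with a matching keyword
            return level
    return 0
```

Because the highest level is checked first, a title such as "Senior Engineer" is encoded as 2 rather than 1.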

Results (SPEAR v/s SPAM)

Feature set (num. features)  Metric        Random Forest  J48 Decision Tree  Naïve Bayes
Subject (7)                  Accuracy (%)  83.91          83.10              58.87
                             FP rate       0.208          0.227              0.371
Attachment (2)               Accuracy (%)  97.86          96.69              69.15
                             FP rate       0.035          0.046              0.218
All email (9)                Accuracy (%)  98.28          97.32              68.69
                             FP rate       0.024          0.035              0.221
Social (9)                   Accuracy (%)  81.73          76.63              65.85
                             FP rate       0.229          0.356              0.445
Email + Social (18)          Accuracy (%)  96.47          95.90              69.35
                             FP rate       0.052          0.054              0.232
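
The tables report accuracy and false-positive (FP) rate. The sketch below shows the standard definitions computed from binary confusion-matrix counts, assuming spear phishing is the positive class. The slides do not say how FP rate was aggregated, so the sketch also computes a class-weighted average of the per-class FP rates. Weka's "Weighted Avg." row, for example, reports this kind of aggregate, and it can differ from the single-class value. The function names are illustrative.

```python
def accuracy(tp, fp, tn, fn):
    # Fraction of all emails classified correctly.
    return (tp + tn) / (tp + fp + tn + fn)

def fp_rate(tp, fp, tn, fn):
    # FP rate of the positive class: negatives wrongly labelled positive.
    return fp / (fp + tn)

def weighted_fp_rate(tp, fp, tn, fn):
    """Per-class FP rates averaged with weights equal to each class's size."""
    pos, neg = tp + fn, fp + tn   # actual class sizes
    fpr_pos = fp / neg            # negatives predicted as positive
    fpr_neg = fn / pos            # positives predicted as negative
    return (pos * fpr_pos + neg * fpr_neg) / (pos + neg)
```

With 90 TP, 10 FN, 20 FP and 80 TN, accuracy is 0.85. The positive-class FP rate is 0.20, and the weighted average is 0.15.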

Results (SPEAR v/s SPAM) contd.
•  Most informative features
   •  Attachment size
   •  Length of attachment name
   •  Subject richness
   •  No. of characters in subject
   •  Location (from LinkedIn profile)
   •  No. of words in subject
   •  LinkedIn connections
   •  …
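
The slides list the most informative features but do not name the ranking method. Information gain is one common choice. The sketch below computes it for discrete features and ranks them. Continuous features such as attachment size would first need to be binned. This is an illustrative assumption and not necessarily the method the authors used.

```python
from collections import Counter
from math import log2

def entropy(labels):
    # Shannon entropy (in bits) of a list of class labels.
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def rank_features(features, labels):
    # features: dict mapping feature name -> list of (discretized) values.
    scores = {name: information_gain(vals, labels) for name, vals in features.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

A feature that perfectly separates two balanced classes scores 1 bit, and a constant feature scores 0.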

Results (SPEAR v/s SPAM) contd.

SPEAR v/s SPAM subjects

(Figure: ← Spam / phishing | Spear phishing →)

Results (SPEAR v/s BENIGN)

Feature set (num. features)  Metric        Random Forest  J48 Decision Tree  Naïve Bayes
Subject (7)                  Accuracy (%)  81.19          81.11              61.75
                             FP rate       0.210          0.217              0.489
Body (9)                     Accuracy (%)  97.17          95.62              53.81
                             FP rate       0.031          0.048              0.338
All email (16)               Accuracy (%)  97.39          95.84              54.14
                             FP rate       0.029          0.044              0.334
Social (9)                   Accuracy (%)  94.48          91.79              69.76
                             FP rate       0.067          0.103              0.278
Email + Social (25)          Accuracy (%)  97.04          95.28              57.27
                             FP rate       0.032          0.052              0.316

Results (SPEAR v/s BENIGN) contd.
•  Most informative features
   •  Body richness
   •  No. of characters in body
   •  No. of words in body
   •  No. of unique words in body
   •  Location (from LinkedIn)
   •  No. of newlines in body
   •  Subject richness
   •  …

Results (SPEAR v/s SPAM + BENIGN)

Feature set (num. features)  Metric        Random Forest  J48 Decision Tree  Naïve Bayes
Subject (7)                  Accuracy (%)  86.48          86.35              77.99
                             FP rate       0.333          0.352              0.681
Social (9)                   Accuracy (%)  88.04          84.69              74.46
                             FP rate       0.241          0.371              0.454
Email + Social (16)          Accuracy (%)  89.86          88.38              73.97
                             FP rate       0.202          0.248              0.381
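
These accuracies are easier to interpret next to a majority-class baseline. Suppose the classifiers had been trained and evaluated on the full email counts from the Dataset slide with no class balancing; the slides do not say whether this was the case. Always predicting the larger class would then score as computed below. `majority_baseline` is an illustrative helper.

```python
def majority_baseline(n_spear, n_other):
    # Accuracy of a classifier that always predicts the larger class.
    return max(n_spear, n_other) / (n_spear + n_other)

SPEAR, SPAM, BENIGN = 4742, 9353, 6601  # email counts from the Dataset slide

for name, other in [("SPAM", SPAM), ("BENIGN", BENIGN), ("SPAM + BENIGN", SPAM + BENIGN)]:
    print(f"SPEAR v/s {name}: {100 * majority_baseline(SPEAR, other):.2f}%")
```

Running it prints 66.36%, 58.19% and 77.09% respectively. Under that unbalanced-data assumption, the Naïve Bayes rows in the table above (for example 73.97%) fall below the 77.09% baseline, while Random Forest clears it.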

Results (SPEAR v/s SPAM + BENIGN) contd.
•  Most informative features
   •  Subject richness
   •  No. of characters in subject
   •  Location (from LinkedIn)
   •  LinkedIn connections
   •  No. of words in subject
   •  Email forwarded? (True / false)
   •  Email is a reply? (True / false)
   •  …

Discussion
•  Social features (from LinkedIn) did not help in distinguishing spear phishing emails from non-spear-phishing emails.
   •  Stylometric features from emails suffice to do so.
•  Real-world scenarios may be very different
   •  Attackers may use information from other sources / social networks, viz. Facebook, Twitter, etc.
•  Dataset limitation
   •  It is possible that none of the spear phishing emails in our dataset were crafted using information from LinkedIn
   •  We cannot conclude that such behavior would not be found outside our dataset, or in the future.

Thanks!

Prateek Dewan

W: http://precog.iiitd.edu.in/people/prateek

Backup slides…

Results (SPEAR v/s SPAM) contd.

Attachment names

Results (SPEAR v/s BENIGN) contd.

(Figure: ← Benign emails | Spear phishing →)

Attachment types

Details of organizations

