Date post: | 17-Dec-2014 |
Unifying the Global Response to Cybercrime
Analyzing Social and Stylometric Features to Identify Spearphishing Emails
Prateek Dewan, Anand Kashyap, Ponnurangam Kumaraguru
Indraprastha Institute of Information Technology – Delhi (IIITD), India
Overview
• What is spearphishing?
• Spearphishing and Online Social Media
• Challenges and dataset
• Feature extraction
• Classification results
• Discussion
What is spearphishing?
• Targeted phishing attack
• Contains contextual content instead of random messages
• Harder to detect, since spearphishing emails look more genuine
• Victims are asked to
  • Download malicious attachments
  • Reply with sensitive information
  • Click on URLs
  • …
Why study spearphishing?
• Victims are 4.5 times more likely to fall for spear phishing than for normal phishing [1].
• One of the main entry points for Advanced Persistent Threats.
• Causes losses worth millions.
[1] M. Jakobsson. Modeling and preventing phishing attacks. In Financial Cryptography, volume 5. Citeseer, 2005.
Spearphishing and social media
• Social media profiles can be a good source for the “context” part of spear phishing emails
• FBI warning on July 04, 2013 ¹
  • “…emails typically contain accurate information about victims obtained from data posted on social networking sites…”
¹ http://www.computerweekly.com/news/2240187487/FBI-warns-of-increased-spear-phishing-attacks
Data
• Emails
  • Spear phishing emails (Symantec)
  • Spam / phishing emails (Symantec)
  • Benign emails (Enron)
• LinkedIn profiles
  • Recipients of emails in the three datasets mentioned above
  • LinkedIn People Search API
Challenges (social features)
• Limited information about victim to identify her on social media
  • Only first name, last name, organization available from victim’s email ID
• Hard to find victim on Facebook, Twitter, Google+
  • Too many profiles with same first name, last name
  • Work field not searchable.
Challenges (social features) contd.
• LinkedIn – Only network which provides searching using work field
• People search API access restricted.
  • We requested access under their Vetted API access scheme.
• Rate limited
  • Only 100 requests per day per app
Dataset
• Emails sent to employees of 14 international organizations
• SPEAR (Targeted spear phishing emails from Symantec)
  • 4,742 emails → 2,434 victims / LinkedIn profiles
• SPAM (Spam / phishing emails from Symantec)
  • 9,353 emails → 5,912 victims / LinkedIn profiles
• BENIGN (Sample from Enron email corpus)
  • 6,601 emails → 1,240 victims / LinkedIn profiles
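To give a sense of what the 100-requests-per-day limit means at this scale, the short calculation below estimates a lower bound on collection time. It assumes one People Search request per victim and a single app; the slides do not say how many requests or apps were actually used, so this is only a rough illustration.

```python
import math

# Victim / LinkedIn profile counts from the Dataset slide.
VICTIMS = {"SPEAR": 2434, "SPAM": 5912, "BENIGN": 1240}
REQUESTS_PER_DAY = 100  # rate limit quoted on the Challenges slide


def min_days(lookups, per_day=REQUESTS_PER_DAY):
    """Smallest number of days needed for `lookups` requests (assumes one app)."""
    return math.ceil(lookups / per_day)


total = sum(VICTIMS.values())
print(total, "lookups need at least", min_days(total), "days")
```

Under these assumptions the lookups alone would take roughly three months, which is consistent with the deck listing the rate limit as a challenge.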
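Feature set creation
[Diagram: Emails from the SPAM, SPEAR, and BENIGN datasets yield stylometric features. Each recipient email address yields (1) firstName, (2) lastName, (3) organization, which are sent to http://api.linkedin.com/v1/people-search to retrieve LinkedIn profile(s); social features are extracted from those profiles. Stylometric and social features together form the final feature vector.]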
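The first step of this pipeline, turning a recipient address into search fields, can be sketched as follows. The slides do not describe the parsing rules, so the `first.last@organization.tld` pattern, the function name `name_fields_from_email`, and the choice of the first domain label as the organization are all assumptions made for illustration.

```python
import re


def name_fields_from_email(address):
    """Guess firstName, lastName, organization from an address.

    Assumes a local part such as first.last or first_last; returns None
    when that pattern does not apply (for example, info@example.com).
    """
    local, _, domain = address.partition("@")
    parts = [p for p in re.split(r"[._]", local) if p]
    if len(parts) < 2 or not domain:
        return None
    return {
        "firstName": parts[0].capitalize(),
        "lastName": parts[-1].capitalize(),
        # Hypothetical rule: first label of the domain names the organization.
        "organization": domain.split(".")[0],
    }


print(name_fields_from_email("jane.doe@example.com"))
```

Addresses that do not follow the assumed pattern yield no fields at all, which illustrates why only limited information about a victim is available from the email address.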
Stylometric Features
• Subject based (7)
  • Num. words, Num. characters, Richness
  • Has words: “bank”, “verify”
  • isReply, isForwarded
• Attachment based (2)
  • Length of attachment name
  • Attachment size
• Body based (9)
  • Num. words, Num. characters, Num. unique words
  • Has words: “attach”, “suspension”, “verify your account”
  • Num. newlines, Richness, function words
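As an illustration of the seven subject-based features, here is a minimal sketch. The slides do not define richness or say how replies and forwards are detected; this sketch assumes richness is the ratio of words to characters and that replies and forwards are marked by "Re:" and "Fw:" / "Fwd:" prefixes. The function names are hypothetical.

```python
import re


def richness(text):
    """Assumed definition: number of words divided by number of characters."""
    return len(text.split()) / len(text) if text else 0.0


def subject_features(subject):
    """Return the seven subject-based features as a dict."""
    lowered = subject.lower()
    return {
        "num_words": len(subject.split()),
        "num_chars": len(subject),
        "richness": richness(subject),
        "has_bank": "bank" in lowered,
        "has_verify": "verify" in lowered,
        # Assumed markers for replies and forwards.
        "is_reply": bool(re.match(r"\s*re:", lowered)),
        "is_forwarded": bool(re.match(r"\s*fwd?:", lowered)),
    }


print(subject_features("Re: Please verify your bank details"))
```

The body-based counts (words, characters, unique words, newlines) could be computed the same way on the message body.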
Social Features
• Location
• Connections
• Summary based (5)
  • Num. words, Num. characters, Num. unique words
  • Length, Richness
• Profession based (2)
  • Job Level (0-7)
  • Job Type (0-9)
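One way the email and social feature groups might be combined into a single vector is sketched below. The deck does not describe the encoding, so the prefixing scheme, the alphabetical ordering, the helper name `build_feature_vector`, and the sample values are assumptions; categorical fields such as Location would need their own encoding and are left out.

```python
def build_feature_vector(email_features, social_features):
    """Merge two feature dicts into (names, values) with a stable order."""
    merged = {}
    for prefix, group in (("email", email_features), ("social", social_features)):
        for name, value in group.items():
            # Booleans become 0.0 / 1.0 so every entry is numeric.
            merged[f"{prefix}.{name}"] = float(value)
    names = sorted(merged)
    return names, [merged[n] for n in names]


# Hypothetical values for one email and its recipient's profile.
names, vector = build_feature_vector(
    {"subject_num_words": 6, "subject_is_reply": True},
    {"connections": 250, "job_level": 3},
)
print(list(zip(names, vector)))
```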
Results (SPEAR v/s SPAM)
Feature set (num. features)  Metric        Random Forest  J48 Decision Tree  Naïve Bayes
Subject (7)                  Accuracy (%)  83.91          83.10              58.87
                             FP Rate       0.208          0.227              0.371
Attachment (2)               Accuracy (%)  97.86          96.69              69.15
                             FP Rate       0.035          0.046              0.218
All email (9)                Accuracy (%)  98.28          97.32              68.69
                             FP Rate       0.024          0.035              0.221
Social (9)                   Accuracy (%)  81.73          76.63              65.85
                             FP Rate       0.229          0.356              0.445
Email + Social (18)          Accuracy (%)  96.47          95.90              69.35
                             FP Rate       0.052          0.054              0.232
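For readers less familiar with the two metrics in this table, the sketch below computes them from confusion-matrix counts. The slides do not say how the FP rate was computed or averaged across classes; the sketch assumes the plain binary definition FP / (FP + TN), with spear phishing as the positive class. The counts used are made up.

```python
def accuracy_and_fp_rate(tp, fp, tn, fn):
    """Return (accuracy in percent, false-positive rate) from raw counts."""
    total = tp + fp + tn + fn
    accuracy = 100.0 * (tp + tn) / total
    fp_rate = fp / (fp + tn)  # share of negatives wrongly flagged as positive
    return accuracy, fp_rate


# Hypothetical counts, not taken from the study.
acc, fpr = accuracy_and_fp_rate(tp=90, fp=5, tn=95, fn=10)
print(f"accuracy = {acc:.2f}%, FP rate = {fpr:.3f}")
```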
Results (SPEAR v/s SPAM) contd.
• Most informative features
  • Attachment size
  • Length of attachment name
  • Subject Richness
  • No. of characters in subject
  • Location (from LinkedIn profile)
  • No. of words in subject
  • LinkedIn connections
  • …
Results (SPEAR v/s SPAM) contd.
SPEAR v/s SPAM subjects
[Figure: ← Spam / phishing | Spear phishing →]
Results (SPEAR v/s BENIGN)
Feature set (num. features)  Metric        Random Forest  J48 Decision Tree  Naïve Bayes
Subject (7)                  Accuracy (%)  81.19          81.11              61.75
                             FP Rate       0.210          0.217              0.489
Body (9)                     Accuracy (%)  97.17          95.62              53.81
                             FP Rate       0.031          0.048              0.338
All email (16)               Accuracy (%)  97.39          95.84              54.14
                             FP Rate       0.029          0.044              0.334
Social (9)                   Accuracy (%)  94.48          91.79              69.76
                             FP Rate       0.067          0.103              0.278
Email + Social (25)          Accuracy (%)  97.04          95.28              57.27
                             FP Rate       0.032          0.052              0.316
Results (SPEAR v/s BENIGN) contd.
• Most informative features
  • Body richness
  • No. of characters in body
  • No. of words in body
  • No. of unique words in body
  • Location (from LinkedIn)
  • No. of newlines in body
  • Subject richness
  • …
Results (SPEAR v/s SPAM + BENIGN)
Feature set (num. features)  Metric        Random Forest  J48 Decision Tree  Naïve Bayes
Subject (7)                  Accuracy (%)  86.48          86.35              77.99
                             FP Rate       0.333          0.352              0.681
Social (9)                   Accuracy (%)  88.04          84.69              74.46
                             FP Rate       0.241          0.371              0.454
Email + Social (16)          Accuracy (%)  89.86          88.38              73.97
                             FP Rate       0.202          0.248              0.381
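When reading the accuracy figures in the results tables, it can help to compare them with a majority-class baseline, that is, the accuracy of always predicting the larger class. The sketch below derives that baseline from the email counts on the Dataset slide. It assumes every email in each dataset was used in each experiment, which the slides do not confirm, so the numbers are indicative only.

```python
# Email counts from the Dataset slide.
EMAILS = {"SPEAR": 4742, "SPAM": 9353, "BENIGN": 6601}


def majority_baseline(positive, negatives):
    """Accuracy (%) of always predicting the larger of the two classes."""
    pos = EMAILS[positive]
    neg = sum(EMAILS[name] for name in negatives)
    return 100.0 * max(pos, neg) / (pos + neg)


for negatives in (["SPAM"], ["BENIGN"], ["SPAM", "BENIGN"]):
    label = " + ".join(negatives)
    print(f"SPEAR v/s {label}: {majority_baseline('SPEAR', negatives):.2f}%")
```

A classifier whose accuracy sits close to this baseline while its FP rate is high is likely leaning on the class imbalance rather than on the features.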
Results (SPEAR v/s SPAM + BENIGN) contd.
• Most informative features
  • Subject richness
  • No. of characters in subject
  • Location (from LinkedIn)
  • LinkedIn connections
  • No. of words in subject
  • Email forwarded? (True / false)
  • Email is a reply? (True / false)
  • …
Discussion
• Social features (from LinkedIn) did not help in distinguishing spear phishing emails from non-spear-phishing emails.
  • Stylometric features from emails suffice to do so.
• Real-world scenarios may be much different
  • Attackers may use information from other sources / social networks, viz. Facebook, Twitter, etc.
• Dataset limitation
  • It is possible that no spear phishing emails in our dataset were crafted using LinkedIn features
  • We cannot conclude that such behavior would not be found outside our dataset, or in the future.
Thanks!
Prateek Dewan E: [email protected]
W: http://precog.iiitd.edu.in/people/prateek
Backup slides…
Results (SPEAR v/s SPAM) contd.
Attachment names
Results (SPEAR v/s BENIGN) contd.
[Figure: ← Benign emails | Spear phishing →]
Attachment types
Details of organizations