Shady Paths: Leveraging Surfing Crowds to Detect Malicious Web Pages

Post on 05-Jul-2015

Slides of my talk at CCS 2013

Gianluca Stringhini, Christopher Kruegel, and Giovanni Vigna

University of California, Santa Barbara

The Web is a Dangerous Place

• Drive-by downloads

• Social engineering

Current Detection Techniques

Static Analysis

• Suspicious elements in URLs, JavaScript, Flash

• Weakness: obfuscation

Dynamic Analysis

• Visit the web page (honeyclients) and look for signs of exploitation

• Weakness: cloaking

Can only detect attacks that exploit vulnerabilities!

Our Technique

Redirection Graphs

By analyzing the characteristics of the set of visitors and of the redirection graph, we can determine if the destination page is malicious

No need to analyze the final page!

Legitimate Uses of Redirections

• Inform that a web page has moved

• Login functionalities

• Advertisements

We cannot flag all redirections as malicious

Luckily, malicious redirection graphs look different

Malicious Redirection Graphs

Uniform software configuration

Malicious Redirection Graphs

Cross-domain redirections

[Figure: a redirection that crosses domains, involving evil.co.cc and malicious.ru]

Malicious Redirection Graphs

“Hubs” to aggregate traffic

Malicious Redirection Graphs

“Infected” websites

System Overview

Our System: SpiderWeb

We leverage the differences between legitimate and malicious redirection graphs for detection

Three components:

• Data collection

• Creation of redirection graphs

• Classification component

Data Collection

SpiderWeb needs a set of navigation data from a diverse population of users

Dataset obtained from a large AV vendor

• Users of a browser security tool

• Data collection was opt-in only

• Data was anonymized

Creation of Redirection Graphs

When we specify the final page, we allow wildcards (e.g., malicious.com/*) → Groupings

We need to discard groupings that are too general

[Figure: example redirection graph over the domains a.com, b.com, c.com, and d.com]
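The grouping step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SpiderWeb's actual implementation: the helper names (`final_page_pattern`, `build_redirection_graphs`) are hypothetical, the only wildcard pattern used is the host-level `host/*` form from the slide's example, and the sample chains merely reuse the domain names from the figure with edges chosen for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def final_page_pattern(url):
    """Map a final URL to a host-level wildcard pattern, e.g. 'malicious.com/*'."""
    host = urlsplit(url).hostname or ""
    return f"{host}/*"


def build_redirection_graphs(chains):
    """Group chains by final-page pattern and merge each group into one graph.

    Each chain is a list of URLs ordered from referrer to final page.
    Returns {pattern: set of (source_host, destination_host) edges}.
    """
    graphs = defaultdict(set)
    for chain in chains:
        if not chain:
            continue
        pattern = final_page_pattern(chain[-1])
        hosts = [urlsplit(u).hostname or "" for u in chain]
        # Consecutive hosts in a chain become directed edges of the graph.
        graphs[pattern].update(zip(hosts, hosts[1:]))
    return dict(graphs)


# Hypothetical chains; the edges are assumptions made for this example.
chains = [
    ["http://a.com/", "http://c.com/r", "http://d.com/x"],
    ["http://b.com/", "http://c.com/r", "http://d.com/y"],
]
graphs = build_redirection_graphs(chains)
```

Both chains end on d.com, so they fall into the single grouping `d.com/*` and are merged into one graph. A pattern that matched too many unrelated final pages would merge unrelated chains, which is why overly general groupings have to be discarded.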

Classification Component

Five categories of features

• Client features (3 features)

• Referrer features (4 features)

• Landing page features (4 features)

• Final page features (5 features)

Distinct URLs, parameters, TLD, domain is an IP (how diverse are these elements)

• Redirection graph features (12 features)

Length of chains, same country across referrer and final page, intra-domain redirections, hubs

We use Support Vector Machines for classification
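To make the feature categories more concrete, here is a minimal sketch of how a few of them might be computed from the visits in one redirection graph. The definitions are assumptions made for illustration: the slides name the features but do not define them, so the `diversity` ratio, the "hub" test (an intermediate host that every chain passes through), and the field names `browser` and `chain` are all hypothetical.

```python
from collections import Counter


def diversity(values):
    """Fraction of distinct values: close to 0 means uniform, 1 means all differ."""
    values = list(values)
    return len(set(values)) / len(values) if values else 0.0


def graph_features(visits):
    """Compute a few illustrative features for one redirection graph.

    visits: list of dicts with 'browser' (str) and 'chain' (list of hostnames,
    ordered from referrer to final page). Both field names are hypothetical.
    """
    chains = [v["chain"] for v in visits]
    edges = [(s, d) for c in chains for s, d in zip(c, c[1:])]
    # For each intermediate host, count the chains in which it appears.
    intermediates = Counter(h for c in chains for h in set(c[1:-1]))
    return {
        "browser_diversity": diversity(v["browser"] for v in visits),
        "max_chain_length": max((len(c) for c in chains), default=0),
        "intra_domain_ratio": sum(s == d for s, d in edges) / len(edges) if edges else 0.0,
        # Assumed hub definition: some intermediate host lies on every chain.
        "has_hub": len(chains) > 1 and any(n == len(chains) for n in intermediates.values()),
    }


# Hypothetical visits: two users with the same browser, funneled through c.com.
visits = [
    {"browser": "IE8", "chain": ["a.com", "c.com", "d.com"]},
    {"browser": "IE8", "chain": ["b.com", "c.com", "d.com"]},
]
features = graph_features(visits)
```

A feature dictionary like this would then be turned into a numeric vector and passed to the classifier; that step is omitted here.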

Evaluation

Evaluation Dataset

388,098 redirection chains, collected over two months

• 34,011 final URLs

• 13,780 distinct user IP addresses per week

• 145 countries

Labeled dataset for training

• 2,533 redirection chains leading to 1,854 malicious URLs

• 2,466 redirection chains leading to 510 legitimate URLs

Analysis of the Classifier

SpiderWeb’s performance depends on the redirection graph complexity

• Complexity ≥ 6 causes no FPs and no FNs

• Our dataset is limited → we discard graphs with complexity < 4

We need to accept a certain amount of FPs and FNs

• Full URL grouping: 1.2% FP rate, 17% FN rate

Redirection-graph-specific features are the most important

• Without them, the FN rate rises to 67%

Detection in the Wild

3,549 redirection graphs with complexity ≥ 4

564 flagged as malicious → 3,368 URLs

778 URLs undetected by the AV vendor

• We could not confirm 1.5% of them

• Effectively complements state of the art

Comparison with Previous Work

A few previous systems leverage redirection information to detect malicious web pages

These systems also use other types of information

• WarningBird: uses Twitter profile information

• SURF: SEO specific

If this additional information is not present, SpiderWeb outperforms previous systems

Possible Use Cases

Offline detection (blacklist)

Online detection

Users get infected until the required “complexity” is reached

We performed a chronological experiment

• SpiderWeb would have protected 93% of users
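The idea behind a chronological replay can be sketched as follows. This is a simplified illustration, not the experiment from the talk: it assumes the classifier flags a malicious graph correctly as soon as the graph reaches the required complexity, and because the slides do not define "complexity", the caller supplies it through the hypothetical `complexity_of` argument. The default threshold of 4 mirrors the cut-off mentioned in the analysis of the classifier.

```python
def replay_visits(visits, complexity_of, threshold=4):
    """Replay the visits to one malicious redirection graph in time order.

    visits: list of (user_id, chain) tuples sorted chronologically.
    complexity_of: callable returning the complexity of the graph built from
        a list of chains (hypothetical; not defined in the slides).
    Returns (exposed, protected) lists of user ids.
    """
    seen, exposed, protected = [], [], []
    for user, chain in visits:
        if complexity_of(seen) >= threshold:
            protected.append(user)  # graph already classifiable: visit is blocked
        else:
            exposed.append(user)    # not enough evidence yet: user is at risk
        seen.append(chain)
    return exposed, protected


# Assumption for this example only: complexity = number of distinct referrers.
def distinct_referrers(chains):
    return len({chain[0] for chain in chains})


# Hypothetical visits: six users arriving from six different referrers.
visits = [(f"user{i}", [f"ref{i}.com", "hub.com", "evil.com"]) for i in range(6)]
exposed, protected = replay_visits(visits, distinct_referrers)
protected_share = len(protected) / len(visits)
```

In this toy run the first four users are exposed before the graph reaches the threshold and the last two are protected. With real traffic, the share of protected users depends on how quickly visits accumulate relative to the threshold.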

Discussion

Limitations

• Graphs with high complexity are required

• Groupings are not perfect

• Attackers might redirect users to legitimate pages

Attackers might make their redirections look legitimate

• Stop using cloaking (easier for previous work to detect)

• Stop using hubs (raises the bar)

Conclusions

• We showed that malicious and legitimate redirection graphs differ

• We presented a system that analyzes redirection graphs to detect malicious web pages

• We showed that our system is effective, and complements existing systems

Questions?

gianluca@cs.ucsb.edu

@gianlucaSB
