Post on 21-Dec-2015

Automated Worm Fingerprinting

Sumeet Singh, Cristian Estan, George Varghese, and Stefan Savage

Manan Sanghi

The menace

Context

Worm detection: scan detection, honeypots, host-based behavioral detection

Payload-based: ???

Context

Characterization: a priori vulnerability signatures, generally manual

Honeycomb: host-based, longest common subsequences

Autograph: network-level automatic signature generation

Context

Containment: host quarantine, string matching, connection throttling

Address blacklisting and content filtering (Internet Quarantine)

Worm behavior

Content invariance: polymorphism is limited (e.g., encryption), and key portions remain invariant (e.g., the decryption routine)

Content prevalence: the invariant portions appear frequently in traffic

Address dispersion: the number of distinct infected hosts grows over time, which shows up as many different source and destination addresses

Key Idea

Detect unknown worms on the basis of:

A common exploit sequence

A wide range of unique sources and destinations

Content Sifting

For each string w, maintain:

prevalence(w): number of times it is found in the network traffic

sources(w): number of unique sources corresponding to it

destinations(w): number of unique destinations corresponding to it

If the thresholds are exceeded, then block(w)
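To make the rule concrete, here is a naive version that keeps exact state for every string w. It is the expensive baseline that the following slide objects to, since it stores a full set of addresses per string. The class name and the rule that all three counts must exceed their thresholds are assumptions; the default thresholds (3 and 30) are taken from the parameters listed under Experience.

```python
from collections import defaultdict


class ExactContentSifter:
    """Hypothetical naive content sifter with exact per-string state."""

    def __init__(self, prevalence_threshold: int = 3, dispersion_threshold: int = 30):
        self.prevalence_threshold = prevalence_threshold
        self.dispersion_threshold = dispersion_threshold
        self.prevalence = defaultdict(int)     # w -> number of occurrences
        self.sources = defaultdict(set)        # w -> distinct source addresses
        self.destinations = defaultdict(set)   # w -> distinct destination addresses
        self.blocked = set()                   # strings for which block(w) fired

    def observe(self, w, src: str, dst: str) -> None:
        self.prevalence[w] += 1
        self.sources[w].add(src)
        self.destinations[w].add(dst)
        # Assumption: all three counts must exceed their thresholds
        if (self.prevalence[w] > self.prevalence_threshold
                and len(self.sources[w]) > self.dispersion_threshold
                and len(self.destinations[w]) > self.dispersion_threshold):
            self.blocked.add(w)
```

A string repeated many times between the same two hosts stays below the dispersion threshold, which is how content sifting separates worms from merely popular content.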

Issues

How can prevalence(w), sources(w), and destinations(w) be computed efficiently?

Scalable: low memory and CPU requirements

Real-time deployment over a gigabit-scale link

prevalence(w)

w = entire packet: use multi-stage filters (k-ary sketches?)

w = small fixed-length substring of b bytes: Rabin fingerprints, value sampling
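For the whole-packet case, a multi-stage filter keeps several small arrays of counters, hashes each key into one counter per stage, and reports the key only when every one of its counters passes the threshold. A minimal sketch, assuming 3 stages of 1024 counters, a BLAKE2 hash salted with the stage number, and threshold 3; all of these are illustrative choices, not values from the slides.

```python
import hashlib


class MultiStageFilter:
    """Sketch of a multi-stage filter for whole-packet prevalence (parameters are illustrative)."""

    def __init__(self, stages: int = 3, counters: int = 1024, threshold: int = 3):
        self.stages = stages
        self.counters = counters
        self.threshold = threshold
        # One counter array per stage: memory is stages * counters, independent of traffic
        self.tables = [[0] * counters for _ in range(stages)]

    def _index(self, stage: int, key: bytes) -> int:
        # Prefixing the stage number gives each stage a different hash (assumption)
        digest = hashlib.blake2b(bytes([stage]) + key, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.counters

    def add(self, key: bytes) -> bool:
        """Count one occurrence; return True if every stage counter now exceeds the threshold."""
        slots = [self._index(s, key) for s in range(self.stages)]
        for s, i in enumerate(slots):
            self.tables[s][i] += 1
        # Collisions only inflate counters, so a truly prevalent key is never missed
        return all(self.tables[s][i] > self.threshold for s, i in enumerate(slots))
```

Keys that pass all stages are the only ones that would need more expensive, exact tracking; a rare key is reported only if it collides with heavy keys in every stage at once.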

Value Sampling

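The problem: a packet of s bytes contains s - b + 1 substrings of length b. Solution: sample. But random sampling is not good enough, because independent random choices rarely pick the same substring in different copies of a worm. Trick: sample only those substrings whose fingerprint matches a certain pattern. Since Rabin fingerprints are uniformly distributed, the probability of tracking a worm whose invariant content is x bytes long is

$\Pr_{\text{track}}(x) = 1 - e^{-f(x-b+1)}$

where f is the fraction of fingerprints sampled.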
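A sketch of fingerprinting every b-byte substring with a rolling hash and then value-sampling the results. It swaps in a Rabin-Karp-style polynomial hash modulo the Mersenne prime 2^61 - 1 instead of true Rabin fingerprints (which use polynomial arithmetic over GF(2)); the constants BASE and MOD and the "low 6 bits are zero" pattern (f = 1/64) are assumptions. Because the sampling decision depends only on the fingerprint value, every copy of a worm substring is sampled or skipped consistently.

```python
import math

BASE = 257              # polynomial base (assumption)
MOD = (1 << 61) - 1     # Mersenne prime modulus (assumption)


def rolling_fingerprints(data: bytes, b: int):
    """Yield (offset, fingerprint) for each of the len(data) - b + 1 substrings of length b."""
    if len(data) < b:
        return
    high = pow(BASE, b - 1, MOD)   # weight of the byte that leaves the window
    fp = 0
    for i in range(b):
        fp = (fp * BASE + data[i]) % MOD
    yield 0, fp
    for i in range(b, len(data)):
        fp = (fp - data[i - b] * high) % MOD   # drop the oldest byte
        fp = (fp * BASE + data[i]) % MOD       # append the newest byte
        yield i - b + 1, fp


def value_sample(data: bytes, b: int, mask_bits: int = 6):
    """Keep fingerprints whose low mask_bits bits are zero, i.e. f = 2 ** -mask_bits."""
    mask = (1 << mask_bits) - 1
    return [(off, fp) for off, fp in rolling_fingerprints(data, b) if (fp & mask) == 0]


def p_track(x: int, b: int, f: float) -> float:
    """Probability that invariant content of x bytes yields at least one sampled substring."""
    return 1 - math.exp(-f * (x - b + 1))
```

For example, with b = 40 and f = 1/64 (illustrative values), p_track(200, 40, 1/64) is about 0.92.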

sources(w) & destinations(w)

Address dispersion: counting distinct elements as opposed to repeated elements

A simple list or hash table is too expensive. Key idea: bitmaps. Trick: scaled bitmaps

Direct Bitmap

Each content source is hashed into a bitmap, the corresponding bit is set, and an alarm is raised when the number of bits set exceeds a threshold

Drawback: the actual value of each counter cannot be estimated once the bitmap fills up (a small bitmap saturates at large counts)
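A minimal sketch of a direct bitmap, assuming a 1024-bit bitmap and a BLAKE2 hash (both illustrative). The alarm() method is the threshold test from the slide; estimate() adds the standard linear-counting estimator n ≈ -m ln(z/m), where m is the bitmap size and z the number of zero bits, which the slide does not mention. It returns infinity once every bit is set, which is the saturation drawback.

```python
import hashlib
import math


class DirectBitmap:
    """Hypothetical direct bitmap counting distinct addresses for one piece of content."""

    def __init__(self, bits: int = 1024, alarm_threshold: int = 30):
        self.bits = bits
        self.alarm_threshold = alarm_threshold
        self.bitmap = 0   # Python int used as a bit array

    def add(self, address: str) -> None:
        h = int.from_bytes(hashlib.blake2b(address.encode(), digest_size=8).digest(), "big")
        self.bitmap |= 1 << (h % self.bits)   # repeats set the same bit again

    def bits_set(self) -> int:
        return bin(self.bitmap).count("1")

    def alarm(self) -> bool:
        return self.bits_set() > self.alarm_threshold

    def estimate(self) -> float:
        zeros = self.bits - self.bits_set()
        if zeros == 0:
            return float("inf")   # saturated: the count can no longer be estimated
        return -self.bits * math.log(zeros / self.bits)
```

Repeated packets from the same source set the same bit, so the bitmap counts distinct addresses rather than occurrences.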

Scaled Bitmap

Idea: subsample the range of the hash space. How does it work?

Multiple bitmaps, each mapped to a progressively smaller portion of the hash space

Bitmaps are recycled when necessary
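A simplified sketch of the scaled-bitmap idea, assuming 4 bitmaps of 64 bits, one hash-space level per bitmap (level j covers a 2^-(j+1) slice, chosen by counting leading zero bits of a hash), and recycling whenever the coarsest bitmap is more than half full. These parameters, the recycling rule, and the linear-counting estimator are assumptions for illustration; the paper's exact layout and estimator may differ.

```python
import hashlib
import math


class ScaledBitmap:
    """Simplified scaled bitmap: fixed memory, sampled slice of hash space shrinks as counts grow."""

    def __init__(self, num_bitmaps: int = 4, bits: int = 64, fill_limit: float = 0.5):
        self.num_bitmaps = num_bitmaps
        self.bits = bits
        self.fill_limit = fill_limit
        self.base = 0                      # maps[i] covers hash level base + i
        self.maps = [0] * num_bitmaps      # Python ints used as bit arrays

    @staticmethod
    def _hashes(item: str):
        d = hashlib.blake2b(item.encode(), digest_size=16).digest()
        return int.from_bytes(d[:8], "big"), int.from_bytes(d[8:], "big")

    @staticmethod
    def _ones(bitmap: int) -> int:
        return bin(bitmap).count("1")

    def add(self, item: str) -> None:
        h_level, h_pos = self._hashes(item)
        level = 64 - h_level.bit_length()  # leading zeros: P(level = j) = 2 ** -(j + 1)
        idx = level - self.base
        if not 0 <= idx < self.num_bitmaps:
            return                         # outside the currently tracked slice
        self.maps[idx] |= 1 << (h_pos % self.bits)
        while self._ones(self.maps[0]) > self.fill_limit * self.bits:
            # Recycle: drop the coarsest bitmap and reuse its memory for the next finer level
            self.maps = self.maps[1:] + [0]
            self.base += 1

    def estimate(self) -> float:
        total = 0.0
        for bitmap in self.maps:
            zeros = max(self.bits - self._ones(bitmap), 1)
            total += -self.bits * math.log(zeros / self.bits)   # linear counting per bitmap
        covered = 2.0 ** -self.base * (1 - 2.0 ** -self.num_bitmaps)
        return total / covered             # scale up by the fraction of hash space tracked
```

Memory stays at num_bitmaps * bits regardless of the count, while the scale factor grows with it. The freshly recycled bitmap has missed earlier arrivals at its level, so this sketch slightly underestimates.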

Result

Roughly 5 times less memory, plus an actual estimate of address dispersion

Putting it together

Experience

System design: sensors and aggregators. Sensors sift through traffic on configurable address-space zones of responsibility; the aggregator coordinates real-time updates from the sensors, coalesces related signatures, and so on.

Parameters: content prevalence threshold: 3; address dispersion threshold: 30; garbage collection time: several hours
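A sketch of how the pieces might be wired together with these parameters: a prevalence count gates entry into an address-dispersion table, and dispersion entries idle for longer than the garbage-collection time are dropped. The Counter stands in for the multi-stage filter, exact sets stand in for the bitmaps, and gc_seconds = 3 hours is an assumed value for "several hours"; the class and method names are hypothetical.

```python
from collections import Counter


class SiftingPipeline:
    """Hypothetical two-stage sifter: prevalence gate, then address dispersion with garbage collection."""

    def __init__(self, prevalence_threshold=3, dispersion_threshold=30, gc_seconds=3 * 3600):
        self.prevalence_threshold = prevalence_threshold
        self.dispersion_threshold = dispersion_threshold
        self.gc_seconds = gc_seconds       # assumed value for "several hours"
        self.prevalence = Counter()        # stand-in for the multi-stage filter
        self.dispersion = {}               # fingerprint -> [sources, destinations, last_seen]
        self.alarms = set()

    def observe(self, fp, src, dst, now: float) -> None:
        self.prevalence[fp] += 1
        if fp not in self.dispersion:
            if self.prevalence[fp] <= self.prevalence_threshold:
                return                     # not prevalent yet: keep no dispersion state
            self.dispersion[fp] = [set(), set(), now]
        sources, destinations, _ = self.dispersion[fp]
        sources.add(src)
        destinations.add(dst)
        self.dispersion[fp][2] = now       # refresh the idle timer
        if (len(sources) > self.dispersion_threshold
                and len(destinations) > self.dispersion_threshold):
            self.alarms.add(fp)

    def garbage_collect(self, now: float) -> int:
        """Drop dispersion entries idle for longer than gc_seconds; return how many were dropped."""
        stale = [fp for fp, entry in self.dispersion.items() if now - entry[2] > self.gc_seconds]
        for fp in stale:
            del self.dispersion[fp]
        return len(stale)
```

Only content that is already prevalent pays for dispersion state, which keeps the expensive table small.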

prevalence(w) threshold

Address Dispersion threshold

Garbage Collection threshold

Trace-based False Positives

Performance

Processing time:

Memory consumption: 4 MB

Live Experience

Detect known worms: CodeRed,

Detect new worms: MyDoom, Sasser, Kibvu.B

Limitations & Extensions

Variant content

Network evasion

Extension: Dealing with slow worms

Comparison

Earlybird vs. Autograph

Both: fed with real network data (traces); Rabin fingerprints; white-lists/blacklists

Earlybird: no prefiltering | Autograph: flow reassembly

Earlybird: single-sensor algorithmics + centralized aggregators | Autograph: distributed deployment + active cooperation between multiple sensors

Earlybird: on-line | Autograph: off-line

Earlybird: overlapping, fixed-length chunks | Autograph: non-overlapping, variable-length chunks

Qinghua Zhang

Breather

Polygraph: Automatically Generating Signatures For Polymorphic Worms

James Newsome, Brad Karp, Dawn Song

The case for polymorphic worms

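A single substring is insufficient: for a polymorphic worm, a substring short enough to appear in every variant is typically too short to be absent from all normal traffic

Sensitive: should exist in all payloads of a worm

Specific: should be long enough not to exist in any non-worm payload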
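The two criteria can be measured directly on labeled pools. This hypothetical sketch (function name, samples, and byte values are all made up for illustration) shows the tradeoff for a single substring: a short invariant token is sensitive but not specific, while a long substring copied from one variant is specific but not sensitive.

```python
from typing import Sequence, Tuple


def evaluate_substring(sig: bytes, worm_pool: Sequence[bytes],
                       normal_pool: Sequence[bytes]) -> Tuple[float, float]:
    """Return (false-negative rate, false-positive rate) of a single-substring signature."""
    false_neg = sum(sig not in p for p in worm_pool) / len(worm_pool)
    false_pos = sum(sig in p for p in normal_pool) / len(normal_pool)
    return false_neg, false_pos


# Hypothetical polymorphic variants: short invariant pieces separated by bytes that change per copy
worm_pool = [
    b"GET /a8Fq.ida?x9Zk\xff\xbf",
    b"GET /Pz2w.ida?Lm4c\xff\xbf",
    b"GET /Qe7r.ida?Hy1d\xff\xbf",
]
normal_pool = [b"GET /index.html HTTP/1.1", b"GET /img/logo.png HTTP/1.1"]

print(evaluate_substring(b"GET /", worm_pool, normal_pool))           # sensitive, not specific
print(evaluate_substring(b"GET /a8Fq.ida?", worm_pool, normal_pool))  # specific, not sensitive
```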

Examples

Signature Classes

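A signature is a set of tokens:

Conjunction signatures: all tokens must be present, in any order

Token-subsequence signatures: all tokens must be present, in order

Bayes signatures: each token carries a score, and a flow matches when the total score exceeds a threshold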
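A minimal sketch of how a conjunction or a token-subsequence signature could be matched against a payload; the function names and example tokens are hypothetical. Tokens are raw byte strings, and the subsequence matcher requires non-overlapping, in-order occurrences.

```python
from typing import Sequence


def matches_conjunction(tokens: Sequence[bytes], payload: bytes) -> bool:
    """Every token occurs somewhere in the payload, in any order."""
    return all(t in payload for t in tokens)


def matches_subsequence(tokens: Sequence[bytes], payload: bytes) -> bool:
    """Tokens occur in the given order without overlapping."""
    pos = 0
    for t in tokens:
        idx = payload.find(t, pos)
        if idx < 0:
            return False
        pos = idx + len(t)   # the next token must start after this one ends
    return True


sig = [b"GET ", b".ida?", b"%u"]   # hypothetical tokens
print(matches_conjunction(sig, b"%u9090 .ida? GET /"))   # True: all present
print(matches_subsequence(sig, b"%u9090 .ida? GET /"))   # False: wrong order
```

The subsequence form is stricter, so it can reject payloads that merely happen to contain the same tokens in a different arrangement.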

Problem Formulation

Algorithms

Preprocessing: extract the distinct substrings (tokens) of a minimum length l that occur in at least k samples in the suspicious pool
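A brute-force sketch of this preprocessing step: collect, per sample, every distinct substring of length at least l, and keep those present in at least k samples. It is quadratic in sample length and meant only to pin down the definition; an efficient implementation would typically use a suffix-tree-style structure. The function name and parameter names are assumptions.

```python
from collections import Counter
from typing import Iterable, Set


def extract_tokens(samples: Iterable[bytes], min_len: int, k: int) -> Set[bytes]:
    """Distinct substrings of length >= min_len that occur in at least k samples."""
    counts = Counter()
    for s in samples:
        # A set per sample, so each substring counts once per sample, not once per occurrence
        subs = {s[i:j] for i in range(len(s)) for j in range(i + min_len, len(s) + 1)}
        counts.update(subs)
    return {t for t, c in counts.items() if c >= k}


print(extract_tokens([b"xxGETAAyy", b"zzGETAAww", b"GETAAqq"], min_len=5, k=3))  # {b'GETAA'}
```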

Generating signatures: conjunction signatures, token-subsequence signatures, Bayes signatures
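Of the three classes, Bayes signatures are the least obvious from the slide, so here is a naive-Bayes-style sketch: each token gets a weight log(P(t | worm) / P(t | normal)) estimated from the suspicious and innocuous pools, and a payload is flagged when the sum of the weights of its tokens exceeds a threshold. Add-one smoothing, the fixed threshold, and all names are assumptions; the paper's exact estimator and threshold selection may differ.

```python
import math
from typing import Dict, Sequence


def train_bayes(tokens: Sequence[bytes], suspicious: Sequence[bytes],
                innocuous: Sequence[bytes], alpha: float = 1.0) -> Dict[bytes, float]:
    """Per-token log-likelihood ratio, smoothed so unseen tokens do not give log(0)."""
    weights = {}
    for t in tokens:
        hits_worm = sum(t in s for s in suspicious)
        hits_norm = sum(t in s for s in innocuous)
        p_worm = (hits_worm + alpha) / (len(suspicious) + 2 * alpha)
        p_norm = (hits_norm + alpha) / (len(innocuous) + 2 * alpha)
        weights[t] = math.log(p_worm / p_norm)
    return weights


def bayes_score(weights: Dict[bytes, float], payload: bytes) -> float:
    """Sum the weights of the tokens present in the payload."""
    return sum(w for t, w in weights.items() if t in payload)


def is_worm(weights: Dict[bytes, float], payload: bytes, threshold: float) -> bool:
    return bayes_score(weights, payload) > threshold
```

Tokens common in normal traffic (such as a protocol header) get negative weights, so they cannot push a benign flow over the threshold on their own.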

Wrap Up

Automated Worm Fingerprinting (OSDI 2004)

Polygraph: Automatically Generating Signatures for Polymorphic Worms (IEEE Symposium on Security and Privacy 2005)

Manan Sanghi