
Post on 27-Mar-2015


1

Integrating BotMiner & SNARE into SMITE

Nick Feamster and Wenke Lee, Georgia Tech

Students: Shuang Hao, Junjie Zhang

2

Status Report

• Summary of BotMiner and SNARE

• Integration on GaTech campus network

• Preliminary evaluation results

• Next steps

3

SMITE Integration

4

BotMiner: Structure and Protocol Independent

• Botnets can change their C&C content (encryption, etc.), protocols (IRC, HTTP, etc.), structures (P2P, etc.), C&C servers, infection models …

[Figure: two example botnet topologies, panels (a) and (b); nodes are labeled "bot" and "C&C".]

5

Definition of a Botnet

• “A coordinated group of malware instances that are controlled by a botmaster via some C&C channel”
  – Hosts that have similar C&C-like traffic and similar malicious activities

• We need to monitor two planes
  – C-plane (C&C communication plane): “who is talking to whom”
  – A-plane (malicious activity plane): “who is doing what”

6

BotMiner Architecture

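[Figure: BotMiner architecture in three layers (sensors, algorithms, correlation). Network traffic feeds the A-plane monitor (scan, spam, binary downloading, exploit, ...), which writes an activity log, and the C-plane monitor, which writes a flow log. The logs feed A-plane clustering and C-plane clustering, whose outputs go to cross-plane correlation, which produces reports.]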
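As a rough illustration of the cross-plane step in the architecture above, the sketch below flags hosts that fall together in both a C-plane cluster and an A-plane cluster. It is a deliberately simplified stand-in for the idea, not BotMiner's actual correlation scoring; the function name, the "at least two shared hosts" rule, and the example addresses are all hypothetical.

```python
def cross_plane_candidates(c_clusters, a_clusters, min_shared=2):
    """Return hosts that co-occur in a C-plane cluster AND an A-plane cluster.

    Simplified assumption: a host is suspicious when at least `min_shared`
    hosts share both similar C&C-like traffic and similar malicious activity.
    """
    suspects = set()
    for c_cluster in c_clusters:
        for a_cluster in a_clusters:
            shared = set(c_cluster) & set(a_cluster)
            if len(shared) >= min_shared:
                suspects |= shared
    return suspects


# Hypothetical clusters of internal hosts
c_clusters = [{"10.0.0.1", "10.0.0.2", "10.0.0.3"}, {"10.0.0.9"}]
a_clusters = [{"10.0.0.1", "10.0.0.2"}, {"10.0.0.7", "10.0.0.9"}]

print(sorted(cross_plane_candidates(c_clusters, a_clusters)))
```

Requiring a group rather than a single host reflects the botnet definition used in these slides: a coordinated set of hosts, not one misbehaving machine.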

7

SNARE: Network-Level Spam Filter

• Single-Packet
  – AS of sender’s IP
  – Distance to k nearest senders
  – Status of email service ports
  – Geodesic distance
  – Time of day

• Single-Message
  – Number of recipients
  – Length of message

• Aggregate (Multiple Message/Recipient)
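To make the geodesic-distance feature concrete, here is a minimal sketch that computes the great-circle distance between two latitude/longitude points with the haversine formula. The slides do not say how SNARE computes this feature or how it geolocates IP addresses, so the function name, the Earth-radius constant, and the example coordinates are assumptions for illustration only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def geodesic_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine formula
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical sender and recipient locations (approximate coordinates)
sender = (33.75, -84.39)     # near Atlanta
recipient = (52.52, 13.40)   # near Berlin
print(f"{geodesic_distance_km(*sender, *recipient):.0f} km")
```

In practice the coordinates would come from an IP geolocation lookup for the sender and recipient addresses, which is outside the scope of this sketch.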

8

Test Environment

• Port mirrored from College of Computing network switch
  – About 300 Mbps

9

Current Status

• Real-time test on college network

• Summary of results
  – Pipeline runs in real time (200 to 300 Mbps)
  – BotMiner & SNARE run in batch mode, detecting bots/spammers based on one day of data
  – Results from 4 days of testing: September 21-24, 2009

10

Metrics

• Volume
  – N1: raw flows seen by the pipeline
  – N2: raw flows recorded
  – N3-B: C-flows (BotMiner)
  – N4-S: SMTP flows (SNARE)

• Time
  – T1: dumping raw flows
  – T2-B: aggregating raw flows into C-flows
  – T3-B: clustering and correlation
  – T4-S: feature extraction (single-packet based)
  – T5-S: building the classifier (based on sampled flows)
  – T6-S: detection
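The aggregation step measured by T2-B can be pictured with the sketch below. The slides do not define a C-flow, so this assumes one plausible reading: a C-flow groups the raw flows that share protocol, source IP, destination IP, and destination port within one epoch. The field names, the one-day epoch, and the sample records are hypothetical.

```python
from collections import defaultdict


def aggregate_c_flows(raw_flows, epoch_seconds=86400):
    """Group raw flows into C-flows keyed by (epoch, proto, src, dst, dport).

    Assumed definition; the one-day epoch mirrors the batch mode described
    in these slides but is otherwise an arbitrary choice.
    """
    groups = defaultdict(list)
    for flow in raw_flows:
        epoch = int(flow["start"] // epoch_seconds)
        key = (epoch, flow["proto"], flow["src"], flow["dst"], flow["dport"])
        groups[key].append(flow)
    return dict(groups)


# Hypothetical raw flow records
raw = [
    {"start": 10, "proto": "tcp", "src": "10.0.0.1", "dst": "198.51.100.5", "dport": 80},
    {"start": 500, "proto": "tcp", "src": "10.0.0.1", "dst": "198.51.100.5", "dport": 80},
    {"start": 20, "proto": "tcp", "src": "10.0.0.2", "dst": "198.51.100.5", "dport": 80},
]

c_flows = aggregate_c_flows(raw)
print(len(c_flows))  # 2: the two flows from 10.0.0.1 collapse into one C-flow
```

Collapsing repeated flows this way is one reason the C-flow count (N3-B) can be far smaller than the number of recorded flows (N2).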

11

Detection Metrics

• BotMiner
  – TP: detection rate (6 botnets, including HTTP-, IRC-, and P2P-based botnets)
  – FP: false positive rate

• SNARE
  – TP: detection rate (ground truth from DNSBL)
  – FP: false positive rate

12

Reducing Flow Volume

• N2 (# of flows recorded) < N1 (# of raw flows)

• Policies for reducing volume
  – Keep only the flows whose SrcIP is in internal networks and whose DstIP is in external networks
    • For TCP flows, to eliminate scanning flows, we only record in the database those flows that have at least 2 packets in the outgoing or incoming direction.
  – BotMiner detects scanning/spamming behaviors on raw flows (rather than on the flows recorded in the database)
  – SNARE works on SMTP flows

• Discard the flows whose IPs appear on the whitelist (e.g., major internal HTTP/DNS servers)
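The recording policies above can be sketched as a single predicate. This is only an illustration under stated assumptions: the internal prefix, the whitelisted address, and the flow field names are hypothetical, and "at least 2 packets in the outgoing or incoming direction" is read here as "either direction carries two or more packets".

```python
import ipaddress

# Hypothetical values; the real internal prefixes and whitelist are not given.
INTERNAL_NETS = [ipaddress.ip_network("10.0.0.0/8")]
WHITELIST = {ipaddress.ip_address("10.0.0.53")}  # e.g. an internal DNS server


def is_internal(ip):
    return any(ip in net for net in INTERNAL_NETS)


def should_record(flow):
    """Decide whether a raw flow is written to the flow database."""
    src = ipaddress.ip_address(flow["src"])
    dst = ipaddress.ip_address(flow["dst"])
    # Keep only internal -> external flows.
    if not is_internal(src) or is_internal(dst):
        return False
    # Drop flows involving whitelisted hosts (either endpoint, by assumption).
    if src in WHITELIST or dst in WHITELIST:
        return False
    # Drop TCP flows that look like scan probes: fewer than 2 packets each way.
    if flow["proto"] == "tcp" and max(flow["pkts_out"], flow["pkts_in"]) < 2:
        return False
    return True


flow = {"src": "10.1.2.3", "dst": "198.51.100.7", "proto": "tcp",
        "pkts_out": 3, "pkts_in": 2}
print(should_record(flow))  # True
```

Scan and spam detection would still see the unfiltered stream, since the slides state that BotMiner detects those behaviors on raw flows rather than on the recorded ones.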

13

Pipeline Configuration

• Device info
  – Box
    • Intel(R) Xeon(TM) CPU 3.00GHz
    • 2 GB memory
    • Debian Linux 2.6.16
  – NIC information
      Link encap:Ethernet HWaddr 00:15:c5:e6:72:96
      inet6 addr: 2610:148:1f02:8f00:215:c5ff:fee6:7296/64 Scope:Global
      inet6 addr: fe80::215:c5ff:fee6:7296/64 Scope:Link
      UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

• Pipeline configuration
      -pcaplive device=eth1
      -addressanalysis
      -flow_analyzer dump_period=600 (10 minutes)

14

Volume: Number of Flows

Date         N1 (# of raw flows)   N2 (# of flows in DB)   N3-B (# of C-flows)   N4-S (# of SMTP flows)
2009-09-21   486,636,396           936,397                 21,204                307,931
2009-09-22   450,589,989           936,962                 29,912                287,695
2009-09-23   380,575,773           869,811                 15,796                404,746
2009-09-24   454,792,651           967,945                 13,070                404,426
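A quick calculation on the table above shows how strongly the recording policies shrink the data: on each of the four days, only about 0.2% of the raw flows end up in the database. The snippet below reproduces that arithmetic from the N1 and N2 columns; the variable and function names are just for illustration.

```python
# (N1 raw flows, N2 flows in DB) per day, copied from the table above
volumes = {
    "2009-09-21": (486_636_396, 936_397),
    "2009-09-22": (450_589_989, 936_962),
    "2009-09-23": (380_575_773, 869_811),
    "2009-09-24": (454_792_651, 967_945),
}


def retained_fraction(n1, n2):
    """Fraction of raw flows (N1) that are recorded in the database (N2)."""
    return n2 / n1


for date, (n1, n2) in volumes.items():
    print(f"{date}: {retained_fraction(n1, n2):.3%} of raw flows recorded")
```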

15

BotMiner Evaluation: Time

• All times in minutes

Date         T1 (dumping raw flows)   T2-B (flow aggregation)   T3-B (clustering and correlation)
2009-09-21   900                      56                        8
2009-09-22   780                      41                        10
2009-09-23   540                      43                        6
2009-09-24   606                      48                        5

16

BotMiner Evaluation: Detection

• False positives are evaluated against the number of internal hosts that appear in the recorded flows.

Date         B-HTTP-I   B-HTTP-II   B-IRC   B-spybot   B-sdbot   Waledac (P2P)   False positives
2009-09-21   4/4        2/4         4/4     3/4        3/4       3/3             11/889
2009-09-22   4/4        2/4         4/4     3/4        3/4       3/3             10/850
2009-09-23   3/4        2/4         4/4     2/4        2/4       3/3             9/799
2009-09-24   4/4        4/4         4/4     4/4        4/4       3/3             11/801
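The per-day rates implied by the table can be worked out as follows. The sketch assumes each "x/y" cell means x bots detected out of y bots in that botnet, and it pools the six botnets into one overall detection rate per day; that pooling is a convenience for illustration, not a metric reported in the slides. With these numbers the false positive rate stays between roughly 1.1% and 1.4% of internal hosts.

```python
# (detected, total) per botnet, then (false positives, internal hosts),
# copied from the table above
results = {
    "2009-09-21": {"bots": [(4, 4), (2, 4), (4, 4), (3, 4), (3, 4), (3, 3)], "fp": (11, 889)},
    "2009-09-22": {"bots": [(4, 4), (2, 4), (4, 4), (3, 4), (3, 4), (3, 3)], "fp": (10, 850)},
    "2009-09-23": {"bots": [(3, 4), (2, 4), (4, 4), (2, 4), (2, 4), (3, 3)], "fp": (9, 799)},
    "2009-09-24": {"bots": [(4, 4), (4, 4), (4, 4), (4, 4), (4, 4), (3, 3)], "fp": (11, 801)},
}


def rates(day):
    """Return (pooled detection rate, false positive rate) for one day."""
    detected = sum(d for d, _ in day["bots"])
    total = sum(t for _, t in day["bots"])
    false_pos, hosts = day["fp"]
    return detected / total, false_pos / hosts


for date, day in results.items():
    tp_rate, fp_rate = rates(day)
    print(f"{date}: detection {tp_rate:.1%}, false positives {fp_rate:.2%}")
```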

17

SNARE Evaluation

• Single packet/header features (for initial testing):
  – AS number
  – Geodesic distance between the sender and the recipient
  – Message size (bytes sent)
  – Local hour when the email was sent

18

Evaluation of SNARE

• SNARE trains on sampled SMTP flows (in T5-S)

• All times in seconds

Date         T4-S (feature extraction)   T5-S (model, 10,000 samples)   T5-S (model, 30,000 samples)   T5-S (model, 50,000 samples)   T6-S (detection)
2009-09-21   34.50                       73.80                          247.61                         3857.27                        35.59
2009-09-22   32.67                       74.79                          198.53                         3967.38                        32.53
2009-09-23   44.87                       70.12                          184.16                         3689.47                        45.46
2009-09-24   45.47                       68.03                          184.12                         3731.98                        46.50

1) The detection time (T6-S) is relatively small (note: it covers all SMTP flows).

2) The time for training on 50,000 samples (in T5-S) is high, probably because training reaches the physical memory limit.

19

Next Steps

• Optimize the flow dumping process to improve efficiency.

• In the case of SNARE, evaluate with more features.