Network Mapping and Anomaly Detection, Athina Markopoulou (Irvine)


Network Mapping and Anomaly Detection

Athina Markopoulou (Irvine)

Robert Calderbank (Princeton)

Rob Nowak (Madison)

MURI Kickoff Meeting September 19, 2009

Outline

Challenges
- Applications
- Mathematics

Preliminary Results
- Detecting Malicious Traffic Sources (Athina Markopoulou)
- Network Topology Identification
- Network-wide Anomaly Detection

Research Directions

Application Challenges

Network Mapping: Infer network topology/connectivity from minimal measurements

Detecting Topology Changes: Quickly sense changes in routing or connectivity

Network-wide Anomalies: Detect weak and distributed patterns of anomalous network activity

Predicting Malicious Traffic: Identify network sources that are likely to launch future attacks

Mathematical Challenges

Vastly Incomplete Data: Impossible to monitor a network everywhere and all the time. Where and when should we measure?

Large-scale Inference: Inference of high-dimensional signals/graphs from noisy and incomplete data. Robust statistical data analysis and scalable algorithms are crucial.

Network Representations: Statistical analysis matched to network structures. Can network data be ‘sparsified’ using new representations and transformations?

Network Prediction Models: New ‘network-centric’ statistical methods are needed to cluster network nodes for robust prediction from limited datasets.

Predicting Malicious Traffic Sources

Predictive Blacklisting as an Implicit Recommendation System

• Problem: predict sources of malicious traffic on the Internet
  – Blacklists:
    • list of worst offenders (source IP addresses or prefixes)
    • used to block (or to further scrub) traffic originating from those sources
  – Goal:
    • predict malicious sources that are likely to attack a victim in the future, based on past logs

• Prediction vs. Detection
  • strictly speaking, this is not "detection"
  • but it does require finding patterns in the data

Traditional Blacklist Generation

• Local Worst Offender List (LWOL)
  – Most prolific local offenders
  – Reactive but not proactive

• Global Worst Offender List (GWOL)
  – Most prolific global offenders
  – Might contain irrelevant offenders
  – Non-prolific attackers evade GWOL

• State of the art: Collaborative Blacklisting
  – J. Zhang, P. Porras, and J. Ullrich, "Highly Predictive Blacklisting", USENIX Security 2008 (best paper award)
  – Key idea: a victim is likely to be attacked not only by sources that attacked this particular victim, but also by sources that attacked "similar victims"
  – Methodology: use link analysis (PageRank) on the victim similarity graph to predict future attacks
  – First methodological development in this problem in a long time!
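The HPB idea of propagating relevance over a victim similarity graph can be illustrated with a simplified sketch. It assumes a binary victim-by-attacker matrix `attacks` built from past logs, takes similarity as the number of shared attackers, and uses a personalized-PageRank-style iteration with a hypothetical damping factor `alpha`. It is written in the spirit of HPB, not as the exact algorithm of Zhang et al.; all function names are hypothetical.

```python
import numpy as np

def victim_similarity(attacks: np.ndarray) -> np.ndarray:
    """W[i, j] = number of attackers shared by victims i and j (no self-loops)."""
    a = (attacks > 0).astype(float)
    w = a @ a.T
    np.fill_diagonal(w, 0.0)
    return w

def relevance(attacks: np.ndarray, victim: int, alpha: float = 0.5, iters: int = 50) -> np.ndarray:
    """Personalized-PageRank-style relevance of every victim to `victim`.

    alpha is a hypothetical damping factor: the weight given to relevance
    propagated from similar victims versus restarting at `victim`.
    """
    w = victim_similarity(attacks)
    row_sums = w.sum(axis=1, keepdims=True)
    # Row-normalize; a victim that shares no attackers keeps an all-zero row.
    p = np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)
    restart = np.zeros(len(w))
    restart[victim] = 1.0
    r = restart.copy()
    for _ in range(iters):
        r = (1 - alpha) * restart + alpha * (p.T @ r)
    return r

def hpb_style_blacklist(attacks: np.ndarray, victim: int, size: int, alpha: float = 0.5) -> list:
    """Rank attackers by relevance-weighted attack reports across all victims."""
    r = relevance(attacks, victim, alpha)
    scores = r @ (attacks > 0).astype(float)
    order = np.argsort(-scores, kind="stable")
    return [int(i) for i in order[:size]]

# Rows: victims, columns: attackers (1 = attack seen in past logs).
attacks = np.array([[1, 1, 0, 0],
                    [1, 1, 1, 0],
                    [0, 0, 0, 1]])
print(hpb_style_blacklist(attacks, victim=0, size=3))  # [0, 1, 2]
```

Attacker 2 never attacked victim 0, but it still enters victim 0's blacklist ahead of attacker 3 because it attacked the similar victim 1. That is the "similar victims" intuition above.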

Formulating Predictive Blacklisting as an Implicit Recommendation System

Recommendation system (e.g. Netflix, Amazon): R = rating matrix, with missing ratings marked ?

3 2 ? ?
1 ? ? 4
6 3 1 9
? ? 2 ?

Predictive Blacklisting: R(t) = attack matrix (attackers vs. victims), one matrix per time window. Past windows contain partially known attack counts (?), and the matrix for the next time window is entirely unknown and must be predicted.

[Figure: attack matrices R(t) over time]

Collaborative Filtering (CF): different techniques capture different patterns in the data

• Multi-level Prediction
  – Individual level: (attacker, victim) pair
    • use time series to project past trends
  – Local level: neighborhood-based CF
    • group "similar" victim networks (kNN); the notion of similarity accounts for common attackers and time
    • groups of attackers attacking the same victims: find them using the cross-association (CA) method
  – [Global level: factorization-based CF (in progress)]
    • find latent factors in the data using, e.g., SVD

• Combine ratings from different predictors
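Two of the levels above, and their combination, can be sketched as follows. The sketch assumes a tensor `history` of shape (time windows, victims, attackers) holding attack counts. The individual level uses an exponentially weighted moving average (EWMA) as the time-series projection, with a hypothetical smoothing factor `beta`. The local level uses kNN over victims with cosine similarity of their EWMA profiles. The blend weight `w` is hypothetical. Cross-association and SVD are omitted, and the actual predictors in this work may differ.

```python
import numpy as np

def ewma_forecast(history: np.ndarray, beta: float = 0.5) -> np.ndarray:
    """Individual level: EWMA of each (victim, attacker) count over time windows."""
    forecast = np.zeros(history.shape[1:])
    for window in history:  # windows ordered oldest to newest
        forecast = beta * window + (1 - beta) * forecast
    return forecast

def knn_forecast(scores: np.ndarray, k: int = 2) -> np.ndarray:
    """Local level: weighted average of the k most similar victims' scores (cosine)."""
    norms = np.linalg.norm(scores, axis=1, keepdims=True)
    unit = np.divide(scores, norms, out=np.zeros_like(scores), where=norms > 0)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a victim is not its own neighbor
    out = np.zeros_like(scores)
    for v in range(len(scores)):
        nbrs = np.argsort(-sim[v])[:k]
        weights = np.clip(sim[v, nbrs], 0.0, None)
        if weights.sum() > 0:
            out[v] = weights @ scores[nbrs] / weights.sum()
    return out

def combined_forecast(history: np.ndarray, beta: float = 0.5, k: int = 2, w: float = 0.5) -> np.ndarray:
    """Blend individual and local predictions with a hypothetical weight w."""
    individual = ewma_forecast(history, beta)
    return w * individual + (1 - w) * knn_forecast(individual, k)

# Two identical windows; 3 victims x 3 attackers.
history = np.array([[[1, 0, 0], [1, 1, 0], [0, 0, 1]]] * 2, dtype=float)
print(combined_forecast(history, k=1))
```

Victim 0 never saw attacker 1, but it receives a positive score for attacker 1 through its nearest neighbor, victim 1.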

Tested our approach on DShield data: 6 months of logs

• DShield.org is a central repository of shared logs
  – Several victim organizations submit their IDS logs (flow data)
  – The repository analyzes the logs and provides a predictive blacklist, tailored to each victim

[Figure: DShield.org]

Several different patterns co-exist in the data and should be detected and used for prediction

Preliminary results

• A combination of methods significantly improves the hit count of the blacklist
  – up to 70% (57% on average) compared to the state of the art (HPB)

[Figure legend: combined method; state of the art (HPB); older method (GWOL)]

• and there is much room for improvement
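The improvement figures above are relative gains in hit count. A minimal sketch of how such numbers could be computed follows. It assumes the hit count is the number of attack events in the test window whose source is on the blacklist; the exact definition used by Soldo et al. may differ, for example by counting distinct attackers. The function names and source addresses are hypothetical.

```python
def hit_count(blacklist, test_attacks):
    """Number of test-window attack events whose source is on the blacklist."""
    listed = set(blacklist)
    return sum(1 for source in test_attacks if source in listed)

def relative_improvement(hits_new, hits_base):
    """Fractional gain over a baseline blacklist, e.g. 0.57 for a 57% improvement."""
    return (hits_new - hits_base) / hits_base

# Hypothetical attack events (sources) seen in the test window.
test_attacks = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.4"]
new = hit_count(["10.0.0.1", "10.0.0.3"], test_attacks)   # 3 hits
base = hit_count(["10.0.0.2", "10.0.0.4"], test_attacks)  # 2 hits
print(relative_improvement(new, base))  # 0.5
```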

Challenges & Future Directions

• Get closer to the upper bound
  – Latent factor techniques
  – Dealing with missing data

• Adversarial model
• Scalability

• Hopefully, interactions with other people in this group

• F. Soldo, A. Le, A. Markopoulou, http://arxiv.org/abs/0908.2007

Network Mapping

Existing Methods: Active probing (e.g., traceroute)

New Approach: Passive monitoring

Lumeta Corporation

The Data

[Figure: matrix of hop-counts, end-hosts × monitors]

Hop-counts from end-hosts to monitors; extracted from TTL fields of traffic at monitors
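Hop-counts are often extracted from TTL fields with a simple heuristic. Operating systems start packets at one of a few default initial TTLs, so the initial value is taken to be the smallest common default at or above the observed TTL, and the hop-count is the difference. The sketch below follows that heuristic. The set of defaults, the observation tuple format and the function names are assumptions, not necessarily the procedure used for this data. Pairs that are never observed stay NaN, which produces the missing entries discussed next.

```python
import numpy as np

# Assumed set of common default initial TTLs; some stacks use other values.
DEFAULT_INITIAL_TTLS = (32, 64, 128, 255)

def hop_count(observed_ttl: int) -> int:
    """Estimate hops traveled: smallest default initial TTL >= observed, minus observed."""
    if not 1 <= observed_ttl <= 255:
        raise ValueError("TTL must be in 1..255")
    initial = min(t for t in DEFAULT_INITIAL_TTLS if t >= observed_ttl)
    return initial - observed_ttl

def hop_count_matrix(observations, hosts, monitors) -> np.ndarray:
    """End-host x monitor matrix of hop-counts; NaN marks pairs never observed.

    observations: iterable of (host, monitor, observed_ttl) tuples (hypothetical format).
    """
    row = {h: i for i, h in enumerate(hosts)}
    col = {m: j for j, m in enumerate(monitors)}
    matrix = np.full((len(hosts), len(monitors)), np.nan)
    for host, monitor, ttl in observations:
        matrix[row[host], col[monitor]] = hop_count(ttl)
    return matrix

obs = [("h1", "m1", 54), ("h2", "m2", 117)]
print(hop_count_matrix(obs, ["h1", "h2"], ["m1", "m2"]))  # [[10, nan], [nan, 11]]
```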

Clustering End-Hosts

Problem: Use hop-count data to automatically cluster end-hosts into topologically relevant groups (e.g., subnets)

Intuition: End-hosts with similar hop-counts are probably close together

Challenge: Clustering with missing data

[Figure: 2-d histogram of hop-counts (axes: hop-counts to two honeypot monitors); ellipses indicate end-hosts from different subnets]

Matrix Completion

[Figure: observed hop-count matrix and complete hop-count matrix]

Observed hop-counts are random samples of entries of the complete hop-count matrix.

The SVD of the hop-count matrix shows that it is low-rank.
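Because the complete matrix is low-rank, its missing entries can be estimated from the observed ones. The minimal sketch below uses iterated truncated SVD ("hard impute"), one standard completion technique, and is not necessarily the one used in this work. It fills missing entries, projects onto rank `rank`, restores the observed entries, and repeats. The rank is assumed known, and the function name and iteration count are hypothetical.

```python
import numpy as np

def complete_low_rank(m: np.ndarray, rank: int, iters: int = 500) -> np.ndarray:
    """Fill NaN entries of m using an iterated rank-`rank` SVD ("hard impute")."""
    observed = ~np.isnan(m)
    filled = np.where(observed, m, np.nanmean(m))  # start missing entries at the mean
    for _ in range(iters):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
        filled = np.where(observed, m, low_rank)  # observed entries stay fixed
    return filled

# Synthetic rank-2 matrix standing in for hop-counts, with about 30% of entries hidden.
rng = np.random.default_rng(0)
truth = rng.normal(size=(40, 2)) @ rng.normal(size=(2, 20))
mask = rng.random(truth.shape) < 0.7
estimate = complete_low_rank(np.where(mask, truth, np.nan), rank=2)
print(np.linalg.norm(estimate - truth) / np.linalg.norm(truth))
```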

Results

[Figure: clusters from complete data vs. clusters from 25% of the data (mixture model); axis: fraction of complete data, from 0 to 1]
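Clustering (completed) hop-count vectors with a mixture model can be sketched as EM for a Gaussian mixture with diagonal covariances. The slide does not specify the mixture family. The Gaussian choice, the number of clusters `k`, the deterministic farthest-point initialization and the variance floor are all assumptions of this sketch.

```python
import numpy as np

def gmm_cluster(x: np.ndarray, k: int, iters: int = 100, var_floor: float = 1e-3) -> np.ndarray:
    """EM for a diagonal-covariance Gaussian mixture; returns one label per row of x."""
    n, _ = x.shape
    # Deterministic farthest-point initialization of the k means.
    idx = [0]
    for _ in range(1, k):
        d2 = ((x[:, None, :] - x[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(np.argmax(d2)))
    means = x[idx].astype(float)
    var = np.tile(x.var(axis=0) + var_floor, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: log density of each row under each component, plus log weight.
        log_p = (
            np.log(weights)
            - 0.5 * np.log(2 * np.pi * var).sum(axis=1)
            - 0.5 * (((x[:, None, :] - means[None]) ** 2) / var[None]).sum(axis=2)
        )
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights, means and per-dimension variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ x) / nk[:, None]
        var = np.maximum((resp.T @ x**2) / nk[:, None] - means**2, 0.0) + var_floor
    return resp.argmax(axis=1)

# Two hypothetical subnets with distinct hop-count profiles to 3 monitors.
rng = np.random.default_rng(1)
a = np.array([5.0, 10.0, 3.0]) + 0.5 * rng.standard_normal((10, 3))
b = np.array([12.0, 4.0, 9.0]) + 0.5 * rng.standard_normal((10, 3))
x = np.vstack([a, b])
print(gmm_cluster(x, k=2))
```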

Network-wide Anomaly Detection

Observation model: each node measurement is the signal strength times an unknown binary (0/1) activation pattern, observed in noise.

• Weak in strength (small signal): invisible in per-node signatures
• Weak in extent (few affected nodes): invisible in the network-wide aggregate

Distributed Network Anomaly Detection

Prior work: weak and unstructured patterns can be detected by exploiting multiplicity (Ingster; Jin-Donoho ’03; Abramovich et al. ’01)

Subtle adaptive testing procedures: Higher criticism, False discovery control

[Figure: "Now you see it, now you don’t"; localizable region; axes: sparsity, # active nodes]

Detecting weak and sparse patterns
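The two adaptive procedures named above can be sketched on per-node z-scores. Higher Criticism (Donoho and Jin) sorts the p-values and takes the largest standardized excess of small p-values over what a uniform null predicts. The Benjamini-Hochberg step-up rule controls the false discovery rate when declaring individual nodes active. The one-sided N(0, 1) null, the restriction to the smallest fraction `alpha0` of p-values and the level `q` are assumptions of this sketch.

```python
import math
from statistics import NormalDist

def higher_criticism(z_scores, alpha0=0.5):
    """Higher Criticism statistic for per-node z-scores under a one-sided N(0, 1) null."""
    n = len(z_scores)
    p = sorted(0.5 * math.erfc(z / math.sqrt(2)) for z in z_scores)  # upper-tail p-values
    best = -math.inf
    for i, p_i in enumerate(p[: max(1, int(alpha0 * n))], start=1):
        if 0 < p_i < 1:
            hc = math.sqrt(n) * (i / n - p_i) / math.sqrt(p_i * (1 - p_i))
            best = max(best, hc)
    return best

def benjamini_hochberg(p_values, q=0.1):
    """Indices declared active by the Benjamini-Hochberg step-up rule at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= q * rank / m:
            k = rank  # largest rank that passes its threshold
    return sorted(order[:k])

n = 1000
# Null: z-scores whose p-values are exactly evenly spread, i / (n + 1).
null = [NormalDist().inv_cdf(1 - i / (n + 1)) for i in range(1, n + 1)]
# Sparse signal: 20 nodes raised to z = 4; no single node is overwhelming.
sparse = [4.0] * 20 + null[20:]
print(higher_criticism(null), higher_criticism(sparse))
print(benjamini_hochberg([0.001, 0.5, 0.02, 0.9], q=0.1))  # [0, 2]
```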

In addition to multiplicity, can we exploit the (possibly non-local) dependencies between node measurements to boost performance?

Method must be adaptive to network interaction structure.

How do node interactions affect thresholds of detectability/localizability?

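[Figure: network anomaly patterns over nodes and their interactions]

Latent multi-scale Ising model:

$$ p(x) \propto \exp\Big( \sum_{(i,j) \in E} \theta_{ij}\, \mathbb{1}\{x_i = x_j\} \Big) $$

where the sum counts edge agreements, each weighted by its strength of interaction θ_ij.

Observed: network node measurements. Latent: multi-scale dependencies.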

Modeling network anomaly patterns
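On a tree, an Ising model of the form p(x) ∝ exp(θ · #edge agreements), with uniform interaction strength θ and no external field, factorizes over edges. The root is uniform by symmetry, and each child agrees with its parent with probability e^θ/(1+e^θ). The sketch samples binary activation patterns this way on a balanced binary tree and returns only the leaves. The tree shape and the assumption that only the leaves are observed are illustrative choices, and the function names are hypothetical. Larger θ yields patterns that are coherent over whole subtrees.

```python
import math
import random

def agreement_prob(theta: float) -> float:
    """P(child == parent) on one tree edge: e^theta / (1 + e^theta)."""
    return 1.0 / (1.0 + math.exp(-theta))

def sample_leaves(depth: int, theta: float, rng: random.Random) -> list[int]:
    """Sample a 0/1 pattern on a balanced binary tree; return its 2**depth leaves."""
    level = [rng.randint(0, 1)]  # root is uniform by symmetry (no external field)
    q = agreement_prob(theta)
    for _ in range(depth):
        # Each node has two children; each child copies its parent with probability q.
        level = [
            parent if rng.random() < q else 1 - parent
            for parent in level
            for _child in range(2)
        ]
    return level

rng = random.Random(0)
print(sample_leaves(3, theta=2.0, rng=rng))  # 8 leaf values in {0, 1}
```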

Theorem: Consider a latent multi-scale Ising model with uniform node interaction strength . With probability ,

(1) the correct dependency structure (tree) can be learnt from i.i.d. network observations x by hierarchical correlation clustering.

(2) the number of non-zero basis coefficients for an x drawn at random is .

Hierarchical correlation clustering → Unbalanced Haar basis

Hierarchical clustering and basis learning
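Hierarchical correlation clustering followed by an unbalanced Haar basis can be sketched as follows. Agglomerative clustering repeatedly merges the two clusters with the highest average sample correlation. Each merge of clusters A and B emits a vector that is constant with opposite signs on A and B, scaled by √(|A||B|/(|A|+|B|)) so that it has unit norm, and a constant vector is added at the end. The p resulting vectors form an orthonormal basis. The average-linkage merge rule and the function name are assumptions, not necessarily the authors' exact procedure.

```python
import numpy as np

def unbalanced_haar_basis(x: np.ndarray) -> np.ndarray:
    """Learn an orthonormal basis (rows) from observations x of shape (n, p)."""
    p = x.shape[1]
    corr = np.corrcoef(x, rowvar=False)
    clusters = [[i] for i in range(p)]
    basis = []
    while len(clusters) > 1:
        # Find the pair of clusters with the highest average correlation.
        best, pair = -np.inf, (0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                c = corr[np.ix_(clusters[a], clusters[b])].mean()
                if c > best:
                    best, pair = c, (a, b)
        left, right = clusters[pair[0]], clusters[pair[1]]
        nl, nr = len(left), len(right)
        # Unit-norm Haar-like vector: +scale/nl on left, -scale/nr on right.
        scale = np.sqrt(nl * nr / (nl + nr))
        v = np.zeros(p)
        v[left] = scale / nl
        v[right] = -scale / nr
        basis.append(v)
        clusters = [cl for j, cl in enumerate(clusters) if j not in pair] + [left + right]
    basis.append(np.full(p, 1 / np.sqrt(p)))  # coarsest (constant) scaling vector
    return np.array(basis[::-1])  # coarse to fine

# Nodes 0-3 follow latent factor 0, nodes 4-7 follow factor 1, plus small noise.
rng = np.random.default_rng(0)
latent = rng.standard_normal((200, 2))
x = np.repeat(latent, 4, axis=1) + 0.1 * rng.standard_normal((200, 8))
basis = unbalanced_haar_basis(x)
pattern = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float)
print(np.sum(np.abs(basis @ pattern) > 1e-9))  # 2
```

A pattern that is active on one learned cluster touches only the vectors whose support straddles that cluster, so it has very few non-zero coefficients.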

[Figure: network data vs. wavelet coefficients; in the wavelet domain the signal is focused and its strength is amplified]

Weak patterns are amplified by the sparsifying transform adapted to the network topology, while noise characteristics remain the same.

Detection of anomalies in transform domain
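The claim that weak patterns are amplified while the noise is unchanged can be worked out in one line. This assumes additive white Gaussian noise, y = μx + ε with ε ~ N(0, I), which is an assumption about the observation model, and an orthonormal learned basis W whose rows include the unbalanced Haar vector ψ_{A,B} for a merge of clusters A and B:

```latex
\tilde{y} = W y = \mu\, W x + W\varepsilon, \qquad
W\varepsilon \sim \mathcal{N}\big(0,\; W W^{\top}\big) = \mathcal{N}(0,\; I)

\psi_{A,B} = \sqrt{\tfrac{|A|\,|B|}{|A|+|B|}}
  \Big( \tfrac{\mathbb{1}_A}{|A|} - \tfrac{\mathbb{1}_B}{|B|} \Big), \qquad
\langle \psi_{A,B},\, \mu\,\mathbb{1}_A \rangle = \mu \sqrt{\tfrac{|A|\,|B|}{|A|+|B|}}
  \;\overset{|A|=|B|}{=}\; \mu \sqrt{|A|/2}
```

So a pattern of per-node strength μ on a cluster of |A| nodes yields a coefficient of size μ√(|A|/2) (for equal-size siblings), while every coefficient still carries unit-variance noise.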

[Figure: Detection vs. Signal Strength, signal strength from 0.01 to 0.10; curves: detection using original data, detection using wavelet coefficients]

Coherent activation patterns result in few non-zero basis coefficients and can be detected with much smaller signal strength
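The comparison in the figure can be reproduced in spirit with a small simulation, though it does not reproduce the figure's numbers and all parameters are hypothetical. Nodes are assumed to be already ordered by a learned hierarchy, so a balanced dyadic Haar transform stands in for the learned basis. The pattern is one dyadic block of k active nodes with strength μ in N(0, 1) noise. The detector fires when the maximum absolute value exceeds √(2 log p), applied once to the raw data and once to the Haar coefficients.

```python
import numpy as np

def haar_matrix(p: int) -> np.ndarray:
    """Orthonormal balanced Haar transform for p = 2**J; rows are basis vectors."""
    if p == 1:
        return np.ones((1, 1))
    h = haar_matrix(p // 2)
    coarse = np.kron(h, [1.0, 1.0]) / np.sqrt(2)              # coarser scales, upsampled
    fine = np.kron(np.eye(p // 2), [1.0, -1.0]) / np.sqrt(2)  # finest-scale differences
    return np.vstack([coarse, fine])

def detection_power(mu: float, p: int = 256, k: int = 32, trials: int = 200, seed: int = 0):
    """Fraction of trials where max |statistic| exceeds sqrt(2 log p): (raw, wavelet)."""
    rng = np.random.default_rng(seed)
    w = haar_matrix(p)
    x = np.zeros(p)
    x[:k] = 1.0  # anomaly on one dyadic block of k nodes
    thresh = np.sqrt(2 * np.log(p))
    raw_hits = wav_hits = 0
    for _ in range(trials):
        y = mu * x + rng.standard_normal(p)
        raw_hits += int(np.abs(y).max() > thresh)
        wav_hits += int(np.abs(w @ y).max() > thresh)
    return raw_hits / trials, wav_hits / trials

for mu in (0.0, 0.5, 1.0):
    print(mu, detection_power(mu))
```

With μ = 0 both detectors fire at the same false-alarm rate, because the transform is orthonormal. As μ grows, the wavelet-domain detector picks up the block's coefficient of size about μ√(k/2) well before any single raw node stands out.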

[Figure: example basis vectors learnt from O(log p) network measurements using hierarchical clustering; sample delay covariance matrix]

Internet anomaly detection

[Figure: monitor and unknown networks; compression achieved for real Internet RTT data]

Research Directions

Active Sensing: Sequential algorithms that automatically decide where, what, and when to measure.

Online Large-scale Inference: on-line and near real-time network monitoring to detect topology changes and traffic anomalies.

Wireless Network Sensing: Exploitation of sparsity and diversity in wireless networks for fast and robust identification of network-wide characteristics.

New Network Representations: Relationships between wavelet representations and persistent homology.

Extra slides

Network Discovery

Network structure is unknown; infer network routing/topology by ‘triangulation’.

Unfortunately, many hop-counts are not observed.
