Post on 02-Feb-2016
Network Mapping and Anomaly Detection
Athina Markopoulou (Irvine)
Robert Calderbank (Princeton)
Rob Nowak (Madison)
MURI Kickoff Meeting September 19, 2009
Outline
Challenges
- Applications
- Mathematics
Preliminary Results
- Detecting Malicious Traffic Sources (Athina Markopoulou)
- Network Topology Identification
- Network-wide Anomaly Detection
Research Directions
Application Challenges
Network Mapping: Infer network topology/connectivity from minimal measurements
Detecting Topology Changes: Quickly sense changes in routing or connectivity
Network-wide Anomalies: Detect weak and distributed patterns of anomalous network activity
Predicting Malicious Traffic: Identify network sources that are likely to launch future attacks
Mathematical Challenges
Vastly Incomplete Data: Impossible to monitor a network everywhere and all the time. Where and when should we measure?
Large-scale Inference: Inference of high-dimensional signals/graphs from noisy and incomplete data. Robust statistical data analysis and scalable algorithms are crucial.
Network Representations: Statistical analysis matched to network structures. Can network data be ‘sparsified’ using new representations and transformations?
Network Prediction Models: New ‘network-centric’ statistical methods are needed to cluster network nodes for robust prediction from limited datasets.
Predicting Malicious Traffic Sources
Predictive Blacklisting as an Implicit Recommendation System
• Problem: predict sources of malicious traffic on the Internet
– Blacklists:
• list of worst offenders (source IP addresses or prefixes)
• used to block (or to further scrub) traffic originating from those sources
– Goal:
• Predict malicious sources that are likely to attack a victim in the future, based on past logs
• Prediction vs. Detection
– strictly speaking, this is not “detection”
– but it does require finding patterns in the data
Traditional Blacklist Generation
• Local Worst Offender List (LWOL)
– Most prolific local offenders
– Reactive but not proactive
• Global Worst Offender List (GWOL)
– Most prolific global offenders
– Might contain irrelevant offenders
– Non-prolific attackers are elusive to GWOL
• State of the art: Collaborative Blacklisting
– J. Zhang, P. Porras, and J. Ullrich, “Highly Predictive Blacklisting”, USENIX Security 2008 (best paper award)
– Key idea: A victim is likely to be attacked not only by sources that attacked this particular victim, but also by sources that attacked “similar victims”
– Methodology: Use link analysis (PageRank) on the victim similarity graph to predict future attacks
– First methodological development on this problem in a long time!
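The two traditional lists can be illustrated with a minimal sketch. The log format (a list of `(victim, source)` records) and the function name `worst_offenders` are assumptions made for illustration, not part of the original slides.

```python
from collections import Counter

def worst_offenders(logs, n, victim=None):
    """Return the n most prolific sources in the logs.

    logs: iterable of (victim, source) attack records (hypothetical format).
    victim=None counts attacks on all victims (GWOL);
    a specific victim restricts the count to that victim's logs (LWOL).
    """
    counts = Counter(src for vic, src in logs if victim is None or vic == victim)
    return [src for src, _ in counts.most_common(n)]

logs = [("v1", "a"), ("v1", "a"), ("v1", "b"),
        ("v2", "c"), ("v2", "c"), ("v2", "c"), ("v3", "c")]

gwol = worst_offenders(logs, n=1)                # global list
lwol_v1 = worst_offenders(logs, n=1, victim="v1")  # local list for v1
```

In this toy log the global list contains only source `c`, which never attacked `v1`, while the local list for `v1` contains only sources that already attacked it. This matches the two weaknesses listed above: GWOL may contain irrelevant offenders, and LWOL is reactive.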
Formulating Predictive Blacklisting as an Implicit Recommendation System
[Figure: a recommendation system (e.g. Netflix, Amazon) with rating matrix R, indexed by users and items (movies); known ratings are numbers and unknown ratings are marked "?". Predictive blacklisting uses an analogous attack matrix R(t), indexed by victims and attackers, with one matrix per time step; in the final matrix along the time axis every entry is "?".]
Collaborative Filtering (CF): different techniques capture different patterns in the data
• Multi-level Prediction
– Individual level: (attacker, victim)
• use time series to project past trends
– Local level: neighborhood-based CF
• group “similar” victim networks (kNN); the notion of similarity accounts for common attackers and time
• groups of attackers attacking the same victims: find them using the cross-association (CA) method
– [Global level: factorization-based CF (in progress)]
• find latent factors in the data using, e.g., SVD
• Combine ratings from different predictors
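The local, neighborhood-based level can be sketched as follows. This is a simplified illustration, not the authors' method: it assumes a victims x attackers count matrix, uses Jaccard similarity over common attackers, and ignores the time component that the slides include in their similarity. The function name `knn_attack_scores` is hypothetical.

```python
import numpy as np

def knn_attack_scores(R, k=1):
    """Score (victim, attacker) pairs from the k most similar victims.

    R: victims x attackers matrix of attack counts (assumed layout).
    """
    B = (np.asarray(R) > 0).astype(float)      # 1 if attacker hit victim
    common = B @ B.T                            # shared attackers per victim pair
    sizes = B.sum(axis=1)
    union = sizes[:, None] + sizes[None, :] - common
    # Jaccard similarity between victims; 0 where neither has any attacker.
    sim = np.divide(common, union, out=np.zeros_like(common), where=union > 0)
    np.fill_diagonal(sim, 0.0)                  # a victim is not its own neighbor
    scores = np.zeros_like(B)
    for v in range(B.shape[0]):
        nbrs = np.argsort(-sim[v])[:k]          # k most similar victims
        scores[v] = sim[v, nbrs] @ B[nbrs]      # similarity-weighted vote
    return scores

R = np.array([[1, 1, 0, 0],    # victim 0 was hit by attackers 0 and 1
              [1, 1, 1, 0],    # victim 1 was hit by attackers 0, 1 and 2
              [0, 0, 0, 1]])   # victim 2 was hit by attacker 3 only
scores = knn_attack_scores(R, k=1)
```

Victim 0 shares two of three attackers with victim 1, so attacker 2 (seen only at victim 1) receives a positive score for victim 0, while attacker 3 receives none. Ranking attackers by score per victim gives a blacklist tailored to each victim.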
Tested our approach on Dshield data: 6 months of logs
• Dshield.org is a central repository of shared logs
– Several victim organizations submit their IDS logs (flow data)
– The repository analyzes the logs and provides a predictive blacklist, tailored to each victim
[Figure labels: UCI, Princeton, Dshield.org]
Several different patterns co-exist in the data and should be detected and used for prediction
Preliminary results
• A combination of methods significantly improves the hit count of the blacklist
– up to 70% (57% on avg) compared to the state of the art (HPB)
[Figure: comparison of the combined method, the state of the art (HPB), and an older method (GWOL)]
• and there is much room for improvement
Challenges & Future Directions
• Get closer to the upper bound
– Latent factor techniques
– Dealing with missing data
• Adversarial model
• Scalability
• Hopefully interactions with other people in this group
• F. Soldo, A. Le, A. Markopoulou, http://arxiv.org/abs/0908.2007
Network Mapping
Existing Methods: Active probing (e.g., traceroute)
New Approach: Passive monitoring
Lumeta Corporation
The Data
[Figure: matrix of hop-counts indexed by end-hosts and monitors, shown with only the observed entries and with all entries; unobserved entries are marked "?"]
Hop-counts from end-hosts to monitors; extracted from TTL fields of traffic at monitors
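One common way to turn an observed TTL into a hop-count is sketched below. The slides do not say how the extraction was done; this sketch assumes senders start from one of a few typical initial TTL values (32, 64, 128, 255) and that the smallest such value not below the observed TTL is the right one. The function name is hypothetical.

```python
# Typical initial TTL values used by common operating systems (an assumption).
COMMON_INITIAL_TTLS = (32, 64, 128, 255)

def hop_count_from_ttl(observed_ttl):
    """Estimate the number of hops a packet travelled from its observed TTL."""
    if not 1 <= observed_ttl <= 255:
        raise ValueError("TTL must be in 1..255")
    # Each router decrements the TTL by one, so the initial value is assumed
    # to be the smallest common initial TTL that is >= the observed one.
    initial = min(t for t in COMMON_INITIAL_TTLS if t >= observed_ttl)
    return initial - observed_ttl

hops = [hop_count_from_ttl(t) for t in (52, 113, 250)]
```

The heuristic is ambiguous when a path is long enough to cross one of the assumed initial values, so the resulting hop-counts should be treated as noisy measurements.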
Clustering End-Hosts
Problem: Use hop-count data to automatically cluster end-hosts into topologically relevant groups (e.g., subnets)
Intuition: End-hosts with similar hop-counts are probably close together
Challenge: Clustering with missing data
[Figure: observed hop-count matrix indexed by end-hosts and honeypots, and a 2-d histogram of hop-counts; ellipses indicate end-hosts from different subnets]
Matrix Completion
[Figure: observed and complete hop-count matrices]
Observed hop-counts are random samples of entries of the complete hop-count matrix
The hop-count matrix is low-rank (its SVD has rank r)
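The low-rank idea can be illustrated with a simple iterative truncated-SVD imputation. This is a generic sketch, not necessarily the algorithm used by the authors. The toy matrix assumes an additive structure (hop-count = host offset + monitor offset), which gives rank 2; the names `complete_low_rank`, `host_offset` and `monitor_offset` are hypothetical.

```python
import numpy as np

def complete_low_rank(M, mask, rank, n_iters=200):
    """Fill unobserved entries of M (mask == False) with a rank-`rank` fit.

    Repeats: take the best rank-r approximation of the current filled
    matrix, then restore the observed entries and refill the rest.
    """
    M = np.asarray(M, dtype=float)
    X = np.where(mask, M, M[mask].mean())      # start from a mean fill
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        low = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # truncated SVD
        X = np.where(mask, M, low)             # keep observed entries fixed
    return low

# Toy hop-count matrix: 6 end-hosts x 4 monitors, additive and hence rank 2.
host_offset = np.array([3.0, 5.0, 2.0, 7.0, 4.0, 6.0])
monitor_offset = np.array([1.0, 4.0, 2.0, 5.0])
H = host_offset[:, None] + monitor_offset[None, :]

mask = np.ones(H.shape, dtype=bool)
mask[0, 1] = mask[2, 3] = mask[4, 0] = mask[5, 2] = False   # unobserved

estimate = complete_low_rank(H, mask, rank=2)
missing_error = np.abs(estimate - H)[~mask].max()
```

Each iteration cannot increase the misfit on the observed entries, but recovery of the missing entries depends on the rank, the sampling pattern and the number of iterations, so `missing_error` should be inspected rather than assumed small.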
Results
[Figure: clusters from complete data compared with clusters from 25% of the data; plot against the fraction of complete data (0 to 1), with a curve labeled "mixture model"]
Network-wide Anomaly Detection
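Observation model: [equation not recoverable from the transcript; its labeled parts are a binary 0/1 pattern, a signal strength, and an element marked "unknown"]
Weak in strength (signal): invisible in per-node signature
Weak in extent (# affected nodes): invisible in network-wide aggregate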
Distributed Network Anomaly Detection
Prior work: Can detect weak and unstructured patterns by exploiting multiplicity. (Ingster, Jin-Donoho’03, Abramovich et al ’01)
Subtle adaptive testing procedures: Higher criticism, False discovery control
[Figure: “Now you see it, now you don’t”: signal strength versus sparsity (# active nodes), with a region labeled “Localizable”]
Detecting weak and sparse patterns
In addition to multiplicity, can we exploit the (possibly non-local) dependencies between node measurements to boost performance?
Method must be adaptive to network interaction structure.
How do node interactions affect thresholds of detectability/localizability?
[Figure labels: Interactions, Nodes]
Network anomaly patterns
Latent multi-scale Ising model: [equation not recoverable from the transcript; its terms are labeled “# edge agreements” and “strength of interaction”]
[Figure: observed network node measurements and latent multi-scale dependencies]
Modeling network anomaly patterns
Theorem: Consider a latent multi-scale Ising model with uniform node interaction strength [...]. With probability [...],
(1) the correct dependency structure (tree) can be learnt using i.i.d. network observations x by hierarchical correlation clustering.
(2) the number of non-zero basis coefficients for an x drawn at random is [...].
Hierarchical correlation clustering Unbalanced Haar Basis
Hierarchical clustering and basis learning
Network data
Signal is focused and strength is amplified
Wavelet coefficients
Weak patterns are amplified by the sparsifying transform adapted to network topology, while noise characteristics remain the same.
Detection of anomalies in transform domain
[Figure: Detection vs. Signal Strength, comparing detection using original data with detection using wavelet coefficients]
Coherent activation patterns result in few non-zero basis coefficients and can be detected with much smaller signal strength
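The amplification effect can be seen in a small numerical sketch. It uses the standard balanced Haar transform on a fixed node ordering, which is a simplification: the slides learn an unbalanced Haar basis from hierarchical clustering of the network data. The function name and the toy pattern (16 active nodes out of 64, strength 0.5) are assumptions.

```python
import numpy as np

def haar_transform(x):
    """Orthonormal Haar transform of a vector whose length is a power of two."""
    x = np.asarray(x, dtype=float).copy()
    n = x.size
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    details = []
    while x.size > 1:
        avg = (x[0::2] + x[1::2]) / np.sqrt(2)   # coarse averages
        diff = (x[0::2] - x[1::2]) / np.sqrt(2)  # detail coefficients
        details.append(diff)
        x = avg
    details.append(x)                            # overall scaling coefficient
    return np.concatenate(details[::-1])

# Weak coherent pattern: strength 0.5 on a block of 16 of the 64 nodes.
pattern = np.zeros(64)
pattern[16:32] = 0.5

coeffs = haar_transform(pattern)
n_active = int(np.count_nonzero(np.abs(coeffs) > 1e-12))
peak_original = np.abs(pattern).max()
peak_transformed = np.abs(coeffs).max()
```

Here the 16 weak node values collapse into 3 non-zero coefficients, and the largest coefficient has magnitude sqrt(2), compared with 0.5 per node in the original domain. Because the transform is orthonormal, i.i.d. Gaussian noise added to the nodes keeps the same variance per coefficient, so a threshold test on the coefficients faces a stronger signal against unchanged noise.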
Example basis vectors learnt from O(log p) network measurements using hierarchical clustering
Sample delay covariance matrix
Internet anomaly detection
[Figure labels: Monitor, unknown network]
Compression achieved for real Internet RTT data
Research Directions
Active Sensing: Sequential algorithms that automatically decide where, what and when to measure.
Online Large-scale Inference: Online and near real-time network monitoring to detect topology changes and traffic anomalies.
Wireless Network Sensing: Exploitation of sparsity and diversity in wireless networks for fast and robust identification of network-wide characteristics.
New Network Representations: Relationships between wavelet representations and persistent homology.
Extra slides
Network Discovery
[Figure: example network annotated with hop-counts]
Network structure is unknown; infer network routing/topology by ‘triangulation’
Unfortunately, many hop-counts are not observed
[Figure: the same network with unobserved hop-counts marked "?"]