BGP-lens: Patterns and Anomalies in Internet Routing Updates
B. Aditya Prakash¹, Nicholas Valler², David Andersen¹, Michalis Faloutsos², Christos Faloutsos¹
¹Carnegie Mellon University, ²UC-Riverside
KDD 2009, Paris
Introduction
• Border Gateway Protocol (BGP)
  – Internet routing protocol
  – Routers send messages to each other
  – Keeps path information up-to-date
• Ideal setting: no BGP updates
• In reality: many updates
  – link failures, router restarts, malicious behavior
Time                 peerAS   originAS  prefix
2005-02-17 12:39:42  ATT      SPRINT    204.29.119.0/24
2005-02-17 12:39:43  VERIZON  AOL       204.29.80.0/24
2005-02-17 12:39:46  WASH     ATLA      204.29.79.0/24
…                    …        …         …
Each row is an update.
Introduction contd.
Question: Can we find patterns and anomalies?
• Challenges:
  – Millions of updates sent over the network
  – Data has multiple dimensions
  – Noisy measurements
  – Impossible for a human to sift through the updates
Automated Tool needed!
The Data
• Data from Datapository.net
• Abilene Network
18 million update messages – over two years!
Our Approach
• Look at a simple time-series
• Focus on just the time
• Count the number of updates received every b seconds (b is the bin size)
• Specific problem we are tackling
  – Given such a time-series
  – Report patterns and anomalies
• Also find suspicious entities (paths, ASes, etc.)
[Figure: the Time column of the update table is cut into consecutive bins of b seconds along the time axis; example counts per bin: bin 0 → 4, bin 1 → 2, bin 2 → 6, …]
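A minimal sketch of the binning step described above. It assumes timestamps arrive as strings in the format shown in the update table and that bin 0 starts at the earliest timestamp; both choices, and the name `bin_update_counts`, are illustrative assumptions rather than details of BGP-lens.

```python
from collections import Counter
from datetime import datetime

def bin_update_counts(timestamps, bin_size_s):
    """Count updates per bin of `bin_size_s` seconds.

    Assumption: bin 0 starts at the earliest timestamp in the input.
    """
    times = sorted(datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in timestamps)
    if not times:
        return []
    start = times[0]
    # Map every update to the index of the bin it falls into.
    per_bin = Counter(int((t - start).total_seconds() // bin_size_s) for t in times)
    # Emit a dense series so that empty bins show up as zeros.
    return [per_bin.get(i, 0) for i in range(max(per_bin) + 1)]

timestamps = [
    "2005-02-17 12:39:42",
    "2005-02-17 12:39:43",
    "2005-02-17 12:39:46",
    "2005-02-17 12:40:01",
]
counts_10s = bin_update_counts(timestamps, bin_size_s=10)
counts_600s = bin_update_counts(timestamps, bin_size_s=600)
```

With b = 10 s the first three updates share bin 0 and the fourth lands in bin 1; with b = 600 s all four collapse into a single bin.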
Real data: Washington Router
Very bursty!
Traditional tools like FFT and auto-regression don’t work
[Figure: # of updates vs. bin number (‘time’); bin size = 600 s]
Outline
• Introduction and Problem Statement
• Techniques
  – Temporal Analysis
  – Frequency Analysis
• BGP-lens at work
• Conclusions
Temporal Analysis
• First cut: take a log-linear plot
  – emphasizes small values over high values
[Figure: bin size = 10 s]
But: Bin size is important!
[Figure: bin size = 600 s, showing ‘clotheslines’]
Clotheslines
Q1: Why clotheslines?
– Near-consecutive updates over a long time period
– Can be route flapping
  • the same path is advertised and withdrawn frequently
  • important to identify
Q2: How to automate this discovery?
Proposal: Marginals to the Rescue
• PDF of the volume of updates
  – for each volume, the number of time bins with that volume
Extremes == height of the clotheslines!
Algorithm - Clotheslines
• On the marginals plot, use median filtering to determine ‘outliers’
• For each time interval found, report the most consistent IPs/ASes etc.
High-level idea only; details in the paper!
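The outlier step above can be sketched roughly as follows. This is not the paper's procedure: it assumes a sliding-window median over the marginal (volume → number of bins) and flags a volume whose bin count far exceeds the median of its neighbours. The function names and the parameters `window`, `ratio` and `min_bins` are hypothetical.

```python
from collections import Counter
from statistics import median

def marginal(counts):
    """Map each update volume v to the number of time bins with exactly v updates."""
    return Counter(counts)

def clothesline_volumes(counts, window=2, ratio=5.0, min_bins=3):
    """Return volumes that stand out against a local median of the marginal.

    Assumptions (illustrative only): neighbours are the volumes within
    `window` of v, and v is flagged when its bin count is at least
    `min_bins` and more than `ratio` times the neighbours' median.
    """
    hist = marginal(counts)
    flagged = []
    for v in sorted(hist):
        if v == 0:
            continue  # empty bins are quiet periods, not a clothesline
        neighbours = [hist.get(u, 0)
                      for u in range(v - window, v + window + 1) if u != v]
        local = max(median(neighbours), 1)  # avoid comparing against zero
        if hist[v] >= min_bins and hist[v] > ratio * local:
            flagged.append(v)
    return flagged

# Toy series: background noise plus 20 bins that all carry exactly 50 updates.
counts = [1, 2, 3, 2, 1, 4, 3, 2] + [50] * 20 + [5, 1, 2]
heights = clothesline_volumes(counts)
# Time bins lying on a flagged clothesline, for follow-up on IPs/ASes.
bins_on_line = [i for i, c in enumerate(counts) if c in heights]
```

The flagged volume is the height of the clothesline; the bins at that height give the time interval in which to look for the most consistent prefixes and ASes.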
Outline
• Introduction and Problem Statement
• Techniques
  – Temporal Analysis
  – Frequency Analysis
• BGP-lens at work
• Conclusions
[Figure: a signal over time and its scalogram; vertical axis runs from low to high frequency, shading from high to low energy; the ‘tornado’ does not touch down]
In real data…
[Figure: event E2 in the real data: ~20,000 updates in ~8 hrs]
Why Prolonged Spike?
• Bursts of short duration
• Can represent malicious behavior
  – or simple router restarts!
• Exact cause is hard to find, but important for system administrators
Algorithm – Prolonged Spikes
• Basic idea: find tornados in the scalogram
• Find a suitable starting point at higher levels
• Extend downward as much as possible
• The finest scale where the tornado stops
  – the shortest time period to look for a prolonged spike
• Again, details in the paper!
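A rough sketch of the tornado-tracing idea above, under stated assumptions: the scalogram is built from Haar wavelet detail coefficients (the slides do not name the wavelet), the starting point is the largest coefficient at a chosen coarse level, and a fixed `threshold` decides whether the tornado continues. All names and the threshold rule are hypothetical.

```python
import math

def haar_details(signal):
    """Haar detail coefficients per level; index 0 is the finest scale."""
    n = len(signal)
    if n < 2 or n & (n - 1):
        raise ValueError("signal length must be a power of two >= 2")
    root2 = math.sqrt(2.0)
    approx, levels = list(signal), []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        levels.append([(a - b) / root2 for a, b in pairs])  # detail (energy)
        approx = [(a + b) / root2 for a, b in pairs]        # coarser signal
    return levels

def trace_tornado(levels, start_level, threshold):
    """Follow high-energy coefficients from a coarse level toward finer ones.

    Returns (finest_level_reached, position), or None when the starting
    level has no coefficient above `threshold` (an assumed stopping rule).
    """
    coeffs = levels[start_level]
    pos = max(range(len(coeffs)), key=lambda i: abs(coeffs[i]))
    if abs(coeffs[pos]) < threshold:
        return None
    level = start_level
    while level > 0:
        finer = levels[level - 1]
        # Each coefficient has two children covering its two time halves.
        child = max((2 * pos, 2 * pos + 1), key=lambda i: abs(finer[i]))
        if abs(finer[child]) < threshold:
            break  # the tornado does not touch down any further
        level, pos = level - 1, child
    return level, pos

# Toy series: 16 bins, with a sustained burst in bins 4..7.
signal = [0, 0, 0, 0, 10, 10, 10, 10, 0, 0, 0, 0, 0, 0, 0, 0]
levels = haar_details(signal)
found = trace_tornado(levels, start_level=3, threshold=5.0)
level, pos = found
# Time span (in bins) covered by the coefficient where the tornado stops.
span = (pos * 2 ** (level + 1), (pos + 1) * 2 ** (level + 1))
```

In this toy input the energy disappears below the third level, so the search window for the spike is the 8-bin span covered by that coefficient.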
Scalability
BGP-lens: User Interface
• number of suspicious events the sysadmin wants to check (optional)
• duration: length of the events to be checked, e.g. daily vs. weekly vs. monthly (optional)
Outline
• Introduction and Problem Statement
• Techniques
  – Temporal Analysis
  – Frequency Analysis
• BGP-lens at work
• Conclusions
BGP-lens at Work
• We found real events too. Examples:
Event 1: 50-clothesline
– Prefix and origin AS pointed to Alabama Supercomputing Net
– When contacted, the sysadmins
  • attributed the changes to route flapping
  • “the route for 207.157.115.0/24 was appearing and disappearing in [the] IGP routing table ... [which] may have caused BGP to flap.”
– Anomaly went undetected and unresolved for 30 days!
Results from real data
Event 2: Prolonged spike
– May 12th, 2006
– 8 hr spike
– Most persistent IPs/ASes
  • primary and middle schools in a large district in a country
– Two more spikes: Jan 18-19, 2006 and Aug 1
Conclusions
• Studied huge real data (~18 million updates)
• Developed two new techniques
  – effective
    • spot subtle phenomena like clotheslines and prolonged spikes
  – scalable
• BGP-lens: a user-friendly tool
  – provides reasonable defaults
  – provides easy-to-use knobs
  – provides leads such as IPs/ASes
Thank You!
• Any questions?
• www.cs.cmu.edu/~badityap
  – We thank NSF, USA for their support.
• Author-Reel!
Extra - Frequency Analysis
• Data is self-similar!
  – we used the entropy-plot measure, which is used to fit the b-model [26]
  – Corresponds to a b-model of 75-25
– Multi-resolution techniques needed!
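To make the 75-25 figure concrete, here is a small sketch of a b-model series and its entropy plot. It assumes the deterministic variant of the b-model (the larger share always goes to the left half; the usual formulation picks the side at random, which leaves the entropy unchanged) and dyadic aggregation levels. Function names are illustrative.

```python
import math

def b_model(total, bias, levels):
    """Split `total` recursively: `bias` of each volume goes to the left half."""
    series = [float(total)]
    for _ in range(levels):
        series = [part for v in series for part in (v * bias, v * (1 - bias))]
    return series

def entropy_plot(series):
    """Entropy (in bits) of the volume distribution at each dyadic resolution.

    Entry k is computed after aggregating the series into 2**k equal-width
    intervals; the series length is assumed to be a power of two.
    """
    n, total = len(series), sum(series)
    points, width = [], n
    while width >= 1:
        sums = [sum(series[i:i + width]) for i in range(0, n, width)]
        points.append(-sum((s / total) * math.log2(s / total)
                           for s in sums if s > 0))
        width //= 2
    return points

series = b_model(total=1000, bias=0.75, levels=4)
points = entropy_plot(series)
# For a b-model the plot is a straight line; its slope encodes the bias.
slope = (points[-1] - points[0]) / (len(points) - 1)
expected = -(0.75 * math.log2(0.75) + 0.25 * math.log2(0.25))
```

A slope near 1 would indicate uniform traffic and a slope near 0 extreme burstiness; a 75-25 split gives a slope of roughly 0.81 bits per level.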
Extra - FFT
Extra – Marginals for 10sec
Extra – Prolonged Spike Algorithm