
BGP Anomaly Detection in an ISP

Page 1: Bgp Anomaly Detection In An Isp

1

BGP Anomaly Detection in an ISP

Jian Wu (U. Michigan), Z. Morley Mao (U. Michigan), Jennifer Rexford (Princeton), Jia Wang (AT&T Labs)

http://www.cs.princeton.edu/~jrex/papers/nsdi05-jian.pdf

Page 2: Bgp Anomaly Detection In An Isp

2

Goal

Identify important anomalies
•Lost reachability
•Persistent flapping
•Large traffic shifts

Contributions:

•Build a tool to identify a small number of important routing disruptions from a large volume of raw BGP updates in real time.

•Use the tool to characterize routing disruptions in an operational network

Page 3: Bgp Anomaly Detection In An Isp

3

Capturing Routing Changes

[Figure: a BGP Monitor connected over iBGP sessions to routers labeled CBR, which have eBGP sessions to their neighbors; other labels: CPE, Updates, Best routes]

Large operational network (8/16/2004 – 10/10/2004)

Page 4: Bgp Anomaly Detection In An Isp

4

Challenges

Large volume of BGP updates
•Millions daily, very bursty
•Too much for an operator to manage

Different from root-cause analysis
•Identify changes and their effects
•Focus on actionable events
•Diagnose causes only in/near the AS

Page 5: Bgp Anomaly Detection In An Isp

5

System Architecture

[Figure: processing pipeline, with the order of magnitude of items at each stage]

BGP Updates (10^6), from routers labeled EEBR
→ BGP Update Grouping → Events (10^5) and Persistent Flapping Prefixes (10^1)
→ Event Classification → “Typed” Events
→ Event Correlation → Clusters (10^3) and Frequent Flapping Prefixes (10^1)
→ Traffic Impact Prediction (with Netflow Data) → Large Disruptions (10^1)

Page 6: Bgp Anomaly Detection In An Isp

6

Grouping BGP Updates into Events

Challenge: A single routing change
•Leads to multiple update messages
•Affects routing decisions at multiple routers

Solution:
•Group all updates for a prefix with inter-arrival < 70 seconds
•Flag prefixes with changes lasting > 10 minutes

[Figure: BGP Updates (from routers labeled EEBR) → BGP Update Grouping → Events and Persistent Flapping Prefixes]

Page 7: Bgp Anomaly Detection In An Isp

7

Grouping Thresholds

Based on data analysis and our understanding of BGP

Event timeout: 70 seconds
•2 * MRAI timer + 10 seconds
•98% of inter-arrival times < 70 seconds

Convergence timeout: 10 minutes
•BGP usually converges within minutes
•99.9% of events < 10 minutes
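The grouping rule and its two thresholds can be sketched in a few lines of Python. This is a minimal offline sketch, not the tool itself (which works online): the 30-second MRAI value is an assumption (a common eBGP default, not stated on the slides), and the names `Event`, `group_updates` and `persistent_flappers` are hypothetical.

```python
from dataclasses import dataclass

MRAI_SECONDS = 30                       # assumption: common eBGP default
EVENT_TIMEOUT = 2 * MRAI_SECONDS + 10   # 70 seconds, as on the slide
CONVERGENCE_TIMEOUT = 10 * 60           # 10 minutes, as on the slide


@dataclass
class Event:
    prefix: str
    start: float
    end: float
    updates: int = 1

    @property
    def duration(self) -> float:
        return self.end - self.start


def group_updates(updates):
    """Group (timestamp, prefix) updates into per-prefix events.

    An update joins the open event for its prefix when it arrives less than
    EVENT_TIMEOUT seconds after that event's last update; otherwise it starts
    a new event. The input must be sorted by timestamp.
    """
    open_events = {}
    finished = []
    for ts, prefix in updates:
        ev = open_events.get(prefix)
        if ev is not None and ts - ev.end < EVENT_TIMEOUT:
            ev.end = ts
            ev.updates += 1
        else:
            if ev is not None:
                finished.append(ev)
            open_events[prefix] = Event(prefix, ts, ts)
    finished.extend(open_events.values())
    return sorted(finished, key=lambda e: (e.start, e.prefix))


def persistent_flappers(events):
    """Prefixes with an event lasting longer than the convergence timeout."""
    return {e.prefix for e in events if e.duration > CONVERGENCE_TIMEOUT}


if __name__ == "__main__":
    # Made-up timestamps (seconds) for two documentation prefixes.
    sample = [(0, "198.51.100.0/24"), (30, "198.51.100.0/24"),
              (200, "198.51.100.0/24")]
    sample += [(t, "192.0.2.0/24") for t in range(0, 901, 60)]
    events = group_updates(sorted(sample))
    print(len(events), persistent_flappers(events))
```

In the sample, the first prefix yields two events (the gap before the third update exceeds the timeout), while the second prefix keeps updating every 60 seconds for 15 minutes and is flagged as persistently flapping.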

Page 8: Bgp Anomaly Detection In An Isp

8

Persistent Flapping Prefixes

Causes of persistent flapping
•Conservative damping parameters (78.6%)
•Protocol oscillations due to MED (18.3%)
•Unstable interface or BGP session (3.0%)

Surprising finding: 15.2% of updates were caused by persistent flapping prefixes, even though flap damping was enabled!

Page 9: Bgp Anomaly Detection In An Isp

9

Example: Unstable eBGP Session

[Figure: an ISP, a Peer and a Customer; routers EA, EB, EC and ED; prefix p]

•Flap damping parameters are session-based
•Damping not implemented for iBGP sessions

Page 10: Bgp Anomaly Detection In An Isp

10

Event Classification

Challenge: Major concerns in network management
•Changes in reachability
•Heavy load of routing messages on the routers
•Changes in the flow of traffic through the network

[Figure: Events → Event Classification → “Typed” Events]

Solution: classify events by the severity of their impact

Page 11: Bgp Anomaly Detection In An Isp

11

Event Category – “No Disruption”

[Figure: an ISP with routers EA, EB, EC, ED and EE, neighbors AS1 and AS2, and prefix p; labeled “No Traffic Shift”]

“No Disruption”: no border router has a traffic shift (50.3% of events).

Page 12: Bgp Anomaly Detection In An Isp

12

Event Category – “Internal Disruption”

[Figure: an ISP with routers EA, EB, EC, ED and EE, neighbors AS1 and AS2, and prefix p; labeled “Internal Traffic Shift”]

“Internal Disruption”: all of the traffic shifts are internal traffic shifts (15.6% of events).

Page 13: Bgp Anomaly Detection In An Isp

13

Event Category – “Single External Disruption”

[Figure: an ISP with routers EA, EB, EC, ED and EE, neighbors AS1 and AS2, and prefix p; labeled “External Traffic Shift”]

“Single External Disruption”: only one of the traffic shifts is an external traffic shift (20.7% of events).

Page 14: Bgp Anomaly Detection In An Isp

14

Statistics on Event Classification

Category                        Events   Updates
No Disruption                   50.3%    48.6%
Internal Disruption             15.6%     3.4%
Single External Disruption      20.7%     7.9%
Multiple External Disruption     7.4%    18.2%
Loss/Gain of Reachability        6.0%    21.9%
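The two columns can be combined into a rough "updates per event" figure for each category. If a category holds a share e of all events and a share u of all updates, its updates per event equal u/e times the overall average. The short sketch below only does that division on the numbers in the table; the function name is hypothetical.

```python
# Shares copied from the table: (percent of events, percent of updates).
shares = {
    "No Disruption": (50.3, 48.6),
    "Internal Disruption": (15.6, 3.4),
    "Single External Disruption": (20.7, 7.9),
    "Multiple External Disruption": (7.4, 18.2),
    "Loss/Gain of Reachability": (6.0, 21.9),
}


def relative_updates_per_event(events_pct, updates_pct):
    """Updates per event relative to the overall average (1.0 = average)."""
    return updates_pct / events_pct


if __name__ == "__main__":
    for category, (events_pct, updates_pct) in shares.items():
        ratio = relative_updates_per_event(events_pct, updates_pct)
        print(f"{category:30s} {ratio:.2f}x")
```

Internal disruptions come out well below the average, while multiple external disruptions and reachability changes come out well above it.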

The first 3 categories vary significantly from day to day

The number of updates per event depends on the event type and the number of affected routers
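The five categories can be expressed as a small decision procedure over per-router traffic shifts. The sketch below is one possible reading, not the authors' implementation: the `Route` model, the rule that a shift is "external" when the router is its own egress before or after the change, and the rule that any lost or gained route makes the event a reachability change are all assumptions made for illustration.

```python
from enum import Enum
from typing import NamedTuple, Optional


class Route(NamedTuple):
    egress: str    # border router where traffic leaves the ISP
    neighbor: str  # external neighbor reached from that egress


class Shift(Enum):
    NONE = "no traffic shift"
    INTERNAL = "internal traffic shift"
    EXTERNAL = "external traffic shift"
    REACHABILITY = "loss/gain of reachability"


def router_shift(router: str, old: Optional[Route], new: Optional[Route]) -> Shift:
    """Classify the change in one router's best route for a prefix."""
    if old is None and new is None:
        return Shift.NONE
    if old is None or new is None:
        return Shift.REACHABILITY
    if router in (old.egress, new.egress):
        # Assumption: the router's own external route is involved.
        return Shift.NONE if old == new else Shift.EXTERNAL
    # The router forwards to another egress; only the egress matters to it.
    return Shift.NONE if old.egress == new.egress else Shift.INTERNAL


def classify_event(shifts) -> str:
    """Map the per-router shifts of one event to a category from the table."""
    shifts = list(shifts)
    if any(s is Shift.REACHABILITY for s in shifts):
        return "Loss/Gain of Reachability"
    external = sum(s is Shift.EXTERNAL for s in shifts)
    if external > 1:
        return "Multiple External Disruption"
    if external == 1:
        return "Single External Disruption"
    if any(s is Shift.INTERNAL for s in shifts):
        return "Internal Disruption"
    return "No Disruption"


if __name__ == "__main__":
    # Hypothetical event: EA loses its route via AS1 and falls back to EC.
    before = {"EA": Route("EA", "AS1"), "EB": Route("EA", "AS1"), "EC": Route("EC", "AS2")}
    after = {"EA": Route("EC", "AS2"), "EB": Route("EC", "AS2"), "EC": Route("EC", "AS2")}
    print(classify_event(router_shift(r, before[r], after[r]) for r in before))
```

In the example, EA has an external shift, EB has an internal shift and EC has none, so the event has exactly one external shift.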

Page 15: Bgp Anomaly Detection In An Isp

15

Event Correlation

Challenge: A single routing change affects multiple destination prefixes

[Figure: “Typed” Events → Event Correlation → Clusters]

Solution: group events of same type that occur close in time
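A minimal sketch of this kind of correlation: events of the same type are chained into one cluster as long as consecutive events start within a time window. The slides do not give the window, so `max_gap` (60 seconds here) is a hypothetical parameter, and the real tool may use additional criteria beyond type and time.

```python
from collections import defaultdict


def correlate(events, max_gap=60.0):
    """Cluster (timestamp, event_type, prefix) tuples by type and time.

    Two events of the same type fall into the same cluster when the later
    one starts at most max_gap seconds after the previous one in the cluster.
    Returns a list of (event_type, [(timestamp, prefix), ...]).
    """
    by_type = defaultdict(list)
    for ts, etype, prefix in events:
        by_type[etype].append((ts, prefix))

    clusters = []
    for etype, items in by_type.items():
        items.sort()
        current = [items[0]]
        for item in items[1:]:
            if item[0] - current[-1][0] <= max_gap:
                current.append(item)
            else:
                clusters.append((etype, current))
                current = [item]
        clusters.append((etype, current))
    return clusters


if __name__ == "__main__":
    # Made-up events: two internal disruptions close together, one much later.
    typed_events = [
        (0, "internal", "192.0.2.0/24"),
        (10, "internal", "198.51.100.0/24"),
        (500, "internal", "203.0.113.0/24"),
        (5, "single external", "192.0.2.0/24"),
    ]
    for etype, members in correlate(typed_events):
        print(etype, [prefix for _, prefix in members])
```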

Page 16: Bgp Anomaly Detection In An Isp

16

eBGP Session Reset

•Caused most “single external disruption” events
•Check if the number of prefixes using that session as the best route changes dramatically
•Validation with Syslog router report (95%)

[Figure: number of prefixes over time, with session failure and session recovery marked]
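The check on the prefix count can be sketched as a comparison of consecutive samples for one session. The slides do not say what counts as a "dramatic" change, so `change_fraction` and `min_prefixes` below are hypothetical thresholds, and the sampled counts are made up.

```python
def detect_session_changes(counts, change_fraction=0.5, min_prefixes=100):
    """Flag sharp drops and rises in one session's best-route prefix count.

    counts: list of (timestamp, n_prefixes) in time order.
    A step is flagged when the count changes by at least change_fraction of
    the larger of the two samples, and that larger sample is at least
    min_prefixes (so tiny sessions do not trigger on noise).
    Returns a list of (timestamp, "failure" | "recovery").
    """
    flagged = []
    for (_, n0), (t1, n1) in zip(counts, counts[1:]):
        base = max(n0, n1)
        if base < min_prefixes:
            continue
        change = (n1 - n0) / base
        if change <= -change_fraction:
            flagged.append((t1, "failure"))
        elif change >= change_fraction:
            flagged.append((t1, "recovery"))
    return flagged


if __name__ == "__main__":
    samples = [(0, 1000), (60, 990), (120, 5), (180, 8), (240, 995)]
    print(detect_session_changes(samples))
```

A detector of this kind would then be checked against router syslog messages, as the slide describes.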

Page 17: Bgp Anomaly Detection In An Isp

17

Hot-Potato Changes

•Caused “internal disruption” events
•Validation with OSPF measurement (95%)

[Teixeira et al., SIGMETRICS ’04]

[Figure: an ISP with egress points EA, EB and EC toward prefix P; the numeric labels are extracted as "10119"]

“Hot-potato routing” = route to closest egress point
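The definition can be illustrated with a toy selection rule: among egress points whose BGP routes are otherwise equally good, a router picks the one with the lowest IGP cost, so a change in IGP costs alone can move traffic. The costs below are made up, and the function name is hypothetical.

```python
def closest_egress(igp_cost):
    """Return the egress point with the lowest IGP cost.

    igp_cost maps egress router name -> IGP distance from this router.
    Ties are broken by name so the result is deterministic. Assumes all
    listed egress points offer equally good BGP routes for the prefix.
    """
    return min(igp_cost, key=lambda egress: (igp_cost[egress], egress))


if __name__ == "__main__":
    before = {"EA": 5, "EB": 8}
    after = {"EA": 12, "EB": 8}   # an internal link weight changed
    print(closest_egress(before), "->", closest_egress(after))
```

Here the route to the prefix at EA and EB never changed; only the internal distance did, yet the router moves its traffic from EA to EB. This is the kind of change that shows up as an internal traffic shift.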

Page 18: Bgp Anomaly Detection In An Isp

18

Traffic Impact Prediction

Challenge: Routing changes have different impacts on the network, depending on the popularity of the destinations

[Figure: Clusters and Netflow Data (from routers labeled EEBR) → Traffic Impact Prediction → Large Disruptions]

Solution: weigh each cluster by traffic volume

Page 19: Bgp Anomaly Detection In An Isp

19

Traffic Impact Prediction

Traffic weight
•Per-prefix measurement from Netflow
•10% of prefixes account for 90% of traffic

Traffic weight of a cluster
•Sum of “traffic weight” of the prefixes
•A few clusters have large traffic weight
•Mostly session resets & hot-potato changes
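The cluster weight is a plain sum over the affected prefixes. The sketch below assumes each prefix's traffic weight is its fraction of total traffic measured from Netflow; the prefixes, weights and the `threshold` that separates "large" disruptions are all hypothetical.

```python
def cluster_weight(prefixes, traffic_weight):
    """Sum the traffic weight of the prefixes in a cluster.

    Prefixes without a Netflow measurement count as zero.
    """
    return sum(traffic_weight.get(p, 0.0) for p in prefixes)


def large_disruptions(clusters, traffic_weight, threshold=0.01):
    """Return the names of clusters whose weight reaches the threshold,
    heaviest first."""
    weights = {name: cluster_weight(prefixes, traffic_weight)
               for name, prefixes in clusters.items()}
    heavy = [name for name, w in weights.items() if w >= threshold]
    return sorted(heavy, key=lambda name: weights[name], reverse=True)


if __name__ == "__main__":
    # Made-up per-prefix fractions of total traffic.
    weight = {"192.0.2.0/24": 0.30, "198.51.100.0/24": 0.05, "203.0.113.0/24": 0.001}
    clusters = {
        "session-reset": ["192.0.2.0/24", "198.51.100.0/24"],
        "minor": ["203.0.113.0/24"],
    }
    print(large_disruptions(clusters, weight))
```

Because traffic is concentrated on a small share of prefixes, most clusters end up with negligible weight and only a few pass a threshold like this.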

Page 20: Bgp Anomaly Detection In An Isp

20

Performance Evaluation

Memory
•Static memory: “current routes”, 600 MB
•Dynamic memory: “clusters”, 300 MB

Speed
•99% of 1-second intervals of updates can be processed within 1 second
•Occasional execution lag
•Every 70-second interval of updates can be processed within 70 seconds

Measurements were based on a 900 MHz CPU

Page 21: Bgp Anomaly Detection In An Isp

21

Conclusion

BGP anomaly detection
•Fast, online fashion
•Operator concerns (reachability, flapping, traffic)
•Significant information reduction

Uncovered important network behaviors
•Persistent flapping prefixes
•Hot-potato changes
•Session resets and interface failures

Page 22: Bgp Anomaly Detection In An Isp

22

Detecting Peering Violations

Consistent export requirement
•Peer should advertise prefixes at all peering points, with the same AS path length
•Allows the AS to do hot-potato routing

Detecting violations
•Using iBGP feeds from the border routers
•Some inference tricks to identify inconsistencies

Results of the study
•http://www.nanog.org/mtg-0410/feamster.html
•http://www.cs.princeton.edu/~jrex/papers/imc04.pdf
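The consistent export requirement can be written as a simple check. This sketch assumes an idealized view in which the peer's advertisement at every peering point is directly visible; the study itself works from iBGP feeds and inference, which this does not reproduce. The function name, peering point names and data layout are hypothetical.

```python
def find_inconsistent_exports(routes, peering_points):
    """Check one peer's advertisements for consistent export.

    routes maps (peering_point, prefix) -> AS path length advertised there.
    A prefix violates the requirement when it is missing at some peering
    point or when its AS path length differs between peering points.
    Returns a dict mapping each violating prefix to a short reason.
    """
    prefixes = {prefix for _, prefix in routes}
    violations = {}
    for prefix in sorted(prefixes):
        lengths = {pp: routes.get((pp, prefix)) for pp in peering_points}
        missing = sorted(pp for pp, n in lengths.items() if n is None)
        seen = {n for n in lengths.values() if n is not None}
        if missing:
            violations[prefix] = "not advertised at " + ", ".join(missing)
        elif len(seen) > 1:
            violations[prefix] = "AS path lengths differ"
    return violations


if __name__ == "__main__":
    points = ["east", "west"]
    advertised = {
        ("east", "192.0.2.0/24"): 2, ("west", "192.0.2.0/24"): 2,
        ("east", "198.51.100.0/24"): 2, ("west", "198.51.100.0/24"): 3,
        ("east", "203.0.113.0/24"): 1,
    }
    print(find_inconsistent_exports(advertised, points))
```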

