1
Internet Routing: BGP Routing Convergence
Jennifer Rexford
Princeton University
http://www.cs.princeton.edu/~jrex/bgp-tutorial
2
Goals of This Section
• BGP routing changes
  – Detecting failures
  – Path exploration
• Reducing convergence time
  – Route flap damping and lower timer values
  – Favoring stability, root-cause tags, extra routes
• BGP stability
  – Stable paths problem and policy conflicts
  – Policy guidelines that ensure stability
• Active research areas on Internet routing
  – Location/identifier separation, routing servers, multipath routing, overlays and network virtualization
3
BGP Routing Changes
4
Causes of BGP Routing Changes
• Topology changes
  – Equipment going up or down
  – Deployment of new routers or sessions
• BGP session failures
  – Due to equipment failures, maintenance, etc.
  – Or, due to congestion on the physical path
• Changes in routing policy
  – Changes in preferences in the routes
  – Changes in whether the route is exported
• Persistent protocol oscillation
  – Conflicts between policies in different ASes
5
BGP Session Failure
• BGP runs over TCP
  – BGP only sends updates when changes occur
  – TCP doesn’t detect lost connectivity on its own
• Detecting a failure
  – Keep-alive: 60 seconds
  – Hold timer: 180 seconds
• Reacting to a failure
  – Discard all routes learned from the neighbor
  – Send new updates for any routes that change
[Figure: BGP session between AS1 and AS2]
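The detection and reaction steps on this slide can be sketched roughly as follows. This is a minimal illustration, not router code: the `Session` class, its method names, and the example prefix are hypothetical. Only the 60-second keep-alive and the 180-second hold timer come from the slide.

```python
from typing import Iterable, Set

KEEPALIVE_INTERVAL = 60.0  # seconds between keep-alives (value from the slide)
HOLD_TIME = 180.0          # seconds of silence before the session is declared dead


class Session:
    """Hypothetical per-neighbor session state (illustration only)."""

    def __init__(self, neighbor: str, now: float = 0.0) -> None:
        self.neighbor = neighbor
        self.last_heard = now          # time of the last keep-alive or update
        self.routes: Set[str] = set()  # prefixes learned from this neighbor

    def on_message(self, now: float, announced: Iterable[str] = ()) -> None:
        # Any keep-alive or update from the neighbor restarts the hold timer.
        self.last_heard = now
        self.routes.update(announced)

    def expired(self, now: float) -> bool:
        return now - self.last_heard >= HOLD_TIME

    def check(self, now: float) -> Set[str]:
        """Return the prefixes discarded because the hold timer expired."""
        if not self.expired(now):
            return set()
        lost, self.routes = self.routes, set()
        return lost


session = Session("AS2")
session.on_message(0.0, ["192.0.2.0/24"])  # update announcing one prefix
session.on_message(60.0)                   # keep-alive
print(session.check(200.0))                # still within the hold time
print(session.check(240.0))                # 180 s of silence: routes discarded
```

After `check` reports lost prefixes, a real speaker would rerun route selection and send updates only for the routes whose best path changed.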
6
Routing Change: Before and After
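[Figure: topology with ASes 0, 1, 2, 3, shown before and after the change]
Before: AS 1 uses (1,0); AS 2 uses (2,0); AS 3 uses (3,1,0)
After: AS 1 uses (1,2,0); AS 2 uses (2,0); AS 3 uses (3,2,0)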
7
Routing Change: Path Exploration
• AS 1
  – Delete the route (1,0)
  – Switch to next route (1,2,0)
  – Send route (1,2,0) to AS 3
• AS 3
  – Sees (1,2,0) replace (1,0)
  – Compares to route (2,0)
  – Switches to using AS 2
[Figure: topology with ASes 0, 1, 2, 3 after the change; AS 1 uses (1,2,0), AS 2 uses (2,0), AS 3 uses (3,2,0)]
8
Routing Change: Path Exploration
• Initial situation
  – Destination 0 is alive
  – All ASes use direct path
• When destination dies
  – All ASes lose direct path
  – All switch to longer paths
  – Eventually withdrawn
• E.g., AS 2
  – (2,0) → (2,1,0)
  – (2,1,0) → (2,3,0)
  – (2,3,0) → (2,1,3,0)
  – (2,1,3,0) → null
[Figure: ASes 1, 2, 3 connected to each other and to destination 0, with the paths available to each AS]
AS 1: (1,0), (1,2,0), (1,3,0)
AS 2: (2,0), (2,1,0), (2,3,0), (2,1,3,0)
AS 3: (3,0), (3,1,0), (3,2,0)
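The AS 2 sequence above can be reproduced with a small sketch of one router's view. It rests on several assumptions: AS 2 keeps the latest path heard from each neighbor, prefers the shortest loop-free path (ties broken by lowest AS numbers), and receives messages in one particular, hypothetical order. The function and variable names are illustrative, and other message orderings give other exploration sequences.

```python
from typing import Dict, List, Optional, Tuple

Path = Tuple[int, ...]


def best_path(me: int, rib_in: Dict[int, Optional[Path]]) -> Optional[Path]:
    """Pick the shortest loop-free path among the routes heard from neighbors."""
    candidates = []
    for path in rib_in.values():
        if path is None or me in path:  # withdrawn, or would create a loop
            continue
        candidates.append((me,) + path)
    if not candidates:
        return None
    # Assumed ranking: fewest AS hops, then lowest AS numbers.
    return min(candidates, key=lambda p: (len(p), p))


# Latest path heard from each neighbor of AS 2; destination 0 announces (0,).
rib_in: Dict[int, Optional[Path]] = {0: (0,), 1: (1, 0), 3: (3, 0)}

# Assumed arrival order of messages at AS 2 after destination 0 dies.
events: List[Tuple[int, Optional[Path]]] = [
    (0, None),       # direct route to 0 is lost
    (1, (1, 3, 0)),  # AS 1 has switched to (1,3,0)
    (3, (3, 2, 0)),  # AS 3 has switched to (3,2,0): a loop for AS 2
    (1, None),       # AS 1 finally withdraws
]

history = [best_path(2, rib_in)]
for neighbor, path in events:
    rib_in[neighbor] = path
    history.append(best_path(2, rib_in))

print(history)
```

Each message makes AS 2 fall back to a longer path that is already stale, which is the path exploration the slide describes.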
9
BGP Converges Slowly
• Path vector avoids count-to-infinity
  – But, ASes still must explore many alternate paths
  – … to find the highest-ranked path that is still available
• Fortunately, in practice
  – Most popular destinations have very stable BGP routes
  – And most instability lies in a few unpopular destinations
• Still, lower BGP convergence delay is a goal
  – Can be tens of seconds to tens of minutes
  – High for important interactive applications
  – … or even conventional applications, like Web browsing
10
Reducing BGP Convergence Time
11
Existing Solution: Tune MRAI Timer
• Minimum route advertisement interval (MRAI)
  – Minimum spacing between announcements
  – For a particular (prefix, peer) pair
• Advantages of large MRAI
  – Provides a rate limit on BGP updates
  – Allows grouping of updates within the interval
• Disadvantages of large MRAI
  – Adds delay to the convergence process
  – E.g., 30 seconds for each step
• Trade-off overhead for convergence time
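The rate-limiting and grouping behavior of MRAI can be sketched as below. This is a simplified illustration under stated assumptions: the `MraiLimiter` class and its methods are hypothetical, routes are plain strings, and the timer is kept per (prefix, peer) pair as on the slide. The 30-second value is the slide's example.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (prefix, peer)


class MraiLimiter:
    """Hypothetical per-(prefix, peer) announcement rate limiter."""

    def __init__(self, mrai: float = 30.0) -> None:
        self.mrai = mrai
        self.last_sent: Dict[Key, float] = {}
        self.pending: Dict[Key, str] = {}

    def offer(self, prefix: str, peer: str, route: str, now: float) -> bool:
        """Return True if the announcement may be sent immediately."""
        key = (prefix, peer)
        last = self.last_sent.get(key)
        if last is None or now - last >= self.mrai:
            self.last_sent[key] = now
            self.pending.pop(key, None)
            return True
        # Too soon: hold only the newest route, so later changes overwrite
        # earlier ones (this is the grouping of updates within the interval).
        self.pending[key] = route
        return False

    def flush(self, now: float) -> List[Tuple[Key, str]]:
        """Return held announcements whose interval has elapsed."""
        due = [k for k in self.pending if now - self.last_sent[k] >= self.mrai]
        out = []
        for key in due:
            out.append((key, self.pending.pop(key)))
            self.last_sent[key] = now
        return out


limiter = MraiLimiter(mrai=30.0)
print(limiter.offer("192.0.2.0/24", "peerA", "path A", now=0.0))   # sent at once
print(limiter.offer("192.0.2.0/24", "peerA", "path B", now=5.0))   # held
print(limiter.offer("192.0.2.0/24", "peerA", "path C", now=10.0))  # replaces B
print(limiter.flush(now=30.0))                                     # only C goes out
```

The intermediate "path B" is never sent, which reduces overhead, but "path C" waits 20 seconds, which is the added convergence delay.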
12
Existing Solution: Route-Flap Damping
• Identify (prefix, next-hop) that changes often
  – Suppress route until stable for a period of time
• Problematic in practice
  – Path exploration can inadvertently trigger RFD
  – May suppress all routes, leaving no route left
[Figure: penalty (0 to 4000) versus time (0 to 25) for a flapping route, with "Suppress limit" and "Reuse limit" thresholds; annotations mark when the network is announced, not announced, and re-announced]
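The penalty curve in the figure can be approximated with a short sketch. All numeric parameters here are illustrative assumptions, not values from the slide: a fixed penalty per flap, exponential decay with a half-life, a suppress limit, and a lower reuse limit. The `DampingState` class and its method names are hypothetical.

```python
class DampingState:
    """Hypothetical damping state for one (prefix, next-hop) pair."""

    def __init__(self, half_life: float = 900.0, per_flap: float = 1000.0,
                 suppress_limit: float = 2000.0, reuse_limit: float = 750.0) -> None:
        self.half_life = half_life            # seconds for the penalty to halve
        self.per_flap = per_flap              # penalty added on each flap
        self.suppress_limit = suppress_limit
        self.reuse_limit = reuse_limit
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay(self, now: float) -> None:
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / self.half_life)
        self.last_update = now

    def flap(self, now: float) -> None:
        """Record one route change at time `now`."""
        self._decay(now)
        self.penalty += self.per_flap
        if self.penalty > self.suppress_limit:
            self.suppressed = True

    def usable(self, now: float) -> bool:
        """A suppressed route becomes usable once the penalty decays below the reuse limit."""
        self._decay(now)
        if self.suppressed and self.penalty < self.reuse_limit:
            self.suppressed = False
        return not self.suppressed


state = DampingState()
for t in (0.0, 60.0, 120.0):  # three flaps, one minute apart
    state.flap(t)
print(state.usable(120.0))    # suppressed after the third flap
print(state.usable(1920.0))   # usable again after two quiet half-lives
```

Because path exploration produces several route changes in quick succession, a single underlying failure can push the penalty over the suppress limit, which is the practical problem noted on the slide.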
13
Proposed: Preferring More Stable Routes
• Alternative to route-flap damping
  – Score routes on how stable they are
    • E.g., time elapsed since the last change
  – Incorporate into the path-selection decision
    • Prefer more stable routes over less stable routes
• Advantages
  – Always select a route, if one is available
  – Prevents excessive routing changes
  – Creates incentives for greater stability
• Disadvantages
  – Leads to non-determinism in route selection
  – Requires state for each route
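One possible reading of this proposal is sketched below. It assumes that stability is scored as the time since a route last changed and that this score is compared before AS-path length. The slide does not say where in the decision process the score belongs, so that ordering, the `Route` type, and the `select` function are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Route:
    as_path: Tuple[int, ...]
    last_change: float  # time at which this route last changed


def select(routes: List[Route], now: float) -> Optional[Route]:
    """Prefer the route that has been stable longest, then the shorter AS path."""
    if not routes:
        return None
    return max(routes, key=lambda r: (now - r.last_change, -len(r.as_path)))


routes = [
    Route(as_path=(1, 0), last_change=95.0),     # short, but changed 5 s ago
    Route(as_path=(1, 2, 0), last_change=10.0),  # longer, stable for 90 s
]
print(select(routes, now=100.0))
```

Unlike damping, a route is always selected when one exists. The choice depends on each route's history, which is the source of the non-determinism and per-route state listed as disadvantages.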
14
Proposed: Root-Cause Tagging
• Identify reason for changing the route
  – E.g., which node or edge failed
• Allow routers to skip routes with same fate
  – E.g., routes with same node or edge in AS path
• Practical challenges
  – Multiple routers or links per AS
  – Incremental deployment
[Figure: topology with source s, destination d, and ASes 1 through 5]
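Skipping routes that share the fate of a failed node or edge can be sketched as a filter over ranked candidates. The function names and the example AS paths are hypothetical (the figure's exact topology is not reproduced here), and the sketch treats each AS as a single node, which sidesteps the multiple-routers-per-AS challenge listed above.

```python
from typing import List, Optional, Tuple

AsPath = Tuple[str, ...]
Edge = Tuple[str, str]


def shares_fate(as_path: AsPath, failed_edge: Optional[Edge] = None,
                failed_node: Optional[str] = None) -> bool:
    """True if the path traverses the failed node or the failed edge."""
    if failed_node is not None and failed_node in as_path:
        return True
    if failed_edge is not None:
        a, b = failed_edge
        hops = list(zip(as_path, as_path[1:]))
        return (a, b) in hops or (b, a) in hops
    return False


def next_usable(ranked: List[AsPath], failed_edge: Optional[Edge] = None,
                failed_node: Optional[str] = None) -> Optional[AsPath]:
    """Return the highest-ranked route that does not share the root cause."""
    for path in ranked:
        if not shares_fate(path, failed_edge, failed_node):
            return path
    return None


# Hypothetical ranked routes at source s toward destination d.
ranked = [("s", "1", "2", "d"), ("s", "3", "2", "d"), ("s", "4", "5", "d")]
print(next_usable(ranked, failed_edge=("2", "d")))
```

Without the tag, s would try the second route before learning that it is broken as well. With the tag, it jumps directly to the third.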
15
Proposed: Disseminating Backup Routes
• Disseminating extra (backup) routes
  – So a route is available after a failure
  – To enable faster forwarding convergence
[Figure: ASes 1 through 5 and destination d. AS 3 has routes “3 2 1 d” and “3 4 5 d”; AS 2 has “2 1 d”; AS 1 has “1 d”]
• Announce alternate route to neighbor
  – AS 3 makes “3 4 5 d” available to 2
  – AS 2 makes “2 3 4 5 d” available to AS 1
  – So ASes can switch immediately
16
BGP Routing Stability
17
Stable Paths Problem (SPP) Model
• Model of routing policy
  – Each AS has a ranking of the permissible paths
• Model of path selection
  – Pick the highest-ranked path consistent with neighbors
• Flexibility is not free
  – Global system may not converge to a stable assignment
  – Depending on the way the ASes rank their paths
[Figure: ASes 1, 2, 3 around destination d, each with its ranked paths, most preferred first]
AS 1: “1 2 d”, “1 d”
AS 2: “2 3 d”, “2 d”
AS 3: “3 1 d”, “3 d”
18
Permanent Oscillation: “Bad Gadget”
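[Figure: ASes 1, 2, 3 around destination 0, each with its ranked paths, most preferred first]
AS 1: “1 2 0”, “1 0”
AS 2: “2 3 0”, “2 0”
AS 3: “3 1 0”, “3 0”
Pick the highest-ranked path consistent with your neighbors’ choices.
[Animation labels as each AS reacts in turn: “Only choice!”, “Top choice!”, “Better choice!”]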
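The oscillation can be observed with a small simulation of the rule "pick the highest-ranked path consistent with your neighbors' choices". The sketch assumes a fixed round-robin activation order (AS 1, AS 2, AS 3) and detects a repeated end-of-round state. The function names are illustrative, and real BGP message timing is not modeled.

```python
from typing import Dict, List, Optional, Tuple

Path = Tuple[int, ...]
Rankings = Dict[int, List[Path]]

# Rankings from the slide, most preferred first; destination is AS 0.
BAD_GADGET: Rankings = {
    1: [(1, 2, 0), (1, 0)],
    2: [(2, 3, 0), (2, 0)],
    3: [(3, 1, 0), (3, 0)],
}


def best_response(asn: int, rankings: Rankings,
                  choice: Dict[int, Optional[Path]], dest: int = 0) -> Optional[Path]:
    """Highest-ranked path whose next hop currently uses the rest of that path."""
    for path in rankings[asn]:
        next_hop = path[1]
        if next_hop == dest or choice.get(next_hop) == path[1:]:
            return path
    return None


def run(rankings: Rankings, max_steps: int = 100) -> str:
    """Activate ASes round-robin; report 'stable', 'oscillates', or 'undecided'."""
    order = sorted(rankings)
    choice: Dict[int, Optional[Path]] = {asn: None for asn in order}
    seen = set()
    for step in range(max_steps):
        asn = order[step % len(order)]
        choice[asn] = best_response(asn, rankings, choice)
        if (step + 1) % len(order) == 0:  # end of a full round
            if all(choice[a] == best_response(a, rankings, choice) for a in order):
                return "stable"
            snapshot = tuple(choice[a] for a in order)
            if snapshot in seen:          # same state under the same schedule
                return "oscillates"
            seen.add(snapshot)
    return "undecided"


print(run(BAD_GADGET))
```

Because the schedule is deterministic, a repeated end-of-round state means the same sequence of changes repeats forever under this activation order.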
19
Two Stable Solutions: Disagree
• Each AS prefers the path through the other
• Two stable states
  – AS 2 picks “2 0”, and AS 1 picks “1 2 0”
  – AS 1 picks “1 0”, and AS 2 picks “2 1 0”
• Outcome depends on timing/ordering of messages
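[Figure: ASes 1 and 2 connected to each other and to destination 0, each with its ranked paths, most preferred first]
AS 1: “1 2 0”, “1 0”
AS 2: “2 1 0”, “2 0”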
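For a system this small, the stable states can be found by brute force: enumerate every assignment of paths to ASes and keep those in which each AS already holds its best response. The sketch below is illustrative only. The function names are hypothetical, and the enumeration grows exponentially with the number of ASes.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Path = Tuple[int, ...]
Rankings = Dict[int, List[Path]]

# Rankings from the slide, most preferred first; destination is AS 0.
DISAGREE: Rankings = {
    1: [(1, 2, 0), (1, 0)],
    2: [(2, 1, 0), (2, 0)],
}


def best_response(asn: int, rankings: Rankings,
                  choice: Dict[int, Optional[Path]], dest: int = 0) -> Optional[Path]:
    """Highest-ranked path whose next hop currently uses the rest of that path."""
    for path in rankings[asn]:
        next_hop = path[1]
        if next_hop == dest or choice.get(next_hop) == path[1:]:
            return path
    return None


def stable_assignments(rankings: Rankings, dest: int = 0) -> List[Dict[int, Optional[Path]]]:
    """Enumerate all assignments and keep those where no AS wants to change."""
    ases = sorted(rankings)
    options = [rankings[a] + [None] for a in ases]  # None means "no route"
    stable = []
    for combo in product(*options):
        choice = dict(zip(ases, combo))
        if all(choice[a] == best_response(a, rankings, choice, dest) for a in ases):
            stable.append(choice)
    return stable


for assignment in stable_assignments(DISAGREE):
    print(assignment)
```

The two assignments it reports match the two stable states listed on the slide. Which one the network reaches depends on message timing, which this enumeration does not model.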
20
Ways to Achieve Global Stability
• Detect conflicting rankings of paths?
  – Computationally intractable (NP-hard)
  – Requires global coordination
• Restrict the policy programming languages?
  – In what way? How to require this globally?
  – What if the world should change, and the protocol can’t?
• Rely on economic incentives?
  – Policies typically driven by business relationships
  – E.g., customer-provider and peer-peer relationships
  – Sufficient conditions to guarantee unique, stable solution
21
Bilateral Business Relationships
• Provider-Customer
  – Customer pays provider for access to the Internet
• Peer-Peer
  – Peers carry traffic between their respective customers
[Figure: topology with ASes 1 through 8 and destination d; edges are labeled Provider-Customer or Peer-Peer]
Valid paths: “1 2 d” and “7 d”
Invalid path: “5 8 d”
Valid paths: “6 4 3 d” and “8 5 d”
Invalid paths: “6 5 d” and “1 4 3 d”
22
Act Locally, Prove Globally
• Global topology
  – Provider-customer relationship graph is acyclic
  – Peer-peer relationships between any pairs of ASes
• Route export
  – Do not export routes learned from a peer or provider
  – … to another peer or provider
• Route selection
  – Prefer routes through customers
  – … over routes through peers and providers
• Guaranteed to converge to unique, stable solution
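The export and selection guidelines above are local rules and can be written down compactly. The sketch below is illustrative: the `Rel` enum and the function names are hypothetical, and breaking ties by AS-path length is an added assumption, since the slide only requires that customer routes rank above peer and provider routes.

```python
from enum import Enum
from typing import List, Optional, Tuple


class Rel(Enum):
    """Relationship of a neighbor, from the local AS's point of view."""
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "provider"


Route = Tuple[Rel, Tuple[int, ...]]  # (neighbor type it was learned from, AS path)


def should_export(learned_from: Rel, send_to: Rel) -> bool:
    """Routes from a peer or provider go only to customers; customer routes go to all."""
    return learned_from is Rel.CUSTOMER or send_to is Rel.CUSTOMER


def preference_key(route: Route) -> Tuple[int, int]:
    learned_from, as_path = route
    # Customer routes first; then (assumption) the shorter AS path.
    return (0 if learned_from is Rel.CUSTOMER else 1, len(as_path))


def select_route(routes: List[Route]) -> Optional[Route]:
    return min(routes, key=preference_key) if routes else None


print(should_export(Rel.PEER, Rel.PROVIDER))      # peer route to a provider
print(should_export(Rel.PROVIDER, Rel.CUSTOMER))  # provider route to a customer
print(select_route([(Rel.PEER, (2, 0)), (Rel.CUSTOMER, (3, 4, 0))]))
```

Each AS can apply these checks using only its own bilateral relationships, which is the "act locally" part. The convergence guarantee additionally relies on the acyclic provider-customer graph.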
23
Rough Sketch of the Proof
• Two phases
  – Walking up the customer-provider hierarchy
  – Walking down the provider-customer hierarchy
[Figure: topology with ASes 1 through 8 and destination d; edges are labeled Provider-Customer or Peer-Peer]
24
Trade-offs Between Assumptions
• Three kinds of assumptions
  – Route export, route selection, global topology
  – Relax one assumption, need to tighten other two
• Extensions for other kinds of relationships
  – Backups, siblings, …
• But, many questions remain
  – Complete understanding of the trade-offs
  – Business practices may change over time
  – ASes may lie about their paths
  – Protocol extensions for multi-path routing
25
Research Directions: New Internet Routing Architectures
26
Why Change Routing?
• Better performance
  – Scalability, security, convergence, reliability, flexibility, stability, …
• Simpler management
  – For network operators
  – For folks deploying services
• Greater extensibility
  – To enable experimentation
  – To enable new services
27
What to Change, and Where?
• Add another layer above network routing
  – Routing functionality in overlay networks
• Change the routing protocols
  – To improve scalability, security, convergence, …
• Change the division of functionality
  – Data, control, and management planes
• Change the division of responsibility
  – End users, third parties, and service providers
• ???
28
Theme: Location/Identity Separation
• Scalability problems with BGP
  – 300,000 prefixes and growing
  – Difficulty handling mobility
• Idea: separate location and identity
  – Identity associated with a host or group of hosts
  – Location is “looked up” when sending packets
• Examples
  – Route packets based on destination AS
  – Route packets based on “label” found in DNS
  – Establish e2e paths and associate with labels
29
Theme: Separating Routing From Routers
• Today’s routers do many things
  – Compute routes, forward packets, monitoring
• Separate service for computing routes
  – Better scalability, network-wide view, …
• Several deployment scenarios
  – Within an AS
    • Incrementally deployable
    • Use BGP to instruct the routers
  – Across multiple ASes
    • Routing as a Service
    • Provided by third parties
[Figure: route-computation servers, including one labeled for AS 2]
30
Theme: Multipath Routing
• Benefits of multipath routing
  – Efficiency, performance, reliability, and security
  – Greater control to users and edge ASes
• Many ways to construct multiple paths
  – Multipath extensions to BGP
  – Overlays on top of BGP
  – Stitching together sub-paths
  – Source routing
• Many new challenges
  – Scalability
  – Stable load balancing
  – Incentives for participation
31
Theme: Overlays and Virtualization
• Build end-to-end topologies
  – Overlays by tunneling from one node to another
  – Virtual networks by “hosting” overlays on the routers
• Separation of “interdomain” issues
  – Instantiate a (virtual) topology over the infrastructure
  – Run (intradomain) routing protocols on this topology
[Figure captions: “Competing ISPs with different goals must coordinate” versus “Single service provider controls end-to-end path”]
32
Conclusion
• Internet routing
  – A competitive cooperation of ~40,000 networks
  – Policy-based path-vector routing protocol on prefixes
  – Tension between local autonomy and global properties
• Many important practical challenges
  – Scalability, stability, flexibility, performance, reliability, …
• Many interesting research directions
  – Understanding today’s BGP
  – Extensions and enhancements to BGP
  – Entirely new Internet routing architectures
• Please, please help us fix interdomain routing!!!