Post on 18-Dec-2015
Univ. of Tehran Computer Network 1
Computer Networks
(Graduate level)
University of Tehran
Dept. of EE and Computer Engineering
By: Dr. Nasser Yazdani
Lecture 7: Inter-domain Routing
Inter-Domain Routing
• Border Gateway Protocol (BGP)
• Assigned reading
  • [LAB00] Delayed Internet Routing Convergence
• Sources
  • RFC1771: main BGP RFC
  • RFC1772-3-4: application, experiences, and analysis of BGP
  • RFC1965: AS confederations for BGP
  • Christian Huitema: “Routing in the Internet”, chapters 8 and 9.
  • John Stewart III: “BGP4 - Inter-domain routing in the Internet”
Outline
• External BGP (E-BGP)
• Internal BGP (I-BGP)
• Multi-Homing
• Stability Issues
Internet’s Area Hierarchy
• What is an Autonomous System (AS)?
  • A set of routers under a single technical administration, using an interior gateway protocol (IGP) and common metrics to route packets within the AS, and using an exterior gateway protocol (EGP) to route packets to other AS’s
  • Sometimes AS’s use multiple IGPs and metrics, but appear as single AS’s to other AS’s
• Each AS is assigned a unique ID
• AS’s peer at network exchanges to exchange routing information.
Example
[Figure: five autonomous systems (1 to 5) containing routers such as 1.1, 1.2, 2.1, 2.2, 2.2.1, 3.1, 3.2, 4.1, 4.2, 5.1 and 5.2; links inside an AS are labeled IGP and links between AS’s are labeled EGP.]
History
• Mid-80s: EGP
  • Reachability protocol (no shortest path)
  • Did not accommodate cycles (tree topology)
  • Evolved when all networks connected to NSF backbone
• Result: BGP introduced as routing protocol
  • Latest version = BGP 4
  • BGP-4 supports CIDR
  • Primary objective: connectivity, not performance
Choices
• Link state or distance vector?
  • No universal metric – policy decisions
• Problems with distance-vector:
  • Bellman-Ford algorithm may not converge
• Problems with link state:
  • Metric used by routers not the same – loops
  • LS database too large – entire Internet
  • May expose policies to other AS’s
Solution: Distance Vector with Path
• Each routing update carries the entire path
• Loops are detected as follows:
  • When an AS gets a route, it checks whether it is already in the path
    • If yes, reject route
    • If no, add self and (possibly) advertise route further
• Advantage:
  • Metrics are local - AS chooses path, protocol ensures no loops
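The loop check on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `receive_route` and the use of plain integers for AS numbers are assumptions made for the example, not part of the slides.

```python
from typing import List, Optional


def receive_route(my_as: int, as_path: List[int]) -> Optional[List[int]]:
    """Apply the path-vector loop check for one received route.

    Returns the path this AS would advertise further, or None if the
    route is rejected because this AS already appears in the path.
    """
    if my_as in as_path:
        return None               # loop detected: reject the route
    return [my_as] + as_path      # add self before advertising further


# AS 3 accepts a route that went through AS 2 and AS 1 ...
accepted = receive_route(3, [2, 1])
# ... but AS 1 rejects the same route, because it is already in the path.
rejected = receive_route(1, [2, 1])
```

Because the check uses only the path carried in the update, each AS is free to rank routes with its own local metric.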
Interconnecting BGP Peers
• BGP uses TCP to connect peers
• Advantages:
  • Simplifies BGP
  • No need for periodic refresh - routes are valid until withdrawn, or the connection is lost
  • Incremental updates
• Disadvantages
  • Congestion control on a routing protocol?
  • Poor interaction during high load
Hop-by-hop Model
• BGP advertises to neighbors only those routes that it uses
  • Consistent with the hop-by-hop Internet paradigm
  • e.g., AS1 cannot tell AS2 to route to other AS’s in a manner different than what AS2 has chosen (need source routing for that)
AS Categories
• Stub: an AS that has only a single connection to one other AS - carries only local traffic.
• Multi-homed: an AS that has connections to more than one AS, but does not carry transit traffic
• Transit: an AS that has connections to more than one AS, and carries both transit and local traffic (under certain policy restrictions)
AS Categories
[Figure: three example topologies built from AS1, AS2 and AS3, labeled Stub, Multi-homed and Transit.]
Policy with BGP
• BGP provides capability for enforcing various policies
• Policies are not part of BGP: they are provided to BGP as configuration information
• BGP enforces policies by choosing paths from multiple alternatives and controlling advertisement to other AS’s
Examples of BGP Policies
• A multi-homed AS refuses to act as transit
  • Limit path advertisement
• A multi-homed AS can become transit for some AS’s
  • Only advertise paths to some AS’s
• An AS can favor or disfavor certain AS’s for traffic transit from itself
Routing Information Bases (RIB)
• Routes are stored in RIBs
• Adj-RIBs-In: routing info that has been learned from other routers (unprocessed routing info)
• Loc-RIB: local routing information selected from Adj-RIBs-In (routes selected locally)
• Adj-RIBs-Out: info to be advertised to peers (routes to be advertised)
BGP Common Header
[Figure: header layout, drawn 4 bytes (0 to 3) per row.]
• Marker (security and message delineation): 16 bytes
• Length (2 bytes)
• Type (1 byte)
• Types: OPEN, UPDATE, NOTIFICATION, KEEPALIVE
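As a rough illustration of the common header, the sketch below unpacks the 16-byte marker, 2-byte length and 1-byte type with Python's `struct` module. The numeric type codes (1 to 4) follow the BGP-4 specification rather than the slide, and the helper name `parse_header` is an assumption made for this example.

```python
import struct

# Type codes as assigned by the BGP-4 specification (not shown on the slide).
BGP_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}

HEADER_FORMAT = "!16sHB"                           # network byte order: marker, length, type
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)       # 19 bytes


def parse_header(data: bytes):
    """Split the fixed BGP header into (marker, length, type name)."""
    marker, length, msg_type = struct.unpack(HEADER_FORMAT, data[:HEADER_SIZE])
    return marker, length, BGP_TYPES.get(msg_type, "UNKNOWN")


# A KEEPALIVE consists of the header only: all-ones marker, length 19, type 4.
keepalive = b"\xff" * 16 + struct.pack("!HB", 19, 4)
marker, length, name = parse_header(keepalive)
```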
BGP OPEN message
[Figure: message layout, drawn 4 bytes (0 to 3) per row.]
• Marker (security and message delineation)
• Length
• Type: open
• Version
• My autonomous system
• Hold time
• BGP identifier
• Parameter length
• Optional parameters <type, length, value>
• My AS: id assigned to that AS
• Hold timer: max interval between KEEPALIVE or UPDATE messages; a zero interval implies no keep_alive.
• BGP ID: IP address of one interface (same for all messages)
BGP UPDATE message
[Figure: message layout, drawn 4 bytes (0 to 3) per row.]
• Marker (security and message delineation)
• Length
• Type: update
• Withdrawn routes len
• Withdrawn routes (variable)
• Path attribute len
• Path attributes (variable)
• Network layer reachability information (NLRI) (variable)
• Many prefixes may be included in UPDATE, but must share same attributes.
• UPDATE message may report multiple withdrawn routes.
BGP UPDATE Message
• List of withdrawn routes
• Network layer reachability information
  • List of reachable prefixes
• Path attributes
  • Origin
  • Path
  • Metrics
• All prefixes advertised in a message have same path attributes
NLRI
• Network Layer Reachability Information
• List of IP address prefixes encoded as follows:
  • Length (1 byte), Prefix (variable)
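A small decoder makes the <length, prefix> encoding concrete. The sketch assumes IPv4, and relies on two details from the BGP-4 specification that the slide does not spell out: the length byte counts bits, and the prefix field occupies the minimum number of whole bytes needed to hold that many bits. The function name `decode_nlri` is made up for the example.

```python
import ipaddress
from typing import List


def decode_nlri(data: bytes) -> List[str]:
    """Decode a sequence of <length, prefix> pairs into 'a.b.c.d/len' strings."""
    prefixes = []
    i = 0
    while i < len(data):
        bits = data[i]                      # prefix length in bits
        nbytes = (bits + 7) // 8            # bytes actually present on the wire
        raw = data[i + 1:i + 1 + nbytes]
        padded = raw + bytes(4 - nbytes)    # pad with zeros to a full IPv4 address
        prefixes.append(f"{ipaddress.IPv4Address(padded)}/{bits}")
        i += 1 + nbytes
    return prefixes


# Two prefixes: 135.207.0.0/16 (2 prefix bytes) and 192.0.2.0/24 (3 prefix bytes).
wire = bytes([16, 135, 207, 24, 192, 0, 2])
decoded = decode_nlri(wire)
```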
Path attributes
• Type-Length-Value encoding
  • Attribute type (2 bytes)
  • Attribute length (1-2 bytes)
  • Attribute value (variable length)
• Attribute type field
  • Attribute flags (1 byte)
  • Attribute type code (1 byte)
• Flags: optional vs. well-known, transitive, partial, extended length
BGP NOTIFICATION message
[Figure: message layout, drawn 4 bytes (0 to 3) per row.]
• Marker (security and message delineation)
• Length
• Type: NOTIFICATION
• Error code
• Error sub-code
• Data
• Used for error notification
• TCP connection is closed immediately after notification
BGP KEEPALIVE message
[Figure: message layout, drawn 4 bytes (0 to 3) per row.]
• Marker (security and message delineation)
• Length
• Type: KEEPALIVE
• Sent periodically to peers to ensure connectivity.
• If hold_time is zero, messages are not sent.
• Sent in place of an UPDATE message
Path Selection Criteria
• Information based on path attributes
• Attributes + external (policy) information
• Examples:
  • Hop count
  • Policy considerations
    • Preference for AS
    • Presence or absence of certain AS
  • Path origin
  • Link dynamics
Route Selection Summary
• Highest Local Preference (enforce relationships)
• Shortest ASPATH (traffic engineering)
• Lowest MED (traffic engineering)
• i-BGP < e-BGP (traffic engineering)
• Lowest IGP cost to BGP egress (traffic engineering)
• Lowest router ID (throw up hands and break ties)
Back to Frank …
[Figure: AS 1, AS 2, AS 3 and AS 4, with prefix 13.13.0.0/16; links are labeled peer, peer, customer and provider, and the candidate routes carry local pref = 80, local pref = 100 and local pref = 90.]
• Higher local preference values are more preferred
• Local preference only used in iBGP
Implementing Backup Links with Local Preference (Outbound Traffic)
Forces outbound traffic to take primary link, unless link is down.
AS 1
primary link backup link
Set Local Pref = 100for all routes from AS 1 AS 65000
Set Local Pref = 50for all routes from AS 1
We’ll talk about inbound traffic soon …
Multihomed Backups (Outbound Traffic)
[Figure: AS 2 connected to provider AS 1 by a primary link and to provider AS 3 by a backup link.]
• Primary link: set Local Pref = 100 for all routes from AS 1
• Backup link: set Local Pref = 50 for all routes from AS 3
• Forces outbound traffic to take primary link, unless link is down.
ASPATH Attribute
[Figure: prefix 135.207.0.0/16 is originated by AS 6341 (AT&T Research) and propagated among AS 7018 (AT&T), AS 1239 (Sprint), AS 1755 (Ebone), AS 1129 (Global Access), AS 3549 (Global Crossing) and AS 12654 (RIPE NCC RIS project). Announcements shown in the figure:]
• 135.207.0.0/16, AS Path = 6341
• 135.207.0.0/16, AS Path = 7018 6341
• 135.207.0.0/16, AS Path = 1239 7018 6341
• 135.207.0.0/16, AS Path = 1755 1239 7018 6341
• 135.207.0.0/16, AS Path = 1129 1755 1239 7018 6341
• 135.207.0.0/16, AS Path = 3549 7018 6341
COMMUNITY Attribute to the Rescue!
[Figure: customer AS 2 originates 192.0.2.0/24 and connects to provider AS 1 over the primary link and to provider AS 3 over the backup link.]
• Primary link: 192.0.2.0/24, ASPATH = 2
• Backup link: 192.0.2.0/24, ASPATH = 2, COMMUNITY = 3:70
• Customer import policy at AS 3:
  • If 3:90 in COMMUNITY then set local preference to 90
  • If 3:80 in COMMUNITY then set local preference to 80
  • If 3:70 in COMMUNITY then set local preference to 70
• AS 3: normal customer local pref is 100, peer local pref is 90
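The customer import policy at AS 3 can be written as a small lookup. This is a sketch only: the table and function names are invented for the example, and communities are represented as plain strings.

```python
from typing import Iterable

# Community values and preferences taken from the import policy on the slide.
COMMUNITY_TO_LOCAL_PREF = {"3:90": 90, "3:80": 80, "3:70": 70}
DEFAULT_CUSTOMER_LOCAL_PREF = 100   # AS 3's normal customer local pref


def import_local_pref(communities: Iterable[str]) -> int:
    """Return the local preference AS 3 assigns to a customer route."""
    tagged = set(communities)
    for community, pref in COMMUNITY_TO_LOCAL_PREF.items():
        if community in tagged:
            return pref
    return DEFAULT_CUSTOMER_LOCAL_PREF


backup_pref = import_local_pref(["3:70"])   # route announced over the backup link
normal_pref = import_local_pref([])         # untagged customer route
```

With COMMUNITY = 3:70 the route gets local preference 70, which is below the peer value of 90, so AS 3 uses the backup link only when it has no better route.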
Hot Potato Routing: Go for the Closest Egress Point
[Figure: a router with IGP distance 15 to egress 1 and IGP distance 56 to egress 2; both egress points lead to 192.44.78.0/24.]
• This router has two BGP routes to 192.44.78.0/24.
• Hot potato: get traffic off of your network as soon as possible. Go for egress 1!
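Getting Burned by the Hot Potato
[Figure: a high-bandwidth provider backbone and a low-bandwidth customer backbone interconnect at SFF and NYC; the customer’s heavy content web farm is at San Diego. Distances shown: 15 and 56 on one network, 17 and 2865 on the other. A tiny http request travels one way and a huge http reply travels back.]
• Many customers want their provider to carry the bits!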
Cold Potato Routing with MEDs (Multi-Exit Discriminator Attribute)
[Figure: the network hosting the heavy content web farm announces 192.44.78.0/24 with MED = 15 at one interconnection point and MED = 56 at the other; distances 15, 56, 17 and 2865 are shown as on the hot potato slide.]
• Prefer lower MED values
• This means that MEDs must be considered BEFORE IGP distance!
• Note 1: some providers will not listen to MEDs
• Note 2: MEDs need not be tied to IGP distance
Route Selection Summary
• Highest Local Preference (enforce relationships)
• Shortest ASPATH (traffic engineering)
• Lowest MED (traffic engineering)
• i-BGP < e-BGP (traffic engineering)
• Lowest IGP cost to BGP egress (traffic engineering)
• Lowest router ID (throw up hands and break ties)
• This is somewhat simplified. Hey, what happened to ORIGIN??
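One way to read the summary is as a lexicographic comparison: each rule is consulted only when all earlier rules tie. The sketch below encodes that as a sort key. It inherits the simplifications of the slide (no ORIGIN step) and adds one of its own, stated loudly: MED is compared across all candidates, whereas real BGP normally compares MED only between routes from the same neighboring AS. The `Route` class and its field names are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Route:
    name: str
    local_pref: int
    as_path: List[int]
    med: int
    from_ebgp: bool     # True if learned over e-BGP, False if over i-BGP
    igp_cost: int       # IGP cost to the BGP egress router
    router_id: int


def preference_key(route: Route) -> Tuple[int, int, int, int, int, int]:
    """Smaller tuples win; the fields follow the order of the summary list."""
    return (
        -route.local_pref,                # highest local preference
        len(route.as_path),               # shortest ASPATH
        route.med,                        # lowest MED (simplified, see lead-in)
        0 if route.from_ebgp else 1,      # prefer e-BGP over i-BGP
        route.igp_cost,                   # lowest IGP cost to egress
        route.router_id,                  # lowest router ID breaks ties
    )


def best_route(routes: List[Route]) -> Route:
    return min(routes, key=preference_key)


candidates = [
    Route("via-peer", 90, [7018, 6341], 0, True, 15, 1),
    Route("via-customer", 100, [1239, 7018, 6341], 0, True, 56, 2),
    Route("via-customer-ibgp", 100, [1239, 7018, 6341], 0, False, 10, 3),
]
chosen = best_route(candidates)
```

Here the customer routes win on local preference despite their longer AS path, and the e-BGP copy wins over the i-BGP copy before IGP cost is ever consulted.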
Policies Can Interact Strangely (“Route Pinning” Example)
[Figure: a customer and networks labeled 1 to 4, with one link marked backup.]
• Install backup link using community
• Disaster strikes primary link and the backup takes over
• Primary link is restored but some traffic remains pinned to backup
Path Attributes
• Categories (recall flags):
  • well-known mandatory (passed on)
  • well-known discretionary (passed on)
  • optional transitive (passed on)
  • optional non-transitive (if unrecognized, not passed on)
• Optional attributes allow for BGP extensions
Path attribute message format (repeated)
[Figure: attribute flags byte (bits O, T, P, E, starting at bit 0) followed by the attribute type code (Origin, AS_path, Next hop, etc.).]
• O: optional or well-known
• T: transitive or local
• P: partially evaluated
• E: length in 1 or 2 bytes
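The four flag bits can be decoded with simple masks. Bit 0 in the figure is the most significant bit of the flags byte, so the masks below are 0x80, 0x40, 0x20 and 0x10, as in the BGP-4 specification. The function name and dictionary keys are illustrative choices, not part of the slides.

```python
from typing import Dict


def decode_attr_flags(flags: int) -> Dict[str, bool]:
    """Decode the high-order four bits of a path attribute flags byte."""
    return {
        "optional": bool(flags & 0x80),         # O: 1 = optional, 0 = well-known
        "transitive": bool(flags & 0x40),       # T: 1 = transitive
        "partial": bool(flags & 0x20),          # P: 1 = partially evaluated
        "extended_length": bool(flags & 0x10),  # E: 1 = 2-byte length field
    }


well_known = decode_attr_flags(0x40)            # well-known attributes are transitive
optional_transitive = decode_attr_flags(0xC0)
```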
ORIGIN path attribute
• Well-known, mandatory attribute.
• Describes how a prefix was generated at the origin AS.
• Possible values:
  • IGP: prefix learned from IGP
  • EGP: prefix learned through EGP
  • INCOMPLETE: none of the above (often seen for static routes)
AS_PATH attribute
• Well-known, mandatory attribute.
• Important components:
  • list of traversed AS’s
• If forwarding to internal peer:
  • do not modify AS_PATH attribute
• If forwarding to external peer:
  • prepend self into the path
Next hop path attribute
• Well-known, mandatory attribute
• NEXT_HOP: IP address of border router to be used as next hop
• Usually, next hop is the router sending the UPDATE message
• Useful when some routers do not speak BGP
Example of NEXT_HOP
[Figure: routers A (BGP), B (BGP) and C (no BGP), and network 138.39.0.0/16; one arrow is labeled “UPDATE MSG through BGP” and another “Traffic to 138.39.0.0/16”.]
LOCAL PREF
• Local (within an AS) mechanism to provide relative priority among BGP routers
[Figure: routers R1 to R4 with an I-BGP session inside AS 256, router R5, and neighboring AS 100, AS 200 and AS 300; one exit is marked Local Pref = 500 and the other Local Pref = 800.]
AS_PATH
• List of traversed AS’s
[Figure: AS 100 (180.10.0.0/16), AS 200 (170.10.0.0/16), AS 300 and AS 500. Paths shown:]
• 180.10.0.0/16: 300 200 100
• 170.10.0.0/16: 300 200
CIDR and BGP
[Figure: AS X (197.8.2.0/24), AS Y (197.8.3.0/24), AS T (provider, 197.8.0.0/23) and AS Z.]
• What should T announce to Z?
Options
• Advertise all paths:
  • Path 1: through T can reach 197.8.0.0/23
  • Path 2: through T can reach 197.8.2.0/24
  • Path 3: through T can reach 197.8.3.0/24
• But this does not reduce routing tables! We would like to advertise:
  • Path 1: through T can reach 197.8.0.0/22
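The aggregation on this slide can be checked with Python's standard `ipaddress` module: the three prefixes reachable through T collapse into a single /22. The variable names are illustrative.

```python
import ipaddress

# The three prefixes from the slide: T's own /23 plus the /24s of X and Y.
specifics = [
    ipaddress.ip_network("197.8.0.0/23"),
    ipaddress.ip_network("197.8.2.0/24"),
    ipaddress.ip_network("197.8.3.0/24"),
]

# collapse_addresses merges adjacent and overlapping networks.
aggregate = list(ipaddress.collapse_addresses(specifics))
```

The result is the single prefix 197.8.0.0/22, so Z needs one routing table entry instead of three.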
Sets and Sequences
• Problem: what do we list in the route?
  • List T: omitting information not acceptable, may lead to loops
  • List T, X, Y: misleading, appears as 3-hop path
• Solution: restructure AS Path attribute as:
  • Path: (Sequence (T), Set (X, Y))
  • If Z wants to advertise path:
    • Path: (Sequence (Z, T), Set (X, Y))
  • In practice used only if paths in set have same attributes
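A minimal data structure for the restructured path might look as follows. The class name `AsPath` and its methods are invented for this sketch. Counting the whole set as one hop is an assumption borrowed from common BGP practice; the slide only says that listing T, X, Y would misleadingly look like a 3-hop path.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class AsPath:
    sequence: Tuple[str, ...]               # ordered list of traversed AS's
    as_set: FrozenSet[str] = frozenset()    # unordered AS's hidden by aggregation

    def contains(self, asn: str) -> bool:
        """Loop check must look at both the sequence and the set."""
        return asn in self.sequence or asn in self.as_set

    def prepend(self, asn: str) -> "AsPath":
        """An AS advertising the route adds itself to the sequence only."""
        return AsPath((asn,) + self.sequence, self.as_set)

    def length(self) -> int:
        """Assumed convention: the set counts as a single hop."""
        return len(self.sequence) + (1 if self.as_set else 0)


from_t = AsPath(("T",), frozenset({"X", "Y"}))   # Path: (Sequence (T), Set (X, Y))
from_z = from_t.prepend("Z")                     # Path: (Sequence (Z, T), Set (X, Y))
```

Keeping X and Y in the set lets them still detect loops, while the path length seen by Z's neighbors stays short.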
Multi-Exit Discriminator (MED)
• Hint to external neighbors about the preferred path into an AS
  • Non-transitive attribute (we will see later why)
  • Different AS choose different scales
• Used when two AS’s connect to each other in more than one place
MED
• Hint to R1 to use R3 over R4 link
• Cannot compare AS40’s values to AS30’s
[Figure: AS 10, AS 30 and AS 40 with routers R1 to R4; 180.10.0.0 is announced over three links with MED = 120, MED = 200 and MED = 50.]
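The rule that MED values from different AS’s cannot be compared can be sketched as a filter that works per neighboring AS. The assignment of MED values to routers below (120 via R3 and 200 via R4 in AS 40, 50 via R2 in AS 30) is an assumed reading of the figure, and the tuple layout and function name are made up for the example.

```python
from typing import Dict, List, Tuple

# (neighbor AS, MED, label) for each announcement of 180.10.0.0.
Announcement = Tuple[int, int, str]


def med_survivors(routes: List[Announcement]) -> List[Announcement]:
    """Keep, for each neighbor AS, only the routes with that AS's lowest MED."""
    lowest: Dict[int, int] = {}
    for neighbor_as, med, _ in routes:
        if neighbor_as not in lowest or med < lowest[neighbor_as]:
            lowest[neighbor_as] = med
    return [r for r in routes if r[1] == lowest[r[0]]]


announcements = [
    (40, 120, "via R3"),   # assumed: R3 belongs to AS 40
    (40, 200, "via R4"),   # assumed: R4 belongs to AS 40
    (30, 50, "via R2"),    # assumed: R2 belongs to AS 30
]
survivors = med_survivors(announcements)
```

The route via R4 is dropped in favor of R3, but the AS 30 route is neither helped nor hurt by its MED of 50; later steps of the decision process choose between the survivors.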
MED
• MED is typically used in provider/subscriber scenarios
• It can lead to unfairness if used between ISPs because it may force one ISP to carry more traffic:
  • ISP1 ignores MED from ISP2
  • ISP2 obeys MED from ISP1
  • ISP2 ends up carrying traffic most of the way
[Figure: ISP1 and ISP2 interconnect at SF and NY.]
Other Attributes
• ORIGIN
  • Source of route (IGP, EGP, other)
• NEXT_HOP
  • Address of next hop router to use
  • Used to direct traffic to non-BGP router
• Check out http://www.cisco.com for full explanation
Decision Process
• Processing order of attributes:
  • Select route with highest LOCAL-PREF
  • Select route with shortest AS-PATH
  • Apply MED (if routes learned from same neighbor)
Outline
• External BGP (E-BGP)
• Internal BGP (I-BGP)
• Multi-Homing
• Stability Issues
Internal vs. External BGP
[Figure: routers R1, R2 and R3 in AS1 and R4 in AS2, with an E-BGP session between R3 and R4.]
• BGP can be used by R3 and R4 to learn routes
• How do R1 and R2 learn routes?
• Option 1: Inject routes in IGP
  • Only works for small routing tables
• Option 2: Use I-BGP
Internal BGP (I-BGP)
• Same messages as E-BGP
• Different rules about re-advertising prefixes:
  • Prefix learned from E-BGP can be advertised to I-BGP neighbor and vice-versa, but
  • Prefix learned from one I-BGP neighbor cannot be advertised to another I-BGP neighbor
  • Reason: no AS PATH within the same AS and thus danger of looping.
Internal BGP (I-BGP)
[Figure: R1, R2 and R3 in AS1 connected by I-BGP sessions; R4 in AS2 connected to R3 by an E-BGP session.]
• R3 can tell R1 and R2 prefixes from R4
• R3 can tell R4 prefixes from R1 and R2
• R3 cannot tell R2 prefixes from R1
• R2 can only find these prefixes through a direct connection to R1
• Result: I-BGP routers must be fully connected (via TCP)!
  • contrast with E-BGP sessions that map to physical links
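The re-advertisement rule and its full-mesh consequence fit in two small functions. The string labels "ebgp" and "ibgp" and both function names are assumptions made for this sketch.

```python
def may_readvertise(learned_from: str, advertise_to: str) -> bool:
    """A prefix learned over I-BGP must not be passed to another I-BGP peer."""
    return not (learned_from == "ibgp" and advertise_to == "ibgp")


def full_mesh_sessions(n_routers: int) -> int:
    """Number of I-BGP sessions needed so every pair of routers peers directly."""
    return n_routers * (n_routers - 1) // 2


r3_tells_r1_about_r4 = may_readvertise("ebgp", "ibgp")   # allowed
r3_tells_r4_about_r1 = may_readvertise("ibgp", "ebgp")   # allowed
r3_tells_r2_about_r1 = may_readvertise("ibgp", "ibgp")   # forbidden
sessions_in_as1 = full_mesh_sessions(3)                   # R1, R2, R3
```

The session count grows quadratically with the number of I-BGP routers, which is the price of the simple no-relay rule.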
Link Failures
• Two types of link failures:
  • Failure on an E-BGP link
  • Failure on an I-BGP link
• These failures are treated completely differently in BGP
• Why?
Failure on an E-BGP Link
[Figure: R1 in AS1 and R2 in AS2, with interface addresses 138.39.1.1/30 and 138.39.1.2/30, joined by a physical link that carries the E-BGP session.]
• If the link R1-R2 goes down
  • The TCP connection breaks
  • BGP routes are removed
• This is the desired behavior
Failure on an I-BGP Link
[Figure: R1, R2 and R3 joined by physical links, with interface addresses 138.39.1.1/30 and 138.39.1.2/30 on the R1-R2 link and an I-BGP connection between R1 and R2.]
• If link R1-R2 goes down, R1 and R2 should still be able to exchange traffic
• The indirect path through R3 must be used
• Thus, E-BGP and I-BGP must use different conventions with respect to TCP endpoints
Outline
• External BGP (E-BGP)
• Internal BGP (I-BGP)
• Multi-Homing
• Stability Issues
Multi-homing
• With multi-homing, a single network has more than one connection to the Internet.
• Improves reliability and performance:
  • Can accommodate link failure
  • Bandwidth is sum of links to Internet
• Challenges
  • Getting policy right (MED, etc.)
  • Addressing
Multi-homing to a Single Provider: Case 1
[Figure: ISP and Customer, with routers R1 and R2.]
• Easy solution:
  • Use IMUX or Multi-link PPP
• Hard solution:
  • Use BGP
  • Makes assumptions about traffic (same amount of prefixes can be reached from both links)
Multi-homing to a single provider: Case 2
[Figure: ISP and Customer, with routers R1, R2 and R3; customer prefixes 138.39/16 and 204.70/16.]
• If multiple prefixes, may use MED
  • good if traffic load from prefixes is equal
• If single prefix, load may be unequal
  • break down the prefix and advertise different prefixes over different links
Multi-homing to a single provider: Case 3
[Figure: ISP and Customer, with routers R1, R2 and R3; customer prefixes 138.39/16 and 204.70/16.]
• For ISP -> customer traffic, same as before:
  • use MED
  • good if traffic load to prefixes is equal
• For customer -> ISP traffic:
  • R3 alternates links
  • multiple default routes
Multi-homing to a single provider: Case 4
[Figure: ISP and Customer, with routers R1, R2, R3 and R4; customer prefixes 138.39/16 and 204.70/16.]
• Most reliable approach
  • no equipment sharing
• Customer -> ISP: same as case 2
• ISP -> customer: same as case 3
Multi-homing to Multiple Providers
[Figure: ISP1, ISP2, ISP3 and Customer.]
• Major issues:
  • Addressing
  • Aggregation
• Customer address space:
  • Delegated by ISP1
  • Delegated by ISP2
  • Delegated by ISP1 and ISP2
  • Obtained independently
• Advantage and disadvantage?
Address Space from one ISP
[Figure: ISP1 (138.39/16), ISP2, ISP3 and Customer (138.39.1/24).]
• Customer uses address space from one ISP, e.g., ISP1
• ISP1 advertises /16 aggregate
• Customer advertises /24 route to ISP2
• ISP2 relays route to ISP1 and ISP3
• ISP2-3 use /24 route
• ISP1 routes directly
• Problems with traffic load?
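The traffic-load question comes from longest-prefix matching: anyone who hears both the /16 aggregate and the /24 will follow the /24. The sketch below shows this with a toy forwarding table. The table contents mirror the slide, while the function name and the idea of storing a provider name as the next hop are illustrative assumptions.

```python
import ipaddress
from typing import Dict, Optional

Table = Dict[ipaddress.IPv4Network, str]


def longest_match(table: Table, destination: str) -> Optional[str]:
    """Return the next hop of the most specific prefix covering destination."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in table.items() if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# What a router such as ISP3's might hold: the aggregate and the more specific route.
table: Table = {
    ipaddress.ip_network("138.39.0.0/16"): "ISP1",   # ISP1's aggregate
    ipaddress.ip_network("138.39.1.0/24"): "ISP2",   # customer's /24 relayed by ISP2
}

to_customer = longest_match(table, "138.39.1.7")
to_rest_of_isp1 = longest_match(table, "138.39.2.7")
```

Traffic for the customer follows the /24 through ISP2 even though ISP1 also covers the address, so the load on the two customer links can be very uneven.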
Pitfalls
[Figure: ISP1 (138.39/16, aggregated internally as 138.39.0/19), ISP2, ISP3 and Customer (138.39.1/24).]
• ISP1 aggregates to a /19 at border router to reduce internal tables.
• ISP1 still announces /16.
• ISP1 hears /24 from ISP2.
• ISP1 routes packets for customer to ISP2!
• Workaround: ISP1 must inject /24 into I-BGP.
Address Space from Both ISPs
[Figure: ISP1, ISP2, ISP3 and Customer, with customer prefixes 138.39.1/24 and 204.70.1/24.]
• ISP1 and ISP2 continue to announce aggregates
• Load sharing depends on traffic to two prefixes
• Lack of reliability: if ISP1 link goes down, part of customer becomes inaccessible.
• Customer may announce prefixes to both ISPs, but still problems with longest match as in case 1.
Address Space Obtained Independently
[Figure: ISP1, ISP2, ISP3 and Customer.]
• Offers the most control, but at the cost of aggregation.
• Still need to control paths
  • suppose ISP1 large, ISP2-3 small
  • customer advertises long path to ISP1, but local-pref attribute used to override
  • ISP3 learns shorter path from ISP2
Outline
• External BGP (E-BGP)
• Internal BGP (I-BGP)
• Multi-Homing
• Stability Issues
Signs of Routing Instability
• Record of BGP messages at major exchanges
• Discovered orders of magnitude larger than expected updates
  • Bulk were duplicate withdrawals
    • Stateless implementation of BGP – did not keep track of information passed to peers
    • Impact of few implementations
  • Strong frequency (30/60 sec) components
    • Interaction with other local routing/links etc.
Route Flap Storm
• Overloaded routers fail to send Keep_Alive messages and are marked as down
• I-BGP peers find alternate paths
• Overloaded router re-establishes peering session
• Must send large updates
• Increased load causes more routers to fail!
Route Flap Dampening
• Routers now give higher priority to BGP/Keep_Alive to avoid problem
• Associate a penalty with each route
  • Increase when route flaps
  • Exponentially decay penalty with time
• When penalty reaches threshold, suppress route
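The penalty mechanism can be sketched as a small class. All numeric values (penalty of 1000 per flap, 900-second half-life, suppress threshold of 2000) are hypothetical and chosen only to make the arithmetic easy to follow; the class name `FlapDamper` is also invented. Deployed implementations typically add a separate, lower reuse threshold, which this sketch omits.

```python
class FlapDamper:
    """Per-route penalty that grows on each flap and decays exponentially."""

    def __init__(self, penalty_per_flap=1000.0, half_life_s=900.0, suppress_at=2000.0):
        # Hypothetical parameters, not taken from the slides.
        self.penalty_per_flap = penalty_per_flap
        self.half_life_s = half_life_s
        self.suppress_at = suppress_at
        self.penalty = 0.0
        self.last_update = 0.0

    def _decay(self, now: float) -> None:
        # Halve the penalty for every half-life that has elapsed.
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / self.half_life_s)
        self.last_update = now

    def flap(self, now: float) -> None:
        self._decay(now)
        self.penalty += self.penalty_per_flap

    def suppressed(self, now: float) -> bool:
        self._decay(now)
        return self.penalty >= self.suppress_at


damper = FlapDamper()
damper.flap(0.0)
after_one_flap = damper.suppressed(0.0)         # penalty 1000: still advertised
damper.flap(0.0)
after_two_flaps = damper.suppressed(0.0)        # penalty 2000: suppressed
after_half_life = damper.suppressed(900.0)      # decayed to 1000: advertised again
```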
BGP Limitations: Oscillations
[Figure sequence: AS 0, AS 1 and AS 2 are connected to each other and to destination R. Each AS holds a tuple of candidate paths to R; * marks the selected path and - marks an unavailable one. Each step shows the message exchanged (W = withdrawal of R) and the resulting tuples.]
Step  Message  AS 0          AS 1        AS 2
1     (none)   (*R,1R,2R)    (0R,*R,2R)  (0R,1R,*R)
2     W        (-,*1R,2R)    (*0R,-,2R)  (*0R,1R,-)
3     01R      (-,*1R,2R)    (-,-,*2R)   (01R,*1R,-)
4     10R      (-,-,*2R)     (-,-,*2R)   (*01R,10R,-)
5     20R      (-,-,-)       (-,-,*20R)  (*01R,10R,-)
6     12R      (-,*12R,-)    (-,-,*20R)  (*01R,-,-)
7     21R      (-,*12R,21R)  (-,-,-)     (*01R,-,-)
BGP Oscillations
• Can possibly explore every possible path through network: (n-1)! combinations
• Limit between update messages (MinRouteAdver) reduces exploration
  • Forces router to process all outstanding messages
• Typical Internet failover times
  • New/shorter link: 60 seconds
    • Results in simple replacement at nodes
  • Down link: 180 seconds
    • Results in search of possible options
  • Longer link: 120 seconds
    • Results in replacement or search based on length
Problems
• Routing table size
  • Need an entry for all paths to all networks
• Required memory = O((N + M*A) * K)
  • N: number of networks
  • M: mean AS distance (in terms of hops)
  • A: number of AS’s
  • K: number of BGP peers
Routing Table Size
Networks   Mean AS Distance   Number of AS’s   BGP Peers/Net   Memory
2,100      5                  59               3               27,000
4,000      10                 100              6               108,000
10,000     15                 300              10              490,000
100,000    20                 3,000            20              1,040,000
• Problem reduced with CIDR
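The table can be reproduced from the formula (N + M*A) * K once per-entry sizes are fixed. The sketch below assumes 4 bytes per network and 2 bytes per AS number; these sizes are not stated on the slides and were chosen because they reproduce the first three rows (26,970 rounds to the listed 27,000). Under the same assumptions the fourth row evaluates to 10,400,000 rather than the listed 1,040,000, so that entry may be worth checking against the original source.

```python
BYTES_PER_NETWORK = 4   # assumption: one IPv4 prefix per network
BYTES_PER_AS = 2        # assumption: 2-byte AS numbers


def rib_memory_bytes(networks: int, mean_as_distance: int,
                     num_as: int, peers: int) -> int:
    """Evaluate (N + M*A) * K with the assumed per-entry sizes."""
    per_peer = networks * BYTES_PER_NETWORK + mean_as_distance * num_as * BYTES_PER_AS
    return per_peer * peers


rows = [
    (2_100, 5, 59, 3),
    (4_000, 10, 100, 6),
    (10_000, 15, 300, 10),
    (100_000, 20, 3_000, 20),
]
estimates = [rib_memory_bytes(*row) for row in rows]
```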