Date post: | 04-Jan-2016 |
Category: |
Documents |
Upload: | gray-gould |
Global Internet & BGP
Acknowledgement (texts & figs)
• Kurose
• Govindan
• Zahid
• Peterson & Davie
• Kevin
Hierarchies
• What?
– Logical structure overlaid on collections of nodes
• Why?
– Together with information abstraction, the only known solution to scaling issues
Routing Hierarchies
• Flat routing doesn’t scale
– Each node cannot be expected to have routes to every destination (or destination network)
• Key observation
– Need less information with increasing distance to destination
• Two radically different approaches for routing
– The area hierarchy
– The landmark hierarchy
Inter-AS routing
Autonomous systems
• What is an AS?
– A set of routers under a single technical administration, using an interior gateway protocol (IGP) and common metrics to route packets within the AS, and using an exterior gateway protocol (EGP) to route packets to other AS’s.
– Sometimes AS’s use multiple IGPs and metrics, but appear as single AS’s to other AS’s.
Subnetting & CIDR
Global Addresses
• Properties
– IPv4 uses 32 bit address space
– globally unique
– hierarchical: network + host
• Dot Notation
– 10.3.2.4
– 128.96.33.81
– 192.12.69.77
• Assigning authority
– Jon Postel ran IANA until 1998
– Assigned by ICANN
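Class | Leading bits | Remaining fields
A | 0 | Network (7 bits), Host (24 bits)
B | 1 0 | Network (14 bits), Host (16 bits)
C | 1 1 0 | Network (21 bits), Host (8 bits)
D | 1 1 1 0 | Multicast
E | 1 1 1 1 | Experimental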
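As a rough illustration of the class layout above, the sketch below reads the leading bits of an address to decide its class. The function name `address_class` is made up for this example, and the sample addresses are the ones from the dot-notation slide.

```python
import ipaddress


def address_class(dotted: str) -> str:
    """Return the classful category (A to E) from the leading bits of an IPv4 address."""
    first_octet = int(ipaddress.IPv4Address(dotted)) >> 24
    if first_octet & 0b10000000 == 0b00000000:
        return "A"  # leading bit 0
    if first_octet & 0b11000000 == 0b10000000:
        return "B"  # leading bits 10
    if first_octet & 0b11100000 == 0b11000000:
        return "C"  # leading bits 110
    if first_octet & 0b11110000 == 0b11100000:
        return "D"  # leading bits 1110 (multicast)
    return "E"      # leading bits 1111 (experimental)


for addr in ("10.3.2.4", "128.96.33.81", "192.12.69.77"):
    print(addr, "is class", address_class(addr))
```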
How to Make Routing Scale
• Flat (Ethernet) versus Hierarchical (Internet) Addresses
– All hosts attached to same network have same network address
• Problem: inefficient use of Hierarchical Address Space
– class C with 2 hosts (2/255 = 0.78% efficient)
– class B with 256 hosts (256/65535 = 0.39% efficient)
• Problem: still Too Many Networks
– routing tables do not scale
• Big tables make routers expensive
– route propagation protocols do not scale
Today’s Internet
• Consists of ISP’s (Internet Service Providers) who run AS’s (Autonomous Systems)
• All you need to become an ISP is some address space, an AS number and a peer or two
– Easier said than done
• Getting addresses and an AS number is the tricky part
• There are public peering points (MAE East, Central and West)
– NAP’s run by MCI where peering can take place
• Most peering points are private
• Number of connections has been doubling for some time – how do we deal with this kind of scaling?
Subnetting - 1985
• Original intent was for network to identify one physical network
– Lots of small networks are what we actually have – how do we handle this?
• Solution: add another level to address/routing hierarchy: subnet
• Subnet masks define variable partition of host part
– 1’s identify subnet, 0’s identify hosts within the subnet
– Mechanism for sharing a single network number among multiple networks
• Subnets visible only within a site
Class B address: Network number | Host number
Subnet mask (255.255.255.0): 111111111111111111111111 | 00000000
Subnetted address: Network number | Subnet ID | Host ID
Subnetting
• Subnetting is the process of creating multiple segments within a single IP network address space
• From the perspective of a node outside the network, all nodes on any of the subnetworks appear to be on the original single network
• Internet routing tables are not affected by subnetting, i.e., routing tables need not be overloaded with information about routes to all internal subnets, just information to the access router/gateway
Subnetting (Continued)
• Classes A, B and C in IP addressing are designed with two levels of hierarchy (netid & hostid)
• Problem: An organization with a class B address cannot have more than one network, and all 2^16 hosts are attached to that network. A nightmare in managing this network: single broadcast domain, security issues, etc…
• Subnetting creates another level of hierarchy (netid, subnetid and hostid). Delivery of IP packets involves three steps: delivery to the site router, delivery to the subnet router, delivery to the host
Subnet Example
Forwarding table at router R1
Subnet Number | Subnet Mask | Next Hop
128.96.34.0 | 255.255.255.128 | interface 0
128.96.34.128 | 255.255.255.128 | interface 1
128.96.33.0 | 255.255.255.0 | R2
Subnet number 128.96.34.0, subnet mask 255.255.255.128: H1 (128.96.34.15), R1 (128.96.34.1)
Subnet number 128.96.34.128, subnet mask 255.255.255.128: R1 (128.96.34.130), R2 (128.96.34.129), H2 (128.96.34.139)
Subnet number 128.96.33.0, subnet mask 255.255.255.0: R2 (128.96.33.1), H3 (128.96.33.14)
A network with two levels of hierarchy
[Figure: all hosts sit on the single network 141.14.0.0 behind router R, which leads to the Internet. Hosts shown: 141.14.2.20, 141.14.2.21, 141.14.2.105, 141.14.7.96, 141.14.7.95, 141.14.7.44, 141.14.22.64, 141.14.22.8]
A network with three levels of hierarchy
[Figure: the previous network is divided into 3 subnets (141.14.2.0, 141.14.7.0, 141.14.22.0) behind router R, which leads to the Internet. Hosts shown: 141.14.2.20, 141.14.2.21, 141.14.2.105, 141.14.7.96, 141.14.4.45, 141.14.22.9, 141.14.22.64]
Subnet Masking
• Subnetting is achieved by “stealing” some bits from the hostid field to represent the subnet portion of the address
• Those bits that are used for the subnetid are identified through the use of a subnet mask
• Masking is the process of extracting the address of the physical network (if subnetting is not used) or the subnet address (if subnetting is used) from an IP address
• A subnet mask is a 32-bit pattern having a “1” in every netid and subnetid location and a “0” in every hostid location
Subnet masking (Continued)
• Subnet masking is performed (both at the host and at the router) by applying a “bit-wise and” operation between the IP address and the subnet mask
• Example 1: Class B network without subnetting
– 141.14.2.21 is 10001101.00001110.00000010.00010101
– 255.255.0.0 is 11111111.11111111.00000000.00000000
– “Bit-wise and” is 10001101.00001110.00000000.00000000
– The network address is hence 141.14.0.0
Subnet Masking (Continued)
• Example 2: Class B network with subnetting
– 141.14.2.21 is 10001101.00001110.00000010.00010101
– 255.255.255.0 is 11111111.11111111.11111111.00000000
– “Bit-wise and” is 10001101.00001110.00000010.00000000
– The subnet address is hence 141.14.2.0
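The two masking examples can be reproduced with a few lines of code. This is only a sketch: the helper names `apply_mask` and `dotted_binary` are invented for the illustration, and the addresses and masks are the ones used in Examples 1 and 2.

```python
import ipaddress


def apply_mask(address: str, mask: str) -> str:
    """Bit-wise AND of an IPv4 address and a mask, returned in dotted notation."""
    addr_bits = int(ipaddress.IPv4Address(address))
    mask_bits = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(addr_bits & mask_bits))


def dotted_binary(address: str) -> str:
    """Render an IPv4 address as four 8-bit binary groups."""
    return ".".join(f"{octet:08b}" for octet in ipaddress.IPv4Address(address).packed)


print(dotted_binary("141.14.2.21"))
print(apply_mask("141.14.2.21", "255.255.0.0"))    # Example 1: network address
print(apply_mask("141.14.2.21", "255.255.255.0"))  # Example 2: subnet address
```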
Forwarding Algorithm
D = destination IP address
for each entry (SubnetNum, SubnetMask, NextHop)
    D1 = SubnetMask & D
    if D1 = SubnetNum
        if NextHop is an interface
            deliver datagram directly to D
        else
            deliver datagram to NextHop
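A minimal runnable sketch of this forwarding loop, using the forwarding table at router R1 from the subnet example. The names `FORWARDING_TABLE` and `next_hop` and the `"default router"` fallback string are assumptions made for the illustration; a real router would not scan its table linearly.

```python
import ipaddress

# (SubnetNum, SubnetMask, NextHop) entries, copied from the R1 example.
FORWARDING_TABLE = [
    ("128.96.34.0", "255.255.255.128", "interface 0"),
    ("128.96.34.128", "255.255.255.128", "interface 1"),
    ("128.96.33.0", "255.255.255.0", "R2"),
]


def to_int(dotted: str) -> int:
    """Convert dotted notation to a 32-bit integer."""
    return int(ipaddress.IPv4Address(dotted))


def next_hop(destination: str, table=FORWARDING_TABLE, default="default router") -> str:
    """Return the next hop for a destination, or the default if nothing matches."""
    d = to_int(destination)
    for subnet_num, subnet_mask, hop in table:
        d1 = d & to_int(subnet_mask)      # D1 = SubnetMask & D
        if d1 == to_int(subnet_num):      # if D1 = SubnetNum
            return hop
    return default


for dest in ("128.96.34.15", "128.96.34.139", "128.96.33.14", "10.3.2.4"):
    print(dest, "->", next_hop(dest))
```

If the returned next hop is an interface, the datagram is delivered directly to the destination on that interface; otherwise it is handed to the named router.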
Forwarding
• Use a default router if nothing matches
• Not necessary for all 1s in subnet mask to be contiguous
• Can put multiple subnets on one physical network
• Subnets not visible from the rest of the Internet
• This is a simple, toy example!!
Subnets
• Subnetting is not the only way to solve scalability problems
• Additional router support is necessary to include netmask and forwarding functionality
• Non-contiguous netmask numbers can be used
– They make administration more difficult
• Multiple subnets can reside on a single network
– Requires routers within the network
• Subnets help solve scalability problems
– Do not require us to use class B or C address for each physical network
– Help us to aggregate information
• Chief advantage of IP addresses: routers could keep one entry per network instead of one per destination host
Continued Problems with IPv4 Addresses
• Problem:
– Potential exhaustion of IPv4 address space (due to inefficiency)
• Class B network numbers are highly prized
• Lots of class C addresses but no one wants them
– Growth of backbone routing tables
• We don’t want lots of small networks since this causes large routing tables
• Solution:
– Allow addresses assigned to a single entity to span multiple classed prefixes
– Enhance route aggregation
Supernetting
• Assign block of contiguous network numbers to nearby networks
• Called CIDR: Classless Inter-Domain Routing
– Breaks rigid boundaries between address classes
– If ISP needs 16 class C addresses, make them contiguous
• E.g., 192.4.16 to 192.4.31 enables a 20-bit network number
• Represent blocks (number of class C networks) with a single pair (first_network_address, count)
• Restrict block sizes to powers of 2
• Use a bit mask (CIDR mask) to identify block size
• All routers must understand CIDR addressing
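To see why the 16 contiguous class C networks 192.4.16 to 192.4.31 share a 20-bit network number, the sketch below lets Python's standard `ipaddress` module collapse them. The variable names are illustrative only.

```python
import ipaddress

# The 16 contiguous class C networks 192.4.16.0/24 ... 192.4.31.0/24.
class_c_networks = [ipaddress.ip_network(f"192.4.{third}.0/24") for third in range(16, 32)]

# collapse_addresses merges adjacent networks into the smallest covering set.
aggregated = list(ipaddress.collapse_addresses(class_c_networks))

print(len(class_c_networks), "class C networks collapse to", aggregated)
print("CIDR mask:", aggregated[0].netmask)
```

The third octet ranges over 00010000 to 00011111, so only its top four bits are fixed: 16 + 4 = 20 network bits.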
IP addressing: CIDR
• CIDR: Classless InterDomain Routing
– network portion of address of arbitrary length
– address format: a.b.c.d/x, where x is # bits in network portion of address
Example: 200.23.16.0/23
11001000 00010111 0001000 | 0 00000000
network part (23 bits) | host part (9 bits)
CIDR (continued)
• Why?
– Reduce amount of global routing information via aggregation
[Figure: a service provider serving 204.71.0.0, 204.71.1.0, 204.71.2.0, 204.71.3.0 and 204.71.4.0 announces the single prefix 204.71.0.0/16 to the global Internet routing mesh]
CIDR Addresses
• Identifying a CIDR block requires both an address and a mask
– Slash notation
– 128.211.168.0/21 for addresses 128.211.168.0 – 128.211.175.255
• Here the /21 indicates a 21 bit mask
– All possible CIDR masks can easily be generated
• /8, /16, /24 correspond to traditional class A, B, C categories
• IP addresses are now arbitrary integers, not classes
• Raises interesting questions about lookups
– Routers cannot determine the division between prefix and suffix just by looking at the address
• Hashing does not work well
• Interesting lookup algorithms have been developed and analyzed
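The slash notation can be checked with the standard `ipaddress` module. The sketch below expands the /21 block from the slide into its mask and address range, and generates the masks for /8, /16 and /24; the helper name `mask_for` is an assumption for this example.

```python
import ipaddress

block = ipaddress.ip_network("128.211.168.0/21")
print("mask:", block.netmask)
print("range:", block[0], "to", block[-1])
print("addresses in block:", block.num_addresses)


def mask_for(prefix_length: int) -> str:
    """Dotted mask with prefix_length leading 1 bits."""
    return str(ipaddress.ip_network(f"0.0.0.0/{prefix_length}").netmask)


for length in (8, 16, 24):
    print(f"/{length} ->", mask_for(length))
```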
CIDR – A Couple Details
• ISP’s can further subdivide their blocks of addresses using CIDR
• Some prefixes are reserved for private addresses
– 10/8, 172.16/12, 192.168/16, 169.254/16
– These are not routable in the Internet
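Both details can be sketched briefly. The first part splits a block into smaller ones, as an ISP might; the block 200.23.16.0/23 is borrowed from the earlier CIDR slide purely as an example. The second part tests an address against the reserved prefixes listed above. `RESERVED_PREFIXES` and `is_reserved` are names invented for this illustration.

```python
import ipaddress

# Subdividing a block: one /23 becomes two /24 blocks.
isp_block = ipaddress.ip_network("200.23.16.0/23")
customer_blocks = list(isp_block.subnets(new_prefix=24))
print(customer_blocks)

# The reserved prefixes from the slide, written out in full.
RESERVED_PREFIXES = [
    ipaddress.ip_network(prefix)
    for prefix in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "169.254.0.0/16")
]


def is_reserved(address: str) -> bool:
    """True if the address falls inside one of the reserved prefixes."""
    addr = ipaddress.ip_address(address)
    return any(addr in prefix for prefix in RESERVED_PREFIXES)


for candidate in ("10.3.2.4", "172.31.255.1", "172.32.0.1", "128.96.33.81"):
    print(candidate, "reserved" if is_reserved(candidate) else "routable")
```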
Inter-domain routing
BGP
AS Numbers (ASNs)
ASNs are 16 bit values.
64512 through 65535 are “private”
• Genuity: 1
• MIT: 3
• Harvard: 11
• UC San Diego: 7377
• AT&T: 7018, 6341, 5074, …
• UUNET: 701, 702, 284, 12199, …
• Sprint: 1239, 1240, 6211, 6242, …
• …
ASNs represent units of routing policy
Currently over 11,000 in use.
How Many ASNs are there?
[Plot: number of ASNs over time, with the level 64,511 marked and the dates 2005? and 2007? on the time axis. When will we run out of ASNs?]
Thanks to Geoff Huston. http://www.telstra.net/ops on June 23, 2001
Internet inter-AS routing: BGP
• BGP (Border Gateway Protocol): the de facto standard
• Path Vector protocol:
– similar to Distance Vector protocol
– each Border Gateway broadcasts to neighbors (peers) the entire path (i.e., sequence of AS’s) to the destination
– E.g., Gateway X may send its path to dest. Z:
Path (X,Z) = X,Y1,Y2,Y3,…,Z
Internet inter-AS routing: BGP
Suppose: gateway X sends its path to peer gateway W
• W may or may not select path offered by X
– cost, policy (don’t route via competitor’s AS), loop prevention reasons.
• If W selects path advertised by X, then:
Path (W,Z) = W, Path (X,Z)
• Note: X can control incoming traffic by controlling its route advertisements to peers:
– e.g., don’t want to route traffic to Z -> don’t advertise any routes to Z
Internet inter-AS routing: BGP
• BGP messages exchanged using TCP.
• BGP messages:
– OPEN: opens TCP connection to peer and authenticates sender
– UPDATE: advertises new path (or withdraws old)
– KEEPALIVE: keeps connection alive in absence of UPDATES; also ACKs OPEN request
– NOTIFICATION: reports errors in previous msg; also used to close connection
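For a feel of what these messages look like on the wire, here is a small sketch of the common BGP message header. The layout (16-byte all-ones marker, 2-byte length, 1-byte type) and the type codes are taken from RFC 4271, not from these slides; the class and function names are made up for the illustration. A KEEPALIVE is the simplest message because it consists of the header alone.

```python
import struct
from enum import IntEnum


class BgpMessageType(IntEnum):
    """Message type codes as defined in RFC 4271."""
    OPEN = 1
    UPDATE = 2
    NOTIFICATION = 3
    KEEPALIVE = 4


HEADER_FORMAT = "!16sHB"                        # marker, length, type (network byte order)
HEADER_LENGTH = struct.calcsize(HEADER_FORMAT)  # 19 bytes


def build_keepalive() -> bytes:
    """A KEEPALIVE message is just the 19-byte header."""
    return struct.pack(HEADER_FORMAT, b"\xff" * 16, HEADER_LENGTH, BgpMessageType.KEEPALIVE)


def parse_header(data: bytes):
    """Return (total message length, message type) from the first 19 bytes."""
    _marker, length, msg_type = struct.unpack(HEADER_FORMAT, data[:HEADER_LENGTH])
    return length, BgpMessageType(msg_type)


message = build_keepalive()
print(len(message), parse_header(message))
```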
Why different Intra- and Inter-AS routing?
Policy:
• Inter-AS: admin wants control over how its traffic is routed, who routes through its net.
• Intra-AS: single admin, so no policy decisions needed
Scale:
• hierarchical routing saves table size, reduced update traffic
Performance:
• Intra-AS: can focus on performance
• Inter-AS: policy may dominate over performance
Architecture of Dynamic Routing
[Figure: AS 1 and AS 2, running IGPs internally (OSPF, EIGRP) and connected to each other by BGP]
EGP = Exterior Gateway Protocol
IGP = Interior Gateway Protocol
Metric based: OSPF, IS-IS, RIP, EIGRP (cisco)
Policy based: BGP
The Routing Domain of BGP is the entire Internet
Many Routing Processes Can Run on a Single Router
[Figure: one router attached to an OSPF domain, a RIP domain and BGP. The OSPF process, RIP process and BGP process each keep their own routing tables and feed the Forwarding Table Manager, which maintains the Forwarding Table in the OS kernel]
AS categories
– Stub: an AS that has only a single connection to one other AS - carries only local traffic.
– Multihomed: an AS that has connections to more than one AS, but refuses to carry transit traffic
– Transit: an AS that has connections to more than one AS, and carries both transit and local traffic (under certain policy restrictions)
Nontransit vs. Transit ASes
[Figure: NET A attached to both ISP 1 and ISP 2, exchanging IP traffic with each]
• Nontransit AS might be a corporate or campus network. Could be a “content provider”
• Traffic NEVER flows from ISP 1 through NET A to ISP 2 (At least not intentionally!)
• Internet Service providers (often) have transit networks
Selective Transit
[Figure: NET A connected to NET B, NET C and NET D, carrying IP traffic]
• NET A provides transit between NET B and NET C and between NET D and NET C
• NET A DOES NOT provide transit between NET D and NET B
Most transit networks transit in a selective manner…
Choices for global routing
• Link state or distance vector?
– no universal metric - policy decisions
• Problems with distance-vector:
– Bellman-Ford algorithm may not converge
• Problems with link state:
– metric used by routers not the same - loops
– LS database too large - entire Internet
– may expose policies to other AS’s
Solution: Path Vectors
• Each routing update carries the entire path
• Loops are detected as follows:
– when AS gets route check if AS already in path
– if yes, reject route
– if no, add self and advertise route further
• Advantage:
– metrics are local - AS chooses path, protocol ensures no loops
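The loop check is short enough to write out. The sketch below follows the three steps listed above; `receive_route` is a hypothetical name, and the AS numbers are borrowed from the ASPATH and loop-prevention examples in these slides. It ignores everything else a real BGP speaker does (policy, route selection, attributes).

```python
def receive_route(local_as, as_path):
    """Apply the path-vector loop check.

    Returns the path to advertise further (self added in front),
    or None if the local AS is already in the path and the route is rejected.
    """
    if local_as in as_path:
        return None
    return [local_as] + list(as_path)


# 135.207.0.0/16 is originated by AS 6341 and passes through four more AS's.
path = [6341]
for asn in (7018, 1239, 1755, 1129):
    path = receive_route(asn, path)
    print("AS Path =", " ".join(str(hop) for hop in path))

# AS 7018 is offered a path that already contains 7018: reject it.
print(receive_route(7018, [1, 333, 7018, 877]))
```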
ASPATH Attribute
Prefix originated: 135.207.0.0/16 by AS 6341 (AT&T Research)
AS’s shown: AS 7018 (AT&T), AS 1239 (Sprint), AS 1755 (Ebone), AS 3549 (Global Crossing), AS 1129 (Global Access), AS 12654 (RIPE NCC RIS project)
Announcements of 135.207.0.0/16 as the route propagates:
– AS Path = 6341
– AS Path = 7018 6341
– AS Path = 3549 7018 6341
– AS Path = 1239 7018 6341
– AS Path = 1755 1239 7018 6341
– AS Path = 1129 1755 1239 7018 6341
Interdomain Loop Prevention
BGP at AS YYY will never accept a route with ASPATH containing YYY.
[Figure: AS 1 offers AS 7018 the route 12.22.0.0/16 with ASPATH = 1 333 7018 877. Don’t Accept!]
Problems
• Routing table size
– need an entry for all paths to all networks
• Required memory = O((N + M*A) * K)
– N: number of networks
– M: mean AS distance
– A: number of AS’s
– K: number of BGP peers
• Problem reduced with CIDR
Routing information bases (RIB)
• Routes are stored in RIBs
• Adj-RIBs-In: routing info that has been learned from other routers (unprocessed routing info)
• Loc-RIB: local routing information selected from Adj-RIBs-In (routes selected locally)
• Adj-RIBs-Out: info to be advertised to peers (routes to be advertised)
Conceptual Mode of Operation
• RIB = Routing information base
[Figure: Adj-RIB-In (per BGP neighbor) -> Loc-RIB -> Adj-RIB-Out (per BGP neighbor)]
Routing table size
networks (NLRI) | mean AS distance | # of AS’s | BGP peers/net | memory
2,100 | 5 | 59 | 3 | 27,000
4,000 | 10 | 100 | 6 | 108,000
10,000 | 15 | 300 | 10 | 490,000
100,000 | 20 | 3,000 | 20 | 1,040,000
Policy with BGP
• BGP provides capability for enforcing various policies
• Policies are not part of BGP: they are provided to BGP as configuration information
• BGP enforces policies by choosing paths from multiple alternatives and controlling advertisement to other AS’s
So Many Choices
Which route should Frank pick to 13.13.0.0/16?
[Figure: Frank’s Internet Barn with AS 1, AS 2, AS 3 and AS 4; 13.13.0.0/16 is the destination; links are labeled peer, peer, customer and provider]
Implementing Backup Links with Local Preference (Outbound Traffic)
Forces outbound traffic to take primary link, unless link is down.
[Figure: AS 65000 connects to AS 1 over a primary link and a backup link]
• Primary link: Set Local Pref = 100 for all routes from AS 1
• Backup link: Set Local Pref = 50 for all routes from AS 1
We’ll talk about inbound traffic soon …
Back to Frank …
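[Figure: the same topology as before (AS 1, AS 2, AS 3, AS 4, 13.13.0.0/16; links labeled peer, peer, customer and provider), with local pref = 80, local pref = 100 and local pref = 90 assigned to the candidate routes]
Higher Local preference values are more preferred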
Local preference only used in iBGP
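The backup-link policy can be sketched as a "highest local preference wins" selection. This is a simplification: the real BGP decision process has further tie-breakers that are not modeled here, and the `Route` type, `select_best` function and the 0.0.0.0/0 prefix are all assumptions made for the illustration. The values 100 and 50 are the ones from the backup-link slide.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Route:
    prefix: str
    via: str          # which link the route was learned over
    local_pref: int   # higher values are more preferred


def select_best(routes: List[Route]) -> Optional[Route]:
    """Pick the route with the highest local preference, if any route exists."""
    if not routes:
        return None
    return max(routes, key=lambda route: route.local_pref)


# Routes from AS 1, learned over both links of AS 65000.
candidates = [
    Route("0.0.0.0/0", via="primary link", local_pref=100),
    Route("0.0.0.0/0", via="backup link", local_pref=50),
]
print(select_best(candidates).via)

# If the primary link goes down, its route disappears and the backup takes over.
after_failure = [route for route in candidates if route.via != "primary link"]
print(select_best(after_failure).via)
```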
CIDR and BGP
[Figure: AS T (provider, 197.8.0.0/23) is connected to AS X (197.8.2.0/24), AS Y (197.8.3.0/24) and AS Z]
What should T announce to Z?
Options
• Advertise all paths:
– Path 1: through T can reach 197.8.0.0/23
– Path 2: through T can reach 197.8.2.0/24
– Path 3: through T can reach 197.8.3.0/24
• But this does not reduce routing tables! We would like to advertise:
– Path 1: through T can reach 197.8.0.0/22
Sets and Sequences
• Problem: what do we list in the route?
– list T: omitting information - not acceptable, may lead to loops
– list T, X, Y: misleading, appears as 3-hop path
• Solution: restructure AS Path attribute as:
– Path: (Sequence (T), Set (X, Y))
– if Z wants to advertise path:
• Path: (Sequence (Z, T), Set (X, Y))
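A small sketch of the aggregation that T performs, under simplifying assumptions: AS's are represented by the letters used in the slides, and `AsPath` and `aggregate` are hypothetical names, not a real BGP API. The three prefixes collapse into one /22, while X and Y move into an unordered set so that loop detection still sees them.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class AsPath:
    sequence: tuple                  # ordered AS's the route has passed through
    as_set: frozenset = frozenset()  # unordered AS's folded in by aggregation

    def contains(self, asn) -> bool:
        """Loop check must look at both the sequence and the set."""
        return asn in self.sequence or asn in self.as_set

    def prepend(self, asn) -> "AsPath":
        """Add an AS at the front of the sequence; the set is unchanged."""
        return AsPath((asn,) + self.sequence, self.as_set)


def aggregate(aggregator, routes):
    """routes is a list of (prefix, originating AS) pairs known to the aggregator."""
    prefixes = [ipaddress.ip_network(prefix) for prefix, _ in routes]
    collapsed = list(ipaddress.collapse_addresses(prefixes))
    hidden = frozenset(origin for _, origin in routes if origin != aggregator)
    return collapsed, AsPath((aggregator,), hidden)


routes_at_t = [("197.8.0.0/23", "T"), ("197.8.2.0/24", "X"), ("197.8.3.0/24", "Y")]
prefixes, path_from_t = aggregate("T", routes_at_t)
path_from_z = path_from_t.prepend("Z")

print(prefixes)
print(path_from_z.sequence, sorted(path_from_z.as_set))
print(path_from_z.contains("X"))  # X would still detect the loop
```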
Routing Areas
• Address areas hierarchically
– sequentially number top-level areas
– sub-areas of area are labeled relative to that area
– nodes are numbered relative to the smallest containing area
• nodes can have multiple addresses
Routing
• Within area
– each node has routes to every other node
• Outside area
– each node has routes for other top-level areas only
– inter-area packets are routed to nearest border router
• Can result in sub-optimal paths
Path Suboptimality
[Figure: top-level areas 1, 2 and 3 containing nodes 1.1, 1.2, 2.1, 2.2, 2.2.1, 3.1 and 3.2; a 3 hop red path vs. a 2 hop green path]
In fairness: could you do this “right” and still scale?
Exporting internal state would dramatically increase global instability and amount of routing state
Shorter Doesn’t Always Mean Shorter
[Figure: AS 1, AS 2, AS 3 and AS 4]
Mr. BGP says that path 4 1 is better than path 3 2 1
Duh!
BGP limitations
Delayed Internet Routing Convergence
BGP Limitations: Policy
[Figure: topology of nodes A, B, C, D, E and F]
BGP Limitations: Oscillations
[Figure: topology of nodes A, B, C and D]
Daily Update Count
What is the Sound of One Route Flapping?
Implementation Does Matter!
Thanks to Abha Ahuja and Craig Labovitz for this plot.
stateless withdraws widely deployed
stateful withdraws widely deployed
A Few Bad Apples …
Thanks to Madanlal Musuvathi for this plot. Data source: RIPE NCC
Typically, 80% of the updates are for less than 5% of the prefixes.
Most prefixes are stable most of the time. On this day, about 83% of the prefixes were not updated.
Percent of BGP table prefixes
30 Second Bursts
How Long Does BGP Take to Adapt to Changes?
[Plot: cumulative percentage of events (0 to 100) vs. seconds until convergence (0 to 160), with curves Tup, Tshort, Tlong and Tdown]
Thanks to Abha Ahuja and Craig Labovitz for this plot.