Network Layer 4-1
Chapter 4: Network Layer
Computer Networking: A Top Down Approach Featuring the Internet, 3rd edition. Jim Kurose, Keith Ross. Addison-Wesley, July 2004.
A note on the use of these ppt slides:
We’re making these slides freely available to all (faculty, students, readers). They’re in PowerPoint form so you can add, modify, and delete slides (including this one) and slide content to suit your needs. They obviously represent a lot of work on our part. In return for use, we only ask the following:
• If you use these slides (e.g., in a class) in substantially unaltered form, that you mention their source (after all, we’d like people to use our book!)
• If you post any slides in substantially unaltered form on a www site, that you note that they are adapted from (or perhaps identical to) our slides, and note our copyright of this material.
Thanks and enjoy! JFK/KWR
All material copyright 1996-2005 J.F. Kurose and K.W. Ross, All Rights Reserved
Network Layer 4-2
Chapter 4: Network Layer
Chapter goals:
• understand principles behind network layer services:
  – network layer service models
  – forwarding versus routing
  – how a router works
  – routing (path selection)
  – dealing with scale
  – advanced topics: IPv6, mobility
• instantiation, implementation in the Internet
Network Layer 4-3
Chapter 4: Network Layer
• 4.1 Introduction
• 4.2 Virtual circuit and datagram networks
• 4.3 What’s inside a router
• 4.4 IP: Internet Protocol
  – Datagram format
  – IPv4 addressing
  – ICMP
  – IPv6
• 4.5 Routing algorithms
  – Link state
  – Distance Vector
  – Hierarchical routing
• 4.6 Routing in the Internet
  – RIP
  – OSPF
  – BGP
• 4.7 Broadcast and multicast routing
Network Layer 4-4
Network layer
• transport segment from sending to receiving host
• on sending side, encapsulates segments into datagrams
• on receiving side, delivers segments to transport layer
• network layer protocols in every host, router
• router examines header fields in all IP datagrams passing through it
[Figure: the two end hosts run the full stack (application, transport, network, data link, physical); the routers between them run only network, data link, physical]
Network Layer 4-5
Network layer functions
• Transport packet from sending to receiving hosts
• Network layer protocols in every host, router
  – Addressing
    • flat vs. hierarchical
      – Routing table size?
    • global vs. local
      – NAT
    • variable vs. fixed length
      – processing cost
      – Header size
      – Address flexibility
  – Delivery semantics:
    • Unicast, multicast (IPv4)
    • Anycast (IPv6)
    • Broadcast
    • In-order (ATM)
    • Any-order (IP)
Network Layer 4-6
Network layer functions
• Transport packet from sending to receiving hosts
• Network layer protocols in every host, router
  – Security
    • secrecy, integrity, authenticity
  – Fragmentation
    • break up packets based on data-link layer properties
  – Quality-of-service
    • provide predictable performance
  – Routing
    • path selection and packet forwarding
  – Demux to upper layer
    • next protocol
    • Can be either transport or network (tunneling)
  – Connection setup
    • ATM, X.25, Frame Relay
    • Host-to-host network layer connection vs. process-to-process transport layer
Network Layer 4-7
Network service model
Combining the functions into a particular network
Q: What service model for “channel” transporting datagrams from sender to receiver?
Example services for individual datagrams:
• Guaranteed delivery
• Guaranteed delivery with less than 40 msec delay
Example services for a flow of datagrams:
• In-order datagram delivery
• Guaranteed minimum bandwidth to flow
• Restrictions on changes in inter-packet spacing (jitter)
Network Layer 4-8
Network layer service models:

                            Guarantees?
Network       Service       Bandwidth           Loss  Order  Timing  Congestion feedback
Architecture  Model
Internet      best effort   none                no    no     no      no (inferred via loss)
ATM           CBR           constant rate       yes   yes    yes     no congestion
ATM           VBR           guaranteed rate     yes   yes    yes     no congestion
ATM           ABR           guaranteed minimum  no    yes    no      yes
ATM           UBR           none                no    yes    no      no
Network Layer 4-10
Network layer connection and connection-less service
• Datagram network provides network-layer connectionless service
• VC network provides network-layer connection service
• Analogous to the transport-layer services, but:
  – Service: host-to-host
  – No choice: network provides one or the other
  – Implementation: in the core
Network Layer 4-11
Connection-oriented virtual circuits
• Phone circuit abstraction (ATM, phone network)
  – Model
    • call setup and signaling for each call before data can flow
    • guaranteed performance during call
    • call teardown and signaling to remove call
  – Network support
    • each packet carries circuit identifier (not destination host ID)
    • every router on source-dest path maintains “state” for each passing circuit
    • link, router resources (bandwidth, buffers) allocated to VC to guarantee circuit-like performance
[Figure: call setup between two end hosts: 1. Initiate call, 2. Incoming call, 3. Accept call, 4. Call connected, 5. Data flow begins, 6. Receive data]
Network Layer 4-12
Connectionless datagram service
• Postal service abstraction (Internet)
  – Model
    • no call setup or teardown at network layer
    • no service guarantees
  – Network support
    • no state within network on end-to-end connections
    • packets forwarded based on destination host ID
    • packets between same source-dest pair may take different paths
[Figure: two end hosts exchanging datagrams with no setup: 1. Send data, 2. Receive data]
Network Layer 4-13
Datagram or VC network: why?
Internet
• data exchange among computers
  – “elastic” service, no strict timing req.
• “smart” end systems (computers)
  – can adapt, perform control, error recovery
  – simple inside network, complexity at “edge”
• many link types
  – different characteristics
  – uniform service difficult
ATM
• evolved from telephony
• human conversation:
  – strict timing, reliability requirements
  – need for guaranteed service
• “dumb” end systems
  – telephones
  – complexity inside network
Network Layer 4-14
Best of both worlds?
• Adding circuits to the Internet
  – Intserv, Diffserv (at the end of course if time permits)
  – Chapter 6 in book
• Support both modes from the start?
  – ATM
Network Layer 4-16
The Internet Network layer
Host, router network layer functions:
Transport layer: TCP, UDP
Network layer:
• Routing protocols
  – path selection
  – RIP, OSPF, BGP
• forwarding table
• IP protocol
  – addressing conventions
  – datagram format
  – packet handling conventions
• ICMP protocol
  – error reporting
  – router “signaling”
Link layer
Physical layer
Network Layer 4-18
How is IP Design Standardized?
• IETF
  – Voluntary organization
  – Meeting every 4 months
  – Working groups and email discussions
• “We reject kings, presidents, and voting; we believe in rough consensus and running code” (Dave Clark 1992)
  – Need 2 independent, interoperable implementations for standard
• IRTF
  – End2End
  – Reliable Multicast, etc.
Network Layer 4-19
IP datagram format
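[Figure: IPv4 datagram layout, 32 bits per row]
Row 1: ver | head. len | type of service | length
Row 2: 16-bit identifier | flgs | fragment offset
Row 3: time to live | upper layer | Internet checksum
Row 4: 32 bit source IP address
Row 5: 32 bit destination IP address
Row 6: Options (if any)
Then:  data (variable length, typically a TCP or UDP segment)

Field notes:
• ver: IP protocol version number
• head. len: header length (stored as a count of 32-bit words; 20 bytes when there are no options)
• type of service: “type” of data
• length: total datagram length (bytes)
• 16-bit identifier, flgs, fragment offset: for fragmentation/reassembly
• time to live: max number remaining hops (decremented at each router)
• upper layer: upper layer protocol to deliver payload to
• Options: e.g. timestamp, record route taken, specify list of routers to visit

How much overhead with TCP?
• 20 bytes of TCP
• 20 bytes of IP
• = 40 bytes + app layer overhead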
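As a rough illustration of the header layout above, the following sketch decodes the fixed 20-byte part of an IPv4 header with Python's standard `struct` module. The function name `parse_ipv4_header`, the dictionary keys, and the sample header values are assumptions made for this example; they do not come from the slides.

```python
import socket
import struct

def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (options are ignored)."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,                      # high 4 bits
        "header_len_bytes": (ver_ihl & 0x0F) * 4,     # field counts 32-bit words
        "tos": tos,
        "total_length": total_len,                    # header + payload, in bytes
        "identifier": ident,
        "dont_fragment": bool(flags_frag & 0x4000),
        "more_fragments": bool(flags_frag & 0x2000),
        "fragment_offset_bytes": (flags_frag & 0x1FFF) * 8,  # 13-bit field, 8-byte units
        "ttl": ttl,
        "protocol": proto,                            # e.g. 6 = TCP, 17 = UDP
        "checksum": checksum,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }

# Hypothetical sample: a 40-byte TCP datagram with the Don't Fragment bit set.
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0x1234, 0x4000, 64, 6, 0,
                     socket.inet_aton("223.1.1.1"), socket.inet_aton("223.1.2.2"))
hdr = parse_ipv4_header(sample)
```

The 40-byte total length in the sample corresponds to the "20 bytes of TCP + 20 bytes of IP" overhead noted on the slide, with no application data.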
Network Layer 4-20
IP header
• Version
  – Currently at 4, next version 6
• Header length
  – Length of header (20 bytes plus options)
• Type of Service
  – Typically ignored
  – Values
    • 3 bits of precedence
    • 1 bit of delay requirements
    • 1 bit of throughput requirements
    • 1 bit of reliability requirements
  – Replaced by DiffServ and ECN
• Length
  – Total length in bytes of the IP datagram (or fragment), header plus payload
Network Layer 4-21
IP header (cont)
• Identification
  – To match up with other fragments
• Flags
  – Don’t fragment flag
  – More fragments flag
• Fragment offset
  – Where this fragment lies in entire IP datagram
  – Measured in 8 octet units (13 bit field)
Network Layer 4-22
IP header (cont)
• Time to live
  – Ensure packets exit the network
• Protocol
  – Demultiplexing to higher layer protocols
• Header checksum
  – Ensures some degree of header integrity
  – Relatively weak: 16 bit
• Source IP, Destination IP (32 bit addresses)
• Options
  – E.g. Source routing, record route, etc.
  – Performance issues
    • Poorly supported
Network Layer 4-23
IP quality of service
• IP originally had “type-of-service” (TOS) field to eventually support quality
  – Not used, ignored by most routers
• Then came int-serv (integrated services) and RSVP signalling
  – Per-flow quality of service through end-to-end support
    • Setup and match flows on connection ID
    • Per-flow signaling
    • Per-flow network resource allocation (*FQ, *RR scheduling algorithms)
Network Layer 4-24
IP quality of service
• RSVP
  – http://www.rfc-editor.org/rfc/rfc2205.txt
  – Provides end-to-end signaling to network elements
  – General purpose protocol for signaling information
  – Not used now on a per-flow basis to support int-serv, but being reused for diff-serv.
• int-serv
  – Defines service model (guaranteed, controlled-load)
    • http://www.rfc-editor.org/rfc/rfc2210.txt
    • http://www.rfc-editor.org/rfc/rfc2211.txt
    • http://www.rfc-editor.org/rfc/rfc2212.txt
  – Dozens of scheduling algorithms to support these services
    • WFQ, W2FQ, STFQ, Virtual Clock, DRR, etc.
    • If this class was being given 5 years ago….
Network Layer 4-25
IP quality of service
• Why did RSVP, int-serv fail?
  – Complexity
    • Scheduling
    • Routing
    • Per-flow signaling overhead
  – Lack of scalability
    • Per-flow state
    • Route pinning
  – Economics
    • Providers with no incentive to deploy
    • SLA, end-to-end billing issues
  – QoS a weak-link property
    • Requires every device on an end-to-end basis to support flow
Network Layer 4-26
IP quality of service
• Now it’s diff-serv…
  – Use the “type-of-service” bits as a priority marking
  – http://www.rfc-editor.org/rfc/rfc2474.txt
  – http://www.rfc-editor.org/rfc/rfc2475.txt
  – http://www.rfc-editor.org/rfc/rfc2597.txt
  – http://www.rfc-editor.org/rfc/rfc2598.txt
  – Core network relatively stateless
  – AF
    • Assured forwarding (drop precedence)
  – EF
    • Expedited forwarding (strict priority handling)
Network Layer 4-27
IP Fragmentation & Reassembly
• network links have MTU (max. transfer size): largest possible link-level frame
  – different link types, different MTUs
• large IP datagram (can be 64KB) “fragmented” within network
  – one datagram becomes several datagrams
  – IP header on each fragment
  – Bits used to identify, order fragments
[Figure: fragmentation: in: one large datagram, out: 3 smaller datagrams; reassembly]
Network Layer 4-28
IP Fragmentation & Reassembly
• Where to do reassembly?
  – End nodes
    • avoids unnecessary work
  – Dangerous to do at intermediate nodes
    • Buffer space
    • Must assume single path through network
    • May be re-fragmented later on in the route again
Network Layer 4-29
IP Fragmentation and Reassembly
Example
• 4000 byte datagram
• MTU = 1500 bytes
One large datagram becomes several smaller datagrams:

            length  ID  fragflag  offset
original    4000    x   0         0
fragment 1  1500    x   1         0
fragment 2  1500    x   1         185
fragment 3  1040    x   0         370

• 1480 bytes in data field of each full-size fragment (1500 minus the 20-byte header)
• offset = 1480/8 = 185
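The arithmetic in this example can be reproduced with a short sketch. The function name `fragment` and its return shape are assumptions for illustration; the sketch also assumes a 20-byte header with no options on every fragment.

```python
def fragment(total_length: int, mtu: int, header_len: int = 20):
    """Return one (offset_in_8_byte_units, more_fragments_flag, length) tuple per fragment."""
    payload = total_length - header_len
    # Data carried by every fragment except the last must be a multiple of 8 bytes.
    max_data = (mtu - header_len) // 8 * 8
    fragments = []
    sent = 0
    while sent < payload:
        data = min(max_data, payload - sent)
        more = 1 if sent + data < payload else 0
        fragments.append((sent // 8, more, data + header_len))
        sent += data
    return fragments

frags = fragment(4000, 1500)
for offset, flag, length in frags:
    print(f"offset={offset} fragflag={flag} length={length}")
```

For the 4000-byte datagram and 1500-byte MTU this yields offsets 0, 185, 370 and lengths 1500, 1500, 1040, matching the slide: 3980 payload bytes split as 1480 + 1480 + 1020, each with its own 20-byte header.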
Network Layer 4-30
Fragmentation is Harmful
• Uses resources poorly
  – Forwarding costs per packet
  – Best if we can send large chunks of data
  – Worst case: packet just bigger than MTU
• Poor end-to-end performance
  – Loss of a fragment makes other fragments useless
• Reassembly is hard
  – Buffering constraints
Network Layer 4-31
Fragmentation
• References
  – Colleen Shannon, David Moore, and k claffy (CAIDA, UC San Diego), “Characteristics of Fragmented IP Traffic on Internet Links,” ACM SIGCOMM Internet Measurement Workshop 2001. http://www.aciri.org/vern/sigcomm-imeas-2001.program.html
  – C. A. Kent and J. C. Mogul, “Fragmentation considered harmful,” in Proceedings of the ACM Workshop on Frontiers in Computer Communications Technology, pp. 390-401, Aug. 1988. http://www.research.compaq.com/wrl/techreports/abstracts/87.3.html
Network Layer 4-32
Fragmentation
• Path MTU Discovery
  – Remove fragmentation from the network
  – Mandatory in IPv6
    • Network layer does no fragmentation
  – Hosts dynamically discover minimum MTU of path
    • http://www.rfc-editor.org/rfc/rfc1191.txt
    • Algorithm:
      – Initialize MTU to MTU for first hop
      – Send datagrams with Don’t Fragment bit set
      – If ICMP “pkt too big” msg, decrease MTU
    • What happens if path changes?
      – Periodically (>5mins, or >1min after previous increase), increase MTU
    • Some routers will return proper MTU
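The discovery loop above can be simulated in a few lines. This is only a sketch under stated assumptions: the path is modeled as a hypothetical list of per-link MTUs, and every router is assumed to report the MTU of the link that rejected the datagram (the "some routers will return proper MTU" case). A host talking to routers that do not report it would have to guess a smaller value instead.

```python
def discover_path_mtu(link_mtus):
    """Simulate a sender probing a path with the Don't Fragment bit set.

    link_mtus is a hypothetical list of MTUs for the links along the path,
    first hop first.
    """
    mtu = link_mtus[0]  # initialize to the MTU of the first hop
    while True:
        # First link along the path that cannot carry a datagram of this size.
        too_small = next((m for m in link_mtus if m < mtu), None)
        if too_small is None:
            return mtu      # datagram reached the destination unfragmented
        mtu = too_small     # ICMP error reports the smaller MTU; retry with it

path_mtu = discover_path_mtu([1500, 1400, 576, 1500])
```

Each iteration stands for one probe that is dropped at the first too-small link, so the estimate only ever decreases until it equals the minimum MTU on the path.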
Network Layer 4-33
IP demux to upper layer
• http://www.rfc-editor.org/rfc/rfc1700.txt
  – Protocol type field
    • 1 = ICMP
    • 2 = IGMP
    • 3 = GGP
    • 4 = IP in IP
    • 6 = TCP
    • 8 = EGP
    • 9 = IGP
    • 17 = UDP
    • 29 = ISO-TP4
    • 80 = ISO-IP
    • 88 = IGRP
    • 89 = OSPFIGP
    • 94 = IPIP http://www.rfc-editor.org/rfc/rfc2003.txt
Network Layer 4-34
IP error detection
• IP checksum
  – IP has a header checksum, leaves data integrity to TCP/UDP
  – Catch errors within router or bridge that are not detected by link layer
  – Incrementally updated as routers change fields
  – http://www.rfc-editor.org/rfc/rfc1141.txt
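A minimal sketch of the header checksum computation follows: the 16-bit one's-complement sum of the header words, complemented. The function name `internet_checksum` and the sample header bytes are assumptions for illustration. The sketch recomputes the checksum from scratch and does not show the incremental update described in RFC 1141.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over data (checksum field set to zero)."""
    if len(data) % 2:
        data += b"\x00"                               # pad to a whole 16-bit word
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]         # big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)      # fold carries back in
    return ~total & 0xFFFF

# Hypothetical 20-byte header with the checksum field (bytes 10-11) zeroed.
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
csum = internet_checksum(header)

# A receiver checks integrity by summing the header with the checksum in place.
filled = header[:10] + csum.to_bytes(2, "big") + header[12:]
verified = internet_checksum(filled) == 0
```

The receiver-side check works because adding a value to its own one's complement gives all ones, whose complement is zero.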
Network Layer 4-35
IP delivery semantics
• The waist of the hourglass
  – Unreliable datagram service
  – Out-of-order delivery possible
  – Compare to ATM and phone network…
• Unicast mostly
  – IP broadcast not forwarded
  – IP multicast supported, but not widely used
Network Layer 4-36
IP security
• IP originally had no provisions for security
• IPsec
  – Retrofit IP network layer with encryption and authentication
  – http://www.rfc-editor.org/rfc/rfc2411.txt
Network Layer 4-38
IP Addressing
• IP address: fixed-length, 32-bit identifier for host, router interface
  – semantics getting fuzzy, though (more later)
• interface: connection between host, router and physical link
  – routers typically have multiple interfaces
  – host may have multiple interfaces
  – IP addresses associated with interface, not host, router
[Figure: three LANs joined by one router; interface addresses 223.1.1.1, 223.1.1.2, 223.1.1.3, 223.1.1.4, 223.1.2.9, 223.1.2.2, 223.1.2.1, 223.1.3.2, 223.1.3.1, 223.1.3.27]
223.1.1.1 = 11011111 00000001 00000001 00000001
            (223)    (1)      (1)      (1)
Network Layer 4-39
IP Addressing
• IP address:
  – network part (high order bits)
  – host part (low order bits)
• What’s a network?
  – all device interfaces with same network part of IP address
  – all interfaces that can physically reach each other without intervening router
[Figure: network consisting of 3 IP networks (LANs), with interfaces 223.1.1.1 to 223.1.1.4, 223.1.2.1, 223.1.2.2, 223.1.2.9, 223.1.3.1, 223.1.3.2, 223.1.3.27; for IP addresses starting with 223, first 24 bits are network address]
Network Layer 4-40
Subnets
[Figure: subnets 223.1.1.0/24, 223.1.2.0/24, 223.1.3.0/24]
How to find the networks (subnets)?
• Detach each interface from router, host
• Create “islands” of isolated networks
• Each isolated network is called a subnet
Subnet mask: /24
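To make the /24 mask concrete, this sketch uses Python's standard `ipaddress` module to decide which of the three subnets an interface address belongs to. The helper name `subnet_of` is an assumption for illustration.

```python
import ipaddress

# The three /24 subnets named on the slide.
subnets = [ipaddress.ip_network(s) for s in
           ("223.1.1.0/24", "223.1.2.0/24", "223.1.3.0/24")]

def subnet_of(addr: str):
    """Return the subnet containing addr, or None if no listed subnet matches."""
    ip = ipaddress.ip_address(addr)
    for net in subnets:
        if ip in net:       # true when the first 24 bits of addr equal the prefix
            return net
    return None

mask = subnets[0].netmask   # dotted-decimal form of /24
home = subnet_of("223.1.2.9")
```

Membership is decided by the network part alone, which is why two interfaces with the same first 24 bits are treated as directly reachable from each other.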
Network Layer 4-41
Subnets
How many?
[Figure: routers and hosts with interface addresses 223.1.1.1, 223.1.1.2, 223.1.1.3, 223.1.1.4, 223.1.2.1, 223.1.2.2, 223.1.2.6, 223.1.3.1, 223.1.3.2, 223.1.3.27, 223.1.7.0, 223.1.7.1, 223.1.8.0, 223.1.8.1, 223.1.9.1, 223.1.9.2]
Network Layer 4-42
Classful IP Addressing (1981)
• Total IP address size: 4 billion
  – Initially one large class (8-bit network, 24-bit host)
  – Classful addressing for smaller networks (LANs)
    • Class A: 128 networks, 16M hosts
    • Class B: 16K networks, 64K hosts
    • Class C: 2M networks, 256 hosts

Class  High Order Bits  Format
A      0                7 bits of net, 24 bits of host
B      10               14 bits of net, 16 bits of host
C      110              21 bits of net, 8 bits of host
Network Layer 4-43
IP address classes

Class  Leading bits  Layout (32 bits)                    Address range
A      0             Network ID 8 bits, Host ID 24 bits  1.0.0.0 to 127.255.255.255
B      10            Network ID 16 bits, Host ID 16 bits 128.0.0.0 to 191.255.255.255
C      110           Network ID 24 bits, Host ID 8 bits  192.0.0.0 to 223.255.255.255
D      1110          Multicast Addresses                 224.0.0.0 to 239.255.255.255
E      1111          Reserved for experiments
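Since the class is determined entirely by the leading bits of the first octet, a classful lookup can be sketched as a few mask tests. The function name `address_class` is an assumption for illustration.

```python
def address_class(addr: str) -> str:
    """Return the classful address class (A to E) from the leading bits."""
    first = int(addr.split(".")[0])
    if first & 0b10000000 == 0b00000000:   # leading bit  0
        return "A"
    if first & 0b11000000 == 0b10000000:   # leading bits 10
        return "B"
    if first & 0b11100000 == 0b11000000:   # leading bits 110
        return "C"
    if first & 0b11110000 == 0b11100000:   # leading bits 1110
        return "D"
    return "E"                             # leading bits 1111

classes = [address_class(a) for a in
           ("10.0.0.1", "131.252.120.50", "223.1.1.1", "224.0.0.1")]
```

This is what made early route lookup simple: the class fixes the network ID length (8, 16, or 24 bits) before any table is consulted.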
Network Layer 4-44
Special IP Addresses
• Private addresses
  – http://www.rfc-editor.org/rfc/rfc1918.txt
  – Class A: 10.0.0.0 - 10.255.255.255 (10/8 prefix)
  – Class B: 172.16.0.0 - 172.31.255.255 (172.16/12 prefix)
  – Class C: 192.168.0.0 - 192.168.255.255 (192.168/16 prefix)
• 127.0.0.1: local host (a.k.a. the loopback address)
• 255.255.255.255
  – IP broadcast to local hardware that must not be forwarded
  – http://www.rfc-editor.org/rfc/rfc919.txt
  – Same as network broadcast if no subnetting
    • IP of network broadcast = NetworkID + (all 1’s for HostID)
• 0.0.0.0
  – IP address of unassigned host (BOOTP, ARP, DHCP)
  – Default route advertisement
Network Layer 4-45
IP Addressing Problem #1 (1984)
• Inefficient use of address space
  – Class A (rarely given out, not many of them given out by IANA)
  – Class B = 64k hosts
    • Very few LANs have close to 64K hosts
    • Electrical/LAN limitations, performance or administrative reasons
    • e.g., class B net allocated enough addresses for 64K hosts, even if only 2K hosts in that network
  – Need simple/address-efficient way to get multiple “networks”
    • Reduce the total number of addresses that are assigned, but not used
• Subnet addressing
  – http://www.rfc-editor.org/rfc/rfc917.txt
  – Split up single large network address ranges into multiple smaller ones (subnet)
Network Layer 4-46
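Subnetting
• Variable length subnet masks
  – Subnet a class B address space into several chunks
[Figure: the two-part address (Network | Host) becomes three parts (Network | Subnet | Host); the mask is 1111.. over the Network and Subnet bits and 00000000.. over the Host bits]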
Network Layer 4-47
Subnetting Example
• Assume an organization was assigned address 150.100
• Assume < 100 hosts per subnet
  – How many host bits do we need? Seven
  – What is the network mask?
    • 11111111 11111111 11111111 10000000
    • 255.255.255.128
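The two answers above can be checked with a short calculation. The helper name `host_bits_needed` is an assumption for illustration, and it assumes the all-zeros and all-ones host IDs in each subnet are reserved (network and broadcast), which is why two extra addresses are budgeted.

```python
import ipaddress

def host_bits_needed(max_hosts: int) -> int:
    """Smallest n with 2**n >= max_hosts + 2 (network and broadcast IDs reserved)."""
    return (max_hosts + 1).bit_length()

bits = host_bits_needed(100)            # hosts per subnet from the slide
prefix_len = 32 - bits
subnet = ipaddress.ip_network(f"150.100.0.0/{prefix_len}")
mask = subnet.netmask

# Number of such subnets that fit in the organization's class B block.
count = len(list(ipaddress.ip_network("150.100.0.0/16").subnets(new_prefix=prefix_len)))
```

Seven host bits leave 9 subnet bits inside the 16-bit class B host field, so the block divides into 512 subnets of this size.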
Network Layer 4-48
IP Address Problem #2 (1991)
• Address space depletion
  – In danger of running out of classes A and B
  – Class A
    • very few in number, IANA frugal in giving them out
  – Class B
    • subnetting only applied to new allocations of class B
    • existing class B networks sparsely populated
    • people refuse to give it back
  – Class C
    • plenty available, but too small for most domains
    • giving out multiple class C to a domain explodes # of routes
• Supernetting
  – Assign multiple consecutive class C blocks as one block
  – http://www.rfc-editor.org/rfc/rfc1338.txt
Network Layer 4-49
CIDR
• Evolved into Classless Inter-Domain Routing (CIDR)
  – http://www.rfc-editor.org/rfc/rfc1518.txt
  – http://www.rfc-editor.org/rfc/rfc1519.txt
Network Layer 4-50
IP addressing: CIDR
• Original classful addressing
  – Use class structure (A, B, C) to determine network ID for route lookup
• CIDR: Classless InterDomain Routing
  – Do not use classes to determine network ID
  – network portion of address of arbitrary length
  – address format: a.b.c.d/x, where x is # bits in network portion of address

200.23.16.0/23 = 11001000 00010111 00010000 00000000
                 (first 23 bits: network part; last 9 bits: host part)
Network Layer 4-51
CIDR
• Assign any range of addresses to network
  – Use common part of address as network number
  – e.g., addresses 192.4.16.* to 192.4.31.* have the first 20 bits in common. Thus, we use this as the network number
  – netmask is /20, /xx is valid for almost any xx
  – 192.4.16.0/20
• Enables more efficient usage of address space (and router tables)
• More on how this impacts routing later….
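The claim that 192.4.16.* through 192.4.31.* share a 20-bit prefix can be checked with the standard `ipaddress` module, which can summarize an address range into the CIDR blocks that cover it exactly. The variable names are assumptions for illustration.

```python
import ipaddress

first = ipaddress.ip_address("192.4.16.0")
last = ipaddress.ip_address("192.4.31.255")

# Smallest list of CIDR blocks that covers exactly first..last.
blocks = list(ipaddress.summarize_address_range(first, last))

# The same result, built from the sixteen individual class C sized networks.
class_c_nets = [ipaddress.ip_network(f"192.4.{i}.0/24") for i in range(16, 32)]
merged = list(ipaddress.collapse_addresses(class_c_nets))
```

Both computations reduce to the single block 192.4.16.0/20, so one route entry can stand in for sixteen.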
Network Layer 4-52
IP addresses: how to get one?
Q: How does host get IP address?
• hard-coded by system admin in a file
  – Wintel: control-panel->network->configuration->tcp/ip->properties
  – UNIX: /etc/rc.config
• DHCP: Dynamic Host Configuration Protocol: dynamically get address from a server
  – “plug-and-play” (more in next chapter)
Network Layer 4-53
IP addresses: how to get one?
Q: How does network get subnet part of IP addr?
A: organization gets allocated portion of its provider ISP’s address space
  – ISPs get it from ICANN: Internet Corporation for Assigned Names and Numbers
    • Allocates addresses, manages DNS, resolves disputes

ISP's block     11001000 00010111 00010000 00000000  200.23.16.0/20
Organization 0  11001000 00010111 00010000 00000000  200.23.16.0/23
Organization 1  11001000 00010111 00010010 00000000  200.23.18.0/23
Organization 2  11001000 00010111 00010100 00000000  200.23.20.0/23
...
Organization 7  11001000 00010111 00011110 00000000  200.23.30.0/23
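The allocation table above amounts to splitting the ISP's /20 into eight /23 blocks. A minimal sketch with the standard `ipaddress` module; the variable names are assumptions for illustration.

```python
import ipaddress

isp_block = ipaddress.ip_network("200.23.16.0/20")

# Split the /20 into equal /23 blocks: 3 extra prefix bits give 2**3 = 8 blocks.
org_blocks = list(isp_block.subnets(new_prefix=23))

for i, block in enumerate(org_blocks):
    print(f"Organization {i}: {block} ({block.num_addresses} addresses)")
```

Each organization's block is a sub-range of the ISP's block, which is what later lets the ISP advertise a single /20 route on behalf of all of them.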
Network Layer 4-54
IP route lookups
• Original IP Route Lookup
  – In the early days, address classes made it easy
    • A: 0 | 7 bit network | 24 bit host (16M each)
    • B: 10 | 14 bit network | 16 bit host (64K)
    • C: 110 | 21 bit network | 8 bit host (256)
  – Address would specify prefix for forwarding table
  – Simple lookup
Network Layer 4-55
Original IP Route Lookup: Example
• www.pdx.edu address 131.252.120.50
  – Class B address: class + network is 131.252
  – Lookup 131.252 in forwarding table
  – Prefix: part of address that really matters for routing
• Forwarding table contains
  – List of prefix entries
  – A few fixed prefix lengths (8/16/24)
• Large tables
  – 2 Million class C networks
  – Sites with multiple class C networks have multiple route entries at every router
Network Layer 4-56
Getting a datagram from source to dest.
Classful routing example
IP datagram: misc fields | source IP addr | dest IP addr | data
• datagram remains unchanged, as it travels source to destination
• addr fields of interest here
[Figure: three LANs joined by one router; host A is 223.1.1.1, host B is 223.1.1.3, host E is 223.1.2.2; router interfaces 223.1.1.4, 223.1.2.9, 223.1.3.27; other hosts 223.1.1.2, 223.1.2.1, 223.1.3.1, 223.1.3.2]

routing table in A:
Dest. Net.  next router  Nhops
223.1.1                  1
223.1.2     223.1.1.4    2
223.1.3     223.1.1.4    2
Network Layer 4-57
Getting a datagram from source to dest.
Starting at A, given IP datagram addressed to B:
• look up net. address of B
• find B is on same net. as A
• link layer will send datagram directly to B inside link-layer frame
  – B and A are directly connected
Datagram: misc fields | 223.1.1.1 | 223.1.1.3 | data
Routing table in A: 223.1.1 direct (1 hop); 223.1.2 via 223.1.1.4 (2 hops); 223.1.3 via 223.1.1.4 (2 hops)
Network Layer 4-58
Getting a datagram from source to dest.
Starting at A, dest. E:
  – look up network address of E
  – E on different network
    • A, E not directly attached
  – routing table: next hop router to E is 223.1.1.4
  – link layer sends datagram to router 223.1.1.4 inside link-layer frame
  – datagram arrives at 223.1.1.4
  – continued…..
Datagram: misc fields | 223.1.1.1 | 223.1.2.2 | data
Routing table in A: 223.1.1 direct (1 hop); 223.1.2 via 223.1.1.4 (2 hops); 223.1.3 via 223.1.1.4 (2 hops)
Network Layer 4-59
Getting a datagram from source to dest.
Datagram: misc fields | 223.1.1.1 | 223.1.2.2 | data

Routing table in router:
Dest. network  next router  Nhops  interface
223.1.1        -            1      223.1.1.4
223.1.2        -            1      223.1.2.9
223.1.3        -            1      223.1.3.27

Arriving at 223.1.1.4, destined for 223.1.2.2
  – look up network address of E
  – E on same network as router’s interface 223.1.2.9
    • router, E directly attached
  – link layer sends datagram to 223.1.2.2 inside link-layer frame via interface 223.1.2.9
  – datagram arrives at 223.1.2.2!!! (hooray!)
Network Layer 4-60
IP route lookup and CIDR
• Recall Classless routing (CIDR)
  – Advantages
    • Saves space in route tables
    • Makes more efficient use of address space
      – ISP allocated 8 class C chunks, 201.10.0.0 to 201.10.7.255
      – Allocation uses 3 bits of class C space
      – Remaining 21 bits are network number, written as 201.10.0.0/21
      – Replace 8 class C entries with 1 combined entry
    • Routing protocols carry prefix length with destination network address
  – But.... Makes route lookup more complex
    • No longer separate class A/B/C route tables each with O(1) lookup
    • One table containing many prefix lengths
    • Must match against all routes simultaneously via longest prefix match
Network Layer 4-61
CIDR example
ISP X given 16 class C networks 200.23.16.* to 200.23.31.* (or 200.23.16/20)

Customer        Block           Class C networks covered
Large company   200.23.16.0/21  200.23.16.0/24 through 200.23.23.0/24
Medium company  200.23.24.0/22  200.23.24.0/24 through 200.23.27.0/24
Small company   200.23.28.0/23  200.23.28.0/24, 200.23.29.0/24
Tiny company    200.23.30.0/24

Adjacent ISP router:
Route         Interface
200.23.16/20  1

ISP X router:
Route         Interface
200.23.16/21  2
200.23.24/22  3
200.23.28/23  4
200.23.30/24  5
Network Layer 4-62
CIDR route aggregation
Hierarchical addressing allows efficient advertisement of routing information:
• Fly-By-Night-ISP serves Organization 0 (200.23.16.0/23), Organization 1 (200.23.18.0/23), Organization 2 (200.23.20.0/23), ..., Organization 7 (200.23.30.0/23)
• Fly-By-Night-ISP to the Internet: “Send me anything with addresses beginning 200.23.16.0/20”
• ISPs-R-Us to the Internet: “Send me anything with addresses beginning 199.31.0.0/16”
Network Layer 4-63
Another CIDR example
Provider
• Routing to the network
• Packet to 10.1.1.3 arrives
• Path is R2 – R1 – H1 – H2
[Figure: Provider, routers R1 and R2, hosts H1 to H4; subnets 10.1.1/24, 10.1.2/24, 10.1.3/24, 10.1.8/24, 10.1.16/24 and the 10.1.1.2/31 link; interface addresses 10.1.1.1, 10.1.1.2, 10.1.1.3, 10.1.1.4, 10.1.2.1, 10.1.2.2, 10.1.3.1, 10.1.3.2, 10.1.8.1, 10.1.8.4, 10.1.16.1]
Network Layer 4-64
Another CIDR example
Routing table at R2
Destination     Next Hop   Interface
127.0.0.1       127.0.0.1  lo0
Default or 0/0  provider   10.1.16.1
10.1.8.0/24     10.1.8.1   10.1.8.1
10.1.2.0/24     10.1.2.1   10.1.2.1
10.1.0.0/22     10.1.2.2   10.1.2.1

• Subnet Routing
• Packet to 10.1.1.3
• Matches 10.1.0.0/22
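The lookup at R2 can be sketched as a linear scan that keeps the matching route with the longest prefix. The table entries are transcribed from the R2 routing table on this slide (loopback omitted); the function name `lookup` and the tuple layout are assumptions for illustration. Real routers use faster structures such as tries rather than a scan.

```python
import ipaddress

# (destination prefix, next hop), transcribed from the R2 table on the slide.
R2_TABLE = [
    ("0.0.0.0/0", "provider"),      # default route
    ("10.1.8.0/24", "10.1.8.1"),
    ("10.1.2.0/24", "10.1.2.1"),
    ("10.1.0.0/22", "10.1.2.2"),
]

def lookup(table, dest: str):
    """Return (matching network, next hop) for the longest matching prefix."""
    ip = ipaddress.ip_address(dest)
    best = None
    for prefix, next_hop in table:
        net = ipaddress.ip_network(prefix)
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best

net, hop = lookup(R2_TABLE, "10.1.1.3")
```

The packet to 10.1.1.3 matches both the default route and 10.1.0.0/22; the /22 wins because it is more specific, so R2 forwards it to 10.1.2.2.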
Network Layer 4-65
Another CIDR example
Routing table at R1
Destination     Next Hop   Interface
127.0.0.1       127.0.0.1  lo0
Default or 0/0  10.1.2.1   10.1.2.2
10.1.3.0/24     10.1.3.1   10.1.3.1
10.1.1.0/24     10.1.1.1   10.1.1.1
10.1.2.0/24     10.1.2.2   10.1.2.2
10.1.1.2/31     10.1.1.4   10.1.1.1

• Subnet Routing
• Packet to 10.1.1.3
• Matches 10.1.1.2/31
  – Longest prefix match
10.1.1.3 matches both routes, use longest prefix match
Network Layer 4-66
Another CIDR example
Routing table at H1
Destination     Next Hop   Interface
127.0.0.1       127.0.0.1  lo0
Default or 0/0  10.1.1.1   10.1.1.4
10.1.1.0/24     10.1.1.4   10.1.1.4
10.1.1.2/31     10.1.1.2   10.1.1.2

• Subnet Routing
• Packet to 10.1.1.3
• Direct route
  – Longest prefix match
10.1.1.3 matches both routes, use longest prefix match
Network Layer 4-67
CIDR Shortcomings
• Customer selecting a new provider
  – Renumbering required
[Figure: Provider 1 holds 201.10.0.0/21, containing the customer blocks 201.10.0.0/22, 201.10.4.0/24, 201.10.5.0/24, 201.10.6.0/23; Provider 2 holds 199.31.0.0/16]
Network Layer 4-68
CIDR shortcomings
• More specific routes
• Multi-homing
ISPs-R-Us has a more specific route to Organization 1
[Figure: Organization 0 (200.23.16.0/23), Organization 1 (200.23.18.0/23), Organization 2 (200.23.20.0/23), ..., Organization 7 (200.23.30.0/23); Fly-By-Night-ISP to the Internet: “Send me anything with addresses beginning 200.23.16.0/20”; ISPs-R-Us to the Internet: “Send me anything with addresses beginning 199.31.0.0/16 or 200.23.18.0/23”]
Network Layer 4-69
Longest-prefix matching
• Algorithms and data structures for CIDR-based IP route lookups
  – Ruiz-Sanchez, Biersack, Dabbous, “Survey and Taxonomy of IP Address Lookup Algorithms”, IEEE Network, Vol. 15, No. 2, March 2001
    • Binary trie
    • Multi-bit trie
    • LC trie
    • Lulea trie
    • Full expansion/compression
    • Binary search on prefix lengths
    • Binary range search
    • Multiway range search
    • Multiway range trees
    • Binary search on hash tables (Waldvogel, SIGCOMM 97)
Network Layer 4-70
Binary trie
• Data structure to support longest-prefix match for forwarding
• Bit-wise traversal from left-to-right

Route  Prefix
A      0*
B      01000*
C      011*
D      1*
E      100*
F      1100*
G      1101*
H      1110*
I      1111*

[Figure: binary trie for these prefixes; each edge is labeled 0 or 1, and the nodes where a prefix ends carry the route letter A through I]
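A minimal sketch of a binary trie for the route prefixes on this slide follows. Addresses and prefixes are represented as strings of '0' and '1' for readability; the class and function names are assumptions for illustration, and a production implementation would work on integer bit fields instead.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # maps "0" or "1" to the child TrieNode
        self.route = None    # route letter if a prefix ends at this node

def insert(root: TrieNode, prefix: str, route: str) -> None:
    """Walk or create one node per prefix bit and mark the final node."""
    node = root
    for bit in prefix:
        node = node.children.setdefault(bit, TrieNode())
    node.route = route

def longest_match(root: TrieNode, address_bits: str):
    """Traverse bit by bit from the left, remembering the deepest route seen."""
    node, best = root, None
    for bit in address_bits:
        node = node.children.get(bit)
        if node is None:
            break
        if node.route is not None:
            best = node.route
    return best

# Route prefixes from the slide (trailing * dropped).
PREFIXES = {"A": "0", "B": "01000", "C": "011", "D": "1", "E": "100",
            "F": "1100", "G": "1101", "H": "1110", "I": "1111"}
root = TrieNode()
for name, bits in PREFIXES.items():
    insert(root, bits, name)

result = longest_match(root, "01011000")
```

For the address bits 01011000 the walk passes A at depth 1, finds no marked node below it on this path, and falls off the trie after 010, so A is the answer. The cost is one memory access per bit, which is the motivation for the compressed and multi-bit variants that follow.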
Network Layer 4-71
Path-compressed binary trie
• Eliminate single branch point nodes
• Compare address against all prefixes along path to leaf
  – Take deepest match
• Variants include PATRICIA and BSD tries
Route prefixes: A 0*, B 01000*, C 011*, D 1*, E 100*, F 1100*, G 1101*, H 1110*, I 1111*
[Figure: path-compressed trie for these prefixes; each internal node is labeled with the bit position it tests (Bit=1 at the root, then Bit=2, Bit=3, Bit=4), and edges are labeled 0 or 1]
Network Layer 4-73
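Example #2: Path-compressed binary trie
Route prefixes: A 0*, B 00010*, C 00011*
[Figure: path-compressed trie for these prefixes; nodes test Bit=1 and then Bit=5, with edges labeled 0 and 1 leading to A, B, and C]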
Network Layer 4-74
Multi-bit tries
• Compare multiple bits at a time
  – Stride = number of bits being examined
  – Reduces memory accesses
  – Increases memory required
    • Forces table expansion for prefixes falling in between strides
  – Two types
    • Variable stride multi-bit tries
    • Fixed stride multi-bit tries
• Most route entries are Class C
  – Optimize “stride” based on this
Network Layer 4-75
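Variable stride multi-bit trie
• Single level has variable stride lengths
Route prefixes: A 0*, B 01000*, C 011*, D 1*, E 100*, F 1100*, G 1101*, H 1110*, I 1111*
[Figure: variable-stride multi-bit trie for these prefixes; nodes at the same level use different strides, some indexed by 1 bit (0, 1) and some by 2 bits (00, 01, 10, 11)]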
Network Layer 4-76
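Fixed stride multi-bit trie
• Single level has equal strides
Route prefixes: A 0*, B 01000*, C 011*, D 1*, E 100*, F 1100*, G 1101*, H 1110*, I 1111*
[Figure: fixed-stride multi-bit trie for these prefixes; the first level is indexed by 3 bits (000 to 111) and the second level by 2 bits (00 to 11), with shorter prefixes expanded into every entry they cover]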
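The table expansion that a fixed stride forces can be sketched as follows, assuming the two levels cover 3 and then 2 more bits (so entries are keyed by 3-bit or 5-bit strings). This is a simplified illustration that stores the expanded entries in a dictionary rather than in real per-node arrays; the function names and the `levels` parameter are assumptions.

```python
from itertools import product

PREFIXES = {"A": "0", "B": "01000", "C": "011", "D": "1", "E": "100",
            "F": "1100", "G": "1101", "H": "1110", "I": "1111"}

def expand(prefix: str, levels=(3, 5)):
    """Pad a prefix with every bit combination up to the next level boundary."""
    target = next(length for length in levels if length >= len(prefix))
    pad = target - len(prefix)
    return [prefix + "".join(bits) for bits in product("01", repeat=pad)]

def build_expanded_table(prefixes):
    table = {}
    # Shorter prefixes go in first so longer, more specific ones overwrite them.
    for name, bits in sorted(prefixes.items(), key=lambda kv: len(kv[1])):
        for entry in expand(bits):
            table[entry] = name
    return table

def lookup(table, address_bits: str, levels=(3, 5)):
    """Try the deepest level first, then fall back to the shallower one."""
    for length in sorted(levels, reverse=True):
        route = table.get(address_bits[:length])
        if route is not None:
            return route
    return None

table = build_expanded_table(PREFIXES)
```

Prefix A (0*) expands into the first-level entries 000, 001, 010 and 011, and the last of these is then overwritten by the more specific C (011*). That duplication is the memory cost of examining several bits per access.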
Network Layer 4-77
Issues
• Scaling
  – IPv6
• Stride choice
  – Tuning stride to route table
  – Bit shuffling
Network Layer 4-78
IP addressing and NAT
• Network Address Translation (NAT)
  – Alternate solution to address space depletion problem
    • Kludge (but useful)
  – Sits between your network and the Internet
  – Translates local, private, network layer addresses to global IP addresses
  – Has a pool of global IP addresses (less than number of hosts on your network)
• What if we only have few (or just one) IP address?
  – Use NAPT (Network Address Port Translator)
  – Both addresses and ports are translated
    • Translates Paddr + flow info to Gaddr + new flow info
    • Uses TCP/UDP port numbers
  – Potentially thousands of simultaneous connections with one global IP address
Network Layer 4-79
NAT Illustration
[Figure: a private network (P) connects through a NAT box, which holds a pool of global IP addresses, to the global Internet (G). A packet leaves the source as (Dg, Sp, Data) and leaves the NAT toward the destination as (Dg, Sg, Data).]
• Operation: Source (S) wants to talk to Destination (D):
  – Create Sg-Sp mapping
  – Replace Sp with Sg for outgoing packets
  – Replace Sg with Sp for incoming packets
Network Layer 4-80
NAPT: Network Address and Port Translation
[Figure: a local network (e.g., home network) 10.0.0/24 with addresses 10.0.0.1 through 10.0.0.4, connected to the rest of the Internet through a NAT router with address 138.76.29.7.]
• Datagrams with source or destination in this network have 10.0.0/24 address for source, destination (as usual)
• All datagrams leaving local network have same single source NAT IP address: 138.76.29.7, different source port numbers
Network Layer 4-81
NAT: Network Address Translation
• Advantages
  – range of addresses not needed from ISP: just a small set of IP addresses for all devices
  – can change addresses of devices in local network without notifying outside world
  – can change ISP without changing addresses of devices in local network
  – devices inside local net not explicitly addressable, visible by outside world (a security plus).
Network Layer 4-82
NAT: Network Address Translation
Implementation: NAT router must:
  – outgoing datagrams: replace (source IP address, port #) of every outgoing datagram with (NAT IP address, new port #)
    . . . remote clients/servers will respond using (NAT IP address, new port #) as destination addr.
  – remember (in NAT translation table) every (source IP address, port #) to (NAT IP address, new port #) translation pair
  – incoming datagrams: replace (NAT IP address, new port #) in dest fields of every incoming datagram with corresponding (source IP address, port #) stored in NAT table
Network Layer 4-83
NAT: Network Address Translation
[Figure: local hosts 10.0.0.1, 10.0.0.2, and 10.0.0.3 and address 10.0.0.4 on the LAN side of a NAT router whose WAN-side address is 138.76.29.7; numbered datagrams 1 to 4 are shown below.]

NAT translation table
WAN side addr        LAN side addr
138.76.29.7, 5001    10.0.0.1, 3345
……                   ……

1: host 10.0.0.1 sends datagram to 128.119.40.186, 80
   S: 10.0.0.1, 3345    D: 128.119.40.186, 80
2: NAT router changes datagram source addr from 10.0.0.1, 3345 to 138.76.29.7, 5001, updates table
   S: 138.76.29.7, 5001    D: 128.119.40.186, 80
3: Reply arrives, dest. address: 138.76.29.7, 5001
   S: 128.119.40.186, 80    D: 138.76.29.7, 5001
4: NAT router changes datagram dest addr from 138.76.29.7, 5001 to 10.0.0.1, 3345
   S: 128.119.40.186, 80    D: 10.0.0.1, 3345
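The three duties of the NAT router (rewrite outgoing, remember the pair, rewrite incoming) can be sketched as a small translation table. This is a toy under stated assumptions: the class name `Napt`, its method names, and the choice to hand out ports sequentially from 5001 are all hypothetical, and real NAT boxes also handle timeouts, port exhaustion, and per-protocol state, none of which is modeled here.

```python
import itertools


class Napt:
    """Toy NAPT translation table (hypothetical API, not a real library)."""

    def __init__(self, wan_ip, first_port=5001):
        self.wan_ip = wan_ip
        self._ports = itertools.count(first_port)  # assumed allocation policy
        self.lan_to_wan = {}   # (lan_ip, lan_port) -> wan_port
        self.wan_to_lan = {}   # wan_port -> (lan_ip, lan_port)

    def outgoing(self, src, dst):
        """Replace the LAN source with (NAT IP, new port); remember the pair."""
        if src not in self.lan_to_wan:
            port = next(self._ports)
            self.lan_to_wan[src] = port
            self.wan_to_lan[port] = src
        return (self.wan_ip, self.lan_to_wan[src]), dst

    def incoming(self, src, dst):
        """Replace (NAT IP, port) in the destination with the stored LAN pair.

        Returns None when no mapping exists (an unsolicited inbound datagram).
        """
        ip, port = dst
        if ip != self.wan_ip or port not in self.wan_to_lan:
            return None
        return src, self.wan_to_lan[port]


nat = Napt("138.76.29.7")
step2 = nat.outgoing(("10.0.0.1", 3345), ("128.119.40.186", 80))
step4 = nat.incoming(("128.119.40.186", 80), ("138.76.29.7", 5001))
```

With the addresses from the example, `step2` carries source 138.76.29.7, 5001 and `step4` carries destination 10.0.0.1, 3345. An inbound datagram to an unmapped port returns `None`, which is the "no inbound connections" limitation in miniature.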
Network Layer 4-84
NAT: Network Address Translation
• 16-bit port-number field:
  – more than 60,000 simultaneous connections (2^16 = 65,536 port numbers) with a single LAN-side address!
• NAT is controversial:
  – routers should only process up to layer 3
  – violates end-to-end argument
    • NAT possibility must be taken into account by app designers, e.g., P2P applications
  – address shortage should instead be solved by IPv6
Network Layer 4-85
Problems with NAT
• Hides the internal network structure
  – Some consider this an advantage
• Multiple NAT hops must ensure consistent mappings
• Some protocols carry addresses
  – e.g., FTP carries addresses in text
  – What is the problem?
• Encryption
• No inbound connections
Network Layer 4-86
ICMP: Internet Control Message Protocol
• Essentially a network-layer protocol for passing control messages
• Used by hosts & routers to communicate network-level information
  – error reporting: unreachable host, network, port, protocol
  – echo request/reply (used by ping)
• Network-layer “above” IP:
  – ICMP msgs carried in IP datagrams
• ICMP message: type, code plus first 8 bytes of IP datagram causing error
• http://www.rfc-editor.org/rfc/rfc792.txt

Type  Code  Description
0     0     echo reply (ping)
3     0     dest. network unreachable
3     1     dest host unreachable
3     2     dest protocol unreachable
3     3     dest port unreachable
3     6     dest network unknown
3     7     dest host unknown
4     0     source quench (congestion control - not used)
8     0     echo request (ping)
9     0     route advertisement
10    0     router discovery
11    0     TTL expired
12    0     bad IP header
Network Layer 4-88
Traceroute and ICMP
• Source sends series of UDP segments to dest
  – First has TTL=1
  – Second has TTL=2, etc.
  – Unlikely port number
• When nth datagram arrives at nth router:
  – Router discards datagram
  – And sends to source an ICMP message (type 11, code 0)
  – Message includes name of router & IP address
• When ICMP message arrives, source calculates RTT
• Traceroute does this 3 times

Stopping criterion
• UDP segment eventually arrives at destination host
• Destination returns ICMP “port unreachable” packet (type 3, code 3)
• When source gets this ICMP, stops.
Network Layer 4-89
IPv6r Redefine functions of IP (version 4)mWhat changes should be made in….
• IP addressing• IP delivery semantics• IP quality of service• IP security• IP routing• IP fragmentation• IP error detection
Network Layer 4-91
IPv6
• Initial motivation: 32-bit address space soon to be completely allocated (est. 2008)
• Additional motivation:
  – Remove ancillary functionality
    • header format helps speed processing/forwarding
  – Add missing, but essential functionality
    • header changes to facilitate QoS
    • new “anycast” address: route to “best” of several replicated servers
• IPv6 datagram format:
  – fixed-length 40 byte header
  – no fragmentation allowed
Network Layer 4-92
IPv6 Header (Cont)Priority: identify priority among datagrams in flowFlow Label: identify datagrams in same “flow.”
(concept of“flow” not well defined).Next header: identify upper layer protocol for data
Network Layer 4-93
IPv6 Changes
• Scale – addresses are 128 bits
  – Header size?
• Simplification
  – Removes infrequently used parts of header
  – 40 byte fixed header vs. 20+ byte variable header
• IPv6 removes checksum
  – IPv4 checksum = provide extra protection on top of data-link layer and below transport layer
  – End-to-end principle
    • Is this necessary?
    • IPv6 answer => No
  – Relies on upper layer protocols to provide integrity
  – Reduces processing time at each hop
Network Layer 4-94
IPv6 Changes
• IPv6 eliminates fragmentation
  – Requires path MTU discovery
• ICMPv6: new version of ICMP
  – additional message types, e.g. “Packet Too Big”
  – multicast group management functions
• Protocol field replaced by next header field
  – Unify support for protocol demultiplexing as well as option processing
• Option processing
  – Options allowed, but only outside of header, indicated by “Next Header” field
  – Options header does not need to be processed by every router
    • Large performance improvement
    • Makes options practical/useful
Network Layer 4-95
IPv6 Changes
• TOS replaced with traffic class octet
  – Support QoS via DiffServ
• FlowID field
  – Help soft state systems, accelerate flow classification
  – Maps well onto TCP connection or stream of UDP packets on host-port pair
• Easy configuration
  – Provides auto-configuration using hardware MAC address
• Additional requirements
  – Support for security
  – Support for mobility
Network Layer 4-96
Transition From IPv4 To IPv6
• Not all routers can be upgraded simultaneously
  – no “flag days”
  – How will the network operate with mixed IPv4 and IPv6 routers?
• Two proposed approaches:
  – Dual Stack: some routers with dual stack (v6, v4) can “translate” between formats
  – Tunneling: IPv6 carried as payload in an IPv4 datagram among IPv4 routers
Network Layer 4-97
Tunneling
Logical view: A (IPv6), B (IPv6), E (IPv6), F (IPv6), with a tunnel between B and E
Physical view: A (IPv6), B (IPv6), C (IPv4), D (IPv4), E (IPv6), F (IPv6)

A-to-B: IPv6
  Flow: X, Src: A, Dest: F, data
B-to-C: IPv6 inside IPv4
  Src: B, Dest: E [ Flow: X, Src: A, Dest: F, data ]
D-to-E: IPv6 inside IPv4
  Src: B, Dest: E [ Flow: X, Src: A, Dest: F, data ]
E-to-F: IPv6
  Flow: X, Src: A, Dest: F, data
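The encapsulation step in the tunnel can be sketched as wrapping the whole IPv6 datagram as the payload of an IPv4 datagram addressed between the tunnel endpoints. The dataclasses and the `encapsulate`/`decapsulate` helpers below are hypothetical and model only the fields named on the slide (Flow, Src, Dest, data), not real header layouts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IPv6Packet:
    flow: str
    src: str
    dst: str
    data: bytes


@dataclass(frozen=True)
class IPv4Packet:
    src: str
    dst: str
    payload: IPv6Packet   # the entire IPv6 datagram rides as IPv4 payload


def encapsulate(pkt, tunnel_src, tunnel_dst):
    """Tunnel entry (node B): wrap the IPv6 datagram, leaving it untouched."""
    return IPv4Packet(src=tunnel_src, dst=tunnel_dst, payload=pkt)


def decapsulate(pkt):
    """Tunnel exit (node E): recover the original IPv6 datagram."""
    return pkt.payload


inner = IPv6Packet(flow="X", src="A", dst="F", data=b"data")
outer = encapsulate(inner, tunnel_src="B", tunnel_dst="E")
restored = decapsulate(outer)
```

The IPv4 routers C and D forward on the outer header (Src B, Dest E) only; the inner Flow, Src, and Dest fields come out at E exactly as A sent them.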
Network Layer 4-99
Dual Stack Approach
• Dual-stack router translates between v4 and v6
  – v4 addresses have special v6 equivalents
  – Issue: how to translate “FlowField” of v6?
Network Layer 4-100
Interplay between routing, forwarding
[Figure: a routing algorithm fills the router's local forwarding table; the value in an arriving packet's header (0111 in the figure) is looked up to choose among output links 1, 2, and 3.]

Local forwarding table
header value   output link
0100           3
0101           2
0111           2
1001           1

• Previously: Forward based on forwarding table
• Q: How to generate forwarding tables?
  – Routing algorithms and protocols
Network Layer 4-102
Routing
Graph abstraction for routing algorithms:
• graph nodes are routers
• graph edges are physical links
  – link cost
    • Delay
    • $ cost
    • congestion level
Goal: determine “good” path (sequence of routers) thru network from source to dest.
• “good” path:
  – typically means minimum cost path
  – other def’s possible
[Figure: a routing protocol operating over a six-node graph (A through F) with link costs between 1 and 5.]
Network Layer 4-103
Who handles IP routing functions?
  – Source (IP source routing)
    • Packet carries path
  – Network edge devices
    • Map IP route into label, wavelength, or circuit at edges
    • Switch on label, wavelength, or circuit in the core
      – ATM
      – MPLS
      – lambda switching
  – Network routers
    • Hop-by-hop forwarding based on destination IP carried by packet
    • Routers keep next hop for destination
    • IP route table calculated in network routers
    • Most common
Network Layer 4-104
Source Routing
• IP source route option
  – List entire path (strict) or partial path (loose) in packet
  – Attach list of IP addresses within header
• Router processing
  – Examine first step in directions
    • Increment pointer offset in header
    • Forward to step
    • Copy entire source route header on fragmentation
Network Layer 4-105
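Source Routing Example
[Figure: a packet travels from Sender to Receiver through three four-port switches. It leaves the sender carrying the route 3,4,3; the remaining route shown along the path shrinks to 4,3 and then 3.]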
Network Layer 4-106
Source Routing
• Advantages
  – Switches can be very simple and fast
• Disadvantages
  – Variable (unbounded) header size
  – Sources must know or discover topology (e.g., failures)
• Typical use
  – Ad-hoc networks (DSR)
  – Machine room networks (Myrinet)
Network Layer 4-107
Network edge device routing
• Virtual circuits, tag switching
• Connection setup phase
  – IP route lookup at edges to generate appropriate label, wavelength, circuit
  – Switch on label, wavelength, circuit ID in core
• In-network processing
  – Lookup flow ID – simple table lookup
  – Potentially replace flow ID with outgoing flow ID
  – Forward to output port
Network Layer 4-108
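Virtual Circuits Example
[Figure: a packet travels from Sender to Receiver through three four-port switches; the VC numbers shown along the path are 5, 7, 2, and 6.]
Switch table entries (in port, in VC → out port, out VC):
  1,5 → 3,7
  1,7 → 4,2
  2,2 → 3,6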
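The "simple table lookup" with label replacement can be sketched using the three table entries from the example. The assignment of one entry per switch in path order, the `carry` helper, and the list of arrival ports are assumptions made for illustration; the slide gives only the entries and the labels seen along the path.

```python
# One VC table per switch, in path order (assumed from the example):
# (in_port, in_vc) -> (out_port, out_vc)
SWITCHES = [
    {(1, 5): (3, 7)},
    {(1, 7): (4, 2)},
    {(2, 2): (3, 6)},
]


def carry(switches, in_ports, vc):
    """Follow a packet through the switches, swapping its VC label per hop.

    in_ports[i] is the port on which the packet arrives at switch i.
    Returns the list of VC labels the packet carries on each link.
    """
    labels = [vc]
    for table, in_port in zip(switches, in_ports):
        _out_port, vc = table[(in_port, vc)]   # exact-match lookup, no prefix search
        labels.append(vc)
    return labels


LABELS = carry(SWITCHES, in_ports=[1, 1, 2], vc=5)
```

Each hop is a single dictionary lookup keyed on (port, label), which is why the comparison slide rates VC forwarding overhead as a plain table index.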
Network Layer 4-109
Virtual Circuits
• Advantages
  – More efficient lookup (simple table lookup)
    • Easier for hardware implementations
  – More flexible (different path for each flow)
  – Can reserve bandwidth at connection setup
• Disadvantages
  – Still need to route connection setup request
  – More complex failure recovery – must recreate connection state
• Typical uses
  – ATM – combined with fixed-size cells
  – MPLS – tag switching for IP networks
Network Layer 4-110
IP Datagrams on Virtual Circuits
• Challenge – when to set up connections
  – At bootup time – permanent virtual circuits (PVC)
    • Large number of circuits
  – For every packet transmission
    • Connection setup is expensive
  – For every connection
    • What is a connection?
    • How to route connectionless traffic?
  – Based on traffic
    • VC for long-lived flows
    • Normal IP forwarding for all other flows
Network Layer 4-111
Network routers (Global IP addresses)
• Most prevalent way to route on the Internet
  – Each packet has destination IP address
  – Each router has forwarding table of..
    • destination IP → next hop IP address
  – Distributed routing algorithm for calculating forwarding tables
Network Layer 4-112
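Global Address Example
[Figure: a packet addressed to R travels from Sender to Receiver (R) through three four-port routers; the routers hold forwarding entries for R (R → 3, R → 4, R → 3) naming the output port to use.]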
Network Layer 4-113
Issues in Router Table Size
• One entry for every host on the Internet
  – 100M entries
• One entry for every LAN
  – Every host on LAN shares prefix
  – Still too many
• One entry for every organization
  – Every host in organization shares prefix
  – Requires careful address allocation
  – What constitutes an “organization”?
Network Layer 4-114
Global Addresses
• Advantages
  – Simple error recovery
• Disadvantages
  – Every router knows about every destination
    • Potentially large tables
  – All packets to destination take same route
Network Layer 4-115
Comparison

                    Source Routing   Global Addresses             Virtual Circuits
Header Size         Worst            OK – Large address           OK (larger than global if IP payload)
Router Table Size   None             Number of hosts (prefixes)   Number of circuits
Forward Overhead    Best             Prefix matching              Good (table index)
Setup Overhead      None             None                         Connection Setup
Error Recovery      Tell all hosts   Tell all routers             Tell all routers, tear down circuit and re-route
Network Layer 4-116
Graph abstraction
[Figure: six-node graph with nodes u, v, w, x, y, z and link costs between 1 and 5.]
Graph: G = (N,E)
N = set of routers = { u, v, w, x, y, z }
E = set of links = { (u,v), (u,x), (u,w), (v,x), (v,w), (x,w), (x,y), (w,y), (w,z), (y,z) }
Remark: Graph abstraction is useful in other network contexts
Example: P2P, where N is set of peers and E is set of TCP connections
Network Layer 4-117
Graph abstraction: costs
[Figure: the six-node graph (u, v, w, x, y, z) with its link costs.]
• c(x,x’) = cost of link (x,x’)
  – e.g., c(w,z) = 5
• cost could always be 1, or inversely related to bandwidth, or inversely related to congestion
Cost of path (x1, x2, x3,…, xp) = c(x1,x2) + c(x2,x3) + … + c(xp-1,xp)
Question: What’s the least-cost path between u and z ?
Routing algorithm: algorithm that finds least-cost path
Network Layer 4-118
Routing Algorithm classification

Global or decentralized information?
Global:
• all routers have complete topology, link cost info
• “link state” algorithms
Decentralized:
• router knows physically-connected neighbors, link costs to neighbors
• iterative process of computation, exchange of info with neighbors
• “distance vector” algorithms

Static or dynamic?
Static:
• routes change slowly over time
Dynamic:
• routes change more quickly
  – periodic update
  – in response to link cost changes
Network Layer 4-119
Other characteristics
r Communication costsr Processing costsrOptimalityrStabilitym Convergence timem Loop freedommOscillation damping
Network Layer 4-120
A Link-State Routing Algorithm
Dijkstra’s algorithm
• net topology, link costs known to all nodes
  – accomplished via “link state broadcast”
  – all nodes have same info
• computes least cost paths from one node (“source”) to all other nodes
  – gives forwarding table for that node
  – iterative: after k iterations, know least cost path to k dest.’s
Network Layer 4-122
Dijkstra’s algorithm
• Start condition
  – Each node assumed to know state of links to its neighbors
• Step 1: Link state broadcast
  – Each node broadcasts its local link states to all other nodes
  – Reliable flooding mechanism
• Step 2: Shortest-path tree calculation
  – Each node locally computes shortest paths to all other nodes from global state
  – Dijkstra’s shortest path tree (SPT) algorithm
Network Layer 4-123
Link state broadcast
• Link State Packets (LSPs) to broadcast state to all nodes
• Periodically, each node creates a link state packet containing:
  – Node ID
  – List of neighbors and link cost
  – Sequence number
  – Time to live (TTL)
• Node outputs LSP on all its links
Network Layer 4-124
Link state broadcast
• Reliable Flooding
  – When node J receives LSP from node K
    • If LSP is the most recent LSP from K that J has seen so far, J saves it in database and forwards a copy on all links except link LSP was received on
    • Otherwise, discard LSP
  – How to tell more recent
    • Use sequence numbers
      – Same method as sliding window protocols
      – Needed to avoid stale information from flood
      – Problem: sequence number wrap-around
        » Lollipop sequence space
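The save-and-forward rule for reliable flooding can be sketched as a single function at node J. Everything named here is hypothetical: LSPs are plain dicts with `node` and `seq` keys, links are identified by strings, and freshness is a plain integer comparison, so wrap-around and TTL aging are deliberately ignored.

```python
def receive_lsp(database, lsp, arrival_link, links):
    """Process one LSP at a node; return the links to forward it on.

    database: dict mapping originating node ID -> most recent LSP seen
    lsp:      dict with at least 'node' (originator) and 'seq' keys
    An empty list means the LSP was stale or a duplicate and was discarded.
    """
    stored = database.get(lsp["node"])
    if stored is not None and lsp["seq"] <= stored["seq"]:
        return []                        # not newer than what we have: discard
    database[lsp["node"]] = lsp          # save the most recent LSP
    # Forward a copy on every link except the one it arrived on.
    return [link for link in links if link != arrival_link]


LINKS = ["east", "west", "north"]
db = {}
first = receive_lsp(db, {"node": "K", "seq": 7}, "east", LINKS)
duplicate = receive_lsp(db, {"node": "K", "seq": 7}, "west", LINKS)
stale = receive_lsp(db, {"node": "K", "seq": 6}, "north", LINKS)
newer = receive_lsp(db, {"node": "K", "seq": 8}, "north", LINKS)
```

Discarding duplicates is what stops the flood: each node forwards a given LSP at most once, no matter how many copies reach it.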
Network Layer 4-125
Wrapped sequence numbers
• Wrapped sequence numbers
  – 0-N where N is large
  – If difference between numbers is large, assume a wrap
  – A is older than B if….
    • A < B and |A-B| < N/2 or…
    • A > B and |A-B| > N/2
• What about new nodes or rebooted nodes that are out of sync with sequence number space?
  – Lollipop sequence (Perlman 1983)
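The "A is older than B" test for wrapped sequence numbers translates directly into a predicate. The function name `is_older` and the example value N = 256 are illustrative choices, not part of the slides.

```python
def is_older(a, b, n):
    """True if sequence number a is older than b in a wrapped space 0..n.

    A small gap is taken at face value; a gap larger than n/2 is assumed
    to mean the larger number is from before a wrap-around.
    """
    gap = abs(a - b)
    return (a < b and gap < n / 2) or (a > b and gap > n / 2)


N = 256
plain = is_older(10, 20, N)      # small gap, 10 precedes 20
wrapped = is_older(250, 5, N)    # large gap: 5 is assumed to follow a wrap
reverse = is_older(5, 250, N)
```

With N = 256, 250 counts as older than 5 because the gap of 245 exceeds N/2, so 5 is read as a number issued after the counter wrapped.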
Network Layer 4-126
Lollipop sequence numbers
• Divide sequence number space
• Special negative sequence for recovering from reboot
  – New and rebooted nodes use negative sequence numbers
  – Upon receipt of negative number, other nodes inform these nodes of current “up-to-date” sequence number
• A older than B if
  – A < 0 and A < B
  – A > 0, A < B and (B – A) < N/4
  – A > 0, A > B and (A – B) > N/4
[Figure: lollipop-shaped sequence space with labels -N/2, 0, and N/2 - 1; the straight stem holds the negative numbers and the circular part the non-negative ones.]
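The three lollipop comparison rules can be written as one predicate. The name `lollipop_older` is hypothetical, and one assumption goes beyond the slide: the slide states its last two rules for A > 0, while this sketch applies them for A >= 0 so that the value 0 is also covered.

```python
def lollipop_older(a, b, n):
    """True if a is older than b in a lollipop space of size n.

    Negative numbers form the linear stem used after a reboot; they are
    compared directly. Non-negative numbers live on the circular part and
    are compared with the N/4 window from the slide.
    Assumption: a == 0 is treated like the slide's 'A > 0' cases.
    """
    if a < 0:
        return a < b
    quarter = n / 4
    return (a < b and (b - a) < quarter) or (a > b and (a - b) > quarter)


N = 256   # example size: values run from -128 to 127
reboot_vs_running = lollipop_older(-5, 3, N)
stem_order = lollipop_older(-5, -2, N)
circle_small_gap = lollipop_older(10, 20, N)
circle_wrapped = lollipop_older(120, 5, N)
```

A freshly rebooted node's negative number always loses to any larger number, which is what lets its neighbors bring it back up to the current sequence number.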
Network Layer 4-127
Shortest-path tree calculation
Notation:
• c(x,y): link cost from node x to y; = ∞ if not direct neighbors
• D(v): current value of cost of path from source to dest. v
• p(v): predecessor node along path from source to v
• N': set of nodes whose least cost path definitively known
Network Layer 4-128
Dijkstra’s Algorithm
Initialization:
  N' = {u}
  for all nodes v
    if v adjacent to u
      then D(v) = c(u,v)
    else D(v) = ∞

Loop
  find w not in N' such that D(w) is a minimum
  add w to N'
  update D(v) for all v adjacent to w and not in N':
    D(v) = min( D(v), D(w) + c(w,v) )
  /* new cost to v is either old cost to v or known
     shortest path cost to w plus cost from w to v */
until all nodes in N'
Network Layer 4-129
Shortest-path tree calculation (Dijkstra’s algorithm example)
[Figure: six-node graph with nodes A through F and link costs between 1 and 5.]

D(v) = min( D(v), D(w) + c(w,v) )

step  SPT      D(b),P(b)  D(c),P(c)  D(d),P(d)  D(e),P(e)  D(f),P(f)
0     A        2, A       5, A       1, A       ~          ~
1     AD       2, A       4, D                  2, D       ~
2     ADE      2, A       3, E                             4, E
3     ADEB                3, E                             4, E
4     ADEBC                                                4, E
5     ADEBCF
Network Layer 4-135
Dijkstra’s algorithm example
[Figure: resulting shortest-path tree from A, spanning nodes B, C, D, E, and F.]

Resulting forwarding table in A:
destination   link
B             (A,B)
D             (A,D)
E             (A,D)
C             (A,D)
F             (A,D)
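The algorithm and the worked example above can be sketched with a priority queue. The edge list below is an assumption: the slides show the graph only as a figure, so the ten edges are a reconstruction chosen to be consistent with the step table (for instance D(c) improving from 5 via A to 4 via D to 3 via E). The helper names `undirected`, `dijkstra`, and `first_hops` are hypothetical, and a heap may break ties in a different order than the slides do without changing the final costs.

```python
import heapq


def undirected(edges):
    """Build an adjacency map {node: {neighbor: cost}} from (a, b, cost)."""
    graph = {}
    for a, b, cost in edges:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def dijkstra(graph, source):
    """Return (D, p): least costs and predecessors, as in the notation slide."""
    dist, pred = {source: 0}, {}
    done = set()                       # N' in the slides
    heap = [(0, source)]
    while heap:
        d, w = heapq.heappop(heap)
        if w in done:
            continue                   # stale heap entry
        done.add(w)
        for v, cost in graph[w].items():
            if v not in done and d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost     # D(v) = min(D(v), D(w) + c(w,v))
                pred[v] = w
                heapq.heappush(heap, (dist[v], v))
    return dist, pred


def first_hops(pred, source):
    """Forwarding table: destination -> first link on the path from source."""
    table = {}
    for dest in pred:
        hop = dest
        while pred[hop] != source:
            hop = pred[hop]
        table[dest] = (source, hop)
    return table


# Assumed edge costs, reconstructed to match the step table in the example.
EDGES = [("A", "B", 2), ("A", "C", 5), ("A", "D", 1), ("B", "C", 3),
         ("B", "D", 2), ("C", "D", 3), ("C", "E", 1), ("C", "F", 5),
         ("D", "E", 1), ("E", "F", 2)]
DIST, PRED = dijkstra(undirected(EDGES), "A")
TABLE = first_hops(PRED, "A")
```

Walking predecessors back toward the source is one way to turn the shortest-path tree into the forwarding table: every destination reached through D shares the single outgoing link (A,D).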
Network Layer 4-136
Link state algorithm characteristics
• Computation overhead
  – n nodes
  – each iteration: need to check all nodes, w, not in N'
    • n*(n+1)/2 comparisons: O(n**2)
    • more efficient implementations possible: O(n log(n))
• Space requirements
• Bandwidth requirements
• Stability
  – Inconsistencies can cause transient loops
  – Consistent LSDBs required for loop-free paths
[Figure: four-node graph (A, B, C, D) with link costs 1, 3, 5, 2, and 1; one link is marked X as failed.]
Packet from C→A may loop around BDC if B knows about failure and C & D do not
Network Layer 4-137
Link-state algorithm issues
Oscillations possible:
• e.g., link cost = amount of carried traffic
• Example: path to A flaps as traffic routed clockwise and counter-clockwise
• Common problem in load-based link metrics
  – A. Khanna and J. Zinky, "The Revised ARPANET Routing Metric," in ACM SIGCOMM, 1989, pp. 45--46.
[Figure: four snapshots of a four-node ring (A, B, C, D) with traffic of 1, 1, and e destined for A: initially, then after each of three routing recomputations. Link costs switch between 0 and values such as 1, 1+e, and 2+e as the traffic changes direction.]
Network Layer 4-138
Distance vector routing algorithms
• Variants used in
  – Early ARPAnet
  – RIP (intra-domain routing protocol)
  – BGP (inter-domain routing protocol)
• Distributed next hop computation
  – “Gossip with immediate neighbors until you find the best route”
  – Best route is achieved when there are no more changes
• Unit of information exchange
  – Vector of distances to destinations
Network Layer 4-140
Distance Vector Algorithm
Bellman-Ford Equation
Define
  $d_x(y)$ := cost of least-cost path from x to y
Then
  $d_x(y) = \min_v \{ c(x,v) + d_v(y) \}$
where min is taken over all neighbors v of x
Network Layer 4-141
Bellman-Ford example
[Figure: the six-node graph (u, v, w, x, y, z) with its link costs.]
Clearly, $d_v(z) = 5$, $d_x(z) = 3$, $d_w(z) = 3$
B-F equation says:
  $d_u(z) = \min \{ c(u,v) + d_v(z),\ c(u,x) + d_x(z),\ c(u,w) + d_w(z) \} = \min \{ 2 + 5,\ 1 + 3,\ 5 + 3 \} = 4$
Node that achieves minimum is next hop in shortest path → forwarding table
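The single Bellman-Ford step in this example can be checked numerically. The function name `bellman_ford_step` is hypothetical; the link costs and neighbor distances are the ones stated on the slide.

```python
def bellman_ford_step(link_costs, neighbor_dists):
    """Apply d_x(y) = min_v { c(x,v) + d_v(y) } for one destination y.

    link_costs:     {neighbor v: c(x, v)}
    neighbor_dists: {neighbor v: d_v(y)}
    Returns (least cost, neighbor achieving it); that neighbor is the next hop.
    """
    return min((link_costs[v] + neighbor_dists[v], v) for v in link_costs)


# Node u computing its distance to z, with values from the slide.
COST, NEXT_HOP = bellman_ford_step(
    link_costs={"v": 2, "x": 1, "w": 5},
    neighbor_dists={"v": 5, "x": 3, "w": 3},
)
```

The minimum of 7, 4, and 8 is 4, reached through x, so x is the next hop that u installs in its forwarding table for destination z.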
Network Layer 4-142
Bellman algorithm
• Update distance information iteratively
• Example (Bellman 1957)
  – Start with link table (as with Dijkstra), calculate distance table iteratively
  – Distance table data structure
    • table of known distances and next hops kept per node
    • row for each possible destination
    • column for each directly-attached neighbor to node
    • example: in node X, for dest. Y via neighbor Z:
Network Layer 4-143
$D^X(Y,Z)$ = distance from X to Y, via Z as next hop = $c(X,Z) + \min_w \{ D^Z(Y,w) \}$
$D^X(Y,*)$ = minimum known distance from X to Y
$H^X(Y)$ = next hop node from X to Y

Bellman algorithm
• Centralized version
[Figure: node i reaches destination k through neighbors j and j’ over links of cost c(i,j) and c(i,j’); the neighbors' best-known distances Dj(k,*) and Dj’(k,*) feed i's own Di(k,*).]

For node i
  while there is a change in D
    for all k not neighbor of i
      for each j neighbor of i
        Di(k,j) = c(i,j) + Dj(k,*)
        if Di(k,j) < Di(k,*) {
          Di(k,*) = Di(k,j)
          Hi(k) = j
        }
Network Layer 4-144
Distance table example
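[Figure: five-node graph (A, B, C, D, E) with link costs 7, 8, 1, 2, 1, and 2.]

Distance table at E, $D^E()$: cost to destination via each neighbor
destination   via A   via B   via D
A             1       14      5
B             7       8       5
C             6       9       4
D             4       11      2

$D^E(C,D) = c(E,D) + \min_w \{ D^D(C,w) \} = 2 + 2 = 4$
$D^E(A,D) = c(E,D) + \min_w \{ D^D(A,w) \} = 2 + 3 = 5$ (loop!)
$D^E(A,B) = c(E,B) + \min_w \{ D^B(A,w) \} = 8 + 6 = 14$ (loop!)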
Network Layer 4-145
Distance table gives forwarding table
Distance table at E, $D^E()$: cost to destination via each neighbor
destination   via A   via B   via D
A             1       14      5
B             7       8       5
C             6       9       4
D             4       11      2

Routing table at E: outgoing link to use, cost (the minimum of each distance-table row)
destination   link, cost
A             A, 1
B             D, 5
C             D, 4
D             D, 2
Network Layer 4-146
Distributed Bellman-Ford
• Make Bellman algorithm distributed (Ford-Fulkerson 1962)
  – Each node i has distance vector estimates to other nodes
  – Iterate
    • Each node sends around and recalculates D[i,*]
    • When a node x receives new DV estimate from neighbor, it updates its own DV using B-F equation:
      $D_x(y) \leftarrow \min_v \{ c(x,v) + D_v(y) \}$ for each node $y \in N$
    • If estimates change, broadcast entire table to neighbors
      – continues until no nodes exchange info.
      – self-terminating: no “signal” to stop
  – D[i,*] eventually converges to shortest distance
Network Layer 4-147
Distributed Bellman-Ford overview
Asynchronous:
• “triggered updates”
  – no need to exchange info/iterate in lock step!
Iterative:
• When local link costs change
• When neighbor sends a message that its least cost path has changed for a node
Distributed:
• nodes communicate only with directly-attached neighbors
• each node notifies neighbors only when its least cost path to any destination changes
  – neighbors then notify their neighbors if necessary

Each node:
  wait for (change in local link cost or msg from neighbor)
  recompute distance table
  if least cost path to any dest has changed, notify neighbors
Network Layer 4-148
Distributed Bellman-Ford algorithm
At all nodes, X:
1  Initialization:
2    for all adjacent nodes v:
3      DX(*,v) = infinity    /* the * operator means "for all rows" */
4      DX(v,v) = c(X,v)
5    for all destinations, y
6      send minw DX(y,w) to each neighbor    /* w over all X's neighbors */
Network Layer 4-149
Distributed Bellman-Ford algorithm
8  loop
9    wait (until I see a link cost change to neighbor V
10         or until I receive update from neighbor V)
11
12   if (c(X,V) changes by d)
13     /* change cost to all dest's via neighbor V by d */
14     /* note: d could be positive or negative */
15     for all destinations y: DX(y,V) = DX(y,V) + d
16
17   else if (update received from V wrt destination Y)
18     /* shortest path from V to some Y has changed */
19     /* V has sent a new value for its minw DV(Y,w) */
20     /* call this received new value "newval" */
21     for the single destination Y: DX(Y,V) = c(X,V) + newval
22
23   if we have a new minw DX(Y,w) for any destination Y
24     send new value of minw DX(Y,w) to all neighbors
25
26 forever
Network Layer 4-150
DBF example
[Figure: five-node graph with links A-B (7), A-E (1), B-C (1), B-E (8), C-D (2), and D-E (2).]

Initial distance vectors (distance to node; ~ = not yet known)
Info at node   A   B   C   D   E
A              0   7   ~   ~   1
B              7   0   1   ~   8
C              ~   1   0   2   ~
D              ~   ~   2   0   2
E              1   8   ~   2   0

E receives D’s routes; updates cost to C
E              1   8   4   2   0

A receives B’s update; updates cost to C, but cost to E unchanged
A              0   7   8   ~   1

A receives E’s routes; updates cost to C (new min) and D
A              0   7   5   3   1

And so on, until final distances....
Info at node   A   B   C   D   E
A              0   6   5   3   1
B              6   0   1   3   5
C              5   1   0   2   4
D              3   3   2   0   2
E              1   5   4   2   0
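The convergence in this example can be reproduced with a small simulation. It is a simplification under stated assumptions: nodes exchange vectors in synchronized rounds rather than asynchronously, each node keeps only its best distance per destination rather than a full per-neighbor table, and the function name `distance_vector` is hypothetical. The link costs are the ones implied by the initial distance vectors.

```python
INF = float("inf")


def distance_vector(costs):
    """Run synchronous distance-vector rounds until no estimate changes.

    costs: {node: {neighbor: link cost}}
    Returns {node: {destination: least known cost}}.
    """
    nodes = sorted(costs)
    # Initially each node knows only itself and its direct neighbors.
    dv = {x: {y: 0 if x == y else costs[x].get(y, INF) for y in nodes}
          for x in nodes}
    changed = True
    while changed:
        changed = False
        sent = {x: dict(dv[x]) for x in nodes}   # vectors advertised this round
        for x in nodes:
            for y in nodes:
                if x == y:
                    continue
                # Bellman-Ford equation over x's neighbors v.
                best = min(costs[x][v] + sent[v][y] for v in costs[x])
                if best < dv[x][y]:
                    dv[x][y] = best
                    changed = True
    return dv


COSTS = {
    "A": {"B": 7, "E": 1},
    "B": {"A": 7, "C": 1, "E": 8},
    "C": {"B": 1, "D": 2},
    "D": {"C": 2, "E": 2},
    "E": {"A": 1, "B": 8, "D": 2},
}
FINAL = distance_vector(COSTS)
```

The loop stops on its own once a full round changes nothing, which mirrors the "self-terminating: no signal to stop" property; the order of intermediate updates differs from the slides, but the final vectors should agree with the last table above.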
Network Layer 4-155
DBF example
[Figure: the same five-node graph with link costs 7, 1, 1, 2, 2, and 8.]

E’s routing table: cost to each destination by next hop
dest   via A   via B   via D
A      1       14      5
B      7       8       5
C      6       9       4
D      4       11      2
Network Layer 4-157
DBF (another example)
[Figure: three nodes with links X-Y (cost 2), Y-Z (cost 1), and X-Z (cost 7).]

$D^X(Y,Z) = c(X,Z) + \min_w \{ D^Z(Y,w) \} = 7 + 1 = 8$
$D^X(Z,Y) = c(X,Y) + \min_w \{ D^Y(Z,w) \} = 2 + 1 = 3$
Network Layer 4-158
DBF (good news example)
[Figure: three nodes with links X-Y (cost 4, changing to 1), Y-Z (cost 1), and X-Z (cost 50).]
Link cost changes:
• node detects local link cost change
• updates distance table (line 15)
• if cost change in least cost path, notify neighbors (lines 23,24)
• fast convergence (see book for details)
Algorithm terminates: “good news travels fast”
Network Layer 4-159
DBF (good news example)
[Figure: three nodes with links x-y (cost 4, changing to 1), y-z (cost 1), and x-z (cost 50).]
“good news travels fast”
At time t0, y detects the link-cost change, updates its DV, and informs its neighbors.
At time t1, z receives the update from y and updates its table. It computes a new least cost to x and sends its neighbors its DV.
At time t2, y receives z’s update and updates its distance table. y’s least costs do not change and hence y does not send any message to z.
Network Layer 4-160
DBF (count-to-infinity example)
[Figure: three nodes with links X-Y (cost 4, changing to 60), Y-Z (cost 1), and X-Z (cost 50).]
Link cost changes:
• good news travels fast
• bad news travels slow - “count to infinity” problem!
• alternate route implicitly used link that changed
Algorithm continues on!
Network Layer 4-161
DBF: (count-to-infinity example)
[Figure: three nodes A, B, C joined by links of cost 1, 1, and 25; one link toward A is marked X as failed. Each node's table lists dest and cost; only the entries for destination A at B and C change across the slides.]

Step                        B’s cost to A   C’s cost to A
Before failure              1               2
C Sends Routes to B         ~               2
B Updates Distance to A     3               2
B Sends Routes to C         3               4
C Sends Routes to B         5               4
Network Layer 4-166
Analyzing Distributed Bellman-Ford
• Continuously send local distance tables of best known routes to all neighbors until your table converges
  – Computation diffuses until all nodes converge
  – Will computation converge quickly and deterministically?
    • Not all the time, pathologic cases possible (count-to-infinity)
    • Several algorithms for minimizing such cases
Network Layer 4-167
How are loops caused?
• Observation 1:
  – B’s metric increases
• Observation 2:
  – C picks B as next hop to A
  – But, the implicit path from C to A includes itself!
Network Layer 4-168
Solutions to looping
• Split horizon
  – Do not advertise route to X to an adjacent neighbor if your route to X goes through that neighbor
  – If C routes through B to get to A, C does not advertise (C=>A) route to B.
• Poisoned reverse
  – Advertise an infinite distance route to X to an adjacent neighbor if your route to X goes through that neighbor
  – If C routes through B to get to A, C advertises to B that its distance to A is infinity
• Works for two node loops
  – Does not work for loops with more nodes
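The difference between plain advertisement, split horizon, and poisoned reverse comes down to how a node filters its vector per neighbor. The function `advertisement`, the `mode` strings, and the route-table shape below are hypothetical; the cost of 2 for C's route to A via B is an illustrative value.

```python
INF = float("inf")


def advertisement(routes, neighbor, mode="plain"):
    """Build the distance vector one node sends to a given neighbor.

    routes: {destination: (cost, next_hop)}
    mode:   'plain'            advertise every route as-is
            'split_horizon'    omit routes whose next hop is this neighbor
            'poisoned_reverse' advertise those routes with infinite cost
    """
    vector = {}
    for dest, (cost, next_hop) in routes.items():
        if next_hop == neighbor:
            if mode == "split_horizon":
                continue               # say nothing about this route
            if mode == "poisoned_reverse":
                cost = INF             # explicitly mark it unusable
        vector[dest] = cost
    return vector


# C routes through B to get to A (illustrative cost of 2).
ROUTES_AT_C = {"A": (2, "B")}
PLAIN = advertisement(ROUTES_AT_C, "B", "plain")
SPLIT = advertisement(ROUTES_AT_C, "B", "split_horizon")
POISON = advertisement(ROUTES_AT_C, "B", "poisoned_reverse")
```

In plain mode B hears "C reaches A at cost 2" and may route back through C after its own link fails; the other two modes keep B from ever learning that path, which covers the two-node loop but not longer ones.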
Network Layer 4-169
Split-horizon with poisoned reverse
If Z routes through Y to get to X:
• Z tells Y its (Z’s) distance to X is infinite (so Y won’t route to X via Z)
• will this completely solve count to infinity problem?
[Figure: three nodes with links X-Y (cost 4, changing to 60), Y-Z (cost 1), and X-Z (cost 50). Annotations: “route to X through Y goes thru Z: poison it!”; “new route to X not involving Y”; “can now select and advertise route to X via Z”; “algorithm terminates”.]
Network Layer 4-171
Solutions to looping
• Route poisoning
  – Advertise infinite cost on a route to everyone (not just next hop) when lowest cost route increases
  – Gets rid of stale information throughout network
  – Used in conjunction with Path Holddown
• Path Holddown
  – Freeze route for a fixed time
    • Do not switch to an alternate while route poisoning is happening
    • In our example, A and B delay changing and advertising new routes
    • A and B both set route to D to infinity after single step
  – Configuring holddown delay
    • Delay too large: Slow convergence
    • Delay too small: Count-to-infinity more probable
Network Layer 4-172
Solutions to looping
• Path vector
  – Select loop-free paths
  – Each route advertisement carries entire path
  – If a router sees itself in path, it rejects the route
  – BGP does it this way
  – Space proportional to diameter of network
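The path-vector loop check is a membership test on the advertised path. The helper names `accept_route` and `extend_path` are hypothetical, and router IDs are plain strings here rather than the AS numbers a protocol like BGP would carry.

```python
def accept_route(my_id, advertised_path):
    """Reject any advertised path that already contains this router."""
    return my_id not in advertised_path


def extend_path(my_id, advertised_path):
    """Prepend this router before re-advertising an accepted route."""
    return [my_id] + list(advertised_path)


# C hears two advertisements for destination A from neighbor B.
CLEAN = accept_route("C", ["B", "A"])          # B reaches A directly
LOOPED = accept_route("C", ["B", "C", "A"])    # B's path runs through C itself
READVERTISED = extend_path("C", ["B", "A"])
```

Because the whole path travels with the route, C can see that B's second offer depends on C itself and drop it, at the price of carrying a path whose length grows with the network diameter.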
Network Layer 4-173
Solutions to looping
• Do solutions completely eliminate loops?
  – No! Transient loops are still possible
  – Why? Because implicit path information may be stale
  – See this in BGP convergence
• Only way to fix this
  – Ensure that you have up-to-date information by explicitly querying
Network Layer 4-174
Link State vs. Distance Vector
Message complexity, network bandwidth
• LS: with n nodes, E links, O(nE) msgs sent
  – Send info about your neighbors to everyone
  – Small messages broadcast globally
• DV: exchange between neighbors only
  – Send everything you know to your neighbors
  – Large messages, but transfers only to neighbors
  – convergence time varies
Network Layer 4-175
Link State vs. Distance Vector
Speed of Convergence
• LS: O(n^2) algorithm requires O(nE) msgs
  – Faster – can forward LSPs before processing
  – Single SPT calculation
• DV: convergence time varies
  – Fast with triggered updates
  – count-to-infinity problem
  – may be routing loops
Network Layer 4-176
Link State vs. Distance Vector
Space requirements:
• LS
  – maintains entire topology
• DV
  – maintains only neighbor state
  – path vector maintains routes proportional to network diameter
Network Layer 4-177
Link State vs. Distance Vector
Robustness:
  – LS can broadcast incorrect/corrupted LSP
    • Can be made robust since sources are aware of alternate paths within topology
  – DV can advertise incorrect paths to all destinations
    • Incorrect calculation can spread to entire network
Network Layer 4-178
DUAL
r Diffusing Update Algorithm
m Garcia-Luna-Aceves 1989
m Goal: avoid transient loops in DV and LS algorithms
• Similar in flavor to route poisoning and path holddown
m 2 ideas
• A path shorter than current path cannot contain a loop
• Based on diffusing computation (Dijkstra-Scholten 1980)
– Wait until computation completes before changing routes in response to a new update
– Similar to path holddown
m 3 kinds of messages
• Update, query, reply
m 2 states for routers
• Active (queries outstanding), passive
Network Layer 4-179
DUAL
On update
if (lower cost) adopt
else if (higher cost) {
    if (from next hop) {
        if (any path exists < old length from next hop)
            switch path
        else {
            freeze route
            send query to all neighbors except next hop
            go into active
            wait for reply from all neighbors
            update route
            return to passive
        }
    }
    send reply to all querying neighbors
}
Network Layer 4-181
Hierarchical Routing
Our routing study thus far: an idealization
r all routers identical
r network “flat”
… not true in practice

scale: with 200 million destinations:
r can’t store all dest’s in routing tables!
r routing table exchange would swamp links!
r flat routing does not scale

administrative autonomy
r internet = network of networks
r each network admin may want to control routing in its own network
Network Layer 4-182
Routing Hierarchies
r Key observation
m Need less information with increasing distance to destination
r Two radically different approaches for routing
m The area hierarchy
m The landmark hierarchy
• Covered in advanced topics at end of course...
Network Layer 4-183
Areas
r Divide network into areas
m Areas can have nested sub-areas
m No path between two sub-areas of an area can exit that area
m Within area, each node has routes to every other node
m Outside area
• Each node has routes for other top-level areas only
• Inter-area packets are routed to nearest appropriate border router
• Can result in sub-optimal paths
r Hierarchically address nodes in a network
m Sequentially number top-level areas
m Sub-areas of area are labeled relative to that area
m Nodes are numbered relative to the smallest containing area
Network Layer 4-184
Hierarchical Routing on the Internet
r aggregate routers into regions, “autonomous systems” (AS)
m administrative autonomy
r routers in same AS run same routing protocol
m “intra-AS” routing protocol (IGP)
m routers in different AS can run different intra-AS routing protocol

Gateway router
m direct link to router in another AS
m special routers in AS
m run intra-AS routing protocol with all other routers in AS
m also responsible for routing to destinations outside AS
m run inter-AS routing protocol or exterior gateway protocol (EGP) with other gateway routers in other AS’s
Network Layer 4-185
Example #1
[Figure: hierarchy of areas 1 through 5 with sub-areas 1.1, 1.2, 2.1, 2.2, 2.2.1, 3.1, 3.2, 4.1, 4.2, 5.1, 5.2; links are labeled IGP or EGP]
Network Layer 4-186
Example #2
Gateways:
• perform inter-AS routing amongst themselves
• perform intra-AS routing with other routers in their AS

[Figure: ASes A, B, C with routers including A.a, A.c, B.a, C.b; gateway A.c runs both inter-AS and intra-AS routing in its network layer, above the link layer and physical layer]
Network Layer 4-187
Path Sub-optimality
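[Figure: area hierarchy (areas 1, 2, 3 with sub-areas 1.1, 1.2, 2.1, 2.2, 2.2.1, 3.1, 3.2); the start and end nodes are labeled 3.2.1 and 1.2.1; a 3-hop red path is compared with a 2-hop green path]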
Network Layer 4-188
AS Categories
r Stub: an AS that has only a single connection to one other AS; carries only local traffic.
r Multi-homed: an AS that has connections to more than one AS, but does not carry transit traffic
r Transit: an AS that has connections to more than one AS, and carries both transit and local traffic (under certain policy restrictions)
Network Layer 4-191
Intra-AS Routing
r Also known as Interior Gateway Protocols (IGP)
r Most common Intra-AS routing protocols:
m RIP: Routing Information Protocol
• Distance-vector
m OSPF: Open Shortest Path First
• Link-state
m IGRP: Interior Gateway Routing Protocol (Cisco proprietary)
• Distance-vector
Network Layer 4-193
RIP (Routing Information Protocol)
r Distance vector algorithm
m Distance metric: # of hops (max = 15 hops)
m Vectors exchanged every 30 sec and when triggered
m Static update period leads to synchronization problems
m Split horizon with poison reverse
r Included in BSD-UNIX Distribution in 1982
r RIP-2 in 1993 adds prefix mask for CIDR

[Figure: routers A, B, C, D and subnets u, v, w, x, y, z]

From router A to subnets:
destination   hops
u             1
v             2
w             2
x             3
y             3
z             2
Network Layer 4-194
RIP advertisements
rDistance vectors: exchanged among neighbors every 30 sec via Response Message (also called advertisement)
r Each advertisement: list of up to 25 destination nets within AS
Network Layer 4-195
RIP: Example
Routing table in D

Destination Network   Next Router   Num. of hops to dest.
w                     A             2
y                     B             2
z                     B             7
x                     --            1
….                    ….            ....

[Figure: routers A, B, C, D and networks w, x, y, z]
Network Layer 4-196
RIP: Example
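Advertisement from A to D

Dest   Next   hops
w      -      1
x      -      1
z      C      4
….     …      ...

Routing table in D (the entry for z changes from next router B, 7 hops, to next router A, 5 hops)

Destination Network   Next Router   Num. of hops to dest.
w                     A             2
y                     B             2
z                     A             5
x                     --            1
….                    ….            ....

[Figure: routers A, B, C, D and networks w, x, y, z]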
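The table update on this slide can be sketched in a few lines of Python. This is a minimal illustration, not RIP as implemented in any real daemon: the function name `apply_advertisement` and the dictionary layout are assumptions made for the example, and the rule "also accept a changed cost from the router we already use" is the usual distance-vector convention rather than something stated on the slide.

```python
INFINITY = 16  # RIP treats 16 hops as unreachable

def apply_advertisement(table, neighbor, advert):
    """Merge a neighbor's advertised hop counts into our routing table.

    table:  dest -> (next_router, hops); next_router None = directly attached
    advert: dest -> hops as seen from the neighbor
    """
    for dest, hops in advert.items():
        candidate = min(hops + 1, INFINITY)  # one extra hop to reach the neighbor
        current = table.get(dest)
        # Adopt the route if it is new, strictly shorter, or comes from the
        # router we already use for this destination (its cost may have changed).
        if current is None or candidate < current[1] or current[0] == neighbor:
            table[dest] = (neighbor, candidate)
    return table

# Routing table in D and the advertisement from A, as on the slide.
table_d = {"w": ("A", 2), "y": ("B", 2), "z": ("B", 7), "x": (None, 1)}
advert_from_a = {"w": 1, "x": 1, "z": 4}
apply_advertisement(table_d, "A", advert_from_a)
print(table_d["z"])  # ('A', 5)
```

Only the entry for z changes, because 4 + 1 = 5 hops via A beats the existing 7 hops via B.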
Network Layer 4-197
RIP: Link Failure and Recovery
If no advertisement heard after 180 sec --> neighbor/link declared dead
m routes via neighbor invalidated
m new advertisements sent to neighbors
m neighbors in turn send out new advertisements (if tables changed)
m link failure info quickly propagates to entire net
m poison reverse used to prevent ping-pong loops (infinite distance = 16 hops)
Network Layer 4-198
RIP Table processing
r RIP routing tables managed by application-level process called route-d (daemon)
r advertisements sent in UDP packets, periodically repeated

[Figure: two protocol stacks (physical, link, network (IP) with forwarding table, transport (UDP)), each with a routed process on top]
Network Layer 4-199
IGRP (Interior Gateway Routing Protocol)
r Cisco proprietary; successor of RIP (mid 80s)
m Distance Vector, like RIP
m several cost metrics (delay, bandwidth, reliability, load etc)
m 90 sec update with triggered updates
m Split horizon
• V1: path holddown
• V2: route poisoning
• multiple path support
m uses TCP to exchange routing updates
m EIGRP
• Loop-free routing via DUAL (based on diffusing computation)
• CIDR support
Network Layer 4-201
OSPF (Open Shortest Path First)
r “open”: publicly available
r Uses Link State algorithm
m LS packet dissemination
m Topology map at each node
m Route computation using Dijkstra’s algorithm
r OSPF advertisement carries one entry per neighbor router
r Advertisements disseminated to entire AS (via flooding)
m Carried in OSPF messages directly over IP (rather than TCP or UDP)
Network Layer 4-202
OSPF “advanced” features (not in RIP)
r Security: all OSPF messages authenticated (to prevent malicious intrusion)
r Multiple same-cost paths allowed (only one path in RIP)
r For each link, multiple cost metrics for different TOS (e.g., satellite link cost set “low” for best effort; high for real time)
r Integrated uni- and multicast support:
m Multicast OSPF (MOSPF) uses same topology database as OSPF
r Hierarchical OSPF in large domains.
Network Layer 4-203
Hierarchical OSPF
r Two-level hierarchy: local area, backbone.
m Link-state advertisements only in area
m each node has detailed area topology; only knows direction (shortest path) to nets in other areas.
r Area border routers: “summarize” distances to nets in own area, advertise to other Area Border routers.
r Backbone routers: run OSPF routing limited to backbone.
r Boundary routers: connect to other AS’s.
Network Layer 4-207
Why different Intra- and Inter-AS routing ?
Policy:
r Inter-AS: ISP wants control over how its traffic is routed and who routes through its net.
m Policy and monetary factors dominate over performance
r Intra-AS: single administrative policy
m No policy decisions needed; focus on performance
Scale:
r hierarchical routing saves table size, reduces update traffic
Network Layer 4-208
History
r Mid-80s: EGP (Exterior Gateway Protocol)
m Used in original ARPAnet
m Reachability protocol (no shortest path)
• Single bit for reachability information
m Topology restricted to a tree (no cycles allowed)
• ARPA-managed packet switches at top of tree
m Unacceptable once Internet grew to multiple independent backbones
r Result: BGP development
Network Layer 4-209
Inter-AS routing: BGP
r Link state or distance vector?
m Problems with distance-vector:
• Bellman-Ford algorithm may not converge
m More problems with link state:
• Everyone sees every link
– LS database too large (entire Internet)
– Can’t easily control who uses the network (i.e., an ISP may want to hide particular links from being used by others, but link states are broadcast)
• Metrics used by routers are not the same, which can cause loops
– No universal routing metric
– Policy drives routing decisions
Network Layer 4-210
BGP
r BGP (Border Gateway Protocol): the de facto standard
m Predecessor: EGP (Exterior Gateway Protocol)
r BGP provides each AS a means to:
1. Obtain subnet reachability information from neighboring ASs.
2. Propagate the reachability information to all routers internal to the AS.
3. Determine “good” routes to subnets based on reachability information and policy.
r Allows a subnet to advertise its existence to rest of the Internet: “I am here”
Network Layer 4-211
BGP messages
r BGP messages exchanged using TCP.
m Advantages:
• Simplifies BGP
• No need for periodic refresh: routes are valid until withdrawn, or the connection is lost
• Note recent news on BGP TCP spoofing attack
• Incremental updates
m Disadvantages
• Congestion control on a routing protocol?
• Poor interaction during high load (Code Red)
m BGP messages:
• OPEN: opens TCP connection to peer and authenticates sender
• UPDATE: advertises new path (or withdraws old)
• KEEPALIVE: keeps connection alive in absence of UPDATES; also ACKs OPEN request
• NOTIFICATION: reports errors in previous msg; also used to close connection
Network Layer 4-212
BGP
r Path Vector protocol:
m similar to Distance Vector protocol
m each Border Gateway broadcasts to neighbors (peers) the entire path (i.e., sequence of ASs) to destination
• E.g., Gateway X sends its path to dest. Z:
– Path (X,Z) = X,Y1,Y2,Y3,…,Z
m When an AS gets a route, it checks whether it is already in the path
• If yes, reject route
• If no, add self and (possibly) advertise route further
m Allows for policy application (different metrics)
• Metrics are local: AS chooses path, protocol ensures no loops
r Supports CIDR aggregation (BGP4)
r Supports alternative routes
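The loop check described above fits in a few lines. The sketch below is illustrative only: `process_advertisement` is a hypothetical helper, AS names are plain strings, and policy (the "possibly" in "possibly advertise further") is left out.

```python
def process_advertisement(my_as, as_path):
    """Return the path to re-advertise, or None if the route is rejected.

    as_path is the sequence of ASs carried in the received advertisement.
    """
    if my_as in as_path:
        return None              # our own AS is already in the path: a loop
    return [my_as] + as_path     # add self before advertising further

# Gateway X hears a path to Z from Y1 and may pass it on with itself prepended.
accepted = process_advertisement("X", ["Y1", "Y2", "Y3", "Z"])
# Y2 hears a path that already went through Y2 and must reject it.
rejected = process_advertisement("Y2", ["X", "Y1", "Y2", "Y3", "Z"])
print(accepted, rejected)
```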
Network Layer 4-213
BGP basics
r Pairs of routers (BGP peers) exchange routing info over semi-permanent TCP connections: BGP sessions
r Note that BGP sessions do not correspond to physical links.
r When AS2 advertises a prefix to AS1, AS2 is promising it will forward any datagrams destined to that prefix towards the prefix.
m AS2 can aggregate prefixes in its advertisement

[Figure: AS1 (routers 1a, 1b, 1c, 1d), AS2 (2a, 2b, 2c) and AS3 (3a, 3b, 3c), with eBGP sessions and iBGP sessions marked]
Network Layer 4-214
Distributing reachability info
r With eBGP session between 3a and 1c, AS3 sends prefix reachability info to AS1.
r 1c can then use iBGP to distribute this new prefix reach info to all routers in AS1
r 1b can then re-advertise the new reach info to AS2 over the 1b-to-2a eBGP session
r When router learns about a new prefix, it creates an entry for the prefix in its forwarding table.

[Figure: AS1 (routers 1a, 1b, 1c, 1d), AS2 (2a, 2b, 2c) and AS3 (3a, 3b, 3c), with eBGP sessions and iBGP sessions marked]
Network Layer 4-215
Policy with BGP
r BGP provides capability for enforcing various policies
r Policies are not part of BGP: they are provided to BGP as configuration information
r BGP enforces policies by choosing paths from multiple alternatives and controlling advertisement to other AS’s
Network Layer 4-216
Path Selection Criteria
r Path attributes + external (policy) information
r Examples:
m Hop count
m Policy considerations
• Preference for AS
• Presence or absence of certain AS
m Path origin
m Link dynamics
m Early-exit
• Hot-potato routing for transit packets
Network Layer 4-217
Examples of BGP Policies
r A multi-homed AS refuses to act as transit
m Limit path advertisement
r A multi-homed AS can become transit for some AS’s
m Only advertise paths to some AS’s
r An AS can favor or disfavor certain AS’s for traffic transit from itself
Network Layer 4-218
BGP routing policy
[Figure 4.5-BGPnew: a simple BGP scenario with provider networks A, B, C and customer networks W, X, Y]

r A, B, C are provider networks
r X, W, Y are customers (of provider networks)
r X is dual-homed: attached to two networks
m X does not want to route from B via X to C
m .. so X will not advertise to B a route to C
Network Layer 4-219
BGP routing policy (2)
[Figure 4.5-BGPnew: a simple BGP scenario with provider networks A, B, C and customer networks W, X, Y]

r A advertises to B the path AW
r B advertises to X the path BAW
r Should B advertise to C the path BAW?
m No way! B gets no “revenue” for routing CBAW since neither W nor C are B’s customers
m B wants to force C to route to W via A
m B wants to route only to/from its customers!
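One way to read the policy on these two slides is as a single export rule: pass a route on only if it was learned from a customer or is being sent to a customer. The sketch below encodes that reading; `should_export` and the customer sets are assumptions made for illustration (B's only customer is taken to be X, and X has no customers of its own), not BGP configuration syntax.

```python
def should_export(learned_from, neighbor, customers):
    """Export a route only if it came from a customer or goes to a customer."""
    return learned_from in customers or neighbor in customers

b_customers = {"X"}    # assumed from the figure: X is attached to B
x_customers = set()    # X is a customer network with no customers

# B learned path AW from A (not a customer of B).
b_to_x = should_export("A", "X", b_customers)   # X is B's customer
b_to_c = should_export("A", "C", b_customers)   # neither A nor C is B's customer
# X learned a route to C from C; it will not advertise it to B.
x_to_b = should_export("C", "B", x_customers)
print(b_to_x, b_to_c, x_to_b)
```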
Network Layer 4-221
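Interplay between routing and forwarding

[Figure: the routing algorithm fills the local forwarding table; the value 0111 in an arriving packet’s header selects one of the router’s output links 1, 2, 3]

local forwarding table
header value   output link
0100           3
0101           2
0111           2
1001           1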
Network Layer 4-222
Dijkstra’s algorithm: example
Step   N'       D(v),p(v)   D(w),p(w)   D(x),p(x)   D(y),p(y)   D(z),p(z)
0      u        2,u         5,u         1,u         ∞           ∞
1      ux       2,u         4,x                     2,x         ∞
2      uxy      2,u         3,y                                 4,y
3      uxyv                 3,y                                 4,y
4      uxyvw                                                    4,y
5      uxyvwz

[Figure: six-node graph with nodes u, v, w, x, y, z and weighted links]
Network Layer 4-223
Dijkstra’s algorithm: example (2)
[Figure: resulting shortest-path tree from u over nodes u, v, w, x, y, z]

Resulting forwarding table in u:
destination   link
v             (u,v)
x             (u,x)
y             (u,x)
w             (u,x)
z             (u,x)
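The example can be reproduced with a short priority-queue version of Dijkstra’s algorithm. The link costs below are an assumption: the figure is not recoverable from the slide text, so they are taken from the usual textbook graph and checked against the D(·),p(·) table above. Function names are hypothetical.

```python
import heapq

def dijkstra(graph, source):
    """Return (dist, prev) for least-cost paths from source."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue              # stale heap entry
        done.add(node)
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nbr not in dist or nd < dist[nbr]:
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    return dist, prev

def first_link(prev, source, dest):
    """Walk predecessors back from dest to find the outgoing link at source."""
    node = dest
    while prev[node] != source:
        node = prev[node]
    return (source, node)

# Assumed link costs (undirected), consistent with the table on the slide.
edges = [("u", "v", 2), ("u", "x", 1), ("u", "w", 5), ("v", "x", 2), ("v", "w", 3),
         ("x", "w", 3), ("x", "y", 1), ("w", "y", 1), ("w", "z", 5), ("y", "z", 2)]
graph = {}
for a, b, c in edges:
    graph.setdefault(a, {})[b] = c
    graph.setdefault(b, {})[a] = c

dist, prev = dijkstra(graph, "u")
forwarding = {d: first_link(prev, "u", d) for d in "vxywz"}
print(dist)
print(forwarding)
```

The final distances match the last filled entry in each column of the table, and the forwarding table sends everything except v over link (u,x).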
Network Layer 4-224
Distance Vector Algorithm
r Dx(y) = estimate of least cost from x to y
r Distance vector: Dx = [Dx(y): y ∈ N]
r Node x knows cost to each neighbor v: c(x,v)
r Node x maintains Dx = [Dx(y): y ∈ N]
r Node x also maintains its neighbors’ distance vectors
m For each neighbor v, x maintains Dv = [Dv(y): y ∈ N]
Network Layer 4-225
[Figure: three-node network with link costs c(x,y) = 2, c(y,z) = 1, c(x,z) = 7; the node x, node y and node z tables are shown evolving over time]

Initial distance vectors (cost to x, y, z; each node’s rows for the other nodes start at ∞):
node x table: Dx = [0, 2, 7]
node y table: Dy = [2, 0, 1]
node z table: Dz = [7, 1, 0]

After receiving its neighbors’ vectors, node x recomputes:
Dx(y) = min{c(x,y) + Dy(y), c(x,z) + Dz(y)} = min{2+0, 7+1} = 2
Dx(z) = min{c(x,y) + Dy(z), c(x,z) + Dz(z)} = min{2+1, 7+0} = 3

Final tables at every node (cost to x, y, z):
from x: 0 2 3
from y: 2 0 1
from z: 3 1 0
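The two min{…} computations above are one application of the Bellman-Ford equation at node x. A minimal sketch, with `dv_update` as a hypothetical helper name and the link costs and neighbor vectors taken from the example:

```python
def dv_update(me, link_costs, neighbor_vectors, destinations):
    """Apply D_me(y) = min over neighbors v of c(me,v) + D_v(y)."""
    result = {}
    for y in destinations:
        if y == me:
            result[y] = 0
        else:
            result[y] = min(link_costs[v] + neighbor_vectors[v][y] for v in link_costs)
    return result

link_costs_x = {"y": 2, "z": 7}                      # c(x,y), c(x,z)
neighbor_vectors = {
    "y": {"x": 2, "y": 0, "z": 1},                   # Dy as advertised by y
    "z": {"x": 7, "y": 1, "z": 0},                   # Dz as advertised by z
}
dx = dv_update("x", link_costs_x, neighbor_vectors, ["x", "y", "z"])
print(dx)  # {'x': 0, 'y': 2, 'z': 3}
```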
Network Layer 4-226
VC implementation
A VC consists of:
1. Path from source to destination
2. VC numbers, one number for each link along path
3. Entries in forwarding tables in routers along path
r Packet belonging to VC carries a VC number.
r VC number must be changed on each link.
m New VC number comes from forwarding table
Network Layer 4-227
Forwarding table

[Figure: a VC crossing routers, with VC numbers 12, 22, 32 on successive links and router interfaces numbered 1, 2, 3]

Forwarding table in northwest router:

Incoming interface   Incoming VC #   Outgoing interface   Outgoing VC #
1                    12              3                    22
2                    63              1                    18
3                    7               2                    17
1                    97              3                    87
…                    …               …                    …

Routers maintain connection state information!
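As a rough illustration, the northwest router’s table is a mapping keyed by (incoming interface, incoming VC number). The dictionary layout and the `switch_packet` name are assumptions for the sketch.

```python
# (incoming interface, incoming VC #) -> (outgoing interface, outgoing VC #)
vc_table = {
    (1, 12): (3, 22),
    (2, 63): (1, 18),
    (3, 7): (2, 17),
    (1, 97): (3, 87),
}

def switch_packet(in_interface, in_vc):
    """Look up the outgoing interface and rewrite the VC number."""
    return vc_table[(in_interface, in_vc)]

out = switch_packet(1, 12)
print(out)  # (3, 22): leave on interface 3 carrying VC number 22
```

Each entry exists only while its VC is set up, which is the connection state the slide refers to.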
Network Layer 4-228
Forwarding table
Destination Address Range                    Link Interface

11001000 00010111 00010000 00000000
  through                                    0
11001000 00010111 00010111 11111111

11001000 00010111 00011000 00000000
  through                                    1
11001000 00010111 00011000 11111111

11001000 00010111 00011001 00000000
  through                                    2
11001000 00010111 00011111 11111111

otherwise                                    3

4 billion possible entries
Network Layer 4-229
Longest prefix matching
Prefix Match                    Link Interface
11001000 00010111 00010         0
11001000 00010111 00011000      1
11001000 00010111 00011         2
otherwise                       3

Examples
DA: 11001000 00010111 00010110 10100001   Which interface?
DA: 11001000 00010111 00011000 10101010   Which interface?
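The two example addresses can be checked with a straightforward linear scan. This is only a sketch of the matching rule; real routers use tries or TCAMs rather than a loop, and the names `PREFIX_TABLE` and `lookup` are made up for the example.

```python
PREFIX_TABLE = [
    ("11001000 00010111 00010", 0),
    ("11001000 00010111 00011000", 1),
    ("11001000 00010111 00011", 2),
]
DEFAULT_INTERFACE = 3  # the "otherwise" row

def lookup(dest_bits):
    """Return the interface of the longest prefix that matches dest_bits."""
    addr = dest_bits.replace(" ", "")
    best_len, best_if = -1, DEFAULT_INTERFACE
    for prefix, interface in PREFIX_TABLE:
        p = prefix.replace(" ", "")
        if addr.startswith(p) and len(p) > best_len:
            best_len, best_if = len(p), interface
    return best_if

first = lookup("11001000 00010111 00010110 10100001")
second = lookup("11001000 00010111 00011000 10101010")
print(first, second)  # 0 1
```

The second address matches both the 21-bit and the 24-bit prefix; the longer one wins, so it goes to interface 1.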
Network Layer 4-230
RIP Table example (continued)
Router: giroflee.eurocom.fr
• Three attached class C networks (LANs)
• Router only knows routes to attached LANs
• Default router used to “go up”
• Route multicast address: 224.0.0.0
• Loopback interface (for debugging)

Destination      Gateway          Flags  Ref  Use     Interface
---------------  ---------------  -----  ---  ------  ---------
127.0.0.1        127.0.0.1        UH     0    26492   lo0
192.168.2.       192.168.2.5      U      2    13      fa0
193.55.114.      193.55.114.6     U      3    58503   le0
192.168.3.       192.168.3.5      U      2    25      qaa0
224.0.0.0        193.55.114.6     U      3    0       le0
default          193.55.114.129   UG     0    143454
Network Layer 4-232
BGP route selection
r Router may learn about more than 1 route to some prefix. Router must select route.
r Elimination rules:
1. Local preference value attribute: policy decision
2. Shortest AS-PATH
3. Closest NEXT-HOP router: hot potato routing
4. Additional criteria
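The elimination rules can be sketched as a lexicographic comparison. The sketch assumes that a higher local preference wins (as in BGP), that "closest NEXT-HOP" means lowest intra-AS cost to that router, and it omits the "additional criteria". The route dictionaries, AS numbers and costs are invented for the example.

```python
def select_route(routes, intra_as_cost):
    """Pick one route: highest local pref, then shortest AS-PATH, then closest NEXT-HOP."""
    return min(
        routes,
        key=lambda r: (-r["local_pref"], len(r["as_path"]), intra_as_cost[r["next_hop"]]),
    )

routes = [
    {"local_pref": 100, "as_path": ["AS2", "AS5"], "next_hop": "1b"},
    {"local_pref": 100, "as_path": ["AS3", "AS4", "AS5"], "next_hop": "1c"},
    {"local_pref": 100, "as_path": ["AS3", "AS5"], "next_hop": "1c"},
]
intra_as_cost = {"1b": 3, "1c": 1}   # hypothetical least-cost distances inside the AS

best = select_route(routes, intra_as_cost)
print(best)
```

All three routes tie on local preference, two tie on AS-PATH length, and the closer NEXT-HOP (1c) breaks the tie.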
Network Layer 4-233
Path attributes & BGP routes
r When advertising a prefix, advert includes BGP attributes.
m prefix + attributes = “route”
r Two important attributes:
m AS-PATH: contains the ASs through which the advert for the prefix passed: AS 67 AS 17
m NEXT-HOP: indicates the specific internal-AS router to next-hop AS. (There may be multiple links from current AS to next-hop AS.)
r When gateway router receives route advert, uses import policy to accept/decline.
Network Layer 4-234
Interconnected ASes

[Figure: AS1 (routers 1a, 1b, 1c, 1d), AS2 (2a, 2b, 2c) and AS3 (3a, 3b, 3c); inside a router, the intra-AS routing algorithm and the inter-AS routing algorithm both feed the forwarding table]

r Forwarding table is configured by both intra- and inter-AS routing algorithms
m Intra-AS sets entries for internal dests
m Inter-AS & intra-AS set entries for external dests
Network Layer 4-235
Inter-AS tasks

[Figure: AS1 (routers 1a, 1b, 1c, 1d), AS2 (2a, 2b, 2c) and AS3 (3a, 3b, 3c)]

r Suppose router in AS1 receives datagram for which dest is outside of AS1
m Router should forward packet towards one of the gateway routers, but which one?

AS1 needs:
1. to learn which dests are reachable through AS2 and which through AS3
2. to propagate this reachability info to all routers in AS1
Job of inter-AS routing!
Network Layer 4-236
Example: Setting forwarding table in router 1d
r Suppose AS1 learns from the inter-AS protocol that subnet x is reachable from AS3 (gateway 1c) but not from AS2.
r Inter-AS protocol propagates reachability info to all internal routers.
r Router 1d determines from intra-AS routing info that its interface I is on the least cost path to 1c.
r Puts in forwarding table entry (x,I).
Network Layer 4-237
Example: Choosing among multiple ASes

r Now suppose AS1 learns from the inter-AS protocol that subnet x is reachable from AS3 and from AS2.
r To configure forwarding table, router 1d must determine towards which gateway it should forward packets for dest x.
r This is also the job of the inter-AS routing protocol!
r Hot potato routing: send packet towards closest of two routers.

Steps:
1. Learn from inter-AS protocol that subnet x is reachable via multiple gateways
2. Use routing info from intra-AS protocol to determine costs of least-cost paths to each of the gateways
3. Hot potato routing: choose the gateway that has the smallest least cost
4. Determine from forwarding table the interface I that leads to least-cost gateway. Enter (x,I) in forwarding table
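The four steps can be traced in code for router 1d. The gateway names 1b and 1c come from the slides; the intra-AS costs and interface labels are hypothetical values chosen for illustration, as is the function name.

```python
def hot_potato_entry(subnet, gateways, cost_to, interface_to):
    """Build the forwarding entry (subnet, I) for the closest gateway."""
    closest = min(gateways, key=lambda g: cost_to[g])   # smallest least cost
    return (subnet, interface_to[closest])

# Step 1: inter-AS protocol says x is reachable via gateways 1b (AS2) and 1c (AS3).
gateways_for_x = ["1b", "1c"]
# Step 2: intra-AS least-cost distances from 1d (hypothetical values).
cost_from_1d = {"1b": 2, "1c": 1}
# Step 4 input: interface on 1d's least-cost path to each gateway (hypothetical labels).
interface_from_1d = {"1b": "I2", "1c": "I1"}

entry = hot_potato_entry("x", gateways_for_x, cost_from_1d, interface_from_1d)
print(entry)  # ('x', 'I1')
```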
Network Layer 4-239
Router Architecture Overview
Two key router functions:
r Routing
m Determine route taken by packets from source to destination
m Run protocol (RIP, OSPF, BGP)
• Generate forwarding table from routing algorithms
• Algorithms based on either LS or DV
r Forwarding
m Process of moving packets from input port to output port
m Look up forwarding table given information in packet
m Switch/forward datagrams from incoming to outgoing link based on route
Network Layer 4-240
What Does a Router Look Like?
r Routing processor/controller
m Handles routing protocols, error conditions
r Line cards
m Network interface cards
r Forwarding engine
m Fast path routing (hardware vs. software)
r Backplane
m Switch or bus interconnect
Network Layer 4-241
Typical mode of operation
r Packet arrives at inbound line card
r Header transferred to forwarding engine
r Forwarding engine determines output interface given a table initialized by routing processor
r Forwarding engine signals result to line card
r Packet copied to outbound line card
Network Layer 4-242
Routing Processor
r Runs routing protocol
r Uploads forwarding table to forwarding engines
m Forwarding engines with two forwarding tables to allow easy switchover (double buffering)
r Typically performs “slow-path” processing
m ICMP error messages
m IP option processing
m IP fragmentation
m IP multicast packets
Network Layer 4-243
Input Port Functions
Physical layer: bit-level reception
Data link layer: e.g., Ethernet (see chapter 5)
Decentralized switching:
r given datagram dest., look up output port using forwarding table in input port memory
r goal: complete input port processing at ‘line speed’
r queuing: if datagrams arrive faster than forwarding rate into switch fabric
Network Layer 4-244
Input Port Queuing
r Fabric slower than input ports combined => queuing may occur at input queues
r Head-of-the-Line (HOL) blocking: queued datagram at front of queue prevents others in queue from moving forward
r queueing delay and loss due to input buffer overflow!
Network Layer 4-245
Input Port Queuing
r Possible solution
m Virtual output buffering
• Maintain per-output buffer at input
• Solves head-of-line blocking problem
• Each of MxN input buffers places bid for output
Network Layer 4-246
Forwarding Engine
r Two major components
m Lookup logic/software
• Data structures and algorithms to look up route table
• See previous section on IP route lookup
m Caches
• Small, fast memory storing recent lookups
m Alternatives
• Hardware support
• Hints
Network Layer 4-247
Caches
r Leverage temporal locality
r Many packets to same destination
m Long flows help, short flows do not
r Similar to idea behind IP switching (ATM/MPLS) where long-lived flows map into single label
r Example
m Partridge et al., “A 50-Gb/s IP Router”, IEEE/ACM Trans. on Networking, Vol. 6, No. 3, June 1998.
m 8KB L1 Icache
• Holds full forwarding code
m 96KB L2 cache
• Forwarding table cache
m 16MB L3 cache
• Full forwarding table x 2, double buffered for updates
Network Layer 4-248
Alternatives
r Lookup via content addressable memory (CAM)
m Hardware based route lookup
m Input = tag, output = value associated with tag
m Requires exact match with tag
• Multiple cycles (1 per prefix length searched) with single CAM
• Multiple CAMs (1 per prefix) searched in parallel
m Ternary CAM
• 0, 1, don’t care values in tag match
• Priority (i.e., longest prefix) by order of entries in CAM
r “Spatial caching” via protocol acceleration
m Add clue (5 bits) to IP header
m Indicate where IP lookup ended on previous node (Bremler-Barr SIGCOMM 99)
Network Layer 4-249
Types of network switching fabrics
[Figure: switching fabric types: memory, bus, multistage interconnection, crossbar interconnection]
Network Layer 4-250
Types of network switching fabrics
r Issues
m Switch contention
• Packets arrive faster than switching fabric can switch
• Speed of switching fabric versus line card speed determines input queuing vs. output queuing
Network Layer 4-251
Switching Via Memory
First generation routers:
r packet copied by system’s (single) CPU
r 2 bus crossings per datagram
r speed limited by memory bandwidth
Second generation routers:
r input port processor performs lookup, copy into memory
r Cisco Catalyst 8500

[Figure: input port and output port connected to memory over the system bus]
Network Layer 4-252
Switching Via Bus
r Datagram from input port memory directly to output port memory via a shared bus
r Issues
m Bus contention: switching speed limited by bus bandwidth
r Examples
m 1 Gbps bus, Cisco 1900: sufficient speed for access and enterprise routers (not regional or backbone)
Network Layer 4-253
Switching Via An Interconnection Network
r Overcome bus bandwidth limitations
r Crossbar networks
m Fully connected (n^2 elements)
m All one-to-one, invertible permutations supported
r Issues
m Crossbar with n^2 elements hard to scale
Network Layer 4-254
Switching Via An Interconnection Network
r Multi-stage interconnection networks (Banyan)
m Initially developed to connect processors in multiprocessor
m Typically O(n log n) elements
m Datagram fragmented into fixed-length cells, switched through the fabric
r Issues
m Blocking (not all one-to-one, invertible permutations supported)
r Example
m Cisco 12000: Gbps through an interconnection network

[Figure: interconnection network connecting A, B, C, D to W, X, Y, Z]
Network Layer 4-255
Output Ports
r Output contention
m Datagrams arrive from fabric faster than output port’s transmission rate
m Buffering required
m Scheduling discipline chooses among queued datagrams for transmission
Network Layer 4-256
Output port queueing
r buffering when arrival rate via switch exceeds output line speed
r queueing (delay) and loss due to output port buffer overflow!
Network Layer 4-258
Broadcast Routing
r Deliver packets from source to all other nodes
r Source duplication is inefficient
r Source duplication: how does source determine recipient addresses?

[Figure: routers R1, R2, R3, R4 shown twice: source duplication (all duplicates created and transmitted at the source) versus in-network duplication (duplicates created inside the network)]
Network Layer 4-259
In-network duplication
r Flooding: when node receives brdcst pckt, sends copy to all neighbors
m Problems: cycles & broadcast storm
r Controlled flooding: node only brdcsts pkt if it hasn’t brdcst same packet before
m Node keeps track of pckt ids already brdcsted
m Or reverse path forwarding (RPF): only forward pckt if it arrived on shortest path between node and source
r Spanning tree
m No redundant packets received by any node
Spanning Tree
r First construct a spanning tree
r Nodes forward copies only along spanning tree

[Figure: spanning tree over nodes A, B, c, D, E, F, G; (a) broadcast initiated at A, (b) broadcast initiated at D]
Network Layer 4-261
Spanning Tree: Creation
r Center node
r Each node sends unicast join message to center node
m Message forwarded until it arrives at a node already belonging to spanning tree

[Figure: (a) stepwise construction of spanning tree over nodes A, B, c, D, E, F, G, with steps numbered 1 to 5; (b) constructed spanning tree]
Multicast Routing: Problem Statement
r Goal: find a tree (or trees) connecting routers having local mcast group members
m tree: not all paths between routers used
m source-based: different tree from each sender to rcvrs
m shared-tree: same tree used by all group members

[Figure: shared tree versus source-based trees]
Approaches for building mcast trees
Approaches:
r source-based tree: one tree per source
m shortest path trees
m reverse path forwarding
r group-shared tree: group uses one tree
m minimal spanning (Steiner)
m center-based trees

…we first look at basic approaches, then specific protocols adopting these approaches
Shortest Path Tree
r mcast forwarding tree: tree of shortest path routes from source to all receivers
m Dijkstra’s algorithm

[Figure: source S and routers R1 to R7; forwarding links are numbered 1 to 6. Legend: router with attached group member; router with no attached group member; link used for forwarding, where i indicates the order in which the link is added by the algorithm]
Reverse Path Forwarding
q rely on router’s knowledge of unicast shortest path from it to sender
q each router has simple forwarding behavior:

if (mcast datagram received on incoming link on shortest path back to sender)
    then flood datagram onto all outgoing links
    else ignore datagram
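The forwarding rule above translates almost directly into code. In this sketch, `unicast_next_hop` stands in for the router’s unicast routing table, and the link names are hypothetical; "all outgoing links" is taken to mean every link except the one the datagram arrived on.

```python
def rpf_forward(arrival_link, source, unicast_next_hop, links):
    """Return the links to flood on, or an empty list if the datagram is ignored."""
    if arrival_link != unicast_next_hop[source]:
        return []                                   # not on the reverse shortest path
    return [l for l in links if l != arrival_link]  # flood on all other links

router_links = ["to_R1", "to_R3", "to_R5"]          # hypothetical links of one router
unicast_next_hop = {"S": "to_R1"}                   # shortest path back to source S

on_path = rpf_forward("to_R1", "S", unicast_next_hop, router_links)
off_path = rpf_forward("to_R3", "S", unicast_next_hop, router_links)
print(on_path, off_path)
```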
Reverse Path Forwarding: example
• result is a source-specific reverse SPT
– may be a bad choice with asymmetric links

[Figure: source S and routers R1 to R7. Legend: router with attached group member; router with no attached group member; datagram will be forwarded; datagram will not be forwarded]
Reverse Path Forwarding: pruning
r forwarding tree contains subtrees with no mcast group members
m no need to forward datagrams down subtree
m “prune” msgs sent upstream by router with no downstream group members

[Figure: source S and routers R1 to R7, with prune messages (P) sent upstream. Legend: router with attached group member; router with no attached group member; prune message; links with multicast forwarding]
Shared-Tree: Steiner Tree
r Steiner Tree: minimum cost tree connecting all routers with attached group members
r problem is NP-complete
r excellent heuristics exist
r not used in practice:
m computational complexity
m information about entire network needed
m monolithic: rerun whenever a router needs to join/leave
Center-based trees
r single delivery tree shared by all
r one router identified as “center” of tree
r to join:
m edge router sends unicast join-msg addressed to center router
m join-msg “processed” by intermediate routers and forwarded towards center
m join-msg either hits existing tree branch for this center, or arrives at center
m path taken by join-msg becomes new branch of tree for this router
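The join procedure can be sketched as a walk along unicast next hops until the tree is reached. The router names follow the example slides, but the next-hop table below is a hypothetical topology, and the tree is modeled simply as a set of routers that starts with the center.

```python
def join(tree, router, next_hop_to_center):
    """Graft router onto the shared tree; return the new branch (list of routers)."""
    branch = []
    node = router
    while node not in tree:           # stop at the center or an existing branch
        branch.append(node)
        node = next_hop_to_center[node]
    tree.update(branch)               # path taken by the join-msg becomes a branch
    return branch

center = "R6"
tree = {center}
next_hop_to_center = {"R1": "R2", "R2": "R5", "R3": "R4", "R4": "R6", "R5": "R6"}

b1 = join(tree, "R3", next_hop_to_center)   # travels R3 -> R4 -> center
b2 = join(tree, "R2", next_hop_to_center)   # travels R2 -> R5 -> center
b3 = join(tree, "R1", next_hop_to_center)   # hits the existing branch at R2
print(b1, b2, b3)
```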
Center-based trees: an example
Suppose R6 chosen as center:

[Figure: routers R1 to R7 with join paths numbered 1 to 3. LEGEND: router with attached group member; router with no attached group member; path order in which join messages generated]
Internet Multicasting Routing: DVMRP
r DVMRP: distance vector multicast routing protocol, RFC1075
r flood and prune: reverse path forwarding, source-based tree
m RPF tree based on DVMRP’s own routing tables constructed by communicating DVMRP routers
m no assumptions about underlying unicast
m initial datagram to mcast group flooded everywhere via RPF
m routers not wanting group: send upstream prune msgs
DVMRP: continued…
r soft state: DVMRP router periodically (1 min.) “forgets” branches are pruned:
m mcast data again flows down unpruned branch
m downstream router: reprune or else continue to receive data
r routers can quickly regraft to tree
m following IGMP join at leaf
r odds and ends
m commonly implemented in commercial routers
m Mbone routing done using DVMRP
Tunneling
Q: How to connect “islands” of multicast routers in a “sea” of unicast routers?
q mcast datagram encapsulated inside “normal” (non-multicast-addressed) datagram
q normal IP datagram sent thru “tunnel” via regular IP unicast to receiving mcast router
q receiving mcast router unencapsulates to get mcast datagram

[Figure: physical topology versus logical topology]
PIM: Protocol Independent Multicast
r not dependent on any specific underlying unicast routing algorithm (works with all)
r two different multicast distribution scenarios:
Dense:
q group members densely packed, in “close” proximity
q bandwidth more plentiful
Sparse:
q # networks with group members small wrt # interconnected networks
q group members “widely dispersed”
q bandwidth not plentiful
Consequences of Sparse-Dense Dichotomy:
Dense:
r group membership by routers assumed until routers explicitly prune
r data-driven construction of mcast tree (e.g., RPF)
r bandwidth and non-group-router processing profligate
Sparse:
r no membership until routers explicitly join
r receiver-driven construction of mcast tree (e.g., center-based)
r bandwidth and non-group-router processing conservative
PIM - Dense Mode
flood-and-prune RPF, similar to DVMRP but
q underlying unicast protocol provides RPF info for incoming datagram
q less complicated (less efficient) downstream flood than DVMRP reduces reliance on underlying routing algorithm
q has protocol mechanism for router to detect it is a leaf-node router
PIM - Sparse Mode
r center-based approach
r router sends join msg to rendezvous point (RP)
m intermediate routers update state and forward join
r after joining via RP, router can switch to source-specific tree
m increased performance: less concentration, shorter paths
[Figure: routers R1-R7; join messages sent toward the rendezvous point; all data multicast from rendezvous point]
PIM - Sparse Mode
sender(s):
r unicast data to RP, which distributes down RP-rooted tree
r RP can extend mcast tree upstream to source
r RP can send stop msg if no attached receivers
m “no one is listening!”
[Figure: routers R1-R7; join messages sent toward the rendezvous point; all data multicast from rendezvous point]
NL: Advanced topics
r Routing synchronization
r Routing instability
r Routing metrics
r Overlay networks
r Routing alternatives: Landmark routing
NL: Routing Update Synchronization
r Dynamic robustness issue to consider...
m Intuitive assumption that independent streams will not synchronize is not always valid
m Abrupt transition from unsynchronized to synchronized system states
NL: How Synchronization Occurs
[Figure: timelines for router A with period T; a message from B arrives at A]
r Weak coupling when A’s behavior is triggered off of B’s message arrival!
r Weak coupling can result in eventual synchronization
NL: Examples/Sources of Synchronization
r TCP congestion window behavior
r Periodic transmission by audio/video applications
r Synchronized client restart
r Routing
m Periodic routing protocol messages from different routers
m Lots of this in initial routing protocols....
NL: Routing Source of Synchronization
r Router resets timer after processing its own and incoming updates
r Creates weak coupling among routers
r Solutions
m Set timer based on clock event that is not a function of processing other routers’ updates, or
m Add randomization, or reset timer before processing update
• With increasing randomization, abrupt transition from predominantly synchronized to predominantly unsynchronized
• Most protocols now incorporate some form of randomization
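The randomization remedy can be sketched as a jittered timer. The function name, the uniform distribution, and the default jitter fraction below are illustrative assumptions; the slides do not prescribe a particular distribution or range.

```python
import random

def next_update_time(timer_expiry, period, jitter_fraction=0.5, rng=random):
    """Schedule the next periodic routing update.

    The timer is reset relative to the moment it expired (before any
    updates are processed), and the interval is drawn uniformly from
    [period * (1 - jitter_fraction), period * (1 + jitter_fraction)],
    so processing time for other routers' updates does not couple
    the routers' timers together.
    """
    low = period * (1.0 - jitter_fraction)
    high = period * (1.0 + jitter_fraction)
    return timer_expiry + rng.uniform(low, high)
```

For example, with a 30-second period and `jitter_fraction=0.5`, successive updates are spaced anywhere from 15 to 45 seconds apart.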
NL: Routing Instability
r References
m C. Labovitz, R. Malan, F. Jahanian, “Internet Routing Stability”, SIGCOMM 1997.
r Record of BGP messages at major exchanges
r Discovered update volume orders of magnitude larger than expected
m Bulk were duplicate withdrawals
• Stateless implementation of BGP – did not keep track of information passed to peers
• Impact of few implementations
m Strong frequency (30/60 sec) components
• Interaction with other local routing/links etc.
NL: Route Flap Storm
r Overloaded routers fail to send Keep_Alive message and are marked as down
r BGP peers find alternate paths
r Overloaded router re-establishes peering session
r Must send large updates
r Increased load causes more routers to fail!
NL: Route Flap Dampening
r Routers now give higher priority to BGP/Keep_Alive to avoid problem
r Associate a penalty with each route on change
m Increase when route flaps
m Exponentially decay penalty with time
r When penalty reaches threshold, suppress route
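The penalty mechanism can be sketched as follows. The class name and all numeric parameters (penalty per flap, half-life, suppress threshold) are assumed values chosen for illustration; deployed dampening also uses a separate, lower reuse threshold that is not modelled here.

```python
class FlapDamper:
    """Per-route penalty that grows on each flap and decays exponentially."""

    def __init__(self, penalty_per_flap=1000.0, half_life=900.0, suppress_at=2000.0):
        self.penalty_per_flap = penalty_per_flap
        self.half_life = half_life      # seconds for the penalty to halve
        self.suppress_at = suppress_at  # threshold at which route is suppressed
        self.penalty = 0.0
        self.last_update = 0.0

    def _decay(self, now):
        # Exponential decay: penalty halves every half_life seconds
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / self.half_life)
        self.last_update = now

    def flap(self, now):
        """Record a route change at time `now` (seconds)."""
        self._decay(now)
        self.penalty += self.penalty_per_flap

    def suppressed(self, now):
        """True while the decayed penalty is at or above the threshold."""
        self._decay(now)
        return self.penalty >= self.suppress_at
```

With these assumed values, two flaps in quick succession suppress the route, and one half-life later the penalty has decayed below the threshold again.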
NL: Overlay Routing
r Basic idea:
m Treat multiple hops through IP network as one hop in an overlay network
m Run routing protocol on overlay nodes
r Why?
m For performance – can run more clever protocol on overlay
m For efficiency – can make core routers very simple
m For functionality – can provide new features such as multicast, active processing, IPv6
NL: Overlay for Performance
r References
m Savage et al., “The End-to-End Effects of Internet Path Selection”, SIGCOMM 99
m Andersen et al., “Resilient Overlay Networks”, SOSP 2001
r Why would IP routing not give good performance?
m Policy routing – limits selection/advertisement of routes
m Early exit/hot-potato routing – local not global incentives
m Lack of performance based metrics – AS hop count is the wide area metric
r How bad is it really?
m Look at performance gain an overlay provides
NL: Quantifying Performance Loss
r Measure round trip time (RTT) and loss rate between pairs of hosts
m ICMP rate limiting
r Alternate path characteristics
m 30-55% of hosts had lower latency
m 10% of alternate routes have 50% lower latency
m 75-85% have lower loss rates
NL: Bandwidth Estimation
r RTT & loss for multi-hop path
m RTT by addition
m Loss either worst or combination of hops – why?
• Large number of flows → combination of probabilities
• Small number of flows → worst hop
r Bandwidth calculation
m TCP bandwidth is based primarily on loss and RTT
r 70-80% paths have better bandwidth
r 10-20% of paths have 3x improvement
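The composition rules above can be written out as a short sketch. The throughput function uses the widely cited simplified TCP model, throughput ≈ (MSS/RTT)·sqrt(3/2)/sqrt(p); that choice of model is an assumption here, since the slide only says that TCP bandwidth depends primarily on loss and RTT. Function names are illustrative.

```python
import math

def path_rtt(hop_rtts):
    """RTT of a multi-hop overlay path: sum of per-hop RTTs."""
    return sum(hop_rtts)

def path_loss_combined(hop_losses):
    """Loss when hops drop independently (many flows): 1 - prod(1 - p_i)."""
    delivered = 1.0
    for p in hop_losses:
        delivered *= (1.0 - p)
    return 1.0 - delivered

def path_loss_worst(hop_losses):
    """Loss dominated by the worst hop (few flows)."""
    return max(hop_losses)

def tcp_bandwidth(mss_bytes, rtt_s, loss):
    """Approximate TCP throughput in bytes/second (assumed simplified model)."""
    return (mss_bytes / rtt_s) * math.sqrt(1.5) / math.sqrt(loss)
```

For two hops with 1% and 2% loss, the combined estimate is about 2.98% while the worst-hop estimate is 2%.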
NL: Overlay for Efficiency
r Multi-path routing
m More efficient use of links or QOS
m Need to be able to direct packets based on more than just destination address → can be computationally expensive
m What granularity? Per source? Per connection? Per packet?
• Per packet → re-ordering
• Per source, per flow → coarse grain vs. fine grain
m Take advantage of relative duration of flows
• Most bytes on long flows
NL: Overlay for Features
r How do we add new features to the network?
m Does every router need to support new feature?
m Choices
• Reprogram all routers → active networks
• Support new feature within an overlay
m Basic technique: tunnel packets
r Tunnels
m IP-in-IP encapsulation
m Poor interaction with firewalls, multi-path routers, etc.
NL: Examples
r IPv6 & IP Multicast
m Tunnels between routers supporting feature
r Mobile IP
m Home agent tunnels packets to mobile host’s location
m http://www.rfc-editor.org/rfc/rfc2002.txt
r QOS
m Needs some support from intermediate routers
NL: Overlay Challenges
r How do you build an efficient overlay?
m Probably don’t want all N^2 links – which links to create?
m Without direct knowledge of underlying topology, how to know what’s nearby and what is efficient?
NL: Future of Overlay
r Application specific overlays
m Why should overlay nodes only do routing?
r Caching
m Intercept requests and create responses
r Transcoding
m Changing content of packets to match available bandwidth
r Peer-to-peer applications
NL: Routing alternatives: Landmark routing
r Details about things nearby and less information about things far away
r Not defined by arbitrary boundaries
m Thus, not well suited to the real world that does have administrative boundaries
r Example: My apartment
• MtHood.Portland.USBancorpTower.PearlDistrict.KearneyPlaza
• From Beaverton
– Go towards Mt. Hood
– See USBancorpTower before running into Mt. Hood
– See PearlDistrict before running into USBancorpTower
– Reach PearlDistrict and route to Kearney Plaza 2 blocks away
• From The Dalles
– Go towards Mt. Hood, reach it
– Go towards Portland, see USBancorpTower
– Go towards and reach USBancorpTower
– Go towards and reach PearlDistrict, route to Kearney Plaza 2 blocks away
NL: Landmark Overview
r Landmark routers have “height” which determines how far away they can be seen (visibility)
m Routers within radius n can see a landmark router LMn
• See = routers have LMn’s address and know next hop to reach it.
m Router x has an entry for router y if x is within radius of y
m Routing table: Landmark (LM2(d)), Level (2), Next hop
r Intuition
m Everyone knows how to get to the highest landmark (level N)
m Highest landmark knows how to get you to any landmark at level N-1 (i.e. the N-1 level landmark that matches your destination)
m That level N-1 landmark knows how to get you to your level N-2, etc.
m Along the way, you may find a router that lets you short-circuit path to higher landmarks and take you to destination
NL: LM Hierarchy Definition
r Each LM i associated with level (i) and radius (ri)
r Every node is an LM0 landmark
r Recursion: some LMi are also LMi+1
m Every LMi sees at least one LMi+1
r Terminating state when all level j LMs are seen by entire network
NL: LM Self-configuration
r Bottom-up hierarchy construction algorithm
m Every router is L0 landmark
m All Li landmarks run election to self-promote one or more Li+1 landmarks
r LM level maps to radius (part of configuration), e.g.:
m LM level 0: radius 2
m LM level 1: radius 4
m LM level 2: radius 8
r Dynamic algorithm to adapt to topology changes – efficient hierarchy in terms of storage required
NL: LM Addresses
r LM(2).LM(1).LM(0) (C.B.A)
r If destination is far away, will not have complete routing information: refer to LM(1) portion of address; if not known, then refer to LM(2)
[Figure: landmarks LM2 C, LM1 B, and LM0 A (aka C.B.A) with radii R2, R1, R0]
NL: LM Routing
r LM does not imply hierarchical forwarding
m En route to LMn, packet may encounter router that is within LM0 radius of destination address (like longest match)
r NOT a source route
r Paths may be asymmetric
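The per-router forwarding decision can be sketched as a lookup that prefers the lowest-level landmark of the destination address that this router can see. The function name and the table layout, a dictionary keyed by (level, landmark name), are assumptions made for the illustration.

```python
def landmark_next_hop(dest_address, table):
    """Pick the next hop for an address written as 'LM2.LM1.LM0'.

    table: {(level, landmark_name): next_hop} for every landmark this
           router can see (i.e. whose radius covers this router).
    Returns (level_used, next_hop), or None if no landmark is visible.
    """
    labels = dest_address.split(".")   # highest-level landmark first
    top_level = len(labels) - 1
    # Try the most specific (level 0) landmark first, like longest match
    for level in range(top_level + 1):
        name = labels[top_level - level]
        next_hop = table.get((level, name))
        if next_hop is not None:
            return level, next_hop
    return None
```

Each router repeats this lookup independently, so a packet headed for a high-level landmark is diverted as soon as it reaches a router that can see a lower-level landmark of the destination.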
NL: Landmark Routing: Basic Operation
• Source wants to reach LM0[a], whose address is c.b.a:
– Source can see LM2[c], so sends packet towards c
– Entering LM1[b] area, first router diverts packet to b
– Entering LM0[a] area, packet delivered to a
• Not shortest path
• Packet may not reach landmarks
[Figure: path from source toward LM2[c], LM1[b], and LM0[a], with landmark radii r2[c], r1[b], r0[a]. Legend: network node, path, landmark radius]
NL: Landmark Routing: Example
[Figure: example network with nodes d.d.a, d.d.b, d.d.c, d.d.d, d.d.e, d.d.f, d.d.j, d.d.k, d.d.l, d.i.g, d.i.i, d.i.k, d.i.u, d.i.v, d.i.w, d.n.h, d.n.n, d.n.o, d.n.p, d.n.q, d.n.r, d.n.s, d.n.t, d.n.x]
NL: Routing Table for Router g
Landmark  Level  Next hop
LM2[d]    2      f
LM1[i]    1      k
LM0[e]    0      f
LM0[k]    0      k
LM0[f]    0      f
r0 = 2, r1 = 4, r2 = 8 hops
• How to go from d.i.g to d.n.t? g-f-e-d-u-t
• How does path length compare to shortest path? g-k-i-u-t
[Figure: same example network as the previous slide, with Router g and Router t marked]
NL: Network layer summary
r Network layer functions
r Specific network layers (IPv4, IPv6)
r Specific network layer devices (routers)
r Advanced network layer topics
Issues with Multi-homing
r Symmetric routing
m While symmetric paths are preferred, many are asymmetric
r Packet re-ordering
m May trigger TCP’s fast retransmit algorithm
r Other concerns:
m Addressing, DNS, aggregation
Multi-homing to a Single Provider
[Figure: ISP and Customer, routers R1 and R2]
r Easy solution:
m Use IMUX or Multi-link PPP
r Hard solution:
m Use BGP
m Makes assumptions about traffic (same amount of prefixes can be reached from both links)
Multi-homing to a Single Provider
[Figure: ISP and Customer, routers R1, R2, R3; prefixes 138.39/16 and 204.70/16]
r If multiple prefixes, may use MED
m Good if traffic load to prefixes is equal
r If single prefix, load may be unequal
m Break down prefix and advertise different prefixes over different links
Multi-homing to a Single Provider
[Figure: ISP and Customer, routers R1, R2, R3; prefixes 138.39/16 and 204.70/16]
r For traffic to customer, same as before:
m Use MED
m Good if traffic load to prefixes is equal
r For traffic to ISP:
m R3 alternates links
m Multiple default routes
Multi-homing to a Single Provider
[Figure: ISP and Customer, routers R1, R2, R3, R4; prefixes 138.39/16 and 204.70/16]
r Most reliable approach
m No equipment sharing
r Use MED
Outline
r External BGP (E-BGP)
r Internal BGP (I-BGP)
r Multi-Homing
r Stability Issues
Multi-homing
r With multi-homing, a single network has more than one connection to the Internet.
r Improves reliability and performance:
m Can accommodate link failure
m Bandwidth is sum of links to Internet
r Challenges
m Getting policy right (MED, etc.)
m Addressing
Multi-homing to Multiple Providers
r Major issues:
m Addressing
m Aggregation
r Customer address space:
m Delegated by ISP1
m Delegated by ISP2
m Delegated by ISP1 and ISP2
m Obtained independently
[Figure: ISP1, ISP2, ISP3, and Customer]
Address Space from one ISP
r Customer uses address space from ISP1
r ISP1 advertises /16 aggregate
r Customer advertises /24 route to ISP2
r ISP2 relays route to ISP1 and ISP3
r ISP2-3 use /24 route
r ISP1 routes directly
r Problems with traffic load?
[Figure: ISP1, ISP2, ISP3, and Customer; prefixes 138.39/16 and 138.39.1/24]
Pitfalls
r ISP1 aggregates to a /19 at border router to reduce internal tables.
r ISP1 still announces /16.
r ISP1 hears /24 from ISP2.
r ISP1 routes packets for customer to ISP2!
r Workaround: ISP1 must inject /24 into I-BGP.
[Figure: ISP1, ISP2, ISP3, and Customer; prefixes 138.39.0/19, 138.39/16, and 138.39.1/24]
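The pitfall is a direct consequence of longest-prefix matching, which can be sketched with Python's standard `ipaddress` module. The route list and next-hop labels below are illustrative assumptions standing in for the forwarding table of a router inside ISP1.

```python
import ipaddress

def longest_match(destination, routes):
    """Return the next hop of the most specific route covering destination.

    routes: list of (prefix, next_hop) pairs, e.g. ("138.39.0.0/19", "customer").
    Returns None when no route covers the destination.
    """
    dest = ipaddress.ip_address(destination)
    best_net, best_hop = None, None
    for prefix, next_hop in routes:
        net = ipaddress.ip_network(prefix)
        if dest in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_hop = net, next_hop
    return best_hop
```

With only the internal /19 aggregate, traffic for 138.39.1.5 goes to the customer link; once the /24 heard from ISP2 is in the table, the same lookup returns ISP2, because /24 is more specific than /19. Injecting the /24 into I-BGP gives internal routers an equally specific route toward the customer, so BGP attributes rather than prefix length decide.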
Address Space from Both ISPs
r ISP1 and ISP2 continue to announce aggregates
r Load sharing depends on traffic to two prefixes
r Lack of reliability: if ISP1 link goes down, part of customer becomes inaccessible.
r Customer may announce prefixes to both ISPs, but still problems with longest match as in case 1.
[Figure: ISP1, ISP2, ISP3, and Customer; prefixes 138.39.1/24 and 204.70.1/24]
Address Space Obtained Independently
r Offers the most control, but at the cost of aggregation.
r Still need to control paths
[Figure: ISP1, ISP2, ISP3, and Customer]
Measurement of Real Ethernet
r Evaluate performance in some typical scenarios
m Scenario 1
• Topology: 4 clusters of 6 hosts – similar to office configuration
• Fixed pkt size
• Throughput decreases with number of hosts & increases with pkt size – as expected
• Fairness improves with number of hosts – capture effects less likely
• Only linear increase in delay with number of hosts – unexpected
Measurement of Real Ethernet
r Scenario 2
m Topology: 23 hosts on short net
m Load: fixed pkt size
m Improvement in bit rate over scenario 1
r Scenario 3
m Topology: 4 clusters
m Load: bimodal pkt size
m 7/1 ratio of small to large pkts is sufficient to greatly improve total bit rate
How to Improve Performance
r No long cables
r Fewer hosts per cable
r Use large packets
r Don't mix real-time w/ bulk-data if possible
r Can’t provide good efficiency/throughput and good latency
r Ethernet Packet Traces
m Ethernet traffic is “self-similar” (fractal)
m Bursty at every time scale (msecs to months)
m Implication?
• On average, low load
• Occasional peaks
Problems
r Routing table size
m Need an entry for all paths to all networks
r Required memory = O((N + M*A) * K)
m N: number of networks
m M: mean AS distance (in terms of hops)
m A: number of AS’s
m K: number of BGP peers
Routing Table Size
Networks  Mean AS Distance  Number of AS’s  BGP Peers/Net  Memory
2,100     5                 59              3              27,000
4,000     10                100             6              108,000
10,000    15                300             10             490,000
100,000   20                3,000           20             1,040,000
r Problem reduced with CIDR
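The memory expression (N + M*A) * K can be evaluated directly. The sketch below assumes 4 bytes per network and 2 bytes per AS; those sizes are not stated on the slides and are chosen because they reproduce the first three rows of the table (the first to within rounding). The function name is illustrative.

```python
def bgp_table_bytes(networks, mean_as_distance, num_ases, peers_per_net,
                    bytes_per_network=4, bytes_per_as=2):
    """Estimate routing table memory as (N + M*A) * K, weighted by
    assumed encoding sizes for a network (4 bytes) and an AS (2 bytes)."""
    per_peer = (networks * bytes_per_network
                + mean_as_distance * num_ases * bytes_per_as)
    return per_peer * peers_per_net
```

For the first row, `bgp_table_bytes(2100, 5, 59, 3)` gives 26,970, which the table rounds to 27,000. The same arithmetic gives 10,400,000 for the fourth row, which differs from the tabulated figure, so the assumed byte sizes may not be what that row used.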
Routing Information Bases (RIB)
r Routes are stored in RIBs
r Adj-RIBs-In: routing info that has been learned from other routers (unprocessed routing info)
r Loc-RIB: local routing information selected from Adj-RIBs-In (routes selected locally)
r Adj-RIBs-Out: info to be advertised to peers (routes to be advertised)
BGP Common Header
Marker (security and message delineation): 16 bytes
Length: 2 bytes
Type: 1 byte
Types: OPEN, UPDATE, NOTIFICATION, KEEPALIVE
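A minimal sketch of parsing this 19-byte header with Python's `struct` module follows. The numeric type codes (1 to 4) are taken from the BGP-4 specification rather than from the slide, and the function name is an assumption.

```python
import struct

# Type codes as assigned in the BGP-4 specification (not shown on the slide)
BGP_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}

# Network byte order: 16-byte marker, 2-byte length, 1-byte type
HEADER = struct.Struct("!16sHB")

def parse_bgp_header(data):
    """Return (marker, length, type_name) from the start of a BGP message."""
    if len(data) < HEADER.size:
        raise ValueError("need at least %d bytes" % HEADER.size)
    marker, length, msg_type = HEADER.unpack_from(data)
    return marker, length, BGP_TYPES.get(msg_type, "UNKNOWN")
```

A KEEPALIVE consists of the header alone, so its length field is 19.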
BGP Messages
r Open
m Announces AS ID
m Determines hold timer – interval between keep_alive or update messages, zero interval implies no keep_alive
r Keep_alive
• Sent periodically (but before hold timer expires) to peers to ensure connectivity.
• Sent in place of an UPDATE message
r Notification
• Used for error notification
• TCP connection is closed immediately after notification
BGP UPDATE Message
r List of withdrawn routes
r Network layer reachability information
m List of reachable prefixes
r Path attributes
m Origin
m Path
m Metrics
r All prefixes advertised in message have same path attributes
LOCAL PREF
r Local (within an AS) mechanism to provide relative priority among BGP routers
[Figure: routers R1-R5; AS 100, AS 200, AS 256, AS 300; I-BGP; labels Local Pref = 500 and Local Pref = 800]
AS_PATH
r List of traversed AS’s
[Figure: AS 100, AS 200, AS 300, AS 500; prefixes 170.10.0.0/16 and 180.10.0.0/16]
AS paths shown in the figure:
180.10.0.0/16  300 200 100
170.10.0.0/16  300 200
CIDR and BGP
[Figure: AS X (197.8.2.0/24), AS Y (197.8.3.0/24), AS T (provider, 197.8.0.0/23), and AS Z]
What should T announce to Z?
Options
r Advertise all paths:
m Path 1: through T can reach 197.8.0.0/23
m Path 2: through T can reach 197.8.2.0/24
m Path 3: through T can reach 197.8.3.0/24
r But this does not reduce routing tables! We would like to advertise:
m Path 1: through T can reach 197.8.0.0/22
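That the three prefixes collapse into a single /22 can be checked with Python's standard `ipaddress` module. The helper name `aggregate` is an assumption; `ipaddress.collapse_addresses` is the standard-library call doing the work.

```python
import ipaddress

def aggregate(prefixes):
    """Merge adjacent or overlapping prefixes into the fewest covering prefixes."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]
```

Here `aggregate(["197.8.0.0/23", "197.8.2.0/24", "197.8.3.0/24"])` returns `["197.8.0.0/22"]`: the two /24s form 197.8.2.0/23, which together with 197.8.0.0/23 forms the /22.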
Sets and Sequences
r Problem: what do we list in the route?
• List T: omitting information not acceptable, may lead to loops
• List T, X, Y: misleading, appears as 3-hop path
r Solution: restructure AS Path attribute as:
• Path: (Sequence (T), Set (X, Y))
• If Z wants to advertise path:
– Path: (Sequence (Z, T), Set (X, Y))
• In practice used only if paths in set have same attributes
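A small sketch of a path made of a sequence plus a set follows. Representing the path as a (tuple, frozenset) pair is an assumption for illustration, as is counting the whole set as one hop when measuring path length; the latter follows the BGP-4 convention and is not stated on the slide.

```python
def path_length(path):
    """Length used for comparison: each sequence AS counts 1, the set counts 1."""
    sequence, as_set = path
    return len(sequence) + (1 if as_set else 0)

def contains_as(path, asn):
    """Loop detection: an AS appears in the path if it is in either part."""
    sequence, as_set = path
    return asn in sequence or asn in as_set

def prepend(path, asn):
    """An AS re-advertising the route puts itself at the front of the sequence."""
    sequence, as_set = path
    return ((asn,) + tuple(sequence), frozenset(as_set))
```

With `t_path = (("T",), frozenset({"X", "Y"}))`, the path has length 2 instead of the misleading 3, yet X and Y still detect a loop if the route comes back to them.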
Multi-Exit Discriminator (MED)
r Hint to external neighbors about the preferred path into an AS
m Non-transitive attribute (we will see later why)
m Different AS choose different scales
r Used when two AS’s connect to each other in more than one place
MED
r Hint to R1 to use R3 over R4 link
r Cannot compare AS40’s values to AS30’s
[Figure: routers R1, R2, R3, R4; AS 10, AS 30, AS 40; 180.10.0.0 advertised with MED = 120, MED = 200, and MED = 50]
MED
• MED is typically used in provider/subscriber scenarios
• It can lead to unfairness if used between ISPs because it may force one ISP to carry more traffic:
[Figure: ISP1 and ISP2, with SF and NY]
• ISP1 ignores MED from ISP2
• ISP2 obeys MED from ISP1
• ISP2 ends up carrying traffic most of the way
Other Attributes
r ORIGIN
m Source of route (IGP, EGP, other)
r NEXT_HOP
m Address of next hop router to use
m Used to direct traffic to non-BGP router
r Check out http://www.cisco.com for full explanation
Decision Process
r Processing order of attributes:
m Select route with highest LOCAL-PREF
m Select route with shortest AS-PATH
m Apply MED (if routes learned from same neighbor)
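The three steps can be sketched as successive filters over the candidate routes. The `Route` fields and function name are illustrative assumptions, lower MED is taken as preferred (standard BGP behaviour, not stated on the slide), and any remaining tie is simply resolved by list order because later tie-breakers are not covered here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    prefix: str
    local_pref: int
    as_path: tuple      # traversed AS numbers, nearest first
    med: int
    neighbor_as: int    # AS the route was learned from

def best_route(routes):
    """Apply LOCAL-PREF, then AS-PATH length, then MED among same-neighbor routes."""
    # 1. Highest LOCAL-PREF
    top_pref = max(r.local_pref for r in routes)
    candidates = [r for r in routes if r.local_pref == top_pref]
    # 2. Shortest AS-PATH
    shortest = min(len(r.as_path) for r in candidates)
    candidates = [r for r in candidates if len(r.as_path) == shortest]
    # 3. MED: drop a route only if the same neighbor AS offers a lower MED
    candidates = [r for r in candidates
                  if not any(o.neighbor_as == r.neighbor_as and o.med < r.med
                             for o in candidates)]
    return candidates[0]
```

Because MED scales differ between ASes, step 3 never compares routes learned from different neighbor ASes.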
Outline
r External BGP (E-BGP)
r Internal BGP (I-BGP)
r Multi-Homing
r Stability Issues
Internal vs. External BGP
[Figure: routers R1, R2, R3, R4; AS1 and AS2; E-BGP session between R3 and R4]
• BGP can be used by R3 and R4 to learn routes
• How do R1 and R2 learn routes?
• Option 1: Inject routes in IGP
– Only works for small routing tables
• Option 2: Use I-BGP
Internal BGP (I-BGP)
r Same messages as E-BGP
r Different rules about re-advertising prefixes:
m Prefix learned from E-BGP can be advertised to I-BGP neighbor and vice-versa, but
m Prefix learned from one I-BGP neighbor cannot be advertised to another I-BGP neighbor
m Reason: no AS PATH within the same AS and thus danger of looping.
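The re-advertising rule reduces to a single check on how a prefix was learned and over which kind of session it would be sent. The function names and the string labels for session types are illustrative assumptions; routing policy, which may further restrict what is advertised, is not modelled.

```python
def may_readvertise(learned_via, advertise_via):
    """True unless the prefix would pass from one I-BGP neighbor to another.

    learned_via / advertise_via: "E-BGP" or "I-BGP".
    """
    return not (learned_via == "I-BGP" and advertise_via == "I-BGP")

def eligible_neighbors(learned_from, sessions):
    """Neighbors a router may tell about a prefix learned from `learned_from`.

    sessions: {neighbor_name: "E-BGP" or "I-BGP"} for this router.
    """
    learned_via = sessions[learned_from]
    return sorted(n for n, kind in sessions.items()
                  if n != learned_from and may_readvertise(learned_via, kind))
```

For a router with I-BGP sessions to R1 and R2 and an E-BGP session to R4, a prefix learned from R4 may go to R1 and R2, while a prefix learned from R1 may go only to R4.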
Internal BGP (I-BGP)
[Figure: routers R1, R2, R3, R4; AS1 and AS2; E-BGP session between R3 and R4; I-BGP sessions inside AS1]
• R3 can tell R1 and R2 prefixes from R4
• R3 can tell R4 prefixes from R1 and R2
• R3 cannot tell R2 prefixes from R1
• R2 can only find these prefixes through a direct connection to R1
• Result: I-BGP routers must be fully connected (via TCP)!
– contrast with E-BGP sessions that map to physical links
Link Failures
r Two types of link failures:
m Failure on an E-BGP link
m Failure on an I-BGP link
r These failures are treated completely differently in BGP
r Why?
Failure on an E-BGP Link
[Figure: R1 in AS1 (138.39.1.1/30) and R2 in AS2 (138.39.1.2/30), joined by a physical link that carries the E-BGP session]
• If the link R1-R2 goes down
– The TCP connection breaks
– BGP routes are removed
• This is the desired behavior
Failure on an I-BGP Link
[Figure: routers R1, R2, R3 with physical links; I-BGP connection between R1 and R2; addresses 138.39.1.1/30 and 138.39.1.2/30]
• If link R1-R2 goes down, R1 and R2 should still be able to exchange traffic
• The indirect path through R3 must be used
• Thus, E-BGP and I-BGP must use different conventions with respect to TCP endpoints
Distance Vector in Practice
r RIP and RIP2
m Uses split-horizon/poison reverse
r BGP
m Propagates entire path
m Path also used for effecting policies