Forwarding and Transport
EE 122: Intro to Communication Networks
Fall 2010 (MW 4-5:30 in 101 Barker)
Scott Shenker
TAs: Sameer Agarwal, Sara Alspaugh, Igor Ganichev, Prayag Narula
http://inst.eecs.berkeley.edu/~ee122/
Materials with thanks to Jennifer Rexford, Ion Stoica, Vern Paxson, and other colleagues at Princeton and UC Berkeley
Announcements
• HW #1 being graded
• HW #2 out tonight
• I have 61 slides…
– If I finish them, you all flunk.
Goals of Today’s Lecture
• Quick summary of IP addressing
• IP packet forwarding
• Transport layer
• Reliable delivery
• Non-goals:
– Details of TCP (later)
– Congestion control (later)
– Finishing these slides…
Summary of IP Addressing
• 32-bit numbers identify interfaces
• Allocated in prefixes
• Non-uniform hierarchy for scalability and flexibility
– Routing is based on CIDR
• A number of special-purpose blocks reserved
• Address allocation:
– ICANN → RIR → ISP → customer network → host
• Issues to be covered later
– How hosts get their addresses (DHCP)
– How to map from an IP address to a link address (ARP)
CIDR: Hierarchical Address Allocation
• Prefixes are key to Internet scalability
– Addresses allocated in contiguous chunks (prefixes)
– Routing protocols and packet forwarding based on prefixes
[Figure: prefix hierarchy rooted at 12.0.0.0/8, subdivided into prefixes such as 12.0.0.0/15, 12.2.0.0/16, 12.3.0.0/16, …, 12.253.0.0/16; 12.3.0.0/16 is subdivided further (12.3.0.0/22, 12.3.4.0/24, …, 12.3.254.0/23), as is 12.253.0.0/16 (12.253.0.0/19, 12.253.32.0/19, 12.253.64.0/19, 12.253.64.108/30, 12.253.96.0/18, 12.253.128.0/17, …)]
Hierarchical Structure
• Helps scalability
• But also allows for easy delegation of control
• ICANN → RIR → ISP → customer network → host
• This is a recurring Internet theme (e.g., DNS next lecture)
• Deployability and scalability arise from loose hierarchical structure
But What Are Addresses Used For?
• They allow the network to deliver packets to the right destination
– They are a means to achieve a goal
– Not a goal themselves
• This is done by hop-by-hop packet forwarding
• To which we now turn…
Packet Forwarding
Hop-by-Hop Packet Forwarding
• Each router has a forwarding table
– Maps destination addresses…
– … to outgoing interfaces (= links)
• Forwarding table derived from:
– Routing algorithms
– (or static configuration)
• Upon receiving a packet
– Inspect the destination IP address in the header
– Index into the forwarding table
– Forward packet out appropriate interface
Using the Forwarding Table
• With classful addressing, this is easy:
– Early bits in address specify mask
o Class A [0]: /8, Class B [10]: /16, Class C [110]: /24
– Can find exact match in forwarding table
o Use prefix as index into hash table
• Why won’t this work for CIDR?
– Address doesn’t specify mask
• Two problems with CIDR forwarding tables
– Finding match isn’t trivial
– Non-topological addressing
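The classful lookup described on the "Using the Forwarding Table" slide can be sketched in Python. This is only an illustration, not a router implementation: `classful_prefix`, `forward`, and the toy `forwarding_table` (including its link numbers) are hypothetical names and values chosen for the example.

```python
import ipaddress

def classful_prefix(addr: str) -> str:
    """Return the classful network prefix (e.g. '12.0.0.0/8') for an IPv4 address."""
    ip = int(ipaddress.IPv4Address(addr))
    first_octet = ip >> 24
    if first_octet & 0b10000000 == 0b00000000:    # leading bit 0    -> Class A, /8
        prefix_len = 8
    elif first_octet & 0b11000000 == 0b10000000:  # leading bits 10  -> Class B, /16
        prefix_len = 16
    elif first_octet & 0b11100000 == 0b11000000:  # leading bits 110 -> Class C, /24
        prefix_len = 24
    else:
        raise ValueError("not a Class A, B, or C address")
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    return f"{ipaddress.IPv4Address(ip & mask)}/{prefix_len}"

# Hypothetical forwarding table: classful prefix -> outgoing link.
forwarding_table = {"12.0.0.0/8": 1, "130.1.0.0/16": 2, "201.143.5.0/24": 3}

def forward(addr: str):
    """Exact-match hash lookup; returns None when the network has no entry."""
    return forwarding_table.get(classful_prefix(addr))
```

Because the address itself reveals the mask, a single hash lookup suffices. With CIDR the mask is not encoded in the address, so this shortcut no longer applies.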
Example #1: Provider w/ 4 Customers
Prefix            Link
201.143.0.0/22    Link 1
201.143.4.0/24    Link 2
201.143.5.0/24    Link 3
201.143.6.0/23    Link 4
[Figure: provider connected to four customers: 201.143.0.0/22 on Link 1, 201.143.4.0/24 on Link 2, 201.143.5.0/24 on Link 3, 201.143.6.0/23 on Link 4]
Finding the Match
• No address matches more than one prefix
– But can’t easily find match
• Consider 11001001 10001111 00000101 11010010
– First 21 bits match 4 partial prefixes
– First 22 bits match 3 partial prefixes
– First 23 bits match 2 partial prefixes
– First 24 bits match exactly one full prefix
Prefix bits                            Prefix
11001001 10001111 000000−− −−−−−−−−    201.143.0.0/22
11001001 10001111 00000100 −−−−−−−−    201.143.4.0/24
11001001 10001111 00000101 −−−−−−−−    201.143.5.0/24
11001001 10001111 0000011− −−−−−−−−    201.143.6.0/23
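The bit-by-bit counts on the "Finding the Match" slide can be reproduced with a short Python sketch. The helper names `bits` and `partial_matches` are made up for this illustration; a prefix counts as a partial match at depth k if it agrees with the destination on the first min(k, prefix length) bits.

```python
import ipaddress

# The four customer prefixes from Example #1.
PREFIXES = ["201.143.0.0/22", "201.143.4.0/24", "201.143.5.0/24", "201.143.6.0/23"]

# The destination from the slide, written as 32 bits (201.143.5.210).
DEST_BITS = "11001001100011110000010111010010"

def bits(addr) -> str:
    """32-character binary string for an IPv4 address."""
    return format(int(ipaddress.IPv4Address(addr)), "032b")

def partial_matches(dest_bits: str, k: int) -> list:
    """Prefixes that still agree with the destination after examining k bits."""
    result = []
    for prefix in PREFIXES:
        net = ipaddress.IPv4Network(prefix)
        n = min(k, net.prefixlen)  # a prefix only constrains its own length
        if bits(net.network_address)[:n] == dest_bits[:n]:
            result.append(prefix)
    return result

for k in (21, 22, 23, 24):
    print(k, partial_matches(DEST_BITS, k))
```

The candidate set shrinks from four prefixes at 21 bits to the single prefix 201.143.5.0/24 at 24 bits, which is why the lookup has to walk down the address bit by bit.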
Example #2: Aggregating Customers
Prefix            Link
201.143.0.0/21    Provider 1
201.144.0.0/21    Provider 2
[Figure: Provider 1 (201.143.0.0/21) serves customers 201.143.0.0/22, 201.143.4.0/24, 201.143.5.0/24, 201.143.6.0/23; Provider 2 (201.144.0.0/21) serves customers 201.144.0.0/22, 201.144.4.0/24, 201.144.5.0/24, 201.144.6.0/23]
Example #3: Complications
• Forwarding table more complicated when addressing is non-topological
[Figure: Provider 1 and Provider 2 with the prefixes 201.143.0.0/21 and 201.144.0.0/21 and the eight customer prefixes 201.143.0.0/22, 201.143.4.0/24, 201.143.5.0/24, 201.143.6.0/23, 201.144.0.0/22, 201.144.4.0/24, 201.144.5.0/24, 201.144.6.0/23]
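Unique prefix matching
Prefix bits                            Prefix
11001001 10001111 000000−− −−−−−−−−    201.143.0.0/22
11001001 10001111 00000100 −−−−−−−−    201.143.4.0/24
11001001 10001111 00000101 −−−−−−−−    201.143.5.0/24
11001001 10001111 0000011− −−−−−−−−    201.143.6.0/23
11001001 10010000 000000−− −−−−−−−−    201.144.0.0/22
11001001 10010000 00000100 −−−−−−−−    201.144.4.0/24
11001001 10010000 00000101 −−−−−−−−    201.144.5.0/24
11001001 10010000 0000011− −−−−−−−−    201.144.6.0/23
(each entry points to Provider 1 or Provider 2)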
More compact representation
• Use /21s for bulk of traffic
• List /24s as exceptions
[Figure: Provider 1 and Provider 2 with the prefixes 201.143.0.0/21 and 201.144.0.0/21 and their customer prefixes, as in Example #3]
Arriving packet: 11001001 10010000 00000100 01101101
Longest Prefix Match
Prefix bits                            Prefix
11001001 10001111 00000−−− −−−−−−−−    201.143.0.0/21
11001001 10001111 00000101 −−−−−−−−    201.143.5.0/24
11001001 10010000 00000−−− −−−−−−−−    201.144.0.0/21
11001001 10010000 00000100 −−−−−−−−    201.144.4.0/24
(each entry points to Provider 1 or Provider 2)
Arriving packet: 11001001 10010000 00000101 01101101
Why use longest prefix match?
• Nontrivial to find matches in CIDR even w/o LPM
– Because can’t tell where network address ends
– Must walk down bit-by-bit
• LPM decreases size of routing table
– Speeding up lookup
– Reducing memory consumption
• But how does LPM work?
– And how can we speed it up?
Longest-Prefix-Match Forwarding
Destination: 201.10.7.17 (201: 11001001, 10: 00001010, 7: 00000111, 17: 00010001)
Forwarding Table
Prefix            Outgoing link
192.0.0.0/4       2
4.83.128.0/17     1
201.10.0.0/21     3
201.10.6.0/23     2
Compare with 192.0.0.0/4 (192: 11000000)
Compare with 4.83.128.0/17 (4: 00000100, 83: 01010011, 128: 10000000)
Compare with 201.10.0.0/21 (201: 11001001, 10: 00001010, 0: 00000000)
Compare with 201.10.6.0/23 (201: 11001001, 10: 00001010, 6: 00000110)
Longest-Prefix-Match Forwarding
• The longest prefix matching 201.10.7.17 is 201.10.6.0/23, so the packet goes out link 2
• Algorithmic problem: how do we do this fast?
Simple Algorithms Are Too Slow
• Scan the forwarding table one entry at a time
– See if the destination matches the prefix
– Keep track of the entry with longest-matching prefix
– If no match, use default route
• Overhead is linear in size of the forwarding table
– Today, that means 200,000-250,000 entries!
– And, the router may have just a few nanoseconds
– … before the next packet arrives
• Need greater efficiency to keep up with line speed
– Better algorithms
– Hardware implementations
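The linear scan described on the "Simple Algorithms Are Too Slow" slide can be sketched in Python using the forwarding table from the preceding example. The names `TABLE` and `lookup_linear` and the `default` parameter are assumptions made for this illustration; real routers do not perform lookups this way, which is exactly the slide's point.

```python
import ipaddress

# Forwarding table from the Longest-Prefix-Match Forwarding example:
# (prefix, outgoing link)
TABLE = [
    ("192.0.0.0/4", 2),
    ("4.83.128.0/17", 1),
    ("201.10.0.0/21", 3),
    ("201.10.6.0/23", 2),
]

def lookup_linear(dest: str, table=TABLE, default=None):
    """Scan every entry and keep the matching entry with the longest prefix."""
    addr = ipaddress.IPv4Address(dest)
    best_len, best_link = -1, default   # fall back to the default route
    for prefix, link in table:
        net = ipaddress.IPv4Network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_link = net.prefixlen, link
    return best_link

print(lookup_linear("201.10.7.17"))  # matches /4, /21 and /23; the /23 wins -> 2
```

Every lookup touches every entry, so the cost grows linearly with the table size.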
Patricia Tree
• Store the prefixes as a tree
– One bit for each level of the tree
– Some nodes correspond to valid prefixes
• When a packet arrives
– Traverse the tree based on the destination address
– Running time: scales with # bits in prefix
[Figure: binary tree with edges labeled 0 and 1; marked nodes correspond to the prefixes 0*, 00*, 11*, 100*, 101*]
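A minimal sketch of the tree lookup, assuming a plain one-bit-per-level binary trie. A true Patricia tree additionally compresses chains of single-child nodes, which this sketch does not do. `TrieNode`, `BinaryTrie`, and the link labels are hypothetical; the prefixes are the ones shown in the figure.

```python
class TrieNode:
    __slots__ = ("children", "value")

    def __init__(self):
        self.children = {}   # '0' or '1' -> TrieNode
        self.value = None    # outgoing link if this node is a valid prefix

class BinaryTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, prefix_bits: str, value) -> None:
        """Add a prefix given as a bit string, e.g. '101' for 101*."""
        node = self.root
        for b in prefix_bits:
            node = node.children.setdefault(b, TrieNode())
        node.value = value

    def longest_match(self, addr_bits: str):
        """Walk down bit by bit, remembering the last valid prefix seen."""
        node, best = self.root, self.root.value
        for b in addr_bits:
            node = node.children.get(b)
            if node is None:
                break
            if node.value is not None:
                best = node.value
        return best

trie = BinaryTrie()
for prefix in ("0", "00", "11", "100", "101"):
    trie.insert(prefix, f"link for {prefix}*")   # hypothetical link labels
```

The lookup visits at most one node per address bit, so its running time depends on the prefix length rather than on the number of table entries.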
How Does Sending End Host Forward?
• No need to run a routing protocol
– Packets to the host itself (e.g., 1.2.3.4/32)
o Delivered locally
– Packets to other hosts on the LAN (e.g., 1.2.3.0/25)
o Sent out the interface with LAN address (ARP)
o Can tell they’re local using subnet mask (e.g., 255.255.255.128)
– Packets to external hosts (any others)
o Sent out interface to local gateway
o I.e., IP router on the LAN
• How this information is learned
– Static setting of address, subnet mask, and gateway
– Or: Dynamic Host Configuration Protocol (DHCP)
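The three cases on the "How Does Sending End Host Forward?" slide reduce to one mask comparison. The sketch below uses the slide's example address and mask; the function name `next_hop`, the action labels, and the gateway address 1.2.3.1 are assumptions made for illustration.

```python
import ipaddress

def next_hop(my_addr: str, netmask: str, gateway: str, dest: str):
    """Decide how an end host forwards a packet addressed to dest."""
    me = int(ipaddress.IPv4Address(my_addr))
    mask = int(ipaddress.IPv4Address(netmask))
    d = int(ipaddress.IPv4Address(dest))
    if d == me:
        return ("deliver locally", dest)
    if d & mask == me & mask:
        # Same subnet: send directly on the LAN (ARP finds the MAC address).
        return ("send directly", dest)
    # Anything else goes to the local gateway (the IP router on the LAN).
    return ("send to gateway", gateway)

# Host 1.2.3.4 on 1.2.3.0/25; the gateway address 1.2.3.1 is hypothetical.
for dest in ("1.2.3.4", "1.2.3.7", "1.2.3.200", "5.6.7.8"):
    print(dest, next_hop("1.2.3.4", "255.255.255.128", "1.2.3.1", dest))
```

Note that 1.2.3.200 goes to the gateway: with mask 255.255.255.128 the local subnet covers only 1.2.3.0 through 1.2.3.127.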
What About Reaching the End Hosts?
• How does the last router reach the destination?
• Each interface has a persistent, global identifier
– MAC address (Media Access Control) - Layer 2
– Programmed into NIC
– Usually flat address structure (i.e., no hierarchy)
• Constructing an address resolution table
– Mapping MAC address to/from IP address
– Address Resolution Protocol (ARP)
[Figure: a router and hosts 1.2.3.4, 1.2.3.7, …, 1.2.3.156 attached to one LAN]
5 Minute Break
Questions Before We Proceed?
Transport Layer
Role of Transport Layer
• Application layer
– Communication for specific applications
– E.g., HyperText Transfer Protocol (HTTP), File Transfer Protocol (FTP), Network News Transfer Protocol (NNTP)
• Transport layer
– Communication between processes (e.g., socket)
– Relies on network layer; serves the application layer
– E.g., TCP and UDP
• Network layer
– Logical communication between nodes
– Hides details of the link technology
– E.g., IP
Transport Protocols
• Provide logical communication between application processes running on different hosts
• Run on end hosts
– Sender: breaks application messages into segments, and passes to network layer
– Receiver: reassembles segments into messages, passes to application layer
• Multiple transport protocols available to applications
– Internet: TCP and UDP (mainly)
[Figure: two end hosts with full stacks (application, transport, network, data link, physical) connected through intermediate nodes with only network, data link, and physical layers; the transport layer provides logical end-end transport]
Internet Transport Protocols
• Datagram messaging service (UDP)
– No-frills extension of “best-effort” IP
– Multiplexing/Demultiplexing among processes
• Reliable, in-order delivery (TCP)
– Connection set-up & tear-down
– Discarding corrupted packets
– Retransmission of lost packets
– Flow control
– Congestion control
• Services not available
– Delay and/or bandwidth guarantees
– Sessions that survive change-of-IP-address
[Figure: IP header, one 32-bit row per line]
4-bit Version | 4-bit Header Length | 8-bit Type of Service (TOS) | 16-bit Total Length (Bytes)
16-bit Identification | 3-bit Flags | 13-bit Fragment Offset
8-bit Time to Live (TTL) | 8-bit Protocol | 16-bit Header Checksum
32-bit Source IP Address
32-bit Destination IP Address
Options (if any)
Payload
• Version = 4, Header Length = 5 (no options)
• Protocol: 6 = TCP, 17 = UDP
• Payload starts with the transport header: 16-bit Source Port, 16-bit Destination Port, more transport header fields…
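To make the header layout concrete, here is a hedged Python sketch that unpacks the fixed 20-byte IP header and the two port fields at the start of the payload. `parse_ipv4`, `PROTOCOLS`, and the hand-built `sample` packet are illustrative assumptions; the sketch ignores checksum verification and fragmentation.

```python
import ipaddress
import struct

PROTOCOLS = {6: "TCP", 17: "UDP"}   # values of the 8-bit Protocol field

def parse_ipv4(packet: bytes) -> dict:
    """Extract a few header fields plus the transport-layer ports."""
    (ver_ihl, _tos, total_len, _ident, _flags_frag,
     ttl, proto, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    header_len = (ver_ihl & 0x0F) * 4           # header length is in 32-bit words
    # Both TCP and UDP begin with 16-bit source and destination ports.
    src_port, dst_port = struct.unpack("!HH", packet[header_len:header_len + 4])
    return {
        "version": ver_ihl >> 4,
        "header_words": ver_ihl & 0x0F,
        "total_length": total_len,
        "ttl": ttl,
        "protocol": PROTOCOLS.get(proto, proto),
        "src": str(ipaddress.IPv4Address(src)),
        "dst": str(ipaddress.IPv4Address(dst)),
        "src_port": src_port,
        "dst_port": dst_port,
    }

# Hand-built sample: version 4, header length 5, protocol 17 (UDP),
# followed by an 8-byte UDP header with no data. Checksums are left as 0.
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 0, 0, 64, 17, 0,
                     ipaddress.IPv4Address("1.2.3.4").packed,
                     ipaddress.IPv4Address("5.6.7.8").packed)
sample += struct.pack("!HHHH", 5000, 53, 8, 0)
print(parse_ipv4(sample))
```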
Multiplexing and Demultiplexing
• Host receives IP datagrams
– Each datagram has source and destination IP address
– Each datagram carries one transport-layer segment
– Each segment has source and destination port number
• Host uses IP addresses and port numbers to direct the segment to appropriate socket
[Figure: TCP/UDP segment format, 32 bits wide: source port #, dest port #; other header fields; application data (message)]
Ports
• Need to decide which application gets which packets
• Solution: map each socket to a port
• Client must know server’s port
• Separate 16-bit port address space for UDP and TCP
– (src_IP, src_port, dst_IP, dst_port) identifies TCP connection
– What about UDP?
• Well known ports (0-1023): everyone agrees which services run on these ports
– e.g., ssh:22, http:80
• Ephemeral ports (most 1024-65535): given to clients
– e.g. chat clients, p2p networks
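As a rough sketch of TCP demultiplexing, a host can keep a table keyed by the four-tuple that identifies a connection. The sketch covers only the TCP case; `Segment`, `TcpDemux`, the socket names, and the addresses are hypothetical.

```python
from typing import NamedTuple, Optional

class Segment(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    payload: bytes

class TcpDemux:
    """Maps (src_IP, src_port, dst_IP, dst_port) to a socket."""

    def __init__(self):
        self.connections = {}

    def register(self, src_ip, src_port, dst_ip, dst_port, socket_name) -> None:
        self.connections[(src_ip, src_port, dst_ip, dst_port)] = socket_name

    def deliver(self, seg: Segment) -> Optional[str]:
        # The first four fields of the segment form the lookup key.
        return self.connections.get(tuple(seg[:4]))

demux = TcpDemux()
# Two clients talking to the same server port (http:80) get separate sockets.
demux.register("10.0.0.1", 40001, "1.2.3.4", 80, "socket A")
demux.register("10.0.0.2", 40001, "1.2.3.4", 80, "socket B")
print(demux.deliver(Segment("10.0.0.2", 40001, "1.2.3.4", 80, b"GET /")))
```

Because the source address and port are part of the key, many clients can share one well-known server port without their byte streams mixing.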
Unreliable Message Delivery Service
• Lightweight communication between processes
– Avoid overhead and delays of ordered, reliable delivery
– Send messages to and receive them from a socket
• User Datagram Protocol (UDP; RFC 768 - 1980!)
– IP plus port numbers to support (de)multiplexing
– Optional error checking on the packet contents
o (checksum field = 0 means “don’t verify checksum”)
[Figure: UDP datagram layout: SRC port, DST port; length, checksum; DATA]
Why Would Anyone Use UDP?
• Finer control over what data is sent and when
– As soon as an application process writes into the socket
– … UDP will package the data and send the packet
• No delay for connection establishment
– UDP just blasts away without any formal preliminaries
– … which avoids introducing any unnecessary delays
• No connection state
– No allocation of buffers, sequence #s, timers …
– … making it easier to handle many active clients at once
• Small packet header overhead
– UDP header is only 8 bytes
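The 8-byte UDP header is small enough to build by hand. The sketch below packs the four 16-bit fields (source port, destination port, length, checksum) in network byte order; `udp_datagram` is a hypothetical helper, and it leaves the checksum at 0, which per the earlier slide means "don't verify checksum".

```python
import struct

UDP_HEADER_LEN = 8   # four 16-bit fields

def udp_datagram(src_port: int, dst_port: int, payload: bytes, checksum: int = 0) -> bytes:
    """Prepend a UDP header to payload; the length field covers header plus data."""
    length = UDP_HEADER_LEN + len(payload)
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload

datagram = udp_datagram(5000, 53, b"hello")
print(len(datagram))                          # 8 header bytes + 5 data bytes = 13
print(struct.unpack("!HHHH", datagram[:8]))   # (5000, 53, 13, 0)
```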
Popular Applications That Use UDP
• Multimedia streaming
– Retransmitting lost/corrupted packets often pointless - by the time the packet is retransmitted, it’s too late
– E.g., telephone calls, video conferencing, gaming
– Recent streaming protocols using TCP (and HTTP)
• Simple query protocols like Domain Name System
– Connection establishment overhead would double cost
– Easier to have application retransmit if needed
[Figure: query “Address for bbc.co.uk?” answered with “212.58.224.131”]
Transmission Control Protocol (TCP)
• Connection oriented
– Explicit set-up and tear-down of TCP session
• Stream-of-bytes service
– Sends and receives a stream of bytes, not messages
• Congestion control
– Dynamic adaptation to network path’s capacity
• Reliable, in-order delivery
– TCP tries very hard to ensure byte stream (eventually) arrives intact
o In the presence of corruption and loss
• Flow control
– Ensure that sender doesn’t overwhelm receiver
Reliable Delivery
• How do we design for reliable delivery?
– How do you converse on a noisy phone line?
• Positive acknowledgment (“Ack”)
– Explicit confirmation by receiver
– TCP acknowledgments are cumulative (“I’ve received everything up through sequence #N”)
o With an option for acknowledging individual segments (“SACK”)
• Negative acknowledgment (“Nack”)
– “I’m missing the following: …”
– How might the receiver tell something’s missing? Can they always do this?
– (Only used by TCP in implicit fashion - “fast retransmit”)
Reliable Delivery, cont’d
• Timeout
– If haven’t heard anything from receiver, send again
– Problem: for how long do you wait?
o TCP uses function of estimated RTT
– Problem: what if no Ack for retransmission?
o TCP (and other schemes) employs exponential backoff
o Double timer up to maximum - tapers off load during congestion
• A very different approach to reliability: send redundant data
– Cell phone analogy: “Meet me at 3PM - repeat 3PM”
– Forward error correction
– Recovers from lost data nearly immediately!
– But: can only cope with a limited degree of loss
– And: adds load to the network
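The "double timer up to maximum" rule from the Timeout bullets can be sketched in a few lines of Python. The function name `backoff_schedule` and the 1-second and 60-second values are illustrative assumptions, not TCP's actual constants.

```python
def backoff_schedule(initial_timeout: float, max_timeout: float, attempts: int) -> list:
    """Timeout used for each successive (re)transmission of the same data."""
    timeouts = []
    timeout = initial_timeout
    for _ in range(attempts):
        timeouts.append(timeout)
        timeout = min(2 * timeout, max_timeout)   # double, but cap at the maximum
    return timeouts

print(backoff_schedule(1.0, 60.0, 8))
# [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

Each unanswered retransmission waits twice as long as the previous one, so a sender facing a congested path offers less and less load.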
TCP Support for Reliable Delivery
• Sequence numbers
– Used to detect missing data
– ... and for putting the data back in order
• Checksum
– Used to detect corrupted data at the receiver
– … leading the receiver to drop the packet
– No error signal sent - recovery via normal retransmission
• Retransmission
– Sender retransmits lost or corrupted data
– Timeout based on estimates of round-trip time (RTT)
– Fast retransmit algorithm for rapid retransmission
Efficient Transport Reliability
Automatic Repeat reQuest (ARQ)
• Automatic Repeat Request
– Receiver sends acknowledgment (ACK) when it receives packet
– Sender waits for ACK and times out if it does not arrive within some time period
• Simplest ARQ protocol
– Stop and Wait
– Send a packet, stop and wait until ACK arrives
[Figure: time diagram between Sender and Receiver showing Packet, ACK, and Timeout]
How Fast Can Stop-and-Wait Go?
• Suppose we’re sending from UCB to New York:
– Bandwidth = 1 Mbps (megabits/sec)
– RTT = 100 msec
– Maximum Transmission Unit (MTU) = 1500 B = 12,000 b
– No other load on the path and no packet loss
• What (approximately) is the fastest we can transmit using Stop-and-Wait?
• How about if Bandwidth = 1 Gbps?
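One way to work the Stop-and-Wait question numerically is shown below. The model is an assumption made for this sketch: one packet is delivered per cycle of transmission time plus RTT, and the time to transmit the ACK is treated as negligible. `stop_and_wait_throughput` is a hypothetical helper name.

```python
def stop_and_wait_throughput(bandwidth_bps: float, rtt_s: float, packet_bits: int) -> float:
    """Bits per second when only one packet may be outstanding at a time."""
    transmit_time = packet_bits / bandwidth_bps   # time to put the packet on the wire
    return packet_bits / (transmit_time + rtt_s)  # one packet per cycle

MTU_BITS = 12_000   # 1500 B
RTT = 0.100         # 100 msec

for name, bw in (("1 Mbps", 1e6), ("1 Gbps", 1e9)):
    rate = stop_and_wait_throughput(bw, RTT, MTU_BITS)
    print(f"{name}: about {rate / 1000:.0f} kbps")
# 1 Mbps: about 107 kbps
# 1 Gbps: about 120 kbps
```

Under this model the rate is roughly one MTU per RTT (about 120 kbps) no matter how fast the link is, so a thousandfold increase in bandwidth buys almost nothing.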
Allowing Multiple Packets in Flight
• “In Flight” = “Unacknowledged”
• Sender-side issue: how many packets (bytes)?
• Receiver-side issue: how much buffer for data that’s “above a sequence hole”?
– I.e., data that can’t be delivered since previous data is missing
– Assumes service model is in-order delivery (like TCP)
Sliding Window
• Allow a larger amount of data “in flight”
– Allow sender to get ahead of the receiver
– … though not too far ahead
[Figure: the sending process writes into the sender’s TCP and the receiving process reads from the receiver’s TCP. Sender window markers: Last byte ACKed, Last byte can send, Last byte written. Receiver window markers: Last byte read, Next byte needed, Last byte received]
Sliding Window, cont’d
• Both sender & receiver maintain a window that governs amount of data in flight (sender) or not-yet-delivered (receiver)
• Left edge of window:
– Sender: beginning of unacknowledged data
– Receiver: beginning of undelivered data
• For the sender:
– Window size = maximum amount of data in flight
o Determines rate
o Sender must have at least this much buffer (maybe more)
• For the receiver:
– Window size = maximum amount of undelivered data
o Receiver has this much buffer
Sliding Window
• For the sender, when it receives an acknowledgment for new data, the window advances (slides forward)
[Figure: sender window at the sending TCP, with markers Last byte ACKed, Last byte can send, Last byte written]
Sliding Window
• For the receiver, as the receiving process consumes data, the window slides forward
[Figure: receiver window at the receiving TCP, with markers Last byte read, Next byte needed, Last byte received]
Sliding Window, cont’d
• Sender: window advances when new data ack’d
• Receiver: window advances as receiving process consumes data
• What happens if sender’s window size exceeds the receiver’s window size?
• Receiver advertises to the sender where the receiver window currently ends (“righthand edge”)
– Sender agrees not to exceed this amount
– It makes sure by setting its own window size to a value that can’t send beyond the receiver’s righthand edge
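The sender-side bookkeeping from the sliding window slides can be sketched as a small Python class. This is a simplified model under stated assumptions: it counts bytes only, with no timers, retransmission, or real sockets, and the class and method names are hypothetical.

```python
class SlidingWindowSender:
    """Tracks how many bytes the sender may put in flight."""

    def __init__(self, window_size: int):
        self.window_size = window_size
        self.last_acked = 0          # left edge: start of unacknowledged data
        self.next_to_send = 0        # first byte not yet sent
        self.advertised_edge = None  # receiver's righthand edge, if known

    def on_advertisement(self, right_edge: int) -> None:
        """Receiver reports where its window currently ends."""
        self.advertised_edge = right_edge

    def can_send(self) -> int:
        """Bytes that may be sent now without overrunning either window."""
        limit = self.last_acked + self.window_size
        if self.advertised_edge is not None:
            limit = min(limit, self.advertised_edge)
        return max(0, limit - self.next_to_send)

    def send(self, nbytes: int) -> int:
        """Send as much of nbytes as the window allows; return the amount sent."""
        sent = min(nbytes, self.can_send())
        self.next_to_send += sent
        return sent

    def on_ack(self, ack_no: int) -> None:
        """Cumulative ACK for new data slides the window forward."""
        if ack_no > self.last_acked:
            self.last_acked = min(ack_no, self.next_to_send)

sender = SlidingWindowSender(window_size=1000)
print(sender.send(1500))      # only 1000 bytes fit in the window
sender.on_ack(400)            # window slides forward by 400 bytes
print(sender.can_send())      # 400
sender.on_advertisement(1200) # receiver's righthand edge caps the sender
print(sender.can_send())      # 200
```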
Performance with Sliding Window
• Given previous UCB → New York 1 Mbps path with 100 msec RTT and Sender (and Receiver) window = 100 Kb = 12.5 KB
• How fast can we transmit?
• What about with 12.5 KB window & 1 Gbps path?
• Window required to fully utilize path:
– Bandwidth-delay product (or “delay-bandwidth product”)
– 1 Gbps * 100 msec = 100 Mb = 12.5 MB
– Note: large window = many packets in flight
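The questions on the "Performance with Sliding Window" slide can be checked with a short calculation. It assumes the usual approximation that a sender can deliver at most one window per RTT, capped by the link bandwidth; `window_throughput` and `bandwidth_delay_product` are hypothetical helper names.

```python
def window_throughput(bandwidth_bps: float, rtt_s: float, window_bits: float) -> float:
    """At most one window of data per RTT, and never more than the link rate."""
    return min(bandwidth_bps, window_bits / rtt_s)

def bandwidth_delay_product(bandwidth_bps: float, rtt_s: float) -> float:
    """Window (in bits) needed to keep the path fully utilized."""
    return bandwidth_bps * rtt_s

RTT = 0.100          # 100 msec
WINDOW = 100_000     # 100 Kb = 12.5 KB

print(window_throughput(1e6, RTT, WINDOW) / 1e6)    # 1 Mbps path: about 1.0 Mbps (fully used)
print(window_throughput(1e9, RTT, WINDOW) / 1e6)    # 1 Gbps path: still about 1.0 Mbps
print(bandwidth_delay_product(1e9, RTT) / 8 / 1e6)  # about 12.5 MB window needed
```

With the 12.5 KB window, the 1 Gbps path runs at about 0.1% utilization, which is why the window has to grow to the bandwidth-delay product.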
Summary
• IP packet forwarding
– Based on longest-prefix match
– End systems use subnet mask to determine if traffic destined for their LAN …
o In which case they send directly, using ARP to find MAC address
– … or for some other network
o In which case they send to their local gateway (router)
– This info either statically config’d or learned via DHCP
• Transport protocols
– Multiplexing and demultiplexing via port numbers
– UDP gives simple datagram service
– TCP gives reliable byte-stream service
– Reliability immediately raises performance issues
o Stop-and-Wait vs. Sliding Window
Next Lecture
• DNS = Domain Name System
• Reading: K&R 2.5