Starting and Ending a Connection: TCP Handshakes
1
Establishing a TCP Connection
• Three-way handshake to establish a connection
– Host A sends a SYNchronize (open) to host B
– Host B returns a SYN ACKnowledgment (SYN ACK)
– Host A sends an ACK to acknowledge the SYN ACK
2
[Figure: A sends SYN to B; B replies SYN ACK; A returns ACK and begins sending data.]
Each host tells its ISN to the other host.
TCP Header
3
[Figure: the TCP header, one 32-bit word per row:
Source port | Destination port
Sequence number
Acknowledgment
HdrLen | 0 | Flags | Advertised window
Checksum | Urgent pointer
Options (variable)
Data]
Flags: SYN FIN RST PSH URG ACK
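The layout above can be written down as a C struct; this is an illustrative sketch (the field names follow the slide, not any particular kernel's headers), though the flag bit positions shown do match the protocol:

```c
#include <stdint.h>

/* Fixed 20-byte TCP header; options follow when HdrLen > 5 words.
   HdrLen (4 bits), 6 reserved zero bits, and the 6 flags share
   one 16-bit field here. */
struct tcp_header {
    uint16_t src_port;     /* Source port */
    uint16_t dst_port;     /* Destination port */
    uint32_t seq;          /* Sequence number */
    uint32_t ack;          /* Acknowledgment */
    uint16_t hdrlen_flags; /* HdrLen | reserved | SYN FIN RST PSH URG ACK */
    uint16_t window;       /* Advertised window */
    uint16_t checksum;     /* Checksum */
    uint16_t urgent;       /* Urgent pointer */
};

/* Flag bits within the low byte of hdrlen_flags */
enum { TCP_FIN = 0x01, TCP_SYN = 0x02, TCP_RST = 0x04,
       TCP_PSH = 0x08, TCP_ACK = 0x10, TCP_URG = 0x20 };
```

With natural alignment this struct is exactly the 20 fixed header bytes.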
Step 1: A's Initial SYN Packet
4
[Figure: header with source port = A's port, destination port = B's port; sequence number = A's Initial Sequence Number; SYN flag set.]
A tells B it wants to open a connection…
Step 2: B's SYN-ACK Packet
5
[Figure: header with source port = B's port, destination port = A's port; sequence number = B's Initial Sequence Number; acknowledgment = A's ISN plus 1; SYN and ACK flags set.]
B tells A it accepts, and is ready to hear the next byte…
… upon receiving this packet, A can start sending data
Step 3: A's ACK of the SYN-ACK
6
[Figure: header with source port = A's port, destination port = B's port; sequence number = A's ISN plus 1; acknowledgment = B's ISN plus 1; ACK flag set.]
A tells B it is okay to start sending…
… upon receiving this packet, B can start sending data
What if the SYN Packet Gets Lost?
• Suppose the SYN packet gets lost
– Packet is lost inside the network, or
– Server rejects the packet (e.g., listen queue is full)
• Eventually, no SYN-ACK arrives
– Sender sets a timer and waits for the SYN-ACK
– … and retransmits the SYN if needed
• How should the TCP sender set the timer?
– Sender has no idea how far away the receiver is
– Hard to guess a reasonable length of time to wait
– Some TCPs use a default of 3 or 6 seconds
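The retransmission timer can be sketched as follows; the 3-second initial value comes from the slide, while the doubling-per-retry schedule is an assumed (though common) backoff policy:

```c
/* Sketch of SYN retransmission backoff. Assumes a 3-second initial
   timeout that doubles on each retry; real stacks vary. */
int syn_timeout_secs(int retries_so_far) {
    int t = 3;                       /* assumed initial timeout */
    for (int i = 0; i < retries_so_far; i++)
        t *= 2;                      /* double on each retry */
    return t;
}
```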
7
SYN Loss and Web Downloads
• User clicks on a hypertext link
– Browser creates a socket and does a "connect"
– The "connect" triggers the OS to transmit a SYN
• If the SYN is lost…
– The 3-6 seconds of delay may be very long
– The user may get impatient
– … and click the hyperlink again, or click "reload"
• User triggers an "abort" of the "connect"
– Browser creates a new socket and does a "connect"
– Essentially, forces a faster send of a new SYN packet!
– Sometimes very effective, and the page comes fast
8
Tearing Down the Connection
• Closing (each end of) the connection
– Finish (FIN) to close and receive remaining bytes
– The other host sends a FIN ACK to acknowledge
– Reset (RST) to close and not receive remaining bytes
9
[Figure: full connection timeline between A and B — SYN, SYN ACK, ACK, data exchange; then a FIN from one host answered by a FIN ACK, and the same in the other direction.]
Sending/Receiving the FIN Packet
• Sending a FIN: close()
– Process is done sending data via the socket
– Process invokes "close()" to close the socket
– Once TCP has sent all of the outstanding bytes…
– … then TCP sends a FIN
• Receiving a FIN: EOF
– Process is reading data from the socket
– Eventually, the attempt to read returns an EOF
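The receiving side can be sketched in C: read() returning 0 is the EOF that corresponds to the peer's FIN (drain_until_eof is a hypothetical helper name):

```c
#include <unistd.h>

/* Read until EOF: read() returning 0 means the peer sent a FIN
   and no more data will arrive on this descriptor. */
long drain_until_eof(int fd) {
    char buf[4096];
    long total = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;                  /* consume remaining bytes */
    return total;                    /* n == 0 signaled EOF */
}
```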
10
11
Flow Control: TCP Sliding Window
12
Motivation for Sliding Window
• Stop-and-wait is inefficient
– Only one TCP segment is "in flight" at a time
– Esp. bad when delay-bandwidth product is high
• Numerical example
– 1.5 Mbps link with a 45 msec round-trip time (RTT)
• Delay-bandwidth product is 67.5 Kbits (or 8 KBytes)
– But, sender can send at most one packet per RTT
• Assuming a segment size of 1 KB (8 Kbits)
• … leads to 8 Kbits / 45 msec ≈ 182 Kbps
• Just one-eighth of the 1.5 Mbps link capacity
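The slide's arithmetic, checked in C (using 1 KB = 8192 bits, which is what makes the answer come out to 182 Kbps):

```c
/* Stop-and-wait sends one segment per RTT, so throughput is just
   segment size divided by RTT. */
double stop_and_wait_bps(double segment_bits, double rtt_secs) {
    return segment_bits / rtt_secs;
}

/* Delay-bandwidth product: bits needed "in flight" to fill the pipe. */
double delay_bandwidth_bits(double link_bps, double rtt_secs) {
    return link_bps * rtt_secs;
}
```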
13
Sliding Window
• Allow a larger amount of data "in flight"
– Allow sender to get ahead of the receiver
– … though not too far ahead
14
[Figure: sending process writes into TCP's send buffer, which tracks last byte ACKed ≤ last byte sent ≤ last byte written; receiving process reads from TCP's receive buffer, which tracks last byte read ≤ next byte expected ≤ last byte received.]
Receiver Buffering
• Window size
– Amount that can be sent without acknowledgment
– Receiver needs to be able to store this amount of data
• Receiver advertises the window to the sender
– Tells the sender the amount of free space left
– … and the sender agrees not to exceed this amount
15
[Figure: the byte stream divided into data ACK'd, outstanding un-ACK'd data, data OK to send, and data not OK to send yet; the window spans the un-ACK'd and OK-to-send regions.]
TCP Header for Receiver Buffering
16
[Figure: the same TCP header layout, highlighting the 16-bit Advertised window field that carries the receiver's window to the sender.]
Conclusions
• Transport protocols
– Multiplexing and demultiplexing
– Checksum-based error detection
– Sequence numbers
– Retransmission
– Window-based flow control
• Next lecture
– Congestion control
17
Congestion Control
Reading: Chapter 3
18
Goals of Today’s Lecture
• Congestion in IP networks
– Unavoidable due to best-effort service model
– IP philosophy: decentralized control at end hosts
• Congestion control by the TCP senders
– Infers congestion is occurring (e.g., from packet losses)
– Slows down to alleviate congestion, for the greater good
• TCP congestion-control algorithm
– Additive-increase, multiplicative-decrease
– Slow start and slow-start restart
• Active Queue Management (AQM)
– Random Early Detection (RED)
– Explicit Congestion Notification (ECN)
19
No Problem Under Circuit Switching
• Source establishes connection to destination
– Nodes reserve resources for the connection
– Circuit rejected if the resources aren't available
– Cannot have more than the network can handle
20
IP Best-Effort Design Philosophy
• Best-effort delivery
– Let everybody send
– Network tries to deliver what it can
– … and just drop the rest
21
[Figure: source and destination hosts connected across an IP network.]
Congestion is Unavoidable
• Two packets arrive at the same time
– The node can only transmit one
– … and either buffer or drop the other
• If many packets arrive in a short period of time
– The node cannot keep up with the arriving traffic
– … and the buffer may eventually overflow
22
The Problem of Congestion
• What is congestion?
– Load is higher than capacity
• What do IP routers do?
– Drop the excess packets
• Why is this bad?
– Wasted bandwidth for retransmissions
23
[Figure: goodput vs. load — goodput rises with load, then falls off at "congestion collapse": an increase in load that results in a decrease in useful work done.]
Ways to Deal With Congestion
• Ignore the problem
– Many dropped (and retransmitted) packets
– Can cause congestion collapse
• Reservations, like in circuit switching
– Pre-arrange bandwidth allocations
– Requires negotiation before sending packets
• Pricing
– Don't drop packets for the high-bidders
– Requires a payment model
• Dynamic adjustment (TCP)
– Every sender infers the level of congestion
– Each adapts its sending rate "for the greater good"
24
Many Important Questions
• How does the sender know there is congestion?
– Explicit feedback from the network?
– Inference based on network performance?
• How should the sender adapt?
– Explicit sending rate computed by the network?
– End host coordinates with other hosts?
– End host thinks globally but acts locally?
• What is the performance objective?
– Maximizing goodput, even if some users suffer more?
– Fairness? (Whatever the heck that means!)
• How fast should new TCP senders send?
25
Inferring From Implicit Feedback
• What does the end host see?
• What can the end host change?
26
Where Congestion Happens: Links
• Simple resource allocation: FIFO queue & drop-tail
• Access to the bandwidth: first-in first-out queue
– Packets transmitted in the order they arrive
• Access to the buffer space: drop-tail queuing
– If the queue is full, drop the incoming packet
27
How it Looks to the End Host
• Packet delay
– Packet experiences high delay
• Packet loss
– Packet gets dropped along the way
• How does TCP sender learn this?
– Delay
• Round-trip time estimate
– Loss
• Timeout
• Duplicate acknowledgments
28
What Can the End Host Do?
• Upon detecting congestion (well, packet loss)
– Decrease the sending rate
– End host does its part to alleviate the congestion
• But, what if conditions change?
– Suppose there is more bandwidth available
– Would be a shame to stay at a low sending rate
• Upon not detecting congestion
– Increase the sending rate, a little at a time
– And see if the packets are successfully delivered
29
TCP Congestion Window
• Each TCP sender maintains a congestion window
– Maximum number of bytes to have in transit
– I.e., number of bytes still awaiting acknowledgments
• Adapting the congestion window
– Decrease upon losing a packet: backing off
– Increase upon success: optimistically exploring
– Always struggling to find the right transfer rate
• Both good and bad
– Pro: avoids having explicit feedback from network
– Con: under-shooting and over-shooting the rate
30
Additive Increase, Multiplicative Decrease (AIMD)
• How much to increase and decrease?
– Increase linearly, decrease multiplicatively
– A necessary condition for stability of TCP
– Consequences of an over-sized window are much worse than having an under-sized window
• Over-sized window: packets dropped and retransmitted
• Under-sized window: somewhat lower throughput
• Multiplicative decrease
– On loss of packet, divide congestion window in half
• Additive increase
– On success for last window of data, increase linearly
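The two AIMD rules can be sketched as window updates in bytes; MSS = 1460 is an assumed typical value, and the 1-MSS floor anticipates the "Practical Details" slide:

```c
enum { MSS = 1460 };   /* assumed Maximum Segment Size in bytes */

/* Additive increase: one MSS per window of data ACKed. */
unsigned aimd_on_window_acked(unsigned cwnd) {
    return cwnd + MSS;
}

/* Multiplicative decrease: halve on loss, but never below 1 MSS. */
unsigned aimd_on_loss(unsigned cwnd) {
    unsigned half = cwnd / 2;
    return half < MSS ? MSS : half;
}
```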
31
Leads to the TCP “Sawtooth”
32
[Figure: congestion window vs. time — the "sawtooth": the window grows linearly, then is halved at each loss.]
Practical Details
• Congestion window
– Represented in bytes, not in packets (Why?)
– Packets have MSS (Maximum Segment Size) bytes
• Increasing the congestion window
– Increase by MSS on success for last window of data
• Decreasing the congestion window
– Never drop congestion window below 1 MSS
33
Receiver Window vs. Congestion Window
• Flow control
– Keep a fast sender from overwhelming a slow receiver
• Congestion control
– Keep a set of senders from overloading the network
• Different concepts, but similar mechanisms
– TCP flow control: receiver window
– TCP congestion control: congestion window
– TCP window: min { congestion window, receiver window }
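That combination is just a minimum; a one-line sketch:

```c
/* The sender's usable window is the smaller of the two limits. */
unsigned tcp_window(unsigned congestion_window, unsigned receiver_window) {
    return congestion_window < receiver_window
         ? congestion_window : receiver_window;
}
```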
34
How Should a New Flow Start?
35
[Figure: window vs. time for a new flow.]
Need to start with a small CWND to avoid overloading the network, but linear increase could take a long time to get started!
“Slow Start” Phase
• Start with a small congestion window
– Initially, CWND is 1 Max Segment Size (MSS)
– So, initial sending rate is MSS/RTT
• That could be pretty wasteful
– Might be much less than the actual bandwidth
– Linear increase takes a long time to accelerate
• Slow-start phase (really "fast start")
– Sender starts at a slow rate (hence the name)
– … but increases the rate exponentially
– … until the first loss event
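A sketch of the exponential growth, assuming MSS = 1460 bytes:

```c
/* Slow-start growth: CWND starts at 1 MSS and doubles each RTT
   (equivalently, grows by one MSS per ACK received). */
unsigned slow_start_cwnd_after(unsigned rtts) {
    unsigned cwnd = 1460;            /* 1 MSS, assuming MSS = 1460 */
    while (rtts-- > 0)
        cwnd *= 2;                   /* exponential growth per RTT */
    return cwnd;
}
```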
36
Slow Start in Action
37
[Figure: Src sends data segments (D), Dest returns ACKs (A) — CWND doubles per round-trip time: 1, 2, 4, 8 segments.]
Slow Start and the TCP Sawtooth
38
[Figure: window vs. time — exponential "slow start" until the first loss, then the linear sawtooth.]
Why is it called slow start? Because TCP originally had no congestion-control mechanism: the source would just start by sending a whole receiver window's worth of data.
Two Kinds of Loss in TCP
• Timeout
– Packet n is lost and detected via a timeout
– E.g., because all packets in flight were lost
– After the timeout, blasting away for the entire CWND
– … would trigger a very large burst in traffic
– So, better to start over with a low CWND
• Triple duplicate ACK
– Packet n is lost, but packets n+1, n+2, etc. arrive
– Receiver sends duplicate acknowledgments
– … and the sender retransmits packet n quickly
– Do a multiplicative decrease and keep going
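The two reactions can be sketched as updates to a (hypothetical) control block, with windows in MSS units; remembering half the old window in ssthresh follows standard TCP practice, though the slide doesn't name that variable:

```c
struct cc { unsigned cwnd, ssthresh; };

/* Timeout: start over with a low CWND (slow-start restart). */
void cc_on_timeout(struct cc *s) {
    s->ssthresh = s->cwnd / 2;   /* remember half the old window */
    s->cwnd = 1;                 /* back to 1 MSS */
}

/* Triple duplicate ACK: multiplicative decrease and keep going. */
void cc_on_triple_dupack(struct cc *s) {
    s->ssthresh = s->cwnd / 2;
    s->cwnd = s->ssthresh;       /* halve the window */
}
```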
39
Repeating Slow Start After Timeout
40
[Figure: window vs. time — after a timeout, slow-start restart: go back to a CWND of 1, but take advantage of knowing the previous value of CWND; slow start operates until the window reaches half the previous CWND.]
Repeating Slow Start After Idle Period
• Suppose a TCP connection goes idle for a while
– E.g., a Telnet session where you don't type for an hour
• Eventually, the network conditions change
– Maybe many more flows are traversing the link
– E.g., maybe everybody has come back from lunch!
• Dangerous to start transmitting at the old rate
– Previously-idle TCP sender might blast the network
– … causing excessive congestion and packet loss
• So, some TCP implementations repeat slow start
– Slow-start restart after an idle period
41
TCP Achieves Some Notion of Fairness
• Effective utilization is not the only goal
– We also want to be fair to the various flows
– … but what the heck does that mean?
• Simple definition: equal shares of the bandwidth
– N flows that each get 1/N of the bandwidth?
– But, what if the flows traverse different paths?
– E.g., bandwidth shared in proportion to the RTT
42
What About Cheating?
• Some folks are more fair than others
– Running multiple TCP connections in parallel
– Modifying the TCP implementation in the OS
– Using the User Datagram Protocol
• What is the impact?
– Good guys slow down to make room for you
– You get an unfair share of the bandwidth
• Possible solutions?
– Routers detect cheating and drop excess packets?
– Peer pressure?
– ???
43
Queuing Mechanisms
Random Early Detection (RED)
Explicit Congestion Notification (ECN)
44
Bursty Loss From Drop-Tail Queuing
• TCP depends on packet loss
– Packet loss is the indication of congestion
– In fact, TCP drives the network into packet loss
– … by continuing to increase the sending rate
• Drop-tail queuing leads to bursty loss
– When a link becomes congested…
– … many arriving packets encounter a full queue
– And, as a result, many flows divide sending rate in half
– … and, many individual flows lose multiple packets
45
Slow Feedback from Drop Tail
• Feedback comes when buffer is completely full
– … even though the buffer has been filling for a while
• Plus, the filling buffer is increasing RTT
– … and the variance in the RTT
• Might be better to give early feedback
– Get 1-2 connections to slow down, not all of them
– Get these connections to slow down before it is too late
46
Random Early Detection (RED)
• Basic idea of RED
– Router notices that the queue is getting backlogged
– … and randomly drops packets to signal congestion
• Packet drop probability
– Drop probability increases as queue length increases
– If buffer is below some level, don't drop anything
– … otherwise, set drop probability as a function of queue length
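A sketch of that drop-probability function; the thresholds and maximum probability are illustrative tunables, not standard values (as the later slide notes, they are hard to get right):

```c
/* RED drop probability: nothing below min_th, a forced drop at or
   above max_th, and a linear ramp up to max_p in between. */
double red_drop_prob(double avg_qlen, double min_th,
                     double max_th, double max_p) {
    if (avg_qlen < min_th)  return 0.0;
    if (avg_qlen >= max_th) return 1.0;
    return max_p * (avg_qlen - min_th) / (max_th - min_th);
}
```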
47
[Figure: drop probability vs. average queue length — zero below a minimum threshold, then increasing with the average queue length.]
Properties of RED
• Drops packets before the queue is full
– In the hope of reducing the rates of some flows
• Drops packets in proportion to each flow's rate
– High-rate flows have more packets
– … and, hence, a higher chance of being selected
• Drops are spaced out in time
– Which should help desynchronize the TCP senders
• Tolerant of burstiness in the traffic
– By basing the decisions on average queue length
48
Problems With RED
• Hard to get the tunable parameters just right
– How early to start dropping packets?
– What slope for the increase in drop probability?
– What time scale for averaging the queue length?
• Sometimes RED helps but sometimes not
– If the parameters aren't set right, RED doesn't help
– And it is hard to know how to set the parameters
• RED is implemented in practice
– But often not used, due to the challenges of tuning it right
• Many variations in the research community
– With cute names like "Blue" and "FRED"…
49
Explicit Congestion Notification
• Early dropping of packets
– Good: gives early feedback
– Bad: has to drop the packet to give the feedback
• Explicit Congestion Notification
– Router marks the packet with an ECN bit
– … and the sending host interprets it as a sign of congestion
• Surmounting the challenges
– Must be supported by the end hosts and the routers
– Requires 2 bits in the IP header for detection (forward direction)
• One for the ECN mark; one to indicate ECN capability
• Solution: borrow 2 of the Type-Of-Service bits in the IPv4 header
– Also 2 bits in the TCP header for signaling the sender (reverse direction)
50
Other TCP Mechanisms
Nagle’s Algorithm and Delayed ACK
51
Motivation for Nagle's Algorithm
• Interactive applications
– Telnet and rlogin
– Generate many small packets (e.g., keystrokes)
• Small packets are wasteful
– Mostly header (e.g., 40 bytes of header, 1 of data)
• Appealing to reduce the number of packets
– Could force every packet to have some minimum size
– … but, what if the person doesn't type more characters?
• Need to balance competing trade-offs
– Send larger packets
– … but don't introduce much delay by waiting
52
Nagle's Algorithm
• Wait if the amount of data is small
– Smaller than Maximum Segment Size (MSS)
• And some other packet is already in flight
– I.e., still awaiting the ACKs for previous packets
• That is, send at most one small packet per RTT
– … by waiting until all outstanding ACKs have arrived
• Influence on performance
– Interactive applications: enables batching of bytes
– Bulk transfer: transmits in MSS-sized packets anyway
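The decision rule can be sketched as a predicate (the function and parameter names are illustrative):

```c
/* Nagle decision: full-sized segments always go out; a small
   segment is sent only when no data is outstanding, so at most
   one small packet is in flight per RTT. */
int nagle_may_send(unsigned seg_bytes, unsigned mss,
                   unsigned unacked_bytes) {
    if (seg_bytes >= mss)
        return 1;                    /* full segment: send now */
    return unacked_bytes == 0;       /* small: only if nothing in flight */
}
```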
53
Turning Nagle Off

void
tcp_nodelay (int s)
{
    int n = 1;
    if (setsockopt (s, IPPROTO_TCP, TCP_NODELAY,
                    (char *) &n, sizeof (n)) < 0)
        warn ("TCP_NODELAY: %m\n");
}
Motivation for Delayed ACK
• TCP traffic is often bidirectional
– Data traveling in both directions
– ACKs traveling in both directions
• ACK packets have high overhead
– 40 bytes for the IP header and TCP header
– … and zero data traffic
• Piggybacking is appealing
– Host B can send an ACK to host A
– … as part of a data packet from B to A
55
TCP Header Allows Piggybacking
56
[Figure: the same TCP header layout — the Acknowledgment field and ACK flag let an ACK ride in the same segment as outgoing data.]
Example of Piggybacking
57
[Figure: timeline between A and B — when B has data to send, it returns Data+ACK in one segment; when B doesn't have data to send, it returns a bare ACK.]
Increasing Likelihood of Piggybacking
• Example: rlogin or telnet
– Host A types characters at a prompt
– Host B receives the characters and executes a command
– … and then data are generated
– Would be nice if B could send the ACK with the new data
• Increase piggybacking
– TCP allows the receiver to wait to send the ACK
– … in the hope that the host will have data to send
58
Delayed ACK
• Delay sending an ACK
– Upon receiving a packet, host B sets a timer
• Typically, 200 msec or 500 msec
– If B's application generates data, go ahead and send
• And piggyback the ACK bit
– If the timer expires, send a (non-piggybacked) ACK
• Limiting the wait
– Timer of 200 msec or 500 msec
– ACK every other full-sized packet
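The three cases can be sketched as a predicate (the function and parameter names are illustrative):

```c
/* Delayed-ACK decision, following the slide's cases: piggyback if
   the application has data ready, ACK every other full-sized
   segment, otherwise wait for the (200 or 500 msec) timer. */
int delack_send_now(int app_has_data, unsigned full_segments_unacked,
                    int timer_expired) {
    if (app_has_data)               return 1;  /* piggyback the ACK */
    if (full_segments_unacked >= 2) return 1;  /* every other packet */
    return timer_expired;                      /* else only on timer */
}
```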
59
Conclusions
• Congestion is inevitable
– Internet does not reserve resources in advance
– TCP actively tries to push the envelope
• Congestion can be handled
– Additive increase, multiplicative decrease
– Slow start, and slow-start restart
• Active Queue Management can help
– Random Early Detection (RED)
– Explicit Congestion Notification (ECN)
• Fundamental tensions
– Feedback from the network?
– Enforcement of "TCP friendly" behavior?
60