CS 4284 Systems Capstone
Godmar Back
Networking
TCP
CS 4284 Spring 2013
Study of TCP Outline
• segment structure
• reliable data transfer
• flow control
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP Overview (RFCs 793, 1122, 1323, 2018, 2581)
• full duplex data:
  – bi-directional data flow in same connection
  – MSS: maximum segment size
• connection-oriented:
  – handshaking (exchange of control msgs) initializes sender and receiver state before data exchange
• flow controlled:
  – sender will not overwhelm receiver
• point-to-point:
  – one sender, one receiver
• reliable, in-order byte stream:
  – no "message boundaries"
• pipelined:
  – TCP congestion and flow control set window size
• send & receive buffers
[Figure: the application writes data through its socket door into the TCP send buffer; segments travel to the TCP receive buffer, from which the other application reads data through its socket door]
TCP Segment Structure (rows are 32 bits wide)
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | receive window
  checksum | urg data pointer
  options (variable length)
  application data (variable length)
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection establishment (setup, teardown commands)
• receive window: # bytes rcvr willing to accept
• sequence and acknowledgement numbers: counting by bytes of data (not segments)
• checksum: Internet checksum (as in UDP)
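The fixed part of this layout is 20 bytes in network byte order, so it can be decoded field by field. A minimal sketch using Python's struct module, assuming options are ignored; the function name parse_tcp_header and the sample values in the usage note are hypothetical:

```python
import struct

def parse_tcp_header(data: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (options are ignored)."""
    if len(data) < 20:
        raise ValueError("need at least 20 bytes")
    # ! = network byte order; H = 16-bit field, I = 32-bit field
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", data[:20])
    header_len = (offset_flags >> 12) * 4        # head len counts 32-bit words
    flags = offset_flags & 0x3F                  # low 6 bits: U A P R S F
    names = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]   # bit 0 .. bit 5
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "header_len": header_len,
        "flags": [n for i, n in enumerate(names) if flags & (1 << i)],
        "window": window, "checksum": checksum, "urg_ptr": urg_ptr,
    }
```

For example, a header built with `struct.pack("!HHIIHHHH", 1234, 80, 42, 79, (5 << 12) | 0x12, 65535, 0, 0)` decodes to a 20-byte header with flags `["SYN", "ACK"]`.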
TCP Reliable Data Transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
  – timeout events
  – duplicate acks
• Initially consider simplified TCP sender:
  – ignore duplicate acks
  – ignore flow control, congestion control
TCP Seq. #'s and ACKs
Seq. #'s:
  – byte stream "number" of first byte in segment's data
ACKs:
  – seq # of next byte expected from other side
  – cumulative ACK
Q: how does the receiver handle out-of-order segments?
  – A: TCP spec doesn't say; up to implementor
[Figure: simple telnet scenario between Host A and Host B. User types 'C': A sends Seq=42, ACK=79, data = 'C'. Host B ACKs receipt of 'C' and echoes back 'C': Seq=79, ACK=43, data = 'C'. Host A ACKs receipt of the echoed 'C': Seq=43, ACK=80.]
TCP Sender Events
data rcvd from app:
• create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
ack rcvd:
• if it acknowledges previously unacked segments:
  – update what is known to be acked
  – start timer if there are outstanding segments
TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event)

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71, so SendBase = 72; say an ACK is received with y = 73, so the rcvr acknowledges up to and including 72 and now wants 73+
• y > SendBase, so we know that new data is acked; set SendBase to 73
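The pseudocode above can be turned into a small event-driven sketch. This is an illustration under stated assumptions, not a real TCP: the class name SimplifiedTcpSender is hypothetical, the timer is modeled as a boolean, and "passing a segment to IP" just appends to a log.

```python
class SimplifiedTcpSender:
    """Simplified sender: no duplicate-ACK handling, flow control, or congestion control."""

    def __init__(self, initial_seq: int):
        self.next_seq = initial_seq
        self.send_base = initial_seq
        self.unacked = {}          # seq -> payload of segments not yet acknowledged
        self.timer_running = False
        self.sent = []             # (seq, payload) handed to "IP", incl. retransmissions

    def on_app_data(self, data: bytes) -> None:
        self.unacked[self.next_seq] = data
        if not self.timer_running:
            self.timer_running = True            # start timer
        self.sent.append((self.next_seq, data))  # pass segment to IP
        self.next_seq += len(data)

    def on_timeout(self) -> None:
        seq = min(self.unacked)                  # smallest not-yet-acked seq number
        self.sent.append((seq, self.unacked[seq]))
        self.timer_running = True                # restart timer

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            self.send_base = y
            # cumulative ACK: drop every segment that ends at or before y
            self.unacked = {s: d for s, d in self.unacked.items() if s + len(d) > y}
            self.timer_running = bool(self.unacked)  # restart only if data outstanding
```

Feeding it the cumulative ACK scenario (8 bytes at Seq=92, 20 bytes at Seq=100, then ACK=120) leaves SendBase at 120 with nothing outstanding and no retransmission.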
TCP retransmission scenarios
[Figure: lost ACK scenario. Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost (X). When the Seq=92 timeout expires, A retransmits Seq=92, 8 bytes data; B again replies ACK=100, and SendBase = 100.]
[Figure: premature timeout. Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; Host B replies ACK=100 and ACK=120, but the Seq=92 timer expires before ACK=100 arrives, so A retransmits Seq=92, 8 bytes data. SendBase advances to 100 and then to 120; B answers the retransmission with ACK=120.]
TCP retransmission scenarios (more)
[Figure: cumulative ACK scenario. Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; Host B's ACK=100 is lost (X), but ACK=120 arrives before the Seq=92 timeout expires, so SendBase = 120 and nothing is retransmitted.]
TCP ACK generation [RFC 1122, RFC 2581]
Event at Receiver | TCP Receiver Action
Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed | Delayed ACK: wait up to 500 ms for next segment; if no next segment, send ACK
Arrival of in-order segment with expected seq #; one other segment has ACK pending | Immediately send single cumulative ACK, ACKing both in-order segments
Arrival of out-of-order segment with higher-than-expected seq #; gap detected | Immediately send duplicate ACK, indicating seq # of next expected byte
Arrival of segment that partially or completely fills gap | Immediately send ACK, provided that segment starts at lower end of gap
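The table reads naturally as a decision function. A minimal sketch that maps one arriving segment to a row; the names AckAction and receiver_action are hypothetical, and the gap_buffered flag (out-of-order data is already buffered beyond expected_seq) is an assumed way of describing receiver state:

```python
from enum import Enum

class AckAction(Enum):
    DELAYED_ACK = "wait up to 500 ms for next segment; if none, send ACK"
    CUMULATIVE_ACK = "immediately send one cumulative ACK for both segments"
    DUPLICATE_ACK = "immediately send duplicate ACK for next expected byte"
    IMMEDIATE_ACK = "immediately send ACK (segment starts at lower end of gap)"

def receiver_action(seg_seq: int, expected_seq: int,
                    ack_pending: bool, gap_buffered: bool) -> AckAction:
    """Pick the receiver action for an arriving segment, following the table rows."""
    if seg_seq > expected_seq:
        return AckAction.DUPLICATE_ACK           # out of order: gap detected
    if seg_seq < expected_seq:
        raise ValueError("old duplicate segment: not covered by the table")
    if gap_buffered:
        return AckAction.IMMEDIATE_ACK           # fills (part of) the gap
    if ack_pending:
        return AckAction.CUMULATIVE_ACK          # second in-order segment
    return AckAction.DELAYED_ACK                 # first in-order segment
```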
Nagle's algorithm
• Consider again apps writing byte-by-byte (or in small chunks)
  – If you send 1-byte segments with 20 bytes of TCP header overhead plus the overhead of lower-layer headers, you waste a huge amount of network capacity
• Nagle:
  – Transmit first byte
  – Buffer outgoing bytes until ack has been received, then send all at once
• You can turn this off:
  – Use setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, ...); TCP_NODELAY is a TCP-level option, not a SOL_SOCKET option
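Python's standard socket module exposes the same BSD call, so disabling Nagle on a socket looks roughly like this. The helper name make_nodelay_socket is hypothetical; note that this only disables Nagle on the sending side and does not affect delayed ACKs at the receiver.

```python
import socket

def make_nodelay_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option level is IPPROTO_TCP because TCP_NODELAY belongs to TCP itself.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s

if __name__ == "__main__":
    sock = make_nodelay_socket()
    print("TCP_NODELAY set:", sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
    sock.close()
```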
Delayed ACK vs. Nagle
• These two mechanisms have nothing to do with each other
  – often confused, especially by various online sources
  – TCP_NODELAY means "disable Nagle," not "disable delayed ACKs"
• Delayed ACK is implemented by the TCP receiver
  – Delays sending an acknowledgement (because acks are cumulative, this can reduce the number of acks sent)
• Nagle's algorithm is implemented by the TCP sender
  – Delays sending actual data (so that more data can be fit into a single segment)
TCP
Timeout & Fast Retransmit
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  – but RTT varies
• too short: premature timeout
  – unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
• SampleRTT will vary; want estimated RTT "smoother"
  – average several recent measurements, not just current SampleRTT
RTT Distributions
[Figure: (a) Probability density of ACK arrival times in the data link layer. (b) Probability density of ACK arrival times for TCP.]
TCP Round Trip Time and Timeout
$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
• exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125
Example RTT estimation:
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; SampleRTT and EstimatedRTT in milliseconds (axis 100 to 350) plotted against time in seconds]
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
  – large variation in EstimatedRTT → larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT
$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
$\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT} - \text{EstimatedRTT}|$
(typically α = 0.125, β = 0.25)
$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
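A minimal sketch of the estimator with the gains above. The class name RttEstimator is hypothetical, and the initialization (EstimatedRTT = first sample, DevRTT = half of it) is an assumption taken from RFC 2988 rather than from these slides. DevRTT is updated with the old EstimatedRTT before EstimatedRTT itself moves, which is the order RFC 2988 uses.

```python
class RttEstimator:
    """EWMA RTT estimator producing TimeoutInterval = EstimatedRTT + 4 * DevRTT."""

    def __init__(self, first_sample: float, alpha: float = 0.125, beta: float = 0.25):
        self.alpha, self.beta = alpha, beta
        self.estimated_rtt = first_sample      # assumed seed (RFC 2988 style)
        self.dev_rtt = first_sample / 2        # assumed seed (RFC 2988 style)

    def update(self, sample_rtt: float) -> None:
        # deviation is measured against the estimate *before* this sample
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_rtt

    @property
    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt
```

With a first sample of 100 ms the timeout starts at 300 ms; a second, identical sample shrinks DevRTT to 37.5 ms and the timeout to 250 ms, showing how a steady RTT tightens the safety margin.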
RTT Measurement & Retransmissions
• How should you handle acks for retransmitted segments when measuring RTT?
  – Note: the ACK could be delayed for the original segment or early for the retransmitted segment
• Choices:
  – Associate with original packet: may overestimate true RTT
  – Associate with retransmitted packet: may underestimate true RTT
Karn's Algorithm
• Idea: don't consider samples from retransmitted segments
  – OK, but what if the current timeout value is too small for the network delay?
  – In this case, would keep timing out but couldn't adjust timeout according to formula
• Solution: if measurement can't be made, use exponential backoff until new measurement can be made
  – By factor of 2, with limit
• Resume normal sampling algorithm afterwards
  – Usual sampling period is once per RTT
• See also RFC 2988
  – TCP timestamp option can be used to remove ambiguity about which ack matches which packet
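Karn's rule and the backoff can be combined in one small sketch. Assumptions: the class name KarnTimer is hypothetical, the RTT formulas and gains are the ones from the previous slides, and the 64-second cap stands in for the unspecified "limit".

```python
class KarnTimer:
    """Ignore RTT samples from retransmitted segments; back off the timeout instead."""

    def __init__(self, first_sample: float, max_timeout: float = 64.0):
        self.est = first_sample           # EstimatedRTT (seeded as in RFC 2988)
        self.dev = first_sample / 2       # DevRTT
        self.backoff = 1                  # multiplier, doubled on every timeout
        self.max_timeout = max_timeout    # hypothetical cap ("with limit")

    @property
    def timeout(self) -> float:
        return min((self.est + 4 * self.dev) * self.backoff, self.max_timeout)

    def on_timeout(self) -> None:
        self.backoff *= 2                 # exponential backoff by a factor of 2

    def on_ack(self, sample_rtt: float, segment_was_retransmitted: bool) -> bool:
        """Return True if the sample was used to update the estimate."""
        if segment_was_retransmitted:
            return False                  # ambiguous: original or retransmission?
        self.dev = 0.75 * self.dev + 0.25 * abs(sample_rtt - self.est)
        self.est = 0.875 * self.est + 0.125 * sample_rtt
        self.backoff = 1                  # valid sample: resume the normal algorithm
        return True
```

After two timeouts the interval has quadrupled; an ACK for a retransmitted segment leaves it unchanged, and only an ACK for a segment sent once brings the computed value back.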
Fast Retransmit
• Time-out period often relatively long
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If a segment is lost, there will likely be many duplicate ACKs
• If the sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost
  – fast retransmit: resend segment before timer expires
Fast retransmit algorithm:

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {  /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y == 3) {
      resend segment with sequence number y  /* fast retransmit */
    }
  }
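A minimal sketch of the ACK handler above with a duplicate-ACK counter. The class name FastRetransmitSender is hypothetical; timers, flow control, and congestion control are deliberately omitted, and only ACKs equal to SendBase are counted as duplicates (older, reordered ACKs are ignored), which is a simplifying assumption.

```python
class FastRetransmitSender:
    """ACK handling with duplicate-ACK counting; timer management omitted."""

    DUP_ACK_THRESHOLD = 3

    def __init__(self, send_base: int):
        self.send_base = send_base
        self.dup_acks = 0
        self.retransmitted = []       # seq numbers resent by fast retransmit

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            self.send_base = y        # new data acknowledged
            self.dup_acks = 0
        elif y == self.send_base:
            self.dup_acks += 1        # duplicate ACK for already ACKed data
            if self.dup_acks == self.DUP_ACK_THRESHOLD:
                self.retransmitted.append(y)   # resend segment starting at y
```

One new ACK followed by three duplicates of it triggers exactly one early retransmission; a fourth duplicate does not trigger another.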
TCP
Flow Control
TCP Flow Control
• receive side of TCP connection has a receive buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
• app process may be slow at reading from buffer
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
TCP Flow Control (cont'd)
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn't overflow
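Both sides of the rule are simple arithmetic. A minimal sketch, assuming byte counters are plain integers (sequence-number wraparound is ignored); the function names are hypothetical:

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room the receiver advertises (out-of-order segments assumed discarded)."""
    buffered = last_byte_rcvd - last_byte_read   # received but not yet read by the app
    return rcv_buffer - buffered

def sender_may_send(last_byte_sent: int, last_byte_acked: int,
                    length: int, advertised: int) -> bool:
    """Sender keeps unACKed data, including the new segment, within RcvWindow."""
    return (last_byte_sent - last_byte_acked) + length <= advertised
```

For instance, a 4096-byte buffer holding 2000 unread bytes advertises 2096 bytes; a sender with 1000 bytes in flight may then add a 1000-byte segment but not a 1200-byte one.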
Flow Control: Persistence Timer
• Suppose sender is blocked because receiver application hasn't picked up data
• Then receiver app reads n bytes
• TCP receiver advertises new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
  – Solution: persistence timer; sender sends a probe after a period of inactivity to provoke a window update
Flow Control: Silly Window Syndrome
Clark's solution: don't advertise windows below a certain size
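One common textbook formulation of Clark's solution sets the "certain size" to the smaller of the MSS and half the receive buffer. A minimal sketch under that assumption (advertised_window is a hypothetical name, and real stacks express the rule as "don't move the right window edge by less than this amount", which is similar but not identical):

```python
def advertised_window(rcv_buffer: int, buffered: int, mss: int) -> int:
    """Receiver-side SWS avoidance: advertise 0 instead of a 'silly' small window."""
    free = rcv_buffer - buffered
    threshold = min(mss, rcv_buffer // 2)   # assumed threshold: min(MSS, buffer/2)
    return free if free >= threshold else 0
```

With an 8192-byte buffer and a 1460-byte MSS, 192 free bytes are advertised as 0, while 2192 free bytes are advertised in full.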
Study of TCP Outline
• segment structure
• reliable data transfer
  – delayed ACKs
  – Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP
Connection Management
TCP Connection Management
Recall: TCP sender and receiver establish a "connection" before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  connect(s, &dstaddr, …)
• server: contacted by client
  cl = accept(sv, &caddr, …)
Three way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
TCP 3-way handshake
TCP connection establishment
• Q1: why 3-way and not 2-way handshake?
• Q2: how do sender & receiver determine initial seq. #s?
3-way Handshake & Delayed Dups
[Figure: (a) Normal operation. (b) Old SYN appearing out of nowhere. (c) Duplicate SYN and duplicate ACK following SYN.]
3-way handshake required to deal with scenarios (b) and (c)
Sequence Number Reuse
• Idea: tie initial TCP seq. numbers to a clock
  – Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  – Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock value + F(src, dst, sport, dport, secret), where F is a hash function and secret is a random value known only to the host
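A minimal sketch of the RFC 1948 scheme: a clock term that ticks every 4 µs plus a keyed hash of the connection 4-tuple, modulo 2^32. Assumptions: the function name rfc1948_isn and the 16-byte per-host secret are hypothetical, and although RFC 1948 suggests MD5 for F, this sketch swaps in SHA-256.

```python
import hashlib
import os
import time
from typing import Optional

# Per-host random secret (assumption: chosen once, e.g. at boot).
SECRET = os.urandom(16)

def rfc1948_isn(src: str, dst: str, sport: int, dport: int,
                secret: bytes = SECRET, now_us: Optional[int] = None) -> int:
    """ISN = M + F(src, dst, sport, dport, secret), modulo 2**32."""
    if now_us is None:
        now_us = time.monotonic_ns() // 1000
    m = now_us // 4                                   # M: ticks once every 4 microseconds
    material = f"{src}|{dst}|{sport}|{dport}|".encode() + secret
    f = int.from_bytes(hashlib.sha256(material).digest()[:4], "big")  # F: keyed hash
    return (m + f) % 2**32
```

For a fixed 4-tuple, the ISN still advances with the clock (8 µs later it is 2 higher), so old duplicates are guarded against; an outsider who does not know the secret cannot compute the F term for a given 4-tuple.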
When Sequence Numbers Attack
• Suppose attacker A can predict the sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  – B believes it is talking to C, and might grant permissions based on C's IP address
  – Attacker on A must suppress the RST packets C is likely to send; use a denial-of-service attack for that
• A sends message to compromise B
When SYNs Attack
• Servers receiving SYN must allocate resources
  – Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution:
  – SYN cookies
• Server computes its initial sequence number (the cookie) and sends SYNACK, but does not allocate buffers
  – If client continues with ACK, check whether the acknowledged number could have been sent; if correct, then allocate buffers
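A simplified sketch of the idea: the server derives its ISN from a keyed hash of the connection and a coarse time slot, stores nothing, and validates the client's final ACK by recomputing it. The function names, the secret, and the time-slot scheme are hypothetical; real SYN cookies (e.g. in Linux) pack a timestamp and an MSS index into specific bits, which this sketch does not reproduce.

```python
import hashlib
import hmac
import os

SECRET = os.urandom(16)   # server-side secret (assumption)

def make_cookie(src, dst, sport, dport, client_isn, time_slot, secret=SECRET) -> int:
    """Server ISN derived from the connection instead of stored per-SYN state."""
    msg = f"{src}|{dst}|{sport}|{dport}|{client_isn}|{time_slot}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")

def check_ack(src, dst, sport, dport, ack_seq, ack_number, time_slot, secret=SECRET) -> bool:
    """Validate the client's final ACK; only then would the server allocate buffers."""
    client_isn = (ack_seq - 1) % 2**32          # the SYN consumed one seq number
    for slot in (time_slot, time_slot - 1):       # tolerate one slot of delay
        cookie = make_cookie(src, dst, sport, dport, client_isn, slot, secret)
        if ack_number == (cookie + 1) % 2**32:    # client acks server ISN + 1
            return True
    return False
```

A flood of spoofed SYNs therefore costs the server one hash per SYN and no memory; only ACKs carrying a matching cookie lead to buffer allocation.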
Sequence Number Summary
• Goals, Set 1
  – Guard against old duplicates in one connection → don't reuse sequence numbers until it's reasonably certain that old duplicates have disappeared
  – Even after a crash/restart → hence tie to time → but don't use a global clock → hence need a 3-way handshake where each side verifies the sequence number chosen by the other side before the connection is successfully established
• Goals, Set 2
  – Don't allow hijacking of connections unless attacker can eavesdrop → use PRNG for initial seq number choice
  – Don't allow SYN attacks → compute but don't store initial sequence number
TCP Connection Management (cont.)
Closing a connection:
client closes socket: close(s);
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK; closes connection, sends FIN
[Figure: client sends FIN on close; server replies ACK, then sends its own FIN on close; client ACKs it and waits in timed wait before reaching closed]
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK
  – Enters "timed wait"; will respond with ACK to received FINs
Step 4: server receives ACK; connection closed
Note: with small modification, can handle simultaneous FINs
[Figure: FIN/ACK exchange in both directions; both sides pass through closing, the client waits in timed wait, and both reach closed]
TCP Connection FSM
The heavy solid line is the normal path for a client
The heavy dashed line is the normal path for a server
The light lines are unusual events
Each transition is labeled by the event causing it and the action resulting from it separated by a slash
TCP Connection Management (cont'd)
[Figure: TCP client lifecycle; TCP server lifecycle]
Closing a Connection
• Note: previous charts showed the normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  – No: the famous two-army problem
Summary
• TCP segments, acknowledgements & retransmission
  – Delayed ACKs, Nagle's algorithm
  – Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
  – SYN attack
  – Sequence number prediction attacks
TCP Miscellaneous
• MSS: Maximum Segment Size option
  – Client/server agree on a larger than default (536 outside same subnet) MSS via option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFNs ("elephants"): Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• …
Study of TCP Outline
• segment structure
• reliable data transfer
  – delayed ACKs
  – Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP
Congestion Control
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control
• manifestations:
  – long delays (queueing in router buffers)
  – lost packets (buffer overflow at routers)
• a top-10 problem
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Hosts A and B send through one router with unlimited shared output link buffers; λin: original data; λout: delivered throughput]
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Hosts A and B send through finite shared output link buffers; λin: original data; λ'in: original data plus retransmitted data; λout: delivered throughput]
Causes/costs of congestion: scenario 2
• Always: λin = λout (goodput)
• a) if no loss: λ'in = λin
• b) assume clairvoyant sender: retransmission only when loss certain
• c) retransmission of both delayed and lost packets makes λ'in larger for same λout (every packet transmitted twice)
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: λout versus λin for cases (a), (b), and (c); axes run to R/2, with goodput levels R/2, R/3, and R/4 marked]
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λin and λ'in increase?
[Figure: Hosts A and B on multihop paths through routers with finite shared output link buffers; λin: original data; λ'in: original data plus retransmitted data; λout: delivered throughput]
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse
[Figure: λout between Host A and Host B]
Reasons for Congestion Control
• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case
Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at
TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission:
  $\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$
• roughly, $\text{rate} = \frac{\text{CongWin}}{\text{RTT}}$ Bytes/sec, so CongWin and RTT influence throughput
• CongWin is dynamic, a function of perceived network congestion
How does sender notice congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
TCP AIMD
• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)
[Figure: congestion window (8, 16, 24 Kbytes marked) versus time for a long-lived TCP connection]
TCP Slow Start
• When connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event
TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A sends one segment, then two segments, then four segments to Host B, one round per RTT]
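The slow-start arithmetic can be checked with two small helpers (hypothetical names). The first reproduces the 20 kbps example: one 500-byte segment per 200 ms RTT. The second shows why "+1 MSS per ACK" doubles the window each RTT, assuming no losses and one ACK per segment (no delayed ACKs).

```python
def initial_rate_bps(mss_bytes: int, rtt_s: float) -> float:
    """Rate with CongWin = 1 MSS: one segment per RTT, in bits per second."""
    return mss_bytes * 8 / rtt_s

def slow_start_windows(rounds: int, mss: int = 1) -> list:
    """CongWin at the start of each RTT when every ACK adds 1 MSS (no losses)."""
    cwnd = mss
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        acks = cwnd // mss          # one ACK per segment sent this round
        cwnd += acks * mss          # +1 MSS per ACK, i.e. doubling per RTT
    return history
```

`initial_rate_bps(500, 0.2)` gives 20000 bits/s, and `slow_start_windows(4)` gives `[1, 2, 4, 8]` segments, matching the one, two, four segment rounds in the figure.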
TCP Tahoe vs. Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ack event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)
Timeouts vs. 3-dup ACKs
• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicates network capable of delivering some segments
• timeout before 3 dup ACKs received is a "more alarming" indicator of congestion
Summary: TCP Congestion Control
[FSM with three states: slow start, congestion avoidance, fast recovery]
Initially (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
slow start:
• new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
• cwnd > ssthresh (Λ): go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3·MSS, retransmit missing segment; go to fast recovery
congestion avoidance:
• new ACK: cwnd = cwnd + MSS·(MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3·MSS, retransmit missing segment; go to fast recovery
fast recovery:
• duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
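The FSM translates directly into a small state machine over cwnd and ssthresh. This is a sketch of the window bookkeeping only, not of actual segment transmission. The class name RenoSender is hypothetical, windows are integer byte counts, and "retransmit missing segment" is represented only by a counter.

```python
class RenoSender:
    """Congestion-window FSM: slow start, congestion avoidance, fast recovery."""

    def __init__(self, mss: int = 1460, ssthresh: int = 64 * 1024):
        self.mss = mss
        self.cwnd = mss                  # initially 1 MSS
        self.ssthresh = ssthresh         # initially 64 KB
        self.dup_acks = 0
        self.state = "slow_start"
        self.retransmissions = 0         # counts "retransmit missing segment"

    def _enter_fast_recovery(self) -> None:
        self.ssthresh = self.cwnd // 2
        self.cwnd = self.ssthresh + 3 * self.mss
        self.retransmissions += 1
        self.state = "fast_recovery"

    def on_new_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh                 # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += self.mss                     # +1 MSS per ACK
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += self.mss * self.mss // self.cwnd   # about +1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd += self.mss                     # inflate per extra dup ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self._enter_fast_recovery()

    def on_timeout(self) -> None:
        self.ssthresh = self.cwnd // 2
        self.cwnd = self.mss
        self.dup_acks = 0
        self.retransmissions += 1
        self.state = "slow_start"
```

For example, with MSS = 1000 and ssthresh = 4000, four new ACKs take cwnd to 5000 and into congestion avoidance; three duplicate ACKs then set ssthresh to half of cwnd and enter fast recovery, and a timeout drops cwnd back to 1 MSS.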
Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Results in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS·(MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
TCP Throughput (Idealized)
• What's the average throughput of TCP as a function of window size and RTT?
  – Long-lived connection; ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75·W/RTT
[Figure: window oscillating between W/2 and W]
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
  $\text{throughput} = \frac{1.22 \cdot \text{MSS}}{\text{RTT}\sqrt{L}}$
• ⇒ $L = 2 \cdot 10^{-10}$. Very low
• Requires almost perfect link
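The numbers in this example can be reproduced directly. A minimal sketch (helper names are hypothetical) that computes the required window, the throughput formula, its inverse for the loss rate, and the idealized 0.75·W/RTT average from the previous slide:

```python
import math

def window_segments(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """In-flight segments needed: rate * RTT / segment size in bits."""
    return rate_bps * rtt_s / (mss_bytes * 8)

def throughput_bps(mss_bytes: int, rtt_s: float, loss: float) -> float:
    """Throughput = 1.22 * MSS / (RTT * sqrt(L)), in bits per second."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))

def required_loss(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Solve the throughput formula for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2

def avg_aimd_throughput(w_bytes: float, rtt_s: float) -> float:
    """Idealized sawtooth between W/2 and W averages 0.75 * W / RTT."""
    return 0.75 * w_bytes / rtt_s
```

For 10 Gbps, 100 ms, and 1500-byte segments, `window_segments` gives about 83,333 and `required_loss` about 2.1·10^-10, which the slide rounds to 2·10^-10.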
Equation-Based Control
• Note: TCP congestion control forms a control loop
  – Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info
TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]
Why is TCP fair?
Two competing sessions:
• additive increase gives slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput versus Connection 1 throughput, both axes up to R, with the equal-bandwidth-share line; repeated cycles of "congestion avoidance: additive increase" and "loss: decrease window by factor of 2" move the operating point toward equal share]
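The geometric argument can be checked numerically. A minimal sketch (aimd_two_flows is a hypothetical name) assuming both flows see every loss at the same time: additive increase keeps the difference between the two rates constant, while each halving cuts that difference in half, so the rates converge to an equal share.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float,
                   steps: int, increase: float = 1.0):
    """Two AIMD flows on one link; both halve when their sum exceeds capacity
    (synchronized losses are an assumption of this sketch)."""
    for _ in range(steps):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2                 # multiplicative decrease
        else:
            x1, x2 = x1 + increase, x2 + increase   # additive increase, slope 1
    return x1, x2
```

Starting from a lopsided split such as (80, 10) on a link of capacity 100, after 1000 steps the two rates differ by far less than one unit.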
Fairness (more)
Fairness and UDP:
• Multimedia apps often do not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections:
• nothing prevents an app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
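The R/10 and R/2 figures follow from assuming every connection on the bottleneck gets an equal share. A minimal sketch (share_of_new_app is a hypothetical name) using exact fractions:

```python
from fractions import Fraction

def share_of_new_app(existing: int, new_connections: int) -> Fraction:
    """Fraction of link rate R a new app gets if all connections share equally."""
    return Fraction(new_connections, existing + new_connections)
```

With 9 existing connections, one new connection yields `Fraction(1, 10)` of R, and nine new connections yield `Fraction(1, 2)`.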
TCP
CS 4284 Spring 2013
CS 4284 Spring 2013
Study of TCP Outlinebull segment structurebull reliable data transferbull flow controlbull connection management
[ Network Address Translation ][ Principles of congestion control ]
bull TCP congestion control
CS 4284 Spring 2013
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow in
same connectionndash MSS maximum segment
sizebull connection-oriented
ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not overwhelm
receiver
bull point-to-pointndash one sender one receiver
bull reliable in-order byte streamndash no ldquomessage boundariesrdquo
bull pipelinedndash TCP congestion and flow
control set window sizebull send amp receive buffers
socketdoor
T CPsend buffer
TC Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
CS 4284 Spring 2013
TCP Segment Structuresource port dest port
32 bits
applicationdata
(variable length)
sequence numberacknowledgement
numberreceive windowurg data pnterchecksum
FSRPAUheadlen
notused
options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection
establishment(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
CS 4284 Spring 2013
TCP Reliable Data Transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segments
bull Cumulative acksbull TCP uses single
retransmission timer
bull Retransmissions are triggered byndash timeout eventsndash duplicate acks
bull Initially consider simplified TCP senderndash ignore duplicate
acksndash ignore flow control
congestion control
CS 4284 Spring 2013
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next byte
expected from other side
ndash cumulative ACKQ how receiver handles
out-of-order segmentsndash A TCP spec doesnrsquot
say - up to implementor
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
Usertypes
lsquoCrsquo
host ACKsreceipt
of echoedlsquoCrsquo
host ACKsreceipt of
lsquoCrsquo echoesback lsquoCrsquo
timesimple telnet scenario
CS 4284 Spring 2013
TCP Sender Eventsdata rcvd from appbull create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull start timer if not already running (think of timer as for oldest unacked segment)
bull expiration interval TimeOutInterval
timeoutbull retransmit segment that
caused timeoutbull restart timer ack rcvdbull If acknowledges
previously unacked segmentsndash update what is known to
be ackedndash start timer if there are
outstanding segments
CS 4284 Spring 2013
TCP sender
(simplified)
NextSeqNum = InitialSeqNum SendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71 so
SendBase = 72 say Ack received with y= 73 so the rcvr acknowledges up to including 72 now wants 73+ y is gt SendBase soknow that new data is acked set SendBase to 73
CS 4284 Spring 2013
TCP retransmission scenariosHost A
Seq=92 8 bytes data
ACK=100
losstimeo
ut
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
SendBase= 100
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq=
92 ti
meo
ut
ACK=120
Seq=
92 ti
meo
ut
SendBase= 120
SendBase= 120
Sendbase= 100
CS 4284 Spring 2013
TCP retransmission scenarios (more)
Host A
Seq=92 8 bytes data
ACK=100
losstimeo
ut
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
CS 4284 Spring 2013
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
CS 4284 Spring 2013
Naglersquos algorithmbull Consider again apps writing byte-by-byte (or in
small chunks)ndash If you send 1-byte segments with 20 byte TCP
header overhead + overhead of lower layer headers ndashyou get a huge waste of network capacity
bull Naglendash Transmit first bytendash Buffer outgoing bytes until ack has been received ndash
then send all at oncebull You can turn this off via
ndash Use setsockopt (SOL_SOCKET TCP_NODELAY)
CS 4284 Spring 2013
Delayed ACK vs Naglebull These two mechanisms have nothing to do with
each otherndash often confused especially by various online sourcesndash TCP_NODELAY means ldquodisable Naglerdquo not ldquodisable
delayed ACKsrdquo
bull Delayed ACK implemented by TCP Receiverndash Delays sending an acknowledgement (because acks
are cumulative can reduce of acks sent)bull Naglersquos algorithm implemented by TCP Sender
ndash Delays sending actual data (so that more data can be fit into a single segment)
TCP
Timeout amp Fast Retransmit
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull longer than RTTndash but RTT varies
bull too short premature timeoutndash unnecessary
retransmissions
bull too long slow reaction to segment loss
Q how to estimate RTT
bull SampleRTT measured time from segment transmission until ACK receipt
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several
recent measurements not just current SampleRTT
CS 4284 Spring 2013
RTT Distributions
(a) Probability density of ACK arrival times in the data link layer (b) Probability density of ACK arrival times for TCP
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
$$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$$
• Exponentially weighted moving average
• Influence of a past sample decreases exponentially fast
• Typical value: α = 0.125
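The moving average above can be computed in a few lines. This sketch assumes the first sample seeds the estimate, which is how RFC 6298 initializes SRTT. `ewma_rtt` is a hypothetical helper name.

```python
from typing import List


def ewma_rtt(samples: List[float], alpha: float = 0.125) -> List[float]:
    """Return EstimatedRTT after each SampleRTT (first sample seeds the estimate)."""
    estimates: List[float] = []
    est = None
    for s in samples:
        # EstimatedRTT = (1 - alpha) * EstimatedRTT + alpha * SampleRTT
        est = s if est is None else (1 - alpha) * est + alpha * s
        estimates.append(est)
    return estimates
```

With samples of 100 ms, 200 ms, 200 ms, the estimate moves 100 → 112.5 → 123.4375. Each new sample pulls the estimate only one eighth of the way toward itself.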
Example RTT estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; SampleRTT and EstimatedRTT in milliseconds (y-axis, about 100 to 350) vs. time in seconds (x-axis)]
TCP Round Trip Time and Timeout
Setting the timeout:
• EstimatedRTT plus a "safety margin"
  – large variation in EstimatedRTT → larger safety margin
• First estimate how much SampleRTT deviates from EstimatedRTT:
$$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$$
$$\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot\lvert\text{SampleRTT} - \text{EstimatedRTT}\rvert$$
(typically α = 0.125, β = 0.25)
• Then set the timeout interval:
$$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$$
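The three formulas combine into a small estimator, sketched below. The slide leaves the update order open, so this sketch follows RFC 6298 and updates DevRTT with the *previous* EstimatedRTT before updating EstimatedRTT. The first sample initializes DevRTT to SampleRTT/2, also as in RFC 6298. RFC 6298's clock granularity term and 1-second minimum are deliberately left out. `RtoEstimator` is a hypothetical name.

```python
from typing import Optional


class RtoEstimator:
    """TimeoutInterval = EstimatedRTT + 4 * DevRTT (sketch; alpha=1/8, beta=1/4)."""

    def __init__(self, alpha: float = 0.125, beta: float = 0.25) -> None:
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt: Optional[float] = None
        self.dev_rtt: Optional[float] = None

    def sample(self, sample_rtt: float) -> float:
        """Feed one SampleRTT and return the new TimeoutInterval."""
        if self.estimated_rtt is None:
            # First measurement seeds both estimators (RFC 6298 convention).
            self.estimated_rtt = sample_rtt
            self.dev_rtt = sample_rtt / 2
        else:
            # DevRTT uses the deviation from the *old* EstimatedRTT.
            self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                            + self.beta * abs(sample_rtt - self.estimated_rtt))
            self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                                  + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt
```

Feeding 100, 100, 200 (ms) yields timeouts of 300, 250, and 325 ms. A stable RTT shrinks the safety margin, and a sudden jump widens it.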
RTT Measurement & Retransmissions
• How should you handle ACKs for retransmitted segments when measuring RTT?
  – Note: the ACK could be a delayed ACK for the original segment, or an early ACK for the retransmitted segment
• Choices:
  – Associate with the original packet: may overestimate the true RTT
  – Associate with the retransmitted packet: may underestimate the true RTT
Karn's Algorithm
• Idea: don't consider samples from retransmitted segments
  – OK, but what if the current timeout value is too small for the network delay?
  – In this case, the sender would keep timing out but couldn't adjust the timeout according to the formula
• Solution: if a measurement can't be made, use exponential backoff until a new measurement can be made
  – By a factor of 2, with a limit
• Resume the normal sampling algorithm afterwards
  – Usual sampling period is once per RTT
• See also RFC 2988
  – The TCP timestamp option can be used to remove the ambiguity about which ACK matches which packet
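Karn's rule and the backoff can be sketched together. In this sketch, ACKs for retransmitted segments are ignored, each timeout doubles the timeout up to a cap, and the next unambiguous sample recomputes it with the EWMA formulas. `KarnTimer` is hypothetical. The defaults `initial_rto=1.0` and `max_rto=60.0` seconds are assumptions, chosen to match RFC 6298's initial value and suggested upper bound.

```python
from typing import Optional


class KarnTimer:
    """Retransmission timeout with Karn's algorithm and exponential backoff (sketch)."""

    def __init__(self, initial_rto: float = 1.0, max_rto: float = 60.0,
                 alpha: float = 0.125, beta: float = 0.25) -> None:
        self.rto = initial_rto
        self.max_rto = max_rto
        self.alpha, self.beta = alpha, beta
        self.srtt: Optional[float] = None    # EstimatedRTT
        self.rttvar: Optional[float] = None  # DevRTT

    def on_timeout(self) -> None:
        # No usable measurement: back off by a factor of 2, with a limit.
        self.rto = min(2 * self.rto, self.max_rto)

    def on_ack(self, sample_rtt: float, was_retransmitted: bool) -> None:
        if was_retransmitted:
            return  # ambiguous sample (Karn): keep the backed-off timeout
        if self.srtt is None:
            self.srtt, self.rttvar = sample_rtt, sample_rtt / 2
        else:
            self.rttvar = ((1 - self.beta) * self.rttvar
                           + self.beta * abs(sample_rtt - self.srtt))
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * sample_rtt
        # Resume normal sampling: timeout comes from the formula again.
        self.rto = min(self.srtt + 4 * self.rttvar, self.max_rto)
```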
Fast Retransmit
• Time-out period is often relatively long
  – long delay before resending a lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If a segment is lost, there will likely be many duplicate ACKs
• If the sender receives 3 duplicate ACKs for the same data, it assumes that the segment after the ACKed data was lost
  – fast retransmit: resend the segment before the timer expires
Fast retransmit algorithm:

    event: ACK received, with ACK field value of y
        if (y > SendBase) {
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
                start timer
        }
        else {  /* a duplicate ACK for an already ACKed segment */
            increment count of dup ACKs received for y
            if (count of dup ACKs received for y == 3) {
                /* fast retransmit */
                resend segment with sequence number y
            }
        }
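Here is a runnable version of the ACK handler above. It is a sketch under simplifying assumptions. `FastRetransmitSender` is a hypothetical class, `retransmit` is a caller-supplied callback, and the timer is modeled only as a boolean flag. The duplicate count resets on every new ACK, which matches counting duplicates "for y" because a duplicate always carries y = SendBase.

```python
from typing import Callable


class FastRetransmitSender:
    """Sender-side ACK processing with fast retransmit (simplified sketch)."""

    def __init__(self, send_base: int, next_seq_num: int,
                 retransmit: Callable[[int], None]) -> None:
        self.send_base = send_base            # oldest unacknowledged byte
        self.next_seq_num = next_seq_num      # next byte to be sent
        self.retransmit = retransmit          # resends the segment starting at seq
        self.dup_acks = 0
        self.timer_running = next_seq_num > send_base

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            self.send_base = y
            self.dup_acks = 0
            # Restart the timer only if unacknowledged data remains.
            self.timer_running = self.next_seq_num > self.send_base
        else:
            # Duplicate ACK for already ACKed data.
            self.dup_acks += 1
            if self.dup_acks == 3:
                self.retransmit(y)  # fast retransmit, before the timer expires
```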
TCP Flow Control
TCP Flow Control
• The receive side of a TCP connection has a receive buffer
• The app process may be slow at reading from the buffer
• Speed-matching service: matching the send rate to the receiving app's drain rate
Flow control: the sender won't overflow the receiver's buffer by transmitting too much, too fast
TCP Flow Control (cont'd)
(Suppose the TCP receiver discards out-of-order segments.)
• Spare room in buffer:
  RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Receiver advertises the spare room by including the value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees the receive buffer doesn't overflow
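Both sides of the flow-control computation are sketched below. Byte counters are plain integers, and sequence-number wrap-around is ignored for simplicity. The function names are hypothetical.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Receiver: spare room = RcvBuffer - [LastByteRcvd - LastByteRead]."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sendable_bytes(last_byte_sent: int, last_byte_acked: int, advertised: int) -> int:
    """Sender: how many more bytes may go out while keeping unACKed data <= RcvWindow."""
    unacked = last_byte_sent - last_byte_acked
    return max(0, advertised - unacked)
```

Take a 4096-byte buffer where 3000 bytes have arrived and 1000 have been read. The receiver advertises 2096 bytes. A sender with 1500 bytes already in flight may then send only 596 more.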
Flow Control: Persistence Timer
• Suppose the sender is blocked because the receiver application hasn't picked up data (the advertised window is 0)
• Then the receiver app reads n bytes
• The TCP receiver advertises a new window of size RcvWindow = n
• But suppose this advertisement is lost
• The sender would be stuck
  – Solution: persistence timer; the sender sends a probe after a period of inactivity to provoke a window update
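Flow Control: Silly Window Syndrome
Silly window syndrome: when the receiving application reads data only a byte or so at a time, the receiver advertises tiny windows and the sender answers with tiny segments.
Clark's solution: don't advertise windows below a certain size.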
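Clark's receiver-side fix can be sketched as a window-advertisement rule. The threshold used here is an assumption modeled on RFC 1122's receiver-side avoidance: advertise free space only once it reaches min(MSS, half the buffer), and otherwise advertise zero. Real stacks phrase the rule as "don't advance the right window edge by less than this amount". `advertised_window` is a hypothetical name.

```python
def advertised_window(rcv_buffer: int, buffered: int, mss: int) -> int:
    """Receiver-side silly window avoidance (simplified Clark's solution).

    Advertise the free space only if it is at least min(MSS, rcv_buffer // 2);
    otherwise advertise a zero window so the sender waits.
    """
    free = rcv_buffer - buffered
    threshold = min(mss, rcv_buffer // 2)
    return free if free >= threshold else 0
```

With a 4096-byte buffer and a 1460-byte MSS, freeing a single byte still yields an advertised window of 0. Once 2096 bytes are free, the whole 2096 is advertised.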
Study of TCP Outline
• Segment structure
• Reliable data transfer
  – delayed ACKs
  – Nagle's algorithm
• Timeout management, fast retransmit
• Flow control + silly window syndrome
• Connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP Connection Management
TCP Connection Management
Recall: TCP sender and receiver establish a "connection" before exchanging data segments
• Initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• Client: connection initiator
    connect(s, &dstaddr, …)
• Server: contacted by client
    cl = accept(sv, &caddr, …)
Three-way handshake:
Step 1: client host sends a TCP SYN segment to the server
  – specifies initial seq. #
  – no data
Step 2: server host receives the SYN, replies with a SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #
Step 3: client receives the SYNACK, replies with an ACK segment, which may contain data
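The sequence and acknowledgement numbers in the three steps follow a fixed pattern, sketched below. The SYN and the SYNACK each consume one sequence number, so each side acknowledges the other's ISN + 1. Arithmetic is modulo 2^32 because sequence numbers are 32 bits. `Segment` and `three_way_handshake` are hypothetical names.

```python
from typing import List, NamedTuple

MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


class Segment(NamedTuple):
    flags: str  # "SYN", "SYNACK" or "ACK"
    seq: int
    ack: int    # 0 when the ACK flag is not set


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Return the three segments exchanged to open a connection."""
    syn = Segment("SYN", client_isn, 0)                                # step 1
    synack = Segment("SYNACK", server_isn, (client_isn + 1) % MOD)     # step 2
    ack = Segment("ACK", (client_isn + 1) % MOD, (server_isn + 1) % MOD)  # step 3
    return [syn, synack, ack]
```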
TCP 3-way handshake
[Figure: TCP connection establishment]
• Q1: Why a 3-way and not a 2-way handshake?
• Q2: How do sender & receiver determine initial seq. #s?
3-way Handshake & Delayed Duplicates
[Figure: (a) Normal operation. (b) Old SYN appearing out of nowhere. (c) Duplicate SYN and duplicate ACK following SYN.]
The 3-way handshake is required to deal with scenarios (b) and (c).
Sequence Number Reuse
• Idea: tie initial TCP seq. numbers to a clock
  – Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against the sequence number prediction attack
  – Use a PRNG; see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock value + F(src, dst, sport, dport, random())
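The RFC 1948 scheme can be sketched as a 4 µs clock plus a keyed hash of the connection 4-tuple. RFC 1948 suggested MD5 for F. This sketch swaps in HMAC-SHA256 from Python's standard library. The per-boot random `SECRET` and the name `initial_seq_num` are assumptions for illustration.

```python
import hashlib
import hmac
import os
import time
from typing import Optional

SECRET = os.urandom(16)  # assumption: secret key chosen randomly at boot


def initial_seq_num(src: str, dst: str, sport: int, dport: int,
                    now_us: Optional[int] = None, secret: bytes = SECRET) -> int:
    """ISN = M + F(4-tuple, secret), with M ticking once every 4 microseconds."""
    if now_us is None:
        now_us = time.monotonic_ns() // 1000
    m = now_us // 4  # clock component: advances every 4 microseconds
    tuple_bytes = f"{src}|{dst}|{sport}|{dport}".encode()
    digest = hmac.new(secret, tuple_bytes, hashlib.sha256).digest()
    f = int.from_bytes(digest[:4], "big")  # per-connection offset, unpredictable without secret
    return (m + f) % 2 ** 32
```

Within one 4-tuple, ISNs still advance with the clock, which protects against old duplicates. Across 4-tuples, the secret offset keeps an attacker from predicting another connection's ISN.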
When Sequence Numbers Attack
• Suppose attacker A can predict the sequence number that host B is going to use next
• By using a spoofed source IP address C, A can engage in a successful 3-way handshake with B
  – B believes it is talking to C and might grant permissions based on C's IP address
  – The attacker on A must suppress the RST packets C is likely to send; a denial-of-service attack against C can be used for that
• A then sends a message to compromise B
When SYNs Attack
• Servers receiving a SYN must allocate resources
  – Opens up the possibility of a denial-of-service attack in which the server is flooded with bogus SYN packets with forged IP source addresses
• Solution: SYN cookies
  – The server computes its initial sequence number (the cookie) and sends the SYNACK, but does not allocate buffers
  – If the client continues with an ACK, the server checks whether that ACK could have been sent in response to its SYNACK, and allocates buffers only if it could
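The SYN cookie idea can be sketched as follows. The server's ISN encodes a coarse time slot plus a MAC over the connection 4-tuple, so nothing is stored until the client's ACK proves it saw the SYNACK. This layout is an assumption for illustration: 8-bit slot, 24-bit HMAC-SHA256 truncation, and 64-second slots. Real implementations such as Linux also encode the MSS in the cookie. All names are hypothetical.

```python
import hashlib
import hmac
import os

SECRET = os.urandom(16)  # assumption: server secret chosen at boot
SLOT_SECONDS = 64        # assumption: length of one cookie time slot


def _mac24(src: str, dst: str, sport: int, dport: int, slot: int, secret: bytes) -> int:
    msg = f"{src}|{dst}|{sport}|{dport}|{slot}".encode()
    return int.from_bytes(hmac.new(secret, msg, hashlib.sha256).digest()[:3], "big")


def make_cookie(src: str, dst: str, sport: int, dport: int,
                now_s: int, secret: bytes = SECRET) -> int:
    """Server ISN for the SYNACK: 8-bit time slot | 24-bit MAC. No state is kept."""
    slot = (now_s // SLOT_SECONDS) % 256
    return (slot << 24) | _mac24(src, dst, sport, dport, slot, secret)


def check_ack(src: str, dst: str, sport: int, dport: int, ack_num: int,
              now_s: int, secret: bytes = SECRET, max_age: int = 1) -> bool:
    """Accept the client's ACK only if ack_num - 1 is a cookie we recently issued."""
    cookie = (ack_num - 1) % 2 ** 32
    slot = cookie >> 24
    current = (now_s // SLOT_SECONDS) % 256
    if (current - slot) % 256 > max_age:
        return False  # cookie too old
    return (cookie & 0xFFFFFF) == _mac24(src, dst, sport, dport, slot, secret)
```

Only after `check_ack` succeeds would the server allocate buffers for the connection. A flood of spoofed SYNs therefore costs the server one MAC computation each and no stored state.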
Sequence Number Summary
• Goals, Set 1:
  – Guard against old duplicates in one connection → don't reuse sequence numbers until it's reasonably certain that old duplicates have disappeared
  – Even after a crash/restart → hence tie them to time → but don't use a global clock → hence the need for a 3-way handshake, where each side verifies the sequence number chosen by the other side before the connection is established
• Goals, Set 2:
  – Don't allow hijacking of connections unless the attacker can eavesdrop → use a PRNG for choosing the initial seq. number
  – Don't allow SYN attacks → compute, but don't store, the initial sequence number
TCP Connection Management (cont.)
Closing a connection:
The client closes its socket: close(s);
Step 1: client end system sends a TCP FIN control segment to the server
Step 2: server receives the FIN, replies with an ACK; closes the connection, sends its own FIN
[Figure: client and server exchange FIN, ACK, FIN, ACK; the client passes through "timed wait" before reaching closed]
TCP Connection Management (cont.)
Step 3: client receives the FIN, replies with an ACK
  – Enters "timed wait"; will respond with an ACK to received FINs
Step 4: server receives the ACK; connection closed
Note: with a small modification, this can handle simultaneous FINs
[Figure: client and server exchange FIN/ACK segments while closing; the client passes through timed wait, then both sides are closed]
TCP Connection FSM
[Figure: TCP connection state machine]
The heavy solid line is the normal path for a client.
The heavy dashed line is the normal path for a server.
The light lines are unusual events.
Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.
TCP Connection Management (cont'd)
[Figure: TCP client lifecycle; TCP server lifecycle]
Closing a Connection
• Note: the previous charts showed the normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  – No: this is the famous two-army problem
Summary
• TCP segments, acknowledgements & retransmission
  – Delayed ACKs, Nagle's algorithm
  – Fast retransmit
• RTT estimation & Karn's algorithm
• Flow control & silly window syndrome
• Connection management in TCP
• Attacks against TCP's connection management scheme
  – SYN attack
  – Sequence number prediction attacks
TCP Miscellaneous
• MSS: Maximum Segment Size option
  – Client/server agree on a larger-than-default MSS (default 536 outside the same subnet) via an option on the SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for the receive window, to allow for LFNs ("elephants"): Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• …
Study of TCP Outline
• Segment structure
• Reliable data transfer
  – delayed ACKs
  – Nagle's algorithm
• Timeout management, fast retransmit
• Flow control + silly window syndrome
• Connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP Congestion Control
Principles of Congestion Control
Congestion:
• Informally: "too many sources sending too much data too fast for the network to handle"
• Different from flow control!
• Manifestations:
  – long delays (queueing in router buffers)
  – lost packets (buffer overflow at routers)
• A top-10 problem!
Causes/costs of congestion: scenario 1
• Two senders, two receivers
• One router, infinite buffers
• No retransmission
• Large delays when congested
• (but no reduction in throughput here)
[Figure: Host A sends original data at rate λin through a router with unlimited shared output link buffers; Host B; λout is the delivered rate]
Causes/costs of congestion: scenario 2
• One router, finite buffers
• Sender retransmits a lost packet after a timeout
[Figure: Host A sends λin (original data) and λ'in (original data plus retransmitted data) through a router with finite shared output link buffers; Host B; λout is the delivered rate]
Causes/costs of congestion: scenario 2
Always: λin = λout (goodput)
• a) If no loss: λ'in = λin
• b) Assume a clairvoyant sender: retransmission only when loss is certain
• c) Retransmission of both delayed and lost packets makes λ'in larger for the same λout (every packet transmitted twice)
"Costs" of congestion:
• More work (retransmissions) for a given "goodput"
• Unneeded retransmissions: the link carries multiple copies of a packet
[Figure: three plots (a), (b), (c) of λout vs. λin with both axes up to R/2; maximum λout marked as R/2, R/3 and R/4 respectively]
Causes/costs of congestion: scenario 3
• Four senders
• Multihop paths
• Timeout/retransmit
Q: What happens as λin and λ'in increase?
[Figure: Host A sends λin (original data) and λ'in (original data plus retransmitted data) over multihop paths through routers with finite shared output link buffers; Host B; λout is the delivered rate]
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• When a packet is dropped, any upstream transmission capacity used for that packet was wasted
• Ultimately leads to congestion collapse
[Figure: λout between Host A and Host B]
Reasons for Congestion Control
• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case
Congestion Control Approaches
Two broad classes:
End-end congestion control:
• No explicit feedback from the network
• Congestion inferred from end-system observed loss and delay
• Approach taken by TCP
Network-assisted congestion control:
• Routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate the sender should send at
TCP Congestion Control
• End-end control (no network assistance)
• Sender limits transmission:
$$\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$$
• Roughly,
$$\text{rate} = \frac{\text{CongWin}}{\text{RTT}}\ \text{bytes/sec}$$
  so CongWin and RTT influence throughput
• CongWin is a dynamic function of perceived network congestion
How does the sender notice congestion?
• Loss event = timeout or 3 duplicate ACKs
• TCP sender reduces its rate (CongWin) after a loss event
• (Assumes congestion is the primary cause of loss)
Three mechanisms:
• AIMD
• Slow start
• Fast recovery
TCP AIMD
• Multiplicative decrease: cut CongWin in half after a loss event
• Additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)
[Figure: sawtooth of the congestion window (8, 16, 24 Kbytes marked) vs. time for a long-lived TCP connection]
TCP Slow Start
• When a connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 500 bytes / 0.2 s = 2,500 bytes/s = 20 kbps
• Available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to a respectable rate
• When a connection begins, increase the rate exponentially fast until the first loss event
TCP Slow Start (more)
• When a connection begins, increase the rate exponentially until the first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin by 1 MSS for every ACK received
• Summary: the initial rate is slow but ramps up exponentially fast
[Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart]
TCP Tahoe vs. Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before the timeout.
Implementation:
• Variable Threshold
• At a loss event, Threshold is set to 1/2 of CongWin just before the loss event
• If a timeout event: set CongWin to 1 MSS
• If a triple-ACK event: set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)
Timeouts vs. 3 Dup ACKs
• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after a timeout event:
  – CongWin is instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate the network is capable of delivering some segments
• A timeout before 3 dup ACKs are received is a stronger, "more alarming" indicator of congestion
Summary: TCP Congestion Control
[Figure: finite state machine with states slow start, congestion avoidance and fast recovery; Λ marks a transition with no triggering event]
Initial (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0 → slow start
Slow start:
• new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• Λ, cwnd > ssthresh: → congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment (stay in slow start)
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment → fast recovery
Congestion avoidance:
• new ACK: cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment → slow start
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment → fast recovery
Fast recovery:
• duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh, dupACKcount = 0 → congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment → slow start
Summary: TCP Congestion Control
• When CongWin is below Threshold, the sender is in the slow-start phase; the window grows exponentially
• When CongWin is above Threshold, the sender is in the congestion-avoidance phase; the window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS
TCP Sender Congestion Control

| Event | State | TCP Sender Action | Commentary |
|---|---|---|---|
| ACK receipt for previously unACKed data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Results in a doubling of CongWin every RTT |
| ACK receipt for previously unACKed data | Congestion Avoidance (CA) | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT |
| Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS |
| Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to Slow Start | Enter slow start |
| Duplicate ACK | SS or CA | Increment duplicate ACK count for the segment being ACKed | CongWin and Threshold not changed |
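The sender rules in the table can be modeled as an event-driven sketch. It implements the simplified rules from the slide, where a triple duplicate ACK sets CongWin to Threshold. It does not implement the full RFC 5681 fast-recovery procedure with window inflation. Units are bytes. The defaults `mss=1000` and `threshold=64_000` are assumptions, and so is clamping Threshold at 1 MSS. `RenoSender` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class RenoSender:
    """Simplified TCP Reno congestion window rules (sketch, units in bytes)."""
    mss: int = 1000
    cong_win: float = 1000
    threshold: float = 64_000
    state: str = "SS"      # "SS" = slow start, "CA" = congestion avoidance
    dup_acks: int = 0

    def on_new_ack(self) -> None:
        """ACK for previously unACKed data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss                  # doubles CongWin every RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += self.mss * self.mss / self.cong_win  # ~ +1 MSS per RTT

    def on_dup_ack(self) -> None:
        """Duplicate ACK: only the third one changes the window."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = max(self.cong_win / 2, self.mss)
            self.cong_win = self.threshold             # multiplicative decrease
            self.state = "CA"

    def on_timeout(self) -> None:
        self.threshold = max(self.cong_win / 2, self.mss)
        self.cong_win = self.mss                       # back to 1 MSS
        self.state = "SS"
        self.dup_acks = 0
```

Start with Threshold at 4 MSS. Four new ACKs take CongWin past Threshold into congestion avoidance. A triple duplicate ACK then halves it, and a timeout drops it back to 1 MSS in slow start.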
TCP Throughput (Idealized)
• What's the average throughput of TCP as a function of window size and RTT?
  – Long-lived connection; ignore slow start
• When the window is W, throughput is W/RTT
• Just after a loss, the window drops to W/2, and throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT (the window grows linearly from W/2 to W, so its average is 3W/4)
[Figure: window sawtooth oscillating between W/2 and W]
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
$$\text{Throughput} = \frac{1.22\cdot\text{MSS}}{\text{RTT}\,\sqrt{L}}$$
• → $L = 2\cdot 10^{-10}$. Very low!
• Requires an almost perfect link
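The example's arithmetic can be checked directly. The window is throughput × RTT / segment size in bits, and the loss rate comes from solving the formula above for L. The function names are hypothetical.

```python
import math


def window_segments(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Segments in flight needed: W = throughput * RTT / (MSS in bits)."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Solve Throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


def throughput_bps(mss_bytes: int, rtt_s: float, loss: float) -> float:
    """Throughput predicted by the formula for a given loss rate."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


if __name__ == "__main__":
    print(round(window_segments(10e9, 0.1, 1500)))   # about 83,333 segments
    print(required_loss_rate(10e9, 0.1, 1500))       # about 2.1e-10
```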
Equation-Based Control
• Note: TCP congestion control forms a control loop
  – Inputs: round-trip time and "loss events" (which are samples of timeout events + 3-dup-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on these inputs
• See RFC 5348 for more info
TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connections 1 and 2 sharing a bottleneck router of capacity R]
Why is TCP fair?
Consider two competing sessions:
• Additive increase gives a slope of 1 as throughput increases
• Multiplicative decrease decreases throughput proportionally
[Figure: Connection 1 throughput (x-axis, up to R) vs. Connection 2 throughput (y-axis, up to R); alternating congestion avoidance (additive increase) and loss (decrease window by a factor of 2) moves the operating point toward the equal-bandwidth-share line]
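The geometric argument can be checked with a tiny simulation. It makes idealized assumptions: both flows see a loss whenever their combined rate exceeds the capacity (synchronized losses), and both grow by the same amount per round otherwise. Additive increase preserves the gap between the two rates, while each halving cuts the gap in half, so the rates converge. `aimd_two_flows` is a hypothetical name.

```python
from typing import Tuple


def aimd_two_flows(x1: float, x2: float, capacity: float, rounds: int,
                   increase: float = 1.0) -> Tuple[float, float]:
    """Simulate two AIMD flows sharing one bottleneck (idealized, synchronized losses)."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: multiplicative decrease halves both rates (and their difference).
            x1, x2 = x1 / 2, x2 / 2
        else:
            # No loss: additive increase, slope 1 (difference unchanged).
            x1, x2 = x1 + increase, x2 + increase
    return x1, x2
```

Starting from a very unequal split such as (10, 70) on a link of capacity 100, the two rates end up nearly equal after a few hundred rounds.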
Fairness (more)
Fairness and UDP:
• Multimedia apps often do not use TCP
  – they do not want their rate throttled by congestion control
• Instead, they use UDP:
  – pump audio/video at a constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections:
• Nothing prevents an app from opening parallel connections between 2 hosts
• Example: a link of rate R supporting 9 connections
  – new app asks for 1 TCP connection, gets rate R/10
  – new app asks for 9 TCP connections, gets R/2 (9 of the 18 connections)
• That's what "download accelerators" do
CS 4284 Spring 2013
Study of TCP Outlinebull segment structurebull reliable data transferbull flow controlbull connection management
[ Network Address Translation ][ Principles of congestion control ]
bull TCP congestion control
CS 4284 Spring 2013
TCP Overview RFCs 793 1122 1323 2018 2581
bull full duplex datandash bi-directional data flow in
same connectionndash MSS maximum segment
sizebull connection-oriented
ndash handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange
bull flow controlledndash sender will not overwhelm
receiver
bull point-to-pointndash one sender one receiver
bull reliable in-order byte streamndash no ldquomessage boundariesrdquo
bull pipelinedndash TCP congestion and flow
control set window sizebull send amp receive buffers
socketdoor
T CPsend buffer
TC Preceive buffer
socketdoor
segm ent
applicationwrites data
applicationreads data
CS 4284 Spring 2013
TCP Segment Structuresource port dest port
32 bits
applicationdata
(variable length)
sequence numberacknowledgement
numberreceive windowurg data pnterchecksum
FSRPAUheadlen
notused
options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now(generally not used)
RST SYN FINconnection
establishment(setup teardown
commands)
bytes rcvr willingto accept
countingby bytes of data(not segments)
Internetchecksum
(as in UDP)
CS 4284 Spring 2013
TCP Reliable Data Transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segments
bull Cumulative acksbull TCP uses single
retransmission timer
bull Retransmissions are triggered byndash timeout eventsndash duplicate acks
bull Initially consider simplified TCP senderndash ignore duplicate
acksndash ignore flow control
congestion control
CS 4284 Spring 2013
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next byte
expected from other side
ndash cumulative ACKQ how receiver handles
out-of-order segmentsndash A TCP spec doesnrsquot
say - up to implementor
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
Usertypes
lsquoCrsquo
host ACKsreceipt
of echoedlsquoCrsquo
host ACKsreceipt of
lsquoCrsquo echoesback lsquoCrsquo
timesimple telnet scenario
CS 4284 Spring 2013
TCP Sender Eventsdata rcvd from appbull create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull start timer if not already running (think of timer as for oldest unacked segment)
bull expiration interval TimeOutInterval
timeoutbull retransmit segment that
caused timeoutbull restart timer ack rcvdbull If acknowledges
previously unacked segmentsndash update what is known to
be ackedndash start timer if there are
outstanding segments
CS 4284 Spring 2013
TCP sender
(simplified)
NextSeqNum = InitialSeqNum SendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71 so
SendBase = 72 say Ack received with y= 73 so the rcvr acknowledges up to including 72 now wants 73+ y is gt SendBase soknow that new data is acked set SendBase to 73
CS 4284 Spring 2013
TCP retransmission scenariosHost A
Seq=92 8 bytes data
ACK=100
losstimeo
ut
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
SendBase= 100
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq=
92 ti
meo
ut
ACK=120
Seq=
92 ti
meo
ut
SendBase= 120
SendBase= 120
Sendbase= 100
CS 4284 Spring 2013
TCP retransmission scenarios (more)
Host A
Seq=92 8 bytes data
ACK=100
losstimeo
ut
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
CS 4284 Spring 2013
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
CS 4284 Spring 2013
Naglersquos algorithmbull Consider again apps writing byte-by-byte (or in
small chunks)ndash If you send 1-byte segments with 20 byte TCP
header overhead + overhead of lower layer headers ndashyou get a huge waste of network capacity
bull Naglendash Transmit first bytendash Buffer outgoing bytes until ack has been received ndash
then send all at oncebull You can turn this off via
ndash Use setsockopt (SOL_SOCKET TCP_NODELAY)
CS 4284 Spring 2013
Delayed ACK vs Naglebull These two mechanisms have nothing to do with
each otherndash often confused especially by various online sourcesndash TCP_NODELAY means ldquodisable Naglerdquo not ldquodisable
delayed ACKsrdquo
bull Delayed ACK implemented by TCP Receiverndash Delays sending an acknowledgement (because acks
are cumulative can reduce of acks sent)bull Naglersquos algorithm implemented by TCP Sender
ndash Delays sending actual data (so that more data can be fit into a single segment)
TCP
Timeout amp Fast Retransmit
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull longer than RTTndash but RTT varies
bull too short premature timeoutndash unnecessary
retransmissions
bull too long slow reaction to segment loss
Q how to estimate RTT
bull SampleRTT measured time from segment transmission until ACK receipt
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several
recent measurements not just current SampleRTT
CS 4284 Spring 2013
RTT Distributions
(a) Probability density of ACK arrival times in the data link layer (b) Probability density of ACK arrival times for TCP
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT
bull exponential weighted moving averagebull influence of past sample decreases
exponentially fastbull typical value = 0125
CS 4284 Spring 2013
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
CS 4284 Spring 2013
TCP Round Trip Time and TimeoutSetting the timeoutbull EstimatedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT larger safety marginbull first estimate of how much SampleRTT deviates
from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 0125 = 025)
EstimatedRTT = (1-)EstimatedRTT + SampleRTT
CS 4284 Spring 2013
RTT Measurement amp Retransmissions
bull How should you handle acks for retransmitted segments when measuring RTTndash Note ACK could be delayed for original
segment or early for retransmitted segmentbull Choices
ndash Associate with original packet may overestimate true RTT
ndash Associate with retransmitted packet may underestimate true RTT
CS 4284 Spring 2013
Karnrsquos Algorithmbull Idea donrsquot consider samples from retransmitted
segmentsndash Ok but what if current timeout value is too small for network
delayndash In this case would keep timing out but couldnrsquot adjust timeout
according to formulabull Solution If measurement canrsquot be made use exponential
backoff until new measurement can be madendash By factor of 2 with limit
bull Resume normal sampling algorithm afterwardsndash Usual sampling period is once per RTT
bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet
CS 4284 Spring 2013
Fast Retransmitbull Time-out period often relatively long
ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs
ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs
bull If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lostndash fast retransmit resend segment before timer expires
CS 4284 Spring 2013
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
TCP
Flow Control
CS 4284 Spring 2013
TCP Flow Controlbull receive side of
TCP connection has a receive buffer
bull speed-matching service matching the send rate to the receiving apprsquos drain rate
bull app process may be slow at reading from buffer
sender wonrsquot overflow
receiverrsquos buffer bytransmitting too
much too fast
flow control
CS 4284 Spring 2013
TCP Flow Control (contrsquod)
(Suppose TCP receiver discards out-of-order segments)
bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Rcvr advertises spare room by including value of RcvWindow in segments
bull Sender limits unACKed data to RcvWindowndash guarantees
receive buffer doesnrsquot overflow
CS 4284 Spring 2013
Flow Control Persistence Timerbull Suppose sender is blocked cause receiver
application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size
RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck
ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update
CS 4284 Spring 2013
Flow Control Silly Window Syndrome
Clarkrsquos solution donrsquot advertise windows below a certain size
CS 4284 Spring 2013
Study of TCP Outlinebull segment structurebull reliable data transfer
ndash delayed ACKsndash Naglersquos algorithm
bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control
TCP
Connection Management
CS 4284 Spring 2013
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control info
(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr
hellip)
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
CS 4284 Spring 2013
TCP 3-way handshake
TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums
CS 4284 Spring 2013
3-way Handshake amp Delayed Dups
(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)
CS 4284 Spring 2013
Sequence Number Reuse
bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a
connection with identical sequence numbersbull Must also guard against sequence number prediction attack
ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())
CS 4284 Spring 2013
When Sequence Numbers Attack
bull Suppose attacker A can predict sequence number a host B is going to use next
bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions
based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is
likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B
CS 4284 Spring 2013
When SYNs Attackbull Servers receiving SYN must allocate resources
ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
bull Solutionndash SYN cookies
bull Server creates ACK number sends ACK ndash but does not allocate buffers
ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct
Sequence Number Summary
• Goals, Set 1
  – Guard against old duplicates in one connection -> don't reuse sequence numbers until it's reasonably certain that old duplicates have disappeared
  – Even after a crash/restart -> hence tie to time -> but don't use a global clock -> hence need a 3-way handshake where each side verifies the sequence number chosen by the other side before a connection is successfully established
• Goals, Set 2
  – Don't allow hijacking of connections unless attacker can eavesdrop -> use PRNG for initial seq. number choice
  – Don't allow SYN attacks -> compute but don't store initial sequence number
TCP Connection Management (cont.)
Closing a connection:
client closes socket: close(s);
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: the client sends FIN and the server replies with ACK; the server then closes and sends its own FIN, which the client ACKs before waiting in timed wait and closing.]
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
  – Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with a small modification, can handle simultaneous FINs.
[Figure: FIN/ACK exchange with both sides in the closing state; the client passes through timed wait before both sides are closed.]
TCP Connection FSM
The heavy solid line is the normal path for a client.
The heavy dashed line is the normal path for a server.
The light lines are unusual events.
Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.
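The normal client and server paths of the FSM can be written down as a transition table. This Python sketch uses the RFC 793 state names; it covers only the heavy (normal) paths plus simultaneous close, not the unusual events, and `TRANSITIONS`/`run` are hypothetical names. Each entry mirrors an "event/action" label on the diagram.

```python
# (state, event) -> (action, next state); "-" means no segment is sent.
TRANSITIONS = {
    # client: active open, active close
    ("CLOSED", "connect"): ("send SYN", "SYN_SENT"),
    ("SYN_SENT", "recv SYN+ACK"): ("send ACK", "ESTABLISHED"),
    ("ESTABLISHED", "close"): ("send FIN", "FIN_WAIT_1"),
    ("FIN_WAIT_1", "recv ACK"): ("-", "FIN_WAIT_2"),
    ("FIN_WAIT_2", "recv FIN"): ("send ACK", "TIME_WAIT"),
    ("TIME_WAIT", "2MSL timeout"): ("-", "CLOSED"),
    # server: passive open, passive close
    ("CLOSED", "listen"): ("-", "LISTEN"),
    ("LISTEN", "recv SYN"): ("send SYN+ACK", "SYN_RCVD"),
    ("SYN_RCVD", "recv ACK"): ("-", "ESTABLISHED"),
    ("ESTABLISHED", "recv FIN"): ("send ACK", "CLOSE_WAIT"),
    ("CLOSE_WAIT", "close"): ("send FIN", "LAST_ACK"),
    ("LAST_ACK", "recv ACK"): ("-", "CLOSED"),
    # simultaneous close (both sides send FIN)
    ("FIN_WAIT_1", "recv FIN"): ("send ACK", "CLOSING"),
    ("CLOSING", "recv ACK"): ("-", "TIME_WAIT"),
}


def run(events, state="CLOSED"):
    """Apply events in order; return a list of (event, action, new_state) steps."""
    trace = []
    for ev in events:
        action, state = TRANSITIONS[(state, ev)]  # KeyError = event not on a normal path
        trace.append((ev, action, state))
    return trace


for step in run(["connect", "recv SYN+ACK", "close", "recv ACK", "recv FIN", "2MSL timeout"]):
    print(step)
```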
TCP Connection Management (cont'd)
[Figures: TCP client lifecycle; TCP server lifecycle]
Closing a Connection
• Note: previous charts showed the normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  – No! Famous two-army problem
Summary
• TCP segments, acknowledgements & retransmission
  – Delayed ACKs, Nagle's algorithm
  – Fast retransmit
• RTT estimation & Karn's algorithm
• Flow control & silly window syndrome
• Connection management in TCP
• Attacks against TCP's connection management scheme
  – SYN attack
  – Sequence number prediction attacks
TCP Miscellaneous
• MSS: Maximum Segment Size option
  – Client/server agree on a larger than default (536 bytes outside the same subnet) MSS; option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFNs ("elephants"): Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• …
Study of TCP: Outline
• segment structure
• reliable data transfer
  – delayed ACKs
  – Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP
Congestion Control
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  – long delays (queueing in router buffers)
  – lost packets (buffer overflow at routers)
• a top-10 problem!
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A and Host B send original data at rate λin through a router with unlimited shared output link buffers; λout is the delivered rate.]
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Host A offers λin (original data) and λ'in (original data plus retransmitted data) to a router with finite shared output link buffers; λout is delivered toward Host B.]
Causes/costs of congestion: scenario 2
• always: λin = λout (goodput)
• a) if no loss: λ'in = λin
• b) assume clairvoyant sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes λ'in larger for the same λout (every packet transmitted twice)
"costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: link carries multiple copies of a packet
[Figure: three plots (a), (b), (c) of λout against λin, both axes running to R/2; goodput levels R/3 and R/4 are marked for the loss cases.]
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λin and λ'in increase?
[Figure: Host A offers λin (original data) and λ'in (original data plus retransmitted data) over multihop paths through routers with finite shared output link buffers; λout is delivered toward Host B.]
Causes/costs of congestion: scenario 3
[Figure: delivered rate λout between Host A and Host B as offered load grows.]
Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted!
• ultimately leads to congestion collapse
Reasons for Congestion Control
• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case
Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at
TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission: LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• roughly, rate = CongWin / RTT bytes/sec
• CongWin and RTT influence throughput
• CongWin is a dynamic function of perceived network congestion
How does sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is the primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
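The sender-side limit and the rough rate formula above, as a small Python sketch. The function names are hypothetical; window values are in bytes and the rate is converted to bits per second.

```python
def usable_window(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """Bytes the sender may still put in flight so that
    LastByteSent - LastByteAcked <= min(CongWin, RcvWindow) keeps holding."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cong_win, rcv_window) - in_flight)


def approx_rate_bps(cong_win_bytes, rtt_s):
    """Rough sending rate: one congestion window per RTT, in bits/sec."""
    return cong_win_bytes * 8 / rtt_s


print(usable_window(5000, 2000, cong_win=8000, rcv_window=6000))  # 3000
```

Whichever of the two windows is smaller is the one that binds, which is how congestion control and flow control share a single sending limit.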
TCP AIMD
multiplicative decrease: cut CongWin in half after loss event
additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing
[Figure: congestion window (8, 16, 24 Kbytes) versus time for a long-lived TCP connection, forming a sawtooth.]
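A short Python sketch that reproduces the AIMD sawtooth, assuming CongWin is measured in MSS and that the RTTs in which a loss is detected are given as input (`aimd_trace` and `loss_rtts` are hypothetical names).

```python
def aimd_trace(rtts, loss_rtts, start=1.0):
    """CongWin (in MSS) at the start of each RTT under pure AIMD.

    loss_rtts: set of RTT indices in which a loss event is detected.
    """
    cwnd = start
    trace = []
    for t in range(rtts):
        trace.append(cwnd)
        if t in loss_rtts:
            cwnd = max(1.0, cwnd / 2)  # multiplicative decrease, never below 1 MSS
        else:
            cwnd += 1.0                # additive increase: +1 MSS per RTT
    return trace


print(aimd_trace(8, {4}, start=10.0))
# [10.0, 11.0, 12.0, 13.0, 14.0, 7.0, 8.0, 9.0]
```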
TCP Slow Start
• When connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to a respectable rate
• When connection begins, increase rate exponentially fast until first loss event
TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin by 1 MSS for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A sends one segment to Host B, then two segments one RTT later, then four segments, over time.]
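To see why "+1 MSS per ACK" doubles the window every RTT, here is a loss-free Python sketch. It assumes one ACK per segment (no delayed ACKs) and counts CongWin in MSS; `slow_start_rounds` is a hypothetical name.

```python
def slow_start_rounds(rounds, init_cwnd=1):
    """Segments sent in each RTT when every ACK grows CongWin by 1 MSS."""
    cwnd = init_cwnd
    sent = []
    for _ in range(rounds):
        sent.append(cwnd)  # a full window goes out this RTT
        acks = cwnd        # one ACK comes back per segment
        cwnd += acks       # +1 MSS per ACK => window doubles each RTT
    return sent


print(slow_start_rounds(4))  # [1, 2, 4, 8]
```

With the slide's numbers, the first round carries 500 bytes per 200 ms, i.e. 500 · 8 / 0.2 = 20,000 bits/sec = 20 kbps; by the fourth RTT the sender is already at 8 times that.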
TCP Tahoe vs. Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ACK event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)
Timeouts vs. 3 Dup ACKs
• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate network capable of delivering some segments
• timeout before 3 dup ACKs received is a stronger, "more alarming" indicator of congestion
Summary: TCP Congestion Control
[FSM with three states: slow start, congestion avoidance, fast recovery]
Initially (no event): cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0 -> slow start
Slow start:
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
• cwnd > ssthresh (no event): -> congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3·MSS; retransmit missing segment -> fast recovery
Congestion avoidance:
• new ACK: cwnd = cwnd + MSS·(MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment -> slow start
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3·MSS; retransmit missing segment -> fast recovery
Fast recovery:
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0 -> congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment -> slow start
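The three-state FSM can be sketched as a small Python class. This is a model of the window bookkeeping only, under stated assumptions: windows are in bytes, actual transmission and retransmission are omitted, and `RenoSender` and its method names are hypothetical.

```python
class RenoSender:
    """Tracks cwnd, ssthresh, dupACKcount and the FSM state (no real I/O)."""

    def __init__(self, mss=1000, ssthresh=64 * 1024):
        self.mss = mss
        self.cwnd = mss              # cwnd = 1 MSS
        self.ssthresh = ssthresh     # ssthresh = 64 KB
        self.dup_acks = 0
        self.state = "slow_start"

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh                      # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += self.mss                          # doubles per RTT
        else:
            self.cwnd += self.mss * self.mss / self.cwnd   # about +1 MSS per RTT
        self.dup_acks = 0
        if self.state == "slow_start" and self.cwnd > self.ssthresh:
            self.state = "congestion_avoidance"

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += self.mss                          # inflate per extra dup ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:                             # fast retransmit here
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = "fast_recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.mss
        self.dup_acks = 0
        self.state = "slow_start"
```

For example, with `mss=1000` and `ssthresh=4000`, four new ACKs take cwnd from 1000 to 5000 and move the sender into congestion avoidance; three duplicate ACKs then set ssthresh to 2500 and cwnd to 5500.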
Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS
TCP Sender Congestion Control
(Event / State: TCP sender action; commentary)
• ACK receipt for previously unACKed data / Slow Start (SS):
  – CongWin = CongWin + MSS; if (CongWin > Threshold), set state to CA
  – Results in a doubling of CongWin every RTT
• ACK receipt for previously unACKed data / Congestion Avoidance (CA):
  – CongWin = CongWin + MSS·(MSS/CongWin)
  – Additive increase, resulting in increase of CongWin by 1 MSS every RTT
• Triple duplicate ACK / SS or CA:
  – Threshold = CongWin/2; CongWin = Threshold; set state to CA
  – Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
• Timeout / SS or CA:
  – Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
  – Enter slow start
• Duplicate ACK / SS or CA:
  – Increment duplicate ACK count for segment being ACKed
  – CongWin and Threshold not changed
TCP Throughput (Idealized)
• What's the average throughput of TCP as a function of window size and RTT?
  – Long-lived connection; ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75·W/RTT
[Figure: window oscillating between W/2 and W.]
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L: throughput = 1.22·MSS / (RTT·√L)
• ⇒ L = 2·10⁻¹⁰. Very low!
• Requires an almost perfect link
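The arithmetic behind the 10 Gbps example, as a Python sketch that inverts the throughput formula for L. The function names are hypothetical; rates are in bits per second, so the segment size is converted to bits.

```python
def window_segments(rate_bps, rtt_s, seg_bytes):
    """Segments that must be in flight: W = rate · RTT / segment size."""
    return rate_bps * rtt_s / (seg_bytes * 8)


def loss_rate_needed(rate_bps, rtt_s, seg_bytes):
    """Solve throughput = 1.22 · MSS / (RTT · sqrt(L)) for L."""
    return (1.22 * seg_bytes * 8 / (rtt_s * rate_bps)) ** 2


def avg_throughput(w_bytes, rtt_s):
    """Idealized sawtooth between W/2 and W: average 0.75 · W / RTT."""
    return 0.75 * w_bytes / rtt_s


W = window_segments(10e9, 0.1, 1500)
L = loss_rate_needed(10e9, 0.1, 1500)
print(round(W))    # 83333
print(f"{L:.1e}")  # 2.1e-10
```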
Equation-Based Control
• Note: TCP congestion control forms a control loop
  – Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-dup-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info
TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connections 1 and 2 share a bottleneck router of capacity R.]
Why is TCP fair?
Consider two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput versus Connection 1 throughput, both axes up to R, with the equal-bandwidth-share line; repeated cycles of congestion avoidance (additive increase) and loss (decrease window by a factor of 2) move the operating point toward the equal share.]
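A Python sketch of the two-session argument, under a simplifying assumption that both flows see every loss at the same time (synchronized loss). `two_flow_aimd` is a hypothetical name. Additive increase leaves the gap between the flows unchanged, while each halving cuts it in half, so the rates converge toward the equal share.

```python
def two_flow_aimd(x1, x2, capacity, rounds, step=1.0):
    """Two AIMD flows on one bottleneck; both halve when their sum exceeds capacity."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2          # multiplicative decrease: gap halves
        else:
            x1, x2 = x1 + step, x2 + step    # additive increase, slope 1: gap unchanged
    return x1, x2


a, b = two_flow_aimd(80.0, 10.0, 100.0, 1000)
print(abs(a - b))  # the initial gap of 70 has shrunk to almost nothing
```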
Fairness (more)
Fairness and UDP:
• Multimedia apps often do not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections:
• nothing prevents an app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
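The R/10 and R/2 figures follow from per-connection fairness. This is a Python sketch assuming every TCP connection on the bottleneck gets an equal share; `app_share` is a hypothetical name.

```python
from fractions import Fraction


def app_share(own_conns, other_conns):
    """Fraction of the bottleneck rate R an app gets when every connection gets R/n."""
    return Fraction(own_conns, own_conns + other_conns)


print(app_share(1, 9))  # 1/10 -> R/10
print(app_share(9, 9))  # 1/2  -> R/2
```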
TCP Seq. #'s and ACKs
Seq. #'s:
  – byte stream "number" of first byte in segment's data
ACKs:
  – seq # of next byte expected from other side
  – cumulative ACK
Q: how does the receiver handle out-of-order segments?
  – A: TCP spec doesn't say; up to implementor
[Figure: simple telnet scenario over time. User types 'C': Host A sends Seq=42, ACK=79, data = 'C'. Host B ACKs receipt of 'C' and echoes back 'C': Seq=79, ACK=43, data = 'C'. Host A ACKs receipt of echoed 'C': Seq=43, ACK=80.]
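Since the spec leaves out-of-order handling to the implementor, this Python sketch assumes one common choice: the receiver buffers out-of-order segments. The ACK number is then the seq # of the first byte not yet received contiguously. `cumulative_ack` is a hypothetical name.

```python
def cumulative_ack(next_expected, segments):
    """ACK number for a receiver that buffers out-of-order data.

    segments maps seq # -> payload bytes for data received at or beyond next_expected.
    """
    ack = next_expected
    while ack in segments:         # advance only over contiguous data
        ack += len(segments[ack])  # ACK = seq # of the next byte expected
    return ack


print(cumulative_ack(42, {42: b"C"}))               # 43, as in the telnet scenario
print(cumulative_ack(92, {100: b"y" * 20}))         # 92: gap at 92..99, duplicate ACK
print(cumulative_ack(92, {92: b"x" * 8, 100: b"y" * 20}))  # 120: gap filled
```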
TCP Sender Events
data rcvd from app:
• create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unACKed segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
ACK rcvd:
• If it acknowledges previously unACKed segments:
  – update what is known to be ACKed
  – start timer if there are outstanding segments
TCP sender (simplified)

    NextSeqNum = InitialSeqNum
    SendBase = InitialSeqNum

    loop (forever) {
      switch(event)

      event: data received from application above
        create TCP segment with sequence number NextSeqNum
        if (timer currently not running)
          start timer
        pass segment to IP
        NextSeqNum = NextSeqNum + length(data)

      event: timer timeout
        retransmit not-yet-acknowledged segment with smallest sequence number
        start timer

      event: ACK received, with ACK field value of y
        if (y > SendBase) {
          SendBase = y
          if (there are currently not-yet-acknowledged segments)
            start timer
        }
    } /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ACKed byte
Example:
• SendBase-1 = 71, so SendBase = 72. Say an ACK is received with y = 73, so the rcvr acknowledges up to and including 72 and now wants 73+. y > SendBase, so we know that new data is ACKed; set SendBase to 73.
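The simplified sender's three event handlers translate into a small Python class. The timer and the IP layer are stand-ins: the timer is only a boolean, and "pass segment to IP" appends `(seq, data)` to a log. `SimpleTcpSender` and its method names are hypothetical, and a partially ACKed segment is kept whole for simplicity.

```python
class SimpleTcpSender:
    """Single retransmission timer, cumulative ACKs, no flow/congestion control."""

    def __init__(self, initial_seq):
        self.next_seq = initial_seq
        self.send_base = initial_seq
        self.unacked = {}           # seq -> data of not-yet-acknowledged segments
        self.timer_running = False
        self.sent = []              # log of transmissions, including retransmissions

    def on_app_data(self, data: bytes):
        self.unacked[self.next_seq] = data
        self.timer_running = True   # start timer if not already running
        self.sent.append((self.next_seq, data))  # "pass segment to IP"
        self.next_seq += len(data)

    def on_timeout(self):
        seq = min(self.unacked)     # not-yet-ACKed segment with smallest seq #
        self.sent.append((seq, self.unacked[seq]))
        self.timer_running = True   # restart timer

    def on_ack(self, y: int):
        if y > self.send_base:
            self.send_base = y
            # cumulative ACK: every segment ending at or before byte y-1 is done
            self.unacked = {s: d for s, d in self.unacked.items() if s + len(d) > y}
            self.timer_running = bool(self.unacked)  # keep timing only if data outstanding
```

Usage mirroring the cumulative ACK scenario: send 8 bytes at 92 and 20 bytes at 100; even if ACK=100 is lost, `on_ack(120)` acknowledges both and stops the timer.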
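TCP retransmission scenarios
Lost ACK scenario: Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost (X). When the timeout expires, Host A resends Seq=92, 8 bytes data; Host B again replies ACK=100, and SendBase = 100.
Premature timeout: Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; Host B replies ACK=100 and ACK=120, but the Seq=92 timeout expires before these ACKs arrive, so Host A resends Seq=92, 8 bytes data. The ACKs then move SendBase to 100 and then to 120; Host B answers the retransmitted segment with ACK=120, and SendBase stays at 120.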
TCP retransmission scenarios (more)
Cumulative ACK scenario: Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; Host B's ACK=100 is lost (X), but ACK=120 arrives before the timeout expires, so SendBase = 120 and nothing is retransmitted.
TCP ACK generation [RFC 1122, RFC 2581]
Event at receiver -> TCP receiver action:
• Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
  -> Delayed ACK: wait up to 500 ms for next segment; if no next segment, send ACK
• Arrival of in-order segment with expected seq #; one other segment has ACK pending
  -> Immediately send single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment with higher-than-expected seq #; gap detected
  -> Immediately send duplicate ACK, indicating seq # of next expected byte
• Arrival of segment that partially or completely fills gap
  -> Immediately send ACK, provided that segment starts at lower end of gap
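The four rows of the ACK-generation table can be expressed as a single decision function. This Python sketch takes as input flags that a real receiver would derive from its state: `ack_pending` and `gap_exists` are assumptions about that state, and `receiver_action` is a hypothetical name. The final branch, for old duplicate data below the expected seq #, is not in the table and is an assumption.

```python
def receiver_action(seg_seq, next_expected, ack_pending, gap_exists):
    """Map an arriving segment to one of the RFC 1122/2581 ACK-generation rules.

    seg_seq: first byte of the arriving segment
    next_expected: receiver's next expected byte
    ack_pending: an earlier in-order segment is still waiting for its delayed ACK
    gap_exists: out-of-order data is buffered beyond next_expected
    """
    if seg_seq > next_expected:
        return "send duplicate ACK now"       # out of order: gap detected
    if seg_seq == next_expected and gap_exists:
        return "send ACK now"                 # fills the gap from its lower end
    if seg_seq == next_expected and ack_pending:
        return "send one cumulative ACK now"  # covers both in-order segments
    if seg_seq == next_expected:
        return "delay ACK up to 500 ms"
    return "send ACK now"                     # old duplicate data (assumption)
```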
Nagle's algorithm
• Consider again apps writing byte-by-byte (or in small chunks)
  – If you send 1-byte segments with 20 bytes of TCP header overhead + overhead of lower-layer headers, you get a huge waste of network capacity
• Nagle:
  – Transmit first byte
  – Buffer outgoing bytes until an ACK has been received (or a full-sized segment has accumulated), then send all at once
• You can turn this off via:
  – setsockopt() with level IPPROTO_TCP and option TCP_NODELAY (not SOL_SOCKET)
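A Python sketch of the sender-side decision Nagle's algorithm makes, together with the real socket call that disables it. `nagle_may_send_now` is a hypothetical helper that models the rule and is not part of any socket API. The `setsockopt` and `getsockopt` calls use Python's standard `socket` module, with the option set at the `IPPROTO_TCP` level.

```python
import socket


def nagle_may_send_now(pending_bytes, mss, unacked_in_flight):
    """Send if nothing is unacknowledged, or if a full segment is ready;
    otherwise keep buffering until an ACK arrives."""
    return (not unacked_in_flight) or pending_bytes >= mss


# Disabling Nagle on a TCP socket (no connection needed to set the option).
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)  # True
s.close()
```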
Delayed ACK vs. Nagle
• These two mechanisms have nothing to do with each other
  – often confused, especially by various online sources
  – TCP_NODELAY means "disable Nagle", not "disable delayed ACKs"
• Delayed ACK: implemented by TCP receiver
  – Delays sending an acknowledgement (because ACKs are cumulative, this can reduce the # of ACKs sent)
• Nagle's algorithm: implemented by TCP sender
  – Delays sending actual data (so that more data can be fit into a single segment)

TCP
Timeout & Fast Retransmit
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  – but RTT varies
• too short: premature timeout
  – unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
• SampleRTT will vary; want estimated RTT "smoother"
  – average several recent measurements, not just current SampleRTT
RTT Distributions
(a) Probability density of ACK arrival times in the data link layer. (b) Probability density of ACK arrival times for TCP.
TCP Round Trip Time and Timeout
EstimatedRTT = (1 - α)·EstimatedRTT + α·SampleRTT
• exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125
Example RTT estimation:
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; SampleRTT and EstimatedRTT in milliseconds (100 to 350) versus time in seconds (1 to 106).]
TCP Round Trip Time and Timeout
Setting the timeout:
• EstimatedRTT plus "safety margin"
  – large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:
DevRTT = (1 - β)·DevRTT + β·|SampleRTT - EstimatedRTT|
EstimatedRTT = (1 - α)·EstimatedRTT + α·SampleRTT
(typically, α = 0.125, β = 0.25)
• then set timeout interval:
TimeoutInterval = EstimatedRTT + 4·DevRTT
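The three formulas combined into a small Python estimator. Two details come from RFC 6298, not from the slide, and are assumptions here: the first sample initializes EstimatedRTT = SampleRTT and DevRTT = SampleRTT/2, and DevRTT is updated using the old EstimatedRTT before EstimatedRTT itself changes. RFC 6298's 1-second minimum timeout is omitted. `RttEstimator` is a hypothetical name; times are in milliseconds.

```python
class RttEstimator:
    """EWMA estimates of RTT and its deviation; timeout = EstimatedRTT + 4·DevRTT."""

    def __init__(self, alpha=0.125, beta=0.25):
        self.alpha, self.beta = alpha, beta
        self.est = None   # EstimatedRTT
        self.dev = None   # DevRTT

    def update(self, sample):
        if self.est is None:                       # first measurement
            self.est, self.dev = sample, sample / 2
        else:
            self.dev = (1 - self.beta) * self.dev + self.beta * abs(sample - self.est)
            self.est = (1 - self.alpha) * self.est + self.alpha * sample
        return self.timeout()

    def timeout(self):
        return self.est + 4 * self.dev


e = RttEstimator()
print(e.update(100))  # 300.0  (100 + 4·50)
print(e.update(120))  # 272.5  (EstimatedRTT 102.5, DevRTT 42.5)
```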
RTT Measurement & Retransmissions
• How should you handle ACKs for retransmitted segments when measuring RTT?
  – Note: ACK could be delayed for original segment, or early for retransmitted segment
• Choices:
  – Associate with original packet: may overestimate true RTT
  – Associate with retransmitted packet: may underestimate true RTT
Karn's Algorithm
• Idea: don't consider samples from retransmitted segments
  – OK, but what if current timeout value is too small for network delay?
  – In this case, would keep timing out, but couldn't adjust timeout according to formula
• Solution: if a measurement can't be made, use exponential backoff until a new measurement can be made
  – By factor of 2, with limit
• Resume normal sampling algorithm afterwards
  – Usual sampling period is once per RTT
• See also RFC 2988
  – TCP timestamp option can be used to remove ambiguity as to which ACK matches which packet
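Karn's rule plus exponential backoff, as a Python sketch. It reuses the slide's EWMA formulas, initialized as in RFC 6298. The initial and maximum timeout values are assumptions chosen for illustration, and `KarnRto` is a hypothetical name; times are in milliseconds.

```python
class KarnRto:
    """Ignore RTT samples from retransmitted segments; double the timeout
    (up to max_rto) on each timeout until a clean sample arrives."""

    def __init__(self, initial_rto=1000.0, max_rto=60000.0, alpha=0.125, beta=0.25):
        self.rto, self.max_rto = initial_rto, max_rto
        self.alpha, self.beta = alpha, beta
        self.est = None
        self.dev = None

    def on_timeout(self):
        self.rto = min(2 * self.rto, self.max_rto)   # exponential backoff with limit

    def on_ack(self, sample_rtt, was_retransmitted):
        if was_retransmitted:
            return self.rto                          # ambiguous sample: keep backed-off value
        if self.est is None:
            self.est, self.dev = sample_rtt, sample_rtt / 2
        else:
            self.dev = (1 - self.beta) * self.dev + self.beta * abs(sample_rtt - self.est)
            self.est = (1 - self.alpha) * self.est + self.alpha * sample_rtt
        self.rto = min(self.est + 4 * self.dev, self.max_rto)  # resume normal formula
        return self.rto
```

Usage: after two timeouts the RTO is 4000 ms; an ACK for the retransmitted segment leaves it there, and the next ACK for a segment sent only once (say SampleRTT = 200 ms) brings it back to 200 + 4·100 = 600 ms.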
Fast Retransmit
• Time-out period often relatively long:
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If a segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost:
  – fast retransmit: resend segment before timer expires
Fast retransmit algorithm:

    event: ACK received, with ACK field value of y
      if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
          start timer
      }
      else {  /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
          resend segment with sequence number y  /* fast retransmit */
        }
      }
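The ACK handler above in Python. Timer handling is omitted, and the "resend" is recorded in a list as a stand-in for handing the segment to IP; `FastRetransmit` is a hypothetical name.

```python
from collections import defaultdict


class FastRetransmit:
    """A third duplicate ACK for y triggers an immediate resend of the segment at y."""

    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_count = defaultdict(int)
        self.resent = []              # stand-in for "resend segment with sequence number y"

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y        # new data ACKed (timer restart omitted)
        else:
            self.dup_count[y] += 1    # duplicate ACK for already ACKed data
            if self.dup_count[y] == 3:
                self.resent.append(y)


fr = FastRetransmit(92)
fr.on_ack(100)                        # first ACK=100: new data
for _ in range(4):
    fr.on_ack(100)                    # duplicates; only the third triggers a resend
print(fr.resent)  # [100]
```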
TCP
Flow Control

TCP Flow Control
• receive side of TCP connection has a receive buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
• app process may be slow at reading from buffer
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
TCP Flow Control (cont'd)
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer: RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn't overflow
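Both sides of the flow-control rule in a small Python sketch, with byte counts as in the formula above. The function names and example numbers are hypothetical.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver: RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent, last_byte_acked, advertised):
    """Sender: keep unACKed data (LastByteSent - LastByteAcked) within RcvWindow."""
    return max(0, advertised - (last_byte_sent - last_byte_acked))


win = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(win)                                     # 2096
print(sender_may_send(3500, 3000, win))        # 1596
```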
Flow Control: Persistence Timer
• Suppose sender is blocked because the receiver application hasn't picked up data
• Then receiver app reads n bytes
• TCP receiver advertises new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
  – Solution: persistence timer: sender sends probe after a period of inactivity to provoke window update
Flow Control: Silly Window Syndrome
Clark's solution: don't advertise windows below a certain size
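A receiver-side sketch of Clark's solution in Python. The threshold min(MSS, half the receive buffer) is an assumption, taken from the common description of Clark's rule and RFC 1122's receiver-side SWS avoidance. This is a simplification: RFC 1122 phrases the rule in terms of how far the window's right edge may advance. `advertised_window` is a hypothetical name.

```python
def advertised_window(free_space, rcv_buffer, mss):
    """Advertise 0 until free space reaches min(MSS, buffer/2), so the sender
    is never invited to send tiny segments."""
    threshold = min(mss, rcv_buffer // 2)
    return free_space if free_space >= threshold else 0


print(advertised_window(1, 8192, 1460))     # 0: a 1-byte window is "silly"
print(advertised_window(1460, 8192, 1460))  # 1460
```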
TCP
Connection Management
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  connect(s, &dstaddr, …)
• server: contacted by client
  cl = accept(sv, &caddr, …)
Three-way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
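The sequence and acknowledgement numbers of the three steps, as a Python sketch. Each SYN consumes one sequence number, which is why every ACK is the peer's ISN + 1. `Segment` and `three_way_handshake` are hypothetical names, and 32-bit wrap-around is ignored.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    flags: str   # "SYN", "SYNACK" or "ACK"
    seq: int
    ack: int = 0


def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments for the given initial sequence numbers."""
    syn = Segment("SYN", seq=client_isn)                            # step 1: no data
    synack = Segment("SYNACK", seq=server_isn, ack=syn.seq + 1)     # step 2
    ack = Segment("ACK", seq=client_isn + 1, ack=synack.seq + 1)    # step 3: may carry data
    return syn, synack, ack


for seg in three_way_handshake(1000, 5000):
    print(seg)
```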
TCP 3-way handshake
TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums
CS 4284 Spring 2013
3-way Handshake amp Delayed Dups
(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)
CS 4284 Spring 2013
Sequence Number Reuse
bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a
connection with identical sequence numbersbull Must also guard against sequence number prediction attack
ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())
CS 4284 Spring 2013
When Sequence Numbers Attack
bull Suppose attacker A can predict sequence number a host B is going to use next
bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions
based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is
likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B
CS 4284 Spring 2013
When SYNs Attackbull Servers receiving SYN must allocate resources
ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
bull Solutionndash SYN cookies
bull Server creates ACK number sends ACK ndash but does not allocate buffers
ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct
Sequence Number Summarybull Goals Set 1
ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared
ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs
bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can
eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial
sequence number
CS 4284 Spring 2013
CS 4284 Spring 2013
TCP Connection Management (cont)Closing a connection
client closes socket close(s)
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
timed
wai
t
CS 4284 Spring 2013
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
timed
wai
tclosed
CS 4284 Spring 2013
TCP ConnectionFSM
The heavy solid line is the normal path for a client
The heavy dashed line is the normal path for a server
The light lines are unusual events
Each transition is labeled by the event causing it and the action resulting from it separated by a slash
CS 4284 Spring 2013
TCP Connection Management (contrsquod)
TCP client lifecycle TCP server lifecycle
CS 4284 Spring 2013
Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if
packets (FIN ACK) can be lostndash No Famous two-army problem
CS 4284 Spring 2013
Summarybull TCP segments acknowledgements amp
retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit
bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management
schemendash SYN attackndash Sequence number prediction attacks
CS 4284 Spring 2013
TCP Miscellaneousbull MSS Maximum Segment Size Option
ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN
bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to
allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT
measurement PAWS for protection against wrap-around for sequence numbers
bull hellip
CS 4284 Spring 2013
Study of TCP Outlinebull segment structurebull reliable data transfer
ndash delayed ACKsndash Naglersquos algorithm
bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control
TCP
Congestion Control
CS 4284 Spring 2013
Principles of Congestion Control
Congestionbull informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull different from flow controlbull manifestations
ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)
bull a top-10 problem
CS 4284 Spring 2013
Causescosts of congestion scenario 1 bull two senders two
receiversbull one router infinite
buffers bull no retransmission
bull large delays when congested
bull (but no reduction in throughput here)
unlimited shared output link buffers
Host Alin original data
Host B
lout
CS 4284 Spring 2013
Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet
after timeout
finite shared output link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2lin
l out
a c
R2
R2lin
l out
R4
R2
R2lin
l out
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit
Q what happens as lin and lrsquoin increase
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission
capacity used for that packet was wastedbull ultimately leads to congestion collapse
Host A
Host B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even if no packet loss occurs
bull If packet loss occurs needed retransmission require offered load to be greater than goodput
bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion control
bull no explicit feedback from network
bull congestion inferred from end-system observed loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systemsndash single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Controlbull end-end control (no network
assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput
bull CongWin is dynamic function of perceived network congestion
How does sender notice congestion
bull loss event = timeout or 3 duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is primary cause of loss)
Three mechanismsbull AIMDbull slow startbull fast recovery
rate = CongWin
RTT Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
multiplicative decrease cut CongWin in half after loss event
additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Startbull When connection
begins CongWin = 1 MSSndash Example MSS =
500 bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly
ramp up to respectable rate
bull When connection begins increase rate exponentially fast until first loss event
CS 4284 Spring 2013
TCP Slow Start (more)bull When connection begins
increase rate exponentially until first loss eventndash double CongWin every
RTTndash done by incrementing CongWin for every ACK received
bull Summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
TCP Tahoe vs. Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event: set CongWin to 1 MSS
• If triple-ACK event: set CongWin to Threshold and increase linearly
  (this is called fast recovery and was added in TCP Reno)
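Here is a sketch of how the two variants react to a loss under the slide's simplified rules. Tahoe has no fast recovery, so it falls back to 1 MSS on any loss. Reno does that only on a timeout. Keeping Threshold at least 1 MSS is an assumption made here, and Reno's temporary window inflation is left out.

```python
def react_to_loss(variant, event, cong_win, mss):
    """Return (new CongWin, new Threshold) in bytes after a loss event.

    variant: "tahoe" or "reno"; event: "timeout" or "triple_dup_ack".
    """
    threshold = max(cong_win // 2, mss)   # assumption: never below 1 MSS
    if event == "timeout" or variant == "tahoe":
        return mss, threshold             # back to slow start
    return threshold, threshold           # Reno fast recovery: grow linearly


print(react_to_loss("reno", "triple_dup_ack", 16_000, 1_000))   # (8000, 8000)
print(react_to_loss("tahoe", "triple_dup_ack", 16_000, 1_000))  # (1000, 8000)
```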
Timeouts vs. 3-dup ACKs
• After 3 dup ACKs:
  - CongWin is cut in half
  - window then grows linearly
• But after timeout event:
  - CongWin instead set to 1 MSS
  - window then grows exponentially to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate network capable of delivering some segments
• timeout before 3 dup ACKs received is a stronger, "more alarming" indicator of congestion
Summary: TCP Congestion Control
[Figure: TCP congestion-control FSM with three states: slow start, congestion avoidance, fast recovery]
Start (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
Slow start:
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• Λ, cwnd > ssthresh: go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3·MSS; retransmit missing segment; go to fast recovery
Congestion avoidance:
• new ACK: cwnd = cwnd + MSS·(MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3·MSS; retransmit missing segment; go to fast recovery
Fast recovery:
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
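The FSM above can be sketched as a small Python class. It tracks only cwnd, ssthresh, dupACKcount and the state, with windows counted in MSS units. Actual sending, timers and retransmission are omitted and marked with comments. The class and method names are hypothetical.

```python
class RenoSender:
    """Sketch of the Reno congestion-control FSM (windows in MSS units)."""

    def __init__(self, ssthresh=64):
        self.state = "slow_start"
        self.cwnd = 1.0
        self.ssthresh = ssthresh
        self.dup_acks = 0

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh          # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += 1                     # +1 MSS per ACK
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += 1 / self.cwnd         # about +1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += 1                     # inflate for each extra dup ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3
            self.state = "fast_recovery"
            # retransmit missing segment here

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0
        self.dup_acks = 0
        self.state = "slow_start"
        # retransmit missing segment here


s = RenoSender(ssthresh=8)
for _ in range(8):
    s.on_new_ack()
print(s.state, s.cwnd)  # congestion_avoidance 9.0
```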
Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unACKed data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Results in a doubling of CongWin every RTT
ACK receipt for previously unACKed data | Congestion Avoidance (CA) | CongWin = CongWin + MSS·(MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being ACKed | CongWin and Threshold not changed
TCP Throughput (Idealized)
• What's the average throughput of TCP as a function of window size and RTT?
  - Long-lived connection; ignore slow start
• Let W be the window size at which loss occurs
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75·W/RTT
[Figure: window oscillating between W/2 and W]
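The 0.75 factor is the mean window over one sawtooth cycle, as this sketch computes. It assumes the window, counted in segments, climbs by 1 per RTT from W/2 to W and is halved right after reaching W. Using an even W keeps the average exact.

```python
def avg_aimd_throughput(w_segments, mss_bytes, rtt_sec):
    """Average per-RTT rate over one sawtooth cycle, in bytes/sec."""
    windows = range(w_segments // 2, w_segments + 1)   # W/2, W/2 + 1, ..., W
    avg_window = sum(windows) / len(windows)
    return avg_window * mss_bytes / rtt_sec


rate = avg_aimd_throughput(100, 1500, 0.1)
print(rate, 0.75 * 100 * 1500 / 0.1)   # the two values agree
```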
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
  $\text{throughput} = \dfrac{1.22 \cdot \mathrm{MSS}}{\mathrm{RTT} \cdot \sqrt{L}}$
• so $L = 2 \cdot 10^{-10}$. Very low!
• Requires an almost perfect link
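Both numbers in the example follow from short arithmetic. The function names are illustrative. The second function solves the throughput formula above for L, with MSS converted to bits.

```python
def window_segments(rate_bps, rtt_sec, mss_bytes):
    """In-flight segments needed to sustain rate_bps: W = rate * RTT / MSS."""
    return rate_bps * rtt_sec / (mss_bytes * 8)


def max_loss_rate(rate_bps, rtt_sec, mss_bytes):
    """Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) to solve for L."""
    return (1.22 * mss_bytes * 8 / (rtt_sec * rate_bps)) ** 2


print(window_segments(10e9, 0.1, 1500))  # about 83,333 segments
print(max_loss_rate(10e9, 0.1, 1500))    # about 2e-10
```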
Equation-Based Control
• Note: TCP congestion control forms a control loop
  - Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-dup-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info
TCP Fairness
Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K
[Figure: TCP connections 1 and 2 share a bottleneck router of capacity R]
Why is TCP fair?
Two competing sessions:
• additive increase gives slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: Connection 1 throughput vs. Connection 2 throughput, each axis up to R; alternating "congestion avoidance: additive increase" and "loss: decrease window by factor of 2" steps move toward the equal-bandwidth-share line]
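Here is a toy iteration of the fairness argument above. It assumes both flows detect a loss, and halve, whenever their combined rate exceeds the capacity. Otherwise each flow adds 1 unit per round. Additive increase leaves the gap between the flows unchanged, and every halving cuts it in half, so the flows converge.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Iterate two AIMD flows sharing one bottleneck (rates in arbitrary units)."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2    # multiplicative decrease halves the gap
        else:
            x1, x2 = x1 + 1, x2 + 1    # additive increase keeps the gap
    return x1, x2


print(aimd_two_flows(80, 10, 100, 200))  # the two rates end up nearly equal
```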
Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
• Instead use UDP:
  - pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
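The R/10 and R/2 figures assume every connection on the bottleneck, old or new, gets an equal share R/N. The function name is illustrative.

```python
from fractions import Fraction


def app_share(existing_conns, new_conns):
    """Fraction of the bottleneck rate R an app gets with new_conns parallel TCPs."""
    return Fraction(new_conns, existing_conns + new_conns)


print(app_share(9, 1))  # 1/10 of R
print(app_share(9, 9))  # 1/2 of R
```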
TCP Seq. #'s and ACKs
Seq. #'s:
  - byte stream "number" of first byte in segment's data
ACKs:
  - seq # of next byte expected from other side
  - cumulative ACK
Q: how does the receiver handle out-of-order segments?
  - A: TCP spec doesn't say; up to implementor
[Figure: simple telnet scenario. User types 'C': Host A sends Seq=42, ACK=79, data = 'C'. Host B ACKs receipt of 'C' and echoes back 'C': Seq=79, ACK=43, data = 'C'. Host A ACKs receipt of echoed 'C': Seq=43, ACK=80.]
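The numbers in the telnet scenario can be reproduced by tracking two counters per endpoint: the next byte to send and the next byte expected from the peer. This sketch assumes in-order delivery and ignores the sequence numbers used by SYN/FIN. The ISNs 42 and 79 are simply taken from the figure. The class is hypothetical.

```python
class Endpoint:
    """One side of a connection, tracking only sequence and ACK numbers."""

    def __init__(self, isn, peer_isn):
        self.next_seq = isn          # seq # of the next byte we will send
        self.expected = peer_isn     # seq # of the next byte expected from peer

    def send(self, data):
        segment = {"seq": self.next_seq, "ack": self.expected, "data": data}
        self.next_seq += len(data)   # sequence numbers count bytes, not segments
        return segment

    def receive(self, segment):
        if segment["seq"] == self.expected:          # in-order data
            self.expected += len(segment["data"])    # cumulative ACK advances


a = Endpoint(isn=42, peer_isn=79)   # Host A
b = Endpoint(isn=79, peer_isn=42)   # Host B
s1 = a.send(b"C")                   # user types 'C'
b.receive(s1)
s2 = b.send(b"C")                   # B ACKs 'C' and echoes it back
a.receive(s2)
s3 = a.send(b"")                    # A ACKs the echoed 'C'
print(s1["seq"], s1["ack"], s2["seq"], s2["ack"], s3["seq"], s3["ack"])  # 42 79 79 43 43 80
```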
TCP Sender Events
Data rcvd from app:
• create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unACKed segment)
• expiration interval: TimeOutInterval
Timeout:
• retransmit segment that caused timeout
• restart timer
ACK rcvd:
• if it acknowledges previously unACKed segments:
  - update what is known to be ACKed
  - start timer if there are outstanding segments
TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event)

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }

} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ACKed byte
Example:
• SendBase-1 = 71, so SendBase = 72. Say an ACK is received with y = 73, so the rcvr acknowledges up to and including 72 and now wants 73+. Since y > SendBase, we know that new data is ACKed; set SendBase to 73.
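Here is a runnable Python sketch of the simplified sender above. It has no flow or congestion control and no fast retransmit. The list `sent` stands in for "pass segment to IP", and the timer is a boolean flag. Both are modelling assumptions, not real TCP internals.

```python
class SimplifiedTcpSender:
    """Event handlers of the simplified TCP sender."""

    def __init__(self, initial_seq_num):
        self.next_seq_num = initial_seq_num
        self.send_base = initial_seq_num
        self.unacked = {}          # seq -> data for not-yet-ACKed segments
        self.timer_running = False
        self.sent = []             # segments "passed to IP", in order

    def on_app_data(self, data):
        self.unacked[self.next_seq_num] = data
        if not self.timer_running:
            self.timer_running = True                    # start timer
        self.sent.append((self.next_seq_num, data))      # pass segment to IP
        self.next_seq_num += len(data)

    def on_timeout(self):
        if not self.unacked:
            return
        seq = min(self.unacked)                          # oldest unACKed segment
        self.sent.append((seq, self.unacked[seq]))       # retransmit it
        self.timer_running = True                        # restart timer

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y
            # cumulative ACK: drop every segment that ends at or before y
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)      # timer only if data outstanding


s = SimplifiedTcpSender(92)
s.on_app_data(b"x" * 8)     # Seq=92, 8 bytes
s.on_app_data(b"y" * 20)    # Seq=100, 20 bytes
s.on_timeout()              # premature timeout: resend Seq=92
s.on_ack(120)               # cumulative ACK covers both segments
print(s.send_base, s.unacked, s.timer_running)
```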
TCP retransmission scenarios
[Figure, lost ACK scenario: Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost (X); when the Seq=92 timer expires, A resends Seq=92, 8 bytes data; B replies ACK=100; SendBase = 100]
[Figure, premature timeout: Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; B replies ACK=100 and ACK=120, but the Seq=92 timer expires first, so A resends Seq=92, 8 bytes data; B replies ACK=120; SendBase moves from 100 to 120]
TCP retransmission scenarios (more)
[Figure, cumulative ACK scenario: Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; B's ACK=100 is lost (X), but ACK=120 arrives before the timeout, so nothing is resent; SendBase = 120]
TCP ACK generation [RFC 1122, RFC 2581]
Event at Receiver | TCP Receiver action
Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed | Delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK
Arrival of in-order segment with expected seq #. One other segment has ACK pending | Immediately send single cumulative ACK, ACKing both in-order segments
Arrival of out-of-order segment with higher-than-expected seq #. Gap detected | Immediately send duplicate ACK, indicating seq # of next expected byte
Arrival of segment that partially or completely fills gap | Immediately send ACK, provided that segment starts at lower end of gap
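The table can be read as a decision function, sketched below. The boolean inputs are a simplification assumed here. A real receiver derives them from its reassembly queue and delayed-ACK timer. The case of a segment starting below the expected byte is not in the table, so the sketch just ACKs it immediately as an assumption.

```python
def ack_action(seg_seq, expected_seq, ack_pending, gap_buffered):
    """Pick the receiver's response from the ACK-generation table.

    seg_seq: first seq # in the arriving segment
    expected_seq: next in-order byte the receiver wants
    ack_pending: an earlier in-order segment is still waiting on a delayed ACK
    gap_buffered: out-of-order data beyond expected_seq is already buffered
    """
    if seg_seq > expected_seq:
        return "duplicate_ack"     # gap detected: re-ACK expected_seq immediately
    if seg_seq < expected_seq:
        return "immediate_ack"     # assumption: old data, not covered by the table
    if gap_buffered:
        return "immediate_ack"     # segment starts at lower end of gap
    if ack_pending:
        return "cumulative_ack"    # one ACK covers both in-order segments
    return "delayed_ack"           # wait up to 500 ms for another segment
```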
Nagle's algorithm
• Consider again apps writing byte-by-byte (or in small chunks)
  - If you send 1-byte segments with 20-byte TCP header overhead + overhead of lower layer headers, you get a huge waste of network capacity
• Nagle:
  - Transmit first byte
  - Buffer outgoing bytes until ACK has been received, then send all at once
• You can turn this off via the TCP_NODELAY option:
  - Use setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, ...); it is a TCP-level option, not a SOL_SOCKET one
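This sketch turns Nagle off with Python's `socket` module and also states the buffering rule as a function. `nagle_may_send` is a hypothetical helper. It follows the rule as RFC 1122 describes it: send when a full-sized segment is ready or when nothing is outstanding, otherwise keep buffering until an ACK arrives.

```python
import socket


def disable_nagle(sock):
    """Set TCP_NODELAY, which turns off Nagle's algorithm (not delayed ACKs)."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


def nagle_may_send(buffered_bytes, mss, unacked_bytes):
    """Nagle's rule: send if a full segment is ready or no data is unACKed."""
    return buffered_bytes >= mss or unacked_bytes == 0


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
disable_nagle(sock)
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
sock.close()
```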
Delayed ACK vs. Nagle
• These two mechanisms have nothing to do with each other
  - often confused, especially by various online sources
  - TCP_NODELAY means "disable Nagle," not "disable delayed ACKs"
• Delayed ACK: implemented by TCP receiver
  - Delays sending an acknowledgement (because ACKs are cumulative, this can reduce the # of ACKs sent)
• Nagle's algorithm: implemented by TCP sender
  - Delays sending actual data (so that more data can be fit into a single segment)

TCP Timeout & Fast Retransmit
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  - but RTT varies
• too short: premature timeout
  - unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
• SampleRTT will vary; want estimated RTT "smoother"
  - average several recent measurements, not just current SampleRTT
RTT Distributions
[Figure: (a) Probability density of ACK arrival times in the data link layer. (b) Probability density of ACK arrival times for TCP.]
TCP Round Trip Time and Timeout
EstimatedRTT = (1 - α)·EstimatedRTT + α·SampleRTT
• exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125
Example RTT estimation
[Figure: RTT (milliseconds, 100 to 350) vs. time (seconds, 1 to 106) from gaia.cs.umass.edu to fantasia.eurecom.fr, plotting SampleRTT and EstimatedRTT]
TCP Round Trip Time and Timeout
Setting the timeout:
• EstimatedRTT plus "safety margin"
  - large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT, then set the timeout:
  EstimatedRTT = (1 - α)·EstimatedRTT + α·SampleRTT
  DevRTT = (1 - β)·DevRTT + β·|SampleRTT - EstimatedRTT|
  TimeoutInterval = EstimatedRTT + 4·DevRTT
  (typically α = 0.125, β = 0.25)
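The three formulas can be sketched as a small estimator. The slide does not say how to initialize the values or in which order to update them. As an assumption, this follows RFC 6298: the first sample sets EstimatedRTT = SampleRTT and DevRTT = SampleRTT/2, and DevRTT is updated with the previous EstimatedRTT. Times are in seconds.

```python
class RttEstimator:
    """EWMA RTT estimator and timeout computation (sketch)."""

    def __init__(self, alpha=0.125, beta=0.25):
        self.alpha, self.beta = alpha, beta
        self.estimated_rtt = None
        self.dev_rtt = None

    def update(self, sample_rtt):
        if self.estimated_rtt is None:            # first measurement
            self.estimated_rtt = sample_rtt
            self.dev_rtt = sample_rtt / 2
        else:
            # DevRTT uses the previous EstimatedRTT, so update it first
            self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                            + self.beta * abs(sample_rtt - self.estimated_rtt))
            self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                                  + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator()
print(est.update(0.100))   # first sample
print(est.update(0.200))   # a slower sample widens the safety margin
```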
RTT Measurement & Retransmissions
• How should you handle ACKs for retransmitted segments when measuring RTT?
  - Note: ACK could be delayed for original segment or early for retransmitted segment
• Choices:
  - Associate with original packet: may overestimate true RTT
  - Associate with retransmitted packet: may underestimate true RTT
Karn's Algorithm
• Idea: don't consider samples from retransmitted segments
  - OK, but what if current timeout value is too small for network delay?
  - In this case, would keep timing out but couldn't adjust timeout according to formula
• Solution: if measurement can't be made, use exponential backoff until new measurement can be made
  - By factor of 2, with limit
• Resume normal sampling algorithm afterwards
  - Usual sampling period is once per RTT
• See also RFC 2988
  - TCP timestamp option can be used to remove ambiguity about which ACK matches which packet
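Here is a sketch of Karn's algorithm on top of the usual EWMA formulas. Samples from retransmitted segments are discarded, timeouts double the RTO up to a cap, and the next clean sample resumes normal calculation. The `initial_rto` and `max_rto` values are hypothetical placeholders for the slide's unspecified "limit". The first-sample initialization (DevRTT = SampleRTT/2) is likewise an assumption.

```python
class KarnTimer:
    """Retransmission timeout with Karn's rule and capped exponential backoff."""

    def __init__(self, initial_rto=1.0, max_rto=60.0, alpha=0.125, beta=0.25):
        self.rto = initial_rto
        self.max_rto = max_rto
        self.alpha, self.beta = alpha, beta
        self.srtt = None       # EstimatedRTT
        self.rttvar = None     # DevRTT

    def on_timeout(self):
        # no usable sample for this segment: back off by 2, with a limit
        self.rto = min(self.rto * 2, self.max_rto)

    def on_ack(self, sample_rtt, was_retransmitted):
        if was_retransmitted:
            return             # ambiguous sample: ignore, keep backed-off RTO
        if self.srtt is None:
            self.srtt, self.rttvar = sample_rtt, sample_rtt / 2
        else:
            self.rttvar = ((1 - self.beta) * self.rttvar
                           + self.beta * abs(sample_rtt - self.srtt))
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * sample_rtt
        self.rto = self.srtt + 4 * self.rttvar   # resume normal calculation
```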
Fast Retransmit
• Time-out period often relatively long:
  - long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  - Sender often sends many segments back-to-back
  - If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost:
  - fast retransmit: resend segment before timer expires
Fast retransmit algorithm:

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {  /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y == 3) {
      resend segment with sequence number y  /* fast retransmit */
    }
  }
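The ACK handler above translates to the Python sketch below. Timers are omitted. The hypothetical `resend` helper only records which sequence numbers would be retransmitted. Resetting the duplicate count when new data is ACKed is an assumption that the pseudocode leaves implicit.

```python
class FastRetransmitSender:
    """ACK handling with fast retransmit on the third duplicate ACK."""

    DUP_ACK_THRESHOLD = 3

    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_acks = 0
        self.retransmitted = []

    def resend(self, seq):
        self.retransmitted.append(seq)      # stand-in for a real retransmission

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y
            self.dup_acks = 0               # new data ACKed: reset the count
        elif y == self.send_base:
            self.dup_acks += 1              # duplicate ACK for already-ACKed data
            if self.dup_acks == self.DUP_ACK_THRESHOLD:
                self.resend(y)              # fast retransmit


s = FastRetransmitSender(100)
for ack in (120, 120, 120, 120):            # one new ACK, then three duplicates
    s.on_ack(ack)
print(s.retransmitted)                      # [120]
```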
TCP Flow Control

TCP Flow Control
• receive side of TCP connection has a receive buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
• app process may be slow at reading from buffer
Flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
TCP Flow Control (cont'd)
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer:
  RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn't overflow
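Both sides of the flow-control rule fit in two functions. The names are illustrative. Like the slide, the sketch assumes the receiver discards out-of-order segments, so all buffered data is in order.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver side: spare room to advertise."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent, last_byte_acked, advertised_window, nbytes):
    """Sender side: True if sending nbytes more keeps unACKed data within RcvWindow."""
    return (last_byte_sent - last_byte_acked) + nbytes <= advertised_window


win = rcv_window(4096, 3000, 1000)           # 2000 bytes buffered, unread
print(win)                                   # 2096
print(sender_may_send(5000, 4000, win, 1000))
```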
Flow Control: Persistence Timer
• Suppose sender is blocked because receiver application hasn't picked up data (advertised RcvWindow = 0)
• Then receiver app reads n bytes
• TCP receiver advertises new window of size RcvWindow = n
• But suppose this advertisement is lost
  - Sender would be stuck
• Solution: persistence timer; sender sends probe after a period of inactivity to provoke window update
Flow Control: Silly Window Syndrome
Clark's solution: don't advertise windows below a certain size
Study of TCP Outline
• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP Connection Management
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  connect(s, &dstaddr, …)
• server: contacted by client
  cl = accept(sv, &caddr, …)
Three way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
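The sequence and acknowledgement numbers in the three steps can be sketched as follows. A SYN consumes one sequence number, so each side ACKs the other's ISN + 1, modulo 2^32. The random default ISNs are only for illustration; how real stacks pick ISNs is a separate question.

```python
import random


def three_way_handshake(client_isn=None, server_isn=None):
    """Segments of a TCP three-way handshake as (flags, seq, ack) tuples."""
    mod = 2 ** 32
    c = random.getrandbits(32) if client_isn is None else client_isn
    s = random.getrandbits(32) if server_isn is None else server_isn
    syn = ("SYN", c, None)                          # step 1: no data
    synack = ("SYN+ACK", s, (c + 1) % mod)          # step 2: ACKs client ISN + 1
    ack = ("ACK", (c + 1) % mod, (s + 1) % mod)     # step 3: may carry data
    return [syn, synack, ack]


for segment in three_way_handshake(100, 300):
    print(segment)
```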
TCP 3-way handshake
[Figure: TCP connection establishment]
• Q1: why 3-way and not 2-way handshake?
• Q2: how do sender & receiver determine initial seq. #s?
3-way Handshake & Delayed Dups
[Figure: (a) Normal operation. (b) Old SYN appearing out of nowhere. (c) Duplicate SYN and duplicate ACK following SYN.]
3-way handshake required to deal with scenarios (b) and (c)
Sequence Number Reuse
• Idea: tie initial TCP seq. numbers to clock
  - Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  - Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock value + F(src, dst, sport, dport, secret), where the secret is chosen randomly rather than per connection
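The RFC 1948 scheme can be sketched in Python. The 16-byte per-boot secret, the string encoding of the 4-tuple, and truncating the hash to 32 bits are all assumptions for illustration. MD5 is used because RFC 1948 suggests it as the hash. For a fixed connection tuple the ISN still advances with the 4 µs clock, but without the secret it cannot be predicted from other connections' ISNs.

```python
import hashlib
import os
import time

ISN_SECRET = os.urandom(16)   # assumption: random secret chosen once at boot


def rfc1948_isn(src, dst, sport, dport, now_us=None):
    """ISN = M + F(src, dst, sport, dport, secret), modulo 2**32.

    M ticks once every 4 microseconds; F is a keyed hash of the 4-tuple.
    """
    if now_us is None:
        now_us = time.monotonic_ns() // 1000
    m = now_us // 4
    data = f"{src}|{dst}|{sport}|{dport}|".encode() + ISN_SECRET
    f = int.from_bytes(hashlib.md5(data).digest()[:4], "big")
    return (m + f) % 2**32


a = rfc1948_isn("10.0.0.1", "10.0.0.2", 1234, 80, now_us=1000)
b = rfc1948_isn("10.0.0.1", "10.0.0.2", 1234, 80, now_us=1004)
print((b - a) % 2**32)   # 1: same tuple, one 4-microsecond tick later
```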
When Sequence Numbers Attack
• Suppose attacker A can predict the sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  - B believes it is talking to C, and might grant permissions based on C's IP address
  - Attacker on A must suppress the RST packets C is likely to send; use a denial-of-service attack for that
• A sends message to compromise B
When SYNs Attack
• Servers receiving SYN must allocate resources
  - Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution: SYN cookies
  - Server computes its initial sequence number (the cookie) and sends SYNACK, but does not allocate buffers
  - If client continues with ACK, check whether that ACK could have been sent in reply to such a SYNACK; if correct, then allocate buffers
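Here is a simplified SYN-cookie sketch. The server ISN is a keyed hash (HMAC-SHA256, chosen here for illustration) of the 4-tuple, the client ISN and a coarse time slot, so the server stores nothing until a valid ACK arrives. The key, the 64-second slot and the hash choice are hypothetical. Real implementations also pack a timestamp and an MSS encoding into the 32 bits.

```python
import hashlib
import hmac
import os

COOKIE_KEY = os.urandom(16)   # hypothetical per-server secret
SLOT_SECONDS = 64             # hypothetical cookie lifetime granularity


def _cookie(src, dst, sport, dport, client_isn, slot):
    # keyed hash of the 4-tuple, the client's ISN and a time slot
    msg = f"{src}|{dst}|{sport}|{dport}|{client_isn}|{slot}".encode()
    mac = hmac.new(COOKIE_KEY, msg, hashlib.sha256).digest()
    return int.from_bytes(mac[:4], "big")      # 32-bit server ISN


def on_syn(src, dst, sport, dport, client_isn, now):
    """Return the server ISN to place in the SYNACK; no state is stored."""
    return _cookie(src, dst, sport, dport, client_isn, int(now // SLOT_SECONDS))


def on_ack(src, dst, sport, dport, seg_seq, seg_ack, now):
    """True if this ACK matches a SYNACK we could have sent recently."""
    client_isn = (seg_seq - 1) % 2**32         # the client's SYN consumed one seq #
    slot = int(now // SLOT_SECONDS)
    for s in (slot, slot - 1):                 # tolerate a slot boundary
        if (_cookie(src, dst, sport, dport, client_isn, s) + 1) % 2**32 == seg_ack:
            return True                        # only now allocate buffers
    return False


isn = on_syn("1.2.3.4", "5.6.7.8", 4000, 80, client_isn=1000, now=640.0)
print(on_ack("1.2.3.4", "5.6.7.8", 4000, 80, 1001, (isn + 1) % 2**32, now=650.0))
```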
Sequence Number Summary
• Goals Set 1:
  - Guard against old duplicates in one connection -> don't reuse sequence numbers until it's reasonably certain that old duplicates have disappeared
  - Even after a crash/restart -> hence tie to time -> but don't use a global clock -> hence need a 3-way handshake where each side verifies the sequence number chosen by the other side before the connection succeeds
• Goals Set 2:
  - Don't allow hijacking of connections unless attacker can eavesdrop -> use PRNG for initial seq number choice
  - Don't allow SYN attacks -> compute but don't store initial sequence number
TCP Connection Management (cont.)
Closing a connection:
client closes socket: close(s);
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client sends FIN; server replies ACK, then closes and sends FIN; client replies ACK, enters timed wait, then is closed]
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK
  - Enters "timed wait"; will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client and server each move through closing; the client enters timed wait before becoming closed; the server is closed on receiving the final ACK]
TCP Connection FSM
[Figure: TCP connection management finite state machine]
The heavy solid line is the normal path for a client.
The heavy dashed line is the normal path for a server.
The light lines are unusual events.
Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.
TCP Connection Management (cont'd)
[Figure: TCP client lifecycle; TCP server lifecycle]
Closing a Connection
• Note: previous charts showed normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  - No: famous two-army problem
Summary
• TCP segments, acknowledgements & retransmission
  - Delayed ACKs, Nagle's algorithm
  - Fast retransmit
• RTT estimation & Karn's algorithm
• Flow control & silly window syndrome
• Connection management in TCP
• Attacks against TCP's connection management scheme
  - SYN attack
  - Sequence number prediction attacks
TCP Miscellaneous
• MSS: Maximum Segment Size option
  - Client/server agree on a larger than default (536 outside same subnet) MSS; option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window, to allow for LFNs ("elephants"), i.e. Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• …
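For an LFN, the receive window must cover the bandwidth-delay product. The 16-bit window field tops out at 65,535 bytes. RFC 1323 window scaling shifts it left by WSCALE, with a maximum shift of 14. This sketch finds the smallest shift that is enough. The function name is illustrative, and the link parameters reuse the 10 Gbps, 100 ms example from the throughput discussion.

```python
def min_window_scale(bandwidth_bps, rtt_sec):
    """Smallest WSCALE shift letting the receive window cover the BDP."""
    bdp_bytes = bandwidth_bps * rtt_sec / 8
    shift = 0
    while (65535 << shift) < bdp_bytes:
        shift += 1
        if shift > 14:
            raise ValueError("BDP exceeds the maximum scaled window")
    return shift


print(min_window_scale(10e9, 0.1))   # 11: 65535 << 11 bytes covers 125 MB
print(min_window_scale(1e6, 0.1))    # 0: no scaling needed
```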
TCP Congestion Control

Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  - long delays (queueing in router buffers)
  - lost packets (buffer overflow at routers)
• a top-10 problem!
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A sends λin (original data) through a router with unlimited shared output link buffers; λout is delivered; Host B shares the router]
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Host A offers λin (original data) and λ'in (original data plus retransmitted data) to a router with finite shared output link buffers; λout is delivered; Host B shares the router]
Causes/costs of congestion: scenario 2
• always: λin = λout (goodput)
• (a) if no loss: λ'in = λin
• (b) assume "clairvoyant" sender: retransmission only when loss certain
• (c) retransmission of both delayed and lost packets makes λ'in larger for same λout (every packet transmitted twice)
"costs" of congestion:
• more work (retransmissions) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots of λout vs. λ'in, each axis up to R/2: in (a) λout reaches R/2, in (b) R/3, in (c) R/4]
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λin and λ'in increase?
[Figure: Host A offers λin (original data) and λ'in (original data plus retransmitted data) over multihop paths through routers with finite shared output link buffers; λout is delivered; Host B shares the path]
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted!
• ultimately leads to congestion collapse
[Figure: λout for traffic between Host A and Host B as offered load increases]
Reasons for Congestion Control
• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, needed retransmissions require offered load to be greater than goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion control
bull no explicit feedback from network
bull congestion inferred from end-system observed loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systemsndash single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Controlbull end-end control (no network
assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput
bull CongWin is dynamic function of perceived network congestion
How does sender notice congestion
bull loss event = timeout or 3 duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is primary cause of loss)
Three mechanismsbull AIMDbull slow startbull fast recovery
rate = CongWin
RTT Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
multiplicative decrease cut CongWin in half after loss event
additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Startbull When connection
begins CongWin = 1 MSSndash Example MSS =
500 bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly
ramp up to respectable rate
bull When connection begins increase rate exponentially fast until first loss event
CS 4284 Spring 2013
TCP Slow Start (more)bull When connection begins
increase rate exponentially until first loss eventndash double CongWin every
RTTndash done by incrementing CongWin for every ACK received
bull Summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
CS 4284 Spring 2013
TCP Tahoe vs RenoQ When should the
exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss
eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly
(this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKsbull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout eventndash CongWin instead set
to 1 MSS ndash window then grows
exponentiallyndash to a threshold then
grows linearly
bull 3 dup ACKs indicates network capable of delivering some segments
bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
CS 4284 Spring 2013
TCP Sender Congestion ControlEvent State TCP Sender Action Commentary
ACK receipt for previously unacked data
Slow Start (SS)
CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA
Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data
CongestionAvoidance (CA)
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK
SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter slow start
Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
CS 4284 Spring 2013
TCP Throughput Idealized
bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start
bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput
to W2RTT bull Average steady-state throughput 75 WRTT
W
W2
CS 4284 Spring 2013
TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT
want 10 Gbps throughputbull Requires window size W = 83333 in-flight
segmentsbull Throughput in terms of loss rate
bull L = 210-10 Very lowbull Require almost perfect link
LRTTMSS221
Equation-Based Control
bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which
are samples of timeout events + 3-ack events)bull Instead equation-based control uses an
equation to compute sending rate based on this input
bull See RFC 5348 for more info
CS 4284 Spring 2013
TCP Fairness
CS 4284 Spring 2013
Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk
TCP connection 1
bottleneckrouter
capacity RTCP connection 2
TCP Fairness
CS 4284 Spring 2013
Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughputConn
e ctio
n 2
thro
u ghp
ut
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
CS 4284 Spring 2013
Fairness (more)Fairness and UDPbull Multimedia apps often
do not use TCPndash do not want rate throttled
by congestion controlbull Instead use UDP
ndash pump audiovideo at constant rate tolerate packet loss
bull TCP friendliness
Fairness and parallel TCP connections
bull nothing prevents app from opening parallel connections between 2 hosts
bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets
rate R10ndash new app asks for 9 TCPs gets
R2bull Thatrsquos what ldquodownload
acceleratorsrdquo do
CS 4284 Spring 2013
TCP Reliable Data Transfer
bull TCP creates rdt service on top of IPrsquos unreliable service
bull Pipelined segments
bull Cumulative acksbull TCP uses single
retransmission timer
bull Retransmissions are triggered byndash timeout eventsndash duplicate acks
bull Initially consider simplified TCP senderndash ignore duplicate
acksndash ignore flow control
congestion control
CS 4284 Spring 2013
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next byte
expected from other side
ndash cumulative ACKQ how receiver handles
out-of-order segmentsndash A TCP spec doesnrsquot
say - up to implementor
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
Usertypes
lsquoCrsquo
host ACKsreceipt
of echoedlsquoCrsquo
host ACKsreceipt of
lsquoCrsquo echoesback lsquoCrsquo
timesimple telnet scenario
CS 4284 Spring 2013
TCP Sender Eventsdata rcvd from appbull create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull start timer if not already running (think of timer as for oldest unacked segment)
bull expiration interval TimeOutInterval
timeoutbull retransmit segment that
caused timeoutbull restart timer ack rcvdbull If acknowledges
previously unacked segmentsndash update what is known to
be ackedndash start timer if there are
outstanding segments
CS 4284 Spring 2013
TCP sender
(simplified)
NextSeqNum = InitialSeqNum SendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71 so
SendBase = 72 say Ack received with y= 73 so the rcvr acknowledges up to including 72 now wants 73+ y is gt SendBase soknow that new data is acked set SendBase to 73
CS 4284 Spring 2013
TCP retransmission scenariosHost A
Seq=92 8 bytes data
ACK=100
losstimeo
ut
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
SendBase= 100
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq=
92 ti
meo
ut
ACK=120
Seq=
92 ti
meo
ut
SendBase= 120
SendBase= 120
Sendbase= 100
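TCP retransmission scenarios (more)
Cumulative ACK scenario (figure):
• Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data
• Host B's ACK=100 is lost (X), but ACK=120 arrives before the Seq=92 timeout
• Because ACKs are cumulative, SendBase = 120 and nothing is retransmitted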
TCP ACK generation [RFC 1122, RFC 2581]
Event at receiver -> TCP receiver action:
• Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed -> Delayed ACK: wait up to 500 ms for the next segment; if no next segment arrives, send ACK
• Arrival of in-order segment with expected seq #; one other segment has ACK pending -> Immediately send a single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment with higher-than-expected seq #; gap detected -> Immediately send a duplicate ACK, indicating the seq # of the next expected byte
• Arrival of segment that partially or completely fills the gap -> Immediately send ACK, provided that the segment starts at the lower end of the gap
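The receiver's side of the table can be sketched as a small Python class that returns the action for each arriving segment. Assumptions (not from the slides): segments never overlap, sequence numbers do not wrap, the 500 ms delayed-ACK timer itself is not modeled, and an old duplicate segment is simply re-ACKed. The class name `AckPolicyReceiver` is hypothetical.

```python
class AckPolicyReceiver:
    """Receiver-side ACK decisions following the event/action table (a sketch)."""

    def __init__(self, expected_seq):
        self.expected = expected_seq   # next in-order byte the receiver wants
        self.out_of_order = {}         # seq -> length of segments buffered beyond a gap
        self.ack_pending = False       # an in-order segment is waiting for a delayed ACK

    def on_segment(self, seq, length):
        """Return (action, ack_number) for an arriving segment."""
        if seq == self.expected:
            fills_gap = bool(self.out_of_order)  # segment starts at the lower end of a gap
            self.expected += length
            # pull in buffered segments that are now contiguous
            while self.expected in self.out_of_order:
                self.expected += self.out_of_order.pop(self.expected)
            if fills_gap or self.ack_pending:
                self.ack_pending = False
                return ("ack_now", self.expected)   # one cumulative ACK covers everything
            self.ack_pending = True
            return ("delay_ack", self.expected)     # wait up to 500 ms for another segment
        if seq > self.expected:
            self.out_of_order[seq] = length         # gap detected: buffer the segment
        # out-of-order (or old duplicate): re-announce the next expected byte right away
        self.ack_pending = False
        return ("dup_ack", self.expected)


r = AckPolicyReceiver(100)
print(r.on_segment(100, 10))   # ('delay_ack', 110)
print(r.on_segment(110, 10))   # ('ack_now', 120)
print(r.on_segment(130, 10))   # ('dup_ack', 120)
print(r.on_segment(120, 10))   # ('ack_now', 140)
```

The last call shows why the gap-filling rule exists: one immediate cumulative ACK jumps straight past the buffered segment.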
Nagle's algorithm
• Consider again apps writing byte-by-byte (or in small chunks)
– If you send 1-byte segments with 20 bytes of TCP header overhead plus the overhead of lower-layer headers, you waste a huge amount of network capacity
• Nagle:
– Transmit the first byte
– Buffer outgoing bytes until an ACK has been received (or a full-sized segment has accumulated), then send them all at once
• You can turn this off:
– Use setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, ...); TCP_NODELAY is a TCP-level option, not a SOL_SOCKET option
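A toy Python model of Nagle's rule, plus the option call that disables it. The model is a sketch under assumptions: it ignores the receive and congestion windows, treats an ACK as acknowledging everything in flight, and the names `NagleSender` and `disable_nagle` are hypothetical. `socket.setsockopt` with `socket.IPPROTO_TCP` and `socket.TCP_NODELAY` is the standard Python socket API.

```python
import socket


class NagleSender:
    """Toy model of Nagle's algorithm; ignores the receive and congestion windows."""

    def __init__(self, mss):
        self.mss = mss
        self.buffer = bytearray()   # bytes written by the app but not yet sent
        self.in_flight = 0          # bytes sent but not yet acknowledged
        self.segments = []          # segments handed to IP, recorded for inspection

    def _send(self, n):
        seg = bytes(self.buffer[:n])
        del self.buffer[:n]
        self.in_flight += len(seg)
        self.segments.append(seg)

    def _try_send(self):
        while len(self.buffer) >= self.mss:        # full-sized segments may always go
            self._send(self.mss)
        if self.buffer and self.in_flight == 0:    # small segment only if nothing is unACKed
            self._send(len(self.buffer))

    def write(self, data):
        self.buffer += data
        self._try_send()

    def on_ack_all(self):
        self.in_flight = 0
        self._try_send()


def disable_nagle(sock):
    # TCP_NODELAY lives at the TCP level, so the level argument is IPPROTO_TCP
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


sender = NagleSender(mss=536)
for ch in b"hello":
    sender.write(bytes([ch]))   # app writes one byte at a time
print(sender.segments)          # [b'h']  (the other 4 bytes wait for the ACK)
sender.on_ack_all()
print(sender.segments)          # [b'h', b'ello']
```

Five 1-byte writes become two segments instead of five, which is exactly the header-overhead saving the slide describes.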
Delayed ACK vs. Nagle
• These two mechanisms have nothing to do with each other
– often confused, especially by various online sources
– TCP_NODELAY means "disable Nagle", not "disable delayed ACKs"
• Delayed ACK is implemented by the TCP receiver
– Delays sending an acknowledgement (because ACKs are cumulative, this can reduce the # of ACKs sent)
• Nagle's algorithm is implemented by the TCP sender
– Delays sending actual data (so that more data can be fit into a single segment)
TCP: Timeout & Fast Retransmit
TCP Round Trip Time and Timeout
Q: how to set the TCP timeout value?
• longer than RTT
– but RTT varies
• too short: premature timeout
– unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
• SampleRTT will vary; want the estimated RTT to be "smoother"
– average several recent measurements, not just the current SampleRTT
RTT Distributions
(Figure) (a) Probability density of ACK arrival times in the data link layer. (b) Probability density of ACK arrival times for TCP.
TCP Round Trip Time and Timeout
$$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$$
• exponential weighted moving average
• influence of a past sample decreases exponentially fast
• typical value: $\alpha = 0.125$
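Unrolling the recurrence shows where the "exponentially fast" decay comes from. Here $\text{EstimatedRTT}_n$ is notation introduced for the estimate after $n$ samples, and $\text{EstimatedRTT}_0$ is whatever initial value is used:

$$\text{EstimatedRTT}_n = \alpha\sum_{k=0}^{n-1}(1-\alpha)^k\,\text{SampleRTT}_{n-k} + (1-\alpha)^n\,\text{EstimatedRTT}_0$$

A sample that is $k$ updates old carries weight $\alpha(1-\alpha)^k$. With $\alpha = 0.125$, the newest sample has weight $0.125$, while one from 10 updates ago has weight $0.125 \cdot 0.875^{10} \approx 0.033$.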
Example RTT estimation: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr
(Figure: SampleRTT and EstimatedRTT, RTT in milliseconds (100 to 350) vs. time in seconds (1 to 106).)
TCP Round Trip Time and Timeout
Setting the timeout:
• EstimatedRTT plus a "safety margin"
– large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:
$$\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot\lvert\text{SampleRTT} - \text{EstimatedRTT}\rvert$$
$$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$$
(typically $\alpha = 0.125$, $\beta = 0.25$)
• then set the timeout interval:
$$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$$
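A minimal Python sketch of the estimator with the slide's constants. Two details are assumptions borrowed from RFC 6298 rather than stated on the slide: the first sample initializes EstimatedRTT = SampleRTT and DevRTT = SampleRTT/2, and DevRTT is updated using the previous EstimatedRTT before EstimatedRTT itself changes. The class name `RttEstimator` is hypothetical; times are in milliseconds.

```python
class RttEstimator:
    """EWMA RTT estimator with the slide's constants (alpha = 0.125, beta = 0.25)."""

    def __init__(self, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = None   # EstimatedRTT
        self.dev_rtt = None         # DevRTT

    def add_sample(self, sample_rtt):
        if self.estimated_rtt is None:
            # first measurement: initialization borrowed from RFC 6298 (an assumption here)
            self.estimated_rtt = sample_rtt
            self.dev_rtt = sample_rtt / 2
        else:
            # DevRTT uses the previous EstimatedRTT, then EstimatedRTT is updated
            self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                            + self.beta * abs(sample_rtt - self.estimated_rtt))
            self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                                  + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator()
print(est.add_sample(100.0))   # 300.0  (100 + 4 * 50)
print(est.add_sample(120.0))   # 272.5  (102.5 + 4 * 42.5)
```

Note how the timeout shrinks as the deviation estimate settles, even though the second sample was larger than the first.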
RTT Measurement & Retransmissions
• How should you handle ACKs for retransmitted segments when measuring RTT?
– Note: the ACK could be delayed for the original segment, or early for the retransmitted segment
• Choices:
– Associate with the original packet: may overestimate the true RTT
– Associate with the retransmitted packet: may underestimate the true RTT
Karn's Algorithm
• Idea: don't consider samples from retransmitted segments
– OK, but what if the current timeout value is too small for the network delay?
– In this case, we would keep timing out but couldn't adjust the timeout according to the formula
• Solution: if a measurement can't be made, use exponential backoff until a new measurement can be made
– By a factor of 2, with a limit
• Resume the normal sampling algorithm afterwards
– Usual sampling period is once per RTT
• See also RFC 2988
– The TCP timestamp option can be used to remove the ambiguity about which ACK matches which packet
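Karn's rule combined with exponential backoff can be sketched in Python as follows. Assumptions: a 1 s initial timeout and a 60 s cap stand in for the slide's unspecified "limit", the RFC 6298 minimum timeout is not modeled, and the class name `KarnRto` is hypothetical. The estimator inside repeats the EWMA formulas from the previous slide so the block is self-contained.

```python
class KarnRto:
    """Retransmission timeout with Karn's rule and exponential backoff (a sketch)."""

    def __init__(self, initial_rto=1.0, max_rto=60.0, alpha=0.125, beta=0.25):
        self.rto = initial_rto
        self.max_rto = max_rto          # the backoff "limit"; 60 s is an assumed value
        self.alpha, self.beta = alpha, beta
        self.srtt = None                # EstimatedRTT
        self.rttvar = None              # DevRTT

    def on_timeout(self):
        # no usable sample: back off by a factor of 2, up to the limit
        self.rto = min(self.rto * 2, self.max_rto)

    def on_ack(self, sample_rtt, segment_was_retransmitted):
        if segment_was_retransmitted:
            return                      # Karn: ambiguous sample, keep the backed-off timeout
        if self.srtt is None:
            self.srtt, self.rttvar = sample_rtt, sample_rtt / 2
        else:
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(sample_rtt - self.srtt)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * sample_rtt
        # a fresh, unambiguous sample: resume the normal formula
        self.rto = min(self.srtt + 4 * self.rttvar, self.max_rto)


k = KarnRto()
k.on_timeout()
k.on_timeout()
print(k.rto)           # 4.0  (1 s doubled twice)
k.on_ack(0.5, True)    # ACK for a retransmitted segment: sample ignored
print(k.rto)           # 4.0
k.on_ack(0.5, False)   # unambiguous sample
print(k.rto)           # 1.5  (0.5 + 4 * 0.25)
```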
Fast Retransmit
• Time-out period often relatively long
– long delay before resending a lost packet
• Detect lost segments via duplicate ACKs
– Sender often sends many segments back-to-back
– If a segment is lost, there will likely be many duplicate ACKs
• If the sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost
– fast retransmit: resend the segment before its timer expires
Fast retransmit algorithm:

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
    else {  /* a duplicate ACK for an already ACKed segment */
      increment count of dup ACKs received for y
      if (count of dup ACKs received for y == 3) {
        /* fast retransmit */
        resend segment with sequence number y
      }
    }
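The ACK-handling branch above can be sketched in Python. Assumptions: timers are left out, a duplicate ACK is taken to be one whose value equals SendBase, and ACKs below SendBase (stale or reordered) are ignored. The class name `FastRetransmitSender` is hypothetical.

```python
class FastRetransmitSender:
    """ACK handling from the fast retransmit pseudocode; timers are omitted."""

    DUP_ACK_THRESHOLD = 3

    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_acks = 0
        self.resent = []                 # sequence numbers resent by fast retransmit

    def on_ack(self, y):
        if y > self.send_base:           # new data acknowledged
            self.send_base = y
            self.dup_acks = 0
        elif y == self.send_base:        # duplicate ACK for already ACKed data
            self.dup_acks += 1
            if self.dup_acks == self.DUP_ACK_THRESHOLD:
                self.resent.append(y)    # resend the segment starting at byte y
        # y < send_base: stale, reordered ACK; ignored here (an assumption)


s = FastRetransmitSender(92)
s.on_ack(100)                # original ACK for bytes 92..99
for _ in range(3):
    s.on_ack(100)            # three duplicates: segment at 100 presumed lost
print(s.resent)              # [100]
```

Using `==` on the threshold (rather than `>=`) means later duplicates for the same hole do not trigger further retransmissions.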
TCP: Flow Control
TCP Flow Control
• the receive side of a TCP connection has a receive buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
• the app process may be slow at reading from the buffer
Flow control: the sender won't overflow the receiver's buffer by transmitting too much, too fast
TCP Flow Control (cont'd)
(Suppose the TCP receiver discards out-of-order segments)
• spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Receiver advertises the spare room by including the value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
– guarantees the receive buffer doesn't overflow
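The two rules on this slide reduce to two one-line functions. A minimal Python sketch; the function names and the byte numbers in the usage lines are illustrative assumptions.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises: RcvBuffer - (LastByteRcvd - LastByteRead)."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(n, last_byte_sent, last_byte_acked, rwnd):
    """True if sending n more bytes keeps unACKed data within the advertised window."""
    return (last_byte_sent - last_byte_acked) + n <= rwnd


rwnd = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(rwnd)                                    # 2096
print(sender_may_send(100, 5000, 3000, rwnd))  # False: 2000 + 100 > 2096
print(sender_may_send(96, 5000, 3000, rwnd))   # True
```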
Flow Control: Persistence Timer
• Suppose the sender is blocked because the receiver application hasn't picked up data
• Then the receiver app reads n bytes
• The TCP receiver advertises a new window of size RcvWindow = n
• But suppose this advertisement is lost
• The sender would be stuck
– Solution: persistence timer; the sender sends a probe after a period of inactivity to provoke a window update
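Flow Control: Silly Window Syndrome
(Figure.) If the receiving app reads data a byte at a time and the receiver advertises each tiny window as soon as it opens, the sender ends up sending tiny segments, wasting capacity just as in the Nagle case.
Clark's solution: don't advertise windows below a certain size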
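Clark's receiver-side rule can be sketched as a function that decides what window to advertise. This is a simplification under assumptions: the threshold min(MSS, RcvBuffer/2) is the one RFC 1122 suggests, and advertising 0 below it approximates what real stacks do (they typically hold the right edge of the window fixed instead). The function name `advertised_window` is hypothetical.

```python
def advertised_window(rcv_buffer, buffered, mss):
    """Receiver-side SWS avoidance in the spirit of Clark's rule (simplified sketch).

    Advertise the free space only once it reaches min(MSS, RcvBuffer/2);
    otherwise advertise 0 so the sender waits instead of sending tiny segments.
    """
    free = rcv_buffer - buffered
    threshold = min(mss, rcv_buffer // 2)
    return free if free >= threshold else 0


print(advertised_window(4096, 3996, 1460))   # 0     (only 100 bytes free)
print(advertised_window(4096, 2048, 1460))   # 2048
```

With a small buffer the half-buffer term dominates, so a receiver whose buffer is smaller than one MSS can still open its window.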
Study of TCP Outline
• segment structure
• reliable data transfer
– delayed ACKs
– Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP: Connection Management
TCP Connection Management
Recall: TCP sender and receiver establish a "connection" before exchanging data segments
• initialize TCP variables:
– seq. #s
– buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
connect(s, &dstaddr, …)
• server: contacted by client
cl = accept(sv, &caddr, …)
Three way handshake:
Step 1: client host sends TCP SYN segment to server
– specifies initial seq #
– no data
Step 2: server host receives SYN, replies with SYNACK segment
– server allocates buffers
– specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
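The three steps above can be written out as the segments they produce. One detail not on the slide but standard TCP: a SYN consumes one sequence number, so each side acknowledges the other's ISN + 1. The function name `three_way_handshake` and the tuple layout (flags, seq, ack) are hypothetical choices for this sketch.

```python
SEQ_SPACE = 2 ** 32   # sequence numbers are 32-bit and wrap around


def three_way_handshake(client_isn, server_isn):
    """Return the (flags, seq, ack) of the three handshake segments."""
    syn = ({"SYN"}, client_isn, None)                                     # step 1: no data
    synack = ({"SYN", "ACK"}, server_isn, (client_isn + 1) % SEQ_SPACE)   # step 2
    ack = ({"ACK"}, (client_isn + 1) % SEQ_SPACE,
           (server_isn + 1) % SEQ_SPACE)                                  # step 3
    return [syn, synack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
```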
TCP 3-way handshake
(Figure: TCP connection establishment.)
• Q1: why 3-way and not 2-way handshake?
• Q2: how do sender & receiver determine initial seq. #s?
3-way Handshake & Delayed Dups
(Figure) (a) Normal operation. (b) Old SYN appearing out of nowhere. (c) Duplicate SYN and duplicate ACK following SYN.
3-way handshake required to deal with scenarios (b) and (c)
Sequence Number Reuse
• Idea: tie initial TCP seq. numbers to a clock
– Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against the sequence number prediction attack
– Use a PRNG; see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock value + F(src, dst, sport, dport, secret), where F is a hash function and the secret is chosen randomly
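The RFC 1948 scheme can be sketched in Python. Assumptions: RFC 1948 suggests MD5 for F; this sketch swaps in HMAC-SHA256 from the standard library, keeps a per-process random secret in `SECRET`, and takes the clock as an explicit `now_us` parameter so the result is testable. The function name `initial_seq_number` is hypothetical.

```python
import hashlib
import hmac
import os

SECRET = os.urandom(16)   # per-boot secret key (assumption: generated once at startup)


def initial_seq_number(src, dst, sport, dport, now_us, secret=SECRET):
    """ISN = M + F(src, dst, sport, dport, secret) mod 2**32, with M a 4-microsecond clock."""
    m = now_us // 4                                   # clock ticks once every 4 microseconds
    msg = f"{src}|{dst}|{sport}|{dport}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    f = int.from_bytes(digest[:4], "big")             # unpredictable 32-bit per-4-tuple offset
    return (m + f) % 2 ** 32


a = initial_seq_number("10.0.0.1", "10.0.0.2", 40000, 80, now_us=0)
b = initial_seq_number("10.0.0.1", "10.0.0.2", 40000, 80, now_us=400)
print((b - a) % 2 ** 32)   # 100: same 4-tuple, 400 microseconds later
```

For a given 4-tuple the ISN still advances with the clock (guarding against old incarnations), while an attacker who does not know the secret cannot compute the offset F for someone else's connection.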
When Sequence Numbers Attack
• Suppose attacker A can predict the sequence number host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
– B believes it is talking to C, and might grant permissions based on C's IP address
– The attacker on A must suppress the RST packets C is likely to send; use a denial-of-service attack for that
• A sends a message to compromise B
When SYNs Attack
• Servers receiving a SYN must allocate resources
– Opens up the possibility of a denial-of-service attack where the server is flooded with bogus SYN packets with forged IP source addresses
• Solution: SYN cookies
– Server computes its initial sequence number (the cookie) from the connection information and sends the SYNACK, but does not allocate buffers
– If the client continues with an ACK, check whether that ACK could be the reply to a SYNACK the server sent; allocate buffers only if it is
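A stateless SYN-cookie server can be sketched in Python. Assumptions: the cookie is an HMAC-SHA256 of the connection 4-tuple and a coarse time slot (64 s here), and an ACK is accepted for the current or previous slot. Real SYN cookies typically also encode the client's ISN and an MSS value, which this sketch omits. The class name `SynCookieServer` is hypothetical.

```python
import hashlib
import hmac
import os


class SynCookieServer:
    """Stateless SYN handling: the server's ISN is a keyed hash; nothing is stored per SYN."""

    def __init__(self, secret=None, slot_seconds=64):
        self.secret = secret or os.urandom(16)
        self.slot_seconds = slot_seconds     # cookie time granularity (assumed value)

    def _cookie(self, conn, slot):
        msg = f"{conn}|{slot}".encode()
        digest = hmac.new(self.secret, msg, hashlib.sha256).digest()
        return int.from_bytes(digest[:4], "big")

    def on_syn(self, conn, now):
        """Return the server ISN to put in the SYNACK; no buffers are allocated."""
        return self._cookie(conn, int(now // self.slot_seconds))

    def on_ack(self, conn, ack_number, now):
        """True if ack_number - 1 is a cookie we could have sent (then allocate buffers)."""
        slot = int(now // self.slot_seconds)
        isn = (ack_number - 1) % 2 ** 32
        # accept the current and the previous time slot to tolerate slot rollover
        return isn in (self._cookie(conn, slot), self._cookie(conn, slot - 1))


srv = SynCookieServer(secret=b"demo-secret")
conn = ("10.0.0.1", 40000, "10.0.0.2", 80)
isn = srv.on_syn(conn, now=100.0)
print(srv.on_ack(conn, (isn + 1) % 2 ** 32, now=100.0))   # True
```

Because the check recomputes the cookie, a flood of forged SYNs costs the server only hash computations, not memory.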
Sequence Number Summary
• Goals, Set 1:
– Guard against old duplicates in one connection -> don't reuse sequence numbers until it's reasonably certain that old duplicates have disappeared
– Even after a crash/restart -> hence tie to time -> but don't use a global clock -> hence need a 3-way handshake where each side verifies the sequence number chosen by the other side before a successful connection occurs
• Goals, Set 2:
– Don't allow hijacking of connections unless the attacker can eavesdrop -> use a PRNG for the initial seq. number choice
– Don't allow SYN attacks -> compute, but don't store, the initial sequence number
TCP Connection Management (cont.)
Closing a connection:
client closes socket: close(s);
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
(Figure: the client sends FIN; the server replies ACK, closes, and sends its own FIN; the client replies ACK, waits in timed wait, then is closed.)
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK
– Enters "timed wait"; will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with a small modification, can handle simultaneous FINs.
(Figure: the client goes from closing through timed wait to closed; the server goes from closing to closed.)
TCP Connection FSM
(Figure.) The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events. Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.
TCP Connection Management (cont'd)
(Figures: TCP client lifecycle; TCP server lifecycle.)
Closing a Connection
• Note: previous charts showed the normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
– No: the famous two-army problem
Summary
• TCP segments, acknowledgements & retransmission
– Delayed ACKs, Nagle's algorithm
– Fast retransmit
• RTT estimation & Karn's algorithm
• Flow control & silly window syndrome
• Connection management in TCP
• Attacks against TCP's connection management scheme
– SYN attack
– Sequence number prediction attacks
TCP Miscellaneous
• MSS: Maximum Segment Size option
– Client/server agree on a larger-than-default (536 outside the same subnet) MSS via an option on the SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for the receive window, to allow for LFNs ("elephants"): Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• …
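Why an LFN needs WSCALE can be shown with a short calculation: the unscaled 16-bit window tops out at 65,535 bytes, while the window must cover the bandwidth-delay product. The sketch assumes the 14-bit maximum shift from the window scale option and uses the 10 Gbps / 100 ms figures from the throughput example later in these slides; the function name `window_scale_shift` is hypothetical.

```python
MAX_UNSCALED_WINDOW = 65535   # 16-bit window field
MAX_SHIFT = 14                # largest shift count the window scale option allows


def window_scale_shift(bandwidth_bps, rtt_s):
    """Smallest shift so the scaled window covers the bandwidth-delay product."""
    bdp_bytes = bandwidth_bps * rtt_s / 8
    shift = 0
    while (MAX_UNSCALED_WINDOW << shift) < bdp_bytes and shift < MAX_SHIFT:
        shift += 1
    return shift


print(window_scale_shift(10e9, 0.1))   # 11: a 125 MB BDP needs 65535 << 11 (about 134 MB)
```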
TCP: Congestion Control
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control!
• manifestations:
– long delays (queueing in router buffers)
– lost packets (buffer overflow at routers)
• a top-10 problem!
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
(Figure: Host A sends original data at rate λin through a router with unlimited shared output link buffers; Host B receives at rate λout.)
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
(Figure: Host A offers λin of original data, and λ'in of original plus retransmitted data, to a router with finite shared output link buffers; Host B receives λout.)
Causes/costs of congestion: scenario 2
• Always: λin = λout (goodput)
• a) if no loss: λ'in = λin
• b) assume a clairvoyant sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes λ'in larger for the same λout (every packet transmitted twice)
"Costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: link carries multiple copies of a packet
(Figures: λout vs. λ'in, both axes up to R/2; λout reaches R/2 in (a), R/3 in (b), and R/4 in (c).)
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λin and λ'in increase?
(Figure: Host A sends λin of original data and λ'in of original plus retransmitted data over multihop paths through routers with finite shared output link buffers; Host B receives λout.)
Causes/costs of congestion: scenario 3
(Figure: λout delivered between Host A and Host B as the offered load increases.)
Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted!
• ultimately leads to congestion collapse
Reasons for Congestion Control
• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case
Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
– single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
– explicit rate the sender should send at
TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission:
$$\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$$
• roughly, $\text{rate} = \text{CongWin}/\text{RTT}$ bytes/sec
• CongWin and RTT influence throughput
• CongWin is a dynamic function of perceived network congestion
How does the sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after a loss event
• (assumes congestion is the primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
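The sending limit and the rate approximation translate directly into Python. A minimal sketch; the function names and the example byte counts are assumptions, and the rate line reuses the slow-start example numbers (MSS = 500 bytes, RTT = 200 ms).

```python
def sendable_bytes(cong_win, rcv_window, last_byte_sent, last_byte_acked):
    """Bytes the sender may still put in flight under min(CongWin, RcvWindow)."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cong_win, rcv_window) - in_flight)


def approx_rate(cong_win_bytes, rtt_s):
    """Rough sending rate in bytes/sec: CongWin / RTT."""
    return cong_win_bytes / rtt_s


print(sendable_bytes(10_000, 8_000, 5_000, 1_000))   # 4000: RcvWindow is the tighter limit
print(approx_rate(500, 0.2) * 8)                     # roughly 20000 bits/s
```

Whichever of the two windows is smaller governs, so the same function covers both the flow-control and congestion-control cases.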
TCP AIMD
• multiplicative decrease: cut CongWin in half after a loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing
(Figure: congestion window of a long-lived TCP connection vs. time; axis marks at 8, 16, and 24 Kbytes.)
TCP Slow Start
• When the connection begins, CongWin = 1 MSS
– Example: MSS = 500 bytes & RTT = 200 msec
– initial rate = 500 bytes × 8 bits / 0.2 s = 20 kbps
• available bandwidth may be >> MSS/RTT
– desirable to quickly ramp up to a respectable rate
• When the connection begins, increase the rate exponentially fast until the first loss event
TCP Slow Start (more)
• When the connection begins, increase the rate exponentially until the first loss event:
– double CongWin every RTT
– done by incrementing CongWin for every ACK received
• Summary: the initial rate is slow but ramps up exponentially fast
(Figure: Host A sends one segment to Host B, then two segments one RTT later, then four segments.)
TCP Tahoe vs. Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At a loss event, Threshold is set to 1/2 of CongWin just before the loss event
• If timeout event: set CongWin to 1 MSS
• If triple-ACK event: set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)
Timeouts vs. 3-dup ACKs
• After 3 dup ACKs:
– CongWin is cut in half
– window then grows linearly
• But after a timeout event:
– CongWin is instead set to 1 MSS
– window then grows exponentially
– to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate the network is capable of delivering some segments
• a timeout before 3 dup ACKs are received is a stronger, "more alarming" indicator of congestion
Summary: TCP Congestion Control
(FSM figure, transcribed as states and transitions.)
Initial: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start.
Slow start:
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh: move to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; stay in slow start
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery
Congestion avoidance:
• new ACK: cwnd = cwnd + MSS·(MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery
Fast recovery:
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; move to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start
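The FSM can be sketched as a Python class with one method per event. Assumptions: windows are measured in units of MSS (so "+ MSS" becomes "+ 1"), the initial ssthresh is a constructor parameter rather than the FSM's 64 KB, segment transmission is reduced to a retransmission counter, and refinements such as RFC 5681's lower bound on ssthresh are omitted. The class name `RenoSender` is hypothetical.

```python
class RenoSender:
    """Congestion-window bookkeeping from the FSM; windows are in units of MSS."""

    def __init__(self, ssthresh=64.0):
        self.cwnd = 1.0                 # cwnd = 1 MSS
        self.ssthresh = ssthresh        # initial threshold in MSS (the FSM uses 64 KB)
        self.dup_acks = 0
        self.state = "slow_start"
        self.retransmissions = 0

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh           # deflate the window, leave fast recovery
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += 1                      # +1 MSS per ACK: doubles every RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += 1 / self.cwnd          # MSS*(MSS/cwnd): about +1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += 1                      # inflate the window by 1 MSS per dup ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3       # ssthresh + 3 MSS
            self.retransmissions += 1           # retransmit missing segment
            self.state = "fast_recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0
        self.dup_acks = 0
        self.retransmissions += 1               # retransmit missing segment
        self.state = "slow_start"


cc = RenoSender(ssthresh=8.0)
for _ in range(8):
    cc.on_new_ack()
print(cc.state, cc.cwnd)        # congestion_avoidance 9.0
for _ in range(3):
    cc.on_dup_ack()
print(cc.state, cc.cwnd)        # fast_recovery 7.5
```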
Summary: TCP Congestion Control
• When CongWin is below Threshold, the sender is in the slow-start phase; the window grows exponentially
• When CongWin is above Threshold, the sender is in the congestion-avoidance phase; the window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS
TCP Sender Congestion Control
(Table: event, state, TCP sender action, commentary.)
• Event: ACK receipt for previously unACKed data. State: Slow Start (SS).
Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA.
Commentary: results in a doubling of CongWin every RTT.
• Event: ACK receipt for previously unACKed data. State: Congestion Avoidance (CA).
Action: CongWin = CongWin + MSS·(MSS/CongWin).
Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT.
• Event: triple duplicate ACK. State: SS or CA.
Action: Threshold = CongWin/2; CongWin = Threshold; set state to CA.
Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS.
• Event: timeout. State: SS or CA.
Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start".
Commentary: enter slow start.
• Event: duplicate ACK. State: SS or CA.
Action: increment the duplicate ACK count for the segment being ACKed.
Commentary: CongWin and Threshold are not changed.
TCP Throughput: Idealized
• What's the average throughput of TCP as a function of window size and RTT?
– Long-lived connection; ignore slow start
• When the window is W, throughput is W/RTT
• Just after a loss, the window drops to W/2, and throughput to W/(2·RTT)
• Average steady-state throughput: 0.75·W/RTT (the window grows linearly from W/2 back to W, so it averages 3W/4)
(Figure: window oscillating between W/2 and W.)
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
$$\text{throughput} = \frac{1.22\cdot\text{MSS}}{\text{RTT}\cdot\sqrt{L}}$$
• For the example: $L = 2\cdot 10^{-10}$. Very low!
• Requires an almost perfect link
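The two numbers in the example follow from the formulas by direct arithmetic, sketched here in Python. The function names are hypothetical; the loss rate comes from solving the throughput formula above for L.

```python
def window_segments(rate_bps, rtt_s, mss_bytes):
    """In-flight segments needed: W = rate * RTT / (MSS in bits)."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    return (1.22 * mss_bytes * 8 / (rate_bps * rtt_s)) ** 2


print(round(window_segments(10e9, 0.1, 1500)))        # 83333
print(f"{required_loss_rate(10e9, 0.1, 1500):.1e}")   # 2.1e-10
```

In words: at 10 Gbps the connection may lose only about one segment in five billion.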
Equation-Based Control
• Note: TCP congestion control forms a control loop
– Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info
TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
(Figure: TCP connections 1 and 2 sharing a bottleneck router of capacity R.)
Why is TCP fair?
Consider two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
(Figure: Connection 2 throughput vs. Connection 1 throughput, both axes up to R, with the equal-bandwidth-share line; repeated cycles of congestion avoidance (additive increase) and loss (decrease window by a factor of 2) move the operating point toward the equal share.)
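The geometric argument can be checked with a tiny Python simulation. Assumptions: both flows share one RTT, increase by 1 unit per round, and see a loss in the same round whenever their sum exceeds the capacity (synchronized losses). The function name `aimd_two_flows` is hypothetical. Additive increase leaves the gap between the flows unchanged, while each halving cuts it in half, so it shrinks toward zero.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Two AIMD flows sharing one bottleneck; both see a loss when the link is over capacity."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2     # multiplicative decrease halves the gap
        else:
            x1, x2 = x1 + 1, x2 + 1     # additive increase keeps the gap constant
    return x1, x2


x1, x2 = aimd_two_flows(10.0, 70.0, capacity=100.0, rounds=2000)
print(round(abs(x1 - x2), 6))   # 0.0
```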
Fairness (more)
Fairness and UDP:
• Multimedia apps often do not use TCP
– do not want rate throttled by congestion control
• Instead use UDP:
– pump audio/video at a constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections:
• nothing prevents an app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
– new app asks for 1 TCP, gets rate R/10
– new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
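The arithmetic behind the example, as a short Python sketch. It assumes every connection on the link gets an equal R/K share, per the fairness goal; the function name `new_app_share` is hypothetical.

```python
def new_app_share(link_rate, existing_conns, new_conns):
    """Share of a link of rate R won by an app opening new_conns connections,
    assuming every connection on the link gets an equal R/K share."""
    return link_rate * new_conns / (existing_conns + new_conns)


print(new_app_share(1.0, existing_conns=9, new_conns=1))  # 0.1  (R/10)
print(new_app_share(1.0, existing_conns=9, new_conns=9))  # 0.5  (R/2)
```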
CS 4284 Spring 2013
TCP Seq rsquos and ACKsSeq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKsndash seq of next byte
expected from other side
ndash cumulative ACKQ how receiver handles
out-of-order segmentsndash A TCP spec doesnrsquot
say - up to implementor
Host A Host B
Seq=42 ACK=79 data = lsquoCrsquo
Seq=79 ACK=43 data = lsquoCrsquo
Seq=43 ACK=80
Usertypes
lsquoCrsquo
host ACKsreceipt
of echoedlsquoCrsquo
host ACKsreceipt of
lsquoCrsquo echoesback lsquoCrsquo
timesimple telnet scenario
CS 4284 Spring 2013
TCP Sender Eventsdata rcvd from appbull create segment with
seq bull seq is byte-stream
number of first data byte in segment
bull start timer if not already running (think of timer as for oldest unacked segment)
bull expiration interval TimeOutInterval
timeoutbull retransmit segment that
caused timeoutbull restart timer ack rcvdbull If acknowledges
previously unacked segmentsndash update what is known to
be ackedndash start timer if there are
outstanding segments
CS 4284 Spring 2013
TCP sender
(simplified)
NextSeqNum = InitialSeqNum SendBase = InitialSeqNum
loop (forever) switch(event)
event data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)
event timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer
end of loop forever
Commentbull SendBase-1 last cumulatively ackrsquoed byteExamplebull SendBase-1 = 71 so
SendBase = 72 say Ack received with y= 73 so the rcvr acknowledges up to including 72 now wants 73+ y is gt SendBase soknow that new data is acked set SendBase to 73
CS 4284 Spring 2013
TCP retransmission scenariosHost A
Seq=92 8 bytes data
ACK=100
losstimeo
ut
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
SendBase= 100
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq=
92 ti
meo
ut
ACK=120
Seq=
92 ti
meo
ut
SendBase= 120
SendBase= 120
Sendbase= 100
CS 4284 Spring 2013
TCP retransmission scenarios (more)
Host A
Seq=92 8 bytes data
ACK=100
losstimeo
ut
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
CS 4284 Spring 2013
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
CS 4284 Spring 2013
Naglersquos algorithmbull Consider again apps writing byte-by-byte (or in
small chunks)ndash If you send 1-byte segments with 20 byte TCP
header overhead + overhead of lower layer headers ndashyou get a huge waste of network capacity
bull Naglendash Transmit first bytendash Buffer outgoing bytes until ack has been received ndash
then send all at oncebull You can turn this off via
ndash Use setsockopt (SOL_SOCKET TCP_NODELAY)
CS 4284 Spring 2013
Delayed ACK vs Naglebull These two mechanisms have nothing to do with
each otherndash often confused especially by various online sourcesndash TCP_NODELAY means ldquodisable Naglerdquo not ldquodisable
delayed ACKsrdquo
bull Delayed ACK implemented by TCP Receiverndash Delays sending an acknowledgement (because acks
are cumulative can reduce of acks sent)bull Naglersquos algorithm implemented by TCP Sender
ndash Delays sending actual data (so that more data can be fit into a single segment)
TCP
Timeout amp Fast Retransmit
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull longer than RTTndash but RTT varies
bull too short premature timeoutndash unnecessary
retransmissions
bull too long slow reaction to segment loss
Q how to estimate RTT
bull SampleRTT measured time from segment transmission until ACK receipt
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several
recent measurements not just current SampleRTT
CS 4284 Spring 2013
RTT Distributions
(a) Probability density of ACK arrival times in the data link layer (b) Probability density of ACK arrival times for TCP
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT
bull exponential weighted moving averagebull influence of past sample decreases
exponentially fastbull typical value = 0125
CS 4284 Spring 2013
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
CS 4284 Spring 2013
TCP Round Trip Time and TimeoutSetting the timeoutbull EstimatedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT larger safety marginbull first estimate of how much SampleRTT deviates
from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 0125 = 025)
EstimatedRTT = (1-)EstimatedRTT + SampleRTT
CS 4284 Spring 2013
RTT Measurement amp Retransmissions
bull How should you handle acks for retransmitted segments when measuring RTTndash Note ACK could be delayed for original
segment or early for retransmitted segmentbull Choices
ndash Associate with original packet may overestimate true RTT
ndash Associate with retransmitted packet may underestimate true RTT
CS 4284 Spring 2013
Karnrsquos Algorithmbull Idea donrsquot consider samples from retransmitted
segmentsndash Ok but what if current timeout value is too small for network
delayndash In this case would keep timing out but couldnrsquot adjust timeout
according to formulabull Solution If measurement canrsquot be made use exponential
backoff until new measurement can be madendash By factor of 2 with limit
bull Resume normal sampling algorithm afterwardsndash Usual sampling period is once per RTT
bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet
CS 4284 Spring 2013
Fast Retransmitbull Time-out period often relatively long
ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs
ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs
bull If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lostndash fast retransmit resend segment before timer expires
CS 4284 Spring 2013
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
TCP
Flow Control
CS 4284 Spring 2013
TCP Flow Controlbull receive side of
TCP connection has a receive buffer
bull speed-matching service matching the send rate to the receiving apprsquos drain rate
bull app process may be slow at reading from buffer
sender wonrsquot overflow
receiverrsquos buffer bytransmitting too
much too fast
flow control
CS 4284 Spring 2013
TCP Flow Control (contrsquod)
(Suppose TCP receiver discards out-of-order segments)
bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Rcvr advertises spare room by including value of RcvWindow in segments
bull Sender limits unACKed data to RcvWindowndash guarantees
receive buffer doesnrsquot overflow
CS 4284 Spring 2013
Flow Control Persistence Timerbull Suppose sender is blocked cause receiver
application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size
RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck
ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update
CS 4284 Spring 2013
Flow Control Silly Window Syndrome
Clarkrsquos solution donrsquot advertise windows below a certain size
CS 4284 Spring 2013
Study of TCP Outlinebull segment structurebull reliable data transfer
ndash delayed ACKsndash Naglersquos algorithm
bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control
TCP
Connection Management
CS 4284 Spring 2013
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control info
(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr
hellip)
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
CS 4284 Spring 2013
TCP 3-way handshake
TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums
CS 4284 Spring 2013
3-way Handshake amp Delayed Dups
(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)
CS 4284 Spring 2013
Sequence Number Reuse
bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a
connection with identical sequence numbersbull Must also guard against sequence number prediction attack
ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())
CS 4284 Spring 2013
When Sequence Numbers Attack
bull Suppose attacker A can predict sequence number a host B is going to use next
bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions
based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is
likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B
When SYNs Attack
• Servers receiving a SYN must allocate resources
  – Opens up the possibility of a denial-of-service attack where the server is flooded with bogus SYN packets with forged IP source addresses
• Solution: SYN cookies
  – Server encodes the connection in its initial sequence number (the "cookie") and sends the SYNACK, but does not allocate buffers
  – If the client continues with the final ACK, check whether its ACK number could have been produced this way; if correct, allocate buffers then
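A minimal sketch of the SYN cookie idea. The cookie layout is hypothetical and simplified (an 8-bit coarse time counter in the top bits, a 24-bit keyed MAC of the 4-tuple and counter in the low bits); real implementations use a different layout and also encode the MSS. What it shows: the server stores nothing after sending the SYNACK, yet can later recognize the client's ACK (whose ACK number is cookie + 1) as a reply to a SYNACK it really sent.

```python
import hashlib
import hmac
import os

SECRET = os.urandom(16)  # hypothetical per-server secret key
PERIOD_S = 64            # hypothetical: time counter ticks every 64 seconds


def _mac24(four_tuple: tuple, counter: int) -> int:
    """Keyed 24-bit MAC over the connection 4-tuple and the time counter."""
    msg = repr((four_tuple, counter)).encode()
    return int.from_bytes(hmac.new(SECRET, msg, hashlib.sha256).digest()[:3], "big")


def make_cookie(four_tuple: tuple, now_s: int) -> int:
    """Server ISN placed in the SYNACK; nothing is stored for this SYN."""
    counter = (now_s // PERIOD_S) % 256
    return (counter << 24) | _mac24(four_tuple, counter)


def ack_is_valid(four_tuple: tuple, ack_num: int, now_s: int) -> bool:
    """Check the client's final ACK: its ACK number must equal cookie + 1."""
    cookie = (ack_num - 1) % 2 ** 32
    counter = cookie >> 24
    current = (now_s // PERIOD_S) % 256
    if counter not in (current, (current - 1) % 256):
        return False  # too old: only the current and previous period are accepted
    return (cookie & 0xFFFFFF) == _mac24(four_tuple, counter)


client = ("10.0.0.1", 40000, "10.0.0.2", 80)
cookie = make_cookie(client, now_s=1000)
print(ack_is_valid(client, (cookie + 1) % 2 ** 32, now_s=1010))  # True
```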
Sequence Number Summary
• Goals, Set 1:
  – Guard against old duplicates in one connection -> don't reuse sequence numbers until it's reasonably certain that old duplicates have disappeared
  – Even after a crash/restart -> hence tie to time -> but don't use a global clock -> hence need a 3-way handshake where each side verifies the sequence number chosen by the other side before the connection is established
• Goals, Set 2:
  – Don't allow hijacking of connections unless the attacker can eavesdrop -> use a PRNG for the initial seq number choice
  – Don't allow SYN attacks -> compute, but don't store, the initial sequence number
TCP Connection Management (cont.)
Closing a connection:
client closes socket: close(s);
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
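[Figure: client/server timeline. The client calls close and sends FIN; the server ACKs, then calls close and sends its own FIN; the client ACKs and enters timed wait before it is closed.]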
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
  – Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with a small modification, can handle simultaneous FINs.
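[Figure: the same timeline, completed. Both sides pass through closing; the server is closed when it receives the final ACK, and the client is closed after its timed wait.]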
TCP Connection FSM
[Figure: TCP connection management finite state machine]
The heavy solid line is the normal path for a client.
The heavy dashed line is the normal path for a server.
The light lines are unusual events.
Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.
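The normal client and server paths through the FSM can be written down as an event/action transition table. The sketch below covers only a subset of the RFC 793 states and events (no RST handling, no simultaneous open), and uses "2MSL timeout" as the name for the end of the timed wait.

```python
# (state, event) -> (next_state, action); a subset of the RFC 793 TCP FSM.
TRANSITIONS = {
    ("CLOSED", "connect"): ("SYN_SENT", "send SYN"),
    ("CLOSED", "listen"): ("LISTEN", None),
    ("LISTEN", "rcv SYN"): ("SYN_RCVD", "send SYNACK"),
    ("SYN_SENT", "rcv SYNACK"): ("ESTABLISHED", "send ACK"),
    ("SYN_RCVD", "rcv ACK"): ("ESTABLISHED", None),
    # active close (normally the client)
    ("ESTABLISHED", "close"): ("FIN_WAIT_1", "send FIN"),
    ("FIN_WAIT_1", "rcv ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_1", "rcv FIN"): ("CLOSING", "send ACK"),  # simultaneous FINs
    ("FIN_WAIT_2", "rcv FIN"): ("TIME_WAIT", "send ACK"),
    ("CLOSING", "rcv ACK"): ("TIME_WAIT", None),
    ("TIME_WAIT", "2MSL timeout"): ("CLOSED", None),
    # passive close (normally the server)
    ("ESTABLISHED", "rcv FIN"): ("CLOSE_WAIT", "send ACK"),
    ("CLOSE_WAIT", "close"): ("LAST_ACK", "send FIN"),
    ("LAST_ACK", "rcv ACK"): ("CLOSED", None),
}


def run(events, state="CLOSED"):
    """Apply events in order; return visited states and actions taken."""
    states, actions = [state], []
    for event in events:
        state, action = TRANSITIONS[(state, event)]  # KeyError: unexpected event
        states.append(state)
        if action:
            actions.append(action)
    return states, actions


print(run(["connect", "rcv SYNACK", "close", "rcv ACK", "rcv FIN", "2MSL timeout"])[0])
```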
TCP Connection Management (cont'd)
[Figure: TCP client lifecycle | TCP server lifecycle]
Closing a Connection
• Note: previous charts showed the normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  – No: famous two-army problem
Summary
• TCP segments, acknowledgements & retransmission
  – Delayed ACKs, Nagle's algorithm
  – Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
  – SYN attack
  – Sequence number prediction attacks
TCP Miscellaneous
• MSS: Maximum Segment Size option
  – Client/server agree on a larger than default (536 outside same subnet) MSS via an option on the SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for the receive window, to allow for LFNs ("elephants"): Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• …
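To see why WSCALE is needed, compare the 16-bit receive window (at most 65,535 bytes) with the bandwidth-delay product of a fast, long path. The helper below is a sketch with hypothetical names: it computes the smallest shift count that lets the scaled window cover the pipe. RFC 1323 caps the shift at 14.

```python
import math

MAX_UNSCALED_WINDOW = 65_535  # largest value of the 16-bit receive window field
MAX_SHIFT = 14                # largest window-scale shift RFC 1323 allows


def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to keep the pipe full."""
    return bandwidth_bps * rtt_s / 8


def required_wscale(bandwidth_bps: float, rtt_s: float) -> int:
    """Smallest shift such that MAX_UNSCALED_WINDOW << shift covers the BDP."""
    bdp = bdp_bytes(bandwidth_bps, rtt_s)
    if bdp <= MAX_UNSCALED_WINDOW:
        return 0
    shift = math.ceil(math.log2(bdp / MAX_UNSCALED_WINDOW))
    if shift > MAX_SHIFT:
        raise ValueError("BDP exceeds the largest scalable window")
    return shift


print(required_wscale(10e9, 0.1))  # 10 Gbps, 100 ms RTT -> 11
```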
Study of TCP Outline
• segment structure
• reliable data transfer
  – delayed ACKs
  – Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP Congestion Control
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control
• manifestations:
  – long delays (queueing in router buffers)
  – lost packets (buffer overflow at routers)
• a top-10 problem
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A sends λin (original data) through a router with unlimited shared output link buffers; Host B receives λout]
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Host A offers λin (original data) and λ'in (original data plus retransmitted data) to a router with finite shared output link buffers; Host B receives λout]
Causes/costs of congestion: scenario 2
• Always: λin = λout (goodput)
• a) if no loss: λ'in = λin
• b) assume a clairvoyant sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes λ'in larger for the same λout (every packet transmitted twice)
"Costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: link carries multiple copies of a packet
[Figure: three panels (a), (b), (c) plotting λout against λ'in, axes up to R/2; marked throughput levels include R/2, R/3 and R/4]
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λin and λ'in increase?
[Figure: Host A offers λin (original data) and λ'in (original data plus retransmitted data) over multihop paths through routers with finite shared output link buffers; Host B receives λout]
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse
[Figure: λout for traffic from Host A to Host B]
Reasons for Congestion Control
• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case
Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate the sender should send at
TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission:
    LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• CongWin and RTT influence throughput; roughly, rate = CongWin/RTT Bytes/sec
• CongWin is a dynamic function of perceived network congestion
How does the sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after a loss event
• (assumes congestion is the primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
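The two relations on this slide can be checked with a few lines of Python. The function names are hypothetical; the window arithmetic is the inequality above, and the rate is the rough "one window per RTT" estimate.

```python
def usable_window(last_byte_sent: int, last_byte_acked: int,
                  cong_win: int, rcv_window: int) -> int:
    """Bytes the sender may still transmit now without exceeding
    min(CongWin, RcvWindow) of unACKed data (never negative)."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cong_win, rcv_window) - in_flight)


def approx_rate(cong_win: int, rtt_s: float) -> float:
    """Rough sending rate in bytes/sec: one congestion window per RTT."""
    return cong_win / rtt_s


# CongWin = 8000 B, RcvWindow = 16000 B, 3000 B already in flight
print(usable_window(13_000, 10_000, 8_000, 16_000))  # 5000
print(approx_rate(8_000, 0.1))
```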
TCP AIMD
multiplicative decrease: cut CongWin in half after a loss event
additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)
[Figure: congestion window (8, 16, 24 Kbytes) vs. time for a long-lived TCP connection]
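The sawtooth in the figure can be reproduced with a small simulation. This is a sketch at RTT granularity; the loss rounds passed in are hypothetical inputs rather than something the model predicts, and CongWin is kept at or above 1 MSS.

```python
def aimd(rounds: int, mss: int, loss_rounds: set, start: int) -> list:
    """CongWin (bytes) at the start of each RTT under AIMD."""
    cwnd, trace = start, []
    for r in range(rounds):
        trace.append(cwnd)
        if r in loss_rounds:
            cwnd = max(mss, cwnd // 2)  # multiplicative decrease
        else:
            cwnd += mss                 # additive increase: +1 MSS per RTT
    return trace


print(aimd(rounds=8, mss=1000, loss_rounds={3, 6}, start=8000))
# [8000, 9000, 10000, 11000, 5500, 6500, 7500, 3750]
```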
TCP Slow Start
• When connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = MSS/RTT = (500 × 8 bits)/(0.2 s) = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to a respectable rate
• When connection begins, increase rate exponentially fast until first loss event
TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin by 1 MSS for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart]
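Why "+1 MSS per ACK" doubles the window every RTT can be checked with a short simulation. It assumes no loss and one ACK per segment (no delayed ACKs); with delayed ACKs the growth per RTT would be slower.

```python
def slow_start_rtts(rtts: int, mss: int = 1) -> list:
    """CongWin at the start of each RTT during slow start (MSS units if mss=1)."""
    cwnd, trace = mss, []
    for _ in range(rtts):
        trace.append(cwnd)
        segments_sent = cwnd // mss
        for _ack in range(segments_sent):  # one ACK returns per segment
            cwnd += mss                    # +1 MSS per ACK
    return trace


print(slow_start_rtts(5))  # [1, 2, 4, 8, 16]
```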
TCP Tahoe vs. Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At a loss event, Threshold is set to 1/2 of CongWin just before the loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ACK event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno; TCP Tahoe sets CongWin to 1 MSS after either kind of loss event)
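The difference between the two variants is only in how they react to a triple duplicate ACK. The sketch below models just the simplified rule on this slide (no window inflation during fast recovery); the function name and the use of MSS units are assumptions for illustration.

```python
def on_loss(variant: str, event: str, cong_win: float) -> tuple:
    """Return (Threshold, new CongWin), in MSS units, after a loss event."""
    threshold = cong_win / 2
    if event == "timeout" or variant == "tahoe":
        return threshold, 1.0        # restart slow start from 1 MSS
    if event == "triple_dup_ack" and variant == "reno":
        return threshold, threshold  # fast recovery: continue linearly
    raise ValueError(f"unknown variant/event: {variant}/{event}")


for variant in ("tahoe", "reno"):
    print(variant, on_loss(variant, "triple_dup_ack", 12.0))
# tahoe (6.0, 1.0)
# reno (6.0, 6.0)
```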
Timeouts vs. 3-dup ACKs
• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after a timeout event:
  – CongWin is instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate the network is capable of delivering some segments
• a timeout before 3 dup ACKs are received is a stronger, "more alarming" indicator of congestion
Summary: TCP Congestion Control
[Figure: TCP congestion control FSM with states slow start, congestion avoidance and fast recovery; its transitions are listed below]
Initial transition: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
slow start:
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh: enter congestion avoidance (no other action)
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; stay in slow start
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3·MSS; retransmit missing segment; enter fast recovery
congestion avoidance:
• new ACK: cwnd = cwnd + MSS·(MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; enter slow start
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3·MSS; retransmit missing segment; enter fast recovery
fast recovery:
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; enter congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; enter slow start
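The FSM above translates fairly directly into event handlers. This is a sketch of the window bookkeeping only: segment transmission is not modeled, sizes are in bytes, and ssthresh is simply cwnd/2 as on the slides (RFC 5681 bases it on the amount of outstanding data instead).

```python
from typing import Optional


class RenoSender:
    """cwnd/ssthresh/dupACKcount bookkeeping for the TCP Reno FSM (sketch)."""

    def __init__(self, mss: int = 1000, ssthresh: int = 64 * 1024):
        self.mss = mss
        self.cwnd = mss             # cwnd = 1 MSS
        self.ssthresh = ssthresh    # ssthresh = 64 KB
        self.dup_acks = 0
        self.state = "slow_start"

    def on_new_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh          # deflate window, leave fast recovery
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += self.mss              # +1 MSS per ACK: doubles every RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            # +MSS*(MSS/cwnd) per ACK: about +1 MSS per RTT
            self.cwnd += max(1, self.mss * self.mss // self.cwnd)
        self.dup_acks = 0

    def on_dup_ack(self) -> Optional[str]:
        if self.state == "fast_recovery":
            self.cwnd += self.mss              # inflate window for each further dup ACK
            return None
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd // 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = "fast_recovery"
            return "retransmit missing segment"
        return None

    def on_timeout(self) -> str:
        self.ssthresh = self.cwnd // 2
        self.cwnd = self.mss
        self.dup_acks = 0
        self.state = "slow_start"
        return "retransmit missing segment"


sender = RenoSender(mss=1000, ssthresh=4000)
for _ in range(4):
    sender.on_new_ack()
print(sender.state, sender.cwnd)  # congestion_avoidance 5000
```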
Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS
TCP Sender Congestion Control
| Event | State | TCP Sender Action | Commentary |
|---|---|---|---|
| ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Resulting in a doubling of CongWin every RTT |
| ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT |
| Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS. |
| Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start |
| Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed |
TCP Throughput (Idealized)
• What's the average throughput of TCP as a function of window size and RTT?
  – Long-lived connection; ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75·W/RTT
[Figure: window oscillating between W/2 and W]
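The 0.75 factor follows from the sawtooth: between losses the window grows linearly from W/2 back to W, so the time-averaged window is the midpoint of the two. This assumes the idealized model above (long-lived flow, fixed RTT, a loss exactly when the window reaches W).

$$\overline{\text{throughput}} = \frac{1}{\text{RTT}}\cdot\frac{1}{2}\left(\frac{W}{2} + W\right) = \frac{3}{4}\cdot\frac{W}{\text{RTT}} = 0.75\,\frac{W}{\text{RTT}}$$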
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
$$\text{throughput} = \frac{1.22 \cdot \text{MSS}}{\text{RTT}\,\sqrt{L}}$$
• Solving for the example: $L = 2 \cdot 10^{-10}$. Very low!
• Requires an almost perfect link
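The numbers in the example can be reproduced by computing W directly and inverting the loss formula for L. A short sketch with hypothetical function names, using bits throughout (MSS in bytes × 8):

```python
def window_segments(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """W such that W * MSS / RTT equals the target throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    sqrt_l = 1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)
    return sqrt_l ** 2


w = window_segments(10e9, 0.1, 1500)
loss = required_loss_rate(10e9, 0.1, 1500)
print(f"W = {w:,.0f} segments, L = {loss:.2e}")  # W = 83,333 segments, L = 2.14e-10
```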
Equation-Based Control
• Note: TCP congestion control forms a control loop
  – Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info
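As an illustration, here is the TCP throughput equation used by TFRC in RFC 5348, written as a Python function. It applies the simplifications t_RTO = 4·RTT and b = 1 (packets acknowledged per ACK); check the RFC itself for the exact normative guidance on these parameters. For small loss event rates p the first term dominates, giving sqrt(3/2)·s/(RTT·√p), the same 1.22 constant as in the throughput-loss formula.

```python
import math


def tfrc_rate(s: float, rtt: float, p: float, b: int = 1) -> float:
    """TCP throughput equation (RFC 5348) in bytes/sec.

    s: segment size (bytes), rtt: round-trip time (s),
    p: loss event rate, b: packets acknowledged per ACK.
    """
    if p <= 0:
        raise ValueError("loss event rate must be positive")
    t_rto = 4 * rtt  # simplification for the retransmission timeout
    denom = (rtt * math.sqrt(2 * b * p / 3)
             + t_rto * (3 * math.sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2))
    return s / denom


for p in (1e-4, 1e-2, 1e-1):
    print(p, round(tfrc_rate(1500, 0.1, p)))
```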
TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]
Why is TCP fair?
Consider two competing sessions:
• additive increase gives slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput vs. Connection 1 throughput, both axes up to R, with the equal bandwidth share line. The operating point alternates between congestion avoidance (additive increase) and loss (decrease window by factor of 2).]
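The geometric argument can be checked numerically. The sketch below assumes, as the figure does, that both sessions see every loss at the same time (synchronized losses) whenever their combined rate exceeds R. Additive increase leaves the gap between the two rates unchanged, while each halving cuts it in half, so the rates converge toward the equal share.

```python
def two_flow_aimd(x1: float, x2: float, capacity: float, rounds: int) -> list:
    """Rates of two AIMD sessions sharing one bottleneck, one entry per RTT."""
    history = [(x1, x2)]
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2  # both see a loss: halve
        else:
            x1, x2 = x1 + 1, x2 + 1  # additive increase, slope 1
        history.append((x1, x2))
    return history


hist = two_flow_aimd(x1=80.0, x2=10.0, capacity=100.0, rounds=200)
print(hist[0], hist[-1])
```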
Fairness (more)
Fairness and UDP:
• Multimedia apps often do not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections:
• nothing prevents an app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
TCP Sender Events
data rcvd from app:
• create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
ack rcvd:
• if it acknowledges previously unacked segments:
  – update what is known to be acked
  – start timer if there are outstanding segments
TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event)

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ACKed byte
Example:
• SendBase-1 = 71, so SendBase = 72; say an ACK is received with y = 73, so the receiver acknowledges up to and including 72 and now wants 73+
• y > SendBase, so new data is acknowledged; set SendBase to 73
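The pseudocode can be turned into runnable event handlers. This is a sketch under simplifying assumptions: no flow or congestion control, no sequence-number wraparound, the single retransmission timer is a boolean flag, and segments "passed to IP" are just logged. The usage below follows the Seq=92 (8 bytes) / Seq=100 (20 bytes) numbers from the retransmission scenarios.

```python
class SimplifiedTcpSender:
    """Event handlers for the simplified TCP sender (sketch)."""

    def __init__(self, initial_seq_num: int):
        self.next_seq_num = initial_seq_num  # NextSeqNum
        self.send_base = initial_seq_num     # SendBase: oldest unacked byte
        self.unacked = {}                    # seq # -> data not yet ACKed
        self.timer_running = False
        self.to_ip = []                      # log of (seq #, data) passed to IP

    def on_app_data(self, data: bytes) -> None:
        self.unacked[self.next_seq_num] = data
        if not self.timer_running:
            self.timer_running = True        # start timer
        self.to_ip.append((self.next_seq_num, data))
        self.next_seq_num += len(data)

    def on_timeout(self) -> None:
        if not self.unacked:
            return
        seq = min(self.unacked)              # smallest unacked sequence number
        self.to_ip.append((seq, self.unacked[seq]))
        self.timer_running = True            # (re)start timer

    def on_ack(self, y: int) -> None:
        if y > self.send_base:               # cumulative: every byte before y arrived
            self.send_base = y
            self.unacked = {s: d for s, d in self.unacked.items() if s + len(d) > y}
            self.timer_running = bool(self.unacked)  # keep timing only if data outstanding


sender = SimplifiedTcpSender(92)
sender.on_app_data(b"x" * 8)    # Seq=92, 8 bytes
sender.on_app_data(b"y" * 20)   # Seq=100, 20 bytes
sender.on_ack(120)              # cumulative ACK covers both segments
print(sender.send_base, sender.timer_running)  # 120 False
```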
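TCP retransmission scenarios
[Figure: lost ACK scenario. Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost (X). When the Seq=92 timeout expires, Host A retransmits Seq=92, 8 bytes data; Host B answers ACK=100 again, and SendBase = 100.]
[Figure: premature timeout. Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; Host B answers ACK=100 and ACK=120. The Seq=92 timeout expires before these ACKs arrive, so Host A retransmits Seq=92, 8 bytes data; Host B answers ACK=120. SendBase moves to 100 and then to 120.]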
TCP retransmission scenarios (more)
[Figure: cumulative ACK scenario. Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; ACK=100 is lost (X), but ACK=120 arrives before the timeout, so nothing is retransmitted and SendBase = 120.]
TCP ACK generation [RFC 1122, RFC 2581]
| Event at Receiver | TCP Receiver Action |
|---|---|
| Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed | Delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK |
| Arrival of in-order segment with expected seq #. One other segment has ACK pending | Immediately send single cumulative ACK, ACKing both in-order segments |
| Arrival of out-of-order segment with higher-than-expected seq #. Gap detected | Immediately send duplicate ACK, indicating seq # of next expected byte |
| Arrival of segment that partially or completely fills gap | Immediately send ACK, provided that segment starts at lower end of gap |
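The four rules in the table can be sketched as a single receiver method. Simplifications (assumptions, not part of the table): the 500 ms delayed-ACK timer itself is not modeled, and a segment entirely below the expected sequence number is treated as an old duplicate and simply re-ACKed.

```python
from dataclasses import dataclass, field


@dataclass
class AckingReceiver:
    """ACK-generation rules from the table (sketch; byte sequence numbers)."""
    expected: int                                     # next in-order byte expected
    ack_pending: bool = False                         # in-order data awaiting a delayed ACK
    out_of_order: dict = field(default_factory=dict)  # buffered seq # -> length

    def on_segment(self, seq: int, length: int) -> str:
        if seq > self.expected:                       # out of order: gap detected
            self.out_of_order[seq] = length
            return f"send duplicate ACK {self.expected}"
        if seq < self.expected:                       # old duplicate (assumption: re-ACK)
            return f"send ACK {self.expected}"
        fills_gap = bool(self.out_of_order)           # starts at lower end of a gap
        self.expected += length
        while self.expected in self.out_of_order:     # absorb now-contiguous segments
            self.expected += self.out_of_order.pop(self.expected)
        if fills_gap:
            self.ack_pending = False
            return f"send ACK {self.expected}"
        if self.ack_pending:                          # second in-order segment
            self.ack_pending = False
            return f"send cumulative ACK {self.expected}"
        self.ack_pending = True                       # delayed ACK: wait up to 500 ms
        return "delay ACK"


r = AckingReceiver(expected=100)
for seq in (100, 110, 130, 120):
    print(seq, r.on_segment(seq, 10))
```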
Nagle's algorithm
• Consider again apps writing byte-by-byte (or in small chunks)
  – If you send 1-byte segments with 20-byte TCP header overhead + overhead of lower-layer headers, you get a huge waste of network capacity
• Nagle:
  – Transmit first byte
  – Buffer outgoing bytes until an ACK has been received (or a full-sized segment has accumulated), then send all at once
• You can turn this off via:
  – Use setsockopt(IPPROTO_TCP, TCP_NODELAY)
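A sketch of the sender-side logic, tracking only segment sizes and assuming each ACK acknowledges all outstanding data: full-sized segments may go out at once, while a small segment goes out only when nothing is in flight. The helper at the end shows the real socket option for turning Nagle off (Python's `socket.setsockopt` with level `IPPROTO_TCP` and option `TCP_NODELAY`); the class and helper names are hypothetical.

```python
import socket


class NagleSender:
    """Sender-side Nagle logic (sketch; sizes only)."""

    def __init__(self, mss: int):
        self.mss = mss
        self.buffered = 0      # bytes written by the app, not yet sent
        self.unacked = False   # is sent data still waiting for an ACK?

    def _flush(self) -> list:
        sent = []
        while self.buffered >= self.mss:               # full segments may always go
            sent.append(self.mss)
            self.buffered -= self.mss
        if self.buffered and not self.unacked and not sent:
            sent.append(self.buffered)                 # small segment: only if nothing in flight
            self.buffered = 0
        if sent:
            self.unacked = True
        return sent

    def on_write(self, nbytes: int) -> list:
        self.buffered += nbytes
        return self._flush()

    def on_ack(self) -> list:
        self.unacked = False
        return self._flush()


def make_nodelay_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s


n = NagleSender(mss=100)
print([n.on_write(1) for _ in range(3)], n.on_ack())  # [[1], [], []] [2]
```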
Delayed ACK vs. Nagle
• These two mechanisms have nothing to do with each other
  – often confused, especially by various online sources
  – TCP_NODELAY means "disable Nagle", not "disable delayed ACKs"
• Delayed ACK: implemented by the TCP receiver
  – delays sending an acknowledgement (because ACKs are cumulative, this can reduce the # of ACKs sent)
• Nagle's algorithm: implemented by the TCP sender
  – delays sending actual data (so that more data can be fit into a single segment)
TCP Timeout & Fast Retransmit
TCP Round Trip Time and Timeout
Q: how to set the TCP timeout value?
• longer than RTT
  – but RTT varies
• too short: premature timeout
  – unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
• SampleRTT will vary; want estimated RTT "smoother"
  – average several recent measurements, not just the current SampleRTT
RTT Distributions
[Figure: (a) Probability density of ACK arrival times in the data link layer. (b) Probability density of ACK arrival times for TCP.]
TCP Round Trip Time and Timeout
$$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$$
• exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125
Example RTT estimation:
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. SampleRTT and Estimated RTT (milliseconds, 100 to 350) plotted against time (seconds, 1 to 106).]
TCP Round Trip Time and Timeout
Setting the timeout:
• EstimatedRTT plus "safety margin"
  – large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:
$$\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot\lvert\text{SampleRTT}-\text{EstimatedRTT}\rvert$$
• then set the timeout interval:
$$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$$
where, as before,
$$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$$
(typically, α = 0.125 and β = 0.25)
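The two averages and the timeout combine into a small estimator. This sketch follows RFC 6298 in two details the slide leaves open: the first sample initializes EstimatedRTT = SampleRTT and DevRTT = SampleRTT/2, and DevRTT is updated with the old EstimatedRTT before EstimatedRTT itself is updated. Real stacks also clamp the timeout (for example, to a minimum); that is omitted here. Units are whatever the samples use.

```python
from typing import Optional


class RttEstimator:
    """EWMA RTT estimator and TimeoutInterval (sketch)."""

    def __init__(self, alpha: float = 0.125, beta: float = 0.25):
        self.alpha, self.beta = alpha, beta
        self.estimated_rtt: Optional[float] = None
        self.dev_rtt = 0.0

    def update(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT; return the new TimeoutInterval."""
        if self.estimated_rtt is None:  # first measurement
            self.estimated_rtt = sample_rtt
            self.dev_rtt = sample_rtt / 2
        else:
            # DevRTT uses the EstimatedRTT from *before* this update.
            self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                            + self.beta * abs(sample_rtt - self.estimated_rtt))
            self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                                  + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator()
print(est.update(100.0), est.update(200.0))  # 300.0 362.5
```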
RTT Measurement & Retransmissions
• How should you handle ACKs for retransmitted segments when measuring RTT?
  – Note: the ACK could be delayed for the original segment, or early for the retransmitted segment
• Choices:
  – Associate with original packet: may overestimate true RTT
  – Associate with retransmitted packet: may underestimate true RTT
Karn's Algorithm
• Idea: don't consider samples from retransmitted segments
  – OK, but what if the current timeout value is too small for the network delay?
  – In this case, would keep timing out, but couldn't adjust the timeout according to the formula
• Solution: if a measurement can't be made, use exponential backoff until a new measurement can be made
  – By a factor of 2, with limit
• Resume normal sampling algorithm afterwards
  – Usual sampling period is once per RTT
• See also RFC 2988
  – TCP timestamp option can be used to remove ambiguity about which ACK matches which packet
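A sketch of the timer logic. The parameter `rto_from_sample` is a hypothetical stand-in for whatever the RTT formula (EstimatedRTT + 4·DevRTT) yields after a new valid sample, and `max_rto` is an assumed backoff limit.

```python
class KarnTimer:
    """Retransmission timeout under Karn's algorithm (sketch)."""

    def __init__(self, initial_rto: float, max_rto: float = 60.0):
        self.rto = initial_rto
        self.max_rto = max_rto  # assumed backoff limit

    def on_timeout(self) -> float:
        """No valid sample is possible: back off by a factor of 2, with limit."""
        self.rto = min(2 * self.rto, self.max_rto)
        return self.rto

    def on_ack(self, was_retransmitted: bool, rto_from_sample: float) -> float:
        """Use the sample only if the ACKed segment was sent exactly once."""
        if not was_retransmitted:
            self.rto = rto_from_sample  # resume normal sampling
        return self.rto                 # ambiguous ACK: keep the backed-off RTO


timer = KarnTimer(initial_rto=1.0, max_rto=8.0)
print([timer.on_timeout() for _ in range(4)])  # [2.0, 4.0, 8.0, 8.0]
```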
Fast Retransmit
• Time-out period often relatively long:
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If a segment is lost, there will likely be many duplicate ACKs
• If the sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost
  – fast retransmit: resend segment before timer expires
Fast retransmit algorithm:

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {  /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y == 3) {
      resend segment with sequence number y  /* fast retransmit */
    }
  }
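The same ACK handler in runnable form. This is a sketch: the timer is a flag, the duplicate count is kept only for the current SendBase, and ACKs below SendBase are treated as stale reordered ACKs and ignored (an assumption; the pseudocode does not distinguish them). Only the third duplicate triggers a retransmission.

```python
from typing import Optional


class FastRetransmitSender:
    """ACK handling with fast retransmit (sketch)."""

    def __init__(self, send_base: int, next_seq_num: int):
        self.send_base = send_base
        self.next_seq_num = next_seq_num
        self.dup_acks = 0                     # dup ACKs seen for SendBase
        self.timer_running = next_seq_num > send_base

    def on_ack(self, y: int) -> Optional[int]:
        """Return the sequence number to resend right away, or None."""
        if y > self.send_base:
            self.send_base = y
            self.dup_acks = 0
            self.timer_running = self.next_seq_num > self.send_base
            return None
        if y < self.send_base:
            return None                       # stale ACK (assumption: ignore)
        self.dup_acks += 1                    # duplicate ACK for already ACKed data
        if self.dup_acks == 3:
            return y                          # fast retransmit segment y
        return None


sender = FastRetransmitSender(send_base=100, next_seq_num=200)
print([sender.on_ack(a) for a in (120, 120, 120, 120)])  # [None, None, None, 120]
```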
TCP Flow Control
TCP Flow Control
• receive side of TCP connection has a receive buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
• app process may be slow at reading from buffer
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
TCP Flow Control (cont'd)
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Receiver advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn't overflow
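Both sides of the rule fit in two small functions (hypothetical names). As on the slide, this assumes the receiver keeps only in-order data.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room the receiver advertises."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def may_send(last_byte_sent: int, last_byte_acked: int, rcv_win: int, nbytes: int) -> bool:
    """True if sending nbytes more keeps unACKed data within RcvWindow."""
    return (last_byte_sent - last_byte_acked) + nbytes <= rcv_win


win = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(win)  # 2096
```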
Flow Control: Persistence Timer
• Suppose the sender is blocked because the receiver application hasn't picked up data
• Then the receiver app reads n bytes
• TCP receiver advertises a new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
  – Solution: persistence timer; the sender sends a probe after a period of inactivity to provoke a window update
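A sketch of the probing schedule. The receiver's replies are a hypothetical input list (the window it would advertise in response to each probe), and the doubling of the interval between probes, with a cap, is an assumed backoff policy rather than something the slide specifies.

```python
def persist_probe_schedule(responses: list, first_interval: float = 1.0,
                           max_interval: float = 60.0) -> list:
    """Times (s) at which a sender blocked by RcvWindow = 0 sends window probes."""
    t, interval, times = 0.0, first_interval, []
    for window in responses:
        t += interval
        times.append(t)   # persistence timer fired: send a probe
        if window > 0:
            break         # window update received: sender is unblocked
        interval = min(2 * interval, max_interval)
    return times


print(persist_probe_schedule([0, 0, 0, 512]))  # [1.0, 3.0, 7.0, 15.0]
```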
Flow Control: Silly Window Syndrome
Clark's solution: don't advertise windows below a certain size
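A sketch of the receiver-side rule, using the common formulation of Clark's solution: advertise new space only once it reaches the smaller of one MSS and half the receive buffer; below that, keep advertising a zero window. The function name is hypothetical.

```python
def window_to_advertise(free_space: int, mss: int, rcv_buffer: int) -> int:
    """Receiver-side silly window avoidance (Clark's solution, sketched)."""
    threshold = min(mss, rcv_buffer // 2)
    return free_space if free_space >= threshold else 0


# A full buffer is drained by the app a little at a time.
print([window_to_advertise(f, mss=1460, rcv_buffer=4096) for f in (1, 2, 1459, 1460)])
# [0, 0, 0, 1460]
```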
CS 4284 Spring 2013
Study of TCP Outlinebull segment structurebull reliable data transfer
ndash delayed ACKsndash Naglersquos algorithm
bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control
TCP
Connection Management
CS 4284 Spring 2013
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control info
(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr
hellip)
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
CS 4284 Spring 2013
TCP 3-way handshake
TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums
CS 4284 Spring 2013
3-way Handshake amp Delayed Dups
(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)
CS 4284 Spring 2013
Sequence Number Reuse
bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a
connection with identical sequence numbersbull Must also guard against sequence number prediction attack
ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())
CS 4284 Spring 2013
When Sequence Numbers Attack
bull Suppose attacker A can predict sequence number a host B is going to use next
bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions
based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is
likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B
CS 4284 Spring 2013
When SYNs Attackbull Servers receiving SYN must allocate resources
ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
bull Solutionndash SYN cookies
bull Server creates ACK number sends ACK ndash but does not allocate buffers
ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct
Sequence Number Summarybull Goals Set 1
ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared
ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs
bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can
eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial
sequence number
CS 4284 Spring 2013
CS 4284 Spring 2013
TCP Connection Management (cont)Closing a connection
client closes socket close(s)
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
timed
wai
t
CS 4284 Spring 2013
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
timed
wai
tclosed
CS 4284 Spring 2013
TCP ConnectionFSM
The heavy solid line is the normal path for a client
The heavy dashed line is the normal path for a server
The light lines are unusual events
Each transition is labeled by the event causing it and the action resulting from it separated by a slash
CS 4284 Spring 2013
TCP Connection Management (contrsquod)
TCP client lifecycle TCP server lifecycle
CS 4284 Spring 2013
Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if
packets (FIN ACK) can be lostndash No Famous two-army problem
CS 4284 Spring 2013
Summarybull TCP segments acknowledgements amp
retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit
bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management
schemendash SYN attackndash Sequence number prediction attacks
CS 4284 Spring 2013
TCP Miscellaneousbull MSS Maximum Segment Size Option
ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN
bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to
allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT
measurement PAWS for protection against wrap-around for sequence numbers
bull hellip
CS 4284 Spring 2013
Study of TCP Outlinebull segment structurebull reliable data transfer
ndash delayed ACKsndash Naglersquos algorithm
bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control
TCP
Congestion Control
CS 4284 Spring 2013
Principles of Congestion Control
Congestionbull informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull different from flow controlbull manifestations
ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)
bull a top-10 problem
CS 4284 Spring 2013
Causescosts of congestion scenario 1 bull two senders two
receiversbull one router infinite
buffers bull no retransmission
bull large delays when congested
bull (but no reduction in throughput here)
unlimited shared output link buffers
Host Alin original data
Host B
lout
CS 4284 Spring 2013
Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet
after timeout
finite shared output link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2lin
l out
a c
R2
R2lin
l out
R4
R2
R2lin
l out
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit
Q what happens as lin and lrsquoin increase
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission
capacity used for that packet was wastedbull ultimately leads to congestion collapse
Host A
Host B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even if no packet loss occurs
bull If packet loss occurs needed retransmission require offered load to be greater than goodput
bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion control
bull no explicit feedback from network
bull congestion inferred from end-system observed loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systemsndash single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Controlbull end-end control (no network
assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput
bull CongWin is dynamic function of perceived network congestion
How does sender notice congestion
bull loss event = timeout or 3 duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is primary cause of loss)
Three mechanismsbull AIMDbull slow startbull fast recovery
rate = CongWin
RTT Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
multiplicative decrease cut CongWin in half after loss event
additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Startbull When connection
begins CongWin = 1 MSSndash Example MSS =
500 bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly
ramp up to respectable rate
bull When connection begins increase rate exponentially fast until first loss event
CS 4284 Spring 2013
TCP Slow Start (more)bull When connection begins
increase rate exponentially until first loss eventndash double CongWin every
RTTndash done by incrementing CongWin for every ACK received
bull Summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
CS 4284 Spring 2013
TCP Tahoe vs RenoQ When should the
exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss
eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly
(this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKsbull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout eventndash CongWin instead set
to 1 MSS ndash window then grows
exponentiallyndash to a threshold then
grows linearly
bull 3 dup ACKs indicates network capable of delivering some segments
bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion
Rationale

Summary: TCP Congestion Control (FSM)
[FSM diagram, reconstructed as text; Λ marks a transition taken without an event]
Initially (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
Slow start:
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
• duplicate ACK: dupACKcount++
• Λ, cwnd > ssthresh: go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
Congestion avoidance:
• new ACK: cwnd = cwnd + MSS·(MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
Fast recovery:
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
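The FSM can be transcribed into a small Python class. This is a sketch of the window bookkeeping only: windows are in bytes, the default MSS of 1000 bytes is a hypothetical value, and "transmit"/"retransmit" actions are just logged rather than sent.

```python
class RenoSender:
    """Reno congestion-control FSM from the slide (window bookkeeping only)."""

    def __init__(self, mss: int = 1000, ssthresh: float = 64 * 1024) -> None:
        self.mss = mss
        self.cwnd = float(mss)        # cwnd = 1 MSS
        self.ssthresh = ssthresh      # ssthresh = 64 KB
        self.dup_acks = 0
        self.state = "slow_start"
        self.log = []

    def on_new_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh                  # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += self.mss                      # +1 MSS per ACK
            self.log.append("transmit new segment(s)")
            if self.cwnd > self.ssthresh:              # the Λ transition
                self.state = "congestion_avoidance"
        else:
            self.cwnd += self.mss * self.mss / self.cwnd   # about +1 MSS per RTT
            self.log.append("transmit new segment(s)")
        self.dup_acks = 0

    def on_dup_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd += self.mss                      # inflate per extra dup ACK
            self.log.append("transmit new segment(s)")
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = "fast_recovery"
            self.log.append("retransmit missing segment")

    def on_timeout(self) -> None:
        self.ssthresh = self.cwnd / 2
        self.cwnd = float(self.mss)
        self.dup_acks = 0
        self.state = "slow_start"
        self.log.append("retransmit missing segment")

s = RenoSender()
for _ in range(3):
    s.on_new_ack()                   # slow start: 1000 -> 4000
for _ in range(3):
    s.on_dup_ack()                   # triple dup ACK: ssthresh 2000, cwnd 5000
s.on_new_ack()                       # leave fast recovery: cwnd = ssthresh
print(s.state, s.cwnd, s.ssthresh)   # congestion_avoidance 2000.0 2000.0
```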

Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS

TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unACKed data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Results in a doubling of CongWin every RTT
ACK receipt for previously unACKed data | Congestion Avoidance (CA) | CongWin = CongWin + MSS·(MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being ACKed | CongWin and Threshold not changed

TCP Throughput (Idealized)
• What's the average throughput of TCP as a function of window size and RTT?
  - Long-lived connection; ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75·W/RTT
[Figure: window size oscillating between W/2 and W]
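The 0.75 factor comes from averaging a window that climbs linearly from W/2 back up to W. A minimal sketch, assuming growth of exactly one segment per RTT and an idealized cycle with no slow start; multiply the result by MSS/RTT to get throughput.

```python
def sawtooth_average(w_max: int) -> float:
    """Average window over one idealized AIMD cycle: +1 segment per RTT
    from w_max/2 up to w_max, after which a loss halves the window."""
    windows = list(range(w_max // 2, w_max + 1))   # one entry per RTT
    return sum(windows) / len(windows)

W = 100
print(sawtooth_average(W) / W)  # 0.75
```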

TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
  $\text{throughput} = \frac{1.22 \cdot \text{MSS}}{\text{RTT} \cdot \sqrt{L}}$
• Solving for 10 Gbps gives $L \approx 2 \cdot 10^{-10}$: very low!
• Requires an almost perfect link
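Both numbers on the slide follow from short calculations, sketched here in Python (function names are hypothetical; MSS is converted to bits so the units match a bits-per-second target). The constant 1.22 is approximately the square root of 3/2.

```python
def window_segments(target_bps: float, rtt_s: float, mss_bytes: int) -> float:
    # W * MSS * 8 / RTT = target  =>  W = target * RTT / (8 * MSS)
    return target_bps * rtt_s / (8 * mss_bytes)

def max_loss_rate(target_bps: float, rtt_s: float, mss_bytes: int) -> float:
    # target = 1.22 * MSS / (RTT * sqrt(L))  =>  L = (1.22 * MSS / (RTT * target))**2
    return (1.22 * mss_bytes * 8 / (rtt_s * target_bps)) ** 2

print(round(window_segments(10e9, 0.1, 1500)))   # 83333
print(f"{max_loss_rate(10e9, 0.1, 1500):.1e}")   # 2.1e-10
```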

Equation-Based Control
• Note: TCP congestion control forms a control loop
  - Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-dup-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info
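As an illustration, here is a sketch of the TCP throughput equation in the form used by TFRC (RFC 5348). The defaults b = 1 and t_RTO = 4·RTT are the simplifications we assume from that RFC; check the RFC itself before relying on the details. For small loss rates the first term dominates and the rate reduces to about 1.22·s/(RTT·√p), matching the previous slide.

```python
import math
from typing import Optional

def tfrc_rate(s: float, rtt: float, p: float, b: float = 1.0,
              t_rto: Optional[float] = None) -> float:
    """Allowed sending rate in bytes/sec from the TCP throughput equation.

    s: segment size (bytes), rtt: round-trip time (s), p: loss event rate,
    b: packets acknowledged per ACK, t_rto: retransmission timeout (s).
    """
    if not 0 < p <= 1:
        raise ValueError("loss event rate p must be in (0, 1]")
    if t_rto is None:
        t_rto = 4 * rtt                      # assumed simplification
    denom = (rtt * math.sqrt(2 * b * p / 3)
             + t_rto * 3 * math.sqrt(3 * b * p / 8) * p * (1 + 32 * p ** 2))
    return s / denom

for p in (1e-4, 1e-3, 1e-2):
    print(p, round(tfrc_rate(1500, 0.1, p)))  # rate falls as loss rises
```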

TCP Fairness

TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

Why is TCP fair?
Consider two competing sessions:
• additive increase gives slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: Connection 1 throughput vs. Connection 2 throughput, both axes up to R; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2", moving toward the equal-bandwidth-share line]
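The geometric argument can be replayed numerically. A minimal sketch, assuming both flows see every loss at the same moment (synchronized losses) and rates are in abstract units: additive increase leaves the gap between the flows unchanged, while each halving cuts it in half, so the flows converge to an equal share.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, steps: int) -> tuple:
    """Two AIMD flows on one bottleneck (idealized, synchronized losses)."""
    for _ in range(steps):
        x1, x2 = x1 + 1, x2 + 1            # additive increase: gap unchanged
        if x1 + x2 > capacity:             # loss: both flows halve their rate
            x1, x2 = x1 / 2, x2 / 2        # multiplicative decrease: gap halves
    return x1, x2

a, b = aimd_two_flows(x1=80.0, x2=5.0, capacity=100.0, steps=1000)
print(round(abs(a - b), 3))  # 0.0: the initial gap of 75 has vanished
```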

Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
• Instead use UDP:
  - pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
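The example shares follow from per-connection fairness (each of k connections gets R/k). A tiny sketch with exact fractions; `app_share` is a hypothetical helper.

```python
from fractions import Fraction

def app_share(existing: int, opened: int) -> Fraction:
    """Fraction of link rate R an app gets by opening `opened` connections
    next to `existing` ones, if every connection gets R/k."""
    return Fraction(opened, existing + opened)

print(app_share(9, 1))  # 1/10
print(app_share(9, 9))  # 1/2
```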

TCP sender (simplified)
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event)

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ACKed byte
Example:
• SendBase-1 = 71, so SendBase = 72; say an ACK is received with y = 73, so the receiver acknowledges up to and including byte 72 and now wants 73+
• y > SendBase, so we know that new data is ACKed: set SendBase to 73
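The simplified sender can be turned into a runnable Python sketch. It models the three events as methods; the class name and the `sent` list (standing in for "pass segment to IP") are hypothetical, and the timer is only a flag. The usage replays the cumulative ACK scenario from the following slides.

```python
class SimpleTCPSender:
    """Simplified TCP sender: no flow/congestion control, no dup-ACK handling."""

    def __init__(self, initial_seq: int) -> None:
        self.next_seq = initial_seq      # NextSeqNum
        self.send_base = initial_seq     # SendBase: oldest unACKed byte
        self.unacked = {}                # seq -> length of not-yet-ACKed segments
        self.timer_running = False
        self.sent = []                   # (seq, length) in the order handed to IP

    def on_app_data(self, length: int) -> None:
        seq = self.next_seq
        self.unacked[seq] = length
        if not self.timer_running:
            self.timer_running = True    # start timer
        self.sent.append((seq, length))  # pass segment to IP
        self.next_seq += length

    def on_timeout(self) -> None:
        if self.unacked:
            seq = min(self.unacked)      # smallest unACKed sequence number
            self.sent.append((seq, self.unacked[seq]))
            self.timer_running = True    # restart timer

    def on_ack(self, y: int) -> None:
        if y > self.send_base:           # cumulative ACK: all bytes below y arrived
            self.send_base = y
            self.unacked = {s: n for s, n in self.unacked.items() if s + n > y}
            self.timer_running = bool(self.unacked)   # keep timing only if data is outstanding

# Cumulative ACK scenario: ACK=100 is lost, but ACK=120 covers both segments.
a = SimpleTCPSender(92)
a.on_app_data(8)      # Seq=92, 8 bytes
a.on_app_data(20)     # Seq=100, 20 bytes
a.on_ack(120)
print(a.send_base, a.sent, a.timer_running)   # 120 [(92, 8), (100, 20)] False
```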

TCP retransmission scenarios
[Figure, lost ACK scenario: Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost (X); after the timeout, A resends Seq=92, 8 bytes data; B replies ACK=100; SendBase = 100]
[Figure, premature timeout: Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; B's ACK=100 and ACK=120 arrive only after the Seq=92 timeout, so A resends Seq=92, 8 bytes data; B replies ACK=120; SendBase goes from 100 to 120]

TCP retransmission scenarios (more)
[Figure, cumulative ACK scenario: Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; B's ACK=100 is lost (X), but ACK=120 arrives before the timeout; SendBase = 120 and nothing is retransmitted]

TCP ACK generation [RFC 1122, RFC 2581]
Event at Receiver | TCP Receiver action
Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed | Delayed ACK: wait up to 500 ms for next segment; if no next segment, send ACK
Arrival of in-order segment with expected seq #; one other segment has ACK pending | Immediately send single cumulative ACK, ACKing both in-order segments
Arrival of out-of-order segment with higher-than-expected seq #; gap detected | Immediately send duplicate ACK, indicating seq # of next expected byte
Arrival of segment that partially or completely fills gap | Immediately send ACK, provided that segment starts at lower end of gap
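The four rules can be expressed as a receiver sketch that returns the action it would take instead of sending segments. The class name is hypothetical, the delayed-ACK timer itself is not modelled, and the branch for old (already received) data is our assumption, since the table does not cover it.

```python
class AckPolicyReceiver:
    """Receiver-side ACK generation following the table above."""

    def __init__(self, rcv_nxt: int) -> None:
        self.rcv_nxt = rcv_nxt       # next expected byte
        self.ooo = {}                # buffered out-of-order segments: seq -> length
        self.ack_pending = False     # a delayed ACK is outstanding

    def on_segment(self, seq: int, length: int) -> str:
        if seq > self.rcv_nxt:                      # gap detected
            self.ooo[seq] = length
            return f"send duplicate ACK {self.rcv_nxt}"
        if seq == self.rcv_nxt:
            fills_gap = bool(self.ooo)              # starts at lower end of gap
            self.rcv_nxt += length
            while self.rcv_nxt in self.ooo:         # absorb buffered data
                self.rcv_nxt += self.ooo.pop(self.rcv_nxt)
            if fills_gap:
                self.ack_pending = False
                return f"send ACK {self.rcv_nxt} immediately"
            if self.ack_pending:
                self.ack_pending = False
                return f"send cumulative ACK {self.rcv_nxt}"
            self.ack_pending = True
            return "delay ACK (up to 500 ms)"
        return f"send duplicate ACK {self.rcv_nxt}"  # assumption: old data, re-ACK

r = AckPolicyReceiver(100)
print(r.on_segment(100, 10))   # delay ACK (up to 500 ms)
print(r.on_segment(110, 10))   # send cumulative ACK 120
print(r.on_segment(130, 10))   # send duplicate ACK 120
print(r.on_segment(120, 10))   # send ACK 140 immediately
```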

Nagle's algorithm
• Consider again apps writing byte-by-byte (or in small chunks)
  - If you send 1-byte segments with 20-byte TCP header overhead + overhead of lower-layer headers, you get a huge waste of network capacity
• Nagle:
  - Transmit first byte
  - Buffer outgoing bytes until an ACK has been received (or a full-sized segment has accumulated), then send all at once
• You can turn this off via:
  - setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, ...) (a TCP-level option, not SOL_SOCKET)
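A sketch of the simplified Nagle decision, plus turning Nagle off with Python's standard `socket` module. `nagle_may_send` is a hypothetical helper; real stacks add further details (such as the PSH/flush behavior) that are not modelled here.

```python
import socket

def nagle_may_send(buffered: int, unacked: int, mss: int) -> bool:
    """Simplified Nagle rule: send now if a full-sized segment is ready or
    nothing is outstanding; otherwise keep buffering until an ACK arrives."""
    return buffered > 0 and (buffered >= mss or unacked == 0)

# Disabling Nagle: TCP_NODELAY is set at the TCP level (IPPROTO_TCP).
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
nodelay_on = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
s.close()
print(nagle_may_send(1, 0, 1460), nagle_may_send(1, 500, 1460), nodelay_on)  # True False True
```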

Delayed ACK vs. Nagle
• These two mechanisms have nothing to do with each other
  - often confused, especially by various online sources
  - TCP_NODELAY means "disable Nagle", not "disable delayed ACKs"
• Delayed ACK is implemented by the TCP receiver
  - Delays sending an acknowledgement (because ACKs are cumulative, this can reduce the number of ACKs sent)
• Nagle's algorithm is implemented by the TCP sender
  - Delays sending actual data (so that more data can be fit into a single segment)

TCP Timeout & Fast Retransmit

TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  - but RTT varies
• too short: premature timeout
  - unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
• SampleRTT will vary; want estimated RTT "smoother"
  - average several recent measurements, not just current SampleRTT

RTT Distributions
[Figure: (a) Probability density of ACK arrival times in the data link layer. (b) Probability density of ACK arrival times for TCP.]

TCP Round Trip Time and Timeout
$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
• exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125
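A minimal sketch of the moving average, assuming the first SampleRTT seeds EstimatedRTT. The second helper shows why old samples fade: the sample taken k updates ago carries weight α(1 − α)^k.

```python
def ewma(samples: list, alpha: float = 0.125) -> float:
    """EstimatedRTT after feeding SampleRTT values in order."""
    est = samples[0]                       # first sample seeds the estimate
    for s in samples[1:]:
        est = (1 - alpha) * est + alpha * s
    return est

def weight_of_sample(k: int, alpha: float = 0.125) -> float:
    """Weight of the sample taken k updates ago."""
    return alpha * (1 - alpha) ** k

print(ewma([100.0, 200.0]))                # 112.5
print(round(weight_of_sample(8), 4))       # influence after 8 more samples
```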

Example RTT estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; SampleRTT and EstimatedRTT in milliseconds (100 to 350) vs. time in seconds (1 to 106)]

TCP Round Trip Time and Timeout
Setting the timeout:
• EstimatedRTT plus "safety margin"
  - large variation in EstimatedRTT: larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:
  $\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
  $\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT} - \text{EstimatedRTT}|$
  (typically α = 0.125, β = 0.25)
• then set the timeout interval:
  $\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
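The three formulas combine into a small estimator. This is a sketch under two stated assumptions: the first sample seeds EstimatedRTT and sets DevRTT to half of it (as RFC 6298 does), and DevRTT is updated against the previous estimate. The minimum-RTO clamp that real stacks apply is omitted.

```python
class RttEstimator:
    """EstimatedRTT / DevRTT / TimeoutInterval from the slide."""

    def __init__(self, alpha: float = 0.125, beta: float = 0.25) -> None:
        self.alpha, self.beta = alpha, beta
        self.estimated_rtt = None
        self.dev_rtt = None

    def update(self, sample_rtt: float) -> float:
        if self.estimated_rtt is None:
            self.estimated_rtt = sample_rtt        # assumption: seed from first sample
            self.dev_rtt = sample_rtt / 2
        else:
            # deviation measured against the previous estimate
            self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                            + self.beta * abs(sample_rtt - self.estimated_rtt))
            self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                                  + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt

est = RttEstimator()
print(est.update(100.0))   # 300.0 (ms)
print(est.update(120.0))   # 272.5
```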

RTT Measurement & Retransmissions
• How should you handle ACKs for retransmitted segments when measuring RTT?
  - Note: ACK could be delayed for original segment, or early for retransmitted segment
• Choices:
  - Associate with original packet: may overestimate true RTT
  - Associate with retransmitted packet: may underestimate true RTT

Karn's Algorithm
• Idea: don't consider samples from retransmitted segments
  - OK, but what if current timeout value is too small for network delay?
  - In this case, would keep timing out, but couldn't adjust timeout according to formula
• Solution: if measurement can't be made, use exponential backoff until new measurement can be made
  - By factor of 2, with limit
• Resume normal sampling algorithm afterwards
  - Usual sampling period is once per RTT
• See also RFC 2988
  - TCP timestamp option can be used to remove ambiguity about which ACK matches which packet
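A sketch of Karn's rule combined with exponential backoff. The class name, the initial RTO of 1 s and the 60 s cap are hypothetical parameters chosen for illustration; the estimator inside uses α = 0.125 and β = 0.25 as on the previous slides.

```python
class KarnTimer:
    """Karn's algorithm: ignore ambiguous samples, back off on timeout."""

    def __init__(self, initial_rto: float = 1.0, max_rto: float = 60.0) -> None:
        self.rto = initial_rto
        self.max_rto = max_rto
        self.srtt = None
        self.rttvar = None

    def on_timeout(self) -> float:
        # no valid sample for this segment: double the RTO, with a limit
        self.rto = min(2 * self.rto, self.max_rto)
        return self.rto

    def on_ack(self, sample_rtt: float, was_retransmitted: bool) -> float:
        if was_retransmitted:
            return self.rto               # ambiguous sample: keep backed-off RTO
        if self.srtt is None:
            self.srtt, self.rttvar = sample_rtt, sample_rtt / 2
        else:
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(sample_rtt - self.srtt)
            self.srtt = 0.875 * self.srtt + 0.125 * sample_rtt
        self.rto = min(self.srtt + 4 * self.rttvar, self.max_rto)  # normal rule
        return self.rto

k = KarnTimer()
print(k.on_timeout(), k.on_timeout())   # 2.0 4.0
print(k.on_ack(3.0, True))              # 4.0 (sample ignored)
print(k.on_ack(0.5, False))             # 1.5 (normal sampling resumes)
```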

Fast Retransmit
• Time-out period often relatively long:
  - long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  - Sender often sends many segments back-to-back
  - If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost:
  - fast retransmit: resend segment before timer expires

Fast retransmit algorithm:
event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {  /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y == 3) {
      resend segment with sequence number y  /* fast retransmit */
    }
  }
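The ACK handler above translates directly into Python. In this sketch (class name hypothetical) the segments are assumed to have been sent already, and retransmissions are recorded in a list; the extra check that segment y is still unACKed is our addition.

```python
class FastRetransmitSender:
    """ACK handler with fast retransmit on the third duplicate ACK."""

    def __init__(self, send_base: int, segments: dict) -> None:
        self.send_base = send_base
        self.unacked = dict(segments)      # seq -> length, already sent
        self.dup_count = {}                # ACK value y -> duplicates seen
        self.retransmitted = []
        self.timer_running = bool(self.unacked)

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            self.send_base = y
            self.unacked = {s: n for s, n in self.unacked.items() if s + n > y}
            self.timer_running = bool(self.unacked)
        else:                                  # duplicate ACK for already-ACKed data
            self.dup_count[y] = self.dup_count.get(y, 0) + 1
            if self.dup_count[y] == 3 and y in self.unacked:
                self.retransmitted.append(y)   # resend before the timer expires

# Segment 110 is lost; segments 120, 130, 140 each trigger a duplicate ACK=110.
f = FastRetransmitSender(100, {100: 10, 110: 10, 120: 10, 130: 10, 140: 10})
for y in (110, 110, 110, 110):
    f.on_ack(y)
print(f.retransmitted)   # [110]
```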

TCP Flow Control

TCP Flow Control
• receive side of TCP connection has a receive buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
• app process may be slow at reading from buffer
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

TCP Flow Control (cont'd)
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer:
  RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn't overflow
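Both sides of the rule can be sketched as two small functions (names hypothetical): the receiver computes the spare room it advertises, and the sender checks that a new send keeps its unACKed data within that window.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room the receiver advertises (out-of-order data is discarded)."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def sender_may_send(last_byte_sent: int, last_byte_acked: int,
                    rcv_win: int, nbytes: int) -> bool:
    """Keep LastByteSent - LastByteAcked within RcvWindow after sending nbytes."""
    return (last_byte_sent - last_byte_acked) + nbytes <= rcv_win

w = rcv_window(4096, 3000, 1000)
print(w)                                      # 2096
print(sender_may_send(5000, 4000, w, 1000))   # True
print(sender_may_send(5000, 4000, w, 1200))   # False
```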

Flow Control: Persistence Timer
• Suppose sender is blocked because receiver application hasn't picked up data
• Then receiver app reads n bytes
• TCP receiver advertises new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
  - Solution: persistence timer; sender sends probe after a period of inactivity to provoke window update

Flow Control: Silly Window Syndrome
Clark's solution: don't advertise windows below a certain size
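A sketch of the receiver-side rule. The "certain size" is an assumption here: we use min(MSS, half the receive buffer), the threshold RFC 1122 suggests for receiver-side silly window avoidance; below it the receiver keeps advertising a zero window instead of a tiny one.

```python
def advertised_window(free_space: int, rcv_buffer: int, mss: int) -> int:
    """Receiver-side silly window avoidance in the spirit of Clark's solution."""
    threshold = min(mss, rcv_buffer // 2)   # assumed "certain size"
    return free_space if free_space >= threshold else 0

print(advertised_window(1, 4096, 1460))      # 0: don't offer a 1-byte window
print(advertised_window(1460, 4096, 1460))   # 1460
```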

Study of TCP: Outline
• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP Connection Management

TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  connect(s, &dstaddr, …)
• server: contacted by client
  cl = accept(sv, &caddr, …)
Three-way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
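The `connect`/`accept` calls above map onto Python's standard `socket` module. In this loopback sketch the kernel performs the SYN, SYNACK, ACK exchange itself: `connect()` returns once the handshake with the listening socket completes, and `accept()` then hands the server a socket for the established connection. Variable names mirror the slide; port 0 lets the OS pick a free port.

```python
import socket

sv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # server (listening) socket
sv.bind(("127.0.0.1", 0))
sv.listen(1)
sv.settimeout(5)
dstaddr = sv.getsockname()

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)    # client socket
s.settimeout(5)
s.connect(dstaddr)            # returns after the 3-way handshake completes

cl, caddr = sv.accept()       # new socket for the established connection
cl.settimeout(5)

s.sendall(b"hello")           # data flows once the connection is established
received = b""
while len(received) < 5:
    chunk = cl.recv(5 - len(received))
    if not chunk:
        break
    received += chunk

for sock in (s, cl, sv):
    sock.close()
print(received, caddr[0])     # b'hello' 127.0.0.1
```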

TCP 3-way handshake
[Figure: TCP connection establishment]
• Q1: why 3-way and not 2-way handshake?
• Q2: how do sender & receiver determine initial seq. #s?

3-way Handshake & Delayed Dups
[Figure: (a) Normal operation. (b) Old SYN appearing out of nowhere. (c) Duplicate SYN and duplicate ACK following SYN.]
3-way handshake required to deal with scenarios (b) and (c)

Sequence Number Reuse
• Idea: tie initial TCP seq. numbers to clock
  - Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  - Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock value + F(src, dst, sport, dport, random())
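An RFC 1948-style ISN generator can be sketched with Python's `hashlib`. SHA-256 stands in for F here (RFC 1948 suggested a cryptographic hash such as MD5), `SECRET` plays the role of random(), and the function name is hypothetical. Within one 4-tuple the ISN still advances with the clock, while other connections' ISNs are offset by an unpredictable amount.

```python
import hashlib
import secrets

SECRET = secrets.token_bytes(16)    # per-boot random secret

def isn(src: str, dst: str, sport: int, dport: int,
        clock_us: int, secret: bytes = SECRET) -> int:
    """ISN = M + F(src, dst, sport, dport, secret) mod 2**32,
    where M ticks once every 4 microseconds."""
    m = clock_us // 4
    digest = hashlib.sha256(f"{src},{dst},{sport},{dport}".encode() + secret).digest()
    f = int.from_bytes(digest[:4], "big")
    return (m + f) % 2**32

x = isn("10.0.0.1", "10.0.0.2", 1234, 80, clock_us=1_000_000, secret=b"k")
y = isn("10.0.0.1", "10.0.0.2", 1234, 80, clock_us=1_000_004, secret=b"k")
print((y - x) % 2**32)   # 1: same connection, one clock tick later
```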

When Sequence Numbers Attack
• Suppose attacker A can predict the sequence number host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  - B believes it is talking to C, might grant permissions based on C's IP address
  - Attacker on A must suppress the RST packets C is likely to send; use a denial-of-service attack for that
• A sends message to compromise B

When SYNs Attack
• Servers receiving SYN must allocate resources
  - Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution: SYN cookies
  - Server computes its initial sequence number (the "cookie") instead of storing per-SYN state, sends SYNACK, but does not allocate buffers
  - If client continues with ACK, check whether the acknowledged number could have been sent; if correct, then allocate buffers
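A simplified SYN cookie sketch using Python's `hmac` and `hashlib`. It is hypothetical in its details: the cookie is a keyed hash of the 4-tuple and a coarse time slot, and real SYN cookies also pack MSS and timestamp bits into the number. The point is that the server can validate the final ACK without having stored anything when the SYN arrived.

```python
import hashlib
import hmac

def syn_cookie(secret: bytes, four_tuple: tuple, time_slot: int) -> int:
    """Server ISN derived from the connection and a time slot (32 bits)."""
    msg = repr((four_tuple, time_slot)).encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")

def ack_is_valid(secret: bytes, four_tuple: tuple, ack_number: int, now_slot: int) -> bool:
    """The client's ACK acknowledges ISN + 1; recompute instead of looking up state."""
    isn = (ack_number - 1) % 2**32
    return any(isn == syn_cookie(secret, four_tuple, slot)
               for slot in (now_slot, now_slot - 1))   # tolerate one slot of delay

key = b"server-secret"
conn = ("1.2.3.4", 5555, "5.6.7.8", 80)
c = syn_cookie(key, conn, time_slot=100)      # sent as ISN in the SYNACK
print(ack_is_valid(key, conn, (c + 1) % 2**32, now_slot=100))   # True
```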

Sequence Number Summary
• Goals Set 1:
  - Guard against old duplicates in one connection -> don't reuse sequence numbers until it's reasonably certain that old duplicates have disappeared
  - Even after a crash restart -> hence tie to time -> but don't use a global clock -> hence need 3-way handshake where each side verifies the sequence number chosen by the other side before a successful connection occurs
• Goals Set 2:
  - Don't allow hijacking of connections unless attacker can eavesdrop -> use PRNG for initial seq. number choice
  - Don't allow SYN attacks -> compute but don't store initial sequence number

TCP Connection Management (cont.)
Closing a connection:
client closes socket: close(s);
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client (close) sends FIN; server replies ACK, then (close) sends FIN; client replies ACK and enters timed wait before closed]

TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK
  - Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client and server exchange FIN and ACK; both pass through closing; the client's timed wait ends in closed; the server reaches closed on receiving the ACK]
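The FIN exchange is visible from the socket API. In this loopback sketch with Python's `socket` module, `shutdown(SHUT_WR)` makes the client send its FIN (a half-close): the server's `recv()` then returns `b""` (end of stream), yet the server can still send data before its own `close()` sends the second FIN.

```python
import socket

sv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sv.bind(("127.0.0.1", 0))
sv.listen(1)
sv.settimeout(5)
s = socket.create_connection(sv.getsockname(), timeout=5)
conn, _ = sv.accept()
conn.settimeout(5)

s.shutdown(socket.SHUT_WR)    # client sends FIN (half-close)
eof = conn.recv(1)            # server sees end of stream: b""
conn.sendall(b"bye")          # server may still send before its own FIN
conn.close()                  # server sends FIN

chunks = []
while True:
    chunk = s.recv(1024)
    if not chunk:             # server's FIN arrived
        break
    chunks.append(chunk)
reply = b"".join(chunks)
s.close()
sv.close()
print(eof, reply)             # b'' b'bye'
```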

TCP Connection FSM
[Figure: TCP connection state machine]
• The heavy solid line is the normal path for a client
• The heavy dashed line is the normal path for a server
• The light lines are unusual events
• Each transition is labeled by the event causing it and the action resulting from it, separated by a slash

TCP Connection Management (cont'd)
[Figure: TCP client lifecycle; TCP server lifecycle]

Closing a Connection
• Note: previous charts showed normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  - No: famous two-army problem

Summary
• TCP segments, acknowledgements & retransmission
  - Delayed ACKs, Nagle's algorithm
  - Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
  - SYN attack
  - Sequence number prediction attacks

TCP Miscellaneous
• MSS: Maximum Segment Size option
  - Client/server agree on larger than default (536 outside same subnet) MSS; option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFNs ("elephants"): Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• …
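Why LFNs need WSCALE: the 16-bit window field caps the window at 65,535 bytes, and the scale option left-shifts it (RFC 1323 limits the shift to 14). A sketch that finds the smallest shift covering a bandwidth-delay product; the 10 Gbps, 100 ms path reuses the earlier throughput example.

```python
def required_wscale(bdp_bytes: int, max_shift: int = 14) -> int:
    """Smallest window-scale shift so that (65535 << shift) covers the BDP."""
    for shift in range(max_shift + 1):
        if (65535 << shift) >= bdp_bytes:
            return shift
    raise ValueError("BDP exceeds what window scaling can express")

bdp = 10_000_000_000 * 100 // 1000 // 8   # 10 Gbps * 100 ms, in bytes
print(bdp, required_wscale(bdp))          # 125000000 11
```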

Study of TCP: Outline
• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP Congestion Control

Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  - long delays (queueing in router buffers)
  - lost packets (buffer overflow at routers)
• a top-10 problem!

Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A and Host B send λin (original data) through a router with unlimited shared output link buffers; delivered throughput λout]

Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Host A sends λin (original data) and λ'in (original data plus retransmitted data) into finite shared output link buffers shared with Host B; delivered throughput λout]

Causes/costs of congestion: scenario 2
Always: λin = λout (goodput)
• a) if no loss: λ'in = λin
• b) assume clairvoyant sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes λ'in larger for same λout (every packet transmitted twice)
"costs" of congestion:
• more work (retransmissions) for given "goodput"
• unneeded retransmissions: link carries multiple copies of packet
[Figure: three plots of λout vs. λin, both axes up to R/2: (a) λout reaches R/2; (b) λout reaches R/3; (c) λout reaches R/4]

Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λin and λ'in increase?
[Figure: Host A sends λin (original data) and λ'in (original data plus retransmitted data) over multihop paths with finite shared output link buffers; Host B; delivered throughput λout]

Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet is dropped, any upstream transmission capacity used for that packet was wasted!
• ultimately leads to congestion collapse
[Figure: delivered throughput λout between Host A and Host B]

Reasons for Congestion Control
• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require offered load to be greater than goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at

TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission:
  $\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$
• CongWin and RTT influence throughput:
  $\text{rate} \approx \frac{\text{CongWin}}{\text{RTT}}$ Bytes/sec
• CongWin is dynamic function of perceived network congestion
How does sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
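The sending limit combines both windows: whichever of CongWin and RcvWindow is smaller caps the data in flight. A minimal sketch with hypothetical function names; `approx_rate` is the rough CongWin/RTT estimate from the slide.

```python
def sendable_bytes(last_byte_sent: int, last_byte_acked: int,
                   cong_win: int, rcv_window: int) -> int:
    """Bytes the sender may still put in flight right now."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cong_win, rcv_window) - in_flight)

def approx_rate(cong_win_bytes: int, rtt_s: float) -> float:
    """rate ≈ CongWin / RTT, in bytes per second."""
    return cong_win_bytes / rtt_s

print(sendable_bytes(10_000, 6_000, cong_win=8_000, rcv_window=20_000))  # 4000
print(round(approx_rate(64_000, 0.1)))                                   # 640000
```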

TCP AIMD
• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing
[Figure: congestion window (8, 16, 24 Kbytes) vs. time for a long-lived TCP connection]
CS 4284 Spring 2013
TCP Slow Startbull When connection
begins CongWin = 1 MSSndash Example MSS =
500 bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly
ramp up to respectable rate
bull When connection begins increase rate exponentially fast until first loss event
CS 4284 Spring 2013
TCP Slow Start (more)bull When connection begins
increase rate exponentially until first loss eventndash double CongWin every
RTTndash done by incrementing CongWin for every ACK received
bull Summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
CS 4284 Spring 2013
TCP Tahoe vs RenoQ When should the
exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss
eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly
(this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKsbull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout eventndash CongWin instead set
to 1 MSS ndash window then grows
exponentiallyndash to a threshold then
grows linearly
bull 3 dup ACKs indicates network capable of delivering some segments
bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
CS 4284 Spring 2013
TCP Sender Congestion ControlEvent State TCP Sender Action Commentary
ACK receipt for previously unacked data
Slow Start (SS)
CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA
Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data
CongestionAvoidance (CA)
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK
SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter slow start
Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
CS 4284 Spring 2013
TCP Throughput Idealized
bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start
bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput
to W2RTT bull Average steady-state throughput 75 WRTT
W
W2
CS 4284 Spring 2013
TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT
want 10 Gbps throughputbull Requires window size W = 83333 in-flight
segmentsbull Throughput in terms of loss rate
bull L = 210-10 Very lowbull Require almost perfect link
LRTTMSS221
Equation-Based Control
bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which
are samples of timeout events + 3-ack events)bull Instead equation-based control uses an
equation to compute sending rate based on this input
bull See RFC 5348 for more info
CS 4284 Spring 2013
TCP Fairness
CS 4284 Spring 2013
Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk
TCP connection 1
bottleneckrouter
capacity RTCP connection 2
TCP Fairness
CS 4284 Spring 2013
Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughputConn
e ctio
n 2
thro
u ghp
ut
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
CS 4284 Spring 2013
Fairness (more)Fairness and UDPbull Multimedia apps often
do not use TCPndash do not want rate throttled
by congestion controlbull Instead use UDP
ndash pump audiovideo at constant rate tolerate packet loss
bull TCP friendliness
Fairness and parallel TCP connections
bull nothing prevents app from opening parallel connections between 2 hosts
bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets
rate R10ndash new app asks for 9 TCPs gets
R2bull Thatrsquos what ldquodownload
acceleratorsrdquo do
CS 4284 Spring 2013
TCP retransmission scenariosHost A
Seq=92 8 bytes data
ACK=100
losstimeo
ut
lost ACK scenario
Host B
X
Seq=92 8 bytes data
ACK=100
time
SendBase= 100
Host A
Seq=100 20 bytes data
ACK=100
timepremature timeout
Host B
Seq=92 8 bytes data
ACK=120
Seq=92 8 bytes data
Seq=
92 ti
meo
ut
ACK=120
Seq=
92 ti
meo
ut
SendBase= 120
SendBase= 120
Sendbase= 100
CS 4284 Spring 2013
TCP retransmission scenarios (more)
Host A
Seq=92 8 bytes data
ACK=100
losstimeo
ut
Cumulative ACK scenario
Host B
X
Seq=100 20 bytes data
ACK=120
time
SendBase= 120
CS 4284 Spring 2013
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
Arrival of in-order segment withexpected seq One other segment has ACK pending
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Arrival of segment that partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
Immediately send single cumulative ACK ACKing both in-order segments
Immediately send duplicate ACK indicating seq of next expected byte
Immediate send ACK provided thatsegment starts at lower end of gap
CS 4284 Spring 2013
Naglersquos algorithmbull Consider again apps writing byte-by-byte (or in
small chunks)ndash If you send 1-byte segments with 20 byte TCP
header overhead + overhead of lower layer headers ndashyou get a huge waste of network capacity
bull Naglendash Transmit first bytendash Buffer outgoing bytes until ack has been received ndash
then send all at oncebull You can turn this off via
ndash Use setsockopt (SOL_SOCKET TCP_NODELAY)
CS 4284 Spring 2013
Delayed ACK vs Naglebull These two mechanisms have nothing to do with
each otherndash often confused especially by various online sourcesndash TCP_NODELAY means ldquodisable Naglerdquo not ldquodisable
delayed ACKsrdquo
bull Delayed ACK implemented by TCP Receiverndash Delays sending an acknowledgement (because acks
are cumulative can reduce of acks sent)bull Naglersquos algorithm implemented by TCP Sender
ndash Delays sending actual data (so that more data can be fit into a single segment)
TCP
Timeout amp Fast Retransmit
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull longer than RTTndash but RTT varies
bull too short premature timeoutndash unnecessary
retransmissions
bull too long slow reaction to segment loss
Q how to estimate RTT
bull SampleRTT measured time from segment transmission until ACK receipt
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several
recent measurements not just current SampleRTT
CS 4284 Spring 2013
RTT Distributions
(a) Probability density of ACK arrival times in the data link layer (b) Probability density of ACK arrival times for TCP
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT
bull exponential weighted moving averagebull influence of past sample decreases
exponentially fastbull typical value = 0125
CS 4284 Spring 2013
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
CS 4284 Spring 2013
TCP Round Trip Time and TimeoutSetting the timeoutbull EstimatedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT larger safety marginbull first estimate of how much SampleRTT deviates
from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 0125 = 025)
EstimatedRTT = (1-)EstimatedRTT + SampleRTT
CS 4284 Spring 2013
RTT Measurement amp Retransmissions
bull How should you handle acks for retransmitted segments when measuring RTTndash Note ACK could be delayed for original
segment or early for retransmitted segmentbull Choices
ndash Associate with original packet may overestimate true RTT
ndash Associate with retransmitted packet may underestimate true RTT
CS 4284 Spring 2013
Karnrsquos Algorithmbull Idea donrsquot consider samples from retransmitted
segmentsndash Ok but what if current timeout value is too small for network
delayndash In this case would keep timing out but couldnrsquot adjust timeout
according to formulabull Solution If measurement canrsquot be made use exponential
backoff until new measurement can be madendash By factor of 2 with limit
bull Resume normal sampling algorithm afterwardsndash Usual sampling period is once per RTT
bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet
CS 4284 Spring 2013
Fast Retransmitbull Time-out period often relatively long
ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs
ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs
bull If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lostndash fast retransmit resend segment before timer expires
CS 4284 Spring 2013
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
TCP
Flow Control
CS 4284 Spring 2013
TCP Flow Control
• The receive side of a TCP connection has a receive buffer
• The app process may be slow at reading from the buffer
• Speed-matching service: matching the send rate to the receiving app's drain rate
• Flow control: the sender won't overflow the receiver's buffer by transmitting too much, too fast
TCP Flow Control (cont'd)
(Suppose the TCP receiver discards out-of-order segments)
• Spare room in buffer:
  RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Receiver advertises spare room by including the value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees the receive buffer doesn't overflow
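A small sketch of the window arithmetic, assuming byte counters named as on the slide (the function names are hypothetical):

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises: RcvBuffer - [LastByteRcvd - LastByteRead]."""
    buffered = last_byte_rcvd - last_byte_read  # received but not yet read by the app
    if not 0 <= buffered <= rcv_buffer:
        raise ValueError("receive buffer overflowed: flow control was violated")
    return rcv_buffer - buffered


def sendable_bytes(last_byte_sent, last_byte_acked, advertised_window):
    """Bytes the sender may still send while keeping unACKed data <= RcvWindow."""
    unacked = last_byte_sent - last_byte_acked
    return max(0, advertised_window - unacked)
```

For example, a 4096-byte buffer holding bytes 1001 to 3000 (not yet read) advertises 2096 bytes; a sender with 1000 unACKed bytes may then send 1096 more.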
Flow Control: Persistence Timer
• Suppose the sender is blocked because the receiver application hasn't picked up data
• Then the receiver app reads n bytes
• The TCP receiver advertises a new window of size RcvWindow = n
• But suppose this advertisement is lost
• The sender would be stuck
  - Solution: persistence timer; the sender sends a probe after a period of inactivity to provoke a window update
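A sketch of the sender side of the persistence timer. The doubling of the probe interval and the 1 s / 60 s bounds are assumptions modeled on retransmission backoff, not values from the slide; `PersistTimer` is a hypothetical name.

```python
class PersistTimer:
    """Sender-side zero-window probing (illustrative sketch)."""

    def __init__(self, initial=1.0, max_interval=60.0):
        self.initial = initial
        self.max_interval = max_interval
        self.interval = None   # None means the timer is not armed
        self.probes_sent = 0

    def on_window_update(self, rcv_window):
        if rcv_window == 0:
            if self.interval is None:
                self.interval = self.initial   # blocked by a zero window: arm the timer
        else:
            self.interval = None               # window reopened: disarm
            self.probes_sent = 0

    def on_expiry(self):
        """Timer fired while the window is still zero: send a probe, back off."""
        if self.interval is None:
            return False
        self.probes_sent += 1                  # the probe elicits an ACK carrying RcvWindow
        self.interval = min(2 * self.interval, self.max_interval)
        return True
```

If the receiver's window update was lost, the ACK provoked by the next probe carries the current RcvWindow, so the sender cannot stay stuck indefinitely.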
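Flow Control: Silly Window Syndrome
• Problem: the receiving app reads data one byte at a time, the receiver advertises each freed byte as a 1-byte window, and the sender responds with 1-byte segments
Clark's solution: don't advertise windows below a certain size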
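One way to express Clark's rule, assuming the threshold min(MSS, RcvBuffer / 2) from RFC 1122's receiver-side silly window avoidance, simplified here to compare the total free space against that threshold. The function name is hypothetical.

```python
def clark_advertised_window(free_space, mss, rcv_buffer):
    """Advertise 0 until the free space reaches min(MSS, RcvBuffer / 2),
    then advertise all of it, so the sender never sees a 'silly' small window."""
    threshold = min(mss, rcv_buffer // 2)
    return free_space if free_space >= threshold else 0
```

With a 64 KB buffer and a 1460-byte MSS, the receiver keeps advertising a zero window until at least 1460 bytes are free.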
Study of TCP: Outline
• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP Connection Management
TCP Connection Management
Recall: TCP sender and receiver establish a "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g., RcvWindow)
• client: connection initiator
  connect(s, &dstaddr, …)
• server: contacted by client
  cl = accept(sv, &caddr, …)

Three-way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq. #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
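Python's standard `socket` module exposes the same `connect()` and `accept()` calls; the kernel carries out the SYN, SYNACK, ACK exchange inside them. A minimal loopback sketch (the function name is hypothetical):

```python
import socket


def handshake_demo():
    """Establish a TCP connection on loopback and send a few bytes over it."""
    sv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sv.bind(("127.0.0.1", 0))       # port 0: let the OS choose a free port
    sv.listen(1)
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    cl = None
    try:
        s.connect(sv.getsockname())  # client: SYN out, SYNACK in, ACK out
        cl, caddr = sv.accept()      # server: returns the established connection
        s.sendall(b"hello")          # data may follow right after the handshake
        data = b""
        while len(data) < 5:         # recv() may return fewer bytes than requested
            chunk = cl.recv(5 - len(data))
            if not chunk:
                break
            data += chunk
        return data, caddr
    finally:
        for sock in (s, cl, sv):
            if sock is not None:
                sock.close()
```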
TCP 3-way handshake
TCP connection establishment:
• Q1: why a 3-way and not a 2-way handshake?
• Q2: how do sender and receiver determine initial seq. #s?
3-way Handshake & Delayed Duplicates
Figure: (a) normal operation; (b) old SYN appearing out of nowhere; (c) duplicate SYN and duplicate ACK following SYN
The 3-way handshake is required to deal with scenarios (b) and (c)
Sequence Number Reuse
• Idea: tie initial TCP seq. numbers to a clock
  - Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attacks
  - Use a PRNG; see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock value + F(src, dst, sport, dport, random())
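A sketch of the RFC 1948 idea. Assumptions: SHA-256 stands in for the hash F (RFC 1948 suggests MD5), a per-process random secret stands in for `random()`, and `time.monotonic_ns()` provides the 4 µs clock. The function name is hypothetical.

```python
import hashlib
import os
import time

SECRET = os.urandom(16)  # per-boot secret; plays the role of random() on the slide


def initial_seq_num(src, dst, sport, dport, now_ns=None):
    """ISN = 4-microsecond clock + keyed hash of the connection 4-tuple, mod 2**32."""
    if now_ns is None:
        now_ns = time.monotonic_ns()
    clock = now_ns // 4000  # advances once every 4 µs
    digest = hashlib.sha256(f"{src}|{dst}|{sport}|{dport}".encode() + SECRET).digest()
    offset = int.from_bytes(digest[:4], "big")  # F(...) reduced to 32 bits
    return (clock + offset) % 2**32  # TCP sequence numbers are 32 bits
```

Within one 4-tuple the ISN still advances with the clock (guarding against old incarnations), while an attacker who does not know the secret cannot predict the offset for a different 4-tuple.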
When Sequence Numbers Attack
• Suppose attacker A can predict the sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  - B believes it is talking to C and might grant permissions based on C's IP address
  - The attacker on A must suppress the RST packets C is likely to send; use a denial-of-service attack for that
• A sends a message to compromise B
When SYNs Attack
• Servers receiving a SYN must allocate resources
  - Opens up the possibility of a denial-of-service attack where the server is flooded with bogus SYN packets with forged IP source addresses
• Solution: SYN cookies
  - Server computes its initial sequence number (the "cookie") from the connection information, sends the SYNACK, but does not allocate buffers
  - If the client continues with the ACK, check whether that ACK could have been sent in response to a SYNACK from this server; if correct, allocate buffers
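A simplified SYN-cookie sketch using an HMAC as the server's "compute but don't store" function. The key handling, the coarse `time_slot` counter, and the function names are assumptions; real implementations such as Linux's also pack a time counter and an MSS index into specific bits of the ISN, which this sketch does not attempt.

```python
import hashlib
import hmac
import os

KEY = os.urandom(16)  # server secret (hypothetical; real stacks rotate it)


def syn_cookie(src, dst, sport, dport, client_isn, time_slot):
    """Server ISN derived from the connection itself, so no per-SYN state is kept."""
    msg = f"{src}|{dst}|{sport}|{dport}|{client_isn}|{time_slot}".encode()
    return int.from_bytes(hmac.new(KEY, msg, hashlib.sha256).digest()[:4], "big")


def ack_is_valid(src, dst, sport, dport, client_isn, ack_num, time_slot):
    """The final ACK must acknowledge cookie + 1. client_isn is recoverable from
    the ACK's own sequence number (client_isn + 1). Accept the current or the
    previous time slot."""
    for slot in (time_slot, time_slot - 1):
        expected = (syn_cookie(src, dst, sport, dport, client_isn, slot) + 1) % 2**32
        if ack_num == expected:
            return True
    return False
```

Buffers are allocated only after `ack_is_valid` succeeds, so a flood of forged SYNs costs the server hash computations rather than memory.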
Sequence Number Summary
• Goals, Set 1:
  - Guard against old duplicates in one connection -> don't reuse sequence numbers until it's reasonably certain that old duplicates have disappeared
  - Even after a crash/restart -> hence tie to time -> but don't use a global clock -> hence need a 3-way handshake where each side verifies the sequence number chosen by the other side before a successful connection occurs
• Goals, Set 2:
  - Don't allow hijacking of connections unless the attacker can eavesdrop -> use a PRNG for the initial seq. number choice
  - Don't allow SYN attacks -> compute but don't store the initial sequence number
TCP Connection Management (cont.)
Closing a connection:
client closes socket: close(s);
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
Figure: client sends FIN; server replies with ACK, then closes and sends its own FIN; client replies with ACK and stays in timed wait before the connection is closed.
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK
  - Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with a small modification, can handle simultaneous FINs.
Figure: client and server exchange FIN and ACK; both pass through closing, the client passes through timed wait, then both are closed.
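From the application's side, the FIN exchange can be observed on loopback with the standard `socket` module: `shutdown(SHUT_WR)` makes the kernel send a FIN, and the peer sees it as `recv()` returning an empty byte string. A minimal sketch (the function name is hypothetical):

```python
import socket


def half_close_demo():
    """Client closes its sending direction first; server finishes, then closes."""
    sv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sv.bind(("127.0.0.1", 0))
    sv.listen(1)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server = None
    try:
        client.connect(sv.getsockname())
        server, _ = sv.accept()
        client.shutdown(socket.SHUT_WR)  # Step 1: client sends FIN
        eof_at_server = server.recv(1)   # b"" once the FIN arrives (kernel ACKs it)
        server.sendall(b"bye")           # server may still send before it closes
        server.close()                   # Step 2: server sends its own FIN
        data = b""
        while True:                      # read until the server's FIN (EOF)
            chunk = client.recv(16)
            if not chunk:
                break
            data += chunk
        return eof_at_server, data
    finally:
        # The client did the active close, so the kernel keeps its side in TIME_WAIT.
        for sock in (client, server, sv):
            if sock is not None:
                sock.close()
```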
TCP Connection FSM
Figure: TCP connection-management finite state machine
• The heavy solid line is the normal path for a client
• The heavy dashed line is the normal path for a server
• The light lines are unusual events
• Each transition is labeled by the event causing it and the action resulting from it, separated by a slash
TCP Connection Management (cont'd)
Figure: TCP client lifecycle; TCP server lifecycle
Closing a Connection
• Note: previous charts showed the normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  - No: famous two-army problem
Summary
• TCP segments, acknowledgements & retransmission
  - Delayed ACKs, Nagle's algorithm
  - Fast retransmit
• RTT estimation & Karn's algorithm
• Flow control & silly window syndrome
• Connection management in TCP
• Attacks against TCP's connection management scheme
  - SYN attack
  - Sequence number prediction attacks
TCP Miscellaneous
• MSS: Maximum Segment Size option
  - Client and server agree on a larger-than-default MSS (default 536 bytes outside the same subnet) via an option on the SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for the receive window, to allow for LFNs ("elephants"), i.e., Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• …
Study of TCP: Outline
• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP Congestion Control
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control
• manifestations:
  - long delays (queueing in router buffers)
  - lost packets (buffer overflow at routers)
• a top-10 problem
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
Figure: Host A sends original data at rate λin through a router with unlimited shared output link buffers; Host B receives at rate λout.
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
Figure: Host A sends λin (original data) and λ'in (original data plus retransmitted data) through a router with finite shared output link buffers; Host B receives at rate λout.
Causes/costs of congestion: scenario 2
• Always: λin = λout (goodput)
• a) if no loss: λ'in = λin
• b) assume a clairvoyant sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes λ'in larger for the same λout (every packet transmitted twice)
"Costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: the link carries multiple copies of a packet
Figure: plots (a), (b), (c) of λout versus λin, with axes marked at R/2 and λout levels marked at R/2, R/3, and R/4.
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λin and λ'in increase?
Figure: Host A sends λin (original data) and λ'in (original data plus retransmitted data) over multihop paths through routers with finite shared output link buffers; Host B receives at rate λout.
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse
Figure: λout between Host A and Host B.
Reasons for Congestion Control
• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case
Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate the sender should send at
TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission:
  $\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$
• roughly, rate = CongWin / RTT bytes/sec
• CongWin and RTT influence throughput
• CongWin is a dynamic function of perceived network congestion
How does the sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after a loss event
• (assumes congestion is the primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
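The sender-side limit and the rough rate estimate as a minimal sketch; the function names are hypothetical and all quantities are in bytes and seconds.

```python
def usable_window(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """Bytes the sender may still put in flight, keeping
    LastByteSent - LastByteAcked <= min(CongWin, RcvWindow)."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cong_win, rcv_window) - in_flight)


def approx_rate(cong_win_bytes, rtt_s):
    """Rough throughput: about one congestion window per RTT, in bytes/sec."""
    return cong_win_bytes / rtt_s
```

Whichever window is smaller governs: with CongWin = 8000 and RcvWindow = 5000, the receiver's window is the binding limit.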
TCP AIMD
• multiplicative decrease: cut CongWin in half after a loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing
Figure: congestion window (ticks at 8, 16, 24 Kbytes) versus time for a long-lived TCP connection, showing the AIMD sawtooth.
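A minimal AIMD trace, assuming the window is counted in MSS, loss events are given as a set of RTT indices, and slow start is ignored (as for the long-lived connection in the figure). The function name is hypothetical.

```python
def aimd_trace(rtts, loss_rounds, start_cwnd=1.0):
    """Congestion window (in MSS) at the start of each RTT under pure AIMD."""
    cwnd = start_cwnd
    trace = []
    for t in range(rtts):
        trace.append(cwnd)
        if t in loss_rounds:
            cwnd = max(cwnd / 2, 1.0)  # multiplicative decrease, floor of 1 MSS
        else:
            cwnd += 1.0                # additive increase: +1 MSS per RTT
    return trace
```

For example, `aimd_trace(8, {3}, start_cwnd=10.0)` climbs 10, 11, 12, 13, halves to 6.5 after the loss in round 3, and then climbs linearly again.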
TCP Slow Start
• When a connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes & RTT = 200 msec
  - initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to a respectable rate
• When a connection begins, increase the rate exponentially fast until the first loss event
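The 20 kbps initial rate follows from sending one MSS per RTT:

```latex
\text{initial rate} = \frac{\text{MSS}}{\text{RTT}}
  = \frac{500\ \text{bytes} \times 8\ \text{bits/byte}}{0.2\ \text{s}}
  = \frac{4000\ \text{bits}}{0.2\ \text{s}}
  = 20{,}000\ \text{bits/s} = 20\ \text{kbps}
```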
TCP Slow Start (more)
• When a connection begins, increase the rate exponentially until the first loss event:
  - double CongWin every RTT
  - done by incrementing CongWin by 1 MSS for every ACK received
• Summary: the initial rate is slow but ramps up exponentially fast
Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart.
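A sketch showing why adding 1 MSS per ACK doubles CongWin every RTT, assuming every segment in a round is ACKed individually (no delayed ACKs) and no loss occurs. The function name is hypothetical.

```python
def slow_start_rounds(rounds, mss=1):
    """CongWin after each RTT when every ACK adds one MSS."""
    cwnd = 1 * mss
    history = [cwnd]
    for _ in range(rounds):
        segments_sent = cwnd // mss      # the whole window goes out in this RTT
        for _ack in range(segments_sent):
            cwnd += mss                  # +1 MSS per ACK received
        history.append(cwnd)
    return history
```

Each round returns as many ACKs as segments sent, so the window grows by its own size per RTT: 1, 2, 4, 8 segments, matching the figure.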
TCP Tahoe vs. Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At a loss event, Threshold is set to 1/2 of CongWin just before the loss event
• If timeout event: set CongWin to 1 MSS
• If triple-ACK event: set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)
Timeouts vs. 3 Dup ACKs
• After 3 dup ACKs:
  - CongWin is cut in half
  - window then grows linearly
• But after a timeout event:
  - CongWin is instead set to 1 MSS
  - window then grows exponentially
  - to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate the network is capable of delivering some segments
• a timeout before 3 dup ACKs are received is a stronger, "more alarming" indicator of congestion
Summary: TCP Congestion Control
TCP congestion control FSM (each transition written as event / actions; Λ means no event or no action):
Start: Λ / cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0 → slow start
slow start:
• new ACK / cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK / dupACKcount++
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
• cwnd > ssthresh / Λ → congestion avoidance
• dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3·MSS, retransmit missing segment → fast recovery
congestion avoidance:
• new ACK / cwnd = cwnd + MSS·(MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK / dupACKcount++
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment → slow start
• dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3·MSS, retransmit missing segment → fast recovery
fast recovery:
• duplicate ACK / cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK / cwnd = ssthresh, dupACKcount = 0 → congestion avoidance
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment → slow start
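The FSM as a compact Python sketch. Assumptions: windows are counted in bytes, `RenoSender` is a hypothetical name, and actual segment transmission is reduced to a `retransmits` counter.

```python
class RenoSender:
    """Congestion-control FSM with slow start, congestion avoidance, fast recovery."""

    def __init__(self, mss=1000, ssthresh=64 * 1024):
        self.mss = mss
        self.cwnd = 1 * mss        # cwnd = 1 MSS
        self.ssthresh = ssthresh   # ssthresh = 64 KB
        self.dup_acks = 0
        self.state = "slow_start"
        self.retransmits = 0       # counts "retransmit missing segment" actions

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh             # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += self.mss                 # +1 MSS per ACK
        else:
            self.cwnd += self.mss * self.mss / self.cwnd  # about +1 MSS per RTT
        self.dup_acks = 0
        if self.state == "slow_start" and self.cwnd > self.ssthresh:
            self.state = "congestion_avoidance"

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += self.mss                 # inflate for each further dup ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.retransmits += 1
            self.state = "fast_recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1 * self.mss
        self.dup_acks = 0
        self.retransmits += 1
        self.state = "slow_start"
```

Driving the object with a sequence of events reproduces the transitions above, e.g. three duplicate ACKs in congestion avoidance halve `ssthresh` and move the sender into fast recovery.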
Summary: TCP Congestion Control
• When CongWin is below Threshold, the sender is in the slow-start phase; the window grows exponentially
• When CongWin is above Threshold, the sender is in the congestion-avoidance phase; the window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS
TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unACKed data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Results in a doubling of CongWin every RTT
ACK receipt for previously unACKed data | Congestion Avoidance (CA) | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being ACKed | CongWin and Threshold not changed
TCP Throughput (Idealized)
• What's the average throughput of TCP as a function of window size and RTT?
  - Long-lived connection; ignore slow start
• When the window is W, throughput is W/RTT
• Just after a loss, the window drops to W/2 and throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT
Figure: window oscillating between W/2 and W.
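A quick numeric check of the 0.75 factor, assuming the window grows linearly from W/2 back to W between losses and is sampled evenly over one cycle. The function name is hypothetical.

```python
def average_sawtooth_window(w, steps=1000):
    """Mean window over one idealized cycle rising linearly from W/2 to W."""
    samples = [w / 2 + (w / 2) * i / steps for i in range(steps + 1)]
    return sum(samples) / len(samples)
```

The mean of a linear ramp is the midpoint of its endpoints, (W/2 + W)/2 = 0.75 W; dividing by RTT gives the 0.75 W/RTT average throughput.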
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
  $\text{throughput} = \dfrac{1.22 \cdot \text{MSS}}{\text{RTT} \cdot \sqrt{L}}$
• ⇒ $L = 2 \cdot 10^{-10}$. Very low.
• Requires an almost perfect link
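The example's numbers can be reproduced from W = rate × RTT / MSS and by solving the throughput formula for L; a sketch with hypothetical function names (MSS in bytes, converted to bits):

```python
def window_segments(rate_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain rate_bps: W = rate * RTT / MSS."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def loss_rate_for(rate_bps, rtt_s, mss_bytes):
    """Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) for the tolerable loss rate L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2
```

For 10 Gbps, 100 ms, and 1500-byte segments this gives about 83,333 segments in flight and L ≈ 2.1 · 10^-10, i.e., roughly one loss per five billion segments.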
Equation-Based Control
• Note: TCP congestion control forms a control loop
  - Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info
TCP Fairness
TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.
Why is TCP fair?
Consider two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
Figure: connection 2 throughput versus connection 1 throughput (both axes up to R), with the equal-bandwidth-share line; alternating phases of "congestion avoidance: additive increase" and "loss: decrease window by factor of 2" move the operating point toward the equal share.
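A toy simulation of the argument, assuming both connections have the same RTT, each adds one unit per round, and both see a loss in the same round whenever their combined rate exceeds the capacity. The function name is hypothetical.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Two AIMD flows on one bottleneck with synchronized losses."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2   # multiplicative decrease halves the gap too
        else:
            x1, x2 = x1 + 1, x2 + 1   # additive increase keeps the gap constant
    return x1, x2
```

Additive increase moves both rates along a slope-1 line, leaving their difference unchanged, while each halving also halves the difference; starting from 80 versus 10 on a link of capacity 100, the two rates end up nearly equal after a few hundred rounds.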
Fairness (more)
Fairness and UDP:
• Multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
• Instead use UDP:
  - pump audio/video at a constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections:
• nothing prevents an app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
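Assuming each connection crossing the bottleneck gets an equal share, an app that opens n connections next to the 9 existing ones receives:

```latex
\text{share}(n) = \frac{n}{n+9}\,R, \qquad
\text{share}(1) = \frac{1}{10}\,R, \qquad
\text{share}(9) = \frac{9}{18}\,R = \frac{R}{2}
```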
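TCP retransmission scenarios (more)
Figure: cumulative ACK scenario. Host A sends Seq=92 (8 bytes data) and Seq=100 (20 bytes data); Host B's ACK=100 is lost (X), but ACK=120 arrives before the timeout, so SendBase = 120 and the first segment need not be retransmitted.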
TCP ACK generation [RFC 1122, RFC 2581]
Event at Receiver | TCP Receiver Action
Arrival of in-order segment with expected seq. #. All data up to expected seq. # already ACKed | Delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK
Arrival of in-order segment with expected seq. #. One other segment has ACK pending | Immediately send single cumulative ACK, ACKing both in-order segments
Arrival of out-of-order segment with higher-than-expected seq. #. Gap detected | Immediately send duplicate ACK, indicating seq. # of next expected byte
Arrival of segment that partially or completely fills gap | Immediately send ACK, provided that segment starts at lower end of gap
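A sketch of the receiver policy in the table. Assumptions: the 500 ms delayed-ACK timer is modeled by an explicit `on_delack_timeout()` call, out-of-order segments are buffered, and re-ACKing an old duplicate segment is a choice this sketch makes (the table does not cover it). Class and method names are hypothetical.

```python
class AckingReceiver:
    """Receiver-side ACK generation following the table's four rules."""

    def __init__(self, expected):
        self.expected = expected   # next in-order byte expected
        self.buffered = {}         # seq -> length of buffered out-of-order segments
        self.ack_pending = False   # an in-order segment is waiting for a delayed ACK

    def on_segment(self, seq, length):
        """Return the ACK number sent immediately, or None if the ACK is delayed."""
        if seq == self.expected:
            gap_above = bool(self.buffered)  # segment starts at the lower end of a gap
            self.expected += length
            while self.expected in self.buffered:  # deliver data the segment unblocked
                self.expected += self.buffered.pop(self.expected)
            if gap_above or self.ack_pending:
                self.ack_pending = False
                return self.expected   # fills gap, or second in-order segment: ACK now
            self.ack_pending = True
            return None                # delayed ACK: wait for the next segment
        if seq > self.expected:
            self.buffered[seq] = length
            return self.expected       # gap detected: duplicate ACK
        return self.expected           # old duplicate: re-ACK (assumed behavior)

    def on_delack_timeout(self):
        """Delayed-ACK timer fired with no next segment: send the pending ACK."""
        if self.ack_pending:
            self.ack_pending = False
            return self.expected
        return None
```

Feeding it the cumulative ACK scenario (Seq=92 with 8 bytes, then Seq=100 with 20 bytes) produces a delayed ACK for the first segment and a single cumulative ACK=120 for both.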
Nagle's algorithm
• Consider again apps writing byte-by-byte (or in small chunks)
  - If you send 1-byte segments with 20 bytes of TCP header overhead plus the overhead of lower-layer headers, you get a huge waste of network capacity
• Nagle:
  - Transmit the first byte
  - Buffer outgoing bytes until an ACK has been received, then send them all at once
• You can turn this off:
  - Use setsockopt(IPPROTO_TCP, TCP_NODELAY)
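A small simulation of Nagle's rule plus the real Python call for disabling it. In the simulation, `nagle()` and its event format are hypothetical, and an ACK is assumed to acknowledge all outstanding data; `socket.setsockopt` with `IPPROTO_TCP` and `TCP_NODELAY` is the standard-library counterpart of the C call.

```python
import socket


def nagle(events, mss=1460):
    """events: ("write", nbytes) or ("ack",) tuples in arrival order.
    Returns the sizes of the segments the sender transmits."""
    sent = []
    queued = 0           # bytes written by the app but not yet transmitted
    outstanding = False  # is any transmitted data still unACKed?
    for event in events:
        if event[0] == "write":
            queued += event[1]
        elif event[0] == "ack":
            outstanding = False  # assumption: one ACK covers all outstanding data
        while queued >= mss:     # full-sized segments may always be sent
            sent.append(mss)
            queued -= mss
            outstanding = True
        if queued and not outstanding:  # small segment only when nothing is unACKed
            sent.append(queued)
            queued = 0
            outstanding = True
    return sent


def disable_nagle(sock):
    """Python equivalent of setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one)."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
```

Three 1-byte writes followed by an ACK produce a 1-byte segment and then a single 2-byte segment, rather than three 1-byte segments.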
Delayed ACK vs. Nagle
• These two mechanisms have nothing to do with each other
  - often confused, especially by various online sources
  - TCP_NODELAY means "disable Nagle", not "disable delayed ACKs"
• Delayed ACK is implemented by the TCP receiver
  - Delays sending an acknowledgement (because ACKs are cumulative, this can reduce the # of ACKs sent)
• Nagle's algorithm is implemented by the TCP sender
  - Delays sending actual data (so that more data can be fit into a single segment)
TCP Timeout & Fast Retransmit
TCP Round Trip Time and Timeout
Q: how to set the TCP timeout value?
• longer than RTT
  - but RTT varies
• too short: premature timeout
  - unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
• SampleRTT will vary; want the estimated RTT "smoother"
  - average several recent measurements, not just the current SampleRTT
RTT Distributions
Figure: (a) probability density of ACK arrival times in the data link layer; (b) probability density of ACK arrival times for TCP.
TCP Round Trip Time and Timeout
$$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$$
• exponential weighted moving average
• influence of a past sample decreases exponentially fast
• typical value: α = 0.125
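One update step with hypothetical values EstimatedRTT = 100 ms, SampleRTT = 180 ms, and α = 0.125:

```latex
\text{EstimatedRTT}_{\text{new}} = (1 - 0.125)\cdot 100\ \text{ms} + 0.125\cdot 180\ \text{ms}
  = 87.5\ \text{ms} + 22.5\ \text{ms} = 110\ \text{ms}
```

An 80 ms jump in a single sample moves the estimate by only 10 ms, which is the smoothing the slide asks for.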
Example RTT estimation
Figure: SampleRTT and EstimatedRTT from gaia.cs.umass.edu to fantasia.eurecom.fr; RTT (milliseconds, 100 to 350) versus time (seconds, 1 to 106).
TCP Round Trip Time and Timeout
Setting the timeout:
• EstimatedRTT plus a "safety margin"
  - large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:
  $$\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot\left|\text{SampleRTT} - \text{EstimatedRTT}\right|$$
• then set the timeout interval:
  $$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$$
where
  $$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$$
(typically α = 0.125, β = 0.25)
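The three formulas combined into a sketch. Assumptions: DevRTT is updated before EstimatedRTT and the first sample initializes DevRTT to half the sample (the order and initialization RFC 6298 specifies), while RFC 6298's 1-second minimum and clock-granularity term are left out. `RttEstimator` is a hypothetical name.

```python
class RttEstimator:
    """EWMA RTT estimator with the DevRTT safety margin."""

    def __init__(self, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = None
        self.dev_rtt = 0.0

    def update(self, sample_rtt):
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        if self.estimated_rtt is None:
            self.estimated_rtt = sample_rtt   # first sample
            self.dev_rtt = sample_rtt / 2
        else:
            # DevRTT first, so it uses the previous EstimatedRTT.
            self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                            + self.beta * abs(sample_rtt - self.estimated_rtt))
            self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                                  + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt
```

With samples of 100 ms and then 180 ms, EstimatedRTT becomes 110 ms, DevRTT 57.5 ms, and the TimeoutInterval 340 ms: the deviation term, not the average, supplies most of the safety margin after a sudden jump.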
CS 4284 Spring 2013
RTT Measurement amp Retransmissions
bull How should you handle acks for retransmitted segments when measuring RTTndash Note ACK could be delayed for original
segment or early for retransmitted segmentbull Choices
ndash Associate with original packet may overestimate true RTT
ndash Associate with retransmitted packet may underestimate true RTT
CS 4284 Spring 2013
Karnrsquos Algorithmbull Idea donrsquot consider samples from retransmitted
segmentsndash Ok but what if current timeout value is too small for network
delayndash In this case would keep timing out but couldnrsquot adjust timeout
according to formulabull Solution If measurement canrsquot be made use exponential
backoff until new measurement can be madendash By factor of 2 with limit
bull Resume normal sampling algorithm afterwardsndash Usual sampling period is once per RTT
bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet
CS 4284 Spring 2013
Fast Retransmitbull Time-out period often relatively long
ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs
ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs
bull If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lostndash fast retransmit resend segment before timer expires
CS 4284 Spring 2013
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
TCP
Flow Control
CS 4284 Spring 2013
TCP Flow Controlbull receive side of
TCP connection has a receive buffer
bull speed-matching service matching the send rate to the receiving apprsquos drain rate
bull app process may be slow at reading from buffer
sender wonrsquot overflow
receiverrsquos buffer bytransmitting too
much too fast
flow control
CS 4284 Spring 2013
TCP Flow Control (contrsquod)
(Suppose TCP receiver discards out-of-order segments)
bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Rcvr advertises spare room by including value of RcvWindow in segments
bull Sender limits unACKed data to RcvWindowndash guarantees
receive buffer doesnrsquot overflow
CS 4284 Spring 2013
Flow Control Persistence Timerbull Suppose sender is blocked cause receiver
application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size
RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck
ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update
CS 4284 Spring 2013
Flow Control Silly Window Syndrome
Clarkrsquos solution donrsquot advertise windows below a certain size
CS 4284 Spring 2013
Study of TCP Outlinebull segment structurebull reliable data transfer
ndash delayed ACKsndash Naglersquos algorithm
bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control
TCP
Connection Management
CS 4284 Spring 2013
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control info
(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr
hellip)
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
CS 4284 Spring 2013
TCP 3-way handshake
TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums
CS 4284 Spring 2013
3-way Handshake amp Delayed Dups
(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)
CS 4284 Spring 2013
Sequence Number Reuse
bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a
connection with identical sequence numbersbull Must also guard against sequence number prediction attack
ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())
CS 4284 Spring 2013
When Sequence Numbers Attack
bull Suppose attacker A can predict sequence number a host B is going to use next
bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions
based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is
likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B
CS 4284 Spring 2013
When SYNs Attackbull Servers receiving SYN must allocate resources
ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
bull Solutionndash SYN cookies
bull Server creates ACK number sends ACK ndash but does not allocate buffers
ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct
Sequence Number Summarybull Goals Set 1
ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared
ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs
bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can
eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial
sequence number
CS 4284 Spring 2013
CS 4284 Spring 2013
TCP Connection Management (cont)Closing a connection
client closes socket close(s)
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
timed
wai
t
CS 4284 Spring 2013
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
timed
wai
tclosed
CS 4284 Spring 2013
TCP ConnectionFSM
The heavy solid line is the normal path for a client
The heavy dashed line is the normal path for a server
The light lines are unusual events
Each transition is labeled by the event causing it and the action resulting from it separated by a slash
CS 4284 Spring 2013
TCP Connection Management (contrsquod)
TCP client lifecycle TCP server lifecycle
CS 4284 Spring 2013
Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if
packets (FIN ACK) can be lostndash No Famous two-army problem
CS 4284 Spring 2013
Summarybull TCP segments acknowledgements amp
retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit
bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management
schemendash SYN attackndash Sequence number prediction attacks
CS 4284 Spring 2013
TCP Miscellaneousbull MSS Maximum Segment Size Option
ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN
bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to
allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT
measurement PAWS for protection against wrap-around for sequence numbers
bull hellip
CS 4284 Spring 2013
Study of TCP Outlinebull segment structurebull reliable data transfer
ndash delayed ACKsndash Naglersquos algorithm
bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control
TCP
Congestion Control
CS 4284 Spring 2013
Principles of Congestion Control
Congestionbull informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull different from flow controlbull manifestations
ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)
bull a top-10 problem
CS 4284 Spring 2013
Causescosts of congestion scenario 1 bull two senders two
receiversbull one router infinite
buffers bull no retransmission
bull large delays when congested
bull (but no reduction in throughput here)
unlimited shared output link buffers
Host Alin original data
Host B
lout
CS 4284 Spring 2013
Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet
after timeout
finite shared output link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2lin
l out
a c
R2
R2lin
l out
R4
R2
R2lin
l out
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit
Q what happens as lin and lrsquoin increase
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission
capacity used for that packet was wastedbull ultimately leads to congestion collapse
Host A
Host B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even if no packet loss occurs
bull If packet loss occurs needed retransmission require offered load to be greater than goodput
bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion control
bull no explicit feedback from network
bull congestion inferred from end-system observed loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systemsndash single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Controlbull end-end control (no network
assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput
bull CongWin is dynamic function of perceived network congestion
How does sender notice congestion
bull loss event = timeout or 3 duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is primary cause of loss)
Three mechanismsbull AIMDbull slow startbull fast recovery
rate = CongWin
RTT Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
multiplicative decrease cut CongWin in half after loss event
additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Startbull When connection
begins CongWin = 1 MSSndash Example MSS =
500 bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly
ramp up to respectable rate
bull When connection begins increase rate exponentially fast until first loss event
CS 4284 Spring 2013
TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
  - double CongWin every RTT
  - done by incrementing CongWin by 1 MSS for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart]
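To see why adding 1 MSS per ACK doubles the window each RTT, this Python sketch counts the window in segments and assumes that every segment is ACKed individually (no delayed ACKs) and that nothing is lost:

```python
def slow_start_windows(rounds: int) -> list:
    """CongWin (in MSS) at the start of each RTT during loss-free slow start."""
    cwnd = 1
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        acks_this_rtt = cwnd          # one ACK per segment sent in this RTT
        for _ in range(acks_this_rtt):
            cwnd += 1                 # +1 MSS for every ACK received
    return history


print(slow_start_windows(5))   # [1, 2, 4, 8, 16]
```

If the receiver ACKed only every other segment, fewer ACKs would arrive per RTT and the growth would be slower than doubling.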
TCP Tahoe vs Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ACK event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)
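The difference between the two variants fits in one function. In this Python sketch (hypothetical names; windows in MSS), both variants halve Threshold on any loss; only Reno resumes from Threshold after a triple duplicate ACK.

```python
def on_loss(cwnd: float, kind: str, variant: str) -> tuple:
    """Return (new_cwnd, new_threshold) in MSS after a loss event.

    kind: "timeout" or "3dup"; variant: "tahoe" or "reno".
    """
    if kind not in ("timeout", "3dup") or variant not in ("tahoe", "reno"):
        raise ValueError("unknown loss kind or TCP variant")
    threshold = cwnd / 2                    # Threshold = 1/2 of CongWin before the loss
    if variant == "reno" and kind == "3dup":
        new_cwnd = max(threshold, 1)        # fast recovery: continue linearly from Threshold
    else:
        new_cwnd = 1                        # Tahoe always, Reno on timeout: back to 1 MSS
    return new_cwnd, threshold


print(on_loss(16, "3dup", "reno"))     # (8.0, 8.0)
print(on_loss(16, "3dup", "tahoe"))    # (1, 8.0)
```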
Timeouts vs 3-dup ACKs
• After 3 dup ACKs:
  - CongWin is cut in half
  - window then grows linearly
• But after timeout event:
  - CongWin instead set to 1 MSS
  - window then grows exponentially
  - to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate the network is capable of delivering some segments
• a timeout before 3 dup ACKs are received is a stronger, "more alarming" indicator of congestion
Summary: TCP Congestion Control (FSM)
Initial transition: cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0; enter slow start
slow start:
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh: go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
congestion avoidance:
• new ACK: cwnd = cwnd + MSS·(MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
fast recovery:
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
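The state machine above can be transcribed almost line by line. This Python sketch follows its transitions. The byte-based units, the 1460-byte default MSS, and the class and method names are assumptions. Actual segment transmission is left to the caller; `on_dup_ack` returns True when the missing segment should be retransmitted.

```python
class RenoSender:
    """Congestion-window bookkeeping for the slow start / congestion
    avoidance / fast recovery FSM (no actual packet I/O)."""

    def __init__(self, mss: int = 1460, ssthresh: float = 64 * 1024):
        self.mss = mss
        self.cwnd = mss                 # cwnd = 1 MSS
        self.ssthresh = ssthresh        # ssthresh = 64 KB
        self.dup_acks = 0
        self.state = "slow_start"

    def on_new_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh                     # deflate back to ssthresh
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += self.mss                         # doubles cwnd every RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += self.mss * self.mss / self.cwnd  # about +1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self) -> bool:
        if self.state == "fast_recovery":
            self.cwnd += self.mss                         # each extra dup ACK inflates cwnd
            return False
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = "fast_recovery"
            return True                                   # retransmit missing segment
        return False

    def on_timeout(self) -> None:
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.mss                              # caller retransmits missing segment
        self.dup_acks = 0
        self.state = "slow_start"


s = RenoSender(mss=1, ssthresh=8)    # count in segments for readability
for _ in range(8):
    s.on_new_ack()
print(s.state, s.cwnd)               # congestion_avoidance 9
```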
Summary: TCP Congestion Control
• When CongWin is below Threshold, the sender is in the slow-start phase; the window grows exponentially.
• When CongWin is above Threshold, the sender is in the congestion-avoidance phase; the window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unACKed data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Results in a doubling of CongWin every RTT
ACK receipt for previously unACKed data | Congestion Avoidance (CA) | CongWin = CongWin + MSS·(MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being ACKed | CongWin and Threshold not changed
TCP Throughput (Idealized)
• What's the average throughput of TCP as a function of window size and RTT?
  - Long-lived connection; ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75·W/RTT
[Figure: window size oscillating between W/2 and W]
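A quick numeric check of the 0.75·W/RTT figure. This Python sketch (hypothetical helper; it assumes an even W so each cycle runs from exactly W/2 to W, adding 1 MSS per RTT) averages the window over one sawtooth cycle:

```python
def avg_sawtooth_throughput(w_segments: int, mss_bytes: int, rtt_sec: float) -> float:
    """Average bytes/sec over one idealized cycle W/2, W/2+1, ..., W (one window per RTT)."""
    windows = range(w_segments // 2, w_segments + 1)
    avg_window = sum(windows) / len(windows)
    return avg_window * mss_bytes / rtt_sec


w, mss, rtt = 20, 1000, 0.5
print(avg_sawtooth_throughput(w, mss, rtt))   # 30000.0
print(0.75 * w * mss / rtt)                   # 30000.0
```

Because the window grows linearly, the average of the sequence is exactly (W/2 + W)/2 = 0.75·W.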
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
  $\text{Throughput} = \frac{1.22 \cdot \text{MSS}}{\text{RTT} \cdot \sqrt{L}}$
• For the example, $L = 2 \cdot 10^{-10}$: very low
• Requires an almost perfect link
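The example's two numbers follow from the formula above. This Python sketch (hypothetical helper names) computes the required window and inverts the loss formula, converting MSS from bytes to bits:

```python
import math


def window_segments(target_bps: float, mss_bytes: int, rtt_sec: float) -> float:
    """In-flight segments needed: one window per RTT at the target rate."""
    return target_bps * rtt_sec / (mss_bytes * 8)


def throughput_bps(mss_bytes: int, rtt_sec: float, loss: float) -> float:
    """1.22 * MSS / (RTT * sqrt(L)), expressed in bits per second."""
    return 1.22 * mss_bytes * 8 / (rtt_sec * math.sqrt(loss))


def loss_for(target_bps: float, mss_bytes: int, rtt_sec: float) -> float:
    """Invert the formula: sqrt(L) = 1.22 * MSS / (RTT * Throughput)."""
    return (1.22 * mss_bytes * 8 / (rtt_sec * target_bps)) ** 2


W = window_segments(10e9, 1500, 0.1)
L = loss_for(10e9, 1500, 0.1)
print(round(W))       # 83333
print(f"{L:.2e}")     # 2.14e-10
```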
Equation-Based Control
• Note: TCP congestion control forms a control loop
  - Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-dup-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info
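For illustration, here is a Python sketch of the throughput equation that RFC 5348 (TFRC) uses to turn the measured inputs into a sending rate. Symbols follow that RFC: s = segment size in bytes, R = round-trip time in seconds, p = loss event rate, b = packets acknowledged per ACK, t_RTO = retransmission timeout. The defaults b = 1 and t_RTO = 4R follow the simplifications the RFC recommends. Treat this as a transcription to check against RFC 5348, Section 3.1, not a TFRC implementation.

```python
import math
from typing import Optional


def tfrc_rate(s: float, R: float, p: float, b: int = 1,
              t_rto: Optional[float] = None) -> float:
    """Allowed sending rate in bytes/sec from the TCP throughput equation."""
    if not 0 < p <= 1:
        raise ValueError("loss event rate p must be in (0, 1]")
    if t_rto is None:
        t_rto = 4 * R                      # simplification recommended by RFC 5348
    denom = (R * math.sqrt(2 * b * p / 3)
             + t_rto * (3 * math.sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2))
    return s / denom


for p in (0.0001, 0.01, 0.1):
    print(p, round(tfrc_rate(1500, 0.1, p)))
```

For small p the second term vanishes and the rate approaches sqrt(3/2)·s/(R·sqrt(p)) ≈ 1.22·s/(R·sqrt(p)), the same form as the loss-rate formula on the previous slide.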
TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connections 1 and 2 sharing a bottleneck router of capacity R]
Why is TCP fair?
Consider two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: connection 2 throughput vs. connection 1 throughput, both axes up to R, with the equal-bandwidth-share line; repeated cycles of congestion avoidance (additive increase) and loss (decrease window by factor of 2)]
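The geometric argument can be replayed numerically. This Python sketch (hypothetical function name; rates in MSS per RTT) assumes both flows see a loss in the same RTT whenever their combined rate exceeds capacity. That synchronization is a simplifying assumption.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, rounds: int) -> tuple:
    """Synchronized AIMD for two flows sharing one bottleneck."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2     # multiplicative decrease halves the gap
        else:
            x1, x2 = x1 + 1, x2 + 1     # additive increase leaves the gap unchanged
    return x1, x2


a, b = aimd_two_flows(1, 80, capacity=100, rounds=500)
print(round(a, 3), round(b, 3))         # nearly equal
```

Each loss halves the difference between the two rates, while additive increase never widens it, so the flows converge toward the equal-share line.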
Fairness (more)
Fairness and UDP:
• Multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
• Instead use UDP:
  - pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections:
• nothing prevents an app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
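The R/10 and R/2 figures follow from a per-connection equal split. This Python sketch (hypothetical function name) assumes the link divides R equally among all connections:

```python
from fractions import Fraction


def new_app_share(existing_conns: int, new_conns: int) -> Fraction:
    """Fraction of link rate R the new app gets under an equal per-connection split."""
    return Fraction(new_conns, existing_conns + new_conns)


print(new_app_share(9, 1))   # 1/10
print(new_app_share(9, 9))   # 1/2
```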
TCP ACK generation [RFC 1122, RFC 2581]
Event at Receiver | TCP Receiver Action
Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed | Delayed ACK: wait up to 500 ms for next segment; if no next segment, send ACK
Arrival of in-order segment with expected seq #; one other segment has ACK pending | Immediately send single cumulative ACK, ACKing both in-order segments
Arrival of out-of-order segment with higher-than-expected seq #; gap detected | Immediately send duplicate ACK, indicating seq # of next expected byte
Arrival of segment that partially or completely fills gap | Immediately send ACK, provided that segment starts at lower end of gap
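The table reads naturally as a decision function. This Python sketch uses hypothetical parameter names, and its strings stand in for the real actions. It checks the rows in an order that keeps them unambiguous; the lower end of the gap is the expected sequence number.

```python
def receiver_action(seg_start: int, expected: int, ack_pending: bool,
                    gap_buffered: bool) -> str:
    """Pick the receiver action from the ACK-generation table.

    seg_start:    first sequence number in the arriving segment
    expected:     next in-order byte the receiver is waiting for
    ack_pending:  an earlier in-order segment is still waiting on a delayed ACK
    gap_buffered: out-of-order data above `expected` is already buffered
    """
    if seg_start > expected:
        return "send duplicate ACK now"        # gap detected
    if seg_start == expected and gap_buffered:
        return "send ACK now"                  # segment fills (part of) the gap
    if seg_start == expected and ack_pending:
        return "send cumulative ACK now"       # covers both in-order segments
    if seg_start == expected:
        return "delay ACK (up to 500 ms)"
    return "not covered by the table"          # e.g. an old duplicate segment
```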
Nagle's algorithm
• Consider again apps writing byte-by-byte (or in small chunks)
  - If you send 1-byte segments with 20-byte TCP header overhead + overhead of lower-layer headers, you get a huge waste of network capacity
• Nagle:
  - Transmit first byte
  - Buffer outgoing bytes until ACK has been received, then send all at once
• You can turn this off:
  - Use setsockopt() with level IPPROTO_TCP and option TCP_NODELAY
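Both halves of this slide fit in a short Python sketch. `nagle_should_send` is a hypothetical helper expressing the sender's buffering rule. The standard `socket` module shows how an application disables Nagle; note that the option level is IPPROTO_TCP.

```python
import socket


def nagle_should_send(buffered: int, unacked: int, mss: int) -> bool:
    """Nagle's rule: send now if nothing is unACKed or a full segment is ready;
    otherwise keep buffering until an ACK arrives."""
    return unacked == 0 or buffered >= mss


def make_nodelay_socket() -> socket.socket:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY is a TCP-level option, so the level is IPPROTO_TCP, not SOL_SOCKET
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s


sock = make_nodelay_socket()
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)   # True
sock.close()
```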
Delayed ACK vs Nagle
• These two mechanisms have nothing to do with each other
  - often confused, especially by various online sources
  - TCP_NODELAY means "disable Nagle," not "disable delayed ACKs"
• Delayed ACK implemented by TCP receiver
  - Delays sending an acknowledgement (because ACKs are cumulative, this can reduce the # of ACKs sent)
• Nagle's algorithm implemented by TCP sender
  - Delays sending actual data (so that more data can be fit into a single segment)
TCP Timeout & Fast Retransmit
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  - but RTT varies
• too short: premature timeout
  - unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
• SampleRTT will vary; want estimated RTT "smoother"
  - average several recent measurements, not just current SampleRTT
RTT Distributions
(a) Probability density of ACK arrival times in the data link layer (b) Probability density of ACK arrival times for TCP
TCP Round Trip Time and Timeout
$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
• exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125
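This Python sketch (hypothetical helper names) shows one EWMA update and, concretely, how fast an old sample's influence fades. With α = 0.125, the sample taken k updates ago still carries weight α(1 − α)^k.

```python
def ewma_update(estimated: float, sample: float, alpha: float = 0.125) -> float:
    """EstimatedRTT = (1 - alpha) * EstimatedRTT + alpha * SampleRTT."""
    return (1 - alpha) * estimated + alpha * sample


def sample_weight(k: int, alpha: float = 0.125) -> float:
    """Weight that the sample taken k updates ago still carries."""
    return alpha * (1 - alpha) ** k


print(ewma_update(100.0, 200.0))                        # 112.5
print([round(sample_weight(k), 4) for k in range(4)])   # [0.125, 0.1094, 0.0957, 0.0837]
```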
Example RTT estimation:
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; SampleRTT and EstimatedRTT in milliseconds (100 to 350) vs. time in seconds (1 to 106)]
TCP Round Trip Time and Timeout
Setting the timeout:
• EstimatedRTT plus "safety margin"
  - large variation in EstimatedRTT: larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:
  $\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT} - \text{EstimatedRTT}|$
• then set the timeout interval:
  $\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
where
  $\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
(typically α = 0.125, β = 0.25)
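Putting the three formulas together gives the Python sketch below (α = 0.125, β = 0.25; class name hypothetical). Two details are assumptions taken from RFC 2988 rather than from the slide. First, the initial sample sets EstimatedRTT = SampleRTT and DevRTT = SampleRTT/2. Second, DevRTT is updated using the previous EstimatedRTT. RFC 2988's clock-granularity term and 1-second minimum timeout are omitted.

```python
class RttEstimator:
    def __init__(self, alpha: float = 0.125, beta: float = 0.25):
        self.alpha, self.beta = alpha, beta
        self.est = None      # EstimatedRTT
        self.dev = None      # DevRTT

    def update(self, sample: float) -> float:
        """Feed one SampleRTT; return the new TimeoutInterval."""
        if self.est is None:                 # first measurement (RFC 2988 style)
            self.est = sample
            self.dev = sample / 2
        else:
            # DevRTT uses the EstimatedRTT from before this sample
            self.dev = (1 - self.beta) * self.dev + self.beta * abs(sample - self.est)
            self.est = (1 - self.alpha) * self.est + self.alpha * sample
        return self.timeout()

    def timeout(self) -> float:
        return self.est + 4 * self.dev       # TimeoutInterval


est = RttEstimator()
print(est.update(100.0))   # 300.0
print(est.update(200.0))   # 362.5
```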
RTT Measurement & Retransmissions
• How should you handle ACKs for retransmitted segments when measuring RTT?
  - Note: ACK could be delayed for original segment or early for retransmitted segment
• Choices:
  - Associate with original packet: may overestimate true RTT
  - Associate with retransmitted packet: may underestimate true RTT
Karn's Algorithm
• Idea: don't consider samples from retransmitted segments
  - OK, but what if current timeout value is too small for network delay?
  - In this case, would keep timing out but couldn't adjust timeout according to formula
• Solution: if measurement can't be made, use exponential backoff until new measurement can be made
  - By factor of 2, with limit
• Resume normal sampling algorithm afterwards
  - Usual sampling period is once per RTT
• See also RFC 2988
  - TCP timestamp option can be used to remove ambiguity about which ACK matches which packet
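A Python sketch of a retransmission timer applying Karn's rule plus exponential backoff. The class name and the defaults (3 s initial timeout, 60 s cap) are illustrative assumptions. The RTT formulas are the ones from the previous slides, and the first-sample initialization follows RFC 2988.

```python
class KarnTimer:
    def __init__(self, initial_rto: float = 3.0, max_rto: float = 60.0,
                 alpha: float = 0.125, beta: float = 0.25):
        self.rto = initial_rto
        self.max_rto = max_rto
        self.alpha, self.beta = alpha, beta
        self.est = None
        self.dev = None

    def on_timeout(self) -> None:
        # No usable measurement: back off by a factor of 2, up to the limit
        self.rto = min(2 * self.rto, self.max_rto)

    def on_ack(self, rtt_sample: float, was_retransmitted: bool) -> None:
        if was_retransmitted:
            return      # ambiguous sample: ignore it, keep the backed-off timeout
        if self.est is None:
            self.est, self.dev = rtt_sample, rtt_sample / 2
        else:
            self.dev = (1 - self.beta) * self.dev + self.beta * abs(rtt_sample - self.est)
            self.est = (1 - self.alpha) * self.est + self.alpha * rtt_sample
        self.rto = self.est + 4 * self.dev   # resume normal computation
```

A retransmitted segment's ACK leaves the timeout unchanged. The next ACK for a segment sent only once supplies a clean sample and replaces the backed-off value.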
Fast Retransmit
• Time-out period often relatively long
  - long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  - Sender often sends many segments back-to-back
  - If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost
  - fast retransmit: resend segment before timer expires
Fast retransmit algorithm:
event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {
        /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            /* fast retransmit */
            resend segment with sequence number y
        }
    }
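The pseudocode above translates directly into Python. In this sketch the class name and `next_seq` (first byte not yet sent) are hypothetical. Resetting the duplicate count on every new ACK matches counting duplicates per ACK value y, because duplicates for y can only arrive while SendBase == y.

```python
class FastRetransmitSender:
    def __init__(self, send_base: int, next_seq: int):
        self.send_base = send_base
        self.next_seq = next_seq
        self.dup_count = 0
        self.timer_running = send_base < next_seq
        self.retransmitted = []              # seq numbers resent by fast retransmit

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            self.send_base = y
            self.dup_count = 0
            # (re)start the timer only if unACKed data remains
            self.timer_running = self.send_base < self.next_seq
        else:
            self.dup_count += 1              # duplicate ACK for already ACKed data
            if self.dup_count == 3:
                self.retransmitted.append(y) # resend segment starting at y


s = FastRetransmitSender(send_base=0, next_seq=4000)
for ack in (1000, 1000, 1000, 1000, 1000, 4000):
    s.on_ack(ack)
print(s.retransmitted, s.send_base, s.timer_running)   # [1000] 4000 False
```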
TCP Flow Control
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
TCP Flow Control (cont'd)
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer:
  $\text{RcvWindow} = \text{RcvBuffer} - (\text{LastByteRcvd} - \text{LastByteRead})$
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn't overflow
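The RcvWindow formula and the sender's limit, as a Python sketch (hypothetical helper names; byte counts are illustrative):

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room the receiver advertises: RcvBuffer - (LastByteRcvd - LastByteRead)."""
    buffered = last_byte_rcvd - last_byte_read     # received but not yet read by the app
    if not 0 <= buffered <= rcv_buffer:
        raise ValueError("buffered data must fit in the receive buffer")
    return rcv_buffer - buffered


def sendable(last_byte_sent: int, last_byte_acked: int, advertised: int) -> int:
    """Bytes the sender may still send so that unACKed data stays <= RcvWindow."""
    return max(0, advertised - (last_byte_sent - last_byte_acked))


win = rcv_window(4096, last_byte_rcvd=3000, last_byte_read=1000)
print(win)                         # 2096
print(sendable(3500, 3000, win))   # 1596
```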
Flow Control: Persistence Timer
• Suppose sender is blocked because the receiver application hasn't picked up data (advertised window is 0)
• Then receiver app reads n bytes
• TCP receiver advertises new window of size RcvWindow = n
• But suppose this advertisement is lost: sender would be stuck
  - Solution: persistence timer; sender sends a probe after a period of inactivity to provoke a window update
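A toy Python model of the deadlock and its fix. Everything here is hypothetical: the receiver's window update is simply dropped, and the persistence-timer handler sends a probe whose reply carries the receiver's current window. Probe timing is not modeled.

```python
class Receiver:
    def __init__(self, buffer_size: int):
        self.buffer_size = buffer_size
        self.unread = buffer_size          # buffer full: advertised window is 0

    def window(self) -> int:
        return self.buffer_size - self.unread

    def app_reads(self, n: int) -> int:
        self.unread -= n
        return self.window()               # the window update that would be sent


class Sender:
    def __init__(self):
        self.peer_window = 0               # blocked by a zero window
        self.probes = 0

    def on_persist_timer(self, receiver: Receiver) -> None:
        if self.peer_window == 0:          # still blocked: probe to provoke an update
            self.probes += 1
            self.peer_window = receiver.window()


rcv, snd = Receiver(4096), Sender()
lost_update = rcv.app_reads(1000)          # the advertisement of 1000 bytes is lost
snd.on_persist_timer(rcv)
print(snd.peer_window, snd.probes)         # 1000 1
```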
Flow Control: Silly Window Syndrome
Clark's solution: don't advertise windows below a certain size
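A Python sketch of Clark's rule. It assumes "a certain size" is the smaller of one MSS and half the receive buffer, the threshold RFC 1122 recommends for receiver-side silly window avoidance. The function name and the convention that None means "hold off" are hypothetical.

```python
def clark_window_update(free_space: int, rcv_buffer: int, mss: int):
    """Window to advertise after the app reads data, or None to hold off.

    Only reopen the window once it is worth a decent-sized segment:
    at least min(MSS, half the receive buffer).
    """
    threshold = min(mss, rcv_buffer // 2)
    return free_space if free_space >= threshold else None


print(clark_window_update(1, 8192, 1460))      # None: don't advertise a 1-byte window
print(clark_window_update(1460, 8192, 1460))   # 1460
```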
Study of TCP: Outline
• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP Connection Management
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  connect(s, &dstaddr, …)
• server: contacted by client
  cl = accept(sv, &caddr, …)
Three-way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
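The sequence/acknowledgement bookkeeping of the three steps, as a Python sketch with hypothetical ISNs and no real sockets. Each SYN consumes one sequence number, and numbers wrap modulo 2^32.

```python
from dataclasses import dataclass
from typing import Optional

MOD = 2 ** 32   # sequence numbers are 32-bit and wrap around


@dataclass
class Segment:
    flags: str
    seq: int
    ack: Optional[int] = None


def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Seq/ack numbers of SYN, SYNACK, ACK."""
    syn = Segment("SYN", seq=client_isn)                                  # step 1
    synack = Segment("SYNACK", seq=server_isn, ack=(syn.seq + 1) % MOD)   # step 2
    ack = Segment("ACK", seq=synack.ack, ack=(synack.seq + 1) % MOD)      # step 3
    return [syn, synack, ack]


for seg in three_way_handshake(client_isn=100, server_isn=5000):
    print(seg)
```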
TCP 3-way handshake
TCP connection establishment:
• Q1: why 3-way and not 2-way handshake?
• Q2: how do sender & receiver determine initial seq #s?
3-way Handshake & Delayed Dups
[Figure: (a) Normal operation; (b) Old SYN appearing out of nowhere; (c) Duplicate SYN and duplicate ACK following SYN]
3-way handshake required to deal with scenarios (b) and (c)
Sequence Number Reuse
• Idea: tie initial TCP seq numbers to a clock
  - Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  - Use PRNG; see [RFC 1948], [CERT 2001-09]
• RFC 1948: $\text{ISN} = \text{4}\,\mu\text{s clock value} + F(\text{src}, \text{dst}, \text{sport}, \text{dport}, \text{random()})$
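A Python sketch of the RFC 1948 scheme: a clock ticking every 4 µs plus a keyed hash of the connection identifier. RFC 1948 suggests MD5 for F; SHA-256 is swapped in here. The per-boot secret and the function name are assumptions.

```python
import hashlib
import os
import struct
import time
from typing import Optional

SECRET = os.urandom(16)   # assumption: random secret chosen once at boot


def rfc1948_isn(src: str, dst: str, sport: int, dport: int,
                now_us: Optional[int] = None, secret: bytes = SECRET) -> int:
    """ISN = M + F(src, dst, sport, dport, secret) mod 2**32."""
    if now_us is None:
        now_us = time.monotonic_ns() // 1000
    m = now_us // 4                                       # ticks once every 4 microseconds
    conn_id = f"{src}|{dst}|{sport}|{dport}".encode()
    digest = hashlib.sha256(conn_id + secret).digest()    # keyed hash of the connection id
    f = struct.unpack("!I", digest[:4])[0]                # first 32 bits of the hash
    return (m + f) % 2 ** 32


a = rfc1948_isn("10.0.0.1", "10.0.0.2", 5555, 80, now_us=1_000_000)
b = rfc1948_isn("10.0.0.1", "10.0.0.2", 5555, 80, now_us=1_000_004)
print((b - a) % 2 ** 32)   # 1: same connection id, one 4-microsecond tick later
```

Within one connection identifier the ISN still advances with the clock, but an attacker who sees the ISN of one connection learns nothing about F for a different 4-tuple without the secret.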
When Sequence Numbers Attack
• Suppose attacker A can predict the sequence number host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  - B believes it is talking to C; might grant permissions based on C's IP address
  - Attacker on A must suppress the RST packets C is likely to send; use a denial-of-service attack for that
• A sends message to compromise B
When SYNs Attack
• Servers receiving SYN must allocate resources
  - Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution: SYN cookies
  - Server computes its initial sequence number (the "cookie") and sends SYNACK, but does not allocate buffers
  - If the client continues with an ACK, check whether that ACK could have been sent in response to the cookie; if correct, then allocate buffers
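A simplified Python sketch of the SYN-cookie idea. The server derives its ISN from a keyed hash (HMAC-SHA256 here) of the 4-tuple, the client's ISN, and a coarse time slot, so it stores nothing after sending the SYNACK. When the final ACK arrives, it recomputes the cookie and checks ack = cookie + 1. The encoding is an assumption for illustration; real SYN-cookie implementations also pack a time counter and the negotiated MSS into the 32 bits.

```python
import hashlib
import hmac

SECRET = b"hypothetical-server-secret"
MOD = 2 ** 32


def syn_cookie(src, sport, dst, dport, client_isn, time_slot, secret=SECRET) -> int:
    """32-bit server ISN computed from the connection, not stored anywhere."""
    msg = f"{src}|{sport}|{dst}|{dport}|{client_isn}|{time_slot}".encode()
    return int.from_bytes(hmac.new(secret, msg, hashlib.sha256).digest()[:4], "big")


def ack_is_valid(src, sport, dst, dport, ack_seq, ack_num, time_slot, secret=SECRET) -> bool:
    """The client's ACK has seq = client ISN + 1 and should carry ack = cookie + 1."""
    client_isn = (ack_seq - 1) % MOD
    for slot in (time_slot, time_slot - 1):      # tolerate one slot boundary
        cookie = syn_cookie(src, sport, dst, dport, client_isn, slot, secret)
        if ack_num == (cookie + 1) % MOD:
            return True                          # only now allocate buffers
    return False


server_isn = syn_cookie("10.0.0.1", 5555, "10.0.0.2", 80, client_isn=1000, time_slot=7)
print(ack_is_valid("10.0.0.1", 5555, "10.0.0.2", 80, 1001, (server_isn + 1) % MOD, 7))  # True
```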
Sequence Number Summary
• Goals, Set 1:
  - Guard against old duplicates in one connection: don't reuse sequence numbers until it's reasonably certain that old duplicates have disappeared
  - Even after a crash and restart: hence tie to time, but don't use a global clock; hence the need for a 3-way handshake where each side verifies the sequence number chosen by the other side before the connection is established
• Goals, Set 2:
  - Don't allow hijacking of connections unless the attacker can eavesdrop: use PRNG for initial seq number choice
  - Don't allow SYN attacks: compute but don't store initial sequence number
TCP Connection Management (cont.)
Closing a connection:
client closes socket: close(s);
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client sends FIN and closes; server replies with ACK, then closes and sends FIN; client replies with ACK, waits in timed wait, then is closed]
TCP Connection Management (cont)
Step 3: client receives FIN, replies with ACK
  - Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: FIN/ACK exchange between client and server; both sides pass through closing, the client through timed wait, and both end closed]
TCP Connection FSM
The heavy solid line is the normal path for a client.
The heavy dashed line is the normal path for a server.
The light lines are unusual events.
Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.
TCP Connection Management (cont'd)
[Figures: TCP client lifecycle; TCP server lifecycle]
Closing a Connection
• Note: previous charts showed normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  - No! Famous two-army problem
Summary
• TCP segments, acknowledgements & retransmission
  - Delayed ACKs, Nagle's algorithm
  - Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
  - SYN attack
  - Sequence number prediction attacks
TCP Miscellaneous
• MSS: Maximum Segment Size option
  - Client/server agree on larger than default (536 outside same subnet) MSS via option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFNs ("elephants"): Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• …
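Without scaling, the 16-bit window field caps the window at 65,535 bytes, far below the bandwidth-delay product of the 10 Gbps / 100 ms example in the throughput slides. This Python sketch of RFC 1323 window scaling uses hypothetical helper names:

```python
def effective_window(advertised: int, shift: int) -> int:
    """RFC 1323: the 16-bit window field is shifted left by the negotiated count."""
    if not 0 <= advertised <= 0xFFFF:
        raise ValueError("window field is 16 bits")
    if not 0 <= shift <= 14:
        raise ValueError("RFC 1323 limits the shift count to 14")
    return advertised << shift


def min_shift_for(window_bytes: float) -> int:
    """Smallest shift whose maximum window covers window_bytes."""
    for shift in range(15):
        if (0xFFFF << shift) >= window_bytes:
            return shift
    raise ValueError("window too large even with the maximum shift of 14")


bdp = 10e9 * 0.1 / 8          # bytes in flight for 10 Gbps x 100 ms
print(min_shift_for(bdp))     # 11
```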
TCP Congestion Control
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  - long delays (queueing in router buffers)
  - lost packets (buffer overflow at routers)
• a top-10 problem!
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A sends $\lambda_{in}$ (original data) through a router with unlimited shared output link buffers; Host B receives $\lambda_{out}$]
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Host A sends $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data plus retransmitted data) through a router with finite shared output link buffers; Host B receives $\lambda_{out}$]
Causes/costs of congestion: scenario 2
• Always: $\lambda_{in} = \lambda_{out}$ (goodput)
• a) if no loss: $\lambda'_{in} = \lambda_{in}$
• b) assume clairvoyant sender: retransmission only when loss certain
• c) retransmission of both delayed and lost packets makes $\lambda'_{in}$ larger for same $\lambda_{out}$ (every packet transmitted twice)
"Costs" of congestion:
• more work (retransmissions) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: plots (a), (b), (c) of $\lambda_{out}$ vs. $\lambda'_{in}$, axes up to R/2; $\lambda_{out}$ reaches R/2 in (a), R/3 in (b), and R/4 in (c)]
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
[Figure: Host A sends $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data plus retransmitted data) over multihop paths through routers with finite shared output link buffers; Host B receives $\lambda_{out}$]
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted!
• ultimately leads to congestion collapse
[Figure: $\lambda_{out}$ delivered from Host A to Host B]
Reasons for Congestion Control
bull Congested networks increase delay even if no packet loss occurs
bull If packet loss occurs needed retransmission require offered load to be greater than goodput
bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion control
bull no explicit feedback from network
bull congestion inferred from end-system observed loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systemsndash single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Controlbull end-end control (no network
assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput
bull CongWin is dynamic function of perceived network congestion
How does sender notice congestion
bull loss event = timeout or 3 duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is primary cause of loss)
Three mechanismsbull AIMDbull slow startbull fast recovery
rate = CongWin
RTT Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
multiplicative decrease cut CongWin in half after loss event
additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Startbull When connection
begins CongWin = 1 MSSndash Example MSS =
500 bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly
ramp up to respectable rate
bull When connection begins increase rate exponentially fast until first loss event
CS 4284 Spring 2013
TCP Slow Start (more)bull When connection begins
increase rate exponentially until first loss eventndash double CongWin every
RTTndash done by incrementing CongWin for every ACK received
bull Summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
CS 4284 Spring 2013
TCP Tahoe vs RenoQ When should the
exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss
eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly
(this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKsbull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout eventndash CongWin instead set
to 1 MSS ndash window then grows
exponentiallyndash to a threshold then
grows linearly
bull 3 dup ACKs indicates network capable of delivering some segments
bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
CS 4284 Spring 2013
TCP Sender Congestion ControlEvent State TCP Sender Action Commentary
ACK receipt for previously unacked data
Slow Start (SS)
CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA
Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data
CongestionAvoidance (CA)
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK
SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter slow start
Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
CS 4284 Spring 2013
TCP Throughput Idealized
bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start
bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput
to W2RTT bull Average steady-state throughput 75 WRTT
W
W2
CS 4284 Spring 2013
TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT
want 10 Gbps throughputbull Requires window size W = 83333 in-flight
segmentsbull Throughput in terms of loss rate
bull L = 210-10 Very lowbull Require almost perfect link
LRTTMSS221
Equation-Based Control
bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which
are samples of timeout events + 3-ack events)bull Instead equation-based control uses an
equation to compute sending rate based on this input
bull See RFC 5348 for more info
CS 4284 Spring 2013
TCP Fairness
CS 4284 Spring 2013
Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk
TCP connection 1
bottleneckrouter
capacity RTCP connection 2
TCP Fairness
CS 4284 Spring 2013
Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughputConn
e ctio
n 2
thro
u ghp
ut
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
CS 4284 Spring 2013
Fairness (more)Fairness and UDPbull Multimedia apps often
do not use TCPndash do not want rate throttled
by congestion controlbull Instead use UDP
ndash pump audiovideo at constant rate tolerate packet loss
bull TCP friendliness
Fairness and parallel TCP connections
bull nothing prevents app from opening parallel connections between 2 hosts
bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets
rate R10ndash new app asks for 9 TCPs gets
R2bull Thatrsquos what ldquodownload
acceleratorsrdquo do
CS 4284 Spring 2013
Naglersquos algorithmbull Consider again apps writing byte-by-byte (or in
small chunks)ndash If you send 1-byte segments with 20 byte TCP
header overhead + overhead of lower layer headers ndashyou get a huge waste of network capacity
bull Naglendash Transmit first bytendash Buffer outgoing bytes until ack has been received ndash
then send all at oncebull You can turn this off via
ndash Use setsockopt (SOL_SOCKET TCP_NODELAY)
CS 4284 Spring 2013
Delayed ACK vs Naglebull These two mechanisms have nothing to do with
each otherndash often confused especially by various online sourcesndash TCP_NODELAY means ldquodisable Naglerdquo not ldquodisable
delayed ACKsrdquo
bull Delayed ACK implemented by TCP Receiverndash Delays sending an acknowledgement (because acks
are cumulative can reduce of acks sent)bull Naglersquos algorithm implemented by TCP Sender
ndash Delays sending actual data (so that more data can be fit into a single segment)
TCP
Timeout amp Fast Retransmit
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull longer than RTTndash but RTT varies
bull too short premature timeoutndash unnecessary
retransmissions
bull too long slow reaction to segment loss
Q how to estimate RTT
bull SampleRTT measured time from segment transmission until ACK receipt
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several
recent measurements not just current SampleRTT
CS 4284 Spring 2013
RTT Distributions
(a) Probability density of ACK arrival times in the data link layer (b) Probability density of ACK arrival times for TCP
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT
bull exponential weighted moving averagebull influence of past sample decreases
exponentially fastbull typical value = 0125
CS 4284 Spring 2013
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
CS 4284 Spring 2013
TCP Round Trip Time and TimeoutSetting the timeoutbull EstimatedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT larger safety marginbull first estimate of how much SampleRTT deviates
from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 0125 = 025)
EstimatedRTT = (1-)EstimatedRTT + SampleRTT
CS 4284 Spring 2013
RTT Measurement amp Retransmissions
bull How should you handle acks for retransmitted segments when measuring RTTndash Note ACK could be delayed for original
segment or early for retransmitted segmentbull Choices
ndash Associate with original packet may overestimate true RTT
ndash Associate with retransmitted packet may underestimate true RTT
CS 4284 Spring 2013
Karnrsquos Algorithmbull Idea donrsquot consider samples from retransmitted
segmentsndash Ok but what if current timeout value is too small for network
delayndash In this case would keep timing out but couldnrsquot adjust timeout
according to formulabull Solution If measurement canrsquot be made use exponential
backoff until new measurement can be madendash By factor of 2 with limit
bull Resume normal sampling algorithm afterwardsndash Usual sampling period is once per RTT
bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet
CS 4284 Spring 2013
Fast Retransmitbull Time-out period often relatively long
ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs
ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs
bull If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lostndash fast retransmit resend segment before timer expires
CS 4284 Spring 2013
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
TCP
Flow Control
CS 4284 Spring 2013
TCP Flow Controlbull receive side of
TCP connection has a receive buffer
bull speed-matching service matching the send rate to the receiving apprsquos drain rate
bull app process may be slow at reading from buffer
sender wonrsquot overflow
receiverrsquos buffer bytransmitting too
much too fast
flow control
CS 4284 Spring 2013
TCP Flow Control (contrsquod)
(Suppose TCP receiver discards out-of-order segments)
bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Rcvr advertises spare room by including value of RcvWindow in segments
bull Sender limits unACKed data to RcvWindowndash guarantees
receive buffer doesnrsquot overflow
CS 4284 Spring 2013
Flow Control Persistence Timerbull Suppose sender is blocked cause receiver
application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size
RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck
ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update
CS 4284 Spring 2013
Flow Control Silly Window Syndrome
Clarkrsquos solution donrsquot advertise windows below a certain size
CS 4284 Spring 2013
Study of TCP: Outline
• segment structure
• reliable data transfer
  – delayed ACKs
  – Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP Connection Management
TCP Connection Management
Recall: TCP sender and receiver establish a "connection" before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g., RcvWindow)
• client: connection initiator
  connect(s, &dstaddr, …)
• server: contacted by client
  cl = accept(sv, &caddr, …)
Three-way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq. #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
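The three steps can be modeled by tracking only the sequence and acknowledgement numbers. This is a sketch with hypothetical function names. The real handshake is performed by the kernel inside `connect`/`accept`. It shows that the SYN consumes one sequence number, and that each side checks that the other has acknowledged its own ISN. That check is what lets a client reject a SYNACK answering an old, duplicate SYN.

```python
from dataclasses import dataclass

SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


@dataclass(frozen=True)
class Segment:
    flags: frozenset
    seq: int
    ack: int = 0


def client_syn(client_isn):
    # Step 1: SYN carries the client's initial sequence number and no data.
    return Segment(frozenset({"SYN"}), seq=client_isn)


def server_on_syn(syn, server_isn):
    # Step 2: server picks its own ISN and ACKs the client's (SYN counts as one byte).
    return Segment(frozenset({"SYN", "ACK"}), seq=server_isn, ack=(syn.seq + 1) % SEQ_MOD)


def client_on_synack(synack, client_isn):
    # Step 3: client verifies the SYNACK acknowledges *its* ISN, then ACKs the server's.
    if synack.ack != (client_isn + 1) % SEQ_MOD:
        raise ValueError("SYNACK does not match our SYN (old duplicate?)")
    return Segment(frozenset({"ACK"}), seq=(client_isn + 1) % SEQ_MOD,
                   ack=(synack.seq + 1) % SEQ_MOD)


def server_on_ack(ack, server_isn):
    # Server considers the connection established only if its own ISN was ACKed.
    return ack.ack == (server_isn + 1) % SEQ_MOD
```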
TCP 3-way handshake
TCP connection establishment:
• Q1: why a 3-way and not a 2-way handshake?
• Q2: how do sender and receiver determine initial seq. numbers?
3-way Handshake & Delayed Duplicates
[Figure: (a) Normal operation. (b) Old SYN appearing out of nowhere. (c) Duplicate SYN and duplicate ACK following SYN.]
The 3-way handshake is required to deal with scenarios (b) and (c).
Sequence Number Reuse
• Idea: tie initial TCP seq. numbers to a clock
  – Increment every 4 µs: guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against the sequence number prediction attack
  – Use a PRNG; see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock value + F(src, dst, sport, dport, random())
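The RFC 1948 formula can be sketched as below. It rests on three assumptions. HMAC-SHA-256 keyed with a per-process secret stands in for F; RFC 1948 itself suggested a hash such as MD5 over the 4-tuple and a secret. The slide's `random()` is read as that secret, chosen once rather than per call. `time.monotonic_ns()` stands in for the 4 µs clock. `SECRET_KEY` and `initial_seq_number` are hypothetical names.

```python
import hashlib
import hmac
import os
import time

SECRET_KEY = os.urandom(16)   # chosen once at startup (hypothetical)
SEQ_MOD = 2 ** 32


def initial_seq_number(src, dst, sport, dport, now_us=None):
    """RFC 1948-style ISN: 4-microsecond clock plus keyed hash of the connection 4-tuple."""
    if now_us is None:
        now_us = time.monotonic_ns() // 1000
    clock = now_us // 4                                   # ticks once every 4 microseconds
    tuple_bytes = f"{src}|{dst}|{sport}|{dport}".encode()
    digest = hmac.new(SECRET_KEY, tuple_bytes, hashlib.sha256).digest()
    offset = int.from_bytes(digest[:4], "big")            # F(src, dst, sport, dport, secret)
    return (clock + offset) % SEQ_MOD
```

Within one 4-tuple, ISNs still advance with the clock, which guards against old incarnations of the connection. An off-path attacker who doesn't know the secret cannot compute the offset for someone else's 4-tuple.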
When Sequence Numbers Attack
• Suppose attacker A can predict the sequence number that host B is going to use next
• By using spoofed source IP address C, A can engage in a successful 3-way handshake with B
  – B believes it is talking to C and might grant permissions based on C's IP address
  – The attacker on A must suppress the RST packets C is likely to send; a denial-of-service attack against C can be used for that
• A then sends a message to compromise B
When SYNs Attack
• Servers receiving a SYN must allocate resources
  – This opens up the possibility of a denial-of-service attack in which the server is flooded with bogus SYN packets with forged IP source addresses
• Solution: SYN cookies
  – Server computes its initial sequence number, sends the SYNACK, but does not allocate buffers
  – If the client continues with an ACK, the server checks whether that ACK could have been sent in response to its SYNACK, and allocates buffers only if it is correct
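A simplified SYN cookie makes "compute but don't store" concrete. The bit layout (5-bit time slot, 3-bit MSS index, 24-bit keyed hash) loosely follows the commonly described design. The secret, MSS table, time-slot granularity, and all names are hypothetical, and real implementations differ in detail.

```python
import hashlib
import hmac
import os

SECRET_KEY = os.urandom(16)            # hypothetical server secret
MSS_TABLE = (536, 1220, 1460, 8960)    # hypothetical table of encodable MSS values
SEQ_MOD = 2 ** 32


def _mac24(four_tuple, t, client_isn, mss_index):
    msg = repr((four_tuple, t, client_isn, mss_index)).encode()
    digest = hmac.new(SECRET_KEY, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:3], "big")              # 24-bit keyed hash


def make_cookie(four_tuple, client_isn, mss_index, t):
    """Server ISN for the SYNACK: 5-bit time slot | 3-bit MSS index | 24-bit MAC.

    Nothing is stored; everything needed later can be recomputed from the final ACK.
    """
    return ((t % 32) << 27) | (mss_index << 24) | _mac24(four_tuple, t, client_isn, mss_index)


def check_ack(four_tuple, ack_seq, ack_num, now_t):
    """Validate the client's final ACK; return the MSS to use, or None if bogus."""
    cookie = (ack_num - 1) % SEQ_MOD                      # client ACKs our ISN + 1
    client_isn = (ack_seq - 1) % SEQ_MOD                  # client's seq is its ISN + 1
    mss_index = (cookie >> 24) & 0x7
    for t in (now_t, now_t - 1):                          # accept current or previous slot
        if (t % 32) == (cookie >> 27) and mss_index < len(MSS_TABLE):
            if (cookie & 0xFFFFFF) == _mac24(four_tuple, t, client_isn, mss_index):
                return MSS_TABLE[mss_index]               # only now allocate buffers
    return None
```

Because the hash covers the 4-tuple and the client's ISN, a forged ACK with the wrong addresses or sequence number fails the check, except with probability about 2^-24.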
Sequence Number Summary
• Goals, set 1:
  – Guard against old duplicates in one connection: don't reuse sequence numbers until it's reasonably certain that old duplicates have disappeared
  – Even after a crash/restart: hence tie ISNs to time, but don't rely on a global clock; hence the need for a 3-way handshake, where each side verifies the sequence number chosen by the other side before the connection is established
• Goals, set 2:
  – Don't allow hijacking of connections unless the attacker can eavesdrop: use a PRNG for the initial seq. number choice
  – Don't allow SYN attacks: compute, but don't store, the initial sequence number
TCP Connection Management (cont.)
Closing a connection:
client closes socket: close(s);
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline. Client (close) sends FIN; server replies with ACK, then (close) sends FIN; client replies with ACK and enters timed wait; connection closed]
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK
  – Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with a small modification, can handle simultaneous FINs.
[Figure: client/server timeline. Both sides are closing and exchange FIN and ACK; client enters timed wait, then closed; server closed]
TCP Connection FSM
[Figure: TCP connection finite state machine. The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events. Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.]
TCP Connection Management (cont'd)
[Figures: TCP client lifecycle; TCP server lifecycle]
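The closing part of the lifecycle can be expressed as a transition table. State names follow the standard TCP state diagram (RFC 793). The event names and the `run` helper are hypothetical, and only the close-related transitions are modeled. "Timed wait" (TIME_WAIT) is left after twice the maximum segment lifetime (2 MSL).

```python
# (state, event) -> (next_state, segment to send or None); event names are hypothetical
TRANSITIONS = {
    # Active closer (normally the client)
    ("ESTABLISHED", "app_close"): ("FIN_WAIT_1", "FIN"),
    ("FIN_WAIT_1", "rcv_ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_1", "rcv_FIN"): ("CLOSING", "ACK"),      # simultaneous close
    ("CLOSING", "rcv_ACK"): ("TIME_WAIT", None),
    ("FIN_WAIT_2", "rcv_FIN"): ("TIME_WAIT", "ACK"),
    ("TIME_WAIT", "rcv_FIN"): ("TIME_WAIT", "ACK"),     # re-ACK a retransmitted FIN
    ("TIME_WAIT", "timeout_2MSL"): ("CLOSED", None),
    # Passive closer (normally the server)
    ("ESTABLISHED", "rcv_FIN"): ("CLOSE_WAIT", "ACK"),
    ("CLOSE_WAIT", "app_close"): ("LAST_ACK", "FIN"),
    ("LAST_ACK", "rcv_ACK"): ("CLOSED", None),
}


def run(events, state="ESTABLISHED"):
    """Apply events in order; return the visited states and the segments sent."""
    states, sent = [state], []
    for event in events:
        state, out = TRANSITIONS[(state, event)]   # KeyError: event not modeled here
        states.append(state)
        if out:
            sent.append(out)
    return states, sent
```

The normal client path is `app_close`, `rcv_ACK`, `rcv_FIN`, `timeout_2MSL`, which sends a FIN and then an ACK. A simultaneous close passes through CLOSING instead of FIN_WAIT_2.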
Closing a Connection
• Note: previous charts showed the normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  – No: famous two-army problem
Summary
• TCP segments, acknowledgements & retransmission
  – Delayed ACKs, Nagle's algorithm
  – Fast retransmit
• RTT estimation & Karn's algorithm
• Flow control & silly window syndrome
• Connection management in TCP
• Attacks against TCP's connection management scheme
  – SYN attack
  – Sequence number prediction attacks
TCP Miscellaneous
• MSS: Maximum Segment Size option
  – Client/server agree on a larger than default (536 bytes outside same subnet) MSS via an option on the SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for the receive window, to allow for LFNs ("elephants"): Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• …
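The need for WSCALE is easy to quantify. The 16-bit window field caps the unscaled window at 65535 bytes, while the window must cover the bandwidth-delay product (BDP). This sketch finds the smallest shift count that suffices, using the 14-bit maximum shift that RFC 1323 allows. The example figures are the 10 Gbps, 100 ms link used in the throughput discussion, and the function names are hypothetical.

```python
MAX_UNSCALED_WINDOW = 65535   # largest value of the 16-bit window field
MAX_SHIFT = 14                # largest window-scale shift permitted by RFC 1323


def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product: bytes that must be in flight to fill the pipe."""
    return bandwidth_bps * rtt_s / 8


def min_window_shift(bdp):
    """Smallest WSCALE shift so that 65535 << shift covers the BDP."""
    shift = 0
    while (MAX_UNSCALED_WINDOW << shift) < bdp:
        shift += 1
        if shift > MAX_SHIFT:
            raise ValueError("BDP too large even for maximum window scaling")
    return shift
```

A 10 Gbps, 100 ms path has a BDP of about 125 MB, so it needs a shift of 11: 65535 << 10 is only about 67 MB, while 65535 << 11 is about 134 MB.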
TCP Congestion Control
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control
• manifestations:
  – long delays (queueing in router buffers)
  – lost packets (buffer overflow at routers)
• a top-10 problem
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Hosts A and B; λin: original data; λout; unlimited shared output link buffers]
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmits lost packet after timeout
[Figure: Hosts A and B; λin: original data; λ'in: original data plus retransmitted data; λout; finite shared output link buffers]
Causes/costs of congestion: scenario 2
• always: λin = λout (goodput)
• a) if no loss: λ'in = λin
• b) assume a clairvoyant sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes λ'in larger for the same λout (every packet transmitted twice)
"Costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: link carries multiple copies of a packet
[Figure: three plots (a), (b), (c) of λout vs. λin with axes up to R/2; the λout levels marked are R/2, R/3, and R/4]
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λin and λ'in increase?
[Figure: Hosts A and B; λin: original data; λ'in: original data plus retransmitted data; λout; finite shared output link buffers]
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse
[Figure: λout between Host A and Host B]
Reasons for Congestion Control
• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case
Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate the sender should send at
TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission:
  $\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$
• roughly, $\text{rate} = \frac{\text{CongWin}}{\text{RTT}}$ bytes/sec
• CongWin and RTT influence throughput
• CongWin is a dynamic function of perceived network congestion
How does the sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after a loss event
• (assumes congestion is the primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
TCP AIMD
• multiplicative decrease: cut CongWin in half after a loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)
[Figure: congestion window (8, 16, 24 Kbytes) vs. time for a long-lived TCP connection]
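The AIMD rule can be traced one RTT at a time. This idealized sketch works in units of MSS, and the RTTs in which a loss occurs are supplied as an input set. Both are modeling assumptions, and the function name is hypothetical.

```python
def aimd_trace(initial_cwnd, num_rtts, loss_rtts):
    """CongWin (in MSS) at the end of each RTT under idealized AIMD."""
    cwnd = initial_cwnd
    trace = []
    for rtt in range(num_rtts):
        if rtt in loss_rtts:
            cwnd = max(cwnd / 2, 1)   # multiplicative decrease, never below 1 MSS
        else:
            cwnd += 1                 # additive increase: +1 MSS per RTT
        trace.append(cwnd)
    return trace
```

Starting at 10 MSS with a loss in the third RTT gives 11, 12, 6, 7, 8: the characteristic sawtooth.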
TCP Slow Start
• When the connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be ≫ MSS/RTT
  – desirable to quickly ramp up to a respectable rate
• When the connection begins, increase the rate exponentially fast until the first loss event
TCP Slow Start (more)
• When the connection begins, increase the rate exponentially until the first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin by 1 MSS for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart]
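The sketch below shows why "+1 MSS per ACK" doubles the window every RTT. It assumes that every segment is ACKed individually (no delayed ACKs) and that there is no loss. The function name is hypothetical. The last line reproduces the initial-rate arithmetic from the example.

```python
def slow_start_segments_per_rtt(num_rtts, mss=1):
    """Segments sent in each RTT when CongWin grows by 1 MSS per ACK."""
    cwnd = mss
    per_rtt = []
    for _ in range(num_rtts):
        segments = cwnd // mss
        per_rtt.append(segments)
        for _ in range(segments):   # one ACK comes back per segment sent
            cwnd += mss             # +1 MSS per ACK, so cwnd doubles per RTT
    return per_rtt


# Initial rate from the example: one 500-byte MSS per 200 ms RTT
initial_rate_bps = 500 * 8 / 0.200   # about 20 kbps
```

Five RTTs give 1, 2, 4, 8, 16 segments, matching the one/two/four-segment picture.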
TCP Tahoe vs. Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At a loss event, Threshold is set to 1/2 of CongWin just before the loss event
• If timeout event: set CongWin to 1 MSS
• If triple-ACK event: set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)
Timeouts vs. 3 dup ACKs
• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after a timeout event:
  – CongWin is instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate the network is capable of delivering some segments
• a timeout before 3 dup ACKs are received is a stronger, "more alarming" indicator of congestion
Summary: TCP Congestion Control
TCP Reno congestion-control FSM (states: slow start, congestion avoidance, fast recovery; Λ marks a transition with no triggering event or no action):
Initially (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; start in slow start
slow start:
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
• duplicate ACK: dupACKcount++
• Λ, cwnd > ssthresh: go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment (stay in slow start)
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3·MSS; retransmit missing segment; go to fast recovery
congestion avoidance:
• new ACK: cwnd = cwnd + MSS·(MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3·MSS; retransmit missing segment; go to fast recovery
fast recovery:
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
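The Reno FSM can be transcribed into an event-driven class. This sketch keeps window sizes in bytes and omits actual transmission. It also simplifies RFC 5681 in two places, which should be treated as assumptions: ssthresh is computed from cwnd rather than from the amount of data in flight, and it has no lower bound of 2 MSS. The class and method names are hypothetical.

```python
class RenoSender:
    """Sketch of the Reno congestion-control FSM; cwnd and ssthresh are in bytes."""

    def __init__(self, mss=1000, ssthresh=64 * 1024):
        self.mss = mss
        self.cwnd = mss                  # cwnd = 1 MSS
        self.ssthresh = ssthresh         # ssthresh = 64 KB by default
        self.dup_ack_count = 0
        self.state = "slow_start"
        self.retransmits = 0             # count of "retransmit missing segment" actions

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh                      # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += self.mss                          # doubles cwnd every RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += self.mss * self.mss / self.cwnd   # about +1 MSS per RTT
        self.dup_ack_count = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += self.mss                          # inflate for each extra dup ACK
            return
        self.dup_ack_count += 1
        if self.dup_ack_count == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.retransmits += 1                          # fast retransmit
            self.state = "fast_recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.mss
        self.dup_ack_count = 0
        self.retransmits += 1
        self.state = "slow_start"
```

With MSS = 1000 and ssthresh = 4000, four new ACKs take the sender into congestion avoidance at cwnd = 5000. Three duplicate ACKs then enter fast recovery with ssthresh = 2500 and cwnd = 5500. The next new ACK deflates cwnd back to 2500.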
Summary: TCP Congestion Control
• When CongWin is below Threshold, the sender is in the slow-start phase; the window grows exponentially
• When CongWin is above Threshold, the sender is in the congestion-avoidance phase; the window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS
TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unACKed data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Results in a doubling of CongWin every RTT
ACK receipt for previously unACKed data | Congestion Avoidance (CA) | CongWin = CongWin + MSS·(MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being ACKed | CongWin and Threshold not changed
TCP Throughput (Idealized)
• What's the average throughput of TCP as a function of window size and RTT?
  – Long-lived connection; ignore slow start
• When the window is W, throughput is W/RTT
• Just after a loss, the window drops to W/2 and throughput to W/(2·RTT)
• Average steady-state throughput: 0.75·W/RTT
[Figure: window oscillating between W/2 and W]
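The 0.75 factor follows from one averaging step. Assume the idealized sawtooth: between losses the window grows linearly from W/2 back up to W, so the time-average throughput is the mean of the two endpoints:

```latex
\overline{\text{throughput}}
  = \frac{1}{\text{RTT}} \cdot \frac{1}{2}\left(\frac{W}{2} + W\right)
  = \frac{3}{4} \cdot \frac{W}{\text{RTT}}
  = 0.75\,\frac{W}{\text{RTT}}
```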
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
  $\text{throughput} = \frac{1.22 \cdot \text{MSS}}{\text{RTT}\,\sqrt{L}}$
• $L = 2 \cdot 10^{-10}$: very low
• Requires an almost perfect link
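Both numbers in the example can be reproduced directly. The window is throughput × RTT divided by the segment size in bits. The loss rate comes from solving the 1.22·MSS/(RTT·√L) formula for L. The function names are hypothetical.

```python
def window_in_segments(throughput_bps, rtt_s, segment_bytes):
    """W = throughput * RTT / segment size: segments that must be in flight."""
    return throughput_bps * rtt_s / (segment_bytes * 8)


def max_loss_rate(throughput_bps, rtt_s, segment_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    mss_bits = segment_bytes * 8
    return (1.22 * mss_bits / (rtt_s * throughput_bps)) ** 2


W = window_in_segments(10e9, 0.100, 1500)   # about 83,333 segments
L = max_loss_rate(10e9, 0.100, 1500)        # about 2e-10
```

The loss rate works out to roughly 2.1·10^-10, that is, about one lost segment in every five billion.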
Equation-Based Control
• Note: TCP congestion control forms a control loop
  – Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-dup-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info
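As an illustration of equation-based control, here is the TCP throughput equation that TFRC uses, transcribed as we understand it from RFC 5348, Section 3.1. Check the RFC before relying on it. The parameters are s (segment size in bytes), rtt (seconds), and p (loss event rate). The defaults b = 1 and t_RTO = 4·RTT are the simplifications that RFC recommends. The function name is hypothetical.

```python
import math


def tfrc_rate(s, rtt, p, b=1, t_rto=None):
    """Allowed sending rate in bytes/sec from the TCP throughput equation (RFC 5348)."""
    if t_rto is None:
        t_rto = 4 * rtt                      # simplification recommended in RFC 5348
    denom = (rtt * math.sqrt(2 * b * p / 3)
             + t_rto * (3 * math.sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2))
    return s / denom
```

For small p the second term is negligible. The rate then approaches s/(RTT·√(2p/3)) ≈ 1.22·s/(RTT·√p), the same relation as the simple throughput-loss formula.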
TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]
Why is TCP fair?
Consider two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput vs. Connection 1 throughput, both axes up to R, with the equal-bandwidth-share line; the trajectory alternates between congestion avoidance (additive increase) and loss (decrease window by factor of 2)]
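A two-flow simulation makes the geometric argument concrete. Additive increase leaves the gap between the flows unchanged, and every multiplicative decrease halves it. The sketch assumes synchronized losses (both flows halve whenever their sum exceeds capacity) and equal RTTs, so both add one unit per step. The function name is hypothetical.

```python
def aimd_two_flows(x1, x2, capacity, steps):
    """Two AIMD flows sharing a bottleneck; returns their final rates (x1, x2)."""
    for _ in range(steps):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2     # multiplicative decrease: the gap halves
        else:
            x1, x2 = x1 + 1, x2 + 1     # additive increase: the gap is unchanged
    return x1, x2
```

Starting from a very unequal split of 10 vs. 80 on a link of capacity 100, the gap shrinks below one unit within 200 steps. The flows converge toward the equal-share line.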
Fairness (more)
Fairness and UDP:
• Multimedia apps often do not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections:
• nothing prevents an app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
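The parallel-connection arithmetic assumes that every TCP connection on the bottleneck gets an equal share. The function name is hypothetical.

```python
from fractions import Fraction


def share_of_link(existing_conns, new_conns):
    """Fraction of R an app gets by opening new_conns next to existing_conns."""
    return Fraction(new_conns, existing_conns + new_conns)
```

With 9 existing connections, one new connection gets R/10, while nine new connections together get R/2.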
Delayed ACK vs. Nagle
• These two mechanisms have nothing to do with each other
  – often confused, especially by various online sources
  – TCP_NODELAY means "disable Nagle," not "disable delayed ACKs"
• Delayed ACK: implemented by the TCP receiver
  – Delays sending an acknowledgement (because ACKs are cumulative, this can reduce the number of ACKs sent)
• Nagle's algorithm: implemented by the TCP sender
  – Delays sending actual data (so that more data can be fit into a single segment)
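The sender-side half of this distinction is shown below. `nagle_may_send_now` is a hypothetical, simplified form of Nagle's decision rule: send if a full segment is ready or nothing is unacknowledged, otherwise hold small data. `disable_nagle` uses the real `socket.TCP_NODELAY` option, which affects only the local sender and does nothing to the peer's delayed ACKs.

```python
import socket


def nagle_may_send_now(queued_bytes, mss, unacked_bytes):
    """Simplified Nagle test on the sender: may queued data go out immediately?"""
    # Send if a full segment is ready or nothing is in flight;
    # otherwise hold small data until the outstanding data is ACKed.
    return queued_bytes >= mss or unacked_bytes == 0


def disable_nagle(sock):
    """Set TCP_NODELAY on this socket: turns off Nagle locally, not the peer's delayed ACKs."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
```

A 10-byte write with 100 bytes still unacknowledged is held back by Nagle. The same write on an idle connection, or a full 1460-byte segment, goes out immediately.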
TCP Timeout & Fast Retransmit
TCP Round Trip Time and Timeout
Q: how to set the TCP timeout value?
• longer than RTT
  – but RTT varies
• too short: premature timeout
  – unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
• SampleRTT will vary; want the estimated RTT "smoother"
  – average several recent measurements, not just the current SampleRTT
RTT Distributions
[Figure: (a) Probability density of ACK arrival times in the data link layer. (b) Probability density of ACK arrival times for TCP.]
TCP Round Trip Time and Timeout
$\text{EstimatedRTT} = (1 - \alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
• exponential weighted moving average
• influence of a past sample decreases exponentially fast
• typical value: α = 0.125
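The moving average is a one-line update. This sketch assumes the first sample initializes the estimate, and the function name is hypothetical. After k further updates, a sample's weight has decayed by a factor of (1 − α)^k.

```python
def ewma_rtt(samples, alpha=0.125):
    """EstimatedRTT after each SampleRTT; the first sample initializes the estimate."""
    estimated = samples[0]
    history = [estimated]
    for sample in samples[1:]:
        estimated = (1 - alpha) * estimated + alpha * sample
        history.append(estimated)
    return history
```

A jump from 100 ms to a single 200 ms sample moves the estimate only to 112.5 ms. After ten further samples of 0, a 1000 ms initial sample still contributes 1000 · 0.875^10 ≈ 263 ms.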
Example RTT estimation:
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. SampleRTT and EstimatedRTT in milliseconds (100 to 350) vs. time in seconds (1 to 106)]
TCP Round Trip Time and Timeout
Setting the timeout:
• EstimatedRTT plus "safety margin"
  – large variation in EstimatedRTT: larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:
  $\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT} - \text{EstimatedRTT}|$
• then set the timeout interval:
  $\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
where
  $\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
(typically α = 0.125, β = 0.25)
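Putting the three formulas together gives a small estimator. Two details are assumptions borrowed from RFC 6298 rather than from the slide. DevRTT is updated before EstimatedRTT, so it uses the previous estimate. The first sample initializes EstimatedRTT = SampleRTT and DevRTT = SampleRTT/2. Minimum and maximum clamps are omitted, and the class name is hypothetical.

```python
class RttEstimator:
    """TimeoutInterval = EstimatedRTT + 4 * DevRTT, updated per SampleRTT."""

    def __init__(self, alpha=0.125, beta=0.25):
        self.alpha, self.beta = alpha, beta
        self.estimated_rtt = None
        self.dev_rtt = None

    def update(self, sample_rtt):
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        if self.estimated_rtt is None:
            self.estimated_rtt = sample_rtt
            self.dev_rtt = sample_rtt / 2
        else:
            # Deviation first, measured against the previous EstimatedRTT
            self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                            + self.beta * abs(sample_rtt - self.estimated_rtt))
            self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                                  + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt
```

Two stable 100 ms samples shrink the timeout from 300 ms to 250 ms. A sudden 200 ms sample raises it to 325 ms, mostly through DevRTT.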
RTT Measurement & Retransmissions
• How should you handle ACKs for retransmitted segments when measuring RTT?
  – Note: the ACK could be delayed for the original segment, or early for the retransmitted segment
• Choices:
  – Associate with the original packet: may overestimate the true RTT
  – Associate with the retransmitted packet: may underestimate the true RTT
Karn's Algorithm
• Idea: don't consider samples from retransmitted segments
  – OK, but what if the current timeout value is too small for the network delay?
  – In this case, we would keep timing out but couldn't adjust the timeout according to the formula
• Solution: if a measurement can't be made, use exponential backoff until a new measurement can be made
  – by a factor of 2, with a limit
• Resume the normal sampling algorithm afterwards
  – Usual sampling period is once per RTT
• See also RFC 2988
  – The TCP timestamp option can be used to remove the ambiguity about which ACK matches which packet
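Karn's rule and the backoff fit together as follows. Samples from retransmitted segments are discarded, and each timeout doubles the RTO up to a cap. The first clean sample restores the formula-based RTO. The initial RTO, the cap, and all names are hypothetical. The estimator update uses the same α/β formulas as above, with the RFC 6298 ordering.

```python
class KarnTimer:
    """Retransmission timeout with Karn's rule and capped exponential backoff."""

    def __init__(self, initial_rto=1.0, max_rto=60.0, alpha=0.125, beta=0.25):
        self.rto = initial_rto
        self.max_rto = max_rto           # backoff "with a limit" (hypothetical value)
        self.alpha, self.beta = alpha, beta
        self.srtt = None                 # EstimatedRTT
        self.rttvar = None               # DevRTT

    def on_timeout(self):
        # No usable measurement: back off by a factor of 2, up to the limit.
        self.rto = min(2 * self.rto, self.max_rto)

    def on_ack(self, sample_rtt, segment_was_retransmitted):
        if segment_was_retransmitted:
            return                       # Karn: ambiguous sample, keep the backed-off RTO
        if self.srtt is None:
            self.srtt, self.rttvar = sample_rtt, sample_rtt / 2
        else:
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(sample_rtt - self.srtt)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * sample_rtt
        # Resume normal sampling: recompute RTO from the estimators.
        self.rto = min(self.srtt + 4 * self.rttvar, self.max_rto)
```

Four timeouts push a 1 s RTO to a cap of 8 s. An ACK for a retransmitted segment leaves it there. The first ACK for a segment sent only once resets it from the formula, for example to 1.5 s for a 0.5 s sample.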
Fast Retransmitbull Time-out period often relatively long
ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs
ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs
bull If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lostndash fast retransmit resend segment before timer expires
CS 4284 Spring 2013
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
TCP
Flow Control
CS 4284 Spring 2013
TCP Flow Controlbull receive side of
TCP connection has a receive buffer
bull speed-matching service matching the send rate to the receiving apprsquos drain rate
bull app process may be slow at reading from buffer
sender wonrsquot overflow
receiverrsquos buffer bytransmitting too
much too fast
flow control
CS 4284 Spring 2013
TCP Flow Control (contrsquod)
(Suppose TCP receiver discards out-of-order segments)
bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Rcvr advertises spare room by including value of RcvWindow in segments
bull Sender limits unACKed data to RcvWindowndash guarantees
receive buffer doesnrsquot overflow
CS 4284 Spring 2013
Flow Control Persistence Timerbull Suppose sender is blocked cause receiver
application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size
RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck
ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update
CS 4284 Spring 2013
Flow Control Silly Window Syndrome
Clarkrsquos solution donrsquot advertise windows below a certain size
CS 4284 Spring 2013
Study of TCP Outlinebull segment structurebull reliable data transfer
ndash delayed ACKsndash Naglersquos algorithm
bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control
TCP
Connection Management
CS 4284 Spring 2013
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control info
(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr
hellip)
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
CS 4284 Spring 2013
TCP 3-way handshake
TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums
CS 4284 Spring 2013
3-way Handshake amp Delayed Dups
(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)
CS 4284 Spring 2013
Sequence Number Reuse
bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a
connection with identical sequence numbersbull Must also guard against sequence number prediction attack
ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())
CS 4284 Spring 2013
When Sequence Numbers Attack
bull Suppose attacker A can predict sequence number a host B is going to use next
bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions
based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is
likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B
CS 4284 Spring 2013
When SYNs Attackbull Servers receiving SYN must allocate resources
ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
bull Solutionndash SYN cookies
bull Server creates ACK number sends ACK ndash but does not allocate buffers
ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct
Sequence Number Summarybull Goals Set 1
ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared
ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs
bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can
eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial
sequence number
CS 4284 Spring 2013
CS 4284 Spring 2013
TCP Connection Management (cont)Closing a connection
client closes socket close(s)
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
timed
wai
t
CS 4284 Spring 2013
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
timed
wai
tclosed
CS 4284 Spring 2013
TCP ConnectionFSM
The heavy solid line is the normal path for a client
The heavy dashed line is the normal path for a server
The light lines are unusual events
Each transition is labeled by the event causing it and the action resulting from it separated by a slash
CS 4284 Spring 2013
TCP Connection Management (contrsquod)
TCP client lifecycle TCP server lifecycle
CS 4284 Spring 2013
Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if
packets (FIN ACK) can be lostndash No Famous two-army problem
CS 4284 Spring 2013
Summarybull TCP segments acknowledgements amp
retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit
bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management
schemendash SYN attackndash Sequence number prediction attacks
CS 4284 Spring 2013
TCP Miscellaneousbull MSS Maximum Segment Size Option
ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN
bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to
allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT
measurement PAWS for protection against wrap-around for sequence numbers
bull hellip
CS 4284 Spring 2013
Study of TCP Outlinebull segment structurebull reliable data transfer
ndash delayed ACKsndash Naglersquos algorithm
bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control
TCP
Congestion Control
CS 4284 Spring 2013
Principles of Congestion Control
Congestionbull informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull different from flow controlbull manifestations
ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)
bull a top-10 problem
CS 4284 Spring 2013
Causescosts of congestion scenario 1 bull two senders two
receiversbull one router infinite
buffers bull no retransmission
bull large delays when congested
bull (but no reduction in throughput here)
unlimited shared output link buffers
Host Alin original data
Host B
lout
CS 4284 Spring 2013
Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet
after timeout
finite shared output link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2lin
l out
a c
R2
R2lin
l out
R4
R2
R2lin
l out
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit
Q what happens as lin and lrsquoin increase
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission
capacity used for that packet was wastedbull ultimately leads to congestion collapse
Host A
Host B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even if no packet loss occurs
bull If packet loss occurs needed retransmission require offered load to be greater than goodput
bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion control
bull no explicit feedback from network
bull congestion inferred from end-system observed loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systemsndash single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Controlbull end-end control (no network
assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput
bull CongWin is dynamic function of perceived network congestion
How does sender notice congestion
bull loss event = timeout or 3 duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is primary cause of loss)
Three mechanismsbull AIMDbull slow startbull fast recovery
rate = CongWin
RTT Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
multiplicative decrease cut CongWin in half after loss event
additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Startbull When connection
begins CongWin = 1 MSSndash Example MSS =
500 bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly
ramp up to respectable rate
bull When connection begins increase rate exponentially fast until first loss event
CS 4284 Spring 2013
TCP Slow Start (more)bull When connection begins
increase rate exponentially until first loss eventndash double CongWin every
RTTndash done by incrementing CongWin for every ACK received
bull Summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
CS 4284 Spring 2013
TCP Tahoe vs RenoQ When should the
exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss
eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly
(this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKsbull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout eventndash CongWin instead set
to 1 MSS ndash window then grows
exponentiallyndash to a threshold then
grows linearly
bull 3 dup ACKs indicates network capable of delivering some segments
bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
CS 4284 Spring 2013
TCP Sender Congestion Control
Event | State | TCP sender action | Commentary
ACK receipt for previously unACKed data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Results in a doubling of CongWin every RTT
ACK receipt for previously unACKed data | Congestion Avoidance (CA) | CongWin = CongWin + MSS·(MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for the segment being ACKed | CongWin and Threshold not changed
TCP Throughput (Idealized)
• What's the average throughput of TCP as a function of window size and RTT?
  – Long-lived connection; ignore slow start
• When the window is W, throughput is W/RTT
• Just after a loss, the window drops to W/2 and throughput to W/(2·RTT)
• Average steady-state throughput: 0.75·W/RTT
(Figure: window size oscillating between W/2 and W.)
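The 0.75 factor comes from averaging the sawtooth. This assumes the window grows linearly from W/2 back to W between losses and that one window is sent per RTT:

```latex
\begin{aligned}
\overline{\text{window}} &= \tfrac{1}{2}\left(\tfrac{W}{2} + W\right) = \tfrac{3}{4}\,W \\
\overline{\text{throughput}} &= \frac{\overline{\text{window}}}{\text{RTT}} = 0.75\,\frac{W}{\text{RTT}}
\end{aligned}
```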
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
$$\text{Throughput} = \frac{1.22\cdot\text{MSS}}{\text{RTT}\cdot\sqrt{L}}$$
• ⇒ $L = 2\cdot 10^{-10}$. Very low!
• Requires an almost perfect link
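The slide's numbers can be reproduced with a few lines of Python. This assumes 10 Gbps means 10^10 bits/s and uses the 1.22·MSS/(RTT·√L) approximation above:

```python
MSS_BITS = 1500 * 8   # 1500-byte segments, in bits
RTT = 0.100           # seconds
TARGET = 10e9         # 10 Gbps, in bits/s

# throughput = W * MSS / RTT  =>  W = throughput * RTT / MSS
w_segments = TARGET * RTT / MSS_BITS

# throughput = 1.22 * MSS / (RTT * sqrt(L))  =>  L = (1.22 * MSS / (RTT * throughput))**2
loss_rate = (1.22 * MSS_BITS / (RTT * TARGET)) ** 2

print(f"W = {w_segments:,.0f} segments")  # W = 83,333 segments
print(f"L = {loss_rate:.1e}")             # L = 2.1e-10
```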
Equation-Based Control
• Note: TCP congestion control forms a control loop
  – Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-dup-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate from these inputs
• See RFC 5348 for more info
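RFC 5348 (TFRC) computes its sending rate from a TCP throughput equation of this kind. The sketch below transcribes that equation as we understand it, with the simplifications b = 1 and t_RTO = 4·RTT that the RFC recommends. Consult the RFC for the normative form. `tfrc_rate` is a hypothetical name.

```python
import math


def tfrc_rate(s: float, rtt: float, p: float, b: float = 1.0) -> float:
    """Allowed sending rate in bytes/s per the TCP throughput equation.

    s: segment size (bytes); rtt: round-trip time (s);
    p: loss event rate (0 < p <= 1); b: packets acknowledged per ACK.
    """
    t_rto = 4 * rtt  # simplification: retransmission timeout approximated as 4 * RTT
    denom = (rtt * math.sqrt(2 * b * p / 3)
             + t_rto * (3 * math.sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2))
    return s / denom


print(f"{tfrc_rate(1460, 0.1, 0.01):,.0f} bytes/s")
```

The rate falls as either the loss event rate or the RTT grows, which is the same dependence TCP's control loop produces on average.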
TCP Fairness
TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k.
(Figure: TCP connections 1 and 2 share a bottleneck router of capacity R.)
Why is TCP fair?
Consider two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
(Figure: Connection 2 throughput vs. Connection 1 throughput, both axes up to R, with the equal-bandwidth-share line; the operating point alternates between congestion avoidance (additive increase) and loss (window decreased by a factor of 2).)
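Here is a minimal Python sketch of the two-session argument. It uses the slide's idealization: both sessions have the same RTT, increase in lockstep, and see a loss at the same moment. Additive increase preserves the gap between the two rates, while each multiplicative decrease halves it, so the rates converge. `aimd_two_flows` and its parameters are hypothetical.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, steps: int) -> tuple[float, float]:
    """Idealized AIMD for two flows sharing one bottleneck of rate `capacity`."""
    for _ in range(steps):
        x1, x2 = x1 + 1, x2 + 1        # additive increase: gap x1 - x2 unchanged
        if x1 + x2 > capacity:         # both flows see a loss
            x1, x2 = x1 / 2, x2 / 2    # multiplicative decrease: gap halved
    return x1, x2


x1, x2 = aimd_two_flows(10, 80, capacity=100, steps=1000)
print(round(x1, 3), round(x2, 3))  # the two rates end up (nearly) equal
```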
Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
  – do not want their rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at a constant rate, tolerate packet loss
• TCP friendliness: a non-TCP flow should not use more bandwidth than a TCP flow would under the same conditions
Fairness and parallel TCP connections
• Nothing prevents an app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
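The example's arithmetic follows if every connection on the bottleneck gets an equal share R/(total connections). `share_of_new_app` is a hypothetical helper that returns the new app's fraction of R:

```python
from fractions import Fraction


def share_of_new_app(existing: int, new_conns: int) -> Fraction:
    """Fraction of R the new app gets if all connections share the link equally."""
    return Fraction(new_conns, existing + new_conns)


print(share_of_new_app(9, 1))  # 1/10
print(share_of_new_app(9, 9))  # 1/2
```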
TCP
Timeout & Fast Retransmit
TCP Round Trip Time and Timeout
Q: how to set the TCP timeout value?
• longer than RTT
  – but RTT varies
• too short: premature timeout
  – unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
• SampleRTT will vary; want the estimated RTT to be "smoother"
  – average several recent measurements, not just the current SampleRTT
RTT Distributions
(Figure: (a) probability density of ACK arrival times in the data link layer; (b) probability density of ACK arrival times for TCP.)
TCP Round Trip Time and Timeout
$$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$$
• exponential weighted moving average
• influence of a past sample decreases exponentially fast
• typical value: α = 0.125
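Here is a Python sketch of the moving average. It assumes that the first SampleRTT initializes the estimate, as RFC 6298 does, though the slide does not say so. The hypothetical helper `sample_weight` makes "decreases exponentially fast" concrete: a sample taken k updates ago contributes alpha·(1 - alpha)^k.

```python
def ewma(samples: list[float], alpha: float = 0.125) -> float:
    """EstimatedRTT after processing SampleRTTs in order (first sample initializes)."""
    estimated = samples[0]
    for sample in samples[1:]:
        estimated = (1 - alpha) * estimated + alpha * sample
    return estimated


def sample_weight(k: int, alpha: float = 0.125) -> float:
    """Weight of the sample taken k updates ago in the current estimate."""
    return alpha * (1 - alpha) ** k


print(ewma([100, 200]))  # 112.5
```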
Example RTT estimation: gaia.cs.umass.edu to fantasia.eurecom.fr
(Figure: SampleRTT and EstimatedRTT in milliseconds, RTT axis from 100 to 350, vs. time in seconds.)
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus a "safety margin"
  – large variation in EstimatedRTT → larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:
$$\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot\left|\text{SampleRTT} - \text{EstimatedRTT}\right|$$
• then set the timeout interval:
$$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$$
where
$$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$$
(typically α = 0.125, β = 0.25)
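Here is a Python sketch of the estimator. Two choices are assumptions not stated on the slide, and both follow RFC 6298, the successor to RFC 2988. First, the first sample initializes EstimatedRTT = SampleRTT and DevRTT = SampleRTT/2. Second, DevRTT is updated using the EstimatedRTT from before the current sample. The RFC's 1-second minimum timeout is omitted. `RttEstimator` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RttEstimator:
    alpha: float = 0.125
    beta: float = 0.25
    estimated_rtt: Optional[float] = None
    dev_rtt: float = 0.0

    def update(self, sample_rtt: float) -> float:
        """Feed one SampleRTT and return the new TimeoutInterval."""
        if self.estimated_rtt is None:
            # first sample: RFC 6298-style initialization (assumption)
            self.estimated_rtt = sample_rtt
            self.dev_rtt = sample_rtt / 2
        else:
            # DevRTT uses the previous EstimatedRTT, then EstimatedRTT is updated
            self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                            + self.beta * abs(sample_rtt - self.estimated_rtt))
            self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_rtt
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator()
print([est.update(rtt) for rtt in (100, 100, 100)])  # [300.0, 250.0, 212.5]
```

With perfectly steady samples, DevRTT decays by a factor of 0.75 per update, so the safety margin shrinks toward zero.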
RTT Measurement & Retransmissions
• How should you handle ACKs for retransmitted segments when measuring RTT?
  – Note: an ACK could be delayed for the original segment, or arrive early for the retransmitted segment
• Choices:
  – Associate with the original packet: may overestimate the true RTT
  – Associate with the retransmitted packet: may underestimate the true RTT
Karn's Algorithm
• Idea: don't consider samples from retransmitted segments
  – OK, but what if the current timeout value is too small for the network delay?
  – In this case, the sender would keep timing out but couldn't adjust the timeout according to the formula
• Solution: if a measurement can't be made, use exponential backoff until a new measurement can be made
  – by a factor of 2, with a limit
• Resume the normal sampling algorithm afterwards
  – usual sampling period is once per RTT
• See also RFC 2988
  – the TCP timestamp option can be used to remove the ambiguity about which ACK matches which packet
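Karn's rule and the backoff can be sketched as follows. Several details are assumptions for illustration: the class `KarnTimer`, its 60-second default cap, and the idea that the caller feeds accepted samples to an RTT estimator and refreshes `base_rto`. The slide only says "by a factor of 2, with a limit".

```python
from typing import Optional


class KarnTimer:
    """Karn's rule plus exponential backoff (hypothetical class)."""

    def __init__(self, base_rto: float, max_rto: float = 60.0):
        self.base_rto = base_rto  # from the RTT estimator; caller refreshes it
        self.max_rto = max_rto    # backoff limit (illustrative value)
        self.backoff = 1

    @property
    def rto(self) -> float:
        return min(self.base_rto * self.backoff, self.max_rto)

    def on_timeout(self) -> None:
        # no usable sample: double the timeout, up to the limit
        if self.base_rto * self.backoff < self.max_rto:
            self.backoff *= 2

    def on_ack(self, sample_rtt: float, was_retransmitted: bool) -> Optional[float]:
        """Return the RTT sample to feed the estimator, or None if it must be ignored."""
        if was_retransmitted:
            return None       # ambiguous: original or retransmission? discard
        self.backoff = 1      # clean measurement: resume normal sampling
        return sample_rtt


timer = KarnTimer(base_rto=1.0, max_rto=8.0)
for _ in range(5):
    timer.on_timeout()
print(timer.rto)  # 8.0
```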
Fast Retransmit
• Time-out period is often relatively long
  – long delay before resending a lost packet
• Detect lost segments via duplicate ACKs
  – sender often sends many segments back-to-back
  – if a segment is lost, there will likely be many duplicate ACKs
• If the sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost
  – fast retransmit: resend the segment before the timer expires
Fast retransmit algorithm:
event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {  /* a duplicate ACK for an already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            resend segment with sequence number y  /* fast retransmit */
        }
    }
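Here is a runnable Python version of the fast retransmit pseudocode, with the timer and transmissions stubbed out as log entries. The class name is hypothetical. The per-y duplicate counter is simplified to a single counter that resets on every new ACK.

```python
class FastRetransmitSender:
    """Sender-side ACK handling with fast retransmit (hypothetical class)."""

    def __init__(self, send_base: int, next_seq: int):
        self.send_base = send_base  # oldest unACKed byte
        self.next_seq = next_seq    # next byte to be sent
        self.dup_acks = 0
        self.log: list[str] = []

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            self.send_base = y
            self.dup_acks = 0
            if self.send_base < self.next_seq:  # not-yet-acknowledged segments remain
                self.log.append("start timer")
        else:
            self.dup_acks += 1                  # duplicate ACK for already ACKed data
            if self.dup_acks == 3:
                self.log.append(f"fast retransmit {y}")


sender = FastRetransmitSender(send_base=0, next_seq=5000)
for ack in (1000, 1000, 1000, 1000, 1000, 5000):
    sender.on_ack(ack)
print(sender.log)  # ['start timer', 'fast retransmit 1000']
```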
TCP
Flow Control
TCP Flow Control
• The receive side of a TCP connection has a receive buffer
• Speed-matching service: matching the send rate to the receiving app's drain rate
• The app process may be slow at reading from the buffer
Flow control: the sender won't overflow the receiver's buffer by transmitting too much, too fast
TCP Flow Control (cont'd)
(Suppose the TCP receiver discards out-of-order segments)
• Spare room in buffer:
  RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Receiver advertises the spare room by including the value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees the receive buffer doesn't overflow
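The two flow-control rules can be written as Python functions. The names are hypothetical, byte numbers are plain integers, and sequence-number wrap-around is ignored:

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room the receiver advertises: RcvBuffer - (LastByteRcvd - LastByteRead)."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent: int, last_byte_acked: int, nbytes: int, rcv_win: int) -> bool:
    """True if sending `nbytes` more keeps unACKed data within RcvWindow."""
    return (last_byte_sent - last_byte_acked) + nbytes <= rcv_win


win = rcv_window(4096, last_byte_rcvd=3000, last_byte_read=1000)
print(win)                                   # 2096
print(sender_may_send(5000, 4000, 1000, win))  # True
```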
Flow Control: Persistence Timer
• Suppose the sender is blocked because the receiver application hasn't picked up data (the receiver has advertised RcvWindow = 0)
• Then the receiver app reads n bytes
• The TCP receiver advertises a new window of size RcvWindow = n
• But suppose this advertisement is lost: the sender would be stuck
  – Solution: persistence timer; the sender sends a probe after a period of inactivity to provoke a window update
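Flow Control: Silly Window Syndrome
(The receiving app reads data a few bytes at a time, so the receiver keeps advertising tiny windows and the sender keeps sending tiny segments.)
Clark's solution: don't advertise windows below a certain size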
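Clark's rule is usually stated as follows: hold back the window update until the receiver can accept a full MSS or its buffer is half empty, whichever is smaller. The minimal sketch below assumes the receiver is recovering from a zero window. It ignores the separate rule against shrinking a window that was already advertised. `window_to_advertise` is a hypothetical name.

```python
def window_to_advertise(free_space: int, rcv_buffer: int, mss: int) -> int:
    """Advertise free buffer space only once it reaches min(MSS, RcvBuffer/2); else 0."""
    threshold = min(mss, rcv_buffer // 2)
    return free_space if free_space >= threshold else 0


print(window_to_advertise(1, rcv_buffer=8192, mss=1460))     # 0: a 1-byte window would be "silly"
print(window_to_advertise(1460, rcv_buffer=8192, mss=1460))  # 1460
```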
Study of TCP: Outline
• segment structure
• reliable data transfer
  – delayed ACKs
  – Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP
Connection Management
TCP Connection Management
Recall: TCP sender and receiver establish a "connection" before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g., RcvWindow)
• client: connection initiator
  connect(s, &dstaddr, …)
• server: contacted by client
  cl = accept(sv, &caddr, …)
Three-way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq. #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
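Here is a sketch of the sequence and acknowledgement numbers in the three steps. SYN and SYNACK each consume one sequence number, so each side acknowledges the other's initial sequence number + 1, modulo 2^32. `Segment` and `handshake` are illustrative names, not a socket API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    flags: str
    seq: int
    ack: Optional[int] = None


def handshake(client_isn: int, server_isn: int) -> list[Segment]:
    mod = 2 ** 32  # sequence numbers wrap at 2^32
    syn = Segment("SYN", client_isn)                                 # step 1
    synack = Segment("SYN+ACK", server_isn, (client_isn + 1) % mod)  # step 2
    ack = Segment("ACK", (client_isn + 1) % mod, (server_isn + 1) % mod)  # step 3
    return [syn, synack, ack]


for seg in handshake(1000, 5000):
    print(seg)
```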
TCP 3-way handshake
(Figure: TCP connection establishment.)
• Q1: why a 3-way and not a 2-way handshake?
• Q2: how do sender & receiver determine initial seq. numbers?
3-way Handshake & Delayed Duplicates
(Figure: (a) normal operation; (b) old SYN appearing out of nowhere; (c) duplicate SYN and duplicate ACK following SYN.)
The 3-way handshake is required to deal with scenarios (b) and (c).
Sequence Number Reuse
• Idea: tie initial TCP seq. numbers to a clock
  – increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attacks
  – use a PRNG; see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock value + F(src, dst, sport, dport, random())
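Here is a sketch of the RFC 1948 scheme in Python. The assumptions are as follows. The connection 4-tuple is hashed together with a per-boot secret, which stands in for the slide's random(). SHA-256 replaces the MD5 the RFC suggested. `now_us` is a caller-supplied microsecond clock. `isn` is a hypothetical helper.

```python
import hashlib
import os

SECRET = os.urandom(16)  # per-boot secret (plays the role of random() on the slide)


def isn(src: str, dst: str, sport: int, dport: int, now_us: int, secret: bytes = SECRET) -> int:
    """ISN = M + F(connection id, secret) mod 2^32, with M a 4-microsecond clock."""
    m = now_us // 4                                  # clock ticks every 4 microseconds
    conn_id = f"{src}|{dst}|{sport}|{dport}".encode()
    f = int.from_bytes(hashlib.sha256(conn_id + secret).digest()[:4], "big")
    return (m + f) % 2 ** 32                         # sequence numbers are 32 bits
```

For a given 4-tuple the ISN still advances with the clock, which guards against old incarnations. An off-path attacker who does not know the secret cannot compute the hashed offset.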
When Sequence Numbers Attack
• Suppose attacker A can predict the sequence number host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  – B believes it is talking to C and might grant permissions based on C's IP address
  – the attacker on A must suppress the RST packets C is likely to send; use a denial-of-service attack for that
• A sends a message to compromise B
When SYNs Attack
• Servers receiving a SYN must allocate resources
  – opens up the possibility of a denial-of-service attack, where the server is flooded with bogus SYN packets with forged IP source addresses
• Solution: SYN cookies
  – the server computes its initial sequence number (the "cookie") and sends the SYNACK, but does not allocate buffers
  – if the client continues with an ACK, check whether its ACK number could correspond to a SYNACK the server sent; if correct, allocate buffers
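Here is a simplified SYN-cookie sketch. The server derives its ISN from a keyed hash of the connection 4-tuple and a coarse time slot and stores nothing. When the final ACK arrives, the server recomputes the hash to validate it. The sketch rests on several assumptions. Real implementations (e.g., Linux) also encode the MSS and use a specific bit layout, which is omitted here. `cookie`, `ack_is_valid`, and the slot counter are hypothetical.

```python
import hashlib
import hmac
import os

SECRET = os.urandom(16)  # server secret (illustrative)


def cookie(src: str, dst: str, sport: int, dport: int, slot: int, secret: bytes = SECRET) -> int:
    """Server ISN derived from the 4-tuple and a coarse time slot; nothing is stored."""
    msg = f"{src}|{dst}|{sport}|{dport}|{slot}".encode()
    return int.from_bytes(hmac.new(secret, msg, hashlib.sha256).digest()[:4], "big")


def ack_is_valid(ack: int, src: str, dst: str, sport: int, dport: int, slot: int,
                 secret: bytes = SECRET) -> bool:
    """Check the client's ACK (= server ISN + 1) against the current or previous slot."""
    for s in (slot, slot - 1):
        if (cookie(src, dst, sport, dport, s, secret) + 1) % 2 ** 32 == ack:
            return True  # only now would the server allocate buffers
    return False
```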
Sequence Number Summary
• Goals, Set 1:
  – Guard against old duplicates in one connection → don't reuse sequence numbers until it's reasonably certain that old duplicates have disappeared
  – Even after a crash/restart → hence tie to time → but don't use a global clock → hence need a 3-way handshake where each side verifies the sequence number chosen by the other side before a successful connection occurs
• Goals, Set 2:
  – Don't allow hijacking of connections unless the attacker can eavesdrop → use a PRNG for initial seq. number choice
  – Don't allow SYN attacks → compute, but don't store, the initial sequence number
TCP Connection Management (cont.)
Closing a connection:
client closes socket: close(s);
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK; closes connection, sends FIN
(Figure: client/server timeline of FIN, ACK, FIN, ACK; the client goes through close, timed wait, and closed.)
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK
  – enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK; connection closed
Note: with a small modification, can handle simultaneous FINs
(Figure: the same FIN/ACK timeline, with client and server passing through closing, timed wait, and closed.)
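The four steps map onto the standard TCP state names from RFC 793. The slide's "timed wait" is TIME_WAIT, which lasts 2·MSL. Here is a table-driven sketch. One table serves both ends because the same state machine applies to client and server. The event labels and the `run` helper are illustrative.

```python
# (state, event) -> next state; event labels are illustrative strings
TRANSITIONS = {
    # active close (client)
    ("ESTABLISHED", "close / send FIN"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv FIN / send ACK"): "TIME_WAIT",
    ("FIN_WAIT_1", "recv FIN / send ACK"): "CLOSING",  # simultaneous FINs
    ("CLOSING", "recv ACK"): "TIME_WAIT",
    ("TIME_WAIT", "2*MSL timeout"): "CLOSED",
    # passive close (server)
    ("ESTABLISHED", "recv FIN / send ACK"): "CLOSE_WAIT",
    ("CLOSE_WAIT", "close / send FIN"): "LAST_ACK",
    ("LAST_ACK", "recv ACK"): "CLOSED",
}


def run(state: str, events: list[str]) -> list[str]:
    """Apply events in order and return the states visited."""
    path = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        path.append(state)
    return path


print(run("ESTABLISHED", ["close / send FIN", "recv ACK", "recv FIN / send ACK", "2*MSL timeout"]))
```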
TCP Connection FSM
(Figure: TCP connection state machine.)
The heavy solid line is the normal path for a client.
The heavy dashed line is the normal path for a server.
The light lines are unusual events.
Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.
TCP Connection Management (cont'd)
(Figure: TCP client lifecycle; TCP server lifecycle.)
Closing a Connection
• Note: the previous charts showed the normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  – No: the famous two-army problem
Summary
• TCP segments, acknowledgements & retransmission
  – delayed ACKs, Nagle's algorithm
  – fast retransmit
• RTT estimation & Karn's algorithm
• Flow control & silly window syndrome
• Connection management in TCP
• Attacks against TCP's connection management scheme
  – SYN attack
  – sequence number prediction attacks
TCP Miscellaneous
• MSS: Maximum Segment Size option
  – client/server agree on a larger than default (536 outside same subnet) MSS via an option on the SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for the receive window, to allow for LFNs ("elephants"): Long Fat Networks
• RFC 1323 timestamps: for accurate RTT measurement, and PAWS for protection against wrap-around of sequence numbers
• …
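LFNs need WSCALE because the 16-bit window field caps an unscaled window at 65,535 bytes. That is far below the bandwidth-delay product of a 10 Gbps, 100 ms path. Here is a small sketch that finds the smallest sufficient shift count, assuming RFC 1323's maximum shift of 14. `min_wscale` is a hypothetical helper.

```python
def min_wscale(bdp_bytes: int) -> int:
    """Smallest window-scale shift so that 65535 << shift covers the bandwidth-delay product."""
    shift = 0
    while (65535 << shift) < bdp_bytes:
        shift += 1
    if shift > 14:  # RFC 1323 limits the shift count to 14
        raise ValueError("bandwidth-delay product too large for TCP window scaling")
    return shift


bdp = 10_000_000_000 // 8 // 10  # 10 Gbps x 100 ms = 125,000,000 bytes
print(min_wscale(bdp))           # 11
```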
Study of TCP: Outline
• segment structure
• reliable data transfer
  – delayed ACKs
  – Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP
Congestion Control
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control!
• manifestations:
  – long delays (queueing in router buffers)
  – lost packets (buffer overflow at routers)
• a top-10 problem!
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
(Figure: Host A sends λin (original data) through a router with unlimited shared output link buffers; Host B receives λout.)
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmits a lost packet after a timeout
(Figure: Host A sends λin (original data) and λ'in (original data plus retransmitted data) through a router with finite shared output link buffers; Host B receives λout.)
Causes/costs of congestion: scenario 2
• Always: λin = λout (goodput)
• a) if no loss: λ'in = λin
• b) assume a "clairvoyant" sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes λ'in larger for the same λout (every packet transmitted twice)
"Costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: the link carries multiple copies of a packet
(Figure: three plots of λout vs. offered load, both axes up to R/2: (a) λout reaches R/2; (b) λout reaches R/3; (c) λout reaches R/4.)
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λin and λ'in increase?
(Figure: Host A sends λin (original data) and λ'in (original data plus retransmitted data) over multihop paths through routers with finite shared output link buffers; Host B receives λout.)
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted!
• ultimately leads to congestion collapse
(Figure: throughput λout between Host A and Host B.)
Reasons for Congestion Control
• Congested networks increase delay, even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case
Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from the network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate the sender should send at
TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission:
$$\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$$
• roughly,
$$\text{rate} \approx \frac{\text{CongWin}}{\text{RTT}}\ \text{bytes/sec}$$
• CongWin and RTT influence throughput
• CongWin is a dynamic function of perceived network congestion
How does the sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces its rate (CongWin) after a loss event
• (assumes congestion is the primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
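The sender-side limit and the rate approximation can be written as Python helpers. The names are hypothetical, and sequence-number wrap-around is ignored:

```python
def usable_window(last_byte_sent: int, last_byte_acked: int, congwin: int, rcv_window: int) -> int:
    """Bytes the sender may still put in flight: min(CongWin, RcvWindow) - unACKed bytes."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(congwin, rcv_window) - in_flight)


def approx_rate(congwin_bytes: float, rtt_s: float) -> float:
    """Rough sending rate in bytes/sec: CongWin / RTT."""
    return congwin_bytes / rtt_s


print(usable_window(10_000, 6_000, congwin=8_000, rcv_window=20_000))  # 4000
```

Whichever of CongWin or RcvWindow is smaller becomes the binding limit, so congestion control and flow control compose through a single min().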
TCP AIMD
multiplicative decrease: cut CongWin in half after a loss event
additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)
(Figure: congestion window, with axis marks at 8, 16, and 24 Kbytes, vs. time for a long-lived TCP connection.)
TCP Slow Start
• When the connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• Available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to a respectable rate
• When the connection begins, increase the rate exponentially fast until the first loss event
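Here is the slide's example arithmetic, assuming rate = CongWin/RTT with CongWin = 1 MSS (one segment per RTT):

```python
def initial_rate_bps(mss_bytes: int, rtt_s: float) -> float:
    """Slow-start initial rate in bits/s with CongWin = 1 MSS."""
    return mss_bytes * 8 / rtt_s


print(round(initial_rate_bps(500, 0.200)))  # 20000 bits/s = 20 kbps
```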
TCP Slow Start (more)
• When the connection begins, increase the rate exponentially until the first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received
• Summary: the initial rate is slow but ramps up exponentially fast
(Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart.)
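Here is a sketch of how the per-ACK rule produces doubling per RTT. It assumes no losses, no delayed ACKs (one ACK per segment), and CongWin counted in whole MSS units. `slow_start` is a hypothetical helper.

```python
def slow_start(rtts: int, congwin: int = 1) -> list[int]:
    """CongWin (in MSS) at the start of each RTT when every ACK adds 1 MSS."""
    history = []
    for _ in range(rtts):
        history.append(congwin)
        acks = congwin           # one ACK per segment sent this RTT
        for _ in range(acks):
            congwin += 1         # +1 MSS per ACK received
    return history


print(slow_start(4))  # [1, 2, 4, 8]
```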
TCP Tahoe vs Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before the timeout.
Implementation:
• Variable Threshold
• At a loss event, Threshold is set to 1/2 of CongWin just before the loss event
• If timeout event: set CongWin to 1 MSS
• If triple-dup-ACK event: set CongWin to Threshold and increase linearly
  (this is called fast recovery and was added in TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKsbull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout eventndash CongWin instead set
to 1 MSS ndash window then grows
exponentiallyndash to a threshold then
grows linearly
bull 3 dup ACKs indicates network capable of delivering some segments
bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
CS 4284 Spring 2013
TCP Sender Congestion ControlEvent State TCP Sender Action Commentary
ACK receipt for previously unacked data
Slow Start (SS)
CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA
Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data
CongestionAvoidance (CA)
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK
SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter slow start
Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
CS 4284 Spring 2013
TCP Throughput Idealized
bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start
bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput
to W2RTT bull Average steady-state throughput 75 WRTT
W
W2
CS 4284 Spring 2013
TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT
want 10 Gbps throughputbull Requires window size W = 83333 in-flight
segmentsbull Throughput in terms of loss rate
bull L = 210-10 Very lowbull Require almost perfect link
LRTTMSS221
Equation-Based Control
bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which
are samples of timeout events + 3-ack events)bull Instead equation-based control uses an
equation to compute sending rate based on this input
bull See RFC 5348 for more info
CS 4284 Spring 2013
TCP Fairness
CS 4284 Spring 2013
Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk
TCP connection 1
bottleneckrouter
capacity RTCP connection 2
TCP Fairness
CS 4284 Spring 2013
Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughputConn
e ctio
n 2
thro
u ghp
ut
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
CS 4284 Spring 2013
Fairness (more)Fairness and UDPbull Multimedia apps often
do not use TCPndash do not want rate throttled
by congestion controlbull Instead use UDP
ndash pump audiovideo at constant rate tolerate packet loss
bull TCP friendliness
Fairness and parallel TCP connections
bull nothing prevents app from opening parallel connections between 2 hosts
bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets
rate R10ndash new app asks for 9 TCPs gets
R2bull Thatrsquos what ldquodownload
acceleratorsrdquo do
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
Q how to set TCP timeout value
bull longer than RTTndash but RTT varies
bull too short premature timeoutndash unnecessary
retransmissions
bull too long slow reaction to segment loss
Q how to estimate RTT
bull SampleRTT measured time from segment transmission until ACK receipt
bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several
recent measurements not just current SampleRTT
CS 4284 Spring 2013
RTT Distributions
(a) Probability density of ACK arrival times in the data link layer (b) Probability density of ACK arrival times for TCP
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT
bull exponential weighted moving averagebull influence of past sample decreases
exponentially fastbull typical value = 0125
CS 4284 Spring 2013
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
CS 4284 Spring 2013
TCP Round Trip Time and TimeoutSetting the timeoutbull EstimatedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT larger safety marginbull first estimate of how much SampleRTT deviates
from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 0125 = 025)
EstimatedRTT = (1-)EstimatedRTT + SampleRTT
CS 4284 Spring 2013
RTT Measurement amp Retransmissions
bull How should you handle acks for retransmitted segments when measuring RTTndash Note ACK could be delayed for original
segment or early for retransmitted segmentbull Choices
ndash Associate with original packet may overestimate true RTT
ndash Associate with retransmitted packet may underestimate true RTT
CS 4284 Spring 2013
Karnrsquos Algorithmbull Idea donrsquot consider samples from retransmitted
segmentsndash Ok but what if current timeout value is too small for network
delayndash In this case would keep timing out but couldnrsquot adjust timeout
according to formulabull Solution If measurement canrsquot be made use exponential
backoff until new measurement can be madendash By factor of 2 with limit
bull Resume normal sampling algorithm afterwardsndash Usual sampling period is once per RTT
bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet
CS 4284 Spring 2013
Fast Retransmitbull Time-out period often relatively long
ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs
ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs
bull If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lostndash fast retransmit resend segment before timer expires
CS 4284 Spring 2013
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
TCP
Flow Control
CS 4284 Spring 2013
TCP Flow Controlbull receive side of
TCP connection has a receive buffer
bull speed-matching service matching the send rate to the receiving apprsquos drain rate
bull app process may be slow at reading from buffer
sender wonrsquot overflow
receiverrsquos buffer bytransmitting too
much too fast
flow control
CS 4284 Spring 2013
TCP Flow Control (contrsquod)
(Suppose TCP receiver discards out-of-order segments)
bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Rcvr advertises spare room by including value of RcvWindow in segments
bull Sender limits unACKed data to RcvWindowndash guarantees
receive buffer doesnrsquot overflow
CS 4284 Spring 2013
Flow Control Persistence Timerbull Suppose sender is blocked cause receiver
application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size
RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck
ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update
CS 4284 Spring 2013
Flow Control Silly Window Syndrome
Clarkrsquos solution donrsquot advertise windows below a certain size
CS 4284 Spring 2013
Study of TCP Outlinebull segment structurebull reliable data transfer
ndash delayed ACKsndash Naglersquos algorithm
bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control
TCP
Connection Management
CS 4284 Spring 2013
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control info
(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr
hellip)
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
CS 4284 Spring 2013
TCP 3-way handshake
TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums
CS 4284 Spring 2013
3-way Handshake amp Delayed Dups
(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)
CS 4284 Spring 2013
Sequence Number Reuse
bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a
connection with identical sequence numbersbull Must also guard against sequence number prediction attack
ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())
CS 4284 Spring 2013
When Sequence Numbers Attack
bull Suppose attacker A can predict sequence number a host B is going to use next
bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions
based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is
likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B
CS 4284 Spring 2013
When SYNs Attackbull Servers receiving SYN must allocate resources
ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
bull Solutionndash SYN cookies
bull Server creates ACK number sends ACK ndash but does not allocate buffers
ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct
Sequence Number Summarybull Goals Set 1
ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared
ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs
bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can
eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial
sequence number
CS 4284 Spring 2013
CS 4284 Spring 2013
TCP Connection Management (cont)Closing a connection
client closes socket close(s)
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
timed
wai
t
CS 4284 Spring 2013
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
timed
wai
tclosed
CS 4284 Spring 2013
TCP ConnectionFSM
The heavy solid line is the normal path for a client
The heavy dashed line is the normal path for a server
The light lines are unusual events
Each transition is labeled by the event causing it and the action resulting from it separated by a slash
CS 4284 Spring 2013
TCP Connection Management (contrsquod)
TCP client lifecycle TCP server lifecycle
CS 4284 Spring 2013
Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if
packets (FIN ACK) can be lostndash No Famous two-army problem
CS 4284 Spring 2013
Summarybull TCP segments acknowledgements amp
retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit
bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management
schemendash SYN attackndash Sequence number prediction attacks
TCP Miscellaneous
• MSS: Maximum Segment Size option
  - Client and server agree on a larger-than-default MSS (the default is 536 bytes outside the same subnet) via an option on the SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for the receive window, to allow for LFNs ("elephants"): Long Fat Networks
• RFC 1323 timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• …
Study of TCP Outline
• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP Congestion Control
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control!
• manifestations:
  - long delays (queueing in router buffers)
  - lost packets (buffer overflow at routers)
• a top-10 problem!
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Hosts A and B send original data at rate $\lambda_{in}$ through one router with unlimited shared output link buffers; throughput $\lambda_{out}$]
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmits a lost packet after a timeout
[Figure: Hosts A and B; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; finite shared output link buffers; throughput $\lambda_{out}$]
Causes/costs of congestion: scenario 2
• Always: $\lambda_{in} = \lambda_{out}$ (goodput)
• a) if no loss: $\lambda'_{in} = \lambda_{in}$
• b) assume a clairvoyant sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes $\lambda'_{in}$ larger for the same $\lambda_{out}$ (every packet transmitted twice)
"Costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: the link carries multiple copies of a packet
[Figure: panels (a), (b), (c) plotting $\lambda_{out}$ against $\lambda_{in}$, axes marked at R/2; the panels also mark goodput levels R/3 and R/4]
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
[Figure: Hosts A and B; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; finite shared output link buffers; throughput $\lambda_{out}$]
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted!
• ultimately leads to congestion collapse
[Figure: Host A and Host B; plot of throughput $\lambda_{out}$]
Reasons for Congestion Control
• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case
Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from the network
• congestion inferred from loss and delay observed by the end systems
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  - a single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - an explicit rate the sender should send at
TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission: LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• CongWin and RTT influence throughput: roughly, rate = CongWin / RTT bytes/sec
• CongWin is a dynamic function of perceived network congestion
How does the sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces its rate (CongWin) after a loss event
• (assumes congestion is the primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
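A minimal sketch of the two quantities on this slide: how much more data the sender may put in flight given min(CongWin, RcvWindow), and the rough one-window-per-RTT rate. The function and parameter names are illustrative only, not taken from any TCP implementation.

```python
def usable_window(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """Bytes the sender may still put in flight without violating
    LastByteSent - LastByteAcked <= min(CongWin, RcvWindow)."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cong_win, rcv_window) - in_flight)


def approx_rate(cong_win_bytes, rtt_sec):
    """Rough sending rate in bytes/sec: one full window per RTT."""
    return cong_win_bytes / rtt_sec


# 8000 bytes unacknowledged, CongWin = 12000, RcvWindow = 20000
print(usable_window(18000, 10000, 12000, 20000))  # 4000
print(approx_rate(12000, 0.5))                    # 24000.0
```

Here the smaller of the two windows (CongWin) is the binding limit, so RcvWindow does not matter.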
TCP AIMD
• multiplicative decrease: cut CongWin in half after a loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)
[Figure: sawtooth congestion window of a long-lived TCP connection over time, axis marked at 8, 16 and 24 Kbytes]
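The sawtooth can be reproduced with a per-RTT sketch of AIMD. It works at one-RTT granularity and takes the rounds in which a loss is detected as a hypothetical input. Real TCP infers loss from timeouts and duplicate ACKs instead.

```python
def aimd_trace(initial_cwnd, mss, rounds, loss_rounds):
    """Congestion window (bytes) at the start of each RTT round.

    loss_rounds: set of round indices in which a loss event is detected
    (a hypothetical input used only for illustration).
    """
    cwnd = initial_cwnd
    trace = []
    for r in range(rounds):
        trace.append(cwnd)
        if r in loss_rounds:
            cwnd = max(mss, cwnd // 2)  # multiplicative decrease, never below 1 MSS
        else:
            cwnd += mss                 # additive increase: +1 MSS per RTT
    return trace


print(aimd_trace(8000, 1000, 10, {4, 8}))
# [8000, 9000, 10000, 11000, 12000, 6000, 7000, 8000, 9000, 4500]
```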
TCP Slow Start
• When the connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes & RTT = 200 msec
  - initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to a respectable rate
• When the connection begins, increase the rate exponentially fast until the first loss event
TCP Slow Start (more)
• When the connection begins, increase the rate exponentially until the first loss event:
  - double CongWin every RTT
  - done by incrementing CongWin by 1 MSS for every ACK received
• Summary: the initial rate is slow but ramps up exponentially fast
[Figure: Host A sends one segment to Host B, then two segments one RTT later, then four segments]
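The sketch below shows how a per-ACK increment of 1 MSS doubles CongWin once per RTT. It also checks the 20 kbps figure from the previous slide. It assumes every segment is ACKed individually, with no delayed ACKs and no loss.

```python
def slow_start_rounds(mss, rounds):
    """CongWin (bytes) at the start of each RTT round during slow start.

    Assumes each segment in flight is ACKed individually and none is lost.
    """
    cwnd = mss  # connection begins with CongWin = 1 MSS
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        acks_this_round = cwnd // mss   # one ACK per segment sent this RTT
        for _ in range(acks_this_round):
            cwnd += mss                 # +1 MSS per ACK => doubling per RTT
    return history


def initial_rate_bps(mss_bytes, rtt_sec):
    """Initial rate of one MSS per RTT, in bits per second."""
    return mss_bytes * 8 / rtt_sec


print(slow_start_rounds(500, 4))   # [500, 1000, 2000, 4000]
print(initial_rate_bps(500, 0.2))  # about 20 kbps, as in the slide example
```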
TCP Tahoe vs. Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before the timeout.
Implementation:
• Variable Threshold
• At a loss event, Threshold is set to 1/2 of CongWin just before the loss event
• If timeout event: set CongWin to 1 MSS
• If triple-ACK event: set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)
Timeouts vs. 3-dup ACKs
• After 3 dup ACKs:
  - CongWin is cut in half
  - the window then grows linearly
• But after a timeout event:
  - CongWin is instead set to 1 MSS
  - the window then grows exponentially to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate that the network is capable of delivering some segments
• a timeout before 3 dup ACKs are received is a stronger, "more alarming" indicator of congestion
Summary: TCP Congestion Control (FSM)
Initial transition: cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0; enter slow start
Slow start:
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
• cwnd > ssthresh: move to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery
Congestion avoidance:
• new ACK: cwnd = cwnd + MSS·(MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery
Fast recovery:
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; move to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start
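Below is an event-driven Python sketch of the FSM above. It models only the window bookkeeping: transmissions are logged instead of sent. The example uses an illustrative ssthresh of 4000 bytes rather than 64 KB so that congestion avoidance is reached within a few ACKs.

```python
SLOW_START = "slow start"
CONG_AVOID = "congestion avoidance"
FAST_RECOVERY = "fast recovery"


class RenoSender:
    """Window bookkeeping of the congestion-control FSM (sizes in bytes).

    Actions are appended to `actions` instead of sending real segments.
    """

    def __init__(self, mss=1000, ssthresh=64 * 1024):
        self.mss = mss
        self.cwnd = mss            # initial: cwnd = 1 MSS
        self.ssthresh = ssthresh   # initial: ssthresh = 64 KB by default
        self.dup_acks = 0          # initial: dupACKcount = 0
        self.state = SLOW_START
        self.actions = []

    def on_new_ack(self):
        if self.state == FAST_RECOVERY:
            self.cwnd = self.ssthresh                      # deflate window
            self.state = CONG_AVOID
        elif self.state == SLOW_START:
            self.cwnd += self.mss                          # +1 MSS per ACK
            if self.cwnd > self.ssthresh:
                self.state = CONG_AVOID
        else:
            self.cwnd += self.mss * self.mss / self.cwnd   # about +1 MSS per RTT
        self.dup_acks = 0
        self.actions.append("transmit new segment(s) as allowed")

    def on_dup_ack(self):
        if self.state == FAST_RECOVERY:
            self.cwnd += self.mss                          # inflate per extra dup ACK
            self.actions.append("transmit new segment(s) as allowed")
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = FAST_RECOVERY
            self.actions.append("retransmit missing segment")

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.mss
        self.dup_acks = 0
        self.state = SLOW_START
        self.actions.append("retransmit missing segment")


s = RenoSender(mss=1000, ssthresh=4000)
for _ in range(4):
    s.on_new_ack()
print(s.state, s.cwnd)              # congestion avoidance 5000
for _ in range(3):
    s.on_dup_ack()
print(s.state, s.cwnd)              # fast recovery 5500.0
s.on_new_ack()
print(s.state, s.cwnd)              # congestion avoidance 2500.0
s.on_timeout()
print(s.state, s.cwnd, s.ssthresh)  # slow start 1000 1250.0
```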
Summary: TCP Congestion Control
• When CongWin is below Threshold, the sender is in the slow-start phase; the window grows exponentially
• When CongWin is above Threshold, the sender is in the congestion-avoidance phase; the window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS
TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Results in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS·(MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for the segment being acked | CongWin and Threshold not changed
TCP Throughput (Idealized)
• What's the average throughput of TCP as a function of window size and RTT?
  - Long-lived connection; ignore slow start
• When the window is W, throughput is W/RTT
• Just after a loss, the window drops to W/2 and throughput to W/(2·RTT)
• Average steady-state throughput: 0.75·W/RTT
[Figure: sawtooth window oscillating between W/2 and W]
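Where the 0.75 comes from: between losses the window climbs linearly from W/2 back to W (1 MSS per RTT). Its time average is therefore the midpoint of that ramp. The derivation below holds under that idealized, loss-only-at-W assumption.

```latex
\bar{W} = \frac{1}{2}\left(\frac{W}{2} + W\right) = \frac{3}{4}\,W,
\qquad
\text{throughput} \approx \frac{\bar{W}}{\mathrm{RTT}} = \frac{0.75\,W}{\mathrm{RTT}}
```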
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
  $$\text{throughput} = \frac{1.22 \cdot \mathrm{MSS}}{\mathrm{RTT}\,\sqrt{L}}$$
• ⇒ $L = 2 \cdot 10^{-10}$. Very low!
• Requires an almost perfect link
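The numbers on this slide can be checked with a few lines of arithmetic. The window is the bandwidth-delay product divided by the segment size. The loss rate comes from solving the throughput formula for L, with MSS converted to bits.

```python
def window_segments(throughput_bps, rtt_sec, mss_bytes):
    """In-flight segments needed: bandwidth-delay product / segment size."""
    return throughput_bps * rtt_sec / (8 * mss_bytes)


def max_loss_rate(throughput_bps, rtt_sec, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L (MSS in bits)."""
    return (1.22 * mss_bytes * 8 / (rtt_sec * throughput_bps)) ** 2


w = window_segments(10e9, 0.1, 1500)
loss = max_loss_rate(10e9, 0.1, 1500)
print(round(w))       # 83333
print(f"{loss:.2e}")  # 2.14e-10
```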
Equation-Based Control
• Note: TCP congestion control forms a control loop
  - Inputs: round-trip time and "loss events" (which are samples of timeout events + 3-dup-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate from this input
• See RFC 5348 for more info
TCP Fairness
Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K
[Figure: TCP connections 1 and 2 sharing a bottleneck router of capacity R]
Why is TCP fair?
Consider two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput vs. Connection 1 throughput, both axes up to R, with the equal-bandwidth-share line; alternating phases of congestion avoidance (additive increase) and loss (decrease window by a factor of 2)]
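This argument can be checked numerically with a toy model. Additive increase leaves the gap between two flows unchanged, while each halving cuts it in half, so the flows drift toward the equal-share line. The sketch assumes synchronized losses (both flows halve whenever their sum exceeds capacity) and abstract rate units. Both assumptions are illustrative simplifications.

```python
def two_flow_aimd(x1, x2, capacity, steps, increase=1.0):
    """Synchronized AIMD for two flows sharing one bottleneck.

    Each step both flows add `increase`; if their sum exceeds `capacity`,
    both see a loss and halve their rates.
    """
    for _ in range(steps):
        x1 += increase
        x2 += increase
        if x1 + x2 > capacity:
            x1 /= 2
            x2 /= 2
    return x1, x2


x1, x2 = two_flow_aimd(80.0, 10.0, 100.0, 500)
print(abs(x1 - x2))  # tiny: the initial gap of 70 has been halved at every loss
```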
Fairness (more)
Fairness and UDP:
• Multimedia apps often do not use TCP
  - they do not want their rate throttled by congestion control
• Instead they use UDP:
  - pump audio/video at a constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections:
• nothing prevents an app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
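The R/10 and R/2 figures follow from assuming every connection on the bottleneck gets an equal share. Under that same assumption, this small sketch computes an app's share of R.

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of bottleneck rate R obtained by an app opening
    `new_connections`, assuming every connection gets an equal share."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)


print(app_share(9, 1))  # 1/10
print(app_share(9, 9))  # 1/2
```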
RTT Distributions
[Figure: (a) Probability density of ACK arrival times in the data link layer. (b) Probability density of ACK arrival times for TCP.]
TCP Round Trip Time and Timeout
$$\text{EstimatedRTT} = (1 - \alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$$
• exponential weighted moving average
• the influence of a past sample decreases exponentially fast
• typical value: $\alpha = 0.125$
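Here is a minimal sketch of this moving average. Seeding the estimate with the first sample is an assumption (RFC 6298 does the same); the slide does not say how the average starts.

```python
def ewma(samples, alpha=0.125):
    """EstimatedRTT after each sample; the first sample seeds the estimate."""
    estimated = samples[0]
    history = [estimated]
    for sample in samples[1:]:
        estimated = (1 - alpha) * estimated + alpha * sample
        history.append(estimated)
    return history


print(ewma([100, 200, 200]))  # [100, 112.5, 123.4375]
```

A jump from 100 ms to 200 ms moves the estimate only an eighth of the way per sample. That is the "influence decreases exponentially" behavior described above.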
Example RTT estimation: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr
[Figure: SampleRTT and EstimatedRTT, RTT in milliseconds (100 to 350) vs. time in seconds (1 to 106)]
TCP Round Trip Time and Timeout
Setting the timeout:
• EstimatedRTT plus a "safety margin"
  - large variation in EstimatedRTT → larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:
  $$\text{EstimatedRTT} = (1 - \alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$$
  $$\text{DevRTT} = (1 - \beta)\cdot\text{DevRTT} + \beta\cdot\left|\text{SampleRTT} - \text{EstimatedRTT}\right|$$
  (typically $\alpha = 0.125$, $\beta = 0.25$)
• then set the timeout interval:
  $$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$$
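A sketch of the timeout computation follows. It uses the update order and first-sample initialization from RFC 6298: DevRTT is updated with the previous EstimatedRTT, and the first sample sets DevRTT to half the sample. RFC 6298's clock-granularity term and 1-second minimum are deliberately left out.

```python
class RttEstimator:
    """EstimatedRTT / DevRTT / TimeoutInterval, RFC 6298 update order."""

    def __init__(self, alpha=0.125, beta=0.25):
        self.alpha, self.beta = alpha, beta
        self.estimated = None
        self.dev = None

    def sample(self, rtt):
        if self.estimated is None:   # first measurement
            self.estimated = rtt
            self.dev = rtt / 2
        else:
            # DevRTT first, using the previous EstimatedRTT
            self.dev = (1 - self.beta) * self.dev + self.beta * abs(rtt - self.estimated)
            self.estimated = (1 - self.alpha) * self.estimated + self.alpha * rtt
        return self.timeout()

    def timeout(self):
        return self.estimated + 4 * self.dev


est = RttEstimator()
print(est.sample(100))  # 300.0
print(est.sample(120))  # 272.5
```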
RTT Measurement & Retransmissions
• How should you handle ACKs for retransmitted segments when measuring RTT?
  - Note: the ACK could be a delayed ACK for the original segment or an early ACK for the retransmitted segment
• Choices:
  - Associate with the original packet: may overestimate the true RTT
  - Associate with the retransmitted packet: may underestimate the true RTT
Karn's Algorithm
• Idea: don't consider samples from retransmitted segments
  - OK, but what if the current timeout value is too small for the network delay?
  - In this case, TCP would keep timing out but couldn't adjust the timeout according to the formula
• Solution: if a measurement can't be made, use exponential backoff until a new measurement can be made
  - by a factor of 2, with a limit
• Resume the normal sampling algorithm afterwards
  - the usual sampling period is once per RTT
• See also RFC 2988
  - the TCP timestamp option can be used to remove the ambiguity about which ACK matches which packet
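Here is a sketch of Karn's rule. ACKs for retransmitted segments are discarded as RTT samples, and each timeout doubles the RTO up to a cap. To keep the sketch short, the RTO is set to 2·SRTT (the classic BETA·SRTT form from RFC 793, with BETA = 2) instead of the DevRTT formula. The `max_rto` cap and the initial RTO are illustrative values.

```python
class KarnTimer:
    """Karn's algorithm: ignore ambiguous samples, back off on timeout."""

    def __init__(self, initial_rto=1.0, max_rto=64.0, alpha=0.125):
        self.rto = initial_rto
        self.max_rto = max_rto
        self.alpha = alpha
        self.srtt = None

    def on_timeout(self):
        # No usable measurement: back off by a factor of 2, with a limit.
        self.rto = min(2 * self.rto, self.max_rto)

    def on_ack(self, rtt_sample, was_retransmitted):
        if was_retransmitted:
            return  # ambiguous sample: ignore it and keep the backed-off RTO
        if self.srtt is None:
            self.srtt = rtt_sample
        else:
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * rtt_sample
        self.rto = min(2 * self.srtt, self.max_rto)  # resume normal sampling


t = KarnTimer(initial_rto=1.0, max_rto=8.0)
for _ in range(5):
    t.on_timeout()
print(t.rto)  # 8.0
t.on_ack(0.5, was_retransmitted=True)
print(t.rto)  # 8.0 (sample ignored)
t.on_ack(0.5, was_retransmitted=False)
print(t.rto)  # 1.0
```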
Fast Retransmit
• The time-out period is often relatively long:
  - long delay before resending a lost packet
• Detect lost segments via duplicate ACKs
  - The sender often sends many segments back-to-back
  - If a segment is lost, there will likely be many duplicate ACKs
• If the sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost:
  - fast retransmit: resend the segment before the timer expires
Fast retransmit algorithm:
event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {    /* a duplicate ACK for an already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            resend segment with sequence number y    /* fast retransmit */
        }
    }
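A runnable Python version of the ACK handler above is given below. Sequence numbers are byte offsets. The `unacked` dictionary (segment start mapped to length) and the `retransmitted` log are hypothetical bookkeeping, and no timer or network I/O is modeled.

```python
class FastRetransmitSender:
    """ACK handling with cumulative ACKs and fast retransmit."""

    def __init__(self, send_base=0):
        self.send_base = send_base
        self.unacked = {}          # segment start seq -> length
        self.dup_count = {}        # ACK value y -> number of duplicates seen
        self.timer_running = False
        self.retransmitted = []    # log instead of real retransmissions

    def send(self, seq, length):
        self.unacked[seq] = length
        self.timer_running = True

    def on_ack(self, y):
        if y > self.send_base:                        # new cumulative ACK
            self.send_base = y
            self.unacked = {s: n for s, n in self.unacked.items() if s + n > y}
            self.timer_running = bool(self.unacked)   # restart timer if data remains
        else:                                         # duplicate ACK
            self.dup_count[y] = self.dup_count.get(y, 0) + 1
            if self.dup_count[y] == 3:
                self.retransmitted.append(y)          # fast retransmit segment y


sender = FastRetransmitSender()
for seq in range(0, 5000, 1000):
    sender.send(seq, 1000)
sender.on_ack(1000)          # segment 0 arrives; segment 1000 is lost
for _ in range(3):           # segments 2000..4000 each trigger a dup ACK
    sender.on_ack(1000)
print(sender.retransmitted)  # [1000]
```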
TCP Flow Control
• the receive side of a TCP connection has a receive buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
• the app process may be slow at reading from the buffer
flow control: the sender won't overflow the receiver's buffer by transmitting too much, too fast
TCP Flow Control (cont'd)
(Suppose the TCP receiver discards out-of-order segments)
• spare room in buffer:
  RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• The receiver advertises spare room by including the value of RcvWindow in segments
• The sender limits unACKed data to RcvWindow
  - guarantees the receive buffer doesn't overflow
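A minimal sketch of the bookkeeping on this slide follows, with one function for the receiver's advertised RcvWindow and one for the sender's limit on unACKed data. Class and function names are illustrative. As on the slide, out-of-order segments are assumed to be discarded, so every received byte is in order.

```python
class Receiver:
    """Receive-buffer bookkeeping from the slide (all values in bytes)."""

    def __init__(self, rcv_buffer):
        self.rcv_buffer = rcv_buffer
        self.last_byte_rcvd = 0
        self.last_byte_read = 0

    def rcv_window(self):
        # RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
        return self.rcv_buffer - (self.last_byte_rcvd - self.last_byte_read)

    def receive(self, nbytes):
        if nbytes > self.rcv_window():
            raise ValueError("sender exceeded the advertised RcvWindow")
        self.last_byte_rcvd += nbytes

    def app_read(self, nbytes):
        nbytes = min(nbytes, self.last_byte_rcvd - self.last_byte_read)
        self.last_byte_read += nbytes
        return nbytes


def sender_may_send(last_byte_sent, last_byte_acked, rcv_window):
    """Bytes the sender can still send while keeping unACKed data <= RcvWindow."""
    return max(0, rcv_window - (last_byte_sent - last_byte_acked))


r = Receiver(4096)
r.receive(3000)
print(r.rcv_window())                               # 1096
r.app_read(1000)
print(r.rcv_window())                               # 2096
print(sender_may_send(5000, 3000, r.rcv_window()))  # 96
```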
Flow Control: Persistence Timer
• Suppose the sender is blocked because the receiver application hasn't picked up data (RcvWindow = 0)
• Then the receiver app reads n bytes
• The TCP receiver advertises a new window of size RcvWindow = n
• But suppose this advertisement is lost
• The sender would be stuck
  - Solution: persistence timer; the sender sends a probe after a period of inactivity to provoke a window update
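The sketch below shows the sender side of the persistence timer. While the peer's window is zero, each expiry sends a probe, and the next interval doubles up to a cap. The doubling follows RFC 1122's advice to grow probe intervals exponentially. The starting interval and the cap are illustrative numbers only.

```python
class PersistTimer:
    """Zero-window probing: probe while the peer's window stays closed."""

    def __init__(self, interval=1.0, max_interval=60.0):
        self.base_interval = interval
        self.interval = interval
        self.max_interval = max_interval
        self.peer_window = 0
        self.probes_sent = 0

    def on_window_update(self, window):
        self.peer_window = window
        self.interval = self.base_interval   # fresh window info: reset spacing

    def on_persist_timeout(self):
        """Called when the timer fires; returns True if a probe was sent."""
        if self.peer_window > 0:
            return False                      # window is open: no probe needed
        self.probes_sent += 1                 # probe provokes a window update
        self.interval = min(2 * self.interval, self.max_interval)
        return True


p = PersistTimer()
print(p.on_persist_timeout(), p.interval)  # True 2.0 (the update was lost)
print(p.on_persist_timeout(), p.interval)  # True 4.0
p.on_window_update(1024)                   # the reply to a probe gets through
print(p.on_persist_timeout())              # False
```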
Flow Control: Silly Window Syndrome
Clark's solution: don't advertise windows below a certain size
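One common statement of Clark's threshold is that the receiver should not reopen the window until it can accept a full MSS or its buffer is half empty, whichever is smaller. The sketch below applies that rule. The buffer and MSS values in the example are illustrative.

```python
def window_to_advertise(free_space, rcv_buffer, mss):
    """Clark's receiver-side rule: advertise 0 until at least
    min(MSS, RcvBuffer / 2) bytes are free, then advertise all free space."""
    if free_space >= min(mss, rcv_buffer // 2):
        return free_space
    return 0


print([window_to_advertise(f, 4096, 1460) for f in (1, 1000, 1460, 3000)])
# [0, 0, 1460, 3000]
```

Holding the window at 0 stops the sender from dribbling out tiny segments that each carry only a few bytes of payload.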
Study of TCP Outline
• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP Connection Management
TCP Connection Management
Recall: TCP sender and receiver establish a "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  connect(s, &dstaddr, …)
• server: contacted by client
  cl = accept(sv, &caddr, …)
Three-way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq. #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
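A small sketch of the sequence and acknowledgement numbers carried by the three handshake segments. A SYN occupies one sequence number, so each side acknowledges the other's ISN + 1, modulo 2^32. The `Segment` type is a hypothetical stand-in, not a real header layout.

```python
from dataclasses import dataclass

MOD = 2 ** 32  # sequence numbers wrap modulo 2^32


@dataclass
class Segment:
    flags: frozenset
    seq: int
    ack: int = 0
    data: bytes = b""


def three_way_handshake(client_isn, server_isn, first_data=b""):
    """Return the SYN, SYNACK and ACK segments of the handshake."""
    syn = Segment(frozenset({"SYN"}), seq=client_isn)            # step 1: no data
    synack = Segment(frozenset({"SYN", "ACK"}), seq=server_isn,
                     ack=(syn.seq + 1) % MOD)                    # step 2
    ack = Segment(frozenset({"ACK"}), seq=(client_isn + 1) % MOD,
                  ack=(synack.seq + 1) % MOD, data=first_data)   # step 3: may carry data
    return syn, synack, ack


syn, synack, ack = three_way_handshake(1000, 5000, b"GET")
print(synack.ack, ack.seq, ack.ack)  # 1001 1001 5001
```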
TCP 3-way handshake
[Figure: TCP connection establishment]
• Q1: why a 3-way and not a 2-way handshake?
• Q2: how do sender & receiver determine initial seq. numbers?
3-way Handshake & Delayed Duplicates
[Figure: (a) Normal operation. (b) Old SYN appearing out of nowhere. (c) Duplicate SYN and duplicate ACK following SYN.]
The 3-way handshake is required to deal with scenarios (b) and (c).
Sequence Number Reuse
• Idea: tie initial TCP seq. numbers to a clock
  - Increment every 4 µs: guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attacks
  - Use a PRNG; see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock value + F(src, dst, sport, dport, random())
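A sketch of an RFC 1948-style ISN generator: a 4-microsecond clock plus a keyed hash of the connection 4-tuple, taken modulo 2^32. HMAC-SHA256 is substituted for F here (RFC 1948 itself suggests MD5 over the connection id and secret data). The per-boot `SECRET` is a hypothetical stand-in for the slide's random() input.

```python
import hashlib
import hmac
import os
import time

SECRET = os.urandom(16)  # hypothetical per-boot secret key


def initial_seq_number(src, dst, sport, dport, now_us=None):
    """ISN = 4-microsecond clock + F(src, dst, sport, dport), modulo 2^32."""
    if now_us is None:
        now_us = time.monotonic_ns() // 1000
    clock = now_us // 4                               # ticks every 4 microseconds
    msg = f"{src}|{dst}|{sport}|{dport}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    offset = int.from_bytes(digest[:4], "big")        # unpredictable per 4-tuple
    return (clock + offset) % 2 ** 32


a = initial_seq_number("10.0.0.1", "10.0.0.2", 1234, 80, now_us=0)
b = initial_seq_number("10.0.0.1", "10.0.0.2", 1234, 80, now_us=400)
print((b - a) % 2 ** 32)  # 100: same 4-tuple, 400 microseconds later
```

The clock term still separates successive incarnations of the same connection. Meanwhile, an off-path attacker who does not know the secret cannot compute the offset for another 4-tuple.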
When Sequence Numbers Attack
• Suppose attacker A can predict the sequence number a host B is going to use next
• By using a spoofed source IP C, A can engage in a successful 3-way handshake with B
  - B believes it is talking to C, and might grant permissions based on C's IP address
  - The attacker on A must suppress the RST packets C is likely to send; a denial-of-service attack is used for that
• A sends a message to compromise B
When SYNs Attack
• Servers receiving a SYN must allocate resources
  - This opens up the possibility of a denial-of-service attack in which the server is flooded with bogus SYN packets carrying forged IP source addresses
• Solution: SYN cookies
  - The server computes its initial sequence number (the cookie) and sends the SYNACK, but does not allocate buffers
  - If the client continues with an ACK, the server checks whether that ACK could have been sent in response to the cookie, then allocates buffers if it is correct
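A simplified SYN-cookie sketch follows. The server's ISN is a keyed hash of the 4-tuple, the client's ISN and a coarse time-slot counter, so no per-connection state is kept until the final ACK arrives. On that ACK, the server recomputes the hash for the current and the previous slot. The key, counter and helper names are hypothetical. Deployed SYN cookies also pack a timestamp and an MSS encoding into the 32 bits, which this sketch omits.

```python
import hashlib
import hmac
import os

COOKIE_KEY = os.urandom(16)  # hypothetical server secret


def _mac(src, dst, sport, dport, client_isn, counter):
    msg = f"{src}|{dst}|{sport}|{dport}|{client_isn}|{counter}".encode()
    digest = hmac.new(COOKIE_KEY, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def make_cookie(src, dst, sport, dport, client_isn, counter):
    """Server ISN for the SYNACK; nothing is stored per connection."""
    return _mac(src, dst, sport, dport, client_isn, counter)


def check_ack(src, dst, sport, dport, ack_seq, ack_num, counter, window=1):
    """Final ACK carries seq = client_isn + 1 and ack = cookie + 1.
    Accept if the cookie matches the current or a recent time slot."""
    client_isn = (ack_seq - 1) % 2 ** 32
    cookie = (ack_num - 1) % 2 ** 32
    return any(_mac(src, dst, sport, dport, client_isn, c) == cookie
               for c in range(counter - window, counter + 1))


cookie = make_cookie("10.0.0.2", "10.0.0.1", 40000, 80, client_isn=1000, counter=7)
ok = check_ack("10.0.0.2", "10.0.0.1", 40000, 80, ack_seq=1001,
               ack_num=(cookie + 1) % 2 ** 32, counter=8)
bad = check_ack("10.0.0.2", "10.0.0.1", 40000, 80, ack_seq=1001,
                ack_num=(cookie + 2) % 2 ** 32, counter=8)
print(ok, bad)  # True False (a guess matches only with probability about 2^-32)
```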
Sequence Number Summary
• Goals, Set 1:
  - Guard against old duplicates in one connection: don't reuse sequence numbers until it is reasonably certain that old duplicates have disappeared
  - Even after a crash/restart: hence tie to time; but don't use a global clock; hence a 3-way handshake is needed, where each side verifies the sequence number chosen by the other side before the connection is successfully established
• Goals, Set 2:
  - Don't allow hijacking of connections unless the attacker can eavesdrop: use a PRNG for the initial seq. number choice
  - Don't allow SYN attacks: compute, but don't store, the initial sequence number
CS 4284 Spring 2013
CS 4284 Spring 2013
TCP Connection Management (cont)Closing a connection
client closes socket close(s)
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
timed
wai
t
CS 4284 Spring 2013
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
timed
wai
tclosed
CS 4284 Spring 2013
TCP ConnectionFSM
The heavy solid line is the normal path for a client
The heavy dashed line is the normal path for a server
The light lines are unusual events
Each transition is labeled by the event causing it and the action resulting from it separated by a slash
CS 4284 Spring 2013
TCP Connection Management (contrsquod)
TCP client lifecycle TCP server lifecycle
CS 4284 Spring 2013
Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if
packets (FIN ACK) can be lostndash No Famous two-army problem
CS 4284 Spring 2013
Summarybull TCP segments acknowledgements amp
retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit
bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management
schemendash SYN attackndash Sequence number prediction attacks
CS 4284 Spring 2013
TCP Miscellaneousbull MSS Maximum Segment Size Option
ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN
bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to
allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT
measurement PAWS for protection against wrap-around for sequence numbers
bull hellip
CS 4284 Spring 2013
Study of TCP Outlinebull segment structurebull reliable data transfer
ndash delayed ACKsndash Naglersquos algorithm
bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control
TCP
Congestion Control
CS 4284 Spring 2013
Principles of Congestion Control
Congestionbull informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull different from flow controlbull manifestations
ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)
bull a top-10 problem
CS 4284 Spring 2013
Causescosts of congestion scenario 1 bull two senders two
receiversbull one router infinite
buffers bull no retransmission
bull large delays when congested
bull (but no reduction in throughput here)
unlimited shared output link buffers
Host Alin original data
Host B
lout
CS 4284 Spring 2013
Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet
after timeout
finite shared output link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2lin
l out
a c
R2
R2lin
l out
R4
R2
R2lin
l out
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit
Q what happens as lin and lrsquoin increase
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission
capacity used for that packet was wastedbull ultimately leads to congestion collapse
Host A
Host B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even if no packet loss occurs
bull If packet loss occurs needed retransmission require offered load to be greater than goodput
bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion control
bull no explicit feedback from network
bull congestion inferred from end-system observed loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systemsndash single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Controlbull end-end control (no network
assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput
bull CongWin is dynamic function of perceived network congestion
How does sender notice congestion
bull loss event = timeout or 3 duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is primary cause of loss)
Three mechanismsbull AIMDbull slow startbull fast recovery
rate = CongWin
RTT Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
multiplicative decrease cut CongWin in half after loss event
additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Startbull When connection
begins CongWin = 1 MSSndash Example MSS =
500 bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly
ramp up to respectable rate
bull When connection begins increase rate exponentially fast until first loss event
CS 4284 Spring 2013
TCP Slow Start (more)bull When connection begins
increase rate exponentially until first loss eventndash double CongWin every
RTTndash done by incrementing CongWin for every ACK received
bull Summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
CS 4284 Spring 2013
TCP Tahoe vs RenoQ When should the
exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss
eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly
(this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKsbull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout eventndash CongWin instead set
to 1 MSS ndash window then grows
exponentiallyndash to a threshold then
grows linearly
bull 3 dup ACKs indicates network capable of delivering some segments
bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
CS 4284 Spring 2013
TCP Sender Congestion ControlEvent State TCP Sender Action Commentary
ACK receipt for previously unacked data
Slow Start (SS)
CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA
Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data
CongestionAvoidance (CA)
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK
SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter slow start
Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
CS 4284 Spring 2013
TCP Throughput Idealized
bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start
bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput
to W2RTT bull Average steady-state throughput 75 WRTT
W
W2
CS 4284 Spring 2013
TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT
want 10 Gbps throughputbull Requires window size W = 83333 in-flight
segmentsbull Throughput in terms of loss rate
bull L = 210-10 Very lowbull Require almost perfect link
LRTTMSS221
Equation-Based Control
bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which
are samples of timeout events + 3-ack events)bull Instead equation-based control uses an
equation to compute sending rate based on this input
bull See RFC 5348 for more info
CS 4284 Spring 2013
TCP Fairness
CS 4284 Spring 2013
Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk
TCP connection 1
bottleneckrouter
capacity RTCP connection 2
TCP Fairness
CS 4284 Spring 2013
Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughputConn
e ctio
n 2
thro
u ghp
ut
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
CS 4284 Spring 2013
Fairness (more)Fairness and UDPbull Multimedia apps often
do not use TCPndash do not want rate throttled
by congestion controlbull Instead use UDP
ndash pump audiovideo at constant rate tolerate packet loss
bull TCP friendliness
Fairness and parallel TCP connections
bull nothing prevents app from opening parallel connections between 2 hosts
bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets
rate R10ndash new app asks for 9 TCPs gets
R2bull Thatrsquos what ldquodownload
acceleratorsrdquo do
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT
bull exponential weighted moving averagebull influence of past sample decreases
exponentially fastbull typical value = 0125
CS 4284 Spring 2013
Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106time (seconnds)
RTT
(mill
isec
onds
)
SampleRTT Estimated RTT
CS 4284 Spring 2013
TCP Round Trip Time and TimeoutSetting the timeoutbull EstimatedRTT plus ldquosafety marginrdquo
ndash large variation in EstimatedRTT larger safety marginbull first estimate of how much SampleRTT deviates
from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 0125 = 025)
EstimatedRTT = (1-)EstimatedRTT + SampleRTT
CS 4284 Spring 2013
RTT Measurement amp Retransmissions
bull How should you handle acks for retransmitted segments when measuring RTTndash Note ACK could be delayed for original
segment or early for retransmitted segmentbull Choices
ndash Associate with original packet may overestimate true RTT
ndash Associate with retransmitted packet may underestimate true RTT
CS 4284 Spring 2013
Karnrsquos Algorithmbull Idea donrsquot consider samples from retransmitted
segmentsndash Ok but what if current timeout value is too small for network
delayndash In this case would keep timing out but couldnrsquot adjust timeout
according to formulabull Solution If measurement canrsquot be made use exponential
backoff until new measurement can be madendash By factor of 2 with limit
bull Resume normal sampling algorithm afterwardsndash Usual sampling period is once per RTT
bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet
CS 4284 Spring 2013
Fast Retransmitbull Time-out period often relatively long
ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs
ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs
bull If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lostndash fast retransmit resend segment before timer expires
CS 4284 Spring 2013
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
TCP
Flow Control
CS 4284 Spring 2013
TCP Flow Controlbull receive side of
TCP connection has a receive buffer
bull speed-matching service matching the send rate to the receiving apprsquos drain rate
bull app process may be slow at reading from buffer
sender wonrsquot overflow
receiverrsquos buffer bytransmitting too
much too fast
flow control
CS 4284 Spring 2013
TCP Flow Control (cont'd)
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn't overflow
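The two flow-control rules can be written as small functions. This is a sketch that assumes byte-numbered variables named as on the slide and a receiver that discards out-of-order data; the function names are hypothetical.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room in the receive buffer: RcvBuffer - [LastByteRcvd - LastByteRead]."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def bytes_sender_may_send(last_byte_sent, last_byte_acked, advertised_window):
    """Keep unACKed data (LastByteSent - LastByteAcked) within RcvWindow."""
    unacked = last_byte_sent - last_byte_acked
    return max(0, advertised_window - unacked)
```

For example, with a hypothetical 4096-byte buffer in which 3000 bytes have arrived and the application has read 1000, rcv_window(4096, 3000, 1000) is 2096; a sender with 500 bytes unACKed may then send 1596 more.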
Flow Control: Persistence Timer
• Suppose the sender is blocked because the receiver application hasn't picked up data
• Then the receiver app reads n bytes
• TCP receiver advertises a new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
  - Solution: persistence timer: the sender sends a probe after a period of inactivity to provoke a window update
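The sender side of the persistence timer can be sketched as follows. It is a simplified model under stated assumptions: the class name is hypothetical, timer expiry is modeled as an explicit method call rather than real time, and the probe itself is only counted.

```python
class ZeroWindowSender:
    """Sender-side zero-window handling with a persistence timer (sketch)."""

    def __init__(self):
        self.peer_window = 0          # last RcvWindow advertised by the receiver
        self.persist_timer_armed = False
        self.probes_sent = 0

    def on_window_advertisement(self, window):
        self.peer_window = window
        # Keep the persistence timer running only while the window is closed.
        self.persist_timer_armed = window == 0

    def on_persist_timeout(self):
        if self.peer_window == 0:
            # Send a window probe; the receiver's reply carries its current
            # RcvWindow, which recovers from a lost window update.
            self.probes_sent += 1
            self.persist_timer_armed = True
```

If the update "RcvWindow = n" is lost, the sender keeps probing; the first update that does arrive disarms the timer.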
Flow Control: Silly Window Syndrome
Clark's solution: don't advertise windows below a certain size
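Clark's rule can be sketched as a receiver that advertises a zero window instead of a tiny one. The slide only says "a certain size"; the threshold used here, min(MSS, half the receive buffer), is an assumption modeled on RFC 1122's receiver-side rule, and the function name is hypothetical.

```python
def advertised_window(free_space, mss, rcv_buffer):
    """Receiver-side silly window avoidance (Clark): advertise 0, not a tiny window.

    Assumed threshold: min(MSS, half the receive buffer).
    """
    threshold = min(mss, rcv_buffer // 2)
    return free_space if free_space >= threshold else 0
```

With a 1460-byte MSS and a 64 KB buffer, 100 free bytes are advertised as 0, while 1460 free bytes are advertised as-is.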
Study of TCP: Outline
• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP
Connection Management
TCP Connection Management
Recall: TCP sender and receiver establish a "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
    connect(s, &dstaddr, …)
• server: contacted by client
    cl = accept(sv, &caddr, …)
Three-way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
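The sequence and acknowledgement numbers of the three steps can be sketched in Python. The Segment type and function name are hypothetical; the sketch relies on the standard rule that a SYN consumes one sequence number and that sequence numbers wrap modulo 2^32.

```python
from dataclasses import dataclass
from typing import Optional

SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


@dataclass
class Segment:
    flags: str
    seq: int
    ack: Optional[int] = None


def three_way_handshake(client_isn, server_isn):
    """Return the SYN, SYNACK and ACK segments of a connection setup."""
    syn = Segment("SYN", seq=client_isn)  # step 1: client's initial seq #, no data
    synack = Segment("SYN+ACK", seq=server_isn,
                     ack=(client_isn + 1) % SEQ_MOD)  # step 2: SYN uses one seq #
    ack = Segment("ACK", seq=(client_isn + 1) % SEQ_MOD,
                  ack=(server_isn + 1) % SEQ_MOD)  # step 3: may carry data
    return syn, synack, ack
```

For client ISN 1000 and server ISN 5000, the SYNACK acknowledges 1001 and the final ACK carries seq 1001, ack 5001.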
TCP 3-way handshake
TCP connection establishment:
• Q1: why a 3-way and not a 2-way handshake?
• Q2: how do sender & receiver determine initial seq. numbers?
3-way Handshake & Delayed Dups
[Figure: (a) Normal operation. (b) Old SYN appearing out of nowhere. (c) Duplicate SYN and duplicate ACK following SYN.]
The 3-way handshake is required to deal with scenarios (b) and (c).
Sequence Number Reuse
• Idea: tie initial TCP seq. numbers to a clock
  - Incrementing every 4 µs guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  - Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock value + F(src, dst, sport, dport, random())
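An RFC 1948-style ISN generator can be sketched as follows. Assumptions: the slide's random() is modeled as a secret key mixed into a keyed hash (RFC 1948 suggests MD5 for F), the monotonic clock stands in for the 4 µs ISN clock, and SECRET and the function name are hypothetical.

```python
import hashlib
import os
import time

SECRET = os.urandom(16)  # hypothetical per-boot secret; never sent on the wire


def initial_sequence_number(src, dst, sport, dport, now_us=None, secret=SECRET):
    """ISN = M + F(src, dst, sport, dport, secret), modulo 2**32."""
    if now_us is None:
        now_us = time.monotonic_ns() // 1000
    m = (now_us // 4) % 2 ** 32  # clock that ticks every 4 microseconds
    key = f"{src}|{dst}|{sport}|{dport}".encode() + secret
    f = int.from_bytes(hashlib.md5(key).digest()[:4], "big")  # per-connection offset
    return (m + f) % 2 ** 32
```

The clock term keeps successive incarnations of the same connection moving forward, while the hashed offset differs per 4-tuple, so an attacker cannot predict the ISN from connections to other ports.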
When Sequence Numbers Attack
• Suppose attacker A can predict the sequence number that host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  - B believes it is talking to C, and might grant permissions based on C's IP address
  - Attacker on A must suppress the RST packets C is likely to send; it can use a denial-of-service attack for that
• A sends a message to compromise B
When SYNs Attack
• Servers receiving a SYN must allocate resources
  - Opens up the possibility of a denial-of-service attack where the server is flooded with bogus SYN packets with forged IP source addresses
• Solution: SYN cookies
  - Server computes its initial sequence number (the cookie) and sends the SYNACK, but does not allocate buffers
  - If the client continues with an ACK, the server checks whether that ACK could have been sent in reply to its SYNACK, and allocates buffers only if it is correct
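A simplified SYN cookie scheme can be sketched in Python. It is an illustration under stated assumptions, not the layout real stacks use (those also pack the MSS and a time counter into the cookie bits): the secret, the 64 s slot length, and the function names are hypothetical, and HMAC-SHA256 stands in for the server's keyed hash.

```python
import hashlib
import hmac

SECRET = b"hypothetical-server-secret"
SLOT_SECONDS = 64  # hypothetical lifetime of one cookie time slot
SEQ_MOD = 2 ** 32


def make_cookie(src, dst, sport, dport, client_isn, now_s, secret=SECRET):
    """Server ISN for the SYNACK, derived from the connection and a time slot."""
    slot = int(now_s) // SLOT_SECONDS
    msg = f"{src}|{dst}|{sport}|{dport}|{client_isn}|{slot}".encode()
    return int.from_bytes(hmac.new(secret, msg, hashlib.sha256).digest()[:4], "big")


def ack_is_valid(src, dst, sport, dport, ack_seq, ack_num, now_s, secret=SECRET):
    """Recompute the cookie from the final ACK alone; nothing was stored per SYN."""
    client_isn = (ack_seq - 1) % SEQ_MOD  # the client's SYN consumed one seq number
    for age in (0, SLOT_SECONDS):         # accept the current or previous slot
        cookie = make_cookie(src, dst, sport, dport, client_isn, now_s - age, secret)
        if (cookie + 1) % SEQ_MOD == ack_num:
            return True
    return False
```

Only after ack_is_valid succeeds would the server allocate buffers; a flood of SYNs costs it one hash computation per SYN and no memory.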
Sequence Number Summary
• Goals, Set 1:
  - Guard against old duplicates in one connection -> don't reuse sequence numbers until it's reasonably certain that old duplicates have disappeared
  - Even after a crash/restart -> hence tie to time -> but don't use a global clock -> hence need a 3-way handshake where each side verifies the sequence number chosen by the other side before a connection is successfully established
• Goals, Set 2:
  - Don't allow hijacking of connections unless the attacker can eavesdrop -> use PRNG for initial seq number choice
  - Don't allow SYN attacks -> compute but don't store initial sequence number
TCP Connection Management (cont.)
Closing a connection:
client closes socket: close(s);
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline: client sends FIN, server replies with ACK and then sends its own FIN, client replies with ACK; the client passes through timed wait before the connection is closed]
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK
  - Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with a small modification, can handle simultaneous FINs.
[Figure: the same FIN/ACK exchange, with both sides closing; the client passes through timed wait before it is closed, the server closes on receiving the last ACK]
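Steps 1 to 4 can be traced through the closing part of the connection state machine. This sketch uses the RFC 793 state names (TIME_WAIT is the "timed wait" above, lasting 2 × MSL); the dictionaries and the run helper are hypothetical, and the simultaneous-FIN path (CLOSING) is omitted.

```python
# (state, event) -> (next state, action taken)
ACTIVE_CLOSE = {  # client path
    ("ESTABLISHED", "close"): ("FIN_WAIT_1", "send FIN"),
    ("FIN_WAIT_1", "recv ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_2", "recv FIN"): ("TIME_WAIT", "send ACK"),
    ("TIME_WAIT", "recv FIN"): ("TIME_WAIT", "send ACK"),  # retransmitted FIN is ACKed again
    ("TIME_WAIT", "timeout"): ("CLOSED", None),             # timed wait (2 * MSL) expires
}

PASSIVE_CLOSE = {  # server path
    ("ESTABLISHED", "recv FIN"): ("CLOSE_WAIT", "send ACK"),
    ("CLOSE_WAIT", "close"): ("LAST_ACK", "send FIN"),
    ("LAST_ACK", "recv ACK"): ("CLOSED", None),
}


def run(fsm, events, state="ESTABLISHED"):
    """Apply events in order; return the final state and the actions taken."""
    actions = []
    for event in events:
        state, action = fsm[(state, event)]
        if action is not None:
            actions.append(action)
    return state, actions
```

The extra "recv FIN" in TIME_WAIT shows why the client lingers: if its final ACK is lost, the server retransmits its FIN and the client can still answer it.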
TCP Connection FSM
The heavy solid line is the normal path for a client.
The heavy dashed line is the normal path for a server.
The light lines are unusual events.
Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.
TCP Connection Management (cont'd)
[Figures: TCP client lifecycle; TCP server lifecycle]
Closing a Connection
• Note: previous charts showed the normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  - No: this is the famous two-army problem
Summary
• TCP segments, acknowledgements & retransmission
  - Delayed ACKs, Nagle's algorithm
  - Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
  - SYN attack
  - Sequence number prediction attacks
TCP Miscellaneous
• MSS: Maximum Segment Size option
  - Client/server agree on a larger than default (536 outside same subnet) MSS via an option on the SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFNs ("elephants"): Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• …
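Why LFNs need WSCALE can be shown with a small calculation: the window must cover the bandwidth-delay product, but the window field is 16 bits. This sketch assumes the RFC 1323 scheme (window shifted left by the scale factor, at most 14), and the function name is hypothetical; the example reuses the 10 Gbps, 100 ms figures from the throughput discussion in this lecture.

```python
def min_window_scale(bandwidth_bps, rtt_s, max_shift=14):
    """Smallest WSCALE shift whose scaled 16-bit window covers the bandwidth-delay product."""
    bdp_bytes = bandwidth_bps * rtt_s / 8
    for shift in range(max_shift + 1):
        if (65535 << shift) >= bdp_bytes:
            return shift
    raise ValueError("bandwidth-delay product exceeds the largest scalable window")
```

At 10 Gbps and 100 ms RTT the path holds 125,000,000 bytes in flight, so the unscaled maximum of 65,535 bytes is far too small; a shift of 11 (about 134 MB) is the smallest that suffices.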
Study of TCP: Outline
• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP
Congestion Control
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control!
• manifestations:
  - long delays (queueing in router buffers)
  - lost packets (buffer overflow at routers)
• a top-10 problem!
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Hosts A and B send original data at rate λin through a router with unlimited shared output link buffers; λout is the delivered rate]
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Host A sends λin (original data) and λ'in (original data plus retransmitted data) through a router with finite shared output link buffers to Host B; λout is the delivered rate]
Causes/costs of congestion: scenario 2
• Always: λin = λout (goodput)
• a) if no loss: λ'in = λin
• b) assume "clairvoyant" sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes λ'in larger for the same λout (every packet transmitted twice)
"Costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: link carries multiple copies of a packet
[Figure: three plots (a), (b), (c) of λout versus λin, both axes up to R/2; λout levels off at R/2, R/3 and R/4 respectively]
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λin and λ'in increase?
[Figure: Hosts A and B send λin (original data) and λ'in (original data plus retransmitted data) over multihop paths through routers with finite shared output link buffers; λout is the delivered rate]
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted!
• ultimately leads to congestion collapse
[Figure: λout between Host A and Host B as the offered load increases]
Reasons for Congestion Control
• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case
Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at
TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission: LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• CongWin and RTT influence throughput: rate ≈ CongWin / RTT bytes/sec
• CongWin is a dynamic function of perceived network congestion
How does the sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after a loss event
• (assumes congestion is the primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
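The sender's limit combines both windows; a minimal sketch with hypothetical function names, assuming all quantities are in bytes and RTT in seconds:

```python
def sendable_bytes(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """Bytes that may still be sent: LastByteSent - LastByteAcked <= min(CongWin, RcvWindow)."""
    unacked = last_byte_sent - last_byte_acked
    return max(0, min(cong_win, rcv_window) - unacked)


def approx_rate(cong_win, rtt_s):
    """Rough sending rate in bytes/sec: about one CongWin per RTT."""
    return cong_win / rtt_s
```

Whichever window is smaller binds: with 4000 bytes unACKed, a CongWin of 8000 allows 4000 more bytes, but a CongWin of 3000 allows none, regardless of RcvWindow.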
TCP AIMD
• multiplicative decrease: cut CongWin in half after a loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)
[Figure: congestion window (8, 16, 24 Kbytes) versus time for a long-lived TCP connection]
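The AIMD sawtooth can be reproduced per RTT with a few lines. This is an idealized sketch: the window is measured in MSS, loss events are supplied by the caller as a set of RTT indices, and the floor of 1 MSS matches the rule that CongWin never drops below 1 MSS; the function name is hypothetical.

```python
def aimd_trace(initial_cwnd, rounds, loss_rounds):
    """CongWin (in MSS) at the start of each RTT under AIMD."""
    cwnd = initial_cwnd
    trace = []
    for r in range(rounds):
        trace.append(cwnd)
        if r in loss_rounds:
            cwnd = max(1, cwnd / 2)  # multiplicative decrease after a loss event
        else:
            cwnd += 1                # additive increase: +1 MSS per RTT
    return trace
```

For example, starting at 8 MSS with one loss in RTT 3, the trace climbs 8, 9, 10, 11, drops to 5.5 and then climbs again.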
TCP Slow Start
• When connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes & RTT = 200 msec
  - initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event
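The slide's arithmetic, and how quickly doubling catches up, can be checked in Python. The sketch assumes one MSS per RTT at the start and a loss-free doubling every RTT; the 10 Mbps target and the function names are hypothetical.

```python
def initial_rate_bps(mss_bytes, rtt_s):
    """Rate with CongWin = 1 MSS: one segment per RTT."""
    return mss_bytes * 8 / rtt_s


def rtts_to_reach(target_bps, mss_bytes, rtt_s):
    """RTTs of loss-free slow start (doubling each RTT) until the rate reaches target_bps."""
    rate, rtts = initial_rate_bps(mss_bytes, rtt_s), 0
    while rate < target_bps:
        rate *= 2
        rtts += 1
    return rtts
```

With MSS = 500 bytes and RTT = 200 ms, the initial rate is 500 × 8 / 0.2 = 20,000 bit/s = 20 kbps, and reaching a hypothetical 10 Mbps takes 9 RTTs (about 1.8 s).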
TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event
  - double CongWin every RTT
  - done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart]
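Why a per-ACK increment doubles the window every RTT can be seen in a short simulation. It assumes no loss, one ACK per segment (no delayed ACKs), and CongWin counted in MSS; the function name is hypothetical.

```python
def slow_start_rounds(rounds):
    """Segments sent in each RTT when every ACK grows CongWin by 1 MSS."""
    cong_win = 1
    sent_per_rtt = []
    for _ in range(rounds):
        sent_per_rtt.append(cong_win)
        acks_received = cong_win   # every segment of this round is ACKed
        cong_win += acks_received  # +1 MSS per ACK, so the window doubles
    return sent_per_rtt
```

slow_start_rounds(4) gives one, two, four, then eight segments per RTT, matching the figure.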
TCP Tahoe vs. Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ACK event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)
Timeouts vs. 3-dup ACKs
• After 3 dup ACKs:
  - CongWin is cut in half
  - window then grows linearly
• But after timeout event:
  - CongWin instead set to 1 MSS
  - window then grows exponentially
  - to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate the network is capable of delivering some segments
• timeout before 3 dup ACKs received is a stronger, "more alarming" indicator of congestion
Summary: TCP Congestion Control
[Figure: FSM with states slow start, congestion avoidance and fast recovery; Λ marks a transition without an event]
Initially (Λ): cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0; enter slow start
slow start:
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• Λ, cwnd > ssthresh: go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3·MSS; retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; stay in slow start
congestion avoidance:
• new ACK: cwnd = cwnd + MSS·(MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3·MSS; retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
fast recovery:
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
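The three-state FSM can be sketched as a Python class. Assumptions: windows are in bytes with integer arithmetic, transmissions and retransmissions appear only as comments, and the class name and default MSS are hypothetical; transitions follow the FSM as summarized above (including ssthresh = cwnd/2 without the lower bound some RFCs add).

```python
class RenoSender:
    """TCP Reno congestion-control FSM (sketch): slow start, CA, fast recovery."""

    def __init__(self, mss=1460, ssthresh=64 * 1024):
        self.mss = mss
        self.cwnd = mss             # initially: cwnd = 1 MSS
        self.ssthresh = ssthresh    # initially: ssthresh = 64 KB
        self.dup_ack_count = 0
        self.state = "slow start"

    def on_new_ack(self):
        if self.state == "fast recovery":
            self.cwnd = self.ssthresh  # deflate the window
            self.state = "congestion avoidance"
        elif self.state == "slow start":
            self.cwnd += self.mss      # +1 MSS per ACK: doubles every RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion avoidance"
        else:
            self.cwnd += self.mss * self.mss // self.cwnd  # about +1 MSS per RTT
        self.dup_ack_count = 0
        # (transmit new segment(s) as allowed)

    def on_dup_ack(self):
        if self.state == "fast recovery":
            self.cwnd += self.mss      # +1 MSS per additional duplicate ACK
            return
        self.dup_ack_count += 1
        if self.dup_ack_count == 3:
            self.ssthresh = self.cwnd // 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = "fast recovery"
            # (retransmit missing segment)

    def on_timeout(self):
        self.ssthresh = self.cwnd // 2
        self.cwnd = self.mss
        self.dup_ack_count = 0
        self.state = "slow start"
        # (retransmit missing segment)
```

With MSS = 1000 and ssthresh = 4000, four new ACKs take cwnd from 1000 to 5000 and into congestion avoidance; a triple duplicate ACK then halves ssthresh and enters fast recovery, while a timeout always falls back to 1 MSS.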
Summary: TCP Congestion Control
• When CongWin is below Threshold, the sender is in the slow-start phase; the window grows exponentially.
• When CongWin is above Threshold, the sender is in the congestion-avoidance phase; the window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
TCP Sender Congestion Control
(Each row: event and state, TCP sender action, commentary)
• ACK receipt for previously unACKed data, in Slow Start (SS)
  - Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
  - Commentary: results in a doubling of CongWin every RTT
• ACK receipt for previously unACKed data, in Congestion Avoidance (CA)
  - Action: CongWin = CongWin + MSS·(MSS/CongWin)
  - Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT
• Triple duplicate ACK, in SS or CA
  - Action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
  - Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
• Timeout, in SS or CA
  - Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
  - Commentary: enter slow start
• Duplicate ACK, in SS or CA
  - Action: increment duplicate ACK count for the segment being ACKed
  - Commentary: CongWin and Threshold not changed
TCP Throughput: Idealized
• What's the average throughput of TCP as a function of window size and RTT?
  - Long-lived connection; ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75·W/RTT
[Figure: window oscillating between W/2 and W]
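The 0.75 factor is the mean of a window that ramps linearly from W/2 back up to W. A numerical sketch, assuming the idealized sawtooth sampled at evenly spaced points (function names hypothetical):

```python
def average_window(w, steps=1000):
    """Mean window over one AIMD cycle that ramps linearly from W/2 up to W."""
    samples = [w / 2 + (w / 2) * i / steps for i in range(steps + 1)]
    return sum(samples) / len(samples)


def average_throughput(w_bytes, rtt_s):
    """Idealized steady-state throughput in bytes/sec: 0.75 * W / RTT."""
    return average_window(w_bytes) / rtt_s
```

For W = 100 the mean window is 75, and a 100,000-byte peak window with a 100 ms RTT averages 750,000 bytes/sec.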
TCP Throughput & Loss
• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
$$\text{throughput} = \frac{1.22 \cdot \mathrm{MSS}}{\mathrm{RTT}\,\sqrt{L}}$$
• ⇒ L ≈ 2·10^-10. Very low!
• Requires an almost perfect link
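Both numbers in the example follow from the formulas; a worked check in Python, assuming MSS in bytes, RTT in seconds and throughput in bits/sec (function names hypothetical):

```python
def window_for_throughput(throughput_bps, rtt_s, mss_bytes):
    """In-flight segments needed: throughput * RTT / (segment size in bits)."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def loss_rate_for_throughput(throughput_bps, rtt_s, mss_bytes):
    """Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2
```

For 1500-byte segments, 100 ms RTT and 10 Gbps: W = 10^9 / 12,000 ≈ 83,333 segments, and L = (14,640 / 10^9)^2 ≈ 2.14·10^-10.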
Equation-Based Control
• Note: TCP congestion control forms a control loop
  - Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-dup-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info
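An equation-based sender evaluates a throughput formula instead of growing and shrinking a window. This sketch assumes the form of the TCP throughput equation given in RFC 5348, Section 3.1, with its suggested simplifications b = 1 and t_RTO = 4·RTT; the function name is hypothetical, and the RFC should be consulted for the exact definitions of its inputs.

```python
import math


def tfrc_rate(s, rtt, p, b=1):
    """Allowed sending rate in bytes/sec from an RFC 5348-style throughput equation.

    s: segment size (bytes); rtt: round-trip time (s);
    p: loss event rate (0 < p <= 1); b: packets acknowledged per ACK.
    """
    t_rto = 4 * rtt  # simplification: retransmission timeout taken as 4 * RTT
    denom = (rtt * math.sqrt(2 * b * p / 3)
             + t_rto * (3 * math.sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2))
    return s / denom
```

For small loss rates the second term vanishes and the rate approaches s / (RTT·√(2p/3)), the same 1/√L shape as the 1.22·MSS/(RTT·√L) formula above.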
TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connections 1 and 2 share a bottleneck router of capacity R]
Why is TCP fair?
Consider two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput versus Connection 1 throughput, both axes up to R, with the equal-bandwidth-share line; alternating congestion avoidance (additive increase) and loss (decrease window by factor of 2) move the operating point toward the equal share]
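The convergence argument can be simulated with two flows. Assumptions: both flows see every loss at the same moment (synchronized losses), rates are in arbitrary units per step, and the function name is hypothetical.

```python
def aimd_two_flows(x1, x2, capacity, steps):
    """Two flows share a bottleneck: both add 1 per step, both halve when overloaded."""
    for _ in range(steps):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2  # multiplicative decrease: gap halves
        else:
            x1, x2 = x1 + 1, x2 + 1  # additive increase: gap unchanged
    return x1, x2
```

Additive increase keeps the difference between the flows constant while each halving cuts it in half, so starting from rates 10 and 70 on a link of capacity 100 the two rates become essentially equal after a few dozen loss cycles.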
Fairness (more)
Fairness and UDP:
• Multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
• Instead use UDP:
  - pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections:
• nothing prevents an app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
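The R/10 and R/2 figures follow from per-connection fairness; a sketch assuming every connection on the link gets an equal share of R (function name hypothetical):

```python
from fractions import Fraction


def share_of_bottleneck(existing_conns, new_conns):
    """Fraction of rate R an app gets if all connections split R equally."""
    return Fraction(new_conns, existing_conns + new_conns)
```

With 9 existing connections, one new connection gets 1/10 of R, while nine parallel connections get 9/18 = 1/2 of R.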
Example RTT estimation:
[Figure: SampleRTT and EstimatedRTT (RTT in milliseconds, 100 to 350) versus time (seconds) for the path gaia.cs.umass.edu to fantasia.eurecom.fr]
TCP Round Trip Time and Timeout
Setting the timeout:
• EstimatedRTT plus "safety margin"
  - large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:
$$\mathrm{EstimatedRTT} = (1-\alpha)\,\mathrm{EstimatedRTT} + \alpha\,\mathrm{SampleRTT}$$
$$\mathrm{DevRTT} = (1-\beta)\,\mathrm{DevRTT} + \beta\,\lvert \mathrm{SampleRTT} - \mathrm{EstimatedRTT} \rvert$$
(typically α = 0.125, β = 0.25)
• then set the timeout interval:
$$\mathrm{TimeoutInterval} = \mathrm{EstimatedRTT} + 4\,\mathrm{DevRTT}$$
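One measurement step of these formulas in Python. The slide does not fix the update order; this sketch assumes the order RFC 6298 prescribes (DevRTT first, using the previous EstimatedRTT), and the function name is hypothetical.

```python
ALPHA = 0.125  # typical weight for EstimatedRTT
BETA = 0.25    # typical weight for DevRTT


def update_rtt(estimated_rtt, dev_rtt, sample_rtt, alpha=ALPHA, beta=BETA):
    """One EWMA step; returns (EstimatedRTT, DevRTT, TimeoutInterval)."""
    # DevRTT uses the EstimatedRTT from before this sample.
    dev_rtt = (1 - beta) * dev_rtt + beta * abs(sample_rtt - estimated_rtt)
    estimated_rtt = (1 - alpha) * estimated_rtt + alpha * sample_rtt
    timeout_interval = estimated_rtt + 4 * dev_rtt
    return estimated_rtt, dev_rtt, timeout_interval
```

For example, with EstimatedRTT = 100 ms, DevRTT = 10 ms and a 120 ms sample, DevRTT becomes 12.5 ms, EstimatedRTT 102.5 ms, and the timeout 152.5 ms.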
RTT Measurement & Retransmissions
• How should you handle ACKs for retransmitted segments when measuring RTT?
  - Note: ACK could be delayed for the original segment, or early for the retransmitted segment
• Choices:
  - Associate with original packet: may overestimate true RTT
  - Associate with retransmitted packet: may underestimate true RTT
Karnrsquos Algorithmbull Idea donrsquot consider samples from retransmitted
segmentsndash Ok but what if current timeout value is too small for network
delayndash In this case would keep timing out but couldnrsquot adjust timeout
according to formulabull Solution If measurement canrsquot be made use exponential
backoff until new measurement can be madendash By factor of 2 with limit
bull Resume normal sampling algorithm afterwardsndash Usual sampling period is once per RTT
bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet
CS 4284 Spring 2013
Fast Retransmitbull Time-out period often relatively long
ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs
ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs
bull If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lostndash fast retransmit resend segment before timer expires
CS 4284 Spring 2013
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
TCP
Flow Control
CS 4284 Spring 2013
TCP Flow Controlbull receive side of
TCP connection has a receive buffer
bull speed-matching service matching the send rate to the receiving apprsquos drain rate
bull app process may be slow at reading from buffer
sender wonrsquot overflow
receiverrsquos buffer bytransmitting too
much too fast
flow control
CS 4284 Spring 2013
TCP Flow Control (contrsquod)
(Suppose TCP receiver discards out-of-order segments)
bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Rcvr advertises spare room by including value of RcvWindow in segments
bull Sender limits unACKed data to RcvWindowndash guarantees
receive buffer doesnrsquot overflow
CS 4284 Spring 2013
Flow Control Persistence Timerbull Suppose sender is blocked cause receiver
application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size
RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck
ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update
CS 4284 Spring 2013
Flow Control Silly Window Syndrome
Clarkrsquos solution donrsquot advertise windows below a certain size
CS 4284 Spring 2013
Study of TCP Outlinebull segment structurebull reliable data transfer
ndash delayed ACKsndash Naglersquos algorithm
bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control
TCP
Connection Management
CS 4284 Spring 2013
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control info
(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr
hellip)
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
CS 4284 Spring 2013
TCP 3-way handshake
TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums
CS 4284 Spring 2013
3-way Handshake amp Delayed Dups
(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)
CS 4284 Spring 2013
Sequence Number Reuse
bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a
connection with identical sequence numbersbull Must also guard against sequence number prediction attack
ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())
CS 4284 Spring 2013
When Sequence Numbers Attack
bull Suppose attacker A can predict sequence number a host B is going to use next
bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions
based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is
likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B
CS 4284 Spring 2013
When SYNs Attackbull Servers receiving SYN must allocate resources
ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
bull Solutionndash SYN cookies
bull Server creates ACK number sends ACK ndash but does not allocate buffers
ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct
Sequence Number Summarybull Goals Set 1
ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared
ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs
bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can
eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial
sequence number
CS 4284 Spring 2013
CS 4284 Spring 2013
TCP Connection Management (cont)Closing a connection
client closes socket close(s)
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
timed
wai
t
CS 4284 Spring 2013
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
timed
wai
tclosed
CS 4284 Spring 2013
TCP ConnectionFSM
The heavy solid line is the normal path for a client
The heavy dashed line is the normal path for a server
The light lines are unusual events
Each transition is labeled by the event causing it and the action resulting from it separated by a slash
CS 4284 Spring 2013
TCP Connection Management (contrsquod)
TCP client lifecycle TCP server lifecycle
CS 4284 Spring 2013
Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if
packets (FIN ACK) can be lostndash No Famous two-army problem
CS 4284 Spring 2013
Summarybull TCP segments acknowledgements amp
retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit
bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management
schemendash SYN attackndash Sequence number prediction attacks
CS 4284 Spring 2013
TCP Miscellaneousbull MSS Maximum Segment Size Option
ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN
bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to
allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT
measurement PAWS for protection against wrap-around for sequence numbers
bull hellip
CS 4284 Spring 2013
Study of TCP Outlinebull segment structurebull reliable data transfer
ndash delayed ACKsndash Naglersquos algorithm
bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control
TCP
Congestion Control
CS 4284 Spring 2013
Principles of Congestion Control
Congestionbull informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull different from flow controlbull manifestations
ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)
bull a top-10 problem
CS 4284 Spring 2013
Causescosts of congestion scenario 1 bull two senders two
receiversbull one router infinite
buffers bull no retransmission
bull large delays when congested
bull (but no reduction in throughput here)
unlimited shared output link buffers
Host Alin original data
Host B
lout
CS 4284 Spring 2013
Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet
after timeout
finite shared output link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2lin
l out
a c
R2
R2lin
l out
R4
R2
R2lin
l out
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit
Q what happens as lin and lrsquoin increase
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission
capacity used for that packet was wastedbull ultimately leads to congestion collapse
Host A
Host B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even if no packet loss occurs
bull If packet loss occurs needed retransmission require offered load to be greater than goodput
bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion control
bull no explicit feedback from network
bull congestion inferred from end-system observed loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systemsndash single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Controlbull end-end control (no network
assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput
bull CongWin is dynamic function of perceived network congestion
How does sender notice congestion
bull loss event = timeout or 3 duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is primary cause of loss)
Three mechanismsbull AIMDbull slow startbull fast recovery
rate = CongWin
RTT Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
multiplicative decrease cut CongWin in half after loss event
additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Startbull When connection
begins CongWin = 1 MSSndash Example MSS =
500 bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly
ramp up to respectable rate
bull When connection begins increase rate exponentially fast until first loss event
CS 4284 Spring 2013
TCP Slow Start (more)bull When connection begins
increase rate exponentially until first loss eventndash double CongWin every
RTTndash done by incrementing CongWin for every ACK received
bull Summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
CS 4284 Spring 2013
TCP Tahoe vs RenoQ When should the
exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss
eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly
(this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKsbull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout eventndash CongWin instead set
to 1 MSS ndash window then grows
exponentiallyndash to a threshold then
grows linearly
bull 3 dup ACKs indicates network capable of delivering some segments
bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unACKed data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Results in a doubling of CongWin every RTT
ACK receipt for previously unACKed data | Congestion Avoidance (CA) | CongWin = CongWin + MSS·(MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for the segment being ACKed | CongWin and Threshold not changed
TCP Throughput (Idealized)
• What's the average throughput of TCP as a function of window size and RTT?
  - Long-lived connection; ignore slow start
• When the window is W, throughput is W/RTT
• Just after a loss, the window drops to W/2 and throughput drops to W/(2·RTT)
• Average steady-state throughput: 0.75·W/RTT
[Figure: sawtooth of the window oscillating between W/2 and W]
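The 0.75 factor comes from averaging the sawtooth. The window climbs linearly from W/2 to W, so its mean is 3W/4. The sketch below checks this numerically. It assumes an even W measured in segments and that exactly one window's worth of segments is sent per RTT. The function name is hypothetical.

```python
def average_sawtooth_throughput(W, rtt):
    """Average throughput (segments/sec) of the idealized sawtooth: the window
    grows by one segment per RTT from W/2 to W, then a loss halves it.
    W is in segments (assumed even); rtt is in seconds."""
    windows = range(W // 2, W + 1)               # window in each RTT of one cycle
    avg_window = sum(windows) / len(windows)     # mean of W/2 .. W is 3W/4
    return avg_window / rtt

# about 7500 segments/sec, i.e. 0.75 * W / RTT for W = 1000, RTT = 100 ms
print(average_sawtooth_throughput(1000, 0.1))
```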
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
$\text{throughput} = \dfrac{1.22 \cdot \text{MSS}}{\text{RTT}\,\sqrt{L}}$
• L = 2·10^-10: very low; requires an almost perfect link
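The two numbers in the example can be reproduced as follows. W = rate·RTT/MSS, and the throughput formula is solved for L. The function names are illustrative. Sizes are converted from bytes to bits.

```python
def window_for_rate(rate_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain rate_bps: W = rate * RTT / MSS."""
    return rate_bps * rtt_s / (mss_bytes * 8)

def loss_rate_for_rate(rate_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2

W = window_for_rate(10e9, 0.100, 1500)      # about 83,333 segments
L = loss_rate_for_rate(10e9, 0.100, 1500)   # about 2.1e-10
print(round(W), L)
```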
Equation-Based Control
• Note: TCP congestion control forms a control loop
  - Inputs: round-trip time and "loss events" (which are samples of timeout events + 3-dup-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on these inputs
• See RFC 5348 for more info
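As an illustration of "an equation to compute the sending rate", here is a transcription of the TCP throughput equation used by TFRC in RFC 5348, Section 3.1. It is written from memory, so check it against the RFC before relying on it. The sketch assumes b = 1 and t_RTO = 4·R, which are the simplifications the RFC suggests. For small loss rates it approaches the 1.22·MSS/(RTT·√L) rule from the throughput slide.

```python
import math

def tfrc_rate(s, R, p, b=1, t_rto=None):
    """Allowed sending rate in bytes/sec from the TFRC throughput equation.
    s: segment size (bytes), R: round-trip time (s), p: loss event rate (0 < p <= 1),
    b: packets acknowledged per ACK (assumed 1), t_rto: timeout (assumed 4 * R)."""
    if t_rto is None:
        t_rto = 4 * R
    denom = (R * math.sqrt(2 * b * p / 3)
             + t_rto * (3 * math.sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2))
    return s / denom

# Higher loss event rate -> lower allowed rate
print(tfrc_rate(1500, 0.1, 0.01), tfrc_rate(1500, 0.1, 0.02))
```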
TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connections 1 and 2 sharing a bottleneck router of capacity R]
Why is TCP fair?
Consider two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: Connection 1 throughput vs. Connection 2 throughput, both axes up to R; repeated rounds of congestion avoidance (additive increase) and loss (decrease window by factor of 2) move the operating point toward the equal bandwidth share line]
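The geometric argument can be checked with a toy simulation. Additive increase leaves the difference between the two rates unchanged, and each halving cuts that difference in half, so the rates converge. This sketch makes a strong simplifying assumption: both flows see every loss at the same time, which happens whenever their combined rate exceeds R. The function name and parameters are hypothetical.

```python
def aimd_two_flows(x1, x2, R, rounds=200, alpha=1.0):
    """Two AIMD flows sharing a bottleneck of capacity R.
    Each round both rates grow by alpha; if the total exceeds R,
    both experience a loss and halve (synchronized losses assumed)."""
    for _ in range(rounds):
        x1, x2 = x1 + alpha, x2 + alpha
        if x1 + x2 > R:
            x1, x2 = x1 / 2, x2 / 2
    return x1, x2

# Starting far apart (10 vs. 80), the flows end up nearly equal.
print(aimd_two_flows(10.0, 80.0, 100.0))
```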
Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
• Instead use UDP:
  - pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections
• nothing prevents an app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
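The R/10 and R/2 figures follow from assuming that every connection on the bottleneck gets an equal share. A small helper with a hypothetical name makes the arithmetic explicit.

```python
def new_app_share(R, existing, new_conns):
    """Rate obtained by an app that opens new_conns connections on a link of
    rate R already carrying `existing` connections, assuming equal per-connection shares."""
    return R * new_conns / (existing + new_conns)

print(new_app_share(1.0, 9, 1))   # R/10
print(new_app_share(1.0, 9, 9))   # R/2
```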
TCP Round Trip Time and Timeout
Setting the timeout:
• EstimatedRTT plus a "safety margin"
  - large variation in EstimatedRTT: larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:
$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
$\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot\left|\text{SampleRTT} - \text{EstimatedRTT}\right|$
(typically $\alpha = 0.125$, $\beta = 0.25$)
$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
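A minimal estimator applying these formulas is shown below. Two details are assumptions borrowed from RFC 6298 rather than taken from the slide. First, DevRTT starts at half the first sample. Second, DevRTT is updated with the old EstimatedRTT before EstimatedRTT itself is updated. Time units are whatever the samples use (milliseconds in the example).

```python
class RttEstimator:
    """EWMA estimates of RTT and its deviation, plus TimeoutInterval."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha, self.beta = alpha, beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2          # RFC 6298-style initialization

    def update(self, sample_rtt):
        # DevRTT uses the EstimatedRTT from before this sample
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    @property
    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt

est = RttEstimator(100.0)   # first SampleRTT = 100 ms
est.update(100.0)
est.update(200.0)           # a sudden jump widens the safety margin
print(est.estimated_rtt, est.dev_rtt, est.timeout_interval)
```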
RTT Measurement & Retransmissions
• How should you handle ACKs for retransmitted segments when measuring RTT?
  - Note: the ACK could be delayed for the original segment, or early for the retransmitted segment
• Choices:
  - Associate with original packet: may overestimate true RTT
  - Associate with retransmitted packet: may underestimate true RTT
Karn's Algorithm
• Idea: don't consider samples from retransmitted segments
  - OK, but what if the current timeout value is too small for the network delay?
  - In this case, the sender would keep timing out but couldn't adjust the timeout according to the formula
• Solution: if a measurement can't be made, use exponential backoff until a new measurement can be made
  - By a factor of 2, with a limit
• Resume the normal sampling algorithm afterwards
  - Usual sampling period is once per RTT
• See also RFC 2988
  - The TCP timestamp option can be used to remove the ambiguity about which ACK matches which packet
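Karn's rule and the backoff can be sketched together. Several choices here are assumptions for brevity and are not part of the slide. The 60-second cap and the initial 1-second timeout are example values. The timeout is recomputed as 2·SRTT, an older RFC 793-style rule, instead of the DevRTT formula.

```python
class KarnTimer:
    """Ignore RTT samples from retransmitted segments; on each timeout,
    double the timeout (up to a cap) until a clean sample arrives."""

    def __init__(self, initial_timeout=1.0, alpha=0.125, max_timeout=60.0):
        self.srtt = None
        self.alpha = alpha
        self.timeout = initial_timeout
        self.max_timeout = max_timeout

    def on_timeout(self):
        # exponential backoff by a factor of 2, with a limit
        self.timeout = min(2 * self.timeout, self.max_timeout)

    def on_ack(self, sample_rtt, was_retransmitted):
        if was_retransmitted:
            return  # ambiguous sample: keep the backed-off timeout
        if self.srtt is None:
            self.srtt = sample_rtt
        else:
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * sample_rtt
        self.timeout = 2 * self.srtt  # resume normal, sample-based timeout
```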
Fast Retransmit
• Time-out period is often relatively long:
  - long delay before resending a lost packet
• Detect lost segments via duplicate ACKs
  - Sender often sends many segments back-to-back
  - If a segment is lost, there will likely be many duplicate ACKs
• If the sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost
  - fast retransmit: resend the segment before the timer expires
Fast retransmit algorithm:

    event: ACK received, with ACK field value of y
        if (y > SendBase) {
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
                start timer
        }
        else {  /* a duplicate ACK for an already ACKed segment */
            increment count of dup ACKs received for y
            if (count of dup ACKs received for y == 3) {
                /* fast retransmit */
                resend segment with sequence number y
            }
        }
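A runnable rendering of this ACK handler follows. It is a sketch under simplifying assumptions. The timer is reduced to a boolean, "resend" just records the sequence number, and `next_seq` is a hypothetical field marking the next byte the sender will send. The duplicate count is reset when a new ACK advances SendBase, because the count is kept per y.

```python
class FastRetransmitSender:
    """ACK handling with fast retransmit on the third duplicate ACK."""

    def __init__(self, send_base=0):
        self.send_base = send_base
        self.next_seq = send_base      # hypothetical: next byte to be sent
        self.dup_acks = 0              # duplicate ACKs for current send_base
        self.timer_running = False
        self.resent = []               # sequence numbers fast-retransmitted

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y
            self.dup_acks = 0
            # (re)start timer only if unACKed data remains
            self.timer_running = self.send_base < self.next_seq
        else:
            # duplicate ACK for already ACKed data
            self.dup_acks += 1
            if self.dup_acks == 3:
                self.resent.append(y)  # resend segment with sequence number y
```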
TCP Flow Control
• the receive side of a TCP connection has a receive buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
• the app process may be slow at reading from the buffer
flow control: the sender won't overflow the receiver's buffer by transmitting too much, too fast
TCP Flow Control (cont'd)
(Suppose the TCP receiver discards out-of-order segments)
• spare room in buffer:
  RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Receiver advertises spare room by including the value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees the receive buffer doesn't overflow
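The bookkeeping on both sides fits into two small functions. They use the slide's variable names. The function names are hypothetical, and byte counters are assumed not to wrap.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver's spare room: RcvBuffer - [LastByteRcvd - LastByteRead]."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def sender_may_send(last_byte_sent, last_byte_acked, advertised_window):
    """Bytes the sender may still send while keeping unACKed data <= RcvWindow."""
    return max(0, advertised_window - (last_byte_sent - last_byte_acked))

win = rcv_window(4096, 3000, 1000)        # 2096 bytes free
print(win, sender_may_send(5000, 4000, win))
```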
Flow Control: Persistence Timer
• Suppose the sender is blocked because the receiver application hasn't picked up data
• Then the receiver app reads n bytes
• The TCP receiver advertises a new window of size RcvWindow = n
• But suppose this advertisement is lost: the sender would be stuck
  - Solution: persistence timer; the sender sends a probe after a period of inactivity to provoke a window update
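Flow Control: Silly Window Syndrome
(An application that reads only a few bytes at a time makes the receiver advertise tiny windows, and the sender then transmits correspondingly tiny segments.)
Clark's solution: don't advertise windows below a certain size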
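One concrete threshold for "a certain size" is sketched below, modeled on the receiver-side rule in RFC 1122, Section 4.2.3.3, as I recall it. The receiver opens the window only when it can grow by at least min(Fr·RcvBuffer, MSS), with Fr = 1/2 recommended. Treat the exact threshold as an assumption. The parameter `rcv_user` (bytes buffered but not yet read by the application) is a name chosen for this sketch.

```python
def window_to_advertise(rcv_buff, rcv_user, current_wnd, mss, fr=0.5):
    """Receiver-side silly window avoidance: keep advertising the old
    (possibly zero) window until the increase is worth announcing."""
    free = rcv_buff - rcv_user                  # space the app has freed up
    if free - current_wnd >= min(fr * rcv_buff, mss):
        return free                             # worthwhile: open to all free space
    return current_wnd                          # too small: don't advertise it yet

print(window_to_advertise(8000, 7900, 0, 1000))  # app read only 100 bytes: stays 0
print(window_to_advertise(8000, 6000, 0, 1000))  # 2000 bytes free: advertise it
```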
Study of TCP Outline
• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP Connection Management
Recall: TCP sender and receiver establish a "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  connect(s, &dstaddr, …);
• server: contacted by client
  cl = accept(sv, &caddr, …);
Three-way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
TCP 3-way handshake
[Figure: TCP connection establishment]
• Q1: why a 3-way and not a 2-way handshake?
• Q2: how do sender & receiver determine initial seq #s?
3-way Handshake & Delayed Duplicates
[Figure: (a) Normal operation; (b) Old SYN appearing out of nowhere; (c) Duplicate SYN and duplicate ACK following SYN]
The 3-way handshake is required to deal with scenarios (b) and (c)
Sequence Number Reuse
• Idea: tie initial TCP seq numbers to a clock
  - Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attacks
  - Use a PRNG; see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock value + F(src, dst, sport, dport, random())
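A sketch of the RFC 1948 scheme follows. A clock component M ticks every 4 µs, and a keyed hash F of the connection 4-tuple adds an offset an off-path attacker cannot predict. The per-boot secret plays the role of random() on the slide. RFC 1948 suggested MD5 for F; this sketch swaps in SHA-256 from Python's `hashlib`. `SECRET` and `initial_seq_number` are hypothetical names.

```python
import hashlib
import os
import time

SECRET = os.urandom(16)  # hypothetical per-boot secret key

def initial_seq_number(src, dst, sport, dport, now_us=None):
    """ISN = M + F(src, dst, sport, dport, secret), modulo 2**32.
    M ticks once every 4 microseconds; F is a keyed hash of the 4-tuple."""
    if now_us is None:
        now_us = time.monotonic_ns() // 1000
    m = now_us // 4
    data = f"{src}|{dst}|{sport}|{dport}".encode() + SECRET
    f = int.from_bytes(hashlib.sha256(data).digest()[:4], "big")
    return (m + f) % 2**32

print(initial_seq_number("10.0.0.1", "10.0.0.2", 40000, 80))
```

Within one connection 4-tuple the ISN still advances with the clock, which guards against old duplicates. Different 4-tuples get unrelated offsets, so observing one connection's ISN doesn't reveal another's.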
When Sequence Numbers Attack
• Suppose attacker A can predict the sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  - B believes it is talking to C, and might grant permissions based on C's IP address
  - The attacker on A must suppress the RST packets C is likely to send; use a denial-of-service attack for that
• A sends a message to compromise B
When SYNs Attack
• Servers receiving a SYN must allocate resources
  - Opens up the possibility of a denial-of-service attack, where the server is flooded with bogus SYN packets with forged IP source addresses
• Solution: SYN cookies
  - Server computes its initial sequence number (the "cookie") from the connection's addresses and ports, a secret, and the time, and sends the SYNACK, but does not allocate buffers
  - If the client continues with an ACK, check whether that ACK could acknowledge a SYNACK the server actually sent; if correct, then allocate buffers
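The "compute but don't store" idea can be illustrated with Python's `hmac`. This is a simplified sketch, not the encoding any real kernel uses. Linux, for example, also packs an MSS index and a time counter into the 32 bits, and this sketch does neither. The coarse time slot `t`, the one-slot grace period, and all names are assumptions.

```python
import hashlib
import hmac
import os

SECRET = os.urandom(16)  # hypothetical server secret

def syn_cookie(src, dst, sport, dport, client_isn, t):
    """Server ISN derived from the connection and time slot t; nothing is stored."""
    msg = f"{src}|{dst}|{sport}|{dport}|{client_isn}|{t}".encode()
    return int.from_bytes(hmac.new(SECRET, msg, hashlib.sha256).digest()[:4], "big")

def ack_is_valid(src, dst, sport, dport, client_isn, ack_num, t, max_age=1):
    """Accept the handshake ACK only if ack_num - 1 is a cookie from a recent slot.
    client_isn is recovered from the ACK's own sequence number minus 1."""
    for slot in range(t - max_age, t + 1):
        cookie = syn_cookie(src, dst, sport, dport, client_isn, slot)
        if (cookie + 1) % 2**32 == ack_num:
            return True
    return False
```

Only when `ack_is_valid` succeeds does the server allocate the connection's buffers. A flood of spoofed SYNs therefore costs a hash computation each, not memory.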
Sequence Number Summary
• Goals, Set 1:
  - Guard against old duplicates in one connection → don't reuse sequence numbers until it's reasonably certain that old duplicates have disappeared
  - Even after a crash/restart → hence tie to time → but don't use a global clock → hence the need for a 3-way handshake, where each side verifies the sequence number chosen by the other side before a successful connection occurs
• Goals, Set 2:
  - Don't allow hijacking of connections unless the attacker can eavesdrop → use a PRNG for the initial seq number choice
  - Don't allow SYN attacks → compute, but don't store, the initial sequence number
TCP Connection Management (cont.)
Closing a connection:
client closes socket: close(s);
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK; closes connection, sends FIN
[Figure: client/server timeline: client close sends FIN; server replies ACK; server close sends FIN; client replies ACK, enters timed wait, then closed]
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK
  - Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK; connection closed
Note: with a small modification, this can handle simultaneous FINs
[Figure: client/server timeline: FIN/ACK exchange; both sides closing; client in timed wait; both sides closed]
TCP Connection FSM
[Figure: TCP connection state machine]
The heavy solid line is the normal path for a client.
The heavy dashed line is the normal path for a server.
The light lines are unusual events.
Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.
TCP Connection Management (cont'd)
[Figure: TCP client lifecycle; TCP server lifecycle]
Closing a Connection
• Note: the previous charts showed the normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  - No: the famous two-army problem
Summary
• TCP segments, acknowledgements & retransmission
  - Delayed ACKs, Nagle's algorithm
  - Fast retransmit
• RTT estimation & Karn's algorithm
• Flow control & silly window syndrome
• Connection management in TCP
• Attacks against TCP's connection management scheme
  - SYN attack
  - Sequence number prediction attacks
TCP Miscellaneous
• MSS: Maximum Segment Size option
  - Client/server agree on a larger than default (536 bytes outside the same subnet) MSS via an option on the SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for the receive window, to allow for LFNs ("elephants"): Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• …
Study of TCP Outline
• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP Congestion Control
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control
• manifestations:
  - long delays (queueing in router buffers)
  - lost packets (buffer overflow at routers)
• a top-10 problem
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A and Host B send original data at rate λ_in through a router with unlimited shared output link buffers; delivered rate λ_out]
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Host A sends original data at λ_in, and original plus retransmitted data at λ'_in, through a router with finite shared output link buffers; Host B; delivered rate λ_out]
Causes/costs of congestion: scenario 2
• always: λ_in = λ_out (goodput)
• a) if no loss: λ'_in = λ_in
• b) assume a clairvoyant sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes λ'_in larger for the same λ_out (every packet transmitted twice)
"costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: link carries multiple copies of a packet
[Figure: three plots of λ_out vs. λ'_in, x-axis up to R/2; λ_out levels off at R/2 in (a), R/3 in (b), and R/4 in (c)]
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
[Figure: Host A (λ_in original data; λ'_in original plus retransmitted data) and Host B on multihop paths through routers with finite shared output link buffers; delivered rate λ_out]
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse
[Figure: delivered rate λ_out between Host A and Host B]
Reasons for Congestion Control
• Congested networks increase delay, even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case
Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at
TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission:
  LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• Roughly, rate = CongWin/RTT bytes/sec
• CongWin and RTT influence throughput
• CongWin is a dynamic function of perceived network congestion
How does the sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after a loss event
• (assumes congestion is the primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
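The sender-side limit combines flow control and congestion control: the window actually used is the smaller of CongWin and RcvWindow. The helpers below have hypothetical names and express that limit and the rough rate estimate. Units are bytes and seconds.

```python
def send_allowance(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """Bytes the sender may still send so that
    LastByteSent - LastByteAcked <= min(CongWin, RcvWindow)."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cong_win, rcv_window) - in_flight)

def approx_rate(cong_win, rtt):
    """Rough sending rate in bytes/sec: CongWin / RTT."""
    return cong_win / rtt

print(send_allowance(10000, 4000, 8000, 20000))  # congestion window is the limit
print(send_allowance(10000, 4000, 8000, 5000))   # receiver window is the limit
print(approx_rate(8000, 0.1))                    # about 80,000 bytes/sec
```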
TCP AIMD
• multiplicative decrease: cut CongWin in half after a loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)
[Figure: congestion window (8, 16, 24 Kbytes) vs. time for a long-lived TCP connection, showing the AIMD sawtooth]
TCP Slow Start
• When a connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes & RTT = 200 msec
  - initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to a respectable rate
• When a connection begins, increase the rate exponentially fast until the first loss event
TCP Slow Start (more)
• When a connection begins, increase the rate exponentially until the first loss event:
  - double CongWin every RTT
  - done by incrementing CongWin for every ACK received
• Summary: the initial rate is slow but ramps up exponentially fast
[Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart]
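The sketch below shows why adding one MSS per ACK doubles CongWin every RTT. It assumes no losses, one ACK per segment, and that every segment of a window is ACKed within that RTT. It also reproduces the 20 kbps initial rate from the previous slide's example. The function name is hypothetical.

```python
def slow_start_windows(rtts, mss=500):
    """CongWin (bytes) at the start of each RTT when each ACK adds one MSS."""
    cwnd = mss
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        acks = cwnd // mss      # one ACK per segment sent this RTT
        cwnd += acks * mss      # +1 MSS per ACK, so CongWin doubles
    return history

initial_rate_bps = 500 * 8 / 0.200   # MSS/RTT = 20,000 bit/s = 20 kbps
print(slow_start_windows(4), initial_rate_bps)
```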
TCP Tahoe vs. Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At a loss event, Threshold is set to 1/2 of CongWin just before the loss event
• If timeout event: set CongWin to 1 MSS
• If triple-duplicate-ACK event: set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)
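The difference between the two variants is how they react to a loss event. The sketch below follows the rules on this slide. The Tahoe branch relies on a fact not stated on the slide: Tahoe resets CongWin to 1 MSS after either kind of loss event, because it lacks fast recovery. The function name and event strings are hypothetical.

```python
def on_loss(cong_win, mss, event, variant="reno"):
    """Return (new CongWin, new Threshold) after a loss event.
    event: "timeout" or "triple_dup_ack"; variant: "reno" or "tahoe"."""
    if event not in ("timeout", "triple_dup_ack"):
        raise ValueError(f"unknown event {event!r}")
    threshold = cong_win / 2                 # Threshold = CongWin/2 in all cases
    if event == "timeout" or variant == "tahoe":
        return mss, threshold                # back to 1 MSS, then slow start
    return threshold, threshold              # Reno fast recovery: linear growth from here

print(on_loss(16000, 1000, "triple_dup_ack"))           # Reno
print(on_loss(16000, 1000, "triple_dup_ack", "tahoe"))  # Tahoe
```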
Timeouts vs. 3 dup ACKs
• After 3 dup ACKs:
  - CongWin is cut in half
  - window then grows linearly
• But after a timeout event:
  - CongWin is instead set to 1 MSS
  - window then grows exponentially
  - to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate the network is capable of delivering some segments
• a timeout before 3 dup ACKs are received is a stronger, "more alarming" indicator of congestion
CS 4284 Spring 2013
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
CS 4284 Spring 2013
TCP Sender Congestion ControlEvent State TCP Sender Action Commentary
ACK receipt for previously unacked data
Slow Start (SS)
CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA
Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data
CongestionAvoidance (CA)
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK
SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter slow start
Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
CS 4284 Spring 2013
TCP Throughput Idealized
bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start
bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput
to W2RTT bull Average steady-state throughput 75 WRTT
W
W2
CS 4284 Spring 2013
TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT
want 10 Gbps throughputbull Requires window size W = 83333 in-flight
segmentsbull Throughput in terms of loss rate
bull L = 210-10 Very lowbull Require almost perfect link
LRTTMSS221
Equation-Based Control
bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which
are samples of timeout events + 3-ack events)bull Instead equation-based control uses an
equation to compute sending rate based on this input
bull See RFC 5348 for more info
CS 4284 Spring 2013
TCP Fairness
CS 4284 Spring 2013
Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk
TCP connection 1
bottleneckrouter
capacity RTCP connection 2
TCP Fairness
CS 4284 Spring 2013
Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughputConn
e ctio
n 2
thro
u ghp
ut
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
CS 4284 Spring 2013
Fairness (more)Fairness and UDPbull Multimedia apps often
do not use TCPndash do not want rate throttled
by congestion controlbull Instead use UDP
ndash pump audiovideo at constant rate tolerate packet loss
bull TCP friendliness
Fairness and parallel TCP connections
bull nothing prevents app from opening parallel connections between 2 hosts
bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets
rate R10ndash new app asks for 9 TCPs gets
R2bull Thatrsquos what ldquodownload
acceleratorsrdquo do
CS 4284 Spring 2013
RTT Measurement amp Retransmissions
bull How should you handle acks for retransmitted segments when measuring RTTndash Note ACK could be delayed for original
segment or early for retransmitted segmentbull Choices
ndash Associate with original packet may overestimate true RTT
ndash Associate with retransmitted packet may underestimate true RTT
CS 4284 Spring 2013
Karnrsquos Algorithmbull Idea donrsquot consider samples from retransmitted
segmentsndash Ok but what if current timeout value is too small for network
delayndash In this case would keep timing out but couldnrsquot adjust timeout
according to formulabull Solution If measurement canrsquot be made use exponential
backoff until new measurement can be madendash By factor of 2 with limit
bull Resume normal sampling algorithm afterwardsndash Usual sampling period is once per RTT
bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet
CS 4284 Spring 2013
Fast Retransmitbull Time-out period often relatively long
ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs
ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs
bull If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lostndash fast retransmit resend segment before timer expires
CS 4284 Spring 2013
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
TCP
Flow Control
CS 4284 Spring 2013
TCP Flow Controlbull receive side of
TCP connection has a receive buffer
bull speed-matching service matching the send rate to the receiving apprsquos drain rate
bull app process may be slow at reading from buffer
sender wonrsquot overflow
receiverrsquos buffer bytransmitting too
much too fast
flow control
CS 4284 Spring 2013
TCP Flow Control (contrsquod)
(Suppose TCP receiver discards out-of-order segments)
bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Rcvr advertises spare room by including value of RcvWindow in segments
bull Sender limits unACKed data to RcvWindowndash guarantees
receive buffer doesnrsquot overflow
CS 4284 Spring 2013
Flow Control Persistence Timerbull Suppose sender is blocked cause receiver
application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size
RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck
ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update
CS 4284 Spring 2013
Flow Control Silly Window Syndrome
Clarkrsquos solution donrsquot advertise windows below a certain size
CS 4284 Spring 2013
Study of TCP Outlinebull segment structurebull reliable data transfer
ndash delayed ACKsndash Naglersquos algorithm
bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control
TCP
Connection Management
CS 4284 Spring 2013
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control info
(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr
hellip)
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
CS 4284 Spring 2013
TCP 3-way handshake
TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums
CS 4284 Spring 2013
3-way Handshake amp Delayed Dups
(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)
CS 4284 Spring 2013
Sequence Number Reuse
bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a
connection with identical sequence numbersbull Must also guard against sequence number prediction attack
ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())
CS 4284 Spring 2013
When Sequence Numbers Attack
bull Suppose attacker A can predict sequence number a host B is going to use next
bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions
based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is
likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B
CS 4284 Spring 2013
When SYNs Attackbull Servers receiving SYN must allocate resources
ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
bull Solutionndash SYN cookies
bull Server creates ACK number sends ACK ndash but does not allocate buffers
ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct
Sequence Number Summarybull Goals Set 1
ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared
ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs
bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can
eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial
sequence number
CS 4284 Spring 2013
CS 4284 Spring 2013
TCP Connection Management (cont)Closing a connection
client closes socket close(s)
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
timed
wai
t
CS 4284 Spring 2013
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
timed
wai
tclosed
CS 4284 Spring 2013
TCP ConnectionFSM
The heavy solid line is the normal path for a client
The heavy dashed line is the normal path for a server
The light lines are unusual events
Each transition is labeled by the event causing it and the action resulting from it separated by a slash
CS 4284 Spring 2013
TCP Connection Management (contrsquod)
TCP client lifecycle TCP server lifecycle
CS 4284 Spring 2013
Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if
packets (FIN ACK) can be lostndash No Famous two-army problem
CS 4284 Spring 2013
Summarybull TCP segments acknowledgements amp
retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit
bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management
schemendash SYN attackndash Sequence number prediction attacks
CS 4284 Spring 2013
TCP Miscellaneousbull MSS Maximum Segment Size Option
ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN
bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to
allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT
measurement PAWS for protection against wrap-around for sequence numbers
bull hellip
CS 4284 Spring 2013
Study of TCP Outlinebull segment structurebull reliable data transfer
ndash delayed ACKsndash Naglersquos algorithm
bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control
TCP
Congestion Control
CS 4284 Spring 2013
Principles of Congestion Control
Congestionbull informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull different from flow controlbull manifestations
ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)
bull a top-10 problem
CS 4284 Spring 2013
Causescosts of congestion scenario 1 bull two senders two
receiversbull one router infinite
buffers bull no retransmission
bull large delays when congested
bull (but no reduction in throughput here)
unlimited shared output link buffers
Host Alin original data
Host B
lout
CS 4284 Spring 2013
Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet
after timeout
finite shared output link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2lin
l out
a c
R2
R2lin
l out
R4
R2
R2lin
l out
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit
Q what happens as lin and lrsquoin increase
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission
capacity used for that packet was wastedbull ultimately leads to congestion collapse
Host A
Host B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even if no packet loss occurs
bull If packet loss occurs needed retransmission require offered load to be greater than goodput
bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case
Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at
TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission:
  LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• CongWin and RTT influence throughput: rate ≈ CongWin/RTT bytes/sec
• CongWin is a dynamic function of perceived network congestion
How does the sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after a loss event
• (assumes congestion is the primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
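A minimal sketch of the sender-side limit and the rate estimate above, assuming byte counters that never wrap around; `allowed_to_send` and `approx_rate` are hypothetical names, not part of any TCP API.

```python
def allowed_to_send(last_byte_sent: int, last_byte_acked: int,
                    cong_win: int, rcv_window: int) -> int:
    """Bytes the sender may still put in flight (never negative)."""
    in_flight = last_byte_sent - last_byte_acked
    limit = min(cong_win, rcv_window)  # the smaller window is the binding one
    return max(0, limit - in_flight)


def approx_rate(cong_win: int, rtt_ms: int) -> float:
    """Rough sending rate in bytes/sec: one full window per round-trip time."""
    return cong_win * 1000 / rtt_ms


# 10,000 bytes in flight, CongWin = 16,000, RcvWindow = 12,000:
# the receiver window is the binding limit, leaving 2,000 bytes.
print(allowed_to_send(30_000, 20_000, 16_000, 12_000))  # 2000
print(approx_rate(16_000, 100))                          # 160000.0
```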
TCP AIMD
• multiplicative decrease: cut CongWin in half after a loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)
[Figure: congestion window (8, 16, 24 Kbytes) versus time for a long-lived TCP connection, showing a sawtooth.]
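The sawtooth can be reproduced with a short simulation. This is a sketch under simplifying assumptions: one loop iteration per RTT, and losses detected in hypothetical rounds 8 and 16 chosen only for illustration.

```python
def aimd_trace(rounds, mss, start_cwnd, loss_rounds):
    """CongWin (bytes) at the start of each RTT under idealized AIMD."""
    cwnd = start_cwnd
    trace = []
    for rtt in range(rounds):
        trace.append(cwnd)
        if rtt in loss_rounds:
            cwnd = max(mss, cwnd // 2)  # multiplicative decrease (never below 1 MSS)
        else:
            cwnd += mss                 # additive increase: +1 MSS per RTT
    return trace


trace = aimd_trace(rounds=20, mss=1000, start_cwnd=8000, loss_rounds={8, 16})
print(trace[8], trace[9], trace[16], trace[17])  # 16000 8000 15000 7500
```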
TCP Slow Start
• When connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event
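The 20 kbps figure is 500 bytes × 8 bits / 0.2 s. The sketch below reproduces it and, for a hypothetical 10 Mbps target, compares how many loss-free RTTs a linear (+1 MSS per RTT) ramp needs versus a doubling ramp; the helper names are illustrative only.

```python
def initial_rate_bps(mss_bytes, rtt_ms):
    """Rate with CongWin = 1 MSS, i.e. one segment per RTT, in bits/sec."""
    return mss_bytes * 8 * 1000 // rtt_ms


def rtts_to_reach(target_bps, mss_bytes, rtt_ms, exponential):
    """RTTs (without loss) until the window's rate reaches target_bps."""
    per_segment = initial_rate_bps(mss_bytes, rtt_ms)
    segments, rtts = 1, 0
    while segments * per_segment < target_bps:
        segments = segments * 2 if exponential else segments + 1
        rtts += 1
    return rtts


print(initial_rate_bps(500, 200))                              # 20000
print(rtts_to_reach(10_000_000, 500, 200, exponential=True))   # 9
print(rtts_to_reach(10_000_000, 500, 200, exponential=False))  # 499
```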
TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin by 1 MSS for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: timeline from Host A to Host B: one segment in the first RTT, two segments in the second, four segments in the third.]
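To see why a per-ACK increment doubles the window every RTT, here is a small simulation. It assumes every segment is ACKed individually (no delayed ACKs, which would make growth slower than doubling) and that no loss occurs.

```python
def slow_start_rounds(rounds, mss=1460):
    """Segments sent in each RTT when CongWin grows by 1 MSS per ACK."""
    cwnd = mss
    sent = []
    for _ in range(rounds):
        segments = cwnd // mss
        sent.append(segments)
        for _ack in range(segments):  # one ACK per segment, no loss
            cwnd += mss
    return sent


print(slow_start_rounds(4))  # [1, 2, 4, 8]
```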
TCP Tahoe vs Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event: set CongWin to 1 MSS
• If triple-duplicate-ACK event: set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno; TCP Tahoe sets CongWin to 1 MSS after either kind of loss event)
Timeouts vs 3-dup ACKs
• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate network capable of delivering some segments
• timeout before 3 dup ACKs received is a stronger, "more alarming" indicator of congestion
Summary: TCP Congestion Control
Initial transition (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
slow start:
• new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• Λ, when cwnd > ssthresh: go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3·MSS, retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; stay in slow start
congestion avoidance:
• new ACK: cwnd = cwnd + MSS·(MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3·MSS, retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
fast recovery:
• duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
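A minimal Python sketch of the state machine above, with cwnd and ssthresh in bytes. It follows the slide's simplified rule ssthresh = cwnd/2; RFC 5681 instead bases ssthresh on the amount of data in flight and keeps it at least 2 MSS. Actual transmission and timers are left out, and the hypothetical `_retransmit` helper only counts how often the "retransmit missing segment" action fires.

```python
from enum import Enum


class State(Enum):
    SLOW_START = "slow start"
    CONGESTION_AVOIDANCE = "congestion avoidance"
    FAST_RECOVERY = "fast recovery"


class RenoCongestionControl:
    """Sketch of the Reno sender FSM; cwnd and ssthresh are in bytes."""

    def __init__(self, mss=1460, ssthresh=64 * 1024):
        self.mss = mss
        self.cwnd = float(mss)           # initial: cwnd = 1 MSS
        self.ssthresh = float(ssthresh)  # initial: ssthresh = 64 KB
        self.dup_acks = 0
        self.state = State.SLOW_START
        self.retransmissions = 0

    def _retransmit(self):
        self.retransmissions += 1  # stand-in for resending the missing segment

    def on_new_ack(self):
        if self.state is State.FAST_RECOVERY:
            self.cwnd = self.ssthresh  # deflate the window, leave fast recovery
            self.state = State.CONGESTION_AVOIDANCE
        elif self.state is State.SLOW_START:
            self.cwnd += self.mss
            if self.cwnd > self.ssthresh:  # the Λ transition
                self.state = State.CONGESTION_AVOIDANCE
        else:
            self.cwnd += self.mss * self.mss / self.cwnd  # about +1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state is State.FAST_RECOVERY:
            self.cwnd += self.mss  # inflate for each additional duplicate ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self._retransmit()
            self.state = State.FAST_RECOVERY

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = float(self.mss)
        self.dup_acks = 0
        self._retransmit()
        self.state = State.SLOW_START
```

For example, with `mss=1000` and `ssthresh=8000`, eight new ACKs take cwnd to 9000 and into congestion avoidance; three duplicate ACKs then set ssthresh to 4500 and cwnd to 7500 in fast recovery.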
Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS
TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unACKed data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Results in a doubling of CongWin every RTT
ACK receipt for previously unACKed data | Congestion Avoidance (CA) | CongWin = CongWin + MSS·(MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being ACKed | CongWin and Threshold not changed
TCP Throughput (Idealized)
• What's the average throughput of TCP as a function of window size and RTT?
  – Long-lived connection; ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT
[Figure: window size oscillating between W/2 and W.]
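Assuming the window climbs linearly from W/2 back to W between losses (the idealized sawtooth), throughput is a linear ramp over each cycle, so its time average is the midpoint of the two extremes:

$$
\overline{\text{throughput}} = \frac{1}{2}\left(\frac{W}{2\,RTT} + \frac{W}{RTT}\right) = \frac{3}{4}\cdot\frac{W}{RTT} = 0.75\,\frac{W}{RTT}
$$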
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
$$\text{throughput} = \frac{1.22 \cdot MSS}{RTT \cdot \sqrt{L}}$$
• Solving for L: $L = 2 \cdot 10^{-10}$. Very low!
• Requires almost perfect link
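Both numbers on this slide can be checked directly: W is throughput × RTT divided by the segment size in bits, and L comes from inverting the loss formula. The helper names are illustrative; MSS is converted to bits so that the units match a throughput in bits/sec.

```python
def window_segments(target_bps, rtt_s, mss_bytes):
    """In-flight segments needed: W = throughput * RTT / segment size."""
    return target_bps * rtt_s / (mss_bytes * 8)


def max_loss_rate(target_bps, rtt_s, mss_bytes):
    """Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) for L (MSS in bits)."""
    return (1.22 * mss_bytes * 8 / (rtt_s * target_bps)) ** 2


w = window_segments(10e9, 0.1, 1500)
loss = max_loss_rate(10e9, 0.1, 1500)
print(f"W = {w:,.0f} segments")  # W = 83,333 segments
print(f"L = {loss:.2e}")          # L = 2.14e-10
```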
Equation-Based Control
• Note: TCP congestion control forms a control loop
  – Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-dup-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info
TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connections 1 and 2 share a bottleneck router of capacity R.]
Why is TCP fair?
Consider two competing sessions:
• additive increase gives slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: connection 2 throughput versus connection 1 throughput, both axes up to R, with the equal-bandwidth-share line; repeated steps of congestion avoidance (additive increase) and loss (window decreased by a factor of 2) move the operating point toward the equal share.]
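The convergence argument can be checked numerically. This sketch assumes the idealized model from the plot: both flows see every loss at the same time, and the rates are in arbitrary hypothetical units. Additive increase leaves the gap between the flows unchanged, while each multiplicative decrease halves it.

```python
def aimd_two_flows(x1, x2, capacity, rounds, increase=1.0):
    """Idealized AIMD for two flows that see every loss at the same time."""
    for _ in range(rounds):
        if x1 + x2 > capacity:        # bottleneck overloaded: both flows see loss
            x1, x2 = x1 / 2, x2 / 2   # multiplicative decrease halves the gap
        else:
            x1, x2 = x1 + increase, x2 + increase  # additive increase keeps the gap
    return x1, x2


x1, x2 = aimd_two_flows(80.0, 10.0, capacity=100.0, rounds=1000)
print(abs(x1 - x2) < 1e-6)  # True
```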
Fairness (more)
Fairness and UDP:
• Multimedia apps often do not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections:
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
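The R/10 and R/2 figures follow from assuming that every connection on the link gets an equal share. A tiny sketch with a hypothetical helper, using exact fractions of R:

```python
from fractions import Fraction


def share_of_new_app(existing_conns, new_conns):
    """Fraction of R the new app gets if every connection gets an equal share."""
    return Fraction(new_conns, existing_conns + new_conns)


print(share_of_new_app(9, 1))  # 1/10
print(share_of_new_app(9, 9))  # 1/2
```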
Karn's Algorithm
• Idea: don't consider RTT samples from retransmitted segments
  – OK, but what if the current timeout value is too small for network delay?
  – In this case, would keep timing out but couldn't adjust timeout according to formula
• Solution: if measurement can't be made, use exponential backoff until new measurement can be made
  – By factor of 2, with limit
• Resume normal sampling algorithm afterwards
  – Usual sampling period is once per RTT
• See also RFC 2988
  – TCP timestamp option can be used to remove ambiguity about which ACK matches which packet
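This slide does not give the estimation formula, so the sketch below assumes the standard estimator from RFC 2988: gains 1/8 and 1/4, RTO = SRTT + 4·RTTVAR, an initial RTO of 3 s, and a 1 s minimum. It also assumes a 60 s cap for the backoff "limit" and ignores clock granularity. `RtoEstimator` is a hypothetical class, not a library API.

```python
class RtoEstimator:
    """RTT estimation with Karn's rule and exponential backoff (times in seconds)."""

    ALPHA = 1 / 8   # gain for the smoothed RTT
    BETA = 1 / 4    # gain for the RTT variation
    MIN_RTO = 1.0
    MAX_RTO = 60.0  # the "limit" on backoff (assumed value)

    def __init__(self, initial_rto=3.0):
        self.srtt = None
        self.rttvar = None
        self.rto = initial_rto

    def on_rtt_sample(self, rtt, segment_was_retransmitted):
        if segment_was_retransmitted:
            # Karn: we cannot tell which transmission this ACK answers,
            # so the sample is discarded and the backed-off RTO is kept.
            return
        if self.srtt is None:  # first valid measurement
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:  # update RTTVAR before SRTT, as RFC 2988 specifies
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        self.rto = min(self.MAX_RTO, max(self.MIN_RTO, self.srtt + 4 * self.rttvar))

    def on_timeout(self):
        # No usable sample: back off by a factor of 2, up to the limit.
        self.rto = min(self.MAX_RTO, self.rto * 2)


est = RtoEstimator()
est.on_timeout()
est.on_timeout()
est.on_rtt_sample(0.5, segment_was_retransmitted=True)   # ignored
print(est.rto)  # 12.0
est.on_rtt_sample(0.5, segment_was_retransmitted=False)
print(est.rto)  # 1.5
```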
Fast Retransmit
• Time-out period often relatively long:
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If a segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost:
  – fast retransmit: resend segment before timer expires
Fast retransmit algorithm:
event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {  /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            resend segment with sequence number y  /* fast retransmit */
        }
    }
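The pseudocode above translates almost line for line into Python. This sketch assumes byte sequence numbers without wraparound and adds a hypothetical `next_seq_num` field so the sender can tell whether unacknowledged data remains; the timer is modeled as a flag.

```python
class FastRetransmitSender:
    """Sender-side ACK handling from the pseudocode (no sequence wraparound)."""

    def __init__(self, send_base=0, next_seq_num=0):
        self.send_base = send_base        # oldest unACKed byte
        self.next_seq_num = next_seq_num  # next byte to be sent (hypothetical field)
        self.dup_acks = {}                # ACK value y -> duplicate count
        self.timer_running = False
        self.fast_retransmits = []        # sequence numbers resent early

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y
            self.dup_acks.clear()
            # Restart the timer if unACKed segments remain; otherwise it can stop.
            self.timer_running = self.next_seq_num > self.send_base
        else:
            # A duplicate ACK for an already ACKed segment.
            self.dup_acks[y] = self.dup_acks.get(y, 0) + 1
            if self.dup_acks[y] == 3:
                self.fast_retransmits.append(y)  # resend segment y before timeout


sender = FastRetransmitSender(send_base=0, next_seq_num=5000)
for ack in (1000, 1000, 1000, 1000, 1000):  # segment starting at byte 1000 was lost
    sender.on_ack(ack)
print(sender.fast_retransmits)  # [1000]
```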
TCP
Flow Control
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
TCP Flow Control (cont'd)
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn't overflow
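A minimal sketch of both sides of the formula, assuming byte counters without wraparound; the function names are hypothetical.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises: RcvBuffer - [LastByteRcvd - LastByteRead]."""
    buffered = last_byte_rcvd - last_byte_read
    if not 0 <= buffered <= rcv_buffer:
        raise ValueError("buffer accounting is inconsistent")
    return rcv_buffer - buffered


def may_send(last_byte_sent, last_byte_acked, advertised_window):
    """Bytes the sender may add without exceeding the advertised window."""
    return max(0, advertised_window - (last_byte_sent - last_byte_acked))


win = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(win)                        # 2096
print(may_send(5000, 4000, win))  # 1096
```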
Flow Control: Persistence Timer
• Suppose sender is blocked because receiver application hasn't picked up data
• Then receiver app reads n bytes
• TCP receiver advertises new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
  – Solution: persistence timer; sender sends a probe after a period of inactivity to provoke a window update
Flow Control: Silly Window Syndrome
Clark's solution: don't advertise windows below a certain size
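Silly window syndrome arises when an application drains the receive buffer a few bytes at a time and the receiver advertises each tiny opening, so the sender ends up sending tiny segments. The sketch below is a simplified receiver rule, assuming the threshold min(MSS, half the receive buffer) that RFC 1122 recommends for receiver-side SWS avoidance; `advertised_window` is a hypothetical helper.

```python
def advertised_window(rcv_buffer, buffered, mss):
    """Window to advertise under a simplified receiver-side SWS avoidance rule."""
    free = rcv_buffer - buffered
    threshold = min(mss, rcv_buffer // 2)
    # Advertise zero rather than a "silly" small window.
    return free if free >= threshold else 0


print(advertised_window(8192, 8000, 1460))  # 0 (only 192 bytes free)
print(advertised_window(8192, 6000, 1460))  # 2192
```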
Study of TCP Outline
• segment structure
• reliable data transfer
  – delayed ACKs
  – Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP
Connection Management
TCP Connection Management
Recall: TCP sender and receiver establish a "connection" before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  connect(s, &dstaddr, …)
• server: contacted by client
  cl = accept(sv, &caddr, …)
Three-way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq. #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
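The sequence and acknowledgement numbers exchanged in the three steps can be traced with a small model. The ISN values are hypothetical, flags are plain booleans, and no real packets are sent; the point is that each SYN consumes one sequence number, so each side acknowledges the other's ISN + 1 (modulo 2^32).

```python
from dataclasses import dataclass

SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits and wrap around


@dataclass
class Segment:
    syn: bool
    ack_flag: bool
    seq: int
    ack: int = 0


def three_way_handshake(client_isn, server_isn):
    """Return the SYN, SYNACK and ACK segments; each SYN consumes one sequence number."""
    syn = Segment(syn=True, ack_flag=False, seq=client_isn)
    synack = Segment(syn=True, ack_flag=True, seq=server_isn,
                     ack=(syn.seq + 1) % SEQ_SPACE)
    ack = Segment(syn=False, ack_flag=True, seq=(client_isn + 1) % SEQ_SPACE,
                  ack=(synack.seq + 1) % SEQ_SPACE)
    return syn, synack, ack


for seg in three_way_handshake(client_isn=100, server_isn=5000):
    print(seg)
```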
TCP 3-Way Handshake
TCP connection establishment:
• Q1: why 3-way and not 2-way handshake?
• Q2: how do sender & receiver determine initial seq. numbers?
3-Way Handshake & Delayed Duplicates
(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)
Sequence Number Reuse
• Idea: tie initial TCP seq. numbers to a clock
  – Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  – Use PRNG; see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock value + F(src, dst, sport, dport, secret), where secret is a random value that stays fixed
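A sketch of the RFC 1948 idea: a 4-microsecond clock plus a keyed hash of the connection's addresses and ports. MD5 is used for F because RFC 1948 suggests it; `BOOT_SECRET` and `initial_sequence_number` are hypothetical names, and a real stack would use its own clock and key management.

```python
import hashlib
import os
import time

BOOT_SECRET = os.urandom(16)  # hypothetical per-host secret, fixed until reboot


def initial_sequence_number(src, sport, dst, dport, now_us=None, secret=BOOT_SECRET):
    """ISN = 4-microsecond clock + F(connection id, secret), modulo 2**32."""
    if now_us is None:
        now_us = time.monotonic_ns() // 1000
    clock = now_us // 4  # ticks once every 4 microseconds
    conn_id = f"{src}:{sport}>{dst}:{dport}".encode()
    digest = hashlib.md5(secret + conn_id).digest()  # F: keyed hash of the 4-tuple
    offset = int.from_bytes(digest[:4], "big")
    return (clock + offset) % 2 ** 32


a = initial_sequence_number("10.0.0.1", 40000, "10.0.0.2", 80, now_us=4000)
b = initial_sequence_number("10.0.0.1", 40000, "10.0.0.2", 80, now_us=4004)
print((b - a) % 2 ** 32)  # 1
```

Because the offset depends on a secret, an attacker who opens a connection from a different address or port learns nothing useful about the ISN another 4-tuple will get.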
When Sequence Numbers Attack
• Suppose attacker A can predict the sequence number host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  – B believes it is talking to C, and might grant permissions based on C's IP address
  – Attacker on A must suppress the RST packets C is likely to send; use a denial-of-service attack for that
• A sends message to compromise B
When SYNs Attack
• Servers receiving SYN must allocate resources
  – Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution: SYN cookies
  – Server computes its initial sequence number (the cookie) and sends SYNACK, but does not allocate buffers
  – If client continues with ACK, server checks whether the ACK number could have come from a SYNACK it sent; if correct, it allocates buffers
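A hypothetical, simplified SYN-cookie scheme to illustrate "compute but don't store": the server derives its ISN from an HMAC over the connection, the client's ISN, and a coarse time slot, then recomputes it when the final ACK arrives. This is not the encoding Linux or the original SYN-cookie design uses (real implementations also pack an MSS value into the cookie); the secret, the 64 s slot, and the function names are assumptions.

```python
import hashlib
import hmac
import os

SERVER_SECRET = os.urandom(16)  # hypothetical secret, never sent on the wire
SLOT_SECONDS = 64               # assumed time granularity of a cookie


def make_cookie(src, sport, dst, dport, client_isn, now_s, secret=SERVER_SECRET):
    """Server ISN for the SYNACK, derived from the connection; nothing is stored."""
    slot = int(now_s) // SLOT_SECONDS
    msg = f"{src}:{sport}>{dst}:{dport}|{client_isn}|{slot}".encode()
    mac = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(mac[:4], "big")


def ack_is_valid(src, sport, dst, dport, ack_seq, ack_num, now_s, secret=SERVER_SECRET):
    """Check a client's final ACK: it must acknowledge cookie + 1."""
    client_isn = (ack_seq - 1) % 2 ** 32  # client's seq after its SYN is ISN + 1
    for age in (0, SLOT_SECONDS):         # accept the current or previous time slot
        cookie = make_cookie(src, sport, dst, dport, client_isn, now_s - age, secret)
        if (cookie + 1) % 2 ** 32 == ack_num:
            return True                   # only now would buffers be allocated
    return False


cookie = make_cookie("192.0.2.7", 51000, "198.51.100.1", 80, client_isn=1000, now_s=1000.0)
print(ack_is_valid("192.0.2.7", 51000, "198.51.100.1", 80,
                   ack_seq=1001, ack_num=(cookie + 1) % 2 ** 32, now_s=1010.0))  # True
```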
Sequence Number Summary
• Goals Set 1:
  – Guard against old duplicates in one connection -> don't reuse sequence numbers until it's reasonably certain that old duplicates have disappeared
  – Even after a crash/restart -> hence tie to time -> but don't use a global clock -> hence need a 3-way handshake where each side verifies the sequence number chosen by the other side before a successful connection occurs
• Goals Set 2:
  – Don't allow hijacking of connections unless attacker can eavesdrop -> use PRNG for initial seq. number choice
  – Don't allow SYN attacks -> compute but don't store initial sequence number
TCP Connection Management (cont.)
Closing a connection:
client closes socket: close(s)
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client sends FIN (close); server replies with ACK, then sends its own FIN (close); client replies with ACK and enters timed wait before closed.]
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK
  – Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: FIN/ACK exchange in which client and server pass through closing; the client waits in timed wait, then both are closed.]
TCP Connection FSM
The heavy solid line is the normal path for a client.
The heavy dashed line is the normal path for a server.
The light lines are unusual events.
Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.
TCP Connection Management (cont'd)
[Figures: TCP client lifecycle; TCP server lifecycle]
Closing a Connection
• Note: previous charts showed normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  – No! Famous two-army problem
Summary
• TCP segments, acknowledgements & retransmission
  – Delayed ACKs, Nagle's algorithm
  – Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
  – SYN attack
  – Sequence number prediction attacks
TCP Miscellaneous
• MSS: Maximum Segment Size option
  – Client/server agree on larger than default (536 outside same subnet) MSS: option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window, to allow for LFN ("elephant"): Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• …
TCP
Congestion Control
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control
• manifestations:
  – long delays (queueing in router buffers)
  – lost packets (buffer overflow at routers)
• a top-10 problem
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Hosts A and B send λin original data through one router with unlimited shared output link buffers; λout is the delivered throughput.]
Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet
after timeout
finite shared output link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2lin
l out
a c
R2
R2lin
l out
R4
R2
R2lin
l out
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit
Q what happens as lin and lrsquoin increase
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission
capacity used for that packet was wastedbull ultimately leads to congestion collapse
Host A
Host B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even if no packet loss occurs
bull If packet loss occurs needed retransmission require offered load to be greater than goodput
bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion control
bull no explicit feedback from network
bull congestion inferred from end-system observed loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systemsndash single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Controlbull end-end control (no network
assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput
bull CongWin is dynamic function of perceived network congestion
How does sender notice congestion
bull loss event = timeout or 3 duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is primary cause of loss)
Three mechanismsbull AIMDbull slow startbull fast recovery
rate = CongWin
RTT Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
multiplicative decrease cut CongWin in half after loss event
additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Startbull When connection
begins CongWin = 1 MSSndash Example MSS =
500 bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly
ramp up to respectable rate
bull When connection begins increase rate exponentially fast until first loss event
CS 4284 Spring 2013
TCP Slow Start (more)bull When connection begins
increase rate exponentially until first loss eventndash double CongWin every
RTTndash done by incrementing CongWin for every ACK received
bull Summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
CS 4284 Spring 2013
TCP Tahoe vs RenoQ When should the
exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss
eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly
(this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKsbull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout eventndash CongWin instead set
to 1 MSS ndash window then grows
exponentiallyndash to a threshold then
grows linearly
bull 3 dup ACKs indicates network capable of delivering some segments
bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
CS 4284 Spring 2013
TCP Sender Congestion ControlEvent State TCP Sender Action Commentary
ACK receipt for previously unacked data
Slow Start (SS)
CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA
Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data
CongestionAvoidance (CA)
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK
SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter slow start
Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
CS 4284 Spring 2013
TCP Throughput Idealized
bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start
bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput
to W2RTT bull Average steady-state throughput 75 WRTT
W
W2
CS 4284 Spring 2013
TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT
want 10 Gbps throughputbull Requires window size W = 83333 in-flight
segmentsbull Throughput in terms of loss rate
bull L = 210-10 Very lowbull Require almost perfect link
LRTTMSS221
Equation-Based Control
bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which
are samples of timeout events + 3-ack events)bull Instead equation-based control uses an
equation to compute sending rate based on this input
bull See RFC 5348 for more info
CS 4284 Spring 2013
TCP Fairness
CS 4284 Spring 2013
Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk
TCP connection 1
bottleneckrouter
capacity RTCP connection 2
TCP Fairness
CS 4284 Spring 2013
Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughputConn
e ctio
n 2
thro
u ghp
ut
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
CS 4284 Spring 2013
Fairness (more)Fairness and UDPbull Multimedia apps often
do not use TCPndash do not want rate throttled
by congestion controlbull Instead use UDP
ndash pump audiovideo at constant rate tolerate packet loss
bull TCP friendliness
Fairness and parallel TCP connections
bull nothing prevents app from opening parallel connections between 2 hosts
bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets
rate R10ndash new app asks for 9 TCPs gets
R2bull Thatrsquos what ldquodownload
acceleratorsrdquo do
CS 4284 Spring 2013
Fast Retransmitbull Time-out period often relatively long
ndash long delay before resending lost packetbull Detect lost segments via duplicate ACKs
ndash Sender often sends many segments back-to-backndash If segment is lost there will likely be many duplicate ACKs
bull If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lostndash fast retransmit resend segment before timer expires
CS 4284 Spring 2013
event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
TCP
Flow Control
CS 4284 Spring 2013
TCP Flow Controlbull receive side of
TCP connection has a receive buffer
bull speed-matching service matching the send rate to the receiving apprsquos drain rate
bull app process may be slow at reading from buffer
sender wonrsquot overflow
receiverrsquos buffer bytransmitting too
much too fast
flow control
CS 4284 Spring 2013
TCP Flow Control (contrsquod)
(Suppose TCP receiver discards out-of-order segments)
bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Rcvr advertises spare room by including value of RcvWindow in segments
bull Sender limits unACKed data to RcvWindowndash guarantees
receive buffer doesnrsquot overflow
CS 4284 Spring 2013
Flow Control Persistence Timerbull Suppose sender is blocked cause receiver
application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size
RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck
ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update
CS 4284 Spring 2013
Flow Control Silly Window Syndrome
Clarkrsquos solution donrsquot advertise windows below a certain size
CS 4284 Spring 2013
Study of TCP Outlinebull segment structurebull reliable data transfer
ndash delayed ACKsndash Naglersquos algorithm
bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control
TCP
Connection Management
CS 4284 Spring 2013
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control info
(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr
hellip)
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
CS 4284 Spring 2013
TCP 3-way handshake
TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums
CS 4284 Spring 2013
3-way Handshake amp Delayed Dups
(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)
CS 4284 Spring 2013
Sequence Number Reuse
bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a
connection with identical sequence numbersbull Must also guard against sequence number prediction attack
ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())
CS 4284 Spring 2013
When Sequence Numbers Attack
bull Suppose attacker A can predict sequence number a host B is going to use next
bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions
based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is
likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B
CS 4284 Spring 2013
When SYNs Attackbull Servers receiving SYN must allocate resources
ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
bull Solutionndash SYN cookies
bull Server creates ACK number sends ACK ndash but does not allocate buffers
ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct
Sequence Number Summarybull Goals Set 1
ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared
ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs
bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can
eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial
sequence number
CS 4284 Spring 2013
CS 4284 Spring 2013
TCP Connection Management (cont)Closing a connection
client closes socket close(s)
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
timed
wai
t
CS 4284 Spring 2013
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
timed
wai
tclosed
CS 4284 Spring 2013
TCP ConnectionFSM
The heavy solid line is the normal path for a client
The heavy dashed line is the normal path for a server
The light lines are unusual events
Each transition is labeled by the event causing it and the action resulting from it separated by a slash
CS 4284 Spring 2013
TCP Connection Management (contrsquod)
TCP client lifecycle TCP server lifecycle
CS 4284 Spring 2013
Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if
packets (FIN ACK) can be lostndash No Famous two-army problem
CS 4284 Spring 2013
Summarybull TCP segments acknowledgements amp
retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit
bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management
schemendash SYN attackndash Sequence number prediction attacks
CS 4284 Spring 2013
TCP Miscellaneousbull MSS Maximum Segment Size Option
ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN
bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to
allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT
measurement PAWS for protection against wrap-around for sequence numbers
bull hellip
Study of TCP: Outline
• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP Congestion Control
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control
• manifestations:
  - long delays (queueing in router buffers)
  - lost packets (buffer overflow at routers)
• a top-10 problem
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Hosts A and B send original data at rate λin through one router with unlimited shared output link buffers; delivered rate λout]
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmits a lost packet after timeout
[Figure: Host A sends λin (original data) and λ'in (original data plus retransmitted data) through a router with finite shared output link buffers to Host B; delivered rate λout]
Causes/costs of congestion: scenario 2
• Always: λin = λout (goodput)
• a) if no loss: λ'in = λin
• b) assume a clairvoyant sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes λ'in larger for the same λout (every packet transmitted twice)
"Costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: link carries multiple copies of a packet
[Figure: plots (a), (b), (c) of λout vs. λin; λout reaches R/2 in (a), R/3 in (b), and R/4 in (c)]
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λin and λ'in increase?
[Figure: Hosts A and B send λin (original data) and λ'in (original data plus retransmitted data) over multihop paths through routers with finite shared output link buffers; delivered rate λout]
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse
[Figure: λout between Host A and Host B as offered load increases]
Reasons for Congestion Control
• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case
Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate the sender should send at
TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission:
  $\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$
• roughly, $\text{rate} \approx \frac{\text{CongWin}}{\text{RTT}}$ bytes/sec
• CongWin and RTT influence throughput
• CongWin is a dynamic function of perceived network congestion
How does the sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after a loss event
• (assumes congestion is the primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
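The sending rule and the rate approximation above can be expressed directly. A minimal Python sketch; the function names are hypothetical, the arguments mirror the slide's variables (all in bytes), and the numbers in the example are made up for illustration.

```python
def usable_window(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """Bytes the sender may still transmit under
    LastByteSent - LastByteAcked <= min(CongWin, RcvWindow)."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cong_win, rcv_window) - in_flight)


def approx_rate(cong_win_bytes, rtt_s):
    """Rough sending rate in bytes/sec: one window's worth of data per RTT."""
    return cong_win_bytes / rtt_s


# 6,000 bytes unACKed, CongWin = 8,000, RcvWindow = 16,000: CongWin is the limit
print(usable_window(10_000, 4_000, 8_000, 16_000))
print(approx_rate(8_000, 0.1))
```

Whichever of CongWin and RcvWindow is smaller governs; congestion control and flow control share this one inequality.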
TCP AIMD
• multiplicative decrease: cut CongWin in half after a loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)
[Figure: congestion window (8, 16, 24 Kbytes) vs. time for a long-lived TCP connection]
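The sawtooth in the figure can be reproduced with a few lines. A Python sketch that steps once per RTT, assuming loss events happen at rounds chosen by the caller (in reality the network decides) and that CongWin never drops below 1 MSS.

```python
def aimd(rounds, mss, start_cwnd, loss_rounds):
    """CongWin (bytes) at the start of each RTT for a long-lived connection:
    +1 MSS per loss-free RTT, halved (but not below 1 MSS) after a loss event."""
    cwnd = start_cwnd
    trace = []
    for r in range(rounds):
        trace.append(cwnd)
        if r in loss_rounds:
            cwnd = max(mss, cwnd // 2)  # multiplicative decrease
        else:
            cwnd += mss                 # additive increase (probing)
    return trace


print(aimd(10, mss=1_000, start_cwnd=8_000, loss_rounds={4}))
```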
TCP Slow Start
• When connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes & RTT = 200 msec
  - initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to a respectable rate
• When connection begins, increase rate exponentially fast until the first loss event
TCP Slow Start (more)
• When connection begins, increase rate exponentially until the first loss event:
  - double CongWin every RTT
  - done by incrementing CongWin by 1 MSS for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart]
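Both the initial-rate arithmetic and the doubling effect of "+1 MSS per ACK" can be checked numerically. A Python sketch, assuming every segment is ACKed individually (no delayed ACKs) and no loss occurs; the function names are illustrative.

```python
def initial_rate_bps(mss_bytes, rtt_s):
    """Sending rate with CongWin = 1 MSS: one segment per RTT, in bits/sec."""
    return mss_bytes * 8 / rtt_s


def slow_start(rounds, mss):
    """CongWin at the start of each RTT when every ACK adds 1 MSS.
    Assumes one ACK per segment and no loss."""
    cwnd = mss
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        acks = cwnd // mss      # one ACK comes back per segment sent this RTT
        cwnd += acks * mss      # +1 MSS per ACK, so the window doubles
    return trace


print(initial_rate_bps(500, 0.2))  # slide example: MSS = 500 bytes, RTT = 200 ms
print(slow_start(4, 500))
```

The per-ACK increment is additive, yet because the number of ACKs per RTT equals the window size in segments, the window doubles every round trip.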
TCP Tahoe vs. Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At a loss event, Threshold is set to 1/2 of CongWin just before the loss event
• If timeout event: set CongWin to 1 MSS
• If triple-ACK event: set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)
Timeouts vs. 3 dup ACKs
• After 3 dup ACKs:
  - CongWin is cut in half
  - window then grows linearly
• But after a timeout event:
  - CongWin is instead set to 1 MSS
  - window then grows exponentially
  - to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate the network is capable of delivering some segments
• a timeout before 3 dup ACKs are received is a stronger ("more alarming") indicator of congestion
Summary: TCP Congestion Control
[Figure: TCP sender FSM with three states: slow start, congestion avoidance, fast recovery; Λ marks a transition taken without an event]
Initially (Λ): cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0; enter slow start
Slow start:
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• Λ, cwnd > ssthresh: go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; stay in slow start
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3·MSS; retransmit missing segment; go to fast recovery
Congestion avoidance:
• new ACK: cwnd = cwnd + MSS·(MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3·MSS; retransmit missing segment; go to fast recovery
Fast recovery:
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
Summary: TCP Congestion Control
• When CongWin is below Threshold, the sender is in the slow-start phase; the window grows exponentially.
• When CongWin is above Threshold, the sender is in the congestion-avoidance phase; the window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unACKed data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Results in a doubling of CongWin every RTT
ACK receipt for previously unACKed data | Congestion Avoidance (CA) | CongWin = CongWin + MSS·(MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being ACKed | CongWin and Threshold not changed
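The three-state sender FSM summarized above can be sketched as a small class. This is an illustrative Python model, not a real TCP stack: it follows the FSM variant (on a triple duplicate ACK, cwnd = ssthresh + 3·MSS, then +1 MSS per further duplicate ACK), whereas the simplified table sets CongWin = Threshold directly. It assumes byte-counted windows, a default MSS of 1000 bytes, and the FSM's 64 KB initial ssthresh; actual (re)transmission is not modeled.

```python
class RenoSender:
    """Sketch of the slow start / congestion avoidance / fast recovery FSM.
    cwnd and ssthresh are in bytes; (re)transmission is not modeled."""

    def __init__(self, mss=1000, ssthresh=64 * 1024):
        self.mss = mss
        self.cwnd = mss              # initially cwnd = 1 MSS
        self.ssthresh = ssthresh     # initially ssthresh = 64 KB
        self.dup_acks = 0
        self.state = "slow start"

    def on_new_ack(self):
        if self.state == "fast recovery":
            self.cwnd = self.ssthresh                     # deflate the window
            self.state = "congestion avoidance"
        elif self.state == "slow start":
            self.cwnd += self.mss                         # doubles cwnd per RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion avoidance"
        else:
            self.cwnd += self.mss * self.mss / self.cwnd  # about +1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast recovery":
            self.cwnd += self.mss                         # inflate per extra dup ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:                            # triple duplicate ACK
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = "fast recovery"                  # retransmit missing segment here

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.mss
        self.dup_acks = 0
        self.state = "slow start"                         # retransmit missing segment here


s = RenoSender(mss=1000, ssthresh=4000)
for _ in range(4):
    s.on_new_ack()
print(s.state, s.cwnd)
```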
TCP Throughput (Idealized)
• What's the average throughput of TCP as a function of window size and RTT?
  - Long-lived connection; ignore slow start
• When the window is W, throughput is $W/RTT$
• Just after a loss, the window drops to $W/2$ and throughput to $W/(2 \cdot RTT)$
• Average steady-state throughput: $0.75 \cdot W/RTT$
[Figure: window size oscillating between W/2 and W]
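Where the 0.75 comes from, assuming (as in the idealized sawtooth) that the window ramps linearly from W/2 back up to W between losses, so its time-average is the midpoint of the ramp:

```latex
\bar{w} = \frac{1}{2}\left(\frac{W}{2} + W\right) = \frac{3W}{4},
\qquad
\overline{\text{Throughput}} = \frac{\bar{w}}{RTT} = 0.75\,\frac{W}{RTT}
```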
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
  $\text{Throughput} = \frac{1.22 \cdot MSS}{RTT \cdot \sqrt{L}}$
• ⇒ $L = 2 \cdot 10^{-10}$: very low!
• Requires an almost perfect link
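The slide's numbers follow from the window arithmetic and from solving the throughput formula for L. A Python sketch; here MSS is taken to be the 1500-byte segment size of the example, and the helper names are illustrative.

```python
import math


def window_segments(throughput_bps, rtt_s, mss_bytes):
    """Segments that must be in flight: throughput * RTT / segment size."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def throughput_bps(mss_bytes, rtt_s, loss):
    """Throughput = 1.22 * MSS / (RTT * sqrt(L)), returned in bits/sec."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


def required_loss(target_bps, rtt_s, mss_bytes):
    """The same formula solved for the loss rate L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * target_bps)) ** 2


print(round(window_segments(10e9, 0.1, 1500)))  # slide example
print(required_loss(10e9, 0.1, 1500))
```

The computed loss rate is about 2.1·10^-10, i.e. at most one lost segment in roughly five billion.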
Equation-Based Control
• Note: TCP congestion control forms a control loop
  - Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-dup-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info
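To illustrate the idea, the sketch below sets the sending rate directly from measured inputs instead of growing and shrinking a window per ACK. It is a hypothetical Python model: it plugs in the simplified 1.22·MSS/(RTT·√L) formula from the previous slide as a stand-in (RFC 5348 specifies a more detailed equation), and the EWMA smoothing of RTT with weight 0.125 is an assumption borrowed from TCP's usual SRTT estimator.

```python
import math


def equation_rate(mss_bytes, rtt_s, loss_event_rate):
    """Sending rate in bytes/sec from the simplified formula
    1.22 * MSS / (RTT * sqrt(L)); a stand-in for RFC 5348's equation."""
    return 1.22 * mss_bytes / (rtt_s * math.sqrt(loss_event_rate))


class EquationBasedSender:
    """Hypothetical sender: the rate is recomputed from smoothed RTT and the
    current loss event rate, rather than adjusted per ACK or per loss."""

    def __init__(self, mss_bytes, alpha=0.125):
        self.mss = mss_bytes
        self.alpha = alpha   # EWMA weight for new RTT samples (assumed)
        self.srtt = None

    def update(self, rtt_sample_s, loss_event_rate):
        if self.srtt is None:
            self.srtt = rtt_sample_s
        else:
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * rtt_sample_s
        return equation_rate(self.mss, self.srtt, loss_event_rate)


sender = EquationBasedSender(1000)
print(sender.update(0.1, 0.01))
print(sender.update(0.1, 0.04))
```

Because the rate changes only as the measured inputs change, it varies more smoothly than TCP's sawtooth; quadrupling the loss event rate halves the rate.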
TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connections 1 and 2 sharing a bottleneck router of capacity R]
Why is TCP fair?
Consider two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput vs. Connection 1 throughput, both axes up to R; alternating phases of "congestion avoidance: additive increase" and "loss: decrease window by factor of 2" move the operating point toward the equal bandwidth share line]
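The convergence argument can be checked with a toy simulation: additive increase leaves the difference between the two rates unchanged, while each multiplicative decrease halves it. A Python sketch, assuming synchronized losses (both flows see a loss whenever their combined rate exceeds the capacity) and rates measured in arbitrary units per RTT.

```python
def two_flows(x1, x2, capacity, rounds, step=1.0):
    """Two AIMD flows sharing one bottleneck, simulated once per RTT.
    Assumes synchronized losses: whenever x1 + x2 exceeds capacity,
    both flows see a loss and halve; otherwise both add `step`."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2          # multiplicative decrease
        else:
            x1, x2 = x1 + step, x2 + step    # additive increase (slope 1)
    return x1, x2


print(two_flows(80.0, 10.0, capacity=100.0, rounds=1_000))
```

Starting from a very unequal split (80 vs. 10), the two rates end up essentially equal, oscillating together below and around the capacity.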
Fairness (more)
Fairness and UDP:
• Multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
• Instead use UDP:
  - pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections:
• nothing prevents an app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
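The R/10 and R/2 figures follow from per-connection fairness. A tiny Python sketch, assuming every connection on the bottleneck converges to an equal share; the function name is illustrative.

```python
from fractions import Fraction


def new_app_share(existing_conns, new_conns):
    """Share of link rate R the new app gets, assuming every TCP
    connection on the bottleneck ends up with an equal share."""
    return Fraction(new_conns, existing_conns + new_conns)


print(new_app_share(9, 1), new_app_share(9, 9))
```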
Fast retransmit algorithm:
event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {
        /* a duplicate ACK for an already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            /* fast retransmit */
            resend segment with sequence number y
        }
    }
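The pseudocode above translates almost line for line into a runnable sketch. In this Python version the retransmission timer is reduced to a boolean and "resend" just records the sequence number; `send_base`, `next_seq`, and the class name are assumptions for illustration.

```python
class FastRetransmitSender:
    """ACK handling with fast retransmit; the timer and the network are
    stubbed out, and resent sequence numbers are recorded in a list."""

    def __init__(self, send_base, next_seq):
        self.send_base = send_base  # oldest unACKed byte (SendBase)
        self.next_seq = next_seq    # next byte to be sent
        self.dup_acks = {}          # ACK value y -> number of duplicates seen
        self.timer_running = True
        self.resent = []

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y
            # (re)start the timer only if some segments are still unACKed
            self.timer_running = self.send_base < self.next_seq
        else:
            # a duplicate ACK for an already ACKed segment
            self.dup_acks[y] = self.dup_acks.get(y, 0) + 1
            if self.dup_acks[y] == 3:
                self.resent.append(y)  # fast retransmit: resend segment y


sender = FastRetransmitSender(send_base=100, next_seq=500)
sender.on_ack(200)        # new data ACKed
for _ in range(3):
    sender.on_ack(200)    # three duplicates trigger the retransmission
print(sender.resent)
```

Only the third duplicate triggers a resend; further duplicates for the same y are counted but do not retransmit again.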
TCP Flow Control
• receive side of TCP connection has a receive buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
• app process may be slow at reading from buffer
Flow control: sender won't overflow the receiver's buffer by transmitting too much, too fast
TCP Flow Control (cont'd)
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer:
  $\text{RcvWindow} = \text{RcvBuffer} - (\text{LastByteRcvd} - \text{LastByteRead})$
• Receiver advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn't overflow
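The receiver-side bookkeeping behind RcvWindow fits in a small class. A Python sketch under the slide's assumption that data arrives in order; the class and method names are illustrative, and "overflow" is signalled with an exception rather than by dropping data.

```python
class ReceiveBuffer:
    """Receiver bookkeeping for the advertised window (all values in bytes).
    Assumes in-order arrival, as on the slide."""

    def __init__(self, rcv_buffer):
        self.rcv_buffer = rcv_buffer
        self.last_byte_rcvd = 0
        self.last_byte_read = 0

    def rcv_window(self):
        """RcvWindow = RcvBuffer - (LastByteRcvd - LastByteRead)."""
        return self.rcv_buffer - (self.last_byte_rcvd - self.last_byte_read)

    def receive(self, nbytes):
        if nbytes > self.rcv_window():
            raise OverflowError("sender exceeded the advertised window")
        self.last_byte_rcvd += nbytes

    def app_read(self, nbytes):
        """Application drains up to nbytes; returns how many were read."""
        nbytes = min(nbytes, self.last_byte_rcvd - self.last_byte_read)
        self.last_byte_read += nbytes
        return nbytes


buf = ReceiveBuffer(4096)
buf.receive(3000)
print(buf.rcv_window())   # window shrinks as data waits in the buffer
buf.app_read(1000)
print(buf.rcv_window())   # and reopens as the application reads
```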
Flow Control: Persistence Timer
• Suppose the sender is blocked because the receiver application hasn't picked up data
• Then the receiver app reads n bytes
• TCP receiver advertises a new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
  - Solution: persistence timer: sender sends a probe after a period of inactivity to provoke a window update
Flow Control: Silly Window Syndrome
Clark's solution: don't advertise windows below a certain size (the receiver holds back a window update until it can open the window by at least min(MSS, half the receive buffer))
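Clark's receiver-side rule can be sketched as a single decision function. This simplified Python model assumes the receiver tracks the window it last offered and the free space it now has, and uses the min(MSS, half the receive buffer) threshold; real stacks express the same rule in terms of advancing the right window edge.

```python
def window_to_advertise(free_space, last_advertised, rcv_buffer, mss):
    """Receiver-side silly window syndrome avoidance (Clark): open the
    advertised window only if it can grow by at least min(MSS, rcv_buffer/2);
    otherwise keep offering the previous (smaller) window."""
    threshold = min(mss, rcv_buffer // 2)
    if free_space - last_advertised >= threshold:
        return free_space
    return last_advertised


# Full buffer (window 0), application reads 1 byte: do not advertise a 1-byte window
print(window_to_advertise(1, 0, rcv_buffer=4096, mss=1460))
# Once a full segment's worth of space is free, open the window
print(window_to_advertise(1460, 0, rcv_buffer=4096, mss=1460))
```

Holding back tiny updates prevents the sender from being lured into sending many tiny segments with 40 bytes of header each.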
Study of TCP: Outline
• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP Connection Management
Recall: TCP sender and receiver establish a "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g., RcvWindow)
• client: connection initiator
  connect(s, &dstaddr, …)
• server: contacted by client
  cl = accept(sv, &caddr, …)
Three-way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq. #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
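The sequence and acknowledgement numbers of the three steps can be traced with a small model. A Python sketch with a hypothetical `Segment` record (only the fields relevant here); it relies on the fact that a SYN consumes one sequence number and that sequence numbers wrap modulo 2^32.

```python
from dataclasses import dataclass

SEQ_SPACE = 2**32  # sequence numbers wrap modulo 2^32


@dataclass
class Segment:
    syn: bool
    ack_flag: bool
    seq: int
    ack: int = 0


def three_way_handshake(client_isn, server_isn):
    """The three handshake segments. A SYN consumes one sequence number,
    so each side acknowledges the other's ISN + 1."""
    syn = Segment(syn=True, ack_flag=False, seq=client_isn)              # step 1
    synack = Segment(syn=True, ack_flag=True, seq=server_isn,
                     ack=(syn.seq + 1) % SEQ_SPACE)                       # step 2
    ack = Segment(syn=False, ack_flag=True, seq=(client_isn + 1) % SEQ_SPACE,
                  ack=(synack.seq + 1) % SEQ_SPACE)                       # step 3
    return syn, synack, ack


for seg in three_way_handshake(1000, 5000):
    print(seg)
```

The final ACK confirms to the server that the client really saw this SYNACK's sequence number, which is what lets the server reject stale SYNs.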
TCP 3-way handshake
TCP connection establishment:
• Q1: why a 3-way and not a 2-way handshake?
• Q2: how do sender & receiver determine initial seq. #s?
3-way Handshake & Delayed Duplicates
[Figure: (a) Normal operation. (b) Old SYN appearing out of nowhere. (c) Duplicate SYN and duplicate ACK following SYN.]
The 3-way handshake is required to deal with scenarios (b) and (c).
Sequence Number Reuse
• Idea: tie initial TCP seq. #s to a clock
  - Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attacks
  - Use a PRNG; see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock value + F(src, dst, sport, dport, random())
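An RFC 1948-style ISN generator combines the 4 µs clock with a per-connection hash offset. A Python sketch under stated assumptions: `SECRET` is a hypothetical per-boot random key, the connection identifiers are hashed as a simple string, and SHA-256 stands in for F (RFC 1948 suggests MD5; any keyed hash works for the illustration).

```python
import hashlib
import os
import time

SECRET = os.urandom(16)  # per-boot secret key (assumption for this sketch)


def rfc1948_isn(src, dst, sport, dport, now_s=None):
    """ISN = (4-microsecond clock) + F(src, dst, sport, dport, secret), mod 2^32."""
    if now_s is None:
        now_s = time.monotonic()
    clock = int(now_s * 1_000_000) // 4           # one tick every 4 microseconds
    digest = hashlib.sha256(f"{src}|{dst}|{sport}|{dport}".encode() + SECRET).digest()
    offset = int.from_bytes(digest[:4], "big")     # per-connection offset
    return (clock + offset) % 2**32


print(rfc1948_isn("10.0.0.1", "10.0.0.2", 40000, 80))
```

For one 4-tuple the ISN still advances with the clock (protecting against old incarnations), while an attacker who does not know the secret cannot derive the offset used for some other 4-tuple.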
When Sequence Numbers Attack
• Suppose attacker A can predict the sequence number that host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  - B believes it is talking to C, and might grant permissions based on C's IP address
  - Attacker on A must suppress the RST packets C is likely to send; use a denial-of-service attack for that
• A sends message to compromise B
When SYNs Attack
• Servers receiving SYN must allocate resources
  - Opens up the possibility of a denial-of-service attack where the server is flooded with bogus SYN packets with forged IP source addresses
• Solution: SYN cookies
  - Server computes its initial sequence number (the cookie) and sends the SYNACK, but does not allocate buffers
  - If the client continues with an ACK, check whether that ACK could have been for a SYNACK the server sent; if correct, then allocate buffers
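The "compute, don't store" idea behind SYN cookies can be sketched with a keyed hash. This is a deliberately simplified Python model: the cookie is an HMAC of the 4-tuple and a coarse time slot (64 s is an assumed slot length), `SECRET` is a hypothetical key, and the layout differs from real implementations, which also encode information such as an MSS index in the cookie bits.

```python
import hashlib
import hmac

SECRET = b"hypothetical-server-secret"  # assumed; a real server uses a random key
SLOT_S = 64                             # assumed length of one time slot (seconds)


def _mac(src, dst, sport, dport, slot):
    msg = f"{src}|{dst}|{sport}|{dport}|{slot}".encode()
    return int.from_bytes(hmac.new(SECRET, msg, hashlib.sha256).digest()[:4], "big")


def make_cookie(src, dst, sport, dport, now_s):
    """Server ISN to put in the SYNACK; no per-connection state is kept."""
    return _mac(src, dst, sport, dport, int(now_s) // SLOT_S)


def check_ack(src, dst, sport, dport, ack_num, now_s):
    """Accept the client's ACK only if ack_num - 1 equals a cookie computed
    for the current or the previous time slot."""
    slot = int(now_s) // SLOT_S
    isn = (ack_num - 1) % 2**32
    return any(_mac(src, dst, sport, dport, s) == isn for s in (slot, slot - 1))


cookie = make_cookie("10.0.0.1", "10.0.0.2", 1234, 80, now_s=1000.0)
print(check_ack("10.0.0.1", "10.0.0.2", 1234, 80, (cookie + 1) % 2**32, now_s=1010.0))
```

A flood of spoofed SYNs costs the server only hash computations, since nothing is allocated until a valid ACK comes back.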
Sequence Number Summary
• Goals, Set 1:
  - Guard against old duplicates in one connection -> don't reuse sequence numbers until it's reasonably certain that old duplicates have disappeared
  - Even after a crash/restart -> hence tie to time -> but don't use a global clock -> hence need a 3-way handshake where each side verifies the sequence number chosen by the other side before a successful connection occurs
• Goals, Set 2:
  - Don't allow hijacking of connections unless the attacker can eavesdrop -> use a PRNG for initial seq. # choice
  - Don't allow SYN attacks -> compute, but don't store, the initial sequence number
TCP Connection Management (cont.)
Closing a connection:
client closes socket: close(s);
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: FIN/ACK exchange; client: close → timed wait → closed; server: close → closed]
CS 4284 Spring 2013
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
timed
wai
tclosed
CS 4284 Spring 2013
TCP ConnectionFSM
The heavy solid line is the normal path for a client
The heavy dashed line is the normal path for a server
The light lines are unusual events
Each transition is labeled by the event causing it and the action resulting from it separated by a slash
CS 4284 Spring 2013
TCP Connection Management (contrsquod)
TCP client lifecycle TCP server lifecycle
CS 4284 Spring 2013
Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if
packets (FIN ACK) can be lostndash No Famous two-army problem
CS 4284 Spring 2013
Summarybull TCP segments acknowledgements amp
retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit
bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management
schemendash SYN attackndash Sequence number prediction attacks
CS 4284 Spring 2013
TCP Miscellaneousbull MSS Maximum Segment Size Option
ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN
bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to
allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT
measurement PAWS for protection against wrap-around for sequence numbers
bull hellip
CS 4284 Spring 2013
Study of TCP Outlinebull segment structurebull reliable data transfer
ndash delayed ACKsndash Naglersquos algorithm
bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control
TCP
Congestion Control
CS 4284 Spring 2013
Principles of Congestion Control
Congestionbull informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull different from flow controlbull manifestations
ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)
bull a top-10 problem
CS 4284 Spring 2013
Causescosts of congestion scenario 1 bull two senders two
receiversbull one router infinite
buffers bull no retransmission
bull large delays when congested
bull (but no reduction in throughput here)
unlimited shared output link buffers
Host Alin original data
Host B
lout
CS 4284 Spring 2013
Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet
after timeout
finite shared output link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2lin
l out
a c
R2
R2lin
l out
R4
R2
R2lin
l out
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit
Q what happens as lin and lrsquoin increase
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission
capacity used for that packet was wastedbull ultimately leads to congestion collapse
Host A
Host B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even if no packet loss occurs
bull If packet loss occurs needed retransmission require offered load to be greater than goodput
bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion control
bull no explicit feedback from network
bull congestion inferred from end-system observed loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systemsndash single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Controlbull end-end control (no network
assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput
bull CongWin is dynamic function of perceived network congestion
How does sender notice congestion
bull loss event = timeout or 3 duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is primary cause of loss)
Three mechanismsbull AIMDbull slow startbull fast recovery
rate = CongWin
RTT Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
multiplicative decrease cut CongWin in half after loss event
additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Startbull When connection
begins CongWin = 1 MSSndash Example MSS =
500 bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly
ramp up to respectable rate
bull When connection begins increase rate exponentially fast until first loss event
CS 4284 Spring 2013
TCP Slow Start (more)bull When connection begins
increase rate exponentially until first loss eventndash double CongWin every
RTTndash done by incrementing CongWin for every ACK received
bull Summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
CS 4284 Spring 2013
TCP Tahoe vs RenoQ When should the
exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss
eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly
(this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKsbull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout eventndash CongWin instead set
to 1 MSS ndash window then grows
exponentiallyndash to a threshold then
grows linearly
bull 3 dup ACKs indicates network capable of delivering some segments
bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
CS 4284 Spring 2013
TCP Sender Congestion ControlEvent State TCP Sender Action Commentary
ACK receipt for previously unacked data
Slow Start (SS)
CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA
Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data
CongestionAvoidance (CA)
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK
SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter slow start
Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
CS 4284 Spring 2013
TCP Throughput Idealized
bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start
bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput
to W2RTT bull Average steady-state throughput 75 WRTT
W
W2
CS 4284 Spring 2013
TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT
want 10 Gbps throughputbull Requires window size W = 83333 in-flight
segmentsbull Throughput in terms of loss rate
bull L = 210-10 Very lowbull Require almost perfect link
LRTTMSS221
Equation-Based Control
bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which
are samples of timeout events + 3-ack events)bull Instead equation-based control uses an
equation to compute sending rate based on this input
bull See RFC 5348 for more info
CS 4284 Spring 2013
TCP Fairness
CS 4284 Spring 2013
Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk
TCP connection 1
bottleneckrouter
capacity RTCP connection 2
TCP Fairness
CS 4284 Spring 2013
Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughputConn
e ctio
n 2
thro
u ghp
ut
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
CS 4284 Spring 2013
Fairness (more)Fairness and UDPbull Multimedia apps often
do not use TCPndash do not want rate throttled
by congestion controlbull Instead use UDP
ndash pump audiovideo at constant rate tolerate packet loss
bull TCP friendliness
Fairness and parallel TCP connections
bull nothing prevents app from opening parallel connections between 2 hosts
bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets
rate R10ndash new app asks for 9 TCPs gets
R2bull Thatrsquos what ldquodownload
acceleratorsrdquo do
TCP
Flow Control
CS 4284 Spring 2013
TCP Flow Controlbull receive side of
TCP connection has a receive buffer
bull speed-matching service matching the send rate to the receiving apprsquos drain rate
bull app process may be slow at reading from buffer
sender wonrsquot overflow
receiverrsquos buffer bytransmitting too
much too fast
flow control
CS 4284 Spring 2013
TCP Flow Control (contrsquod)
(Suppose TCP receiver discards out-of-order segments)
bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Rcvr advertises spare room by including value of RcvWindow in segments
bull Sender limits unACKed data to RcvWindowndash guarantees
receive buffer doesnrsquot overflow
CS 4284 Spring 2013
Flow Control Persistence Timerbull Suppose sender is blocked cause receiver
application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size
RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck
ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update
CS 4284 Spring 2013
Flow Control Silly Window Syndrome
Clarkrsquos solution donrsquot advertise windows below a certain size
CS 4284 Spring 2013
Study of TCP Outlinebull segment structurebull reliable data transfer
ndash delayed ACKsndash Naglersquos algorithm
bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control
TCP
Connection Management
CS 4284 Spring 2013
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control info
(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr
hellip)
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
CS 4284 Spring 2013
TCP 3-way handshake
TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums
CS 4284 Spring 2013
3-way Handshake amp Delayed Dups
(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)
CS 4284 Spring 2013
Sequence Number Reuse
bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a
connection with identical sequence numbersbull Must also guard against sequence number prediction attack
ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())
CS 4284 Spring 2013
When Sequence Numbers Attack
bull Suppose attacker A can predict sequence number a host B is going to use next
bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions
based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is
likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B
CS 4284 Spring 2013
When SYNs Attackbull Servers receiving SYN must allocate resources
ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
bull Solutionndash SYN cookies
bull Server creates ACK number sends ACK ndash but does not allocate buffers
ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct
Sequence Number Summarybull Goals Set 1
ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared
ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs
bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can
eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial
sequence number
CS 4284 Spring 2013
CS 4284 Spring 2013
TCP Connection Management (cont)Closing a connection
client closes socket close(s)
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
timed
wai
t
CS 4284 Spring 2013
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
timed
wai
tclosed
CS 4284 Spring 2013
TCP ConnectionFSM
The heavy solid line is the normal path for a client
The heavy dashed line is the normal path for a server
The light lines are unusual events
Each transition is labeled by the event causing it and the action resulting from it separated by a slash
CS 4284 Spring 2013
TCP Connection Management (contrsquod)
TCP client lifecycle TCP server lifecycle
CS 4284 Spring 2013
Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if
packets (FIN ACK) can be lostndash No Famous two-army problem
CS 4284 Spring 2013
Summarybull TCP segments acknowledgements amp
retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit
bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management
schemendash SYN attackndash Sequence number prediction attacks
TCP Miscellaneous
• MSS: Maximum Segment Size option
  - Client/server agree on a larger than default MSS (default 536 outside the same subnet) using the MSS option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for the receive window, to allow for LFNs ("elephants"): Long Fat Networks
• RFC 1323 timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• …
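To see why WSCALE matters for LFNs, compare a path's bandwidth-delay product with the 65,535-byte ceiling of the 16-bit window field. This is a sketch assuming the RFC 1323 maximum shift count of 14; the function names and the 1 Gbps / 100 ms example path are hypothetical.

```python
MAX_UNSCALED_WINDOW = 65535  # largest value of the 16-bit receive window field
MAX_SHIFT = 14               # largest window scale shift permitted by RFC 1323


def bandwidth_delay_product(bits_per_sec, rtt_sec):
    """Bytes that must be in flight to keep the path full."""
    return bits_per_sec * rtt_sec / 8


def window_scale_shift(required_bytes):
    """Smallest shift s such that (65535 << s) >= required_bytes."""
    shift = 0
    while (MAX_UNSCALED_WINDOW << shift) < required_bytes:
        shift += 1
        if shift > MAX_SHIFT:
            raise ValueError("window too large even with maximum scaling")
    return shift


bdp = bandwidth_delay_product(1e9, 0.100)   # 1 Gbps path, 100 ms RTT
print(bdp, window_scale_shift(bdp))          # 12.5 MB in flight needs a shift of 8
```

Without scaling, such a sender could keep only about 0.5% of the pipe full per RTT.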
Study of TCP Outline
• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP Congestion Control
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control
• manifestations:
  - long delays (queueing in router buffers)
  - lost packets (buffer overflow at routers)
• a top-10 problem
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Hosts A and B send λin (original data) through a router with unlimited shared output link buffers; throughput λout]
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Host A sends λin (original data) and λ'in (original data plus retransmitted data) into finite shared output link buffers; Host B; throughput λout]
Causes/costs of congestion: scenario 2
• Always: λin = λout (goodput)
• a) if no loss: λ'in = λin
• b) assume a clairvoyant sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes λ'in larger for the same λout (every packet transmitted twice)
"Costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: link carries multiple copies of a packet
[Figure: λout vs. λ'in for cases (a), (b), (c), with axes up to R/2; goodput reaches R/2 in (a), R/3 in (b), and R/4 in (c)]
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λin and λ'in increase?
[Figure: Host A sends λin (original data) and λ'in (original data plus retransmitted data) over multihop paths with finite shared output link buffers; Host B; throughput λout]
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse
[Figure: throughput λout between Host A and Host B]
Reasons for Congestion Control
• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case
Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at
TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission: $\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$
• roughly, $\text{rate} = \frac{\text{CongWin}}{RTT}$ bytes/sec
• CongWin and RTT influence throughput
• CongWin is a dynamic function of perceived network congestion
How does the sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after a loss event
• (assumes congestion is the primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
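The sender-side limit can be sketched as follows. Variable names mirror the slide and all quantities are in bytes; the helper functions are hypothetical, and `approx_rate` is only the rough one-window-per-RTT estimate.

```python
def effective_window(cong_win, rcv_window):
    """The sender may have at most min(CongWin, RcvWindow) bytes unACKed."""
    return min(cong_win, rcv_window)


def bytes_sendable(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """How many more bytes may be sent right now without violating the limit."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, effective_window(cong_win, rcv_window) - in_flight)


def approx_rate(cong_win, rtt_sec):
    """Rough throughput in bytes/sec: about one window per round trip."""
    return cong_win / rtt_sec


print(bytes_sendable(10_000, 6_000, cong_win=8_000, rcv_window=20_000))  # 4000
print(approx_rate(8_000, 0.100))                                          # about 80000 bytes/sec
```

Taking the minimum is what lets one inequality serve both flow control (RcvWindow) and congestion control (CongWin).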
TCP AIMD
multiplicative decrease: cut CongWin in half after loss event
additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)
[Figure: congestion window (8, 16, 24 Kbytes) vs. time for a long-lived TCP connection, showing the AIMD sawtooth]
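The sawtooth can be reproduced with a few lines of simulation. This is a simplified sketch, not real TCP: it assumes a loss event whenever CongWin exceeds a hypothetical path `capacity` (in MSS), and it ignores slow start and timeouts.

```python
def aimd_trace(capacity, rtts, start=1.0):
    """CongWin (in MSS) at the start of each RTT for one long-lived flow.

    Simplified model (an assumption): a loss event occurs whenever
    CongWin exceeds `capacity`; no slow start, no timeouts.
    """
    cong_win = start
    trace = []
    for _ in range(rtts):
        trace.append(cong_win)
        if cong_win > capacity:
            cong_win = max(1.0, cong_win / 2)  # multiplicative decrease
        else:
            cong_win += 1                      # additive increase: +1 MSS per RTT
    return trace


print(aimd_trace(capacity=16, rtts=30, start=8))
```

Plotting the returned list against its index gives the linear ramps and halvings of the AIMD figure.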
TCP Slow Start
• When connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes & RTT = 200 msec
  - initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to a respectable rate
• When connection begins, increase rate exponentially fast until first loss event
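The slide's example rate is one segment per RTT converted to bits per second. A quick check, using the slide's MSS and RTT (the constant names are hypothetical), including how the rate grows if CongWin doubles every RTT:

```python
MSS_BYTES = 500     # from the slide's example
RTT_SEC = 0.200

# One segment per RTT while CongWin = 1 MSS.
initial_rate_bps = MSS_BYTES * 8 / RTT_SEC
print(f"initial rate = {initial_rate_bps / 1000:.0f} kbps")   # 20 kbps

# If CongWin doubles every RTT, so does the rate (until the first loss).
for k in range(5):
    rate_kbps = initial_rate_bps * 2**k / 1000
    print(f"RTT {k}: CongWin = {2**k} MSS, rate = {rate_kbps:.0f} kbps")
```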
TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
  - double CongWin every RTT
  - done by incrementing CongWin by 1 MSS for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart]
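Why does "+1 MSS per ACK" double the window every RTT? Each segment in a window returns one ACK, so a window of n segments grows by n. A sketch under the stated assumptions (no delayed ACKs, no loss, whole segments; `slow_start_rounds` is a hypothetical helper):

```python
def slow_start_rounds(rounds, mss=1):
    """CongWin at the start of each RTT when it grows by 1 MSS per ACK.

    Assumptions: every segment is ACKed individually (no delayed ACKs),
    there is no loss, and CongWin is a whole number of segments.
    """
    cong_win = mss
    sizes = []
    for _ in range(rounds):
        sizes.append(cong_win)
        segments_sent = cong_win // mss      # one full window per RTT
        for _ in range(segments_sent):       # one ACK comes back per segment
            cong_win += mss                  # +1 MSS per ACK
    return sizes


print(slow_start_rounds(5))   # [1, 2, 4, 8, 16]
```

With delayed ACKs (one ACK per two segments) the same rule grows the window more slowly, which is one reason real stacks differ in detail.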
TCP Tahoe vs. Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-duplicate-ACK event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)
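The two variants differ only in how they react to a triple duplicate ACK. A hedged sketch with a hypothetical `on_loss` helper; it assumes Tahoe's commonly described behaviour of returning to 1 MSS on either kind of loss event, and sizes are in bytes.

```python
def on_loss(variant, event, cong_win, mss):
    """Return (threshold, new_cong_win) after a loss event.

    variant: "tahoe" or "reno"; event: "timeout" or "triple_dup_ack".
    """
    if variant not in ("tahoe", "reno") or event not in ("timeout", "triple_dup_ack"):
        raise ValueError(f"unknown variant/event: {variant}/{event}")
    threshold = cong_win / 2                 # 1/2 of CongWin just before the loss
    if variant == "reno" and event == "triple_dup_ack":
        return threshold, threshold          # fast recovery: continue linearly
    return threshold, mss                    # restart from 1 MSS, then slow start


print(on_loss("tahoe", "triple_dup_ack", 16_000, 1_000))  # (8000.0, 1000)
print(on_loss("reno", "triple_dup_ack", 16_000, 1_000))   # (8000.0, 8000.0)
```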
Timeouts vs. 3-dup ACKs
• After 3 dup ACKs:
  - CongWin is cut in half
  - window then grows linearly
• But after a timeout event:
  - CongWin instead set to 1 MSS
  - window then grows exponentially
  - to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate the network is capable of delivering some segments
• a timeout before 3 dup ACKs is a stronger, "more alarming" indicator of congestion
Summary: TCP Congestion Control
[Figure: congestion control FSM with states slow start, congestion avoidance, and fast recovery]
Initially: cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0 → slow start
slow start:
  • new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
  • duplicate ACK: dupACKcount++
  • when cwnd > ssthresh: → congestion avoidance
  • timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
  • dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment → fast recovery
congestion avoidance:
  • new ACK: cwnd = cwnd + MSS·(MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
  • duplicate ACK: dupACKcount++
  • timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment → slow start
  • dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment → fast recovery
fast recovery:
  • duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
  • new ACK: cwnd = ssthresh; dupACKcount = 0 → congestion avoidance
  • timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment → slow start
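The FSM translates almost line for line into an event-driven class. This is a minimal sketch, assuming sizes in bytes and a hypothetical `RenoSender` class; it tracks only the window variables and leaves the actual (re)transmissions as comments.

```python
class RenoSender:
    """Sketch of the congestion-control FSM: slow start, congestion
    avoidance, fast recovery. Only window state is modelled."""

    def __init__(self, mss=1000, ssthresh=64 * 1024):
        self.mss = mss
        self.cwnd = mss               # initially cwnd = 1 MSS
        self.ssthresh = ssthresh      # initially ssthresh = 64 KB
        self.dup_acks = 0
        self.state = "slow_start"

    def new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh                      # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += self.mss                          # doubles cwnd every RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += self.mss * self.mss / self.cwnd   # about +1 MSS per RTT
        self.dup_acks = 0                                  # then transmit new segment(s)

    def dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += self.mss                          # then transmit new segment(s)
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = "fast_recovery"                   # retransmit missing segment

    def timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.mss
        self.dup_acks = 0
        self.state = "slow_start"                          # retransmit missing segment


sender = RenoSender(mss=1000, ssthresh=4000)
for _ in range(4):
    sender.new_ack()
print(sender.state, sender.cwnd)   # congestion_avoidance 5000
```

Feeding it a recorded sequence of ACK, duplicate ACK and timeout events is an easy way to trace the FSM by hand.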
Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS
TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unACKed data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Results in a doubling of CongWin every RTT
ACK receipt for previously unACKed data | Congestion Avoidance (CA) | CongWin = CongWin + MSS·(MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being ACKed | CongWin and Threshold not changed
TCP Throughput (Idealized)
• What's the average throughput of TCP as a function of window size and RTT?
  - Long-lived connection; ignore slow start
• When window is W, throughput is $W/RTT$
• Just after loss, window drops to $W/2$, throughput to $W/(2\,RTT)$
• Average steady-state throughput: $0.75\,W/RTT$
[Figure: window oscillating between W/2 and W]
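The 0.75 factor follows from the sawtooth: between losses the window climbs linearly from W/2 to W, so its time average is the midpoint of that range. This derivation assumes, as on the slide, a long-lived connection with no slow start and a constant RTT.

```latex
\bar{W} = \frac{1}{2}\left(\frac{W}{2} + W\right) = \frac{3}{4}\,W
\qquad\Longrightarrow\qquad
\overline{\text{throughput}} \approx \frac{\bar{W}}{RTT} = 0.75\,\frac{W}{RTT}
```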
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
  $\text{throughput} = \frac{1.22 \cdot MSS}{RTT\,\sqrt{L}}$
• ⇒ $L = 2 \cdot 10^{-10}$. Very low!
• Requires an almost perfect link
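Both numbers in the example can be checked directly: W comes from throughput = W·MSS/RTT, and L from solving the loss formula for L. A sketch with hypothetical helper names; MSS is converted to bits because the target throughput is in bits/sec.

```python
def window_for_throughput(throughput_bps, rtt_sec, mss_bytes):
    """Segments in flight needed, from throughput = W * MSS / RTT."""
    return throughput_bps * rtt_sec / (mss_bytes * 8)


def loss_rate_for_throughput(throughput_bps, rtt_sec, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    return (1.22 * mss_bytes * 8 / (rtt_sec * throughput_bps)) ** 2


W = window_for_throughput(10e9, 0.100, 1500)
L = loss_rate_for_throughput(10e9, 0.100, 1500)
print(f"W = {W:,.0f} segments, L = {L:.1e}")   # W = 83,333 segments, L = 2.1e-10
```

The loss rate enters under a square root, so halving the RTT only relaxes the required L by a factor of four.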
Equation-Based Control
• Note: TCP congestion control forms a control loop
  - Inputs: round-trip time, "loss events" (which are samples of timeout events + triple-duplicate-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info
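For a concrete picture, here is a sketch of the TCP throughput equation that RFC 5348 (TFRC) uses to turn RTT and loss event rate into a sending rate. The defaults b = 1 and t_RTO = 4R are assumptions reflecting our reading of RFC 5348's recommendations; check the RFC before relying on them. `tfrc_rate` is a hypothetical name. For small p the first term dominates, giving roughly 1.22·s/(R·√p).

```python
import math


def tfrc_rate(s, rtt, p, b=1, t_rto=None):
    """Allowed sending rate X in bytes/sec from the TCP throughput equation.

    s: segment size (bytes); rtt: round-trip time R (sec);
    p: loss event rate (0 < p <= 1); b: packets acknowledged per ACK;
    t_rto: retransmission timeout in sec (assumed default: 4 * R).
    """
    if t_rto is None:
        t_rto = 4 * rtt
    denom = (rtt * math.sqrt(2 * b * p / 3)
             + t_rto * (3 * math.sqrt(3 * b * p / 8) * p * (1 + 32 * p ** 2)))
    return s / denom


for p in (0.0001, 0.01, 0.05):
    print(p, round(tfrc_rate(1500, 0.100, p)))
```

Unlike AIMD, the rate computed this way changes smoothly as the measured loss event rate changes, which is the point of equation-based control for media flows.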
TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connections 1 and 2 sharing a bottleneck router of capacity R]
Why is TCP fair?
Two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput vs. Connection 1 throughput, both axes up to R; alternating congestion avoidance (additive increase) and loss (decrease window by factor of 2) moves the operating point toward the equal bandwidth share line]
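The geometric argument can be checked numerically: additive increase leaves the gap between two flows unchanged, while each synchronized halving cuts it in half. This is a simplified sketch that assumes both flows see every loss at the same time; `two_flow_aimd` and its parameters are hypothetical.

```python
def two_flow_aimd(x1, x2, capacity, rtts):
    """Throughputs (in MSS per RTT) of two AIMD flows sharing one bottleneck.

    Simplified model: both flows add 1 each RTT; when their sum exceeds
    `capacity`, both see a loss and halve (synchronized losses assumed).
    """
    for _ in range(rtts):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2    # multiplicative decrease halves the gap
        else:
            x1, x2 = x1 + 1, x2 + 1    # additive increase keeps the gap constant
    return x1, x2


print(two_flow_aimd(10, 80, capacity=100, rtts=1000))  # both near the equal share
```

Under additive-increase/additive-decrease the gap would never shrink, which is why the multiplicative decrease is essential for fairness.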
Fairness (more)
Fairness and UDP:
• Multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
• Instead use UDP:
  - pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections:
• nothing prevents an app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
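The R/10 and R/2 figures follow from per-connection fairness: the new app's share is its connection count over the total. A sketch assuming every connection gets an exactly equal share; `new_app_share` is a hypothetical helper.

```python
from fractions import Fraction


def new_app_share(existing, new):
    """Fraction of bottleneck rate R a new app gets with `new` connections,
    assuming every TCP connection on the link gets an equal share."""
    return Fraction(new, existing + new)


print(new_app_share(9, 1))   # 1/10 -> rate R/10
print(new_app_share(9, 9))   # 1/2  -> rate R/2
```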
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
TCP Flow Control (cont'd)
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn't overflow
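The receiver's bookkeeping can be sketched with the slide's three variables. The class and method names are hypothetical, all counts are in bytes, and, as on the slide, only in-order data is modelled.

```python
class ReceiveBuffer:
    """Receiver-side flow control state (byte counts, in-order data only)."""

    def __init__(self, rcv_buffer):
        self.rcv_buffer = rcv_buffer
        self.last_byte_rcvd = 0
        self.last_byte_read = 0

    def rcv_window(self):
        """Spare room to advertise: RcvBuffer - [LastByteRcvd - LastByteRead]."""
        return self.rcv_buffer - (self.last_byte_rcvd - self.last_byte_read)

    def receive(self, nbytes):
        """Data arrives from the network."""
        if nbytes > self.rcv_window():
            raise OverflowError("sender ignored the advertised window")
        self.last_byte_rcvd += nbytes

    def app_read(self, nbytes):
        """The application drains up to nbytes; returns how many it got."""
        taken = min(nbytes, self.last_byte_rcvd - self.last_byte_read)
        self.last_byte_read += taken
        return taken


buf = ReceiveBuffer(4096)
buf.receive(3000)
print(buf.rcv_window())   # 1096
buf.app_read(1000)
print(buf.rcv_window())   # 2096
```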
Flow Control: Persistence Timer
• Suppose sender is blocked because receiver application hasn't picked up data
• Then receiver app reads n bytes
• TCP receiver advertises new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
  - Solution: persistence timer: sender sends a probe after a period of inactivity to provoke a window update
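A toy tick-based model makes the deadlock and its fix concrete. Everything here is hypothetical (function name, tick granularity, the 1000-byte read). It relies on one fact: a pure window update carries no data and is not retransmitted, so once lost it is gone unless the sender asks again.

```python
def time_until_resume(update_lost, persist_interval, horizon=100,
                      use_persist_timer=True):
    """Tick at which a sender blocked by a zero window learns it may send.

    Toy model: the receiver's window is 0 until tick 1, when its app reads
    1000 bytes and a window update is sent. A lost update is never resent.
    Returns None if the sender is still stuck at the horizon.
    """
    receiver_window = 0   # what the receiver would advertise now
    sender_view = 0       # the window the sender believes is open
    idle_ticks = 0
    for tick in range(horizon):
        if tick == 1:
            receiver_window = 1000            # receiver app read n = 1000 bytes
            if not update_lost:
                sender_view = receiver_window
        if sender_view > 0:
            return tick
        idle_ticks += 1
        if use_persist_timer and idle_ticks >= persist_interval:
            sender_view = receiver_window     # probe elicits an ACK with the current window
            idle_ticks = 0
    return None


print(time_until_resume(update_lost=True, persist_interval=5))                           # 5
print(time_until_resume(update_lost=True, persist_interval=5, use_persist_timer=False))  # None
```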
Flow Control: Silly Window Syndrome
Clark's solution: don't advertise windows below a certain size
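One common way to pick "a certain size" is min(MSS, RcvBuffer/2), the threshold RFC 1122 uses for receiver-side SWS avoidance as we understand it. The sketch simplifies further by advertising 0 until free space reaches that threshold; treat both the threshold and the hypothetical `advertised_window` helper as assumptions.

```python
def advertised_window(free_bytes, rcv_buffer, mss):
    """Receiver-side silly window avoidance (simplified sketch).

    Advertise nothing until the free space is worth a full-sized segment
    or half the buffer, whichever is smaller (assumed threshold).
    """
    threshold = min(mss, rcv_buffer // 2)
    return free_bytes if free_bytes >= threshold else 0


print(advertised_window(100, 65536, 1460))    # 0: too small to be worth advertising
print(advertised_window(1460, 65536, 1460))   # 1460
```

Without the threshold, a receiver whose app reads one byte at a time would invite the sender to send one-byte segments, each carrying 40 bytes of headers.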
TCP Connection Management
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator: connect(s, &dstaddr, …)
• server: contacted by client: cl = accept(sv, &caddr, …)
Three-way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
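The sequence and acknowledgement numbers in the three steps follow one rule: a SYN consumes one sequence number, so each side acknowledges the other's ISN plus one. A sketch with a hypothetical helper that returns the segments as (flags, seq, ack) tuples, using modulo-2^32 arithmetic and random ISNs unless fixed ones are passed in:

```python
import random


def three_way_handshake(client_isn=None, server_isn=None):
    """Return the three handshake segments as (flags, seq, ack) tuples."""
    x = client_isn if client_isn is not None else random.getrandbits(32)
    y = server_isn if server_isn is not None else random.getrandbits(32)
    mod = 2 ** 32
    syn = ("SYN", x, None)                        # Step 1: client ISN x, no data
    synack = ("SYN+ACK", y, (x + 1) % mod)        # Step 2: server ISN y; the SYN used seq x
    ack = ("ACK", (x + 1) % mod, (y + 1) % mod)   # Step 3: may already carry data
    return [syn, synack, ack]


for segment in three_way_handshake(100, 300):
    print(segment)
```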
TCP 3-way handshake
[Figure: TCP connection establishment]
• Q1: why 3-way and not 2-way handshake?
• Q2: how do sender & receiver determine initial seq. #s?
3-way Handshake & Delayed Dups
[Figure: (a) Normal operation; (b) Old SYN appearing out of nowhere; (c) Duplicate SYN and duplicate ACK following SYN]
3-way handshake required to deal with scenarios (b) and (c)
Sequence Number Reuse
• Idea: tie initial TCP seq. numbers to a clock
  - Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  - Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock value + F(src, dst, sport, dport, random())
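The RFC 1948 formula combines both goals: the clock term keeps ISNs moving forward over time, and the keyed hash makes them unpredictable to an outsider. A sketch, assuming a hypothetical per-boot `SECRET` in the role of the slide's random() input. RFC 1948 suggests MD5 for F; SHA-256 is swapped in here.

```python
import hashlib
import os

SECRET = os.urandom(16)   # hypothetical per-boot secret key


def rfc1948_style_isn(src, dst, sport, dport, clock_us, secret=SECRET):
    """ISN = M + F(connection id, secret), truncated to 32 bits.

    M ticks once every 4 microseconds; F is a keyed hash of the
    connection 4-tuple (SHA-256 here instead of RFC 1948's MD5).
    """
    m = clock_us // 4
    conn_id = f"{src}|{dst}|{sport}|{dport}".encode()
    f = int.from_bytes(hashlib.sha256(conn_id + secret).digest()[:4], "big")
    return (m + f) % 2**32


a = rfc1948_style_isn("10.0.0.1", "10.0.0.2", 40000, 80, clock_us=1_000_000)
b = rfc1948_style_isn("10.0.0.1", "10.0.0.2", 40000, 80, clock_us=1_000_004)
print((b - a) % 2**32)   # 1: same connection id, 4 µs later
```

Because F differs per 4-tuple, observing the ISN of one connection tells an attacker little about the ISN of another.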
When Sequence Numbers Attack
• Suppose attacker A can predict the sequence number host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  - B believes it is talking to C, and might grant permissions based on C's IP address
  - Attacker on A must suppress the RST packets C is likely to send: use a denial-of-service attack for that
• A sends message to compromise B
When SYNs Attack
• Servers receiving SYN must allocate resources
  - Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution: SYN cookies
  - Server computes its initial sequence number (the cookie) and sends SYNACK, but does not allocate buffers
  - If client continues with ACK, check whether the acknowledged number could have been sent; if correct, then allocate buffers
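A simplified SYN cookie: the server derives its ISN from a keyed hash of the connection 4-tuple and a coarse time slot, stores nothing, and later recomputes the value to validate the client's ACK. `SECRET`, `SLOT_SECONDS` and the function names are hypothetical. Real SYN cookies also pack the MSS and a timestamp counter into the ISN bits, which this sketch omits.

```python
import hashlib
import hmac

SECRET = b"hypothetical server secret"
SLOT_SECONDS = 64                       # hypothetical time-slot length


def _cookie(src, dst, sport, dport, slot):
    msg = f"{src}|{dst}|{sport}|{dport}|{slot}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def make_synack_seq(src, dst, sport, dport, now):
    """Server ISN for the SYNACK; no per-connection state is stored."""
    return _cookie(src, dst, sport, dport, int(now // SLOT_SECONDS))


def ack_is_valid(src, dst, sport, dport, ack_num, now):
    """On the client's ACK: accept if ack_num - 1 matches a recent cookie."""
    slot = int(now // SLOT_SECONDS)
    for s in (slot, slot - 1):          # tolerate a slot boundary between SYN and ACK
        if (_cookie(src, dst, sport, dport, s) + 1) % 2**32 == ack_num:
            return True                 # only now allocate buffers
    return False


t = 1000.0
isn = make_synack_seq("198.51.100.7", "192.0.2.1", 51000, 80, t)
print(ack_is_valid("198.51.100.7", "192.0.2.1", 51000, 80, (isn + 1) % 2**32, t + 5))  # True
```

The trade-off is that the server forgets the SYN's options unless it encodes them in the cookie, which is why real implementations squeeze the MSS into a few ISN bits.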
Sequence Number Summary
• Goals Set 1
  - Guard against old duplicates in one connection → don't reuse sequence numbers until it's reasonably certain that old duplicates have disappeared
  - Even after a crash/restart → hence tie ISNs to time → but don't use a global clock → hence need a 3-way handshake where each side verifies the sequence number chosen by the other side before the connection is established
• Goals Set 2
  - Don't allow hijacking of connections unless the attacker can eavesdrop: use a PRNG for the initial sequence number choice
  - Don't allow SYN attacks: compute, but don't store, the initial sequence number
CS 4284 Spring 2013
CS 4284 Spring 2013
TCP Connection Management (cont)Closing a connection
client closes socket close(s)
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
timed
wai
t
CS 4284 Spring 2013
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
timed
wai
tclosed
CS 4284 Spring 2013
TCP ConnectionFSM
The heavy solid line is the normal path for a client
The heavy dashed line is the normal path for a server
The light lines are unusual events
Each transition is labeled by the event causing it and the action resulting from it separated by a slash
CS 4284 Spring 2013
TCP Connection Management (contrsquod)
TCP client lifecycle TCP server lifecycle
CS 4284 Spring 2013
Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if
packets (FIN ACK) can be lostndash No Famous two-army problem
CS 4284 Spring 2013
Summarybull TCP segments acknowledgements amp
retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit
bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management
schemendash SYN attackndash Sequence number prediction attacks
CS 4284 Spring 2013
TCP Miscellaneousbull MSS Maximum Segment Size Option
ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN
bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to
allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT
measurement PAWS for protection against wrap-around for sequence numbers
bull hellip
CS 4284 Spring 2013
Study of TCP Outlinebull segment structurebull reliable data transfer
ndash delayed ACKsndash Naglersquos algorithm
bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control
TCP
Congestion Control
CS 4284 Spring 2013
Principles of Congestion Control
Congestionbull informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull different from flow controlbull manifestations
ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)
bull a top-10 problem
CS 4284 Spring 2013
Causescosts of congestion scenario 1 bull two senders two
receiversbull one router infinite
buffers bull no retransmission
bull large delays when congested
bull (but no reduction in throughput here)
unlimited shared output link buffers
Host Alin original data
Host B
lout
CS 4284 Spring 2013
Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet
after timeout
finite shared output link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2lin
l out
a c
R2
R2lin
l out
R4
R2
R2lin
l out
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit
Q what happens as lin and lrsquoin increase
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission
capacity used for that packet was wastedbull ultimately leads to congestion collapse
Host A
Host B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even if no packet loss occurs
bull If packet loss occurs needed retransmission require offered load to be greater than goodput
bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion control
bull no explicit feedback from network
bull congestion inferred from end-system observed loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systemsndash single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Controlbull end-end control (no network
assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput
bull CongWin is dynamic function of perceived network congestion
How does sender notice congestion
bull loss event = timeout or 3 duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is primary cause of loss)
Three mechanismsbull AIMDbull slow startbull fast recovery
rate = CongWin
RTT Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
multiplicative decrease cut CongWin in half after loss event
additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Startbull When connection
begins CongWin = 1 MSSndash Example MSS =
500 bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly
ramp up to respectable rate
bull When connection begins increase rate exponentially fast until first loss event
CS 4284 Spring 2013
TCP Slow Start (more)bull When connection begins
increase rate exponentially until first loss eventndash double CongWin every
RTTndash done by incrementing CongWin for every ACK received
bull Summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
CS 4284 Spring 2013
TCP Tahoe vs RenoQ When should the
exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss
eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly
(this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKsbull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout eventndash CongWin instead set
to 1 MSS ndash window then grows
exponentiallyndash to a threshold then
grows linearly
bull 3 dup ACKs indicates network capable of delivering some segments
bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
CS 4284 Spring 2013
TCP Sender Congestion ControlEvent State TCP Sender Action Commentary
ACK receipt for previously unacked data
Slow Start (SS)
CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA
Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data
CongestionAvoidance (CA)
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK
SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter slow start
Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
CS 4284 Spring 2013
TCP Throughput Idealized
bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start
bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput
to W2RTT bull Average steady-state throughput 75 WRTT
W
W2
CS 4284 Spring 2013
TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT
want 10 Gbps throughputbull Requires window size W = 83333 in-flight
segmentsbull Throughput in terms of loss rate
bull L = 210-10 Very lowbull Require almost perfect link
LRTTMSS221
Equation-Based Control
bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which
are samples of timeout events + 3-ack events)bull Instead equation-based control uses an
equation to compute sending rate based on this input
bull See RFC 5348 for more info
CS 4284 Spring 2013
TCP Fairness
CS 4284 Spring 2013
Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk
TCP connection 1
bottleneckrouter
capacity RTCP connection 2
TCP Fairness
CS 4284 Spring 2013
Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughputConn
e ctio
n 2
thro
u ghp
ut
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
CS 4284 Spring 2013
Fairness (more)Fairness and UDPbull Multimedia apps often
do not use TCPndash do not want rate throttled
by congestion controlbull Instead use UDP
ndash pump audiovideo at constant rate tolerate packet loss
bull TCP friendliness
Fairness and parallel TCP connections
bull nothing prevents app from opening parallel connections between 2 hosts
bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets
rate R10ndash new app asks for 9 TCPs gets
R2bull Thatrsquos what ldquodownload
acceleratorsrdquo do
CS 4284 Spring 2013
TCP Flow Control (contrsquod)
(Suppose TCP receiver discards out-of-order segments)
bull spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Rcvr advertises spare room by including value of RcvWindow in segments
bull Sender limits unACKed data to RcvWindowndash guarantees
receive buffer doesnrsquot overflow
CS 4284 Spring 2013
Flow Control Persistence Timerbull Suppose sender is blocked cause receiver
application hasnrsquot picked up databull Then receiver app reads n bytesbull TCP receiver advertises new window of size
RcvWindow = nbull But suppose this advertisement is lostbull Sender would be stuck
ndash Solution persistence timer sender sends probe after a period of inactivity to provoke window update
CS 4284 Spring 2013
Flow Control Silly Window Syndrome
Clarkrsquos solution donrsquot advertise windows below a certain size
CS 4284 Spring 2013
Study of TCP Outlinebull segment structurebull reliable data transfer
ndash delayed ACKsndash Naglersquos algorithm
bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control
TCP
Connection Management
CS 4284 Spring 2013
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control info
(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr
hellip)
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
CS 4284 Spring 2013
TCP 3-way handshake
TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums
CS 4284 Spring 2013
3-way Handshake amp Delayed Dups
(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)
CS 4284 Spring 2013
Sequence Number Reuse
bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a
connection with identical sequence numbersbull Must also guard against sequence number prediction attack
ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())
CS 4284 Spring 2013
When Sequence Numbers Attack
bull Suppose attacker A can predict sequence number a host B is going to use next
bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions
based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is
likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B
CS 4284 Spring 2013
When SYNs Attackbull Servers receiving SYN must allocate resources
ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
bull Solutionndash SYN cookies
bull Server creates ACK number sends ACK ndash but does not allocate buffers
ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct
Sequence Number Summarybull Goals Set 1
ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared
ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs
bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can
eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial
sequence number
CS 4284 Spring 2013
CS 4284 Spring 2013
TCP Connection Management (cont)Closing a connection
client closes socket close(s)
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
timed
wai
t
CS 4284 Spring 2013
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
timed
wai
tclosed
CS 4284 Spring 2013
TCP ConnectionFSM
The heavy solid line is the normal path for a client
The heavy dashed line is the normal path for a server
The light lines are unusual events
Each transition is labeled by the event causing it and the action resulting from it separated by a slash
CS 4284 Spring 2013
TCP Connection Management (contrsquod)
TCP client lifecycle TCP server lifecycle
CS 4284 Spring 2013
Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if
packets (FIN ACK) can be lostndash No Famous two-army problem
Summary
• TCP segments, acknowledgements & retransmission
  - Delayed ACKs, Nagle's algorithm
  - Fast retransmit
• RTT estimation & Karn's algorithm
• Flow control & silly window syndrome
• Connection management in TCP
• Attacks against TCP's connection management scheme
  - SYN attack
  - Sequence number prediction attacks
TCP Miscellaneous
• MSS: Maximum Segment Size option
  - Client/server agree on a larger than default (536 outside same subnet) MSS via an option on the SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for the receive window, to allow for LFNs ("elephants"), i.e. Long Fat Networks
• RFC 1323 timestamps for accurate RTT measurement; PAWS for protection against wrapped-around sequence numbers
• …
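WSCALE matters for long fat networks because the bandwidth-delay product quickly exceeds what the 16-bit receive window field can express. The sketch below compares the two. It is illustrative only. The 1 Gbps / 100 ms path is an assumed example, and the shift limit of 14 is the maximum the window scale option permits (RFC 1323, since updated by RFC 7323).

```python
MAX_UNSCALED_WINDOW = 65535  # largest value of the 16-bit receive window field
MAX_SHIFT = 14               # largest window scale shift count permitted


def bandwidth_delay_product(bits_per_sec, rtt_sec):
    """Bytes that must be in flight to keep the path full."""
    return bits_per_sec * rtt_sec / 8


def required_wscale(window_bytes):
    """Smallest shift s such that (65535 << s) >= window_bytes."""
    for shift in range(MAX_SHIFT + 1):
        if MAX_UNSCALED_WINDOW << shift >= window_bytes:
            return shift
    raise ValueError("window too large even with the maximum shift")


def effective_window(window_field, shift):
    """Window the receiver actually grants: the header field times 2**shift."""
    return window_field << shift
```

On the assumed 1 Gbps, 100 ms path the BDP is 12.5 MB, far above 64 KB. A shift of 8 is the smallest that covers it, because 65535 << 8 = 16,776,960 bytes.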
Study of TCP Outline
• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP Congestion Control
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control!
• manifestations:
  - long delays (queueing in router buffers)
  - lost packets (buffer overflow at routers)
• a top-10 problem!
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A sends original data at rate λin through a router with unlimited shared output link buffers; Host B receives at rate λout]
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmits lost packet after timeout
[Figure: Host A offers λin (original data) and λ'in (original data plus retransmitted data) to a router with finite shared output link buffers; Host B receives λout]
Causes/costs of congestion: scenario 2
• Always: λin = λout (goodput)
• a) if no loss: λ'in = λin
• b) assume clairvoyant sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes λ'in larger for the same λout (every packet transmitted twice)
"Costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: link carries multiple copies of a packet
[Figure: three plots (a), (b), (c) of λout against offered load up to R/2; maximum λout is R/2 in (a), R/3 in (b), R/4 in (c)]
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λin and λ'in increase?
[Figure: Host A offers λin (original data) and λ'in (original data plus retransmitted data) over multihop paths through routers with finite shared output link buffers; Host B receives λout]
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted!
• ultimately leads to congestion collapse
[Figure: throughput λout from Host A to Host B under scenario 3]
Reasons for Congestion Control
• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require offered load to be greater than goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case
Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at
TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission:
  LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• CongWin and RTT influence throughput: roughly, rate ≈ CongWin/RTT bytes/sec
• CongWin is a dynamic function of perceived network congestion
How does the sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after a loss event
• (assumes congestion is the primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
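The sender-side limit can be written down directly. This is a minimal sketch. It assumes all quantities are byte counts and ignores sequence number wraparound, and the function names are hypothetical.

```python
def usable_window(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """Bytes the sender may still put on the wire right now.

    Enforces LastByteSent - LastByteAcked <= min(CongWin, RcvWindow).
    """
    in_flight = last_byte_sent - last_byte_acked
    limit = min(cong_win, rcv_window)
    return max(0, limit - in_flight)   # never negative: just wait for ACKs


def approx_rate(cong_win_bytes, rtt_sec):
    """Rough sending rate in bytes/sec: about one window per round trip."""
    return cong_win_bytes / rtt_sec
```

Take 10,000 bytes in flight with CongWin = 16,000. If RcvWindow = 64,000, the sender may send 6,000 more bytes. If RcvWindow = 8,000, it must wait, because the receiver's window is now the tighter limit.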
TCP AIMD
multiplicative decrease: cut CongWin in half after loss event
additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing
[Figure: congestion window (8, 16, 24 Kbytes) versus time for a long-lived TCP connection, showing a sawtooth]
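The sawtooth of a long-lived connection can be reproduced in a few lines. This sketch measures the window in MSS and updates it once per RTT. It takes the RTTs in which a loss occurs as a given input, which is a hypothetical simplification, so it shows only the AIMD rule itself and leaves out slow start and timeouts.

```python
def aimd_trace(num_rtts, loss_at, initial_cwnd=8.0):
    """CongWin (in MSS) at the start of each RTT under pure AIMD.

    loss_at: set of RTT indices in which a loss event is detected.
    """
    cwnd = initial_cwnd
    trace = []
    for rtt in range(num_rtts):
        trace.append(cwnd)
        if rtt in loss_at:
            cwnd = max(1.0, cwnd / 2)  # multiplicative decrease, never below 1 MSS
        else:
            cwnd += 1.0                # additive increase: probe for bandwidth
    return trace
```

`aimd_trace(6, {3})` climbs 8, 9, 10, 11, halves to 5.5 after the loss in RTT 3, and then resumes climbing at 6.5.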
TCP Slow Start
• When connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes & RTT = 200 msec
  - initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to a respectable rate
• When connection begins, increase rate exponentially fast until first loss event
TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
  - double CongWin every RTT
  - done by incrementing CongWin by 1 MSS for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart]
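This sketch shows why "+1 MSS per ACK" amounts to doubling per RTT, and it checks the 20 kbps figure from the example. It assumes no loss, one ACK per segment (no delayed ACKs), and a sender that always has data and is never limited by RcvWindow. The function names are hypothetical.

```python
def initial_rate_bps(mss_bytes, rtt_sec):
    """Rate with CongWin = 1 MSS: one segment per RTT, in bits/sec."""
    return mss_bytes * 8 / rtt_sec


def slow_start_rounds(num_rtts, initial_cwnd=1):
    """Segments sent in each RTT when every ACK grows CongWin by 1 MSS."""
    cwnd = initial_cwnd            # congestion window, in MSS
    per_round = []
    for _ in range(num_rtts):
        per_round.append(cwnd)     # a full window is sent this RTT
        acks = cwnd                # one ACK comes back per segment sent
        for _ in range(acks):
            cwnd += 1              # each ACK adds 1 MSS
    return per_round
```

`slow_start_rounds(4)` gives 1, 2, 4, 8 segments per RTT. `initial_rate_bps(500, 0.2)` gives 20,000 bits/sec, which matches the 20 kbps example.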
TCP Tahoe vs. Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ACK event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)
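The two variants differ only in the triple-duplicate-ACK case. In this sketch window sizes are in units of MSS and `on_loss` is a hypothetical helper. Tahoe is modeled as returning to 1 MSS on every loss event, since it lacks fast recovery.

```python
def on_loss(variant, event, cong_win):
    """Return (new CongWin, new Threshold), with windows measured in MSS.

    variant: "tahoe" or "reno"; event: "timeout" or "triple_dup_ack".
    """
    if variant not in ("tahoe", "reno") or event not in ("timeout", "triple_dup_ack"):
        raise ValueError(f"unknown variant/event: {variant}/{event}")
    threshold = cong_win / 2              # both: Threshold = 1/2 CongWin before loss
    if variant == "tahoe" or event == "timeout":
        return 1, threshold               # back to 1 MSS, slow start up to Threshold
    return max(1, threshold), threshold   # Reno fast recovery: resume linearly at Threshold
```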
Timeouts vs. 3 dup ACKs
• After 3 dup ACKs:
  - CongWin is cut in half
  - window then grows linearly
• But after a timeout event:
  - CongWin is instead set to 1 MSS
  - window then grows exponentially
  - to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate the network is capable of delivering some segments
• a timeout before 3 dup ACKs are received is a stronger ("more alarming") indicator of congestion
Summary: TCP Congestion Control
Initially (Λ): cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0; enter slow start
slow start:
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• Λ, when cwnd > ssthresh: go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3·MSS; retransmit missing segment; go to fast recovery
congestion avoidance:
• new ACK: cwnd = cwnd + MSS·(MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3·MSS; retransmit missing segment; go to fast recovery
fast recovery:
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
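The congestion control FSM maps almost line for line onto an event-driven class. This is a sketch, not a TCP implementation. The class and method names are hypothetical, the 1460-byte MSS is an assumed value, and actual transmission is reduced to a retransmission counter. The `cwnd > ssthresh` check after each new ACK stands in for the Λ transition from slow start to congestion avoidance.

```python
MSS = 1460                    # assumed segment size in bytes
INITIAL_SSTHRESH = 64 * 1024  # 64 KB, as in the FSM's initial transition


class RenoSender:
    """Sketch of slow start, congestion avoidance and fast recovery.

    Only cwnd, ssthresh, dupACKcount and the state are modeled; sending and
    retransmitting are reduced to a counter.
    """

    def __init__(self, mss=MSS, ssthresh=INITIAL_SSTHRESH):
        self.mss = mss
        self.cwnd = 1 * mss          # cwnd = 1 MSS
        self.ssthresh = ssthresh
        self.dup_acks = 0            # dupACKcount
        self.state = "slow_start"
        self.retransmissions = 0

    def _retransmit_missing_segment(self):
        self.retransmissions += 1    # stand-in for resending the oldest unACKed segment

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh        # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += self.mss            # doubles cwnd every RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += self.mss * self.mss / self.cwnd  # about +1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += self.mss            # another segment has left the network
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self._retransmit_missing_segment()
            self.state = "fast_recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1 * self.mss
        self.dup_acks = 0
        self._retransmit_missing_segment()
        self.state = "slow_start"
```

Start with `RenoSender(mss=1, ssthresh=4)`. Four new ACKs take cwnd from 1 to 5 and into congestion avoidance. Three duplicate ACKs then halve the window and enter fast recovery, and the next new ACK deflates cwnd to ssthresh.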
Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS
TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Results in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS·(MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being ACKed | CongWin and Threshold not changed
TCP Throughput (Idealized)
• What's the average throughput of TCP as a function of window size and RTT?
  - Long-lived connection; ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT
[Figure: window oscillating between W/2 and W]
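Here is where the 0.75 comes from. In the idealized sawtooth, the window grows linearly from W/2 back to W between losses, so the time-averaged throughput is the midpoint of the two extremes. This assumes the slide's idealized model: a long-lived connection with no slow start and no timeouts.

```latex
\overline{\text{throughput}}
  = \frac{1}{2}\left(\frac{W/2}{\mathrm{RTT}} + \frac{W}{\mathrm{RTT}}\right)
  = \frac{3}{4}\cdot\frac{W}{\mathrm{RTT}}
  = 0.75\,\frac{W}{\mathrm{RTT}}
```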
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L: throughput = $\frac{1.22 \cdot \mathrm{MSS}}{\mathrm{RTT} \cdot \sqrt{L}}$
• ⇒ $L = 2 \cdot 10^{-10}$. Very low!
• Requires an almost perfect link
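Both numbers in the example can be checked with a few lines. The sketch uses the 1500-byte, 100 ms, 10 Gbps figures and the relation throughput = 1.22 · MSS / (RTT · √L) from the slide, converting MSS to bits. The function names are hypothetical.

```python
import math


def window_for_throughput(target_bps, rtt_sec, mss_bytes):
    """In-flight segments needed: W = throughput * RTT / MSS."""
    return target_bps * rtt_sec / (mss_bytes * 8)


def throughput_from_loss(loss_rate, rtt_sec, mss_bytes):
    """throughput = 1.22 * MSS / (RTT * sqrt(L)), in bits/sec."""
    return 1.22 * mss_bytes * 8 / (rtt_sec * math.sqrt(loss_rate))


def loss_for_throughput(target_bps, rtt_sec, mss_bytes):
    """The same relation solved for the loss rate L."""
    return (1.22 * mss_bytes * 8 / (rtt_sec * target_bps)) ** 2
```

`window_for_throughput(10e9, 0.1, 1500)` is about 83,333 segments. `loss_for_throughput(10e9, 0.1, 1500)` is about 2.1·10^-10, so a 10 Gbps TCP flow can tolerate roughly one loss every five billion segments.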
Equation-Based Control
• Note: TCP congestion control forms a control loop
  - Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-dup-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on these inputs
• See RFC 5348 for more info
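For a concrete case, here is a sketch of the TCP throughput equation used by TFRC (RFC 5348, Section 3.1). It computes a sending rate in bytes/second from the segment size s, the round-trip time R and the loss event rate p. The defaults b = 1 (packets acknowledged per ACK) and t_RTO = 4·R follow the simplifications suggested in the RFC, but treat them as assumptions here. The function name is hypothetical, and measuring p and R is out of scope.

```python
import math


def tfrc_rate(s, rtt, p, b=1, t_rto=None):
    """Allowed sending rate in bytes/sec from the TCP throughput equation.

    s: segment size (bytes); rtt: round-trip time (s); p: loss event rate in (0, 1];
    b: packets acknowledged per ACK; t_rto: retransmission timeout (s), default 4*rtt.
    """
    if not 0 < p <= 1:
        raise ValueError("loss event rate must be in (0, 1]")
    if t_rto is None:
        t_rto = 4 * rtt
    denom = (rtt * math.sqrt(2 * b * p / 3)
             + t_rto * (3 * math.sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2))
    return s / denom
```

With s = 1460 bytes, R = 100 ms and p = 10^-4, the rate comes out near 1.8 MB/s. Raising p lowers the rate smoothly, without the abrupt halving of AIMD.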
TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connections 1 and 2 share a bottleneck router of capacity R]
Why is TCP fair?
Consider two competing sessions:
• additive increase gives slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: connection 2 throughput plotted against connection 1 throughput, both axes up to R, with the equal bandwidth share line; alternating "congestion avoidance: additive increase" and "loss: decrease window by factor of 2" steps]
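A short simulation shows the geometric argument at work. The sketch makes three simplifying assumptions: fluid rates in abstract units, synchronized losses (both flows back off whenever their sum exceeds R), and equal RTTs. Real flows with different RTTs do not split a link this evenly.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Rates of two AIMD flows sharing one bottleneck, one step per round.

    Whenever x1 + x2 exceeds the capacity, both flows halve their rate;
    otherwise both add 1 unit. Returns the list of (x1, x2) pairs visited.
    """
    history = [(x1, x2)]
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2   # multiplicative decrease halves the gap too
        else:
            x1, x2 = x1 + 1, x2 + 1   # additive increase keeps the gap unchanged
        history.append((x1, x2))
    return history
```

Starting from a lopsided (80, 10) split on a link of capacity 100, each halving cuts the gap between the flows in half. Additive increase leaves the gap unchanged, so after 200 rounds the two rates are within 1 unit of each other.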
Fairness (more)
Fairness and UDP:
• Multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
• Instead use UDP:
  - pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections:
• nothing prevents an app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
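Here is the example's arithmetic, assuming idealized per-connection fairness (every connection on the link ends up with the same rate). `app_share` is a hypothetical helper.

```python
from fractions import Fraction


def app_share(existing_connections, app_connections):
    """Fraction of link rate R the new app gets when every connection gets an equal share."""
    return Fraction(app_connections, existing_connections + app_connections)
```

With 9 existing connections, one new connection gets 1/10 of R, while nine new connections together get 9/18 = 1/2 of R.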
Flow Control: Persistence Timer
• Suppose the sender is blocked because the receiver application hasn't picked up data
• Then the receiver app reads n bytes
• TCP receiver advertises a new window of size RcvWindow = n
• But suppose this advertisement is lost: the sender would be stuck
  - Solution: persistence timer: the sender sends a probe after a period of inactivity to provoke a window update
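The deadlock and its fix can be modeled in discrete time. This is a hypothetical model, not TCP's actual timer code. Probes go out at a fixed interval for simplicity (real implementations back the persist timer off exponentially), and each probe's reply is assumed to arrive instantly, carrying the receiver's current window.

```python
def time_until_unblocked(read_time, update_lost, persist_interval, horizon):
    """Tick at which a zero-window sender learns that the window has reopened.

    read_time: tick at which the receiving app frees buffer space.
    update_lost: whether the receiver's window-update segment is lost.
    persist_interval: ticks between probes, or None for no persistence timer.
    Returns None if the sender is still blocked at `horizon`.
    """
    if not update_lost:
        return read_time                  # the window update itself unblocks the sender
    if persist_interval is None:
        return None                       # deadlock: neither side sends anything
    probe = persist_interval
    while probe <= horizon:
        if probe >= read_time:            # reply to this probe carries the new window
            return probe
        probe += persist_interval         # reply still advertises 0; keep probing
    return None
```

If the update sent at tick 5 is lost, a sender without the timer never resumes. A sender probing every 3 ticks learns of the new window at tick 6.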
Flow Control: Silly Window Syndrome
Clark's solution: don't advertise windows below a certain size
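Here is a receiver-side sketch of Clark's rule. It assumes the receiver is recovering from a full buffer (advertised window 0) while the application drains it a few bytes at a time, which is the classic silly window setting. The threshold min(MSS, buffer/2) is the one RFC 1122 recommends, and the function name is hypothetical.

```python
def clark_window(free_space, buffer_size, mss):
    """Window to advertise: stay at 0 until free space reaches min(MSS, buffer/2)."""
    if free_space >= min(mss, buffer_size // 2):
        return free_space
    return 0  # too small to be worth a segment; keep advertising a zero window
```

With a 4096-byte buffer and a 1460-byte MSS, free space of 1, 100 or 1459 bytes is still advertised as 0. At 1460 bytes the whole free space is offered at once, so the sender is never lured into sending tiny segments.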
TCP Connection Management
TCP Connection Management
Recall: TCP sender and receiver establish a "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  connect(s, &dstaddr, …);
• server: contacted by client
  cl = accept(sv, &caddr, …);
Three-way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq. #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
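The sequence and acknowledgement arithmetic of the three steps can be written out directly. This minimal sketch assumes a simplified `Segment` record (hypothetical, carrying only the fields used here). Each SYN consumes one sequence number, which is why every ACK number is the peer's ISN + 1.

```python
from dataclasses import dataclass

SEQ_MOD = 2 ** 32  # sequence numbers are 32 bits and wrap around


@dataclass
class Segment:
    syn: bool
    ack: bool
    seq: int
    ack_no: int = 0
    data: bytes = b""


def three_way_handshake(client_isn, server_isn, first_data=b""):
    """Return the client SYN, server SYNACK and client ACK segments."""
    # Step 1: SYN carries the client's initial sequence number and no data.
    syn = Segment(syn=True, ack=False, seq=client_isn)
    # Step 2: SYNACK carries the server's ISN and acknowledges the client's SYN.
    synack = Segment(syn=True, ack=True, seq=server_isn,
                     ack_no=(client_isn + 1) % SEQ_MOD)
    # Step 3: ACK acknowledges the server's SYN and may already carry data.
    ack = Segment(syn=False, ack=True, seq=(client_isn + 1) % SEQ_MOD,
                  ack_no=(server_isn + 1) % SEQ_MOD, data=first_data)
    return syn, synack, ack


def acknowledges(reply, original):
    """True if `reply` acknowledges the SYN in `original` (ack_no == ISN + 1)."""
    return reply.ack and reply.ack_no == (original.seq + 1) % SEQ_MOD
```

A check like `acknowledges` lets each side reject a delayed duplicate SYN or SYNACK. Because ISNs are chosen fresh, an old segment is very unlikely to carry the ISN + 1 that the current handshake expects.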
TCP 3-way handshake
[Figure: TCP connection establishment]
• Q1: why a 3-way and not a 2-way handshake?
• Q2: how do sender & receiver determine initial seq. numbers?
3-way Handshake & Delayed Duplicates
[Figure: (a) Normal operation. (b) Old SYN appearing out of nowhere. (c) Duplicate SYN and duplicate ACK following SYN.]
3-way handshake required to deal with scenarios (b) and (c)
Sequence Number Reuse
• Idea: tie initial TCP seq. numbers to a clock
  - Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  - Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock value + F(src, dst, sport, dport, secret), where F is a cryptographic hash and secret is a random value
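RFC 1948's formula can be sketched as follows. The hash choice (MD5 over the connection 4-tuple plus a secret) follows the RFC's suggestion. The string encoding of the 4-tuple, the 16-byte secret and the `now_us` parameter are assumptions made for the example. The 4 µs clock keeps ISNs advancing over time, and the keyed hash gives each 4-tuple its own unpredictable offset.

```python
import hashlib
import os
import time

SECRET = os.urandom(16)  # assumed per-boot random secret


def rfc1948_isn(src, dst, sport, dport, now_us=None, secret=SECRET):
    """ISN = M + F(src, dst, sport, dport, secret) mod 2**32.

    M is a clock that ticks once every 4 microseconds; F is a keyed hash
    (MD5 over the 4-tuple and the secret).
    """
    if now_us is None:
        now_us = time.monotonic_ns() // 1000
    m = now_us // 4
    material = f"{src}|{dst}|{sport}|{dport}".encode() + secret
    f = int.from_bytes(hashlib.md5(material).digest()[:4], "big")
    return (m + f) % 2 ** 32
```

An outside observer does not know `secret`, so seeing the ISN of one connection reveals little about the ISN chosen for a different 4-tuple. That is what defeats the prediction attack.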
When Sequence Numbers Attack
• Suppose attacker A can predict the sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  - B believes it is talking to C and might grant permissions based on C's IP address
  - Attacker on A must suppress the RST packets C is likely to send; it can use a denial-of-service attack for that
• A sends message to compromise B
When SYNs Attack
• Servers receiving SYN must allocate resources
  - Opens up possibility of a denial-of-service attack where the server is flooded with bogus SYN packets with forged IP source addresses
• Solution: SYN cookies
  - Server computes its initial sequence number and sends the SYNACK, but does not allocate buffers
  - If the client continues with an ACK, the server checks whether that ACK could correspond to a SYNACK it sent; only if correct does it allocate buffers
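Here is a simplified SYN cookie sketch. Real implementations pack a coarse timestamp and an encoded MSS into specific bits of the ISN. In this sketch the whole 32-bit ISN is just a truncated HMAC over the connection 4-tuple, the client's ISN and a 64-second time slot; these choices, and the secret, are assumptions for readability. The sketch shows that the server keeps no per-SYN state, because everything it needs to validate the final ACK can be recomputed from that ACK.

```python
import hashlib
import hmac

SEQ_MOD = 2 ** 32
SECRET = b"hypothetical-server-secret"  # assumed per-server key
SLOT_SECONDS = 64                       # assumed coarse time slot


def _cookie(src, sport, dst, dport, client_isn, slot):
    """Keyed hash of the connection and time slot, truncated to 32 bits."""
    msg = f"{src}|{sport}|{dst}|{dport}|{client_isn}|{slot}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def synack_isn(src, sport, dst, dport, client_isn, now):
    """Server ISN for the SYNACK: computed, not stored."""
    return _cookie(src, sport, dst, dport, client_isn, int(now // SLOT_SECONDS))


def ack_is_valid(src, sport, dst, dport, client_seq, ack_no, now):
    """Validate the client's ACK without any stored state.

    The ACK's sequence number is client_isn + 1 and its ACK number is
    server_isn + 1, so the cookie can be recomputed and compared.
    Cookies from the current or the previous time slot are accepted.
    """
    client_isn = (client_seq - 1) % SEQ_MOD
    slot = int(now // SLOT_SECONDS)
    for s in (slot, slot - 1):
        expected = (_cookie(src, sport, dst, dport, client_isn, s) + 1) % SEQ_MOD
        if ack_no == expected:
            return True  # only now would the server allocate buffers
    return False
```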
Sequence Number Summary
• Goals Set 1
  - Guard against old duplicates in one connection -> don't reuse sequence numbers until it's reasonably certain that old duplicates have disappeared
  - Even after a crash/restart -> hence tie to time -> but don't use a global clock -> hence need a 3-way handshake where each side verifies the sequence number chosen by the other side before the connection succeeds
• Goals Set 2
  - Don't allow hijacking of connections unless attacker can eavesdrop -> use a PRNG for initial seq. number choice
  - Don't allow SYN attacks -> compute, but don't store, the initial sequence number
CS 4284 Spring 2013
CS 4284 Spring 2013
TCP Connection Management (cont)Closing a connection
client closes socket close(s)
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
timed
wai
t
CS 4284 Spring 2013
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
timed
wai
tclosed
CS 4284 Spring 2013
TCP ConnectionFSM
The heavy solid line is the normal path for a client
The heavy dashed line is the normal path for a server
The light lines are unusual events
Each transition is labeled by the event causing it and the action resulting from it separated by a slash
CS 4284 Spring 2013
TCP Connection Management (contrsquod)
TCP client lifecycle TCP server lifecycle
CS 4284 Spring 2013
Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if
packets (FIN ACK) can be lostndash No Famous two-army problem
CS 4284 Spring 2013
Summarybull TCP segments acknowledgements amp
retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit
bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management
schemendash SYN attackndash Sequence number prediction attacks
CS 4284 Spring 2013
TCP Miscellaneousbull MSS Maximum Segment Size Option
ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN
bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to
allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT
measurement PAWS for protection against wrap-around for sequence numbers
bull hellip
CS 4284 Spring 2013
Study of TCP Outlinebull segment structurebull reliable data transfer
ndash delayed ACKsndash Naglersquos algorithm
bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control
TCP
Congestion Control
CS 4284 Spring 2013
Principles of Congestion Control
Congestionbull informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull different from flow controlbull manifestations
ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)
bull a top-10 problem
CS 4284 Spring 2013
Causescosts of congestion scenario 1 bull two senders two
receiversbull one router infinite
buffers bull no retransmission
bull large delays when congested
bull (but no reduction in throughput here)
unlimited shared output link buffers
Host Alin original data
Host B
lout
CS 4284 Spring 2013
Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet
after timeout
finite shared output link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2lin
l out
a c
R2
R2lin
l out
R4
R2
R2lin
l out
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit
Q what happens as lin and lrsquoin increase
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission
capacity used for that packet was wastedbull ultimately leads to congestion collapse
Host A
Host B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even if no packet loss occurs
bull If packet loss occurs needed retransmission require offered load to be greater than goodput
bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion control
bull no explicit feedback from network
bull congestion inferred from end-system observed loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systemsndash single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Controlbull end-end control (no network
assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput
bull CongWin is dynamic function of perceived network congestion
How does sender notice congestion
bull loss event = timeout or 3 duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is primary cause of loss)
Three mechanismsbull AIMDbull slow startbull fast recovery
rate = CongWin
RTT Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
multiplicative decrease cut CongWin in half after loss event
additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Startbull When connection
begins CongWin = 1 MSSndash Example MSS =
500 bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly
ramp up to respectable rate
bull When connection begins increase rate exponentially fast until first loss event
CS 4284 Spring 2013
TCP Slow Start (more)bull When connection begins
increase rate exponentially until first loss eventndash double CongWin every
RTTndash done by incrementing CongWin for every ACK received
bull Summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
CS 4284 Spring 2013
TCP Tahoe vs RenoQ When should the
exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss
eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly
(this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKsbull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout eventndash CongWin instead set
to 1 MSS ndash window then grows
exponentiallyndash to a threshold then
grows linearly
bull 3 dup ACKs indicates network capable of delivering some segments
bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
CS 4284 Spring 2013
TCP Sender Congestion ControlEvent State TCP Sender Action Commentary
ACK receipt for previously unacked data
Slow Start (SS)
CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA
Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data
CongestionAvoidance (CA)
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK
SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter slow start
Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
CS 4284 Spring 2013
TCP Throughput Idealized
bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start
bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput
to W2RTT bull Average steady-state throughput 75 WRTT
W
W2
CS 4284 Spring 2013
TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT
want 10 Gbps throughputbull Requires window size W = 83333 in-flight
segmentsbull Throughput in terms of loss rate
bull L = 210-10 Very lowbull Require almost perfect link
LRTTMSS221
Equation-Based Control
bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which
are samples of timeout events + 3-ack events)bull Instead equation-based control uses an
equation to compute sending rate based on this input
bull See RFC 5348 for more info
CS 4284 Spring 2013
TCP Fairness
CS 4284 Spring 2013
Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk
TCP connection 1
bottleneckrouter
capacity RTCP connection 2
TCP Fairness
CS 4284 Spring 2013
Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughputConn
e ctio
n 2
thro
u ghp
ut
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
CS 4284 Spring 2013
Fairness (more)Fairness and UDPbull Multimedia apps often
do not use TCPndash do not want rate throttled
by congestion controlbull Instead use UDP
ndash pump audiovideo at constant rate tolerate packet loss
bull TCP friendliness
Fairness and parallel TCP connections
bull nothing prevents app from opening parallel connections between 2 hosts
bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets
rate R10ndash new app asks for 9 TCPs gets
R2bull Thatrsquos what ldquodownload
acceleratorsrdquo do
CS 4284 Spring 2013
Flow Control Silly Window Syndrome
Clarkrsquos solution donrsquot advertise windows below a certain size
CS 4284 Spring 2013
Study of TCP Outlinebull segment structurebull reliable data transfer
ndash delayed ACKsndash Naglersquos algorithm
bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control
TCP
Connection Management
CS 4284 Spring 2013
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo before exchanging data segments
bull initialize TCP variablesndash seq sndash buffers flow control info
(eg RcvWindow)bull client connection initiator connect(s ampdstaddr hellip) bull server contacted by client cl=accept(sv ampcaddr
hellip)
Three way handshake
Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data
Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial seq
Step 3 client receives SYNACK replies with ACK segment which may contain data
CS 4284 Spring 2013
TCP 3-way handshake
TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums
CS 4284 Spring 2013
3-way Handshake amp Delayed Dups
(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)
CS 4284 Spring 2013
Sequence Number Reuse
bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a
connection with identical sequence numbersbull Must also guard against sequence number prediction attack
ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())
CS 4284 Spring 2013
When Sequence Numbers Attack
bull Suppose attacker A can predict sequence number a host B is going to use next
bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions
based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is
likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B
CS 4284 Spring 2013
When SYNs Attackbull Servers receiving SYN must allocate resources
ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
bull Solutionndash SYN cookies
bull Server creates ACK number sends ACK ndash but does not allocate buffers
ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct
Sequence Number Summarybull Goals Set 1
ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared
ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs
bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can
eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial
sequence number
CS 4284 Spring 2013
CS 4284 Spring 2013
TCP Connection Management (cont)Closing a connection
client closes socket close(s)
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
timed
wai
t
CS 4284 Spring 2013
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
timed
wai
tclosed
CS 4284 Spring 2013
TCP ConnectionFSM
The heavy solid line is the normal path for a client
The heavy dashed line is the normal path for a server
The light lines are unusual events
Each transition is labeled by the event causing it and the action resulting from it separated by a slash
CS 4284 Spring 2013
TCP Connection Management (contrsquod)
TCP client lifecycle TCP server lifecycle
CS 4284 Spring 2013
Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if
packets (FIN ACK) can be lostndash No Famous two-army problem
CS 4284 Spring 2013
Summarybull TCP segments acknowledgements amp
retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit
bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management
schemendash SYN attackndash Sequence number prediction attacks
CS 4284 Spring 2013
TCP Miscellaneousbull MSS Maximum Segment Size Option
ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN
bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to
allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT
measurement PAWS for protection against wrap-around for sequence numbers
bull hellip
CS 4284 Spring 2013
Study of TCP Outlinebull segment structurebull reliable data transfer
ndash delayed ACKsndash Naglersquos algorithm
bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control
TCP
Congestion Control
CS 4284 Spring 2013
Principles of Congestion Control
Congestionbull informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull different from flow controlbull manifestations
ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)
bull a top-10 problem
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A sends λin (original data) through a router with unlimited shared output link buffers; Host B receives λout]
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmits a lost packet after timeout
[Figure: Host A offers λin (original data) and λ'in (original data plus retransmitted data) to a router with finite shared output link buffers; Host B receives λout]
Causes/costs of congestion: scenario 2
• always: λin = λout (goodput)
• a) if no loss: λ'in = λin
• b) assume “clairvoyant” sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes λ'in larger for the same λout (every packet transmitted twice)
“Costs” of congestion:
• more work (retransmissions) for a given “goodput”
• unneeded retransmissions: link carries multiple copies of a packet
[Figure: λout versus λ'in for cases (a), (b), (c); axes marked at R/2, with λout levelling off at R/3 and R/4 in the lossy cases]
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λin and λ'in increase?
[Figure: Host A offers λin (original data) and λ'in (original data plus retransmitted data) over multihop paths through routers with finite shared output link buffers; Host B receives λout]
Causes/costs of congestion: scenario 3
Another “cost” of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted!
• ultimately leads to congestion collapse
[Figure: λout from Host A to Host B]
Reasons for Congestion Control
• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case
Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate at which the sender should send
TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission:
  LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• roughly, rate ≈ CongWin / RTT bytes/sec
• CongWin and RTT influence throughput
• CongWin is a dynamic function of perceived network congestion
How does the sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after a loss event
• (assumes congestion is the primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
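The sender-side limit above can be sketched as a small calculation. The variable names mirror the slide; the function names and the example numbers are hypothetical.

```python
# Hypothetical helper names; variables mirror the slide (all values in bytes).

def usable_window(last_byte_sent: int, last_byte_acked: int,
                  cong_win: int, rcv_window: int) -> int:
    """Bytes the sender may still transmit before it must wait for ACKs."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cong_win, rcv_window) - in_flight)


def approx_rate(cong_win: int, rtt_sec: float) -> float:
    """Rough throughput in bytes/sec: about one window is delivered per RTT."""
    return cong_win / rtt_sec


# 8 KB congestion window, 64 KB receive window, 6000 bytes still unacknowledged:
print(usable_window(16000, 10000, 8192, 65536))  # 2192
print(approx_rate(8192, 0.25))                   # 32768.0
```

Whichever of CongWin and RcvWindow is smaller is the binding constraint, so congestion control and flow control share one mechanism.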
TCP AIMD
• multiplicative decrease: cut CongWin in half after a loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)
[Figure: congestion window (8, 16, 24 Kbytes) versus time for a long-lived TCP connection, forming a sawtooth]
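A minimal AIMD sketch in MSS units, one step per RTT, reproduces the sawtooth. As an assumption for illustration, a loss is injected whenever CongWin exceeds a fixed, hypothetical path capacity, standing in for a real loss event.

```python
def aimd_trace(capacity_mss: float, start_mss: float, rtts: int) -> list[float]:
    """CongWin (in MSS) at the start of each RTT under AIMD."""
    cwnd = start_mss
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd > capacity_mss:          # stand-in for a loss event
            cwnd = max(cwnd / 2, 1.0)    # multiplicative decrease, never below 1 MSS
        else:
            cwnd += 1.0                  # additive increase: +1 MSS per RTT
    return trace


print(aimd_trace(capacity_mss=16, start_mss=10.0, rtts=12))
# [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 8.5, 9.5, 10.5, 11.5]
```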
TCP Slow Start
• When connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to a respectable rate
• When connection begins, increase rate exponentially fast until first loss event
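The slide's example works out as follows. The 10 Mbps target is a hypothetical "respectable rate", and the count assumes ideal doubling every RTT with no loss.

```python
import math


def initial_rate_bps(mss_bytes: int, rtt_ms: int) -> float:
    """Sending rate with CongWin = 1 MSS: one segment per RTT."""
    return mss_bytes * 8 * 1000 / rtt_ms


def rtts_to_reach(target_bps: float, mss_bytes: int, rtt_ms: int) -> int:
    """RTTs of ideal doubling (no loss) until CongWin/RTT reaches target_bps."""
    ratio = target_bps / initial_rate_bps(mss_bytes, rtt_ms)
    return max(0, math.ceil(math.log2(ratio)))


print(initial_rate_bps(500, 200))           # 20000.0 bits/sec, i.e. 20 kbps
print(rtts_to_reach(10_000_000, 500, 200))  # 9
```

Growing linearly by 1 MSS per RTT would instead take about 500 RTTs (100 seconds at 200 ms) to reach the same rate, which is why slow start ramps up exponentially.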
TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event
  – double CongWin every RTT
  – done by incrementing CongWin by 1 MSS for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart]
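Incrementing CongWin by 1 MSS per ACK doubles it each RTT because every segment sent in one RTT returns one ACK. The sketch assumes every segment is ACKed individually and nothing is lost; the function name is hypothetical.

```python
def slow_start_rounds(rounds: int, mss: int = 1) -> list[int]:
    """CongWin at the start of each RTT (in MSS units when mss=1)."""
    cwnd = mss
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        segments_sent = cwnd // mss
        for _ in range(segments_sent):  # one ACK comes back per segment
            cwnd += mss                 # +1 MSS per ACK
    return history


print(slow_start_rounds(5))  # [1, 2, 4, 8, 16]
```

With delayed ACKs, fewer ACKs arrive per RTT, so real growth is slower than this idealized doubling.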
TCP Tahoe vs. Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At a loss event, Threshold is set to 1/2 of CongWin just before the loss event
• If timeout event: set CongWin to 1 MSS
• If triple-ACK event: set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)
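The two variants' reaction to a loss can be compared in a few lines. This is a sketch in MSS units. It treats Tahoe, which lacks fast recovery, as falling back to 1 MSS on every loss. The 1 MSS floor on Threshold is an assumption for the example.

```python
def on_loss(variant: str, event: str, cong_win: float) -> tuple[float, float]:
    """Return (new CongWin, new Threshold), in MSS units, after a loss event."""
    if variant not in ("tahoe", "reno"):
        raise ValueError(f"unknown variant: {variant}")
    if event not in ("timeout", "triple_dup_ack"):
        raise ValueError(f"unknown event: {event}")
    threshold = max(cong_win / 2, 1.0)   # half of CongWin just before the loss
    if event == "triple_dup_ack" and variant == "reno":
        return threshold, threshold      # fast recovery: continue linearly from Threshold
    return 1.0, threshold                # any timeout, or Tahoe on any loss: back to 1 MSS


print(on_loss("tahoe", "triple_dup_ack", 16.0))  # (1.0, 8.0)
print(on_loss("reno", "triple_dup_ack", 16.0))   # (8.0, 8.0)
print(on_loss("reno", "timeout", 16.0))          # (1.0, 8.0)
```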
Timeouts vs. 3 dup ACKs
• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after a timeout event:
  – CongWin is instead set to 1 MSS
  – window then grows exponentially
  – up to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate the network is capable of delivering some segments
• a timeout before 3 dup ACKs are received is a stronger, “more alarming” indicator of congestion
Summary: TCP Congestion Control
[Figure: TCP congestion control FSM with states slow start, congestion avoidance, and fast recovery]
Initially (slow start): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0
Slow start:
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh: move to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery
Congestion avoidance:
• new ACK: cwnd = cwnd + MSS · (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery
Fast recovery:
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; move to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start
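The FSM above can be sketched as an event-driven class. This is a hypothetical illustration in bytes: it only updates cwnd, ssthresh and the state, and reports when to fast-retransmit. Actual segment transmission is left to the caller. The 1 MSS floor on ssthresh is an assumption here; RFC 5681 uses 2 MSS.

```python
class RenoSender:
    """Tracks cwnd/ssthresh (bytes) through slow start, congestion avoidance,
    and fast recovery. Hypothetical sketch; no segments are actually sent."""

    def __init__(self, mss: int = 1460, ssthresh: int = 64 * 1024):
        self.mss = mss
        self.cwnd = float(mss)           # initial: cwnd = 1 MSS
        self.ssthresh = float(ssthresh)  # initial: ssthresh = 64 KB
        self.dup_acks = 0
        self.state = "slow_start"

    def on_new_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh                     # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += self.mss                         # +1 MSS per ACK: doubles per RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += self.mss * self.mss / self.cwnd  # about +1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self) -> bool:
        """Return True if the missing segment should be retransmitted now."""
        if self.state == "fast_recovery":
            self.cwnd += self.mss                         # inflate for each further dup ACK
            return False
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = max(self.cwnd / 2, self.mss)
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = "fast_recovery"
            return True                                   # fast retransmit
        return False

    def on_timeout(self) -> None:
        """Caller retransmits the missing segment; here only the window is reset."""
        self.ssthresh = max(self.cwnd / 2, self.mss)
        self.cwnd = float(self.mss)
        self.dup_acks = 0
        self.state = "slow_start"


s = RenoSender(mss=1000, ssthresh=8000)
for _ in range(8):
    s.on_new_ack()
print(s.state, s.cwnd)  # congestion_avoidance 9000.0
```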
Summary: TCP Congestion Control
• When CongWin is below Threshold, the sender is in the slow-start phase; the window grows exponentially
• When CongWin is above Threshold, the sender is in the congestion-avoidance phase; the window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS
TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Results in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to Slow Start | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being ACKed | CongWin and Threshold not changed
TCP Throughput (Idealized)
• What’s the average throughput of TCP as a function of window size and RTT?
  – Long-lived connection; ignore slow start
• When the window is W, throughput is W/RTT
• Just after a loss, the window drops to W/2 and throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT
[Figure: window size oscillating between W/2 and W]
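The 0.75 factor follows if, as assumed in this idealized model, the window climbs linearly from W/2 back to W between losses. Throughput then also ramps linearly, so its time average (written here as a new symbol, X̄) is the midpoint of the two extremes:

```latex
\bar{X} \;=\; \frac{1}{2}\left(\frac{W}{2\,\mathrm{RTT}} + \frac{W}{\mathrm{RTT}}\right)
\;=\; \frac{3}{4}\,\frac{W}{\mathrm{RTT}} \;=\; 0.75\,\frac{W}{\mathrm{RTT}}
```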
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
  $\text{throughput} = \dfrac{1.22 \cdot \mathrm{MSS}}{\mathrm{RTT}\,\sqrt{L}}$
• ⇒ $L = 2 \cdot 10^{-10}$. Very low!
• Requires an almost perfect link
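Both numbers in the example can be reproduced directly. In this sketch MSS is converted to bits so that it matches a throughput in bits/sec; the function names are hypothetical.

```python
def window_segments(target_bps: float, rtt_sec: float, mss_bytes: int) -> float:
    """In-flight segments needed: (bits per RTT) / (bits per segment)."""
    return target_bps * rtt_sec / (mss_bytes * 8)


def required_loss_rate(target_bps: float, rtt_sec: float, mss_bytes: int) -> float:
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L, with MSS in bits."""
    return (1.22 * mss_bytes * 8 / (rtt_sec * target_bps)) ** 2


print(f"{window_segments(10e9, 0.1, 1500):,.0f}")    # 83,333
print(f"{required_loss_rate(10e9, 0.1, 1500):.1e}")  # 2.1e-10
```

Because L enters as a square root, halving the tolerable loss rate buys only about 41% more throughput, which is why very fast long-RTT paths need nearly loss-free links.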
Equation-Based Control
• Note: TCP congestion control forms a control loop
  – Inputs: round-trip time, “loss events” (which are samples of timeout events + 3-dup-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info
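RFC 5348 (TFRC) computes the allowed rate from the TCP throughput equation. The sketch below follows the form of that equation, using the simplifications b = 1 and t_RTO = 4·RTT that the RFC suggests. The function name and sample values are illustrative.

```python
import math
from typing import Optional


def tfrc_rate(s: float, rtt: float, p: float, b: int = 1,
              t_rto: Optional[float] = None) -> float:
    """Allowed sending rate in bytes/sec from the TCP throughput equation.

    s: segment size (bytes), rtt: round-trip time (s), p: loss event rate,
    b: packets acknowledged per ACK, t_rto: retransmission timeout (s).
    """
    if not 0 < p <= 1:
        raise ValueError("loss event rate p must be in (0, 1]")
    if t_rto is None:
        t_rto = 4 * rtt  # simplification suggested by RFC 5348
    denom = (rtt * math.sqrt(2 * b * p / 3)
             + t_rto * 3 * math.sqrt(3 * b * p / 8) * p * (1 + 32 * p ** 2))
    return s / denom


for p in (0.001, 0.01, 0.1):
    print(p, round(tfrc_rate(1460, 0.1, p)))  # rate falls as loss events get more frequent
```

For small p the timeout term becomes negligible, and the rate reduces to sqrt(3/2)·s/(RTT·√p) ≈ 1.22·s/(RTT·√p), the same formula used on the throughput-and-loss slide.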
TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connections 1 and 2 share a bottleneck router of capacity R]
Why is TCP fair?
Consider two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: connection 2 throughput versus connection 1 throughput, both axes up to R, with the equal-bandwidth-share line; repeated cycles of congestion avoidance (additive increase) and loss (decrease window by factor of 2) move toward equal share]
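A two-flow simulation illustrates the convergence argument. This is a sketch under simplifying assumptions: both flows have the same RTT, both see a loss whenever their combined rate exceeds the bottleneck capacity, and the starting rates are hypothetical.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float,
                   rounds: int) -> tuple[float, float]:
    """Per-RTT rates (e.g. in MSS/RTT) of two AIMD flows on one bottleneck."""
    for _ in range(rounds):
        if x1 + x2 > capacity:        # bottleneck overloaded: both flows see a loss
            x1, x2 = x1 / 2, x2 / 2   # multiplicative decrease also halves the gap
        else:
            x1, x2 = x1 + 1, x2 + 1   # additive increase leaves the gap unchanged
    return x1, x2


x1, x2 = aimd_two_flows(1.0, 60.0, capacity=100.0, rounds=500)
print(round(x1, 2), round(x2, 2))  # nearly equal
```

Since additive increase preserves the difference between the flows and each loss halves it, the gap shrinks geometrically, which is the trajectory toward the equal-share line in the figure.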
Fairness (more)
Fairness and UDP:
• Multimedia apps often do not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections:
• nothing prevents an app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 9 TCPs, gets R/2
• That’s what “download accelerators” do
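The parallel-connection example is simple arithmetic if every connection gets an equal per-connection share, which is the idealized assumption here. The function name is hypothetical.

```python
from fractions import Fraction


def share_of_new_app(existing: int, new_conns: int) -> Fraction:
    """Fraction of link rate R the new app gets, assuming equal per-connection shares."""
    return Fraction(new_conns, existing + new_conns)


print(share_of_new_app(9, 1))  # 1/10
print(share_of_new_app(9, 9))  # 1/2
```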
TCP Connection Management
TCP Connection Management
Recall: TCP sender and receiver establish a “connection” before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  connect(s, &dstaddr, …)
• server: contacted by client
  cl = accept(sv, &caddr, …)
Three-way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
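The sequence and acknowledgement numbers in the three steps can be traced with a small model. It is a hypothetical sketch that captures only the SYN/ACK flags and the number fields. The key detail is that a SYN consumes one sequence number, and all arithmetic wraps modulo 2^32.

```python
from dataclasses import dataclass

SEQ_SPACE = 2 ** 32  # sequence numbers are 32 bits and wrap around


@dataclass
class Segment:
    syn: bool
    ack: bool
    seq: int
    ack_no: int = 0
    data: bytes = b""


def three_way_handshake(client_isn: int, server_isn: int,
                        first_data: bytes = b"") -> list[Segment]:
    # Step 1: SYN carries the client's ISN and no data.
    syn = Segment(syn=True, ack=False, seq=client_isn)
    # Step 2: SYNACK carries the server's ISN; the SYN consumed one sequence
    # number, so the server acknowledges client_isn + 1.
    synack = Segment(syn=True, ack=True, seq=server_isn,
                     ack_no=(syn.seq + 1) % SEQ_SPACE)
    # Step 3: ACK of the server's SYN, which may already carry data.
    ack = Segment(syn=False, ack=True, seq=synack.ack_no,
                  ack_no=(synack.seq + 1) % SEQ_SPACE, data=first_data)
    return [syn, synack, ack]


for seg in three_way_handshake(1000, 5000, b"GET /"):
    print(seg)
```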
TCP 3-way handshake
[Figure: TCP connection establishment]
• Q1: why 3-way and not 2-way handshake?
• Q2: how do sender & receiver determine initial seq #s?
3-way Handshake & Delayed Dups
[Figure: (a) Normal operation; (b) Old SYN appearing out of nowhere; (c) Duplicate SYN and duplicate ACK following SYN]
• 3-way handshake required to deal with scenarios (b) and (c)
Sequence Number Reuse
• Idea: tie initial TCP seq numbers to a clock
  – Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  – Use PRNG; see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock value + F(src, dst, sport, dport, random())
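An RFC 1948-style ISN generator is sketched below. A clock ticking every 4 µs supplies M. A keyed hash of the connection 4-tuple supplies F, which stays fixed per connection but is unpredictable to outsiders. RFC 1948 suggested MD5 for F; this sketch substitutes SHA-256. The secret and the function name are hypothetical.

```python
import hashlib
import os
import time
from typing import Optional

SECRET = os.urandom(16)  # hypothetical per-boot secret key


def rfc1948_isn(src: str, dst: str, sport: int, dport: int,
                now_us: Optional[int] = None, secret: bytes = SECRET) -> int:
    if now_us is None:
        now_us = time.monotonic_ns() // 1000
    m = (now_us // 4) % 2**32                # M: clock ticking every 4 microseconds
    conn_id = f"{src}|{dst}|{sport}|{dport}".encode()
    digest = hashlib.sha256(conn_id + secret).digest()
    f = int.from_bytes(digest[:4], "big")    # F: fixed per 4-tuple, secret-dependent
    return (m + f) % 2**32


print(rfc1948_isn("10.0.0.1", "10.0.0.2", 40000, 80))
```

New incarnations of the same 4-tuple still advance with the clock, but an attacker who sees the ISNs of its own connections learns nothing about F for someone else's 4-tuple.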
When Sequence Numbers Attack
• Suppose attacker A can predict the sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  – B believes it is talking to C and might grant permissions based on C’s IP address
  – Attacker on A must suppress the RST packets C is likely to send; use a denial-of-service attack for that
• A sends a message to compromise B
When SYNs Attack
• Servers receiving a SYN must allocate resources
  – Opens up the possibility of a denial-of-service attack where the server is flooded with bogus SYN packets with forged IP source addresses
• Solution: SYN cookies
  – Server computes its initial sequence number (the cookie) and sends the SYNACK, but does not allocate buffers
  – If the client continues with an ACK, check whether the acknowledged number could have been sent by the server; allocate buffers only if it is correct
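A SYN cookie can be sketched as follows. The design is loosely modeled on classic SYN cookies: a coarse 5-bit time counter in the top bits and a keyed hash of the connection in the low 24 bits. Real implementations also encode the MSS, which is omitted here. The secret, the 64-second slot and all function names are assumptions. The server stores nothing between SYNACK and ACK; it recomputes the hash to validate.

```python
import hashlib
import hmac
import os

SECRET = os.urandom(16)  # hypothetical server secret, never sent on the wire
SLOT_SECONDS = 64        # coarse time slot encoded in the cookie (assumption)


def _mac24(src, dst, sport, dport, client_isn, slot):
    msg = f"{src}|{dst}|{sport}|{dport}|{client_isn}|{slot}".encode()
    return int.from_bytes(hmac.new(SECRET, msg, hashlib.sha256).digest()[:3], "big")


def make_cookie(src, dst, sport, dport, client_isn, now):
    """Server ISN for the SYNACK; nothing about the connection is stored."""
    slot = int(now // SLOT_SECONDS) % 32
    return (slot << 24) | _mac24(src, dst, sport, dport, client_isn, slot)


def check_ack(src, dst, sport, dport, ack_seq, ack_no, now, max_age_slots=1):
    """Validate the client's ACK; allocate buffers only if this returns True."""
    client_isn = (ack_seq - 1) % 2**32  # the ACK's seq is client ISN + 1
    cookie = (ack_no - 1) % 2**32       # it acknowledges the server ISN + 1
    slot = cookie >> 24
    current = int(now // SLOT_SECONDS) % 32
    if slot >= 32 or (current - slot) % 32 > max_age_slots:
        return False                    # malformed or expired cookie
    return (cookie & 0xFFFFFF) == _mac24(src, dst, sport, dport, client_isn, slot)


c = make_cookie("10.0.0.1", "10.0.0.2", 5555, 80, 42, now=1000.0)
print(check_ack("10.0.0.1", "10.0.0.2", 5555, 80, 43, (c + 1) % 2**32, now=1000.0))  # True
```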
Sequence Number Summary
• Goals, Set 1:
  – Guard against old duplicates in one connection → don’t reuse sequence numbers until it’s reasonably certain that old duplicates have disappeared
  – Even after a crash/restart → hence tie to time → but don’t use a global clock → hence need a 3-way handshake where each side verifies the sequence number chosen by the other side before a successful connection occurs
• Goals, Set 2:
  – Don’t allow hijacking of connections unless the attacker can eavesdrop → use PRNG for initial seq number choice
  – Don’t allow SYN attacks → compute but don’t store the initial sequence number
TCP Connection Management (cont.)
Closing a connection:
client closes socket: close(s);
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline; client closes and sends FIN, server ACKs, closes and sends FIN, client ACKs and enters timed wait before closed]
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK
  – Enters “timed wait”: will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client/server timeline; FIN and ACK exchanged in each direction while both sides are closing; client in timed wait, then both closed]
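The four closing steps map onto the TCP state names from RFC 793, where "timed wait" is TIME_WAIT and lasts 2 MSL. The transition tables below are a simplified, hypothetical sketch of the normal path only. The simultaneous-close CLOSING state is omitted.

```python
# States use RFC 793 names; events and actions are simplified labels (hypothetical).
CLIENT = {  # active closer
    ("ESTABLISHED", "app_close"): ("FIN_WAIT_1", "send FIN"),
    ("FIN_WAIT_1", "rcv_ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_2", "rcv_FIN"): ("TIME_WAIT", "send ACK"),
    ("TIME_WAIT", "rcv_FIN"): ("TIME_WAIT", "send ACK"),  # retransmitted FIN: ACK again
    ("TIME_WAIT", "timeout_2MSL"): ("CLOSED", None),
}
SERVER = {  # passive closer
    ("ESTABLISHED", "rcv_FIN"): ("CLOSE_WAIT", "send ACK"),
    ("CLOSE_WAIT", "app_close"): ("LAST_ACK", "send FIN"),
    ("LAST_ACK", "rcv_ACK"): ("CLOSED", None),
}


def run(table, state, events):
    """Apply events in order; return the final state and the actions taken."""
    actions = []
    for event in events:
        state, action = table[(state, event)]  # KeyError: event not handled in this sketch
        if action:
            actions.append(action)
    return state, actions


print(run(CLIENT, "ESTABLISHED", ["app_close", "rcv_ACK", "rcv_FIN", "timeout_2MSL"]))
# ('CLOSED', ['send FIN', 'send ACK'])
print(run(SERVER, "ESTABLISHED", ["rcv_FIN", "app_close", "rcv_ACK"]))
# ('CLOSED', ['send ACK', 'send FIN'])
```

The extra TIME_WAIT entry is what lets the client re-ACK a FIN whose ACK was lost, which is why the active closer waits instead of closing immediately.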
TCP Connection FSM
[Figure: TCP connection finite state machine]
The heavy solid line is the normal path for a client.
The heavy dashed line is the normal path for a server.
The light lines are unusual events.
Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.
CS 4284 Spring 2013
TCP Connection Management (contrsquod)
TCP client lifecycle TCP server lifecycle
CS 4284 Spring 2013
Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if
packets (FIN ACK) can be lostndash No Famous two-army problem
CS 4284 Spring 2013
Summarybull TCP segments acknowledgements amp
retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit
bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management
schemendash SYN attackndash Sequence number prediction attacks
CS 4284 Spring 2013
TCP Miscellaneousbull MSS Maximum Segment Size Option
ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN
bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to
allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT
measurement PAWS for protection against wrap-around for sequence numbers
bull hellip
CS 4284 Spring 2013
CS 4284 Spring 2013
Study of TCP Outlinebull segment structurebull reliable data transfer
ndash delayed ACKsndash Naglersquos algorithm
bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control
TCP
Congestion Control
CS 4284 Spring 2013
Principles of Congestion Control
Congestionbull informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull different from flow controlbull manifestations
ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)
bull a top-10 problem
CS 4284 Spring 2013
Causescosts of congestion scenario 1 bull two senders two
receiversbull one router infinite
buffers bull no retransmission
bull large delays when congested
bull (but no reduction in throughput here)
unlimited shared output link buffers
Host Alin original data
Host B
lout
CS 4284 Spring 2013
Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet
after timeout
finite shared output link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2lin
l out
a c
R2
R2lin
l out
R4
R2
R2lin
l out
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit
Q what happens as lin and lrsquoin increase
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission
capacity used for that packet was wastedbull ultimately leads to congestion collapse
Host A
Host B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even if no packet loss occurs
bull If packet loss occurs needed retransmission require offered load to be greater than goodput
bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion control
bull no explicit feedback from network
bull congestion inferred from end-system observed loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systemsndash single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission: LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• roughly, rate ≈ CongWin / RTT bytes/sec, so CongWin and RTT influence throughput
• CongWin is a dynamic function of perceived network congestion
How does the sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces its rate (CongWin) after a loss event
• (assumes congestion is the primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery

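As a rough illustration of the two formulas above, the sketch below computes how many more bytes a sender may put in flight and the approximate rate CongWin/RTT. The function names `usable_window` and `approx_rate` are hypothetical, and the model ignores everything except these two limits.

```python
def usable_window(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """Bytes the sender may still put in flight (hypothetical helper).

    Enforces LastByteSent - LastByteAcked <= min(CongWin, RcvWindow).
    """
    in_flight = last_byte_sent - last_byte_acked
    limit = min(cong_win, rcv_window)      # the tighter of the two windows wins
    return max(0, limit - in_flight)


def approx_rate(cong_win, rtt_s):
    """Rough sending rate in bytes/sec: about one window per RTT."""
    return cong_win / rtt_s
```

With 3000 bytes in flight, CongWin = 8000 and RcvWindow = 6000, the receiver's window is the binding limit, so `usable_window(13000, 10000, 8000, 6000)` allows only 3000 more bytes.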
TCP AIMD
• multiplicative decrease: cut CongWin in half after a loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)
[Figure: sawtooth congestion window (8, 16, 24 Kbytes) versus time for a long-lived TCP connection]

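A minimal per-RTT simulation of the AIMD sawtooth, working in units of MSS. It assumes losses are detected at hypothetical, caller-chosen RTT indices (`loss_at`) and uses integer halving; it does not model slow start or timeouts.

```python
def aimd_trace(num_rtts, loss_at, start_mss=8):
    """Congestion window (in MSS) at the start of each RTT.

    Additive increase: +1 MSS per loss-free RTT.
    Multiplicative decrease: halve CongWin on a loss, never below 1 MSS.
    """
    cwnd = start_mss
    trace = []
    for rtt in range(num_rtts):
        trace.append(cwnd)
        if rtt in loss_at:
            cwnd = max(1, cwnd // 2)   # multiplicative decrease
        else:
            cwnd += 1                  # additive increase
    return trace
```

For example, `aimd_trace(10, {4, 8})` returns `[8, 9, 10, 11, 12, 6, 7, 8, 9, 4]`, the characteristic sawtooth.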
TCP Slow Start
• When a connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes and RTT = 200 msec
  - initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to a respectable rate
• When a connection begins, increase the rate exponentially fast until the first loss event

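The slide's arithmetic can be checked with a small sketch, together with a count of how many RTTs of doubling it takes to reach a target rate. Both helpers are hypothetical and assume no loss and a rate of exactly CongWin/RTT.

```python
def rate_bps(cwnd_segments, mss_bytes, rtt_ms):
    """Sending rate in bits/sec when CongWin = cwnd_segments * MSS."""
    return cwnd_segments * mss_bytes * 8 * 1000 / rtt_ms


def rtts_to_reach(target_bps, mss_bytes, rtt_ms):
    """RTTs of loss-free slow start (CongWin doubles per RTT) to reach target_bps."""
    cwnd, rtts = 1, 0
    while rate_bps(cwnd, mss_bytes, rtt_ms) < target_bps:
        cwnd *= 2
        rtts += 1
    return rtts
```

`rate_bps(1, 500, 200)` gives 20000.0 bits/sec (20 kbps). Reaching 1 Mbps takes 6 RTTs (CongWin = 64 MSS), about 1.2 seconds at this RTT.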
TCP Slow Start (more)
• When a connection begins, increase the rate exponentially until the first loss event
  - double CongWin every RTT
  - done by incrementing CongWin by 1 MSS for every ACK received
• Summary: the initial rate is slow, but it ramps up exponentially fast
[Figure: Host A sends one segment, then two segments, then four segments to Host B, one round per RTT]

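The sketch below shows why "+1 MSS per ACK" doubles the window each RTT. It is idealized: every segment sent in a round is assumed to be ACKed within that round, one ACK per segment. With delayed ACKs, growth per RTT would be slower.

```python
def slow_start_rounds(rounds, cwnd=1):
    """Window (in MSS) at the start of each RTT during loss-free slow start."""
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        acks_this_round = cwnd          # one ACK per segment in flight
        for _ in range(acks_this_round):
            cwnd += 1                   # +1 MSS per ACK received
    return history
```

`slow_start_rounds(4)` returns `[1, 2, 4, 8]`, matching the one/two/four segments in the figure.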
TCP Tahoe vs Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before the timeout.
Implementation:
• Variable Threshold
• At a loss event, Threshold is set to 1/2 of CongWin just before the loss event
• If a timeout event occurs, set CongWin to 1 MSS
• If a triple-ACK event occurs, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)

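The following sketch covers only the loss reaction, in MSS units. It reflects the known difference that TCP Tahoe drops to 1 MSS on either kind of loss, whereas Reno does so only on timeout. The function `on_loss` is a hypothetical illustration, not code from any stack.

```python
def on_loss(variant, cong_win, event):
    """Return (new_cong_win, new_threshold), both in MSS, after a loss event.

    variant: "tahoe" or "reno"; event: "timeout" or "3dup".
    """
    if variant not in ("tahoe", "reno") or event not in ("timeout", "3dup"):
        raise ValueError("unknown variant or event")
    threshold = max(1, cong_win // 2)     # half of CongWin just before the loss
    if variant == "reno" and event == "3dup":
        return threshold, threshold       # fast recovery: continue linearly
    return 1, threshold                   # restart slow start at 1 MSS
```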
Timeouts vs 3-dup ACKs
• After 3 dup ACKs:
  - CongWin is cut in half
  - the window then grows linearly
• But after a timeout event:
  - CongWin is instead set to 1 MSS
  - the window then grows exponentially up to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate that the network is capable of delivering some segments
• a timeout before 3 dup ACKs are received is a stronger, "more alarming" indicator of congestion

Summary: TCP Congestion Control (FSM)
Initial transition (Λ): cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0; enter slow start
Slow start:
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
• cwnd > ssthresh: go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
Congestion avoidance:
• new ACK: cwnd = cwnd + MSS·(MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
Fast recovery:
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start

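The FSM transcribed above can be sketched as a small class. This is a sketch under stated assumptions: windows are in bytes, MSS defaults to an assumed 1000 bytes, ssthresh starts at 64 KB as in the FSM, and sending is not modeled. Transmit and retransmit actions are only recorded in `log`. The class name `RenoSender` is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RenoSender:
    """Window bookkeeping of the Reno congestion-control FSM (bytes)."""
    mss: int = 1000                       # assumed MSS
    ssthresh: float = 64 * 1024           # initial ssthresh = 64 KB
    cwnd: float = field(init=False, default=0.0)
    dup_acks: int = 0
    state: str = "slow_start"
    log: list = field(default_factory=list)

    def __post_init__(self):
        self.cwnd = float(self.mss)       # initial cwnd = 1 MSS

    def new_ack(self):
        self.dup_acks = 0
        if self.state == "slow_start":
            self.cwnd += self.mss
            self.log.append("transmit")
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        elif self.state == "congestion_avoidance":
            self.cwnd += self.mss * self.mss / self.cwnd   # about +1 MSS per RTT
            self.log.append("transmit")
        else:                             # leaving fast recovery: deflate window
            self.cwnd = self.ssthresh
            self.state = "congestion_avoidance"

    def dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += self.mss         # each further dup ACK inflates cwnd
            self.log.append("transmit")
            return
        self.dup_acks += 1
        if self.dup_acks == 3:            # fast retransmit, enter fast recovery
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.log.append("retransmit")
            self.state = "fast_recovery"

    def timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = float(self.mss)
        self.dup_acks = 0
        self.log.append("retransmit")
        self.state = "slow_start"
```

Driving it with a small ssthresh (e.g. `RenoSender(ssthresh=4000)`) makes every state visible within a few events.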
Summary: TCP Congestion Control
• When CongWin is below Threshold, the sender is in the slow-start phase and the window grows exponentially
• When CongWin is above Threshold, the sender is in the congestion-avoidance phase and the window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS

TCP Sender Congestion Control (Event / State / TCP Sender Action / Commentary)
• ACK receipt for previously unacked data / Slow Start (SS) / CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA / Results in a doubling of CongWin every RTT
• ACK receipt for previously unacked data / Congestion Avoidance (CA) / CongWin = CongWin + MSS·(MSS/CongWin) / Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
• Triple duplicate ACK / SS or CA / Threshold = CongWin/2; CongWin = Threshold; set state to CA / Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
• Timeout / SS or CA / Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" / Enter slow start
• Duplicate ACK / SS or CA / Increment duplicate ACK count for segment being acked / CongWin and Threshold not changed

TCP Throughput (Idealized)
• What is the average throughput of TCP as a function of window size and RTT?
  - long-lived connection; ignore slow start
• When the window is W, throughput is W/RTT
• Just after a loss, the window drops to W/2 and throughput drops to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT
[Figure: sawtooth window oscillating between W/2 and W]

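The 0.75 factor is the mean of a linear ramp from W/2 to W. The sketch below checks this numerically. It assumes the idealized sawtooth above, and the helper names are hypothetical.

```python
def average_window(w_max, steps=1000):
    """Mean window over one sawtooth cycle that ramps linearly from W/2 to W.

    Uses the midpoint rule, which is exact (up to rounding) for a linear ramp.
    """
    low = w_max / 2
    total = 0.0
    for i in range(steps):
        frac = (i + 0.5) / steps          # position within the cycle
        total += low + frac * (w_max - low)
    return total / steps


def average_throughput(w_max_bytes, rtt_s):
    """Idealized long-run throughput in bytes/sec: about 0.75 * W / RTT."""
    return average_window(w_max_bytes) / rtt_s
```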
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
  $\text{Throughput} = \frac{1.22 \cdot \text{MSS}}{\text{RTT}\,\sqrt{L}}$
• ⇒ $L = 2 \cdot 10^{-10}$. Very low!
• Requires an almost perfect link
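Both numbers on this slide can be reproduced with a short sketch. MSS is converted to bits so that the units match a throughput in bits/sec. The function names are hypothetical.

```python
def window_segments(throughput_bps, rtt_s, mss_bytes):
    """In-flight segments needed: W = throughput * RTT / (MSS in bits)."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Solve Throughput = 1.22 * MSS / (RTT * sqrt(L)) for L (MSS in bits)."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2
```

For 10 Gbps, 100 ms and 1500-byte segments, these give W ≈ 83,333 segments and L ≈ 2.1·10^-10.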
Equation-Based Control
• Note: TCP congestion control forms a control loop
  - Inputs: round-trip time and "loss events" (which are samples of timeout events + 3-dup-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info

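The following is a sketch of the throughput equation used by TFRC in RFC 5348 (Section 3.1), as we read it. The defaults b = 1 and t_RTO = 4·RTT follow that reading and should be checked against the RFC before relying on them. `tfrc_rate` is a hypothetical name.

```python
import math


def tfrc_rate(s, rtt, p, b=1, t_rto=None):
    """Allowed sending rate in bytes/sec from the TCP throughput equation.

    s: segment size (bytes), rtt: round-trip time (s), p: loss event rate,
    b: packets acknowledged per ACK, t_rto: retransmission timeout (s).
    """
    if not 0 < p <= 1:
        raise ValueError("loss event rate p must be in (0, 1]")
    if t_rto is None:
        t_rto = 4 * rtt
    denom = (rtt * math.sqrt(2 * b * p / 3)
             + t_rto * (3 * math.sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2))
    return s / denom
```

For small p the timeout term is negligible, and the result approaches s/(RTT·√(2p/3)), i.e. about 1.22·s/(RTT·√p), the formula on the previous slide.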
TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connections 1 and 2 share a bottleneck router of capacity R]

Why is TCP fair?
Consider two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput versus Connection 1 throughput (both axes up to R); repeated cycles of "congestion avoidance: additive increase" and "loss: decrease window by factor of 2" move toward the equal bandwidth share line]

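The convergence argument can be simulated in a minimal way. The key observation is that the gap x1 - x2 is unchanged by additive increase and halved by every multiplicative decrease. This sketch assumes both flows see a loss whenever their total exceeds capacity (synchronized losses), which real networks only approximate. `aimd_two_flows` is hypothetical.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Two AIMD flows sharing a bottleneck; rates in arbitrary units.

    Each round both add 1 (additive increase); when the total exceeds the
    capacity, both halve (multiplicative decrease). Returns final (x1, x2).
    """
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2
        else:
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2
```

Starting far from fair, e.g. `aimd_two_flows(80.0, 10.0, 100.0, 2000)`, the two rates end up essentially equal.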
Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
• Instead use UDP
  - pump audio/video at a constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections
• nothing prevents an app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do

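The R/10 and R/2 figures follow from per-connection fairness. This sketch assumes every connection gets exactly an equal share of R, the idealized goal stated earlier. The helper name is hypothetical.

```python
from fractions import Fraction


def new_app_share(existing_conns, new_conns):
    """Fraction of link rate R that a new app gets if every connection
    receives an equal share of the bottleneck."""
    return Fraction(new_conns, existing_conns + new_conns)
```

`new_app_share(9, 1)` is 1/10 and `new_app_share(9, 9)` is 1/2, matching the example.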
TCP Connection Management
Recall: TCP sender and receiver establish a "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  connect(s, &dstaddr, …)
• server: contacted by client
  cl = accept(sv, &caddr, …)
Three-way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data

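The sequence/acknowledgement numbers of the three steps can be sketched as follows. A SYN consumes one sequence number, so each side acknowledges the other's ISN + 1, and all arithmetic wraps modulo 2^32. The `Segment` type and function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32 bits and wrap around


@dataclass(frozen=True)
class Segment:
    flags: str                  # e.g. "SYN", "SYN+ACK", "ACK"
    seq: int
    ack: Optional[int] = None   # acknowledgement number, if the ACK flag is set


def three_way_handshake(client_isn, server_isn):
    """The three segments of connection setup."""
    syn = Segment("SYN", client_isn)
    synack = Segment("SYN+ACK", server_isn, (client_isn + 1) % SEQ_MOD)
    ack = Segment("ACK", (client_isn + 1) % SEQ_MOD, (server_isn + 1) % SEQ_MOD)
    return [syn, synack, ack]
```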
TCP 3-way handshake
TCP connection establishment:
• Q1: Why 3-way and not 2-way handshake?
• Q2: How do sender & receiver determine initial seq. numbers?

3-way Handshake & Delayed Dups
[Figure: (a) Normal operation. (b) Old SYN appearing out of nowhere. (c) Duplicate SYN and duplicate ACK following SYN.]
The 3-way handshake is required to deal with scenarios (b) and (c).

Sequence Number Reuse
• Idea: tie initial TCP seq. numbers to a clock
  - increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  - use a PRNG; see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock val + F(src, dst, sport, dport, random())

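A sketch of the RFC 1948 idea: a 4 µs clock plus a keyed hash of the connection 4-tuple. RFC 1948 suggests MD5 for F; SHA-256 is swapped in here purely for illustration. The `SECRET` value and the use of `time.monotonic_ns()` as the clock source are assumptions.

```python
import hashlib
import os
import time

SECRET = os.urandom(16)   # hypothetical per-boot secret


def rfc1948_isn(src, dst, sport, dport, now_us=None, secret=SECRET):
    """ISN = M + F(4-tuple, secret) mod 2**32, with M ticking every 4 us."""
    if now_us is None:
        now_us = time.monotonic_ns() // 1000
    m = now_us // 4                                   # 4-microsecond clock
    data = f"{src}|{dst}|{sport}|{dport}".encode() + secret
    f = int.from_bytes(hashlib.sha256(data).digest()[:4], "big")
    return (m + f) % 2 ** 32
```

An attacker who does not know the secret cannot predict the per-connection offset F, while a given 4-tuple still sees ISNs advance with the clock.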
When Sequence Numbers Attack
• Suppose attacker A can predict the sequence number host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  - B believes it is talking to C; might grant permissions based on C's IP address
  - attacker on A must suppress the RST packets C is likely to send; use a denial-of-service attack for that
• A sends message to compromise B

When SYNs Attack
• Servers receiving a SYN must allocate resources
  - opens up the possibility of a denial-of-service attack where the server is flooded with bogus SYN packets with forged IP source addresses
• Solution: SYN cookies
  - server computes its initial sequence number and sends the SYNACK, but does not allocate buffers
  - if the client continues with an ACK, check whether the acknowledged number could have been sent by the server; if so, allocate buffers
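A simplified SYN-cookie sketch follows. The server's ISN is a keyed hash of the connection and a coarse time `slot`, so nothing is stored per SYN. On the final ACK, the server recomputes the value from the ACK's own fields. The classic design also packs a time counter and an MSS index into the 32 bits. Here the whole value is the hash, and `slot`, the secret and the function names are hypothetical.

```python
import hashlib
import hmac

SEQ_MOD = 2 ** 32


def syn_cookie(secret, four_tuple, client_isn, slot):
    """Server ISN computed from a keyed hash; nothing is stored per SYN."""
    msg = f"{four_tuple}|{client_isn}|{slot}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def ack_is_valid(secret, four_tuple, ack_seq, ack_no, slot_now):
    """Check the client's final ACK without any saved state.

    The ACK carries seq = client ISN + 1 and ack = server ISN + 1, so the
    server recovers the client ISN and recomputes its cookie. Cookies from
    the current or previous time slot are accepted.
    """
    client_isn = (ack_seq - 1) % SEQ_MOD
    for slot in (slot_now, slot_now - 1):
        cookie = syn_cookie(secret, four_tuple, client_isn, slot)
        if (cookie + 1) % SEQ_MOD == ack_no:
            return True   # only now would the server allocate buffers
    return False
```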
Sequence Number Summary
• Goals, Set 1:
  - Guard against old duplicates in one connection -> don't reuse sequence numbers until it's reasonably certain that old duplicates have disappeared
  - Even after a crash/restart -> hence tie to time -> but don't use a global clock -> hence need a 3-way handshake where each side verifies the sequence number chosen by the other side before the connection is successfully established
• Goals, Set 2:
  - Don't allow hijacking of connections unless the attacker can eavesdrop -> use a PRNG for the initial seq. number choice
  - Don't allow SYN attacks -> compute, but don't store, the initial sequence number

TCP Connection Management (cont.)
Closing a connection: client closes socket: close(s);
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK; closes connection, sends FIN
[Figure: client sends FIN; server replies with ACK, then closes and sends its own FIN; client ACKs and enters timed wait before the connection is closed]

TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK
  - enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK; connection closed
Note: with a small modification, can handle simultaneous FINs
[Figure: the same FIN/ACK exchange; both sides pass through closing, the client waits in timed wait, then both are closed]

TCP Connection FSM
[Figure: TCP connection state machine]
The heavy solid line is the normal path for a client.
The heavy dashed line is the normal path for a server.
The light lines are unusual events.
Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.

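A table-driven sketch of the normal client and server paths, labeling each step "event/action" as in the figure. State names follow RFC 793. The table is only a subset: no RST handling and no timeouts other than TIME-WAIT. It does include the simultaneous-close path through CLOSING mentioned earlier. Event strings and the `run` helper are hypothetical.

```python
# (state, event) -> (action, next_state); "-" means no segment is sent.
TRANSITIONS = {
    ("CLOSED", "app: active open"): ("send SYN", "SYN-SENT"),
    ("CLOSED", "app: passive open"): ("-", "LISTEN"),
    ("LISTEN", "recv SYN"): ("send SYN+ACK", "SYN-RECEIVED"),
    ("SYN-SENT", "recv SYN+ACK"): ("send ACK", "ESTABLISHED"),
    ("SYN-RECEIVED", "recv ACK"): ("-", "ESTABLISHED"),
    ("ESTABLISHED", "app: close"): ("send FIN", "FIN-WAIT-1"),
    ("ESTABLISHED", "recv FIN"): ("send ACK", "CLOSE-WAIT"),
    ("FIN-WAIT-1", "recv ACK"): ("-", "FIN-WAIT-2"),
    ("FIN-WAIT-1", "recv FIN"): ("send ACK", "CLOSING"),   # simultaneous close
    ("CLOSING", "recv ACK"): ("-", "TIME-WAIT"),
    ("FIN-WAIT-2", "recv FIN"): ("send ACK", "TIME-WAIT"),
    ("TIME-WAIT", "timeout (2 MSL)"): ("-", "CLOSED"),
    ("CLOSE-WAIT", "app: close"): ("send FIN", "LAST-ACK"),
    ("LAST-ACK", "recv ACK"): ("-", "CLOSED"),
}


def run(start, events):
    """Apply events in order; return (final_state, ["event/action", ...])."""
    state, labels = start, []
    for ev in events:
        action, state = TRANSITIONS[(state, ev)]   # KeyError = unexpected event
        labels.append(f"{ev}/{action}")
    return state, labels
```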
TCP Connection Management (cont'd)
[Figure: TCP client lifecycle; TCP server lifecycle]

Closing a Connection
• Note: the previous charts showed the normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  - No: famous two-army problem

Summary
• TCP segments, acknowledgements & retransmission
  - delayed ACKs, Nagle's algorithm
  - fast retransmit
• RTT estimation & Karn's algorithm
• Flow control & silly window syndrome
• Connection management in TCP
• Attacks against TCP's connection management scheme
  - SYN attack
  - sequence number prediction attacks

TCP Miscellaneous
• MSS: Maximum Segment Size option
  - client/server agree on a larger than default (536 outside same subnet) MSS via an option on the SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for the receive window, to allow for LFNs ("elephants"): Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• …

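To see why WSCALE matters for LFNs, the sketch below computes the smallest window-scale shift whose scaled 16-bit window covers the bandwidth-delay product. RFC 1323 limits the shift to 14. The helper name is hypothetical, and real stacks may choose larger shifts than the minimum.

```python
def min_window_scale(bandwidth_bps, rtt_s):
    """Smallest shift so that 65535 << shift covers the bandwidth-delay product."""
    bdp_bytes = bandwidth_bps * rtt_s / 8
    shift = 0
    while (65535 << shift) < bdp_bytes:
        shift += 1
        if shift > 14:                       # RFC 1323 maximum shift
            raise ValueError("BDP exceeds the largest scalable window")
    return shift
```

For the 10 Gbps, 100 ms path from the throughput example, the BDP is 125 MB and `min_window_scale(10e9, 0.1)` returns 11. A 1 Mbps, 50 ms path needs no scaling.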
Study of TCP Outline
• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP Congestion Control

Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control
• manifestations:
  - long delays (queueing in router buffers)
  - lost packets (buffer overflow at routers)
• a top-10 problem

Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A and Host B share a router with unlimited shared output link buffers; λin: original data; λout]

Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Host A and Host B share a router with finite shared output link buffers; λin: original data; λ'in: original data, plus retransmitted data; λout]
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2lin
l out
a c
R2
R2lin
l out
R4
R2
R2lin
l out
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit
Q what happens as lin and lrsquoin increase
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission
capacity used for that packet was wastedbull ultimately leads to congestion collapse
Host A
Host B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even if no packet loss occurs
bull If packet loss occurs needed retransmission require offered load to be greater than goodput
bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion control
bull no explicit feedback from network
bull congestion inferred from end-system observed loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systemsndash single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Controlbull end-end control (no network
assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput
bull CongWin is dynamic function of perceived network congestion
How does sender notice congestion
bull loss event = timeout or 3 duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is primary cause of loss)
Three mechanismsbull AIMDbull slow startbull fast recovery
rate = CongWin
RTT Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
multiplicative decrease cut CongWin in half after loss event
additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Startbull When connection
begins CongWin = 1 MSSndash Example MSS =
500 bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly
ramp up to respectable rate
bull When connection begins increase rate exponentially fast until first loss event
CS 4284 Spring 2013
TCP Slow Start (more)bull When connection begins
increase rate exponentially until first loss eventndash double CongWin every
RTTndash done by incrementing CongWin for every ACK received
bull Summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
CS 4284 Spring 2013
TCP Tahoe vs RenoQ When should the
exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss
eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly
(this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKsbull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout eventndash CongWin instead set
to 1 MSS ndash window then grows
exponentiallyndash to a threshold then
grows linearly
bull 3 dup ACKs indicates network capable of delivering some segments
bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
CS 4284 Spring 2013
TCP Sender Congestion ControlEvent State TCP Sender Action Commentary
ACK receipt for previously unacked data
Slow Start (SS)
CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA
Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data
CongestionAvoidance (CA)
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK
SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter slow start
Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
CS 4284 Spring 2013
TCP Throughput Idealized
bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start
bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput
to W2RTT bull Average steady-state throughput 75 WRTT
W
W2
CS 4284 Spring 2013
TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT
want 10 Gbps throughputbull Requires window size W = 83333 in-flight
segmentsbull Throughput in terms of loss rate
bull L = 210-10 Very lowbull Require almost perfect link
LRTTMSS221
Equation-Based Control
bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which
are samples of timeout events + 3-ack events)bull Instead equation-based control uses an
equation to compute sending rate based on this input
bull See RFC 5348 for more info
CS 4284 Spring 2013
TCP Fairness
CS 4284 Spring 2013
Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk
TCP connection 1
bottleneckrouter
capacity RTCP connection 2
TCP Fairness
CS 4284 Spring 2013
Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughputConn
e ctio
n 2
thro
u ghp
ut
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
CS 4284 Spring 2013
Fairness (more)Fairness and UDPbull Multimedia apps often
do not use TCPndash do not want rate throttled
by congestion controlbull Instead use UDP
ndash pump audiovideo at constant rate tolerate packet loss
bull TCP friendliness
Fairness and parallel TCP connections
bull nothing prevents app from opening parallel connections between 2 hosts
bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets
rate R10ndash new app asks for 9 TCPs gets
R2bull Thatrsquos what ldquodownload
acceleratorsrdquo do
CS 4284 Spring 2013
TCP 3-way handshake
TCP connection establishment bull Q1 why 3-way and not 2-way handshakebull Q2 how do sender amp receiver determine initial seqnums
CS 4284 Spring 2013
3-way Handshake amp Delayed Dups
(a) Normal operation(b) Old SYN appearing out of nowhere(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)
CS 4284 Spring 2013
Sequence Number Reuse
bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a
connection with identical sequence numbersbull Must also guard against sequence number prediction attack
ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())
CS 4284 Spring 2013
When Sequence Numbers Attack
bull Suppose attacker A can predict sequence number a host B is going to use next
bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions
based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is
likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B
CS 4284 Spring 2013
When SYNs Attackbull Servers receiving SYN must allocate resources
ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
bull Solutionndash SYN cookies
bull Server creates ACK number sends ACK ndash but does not allocate buffers
ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct
Sequence Number Summarybull Goals Set 1
ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared
ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs
bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can
eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial
sequence number
CS 4284 Spring 2013
CS 4284 Spring 2013
TCP Connection Management (cont)Closing a connection
client closes socket close(s)
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
timed
wai
t
CS 4284 Spring 2013
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
timed
wai
tclosed
CS 4284 Spring 2013
TCP ConnectionFSM
The heavy solid line is the normal path for a client
The heavy dashed line is the normal path for a server
The light lines are unusual events
Each transition is labeled by the event causing it and the action resulting from it separated by a slash
CS 4284 Spring 2013
TCP Connection Management (contrsquod)
TCP client lifecycle TCP server lifecycle
CS 4284 Spring 2013
Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if
packets (FIN ACK) can be lostndash No Famous two-army problem
CS 4284 Spring 2013
Summarybull TCP segments acknowledgements amp
retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit
bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management
schemendash SYN attackndash Sequence number prediction attacks
CS 4284 Spring 2013
TCP Miscellaneousbull MSS Maximum Segment Size Option
ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN
bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to
allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT
measurement PAWS for protection against wrap-around for sequence numbers
bull hellip
CS 4284 Spring 2013
Study of TCP Outlinebull segment structurebull reliable data transfer
ndash delayed ACKsndash Naglersquos algorithm
bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control
TCP
Congestion Control
CS 4284 Spring 2013
Principles of Congestion Control
Congestionbull informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull different from flow controlbull manifestations
ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)
bull a top-10 problem
CS 4284 Spring 2013
Causescosts of congestion scenario 1 bull two senders two
receiversbull one router infinite
buffers bull no retransmission
bull large delays when congested
bull (but no reduction in throughput here)
unlimited shared output link buffers
Host Alin original data
Host B
lout
CS 4284 Spring 2013
Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet
after timeout
finite shared output link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2lin
l out
a c
R2
R2lin
l out
R4
R2
R2lin
l out
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit
Q what happens as lin and lrsquoin increase
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission
capacity used for that packet was wastedbull ultimately leads to congestion collapse
Host A
Host B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even if no packet loss occurs
bull If packet loss occurs needed retransmission require offered load to be greater than goodput
bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion control
bull no explicit feedback from network
bull congestion inferred from end-system observed loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systemsndash single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Controlbull end-end control (no network
assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput
bull CongWin is dynamic function of perceived network congestion
How does sender notice congestion
bull loss event = timeout or 3 duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is primary cause of loss)
Three mechanismsbull AIMDbull slow startbull fast recovery
rate = CongWin
RTT Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
multiplicative decrease cut CongWin in half after loss event
additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Startbull When connection
begins CongWin = 1 MSSndash Example MSS =
500 bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly
ramp up to respectable rate
bull When connection begins increase rate exponentially fast until first loss event
CS 4284 Spring 2013
TCP Slow Start (more)bull When connection begins
increase rate exponentially until first loss eventndash double CongWin every
RTTndash done by incrementing CongWin for every ACK received
bull Summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
CS 4284 Spring 2013
TCP Tahoe vs RenoQ When should the
exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss
eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly
(this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKsbull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout eventndash CongWin instead set
to 1 MSS ndash window then grows
exponentiallyndash to a threshold then
grows linearly
bull 3 dup ACKs indicates network capable of delivering some segments
bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
TCP Sender Congestion Control (event, state: TCP sender action; commentary)
• ACK receipt for previously unacked data, in Slow Start (SS): CongWin = CongWin + MSS; if (CongWin > Threshold), set state to CA. Commentary: results in a doubling of CongWin every RTT.
• ACK receipt for previously unacked data, in Congestion Avoidance (CA): CongWin = CongWin + MSS·(MSS/CongWin). Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT.
• Triple duplicate ACK, in SS or CA: Threshold = CongWin/2; CongWin = Threshold; set state to CA. Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS.
• Timeout, in SS or CA: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start". Commentary: enter slow start.
• Duplicate ACK, in SS or CA: increment the duplicate ACK count for the segment being ACKed. Commentary: CongWin and Threshold are not changed.
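The commentary that the per-ACK rule CongWin = CongWin + MSS·(MSS/CongWin) adds about 1 MSS per RTT can be checked numerically. This sketch assumes one ACK per full-sized segment (no delayed ACKs); `ca_growth_per_rtt` is a hypothetical helper.

```python
def ca_growth_per_rtt(cwnd: float, mss: float) -> float:
    """Apply the CA per-ACK rule for one RTT's worth of ACKs; return the growth.

    Assumes one ACK per full-sized segment in the window (no delayed ACKs).
    """
    start = cwnd
    for _ in range(int(start // mss)):
        cwnd += mss * mss / cwnd  # CongWin = CongWin + MSS * (MSS / CongWin)
    return cwnd - start


growth = ca_growth_per_rtt(10 * 1460, 1460)
print(f"{growth / 1460:.3f} MSS")  # slightly below 1 MSS
```

Because CongWin grows a little with each ACK during the RTT, each later increment is slightly smaller, so the total lands just under 1 MSS.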
TCP Throughput (Idealized)
• What's the average throughput of TCP as a function of window size and RTT?
  - Long-lived connection; ignore slow start
• When the window is W, throughput is W/RTT
• Just after a loss, the window drops to W/2, and throughput to W/(2·RTT)
• Average steady-state throughput: 0.75·W/RTT
[Figure: window sawtooth oscillating between W/2 and W]
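The 0.75·W/RTT figure is the mean of a linear ramp from W/2 to W. A quick check, assuming the window rises by exactly one segment per RTT and a loss occurs exactly when it reaches W (`average_throughput_bps` is a hypothetical name):

```python
def average_throughput_bps(w_segments: int, mss_bytes: int, rtt_s: float) -> float:
    """Average throughput of the idealized sawtooth, in bits per second.

    Assumes the window climbs by 1 segment per RTT from W/2 to W and then
    drops back to W/2 (w_segments should be even).
    """
    windows = range(w_segments // 2, w_segments + 1)  # W/2, W/2 + 1, ..., W
    avg_window = sum(windows) / len(windows)           # = 0.75 * W
    return avg_window * mss_bytes * 8 / rtt_s


print(average_throughput_bps(20, 1500, 0.1))  # about 1.8 Mbps (0.75 * 20 segments per 100 ms)
```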
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
  $\text{throughput} = \frac{1.22 \cdot MSS}{RTT \cdot \sqrt{L}}$
• ⇒ $L = 2 \cdot 10^{-10}$. Very low!
• Requires an almost perfect link
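The numbers in the example can be reproduced directly. The sketch below computes W = rate·RTT/MSS and solves the throughput formula above for L, with MSS converted to bits so the units match the rate; the function names are hypothetical.

```python
def window_segments(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Segments that must be in flight to sustain rate_bps: W = rate * RTT / MSS."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


w = window_segments(10e9, 0.100, 1500)
loss = required_loss_rate(10e9, 0.100, 1500)
print(f"W = {w:,.0f} segments, L = {loss:.1e}")  # W = 83,333 segments, L = 2.1e-10
```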
Equation-Based Control
• Note: TCP congestion control forms a control loop
  - Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-dup-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on these inputs
• See RFC 5348 for more info
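As one example of equation-based control, TFRC (RFC 5348) computes its allowed rate from a TCP throughput equation. The sketch below transcribes that equation as we recall it from Section 3.1 of the RFC; please check the constants against the RFC itself before relying on them. Defaulting b to 1 and t_RTO to 4·RTT are simplifications assumed here, and `tcp_throughput_eq` is a hypothetical name.

```python
import math
from typing import Optional


def tcp_throughput_eq(s: float, rtt: float, p: float, b: int = 1,
                      t_rto: Optional[float] = None) -> float:
    """TCP throughput equation (bytes/sec) in the form used by TFRC.

    s: segment size (bytes); rtt: round-trip time (s); p: loss event rate;
    b: packets acknowledged per ACK; t_rto: retransmission timeout (s).
    """
    if t_rto is None:
        t_rto = 4 * rtt  # assumed simplification: t_RTO = 4 * RTT
    denom = (rtt * math.sqrt(2 * b * p / 3)
             + t_rto * (3 * math.sqrt(3 * b * p / 8) * p * (1 + 32 * p ** 2)))
    return s / denom


for p in (1e-4, 1e-3, 1e-2, 1e-1):
    print(f"p = {p:g}: {tcp_throughput_eq(1460, 0.1, p) * 8 / 1e6:.2f} Mbps")
```

For small p the first term dominates and the equation reduces to roughly 1.22·s/(RTT·√p), the same form as the loss formula on the previous slide; the timeout term only matters at high loss rates.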
TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k.
[Figure: TCP connections 1 and 2 sharing a bottleneck router of capacity R]
Why is TCP fair?
Consider two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: Connection 1 throughput (x-axis, up to R) vs. Connection 2 throughput (y-axis, up to R) with the equal-bandwidth-share line; repeated cycles of "congestion avoidance: additive increase" and "loss: decrease window by factor of 2" move the operating point toward the equal share]
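The convergence argument can be simulated. The toy model below assumes both flows have the same RTT and see losses at the same moment (whenever their combined rate exceeds R); `aimd_two_flows` is a hypothetical helper. Additive increase leaves the gap between the flows unchanged, while every halving cuts the gap in half, so the rates approach the equal-share line.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, steps: int,
                   increase: float = 1.0) -> tuple[float, float]:
    """Toy model of two AIMD flows sharing one bottleneck.

    Each step, if the combined rate exceeds capacity, both flows see a loss
    and halve their rate; otherwise both add `increase`.
    """
    for _ in range(steps):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2                # multiplicative decrease
        else:
            x1, x2 = x1 + increase, x2 + increase  # additive increase
    return x1, x2


a, b = aimd_two_flows(0.0, 40.0, capacity=50.0, steps=300)
print(f"{a:.3f} {b:.3f}")  # the two rates end up nearly equal
```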
Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
• Instead use UDP:
  - pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections
• nothing prevents an app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
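The parallel-connection arithmetic follows from per-connection fairness: if all n connections on the bottleneck get R/n, an app holding k of them gets k·R/n. A minimal sketch (the function name is hypothetical):

```python
from fractions import Fraction


def share_of_bottleneck(new_conns: int, existing_conns: int) -> Fraction:
    """Fraction of rate R a new app gets, assuming every TCP connection on
    the bottleneck receives an equal share."""
    return Fraction(new_conns, new_conns + existing_conns)


print(share_of_bottleneck(1, 9))  # 1/10
print(share_of_bottleneck(9, 9))  # 1/2
```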
3-way Handshake & Delayed Duplicates
[Figure: (a) Normal operation; (b) Old SYN appearing out of nowhere; (c) Duplicate SYN and duplicate ACK following SYN]
The 3-way handshake is required to deal with scenarios (b) and (c).
Sequence Number Reuse
• Idea: tie initial TCP sequence numbers to a clock
  - Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  - Use PRNG; see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock value + F(src, dst, sport, dport, random())
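A minimal sketch of an RFC 1948-style ISN generator. Where the slide writes F(src, dst, sport, dport, random()), this sketch uses a keyed hash of the 4-tuple and a secret, which is what RFC 1948 suggests; SHA-256 is substituted for the MD5 the RFC mentions. `SECRET`, `initial_seq_number` and the explicit `clock_us` argument (microseconds, passed in to keep the example testable) are assumptions.

```python
import hashlib
import os

SECRET = os.urandom(16)  # hypothetical per-boot secret key


def initial_seq_number(src: str, sport: int, dst: str, dport: int,
                       clock_us: int) -> int:
    """RFC 1948-style ISN: a 4-microsecond clock plus a keyed hash of the
    connection 4-tuple, modulo 2**32."""
    m = clock_us // 4  # timer advancing once every 4 microseconds
    conn_id = f"{src}|{sport}|{dst}|{dport}".encode()
    f = int.from_bytes(hashlib.sha256(conn_id + SECRET).digest()[:4], "big")
    return (m + f) % 2**32


print(initial_seq_number("10.0.0.1", 40000, "10.0.0.2", 80, 0))
```

In a live setting `clock_us` could come from `time.monotonic_ns() // 1000`. Because F depends on the secret, an off-path attacker cannot compute the per-connection offset, while each connection's ISN still advances with the clock.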
When Sequence Numbers Attack
• Suppose attacker A can predict the sequence number that host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  - B believes it is talking to C, and might grant permissions based on C's IP address
  - Attacker on A must suppress the RST packets C is likely to send; use a denial-of-service attack for that
• A sends a message to compromise B
When SYNs Attack
• Servers receiving a SYN must allocate resources
  - Opens up the possibility of a denial-of-service attack in which the server is flooded with bogus SYN packets with forged IP source addresses
• Solution: SYN cookies
  - Server computes its initial sequence number (the cookie) and sends the SYN/ACK, but does not allocate buffers
  - If the client continues with the final ACK, check whether that ACK could have been sent in reply to our SYN/ACK; if correct, then allocate buffers
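A simplified SYN-cookie sketch. Real implementations (for example, the Linux kernel's) use their own bit layouts and also encode an MSS index; here, as an assumption, the server ISN is an 8-bit time counter followed by a 24-bit HMAC of the connection 4-tuple, the client's ISN and that counter. `SECRET`, `make_cookie` and `ack_is_valid` are hypothetical names. The key point is that nothing is stored between the SYN/ACK and the final ACK.

```python
import hashlib
import hmac

SECRET = b"hypothetical-server-secret"  # assumption: per-server secret key


def _mac24(four_tuple: tuple, client_isn: int, counter: int) -> int:
    """24-bit keyed hash of the connection, the client's ISN and a time counter."""
    msg = f"{four_tuple}|{client_isn}|{counter}".encode()
    return int.from_bytes(hmac.new(SECRET, msg, hashlib.sha256).digest()[:3], "big")


def make_cookie(four_tuple: tuple, client_isn: int, now: int) -> int:
    """Server ISN for the SYN/ACK: 8-bit time counter | 24-bit MAC. No state kept."""
    counter = now % 256
    return (counter << 24) | _mac24(four_tuple, client_isn, counter)


def ack_is_valid(four_tuple: tuple, ack_seq: int, ack_num: int, now: int,
                 max_age: int = 2) -> bool:
    """On the handshake's final ACK, recompute the cookie instead of looking it up."""
    cookie = (ack_num - 1) % 2**32       # client ACKs our ISN + 1
    client_isn = (ack_seq - 1) % 2**32   # client's seq is its ISN + 1
    counter = cookie >> 24
    if (now - counter) % 256 > max_age:  # cookie too old
        return False
    return (cookie & 0xFFFFFF) == _mac24(four_tuple, client_isn, counter)


conn = ("10.0.0.1", 40000, "10.0.0.2", 80)
cookie = make_cookie(conn, client_isn=1000, now=7)
print(ack_is_valid(conn, ack_seq=1001, ack_num=cookie + 1, now=8))  # True
```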
Sequence Number Summary
• Goals Set 1
  - Guard against old duplicates in one connection -> don't reuse sequence numbers until it's reasonably certain that old duplicates have disappeared
  - Even after a crash/restart -> hence tie to time -> but don't use a global clock -> hence need a 3-way handshake where each side verifies the sequence number chosen by the other side before a connection is successfully established
• Goals Set 2
  - Don't allow hijacking of connections unless the attacker can eavesdrop -> use PRNG for initial seq number choice
  - Don't allow SYN attacks -> compute but don't store initial sequence number
TCP Connection Management (cont.)
Closing a connection:
client closes socket: close(s);
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline: client close sends FIN; server replies with ACK; server close sends FIN; client replies with ACK and enters timed wait before closed]
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
  - Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with a small modification, can handle simultaneous FINs.
[Figure: client/server timeline: FIN, ACK, FIN, ACK; client goes from closing to timed wait to closed; server goes from closing to closed]
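The closing sequences above, including the simultaneous-FIN case, map onto the TCP states named in RFC 793; the slides' "timed wait" is the TIME_WAIT state, which lasts 2·MSL. The table-driven sketch below covers only the close path; `CLOSE_TRANSITIONS` and `run` are hypothetical names.

```python
# (state, event) -> (action, next_state), using the RFC 793 state names.
CLOSE_TRANSITIONS = {
    ("ESTABLISHED", "close"):      ("send FIN", "FIN_WAIT_1"),  # active close
    ("FIN_WAIT_1", "rcv ACK"):     (None, "FIN_WAIT_2"),
    ("FIN_WAIT_1", "rcv FIN"):     ("send ACK", "CLOSING"),     # simultaneous FINs
    ("FIN_WAIT_2", "rcv FIN"):     ("send ACK", "TIME_WAIT"),
    ("CLOSING", "rcv ACK"):        (None, "TIME_WAIT"),
    ("TIME_WAIT", "2MSL timeout"): (None, "CLOSED"),
    ("ESTABLISHED", "rcv FIN"):    ("send ACK", "CLOSE_WAIT"),  # passive close
    ("CLOSE_WAIT", "close"):       ("send FIN", "LAST_ACK"),
    ("LAST_ACK", "rcv ACK"):       (None, "CLOSED"),
}


def run(events, state="ESTABLISHED"):
    """Apply events in order; return the list of visited states."""
    path = [state]
    for event in events:
        _action, state = CLOSE_TRANSITIONS[(state, event)]
        path.append(state)
    return path


print(run(["close", "rcv ACK", "rcv FIN", "2MSL timeout"]))
# ['ESTABLISHED', 'FIN_WAIT_1', 'FIN_WAIT_2', 'TIME_WAIT', 'CLOSED']
```

The printed path is the client side of the figure; `run(["rcv FIN", "close", "rcv ACK"])` gives the server side, and `run(["close", "rcv FIN", "rcv ACK"])` the simultaneous-FIN case.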
TCP Connection FSM
The heavy solid line is the normal path for a client.
The heavy dashed line is the normal path for a server.
The light lines are unusual events.
Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.
TCP Connection Management (cont'd)
[Figure: TCP client lifecycle; TCP server lifecycle]
Closing a Connection
• Note: previous charts showed the normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  - No! Famous two-army problem
Summary
• TCP segments, acknowledgements & retransmission
  - Delayed ACKs, Nagle's algorithm
  - Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
  - SYN attack
  - Sequence number prediction attacks
TCP Miscellaneous
• MSS: Maximum Segment Size Option
  - Client/server agree on a larger than default (536 outside same subnet) MSS via option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFNs ("elephants"): Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• …
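To see why WSCALE is needed on LFNs: the 16-bit receive window field caps the window at 65,535 bytes, while the window must cover the bandwidth-delay product to keep the pipe full. The sketch below finds the smallest shift count that suffices (RFC 1323 caps the shift at 14); `min_window_scale` is a hypothetical helper.

```python
def min_window_scale(bandwidth_bps: float, rtt_s: float) -> int:
    """Smallest WSCALE shift s such that 65535 << s covers the bandwidth-delay product."""
    bdp_bytes = bandwidth_bps * rtt_s / 8
    for shift in range(15):  # RFC 1323 limits the shift count to 14
        if (65535 << shift) >= bdp_bytes:
            return shift
    raise ValueError("bandwidth-delay product exceeds the maximum scaled window")


print(min_window_scale(10e9, 0.100))  # 11
```

For a 10 Gbps path with a 100 ms RTT, the bandwidth-delay product is 125,000,000 bytes; 65,535 << 10 falls short, while 65,535 << 11 (about 134 MB) covers it.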
Study of TCP: Outline
• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP Congestion Control
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control
• manifestations:
  - long delays (queueing in router buffers)
  - lost packets (buffer overflow at routers)
• a top-10 problem
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Hosts A and B send λin (original data) through a router with unlimited shared output link buffers; λout is the delivered throughput]
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Host A sends λin (original data) and λ'in (original data plus retransmitted data) through a router with finite shared output link buffers to Host B; λout is the delivered throughput]
Causes/costs of congestion: scenario 2
• Always: λin = λout (goodput)
• a) if no loss: λ'in = λin
• b) assume a clairvoyant sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes λ'in larger for the same λout (every packet transmitted twice)
"Costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: link carries multiple copies of a packet
[Figure: three plots of λout vs. offered load, both axes up to R/2; λout reaches R/2 in (a), R/3 in (b) and R/4 in (c)]
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λin and λ'in increase?
[Figure: Hosts A and B, among four senders, send λin (original data) and λ'in (original data plus retransmitted data) over multihop paths through routers with finite shared output link buffers; λout is the delivered throughput]
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse
[Figure: λout between Host A and Host B as offered load increases]
Reasons for Congestion Control
• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case
Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at
TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission: LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• Roughly, rate = CongWin/RTT bytes/sec
• CongWin and RTT influence throughput
• CongWin is a dynamic function of perceived network congestion
How does the sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after a loss event
• (assumes congestion is the primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
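The two rules on this slide, the in-flight limit and rate ≈ CongWin/RTT, can be written as two small functions. This is a sketch with hypothetical names and byte-valued inputs, ignoring details such as Nagle's algorithm and silly window avoidance.

```python
def sendable_bytes(last_byte_sent: int, last_byte_acked: int,
                   cong_win: int, rcv_window: int) -> int:
    """Bytes the sender may still send while keeping
    LastByteSent - LastByteAcked <= min(CongWin, RcvWindow)."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cong_win, rcv_window) - in_flight)


def rate_bytes_per_sec(cong_win: int, rtt_s: float) -> float:
    """Rough sending rate: rate = CongWin / RTT."""
    return cong_win / rtt_s


print(sendable_bytes(50_000, 40_000, cong_win=14_600, rcv_window=65_535))  # 4600
print(rate_bytes_per_sec(14_600, 0.1))  # about 146,000 bytes/sec
```

Whichever of CongWin and RcvWindow is smaller governs: congestion control protects the network, flow control protects the receiver.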
TCP AIMD
• multiplicative decrease: cut CongWin in half after a loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing
[Figure: congestion window (8, 16, 24 Kbytes) vs. time for a long-lived TCP connection, showing a sawtooth]
TCP Slow Start
• When connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes & RTT = 200 msec
  - initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to a respectable rate
• When connection begins, increase rate exponentially fast until first loss event
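The slide's 20 kbps comes from one 500-byte segment per 200 ms RTT. The sketch below reproduces that and counts how many RTTs of doubling it would take to reach a target rate, assuming no loss ends slow start early; the function names are hypothetical.

```python
def initial_rate_bps(mss_bytes: int, rtt_s: float) -> float:
    """Sending rate with CongWin = 1 MSS: one segment per RTT."""
    return mss_bytes * 8 / rtt_s


def rtts_to_reach(target_bps: float, mss_bytes: int, rtt_s: float) -> int:
    """RTTs of slow start (CongWin doubling every RTT) until the rate reaches target."""
    rate, rtts = initial_rate_bps(mss_bytes, rtt_s), 0
    while rate < target_bps:
        rate *= 2
        rtts += 1
    return rtts


print(initial_rate_bps(500, 0.200))     # about 20,000 bits/sec (20 kbps)
print(rtts_to_reach(10e6, 500, 0.200))  # 9
```

Nine RTTs (under 2 seconds here) take the rate from 20 kbps past 10 Mbps, which is why exponential ramp-up is preferred over starting linearly.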
TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
  - double CongWin every RTT
  - done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart]
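Incrementing CongWin by 1 MSS per ACK doubles it each RTT because every segment in the window produces one ACK. A sketch under idealized assumptions (no loss, no delayed ACKs, all ACKs for a round arrive within that RTT); `slow_start_rounds` is a hypothetical name:

```python
def slow_start_rounds(rounds: int) -> list[int]:
    """CongWin (in MSS) at the start of each RTT when every ACK adds 1 MSS.

    Assumes no loss, no delayed ACKs, and that every segment sent in a round
    is ACKed within that RTT.
    """
    cwnd, history = 1, []
    for _ in range(rounds):
        history.append(cwnd)
        acks = cwnd       # one ACK per segment sent this RTT
        for _ in range(acks):
            cwnd += 1     # CongWin = CongWin + MSS per ACK
    return history


print(slow_start_rounds(5))  # [1, 2, 4, 8, 16]
```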
TCP Tahoe vs. Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At a loss event, Threshold is set to 1/2 of CongWin just before the loss event
• If timeout event: set CongWin to 1 MSS
• If triple-ACK event: set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)
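The Tahoe/Reno difference comes down to how each reacts to a triple duplicate ACK. The sketch below encodes the rules as stated on this slide (window values in MSS); `react_to_loss` is a hypothetical helper, and real Reno additionally inflates the window during fast recovery, as in the FSM summary.

```python
def react_to_loss(variant: str, event: str, cong_win: float) -> tuple[float, float]:
    """Return (Threshold, CongWin) in MSS after a loss event.

    Both variants set Threshold = CongWin/2 and restart from 1 MSS on a
    timeout; on a triple duplicate ACK only Reno performs fast recovery
    (CongWin = Threshold), while Tahoe also restarts at 1 MSS.
    """
    if variant not in ("tahoe", "reno") or event not in ("timeout", "3dup"):
        raise ValueError("variant must be 'tahoe' or 'reno'; event 'timeout' or '3dup'")
    threshold = cong_win / 2
    if variant == "reno" and event == "3dup":
        return threshold, threshold  # fast recovery: continue linearly from here
    return threshold, 1.0            # back to slow start at 1 MSS


for variant in ("tahoe", "reno"):
    print(variant, react_to_loss(variant, "3dup", 12.0))
# tahoe (6.0, 1.0)
# reno (6.0, 6.0)
```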
CS 4284 Spring 2013
Timeouts vs 3-dup ACKsbull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout eventndash CongWin instead set
to 1 MSS ndash window then grows
exponentiallyndash to a threshold then
grows linearly
bull 3 dup ACKs indicates network capable of delivering some segments
bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
CS 4284 Spring 2013
TCP Sender Congestion ControlEvent State TCP Sender Action Commentary
ACK receipt for previously unacked data
Slow Start (SS)
CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA
Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data
CongestionAvoidance (CA)
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK
SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter slow start
Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
CS 4284 Spring 2013
TCP Throughput Idealized
bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start
bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput
to W2RTT bull Average steady-state throughput 75 WRTT
W
W2
CS 4284 Spring 2013
TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT
want 10 Gbps throughputbull Requires window size W = 83333 in-flight
segmentsbull Throughput in terms of loss rate
bull L = 210-10 Very lowbull Require almost perfect link
LRTTMSS221
Equation-Based Control
bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which
are samples of timeout events + 3-ack events)bull Instead equation-based control uses an
equation to compute sending rate based on this input
bull See RFC 5348 for more info
CS 4284 Spring 2013
TCP Fairness
CS 4284 Spring 2013
Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk
TCP connection 1
bottleneckrouter
capacity RTCP connection 2
TCP Fairness
CS 4284 Spring 2013
Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughputConn
e ctio
n 2
thro
u ghp
ut
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
CS 4284 Spring 2013
Fairness (more)Fairness and UDPbull Multimedia apps often
do not use TCPndash do not want rate throttled
by congestion controlbull Instead use UDP
ndash pump audiovideo at constant rate tolerate packet loss
bull TCP friendliness
Fairness and parallel TCP connections
bull nothing prevents app from opening parallel connections between 2 hosts
bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets
rate R10ndash new app asks for 9 TCPs gets
R2bull Thatrsquos what ldquodownload
acceleratorsrdquo do
CS 4284 Spring 2013
Sequence Number Reuse
bull Idea Tie initial TCP seq numbers to clockndash Increment every 4s guards against previous incarnations of a
connection with identical sequence numbersbull Must also guard against sequence number prediction attack
ndash Use PRNG see [RFC 1948] [CERT 2001-09]bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())
CS 4284 Spring 2013
When Sequence Numbers Attack
bull Suppose attacker A can predict sequence number a host B is going to use next
bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions
based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is
likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B
CS 4284 Spring 2013
When SYNs Attackbull Servers receiving SYN must allocate resources
ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
bull Solutionndash SYN cookies
bull Server creates ACK number sends ACK ndash but does not allocate buffers
ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct
Sequence Number Summarybull Goals Set 1
ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared
ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs
bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can
eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial
sequence number
CS 4284 Spring 2013
CS 4284 Spring 2013
TCP Connection Management (cont)Closing a connection
client closes socket close(s)
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
timed
wai
t
CS 4284 Spring 2013
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
timed
wai
tclosed
CS 4284 Spring 2013
TCP ConnectionFSM
The heavy solid line is the normal path for a client
The heavy dashed line is the normal path for a server
The light lines are unusual events
Each transition is labeled by the event causing it and the action resulting from it separated by a slash
CS 4284 Spring 2013
TCP Connection Management (contrsquod)
TCP client lifecycle TCP server lifecycle
CS 4284 Spring 2013
Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if
packets (FIN ACK) can be lostndash No Famous two-army problem
CS 4284 Spring 2013
Summarybull TCP segments acknowledgements amp
retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit
bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management
schemendash SYN attackndash Sequence number prediction attacks
CS 4284 Spring 2013
TCP Miscellaneousbull MSS Maximum Segment Size Option
ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN
bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to
allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT
measurement PAWS for protection against wrap-around for sequence numbers
bull hellip
CS 4284 Spring 2013
Study of TCP Outlinebull segment structurebull reliable data transfer
ndash delayed ACKsndash Naglersquos algorithm
bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control
TCP
Congestion Control
CS 4284 Spring 2013
Principles of Congestion Control
Congestionbull informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull different from flow controlbull manifestations
ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)
bull a top-10 problem
CS 4284 Spring 2013
Causescosts of congestion scenario 1 bull two senders two
receiversbull one router infinite
buffers bull no retransmission
bull large delays when congested
bull (but no reduction in throughput here)
unlimited shared output link buffers
Host Alin original data
Host B
lout
CS 4284 Spring 2013
Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet
after timeout
finite shared output link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2lin
l out
a c
R2
R2lin
l out
R4
R2
R2lin
l out
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit
Q what happens as lin and lrsquoin increase
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission
capacity used for that packet was wastedbull ultimately leads to congestion collapse
Host A
Host B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even if no packet loss occurs
bull If packet loss occurs needed retransmission require offered load to be greater than goodput
bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion control
bull no explicit feedback from network
bull congestion inferred from end-system observed loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systemsndash single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Controlbull end-end control (no network
assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput
bull CongWin is dynamic function of perceived network congestion
How does sender notice congestion
bull loss event = timeout or 3 duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is primary cause of loss)
Three mechanismsbull AIMDbull slow startbull fast recovery
rate = CongWin
RTT Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
multiplicative decrease cut CongWin in half after loss event
additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Startbull When connection
begins CongWin = 1 MSSndash Example MSS =
500 bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly
ramp up to respectable rate
bull When connection begins increase rate exponentially fast until first loss event
CS 4284 Spring 2013
TCP Slow Start (more)bull When connection begins
increase rate exponentially until first loss eventndash double CongWin every
RTTndash done by incrementing CongWin for every ACK received
bull Summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
CS 4284 Spring 2013
TCP Tahoe vs RenoQ When should the
exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss
eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly
(this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKsbull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout eventndash CongWin instead set
to 1 MSS ndash window then grows
exponentiallyndash to a threshold then
grows linearly
bull 3 dup ACKs indicates network capable of delivering some segments
bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
CS 4284 Spring 2013
TCP Sender Congestion ControlEvent State TCP Sender Action Commentary
ACK receipt for previously unacked data
Slow Start (SS)
CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA
Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data
CongestionAvoidance (CA)
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK
SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter slow start
Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
CS 4284 Spring 2013
TCP Throughput Idealized
bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start
bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput
to W2RTT bull Average steady-state throughput 75 WRTT
W
W2
CS 4284 Spring 2013
TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT
want 10 Gbps throughputbull Requires window size W = 83333 in-flight
segmentsbull Throughput in terms of loss rate
bull L = 210-10 Very lowbull Require almost perfect link
LRTTMSS221
Equation-Based Control
bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which
are samples of timeout events + 3-ack events)bull Instead equation-based control uses an
equation to compute sending rate based on this input
bull See RFC 5348 for more info
CS 4284 Spring 2013
TCP Fairness
CS 4284 Spring 2013
Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk
TCP connection 1
bottleneckrouter
capacity RTCP connection 2
TCP Fairness
CS 4284 Spring 2013
Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughputConn
e ctio
n 2
thro
u ghp
ut
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
CS 4284 Spring 2013
Fairness (more)Fairness and UDPbull Multimedia apps often
do not use TCPndash do not want rate throttled
by congestion controlbull Instead use UDP
ndash pump audiovideo at constant rate tolerate packet loss
bull TCP friendliness
Fairness and parallel TCP connections
bull nothing prevents app from opening parallel connections between 2 hosts
bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets
rate R10ndash new app asks for 9 TCPs gets
R2bull Thatrsquos what ldquodownload
acceleratorsrdquo do
CS 4284 Spring 2013
When Sequence Numbers Attack
bull Suppose attacker A can predict sequence number a host B is going to use next
bull By using spoofed source IP C A can engage in successful 3-way handshake with Bndash B believes it is talking to C might grant permissions
based on Crsquos IP addressndash Attacker on A must suppress the RST packets C is
likely to send ndash use a denial-of-service attack for thatbull A sends message to compromise B
CS 4284 Spring 2013
When SYNs Attackbull Servers receiving SYN must allocate resources
ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
bull Solutionndash SYN cookies
bull Server creates ACK number sends ACK ndash but does not allocate buffers
ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct
Sequence Number Summarybull Goals Set 1
ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared
ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs
bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can
eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial
sequence number
CS 4284 Spring 2013
CS 4284 Spring 2013
TCP Connection Management (cont.): Closing a connection
Client closes socket: close(s);
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline of the FIN and ACK exchange; client: close → timed wait → closed; server: close → closed]
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
– Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with a small modification, can handle simultaneous FINs.
[Figure: client/server timeline of the FIN and ACK exchange; client: closing → timed wait → closed; server: closing → closed]
TCP Connection FSM
The heavy solid line is the normal path for a client
The heavy dashed line is the normal path for a server
The light lines are unusual events
Each transition is labeled by the event causing it and the action resulting from it separated by a slash
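The closing half of the connection FSM can be sketched as a transition table keyed by (state, event), where each entry gives the action and next state, mirroring the event/action labels. This is a simplification under stated assumptions: it uses RFC 793 state names (the slides' "timed wait" is TIME_WAIT, held for 2·MSL), covers only the close paths, and ignores the case where FIN and ACK arrive in one segment. `CLOSE_TRANSITIONS` and `run` are hypothetical names.

```python
# (state, event) -> (action, next_state); "(none)" marks a transition with no action
CLOSE_TRANSITIONS = {
    # active close (normal client path)
    ("ESTABLISHED", "close"):      ("send FIN", "FIN_WAIT_1"),
    ("FIN_WAIT_1", "rcv ACK"):     ("(none)", "FIN_WAIT_2"),
    ("FIN_WAIT_2", "rcv FIN"):     ("send ACK", "TIME_WAIT"),
    ("TIME_WAIT", "2MSL timeout"): ("(none)", "CLOSED"),
    # simultaneous close: our FIN is still unacknowledged when the peer's FIN arrives
    ("FIN_WAIT_1", "rcv FIN"):     ("send ACK", "CLOSING"),
    ("CLOSING", "rcv ACK"):        ("(none)", "TIME_WAIT"),
    # passive close (normal server path)
    ("ESTABLISHED", "rcv FIN"):    ("send ACK", "CLOSE_WAIT"),
    ("CLOSE_WAIT", "close"):       ("send FIN", "LAST_ACK"),
    ("LAST_ACK", "rcv ACK"):       ("(none)", "CLOSED"),
}

def run(events, state="ESTABLISHED"):
    """Apply events in order; return the final state and a 'state --event/action--> next' trace."""
    trace = []
    for ev in events:
        action, nxt = CLOSE_TRANSITIONS[(state, ev)]   # KeyError = event not expected here
        trace.append(f"{state} --{ev}/{action}--> {nxt}")
        state = nxt
    return state, trace
```

For example, `run(["close", "rcv ACK", "rcv FIN", "2MSL timeout"])` walks the client's normal path to CLOSED, while `run(["rcv FIN", "close", "rcv ACK"])` walks the server's.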
TCP Connection Management (cont'd)
[Figures: TCP client lifecycle; TCP server lifecycle]
Closing a Connection
• Note: previous charts showed the normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
– No: famous two-army problem
Summary
• TCP segments, acknowledgements & retransmission
– Delayed ACKs, Nagle's algorithm
– Fast retransmit
• RTT estimation & Karn's algorithm
• Flow control & silly window syndrome
• Connection management in TCP
• Attacks against TCP's connection management scheme
– SYN attack
– Sequence number prediction attacks
TCP Miscellaneous
• MSS: Maximum Segment Size option
– Client/server agree on a larger-than-default MSS (default: 536 bytes outside the same subnet) via an option on the SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for the receive window, to allow for LFNs ("elephants"): Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• …
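The receive window field in the TCP header is 16 bits, so without scaling at most 65,535 bytes can be unacknowledged per RTT. WSCALE shifts that field left by a negotiated count (at most 14, per RFC 1323). A small sketch, with `min_wscale` a hypothetical helper, computes the smallest shift whose scaled window covers the bandwidth-delay product:

```python
def min_wscale(bandwidth_bps, rtt_s):
    """Smallest window-scale shift whose largest window covers the bandwidth-delay product."""
    bdp_bytes = bandwidth_bps * rtt_s / 8      # bytes that must be in flight to fill the pipe
    for shift in range(15):                    # shift counts 0..14 (RFC 1323 limit)
        if (65535 << shift) >= bdp_bytes:      # 16-bit window field scaled by 2**shift
            return shift
    raise ValueError("bandwidth-delay product exceeds the largest scaled window")
```

For instance, a 1 Gbps path with a 100 ms RTT has a BDP of 12.5 MB, which needs a shift of 8 (65,535 << 8 = 16,776,960 bytes), whereas a 10 Mbps path with a 10 ms RTT needs no scaling at all.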
Study of TCP: Outline
• segment structure
• reliable data transfer
– delayed ACKs
– Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP Congestion Control
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control!
• manifestations:
– long delays (queueing in router buffers)
– lost packets (buffer overflow at routers)
• a top-10 problem!
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Hosts A and B, original data rate λin, unlimited shared output link buffers, throughput λout]
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Host A sends λin (original data) and λ'in (original data plus retransmitted data) through finite shared output link buffers to Host B; throughput λout]
Causes/costs of congestion: scenario 2
• Always: λin = λout (goodput)
• a) if no loss: λ'in = λin
• b) assume a clairvoyant sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes λ'in larger for the same λout (every packet transmitted twice)
"Costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: link carries multiple copies of a packet
[Figure: plots of λout versus λin for cases a, b, and c; axes run to R/2, with goodput levels R/3 and R/4 marked]
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λin and λ'in increase?
[Figure: Host A (λin: original data; λ'in: original data plus retransmitted data) and Host B, routers with finite shared output link buffers; throughput λout]
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted!
• ultimately leads to congestion collapse
[Figure: throughput λout between Host A and Host B]
Reasons for Congestion Control
bull Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case
Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from the network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
– single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
– explicit rate the sender should send at
TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission: $\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$
• CongWin and RTT influence throughput: $\text{rate} \approx \dfrac{\text{CongWin}}{\text{RTT}}$ bytes/sec
• CongWin is a dynamic function of perceived network congestion
How does the sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after a loss event
• (assumes congestion is the primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
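A short sketch of the two relations on this slide, with hypothetical helper names: how many more bytes the sender may transmit under the window constraint, and the rough rate a given CongWin sustains.

```python
def usable_window(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """Bytes the sender may still send while keeping
    LastByteSent - LastByteAcked <= min(CongWin, RcvWindow)."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cong_win, rcv_window) - in_flight)

def approx_rate(cong_win_bytes, rtt_s):
    """Rough throughput in bytes/sec: one window's worth of data per RTT."""
    return cong_win_bytes / rtt_s
```

With 6,000 bytes unacknowledged, CongWin = 8,000 and RcvWindow = 20,000, the sender may send 2,000 more bytes; if RcvWindow shrinks to 5,000, flow control rather than congestion control becomes the binding limit and nothing more may be sent.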
TCP AIMD
• multiplicative decrease: cut CongWin in half after a loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)
[Figure: congestion window (8, 16, 24 Kbytes) over time for a long-lived TCP connection]
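A per-RTT sketch of AIMD, counting the window in MSS units. It assumes, purely for illustration, that a loss event happens whenever the window exceeds a fixed `loss_threshold_mss` (a stand-in for the bottleneck's capacity), and it ignores slow start; `aimd_trace` is a hypothetical name.

```python
def aimd_trace(start_mss, loss_threshold_mss, rtts):
    """Congestion window (in MSS) at the start of each RTT under AIMD.
    Assumption: a loss event occurs whenever cwnd exceeds loss_threshold_mss."""
    cwnd = start_mss
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd > loss_threshold_mss:
            cwnd = max(1, cwnd // 2)   # multiplicative decrease: halve (never below 1 MSS)
        else:
            cwnd += 1                  # additive increase: +1 MSS per RTT
    return trace
```

`aimd_trace(8, 12, 12)` climbs one MSS per RTT from 8 to 13, halves to 6 after the loss, and then climbs again.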
TCP Slow Start
• When connection begins, CongWin = 1 MSS
– Example: MSS = 500 bytes & RTT = 200 msec
– initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
– desirable to quickly ramp up to a respectable rate
• When connection begins, increase rate exponentially fast until the first loss event
TCP Slow Start (more)
• When connection begins, increase rate exponentially until the first loss event:
– double CongWin every RTT
– done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A / Host B timeline: one segment in the first RTT, then two segments, then four segments]
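The doubling per RTT falls out of the per-ACK rule: if every segment in the window is ACKed individually (assuming no delayed ACKs and no loss), each RTT returns cwnd ACKs and each ACK adds one MSS. A sketch that also reproduces the initial-rate example (500 bytes per 200 ms = 20 kbps); both function names are hypothetical.

```python
def slow_start_rounds(rounds):
    """cwnd (in segments) at the start of each RTT when every ACK adds one MSS.
    Assumes no loss and one ACK per segment (no delayed ACKs)."""
    cwnd = 1
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        for _ in range(cwnd):      # cwnd segments sent this RTT -> cwnd ACKs come back
            cwnd += 1              # +1 MSS per ACK
    return history

def initial_rate_bps(mss_bytes, rtt_s):
    """Rate with CongWin = 1 MSS: one segment per RTT, in bits/sec."""
    return mss_bytes * 8 / rtt_s
```

`slow_start_rounds(4)` gives [1, 2, 4, 8] segments, matching the one, two, four segment timeline, and `initial_rate_bps(500, 0.2)` is 20,000 bits/sec.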
TCP Tahoe vs. Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At a loss event, Threshold is set to 1/2 of CongWin just before the loss event
• If timeout event: set CongWin to 1 MSS
• If triple-ACK event: set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)
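A per-RTT sketch contrasting the two variants under the rules on this slide. Simplifying assumptions: windows are in whole segments, slow start doubles the window each RTT but stops at Threshold, the initial Threshold is 8 segments, and each loss in `loss_rounds` is detected by triple duplicate ACKs (Tahoe restarts from 1 MSS on any loss; Reno applies fast recovery and sets CongWin to Threshold). `cwnd_trace` is a hypothetical name.

```python
def cwnd_trace(variant, rounds, loss_rounds, ssthresh=8):
    """Per-RTT congestion window (in segments) for "tahoe" or "reno".
    Simplified model: slow start doubles cwnd (capped at ssthresh), congestion
    avoidance adds 1, and every round in loss_rounds ends with a triple-dup-ACK loss."""
    cwnd = 1
    trace = []
    for rnd in range(1, rounds + 1):
        trace.append(cwnd)
        if rnd in loss_rounds:
            ssthresh = max(cwnd // 2, 1)                     # Threshold = CongWin / 2
            cwnd = 1 if variant == "tahoe" else ssthresh     # Tahoe: back to 1; Reno: fast recovery
        elif cwnd < ssthresh:
            cwnd = min(2 * cwnd, ssthresh)                   # slow start
        else:
            cwnd += 1                                        # congestion avoidance
    return trace
```

With a loss in round 8 (CongWin = 12), both variants set Threshold to 6; Reno continues from 6 and grows linearly, while Tahoe restarts at 1 and slow-starts back up to 6.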
Timeouts vs. 3 Dup ACKs
• After 3 dup ACKs:
– CongWin is cut in half
– window then grows linearly
• But after a timeout event:
– CongWin is instead set to 1 MSS
– window then grows exponentially
– to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate the network is capable of delivering some segments
• a timeout before 3 dup ACKs are received is a stronger, "more alarming" indicator of congestion
Summary: TCP Congestion Control
[FSM with three states: slow start, congestion avoidance, fast recovery. Transitions are written event / action → next state; Λ means no event or action.]
Initial: Λ / cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0 → slow start
slow start:
– new ACK / cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
– duplicate ACK / dupACKcount++
– cwnd > ssthresh / Λ → congestion avoidance
– timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
– dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3·MSS, retransmit missing segment → fast recovery
congestion avoidance:
– new ACK / cwnd = cwnd + MSS·(MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
– duplicate ACK / dupACKcount++
– timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment → slow start
– dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3·MSS, retransmit missing segment → fast recovery
fast recovery:
– duplicate ACK / cwnd = cwnd + MSS, transmit new segment(s) as allowed
– new ACK / cwnd = ssthresh, dupACKcount = 0 → congestion avoidance
– timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment → slow start
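The FSM can be transcribed almost line by line into an event-driven class. This sketch tracks only the window variables (in bytes) and leaves actual transmission out; `RenoSender` and its method names are hypothetical. It follows the slide's rules: RFC 5681 additionally floors ssthresh at 2·SMSS and bases it on the amount of outstanding data (FlightSize) rather than cwnd, refinements omitted here.

```python
class RenoSender:
    """Window bookkeeping for the Reno congestion-control FSM (sizes in bytes)."""

    def __init__(self, mss=1000, ssthresh=64 * 1024):
        self.mss = mss
        self.cwnd = mss                  # cwnd = 1 MSS
        self.ssthresh = ssthresh         # ssthresh = 64 KB by default
        self.dup_acks = 0
        self.state = "slow start"

    def on_new_ack(self):
        if self.state == "fast recovery":
            self.cwnd = self.ssthresh    # deflate the window, resume linear growth
            self.state = "congestion avoidance"
        elif self.state == "slow start":
            self.cwnd += self.mss        # +1 MSS per ACK: doubles every RTT
        else:
            self.cwnd += self.mss * self.mss / self.cwnd   # about +1 MSS per RTT
        self.dup_acks = 0
        if self.state == "slow start" and self.cwnd > self.ssthresh:
            self.state = "congestion avoidance"

    def on_dup_ack(self):
        if self.state == "fast recovery":
            self.cwnd += self.mss        # another segment has left the network
            return
        self.dup_acks += 1
        if self.dup_acks == 3:           # fast retransmit, then enter fast recovery
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = "fast recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.mss             # back to 1 MSS
        self.dup_acks = 0
        self.state = "slow start"
```

With `mss=1000` and an initial `ssthresh=4000`, four new ACKs take cwnd from 1000 to 5000 and into congestion avoidance; three duplicate ACKs then set ssthresh to 2500 and cwnd to 5500 (fast recovery), and the next new ACK deflates cwnd to 2500.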
Summary: TCP Congestion Control
• When CongWin is below Threshold, the sender is in the slow-start phase; the window grows exponentially
• When CongWin is above Threshold, the sender is in the congestion-avoidance phase; the window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS
TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Results in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS·(MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
TCP Throughput (Idealized)
• What's the average throughput of TCP as a function of window size and RTT?
– Long-lived connection; ignore slow start
• When the window is W, throughput is W/RTT
• Just after a loss, the window drops to W/2 and throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT
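The 0.75 factor comes from averaging over one cycle: assuming the window grows linearly (additive increase) from W/2 back to W between losses, throughput grows linearly from W/(2·RTT) to W/RTT, and the time average of a linear ramp is its midpoint:

$$\overline{\text{throughput}} = \frac{1}{2}\left(\frac{W}{2\,RTT} + \frac{W}{RTT}\right) = \frac{3}{4}\cdot\frac{W}{RTT} = 0.75\,\frac{W}{RTT}$$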
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L: $\text{throughput} = \dfrac{1.22 \cdot MSS}{RTT\,\sqrt{L}}$
• ⇒ $L = 2 \cdot 10^{-10}$. Very low!
• Requires an almost perfect link
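The numbers in this example can be checked directly: W = throughput × RTT / MSS, and solving throughput = 1.22·MSS/(RTT·√L) for L gives L = (1.22·MSS / (RTT · throughput))². A sketch with hypothetical helper names; MSS is converted to bits so the units match a throughput in bits/sec.

```python
def window_segments(throughput_bps, rtt_s, mss_bytes):
    """Segments in flight needed for a target rate: W = throughput * RTT / MSS."""
    return throughput_bps * rtt_s / (mss_bytes * 8)

def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L (MSS in bits)."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2
```

`window_segments(10e9, 0.1, 1500)` is about 83,333 and `required_loss_rate(10e9, 0.1, 1500)` is about 2.1·10⁻¹⁰, i.e. at most one loss event per roughly 5 billion segments.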
Equation-Based Control
• Note: TCP congestion control forms a control loop
– Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info
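RFC 5348 (TFRC) computes the sending rate from the TCP throughput equation X = s / (R·√(2bp/3) + t_RTO·(3·√(3bp/8))·p·(1 + 32p²)), where s is the segment size in bytes, R the round-trip time, p the loss event rate, b the number of packets acknowledged per ACK, and t_RTO the retransmission timeout. The sketch below evaluates it, assuming the simplifications t_RTO = 4R and b = 1, which is our reading of the RFC's recommended settings; `tfrc_rate` is a hypothetical name. For small p the first term dominates and X ≈ √(3/2)·s/(R·√p) ≈ 1.22·s/(R·√p), the same form as the throughput-versus-loss relation on the previous slide.

```python
import math

def tfrc_rate(s, rtt, p, b=1, t_rto=None):
    """TCP throughput equation from RFC 5348, Sec. 3.1, in bytes/sec.
    s: segment size (bytes), rtt: round-trip time (s), p: loss event rate (0 < p <= 1),
    b: packets acknowledged per ACK, t_rto: retransmission timeout (s)."""
    if t_rto is None:
        t_rto = 4 * rtt          # assumed simplification t_RTO = 4 * R
    denom = (rtt * math.sqrt(2 * b * p / 3)
             + t_rto * 3 * math.sqrt(3 * b * p / 8) * p * (1 + 32 * p ** 2))
    return s / denom
```

Because the rate is computed rather than probed, it changes smoothly as the measured loss event rate changes, instead of halving on each individual loss.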
TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connections 1 and 2 sharing a bottleneck router of capacity R]
Why is TCP fair?
Two competing sessions:
• additive increase gives slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput vs. Connection 1 throughput (both axes up to R) with the equal bandwidth share line; the trajectory alternates congestion avoidance (additive increase) and loss (decrease window by factor of 2)]
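A small simulation of the two-session argument, under the textbook simplification that both sessions see a loss at the same moment whenever their combined rate exceeds R: additive increase keeps the difference x1 − x2 constant, while each multiplicative decrease halves it, so the pair drifts toward the equal-share line. `two_flow_aimd` is a hypothetical name and rates are in arbitrary units.

```python
def two_flow_aimd(x1, x2, capacity, steps):
    """Two AIMD flows sharing a bottleneck of the given capacity.
    Assumption: both flows see a loss whenever their combined rate exceeds capacity."""
    for _ in range(steps):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2      # multiplicative decrease: difference halves
        else:
            x1, x2 = x1 + 1, x2 + 1      # additive increase (slope 1): difference unchanged
    return x1, x2
```

Starting far from fair, `two_flow_aimd(80, 10, 100, 500)` ends with the two rates within a tiny fraction of a unit of each other.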
Fairness (more)
Fairness and UDP:
• Multimedia apps often do not use TCP
– do not want rate throttled by congestion control
• Instead use UDP:
– pump audio/video at a constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections:
• nothing prevents an app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
– new app asks for 1 TCP, gets rate R/10
– new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
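The parallel-connection arithmetic, assuming every TCP connection on the bottleneck gets an equal share of R (a simplification; real shares also depend on each connection's RTT). `share_of_new_app` is a hypothetical name.

```python
from fractions import Fraction

def share_of_new_app(existing_connections, new_connections):
    """Fraction of link rate R the new app gets if all connections share R equally."""
    return Fraction(new_connections, existing_connections + new_connections)
```

`share_of_new_app(9, 1)` is 1/10 of R, while `share_of_new_app(9, 9)` is 1/2 of R: the app gains bandwidth purely by opening more connections.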
CS 4284 Spring 2013
When SYNs Attackbull Servers receiving SYN must allocate resources
ndash Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
bull Solutionndash SYN cookies
bull Server creates ACK number sends ACK ndash but does not allocate buffers
ndash If client continues with SYNACK check if ACK could have been sent then allocate buffers if correct
Sequence Number Summarybull Goals Set 1
ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared
ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs
bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can
eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial
sequence number
CS 4284 Spring 2013
CS 4284 Spring 2013
TCP Connection Management (cont)Closing a connection
client closes socket close(s)
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
timed
wai
t
CS 4284 Spring 2013
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
timed
wai
tclosed
CS 4284 Spring 2013
TCP ConnectionFSM
The heavy solid line is the normal path for a client
The heavy dashed line is the normal path for a server
The light lines are unusual events
Each transition is labeled by the event causing it and the action resulting from it separated by a slash
CS 4284 Spring 2013
TCP Connection Management (contrsquod)
TCP client lifecycle TCP server lifecycle
CS 4284 Spring 2013
Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if
packets (FIN ACK) can be lostndash No Famous two-army problem
CS 4284 Spring 2013
Summarybull TCP segments acknowledgements amp
retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit
bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management
schemendash SYN attackndash Sequence number prediction attacks
CS 4284 Spring 2013
TCP Miscellaneousbull MSS Maximum Segment Size Option
ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN
bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to
allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT
measurement PAWS for protection against wrap-around for sequence numbers
bull hellip
CS 4284 Spring 2013
Study of TCP Outlinebull segment structurebull reliable data transfer
ndash delayed ACKsndash Naglersquos algorithm
bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control
TCP
Congestion Control
CS 4284 Spring 2013
Principles of Congestion Control
Congestionbull informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull different from flow controlbull manifestations
ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)
bull a top-10 problem
CS 4284 Spring 2013
Causescosts of congestion scenario 1 bull two senders two
receiversbull one router infinite
buffers bull no retransmission
bull large delays when congested
bull (but no reduction in throughput here)
unlimited shared output link buffers
Host Alin original data
Host B
lout
CS 4284 Spring 2013
Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet
after timeout
finite shared output link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2lin
l out
a c
R2
R2lin
l out
R4
R2
R2lin
l out
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit
Q what happens as lin and lrsquoin increase
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission
capacity used for that packet was wastedbull ultimately leads to congestion collapse
Host A
Host B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even if no packet loss occurs
bull If packet loss occurs needed retransmission require offered load to be greater than goodput
bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion control
bull no explicit feedback from network
bull congestion inferred from end-system observed loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systemsndash single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Controlbull end-end control (no network
assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput
bull CongWin is dynamic function of perceived network congestion
How does sender notice congestion
bull loss event = timeout or 3 duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is primary cause of loss)
Three mechanismsbull AIMDbull slow startbull fast recovery
rate = CongWin
RTT Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
multiplicative decrease cut CongWin in half after loss event
additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Startbull When connection
begins CongWin = 1 MSSndash Example MSS =
500 bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly
ramp up to respectable rate
bull When connection begins increase rate exponentially fast until first loss event
CS 4284 Spring 2013
TCP Slow Start (more)bull When connection begins
increase rate exponentially until first loss eventndash double CongWin every
RTTndash done by incrementing CongWin for every ACK received
bull Summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
CS 4284 Spring 2013
TCP Tahoe vs RenoQ When should the
exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss
eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly
(this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKsbull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout eventndash CongWin instead set
to 1 MSS ndash window then grows
exponentiallyndash to a threshold then
grows linearly
bull 3 dup ACKs indicates network capable of delivering some segments
bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
CS 4284 Spring 2013
TCP Sender Congestion ControlEvent State TCP Sender Action Commentary
ACK receipt for previously unacked data
Slow Start (SS)
CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA
Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data
CongestionAvoidance (CA)
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK
SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter slow start
Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
CS 4284 Spring 2013
TCP Throughput Idealized
bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start
bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput
to W2RTT bull Average steady-state throughput 75 WRTT
W
W2
CS 4284 Spring 2013
TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT
want 10 Gbps throughputbull Requires window size W = 83333 in-flight
segmentsbull Throughput in terms of loss rate
bull L = 210-10 Very lowbull Require almost perfect link
LRTTMSS221
Equation-Based Control
bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which
are samples of timeout events + 3-ack events)bull Instead equation-based control uses an
equation to compute sending rate based on this input
bull See RFC 5348 for more info
CS 4284 Spring 2013
TCP Fairness
CS 4284 Spring 2013
Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk
TCP connection 1
bottleneckrouter
capacity RTCP connection 2
TCP Fairness
CS 4284 Spring 2013
Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughputConn
e ctio
n 2
thro
u ghp
ut
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
CS 4284 Spring 2013
Fairness (more)Fairness and UDPbull Multimedia apps often
do not use TCPndash do not want rate throttled
by congestion controlbull Instead use UDP
ndash pump audiovideo at constant rate tolerate packet loss
bull TCP friendliness
Fairness and parallel TCP connections
bull nothing prevents app from opening parallel connections between 2 hosts
bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets
rate R10ndash new app asks for 9 TCPs gets
R2bull Thatrsquos what ldquodownload
acceleratorsrdquo do
Sequence Number Summarybull Goals Set 1
ndash Guard against old duplicates in one connection -gt donrsquot reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared
ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs
bull Goals Set 2ndash Donrsquot allow high-jacking of connections unless attacker can
eavesdrop ndash use PRNG for initial seq number choicendash Donrsquot allow SYN attacks ndash compute but donrsquot store initial
sequence number
CS 4284 Spring 2013
CS 4284 Spring 2013
TCP Connection Management (cont)Closing a connection
client closes socket close(s)
Step 1 client end system sends TCP FIN control segment to server
Step 2 server receives FIN replies with ACK Closes connection sends FIN
client
FIN
server
ACK
ACK
FIN
close
close
closed
timed
wai
t
CS 4284 Spring 2013
TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs
Step 4 server receives ACK Connection closed
Note with small modification can handle simultaneous FINs
client
FIN
server
ACK
ACK
FIN
closing
closing
closed
timed
wai
tclosed
CS 4284 Spring 2013
TCP ConnectionFSM
The heavy solid line is the normal path for a client
The heavy dashed line is the normal path for a server
The light lines are unusual events
Each transition is labeled by the event causing it and the action resulting from it separated by a slash
CS 4284 Spring 2013
TCP Connection Management (contrsquod)
TCP client lifecycle TCP server lifecycle
CS 4284 Spring 2013
Closing a Connectionbull Note previous charts showed normal casebull Can we reliably close a connection if
packets (FIN ACK) can be lostndash No Famous two-army problem
CS 4284 Spring 2013
Summarybull TCP segments acknowledgements amp
retransmissionndash Delayed ACKs Naglersquos algorithmndash Fast retransmit
bull RTT estimation amp Karnrsquos algorithm bull Flow Control amp Silly Window Syndromebull Connection Management in TCPbull Attacks against TCPrsquos connection management
schemendash SYN attackndash Sequence number prediction attacks
CS 4284 Spring 2013
TCP Miscellaneousbull MSS Maximum Segment Size Option
ndash Clientserver agree on larger than default (536 outside same subnet) MSS option on SYN
bull SACK ndash selective acknowledgementsbull WSCALE ndash scale factor for receive window to
allow for LFN (ldquoelefantrdquo) ndash Large Fat Networksbull RFC 1323 timestamps for accurate RTT
measurement PAWS for protection against wrap-around for sequence numbers
bull hellip
CS 4284 Spring 2013
Study of TCP Outlinebull segment structurebull reliable data transfer
ndash delayed ACKsndash Naglersquos algorithm
bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control
TCP
Congestion Control
CS 4284 Spring 2013
Principles of Congestion Control
Congestionbull informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull different from flow controlbull manifestations
ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)
bull a top-10 problem
CS 4284 Spring 2013
Causescosts of congestion scenario 1 bull two senders two
receiversbull one router infinite
buffers bull no retransmission
bull large delays when congested
bull (but no reduction in throughput here)
unlimited shared output link buffers
Host Alin original data
Host B
lout
CS 4284 Spring 2013
Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet
after timeout
finite shared output link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2lin
l out
a c
R2
R2lin
l out
R4
R2
R2lin
l out
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit
Q what happens as lin and lrsquoin increase
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission
capacity used for that packet was wastedbull ultimately leads to congestion collapse
Host A
Host B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even if no packet loss occurs
bull If packet loss occurs needed retransmission require offered load to be greater than goodput
bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion control
bull no explicit feedback from network
bull congestion inferred from end-system observed loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systemsndash single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Controlbull end-end control (no network
assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput
bull CongWin is dynamic function of perceived network congestion
How does sender notice congestion
bull loss event = timeout or 3 duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is primary cause of loss)
Three mechanismsbull AIMDbull slow startbull fast recovery
rate = CongWin
RTT Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
multiplicative decrease cut CongWin in half after loss event
additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Startbull When connection
begins CongWin = 1 MSSndash Example MSS =
500 bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly
ramp up to respectable rate
bull When connection begins increase rate exponentially fast until first loss event
CS 4284 Spring 2013
TCP Slow Start (more)bull When connection begins
increase rate exponentially until first loss eventndash double CongWin every
RTTndash done by incrementing CongWin for every ACK received
bull Summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
CS 4284 Spring 2013
TCP Tahoe vs RenoQ When should the
exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss
eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly
(this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs. 3 Dup ACKs
• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after a timeout event:
  – CongWin is instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate the network is capable of delivering some segments
• a timeout before 3 dup ACKs are received is a stronger (“more alarming”) indicator of congestion
Summary: TCP Congestion Control
[Figure: finite state machine with states slow start, congestion avoidance, and fast recovery; transitions as listed]
Initial (Λ): cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0; enter slow start
Slow start:
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• Λ, when cwnd > ssthresh: go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
Congestion avoidance:
• new ACK: cwnd = cwnd + MSS·(MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
Fast recovery:
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
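The state machine can be sketched as a small Python class. The window is kept in bytes, and the MSS and initial ssthresh are parameters. Retransmission and actual sending are not modelled; only the cwnd/ssthresh bookkeeping is shown. The class and method names are hypothetical, and this illustrates the transitions rather than reproducing any particular kernel's Reno code.

```python
SLOW_START = "slow start"
CONG_AVOID = "congestion avoidance"
FAST_RECOVERY = "fast recovery"


class RenoSender:
    """Sketch of the TCP Reno congestion-control FSM (window in bytes)."""

    def __init__(self, mss=1460, ssthresh=64 * 1024):
        self.mss = mss
        self.cwnd = float(mss)          # cwnd = 1 MSS
        self.ssthresh = float(ssthresh)
        self.dup_acks = 0
        self.state = SLOW_START

    def on_new_ack(self):
        if self.state == FAST_RECOVERY:
            self.cwnd = self.ssthresh   # deflate the window after recovery
            self.state = CONG_AVOID
        elif self.state == SLOW_START:
            self.cwnd += self.mss       # +1 MSS per ACK: doubling per RTT
            if self.cwnd > self.ssthresh:
                self.state = CONG_AVOID
        else:
            self.cwnd += self.mss * self.mss / self.cwnd  # about +1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == FAST_RECOVERY:
            self.cwnd += self.mss       # each further dup ACK inflates the window
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = FAST_RECOVERY  # (missing segment would be retransmitted here)

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = float(self.mss)
        self.dup_acks = 0
        self.state = SLOW_START         # (missing segment would be retransmitted here)


s = RenoSender(mss=1000, ssthresh=8000)
for _ in range(8):
    s.on_new_ack()
print(s.state, s.cwnd)  # congestion avoidance 9000.0
```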
Summary: TCP Congestion Control
• When CongWin is below Threshold, the sender is in the slow-start phase; the window grows exponentially.
• When CongWin is above Threshold, the sender is in the congestion-avoidance phase; the window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
TCP Sender Congestion Control
| Event | State | TCP Sender Action | Commentary |
|---|---|---|---|
| ACK receipt for previously unACKed data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Results in a doubling of CongWin every RTT |
| ACK receipt for previously unACKed data | Congestion Avoidance (CA) | CongWin = CongWin + MSS·(MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT |
| Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS. |
| Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to “Slow Start” | Enter slow start |
| Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being ACKed | CongWin and Threshold not changed |
TCP Throughput (Idealized)
• What’s the average throughput of TCP as a function of window size and RTT?
  – Long-lived connection; ignore slow start
• When the window is $W$, throughput is $W/\text{RTT}$
• Just after a loss, the window drops to $W/2$ and throughput to $W/(2\cdot\text{RTT})$
• Average steady-state throughput: $0.75\,W/\text{RTT}$
[Figure: window oscillating between W/2 and W]
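The 0.75 factor comes from averaging a linear ramp. Assume the window grows linearly from $W/2$ back to $W$ between losses. Then the average throughput is the mean of the two extremes:

$$\overline{\text{throughput}} = \frac{1}{2}\left(\frac{W}{2\,\text{RTT}} + \frac{W}{\text{RTT}}\right) = \frac{3}{4}\cdot\frac{W}{\text{RTT}} = 0.75\,\frac{W}{\text{RTT}}$$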
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
  $\text{throughput} = \dfrac{1.22 \cdot \text{MSS}}{\text{RTT} \cdot \sqrt{L}}$
• → $L = 2 \cdot 10^{-10}$. Very low!
• Requires an almost perfect link
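Both numbers in the example can be reproduced directly. The window follows from the bandwidth-delay product, and the loss rate comes from solving the 1.22·MSS/(RTT·√L) formula for L. The helper names are hypothetical.

```python
def window_segments(target_bps, mss_bytes, rtt_s):
    """Segments that must be in flight: bandwidth-delay product / segment size."""
    return target_bps * rtt_s / (mss_bytes * 8)


def loss_rate_for(target_bps, mss_bytes, rtt_s):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * target_bps)) ** 2


print(round(window_segments(10e9, 1500, 0.1)))  # 83333
print(f"{loss_rate_for(10e9, 1500, 0.1):.1e}")   # 2.1e-10
```

The computed L is about 2.14·10⁻¹⁰. In other words, the link may lose only about one segment in every five billion.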
Equation-Based Control
• Note: TCP congestion control forms a control loop
  – Inputs: round-trip time, “loss events” (which are samples of timeout events + 3-dup-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info
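RFC 5348 (TFRC) specifies the throughput equation an equation-based sender evaluates. The sketch below writes out that equation as we understand RFC 5348, Section 3.1. It uses the simplifications the RFC suggests: b = 1 and t_RTO = 4·RTT. Check the RFC before relying on the exact form. `tfrc_rate` is a hypothetical name, and the loss rates in the loop are illustrative.

```python
from math import sqrt


def tfrc_rate(s, rtt, p, b=1, t_rto=None):
    """TCP throughput equation used by TFRC (RFC 5348, Section 3.1).

    s: segment size in bytes; rtt: round-trip time in seconds;
    p: loss event rate (0 < p <= 1); b: packets acknowledged per ACK.
    Returns an allowed sending rate in bytes/second.
    """
    if t_rto is None:
        t_rto = 4 * rtt  # simplification suggested by RFC 5348
    denom = (rtt * sqrt(2 * b * p / 3)
             + t_rto * (3 * sqrt(3 * b * p / 8) * p * (1 + 32 * p ** 2)))
    return s / denom


for p in (0.001, 0.01, 0.1):
    print(p, round(tfrc_rate(1460, 0.1, p)))
```

For small p the timeout term becomes negligible. The equation then reduces to about 1.22·s/(RTT·√p), which is the same shape as the simple loss formula on the previous slide.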
TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k.
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]
Why is TCP fair?
Consider two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput vs. Connection 1 throughput (both axes up to R) with the equal-bandwidth-share line; the trajectory alternates between congestion-avoidance additive increase and loss (window decreased by a factor of 2)]
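The convergence argument can be checked numerically. This sketch makes two assumptions that real flows do not guarantee: both flows have the same RTT, and both detect a loss in the same round whenever their combined rate exceeds capacity. `aimd_two_flows` is a hypothetical helper.

```python
def aimd_two_flows(x1, x2, capacity, rounds, step=1.0):
    """Throughputs of two AIMD flows sharing one bottleneck link.

    Assumes equal RTTs and synchronized loss whenever x1 + x2 > capacity.
    """
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # loss: each flow halves its rate (multiplicative decrease)
            x1, x2 = x1 / 2, x2 / 2
        else:
            # no loss: both add the same amount (additive increase, slope 1)
            x1, x2 = x1 + step, x2 + step
    return x1, x2


x1, x2 = aimd_two_flows(80.0, 10.0, capacity=100.0, rounds=500)
print(x1 - x2)  # the initial gap of 70 has shrunk to a tiny value
```

Additive increase leaves the gap x1 − x2 unchanged, while each multiplicative decrease halves it. Repeated cycles therefore push the pair toward the equal-share line.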
Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at a constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections
• nothing prevents an app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 9 TCPs, gets R/2
• That’s what “download accelerators” do
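The R/10 and R/2 figures follow from per-connection fairness, as this short sketch shows. It assumes every TCP connection on the bottleneck ends up with an equal share. `app_share` is a hypothetical helper, and `Fraction` keeps the results exact.

```python
from fractions import Fraction


def app_share(existing_conns: int, new_conns: int) -> Fraction:
    """Fraction of bottleneck rate R an app gets with new_conns connections,
    assuming every TCP connection on the link gets an equal share."""
    return Fraction(new_conns, existing_conns + new_conns)


print(app_share(9, 1))  # 1/10 -> rate R/10
print(app_share(9, 9))  # 1/2  -> rate R/2
```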
TCP Connection Management (cont.)
Closing a connection:
client closes socket: close(s);
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline: client close sends FIN; server replies ACK, then its close sends FIN; client replies ACK, stays in timed wait, then closed]
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
  – Enters “timed wait”: will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with a small modification, can handle simultaneous FINs.
[Figure: client/server timeline: FIN, ACK, FIN, ACK exchange; both sides closing; client in timed wait, then closed; server closed]
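The four closing steps, plus the simultaneous-FIN case, can be traced with a small transition table. The state names are taken from RFC 793. Its TIME_WAIT state corresponds to the slide's "timed wait", and CLOSING is the state used when both FINs cross. The table is a hypothetical subset covering only connection teardown, and `close_sequence` is an illustrative name.

```python
# (state, event) -> (next state, segment sent or None); teardown subset only
TRANSITIONS = {
    ("ESTABLISHED", "app close"): ("FIN_WAIT_1", "FIN"),   # Step 1 (active closer)
    ("FIN_WAIT_1", "recv ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_1", "recv FIN"): ("CLOSING", "ACK"),        # simultaneous FINs
    ("CLOSING", "recv ACK"): ("TIME_WAIT", None),
    ("FIN_WAIT_2", "recv FIN"): ("TIME_WAIT", "ACK"),      # Step 3: "timed wait"
    ("TIME_WAIT", "2MSL timeout"): ("CLOSED", None),
    ("ESTABLISHED", "recv FIN"): ("CLOSE_WAIT", "ACK"),    # Step 2 (passive closer)
    ("CLOSE_WAIT", "app close"): ("LAST_ACK", "FIN"),
    ("LAST_ACK", "recv ACK"): ("CLOSED", None),            # Step 4
}


def close_sequence(events, state="ESTABLISHED"):
    """Apply events in order; return the final state and the segments sent."""
    sent = []
    for event in events:
        state, segment = TRANSITIONS[(state, event)]
        if segment:
            sent.append(segment)
    return state, sent


print(close_sequence(["app close", "recv ACK", "recv FIN"]))  # ('TIME_WAIT', ['FIN', 'ACK'])
print(close_sequence(["recv FIN", "app close", "recv ACK"]))  # ('CLOSED', ['ACK', 'FIN'])
```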
TCP Connection FSM
[Figure: TCP connection state-transition diagram]
The heavy solid line is the normal path for a client.
The heavy dashed line is the normal path for a server.
The light lines are unusual events.
Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.
TCP Connection Management (cont’d)
[Figure: TCP client lifecycle; TCP server lifecycle]
Closing a Connection
• Note: previous charts showed the normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  – No: famous two-army problem
Summary
• TCP segments, acknowledgements & retransmission
  – Delayed ACKs, Nagle’s algorithm
  – Fast retransmit
• RTT estimation & Karn’s algorithm
• Flow control & Silly Window Syndrome
• Connection management in TCP
• Attacks against TCP’s connection management scheme
  – SYN attack
  – Sequence number prediction attacks
TCP Miscellaneous
• MSS: Maximum Segment Size option
  – client/server agree on a larger than default (536 outside same subnet) MSS via an option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for the receive window, to allow for LFNs (“elephants”), i.e. Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• …
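Window scaling is needed because of the bandwidth-delay product. The TCP header's receive-window field is 16 bits wide, which caps it at 65,535 bytes. RFC 1323 lets both sides agree to left-shift that value by a scale factor of at most 14. The sketch below finds the smallest shift that covers a given bandwidth-delay product. `min_wscale` is a hypothetical helper, and the 10 Gbps / 100 ms inputs reuse the example from the throughput slide.

```python
def min_wscale(bandwidth_bps, rtt_s):
    """Smallest window-scale shift whose maximum window covers the
    bandwidth-delay product (in bytes)."""
    bdp_bytes = bandwidth_bps * rtt_s / 8
    for shift in range(15):  # RFC 1323 caps the shift at 14
        if (65535 << shift) >= bdp_bytes:
            return shift
    raise ValueError("bandwidth-delay product exceeds the largest scalable window")


print(min_wscale(10e9, 0.1))  # 11  (BDP = 125,000,000 bytes)
print(min_wscale(1e6, 0.1))   # 0   (12,500 bytes fits without scaling)
```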
Study of TCP: Outline
• segment structure
• reliable data transfer
  – delayed ACKs
  – Nagle’s algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP Congestion Control
Principles of Congestion Control
Congestion:
• informally: “too many sources sending too much data too fast for the network to handle”
• different from flow control
• manifestations:
  – long delays (queueing in router buffers)
  – lost packets (buffer overflow at routers)
• a top-10 problem
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A (original data at rate $\lambda_{in}$) and Host B sending through one router with unlimited shared output link buffers; delivered rate $\lambda_{out}$]
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmits a lost packet after timeout
[Figure: Host A ($\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data) and Host B sharing a router with finite shared output link buffers; delivered rate $\lambda_{out}$]
Causes/costs of congestion: scenario 2
• Always: $\lambda_{in} = \lambda_{out}$ (goodput)
• a) if no loss: $\lambda'_{in} = \lambda_{in}$
• b) assume a clairvoyant sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes $\lambda'_{in}$ larger for the same $\lambda_{out}$ (every packet transmitted twice)
“Costs” of congestion:
• more work (retransmissions) for a given “goodput”
• unneeded retransmissions: link carries multiple copies of a packet
[Figure: three plots (a, b, c) of $\lambda_{out}$ vs. $\lambda_{in}$, each axis up to R/2, with goodput levels R/2, R/3, and R/4 marked]
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
[Figure: Host A ($\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data) and Host B on multihop paths through routers with finite shared output link buffers; delivered rate $\lambda_{out}$]
Causes/costs of congestion: scenario 3
Another “cost” of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse
[Figure: delivered rate $\lambda_{out}$ for Host A and Host B traffic]
Reasons for Congestion Control
• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case
Congestion Control Approaches
Two broad classes:
End-to-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate the sender should send at
TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission: $\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$
• roughly, $\text{rate} \approx \frac{\text{CongWin}}{\text{RTT}}$ bytes/sec
• CongWin and RTT influence throughput
• CongWin is a dynamic function of perceived network congestion
How does the sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces its rate (CongWin) after a loss event
• (assumes congestion is the primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
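The sender-side limit can be evaluated directly. This is a sketch with hypothetical helper names (`usable_window`, `approx_rate`); it only checks the inequality and the rough CongWin/RTT rate, and ignores everything else a real TCP sender tracks.

```python
def usable_window(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """Bytes the sender may still put in flight without violating
    LastByteSent - LastByteAcked <= min(CongWin, RcvWindow)."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cong_win, rcv_window) - in_flight)


def approx_rate(cong_win_bytes, rtt_s):
    """Rough sending rate in bytes/sec: one congestion window per RTT."""
    return cong_win_bytes / rtt_s


# 4,000 bytes in flight; CongWin (8,000) is the tighter limit, so 4,000 more may go out
print(usable_window(5_000, 1_000, cong_win=8_000, rcv_window=16_000))  # 4000
print(approx_rate(8_000, 0.125))  # 64000.0 bytes/sec
```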
TCP AIMD
• multiplicative decrease: cut CongWin in half after a loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)
[Figure: congestion window (8, 16, 24 Kbytes) vs. time for a long-lived TCP connection]
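A minimal AIMD sketch in Python, working in MSS units at RTT granularity. The loss model is an assumption made purely for illustration: a loss event happens in any RTT in which the window exceeds a fixed capacity.

```python
def aimd_trace(start_mss, capacity_mss, rounds):
    """Congestion window (in MSS) at the start of each RTT under pure AIMD.
    Hypothetical loss model: a loss event occurs in any RTT in which the
    window exceeds `capacity_mss`."""
    cwnd = start_mss
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd > capacity_mss:
            cwnd = max(1, cwnd // 2)  # multiplicative decrease
        else:
            cwnd += 1                 # additive increase: +1 MSS per RTT
    return trace


print(aimd_trace(8, 12, 12))  # [8, 9, 10, 11, 12, 13, 6, 7, 8, 9, 10, 11]
```

The trace climbs linearly, halves at the loss, and climbs linearly again, which is the shape of the long-lived connection in the figure.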
TCP Slow Start
• When a connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• Available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to a respectable rate
• When a connection begins, increase the rate exponentially fast until the first loss event
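The 20 kbps figure follows from sending one MSS per RTT. The sketch below (hypothetical helper names) also counts how many RTTs of doubling it would take to reach a larger window; the 500-segment target is an arbitrary example, not from the slide.

```python
def initial_rate_bps(mss_bytes, rtt_ms):
    """Sending rate with CongWin = 1 MSS: one segment per RTT, in bits/sec."""
    return mss_bytes * 8 * 1000 / rtt_ms


def rtts_to_reach(target_segments):
    """RTTs of slow-start doubling (1, 2, 4, ...) until CongWin >= target."""
    return (target_segments - 1).bit_length()


print(initial_rate_bps(500, 200))  # 20000.0 bits/sec = 20 kbps
print(rtts_to_reach(500))          # 9, since 2**9 = 512 >= 500
```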
TCP Slow Start (more)
• When a connection begins, increase the rate exponentially until the first loss event
  – double CongWin every RTT
  – done by incrementing CongWin by 1 MSS for every ACK received
• Summary: the initial rate is slow but ramps up exponentially fast
[Figure: Host A / Host B timeline: one segment in the first RTT, two segments in the next, then four segments]
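Why +1 MSS per ACK doubles the window each RTT: every segment sent in a round returns one ACK, so a window of n segments grows by n. A sketch in MSS units, assuming one ACK per segment (no delayed ACKs) and no loss:

```python
def slow_start_windows(rtts):
    """CongWin (in MSS) at the start of each RTT when every ACK adds 1 MSS.
    Assumes one ACK per segment (no delayed ACKs) and no loss."""
    cwnd = 1
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        acks_this_rtt = cwnd      # every segment sent in this RTT is ACKed
        for _ in range(acks_this_rtt):
            cwnd += 1             # +1 MSS per ACK received
    return history


print(slow_start_windows(5))  # [1, 2, 4, 8, 16]
```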
TCP Tahoe vs. Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At a loss event, Threshold is set to 1/2 of CongWin just before the loss event
• If timeout event: set CongWin to 1 MSS
• If triple-duplicate-ACK event: set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)
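The threshold rule fits in a small function. This sketch works in MSS units and its name `on_loss` is hypothetical; it assumes, as Tahoe is usually described, that Tahoe has no fast recovery and so handles a triple duplicate ACK like a timeout.

```python
def on_loss(variant, event, cong_win):
    """Return (new CongWin, new Threshold), both in MSS, after a loss event.

    Threshold = CongWin/2 in all cases; a timeout resets CongWin to 1 MSS;
    Reno's fast recovery sets CongWin = Threshold on a triple duplicate ACK.
    Tahoe (assumed here to lack fast recovery) treats both events alike.
    """
    if variant not in ("tahoe", "reno"):
        raise ValueError(f"unknown variant: {variant}")
    if event not in ("timeout", "triple_dup_ack"):
        raise ValueError(f"unknown loss event: {event}")
    threshold = cong_win / 2
    if event == "timeout" or variant == "tahoe":
        return 1, threshold
    return threshold, threshold  # Reno, triple duplicate ACK


print(on_loss("reno", "triple_dup_ack", 12))   # (6.0, 6.0)
print(on_loss("tahoe", "triple_dup_ack", 12))  # (1, 6.0)
print(on_loss("reno", "timeout", 12))          # (1, 6.0)
```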
Timeouts vs. 3 dup ACKs
• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after a timeout event:
  – CongWin is instead set to 1 MSS
  – window then grows exponentially to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate the network is capable of delivering some segments
• a timeout before 3 dup ACKs are received is a stronger, "more alarming" indicator of congestion
Summary: TCP Congestion Control
TCP Reno sender as a finite state machine (states: slow start, congestion avoidance, fast recovery):
Initialization: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
Slow start:
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh: enter congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3·MSS; retransmit missing segment; enter fast recovery
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
Congestion avoidance:
• new ACK: cwnd = cwnd + MSS·(MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3·MSS; retransmit missing segment; enter fast recovery
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; enter slow start
Fast recovery:
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; enter congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; enter slow start
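The state machine above can be transcribed into a small Python class. This is a sketch of the window bookkeeping only: `RenoSender` and the 1460-byte `MSS` are hypothetical, and actual transmission and retransmission are left as comments.

```python
from dataclasses import dataclass

MSS = 1460  # hypothetical segment size in bytes


@dataclass
class RenoSender:
    """Congestion-window bookkeeping for the Reno FSM (no real I/O)."""
    cwnd: float = MSS
    ssthresh: float = 64 * 1024
    dup_acks: int = 0
    state: str = "slow_start"

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh            # deflate window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += MSS                     # +1 MSS per ACK
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += MSS * MSS / self.cwnd   # about +1 MSS per RTT
        self.dup_acks = 0
        # ...transmit new segment(s) as allowed

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += MSS                     # inflate window
            return                               # ...transmit new segment(s) as allowed
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast_recovery"
            # ...retransmit missing segment

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = MSS
        self.dup_acks = 0
        self.state = "slow_start"
        # ...retransmit missing segment


sender = RenoSender()
sender.on_new_ack()               # slow start: 1 -> 2 MSS
for _ in range(3):
    sender.on_dup_ack()           # third duplicate ACK enters fast recovery
print(sender.state, sender.cwnd)  # fast_recovery 5840.0
```

Three duplicate ACKs in slow start move the sender into fast recovery with ssthresh = 1 MSS (half of the 2-MSS window) and cwnd = ssthresh + 3·MSS.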
Summary: TCP Congestion Control
• When CongWin is below Threshold, the sender is in the slow-start phase and the window grows exponentially
• When CongWin is above Threshold, the sender is in the congestion-avoidance phase and the window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS
TCP Sender Congestion Control

| Event | State | TCP Sender Action | Commentary |
|---|---|---|---|
| ACK receipt for previously unACKed data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Results in a doubling of CongWin every RTT |
| ACK receipt for previously unACKed data | Congestion Avoidance (CA) | CongWin = CongWin + MSS·(MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT |
| Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS |
| Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start |
| Duplicate ACK | SS or CA | Increment duplicate ACK count for the segment being ACKed | CongWin and Threshold not changed |
TCP Throughput (Idealized)
• What's the average throughput of TCP as a function of window size and RTT?
  – long-lived connection; ignore slow start
• When the window is W, throughput is $W/\text{RTT}$
• Just after a loss, the window drops to W/2 and throughput to $W/(2\,\text{RTT})$
• Average steady-state throughput: $0.75\,W/\text{RTT}$
[Figure: window oscillating between W/2 and W]
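The 0.75 factor is the mean of a window that climbs linearly from W/2 back to W. A quick numeric check in Python, with hypothetical values for W and RTT:

```python
def average_window(w):
    """Mean window over one idealized cycle: it grows by 1 MSS per RTT
    from W/2 back up to W, then a loss halves it. Assumes even W (in MSS)."""
    windows = range(w // 2, w + 1)
    return sum(windows) / len(windows)


w, rtt = 100, 0.1               # hypothetical W (segments) and RTT (seconds)
print(average_window(w))        # 75.0, i.e. 0.75 * W
print(average_window(w) / rtt)  # 0.75 * W / RTT, in segments/sec
```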
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L: $\text{throughput} = \frac{1.22 \cdot \text{MSS}}{\text{RTT}\sqrt{L}}$
• ⇒ $L = 2 \cdot 10^{-10}$: very low
• Requires an almost perfect link
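Both numbers on this slide can be reproduced from the relation throughput = 1.22 · MSS / (RTT · √L). A sketch with hypothetical helper names:

```python
def window_segments(throughput_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain a throughput: W = throughput * RTT / MSS."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def loss_rate_for(throughput_bps, rtt_s, mss_bytes):
    """Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


print(round(window_segments(10e9, 0.1, 1500)))  # 83333
print(f"{loss_rate_for(10e9, 0.1, 1500):.1e}")  # 2.1e-10
```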
Equation-Based Control
• Note: TCP congestion control forms a control loop
  – Inputs: round-trip time and "loss events" (which are samples of timeout events + 3-dup-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on these inputs
• See RFC 5348 for more info
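RFC 5348 (TFRC) specifies one such equation. Below is a sketch of its throughput equation as we understand it from RFC 5348, Section 3.1, using the defaults that RFC recommends as we recall them (b = 1, t_RTO = 4·RTT); check the RFC before relying on the exact form. The function name `tfrc_rate` is hypothetical.

```python
import math


def tfrc_rate(s, rtt, p, b=1, t_rto=None):
    """Allowed sending rate in bytes/sec from the TCP throughput equation
    used by TFRC (RFC 5348, Section 3.1).

    s: segment size (bytes); rtt: round-trip time (s);
    p: loss event rate (0 < p <= 1); b: packets acknowledged per ACK;
    t_rto: retransmission timeout (s), defaulting to 4 * rtt.
    """
    if t_rto is None:
        t_rto = 4 * rtt
    denom = rtt * math.sqrt(2 * b * p / 3) + (
        t_rto * (3 * math.sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2)
    )
    return s / denom


print(round(tfrc_rate(1460, 0.1, 0.01)))  # roughly 1.6e5 bytes/sec
```

For small p the first term dominates, giving about sqrt(3/2)·s/(RTT·√p) ≈ 1.22·s/(RTT·√p), the same shape as the throughput-loss relation on the previous slide.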
TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connections 1 and 2 sharing a bottleneck router of capacity R]
Why is TCP fair?
Consider two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: connection 2 throughput vs. connection 1 throughput, both axes running to R, with the "equal bandwidth share" line; the operating point alternates between congestion-avoidance additive increase and loss (window decreased by a factor of 2)]
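A two-flow sketch of the argument, under simplifying assumptions that are ours rather than the slide's: both flows add 1 unit per RTT, and both halve in the same RTT whenever their sum exceeds the capacity. Additive increase leaves the gap between the flows unchanged; each multiplicative decrease halves it.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Throughputs of two AIMD flows sharing a bottleneck.
    Assumptions: both flows add 1 unit per RTT, and both see a loss
    (and halve) in any RTT where their sum exceeds `capacity`."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2   # multiplicative decrease halves the gap
        else:
            x1, x2 = x1 + 1, x2 + 1   # additive increase keeps the gap fixed
    return x1, x2


x1, x2 = two_flow_aimd(10, 70, capacity=100, rounds=1000)
print(abs(x1 - x2) < 1e-6)  # True: the two shares have converged
```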
Fairness (more)
Fairness and UDP:
• Multimedia apps often do not use TCP
  – do not want their rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at a constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections:
• nothing prevents an app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
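The R/10 versus R/2 numbers follow from an idealized even split per TCP connection. A sketch (the helper name `app_share` is hypothetical):

```python
from fractions import Fraction


def app_share(existing_connections, my_connections):
    """Fraction of bottleneck rate R an app gets if the link is split
    evenly per TCP connection (idealized per-connection fairness)."""
    return Fraction(my_connections, existing_connections + my_connections)


print(app_share(9, 1))  # 1/10
print(app_share(9, 9))  # 1/2
```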
TCP Connection Management (cont'd)
[Figure: TCP client lifecycle and TCP server lifecycle]
Closing a Connection
• Note: previous charts showed the normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  – No: this is the famous two-army problem
Summary
• TCP segments, acknowledgements & retransmission
  – Delayed ACKs, Nagle's algorithm
  – Fast retransmit
• RTT estimation & Karn's algorithm
• Flow control & silly window syndrome
• Connection management in TCP
• Attacks against TCP's connection management scheme
  – SYN attack
  – Sequence number prediction attacks
TCP Miscellaneous
• MSS: Maximum Segment Size option
  – client/server agree on a larger-than-default MSS (the default is 536 outside the same subnet) via the MSS option on the SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for the receive window to allow for LFNs ("elephants"), i.e., Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• …
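Why window scaling is needed for LFNs: the receive window field is 16 bits, so without scaling at most 65,535 bytes can be advertised. The sketch below computes the smallest scale shift for a given bandwidth-delay product, assuming the RFC 1323 limit of 14 for the shift; the helper names are hypothetical, and the 10 Gbps, 100 ms path mirrors the earlier throughput example and is only illustrative.

```python
WINDOW_FIELD_MAX = 0xFFFF  # 16-bit receive window field in the TCP header
MAX_SHIFT = 14             # largest window-scale shift allowed by RFC 1323


def bdp_bytes(bandwidth_bps, rtt_ms):
    """Bandwidth-delay product: bytes in flight needed to fill the pipe."""
    return bandwidth_bps * rtt_ms // 8000


def required_shift(window_bytes):
    """Smallest WSCALE shift with (0xFFFF << shift) >= window_bytes."""
    for shift in range(MAX_SHIFT + 1):
        if WINDOW_FIELD_MAX << shift >= window_bytes:
            return shift
    raise ValueError("window exceeds the largest scaled window")


need = bdp_bytes(10_000_000_000, 100)  # hypothetical 10 Gbps, 100 ms path
print(need, required_shift(need))      # 125000000 11
```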
Study of TCP Outline
• segment structure
• reliable data transfer
  – delayed ACKs
  – Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP Congestion Control
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control
• manifestations:
  – long delays (queueing in router buffers)
  – lost packets (buffer overflow at routers)
• a top-10 problem
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A sends λin (original data) into a router with unlimited shared output link buffers; Host B receives λout]
CS 4284 Spring 2013
Study of TCP Outlinebull segment structurebull reliable data transfer
ndash delayed ACKsndash Naglersquos algorithm
bull timeout management fast retransmitbull flow control + silly window syndromebull connection management[ Network Address Translation ][ Principles of congestion control ]bull TCP congestion control
TCP
Congestion Control
CS 4284 Spring 2013
Principles of Congestion Control
Congestionbull informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull different from flow controlbull manifestations
ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)
bull a top-10 problem
CS 4284 Spring 2013
Causescosts of congestion scenario 1 bull two senders two
receiversbull one router infinite
buffers bull no retransmission
bull large delays when congested
bull (but no reduction in throughput here)
unlimited shared output link buffers
Host Alin original data
Host B
lout
CS 4284 Spring 2013
Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet
after timeout
finite shared output link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2lin
l out
a c
R2
R2lin
l out
R4
R2
R2lin
l out
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit
Q what happens as lin and lrsquoin increase
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission
capacity used for that packet was wastedbull ultimately leads to congestion collapse
Host A
Host B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even if no packet loss occurs
bull If packet loss occurs needed retransmission require offered load to be greater than goodput
bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion control
bull no explicit feedback from network
bull congestion inferred from end-system observed loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systemsndash single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission: LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• CongWin and RTT influence throughput: roughly, rate = CongWin / RTT bytes/sec
• CongWin is a dynamic function of perceived network congestion
How does the sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after a loss event
• (assumes congestion is the primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
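A minimal Python sketch of the sender-side limit above, assuming byte-counted sequence numbers and no other constraints on sending (no Nagle, no pacing). The function names `usable_window` and `approx_rate` are hypothetical, chosen for illustration.

```python
def usable_window(last_byte_sent: int, last_byte_acked: int,
                  cong_win: int, rcv_window: int) -> int:
    """Bytes the sender may still transmit right now.

    Enforces LastByteSent - LastByteAcked <= min(CongWin, RcvWindow).
    """
    in_flight = last_byte_sent - last_byte_acked
    limit = min(cong_win, rcv_window)
    return max(0, limit - in_flight)


def approx_rate(cong_win: int, rtt_sec: float) -> float:
    """Rough throughput in bytes/sec when the congestion window is the bottleneck."""
    return cong_win / rtt_sec


# 20,000 bytes in flight, CongWin = 30,000, RcvWindow = 64,000
print(usable_window(120_000, 100_000, 30_000, 64_000))  # 10000
# A smaller receive window wins: min(30,000, 16,000) < 20,000 in flight
print(usable_window(120_000, 100_000, 30_000, 16_000))  # 0
print(approx_rate(30_000, 0.1))  # about 300,000 bytes/sec
```

Whichever of CongWin and RcvWindow is smaller governs, so congestion control only throttles the sender when it is tighter than flow control.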
TCP AIMD
• multiplicative decrease: cut CongWin in half after a loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)
[Figure: congestion window (8, 16, 24 Kbytes) versus time for a long-lived TCP connection, showing the AIMD sawtooth]
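The sawtooth can be reproduced with a few lines of Python. This is an idealized sketch: it steps once per RTT, assumes a long-lived connection already past slow start, and takes the RTTs that see a loss event as an input (`loss_rtts`, a hypothetical parameter) instead of modelling a real network.

```python
def aimd_trace(mss: int, start_cwnd: int, loss_rtts: set, n_rtts: int) -> list:
    """Return CongWin (bytes) at the start of each RTT under pure AIMD."""
    cwnd = start_cwnd
    trace = []
    for rtt in range(n_rtts):
        trace.append(cwnd)
        if rtt in loss_rtts:
            # multiplicative decrease: halve, but never below 1 MSS
            cwnd = max(mss, cwnd // 2)
        else:
            # additive increase: +1 MSS per RTT (probing for bandwidth)
            cwnd += mss
    return trace


print(aimd_trace(mss=1_000, start_cwnd=8_000, loss_rtts={8}, n_rtts=11))
# [8000, 9000, 10000, 11000, 12000, 13000, 14000, 15000, 16000, 8000, 9000]
```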
TCP Slow Start
• When connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event
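The 20 kbps figure is one MSS per RTT. The Python sketch below checks that arithmetic and, assuming the window doubles every RTT with no loss, counts how many RTTs it takes to reach a target rate; `rtts_to_reach` and the 10 Mbps target are illustrative, not from the slide.

```python
def initial_rate_bps(mss_bytes: int, rtt_ms: int) -> float:
    """Rate with CongWin = 1 MSS: one segment per RTT, in bits/sec."""
    return mss_bytes * 8 * 1000 / rtt_ms


def rtts_to_reach(target_bps: float, mss_bytes: int, rtt_ms: int) -> int:
    """RTTs of doubling (no loss assumed) until the rate reaches target_bps."""
    rate = initial_rate_bps(mss_bytes, rtt_ms)
    rtts = 0
    while rate < target_bps:
        rate *= 2
        rtts += 1
    return rtts


print(initial_rate_bps(500, 200))           # 20000.0 (20 kbps)
print(rtts_to_reach(10_000_000, 500, 200))  # 9
```

Nine doublings at 200 ms each is under two seconds, whereas growing by one MSS per RTT from 20 kbps would need hundreds of RTTs, which is why the start is exponential.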
TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A sends one segment, then two segments, then four segments to Host B, each round one RTT apart]
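Adding one MSS per ACK doubles the window every RTT because each segment sent in a round returns one ACK, so a window of n segments grows by n segments. A small Python sketch, assuming every segment is ACKed individually (no delayed ACKs), no loss and no receiver-window limit; `slow_start_rounds` is a hypothetical helper:

```python
def slow_start_rounds(mss: int, n_rounds: int) -> list:
    """Segments sent in each RTT round when CongWin grows by 1 MSS per ACK."""
    cwnd = mss
    sent_per_round = []
    for _ in range(n_rounds):
        segments = cwnd // mss          # the whole window goes out this RTT
        sent_per_round.append(segments)
        for _ack in range(segments):    # one ACK comes back per segment
            cwnd += mss                 # +1 MSS per ACK received
    return sent_per_round


print(slow_start_rounds(1460, 5))  # [1, 2, 4, 8, 16]
```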
TCP Tahoe vs. Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At a loss event, Threshold is set to 1/2 of CongWin just before the loss event
• If timeout event: set CongWin to 1 MSS
• If triple-ACK event: set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)
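The two loss reactions can be compared in a short Python sketch with windows measured in MSS units. The Tahoe branch reflects our understanding that Tahoe, lacking fast recovery, restarts from 1 MSS on a triple duplicate ACK as well as on a timeout; treat that, and the hypothetical `on_loss` helper, as illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    cong_win: float   # in MSS units
    threshold: float  # in MSS units


def on_loss(w: Window, event: str, variant: str) -> Window:
    """Window after a loss event ('timeout' or 'triple_dup_ack') for 'tahoe' or 'reno'."""
    if event not in ("timeout", "triple_dup_ack"):
        raise ValueError(f"unknown loss event: {event!r}")
    if variant not in ("tahoe", "reno"):
        raise ValueError(f"unknown variant: {variant!r}")
    threshold = w.cong_win / 2          # 1/2 of CongWin just before the loss
    if variant == "reno" and event == "triple_dup_ack":
        # fast recovery: continue linearly from Threshold (never below 1 MSS)
        return Window(cong_win=max(1.0, threshold), threshold=threshold)
    # timeout (either variant), or Tahoe on triple dup ACK: back to 1 MSS
    return Window(cong_win=1.0, threshold=threshold)


before = Window(cong_win=12, threshold=64)
print(on_loss(before, "triple_dup_ack", "reno"))   # Window(cong_win=6.0, threshold=6.0)
print(on_loss(before, "triple_dup_ack", "tahoe"))  # Window(cong_win=1.0, threshold=6.0)
print(on_loss(before, "timeout", "reno"))          # Window(cong_win=1.0, threshold=6.0)
```

In both variants Threshold ends up the same; the difference is whether the sender climbs back to it by slow start or resumes directly at it.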
Timeouts vs. 3 dup ACKs
• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after a timeout event:
  – CongWin is instead set to 1 MSS
  – window then grows exponentially to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate the network is capable of delivering some segments
• a timeout before 3 dup ACKs are received is a stronger, “more alarming” indicator of congestion
Summary: TCP Congestion Control (sender FSM)
Initially (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
Slow start:
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment (remain in slow start)
• cwnd > ssthresh (Λ): go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3·MSS; retransmit missing segment; go to fast recovery
Congestion avoidance:
• new ACK: cwnd = cwnd + MSS·(MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3·MSS; retransmit missing segment; go to fast recovery
Fast recovery:
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
(Λ marks a transition with no action.)
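The state machine above can be sketched in Python as three event handlers. This is a bookkeeping-only model under stated assumptions: windows are in bytes, `RenoSender` and its method names are hypothetical, nothing is actually transmitted, and retransmissions are only recorded in a log.

```python
from enum import Enum


class State(Enum):
    SLOW_START = "slow start"
    CONGESTION_AVOIDANCE = "congestion avoidance"
    FAST_RECOVERY = "fast recovery"


class RenoSender:
    """Congestion-control state of a TCP sender, following the FSM above."""

    def __init__(self, mss: int = 1000, ssthresh: int = 64 * 1024):
        # initial transition: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0
        self.mss = mss
        self.cwnd = float(mss)
        self.ssthresh = float(ssthresh)
        self.dup_ack_count = 0
        self.state = State.SLOW_START
        self.log = []

    def on_new_ack(self) -> None:
        if self.state is State.FAST_RECOVERY:
            self.cwnd = self.ssthresh                     # deflate the window
            self.state = State.CONGESTION_AVOIDANCE
        elif self.state is State.SLOW_START:
            self.cwnd += self.mss                         # +1 MSS per ACK
        else:
            self.cwnd += self.mss * self.mss / self.cwnd  # about +1 MSS per RTT
        self.dup_ack_count = 0
        if self.state is State.SLOW_START and self.cwnd > self.ssthresh:
            self.state = State.CONGESTION_AVOIDANCE

    def on_dup_ack(self) -> None:
        if self.state is State.FAST_RECOVERY:
            self.cwnd += self.mss                         # inflate per extra dup ACK
            return
        self.dup_ack_count += 1
        if self.dup_ack_count == 3:                       # fast retransmit
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.log.append("retransmit missing segment")
            self.state = State.FAST_RECOVERY

    def on_timeout(self) -> None:
        self.ssthresh = self.cwnd / 2
        self.cwnd = float(self.mss)
        self.dup_ack_count = 0
        self.log.append("retransmit missing segment")
        self.state = State.SLOW_START


s = RenoSender(mss=1000, ssthresh=4000)
for _ in range(4):
    s.on_new_ack()               # cwnd 2000, 3000, 4000, 5000: leaves slow start
print(s.state, s.cwnd)           # State.CONGESTION_AVOIDANCE 5000.0
for _ in range(3):
    s.on_dup_ack()               # third duplicate ACK triggers fast retransmit
print(s.state, s.cwnd, s.ssthresh)  # State.FAST_RECOVERY 5500.0 2500.0
```

Keeping the timeout handler identical for all three states mirrors the FSM, where every timeout edge leads back to slow start.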
Summary: TCP Congestion Control
• When CongWin is below Threshold, the sender is in the slow-start phase; the window grows exponentially
• When CongWin is above Threshold, the sender is in the congestion-avoidance phase; the window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS
TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Results in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS·(MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to “Slow Start” | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being ACKed | CongWin and Threshold not changed
TCP Throughput (Idealized)
• What's the average throughput of TCP as a function of window size and RTT?
  – Long-lived connection; ignore slow start
• When the window is W, throughput is W/RTT
• Just after a loss, the window drops to W/2 and throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT
[Figure: window oscillating between W/2 and W]
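The 0.75 factor is the mean of a window that ramps linearly from W/2 back up to W. A quick Python check, assuming an even W and growth of one segment per RTT between losses (the slide's idealization); both helper names are hypothetical:

```python
def average_window(w: int) -> float:
    """Mean window (segments) over one sawtooth cycle: W/2, W/2 + 1, ..., W."""
    if w < 2 or w % 2:
        raise ValueError("W must be an even number of segments >= 2")
    windows = range(w // 2, w + 1)
    return sum(windows) / len(windows)


def avg_throughput_bps(w: int, mss_bytes: int, rtt_ms: int) -> float:
    """Average steady-state throughput: mean window * MSS / RTT, in bits/sec."""
    return average_window(w) * mss_bytes * 8 * 1000 / rtt_ms


print(average_window(100) / 100)          # 0.75
print(avg_throughput_bps(100, 1500, 100))  # 9000000.0 (9 Mbps)
```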
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
  $\text{Throughput} = \frac{1.22 \cdot \mathrm{MSS}}{\mathrm{RTT}\,\sqrt{L}}$
• L = 2·10^-10: very low!
• Requires an almost perfect link
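The numbers on this slide can be re-derived in Python. The sketch assumes 10 Gbps = 10^10 bits/sec and applies the throughput formula with MSS converted to bits and RTT in seconds; the helper names are hypothetical.

```python
import math


def window_for_rate(rate_bps: float, mss_bytes: int, rtt_sec: float) -> float:
    """Segments that must be in flight to sustain rate_bps: rate * RTT / segment bits."""
    return rate_bps * rtt_sec / (mss_bytes * 8)


def throughput_bps(mss_bytes: int, rtt_sec: float, loss_rate: float) -> float:
    """Throughput = 1.22 * MSS / (RTT * sqrt(L)), with MSS in bits."""
    return 1.22 * mss_bytes * 8 / (rtt_sec * math.sqrt(loss_rate))


def max_loss_rate(rate_bps: float, mss_bytes: int, rtt_sec: float) -> float:
    """The same formula solved for L."""
    return (1.22 * mss_bytes * 8 / (rtt_sec * rate_bps)) ** 2


print(round(window_for_rate(10e9, 1500, 0.1)))  # 83333
print(f"{max_loss_rate(10e9, 1500, 0.1):.1e}")  # 2.1e-10
```

Because throughput scales with 1/√L, each tenfold increase in target rate demands a hundredfold lower loss rate.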
Equation-Based Control
• Note: TCP congestion control forms a control loop
  – Inputs: round-trip time, “loss events” (which are samples of timeout events + 3-dup-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on these inputs
• See RFC 5348 for more info
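RFC 5348 (TFRC) computes the allowed rate from a TCP throughput equation of the form X = s / (R·√(2bp/3) + t_RTO·(3·√(3bp/8))·p·(1 + 32p²)). The Python sketch below transcribes that form with b = 1 and t_RTO = 4R as defaults; these defaults are our reading of the RFC's recommendations, so verify the exact equation and parameters against RFC 5348 before reuse. `tfrc_rate` is a hypothetical name.

```python
import math
from typing import Optional


def tfrc_rate(s: int, rtt: float, p: float, b: int = 1,
              t_rto: Optional[float] = None) -> float:
    """Allowed sending rate in bytes/sec from the TCP throughput equation.

    s: segment size (bytes), rtt: round-trip time R (sec),
    p: loss event rate, b: packets acknowledged per ACK,
    t_rto: retransmission timeout (assumed 4 * R when not given).
    """
    if not 0 < p <= 1:
        raise ValueError("loss event rate p must be in (0, 1]")
    if t_rto is None:
        t_rto = 4 * rtt
    denom = (rtt * math.sqrt(2 * b * p / 3)
             + t_rto * (3 * math.sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2))
    return s / denom


for p in (0.001, 0.01, 0.05):
    print(p, round(tfrc_rate(1460, 0.1, p)))
```

For small p the timeout term vanishes and the result approaches s / (R·√(2p/3)), the same 1/(RTT·√L) shape as the formula on the previous slide.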
TCP Fairness
TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k.
[Figure: TCP connections 1 and 2 sharing a bottleneck router of capacity R]
Why is TCP fair?
Consider two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput versus Connection 1 throughput, both axes up to R, with the equal-bandwidth-share line; the trajectory alternates between congestion avoidance (additive increase) and loss (decrease window by factor of 2)]
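The argument can be simulated in Python. This is an idealized sketch, assuming both connections share one RTT, add the same `increase` each round, and see every loss at the same time (synchronized losses); `aimd_two_flows` is a hypothetical helper.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float,
                   rounds: int, increase: float = 1.0) -> tuple:
    """Throughputs of two AIMD flows after `rounds` RTTs on a link of rate `capacity`."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # loss: both halve, so the gap between them halves too
            x1, x2 = x1 / 2, x2 / 2
        else:
            # additive increase: slope-1 move, gap unchanged
            x1, x2 = x1 + increase, x2 + increase
    return x1, x2


x1, x2 = aimd_two_flows(0.0, 80.0, capacity=100.0, rounds=1000)
print(f"{x1:.3f} {x2:.3f}")  # the two rates end up (nearly) equal
```

Additive increase preserves the gap between the flows while every multiplicative decrease halves it, so repeated cycles drive the pair toward the equal-share line.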
Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections
• nothing prevents an app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 9 TCPs, gets R/2
• That's what “download accelerators” do
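The R/10 versus R/2 arithmetic, assuming the bottleneck is split equally per connection (the idealized fairness goal); `app_share` is a hypothetical helper:

```python
from fractions import Fraction


def app_share(existing: int, new: int) -> Fraction:
    """Fraction of bottleneck rate R an app gets by opening `new` connections
    alongside `existing` ones, if every connection receives an equal share."""
    return Fraction(new, existing + new)


print(app_share(9, 1))  # 1/10
print(app_share(9, 9))  # 1/2
```

Per-connection fairness says nothing about per-application fairness: the share grows with the number of connections an app opens.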
TCP Miscellaneous
• MSS: Maximum Segment Size option
  – client/server agree on a larger than default (536 outside same subnet) MSS via option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFNs (“elephants”), i.e., Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• …
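WSCALE matters on long fat networks because the 16-bit receive-window field caps the window at 65,535 bytes, far below the bandwidth-delay product of a fast, long path. A Python sketch, assuming the window field is left-shifted by the negotiated shift count and that the shift is limited to 14 (as specified in RFC 1323); the helper name is hypothetical:

```python
MAX_WINDOW_FIELD = 65_535  # largest value of the 16-bit window field
MAX_SHIFT = 14             # largest permitted window-scale shift count


def window_scale_shift(rate_bps: float, rtt_sec: float) -> int:
    """Smallest shift such that (65535 << shift) covers the bandwidth-delay product."""
    bdp_bytes = rate_bps * rtt_sec / 8
    for shift in range(MAX_SHIFT + 1):
        if (MAX_WINDOW_FIELD << shift) >= bdp_bytes:
            return shift
    raise ValueError("bandwidth-delay product exceeds the largest scaled window")


print(window_scale_shift(10e9, 0.1))   # 11 (BDP = 125,000,000 bytes)
print(window_scale_shift(100e6, 0.1))  # 5
```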
Study of TCP Outline
• segment structure
• reliable data transfer
  – delayed ACKs
  – Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP
Congestion Control
CS 4284 Spring 2013
Principles of Congestion Control
Congestionbull informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull different from flow controlbull manifestations
ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)
bull a top-10 problem
CS 4284 Spring 2013
Causescosts of congestion scenario 1 bull two senders two
receiversbull one router infinite
buffers bull no retransmission
bull large delays when congested
bull (but no reduction in throughput here)
unlimited shared output link buffers
Host Alin original data
Host B
lout
CS 4284 Spring 2013
Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet
after timeout
finite shared output link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2lin
l out
a c
R2
R2lin
l out
R4
R2
R2lin
l out
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit
Q what happens as lin and lrsquoin increase
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission
capacity used for that packet was wastedbull ultimately leads to congestion collapse
Host A
Host B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even if no packet loss occurs
bull If packet loss occurs needed retransmission require offered load to be greater than goodput
bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion control
bull no explicit feedback from network
bull congestion inferred from end-system observed loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systemsndash single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Controlbull end-end control (no network
assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput
bull CongWin is dynamic function of perceived network congestion
How does sender notice congestion
bull loss event = timeout or 3 duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is primary cause of loss)
Three mechanismsbull AIMDbull slow startbull fast recovery
rate = CongWin
RTT Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
multiplicative decrease cut CongWin in half after loss event
additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Startbull When connection
begins CongWin = 1 MSSndash Example MSS =
500 bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly
ramp up to respectable rate
bull When connection begins increase rate exponentially fast until first loss event
CS 4284 Spring 2013
TCP Slow Start (more)bull When connection begins
increase rate exponentially until first loss eventndash double CongWin every
RTTndash done by incrementing CongWin for every ACK received
bull Summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
CS 4284 Spring 2013
TCP Tahoe vs RenoQ When should the
exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss
eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly
(this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKsbull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout eventndash CongWin instead set
to 1 MSS ndash window then grows
exponentiallyndash to a threshold then
grows linearly
bull 3 dup ACKs indicates network capable of delivering some segments
bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
CS 4284 Spring 2013
TCP Sender Congestion ControlEvent State TCP Sender Action Commentary
ACK receipt for previously unacked data
Slow Start (SS)
CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA
Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data
CongestionAvoidance (CA)
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK
SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter slow start
Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
CS 4284 Spring 2013
TCP Throughput Idealized
bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start
bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput
to W2RTT bull Average steady-state throughput 75 WRTT
W
W2
CS 4284 Spring 2013
TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT
want 10 Gbps throughputbull Requires window size W = 83333 in-flight
segmentsbull Throughput in terms of loss rate
bull L = 210-10 Very lowbull Require almost perfect link
LRTTMSS221
Equation-Based Control
bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which
are samples of timeout events + 3-ack events)bull Instead equation-based control uses an
equation to compute sending rate based on this input
bull See RFC 5348 for more info
CS 4284 Spring 2013
TCP Fairness
CS 4284 Spring 2013
Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk
TCP connection 1
bottleneckrouter
capacity RTCP connection 2
TCP Fairness
CS 4284 Spring 2013
Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughputConn
e ctio
n 2
thro
u ghp
ut
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
CS 4284 Spring 2013
Fairness (more)Fairness and UDPbull Multimedia apps often
do not use TCPndash do not want rate throttled
by congestion controlbull Instead use UDP
ndash pump audiovideo at constant rate tolerate packet loss
bull TCP friendliness
Fairness and parallel TCP connections
bull nothing prevents app from opening parallel connections between 2 hosts
bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets
rate R10ndash new app asks for 9 TCPs gets
R2bull Thatrsquos what ldquodownload
acceleratorsrdquo do
TCP
Congestion Control
CS 4284 Spring 2013
Principles of Congestion Control
Congestionbull informally ldquotoo many sources sending too
much data too fast for network to handlerdquobull different from flow controlbull manifestations
ndash long delays (queueing in router buffers)ndash lost packets (buffer overflow at routers)
bull a top-10 problem
CS 4284 Spring 2013
Causescosts of congestion scenario 1 bull two senders two
receiversbull one router infinite
buffers bull no retransmission
bull large delays when congested
bull (but no reduction in throughput here)
unlimited shared output link buffers
Host Alin original data
Host B
lout
CS 4284 Spring 2013
Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet
after timeout
finite shared output link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2lin
l out
a c
R2
R2lin
l out
R4
R2
R2lin
l out
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit
Q what happens as lin and lrsquoin increase
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission
capacity used for that packet was wastedbull ultimately leads to congestion collapse
Host A
Host B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even if no packet loss occurs
bull If packet loss occurs needed retransmission require offered load to be greater than goodput
bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion control
bull no explicit feedback from network
bull congestion inferred from end-system observed loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systemsndash single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Controlbull end-end control (no network
assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput
bull CongWin is dynamic function of perceived network congestion
How does sender notice congestion
bull loss event = timeout or 3 duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is primary cause of loss)
Three mechanismsbull AIMDbull slow startbull fast recovery
rate = CongWin
RTT Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
multiplicative decrease cut CongWin in half after loss event
additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Startbull When connection
begins CongWin = 1 MSSndash Example MSS =
500 bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly
ramp up to respectable rate
bull When connection begins increase rate exponentially fast until first loss event
CS 4284 Spring 2013
TCP Slow Start (more)bull When connection begins
increase rate exponentially until first loss eventndash double CongWin every
RTTndash done by incrementing CongWin for every ACK received
bull Summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
CS 4284 Spring 2013
TCP Tahoe vs RenoQ When should the
exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss
eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly
(this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKsbull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout eventndash CongWin instead set
to 1 MSS ndash window then grows
exponentiallyndash to a threshold then
grows linearly
bull 3 dup ACKs indicates network capable of delivering some segments
bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
Summary: TCP Congestion Control
• When CongWin is below Threshold, the sender is in the slow-start phase; the window grows exponentially.
• When CongWin is above Threshold, the sender is in the congestion-avoidance phase; the window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unACKed data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Results in a doubling of CongWin every RTT
ACK receipt for previously unACKed data | Congestion Avoidance (CA) | CongWin = CongWin + MSS·(MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being ACKed | CongWin and Threshold not changed
TCP Throughput: Idealized
• What's the average throughput of TCP as a function of window size and RTT?
  – Long-lived connection; ignore slow start
• When the window is W, throughput is W/RTT
• Just after a loss, the window drops to W/2 and throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT
[Figure: window size oscillating between W/2 and W]
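The 0.75 factor follows if, as in the sawtooth, the window grows linearly from W/2 back to W between losses; the average of a linear ramp is the mean of its two endpoints:

```latex
\overline{\text{throughput}}
  = \frac{1}{2}\left(\frac{W/2}{\text{RTT}} + \frac{W}{\text{RTT}}\right)
  = \frac{3}{4}\cdot\frac{W}{\text{RTT}}
  = 0.75\,\frac{W}{\text{RTT}}
```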
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L: $\text{throughput} = \frac{1.22 \cdot \text{MSS}}{\text{RTT}\,\sqrt{L}}$
• ⇒ L = 2·10^-10. Very low!
• Requires an almost perfect link
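The slide's two numbers can be checked with a few lines of arithmetic. This sketch simply evaluates W = throughput · RTT / MSS and inverts the 1.22·MSS/(RTT·√L) formula for L; the function names are hypothetical.

```python
def window_segments(throughput_bps, rtt_s, mss_bytes):
    """In-flight segments needed: W = throughput * RTT / (MSS in bits)."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def loss_rate_for(throughput_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L (MSS in bits)."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


W = window_segments(10e9, 0.100, 1500)
L = loss_rate_for(10e9, 0.100, 1500)
print(f"W = {W:,.0f} segments, L = {L:.2e}")
# W = 83,333 segments, L = 2.14e-10
```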
Equation-Based Control
• Note: TCP congestion control forms a control loop
  – Inputs: round-trip time, "loss events" (which are samples of timeout events + triple-duplicate-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info
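RFC 5348 (TFRC) specifies a throughput equation of this kind. The sketch below is our transcription of it and should be checked against the RFC before use. As we read the RFC, s is the segment size in bytes, R the round-trip time in seconds, p the loss event rate, b the number of packets acknowledged by a single ACK (recommended value 1), and t_RTO the retransmission timeout, which may be approximated as 4R. The function name `tfrc_rate` is hypothetical.

```python
import math


def tfrc_rate(s, R, p, b=1, t_rto=None):
    """Allowed sending rate in bytes/sec from the TFRC throughput equation."""
    if t_rto is None:
        t_rto = 4 * R  # simplification for t_RTO
    denom = (R * math.sqrt(2 * b * p / 3)
             + t_rto * (3 * math.sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2))
    return s / denom


for p in (1e-4, 1e-3, 1e-2):
    print(p, round(tfrc_rate(1460, 0.1, p)))
```

For small p the timeout term becomes negligible and the equation reduces to roughly 1.22·s/(R·√p), the same form as the throughput-vs-loss formula on the previous slide.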
TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connections 1 and 2 sharing a bottleneck router of capacity R]
Why is TCP fair?
Consider two competing sessions:
• Additive increase gives a slope of 1 as throughput increases
• Multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput vs. Connection 1 throughput, each axis up to R, with the equal-bandwidth-share line; the trajectory alternates between congestion avoidance (additive increase) and loss (decrease window by factor of 2)]
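The convergence argument can be simulated for two flows. This sketch assumes synchronized losses (both flows halve whenever their combined rate exceeds capacity) and equal RTTs, which is the idealization behind the figure rather than how real losses occur; `aimd_two_flows` is a hypothetical name.

```python
def aimd_two_flows(x1, x2, capacity, rounds, increase=1.0):
    """Two flows sharing one bottleneck. Each round, both add `increase`;
    if their combined rate exceeds capacity, both halve instead."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2                  # multiplicative decrease
        else:
            x1, x2 = x1 + increase, x2 + increase    # additive increase
    return x1, x2


x1, x2 = aimd_two_flows(80.0, 10.0, capacity=100.0, rounds=500)
print(f"{x1:.2f} {x2:.2f}")
```

Additive increase leaves the gap x1 - x2 unchanged, while each multiplicative decrease halves it, so repeated loss events drive the two rates toward the equal-share line.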
Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections
• Nothing prevents an app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
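The parallel-connection arithmetic, assuming every connection on the link ends up with an equal share of R (the fairness goal above); `share` is a hypothetical helper and R is normalized to 1.

```python
from fractions import Fraction


def share(existing, new, R=1):
    """Fraction of R the new app gets if all connections split R equally."""
    return Fraction(new, existing + new) * R


print(share(9, 1))  # 1/10
print(share(9, 9))  # 1/2
```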
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  – long delays (queueing in router buffers)
  – lost packets (buffer overflow at routers)
• a top-10 problem!
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Hosts A and B send original data at rate λin through a router with unlimited shared output link buffers; λout is the delivered throughput]
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Host A sends λin (original data) and λ'in (original data plus retransmitted data) through a router with finite shared output link buffers; Host B shares the router; λout is the delivered throughput]
Causes/costs of congestion: scenario 2
• Always: λin = λout (goodput)
• a) if no loss: λ'in = λin
• b) assume clairvoyant sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes λ'in larger for the same λout (every packet transmitted twice)
"Costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: link carries multiple copies of a packet
[Figure: three plots of λout vs. offered load, both axes up to R/2: (a) λout reaches R/2, (b) λout reaches R/3, (c) λout reaches R/4]
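The three goodput levels in the plots follow from dividing the offered load λ'in = R/2 by the average number of transmissions per delivered packet. For case (c) the slide gives 2 (every packet sent twice); the 1.5 used for case (b) is back-computed from the plot's R/3, not stated on the slide. `goodput` is a hypothetical helper, and R is normalized to 1.

```python
from fractions import Fraction


def goodput(offered, transmissions_per_delivered):
    """lambda_out = lambda'_in / (average transmissions per delivered packet)."""
    return offered / transmissions_per_delivered


R = Fraction(1)
offered = R / 2                            # lambda'_in = R/2
print(goodput(offered, 1))                 # a) no loss: 1/2
print(goodput(offered, Fraction(3, 2)))    # b) 1/3
print(goodput(offered, 2))                 # c) every packet sent twice: 1/4
```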
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: What happens as λin and λ'in increase?
[Figure: Host A sends λin (original data) and λ'in (original data plus retransmitted data) over multihop paths through routers with finite shared output link buffers; Host B receives λout]
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted!
• ultimately leads to congestion collapse
[Figure: throughput λout delivered from Host A to Host B]
Reasons for Congestion Control
• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case
Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at
TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission: LastByteSent − LastByteAcked ≤ min(CongWin, RcvWindow)
• Roughly, rate = CongWin/RTT bytes/sec
• CongWin and RTT influence throughput
• CongWin is a dynamic function of perceived network congestion
How does the sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after a loss event
• (assumes congestion is the primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
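A minimal sketch of the sender-side limit and the rough rate estimate. The function names and the example byte counts are hypothetical; the logic is just LastByteSent − LastByteAcked ≤ min(CongWin, RcvWindow) and rate ≈ CongWin/RTT.

```python
def bytes_allowed(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """How many more bytes the sender may put in flight right now."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cong_win, rcv_window) - in_flight)


def approx_rate(cong_win, rtt):
    """Rough throughput estimate: CongWin / RTT, in bytes/sec."""
    return cong_win / rtt


print(bytes_allowed(last_byte_sent=20_000, last_byte_acked=12_000,
                    cong_win=10_000, rcv_window=64_000))  # 2000
print(approx_rate(10_000, 0.1))
```

Whichever of CongWin and RcvWindow is smaller governs: congestion control protects the network, flow control protects the receiver.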
TCP AIMD
• multiplicative decrease: cut CongWin in half after a loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing
[Figure: congestion window of a long-lived TCP connection over time, a sawtooth with axis marks at 8, 16, and 24 Kbytes]
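A per-RTT sketch that reproduces the sawtooth. It assumes a loss happens exactly when CongWin exceeds a fixed `capacity`, which is a hypothetical stand-in for the path's real limit; `aimd_trace` is likewise a hypothetical name.

```python
def aimd_trace(cwnd, mss, capacity, rtts):
    """Per-RTT congestion window (bytes) for a long-lived connection.
    A loss is assumed whenever cwnd exceeds `capacity`."""
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd > capacity:
            cwnd //= 2       # multiplicative decrease
        else:
            cwnd += mss      # additive increase: +1 MSS per RTT
    return trace


print(aimd_trace(16_000, 2_000, 24_000, 12))
# [16000, 18000, 20000, 22000, 24000, 26000, 13000, 15000, 17000, 19000, 21000, 23000]
```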
TCP Slow Start
• When the connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to a respectable rate
• When the connection begins, increase rate exponentially fast until the first loss event
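The 20 kbps figure is one MSS per RTT converted to bits per second; `initial_rate_bps` is a hypothetical helper.

```python
def initial_rate_bps(mss_bytes, rtt_s):
    """Rate with CongWin = 1 MSS: one segment per RTT, in bits/sec."""
    return mss_bytes * 8 / rtt_s


print(initial_rate_bps(500, 0.200))  # about 20000 bits/s, i.e. 20 kbps
```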
TCP Slow Start (more)
• When the connection begins, increase rate exponentially until the first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A sends one segment, then two segments, then four segments to Host B, each round one RTT apart]
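Why a per-ACK increment doubles the window each RTT: if every segment in the window is ACKed and each ACK adds 1 MSS, a window of n segments grows by n segments per round. This sketch assumes no loss and one ACK per segment (no delayed ACKs); `slow_start_rounds` is a hypothetical name.

```python
def slow_start_rounds(mss, rounds):
    """CongWin (bytes) at the start of each RTT, growing by 1 MSS per ACK."""
    cwnd = mss
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        segments_sent = cwnd // mss
        for _ in range(segments_sent):   # one ACK returns per segment sent
            cwnd += mss                  # +1 MSS per ACK
    return history


print(slow_start_rounds(500, 4))  # [500, 1000, 2000, 4000]
```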
CS 4284 Spring 2013
TCP Tahoe vs RenoQ When should the
exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss
eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly
(this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKsbull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout eventndash CongWin instead set
to 1 MSS ndash window then grows
exponentiallyndash to a threshold then
grows linearly
bull 3 dup ACKs indicates network capable of delivering some segments
bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
CS 4284 Spring 2013
TCP Sender Congestion ControlEvent State TCP Sender Action Commentary
ACK receipt for previously unacked data
Slow Start (SS)
CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA
Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data
CongestionAvoidance (CA)
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK
SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter slow start
Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
CS 4284 Spring 2013
TCP Throughput Idealized
bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start
bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput
to W2RTT bull Average steady-state throughput 75 WRTT
W
W2
CS 4284 Spring 2013
TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT
want 10 Gbps throughputbull Requires window size W = 83333 in-flight
segmentsbull Throughput in terms of loss rate
bull L = 210-10 Very lowbull Require almost perfect link
LRTTMSS221
Equation-Based Control
bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which
are samples of timeout events + 3-ack events)bull Instead equation-based control uses an
equation to compute sending rate based on this input
bull See RFC 5348 for more info
CS 4284 Spring 2013
TCP Fairness
CS 4284 Spring 2013
Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk
TCP connection 1
bottleneckrouter
capacity RTCP connection 2
TCP Fairness
CS 4284 Spring 2013
Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughputConn
e ctio
n 2
thro
u ghp
ut
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
CS 4284 Spring 2013
Fairness (more)Fairness and UDPbull Multimedia apps often
do not use TCPndash do not want rate throttled
by congestion controlbull Instead use UDP
ndash pump audiovideo at constant rate tolerate packet loss
bull TCP friendliness
Fairness and parallel TCP connections
bull nothing prevents app from opening parallel connections between 2 hosts
bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets
rate R10ndash new app asks for 9 TCPs gets
R2bull Thatrsquos what ldquodownload
acceleratorsrdquo do
CS 4284 Spring 2013
Causescosts of congestion scenario 1 bull two senders two
receiversbull one router infinite
buffers bull no retransmission
bull large delays when congested
bull (but no reduction in throughput here)
unlimited shared output link buffers
Host Alin original data
Host B
lout
CS 4284 Spring 2013
Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet
after timeout
finite shared output link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2lin
l out
a c
R2
R2lin
l out
R4
R2
R2lin
l out
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit
Q what happens as lin and lrsquoin increase
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission
capacity used for that packet was wastedbull ultimately leads to congestion collapse
Host A
Host B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even if no packet loss occurs
bull If packet loss occurs needed retransmission require offered load to be greater than goodput
bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion control
bull no explicit feedback from network
bull congestion inferred from end-system observed loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systemsndash single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Controlbull end-end control (no network
assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput
bull CongWin is dynamic function of perceived network congestion
How does sender notice congestion
bull loss event = timeout or 3 duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is primary cause of loss)
Three mechanismsbull AIMDbull slow startbull fast recovery
rate = CongWin
RTT Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
multiplicative decrease cut CongWin in half after loss event
additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Startbull When connection
begins CongWin = 1 MSSndash Example MSS =
500 bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly
ramp up to respectable rate
bull When connection begins increase rate exponentially fast until first loss event
CS 4284 Spring 2013
TCP Slow Start (more)bull When connection begins
increase rate exponentially until first loss eventndash double CongWin every
RTTndash done by incrementing CongWin for every ACK received
bull Summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
CS 4284 Spring 2013
TCP Tahoe vs RenoQ When should the
exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss
eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly
(this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKsbull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout eventndash CongWin instead set
to 1 MSS ndash window then grows
exponentiallyndash to a threshold then
grows linearly
bull 3 dup ACKs indicates network capable of delivering some segments
bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
CS 4284 Spring 2013
TCP Sender Congestion ControlEvent State TCP Sender Action Commentary
ACK receipt for previously unacked data
Slow Start (SS)
CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA
Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data
CongestionAvoidance (CA)
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK
SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter slow start
Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
CS 4284 Spring 2013
TCP Throughput Idealized
bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start
bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput
to W2RTT bull Average steady-state throughput 75 WRTT
W
W2
CS 4284 Spring 2013
TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT
want 10 Gbps throughputbull Requires window size W = 83333 in-flight
segmentsbull Throughput in terms of loss rate
bull L = 210-10 Very lowbull Require almost perfect link
LRTTMSS221
Equation-Based Control
bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which
are samples of timeout events + 3-ack events)bull Instead equation-based control uses an
equation to compute sending rate based on this input
bull See RFC 5348 for more info
CS 4284 Spring 2013
TCP Fairness
CS 4284 Spring 2013
Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk
TCP connection 1
bottleneckrouter
capacity RTCP connection 2
TCP Fairness
CS 4284 Spring 2013
Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughputConn
e ctio
n 2
thro
u ghp
ut
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
CS 4284 Spring 2013
Fairness (more)Fairness and UDPbull Multimedia apps often
do not use TCPndash do not want rate throttled
by congestion controlbull Instead use UDP
ndash pump audiovideo at constant rate tolerate packet loss
bull TCP friendliness
Fairness and parallel TCP connections
bull nothing prevents app from opening parallel connections between 2 hosts
bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets
rate R10ndash new app asks for 9 TCPs gets
R2bull Thatrsquos what ldquodownload
acceleratorsrdquo do
CS 4284 Spring 2013
Causescosts of congestion scenario 2 bull one router finite buffers bull sender retransmission of lost packet
after timeout
finite shared output link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2lin
l out
a c
R2
R2lin
l out
R4
R2
R2lin
l out
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit
Q what happens as lin and lrsquoin increase
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission
capacity used for that packet was wastedbull ultimately leads to congestion collapse
Host A
Host B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even if no packet loss occurs
bull If packet loss occurs needed retransmission require offered load to be greater than goodput
bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion control
bull no explicit feedback from network
bull congestion inferred from end-system observed loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systemsndash single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Controlbull end-end control (no network
assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput
bull CongWin is dynamic function of perceived network congestion
How does sender notice congestion
bull loss event = timeout or 3 duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is primary cause of loss)
Three mechanismsbull AIMDbull slow startbull fast recovery
rate = CongWin
RTT Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
multiplicative decrease cut CongWin in half after loss event
additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Startbull When connection
begins CongWin = 1 MSSndash Example MSS =
500 bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly
ramp up to respectable rate
bull When connection begins increase rate exponentially fast until first loss event
CS 4284 Spring 2013
TCP Slow Start (more)bull When connection begins
increase rate exponentially until first loss eventndash double CongWin every
RTTndash done by incrementing CongWin for every ACK received
bull Summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
CS 4284 Spring 2013
TCP Tahoe vs RenoQ When should the
exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss
eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly
(this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKsbull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout eventndash CongWin instead set
to 1 MSS ndash window then grows
exponentiallyndash to a threshold then
grows linearly
bull 3 dup ACKs indicates network capable of delivering some segments
bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
CS 4284 Spring 2013
TCP Sender Congestion ControlEvent State TCP Sender Action Commentary
ACK receipt for previously unacked data
Slow Start (SS)
CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA
Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data
CongestionAvoidance (CA)
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK
SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter slow start
Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
CS 4284 Spring 2013
TCP Throughput Idealized
bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start
bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput
to W2RTT bull Average steady-state throughput 75 WRTT
W
W2
CS 4284 Spring 2013
TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT
want 10 Gbps throughputbull Requires window size W = 83333 in-flight
segmentsbull Throughput in terms of loss rate
bull L = 210-10 Very lowbull Require almost perfect link
LRTTMSS221
Equation-Based Control
bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which
are samples of timeout events + 3-ack events)bull Instead equation-based control uses an
equation to compute sending rate based on this input
bull See RFC 5348 for more info
CS 4284 Spring 2013
TCP Fairness
CS 4284 Spring 2013
Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk
TCP connection 1
bottleneckrouter
capacity RTCP connection 2
TCP Fairness
CS 4284 Spring 2013
Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughputConn
e ctio
n 2
thro
u ghp
ut
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
CS 4284 Spring 2013
Fairness (more)Fairness and UDPbull Multimedia apps often
do not use TCPndash do not want rate throttled
by congestion controlbull Instead use UDP
ndash pump audiovideo at constant rate tolerate packet loss
bull TCP friendliness
Fairness and parallel TCP connections
bull nothing prevents app from opening parallel connections between 2 hosts
bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets
rate R10ndash new app asks for 9 TCPs gets
R2bull Thatrsquos what ldquodownload
acceleratorsrdquo do
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput) bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion bull more work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2lin
l out
a c
R2
R2lin
l out
R4
R2
R2lin
l out
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit
Q what happens as lin and lrsquoin increase
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission
capacity used for that packet was wastedbull ultimately leads to congestion collapse
Host A
Host B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even if no packet loss occurs
bull If packet loss occurs needed retransmission require offered load to be greater than goodput
bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion control
bull no explicit feedback from network
bull congestion inferred from end-system observed loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systemsndash single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission: $\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$
• CongWin and RTT influence throughput: $\text{rate} \approx \frac{\text{CongWin}}{\text{RTT}}$ bytes/sec
• CongWin is a dynamic function of perceived network congestion
How does the sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces its rate (CongWin) after a loss event
• (this assumes congestion is the primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
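The send limit and the rate approximation above can be sketched directly. The function names below are hypothetical (not from any real TCP stack), and all sizes are byte counts:

```python
def usable_window(last_byte_sent: int, last_byte_acked: int,
                  cong_win: int, rcv_window: int) -> int:
    """Bytes the sender may still put in flight.

    The sender keeps LastByteSent - LastByteAcked <= min(CongWin, RcvWindow),
    so the room left is that limit minus the bytes already unacknowledged.
    """
    in_flight = last_byte_sent - last_byte_acked
    limit = min(cong_win, rcv_window)
    return max(0, limit - in_flight)


def approx_rate(cong_win_bytes: int, rtt_s: float) -> float:
    """Rough sending rate in bytes/sec: about one window per round trip."""
    return cong_win_bytes / rtt_s


if __name__ == "__main__":
    # 8 KB congestion window, 16 KB receive window, 5000 bytes in flight
    print(usable_window(15000, 10000, 8192, 16384))  # 3192
    print(approx_rate(8192, 0.1))
```

Here the congestion window, not the receive window, is the binding constraint; swapping the two values would make flow control the limit instead.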
TCP AIMD
• multiplicative decrease: cut CongWin in half after a loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)
[Figure: congestion window (8, 16, 24 Kbytes) versus time for a long-lived TCP connection]
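A per-RTT simulation reproduces the AIMD window shape. This is a minimal sketch assuming loss events are given as a set of round indices and that the window never drops below 1 MSS; `aimd_trace` is a hypothetical name:

```python
def aimd_trace(rounds: int, mss: int, loss_rounds: set[int],
               initial_cwnd: int) -> list[int]:
    """Congestion window (bytes) at the start of each RTT under pure AIMD.

    Additive increase: +1 MSS per RTT without loss.
    Multiplicative decrease: halve the window in a round with a loss event,
    never going below 1 MSS.
    """
    cwnd = initial_cwnd
    trace = []
    for r in range(rounds):
        trace.append(cwnd)
        if r in loss_rounds:
            cwnd = max(mss, cwnd // 2)
        else:
            cwnd += mss
    return trace


if __name__ == "__main__":
    # one loss in round 4: climb, halve, climb again
    print(aimd_trace(8, 1000, {4}, 8000))
    # [8000, 9000, 10000, 11000, 12000, 6000, 7000, 8000]
```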
TCP Slow Start
• When a connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes and RTT = 200 msec
  – initial rate = 20 kbps
• Available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to a respectable rate
• When a connection begins, increase the rate exponentially fast until the first loss event
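The 20 kbps initial rate follows from sending one MSS per RTT:

$$\frac{\text{MSS}}{\text{RTT}} = \frac{500\ \text{bytes} \times 8\ \text{bits/byte}}{0.2\ \text{s}} = 20{,}000\ \text{bits/s} = 20\ \text{kbps}$$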
TCP Slow Start (more)
• When a connection begins, increase the rate exponentially until the first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received
• Summary: the initial rate is slow but ramps up exponentially fast
[Figure: Host A to Host B timeline; one segment in the first RTT, two segments in the next, then four segments]
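Why does a per-ACK increment double the window every RTT? Each segment sent in a round returns one ACK, and each ACK adds one MSS. The sketch below assumes no losses and no delayed ACKs (one ACK per segment); `slow_start_rounds` is a hypothetical name:

```python
def slow_start_rounds(rounds: int, mss: int = 1) -> list[int]:
    """Window at the start of each RTT during slow start (same units as mss).

    Every segment in flight is assumed to be ACKed individually and without
    loss; each ACK grows cwnd by one MSS, so cwnd doubles per RTT.
    """
    cwnd = mss
    windows = []
    for _ in range(rounds):
        windows.append(cwnd)
        segments_in_flight = cwnd // mss
        for _ack in range(segments_in_flight):
            cwnd += mss  # per-ACK increment
    return windows


if __name__ == "__main__":
    print(slow_start_rounds(4))  # [1, 2, 4, 8] segments
```

With delayed ACKs (one ACK per two segments) the same rule would grow the window more slowly than doubling, which this sketch does not model.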
TCP Tahoe vs. Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before the timeout.
Implementation:
• Variable Threshold
• At a loss event, Threshold is set to 1/2 of CongWin just before the loss event
• If a timeout event occurs, set CongWin to 1 MSS
• If a triple-duplicate-ACK event occurs, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)
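The two variants differ only in how they react to a triple duplicate ACK. In the sketch below, Tahoe's behavior (falling back to 1 MSS on either kind of loss) reflects the historical variant rather than anything stated on the slide, and the function and enum names are hypothetical. Threshold is floored at 1 MSS so that CongWin never drops below 1 MSS:

```python
from enum import Enum


class LossEvent(Enum):
    TIMEOUT = "timeout"
    TRIPLE_DUP_ACK = "triple duplicate ACK"


def react_to_loss(variant: str, event: LossEvent,
                  cong_win: int, mss: int) -> tuple[int, int]:
    """Return (new CongWin, new Threshold) in bytes after a loss event.

    Both variants set Threshold to half of CongWin just before the loss.
    Tahoe: CongWin drops to 1 MSS for either kind of loss event.
    Reno: a triple duplicate ACK sets CongWin to Threshold (fast recovery);
    a timeout still drops CongWin to 1 MSS.
    """
    threshold = max(mss, cong_win // 2)  # floor at 1 MSS
    if variant == "reno" and event is LossEvent.TRIPLE_DUP_ACK:
        return threshold, threshold
    if variant in ("tahoe", "reno"):
        return mss, threshold
    raise ValueError(f"unknown variant: {variant!r}")


if __name__ == "__main__":
    for v in ("tahoe", "reno"):
        print(v, react_to_loss(v, LossEvent.TRIPLE_DUP_ACK, 16000, 1000))
    # tahoe (1000, 8000)
    # reno (8000, 8000)
```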
Timeouts vs. 3-dup ACKs
• After 3 dup ACKs:
  – CongWin is cut in half
  – the window then grows linearly
• But after a timeout event:
  – CongWin is instead set to 1 MSS
  – the window then grows exponentially to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate that the network is capable of delivering some segments
• a timeout before 3 dup ACKs are received is a stronger, “more alarming” indicator of congestion
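A duplicate ACK repeats the previous cumulative ACK number, which means a later segment did reach the receiver. The sketch below scans an ACK stream in arrival order and reports where the third duplicate arrives (the point at which a loss event is declared). It ignores pure window updates and old ACKs, and `detect_triple_dup_acks` is a hypothetical name:

```python
def detect_triple_dup_acks(acks: list[int]) -> list[int]:
    """Indices in the ACK stream at which a third duplicate ACK arrives.

    A duplicate ACK repeats the previous cumulative ACK number. The third
    duplicate (the fourth ACK carrying the same number) signals a loss;
    further duplicates of the same number do not trigger again.
    """
    triggers = []
    last_ack = None
    dup_count = 0
    for i, ack in enumerate(acks):
        if ack == last_ack:
            dup_count += 1
            if dup_count == 3:
                triggers.append(i)
        else:
            last_ack = ack
            dup_count = 0
    return triggers


if __name__ == "__main__":
    # segment starting at byte 2000 lost; three later segments still arrive
    print(detect_triple_dup_acks([1000, 2000, 2000, 2000, 2000, 5000]))  # [4]
```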
Summary: TCP Congestion Control
The sender is a three-state machine (slow start, congestion avoidance, fast recovery); Λ marks a transition taken without a triggering event.
Initial (Λ): cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0; enter slow start
Slow start
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh (Λ): go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3·MSS; retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
Congestion avoidance
• new ACK: cwnd = cwnd + MSS·(MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3·MSS; retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
Fast recovery
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
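The state machine can be sketched as a small class that tracks only the variables in the diagram: cwnd, ssthresh, dupACKcount and the current state. This is a minimal sketch under stated assumptions: byte units, a hypothetical `RenoSender` class, and a 1000-byte default MSS (the 64 KB initial ssthresh comes from the diagram). Sending and retransmitting segments is left to the caller. Real stacks, as specified in RFC 5681, add details omitted here, such as a lower bound on ssthresh.

```python
from dataclasses import dataclass, field


@dataclass
class RenoSender:
    """Window bookkeeping for the three-state congestion-control FSM.

    All sizes are in bytes. Only cwnd, ssthresh, dupACKcount and the state
    are modeled; sending and retransmitting segments is left to the caller.
    """
    mss: int = 1000
    ssthresh: float = 64 * 1024          # initial ssthresh = 64 KB
    cwnd: float = field(init=False, default=0.0)
    dup_ack_count: int = field(init=False, default=0)
    state: str = field(init=False, default="slow_start")

    def __post_init__(self) -> None:
        self.cwnd = float(self.mss)      # initial cwnd = 1 MSS

    def on_new_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh    # deflate the window after recovery
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += self.mss        # +1 MSS per ACK: doubles per RTT
        else:
            self.cwnd += self.mss * (self.mss / self.cwnd)  # about +1 MSS per RTT
        self.dup_ack_count = 0
        if self.state == "slow_start" and self.cwnd > self.ssthresh:
            self.state = "congestion_avoidance"

    def on_dup_ack(self) -> bool:
        """Return True when the caller should retransmit the missing segment."""
        if self.state == "fast_recovery":
            self.cwnd += self.mss        # another segment has left the network
            return False
        self.dup_ack_count += 1
        if self.dup_ack_count == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = "fast_recovery"
            return True
        return False

    def on_timeout(self) -> None:
        """The caller should retransmit the missing segment after this."""
        self.ssthresh = self.cwnd / 2
        self.cwnd = float(self.mss)
        self.dup_ack_count = 0
        self.state = "slow_start"
```

The `+ 3·MSS` on entering fast recovery accounts for the three segments whose arrival at the receiver produced the duplicate ACKs.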
Summary: TCP Congestion Control
• When CongWin is below Threshold, the sender is in the slow-start phase; the window grows exponentially
• When CongWin is above Threshold, the sender is in the congestion-avoidance phase; the window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS
TCP Sender Congestion Control
| Event | State | TCP Sender Action | Commentary |
|---|---|---|---|
| ACK receipt for previously unACKed data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to “Congestion Avoidance” | Results in a doubling of CongWin every RTT |
| ACK receipt for previously unACKed data | Congestion Avoidance (CA) | CongWin = CongWin + MSS·(MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT |
| Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to “Congestion Avoidance” | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS |
| Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to “Slow Start” | Enter slow start |
| Duplicate ACK | SS or CA | Increment duplicate ACK count for the segment being ACKed | CongWin and Threshold not changed |
TCP Throughput (Idealized)
• What's the average throughput of TCP as a function of window size and RTT?
  – long-lived connection; ignore slow start
• When the window is W, throughput is W/RTT
• Just after a loss, the window drops to W/2 and throughput drops to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT
[Figure: window oscillating between W/2 and W]
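The 0.75 factor comes from averaging a window that grows linearly from W/2 back up to W between losses, so its mean is the midpoint of the two extremes:

$$\overline{\text{throughput}} = \frac{1}{\text{RTT}} \cdot \frac{W/2 + W}{2} = \frac{3}{4} \cdot \frac{W}{\text{RTT}} = 0.75\,\frac{W}{\text{RTT}}$$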
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L: $\text{throughput} = \frac{1.22 \cdot \text{MSS}}{\text{RTT}\,\sqrt{L}}$
• Solving for the example gives $L = 2 \cdot 10^{-10}$. Very low!
• Requires an almost perfect link
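Both numbers in the example can be reproduced. This sketch assumes throughput is in bits/s, so MSS is converted to bits; the function names are hypothetical:

```python
def window_segments(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """In-flight segments needed to sustain rate_bps: rate * RTT / segment size."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L (throughput in bits/s)."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


if __name__ == "__main__":
    w = window_segments(10e9, 0.1, 1500)
    loss = required_loss_rate(10e9, 0.1, 1500)
    print(f"W = {w:,.0f} segments, L = {loss:.1e}")
    # W = 83,333 segments, L = 2.1e-10
```

The loss rate falls with the square of the target throughput, which is why high-speed paths need near-perfect links under this model.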
Equation-Based Control
• Note: TCP congestion control forms a control loop
  – Inputs: round-trip time and “loss events” (which are samples of timeout events + 3-dup-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info
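RFC 5348 (TFRC) computes the sending rate from a TCP throughput equation. The sketch below writes that equation from memory of the RFC. The formula, the choice b = 1, and the default t_RTO = 4·RTT should all be checked against RFC 5348 §3.1 before use, and `tfrc_rate` is a hypothetical name. For small loss event rates p it reduces to roughly the 1.22·MSS/(RTT·√p) rule from the throughput-and-loss slide.

```python
import math
from typing import Optional


def tfrc_rate(s: float, rtt: float, p: float, b: float = 1.0,
              t_rto: Optional[float] = None) -> float:
    """TCP throughput equation in the form used by TFRC, in bytes/sec.

    s: segment size (bytes); rtt: round-trip time (s);
    p: loss event rate, 0 < p <= 1; b: packets acknowledged per ACK;
    t_rto: retransmission timeout (s), assumed to be 4 * rtt if omitted.
    """
    if not 0 < p <= 1:
        raise ValueError("loss event rate p must be in (0, 1]")
    if t_rto is None:
        t_rto = 4 * rtt
    # The second term accounts for retransmission timeouts at higher loss.
    denom = (rtt * math.sqrt(2 * b * p / 3)
             + t_rto * 3 * math.sqrt(3 * b * p / 8) * p * (1 + 32 * p ** 2))
    return s / denom


if __name__ == "__main__":
    # 1500-byte segments, 100 ms RTT, 1% loss event rate
    print(f"{tfrc_rate(1500, 0.1, 0.01):.0f} bytes/sec")
```

A sender using this equation adjusts its rate smoothly as the measured RTT and loss event rate change, instead of halving its window on each loss.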
TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connections 1 and 2 sharing a bottleneck router of capacity R]
Why is TCP fair?
Consider two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput versus Connection 1 throughput, both axes up to R, with the equal-bandwidth-share line; the trajectory alternates between congestion avoidance (additive increase) and loss (decrease window by a factor of 2)]
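The fairness argument can be checked numerically. Additive increase leaves the gap between the two rates unchanged, while each multiplicative decrease halves it, so the gap shrinks toward zero. This idealized sketch assumes synchronized losses (both flows see a loss whenever their sum exceeds R) and equal RTTs; `aimd_two_flows` is a hypothetical name:

```python
def aimd_two_flows(x1: float, x2: float, capacity: float,
                   steps: int, increase: float = 1.0) -> tuple[float, float]:
    """Simulate two AIMD flows sharing a bottleneck of the given capacity.

    Each step both flows add `increase` (slope-1 additive increase). When
    their combined rate exceeds capacity, both see a loss and halve
    (multiplicative decrease). Returns the final rates.
    """
    for _ in range(steps):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2
        else:
            x1, x2 = x1 + increase, x2 + increase
    return x1, x2


if __name__ == "__main__":
    # start far from fair: 80 vs 10 on a link of capacity 100
    a, b = aimd_two_flows(80.0, 10.0, 100.0, 1000)
    print(round(a, 3), round(b, 3))  # nearly equal
```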
Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
  – do not want their rate throttled by congestion control
• Instead they use UDP:
  – pump audio/video at a constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections
• nothing prevents an app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP connection, gets rate R/10
  – new app asks for 9 TCP connections, gets R/2
• That's what “download accelerators” do
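The R/10 and R/2 figures follow from per-connection fairness. If every connection on the bottleneck gets an equal R/k share, an app with n of the k connections gets n/k of R. A minimal sketch under that assumption, using a hypothetical function name:

```python
from fractions import Fraction


def share_with_parallel(existing: int, new_conns: int) -> Fraction:
    """Fraction of bottleneck rate R obtained by an app opening new_conns
    connections alongside `existing` ones, assuming every connection gets
    an equal R/k share of the link."""
    return Fraction(new_conns, existing + new_conns)


if __name__ == "__main__":
    print(share_with_parallel(9, 1))  # 1/10
    print(share_with_parallel(9, 9))  # 1/2
```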
CS 4284 Spring 2013
Causescosts of congestion scenario 3 bull four sendersbull multihop pathsbull timeoutretransmit
Q what happens as lin and lrsquoin increase
finite shared output link buffers
Host Alin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission
capacity used for that packet was wastedbull ultimately leads to congestion collapse
Host A
Host B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even if no packet loss occurs
bull If packet loss occurs needed retransmission require offered load to be greater than goodput
bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion control
bull no explicit feedback from network
bull congestion inferred from end-system observed loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systemsndash single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Controlbull end-end control (no network
assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput
bull CongWin is dynamic function of perceived network congestion
How does sender notice congestion
bull loss event = timeout or 3 duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is primary cause of loss)
Three mechanismsbull AIMDbull slow startbull fast recovery
rate = CongWin
RTT Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
multiplicative decrease cut CongWin in half after loss event
additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Startbull When connection
begins CongWin = 1 MSSndash Example MSS =
500 bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly
ramp up to respectable rate
bull When connection begins increase rate exponentially fast until first loss event
CS 4284 Spring 2013
TCP Slow Start (more)bull When connection begins
increase rate exponentially until first loss eventndash double CongWin every
RTTndash done by incrementing CongWin for every ACK received
bull Summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
CS 4284 Spring 2013
TCP Tahoe vs RenoQ When should the
exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss
eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly
(this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKsbull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout eventndash CongWin instead set
to 1 MSS ndash window then grows
exponentiallyndash to a threshold then
grows linearly
bull 3 dup ACKs indicates network capable of delivering some segments
bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
CS 4284 Spring 2013
TCP Sender Congestion ControlEvent State TCP Sender Action Commentary
ACK receipt for previously unacked data
Slow Start (SS)
CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA
Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data
CongestionAvoidance (CA)
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK
SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter slow start
Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
CS 4284 Spring 2013
TCP Throughput Idealized
bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start
bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput
to W2RTT bull Average steady-state throughput 75 WRTT
W
W2
CS 4284 Spring 2013
TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT
want 10 Gbps throughputbull Requires window size W = 83333 in-flight
segmentsbull Throughput in terms of loss rate
bull L = 210-10 Very lowbull Require almost perfect link
LRTTMSS221
Equation-Based Control
bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which
are samples of timeout events + 3-ack events)bull Instead equation-based control uses an
equation to compute sending rate based on this input
bull See RFC 5348 for more info
CS 4284 Spring 2013
TCP Fairness
CS 4284 Spring 2013
Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk
TCP connection 1
bottleneckrouter
capacity RTCP connection 2
TCP Fairness
CS 4284 Spring 2013
Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughputConn
e ctio
n 2
thro
u ghp
ut
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
CS 4284 Spring 2013
Fairness (more)Fairness and UDPbull Multimedia apps often
do not use TCPndash do not want rate throttled
by congestion controlbull Instead use UDP
ndash pump audiovideo at constant rate tolerate packet loss
bull TCP friendliness
Fairness and parallel TCP connections
bull nothing prevents app from opening parallel connections between 2 hosts
bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets
rate R10ndash new app asks for 9 TCPs gets
R2bull Thatrsquos what ldquodownload
acceleratorsrdquo do
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion bull when packet is dropped any upstream transmission
capacity used for that packet was wastedbull ultimately leads to congestion collapse
Host A
Host B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even if no packet loss occurs
bull If packet loss occurs needed retransmission require offered load to be greater than goodput
bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion control
bull no explicit feedback from network
bull congestion inferred from end-system observed loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systemsndash single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Controlbull end-end control (no network
assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput
bull CongWin is dynamic function of perceived network congestion
How does sender notice congestion
bull loss event = timeout or 3 duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is primary cause of loss)
Three mechanismsbull AIMDbull slow startbull fast recovery
rate = CongWin
RTT Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
multiplicative decrease cut CongWin in half after loss event
additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Startbull When connection
begins CongWin = 1 MSSndash Example MSS =
500 bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly
ramp up to respectable rate
bull When connection begins increase rate exponentially fast until first loss event
CS 4284 Spring 2013
TCP Slow Start (more)bull When connection begins
increase rate exponentially until first loss eventndash double CongWin every
RTTndash done by incrementing CongWin for every ACK received
bull Summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
CS 4284 Spring 2013
TCP Tahoe vs RenoQ When should the
exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss
eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly
(this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKsbull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout eventndash CongWin instead set
to 1 MSS ndash window then grows
exponentiallyndash to a threshold then
grows linearly
bull 3 dup ACKs indicates network capable of delivering some segments
bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
CS 4284 Spring 2013
TCP Sender Congestion ControlEvent State TCP Sender Action Commentary
ACK receipt for previously unacked data
Slow Start (SS)
CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA
Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data
CongestionAvoidance (CA)
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK
SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter slow start
Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
CS 4284 Spring 2013
TCP Throughput Idealized
bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start
bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput
to W2RTT bull Average steady-state throughput 75 WRTT
W
W2
CS 4284 Spring 2013
TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT
want 10 Gbps throughputbull Requires window size W = 83333 in-flight
segmentsbull Throughput in terms of loss rate
bull L = 210-10 Very lowbull Require almost perfect link
LRTTMSS221
Equation-Based Control
bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which
are samples of timeout events + 3-ack events)bull Instead equation-based control uses an
equation to compute sending rate based on this input
bull See RFC 5348 for more info
CS 4284 Spring 2013
TCP Fairness
CS 4284 Spring 2013
Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk
TCP connection 1
bottleneckrouter
capacity RTCP connection 2
TCP Fairness
CS 4284 Spring 2013
Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughputConn
e ctio
n 2
thro
u ghp
ut
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
CS 4284 Spring 2013
Fairness (more)Fairness and UDPbull Multimedia apps often
do not use TCPndash do not want rate throttled
by congestion controlbull Instead use UDP
ndash pump audiovideo at constant rate tolerate packet loss
bull TCP friendliness
Fairness and parallel TCP connections
bull nothing prevents app from opening parallel connections between 2 hosts
bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets
rate R10ndash new app asks for 9 TCPs gets
R2bull Thatrsquos what ldquodownload
acceleratorsrdquo do
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even if no packet loss occurs
bull If packet loss occurs needed retransmission require offered load to be greater than goodput
bull Downstream losses waste upstream transmission capacity leading to congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion control
bull no explicit feedback from network
bull congestion inferred from end-system observed loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systemsndash single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Controlbull end-end control (no network
assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput
bull CongWin is dynamic function of perceived network congestion
How does sender notice congestion
bull loss event = timeout or 3 duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is primary cause of loss)
Three mechanismsbull AIMDbull slow startbull fast recovery
rate = CongWin
RTT Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
multiplicative decrease cut CongWin in half after loss event
additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Startbull When connection
begins CongWin = 1 MSSndash Example MSS =
500 bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly
ramp up to respectable rate
bull When connection begins increase rate exponentially fast until first loss event
CS 4284 Spring 2013
TCP Slow Start (more)bull When connection begins
increase rate exponentially until first loss eventndash double CongWin every
RTTndash done by incrementing CongWin for every ACK received
bull Summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
CS 4284 Spring 2013
TCP Tahoe vs RenoQ When should the
exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss
eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly
(this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKsbull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout eventndash CongWin instead set
to 1 MSS ndash window then grows
exponentiallyndash to a threshold then
grows linearly
bull 3 dup ACKs indicates network capable of delivering some segments
bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
CS 4284 Spring 2013
TCP Sender Congestion ControlEvent State TCP Sender Action Commentary
ACK receipt for previously unacked data
Slow Start (SS)
CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA
Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data
CongestionAvoidance (CA)
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK
SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter slow start
Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
CS 4284 Spring 2013
TCP Throughput Idealized
bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start
bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput
to W2RTT bull Average steady-state throughput 75 WRTT
W
W2
CS 4284 Spring 2013
TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT
want 10 Gbps throughputbull Requires window size W = 83333 in-flight
segmentsbull Throughput in terms of loss rate
bull L = 210-10 Very lowbull Require almost perfect link
LRTTMSS221
Equation-Based Control
bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which
are samples of timeout events + 3-ack events)bull Instead equation-based control uses an
equation to compute sending rate based on this input
bull See RFC 5348 for more info
CS 4284 Spring 2013
TCP Fairness
CS 4284 Spring 2013
Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk
TCP connection 1
bottleneckrouter
capacity RTCP connection 2
TCP Fairness
CS 4284 Spring 2013
Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughputConn
e ctio
n 2
thro
u ghp
ut
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
CS 4284 Spring 2013
Fairness (more)Fairness and UDPbull Multimedia apps often
do not use TCPndash do not want rate throttled
by congestion controlbull Instead use UDP
ndash pump audiovideo at constant rate tolerate packet loss
bull TCP friendliness
Fairness and parallel TCP connections
bull nothing prevents app from opening parallel connections between 2 hosts
bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets
rate R10ndash new app asks for 9 TCPs gets
R2bull Thatrsquos what ldquodownload
acceleratorsrdquo do
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion control
bull no explicit feedback from network
bull congestion inferred from end-system observed loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systemsndash single bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Controlbull end-end control (no network
assistance)bull sender limits transmission LastByteSent-LastByteAcked min(CongWin RcvWindow)bull CongWin and RTT influence throughput
bull CongWin is dynamic function of perceived network congestion
How does sender notice congestion
bull loss event = timeout or 3 duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is primary cause of loss)
Three mechanismsbull AIMDbull slow startbull fast recovery
rate = CongWin
RTT Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
multiplicative decrease cut CongWin in half after loss event
additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Startbull When connection
begins CongWin = 1 MSSndash Example MSS =
500 bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly
ramp up to respectable rate
bull When connection begins increase rate exponentially fast until first loss event
CS 4284 Spring 2013
TCP Slow Start (more)bull When connection begins
increase rate exponentially until first loss eventndash double CongWin every
RTTndash done by incrementing CongWin for every ACK received
bull Summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
CS 4284 Spring 2013
TCP Tahoe vs RenoQ When should the
exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss
eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly
(this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKsbull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout eventndash CongWin instead set
to 1 MSS ndash window then grows
exponentiallyndash to a threshold then
grows linearly
bull 3 dup ACKs indicates network capable of delivering some segments
bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion
Rationale
Summary: TCP Congestion Control
[Figure: finite-state machine of the TCP sender with states slow start, congestion avoidance, and fast recovery]
• Initially: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
• Slow start:
  – new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
  – duplicate ACK: dupACKcount++
  – timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
  – when cwnd > ssthresh: move to congestion avoidance
  – dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3·MSS, retransmit missing segment; move to fast recovery
• Congestion avoidance:
  – new ACK: cwnd = cwnd + MSS·(MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
  – duplicate ACK: dupACKcount++
  – timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; move to slow start
  – dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3·MSS, retransmit missing segment; move to fast recovery
• Fast recovery:
  – duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
  – new ACK: cwnd = ssthresh, dupACKcount = 0; move to congestion avoidance
  – timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; move to slow start
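The FSM can be written as a small event-driven class. This is a sketch of the textbook Reno FSM as summarized above, not a real TCP stack: windows are counted in MSS (so "+ MSS" becomes "+ 1"), nothing is actually sent, and the default initial ssthresh of 64 (in MSS) is an arbitrary illustrative value. Class and method names are hypothetical:

```python
class RenoSender:
    """Event-driven sketch of the TCP Reno sender FSM (windows in MSS)."""

    def __init__(self, ssthresh: float = 64.0) -> None:
        self.state = "slow_start"
        self.cwnd = 1.0            # cwnd = 1 MSS
        self.ssthresh = ssthresh   # illustrative initial value, in MSS
        self.dup_acks = 0
        self.retransmissions = 0   # counts "retransmit missing segment" actions

    def new_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh            # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += 1                       # cwnd = cwnd + MSS
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += 1 / self.cwnd           # cwnd = cwnd + MSS*(MSS/cwnd)
        self.dup_acks = 0

    def duplicate_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd += 1                       # inflate for each further dup ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:                   # fast retransmit
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3        # ssthresh + 3*MSS
            self.retransmissions += 1
            self.state = "fast_recovery"

    def timeout(self) -> None:
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0
        self.dup_acks = 0
        self.retransmissions += 1
        self.state = "slow_start"


s = RenoSender(ssthresh=8.0)
for _ in range(8):
    s.new_ack()
print(s.state, s.cwnd)               # congestion_avoidance 9.0
for _ in range(3):
    s.duplicate_ack()
print(s.state, s.ssthresh, s.cwnd)   # fast_recovery 4.5 7.5
s.new_ack()
print(s.state, s.cwnd)               # congestion_avoidance 4.5
```

Eight new ACKs take the sender out of slow start, three duplicate ACKs trigger fast retransmit and fast recovery, and the next new ACK deflates the window back to ssthresh.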
Summary: TCP Congestion Control
• When CongWin is below Threshold, the sender is in the slow-start phase; the window grows exponentially.
• When CongWin is above Threshold, the sender is in the congestion-avoidance phase; the window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unACKed data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Results in a doubling of CongWin every RTT
ACK receipt for previously unACKed data | Congestion Avoidance (CA) | CongWin = CongWin + MSS·(MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to “Slow Start” | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being ACKed | CongWin and Threshold not changed
TCP Throughput (Idealized)
• What is the average throughput of TCP as a function of window size and RTT?
  – Long-lived connection; ignore slow start
• When the window is W, throughput is W/RTT
• Just after a loss, the window drops to W/2 and throughput to W/(2·RTT)
• Average steady-state throughput: 0.75·W/RTT
[Figure: window sawtooth oscillating between W/2 and W]
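A quick numeric check of the 0.75 factor, assuming the window grows by exactly one segment per RTT from W/2 to W and a loss occurs as soon as it reaches W; for even W the mean of that ramp is exactly 3W/4. Function names are hypothetical:

```python
def average_window(w: int) -> float:
    """Mean window (in segments) over one idealized sawtooth cycle.

    Assumes the window restarts at w/2 after each loss and grows by one
    segment per RTT until it reaches w, where the next loss occurs.
    """
    windows = range(w // 2, w + 1)  # one sample per RTT
    return sum(windows) / len(windows)


def average_throughput_bps(w: int, mss_bytes: int, rtt_s: float) -> float:
    """Average throughput in bits/s for the idealized sawtooth above."""
    return average_window(w) * mss_bytes * 8 / rtt_s


print(average_window(100) / 100)  # 0.75
```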
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
$$\text{throughput} = \frac{1.22 \cdot \text{MSS}}{\text{RTT} \cdot \sqrt{L}}$$
• $L = 2 \cdot 10^{-10}$. Very low!
• Requires an almost perfect link
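Both numbers in this example can be reproduced with a few lines of Python: W = rate·RTT/MSS, and L follows from solving the throughput formula above for L. The helper names are hypothetical:

```python
def window_for_rate(rate_bps: float, mss_bytes: int, rtt_s: float) -> float:
    """Segments that must be in flight to sustain rate_bps: W = rate*RTT/MSS."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def loss_rate_for_rate(rate_bps: float, mss_bytes: int, rtt_s: float) -> float:
    """Invert throughput = 1.22*MSS / (RTT*sqrt(L)) to solve for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


W = window_for_rate(10e9, 1500, 0.100)
L = loss_rate_for_rate(10e9, 1500, 0.100)
print(f"W = {W:,.0f} segments, L = {L:.1e}")  # W = 83,333 segments, L = 2.1e-10
```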
Equation-Based Control
• Note: TCP congestion control forms a control loop
  – Inputs: round-trip time and “loss events” (which are samples of timeout events + 3-dup-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate from these inputs
• See RFC 5348 for more info
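A sketch of the throughput equation that TFRC (RFC 5348) uses to set its sending rate. We believe this matches the equation in RFC 5348 Section 3.1, but treat it as an illustration and check the RFC for the exact form and for its guidance on b and t_RTO; the defaults here (b = 1, t_RTO = 4·RTT) are assumptions. For small p it reduces to roughly the 1.22·MSS/(RTT·√L) relation used for TCP throughput and loss:

```python
import math


def tfrc_rate(s, rtt, p, b=1, t_rto=None):
    """Allowed sending rate in bytes/s from the TCP throughput equation.

    s: segment size (bytes); rtt: round-trip time (s); p: loss event rate;
    b: packets acknowledged per ACK; t_rto: retransmission timeout (s).
    """
    if not 0 < p <= 1:
        raise ValueError("loss event rate p must be in (0, 1]")
    if t_rto is None:
        t_rto = 4 * rtt  # assumed simplification; see the RFC for its guidance
    denom = (rtt * math.sqrt(2 * b * p / 3)
             + t_rto * 3 * math.sqrt(3 * b * p / 8) * p * (1 + 32 * p ** 2))
    return s / denom


for p in (1e-4, 1e-3, 1e-2, 1e-1):
    print(f"p = {p:g}: {tfrc_rate(1460, 0.1, p):,.0f} bytes/s")
```

The loss-event rate p is the controller's input, so the sender adjusts its rate smoothly as p changes instead of halving on each individual loss.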
TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]
Why is TCP fair?
Consider two competing sessions:
• Additive increase gives a slope of 1 as throughput increases
• Multiplicative decrease decreases throughput proportionally
[Figure: Connection 1 throughput vs. Connection 2 throughput, both axes from 0 to R, with the equal-bandwidth-share line; the operating point alternates between congestion avoidance (additive increase) and loss (decrease window by a factor of 2)]
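A toy model of the convergence argument: two flows with equal RTTs, synchronized losses whenever their combined rate exceeds R, and +1 per RTT otherwise. Additive increase keeps the gap between the flows constant, while every halving halves it, so the flows drift toward the equal-share line. All names and parameters are hypothetical:

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, rtts: int) -> tuple[float, float]:
    """Throughputs of two AIMD flows after `rtts` rounds.

    Toy model (assumed): same RTT for both flows, +1 unit per RTT each,
    and both halve in the same RTT whenever x1 + x2 > capacity.
    """
    for _ in range(rtts):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2   # multiplicative decrease (loss)
        else:
            x1, x2 = x1 + 1, x2 + 1   # additive increase (congestion avoidance)
    return x1, x2


f1, f2 = aimd_two_flows(80.0, 10.0, capacity=100.0, rtts=200)
print(f"flow 1 = {f1:.2f}, flow 2 = {f2:.2f}, gap = {abs(f1 - f2):.2f}")
```

Starting from a 70-unit gap, the gap shrinks by half at each loss event and stays fixed in between.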
Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
  – they do not want their rate throttled by congestion control
• Instead they use UDP:
  – pump audio/video at a constant rate, tolerate packet loss
• TCP friendliness: such non-TCP flows should use no more bandwidth than a TCP flow would under the same conditions
Fairness and parallel TCP connections
• Nothing prevents an app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP connection, gets rate R/10
  – new app asks for 9 TCP connections, gets R/2
• That's what “download accelerators” do
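The R/10 and R/2 figures follow from assuming each of the n connections on the bottleneck gets R/n. A minimal sketch with a hypothetical helper, using exact fractions:

```python
from fractions import Fraction


def share_of_bottleneck(new_conns: int, existing_conns: int) -> Fraction:
    """Fraction of link rate R an app gets by opening new_conns TCP connections,
    assuming every connection on the bottleneck gets an equal R/n share."""
    return Fraction(new_conns, new_conns + existing_conns)


print(share_of_bottleneck(1, 9))  # 1/10
print(share_of_bottleneck(9, 9))  # 1/2
```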
TCP Congestion Control
• End-to-end control (no network assistance)
• Sender limits transmission:
$$\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$$
• Roughly,
$$\text{rate} = \frac{\text{CongWin}}{\text{RTT}} \ \text{bytes/sec}$$
• CongWin and RTT influence throughput
• CongWin is a dynamic function of perceived network congestion
How does the sender notice congestion?
• Loss event = timeout or 3 duplicate ACKs
• TCP sender reduces its rate (CongWin) after a loss event
• (This assumes congestion is the primary cause of loss.)
Three mechanisms:
• AIMD
• slow start
• fast recovery
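The sender-side limit and the rough rate estimate can be written directly. Names are hypothetical and byte counts are plain integers:

```python
def sendable_bytes(last_byte_sent: int, last_byte_acked: int,
                   cong_win: int, rcv_window: int) -> int:
    """Bytes the sender may still put on the wire while keeping
    LastByteSent - LastByteAcked <= min(CongWin, RcvWindow)."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cong_win, rcv_window) - in_flight)


def approx_rate(cong_win: int, rtt_s: float) -> float:
    """Rough sending rate in bytes/sec: CongWin / RTT."""
    return cong_win / rtt_s


print(sendable_bytes(10_000, 6_000, cong_win=8_000, rcv_window=16_000))  # 4000
```

Whichever of CongWin and RcvWindow is smaller is the binding limit, so the same code covers both flow control and congestion control.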
TCP AIMD
• multiplicative decrease: cut CongWin in half after a loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)
[Figure: congestion window (8, 16, 24 Kbytes) vs. time for a long-lived TCP connection, showing the AIMD sawtooth]
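A toy trace of the AIMD sawtooth. It assumes, purely for illustration, that a loss occurs exactly when CongWin reaches 24 Kbytes (echoing the figure's axis) and that MSS is 1 Kbyte; `aimd_sawtooth` is a hypothetical name:

```python
def aimd_sawtooth(start: float, loss_at: float, rtts: int, mss: float = 1.0) -> list[float]:
    """Per-RTT CongWin of one long-lived flow under AIMD.

    Toy assumption: a loss event happens exactly when CongWin reaches
    `loss_at`; otherwise the window grows by one MSS per RTT.
    """
    cong_win = float(start)
    trace = []
    for _ in range(rtts):
        trace.append(cong_win)
        if cong_win >= loss_at:
            cong_win /= 2            # multiplicative decrease
        else:
            cong_win += mss          # additive increase (probing)
    return trace


print(aimd_sawtooth(start=20, loss_at=24, rtts=10))
# [20.0, 21.0, 22.0, 23.0, 24.0, 12.0, 13.0, 14.0, 15.0, 16.0]
```

The trace climbs by one MSS per RTT to 24, drops to 12, and climbs again.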
CS 4284 Spring 2013
TCP Slow Startbull When connection
begins CongWin = 1 MSSndash Example MSS =
500 bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly
ramp up to respectable rate
bull When connection begins increase rate exponentially fast until first loss event
CS 4284 Spring 2013
TCP Slow Start (more)bull When connection begins
increase rate exponentially until first loss eventndash double CongWin every
RTTndash done by incrementing CongWin for every ACK received
bull Summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
CS 4284 Spring 2013
TCP Tahoe vs RenoQ When should the
exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss
eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly
(this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKsbull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout eventndash CongWin instead set
to 1 MSS ndash window then grows
exponentiallyndash to a threshold then
grows linearly
bull 3 dup ACKs indicates network capable of delivering some segments
bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
CS 4284 Spring 2013
TCP Sender Congestion ControlEvent State TCP Sender Action Commentary
ACK receipt for previously unacked data
Slow Start (SS)
CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA
Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data
CongestionAvoidance (CA)
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK
SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter slow start
Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
CS 4284 Spring 2013
TCP Throughput Idealized
bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start
bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput
to W2RTT bull Average steady-state throughput 75 WRTT
W
W2
CS 4284 Spring 2013
TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT
want 10 Gbps throughputbull Requires window size W = 83333 in-flight
segmentsbull Throughput in terms of loss rate
bull L = 210-10 Very lowbull Require almost perfect link
LRTTMSS221
Equation-Based Control
bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which
are samples of timeout events + 3-ack events)bull Instead equation-based control uses an
equation to compute sending rate based on this input
bull See RFC 5348 for more info
CS 4284 Spring 2013
TCP Fairness
CS 4284 Spring 2013
Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk
TCP connection 1
bottleneckrouter
capacity RTCP connection 2
TCP Fairness
CS 4284 Spring 2013
Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughputConn
e ctio
n 2
thro
u ghp
ut
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
CS 4284 Spring 2013
Fairness (more)Fairness and UDPbull Multimedia apps often
do not use TCPndash do not want rate throttled
by congestion controlbull Instead use UDP
ndash pump audiovideo at constant rate tolerate packet loss
bull TCP friendliness
Fairness and parallel TCP connections
bull nothing prevents app from opening parallel connections between 2 hosts
bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets
rate R10ndash new app asks for 9 TCPs gets
R2bull Thatrsquos what ldquodownload
acceleratorsrdquo do
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestionwindow
multiplicative decrease cut CongWin in half after loss event
additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Startbull When connection
begins CongWin = 1 MSSndash Example MSS =
500 bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly
ramp up to respectable rate
bull When connection begins increase rate exponentially fast until first loss event
CS 4284 Spring 2013
TCP Slow Start (more)bull When connection begins
increase rate exponentially until first loss eventndash double CongWin every
RTTndash done by incrementing CongWin for every ACK received
bull Summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
CS 4284 Spring 2013
TCP Tahoe vs RenoQ When should the
exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss
eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly
(this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKsbull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout eventndash CongWin instead set
to 1 MSS ndash window then grows
exponentiallyndash to a threshold then
grows linearly
bull 3 dup ACKs indicates network capable of delivering some segments
bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
CS 4284 Spring 2013
TCP Sender Congestion ControlEvent State TCP Sender Action Commentary
ACK receipt for previously unacked data
Slow Start (SS)
CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA
Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data
CongestionAvoidance (CA)
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK
SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter slow start
Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
CS 4284 Spring 2013
TCP Throughput Idealized
bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start
bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput
to W2RTT bull Average steady-state throughput 75 WRTT
W
W2
CS 4284 Spring 2013
TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT
want 10 Gbps throughputbull Requires window size W = 83333 in-flight
segmentsbull Throughput in terms of loss rate
bull L = 210-10 Very lowbull Require almost perfect link
LRTTMSS221
Equation-Based Control
bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which
are samples of timeout events + 3-ack events)bull Instead equation-based control uses an
equation to compute sending rate based on this input
bull See RFC 5348 for more info
CS 4284 Spring 2013
TCP Fairness
CS 4284 Spring 2013
Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk
TCP connection 1
bottleneckrouter
capacity RTCP connection 2
TCP Fairness
CS 4284 Spring 2013
Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughputConn
e ctio
n 2
thro
u ghp
ut
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
CS 4284 Spring 2013
Fairness (more)Fairness and UDPbull Multimedia apps often
do not use TCPndash do not want rate throttled
by congestion controlbull Instead use UDP
ndash pump audiovideo at constant rate tolerate packet loss
bull TCP friendliness
Fairness and parallel TCP connections
bull nothing prevents app from opening parallel connections between 2 hosts
bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets
rate R10ndash new app asks for 9 TCPs gets
R2bull Thatrsquos what ldquodownload
acceleratorsrdquo do
CS 4284 Spring 2013
TCP Slow Startbull When connection
begins CongWin = 1 MSSndash Example MSS =
500 bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTTndash desirable to quickly
ramp up to respectable rate
bull When connection begins increase rate exponentially fast until first loss event
CS 4284 Spring 2013
TCP Slow Start (more)bull When connection begins
increase rate exponentially until first loss eventndash double CongWin every
RTTndash done by incrementing CongWin for every ACK received
bull Summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
CS 4284 Spring 2013
TCP Tahoe vs RenoQ When should the
exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss
eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly
(this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKsbull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout eventndash CongWin instead set
to 1 MSS ndash window then grows
exponentiallyndash to a threshold then
grows linearly
bull 3 dup ACKs indicates network capable of delivering some segments
bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
CS 4284 Spring 2013
TCP Sender Congestion ControlEvent State TCP Sender Action Commentary
ACK receipt for previously unacked data
Slow Start (SS)
CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA
Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data
CongestionAvoidance (CA)
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK
SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter slow start
Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
CS 4284 Spring 2013
TCP Throughput Idealized
bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start
bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput
to W2RTT bull Average steady-state throughput 75 WRTT
W
W2
CS 4284 Spring 2013
TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT
want 10 Gbps throughputbull Requires window size W = 83333 in-flight
segmentsbull Throughput in terms of loss rate
bull L = 210-10 Very lowbull Require almost perfect link
LRTTMSS221
Equation-Based Control
bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which
are samples of timeout events + 3-ack events)bull Instead equation-based control uses an
equation to compute sending rate based on this input
bull See RFC 5348 for more info
CS 4284 Spring 2013
TCP Fairness
CS 4284 Spring 2013
Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk
TCP connection 1
bottleneckrouter
capacity RTCP connection 2
TCP Fairness
CS 4284 Spring 2013
Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughputConn
e ctio
n 2
thro
u ghp
ut
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
CS 4284 Spring 2013
Fairness (more)Fairness and UDPbull Multimedia apps often
do not use TCPndash do not want rate throttled
by congestion controlbull Instead use UDP
ndash pump audiovideo at constant rate tolerate packet loss
bull TCP friendliness
Fairness and parallel TCP connections
bull nothing prevents app from opening parallel connections between 2 hosts
bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets
rate R10ndash new app asks for 9 TCPs gets
R2bull Thatrsquos what ldquodownload
acceleratorsrdquo do
CS 4284 Spring 2013
TCP Slow Start (more)bull When connection begins
increase rate exponentially until first loss eventndash double CongWin every
RTTndash done by incrementing CongWin for every ACK received
bull Summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
CS 4284 Spring 2013
TCP Tahoe vs RenoQ When should the
exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss
eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly
(this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKsbull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout eventndash CongWin instead set
to 1 MSS ndash window then grows
exponentiallyndash to a threshold then
grows linearly
bull 3 dup ACKs indicates network capable of delivering some segments
bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
CS 4284 Spring 2013
TCP Sender Congestion ControlEvent State TCP Sender Action Commentary
ACK receipt for previously unacked data
Slow Start (SS)
CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA
Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data
CongestionAvoidance (CA)
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK
SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter slow start
Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
CS 4284 Spring 2013
TCP Throughput Idealized
bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start
bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput
to W2RTT bull Average steady-state throughput 75 WRTT
W
W2
CS 4284 Spring 2013
TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT
want 10 Gbps throughputbull Requires window size W = 83333 in-flight
segmentsbull Throughput in terms of loss rate
bull L = 210-10 Very lowbull Require almost perfect link
LRTTMSS221
Equation-Based Control
bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which
are samples of timeout events + 3-ack events)bull Instead equation-based control uses an
equation to compute sending rate based on this input
bull See RFC 5348 for more info
CS 4284 Spring 2013
TCP Fairness
CS 4284 Spring 2013
Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk
TCP connection 1
bottleneckrouter
capacity RTCP connection 2
TCP Fairness
CS 4284 Spring 2013
Why is TCP fairConsider two competing sessionsbull additive increase gives slope of 1 as throughput increasesbull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughputConn
e ctio
n 2
thro
u ghp
ut
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
CS 4284 Spring 2013
Fairness (more)Fairness and UDPbull Multimedia apps often
do not use TCPndash do not want rate throttled
by congestion controlbull Instead use UDP
ndash pump audiovideo at constant rate tolerate packet loss
bull TCP friendliness
Fairness and parallel TCP connections
bull nothing prevents app from opening parallel connections between 2 hosts
bull Example link of rate R supporting 9 connections ndash new app asks for 1 TCP gets
rate R10ndash new app asks for 9 TCPs gets
R2bull Thatrsquos what ldquodownload
acceleratorsrdquo do
CS 4284 Spring 2013
TCP Tahoe vs RenoQ When should the
exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementationbull Variable Threshold bull At loss event Threshold is set to 12 of CongWin just before loss
eventbull If timeout event set CongWin to 1bull If triple-ack event set CongWin to Threshold and increase linearly
(this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKsbull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout eventndash CongWin instead set
to 1 MSS ndash window then grows
exponentiallyndash to a threshold then
grows linearly
bull 3 dup ACKs indicates network capable of delivering some segments
bull timeout before 3 dup ACKs received is stronger ldquomore alarmingrdquo indicator of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
CS 4284 Spring 2013
TCP Sender Congestion ControlEvent State TCP Sender Action Commentary
ACK receipt for previously unacked data
Slow Start (SS)
CongWin = CongWin + MSS If (CongWin gt Threshold) set state to CA
Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data
CongestionAvoidance (CA)
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK
SS or CA Threshold = CongWin2 CongWin = ThresholdSet state to CA
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS
Timeout SS or CA Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo
Enter slow start
Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
CS 4284 Spring 2013
TCP Throughput Idealized
bull Whatrsquos the average throughout of TCP as a function of window size and RTTndash Long-lived connection Ignore slow start
bull When window is W throughput is WRTTbull Just after loss window drops to W2 throughput
to W2RTT bull Average steady-state throughput 75 WRTT
W
W2
CS 4284 Spring 2013
TCP Throughput amp Lossbull Example 1500 byte segments 100ms RTT
want 10 Gbps throughputbull Requires window size W = 83333 in-flight
segmentsbull Throughput in terms of loss rate
bull L = 210-10 Very lowbull Require almost perfect link
LRTTMSS221
Equation-Based Control
bull Note TCP congestion control forms control loopndash Inputs round-trip time ldquoloss eventsrdquo (which
are samples of timeout events + 3-ack events)bull Instead equation-based control uses an
equation to compute sending rate based on this input
bull See RFC 5348 for more info
CS 4284 Spring 2013
TCP Fairness
CS 4284 Spring 2013
Fairness goal if k TCP sessions share same bottleneck link of bandwidth R each should have average rate of Rk
TCP connection 1
bottleneckrouter
capacity RTCP connection 2
TCP Fairness
CS 4284 Spring 2013
Why is TCP fair?
Consider two competing sessions:
• Additive increase gives a slope of 1 as throughput increases
• Multiplicative decrease decreases throughput proportionally
[Figure: Connection 1 throughput (x-axis, 0 to R) vs. Connection 2 throughput (y-axis, 0 to R); the trajectory alternates between congestion avoidance (additive increase) and loss (window decreased by a factor of 2), converging toward the equal-bandwidth-share line]
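A toy fluid model makes the two-session argument concrete. It assumes, hypothetically, that both sessions have the same RTT, update once per round, and both see a loss whenever their combined rate exceeds R. Additive increase leaves the gap between the two rates unchanged, while each halving cuts the gap in half, so the rates converge toward the equal-share line.

```python
def aimd_two_flows(x1, x2, R, rounds):
    """Synchronized AIMD: both flows add 1 unit per round; both halve on overload."""
    for _ in range(rounds):
        if x1 + x2 > R:                 # shared bottleneck overloaded: both see loss
            x1, x2 = x1 / 2, x2 / 2     # multiplicative decrease halves the gap
        else:
            x1, x2 = x1 + 1, x2 + 1     # additive increase: slope 1, gap unchanged
    return x1, x2


# Start far from fair: flow 2 has eight times flow 1's rate.
x1, x2 = aimd_two_flows(10.0, 80.0, R=100.0, rounds=1000)
print(round(x1, 2), round(x2, 2))
```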
Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
  – do not want their rate throttled by congestion control
• Instead they use UDP:
  – pump audio/video at a constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections
• Nothing prevents an app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP connection, gets rate R/10
  – new app asks for 9 TCP connections, gets R/2
• That's what "download accelerators" do
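The example's arithmetic assumes the bottleneck is split evenly per connection, not per application. Under that assumption (the function name is hypothetical), an app opening n of the total connections gets n/(existing + n) of R:

```python
def app_share(R, existing, new_conns):
    """Rate a new app gets if bandwidth R is split evenly per TCP connection."""
    return R * new_conns / (existing + new_conns)


R = 1.0
print(app_share(R, existing=9, new_conns=1))   # 0.1, i.e. R/10
print(app_share(R, existing=9, new_conns=9))   # 0.5, i.e. R/2
```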
Timeouts vs. 3-dup ACKs
• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after a timeout event:
  – CongWin is instead set to 1 MSS
  – window then grows exponentially to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate that the network is capable of delivering some segments
• A timeout before 3 dup ACKs are received is a stronger, "more alarming" indicator of congestion
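To see the difference in shape, here is a per-RTT trace of the window in units of MSS after each kind of loss. It is a simplified, hypothetical model: one window update per RTT, slow start capped at the threshold, and no fast-recovery window inflation.

```python
def trace(cwnd, event, rtts):
    """Window (in MSS) at the start of each RTT after a loss at window `cwnd`."""
    ssthresh = max(cwnd // 2, 1)        # Threshold = CongWin/2 in both cases
    if event == "3dup":
        cwnd = ssthresh                 # cut in half
    elif event == "timeout":
        cwnd = 1                        # back to 1 MSS
    else:
        raise ValueError(f"unknown event: {event}")
    out = [cwnd]
    for _ in range(rtts):
        if cwnd < ssthresh:
            cwnd = min(2 * cwnd, ssthresh)   # slow start: double per RTT (capped)
        else:
            cwnd += 1                        # congestion avoidance: +1 MSS per RTT
        out.append(cwnd)
    return out


print(trace(16, "3dup", 5))      # [8, 9, 10, 11, 12, 13]
print(trace(16, "timeout", 5))   # [1, 2, 4, 8, 9, 10]
```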
Summary: TCP Congestion Control (state machine)
Initial transition: cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0; enter slow start
Slow start:
  – new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
  – cwnd > ssthresh: go to congestion avoidance
  – duplicate ACK: dupACKcount++
  – timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
  – dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3·MSS; retransmit missing segment; go to fast recovery
Congestion avoidance:
  – new ACK: cwnd = cwnd + MSS · (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
  – duplicate ACK: dupACKcount++
  – timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
  – dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3·MSS; retransmit missing segment; go to fast recovery
Fast recovery:
  – duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
  – new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
  – timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
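This state machine can be sketched as a small Python class. It is a hypothetical illustration, not a real TCP stack: window sizes are in bytes with an assumed MSS of 1460, segment transmission is omitted, and `_retransmit` only counts retransmissions. It follows the Reno-style rules above, including window inflation during fast recovery and deflation to ssthresh on the next new ACK.

```python
MSS = 1460  # assumed segment size in bytes (not from the slides)


class RenoSender:
    """Hypothetical sketch: slow start, congestion avoidance, fast recovery."""

    def __init__(self):
        self.state = "slow_start"
        self.cwnd = MSS                 # initial transition: cwnd = 1 MSS
        self.ssthresh = 64 * 1024       # ssthresh = 64 KB
        self.dup_acks = 0
        self.retransmits = 0            # stands in for "retransmit missing segment"

    def _retransmit(self):
        self.retransmits += 1

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh   # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += MSS            # exponential growth per RTT
        else:
            self.cwnd += MSS * MSS / self.cwnd   # about +1 MSS per RTT
        self.dup_acks = 0
        if self.state == "slow_start" and self.cwnd > self.ssthresh:
            self.state = "congestion_avoidance"

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += MSS            # inflate: another segment has left the network
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self._retransmit()
            self.state = "fast_recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = MSS
        self.dup_acks = 0
        self._retransmit()
        self.state = "slow_start"


sender = RenoSender()
for _ in range(3):
    sender.on_new_ack()                 # slow start: 1 -> 4 MSS
for _ in range(3):
    sender.on_dup_ack()                 # third dup ACK triggers fast retransmit
print(sender.state, sender.cwnd / MSS)  # fast_recovery 5.0
```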
Summary: TCP Congestion Control
• When CongWin is below Threshold, the sender is in the slow-start phase; the window grows exponentially.
• When CongWin is above Threshold, the sender is in the congestion-avoidance phase; the window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.