Posted on 16-Jul-2015
Chapter 12 TCP Traffic Control
Traffic Control in TCP
Today
Traffic control in TCP
Router-based congestion control
– RED (Random Early Detection)
– ECN (Explicit Congestion Notification)
Reference and foils ©: Stallings, High-Speed Networks and Internets, Ch. 12
Covered in depth in [KMK]
TCP Flow and Congestion Control
TCP transmit rate is determined by the rate of incoming ACKs
The rate of ACK arrival is determined by the round-trip path between source and destination
The bottleneck may be the destination or the internet
– Sender cannot tell which
– Only the internet bottleneck can be due to congestion
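A minimal Python sketch of this self-clocking effect, under simplifying assumptions that are not from the slides: a full window leaves the sender back-to-back, the only queue is at the bottleneck link, and every other delay is lumped into one fixed `base_rtt`. The function and parameter names are hypothetical. Because each returning ACK releases exactly one new segment, new segments leave the sender spaced one bottleneck transmission time apart, whatever the sender's own link speed.

```python
def ack_clocked_sends(window, seg_bits, bottleneck_bps, base_rtt):
    """Times at which returning ACKs release new segments (simplified model)."""
    tx = seg_bits / bottleneck_bps  # time to push one segment through the bottleneck
    release_times = []
    for i in range(window):
        leaves_bottleneck = (i + 1) * tx            # queued segments drain one per tx seconds
        ack_arrives = leaves_bottleneck + base_rtt  # ACK returns after the rest of the loop
        release_times.append(ack_arrives)           # each ACK releases one new segment
    return release_times


sends = ack_clocked_sends(window=4, seg_bits=12_000, bottleneck_bps=1_200_000, base_rtt=0.1)
gaps = [b - a for a, b in zip(sends, sends[1:])]
print(gaps)  # every gap equals the bottleneck spacing of 0.01 s, up to rounding
```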
Figure 12.7 TCP Flow and Congestion Control
Figure 12.6 TCP Segment Pacing
TCP Flow Control
Simple, yet often has a critical impact on performance
Optimal: Window = Bits-In-Flight = Bandwidth × Delay
– Bandwidth: of the `bottleneck` (slowest) link on the path
– Delay: end-to-end (propagation, queuing, processing)
If window too small: slows down the connection
If window too large:
– Requires large buffers (or risks buffer overflow)
– Many bytes may be sent into a `broken connection`
– Extra segments queued at routers: not good!
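As a worked example of Window = Bandwidth × Delay, the sketch below computes the window for a hypothetical path; the 10 Mbps and 100 ms figures and the function name are illustrative assumptions, and the delay is simply the Delay term of the formula.

```python
def optimal_window_bytes(bottleneck_bps: float, delay_s: float) -> float:
    """Window = bits in flight = bandwidth x delay, converted from bits to bytes."""
    return bottleneck_bps * delay_s / 8


# Hypothetical path: 10 Mbps bottleneck, 100 ms delay.
print(optimal_window_bytes(10e6, 0.100))  # 125000.0
```

Note that 125,000 bytes is already larger than the 65,535-byte maximum that the 16-bit TCP window field can express without the window scale option.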
TCP Header Traffic Control Fields
Sequence number (SN) of first byte in data segment [set by sender]
Acknowledgement number (AN)
Window (W)
– `Credit allocation` for flow control
Acknowledgement contains AN = i, W = j:
– Bytes through SN = i − 1 are acknowledged
– Permission is granted to send W = j more bytes, i.e., bytes i through i + j − 1
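The AN/W rule maps directly onto a range of sequence numbers. A small sketch (the helper name is hypothetical):

```python
def allowed_bytes(an: int, w: int) -> range:
    """Sequence numbers the sender may transmit after an ACK carrying AN = an, W = w.

    Bytes through an - 1 are acknowledged; the credit covers bytes an .. an + w - 1.
    """
    return range(an, an + w)


window = allowed_bytes(an=1001, w=500)
print(window.start, window[-1], len(window))  # 1001 1500 500
```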
Figure 12.1 TCP Credit Allocation Mechanism
Credit Allocation is Flexible
Suppose the last message B issued was AN = i, W = j
To increase credit to k (k > j) when no new data has arrived, B issues AN = i, W = k
To acknowledge a segment containing m bytes (m < j), B issues AN = i + m, W = j − m
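The two adjustments can be sketched as a small class that tracks the last (AN, W) pair receiver B issued. The class and method names are hypothetical. The point it illustrates: acknowledging m bytes with W = j − m leaves the right edge of the window (AN + W − 1) where it was, while raising the credit moves that edge forward without acknowledging anything new.

```python
class CreditReceiver:
    """Receiver B's record of the last (AN, W) it issued (hypothetical helper)."""

    def __init__(self, an: int, w: int):
        self.an, self.w = an, w

    def raise_credit(self, k: int) -> tuple:
        """No new data arrived: re-issue the same AN with a larger window k."""
        if k <= self.w:
            raise ValueError("new credit must exceed the current window")
        self.w = k
        return self.an, self.w

    def ack_segment(self, m: int) -> tuple:
        """Acknowledge an m-byte segment without extending the credit."""
        if not 0 < m <= self.w:
            raise ValueError("segment must fit inside the granted window")
        self.an, self.w = self.an + m, self.w - m
        return self.an, self.w

    def right_edge(self) -> int:
        """Last sequence number the sender is currently allowed to send."""
        return self.an + self.w - 1


b = CreditReceiver(an=1001, w=1400)
print(b.ack_segment(200), b.right_edge())    # (1201, 1200) 2400: the right edge has not moved
print(b.raise_credit(2000), b.right_edge())  # (1201, 2000) 3200
```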
Figure 12.2 Flow Control Perspectives
Credit Policy
Receiver needs a policy for how much credit to give the sender
Conservative approach: grant credit up to the limit of available buffer space
– May limit throughput in long-delay situations
Optimistic approach: grant credit based on expected free space when data arrives
– May cause `overbooking`
– But not if arrivals pass through a `leaky bucket`
Effect of Window Size
W = TCP window size (bytes)
R = Data rate (bps) at TCP source
D = Propagation delay (seconds)
After the TCP source begins transmitting, it takes D seconds for the first byte to arrive, and D seconds for the acknowledgement to return
The TCP source could transmit at most 2RD bits, or RD/4 bytes, before getting an Ack
Normalized Throughput S
$$S = \begin{cases} 1, & W \ge RD/4 \\ \dfrac{4W}{RD}, & W < RD/4 \end{cases}$$
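The piecewise formula for S can be evaluated directly. A sketch with hypothetical numbers (a 65,535-byte window, a 10 Mbps source and 50 ms propagation delay, so RD/4 = 125,000 bytes):

```python
def normalized_throughput(w_bytes: float, r_bps: float, d_s: float) -> float:
    """S = 1 when W >= RD/4 bytes, otherwise 4W/(RD)."""
    rd = r_bps * d_s
    return 1.0 if w_bytes >= rd / 4 else 4 * w_bytes / rd


print(normalized_throughput(65_535, 10e6, 0.05))   # about 0.524: the window is the limit
print(normalized_throughput(200_000, 10e6, 0.05))  # 1.0
```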
Complicating Factors
Multiple TCP connections are multiplexed over the same network interface, reducing R and efficiency
For multi-hop connections, D is the sum of delays across each network plus delays at each router
If the source data rate R exceeds the data rate on one of the hops, that hop will be a bottleneck
– Use a lower rate (goal of congestion control)
Lost segments are retransmitted, reducing throughput; the impact depends on the retransmission policy
TCP’s delayed (accumulated) acks: send an Ack only after receiving 2 MSS, or 200 msec after the last Ack
– So W = (2 MSS) × k, i.e., a multiple of 2 MSS [Why not just W ≥ 2 MSS?]
– Actually W ≥ 4 MSS [otherwise the sender waits for an Ack]
– Indeed W = 4 MSS is a popular default [but not always best]
Retransmission Strategy
Retransmission required when:
1. Segment arrives damaged, as indicated by a checksum error
2. Segment fails to arrive
TCP relies exclusively on positive acknowledgements + timeouts:
– Retransmission on acknowledgement timeout
– No explicit negative acknowledgement
Retransmission Timer
A timer is associated with each segment as it is sent (or with the oldest unacknowledged segment)
If the timer expires before the segment is acknowledged, the sender must retransmit
Value of retransmission timer: a key design issue
– Timer should be a bit longer than the round-trip delay (send segment, receive ack)
– Delay is variable
– Timer too small: many unnecessary retransmissions, wasting network bandwidth
– Timer too large: delay in handling a lost segment
Difficulties in measuring delay
Delayed (accumulated) Acks
For retransmitted segments, can’t tell whether the acknowledgement is a response to the original transmission or to the retransmission
– Don’t estimate delay from retransmissions [Karn’s algorithm]
Network conditions may change suddenly
– Yet don’t overreact to the impact of burst traffic
Adaptive Retransmission Timer: Simple Averaging Method
Average Round-Trip Time (ARTT)
$$ARTT(K+1) = \frac{1}{K+1}\sum_{i=1}^{K+1} RTT(i) = \frac{K}{K+1}\,ARTT(K) + \frac{1}{K+1}\,RTT(K+1)$$
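The incremental form of the simple average needs to store only the previous ARTT and the sample count K, not every past RTT. A minimal sketch (the function name and sample values are hypothetical):

```python
def running_artt(samples):
    """Return ARTT(1), ARTT(2), ... computed with the incremental formula."""
    artt = 0.0
    history = []
    for k, rtt in enumerate(samples):  # k samples have already been averaged (k = K)
        artt = (k / (k + 1)) * artt + rtt / (k + 1)
        history.append(artt)
    return history


samples = [1.0, 2.0, 3.0, 6.0]
history = running_artt(samples)
# The last value matches the plain mean of all samples (3.0), up to rounding.
print(history[-1], sum(samples) / len(samples))
```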
RFC 793 Exponential Averaging
Smoothed Round-Trip Time (SRTT)
SRTT(K + 1) = α × SRTT(K) + (1 – α) × RTT(K + 1)
The older the observation, the less it is counted in the average.
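Unrolling the recursion shows why older observations count less: the sample that is i steps old carries weight (1 − α)·α^i. The sketch below (hypothetical names; SRTT starts at 0 so the leftover α^n·SRTT(0) term vanishes) checks that the recursive form and the unrolled weighted sum agree.

```python
def smooth(samples, alpha, srtt0):
    """Apply SRTT(K+1) = alpha*SRTT(K) + (1 - alpha)*RTT(K+1) to each sample in turn."""
    srtt = srtt0
    for rtt in samples:
        srtt = alpha * srtt + (1 - alpha) * rtt
    return srtt


def weights(alpha, n):
    """Weight of the sample that is i steps old (i = 0 is the newest): (1 - alpha) * alpha**i."""
    return [(1 - alpha) * alpha ** i for i in range(n)]


samples = [1.0, 2.0, 3.0]
direct = smooth(samples, alpha=0.5, srtt0=0.0)
unrolled = sum(wt * rtt for wt, rtt in zip(weights(0.5, 3), reversed(samples)))
print(direct, unrolled)   # both 2.125
print(weights(0.875, 4))  # with a large alpha, older samples fade slowly
```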
Figure 12.4 Exponential Smoothing Coefficients
SRTT(K + 1) = α × SRTT(K) + (1 – α) × RTT(K + 1) (α = 0.5, 0.875)
Figure 12.5 Exponential Averaging
RFC 793 Retransmission Timeout
RTO(K + 1) = Min(UB, Max(LB, β × SRTT(K + 1)))
UB, LB: prechosen fixed upper and lower bounds (typical/default: LB = 1 sec, UB = 64 sec)
Example values for α, β:
0.8 < α < 0.9, 1.3 < β < 2.0
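The clamped RTO formula in a few lines. This is a sketch: β = 1.5 is just one value from the example range, LB and UB are the typical bounds quoted above, and the function name is hypothetical.

```python
def rfc793_rto(srtt: float, beta: float = 1.5, lb: float = 1.0, ub: float = 64.0) -> float:
    """RTO = Min(UB, Max(LB, beta * SRTT))."""
    return min(ub, max(lb, beta * srtt))


for srtt in (0.2, 2.0, 100.0):
    print(srtt, rfc793_rto(srtt))  # RTO: 1.0 (LB applies), 3.0, 64.0 (UB applies)
```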
Problem: RTT Instability Periods
3 sources of high variance in RTT:
– Queuing, collisions due to other sources
– Peer may not acknowledge segments immediately
– Low data rate: different packet sizes cause different delays
Solution: Jacobson’s Algorithm
SRTT(K + 1) = (1 – g) × SRTT(K) + g × RTT(K + 1)
SERR(K + 1) = RTT(K + 1) – SRTT(K)
SDEV(K + 1) = (1 – h) × SDEV(K) + h ×|SERR(K + 1)|
RTO(K + 1) = SRTT(K + 1) + f × SDEV(K + 1)
g = 0.125, h = 0.25, f = 2 or f = 4 (most current implementations use f = 4)
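One step of Jacobson's estimator, written as a sketch with the constants above (the function name and the starting values are hypothetical). Note that SERR is computed against the old SRTT(K), before SRTT is updated.

```python
def jacobson_update(srtt, sdev, rtt, g=0.125, h=0.25, f=4):
    """One step of Jacobson's estimator; returns (SRTT(K+1), SDEV(K+1), RTO(K+1))."""
    serr = rtt - srtt                      # SERR(K+1) uses the old SRTT(K)
    srtt = (1 - g) * srtt + g * rtt
    sdev = (1 - h) * sdev + h * abs(serr)
    return srtt, sdev, srtt + f * sdev


# Hypothetical starting point: SRTT = 1.0 s, SDEV = 0, then one 2.0 s sample.
print(jacobson_update(1.0, 0.0, 2.0))  # (1.125, 0.25, 2.125)
```

Compared with RFC 793's β × SRTT, the f × SDEV term widens the RTO quickly when samples become erratic and lets it tighten again once they settle.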
Figure 12.8 Jacobson’s RTO Calculations
Exponential RTO Backoff
RTT is measured only when there are no failures
– `Karn’s algorithm`: don’t measure on resend
What if RTO is too small? Failures, hence no RTT samples…
Increase RTO each time the same segment is retransmitted: backoff process
Multiply RTO by a constant on each retransmit:
– RTO[K+1] = q × RTO[K]
– q = 2 is called binary exponential backoff
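The sequence of timeouts produced by binary exponential backoff for one segment. A sketch only: clamping at UB = 64 sec reuses the bound from the RFC 793 slide and is an assumption here, and the function name is hypothetical.

```python
def backoff_timeouts(rto, retransmissions, q=2.0, ub=64.0):
    """Timeout used for the first send and for each retransmission of the same segment."""
    timeouts = []
    for _ in range(retransmissions + 1):
        timeouts.append(rto)
        rto = min(ub, q * rto)  # RTO[K+1] = q * RTO[K], clamped at UB (assumed cap)
    return timeouts


print(backoff_timeouts(1.0, 7))  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 64.0]
```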
TCP Implementation Options
Send: Nagle’s algorithm or immediate
– Nagle: while waiting for an Ack, send partial segments only on Ack receipt/time-out
Deliver: cumulative or immediate (`push`)
Ack: delayed (cumulative) or immediate
Accept: in-order or in-window
Retransmit:
– First-only: one timer for the entire queue, retransmit only the first segment in the queue
– Batch: one timer for the entire queue, retransmit the entire queue
– Individual: one timer and retransmission per packet
Which is best if the receiver uses an in-order accept policy?
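The Nagle send rule can be sketched as a single decision function. This is a simplification under stated assumptions: it ignores the receiver's window and timer details, and the function name is hypothetical.

```python
def nagle_may_send(buffered_bytes: int, mss: int, unacked_in_flight: bool) -> bool:
    """Simplified Nagle decision: may the sender transmit what is buffered right now?"""
    if buffered_bytes >= mss:
        return True                  # a full-sized segment can always be sent
    if not unacked_in_flight:
        return buffered_bytes > 0    # nothing awaiting an Ack: send the partial segment now
    return False                     # hold partial data until an Ack (or timeout) arrives


print(nagle_may_send(100, 1460, unacked_in_flight=True))   # False: wait for the Ack
print(nagle_may_send(100, 1460, unacked_in_flight=False))  # True
```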
TCP Congestion Control
Dynamic routing can alleviate congestion by spreading load more evenly
But only effective for unbalanced loads and brief surges in traffic
Congestion can only be controlled by limiting total amount of data entering network
ICMP Source Quench message is crude and not effective
RSVP may help but not widely implemented
TCP Congestion Control is Difficult
IP is connectionless and stateless, with no provision for detecting or controlling congestion
TCP only provides end-to-end flow control
No cooperative, distributed algorithm to bind together various TCP entities
Congestion Window Management
Slow start / rapid accelerate
Dynamic window sizing on congestion
Fast retransmit
Fast recovery
Limited transmit
Slow Start / Rapid Accelerate
awnd = MIN[credit, cwnd]
where
awnd = allowed window in segments
cwnd = congestion window in segments
credit = amount of unused credit granted in most recent ack
cwnd = 1 for a new connection and increased by 1 for each ack received, up to a maximum; beyond that maximum, increase by 1 per RTT
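A per-RTT sketch of slow start with awnd = MIN[credit, cwnd]. Assumptions (not from the slides): every segment is acknowledged within the RTT, there is one ACK per segment, credit stays constant, and growth that would overshoot the maximum within one RTT is simply capped. The function name is hypothetical. With ample credit the window doubles each RTT until the maximum; a small credit caps what is actually sent even while cwnd keeps growing.

```python
def slow_start_awnd(credit, max_cwnd, rtts):
    """Segments sent in each RTT, awnd = min(credit, cwnd), starting from cwnd = 1."""
    cwnd = 1
    sent = []
    for _ in range(rtts):
        awnd = min(credit, cwnd)
        sent.append(awnd)
        if cwnd < max_cwnd:
            cwnd = min(cwnd + awnd, max_cwnd)  # +1 per ACK, one ACK per segment sent
        else:
            cwnd += 1                          # past the maximum: +1 per RTT
    return sent


print(slow_start_awnd(credit=100, max_cwnd=8, rtts=6))  # [1, 2, 4, 8, 9, 10]
print(slow_start_awnd(credit=3, max_cwnd=8, rtts=5))    # [1, 2, 3, 3, 3]
```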
Figure 23.9 Effect of Slow Start
Dynamic Window Sizing on Congestion
A lost segment indicates congestion
Prudent to reset cwnd = 1 and begin the slow start process
May not be conservative enough: “easy to drive a network into saturation but hard for the net to recover” (Jacobson)
Instead:
– Set slow-start threshold ssthresh = cwnd/2
– Use slow start from cwnd = 1 until cwnd = ssthresh
– For cwnd > ssthresh, increase cwnd only by one per RTT (i.e., by 1/cwnd for each Ack)
This is called congestion avoidance
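A per-RTT trace of the rule above, in the spirit of the slow start / congestion avoidance figures. A sketch under assumptions: each event is one RTT, 'loss' means a timeout, and the floor of 2 segments on ssthresh is added so the threshold never collapses (the function name is hypothetical).

```python
def cwnd_trace(events, cwnd=1, ssthresh=64):
    """cwnd (segments) after each RTT; events are 'ok' or 'loss' (a timeout)."""
    trace = []
    for event in events:
        if event == "loss":
            ssthresh = max(cwnd // 2, 2)     # halve; floor of 2 segments is an assumption
            cwnd = 1                         # restart slow start
        elif cwnd < ssthresh:
            cwnd = min(2 * cwnd, ssthresh)   # slow start: doubles each RTT
        else:
            cwnd += 1                        # congestion avoidance: +1 per RTT
        trace.append(cwnd)
    return trace


print(cwnd_trace(["ok"] * 4 + ["loss"] + ["ok"] * 5, ssthresh=16))
# [2, 4, 8, 16, 1, 2, 4, 8, 9, 10]
```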
Figure 12.10 Slow Start and Congestion Avoidance
Figure 12.11 Illustration of Slow Start and Congestion Avoidance
Fast Retransmit
RTO is generally noticeably longer than the actual RTT
If a segment is lost, TCP may be slow to retransmit
TCP rule: if a segment is received out of order, an ack must be issued immediately for the last in-order segment
Fast Retransmit rule: if 4 acks are received for the same segment (the original ack plus 3 duplicates), it is highly likely that the following segment was lost, so retransmit immediately rather than waiting for a timeout
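Counting duplicate acks is all the fast retransmit rule needs on the sender side. A sketch (hypothetical name; acks are given as AN values in arrival order): the retransmission fires on the third duplicate, i.e., the fourth ack carrying the same AN.

```python
def fast_retransmits(acks, dupthresh=3):
    """Return the AN values for which a fast retransmit fires.

    The segment starting at AN is presumed lost when the same AN arrives
    dupthresh more times after the first ack (4 acks in total for dupthresh = 3).
    """
    fired = []
    last_an, dups = None, 0
    for an in acks:
        if an == last_an:
            dups += 1
            if dups == dupthresh:
                fired.append(an)   # retransmit now, without waiting for the RTO
        else:
            last_an, dups = an, 0
    return fired


print(fast_retransmits([1000, 2000, 2000, 2000, 2000, 2000, 6000]))  # [2000]
```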
Figure 12.12 Fast Retransmit
Fast Recovery
When TCP retransmits a segment using Fast Retransmit, a segment was assumed lost
Congestion avoidance measures are appropriate at this point
– E.g., the slow-start/congestion avoidance procedure
This may be unnecessarily conservative, since multiple acks indicate that segments are getting through (and out of the Net)
Fast Recovery: retransmit the lost segment, cut cwnd in half, proceed with linear increase of cwnd
This avoids the initial exponential slow start
Figure 12.13 Fast Recovery Example
(Figure legend: small ticks = dup acks; acks line; window line)
Upon 3rd dup ack:
a. Set ssthresh = cwnd/2
b. Retransmit segment
c. Set cwnd = ssthresh + 3
Upon each additional dup ack:
a. cwnd++
b. If pending < cwnd, send another segment
Upon new ack: set cwnd = ssthresh.
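The fast recovery steps of Figure 12.13 as a small sender-side state machine. A sketch, not a full TCP: cwnd and ssthresh count segments, "pending" is the number of outstanding segments supplied by the caller, normal window growth on new acks is omitted, and the floor of 2 segments on ssthresh is borrowed from RFC 5681 (which halves FlightSize rather than cwnd). The class name is hypothetical.

```python
class RenoFastRecovery:
    """Sender-side fast retransmit / fast recovery steps of Figure 12.13."""

    def __init__(self, cwnd: int, dupthresh: int = 3):
        self.cwnd = cwnd
        self.ssthresh = None
        self.dupacks = 0
        self.dupthresh = dupthresh
        self.in_recovery = False

    def on_dup_ack(self, pending: int) -> list:
        """Handle one duplicate ACK; pending = segments currently outstanding."""
        self.dupacks += 1
        if not self.in_recovery:
            if self.dupacks == self.dupthresh:
                self.ssthresh = max(self.cwnd // 2, 2)      # a. ssthresh = cwnd/2
                self.cwnd = self.ssthresh + self.dupthresh  # c. cwnd = ssthresh + 3
                self.in_recovery = True
                return ["retransmit"]                       # b. retransmit segment
            return []
        self.cwnd += 1                                      # each further dup ack: cwnd++
        return ["send_new"] if pending < self.cwnd else []

    def on_new_ack(self) -> None:
        """A new (non-duplicate) ACK ends recovery; cwnd deflates to ssthresh."""
        if self.in_recovery:
            self.cwnd = self.ssthresh
            self.in_recovery = False
        self.dupacks = 0  # normal window growth is omitted in this sketch


s = RenoFastRecovery(cwnd=10)
print([s.on_dup_ack(pending=10) for _ in range(3)])  # [[], [], ['retransmit']]
print(s.ssthresh, s.cwnd)                            # 5 8
```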
Limited Transmit
If the congestion window at the sender is small, fast retransmit may not get triggered, e.g., cwnd = 3
1. Under what circumstances does the sender have a small congestion window?
– Limited amount of data to send
– Small limit on receive window (credit)
– Small bandwidth × delay (e.g., very low delay)
2. Is the problem common?
– Yes, e.g., about 56% of retransmissions are due to RTO expiry; only 44% are by fast retransmit
3. If the problem is common, why not reduce the number of duplicate acks needed to trigger retransmit?
– Packet reordering is not all that rare
Limited Transmit Algorithm (RFC 3042)
Sender can transmit a new segment when 3 conditions are met:
1. One of the first two consecutive duplicate acks is received
2. Destination advertised window allows transmission of segment
3. Amount of outstanding data after sending is less than or equal to cwnd + 2
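The three conditions as one predicate. A sketch under assumptions: all quantities are in segments, and condition 1 is applied on the first and second duplicate acks as RFC 3042 describes; the function name is hypothetical.

```python
def limited_transmit_ok(dup_acks, rwnd_allows_segment, outstanding_after_send, cwnd):
    """May one new segment be sent on this duplicate ACK (RFC 3042 style check)?"""
    return (
        dup_acks in (1, 2)                        # first or second duplicate ack
        and rwnd_allows_segment                   # receiver's advertised window permits it
        and outstanding_after_send <= cwnd + 2    # at most cwnd + 2 segments outstanding
    )


# cwnd = 3, segment 1 lost: segments 2 and 3 produce two dup acks.
print(limited_transmit_ok(1, True, 4, 3))  # True: send segment 4
print(limited_transmit_ok(2, True, 5, 3))  # True: send segment 5
```

In the cwnd = 3 case from the previous slide, the two extra segments can generate the third duplicate ack, so fast retransmit fires instead of waiting for the RTO.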
Mice vs. Elephants
80% of the traffic is due to a small number of large, steady flows {elephants}. The remaining traffic volume is due to many short-lived, bursty flows {mice}.
With TCP congestion control mechanisms, these short flows receive less than their fair share when they compete for the bottleneck bandwidth.
Elephants fill up router queues; when mice packets arrive (in bursts), they are dropped!
Another view: Buffer Size
Small buffers:
– small delays
– but packets often lost (dropped, due to bursts)
Large buffers:
– reduce the number of packet drops (due to bursts)
– but increase delays: filled up by `elephants`
– So, they may not reduce drops that much!
Can we have the best of both worlds? Can elephants and mice prosper in peace?
Proactive Packet Discard
Congestion management by proactive packet discard
– Before the buffer is full
– Used on a single FIFO queue
– Or on multiple queues for elastic traffic (but not here…)
– Random Early Detection (RED) [Floy97]
Random Early Discard (RED)
Basic premise:
– router should signal congestion when the queue starts building up (to slow the elephants)
RED: signal congestion by dropping a packet
– give flows time to reduce their sending rates before dropping more packets
– don’t penalize burst traffic (mice)!
Therefore, packet drops should be:
– early: don’t wait for the queue to overflow
– random: better to slow an elephant than to kill a mouse
Random Early Discard (RED): Motivation
If a burst fills the buffers, packets get discarded… (if TCP) the sender reduces its window, probably to slow start
– Lost packets need to be resent: adds to load and delay
– Global synchronization: a traffic burst fills queues, so packets are lost; many TCP connections enter slow start; traffic drops, so the network is underutilized; connections may leave slow start at the same time, causing a new burst
Bigger buffers? High cost, limited returns, more delays
Idea: try to anticipate the onset of congestion and tell one connection (at a time) to slow down
– Classical RED (in [Stallings]): notify by dropping packets
– Or: ECN (Explicit Congestion Notification)
RED Design Goals
Congestion avoidance
Global synchronization avoidance
– Current systems inform (all) connections to back off implicitly by dropping packets
Avoidance of bias against bursty traffic
– Bursts fill the tail of the queue, so `drop-tail` is bad
Bound on average queue length
– Hence control of average delay
RED: Random Early Detect/Discard
FIFO scheduling
Buffer management:
– Probabilistically discard packets
– Probability is computed as a function of the average queue length (why average?)
[Figure: discard probability (0 to 1) vs. average queue length, with thresholds min_th and max_th]
RED
[Figure: arriving packet and queue, with thresholds THmin and THmax]
THmin: average queue length threshold for triggering probabilistic drops/marks.
THmax: average queue length threshold for triggering forced drops.
RED (cont’d)
THmin – minimum threshold
THmax – maximum threshold
avg – exponential average queue length
q – sample queue length
– avg = (1-w)*avg + w*q
– Forget factor w: 0 < w < 1
[Figure: discard probability (0 to 1) vs. average queue length, with thresholds THmin and THmax]
RED (cont’d)
THmin – minimum threshold
THmax – maximum threshold
avg[k] – average queue length (at period k)
q[k] – sample queue length (at period k)
– avg[k] = (1-w)*avg[k-1] + w*q[k]
– Forget factor w: 0 < w < 1
[Figure: discard probability (0 to 1) vs. average queue length, with thresholds THmin and THmax]
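The averaging step is why RED reacts to sustained queues rather than to bursts. A sketch of avg[k] = (1-w)*avg[k-1] + w*q[k] (the function name and the weight w = 0.002 are illustrative choices):

```python
def red_average(samples, w, avg=0.0):
    """avg[k] = (1 - w) * avg[k-1] + w * q[k] for each queue-length sample q[k]."""
    history = []
    for q in samples:
        avg = (1 - w) * avg + w * q
        history.append(avg)
    return history


print(red_average([4, 4, 0], w=0.5))  # [2.0, 3.0, 1.5]

# A short burst barely moves the average when w is small.
burst = [0] * 5 + [50] * 3 + [0] * 5
print(max(red_average(burst, w=0.002)))  # stays far below the instantaneous peak of 50
```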
Average vs Instantaneous Queue Length
RED (cont’d)
If (avg < THmin) enqueue packet
If (avg > THmax) drop packet
Else discard with probability Pa
[Figure: discard probability Pa (0 to 1) vs. average queue length, with thresholds THmin and THmax]
RED Algorithm – Overview
Calculate average queue size avg
if avg < THmin
    queue packet
else if THmin ≤ avg < THmax
    calculate probability Pa
    with probability Pa
        discard packet
    else with probability 1 − Pa
        queue packet
else if avg ≥ THmax
    discard packet
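A runnable version of the overview's decision step. A sketch: the caller supplies avg and Pa, the random generator is injectable so runs can be repeated, and the function name is hypothetical.

```python
import random


def red_decide(avg, th_min, th_max, pa, rng=random):
    """One RED enqueue decision; pa is the discard probability for the middle region."""
    if avg < th_min:
        return "enqueue"
    if avg >= th_max:
        return "discard"
    # THmin <= avg < THmax: discard with probability pa, otherwise queue
    return "discard" if rng.random() < pa else "enqueue"


rng = random.Random(7)  # seeded only to make runs repeatable
decisions = [red_decide(10.0, 5.0, 15.0, 0.2, rng) for _ in range(1000)]
print(decisions.count("discard"))  # roughly 200 of 1000
```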
RED Buffer
RED: discard probability (simplified)
Pa = max_P*(avg – THmin)/(THmax – THmin)
[Figure: discard probability vs. average queue length; Pa rises linearly from 0 at THmin to max_P at THmax, and is 1 above THmax]
RED: discard probability (real)
Pb = max_P*(avg – THmin)/(THmax – THmin)
count: # of times we rolled `don’t discard` (since the last discard)
Pa = Pb/(1 – count*Pb)
[Figure: discard probability vs. average queue length, with example values max_P = 0.35, Pb = 0.2, and Pa = 0.5 for count = 3]
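Pb and its count correction as two small functions (hypothetical names). Once count·Pb reaches 1 the sketch simply discards. With Pb = 0.2 and count = 3 it reproduces the example value Pa = 0.5.

```python
def red_pb(avg, th_min, th_max, max_p):
    """Linear ramp: 0 at THmin, rising to max_P at THmax (used only between the thresholds)."""
    return max_p * (avg - th_min) / (th_max - th_min)


def red_pa(pb, count):
    """Pa = Pb / (1 - count*Pb), where count packets have been accepted since the last discard."""
    denom = 1 - count * pb
    if denom <= 0:
        return 1.0          # the correction has reached certainty: discard
    return min(1.0, pb / denom)


print(red_pa(0.2, 3))  # about 0.5
```

With this correction the number of packets between discards is spread roughly uniformly instead of geometrically, which avoids clusters of back-to-back drops.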
RED Algorithm Detail
RED Summary
Average queue length small
– Reduces latency
– Yet leaves reserve to absorb bursts
Does not always work well
– Better if we can avoid dropping packets…
– RED is random: may hit mice, too…
– Painful for short connections, e.g., Web traffic
ECN (Explicit Congestion Notification):
– Same as RED, but mark instead of drop
ECN Routers [RFC3168]
Explicit Congestion Notification (ECN) marks packets to signal congestion.
ECN works only if supported by router, sender and recipient
ECN-compliant TCP senders initiate their congestion avoidance algorithm after receiving marked ACK packets from the TCP receiver.
Packets from non-ECN flows are dropped (RED)
ECN Signalling
Two bits (in the IP header, next to the DS field)
Explicit congestion notification (ECN) bit:
– Set by ECN router (instead of drop in RED)
– If a data packet has the ECN bit set, the receiver echoes it back in the ACK (TCP ECE flag)
ECN capable bit (for backward compatibility):
– Indicates that the sender implements ECN
– To be ignored (and passed) by non-ECN routers
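How an ECN router changes RED's outcome, as a sketch. Assumptions: RED's decision is passed in as 'enqueue', 'early_drop' (the probabilistic discard) or 'forced_drop' (avg ≥ THmax), and forced drops remain real drops; the function names and the dict returned for the ACK are hypothetical.

```python
def router_action(red_decision: str, ecn_capable: bool) -> str:
    """Turn a RED decision into what an ECN router does with the packet."""
    if red_decision == "enqueue":
        return "enqueue"
    if red_decision == "early_drop" and ecn_capable:
        return "mark"     # set the congestion bit and forward, instead of discarding
    return "drop"         # non-ECN flows (and forced drops, by assumption) lose the packet


def ack_flags(data_packet_marked: bool) -> dict:
    """Receiver side: echo a mark back to the sender in the ACK."""
    return {"ece": data_packet_marked}


print(router_action("early_drop", ecn_capable=True))   # mark
print(router_action("early_drop", ecn_capable=False))  # drop
```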
Picture
[Figure: hosts A and B; window W reduced to W/2]
ECN Advantages
No extra delay (to receive the retransmission)
No wait for the ack of the retransmission (before shifting the window to new packets)
– Most critical for short-lived flows (e.g., Web transfers)
No waste of bandwidth (on retransmissions)
Indication of corruption losses?
ECN/RED: (Simplified) Analysis
[KMK], mostly section 7.6.6
Simplifications:
– Single source
– Fixed delays: d = τf + τr + τq
– Rate of source at time t is r(t), buffer is x(t)
[Figure: server and client connected through two routers and a wide-area link of c bps; delays τf, τr, τq; buffer x(t); source rate r(t)]
ECN/RED: (Simplified) Analysis
Dynamic system model:
[Figure: server and client connected through two routers and a wide-area link of c bps; delays τf, τr, τq; buffer x(t); source rate r(t)]
$$\dot x(t) = \begin{cases} r(t-\tau_f) - c, & x(t) > 0 \\ \big(r(t-\tau_f) - c\big)^{+}, & x(t) = 0 \end{cases}$$
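The buffer equation can be explored numerically with a simple Euler step. A sketch under assumptions: rates are in packets per second, rate(t) is any Python function giving r(t), the step dt must be small compared with the delays, and the function name is hypothetical. The max() at each step implements the (·)^+ boundary that keeps an empty queue from going negative.

```python
def simulate_queue(rate, c, tau_f, dt, steps, x0=0.0):
    """Euler integration of the buffer equation; rate(t) is the source rate r(t)."""
    x, t = x0, 0.0
    for _ in range(steps):
        dx = rate(t - tau_f) - c         # r(t - tau_f) - c
        x = max(0.0, x + dx * dt)        # queue never goes negative: (.)^+ at x = 0
        t += dt
    return x


print(simulate_queue(lambda t: 12.0, c=10.0, tau_f=0.05, dt=0.1, steps=10))  # about 2.0
print(simulate_queue(lambda t: 8.0, c=10.0, tau_f=0.05, dt=0.1, steps=10))   # 0.0
```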
Analysis: (more) Simplifications
Fixed delays: d = τf + τr + τq
Rate of source at time t is r(t) = w(t)/d
Sender in congestion avoidance (AIMD):
– Ack without ECN: w(t+) = w(t) + 1/w(t) [AddInc]
– Ack with ECN: w(t+) = w(t)/2 [MultDec]
Ack sent (immediately) for every packet
Simplified mark probability: $\eta\,\big(x(t)-x^*\big)^{+}$
$$\dot w(t) = \frac{r(t-d)}{w(t)}\Big(1-\eta\,\big(x(t-\tau_r-\tau_q)-x^*\big)^{+}\Big) - \frac{w(t)}{2}\,r(t-d)\,\eta\,\big(x(t-\tau_r-\tau_q)-x^*\big)^{+}$$
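The per-ack AIMD rules and the mean change per ack that underlies the w(t) equation, as a sketch (hypothetical names; capping the mark probability at 1 is an added assumption, since the linear model has no cap). Multiplying the mean change per ack by the ack rate r(t − d) gives the rate of change of w.

```python
def mark_probability(x, x_star, eta):
    """Simplified marking: eta * (x - x*)^+, capped at 1 (the cap is an added assumption)."""
    return min(1.0, eta * max(0.0, x - x_star))


def on_ack(w, marked):
    """Per-ACK window update in congestion avoidance."""
    return w / 2 if marked else w + 1 / w   # [MultDec] / [AddInc]


def expected_change(w, p):
    """Mean window change per ACK when each ACK is marked with probability p."""
    return (1 - p) / w - p * w / 2


print(on_ack(4.0, marked=False), on_ack(4.0, marked=True))  # 4.25 2.0
print(expected_change(2.0, 1 / 3))  # about 0: at w = 2 the drift vanishes when p = 1/3
```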
Equilibrium ??
Equilibrium point: r(t) = c, x(t) = x0 s.t.:
$$\dot w(t) = \frac{r(t-d)}{w(t)}\Big(1-\eta\,\big(x(t-\tau_r-\tau_q)-x^*\big)^{+}\Big) - b\,w(t)\,r(t-d)\,\eta\,\big(x(t-\tau_r-\tau_q)-x^*\big)^{+}$$
$$\dot x(t) = \begin{cases} r(t-\tau_f) - c, & x(t) > 0 \\ \big(r(t-\tau_f) - c\big)^{+}, & x(t) = 0 \end{cases}$$
$$x_0 = x^* + \frac{1}{\eta\,\big(1 + b\,c^2 d^2\big)}$$
Sufficient condition:
$$\eta\,(cd)^3 < 4$$
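A numerical check of the equilibrium under the model's assumptions. At equilibrium r = c, so w0 = cd; setting the w(t) equation to zero with constant inputs gives (1 − p0)/w0 = b·w0·p0, i.e., p0 = 1/(1 + b·c²d²), and p0 = η(x0 − x*) then fixes x0. Here b is the multiplicative-decrease weight in the w(t) equation (b = 1/2 for w(t+) = w(t)/2). The numbers and function names are hypothetical, and the sketch does not test the stability condition.

```python
def equilibrium(c, d, eta, x_star, b=0.5):
    """Equilibrium of the fluid model: returns (w0, p0, x0)."""
    w0 = c * d                        # r = w/d = c
    p0 = 1 / (1 + b * w0 ** 2)        # from (1 - p)/w0 = b * w0 * p
    x0 = x_star + p0 / eta            # invert p0 = eta * (x0 - x*)
    return w0, p0, x0


def w_dot(w, r_delayed, p, b=0.5):
    """Right-hand side of the w(t) equation with constant (equilibrium) inputs."""
    return r_delayed * ((1 - p) / w - b * w * p)


# Hypothetical numbers: c = 100 packets/s, d = 0.1 s, eta = 0.01, x* = 20 packets.
w0, p0, x0 = equilibrium(c=100.0, d=0.1, eta=0.01, x_star=20.0)
print(w0, p0, x0)
print(w_dot(w0, 100.0, p0))  # about 0
```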
END