Date post: | 17-Jan-2016 |
Category: |
Documents |
Upload: | cameron-dean |
Advanced Computer Networking
Internet Congestion Control
Principles of Congestion Control
Congestion:
• informally: “too many sources sending too much data too fast for network to handle”
• manifestations:
– lost packets (buffer overflow at routers)
– long delays (queuing in router buffers)
• a highly important problem!
[Figure: hosts H1, H2, H3 and router R1; labels A1(t), A2(t), D(t); link rates 10 Mb/s, 1.5 Mb/s, 100 Mb/s]
behnam shafagaty
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router
• infinite buffers
• no retransmission
Causes/costs of congestion: scenario 1
• Throughput increases with load
• Maximum total load C (each session C/2)
• Large delays when congested
– The load is stochastic
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
Causes/costs of congestion: scenario 2
($\lambda_{in}$: original data rate; $\lambda'_{in}$: original data plus retransmissions; $\lambda_{out}$: goodput)
• always: $\lambda_{in} = \lambda_{out}$ (goodput)
– Like to maximize goodput!
• “perfect” retransmission:
– retransmit only when loss: $\lambda'_{in} > \lambda_{out}$
• Actual retransmission of a delayed (not lost) packet makes $\lambda'_{in}$ larger (than the perfect case) for the same $\lambda_{out}$.
Causes/costs of congestion: scenario 2
“Costs” of congestion:
• more work (retransmissions) for a given “goodput”
• unneeded retransmissions: link carries (and delivers) multiple copies of a packet
[Figure: three plots of $\lambda_{out}$ against $\lambda_{in}$ and $\lambda'_{in}$]
Packet delay and throughput as functions of load
Congestion Control
• Congestion control involves two tasks:
– Detect congestion
– Limit sending rate
TCP & AQM
[Figure: TCP source rates x_i(t) and AQM congestion measures p_l(t)]
TCP: Reno, Vegas
AQM: DropTail, RED, REM, PI, AVQ
Example congestion measure p_l(t)
– Loss (Reno)
– Queuing delay (Vegas)
TCP Congestion Control
• End-to-end control (no network assistance)
• Assumes that long delays (and packet loss) are due to congestion
Congestion Control II
• TCP uses slow start and additive increase/multiplicative decrease (AIMD) to deal with congestion
• Van Jacobson outlined these ideas in 1988
• Slow start, roughly: whenever starting traffic or recovering from congestion, start cwnd at the size of a single segment and increase it (up to a point) as ACKs show up
AIMD (Additive Increase / Multiplicative Decrease)
• CongestionWindow (cwnd) is a variable held by the TCP source for each connection.
• cwnd is set based on the perceived level of congestion. The host receives implicit (packet drop) or explicit (packet mark) indications of internal congestion.
MaxWindow = min(CongestionWindow, AdvertisedWindow)
EffectiveWindow = MaxWindow – (LastByteSent – LastByteAcked)
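The two window formulas can be sketched in Python as follows. This is only an illustration: the function name and the clamp at zero are assumptions added here, not part of the slide.

```python
def effective_window(cwnd, advertised_window, last_byte_sent, last_byte_acked):
    """Bytes the sender may still transmit right now (illustrative helper)."""
    # MaxWindow = min(CongestionWindow, AdvertisedWindow)
    max_window = min(cwnd, advertised_window)
    # Data sent but not yet acknowledged
    in_flight = last_byte_sent - last_byte_acked
    # Assumption: clamp at zero so a shrinking window never yields a negative budget
    return max(0, max_window - in_flight)
```

For example, with cwnd = 8000, an advertised window of 16000 and 3000 bytes in flight, the sender may transmit another 5000 bytes.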
Additive Increase
• Additive increase is a reaction to perceived available capacity.
• Linear increase, basic idea: for each “cwnd’s worth” of packets sent, increase cwnd by 1 packet.
• In practice, cwnd (counted in bytes) is incremented fractionally for each arriving ACK:
increment = MSS × (MSS / cwnd)
cwnd = cwnd + increment
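A minimal sketch of the per-ACK increment, assuming cwnd is counted in bytes so that each ACK adds MSS × (MSS / cwnd) and one full window of ACKs adds roughly one MSS. The MSS value and the function names are assumptions chosen for illustration.

```python
MSS = 1460  # assumed maximum segment size in bytes

def on_ack_additive_increase(cwnd, mss=MSS):
    """Grow cwnd (bytes) by a fraction of one MSS for a single ACK."""
    return cwnd + mss * (mss / cwnd)

def one_rtt_of_acks(cwnd, mss=MSS):
    """Apply one ACK per segment in the current window (no loss assumed)."""
    acks = int(cwnd // mss)
    for _ in range(acks):
        cwnd = on_ack_additive_increase(cwnd, mss)
    return cwnd
```

Starting from a window of 10 segments, one round of ACKs grows cwnd by slightly less than one MSS, because each increment is computed against an already larger window.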
Additive Increase
[Figure: source/destination timeline; add one packet each RTT]
Multiplicative Decrease
The key assumption is that a dropped packet and the resultant timeout are due to congestion at a router or a switch.
Multiplicative decrease: TCP reacts to a timeout by halving cwnd.
cwnd is not allowed to fall below the size of a single packet.
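The halving rule with its one-packet floor can be written as a short sketch. The function name and the default segment size are assumptions for illustration.

```python
def on_timeout_multiplicative_decrease(cwnd, mss=1460):
    """Halve cwnd (bytes) on a timeout, but never go below one segment."""
    return max(cwnd / 2, mss)
```

A window of 8 segments drops to 4 segments after one timeout, while a window that is already one segment wide stays there.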
AIMD: Some Notes
• It has been shown that AIMD is a necessary condition for TCP congestion control to be stable.
• Because the simple congestion control mechanism involves timeouts that cause retransmissions, it is important that hosts have an accurate timeout mechanism.
• Timeouts are set as a function of the average RTT and the standard deviation of RTT.
Typical TCP Congestion Window Evolution
AIMD: Two users, one link
[Figure: rate of user 2 plotted against rate of user 1, with the BW limit and fairness lines]
Slow Start
Linear additive increase takes too long to ramp up a new TCP connection from cold start.
Beginning with TCP Tahoe, the slow start mechanism was added to provide an initial exponential increase in the size of cwnd.
Slow Start
1. The source starts with cwnd = 1.
2. Every time an ACK arrives, cwnd is incremented.
cwnd is effectively doubled per RTT “epoch”.
Two slow start situations:
• At the very beginning of a connection (cold start).
• When the connection goes dead waiting for a timeout to occur (i.e., the advertised window goes to zero!).
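The doubling effect follows directly from the per-ACK rule, as this small sketch shows. It assumes cwnd is counted in packets, one ACK per packet, and no loss. The function name is hypothetical.

```python
def slow_start_rounds(rounds, initial_cwnd=1):
    """Return cwnd (in packets) at the start of each RTT epoch."""
    cwnd = initial_cwnd
    history = [cwnd]
    for _ in range(rounds):
        acks = cwnd  # assumption: every packet of this epoch is ACKed
        for _ in range(acks):
            cwnd += 1  # one extra packet per arriving ACK
        history.append(cwnd)
    return history
```

Four epochs starting from cwnd = 1 give the sequence 1, 2, 4, 8, 16.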
Slow Start
[Figure: source/destination timeline; slow start adds one packet per ACK]
Fast Retransmit
Basic idea: use duplicate ACKs to signal a lost packet.
Fast retransmit: upon receipt of three duplicate ACKs, the TCP sender retransmits the lost packet.
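A minimal sketch of duplicate-ACK counting. It assumes that ACK n acknowledges packet n, so a run of duplicate ACKs for n points at packet n + 1 as the lost one. The function name and the list-based interface are illustrative only.

```python
def fast_retransmit_events(acks, threshold=3):
    """Return the ACK numbers whose third duplicate triggers a retransmission."""
    triggers = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == threshold:
                # Sender would now retransmit the packet following `ack`
                triggers.append(ack)
        else:
            last_ack = ack
            dup_count = 0
    return triggers
```

With the ACK sequence 1, 2, 2, 2, 2, 6 the third duplicate of ACK 2 triggers the retransmission of packet 3.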
Fast Retransmit
• Generally, fast retransmit eliminates about half of the timeouts.
• This yields roughly a 20% improvement in throughput.
• Note: fast retransmit does not eliminate all timeouts, because a small window at the source may not produce enough duplicate ACKs.
Fast Retransmit
[Figure: sender/receiver timeline. Packets 1 to 6 are sent; the receiver returns ACK 1, ACK 2 and then duplicate ACK 2s; the sender retransmits packet 3 and receives ACK 6. Fast retransmit is based on three duplicate ACKs.]
TCP Congestion Window Trace
[Figure: congestion window versus time, showing the threshold, the congestion window, timeouts, the slow start period, additive increase and fast retransmission]
Fast Recovery
• Fast recovery was added with TCP Reno.
• In congestion avoidance mode, if duplicate ACKs are received, reduce cwnd to half.
• If n successive duplicate ACKs are received, we know that the receiver got n segments after the lost segment: advance cwnd by that number.
Adaptive Retransmissions
RTT: round-trip time between a pair of hosts on the Internet.
• How to set the TimeOut value?
– The timeout value is set as a function of the expected RTT.
– Consequences of a bad choice?
Original Algorithm
• Keep a running average of RTT and compute TimeOut as a function of this RTT.
– Send packet and keep timestamp ts.
– When ACK arrives, record timestamp ta.
SampleRTT = ta – ts
Original Algorithm
Compute a weighted average:
EstimatedRTT = α × EstimatedRTT + (1 – α) × SampleRTT
Original TCP spec: α in range (0.8, 0.9)
TimeOut = 2 × EstimatedRTT
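The weighted average and the timeout rule translate into a few lines of Python. The default α = 0.85 is an assumption picked from the stated range, and the function names are illustrative.

```python
def update_estimated_rtt(estimated_rtt, sample_rtt, alpha=0.85):
    """Exponentially weighted moving average of the RTT samples."""
    return alpha * estimated_rtt + (1 - alpha) * sample_rtt

def original_timeout(estimated_rtt):
    """Original rule: the timeout is twice the estimated RTT."""
    return 2 * estimated_rtt
```

With an estimate of 100 ms, a new sample of 200 ms and α = 0.9, the estimate moves only to about 110 ms, which shows how strongly the history is weighted.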
Karn/Partridge Algorithm
An obvious flaw in the original algorithm:
Whenever there is a retransmission it is impossible to know whether to associate the ACK with the original packet or the retransmitted packet.
Associating the ACK?
[Figure: two sender/receiver timelines, (a) and (b), each showing an original transmission, a retransmission and an ACK]
Karn/Partridge Algorithm
1. Do not measure SampleRTT when sending a packet more than once.
2. For each retransmission, set TimeOut to double the last TimeOut. (Note: this is a form of exponential backoff based on the belief that the lost packet is due to congestion.)
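Both rules can be combined with the original estimator in a small sketch. The class name, the α default and the choice to recompute TimeOut as 2 × EstimatedRTT after a clean sample are assumptions made for illustration.

```python
class KarnPartridgeTimer:
    """Illustrative RTT estimator applying the two Karn/Partridge rules."""

    def __init__(self, initial_rtt, alpha=0.85):
        self.estimated_rtt = initial_rtt
        self.alpha = alpha
        self.timeout = 2 * initial_rtt

    def on_ack(self, sample_rtt, was_retransmitted):
        if was_retransmitted:
            # Rule 1: the sample is ambiguous, so it is not measured
            return
        self.estimated_rtt = (self.alpha * self.estimated_rtt
                              + (1 - self.alpha) * sample_rtt)
        self.timeout = 2 * self.estimated_rtt

    def on_retransmit(self):
        # Rule 2: exponential backoff, double the last timeout
        self.timeout *= 2
```

A retransmission doubles the timeout, an ACK for a retransmitted packet leaves the estimate untouched, and the next clean sample resets the timeout from the estimate.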
Jacobson/Karels Algorithm
The problem with the original algorithm is that it did not take into account the variance of SampleRTT.
Difference = SampleRTT – EstimatedRTT
EstimatedRTT = EstimatedRTT + (δ × Difference)
Deviation = Deviation + δ × (|Difference| – Deviation)
where δ is a fraction between 0 and 1.
Jacobson/Karels Algorithm
TCP computes the timeout using both the mean and variance of RTT:
TimeOut = µ × EstimatedRTT + Φ × Deviation
where, based on experience, µ = 1 and Φ = 4.
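A sketch of the Jacobson/Karels computation, assuming the deviation is smoothed in the same way as the mean, that is Deviation = Deviation + δ × (|Difference| – Deviation). The value δ = 0.125 is an assumption (the slides only say it lies between 0 and 1), and the function names are illustrative.

```python
def jacobson_karels_update(estimated_rtt, deviation, sample_rtt, delta=0.125):
    """Update the smoothed RTT and the smoothed deviation with one sample."""
    difference = sample_rtt - estimated_rtt
    estimated_rtt = estimated_rtt + delta * difference
    deviation = deviation + delta * (abs(difference) - deviation)
    return estimated_rtt, deviation

def jacobson_karels_timeout(estimated_rtt, deviation, mu=1, phi=4):
    """TimeOut = mu * EstimatedRTT + phi * Deviation."""
    return mu * estimated_rtt + phi * deviation
```

For an estimate of 100 ms, a deviation of 10 ms and a sample of 140 ms, the update yields 105 ms and 13.75 ms, and the timeout becomes 160 ms.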
Algorithms
Early TCP
• Pre-1988
• Go-back-N ARQ
– Detects loss from timeout
– Retransmits from lost packet onward
• Receiver window flow control
– Prevent overflows at receive buffer
• Flow control: self-clocking
Why Flow Control?
• October 1986, Internet had its first congestion collapse
• Link LBL to UC Berkeley
– 400 yards, 3 hops, 32 Kbps
– throughput dropped to 40 bps
– factor of ~1000 drop!
• 1988, Van Jacobson proposed TCP flow control
Effect of Congestion
• Packet loss
• Retransmission
• Reduced throughput
• Congestion collapse due to
– Unnecessarily retransmitted packets
– Undelivered or unusable packets
• Congestion may continue after the overload!
[Figure: throughput versus load]
Window Flow Control
• ~ W packets per RTT
• Lost packet detected by missing ACK
[Figure: source and destination timelines; in each RTT the source sends data packets 1, 2, …, W and receives the corresponding ACKs]
Window flow control
• Limit the number of packets in the network to window W
• Source rate = $\frac{W \times MSS}{RTT}$ bps
• If W too small then rate « capacity; if W too big then rate > capacity => congestion
• Adapt W to network (and conditions): W = BW × RTT
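The relation between window, RTT and rate can be checked numerically with a short sketch. It assumes MSS is given in bytes, hence the explicit factor of 8 to obtain bits per second. The function names are illustrative.

```python
def source_rate_bps(window_packets, mss_bytes, rtt_seconds):
    """Rate achieved by sending W packets of MSS bytes every RTT."""
    return window_packets * mss_bytes * 8 / rtt_seconds

def ideal_window_packets(bandwidth_bps, rtt_seconds, mss_bytes):
    """Window that just fills the pipe: W = BW x RTT, expressed in packets."""
    return bandwidth_bps * rtt_seconds / (8 * mss_bytes)
```

With MSS = 1250 bytes and RTT = 100 ms, a window of 10 packets yields about 1 Mb/s, and conversely a 1 Mb/s path needs a window of about 10 packets.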
TCP Window Flow Controls
• Receiver flow control
– Avoid overloading receiver
– Set by receiver
– awnd: receiver (advertised) window
• Network flow control
– Avoid overloading network
– Set by sender
– Infer available network capacity
– cwnd: congestion window
• Set W = min(cwnd, awnd)
Receiver Flow Control
• Receiver advertises awnd with each ACK
• Window awnd
– closed when data is received and ack’d
– opened when data is read
• Size of awnd can be the performance limit (e.g. on a LAN)
– sensible default ~16 kB
Network Flow Control
• Source calculates cwnd from indication of network congestion
• Congestion indications
– Losses
– Delay
– Marks
• Algorithms to calculate cwnd
– Tahoe, Reno, Vegas, RED, REM …
TCP Congestion Controls
• Tahoe (Jacobson 1988)
– Slow Start
– Congestion Avoidance
– Fast Retransmit
• Reno (Jacobson 1990)
– Fast Recovery
• Vegas (Brakmo & Peterson 1994)
– New Congestion Avoidance
• RED (Floyd & Jacobson 1993)
– Probabilistic marking
• REM (Athuraliya & Low 2000)
– Clear buffer, match rate
Variants
• Tahoe & Reno
– NewReno
– SACK
– Rate-halving
– Modifications for high performance
• AQM
– RED, ARED, FRED, SRED
– BLUE, SFB
– REM, PI, AVQ
TCP Tahoe (Jacobson 1988)
[Figure: window versus time, with SS and CA phases]
SS: Slow Start
CA: Congestion Avoidance
Slow Start
[Figure: sender/receiver timeline of data packets and ACKs, with cwnd growing from 1 to 8 over successive RTTs]
cwnd ← cwnd + 1 (for each ACK)
Congestion Avoidance
[Figure: sender/receiver timeline of data packets and ACKs, with cwnd growing 1, 2, 3, 4, one step per RTT]
cwnd ← cwnd + 1 (for each cwnd ACKs)
Packet Loss
• Assumption: loss indicates congestion
• Packet loss detected by
– Retransmission TimeOuts (RTO timer)
– Duplicate ACKs (at least 3)
[Figure: packets 1 to 7 and acknowledgements 1, 2, 3 followed by duplicate ACKs for 3]
Fast Retransmit
• Waiting for a timeout takes quite long
• Immediately retransmits after 3 dupACKs without waiting for timeout
• Adjusts ssthresh
flightsize = min(awnd, cwnd)
ssthresh ← max(flightsize/2, 2)
• Enter Slow Start (cwnd = 1)
Summary: Tahoe
• Basic ideas
– Gently probe network for spare capacity
– Drastically reduce rate on congestion
– Windowing: self-clocking
– Other functions: round trip time estimation, error recovery
for every ACK {
    if (W < ssthresh) then W++    (SS)
    else W += 1/W                 (CA)
}
for every loss {
    ssthresh = W/2
    W = 1
}
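The Tahoe pseudocode maps almost one-to-one onto Python. The sketch below keeps W in packets and uses hypothetical function names; it covers only the window arithmetic, not retransmission or timers.

```python
def tahoe_on_ack(w, ssthresh):
    """Window update for one ACK."""
    if w < ssthresh:
        return w + 1        # slow start: one packet per ACK
    return w + 1.0 / w      # congestion avoidance: one packet per window

def tahoe_on_loss(w):
    """On any loss, remember half the window and restart from 1."""
    ssthresh = w / 2
    return 1, ssthresh
```

A loss at W = 16 sets ssthresh to 8 and W to 1; the window then grows by one per ACK until it reaches 8 and by 1/W per ACK afterwards.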
Fast recovery
• Motivation: prevent the “pipe” from emptying after fast retransmit
• Idea: each dupACK represents a packet having left the pipe (successfully received)
• Enter FR/FR after 3 dupACKs
– Set ssthresh ← max(flightsize/2, 2)
– Retransmit lost packet
– Set cwnd ← ssthresh + ndup (window inflation)
– Wait till W = min(awnd, cwnd) is large enough; transmit new packet(s)
– On non-dup ACK (1 RTT later), set cwnd ← ssthresh (window deflation)
• Enter CA
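The FR/FR steps can be sketched as a small state machine. The class and method names are hypothetical, windows are counted in packets, and only the window bookkeeping is modeled (no actual packet transmission).

```python
class RenoSender:
    """Illustrative window bookkeeping for fast retransmit / fast recovery."""

    def __init__(self, cwnd, awnd):
        self.cwnd = cwnd
        self.awnd = awnd
        self.ssthresh = None
        self.dup_acks = 0
        self.in_fast_recovery = False

    def on_dup_ack(self):
        self.dup_acks += 1
        if not self.in_fast_recovery and self.dup_acks == 3:
            flightsize = min(self.awnd, self.cwnd)
            self.ssthresh = max(flightsize / 2, 2)
            self.in_fast_recovery = True
            self.cwnd = self.ssthresh + self.dup_acks  # window inflation
            return "retransmit"
        if self.in_fast_recovery:
            # Each further dupACK means one more packet has left the pipe
            self.cwnd = self.ssthresh + self.dup_acks
        return None

    def on_new_ack(self):
        if self.in_fast_recovery:
            self.cwnd = self.ssthresh  # window deflation, then enter CA
            self.in_fast_recovery = False
        self.dup_acks = 0
```

Starting with cwnd = 8, the third dupACK sets ssthresh to 4 and inflates cwnd to 7; a fourth dupACK raises it to 8, and the first non-duplicate ACK deflates it to 4.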
Example: FR/FR
• Fast retransmit
– Retransmit on 3 dupACKs
• Fast recovery
– Inflate window while repairing loss to fill pipe
[Figure: sender (S) and receiver (R) timelines for packets 1 to 11, annotated with cwnd and ssthresh values, ending at “Exit FR/FR”]
Summary: Reno
• Basic ideas
– Fast recovery avoids slow start
– dupACKs: fast retransmit + fast recovery
– Timeout: retransmit + slow start
[Figure: state diagram; dupACKs lead from congestion avoidance to FR/FR, a timeout leads to retransmit and slow start]
NewReno: Motivation
• On 3 dupACKs, receiver has packets 2, 4, 6, 8, cwnd = 8, retransmits pkt 1, enter FR/FR
• Next dupACK increments cwnd to 9
• After an RTT, ACK arrives for pkts 1 & 2, exit FR/FR, cwnd = 5, 8 unack’ed pkts
• No more ACKs, sender must wait for timeout
[Figure: sender (S) and destination (D) timelines for packets 1 to 9 through FR/FR, ending with 8 unack’d packets and a timeout]
NewReno: Fall & Floyd ’96 (RFC 2582)
• Motivation: multiple losses within a window
– Partial ACK acknowledges some but not all packets outstanding at start of FR
– Partial ACK takes Reno out of FR, deflates window
– Sender may have to wait for timeout before proceeding
• Idea: partial ACK indicates lost packets
– Stays in FR/FR and retransmits immediately
– Retransmits 1 lost packet per RTT until all lost packets from that window are retransmitted
– Eliminates timeout
SACK: Mathis, Mahdavi, Floyd, Romanow ’96 (RFC 2018, RFC 2883)
• Motivation: Reno & NewReno retransmit at most 1 lost packet per RTT
– Pipe can be emptied during FR/FR with multiple losses
• Idea: SACK provides better estimate of packets in pipe
– SACK TCP option describes received packets
– On 3 dupACKs: retransmits, halves window, enters FR
– Updates pipe = packets in pipe
  • Increment when lost or new packets sent
  • Decrement when dupACK received
– Transmits a (lost or new) packet when pipe < cwnd
– Exit FR when all packets outstanding when FR was entered are acknowledged
TCP Vegas (Brakmo & Peterson 1994)
• Reno with a new congestion avoidance algorithm
• Converges (provided buffer is large)!
[Figure: window versus time, with an SS phase followed by CA]
Congestion avoidance
• Each source estimates number of its own packets in pipe from RTT
• Adjusts window to maintain estimate between αd and βd
for every RTT {
    if W/RTTmin – W/RTT < α then W++
    if W/RTTmin – W/RTT > β then W--
}
for every loss
    W := W/2
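A minimal sketch of the per-RTT Vegas window adjustment. The two thresholds are named alpha and beta here, following common descriptions of Vegas; the names, their units (packets per second) and the function signature are assumptions rather than something the slides state explicitly.

```python
def vegas_update(w, rtt, rtt_min, alpha, beta):
    """Adjust the window once per RTT from the expected/actual rate gap."""
    # Expected rate (no queueing) minus actual rate, in packets per second
    diff = w / rtt_min - w / rtt
    if diff < alpha:
        return w + 1   # too few packets queued: probe for more bandwidth
    if diff > beta:
        return w - 1   # too many packets queued: back off
    return w           # estimate is within the target band
```

With W = 10 and RTTmin = 0.1 s, an RTT equal to RTTmin grows the window, an RTT of 0.2 s shrinks it, and an RTT of 0.125 s leaves it unchanged for thresholds of 10 and 30 packets per second.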
Implications
• Congestion measure = end-to-end queueing delay
• At equilibrium
– Zero loss
– Stable window at full utilization
– Approximately weighted proportional fairness
– Nonzero queue, larger for more sources
• Convergence to equilibrium
– Converges if sufficient network buffer
– Oscillates like Reno otherwise
Wireless TCP
• Reno uses loss as congestion measure
• In wireless, significant losses due to
– Fading
– Interference
– Handover
– Not buffer overflow (congestion)
• Halving window too drastic
– Small throughput, low utilization
Proposed solutions
• Ideas
– Hide from source noncongestion losses
– Inform source of noncongestion losses
• Approaches
– Link layer error control
– Split TCP
– Snoop agent
– SACK+ELN (Explicit Loss Notification)
Third approach
• Problem
– Reno uses loss as congestion measure
– Two types of losses
  • Congestion loss: retransmit + reduce window
  • Noncongestion loss: retransmit
– Previous approaches
  • Hide noncongestion losses
  • Indicate noncongestion losses
– Our approach
  • Eliminates congestion losses (buffer overflows)
Third approach
• Idea
– REM clears buffer
– Only noncongestion losses
– Retransmits lost packets without reducing window
• Router: REM capable
• Host: do not use loss as congestion measure
• Vegas, REM
Performance
• Goodput