Post on 11-Jan-2016
Shivkumar Kalyanaraman, Rensselaer Polytechnic Institute
TCP (Part II)
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
shivkuma@ecse.rpi.edu
http://www.ecse.rpi.edu/Homepages/shivkuma

Overview
- TCP interactive data flow
- TCP bulk data flow
- TCP congestion control
- TCP timers
- TCP futures and performance
Ref: Chap 19-24; RFC 793, 1323, 2001; papers by Jacobson, Karn/Partridge

Reliability models
- Reliability fundamentally requires redundancy to recover from uncertain loss or other failure modes.
- Two types of redundancy:
  - Spatial redundancy: independent backup copies
    - Forward error correction (FEC) codes
    - Problem: requires huge overhead, since the FEC is also part of the packet(s); it cannot recover from erasure of all packets
  - Temporal redundancy: retransmit if packets are lost or in error
    - Requires trading off response time for reliability
    - Design of status reports and retransmission optimization (see next slide) is important

Temporal Redundancy model
[Figure: the temporal redundancy model. Labels: packets (sequence numbers, CRC or checksum); detect bit/packet errors or losses; status reports (ACKs, NAKs, SACKs or bitmaps); timeout; retransmissions]

Status report design
- Cumulative acks:
  - Robust to losses on the reverse channel
  - Can work with go-back-N retransmission
  - Cannot pinpoint blocks of data which are lost
  - The first lost packet can be pinpointed because the receiver would generate duplicate acks
- Selective acks:
  - For a byte-stream model like TCP, need to specify ranges of bytes received (requires large overhead)
  - SACK is a TCP option over and above the cumulative acks
- Bitmaps are not efficient because a bit is needed for every byte
- NAKs have the same problems as SACKs and bitmaps, and are also not robust to reverse channel losses
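
To make the difference between cumulative and selective acks concrete, here is a minimal sketch that derives both reports from the byte ranges a receiver holds. The function name `ack_report` and the half-open `(start, end)` range representation are assumptions made for illustration, not part of TCP itself.

```python
def ack_report(received, first_byte=0):
    """Return (cumulative_ack, sack_blocks) for a list of received byte ranges.

    `received` holds half-open (start, end) byte ranges in any order;
    `first_byte` is the first byte the receiver expects. Both are
    illustrative names.
    """
    cum_ack = first_byte
    blocks = []
    for start, end in sorted(received):
        if start <= cum_ack:
            # Contiguous with the in-order data: the cumulative ack advances.
            cum_ack = max(cum_ack, end)
        elif blocks and start <= blocks[-1][1]:
            # Touches or overlaps the previous out-of-order block: merge.
            blocks[-1] = (blocks[-1][0], max(blocks[-1][1], end))
        else:
            # A new island of data beyond a hole.
            blocks.append((start, end))
    return cum_ack, blocks
```

With bytes 0-499 and 1000-1999 received, the cumulative ack stays at 500 (so every later arrival produces a duplicate "ack 500"), while the selective report pinpoints the block 1000-2000 as already delivered.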

Retransmission optimization
- Default retransmission:
  - Go-back-N, i.e. retransmit the entire window
  - Triggered by timeout or persistent loss in TCP
  - Not efficient if windows are large (high-speed networks)
- Selective retransmission:
  - Retransmit one packet based upon duplicate acks
  - Recovers quickly from isolated loss, but not from burst loss
- SACK allows pinpointing retransmissions to cover just the ranges of lost packets
  - Such retransmitted packets must finally be confirmed by cumulative acks, since SACK is only an option and not reliable

TCP Interactive Data Flow
- Problems:
  - Overhead: 40 bytes header + 1 byte data
  - To batch or not to batch: response time is important
- Batching acks:
  - Delayed-ack timer: piggyback the ack on the echo
  - 200 ms timer (fig 19.3)
- Batching data:
  - Nagle’s algorithm: don’t send a small packet until the ack for outstanding data is received
  - Developed because of congestion in WANs
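
The batching rule for data can be sketched as a single send decision. This is a simplified reading of Nagle's algorithm; the function and parameter names (`nagle_can_send`, `queued_bytes`, `unacked_bytes`, `nodelay`) are hypothetical.

```python
def nagle_can_send(queued_bytes, mss, unacked_bytes, nodelay=False):
    """Decide whether queued data may be sent now (simplified Nagle rule)."""
    if nodelay:
        # Nagle disabled: anything queued may go out immediately.
        return queued_bytes > 0
    if queued_bytes >= mss:
        # A full-sized segment is always worth sending.
        return True
    # A small segment goes out only when no earlier data awaits an ack.
    return queued_bytes > 0 and unacked_bytes == 0
```

On a fast LAN the ack returns quickly, so single keystrokes still go out one by one; on a slow WAN several keystrokes accumulate while the previous byte is unacked and travel in one segment.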

TCP Bulk Data Flow
- Sliding window:
  - Send multiple packets while waiting for acks (fig 20.1), up to a limit (W)
  - Receiver need not ack every packet
  - Acks are cumulative: Ack # = largest consecutive sequence number received + 1
  - Two transfers of the same data can have different dynamics (e.g. fig 20.1 vs fig 20.2)
- Receiver window field:
  - Reduced if the TCP receiver is short on buffers

TCP Bulk Data Flow (Contd)
- End-to-end flow control
- Window update acks: receiver ready
- Default buffer sizes: 4096 to 16384 bytes
- Ideal: window and receiver buffer = bandwidth-delay product
- TCP window terminology: figs 20.4, 20.5, 20.6
  - Right edge, left edge, usable window
  - “closes” => left edge (snd_una) advances
  - “opens” => right edge advances (receiver buffer freed => receiver window increases)
  - “shrinks” => right edge moves to left (rare)
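
A short sketch of the two quantities on this slide: the bandwidth-delay product that the window should ideally match, and the usable window between the edges. `snd_una` is the name used above; `snd_nxt` (next byte to be sent) and the function names are assumptions for illustration.

```python
def bandwidth_delay_product(bandwidth_bps, rtt_s):
    """Bytes that must be in flight to keep the pipe full."""
    return bandwidth_bps * rtt_s / 8


def usable_window(snd_una, snd_nxt, advertised_window):
    """Bytes the sender may still transmit before hitting the right edge."""
    right_edge = snd_una + advertised_window  # left edge + receiver window
    return max(0, right_edge - snd_nxt)
```

For example, a 1.544 Mb/s path with a 60 ms round trip needs about 11.6 KB in flight, which fits within the default buffer sizes listed above; a faster or longer path quickly does not.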

The Congestion Problem
- Problem: demand outstrips available capacity
- Q: Will the “congestion” problem be solved when:
  - a) Memory becomes cheap (infinite memory)?
    [Figure labels: “No buffer”, “Too late”]
  - b) Links become cheap (high speed links)?
    [Figure: a chain of nodes labeled S. Labels: “All links 19.2 kb/s”, “Replace with 1 Mb/s”, “File transfer time = 7 hours”, “File transfer time = 5 mins”]

- c) Processors become cheap (fast routers and switches)?
  [Figure: hosts A, B, C and D joined through nodes labeled S. Scenario: all links 1 Gb/s; A and B send to C.]
- Ans: None of the above solves congestion!
- Congestion: Demand > Capacity
- It is a dynamic problem => static solutions are not sufficient
- TCP provides a dynamic solution

- If information about each source's demand, the total demand, and the capacity is known in a central location where control of the demands can be effected with zero time delays, the congestion problem is solved.
- Problems:
  - Incomplete information (e.g. loss indications)
  - Distributed solution required
  - Congestion and control/measurement locations different
  - Time-varying, heterogeneous time delays

TCP Congestion Control
- Window flow control: avoid receiver overrun
- Dynamic window congestion control: avoid/control network overrun
- Observation: not a good idea to start with a large window and dump packets into the network
- Treat the network like a black box and start from a window of 1 segment (“slow start”)
- Increase window size exponentially (“exponential increase”) over successive RTTs => quickly grow to claim available capacity
- Technique: on every ack, increase cwnd (new window variable) by 1 segment
- Effective window = min(cwnd, Wrcvr)
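
Why "+1 segment per ack" gives exponential growth is easiest to see round by round. The sketch below assumes no loss and one ack per segment (no delayed acks); the function name and its parameters are illustrative.

```python
def slow_start_rounds(rcv_window, rounds, cwnd=1):
    """Effective window (in segments) at the start of each RTT during slow start."""
    history = []
    for _ in range(rounds):
        effective = min(cwnd, rcv_window)  # Effective window = min(cwnd, Wrcvr)
        history.append(effective)
        # Every segment sent this round is acked, and each ack adds 1 segment.
        cwnd += effective
    return history
```

Starting from 1 segment with a 16-segment receiver window, the effective window per RTT is 1, 2, 4, 8, 16, and then stays at 16 because the receiver window becomes the limit.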

Dynamics
- Rate of acks = rate of packets at the bottleneck: “self-clocking” property
[Figure: a 100 Mbps link and a 10 Mbps link joined by a router with queue Q; panels for the 1st, 2nd, 3rd and 4th RTT]

Congestion Detection
- Packet loss as an indicator of congestion
- Set slow start threshold (ssthresh) to min(cwnd, Wrcvr)/2
- Retransmit the packet, set cwnd to 1 (re-enter slow start)
[Figure: congestion window (cwnd) versus time (units of RTTs). Labels: receiver window, ssthresh, 1, idle interval, timeout]

Congestion avoidance
- Increment cwnd by 1 per ack until ssthresh
- Increment by 1/cwnd per ack afterwards (“congestion avoidance” or “linear increase”)
- Idea: ssthresh estimates the bandwidth-delay product for the connection
- Initialization: ssthresh = receiver window or default 65535 bytes; larger values through options
- If the source is idle for a long time, cwnd is reset to one MSS
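
The per-ack and per-timeout rules from the last two slides can be combined into one small sketch, with cwnd counted in segments. The class name `CongestionWindow` is hypothetical, and the two-segment floor on ssthresh is an assumption borrowed from RFC 2001 rather than something stated on the slides.

```python
class CongestionWindow:
    """cwnd/ssthresh bookkeeping in units of segments (illustrative sketch)."""

    def __init__(self, rcv_window, ssthresh=None):
        self.rcv_window = rcv_window
        self.cwnd = 1.0
        # Initialization: ssthresh starts at the receiver window unless given.
        self.ssthresh = float(rcv_window if ssthresh is None else ssthresh)

    def on_ack(self):
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0              # slow start: +1 per ack
        else:
            self.cwnd += 1.0 / self.cwnd  # congestion avoidance: +1/cwnd per ack

    def on_timeout(self):
        # Loss signals congestion: remember half the window, restart from 1.
        # The floor of 2 segments is an assumption taken from RFC 2001.
        self.ssthresh = max(2.0, min(self.cwnd, self.rcv_window) / 2)
        self.cwnd = 1.0

    def effective_window(self):
        return min(self.cwnd, self.rcv_window)
```

Since roughly cwnd acks arrive per RTT, adding 1/cwnd per ack works out to about one extra segment per RTT, which is the linear increase.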

- Implications of using packet loss as congestion indicator:
  - Late congestion detection if the buffer sizes are larger
  - Higher speed links or large buffers => larger windows => higher probability of burst loss
  - Interactions with retransmission algorithm and timeouts
- Implications of ack-clocking:
  - More batching of acks => bursty traffic (harder to manage)
  - Less batching leads to a large fraction of Internet traffic being just acks (huge overhead)
- Additive Increase/Multiplicative Decrease dynamics:
  - TCP approximates these dynamics

Timeout and RTT Estimation
- Timeout: for robust detection of packet loss
- Problem: how long should the timeout be?
  - Too long => underutilization; too short => wasteful retransmissions
  - Solution: adaptive timeout, based on RTT
- RTT estimation:
  - Early method: exponential averaging:
    - R ← α*R + (1 - α)*M   {M = measured RTT}
    - RTO = β*R   {β = delay variance factor}
    - Suggested values: α = 0.9, β = 2
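
The early method amounts to a few lines of arithmetic. In this sketch the smoothed value is seeded with the first measurement, which is an assumption; the function name is illustrative.

```python
def exponential_average_rto(samples, alpha=0.9, beta=2.0):
    """Fold measured RTTs into a smoothed RTT and return the resulting RTO."""
    r = samples[0]  # assumption: seed the smoothed RTT with the first sample
    for m in samples[1:]:
        r = alpha * r + (1 - alpha) * m  # R <- alpha*R + (1 - alpha)*M
    return beta * r                      # RTO = beta*R
```

With alpha = 0.9 each new measurement contributes only 10%, so a sudden jump in RTT takes many samples to show up in the RTO. This is the weakness that the mean-and-deviation method addresses.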

RTT Estimation
- Jacobson [1988]: this method has problems with large RTT fluctuations
- New method: use mean and deviation of RTT
  - A = smoothed average RTT
  - D = smoothed mean deviation
  - Err = M - A   {M = measured RTT}
  - A ← A + g*Err   {g = gain = 0.125}
  - D ← D + h*(|Err| - D)   {h = gain = 0.25}
  - RTO = A + 4D
- Integer arithmetic used throughout. Complex initialization process ...
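
A floating-point sketch of the mean-and-deviation estimator follows. Real implementations use scaled integer arithmetic and a more involved initialization; here A is seeded with the first measurement and D with half of it, which is an assumption made only to keep the example short. The class name is hypothetical.

```python
class RttEstimator:
    """Mean/deviation RTO estimator (floating-point sketch)."""

    def __init__(self, first_sample, g=0.125, h=0.25):
        self.g, self.h = g, h
        self.a = first_sample      # assumption: seed A with the first sample
        self.d = first_sample / 2  # assumption: seed D with half of it

    def update(self, m):
        """Fold in one measured RTT and return the new RTO."""
        err = m - self.a
        self.a += self.g * err                  # A <- A + g*Err
        self.d += self.h * (abs(err) - self.d)  # D <- D + h*(|Err| - D)
        return self.rto()

    def rto(self):
        return self.a + 4 * self.d              # RTO = A + 4D
```

Because D reacts to the size of the error with the larger gain h, a burst of RTT variation inflates the RTO quickly even while the average A moves slowly.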

Timer Backoff/Karn’s Algorithm
- Timer backoff: on timeout, RTO = 2*RTO {exponential backoff}
- Retransmission ambiguity problem:
  - After a retransmission, it is unclear whether an ack refers to the original packet or its retransmission. This is a problem for RTT estimation.
- Karn/Partridge: don’t update the RTT estimators for retransmitted segments.
  - Keep the backed-off RTO until an ack is received for a segment that was not retransmitted
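
The interaction between backoff and Karn's rule can be sketched as a small timer object. The class name, the `max_rto` cap and the `samples` list are assumptions for illustration; a real stack would feed accepted samples into its RTT estimators and recompute the RTO from them.

```python
class KarnTimer:
    """Exponential backoff plus Karn's sampling rule (illustrative sketch)."""

    def __init__(self, rto, max_rto=64.0):
        self.base_rto = rto     # RTO derived from the RTT estimators
        self.rto = rto          # RTO currently in use, possibly backed off
        self.max_rto = max_rto  # assumption: upper bound on the backoff
        self.samples = []       # RTT measurements accepted for estimation

    def on_timeout(self):
        self.rto = min(self.rto * 2, self.max_rto)  # RTO = 2*RTO

    def on_ack(self, measured_rtt, was_retransmitted):
        if was_retransmitted:
            # Ambiguous ack: discard the sample and keep the backed-off RTO.
            return
        self.samples.append(measured_rtt)
        self.rto = self.base_rto  # unambiguous sample: leave backoff
```

After two timeouts a 1 s RTO becomes 4 s and stays there through acks of retransmitted segments; only an ack for a segment sent once returns the timer to its estimator-derived value.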

Fast Retransmit and Recovery
- Goals:
  - Timeout avoidance: the 500 ms timer granularity can have an adverse performance impact, especially for high-speed networks
  - Selective retransmission: especially when packets are dropped due to error or light congestion
  - Fast recovery: converge quickly to a state of congestion avoidance (linear increase) with half the current window, the assumed ideal window size
- Observation: receivers are required to send an immediate duplicate acknowledgment when they receive out-of-order data segments

Fast Retransmit and Recovery
- 3 duplicate acks => assume loss
- More duplicate acks => other packets have reached the destination safely
- Wait for about 1/2*RTT, then resume transmitting new segments for every subsequent duplicate ack received
- Stop this process once the ack for the missing segment is received
[Figure: a run of five duplicate acks (“Ack 500”) leading to FRR]

Fast Retransmit and Recovery
- Fast Retransmit: on receiving the third duplicate ack:
  - Set ssthresh to 1/2 of current cwnd
  - Retransmit the missing segment
  - Set cwnd to ssthresh + 3 segments
- Fast Recovery: for each subsequent duplicate ack:
  - Increment cwnd by 1 MSS
  - New packets are transmitted once cwnd grows large enough
  - [If the old cwnd was a pipe of length 1*RTT, the network gets a relief period of 1/2*RTT]

FRR (contd)
- Upon receiving the next (non-duplicate) ack: set cwnd to ssthresh and enter the linear growth phase
[Figure: CWND versus time, with a CWND/2 level marked. Label: “New packets sent during this phase”]
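
The fast retransmit and recovery steps can be traced with a small state sketch, with cwnd counted in segments. The class and attribute names are hypothetical, and normal window growth on acks outside recovery is deliberately left out to keep the example focused.

```python
class FastRecovery:
    """Fast retransmit / fast recovery bookkeeping (illustrative sketch)."""

    def __init__(self, cwnd, ssthresh):
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.dup_acks = 0
        self.in_recovery = False
        self.retransmits = 0  # counts fast retransmissions triggered

    def on_dup_ack(self):
        self.dup_acks += 1
        if not self.in_recovery and self.dup_acks == 3:
            # Third duplicate ack: assume loss and retransmit at once.
            self.ssthresh = self.cwnd / 2
            self.retransmits += 1
            self.cwnd = self.ssthresh + 3
            self.in_recovery = True
        elif self.in_recovery:
            # Each further duplicate ack means a segment left the network.
            self.cwnd += 1

    def on_new_ack(self):
        self.dup_acks = 0
        if self.in_recovery:
            # Non-duplicate ack: deflate to ssthresh, continue linearly.
            self.cwnd = self.ssthresh
            self.in_recovery = False
```

Starting from cwnd = 16, the third duplicate ack gives ssthresh = 8 and cwnd = 11. New data can flow only after enough further duplicate acks inflate cwnd past the amount already outstanding, which is the relief period of about half an RTT.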

FRR problems
- Burst loss of 3 pkts => timeout + window shutdown to cwnd/8!
[Figure: CWND versus time, starting at W. Labels: 1st fast retransmit, 2nd fast retransmit, timeout, CWND/2, CWND/4, CWND/8]

TCP Performance Optimization
- SACK (selective acknowledgments): specifies blocks of packets received at the destination.
- Random early drop (RED) scheme spreads the dropping of packets more uniformly and reduces average queue length and packet loss rate.
- Scheduling mechanisms protect well-behaved flows from rogue flows.
- Explicit Congestion Notification (ECN): routers use an explicit bit indication for congestion instead of loss indications.

Congestion control summary
- Sliding window limited by receiver window
- Dynamic windows: slow start (exponential rise), congestion avoidance (linear rise), multiplicative decrease
- Adaptive timeout: need mean RTT and deviation
- Timer backoff and Karn’s algorithm during retransmission
- Go-back-N or selective retransmission
- Cumulative and selective acknowledgements
- Timeout avoidance: FRR
- Drop policies, scheduling and ECN

TCP Persist Timer
- Receiver flow control can set the window to zero
- Receiver later sends “window update acks”
- But TCP does not transmit acks reliably => update acks may be lost and the source may be stuck at a zero window value
- TCP uses the persist timer to query the receiver periodically to find out whether the window has been increased
- The persist timer is always bounded between 5 s and 60 s. It does exponential backoff like other timers too.
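
The combination of exponential backoff and the 5 s to 60 s bounds yields a characteristic probe schedule. The sketch below assumes a 1.5 s starting value, which is only an illustrative choice; the function name is hypothetical.

```python
def persist_intervals(base=1.5, probes=8, low=5.0, high=60.0):
    """Seconds to wait before each successive window probe."""
    # Double the interval for every probe, then clamp it into [low, high].
    return [min(max(base * 2 ** n, low), high) for n in range(probes)]
```

Under that assumption the waits are 5, 5, 6, 12, 24, 48, 60, 60 seconds: the first two values are lifted to the lower bound and the later ones are capped at the upper bound.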

Silly Window Syndrome
- A) The system operates at a small window (sends segments which are not MSS-sized) even if the receiver grants a large window.
- B) Receiver advertises small windows.
- Solution: batching
  - Receiver must not advertise small windows
  - Sender waits until the segment is full before sending (extension of Nagle’s algorithm)
  - It can transmit everything if it is not waiting for any ACK (or if Nagle’s algorithm has been disabled)

TCP Keepalive timer
- Optional timer. Not part of the TCP spec, but found in most implementations.
- Not necessary, because a “connection” is defined by its endpoints. A connection can be “up” as long as the source and destination are “up”.
- Typical use: to detect idle clients or half-open connections and de-allocate server resources tied up by them, e.g. telnet, ftp.

Gigabit Networks
- “Higher Bandwidth Networks”
- Propagation latency unchanged.
- Increasing bandwidth from 1.5 Mb/s (T1, 1.544 Mb/s) to 45 Mb/s (T3), a factor of 29, decreases the transfer time of a 1 MB file by a factor of 25.
- But increasing from 1 Gb/s to 2 Gb/s gives an improvement of only 10%!
- Transfer time = propagation time + transmission time + queueing/processing.
- Design networks to minimize delay (queueing, processing, retransmission latency)
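
The two ratios quoted above can be reproduced with a one-line model of transfer time. The 30 ms one-way latency and the 1,000,000-byte file size are assumptions chosen for this sketch, and queueing and processing delay are ignored.

```python
def transfer_time(size_bytes, bandwidth_bps, one_way_latency_s=0.030):
    """Propagation time plus transmission time, ignoring queueing/processing."""
    return one_way_latency_s + size_bytes * 8 / bandwidth_bps


FILE_BYTES = 1_000_000  # assumption: "1 MB" taken as 10**6 bytes

t1 = transfer_time(FILE_BYTES, 1_544_000)      # T1 line
t3 = transfer_time(FILE_BYTES, 45_000_000)     # T3 line
g1 = transfer_time(FILE_BYTES, 1_000_000_000)  # 1 Gb/s
g2 = transfer_time(FILE_BYTES, 2_000_000_000)  # 2 Gb/s

speedup_t1_to_t3 = t1 / t3             # roughly 25
improvement_1g_to_2g = (g1 - g2) / g1  # roughly 0.1, i.e. about 10%
```

At gigabit rates the transmission time (8 ms, then 4 ms) is already smaller than the fixed 30 ms of latency, so doubling the bandwidth barely changes the total.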

Long Fat Pipe Networks (LFN): Satellite links
- Need very large window sizes.
- Normally, max window = 2^16 bytes = 64 KBytes
- Window scale: Window = W × 2^Scale
- Window Scaling Option: Kind = 3, Length = 3, Scale (one-byte shift count)
- Max window = 2^16 × 2^14 = 2^30 bytes (RFC 1323 limits Scale to 14, although the one-byte field could hold values up to 255)
- Option sent only in SYN and SYN + Ack segments. RFC 1323
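
A brief sketch of how the scale factor is applied and chosen. The function names are hypothetical, and rejecting an out-of-range shift with an exception is a simplification made for this example.

```python
MAX_SHIFT = 14  # upper bound on the shift count in RFC 1323


def scaled_window(window_field, shift):
    """Actual window in bytes for a 16-bit window field and a shift count."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("window field must fit in 16 bits")
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift count out of range")
    return window_field << shift  # Window = W * 2**Scale


def shift_for(buffer_bytes):
    """Smallest shift count whose maximum window covers the buffer."""
    shift = 0
    while shift < MAX_SHIFT and (0xFFFF << shift) < buffer_bytes:
        shift += 1
    return shift
```

A 1 MB receive buffer, for instance, needs a shift of 5, since 65535 × 2^4 falls just short of 2^20 bytes.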

Timestamp option
- For LFNs, need accurate and more frequent RTT estimates.
- Timestamp option:
  - Place a timestamp value in any segment.
  - Receiver echoes the timestamp value in the ack.
  - If acks are delayed, the timestamp value returned corresponds to the earliest segment being acked.
  - Segments lost/retransmitted => RTT overestimated

PAWS: Protection against wrapped sequence numbers
- Largest receiver window = 2^30 = 1 GB
- “Lost” segment may reappear before MSL, and the sequence numbers may have wrapped around
- The receiver considers the timestamp as an extension of the sequence number => discard out-of-sequence segment based on both seq # and timestamp.
- Reqt: timestamp values need to be monotonically increasing, and need to increase by at least one per window
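
The timestamp half of the PAWS test reduces to a wraparound-safe comparison of 32-bit values. This sketch covers only that comparison; `ts_recent` (the most recent timestamp accepted from the peer) and the function names are assumed for illustration, and the sequence-number checks are omitted.

```python
def ts_older(a, b):
    """True if 32-bit timestamp a is older than b, allowing for wraparound."""
    # a is older than b when b is ahead of a by less than 2**31 (mod 2**32).
    return 0 < ((b - a) & 0xFFFFFFFF) < 0x80000000


def paws_accept(seg_tsval, ts_recent):
    """Accept a segment unless its timestamp is older than the last one seen."""
    return not ts_older(seg_tsval, ts_recent)
```

An old duplicate that reappears after the sequence space has wrapped carries a timestamp older than `ts_recent` and is dropped, even though its sequence number looks valid.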

Summary
- Interactive and bulk TCP flow
- TCP congestion control
- Informal exercises: perform some of the experiments described in chaps 19-21 to see various facets of TCP in action