Post on 28-May-2018
1
Internet Transport Protocols: UDP / TCP
Prof. Anja Feldmann, Ph.D.
anja@net.t-labs.tu-berlin.de
TCP/IP Illustrated, Volume 1, W. Richard Stevens http://www.kohala.com/start
2
Transport Layer: Outline
- Transport-layer services
- Multiplexing and demultiplexing
- Connectionless transport: UDP
- Connection-oriented transport: TCP
  - Segment structure
  - Reliable data transfer
  - Flow control
  - Connection management
- Principles of congestion control
- TCP congestion control
3
Internet Transport-Layer Protocols
- Network layer: logical communication between hosts
- Transport layer: logical communication between processes
  - Relies on, and enhances, network-layer services
- More than one transport protocol is available to applications
  - Internet:
    - TCP
    - UDP
[Figure: two end hosts with full protocol stacks (application, transport, network, data link, physical) connected through intermediate nodes that implement only the network, data link, and physical layers; the transport layer provides logical end-to-end transport between the end hosts.]
4
Sockets: interface to applications
Socket API
- Introduced in BSD4.1 UNIX, 1981
- Explicitly created, used, and released by applications
- Client/server paradigm
- Two types of transport service via the socket API:
  - Unreliable datagram
  - Reliable, byte-stream-oriented
Socket: a host-local, application-created/owned, OS-controlled interface (a “door”) through which an application process can both send messages to and receive messages from another (remote or local) application process
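The two service types can be tried out with Python's standard `socket` module. The following is a minimal sketch, assuming a working loopback interface; the function names `udp_roundtrip` and `tcp_roundtrip` are illustrative and do not come from the slides.

```python
import socket
from typing import List

def udp_roundtrip(payload: bytes) -> bytes:
    """Unreliable datagram service: one sendto() arrives as one message."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))   # port 0: the OS picks a free port
        receiver.settimeout(2.0)
        sender.sendto(payload, receiver.getsockname())
        data, _addr = receiver.recvfrom(2048)
        return data

def tcp_roundtrip(chunks: List[bytes]) -> bytes:
    """Reliable byte stream: separate writes are not preserved as messages."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            client.settimeout(2.0)
            client.connect(listener.getsockname())
            conn, _addr = listener.accept()
            with conn:
                conn.settimeout(2.0)
                for chunk in chunks:
                    client.sendall(chunk)
                client.shutdown(socket.SHUT_WR)   # signal end of stream
                received = []
                while True:
                    data = conn.recv(4096)
                    if not data:                  # empty read: peer closed
                        break
                    received.append(data)
                return b"".join(received)
```

The TCP helper returns the concatenation of everything written, which is all a byte-stream service promises: the receiver cannot tell where one write ended and the next began.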
5
Sockets and OS
Socket: a door between the application process and the end-to-end transport protocol (UDP or TCP)
[Figure: two hosts (or servers) connected via the internet. On each host the process is controlled by the application developer; behind the socket sits TCP with its buffers and variables, controlled by the operating system.]
6
Multiplexing/Demultiplexing
Multiplexing at the sending host: gathering data from multiple application processes (sockets) and enveloping the data with a header (later used for demultiplexing)
Demultiplexing at the receiving host: delivering received segments to the correct application process (socket)
[Figure: hosts 1, 2, and 3, each with a full protocol stack (application, transport, network, link, physical); processes P1 to P4 are attached to the transport layer through sockets.]
7
Multiplexing/Demultiplexing
Multiplexing/demultiplexing:
- Based on the sender and receiver port numbers and IP addresses
  - Source and destination port numbers are carried in each segment
  - Well-known port numbers for specific applications (see /etc/services)
TCP/UDP segment format (each row is 32 bits wide):
  source port # | dest port #
  other header fields
  application data (message)
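Because both TCP and UDP start their header with the two 16-bit port fields, the ports can be read from the first four bytes of a segment. A minimal sketch using Python's `struct` module follows; the function name `parse_ports` is an illustrative assumption.

```python
import struct
from typing import Tuple

def parse_ports(segment: bytes) -> Tuple[int, int]:
    """Return (source port, destination port) from a raw TCP or UDP segment."""
    if len(segment) < 4:
        raise ValueError("segment shorter than the two port fields")
    # "!HH": network byte order, two unsigned 16-bit integers.
    src_port, dst_port = struct.unpack("!HH", segment[:4])
    return src_port, dst_port
```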
8
Multiplexing/Demultiplexing: Examples
Port use: simple telnet app
- Host A to server B: source port x, dest. port 23
- Server B to host A: source port 23, dest. port x
Port use: WWW server
- WWW client host A to WWW server B: source IP A, dest IP B, source port x, dest. port 80
- WWW client host C to WWW server B: source IP C, dest IP B, source port x, dest. port 80
- WWW client host C to WWW server B: source IP C, dest IP B, source port y, dest. port 80
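The WWW server example works because a connection is identified by source IP, source port, destination IP, and destination port together. The sketch below models that lookup with a dictionary; the class name and the concrete values chosen for ports x and y are hypothetical.

```python
from typing import Dict, List, Tuple

# (source IP, source port, destination IP, destination port)
ConnKey = Tuple[str, int, str, int]

class TcpDemultiplexer:
    """Toy demultiplexer: one receive queue per connection 4-tuple."""

    def __init__(self) -> None:
        self._sockets: Dict[ConnKey, List[bytes]] = {}

    def open(self, src_ip: str, src_port: int,
             dst_ip: str, dst_port: int) -> List[bytes]:
        queue: List[bytes] = []
        self._sockets[(src_ip, src_port, dst_ip, dst_port)] = queue
        return queue

    def deliver(self, src_ip: str, src_port: int,
                dst_ip: str, dst_port: int, data: bytes) -> bool:
        queue = self._sockets.get((src_ip, src_port, dst_ip, dst_port))
        if queue is None:
            return False          # no socket for this 4-tuple
        queue.append(data)
        return True
```

With this keying, host A and host C may both use source port x without their segments being mixed up at server B.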
9
UDP: User Datagram Protocol [RFC 768]
- “No frills,” “bare bones” Internet transport protocol
- “Best effort” service; UDP segments may be:
  - Lost
  - Delivered out of order to the application
- Connectionless:
  - No handshaking between UDP sender and receiver
  - Each UDP segment is handled independently of the others
Why is there a UDP?
- No connection establishment (which can add delay)
- Simple: no connection state at sender or receiver
- Small segment header
- No congestion control: UDP can blast away as fast as desired
10
UDP: More
- Each user request is transferred in a single datagram
- UDP has a receive buffer but no sender buffer
- Often used for streaming multimedia apps
  - Loss tolerant
  - Rate sensitive
- Other UDP uses (why?):
  - DNS, SNMP, NFS
- Reliable transfer over UDP: add reliability at the application layer
UDP segment format (each row is 32 bits wide):
  source port # | dest port #
  length | checksum
  application data (message)
Length: the length, in bytes, of the UDP segment, including the header
11
UDP Checksum
- Ones' complement of the ones' complement sum of 16-bit words (same as IP)
- Covers the UDP header and data plus a 12-byte pseudo header
  - IP addresses, 0, protocol identifier, length
  - Ensures that the packet has reached the correct host
- Pad byte in case of an odd packet length
- Optional: zero indicates no checksum
  - Should always be turned on
- Receiver has to verify the checksum
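The checksum rules above can be sketched in a few lines of Python for IPv4. The function names are illustrative assumptions. Protocol identifier 17 is UDP's IP protocol number, and a computed checksum of zero is sent as 0xFFFF because zero means "no checksum".

```python
import socket
import struct

def ones_complement_sum(data: bytes) -> int:
    """16-bit ones' complement sum; odd-length input gets one pad byte."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return total

def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """12 bytes: source IP, destination IP, zero, protocol (17), UDP length."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, 17, udp_length))

def udp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum for a segment whose checksum field is currently zero."""
    data = pseudo_header(src_ip, dst_ip, len(segment)) + segment
    checksum = ~ones_complement_sum(data) & 0xFFFF
    return checksum or 0xFFFF

def udp_checksum_ok(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """Receiver check: the sum over everything, checksum included, is 0xFFFF."""
    data = pseudo_header(src_ip, dst_ip, len(segment)) + segment
    return ones_complement_sum(data) == 0xFFFF
```

Including the IP addresses in the sum is what lets the receiver detect a segment that was delivered to the wrong host.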
12
TCP: Overview
RFCs: 793, 1122, 1323, 2018, 2581
- Connection-oriented:
  - Handshaking (exchange of control messages) initializes sender and receiver state before data exchange
- Flow controlled:
  - Sender will not overwhelm the receiver
- Congestion controlled:
  - Sender will not overwhelm the network
- Point-to-point:
  - One sender, one receiver
- Reliable, in-order byte stream:
  - No “message boundaries”
- Pipelined:
  - TCP congestion and flow control set the window size
- Full-duplex data:
  - Bi-directional data flow in the same connection
  - MSS: maximum segment size
13
Simulating Transport Protocols
- Network simulator
- Examples:
  - Network Simulator (NS), SSFNet, …
- Animation of NS traces via NAM (Network Animator)
- Try it!
14
Simulating Transport Protocols
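- Example: 2 TCP connections + 1 UDP flow
- Topology: see figure
- TCP1 starts at time 0 seconds, TCP2 at time 3 seconds
- UDP starts at time 15 seconds
[Figure: topology with Node 0, Node 1, and Node 2 joined by two links labeled “2 Mb, 25 ms” and “380 Kb, 10 ms”; the TCP 1, TCP 2, and UDP flows are marked at both ends.]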
15
Simulation Results
16
TCP Segment Structure
TCP segment format (each row is 32 bits wide):
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | rcvr window size
  checksum | ptr urgent data
  options (variable length)
  application data (variable length)
Field notes:
- Sequence and acknowledgement numbers: counting by bytes of data (not segments!)
- URG: urgent data (generally not used)
- ACK: ACK # valid
- PSH: push data now (generally not used)
- RST, SYN, FIN: connection establishment (setup, teardown commands)
- rcvr window size: # bytes the rcvr is willing to accept
- checksum: Internet checksum (as in UDP)
17
TCP Reliability: Seq. #’s and ACKs
Seq. #’s:
- Byte-stream “number” of the first byte in the segment’s data
ACKs:
- Seq. # of the next byte expected from the other side
- Cumulative ACK
Q: How does the receiver handle out-of-order segments?
- A: The TCP spec doesn’t say; it is up to the implementer
Simple telnet scenario between Host A and Host B (in time order):
1. User types ‘C’; Host A sends Seq=42, ACK=79, data = ‘C’
2. Host B ACKs receipt of ‘C’ and echoes back ‘C’: Seq=79, ACK=43, data = ‘C’
3. Host A ACKs receipt of the echoed ‘C’: Seq=43, ACK=80
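The numbers in the telnet scenario follow from two counters per endpoint: the sequence number of the next byte to send, and the next byte expected from the peer. A minimal sketch follows; the `Endpoint` class is an illustrative assumption and ignores loss, reordering, and windows.

```python
from dataclasses import dataclass

@dataclass
class Endpoint:
    next_seq: int   # sequence number of the next byte this side sends
    expected: int   # next byte expected from the peer (sent as the ACK #)

    def send(self, data: bytes) -> dict:
        segment = {"seq": self.next_seq, "ack": self.expected, "data": data}
        self.next_seq += len(data)      # sequence numbers count bytes
        return segment

    def receive(self, segment: dict) -> None:
        if segment["seq"] == self.expected:     # in-order data only
            self.expected += len(segment["data"])
```

Starting Host A at `Endpoint(next_seq=42, expected=79)` and Host B at `Endpoint(next_seq=79, expected=42)` reproduces the three segments of the scenario.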
18
TCP: Reliable Data Transfer
- Packet loss detection:
  - Retransmission timeout
  - Fast retransmit
    - Three duplicate ACKs
- Retransmission mechanism
  - ARQ: Go-Back-N, selective retransmissions
- Simplified sender; assumptions:
  - One-way data transfer
  - No flow or congestion control
Simplified sender (wait for event, handle it, wait for the next event):
- Event: data received from application above -> create and send segment
- Event: timer timeout for segment with seq # y -> retransmit segment
- Event: ACK received, with ACK # y -> ACK processing
19
TCP: Retransmission Scenarios
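Lost ACK scenario (Host A sends, Host B receives):
1. A sends Seq=92, 8 bytes data
2. B replies with ACK=100, but the ACK is lost (X)
3. A's timeout expires; A retransmits Seq=92, 8 bytes data
4. B replies with ACK=100 again
Premature timeout, cumulative ACKs (Host A sends, Host B receives):
1. A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
2. B replies with ACK=100, then ACK=120
3. The Seq=92 timeout expires prematurely; A retransmits Seq=92, 8 bytes data
4. B replies with the cumulative ACK=120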
20
TCP ACK Generation [RFC 1122, RFC 2581]
Event at receiver -> TCP receiver action
- Event: arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
  Action: delayed ACK. Wait up to 500 ms for the next segment. If no next segment arrives, send ACK (reduces ACK traffic)
- Event: arrival of in-order segment with expected seq #; one other segment has an ACK pending
  Action: immediately send a single cumulative ACK, ACKing both in-order segments
- Event: arrival of out-of-order segment with higher-than-expected seq #; gap detected
  Action: immediately send a duplicate ACK, indicating the seq # of the next expected byte (triggers fast retransmit)
- Event: arrival of segment that partially or completely fills the gap
  Action: immediately send an ACK, provided that the segment starts at the lower end of the gap
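The cumulative and duplicate ACK rules can be sketched with a receiver that buffers out-of-order segments. This is a simplified illustration: the delayed-ACK timer is left out, overlapping segments are not handled, and buffering out-of-order data is an implementation choice that the spec leaves open. The `Receiver` class is hypothetical.

```python
class Receiver:
    """Returns the cumulative ACK number to send for each arriving segment."""

    def __init__(self, expected: int) -> None:
        self.expected = expected     # next in-order byte we are waiting for
        self.buffer = {}             # seq -> data for out-of-order segments

    def on_segment(self, seq: int, data: bytes) -> int:
        if seq == self.expected:
            self.expected += len(data)
            # A gap-filling segment may release buffered data behind it.
            while self.expected in self.buffer:
                self.expected += len(self.buffer.pop(self.expected))
        elif seq > self.expected:
            self.buffer[seq] = data  # gap detected: keep data, ACK unchanged
        # Old duplicates (seq < expected) just trigger a repeat ACK.
        return self.expected
```

Every segment that arrives above the gap produces the same ACK number again, which is exactly the duplicate-ACK signal the sender counts for fast retransmit.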
21
TCP Retransmission Timeout
- TCP uses one timer for one packet only
- Retransmission Timeout (RTO) is calculated dynamically
  - Based on round-trip time (RTT) estimation
  - Wait at least one RTT before retransmitting
  - Importance of accurate RTT estimators:
    - Low RTT estimate -> unneeded retransmissions
    - High RTT estimate -> poor throughput
  - RTT estimator must adapt to changes in RTT
    - But not too fast, or too slow!
  - Spurious timeouts
    - “Conservation of packets” principle: more than a window's worth of packets in flight
22
Retransmission Timeout Estimator
- Round-trip times are exponentially averaged:
  - New RTT = α (old RTT) + (1 - α) (new sample)
  - α = 0.875 for most TCPs
- Retransmit timer set to β RTT, where β = 2
  - Every time the timer expires, the RTO is exponentially backed off
- Key observation: at high loads, round-trip variance is high
- Solution (currently in use):
  - Base RTO on RTT and standard deviation of RTT: RTO = RTT + 4 * rttvar
  - rttvar = χ * dev + (1 - χ) * rttvar
    - dev = linear deviation (also referred to as mean deviation)
    - Inappropriately named: actually smoothed linear deviation
  - RTO is discretized into ticks of 500 ms (RTO >= 2 ticks)
23
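Retransmission Ambiguity
[Figure: two timelines between hosts A and B, each showing an original transmission, an RTO expiry, a retransmission, and an ACK (one timeline includes a loss, marked X). SampleRTT is ambiguous because the ACK could belong to either the original transmission or the retransmission.]
- Karn’s RTT estimator
  - If a segment has been retransmitted:
    - Don’t count RTT samples on ACKs for this segment
    - Keep the backed-off timeout for the next packet
    - Reuse the RTT estimate only after one successful transmission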
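Karn's rules can be sketched as a small timer object. The class name, the doubling factor for the backoff, and the 64-second cap are illustrative assumptions; the slide only says the timeout is exponentially backed off.

```python
class KarnTimer:
    """Tracks the RTO across timeouts while ignoring ambiguous RTT samples."""

    def __init__(self, estimated_rto: float, max_rto: float = 64.0) -> None:
        self.estimated_rto = estimated_rto   # RTO derived from the RTT estimate
        self.rto = estimated_rto             # RTO currently in force
        self.max_rto = max_rto               # assumed upper bound
        self.samples = []                    # RTT samples accepted so far

    def on_timeout(self) -> None:
        # Exponential backoff: double the timeout on every expiry.
        self.rto = min(self.rto * 2, self.max_rto)

    def on_ack(self, rtt_sample: float, was_retransmitted: bool) -> None:
        if was_retransmitted:
            return     # ambiguous sample: discard, keep the backed-off RTO
        self.samples.append(rtt_sample)
        self.rto = self.estimated_rto   # clean transmission: reuse the estimate
```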
24
TCP Flow Control: Sliding Window Protocol
Flow control: the sender won’t overrun the receiver’s buffers by transmitting too much, too fast
Receiver: explicitly informs the sender of the (dynamically changing) amount of free buffer space
- rcvr window size field in the TCP segment
Sender: keeps the amount of transmitted, unACKed data less than the most recently received rcvr window size
[Figure: receiver buffering]
25
TCP Flow Control
- TCP is a sliding window protocol
  - For window size n, the sender can send up to n bytes without receiving an acknowledgement
  - When the data is acknowledged, the window slides forward
- Original TCP always sent the entire window
  - Congestion control now limits this via the congestion window determined by the sender (network limited)
  - If not, the data rate is receiver limited
- Silly window syndrome
  - Too many small packets in flight
  - Limit the number of packets smaller than MSS to one per RTT
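How much the sender may still transmit follows from the smaller of the two windows minus the data already in flight. A minimal sketch follows, with illustrative parameter names and all quantities in bytes.

```python
def usable_window(rcvr_window: int, cong_window: int,
                  next_seq: int, oldest_unacked: int) -> int:
    """Bytes the sender may still send before it must wait for an ACK."""
    in_flight = next_seq - oldest_unacked          # sent but not yet ACKed
    effective = min(rcvr_window, cong_window)      # receiver or network limit
    return max(0, effective - in_flight)
```

When an ACK advances `oldest_unacked`, the in-flight amount shrinks and the usable window grows by the same number of bytes, which is the "sliding" in sliding window.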
26
Window Flow Control:
Sender side:
- Regions of the byte stream: sent and acked; sent but not acked; not yet sent
- Labels: sender window; next to be sent
Receiver side:
- Regions of the receive buffer: acked but not delivered to user; not yet acked
- Label: rcvr window
27
Ideal Window Size
- Ideal size = delay * bandwidth
  - Delay-bandwidth product (RTT * bottleneck bitrate)
- Window size < delay * bw -> wasted bandwidth
- Window size > delay * bw ->
  - Queuing at intermediate routers -> increased RTT
  - Eventually packet loss
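A short calculation makes the delay-bandwidth product concrete. The link values used in the usage note (2 Mbit/s bottleneck, 50 ms RTT) are hypothetical, and the function names are illustrative.

```python
def bdp_bytes(bottleneck_bps: float, rtt_ms: float) -> float:
    """Delay-bandwidth product in bytes: RTT * bottleneck bitrate / 8."""
    return bottleneck_bps * rtt_ms / 1000 / 8

def utilization(window_bytes: float, bottleneck_bps: float,
                rtt_ms: float) -> float:
    """Fraction of the bottleneck a fixed window can fill (at most 1.0)."""
    return min(1.0, window_bytes / bdp_bytes(bottleneck_bps, rtt_ms))
```

For the hypothetical link, `bdp_bytes(2_000_000, 50)` gives 12500 bytes. A 6250-byte window then uses about half of the bandwidth, while any window above 12500 bytes only adds queuing.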
28
TCP Connection Management
Recall: TCP sender and receiver establish a “connection” before exchanging data segments
- Initialize TCP variables:
  - Seq. #s
  - Buffers, flow control info (e.g. RcvWindow)
  - MSS and other options
- Client: connection initiator; server: contacted by client
- Three-way handshake
  - Simultaneous open
- TCP half-close (four-way handshake)
- Connection aborts via RSTs
29
TCP Connection Management (2)
Three-way handshake:
Step 1: Client end system sends TCP SYN control segment to server
- Specifies initial seq #
- Specifies initial window #
Step 2: Server end system receives SYN, replies with SYNACK control segment
- ACKs received SYN
- Allocates buffers
- Specifies server -> receiver initial seq. #
- Specifies initial window #
Step 3: Client system receives SYNACK, replies with ACK segment
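The sequence and acknowledgement numbers exchanged in the three steps can be sketched as follows. The initial sequence numbers are hypothetical inputs, and the sketch relies on the fact that a SYN consumes one sequence number. Window sizes and options are left out.

```python
from typing import Dict, List

def three_way_handshake(client_isn: int, server_isn: int) -> List[Dict]:
    """Return the SYN, SYNACK, and ACK segments of a connection setup."""
    syn = {"flags": {"SYN"}, "seq": client_isn}
    synack = {
        "flags": {"SYN", "ACK"},
        "seq": server_isn,
        "ack": syn["seq"] + 1,       # the SYN occupies one sequence number
    }
    ack = {
        "flags": {"ACK"},
        "seq": synack["ack"],        # the client continues where the server expects
        "ack": synack["seq"] + 1,    # the server's SYN also occupies one number
    }
    return [syn, synack, ack]
```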
30
TCP Connection Management (3)
Closing a connection:
Client closes socket: clientSocket.close();
Step 1: Client end system sends TCP FIN control segment to server
Step 2: Server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: the client sends FIN (close); the server replies with ACK, then sends its own FIN (close).]
31
TCP Connection Management (4)
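Step 3: Client receives FIN, replies with ACK.
- Enters “timed wait”: will respond with ACK to received FINs
Step 4: Server receives ACK. Connection closed.
Note: With a small modification, this can handle simultaneous FINs.
[Figure: timeline: the client sends FIN (closing); the server replies with ACK, then sends FIN (closing); the client replies with ACK and enters timed wait before it is closed; the server is closed once it receives the ACK.]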
32
TCP Connection Management (5)
TCP client lifecycle
33
TCP Connection Management (cont)
TCP server lifecycle
34
TCP state machine
35
Excursion: Congestion Control Principles
36
TCP Acknowledgement Clocking
- TCP is “self-clocking”
- New data is sent when old data is acked
- Ensures an “equilibrium”
- But how to get started?
  - Slow Start
  - Congestion Avoidance
- Other TCP features
  - Fast Retransmission
  - Fast Recovery
37
TCP Congestion Control:
- Two “phases”
  - Slow start
  - Congestion avoidance
- Important variables:
  - Congwin
  - threshold (ssthresh): defines the threshold between the slow start phase and the congestion avoidance phase
- “Probing” for usable bandwidth:
  - Ideally: transmit as fast as possible (Congwin as large as possible) without loss
  - Increase Congwin until loss (congestion)
  - Loss: decrease Congwin, then begin probing (increasing) again
38
TCP Slowstart
- Exponential increase (per RTT) in window size (not so slow!)
- Loss event: timeout (Tahoe TCP) and/or three duplicate ACKs (Reno TCP)
Slowstart algorithm:
    initialize: Congwin = 1
    for (each segment ACKed)
        Congwin++
    until (loss event OR Congwin > threshold)
[Figure: over successive RTTs, Host A sends one segment, then two segments, then four segments to Host B.]
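The slowstart pseudocode can be turned into a small loss-free simulation that records Congwin at the end of each RTT. The function name, the `max_rtts` bound, and the assumption that every segment of a window is ACKed within one RTT are illustrative.

```python
from typing import List

def slow_start(threshold: int, max_rtts: int) -> List[int]:
    """Congwin (in segments) after each RTT until it exceeds the threshold."""
    congwin = 1
    per_rtt = [congwin]
    for _ in range(max_rtts):
        # One ACK arrives for each segment sent in this RTT.
        for _ in range(congwin):
            congwin += 1
            if congwin > threshold:      # slow start is over
                per_rtt.append(congwin)
                return per_rtt
        per_rtt.append(congwin)
    return per_rtt
```

Since each ACK adds one segment and a window of w segments yields w ACKs, the window doubles every RTT, which matches the one, two, four segment pattern in the figure.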
39
Congestion Avoidance
- Loss implies congestion. Why?
  - Not necessarily true on all link types
- If loss occurs when cwnd = W
  - Network can handle between 0.5W and W segments
  - Set cwnd to 0.5W (multiplicative decrease)
- Upon receiving a new ACK
  - Increase cwnd by 1/cwnd
  - Results in additive increase
40
TCP Congestion Avoidance
Congestion avoidance:
    /* slowstart is over */
    /* Congwin > threshold */
    Until (loss event) {
        every w segments ACKed:
            Congwin++
    }
    threshold = Congwin/2
    Congwin = 1
    perform slowstart (1)
(1) TCP Reno skips slowstart (fast recovery) after three duplicate ACKs
41
Return to Slow Start
- If a packet is lost, we lose self-clocking
  - Need to implement slow start and congestion avoidance together
- When a timeout occurs
  - Set threshold to 0.5 W (W = current window size)
  - Set cwnd to one segment
- When three duplicate ACKs occur:
  - Set threshold to 0.5 W
  - Retransmit the missing segment (Fast Retransmit)
  - cwnd = threshold + number of dupacks
  - Upon receiving ACKs, cwnd = threshold (cut in half!)
  - Use congestion avoidance (Fast Recovery)
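The reactions to timeouts and duplicate ACKs can be combined into one small state holder. This is a simplified sketch, not a complete Reno implementation: the class name is hypothetical, windows are counted in segments, and slow start is applied while cwnd is below the threshold.

```python
class RenoWindow:
    """Tracks cwnd and threshold through ACKs, duplicate ACKs, and timeouts."""

    def __init__(self, threshold: float) -> None:
        self.cwnd = 1.0
        self.threshold = threshold
        self.dupacks = 0
        self.in_recovery = False

    def on_new_ack(self) -> None:
        if self.in_recovery:
            self.cwnd = self.threshold      # leave fast recovery: window halved
            self.in_recovery = False
        elif self.cwnd < self.threshold:
            self.cwnd += 1                  # slow start
        else:
            self.cwnd += 1 / self.cwnd      # congestion avoidance
        self.dupacks = 0

    def on_dup_ack(self) -> bool:
        """Return True when the missing segment should be retransmitted."""
        self.dupacks += 1
        if self.dupacks == 3:               # fast retransmit
            self.threshold = self.cwnd / 2
            self.cwnd = self.threshold + 3  # threshold + number of dupacks
            self.in_recovery = True
            return True
        if self.in_recovery:
            self.cwnd += 1                  # each further dupack inflates cwnd
        return False

    def on_timeout(self) -> None:
        self.threshold = self.cwnd / 2
        self.cwnd = 1.0                     # back to slow start
        self.dupacks = 0
        self.in_recovery = False
```

The design difference between the two loss signals is visible in the code: duplicate ACKs prove that segments are still leaving the network, so the window is only halved, while a timeout resets it to one segment.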
42
TCP Congestion Control
- End-to-end control (no network assistance)
- TCP throughput is limited by the rcvr window (flow control)
- Transmission rate is limited by the congestion window size, Congwin, over segments
- w segments, each with MSS bytes, sent in one RTT:
    throughput = (w * MSS) / RTT bytes/sec
43
Fast Recovery Example
- cwnd = 6; in congestion avoidance
[Figure: plot of number (0 to 30) over time with series DATA, ACK, cwnd, and inFlight; annotated “After fast recovery”.]
44
Sequence Number Plot (Simulation)
45
Seq. Number Plot (Simulation) zoom
46
TCP Flavors / Variants
- TCP Tahoe
  - Slow start
  - Congestion avoidance
  - Timeout, 3 duplicate ACKs -> cwnd = 1 -> slow start
- TCP Reno
  - Slow start
  - Congestion avoidance
  - Fast retransmit, fast recovery
  - Timeout -> cwnd = 1 -> slow start
  - Three duplicate ACKs -> fast recovery, congestion avoidance
47
Extensions
- Fast recovery, multiple losses per RTT -> timeout
- TCP New-Reno
  - Stay in fast recovery until all packet losses in the window are recovered
  - Can recover 1 packet loss per RTT without causing a timeout
- Selective Acknowledgements (SACK) [RFC 2018]
  - Provides information about out-of-order packets received by the receiver
  - Can recover multiple packet losses per RTT
48
Additional TCP Features
- Urgent data
  - Nice for interactive applications
  - In-band via urgent pointer
- Nagle algorithm
  - Avoidance of small segments
  - Needed for interactive applications
  - Methodology: only one outstanding packet can be small
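The Nagle rule as stated on the slide can be sketched as a single send decision. The function and parameter names are illustrative, and the sketch follows the slide's wording (at most one small packet outstanding) rather than any particular implementation.

```python
def nagle_can_send(pending_bytes: int, mss: int,
                   small_packet_outstanding: bool) -> bool:
    """Decide whether the sender may transmit its buffered data now."""
    if pending_bytes <= 0:
        return False                     # nothing to send
    if pending_bytes >= mss:
        return True                      # a full-sized segment may always go
    # Small segment: allowed only if no other small packet is unACKed.
    return not small_packet_outstanding
```

While a small packet is outstanding, further keystrokes accumulate in the send buffer and leave together once the ACK arrives or a full MSS has built up.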
49
Summary
- Reviewed principles of the transport layer:
  - Reliable data transfer
  - Flow control
  - Congestion control
  - (Multiplexing)
- Instantiation in the Internet
  - UDP
  - TCP