Copyright Reserved 2001 1
Modern Computer Networks: An Open Source Approach
Chapter 4: End-to-End Protocols
Ying-Dar Lin
Content
4.1 Issues
  Port-Multiplexing, Reliability, Flow/Congestion Control
4.2 UDP - Unreliable Connectionless Transfer
4.3 TCP - Reliable Connection-Oriented Transfer
  Connection Management
  Reliability
  Flow Control
  Performance
4.4 Programming Interface: Socket
4.5 Real-time Transport (RTP & RTCP)
4.6 Book roadmap
Pitfalls and fallacies
Further reading
Exercises
4.1 Issues
End-to-End Communication Channel: Port-Multiplexing
  Port: communication end point
Reliability: Per-Link vs. Per-Packet + Per-Flow
  Note: per-link reliability such as Ethernet
    Collision: can be detected and retransmitted
    CRC/alignment error: can only rely on upper-layer protocols
Flow/Congestion Control: Per-Link vs. Per-Flow
[Figure: Node-to-Node Channel vs. End-to-End Channel. Left: LAN host 1 and LAN host 2 (AP1, AP2 over IP over MAC) share a multi-access channel, with a condensed delay distribution. Right: IP host 1 and IP host 2 (AP1, AP2 over TCP) communicate across IP networks, with a loose delay distribution.]
4.2 UDP – For Unreliable Connectionless Transfers
Objectives
  Port-Multiplexing
  Per-Packet Reliability: Checksum
Header Format…
Carrying Unicast/Multicast Real-Time Traffic
  Retransmission is Meaningless: No Per-Flow Reliability Needed
  Bit-rate is Determined by Codec Used: No Flow Control Needed
[Figure: End-to-end channel between AP1 and AP2 on IP host 1 and IP host 2 across IP networks; the transport layer is labeled TCP in the figure.]
Elementary Socket: UDP Client/Server
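The original slide shows the UDP client/server call sequence as a figure. As a minimal sketch of that sequence, assuming a loopback address and a one-shot echo service (the function names `udp_echo_once` and `udp_round_trip` are hypothetical, chosen for illustration), the connectionless exchange might look like this:

```python
import socket
import threading


def udp_echo_once(server_sock):
    """Server side: receive one datagram and send it back to its source."""
    # No listen()/accept(): UDP is connectionless, so every datagram
    # arrives together with the address of its sender.
    data, client_addr = server_sock.recvfrom(2048)
    server_sock.sendto(data, client_addr)


def udp_round_trip(message):
    """Run a one-shot echo server on loopback and query it from a client."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.settimeout(2.0)
    server.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
    server_addr = server.getsockname()
    worker = threading.Thread(target=udp_echo_once, args=(server,))
    worker.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(2.0)                 # UDP gives no delivery guarantee
    try:
        client.sendto(message, server_addr)   # no connect(): just send
        reply, _ = client.recvfrom(2048)
    finally:
        client.close()
        worker.join()
        server.close()
    return reply
```

Calling `udp_round_trip(b"hello")` should return the same bytes on a working loopback interface. A real application would have to handle the timeout itself, because UDP neither retransmits nor acknowledges.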
4.3 TCP – For Reliable Connection-Oriented Transfers
Objectives
  Port-Multiplexing: Same as UDP
  Per-Flow Reliability
  Per-Flow Flow Control
Connection Management
  Connection Establishment & State Transitions
Per-Flow Reliability
  Per-Packet Checksum & Per-Flow ACKs
Per-Flow Flow/Congestion Control
Performance
  Interactive vs. Bulk-Data Transfers
Stateful (see Ch. 1): requires connection management
Elementary Socket: TCP Client/Server
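The original slide shows the TCP client/server call sequence as a figure. The sketch below is one possible rendering of it, assuming a loopback echo service that handles a single connection (the names `tcp_echo_once` and `tcp_round_trip` are hypothetical). Unlike UDP, the server must `listen()` and `accept()`, and the client must `connect()` before any data flows.

```python
import socket
import threading


def tcp_echo_once(listener):
    """Server side: accept one connection and echo bytes until the peer closes."""
    conn, _ = listener.accept()            # returns after the handshake completes
    with conn:
        while True:
            data = conn.recv(2048)
            if not data:                   # empty read: the peer sent FIN
                break
            conn.sendall(data)


def tcp_round_trip(message):
    """Run a one-shot echo server on loopback and query it from a client."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(2.0)
    listener.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
    listener.listen(1)
    server_addr = listener.getsockname()
    worker = threading.Thread(target=tcp_echo_once, args=(listener,))
    worker.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(2.0)
    chunks = []
    try:
        client.connect(server_addr)        # triggers connection establishment
        client.sendall(message)
        client.shutdown(socket.SHUT_WR)    # half-close: no more data from us
        while True:
            chunk = client.recv(2048)      # TCP is a byte stream: loop until EOF
            if not chunk:
                break
            chunks.append(chunk)
    finally:
        client.close()
        worker.join()
        listener.close()
    return b"".join(chunks)
```

The read loops are there because TCP delivers a byte stream, not messages: one `sendall()` may arrive as several `recv()` results.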
TCP Connection Management
Establishment/Termination – 3-Way Handshake Protocol
[Figure: message exchange for Establishment (SYN; ACK of SYN + SYN; ACK) and Termination (FIN; ACK of FIN; FIN; ACK).]
TCP State Transition Diagram
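The diagram itself is not reproduced here. As a rough illustration only, the sketch below encodes a subset of the standard TCP state transitions (RFC 793) as a lookup table; the event names such as `active_open` and `recv_syn_ack` are hypothetical labels chosen for this example, and simultaneous open/close and reset handling are left out.

```python
# (state, event) -> next state; a simplified subset of the RFC 793 diagram.
TRANSITIONS = {
    ("CLOSED", "passive_open"): "LISTEN",
    ("CLOSED", "active_open"): "SYN_SENT",         # sends SYN
    ("LISTEN", "recv_syn"): "SYN_RCVD",            # sends SYN + ACK
    ("SYN_SENT", "recv_syn_ack"): "ESTABLISHED",   # sends ACK
    ("SYN_RCVD", "recv_ack"): "ESTABLISHED",
    ("ESTABLISHED", "close"): "FIN_WAIT_1",        # sends FIN (active close)
    ("ESTABLISHED", "recv_fin"): "CLOSE_WAIT",     # sends ACK (passive close)
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv_fin"): "TIME_WAIT",       # sends ACK
    ("CLOSE_WAIT", "close"): "LAST_ACK",           # sends FIN
    ("LAST_ACK", "recv_ack"): "CLOSED",
    ("TIME_WAIT", "timeout_2msl"): "CLOSED",
}


def run(state, events):
    """Apply events in order; raise ValueError on an undefined transition."""
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError("no transition for %r in state %s" % (event, state))
        state = TRANSITIONS[key]
    return state


# Client that opens, then actively closes the connection.
client_events = ["active_open", "recv_syn_ack", "close",
                 "recv_ack", "recv_fin", "timeout_2msl"]
# Server that accepts, then closes after the client's FIN.
server_events = ["passive_open", "recv_syn", "recv_ack",
                 "recv_fin", "close", "recv_ack"]
```

Both `run("CLOSED", client_events)` and `run("CLOSED", server_events)` walk through establishment and termination and end back in `CLOSED`.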
Reliability of Data Transfers
Per-Packet Reliability: Checksum
  In Linux: see the figure below
Per-Flow Reliability: Sequence Number & ACK
  ACK every successfully received data packet
  Protection Against Wrapped Sequence Numbers (PAWS)
Retransmitting Lost Packets
  When to Retransmit Which?
[Figure: checksum computation in Linux over a packet laid out as IP Header | TCP Header | Application Data. Labels in the figure: csum=csum_partial(D,lenD,0); csum=csum_partial(T,lenT,csum); ip_send_check(iph); return 0.]
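To make the per-packet checksum concrete, here is a small user-space sketch of the 16-bit one's-complement Internet checksum. It is not the kernel's `csum_partial`; the names `partial_checksum` and `fold_checksum` are hypothetical stand-ins that mimic the chaining seen in the figure, where the sum over one part of the packet is fed into the sum over the next.

```python
def partial_checksum(data, csum=0):
    """Add the 16-bit big-endian words of data to a running one's-complement sum.

    Chaining is only valid when every chunk except the last has even length,
    because an odd chunk is padded with a zero byte here.
    """
    if len(data) % 2:
        data += b"\x00"
    for i in range(0, len(data), 2):
        csum += (data[i] << 8) | data[i + 1]
        csum = (csum & 0xFFFF) + (csum >> 16)   # end-around carry
    return csum


def fold_checksum(csum):
    """Final step: one's complement of the running sum, as a 16-bit value."""
    return ~csum & 0xFFFF


payload = bytes.fromhex("f4f5f6f7")    # stands in for the application data
header = bytes.fromhex("0001f203")     # stands in for a transport header

# Sum the data first, then continue the same sum over the header.
running = partial_checksum(payload, 0)
running = partial_checksum(header, running)
checksum = fold_checksum(running)
```

A receiver can verify a packet by summing all words including the transmitted checksum: the folded result is zero when no error is detected.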
Retransmitting Lost Packets
Retransmit Which Packet?
  Fast Retransmit
  Towards Better Accuracy: TCP SACK Option
    Further Refinement: FACK (based on SACK)
When to Retransmit?
  Fast Retransmit: same as above
  Retransmission Timeout (RTO)
    Round-Trip Time (RTT) Measurement
    Tradeoff: RTT vs. RTO
    Karn’s Algorithm
    Towards Better RTO: TCP Timestamp Option
Retransmit Which Packet?
Fast Retransmit
  Causes of Duplicate ACKs
    Packet Reordering
    Packet Loss
    Internet Route Change
  TCP Receiver ACKs the First “Hole”
  Triple Duplicate ACKs (TDA)
    4 Same ACKs (ACK field=X)
    TCP Sender Infers Congestion from TDA
    Retransmits the Packet with SeqNum=X
    Halves Its Sending Rate
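[Figure: timeline at the receiver. DATA packets 2, 3, 6, 7, 8 arrive; the receiver returns ACKs 3, 4, 4, 4, 4, repeating ACK=4 because packet 4 is the first hole.]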
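The duplicate-ACK behavior described above can be sketched in a few lines. This is a simplified model in units of packets (real TCP counts bytes), and the function names `receiver_acks` and `fast_retransmit_target` are hypothetical.

```python
def receiver_acks(arrivals, next_expected):
    """Cumulative ACKs: every ACK names the first missing sequence number."""
    received = set()
    acks = []
    for seq in arrivals:
        received.add(seq)
        while next_expected in received:   # advance past everything in order
            next_expected += 1
        acks.append(next_expected)         # still points at the first hole
    return acks


def fast_retransmit_target(acks, dup_threshold=3):
    """Return the sequence number to retransmit once an ACK has been
    duplicated dup_threshold times (4 identical ACKs in total), else None."""
    last, dups = None, 0
    for ack in acks:
        if ack == last:
            dups += 1
            if dups == dup_threshold:
                return ack
        else:
            last, dups = ack, 0
    return None


# Packets 4 and 5 are missing, as in the slide's example.
acks = receiver_acks([2, 3, 6, 7, 8], next_expected=2)
target = fast_retransmit_target(acks)
```

Here `acks` is `[3, 4, 4, 4, 4]` and `target` is `4`: the sender retransmits the packet at the first hole without waiting for a timeout.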
When to Retransmit?
Retransmission TimeOut (RTO)
  Round-Trip Time (RTT) Measurement vs. RTO
    RTT: Varying Dramatically
    Smoothed RTT (SRTT): Exponential Weighted Moving Average
    Mdev: Mean Deviation of RTT
    RTO=SRTT+4*Mdev
  Karn’s Algorithm
    Don’t Update the RTT Estimate (and thus RTO) from Retransmitted Packets: Their ACKs are Ambiguous
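A minimal sketch of the estimator behind RTO=SRTT+4*Mdev follows. The gains 1/8 and 1/4, the initial RTO of 3 seconds, and the initialization Mdev=RTT/2 on the first sample are commonly used values and are assumptions here, not taken from the slides; the class name `RtoEstimator` is hypothetical.

```python
class RtoEstimator:
    """Smoothed RTT, mean deviation, and RTO, with Karn's rule."""

    def __init__(self, alpha=1 / 8, beta=1 / 4, initial_rto=3.0):
        self.alpha = alpha        # gain for SRTT (assumed value)
        self.beta = beta          # gain for Mdev (assumed value)
        self.srtt = None
        self.mdev = None
        self.rto = initial_rto

    def on_ack(self, rtt_sample, retransmitted=False):
        """Feed one RTT measurement (seconds) and return the current RTO."""
        if retransmitted:
            # Karn's rule: the ACK may belong to either transmission,
            # so the sample is ambiguous and is ignored.
            return self.rto
        if self.srtt is None:                 # first measurement
            self.srtt = rtt_sample
            self.mdev = rtt_sample / 2
        else:
            err = rtt_sample - self.srtt      # uses the old SRTT
            self.srtt += self.alpha * err     # exponential weighted moving average
            self.mdev += self.beta * (abs(err) - self.mdev)
        self.rto = self.srtt + 4 * self.mdev
        return self.rto

    def on_timeout(self):
        """Exponential backoff while retransmissions keep timing out."""
        self.rto *= 2
        return self.rto
```

For example, a first sample of 0.1 s gives SRTT=0.1, Mdev=0.05, and RTO=0.3 s; a following sample of 0.2 s moves SRTT to 0.1125 and RTO to 0.3625 s.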
Per-Flow Flow/Congestion Control
How Fast to Send?
  Fast Sender vs. Slow Receiver
    How to Know? Feedback RWND (Receiver Advertised Window) in ACK by Receiver
  Fast Sender vs. Congested Network
    How to Know? Feedback Loss Events by Network
    Re-adjust CWND (Congestion Window)
  How Fast? Satisfy Both: min(RWND, CWND)
Per-Flow Flow/Congestion Control
Sliding Window
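[Figure: sliding window over the sending stream (packets 2, 3, ..., 9, 10). Packets to the left of the window are sent & ACKed; the TCP window, of size min(RWND, CWND), slides to the right as ACKs arrive; packets to its right are to be sent when the window moves. In the network pipe, DATA 6, 7, 8 travel toward the receiver while ACKs with Next=5 and Next=6 return.]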
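As a rough sketch of the sliding-window rule, the function below computes which packets a sender may transmit, given the last cumulatively ACKed packet and the two window limits. Units are packets for readability, the numbers are illustrative, and the name `sendable` is hypothetical.

```python
def sendable(next_seq, last_acked, rwnd, cwnd):
    """Sequence numbers the sender may transmit right now.

    last_acked: highest packet cumulatively acknowledged
    next_seq:   first packet not yet sent
    """
    window = min(rwnd, cwnd)               # satisfy both receiver and network
    window_end = last_acked + window       # highest packet allowed in flight
    return list(range(next_seq, window_end + 1))


# Window covers packets 3..9 (size 7); packets 3..8 are already in flight.
before = sendable(next_seq=9, last_acked=2, rwnd=7, cwnd=10)

# An ACK with Next=6 acknowledges packets up to 5, so the window slides to 6..12.
after = sendable(next_seq=9, last_acked=5, rwnd=7, cwnd=10)
```

Before the ACK only packet 9 may be sent; after it, packets 9 through 12 become eligible, which is how returning ACKs clock out new data.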
Per-Flow Flow/Congestion Control
Opening & Shrinking of Window Size
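[Figure: TCP window of size min(RWND, CWND) over packets 3 to 9, with packet 2 to its left and packet 10 to its right; movements of the window edges are labeled Open, Shrink, and Close.]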
TCP Congestion Control (Linux 2.2.17)
[Figure (a) Window Variation: cwnd, rwnd, and ssth (packets) vs. Time (sec), 0 to 3 sec.]
[Figure (b) Sending bytes: sequence number and acknowledgement (seq_num, KB) vs. Time (sec), 0 to 3 sec.]
Annotated phases: slow-start, congestion avoidance, triple-duplicate ACKs, fast retransmit, fast recovery, pipe limit, ssth reset
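The phases annotated in the figure can be approximated with a per-round-trip update rule. This is a deliberately simplified, Reno-like sketch and not the Linux 2.2.17 implementation: it works in whole packets per RTT, caps slow-start at the threshold, and collapses fast recovery into a single halving step. The name `next_cwnd` and the event labels are hypothetical.

```python
def next_cwnd(cwnd, ssthresh, event):
    """Return (cwnd, ssthresh) in packets after one round trip."""
    if event == "ack":                         # a full window was acknowledged
        if cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)     # slow-start: double per RTT
        else:
            cwnd += 1                          # congestion avoidance: +1 per RTT
    elif event == "tda":                       # triple duplicate ACKs
        ssthresh = max(cwnd // 2, 2)           # threshold reset to half
        cwnd = ssthresh                        # fast retransmit/recovery (simplified)
    elif event == "timeout":
        ssthresh = max(cwnd // 2, 2)
        cwnd = 1                               # restart from slow-start
    else:
        raise ValueError("unknown event: %r" % (event,))
    return cwnd, ssthresh


def trace(events, cwnd=1, ssthresh=16):
    """Window size after each event, starting from the given state."""
    sizes = []
    for event in events:
        cwnd, ssthresh = next_cwnd(cwnd, ssthresh, event)
        sizes.append(cwnd)
    return sizes


history = trace(["ack"] * 6 + ["tda", "ack", "timeout"])
```

With the assumed starting values (cwnd=1, ssthresh=16), `history` is `[2, 4, 8, 16, 17, 18, 9, 10, 1]`: exponential growth, then linear growth, a halving on triple duplicate ACKs, and a reset to one packet on timeout.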
Summary: Properties of TCP
Per-Flow Reliability Through ACKs
Window-based Flow Control
Self-clocking using ACKs
TCP Performance & Enhancement
Interactive Connections
  Side Effect: Silly Window Syndrome (SWS)
    Problem: transactions using small packets
    Solution: Clark & Nagle
Bulk Data Transfers
  Bandwidth Delay Product (BDP)
  Modeling TCP Throughput
Performance of Interactive Connections
Problem: Silly Window Syndrome (SWS)
  Sender transmits small packets
  Receiver advertises small window
Solution
  Sender: send when either
    Data Accumulated to Full-sized Segment
    Data Accumulated to ½ RWND
    Nagle’s Algorithm Disabled/Not Applied
  (Nagle:) Don’t send small packets until there are no un-ACKed packets
    Small RTT: Nagle is rarely used (many small packets)
    Large RTT: Nagle is often used (few large packets)
  Receiver: advertise the window when either
    Buffer available to Full-sized Segment
    Buffer available to ½ of receiver’s buffer space
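The sender and receiver rules above can be written as two small predicates. This is a sketch of the decision logic only, in bytes; the function and parameter names (`sender_may_send`, `receiver_window_to_advertise`, `mss`, `unacked`) are hypothetical, and a real stack also uses timers that are omitted here.

```python
def sender_may_send(buffered, mss, rwnd, unacked, nagle_enabled=True):
    """Decide whether the sender should transmit its buffered data now."""
    if buffered <= 0:
        return False
    if buffered >= mss:           # a full-sized segment has accumulated
        return True
    if buffered >= rwnd / 2:      # at least half the receiver's window
        return True
    if not nagle_enabled:         # Nagle disabled: small packets go out at once
        return True
    return unacked == 0           # Nagle: a small packet only if nothing is in flight


def receiver_window_to_advertise(free_space, mss, buffer_size):
    """Clark's rule: advertise zero until a worthwhile amount of buffer is free."""
    if free_space >= min(mss, buffer_size // 2):
        return free_space
    return 0
```

With `mss=1460` and `rwnd=8192`, a 100-byte write is sent immediately on an idle connection (`unacked=0`) but is held back while earlier data is still unacknowledged. On a short-RTT path ACKs return quickly, so the hold rarely lasts long.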
Performance of Bulk Data Transfers
Bandwidth Delay Product (BDP)
  Horizontal Dimension: Time
  Vertical Dimension: Bandwidth
  Shaded Area: Packet Size
  BDP=pipe size=Bandwidth x Delay
Problem: ACK-Compression when
  2-Way Traffic => Bursty Traffic
  Asymmetric Paths
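As a worked example of BDP=Bandwidth x Delay, the helper below converts a link rate and round-trip time into a pipe size in bytes, and shows the throughput ceiling that a fixed window imposes. The link figures are illustrative assumptions and the function names are hypothetical.

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Pipe size in bytes: bits per second times seconds, divided by 8."""
    return bandwidth_bps * rtt_s / 8


def window_limited_throughput_bps(window_bytes, rtt_s):
    """At most one window can be delivered per round trip."""
    return window_bytes * 8 / rtt_s


pipe = bdp_bytes(10_000_000, 0.1)                     # 10 Mbps link, 100 ms RTT
ceiling = window_limited_throughput_bps(65535, 0.1)   # 64 KB window, same RTT
```

The pipe holds about 125,000 bytes, but a 65,535-byte window can deliver only about 5.2 Mbps over the same path, so the window has to grow to the BDP before the pipe is full.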
Performance of Bulk Data Transfers
Animation: Fill the Pipe Using TCP Congestion Avoidance
Example: Pipe size=6, send from left to right across the pipe
Upper pipe: data
Lower pipe: ACK
Performance of Bulk Data Transfers
What have you observed?
When RTTs are heterogeneous……
Performance of Bulk Data Transfers
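Modeling TCP Throughput
Given RTT t_RTT, segment size s, loss rate p:
\[ T(t_{RTT}, s, p) = \frac{c \cdot s}{t_{RTT} \cdot \sqrt{p}} \]
where c is a constant value
Given additional information: Max Window Size W_m, # delayed ACK b, RTO t_RTO:
\[ T(t_{RTT}, t_{RTO}, s, p) = \min\left( \frac{W_m \cdot s}{t_{RTT}},\ \frac{s}{t_{RTT}\sqrt{\frac{2bp}{3}} + t_{RTO}\,\min\left(1,\ 3\sqrt{\frac{3bp}{8}}\right) p\,(1+32p^2)} \right) \]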
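To get a feel for the two throughput models (the simple inverse-square-root form and the extended form with maximum window, delayed ACKs, and RTO), they can be evaluated numerically as below. The default c=sqrt(3/2) and b=2 are commonly quoted values and are assumptions here; the function names are hypothetical. Results are in bytes per second when s is in bytes and times are in seconds.

```python
import math


def simple_throughput(rtt, s, p, c=math.sqrt(3 / 2)):
    """T = c * s / (rtt * sqrt(p)); c is an assumed constant."""
    return c * s / (rtt * math.sqrt(p))


def full_throughput(rtt, rto, s, p, w_max, b=2):
    """Extended model: window-limited rate vs. loss-limited rate, whichever is lower."""
    loss_term = rtt * math.sqrt(2 * b * p / 3)
    timeout_term = rto * min(1, 3 * math.sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2)
    return min(w_max * s / rtt, s / (loss_term + timeout_term))


# Illustrative values: 100 ms RTT, 1 s RTO, 1460-byte segments.
low_loss = full_throughput(0.1, 1.0, 1460, 1e-9, w_max=10)
one_percent = full_throughput(0.1, 1.0, 1460, 0.01, w_max=100)
two_percent = full_throughput(0.1, 1.0, 1460, 0.02, w_max=100)
```

At negligible loss the window limit W_m*s/t_RTT dominates; as p grows, the loss and timeout terms take over and throughput falls.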
UDP Header Format
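The header figure is not reproduced here. The standard UDP header is 8 bytes: source port, destination port, length (header plus payload), and checksum, each 16 bits in network byte order. A small sketch with hypothetical helper names:

```python
import struct

# Four unsigned 16-bit fields in network (big-endian) byte order.
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_header(src_port, dst_port, payload_len, checksum=0):
    """Pack a UDP header; the length field covers header and payload."""
    return UDP_HEADER.pack(src_port, dst_port,
                           UDP_HEADER.size + payload_len, checksum)


def parse_udp_header(segment):
    """Unpack the first 8 bytes of a UDP segment into a dict."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(segment)
    return {"src_port": src_port, "dst_port": dst_port,
            "length": length, "checksum": checksum}


segment = build_udp_header(5000, 53, payload_len=5) + b"hello"
fields = parse_udp_header(segment)
```

The two port fields carry the port-multiplexing, and the checksum carries the per-packet reliability; there is no sequence number or window field, which is why UDP offers no per-flow reliability or flow control.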
4.4 Socket Programming Interface
Issue: Programming Interface to End-to-End Protocols
Protocol Stack vs. Interfaces
[Figure: protocol stack and its interfaces. Labels: Socket Library; BSD Socket; INET Socket; TCP/UDP; IP, ARP, ICMP, …; NIC Driver; Ethernet.]
Bridging Applications & End-to-End Protocols
socket(domain, type, protocol)
  INET domain: PF_INET
  type
    UDP: SOCK_DGRAM
    TCP: SOCK_STREAM
  Protocol: NULL
Typical Applications:
  telnet
  ftp
  HTTP
Elementary Socket: TCP Client/Server
Elementary Socket: UDP Client/Server
Socket R/W in Linux: Kernel vs. User Space
Bridging Applications to Internetworking Protocols
socket(domain, type, protocol)
  PACKET domain: PF_PACKET
  type: SOCK_DGRAM
  Protocol: NULL
Typical Applications:
  ping
  traceroute
Bridging Applications to Node-to-Node Protocols
socket(domain, type, protocol)
  PACKET domain: PF_PACKET
  type: SOCK_RAW
  Ethernet Encapsulated IP packet: ETH_P_IP
    Or others in “/usr/include/linux/if_ether.h”
Typical Applications:
  Packet sniffers
  Hacking tools
Packet Capturing & Filtering
Capture until what header?
Towards Efficient Packet Filtering: Layered Model
  User-Space Tool: tcpdump
  User-Space Packet Filter: libpcap (portable)
  Kernel-Space Packet Filter: Linux Socket Filter
Linux Socket Filter (since post-2.0 releases)
Similar to BPF (Berkeley Packet Filter)
4.5 Real-time Transport Protocols
Issues: Codec Encapsulation & Path Quality Report
  Data-Plane: Video/Voice Codecs
    Video: H.263…
    Voice: G.729…
  Control-Plane: Delay/Jitter/Loss Report
RFC Standards: RTP & RTCP
  RTP: Data-Plane, Encapsulating the Chosen Codec
  RTCP: Control-Plane, Reporting Delay/Jitter/Loss to Senders
RTP (Real-time Transport Protocol)
Objectives
  Eliminating Packet Reorder & Loss Detection: Sequence #
  Timestamp
  Synchronization Source Identifier
  Contributing Source Identifier
Header Format
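The header figure is not reproduced here. As a sketch, the fixed 12-byte RTP header defined in the RTP specification (version, padding, extension, CSRC count, marker, payload type, sequence number, timestamp, SSRC, followed by up to 15 CSRC identifiers) can be parsed as follows; the function name `parse_rtp_header` and the sample values are hypothetical.

```python
import struct


def parse_rtp_header(packet):
    """Parse the fixed RTP header and the CSRC list from raw bytes."""
    if len(packet) < 12:
        raise ValueError("RTP header needs at least 12 bytes")
    b0, b1, seq, timestamp, ssrc = struct.unpack_from("!BBHII", packet)
    csrc_count = b0 & 0x0F
    if len(packet) < 12 + 4 * csrc_count:
        raise ValueError("packet too short for its CSRC list")
    csrc = []
    if csrc_count:
        csrc = list(struct.unpack_from("!%dI" % csrc_count, packet, 12))
    return {
        "version": b0 >> 6,               # 2 for current RTP
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": csrc_count,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,        # identifies the codec in use
        "sequence": seq,                  # reorder and loss detection
        "timestamp": timestamp,           # playout timing
        "ssrc": ssrc,                     # synchronization source identifier
        "csrc": csrc,                     # contributing source identifiers
    }


# Version 2, no CSRCs, payload type 96, sequence 1, timestamp 160.
sample = bytes([0x80, 0x60]) + struct.pack("!HII", 1, 160, 0x1234)
header = parse_rtp_header(sample)
```

A receiver would use `sequence` to reorder packets and detect gaps, and `timestamp` to schedule playout.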
RTCP (RTP Control Protocol)
Objectives
  Reporting End-to-End Delay
  Reporting Delay Jitter
  Reporting Loss Rate
Pitfalls & Fallacies
Throughput vs. Goodput
  Goodput is effective throughput
Window Size: Packet Count vs. Byte Mode
RSVP vs. RTP vs. RTCP vs. RTSP
  RSVP: signaling protocol
  RTP: transport protocol
  RTCP: control of RTP
  RTSP: initiates and directs the media delivery from media servers
Further Readings
TCP Basics
  RFC 793
  R. Stevens, TCP/IP Illustrated Volume 1, Addison Wesley, 1994.
TCP Versions
  K. Fall and S. Floyd, Simulation-based Comparisons of Tahoe, Reno, and SACK TCP, ACM Computer Communication Review, Vol. 26 No. 3, pp. 5-21, Jul. 1996.
  J. Padhye and S. Floyd, On Inferring TCP Behavior, ACM SIGCOMM'2001, San Diego, USA, Aug. 2001.
Modeling TCP Throughput
  J. Padhye, V. Firoiu, D. Towsley, and J. Kurose, Modeling TCP Throughput: A Simple Model and its Empirical Validation, ACM SIGCOMM'98, Vancouver, British Columbia, Sep. 1998.