Yibo Zhu, Monia Ghobadi, Jitendra Padhye (all Microsoft)
[Figure: TCP throughput (Gbps) vs. message size, 4KB to 4MB]
Small messages: CPU is the bottleneck
Larger messages: ~3 CPU cores are burnt by TCP
[Figure labels: Sender, Receiver]
[Figure: time to transfer 2KB (ms) for TCP, RDMA (read/write), and RDMA (send)]
[Figure: TCP CPU utilization (%) vs. message size, 4KB to 4MB]
RDMA bypasses the host OS stack: frees host CPU, lowers latency
[Diagram: RDMA write. Sender and receiver each have an application, memory, and a NIC with DMA access to memory. The sender allocates Buffer A; the receiver allocates Buffer B.]
Write local buffer at address A to remote buffer at address B
Buffer B is filled
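The one-sided write in the diagram can be mimicked with a toy model. This is only an illustrative sketch in plain Python, not the RDMA verbs API: `ToyHost`, `allocate`, and `rdma_write` are hypothetical names, and "DMA" is reduced to a byte copy. What it tries to show is that the receiver's application allocates Buffer B once and is not involved again when the data lands.

```python
class ToyHost:
    """Hypothetical host: flat memory plus a count of application involvement."""

    def __init__(self, mem_size):
        self.memory = bytearray(mem_size)
        self.app_calls = 0  # how often the application had to run

    def allocate(self, addr, length):
        # The application allocates a buffer once, up front.
        self.app_calls += 1
        return (addr, length)


def rdma_write(sender, local_buf, receiver, remote_buf):
    """Copy the local buffer into the remote buffer, as the two NICs would."""
    (a, n), (b, m) = local_buf, remote_buf
    if n > m:
        raise ValueError("remote buffer too small")
    payload = bytes(sender.memory[a:a + n])  # sender NIC reads Buffer A via DMA
    receiver.memory[b:b + n] = payload       # receiver NIC fills Buffer B via DMA
    # receiver.app_calls is untouched: the remote application is not involved.


sender, receiver = ToyHost(64), ToyHost(64)
buf_a = sender.allocate(addr=0, length=5)
buf_b = receiver.allocate(addr=16, length=5)
sender.memory[0:5] = b"hello"
rdma_write(sender, buf_a, receiver, buf_b)
print(bytes(receiver.memory[16:21]), receiver.app_calls)
```

In this model the receiver's `app_calls` stays at 1 (the initial allocation) no matter how many writes follow, which is the toy analogue of the CPU bypass the slide describes.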
RDMA single thread: ~40Gbps
RDMA CPU utilization: ~0%
RDMA latency: 1~2 μs
[Figure: time to transfer 2KB (ms) for TCP, RDMA (read/write), and RDMA (send)]
[Figure: CPU utilization (%) vs. message size, 4KB to 4MB, for TCP and RDMA]
[Figure: throughput (Gbps) vs. message size, 4KB to 4MB, for TCP and RDMA]
• Solution:
• Problem
Enter DCQCN and TIMELY: Congestion Control for RoCEv2
DCQCN: ECN-based
TIMELY: delay-based
Takeaway:
DCQCN is a little too complicated
DCQCN model matches simulations and implementation
TIMELY model matches simulations
• Stability
• Rate of convergence
• Fairness
• High utilization
• Low flow completion time
We don’t have an intuitive explanation
Load factor = 0.8
• Feedback is delayed as queue builds up
[Diagram: ECN marking, marking threshold = 4 packets. Queue snapshots: T0, Q = 2; T1, Q = 3; T2, Q = 4.]
Blue packet is about to arrive
Blue packet arrival complete
Blue packet ready to depart … and is marked, reflecting state of queue at T2
[Diagram: delay measurement. Queue snapshots: T0, Q = 2; T1, Q = 3; T2, Q = 4.]
Blue packet is about to arrive
Blue packet arrival complete … timer starts
Blue packet ready to depart … and reflects state of queue at T0
• Delay inherently reports “stale” information
• The staleness is affected by queue length!
• Longer queue → more stale feedback
• This can lead to instability
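The staleness argument above can be put into numbers with a back-of-the-envelope model. This is a sketch under assumptions that are not in the slides: a FIFO queue, a fixed packet size, a fixed return-path latency, and ECN marking applied when the packet departs. The function name `feedback_age_us` and all default parameter values are hypothetical.

```python
def feedback_age_us(queue_pkts, pkt_bytes=1000, link_gbps=10.0, return_path_us=5.0):
    """Age of the congestion signal when it reaches the sender, in microseconds.

    Returns (ecn_age, delay_age) for a packet that found `queue_pkts`
    packets ahead of it in the bottleneck queue.
    """
    # 1 Gbps = 1000 bits per microsecond, so this is the time to drain one packet.
    service_us = pkt_bytes * 8 / (link_gbps * 1000.0)
    queueing_us = queue_pkts * service_us  # wait behind the queued packets

    # ECN: the mark is set at departure, so it describes the queue at that
    # moment and only ages on the way back to the sender.
    ecn_age = return_path_us
    # Delay: the measured delay describes the queue the packet found on
    # arrival, so it has already aged by the queueing time at departure.
    delay_age = queueing_us + return_path_us
    return ecn_age, delay_age


for q in (2, 20, 200):
    ecn, delay = feedback_age_us(q)
    print(f"Q={q:3d} pkts  ECN age={ecn:6.1f} us  delay age={delay:6.1f} us")
```

Under these assumptions the ECN signal's age does not depend on the queue length, while the delay signal's age grows linearly with it, which is the coupling between queue length and staleness that the bullets point to.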
• Can have fixed queue or fairness – but not both!
Bottleneck queue is a function of number of flows.
[Figure panels: DCQCN (40Gbps link), TIMELY (10Gbps link)]
DCQCN with RED-like marking
DCQCN with PI-like marking
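The difference between the two marking schemes can be illustrated with a toy feedback loop. This is a sketch, not the DCQCN model from the talk: the plant in `settle` (the queue grows while the marking probability is below some required value `p_needed`) is an assumption made for illustration, `p_needed` is a hypothetical stand-in for the dependence on the number of flows, and every threshold and gain below is made up.

```python
def red_mark_prob(q_kb, k_min=5.0, k_max=200.0, p_max=0.5):
    """RED-like marking: probability is a fixed function of the queue length."""
    if q_kb <= k_min:
        return 0.0
    if q_kb >= k_max:
        return 1.0
    return p_max * (q_kb - k_min) / (k_max - k_min)


class PIMarker:
    """PI-like marking: probability integrates the error against a reference queue."""

    def __init__(self, q_ref_kb=50.0, kp=0.005, ki=0.0005):
        self.q_ref, self.kp, self.ki = q_ref_kb, kp, ki
        self.p, self.prev_err = 0.0, None

    def update(self, q_kb):
        err = q_kb - self.q_ref
        if self.prev_err is None:
            self.prev_err = err
        # Velocity-form PI: proportional on the change in error, integral on the error.
        self.p += self.kp * (err - self.prev_err) + self.ki * err
        self.p = min(1.0, max(0.0, self.p))
        self.prev_err = err
        return self.p


def settle(mark, p_needed, gain_kb=100.0, steps=2000):
    """Toy plant (assumed): the queue grows while marking is below p_needed."""
    q = 0.0
    for _ in range(steps):
        q = max(0.0, q + gain_kb * (p_needed - mark(q)))
    return q


for p_needed in (0.1, 0.4):  # stand-in for "few flows" vs. "many flows"
    q_red = settle(red_mark_prob, p_needed)
    q_pi = settle(PIMarker().update, p_needed)
    print(f"p_needed={p_needed}: RED-like queue={q_red:.1f} KB, PI-like queue={q_pi:.1f} KB")
```

In this toy loop the RED-like marker settles at a different queue length for each `p_needed`, because the only way to mark more is to hold a longer queue. The PI-like marker settles at its reference queue in both cases, because its probability keeps moving until the queue error is zero.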
• Can have fixed queue or fairness – but not both!
• ECN marking is resistant to feedback jitter
[Figure: queue (KB) vs. time (s), 0 to 0.2 s, for TIMELY and DCQCN]