
Yibo Zhu, Monia Ghobadi, Jitendra Padhye (all Microsoft)

Date post: 12-Mar-2020
Page 2

Yibo Zhu, Monia Ghobadi, Jitendra Padhye (all Microsoft)

Page 4

[Figure: Throughput (Gbps) vs. message size (4KB to 4MB), TCP]

Small messages: the CPU is the bottleneck. Larger messages: ~3 CPU cores are burnt by TCP.

[Diagram: Sender and Receiver]

[Figure: Time to transfer 2KB (ms) for TCP, RDMA (read/write), and RDMA (send)]

[Figure: CPU utilization (%) vs. message size (4KB to 4MB), TCP]
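One way to see why small messages make the CPU the bottleneck is to compute how little time the sender has per message if it is to keep the link full. The sketch below assumes a 40 Gbps link (the top of the throughput axis) and ignores headers and framing; `per_message_budget_ns` is a hypothetical helper, not part of any measured setup.

```python
def per_message_budget_ns(msg_bytes: int, link_gbps: float) -> float:
    """Time (ns) available to process one message while keeping the link full.

    Ignores protocol headers and framing, so the real budget is slightly smaller.
    """
    bits = msg_bytes * 8
    return bits / link_gbps  # 1 Gbit/s = 1 bit/ns, so bits / Gbps gives ns


if __name__ == "__main__":
    sizes_kb = (4, 16, 64, 256, 1024, 4096)  # the message sizes on the x-axis
    for kb in sizes_kb:
        budget = per_message_budget_ns(kb * 1024, 40)
        print(f"{kb:>5} KB -> {budget:>12.1f} ns per message")
```

At 4 KB the budget is about 0.82 μs per message, so any fixed per-message software cost quickly dominates; at 4 MB it grows to roughly 0.84 ms, and per-byte costs take over instead.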

Page 5

RDMA bypasses the host OS stack: it frees the host CPU and lowers latency.

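[Diagram: On each side, the application allocates a buffer in memory (Buffer A at the sender, Buffer B at the receiver). The sender application requests "write local buffer at address A to remote buffer at address B"; the sender and receiver NICs move the data by DMA, and Buffer B is filled.]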
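To make the one-sided write in the diagram concrete, here is a toy Python model. It is purely illustrative: `Host`, `nic_dma_read`, `nic_dma_write`, `rdma_write`, and the `cpu_ops` counter are hypothetical names, not a real verbs API. It shows the property the slide emphasizes: the sender application only posts a request, the NICs move the data by DMA, and no code runs on the receiver's CPU.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Toy host: flat byte-addressable memory, a NIC that can DMA into it,
    and a counter of work done by the host CPU (a hypothetical metric)."""
    name: str
    memory: bytearray = field(default_factory=lambda: bytearray(64))
    cpu_ops: int = 0

    def nic_dma_read(self, addr: int, length: int) -> bytes:
        # The NIC reads directly from memory; the CPU is not involved.
        self._check(addr, length)
        return bytes(self.memory[addr:addr + length])

    def nic_dma_write(self, addr: int, data: bytes) -> None:
        # The NIC places bytes directly into memory; the CPU is not involved.
        self._check(addr, len(data))
        self.memory[addr:addr + len(data)] = data

    def _check(self, addr: int, length: int) -> None:
        if addr < 0 or addr + length > len(self.memory):
            raise ValueError(f"{self.name}: access outside allocated memory")


def rdma_write(sender: Host, local_addr: int, receiver: Host,
               remote_addr: int, length: int) -> None:
    """One-sided write: sender[local_addr:+length] -> receiver[remote_addr:+length]."""
    sender.cpu_ops += 1                                # application posts one request
    payload = sender.nic_dma_read(local_addr, length)  # sender NIC: DMA from Buffer A
    receiver.nic_dma_write(remote_addr, payload)       # receiver NIC: DMA into Buffer B


if __name__ == "__main__":
    sender, receiver = Host("sender"), Host("receiver")
    sender.memory[0:5] = b"hello"           # Buffer A at address 0
    rdma_write(sender, 0, receiver, 16, 5)  # Buffer B at address 16
    print(bytes(receiver.memory[16:21]), receiver.cpu_ops)
```

The wire transfer between the two NICs is abstracted away; the point is only where work happens, not how a real RDMA NIC is programmed.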

Page 6

RDMA: a single thread reaches ~40 Gbps with ~0% CPU utilization.

RDMA latency: 1 to 2 μs.

[Figure: Time to transfer 2KB (ms) for TCP, RDMA (read/write), and RDMA (send)]

[Figure: CPU utilization (%) vs. message size (4KB to 4MB), TCP and RDMA]

[Figure: Throughput (Gbps) vs. message size (4KB to 4MB), TCP and RDMA]

Page 7

• Solution:

• Problem


Page 8

Enter DCQCN and TIMELY: Congestion Control for RoCEv2

DCQCN: ECN-based

TIMELY: Delay-based
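To illustrate what "delay-based" means in practice, here is a simplified TIMELY-style sender, modeled on the RTT-gradient rule described in the original TIMELY proposal. It is a sketch, not the deployed algorithm: hyperactive increase (HAI) is omitted, and every threshold and gain below is an illustrative assumption. The key idea is that the sender reacts to the *change* in RTT, normalized by the base RTT, with hard thresholds `t_low` and `t_high` as guard rails.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TimelySender:
    """Simplified TIMELY-style sender: the rate reacts to RTT samples.

    Times are in microseconds, rates in Gbps. Parameter values are
    illustrative assumptions; hyperactive increase (HAI) is omitted.
    """
    rate: float = 10.0
    line_rate: float = 10.0
    min_rate: float = 0.01
    min_rtt: float = 20.0     # base RTT used to normalize the gradient
    t_low: float = 50.0       # below this RTT: always increase
    t_high: float = 500.0     # above this RTT: always decrease
    alpha: float = 0.875      # EWMA weight given to the newest RTT difference
    beta: float = 0.8         # multiplicative-decrease gain
    delta: float = 0.01       # additive-increase step (Gbps)
    prev_rtt: Optional[float] = None
    rtt_diff: float = 0.0     # smoothed RTT difference

    def on_rtt_sample(self, new_rtt: float) -> float:
        """Update and return the sending rate after one RTT sample."""
        if self.prev_rtt is None:  # first sample: nothing to differentiate yet
            self.prev_rtt = new_rtt
            return self.rate
        new_rtt_diff = new_rtt - self.prev_rtt
        self.prev_rtt = new_rtt
        self.rtt_diff = (1 - self.alpha) * self.rtt_diff + self.alpha * new_rtt_diff
        gradient = self.rtt_diff / self.min_rtt  # normalized RTT gradient

        if new_rtt < self.t_low:
            self.rate += self.delta
        elif new_rtt > self.t_high:
            self.rate *= 1 - self.beta * (1 - self.t_high / new_rtt)
        elif gradient <= 0:
            self.rate += self.delta  # RTT flat or falling: probe upward
        else:
            self.rate *= 1 - self.beta * gradient  # RTT rising: back off
        self.rate = min(self.line_rate, max(self.min_rate, self.rate))
        return self.rate
```

For example, with `rate=5.0`, RTT samples of 100 μs then 102 μs give a positive gradient inside the `[t_low, t_high]` band, so the rate drops to about 4.65 Gbps. An ECN-based scheme such as DCQCN reacts to explicit switch marks instead of RTT samples.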

Page 11

Takeaway:

DCQCN is a little too complicated
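To give a sense of the moving parts behind this takeaway, the sketch below models only the sender side (the reaction point) of DCQCN, loosely following the rules of the original DCQCN proposal. It is simplified and hypothetical: hyper increase and the byte counter are omitted, and all parameter values are illustrative. Even so, it already carries several interacting state variables (`alpha`, current rate `rc`, target rate `rt`, an increase-event counter) driven by CNP arrivals and two separate timers.

```python
from dataclasses import dataclass


@dataclass
class DcqcnReactionPoint:
    """Simplified DCQCN-style sender (reaction point). Rates in Gbps.

    Omits hyper increase and the byte counter; parameters are illustrative.
    """
    line_rate: float = 40.0
    g: float = 1 / 256         # EWMA gain for alpha
    r_ai: float = 0.04         # additive-increase step for the target rate
    f: int = 5                 # fast-recovery steps after each rate cut
    alpha: float = 1.0         # estimate of how congested the path is
    rc: float = 40.0           # current sending rate
    rt: float = 40.0           # target rate to recover towards
    increase_events: int = 0

    def on_cnp(self) -> None:
        """Congestion Notification Packet received: cut the rate in proportion to alpha."""
        self.rt = self.rc
        self.rc *= 1 - self.alpha / 2
        self.alpha = (1 - self.g) * self.alpha + self.g
        self.increase_events = 0

    def on_alpha_timer(self) -> None:
        """No CNP arrived during the last alpha-update period: decay alpha."""
        self.alpha *= 1 - self.g

    def on_increase_event(self) -> None:
        """Rate-increase timer fired: fast recovery first, then additive increase."""
        self.increase_events += 1
        if self.increase_events > self.f:
            self.rt = min(self.line_rate, self.rt + self.r_ai)
        self.rc = (self.rt + self.rc) / 2  # move halfway towards the target
```

After one CNP at line rate with `alpha = 1`, the rate halves to 20 Gbps; the first increase event brings it back to 30 Gbps. Tuning `g`, `r_ai`, `f`, and the timer periods jointly is where the complexity shows up.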

Page 12

DCQCN model matches simulations and implementation

TIMELY model matches simulations

Page 13

• Stability

• Rate of convergence

• Fairness

• High utilization

• Low flow completion time
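Of the criteria above, fairness is often quantified with Jain's fairness index. The slides do not say which fairness metric was used, so this is one common choice, not necessarily the authors'; `jain_fairness` is a hypothetical helper name.

```python
from typing import Sequence


def jain_fairness(throughputs: Sequence[float]) -> float:
    """Jain's index: (sum x)^2 / (n * sum x^2).

    Equals 1.0 when all flows get the same throughput and 1/n when a
    single flow gets everything.
    """
    n = len(throughputs)
    if n == 0:
        raise ValueError("need at least one flow")
    total = sum(throughputs)
    squares = sum(x * x for x in throughputs)
    if squares == 0:
        return 1.0  # all flows idle: treat as equal shares
    return total * total / (n * squares)
```

For four flows sharing 40 Gbps, equal 10 Gbps shares score 1.0, while one flow taking all 40 Gbps scores 0.25.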

Page 14

We don’t have an intuitive explanation

Page 16

Load factor = 0.8
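The slides give the load factor but not how it was generated. A common approach in flow-completion-time experiments, assumed here, is Poisson flow arrivals whose rate is chosen so that the average offered load equals the load factor times link capacity: λ = ρ·C / (8·E[flow size in bytes]). The 40 Gbps link and 1 MB mean flow size in the example are hypothetical.

```python
import random
from typing import List


def arrival_rate(load: float, link_gbps: float, mean_flow_bytes: float) -> float:
    """Flows per second so that the average offered load equals load * capacity."""
    return load * link_gbps * 1e9 / (mean_flow_bytes * 8)


def poisson_arrivals(rate_per_s: float, duration_s: float, seed: int = 0) -> List[float]:
    """Flow start times (s) of a Poisson process: i.i.d. exponential gaps."""
    rng = random.Random(seed)
    times: List[float] = []
    t = rng.expovariate(rate_per_s)
    while t < duration_s:
        times.append(t)
        t += rng.expovariate(rate_per_s)
    return times


if __name__ == "__main__":
    lam = arrival_rate(load=0.8, link_gbps=40, mean_flow_bytes=1e6)
    starts = poisson_arrivals(lam, duration_s=0.1)
    print(f"lambda = {lam:.0f} flows/s, {len(starts)} flows in 0.1 s")
```

With these numbers λ = 4000 flows/s, i.e. about 32 Gbps of offered load on average, with bursts above and below that.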

Page 18

• Feedback is delayed as queue builds up

Page 19

[Diagram: switch queue with marking threshold = 4 packets; snapshots at T0, Q = 2; T1, Q = 3; T2, Q = 4]

Blue packet is about to arrive.

Blue packet arrival complete.

Blue packet ready to depart … and is marked, reflecting the state of the queue at T2.

Page 20

[Diagram: switch queue with snapshots at T0, Q = 2; T1, Q = 3; T2, Q = 4]

Blue packet is about to arrive.

Blue packet arrival complete … timer starts.

Blue packet ready to depart … and reflects the state of the queue at T0.
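The contrast between the two diagrams can be reproduced with a toy slotted FIFO queue. This is a hypothetical model (one packet served per slot, made-up arrival pattern): the blue packet's queueing delay is fixed by the queue it found on arrival, whereas a dequeue-time ECN mark looks at the queue when the packet leaves.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def simulate(arrivals_per_slot: List[int], tagged_slot: int, mark_threshold: int) -> Dict[str, int]:
    """Toy FIFO switch queue that serves one packet per time slot.

    In each slot, new packets join the tail first, then the head packet departs.
    The first packet arriving in `tagged_slot` is the "blue" packet.
    """
    queue: Deque[Tuple[int, bool]] = deque()  # (arrival slot, is_blue)
    result: Dict[str, int] = {}
    for slot, n_arrivals in enumerate(arrivals_per_slot):
        for i in range(n_arrivals):
            is_blue = slot == tagged_slot and i == 0
            if is_blue:
                result["queue_at_arrival"] = len(queue)  # packets ahead of blue
            queue.append((slot, is_blue))
        if queue:
            arrival_slot, is_blue = queue.popleft()
            if is_blue:
                # Delay signal: time spent queued, set by the queue found on arrival.
                result["delay_slots"] = slot - arrival_slot
                # ECN signal: decided at dequeue from the queue length right now.
                result["queue_at_departure"] = len(queue)
                result["ecn_marked"] = int(len(queue) >= mark_threshold)
                return result
    raise ValueError("blue packet did not depart within the simulated slots")


if __name__ == "__main__":
    print(simulate([3, 1, 2, 2, 2, 0, 0, 0], tagged_slot=1, mark_threshold=4))
```

With this arrival pattern the blue packet finds 2 packets ahead of it, waits 2 slots, and departs with 4 packets behind it, so it is marked: the mark describes the queue at departure, while the delay it carries describes the queue at arrival.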

Page 21

• Delay inherently reports “stale” information

• The staleness is affected by queue length!

• Longer queue means more stale feedback

• This can lead to instability
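The staleness argument can be put in numbers with a back-of-the-envelope calculation, assuming a FIFO queue drained at the link rate C: a packet that finds Q bytes ahead of it waits about 8·Q/C before departing, so the delay it carries describes the queue as it was that long ago, before even adding the return path to the sender. The queue sizes and the 40 Gbps rate below are illustrative; `feedback_age_us` is a hypothetical helper.

```python
def feedback_age_us(queue_bytes: float, link_gbps: float) -> float:
    """Age (in microseconds) of the queue state a delay sample carries when the packet leaves.

    Assumes a FIFO queue drained at link_gbps: a packet that found queue_bytes
    ahead of it waits queue_bytes * 8 / C before departing.
    """
    return queue_bytes * 8 / (link_gbps * 1e3)  # bits / Gbps = ns; / 1e3 -> us


if __name__ == "__main__":
    for q_kb in (10, 50, 100):
        age = feedback_age_us(q_kb * 1e3, 40)
        print(f"{q_kb:>4} KB queue at 40 Gbps -> feedback is {age:.1f} us old")
```

The age grows linearly with the queue: 10 KB gives 2 μs, 100 KB gives 20 μs. The feedback delay therefore depends on the very quantity the controller is trying to regulate.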

Page 22

• Can have fixed queue or fairness – but not both!

Page 23

Bottleneck queue length is a function of the number of flows.

[Figure panels: DCQCN (40 Gbps link); TIMELY (10 Gbps link)]

Page 25

DCQCN with RED-like marking

DCQCN with PI-like marking
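The difference between the two marking schemes can be sketched as follows (a minimal illustration, not the configuration used in the experiments). With RED-like marking, as in DCQCN's switch-side marking between K_min and K_max, the marking probability is a fixed function of queue length, so the queue must settle wherever it yields the marking rate the flows need; because that rate depends on the number of flows, so does the queue. A PI controller, in the style of the PI AQM of Hollot et al., instead integrates the queue error and keeps adjusting p until the queue sits at `q_ref`. The thresholds, gains, and `q_ref` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


def red_like_mark_prob(q: float, k_min: float, k_max: float, p_max: float) -> float:
    """Marking probability as a fixed, increasing function of queue length."""
    if q <= k_min:
        return 0.0
    if q > k_max:
        return 1.0
    return (q - k_min) / (k_max - k_min) * p_max


@dataclass
class PiMarker:
    """PI-controller marking: p keeps changing until the queue settles at q_ref."""
    q_ref: float                    # target queue length
    a: float                        # gain on the current queue error
    b: float                        # gain on the previous queue error (b < a)
    p: float = 0.0
    q_prev: Optional[float] = None

    def update(self, q: float) -> float:
        """Call once per sampling interval with the current queue length."""
        prev = q if self.q_prev is None else self.q_prev
        self.p += self.a * (q - self.q_ref) - self.b * (prev - self.q_ref)
        self.p = min(1.0, max(0.0, self.p))  # keep p a valid probability
        self.q_prev = q
        return self.p
```

Holding the queue above `q_ref` makes the PI marker's p climb step after step, whereas the RED-like function returns the same p for the same queue length no matter how long the queue stays there.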

Page 28

• Can have fixed queue or fairness – but not both!

• ECN marking is resistant to feedback jitter

Page 29

[Figure: Queue (KB) vs. time (s), TIMELY and DCQCN]
