Page 1: Analyzing InfiniBand Packets - OpenFabrics · OpenFabrics Software User Group Workshop Analyzing InfiniBand Packets Qian Liu QGA2@unh.edu Advisor: Professor Robert D. Russell University

OpenFabrics Software User Group Workshop

Analyzing InfiniBand Packets

Qian Liu

QGA2@unh.edu

Advisor: Professor Robert D. Russell

University of New Hampshire

Presentation Overview

• 1. Why analyze IB packets

• 2. How to capture IB packets

• 3. Comparison of IB capture tools

• 4. Our use of the tools to analyze packets

March 19, 2015 #OFAUserGroup 2

1. Why analyze IB packets

• Protocol study, debug, verification, and research

• Monitor IB network performance

• Analyze inter-packet delay (IPD)

• Observe Flow Control and Congestion Control

2. How to capture IB packets

• ibdump: software package running on nodes

http://www.mellanox.com/

• CatC analyzer: hardware box inline between ports

http://www.teledynelecroy.com/

2. How to capture IB packets

• ibdump features

– Software package freely available from Mellanox Technologies

http://downloads.linux.hp.com/downloads/MLNX_OFED/suse/SLES11-SP2/x86_64/2.2_1.0.1/ibdump-2.0.0-8.x86_64.rpm

– Requires NO physical change to the network

– Runs on an IB host and captures packets on an IB interface of that host

– Works for all IB data rates: SDR, DDR, QDR, FDR10, FDR

– Dumps a .pcap file which can be loaded by Wireshark http://www.wireshark.com/

Wireshark view of ibdump capture

ibdump

• ibdump limitations

– Cannot capture Flow Control Packets (FCP)

– Packets may get lost if the data rate is high, e.g. FDR (56 Gbit/s)

– Works only on Mellanox HCAs

– Cannot capture traffic between switches, because it is software running on host nodes

– Maximum capture size depends on the available host RAM or disk space

– Inaccurate packet timestamps (microsecond resolution), as shown on the next slide

Inaccurate microsecond timestamps in ibdump

2. How to capture IB packets

• CatC analyzer features

– Hardware analyzer from LeCroy

https://www.teledynelecroy.com

– Must be physically placed into an IB link between two IB ports

– Dumps an .ibt file which can be loaded by its IBTracer software

– Works only at the SDR (8 Gbit/s) data rate

– Works with any type of IB HCA and switch

– Accurate packet timestamps (nanosecond resolution)

– Captures ALL packets on the link, including Flow Control Packets (FCP)

CatC analyzer

• Captures packets passing through it in both directions

CatC analyzer Capture

CatC analyzer

• CatC analyzer limitations

– Works only at the SDR (8 Gbit/s) data rate

– 2GB recording capacity

– Does not dump in .pcap format, so its capture files cannot be opened in Wireshark

3. Comparison between ibdump & CatC analyzer captures

First Experiment

One data source sends 128 MiB (MTU = 2k, 65536 packets) to the receiver, using RDMA_WRITE, via a MLNX SX6036 switch.

Because there is no competing flow, there should be no congestion on the link.

ibdump runs on both sides at the same time.

3. Comparison between ibdump & CatC analyzer captures

First Experiment

When data packets are transferred on an SDR (8 Gbit/s) link with no congestion, and each data packet carries a 2048-byte payload (MTU is 2k), the inter-packet time should be around:

2048 bytes * 8 bits/byte / (8 Gbit/s) ≈ 2 us
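The estimate above can be reproduced with a short Python sketch, which also shows how measured inter-packet delays could be derived from capture timestamps for comparison. The function names and the sample timestamps are illustrative assumptions, not taken from the slides, and the calculation ignores header and CRC overhead just as the slide does.

```python
def expected_ipd_us(payload_bytes, data_rate_bps):
    """Time to serialize one payload onto the link, in microseconds."""
    return payload_bytes * 8 / data_rate_bps * 1e6


def inter_packet_delays(timestamps):
    """Differences between consecutive capture timestamps (same unit as input)."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


# 2048-byte payload on an SDR link carrying 8 Gbit/s of data: about 2 us.
print(round(expected_ipd_us(2048, 8e9), 3))

# Hypothetical capture timestamps in microseconds, for illustration only.
print(inter_packet_delays([0, 2, 4, 7]))
```

Comparing the measured delays against the expected value is one way to judge how trustworthy a capture tool's timestamps are.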

3. Comparison between ibdump & CatC captures on the receive side

First Experiment

3. ibdump_receiver raw data

First Experiment

Comparison of CatC analyzer captures on both sides

First Experiment

Comparison of ibdump captures on both sides

First Experiment

ibdump_sender and ibdump_receiver raw data

First Experiment

3. Comparison between ibdump & CatC analyzer captures

Second Experiment

Two data sources each send 128 MiB, using RDMA_WRITE, to a single receiver via a MLNX SX6036 switch.

Because the two flows share the receiver's link, the expected inter-packet interval from the same source is about 4 us

Comparison of two sender flows on CatC receive side

Second Experiment

Comparison of ibdump sender 1 flow on both sides

Second Experiment

ibdump sender 1 flow raw data on both sides

Second Experiment

4. Our use of the tools to analyze packets

• 4.1 Flow Control mechanism

• 4.2 Study of the switch buffer size

• 4.3 Study of the tick value

4.1 Flow Control Mechanism

• InfiniBand – Link Layer Flow Control (FC) mechanism

• An IB sender will NOT send data packets unless it knows for sure that the other side of the physical link has enough buffer space to hold the data

• Flow Control Packets (FCPs) are used to report the available buffer space

• Of the two tools, only the CatC analyzer can capture FCPs

4.1 Flow Control Mechanism

• FCP format

• If A sends an FCP to B, then

– FCTBS: total blocks A has sent to B since link initialization

– FCCL: the sum of the total blocks A has received from B, plus the available buffer space in A’s receive buffer

– Both values increase monotonically, modulo 4096

– One block is 64 bytes of buffer space

4.1 Flow Control Mechanism

• Experiment:

– A sender is sending 128Mi bytes of data to a receiver, using RDMA_WRITE

– MTU = 2k, 65536 data packets

– Each packet is at least 2048 + 8 + 12 + 6 = 2074 bytes.

– Each packet occupies ceil(2074 / 64) = 33 FC blocks
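The block count follows from rounding the packet size up to whole 64-byte blocks. A minimal Python sketch of that arithmetic, where the function and constant names are illustrative assumptions:

```python
FC_BLOCK_BYTES = 64  # one flow-control block, as stated on the FCP format slide


def fc_blocks(packet_bytes, block_bytes=FC_BLOCK_BYTES):
    """Number of whole flow-control blocks needed to hold a packet (rounded up)."""
    return -(-packet_bytes // block_bytes)  # ceiling division on integers


# Payload plus the header and CRC bytes listed on the slide.
packet_bytes = 2048 + 8 + 12 + 6
print(packet_bytes, fc_blocks(packet_bytes))
```

2074 bytes is slightly more than 32 blocks (2048 bytes), so the partial block rounds the count up to 33.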

4.1 Flow Control Mechanism

• Starting FCCL/FCTBS before A (Tx) sends data packets to B (Rx)

A has sent 1404 blocks to B

A receives an FCP from B, in which the FCCL value is 3206

3206 = total blocks B has received from A + the available receive buffer space in B

3206 – 1404 = 1802 >> 33, so A is able to send a data packet
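The comparison on this slide can be sketched in Python as follows. This is a simplified illustration of the slide's arithmetic with wrap-around handling added; it is not the exact rule from the InfiniBand specification, and the function names are assumptions.

```python
FC_MODULUS = 4096  # FCTBS and FCCL wrap modulo 4096, per the FCP format slide


def available_blocks(fccl, fctbs):
    """Blocks the sender may still transmit, allowing for counter wrap-around."""
    return (fccl - fctbs) % FC_MODULUS


def can_send(fccl, fctbs, packet_blocks):
    """Simplified check: is there room for one packet of packet_blocks blocks?"""
    return available_blocks(fccl, fctbs) >= packet_blocks


# Values from the slide: FCCL = 3206 reported by B, A has sent 1404 blocks.
print(available_blocks(3206, 1404), can_send(3206, 1404, 33))
```

Taking the difference modulo 4096 keeps the check meaningful after either counter wraps.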

4.1 Flow Control Mechanism

• An FCCL value update means that one or more blocks have been released in B’s receive buffer

4.1 Flow Control Mechanism

• FCTBS value update

4.1 Flow Control Mechanism

– Before A sends data packets to B, the starting FCTBS value is 1404

……………………………

– The latest FCTBS value is 2262

– (2262 - 1404) / 33 = 26 data packets have been sent from A to B
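The same count can be derived from any two FCTBS samples. The sketch below is a hypothetical helper, not part of either capture tool; it is only valid when fewer than 4096 blocks were sent between the two samples, because the counter wraps modulo 4096.

```python
def packets_sent(fctbs_start, fctbs_now, blocks_per_packet, modulus=4096):
    """Whole packets sent between two FCTBS samples (at most one wrap assumed)."""
    blocks = (fctbs_now - fctbs_start) % modulus
    return blocks // blocks_per_packet


# Values from the slide: FCTBS went from 1404 to 2262 with 33 blocks per packet.
print(packets_sent(1404, 2262, 33))
```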

4.2 Study of switch buffer size

• Object:

MLNX SX6036 FDR switch

Use the CatC analyzer to determine the switch buffer size

Assumptions:

1. input-queued switch

2. shared buffer per port, divided by the available Virtual Lanes (VLs)

4.2 Study of switch buffer size

The buffer size is an indicator of the latency a program may experience

Two senders, SDR 1 and SDR 2, send data to one SDR receiver (SDR R)

MTU 2k, data transmission is on VL0 (SDR 2 starts later than SDR 1)

1. At the very beginning, each SDR sender can inject a packet every 2 us

2. When congestion occurs, each SDR sender can inject a packet only every 4 us

4.2 Study of switch buffer size

• Buffer space on each port is not full

• Packets can be injected at 2 us intervals

[Figure: SDR 1, SDR 2, and SDR R connected through the switch, with two VL buffers]

4.2 Study of switch buffer size

• Buffer space on each port is full

• Senders have to wait until there is enough buffer space on the switch port to hold the data packets

[Figure: SDR 1, SDR 2, and SDR R connected through the switch, with two VL buffers]

4.2 Study of switch buffer size

A2: The first data packet of SDR 2 (SDR 2 is started later than SDR 1)

B1: The first SDR 1 data packet whose inter-packet interval on its sending side is 4us

4.2 Study of switch buffer size

On the Mellanox SX6036 switch, counting the green packets in the 2nd phase gives a switch input VL buffer space of around 32 KiB.

With a configuration of 4 VLs, that is 4 * 32 KiB = 128 KiB for each input port

4.3 Study of tick value

Congestion Indicator (counter) PortXmitWait:

Port counter that is used to indicate the "number of ticks during which selected port had data to transmit but none was sent during the entire tick either because of insufficient credits or due to lack of arbitration"

4.3 Study of tick value

PortXmitWait: what is the tick?

The tick is the node’s sampling clock interval:

tick = encoding value * symbol time

• symbol time: the time required to transmit an 8-bit data quantity onto a physical lane (4 ns for SDR)

• encoding value: a multiple of the symbol time, from 1 to 256

# perfquery -c LID Port_Number
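The tick definition above amounts to a single multiplication. A minimal Python sketch, assuming the SDR symbol time of 4 ns as the default; the function name and the sample encoding values are illustrative, and a link running at a different rate would need a different symbol time.

```python
def tick_ns(encoding_value, symbol_time_ns=4):
    """Tick length in nanoseconds: encoding value times symbol time."""
    if not 1 <= encoding_value <= 256:
        raise ValueError("encoding value must be in the range 1..256")
    return encoding_value * symbol_time_ns


# Example encoding values chosen for illustration, using the SDR symbol time.
for encoding in (1, 0x1F, 256):
    print(encoding, tick_ns(encoding), "ns")
```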

4.3 Study of tick value

• Both buffers are empty: PortXmitWait unchanged

• Buffers are filling up: PortXmitWait unchanged

• Buffers are full: PortXmitWait incremented

[Figure: three copies of the SDR 1 / SDR 2 / Switch / SDR R diagram, one per buffer state]

4.3 Study of tick value

A2: Time when SDR R starts receiving packets from both competing flows

B1: Time when the inter-packet intervals on each sender side go up to 4us

L: Time when SDR R receives the last SDR 1 data packet

4.3 Study of tick value

• Tick =

• Duration of the congestion = TIME_{B1-L} - TIME_{regular}

4.3 Study of tick value

• Congestion time

MLNX MT26428 QDR CA

encoding value = 31 = 0x1F

# perfquery -c LID 1

Tick……………………..0x1F

Acknowledgement

I would like to thank the following for their support:

- My advisor, Professor Robert D. Russell

- National Science Foundation Grant OCI-1127228

- Software Forge, Inc. -- for the loan of the CatC analyzers

- University of New Hampshire InterOperability Lab (UNH IOL)

#OFSUserGroup

OpenFabrics Software

User Group Workshop

Thank You
