Date post: 19-Dec-2015
Contemporary High-speed Techniques Tan Li
Transcript
Page 1: Contemporary High-speed Techniques Tan Li. 2 Outline Native InfiniBand  Components  Subnet Management and Services High Speed Ethernet (HSE) Family.

Contemporary High-speed Techniques

Tan Li


Outline

• Native InfiniBand
  - Components
  - Subnet Management and Services
• High Speed Ethernet (HSE) Family
  - Internet Wide Area RDMA Protocol (iWARP)
  - Alternate choice - OpenOnLoad
• InfiniBand/Ethernet Convergence Technologies
  - (InfiniBand) RDMA over Ethernet (RoE)
  - (InfiniBand) RDMA over Converged (Enhanced) Ethernet (RoCE)
• Resources
• Summary


Outline - Native InfiniBand

• Recall
• InfiniBand Components
• InfiniBand Link Speed Roadmap
• InfiniBand Communication Model
• InfiniBand Switching and Routing
• InfiniBand Transport Layer
• Subnet Management and Services


Recall - Comparing InfiniBand with Traditional Networking Stack


Recall - InfiniBand Protocol Offload Engines

Completely implement ISO/OSI layers 2‐4 (link layer, network layer and transport layer) in hardware

Protocol stack (top to bottom): verbs, IB transport, IB network, IB link/phy, IB fabric


InfiniBand Components

• Cables and Connectors
• Channel Adapter
• Switches
• Routers


Cables and Connectors

• Volume 2 of the Architecture Specification is devoted to the physical and electrical characteristics of InfiniBand. This has enabled vendors to develop and sell a wide range of copper and optical cables in several widths (4x, 12x) and speed grades (SDR, DDR, QDR).

• Many networks so far (1GE, Myrinet, Quadrics) used 8b/10b encoding

• New networks (IB post‐QDR, HSE >= 10GE) use 64b/66b encoding

• The eternal IB confusion:
  - All networks other than IB specify the data rate (1 Gigabit Ethernet == 1 Gbps data rate)
  - IB initially broke this convention: when IB (up to QDR) is reported as 10/20/40 Gbps, that is actually the signaling rate; the data rate is 8/16/32 Gbps
  - The IB FDR and EDR standards fixed this “error” and started reporting the data rate (IB EDR reported as 100 Gbps is truly the data rate; the signaling rate is 103.125 Gbps)
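The signaling-rate versus data-rate arithmetic on this slide can be checked with a short sketch. The function and table names are illustrative only; the encoding ratios and the link rates are the ones quoted on the slide.

```python
from fractions import Fraction

# Payload bits carried per bit on the wire for each line code named on the slide.
ENCODING_EFFICIENCY = {
    "8b/10b": Fraction(8, 10),
    "64b/66b": Fraction(64, 66),
}

def data_rate_gbps(signaling_gbps, encoding):
    """Return the usable data rate for a given signaling rate and line code."""
    return float(Fraction(str(signaling_gbps)) * ENCODING_EFFICIENCY[encoding])

# 4x SDR, DDR and QDR links are advertised by their signaling rate.
for name, signaling in [("SDR", 10), ("DDR", 20), ("QDR", 40)]:
    print(name, data_rate_gbps(signaling, "8b/10b"))

# 4x EDR is advertised by its data rate; the wire runs at 103.125 Gbps.
print("EDR", data_rate_gbps(103.125, "64b/66b"))
```

Exact fractions are used so that the 64/66 ratio does not pick up rounding noise before the final conversion to a float.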


Channel Adapter (CA)

[Figure: two hosts, each showing App, QP, OS and CA blocks]

• Different terms: HCA, TCA, DCA
• Channel adapter is a service, not a hardware service
• Uses address translation mechanisms to access the queue pairs


InfiniBand switches & routers

• Switches: IB supports Virtual Cut Through (VCT) switching together with link-level flow control. The flow control is a subtle but key element of InfiniBand, since it means that packets are never dropped in the network during normal operation. This “no drop” behavior is central to the operation of InfiniBand’s highly efficient transport protocol.

• Routers: the routing algorithm is not specified by the IB spec; Up*/Down* and Shift are popular routing engines supported by OFED. Since InfiniBand’s management architecture is defined on a per-subnet basis, an InfiniBand router allows a large network to be partitioned into a number of smaller subnets. This lets InfiniBand networks scale to very large sizes without the performance penalty of routing management traffic throughout the entire network.

Research spot: IB routing & WAN capability


InfiniBand link speed roadmap


InfiniBand link speed roadmap


InfiniBand Communication Model

1. Queue Model

2. Overview

3. Memory Registration

4. Memory Protection

5. Verbs


InfiniBand Communication Model - Queue Pair (QP) Model

• Send Queue (SQ)
• Receive Queue (RQ)
• Completion Queue (CQ)
• Work requests (WQEs)
• Notification of operation completion (CQE)
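A toy model may help fix the vocabulary. This is only a sketch in plain Python, not the verbs API: the class, the `deliver` helper and the tuple layout of a CQE are all hypothetical, and the adapter is replaced by an ordinary function call.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class QueuePair:
    """Toy queue pair: a send queue and a receive queue sharing one CQ."""
    send_queue: deque = field(default_factory=deque)
    recv_queue: deque = field(default_factory=deque)
    completion_queue: deque = field(default_factory=deque)

    def post_send(self, wr_id, payload):
        self.send_queue.append((wr_id, payload))

    def post_recv(self, wr_id):
        self.recv_queue.append(wr_id)

def deliver(sender, receiver):
    """Stand-in for the adapter: move one message and generate both CQEs."""
    wr_id, payload = sender.send_queue.popleft()
    recv_wr_id = receiver.recv_queue.popleft()
    sender.completion_queue.append(("send", wr_id, None))
    receiver.completion_queue.append(("recv", recv_wr_id, payload))

a, b = QueuePair(), QueuePair()
b.post_recv(wr_id=100)                   # receiver posts a receive WQE first
a.post_send(wr_id=1, payload=b"hello")   # sender posts a send WQE
deliver(a, b)
print(a.completion_queue.popleft())      # CQE for the send
print(b.completion_queue.popleft())      # CQE for the receive
```

The point of the sketch is the ordering: work is posted to the SQ and RQ, and the application learns the outcome only by taking entries off the CQ.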


InfiniBand Communication Model - Overview


InfiniBand Communication Model – Memory Registration

1. Registration request

2. Kernel handles the virtual->physical mapping and pins the region into physical memory

3. HCA caches the virtual-to-physical mapping and issues a handle

4. Handle is returned to application

All memory used for communication must be registered!
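The four registration steps can be mimicked in a few lines. Everything here is an assumption made for illustration: `ToyHCA`, the 4 KiB page size and the dictionary page table are hypothetical stand-ins, not how a real driver stores its mappings.

```python
import itertools

PAGE_SIZE = 4096  # assumed page size for the sketch

class ToyHCA:
    """Hypothetical adapter model that caches page mappings per handle."""
    def __init__(self):
        self._next_handle = itertools.count(1)
        self.translation_cache = {}

    def register(self, virt_addr, length, page_table):
        # Step 2: the "kernel" resolves and pins every page the region touches.
        first = virt_addr // PAGE_SIZE
        last = (virt_addr + length - 1) // PAGE_SIZE
        pinned = {vpn: page_table[vpn] for vpn in range(first, last + 1)}
        # Step 3: the adapter caches the mapping and issues a handle.
        handle = next(self._next_handle)
        self.translation_cache[handle] = pinned
        # Step 4: the handle goes back to the application.
        return handle

    def translate(self, handle, virt_addr):
        """Resolve a virtual address without going back to the kernel."""
        pinned = self.translation_cache[handle]
        return pinned[virt_addr // PAGE_SIZE] * PAGE_SIZE + virt_addr % PAGE_SIZE

page_table = {16: 900, 17: 42}   # virtual page number -> physical frame number
hca = ToyHCA()
handle = hca.register(virt_addr=16 * PAGE_SIZE + 100, length=5000, page_table=page_table)
print(handle, hex(hca.translate(handle, 17 * PAGE_SIZE + 8)))
```

Because the mapping is cached at registration time, later transfers can translate addresses on the adapter, which is why unregistered memory cannot be used for communication.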


InfiniBand Communication Model – Memory Protection

• To send or receive data the l_key must be provided to the HCA

• For security, keys are required for all operations that touch buffers

• For RDMA, initiator must have the r_key for the remote virtual address

r_key is not encrypted in IB!
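The key check for RDMA can be sketched as follows. The class, the function and the key values are hypothetical; a real HCA performs this check in hardware against the keys issued at registration.

```python
class MemoryRegion:
    """Toy registered buffer guarded by a local and a remote key."""
    def __init__(self, size, l_key, r_key):
        self.buf = bytearray(size)
        self.l_key = l_key
        self.r_key = r_key

def rdma_write(region, r_key, offset, data):
    """Remote write: rejected unless the initiator presents the right r_key."""
    if r_key != region.r_key:
        raise PermissionError("r_key mismatch")
    if offset < 0 or offset + len(data) > len(region.buf):
        raise ValueError("access outside the registered region")
    region.buf[offset:offset + len(data)] = data

mr = MemoryRegion(size=16, l_key=0x1111, r_key=0x2222)
rdma_write(mr, r_key=0x2222, offset=4, data=b"abcd")    # accepted
try:
    rdma_write(mr, r_key=0xBAD, offset=0, data=b"x")     # rejected
except PermissionError as err:
    print("rejected:", err)
```

The comparison is a plain equality test on a value that travels in the clear, which is the concern behind the remark that the r_key is not encrypted.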


InfiniBand Communication Model - Verbs

• Post receive, send
• RDMA-read, RDMA-write
• Notify CQEs

Kernel is involved only to:
1. Memory Registration
2. Post receive and send WQE
3. Poll out completed CQEs from CQ


InfiniBand Switching and Routing

Virtual Lanes
• Multiple virtual links within the same physical link
• VL15: reserved for management
• Each port supports one or more data VLs

Service Levels
• Packets may operate at one of 16 different SLs
• Meaning not defined by IB
• SL determines which VL on the next link is to be used
• Each port (switches, routers, end nodes) has an SL-to-VL mapping table configured by subnet management
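An SL-to-VL lookup is just a 16-entry table. The sketch below assumes a round-robin policy purely for illustration; the actual contents are whatever subnet management configures, and the function names are hypothetical.

```python
MANAGEMENT_VL = 15          # reserved for management traffic
NUM_SERVICE_LEVELS = 16

def build_sl2vl_table(num_data_vls):
    """Hypothetical policy: spread the 16 SLs round-robin over the data VLs."""
    if not 1 <= num_data_vls <= 15:
        raise ValueError("a port has between 1 and 15 data VLs")
    return [sl % num_data_vls for sl in range(NUM_SERVICE_LEVELS)]

def select_vl(sl2vl_table, service_level):
    """Pick the VL a packet with this SL uses on the next link."""
    return sl2vl_table[service_level]

table = build_sl2vl_table(num_data_vls=4)
print(table)
print(select_vl(table, service_level=6))
```

With four data VLs, several SLs share each VL, which shows why the SL carries no fixed meaning of its own: its effect depends entirely on the table at each port.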


InfiniBand Switching and Routing

Allow the multiplexing of the multiple independent logical traffic flows on the same physical link

Simulate multiple networks in one physical network


InfiniBand Switching and Routing

• Sender can utilize multiple LIDs associated with the same destination port
  - Packets sent to one DLID take a fixed path
  - Different packets can be sent using different DLIDs
  - Each DLID can have a different path (switches can be configured differently for each DLID)

• Each QP utilizes a single LID (one to one)
  - All WQEs posted on the same QP take the same path
  - All packets are received by the receiver in the same order
  - All receive WQEs are completed in the order in which they were posted

• Handling out-of-order packets
  - IB uses a simplistic approach: if packets in one connection arrive out‐of‐order, they are dropped

Marking a node: LID + GID in IB correspond to MAC + IP in Ethernet/IP networks
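The per-DLID path selection on this slide can be illustrated with a tiny forwarding model. The switch names, LID values and table layout are invented for the example; real switches map a DLID to an output port number.

```python
# Hypothetical linear forwarding tables: switch name -> {DLID: next hop}.
FORWARDING = {
    "sw1": {10: "sw2", 11: "sw3"},
    "sw2": {10: "hostB"},
    "sw3": {11: "hostB"},
}

def trace_path(first_switch, dlid):
    """Follow the per-DLID entries until the packet leaves the switches."""
    path, node = [], first_switch
    while node in FORWARDING:
        path.append(node)
        node = FORWARDING[node][dlid]
    return path + [node]

print(trace_path("sw1", 10))   # one fixed path for DLID 10
print(trace_path("sw1", 11))   # a different fixed path for DLID 11
```

Both DLIDs reach the same destination port, but each one always follows its own route, so a QP bound to a single LID keeps its packets in order.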


InfiniBand Transport Layer

IB Transport Services (Queue-pair based)


InfiniBand Transport Layer

IB allows link rates to be statically changed
• On a 4X link, we can set data to be sent at 1X
• For heterogeneous links, the rate can be set to the lowest link rate
• Useful for low‐priority traffic

Auto‐negotiation is also available
• E.g., if you connect a 4X adapter to a 1X switch, data is automatically sent at the 1X rate

Only fixed settings are available
• Cannot set the rate requirement to 3.16 Gbps, for example

Demo


Subnet Management and Services

Subnet Management Agents (SMA)
• Processes or hardware units running on each adapter, switch, router (everything on the network)
• Provide the capability to query and set parameters

Managers
• Make high-level decisions and implement them on the network fabric using the agents

Subnet management packets (SMPs)
• Used for interactions between the manager and agents (or between agents)

Messages


Subnet Management and Services


Subnet Management and Services

Subnet management packets (SMP)
Define the operation to be performed by the SM
• Get: get information about a CA, switch, or port
• Set: set an attribute of a port (e.g. LID)
• GetResp: response to a Get or Set
• Trap: inform the SM about the state of a local node

• An SMA keeps sending a Trap message until it receives a TrapRepress packet.

• Topology information can be obtained by a sweep and by periodic Traps.


Subnet Management and Services

Subnet Management phases:
• Topology discovery: sending directed-route SMPs to every port and processing the responses
• Path computation: computing valid paths between each pair of end nodes
• Path distribution: configuring the forwarding tables
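The three phases can be walked through on a four-node fabric. This is a simplified sketch under stated assumptions: the topology, the node names and the use of breadth-first search for shortest paths are choices made for the example, and a real subnet manager works with LIDs, port numbers and a configurable routing engine.

```python
from collections import deque

# Hypothetical fabric: node -> neighbours. Names starting with "sw" are switches.
LINKS = {
    "hostA": ["sw1"],
    "hostB": ["sw2"],
    "sw1": ["hostA", "sw2"],
    "sw2": ["sw1", "hostB"],
}

def discover(start):
    """Phase 1: sweep outward from the SM's node, recording every node seen."""
    seen, queue = [start], deque([start])
    while queue:
        for neighbour in LINKS[queue.popleft()]:
            if neighbour not in seen:
                seen.append(neighbour)
                queue.append(neighbour)
    return seen

def next_hops_towards(dest, nodes):
    """Phase 2: BFS from the destination gives each node its next hop."""
    hops, queue = {}, deque([dest])
    visited = {dest}
    while queue:
        node = queue.popleft()
        for neighbour in LINKS[node]:
            if neighbour in nodes and neighbour not in visited:
                visited.add(neighbour)
                hops[neighbour] = node
                queue.append(neighbour)
    return hops

def build_forwarding_tables(sm_node):
    """Phase 3: collect, per switch, the next hop for every end node."""
    nodes = discover(sm_node)
    end_nodes = [n for n in nodes if not n.startswith("sw")]
    tables = {n: {} for n in nodes if n.startswith("sw")}
    for dest in end_nodes:
        for node, hop in next_hops_towards(dest, nodes).items():
            if node in tables:
                tables[node][dest] = hop
    return tables

print(build_forwarding_tables("hostA"))
```

In the sketch the tables are simply returned; in a fabric, the distribution phase would push each switch's table to it with Set SMPs.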


Subnet Management and Services


High Speed Ethernet (HSE) Family

• Internet Wide Area RDMA Protocol (iWARP)
  - Idea of iWARP
  - iWARP & InfiniBand
  - iWARP Architecture and Components
  - iWARP Features
  - Software iWARP
• Alternative – OpenOnLoad
• Alternative – pure Ethernet/TCP/IP


Idea of iWARP

Protocol stack (top to bottom): verbs, RDMAP, RDDP, MPA, TCP, IP, Enet MAC, IP network


iWARP & InfiniBand


iWARP Architecture and Components

Protocol stack (top to bottom): verbs, RDMAP, RDDP, MPA, TCP, IP, Enet MAC, IP network

• RDMA Protocol (RDMAP)
  - Feature-rich interface
  - Security management

• Remote Direct Data Placement (RDDP)
  - Data placement and delivery
  - Connection management

• Marker PDU Aligned (MPA)
  - Middle box fragmentation
  - Data integrity (CRC)


iWARP Features

• Decoupled data placement and data delivery: if data is out‐of‐order, place it at the appropriate offset
• Complicated because of TCP windowing behavior
• Can allow for simple prioritization: 8 classes provided, two priority classes for high‐priority traffic
• Can allow for specific bandwidth requests, e.g., can request 3.62 Gbps of bandwidth
• Link aggregation allows multiple links to logically look like a single faster link; this is done at the hardware level
• Primarily provides behavior like the InfiniBand RC transport
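The first bullet, placement decoupled from delivery, can be sketched with a toy receive buffer. The class and its byte-level bookkeeping are hypothetical and far simpler than a real DDP implementation; they only show that an early segment is written at its final offset while delivery waits for the gap to close.

```python
class PlacementBuffer:
    """Toy receive buffer: segments are placed on arrival, delivered in order."""
    def __init__(self, size):
        self.buf = bytearray(size)
        self.placed = set()      # byte offsets that have been written
        self.delivered = 0       # bytes handed to the application so far

    def on_segment(self, offset, data):
        # Placement: write straight to the final location, whatever the order.
        self.buf[offset:offset + len(data)] = data
        self.placed.update(range(offset, offset + len(data)))
        # Delivery: advance only across a contiguous prefix.
        while self.delivered in self.placed:
            self.delivered += 1
        return self.delivered

rx = PlacementBuffer(8)
print(rx.on_segment(4, b"WARP"))   # arrives early: placed, nothing delivered
print(rx.on_segment(0, b"iiii"))   # gap filled: all 8 bytes deliverable
print(bytes(rx.buf))
```

No reassembly buffer or extra copy is needed, which is the contrast with the IB approach of dropping out-of-order packets.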


Software iWARP


Alternative – Solarflare OpenOnLoad

• Supports the standard Socket API
• Accelerates TCP/UDP applications with no need to modify applications or to run a new protocol


Alternative – pure Ethernet/TCP/IP

High speed Ethernet (HSE) Consortium (10GE/40GE/100GE)

• 10GE Alliance formed by several industry leaders to take the Ethernet family to the next speed step
• Goal: to achieve a scalable and high-performance communication architecture while maintaining backward compatibility with Ethernet
• http://www.ethernetalliance.org
• 40 Gbps (servers) and 100 Gbps Ethernet (backbones, switches, routers): IEEE 802.3 WG


InfiniBand/Ethernet Convergence Technologies

Motivation & Hint

(InfiniBand) RDMA over Ethernet (RoE)

(InfiniBand) RDMA over Converged Ethernet (RoCE)

Some Test Results


Motivation & Hint - Virtual Protocol Interconnect (VPI)

• Single network firmware to support both IB and Ethernet

• Autosensing of layer-2 protocol

• Multi-port adapters can use one port on IB and another on Ethernet

• Datacenters with IB inside the cluster and Ethernet outside, or clusters with IB network and Ethernet management


Motivation & Hint

[Figure: layered comparison of the IB and Ethernet stacks. L1: IB (S/D/Q), XAUI, XFI, SGMII. L2: IB, Ethernet. L3: IB L3, IPv4. L4: IB transport, TCP. ULP: SDP, RDS, IPoIB. Top: Verbs with RDMA applications, and socket applications.]


(InfiniBand) RDMA over Ethernet (IBoE or RoE)

Native convergence of IB network and transport layers with Ethernet link layer

IB packets encapsulated in Ethernet frames

Advantages
• Works natively in Ethernet environments (entire Ethernet management ecosystem is available)
• Has all the benefits of IB verbs

Disadvantages
• Network bandwidth might be limited by Ethernet switches: 10GE switches available, 40GE yet to arrive, but 32 Gbps IB available now
• Some IB native link‐layer features are optional in (regular) Ethernet


(InfiniBand) RDMA over Converged Ethernet (RoCE)

Native convergence of IB network and transport layers with Ethernet link layer

IB packets encapsulated in Ethernet frames

Advantages
• CE is very similar to the link layer of native IB, so there are no missing features

Disadvantages
• Network bandwidth might be limited by Ethernet switches: 10GE switches available, 40GE yet to arrive, but 32 Gbps IB available now


(InfiniBand) RDMA over Converged Ethernet (RoCE)

Packet formats:

InfiniBand: LRH (L2 Hdr) | GRH (L3 Hdr) | BTH+ (L4 Hdr) | IB Payload | ICRC | VCRC

RoCEE: MAC | ET | GRH | BTH+ | IB Payload | ICRC | FCS
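The relationship between the two packet layouts on this slide can be expressed as a small transformation. The function name is hypothetical and the fields are treated as labels only, with no real header contents.

```python
# Field order taken from the slide's InfiniBand packet layout.
IB_PACKET = ["LRH", "GRH", "BTH+", "IB Payload", "ICRC", "VCRC"]

def to_rocee(ib_fields):
    """Swap the IB link-layer framing for Ethernet framing; keep the rest."""
    inner = [f for f in ib_fields if f not in ("LRH", "VCRC")]
    return ["MAC", "ET"] + inner + ["FCS"]

print(" | ".join(to_rocee(IB_PACKET)))
```

Only the link-layer header and the link-layer CRC change; the GRH, BTH+, payload and ICRC are carried over untouched, which is why IB verbs semantics survive the move to Ethernet.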


Some Test Results


Some Test Results


Feature Comparison


Summary


Summary

[Figure: software stack. Applications (sockets apps, MPI, storage apps, file systems, native apps) use SDP, MPI, SRP/iSER and Lustre on top of Verbs, which run over InfiniBand, iWARP or RoCE.]


Resources

• InfiniBand:
  - Introduction to InfiniBand™ for End Users
  - InfiniBand Trade Association: http://www.infinibandta.org/
• iWarp:
  - Rdma Consortium: http://www.rdmaconsortium.org
• OpenFabrics: http://www.openfabrics.org


Future plan

• Design and programming in RDMA
• Lustre concept and tuning


Thanks & Questions

