InfiniBand Technology and Usage Update
Erin Filliater, Mellanox Technologies
Page 1:

2012 Storage Developer Conference. © 2012 Mellanox Technologies. All Rights Reserved.

InfiniBand Technology and Usage Update

Erin Filliater, Mellanox Technologies

Page 2:

FDR InfiniBand solutions were introduced in mid-2011, and the InfiniBand roadmap and EDR specification were updated to provide a data rate of 100Gb/s per 4x EDR port (26Gb/s per lane). FDR InfiniBand introduced new 64/66-bit link encoding and a new reliability mechanism called Forward Error Correction. The newly defined link speeds, reliability mechanisms and transport features are designed to keep the rate of performance increase in line with system-level performance increases. This session will provide a detailed review of the new InfiniBand speeds, features and roadmap.

2

Abstract

Page 3:

Learning Objectives

Detailed understanding of the new InfiniBand capabilities

View into the InfiniBand roadmap through 2016

Usage of RDMA for storage acceleration

RDMA storage examples

3

Page 4:

Agenda

InfiniBand Technology Review

New Features for FDR

InfiniBand Roadmap

InfiniBand and RDMA for Storage

4

Page 5:

InfiniBand Technology

5

Page 6:

What is InfiniBand?

Industry standard defined by the InfiniBand Trade Association (IBTA)
  Originated in 1999

Input/output architecture used to interconnect servers, communications infrastructure equipment, storage and embedded systems

Pervasive, low-latency, high-bandwidth interconnect which requires low processing overhead and is ideal for carrying multiple traffic types (clustering, communications, storage, management) over a single connection

As a mature and field-proven technology, InfiniBand is used in thousands of data centers, high-performance compute clusters and embedded applications, ranging from small to very large scale

6

Page 7:

The InfiniBand Architecture

Defines System Area Network architecture

Architecture supports
  Host Channel Adapters (HCA)
  Target Channel Adapters (TCA)
  Switches
  Routers

Facilitates HW design for
  Low latency / high bandwidth
  Transport offload

7

[Diagram: an InfiniBand subnet — processor nodes attach through HCAs to a fabric of switches overseen by a subnet manager; TCAs connect a storage subsystem, RAID and consoles; gateways bridge the subnet to Ethernet and Fibre Channel]

Page 8:

InfiniBand Feature Highlights

Serial high-bandwidth, ultra-low-latency links

Reliable, lossless, self-managing fabric

Full CPU offload

Quality Of Service

Cluster scalability, flexibility and simplified management

8

Page 9:

Delivering a Unified Data Center Fabric

9

Page 10:

InfiniBand Network Stack

10

[Diagram: InfiniBand network stack — on an InfiniBand node, the application (user code) sits above the transport, network, link and physical layers (kernel code and hardware); an InfiniBand switch relays packets at the link/physical layers and a router at the network layer, with per-port buffers; a legacy node is reached through the same layered stack]

Page 11:

Link Speed (10^9 bit/sec)

Physical Layer

Data transfer over serial bit streams

Auto-negotiation of link speed and width

Power management

Bit encoding

Control symbols

11

Link Width ↓ \ Lane Speed →   SDR (2.5GHz)   DDR (5GHz)   QDR (10GHz)   FDR (14GHz)   EDR (25GHz)
1X                            2.5            5            10            14            25
4X                            10             20           40            56            100
8X                            20             40           80            112           200
12X                           30             60           120           168           300
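The rates above are signaling rates; the delivered data rate also depends on the link encoding — 8b/10b for SDR/DDR/QDR and 64/66 for FDR/EDR, as covered later in the deck. A small C sketch, assuming the commonly cited lane signaling rates of 14.0625 and 25.78125 Gbaud for FDR and EDR, computes the approximate effective 4X data rates:

```c
#include <stdio.h>

/* Approximate effective 4X data rates: per-lane signaling rate times
 * four lanes, reduced by the line-code efficiency (8b/10b for SDR/DDR/
 * QDR, 64/66 for FDR/EDR). Lane rates are the commonly cited signaling
 * rates in Gbaud; figures are illustrative, not normative. */
int main(void)
{
    struct { const char *gen; double lane_gbaud; double code_eff; } g[] = {
        { "SDR", 2.5,      8.0 / 10.0 },
        { "DDR", 5.0,      8.0 / 10.0 },
        { "QDR", 10.0,     8.0 / 10.0 },
        { "FDR", 14.0625,  64.0 / 66.0 },
        { "EDR", 25.78125, 64.0 / 66.0 },
    };

    for (unsigned i = 0; i < sizeof(g) / sizeof(g[0]); i++)
        printf("%s: 4X signaling %7.2f Gb/s -> ~%6.2f Gb/s of data\n",
               g[i].gen, 4.0 * g[i].lane_gbaud,
               4.0 * g[i].lane_gbaud * g[i].code_eff);
    return 0;
}
```

For QDR this gives roughly 32Gb/s of data on a 40Gb/s link, while FDR's 64/66 encoding keeps about 54.5Gb/s of its 56Gb/s signaling rate.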

Page 12:

Link Layer

Addressing and Switching
  Local Identifier (LID) addressing
  Unicast LID – 48K addresses
  Multicast LID – up to 16K addresses
  Efficient linear lookup
  Cut-through switching (ultra-low latency)
  Multi-pathing support through LMC

Data Integrity

  Invariant CRC (ICRC)
  Variant CRC (VCRC)

12

Page 13:

[Diagram: link-level flow control — the transmitter's link control arbitrates and muxes packets onto the wire; the receiver's link control de-muxes them into receive buffers and returns credits as buffers drain]

Link Layer – Flow Control

Credit-based link-level flow control
  No packet loss within the fabric, even in the presence of congestion
  Link receivers grant packet receive buffer space credits per Virtual Lane

Separate flow control per Virtual Lane
  Alleviates head-of-line blocking
  Virtual Fabrics – congestion and latency on one VL do not impact traffic with guaranteed QoS on another VL, even though they share the same physical link

13

Page 14:

Virtual Lanes and Scheduling

14

Dynamically configure and adjust VLs and scheduling to match application performance needs

[Diagram: one physical InfiniBand fabric carrying several logical Virtual Lanes — a low-latency VL for clustering, a mainstream storage VL (≥ 40% bandwidth by day, ≥ 20% at night) and a backup VL (≥ 20% bandwidth by day, ≥ 60% at night)]

Page 15:

Network Layer

Global Identifier (GID) addressing
  Based on the IPv6 addressing scheme
  GID = {64-bit GID prefix, 64-bit GUID}

GUID = Globally Unique Identifier (64-bit EUI-64)
  GUID 0 – assigned by the manufacturer
  GUID 1..(N-1) – assigned by the subnet manager

Used for multicast distribution within end nodes

15

[Diagram: an IB router connecting Subnet A and Subnet B]
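As a small companion to the GID format above, a minimal libibverbs sketch (assuming one installed IB device, and querying GID index 0 on port 1) reads a port GID and prints its subnet-prefix and GUID halves:

```c
/* Minimal sketch: dump GID index 0 of port 1 on the first IB device.
 * Link with -libverbs. */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int n;
    struct ibv_device **devs = ibv_get_device_list(&n);
    if (!devs || n == 0) { fprintf(stderr, "no IB devices\n"); return 1; }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    union ibv_gid gid;
    if (!ctx || ibv_query_gid(ctx, 1, 0, &gid)) {  /* port 1, GID index 0 */
        fprintf(stderr, "ibv_query_gid failed\n");
        return 1;
    }

    printf("GID prefix: ");                 /* first 8 bytes = subnet prefix */
    for (int i = 0; i < 8; i++) printf("%02x", gid.raw[i]);
    printf("  GUID: ");                     /* last 8 bytes = port GUID */
    for (int i = 8; i < 16; i++) printf("%02x", gid.raw[i]);
    printf("\n");

    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```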

Page 16:

Transport Layer

Queue Pair (QP) – transport endpoint

Asynchronous interface
  Send Queue, Receive Queue, Completion Queue

Full transport offload
  Segmentation, reassembly, timers, retransmission, etc.

Kernel bypass
  Enables low latency and CPU offload
  Exposure of application buffers to the network

Polling and interrupt models supported

16
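To make the transport endpoint concrete, here is a minimal libibverbs sketch — the endpoint struct, buffer size and function name are illustrative — that allocates a protection domain and completion queue, registers (pins) an application buffer for kernel-bypass access, and creates a Reliable Connection QP. Connecting the QP (INIT→RTR→RTS, typically via the RDMA CM) is omitted:

```c
/* Minimal transport-endpoint setup sketch; 'ctx' is a device context
 * opened as in the earlier GID example. Link with -libverbs. */
#include <stdlib.h>
#include <infiniband/verbs.h>

#define BUF_SIZE 4096

struct endpoint {
    struct ibv_pd *pd;   /* protection domain */
    struct ibv_cq *cq;   /* completion queue shared by send and receive */
    struct ibv_mr *mr;   /* registered (pinned) application buffer */
    struct ibv_qp *qp;   /* reliable-connection queue pair */
    void *buf;
};

static int setup_endpoint(struct ibv_context *ctx, struct endpoint *ep)
{
    ep->pd  = ibv_alloc_pd(ctx);
    ep->cq  = ibv_create_cq(ctx, 16, NULL, NULL, 0);
    ep->buf = malloc(BUF_SIZE);
    if (!ep->pd || !ep->cq || !ep->buf)
        return -1;

    /* Registering memory exposes the application buffer to the HCA:
     * the returned lkey/rkey are what WQEs and remote peers use. */
    ep->mr = ibv_reg_mr(ep->pd, ep->buf, BUF_SIZE,
                        IBV_ACCESS_LOCAL_WRITE |
                        IBV_ACCESS_REMOTE_READ |
                        IBV_ACCESS_REMOTE_WRITE);

    struct ibv_qp_init_attr attr = {
        .send_cq = ep->cq,
        .recv_cq = ep->cq,
        .cap     = { .max_send_wr = 16, .max_recv_wr = 16,
                     .max_send_sge = 1, .max_recv_sge = 1 },
        .qp_type = IBV_QPT_RC,   /* reliable connection */
    };
    ep->qp = ibv_create_qp(ep->pd, &attr);
    return (ep->mr && ep->qp) ? 0 : -1;
}
```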

Page 17:


Transport Layer – Queue Pairs

QPs are in pairs (Send/Receive)
Work Queue is the consumer/producer interface to the fabric
The consumer/producer initiates a Work Queue Element (WQE)
The channel adapter executes the work request
The channel adapter notifies on completion or errors by writing a Completion Queue Element (CQE) to a Completion Queue (CQ)

[Diagram: local and remote QPs — WQEs posted to the local transmit queue are delivered to the remote receive queue, and vice versa]

17
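A minimal sketch of this WQE/CQE flow with libibverbs, assuming an already-connected RC QP and a buffer registered as in the previous sketch (the function name is illustrative):

```c
/* Minimal WQE/CQE flow sketch: post a receive WQE and a send WQE,
 * then poll the completion queue for the resulting CQE. Connection
 * setup is omitted. Link with -libverbs. */
#include <stdint.h>
#include <infiniband/verbs.h>

static int send_and_poll(struct ibv_qp *qp, struct ibv_cq *cq,
                         struct ibv_mr *mr, void *buf, uint32_t len)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,
        .length = len,
        .lkey   = mr->lkey,
    };

    /* Receive WQE: tells the HCA where an incoming SEND may land. */
    struct ibv_recv_wr rwr = { .wr_id = 1, .sg_list = &sge, .num_sge = 1 };
    struct ibv_recv_wr *bad_rwr;
    if (ibv_post_recv(qp, &rwr, &bad_rwr))
        return -1;

    /* Send WQE: the HCA reads the buffer and transmits it. */
    struct ibv_send_wr swr = {
        .wr_id      = 2,
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_SEND,
        .send_flags = IBV_SEND_SIGNALED,
    };
    struct ibv_send_wr *bad_swr;
    if (ibv_post_send(qp, &swr, &bad_swr))
        return -1;

    /* Poll the CQ until a CQE arrives (busy-poll; an event channel
     * could be used for the interrupt-driven model instead). */
    struct ibv_wc wc;
    while (ibv_poll_cq(cq, 1, &wc) == 0)
        ;
    return (wc.status == IBV_WC_SUCCESS) ? 0 : -1;
}
```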

Page 18:

Transport – HCA Model

Asynchronous interface
  Consumer posts work requests
  HCA processes them
  Consumer polls completions

Transport executed by HCA

I/O channel exposed to the application

18

[Diagram: HCA model — the consumer posts WQEs to the send and receive queues of QPs and polls CQEs from a completion queue; the HCA's transport and RDMA offload engine services the QPs and schedules traffic onto the Virtual Lanes of each port]

Page 19:

Transport Layer – Transfer Operations

SEND
  Read message from HCA local system memory
  Transfers data to responder HCA Receive Queue logic
  Does not specify where the data will be written in remote memory
  Immediate Data option available

RDMA Read
  Responder HCA reads its local memory and returns it to the requesting HCA
  Requires remote memory access rights, memory start address and message length

RDMA Write
  Requester HCA sends data to be written into the responder HCA system memory
  Requires remote memory access rights, memory start address and message length

19
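For illustration, a minimal libibverbs sketch of posting an RDMA Write work request (function and variable names are illustrative). The remote address and rkey must have been learned beforehand, typically via a Send; an RDMA Read is posted the same way with IBV_WR_RDMA_READ and the data flowing in the opposite direction:

```c
/* Minimal RDMA Write sketch on a connected RC QP. Link with -libverbs. */
#include <stdint.h>
#include <infiniband/verbs.h>

static int post_rdma_write(struct ibv_qp *qp, struct ibv_mr *local_mr,
                           void *local_buf, uint32_t len,
                           uint64_t remote_addr, uint32_t remote_rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)local_buf,     /* local source buffer */
        .length = len,
        .lkey   = local_mr->lkey,
    };
    struct ibv_send_wr wr = {
        .wr_id      = 3,
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_RDMA_WRITE,    /* IBV_WR_RDMA_READ for a read */
        .send_flags = IBV_SEND_SIGNALED,
        /* Remote access rights, start address and rkey supplied by the peer. */
        .wr.rdma.remote_addr = remote_addr,
        .wr.rdma.rkey        = remote_rkey,
    };
    struct ibv_send_wr *bad_wr;
    return ibv_post_send(qp, &wr, &bad_wr);
}
```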

Page 20:

Typical Buffer Copy Flow

[Diagram: data source and data sink — application data is copied through a chain of protocol buffers on the source side, carried across the wire as a data message (Send), and copied through another chain of protocol buffers before reaching the sink's application buffer]

20

Page 21:

Typical Read Zero Copy Flow

[Diagram: the data source advertises its application buffer with a Send message; the data sink issues an RDMA Read, the read response lands directly in the sink's application buffer, and a completion message (Send) closes the exchange]

21

Page 22:

Typical Write Zero Copy Flow

[Diagram: the data sink advertises its application buffer with a Send message; the data source issues an RDMA Write directly into that buffer and then sends a completion message (Send)]

22
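The advertise message used in both zero-copy flows is an application-level convention rather than something the InfiniBand specification defines. One possible layout, with illustrative names, built from the values that ibv_reg_mr() returns:

```c
/* Sketch of an application-level buffer advertisement for the zero-copy
 * flows above; the struct layout and byte-order handling are up to the
 * application, not the InfiniBand spec. */
#include <stdint.h>
#include <infiniband/verbs.h>

struct buf_advertisement {
    uint64_t addr;     /* virtual address of the exposed buffer */
    uint32_t rkey;     /* remote key from ibv_reg_mr() */
    uint32_t length;   /* number of bytes the peer may access */
};

/* Fill an advertisement for a registered buffer; the result is sent to
 * the peer with an ordinary SEND, after which the peer can issue an
 * RDMA Read (read flow) or RDMA Write (write flow) against it. */
static void make_advertisement(const struct ibv_mr *mr, uint32_t length,
                               struct buf_advertisement *adv)
{
    adv->addr   = (uint64_t)(uintptr_t)mr->addr;
    adv->rkey   = mr->rkey;
    adv->length = length;
}
```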

Page 23:

Management Model

Subnet Manager (SM)
  Configures/administers fabric topology
  Implemented at an end node or switch
  Active/passive model when more than one SM is present
  Talks with SM agents in nodes/switches

Subnet Administration
  Provides path records
  QoS management

Communication Management
  Connection establishment processing

23

[Diagram: management agents — the Subnet Manager and Subnet Management Agent communicate over the Subnet Management Interface (QP0, which uses VL15); Subnet Administration and the baseboard, communication, performance, device, vendor-specific, application-specific and SNMP tunneling management agents use the General Service Interface (QP1)]

Page 24:

Partitions

Logically divide fabric into isolated domains

Partial and full membership per partition

Partition filtering at switches

Similar to FC zoning and 802.1Q VLANs

24

[Diagram: an InfiniBand fabric with Hosts A and B and I/O units A–D divided into four partitions — Partition 1: inter-host; Partition 2: private to host B; Partition 3: private to host A; Partition 4: shared]
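In the verbs API, a QP is associated with a partition through the P_Key table index supplied when the QP is moved to the INIT state. A minimal sketch (the function name is illustrative, and the index value depends on how the subnet manager has programmed the port's P_Key table):

```c
/* Minimal sketch: place a freshly created QP into a partition by
 * selecting a P_Key index during the RESET -> INIT transition.
 * Link with -libverbs. */
#include <stdint.h>
#include <infiniband/verbs.h>

static int bind_qp_to_partition(struct ibv_qp *qp, uint8_t port,
                                uint16_t pkey_index)
{
    struct ibv_qp_attr attr = {
        .qp_state        = IBV_QPS_INIT,
        .pkey_index      = pkey_index,   /* selects the partition */
        .port_num        = port,
        .qp_access_flags = IBV_ACCESS_REMOTE_READ |
                           IBV_ACCESS_REMOTE_WRITE,
    };
    return ibv_modify_qp(qp, &attr,
                         IBV_QP_STATE | IBV_QP_PKEY_INDEX |
                         IBV_QP_PORT  | IBV_QP_ACCESS_FLAGS);
}
```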

Page 25:

High Availability and Redundancy

Multi-port HCAs

Redundant fabric topologies

Link layer multi-pathing (LMC)

Automatic Path Migration (APM)

ULP High Availability
  Application-level multi-pathing (SRP/iSER)
  Teaming/bonding (IPoIB)

25

Page 26:

Upper Layer Protocols

ULPs connect InfiniBand to common interfaces

Clustering
  MPI (Message Passing Interface)
  RDS (Reliable Datagram Socket)

Network
  IPoIB (IP over InfiniBand)
  WSD (Winsock Direct)
  SDP (Sockets Direct Protocol)
  Future: EthoIB

Storage
  SRP (SCSI RDMA Protocol)
  iSER (iSCSI Extensions for RDMA)
  NFSoRDMA (NFS over RDMA)
  Future: FCoIB

26

[Diagram: InfiniBand software stack — user-space IB apps, HPC clustering apps (MPI), sockets-based apps and storage apps sit above kernel interfaces; IPoIB/TCP-IP, SDP, RDS, WSD, SRP, iSER and NFS over RDMA run over the InfiniBand core services and device driver, with kernel-bypass paths from applications straight to the hardware]

Page 27:

InfiniBand Block Storage

SRP – SCSI RDMA Protocol
  Defined by T10

iSER – iSCSI Extensions for RDMA
  Defined by the IETF IP Storage WG
  InfiniBand spec defined by the IBTA
  Leverages the iSCSI management infrastructure

Protocol Offload
  InfiniBand Reliable Connection
  RDMA for zero-copy data transfer

27

[Diagram: SCSI transport layering per SAM-3 — Fibre Channel carries the SCSI application layer over the FC-4 mapping (FCP-3) and FC-0 through FC-3; SRP carries it directly over InfiniBand; iSCSI with iSER carries it over InfiniBand or iWARP]

Page 28:

SRP: Data Transfer Operations

Send/Receive
  Commands, responses
  Task management

RDMA – Zero-Copy Path
  Data-In, Data-Out
  Target issues the RDMA operations

iSER uses same principles

Immediate/unsolicited data allowed through Send/Receive

Included in mainline Linux kernel

28

[Diagram: SRP I/O Read and I/O Write exchanges between initiator and target]

Page 29:

Discovery Mechanism

SRP Persistent Information – {Node_GUID:IOC_GUID}

Subnet Administrator

Identifiers
  Per-LUN WWN (through INQUIRY VPD)
  SRP Target Port ID – {IdentifierExt[63:0], IOC GUID[63:0]}
  Service Name – SRP.T10.{PortID ASCII}
  Service ID – locally assigned by IOC/IOU

29

[Diagram: InfiniBand I/O model — an I/O unit containing multiple I/O controllers]

Page 30:

Discovery Mechanism

iSER – uses iSCSI discovery (RFC 3721)
  Static configuration {IP, port, target name}
  SendTargets {IP, port}
  SLP
  iSNS

Target naming (RFC 3721/3980)
  iSCSI Qualified Names (iqn.), IEEE EUI-64 (eui.), T11 Network Address Authority (naa.)

30

[Diagram: InfiniBand I/O model — an I/O unit containing multiple I/O controllers]

Page 31:

NFS Over RDMA

Defined by the IETF
  ONC-RPC extensions for RDMA
  NFS mapping

RPC Call/Reply
  Send/Receive or via RDMA Read chunk list

Data transfer
  RDMA Read/Write – described by chunk list in XDR message
  Send – inline in XDR message

Uses InfiniBand Reliable Connection QP
  IP extensions to CM – connection based on {IP, port}
  Zero-copy data transfers

Part of the mainline Linux kernel

31

[Diagram: NFS READ and NFS WRITE exchanges between client and server]

Page 32:

Storage Gateways

Benefits
  InfiniBand-island-to-SAN connectivity
  I/O scales independently of compute
  Design based on average server load

Current Gateways
  SRP-to-FC, iSER-to-FC
  Stateful architecture

Future Gateways
  FCoIB-to-FC
  FCoE sibling
  Stateless architecture

32

[Diagram: stateless packet relay at the gateway — an FCoIB frame (IB header, FCoIB header, FC header, FC payload, FC CRC, FCoIB trailer, IB CRCs) is relayed onto the SAN as a native FC frame (FC header, FC payload, FC CRC); servers on InfiniBand reach scalable Fibre Channel storage through the gateway]

Page 33:

InfiniBand Fourteen Data Rate (FDR)

33

Page 34:

FDR InfiniBand

Launched mid-2011

Next-generation high-speed interconnect
  14Gb/s per lane
  56Gb/s per port

PCIe 3.0 support

34

Page 35:

FDR InfiniBand: New Features

New bit encoding scheme: 64/66

Forward Error Correction (FEC)
  Fix bit errors throughout the network
  Reduce overhead for data retransmission

nodes InfiniBand routing

35

Page 36:

FDR InfiniBand: Performance

36

120% Higher Application ROI

Double the Bandwidth of QDR

Half the Latency of QDR

Page 37:

Remote Storage Access with Local Storage Performance

InfiniBand and Storage

37

[Diagram: I/O micro-benchmark setup — an SMB client connected to an SMB server over two FDR InfiniBand links, with the server backed by Fusion-io PCIe flash]

Page 38:

InfiniBand Roadmap

38

Page 39:

InfiniBand Roadmap

39

Source: InfiniBand Trade Association

Page 40:

Leading Interconnect, Leading Performance

40

[Chart: InfiniBand bandwidth and latency, 2001–2017 — bandwidth rising from 10Gb/s through 20, 40, 56 and 100Gb/s toward 160/200Gb/s, while latency falls from 5usec through 2.5, 1.3 and 0.7usec toward under 0.5usec, all with the same software interface]

Page 41:

InfiniBand and RDMA Storage

41

Page 42:

Efficient Storage Access

Full I/O offload
  Zero copy
  Interrupt avoidance (moderated per-I/O interrupt)
  Offloaded segmentation and reassembly
  Transport reliability
  Lossless fabric – credit-based flow control

Fabric Consolidation
  Partitioning
  VL arbitration and QoS
  Host virtualization compatible
  High throughput
  Performance counters

42

Page 43:

InfiniBand Storage Benefits

High-bandwidth fabric

Fabric consolidation

Data center efficiency

Gateways
  One wire out of the server
  FC port sharing
  Independent growth for I/O, storage and compute

Network cache

43

[Diagram: InfiniBand storage deployment options — servers on an InfiniBand fabric reach native IB block storage (SRP/iSER), a native IB file server (NFS RDMA), direct-attach native IB block storage and native IB JBODs on an InfiniBand backend, or Fibre Channel storage through a gateway]

Page 44:

Clustered/Parallel Storage Benefits

Integrated with clustering infrastructure

Efficient object/block transfer

Atomic operations

Ultra-low latency

High bandwidth

Back-end storage fabric

44

[Diagram: servers connected over InfiniBand to a parallel/clustered file system, a parallel NFS server and OSD/block storage targets]

Page 45:

Microsoft Windows Server 2012 and SMB Direct

New class of low-latency enterprise file storage

Minimal CPU utilization for file storage processing

Leverages RDMA technologies

Easy to provision, manage and migrate

No application change or admin configuration

RDMA-capable network interface and hardware required (InfiniBand and RoCE)

SMB Multichannel for load balancing and failover

45

[Diagram: SMB Direct architecture — an application on the file client passes through the SMB client and an RDMA adapter, across an RDMA-capable network, to the file server's RDMA adapter, SMB server, NTFS/SCSI stack and disk]

10X performance improvement versus 10GbE (preliminary results based on the Windows Server 2012 beta)

Page 46:

Measuring SMB Direct Performance

46

Page 47:

Maximizing File Server Performance

Configuration   BW (MB/sec)   IOPS (512KB IOs/sec)   CPU Overhead (Privileged)
Local           10,090        38,492                 2.5%
Remote          9,852         37,584                 5.1%
VM              10,367        39,548                 4.6%

Preliminary results from SuperMicro servers, each with 2 Intel E5-2680 CPUs at 2.70GHz. Both client and server use two Mellanox ConnectX-3 network interfaces on PCIe Gen 3 x8 slots. Data goes to 4 LSI 9285-8e RAID controllers and 4 JBODs, each with 8 OCZ Talos 2 R SSDs.

Workload: 512KB IOs, 2 threads, 16 outstanding IOs per thread

10GB/sec Bandwidth with 5% CPU Overhead

47

Page 48:

Thank You

48

