pNFS over RDMA - Possibilities - SNIA



pNFS/RDMA: Possibilities

Chuck Lever, Oracle Corporation

2015 Storage Developer Conference. Copyright © 2015 Oracle and its affiliates. All Rights Reserved.

The opinions expressed in this presentation are the presenter’s own, and do not represent the views of Oracle or anyone else.

What If . . . ?

❒ Given these storage trends:
  ❒ Throughput of networks is increasing
  ❒ Latency of persistent storage is dropping exponentially
  ❒ Capacity is off the charts
❒ How can NFS make good use of our new Persistent Memory overlords?


Traditional NFS

Traditional NFS Operation

❒ Each NFS file resides on one server
❒ Applications locate files via a POSIX directory structure
❒ Clients access data via NFS READ and WRITE operations

Traditional NFS Server Storage Topology

[Diagram labels: NFS clients, Ethernet, NFS server, XFS, SAN]

Traditional NFS Weaknesses

❒ One RPC issued at a time per TCP socket
❒ Typically one or a few TCP sockets are shared across a server’s shares
❒ Data throughput is constrained by the server

Traditional NFS FILE_SYNC WRITE

[Sequence diagram between NFS Client and NFS Server. Labels: Application writes; TCP send (repeated); Server updates durable storage; Write is complete]

Two-phase Commit

❒ To avoid waiting for durable storage on every WRITE, NFSv3 introduced unstable WRITE plus COMMIT
  ❒ Client flushes data to server asynchronously
  ❒ Client sends COMMIT
  ❒ Server makes written data durable
❒ Transport bottlenecks remained
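
The unstable WRITE plus COMMIT exchange above can be sketched as a toy model. This is only an illustration under stated assumptions: `ToyServer` and `flush` are hypothetical names, the server is an in-memory stand-in rather than a real NFS implementation, and the client here sends its WRITEs one after another instead of asynchronously. The verifier check reflects the NFSv3 rule that a client resends uncommitted data when the server's write verifier changes.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyServer:
    """In-memory stand-in for an NFS server; hypothetical, not a real API."""
    verifier: int = 1                                        # changes on every reboot
    cache: Dict[int, bytes] = field(default_factory=dict)    # volatile
    durable: Dict[int, bytes] = field(default_factory=dict)  # survives reboot

    def write_unstable(self, offset: int, data: bytes) -> int:
        """Unstable WRITE: buffer the data, reply without touching durable storage."""
        self.cache[offset] = data
        return self.verifier

    def commit(self) -> int:
        """COMMIT: make everything buffered so far durable."""
        self.durable.update(self.cache)
        self.cache.clear()
        return self.verifier

    def reboot(self) -> None:
        """Lose uncommitted data and pick a new verifier."""
        self.cache.clear()
        self.verifier += 1


def flush(server: ToyServer, dirty: Dict[int, bytes], crash_once: bool = False) -> int:
    """Flush dirty blocks with unstable WRITE + COMMIT; return the number of passes."""
    if not dirty:
        return 0
    passes = 0
    while True:
        passes += 1
        seen = {server.write_unstable(off, data) for off, data in dirty.items()}
        if crash_once:          # simulate a server restart between WRITE and COMMIT
            server.reboot()
            crash_once = False
        if seen == {server.commit()}:
            return passes       # verifier unchanged: data is durable
        # verifier changed: the server may have lost data, so resend everything
```

With no restart the flush finishes in one pass; with a simulated restart between WRITE and COMMIT the client notices the verifier mismatch and needs a second pass.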


What Is pNFS?

Data / Metadata Separation

❒ NFS protocol manages metadata
  ❒ Directory structure
  ❒ File open and lock state
  ❒ File data layout information
  ❒ Fall-back I/O mechanism
❒ Separate protocol and transports handle I/O

pNFS Layout Types

❒ A layout type:
  ❒ Specifies which transport protocol to use
  ❒ Specifies how to locate file data
  ❒ Is specified separately from the NFS protocol
❒ A layout instance tells where a file’s data resides
  ❒ Which NFS server and file, or
  ❒ Which SCSI LUN at which LBA
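
The difference between a layout type and a layout instance can be made concrete with a small data-structure sketch. The class and field names below (`FileLayout`, `BlockLayout`, `describe_io_path`) are illustrative assumptions, not the on-the-wire structures defined by the pNFS RFCs; each class stands for a layout type, and each object is a layout instance saying where one file's data lives.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class FileLayout:
    """Layout type whose data path is NFS: data is a file on some NFS server."""
    server: str          # hypothetical data server name
    filehandle: bytes    # opaque handle for the file on that server


@dataclass(frozen=True)
class BlockLayout:
    """Layout type whose data path is SCSI: data is a block range on a LUN."""
    lun: str             # hypothetical LUN identifier
    lba: int             # starting logical block address


Layout = Union[FileLayout, BlockLayout]


def describe_io_path(layout: Layout) -> str:
    """The layout type picks the protocol; the instance supplies the location."""
    if isinstance(layout, FileLayout):
        return f"NFS READ/WRITE to {layout.server}, filehandle {layout.filehandle.hex()}"
    if isinstance(layout, BlockLayout):
        return f"SCSI I/O to LUN {layout.lun} at LBA {layout.lba}"
    raise TypeError(f"unknown layout type: {type(layout).__name__}")
```

For example, `describe_io_path(BlockLayout("lun7", 2048))` yields a SCSI path, while a `FileLayout` instance yields an NFS path to a specific server and file.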

Parallel NFS In A Nutshell

❒ Applications retain single-server view of files
❒ NFS server manages data layout
❒ Each NFS client can stripe file I/O across multiple storage services
❒ Data and metadata operations run concurrently
❒ Clients and servers share a storage fabric
  ❒ SCSI, iSCSI, iSER, SRP
  ❒ Object-based storage
  ❒ NFS
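
Client-side striping can be illustrated with a little arithmetic. The sketch below assumes the simplest possible scheme, fixed-size stripe units assigned to storage services in round-robin order; real layouts carry more parameters than this, and `split_io` is a hypothetical helper name.

```python
from typing import List, Tuple


def split_io(offset: int, length: int, stripe_unit: int,
             num_servers: int) -> List[Tuple[int, int, int]]:
    """Split one file I/O into (server_index, file_offset, chunk_length) pieces.

    Assumes round-robin striping: stripe unit k lives on server k % num_servers.
    """
    if stripe_unit <= 0 or num_servers <= 0:
        raise ValueError("stripe_unit and num_servers must be positive")
    pieces = []
    end = offset + length
    while offset < end:
        stripe_no = offset // stripe_unit
        server = stripe_no % num_servers
        # Stop at the end of the request or the end of this stripe unit.
        chunk = min(end - offset, (stripe_no + 1) * stripe_unit - offset)
        pieces.append((server, offset, chunk))
        offset += chunk
    return pieces
```

Each piece can then be issued to its storage service independently, which is what lets a single client drive several services in parallel.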

pNFS Server Storage Topology

[Diagram labels: NFS clients, Ethernet, NFS server, XFS, SCSI, SAN]

Example Usage Scenarios

❒ High Performance Computing
  ❒ Parallel I/O
  ❒ Greater file capacity
❒ Deployments where storage clients and servers share a storage fabric
  ❒ Each client can be directed to a particular server
  ❒ Each file can be placed on a particular server


What Is NFS/RDMA?

What Is Remote Direct Memory Access?

❒ I/O-like access of the physical memory on another host
❒ Strong ordering of operations
❒ Asynchronous: completion fires when an operation finishes
❒ Datagram channel: SEND and RECV
❒ Data transfer: READ and WRITE
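
The ordering and completion semantics above can be modelled in a few lines. This is a toy, single-process model, not the verbs API: `ToyQueuePair` and its methods are hypothetical names, two `bytearray` objects stand in for local and remote memory, and `progress()` plays the role of the adapter executing one posted request.

```python
from collections import deque


class ToyQueuePair:
    """Toy model of posting RDMA READ/WRITE requests and polling completions."""

    def __init__(self, local_mem: bytearray, remote_mem: bytearray):
        self.local_mem = local_mem
        self.remote_mem = remote_mem
        self._posted = deque()      # work requests, executed in posted order
        self._completed = deque()   # completion events, one per finished request

    def post_write(self, wr_id, local_off, remote_off, length):
        """RDMA WRITE: copy local memory into remote memory. Returns immediately."""
        self._posted.append(("WRITE", wr_id, local_off, remote_off, length))

    def post_read(self, wr_id, local_off, remote_off, length):
        """RDMA READ: copy remote memory into local memory. Returns immediately."""
        self._posted.append(("READ", wr_id, local_off, remote_off, length))

    def progress(self):
        """Execute the oldest posted request (stands in for the adapter)."""
        if not self._posted:
            return
        kind, wr_id, lo, ro, n = self._posted.popleft()
        if kind == "WRITE":
            self.remote_mem[ro:ro + n] = self.local_mem[lo:lo + n]
        else:
            self.local_mem[lo:lo + n] = self.remote_mem[ro:ro + n]
        self._completed.append(wr_id)

    def poll(self):
        """Return the id of the next finished request, or None if nothing finished."""
        return self._completed.popleft() if self._completed else None
```

Posting never blocks; the caller learns that an operation finished only by polling, and completions come back in the order the requests were posted.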

RDMA Ready For 100Gbps Fabrics

❒ Zero-copy is possible on both send and receive
  ❒ No CPU cache footprint until app accesses data
❒ Transport resources are pre-allocated
  ❒ No resource allocation in data path
  ❒ Reduced opportunity for deadlock
❒ Data transfer is concurrent with other transport operations

NFS/RDMA Concepts

❒ Each RPC is conveyed by RDMA operations
  ❒ Ultra-low round-trip latency
❒ RNICs handle bulk data transfer
  ❒ Low CPU overhead
  ❒ High bandwidth

Data / Metadata Separation

❒ Non-I/O operations conveyed via RDMA SEND
  ❒ GETATTR, LOOKUP, and so on
❒ Data operations (i.e. NFS READ and WRITE) utilize RDMA READ and WRITE
  ❒ Server initiates all RDMA transfers
  ❒ After that, neither host CPU is involved
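
The split between SEND-only operations and operations that also move bulk data can be summarised as a lookup. The function below is a descriptive sketch only: `rdma_plan` is a hypothetical name, it ignores the case where a small payload travels inline with the SEND, and it relies on the fact that the server pulls NFS WRITE payloads with RDMA READ and pushes NFS READ payloads with RDMA WRITE.

```python
from typing import List, Tuple


def rdma_plan(nfs_op: str) -> List[Tuple[str, str]]:
    """Return the (initiator, RDMA operation) steps for one NFS operation."""
    call = ("client", "RDMA SEND")    # RPC call message
    reply = ("server", "RDMA SEND")   # RPC reply message
    if nfs_op == "WRITE":
        # Server pulls the payload out of client memory.
        return [call, ("server", "RDMA READ"), reply]
    if nfs_op == "READ":
        # Server pushes the payload into client memory.
        return [call, ("server", "RDMA WRITE"), reply]
    # GETATTR, LOOKUP, and so on: no separate bulk transfer.
    return [call, reply]
```

In every plan the only host that initiates RDMA READ or RDMA WRITE is the server, matching the slide.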

NFS/RDMA FILE_SYNC WRITE

[Sequence diagram between NFS Client and NFS Server. Labels: Application writes; RDMA SEND; RDMA READ and READ result (repeated); Server updates durable storage; RDMA SEND; Write is complete]

Example Usage Scenarios

❒ Use NFS/RDMA instead of NFS/TCP on IPoIB
  ❒ See “RDMA On 100Gbps Fabrics”
❒ Latency-sensitive SLAs
❒ CPU-intensive client workloads
❒ One-time bulk-data movement (e.g. backup)


pNFS and NFS/RDMA

Why pNFS/RDMA?

❒ Client gets direct access to durable storage
❒ E.g. ultra-low latency Persistent Memory
❒ No protocol translation overhead
❒ Data not even read into server DRAM

Why pNFS/RDMA?

❒ Multiple transport connections per client mount point
❒ Multiple QPs
❒ Multiple RNICs

Why pNFS/RDMA?

❒ Single converged fabric shared between pNFS clients and servers
❒ Rather than “pNFS/TCP with SCSI”
❒ Instead use “pNFS/RDMA with SRP”

pNFS/RDMA Server Storage Topology

[Diagram labels: NFS clients, RDMA Fabric, NFS server, XFS]


Next Steps

What’s Needed For NFS/RDMA

❒ NFSv4.1 on RDMA is a pre-requisite
  ❒ Bi-directional RPC-over-RDMA
  ❒ Lots of backchannel session slots
  ❒ NFSv4.1 Upper Layer Binding to RPC-over-RDMA

What’s Needed For pNFS

❒ A new pNFS layout type is not required for operation with SRP or iSER
❒ Proposal: a new pNFS layout type for accessing remote Persistent Memory devices directly
  ❒ Device naming
  ❒ Ensuring data durability
  ❒ Error handling and fencing
  ❒ Authentication, data privacy


Questions / Discussion


Appendix

NFS Reference Material

❒ pNFS Standards
  ❒ NFSv4.1: RFC 5661
  ❒ pNFS layouts: RFCs 5662 - 5665
❒ NFS/RDMA Standards
  ❒ RPC-over-RDMA: RFC 5666
  ❒ NFS/RDMA ULB: RFC 5667