Export Memblaze PBlaze4 NVMe SSD by NVMe Over Fabric or NVMe Over Fabric in SPDK Env
WHITE PAPER
Beijing Memblaze Technology Co., Ltd.
Contents
Executive Summary
NVMe OVER FABRIC Brief Introduction
SPDK Brief Introduction
About the Test
  Test environment
  Configuration of NVMe Over Fabric Target of Linux Kernel
  Configuration of NVMe Over Fabric Client
  Configuration for NVMe Over Fabric Target of SPDK
  Configuration for NVMe Over Fabric Client of SPDK
  Configuration of performance testing tool (fio)
  Configuration of performance testing tool (perf)
Test Results
  Instructions of test items
  4KB Random Read Latency Test Results
  4KB Random Write Latency Test Results
  4KB Random Read Test Results
  4KB Random Write Test Results
  64KB Sequential Read Test Results
  64KB Sequential Write Test Results
Conclusions
Reference
Executive Summary
As SSDs become more common, you will also hear more about Non-Volatile Memory Express, a.k.a. NVM Express, or more commonly NVMe. NVMe SSDs offer both high throughput and low latency. NVMe SSD capacities have grown to 3.2TB or even 6.4TB per drive, and one server can now hold 24 or even 48 U.2 NVMe SSDs, so a server fully populated with U.2 NVMe SSDs can reach hundreds of TB of capacity. Very few single-node applications can use such a large capacity, so the problem of how to export this large storage capacity to other application nodes needs to be solved.
PBlaze Series is Memblaze's enterprise-level PCIe SSD family. It supports the industry-standard NVMe protocol, provides extreme IOPS, and helps remove the I/O storage bottleneck that plagues modern data centers.
This whitepaper compares the performance and IO latency consistency of the two scenarios listed below. It also unveils the outstanding performance and superior QoS of the PBlaze4 PCIe SSD.
NVMe OVER FABRIC Brief Introduction
Much has already been said about NVMe over Fabrics. It was first publicly demonstrated in 2014, and the 1.0 specification is now complete. In essence, NVMe over Fabrics (NVMf) is the NVM Express (NVMe) block protocol tunneled through an RDMA fabric, and it has enormous potential to enable the next generation of datacenter storage development. To support broad adoption of NVMf, the Storage Performance Development Kit (SPDK) has created a reference user-space NVMf target implementation for Linux, released for community involvement via GitHub under a BSD license. In parallel, community-based Linux kernel efforts have created both a host and a target under a GPL license. All three of these implementations were released with the final 1.0 specification as of June 8, 2016. [1]
Topology of NVMe Over Fabric
1. PBlaze4 3.2TB U.2 NVMe SSD exported by NVMe over fabric (using the NVMe over fabric target of the Linux kernel);
2. PBlaze4 3.2TB U.2 NVMe SSD exported by NVMe over fabric in the SPDK environment (using the NVMe over fabric target of SPDK).
SPDK Brief Introduction
The Storage Performance Development Kit (SPDK) provides a set of tools and libraries for writing high performance, scalable, user-mode storage applications. It achieves high performance by moving all of the necessary drivers into userspace and operating in a polled mode instead of relying on interrupts, which avoids kernel context switches and eliminates interrupt handling overhead.
The bedrock of SPDK is a user space, polled-mode, asynchronous, lockless NVMe driver. This provides zero-copy, highly parallel access directly to an SSD
from a user space application. The driver is written as a C library with a single public header. Similarly, SPDK provides a user space driver for the I/OAT DMA engine present on many Intel Xeon-based platforms with all of the same properties as the NVMe driver.
SPDK also provides NVMe-oF and iSCSI servers built on top of these user space drivers that are capable of serving disks over the network. The standard Linux kernel iSCSI and NVMe-oF initiators (or even the Windows iSCSI initiator) can be used to connect clients to the servers. These servers can be up to an order of magnitude more CPU efficient than other implementations. [1]
About the Test
Test environment
Server: 2 x PowerEdge R730xd (Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 8 cores, 128GB DRAM)
Storage: Memblaze 3.2TB PBlaze4 U.2 disk (20W power limit)
OS: CentOS, kernel 4.7.0-rc2+
SPDK: spdk-master
DPDK: dpdk-stable-16.07.1
NVMe Over Fabric Target of Linux Kernel: nvmet.ko (DEFDD1F71C669E1679530B9)
NVMe Over Fabric Target Driver of Linux Kernel: nvmet-rdma (865E9D5BA7A00E6EC149124)
NVMe Over Fabric Client: nvme-cli (nvme version 0.9.77.gc5e4)
NVMe Over Fabric Client Host Driver: nvme-rdma.ko (1E2A340FDC19F7E5BF87F8C)
NVMe Over Fabric Target of SPDK: nvmf_tgt
Test Tools: fio 2.2.9 & SPDK perf (./examples/nvme/perf/perf)
Configuration of NVMe Over Fabric Target of Linux Kernel
Load the nvmet and nvmet-rdma kernel modules.
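Loading the target modules can be done with modprobe (module names as used in this test; the kernel must be built with NVMe target support):

```shell
# Load the NVMe over Fabrics target core and its RDMA transport
modprobe nvmet
modprobe nvmet-rdma

# Verify that both modules are loaded
lsmod | grep nvmet
```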
SPDK Topology
Firmware version and port type of Mellanox HCA.
RoCE Modes
RoCE has two modes, RoCE v1 and RoCE v2, as shown in the following figure.
In this case the RoCE mode is RoCEv1, as the following screenshot shows.
The Linux-IO Target uses configFS for all fabric module configuration (according to https://linux-iscsi.org/wiki/ConfigFS).
1. Create nvmet-rdma Subsystem.
mkdir /sys/kernel/config/nvmet/subsystems/nvme-subsystem-name
cd /sys/kernel/config/nvmet/subsystems/nvme-subsystem-name
nvmet is a fabric module.
2. Allow any hosts to be connected to this target.
echo 1 > attr_allow_any_host
3. Create a namespace inside the subsystem.
mkdir namespaces/10
cd namespaces/10
4. Set the path to the NVMe device (/dev/nvme0n1) and enable the namespace.
echo -n /dev/nvme0n1 > device_path
echo 1 > enable
5. Create the following directory with an NVMe port.
mkdir /sys/kernel/config/nvmet/ports/1
cd /sys/kernel/config/nvmet/ports/1
6. Set the IP address of the relevant port.
echo 10.10.10.10 > addr_traddr
7. Set RDMA as transport type, and set the transport RDMA port.
echo rdma > addr_trtype
echo 1023 > addr_trsvcid
8. Set Address Family to port.
echo ipv4 > addr_adrfam
9. Create a soft link.
ln -s /sys/kernel/config/nvmet/subsystems/nvme-subsystem-name subsystems/nvme-subsystem-name
10. Check dmesg to make sure that the nvme target is listening on the port.
Oct 28 10:51:15 localhost kernel: nvmet: adding nsid 10 to subsystem nvme-subsystem-name
Oct 28 10:53:54 localhost kernel: enabling port 1 (10.10.10.10:1023)
11. The configuration is now complete.
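For convenience, steps 1 through 10 above can be collected into a single script (subsystem name, device path, IP address, and port number as used in this test):

```shell
#!/bin/sh
# Consolidated nvmet target setup, mirroring steps 1-10 above.
SUBSYS=/sys/kernel/config/nvmet/subsystems/nvme-subsystem-name
PORT=/sys/kernel/config/nvmet/ports/1

mkdir -p "$SUBSYS"                                  # 1. create the subsystem
echo 1 > "$SUBSYS"/attr_allow_any_host              # 2. allow any host to connect
mkdir -p "$SUBSYS"/namespaces/10                    # 3. create namespace 10
echo -n /dev/nvme0n1 > "$SUBSYS"/namespaces/10/device_path   # 4. back it with the SSD
echo 1 > "$SUBSYS"/namespaces/10/enable             #    and enable it
mkdir -p "$PORT"                                    # 5. create port 1
echo 10.10.10.10 > "$PORT"/addr_traddr              # 6. port IP address
echo rdma > "$PORT"/addr_trtype                     # 7. RDMA transport,
echo 1023 > "$PORT"/addr_trsvcid                    #    service id 1023
echo ipv4 > "$PORT"/addr_adrfam                     # 8. address family
ln -s "$SUBSYS" "$PORT"/subsystems/nvme-subsystem-name   # 9. bind subsystem to port
dmesg | grep nvmet                                  # 10. confirm the target is listening
```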
Bind irqs of the Mellanox ConnectX-3 Pro HCA.
The Mellanox HCA is at NUMA node 1.
Stop irqbalance and bind the irqs of the Mellanox HCA to core2 through core18.
Target NVMe SSD irq binding:
The PBlaze4 NVMe SSD is also at NUMA node 1.
Stop irqbalance and bind the irqs of the PBlaze4 NVMe SSD to core0 through core31.
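A minimal sketch of the irq binding described above (the mlx4 interrupt names and core range are examples matching this test setup; check /proc/interrupts for the actual entries on your system):

```shell
# Stop irqbalance so it does not override manual affinity settings
systemctl stop irqbalance

# Pin each Mellanox HCA interrupt to a dedicated core (cores 2-18 here)
core=2
for irq in $(awk '/mlx4/ {sub(":","",$1); print $1}' /proc/interrupts); do
    echo $core > /proc/irq/$irq/smp_affinity_list
    core=$((core + 1))
done
```

The same loop pattern applies to the NVMe SSD interrupts (match on nvme in /proc/interrupts instead of mlx4).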
Configuration of NVMe Over Fabric Client
Load the nvme-rdma kernel module.
Firmware version and port type of Mellanox HCA.
RoCE Modes(RoCEv1)
Install and compile nvme-cli.
# nvme
nvme-0.9.77.gc5e4
Discover the available subsystems on the NVMF target.
Connect to the discovered subsystems:
# nvme connect -t rdma -n nvme-subsystem-name -a 10.10.10.10 -s 1023
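The discover and connect steps together look like this (addresses and subsystem name as used in this test):

```shell
# List the subsystems the target exposes at this address/port
nvme discover -t rdma -a 10.10.10.10 -s 1023

# Connect to the discovered subsystem over RDMA
nvme connect -t rdma -n nvme-subsystem-name -a 10.10.10.10 -s 1023

# The remote namespace should now appear as a local /dev/nvmeXnY device
nvme list
```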
Bind irqs of Mellanox ConnectX-3 Pro HCA.
The Mellanox HCA is at NUMA node 1.
Stop irqbalance and bind the irqs of the Mellanox HCA to core1, core3, ..., core31 of NUMA node 1.
Configuration for NVMe Over Fabric Target of SPDK
Set the configuration of the NVMe over fabric target of SPDK.
The PBlaze4 NVMe SSD and the Mellanox HCA are both at NUMA node 1.
The configuration of the SPDK NVMe over fabric target (nvmf.conf):
[Global]
ReactorMask 0xaaaaaaaa
[Nvmf]
MaxQueuesPerSession 256
MaxQueueDepth 512
InCapsuleDataSize 4096
MaxIOSize 131072
AcceptorCore 31
AcceptorPollRate 0
[Subsystem1]
NQN nqn.2016-06.io.spdk:cnode1
Core 3
Mode Direct
Listen RDMA 10.10.10.10:1023
NVMe 0000:86:00.0
Configure the hugepages (32 x 1GB).
DPDK needs 1GB-size Linux hugepages.
Start the NVMe over fabric target of SPDK.
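The hugepage setup and target launch can be sketched as follows (the mount point and the nvmf_tgt path are assumptions based on contemporary SPDK source layout; adjust for your checkout):

```shell
# Reserve 32 x 1GB hugepages for DPDK
echo 32 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages

# Mount a hugetlbfs with 1GB pages for SPDK/DPDK to use
mkdir -p /mnt/huge
mount -t hugetlbfs -o pagesize=1G nodev /mnt/huge

# Start the SPDK NVMe over Fabrics target with the configuration above
./app/nvmf_tgt/nvmf_tgt -c nvmf.conf
```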
There is no need to bind irqs for the SPDK NVMe over fabric test.
The following screenshot shows that no interrupts from the NVMe SSD or the Mellanox HCA need to be handled during the SPDK NVMe over fabric test.
Configuration for NVMe Over Fabric Client of SPDK
Load the nvme-rdma kernel module.
Firmware version and port type of Mellanox HCA.
RoCE Modes(RoCEv1)
Install and compile nvme-cli.
# nvme
nvme-0.9.77.gc5e4
Discover the available subsystems on the NVMF target and connect to them.
Bind irqs of Mellanox ConnectX-3 Pro HCA.
The Mellanox HCA is at NUMA node 1.
Stop irqbalance and bind the irqs of the Mellanox HCA to core1, core3, ..., core31 of NUMA node 1.
Configuration of performance testing tool (fio)
fio is an open source tool for testing storage IOPS. It supports a variety of I/O engines, including sync, mmap, libaio, posixaio, SG v3, splice, null, network, syslet, guasi and solarisaio. In this test we use the libaio engine to increase concurrent access to the SSD. The common fio test settings are:
ioengine=libaio, random_generator=tausworthe64,
direct=1, thread=1,
norandommap, randrepeat=0
The parameters bs/rw/numjobs/iodepth are configured according to the test requirements.
The number of concurrent accesses to the SSD is 256 (numjobs*iodepth).
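Putting those settings together, a 4KB random read run with 256 outstanding IOs might be invoked as follows (the job name, device path, numjobs/iodepth split, and runtime are illustrative choices, not taken from the report):

```shell
# 4KB random read, 8 jobs x 32 iodepth = 256 outstanding IOs
fio --name=4krr --filename=/dev/nvme0n1 \
    --ioengine=libaio --direct=1 --thread=1 \
    --norandommap --randrepeat=0 --random_generator=tausworthe64 \
    --rw=randread --bs=4k --numjobs=8 --iodepth=32 \
    --time_based --runtime=300 --group_reporting
```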
Configuration of performance testing tool (perf)
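A typical invocation of SPDK's bundled perf tool against the NVMe-oF target looks like this (flag names vary between SPDK versions; the flags shown here match later SPDK releases, where -o sets the IO size and -q the queue depth, so check ./perf --help for your version):

```shell
# 4KB random read, queue depth 1, 300 seconds, against the remote SPDK target
./examples/nvme/perf/perf -q 1 -o 4096 -w randread -t 300 \
    -r 'trtype:RDMA adrfam:IPv4 traddr:10.10.10.10 trsvcid:1023'
```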
Test Results
Instructions of test items
Test Item                 Test Description
Linux Kernel Local Test   Use the Linux kernel NVMe driver to access the NVMe SSD and run IO tests with fio.
NVMeOF Test               Use nvme-cli to connect to the Linux kernel NVMe over fabric target and run IO tests with fio on the client server.
SPDK Local Test           Use the SPDK user-mode NVMe driver to access the NVMe SSD and run IO tests with the SPDK perf tool.
SPDK + NVMeOF Test        Use nvme-cli to connect to the SPDK NVMe over fabric target and run IO tests with fio on the client server.
4KB Random Read Latency Test Results
The following test results show that NVMeOF adds only about 10~15us of latency in the 4KB random read test, whether or not SPDK is used.
4KB Random Read (1 Outstanding IO)
                                Linux Kernel Local Test   NVMeOF Test                SPDK + NVMeOF Test
                                (NVMe Driver)             (Target of Linux Kernel)   (Target of SPDK)
Avg. Latency (us)               98.56                     109.50                     113.93
99.00% IO Latency Under (us)    121                       131                        129
4KB Random Write Latency Test Results
The following results show that NVMeOF adds only about 9~10us of latency in the 4KB random write test, whether or not SPDK is used.
4KB Random Write (1 Outstanding IO)
                                Linux Kernel Local Test   NVMeOF Test                SPDK + NVMeOF Test
                                (NVMe Driver)             (Target of Linux Kernel)   (Target of SPDK)
Avg. Latency (us)               14.60                     25.96                      23.53
99.00% IO Latency Under (us)    18                        33                         37
4KB Random Read Test Results
The following test results show that NVMeOF has little performance impact in the ultimate 4KB random read performance test, whether or not SPDK is used.
4KB Random Read (256 Outstanding IO)
                                Linux Kernel Local Test   NVMeOF Test                SPDK Local Test      SPDK + NVMeOF Test
                                (NVMe Driver)             (Target of Linux Kernel)                        (Target of SPDK)
IOPS                            714,643                   701,184                    666,615 (1 lcore)    638,296
                                                                                     689,950 (2 lcore)
Avg. Latency (us)               354.18                    359.75                     191.99               395.59
99.00% IO Latency Under (us)    1,176                     884                        N/A                  844
Figure 1 shows that the 4KB random read performance of the Linux Kernel target is a little better than that of the SPDK target.
Figure 2 shows that the 4KB random read latency consistency of the SPDK target is a little better than that of the Linux Kernel target.
Figure 1, 4KRR IOPS; Figure 2, 4KRR Latency Consistency (lower is better)
4KB Random Write Test Results
The following test results show that NVMeOF has no performance impact in the ultimate 4KB random write performance test, whether or not SPDK is used.
4KB Random Write (256 Outstanding IO)
                                Linux Kernel Local Test   NVMeOF Test                SPDK + NVMeOF 1st Test   SPDK + NVMeOF 2nd Test
                                (NVMe Driver)             (Target of Linux Kernel)   (Target of SPDK)         (Target of SPDK)
IOPS                            163,112                   163,120                    163,063                  164,682
Avg. Latency                    1.56 ms                   1.56 ms                    1.56 ms                  1.55 ms
99.00% IO Latency Under         10 ms                     10 ms                      5.73 ms                  5.66 ms
99.99% IO Latency Under         27.26 ms                  27.01 ms                   12.48 ms                 11.97 ms
From the table above, we can see that the write latency consistency of the SPDK target is about two times better than that of the Linux kernel target.
We then ran the following tests to find the inflection point of write latency consistency.
4KB Random Write Items                NVMeOF Test                SPDK + NVMeOF Test
                                      (Target of Linux Kernel)   (Target of SPDK)
1*1 Outstanding IO
  IOPS                                33,604                     35,174
  Avg. Latency (us)                   25.96                      24.62
  99.00% IO Latency Under (us)        33                         39
  99.99% IO Latency Under (us)        107                        1,144
2*2 Outstanding IO
  IOPS                                125,676                    134,161
  Avg. Latency (us)                   28.16                      26.14
  99.00% IO Latency Under (us)        71                         41
  99.99% IO Latency Under (us)        122                        1,144
4*4 Outstanding IO
  IOPS                                155,293                    155,459
  Avg. Latency (us)                   98.92                      98.82
  99.00% IO Latency Under (us)        1,400                      1,496
  99.99% IO Latency Under (us)        4,192                      4,256
8*8 Outstanding IO
  IOPS                                155,199                    158,744
  Avg. Latency (us)                   407.97                     398.80
  99.00% IO Latency Under (us)        2,864                      2,896
  99.99% IO Latency Under (us)        5,792                      5,792
4*16 Outstanding IO
  IOPS                                154,099                    155,441
  Avg. Latency (us)                   411.22                     407.61
  99.00% IO Latency Under (us)        2,896                      2,896
  99.99% IO Latency Under (us)        5,856                      5,984
4*32 Outstanding IO
  IOPS                                155,912                    156,171
  Avg. Latency (us)                   816.81                     815.46
  99.00% IO Latency Under (us)        4,576                      4,448
  99.99% IO Latency Under (us)        9,408                      9,152
4*64 Outstanding IO
  IOPS                                155,192                    157,084
  Avg. Latency (us)                   1,634.5                    1,625.28
  99.00% IO Latency Under (us)        9,536                      5,472
  99.99% IO Latency Under (us)        20,864                     10,048
4*128 Outstanding IO
  IOPS                                154,697                    157,202
  Avg. Latency (us)                   3,283.91                   3,231.42
  99.00% IO Latency Under (us)        19,328                     7,456
  99.99% IO Latency Under (us)        46,336                     12,480
We summarize the data in the table above in the following figure. From Figure 3, you can see that the write latency consistency of the SPDK target is almost the same as that of the Linux kernel target when the outstanding IO of the test is at or below 4*32. But when the outstanding IO of the test exceeds 4*32, the write latency consistency of the Linux kernel target degrades quickly.
Figure 3, 4KRW IOPS and latency consistency, with and without SPDK.
Figure 4 shows that 4KB random write performance of Linux Kernel target is almost the same as SPDK target.
Figure 5 shows that the 4KB random write latency consistency of the SPDK target is two times better than that of the Linux Kernel target.
Figure 4, 4KRW IOPS Figure 5, 4KRW Latency Consistency (lower is better)
64KB Sequential Read Test Results
The following test results show that NVMeOF has no performance impact in the ultimate 64KB sequential read performance test, whether or not SPDK is used.
64KB Seq Read (256 Outstanding IO)
                                Linux Kernel Local Test   NVMeOF Test                SPDK + NVMeOF Test
                                (NVMe Driver)             (Target of Linux Kernel)   (Target of SPDK)
Throughput                      2770.8 MB/s               2778.2 MB/s                2669.1 MB/s
Avg. Latency                    2.88 ms                   2.87 ms                    2.99 ms
99.00% IO Latency Under         7.52 ms                   6.88 ms                    7.01 ms
Figure 6 shows that 64KB sequential read performance of Linux Kernel target is almost the same as SPDK target.
Figure 7 shows that 64KB sequential read latency consistency of Linux Kernel target is almost the same as SPDK target.
Figure 6, 64KSR IOPS; Figure 7, 64KSR Latency Consistency (lower is better)
64KB Sequential Write Test Results
The following test results show that NVMeOF has no performance impact in the ultimate 64KB sequential write performance test, whether or not SPDK is used.
64KB Seq Write (256 Outstanding IO)
                                Linux Kernel Local Test   NVMeOF Test                SPDK + NVMeOF Test
                                (NVMe Driver)             (Target of Linux Kernel)   (Target of SPDK)
Throughput                      1728.1 MB/s               1731.6 MB/s                1732.6 MB/s
Avg. Latency                    1.15 ms                   1.14 ms                    1.14 ms
99.00% IO Latency Under         3.57 ms                   3.34 ms                    3.53 ms
Figure 8 shows that 64KB sequential write performance of Linux Kernel target is almost the same as SPDK target.
Figure 9 shows that 64KB sequential write latency consistency of Linux Kernel target is almost the same as SPDK target.
Figure 8, 64KSW IOPS Figure 9, 64KSW Latency Consistency (lower is better)
Conclusions
This test report demonstrates that exporting a Memblaze PBlaze4 SSD over NVMe over fabric has almost no impact on ultimate performance, whether or not SPDK is used. NVMe over fabric adds only about 9~15us of extra latency. The write latency consistency of the SPDK target stays stable even under high outstanding IO pressure, while that of the Linux kernel target degrades steadily as outstanding IO pressure increases. We therefore recommend using the SPDK target under heavy business write pressure to fully deliver the performance of Memblaze PBlaze Series SSDs.
Reference
[1] http://www.spdk.io/.
DISCLAIMER
Information in this document is provided in connection with Memblaze products. Memblaze provides this document "as is", without warranty of any kind, either expressed or implied, including, but not limited to, fitness for a particular purpose. Memblaze may make improvements and/or changes in this document or in the product described in this document at any time without notice. The products described in this document may contain design defects or errors known as anomalies or errata which may cause the products' functions to deviate from published specifications.
COPYRIGHT
© 2016 Memblaze Corp. All rights reserved. No part of this document may be reproduced, transmitted, transcribed, stored in a retrieval system, or translated into any language in any form or by any means without the written permission of Memblaze Corp.
TRADEMARKS
Memblaze is a trademark of Memblaze Corporation. Other names mentioned in this document are trademarks/registered trademarks of their respective owners.
USING THIS DOCUMENT
Though Memblaze has reviewed this document and every effort has been made to ensure that this document is current and accurate, more information may have become available subsequent to the production of this guide. In that event, please contact your local Memblaze sales office or your distributor for the latest specifications before placing your product order.