
Order Number: 335139-001US

Performance Gains of IBM Spectrum Scale* with Intel® SSD Data Center Family with NVMe* Technology

White Paper

November 2016


Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase.

Test and System Configurations: Refer to Table 1: IBM Spectrum Scale/Storage Test Environment.

No computer system can be absolutely secure. Intel does not assume any liability for lost or stolen data or systems or any damages resulting from such losses.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.

By using this document, in addition to any agreements you have with Intel, you accept the terms set forth below.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries.

*Other names and brands may be claimed as the property of others.

Copyright © 2016 Intel Corporation. All rights reserved.


Contents

Abstract
About NVM Express
IBM Spectrum Scale
    Clustered File System
    Local Read-Only Cache (LROC)
SPEC SFS 2014 Benchmark
Test Environment
Benchmark Results
Summary
Acknowledgment
Appendix
    VDA Benchmark Results
    IBM Spectrum Scale Configuration


Abstract

This paper provides performance measurements to highlight the benefits of Intel® Solid State Drive Data Center (Intel® SSD DC) P3700 Series with NVMe* technology in combination with IBM Spectrum Scale*, IBM's high-performance clustered file system, and its LROC (local read-only cache) technology in a clustered shared-nothing compute node environment with local disk storage.

While the latest SPEC SFS* 2014 SP1 benchmark release is used as the workload for the test results described in this paper, this paper itself does not represent an official SPEC SFS 2014 benchmark submission.

Table 1: IBM Spectrum Scale/Storage Test Environment

Type                        Qty  Description                                  Commands Used
System/Intel Board ID/Maker 3    S2600GZ (Intel White Box)                    lshw -short | grep system
CPU                         2    Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz,   lscpu
                                 12 cores
Memory                      16   8192MB DDR3 DIMM @ 1600 MHz,                 dmidecode -t 17 | more
                                 total memory 128GB
NVMe SSDs                   2    Intel NVMe PCIe SSD P3700 800GB              isdct show -intelssd
Linux install               NA   CentOS Linux* release 7.2.1511 (Core)        cat /etc/*release
Kernel version              NA   3.10.0-327.13.1.el7.x86_64                   uname -a
Ethernet Switch             NA   ExtremeXOS 10Gbit Summit x70 series          NA
Ethernet 10Gbit card        1    Intel Corporation 82599ES 10-Gigabit         lspci | egrep -i 'network|ethernet'
                                 SFI/SFP+ Network Controllers

NOTE: All systems share the same configuration.
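The "Commands Used" column above lists the tools used to capture this inventory. As a minimal illustrative sketch (device paths are the ones used later in this paper; adapt as needed), the NVMe cards and the 10GbE adapter can be cross-checked on each node with:

# List Ethernet controllers (command from Table 1) and NVMe controllers,
# then confirm the NVMe block devices that are later configured as LROC.
lspci | egrep -i 'network|ethernet'
lspci | egrep -i 'non-volatile|nvme'
lsblk /dev/nvme0n1 /dev/nvme1n1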

About NVM Express*

NVM Express* (NVMe) is an industry standard for using non-volatile memory (NVM), for example, NAND Flash memory, in a solid state drive (SSD). NVMe standardizes the interface from the storage driver to the SSD, including command set and features (e.g., power management). The standard enables native OS drivers, such as Windows*, Linux*, and VMware*, to provide a seamless user experience. The standard was defined from the ground up for NVM, so it is capable of much higher IO rates and lower latency than legacy storage standards (SATA, SAS) that were designed for hard disk drives. The benefits of the NVMe driver's efficiency with respect to the SSDs used in this report are best described at http://itpeernetwork.intel.com/intel-ssd-p3700-series-nvme-efficiency.

The starting point for the standards specification and resources on the standardization effort for NVMe is at www.nvmexpress.org. The specification can be found at www.nvmexpress.org/specifications.


IBM Spectrum Scale

Clustered File System

IBM Spectrum Scale1 is a high-performance clustered file system developed by IBM, formerly known as General Parallel File System2 (GPFS). It is capable of managing petabytes of data and billions of files, providing world-class storage management with extreme scalability, performance, and automatic policy-based storage tiering. IBM Spectrum Scale is a unified software-defined storage solution for file and object storage. It is aimed at high-performance, large-scale workloads, including high-performance computing as well as big data and analytics environments.

Local Read-Only Cache (LROC)

Many applications benefit greatly from large local caches. Data in the cache is locally available at very low latencies, and the average response times of an application's read operations are reduced through cache hits. A high cache hit ratio also reduces the load on the shared network and on the backend storage itself, thus providing a benefit for all nodes of the cluster, even those without a large local read cache.

Figure 1: IBM Spectrum Scale - Local Read-Only Cache (LROC)

Source: IBM Corporation

NVMe SSDs provide an economical way to create very large local caches. The local read-only cache (LROC3) of the IBM Spectrum Scale file system as shown in Figure 1 utilizes flash storage which serves as an extension to the local client's buffer pool. As user data or metadata is evicted from the buffer pool in memory, it can be stored in the local cache for future reference. A subsequent access will retrieve the data from the local cache (if it has not already been evicted based on policies) rather than from the original source location. The data stored in the local cache, like data stored in memory, remains consistent. If a conflicting access occurs, the data is invalidated from all caches. In a like manner, if a node is restarted, all data stored in the cache is discarded.

LROC improves performance as it keeps cached data and metadata close to the client, reducing latency and overall load on the network and storage backend. LROC is designed for read-only access to data with synchronous write-through. At a lower cost than DRAM, the local cache extends the local client node's buffer pool for the IBM Spectrum Scale file system, which treats the cached data as volatile and ensures cache consistency through byte-range tokens and checksums.

1 http://www.ibm.com/systems/storage/spectrum/scale/
2 https://www.usenix.org/system/files/login/articles/login_june_04_hildebrand.pdf
3 http://www.ibm.com/support/knowledgecenter/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.adm.doc/bl1adm_lroc.htm
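Once LROC devices are in use, a node's cache state and hit statistics can be inspected with mmdiag; the "mmdiag: lroc" excerpts shown later in this paper are output of this command (run as root on the client node):

# Show LROC device status, capacity in use, and store/recall statistics.
mmdiag --lroc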


SPEC SFS 2014 Benchmark

The SPEC SFS 2014 benchmark4 is the latest version of the Standard Performance Evaluation Corporation (SPEC)5 benchmark suite measuring file server throughput and response time and providing a standardized method for comparing performance across different vendor platforms. It measures the maximum sustainable throughput of a storage solution that is providing files to the application layer on the client nodes.

Table 2: Workload characteristics of the SPEC SFS 2014 VDA benchmark

Source: IBM Corporation

The SPEC SFS 2014 benchmark is a complete solution benchmark that measures performance of the entire storage configuration as it interacts with application-based workloads. It is fully multi-client aware, and is a distributed application that coordinates and conducts the testing across all of the client nodes that are used to test a storage solution. It runs at the application system call level and is file system type agnostic. Being protocol independent, it runs over any version of NFS or SMB/CIFS, clustered file systems, object oriented file systems, local file systems, or any other POSIX compatible file system. Because the benchmark runs at the application system call level, all components of the storage solution impact the performance of the solution; this includes the load generators themselves as well as any physical or virtual hardware between the load generators and where the data ultimately rests on stable storage. The workload consists of a selection of typical file operations.

4 https://www.spec.org/sfs2014/
5 https://www.spec.org/


The SPEC SFS 2014 benchmark provides four different workloads which represent real data processing file system environments: SWBUILD (Software Build), VDA (Video Data Acquisition / Streaming), VDI (Virtual Desktop Infrastructure), and DATABASE (Database Workload). Each of these workloads can be run independently.

The performance results in this paper are based on the Video Data Acquisition (VDA) workload of the SPEC SFS* 2014 benchmark, which simulates applications that store data acquired from a temporally volatile source (e.g. surveillance cameras) and measures, within bit rate and fidelity constraints, the number of simultaneous streams. The business metric for the benchmark is the number of STREAMS. A stream refers to an instance of the application storing data from a single source (e.g. one video feed). The benchmark consists of two workload objects: a data stream (VDA1) and companion applications (VDA2). Each stream corresponds to a roughly 36 Mb/s bit rate, which is in the upper range of high definition video. The specific workload characteristics of the SPEC SFS 2014 Video Data Acquisition (VDA) benchmark are listed in Table 2. The result of a benchmark execution is a set of business metric / response-time data points that define a performance curve for the solution under test.
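For orientation, the aggregate bandwidth implied by a given number of concurrent VDA streams can be estimated directly from the roughly 36 Mb/s per-stream bit rate (a back-of-the-envelope sketch, not part of the benchmark itself):

# Estimate aggregate bandwidth for N concurrent VDA streams at ~36 Mb/s each.
STREAMS=50
echo "$((STREAMS * 36)) Mb/s aggregate (~$((STREAMS * 36 / 8)) MB/s)"   # 1800 Mb/s, ~225 MB/s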

Test Environment

The test environment employed in this paper is shown in Figure 2.

The three compute nodes are Intel manufactured servers based on Intel® S2600GZ server boards. Each server is equipped with two 12-core Intel® Xeon® (2.70 GHz) processors, 128 GB of DDR3 1600 MHz memory, four onboard Intel I350 1Gbps Ethernet ports (only one 1Gbps port is connected), one dual-port Intel 82599ES 10Gbps Ethernet network adapter (only one 10Gbps port is connected), three Seagate* ST91000640NS hard disk drives (1TB 7.2k SATA 6Gb/s HDD), and two Intel® SSD DC P3700 Series 800GB NVMe expansion cards. The client nodes run CentOS Linux release 7.2.1511 (kernel 3.10.0-327.13.1.el7.x86_64) as the operating system. They are configured in a shared-nothing cluster configuration running IBM Spectrum Scale version 4.2.1.0 as the shared cluster file system with 2-way replication.

Figure 2: IBM Spectrum Scale LROC setup with Intel® SSD DC P3700 Series NVMe* cards

Source: IBM Corporation

Storage for the IBM Spectrum Scale clustered file system is provided by three local 1TB 7.2k SATA 6Gb/s HDDs in each node, providing a total raw file system capacity of 8.2 TiB. Each of the three local 1TB HDDs (/dev/sdb, /dev/sdc, /dev/sdd) is configured as an NSD (Network Shared Disk) for the IBM Spectrum Scale cluster with one failure group per compute node:

%nsd: nsd=ap011a device=/dev/sdb servers=ap011 usage=dataAndMetadata failureGroup=1
%nsd: nsd=ap011b device=/dev/sdc servers=ap011 usage=dataAndMetadata failureGroup=1
%nsd: nsd=ap011c device=/dev/sdd servers=ap011 usage=dataAndMetadata failureGroup=1
%nsd: nsd=ap012a device=/dev/sdb servers=ap012 usage=dataAndMetadata failureGroup=2
%nsd: nsd=ap012b device=/dev/sdc servers=ap012 usage=dataAndMetadata failureGroup=2
%nsd: nsd=ap012c device=/dev/sdd servers=ap012 usage=dataAndMetadata failureGroup=2
%nsd: nsd=ap013a device=/dev/sdb servers=ap013 usage=dataAndMetadata failureGroup=3
%nsd: nsd=ap013b device=/dev/sdc servers=ap013 usage=dataAndMetadata failureGroup=3
%nsd: nsd=ap013c device=/dev/sdd servers=ap013 usage=dataAndMetadata failureGroup=3


The two Intel® SSD DC P3700 Series 800GB NVMe expansion cards appear as two additional 745.2 GiB disk devices (/dev/nvme0n1, /dev/nvme1n1) on each node, which are configured as LROC devices for the IBM Spectrum Scale clustered file system (a sketch of applying such NSD stanza files follows the listing below):

%nsd: nsd=ap011c0 device=/dev/nvme0n1 servers=ap011 usage=localCache
%nsd: nsd=ap011c1 device=/dev/nvme1n1 servers=ap011 usage=localCache
%nsd: nsd=ap012c0 device=/dev/nvme0n1 servers=ap012 usage=localCache
%nsd: nsd=ap012c1 device=/dev/nvme1n1 servers=ap012 usage=localCache
%nsd: nsd=ap013c0 device=/dev/nvme0n1 servers=ap013 usage=localCache
%nsd: nsd=ap013c1 device=/dev/nvme1n1 servers=ap013 usage=localCache
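A hedged sketch of how such NSD stanza files are typically applied before creating the file system (the stanza file name nsdfs is taken from the mmcrfs command below; the exact steps used for this setup are not shown in the paper):

# Create the NSDs described in the stanza file, then verify the NSD-to-device mapping.
mmcrnsd -F nsdfs
mmlsnsd -X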

The IBM Spectrum Scale clustered file system is created from all 9 hard disk drives (HDDs) as follows, with 2-way replication, 1 MiB block size (-B), and a log file size of 32 MiB (-L):

# mmcrfs -F nsdfs -B 1M -L 32M -T /gpfs/fs0 -A no -m 2 -M 3 -r 2 -R 3 -S relatime

The 1 MiB block size proves to be a performance improvement over the default block size of 256 KiB for the VDA workload, which uses large I/O transfer sizes up to 1 MiB.

The 10 Gbps Ethernet connection is configured as an additional subnet (mmchconfig subnets) for IBM Spectrum Scale to leverage the higher bandwidth of the 10 Gbps Ethernet network.
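As an illustrative sketch (not necessarily the exact invocation used for the test), the subnet can be added with mmchconfig; 192.168.10.0 is the subnet shown in the cluster configuration in the Appendix:

# Prefer the 10 Gbps network for intra-cluster traffic.
mmchconfig subnets="192.168.10.0"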

The following IBM Spectrum Scale configuration attributes have been changed (mmchconfig) from their default values in order to better adapt to the hardware configuration and the workload profile of the VDA benchmark (see the example after this list):

maxFilesToCache=100k

maxStatCache=0

maxBufferDescs=1m
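A possible way to apply these three changes in a single call is sketched below (mmchconfig accepts multiple attribute=value pairs separated by commas; depending on the attribute, the GPFS daemon may need to be restarted for the change to take effect):

# Apply the tuned attributes, then confirm them (as reflected in the Appendix).
mmchconfig maxFilesToCache=100k,maxStatCache=0,maxBufferDescs=1m
mmlsconfig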

Of these attributes, maxBufferDescs proves to enhance the performance with LROC most significantly in this configuration. Increasing maxFilesToCache helps to slightly increase the cache hit ratio of data objects stored in LROC, while maxStatCache does not make a noticeable difference but is set to 0 instead of the default value. Increasing the default pagepool size within reasonable limits does not show a significant improvement for the performance of the VDA workload in this configuration, neither with LROC enabled nor disabled.

Although more IBM Spectrum Scale configuration attributes could have been fine-tuned for the benchmark, only the most significant attributes were changed. The IBM Spectrum Scale default pagepool size of 1 GiB, as well as the default OS kernel runtime parameters (sysctl), remained unchanged for this test.

To disable or enable LROC, the following configuration attributes are used (mmchconfig); a usage sketch follows the table:

Enable LROC:            Disable LROC:
lrocInodes yes          lrocInodes no
lrocDirectories yes     lrocDirectories no
lrocData yes            lrocData no
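A hedged sketch of toggling LROC cluster-wide with mmchconfig using the attributes above (the yes/no values correspond to the enabled and disabled benchmark runs):

# Enable LROC for the benchmark run with LROC, ...
mmchconfig lrocInodes=yes,lrocDirectories=yes,lrocData=yes
# ... or disable it for the run without LROC.
mmchconfig lrocInodes=no,lrocDirectories=no,lrocData=no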


Benchmark Results

To evaluate the performance gains when using the Intel® SSD DC P3700 Series 800GB NVMe expansion cards as local read-only cache (LROC) for the IBM Spectrum Scale file system, two valid6 test runs of the SPEC SFS 2014 VDA benchmark are performed, one with LROC disabled and one with LROC enabled.

Each VDA test run simulates applications that store data acquired from a temporally volatile source (e.g. surveillance cameras) and measures how many simultaneous streams can be sustained while meeting bit rate and fidelity constraints. The workload is expressed in a business metric as the number of SPEC SFS2014_vda STREAMS. A stream refers to an instance of the application storing data from a single source (for example, one video feed), where each stream corresponds to a roughly 36 Mb/s bit rate, which is in the upper range of high definition video.

Figure 3: SPEC SFS 2014 VDA benchmark run results

The test run with LROC disabled is comparable to a shared-nothing IBM Spectrum Scale cluster that does not have Intel® SSD DC P3700 Series NVMe expansion cards installed, and depends solely on the achievable performance of the local HDDs and the effective use of the IBM Spectrum Scale pagepool in memory with a default size of 1 GiB per node, in this configuration.

The test run with LROC enabled in this shared-nothing cluster configuration benefits from an extension of the local IBM Spectrum Scale pagepool in memory by using the additional capacity of the low-latency flash storage of the Intel® SSD DC P3700 Series NVMe expansion cards.

The results of the VDA runs are shown in Figure 3. We see that a valid 10-load-point VDA benchmark run with LROC disabled can maintain up to 40 SPEC SFS2014_vda STREAMS with an overall response time of 64.4 ms, while an LROC enabled configuration using the Intel® SSD DC P3700 Series NVMe expansion cards as local read-only cache (LROC) and large extension of the local buffer pool can maintain up to 50 SPEC SFS2014_vda STREAMS with an overall response time of only 24.36 ms.

6 Valid SPEC SFS 2014 benchmark results must include at least 10 load points within a single benchmark run. The nominal interval spacing is the maximum requested load divided by the number of requested load points.
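A quick recomputation of the reported gains from the two runs above (illustrative only):

# 40 vs. 50 STREAMS and 64.4 ms vs. 24.36 ms overall response time.
awk 'BEGIN { printf "throughput: +%.0f%%, response time: %.1fx lower\n", (50/40 - 1) * 100, 64.4 / 24.36 }'
# prints: throughput: +25%, response time: 2.6x lower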


The total amount of data created in the 10-step VDA benchmark run in this configuration is 2.2 TiB with 4840 files and directories for 50 SPEC SFS2014_vda STREAMS (with LROC) and 1.8 TiB with 3880 files and directories for 40 SPEC SFS2014_vda STREAMS (without LROC).

Enabling LROC on flash storage provided by the Intel® SSD DC P3700 Series NVMe expansion cards can not only increase the overall throughput but also drastically reduce I/O response times due to increased read cache hit ratios with extremely low latencies from the local NVMe devices. As a local read-only cache, LROC only accelerates the application's read operations.

When using LROC in this configuration with the VDA benchmark, we observe a cache hit ratio of around 43% for directory objects and 48% for data objects that are stored in the LROC cache on the NVMe devices.

=== mmdiag: lroc ===

LROC Device(s): '2465100A57BE3186#/dev/nvme0n1;2465100A57BE3187#/dev/nvme1n1;' status Running

Cache inodes 1 dirs 1 data 1 Config: maxFile 0 stubFile 0

Max capacity: 1526194 MB, currently in use: 382756 MB

Directory objects stored 1232 (38 MB) recalled 530 (21 MB) = 43.02 %

Data objects stored 801187 (800254 MB) recalled 389835 (342207 MB) = 48.66 %
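The percentages in the excerpt above appear to be simply "objects recalled" divided by "objects stored"; an illustrative recomputation with the numbers shown:

awk 'BEGIN { printf "directories: %.2f%%  data: %.2f%%\n", 530/1232*100, 389835/801187*100 }'
# prints: directories: 43.02%  data: 48.66%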

The iostat statistics shown in Table 3 are captured during active intervals of the VDA benchmark with LROC enabled. They reveal extremely low mean service times (svctm) of only 0.054-0.056 ms for the Intel SSD DC P3700 Series NVMe devices, in contrast to 8.5-8.7 ms for the SATA HDDs (a factor of 150x). The average wait time for read requests issued to the device to be served (r_await), which includes the time spent by the requests in the queue and the time spent by the device servicing the request, is even more significant, with mean read request wait times of only 0.350 ms for LROC read operations from the NVMe devices compared to 127-158 ms for read operations from the spinning HDDs (a factor of 400x). As LROC only accelerates local read operations through cache hits from its large local cache, it benefits considerably from the extremely low read latencies of the Intel SSD DC P3700 Series NVMe devices.
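Such per-device statistics can be collected with iostat from the sysstat package; a possible invocation (not necessarily the exact one used for Table 3) reporting extended statistics every 10 seconds for the NVMe and SATA devices is:

# Extended device statistics (-x), throughput in MB (-m), with timestamps (-t), every 10 s.
iostat -xmt 10 nvme0n1 nvme1n1 sdb sdc sdd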

Table 3: Iostat statistics for NVMe and HDD devices during a VDA benchmark run with LROC

nvme0n1 (NVMe / LROC):   await (ms)  r_await (ms)  w_await (ms)  svctm (ms)  util (%)
  Min.                     0.0600      0.00          0.0100        0.01000      0.20
  1st Qu.                  0.2400      0.32          0.1200        0.05000      2.65
  Median                   0.2700      0.34          0.1500        0.06000      4.80
  Mean                     0.2666      0.35          0.1514        0.05418     11.92
  3rd Qu.                  0.2900      0.37          0.1700        0.06000      7.20
  Max.                     0.6200      1.73          0.5000        0.11000    100.70

nvme1n1 (NVMe / LROC):   await (ms)  r_await (ms)  w_await (ms)  svctm (ms)  util (%)
  Min.                     0.0900      0.1000        0.0000        0.01000      0.10
  1st Qu.                  0.2500      0.3300        0.1200        0.05000      2.75
  Median                   0.2800      0.3600        0.1500        0.06000      5.15
  Mean                     0.2756      0.3673        0.1481        0.05627     12.26
  3rd Qu.                  0.3000      0.3900        0.1700        0.06000      7.70
  Max.                     0.5700      1.1200        0.4400        0.11000    100.65

sdb (SATA HDD / file system data):   await (ms)  r_await (ms)  w_await (ms)  svctm (ms)  util (%)
  Min.                                   2.48        2.00           0.00        2.090       1.35
  1st Qu.                               62.09       46.66          62.17        8.230      85.25
  Median                               148.33      120.50         147.97        9.520      99.60
  Mean                                 211.14      158.35         211.94        8.711      86.27
  3rd Qu.                              303.01      269.00         302.58       10.040     100.00
  Max.                                1050.52      603.00        1053.50       10.920     100.20

sdc (SATA HDD / file system data):   await (ms)  r_await (ms)  w_await (ms)  svctm (ms)  util (%)
  Min.                                   1.45        0.0            0.32        1.360       0.85
  1st Qu.                               70.73       53.5           70.36        7.607      88.47
  Median                               147.97      103.5          148.80        9.480      99.90
  Mean                                 189.74      148.4          190.52        8.564      86.77
  3rd Qu.                              291.16      239.8          292.49        9.950     100.00
  Max.                                 612.61      459.0          614.01       10.780     100.10

sdd (SATA HDD / file system data):   await (ms)  r_await (ms)  w_await (ms)  svctm (ms)  util (%)
  Min.                                   1.46        0.00           0.41        1.200       0.90
  1st Qu.                               69.67       32.34          69.66        8.045      82.38
  Median                               151.93       84.00         153.20        9.290      99.95
  Mean                                 256.10      127.65         257.35        8.536      85.46
  3rd Qu.                              414.72      192.50         416.86        9.820     100.00
  Max.                                1166.17      511.00        1174.29       10.620     100.45

The tested configuration is neither limited by the Intel SSD DC P3700 Series NVMe devices nor by the 10 Gbps Ethernet network, but is limited by the low number of local HDDs in the compute nodes. With only three local HDDs per compute node, the overall performance for the VDA workload is disk-bound, as each of the spinning 7.2k SATA disks is only capable of delivering a very limited number of random I/O operations per second (in this case 50-60 random read I/O operations per second per HDD). The mean utilization (util) of the HDDs during the VDA benchmark run is on average 86%, compared to 12% for the Intel SSD DC P3700 Series NVMe devices.

In this configuration, increasing the IBM Spectrum Scale pagepool size (within reasonable limits) does not yield the same performance improvements for the VDA benchmark as enabling LROC, whose capacity is much larger, amounting to several times the total memory of the compute node.

Summary

This paper demonstrates that significant performance gains can be achieved in shared-nothing compute clusters with the IBM Spectrum Scale file system when taking advantage of its local read-only cache (LROC) feature and utilizing Intel SSD DC P3700 Series NVMe expansion cards. The low latencies of the Intel SSD DC P3700 Series NVMe expansion cards lead to extremely fast response times on read cache hits from the local read-only cache (LROC) and thus can significantly reduce the overall response times of the application's I/O workload, as demonstrated in this paper for the SPEC SFS 2014 Video Data Acquisition (VDA) benchmark. Enabling LROC and utilizing Intel SSD DC P3700 Series NVMe devices increases the achievable SPEC SFS 2014 VDA throughput of the given shared-nothing compute node configuration by 25%7 while at the same time improving the overall response times by more than 2x.

Many applications and workloads can benefit from large local read caches. Adding local solid state drives (SSDs) or NVMe flash expansion cards to IBM Spectrum Scale client nodes and configuring the flash storage as a large local read-only cache (LROC) can provide an economical way to enhance the overall performance of a shared-nothing compute node cluster with IBM Spectrum Scale. The large local cache together with the extremely low latencies of Intel SSD DC P3700 Series NVMe expansion cards can significantly improve application performance by increasing throughput and reducing I/O response times.

Acknowledgment

Intel would like to thank primary authors Gero F Schmidt and Sven Oehme of IBM's Almaden Research Center for their contributions and partnership in providing this paper. This paper was a collaboration of Intel hardware labs and IBM Software.

7 40 SPEC SFS2014_vda STREAMS without LROC vs. 50 SPEC SFS2014_vda STREAMS with LROC


Appendix

VDA Benchmark Results

VDA Benchmark Results with LROC Disabled

Table 4: LROC Disabled

Table 5: SPEC SFS 2014 sfs_rc Parameters for Benchmark Run with LROC Disabled

BENCHMARK=VDA
LOAD=4
INCR_LOAD=4
NUM_RUNS=10
CLIENT_MOUNTPOINTS=ap011:/gpfs/fs0/specrun ap012:/gpfs/fs0/specrun ap013:/gpfs/fs0/specrun
EXEC_PATH=/specsfs/netmist
USER=root
WARMUP_TIME=300
IPV6_ENABLE=0

VDA Benchmark Results with LROC Enabled

Table 6: LROC Enabled

Table 7: SPEC SFS 2014 sfs_rc Parameters for Benchmark Run with LROC Enabled

BENCHMARK=VDA
LOAD=5
INCR_LOAD=5
NUM_RUNS=10
CLIENT_MOUNTPOINTS=ap011:/gpfs/fs0/specrun ap012:/gpfs/fs0/specrun ap013:/gpfs/fs0/specrun
EXEC_PATH=/specsfs/netmist
USER=root
WARMUP_TIME=300
IPV6_ENABLE=0


IBM Spectrum Scale Configuration

LROC Statistics after VDA Benchmark Run with LROC enabled

=== mmdiag: lroc ===

LROC Device(s): '2465100A57BE3186#/dev/nvme0n1;2465100A57BE3187#/dev/nvme1n1;' status Running

Cache inodes 1 dirs 1 data 1 Config: maxFile 0 stubFile 0

Max capacity: 1526194 MB, currently in use: 382756 MB

Statistics from: Thu Sep 29 09:32:43 2016

Total objects stored 802419 (800295 MB) recalled 390366 (342232 MB)

objects failed to store 11 failed to recall 334 failed to inval 0

objects queried 0 (0 MB) not found 0 = 0.00 %

objects invalidated 419667 (417792 MB)

Inode objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 %

Inode objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB)

Inode objects failed to store 0 failed to recall 0 failed to query 0 failed to inval 0

Directory objects stored 1232 (38 MB) recalled 530 (21 MB) = 43.02 %

Directory objects queried 0 (0 MB) = 0.00 % invalidated 518 (22 MB)

Directory objects failed to store 1 failed to recall 0 failed to query 0 failed to inval 0

Data objects stored 801187 (800254 MB) recalled 389835 (342207 MB) = 48.66 %

Data objects queried 0 (0 MB) = 0.00 % invalidated 419147 (417767 MB)

Data objects failed to store 10 failed to recall 334 failed to query 0 failed to inval 0

agent inserts=25607631, reads=10982029

response times (usec):

insert min/max/avg=1/49152/27

read min/max/avg=1/196608/682

ssd writeIOs=1600144, writePages=204879104

readIOs=87265249, readPages=87109111

response times (usec):

write min/max/avg=192/49152/336

read min/max/avg=24/196608/294


LROC Configuration Attributes (LROC enabled)

lrocChecksum 0

lrocData 1

lrocDataMaxBufferSize 0

lrocDataMaxFileSize 0

lrocDataStubFileSize 0

lrocDeviceMaxSectorsKB 64

lrocDeviceNrRequests 1024

lrocDeviceQueueDepth 31

lrocDevices 2465100A57BE3186#/dev/nvme0n1;2465100A57BE3187#/dev/nvme1n1;

lrocDeviceScheduler deadline

lrocDeviceSetParams 1

lrocDirectories 1

lrocInodes 1

IBM Spectrum Scale Cluster Attributes

GPFS cluster information
========================

GPFS cluster name: spectrumscale.ap011

GPFS cluster id: 961253720001384271

GPFS UID domain: spectrumscale.ap011

Remote shell command: /usr/bin/ssh

Remote file copy command: /usr/bin/scp

Repository type: CCR

 Node  Daemon node name  IP address    Admin node name  Designation
--------------------------------------------------------------------
    1  ap011             36.101.16.10  ap011            quorum-manager
    2  ap012             36.101.16.9   ap012            quorum-manager
    3  ap013             36.101.16.8   ap013            quorum-manager

Configuration data for cluster spectrumscale.ap011:
---------------------------------------------------

clusterName spectrumscale.ap011

clusterId 961253720001384271

autoload no

dmapiFileHandleSize 32

minReleaseLevel 4.2.1.0

ccrEnabled yes

cipherList AUTHONLY

subnets 192.168.10.0

maxFilesToCache 100k

maxStatCache 0

maxBufferDescs 1m

adminMode central


File systems in cluster spectrumscale.ap011:
--------------------------------------------

/dev/fs0

Disk name  NSD volume ID     Device        Devtype  Node name  Remarks
---------------------------------------------------------------------------------------------------

ap011a 2465100A57B49341 /dev/sdb generic ap011 server node

ap011b 2465100A57B49342 /dev/sdc generic ap011 server node

ap011c 2465100A57B49343 /dev/sdd generic ap011 server node

ap011c0 2465100A57BE3186 /dev/nvme0n1 generic ap011 server node

ap011c1 2465100A57BE3187 /dev/nvme1n1 generic ap011 server node

ap012a 2465100957B49341 /dev/sdb generic ap012 server node

ap012b 2465100957B49344 /dev/sdc generic ap012 server node

ap012c 2465100957B49346 /dev/sdd generic ap012 server node

ap012c0 2465100957BE3188 /dev/nvme0n1 generic ap012 server node

ap012c1 2465100957BE318B /dev/nvme1n1 generic ap012 server node

ap013a 2465100857B49346 /dev/sdb generic ap013 server node

ap013b 2465100857B49353 /dev/sdc generic ap013 server node

ap013c 2465100857B49355 /dev/sdd generic ap013 server node

ap013c0 2465100857BE318E /dev/nvme0n1 generic ap013 server node

ap013c1 2465100857BE3190 /dev/nvme1n1 generic ap013 server node


IBM Spectrum Scale File System Attributes

flag                 value                    description
-------------------  ------------------------ -----------------------------------

-f 32768 Minimum fragment size in bytes

-i 4096 Inode size in bytes

-I 32768 Indirect block size in bytes

-m 2 Default number of metadata replicas

-M 3 Maximum number of metadata replicas

-r 2 Default number of data replicas

-R 3 Maximum number of data replicas

-j scatter Block allocation type

-D nfs4 File locking semantics in effect

-k all ACL semantics in effect

-n 32 Estimated number of nodes that will mount file system

-B 1048576 Block size

-Q none Quotas accounting enabled

none Quotas enforced

none Default quotas enabled

--perfileset-quota No Per-fileset quota enforcement

--filesetdf No Fileset df enabled?

-V 15.01 (4.2.0.0) File system version

--create-time Thu Sep 29 09:36:56 2016 File system creation time

-z No Is DMAPI enabled?

-L 33554432 Logfile size

-E Yes Exact mtime mount option

-S relatime Suppress atime mount option

-K whenpossible Strict replica allocation option

--fastea Yes Fast external attributes enabled?

--encryption No Encryption enabled?

--inode-limit 8584960 Maximum number of inodes

--log-replicas 0 Number of log replicas

--is4KAligned Yes is4KAligned?

--rapid-repair Yes rapidRepair enabled?

--write-cache-threshold 0 HAWC Threshold (max 65536)

-P system Disk storage pools in file system

-d ap011a;ap011b;ap011c;ap012a;ap012b;ap012c;ap013a;ap013b;ap013c Disks in file system

-A no Automatic mount option

-o none Additional mount options

-T /gpfs/fs0 Default mount point

--mount-priority 0 Mount priority

