© Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
NeVer Mind networking: Using shared non-volatile memory in scale-out software
Stanko Novakovic*, Paolo Faraboschi, Kimberly Keeton, Rob Schreiber, Edouard Bugnion*, Babak Falsafi*
* EPFL, Switzerland; all other authors: HP Labs, USA
Multiple non-volatile memory technologies
Magnetic disk
• Block-based storage, access via programmed IO or DMA (e.g. ATA)
Flash (PCIe-attached, via NVMe)
• Block-based storage, access via queue pairs (i.e. SQ, CQ) and DMA
Non-volatile Random Access Memory (NVRAM)
• Persistent, with performance characteristics similar to DRAM
• Byte-addressable, direct access via LD/ST
• Examples: Resistive RAM (RRAM), Phase-change memory (PCM)
Rack-scale systems with shared non-volatile memory
                  Shared disk architecture   Shared NVMe namespaces   The Machine / Firebox
Technology        Magnetic disk              Flash                    NVRAM
Interface         PIO, DMA                   QP-based                 LD/ST
Cache coherence   No                         No                       No
1. Direct shared access
2. Latency: small factor of DRAM
3. Bandwidth: DDR bandwidth
The Machine from HP Labs
High-performance, high-capacity rack-scale computing
Photonic interconnect
Special-purpose cores
Massive memory pool
• Petabytes of byte-addressable NVRAM based on memristor technology
• Shared across thousands of cores via photonic interconnect
Outline
Introduction to shared NVRAM
→ Shared NVRAM model
Hash Table
Graph Processing System
Conclusion
Shared NVRAM model
[Figure: servers, each with its own CPU, DRAM, and I/O, attached through a memory interconnect to shared NVRAM]
Servers
- Do not share DRAM
- Share NVRAM (but not cache-coherently)
Shared NVRAM
- Used as a heap
- Latency: 4-5x that of DRAM
- Bandwidth: DDR rate
Can datacenter software benefit from such an architecture?
Experimental shared NVRAM platform
Kernel Virtual Machines (KVM)
• A fraction of each guest's address space is shared
• User code is instrumented to delay NVRAM accesses
T_trans = lat + data_size / bandwidth, with lat ≈ 0.5 µs and bandwidth = DDR3/QPI (unchanged)
[Figure: apps running in VMs on Linux; DRAM access takes 100 ns, shared NVRAM access ~0.5 µs]
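The delay formula on this slide can be sketched as a small helper. This is only an illustration of the arithmetic, not the instrumentation used in the experiments: the function name and the default bandwidth of 10 GB/s are assumptions, since the slide only says that bandwidth stays at the DDR3/QPI rate.

```python
def transfer_delay_us(data_size_bytes, latency_us=0.5, bandwidth_gb_per_s=10.0):
    """Emulated NVRAM access time: T_trans = lat + data_size / bandwidth.

    latency_us follows the ~0.5 us figure from the slide.
    bandwidth_gb_per_s is a placeholder value (assumption), not a measured rate.
    """
    # 1 GB/s = 1e9 bytes/s = 1e3 bytes per microsecond
    bytes_per_us = bandwidth_gb_per_s * 1e3
    return latency_us + data_size_bytes / bytes_per_us
```

With these placeholder numbers the fixed latency term dominates small transfers, and the bandwidth term only becomes comparable once an access moves several kilobytes.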
Using shared NVRAM in datacenter software systems
Data serving
• Distributed hash tables, graph stores
→ Share data items via NVRAM
Data processing (a.k.a. analytics)
• MapReduce, distributed graph processing, etc.
→ Use shared NVRAM as communication medium
Today: Hash Table w/ client-side hashing
[Figure: servers (CPU, I/O) and a client that selects the target with shardID ← h(key)…]
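Client-side hashing, as in shardID ← h(key), can be sketched as follows. The slide does not name the hash function h, so SHA-1 from the standard library is an assumption here, as is the function name.

```python
import hashlib


def shard_for_key(key: str, num_shards: int) -> int:
    """Map a key to a shard ID on the client (hypothetical h: SHA-1 mod N)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards
```

Every request for a given key always lands on the same server, so a popular key concentrates load on a single shard.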
This work: CREW Hash Table for shared NVRAM
CREW: Concurrent Read Exclusive Write
[Figure: servers 1..N (CPU, DRAM, I/O) connected via a memory interconnect to NVRAM blocks 1..N]
Data stored in NVRAM
Each server owns write permission to one block and read permission to the whole NVRAM
Permissions, e.g. for server 1: Read-Write = {1}, Read-Only = {1, 2, ..., N}
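The CREW permission rule can be modeled in a few lines. This is a toy, single-process sketch: the class and method names are hypothetical, Python dictionaries stand in for NVRAM blocks, and servers are numbered from 0 rather than from 1 as on the slide.

```python
class CrewNvram:
    """Toy model of CREW: one writable block per server, all blocks readable."""

    def __init__(self, num_servers):
        # blocks[i] stands in for the NVRAM block owned by server i.
        self.blocks = [dict() for _ in range(num_servers)]

    def write(self, server_id, block_id, key, value):
        # Exclusive write: only the owning server may modify a block.
        if server_id != block_id:
            raise PermissionError(
                f"server {server_id} has read-only access to block {block_id}"
            )
        self.blocks[block_id][key] = value

    def read(self, server_id, block_id, key):
        # Concurrent read: any server may read any block.
        return self.blocks[block_id].get(key)
```

Because each block has exactly one writer, readers on other servers never race with a second writer, which is what makes the scheme workable without cache coherence across servers.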
CREW Hash Table for shared NVRAM (reads)
[Figure: client, servers 1..N (CPU, DRAM, I/O), memory interconnect, NVRAM block N; the client picks serverID = uniform_rand()]
CREW Hash Table for shared NVRAM (writes)
[Figure: client, servers 1..N (CPU, DRAM, I/O), network, memory interconnect, NVRAM block N; the client picks serverID = uniform_rand()]
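One way to read the two CREW figures is sketched below. It rests on assumptions the slides do not spell out: that the block owner is found by hashing the key, that a read is served directly by whichever server the client picked, and that a write arriving at a non-owner is forwarded over the network to the owner. All names are hypothetical.

```python
import random
import zlib


def owner_of(key, num_servers):
    """Hypothetical owner lookup: CRC32 of the key modulo the server count."""
    return zlib.crc32(key.encode("utf-8")) % num_servers


def route(op, key, num_servers, rng):
    """Return which server a request enters at and which one serves it."""
    entry = rng.randrange(num_servers)  # serverID = uniform_rand()
    owner = owner_of(key, num_servers)
    if op == "read":
        # Any server can read any NVRAM block, so no extra hop is needed.
        return {"entry": entry, "served_by": entry, "network_hop": False}
    if op == "write":
        # Only the owner may write its block (assumed forwarding on a miss).
        return {"entry": entry, "served_by": owner, "network_hop": entry != owner}
    raise ValueError(f"unknown operation: {op}")
```

Under this reading, reads spread evenly across servers regardless of key popularity, while writes still converge on the owning server.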
Hash table for shared NVRAM prototype
Based on Redis KVS
• Each server runs a separate instance of Redis
• Shared read-only access to data items
YCSB clients run on separate servers
• Configured either to compute a hash or to pick a server in round-robin fashion
Per-server network utilization (0.99 Zipf workload)
[Figure: bar chart of 1KB read lookups (thousands, axis 0 to 400) for each of servers 1 to 12; series: client-side hashing, Redis (hashing)]
Consistent hashing on client side can cause load imbalance
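The imbalance can be reproduced qualitatively with a small simulation. This is a synthetic sketch, not the YCSB/Redis experiment: the request count, key count, seed, and the use of CRC32 as the hash are all assumptions, and only the 0.99 Zipf skew and the 12 servers come from the slide.

```python
import random
import zlib
from collections import Counter


def per_server_load(num_requests=20000, num_keys=1000, num_servers=12,
                    skew=0.99, seed=1):
    """Count requests per server under key hashing and under round-robin."""
    rng = random.Random(seed)
    keys = [f"key{i}" for i in range(num_keys)]
    # Zipf-like popularity: the key of rank r is drawn with weight 1 / r**skew.
    weights = [1.0 / (rank ** skew) for rank in range(1, num_keys + 1)]
    requests = rng.choices(keys, weights=weights, k=num_requests)

    # Client-side hashing: the key alone decides the server.
    hashed = Counter(zlib.crc32(k.encode("utf-8")) % num_servers for k in requests)
    # Round-robin: the request sequence number decides the server.
    round_robin = Counter(i % num_servers for i in range(num_requests))

    return ([hashed.get(s, 0) for s in range(num_servers)],
            [round_robin.get(s, 0) for s in range(num_servers)])
```

Comparing max() and min() of the two returned lists shows the spread: round-robin counts differ by at most one request, while the hashed counts depend on where the few most popular keys happen to land.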
[Figure: bar chart of 1KB read lookups (thousands, axis 0 to 400) for each of servers 1 to 12; series: client-side hashing, Redis (hashing), and round-robin, Redis (shared NVRAM); annotations: BW/CPU bottleneck, 99th-percentile latency]
What does uniform utilization buy us?
Higher throughput due to shared access w/o violating SLA
Today: Hash Table w/ proxying
[Figure: client, servers (CPU, I/O), proxy, network; the client picks serverID = uniform_rand()]
HT w/ proxying vs. HT w/ shared access
[Figure: lookups/second (thousands, axis 0 to 70) vs. fraction of reads (0 to 1); series: round-robin w/ proxying, Redis (proxying), and round-robin, Redis (shared NVRAM)]
~2x improvement in performance for read-only workload
Data serving recap
Looked at two hash table (KVS) designs
• With client-side hashing and with proxying
A KVS that uses shared NVRAM allows shared read-only access
• Better load balancing than hashing
• Lower end-to-end latency than proxying
Outline
Introduction to shared NVRAM
Shared NVRAM model
Hash Table
→ Graph Processing System
Conclusion
Bulk Synchronous Parallel (BSP) on scale-out
[Figure: servers 1..N (CPU, I/O); phase shown: Compute, with buffered vertex updates]
BSP graph processing on scale-out
[Figure: servers 1..N (CPU, I/O) connected by a network; phases shown: Compute, Communicate]
BSP graph processing on shared NVRAM
[Figure, built up over four slides: servers 1..N (CPU, I/O) connected via a memory interconnect to shared NVRAM]
Compute
Communicate
1. FLUSH: DRAM → NVRAM
2. NOTIFY
3. PULL: NVRAM → DRAM
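The compute, flush, notify, and pull steps can be put together in a single-process sketch of one superstep. Everything here is illustrative: the function name, the shape of the compute callback, and the use of a dictionary for NVRAM and a list of flags for notifications are assumptions, since the slides do not describe how NOTIFY is delivered.

```python
def bsp_superstep(partitions, compute, nvram):
    """Run one BSP superstep for all servers and return each server's inbox.

    partitions[i] is server i's local (DRAM) state.
    compute(i, state) returns {destination_server: [updates]} for server i.
    nvram is a dict standing in for the shared NVRAM heap.
    """
    n = len(partitions)

    # Compute: each server buffers its vertex updates in local DRAM.
    buffered = [compute(i, partitions[i]) for i in range(n)]

    # FLUSH (DRAM -> NVRAM): each server writes only its own outbox region.
    for i in range(n):
        nvram[i] = buffered[i]

    # NOTIFY: a flag per server stands in for the notification message.
    ready = [True] * n

    # PULL (NVRAM -> DRAM): each server reads the updates addressed to it.
    inboxes = []
    for dest in range(n):
        inbox = []
        for src in range(n):
            if ready[src]:
                inbox.extend(nvram[src].get(dest, []))
        inboxes.append(inbox)
    return inboxes
```

Each server writes only its own outbox and reads everyone else's, which matches the single-writer pattern used by the CREW hash table.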
BSP graph processing for shared NVRAM prototype
• Shared NVRAM BSP framework implemented in C++ from scratch
• Algorithms implemented as compute kernels
• Compare BSP over TCP/IP and BSP over NVRAM
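As an illustration of what a compute kernel might look like, here is a plain PageRank iteration written in Python rather than C++. It is not the framework's API, which the slides do not show: the function name, the adjacency-dictionary input, and the damping factor of 0.85 are assumptions. Vertices without out-edges simply contribute nothing in this sketch.

```python
def pagerank(out_edges, num_iters=20, damping=0.85):
    """Iterative PageRank over {vertex: [successor vertices]}."""
    n = len(out_edges)
    rank = {v: 1.0 / n for v in out_edges}
    for _ in range(num_iters):
        # "Communicate": every vertex sends an equal share of its rank
        # along each outgoing edge.
        incoming = {v: 0.0 for v in out_edges}
        for v, targets in out_edges.items():
            if targets:
                share = rank[v] / len(targets)
                for t in targets:
                    incoming[t] += share
        # "Compute": combine the received shares into the new rank.
        rank = {v: (1.0 - damping) / n + damping * incoming[v] for v in out_edges}
    return rank
```

In a BSP setting the inner loop over edges corresponds to the updates that are exchanged between servers in each superstep.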
PageRank on Twitter graph
[Figure: execution time (seconds, axis 0 to 8) on 2, 4, 8, and 16 nodes; series: TCP/IP (scale-out) and NVM (shared NVRAM)]
Accelerating shuffle phase helps improve overall execution time
Related work
• Concurrent Read Exclusive Write (FaRM, NSDI'14; MICA, NSDI'14)
• Graph update aggregation (Pregel, SIGMOD'10)
• Shared disks and shared NVMe namespaces
Conclusion
[Figure: servers (CPU, DRAM, I/O) connected via a memory interconnect to shared NVRAM]
Rack-scale systems w/ shared NVRAM
• Direct access, byte-addressable
This talk: how to make use of this architecture
• In common datacenter (DC) apps
• Studied KVS and graph processing
Thank you