The Direct Access File System (DAFS)
Matt DeBergalis, Peter Corbett, Steve Kleiman, Arthur Lent, Dave Noveck, Tom Talpey, Mark Wittle
Network Appliance, Inc.
Usenix FAST ’03
Tom Talpey
tmt@netapp.com
Outline
• DAFS
• DAT / RDMA
• DAFS API
• Benchmark results
DAFS – Direct Access File System
• File access protocol, based on NFSv4 and RDMA, designed specifically for high-performance data center file sharing (local sharing)
• Low latency, high throughput, and low overhead
• Semantics for clustered file sharing environments
DAFS Design Points
• Designed for high performance
  – Minimize client-side overhead
  – Base protocol: remote DMA, flow control
  – Operations: batch I/O, cache hints, chaining
• Direct application access to transport resources
  – Transfers file data directly to application buffers
  – Bypasses operating system overhead
  – File semantics
• Improved semantics to enable local file sharing
  – Superset of CIFS, NFSv3, NFSv4 (and local file systems!)
  – Consistent high-speed locking
  – Graceful client and server failover, cluster fencing
• http://www.dafscollaborative.org
DAFS Protocol
• Session-based
• Strong authentication
• Optimized message format
• Multiple data transfer models
• Batch I/O
• Cache hints
• Chaining
DAFS Protocol: Enhanced Semantics
• Rich locking
• Cluster fencing
• Shared key reservations
• Exactly-once failure semantics
• Append mode, create-unlinked, delete-on-last-close
DAT – Direct Access Transport
• Common requirements and an abstraction of services for RDMA (Remote Direct Memory Access)
  – Portable, high-performance transport underpinning for DAFS and applications
  – Defines communications endpoints, transfer semantics, memory description, signalling, etc.
• Transfer models (see the sketch below):
  – Send (like traditional network flow)
  – RDMA Write (write directly to advertised peer memory)
  – RDMA Read (read from advertised peer memory)
• Transport independent
  – 1 Gb/s VI/IP, 10 Gb/s InfiniBand, future RDMA over IP
• http://www.datcollaborative.org
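
To make the transfer models concrete, here is a minimal C sketch of the DAT service abstraction. It is hypothetical and merely uDAPL-flavored: none of the type or function names below are the DAT specification's exact identifiers.

/* Hypothetical, uDAPL-flavored sketch of the three DAT transfer models.
 * All names below are illustrative, not the DAT spec's identifiers. */
#include <stddef.h>
#include <stdint.h>

typedef struct endpoint endpoint_t;   /* a connected communications endpoint */

typedef struct {                      /* local, registered memory            */
    void  *addr;
    size_t len;
} local_iov_t;

typedef struct {                      /* peer memory advertised to us:       */
    uint64_t addr;                    /* address + length + protection tag   */
    size_t   len;                     /* (RMR context)                       */
    uint32_t rmr_context;
} remote_buf_t;

/* Send: flow-controlled message consumed by a receive descriptor the peer
 * posted in advance; the peer's CPU copies the payload out
 * (this is the DAFS inline path). */
extern int post_send(endpoint_t *ep, const local_iov_t *iov, int n);

/* RDMA Write: push local data directly into advertised peer memory, with
 * no peer CPU involvement (used by the server for DAFS direct reads). */
extern int post_rdma_write(endpoint_t *ep, const local_iov_t *iov, int n,
                           const remote_buf_t *dst);

/* RDMA Read: pull data directly from advertised peer memory
 * (used by the server for DAFS direct writes). */
extern int post_rdma_read(endpoint_t *ep, local_iov_t *iov, int n,
                          const remote_buf_t *src);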
DAFS Inline Read
[Figure: DAFS inline read message flow. (1) The client posts a READ_INLINE request from a send descriptor; (2) the server receives it into a receive descriptor and returns the file data from a server buffer inline in the REPLY; (3) the REPLY lands in the client's receive descriptor and the data is copied into the application buffer.]
DAFS Direct Read
[Figure: DAFS direct read message flow. (1) The client posts a READ_DIRECT request advertising its application buffer; (2) the server places the file data directly into that buffer with an RDMA Write; (3) the server's REPLY completes the operation. A server-side sketch follows below.]
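
As a hedged illustration of steps (2) and (3), here is what the server side of a direct read might look like, reusing the hypothetical DAT types from the sketch above; read_file_into() and send_reply() are stand-ins for DAFS server internals, not real API calls.

/* Server side of a DAFS direct read (hypothetical sketch; reuses the
 * illustrative endpoint_t/local_iov_t/remote_buf_t types and
 * post_rdma_write() from the DAT sketch above). */
extern void read_file_into(void *buf, uint64_t off, size_t len); /* stand-in */
extern void send_reply(endpoint_t *ep, int status, size_t len);  /* stand-in */

void serve_read_direct(endpoint_t *ep, remote_buf_t client_buf,
                       uint64_t off, size_t len)
{
    static char server_buf[64 * 1024];
    local_iov_t iov = { server_buf, len };

    read_file_into(server_buf, off, len);       /* fill from cache/disk    */
    post_rdma_write(ep, &iov, 1, &client_buf);  /* (2) place data directly
                                                   in the client's
                                                   advertised buffer       */
    send_reply(ep, 0, len);                     /* (3) REPLY completes op  */
}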
DAFS Inline Write
[Figure: DAFS inline write message flow. (1) The client posts a WRITE_INLINE request carrying the application buffer's data in the message itself; (2) the server's receive descriptor delivers it into a server buffer and the write is applied; (3) the server returns a REPLY.]
DAFS Direct Write
[Figure: DAFS direct write message flow. (1) The client posts a WRITE_DIRECT request advertising its application buffer; (2) the server pulls the data directly from that buffer with an RDMA Read; (3) the server returns a REPLY.]
DAFS-enabled Applications
[Figure: three ways to package a DAFS client, shown as three software stacks.]

• Raw Device Adapter (kernel-level plug-in)
  – Looks like a raw disk; the unchanged application issues standard disk I/O syscalls through a device driver into the in-kernel DAFS library and DAT provider library, then the NIC driver and RDMA NIC
  – Very limited access to DAFS features
  – Performance similar to direct-attached disk
• Kernel File System (kernel-level plug-in)
  – Peer to the local file system; the unchanged application uses standard file I/O syscalls through the file system into the in-kernel DAFS library and DAT provider library, then the NIC driver and RDMA NIC
  – Limited access to DAFS features
  – Performance similar to a local file system
• User Library
  – The modified application makes DAFS API calls into a user-space DAFS library and DAT provider library, with only the NIC driver and RDMA NIC beneath them
  – Best performance; full application access to DAFS semantics
  – The paper focuses on this style
DAFS API
• File based: exports DAFS semantics (see the usage sketch below)
• Designed for highest application performance
• Lowest client CPU requirements of any I/O system
• Rich semantics that meet or exceed local file system capabilities
• Portable and consistent interface and semantics across platforms
  – No need for different mount options, caching policies, client-side SCSI commands, etc.
  – The DAFS API is completely specified in an open standard document, not in OS-specific documentation
• Operating system avoidance
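
A minimal sketch of what user-level DAFS API usage looks like. The dafs_* identifiers here are hypothetical stand-ins for the open specification's names, chosen only to illustrate the register-then-transfer pattern.

/* Hypothetical sketch of user-level DAFS API usage; all dafs_*
 * identifiers are illustrative, not the specification's exact names. */
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

typedef struct dafs_fs   dafs_fs_t;    /* a mounted DAFS file system   */
typedef struct dafs_file dafs_file_t;  /* an open file handle          */
typedef struct dafs_mr   dafs_mr_t;    /* a registered memory region   */

#define DAFS_RDONLY 0                  /* illustrative open flag       */

extern dafs_mr_t   *dafs_mem_register(dafs_fs_t *fs, void *buf, size_t len);
extern void         dafs_mem_deregister(dafs_mr_t *mr);
extern dafs_file_t *dafs_open(dafs_fs_t *fs, const char *path, int flags);
extern void         dafs_close(dafs_file_t *f);
extern ssize_t      dafs_read(dafs_file_t *f, dafs_mr_t *mr, void *buf,
                              size_t len, uint64_t off);

int read_one_block(dafs_fs_t *fs)
{
    static char buf[8192];

    /* Register (prewire) the buffer once, so the RDMA NIC can place
     * data into it directly, bypassing the operating system. */
    dafs_mr_t *mr = dafs_mem_register(fs, buf, sizeof buf);

    dafs_file_t *f = dafs_open(fs, "/data/blocks", DAFS_RDONLY);

    /* Direct read: the server RDMA Writes straight into buf; the
     * client CPU never copies the data. */
    ssize_t n = dafs_read(f, mr, buf, sizeof buf, 0);

    dafs_close(f);
    dafs_mem_deregister(mr);
    return n < 0 ? -1 : 0;
}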
The DAFS API
• Why a new API?
  – Backward compatibility with POSIX is fruitless
    • File descriptor sharing, signals, fork()/exec()
  – Performance
    • RDMA (memory registration), completion groups
  – New semantics
    • Batch I/O, cache hints, named attributes, open with key, delete on last close
  – Portability
    • OS independence and semantic consistency
Key DAFS API Features
• Asynchronous
  – High-performance interfaces support native asynchronous file I/O
  – Many I/Os can be issued and awaited concurrently
• Memory registration
  – Efficiently prewires application data buffers, permitting RDMA (direct data placement)
• Extended semantics
  – Batch I/O, delete on last close, open with key, cluster fencing, locking primitives
• Flexible completion model (see the sketch below)
  – Completion groups segregate related I/O
  – Applications can wait on specific requests, any of a set, or any number of a set
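
A hedged sketch of the asynchronous issue/wait pattern with a completion group. The dafs_cg_* and dafs_read_async identifiers are hypothetical, and dafs_file_t/dafs_mr_t are reused from the usage sketch above.

/* Hypothetical sketch of asynchronous reads through a completion group;
 * identifiers are illustrative, and dafs_file_t/dafs_mr_t come from the
 * usage sketch above. */
#include <stddef.h>
#include <stdint.h>

typedef struct dafs_cg  dafs_cg_t;   /* completion group             */
typedef struct dafs_req dafs_req_t;  /* one outstanding I/O request  */

extern dafs_cg_t  *dafs_cg_create(void);
extern void        dafs_cg_destroy(dafs_cg_t *cg);
extern dafs_req_t *dafs_cg_wait(dafs_cg_t *cg);  /* any one completion */
extern void        dafs_read_async(dafs_file_t *f, dafs_mr_t *mr,
                                   void *buf, size_t len, uint64_t off,
                                   dafs_cg_t *cg);

#define NREQ 8

void read_concurrently(dafs_file_t *f, dafs_mr_t *mr, char bufs[NREQ][8192])
{
    dafs_cg_t *cg = dafs_cg_create();

    /* Issue all NREQ reads up front; none of them blocks. */
    for (int i = 0; i < NREQ; i++)
        dafs_read_async(f, mr, bufs[i], sizeof bufs[i],
                        (uint64_t)i * sizeof bufs[i], cg);

    /* Harvest completions in whatever order they finish: wait on any
     * of the set, as the flexible completion model allows. */
    for (int i = 0; i < NREQ; i++)
        (void)dafs_cg_wait(cg);

    dafs_cg_destroy(cg);
}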
Key DAFS API Features
• Batch I/O (see the sketch below)
  – Essentially free I/O: amortizes the cost of I/O issue over many requests
  – Asynchronous notification of any number of completions
  – Scatter/gather file regions and memory regions independently
  – Support for high-latency operations
  – Cache hints
• Security and authentication
  – Credentials for multiple users
  – Varying levels of client authentication: none, default, plaintext password, HOSTKEY, Kerberos V, GSS-API
• Abstraction
  – Server discovery, transient failure and recovery, failover, multipathing
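
A hedged sketch of the batch I/O idea: one submission names many (file region, memory region) pairs and completions arrive asynchronously. The dafs_io_vec_t type and dafs_batch_submit() are hypothetical; the other types are reused from the sketches above.

/* Hypothetical sketch of DAFS batch I/O; dafs_io_vec_t and
 * dafs_batch_submit() are illustrative, and dafs_file_t/dafs_mr_t/
 * dafs_cg_t are reused from the sketches above. */
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t file_offset;   /* file region: offset + length           */
    size_t   len;
    void    *mem;           /* memory region, inside a registered MR  */
} dafs_io_vec_t;

extern void dafs_batch_submit(dafs_file_t *f, dafs_mr_t *mr,
                              const dafs_io_vec_t *vec, int n,
                              dafs_cg_t *cg);

void batch_read(dafs_file_t *f, dafs_mr_t *mr, char *base, dafs_cg_t *cg)
{
    dafs_io_vec_t vec[64];

    /* 64 discontiguous 4 KB file regions scattered into 64
     * discontiguous slots of one registered buffer: file regions and
     * memory regions are described independently. */
    for (int i = 0; i < 64; i++) {
        vec[i].file_offset = (uint64_t)i * 65536;
        vec[i].len         = 4096;
        vec[i].mem         = base + (size_t)i * 8192;
    }

    /* One request on the wire amortizes the per-I/O issue cost;
     * completions are reported through the completion group. */
    dafs_batch_submit(f, mr, vec, 64, cg);
}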
Benchmarks
• Microbenchmarks to measure the throughput and per-operation cost of DAFS versus traditional network I/O
• An application benchmark to demonstrate the value of modifying an application to use the DAFS API
Benchmark Configuration
• User-space DAFS library, VI provider
• NetApp F840 server, fully cached workload
  – Adapters (GbE):
    • Intel PRO/1000
    • Emulex GN9000 VI/TCP
  – Protocols: NFSv3/UDP, DAFS
• Sun 280R client
  – Adapters:
    • Sun “Gem 2.0”
    • Emulex GN9000 VI/TCP
• Point-to-point connections
Microbenchmarks
• Measures read performance
• Kernel NFS client versus user-space DAFS client
• Asynchronous and synchronous I/O
• Throughput versus block size
• Throughput versus CPU time
• DAFS advantages are evident:
  – Increased throughput
  – Constant overhead per operation
Microbenchmark Results
Application (GNU gzip)
• Demonstrates the benefit of user-level I/O parallelism
• Read, compress, write a 550 MB file
• gzip modified to use the DAFS API (see the sketch below)
  – Memory preregistration, asynchronous read and write
• 16 KB block size
• 1 CPU, 1 process: DAFS advantage
• 2 CPUs, 2 processes: DAFS 2x speedup
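
A hedged sketch of the kind of modification described: with buffers preregistered, the read of block i+1 overlaps the compression of block i. It reuses the hypothetical dafs_* names from the earlier sketches; deflate_block() stands in for gzip's compressor, and the compressed-output write is omitted for brevity.

/* Hypothetical double-buffered read loop for gzip over DAFS, reusing
 * the illustrative dafs_* names from the earlier sketches;
 * deflate_block() stands in for gzip's compressor. */
#include <stddef.h>
#include <stdint.h>

#define BLK (16 * 1024)   /* the 16 KB block size from the experiment */

extern void deflate_block(const char *buf, size_t len);  /* stand-in */

void compress_stream(dafs_file_t *in, dafs_mr_t *mr, dafs_cg_t *cg,
                     uint64_t filesize)
{
    static char buf[2][BLK];   /* both halves inside the registered MR */
    uint64_t off = 0;
    int cur = 0;

    /* Prime the pipeline with the first asynchronous read. */
    dafs_read_async(in, mr, buf[cur], BLK, off, cg);

    while (off < filesize) {
        (void)dafs_cg_wait(cg);          /* block `cur` has landed      */
        off += BLK;
        if (off < filesize)              /* start the next read early   */
            dafs_read_async(in, mr, buf[!cur], BLK, off, cg);
        deflate_block(buf[cur], BLK);    /* compress while the next     */
        cur = !cur;                      /* read is still in flight     */
    }
}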
GNU gzip Runtimes
Conclusion
• The DAFS protocol enables high-performance local file sharing
• The DAFS API leverages the benefits of user-space I/O
• The combination yields significant performance gains for I/O-intensive applications