Sockets Direct Protocol Over InfiniBand in Clusters: Is it Beneficial?
P. Balaji, S. Narravula, K. Vaidyanathan, S. Krishnamoorthy, J. Wu and
D. K. Panda
Network Based Computing Laboratory
The Ohio State University
Presentation Layout
Introduction and Background
Sockets Direct Protocol (SDP)
Multi-Tier Data-Centers
Parallel Virtual File System (PVFS)
Experimental Evaluation
Conclusions and Future Work
Introduction
• Advent of High Performance Networks
– Ex: InfiniBand, Myrinet, 10-Gigabit Ethernet
– High Performance Protocols: VAPI / IBAL, GM, EMP
– Good to build new applications
– Not so beneficial for existing applications
• Built around Portability: Should run on all platforms
• TCP/IP based Sockets: A popular choice
• Performance of Application depends on the Performance of Sockets
• Several GENERIC optimizations for sockets to provide high performance
– Jacobson Optimization: Integrated Checksum-Copy [Jacob89]
– Header Prediction for Single Stream data transfer
[Jacob89]: “An Analysis of TCP Processing Overhead”, D. Clark, V. Jacobson, J. Romkey and H. Salwen. IEEE Communications Magazine, 1989
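Since the deck's argument rests on the sockets API being the portability layer that existing applications depend on, a minimal sketch may help. The following toy echo exchange (Python standard library, loopback only, not from the talk) is exactly the kind of unmodified sockets program that SDP aims to accelerate without source changes:

```python
# Minimal sketch: a TCP sockets echo exchange over loopback. An
# unmodified sockets program like this is what high performance
# sockets layers such as SDP aim to speed up transparently.
import socket
import threading

def echo_server(sock):
    conn, _ = sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)  # echo the payload back unchanged

def run_echo():
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))   # ephemeral port
    server.listen(1)
    t = threading.Thread(target=echo_server, args=(server,))
    t.start()
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(server.getsockname())
    client.sendall(b"hello")
    reply = client.recv(1024)
    client.close(); t.join(); server.close()
    return reply

print(run_echo())  # b'hello'
```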
Network Specific Optimizations
• Generic Optimizations Insufficient
– Unable to saturate high performance networks
• Sockets can utilize some network features
– Interrupt Coalescing (can be considered generic)
– Checksum Offload (TCP stack has to be modified)
– Insufficient!
• Can we do better?
– High Performance Sockets
– TCP Offload Engines (TOE)
High Performance Sockets
[Diagram: Traditional Berkeley Sockets vs High Performance Sockets]
Traditional Berkeley Sockets: Application or Library (user space) -> Sockets -> TCP -> IP (kernel) -> NIC (hardware)
High Performance Sockets: Application or Library (user space) -> Pseudo sockets layer with OS Agent -> Native Protocol -> High Performance Network (hardware)
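The pseudo sockets layer idea above can be sketched in a few lines: the application keeps the familiar send/recv surface while the layer routes data over a stand-in "native protocol" instead of kernel TCP/IP. All class and method names below are illustrative, not from any real high performance sockets implementation:

```python
# Toy sketch of a "pseudo sockets layer": same sockets-like API for the
# application, but data travels over a stand-in native protocol object
# rather than the kernel TCP/IP stack. Names are illustrative only.
from collections import deque

class NativeProtocol:
    """Stand-in for a user-level protocol (e.g. VAPI, GM or EMP)."""
    def __init__(self):
        self.queue = deque()
    def post_send(self, buf):
        self.queue.append(bytes(buf))
    def poll_recv(self):
        return self.queue.popleft() if self.queue else None

class PseudoSocket:
    """Exposes send/recv over the native protocol, bypassing TCP/IP."""
    def __init__(self, transport):
        self.transport = transport
    def send(self, data):
        self.transport.post_send(data)
        return len(data)
    def recv(self, bufsize):
        data = self.transport.poll_recv()
        return (data or b"")[:bufsize]

wire = NativeProtocol()
s = PseudoSocket(wire)
s.send(b"payload")
print(s.recv(1024))  # b'payload'
```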
InfiniBand Architecture Overview
• Industry Standard
• Interconnect for connecting compute and I/O nodes
• Provides High Performance
– Low latency of less than 5us
– Over 840MBps uni-directional bandwidth
– Provides one-sided communication (RDMA, Remote Atomics)
• Becoming increasingly popular
Sockets Direct Protocol (SDP*)
• IBA Specific Protocol for Data-Streaming
• Defined to serve two purposes:
– Maintain compatibility for existing applications
– Deliver the high performance of IBA to the applications
• Two approaches for data transfer: Copy-based and Z-Copy
• Z-Copy specifies Source-Avail and Sink-Avail messages
– Source-Avail allows destination to RDMA Read from source
– Sink-Avail allows source to RDMA Write to the destination
• Current implementation limitations:
– Only supports the Copy-based implementation
– Does not support Source-Avail and Sink-Avail
*SDP implementation from the Voltaire Software Stack
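The Z-Copy handshake on this slide can be illustrated with a small simulation. Registered memory is modeled as a dict of named buffers and RDMA operations as direct one-sided accesses with no intermediate copy; the classes, handles and message flow here are a hedged sketch of the Source-Avail/Sink-Avail semantics the slide describes, not real SDP code:

```python
# Hedged sketch of SDP's Z-Copy control messages. Registered buffers
# are modeled as dict entries keyed by a handle; RDMA Read/Write are
# one-sided accesses to the peer's memory, with no intermediate copy.

class Node:
    def __init__(self):
        self.memory = {}  # handle -> bytearray (stands in for registered buffers)

    def rdma_read(self, peer, handle):
        return bytes(peer.memory[handle])      # one-sided get from peer

    def rdma_write(self, peer, handle, data):
        peer.memory[handle] = bytearray(data)  # one-sided put into peer

def send_zcopy(src, dst, handle, use_source_avail=True):
    if use_source_avail:
        # Source-Avail: source advertises its buffer; destination RDMA Reads it.
        return dst.rdma_read(src, handle)
    # Sink-Avail: destination advertises a sink buffer; source RDMA Writes to it.
    sink_handle = "sink0"
    src.rdma_write(dst, sink_handle, src.memory[handle])
    return bytes(dst.memory[sink_handle])

src, dst = Node(), Node()
src.memory["buf0"] = bytearray(b"bulk data")
print(send_zcopy(src, dst, "buf0"))                          # b'bulk data'
print(send_zcopy(src, dst, "buf0", use_source_avail=False))  # b'bulk data'
```

Either control message ends with the same bytes at the destination; the choice only decides which side drives the one-sided transfer.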
Presentation Layout
Introduction and Background
Sockets Direct Protocol (SDP)
Multi-Tier Data-Centers
Parallel Virtual File System (PVFS)
Experimental Evaluation
Conclusions and Future Work
Multi-Tier Data-Centers
• Client Requests come over the WAN (TCP based + Ethernet Connectivity)
• Traditional TCP based requests are forwarded to the inner tiers
• Performance is limited due to TCP
• Can we use SDP to improve the data-center performance?
• SDP is not compatible with traditional sockets: Requires TCP termination!
(Courtesy Mellanox Corporation)
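The TCP termination requirement above can be sketched as a tiny relay: the client's TCP connection ends at the proxy, which opens a second connection toward the back-end tier and forwards the bytes. In the deck's setup that second leg would be SDP over InfiniBand; in this self-contained sketch plain loopback TCP stands in for it:

```python
# Sketch of TCP termination at a proxy node: the client's TCP
# connection terminates at the proxy, which relays the request over a
# second connection to the back end (SDP in the deck; TCP stands in).
import socket
import threading

def backend(sock):
    conn, _ = sock.accept()
    with conn:
        req = conn.recv(1024)
        conn.sendall(b"response:" + req)

def proxy(front, back_addr):
    conn, _ = front.accept()  # the client's TCP connection ends here
    with conn:
        req = conn.recv(1024)
        inner = socket.create_connection(back_addr)  # second leg ("SDP")
        inner.sendall(req)
        conn.sendall(inner.recv(1024))
        inner.close()

def run():
    back = socket.socket(); back.bind(("127.0.0.1", 0)); back.listen(1)
    front = socket.socket(); front.bind(("127.0.0.1", 0)); front.listen(1)
    threading.Thread(target=backend, args=(back,)).start()
    threading.Thread(target=proxy, args=(front, back.getsockname())).start()
    client = socket.create_connection(front.getsockname())
    client.sendall(b"GET /")
    reply = client.recv(1024)
    client.close(); front.close(); back.close()
    return reply

print(run())  # b'response:GET /'
```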
3-Tier Data-Center Test-bed at OSU
[Diagram: Clients connect over the WAN to a three-tier cluster]
Tier 0 (Proxy Nodes): TCP Termination, Load Balancing, Caching
Tier 1 (Web/Application Servers, Apache + PHP): Dynamic Content Caching, Persistent Connections
Tier 2 (Database Servers, MySQL or DB2): File System evaluation, Caching Schemes
Clients generate requests for both web servers and database servers.
Presentation Layout
Introduction and Background
Sockets Direct Protocol (SDP)
Multi-Tier Data-Centers
Parallel Virtual File System (PVFS)
Experimental Evaluation
Conclusions and Future Work
Parallel Virtual File System (PVFS)
[Diagram: Compute Nodes connect over the Network to the Meta-Data Manager (MetaData) and multiple I/O Server Nodes (Data)]
• Relies on Striping of data across different nodes
• Tries to aggregate I/O bandwidth from multiple nodes
• Utilizes the local file system on the I/O Server nodes
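The striping scheme in the bullets above can be made concrete with a short sketch: a file is split into fixed-size stripe units distributed round-robin across I/O server nodes, so a large access can draw bandwidth from all servers at once. The stripe size and server count below are illustrative, not PVFS defaults:

```python
# Minimal sketch of PVFS-style striping: fixed-size stripe units are
# distributed round-robin across I/O servers, and a read reassembles
# them in order. STRIPE is tiny here purely for illustration.

STRIPE = 4  # bytes per stripe unit

def stripe_write(data, n_servers):
    servers = [bytearray() for _ in range(n_servers)]
    for i in range(0, len(data), STRIPE):
        servers[(i // STRIPE) % n_servers].extend(data[i:i + STRIPE])
    return servers

def stripe_read(servers, length):
    out = bytearray()
    offsets = [0] * len(servers)
    i = 0
    while len(out) < length:
        s = i % len(servers)
        out.extend(servers[s][offsets[s]:offsets[s] + STRIPE])
        offsets[s] += STRIPE
        i += 1
    return bytes(out[:length])

iods = stripe_write(b"ABCDEFGHIJKL", 3)  # three I/O daemons
print(iods)                    # each iod holds one 4-byte stripe unit
print(stripe_read(iods, 12))   # b'ABCDEFGHIJKL'
```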
Parallel I/O in Clusters via PVFS
• PVFS: Parallel Virtual File System
– Parallel: stripe/access data across multiple nodes
– Virtual: exists only as a set of user-space daemons
– File system: common file access methods (open, read/write)
• Designed by ANL and Clemson
[Diagram: Applications use Posix or MPI-IO interfaces over libpvfs; control and data messages travel over the Network to the mgr (metadata manager) and the iod daemons, which store data on their local file systems]
“PVFS over InfiniBand: Design and Performance Evaluation”, Jiesheng Wu, Pete Wyckoff and D. K. Panda. International Conference on Parallel Processing (ICPP), 2003.
Presentation Layout
Introduction and Background
Sockets Direct Protocol (SDP)
Multi-Tier Data-Centers
Parallel Virtual File System (PVFS)
Experimental Evaluation
Micro-Benchmark Evaluation
Data-Center Performance
PVFS Performance
Conclusions and Future Work
Experimental Test-bed
• Eight Dual 2.4GHz Xeon processor nodes
• 64-bit 133MHz PCI-X interfaces
• 512KB L2-Cache and 400MHz Front Side Bus
• Mellanox InfiniHost MT23108 Dual Port 4x HCAs
• MT43132 eight 4x port Switch
• SDK version 0.2.0
• Firmware version 1.17
Latency and Bandwidth Comparison
• SDP achieves 500MBps bandwidth compared to 180MBps of IPoIB
• Latency of 27us compared to 31us of IPoIB
• Improved CPU Utilization
[Chart: Latency and CPU utilization on SDP vs IPoIB. X-axis: Message Size; left Y-axis: Time (us); right Y-axis: % CPU utilization. Series: IPoIB CPU, SDP CPU, IPoIB, SDP, VAPI send/recv, VAPI RDMA write]
[Chart: Bandwidth and CPU utilization on SDP vs IPoIB. X-axis: Message Size (4 bytes to 64K); left Y-axis: Bandwidth (MBytes/s); right Y-axis: % CPU utilization. Series: IPoIB CPU, SDP CPU, IPoIB, SDP, VAPI send/recv, VAPI RDMA write]
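Latency figures like the ones on this slide are typically measured with a ping-pong micro-benchmark: a small message is bounced between two endpoints many times, and one-way latency is taken as half the averaged round-trip time. The sketch below runs over loopback TCP and is an illustration of the method, not the talk's actual benchmark code:

```python
# Sketch of a ping-pong latency micro-benchmark: one-way latency is
# half the averaged round-trip time of a small message. Runs over
# loopback TCP; messages are small enough that recv() gets them whole.
import socket
import threading
import time

def pong(sock, iters, size):
    conn, _ = sock.accept()
    with conn:
        for _ in range(iters):
            conn.sendall(conn.recv(size))  # bounce each message back

def ping_pong(iters=100, size=4):
    server = socket.socket(); server.bind(("127.0.0.1", 0)); server.listen(1)
    t = threading.Thread(target=pong, args=(server, iters, size))
    t.start()
    client = socket.create_connection(server.getsockname())
    msg = b"x" * size
    start = time.perf_counter()
    for _ in range(iters):
        client.sendall(msg)
        client.recv(size)
    elapsed = time.perf_counter() - start
    client.close(); t.join(); server.close()
    return elapsed / iters / 2 * 1e6  # average one-way latency in us

print(f"{ping_pong():.1f} us one-way (loopback)")
```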
Hotspot Latency
[Chart: Hotspot Latency on SDP vs IPoIB. X-axis: Number of Nodes (1-7); left Y-axis: Time (us); right Y-axis: % CPU Utilization. Series: IPoIB CPU and SDP CPU at 16K; IPoIB and SDP at 1K, 4K and 16K message sizes]
• SDP is more scalable in hot-spot scenarios
Data-Center Response Time
[Chart: Client Response Time. X-axis: Message Size (32K to 2M bytes); Y-axis: Response Time (ms). Series: IPoIB, SDP]
[Chart: Web Server Delay. X-axis: Message Size (32K to 2048K bytes); Y-axis: Time Spent (ms). Series: IPoIB, SDP]
• SDP shows very little improvement: Client network (Fast Ethernet) becomes the bottleneck
• The web server delay, which excludes the client network bottleneck, shows up to 3 times improvement with SDP
Data-Center Response Time (Fast Clients)
[Chart: Response Time with fast clients. X-axis: Message Size (32K to 2M bytes); Y-axis: Response Time (ms). Series: IPoIB, SDP]
• SDP performs well for large files; not very well for small files
Data-Center Response Time Split-up
Share of response time per stage:
Stage                 IPoIB    SDP
Init + Qtime             8%     9%
Request Read             3%     3%
Core Processing         10%    12%
URL Manipulation         1%     1%
Back-end Connect        32%    14%
Request Write            2%     2%
Reply Read              14%    15%
Cache Update             2%     3%
Response Write          25%    38%
Proxy End                3%     3%
Data-Center Response Time without Connection Time Overhead
[Chart: Response Time excluding connection time. X-axis: Message Size (32K to 2M bytes); Y-axis: Response Time (ms). Series: IPoIB, SDP]
• Without the connection time, SDP would perform well for all file sizes
PVFS Performance using ramfs
[Chart: Read Bandwidth with 3 IODs. X-axis: No. of Clients (1-5); Y-axis: Bandwidth (MBps). Series: IPoIB, SDP, VAPI]
[Chart: Write Bandwidth with 3 IODs. X-axis: No. of Clients (1-5); Y-axis: Bandwidth (MBps). Series: IPoIB, SDP, VAPI]
[Chart: Read Bandwidth with 4 IODs. X-axis: No. of Clients (1-4); Y-axis: Bandwidth (MBps). Series: IPoIB, SDP, VAPI]
[Chart: Write Bandwidth with 4 IODs. X-axis: No. of Clients (1-4); Y-axis: Bandwidth (MBps). Series: IPoIB, SDP, VAPI]
PVFS Performance with sync (ext3fs)
[Chart: Aggregate Bandwidth (MBytes/s) for IPoIB, SDP and VAPI]
• Clients can push data faster to IODs using SDP; de-stage bandwidth remains the same
Conclusions
• User-Level Sockets designed with two motives:
– Compatibility for existing applications
– High Performance for modern networks
• SDP was proposed recently along similar lines
• Sockets Direct Protocol: Is it Beneficial?
– Evaluated it using micro-benchmarks and real applications
• Multi-Tier Data-Centers and PVFS
– Shows benefits in environments it is well suited for
• Communication-intensive environments such as PVFS
– Demonstrates environments it is yet to mature for
• Environments with heavy connection overhead, such as Data-Centers
Future Work
• Connection Time bottleneck in SDP
– Using dynamic registered buffer pools, FMR techniques, etc
– Using QP pools
• Power-Law Networks
• Other applications: Streaming and Transaction
• Comparison with other high performance sockets
For more information, please visit the Network Based Computing Laboratory, The Ohio State University: http://nowlab.cis.ohio-state.edu
Thank You!
Backup Slides
TCP Termination in SDP
[Diagram: Personal Notebook/Computer (Browser) sends HTTP over Sockets -> TCP -> IP -> Ethernet; the Network Service Tier (Proxy) terminates TCP (Ethernet -> IP -> TCP -> Sockets) and forwards with OS bypass over SDP -> InfiniBand HW; Blade Servers (Web Server) return HTML over Sockets -> SDP -> InfiniBand HW. Ethernet Communication on the client side, InfiniBand Communication on the server side]