Virtualizing Modern High-Speed Interconnection Networks with Performance and Scalability
Transcript
Page 1

Virtualizing Modern High-Speed Interconnection Networks with Performance and Scalability

Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

Bo Li, Zhigang Huo, Panyong Zhang, Dan Meng

{leo, zghuo, zhangpanyong, md}@ncic.ac.cn

Presenter: Xiang Zhang [email protected]

Page 2

Introduction

• Virtualization is now one of the enabling technologies of Cloud Computing

• Many HPC providers now use their systems as platforms for cloud/utility computing. These HPC-on-Demand offerings include:
– Penguin's POD
– IBM's Computing On Demand service
– R Systems' dedicated hosting service
– Amazon's EC2

Page 3

Introduction:

Virtualizing HPC clouds?
• Pros:
– good manageability
– proactive fault tolerance
– performance isolation
– online system maintenance

• Cons:
– Performance gap: virtualized platforms lack low-latency interconnects, which are important to tightly coupled MPI applications

• VMM-bypass I/O has been proposed to address this concern

Page 4

Introduction:

VMM-bypass I/O Virtualization
• The Xen split device driver model is used only to set up the necessary user access points
• Data communication on the critical path bypasses both the guest OS and the VMM

[Figure: OS-bypass I/O vs. VMM-bypass I/O (courtesy [7]) — the guest module in the VM talks to a backend module and privileged module in the IDD for setup (privileged access), while the application's data path goes directly to the OS-bypass I/O device (VMM-bypass access)]

Page 5

Introduction:

InfiniBand Overview
• InfiniBand is a popular high-speed interconnect
– OS-bypass/RDMA
– Latency: ~1 us
– Bandwidth: 3300 MB/s
• ~41.4% of Top500 systems now use InfiniBand as the primary interconnect

[Chart: Top500 interconnect family by system count, June 2010. Source: http://www.top500.org]

Page 6

Introduction:

InfiniBand Scalability Problem
• Reliable Connection (RC)
– Queue Pair (QP); each QP consists of a send queue (SQ) and a receive queue (RQ)
– QPs require memory
• Shared Receive Queue (SRQ) — a setup sketch follows below
• eXtensible Reliable Connection (XRC)
– XRC domain & SRQ-based addressing
• Conns/Process: (N-1)×C with RC vs. (N-1) with XRC
(N: node count, C: cores per node)

[Figure: RC vs. XRC in InfiniBand for two 4-core nodes (P1–P4 and P5–P8) — with RC each process pair has its own QP/RQ; with XRC each node forms an XRC domain and senders address per-process SRQs (SRQ5–SRQ8)]
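The SRQ is the verbs object that lets many connections draw receive buffers from one shared pool. Below is a minimal, hedged setup sketch using the standard libibverbs call ibv_create_srq; the queue sizes are illustrative choices, not values from the paper.

```c
/* Sketch: create a Shared Receive Queue (SRQ) so that many RC/XRC
 * connections share one pool of receive buffers instead of each QP
 * keeping its own receive queue. Assumes the protection domain (pd)
 * is already allocated; error handling is trimmed for brevity. */
#include <infiniband/verbs.h>

struct ibv_srq *create_shared_recv_queue(struct ibv_pd *pd)
{
    struct ibv_srq_init_attr init = {
        .attr = {
            .max_wr  = 4096, /* illustrative: outstanding receives shared by all peers */
            .max_sge = 1,    /* one scatter/gather element per receive buffer */
        },
    };
    return ibv_create_srq(pd, &init); /* NULL on failure */
}
```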

Page 7

Problem Statement

• Does a scalability gap exist between native and virtualized environments?
– CV: cores per VM

[Figure: XRC in VMs with CV=2 and CV=1 — each VM on a node has its own XRC domain (XRCD), so a sender such as P1 needs one connection per VM instead of one per physical node]

QPs per process / QPs per node, by transport:
– Native RC: (N-1)×C / (N-1)×C²
– Native XRC: (N-1) / (N-1)×C
– VM RC: (N-1)×C / (N-1)×C²
– VM XRC: (N-1)×(C/CV) / (N-1)×(C²/CV)

Scalability gap exists!
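Reading the table directly (the 16-core figure matches the nodes used later in the evaluation), the per-process connection count under VM XRC exceeds native XRC by a factor of C/CV, e.g. 16x when each VM gets a single core:

```latex
\frac{\text{VM XRC conns/process}}{\text{native XRC conns/process}}
  = \frac{(N-1)\,C/C_V}{N-1} = \frac{C}{C_V}
  \qquad \text{e.g. } C = 16,\ C_V = 1 \;\Rightarrow\; 16\times .
```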

Page 8

Presentation Outline

• Introduction
• Problem Statement
• Proposed Design
• Evaluation
• Conclusions and Future Work

Page 9

Proposed Design:

VM-proof XRC design
• The design goal is to eliminate the scalability gap
– Conns/Process: (N-1)×(C/CV) → (N-1)

[Figure: with the VM-proof design, a single shared XRC domain spans all VMs on a physical node, so P1 reaches P5–P8 through one shared XRCD]

Page 10

Proposed Design:

Design Challenges
• VM-proof sharing of the XRC domain
– A single XRC domain must be shared among different VMs within a physical node
• VM-proof connection management
– With a single XRC connection, P1 must be able to send data to all the processes on another physical node (P5–P8), no matter which VMs those processes reside in

[Figure: internal MPI architecture — the MPI application sits on the MPI library (Abstract Device Interface (ADI), channel interface, VM-proof CM, communication device APIs) over InfiniBand OS-bypass I/O; in the guest domain, the front-end driver provides VM-proof XRCD sharing and resource management; in the IDD, the back-end driver, core InfiniBand modules, native HCA driver, and device manager/control software handle the host side; the Xen hypervisor connects the two through device and event channels to the high-speed interconnection network]

Page 11

Proposed Design:

Implementation
• VM-proof sharing of the XRCD (see the sketch below)
– The XRCD is shared by opening the same XRCD file
– Guest domains and the IDD have dedicated, non-shared filesystems
– A pseudo XRCD file and a real XRCD file are therefore used
• VM-proof CM
– Traditionally, an IP address/hostname was used to identify a node
– The LID of the HCA is used instead
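As a rough illustration of both points, here is a hedged C sketch written against the current libibverbs XRCD API (ibv_open_xrcd / ibv_close_xrcd) rather than the OFED 1.4-era interface the paper modified; the XRCD file path is hypothetical, and the guest-to-IDD pseudo/real file forwarding is not shown.

```c
/* Sketch only: file-keyed XRC domain sharing plus LID-based node identity.
 * Every process that opens the same XRCD file gets the same XRC domain;
 * in the paper's design the guest's pseudo XRCD file presumably stands in
 * for the real file managed by the IDD. Error handling is minimal. */
#include <fcntl.h>
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0)
        return 1;
    struct ibv_context *ctx = ibv_open_device(devs[0]);

    /* Hypothetical per-job XRCD file shared by all local MPI processes. */
    int fd = open("/tmp/mpi_job.xrcd", O_RDONLY | O_CREAT, 0600);

    struct ibv_xrcd_init_attr xattr = {
        .comp_mask = IBV_XRCD_INIT_ATTR_FD | IBV_XRCD_INIT_ATTR_OFLAGS,
        .fd        = fd,
        .oflags    = O_CREAT,
    };
    struct ibv_xrcd *xrcd = ibv_open_xrcd(ctx, &xattr);

    /* VM-proof CM: identify the physical node by the HCA port LID,
     * not by the VM's IP address or hostname. */
    struct ibv_port_attr pattr;
    if (ibv_query_port(ctx, 1, &pattr) == 0)
        printf("node id (port 1 LID): 0x%x, xrcd %s\n",
               pattr.lid, xrcd ? "opened" : "failed");

    if (xrcd)
        ibv_close_xrcd(xrcd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```

The only point of the sketch is the keying: identical files map to one XRC domain, and the port LID gives a VM-independent node identity for connection setup.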

Page 12

Proposed Design:

Discussions
• Safe XRCD sharing
– Unauthorized applications from other VMs may try to share the XRCD
• Isolation of XRCD sharing can be guaranteed by the IDD
– Isolation between VMs running different MPI jobs
• By using different XRCD files, different jobs (or VMs) can use different XRCDs and run without interfering with each other
• XRC migration
– Main challenge: an XRC connection is a process-to-node communication channel
• Migration support is future work

Page 13

Presentation Outline

• Introduction
• Problem Statement
• Proposed Design
• Evaluation
• Conclusions and Future Work

Page 14

Evaluation:

Platform
• Cluster configuration:
– 128-core InfiniBand cluster
– Quad-socket, quad-core Barcelona 1.9 GHz nodes
– Mellanox DDR ConnectX HCAs, 24-port MT47396 InfiniScale-III switch
• Implementation:
– Xen 3.4 with Linux 2.6.18.8
– OpenFabrics Enterprise Edition (OFED) 1.4.2
– MVAPICH 1.1.0

Page 15

Evaluation:

Microbenchmark
• The bandwidth results are nearly the same
• Virtualized IB performs ~0.1 us worse when using the blueframe mechanism
– caused by the memory copy of the sending data to the HCA's blueframe page
– Explanation: memory copy operations in the virtualized case include interactions between the guest domain and the IDD

[Charts: IB verbs latency using doorbell, IB verbs latency using blueframe, MPI latency using blueframe]

Page 16

Evaluation:

VM-proof XRC Evaluation
• Configurations:
– Native-XRC: native environment running XRC-based MVAPICH
– VM-XRC (CV=n): VM-based environment running unmodified XRC-based MVAPICH; the parameter CV denotes the number of cores per VM
– VM-proof XRC: VM-based environment running MVAPICH with our VM-proof XRC design

Page 17

Evaluation:

Memory Usage
• 16-cores/node cluster, fully connected
– The X-axis denotes the process count
– ~12 KB of memory per QP
• 16x less memory usage
– 64K processes would consume ~13 GB/node with the VM-XRC (CV=1) configuration
– The VM-proof XRC design reduces the memory usage to only ~800 MB/node

[Chart: per-node QP memory vs. process count; lower is better]
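A quick back-of-the-envelope check of these two figures, using the QPs-per-node formulas from the table on page 7 (64K processes on 16-core nodes gives N ≈ 4096, with ~12 KB per QP as stated above):

```latex
(N-1)\,C^2/C_V \big|_{C_V=1} \approx 4095 \times 256 \approx 1.05\times10^{6}\ \text{QPs}
  \;\Rightarrow\; 1.05\times10^{6} \times 12\ \text{KB} \approx 13\ \text{GB/node (VM-XRC)}
\\
(N-1)\,C \approx 4095 \times 16 \approx 6.55\times10^{4}\ \text{QPs}
  \;\Rightarrow\; 6.55\times10^{4} \times 12\ \text{KB} \approx 0.8\ \text{GB/node (VM-proof XRC)}
```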

Page 18

Evaluation:

MPI Alltoall Evaluation

• A total of 32 processes
• VM-proof XRC shows a 10%–25% improvement for messages smaller than 256 B

[Chart: MPI_Alltoall latency by message size; lower is better]

Page 19

Evaluation:

Application Benchmarks
• VM-proof XRC performs nearly the same as Native-XRC
– except for BT and EP
• Both are better than VM-XRC
• Little variation across different CV values
– CV=8 is an exception: memory allocation is not guaranteed to be NUMA-aware

[Charts: application benchmark results for VM-proof XRC vs. Native-XRC and VM-XRC; lower is better]

Page 20

Evaluation:

Application Benchmarks (Cont’d)

[Charts: per-benchmark connection counts — VM-proof XRC uses ~15.9x and ~14.7x fewer connections]

Page 21

Conclusion and Future Work

• The VM-proof XRC design converges two technologies:
– VMM-bypass I/O virtualization
– eXtensible Reliable Connection (XRC) in modern high-speed interconnection networks (InfiniBand)

• With our VM-proof XRC design, VMs achieve the same raw performance and scalability as the native, non-virtualized environment
– A ~16x scalability improvement is seen in 16-core/node clusters

• Future work:
– Evaluations on different platforms at larger scale
– Add VM migration support to our VM-proof XRC design
– Extend our work to the new SR-IOV-enabled ConnectX-2 HCAs

Page 22

Questions?

{leo, zghuo, zhangpanyong, md}@ncic.ac.cn

Page 23

Backup Slides

Page 24

OS-bypass of InfiniBand

[Figure: OpenIB Gen2 stack]

