XPDS16: High-Performance Virtualization for HPC Cloud on Xen - Jun Nakajima & Tianyu Lan, Intel...

Date post: 18-Jan-2017
Upload: the-linux-foundation
Page 1: XPDS16: High-Performance Virtualization for HPC Cloud on Xen - Jun Nakajima & Tianyu Lan, Intel Corp.

High-Performance Virtualization for HPC Cloud on Xen

Jun Nakajima

Tianyu Lan


Legal Disclaimer

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL® PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS.

Intel may make changes to specifications and product descriptions at any time, without notice.

All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.

Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

*Other names and brands may be claimed as the property of others.

Copyright © 2016 Intel Corporation.


Agenda

• Intel® Xeon Phi™ processor

• HPC Cloud usage

• Challenges for Xen

• Achieving high performance

• Call for action


The world is going parallel

The world is going parallel - stick with sequential code and you will fall behind.

Codename         Cores  Threads/Core  Vector Width  Peak Memory Bandwidth  Product
Haswell          18     2             256-bit       68 GB/s                Intel® Xeon® Processor E5-2600 v3 Product Family
Broadwell        22     2             256-bit       77 GB/s                Intel® Xeon® Processor E5-2600 v4 Product Family
Skylake          28     2             512-bit       128 GB/s               -
Knights Corner   61     4             512-bit       352 GB/s               Intel® Xeon Phi™ x100 Product Family
Knights Landing  72     4             512-bit (x2)  >500 GB/s              Intel® Xeon Phi™ x200 Product Family


Intel® Xeon Phi™ Processor

• Intel’s first bootable host processor specifically designed for HPC

• Binary compatible with Intel Xeon processors

• Integration of memory on package: innovative memory architecture for high bandwidth and high capacity

• Integration of Omni-Path Fabric on package


Intel® Xeon Phi™ Product Family

Available today: Knights Corner (Intel® Xeon Phi™ x100 Product Family)

• 22 nm process

• Coprocessor only

• >1 TF DP peak

• Up to 61 cores

• Up to 16GB GDDR5

Launched: Knights Landing (Intel® Xeon Phi™ x200 Product Family)

• 14 nm process

• Host processor & coprocessor

• >3 TF DP peak (1)

• Up to 72 cores

• Up to 16GB HBM

• Up to 384GB DDR4 (2)

• ~460 GB/s STREAM

• Integrated fabric (2)

Future: Knights Hill (3rd generation)

• 10 nm process

• Integrated fabric (2nd generation)

• In planning…

*Results will vary. This simplified test is the result of the distillation of the more in-depth programming guide found here: https://software.intel.com/sites/default/files/article/383067/is-xeon-phi-right-for-me.pdf

All products, computer systems, dates and figures specified are preliminary based on current expectations, and are subject to change without notice.

(1) Over 3 Teraflops of peak theoretical double-precision performance is preliminary and based on current expectations of cores, clock frequency and floating point operations per cycle. FLOPS = cores x clock frequency x floating-point operations per cycle.

(2) Host processor only
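The peak-FLOPS footnote on this slide is a simple product, so it can be checked by hand. The sketch below is illustrative only: the slide gives the core count and the >3 TF target but no clock frequency, so the code solves for the clock that would be needed. The per-cycle figure assumes 2 VPUs per core (from the Hardware Overview slide), 8 doubles per 512-bit vector, and one fused multiply-add per lane per cycle; that last assumption is not stated in the deck.

```python
def peak_dp_flops(cores, clock_hz, flops_per_cycle):
    """Peak theoretical FLOPS = cores x clock frequency x FLOPs per cycle."""
    return cores * clock_hz * flops_per_cycle

# Assumptions for illustration (see the lead-in); not figures from this slide.
VPUS_PER_CORE = 2      # Hardware Overview slide: 2 VPU/core
DP_LANES = 512 // 64   # 64-bit doubles that fit in one 512-bit vector
FLOPS_PER_FMA = 2      # assumed: one fused multiply-add per lane per cycle

flops_per_cycle = VPUS_PER_CORE * DP_LANES * FLOPS_PER_FMA

# Clock that 72 cores would need to reach 3 TFLOPS under these assumptions.
min_clock_hz = 3e12 / (72 * flops_per_cycle)

print(flops_per_cycle)                # 32
print(round(min_clock_hz / 1e9, 2))   # 1.3
```

Under these assumptions, 72 cores reach 3 TF of double precision at roughly 1.3 GHz.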


Hardware Overview

Chip: Up To 36 tiles interconnected by Mesh

Tile: 2 Cores + 2 VPU/core + 1MB L2

Core: 4 hyper threads / core

ISA: Binary compatible with Intel Xeon processors + AVX-512 extension

Memory: Up to 16GB on-package MCDRAM + up to 6 channels of DDR4-2400 (up to 384GB)

IO: 36 lanes PCIe Gen3 + 4 lanes DMI for chipset

Node: 1-socket only
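Multiplying out the counts on this slide shows why a single node already exceeds 255 logical CPUs, which is the limit the rest of the talk works around. A short arithmetic sketch in Python, using only figures from this slide (the variable names are illustrative):

```python
# Figures taken from the Hardware Overview slide.
MAX_TILES = 36
CORES_PER_TILE = 2
THREADS_PER_CORE = 4

max_cores = MAX_TILES * CORES_PER_TILE
max_logical_cpus = max_cores * THREADS_PER_CORE

print(max_cores)         # 72
print(max_logical_cpus)  # 288
```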

Figure: Xeon Phi package diagram. A mesh of tiles (each tile: two cores with 2 VPUs each, 1MB L2, and a hub) surrounded by on-package MCDRAM, DDR4 channels, 36 lanes of PCIe* Gen3 (x16, x16, x4), and x4 DMI2 to the PCH. Legend: IMC (integrated memory controller), EDC (embedded DRAM controller), IIO (integrated I/O controller).


MCDRAM Memory modes

Cache Mode

Description: Hardware automatically manages the MCDRAM as a “memory side cache” between CPU and external DDR memory

Figure labels: 16GB MCDRAM in front of DRAM; 64B cache lines, direct-mapped

Flat Mode

Description: Manually manage how the app uses the integrated on-package memory and external DDR for peak perf

Figure labels: 8GB/16GB MCDRAM and up to 384 GB DRAM, both in the physical address space

Hybrid Mode

Description: Joins the benefits of both Cache and Flat modes by segmenting the integrated on-package memory

Figure labels: 8 or 4 GB MCDRAM; 8 or 12GB MCDRAM; DRAM


MCDRAM (Flat Mode)

• Platform with 2 NUMA nodes

• Memory is allocated in DDR by default

• Keep low-bandwidth data out of MCDRAM

• Apps explicitly allocate important data in MCDRAM

Figure: the platform exposes two NUMA nodes. NUMA 0 holds the CPU and DDR; NUMA 1 holds the MCDRAM.
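The placement policy in the bullets above can be sketched as a toy model. This is not an allocator: a real application would go through a NUMA-aware allocation interface, and the buffer names, sizes, and the `place_buffers` helper here are hypothetical. The 16 GB capacity comes from the Hardware Overview slide.

```python
MCDRAM_GB = 16  # on-package capacity from the Hardware Overview slide

def place_buffers(buffers, mcdram_free_gb=MCDRAM_GB):
    """Decide where each buffer goes.

    buffers: iterable of (name, size_gb, bandwidth_critical) tuples.
    Bandwidth-critical buffers go to MCDRAM while it has room;
    everything else stays in DDR, the default.
    """
    placement = {}
    for name, size_gb, bandwidth_critical in buffers:
        if bandwidth_critical and size_gb <= mcdram_free_gb:
            placement[name] = "MCDRAM (NUMA 1)"
            mcdram_free_gb -= size_gb
        else:
            placement[name] = "DDR (NUMA 0)"
    return placement

# Hypothetical workload: three hot matrices and one cold checkpoint buffer.
plan = place_buffers([
    ("matrix_a", 6, True),
    ("matrix_b", 6, True),
    ("checkpoint", 40, False),
    ("matrix_c", 6, True),  # only 4 GB of MCDRAM left, so this falls back to DDR
])
for name, node in plan.items():
    print(name, "->", node)
```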



HPC Cloud usage

• Single VM on one machine

• Expose most host CPUs to VM

• More than 255 VCPUs in VM

• Expose MCDRAM to VM

• Pass through Omni-Path Fabric to VM



Challenges for Xen

• Support >255 VCPUs

• Virtual IOMMU support

• Scalability

• Scalability issue in tasklet subsystem


Support >255 VCPUs

• An HVM guest currently supports at most 128 VCPUs

• x2APIC mode is required for more than 255 VCPUs

• Linux disables x2APIC mode when interrupt remapping (IR) is unavailable

• Xen has no virtual IOMMU support

• >255 VCPUs => x2APIC => IR => virtual IOMMU

• Enable DMA translation first

• The Linux IOMMU driver can’t work without DMA translation
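The dependency chain on this slide can be written down as a small lookup. The sketch below is only a restatement of the bullets; `required_features` is a hypothetical helper, and the 255 threshold reflects the 8-bit APIC ID of xAPIC mode (with 0xFF reserved for broadcast), which is background knowledge and not stated on the slide.

```python
XAPIC_MAX_CPUS = 255  # 8-bit xAPIC ID space, 0xFF reserved for broadcast

def required_features(nr_vcpus):
    """Return the features a guest needs, in dependency order."""
    if nr_vcpus <= XAPIC_MAX_CPUS:
        return []
    # >255 VCPUs => x2APIC => IR => virtual IOMMU
    return ["x2APIC", "interrupt remapping", "virtual IOMMU"]

print(required_features(128))  # []
print(required_features(288))  # ['x2APIC', 'interrupt remapping', 'virtual IOMMU']
```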


Virtual IOMMU

Figure: virtual IOMMU architecture. Labels: VM (hvmloader, ACPI DMAR table, Linux kernel IOMMU driver); Dom0 (Qemu with a dummy Xen-VIOMMU device, Xenstore); hypervisor (virtual IOMMU, reached by hypercall).


Virtual IOMMU (DMA Translation)

Figure: DMA translation paths. Labels: VM (Linux kernel IOMMU driver, IOVA -> GPA mappings); Dom0 (Qemu virtual PCI device with a dummy Xen-VIOMMU, DMA to a memory region after IOVA -> target GPA translation); hypervisor (virtual IOMMU, address translation, shadow IOVA -> HPA table); hardware (physical IOMMU holding IOVA -> HPA, DMA from the physical PCI device).
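One way to read the shadow IOVA -> HPA label is as the composition of the guest's IOVA -> GPA mappings with the hypervisor's GPA -> HPA mappings. The sketch below illustrates that idea with dictionaries keyed by 4 KiB page frame number; it is a toy model, not Xen code, and the table contents and function names are made up for the example.

```python
PAGE_SHIFT = 12                      # 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1

def build_shadow(iova_to_gpa, gpa_to_hpa):
    """Compose IOVA->GPA (guest-owned) with GPA->HPA (hypervisor-owned)."""
    shadow = {}
    for iova_pfn, gpa_pfn in iova_to_gpa.items():
        if gpa_pfn in gpa_to_hpa:    # skip guest pages that are not populated
            shadow[iova_pfn] = gpa_to_hpa[gpa_pfn]
    return shadow

def translate(table, addr):
    """Translate an address through a page-frame table, keeping the offset."""
    pfn, offset = addr >> PAGE_SHIFT, addr & PAGE_MASK
    return (table[pfn] << PAGE_SHIFT) | offset

# Hypothetical mappings, keyed and valued by page frame number.
iova_to_gpa = {0x10: 0x200}
gpa_to_hpa = {0x200: 0x7F000}

shadow = build_shadow(iova_to_gpa, gpa_to_hpa)
print(hex(translate(shadow, 0x10ABC)))  # 0x7f000abc
```

With a table like this loaded into the physical IOMMU, a passed-through device can DMA with guest-chosen IOVAs and the hardware resolves them without a software lookup on each access.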


Virtual IOMMU (IR)

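Figure: interrupt remapping paths. Labels: VM (Linux kernel IOMMU driver, IR table, IRQ subsystem, device driver); Dom0 (Qemu virtual PCI device); hypervisor (virtual IOMMU doing IRQ remapping, VIOAPIC/VMSI, VLAPIC, inject VIRQ); hardware (physical PCI device). IRQs from the virtual and physical PCI devices pass through IRQ remapping and are injected into the guest as VIRQs.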
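The role of the IR table can be illustrated with a toy lookup: the interrupt request carries an index, and the table entry supplies the real destination and vector. The point for large guests is that the entry can hold a destination APIC ID above 255, which an 8-bit destination field cannot express. The entry layout, names, and values below are hypothetical simplifications, not the hardware format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IrEntry:
    dest_apic_id: int  # may exceed 255 when the guest runs in x2APIC mode
    vector: int

def remap(ir_table, index):
    """Look up the destination and vector for a remappable interrupt."""
    entry = ir_table.get(index)
    if entry is None:
        raise LookupError(f"no IR table entry at index {index}")
    return entry.dest_apic_id, entry.vector

# Hypothetical table programmed by the guest's IOMMU driver.
ir_table = {5: IrEntry(dest_apic_id=260, vector=0x41)}

dest, vector = remap(ir_table, 5)
print(dest, hex(vector))  # 260 0x41
```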



Scalability issue in tasklet subsystem

• Tasklet work lists are per-CPU data structures

• A global spin lock, “tasklet_lock”, protects all of these lists

• tasklet_lock becomes a hot spot when a heavy workload runs in the VM

• Acquiring the global lock takes 180k TSC counts on average (an I/O VM exit takes 150k TSC counts)

• Fix: change tasklet_lock to a per-CPU lock

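Figure: bar chart of benchmark results on a 0 to 100 scale. Legend: Host, Original VM, Optimized VM. Values recovered from the bar labels:

Benchmark     Stream  Dgemm  Sgemm
Original VM   63      50     50
Optimized VM  87      85     86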
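The change described above, one lock per CPU list where a single global lock used to cover all of them, can be sketched as follows. This is a Python model of the locking structure only, not Xen's tasklet code: the class and method names are hypothetical, and it ignores cases such as moving work between CPUs.

```python
import threading
from collections import deque

class PerCpuTasklets:
    """Per-CPU work lists, each protected by its own lock."""

    def __init__(self, nr_cpus):
        self.queues = [deque() for _ in range(nr_cpus)]
        self.locks = [threading.Lock() for _ in range(nr_cpus)]

    def schedule(self, cpu, fn):
        # Only the target CPU's lock is taken, so other CPUs are not blocked.
        with self.locks[cpu]:
            self.queues[cpu].append(fn)

    def run(self, cpu):
        """Run all pending work for one CPU and return how many items ran."""
        ran = 0
        while True:
            with self.locks[cpu]:
                if not self.queues[cpu]:
                    return ran
                fn = self.queues[cpu].popleft()
            fn()  # run the work item outside the lock
            ran += 1

tasklets = PerCpuTasklets(nr_cpus=4)
log = []
tasklets.schedule(1, lambda: log.append("cpu1 work"))
with tasklets.locks[0]:
    # Holding CPU 0's lock does not stop scheduling on CPU 2.
    tasklets.schedule(2, lambda: log.append("cpu2 work"))
print(tasklets.run(1), tasklets.run(2), log)
```

With a single global lock, the `schedule` call inside the `with` block would have had to wait for the holder; with per-CPU locks it proceeds immediately.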



Achieving high performance

• Expose key compute resources to VM:

• CPU topology

• MCDRAM

• Reduce timer interrupts


VM CPU topology

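Figure: guest cores (Core 0 through Core 63, each with threads 0 to 3) are pinned to native cores with the same layout; further native cores (labeled Core 64, Core X, Core Y) appear alongside Dom 0.

• HPC software assigns workload according to the CPU topology

• Balance the workload among physical cores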
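A pinning that preserves topology keeps the four threads of each guest core on the four threads of one physical core. The sketch below builds such a map and checks the property. It assumes that host logical CPUs are numbered so that the threads of a core are consecutive, which real systems do not guarantee; `pin_map` and `core_of` are hypothetical helpers.

```python
THREADS_PER_CORE = 4  # from the Hardware Overview slide

def pin_map(guest_cores, first_host_core=0):
    """Map each guest VCPU to one host CPU, keeping core siblings together."""
    base = first_host_core * THREADS_PER_CORE
    return {vcpu: base + vcpu
            for vcpu in range(guest_cores * THREADS_PER_CORE)}

def core_of(cpu):
    """Core index of a logical CPU, assuming consecutive sibling numbering."""
    return cpu // THREADS_PER_CORE

pins = pin_map(guest_cores=64)

# VCPUs 0-3 (guest core 0) land on host CPUs 0-3 (host core 0), and so on.
print(len(pins), pins[0], pins[255])  # 256 0 255
```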


Expose MCDRAM to VM

• Create vNUMA nodes that mirror the host’s NUMA topology

• Keep the MCDRAM vNUMA node at a far distance from the CPU vNUMA node

Figure: host NUMA 0 (CPU, DDR) maps to VM vNUMA 0 (VCPU, RAM); host NUMA 1 (MCDRAM) maps to VM vNUMA 1 (RAM).
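The reason for the far distance is that a guest operating system normally satisfies default allocations from the node nearest the running CPU, so MCDRAM stays free for explicit requests. A minimal model of that choice is below. The value 10 is the conventional distance of a node to itself; the value 31 for the far node is an assumed illustrative number, and the helper names are hypothetical.

```python
LOCAL_DISTANCE = 10  # conventional distance of a NUMA node to itself
FAR_DISTANCE = 31    # assumed illustrative value for the MCDRAM node

def distance_table(nr_nodes):
    """Build a symmetric distance matrix: local on the diagonal, far elsewhere."""
    return [[LOCAL_DISTANCE if a == b else FAR_DISTANCE
             for b in range(nr_nodes)]
            for a in range(nr_nodes)]

def default_alloc_node(cpu_node, distances):
    """Pick the node with the smallest distance from the CPU's node."""
    row = distances[cpu_node]
    return min(range(len(row)), key=row.__getitem__)

# vNUMA 0: VCPUs + DDR-backed RAM; vNUMA 1: MCDRAM-backed RAM, no VCPUs.
vnuma_memory = {0: "DDR", 1: "MCDRAM"}
distances = distance_table(len(vnuma_memory))

node = default_alloc_node(cpu_node=0, distances=distances)
print(node, vnuma_memory[node])  # 0 DDR
```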


Reduce timer interrupts

• The local APIC timer interrupt causes frequent VM exits (26,000 exits/s) while the benchmark runs

• Reduce timer interrupts by setting timer_slop to 10 ms

• Side effect: lower timer resolution

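Figure: bar chart of benchmark results on a 0 to 100 scale. Legend: Host, Original VM, Tasklet fixed VM, Timer slop VM. Values recovered from the bar labels:

Benchmark         Stream  Dgemm  Sgemm
Original VM       63      50     50
Tasklet fixed VM  87      85     86
Timer slop VM     99      98     97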
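The effect of a larger slop can be seen in a simplified model where the timer is never programmed to fire sooner than one slop interval after the previous interrupt, and each interrupt services every deadline that has already passed. This is an illustrative model rather than Xen's timer code; the function name and the evenly spaced deadlines are assumptions, with the 26,000 per second rate and the 10 ms slop taken from the slide.

```python
def count_timer_interrupts(deadlines_ns, slop_ns):
    """Count interrupts when consecutive interrupts are at least slop_ns apart."""
    pending = sorted(deadlines_ns)
    interrupts = 0
    now = 0
    i = 0
    while i < len(pending):
        # Fire at the next deadline, but no sooner than now + slop.
        now = max(pending[i], now + slop_ns)
        interrupts += 1
        # Service every deadline that has expired by the time we fire.
        while i < len(pending) and pending[i] <= now:
            i += 1
    return interrupts

SECOND_NS = 1_000_000_000
# One second of evenly spaced deadlines at 26,000 per second.
deadlines = [k * SECOND_NS // 26_000 for k in range(1, 26_001)]

print(count_timer_interrupts(deadlines, slop_ns=0))           # 26000
print(count_timer_interrupts(deadlines, slop_ns=10_000_000))  # 100
```

The same model shows the side effect: with a 10 ms slop, a deadline can be serviced up to 10 ms late.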


Reduce timer interrupts (next steps)

Hypervisor:

• No scheduler is needed for a single VM

Guest:

• Make the guest Linux tickless



Call for action

• We were able to achieve high-performance HPC on Xen

• Changes required in Xen

• Increase vcpu numbers

› 128 => 255 vcpus

› Virtual IOMMU


Q & A

