GPU Virtualization: Doing Much More with GPUs

Post on 16-Apr-2017

Mazhar Memon, CTO, Bitfusion
SC16, Salt Lake City, UT

Quick Poll: Any GPU users?

GPU Users: Much variety

• Learners
• Application developers
• HPC scientists
• CAD/CAE
• Artists + designers
• Data analysts

One size doesn’t fit all

Manufacturing

Retail & Finance

Media & Entertainment

Pharma & Healthcare

Oil & Gas

Deep Learning

Variety of GPU Sizes

• TX1
• GTX
• Tesla
• Quadro

Problem: How to do more with your (static) GPUs?

Virtualization 101

[Diagram: a server whose hypervisor hosts four VMs, followed by the same layout with a vGPU attached to each VM]

Virtualization: Support More Users or Applications on a Single Server

Many users, small problems

One user, one big problem

Large data, small device memory

[Diagram: app requirement compared with available GPU memory]

App demand >> GPU memory

GPU Virtualization on Steroids

Use your favorite GPU applications as-is

Bitfusion Boost Layer

Your existing GPU infrastructure

Solve Small Problems Cheaply

Slice GPUs into arbitrary fractions
Memory and process isolation
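
The slide does not say how Boost slices a GPU, so the following is only a minimal bookkeeping sketch of the idea: each tenant reserves a fraction of one GPU and receives a memory cap it must stay within. The class `SlicedGpu` and its methods are hypothetical names for illustration, not part of any Bitfusion API, and real isolation would need enforcement in the runtime rather than in a Python object.

```python
from fractions import Fraction


class SlicedGpu:
    """Toy bookkeeping for one physical GPU shared by several tenants.

    Hypothetical illustration only; not how Bitfusion Boost is implemented.
    """

    def __init__(self, total_mem_mib):
        self.total_mem_mib = total_mem_mib
        self.slices = {}  # tenant name -> Fraction of the GPU it holds

    def free_fraction(self):
        """Fraction of the GPU not yet reserved by any tenant."""
        return Fraction(1) - sum(self.slices.values(), Fraction(0))

    def allocate(self, tenant, fraction):
        """Reserve a fraction for a tenant and return its memory cap in MiB."""
        if tenant in self.slices:
            raise ValueError(f"{tenant} already holds a slice")
        if not 0 < fraction <= self.free_fraction():
            raise ValueError("requested fraction is not available")
        self.slices[tenant] = fraction
        return int(self.total_mem_mib * fraction)

    def release(self, tenant):
        """Return a tenant's slice to the free pool."""
        self.slices.pop(tenant)
```

Using exact fractions rather than floats keeps the accounting free of rounding drift when slices are arbitrary sizes such as one third.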

Available on Nimbix Today: $0.49 GPU instances

Logically attached GPUs

Solve Large Problems Dynamically

CPU-only node: 48 cores, 3 TB memory, 72 TB SSD storage

[Diagram: the CPU-only node joined through Boost to racks with GPUs, forming a massive virtual node]

Creating the largest virtual GPU machines on demand

Solve Large Data Problems Efficiently

[Diagram labels: GPU, Host Memory, Available memory]

Dynamic paging of GPU memory backed by host memory
Works for non-Pascal GPUs as well
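
To make the paging idea concrete, here is a small sketch of a least-recently-used pager: a fixed number of "device" page slots, with evicted pages parked in a "host" store and paged back in on access. The LRU policy, the class name `GpuPager`, and the page granularity are all assumptions for illustration; the slide does not describe the policy Boost actually uses.

```python
from collections import OrderedDict


class GpuPager:
    """Toy LRU pager: a few 'device' page slots backed by a 'host' store.

    Hypothetical illustration; the real paging policy is not described
    in the talk.
    """

    def __init__(self, device_pages):
        self.device_pages = device_pages
        self.device = OrderedDict()  # page id -> data, least recently used first
        self.host = {}               # pages evicted from the device

    def write(self, page_id, data):
        self._make_resident(page_id)
        self.device[page_id] = data

    def read(self, page_id):
        self._make_resident(page_id)
        return self.device[page_id]

    def _make_resident(self, page_id):
        if page_id in self.device:
            self.device.move_to_end(page_id)  # mark as most recently used
            return
        if len(self.device) >= self.device_pages:
            # Device is full: evict the least recently used page to the host.
            victim, data = self.device.popitem(last=False)
            self.host[victim] = data
        # Page in from host memory if it was evicted earlier, else start empty.
        self.device[page_id] = self.host.pop(page_id, None)
```

With two device slots, writing three pages pushes the first one to the host store, and reading it again brings it back while evicting another page, so the application sees more memory than the device holds.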

Monitoring and Managing GPUs Easily

• Use your favorite tools: all common tools, e.g. nvidia-smi, work across virtual clusters
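
As an example of scripting such a tool, the sketch below runs `nvidia-smi` with its CSV query options and parses the result into dictionaries. The query flags are standard `nvidia-smi` options; the helper names `parse_gpu_rows` and `query_gpus` are made up here, and the code assumes every field is numeric (some GPUs report `[N/A]` for memory). Whether the output covers remote GPUs depends on the virtualization layer, as the slide claims, and is not something this code checks.

```python
import csv
import io
import subprocess

# Standard nvidia-smi query: one CSV row per GPU, no header, no unit suffixes.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_rows(text):
    """Parse nvidia-smi CSV output into a list of per-GPU dictionaries."""
    rows = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        index, name, used, total = row
        rows.append({
            "index": int(index),
            "name": name,
            "used_mib": int(used),
            "total_mib": int(total),
        })
    return rows


def query_gpus():
    """Run nvidia-smi and return parsed rows (requires an NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_rows(result.stdout)
```

Keeping the parser separate from the subprocess call makes it possible to test the parsing on captured output from a machine without GPUs.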

Handling Faults Automatically

Failover to any other available GPU server upon catastrophic, memory, or intermittent faults
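
The failover behavior can be pictured as a retry loop over candidate servers. This is a minimal sketch under stated assumptions: `GpuFault` stands in for whatever fault signal the runtime raises, `run_with_failover` is a hypothetical helper, and the job is assumed to be safe to restart from scratch on another server. The talk does not say how Boost detects faults or restores state.

```python
class GpuFault(Exception):
    """Stand-in for a catastrophic, memory, or intermittent GPU fault."""


def run_with_failover(job, servers):
    """Try the job on each server in order and return (server, result).

    `job` is any callable that takes a server name and either returns a
    result or raises GpuFault. Hypothetical helper for illustration.
    """
    errors = {}
    for server in servers:
        try:
            return server, job(server)
        except GpuFault as exc:
            errors[server] = exc  # remember the fault and try the next server
    raise RuntimeError(f"all GPU servers failed: {errors}")
```

A caller would pass the list of currently available GPU servers; only faults of the expected type trigger failover, so ordinary application bugs still surface immediately.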

Bitfusion Boost: Software Stack

[Diagram: system view of an application spanning a local server and remote servers. Each server is labeled with the layers Hardware, VM Hypervisor, Drivers, Operating system, and SDI. User-space labels: Application, Libraries, Open APIs, Custom APIs, Core Functions.]

Intercepts applications and applies a variety of rules including automatic scale-out, resource pooling, high availability, etc.

Deploy on bare metal, containers, VMs. Secure, Portable, Frictionless

App-Specific Instance Configurations as Machine Images

Resource Pooling:
• Consolidate use of compute resources
• Increase utilization
• Lower capital costs

Resource Provisioning:
• Enforce CPU, memory, utilization quotas
• Effect QoS policy and guarantees
• Maximize utilization and reduce costs

High Availability:
• Detect failures at app level
• Rollback, failover, error detection
• Events for higher-level reporting

Heterogeneous Offload:
• Leverage HPC hardware
• Interpose vendor libraries
• Retarget hot functions to efficient specialized devices

Scale-out:
• Distribute and balance load across systems
• Scale performance on demand
• Take advantage of runtime optimizations

Advanced Profiling:
• Understand application demands of the datacenter
• Fine-grained data provides unique insight
• Precise recommendations for capacity planning

• Deep Learning Caffe
• Deep Learning Torch
• Deep Learning TensorFlow
• Media Transcoding
• Rendering
• Scientific Computing

Boost: Add a broad set of features to your application

http://www.bitfusion.io/boost-machine-images

Boost available on Nimbix today
Developer-optimized machine configurations:

Learn more about Bitfusion Boost at boost.bitfusion.io