
9/11/2016


Solutions Showcase: Machine Learning Workloads on Cray Systems

Mark Staveley

Machine Learning Research @ Cray


Machine Learning

Analytics

Artificial Intelligence

eResearch 2016 - (c) Cray Inc

[Diagram: CPU focus vs. GPU focus. Labels: NoSQL, Streaming, Graph, Machine Learning, Some Deep Learning, MapD, SQream, Some Machine Learning, Deep Learning, SQL]



Create code from data

Insight Into Data

Emulate human mind



Machine Learning Workflow


DL Example: Radiograph Classifier

Developer / Researcher:

1) Designs neural net architecture

2) Training (on the training images) defines the linking weights

3) Trained model scores images

Image results:

• Bounding boxes

• Classification scores for tumors, broken bones, lesions, etc.
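The three-step workflow above can be sketched in miniature. This is a hypothetical stand-in, not the classifier from the talk: a single-layer logistic model replaces the deep network, and two-number feature vectors replace radiographs. It only illustrates the split the slide describes, where training produces the weights and scoring then uses them read-only.

```python
import math
import random
from typing import List, Tuple

Model = Tuple[List[float], float]  # (weights, bias): the "linking weights"

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train(samples: List[List[float]], labels: List[int],
          epochs: int = 200, lr: float = 0.5, seed: int = 0) -> Model:
    """Step 2: training defines the weights, here via plain SGD on log loss."""
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    b = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = samples[i], labels[i]
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
            err = p - y  # gradient of the log loss w.r.t. the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, x)]
            b -= lr * err
    return w, b

def score(model: Model, sample: List[float]) -> float:
    """Step 3: the trained model scores a new input; weights are not modified."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, sample)) + b)

if __name__ == "__main__":
    X = [[0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8]]  # toy features
    y = [0, 0, 1, 1]                                      # toy labels
    model = train(X, y)
    print(score(model, [0.95, 0.9]), score(model, [0.05, 0.05]))
```

Because `score` only reads the model, it can run on different hardware, and at a different time, from `train`.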

Important Points:

• Parallelism and local memory: GPUs

• Single / half-precision floats speed up learning without loss of accuracy

• All-to-all communication is required for scale and works well on Aries (the Cray XC interconnect)

• Scoring is separate from training
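The slide does not say which collective algorithm carries the all-to-all exchange. One common way to sum gradients across data-parallel workers is a ring allreduce, sketched below as a single-process simulation (no MPI, no Aries). It assumes every worker holds a gradient vector of the same length; the function names are illustrative only.

```python
from typing import List, Tuple

def chunk_bounds(length: int, n: int, c: int) -> Tuple[int, int]:
    """Start/end indices of chunk c when splitting `length` items into n chunks."""
    base, extra = divmod(length, n)
    start = c * base + min(c, extra)
    end = start + base + (1 if c < extra else 0)
    return start, end

def ring_allreduce(grads: List[List[float]]) -> List[List[float]]:
    """Simulate a ring allreduce: every worker ends with the elementwise sum."""
    if not grads:
        return []
    n, length = len(grads), len(grads[0])
    buf = [list(g) for g in grads]  # each worker's local buffer
    # Phase 1, reduce-scatter: after n-1 steps worker i holds the full sum
    # of chunk (i + 1) % n.
    for step in range(n - 1):
        msgs = []
        for i in range(n):  # collect all sends first, as if simultaneous
            c = (i - step) % n
            s, e = chunk_bounds(length, n, c)
            msgs.append(((i + 1) % n, c, buf[i][s:e]))
        for dst, c, data in msgs:
            s, _ = chunk_bounds(length, n, c)
            for k, v in enumerate(data):
                buf[dst][s + k] += v
    # Phase 2, allgather: pass each completed chunk around the ring.
    for step in range(n - 1):
        msgs = []
        for i in range(n):
            c = (i + 1 - step) % n
            s, e = chunk_bounds(length, n, c)
            msgs.append(((i + 1) % n, c, buf[i][s:e]))
        for dst, c, data in msgs:
            s, e = chunk_bounds(length, n, c)
            buf[dst][s:e] = data
    return buf
```

Each worker sends roughly 2(N-1)/N of its vector in total, so per-worker traffic stays nearly constant as workers are added, while the number of sequential steps grows with N; on a real system an MPI or vendor collective library would perform this exchange.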


Artificial Intelligence Pipeline


Why Cray?


HPC-type Scale and Problems

[Diagram: the CPU focus vs. GPU focus landscape (NoSQL, Streaming, Graph, Machine Learning, Some Deep Learning, MapD, SQream, Some Machine Learning, Deep Learning, SQL), extended with: 100+ Layer Neural Networks; Specific Scope and Use Cases; Convergence Data chains & Analytics workflows; Teams & Tools; Platforms and Technologies]

Copyright 2016 Cray Inc.


Landscape of Parallel Computing Research (Berkeley – 2006/2008):

• Map Reduce

• N-body methods

• Graph traversal

• Graphical models

• Dense and sparse linear algebra

• Spectral methods

• Structured and unstructured grids

• Combinational logic

• Dynamic programming

• Backtrack and branch-and-bound

• Finite-state machines

State of Big Data: Use Cases and Ogre Patterns (NIST 2014):

• Basic statistics – simple Map Reduce implementation

• Generalized n-body problems

• Graph-theoretic computations

• Linear algebraic computations

• Optimizations – e.g., linear programming

• Integration/machine learning

• Alignment problems – e.g., BLAST
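The first NIST entry, basic statistics as a simple Map Reduce job, can be made concrete. The sketch below is an illustration, not code from either report: each partition is summarized independently (map), and summaries are merged pairwise (reduce) with the standard parallel update for count, mean and sum of squared deviations, so variance needs no second pass over the data. `Moments`, `map_partition` and `combine` are hypothetical names.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, List, Tuple

@dataclass(frozen=True)
class Moments:
    count: int
    mean: float
    m2: float  # sum of squared deviations from the mean

def map_partition(values: Iterable[float]) -> Moments:
    """Map: summarize one partition in a single pass (Welford's update)."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    return Moments(count, mean, m2)

def combine(a: Moments, b: Moments) -> Moments:
    """Reduce: merge two summaries; associative up to floating-point rounding."""
    if a.count == 0:
        return b
    if b.count == 0:
        return a
    n = a.count + b.count
    delta = b.mean - a.mean
    return Moments(n, a.mean + delta * b.count / n,
                   a.m2 + b.m2 + delta * delta * a.count * b.count / n)

def mean_and_variance(partitions: List[List[float]]) -> Tuple[float, float]:
    total = reduce(combine, (map_partition(p) for p in partitions), Moments(0, 0.0, 0.0))
    if total.count == 0:
        raise ValueError("no data")
    return total.mean, total.m2 / total.count  # population variance
```

Because `combine` is associative, the reduce step can be arranged as a tree across nodes rather than a single serial fold.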


Components


Hardware

Data

OS Software

Management

Application Software


HW: Single Machine, Cloud, Cluster Cloud

Storage: NAS, HDFS

OS: Linux (CoreOS, CentOS, Ubuntu, RedHat)

Mgmt: Docker, Mesos, Kubernetes, Marathon, Fleet

Toolkits: CNTK, TensorFlow, MXNet, Caffe, Torch, Warp, DSSTNE


Where is Cray headed?


CS-Storm

Three Focus Areas

• Computation

• Storage

• Analytics

NVIDIA M40 24GB + CS-Storm

● Variant of CS-Storm designed for Machine Learning (ML)

● 8 x M40 24GB per machine (each GPU: 3072 CUDA cores + 24 GB GPU memory)

● 512 GB – 1 TB of RAM

● Up to 6 SSDs

● Dual-rail InfiniBand (IB)

Key Features / Data Points

● Workloads have seen a 1.2 – 1.8x improvement

● Optimizations in CUDA not available with K40 or K80

● Building block for a Deep Learning compute solution

● Power and cooling integrity
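The per-node figures above can be combined with some quick arithmetic. This sketch assumes, as the slide's parenthetical indicates, that 3072 CUDA cores and 24 GB of memory are per M40 GPU, and it takes 1 TB as 1024 GB; the helper names are hypothetical.

```python
# Per-node figures from the slide: 8 x M40 24GB per machine,
# 3072 CUDA cores and 24 GB of GPU memory per M40 (assumed per GPU).
GPUS_PER_NODE = 8
CUDA_CORES_PER_GPU = 3072
GPU_MEMORY_GB = 24

def node_totals(gpus: int = GPUS_PER_NODE) -> dict:
    """Aggregate CUDA cores and GPU memory for one node."""
    return {
        "cuda_cores": gpus * CUDA_CORES_PER_GPU,
        "gpu_memory_gb": gpus * GPU_MEMORY_GB,
    }

def host_to_gpu_memory_ratio(host_ram_gb: float, gpus: int = GPUS_PER_NODE) -> float:
    """How many times the node's host RAM exceeds its total GPU memory."""
    return host_ram_gb / (gpus * GPU_MEMORY_GB)

if __name__ == "__main__":
    print(node_totals())  # {'cuda_cores': 24576, 'gpu_memory_gb': 192}
    # Host RAM range on the slide: 512 GB to 1 TB (taken here as 1024 GB).
    for ram in (512, 1024):
        print(ram, round(host_to_gpu_memory_ratio(ram), 2))
```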


Artificial Intelligence Pipeline


Deep Learning @ Cray

Many supported frameworks, common challenges

Need for Data

Support for 3rd Party Libraries (e.g. MKL-DNN & cuDNN)

Highly scalable SGD codes are on the way, and some are already here:

● CNTK: 1-bit SGD and Block Momentum

● TensorFlow: distributed training

● MXNet: MPI + OpenMP parallelism
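The general idea behind 1-bit SGD can be sketched as follows: each worker sends one sign bit per gradient element plus two reconstruction values, and keeps the quantization error locally so it is added back into the next minibatch's gradient (error feedback). This is a simplified illustration with one pair of reconstruction values for the whole vector, not CNTK's implementation; the function names are hypothetical, and Block Momentum is not covered.

```python
from typing import List, Tuple

def one_bit_quantize(
    grad: List[float], residual: List[float]
) -> Tuple[List[bool], float, float, List[float]]:
    """Quantize (grad + residual) to one sign bit per element.

    Returns the bits, the two reconstruction values, and the new residual
    (quantization error) to add to the next minibatch's gradient.
    """
    v = [g + r for g, r in zip(grad, residual)]  # error feedback: fold in last error
    pos = [x for x in v if x >= 0]
    neg = [x for x in v if x < 0]
    # Reconstruction values: means of the non-negative and negative entries,
    # so the decoded vector keeps the sum of v (up to rounding).
    hi = sum(pos) / len(pos) if pos else 0.0
    lo = sum(neg) / len(neg) if neg else 0.0
    bits = [x >= 0 for x in v]  # per-element payload: 1 bit instead of 32
    new_residual = [x - (hi if b else lo) for x, b in zip(v, bits)]
    return bits, hi, lo, new_residual

def dequantize(bits: List[bool], hi: float, lo: float) -> List[float]:
    """Receiver side: rebuild an approximate gradient from bits and two floats."""
    return [hi if b else lo for b in bits]
```

In a data-parallel loop each worker would keep its own `residual` across iterations and exchange only `bits`, `hi` and `lo`, so errors are delayed rather than discarded.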


Cray Deep Learning Research


Three Focus Areas

• Computation

• Storage

• Analytics

https://blogs.technet.microsoft.com/inside_microsoft_research/2015/12/07/microsoft-computational-network-toolkit-offers-most-efficient-distributed-deep-learning-computational-performance/


CNTK Scaling on XC40s and Clusters

[Chart: CNTK 1-bit SGD FFN benchmark, XC (Aries) vs. Storm (IB). Y-axis: average epoch time; x-axis: number of GPUs (1, 2, 4, 8, 16, 32). Series: Storm-K40 + default OpenMPI; Storm-K40 + tuned OpenMPI; XC-K40 + Cray MPICH defaults; XC-K40 + Cray MPICH tuned.]
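A chart of average epoch time against GPU count is usually read as speedup and parallel efficiency. The helper below is a hypothetical sketch that assumes a fixed amount of work per epoch; the timings in the usage block are invented for illustration and are not read from the chart.

```python
from typing import Dict, Tuple

def scaling_table(epoch_time: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map GPU count -> (speedup, parallel efficiency), relative to the
    smallest GPU count present, assuming fixed work per epoch."""
    base_gpus = min(epoch_time)
    base_time = epoch_time[base_gpus]
    table = {}
    for gpus in sorted(epoch_time):
        speedup = base_time / epoch_time[gpus]
        efficiency = speedup / (gpus / base_gpus)  # 1.0 = perfect linear scaling
        table[gpus] = (speedup, efficiency)
    return table

if __name__ == "__main__":
    # Hypothetical timings for illustration only; not taken from the chart.
    for gpus, (s, e) in scaling_table({1: 24.0, 2: 12.0, 4: 8.0}).items():
        print(f"{gpus} GPUs: speedup {s:.2f}, efficiency {e:.0%}")
```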


Wrap Up


Summary

● Close Relationship between HPC workloads and Machine Learning

● Machine Learning is changing how we think about HPC (Data Movement, Workload Resiliency, etc.)

● Desire to make ML easy on Cray Systems (choice of HW and Toolkits)

● Fake Data / Small Data – negative influence on performance optimization targets

● Real Data Sets / Large Scale Workloads – challenges with libraries, implementations and HW

● Engineering and Research Development across Cray HW platforms & components

● Understanding and Learning (Different ML Toolkits + Data Movement + Network Performance Optimizations)


Thank You

Mark_Staveley

[email protected]

Thursday – 10:30-10:50 – Industry State 1/2 – Scaling Out Deep Learning Workloads on Cray Systems

