
Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark

Date post: 21-Apr-2017
Upload: spark-summit
Page 1: Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark

Which Is Deeper
Comparison of Deep Learning Frameworks Atop Spark

Zhe Dong, Dr. Yu Cao
EMC Corporation


Outline
• Motivation
• Theoretical Principle
• State-of-the-Art
• Evaluation Criteria
• Evaluation Results
• Summary
• Conclusion


Deep Learning on Spark: Motivation
• Single-machine DL
– Low efficiency (training takes hours or even days)
– Limited DNN model capability (hard to support billions of parameters)
• Dedicated deep learning cluster
– Massive data movement
– High maintenance cost
• Spark + Deep Learning = Truly All-in-One


Theoretical Principle
• Large Scale Distributed Deep Networks, Jeffrey Dean et al., 2012
• Model parallelism
• Data parallelism

https://papers.nips.cc/paper/4687-large-scale-distributed-deep-networks.pdf


Data Parallelism for Distributed SGD
• Model is replicated on worker nodes
• Two repeating steps
– Train each model replica with mini-batches
– Synchronize model parameters across the cluster
• Specific implementations can differ in
– How parameters are combined
– Synchronization (strong or weak)
– Parameter server (centralized or not)
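The two repeating steps above can be sketched in a few lines of plain Python. This is a minimal single-process illustration, not the code of any framework in this comparison: it assumes a toy linear model, parameter averaging as the way parameters are combined, and strong (blocking) synchronization. The function names `gradient`, `train_replica`, and `data_parallel_sgd` are hypothetical.

```python
import random

def gradient(w, batch):
    """Squared-error gradient of a linear model y = w . x, averaged over a mini-batch."""
    g = [0.0] * len(w)
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            g[j] += 2.0 * err * xj / len(batch)
    return g

def train_replica(w, shard, rng, lr=0.05, batch_size=8, steps=5):
    """Step 1: train one model replica with mini-batches drawn from its own shard."""
    w = list(w)  # each worker starts from a copy of the shared parameters
    for _ in range(steps):
        batch = rng.sample(shard, batch_size)
        g = gradient(w, batch)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

def data_parallel_sgd(data, dim, n_workers=4, rounds=60, seed=0):
    """Alternate local training and synchronization (assumed here: plain averaging)."""
    rng = random.Random(seed)
    shards = [data[i::n_workers] for i in range(n_workers)]  # one data shard per worker
    w = [0.0] * dim
    for _ in range(rounds):
        replicas = [train_replica(w, shard, rng) for shard in shards]
        # Step 2: synchronize model parameters by averaging the replicas.
        w = [sum(column) / n_workers for column in zip(*replicas)]
    return w
```

In a real cluster the list comprehension over shards would run on separate workers; the design choices listed on the slide (how parameters are combined, strong or weak synchronization, centralized or decentralized parameter server) all change only step 2.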

5

Page 6: Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark

Downpour SGD Client Pseudocode

http://www.cs.toronto.edu/~ranzato/publications/DistBeliefNIPS2012_withAppendix.pdf
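The pseudocode figure did not survive in this transcript. As a rough stand-in, here is a sketch of a Downpour-style client loop as described in the linked paper: the client periodically fetches parameters, accumulates gradients locally, and periodically pushes the accumulated gradients without waiting for other clients. Everything concrete here is an assumption for illustration: the in-process `ParameterServer` class, the scalar toy model, threads standing in for worker machines, and the values of `n_fetch`, `n_push`, and the learning rate.

```python
import random
import threading

class ParameterServer:
    """Hypothetical in-process stand-in for a centralized parameter server."""
    def __init__(self, w, lr):
        self.w, self.lr = w, lr
        self._lock = threading.Lock()

    def fetch(self):
        with self._lock:
            return self.w

    def push(self, accrued_gradient):
        # Apply whatever gradients arrive, in arrival order (weak synchronization).
        with self._lock:
            self.w -= self.lr * accrued_gradient

def downpour_client(server, shard, steps, n_fetch=5, n_push=5,
                    lr=0.02, batch_size=4, seed=0):
    """One model replica fitting the scalar model y = w * x on its data shard."""
    rng = random.Random(seed)
    w, accrued = 0.0, 0.0
    for step in range(steps):
        if step % n_fetch == 0:
            w = server.fetch()  # refresh the (possibly stale) local parameters
        batch = rng.sample(shard, batch_size)
        grad = sum(2.0 * (w * x - y) * x for x, y in batch) / batch_size
        accrued += grad
        w -= lr * grad          # keep training locally between fetches
        if step % n_push == 0:
            server.push(accrued)
            accrued = 0.0

def run_downpour(data, n_workers=4, steps=200):
    """Start one client thread per shard and return the server's final parameter."""
    server = ParameterServer(w=0.0, lr=0.02)
    shards = [data[i::n_workers] for i in range(n_workers)]
    threads = [threading.Thread(target=downpour_client,
                                args=(server, shard, steps), kwargs={"seed": i})
               for i, shard in enumerate(shards)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return server.w
```

Because clients never wait for each other, a client may compute gradients against parameters that other clients have already moved; that staleness is the cost of weak synchronization.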


DL on Spark – State-of-the-Art
• AMPLab SparkNet
• Yahoo! CaffeOnSpark
• Arimo Tensorflow On Spark
• Skymind DeepLearning4J
• DeepDist
• H2O Spark


Evaluation Criteria
(Each criterion is broken into dimensions; an example question or measure follows each dimension.)
• Ease of Getting Started
– Documentation: Are there detailed, well-organized, up-to-date documents?
– Installation: How automated is it?
– Built-in Examples: Are examples available for a quick warm-up?
• Ease of Use
– Interface: Programming language support
– Model Encapsulation: Model/Layer/Node
• Functionality
– Built-in Models: Which NN models have been implemented?
– Parallelism: Model parallelism or data parallelism
• Performance
– Performance: MNIST benchmark results
• Status Quo
– Community Vitality: GitHub project statistics
– Enterprise Support: Contributions from organizations?


SparkNet
• Started by AMPLab in 2015
• Wrapper of Caffe and Tensorflow
• Centralized parameter server
• Strong SGD synchronization
• Differentiating feature: each worker runs a fixed number (τ) of mini-batch iterations on its subset of data between synchronizations

http://arxiv.org/pdf/1511.06051v4
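To make the role of τ concrete, the sketch below runs τ local SGD iterations per worker between parameter-averaging rounds and counts how many synchronizations a fixed iteration budget needs. It is a toy single-process illustration under stated assumptions (scalar model y = w * x, averaging on a notional master, hypothetical function names), not SparkNet's actual code.

```python
import random

def sgd_steps(w, shard, tau, rng, lr=0.02, batch_size=4):
    """Run tau mini-batch SGD iterations on one worker's subset of the data."""
    for _ in range(tau):
        batch = rng.sample(shard, batch_size)
        grad = sum(2.0 * (w * x - y) * x for x, y in batch) / batch_size
        w -= lr * grad
    return w

def tau_step_training(shards, tau, total_iterations, seed=0):
    """Broadcast w, let every worker run tau iterations, then average on the master."""
    rng = random.Random(seed)
    w, sync_rounds = 0.0, 0
    for _ in range(total_iterations // tau):
        local = [sgd_steps(w, shard, tau, rng) for shard in shards]
        w = sum(local) / len(local)  # strong synchronization: wait for all workers
        sync_rounds += 1
    return w, sync_rounds
```

For a budget of 500 iterations per worker, τ = 10 needs 50 synchronization rounds while τ = 50 needs only 10, so a larger τ trades communication for more drift between replicas.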


AMPLab SparkNet - Evaluation
• Ease of Getting Started
– Documentation: Paper; no blog; README.md on GitHub
– Installation: No installation; has to be copied to each worker node
– Built-in Examples: Cifar10/MNIST/ImageNet
• Ease of Use
– Interface: Java/Scala
– Model Encapsulation: Model/Layer
• Functionality
– Built-in Models: Tensorflow and Caffe
– Parallelism: Data parallelism
• Performance
– Performance: MNIST
• Status Quo
– Community Vitality
– Enterprise Support: AMPLab

MNIST results:
                1       2       3       4
Iterations      1000    2000    5000    10000
Time (seconds)  2130    4218    10471   21003
Accuracy        94.13%  94.26%  94.01%  94.22%


Deeplearning4J
• Started by Skymind in 2014
• An open-source, distributed deep-learning project in Java and Scala
• Parameter server: IterativeReduce
• Strong SGD synchronization

http://deeplearning4j.org/iterativereduce.html


Deeplearning4J - Evaluation
• Ease of Getting Started
– Documentation: Comprehensive but poorly organized
– Installation: No installation
– Built-in Examples: Only for CDH5; MNIST/IRIS/GravesLSTM
• Ease of Use
– Interface: Java/Scala
– Model Encapsulation: Layer
• Functionality
– Built-in Models: CNN/RNN/LSTM/DBN/SAE
– Parallelism: Data parallelism
• Performance
– Performance: MNIST
• Status Quo
– Community Vitality
– Enterprise Support: Skymind

MNIST results:
                1      2      3      4
Epochs          5      10     15     20
Time (seconds)  2098   4205   6303   8367
Accuracy        70%    79%    82.7%  84.6%


CaffeOnSpark
• Started by Yahoo! in 2015
• Peer-to-peer parameter server
• Strong SGD synchronization
• Distinguishing feature: MPI Allreduce, RDMA, InfiniBand

[Diagram: three workers, each holding a full copy of the weights w1, w2, w3. Worker 1 is the parameter server for w1, Worker 2 for w2, and Worker 3 for w3. Arrows show weights propagating between workers; gradients are sent in the reverse direction.]
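The peer-to-peer layout in the diagram can be simulated in plain Python: every worker owns one slice of the weights, collects the gradients for that slice from all workers, updates it, and then propagates the updated slice back. This is only a sketch of the communication pattern; it does not use MPI, RDMA, or any CaffeOnSpark API, and the function name `p2p_sync_step` and the round-robin ownership rule are assumptions.

```python
def p2p_sync_step(weights, worker_grads, lr=0.1):
    """One synchronous update where worker k acts as parameter server for its slice.

    weights:      current weights, identical on every worker before the step
    worker_grads: one gradient list per worker, each the same length as weights
    Returns the weight replica held by each worker after the step.
    """
    n_workers = len(worker_grads)
    updated = {}
    for owner in range(n_workers):
        # Assumed ownership rule: weight i belongs to worker i % n_workers.
        for i in range(owner, len(weights), n_workers):
            # Gradients for weight i are sent from every worker to its owner.
            avg_grad = sum(g[i] for g in worker_grads) / n_workers
            updated[i] = weights[i] - lr * avg_grad
    # Each owner propagates its updated slice back to all peers.
    new_weights = [updated[i] for i in range(len(weights))]
    return [list(new_weights) for _ in range(n_workers)]
```

With three workers and three weights this reproduces the slide's picture: each worker serves exactly one weight, so no single node has to receive the gradients for the whole model.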


CaffeOnSpark - Evaluation
• Ease of Getting Started
– Documentation: Blog; README.md on GitHub
– Installation: Everything Caffe needs has to be installed on each node
– Built-in Examples: Cifar10/MNIST
• Ease of Use
– Interface: Java/Scala, DataFrames
– Model Encapsulation: Model
• Functionality
– Built-in Models: Caffe
– Parallelism: Data parallelism
• Performance
– Performance: MNIST
• Status Quo
– Community Vitality
– Enterprise Support: Yahoo!

MNIST results:
                1      2      3      4
Iterations      1000   2000   5000   10000
Time (seconds)  224    445    1113   2229
Accuracy        97%    99.4%  99.7%  99.6%


Tensorflow on Spark
• Started by Arimo in 2014
• A data-parallel Downpour SGD implementation on Spark
• Centralized parameter server
• Weak SGD synchronization


Tensorflow on Spark - Evaluation
• Ease of Getting Started
– Documentation: Blog; Spark Summit East 2016 slides and video
– Installation: Depends on Tensorflow and tornado
– Built-in Examples: MNISTcnn/MNISTdnn/higgsdnn/moleculardnn
• Ease of Use
– Interface: Python
– Model Encapsulation: Model/Layer
• Functionality
– Built-in Models: Tensorflow
– Parallelism: Data parallelism
• Performance
– Performance: MNIST
• Status Quo
– Community Vitality
– Enterprise Support: Arimo

MNIST results:
                1      2      3      4
Epochs          5      10     15     20
Time (seconds)  223    415    615    828
Accuracy        93%    94%    94.2%  95.4%


Benchmark – MNIST
[Figure: accuracy (%, axis from 20 to 100) versus time (seconds, 0 to 10000) for SparkNet, DL4J, CaffeOnSpark, and Tensorflow on Spark]
Cluster: one master (16 cores, 64 GB), five slaves (8 cores, 32 GB)
Executor memory: 20 GB
Batch size: 64


Benchmark – MNIST
[Figure: accuracy (%, axis from 20 to 100) versus time (seconds, 0 to 1000) for SparkNet, CaffeOnSpark, and Tensorflow on Spark]
Cluster: one master (16 cores, 64 GB), five slaves (8 cores, 32 GB)
Executor memory: 20 GB
Batch size: 64
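The numbers behind these plots come from the per-framework MNIST tables on the evaluation slides. A short script makes the relative speeds easier to read; the figures are copied from those tables, while the dictionary layout and helper names are only illustrative. Note that two frameworks report iterations and two report epochs, so only like-for-like pairs are compared.

```python
# Final row of each framework's reported MNIST results (from the evaluation slides).
reported = {
    "SparkNet":            {"unit": "iteration", "work": 10000, "seconds": 21003},
    "CaffeOnSpark":        {"unit": "iteration", "work": 10000, "seconds": 2229},
    "DL4J":                {"unit": "epoch",     "work": 20,    "seconds": 8367},
    "Tensorflow on Spark": {"unit": "epoch",     "work": 20,    "seconds": 828},
}

def seconds_per_unit(name):
    """Average wall-clock seconds per iteration or per epoch for one framework."""
    row = reported[name]
    return row["seconds"] / row["work"]

def slowdown(slower, faster):
    """Time ratio between two frameworks that report the same unit of work."""
    assert reported[slower]["unit"] == reported[faster]["unit"]
    return seconds_per_unit(slower) / seconds_per_unit(faster)

for name, row in reported.items():
    print(f"{name}: {seconds_per_unit(name):.4g} s per {row['unit']}")
print(f"SparkNet vs CaffeOnSpark: {slowdown('SparkNet', 'CaffeOnSpark'):.1f}x")
print(f"DL4J vs Tensorflow on Spark: {slowdown('DL4J', 'Tensorflow on Spark'):.1f}x")
```

On the reported figures, SparkNet takes roughly 9.4 times as long as CaffeOnSpark for 10000 iterations, and DL4J roughly 10.1 times as long as Tensorflow on Spark for 20 epochs.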


Tensorflow On Spark - Evaluation
• Ease of Use
– Language: Java/Scala
– Interface Level: Model/High-Level Network Structure
• Function
– Algorithm: Excellent
– Data Parallel
– Only Ethernet
• Ease of Getting Started
– Document: Average
– Need to set up on each node
– Example: Average
• Performance
• Maturity
– Early Stage
– Community: Poor
– Commercial or big-company support: AMPLab

Evaluation Criteria
Columns: Dimensions | SparkNet | DL4J | CaffeOnSpark | Tensorflow on Spark
• Ease of Getting Started
– Documentation
– Installation
– Built-in Examples
• Ease of Use
– Interface
– Model Encapsulation
• Functionality
– Built-in Models
– Parallelism
• Performance
– Performance
• Status Quo
– Community Vitality
– Enterprise Support


Conclusion
• Common issues
– Lack of model parallelism
– Potential network congestion
– Early-stage development
• Future evaluation work
– GPU integration
– SGD synchronization
– Scalability


THANK YOU
[email protected]

