
Tachyon Meetup: First-Ever Scalable, Distributed Deep Learning Architecture using Tachyon & Spark

Date post: 22-Jan-2018
Transcript
Page 1: Tachyon Meetup: First-Ever Scalable, Distributed Deep Learning Architecture using Tachyon & Spark

@adataoinc @pentagoniac http://adatao.com

adatao.com/deeplearning

Distributed Deep Learning on Tachyon & Spark

Christopher Nguyen, PhD

Vu Pham

Michael Bui, PhD

Page 2:

The Journey

1. What We Do At Adatao

2. Challenges We Ran Into

3. How We Addressed Them

4. Lessons to Share

Along the Way, You'll Hear

1. How some interesting things came about

2. Where some interesting things are going

3. How some good engineering/architectural decisions are made

Page 3:

Acknowledgements/Discussions with

§ Nam Ma, Adatao

§ Haoyuan Li, Tachyon Nexus

§ Shaoshan Liu, Baidu

§ Reza Zadeh, Stanford/Databricks

Page 4:

Adatao 3 Pillars

App Development

BIG APPS

PREDICTIVE ANALYTICS

NATURAL INTERFACES

COLLABORATION

Big Data + Big Compute

Page 5:

Deep Learning Use Case

IoT

Customer Segmentation

Fraud Detection

Page 6:

etc…

Challenge 1: Deep Learning Platform Options

Page 7:

Which approach?

Challenge 1: Deep Learning Platform Options

Page 8:

It Depends!

Page 9:

MapReduce vs Pregel At Google: An Analogy

"If you squint at a problem just a certain way, it becomes a MapReduce problem."

— Sanjay Ghemawat, Google

Page 10:

API

API

API

Data Engineer | Data Scientist | Business Analyst

Custom Apps

Page 11:

Moral: “And No Religion, Too.”

1. Architectural choices are locally optimal.

2. What's best for someone else isn't necessarily best for you. And vice versa.

Page 12:

Challenge 2: Who-Does-What Architecture

DistBelief

Large Scale Distributed Deep Networks, Jeff Dean et al, NIPS 2012

1. Compute Gradients

2. Update Params (Descent)
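The two-phase split above can be sketched in miniature. This is a hypothetical pure-Python stand-in, not DistBelief's or Adatao's actual code: workers compute gradients on their data shards (phase 1), and a central parameter store averages them and takes the descent step (phase 2), here on a toy linear-regression model.

```python
import random

def worker_gradient(w, xs, ys):
    # Phase 1: gradient of mean squared error on this worker's shard.
    g = [0.0] * len(w)
    for x, y in zip(xs, ys):
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            g[j] += err * xj / len(ys)
    return g

def server_update(w, grads, lr=0.1):
    # Phase 2: average the worker gradients, take one descent step.
    avg = [sum(g[j] for g in grads) / len(grads) for j in range(len(w))]
    return [wj - lr * aj for wj, aj in zip(w, avg)]

random.seed(0)
true_w = [1.0, -2.0, 0.5]
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(100)]
Y = [sum(wi * xi for wi, xi in zip(true_w, x)) for x in X]

shards = [(X[i::4], Y[i::4]) for i in range(4)]  # 4 "workers"
w = [0.0, 0.0, 0.0]
for _ in range(300):
    grads = [worker_gradient(w, xs, ys) for xs, ys in shards]
    w = server_update(w, grads)
```

After training, `w` recovers `true_w`; the point is the separation of roles, with gradient computation parallel across shards and the update centralized.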

Page 13:

"The View From Berkeley" — Dave Patterson, UC Berkeley

"7 Dwarfs of Parallel Computing" — Phillip Colella, LBNL

Page 14:

Unleashing the Potential of Tachyon

Today: Memory-Based Filesystem, i.e. a Datacenter-Scale Distributed Filesystem

Tomorrow: Filesystem-Backed Shared Memory, i.e. Datacenter-Scale Distributed Memory

Ah Ha!

Page 15:

Spark & Tachyon Architectural Options

§ Spark-Only: model as a broadcast variable

§ Tachyon-Storage: model stored as a Tachyon file

§ Param-Server: model hosted in an HTTP server

§ Tachyon-CoProc: model stored and updated by Tachyon

Page 16:

Tachyon CoProcessor Concept

Page 17:

Tachyon CoProcessor

Page 18:

Tachyon-Storage In Detail

A. Compute Gradients

B. Update Params (Descent)

Page 19:

Tachyon-CoProcessors In Detail

A. Compute Gradients

B. Update Params (Descent)

Page 20:

Tachyon-CoProcessors

§ Spark workers do Gradients

- Handle data-parallel partitions

- Only compute Gradients; freed up quickly

- New workers can continue gradient compute where previous workers left off (mini-batch behavior)

§ Use Tachyon for Descent

- Model Host

- Parameter Server
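The "natural mini-batch behavior" above can be illustrated with a hypothetical stand-in (not the actual CoProcessor code): the shared model lives in one place, here a plain Python list playing the role of the Tachyon-hosted parameters, and is updated after each partition's gradient, so every worker starts from the freshest parameters.

```python
import random

random.seed(1)
true_w = [0.5, -1.0]
X = [[random.gauss(0, 1) for _ in range(2)] for _ in range(80)]
Y = [sum(wi * xi for wi, xi in zip(true_w, x)) for x in X]
partitions = [list(range(i, 80, 8)) for i in range(8)]  # 8 data partitions

def partition_gradient(w, idxs):
    # Gradient of mean squared error on one data partition.
    g = [0.0] * len(w)
    for i in idxs:
        err = sum(wi * xi for wi, xi in zip(w, X[i])) - Y[i]
        for j in range(len(w)):
            g[j] += err * X[i][j] / len(idxs)
    return g

shared_w = [0.0, 0.0]          # stand-in for the Tachyon-hosted model
for _ in range(40):            # passes over the data
    for idxs in partitions:    # one descent step per partition
        g = partition_gradient(shared_w, idxs)
        shared_w = [wj - 0.1 * gj for wj, gj in zip(shared_w, g)]
```

Contrast with the once-per-epoch averaged update: here the model advances eight times per pass, which is the mini-batch effect the slide describes.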

Page 21:

Demo

ImageNet model

Custom Apps

Adatao AppBuilder

Adatao PredictiveEngine

Data Engineer | Data Scientist | Business Analyst

[Model Zoo, Network-in-Network]

Predictive Intelligence for All

Page 22:

Result

Page 23:

Data Set    Size          Model Size

MNIST       50K x 784     1.8M

Higgs       10M x 28      8.4M

Molecular   150K x 2871   14.3M

Page 24:

Constant-Load Scaling

[Chart: Relative Speed (0-24) vs. # of Spark Executors (8, 16, 24) for Spark-Only, Tachyon-Storage, Param-Server, and Tachyon-CoProc]

Page 25:

Training-Time Speed-Up

[Chart: speed-up (-20% to 100%) on MNIST, Molecular, and Higgs for Spark-Only, Tachyon-Storage, Param-Server, and Tachyon-CoProc]

Page 26:

Training Convergence

[Chart: Error Rate (0%-100%) vs. Iterations (400 to 20,000) for Spark-Only, Tachyon-Storage, Param-Server, and Tachyon-CoProc]

Page 27:

Lessons Learned: Tachyon CoProcessors

§ Spark (Gradient) and Tachyon (Descent) can be scaled independently

§ The combination gives natural mini-batch behavior

§ Up to 60% speed gain, scales almost linearly, and converges faster.

Page 28:

Lessons Learned: Data Partitioning

§ Tunable Number of Data Partitions

- Big partitions: slower convergence, shorter time per epoch

- Small partitions: faster convergence, longer time per epoch (due to network communication)
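The tradeoff above can be made concrete with some illustrative arithmetic (all numbers below are hypothetical, not measurements from the talk): each partition costs one model round-trip, so smaller partitions mean more model updates per epoch, but also more communication time per epoch.

```python
import math

def epoch_profile(n_rows, partition_size, row_cost_s, roundtrip_s):
    # One model update (and one network round-trip) per partition.
    updates = math.ceil(n_rows / partition_size)
    compute = n_rows * row_cost_s   # gradient work, same regardless of split
    comm = updates * roundtrip_s    # network overhead grows with partitions
    return updates, compute + comm

# 10M rows (Higgs-sized), 1 microsecond/row, 50 ms per model round-trip
for p in (1_000_000, 100_000, 10_000):
    updates, secs = epoch_profile(10_000_000, p, 1e-6, 0.05)
    print(p, updates, round(secs, 1))
```

Shrinking partitions 100x multiplies updates per epoch (faster convergence per epoch) but the epoch itself gets several times longer, which is exactly the knob the slide describes.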

Page 29:

Lessons Learned: Memory Tuning

§ Typically each machine needs:

- (model_size + batch_size * unit_count) * 3 * 4 * 1.5 * executors

§ batch_size matters

§ If RAM capacity is low, reduce the number of executors
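The rule of thumb above can be sketched as a function, under one hypothetical reading of the constants (roughly 3 copies of the arrays in flight, 4 bytes per float32, 1.5x JVM overhead); the example numbers below are made up for illustration, not from the talk.

```python
def estimated_bytes(model_size, batch_size, unit_count, executors):
    # Floats held per executor: the model plus one batch of activations.
    floats = model_size + batch_size * unit_count
    # ~3 copies in flight, 4 bytes/float32, ~1.5x JVM overhead, per executor.
    return floats * 3 * 4 * 1.5 * executors

# e.g. a 1.8M-parameter model (MNIST-sized), batch of 128, a hypothetical
# 1,000 hidden units, 4 executors per machine:
gb = estimated_bytes(1_800_000, 128, 1_000, 4) / 1e9
print(round(gb, 2))  # roughly 0.14 GB per machine
```

Because `batch_size * unit_count` enters multiplicatively, doubling the batch can dominate the footprint for small models, which is why the slide singles it out.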

Page 30:

Lessons Learned: GPU vs CPU

§ GPU is 10x faster locally, 2-4x faster on Spark

§ GPU memory is limited; AWS GPU instances commonly have 4-6 GB

§ Better to have multiple GPUs per worker

§ On the JVM with multi-process access, GPUs might fail randomly

Page 31:

Summary

Page 32:

Summary

§ Tachyon is much more than a memory-based filesystem

- Tachyon can become filesystem-backed shared memory

§ The combination of Spark & Tachyon CoProcessing yields superior Deep Learning performance in multiple dimensions

§ Adatao is open-sourcing both:

- Tachyon CoProcessor design & code

- Spark & Tachyon-CoProcessor Deep Learning implementation

Page 33:

Appendices

Page 34:

Design Choices

§ Combine the results of Spark workers:

- Parameter averaging

- Gradient averaging ✓ (recommended)

- Best model
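The three combining strategies can be sketched as simple reducers (a hypothetical illustration, not Adatao's code; the loss function for "best model" is a caller-supplied stand-in):

```python
def gradient_averaging(w, grads, lr=0.1):
    # Average the workers' gradients, then take one descent step from
    # the shared model (the recommended choice above).
    avg = [sum(col) / len(grads) for col in zip(*grads)]
    return [wj - lr * gj for wj, gj in zip(w, avg)]

def parameter_averaging(models):
    # Average the workers' locally updated models coordinate-wise.
    return [sum(col) / len(models) for col in zip(*models)]

def best_model(models, loss):
    # Keep whichever worker's model scores lowest on a loss function.
    return min(models, key=loss)
```

With a common starting point and a single local step per worker, gradient averaging and parameter averaging coincide; they diverge once workers take multiple local steps, which is where the choice starts to matter.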