Transcript
Page 1:

PETUUM: A New Platform for Distributed Machine Learning on Big Data

Eric P. Xing, Qirong Ho, Wei Dai, Jin Kyu Kim, Jinliang Wei, Seunghak Lee, Xun Zheng, Pengtao Xie, Abhimanu Kumar, Yaoliang Yu

Page 2:

What they think…

• Machine learning is becoming a primary mechanism for extracting information from data.
• ML methods need to scale beyond a single machine.
• Flickr, Instagram, and Facebook are anecdotally known to possess tens of billions of images.
• It is highly inefficient to process such big data sequentially, in a batch or stochastic fashion, in a typical iterative ML algorithm.

Page 3:

• Despite the rapid development of many new ML models and algorithms aimed at scalable applications, adoption of these technologies remains limited in the wider data mining, NLP, vision, and other application communities.

• Difficult migration from an academic implementation (small desktop PCs, small lab clusters) to a big, less predictable platform (cloud or a corporate cluster) prevents ML models and algorithms from being widely applied.

Page 4:

Why build a new Framework…?

• To find a systematic way to efficiently apply a wide spectrum of advanced ML programs to industrial-scale problems, using Big Models (up to hundreds of billions of parameters) on Big Data (up to terabytes or petabytes).
• Modern parallelization strategies employ fine-grained operations and scheduling beyond the classic bulk-synchronous processing paradigm, or even specialized graph-based execution that relies on graph representations of ML programs.
• But it remains difficult to find a universal platform applicable to a wide range of ML programs at scale.

Page 5:

Problems with other platforms

• Hadoop: the simplicity of its MapReduce abstraction makes it difficult to exploit ML properties such as error tolerance, and its performance on many ML programs has been surpassed by alternatives.
• Spark: does not offer fine-grained scheduling of computation and communication for fast and correct execution of advanced ML algorithms.
• GraphLab and Pregel: efficiently partition graph-based models, but ML programs such as topic modeling and regression either do not admit obvious graph representations, or a graph representation may not be the most efficient choice.
• In summary, existing systems manifest a unique tradeoff among efficiency, correctness, programmability, and generality.

Page 6:

What they say…

Page 7:

Petuum in a nutshell…

• A distributed machine learning framework.
• Aims to provide a generic algorithmic and systems interface for large-scale machine learning.
• Takes care of the difficult systems "plumbing work" and algorithmic acceleration.
• Simplifies the distributed implementation of ML programs, allowing users to focus on perfecting models and on Big Data analytics.
• Runs efficiently at scale on research clusters and on cloud compute such as Amazon EC2 and Google GCE.

Page 8:

Their Philosophy…

• Most ML programs are defined by an explicit objective function over data (e.g., a likelihood).
• The goal is to attain optimality of this function in the space defined by the model parameters and other intermediate variables.
• Conventional platforms treat operational objectives such as fault tolerance and strong consistency as absolutely necessary.
• An ML program's true goal, however, is fast, efficient convergence to an optimal solution.
• Petuum is therefore built on an ML-centric, optimization-theoretic principle, as opposed to various operational objectives.

Page 9:

So how they built it…

• Formalized ML algorithms as iterative-convergent programs, including:
  • stochastic gradient descent
  • MCMC for determining point estimates in latent variable models
  • coordinate descent and variational methods for graphical models
  • proximal optimization for structured sparsity problems, and others
• Identified the properties shared across all of these algorithms.
• The key lies in recognizing a clear dichotomy between DATA and MODEL.
• This inspired a bimodal approach to parallelism: data-parallel and model-parallel distribution and execution of a big ML program over a cluster of machines.

Page 10:

Data parallel and Model parallel approach

• This approach exploits the unique statistical nature of ML algorithms, chiefly three properties:

• Error tolerance – iterative-convergent algorithms are robust against limited errors in intermediate calculations (see the sketch after this list).
• Dynamic structural dependency – the correlation strengths between model parameters change during execution, and respecting them is critical to efficient parallelization.
• Non-uniform convergence – the number of steps required for a parameter to converge can be highly skewed across parameters.
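
Error tolerance is the property that lets Petuum relax strict consistency and let workers compute on slightly stale parameter values. The following is a minimal, hypothetical Python sketch (not Petuum's API; the function `stale_sgd` and all its parameters are illustrative) that simulates the idea on a least-squares problem: gradients are computed from parameter snapshots up to `staleness` iterations old, and the iteration still converges.

```python
import numpy as np

def stale_sgd(X, y, staleness=3, steps=200, lr=0.1):
    """Gradient descent on least squares where each gradient is computed
    from a parameter snapshot up to `staleness` iterations old,
    illustrating error tolerance of iterative-convergent algorithms."""
    w = np.zeros(X.shape[1])
    history = [w.copy()]                      # past parameter snapshots
    for t in range(steps):
        # Read a (possibly stale) copy of the parameters, as a worker
        # under bounded-staleness consistency might.
        lag = min(np.random.randint(0, staleness + 1), len(history) - 1)
        w_stale = history[-1 - lag]
        grad = X.T @ (X @ w_stale - y) / len(y)
        w = w - lr * grad                     # apply the update to the fresh state
        history.append(w.copy())
    return w

# Tiny synthetic problem: the solution is still recovered despite stale reads.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = rng.normal(size=5)
y = X @ w_true
w_hat = stale_sgd(X, y)
print("residual:", np.linalg.norm(w_hat - w_true))
```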

Page 11:

Page 12:

Parallelization Strategies

Page 13:

Principled formulation of Data & Model Parallelism

• Iterative-convergent ML algorithm: given data D and a model with a fitness function L (such as a likelihood), a typical ML problem can be grounded as executing the following update equation iteratively, until the model state A (i.e., parameters and/or latent variables) reaches some stopping criterion:

A^{(t)} = F( A^{(t-1)}, Δ_L(A^{(t-1)}, D) )

where the superscript (t) denotes the iteration.
• The update function Δ_L() (which improves the loss L) performs computation on the data and the model state, and
• outputs intermediate results to be aggregated by F().
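
To make the template concrete, here is a minimal illustrative Python sketch (not Petuum's programming interface) that instantiates the update equation for gradient-descent least squares; the names `delta_L`, `F`, and `iterative_convergent` are chosen here for illustration and stand in for Δ_L(), F(), and the outer loop.

```python
import numpy as np

def delta_L(A, D, lr=0.1):
    """Update function Δ_L(): computes an improvement to the loss from the
    current model state A and the data D (here, a gradient step)."""
    X, y = D
    return -lr * X.T @ (X @ A - y) / len(y)

def F(A, update):
    """Aggregation function F(): folds the intermediate result into the model state."""
    return A + update

def iterative_convergent(D, dim, max_iter=500, tol=1e-8):
    """A^{(t)} = F(A^{(t-1)}, Δ_L(A^{(t-1)}, D)), run until a stopping criterion."""
    A = np.zeros(dim)
    for t in range(max_iter):
        A_next = F(A, delta_L(A, D))
        if np.linalg.norm(A_next - A) < tol:   # stopping criterion
            return A_next
        A = A_next
    return A

# Example usage on a small synthetic least-squares problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5)
A_star = iterative_convergent((X, y), dim=5)
```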

Page 14:

Data Parallelism: data is divided across machines

Page 15:

The data is partitioned and assigned to computational workers. Assumption: the update function Δ_L() can be applied to each data partition D_p independently, yielding the equation:

A^{(t)} = F( A^{(t-1)}, Σ_{p=1}^{P} Δ_L(A^{(t-1)}, D_p) )

The Δ_L() outputs are aggregated via summation. This additivity is crucial because CPUs can produce updates much faster than they can be transmitted, so updates can be combined locally before communication. Each parallel worker contributes "equally".
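
A minimal illustrative sketch of this data-parallel template (plain Python, not Petuum's API; `data_parallel_step`, `delta_L`, and `F` are names chosen for illustration): the data is split into P shards, Δ_L() is applied to each shard independently against the same model state, and the additive partial updates are summed before F() folds them in.

```python
import numpy as np

def delta_L(A, D, lr=0.1):
    """Per-worker update: a gradient step computed on one data shard D_p."""
    X, y = D
    return -lr * X.T @ (X @ A - y) / len(y)

def F(A, update):
    """Aggregation: fold the summed updates into the model state."""
    return A + update

def data_parallel_step(A, partitions):
    """One iteration of A^{(t)} = F(A^{(t-1)}, sum_p Δ_L(A^{(t-1)}, D_p)).
    Each worker applies Δ_L() to its own shard independently; the outputs
    are additive, so they can be combined before transmission."""
    partial_updates = [delta_L(A, D_p) for D_p in partitions]
    return F(A, sum(partial_updates))

# Example: split a least-squares problem across P = 4 workers.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5)
partitions = [(X[p::4], y[p::4]) for p in range(4)]
A = np.zeros(5)
for _ in range(100):
    A = data_parallel_step(A, partitions)
print("final loss:", np.linalg.norm(X @ A - y))
```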

Page 16:

Model Parallelism: the ML model is divided across machines

Page 17:

The model is partitioned and assigned to workers. Unlike data parallelism, each update function Δ() also takes a scheduling function S_p(), which restricts Δ() to operate on a subset of the model parameters A:

A^{(t)} = F( A^{(t-1)}, { Δ( A^{(t-1)}, S_p(A^{(t-1)}) ) }_{p=1}^{P} )

Unlike data parallelism, the model parameters are not independent of each other. Hence, the definition of model parallelism includes a global scheduling mechanism that selects carefully-chosen parameters for parallel updating.
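
A minimal illustrative sketch of the model-parallel template (plain Python, not Petuum's scheduler; `schedule`, `delta`, `F`, and `model_parallel_step` are names chosen for illustration): a scheduling function assigns each worker a disjoint block of parameter indices, each worker runs a coordinate-wise update restricted to its block, and F() writes the results back. A real scheduler would pick weakly-coupled parameters rather than the simple rotating blocks used here.

```python
import numpy as np

def schedule(A, t, num_workers):
    """Scheduling function S_p(): selects, for each worker p, a subset of
    model parameters to update this iteration (here: rotating index blocks)."""
    idx = np.arange(len(A))
    return np.array_split(np.roll(idx, t), num_workers)

def delta(A, indices, X, y):
    """Update function Δ(): coordinate-wise least-squares minimization,
    restricted to the scheduled parameter subset."""
    new_vals = {}
    for j in indices:
        residual = y - X @ A + X[:, j] * A[j]          # remove coordinate j
        new_vals[j] = X[:, j] @ residual / (X[:, j] @ X[:, j])
    return new_vals

def F(A, partial_updates):
    """Aggregation F(): write each worker's parameter updates back."""
    A = A.copy()
    for update in partial_updates:
        for j, v in update.items():
            A[j] = v
    return A

def model_parallel_step(A, t, X, y, num_workers=2):
    """A^{(t)} = F(A^{(t-1)}, {Δ(A^{(t-1)}, S_p(A^{(t-1)}))}_{p=1..P})."""
    blocks = schedule(A, t, num_workers)
    partial_updates = [delta(A, block, X, y) for block in blocks]
    return F(A, partial_updates)

# Example usage on a small least-squares problem.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 6))
y = X @ rng.normal(size=6)
A = np.zeros(6)
for t in range(50):
    A = model_parallel_step(A, t, X, y)
print("final loss:", np.linalg.norm(X @ A - y))
```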

Page 18:

Petuum System Design

Page 19:

Petuum Programming Interface

Page 20:

Petuum Program Structure

Page 21:

Performance

Petuum improves the performance of ML applications by making every update or iteration more effective, without compromising update speed. More effective updates mean a faster ML completion time.

Page 22:

Petuum topic model (LDA)

Settings: 4.5GB dataset (8.2m docs, 737m tokens, 141k vocab, 1000 topics), 50 machines (800 cores), Petuum v1.0 vs YahooLDA, program completion = reached -5.8e9 log-likelihood

Page 23:

Petuum sparse logistic regression

Settings: 29GB dataset (10m features, 50k samples), 8 machines (512 cores), Petuum v0.93 vs Shotgun Lasso, program completion = reached 0.5 loss function

Page 24:

Petuum multi-class logistic regression

Settings: 20GB dataset (253k samples, 21k features, 5.4b nonzeros, 1000 classes), 4 machines (256 cores), Petuum v1.0 vs Synchronous parameter server, program completion = reached 0.0168 loss function

Page 25:

Page 26:

Page 27:

Some speed highlights

• Logistic regression: learn a 10m-dimensional model from 30GB of sparse data in 20 minutes, on 8 machines with 16 cores each.

• LDA topic model: learn 1k topics on 8m documents (140k unique words) in 17 minutes, on 25 machines with 16 cores each.

• Matrix Factorization (collaborative filtering): train on a 480k-by-20k matrix with rank 40 in 2 minutes, on 25 machines with 16 cores each.

• Convolutional Neural Network built on Caffe: Train Alexnet (60m parameters) in under 24 hours, on 8 machines with a Tesla K20 GPU each.

• MedLDA supervised topic model: learn 1k topics on 1.1m documents (20 labels) in 85 minutes, on 20 machines with 12 cores each.

• Multiclass Logistic Regression: train on the MNIST dataset (19GB, 8m samples, 784 features) in 6 minutes, on 8 machines with 16 cores each.

Page 28:

What it doesn't do

• Petuum is primarily about allowing ML practitioners to implement and experiment with new data-parallel and model-parallel ML algorithms on small-to-medium clusters.
• It lacks features that are necessary for clusters with ≥ 1000 machines, such as automatic recovery from machine failure.
• Experiments focused on clusters with 10-100 machines.

Page 29:

Thoughts

• Highly efficient and fast for its target users.
• Good library with 10+ algorithms.
• It lacks features for clusters of ≥ 1000 machines, yet the experiments do not come anywhere near 500 machines; at most 50 machines are used.
• Petuum is specifically designed for algorithms such as optimization and sampling algorithms.
• So it is not a silver bullet for all Big Data problems.

Page 30:

Questions?

Page 31:

Thank you

Nitin Saroha

