Page 1

Map-Reduce for Machine Learning on Multicore
C. Chu, S.K. Kim, Y. Lin, Y.Y. Yu, G. Bradski, A.Y. Ng, K. Olukotun (NIPS 2006)

Shimin Chen
Big Data Reading Group

Page 2

Motivations
Industry-wide shift to multicore
No good framework for parallelizing ML algorithms

Goal: develop a general and exact technique for parallel programming of a large class of ML algorithms for multicore processors

Page 3

Idea

Statistical Query Model → Summation Form → Map-Reduce

Page 4

Outline
Introduction
Statistical Query Model and Summation Form
Architecture (inspired by Map-Reduce)
Adopted ML Algorithms
Experiments
Conclusion

Page 5

Valiant Model [Valiant’84]

x is the input
y is a function of x that we want to learn
In the Valiant model, the learning algorithm uses randomly drawn examples <x, y> to learn the target function

Page 6

Statistical Query Model [Kearns’98]

A restriction of the Valiant model: the learning algorithm uses aggregates over the examples, not the individual examples

More precisely, the learning algorithm interacts with a statistical query oracle:
The learning algorithm asks about a function f(x,y)
The oracle returns (an estimate of) the expectation of f(x,y) over the examples, e.g. the probability that a predicate f(x,y) is true

Page 7

Summation Form

Aggregate over the data:
Divide the data set into pieces
Compute aggregates on each core
Combine all results at the end
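
A minimal sketch of this pattern in Python (hypothetical data; f(x) = x**2 stands in for whatever per-example quantity an algorithm needs):

```python
# Summation form: split the data, compute a partial aggregate per
# core, combine the partials at the end.
from multiprocessing import Pool

import numpy as np

def partial_sum(chunk):
    # Map phase: aggregate one piece of the data on one core.
    return np.sum(chunk ** 2)   # f(x) = x**2 is a stand-in

if __name__ == "__main__":
    data = np.arange(1_000_000, dtype=np.float64)
    pieces = np.array_split(data, 4)      # divide the data set into pieces
    with Pool(processes=4) as pool:
        partials = pool.map(partial_sum, pieces)
    total = sum(partials)                 # combine all results at the end
```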

Page 8

Example: Linear Regression using Least Squares
Model: y = θ^T x
Goal: find θ minimizing Σ_i (θ^T x_i - y_i)^2

Solution: Given m examples (x1, y1), (x2, y2), …, (xm, ym), write a matrix X with x1, …, xm as rows and a column vector Y = (y1, y2, …, ym)^T. Then the solution is θ* = (X^T X)^{-1} X^T Y, where X^T X = Σ_i x_i x_i^T and X^T Y = Σ_i x_i y_i are both in summation form.

Parallel computation:

Cut the data into m/num_processors pieces; each piece computes partial sums of X^T X and X^T Y
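
A sketch of this decomposition with NumPy (synthetic data; the chunked loop stands in for per-core mappers):

```python
# Least squares in summation form: partial sums of X^T X and X^T Y
# per piece, combined and solved at the end.
import numpy as np

rng = np.random.default_rng(0)
m, d = 1000, 5
X = rng.normal(size=(m, d))                # m examples as rows
Y = X @ np.arange(1.0, d + 1) + 0.1 * rng.normal(size=m)

pieces = np.array_split(np.arange(m), 4)   # m/num_processors pieces
A = sum(X[p].T @ X[p] for p in pieces)     # Σ_i x_i x_i^T
b = sum(X[p].T @ Y[p] for p in pieces)     # Σ_i x_i y_i
theta = np.linalg.solve(A, b)              # θ* = (X^T X)^{-1} X^T Y
```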

Page 9

Outline
Introduction
Statistical Query Model and Summation Form
Architecture (inspired by Map-Reduce)
Adopted ML Algorithms
Experiments
Conclusion

Page 10

Lighter Weight Map-Reduce for Multicore
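
The engine splits the input data across mapper threads running on the cores and hands their intermediate results to a reducer. A minimal single-machine sketch of that shape (not the authors' implementation; all names are illustrative):

```python
# A lighter-weight map-reduce for one machine: one map phase over
# data pieces, one reduce phase over the partial results.
from multiprocessing import Pool

import numpy as np

def run_mapreduce(map_fn, reduce_fn, data, n_workers=4):
    pieces = np.array_split(data, n_workers)   # engine splits the input
    with Pool(processes=n_workers) as pool:
        partials = pool.map(map_fn, pieces)    # mappers run per core
    return reduce_fn(partials)                 # reducer combines results

def mean_map(chunk):
    return chunk.sum(), len(chunk)

def mean_reduce(partials):
    return sum(s for s, _ in partials) / sum(n for _, n in partials)

if __name__ == "__main__":
    # Example: the mean of the data, computed in summation form.
    print(run_mapreduce(mean_map, mean_reduce, np.arange(10.0)))
```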

Page 11

Outline
Introduction
Statistical Query Model and Summation Form
Architecture (inspired by Map-Reduce)
Adopted ML Algorithms
Experiments
Conclusion

Page 12

Locally Weighted Linear Regression (LWLR)

Solve A θ = b, where A = Σ_i w_i x_i x_i^T and b = Σ_i w_i x_i y_i
Mappers: one set computes A, the other set computes b
Two reducers, one aggregating A and one aggregating b
Finally, the master solves θ = A^{-1} b

When all w_i == 1, this is least squares.
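
A sketch of the two summations (the weights w_i are assumed given, e.g. from a kernel around the query point; the chunked loop stands in for the two mapper sets):

```python
# LWLR in summation form: A = Σ_i w_i x_i x_i^T, b = Σ_i w_i x_i y_i.
import numpy as np

def lwlr_solve(X, Y, w, n_pieces=4):
    pieces = np.array_split(np.arange(len(Y)), n_pieces)
    # One set of mappers accumulates A, the other accumulates b.
    A = sum((w[p, None] * X[p]).T @ X[p] for p in pieces)
    b = sum((w[p] * Y[p]) @ X[p] for p in pieces)
    return np.linalg.solve(A, b)   # final solve on the master
```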

Page 13

Naïve Bayes (NB)
Goal: estimate P(x_j = k | y = 1) and P(x_j = k | y = 0)
Computation: count the occurrences of (x_j = k, y = 1) and (x_j = k, y = 0), count the occurrences of (y = 1) and (y = 0), then compute the divisions

Mappers: count over a subgroup of the training samples
Reducer: aggregate the intermediate counts and calculate the final result
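
A sketch of the counting decomposition (discrete features; names are illustrative):

```python
# Naive Bayes counts as map-reduce: mappers count, the reducer merges.
from collections import Counter

def nb_map(X_chunk, y_chunk):
    counts = Counter()
    for x, y in zip(X_chunk, y_chunk):
        counts[("y", y)] += 1          # occurrences of (y = ...)
        for j, k in enumerate(x):
            counts[(j, k, y)] += 1     # occurrences of (x_j = k, y)
    return counts

def nb_reduce(partial_counts):
    total = Counter()
    for c in partial_counts:
        total.update(c)
    # P(x_j = k | y) = total[(j, k, y)] / total[("y", y)]
    return total
```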

Page 14

Gaussian Discriminative Analysis (GDA)
Goal: classify x into classes of y, assuming each class-conditional distribution is a Gaussian with a different mean but the same covariance

Computation:
Mappers: compute partial statistics (class counts, feature sums, outer-product sums) for a subset of the training samples
Reducer: aggregate the intermediate results into the means and the shared covariance
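
A sketch of the per-class sufficient statistics for binary y (mappers emit class counts, feature sums, and outer-product sums; names are illustrative):

```python
# GDA sufficient statistics in summation form (two classes, shared covariance).
import numpy as np

def gda_map(X_chunk, y_chunk):
    # Per-class count, feature sum, and Σ x x^T for this chunk.
    return {c: (int((y_chunk == c).sum()),
                X_chunk[y_chunk == c].sum(axis=0),
                X_chunk[y_chunk == c].T @ X_chunk[y_chunk == c])
            for c in (0, 1)}

def gda_reduce(partials, d):
    n = {c: 0 for c in (0, 1)}
    s = {c: np.zeros(d) for c in (0, 1)}
    S = {c: np.zeros((d, d)) for c in (0, 1)}
    for p in partials:
        for c in (0, 1):
            n[c] += p[c][0]; s[c] += p[c][1]; S[c] += p[c][2]
    mu = {c: s[c] / n[c] for c in (0, 1)}
    m = n[0] + n[1]
    # Shared covariance: Σ_c (S_c - n_c μ_c μ_c^T) / m
    sigma = sum(S[c] - n[c] * np.outer(mu[c], mu[c]) for c in (0, 1)) / m
    return mu, sigma
```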

Page 15

K-means
Compute the Euclidean distance between sample vectors and centroids
Recalculate the centroids
Divide the computation into subgroups handled by map-reduce, as sketched below
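
A sketch of one iteration (mappers assign their samples to the nearest centroid and emit per-centroid partial sums; the reducer recomputes the centroids):

```python
# One k-means iteration as map-reduce.
import numpy as np

def kmeans_map(chunk, centroids):
    # Euclidean distance from each sample to each centroid.
    dists = np.linalg.norm(chunk[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    sums = np.zeros_like(centroids, dtype=float)
    counts = np.zeros(len(centroids), dtype=int)
    for j in range(len(centroids)):
        sums[j] = chunk[labels == j].sum(axis=0)
        counts[j] = int((labels == j).sum())
    return sums, counts

def kmeans_reduce(partials):
    sums = sum(s for s, _ in partials)
    counts = sum(c for _, c in partials)
    return sums / np.maximum(counts, 1)[:, None]   # recalculated centroids
```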

Page 16

Expectation Maximization (EM)
E-step: computes posterior probabilities or expected counts per training example
M-step: combines these values to update the parameters
Both steps can be parallelized using map-reduce
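
A sketch for a 1-D, two-component Gaussian mixture (mappers run the E-step on their chunk and emit partial sufficient statistics; the reducer runs the M-step; all names are illustrative):

```python
# One EM iteration for a 1-D, two-component Gaussian mixture.
import numpy as np

def gauss(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def em_map(chunk, pi, mu, var):
    # E-step on this chunk: responsibility r of component 1 per example.
    p1 = pi * gauss(chunk, mu[1], var[1])
    p0 = (1 - pi) * gauss(chunk, mu[0], var[0])
    r = p1 / (p0 + p1)
    # Partial sufficient statistics for both components.
    return np.array([(1 - r).sum(), ((1 - r) * chunk).sum(),
                     ((1 - r) * chunk**2).sum(),
                     r.sum(), (r * chunk).sum(), (r * chunk**2).sum()])

def em_reduce(partials, m):
    # M-step: combine the partial statistics and update the parameters.
    n0, sx0, sxx0, n1, sx1, sxx1 = sum(partials)
    mu = (sx0 / n0, sx1 / n1)
    var = (sxx0 / n0 - mu[0] ** 2, sxx1 / n1 - mu[1] ** 2)
    return n1 / m, mu, var   # updated pi, means, variances
```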

Page 17

Neural Network (NN)
Back-propagation on a 3-layer network: input layer, middle (hidden) layer, and 2 output nodes
Goal: compute the weights in the NN by back-propagation

Mapper: propagates its set of training data through the network and back-propagates the errors to calculate a partial gradient for the weights

Reducer: sums the partial gradients and does a batch gradient descent step to update the weights
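
A sketch of this decomposition (the per-example gradient function grad stands in for back-propagation through the 3-layer network; names are illustrative):

```python
# Batch gradient descent as map-reduce: mappers compute partial
# gradients over their chunks; the reducer sums them and updates.

def nn_map(X_chunk, Y_chunk, weights, grad):
    # Forward-propagate each example, back-propagate its error, and
    # accumulate the partial gradient for this chunk.
    return sum(grad(weights, x, y) for x, y in zip(X_chunk, Y_chunk))

def nn_reduce(partial_grads, weights, lr=0.01):
    g = sum(partial_grads)      # sum of the partial gradients
    return weights - lr * g     # one batch gradient descent step
```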

Page 18

Principal Components Analysis (PCA)
Compute the principal eigenvectors of the covariance matrix
Cov = (1/m) Σ_i x_i x_i^T - μ μ^T, where μ = (1/m) Σ_i x_i

Both sums are in summation form, so clearly we can compute them using map-reduce
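
A sketch of the covariance computation (mappers emit counts, feature sums, and Σ_i x_i x_i^T; the reducer forms the covariance and extracts the top eigenvectors):

```python
# PCA covariance in summation form, then the principal eigenvectors.
import numpy as np

def pca_map(chunk):
    return len(chunk), chunk.sum(axis=0), chunk.T @ chunk

def pca_reduce(partials, k=2):
    m = sum(n for n, _, _ in partials)
    mu = sum(s for _, s, _ in partials) / m
    S = sum(P for _, _, P in partials)
    cov = S / m - np.outer(mu, mu)      # (1/m) Σ_i x_i x_i^T - μ μ^T
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, -k:]              # top-k principal eigenvectors
```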

Page 19

Other Algorithms

Logistic Regression
Independent Component Analysis
Support Vector Machine

Page 20

Time Complexity

Page 21

Outline
Introduction
Statistical Query Model and Summation Form
Architecture (inspired by Map-Reduce)
Adopted ML Algorithms
Experiments
Conclusion

Page 22

Setup
Compare the map-reduce version and the sequential version
10 data sets
Machines:
Dual-processor Pentium-III 700MHz, 1GB RAM
16-way Sun Enterprise 6000
(These are SMP machines, not multicore)

Page 23

Dual-Processor Speedups

Page 24

2-16 Processor Speedups

More data in the paper

Page 25

Multicore Simulator Results

The paper has a paragraph on this
Basically, it says the results are better than on the multiprocessor machines
Possibly because of lower communication cost

Page 26

Conclusion

Parallelize summation forms
Use map-reduce on a single machine

