Data platform at Samsung (Big Learning)

Date post: 26-Jan-2015
Upload: zhuanzhuanding
Transcript
Page 1: Data platform at Samsung (Big Learning)

SRA-SV | Cloud Research Lab

Guangdeng Liao, Zhan Zhang

Samsung Cloud Research Lab

Data Platform at Samsung

Page 2: Data platform at Samsung (Big Learning)

Our Mission: provide scalable, reliable, and secure storage and computation for Samsung R&D

Samsung Data Platform

Resources:
• Hundreds of machines
• Petabytes of storage
• Still growing...

Page 3: Data platform at Samsung (Big Learning)

What we have in our platform

Offline
• Distributed MR processing
• Data warehousing with Hive/Pig
• In-house web-based ETL portal
• Many more...

Online
• K-V store: HBase
• In-house blob store
• Storm (online processing)
• Many more...

Also in the platform: Apache Mahout, ElasticSearch

Dev. & management tools
• In-house unified web portal
• In-house Single Sign-On
• Visualization
• Many more...

Using the platform, we have already significantly improved the ETL process, data management, and data processing for other teams!

Page 4: Data platform at Samsung (Big Learning)

So, are we done?

No. Many more complex challenges remain.

Page 5: Data platform at Samsung (Big Learning)

Challenge #1: How can we build scalable and efficient machine learning over Big Data?

Page 6: Data platform at Samsung (Big Learning)

MR-based Mahout is good, but...

It is not good at expressing data dependencies and iterative algorithms such as PageRank

Map: distribute rank to link targets

Reduce: collect ranks from multiple sources

Iterate

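$$PR(x) = \frac{1-d}{N} + d \sum_{i=1}^{n} \frac{PR(t_i)}{C(t_i)}$$

where $t_1, \ldots, t_n$ are the pages linking to $x$, $C(t_i)$ is the number of outgoing links of $t_i$, $N$ is the total number of pages, and $d$ is the damping factor.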

• One MapReduce job per iteration, so every iteration pays a job-startup penalty and an I/O penalty

Unfortunately, many machine learning and data mining (MLDM) algorithms are iterative
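To make the per-iteration cost concrete, here is a minimal single-process Python sketch of PageRank written as map and reduce functions, following the Map/Reduce/Iterate steps and the update rule on this slide. It is not Mahout's or Hadoop's API: the toy graph, the damping value 0.85, and all function names are hypothetical, and the dictionary-based shuffle only stands in for Hadoop's shuffle. The comment in the driver loop marks where a real Hadoop run would launch a new job and reread its state from HDFS.

```python
from collections import defaultdict

# Hypothetical toy link graph: page -> pages it links to (no dangling pages).
GRAPH = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
DAMPING = 0.85  # assumed value of the damping factor d


def map_phase(page, rank, links):
    """Map: distribute this page's rank evenly over its link targets."""
    yield page, ("links", links)  # re-emit the adjacency list for the next job
    for target in links:
        yield target, ("rank", rank / len(links))


def reduce_phase(page, values, n):
    """Reduce: collect rank contributions arriving from multiple sources."""
    links, incoming = [], 0.0
    for kind, value in values:
        if kind == "links":
            links = value
        else:
            incoming += value
    return (1 - DAMPING) / n + DAMPING * incoming, links


def mapreduce_iteration(state):
    """One PageRank iteration expressed as one map -> shuffle -> reduce job."""
    n = len(state)
    shuffled = defaultdict(list)  # stands in for Hadoop's shuffle/sort
    for page, (rank, links) in state.items():
        for key, value in map_phase(page, rank, links):
            shuffled[key].append(value)
    return {page: reduce_phase(page, values, n) for page, values in shuffled.items()}


def pagerank(graph, iterations=30):
    n = len(graph)
    state = {page: (1.0 / n, links) for page, links in graph.items()}
    for _ in range(iterations):
        # On Hadoop each pass is a separate job: job startup cost, plus the whole
        # state is written to and reread from HDFS between iterations.
        state = mapreduce_iteration(state)
    return {page: rank for page, (rank, _) in state.items()}


ranks = pagerank(GRAPH)
print({page: round(rank, 4) for page, rank in sorted(ranks.items())})
```

Page "d" has no incoming links, so it ends up with exactly the teleport share (1 - d)/N.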

Page 7: Data platform at Samsung (Big Learning)

Graphs naturally represent data dependencies

Page 8: Data platform at Samsung (Big Learning)

Graph-based Processing: Think like a Vertex

[Figure: scheduling of vertex computations on processors (p) over an in-memory data graph distributed across a cluster]

• Communication
– Message-based
– Shared-memory-based
• Vertex abstraction
– Think like a vertex
– In-memory processing
• Execution engine
– Bulk synchronous parallel
– Asynchronous parallel
• Popular frameworks
– Giraph
– GraphLab
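A minimal sketch of the vertex-centric, bulk-synchronous model, assuming PageRank as the workload: each vertex sees only its own value and its incoming messages, and messages sent in one superstep are delivered in the next. This is plain single-process Python that imitates the programming model. It does not reproduce Giraph's or GraphLab's actual APIs, and the class and function names are hypothetical. Compared with one MR job per iteration, the graph stays in memory across supersteps, so there is no per-iteration job startup or HDFS round trip.

```python
# Hypothetical toy graph: vertex id -> out-neighbours.
GRAPH = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
DAMPING = 0.85  # assumed damping factor


class PageRankVertex:
    def __init__(self, vid, out_edges, n):
        self.id = vid
        self.out_edges = out_edges
        self.value = 1.0 / n

    def compute(self, superstep, messages, outbox, n):
        """User code: sees only its own state and the messages sent to it."""
        if superstep > 0:
            self.value = (1 - DAMPING) / n + DAMPING * sum(messages)
        for target in self.out_edges:
            outbox[target].append(self.value / len(self.out_edges))


def run_bsp(graph, supersteps=30):
    n = len(graph)
    vertices = [PageRankVertex(v, out, n) for v, out in graph.items()]
    inbox = {v: [] for v in graph}
    for step in range(supersteps):
        outbox = {v: [] for v in graph}
        # Every vertex runs compute() on messages from the previous superstep;
        # a real framework spreads these calls over workers in parallel.
        for vertex in vertices:
            vertex.compute(step, inbox[vertex.id], outbox, n)
        # Barrier: messages become visible only in the next superstep.
        inbox = outbox
    return {vertex.id: vertex.value for vertex in vertices}


print(run_bsp(GRAPH))
```

An asynchronous engine would instead let a vertex read its neighbours' latest values without waiting for a global barrier; that variant is not shown here.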

Page 9: Data platform at Samsung (Big Learning)

Graph-based Machine Learning

We used Apache Giraph 1.0 and developed a machine learning library on top of it:

Recommendation
• Alternating Least Squares (ALS)
• Weighted ALS
• SGD (matrix factorization)
• Bias SGD

Graphical model
• Belief Propagation

Clustering
• KMeans, KMeans++
• Fuzzy clustering

On our cluster, we see an order-of-magnitude speedup compared with the MR-based approach
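As an illustration of how one of these algorithms maps onto vertices, here is a sketch of ALS on a bipartite user-item rating graph: each user vertex solves a small regularized least-squares problem, (Σ v vᵀ + λI) u = Σ r v, using the factor vectors of its item neighbours, and then every item vertex does the same with its user neighbours. The ratings, rank k = 2, λ = 0.1, and all names are hypothetical; this is not the library described on the slide, and the two alternating loops merely stand in for Giraph supersteps.

```python
import random

# Hypothetical (user, item, rating) edges of a small bipartite rating graph.
RATINGS = [
    ("u1", "i1", 5.0), ("u1", "i2", 3.0), ("u1", "i3", 1.0),
    ("u2", "i1", 4.0), ("u2", "i3", 1.0),
    ("u3", "i2", 2.0), ("u3", "i3", 5.0),
]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def solve(a, b):
    """Solve the small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def als_update(neighbour_factors, ratings, k, lam):
    """Vertex update: solve (sum v v^T + lam*I) u = sum r*v over this vertex's edges."""
    a = [[lam if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for v, r in zip(neighbour_factors, ratings):
        for i in range(k):
            b[i] += r * v[i]
            for j in range(k):
                a[i][j] += v[i] * v[j]
    return solve(a, b)


def objective(U, V, lam):
    """Squared error on observed edges plus L2 regularization of all factors."""
    err = sum((r - dot(U[u], V[i])) ** 2 for u, i, r in RATINGS)
    reg = sum(dot(x, x) for x in list(U.values()) + list(V.values()))
    return err + lam * reg


def train_als(k=2, lam=0.1, sweeps=10, seed=0):
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in RATINGS})
    items = sorted({i for _, i, _ in RATINGS})
    U = {u: [rng.random() for _ in range(k)] for u in users}
    V = {i: [rng.random() for _ in range(k)] for i in items}
    history = [objective(U, V, lam)]
    for _ in range(sweeps):
        # "Superstep" A: each user vertex recomputes its factor from item neighbours.
        for u in users:
            edges = [(i, r) for uu, i, r in RATINGS if uu == u]
            U[u] = als_update([V[i] for i, _ in edges], [r for _, r in edges], k, lam)
        # "Superstep" B: each item vertex does the same using its user neighbours.
        for i in items:
            edges = [(u, r) for u, ii, r in RATINGS if ii == i]
            V[i] = als_update([U[u] for u, _ in edges], [r for _, r in edges], k, lam)
        history.append(objective(U, V, lam))
    return U, V, history


U, V, history = train_als()
print([round(h, 4) for h in history])
```

Because each vertex update exactly minimizes the regularized objective with the other side held fixed, the values in `history` should never increase.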

Page 10: Data platform at Samsung (Big Learning)

Challenge #2: How can we make Big Model + Big Data workloads, such as deep learning, scalable and efficient?

Page 11: Data platform at Samsung (Big Learning)

One example: deep learning¹

Many more examples (models with millions to billions of parameters) in speech recognition, image processing, and NLP

¹ ImageNet classification with deep convolutional neural networks, NIPS 2012

Page 12: Data platform at Samsung (Big Learning)

Model-Parallel Framework

• User-defined model
• Auto-generation of the model topology
• Auto-partitioning of the topology over the cluster (partitions c1, c2, c3 in the figure)
• Auto-deployment of the topology (in memory)
• Neuron-like programming
• Message-based communication
• Message-driven computation

Goal: parallelize a big machine learning model over a cluster
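The programming model above (neuron-like units that react to messages, with the topology partitioned across containers) can be sketched in a few lines of single-process Python. Everything here is an assumption for illustration: the tiny 2-3-1 network, the tanh activation, round-robin partitioning over c1 to c3, and all names. The `remote_messages` counter only shows which messages would cross a partition boundary in a real deployment.

```python
import math
from collections import deque

# Hypothetical 2-3-1 network as (source, target, weight) edges.
EDGES = [
    ("x1", "h1", 0.5), ("x1", "h2", -0.3), ("x1", "h3", 0.8),
    ("x2", "h1", 0.2), ("x2", "h2", 0.7), ("x2", "h3", -0.6),
    ("h1", "y", 1.0), ("h2", "y", -1.5), ("h3", "y", 0.4),
]


class Neuron:
    """Neuron-like unit: accumulates weighted input messages, fires when complete."""

    def __init__(self, fan_in):
        self.fan_in = fan_in
        self.received = []
        self.output = None

    def on_message(self, value, weight):
        self.received.append(value * weight)
        if len(self.received) == self.fan_in:
            self.output = math.tanh(sum(self.received))
            return True  # fired: its own outgoing messages can now be sent
        return False


def partition(names, parts):
    """Assumed auto-partitioning: round-robin over sorted neuron names."""
    return {name: parts[i % len(parts)] for i, name in enumerate(sorted(names))}


def run(inputs, edges, parts=("c1", "c2", "c3")):
    names = {s for s, _, _ in edges} | {t for _, t, _ in edges}
    out_edges = {n: [(t, w) for s, t, w in edges if s == n] for n in names}
    neurons = {n: Neuron(sum(1 for _, t, _ in edges if t == n)) for n in names}
    placement = partition(names, parts)
    queue, remote_messages = deque(), 0
    for name, value in inputs.items():
        neurons[name].output = value
        queue.append(name)
    while queue:  # message-driven: only neurons that just fired send messages
        src = queue.popleft()
        for dst, weight in out_edges[src]:
            if placement[src] != placement[dst]:
                remote_messages += 1  # would cross the network between containers
            if neurons[dst].on_message(neurons[src].output, weight):
                queue.append(dst)
    return {n: neurons[n].output for n in names}, remote_messages


outputs, remote = run({"x1": 1.0, "x2": -2.0}, EDGES)
print(outputs["y"], remote)
```

A smarter partitioner would try to keep densely connected neurons in the same container to reduce the remote-message count.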

Page 13: Data platform at Samsung (Big Learning)

Architecture over YARN

[Figure: YARN Node Managers host an Application Master and several Containers; a Controller partitions the topology and deploys it into the containers]

• Data communication: node-level and group-level
• Control communication based on Thrift
• Data communication based on Netty

Page 14: Data platform at Samsung (Big Learning)

Execution Engine

• Execution engine (deep neural net)
– Training proceeds layer by layer, controlled by the execution engine
– Progress reporting
– Process control: the end user can control the training process, and even restart it from a certain point
– System snapshots for fault tolerance

[Figure: a deep neural net with layers Input, RBM, RBM, and Softmax, fully connected]

• Generic execution engine
– Abstracts the common design pattern from our experience developing deep neural net algorithms
– Generalized to support various other algorithms
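The execution-engine responsibilities listed above (layer-by-layer control, progress reporting, snapshots, restart from a chosen point) can be sketched as a small driver class. This is a hypothetical illustration, not the engine described on the slide: `train_layer` is an assumed user callback, the layer names mirror the figure, and snapshots live in an in-memory dict, whereas a fault-tolerant system would persist them to durable storage.

```python
import copy


class ExecutionEngine:
    """Trains a layered model one layer at a time, reports progress, and
    snapshots the model after each layer so training can restart from there."""

    def __init__(self, layers, train_layer, on_progress=print):
        self.layers = layers            # e.g. ["rbm1", "rbm2", "softmax"]
        self.train_layer = train_layer  # user callback: (layer_name, model) -> model
        self.on_progress = on_progress
        self.snapshots = {}             # layer index -> model state after that layer

    def run(self, model, start_from=0):
        if start_from > 0:
            # Restart: ignore `model` and resume from the snapshot of the previous layer.
            model = copy.deepcopy(self.snapshots[start_from - 1])
        for idx in range(start_from, len(self.layers)):
            name = self.layers[idx]
            model = self.train_layer(name, model)
            self.snapshots[idx] = copy.deepcopy(model)  # checkpoint for fault tolerance
            self.on_progress(f"layer {idx + 1}/{len(self.layers)} ({name}) done")
        return model


trained_layers = []


def fake_train_layer(name, model):
    """Hypothetical stand-in for real training of one layer (e.g. an RBM)."""
    trained_layers.append(name)
    return {**model, name: "trained"}


engine = ExecutionEngine(["rbm1", "rbm2", "softmax"], fake_train_layer)
model = engine.run({})
trained_layers.clear()
model = engine.run({}, start_from=2)  # retrain only the top layer
print(trained_layers)
```

Keeping the engine ignorant of what a "layer" is, and delegating that to the callback, is one way to get the generic engine the slide mentions.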

Page 15: Data platform at Samsung (Big Learning)

Model parallelism alone is still not scalable enough over Big Data

Page 16: Data platform at Samsung (Big Learning)

Deep Learning Platform: Hybrid of Data-parallelism and Model-parallelism

[Figure: data chunks feed many model instances, each of which is itself model-parallel; Parameter Servers 1 to n coordinate the parameters across instances]

• Data parallelism: lots of model instances
• Parameter servers help the models learn from each other
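The data-parallel half of the hybrid can be sketched with one in-process parameter server and several model replicas, each working on its own data chunk: a replica pulls the current parameters, computes a gradient on its chunk, and pushes it back. Everything here is a simplifying assumption: a linear model stands in for a deep network, the replicas run one after another rather than on separate machines, and the model-parallel split inside each replica is not shown. `ParameterServer`, `gradient`, and the data are hypothetical.

```python
class ParameterServer:
    """Holds the shared parameters; model replicas pull them and push gradients."""

    def __init__(self, params, lr):
        self.params = dict(params)
        self.lr = lr

    def pull(self):
        return dict(self.params)

    def push(self, grads):
        for name, g in grads.items():
            self.params[name] -= self.lr * g


def gradient(params, chunk):
    """Mean-squared-error gradient of y ~ w*x + b over one data chunk."""
    gw = gb = 0.0
    for x, y in chunk:
        err = params["w"] * x + params["b"] - y
        gw += 2 * err * x
        gb += 2 * err
    return {"w": gw / len(chunk), "b": gb / len(chunk)}


# Hypothetical noise-free data from y = 2x + 1, split into 4 chunks, one per replica.
DATA = [(i / 10, 2 * (i / 10) + 1) for i in range(40)]
CHUNKS = [DATA[r::4] for r in range(4)]

server = ParameterServer({"w": 0.0, "b": 0.0}, lr=0.05)
for _ in range(500):
    for chunk in CHUNKS:          # each chunk is handled by its own model replica
        params = server.pull()    # fetch the latest shared parameters
        server.push(gradient(params, chunk))
print(server.params)
```

Because every replica pushes into the same parameters, what one replica learns from its chunk is immediately visible to the others on their next pull, which is the sense in which the parameter servers help the models learn from each other.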

Page 17: Data platform at Samsung (Big Learning)

Distributed Parameter Servers

[Figure: clients issue Pull/Push/Sync requests through a Netty communication layer to Servers 1, 2, and 3, each with an in-memory cache/storage, backed by HBase/HDFS]

We currently support asynchronous parameter pulls and pushes; a synchronized version is also supported.
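A sketch of distributed parameter servers with asynchronous and synchronized pull/push follows. It assumes that keys are sharded across servers by a stable hash (crc32 here, because Python's built-in `hash` of strings changes between processes) and that synchronous mode means a barrier between pulls and pushes. Threads stand in for remote workers, and the Netty layer and HBase/HDFS backing are not modeled. All class and function names, the target values, and the toy update rule are hypothetical.

```python
import threading
import zlib


class ServerShard:
    """One parameter server: an in-memory key/value store guarded by a lock."""

    def __init__(self):
        self.store = {}
        self.lock = threading.Lock()

    def pull(self, keys):
        with self.lock:
            return {k: self.store.get(k, 0.0) for k in keys}

    def push(self, deltas):
        with self.lock:
            for k, d in deltas.items():
                self.store[k] = self.store.get(k, 0.0) + d


class ParameterClient:
    """Routes every key to one shard; crc32 keeps the mapping stable across processes."""

    def __init__(self, shards):
        self.shards = shards

    def _shard_of(self, key):
        return zlib.crc32(key.encode()) % len(self.shards)

    def pull(self, keys):
        by_shard = {}
        for k in keys:
            by_shard.setdefault(self._shard_of(k), []).append(k)
        params = {}
        for idx, ks in by_shard.items():
            params.update(self.shards[idx].pull(ks))
        return params

    def push(self, deltas):
        by_shard = {}
        for k, d in deltas.items():
            by_shard.setdefault(self._shard_of(k), {})[k] = d
        for idx, ds in by_shard.items():
            self.shards[idx].push(ds)


# Hypothetical target values; a real worker would push gradients from its data.
TARGETS = {"w0": 1.0, "w1": -2.0, "w2": 0.5, "b": 3.0}
LR = 0.1


def worker(client, steps, barrier=None):
    keys = list(TARGETS)
    for _ in range(steps):
        params = client.pull(keys)   # in async mode this may already be stale
        if barrier is not None:
            barrier.wait()           # sync: every worker has pulled the same version
        client.push({k: LR * (TARGETS[k] - params[k]) for k in keys})
        if barrier is not None:
            barrier.wait()           # sync: all pushes land before the next pull


def run_workers(num_workers=4, num_shards=3, steps=200, synchronous=False):
    client = ParameterClient([ServerShard() for _ in range(num_shards)])
    barrier = threading.Barrier(num_workers) if synchronous else None
    threads = [threading.Thread(target=worker, args=(client, steps, barrier))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return client.pull(list(TARGETS))


print(run_workers(synchronous=True))
print(run_workers(synchronous=False))
```

The asynchronous mode avoids waiting on the slowest worker at the cost of occasionally applying updates computed from stale parameters, while the barrier-based mode gives every worker the same parameter version in each round.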

Page 18: Data platform at Samsung (Big Learning)

Deep Learning Algorithms

We target three major application fields: speech recognition, image processing, and NLP

What we have developed and our roadmap:
• Feed-forward neural network
• Restricted Boltzmann machine
• Deep belief network
• Sparse auto-encoder
• Convolutional neural network
• Recurrent neural network

Page 19: Data platform at Samsung (Big Learning)

Summary

• We are providing our Hadoop-based data platform
– Hundreds of machines, petabytes of storage
– Hadoop ecosystem (MapReduce, HBase, YARN, HDFS, ZooKeeper, Oozie, Lipstick, Mahout, etc.)
– In-house ETL pipeline
– In-house unified web portal with SSO

• We are working hard on big learning to make our platform intelligent
– Large-scale graph-based machine learning
– Large-scale deep learning
– And much more in progress

Page 20: Data platform at Samsung (Big Learning)

Q&A

