Apache hama @ Samsung SW Academy

Date post: 11-Nov-2014
Upload: edward-j-yoon

Apache Hama
Bulk Synchronous Parallel Computing

Edward J. Yoon <[email protected]>

Who Am I

• Edward J. Yoon
  – @eddieyoon
• Founder of Apache Hama
• PMC member of Apache Bigtop
• Oracle employee

What’s Hama?

• Open Source
  – Under the Apache 2.0 License
• Written in Java
• Apache Top-Level Project

Characteristics

• A general BSP computing engine
  – M/R-like Input/Output Formatter
    • SequenceFile, Text, Accumulo, HBase, etc.
  – Job Manager
  – Checkpoint Recovery
• Streaming and Pipes
  – Python, C++, etc.
• Graph and Machine Learning Packages
  – K-means, Gradient Descent, Collaborative Filtering

Bulk Synchronous Parallel?

• Originally introduced by Valiant
• A sequence of supersteps
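
In the BSP model, each superstep is usually described as three phases: local computation, communication, and a barrier synchronization, with messages sent in one superstep becoming visible in the next. The following is a minimal sketch of that idea using plain Java threads. It does not use Hama; the class name `SuperstepSketch` and the method `run` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CyclicBarrier;

/** Toy BSP run (not Hama): each peer sends its id to all peers, syncs, then sums its inbox. */
public class SuperstepSketch {

    /** Runs two supersteps on numPeers threads and returns the sum each peer computed. */
    public static long[] run(int numPeers) {
        // One inbox per peer; messages sent in superstep 0 are read in superstep 1.
        List<ConcurrentLinkedQueue<Integer>> inboxes = new ArrayList<>();
        for (int i = 0; i < numPeers; i++) {
            inboxes.add(new ConcurrentLinkedQueue<>());
        }
        CyclicBarrier barrier = new CyclicBarrier(numPeers);
        long[] sums = new long[numPeers];
        Thread[] threads = new Thread[numPeers];

        for (int p = 0; p < numPeers; p++) {
            final int id = p;
            threads[p] = new Thread(() -> {
                try {
                    // Superstep 0: communication phase, send own id to every peer.
                    for (ConcurrentLinkedQueue<Integer> inbox : inboxes) {
                        inbox.add(id);
                    }
                    // Barrier: nobody proceeds until all peers have finished sending.
                    barrier.await();

                    // Superstep 1: local computation on the messages received.
                    long sum = 0;
                    for (int message : inboxes.get(id)) {
                        sum += message;
                    }
                    sums[id] = sum;
                    barrier.await();
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            threads[p].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return sums;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(4))); // prints [6, 6, 6, 6]
    }
}
```

Because every peer waits at the barrier before reading, no peer can observe a partially filled inbox, which is the property the superstep structure is meant to provide.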

Compare to M/R and MPI

• Supports a message-passing style of application development

• Provides a small, flexible, simple, and easy-to-use API

• Can perform better than MPI for communication-intensive applications

• Guarantees that deadlocks and collisions cannot occur in the communication mechanisms

So, fit for what?

• Processing Big Data with complicated relationships
  – e.g., graphs or networks

• Iterative or Recursive scientific applications

• Continuous Event Processing

Which One Is Big Data?

Could be applied to

• Analyze user actions and patterns
• Social target marketing
• Observe the evolution of social networks
• Detect anomalies rapidly in real time
• Business intelligence

Internals

• Pluggable RPC architecture for message transfer
  – e.g., Hadoop RPC, Avro RPC, etc.

• Message Collector, Bundler, and Compressor to reduce network overhead and contention
  – e.g., Snappy, Bzip2, etc.
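
The bundling and compression idea can be illustrated with a small sketch: instead of sending each message separately, all messages bound for one peer are packed into a single byte array and compressed before transfer. This is not Hama's implementation. The class `MessageBundleSketch` and its methods are hypothetical, and GZIP from the Java standard library stands in for the Snappy or Bzip2 codecs named above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Illustrative only: bundles messages for one peer and compresses them with GZIP. */
public class MessageBundleSketch {

    /** Packs all messages into one compressed byte array (one network transfer). */
    public static byte[] bundle(List<String> messages) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(bytes))) {
            out.writeInt(messages.size());      // header: number of messages
            for (String message : messages) {
                out.writeUTF(message);          // length-prefixed payload
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reverses bundle(): decompresses and returns the messages in their original order. */
    public static List<String> unbundle(byte[] bundle) {
        try (DataInputStream in = new DataInputStream(
                new GZIPInputStream(new ByteArrayInputStream(bundle)))) {
            int count = in.readInt();
            List<String> messages = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                messages.add(in.readUTF());
            }
            return messages;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> outgoing = Arrays.asList("rank=0.25", "rank=0.25", "rank=0.50");
        byte[] wire = bundle(outgoing);
        System.out.println("bundle size in bytes: " + wire.length);
        System.out.println("round trip ok: " + unbundle(wire).equals(outgoing));
    }
}
```

Bundling trades a little latency for fewer, larger transfers; whether compression pays off depends on how repetitive the messages are.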

BSP API

public abstract void bsp(BSPPeer<K1, V1, K2, V2, M> peer) throws IOException, SyncException;

BSP Examples

• Pi Calculation
• Sparse Matrix-Vector Multiplication
• K-means Clustering
• Gradient Descent
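
As an illustration of the first example, a Monte Carlo Pi estimate fits the BSP pattern naturally: each peer samples points independently, sends its local estimate to a master, and after the barrier the master averages what it received. The sketch below simulates the peers sequentially in plain Java so that it runs without a cluster. It is not the example shipped with Hama; `PiBspSketch`, `estimatePi`, and the seeding scheme are assumptions made for illustration.

```java
import java.util.Random;

/** Illustrative only: BSP-style Monte Carlo Pi estimate with peers simulated in a loop. */
public class PiBspSketch {

    /** Returns the average of the per-peer estimates of Pi. */
    public static double estimatePi(int numPeers, int samplesPerPeer, long seed) {
        // Superstep 0: every peer computes locally, then "sends" its estimate to the master.
        double[] masterInbox = new double[numPeers];
        for (int peer = 0; peer < numPeers; peer++) {
            Random rng = new Random(seed + peer);   // independent stream per peer (assumed scheme)
            int inside = 0;
            for (int i = 0; i < samplesPerPeer; i++) {
                double x = rng.nextDouble();
                double y = rng.nextDouble();
                if (x * x + y * y <= 1.0) {
                    inside++;                       // point fell inside the quarter circle
                }
            }
            masterInbox[peer] = 4.0 * inside / samplesPerPeer;
        }

        // Barrier: in a real BSP job, all peers would synchronize here.

        // Superstep 1: the master averages the estimates it received.
        double sum = 0.0;
        for (double estimate : masterInbox) {
            sum += estimate;
        }
        return sum / numPeers;
    }

    public static void main(String[] args) {
        System.out.println("Estimated Pi: " + estimatePi(4, 200_000, 42L));
    }
}
```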

Graph API

public void compute(Iterator<M> messages) throws IOException;

Graph Examples

• In-link Count
• Single Source Shortest Path
• PageRank
• Bipartite Matching
• Semi-Clustering
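
To make the vertex-centric style concrete, here is a sketch of Single Source Shortest Path written around a `compute` step that consumes an iterator of incoming messages, in the spirit of the Graph API signature shown earlier. It runs on a single machine and does not use Hama's graph classes; `SsspSketch`, `shortestPaths`, and the edge encoding are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Illustrative only: vertex-centric SSSP simulated superstep by superstep. */
public class SsspSketch {
    static final int INF = Integer.MAX_VALUE;

    /** Per-vertex step: returns the smallest distance among the current one and all messages. */
    static int compute(int current, Iterator<Integer> messages) {
        int best = current;
        while (messages.hasNext()) {
            best = Math.min(best, messages.next());
        }
        return best;
    }

    /** edges[v] lists {target, weight} pairs for vertex v; returns distances from source. */
    public static int[] shortestPaths(int[][][] edges, int source) {
        int n = edges.length;
        int[] dist = new int[n];
        Arrays.fill(dist, INF);
        List<List<Integer>> inbox = newInboxes(n);
        inbox.get(source).add(0);                   // seed message: source is at distance 0

        boolean active = true;
        while (active) {                            // one loop iteration = one superstep
            active = false;
            List<List<Integer>> next = newInboxes(n);
            for (int v = 0; v < n; v++) {
                int best = compute(dist[v], inbox.get(v).iterator());
                if (best < dist[v]) {
                    dist[v] = best;
                    for (int[] edge : edges[v]) {   // propagate the improved distance
                        next.get(edge[0]).add(best + edge[1]);
                        active = true;
                    }
                }
            }
            inbox = next;                           // messages become visible next superstep
        }
        return dist;
    }

    private static List<List<Integer>> newInboxes(int n) {
        List<List<Integer>> inboxes = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            inboxes.add(new ArrayList<>());
        }
        return inboxes;
    }

    public static void main(String[] args) {
        int[][][] edges = {
            {{1, 4}, {2, 1}},   // 0 -> 1 (weight 4), 0 -> 2 (weight 1)
            {{3, 5}},           // 1 -> 3 (weight 5)
            {{1, 2}},           // 2 -> 1 (weight 2)
            {}                  // 3 has no outgoing edges
        };
        System.out.println(Arrays.toString(shortestPaths(edges, 0))); // prints [0, 3, 1, 8]
    }
}
```

The run stops when a superstep produces no messages, which plays the role of every vertex voting to halt.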

Find Maximum Value
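
The original slide here was a figure, so what follows is only an assumed reading of it: the well-known vertex-centric exercise in which every vertex ends up holding the largest value in its connected component. In superstep 0 each vertex sends its value to its neighbors; afterwards a vertex adopts any larger value it receives and forwards it, and stays silent otherwise. `MaxValueSketch` and `propagateMax` are hypothetical names, and the code does not use Hama.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: maximum-value propagation, simulated superstep by superstep. */
public class MaxValueSketch {

    /** neighbors[v] lists the vertices adjacent to v; returns the final value of each vertex. */
    public static int[] propagateMax(int[] initial, int[][] neighbors) {
        int n = initial.length;
        int[] value = initial.clone();
        List<List<Integer>> inbox = newInboxes(n);
        boolean firstSuperstep = true;

        while (true) {
            List<List<Integer>> next = newInboxes(n);
            boolean anySent = false;
            for (int v = 0; v < n; v++) {
                boolean changed = false;
                for (int message : inbox.get(v)) {
                    if (message > value[v]) {
                        value[v] = message;         // adopt the larger value
                        changed = true;
                    }
                }
                // Send in the first superstep, or whenever the value just grew.
                if (firstSuperstep || changed) {
                    for (int u : neighbors[v]) {
                        next.get(u).add(value[v]);
                        anySent = true;
                    }
                }
            }
            if (!anySent) {
                break;                              // no messages in flight: all vertices halt
            }
            inbox = next;
            firstSuperstep = false;
        }
        return value;
    }

    private static List<List<Integer>> newInboxes(int n) {
        List<List<Integer>> inboxes = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            inboxes.add(new ArrayList<>());
        }
        return inboxes;
    }

    public static void main(String[] args) {
        int[] values = {3, 6, 2, 1};
        int[][] chain = {{1}, {0, 2}, {1, 3}, {2}}; // undirected chain 0 - 1 - 2 - 3
        System.out.println(Arrays.toString(propagateMax(values, chain))); // prints [6, 6, 6, 6]
    }
}
```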

SSSP Performance

• SSSP on a random graph with 1 billion edges is computed in 400 seconds on one Oracle BDA (Big Data Appliance)

