Pregel: A System for Large-Scale Graph Processing
Written by: Grzegorz Malewicz et al., SIGMOD 2010
Presented by: Abolfazl Asudeh
CSE 6339 – Spring 2013
The Problem
Very large graphs are a popular object of analysis, e.g. social networks, the web, and several other areas. Efficient processing of large graphs is challenging:
- poor locality of memory access
- very little work per vertex
- distribution over many machines worsens the locality issue, and machines can fail
There was no scalable, general-purpose system for implementing arbitrary graph algorithms over arbitrary graph representations in a large-scale distributed environment.
Want to process a large-scale graph? The options:
- Crafting a custom distributed infrastructure: needs a lot of effort and has to be repeated for every new algorithm.
- Relying on an existing distributed platform, e.g. MapReduce: must store the graph state at each stage, and there is too much communication between stages.
- Using a single-computer graph algorithm library: does not scale to graphs of this size.
- Using an existing parallel graph system: does not address problems like fault tolerance that are very important in large-scale graph processing.
How can we solve the problem?
- Thousands/millions of computers (the distributed system)
- Billions of vertices/edges (the graph)
- How do we assign vertices to machines?
The high-level organization of Pregel programs
A Pregel program is a sequence of iterations called supersteps. In each superstep, every vertex conceptually invokes a user-defined function in parallel. The function specifies the behavior at a single vertex V in a single superstep S: it can read messages sent in the previous superstep, send messages for the next superstep, and modify the state of V and of its outgoing edges.
[Diagram: Input → supersteps until all vertices vote to halt → Output]
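To make the superstep model concrete, here is a minimal single-machine sketch of the driver loop (an illustration only; the names VertexState and RunSupersteps are assumptions, and real Pregel distributes this loop across many workers):

#include <functional>
#include <map>
#include <queue>
#include <string>

// Conceptual sketch only: one VertexState per vertex, kept on one machine.
struct VertexState {
  bool active = true;      // in superstep 0, every vertex is active
  std::queue<int> inbox;   // messages delivered for the current superstep
};

using Graph = std::map<std::string, VertexState>;
using Outbox = std::map<std::string, std::queue<int>>;
using ComputeFn =
    std::function<void(const std::string& id, VertexState& v, int superstep,
                       Outbox& outbox)>;

void RunSupersteps(Graph& graph, const ComputeFn& compute) {
  for (int superstep = 0; ; ++superstep) {
    Outbox outbox;  // messages destined for superstep S+1
    bool any_active = false;
    for (auto& [id, v] : graph) {
      if (!v.inbox.empty()) v.active = true;  // a message reactivates a vertex
      if (!v.active) continue;                // halted vertices are skipped
      // The user-defined function reads v.inbox, updates the vertex value,
      // enqueues messages into outbox, and may vote to halt (active = false).
      compute(id, v, superstep, outbox);
      v.inbox = {};                           // messages are consumed
      any_active = any_active || v.active;
    }
    bool any_messages = !outbox.empty();
    for (auto& [id, q] : outbox) graph[id].inbox = std::move(q);
    if (!any_active && !any_messages) break;  // all voted to halt, no messages
  }
}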
What is the advantage? In the vertex-centric approach, users focus on a local action, processing each item independently. This ensures that Pregel programs are inherently free of the deadlocks and data races common in asynchronous systems.
MODEL OF COMPUTATION
A directed graph is given to Pregel. It runs the computation at each vertex until all vertices vote to halt, and then returns the results.
Vertex State Machine
Algorithm termination is based on every vertex voting to halt. In superstep 0, every vertex is in the active state. A vertex deactivates itself by voting to halt; it can be reactivated by receiving an (external) message.
The C++ API (Vertex Class)

template <typename VertexValue,
          typename EdgeValue,
          typename MessageValue>
class Vertex {
 public:
  virtual void Compute(MessageIterator* msgs) = 0;
  const string& vertex_id() const;
  int64 superstep() const;
  const VertexValue& GetValue();
  VertexValue* MutableValue();
  OutEdgeIterator GetOutEdgeIterator();
  void SendMessageTo(const string& dest_vertex,
                     const MessageValue& message);
  void VoteToHalt();
};
The C++ API – Message Passing
Messages are guaranteed to be delivered, but not necessarily in the order they were sent. Each message is delivered exactly once. A vertex can send a message to any vertex, not just its neighbors.
Maximum Value Example: each vertex starts with its own value; in every superstep it adopts the largest value it has received and propagates it, until all vertices hold the global maximum and have voted to halt.
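In the paper this example is a picture of vertices exchanging values; here is a sketch of the Compute function it implies, written against the Vertex API above (the class name MaxValueVertex is an assumption):

// Sketch (assumed, not given in the paper): each vertex adopts the largest
// value it has seen and forwards it; it halts when nothing changes.
class MaxValueVertex : public Vertex<int, void, int> {
 public:
  virtual void Compute(MessageIterator* msgs) {
    bool changed = (superstep() == 0);  // everyone announces its value first
    for (; !msgs->Done(); msgs->Next()) {
      if (msgs->Value() > GetValue()) {
        *MutableValue() = msgs->Value();
        changed = true;
      }
    }
    if (changed)
      SendMessageToAllNeighbors(GetValue());
    VoteToHalt();  // reactivated only if a new message arrives
  }
};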
The C++ API – Other Classes
Combiners (not active by default): the user can specify a way to combine several messages destined for the same vertex into one, reducing the number of messages sent (see the sketch after this list).
Aggregators: gather global information such as statistical values (sum, average, min, ...). They can also be used as a global coordinator, e.g. to force the vertices to run specific branches of their Compute functions during specific supersteps.
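For example, in the shortest-path application below only the minimum of the incoming distances matters, so messages headed for the same vertex can be collapsed into one. This is essentially the combiner the paper gives; the exact signature is reproduced from memory, so treat the details as approximate:

class MinIntCombiner : public Combiner<int> {
  virtual void Combine(MessageIterator* msgs) {
    // Keep only the smallest distance among the combined messages.
    int mindist = INF;
    for (; !msgs->Done(); msgs->Next())
      mindist = min(mindist, msgs->Value());
    Output("combined_source", mindist);
  }
};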
The C++ API – Topology Mutation
Some graph algorithms need to change the graph's topology; e.g. a clustering algorithm may need to replace a whole cluster with a single node. Mutations are applied in a fixed order: add a vertex before adding its edges; remove all edges before removing a vertex. User-defined handlers can be added to resolve conflicting requests.
Input / Output
Pregel provides readers/writers for common formats such as text files and relational databases. Users can write custom readers/writers for new input/output formats.
Implementation
Pregel was designed for the Google cluster architecture. Each cluster consists of thousands of commodity PCs organized into racks with high intra-rack bandwidth. Clusters are interconnected but distributed geographically. Vertices are assigned to machines based on their vertex ID, by default hash(ID) mod N, so any machine can cheaply determine which worker owns a given vertex, as sketched below.
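A minimal sketch of that default assignment (the function name PartitionFor is an assumption):

#include <functional>
#include <string>

// Default assignment from the paper: partition = hash(vertex ID) mod N.
int PartitionFor(const string& vertex_id, int num_partitions) {
  return static_cast<int>(std::hash<string>{}(vertex_id) % num_partitions);
}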
Implementation (execution steps)
1. User programs are copied onto the machines.
2. One machine becomes the master. The other machines find the master using a name service and register themselves with it. The master determines how many partitions the graph will have.
3. The master assigns one or more partitions to each worker (more than one per worker allows parallelism among partitions and better load balancing), along with a portion of the user input.
4. The workers run the Compute function for their active vertices and send messages asynchronously. There is one thread per partition in each worker. When the superstep is finished, the workers tell the master how many vertices will be active in the next superstep.
Fault Tolerance
At the end of each superstep, workers checkpoint V, E, and the messages, and the master checkpoints the aggregated values. Failure is detected by "ping" messages from the master to the workers. The master reassigns the failed worker's partitions to the available workers, and the state is recovered from the last checkpoint. In confined recovery, only the lost partitions have to be recomputed, because outgoing messages are also logged and the results of the other partitions' computations are therefore already known. This is not possible for randomized algorithms, whose recomputation would not reproduce the original messages.
Worker Implementation
Maintains one or more partitions of the graph. Keeps two message queues, for supersteps S and S+1. If a message's destination vertex is not on this machine, the message is buffered for sending, and when the buffer is full it is flushed. The user may define Combiners that are applied to remote messages before they are sent.
Master Implementation
Maintains a list of all workers currently known to be alive, including each worker's ID and address and which portion of the graph it has been assigned. Performs the synchronization between supersteps and coordinates all operations. Maintains statistics and runs an HTTP server so the user can monitor progress.
Aggregators Implementation
The information is passed to the master in a tree structure: workers send their partial values to aggregator machines, which aggregate them and send the combined values on to the master machine. This reduces the load on the master.
[Diagram: Worker, Worker → Aggregator → Master, forming an aggregation tree]
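A small illustration of that tree reduction (plain C++, not the Pregel API; a sum is used as the example aggregation):

#include <numeric>
#include <vector>

// Each inner vector holds the partial values of the workers attached to one
// aggregator machine; each aggregator reduces its workers' values, and the
// master reduces the aggregators' results.
int AggregateTree(const std::vector<std::vector<int>>& values_per_aggregator) {
  int global = 0;
  for (const auto& worker_values : values_per_aggregator) {
    int partial =
        std::accumulate(worker_values.begin(), worker_values.end(), 0);
    global += partial;  // the master combines the aggregators' partial sums
  }
  return global;
}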
Application – PageRank

class PageRankVertex
    : public Vertex<double, void, double> {
 public:
  virtual void Compute(MessageIterator* msgs) {
    if (superstep() >= 1) {
      // Sum the PageRank contributions received from in-neighbors.
      double sum = 0;
      for (; !msgs->Done(); msgs->Next())
        sum += msgs->Value();
      *MutableValue() = 0.15 / NumVertices() + 0.85 * sum;
    }
    if (superstep() < 30) {
      // Distribute this vertex's rank evenly over its out-edges.
      const int64 n = GetOutEdgeIterator().size();
      SendMessageToAllNeighbors(GetValue() / n);
    } else {
      VoteToHalt();  // stop after 30 supersteps
    }
  }
};
Application – Shortest Path

class ShortestPathVertex
    : public Vertex<int, int, int> {
  void Compute(MessageIterator* msgs) {
    // Start from 0 at the source, "infinity" everywhere else.
    int mindist = IsSource(vertex_id()) ? 0 : INF;
    for (; !msgs->Done(); msgs->Next())
      mindist = min(mindist, msgs->Value());
    if (mindist < GetValue()) {
      // Found a shorter path: record it and relax all out-edges.
      *MutableValue() = mindist;
      OutEdgeIterator iter = GetOutEdgeIterator();
      for (; !iter.Done(); iter.Next())
        SendMessageTo(iter.Target(), mindist + iter.GetValue());
    }
    VoteToHalt();  // reactivated when a shorter distance arrives
  }
};
Application – Bipartite Matching
Problem: find a maximal set of edges in a bipartite graph that share no endpoints. The algorithm runs in cycles of four phases (supersteps); a sketch of the corresponding Compute function follows this list.
1. Each left vertex not yet matched sends a message to each of its neighbors to request a match, and then unconditionally votes to halt.
2. Each right vertex not yet matched randomly chooses one of the messages it receives, sends a message granting that request, and votes to halt.
3. Each left vertex not yet matched chooses one of the grants it receives, records the match, and sends an acceptance message.
4. The right vertex receives the acceptance, records the match, and votes to halt.
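A hypothetical sketch of how the four phases might map onto Compute (not given in the paper or slides): the vertex value holds the matched partner's ID, or "" while unmatched; IsLeft() is an assumed helper telling which side of the bipartition a vertex is on; denial messages and the random choice are simplified away.

class MatchingVertex : public Vertex<string, void, string> {
 public:
  virtual void Compute(MessageIterator* msgs) {
    if (!GetValue().empty()) { VoteToHalt(); return; }  // already matched
    switch (superstep() % 4) {
      case 0:  // phase 1: unmatched left vertices request a match
        if (IsLeft(vertex_id())) {
          OutEdgeIterator iter = GetOutEdgeIterator();
          for (; !iter.Done(); iter.Next())
            SendMessageTo(iter.Target(), vertex_id());
        }
        break;
      case 1:  // phase 2: unmatched right vertices grant one request
        if (!IsLeft(vertex_id()) && !msgs->Done())
          SendMessageTo(msgs->Value(), vertex_id());  // paper: chosen randomly
        break;
      case 2:  // phase 3: left vertices accept one grant and record it
        if (IsLeft(vertex_id()) && !msgs->Done()) {
          *MutableValue() = msgs->Value();
          SendMessageTo(msgs->Value(), vertex_id());  // acceptance message
        }
        break;
      case 3:  // phase 4: right vertices record the acceptance
        if (!msgs->Done()) *MutableValue() = msgs->Value();
        break;
    }
    VoteToHalt();  // every phase ends with a vote to halt
  }
};

In the full algorithm, the denial messages of phase 2 (omitted here) are what reactivate unsuccessful left vertices so that the next cycle of requests can begin.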
Experiments
300 multicore PCs were used. Only the running time was measured (checkpointing was disabled). Two experiments: measure scalability with the number of workers, and measure scalability with the number of vertices.
Shortest-path runtimes for a binary tree with a billion vertices (and thus a billion minus one edges) as the number of Pregel workers varies from 50 to 800.
Shortest-path runtimes for binary trees varying in size from one billion to 50 billion vertices, now using a fixed number of 800 worker tasks scheduled on 300 multicore machines.
Shortest-path runtimes for random graphs that use a log-normal distribution of out-degrees.
Thank you