MapReduce Programming Model To Solve Graph Problems
Presented By: Nishant Gandhi, M.Tech. - CSE 1st Year, 1311CS05
Guided By: Dr. Rajiv Misra
Seminar Overview
• Introduction to MapReduce
• MapReduce Programming Model
  – Word Count problem
• Graph Problems & MapReduce
  – Breadth-First Search
  – Augmenting Edges with Degree
  – Enumerating Triangles from Graph
Introduction to MapReduce
• History of Computing
  – Moore’s Law
    • Has not held for the last few years
    • Memory is still the bottleneck for high-GHz processors
  – Distributed Problems
    • Indexing the Web, Simulating an Internet-Sized Network, Speeding Up Content Delivery, Rendering Multiple Frames
  – Parallel Computing (1975-1985)
    • Synchronization Problems
    • Very Costly Supercomputers
  – Distributed Computing (1995-Today)
    • Cost-Effective Solution
    • Uses Commodity Hardware
    • Google has no Supercomputer
Introduction to MapReduce
• History of MapReduce at Google
  – Problem at Google
    • Computing Large Amounts of Data on Distributed Systems
    • Parallelize Computing, Distribute Data, Handle Failure
  – One Solution
    • New abstraction that allows simple computation & hides all the other mess
    • Automatic Parallelization, Distribution, Fault Handling
    • MapReduce Paper 2004
MapReduce Programming Model
• Motivation
  – Automatic Parallelization & Distribution
  – Fault Tolerance
  – Provides Status & Monitoring Tools
  – Clean Abstraction for the Programmer
MapReduce Programming Model
• Programming Model
  – Borrows from Functional Programming
  – User implements an interface of two functions: Map & Reduce
    • map (in_key, in_value) --> (out_key, intermediate_value) list
    • reduce (out_key, intermediate_value list) --> out_value list
MapReduce Programming Model
map: (K1,V1) → list (K2,V2)
reduce: (K2,list(V2)) → list (K3,V3)
1. Map function is applied to every input key-value pair
2. Map function generates intermediate key-value pairs
3. Intermediate key-values are sorted and grouped by key
4. Reduce is applied to the sorted and grouped intermediate key-values
5. Reduce emits result key-values
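The five steps above can be sketched as a tiny single-process driver. This is only an illustration of the data flow, not how a real cluster framework is implemented; the names `run_mapreduce`, `parity_map`, and `sum_reduce` are hypothetical and do not come from the slides.

```python
from collections import defaultdict

def run_mapreduce(inputs, map_fn, reduce_fn):
    """Simulate one MapReduce pass in memory (illustrative only)."""
    # Steps 1-2: apply map to every input pair, collecting intermediate pairs.
    intermediate = []
    for k1, v1 in inputs:
        intermediate.extend(map_fn(k1, v1))
    # Step 3: sort by intermediate key and group the values per key.
    groups = defaultdict(list)
    for k2, v2 in sorted(intermediate, key=lambda kv: kv[0]):
        groups[k2].append(v2)
    # Steps 4-5: apply reduce to each group and collect the emitted results.
    output = []
    for k2, values in groups.items():
        output.extend(reduce_fn(k2, values))
    return output

def parity_map(key, numbers):
    # (K1, V1) -> list of (K2, V2): tag each number as "even" or "odd".
    for n in numbers:
        yield ("even" if n % 2 == 0 else "odd"), n

def sum_reduce(key, values):
    # (K2, list(V2)) -> list of (K3, V3): one sum per key.
    yield key, sum(values)

result = run_mapreduce([(0, [1, 2, 3]), (1, [4, 5])], parity_map, sum_reduce)
print(result)
```

The user only writes the two small functions; the driver stands in for everything the framework would otherwise handle (distribution, sorting, grouping).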
MapReduce Programming Model
Example: WordCount
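A minimal WordCount sketch, assuming whitespace tokenization and lower-casing (the slides do not specify either). The function names and the two sample documents are made up for illustration, and the grouping loop stands in for the framework's shuffle phase.

```python
from collections import defaultdict

def map_words(doc_id, text):
    # Emit (word, 1) for every word; the input key (doc_id) is ignored.
    for word in text.lower().split():
        yield word, 1

def reduce_counts(word, counts):
    # All the 1s for a given word arrive together; their sum is the count.
    yield word, sum(counts)

def word_count(documents):
    # Shuffle phase, simulated in memory: group intermediate values by word.
    groups = defaultdict(list)
    for doc_id, text in documents.items():
        for word, one in map_words(doc_id, text):
            groups[word].append(one)
    result = {}
    for word in sorted(groups):
        for key, total in reduce_counts(word, groups[word]):
            result[key] = total
    return result

counts = word_count({"d1": "the quick brown fox", "d2": "the lazy dog the end"})
print(counts)
```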
Graph Problems
Graphs are ubiquitous in modern society. Some examples:
• The hyperlink structure of the web
• Social networks formed by sites like Facebook and IMDB, and by email, text-message, and tweet flows (e.g. Twitter)
• Transportation networks (roads, trains, flights etc.)
• The human body can be seen as a graph of genes, proteins, cells etc.
Graph Problems & MapReduce
• Performing Computation on a graph data structure requires processing at each node
• Each node contains node-specific data as well as links (edges) to other nodes
• Computation must traverse the graph and perform the computation step
• How do we traverse a graph in MapReduce? How do we represent the graph for this?
Breadth-First Search & MapReduce
Problem: BFS expands the graph one frontier at a time, which does not fit into a single MapReduce pass
Solution: Iterated passes through MapReduce: map some nodes; the result includes additional nodes, which are fed into successive MapReduce passes
Breadth-First Search & MapReduce Example
Representation as adjacency list
ID EDGES|DISTANCE_FROM_SOURCE|COLOR|
• Input to MAP
1 2,5|0|GRAY|
2 1,3,4,5|Integer.MAX_VALUE|WHITE|
3 2,4|Integer.MAX_VALUE|WHITE|
4 2,3,5|Integer.MAX_VALUE|WHITE|
5 1,2,4|Integer.MAX_VALUE|WHITE|
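As a rough sketch, one record in this `ID EDGES|DISTANCE_FROM_SOURCE|COLOR|` format could be parsed as follows. The helper `parse_node` and the use of `sys.maxsize` as a stand-in for Java's `Integer.MAX_VALUE` are assumptions made for illustration, not part of the slides.

```python
import sys

MAX_DISTANCE = sys.maxsize  # assumed stand-in for Java's Integer.MAX_VALUE

def parse_node(line):
    """Parse 'ID EDGES|DISTANCE|COLOR|' into (id, edges, distance, color)."""
    node_id, rest = line.split(maxsplit=1)
    edges, distance, color = rest.rstrip("|").split("|")
    # A NULL edge list marks a record that carries only distance and color.
    edge_list = None if edges == "NULL" else [int(e) for e in edges.split(",") if e]
    dist = MAX_DISTANCE if distance == "Integer.MAX_VALUE" else int(distance)
    return int(node_id), edge_list, dist, color

source = parse_node("1 2,5|0|GRAY|")
unvisited = parse_node("2 1,3,4,5|Integer.MAX_VALUE|WHITE|")
print(source, unvisited)
```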
Breadth-First Search & MapReduce Example
• 1st iteration of Map
1 2,5|0|BLACK|
2 NULL|1|GRAY|
5 NULL|1|GRAY|
2 1,3,4,5|Integer.MAX_VALUE|WHITE|
3 2,4|Integer.MAX_VALUE|WHITE|
4 2,3,5|Integer.MAX_VALUE|WHITE|
5 1,2,4|Integer.MAX_VALUE|WHITE|
• 1st iteration of Reduce (input shown only for node 2)
2 NULL|1|GRAY|
2 1,3,4,5|Integer.MAX_VALUE|WHITE|
The reducer's job is to take all this data and construct a new node using
• the non-null list of edges
• the minimum distance
• the darkest color
Breadth-First Search & MapReduce Example
• Output of 1st iteration
1 2,5,|0|BLACK
2 1,3,4,5,|1|GRAY
3 2,4,|Integer.MAX_VALUE|WHITE
4 2,3,5,|Integer.MAX_VALUE|WHITE
5 1,2,4,|1|GRAY
• Output of 2nd iteration
1 2,5,|0|BLACK
2 1,3,4,5,|1|BLACK
3 2,4,|2|GRAY
4 2,3,5,|2|GRAY
5 1,2,4,|1|BLACK
Breadth-First Search & MapReduce Example
• Output of 3rd iteration
1 2,5,|0|BLACK
2 1,3,4,5,|1|BLACK
3 2,4,|2|BLACK
4 2,3,5,|2|BLACK
5 1,2,4,|1|BLACK
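The iterations above can be reproduced with a small in-memory sketch. It assumes the same five-node graph, uses `float("inf")` in place of `Integer.MAX_VALUE`, and stops when no GRAY node remains; the function names and the stopping rule are illustrative assumptions rather than the presenter's code.

```python
from collections import defaultdict

INF = float("inf")  # assumed stand-in for Integer.MAX_VALUE
DARKNESS = {"WHITE": 0, "GRAY": 1, "BLACK": 2}

def bfs_map(node_id, node):
    edges, dist, color = node
    if color == "GRAY":
        # Expand the frontier: each neighbour gets a record with no edge list.
        for neighbour in edges:
            yield neighbour, (None, dist + 1, "GRAY")
        color = "BLACK"  # this node is now fully processed
    yield node_id, (edges, dist, color)

def bfs_reduce(node_id, values):
    edges = next(e for e, _, _ in values if e is not None)    # non-null edge list
    dist = min(d for _, d, _ in values)                       # minimum distance
    color = max((c for _, _, c in values), key=DARKNESS.get)  # darkest color
    return node_id, (edges, dist, color)

def bfs_iteration(graph):
    groups = defaultdict(list)
    for node_id, node in graph.items():
        for key, value in bfs_map(node_id, node):
            groups[key].append(value)
    return dict(bfs_reduce(k, vs) for k, vs in sorted(groups.items()))

graph = {
    1: ([2, 5], 0, "GRAY"),
    2: ([1, 3, 4, 5], INF, "WHITE"),
    3: ([2, 4], INF, "WHITE"),
    4: ([2, 3, 5], INF, "WHITE"),
    5: ([1, 2, 4], INF, "WHITE"),
}

iterations = 0
while any(color == "GRAY" for _, _, color in graph.values()):
    graph = bfs_iteration(graph)  # one full MapReduce pass
    iterations += 1

for node_id, (edges, dist, color) in graph.items():
    print(node_id, edges, dist, color)
```

Each pass through `bfs_iteration` corresponds to one MapReduce job, so the number of jobs grows with the depth of the search from the source node.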
Augmenting Edges with Degrees & MapReduce
Problem: This does not fit into a single MapReduce job
Solution: Requires two MapReduce jobs: two map steps and two reduce steps; one of the map steps is the identity map.
Augmenting Edges with Degrees & MapReduce Example
Mapper: For each input record, the map creates two output records, one keyed under each vertex in the edge.
Reducer: The reduce takes all edges mapped to a single vertex (“Fred” here), counts them to obtain the degree, and emits a record for each input record, each keyed under the edge it represents.
Augmenting Edges with Degrees & MapReduce Example
Mapper: The identity mapper preserves the records unchanged, so the records are binned by the edges they represent.
Reducer: The reducer combines the partial-degree information to produce a complete record, which it exports.
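The two jobs described above might be sketched like this. The edge list is a made-up example (only the name "Fred" appears in the slides), `run_job` is a hypothetical in-memory stand-in for the framework, and the sketch assumes a simple undirected graph with no duplicate edges.

```python
from collections import defaultdict

def run_job(records, map_fn, reduce_fn):
    """In-memory stand-in for one MapReduce job (illustrative only)."""
    groups = defaultdict(list)
    for key, value in records:
        for k, v in map_fn(key, value):
            groups[k].append(v)
    out = []
    for k in sorted(groups):
        out.extend(reduce_fn(k, groups[k]))
    return out

def map1(_, edge):
    # Job 1 map: emit the edge once under each of its two vertices.
    u, v = edge
    yield u, edge
    yield v, edge

def reduce1(vertex, edges):
    # Job 1 reduce: the bin size is the vertex degree; re-key by edge.
    degree = len(edges)
    for edge in edges:
        yield edge, (vertex, degree)

def map2(edge, partial):
    # Job 2 map: identity, so records are binned by the edge they describe.
    yield edge, partial

def reduce2(edge, partials):
    # Job 2 reduce: merge the two partial-degree records for this edge.
    degrees = dict(partials)
    u, v = edge
    yield edge, (degrees[u], degrees[v])

# Hypothetical input graph.
edges = [("Fred", "Ethel"), ("Fred", "Lucy"), ("Fred", "Ricky"), ("Ethel", "Lucy")]
records = [(None, e) for e in edges]
augmented = dict(run_job(run_job(records, map1, reduce1), map2, reduce2))
print(augmented)
```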
Enumerating Triangles & MapReduce Example
Problem: Enumerating 3-cycle subgraphs (triangles) from a given graph
Solution:
• Augment the edge records with vertex valence (degree)
• Two MapReduce jobs
Enumerating Triangles & MapReduce Example
• In the first map operation for enumerating triangles, the mapper records each edge under the vertex with the lowest degree.
• The incoming records’ key doesn’t matter.
Enumerating Triangles & MapReduce Example
• The second map for enumerating triangles brings together the edge and open triad records.
• In the process, it rekeys the edge records so that both record types are binned under the vertices they connect.
Enumerating Triangles & MapReduce Example
• In the second reduce, each bin contains at most one edge record and some number of triad records (perhaps none).
• For every combination of edge record and triad record in a bin, the reduce emits a triangle record. The output key isn’t significant.
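A possible sketch of the two triangle-enumeration jobs follows. The slides do not spell out the first reduce; this sketch assumes it emits one open triad for every pair of edges in a bin, keyed by the edge that would close it, which is consistent with the "open triad records" the second map receives. The degree-augmented input, the tie-break by vertex name, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def run_job(records, map_fn, reduce_fn):
    """In-memory stand-in for one MapReduce job (illustrative only)."""
    groups = defaultdict(list)
    for key, value in records:
        for k, v in map_fn(key, value):
            groups[k].append(v)
    out = []
    for k in sorted(groups):
        out.extend(reduce_fn(k, groups[k]))
    return out

def map1(edge, degrees):
    # Record the edge under its lowest-degree vertex (ties broken by name).
    (u, v), (du, dv) = edge, degrees
    yield (u if (du, u) <= (dv, v) else v), edge

def reduce1(vertex, edges):
    # Assumed step: every pair of edges in a bin forms an open triad,
    # keyed by the pair of outer vertices that would close the triangle.
    for e1, e2 in combinations(edges, 2):
        a = e1[0] if e1[1] == vertex else e1[1]
        b = e2[0] if e2[1] == vertex else e2[1]
        yield tuple(sorted((a, b))), ("triad", vertex)

def map2(key, value):
    # Rekey so edge and triad records meet under the vertices they connect.
    yield tuple(sorted(key)), value

def reduce2(pair, values):
    # At most one edge record per bin; each triad it closes is a triangle.
    if any(kind == "edge" for kind, _ in values):
        for kind, apex in values:
            if kind == "triad":
                yield None, tuple(sorted((apex, *pair)))

# Hypothetical degree-augmented edges: edge -> (degree of u, degree of v).
augmented = [
    (("Fred", "Ethel"), (3, 2)),
    (("Fred", "Lucy"), (3, 2)),
    (("Fred", "Ricky"), (3, 1)),
    (("Ethel", "Lucy"), (2, 2)),
]
triads = run_job(augmented, map1, reduce1)
edge_records = [(edge, ("edge", None)) for edge, _ in augmented]
triangles = [t for _, t in run_job(triads + edge_records, map2, reduce2)]
print(triangles)
```

Binning each edge under its lowest-degree vertex keeps the bins of high-degree vertices small, and the consistent tie-break means each triangle is reported once.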
Bibliography
1. J. Dean and S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” Comm. ACM, vol. 51, no. 1, 2008, pp. 107–112.
2. GoogleDevelopers, “Lecture 5: Parallel Graph Algorithms with MapReduce,” 28 Aug. 2007; http://youtube.com/watch?v=BT-piFBP4fE.
3. J. Cohen, “Graph Twiddling in a MapReduce World,” Computing in Science & Engineering, July/August 2009, pp. 29–41.
Thank You