Laboratory Session: MapReduce
Algorithm Design in MapReduce
Pietro Michiardi
Eurecom
Pietro Michiardi (Eurecom) Laboratory Session: MapReduce 1 / 63
Preliminaries
Algorithm Design
Developing algorithms involves:
- Preparing the input data
- Implementing the mapper and the reducer
- Optionally, designing the combiner and the partitioner

How to recast existing algorithms in MapReduce?
- It is not always obvious how to express algorithms
- Data structures play an important role
- Optimization is hard
→ The designer needs to “bend” the framework

Learn by examples:
- “Design patterns”
- Synchronization is perhaps the trickiest aspect
Algorithm Design
Aspects that are not under the control of the designer:
- Where a mapper or reducer will run
- When a mapper or reducer begins or finishes
- Which input key-value pairs are processed by a specific mapper
- Which intermediate key-value pairs are processed by a specific reducer

Aspects that can be controlled:
- Construct data structures as keys and values
- Execute user-specified initialization and termination code for mappers and reducers
- Preserve state across multiple input and intermediate keys in mappers and reducers
- Control the sort order of intermediate keys, and therefore the order in which a reducer will encounter particular keys
- Control the partitioning of the key space, and therefore the set of keys that will be encountered by a particular reducer
Algorithm Design

MapReduce jobs can be complex:
- Many algorithms cannot be easily expressed as a single MapReduce job
- Decompose complex algorithms into a sequence of jobs
  - Requires orchestrating data so that the output of one job becomes the input to the next
- Iterative algorithms require an external driver to check for convergence

Optimizations:
- Scalability (linear)
- Resource requirements (storage and bandwidth)

Outline:
- Local Aggregation
- Pairs and Stripes
- Order Inversion
- Graph Algorithms
Local Aggregation
Local Aggregation
In the context of data-intensive distributed processing, the most important aspect of synchronization is the exchange of intermediate results:
- This involves copying intermediate results from the processes that produced them to those that consume them
- In general, this involves data transfers over the network
- In Hadoop, disk I/O is also involved, as intermediate results are written to disk

Network and disk latencies are expensive:
- Reducing the amount of intermediate data translates into algorithmic efficiency

Combiners and preserving state across inputs:
- Reduce the number and size of key-value pairs to be shuffled
Combiners
Combiners are a general mechanism to reduce the amount of intermediate data:
- They can be thought of as “mini-reducers”

Example: word count
- Combiners aggregate term counts across the documents processed by each map task
- If combiners take advantage of all opportunities for local aggregation, we have at most m × V intermediate key-value pairs
  - m: number of mappers
  - V: number of unique terms in the collection
- Note: due to the Zipfian nature of term distributions, not all mappers will see all terms
Word Counting in MapReduce
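As an illustrative sketch, the word-count dataflow with a combiner can be simulated in plain Python (this is not Hadoop API code; all function and variable names are invented for the example):

```python
from collections import defaultdict

def map_wordcount(docid, doc):
    """Mapper: emit (term, 1) for every term in the document."""
    return [(term, 1) for term in doc.split()]

def combine(pairs):
    """Combiner: aggregate the counts emitted by a single map task."""
    counts = defaultdict(int)
    for term, c in pairs:
        counts[term] += c
    return list(counts.items())

def reduce_wordcount(term, counts):
    """Reducer: sum all partial counts for a term."""
    return (term, sum(counts))

# Two map tasks, each running a combiner locally: every task contributes at
# most one pair per term, hence at most m * V intermediate pairs overall.
tasks = [["a b a", "b c"], ["a a"]]
shuffled = defaultdict(list)
for i, docs in enumerate(tasks):
    emitted = [p for d in docs for p in map_wordcount(i, d)]
    for term, c in combine(emitted):
        shuffled[term].append(c)
result = dict(reduce_wordcount(t, cs) for t, cs in shuffled.items())
```

After the simulated shuffle, each term has at most one partial count per map task, which is exactly the m × V bound from the slide.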
In-Mapper Combiners
In-Mapper Combiners, a possible improvement:
- Hadoop does not guarantee that combiners will be executed

Use an associative array to accumulate intermediate results:
- The array is used to tally up term counts within a single document
- The Emit method is called only after all input records have been processed

Example:
- The code emits a key-value pair for each unique term in the document
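A minimal sketch of this per-document tallying in plain Python (the Emit step is modeled by the return value; names are illustrative):

```python
from collections import defaultdict

def map_with_in_document_combining(docid, doc):
    """Tally term counts in an associative array; emit only after the
    whole document has been processed: one pair per unique term."""
    counts = defaultdict(int)
    for term in doc.split():
        counts[term] += 1
    # Emit happens here, once per unique term in the document
    return sorted(counts.items())

pairs = map_with_in_document_combining(42, "to be or not to be")
```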
In-Mapper Combiners
Taking the idea one step further:
- Exploit implementation details in Hadoop
- A Java mapper object is created for each map task
- JVM reuse must be enabled

Preserve state within and across calls to the Map method:
- Initialize method, used to create an across-map persistent data structure
- Close method, used to emit intermediate key-value pairs only when all map tasks scheduled on one machine are done
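A sketch of the pattern in plain Python, with `__init__` and `close` standing in for the Initialize/Close hooks described above (illustrative, not the actual Hadoop API):

```python
from collections import defaultdict

class InMapperCombiningMapper:
    """Preserves state across calls to map(); pairs are emitted only in
    close(), after the whole input split has been processed."""

    def __init__(self):                 # plays the role of Initialize
        self.counts = defaultdict(int)  # across-map persistent structure

    def map(self, docid, doc):          # one call per input record; no emission
        for term in doc.split():
            self.counts[term] += 1

    def close(self):                    # emit once, at the end of the task
        return sorted(self.counts.items())

mapper = InMapperCombiningMapper()
for docid, doc in enumerate(["a b a", "b c"]):
    mapper.map(docid, doc)
emitted = mapper.close()
```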
In-Mapper Combiners
Summing up: a first “design pattern”, in-mapper combining:
- Provides control over when local aggregation occurs
- The designer can determine how exactly aggregation is done

Efficiency vs. combiners:
- There is no additional overhead due to the materialization of key-value pairs
  - Unnecessary object creation and destruction (garbage collection)
  - Serialization and deserialization when memory-bounded
- Mappers still need to emit all key-value pairs; combiners only reduce network traffic
In-Mapper Combiners
Precautions:
- In-mapper combining breaks the functional programming paradigm due to state preservation
- Preserving state across multiple inputs implies that algorithm behavior might depend on execution order
  - Ordering-dependent bugs are difficult to find

Scalability bottleneck:
- The in-mapper combining technique strictly depends on having sufficient memory to store intermediate results
  - And you don’t want the OS to deal with swapping
- Multiple threads compete for the same resources
- A possible solution: “block” and “flush”
  - Implemented with a simple counter
Further Remarks
The extent to which efficiency can be increased with local aggregation depends on the size of the intermediate key space:
- Opportunities for aggregation arise when multiple values are associated with the same key

Local aggregation is also effective against reduce stragglers:
- It reduces the number of values associated with frequently occurring keys
Algorithmic correctness with local aggregation
The use of combiners must be thought through carefully:
- In Hadoop, they are optional: the correctness of the algorithm cannot depend on the computation (or even the execution) of the combiners

In MapReduce, the reducer input key-value type must match the mapper output key-value type:
- Hence, for combiners, both the input and output key-value types must match the output key-value type of the mapper

Commutative and associative computations:
- This is a special case, which worked for word counting
  - There the combiner code is actually the reducer code
- In general, combiners and reducers are not interchangeable
Algorithmic Correctness: an Example

Problem statement:
- We have a large dataset where input keys are strings and input values are integers
- We wish to compute the mean of all integers associated with the same key
  - In practice: the dataset can be a log from a website, where the keys are user IDs and the values are some measure of activity

Next, a baseline approach:
- We use an identity mapper, which groups and sorts input key-value pairs appropriately
- Reducers keep track of the running sum and the number of integers encountered
- The mean is emitted as the output of the reducer, with the input string as the key

Inefficiency problems arise in the shuffle phase
Example: basic MapReduce to compute the mean of values
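The baseline can be sketched as follows (a plain-Python simulation; names are illustrative). Note that every raw value crosses the simulated shuffle, which is exactly the inefficiency noted above:

```python
from collections import defaultdict

def identity_map(key, value):
    """Identity mapper: pass each record through unchanged."""
    return (key, value)

def reduce_mean(key, values):
    """Keep a running sum and a count; emit the mean with the input key."""
    total, count = 0, 0
    for v in values:
        total += v
        count += 1
    return (key, total / count)

records = [("alice", 4), ("bob", 1), ("alice", 2), ("bob", 3)]
shuffled = defaultdict(list)
for key, value in records:
    k, v = identity_map(key, value)   # every value is shuffled individually
    shuffled[k].append(v)
means = dict(reduce_mean(k, vs) for k, vs in shuffled.items())
```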
Algorithmic Correctness: an Example
Note: the operation is not distributive:
- Mean(1,2,3,4,5) ≠ Mean(Mean(1,2), Mean(3,4,5))
- Hence: a combiner cannot output partial means and hope that the reducer will compute the correct final mean

Next, a failed attempt at solving the problem:
- The combiner partially aggregates results by separating the two components of the mean
- The sum and the count of elements are packaged into a pair
- Using the same input string as the key, the combiner emits the pair
Example: Wrong use of combiners
Algorithmic Correctness: an Example
What’s wrong with the previous approach?
- Trivially, the input/output key-value types are not correct
- Remember that combiners are optimizations; the algorithm should work even when they are “removed”

Executing the code while omitting the combiner phase:
- The output value type of the mapper is integer
- The reducer expects to receive a list of integers
- Instead, we make it expect a list of pairs

Next, a correct implementation of the combiner:
- Note: the reducer is similar to the combiner!
- Exercise: verify its correctness
Example: Correct use of combiners
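A sketch of the correct scheme in plain Python: the mapper emits (sum, count) pairs, so the combiner's input and output value types match the mapper's output type, and the reducer computes the same mean whether or not the combiner runs (names are illustrative):

```python
def mapper(key, value):
    # Emit (sum, count) so the combiner's input/output types both match
    # the mapper's output value type
    return (key, (value, 1))

def combiner(key, pairs):
    """Aggregate partial (sum, count) pairs; same value type in and out."""
    s = sum(p[0] for p in pairs)
    c = sum(p[1] for p in pairs)
    return (key, (s, c))

def reducer(key, pairs):
    """Sum the components, then divide once at the very end."""
    s = sum(p[0] for p in pairs)
    c = sum(p[1] for p in pairs)
    return (key, s / c)

values = [1, 2, 3, 4, 5]
# Without the combiner: the reducer receives one pair per value
no_comb = reducer("u", [mapper("u", v)[1] for v in values])
# With the combiner applied to two arbitrary partitions of the input
with_comb = reducer("u", [
    combiner("u", [mapper("u", v)[1] for v in values[:2]])[1],
    combiner("u", [mapper("u", v)[1] for v in values[2:]])[1],
])
```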
Algorithmic Correctness: an Example
Using in-mapper combining:
- Inside the mapper, the partial sums and counts are held in memory (across inputs)
- Intermediate values are emitted only after the entire input split has been processed
- As before, the output value is a pair
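An illustrative sketch of this variant (the same reducer as in the combiner version would consume these pairs):

```python
from collections import defaultdict

class MeanMapper:
    """Holds partial sums and counts in memory, across all inputs of a split."""

    def __init__(self):
        self.sums = defaultdict(int)
        self.counts = defaultdict(int)

    def map(self, key, value):   # no emission during processing
        self.sums[key] += value
        self.counts[key] += 1

    def close(self):             # emit (sum, count) pairs at the end of the split
        return [(k, (self.sums[k], self.counts[k])) for k in sorted(self.sums)]

m = MeanMapper()
for k, v in [("alice", 4), ("bob", 1), ("alice", 2)]:
    m.map(k, v)
partials = m.close()
```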
Pairs and Stripes
Pairs and Stripes
A common approach in MapReduce: build complex keys:
- Data necessary for a computation are naturally brought together by the framework

Two basic techniques:
- Pairs: similar to the example on the average
- Stripes: uses in-mapper memory data structures

Next, we focus on a particular problem that benefits from these two methods
Problem statement
The problem: building word co-occurrence matrices for large corpora
- The co-occurrence matrix of a corpus is a square n × n matrix
- n is the number of unique words (i.e., the vocabulary size)
- A cell m_ij contains the number of times the word w_i co-occurs with the word w_j within a specific context
- Context: a sentence, a paragraph, a document, or a window of m words
- NOTE: the matrix may be symmetric in some cases

Motivation:
- This problem is a basic building block for more complex operations
- Estimating the distribution of discrete joint events from a large number of observations
- Similar problems arise in other domains:
  - Customers who buy this tend to also buy that
Observations
Space requirements:
- Clearly, the space requirement is O(n²), where n is the size of the vocabulary
- For real-world (English) corpora, n can be hundreds of thousands of words, or even billions

So what’s the problem?
- If the matrix fits in the memory of a single machine, then just use whatever naive implementation
- Instead, if the matrix is bigger than the available memory, paging kicks in and any naive implementation breaks

Compression:
- Such techniques can help in solving the problem on a single machine
- However, they present scalability problems
Word co-occurrence: the Pairs approach

Input to the problem:
- Key-value pairs in the form of a docid and a doc

The mapper:
- Processes each input document
- Emits key-value pairs with:
  - Each co-occurring word pair as the key
  - The integer one (the count) as the value
- This is done with two nested loops:
  - The outer loop iterates over all words
  - The inner loop iterates over all neighbors

The reducer:
- Receives all pairs relative to the same co-occurring words
  - This requires modifying the partitioner
- Computes an absolute count of the joint event
- Emits the pair and the count as the final key-value output
  - Basically, reducers emit the cells of the matrix
Word co-occurrence: the Pairs approach
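An illustrative plain-Python sketch of the pairs mapper plus a simulated shuffle and reduce (a symmetric window of 2 words is assumed as the context; names are invented for the example):

```python
from collections import defaultdict

def map_pairs(docid, doc, window=2):
    """Emit ((w, u), 1) for every word u co-occurring with w in the window."""
    words = doc.split()
    out = []
    for i, w in enumerate(words):                               # outer loop
        for u in words[max(0, i - window):i] + words[i + 1:i + 1 + window]:
            out.append(((w, u), 1))                             # inner loop
    return out

# Simulated shuffle: group the ones by word pair, then sum in the reducer
shuffled = defaultdict(list)
for pair, one in map_pairs(0, "a b a"):
    shuffled[pair].append(one)
cells = {pair: sum(ones) for pair, ones in shuffled.items()}    # matrix cells
```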
Word co-occurrence: the Stripes approach
Input to the problem:
- Key-value pairs in the form of a docid and a doc

The mapper:
- Same two-nested-loops structure as before
- Co-occurrence information is first stored in an associative array
- Emits key-value pairs with words as keys and the corresponding arrays as values

The reducer:
- Receives all associative arrays related to the same word
- Performs an element-wise sum of all associative arrays with the same key
- Emits key-value output in the form of (word, associative array)
  - Basically, reducers emit rows of the co-occurrence matrix
Word co-occurrence: the Stripes approach
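The stripes variant can be sketched the same way (plain Python, illustrative names; the same 2-word window is assumed):

```python
from collections import Counter, defaultdict

def map_stripes(docid, doc, window=2):
    """Emit (word, stripe): the stripe is an associative array mapping each
    neighbor to its co-occurrence count within this document."""
    words = doc.split()
    stripes = defaultdict(Counter)
    for i, w in enumerate(words):
        for u in words[max(0, i - window):i] + words[i + 1:i + 1 + window]:
            stripes[w][u] += 1
    return list(stripes.items())

def reduce_stripes(word, stripes):
    """Element-wise sum of all stripes for the same word: one matrix row."""
    row = Counter()
    for s in stripes:
        row.update(s)
    return (word, dict(row))

shuffled = defaultdict(list)
for docid, doc in enumerate(["a b a", "b a"]):
    for word, stripe in map_stripes(docid, doc):
        shuffled[word].append(stripe)
matrix = dict(reduce_stripes(w, ss) for w, ss in shuffled.items())
```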
Pairs and Stripes, a comparison
The pairs approach:
- Generates a large number of key-value pairs (also intermediate)
- The benefit from combiners is limited, as it is less likely for a mapper to process multiple occurrences of a word pair
- Does not suffer from memory paging problems

The stripes approach:
- More compact
- Generates fewer and shorter intermediate keys
  - The framework has less sorting to do
- The values are more complex and have serialization/deserialization overhead
- Greatly benefits from combiners, as the key space is the vocabulary
- Suffers from memory paging problems, if not properly engineered
Order Inversion
Computing relative frequencies

“Relative” co-occurrence matrix construction:
- Same problem and same matrix as before
- Instead of absolute counts, we take into consideration the fact that some words appear more frequently than others
  - Word w_i may co-occur frequently with word w_j simply because one of the two is very common
- We need to convert absolute counts to relative frequencies f(w_j | w_i)
  - What proportion of the time does w_j appear in the context of w_i?

Formally, we compute:

    f(w_j | w_i) = N(w_i, w_j) / Σ_{w'} N(w_i, w')

- N(·, ·) is the number of times a co-occurring word pair is observed
- The denominator is called the marginal
Computing relative frequencies

The stripes approach:
- In the reducer, the counts of all words that co-occur with the conditioning variable (w_i) are available in the associative array
- Hence, the sum of all those counts gives the marginal
- Then we divide the joint counts by the marginal and we’re done

The pairs approach:
- The reducer receives the pair (w_i, w_j) and its count
- From this information alone it is not possible to compute f(w_j | w_i)
- Fortunately, as with the mapper, the reducer can also preserve state across multiple keys
  - We can buffer in memory all the words that co-occur with w_i, together with their counts
  - This is basically building the associative array of the stripes method
Computing relative frequencies: a basic approach

We must define the sort order of the pair:
- In this way, the keys are first sorted by the left word, and then by the right word (in the pair)
- Hence, we can detect whether all pairs associated with the word we are conditioning on (w_i) have been seen
- At this point, we can use the in-memory buffer, compute the relative frequencies, and emit

We must define an appropriate partitioner:
- The default partitioner is based on the hash value of the intermediate key, modulo the number of reducers
- For a complex key, the raw byte representation is used to compute the hash value
  - Hence, there is no guarantee that the pairs (dog, aardvark) and (dog, zebra) are sent to the same reducer
- What we want is that all pairs with the same left word are sent to the same reducer

Limitations of this approach:
- Essentially, we reproduce the stripes method in the reducer, and we need a custom partitioner
- This algorithm would work, but it presents the same memory-bottleneck problem as the stripes method
Computing relative frequencies: order inversion

The key is to properly sequence the data presented to reducers:
- If it were possible to compute the marginal in the reducer before processing the joint counts, the reducer could simply divide the joint counts received from mappers by the marginal
- The notion of “before” and “after” can be captured in the ordering of key-value pairs
- The programmer can define the sort order of keys so that data needed earlier is presented to the reducer before data that is needed later
Computing relative frequencies: order inversion

Recall that mappers emit pairs of co-occurring words as keys

The mapper:
- Additionally emits a “special” key of the form (w_i, *)
- The value associated with the special key is one; it represents the contribution of the word pair to the marginal
- Using combiners, these partial marginal counts are aggregated before being sent to the reducers

The reducer:
- We must make sure that the special key-value pairs are processed before any other key-value pairs where the left word is w_i
- We also need to modify the partitioner as before, i.e., it must take into account only the first word
Computing relative frequencies: order inversion

Memory requirements:
- Minimal, because only the marginal (an integer) needs to be stored
- No buffering of individual co-occurring words
- No scalability bottleneck

Key ingredients for order inversion:
- Emit a special key-value pair to capture the marginal
- Control the sort order of the intermediate keys, so that the special key-value pair is processed first
- Define a custom partitioner for routing intermediate key-value pairs
- Preserve state across multiple keys in the reducer
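These ingredients can be sketched together in plain Python: the sort key places (w, '*') before every (w, u) with the same left word, and the custom partitioner (hash on the left word only) is simulated by processing all keys in a single reducer loop that preserves the marginal as state (a symmetric window of 1 word is assumed; names are illustrative):

```python
from collections import defaultdict

def map_with_marginal(docid, doc, window=1):
    """Emit ((w, u), 1) for each co-occurrence, plus the special pair
    ((w, '*'), 1) that contributes to the marginal of w."""
    words = doc.split()
    for i, w in enumerate(words):
        for u in words[max(0, i - window):i] + words[i + 1:i + 1 + window]:
            yield ((w, u), 1)
            yield ((w, "*"), 1)

# Shuffle (with the combiner's aggregation): group counts by key, then sort
# so that (w, '*') precedes every (w, u) with the same left word
grouped = defaultdict(int)
for key, one in map_with_marginal(0, "a b a c"):
    grouped[key] += one
sort_order = sorted(grouped, key=lambda k: (k[0], k[1] != "*", k[1]))

# Reducer: state preserved across keys is just the current marginal
freqs, marginal = {}, 0
for w, u in sort_order:
    if u == "*":
        marginal = grouped[(w, u)]   # processed first, thanks to the sort order
    else:
        freqs[(w, u)] = grouped[(w, u)] / marginal
```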
Graph Algorithms
Preliminaries and Data Structures
Motivations

Examples of graph problems:
- Graph search
- Graph clustering
- Minimum spanning trees
- Matching problems
- Flow problems
- Element analysis: node and edge centralities

The problem: big graphs

Why MapReduce?
- Algorithms for the above problems on a single machine are not scalable
- Recently, Google designed a new system, Pregel, for large-scale (incremental) graph processing
- Even more recently, [3] indicated a fundamentally new design pattern to analyze graphs in MapReduce
Graph Representations
Basic data structures:
- Adjacency matrix
- Adjacency list

Are graphs sparse or dense?
- This determines which data structure to use
  - Adjacency matrix: operations on incoming links are easy (column scan)
  - Adjacency list: operations on outgoing links are easy
  - The shuffle and sort phase can help, by grouping edges by their destination reducer
- [4] dispelled the notion of sparseness of real-world graphs
Parallel Breadth-First-Search
Parallel Breadth-First Search
Single-source shortest path:
- Dijkstra’s algorithm uses a global priority queue
  - It maintains a globally sorted list of nodes by current distance
- How to solve this problem in parallel?
  - “Brute-force” approach: breadth-first search

Parallel BFS: intuition
- Flooding
- Iterative algorithms in MapReduce
- Shoehorn message-passing-style algorithms
Parallel Breadth-First Search
Assumptions:
- Connected, directed graph
- Data structure: adjacency list
- The distance to each node is stored alongside the adjacency list of that node

The pseudo-code:
- We use n to denote the node id (an integer)
- We use N to denote the node’s adjacency list and current distance
- The algorithm works by mapping over all nodes
- Mappers emit a key-value pair for each neighbor in the node’s adjacency list
  - The key: the node id of the neighbor
  - The value: the current distance to the node plus one
  - If we can reach node n with distance d, then we must be able to reach all the nodes connected to n with distance d + 1
Parallel Breadth-First Search
The pseudo-code (continued):
- After shuffle and sort, reducers receive keys corresponding to the destination node ids, and distances corresponding to all paths leading to that node
- The reducer selects the shortest of these distances and updates the distance in the node data structure

Passing the graph along:
- The mapper: emits the node’s adjacency list, with the node id as the key
- The reducer: must distinguish between the node data structure and the distance values
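One iteration of parallel BFS can be sketched as follows (plain Python; tagged values let the reducer distinguish the node structure from plain distance values, as required above; names are illustrative):

```python
from collections import defaultdict

INF = float("inf")

def bfs_map(n, node):
    """node = (distance, adjacency list). Pass the structure along and emit
    a tentative distance d + 1 for every neighbor."""
    d, adj = node
    yield (n, ("graph", node))          # the graph itself travels too
    if d < INF:
        for m in adj:
            yield (m, ("dist", d + 1))

def bfs_reduce(n, values):
    """Recover the node structure and keep the minimum distance seen."""
    best, adj = INF, []
    for tag, v in values:
        if tag == "graph":
            best, adj = min(best, v[0]), v[1]
        else:
            best = min(best, v)
    return (n, (best, adj))

def bfs_iteration(graph):
    shuffled = defaultdict(list)
    for n, node in graph.items():
        for k, v in bfs_map(n, node):
            shuffled[k].append(v)
    return dict(bfs_reduce(k, vs) for k, vs in shuffled.items())

# Source is node 1; two iterations expand the search frontier by two hops
g = {1: (0, [2, 3]), 2: (INF, [4]), 3: (INF, []), 4: (INF, [])}
g = bfs_iteration(bfs_iteration(g))
```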
Parallel Breadth-First Search
MapReduce iterations:
- The first time we run the algorithm, we “discover” all nodes connected to the source
- In the second iteration, we discover all nodes connected to those
→ Each iteration expands the “search frontier” by one hop
- How many iterations before convergence?

This approach is suitable for small-world graphs:
- The diameter of the network is small
- See [3] for advanced topics on the subject
Parallel Breadth-First Search
Checking the termination of the algorithm:
- Requires a “driver” program which submits a job, checks the termination condition, and eventually iterates
- In practice:
  - Hadoop counters
  - Side-data passed to the job configuration

Extensions:
- Storing the actual shortest path
- Weighted edges (as opposed to unit distance)
The story so far
The graph structure is stored in adjacency lists:
- This data structure can be augmented with additional information

The MapReduce framework:
- Maps over the node data structures, involving only the node’s internal state and its local graph structure
- Map results are “passed” along outgoing edges
- The graph itself is passed from the mapper to the reducer
  - This is a very costly operation for large graphs!
- Reducers aggregate over “same destination” nodes

Graph algorithms are generally iterative:
- They require a driver program to check for termination
PageRank
Introduction
What is PageRank?
- It is a measure of the relevance of a Web page, based on the structure of the hyperlink graph
- It is based on the concept of a random Web surfer

Formally, we have:

    P(n) = α (1/|G|) + (1 − α) Σ_{m ∈ L(n)} P(m)/C(m)

- |G| is the number of nodes in the graph
- α is a random jump factor
- L(n) is the set of pages that link to n
- C(m) is the out-degree of node m
PageRank in Details
PageRank is defined recursively, hence we need an iterative algorithm:
- A node receives “contributions” from all pages that link to it

Consider the set of nodes L(n):
- A random surfer at m arrives at n with probability 1/C(m)
- Since the PageRank value of m is the probability that the random surfer is at m, the probability of arriving at n from m is P(m)/C(m)

To compute the PageRank of n we need to:
- Sum the contributions from all pages that link to n
- Take into account the random jump, which is uniform over all nodes in the graph
PageRank in MapReduce
Sketch of the MapReduce algorithm:
- The algorithm maps over the nodes
- For each node, it computes the PageRank mass that needs to be distributed to its neighbors
- Each fraction of the PageRank mass is emitted as the value, keyed by the node ids of the neighbors
- In the shuffle and sort phase, values are grouped by node id
  - Also, we pass the graph structure from mappers to reducers (for subsequent iterations to take place over the updated graph)
- The reducer updates the PageRank value of every single node
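An illustrative sketch of one iteration in plain Python (names are invented; dangling nodes are assumed absent, since handling them requires the extra job mentioned among the implementation details):

```python
from collections import defaultdict

def pr_map(n, node):
    """node = (rank, adjacency list). Distribute the node's PageRank mass
    evenly over its out-links, and pass the graph structure along."""
    rank, adj = node
    yield (n, ("graph", adj))
    for m in adj:                        # assumes adj is non-empty (no sinks)
        yield (m, ("mass", rank / len(adj)))

def pr_reduce(n, values, alpha, size):
    """Sum the incoming mass and apply the random-jump term."""
    adj, mass = [], 0.0
    for tag, v in values:
        if tag == "graph":
            adj = v
        else:
            mass += v
    return (n, (alpha / size + (1 - alpha) * mass, adj))

def pagerank_iteration(graph, alpha=0.15):
    shuffled = defaultdict(list)
    for n, node in graph.items():
        for k, v in pr_map(n, node):
            shuffled[k].append(v)
    return dict(pr_reduce(k, vs, alpha, len(graph)) for k, vs in shuffled.items())

# A 3-cycle with no dangling nodes: the uniform distribution is stationary
g = {1: (1/3, [2]), 2: (1/3, [3]), 3: (1/3, [1])}
g = pagerank_iteration(g)
```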
PageRank in MapReduce
Implementation details:
- Loss of PageRank mass for sink nodes
- Auxiliary state information
- One iteration of the algorithm
  - Two MapReduce jobs: one to distribute the PageRank mass, the other for dangling nodes and random jumps
- Checking for convergence:
  - Requires a driver program
  - When the updates of PageRank are “stable”, the algorithm stops

Further reading on convergence and attacks:
- Convergence: [5, 2]
- Attacks: Adversarial Information Retrieval Workshop [1]
References

[1] Adversarial Information Retrieval Workshop.

[2] Monica Bianchini, Marco Gori, and Franco Scarselli. Inside PageRank. In ACM Transactions on Internet Technology, 2005.

[3] Silvio Lattanzi, Benjamin Moseley, Siddharth Suri, and Sergei Vassilvitskii. Filtering: a method for solving graph problems in MapReduce. In Proc. of SPAA, 2011.

[4] Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graphs over time: densification laws, shrinking diameters and possible explanations. In Proc. of SIGKDD, 2005.

[5] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank citation ranking: bringing order to the web. Stanford Digital Library Working Paper, 1999.