Lecture 2 – MapReduce: Theory and Implementation

Date post: 19-Mar-2016
Upload: najila

CSE 490h – Introduction to Distributed Computing, Winter 2008

Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License.


Last Class

How do I process lots of data? Distribute the work.

Can I distribute the work? Maybe… if it’s not dependent on other tasks. Example: Fibonacci.
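A minimal Java sketch makes the contrast concrete. The class and method names are hypothetical. Squaring each number is independent work, so a parallel stream can spread it across threads. Computing Fibonacci numbers is different: each term needs the two terms before it, so the steps cannot simply be handed out to separate workers.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class Dependencies {
    // Independent tasks: each output depends only on its own input,
    // so the work can be split across threads (here via a parallel stream).
    static List<Integer> squares(int n) {
        return IntStream.range(0, n)
                .parallel()
                .map(i -> i * i)
                .boxed()
                .collect(Collectors.toList());   // encounter order is preserved
    }

    // Dependent tasks: fib(i) needs fib(i-1) and fib(i-2), so every step
    // must wait for the previous two; this loop cannot simply be split up.
    static long fib(int n) {
        long prev = 0, curr = 1;                 // fib(0), fib(1)
        for (int i = 0; i < n; i++) {
            long next = prev + curr;
            prev = curr;
            curr = next;
        }
        return prev;                             // fib(n)
    }
}
```

For example, `Dependencies.squares(5)` returns `[0, 1, 4, 9, 16]` and `Dependencies.fib(10)` returns `55`.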


Last Class

What problems can occur? Large tasks, unpredictable bugs, machine failure.

How do we solve / avoid these? Break the work up into small chunks? Restart tasks? Use known working solutions.


MapReduce

A concept from functional programming, implemented by Google and applied to a large number of problems.


Functional Programming Review

Java:

    int fooA(String[] list) {
        return bar1(list) + bar2(list);
    }

    int fooB(String[] list) {
        return bar2(list) + bar1(list);
    }

Do they give the same result?
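In Java the answer is: not necessarily. Java evaluates the left operand of `+` before the right one, so if `bar1` or `bar2` has a side effect the two orders can differ. Below is a minimal sketch with hypothetical `bar1` and `bar2`: `bar1` overwrites `list[0]`, and `bar2` reads it.

```java
// Hypothetical bar1/bar2, used only to show that evaluation order can matter in Java.
class SideEffects {
    // bar1 has a side effect: it overwrites the first element of the caller's array.
    static int bar1(String[] list) {
        list[0] = "changed";
        return 1;
    }

    // bar2 reads whatever list[0] currently holds.
    static int bar2(String[] list) {
        return list[0].length();
    }

    // Java evaluates the left operand of + first, so bar1 runs before bar2 here...
    static int fooA(String[] list) {
        return bar1(list) + bar2(list);
    }

    // ...and after bar2 here.
    static int fooB(String[] list) {
        return bar2(list) + bar1(list);
    }
}
```

With a fresh array `{"a"}` each time, `fooA` returns 8 because `bar2` sees `"changed"`, while `fooB` returns 2 because `bar2` still sees `"a"`.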


Functional Programming Review

Functional Programming:

    fun fooA(l: int list) = bar1(l) + bar2(l)

    fun fooB(l: int list) = bar2(l) + bar1(l)

Do they give the same result?


Functional Programming Review

Operations do not modify data structures: They always create new ones

Original data still exists in unmodified form


Functional Updates Do Not Modify Structures

    fun foo(x, lst) = let val lst' = rev lst in rev (x :: lst') end

    foo: 'a * 'a list -> 'a list

The foo() function above reverses a list, adds a new element to the front, and reverses the result, which has the effect of appending x to the end of the list.

But it never modifies lst!
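Here is the same idea sketched in Java. The class and method names are illustrative, not from the lecture. `foo` builds a new list at every step and leaves its argument untouched, so the caller's list is unchanged afterwards even when it is a mutable `ArrayList`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class FunctionalUpdate {
    // Returns a reversed copy of lst; lst itself is not touched.
    static <T> List<T> rev(List<T> lst) {
        List<T> copy = new ArrayList<>(lst);
        Collections.reverse(copy);          // reverses the copy only
        return copy;
    }

    // fun foo(x, lst) = let val lst' = rev lst in rev (x :: lst') end
    static <T> List<T> foo(T x, List<T> lst) {
        List<T> reversed = rev(lst);        // lst'
        List<T> consed = new ArrayList<>();
        consed.add(x);                      // x :: lst'
        consed.addAll(reversed);
        return rev(consed);                 // net effect: x appended to the end
    }
}
```

Calling `foo(4, lst)` on a list holding 1, 2, 3 returns `[1, 2, 3, 4]`, and `lst` still holds `[1, 2, 3]`.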


Functions Can Be Used As Arguments

    fun DoDouble(f, x) = f (f x)

It does not matter what f does to its argument; DoDouble() will do it twice.

What is the type of this function? x: 'a; f: 'a -> 'a; DoDouble: ('a -> 'a) * 'a -> 'a
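Here is a Java counterpart, as a sketch. `UnaryOperator<A>` plays the role of `'a -> 'a`, and the generic method works for any element type. The name `doDouble` is just a Java-style spelling of `DoDouble`.

```java
import java.util.function.UnaryOperator;

class HigherOrder {
    // fun DoDouble(f, x) = f (f x)
    // Type: ('a -> 'a) * 'a -> 'a, with UnaryOperator<A> standing in for 'a -> 'a.
    static <A> A doDouble(UnaryOperator<A> f, A x) {
        return f.apply(f.apply(x));   // apply f, then apply f again to the result
    }
}
```

For example, `doDouble(n -> n + 3, 1)` gives 7, and `doDouble(s -> s + "!", "hi")` gives `"hi!!"`.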


map (Functional Programming)

Creates a new list by applying f to each element of the input list; returns output in order.


map f lst: ('a -> 'b) -> 'a list -> 'b list


map Implementation

This implementation moves left-to-right across the list, mapping elements one at a time

… But does it need to?

    fun map f [] = []
      | map f (x::xs) = (f x) :: (map f xs)
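Below is a direct Java translation of this recursive definition, as a sketch. Java has no built-in cons list, so `subList` and `add(0, …)` stand in for pattern matching and `::`. Like the SML version, it applies `f` to the head before recursing on the tail, so elements are mapped left to right.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class MapImpl {
    // fun map f [] = []
    //   | map f (x::xs) = (f x) :: (map f xs)
    // Returns a new list; the input list is never modified.
    static <A, B> List<B> map(Function<A, B> f, List<A> lst) {
        if (lst.isEmpty()) {
            return new ArrayList<>();                         // map f [] = []
        }
        B head = f.apply(lst.get(0));                         // f x, computed first
        List<B> rest = map(f, lst.subList(1, lst.size()));    // map f xs
        rest.add(0, head);                                    // (f x) :: ...
        return rest;
    }
}
```

This version is for illustration only. Recursion depth grows with the list length, and `add(0, …)` makes the whole call quadratic. Idiomatic Java would use a loop or `stream().map(...)`.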


Implicit Parallelism In map

In a purely functional setting, elements of a list being computed by map cannot see the effects of the computations on other elements

If the order in which f is applied to the list elements does not matter (each application is independent of the others), we can reorder or parallelize execution

This is the “secret” that MapReduce exploits
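Here is a sketch of what this freedom buys, assuming `f` has no side effects. Each application is submitted to a thread pool, so the calls may run in any order on any thread. Collecting the futures in input order still yields exactly what sequential `map` would. `parallelMap` and its `threads` parameter are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ParallelMap {
    // Applies f to every element on a fixed thread pool. This is only safe
    // because f is assumed to be pure: no application can observe another.
    static <A, B> List<B> parallelMap(Function<A, B> f, List<A> lst, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<B>> futures = new ArrayList<>();
            for (A x : lst) {
                futures.add(pool.submit(() -> f.apply(x)));  // may run in any order
            }
            List<B> out = new ArrayList<>();
            for (Future<B> fut : futures) {
                out.add(fut.get());                          // gather in input order
            }
            return out;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```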


Fold

Moves across a list, applying f to each element plus an accumulator. f returns the next accumulator value, which is combined with the next element of the list.

[Figure: f applied to each list element in turn, threading the accumulator from the initial value to the returned value]

fold f x0 lst: ('a * 'b -> 'b) -> 'b -> 'a list -> 'b


fold left vs. fold right

Order of list elements can be significant. Fold left moves left-to-right across the list; fold right moves right-to-left.

SML implementation:

    fun foldl f a [] = a
      | foldl f a (x::xs) = foldl f (f(x, a)) xs

    fun foldr f a [] = a
      | foldr f a (x::xs) = f(x, (foldr f a xs))
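The two SML definitions translate to Java roughly as follows. In this sketch loops replace the recursion, and `f` takes `(element, accumulator)` as in the SML code. The class name is hypothetical. The difference between the two folds shows up whenever the order of combination matters, for example when `f` prepends strings.

```java
import java.util.List;
import java.util.function.BiFunction;

class Folds {
    // foldl f a [] = a | foldl f a (x::xs) = foldl f (f(x, a)) xs
    static <T, A> A foldl(BiFunction<T, A, A> f, A acc, List<T> xs) {
        for (T x : xs) {                             // left to right
            acc = f.apply(x, acc);
        }
        return acc;
    }

    // foldr f a [] = a | foldr f a (x::xs) = f(x, foldr f a xs)
    static <T, A> A foldr(BiFunction<T, A, A> f, A acc, List<T> xs) {
        for (int i = xs.size() - 1; i >= 0; i--) {   // right to left
            acc = f.apply(xs.get(i), acc);
        }
        return acc;
    }
}
```

Take `f = (x, a) -> x + a`, the seed `""`, and the list `a, b, c`. Here `foldl` returns `"cba"`, while `foldr` returns `"abc"`.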


Example

fun foo(l: int list) = sum(l) + mul(l) + length(l)

How can we implement this?


Example (Solved)

    fun sum(lst) = foldl (fn (x,a)=>x+a) 0 lst
    fun mul(lst) = foldl (fn (x,a)=>x*a) 1 lst
    fun length(lst) = foldl (fn (x,a)=>1+a) 0 lst

    fun foo(l: int list) = sum(l) + mul(l) + length(l)
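In Java, `Stream.reduce(identity, accumulator, combiner)` plays the role of `foldl`. The sketch below mirrors the three helpers; the class name is hypothetical. Java's accumulator takes `(accumulator, element)`, the reverse of the SML `fn (x, a)`. The combiner is only used if the stream runs in parallel.

```java
import java.util.List;

class FoldExample {
    // fun sum(lst) = foldl (fn (x,a)=>x+a) 0 lst
    static int sum(List<Integer> lst) {
        return lst.stream().reduce(0, (a, x) -> x + a, Integer::sum);
    }

    // fun mul(lst) = foldl (fn (x,a)=>x*a) 1 lst
    static int mul(List<Integer> lst) {
        return lst.stream().reduce(1, (a, x) -> x * a, (a, b) -> a * b);
    }

    // fun length(lst) = foldl (fn (x,a)=>1+a) 0 lst
    static int length(List<Integer> lst) {
        return lst.stream().reduce(0, (a, x) -> 1 + a, Integer::sum);
    }

    // fun foo(l: int list) = sum(l) + mul(l) + length(l)
    static int foo(List<Integer> l) {
        return sum(l) + mul(l) + length(l);
    }
}
```

For example, `foo(List.of(1, 2, 3, 4))` is 10 + 24 + 4 = 38.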


Google MapReduce

Input handling, map function, partition function, compare function, reduce function, output writer


Input Handling

Divides up the data into bite-size chunks, starts up tasks, and assigns tasks to idle workers.
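Here is a single-machine sketch of these three steps, with hypothetical names. `split` cuts the input into chunks. Each worker thread pulls the next pending chunk from a shared queue whenever it is idle, so faster workers simply take more chunks. Counting records stands in for running a real map task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

class InputHandling {
    // Divides the input records into chunks of at most chunkSize records each.
    static <T> List<List<T>> split(List<T> records, int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < records.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, records.size());
            chunks.add(new ArrayList<>(records.subList(start, end)));
        }
        return chunks;
    }

    // Starts `workers` threads; each one keeps taking the next pending chunk
    // until the queue is empty, so a worker that finishes early gets more work.
    // Returns the total number of records processed.
    static int processAll(List<List<String>> chunks, int workers) {
        Queue<List<String>> pending = new ConcurrentLinkedQueue<>(chunks);
        AtomicInteger processed = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            Thread t = new Thread(() -> {
                List<String> chunk;
                while ((chunk = pending.poll()) != null) {
                    processed.addAndGet(chunk.size());  // stand-in for a real map task
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();                               // wait for every worker to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return processed.get();
    }
}
```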


Map

Input: (key, value) pair. Output: (key, value) pairs. Example: Annual rainfall per city


Map (Example)

Example: Annual rainfall per city

    map(String key, String value):
      // key: date
      // value: weather info
      foreach (City c in value)
        EmitIntermediate(c, c.rainfall)
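Here is a runnable Java version of this map function, as a sketch. The slide leaves the weather-info format open. This version therefore assumes a hypothetical format of comma-separated `City=rainfall` readings for one date, with rainfall as an integer. The date key is accepted but not needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class RainfallMap {
    // map(String key, String value)
    //   key:   date (not needed for this computation)
    //   value: weather info, assumed (hypothetically) to look like "Seattle=4,Portland=0"
    // Returns the intermediate (city, rainfall) pairs that EmitIntermediate would emit.
    static List<Map.Entry<String, Integer>> map(String key, String value) {
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String reading : value.split(",")) {
            String[] parts = reading.split("=");
            String city = parts[0].trim();
            int rainfall = Integer.parseInt(parts[1].trim());
            intermediate.add(Map.entry(city, rainfall));   // EmitIntermediate(c, c.rainfall)
        }
        return intermediate;
    }
}
```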


Partition Function

Allocates map output to particular reduces. Input: key, number of reduces. Output: index of the desired reduce. Typical: hash(key) % numberOfReduces
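In Java, `hashCode()` can be negative, and `%` keeps the sign of its left operand. A literal `hash(key) % numberOfReduces` could therefore produce a negative index. The minimal sketch below uses `Math.floorMod` instead; the class and method names are hypothetical.

```java
class Partitioner {
    // Typical partition function: hash(key) % numberOfReduces.
    // Math.floorMod keeps the result in [0, numberOfReduces) even when
    // hashCode() is negative, which a plain % would not.
    static int partition(String key, int numberOfReduces) {
        return Math.floorMod(key.hashCode(), numberOfReduces);
    }
}
```

Because the function is deterministic, a given key always maps to the same reduce. That is what ensures all of a city's readings end up in one place.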


Comparison

Sorts the input for each reduce. Example: Annual rainfall per city

Sorts the rainfall data for each city: Seattle: {0, 0, 0, 1, 4, 7, 10, …}
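Here is a sketch of this step for the rainfall example, with hypothetical names. It sorts the intermediate pairs by key (city) and then by value within each key, reproducing the ordered Seattle list. It then groups each city's values together. Sorting values as well as keys is an assumption here; some MapReduce implementations only guarantee key order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SortAndGroup {
    // Orders intermediate pairs by key (city), then by value, and groups each key's values.
    static Map<String, List<Integer>> sortAndGroup(List<Map.Entry<String, Integer>> pairs) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(pairs);
        sorted.sort(Map.Entry.<String, Integer>comparingByKey()
                .thenComparing(Map.Entry.<String, Integer>comparingByValue()));
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();  // keeps sorted key order
        for (Map.Entry<String, Integer> e : sorted) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return grouped;
    }
}
```

Take the pairs (Seattle, 4), (Portland, 2), (Seattle, 0), (Seattle, 1). The result maps Portland to `[2]` and Seattle to `[0, 1, 4]`, with Portland first.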


Reduce

Input: key, sorted list of values. Output: single value. Example: Annual rainfall per city



Reduce (Example)

Example: Annual rainfall per city

    reduce(String key, Iterator values):
      // key: city
      // values: rainfall readings for that city
      sum = 0
      for each (v in values)
        sum += v
      Emit(sum)   // total (annual) rainfall for the city
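Here is a Java sketch of this reducer. It assumes annual rainfall means the total of a city's readings for the year; the class name is hypothetical.

```java
import java.util.Iterator;

class RainfallReduce {
    // reduce(String key, Iterator values)
    //   key:    city
    //   values: that city's rainfall readings
    // Assumes "annual rainfall" is the total of the readings, so the reducer sums them.
    static int reduce(String key, Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;   // Emit(sum)
    }
}
```

For example, reducing Seattle with readings 0, 1, 4 emits 5.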


Output

Writes the output to storage (GFS, etc.)


[Figure: MapReduce dataflow. Input key*value pairs from data stores 1 to n feed map tasks, each emitting (key 1, values...), (key 2, values...), (key 3, values...). == Barrier ==: aggregates intermediate values by output key. One reduce task per key consumes "key k, intermediate values" and produces "final key k values".]
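To tie the stages together, here is a single-process sketch of the map, barrier, and reduce dataflow on this slide. The names and the `City=rainfall` record format are hypothetical. There is one map task per data store. A barrier then waits for all map output and groups it by key, and finally one reduce per key sums the readings. A real system would run these tasks on many machines; this only shows the order of the phases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MiniMapReduce {
    // Map task: one (city, rainfall) pair per record, records shaped like "Seattle=4".
    static List<Map.Entry<String, Integer>> mapTask(List<String> dataStore) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String record : dataStore) {
            String[] parts = record.split("=");
            out.add(Map.entry(parts[0], Integer.parseInt(parts[1])));
        }
        return out;
    }

    static Map<String, Integer> run(List<List<String>> dataStores) {
        // Map phase: each data store is processed independently.
        List<List<Map.Entry<String, Integer>>> mapOutputs = new ArrayList<>();
        for (List<String> store : dataStores) {
            mapOutputs.add(mapTask(store));
        }
        // Barrier: grouping starts only after every map task has finished;
        // intermediate values are aggregated by key (TreeMap keeps keys sorted).
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> output : mapOutputs) {
            for (Map.Entry<String, Integer> e : output) {
                grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }
        // Reduce phase: one reduce call per key, summing that key's values.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int total = 0;
            for (int v : e.getValue()) {
                total += v;
            }
            result.put(e.getKey(), total);
        }
        return result;
    }
}
```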


MapReduce for Google Local

Intersections, rendering tiles, finding nearest gas stations

