7/27/2019 MapReduce-xpbu
Introduction to MapReduce
ECE7610
The Age of Big Data
Big-data age:
Facebook collects 500 terabytes a day (2011)
Google collects 20000 PB a day (2011)
Data is an important asset to any organization: finance companies, insurance companies, internet companies
We need new algorithms/data structures/programming models
What to do? (Word Count)
Consider a large data collection and count the occurrences of the different words
[Figure: Main drives a WordCounter that parse()s the DataCollection and count()s each word into the ResultTable]
DataCollection: {web, weed, green, sun, moon, land, part, web, green, ...}
ResultTable:
web 2
weed 1
green 2
sun 1
moon 1
land 1
part 1
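The single-machine design above can be sketched in Java, the language of the Hadoop example later in the deck. This is a minimal sketch, not code from the slides: the class name WordCounter and the parse()/count() split follow the figure, while the tokenization rule (lower-casing and splitting on non-alphanumeric characters) is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-threaded word counter: one WordCounter parses the whole
// data collection and counts every word into one ResultTable (a HashMap here).
class WordCounter {
    // parse(): split raw text into lower-case words. Treating every
    // non-alphanumeric character as a delimiter is an assumption of this sketch.
    static List<String> parse(String text) {
        List<String> words = new ArrayList<>();
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!token.isEmpty()) { // a leading delimiter yields one empty token
                words.add(token);
            }
        }
        return words;
    }

    // count(): add one to each word's entry in the result table.
    static Map<String, Integer> count(List<String> words) {
        Map<String, Integer> resultTable = new HashMap<>();
        for (String word : words) {
            resultTable.merge(word, 1, Integer::sum);
        }
        return resultTable;
    }
}
```

On the sample collection from the figure, `WordCounter.count(WordCounter.parse("web, weed, green, sun, moon, land, part, web, green"))` gives web 2, green 2 and 1 for each remaining word, matching the ResultTable.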
What to do? (Word Count)
[Figure: Main starts 1..* Threads; each runs a WordCounter that parse()s the DataCollection and count()s into one shared ResultTable (web 2, weed 1, green 2, sun 1, moon 1, land 1, part 1)]
Multi-thread
Lock on shared data
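A minimal Java sketch of the multi-threaded variant, assuming the collection has already been divided into one word list per thread (the class name SharedTableWordCount is hypothetical). All threads update one shared table, so every update must hold a lock on the shared data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical multi-threaded word count: several threads share ONE result
// table, so each update is made while holding a lock on the shared data.
class SharedTableWordCount {
    // Assumes the input is already split into one word list per thread.
    static Map<String, Integer> count(List<List<String>> partitions) {
        Map<String, Integer> resultTable = new HashMap<>(); // not thread-safe by itself
        Object lock = new Object();                         // guards resultTable
        List<Thread> threads = new ArrayList<>();
        for (List<String> part : partitions) {
            Thread t = new Thread(() -> {
                for (String word : part) {
                    synchronized (lock) { // every increment contends for the same lock
                        resultTable.merge(word, 1, Integer::sum);
                    }
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // join() also makes the threads' updates visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for counters", e);
            }
        }
        return resultTable;
    }
}
```

The lock keeps the counts correct, but every increment serializes on it, so adding threads speeds up parsing while counting stays effectively sequential.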
What to do? (Word Count)
Single machine cannot serve all the data: you need a distributed special (file) system
Large number of commodity hardware disks: say, 1000 disks, 1 TB each
Critical aspects: fault tolerance, replication, load balancing, monitoring
Exploit parallelism afforded by splitting parsing and counting
Provision and locate computing at data locations
What to do? (Word Count)
[Figure: Main starts 1..* Threads; 1..* Parsers each read a separate DataCollection split and produce a WordList of KEY/VALUE pairs (KEY: web, weed, green, sun, moon, land, part, web, green, ...); 1..* Counters aggregate the pairs into the ResultTable (web 2, weed 1, green 2, sun 1, moon 1, land 1, part 1)]
Separate counters
Separate data
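The "separate data, separate counters" design removes the shared lock. A minimal sketch under the assumption that the input is already split into per-thread word lists (SeparateCountersWordCount is a hypothetical name): each thread fills a private table, and a final merge adds the partial tables together.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical "separate data, separate counters" word count: each thread counts
// its own split into a private table (no lock), then the tables are merged.
class SeparateCountersWordCount {
    static Map<String, Integer> count(List<List<String>> splits) {
        List<Map<String, Integer>> partials = new ArrayList<>();
        List<Thread> threads = new ArrayList<>();
        for (List<String> split : splits) {
            Map<String, Integer> local = new HashMap<>(); // owned by exactly one thread
            partials.add(local);
            Thread t = new Thread(() -> {
                for (String word : split) {
                    local.merge(word, 1, Integer::sum);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for counters", e);
            }
        }
        // Merge step: combine the per-thread tables into one result table.
        Map<String, Integer> resultTable = new HashMap<>();
        for (Map<String, Integer> local : partials) {
            local.forEach((word, n) -> resultTable.merge(word, n, Integer::sum));
        }
        return resultTable;
    }
}
```

The final merge plays the same role as the reduce phase that MapReduce automates.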
It is not easy to parallelize
Fundamental issues: scheduling, data distribution, synchronization, inter-process communication, robustness, fault tolerance, ...
Different programming models: message passing, shared memory
Architectural issues: Flynn's taxonomy (SIMD, MIMD, etc.), network topology, bisection bandwidth, cache coherence, ...
Common problems: livelock, deadlock, data starvation, priority inversion, dining philosophers, sleeping barbers, cigarette smokers, ...
Different programming constructs: mutexes, condition variables, barriers, masters/slaves, producers/consumers, work queues, ...
Actually, a programmer's nightmare.
MapReduce: Automate for You
Important distributed parallel programming paradigm for large-scale applications.
Has become one of the core technologies powering big IT companies, like Google, IBM, Yahoo and Facebook.
The framework runs on a cluster of machines, automatically partitions jobs into a number of small tasks and processes them in parallel.
Features: fairness, task data locality, fault tolerance.
MapReduce
[Figure: MAP: input data]
MapReduce
[Figure: MAP: input data; the map outputs feed three Reduce tasks (REDUCE)]
C. Xu @ Wayne State
[Figure: large-scale data splits, each handled by a Parse-hash map task; the map output <key, 1> is hashed to three Count reducers]
Map: <key, 1>
Reducers (say, Count)
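The parse-hash step can be sketched in memory. A hypothetical HashPartitionDemo class runs one map loop per split, emits <word, 1>, and routes each pair to one of R reducers by hashing the key. The partition formula `(key.hashCode() & Integer.MAX_VALUE) % R` is the one used by Hadoop's default HashPartitioner; everything else is an illustration, not Hadoop code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory illustration of parse-hash: map tasks emit <word, 1>
// and each pair is sent to the reducer chosen by hashing its key.
class HashPartitionDemo {
    // Same formula as Hadoop's default HashPartitioner: clear the sign bit, then mod R.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Returns, for each reducer r, the word counts that reducer is responsible for.
    static List<Map<String, Integer>> run(List<List<String>> splits, int numReducers) {
        List<Map<String, Integer>> reducers = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) {
            reducers.add(new HashMap<>());
        }
        for (List<String> split : splits) {      // one map task per split
            for (String word : split) {          // emit <word, 1>
                int r = partition(word, numReducers);
                reducers.get(r).merge(word, 1, Integer::sum); // reducer r counts it
            }
        }
        return reducers;
    }
}
```

Because the reducer index depends only on the key, every occurrence of "web" lands at the same reducer, which can therefore produce the final count for "web" on its own.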
MapReduce
How to store the data?
[Figure: Compute Nodes]
What's the problem here?
Distributed File System
Don't move data to workers; move workers to the data
Store data on the local disks of nodes in the cluster
Start up the workers on the node that has the data local
Why?
Not enough RAM to hold all the data in memory
Network is the bottleneck; disk throughput is good
A distributed file system is the answer: GFS (Google File System), HDFS for Hadoop
GFS/HDFS Design
Commodity hardware over exotic hardware
High component failure rates
Files stored as chunks
Fixed size (64 MB)
Reliability through replication
Each chunk replicated across chunkservers
Single master to coordinate access, keep metadata
Simple centralized management
No data caching
Little benefit due to large data sets, streaming reads
Simplify the API
Push some of the issues onto the client
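A small sketch of the chunking arithmetic with the 64 MB fixed chunk size from the slide. The replica placement rule here (replica j of chunk i goes to server (i + j) mod N) is a hypothetical round-robin scheme chosen for illustration; the real GFS master and HDFS NameNode apply their own placement policies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical GFS/HDFS-style storage sketch: cut a file into fixed-size chunks
// and replicate each chunk on several distinct chunkservers.
class ChunkPlacement {
    static final long CHUNK_SIZE = 64L * 1024 * 1024; // 64 MB, as on the slide

    // Number of chunks needed for a file (ceiling division).
    static long numChunks(long fileSizeBytes) {
        return (fileSizeBytes + CHUNK_SIZE - 1) / CHUNK_SIZE;
    }

    // Assumed round-robin rule, NOT the real GFS/HDFS policy:
    // replica j of chunk i is stored on server (i + j) mod numServers.
    static List<List<Integer>> place(long fileSizeBytes, int numServers, int replicas) {
        if (replicas > numServers) {
            throw new IllegalArgumentException("need at least as many servers as replicas");
        }
        long chunks = numChunks(fileSizeBytes);
        List<List<Integer>> placement = new ArrayList<>();
        for (long i = 0; i < chunks; i++) {
            List<Integer> servers = new ArrayList<>();
            for (int j = 0; j < replicas; j++) {
                servers.add((int) ((i + j) % numServers)); // distinct since replicas <= numServers
            }
            placement.add(servers);
        }
        return placement;
    }
}
```

For example, a 1 GB file needs 16 chunks; with 3 replicas each, it occupies 48 chunk slots across the cluster.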
GFS/HDFS
MapReduce Data Locality
Master scheduling policy:
Asks HDFS for the locations of replicas of input file blocks
Map tasks typically split into 64 MB (= GFS block size)
Locality levels: node locality / rack locality / off-rack
Map tasks scheduled as close to their input data as possible
Effect: thousands of machines read input at local disk speed. Without this, rack switches limit the read rate and network bandwidth becomes the bottleneck.
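The locality levels above can be expressed as a preference order applied when a node asks for work. This is a sketch under assumptions: the master already knows each pending task's replica locations (as it would after asking HDFS) and has a node-to-rack map; LocalityScheduler and MapTask are hypothetical names, not Hadoop classes.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical locality-aware task selection: prefer a node-local map task,
// then a rack-local one, and only then an off-rack one.
class LocalityScheduler {
    // Declared best-first, so enum order doubles as preference order.
    enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_RACK }

    // A pending map task plus the nodes holding replicas of its input block.
    static final class MapTask {
        final String id;
        final List<String> replicaNodes;
        MapTask(String id, List<String> replicaNodes) {
            this.id = id;
            this.replicaNodes = replicaNodes;
        }
    }

    static Locality localityOf(MapTask task, String node, Map<String, String> rackOf) {
        if (task.replicaNodes.contains(node)) {
            return Locality.NODE_LOCAL;
        }
        String rack = rackOf.get(node);
        for (String replica : task.replicaNodes) {
            if (rack != null && rack.equals(rackOf.get(replica))) {
                return Locality.RACK_LOCAL;
            }
        }
        return Locality.OFF_RACK;
    }

    // Pick the pending task with the best locality for the requesting node
    // (ties keep the earlier task in the list).
    static Optional<MapTask> pick(List<MapTask> pending, String node, Map<String, String> rackOf) {
        MapTask best = null;
        for (MapTask task : pending) {
            if (best == null
                    || localityOf(task, node, rackOf).compareTo(localityOf(best, node, rackOf)) < 0) {
                best = task;
            }
        }
        return Optional.ofNullable(best);
    }
}
```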
MapReduce Fault Tolerance
Reactive way
Worker failure
• Heartbeat: workers are periodically pinged by the master; no response = failed worker
• If the processor of a worker fails, the tasks of that worker are reassigned to another worker.
Master failure
• Master writes periodic checkpoints
• Another master can be started from the last checkpointed state
• If the master eventually dies, the job will be aborted
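Reactive worker-failure handling can be sketched as a master that records heartbeat timestamps. Assumptions: time is passed in explicitly in milliseconds, a fixed timeout defines "no response", and orphaned tasks go to the least-loaded live worker. HeartbeatMaster and its methods are hypothetical, and master checkpointing is not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical master-side failure detector: a worker silent for longer than
// the timeout is treated as failed and its tasks are reassigned.
class HeartbeatMaster {
    private final long timeoutMillis;                                // assumed fixed timeout
    private final Map<String, Long> lastHeartbeat = new HashMap<>(); // live workers only
    private final Map<String, List<String>> tasks = new HashMap<>(); // worker -> task ids

    HeartbeatMaster(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // A worker answered the master's ping at time nowMillis.
    void heartbeat(String worker, long nowMillis) {
        lastHeartbeat.put(worker, nowMillis);
        tasks.computeIfAbsent(worker, w -> new ArrayList<>());
    }

    void assign(String worker, String taskId) {
        tasks.computeIfAbsent(worker, w -> new ArrayList<>()).add(taskId);
    }

    List<String> tasksOn(String worker) {
        return tasks.getOrDefault(worker, new ArrayList<>());
    }

    // Run periodically: "no response = failed worker". Returns the failed
    // workers; their tasks move to the least-loaded live worker.
    List<String> checkFailures(long nowMillis) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                failed.add(e.getKey());
            }
        }
        for (String dead : failed) {
            lastHeartbeat.remove(dead); // drop every failed worker before picking targets
        }
        for (String dead : failed) {
            List<String> orphaned = tasks.remove(dead);
            if (orphaned != null && !orphaned.isEmpty()) {
                tasks.get(leastLoadedLiveWorker()).addAll(orphaned);
            }
        }
        return failed;
    }

    private String leastLoadedLiveWorker() {
        String best = null;
        for (String w : lastHeartbeat.keySet()) {
            if (best == null || tasks.get(w).size() < tasks.get(best).size()) {
                best = w;
            }
        }
        if (best == null) {
            throw new IllegalStateException("no live worker left to take over the tasks");
        }
        return best;
    }
}
```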
MapReduce Fault Tolerance
Proactive way (Speculative Execution)
The problem of "stragglers" (slow workers):
• Other jobs consuming resources on the machine
• Bad disks with soft errors transfer data very slowly
• Weird things: processor caches disabled
When the computation is almost done, reschedule in-progress tasks
Whenever either the primary or the backup execution finishes, mark the task as completed
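Speculative execution can be sketched with the standard java.util.concurrent API: `ExecutorService.invokeAny` returns the result of the first task that completes successfully and cancels the rest, which matches "whichever of the primary or backup finishes first wins". The class name SpeculativeExecution and the `almostDone` threshold check are assumptions for illustration; Hadoop's own speculation logic is more involved.

```java
import java.util.Arrays;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical speculative-execution helper: race a primary and a backup copy
// of the same task and keep whichever finishes first.
class SpeculativeExecution {
    // Launch backups only near the end of the job; the threshold (e.g. 0.9) is an assumption.
    static boolean almostDone(int completedTasks, int totalTasks, double threshold) {
        return totalTasks > 0 && (double) completedTasks / totalTasks >= threshold;
    }

    // invokeAny returns the first successful result and cancels the other copy.
    static <T> T runWithBackup(Callable<T> primary, Callable<T> backup) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            return pool.invokeAny(Arrays.asList(primary, backup));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the task", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("both copies of the task failed", e);
        } finally {
            pool.shutdownNow(); // interrupt the losing copy if it is still running
        }
    }
}
```

For example, if the primary copy sleeps for two seconds (a straggler) and the backup returns immediately, runWithBackup returns the backup's result and interrupts the primary.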
MapReduce Scheduling
Fair Sharing: conducts fair scheduling using a greedy method to maintain data locality
Delay: uses a delay scheduling algorithm to achieve good data locality by slightly compromising the fairness restriction
LATE (Longest Approximate Time to End): improves the performance of MapReduce applications in heterogeneous environments, such as virtualized environments, through accurate speculative execution
Capacity: introduced by Yahoo; supports multiple queues for shared users and guarantees each queue a fraction of the capacity of the cluster
MapReduce Cloud Service
Providing MapReduce frameworks as a service in clouds has become an attractive usage model for enterprises.
A MapReduce cloud service allows users to cost-effectively access a large amount of computing resources without creating their own cluster.
Users are able to adjust the scale of MapReduce clusters in response to changes in the resource demand of applications.
Amazon Elastic MR
[Figure: You → EC2 → Your Hadoop Cluster]
0. Allocate Hadoop cluster
1. Scp data to cluster
2. Move data into HDFS
3. Develop code locally
4. Submit MapReduce job
4a. Go back to Step 3
5. Move data out of HDFS
6. Scp data from cluster
7. Clean up
New Challenges
Interference between co-hosted VMs: slows down the job 1.5-7 times
Locality-preserving policy no longer effective: loses more than 20% locality (depends)
Need a specifically designed scheduler for virtual MapReduce clusters:
Interference-aware
Locality-aware
MapReduce Programming
Hadoop implementation of MR in Java (version 1.0.0)
WordCount example: hadoop-1.0.0/src/examples/org/apache/hadoop/examples/WordCount.java
MapReduce Programming
Map
Implement your own map class extending the Mapper class
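In Hadoop's org.apache.hadoop.mapreduce API, a mapper extends Mapper and writes pairs through a Context object. The sketch keeps that shape in plain Java so it runs without Hadoop: the Emitter interface is a hypothetical stand-in for Context, and WordCountMap is not a Hadoop class.

```java
import java.util.StringTokenizer;

// Hypothetical stand-in for Hadoop's Context: receives the emitted <key, value> pairs.
interface Emitter<K, V> {
    void emit(K key, V value);
}

// Plain-Java sketch of a WordCount map function (not the Hadoop API).
class WordCountMap {
    // map(offset, line): called once per input line; emits <word, 1> per token.
    static void map(long offset, String line, Emitter<String, Integer> out) {
        StringTokenizer itr = new StringTokenizer(line); // splits on whitespace
        while (itr.hasMoreTokens()) {
            out.emit(itr.nextToken(), 1);
        }
    }
}
```

Calling `WordCountMap.map(0L, "web weed web", (k, v) -> ...)` delivers the pairs <web, 1>, <weed, 1>, <web, 1> to the emitter; summing them is left to the reduce side.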
Reduce
Implement your own reducer class extending the Reducer class
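In Hadoop's org.apache.hadoop.mapreduce API, a reducer extends Reducer and is called once per key with an Iterable of all that key's values. A plain-Java sketch of that contract follows (WordCountReduce is a hypothetical name, not the Hadoop API): groupByKey imitates the framework's shuffle/sort step, and reduce sums the counts for one word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of a WordCount reduce side (not the Hadoop API).
class WordCountReduce {
    // reduce(word, counts): called once per key; sums that word's partial counts.
    static int reduce(String word, Iterable<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Imitates shuffle/sort: gather all values of each key, keys in sorted order.
    static Map<String, List<Integer>> groupByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Runs reduce once per group and returns word -> total.
    static Map<String, Integer> reduceAll(Map<String, List<Integer>> groups) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : groups.entrySet()) {
            result.put(g.getKey(), reduce(g.getKey(), g.getValue()));
        }
        return result;
    }
}
```

Grouping the pairs <web, 1>, <green, 1>, <web, 1>, <sun, 1> and reducing them yields {green=1, sun=1, web=2}.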
Main()
Demo