MapReduce-xpbu

Date post: 14-Apr-2018
Upload: arteepu4

Transcript
  • 7/27/2019 MapReduce-xpbu

    1/29

    Introduction to MapReduce
    ECE7610

    2/29

    The Age of Big-Data

    Big-data age
      Facebook collects 500 terabytes a day (2011)
      Google collects 20000 PB a day (2011)

    Data is an important asset to any organization
      Finance company, insurance company, internet company

    We need new algorithms/data structures/programming model

    3/29

    What to do? (Word Count)

    Consider a large data collection and count the occurrences of the different words.

    [Class diagram: Main, DataCollection, WordCounter (parse(), count()), ResultTable]

    Data collection: {web, weed, green, sun, moon, land, part, web, green, ...}

    ResultTable:
      web    2
      weed   1
      green  2
      sun    1
      moon   1
      land   1
      part   1
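The single-machine version of the word count can be sketched in a few lines of Python. The function names `parse` and `count` follow the WordCounter box in the diagram; the whitespace tokenization is an assumption made for illustration.

```python
from collections import Counter

def parse(text):
    """Split raw text into words (whitespace-delimited, an assumed rule)."""
    return text.split()

def count(words):
    """Tally how many times each word occurs."""
    return Counter(words)

# The sample data collection from the slide.
data_collection = "web weed green sun moon land part web green"
result_table = count(parse(data_collection))
```

Everything runs in one thread over one in-memory collection, which is the baseline the following slides try to scale up.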

    4/29

    What to do? (Word Count)

    Multi-thread: lock on shared data

    [Class diagram: Main, Thread (1..*), WordCounter (1..*; parse(), count()), DataCollection, ResultTable]

    ResultTable: web 2, weed 1, green 2, sun 1, moon 1, land 1, part 1
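A minimal sketch of the multi-threaded variant, assuming the data collection has already been cut into three chunks (the chunking and the names `word_counter` and `table_lock` are illustrative, not from the slides). Every thread updates one shared result table, so each update is guarded by a lock.

```python
import threading
from collections import Counter

result_table = Counter()        # shared by all threads
table_lock = threading.Lock()   # guards every update to result_table

def word_counter(chunk):
    """Parse one chunk and add its words to the shared table under the lock."""
    for word in chunk.split():
        with table_lock:
            result_table[word] += 1

chunks = ["web weed green", "sun moon land", "part web green"]
threads = [threading.Thread(target=word_counter, args=(c,)) for c in chunks]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The lock keeps the counts correct, but all threads contend for it on every word, which is the cost the slide title points at.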

    5/29

    What to do? (Word Count)

    Data collection
      A single machine cannot serve all the data: you need a distributed special (file) system
      Large number of commodity hardware disks: say, 1000 disks of 1 TB each
      Critical aspects: fault tolerance, replication, load balancing, monitoring
      Exploit parallelism afforded by splitting parsing and counting
      Provision and locate computing at data locations

    6/29

    What to do? (Word Count)

    Separate counters, separate data

    [Class diagram: Main, Thread (1..*), Parser (1..*), Counter (1..*), DataCollection (several), WordList, ResultTable]

    WordList (KEY / VALUE): web, weed, green, sun, moon, land, part, web, green, ...

    ResultTable: web 2, weed 1, green 2, sun 1, moon 1, land 1, part 1
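With separate data and separate counters, no lock is needed while counting. The sketch below assumes three data partitions and a thread pool; `count_partition` is a hypothetical helper name. Each worker builds a private counter, and the private counters are merged once at the end.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_partition(partition):
    """Parser + Counter for one data collection; touches no shared state."""
    return Counter(partition.split())

partitions = ["web weed green", "sun moon land", "part web green"]
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_counts = list(pool.map(count_partition, partitions))

# Merge the private counters once, after all workers are done.
result_table = sum(partial_counts, Counter())
```

This split into independent per-partition work followed by a merge is the shape that MapReduce generalizes.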

    7/29

    It is not easy to parallelize!

    Fundamental issues
      Scheduling, data distribution, synchronization, inter-process communication, robustness, fault tolerance, ...

    Different programming models
      Message Passing, Shared Memory

    Architectural issues
      Flynn's taxonomy (SIMD, MIMD, etc.), network topology, bisection bandwidth, cache coherence, ...

    Common problems
      Livelock, deadlock, data starvation, priority inversion, dining philosophers, sleeping barbers, cigarette smokers, ...

    Different programming constructs
      Mutexes, condition variables, barriers, masters/slaves, producers/consumers, work queues, ...

    Actually, Programmer's Nightmare.

    8/29

    MapReduce: Automate for you

    An important distributed parallel programming paradigm for large-scale applications.
    It has become one of the core technologies powering big IT companies, like Google, IBM, Yahoo and Facebook.
    The framework runs on a cluster of machines and automatically partitions jobs into a number of small tasks and processes them in parallel.

    Features: fairness, task data locality, fault-tolerance.

    9/29

    MapReduce

    MAP: Input data

    10/29

    MapReduce

    [Diagram: MAP: input data; REDUCE: three Reduce tasks]

    11/29

    C. Xu @ Wayne State

    [Diagram: large-scale data splits -> Parse-hash (x4) -> Count (x3)]

    Map <key, 1>; Reducers (say, Count)
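The parse-hash and count stages in the diagram can be imitated on one machine. This is a sketch under assumptions: three reducers (matching the three Count boxes), four made-up splits, and CRC32 as the hash so the partitioning is deterministic. None of these names come from Hadoop.

```python
import zlib
from collections import defaultdict

NUM_REDUCERS = 3  # assumed; the diagram draws three Count boxes

def map_phase(split):
    """Parse one split and emit a <word, 1> pair per word."""
    return [(word, 1) for word in split.split()]

def partition(key):
    """Hash the key so that every pair for a word reaches the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % NUM_REDUCERS

def reduce_phase(pairs):
    """Count: sum the 1s for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

splits = ["web weed green", "sun moon", "land part", "web green"]
buckets = [[] for _ in range(NUM_REDUCERS)]
for split in splits:
    for key, value in map_phase(split):
        buckets[partition(key)].append((key, value))

# Each reducer sees a disjoint set of keys, so the outputs simply combine.
result = {}
for bucket in buckets:
    result.update(reduce_phase(bucket))
```

Because the partition is a function of the key alone, both occurrences of "web" land in the same bucket even though they come from different splits.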

    12/29

    MapReduce

    13/29

    How to store the data?

    Compute Nodes

    What's the problem here?

    14/29

    Distributed File System

    Don't move data to workers! Move workers to the data
      Store data on the local disks for nodes in the cluster
      Start up the workers on the node that has the data local

    Why?
      Not enough RAM to hold all the data in memory
      Network is the bottleneck, disk throughput is good

    A distributed file system is the answer
      GFS (Google File System)
      HDFS for Hadoop

    15/29

    GFS/HDFS Design

    Commodity hardware over exotic hardware
      High component failure rates
    Files stored as chunks
      Fixed size (64MB)
    Reliability through replication
      Each chunk replicated across chunkservers
    Single master to coordinate access, keep metadata
      Simple centralized management
    No data caching
      Little benefit due to large data sets, streaming reads
    Simplify the API
      Push some of the issues onto the client
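To make the chunk and replication ideas concrete, here is a small sketch. It assumes 64 MB chunks, a replication factor of 3, and a simple round-robin placement; the placement rule and the names `chunk_count` and `place_replicas` are illustrative assumptions, not the actual GFS or HDFS policy.

```python
CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB fixed chunk size
REPLICAS = 3                    # assumed replication factor

def chunk_count(file_size):
    """Number of fixed-size chunks needed for a file (last chunk may be partial)."""
    return (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE

def place_replicas(num_chunks, servers, replicas=REPLICAS):
    """Round-robin placement: chunk i goes to `replicas` consecutive servers."""
    if replicas > len(servers):
        raise ValueError("need at least as many servers as replicas")
    return {
        i: [servers[(i + r) % len(servers)] for r in range(replicas)]
        for i in range(num_chunks)
    }

servers = ["cs0", "cs1", "cs2", "cs3"]     # hypothetical chunkservers
n = chunk_count(200 * 1024 * 1024)         # a 200 MB file
metadata = place_replicas(n, servers)      # the kind of map a single master keeps
```

The `metadata` dictionary stands in for what the single master tracks: which chunkservers hold each chunk. Losing any one server still leaves two copies of every chunk.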

    16/29

    GFS/HDFS

    17/29

    MapReduce Data Locality

    Master scheduling policy
      Asks HDFS for locations of replicas of input file blocks
      Map tasks typically split into 64MB (== GFS block size)
      Locality levels: node locality/rack locality/off-rack
      Map tasks scheduled as close to their input data as possible

    Effect
      Thousands of machines read input at local disk speed. Without this, rack switches limit read rate and network bandwidth becomes the bottleneck.
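The three locality levels suggest a simple scheduling rule: when a node becomes free, prefer a task whose input replica is on that node, then one in the same rack, then anything else. The sketch below assumes a made-up four-node, two-rack cluster and hypothetical function names; it is not Hadoop's scheduler.

```python
NODE_LOCAL, RACK_LOCAL, OFF_RACK = 0, 1, 2

def locality(node, replica_nodes, rack_of):
    """Classify how close `node` is to any replica of a task's input block."""
    if node in replica_nodes:
        return NODE_LOCAL
    if rack_of[node] in {rack_of[r] for r in replica_nodes}:
        return RACK_LOCAL
    return OFF_RACK

def pick_task(free_node, pending, rack_of):
    """Choose the pending map task whose input is closest to the free node."""
    return min(pending, key=lambda task: locality(free_node, pending[task], rack_of))

# Assumed topology: node -> rack, and task -> nodes holding its input replicas.
rack_of = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r2"}
pending = {"map0": ["n3", "n4"], "map1": ["n2", "n3"], "map2": ["n1", "n4"]}

chosen = pick_task("n1", pending, rack_of)
```

For the free node `n1`, `map2` is node-local, `map1` is rack-local, and `map0` is off-rack, so `map2` is chosen.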

    18/29

    MapReduce Fault-tolerance

    Reactive way

    Worker failure
      - Heartbeat: workers are periodically pinged by the master
        NO response = failed worker
      - If the processor of a worker fails, the tasks of that worker are reassigned to another worker.

    Master failure
      - Master writes periodic checkpoints
      - Another master can be started from the last checkpointed state
      - If eventually the master dies, the job will be aborted
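The worker-failure path can be sketched as two steps: detect workers whose last heartbeat response is too old, then hand their tasks to live workers. The 10-second timeout, the least-loaded reassignment rule, and all names below are assumptions for illustration.

```python
HEARTBEAT_TIMEOUT = 10.0  # seconds; an assumed value

def failed_workers(last_heartbeat, now, timeout=HEARTBEAT_TIMEOUT):
    """Workers whose last response is older than the timeout count as failed."""
    return {w for w, seen in last_heartbeat.items() if now - seen > timeout}

def reassign(assignments, failed):
    """Move every task of a failed worker to the least-loaded live worker."""
    live = {w: list(tasks) for w, tasks in assignments.items() if w not in failed}
    if not live:
        raise RuntimeError("no live workers left")
    for worker in sorted(failed):
        for task in assignments.get(worker, []):
            # Ties on load are broken by worker name to keep the result deterministic.
            target = min(live, key=lambda w: (len(live[w]), w))
            live[target].append(task)
    return live

last_heartbeat = {"w1": 100.0, "w2": 85.0, "w3": 99.0}   # time of last response
assignments = {"w1": ["t1"], "w2": ["t2", "t3"], "w3": []}
dead = failed_workers(last_heartbeat, now=101.0)
new_assignments = reassign(assignments, dead)
```

Here `w2` last answered 16 seconds ago, so it is declared failed and its two tasks are spread over `w3` and `w1`.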

    19/29

    MapReduce Fault-tolerance

    Proactive way (Speculative Execution)

    The problem of "stragglers" (slow workers)
      - Other jobs consuming resources on the machine
      - Bad disks with soft errors transfer data very slowly
      - Weird things: processor caches disabled

    When computation is almost done, reschedule in-progress tasks

    Whenever either the primary or the backup execution finishes, mark the task as completed
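Speculative execution amounts to racing two attempts of the same task and accepting whichever finishes first. In this sketch the straggling primary is simulated by blocking it on an event; a real framework would kill the losing attempt instead of releasing it. All names are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def run_attempt(name, slow_gate):
    """One execution of the task; a straggler blocks on `slow_gate`."""
    if slow_gate is not None:
        slow_gate.wait()
    return name

def run_with_backup():
    gate = threading.Event()                # keeps the primary "stuck"
    with ThreadPoolExecutor(max_workers=2) as pool:
        primary = pool.submit(run_attempt, "primary", gate)
        backup = pool.submit(run_attempt, "backup", None)
        done, _ = wait([primary, backup], return_when=FIRST_COMPLETED)
        winner = next(iter(done)).result()  # first finisher completes the task
        gate.set()                          # release the straggler so the pool can exit
    return winner

winner = run_with_backup()
```

The task is marked complete as soon as the backup returns, so the job no longer waits on the slow worker.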

    20/29

    MapReduce Scheduling

    Fair Sharing
      conducts fair scheduling using a greedy method to maintain data locality
    Delay
      uses the delay scheduling algorithm to achieve good data locality by slightly compromising the fairness restriction
    LATE (Longest Approximate Time to End)
      improves MapReduce applications' performance in heterogeneous environments, like virtualized environments, through accurate speculative execution
    Capacity
      introduced by Yahoo, supports multiple queues for shared users and guarantees each queue a fraction of the capacity of the cluster
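The core of LATE is its time-to-end estimate: a task's progress rate is its progress score divided by its running time, and the remaining time is the unfinished fraction divided by that rate. The sketch below shows only this estimate and the choice of the task with the longest remaining time; the thresholds the full algorithm uses to limit speculation are omitted, and the task numbers are made up.

```python
def time_to_end(progress, elapsed):
    """Estimate remaining time as (1 - progress) / (progress / elapsed)."""
    if progress <= 0:
        return float("inf")   # no progress yet: treat as arbitrarily slow
    rate = progress / elapsed
    return (1.0 - progress) / rate

def pick_speculative(tasks):
    """Return the running task expected to finish farthest in the future."""
    return max(tasks, key=lambda name: time_to_end(*tasks[name]))

# task -> (progress score in [0, 1], seconds running); illustrative values
running = {"t1": (0.9, 30.0), "t2": (0.2, 40.0), "t3": (0.5, 20.0)}
candidate = pick_speculative(running)
```

Task `t2` has made the least progress per second, so its estimated time to end is the longest and it is the one to back up.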

    21/29

    MapReduce Cloud Service

    Providing MapReduce frameworks as a service in clouds has become an attractive usage model for enterprises.
    A MapReduce cloud service allows users to cost-effectively access a large amount of computing resources without creating their own cluster.
    Users are able to adjust the scale of MapReduce clusters in response to changes in the resource demand of applications.

    22/29

    Amazon Elastic MR

    [Diagram: You <-> EC2 (Your Hadoop Cluster)]

    0. Allocate Hadoop cluster
    1. Scp data to cluster
    2. Move data into HDFS
    3. Develop code locally
    4. Submit MapReduce job
    4a. Go back to Step 3
    5. Move data out of HDFS
    6. Scp data from cluster
    7. Clean up

    23/29

    New Challenges

    Interference between co-hosted VMs
      Slows down the job 1.5-7 times
    Locality preserving policy no longer effective
      Lose more than 20% locality (depends)
    Need a specifically designed scheduler for virtual MapReduce clusters
      Interference-aware
      Locality-aware

    24/29

    MapReduce Programming

    Hadoop: implementation of MR in Java (version 1.0.x)
    WordCount example: hadoop-1.0.x/src/examples/org/apache/hadoop/examples/WordCount.java

    25/29

    MapReduce Programming

    26/29

    Map

    Implement your own map class extending the Mapper class

    27/29

    Reduce

    Implement your own reducer class extending the Reducer class

    28/29

    Main()
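The code for these three slides did not survive in the transcript. As a stand-in, here is a plain-Python analogue of the same structure: a map class extending a Mapper base, a reduce class extending a Reducer base, and a driver. The base classes and `main` below are illustrative and are not the Hadoop API; the subclass names mirror those used in Hadoop's WordCount example, but the bodies are a simplification.

```python
from collections import defaultdict

class Mapper:
    """Illustrative base class; subclasses override map() and return (key, value) pairs."""
    def map(self, key, value):
        raise NotImplementedError

class Reducer:
    """Illustrative base class; subclasses override reduce() and return one (key, value) pair."""
    def reduce(self, key, values):
        raise NotImplementedError

class TokenizerMapper(Mapper):
    def map(self, key, value):
        # key is the line offset (unused); value is the line text.
        return [(word, 1) for word in value.split()]

class IntSumReducer(Reducer):
    def reduce(self, key, values):
        return key, sum(values)

def main(lines, mapper, reducer):
    """Driver: run map on every line, group values by key, then run reduce."""
    grouped = defaultdict(list)
    for offset, line in enumerate(lines):
        for key, value in mapper.map(offset, line):
            grouped[key].append(value)
    return dict(reducer.reduce(k, grouped[k]) for k in sorted(grouped))

counts = main(["web weed green", "sun moon land part", "web green"],
              TokenizerMapper(), IntSumReducer())
```

In Hadoop the grouping step inside `main` is done by the framework's shuffle, and the driver only configures the job with the mapper and reducer classes and the input and output paths.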

    29/29

    Demo

