
TDDA69 Data and Program Structure
Parallel and Distributed Computing

Cyrille Berger

2 / 64

List of lectures
1 Introduction and Functional Programming
2 Imperative Programming and Data Structures
3 Environment
4 Evaluation
5 Object Oriented Programming
6 Macros and decorators
7 Virtual Machines and Bytecode
8 Garbage Collection and Native Code
9 Parallel and Distributed Computing
10 Logic Programming
11 Summary

3 / 64

Lecture goal
Learn about the concepts and challenges of distributed computing
The impact of distributed programming on programming languages and their implementations

4 / 64

Lecture content
Parallel Programming
Multithreaded Programming
The States Problems and Solutions
Atomic actions
Language and Interpreter Design Considerations
Single Instruction, Multiple Threads Programming
Distributed programming
Message Passing
MapReduce


5 / 64

Concurrent computing
In concurrent computing, several computations are executed at the same time
In parallel computing, all computation units have access to shared memory (for instance, in a single process)
In distributed computing, computation units communicate through message passing

6 / 64

Benefits of concurrent computing
Faster
Responsiveness
Interactive applications can perform two tasks at the same time: rendering, spell checking...
Availability of services
Load balancing between servers
Controllability
Tasks requiring certain preconditions can suspend and wait until the preconditions hold, then resume execution transparently.

7 / 64

Disadvantages of concurrent computing
Concurrency is hard to implement properly
Safety
Easy to corrupt
Deadlock
Tasks can wait indefinitely for each other
Non-determinism
Not always faster!
The memory bandwidth and the CPU cache are shared between the computation units

8 / 64

Concurrent computing programming
Four basic approaches to computing:
Sequential programming: no concurrency
Declarative concurrency: streams in a functional language
Message passing: with active objects, used in distributed computing
Atomic actions: on a shared memory, used in parallel computing


9 / 64

Stream Programming in Functional Programming
No global state
Functions only act on their input, so they are reentrant
Functions can then be executed in parallel
As long as they do not depend on the output of another function
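A minimal sketch of this idea in Python (the function and inputs are illustrative): because each call is pure and independent, an executor is free to run the calls in any order or in parallel without changing the result.

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    # Pure function: acts only on its input and touches no global
    # state, so concurrent calls cannot interfere with each other.
    return x * x

inputs = [1, 2, 3, 4]

# No call depends on the output of another, so the executor may
# schedule them in parallel; map still returns results in order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(square, inputs))

print(results)  # [1, 4, 9, 16]
```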

Parallel Programming

11 / 64

Parallel Programming
In parallel computing, several computations are executed at the same time and have access to shared memory

[Diagram: Unit, Unit, Unit, all connected to a single shared Memory]

12 / 64

SIMD, SIMT, SMT (1/2)
SIMD: Single Instruction, Multiple Data
Elements of a short vector (4 to 8 elements) are processed in parallel
SIMT: Single Instruction, Multiple Threads
The same instruction is executed by multiple threads (from 128 to 3048, or more in the future)
SMT: Simultaneous Multithreading
General purpose; different instructions are executed by different threads


13 / 64

SIMD, SIMT, SMT (2/2)
SIMD:
PUSH [1, 2, 3, 4]
PUSH [4, 5, 6, 7]
VEC_ADD_4

SIMT:
execute([1,2,3,4], [4,5,6,7],
        lambda a, b, ti: a[ti] = a[ti] + max(b[ti], 5))

SMT:
a = [1, 2, 3, 4]
b = [4, 5, 6, 7]
...
Thread.new(lambda: a = a + b)
Thread.new(lambda: c = c * b)

14 / 64

Why the need for the different models?
Flexibility: SMT > SIMT > SIMD
Less flexibility gives higher performance
Unless the lack of flexibility prevents accomplishing the task
Performance: SIMD > SIMT > SMT

Multithreaded Programming

16 / 64

Single threaded vs Multithreaded

Page 5: List of lecturesTDDA69/lectures/2015/09_concurrent.pdfTDDA69 Data and Program Structure Parallel and Distributed Computing Cyrille Berger 2 / 64 List of lectures 1Introduction and

17 / 64

Multithreaded Programming Model
Start with a single root thread
Fork: to create concurrently executing threads
Join: to synchronize threads
Threads communicate through shared memory
Threads execute asynchronously
They may or may not execute on different processors

[Diagram: a main thread forks into sub0 ... subn, which join back into main, twice in sequence]

18 / 64

A multithreaded example
thread1 = new Thread(function() { /* do some computation */ });
thread2 = new Thread(function() { /* do some computation */ });
thread1.start();
thread2.start();
thread1.join();
thread2.join();
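The same fork/join pattern, sketched with Python's threading module (the work done by each thread is invented for illustration): fork with start(), synchronize with join(), then combine the results.

```python
import threading

partial = [0, 0]  # one slot per thread, so the threads never write
                  # to the same location

def work(i, numbers):
    # Each thread writes only to its own slot of `partial`.
    partial[i] = sum(numbers)

data = list(range(100))
t1 = threading.Thread(target=work, args=(0, data[:50]))  # fork
t2 = threading.Thread(target=work, args=(1, data[50:]))
t1.start()
t2.start()
t1.join()   # join: wait for both threads before reading results
t2.join()
print(sum(partial))  # 4950
```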

The States Problems and Solutions

20 / 64

Global States and multi-threading
Example:
var a = 0;
thread1 = new Thread(function() { a = a + 1; });
thread2 = new Thread(function() { a = a + 1; });
thread1.start();
thread2.start();
What is the value of a?
This is called a race condition

Page 6: List of lecturesTDDA69/lectures/2015/09_concurrent.pdfTDDA69 Data and Program Structure Parallel and Distributed Computing Cyrille Berger 2 / 64 List of lectures 1Introduction and

Atomic actions

22 / 64

Mutex
Mutex is short for Mutual exclusion
It is a technique to prevent two threads from accessing a shared resource at the same time
Example:
var a = 0;
var m = new Mutex();
thread1 = new Thread(function() { m.lock(); a = a + 1; m.unlock(); });
thread2 = new Thread(function() { m.lock(); a = a + 1; m.unlock(); });
thread1.start();
thread2.start();
Now the value of a is always 2
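A runnable Python version of the mutex example above (a sketch using threading.Lock): the lock makes each read-modify-write atomic with respect to the other thread, so the result is deterministic.

```python
import threading

a = 0
m = threading.Lock()

def increment():
    global a
    # The `with m:` block is the lock()/unlock() pair from the
    # slide; the increment cannot be interleaved with the other
    # thread's increment.
    with m:
        a = a + 1

t1 = threading.Thread(target=increment)
t2 = threading.Thread(target=increment)
t1.start()
t2.start()
t1.join()
t2.join()
print(a)  # always 2
```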

23 / 64

Dependency
Example:
var a = 1;
var m = new Mutex();
thread1 = new Thread(function() { m.lock(); a = a + 1; m.unlock(); });
thread2 = new Thread(function() { m.lock(); a = a * 3; m.unlock(); });
thread1.start();
thread2.start();
What is the value of a? 4 or 6?

24 / 64

Condition variable
A condition variable is a set of threads waiting for a certain condition
Example:
var a = 1;
var m = new Mutex();
var cv = new ConditionVariable();
thread1 = new Thread(function() { m.lock(); a = a + 1; cv.notify(); m.unlock(); });
thread2 = new Thread(function() { cv.wait(); m.lock(); a = a * 3; m.unlock(); });
thread1.start();
thread2.start();
a = 6
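The same ordering in Python with threading.Condition (a sketch, not the slide's exact pseudocode): a flag is added because a plain notify is lost if it fires before the other thread starts waiting, and wait_for re-checks the predicate so either scheduling order works.

```python
import threading

a = 1
cv = threading.Condition()
incremented = False  # guards against a lost notify

def t1_body():
    global a, incremented
    with cv:               # Condition bundles its own mutex
        a = a + 1
        incremented = True
        cv.notify()

def t2_body():
    global a
    with cv:
        # Blocks until t1 has done its update, regardless of
        # which thread is scheduled first.
        cv.wait_for(lambda: incremented)
        a = a * 3

t1 = threading.Thread(target=t1_body)
t2 = threading.Thread(target=t2_body)
t2.start()
t1.start()
t1.join()
t2.join()
print(a)  # always (1 + 1) * 3 = 6
```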


25 / 64

Deadlock
What might happen:
var a = 0;
var b = 2;
var ma = new Mutex();
var mb = new Mutex();
thread1 = new Thread(function() { ma.lock(); mb.lock(); b = b - 1; a = a - 1; ma.unlock(); mb.unlock(); });
thread2 = new Thread(function() { mb.lock(); ma.lock(); b = b - 1; a = a + b; mb.unlock(); ma.unlock(); });
thread1.start();
thread2.start();
thread1 waits for mb, thread2 waits for ma
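One standard fix, sketched in Python (the fix itself is a well-known technique, not taken from the slide): make every thread acquire the locks in the same global order, so the circular wait can never form. Note the final value of a still depends on which thread runs first; lock ordering removes the deadlock, not the non-determinism.

```python
import threading

a, b = 0, 2
ma, mb = threading.Lock(), threading.Lock()

def t1_body():
    global a, b
    with ma:          # both threads take ma first, then mb
        with mb:
            b = b - 1
            a = a - 1

def t2_body():
    global a, b
    with ma:          # same order as t1_body, not reversed
        with mb:
            b = b - 1
            a = a + b

t1 = threading.Thread(target=t1_body)
t2 = threading.Thread(target=t2_body)
t1.start()
t2.start()
t1.join()   # terminates: no circular wait is possible
t2.join()
print(b)    # always 0; a depends on the scheduling order
```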

26 / 64

Advantages of atomic actions
Very efficient
Less overhead, faster than message passing

27 / 64

Disadvantages of atomic actions
Blocking
Meaning some threads have to wait
Small overhead
Deadlock
A low-priority thread can block a high-priority thread
A common source of programming errors

Language and Interpreter Design Considerations


29 / 64

Common mistakes
Forget to unlock a mutex
Race condition
Deadlocks
Granularity issues: too much locking will kill the performance

30 / 64

Forget to unlock a mutex
Most programming languages have either:
A guard object that will unlock a mutex upon destruction
A synchronization statement
some_rlock = threading.RLock()
with some_rlock:
    print("some_rlock is locked while this executes")

31 / 64

Race condition
Can we detect a potential race condition during compilation?
In the Rust programming language:
Objects are owned by a specific thread
Types can be marked with the Send trait to indicate that the object can be moved between threads
Types can be marked with the Sync trait to indicate that the object can be accessed by multiple threads safely

32 / 64

Safe Shared Mutable State in Rust (1/3)
let mut data = vec![1u32, 2, 3];
for j in 0..2 {
    thread::spawn(move || {
        for i in 0..2 { data[i] += 1; }
    });
}
Gives an error: "capture of moved value: `data`"


33 / 64

Safe Shared Mutable State in Rust (2/3)
let data = Mutex::new(vec![1u32, 2, 3]);
for j in 0..2 {
    let data = data.lock().unwrap();
    thread::spawn(move || {
        for i in 0..2 { data[i] += 1; }
    });
}
Gives an error: MutexGuard does not have the Send trait
Meaning we cannot move data into the thread

34 / 64

Safe Shared Mutable State in Rust (3/3)
let data = Arc::new(Mutex::new(vec![1u32, 2, 3]));
for j in 0..2 {
    let data = data.clone();
    thread::spawn(move || {
        let mut data = data.lock().unwrap();
        for i in 0..2 { data[i] += 1; }
    });
}
Arc has the Sync trait.

Single Instruction, Multiple Threads Programming

36 / 64

Single Instruction, Multiple Threads Programming
With SIMT, the same instruction is executed by multiple threads on different registers


37 / 64

Single instruction, multiple flow paths (1/2)
Using a masking system, it is possible to support if/else blocks
Threads always execute the instructions of both parts of the if/else blocks
data = [-2, 0, 1, -1, 2], data2 = [...]
function f(thread_id, data, data2) {
  if (data[thread_id] < 0) {
    data[thread_id] = data[thread_id] - data2[thread_id];
  } else if (data[thread_id] > 0) {
    data[thread_id] = data[thread_id] + data2[thread_id];
  }
}
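The masking mechanism can be simulated in plain Python (a sketch; real GPUs do this in hardware, and the data2 values here are invented since the slide elides them): every lane steps through both branches, but a mask decides which lanes actually commit a result.

```python
data = [-2, 0, 1, -1, 2]
data2 = [1, 1, 1, 1, 1]  # illustrative values; the slide leaves them as [...]
n = len(data)

# "if (data[i] < 0)": compute a mask, then all lanes step through
# the branch, committing only where the mask is set.
neg = [data[i] < 0 for i in range(n)]
for i in range(n):
    if neg[i]:                      # inactive lanes idle this step
        data[i] = data[i] - data2[i]

# "else if (data[i] > 0)": active only where the first mask was clear.
pos = [(not neg[i]) and data[i] > 0 for i in range(n)]
for i in range(n):
    if pos[i]:
        data[i] = data[i] + data2[i]

print(data)  # [-3, 0, 2, -2, 3]
```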

38 / 64

Single instruction, multiple flow paths (2/2)
Benefits:
Multiple flows are needed in many algorithms
Drawbacks:
Only one flow path is executed at a time; non-running threads must wait
Randomized memory access
Elements of a vector are not accessed sequentially

39 / 64

Programming Language Design for SIMT
OpenCL and CUDA are the most common
Very low level, C/C++ derivatives
General purpose programming languages are not suitable
Some work has been done to write Python for CUDA:
@jit(argtypes=[float32[:], float32[:], float32[:]], target='gpu')
def add_matrix(A, B, C):
    A[cuda.threadIdx.x] = B[cuda.threadIdx.x] + C[cuda.threadIdx.x]
with limitations on the standard functions that can be called

Distributed programming


41 / 64

Distributed Programming (1/4)
In distributed computing, several computations are executed at the same time and communicate through message passing

[Diagram: Unit, Unit, Unit, each with its own Memory, communicating over a network]

42 / 64

Distributed programming (2/4)
A distributed computing application consists of multiple programs running on multiple computers that together coordinate to perform some task.
Computation is performed in parallel by many computers.
Information can be restricted to certain computers.
Redundancy and geographic diversity improve reliability.

43 / 64

Distributed programming (3/4)
Characteristics of distributed computing:
Computers are independent: they do not share memory.
Coordination is enabled by messages passed across a network.

44 / 64

Distributed programming (4/4)
Individual programs have differentiating roles.
Distributed computing for large-scale data processing:
Databases respond to queries over a network.
Datasets can be partitioned across multiple machines.


Message Passing

46 / 64

Message Passing
Messages are (usually) passed through sockets
Messages are exchanged synchronously or asynchronously
Communication can be centralized or peer-to-peer

47 / 64

Python's Global Interpreter Lock
CPython can only interpret one single thread at a given time
The lock is released:
When the current thread is blocking for I/O
Every 100 interpreter ticks
True multithreading is not possible with CPython

48 / 64

Python's Multiprocessing module
The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads
It implements transparent message passing, allowing processes to exchange Python objects


49 / 64

Python's Message Passing (1/2)
Example of message passing:
from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
Output: hello bob

50 / 64

Python's Message Passing (2/2)
Example of message passing with pipes:
from multiprocessing import Process, Pipe

def f(conn):
    conn.send([42, None, 'hello'])
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=f, args=(child_conn,))
    p.start()
    print(parent_conn.recv())
    p.join()
Output: [42, None, 'hello']
Transparent message passing is possible thanks to serialization

51 / 64

Serialization
A serialized object is an object represented as a sequence of bytes that includes the object's data, its type, and the types of the data stored in the object.

52 / 64

pickle
In Python, serialization is done with the pickle module
It can serialize user-defined classes
The class definition must be available before deserialization
Works with different versions of Python
By default, uses an ASCII protocol
It can serialize:
Basic types: booleans, numbers, strings
Containers: tuples, lists, sets and dictionaries (of picklable objects)
Top-level functions and classes (only the name)
Objects whose __dict__ or __getstate__() are picklable
Example: pickle.loads(pickle.dumps(10))
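A round trip with a user-defined class (the Point class is invented for illustration): dumps produces a byte string, and loads rebuilds an equivalent object, which only works because the class definition is available on the deserializing side.

```python
import pickle

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

p = Point(1, 2)
# dumps serializes the instance's __dict__ plus a reference to the
# Point class by name; loads looks that name up to rebuild the object.
data = pickle.dumps(p)
q = pickle.loads(data)
print(q.x, q.y)  # 1 2
```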


53 / 64

Shared memory
Memory can be shared between Python processes with a Value or an Array.
from multiprocessing import Process, Value, Array

def f(n, a):
    n.value = 3.1415927
    for i in range(len(a)):
        a[i] = -a[i]

if __name__ == '__main__':
    num = Value('d', 0.0)
    arr = Array('i', range(10))
    p = Process(target=f, args=(num, arr))
    p.start()
    p.join()
    print(num.value)
    print(arr[:])
And of course, you would need to use a Mutex to avoid race conditions

MapReduce

55 / 64

Big Data Processing (1/2)
MapReduce is a framework for batch processing of big data.
Framework: A system used by programmers to build applications
Batch processing: All the data is available at the outset, and results are not used until processing completes
Big data: Used to describe data sets so large and comprehensive that they can reveal facts about a whole population, usually from statistical analysis

56 / 64

Big Data Processing (2/2)
The MapReduce idea:
Data sets are too big to be analyzed by one machine
Using multiple machines has the same complications, regardless of the application/analysis
Pure functions enable an abstraction barrier between data processing logic and coordinating a distributed application


57 / 64

MapReduce Evaluation Model (1/2)
Map phase: Apply a mapper function to all inputs, emitting intermediate key-value pairs
The mapper takes an iterable value containing inputs, such as lines of text
The mapper yields zero or more key-value pairs for each input

58 / 64

MapReduce Evaluation Model (2/2)
Reduce phase: For each intermediate key, apply a reducer function to accumulate all values associated with that key
The reducer takes an iterable value containing intermediate key-value pairs
All pairs with the same key appear consecutively
The reducer yields zero or more values, each associated with that intermediate key

59 / 64

MapReduce Execution Model (1/2)
[Diagram]

60 / 64

MapReduce Execution Model (2/2)
[Diagram]


61 / 64

MapReduce example
From a 1.1 billion people database (Facebook?), we want to know the average number of friends per age
In SQL:
SELECT age, AVG(friends) FROM users GROUP BY age
The total set of users is split into different users_set
function map(users_set) {
  for (user in users_set) {
    send(user.age, user.friends.size);
  }
}
The keys are shuffled and assigned to reducers
function reduce(age, friends) {
  var r = 0;
  for (friend in friends) { r += friend; }
  send(age, r / friends.size);
}
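The same query as a minimal in-process sketch (the sample data and variable names are invented): the map phase emits one (age, friends) pair per user, the shuffle groups the pairs by key, and the reduce phase averages each group, mirroring the three steps of the slide's pseudocode.

```python
from collections import defaultdict

# Invented sample data standing in for the 1.1 billion-user table.
users = [
    {"age": 20, "friends": 100},
    {"age": 20, "friends": 200},
    {"age": 30, "friends": 300},
]

# Map phase: emit one (key, value) pair per input record.
pairs = [(u["age"], u["friends"]) for u in users]

# Shuffle phase: group all values sharing a key.
groups = defaultdict(list)
for age, friends in pairs:
    groups[age].append(friends)

# Reduce phase: one output value per key.
averages = {age: sum(v) / len(v) for age, v in groups.items()}
print(averages)  # {20: 150.0, 30: 300.0}
```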

62 / 64

MapReduce Assumptions
Constraints on the mapper and reducer:
The mapper must be equivalent to applying a deterministic pure function to each input independently
The reducer must be equivalent to applying a deterministic pure function to the sequence of values for each key
Benefits of functional programming:
When a program contains only pure functions, call expressions can be evaluated in any order, lazily, and in parallel
Referential transparency: a call expression can be replaced by its value (or vice versa) without changing the program
In MapReduce, these functional programming ideas allow:
Consistent results, however the computation is partitioned
Re-computation and caching of results, as needed

63 / 64

MapReduce Benefits
Fault tolerance: A machine or hard drive might crash
The MapReduce framework automatically re-runs failed tasks
Speed: Some machine might be slow because it's overloaded
The framework can run multiple copies of a task and keep the result of the one that finishes first
Network locality: Data transfer is expensive
The framework tries to schedule map tasks on the machines that hold the data to be processed
Monitoring: Will my job finish before dinner?!?
The framework provides a web-based interface describing jobs

64 / 64

Summary
Parallel programming
Multi-threading and how to help reduce programmer error
Distributed programming and MapReduce

