TDDA69 Data and Program Structure: Parallel and Distributed Computing
Cyrille Berger
List of lectures
1 Introduction and Functional Programming
2 Imperative Programming and Data Structures
3 Environment
4 Evaluation
5 Object Oriented Programming
6 Macros and decorators
7 Virtual Machines and Bytecode
8 Garbage Collection and Native Code
9 Parallel and Distributed Computing
10 Logic Programming
11 Summary
Lecture goal
- Learn about the concepts and challenges of distributed computing
- The impact of distributed programming on programming languages and implementations
Lecture content
- Parallel Programming
  - Multithreaded Programming
  - The States: Problems and Solutions
  - Atomic actions
  - Language and Interpreter Design Considerations
  - Single Instruction, Multiple Threads Programming
- Distributed programming
  - Message Passing
  - MapReduce
Concurrent computing
- In concurrent computing, several computations are executed at the same time
- In parallel computing, all computation units have access to shared memory (for instance in a single process)
- In distributed computing, computation units communicate through message passing
Benefits of concurrent computing
- Faster
- Responsiveness
  - Interactive applications can be performing two tasks at the same time: rendering, spell checking...
- Availability of services
  - Load balancing between servers
- Controllability
  - Tasks requiring certain preconditions can suspend and wait until the preconditions hold, then resume execution transparently.
Disadvantages of concurrent computing
- Concurrency is hard to implement properly
- Safety
  - Easy to corrupt the shared state
- Deadlock
  - Tasks can wait indefinitely for each other
- Non-determinism
- Not always faster!
  - The memory bandwidth and CPU cache are shared
Concurrent computing programming
Four basic approaches to computing:
- Sequential programming: no concurrency
- Declarative concurrency: streams in a functional language
- Message passing: with active objects, used in distributed computing
- Atomic actions: on a shared memory, used in parallel computing
Stream Programming in Functional Programming
- No global state
- Functions only act on their input, they are reentrant
- Functions can then be executed in parallel
  - As long as they do not depend on the output of another function
Parallel Programming

Parallel Programming
- In parallel computing, several computations are executed at the same time and have access to shared memory

[Diagram: several computation units connected to one shared memory]
SIMD, SIMT, SMT (1/2)
- SIMD: Single Instruction, Multiple Data
  - Elements of a short vector (4 to 8 elements) are processed in parallel
- SIMT: Single Instruction, Multiple Threads
  - The same instruction is executed by multiple threads (from 128 to 3048 or more in the future)
- SMT: Simultaneous Multithreading
  - General purpose, different instructions are executed by different threads
SIMD, SIMT, SMT (2/2)
- SIMD:
  PUSH [1,2,3,4]
  PUSH [4,5,6,7]
  VEC_ADD_4
- SIMT:
  execute([1,2,3,4], [4,5,6,7],
          lambda a, b, ti: a[ti] = a[ti] + max(b[ti], 5))
- SMT:
  a = [1,2,3,4]
  b = [4,5,6,7]
  ...
  Thread.new(lambda: a = a + b)
  Thread.new(lambda: c = c * b)
Why the need for the different models?
- Flexibility: SMT > SIMT > SIMD
  - Less flexibility gives higher performance
  - Unless the lack of flexibility prevents accomplishing the task
- Performance: SIMD > SIMT > SMT
Multithreaded Programming

Single threaded vs Multithreaded
Multithreaded Programming Model
- Start with a single root thread
- Fork: to create concurrently executing threads
- Join: to synchronize threads
- Threads communicate through shared memory
- Threads execute asynchronously
- They may or may not execute on different processors

[Diagram: a main thread forks sub-threads sub0 ... subn, which later join back into main]
A multithreaded example
thread1 = new Thread(function() { /* do some computation */ });
thread2 = new Thread(function() { /* do some computation */ });
thread1.start();
thread2.start();
thread1.join();
thread2.join();
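The same fork/join pattern can be sketched in Python with the `threading` module (the `work` function and `results` dictionary are illustrative, not from the slides):

```python
import threading

results = {}

def work(name, n):
    # stand-in for "do some computation"
    results[name] = sum(range(n))

t1 = threading.Thread(target=work, args=("t1", 10))
t2 = threading.Thread(target=work, args=("t2", 20))
t1.start()   # fork: both threads now run concurrently
t2.start()
t1.join()    # join: wait until both have finished
t2.join()
```

After the joins, the root thread can safely read the results the sub-threads wrote to shared memory.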
The States: Problems and Solutions
Global States and multi-threading
Example:
var a = 0;
thread1 = new Thread(function() { a = a + 1; });
thread2 = new Thread(function() { a = a + 1; });
thread1.start();
thread2.start();
What is the value of a? This is called a race condition.
Atomic actions
Mutex
- Mutex is short for Mutual exclusion
- It is a technique to prevent two threads from accessing a shared resource at the same time
Example:
var a = 0;
var m = new Mutex();
thread1 = new Thread(function() { m.lock(); a = a + 1; m.unlock(); });
thread2 = new Thread(function() { m.lock(); a = a + 1; m.unlock(); });
thread1.start();
thread2.start();
Now the value of a is always 2.
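A Python sketch of the same mutex pattern, using `threading.Lock` and enough iterations that an unsynchronized version could actually lose updates (the iteration count is an arbitrary illustrative choice):

```python
import threading

a = 0
m = threading.Lock()   # Python's mutex

def increment(times):
    global a
    for _ in range(times):
        with m:        # lock() on entry, unlock() on exit
            a = a + 1  # the read-modify-write is now atomic w.r.t. other threads

t1 = threading.Thread(target=increment, args=(100_000,))
t2 = threading.Thread(target=increment, args=(100_000,))
t1.start(); t2.start()
t1.join(); t2.join()
# with the lock, the final value is deterministic: 200000
```

The `with` statement also guarantees the lock is released if the body raises, which matters for the "forget to unlock a mutex" mistake discussed later.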
Dependency
Example:
var a = 1;
var m = new Mutex();
thread1 = new Thread(function() { m.lock(); a = a + 1; m.unlock(); });
thread2 = new Thread(function() { m.lock(); a = a * 3; m.unlock(); });
thread1.start();
thread2.start();
What is the value of a? 4 or 6?
Condition variable
- A condition variable is a set of threads waiting for a certain condition
Example:
var a = 1;
var m = new Mutex();
var cv = new ConditionVariable();
thread1 = new Thread(function() { m.lock(); a = a + 1; cv.notify(); m.unlock(); });
thread2 = new Thread(function() { cv.wait(); m.lock(); a = a * 3; m.unlock(); });
thread1.start();
thread2.start();
a = 6
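A Python version of this pattern with `threading.Condition`; note that robust code waits in a loop on a predicate (here the flag `incremented`, an illustrative addition) so the result is deterministic even if `notify` happens before `wait`:

```python
import threading

a = 1
cv = threading.Condition()
incremented = False   # predicate guarded by the condition's lock

def add_one():
    global a, incremented
    with cv:
        a = a + 1
        incremented = True
        cv.notify()           # wake a thread waiting on cv

def times_three():
    global a
    with cv:
        while not incremented:  # loop guards against early notify and spurious wakeups
            cv.wait()
        a = a * 3

t1 = threading.Thread(target=add_one)
t2 = threading.Thread(target=times_three)
t2.start(); t1.start()
t1.join(); t2.join()
# a is now (1 + 1) * 3 == 6, regardless of scheduling
```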
Deadlock
What might happen:
var a = 0;
var b = 2;
var ma = new Mutex();
var mb = new Mutex();
thread1 = new Thread(function() { ma.lock(); mb.lock(); b = b - 1; a = a - 1; ma.unlock(); mb.unlock(); });
thread2 = new Thread(function() { mb.lock(); ma.lock(); b = b - 1; a = a + b; mb.unlock(); ma.unlock(); });
thread1.start();
thread2.start();
thread1 waits for mb, thread2 waits for ma
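One standard fix (an addition to the slides, not stated on them) is to acquire locks in a fixed global order in every thread, which makes the circular wait impossible. A Python sketch, with a `done` list added just to observe that both threads terminate:

```python
import threading

a, b = 0, 2
ma, mb = threading.Lock(), threading.Lock()
done = []   # records which threads finished (list.append is thread-safe)

def t1():
    global a, b
    with ma:        # both threads take ma first, then mb:
        with mb:    # a fixed global order rules out the deadlock cycle
            b -= 1
            a -= 1
    done.append("t1")

def t2():
    global a, b
    with ma:        # same order as t1, unlike the deadlocking version
        with mb:
            b -= 1
            a += b
    done.append("t2")

threads = [threading.Thread(target=f) for f in (t1, t2)]
for t in threads:
    t.start()
for t in threads:
    t.join()        # always terminates: no circular wait is possible
```

The final values of `a` and `b` still depend on which thread runs first, but the program can no longer hang.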
Advantages of atomic actions
- Very efficient
- Less overhead, faster than message passing
Disadvantages of atomic actions
- Blocking
  - Meaning some threads have to wait
- Small overhead
- Deadlock
- A low-priority thread can block a high-priority thread
- A common source of programming errors

Language and Interpreter Design Considerations
Common mistakes
- Forget to unlock a mutex
- Race condition
- Deadlocks
- Granularity issues: too much locking will kill the performance
Forget to unlock a mutex
Most programming languages have, either:
- A guard object that will unlock a mutex upon destruction
- A synchronization statement
some_rlock = threading.RLock()
with some_rlock:
    print("some_rlock is locked while this executes")
Race condition
Can we detect potential race conditions during compilation? In the Rust programming language:
- Objects are owned by a specific thread
- Types can be marked with the Send trait to indicate that the object can be moved between threads
- Types can be marked with the Sync trait to indicate that the object can be accessed by multiple threads safely
Safe Shared Mutable State in Rust (1/3)
let mut data = vec![1u32, 2, 3];
for j in 0..2 {
    thread::spawn(move || {
        for i in 0..2 { data[i] += 1; }
    });
}
Gives an error: "capture of moved value: `data`"
Safe Shared Mutable State in Rust (2/3)
let mut data = Mutex::new(vec![1u32, 2, 3]);
for j in 0..2 {
    let data = data.lock().unwrap();
    thread::spawn(move || {
        for i in 0..2 { data[i] += 1; }
    });
}
Gives an error: MutexGuard does not have the Send trait
- Meaning we cannot move data to the thread
Safe Shared Mutable State in Rust (3/3)
let data = Arc::new(Mutex::new(vec![1u32, 2, 3]));
for j in 0..2 {
    let data = data.clone();
    thread::spawn(move || {
        let mut data = data.lock().unwrap();
        for i in 0..2 { data[i] += 1; }
    });
}
Arc has the Sync trait.
Single Instruction, Multiple Threads Programming
Single Instruction, Multiple Threads Programming
- With SIMT, the same instruction is executed by multiple threads on different registers
Single instruction, multiple flow paths (1/2)
- Using a masking system, it is possible to support if/else blocks
- Threads are always executing the instructions of both parts of the if/else blocks
data = [-2, 0, 1, -1, 2], data2 = [...]
function f(thread_id, data, data2)
{
  if(data[thread_id] < 0)
  {
    data[thread_id] = data[thread_id] - data2[thread_id];
  } else if(data[thread_id] > 0)
  {
    data[thread_id] = data[thread_id] + data2[thread_id];
  }
}
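The masking idea can be sketched in Python with NumPy: every lane evaluates both branch conditions, and a boolean mask selects which lanes each branch's write applies to. The values of `data2` are an assumption here, since the slide elides them:

```python
import numpy as np

data = np.array([-2, 0, 1, -1, 2])
data2 = np.array([1, 1, 1, 1, 1])   # hypothetical values; the slide leaves data2 unspecified

# One boolean mask per branch, computed for all lanes at once.
neg = data < 0
pos = data > 0

# Both "branches" execute over the whole vector; the mask restricts the writes.
data[neg] -= data2[neg]
data[pos] += data2[pos]
```

Lanes where neither mask is true (here the 0 element) are left untouched, mirroring a thread that is masked off for both sides of the if/else.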
Single instruction, multiple flow paths (2/2)
- Benefits:
  - Multiple flows are needed in many algorithms
- Drawbacks:
  - Only one flow path is executed at a time, non-running threads must wait
  - Randomized memory access
    - Elements of a vector are not accessed sequentially
Programming Language Design for SIMT
- OpenCL, CUDA are the most common
  - Very low level, C/C++-derivative
- General purpose programming languages are not suitable
- Some work has been done to write in Python for CUDA:
@jit(argtypes=[float32[:], float32[:], float32[:]], target='gpu')
def add_matrix(A, B, C):
    A[cuda.threadIdx.x] = B[cuda.threadIdx.x] + C[cuda.threadIdx.x]
with limitations on the standard functions that can be called

Distributed programming
Distributed Programming (1/4)
- In distributed computing, several computations are executed at the same time and communicate through message passing

[Diagram: several computation units, each with its own memory, connected by a network]
Distributed programming (2/4)
- A distributed computing application consists of multiple programs running on multiple computers that together coordinate to perform some task.
- Computation is performed in parallel by many computers.
- Information can be restricted to certain computers.
- Redundancy and geographic diversity improve reliability.
Distributed programming (3/4)
Characteristics of distributed computing:
- Computers are independent: they do not share memory.
- Coordination is enabled by messages passed across a network.
Distributed programming (4/4)
- Individual programs have differentiating roles.
- Distributed computing for large-scale data processing:
  - Databases respond to queries over a network.
  - Datasets can be partitioned across multiple machines.

Message Passing
Message Passing
- Messages are (usually) passed through sockets
- Messages are exchanged synchronously or asynchronously
- Communication can be centralized or peer-to-peer
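A minimal sketch of synchronous message passing over sockets, using `socket.socketpair` so the two endpoints can live in one script (in a real system they would be separate processes or machines):

```python
import socket

# A connected pair of sockets standing in for two communicating peers.
parent, child = socket.socketpair()

parent.sendall(b"ping")    # send a message
request = child.recv(4)    # synchronous receive: blocks until data arrives
child.sendall(b"pong")     # reply
reply = parent.recv(4)

parent.close()
child.close()
```

Each `recv` blocks the caller until the peer's message is available, which is what makes this exchange synchronous.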
Python's Global Interpreter Lock
- CPython can only interpret one single thread at a given time
- The lock is released when:
  - The current thread is blocking for I/O
  - Every 100 interpreter ticks
- True multithreading is not possible with CPython
Python's Multiprocessing module
- The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads
- It implements transparent message passing, allowing Python objects to be exchanged between processes
Python's Message Passing (1/2)
Example of message passing:
from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
Output: hello bob
Python's Message Passing (2/2)
Example of message passing with pipes:
from multiprocessing import Process, Pipe

def f(conn):
    conn.send([42, None, 'hello'])
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=f, args=(child_conn,))
    p.start()
    print(parent_conn.recv())
    p.join()
Output: [42, None, 'hello']
- Transparent message passing is possible thanks to serialization
Serialization
- A serialized object is an object represented as a sequence of bytes that includes the object's data, its type and the types of data stored in the object.
pickle
- In Python, serialization is done with the pickle module
- It can serialize user-defined classes
  - The class definition must be available before deserialization
- Works with different versions of Python
- By default, uses an ASCII protocol
- It can serialize:
  - Basic types: booleans, numbers, strings
  - Containers: tuples, lists, sets and dictionaries (of picklable objects)
  - Top level functions and classes (only the name)
  - Objects where __dict__ or __getstate__() are picklable
Example:
pickle.loads(pickle.dumps(10))
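A slightly fuller round-trip than the one-liner above, showing a user-defined class going through `dumps`/`loads` (the `Point` class is an illustrative example, not from the slides):

```python
import pickle

class Point:
    # a user-defined class; this definition must also be importable
    # on the deserializing side
    def __init__(self, x, y):
        self.x, self.y = x, y

p = Point(3, 4)
data = pickle.dumps(p)    # serialize to a sequence of bytes
q = pickle.loads(data)    # rebuild an equivalent object from the bytes
```

Only the instance's `__dict__` travels in the byte stream; the class itself is referenced by name, which is why the definition must be available before deserialization.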
Shared memory
- Memory can be shared between Python processes with a Value or an Array.
from multiprocessing import Process, Value, Array

def f(n, a):
    n.value = 3.1415927
    for i in range(len(a)):
        a[i] = -a[i]

if __name__ == '__main__':
    num = Value('d', 0.0)
    arr = Array('i', range(10))
    p = Process(target=f, args=(num, arr))
    p.start()
    p.join()
    print(num.value)
    print(arr[:])
- And of course, you would need to use a Mutex to avoid race conditions
MapReduce

Big Data Processing (1/2)
- MapReduce is a framework for batch processing of big data.
  - Framework: A system used by programmers to build applications
  - Batch processing: All the data is available at the outset, and results are not used until processing completes
  - Big data: Used to describe data sets so large and comprehensive that they can reveal facts about a whole population, usually from statistical analysis
Big Data Processing (2/2)
The MapReduce idea:
- Data sets are too big to be analyzed by one machine
- Using multiple machines has the same complications, regardless of the application/analysis
- Pure functions enable an abstraction barrier between data processing logic and coordinating a distributed application
MapReduce Evaluation Model (1/2)
- Map phase: Apply a mapper function to all inputs, emitting intermediate key-value pairs
  - The mapper takes an iterable value containing inputs, such as lines of text
  - The mapper yields zero or more key-value pairs for each input
MapReduce Evaluation Model (2/2)
- Reduce phase: For each intermediate key, apply a reducer function to accumulate all values associated with that key
  - The reducer takes an iterable value containing intermediate key-value pairs
  - All pairs with the same key appear consecutively
  - The reducer yields zero or more values, each associated with that intermediate key
MapReduce Execution Model (1/2)

MapReduce Execution Model (2/2)
MapReduce example
- From a 1.1 billion people database (Facebook?), we want to know the average number of friends per age
- In SQL:
SELECT age, AVG(friends) FROM users GROUP BY age
- In MapReduce, the total set of users is split into different users_set:
function map(users_set)
{
  for(user in users_set)
  {
    send(user.age, user.friends.size);
  }
}
- The keys are shuffled and assigned to reducers:
function reduce(age, friends)
{
  var r = 0;
  for(friend in friends)
  {
    r += friend;
  }
  send(age, r / friends.size);
}
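The map, shuffle and reduce steps above can be sketched in Python on a miniature in-memory data set (the `users` list and the helper names `mapper`, `shuffle`, `reducer` are illustrative assumptions; a real framework would run them across machines):

```python
from collections import defaultdict

# Hypothetical miniature data set: (age, number_of_friends) per user.
users = [(20, 100), (20, 200), (30, 50), (30, 150), (30, 100)]

def mapper(users_set):
    # map phase: emit one (age, friend_count) pair per user
    for age, friends in users_set:
        yield age, friends

def shuffle(pairs):
    # the framework's step between the phases: group values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(age, counts):
    # reduce phase: average the friend counts for one age
    yield age, sum(counts) / len(counts)

result = {}
for age, counts in shuffle(mapper(users)).items():
    for key, value in reducer(age, counts):
        result[key] = value
# result maps each age to the average number of friends for that age
```

Because `mapper` and `reducer` are deterministic pure functions, the framework could run them on any partition of the inputs, in any order, and obtain the same `result`.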
MapReduce Assumptions
- Constraints on the mapper and reducer:
  - The mapper must be equivalent to applying a deterministic pure function to each input independently
  - The reducer must be equivalent to applying a deterministic pure function to the sequence of values for each key
- Benefits of functional programming:
  - When a program contains only pure functions, call expressions can be evaluated in any order, lazily, and in parallel
  - Referential transparency: a call expression can be replaced by its value (or vice versa) without changing the program
- In MapReduce, these functional programming ideas allow:
  - Consistent results, however computation is partitioned
  - Re-computation and caching of results, as needed
MapReduce Benefits
- Fault tolerance: A machine or hard drive might crash
  - The MapReduce framework automatically re-runs failed tasks
- Speed: Some machine might be slow because it's overloaded
  - The framework can run multiple copies of a task and keep the result of the one that finishes first
- Network locality: Data transfer is expensive
  - The framework tries to schedule map tasks on the machines that hold the data to be processed
- Monitoring: Will my job finish before dinner?!?
  - The framework provides a web-based interface describing jobs
Summary
- Parallel programming
- Multi-threading and how to help reduce programmer error
- Distributed programming and MapReduce