LLNL-PRES-736898
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC
Umpire: Next-Generation Memory and Resource Management
CoE Performance Portability Meeting
David Beckingsale & Rich Hornung
August 22nd 2017
Upcoming hardware platforms have complex sets of memory and execution resources
§ Technology-specific APIs force application developers to commit to one implementation
§ For programming models, this can be mitigated with approaches like RAJA & Kokkos
§ We need a similar approach for memory and execution resources
§ The Umpire API will be driven by application needs and library requirements
Umpire Motivation
§ How can applications and libraries co-ordinate using limited memory resources?
§ How can we support flexible allocation strategies for different allocation types (e.g. temporary arrays)?
§ How can data be moved between places in the memory hierarchy?
Umpire Goals
§ Umpire is a resource management library that provides a unified high-level API for discovery, provision, and management of memory on next-generation hardware architectures
§ Decouple resource allocation from specific memory spaces, memory allocators and memory operations
§ Provide introspection capability for these allocations, allowing applications and libraries to make decisions based on allocation properties
Umpire will leverage Third-Party Libraries
§ Don’t reinvent the wheel
§ Provide a unified, high-level and application-focused API for projects like tcmalloc, jemalloc, memkind, SICM
[Figure: layered stack. API: Umpire. Implementations: memkind, tcmalloc, cudaMalloc, cnmem. Hardware: DDR, GDDR.]
Umpire Concepts
§ Spaces abstract a memory location, providing an interface to inspect properties and to allocate/free via a strategy.
§ Allocator is a lightweight interface for making and querying memory allocations.
§ AllocationStrategy decouples allocations from the area they are made in, allowing for complex allocation mechanisms.
§ Operations allow allocations to be moved from one space to another. These operations will be specialized based on the source and destination.
Allocators & Allocation Strategies
§ The Umpire user interface will be based around Allocators
§ An Allocator object hides the specific implementation behind a unified interface
§ Allows accessing a particular memory space the same way as a complex slab allocator
§ Allocators are accessed by querying a central ResourceManager
[Figure: an Allocator exposing allocate() and deallocate(), backed by jemalloc, cudaMalloc, or MyArena.]
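The relationship between Allocators, allocation strategies and the central ResourceManager might be sketched as follows. This is a minimal illustration and not Umpire's actual API: the class layouts, `registerStrategy`, `getAllocator`, `MallocStrategy` and `CountingStrategy` are all hypothetical names chosen for this example, and both strategies use the host heap so the sketch runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical strategy interface: how memory is actually obtained.
struct AllocationStrategy {
  virtual ~AllocationStrategy() = default;
  virtual void* allocate(std::size_t bytes) = 0;
  virtual void deallocate(void* ptr) = 0;
};

// Strategy backed directly by the host heap.
struct MallocStrategy : AllocationStrategy {
  void* allocate(std::size_t bytes) override { return std::malloc(bytes); }
  void deallocate(void* ptr) override { std::free(ptr); }
};

// Strategy that counts live allocations. It stands in for something more
// complex, such as a slab or arena allocator ("MyArena" on the slide).
struct CountingStrategy : AllocationStrategy {
  std::size_t live = 0;
  void* allocate(std::size_t bytes) override {
    void* p = std::malloc(bytes);
    if (p) ++live;
    return p;
  }
  void deallocate(void* ptr) override {
    if (ptr) --live;
    std::free(ptr);
  }
};

// Lightweight handle: callers only ever see allocate()/deallocate().
class Allocator {
 public:
  explicit Allocator(std::shared_ptr<AllocationStrategy> s)
      : strategy_(std::move(s)) {}
  void* allocate(std::size_t bytes) { return strategy_->allocate(bytes); }
  void deallocate(void* ptr) { strategy_->deallocate(ptr); }

 private:
  std::shared_ptr<AllocationStrategy> strategy_;
};

// Central registry that hands out Allocators by name (names are assumptions).
class ResourceManager {
 public:
  void registerStrategy(const std::string& name,
                        std::shared_ptr<AllocationStrategy> s) {
    strategies_[name] = std::move(s);
  }
  Allocator getAllocator(const std::string& name) const {
    auto it = strategies_.find(name);
    assert(it != strategies_.end() && "unknown allocator name");
    return Allocator(it->second);
  }

 private:
  std::map<std::string, std::shared_ptr<AllocationStrategy>> strategies_;
};
```

Code that receives an `Allocator` from `getAllocator` calls `allocate()` and `deallocate()` identically whether the handle wraps a plain heap strategy or a more elaborate one, which is the property the bullets describe.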
Spaces
§ Spaces are created based on accessibility of different memory resources
§ For example, on a typical CPU-GPU node:
  - 1 area for the DRAM
  - 1 area per GDRAM (device memory)
  - 1 area for “unified memory”
§ Note that although these spaces overlap, they are still separately identified
§ Once a space is constructed, it will be tied to a “system” allocator
[Figure: spaces on a CPU-GPU node: UM (unified memory), DDR, GDDR0, GDDR1.]
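The space layout described for a CPU-GPU node can be illustrated with a small enumeration routine. This is a sketch under stated assumptions: the `Space` struct, its accessibility flags and `makeSpaces` are hypothetical, the names follow the labels in the figure, and the choice to create a unified-memory space only when at least one GPU is present is an assumption of this example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one memory space.
struct Space {
  std::string name;
  bool host_accessible;
  bool device_accessible;
};

// Build the list of spaces for a node with `num_gpus` devices:
// one space for DRAM, one per device memory, and one for unified memory.
std::vector<Space> makeSpaces(int num_gpus) {
  assert(num_gpus >= 0);
  std::vector<Space> spaces;
  spaces.push_back({"DDR", true, false});
  for (int g = 0; g < num_gpus; ++g) {
    spaces.push_back({"GDDR" + std::to_string(g), false, true});
  }
  if (num_gpus > 0) {
    // Overlaps the spaces above physically, but is identified separately.
    spaces.push_back({"UM", true, true});
  }
  return spaces;
}
```

For a node with two GPUs, `makeSpaces(2)` yields four entries: `DDR`, `GDDR0`, `GDDR1` and `UM`.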
Operations
§ Operations allow data movement between spaces
§ The resource manager will handle all data movement, so the user only needs to provide the source pointer and destination space
void* moved = rm.move(ptr, new_space);
§ Higher-level capabilities around data movement, like caching allocations in different locations, will be handled by other libraries/applications (e.g. CHAI)
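A move that needs only the source pointer and the destination space implies that the resource manager keeps a record of each allocation's size and current space. The sketch below shows one way that could work. It is not Umpire's implementation: only the `rm.move(ptr, new_space)` call shape comes from the slide, while `AllocationRecord`, `spaceOf` and the use of string space names are assumptions. Every space is emulated with the host heap, and `std::memcpy` stands in for a copy specialized on source and destination.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <map>
#include <string>

// Hypothetical record kept for every live allocation.
struct AllocationRecord {
  std::size_t bytes;
  std::string space;
};

class ResourceManager {
 public:
  void* allocate(std::size_t bytes, const std::string& space) {
    void* p = std::malloc(bytes);  // every space is emulated on the host here
    if (p) records_[p] = {bytes, space};
    return p;
  }

  void deallocate(void* ptr) {
    records_.erase(ptr);
    std::free(ptr);
  }

  // The caller passes only the source pointer and the destination space;
  // size and source space are looked up in the records.
  void* move(void* ptr, const std::string& new_space) {
    auto it = records_.find(ptr);
    assert(it != records_.end() && "pointer not known to this manager");
    if (it->second.space == new_space) return ptr;  // already in place
    const std::size_t bytes = it->second.bytes;
    void* dst = allocate(bytes, new_space);
    if (!dst) return nullptr;
    std::memcpy(dst, ptr, bytes);  // stand-in for a space-specific copy
    deallocate(ptr);
    return dst;
  }

  const std::string& spaceOf(void* ptr) const { return records_.at(ptr).space; }

 private:
  std::map<void*, AllocationRecord> records_;
};
```

In a real CPU-GPU setting the copy step would be chosen from the pair of source and destination spaces, for example a host-to-device copy when moving from DDR to device memory.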
Umpire will co-ordinate with other projects
RAJA
• Lightweight portability layer for loops (“on-node” programming model)
• Gives context for CHAI data copies
CHAI
• Lightweight pointer abstraction to make run-time data copies transparent
• Requires RAJA (and Umpire in future)
Sidre
• Data description and access for sharing across apps and tools
• Will require Umpire for allocations (future)
Umpire
• Portable memory allocation and query API
• Underpins CHAI and Sidre (future)
[Figure: dependencies between Umpire, CHAI, Sidre, and RAJA.]
CHAI
chai::ManagedArray<float> a(100);
chai::ManagedArray<float> b(100);
// init data on host
const float x = 1.0f;
// GPU loop: a and b are needed on the device
forall<cuda_exec>(0, 100, [=] __device__ (int i) {
  a[i] = a[i]*x + b[i];
});
// CPU loop: a is needed back on the host
forall<seq_exec>(0, 100, [=] (int i) {
  std::cout << "a[i] = " << a[i] << std::endl;
});
[Figure: arrays A and B shown on the CPU and GPU as the loops execute.]
Umpire handles data allocation
Umpire handles data movement
Initial Use Cases
§ Flexible pools for temporary allocation of GPU data
  - See Brian Ryujin’s talk
§ Passing Allocators from the application through to a library so that library data is allocated in the same place
§ CHAI in use in multiple LLNL applications
  - See Adam Kunen’s talk
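The second use case, passing an Allocator from the application into a library, can be illustrated as follows. Both the `Allocator` class and the `sumOfSquares` library routine are hypothetical and exist only for this sketch. The allocator counts live allocations so that the application can observe that the library's scratch buffer came from, and was returned to, the application's own allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical application-owned allocator that tracks live allocations.
class Allocator {
 public:
  void* allocate(std::size_t bytes) {
    void* p = std::malloc(bytes);
    if (p) ++live_;
    return p;
  }
  void deallocate(void* ptr) {
    if (ptr) {
      --live_;
      std::free(ptr);
    }
  }
  std::size_t liveAllocations() const { return live_; }

 private:
  std::size_t live_ = 0;
};

// Hypothetical library routine. It takes the caller's Allocator for its
// scratch buffer instead of choosing an allocation mechanism itself.
double sumOfSquares(const double* data, std::size_t n, Allocator& alloc) {
  if (n == 0) return 0.0;
  double* scratch = static_cast<double*>(alloc.allocate(n * sizeof(double)));
  assert(scratch != nullptr);
  for (std::size_t i = 0; i < n; ++i) scratch[i] = data[i] * data[i];
  double total = 0.0;
  for (std::size_t i = 0; i < n; ++i) total += scratch[i];
  alloc.deallocate(scratch);
  return total;
}
```

Because the library never names a memory space or allocation mechanism, the application decides where library data lives simply by choosing which allocator it passes in.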
Current Status
§ Initial implementation supporting Allocators for CPU and GPU and simple arena allocation
§ Release process underway, will host on GitHub
§ We are interested in collaborations at all levels of the memory software stack
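As an illustration of what "simple arena allocation" can mean, the sketch below implements a bump-pointer arena: one large block is obtained up front, allocations are carved from it by advancing an offset, and everything is released at once with `reset()`. This is a generic technique shown under the assumption that it resembles the arena mentioned here; the `Arena` class and its members are hypothetical and are not taken from Umpire.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Minimal bump-pointer arena over a single host block.
class Arena {
 public:
  explicit Arena(std::size_t capacity)
      : base_(static_cast<char*>(std::malloc(capacity))),
        capacity_(capacity),
        offset_(0) {}
  ~Arena() { std::free(base_); }
  Arena(const Arena&) = delete;
  Arena& operator=(const Arena&) = delete;

  // Returns nullptr when the request does not fit in the remaining block.
  void* allocate(std::size_t bytes) {
    const std::size_t align = alignof(std::max_align_t);
    // Round the current offset up to the next aligned position.
    const std::size_t start = (offset_ + align - 1) / align * align;
    if (base_ == nullptr || start > capacity_ || bytes > capacity_ - start) {
      return nullptr;
    }
    offset_ = start + bytes;
    return base_ + start;
  }

  // Individual frees are not supported; the whole arena is released at once.
  void reset() { offset_ = 0; }

  std::size_t used() const { return offset_; }

 private:
  char* base_;
  std::size_t capacity_;
  std::size_t offset_;
};
```

An arena of this kind suits temporary arrays: allocation is a few arithmetic operations, and the cost of the underlying system allocation is paid once.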
Allocator Concepts
§ In Umpire, the Allocator concept is a stateless object that handles allocations at the system level, and is the lowest-level component in the Umpire system.
§ The required interface is modeled on that of the C++17 allocator concept
§ We currently have wrappers around std::malloc, cudaMalloc, cudaMallocManaged that support this interface
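For reference, a stateless wrapper around `std::malloc` that satisfies the standard C++ minimal allocator requirements looks roughly like the sketch below. `MallocAllocator` is a hypothetical name, and Umpire's own wrappers are only said to be modeled on this interface, so their exact signatures may differ. A device wrapper would presumably have the same shape with `cudaMalloc` and `cudaFree` in place of `std::malloc` and `std::free`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Stateless allocator: no data members, so all instances are interchangeable.
template <typename T>
struct MallocAllocator {
  using value_type = T;

  MallocAllocator() = default;
  template <typename U>
  MallocAllocator(const MallocAllocator<U>&) noexcept {}

  T* allocate(std::size_t n) {
    if (n == 0) return nullptr;
    if (n > static_cast<std::size_t>(-1) / sizeof(T)) throw std::bad_alloc();
    void* p = std::malloc(n * sizeof(T));
    if (!p) throw std::bad_alloc();
    return static_cast<T*>(p);
  }

  void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

// Stateless allocators always compare equal: memory obtained from one
// instance can be released through any other.
template <typename T, typename U>
bool operator==(const MallocAllocator<T>&, const MallocAllocator<U>&) noexcept {
  return true;
}
template <typename T, typename U>
bool operator!=(const MallocAllocator<T>&, const MallocAllocator<U>&) noexcept {
  return false;
}
```

Because it meets the standard requirements, the wrapper can be plugged into standard containers, for example `std::vector<int, MallocAllocator<int>>`.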
Allocation Introspection
§ Umpire will support introspection of allocations and resources, allowing applications and libraries to dynamically adjust their behavior
§ Where (what space) is this pointer?
§ How much memory is left in this space?
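Both introspection questions can be answered from bookkeeping kept at allocation time. The sketch below is one possible approach and does not describe Umpire's internals: `IntrospectiveManager`, `addSpace`, `spaceOf` and `remaining` are hypothetical names, spaces are emulated on the host heap with an assumed fixed capacity, and allocations are indexed by start address so that interior pointers can be resolved as well.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <map>
#include <string>

class IntrospectiveManager {
 public:
  // Declare a space with a fixed capacity in bytes (an assumed model).
  void addSpace(const std::string& name, std::size_t capacity) {
    remaining_[name] = capacity;
  }

  void* allocate(const std::string& space, std::size_t bytes) {
    auto it = remaining_.find(space);
    if (it == remaining_.end() || bytes == 0 || bytes > it->second) {
      return nullptr;
    }
    void* p = std::malloc(bytes);  // every space is emulated on the host
    if (!p) return nullptr;
    it->second -= bytes;
    records_[reinterpret_cast<std::uintptr_t>(p)] = {bytes, space};
    return p;
  }

  void deallocate(void* ptr) {
    auto it = records_.find(reinterpret_cast<std::uintptr_t>(ptr));
    if (it == records_.end()) return;
    remaining_[it->second.space] += it->second.bytes;
    records_.erase(it);
    std::free(ptr);
  }

  // "Where (what space) is this pointer?" Also resolves interior pointers.
  // Returns an empty string for pointers this manager does not own.
  std::string spaceOf(const void* ptr) const {
    const auto addr = reinterpret_cast<std::uintptr_t>(ptr);
    auto it = records_.upper_bound(addr);  // first record starting after addr
    if (it == records_.begin()) return "";
    --it;  // last record starting at or before addr
    if (addr - it->first < it->second.bytes) return it->second.space;
    return "";
  }

  // "How much memory is left in this space?"
  std::size_t remaining(const std::string& space) const {
    auto it = remaining_.find(space);
    return it == remaining_.end() ? 0 : it->second;
  }

 private:
  struct Record {
    std::size_t bytes;
    std::string space;
  };
  std::map<std::uintptr_t, Record> records_;
  std::map<std::string, std::size_t> remaining_;
};
```

A library handed an arbitrary pointer could call `spaceOf` to pick a host or device code path, and could consult `remaining` before requesting a large temporary buffer.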