
LLNL-PRES-736898

This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC

Umpire: Next-Generation Memory and Resource Management

CoE Performance Portability Meeting

David Beckingsale & Rich Hornung

August 22nd 2017

Upcoming hardware platforms have complex sets of memory and execution resources

§ Technology-specific APIs force application developers to commit to one implementation

§ For programming models, this can be mitigated with approaches like RAJA & Kokkos

§ We need a similar approach for memory and execution resources

§ The Umpire API will be driven by application needs and library requirements

Umpire Motivation

§ How can applications and libraries co-ordinate using limited memory resources?

§ How can we support flexible allocation strategies for different allocation types (e.g. temporary arrays)?

§ How can data be moved between places in the memory hierarchy?

Umpire Goals

§ Umpire is a resource management library that provides a unified, high-level API for discovery, provision, and management of memory on next-generation hardware architectures

§ Decouple resource allocation from specific memory spaces, memory allocators, and memory operations

§ Provide introspection capability for these allocations, allowing applications and libraries to make decisions based on allocation properties

Umpire will leverage Third-Party Libraries

§ Don’t reinvent the wheel

§ Provide a unified, high-level, and application-focused API for projects like tcmalloc, jemalloc, memkind, and SICM

[Figure: layered stack. API layer: Umpire. Implementations layer: memkind, tcmalloc, cudaMalloc, cnmem. Hardware layer: DDR, GDDR.]

Umpire Concepts

§ Spaces abstract a memory location, providing an interface to inspect properties and to allocate/free via a strategy.

§ Allocator is a lightweight interface for making and querying memory allocations.

§ Allocation Strategy decouples allocations from the area they are made in, allowing for complex allocation mechanisms.

§ Operations allow allocations to be moved from one space to another. These operations will be specialized based on the source and destination.
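The following minimal C++ sketch shows how an Allocator can sit in front of an interchangeable Allocation Strategy. The namespace `sketch` and every class in it (`AllocationStrategy`, `MallocStrategy`, `ArenaStrategy`, `Allocator`) are hypothetical illustrations, not Umpire's actual API. The arena takes one block from a parent strategy and carves it up, which is one way to separate where memory comes from and how it is handed out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <string>
#include <utility>

namespace sketch {  // hypothetical names throughout

// Strategy interface: how memory is obtained, independent of the caller.
class AllocationStrategy {
 public:
  virtual ~AllocationStrategy() = default;
  virtual void* allocate(std::size_t bytes) = 0;
  virtual void deallocate(void* ptr) = 0;
};

// "System" strategy: forwards straight to std::malloc / std::free.
class MallocStrategy : public AllocationStrategy {
 public:
  void* allocate(std::size_t bytes) override { return std::malloc(bytes); }
  void deallocate(void* ptr) override { std::free(ptr); }
};

// Arena strategy: takes one block from a parent strategy and hands out
// aligned bump-pointer slices; individual frees are no-ops and the whole
// block goes back to the parent when the arena is destroyed.
class ArenaStrategy : public AllocationStrategy {
 public:
  ArenaStrategy(std::shared_ptr<AllocationStrategy> parent, std::size_t capacity)
      : parent_(std::move(parent)), capacity_(capacity) {
    base_ = static_cast<char*>(parent_->allocate(capacity_));
    assert(base_ != nullptr);
  }
  ArenaStrategy(const ArenaStrategy&) = delete;
  ArenaStrategy& operator=(const ArenaStrategy&) = delete;
  ~ArenaStrategy() override { parent_->deallocate(base_); }

  void* allocate(std::size_t bytes) override {
    const std::size_t align = alignof(std::max_align_t);
    const std::size_t start = (offset_ + align - 1) / align * align;
    if (start + bytes > capacity_) return nullptr;  // arena exhausted
    offset_ = start + bytes;
    return base_ + start;
  }
  void deallocate(void*) override {}  // reclaimed all at once in the destructor

  std::size_t used() const { return offset_; }

 private:
  std::shared_ptr<AllocationStrategy> parent_;
  std::size_t capacity_;
  char* base_ = nullptr;
  std::size_t offset_ = 0;
};

// Lightweight user-facing handle: the same calls whatever strategy is behind it.
class Allocator {
 public:
  Allocator(std::string name, std::shared_ptr<AllocationStrategy> strategy)
      : name_(std::move(name)), strategy_(std::move(strategy)) {}
  void* allocate(std::size_t bytes) { return strategy_->allocate(bytes); }
  void deallocate(void* ptr) { strategy_->deallocate(ptr); }
  const std::string& getName() const { return name_; }

 private:
  std::string name_;
  std::shared_ptr<AllocationStrategy> strategy_;
};

}  // namespace sketch
```

Code that receives an `Allocator` built over `ArenaStrategy` uses it in exactly the same way as one built over `MallocStrategy`. Only the construction step differs.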

Allocators & Allocation Strategies

§ The Umpire user interface will be based around Allocators

§ An Allocator object hides a specific implementation behind a unified interface

§ This allows accessing a particular memory space in the same way as a complex slab allocator

§ Allocators are accessed by querying a central Resource Manager

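[Figure: a single Allocator interface, allocate() and deallocate(), in front of interchangeable implementations: jemalloc, cudaMalloc, and a custom arena (MyArena)]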
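Here is a sketch of the central lookup, assuming a singleton `ResourceManager` that maps names to allocators. `ResourceManager`, `getAllocator`, `registerAllocator`, `hasAllocator`, the `"HOST"` name, and the `makeZeroedBuffer` helper are all hypothetical. `MallocBackend` stands in for a backend such as jemalloc, cudaMalloc, or an arena like MyArena.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

namespace sketch {  // hypothetical names throughout

// Backend interface; a real build might wrap jemalloc, cudaMalloc, or an arena.
struct Backend {
  virtual ~Backend() = default;
  virtual void* allocate(std::size_t bytes) = 0;
  virtual void deallocate(void* ptr) = 0;
};

struct MallocBackend : Backend {
  void* allocate(std::size_t bytes) override { return std::malloc(bytes); }
  void deallocate(void* ptr) override { std::free(ptr); }
};

// Allocator: a cheap, copyable handle over a shared backend.
class Allocator {
 public:
  explicit Allocator(std::shared_ptr<Backend> b) : backend_(std::move(b)) {}
  void* allocate(std::size_t bytes) { return backend_->allocate(bytes); }
  void deallocate(void* ptr) { backend_->deallocate(ptr); }

 private:
  std::shared_ptr<Backend> backend_;
};

// Central registry mapping names to allocators; "HOST" is registered up front.
class ResourceManager {
 public:
  static ResourceManager& getInstance() {
    static ResourceManager instance;
    return instance;
  }
  void registerAllocator(const std::string& name, std::shared_ptr<Backend> b) {
    allocators_.insert_or_assign(name, Allocator(std::move(b)));
  }
  bool hasAllocator(const std::string& name) const { return allocators_.count(name) != 0; }
  Allocator getAllocator(const std::string& name) const {
    auto it = allocators_.find(name);
    if (it == allocators_.end()) throw std::out_of_range("unknown allocator: " + name);
    return it->second;
  }

 private:
  ResourceManager() { registerAllocator("HOST", std::make_shared<MallocBackend>()); }
  std::map<std::string, Allocator> allocators_;
};

// Library code written only against Allocator; it cannot tell which backend it got.
inline double* makeZeroedBuffer(Allocator alloc, std::size_t n) {
  auto* data = static_cast<double*>(alloc.allocate(n * sizeof(double)));
  assert(data != nullptr);
  for (std::size_t i = 0; i < n; ++i) data[i] = 0.0;
  return data;
}

}  // namespace sketch
```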

Spaces

§ Spaces are created based on the accessibility of different memory resources

§ For example, on a typical CPU-GPU node:
  – 1 area for the DRAM
  – 1 area per GDRAM (device memory)
  – 1 area for “unified memory”

§ Note that although these spaces overlap, they are still separately identified

§ Once a space is constructed, it will be tied to a “system” allocator

[Figure: memory spaces on a CPU-GPU node: DDR, GDDR0, GDDR1, and UM (unified memory), which overlaps the others]
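This sketch builds the space list for the CPU-GPU node described above. It assumes the GPU count is already known; a real implementation might obtain it from `cudaGetDeviceCount`. The `Space` struct and the `discoverSpaces` function are hypothetical, and the "system" allocators are recorded only as names.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace sketch {  // hypothetical names throughout

// Descriptor for one memory space and the "system" allocator it is tied to.
struct Space {
  std::string name;              // e.g. "DDR", "GDDR0", "UM"
  std::string system_allocator;  // name of the low-level call backing the space
  bool host_accessible;
  bool device_accessible;
};

// One space for host DRAM, one per GPU's device memory, and one for unified
// memory. UM physically overlaps the others but is still its own space.
inline std::vector<Space> discoverSpaces(int num_gpus) {
  assert(num_gpus >= 0);
  std::vector<Space> spaces;
  spaces.push_back({"DDR", "malloc", true, false});
  for (int d = 0; d < num_gpus; ++d) {
    spaces.push_back({"GDDR" + std::to_string(d), "cudaMalloc", false, true});
  }
  if (num_gpus > 0) spaces.push_back({"UM", "cudaMallocManaged", true, true});
  return spaces;
}

}  // namespace sketch
```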

Operations

§ Operations allow data movement between spaces

§ The resource manager will handle all data movement, so the user only needs to provide the source pointer and the destination space

void* moved = rm.move(ptr, new_space);

§ Higher-level capabilities around data movement, like caching allocations in different locations, will be handled by other libraries/applications (e.g. CHAI)
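This host-only sketch of the `rm.move(ptr, new_space)` call assumes the resource manager records the space and size of every allocation it makes. Spaces are plain strings here, and all memory comes from `std::malloc`, so the copy is a `std::memcpy`. A real implementation would pick an operation per source/destination pair (for example, a CUDA copy for host-to-device). The class and method names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <map>
#include <string>

namespace sketch {  // hypothetical names throughout

class ResourceManager {
 public:
  void* allocate(const std::string& space, std::size_t bytes) {
    void* ptr = std::malloc(bytes);  // stand-in for the space's system allocator
    if (ptr) records_[ptr] = Record{space, bytes};
    return ptr;
  }

  void deallocate(void* ptr) {
    records_.erase(ptr);
    std::free(ptr);
  }

  // The caller supplies only the pointer and the destination space; the size
  // and source space come from the record kept at allocation time.
  void* move(void* ptr, const std::string& new_space) {
    auto it = records_.find(ptr);
    if (it == records_.end()) return nullptr;       // not allocated here
    if (it->second.space == new_space) return ptr;  // already in place
    const std::size_t bytes = it->second.bytes;
    void* moved = allocate(new_space, bytes);
    if (!moved) return nullptr;
    std::memcpy(moved, ptr, bytes);  // host-to-host stand-in for the real operation
    deallocate(ptr);
    return moved;
  }

  std::string spaceOf(void* ptr) const { return records_.at(ptr).space; }

 private:
  struct Record {
    std::string space;
    std::size_t bytes;
  };
  std::map<void*, Record> records_;
};

}  // namespace sketch
```

The source pointer is invalid after a successful move, which matches the slide's interface returning a new pointer.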

Umpire will co-ordinate with other projects

[Figure: Umpire, CHAI, Sidre, and RAJA and the dependencies between them]

Dependencies

RAJA
• Lightweight portability layer for loops (“on-node” programming model)
• Gives context for CHAI data copies

CHAI
• Lightweight pointer abstraction to make run-time data copies transparent
• Requires RAJA (and Umpire in future)

Sidre
• Data description and access for sharing across apps and tools
• Will require Umpire for allocations (future)

Umpire
• Portable memory allocation and query API
• Underpins CHAI and Sidre (future)

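CHAI

#include <iostream>
#include "RAJA/RAJA.hpp"
#include "chai/ManagedArray.hpp"

int main() {
  chai::ManagedArray<float> a(100);
  chai::ManagedArray<float> b(100);

  // init data on host

  const float x = 1.0f;

  // GPU loop: CHAI makes a and b available in device memory when the lambda is captured
  RAJA::forall<RAJA::cuda_exec<256>>(RAJA::RangeSegment(0, 100), [=] RAJA_DEVICE (int i) {
    a[i] = a[i] * x + b[i];
  });

  // Host loop: CHAI makes the updated a available in host memory again
  RAJA::forall<RAJA::seq_exec>(RAJA::RangeSegment(0, 100), [=] (int i) {
    std::cout << "a[" << i << "] = " << a[i] << std::endl;
  });

  a.free();
  b.free();
}

[Figure: arrays A and B copied between CPU and GPU memory as each loop runs, annotated “Umpire handles data allocation” and “Umpire handles data movement”]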

Initial Use Cases

§ Flexible pools for temporary allocation of GPU data
  – See Brian Ryujin’s talk

§ Passing Allocators from the application through to a library so that library data is allocated in the same place

§ CHAI in use in multiple LLNL applications
  – See Adam Kunen’s talk
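One way to sketch a pool for temporaries is to cache blocks released by the caller in per-size free lists and reuse them on the next request of the same size. `TemporaryPool` is a hypothetical name, and `std::malloc` stands in for `cudaMalloc`. Exact-size matching is a simplification. This is not the pool design described in the referenced talk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <vector>

namespace sketch {  // hypothetical names throughout

// Caching pool: a time-step loop that allocates the same temporaries every
// step only pays for the underlying allocation call on the first step.
class TemporaryPool {
 public:
  TemporaryPool() = default;
  TemporaryPool(const TemporaryPool&) = delete;
  TemporaryPool& operator=(const TemporaryPool&) = delete;
  ~TemporaryPool() { release(); }  // frees cached blocks; in-use blocks stay with the caller

  void* allocate(std::size_t bytes) {
    auto& bucket = free_[bytes];
    if (!bucket.empty()) {  // reuse a cached block of exactly this size
      void* ptr = bucket.back();
      bucket.pop_back();
      sizes_[ptr] = bytes;
      return ptr;
    }
    void* ptr = std::malloc(bytes);  // stand-in for a device allocation
    if (ptr) {
      sizes_[ptr] = bytes;
      ++backend_calls_;
    }
    return ptr;
  }

  // Returns the block to its size bucket instead of freeing it.
  void deallocate(void* ptr) {
    auto it = sizes_.find(ptr);
    if (it == sizes_.end()) return;  // not from this pool
    free_[it->second].push_back(ptr);
    sizes_.erase(it);
  }

  // Hands all cached (not in-use) blocks back to the system.
  void release() {
    for (auto& entry : free_)
      for (void* ptr : entry.second) std::free(ptr);
    free_.clear();
  }

  std::size_t backendCalls() const { return backend_calls_; }

 private:
  std::map<std::size_t, std::vector<void*>> free_;  // size -> cached blocks
  std::map<void*, std::size_t> sizes_;              // in-use block -> size
  std::size_t backend_calls_ = 0;
};

}  // namespace sketch
```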

Current Status

§ Initial implementation supporting Allocators for CPU and GPU, and simple arena allocation

§ Release process underway; will host on GitHub

§ We are interested in collaborations at all levels of the memory software stack

Allocator Concepts

§ In Umpire, the Allocator concept is a stateless object that handles allocations at the system level, and is the lowest-level component in the Umpire system.

§ The required interface is modeled on that of the C++17 allocator concept

§ We currently have wrappers around std::malloc, cudaMalloc, and cudaMallocManaged that support this interface
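The following sketch shows a stateless allocator that meets the standard C++ Allocator requirements (`value_type`, `allocate`, `deallocate`, and equality) by wrapping `std::malloc`. `MallocAllocator` is a hypothetical name. Plain `malloc` only guarantees fundamental alignment, so over-aligned types would need more care.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>
#include <vector>

namespace sketch {  // hypothetical names throughout

// Stateless: no data members, so every instance is interchangeable.
template <typename T>
struct MallocAllocator {
  using value_type = T;

  MallocAllocator() noexcept = default;
  template <typename U>
  MallocAllocator(const MallocAllocator<U>&) noexcept {}  // allows rebinding

  T* allocate(std::size_t n) {
    if (n > static_cast<std::size_t>(-1) / sizeof(T)) throw std::bad_array_new_length();
    if (void* p = std::malloc(n * sizeof(T))) return static_cast<T*>(p);
    throw std::bad_alloc();
  }

  void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

// Any two instances can free each other's memory.
template <typename T, typename U>
bool operator==(const MallocAllocator<T>&, const MallocAllocator<U>&) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const MallocAllocator<T>&, const MallocAllocator<U>&) noexcept { return false; }

}  // namespace sketch
```

Because the type is empty, `std::allocator_traits<...>::is_always_equal` is true. The type can be used directly in a standard container such as `std::vector<int, sketch::MallocAllocator<int>>`.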

Allocation Introspection

§ Umpire will support introspection of allocations and resources, allowing applications and libraries to dynamically adjust their behavior

§ Where (in what space) is this pointer?

§ How much memory is left in this space?
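This sketch answers the two introspection questions. It assumes each space has a fixed capacity and that every allocation is recorded with its space and size. `IntrospectiveManager`, `addSpace`, `spaceOf`, and `remaining` are hypothetical names. Only exact allocation start addresses are recognized, and host `std::malloc` backs every space.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <string>

namespace sketch {  // hypothetical names throughout

class IntrospectiveManager {
 public:
  void addSpace(const std::string& name, std::size_t capacity) {
    spaces_[name] = Usage{capacity, 0};
  }

  void* allocate(const std::string& space, std::size_t bytes) {
    Usage& u = spaces_.at(space);
    if (bytes > u.capacity - u.used) return nullptr;  // would exceed the space
    void* ptr = std::malloc(bytes);                   // stand-in for the space's allocator
    if (ptr) {
      u.used += bytes;
      records_[ptr] = Record{space, bytes};
    }
    return ptr;
  }

  void deallocate(void* ptr) {
    auto it = records_.find(ptr);
    if (it == records_.end()) return;  // not allocated here
    spaces_.at(it->second.space).used -= it->second.bytes;
    records_.erase(it);
    std::free(ptr);
  }

  // "Where (in what space) is this pointer?" Empty string if unknown.
  std::string spaceOf(const void* ptr) const {
    auto it = records_.find(ptr);
    return it == records_.end() ? std::string() : it->second.space;
  }

  // "How much memory is left in this space?"
  std::size_t remaining(const std::string& space) const {
    const Usage& u = spaces_.at(space);
    return u.capacity - u.used;
  }

 private:
  struct Usage { std::size_t capacity; std::size_t used; };
  struct Record { std::string space; std::size_t bytes; };
  std::map<std::string, Usage> spaces_;
  std::map<const void*, Record> records_;
};

}  // namespace sketch
```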
