
1 Basic Charm++ and Load Balancing Gengbin Zheng charm.cs.uiuc.edu 10/11/2005.

Basic Charm++ and Load Balancing

Gengbin Zheng

charm.cs.uiuc.edu 10/11/2005


Charm++ Basics


Charm++

Parallel library for object-oriented C++ applications
Invoke functions remotely
• Messaging via remote method calls (like CORBA)
• Communication “proxy” objects
Methods called by scheduler
• System determines who runs next
Multiple objects per processor
Object migration fully supported
• Even with broadcasts, reductions


Virtualized Programming Model

User View

System implementation

User writes code in terms of communicating objects

System maps objects to processors


Chares – Concurrent Objects

Can be dynamically created on any available processor

Can be accessed from remote processors

Send messages to each other asynchronously

Contain “entry methods”


Charm++ Features: Object Arrays

A[0] A[1] A[2] A[3] A[n]

User’s view

Applications are written as a set of communicating objects


Charm++ Features: Object Arrays

Charm++ maps those objects onto processors, routing messages as needed

User’s view: A[0] A[1] A[2] A[3] A[n]

System view: A[3] A[0]


Charm++ Features: Object Arrays

Charm++ can re-map (migrate) objects for communication, load balance, fault tolerance, etc.

User’s view: A[0] A[1] A[2] A[3] A[n]

System view: A[3] A[0]


Charm++ Array Definition

Interface (.ci) file:

    array[1D] foo {
      entry foo(int problemNo);
      entry void bar(int x);
    };

In a .C file:

    class foo : public CBase_foo {
    public:
      // Remote calls
      foo(int problemNo) { ... }
      void bar(int x) { ... }
      // Migration support:
      foo(CkMigrateMessage *m) {}
      void pup(PUP::er &p) { ... }
    };


Charm++ Remote Method Calls

To call a method on a remote C++ object foo, use the local “proxy” C++ object CProxy_foo generated from the interface file.

Interface (.ci) file:

    array[1D] foo {
      entry foo(int problemNo);
      entry void bar(int x);
    };

In a .C file:

    CProxy_foo someFoo = ...;
    someFoo[i].bar(17);

Here CProxy_foo is the generated class, someFoo[i] selects the i’th object, and bar(17) gives the method and parameters.

This results in a network message, and eventually in a call to the real object’s method.

In another .C file:

    void foo::bar(int x) {
      ...
    }
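The proxy idea can be pictured with a small stand-alone sketch. This is plain C++ and not the Charm++ API: ToyFoo, ToyProxy, ToyElementProxy, ToyQueue and deliverAll are hypothetical names, and the "network" is assumed to be a simple in-memory queue. It only illustrates that a call through a proxy returns immediately and that the real method runs later, when the message is delivered.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical stand-in for a user class with an entry method bar(int).
struct ToyFoo {
  int lastX = 0;
  void bar(int x) { lastX = x; }
};

// Stand-in for the network plus scheduler queue: pending method invocations.
using ToyQueue = std::queue<std::function<void()>>;

// Hypothetical proxy to a single array element.
class ToyElementProxy {
 public:
  ToyElementProxy(ToyFoo *target, ToyQueue *q) : target_(target), q_(q) {}
  // Does not run bar(); it only records a "message" describing the call.
  void bar(int x) {
    ToyFoo *t = target_;
    q_->push([t, x] { t->bar(x); });
  }

 private:
  ToyFoo *target_;
  ToyQueue *q_;
};

// Hypothetical proxy to the whole array; operator[] picks the i'th element.
class ToyProxy {
 public:
  ToyProxy(std::vector<ToyFoo> *elems, ToyQueue *q) : elems_(elems), q_(q) {}
  ToyElementProxy operator[](std::size_t i) {
    return ToyElementProxy(&(*elems_)[i], q_);
  }

 private:
  std::vector<ToyFoo> *elems_;
  ToyQueue *q_;
};

// Stand-in for the scheduler: deliver every pending message in FIFO order.
inline void deliverAll(ToyQueue &q) {
  while (!q.empty()) {
    std::function<void()> call = std::move(q.front());
    q.pop();
    call();
  }
}
```

In this sketch, someFoo[2].bar(17) leaves the target object untouched until deliverAll runs, which mirrors the asynchronous behaviour described on the slide.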


Charm++ Startup Process: Main

Interface (.ci) file:

    module myModule {
      array[1D] foo {
        entry foo(int problemNo);
        entry void bar(int x);
      };
      mainchare myMain {
        entry myMain(int argc, char **argv);
      };
    };

In a .C file:

    #include "myModule.decl.h"
    class myMain : public CBase_myMain {
    public:
      myMain(int argc, char **argv) {
        int nElements = 7, i = nElements / 2;
        CProxy_foo f = CProxy_foo::ckNew(2, nElements);
        f[i].bar(3);
      }
    };
    #include "myModule.def.h"

The mainchare is the special startup object, CBase_myMain is a generated class, and the myMain constructor is called at startup on PE 0.


“Hello World!”

.ci file:

    mainmodule hello {
      mainchare mymain {
        entry mymain(CkArgMsg *m);
      };
    };

Generates hello.decl.h and hello.def.h

.C file:

    #include "hello.decl.h"
    class mymain : public CBase_mymain {
    public:
      mymain(CkArgMsg *m) {
        ckout << "Hello World" << endl;
        CkExit();
      }
    };
    #include "hello.def.h"


Compile and run the program

Compiling
• charmc <options> <source file>
• -o, -g, -language, -module, -tracemode

    pgm: pgm.ci pgm.h pgm.C
        charmc pgm.ci
        charmc pgm.C
        charmc -o pgm pgm.o -language charm++

To run a Charm++ program named "pgm" on four processors, type:

    charmrun pgm +p4 <params>

Nodelist file (for network architecture)
• list of machines to run the program
• host <hostname> <qualifiers>

Example nodelist file:

    group main ++shell ssh
    host Host1
    host Host2


Charm++: Portability

Runs on:
Any machine with MPI, including
• IBM SP, Blue Gene/L
• Cray XT3
• Origin2000
PSC’s Lemieux (Quadrics Elan)
Clusters with Ethernet (UDP/TCP)
Clusters with Myrinet (GM)
Clusters with Ammasso cards
Apple clusters
Even Windows!

SMP-aware (pthreads)


Build Charm++

Download from website
• http://charm.cs.uiuc.edu/download.html
Build Charm++
• ./build <target> <version> <options> [compile flags]
• ./build charm++ net-linux gm -g
• Parallel make (-j2)
Compile code using charmc
• Portable compiler wrapper
• Link with “-language charm++”
Run code using charmrun


How Does Charmrun Work?

[Figure: Charmrun launched as "charmrun +p4 ./pgm"; steps labeled ssh, connect, acknowledge]


Charmrun (batch mode)

[Figure: Charmrun launched as "charmrun ++batch 8"; steps labeled ssh, connect, acknowledge]


Debugging Charm++ Applications

printf
gdb
• Sequentially (standalone mode)
    gdb ./pgm +vp16
• Run debugger in xterm
    charmrun +p4 pgm ++debug
    charmrun +p4 pgm ++debug-no-pause
Memory paranoid
Parallel debugger


Charm++ Features


Message Driven Execution

[Figure: two processors, each with its own scheduler and message queue]

Virtualization leads to Message Driven Execution


Prioritized Messages

Number of priority bits passed during message allocation

    FooMsg *msg = new (size, nbits) FooMsg;

Priorities stored at the end of messages

Signed integer priorities:

    *CkPriorityPtr(msg) = -1;
    CkSetQueueing(msg, CK_QUEUEING_IFIFO);

Unsigned bitvector priorities:

    CkPriorityPtr(msg)[0] = 0x7fffffff;
    CkSetQueueing(msg, CK_QUEUEING_BFIFO);
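As a rough illustration of integer-priority FIFO queueing, the following stand-alone sketch orders messages with a standard priority queue. It is plain C++ and not the Charm++ scheduler: ToyMsg, RunsLater and ToyPriorityQueue are hypothetical names. It assumes that a smaller integer means a more urgent message and that messages of equal priority are served in arrival order.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <string>
#include <vector>

// Hypothetical message: an integer priority, an arrival number, a payload.
struct ToyMsg {
  int priority;        // assumption: smaller value = more urgent
  std::uint64_t seq;   // arrival order, used to break ties FIFO
  std::string payload;
};

// Comparator for std::priority_queue: returns true when a should run after b.
struct RunsLater {
  bool operator()(const ToyMsg &a, const ToyMsg &b) const {
    if (a.priority != b.priority) return a.priority > b.priority;
    return a.seq > b.seq;
  }
};

class ToyPriorityQueue {
 public:
  void enqueue(int priority, const std::string &payload) {
    heap_.push(ToyMsg{priority, nextSeq_++, payload});
  }
  bool empty() const { return heap_.empty(); }
  // Removes and returns the payload of the most urgent message.
  std::string dequeue() {
    std::string payload = heap_.top().payload;
    heap_.pop();
    return payload;
  }

 private:
  std::priority_queue<ToyMsg, std::vector<ToyMsg>, RunsLater> heap_;
  std::uint64_t nextSeq_ = 0;
};
```

With this sketch, a message enqueued with priority -1 is dequeued before earlier messages of priority 0, while two priority-0 messages keep their arrival order.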


Advanced Message Features

Expedited messages
• Messages do not go through the Charm++ scheduler (faster)
• Top priority messages
Immediate messages
• Entries are executed in an interrupt or the communication thread
• Very fast, but tough to get right


Object Migration


How to Migrate a Virtual Processor?

Move all application state to new processor

Stack data (threads)
• Subroutine variables and calls
• Managed by compiler
Heap data
• Allocated with malloc/free
• Managed by user
Global variables
Open files, environment variables, etc. (not handled yet!)


Migration Solutions

Stack data (threads)
• Automatic: isomalloc stacks
Heap data
• Use “-memory isomalloc” -or-
• Write pup routines
Global variables
• Use “-swapglobals”
  • Works on ELF platforms (Linux and Sun)
  • Just a pointer swap, no data copying
• -or- Remove globals entirely


Migrate Heap Data: PUP

Packing/unpacking user allocated data

Basic contract: here is my data
• Sizing: counts up data size
• Packing: copies data into message
• Unpacking: copies data back out
• Same call works for network, memory, disk I/O ...
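The contract above can be sketched in plain C++ without the Charm++ headers. ToyPupper and ToyMesh below are hypothetical names and are not the real PUP::er interface; the sketch assumes a byte-copy of trivially copyable element types. The point is that one pup routine serves sizing, packing and unpacking.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical pupper: one interface, three modes.
class ToyPupper {
 public:
  enum Mode { Sizing, Packing, Unpacking };
  // buf may be nullptr in Sizing mode, where no bytes are copied.
  ToyPupper(Mode mode, std::vector<char> *buf) : mode_(mode), buf_(buf) {}
  bool isUnpacking() const { return mode_ == Unpacking; }
  std::size_t size() const { return pos_; }

  // "Here is my data": n raw bytes starting at data.
  void bytes(void *data, std::size_t n) {
    if (mode_ == Packing) {
      const char *src = static_cast<const char *>(data);
      buf_->insert(buf_->end(), src, src + n);          // copy into the message
    } else if (mode_ == Unpacking) {
      assert(pos_ + n <= buf_->size());
      std::memcpy(data, buf_->data() + pos_, n);        // copy back out
    }
    pos_ += n;                                           // sizing just counts
  }

 private:
  Mode mode_;
  std::vector<char> *buf_;
  std::size_t pos_ = 0;
};

// Hypothetical user type with heap data, loosely modeled on a mesh.
struct ToyMesh {
  std::vector<float> nodes;
  std::vector<int> elts;

  // The same routine is used for all three modes.
  void pup(ToyPupper &p) {
    pupVector(p, nodes);
    pupVector(p, elts);
  }

 private:
  template <class T>
  static void pupVector(ToyPupper &p, std::vector<T> &v) {
    std::size_t n = v.size();
    p.bytes(&n, sizeof n);               // length first, so unpacking can resize
    if (p.isUnpacking()) v.resize(n);
    if (n > 0) p.bytes(v.data(), n * sizeof(T));
  }
};
```

A typical round trip in this sketch runs pup once with a Sizing pupper, once with a Packing pupper into a buffer, and once with an Unpacking pupper on a fresh ToyMesh.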


Migrate Heap Data: PUP C++ Example

    #include "pup.h"
    #include "pup_stl.h"

    class myMesh {
      std::vector<float> nodes;
      std::vector<int> elts;
    public:
      ...
      void pup(PUP::er &p) {
        p|nodes;
        p|elts;
      }
    };


Migrate Heap Data: PUP F90 Example

    TYPE myMesh
      INTEGER :: nn, ne
      REAL*4, ALLOCATABLE, DIMENSION(:) :: nodes
      INTEGER, ALLOCATABLE, DIMENSION(:) :: elts
    END TYPE

    SUBROUTINE pupMesh(p, mesh)
      USE MODULE ...
      INTEGER :: p
      TYPE(myMesh) :: mesh
      CALL fpup_int(p, mesh%nn)
      CALL fpup_int(p, mesh%ne)
      IF (fpup_isUnpacking(p)) THEN
        ALLOCATE(mesh%nodes(mesh%nn))
        ALLOCATE(mesh%elts(mesh%ne))
      END IF
      CALL fpup_floats(p, mesh%nodes, mesh%nn)
      CALL fpup_ints(p, mesh%elts, mesh%ne)
      IF (fpup_isDeleting(p)) CALL deleteMesh(mesh)
    END SUBROUTINE


Automatic Load Balancing


Motivation

Irregular or dynamic applications
• Initial static load balancing
• Application behaviors change dynamically
• Difficult to implement with good parallel efficiency
Versatile, automatic load balancers
• Application independent
• Little or no user effort is needed for load balancing
• Work for both Charm++ and Adaptive MPI


Using Dynamic Mapping to Processors

Migrate objects between processors
• Use that for dynamic (and static, initial) load balancing
Two major approaches
• No predictability of load patterns
  • Fully dynamic
  • Early work on State Space Search, Branch&Bound, ..
• With certain predictability
  • Measurement-based load balancing strategy
  • CSE, molecular dynamics simulation


Applications That Lack Predictability

Flow of tasks: the application generates a continuous flow of tasks

The goal of the load balancing strategies is to balance these tasks across the system for fast response time and better throughput

Tasks are assigned at creation time, with no migration afterwards


Seed Load Balancing

Neighborhood averaging with work-stealing when idle, using immediate messages

Load balancing among neighboring processors
• Load is represented by length of queue
Work-stealing at idle time with interruption-based message
• Fast response to the request

[Figure: 80000 objects, 10% heavy objects]


Link with a seed load balancer

Use -balance <random|neighbor>
• charmc -o pgm pgm.o -balance neighbor
Specify topology
• +LBTopo <ring|torus2d|…>


Principle of Persistence

Once an application is expressed in terms of interacting objects, object communication patterns and computational loads tend to persist over time
• In spite of dynamic behavior
  • Abrupt and large, but infrequent changes (e.g., AMR)
  • Slow and small changes (e.g., particle migration)
Parallel analog of principle of locality
• A heuristic that holds for most CSE applications

Run-time instrumentation is possible


Measurement Based Load Balancing

Runtime instrumentation
• Measures CPU load per object
• Measures communication volume between objects
Measurement based load balancers
• Use the instrumented database periodically to make new decisions
• A load balancing strategy takes the database as input and generates a new object-to-processor mapping
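To make the "instrumented database" concrete, here is a minimal stand-alone sketch of what such a database might hold and two quantities a strategy could derive from it. The layout and the names ToyObjRecord, ToyCommRecord, ToyLoadDatabase, peLoads, imbalance and remoteBytes are assumptions for illustration only; they are not the Charm++ load balancing database.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical per-object measurement: CPU load and current processor.
struct ToyObjRecord {
  double cpuLoad;
  int pe;
};

// Hypothetical communication edge between two objects (indices into objs).
struct ToyCommRecord {
  std::size_t from;
  std::size_t to;
  double bytes;
};

struct ToyLoadDatabase {
  int numPes = 1;
  std::vector<ToyObjRecord> objs;
  std::vector<ToyCommRecord> comm;
};

// Sum of measured object loads on each processor.
std::vector<double> peLoads(const ToyLoadDatabase &db) {
  std::vector<double> load(static_cast<std::size_t>(db.numPes), 0.0);
  for (const ToyObjRecord &o : db.objs)
    load[static_cast<std::size_t>(o.pe)] += o.cpuLoad;
  return load;
}

// Ratio of the maximum processor load to the average; 1.0 is perfectly balanced.
double imbalance(const ToyLoadDatabase &db) {
  const std::vector<double> load = peLoads(db);
  const double total = std::accumulate(load.begin(), load.end(), 0.0);
  if (total == 0.0) return 1.0;
  const double maxLoad = *std::max_element(load.begin(), load.end());
  return maxLoad / (total / static_cast<double>(load.size()));
}

// Communication volume that crosses processor boundaries under the current mapping.
double remoteBytes(const ToyLoadDatabase &db) {
  double sum = 0.0;
  for (const ToyCommRecord &c : db.comm)
    if (db.objs[c.from].pe != db.objs[c.to].pe) sum += c.bytes;
  return sum;
}
```

A strategy in this picture would read such a database, propose new pe values for the objects, and could use imbalance and remoteBytes to compare the old and new mappings.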


Load Balancing – graph partitioning

[Figure: mapping of objects on Charm++ PEs, and the weighted object graph in view of the load balancer (LB view)]


Charm++ Load Balancer in Action

Automatic Load Balancing in Crack Propagation


Load Balancer Categories

Centralized
• Object load data are sent to processor 0
• Integrate to a complete object graph
• Migration decision is broadcast from processor 0
• Global barrier

Distributed
• Load balancing among neighboring processors
• Build partial object graph
• Migration decision is sent to its neighbors
• No global barrier


Main Centralized Load Balancing Strategies

GreedyCommLB
• A “greedy” load balancing strategy which uses the process load and communications graph to map the processes with the highest load onto the processors with the lowest load, while trying to keep communicating processes on the same processor

RefineLB
• Incremental adjustment by moving objects off overloaded processors to under-utilized processors to reach average load

MetisLB
• Uses the METIS graph partitioning library to partition the object-communication graph with node (object) weights and communication loads on edges

OrbLB
• Treats objects with spatial coordinates. It applies an orthogonal recursive bisection algorithm which attempts to provide a more balanced division of space.

Others
• The manual discusses several other load balancers which are not used as often, but may be useful in some cases; also, more are being developed
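The greedy idea (heaviest objects first, each onto the currently lightest processor) can be sketched in a few lines of stand-alone C++. This is an illustration under simplifying assumptions, not the GreedyCommLB implementation: the hypothetical function greedyMap ignores the communication graph entirely and starts from empty processors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <queue>
#include <utility>
#include <vector>

// Returns objToPe, where objToPe[i] is the processor chosen for object i.
// Assumption: only per-object load is considered; communication is ignored.
std::vector<int> greedyMap(const std::vector<double> &objLoad, int numPes) {
  assert(numPes > 0);

  // Visit objects from heaviest to lightest (stable, so ties keep index order).
  std::vector<std::size_t> order(objLoad.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) {
                     return objLoad[a] > objLoad[b];
                   });

  // Min-heap of (accumulated load, processor); ties go to the lower processor.
  using PeEntry = std::pair<double, int>;
  std::priority_queue<PeEntry, std::vector<PeEntry>, std::greater<PeEntry>> pes;
  for (int pe = 0; pe < numPes; ++pe) pes.push(PeEntry(0.0, pe));

  std::vector<int> objToPe(objLoad.size(), -1);
  for (std::size_t obj : order) {
    PeEntry lightest = pes.top();   // currently least-loaded processor
    pes.pop();
    objToPe[obj] = lightest.second;
    lightest.first += objLoad[obj];
    pes.push(lightest);
  }
  return objToPe;
}
```

A refinement-style strategy would instead start from the existing mapping and move only a few objects, which usually means fewer migrations than a from-scratch greedy assignment.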


Load Balancing Strategies

[Figure: class hierarchy rooted at BaseLB, with CentralLB and NborBaseLB beneath it. Strategies shown: OrbLB, DummyLB, MetisLB, RecBisectBfLB, GreedyLB, RandCentLB, RefineLB, GreedyCommLB, RandRefLB, RefineCommLB, GreedyRefLB, NeighborLB]


Neighborhood Load Balancing Strategies

NeighborLB
• Each processor tries to average out its load only among its neighbors

WSLB
• A load balancer for timeshared workstation clusters, which can detect load changes on desktops and adjust load without interfering with others' use of the desktop
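Neighborhood averaging can be illustrated with one step on a ring of processors. The sketch below is stand-alone C++ with a hypothetical function name, ringAverageStep, and it makes two simplifying assumptions: load is treated as a divisible number rather than as whole objects, and each processor's neighbors are its left and right neighbors on a ring. It is not the NeighborLB implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One averaging step: every processor ends up with the mean of its own load
// and the loads of its two ring neighbors. Total load is conserved, and no
// processor needs information from beyond its neighborhood.
std::vector<double> ringAverageStep(const std::vector<double> &load) {
  const std::size_t n = load.size();
  assert(n >= 3);  // assumption: left and right neighbors are distinct
  std::vector<double> next(n);
  for (std::size_t i = 0; i < n; ++i) {
    const double left = load[(i + n - 1) % n];
    const double right = load[(i + 1) % n];
    next[i] = (left + load[i] + right) / 3.0;
  }
  return next;
}
```

Repeating the step spreads a local hot spot outward gradually, which reflects the trade-off of neighborhood schemes: no global barrier, but slower convergence than a centralized decision.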


Compiler Interface

Link time options
-module: Link load balancers as modules
• -module EveryLB
Link multiple modules into binary
• -balancer GreedyCommLB -balancer RefineLB
• -balancer ComboCentLB:GreedyLB,RefineLB


Runtime Options

Run-time options do the same thing, but override the compile time options
+balancer: invoke a load balancer
Can have multiple load balancers
• +balancer GreedyCommLB +balancer RefineLB


When to Re-balance Load?

Programmer control: ReadyLoadBalance()
• Enable load balancing at specific point
  • Object ready to migrate
  • Re-balance if needed
• ReadyLoadBalance() called when your chare is ready to be load balanced; load balancing may not start right away
• ResumeFromSync() called when load balancing for this chare has finished

Default: Load balancer is periodic
• Provide period as a runtime parameter (+LBPeriod)


Thank You!

Free source, binaries, manuals, and more information at: http://charm.cs.uiuc.edu/

Parallel Programming Lab at University of Illinois
