High Performance Computing and the FLAME Framework
Prof C Greenough, LS Chin and Dr DJ Worth, STFC Rutherford Appleton Laboratory
Prof M Holcombe and Dr S Coakley, Computer Science, Sheffield University
The application cannot be run on a conventional computing system:
– Insufficient memory
– Insufficient compute power
High Performance Computing (HPC) generally now means:
– Large multi-processor systems
– Complex communications hardware
– Specialised attached processors
– GRID/Cloud computing
Why High Performance Computing?
Parallel systems are in constant development and their hardware architectures are ever changing:
– simple distributed memory across multiple processors
– shared memory between multiple processors
– hybrid systems: clusters of shared-memory multiprocessors, clusters of multi-core systems
– the processors often have a multi-level cache system
Issues in High Performance Computing
Most have high-speed, multi-level communication switches.
GRID architectures are now being used for very large simulations:
– many large high-performance systems
– loosely coupled together over the internet
Performance can be improved by optimising for a specific architecture.
The code can very easily become architecture dependent.
The FLAME Framework
Based on X-Machines. Agents:
– Have memory
– Have states
– Communicate through messages
Structure of an application:
– Embedded in XML and C code
– Application generation driven by the state graph
– Agent communication managed by a library
Characteristics of FLAME
The data load:
– Size of the agents' internal memory
– The number and size of message boards
The computational load:
– Work performed in any state change
– Any I/O performed
The FLAME framework:
– Programme generator (serial/parallel)
– Provides control of states
– Provides the communications network
Characteristics of FLAME
Based on:
– the distribution of agents – computational load
– the distribution of message boards – data load
Agents only communicate via message boards (MBs).
Cross-node message information is made available to agents by message board synchronisation.
Communication between nodes is minimised by:
– Halo regions
– Message filtering
Initial Parallel Implementation
Geometric Partitioning
[Figure: the agent domain partitioned geometrically across processors P1 to P12, with halo regions of a given radius around each partition Pi.]
Parallelism in FLAME
Parallelism is hidden in the XML model and the C code – it is expressed in terms of agent locality or groupings.
Communications are captured in the XML:
– In agent function descriptions
– In message descriptions
The states are the computational load – their weight is not known until run time – they could be fine or coarse grained.
The initial distribution is based on a static analysis; the final distribution method may be based on dynamic behaviour.
Issues with HPC and FLAME
Parallelism in FLAME
Parallel agents are grouped on parallel nodes.
Messages are synchronised.
The message board library allows both serial and parallel versions to work.
Implementation details are hidden from modellers.
The system automatically manages the simulation.
The Message Board library:
– Decoupled from the FLAME framework
– Well-defined Application Program Interface (API)
– Includes functions for creating, deleting, managing and accessing information on the Message Boards
– Details such as internal data representations, memory management and communication strategies are hidden
– Uses multi-threading for work and communications
Message Boards
FLAME & the Message Boards
MB Management:
– create, delete, add message, clear board
Access to message information (iterators):
– plain, filtered, sorted, randomised
MB Synchronisation:
– moving information between nodes
– full data replication – very expensive
– filtered information using tagging
– overlapped with computation
Message Board API
Message Board Management:
– MB_Env_Init - Initialises the MB environment
– MB_Env_Finalise - Finalises the MB environment
– MB_Create - Creates a new Message Board object
– MB_AddMessage - Adds a message to a Message Board
– MB_Clear - Clears a Message Board
– MB_Delete - Deletes a Message Board
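A minimal sketch of how these management calls fit together over one simulation iteration. The header name mboard.h, the handle type MBt_Board, the MB_SUCCESS return code and the exact argument lists are assumptions based on the function names above, not a definitive rendering of the library's API:

/* Sketch: Message Board management lifecycle (assumed signatures). */
#include "mboard.h"

/* A hypothetical message type for a "location" board. */
typedef struct {
    double x, y;
    int    agent_id;
} location_message;

int main(void)
{
    MBt_Board location_board;
    location_message msg = { 1.0, 2.0, 42 };

    if (MB_Env_Init() != MB_SUCCESS) return 1;             /* start the MB environment        */

    MB_Create(&location_board, sizeof(location_message));  /* one board per message type      */
    MB_AddMessage(location_board, &msg);                    /* post a message (data is copied) */

    /* ... agents read the board here, once per iteration ... */

    MB_Clear(location_board);                               /* empty the board for the next iteration */
    MB_Delete(&location_board);                             /* destroy the board at shutdown          */
    MB_Env_Finalise();                                      /* close the MB environment               */
    return 0;
}

The same calls are used in the serial and parallel builds; as noted above, the communication details are hidden inside the library.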
The MB Environment
Message Selection & Reading - Iterators:
– MB_Iterator_Create - Creates an iterator
– MB_Iterator_CreateSorted - Creates a sorted iterator
– MB_Iterator_CreateFiltered - Creates a filtered iterator
– MB_Iterator_Delete - Deletes an iterator
– MB_Iterator_Rewind - Rewinds an iterator
– MB_Iterator_Randomise - Randomises an iterator
– MB_Iterator_GetMessage - Returns the next message
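As an illustration, traversing a board with an iterator might look like the sketch below. It reuses the hypothetical location_message type from the previous sketch and assumes that MB_Iterator_GetMessage hands back a malloc'd copy of each message, returning NULL when the snapshot is exhausted; the real calling convention may differ:

/* Sketch: reading every message on a board through an iterator (assumed API). */
#include <stdlib.h>
#include "mboard.h"

typedef struct { double x, y; int agent_id; } location_message;   /* as in the earlier sketch */

void read_all_locations(MBt_Board location_board)
{
    MBt_Iterator itr;
    location_message *m;

    MB_Iterator_Create(location_board, &itr);      /* snapshot of the local board            */

    MB_Iterator_GetMessage(itr, (void **)&m);      /* assumed: m is a copy, NULL at the end  */
    while (m != NULL) {
        /* use m->x, m->y, m->agent_id here */
        free(m);                                   /* release the copied message             */
        MB_Iterator_GetMessage(itr, (void **)&m);
    }

    MB_Iterator_Delete(&itr);                      /* discard the snapshot                   */
}

MB_Iterator_CreateSorted and MB_Iterator_CreateFiltered would be used in the same way when an agent needs an ordered or reduced view of the board.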
The Message Board API (2)
Message Synchronisation: synchronisation of boards involves the propagation of message data out across the processing nodes, as required by the agents on each node.
– MB_SyncStart - Starts synchronising a message board
– MB_SyncTest - Tests for synchronisation completion
– MB_SyncComplete - Completes the synchronisation
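Splitting synchronisation into start/test/complete is what allows it to be overlapped with computation. The sketch below shows the intended pattern; work_on_local_agents is a hypothetical stand-in for agent functions that do not read the board being synchronised, and the call signatures are again assumed from the names above:

/* Sketch: overlapping board synchronisation with local computation. */
#include "mboard.h"

extern void work_on_local_agents(void);   /* hypothetical: work that does not need this board */

void synchronise_with_overlap(MBt_Board board)
{
    MB_SyncStart(board);        /* begin propagating messages between nodes (non-blocking)   */

    work_on_local_agents();     /* useful computation while communication proceeds           */
                                /* MB_SyncTest(board, &flag) could poll for completion here  */

    MB_SyncComplete(board);     /* block until the board also holds the remote messages      */
}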
The Message Board API (3)
MB Synchronisation:
– The simplest form is full replication of message data - very expensive in communication and memory
– The MB library uses message tagging to reduce the volume of data being transferred and stored
– Tagging uses message FILTERs to select the message information to be transferred
– FILTERs are specified in the Model File XMML
The Message Board API (4)
The Message Board API (5)
Selection is based on filters. Filters are defined in the XMML. Filters can be used:
– in creating iterators, to reduce the local message list (a sketch of such a filter follows below)
– during synchronisation, to minimise cross-node communications
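For the first of these uses, a filter can be pictured as a simple predicate supplied when the iterator is created. The predicate signature, the argument order of MB_Iterator_CreateFiltered and the range_filter/range_params names below are assumptions for this sketch; in a real FLAME model the filter is declared in the XMML and the generated code makes the equivalent call:

/* Sketch: a filter predicate used to build a reduced local message list. */
#include "mboard.h"

typedef struct { double x, y; int agent_id; } location_message;   /* as in the earlier sketches */
typedef struct { double x, y, range; } range_params;              /* hypothetical filter parameters */

/* Return non-zero to keep a message, zero to discard it (assumed convention). */
static int range_filter(const void *msg, const void *params)
{
    const location_message *m = (const location_message *)msg;
    const range_params *p = (const range_params *)params;
    double dx = m->x - p->x, dy = m->y - p->y;
    return (dx * dx + dy * dy) <= (p->range * p->range);
}

void read_nearby_locations(MBt_Board board, range_params *p)
{
    MBt_Iterator itr;
    MB_Iterator_CreateFiltered(board, &itr, range_filter, p);  /* only in-range messages */
    /* ... traverse itr exactly as in the earlier iterator sketch ... */
    MB_Iterator_Delete(&itr);
}

The same predicate idea is what the synchronisation tagging exploits: only messages that a remote node's filter would accept need to be transferred.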
Iterators are objects used for traversing Message Board content. They provide users with access to messages while isolating them from the internal data representation of the Boards.
Creating an Iterator generates a list of the messages available within the Board against specific criteria. This is a snapshot of the content of the local Board.
MB Iterators (1)
MB Iterators (2)
FLAME has been successfully ported to various HPC systems:
– SCARF – 360x2.2 GHz AMD Opteron cores, 1.3 TB total memory
– HAPU – 128x2.4 GHz Opteron cores, 2 GB memory/core
– NW-Grid – 384x2.4 GHz Opteron cores, 2 or 4 GB memory/core
– HPCx – 2560x1.5 GHz Power5 cores, 2 GB memory/core
– Legion (Blue Gene/P) – 1026xPowerPC 850 MHz; 4096 cores
– Leviathan (UNIBI) – 3xIntel Xeon E5355 (Quad Core), 24 cores
Porting to Parallel Platforms
Test Models
Circles Model:
– Very simple agents
– All have position data: x, y, fx, fy, radius in memory
– Repulsion from neighbours
– 1 message type
– Domain decomposition
C@S Model:
– Mix of agents: Malls, Firms, People
– A mixture of state complexities
– All have position data
– Agents have a range of influence
– 9 message types
– Domain decomposition
Circles Model
C@S Model
Bielefeld Model
Work has only just started. The goal is to move agents between compute nodes to:
– reduce overall elapsed time
– increase parallel efficiency
There is an interaction between computational efficiency and overall elapsed time.
The requirements of communications and load may conflict!
Dynamic Load Balancing
Balance - Load vs. Communication
Distribution 1:
– P1: 13 agents
– P2: 3 agents
– P2 <--> P1: 1 channel
Distribution 2:
– P1: 9 agents
– P2: 7 agents
– P1 <--> P2: 6 channels
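To make the trade-off concrete, the toy cost model below compares the two distributions using invented per-agent and per-channel costs (the numbers and the step_time formula are purely illustrative, not measurements):

/* Toy cost model: elapsed time ~ slowest node's compute load + cross-node communication. */
#include <stdio.h>

static double step_time(int agents_p1, int agents_p2, int channels,
                        double t_agent, double t_channel)
{
    int max_agents = agents_p1 > agents_p2 ? agents_p1 : agents_p2;
    return max_agents * t_agent + channels * t_channel;
}

int main(void)
{
    double t_agent   = 1.0;   /* invented cost of updating one agent      */
    double t_channel = 2.5;   /* invented cost of one cross-node channel  */

    printf("Distribution 1: %.1f\n", step_time(13, 3, 1, t_agent, t_channel)); /* 13*1.0 + 1*2.5 = 15.5 */
    printf("Distribution 2: %.1f\n", step_time(9, 7, 6, t_agent, t_channel));  /*  9*1.0 + 6*2.5 = 24.0 */
    return 0;
}

With these made-up costs the better-balanced Distribution 2 is actually slower, because its six channels outweigh the improved load balance; with a cheaper network the ranking reverses. This is exactly the conflict noted above.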
[Figure: Distributions A and B of agents across processors P1 and P2, with frequent and occasional communication channels marked.]
Moving the Wrong Agents
Moving the wrong agents could increase the elapsed time.
Problem of Load Imbalance
[Figure: time (s) against number of partitions, comparing work by agents and elapsed time for geometric and round-robin distributions.]
Size of the agent population
Granularity of the agents:
– Is there a large computational load?
– How often do they communicate?
Inherent parallelism (locality) in the model:
– Are the agents in groups?
– Do they have short-range communication?
Size of the initial data
Size of the outputs
HPC Issues in CLIMACE
Effective initial static distributions
Effective dynamic agent migration algorithms
Sophisticated communication strategies:
– To reduce the number of communications
– To reduce synchronisations
– To reduce communication volumes
– Pre-tagging information to allow pre-fetching
Overlapping of computation with communications
Efficient use of multi-core nodes on large systems
Efficient use of attached processors
HPC Challenges for ABM