High Performance Computing and the FLAME Framework
Prof C Greenough, LS Chin and Dr DJ Worth, STFC Rutherford Appleton Laboratory
Prof M Holcombe and Dr S Coakley, Computer Science, Sheffield University
The application cannot be run on a conventional computing system:
– Insufficient memory
– Insufficient compute power
High Performance Computing (HPC) generally now means:
– Large multi-processor systems
– Complex communications hardware
– Specialised attached processors
– GRID/Cloud computing
Why High Performance Computing?
Parallel systems are in constant development and their hardware architectures are ever changing:
– simple distributed memory across multiple processors
– shared memory between multiple processors
– hybrid systems: clusters of shared-memory multiprocessors, clusters of multi-core systems
– the processors often have a multi-level cache system
Issues in High Performance Computing
Most have high-speed, multi-level communication switches.
GRID architectures are now being used for very large simulations:
– many large high-performance systems
– loosely coupled together over the internet
Performance can be improved by optimising for a specific architecture.
The code can very easily become architecture dependent.
The FLAME Framework
Based on X-Machines. Agents:
– Have memory
– Have states
– Communicate through messages
Structure of an application:
– Embedded in XML and C code
– Application generation driven by the state graph
– Agent communication managed by a library
Characteristics of FLAME
The data load:
– Size of the agents' internal memory
– The number and size of message boards
The computational load:
– Work performed in any state change
– Any I/O performed
The FLAME framework:
– Programme generator (serial/parallel)
– Provides control of states
– Provides the communications network
Characteristics of FLAME
Based on:
– the distribution of agents – computational load
– the distribution of message boards – data load
Agents only communicate via message boards (MBs).
Cross-node message information is made available to agents by message board synchronisation.
Communication between nodes is minimised by:
– Halo regions
– Message filtering
Initial Parallel Implementation
Geometric Partitioning
[Figure: the agent domain partitioned geometrically across processors P1 to P12, with halo regions of a given radius around each partition Pi.]
Parallelism in FLAME
Parallelism is hidden in the XML model and the C code – it is expressed in terms of agent locality or groupings.
Communications are captured in the XML:
– In agent function descriptions
– In message descriptions
The states are the computational load – their weight is not known until run time – they could be fine or coarse grained.
The initial distribution is based on a static analysis; the final distribution method may be based on dynamic behaviour.
Issues with HPC and FLAME
Parallelism in FLAME
Parallel agents are grouped on parallel nodes.
Messages are synchronised.
The message board library allows both serial and parallel versions to work.
Implementation details are hidden from modellers.
The system automatically manages the simulation.
The Message Board library:
– Decoupled from the FLAME framework
– Well-defined Application Program Interface (API)
– Includes functions for creating, deleting, managing and accessing information on the Message Boards
– Details such as internal data representations, memory management and communication strategies are hidden
– Uses multi-threading for work and communications
Message Boards
FLAME & the Message Boards
MB Management:
– create, delete, add message, clear board
Access to message information (iterators):
– plain, filtered, sorted, randomised
MB Synchronisation:
– moving information between nodes
– full data replication – very expensive
– filtered information using tagging
– overlapped with computation
Message Board API
Message Board Management:
– MB_Env_Init - Initialises the MB environment
– MB_Env_Finalise - Finalises the MB environment
– MB_Create - Creates a new Message Board object
– MB_AddMessage - Adds a message to a Message Board
– MB_Clear - Clears a Message Board
– MB_Delete - Deletes a Message Board
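A minimal sketch of how these management calls fit together over one simulation iteration. The header name mboard.h, the handle type MBt_Board, the MB_SUCCESS return code and the exact argument lists are assumptions based on the function names above, not a definitive rendering of the library's API:

/* Sketch: Message Board management lifecycle (assumed signatures). */
#include "mboard.h"

/* A hypothetical message type for a "location" board. */
typedef struct {
    double x, y;
    int    agent_id;
} location_message;

int main(void)
{
    MBt_Board location_board;
    location_message msg = { 1.0, 2.0, 42 };

    if (MB_Env_Init() != MB_SUCCESS) return 1;             /* start the MB environment        */

    MB_Create(&location_board, sizeof(location_message));  /* one board per message type      */
    MB_AddMessage(location_board, &msg);                    /* post a message (data is copied) */

    /* ... agents read the board here, once per iteration ... */

    MB_Clear(location_board);                               /* empty the board for the next iteration */
    MB_Delete(&location_board);                             /* destroy the board at shutdown          */
    MB_Env_Finalise();                                      /* close the MB environment               */
    return 0;
}

The same calls are used in the serial and parallel builds; as noted above, the communication details are hidden inside the library.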
The MB Environment
Message Selection & Reading - Iterators:
– MB_Iterator_Create - Creates an iterator
– MB_Iterator_CreateSorted - Creates a sorted iterator
– MB_Iterator_CreateFiltered - Creates a filtered iterator
– MB_Iterator_Delete - Deletes an iterator
– MB_Iterator_Rewind - Rewinds an iterator
– MB_Iterator_Randomise - Randomises an iterator
– MB_Iterator_GetMessage - Returns the next message
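As an illustration, traversing a board with an iterator might look like the sketch below. It reuses the hypothetical location_message type from the previous sketch and assumes that MB_Iterator_GetMessage hands back a malloc'd copy of each message, returning NULL when the snapshot is exhausted; the real calling convention may differ:

/* Sketch: reading every message on a board through an iterator (assumed API). */
#include <stdlib.h>
#include "mboard.h"

typedef struct { double x, y; int agent_id; } location_message;   /* as in the earlier sketch */

void read_all_locations(MBt_Board location_board)
{
    MBt_Iterator itr;
    location_message *m;

    MB_Iterator_Create(location_board, &itr);      /* snapshot of the local board            */

    MB_Iterator_GetMessage(itr, (void **)&m);      /* assumed: m is a copy, NULL at the end  */
    while (m != NULL) {
        /* use m->x, m->y, m->agent_id here */
        free(m);                                   /* release the copied message             */
        MB_Iterator_GetMessage(itr, (void **)&m);
    }

    MB_Iterator_Delete(&itr);                      /* discard the snapshot                   */
}

MB_Iterator_CreateSorted and MB_Iterator_CreateFiltered would be used in the same way when an agent needs an ordered or reduced view of the board.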
The Message Board API (2)
Message Synchronisation: synchronisation of boards involves the propagation of message data out across the processing nodes, as required by the agents on each node.
– MB_SyncStart - Starts synchronising a message board
– MB_SyncTest - Tests for synchronisation completion
– MB_SyncComplete - Completes the synchronisation
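Splitting synchronisation into start/test/complete is what allows it to be overlapped with computation. The sketch below shows the intended pattern; work_on_local_agents is a hypothetical stand-in for agent functions that do not read the board being synchronised, and the call signatures are again assumed from the names above:

/* Sketch: overlapping board synchronisation with local computation. */
#include "mboard.h"

extern void work_on_local_agents(void);   /* hypothetical: work that does not need this board */

void synchronise_with_overlap(MBt_Board board)
{
    MB_SyncStart(board);        /* begin propagating messages between nodes (non-blocking)   */

    work_on_local_agents();     /* useful computation while communication proceeds           */
                                /* MB_SyncTest(board, &flag) could poll for completion here  */

    MB_SyncComplete(board);     /* block until the board also holds the remote messages      */
}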
The Message Board API (3)
MB Synchronisation:
– The simplest form is full replication of message data - very expensive in communication and memory
– The MB library uses message tagging to reduce the volume of data being transferred and stored
– Tagging uses message FILTERs to select the message information to be transferred
– FILTERs are specified in the Model File XMML
The Message Board API (4)
The Message Board API (5)
Selection is based on filters. Filters are defined in the XMML. Filters can be used:
– in creating iterators, to reduce the local message list (a sketch of such a filter follows below)
– during synchronisation, to minimise cross-node communications
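For the first of these uses, a filter can be pictured as a simple predicate supplied when the iterator is created. The predicate signature, the argument order of MB_Iterator_CreateFiltered and the range_filter/range_params names below are assumptions for this sketch; in a real FLAME model the filter is declared in the XMML and the generated code makes the equivalent call:

/* Sketch: a filter predicate used to build a reduced local message list. */
#include "mboard.h"

typedef struct { double x, y; int agent_id; } location_message;   /* as in the earlier sketches */
typedef struct { double x, y, range; } range_params;              /* hypothetical filter parameters */

/* Return non-zero to keep a message, zero to discard it (assumed convention). */
static int range_filter(const void *msg, const void *params)
{
    const location_message *m = (const location_message *)msg;
    const range_params *p = (const range_params *)params;
    double dx = m->x - p->x, dy = m->y - p->y;
    return (dx * dx + dy * dy) <= (p->range * p->range);
}

void read_nearby_locations(MBt_Board board, range_params *p)
{
    MBt_Iterator itr;
    MB_Iterator_CreateFiltered(board, &itr, range_filter, p);  /* only in-range messages */
    /* ... traverse itr exactly as in the earlier iterator sketch ... */
    MB_Iterator_Delete(&itr);
}

The same predicate idea is what the synchronisation tagging exploits: only messages that a remote node's filter would accept need to be transferred.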
Iterators are objects used for traversing Message Board content. They provide users with access to messages while isolating them from the internal data representation of the Boards.
Creating an Iterator generates a list of the messages available within the Board against specific criteria. This is a snapshot of the content of the local Board.
MB Iterators (1)
MB Iterators (2)
FLAME has been successfully ported to various HPC systems:
– SCARF – 360x2.2 GHz AMD Opteron cores, 1.3 TB total memory
– HAPU – 128x2.4 GHz Opteron cores, 2 GB memory/core
– NW-Grid – 384x2.4 GHz Opteron cores, 2 or 4 GB memory/core
– HPCx – 2560x1.5 GHz Power5 cores, 2 GB memory/core
– Legion (Blue Gene/P) – 1026xPowerPC 850 MHz; 4096 cores
– Leviathan (UNIBI) – 3xIntel Xeon E5355 (Quad Core), 24 cores
Porting to Parallel Platforms
Test Models
Circles Model:
– Very simple agents
– All have position data: x, y, fx, fy, radius in memory
– Repulsion from neighbours
– 1 message type
– Domain decomposition
C@S Model:
– Mix of agents: Malls, Firms, People
– A mixture of state complexities
– All have position data
– Agents have a range of influence
– 9 message types
– Domain decomposition
Circles Model
C@S Model
Bielefeld Model
Work has only just started. The goal is to move agents between compute nodes to:
– reduce overall elapsed time
– increase parallel efficiency
There is an interaction between computational efficiency and overall elapsed time.
The requirements of communications and load may conflict!
Dynamic Load Balancing
Balance - Load vs. Communication
Distribution 1:
– P1: 13 agents
– P2: 3 agents
– P2 <--> P1: 1 channel
Distribution 2:
– P1: 9 agents
– P2: 7 agents
– P1 <--> P2: 6 channels
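To make the trade-off concrete, the toy cost model below compares the two distributions using invented per-agent and per-channel costs (the numbers and the step_time formula are purely illustrative, not measurements):

/* Toy cost model: elapsed time ~ slowest node's compute load + cross-node communication. */
#include <stdio.h>

static double step_time(int agents_p1, int agents_p2, int channels,
                        double t_agent, double t_channel)
{
    int max_agents = agents_p1 > agents_p2 ? agents_p1 : agents_p2;
    return max_agents * t_agent + channels * t_channel;
}

int main(void)
{
    double t_agent   = 1.0;   /* invented cost of updating one agent      */
    double t_channel = 2.5;   /* invented cost of one cross-node channel  */

    printf("Distribution 1: %.1f\n", step_time(13, 3, 1, t_agent, t_channel)); /* 13*1.0 + 1*2.5 = 15.5 */
    printf("Distribution 2: %.1f\n", step_time(9, 7, 6, t_agent, t_channel));  /*  9*1.0 + 6*2.5 = 24.0 */
    return 0;
}

With these made-up costs the better-balanced Distribution 2 is actually slower, because its six channels outweigh the improved load balance; with a cheaper network the ranking reverses. This is exactly the conflict noted above.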
[Figure: Distributions A and B of agents across processors P1 and P2, with frequent and occasional communication channels marked.]
Moving the Wrong Agents
Moving the wrong agents could increase the elapsed time.
Problem of Load Imbalance
[Figure: time (s) against number of partitions, comparing work by agents and elapsed time for geometric and round-robin distributions.]
Size of the agent population
Granularity of the agents:
– Is there a large computational load?
– How often do they communicate?
Inherent parallelism (locality) in the model:
– Are the agents in groups?
– Do they have short-range communication?
Size of the initial data
Size of the outputs
HPC Issues in CLIMACE
Effective initial static distributions
Effective dynamic agent migration algorithms
Sophisticated communication strategies:
– To reduce the number of communications
– To reduce synchronisations
– To reduce communication volumes
– Pre-tagging information to allow pre-fetching
Overlapping of computation with communications
Efficient use of multi-core nodes on large systems
Efficient use of attached processors
HPC Challenges for ABM