CSAR Overview
Laxmikant (Sanjay) Kale, 11 September 2001
Center for Simulation of Advanced Rockets
University of Illinois at Urbana-Champaign ©
©2001 Board of Trustees of the University of Illinois
CS Faculty and Staff Investigators
T. Baker, M. Bhandarkar, M. Brandyberry, M. Campbell, E. de Sturler, H. Edelsbrunner, R. Fiedler, M. Heath, J. Jiao, L. Kale, O. Lawlor, J. Liesen, J. Norris, D. Padua, D. Reed, P. Saylor, K. Seamons, A. Sheffer, S. Teng, M. Winslett
plus numerous students
Computer Science Research Overview
- Parallel programming environment
  - Software integration framework
  - Parallel component frameworks
  - Clusters
  - Parallel I/O and data migration
  - Performance tools and techniques
  - Computational steering
  - Visualization
- Computational mathematics and geometry
  - Interface propagation and interpolation
  - Linear solvers and preconditioners
  - Eigensolvers
  - Mesh generation and adaptation
Software Integration Framework
- Flexible framework for coupling stand-alone application codes (local & grid)
- Encapsulation via objects and threads
- Runtime environment to support dynamic behavior (e.g., refinement, load balancing)
- Intelligent interface for mediating communication between component modules
- Reusable abstractions
- People (SWIFT team +): de Sturler, Heath, Kale, Geubelle, Parsons, .., Bhandarkar, Campbell, Jiao, Haselbacher, ..
APIs for Coupling Codes
Experimented with three orthogonal ideas:
- MPI based
  - Replaces subroutine calls with communication via MPI
  - “Decouples” coupled code for greater flexibility in assigning modules to processors
- Charm++ based
  - Encapsulates modules using objects and threads
  - Replaces MPI with “adaptive” MPI, transparently to the user
  - Provides automatic load balancing by migrating threads
- Autopilot based
  - Uses sensors and actuators to coordinate coupled modules
  - Provides steering and performance visualization
Current solution: incorporates ideas from all of the above (AMPI with cross communicators, integration with Roccom)
AMPI
- Adaptive load balancing for MPI programs
- Uses Charm++’s load balancing framework
- Uses multiple MPI threads per processor
[Figure: light-weight threads; several Rocflo, Rocface, and Rocsolid threads per processor]
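AMPI runs each MPI rank as a light-weight, user-level thread, so several ranks can share one processor, and a rank blocked in a receive costs nothing while the others run. The sketch below imitates only that scheduling idea on a single processor, using Python generators as stand-in threads. The request tuples, `ring_rank`, and `run_on_one_processor` are hypothetical illustrations, not part of AMPI or Charm++.

```python
from collections import deque


# Hypothetical stand-in for an MPI program: each virtual rank is a generator
# that yields ("send", dest, value) or ("recv",) requests to the scheduler.
def ring_rank(rank, size):
    """Pass a token around a ring once; every rank adds its own number."""
    if rank == 0:
        yield ("send", 1 % size, 0)
        token = yield ("recv",)
        return token
    token = yield ("recv",)
    yield ("send", (rank + 1) % size, token + rank)
    return token + rank


def run_on_one_processor(threads):
    """Cooperatively schedule several virtual ranks on one processor.

    A rank whose recv finds an empty mailbox is suspended (no busy-waiting)
    and the scheduler switches to another ready rank.
    """
    mailboxes = [deque() for _ in threads]
    ready = deque((rank, None) for rank in range(len(threads)))
    blocked = set()
    results = {}
    while ready:
        rank, value = ready.popleft()
        try:
            request = threads[rank].send(value)  # resume this thread
        except StopIteration as finished:
            results[rank] = finished.value
            continue
        if request[0] == "send":
            _, dest, payload = request
            mailboxes[dest].append(payload)
            if dest in blocked:  # wake a receiver waiting on this mailbox
                blocked.discard(dest)
                ready.append((dest, mailboxes[dest].popleft()))
            ready.append((rank, None))
        elif mailboxes[rank]:  # recv with a message already waiting
            ready.append((rank, mailboxes[rank].popleft()))
        else:  # recv must wait: suspend this thread
            blocked.add(rank)
    if blocked:
        raise RuntimeError(f"deadlock: ranks {sorted(blocked)} are waiting")
    return results
```

For example, `run_on_one_processor([ring_rank(r, 4) for r in range(4)])` lets four ranks pass a token around a ring on one processor, and rank 0 ends with 0 + 1 + 2 + 3 = 6. Because each rank's state lives in its own thread object rather than in the processor, such an object could in principle be moved elsewhere, which is what makes thread migration for load balancing possible.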
AMPI and Roc*
[Figure: Rocflo, Rocface, and Rocsolid threads]
Load Balancing with AMPI/Charm++
Turing cluster has processors with different speeds

Phase          16P3     16P2     8P3,8P2 w/o LB   8P3,8P2 w. LB
Fluid update   75.24    97.50    96.73            86.89
Solid update   41.86    52.50    52.20            46.83
Pre-Cor Iter   117.16   150.08   149.01           133.76
Time Step      235.19   301.56   299.85           267.75
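One way to see why load balancing helps on a cluster with mixed processor speeds is a speed-aware greedy assignment of work units, such as AMPI virtual ranks, to processors. This is a minimal sketch under simplifying assumptions: the work per unit is known in advance and communication is free. It uses the standard longest-processing-time heuristic, which is not necessarily the strategy Charm++ used for these runs, and all function names are hypothetical.

```python
def greedy_speed_aware(chunk_work, speeds):
    """Heaviest chunk first, each onto the processor that would finish it earliest.

    chunk_work[c] is the work of chunk c (e.g., one virtual rank); speeds[p] is
    the relative speed of processor p. Returns (assignment, makespan).
    """
    finish = [0.0] * len(speeds)
    assignment = [None] * len(chunk_work)
    for c in sorted(range(len(chunk_work)), key=lambda i: -chunk_work[i]):
        pe = min(range(len(speeds)),
                 key=lambda p: finish[p] + chunk_work[c] / speeds[p])
        finish[pe] += chunk_work[c] / speeds[pe]
        assignment[c] = pe
    return assignment, max(finish)


def round_robin(chunk_work, speeds):
    """Speed-oblivious baseline: chunk c goes to processor c mod P; returns makespan."""
    finish = [0.0] * len(speeds)
    for c, work in enumerate(chunk_work):
        p = c % len(speeds)
        finish[p] += work / speeds[p]
    return max(finish)


def ideal_mixed_time(t_fast, t_slow):
    """Idealized time with half fast and half slow processors, given the times
    with all-fast (t_fast) and all-slow (t_slow) machines of the same size.
    Aggregate speed is the average of the two, so the result is their harmonic mean."""
    return 2.0 / (1.0 / t_fast + 1.0 / t_slow)
```

With four equal chunks on one processor of speed 1.0 and one of speed 0.5, the greedy assignment finishes in 3 time units while round-robin needs 4. For the timings above, `ideal_mixed_time(235.19, 301.56)` is about 264.3. That is the time step 8 P3 plus 8 P2 processors would need if work were split in proportion to speed and nothing else cost anything. The measured 267.75 with load balancing is close to that idealized bound, while 299.85 without load balancing is close to the all-P2 time, as expected when the slower processors set the pace.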
Performance of GEN1 Using Charm++
[Figure: log-log plot versus number of processors (1 to 1000); curves for Prob 1 through Prob 5]
AMPI: Recent progress
- Compiler support for automatic conversion
  - Global variables
  - Packing-unpacking functions
- Automatic checkpointing
  - No user intervention needed, except pack-unpack for rare, complex data structures
  - Triggered by user calls, or periodically
  - Restart on a different number of processors
- Cross communicators
  - Allow multiple components to communicate across two independent MPI “worlds”
  - Implemented for Rocflo/Rocsolid separation
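Automatic checkpointing and restart on a different number of processors both depend on packing each thread's data into bytes and unpacking it elsewhere. The sketch below shows the common pattern of a single pack/unpack routine that works in both directions, plus a restart that places the unpacked ranks on a different number of processors. `Puper`, `RankState`, and the block placement are hypothetical illustrations, not Charm++'s PUP framework or AMPI's checkpoint API.

```python
import struct


class Puper:
    """Hypothetical pack/unpack helper: writes values into a byte buffer
    (packing) or reads them back in the same order (unpacking)."""

    def __init__(self, image=None):
        self.unpacking = image is not None
        self.buf = bytearray(image if image is not None else b"")
        self.pos = 0

    def _field(self, fmt, value):
        if self.unpacking:
            (value,) = struct.unpack_from(fmt, self.buf, self.pos)
            self.pos += struct.calcsize(fmt)
        else:
            self.buf += struct.pack(fmt, value)
        return value

    def pup_int(self, value):
        return self._field("<q", value)

    def pup_double(self, value):
        return self._field("<d", value)


class RankState:
    """Hypothetical state of one virtual MPI rank. The single pup() method
    serves both directions, so packing and unpacking cannot drift apart."""

    def __init__(self, rank=0, step=0, values=()):
        self.rank, self.step, self.values = rank, step, list(values)

    def pup(self, p):
        self.rank = p.pup_int(self.rank)
        self.step = p.pup_int(self.step)
        n = p.pup_int(len(self.values))
        if p.unpacking:
            self.values = [0.0] * n
        self.values = [p.pup_double(v) for v in self.values]


def checkpoint(states):
    """Pack every rank into its own byte image."""
    images = []
    for state in states:
        p = Puper()
        state.pup(p)
        images.append(bytes(p.buf))
    return images


def restart(images, num_processors):
    """Unpack all ranks, then place them on a (possibly different) number of
    processors with a simple block mapping; rank numbers do not change."""
    states = []
    for image in images:
        state = RankState()
        state.pup(Puper(image))
        states.append(state)
    per_processor = -(-len(states) // num_processors)  # ceiling division
    placement = {s.rank: s.rank // per_processor for s in states}
    return states, placement
```

Because ranks keep their numbers and only their placement changes, the MPI-level program does not need to know that it was restarted on, say, 3 processors instead of the original 8.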
AMPI and Roc*
[Figure: Rocflo, Rocface, and Rocsolid threads]
AMPI and Roc*: Communication
[Figure: communication among Rocflo, Rocface, and Rocsolid threads]
Roccom: Component Objects Manager
- Mechanisms for inter-component data exchange and function invocation
- Roccom API: programming interface for application modules
- Roccom developers’ interface: C++ interface for service modules
- Roccom implementations: Roccom is easily supported by multiple runtime systems (MPI, Charm++ (AMPI), Autopilot)
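The core Roccom idea is that components exchange data and invoke functions by name through a manager, instead of calling each other directly. The toy registry below illustrates that idea only. Grouping items into named "windows" and every method name here are assumptions made for illustration, not the actual Roccom API.

```python
class ComponentManager:
    """Toy stand-in for a component objects manager: components publish named
    data and functions in named windows; other components look them up by name."""

    def __init__(self):
        self._windows = {}

    def new_window(self, window):
        self._windows[window] = {"data": {}, "functions": {}}

    def set_array(self, window, name, array):
        # Stores a reference, so both sides see the same array (no copy).
        self._windows[window]["data"][name] = array

    def get_array(self, window, name):
        return self._windows[window]["data"][name]

    def set_function(self, window, name, func):
        self._windows[window]["functions"][name] = func

    def call_function(self, window, name, *args):
        return self._windows[window]["functions"][name](*args)


# Fluid component: registers its data and an entry point.
com = ComponentManager()
com.new_window("fluid")
pressure = [1.0, 2.0, 3.0]
com.set_array("fluid", "pressure", pressure)


def fluid_update(dt):
    p = com.get_array("fluid", "pressure")
    for i in range(len(p)):
        p[i] += dt
    return len(p)


com.set_function("fluid", "update", fluid_update)

# Orchestration component: knows only the names "fluid" and "update".
updated = com.call_function("fluid", "update", 0.5)
```

The orchestration code never imports the fluid module. It relies only on the names "fluid" and "update", which is what would let a different fluid solver be plugged in behind the same names.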
Roccom Goals
- Mechanism for data exchange and function invocation between Roc* components
- Object-oriented philosophy enforcing encapsulation and enabling polymorphism
- Minimal changes required to existing physical modules
- Minimal dependencies in component development
- Maximal flexibility for integration
Architectures with/without Roccom
- Promotes modularity
- Eases integration of modules (e.g., Rocpanda)
- Enables plug-and-play of physics modules
[Figure: GEN1 architecture (without Roccom) and GEN2 architecture (with Roccom), each built from Orchestration, Fluid, Solid, Combustion, Interface, and HDF IO modules]
Component Frameworks
Motivation:
- Reduce tedium of parallel programming for commonly used paradigms
- Encapsulate required parallel data structures and algorithms
- Provide an easy-to-use interface
  - Sequential programming style preserved
  - No alienating invasive constructs
- Use adaptive load balancing framework (and objects)
Current and planned component frameworks: FEM, Multiblock, AMR
FEM framework
- Present a clean, “almost serial” interface:
  - Hide parallel implementation in the runtime system
  - Leave physics and time integration to the user
  - Users write code similar to sequential code, or easily modify sequential code
- Input: connectivity file (mesh), boundary data and initial data
- Framework:
  - Partitions data, and starts a driver for each chunk in a separate thread
  - Automates communication, once the user registers fields to be communicated
  - Automatic dynamic load balancing
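A central service the FEM framework automates is combining values at nodes shared between chunks after each chunk has done its own local, sequential-style computation. The sketch below runs all chunks in one process and uses hypothetical function names. The real framework would instead do this with messages between chunk threads and its own field-registration calls.

```python
def partition(elements, num_chunks):
    """Split a list of elements (tuples of global node ids) into contiguous chunks."""
    size = -(-len(elements) // num_chunks)  # ceiling division
    return [elements[i:i + size] for i in range(0, len(elements), size)]


def local_assembly(chunk):
    """User 'driver' code, written as if serial: each element adds 1.0 to each
    of its nodes (like assembling a lumped mass). Nodes on chunk boundaries
    only receive this chunk's partial contribution."""
    field = {}
    for element in chunk:
        for node in element:
            field[node] = field.get(node, 0.0) + 1.0
    return field


def update_shared_nodes(chunk_fields):
    """Framework side: sum each node's contributions over all chunks that
    contain it and write the total back into every chunk's copy."""
    total = {}
    for field in chunk_fields:
        for node, value in field.items():
            total[node] = total.get(node, 0.0) + value
    for field in chunk_fields:
        for node in field:
            field[node] = total[node]
    return chunk_fields
```

For a chain of six two-node elements split into three chunks, node 2 sits on the boundary between the first two chunks. Each chunk alone sees the value 1.0 there, and after `update_shared_nodes` both copies hold the correct global value 2.0, without the driver code containing any communication.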
FEM Experience
- Previous:
  - 3-D volumetric/cohesive crack propagation code (Geubelle, Breitenfeld, et al.)
  - 3-D dendritic growth fluid solidification code (Dantzig, Jeong)
- Recent:
  - Adaptive insertion of cohesive elements
    - Mario Zaczek, Philippe Geubelle
    - Performance data
  - Multi-grain contact (in progress)
    - Spandan Maiti
    - Using FEM framework and collision detection
    - NSF funded project
    - Did initial parallelization in 4 days
Performance data: ASCI Red
[Figure: performance on ASCI Red for a mesh with 3.1 million elements]
Parallel Collision Detection
- Detect collisions (intersections) between objects scattered across processors
- Approach based on Charm++ arrays
  - Overlay a regular, sparse grid of voxels (array elements)
  - Send objects to all voxels they touch
  - Collide voxels independently and collect results
- Results: 2 µs per polygon; speedups to 1000s
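The voxel approach can be sketched with axis-aligned 2D boxes standing in for polygons and a Python dictionary standing in for the sparse Charm++ array of voxels. This is an illustrative sketch under those assumptions, with hypothetical names. Each voxel's pairwise tests are independent of every other voxel, which is what allows the voxels to be spread across processors.

```python
import math
from collections import defaultdict
from itertools import combinations


def voxels_touched(box, h):
    """All voxel indices (i, j) of a grid with spacing h that the box overlaps."""
    (x0, y0), (x1, y1) = box
    for i in range(math.floor(x0 / h), math.floor(x1 / h) + 1):
        for j in range(math.floor(y0 / h), math.floor(y1 / h) + 1):
            yield (i, j)


def boxes_overlap(a, b):
    (ax0, ay0), (ax1, ay1) = a
    (bx0, by0), (bx1, by1) = b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1


def detect_collisions(boxes, h):
    """Send each box to every voxel it touches, then test pairs voxel by voxel."""
    grid = defaultdict(list)  # sparse: only voxels that some box touches exist
    for idx, box in enumerate(boxes):
        for v in voxels_touched(box, h):
            grid[v].append(idx)
    hits = set()  # a pair sharing several voxels is reported only once
    for members in grid.values():  # each voxel could run on a different processor
        for a, b in combinations(members, 2):
            if boxes_overlap(boxes[a], boxes[b]):
                hits.add((a, b))
    return hits
```

Any two overlapping boxes share at least the voxel containing a corner of their intersection, so no collision is missed, and boxes far apart are never compared at all.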
Related Projects
- Multiphase load balancing
  - Automatically identify phases, if necessary
  - Use instrumentation of each phase to remap objects from each phase independently
- Automatic out-of-core execution
  - Take advantage of data-driven execution
  - Perfectly predictive object prefetching
  - No programmer intervention needed
- Cluster management
  - Stretchable jobs: shrink-and-expand
  - Assigned processors can be changed at runtime
  - Job scheduler to maximize throughput, using stretchable jobs as well as fixed-size ones
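A small example shows why balancing each phase separately can matter when phases are separated by synchronization: the total per-object load can look balanced while each individual phase is not. The sketch assumes each object is active in exactly one phase and that a phase lasts as long as its most loaded processor. The names are hypothetical, and automatic phase identification is not shown.

```python
def greedy_map(objects, load, num_pes, mapping):
    """Heaviest object first, each onto the currently least-loaded processor.
    Writes the result into `mapping` (object -> processor) and returns it."""
    pe_load = [0.0] * num_pes
    for obj in sorted(objects, key=lambda o: -load[o]):
        pe = min(range(num_pes), key=lambda p: pe_load[p])
        mapping[obj] = pe
        pe_load[pe] += load[obj]
    return mapping


def run_time(mapping, load, phases, num_pes):
    """Each phase lasts as long as its most loaded processor (phases are
    separated by synchronization); the total is the sum over phases."""
    total = 0.0
    for phase_objects in phases:
        pe_load = [0.0] * num_pes
        for obj in phase_objects:
            pe_load[mapping[obj]] += load[obj]
        total += max(pe_load)
    return total


load = {"A": 4.0, "B": 4.0, "C": 4.0, "D": 4.0}
phases = [["A", "C"], ["B", "D"]]  # each object is active in one phase only

# Balance by total load, ignoring phases.
by_total = greedy_map(["A", "B", "C", "D"], load, 2, {})

# Balance each phase's objects on their own.
by_phase = {}
for phase_objects in phases:
    greedy_map(phase_objects, load, 2, by_phase)
```

Mapping by total load places A with C and B with D, so each phase runs on one processor while the other idles (16 time units over both phases). Balancing each phase's objects independently gives 8.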
Parallel I/O and Data Migration
- Parallel output of snapshots for GEN1
  - Combine arrays for different blocks into a single virtual array
  - Output multiple arrays at once using an array group
  - Manage metadata for outputting HDF files for Rocketeer
- Automatic tuning of parallel I/O performance
- Data migration concurrent with application
- Automatic choice of data migration strategy
- Rocpanda 3.0 released
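Combining per-block arrays into one virtual array mainly means keeping an offset table as metadata. A single dataset can then be written, while each element can still be traced back to its block. Below is a minimal in-memory sketch that assumes nothing about Rocpanda's actual data structures or HDF layout; `VirtualArray` is a hypothetical name.

```python
from bisect import bisect_right


class VirtualArray:
    """Present several per-block arrays as one contiguous array.
    offsets[k] is the global index where block k starts (metadata to store
    alongside the combined data)."""

    def __init__(self, blocks):
        self.blocks = [list(b) for b in blocks]
        self.offsets = [0]
        for b in self.blocks:
            self.offsets.append(self.offsets[-1] + len(b))

    def __len__(self):
        return self.offsets[-1]

    def locate(self, i):
        """Map a global index to (block number, local index)."""
        if not 0 <= i < len(self):
            raise IndexError(i)
        # bisect_right skips empty blocks, which share an offset with the next block
        block = bisect_right(self.offsets, i) - 1
        return block, i - self.offsets[block]

    def __getitem__(self, i):
        block, local = self.locate(i)
        return self.blocks[block][local]

    def flatten(self):
        """The single combined array that would be written out."""
        return [x for b in self.blocks for x in b]
```

For blocks `[1, 2, 3]`, `[]`, and `[4, 5]`, the offsets are `[0, 3, 3, 5]` and global index 3 maps back to block 2, local index 0.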
Parallel I/O and Data Migration
- Parallel output of snapshots for GENx using Rocpanda
  - Support output of metadata and data to HDF files in Rocketeer’s format
  - Hide the cost of I/O with a new general buffering scheme called greedy buffering
  - Migrate output automatically to a remote workstation
- Automatic tuning of parallel I/O performance
  - Automatic selection of data migration strategy, buffer sizes and placements, communication strategy, and data layouts on disk
Mesh Generation and Adaptation
- Library for mixed 3D cohesive element meshes
  - A program for introducing cohesive elements based on material types
  - Alla Sheffer and Philippe Geubelle
- Mesh quality measures & Laplace smoothing in the ALE code
  - Alla Sheffer and Mark Brandyberry
- Continuing: space-time meshing in 2D x time
  - Alla Sheffer, Alper Ungor
- Surface parameterization
  - Alla Sheffer, Eric de Sturler, Joerg Liesen & students
  - In collaboration with Sandia (Cubit)
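Laplace smoothing moves each free node to the average position of its neighbors, which tends to improve element shape. A minimal textbook sketch follows. It is not necessarily the variant used in the ALE code, and production smoothers typically add quality checks so that no element becomes inverted.

```python
def laplace_smooth(coords, neighbors, fixed, iterations=1):
    """Plain Laplacian smoothing of a 2D mesh.

    coords: list of (x, y); neighbors: list of neighbor-index lists;
    fixed: set of node indices (e.g., boundary nodes) that must not move.
    Jacobi-style: every update in a sweep reads the previous positions.
    """
    pts = [tuple(c) for c in coords]
    for _ in range(iterations):
        new = list(pts)
        for i, nbrs in enumerate(neighbors):
            if i in fixed or not nbrs:
                continue
            new[i] = (sum(pts[j][0] for j in nbrs) / len(nbrs),
                      sum(pts[j][1] for j in nbrs) / len(nbrs))
        pts = new
    return pts
```

On a 3x3 grid of nodes whose center node has been pushed to (0.3, 0.8), one sweep with the boundary fixed returns the center to (1.0, 1.0), the average of its four neighbors.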
Interface Propagation and Data Transfer
Jim Jiao, Mike Heath
- Interface propagation
  - New approach combining the best features of marker particle and level set methods
  - Concept of null set of interface for detection of expendable data and topological change
- Interface data transfer
  - Efficient and robust algorithms for mesh association between disparate meshes
  - New algorithm for overlaying two meshes to create a reference mesh from their common refinement
  - Accurate and conservative interpolation using the overlaid reference mesh and least squares approximation
  - Parallel implementation in GEN1 integrated code
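In one dimension, the common refinement of two meshes on the same interval consists of the intervals between the merged node coordinates, each tagged with its parent cell in both meshes. This 1D sketch only illustrates the concept. Overlaying surface meshes in 3D, as Rocface does, is far harder (geometric robustness, non-matching surfaces), and `common_refinement` is a hypothetical name.

```python
def common_refinement(mesh_a, mesh_b):
    """Overlay two 1D meshes given as sorted node coordinates over the same
    interval. Returns (left, right, cell_in_a, cell_in_b) for each sub-interval."""
    if mesh_a[0] != mesh_b[0] or mesh_a[-1] != mesh_b[-1]:
        raise ValueError("meshes must cover the same interval")
    pieces = []
    i = j = 0
    left = mesh_a[0]
    # Sweep both node lists in order, like the merge step of merge sort.
    while i < len(mesh_a) - 1 and j < len(mesh_b) - 1:
        right = min(mesh_a[i + 1], mesh_b[j + 1])
        if right > left:  # skip zero-length pieces
            pieces.append((left, right, i, j))
        left = right
        if mesh_a[i + 1] == right:
            i += 1
        if mesh_b[j + 1] == right:
            j += 1
    return pieces
```

Every sub-interval lies inside exactly one cell of each mesh, so integrals over either mesh's cells can be computed exactly as sums over the sub-intervals.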
Rocface: disparate meshes
Robust and efficient algorithm for overlaying two surface meshes
Rocface: Interface Component
Robust and efficient algorithm for overlaying two surface meshes
[Figure: first surface mesh + second surface mesh = overlay]
Least Squares Data Transfer
- Minimizes error and enforces conservation
- Handles node- and element-centered data
- Made possible by the overlay
- Achieved superb experimental results
[Figure: cumulative effect over 500 steps of a coupled simulation; our method vs. load transfer (Farhat)]
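For element-centered data that is constant on each cell, the least-squares (L2) projection onto a target mesh has a closed form: each target value is the overlap-weighted average of the source values. This also preserves the integral of the field, which is the conservation property named above. The sketch is 1D and handles only piecewise-constant data. It finds overlaps with a simple double loop, whereas an overlay of the two meshes makes the same computation linear in the mesh sizes. All names are hypothetical.

```python
def l2_transfer_piecewise_constant(src_nodes, src_values, tgt_nodes):
    """Least-squares projection of cell data from one 1D mesh to another.
    On each target cell T the minimizer of the squared error is
    (1/|T|) * sum over source cells S of value(S) * |S intersect T|."""
    tgt_values = []
    for t in range(len(tgt_nodes) - 1):
        t0, t1 = tgt_nodes[t], tgt_nodes[t + 1]
        integral_over_cell = 0.0
        for s in range(len(src_nodes) - 1):
            overlap = min(t1, src_nodes[s + 1]) - max(t0, src_nodes[s])
            if overlap > 0:
                integral_over_cell += src_values[s] * overlap
        tgt_values.append(integral_over_cell / (t1 - t0))
    return tgt_values


def integral(nodes, values):
    """Integral of a piecewise-constant field; equal before and after transfer."""
    return sum(v * (nodes[k + 1] - nodes[k]) for k, v in enumerate(values))
```

Transferring values `[2.0, 4.0]` on cells `[0, 0.5]`, `[0.5, 1]` to a mesh with nodes `0, 0.25, 0.75, 1` gives `[2.0, 3.0, 4.0]`, and the integral is 3.0 on both meshes.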
Iterative Solvers
- Exact and finite precision analysis of Krylov subspace methods
  - Short-term recurrences
  - Choice of basis in minimal residual (MR) methods
- New preconditioners for indefinite systems
  - Application in surface parameterization
- Application of Krylov subspace methods in large-scale problems
  - GMRES with optimal truncation
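The conjugate gradient method for symmetric positive definite systems is a concrete instance of a Krylov method built on a short-term recurrence. Each step needs only the previous search direction, so storage does not grow with the iteration count. Full GMRES, by contrast, must keep all basis vectors, which is why truncation matters for it. This is the standard textbook algorithm in pure Python for illustration, not the group's new methods or preconditioners.

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive definite A (given as a list of rows).
    Only x, r, p and one matrix-vector product are kept per step:
    a short-term recurrence."""
    n = len(b)
    if max_iter is None:
        max_iter = n  # exact arithmetic would converge in at most n steps
    x = [0.0] * n
    r = list(b)  # residual b - A x for the initial guess x = 0
    p = list(r)
    rr = sum(v * v for v in r)
    for _ in range(max_iter):
        if rr ** 0.5 <= tol:
            break
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rr / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rr_new = sum(v * v for v in r)
        beta = rr_new / rr  # new direction depends only on the previous one
        p = [r[i] + beta * p[i] for i in range(n)]
        rr = rr_new
    return x
```

For the 2x2 system with rows `[4, 1]` and `[1, 3]` and right-hand side `[1, 2]`, two steps reach the exact solution (1/11, 7/11) up to rounding.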
Prof. Laxmikant Kale
Department of Computer Science
University of Illinois at Urbana-Champaign
2262 Digital Computer Laboratory
1304 West Springfield Avenue
Urbana, IL 61801 USA
http://www.cs.uiuc.edu/contacts/faculty/kale.html
telephone: 217-244-0094
fax: 217-333-3501