What is the ACTS Collection?

NERSC User Group Meeting

The DOE ACTS Collection

Osni Marques, Lawrence Berkeley National Laboratory

[email protected]

09/19/2007 NUG Meeting

What is the ACTS Collection?

• Advanced CompuTational Software Collection

• Tools for developing parallel applications

• ACTS started as an “umbrella” project

http://acts.nersc.gov

Goals
• Extended support for experimental software
• Make ACTS tools available on DOE computers
• Provide technical support ([email protected])
• Maintain ACTS information center (http://acts.nersc.gov)
• Coordinate efforts with other supercomputing centers
• Enable large scale scientific applications
• Educate and train

ACTS Timeline

[Figure: ACTS timeline with marks at 1999, 2003 and 2007. Diagram labels: ACTS Toolkit; ACTS; User Community; Challenge Codes; Computing Systems; Interoperability; Pool of Software Tools; Testing and Acceptance Phase; Workshops and Training; Scientific Computing Centers; Computer Vendors; Numerical Simulations; Physics; Chemistry; Biology; Medicine; Mathematics; Bioinformatics; Computer Sciences; Engineering; Software Collection; Software Sustainability Center; Software Tool Box.]

Challenges in the Development of Scientific Codes

• Research in computational sciences is fundamentally interdisciplinary

• The development of complex simulation codes on high-end computers is not a trivial task

• Productivity
  • Time to the first solution (prototype)
  • Time to solution (production)
  • Other requirements

• Complexity
  • Increasingly sophisticated models
  • Model coupling
  • Interdisciplinarity

• Performance
  • Increasingly complex algorithms
  • Increasingly complex architectures
  • Increasingly demanding applications

• Libraries written in different languages

• Discussions about standardizing interfaces are often sidetracked into implementation issues

• Difficulties managing multiple libraries developed by third-parties

• Need to use more than one language in one application

• The code is long-lived and different pieces evolve at different rates

• Swapping competing implementations of the same idea and testing without modifying the code

• Need to compose an application with some other(s) that were not originally designed to be combined

Current ACTS Tools and their Functionalities

Category | Tool | Functionalities | Availability
Numerical | Trilinos | Algorithms for the iterative solution of large sparse linear systems (includes AztecOO). | To be installed
Numerical | Hypre | Algorithms for the iterative solution of large sparse linear systems, intuitive grid-centric interfaces, and dynamic configuration of parameters. | To be installed
Numerical | PETSc | Tools for the solution of PDEs that require solving large-scale, sparse linear and nonlinear systems of equations. | Installed
Numerical | OPT++ | Object-oriented nonlinear optimization package. | Upon request
Numerical | SUNDIALS | Solvers for systems of ordinary differential equations, nonlinear algebraic equations, and differential-algebraic equations. | Upon request
Numerical | ScaLAPACK | Library of high performance dense linear algebra routines for distributed-memory message-passing. | Installed*
Numerical | SLEPc | Eigensolver package built on top of PETSc. | To be installed
Numerical | SuperLU | General-purpose library for the direct solution of large, sparse, nonsymmetric systems of linear equations. | Installed*
Numerical | TAO | Large-scale optimization software, including nonlinear least squares, unconstrained minimization, bound constrained optimization, and general nonlinear optimization. | To be installed
Code Development | Global Arrays | Library for writing parallel programs that use large arrays distributed across processing nodes and that offers a shared-memory view of distributed arrays. | Installed**
Code Development | Overture | Object-oriented tools for solving computational fluid dynamics and combustion problems in complex geometries. | Upon request
Code Execution | TAU | Set of tools for analyzing the performance of C, C++, Fortran and Java programs. | Under test
Library Development | ATLAS | Tools for the automatic generation of optimized numerical software for modern computer architectures and compilers. | Upon request

* Also in LibSci
** USG

Problem classes illustrated on the slide: ODEs, PDEs, $A = U \Sigma V^T$, $Az = \lambda z$, $Ax = b$.

ACTS Tools: numerical functionalities

Computational Problem | Methodology | Algorithms | Library
Linear Equations | Direct Methods | LU factorization | ScaLAPACK (dense), SuperLU (sparse)
 | | Cholesky factorization | ScaLAPACK
 | | $LDL^T$ factorization (tridiagonal matrices) |
 | | QR factorization |
 | | QR factorization with column pivoting |
 | | LQ factorization |
 | | Full orthogonal factorization |
 | | Generalized QR factorization |
 | Iterative Methods | Conjugate gradient (CG) | AztecOO (Trilinos), PETSc
 | | GMRES | AztecOO, Hypre, PETSc
 | | CG Squared | AztecOO, PETSc
 | | Bi-CG-Stab |
 | | QMR | AztecOO
 | | Transpose free QMR | AztecOO, PETSc
 | | SYMMLQ | PETSc
 | | Richardson |
 | | Block Jacobi preconditioner | AztecOO, Hypre, PETSc
 | | Point Jacobi preconditioner | AztecOO
 | | Least-squares polynomials |
 | | SOR preconditioner | PETSc
 | | Overlapping additive Schwarz |
 | | Approximate inverse | Hypre
 | | Sparse LU preconditioner | AztecOO, Hypre, PETSc
 | | Incomplete LU (ILU) preconditioner |
 | Multigrid | MG preconditioner | Hypre, PETSc
 | | Algebraic multigrid | ML (Trilinos), Hypre
 | | Semicoarsening | Hypre
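
As an illustration of the simplest Krylov method listed above, the following is a minimal, unpreconditioned conjugate gradient sketch in plain Python. It is not the AztecOO or PETSc implementation; the function name `conjugate_gradient`, the dense list-of-rows matrix format, and the default tolerance are assumptions made for this example, and the matrix is assumed to be symmetric positive definite.

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A given as a list of rows.

    Illustrative sketch only: dense storage, no preconditioner, serial.
    Returns the approximate solution and the number of iterations used.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)                      # residual b - A x for the initial guess x = 0
    p = list(r)                      # first search direction
    rs_old = sum(ri * ri for ri in r)
    if rs_old ** 0.5 <= tol:
        return x, 0
    for k in range(1, max_iter + 1):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs_old / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new ** 0.5 <= tol:     # stop when the residual norm is small enough
            return x, k
        beta = rs_new / rs_old
        p = [r[i] + beta * p[i] for i in range(n)]
        rs_old = rs_new
    return x, max_iter


# Small SPD system whose exact solution is x = (1/11, 7/11).
solution, iterations = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

In the libraries listed in the table, the same iteration would normally be combined with one of the preconditioners (block Jacobi, ILU, multigrid, and so on) and run on distributed sparse matrices.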

ACTS Tools: numerical functionalities

Computational Problem | Methodology | Algorithms | Library
Linear least squares | least squares | $\min_x \lVert b - Ax \rVert_2$ | ScaLAPACK
 | minimum norm | $\min_x \lVert x \rVert_2$ |
 | minimum norm least squares | $\min_x \lVert x \rVert_2$ and $\min_x \lVert b - Ax \rVert_2$ |
Standard eigenvalue problems | iterative, direct | $Az = \lambda z$ for $A = A^T$ or $A = A^H$ | ScaLAPACK (dense), SLEPc (sparse)
Generalized eigenvalue problems | | $Az = \lambda Bz$, $ABz = \lambda z$, $BAz = \lambda z$ |
Singular value decomposition | | $A = U \Sigma V^T$, $A = U \Sigma V^H$ |
Non-linear equations problems | Newton-based | Line search | PETSc, KINSOL (SUNDIALS)
 | | Trust regions | PETSc
 | | Pseudotransient continuation | PETSc
 | | Matrix-free | PETSc
Nonlinear optimization | Newton-based | Newton | OPT++, TAO
 | | Finite differences | OPT++
 | | Quasi-Newton | OPT++, TAO (LMVM)
 | | Nonlinear interior point | OPT++, TAO
 | CG | Standard nonlinear CG | OPT++, TAO
 | | Limited memory BFGS | OPT++
 | | Gradient projection | TAO
 | Direct Search | Without derivative information | OPT++
 | Semismooth | Infeasible semismooth | TAO
 | | Feasible semismooth |

ACTS Tools: numerical functionalities

Computational Problem | Methodology | Algorithms | Library
ODEs | Integration | Variable coefficient Adams-Moulton | CVODE (SUNDIALS)
 | | Backward differentiation formula |
 | | Direct and iterative solvers |
ODEs with sensitivity analysis | Integration | Variable coefficient Adams-Moulton |
 | | Backward differentiation formula |
 | | Direct and iterative solvers |
Differential-algebraic equations | | Backward differentiation formula |
 | | Direct and iterative solvers | IDA (SUNDIALS)
Nonlinear equations with sensitivity analysis | | Inexact Newton line search | SensKINSOL (SUNDIALS)
Tuning and optimization | Automatic code generator | BLAS and some LAPACK routines | ATLAS
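
To make the backward differentiation entry concrete, here is a minimal sketch of the first-order member of that family (backward Euler) for a scalar ODE, with the implicit equation solved by Newton's method. It is not how CVODE is implemented, which uses variable order and variable step size; the function name `backward_euler`, the fixed step size, and the scalar Newton solve are assumptions made for this example.

```python
def backward_euler(f, dfdy, t0, y0, h, steps, newton_iters=20, tol=1e-12):
    """Integrate y' = f(t, y) with the first-order BDF (backward Euler).

    Each step solves z - y_n - h * f(t_{n+1}, z) = 0 for z with Newton's method.
    Illustrative sketch only: scalar problem, fixed step size, no error control.
    """
    t, y = t0, y0
    for _ in range(steps):
        t_next = t + h
        z = y                                   # initial Newton guess: previous value
        for _ in range(newton_iters):
            g = z - y - h * f(t_next, z)        # residual of the implicit equation
            dg = 1.0 - h * dfdy(t_next, z)      # derivative of the residual w.r.t. z
            delta = g / dg
            z -= delta
            if abs(delta) <= tol * max(1.0, abs(z)):
                break
        t, y = t_next, z
    return t, y


# Stiff linear test problem y' = -1000 y, y(0) = 1, with a step far larger than 1/1000.
t_end, y_end = backward_euler(lambda t, y: -1000.0 * y,
                              lambda t, y: -1000.0,
                              0.0, 1.0, 0.01, 10)
```

For systems of equations the scalar division in the Newton update becomes a linear solve, which is where the "direct and iterative solvers" row of the table comes in.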

Software Interfaces

Function call (ScaLAPACK):

    CALL BLACS_GET( -1, 0, ICTXT )
    CALL BLACS_GRIDINIT( ICTXT, 'Row-major', NPROW, NPCOL )
    :
    CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
    :
    CALL PDGESV( N, NRHS, A, IA, JA, DESCA, IPIV, B, IB, JB, DESCB, INFO )

Command line (PETSc):

• -ksp_type [cg,gmres,bcgs,tfqmr,…]
• -pc_type [lu,ilu,jacobi,sor,asm,…]

More advanced:

• -ksp_max_it <max_iters>
• -ksp_gmres_restart <restart>
• -pc_asm_overlap <overlap>
• -pc_asm_type [basic,restrict,interpolate,none]

Problem domain (Hypre):

Linear System Interfaces
• Data Layout: structured, composite, blockstrc, unstruc, CSR
• Linear Solvers: GMG, FAC, Hybrid, ..., AMGe, ILU, ...

Use of ACTS Tools

Electronic structure optimization performed with TAO, (UO2)3(CO3)6 (courtesy of deJong).

Molecular dynamics and thermal flow simulation using codes based on Global Arrays (GA). GA has been employed in large simulation codes such as NWChem, GAMESS-UK, Columbus, Molpro, Molcas, MWPhys/Grid, etc.

Problems (different grid types) solved with Hypre.

Micro-FE bone modeling using ParMetis, Prometheus and PETSc; models up to 537 million DOF (Adams, Bayraktar, Keaveny, and Papadopoulos).

Model of the heart mechanics (blood-muscle-valve) by an adaptive and parallel version of the immersed boundary method, using PETSc, Hypre and SAMRAI (courtesy of Boyce Griffith, New York University).

3D incompressible Euler, tetrahedral grid, up to 11 million unknowns, based on a legacy NASA code, FUN3D (W. K. Anderson), fully implicit steady-state, parallelized with PETSc (courtesy of Kaushik and Keyes).

Use of ACTS Tools

Induced current (white arrows) and charge density (colored plane and gray surface) in crystallized glycine due to an external field (Louie, Yoon, Pfrommer and Canning); eigenvalue problems solved with ScaLAPACK.

OPT++ is used in protein energy minimization problems (shown here is protein T162 from CASP5, courtesy of Meza, Oliva et al.).

Omega3P is a parallel distributed-memory code intended for the modeling and analysis of accelerator cavities, which requires the solution of generalized eigenvalue problems. A parallel exact shift-invert eigensolver based on PARPACK and SuperLU has allowed for the solution of a problem of order 7.5 million with 304 million nonzeros. Finding 10 eigenvalues requires about 2.5 hours on 24 processors of an IBM SP.

Two ScaLAPACK routines, PZGETRF and PZGETRS, are used to solve linear systems in the AORSA code (Batchelor et al.), which is based on spectral algorithms and is intended for the study of electromagnetic wave-plasma interactions. The code reaches 68% of peak performance on 1936 processors of an IBM SP.

ScaLAPACK

Software hierarchy (diagram, with components labeled "Global", "Local" and "platform specific"):

• ScaLAPACK: linear systems, least squares, singular value decomposition, eigenvalues.
• PBLAS: parallel BLAS.
• LAPACK
• BLACS: communication routines targeting linear algebra operations.
• BLAS: clarity, modularity, performance and portability. ATLAS can be used here for automatic tuning.
• MPI/PVM/...: communication layer (message passing).

http://acts.nersc.gov/scalapack

Version 1.7.5 released in January 2007; NSF funding for further development.

UTK, UCB …

ScaLAPACK: understanding performance

[Figure: time (s) versus order of the matrix, with one curve per process grid: p = 2 (1x2), p = 4 (2x2), p = 8 (2x4), p = 16 (4x4), p = 32 (4x8), p = 64 (8x8).]

[Figure: execution time of PDGESV (seconds, 0 to 100) for various grid shapes (1x60, 2x30, 3x20, 4x15, 5x12, 6x10) and problem sizes from 1000 to 10000. 60 processors, dual AMD Opteron 1.4 GHz cluster with Myrinet interconnect, 2 GB memory.]

LU on 2.2 GHz AMD Opteron (4.4 GFlop/s peak performance)
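
The grid shapes compared in the PDGESV plot are the ways of arranging a fixed number of processes as NPROW x NPCOL. A short sketch that enumerates them follows; the helper name `grid_shapes` is hypothetical, and only shapes with NPROW <= NPCOL are listed, which is an assumption that happens to match the shapes shown on the slide.

```python
def grid_shapes(nprocs):
    """Return all (nprow, npcol) pairs with nprow * npcol == nprocs and nprow <= npcol."""
    shapes = []
    nprow = 1
    while nprow * nprow <= nprocs:
        if nprocs % nprow == 0:
            shapes.append((nprow, nprocs // nprow))
        nprow += 1
    return shapes


# The six candidate shapes for 60 processes.
shapes_60 = grid_shapes(60)
```

Timing the same solve on each candidate shape is one simple way to reproduce the kind of comparison shown in the plot.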

ScaLAPACK: understanding the 2D block-cyclic distribution

http://acts.nersc.gov/scalapack/hands-on/datadist.html
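
The index arithmetic behind the 2D block-cyclic distribution can be sketched in a few lines. The code below is an illustration, not ScaLAPACK code: the function names are hypothetical, indices are 0-based (ScaLAPACK's Fortran interface is 1-based), and the first block is assumed to live on process row 0 and process column 0.

```python
def owner_and_local(g, nb, nprocs):
    """Map a 0-based global index to (owning process, 0-based local index).

    Blocks of nb consecutive indices are dealt out to processes in round-robin order.
    """
    block = g // nb
    return block % nprocs, (block // nprocs) * nb + g % nb


def block_cyclic_2d(i, j, mb, nb, nprow, npcol):
    """Map global entry (i, j) to ((process row, process column), (local row, local column))."""
    prow, li = owner_and_local(i, mb, nprow)
    pcol, lj = owner_and_local(j, nb, npcol)
    return (prow, pcol), (li, lj)


def local_count(n, nb, proc, nprocs):
    """Number of the n global indices owned by process proc (brute-force count)."""
    return sum(1 for g in range(n) if (g // nb) % nprocs == proc)


# A 9 x 9 matrix in 2 x 2 blocks on a 2 x 3 process grid.
where = block_cyclic_2d(3, 5, 2, 2, 2, 3)
rows_on_prow0 = local_count(9, 2, 0, 2)
```

Rows and columns are distributed independently, so the 2D mapping is just the 1D mapping applied twice, once with the row block size and process rows and once with the column block size and process columns.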

PETSc

Portable, Extensible Toolkit for Scientific computation (ANL)

Components (diagram, from application level down to kernels):

• PETSc PDE Application Codes
• ODE Integrators; Visualization Interface
• Nonlinear Solvers, Unconstrained Minimization
• Linear Solvers: Preconditioners + Krylov Methods
• Object-Oriented Matrices, Vectors, Indices; Grid Management
• Profiling Interface
• Computation and Communication Kernels: MPI, MPI-IO, BLAS, LAPACK

PETSc: Linear Solvers (SLES)

[Diagram: the main routine combines user code (application initialization, evaluation of A and b, post-processing) with PETSc code (the linear solvers (SLES), made up of PC and KSP, which solve Ax = b).]

PETSc: setting SLES parameters at run time

• -ksp_type [cg,gmres,bcgs,tfqmr,…]
• -pc_type [lu,ilu,jacobi,sor,asm,…]

More advanced:

• -ksp_max_it <max_iters>
• -ksp_gmres_restart <restart>
• -pc_asm_overlap <overlap>
• -pc_asm_type [basic,restrict,interpolate,none]
• many more (see manual)
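
Because these solver choices are ordinary command-line options, a driver script can assemble them without touching the application source. The sketch below only builds the argument list; the helper name `petsc_solver_options` is hypothetical, and it covers just the options listed on this slide.

```python
def petsc_solver_options(ksp_type, pc_type, ksp_max_it=None,
                         ksp_gmres_restart=None, pc_asm_overlap=None,
                         pc_asm_type=None):
    """Build the run-time option list for the Krylov method and preconditioner.

    Only options that are explicitly set are emitted, so PETSc defaults apply otherwise.
    """
    args = ["-ksp_type", ksp_type, "-pc_type", pc_type]
    optional = [
        ("-ksp_max_it", ksp_max_it),
        ("-ksp_gmres_restart", ksp_gmres_restart),
        ("-pc_asm_overlap", pc_asm_overlap),
        ("-pc_asm_type", pc_asm_type),
    ]
    for flag, value in optional:
        if value is not None:
            args += [flag, str(value)]
    return args


# GMRES with restart 30, additive Schwarz preconditioner with overlap 2.
options = petsc_solver_options("gmres", "asm", ksp_gmres_restart=30, pc_asm_overlap=2)
```

The resulting list would be appended to whatever launch command the site uses for the application executable, which is left out here because it is system-specific.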

Important Questions for Application Developers

• How does performance vary with different compilers?

• Is poor performance correlated with certain OS features?

• Has a recent change caused unanticipated performance?

• How does performance vary with MPI variants?

• Why is one application version faster than another?

• What is the reason for the observed scaling behavior?

• Did two runs exhibit similar performance?

• How are performance data related to application events?

• Which machines will run my code the fastest and why?

• Which benchmarks predict my code performance best?

From http://acts.nersc.gov/events/Workshop2005/slides/Shende.pdf

TAU

Tuning and Analysis Utilities (U Oregon)

• Multi-level performance instrumentation
  • Multi-language automatic source instrumentation
• Flexible and configurable performance measurement
• Widely-ported parallel performance profiling system
  • Computer system architectures and operating systems
  • Different programming languages and compilers
• Support for multiple parallel programming paradigms
  • Multi-threading, message passing, mixed-mode, hybrid
• Support for performance mapping
• Support for object-oriented and generic programming
• Integration in complex software systems and applications

Definitions: profiling and tracing

• Profiling
  • Recording of summary information during execution (inclusive and exclusive time, number of calls, hardware statistics, etc.)
  • Reflects performance behavior of program entities (functions, loops, basic blocks, user-defined “semantic” entities)
  • Very good for low-cost performance assessment
  • Helps to expose performance bottlenecks and hotspots
  • Implemented through
    • sampling: periodic OS interrupts or hardware counter traps
    • instrumentation: direct insertion of measurement code

• Tracing
  • Recording of information about significant points (events) during program execution
    • entering/exiting code region (function, loop, block, etc.)
    • thread/process interactions (send/receive message, etc.)
  • Save information in event record
    • timestamp
    • CPU identifier, thread identifier
    • Event type and event-specific information
  • Event trace is a time-sequenced stream of event records
  • Can be used to reconstruct dynamic program behavior
  • Typically requires code instrumentation
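
The difference between the two definitions can be shown with a toy instrumentation layer that produces both a profile (summary per function) and a trace (time-ordered event records). This is not TAU code: the `Monitor` class and its methods are hypothetical, and the sketch assumes a single thread and non-recursive functions.

```python
import threading
import time


class Monitor:
    """Toy instrumentation layer that collects a profile and an event trace."""

    def __init__(self):
        self.profile = {}   # name -> {"calls", "inclusive", "exclusive"}
        self.trace = []     # event records: (timestamp, thread id, event type, name)
        self._stack = []    # active regions: [name, start time, time spent in callees]

    def enter(self, name):
        now = time.perf_counter()
        self.trace.append((now, threading.get_ident(), "enter", name))
        self._stack.append([name, now, 0.0])

    def exit(self):
        now = time.perf_counter()
        name, start, child_time = self._stack.pop()
        self.trace.append((now, threading.get_ident(), "exit", name))
        inclusive = now - start
        entry = self.profile.setdefault(
            name, {"calls": 0, "inclusive": 0.0, "exclusive": 0.0})
        entry["calls"] += 1
        entry["inclusive"] += inclusive                 # time including callees
        entry["exclusive"] += inclusive - child_time    # time excluding callees
        if self._stack:
            self._stack[-1][2] += inclusive             # charge this region to its caller

    def instrument(self, func):
        """Wrap func so that every call is measured (direct insertion of measurement code)."""
        def wrapper(*args, **kwargs):
            self.enter(func.__name__)
            try:
                return func(*args, **kwargs)
            finally:
                self.exit()
        return wrapper


monitor = Monitor()


@monitor.instrument
def inner():
    return sum(range(1000))


@monitor.instrument
def outer():
    return [inner() for _ in range(3)]


outer()
```

After the run, `monitor.profile` holds one summary row per function, while `monitor.trace` holds every enter and exit event in time order, which is what makes it possible to reconstruct the dynamic behavior and also why traces grow much faster than profiles.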

TAU: Example 1 (1/2)

http://acts.nersc.gov/tau/programs/pdgssvx

set the C compiler

Ex. tau-multiplecounters-mpi-papi-pdt

TAU: Example 1 (2/2)

PAPI provides access to hardware performance counters (see http://icl.cs.utk.edu/papi for details and contact [email protected] for the corresponding TAU events). In this example we are just measuring FLOPS.

PARAPROF

TAU: Example 2 (1/2)

PESCAN is a code that uses the folded spectrum method for non-self-consistent nanoscale calculations. It uses a planewave basis and conventional Kleinman-Bylander nonlocal pseudopotentials in real space. It is parallelized using MPI and can handle million-atom systems.

    # Makefile for PESCAN
    include $(TAULIBDIR)/Makefile.tau-multiplecounters-mpi-papi-pdt
    #include $(TAULIBDIR)/Makefile.tau-callpath-mpi-pdt

    FC = $(TAU_COMPILER) mpxlf90_r
    CC = $(TAU_COMPILER) mpcc_r

TAU: Example 2 (2/2)

The Case for Software Libraries

machine tuned and dependent modules

application data layout

CONTROL I/O

algorithmic implementations

APPLICATION

New architecture: may or may not need rewriting
New developments: difficult to predict

New architecture: extensive rewriting
New or extended physics: extensive rewriting or increased overhead

New architecture: minimal to extensive rewriting

New architecture or software:
• Extensive tuning
• May require new programming paradigms
• Difficult to maintain!

ACTS: value-added services

PyACTS

• Requirements for reusable high quality software tools

• Integration, maintenance and support efforts

• Interfaces using script languages

• Software automation

More information:
• [email protected]
• http://acts.nersc.gov

