CCNI HPC2 Activities
Transcript
Page 1:

CCNI HPC2 Activities

Page 2:


NYS High Performance Computation Consortium funded by NYSTAR at $1M/year for 3 years

Goal is to provide NY State users support in the application of HPC technologies in:
• Research and discovery
• Product development
• Improved engineering and manufacturing processes

The HPC2 is a distributed activity – participants: Rensselaer, Stony Brook/Brookhaven, SUNY Buffalo, NYSERNET

HPC2 Activities

Page 3:

NY State Industrial Partners

• Xerox
• Corning
• ITT Fluid Technologies: Goulds Pumps
• Global Foundries

Page 4:

Modeling Two-phase Flows

Objectives
• Demonstrate end-to-end solution of two-phase flow problems.
• Couple with structural mechanics boundary condition.
• Provide interfaced, efficient and reliable software suite for guiding design.

Tools
• Simmetrix SimAppS Graphical Interface – mesh generation and problem definition
• PHASTA – two-phase level set flow solver
• PhParAdapt – solution transfer and mesh adaptation driver
• Kitware Paraview – visualization

Systems
• CCNI BG/L, CCNI Opterons Cluster

Page 5:

Modeling Two-phase Flows
3D Example Simulation


Fluid ejected into air. Ran on 4000 CCNI BG/L cores.

Page 6:

Two-phase Automated Mesh Adaptation

Six iterations of mesh adaptation on a two-phase simulation. Ran autonomously on 128 cores of the CCNI Opterons for approximately 4 hours.

Page 7:

Modeling Two-phase Flows
Software Support for Fluid Structure Interactions

• Initial work interfaces simulations through serial file formats for displacement and pressure data.
• Structural mechanics simulation runs in serial; PHASTA simulation runs in parallel.
• Distribute serial displacement data to the partitioned PHASTA mesh (see the sketch at the end of this slide).
• Aggregate partitioned PHASTA nodal pressure data to a serial input file.
• Modifications to the automated mesh adaptation Perl script.

Structural Mechanics Mesh of Input Face

PHASTA Partitioned Mesh of Input Face
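A minimal sketch of the distribute step described above (every rank reads the serial displacement file and keeps its own nodes); the text file format and the global-to-local node map are illustrative assumptions, not the project's actual interface files:

    // Sketch only: the file format and the globalToLocal map are assumptions.
    #include <cstdio>
    #include <unordered_map>
    #include <vector>

    // Each rank reads the shared serial displacement file and keeps only the
    // nodes that live on its part of the partitioned PHASTA mesh.
    void readOwnedDisplacements(const char* file,
                                const std::unordered_map<long, int>& globalToLocal,
                                std::vector<double>& disp /* 3 values per local node */) {
      std::FILE* f = std::fopen(file, "r");
      if (!f) return;
      long gid;
      double d[3];
      while (std::fscanf(f, "%ld %lf %lf %lf", &gid, &d[0], &d[1], &d[2]) == 4) {
        auto it = globalToLocal.find(gid);
        if (it != globalToLocal.end())
          for (int c = 0; c < 3; ++c)
            disp[3 * it->second + c] = d[c];
      }
      std::fclose(f);
    }
    // The reverse direction (partitioned nodal pressures back to one serial file)
    // can be handled by gathering (global id, pressure) pairs to a single writer rank.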

Page 8:

Modeling Free Surface Flows

Objectives
• Demonstrate capability of available computational tools/resources for parallel simulation of highly viscous sheet flows.
• Solve a model sheet flow problem relevant to the actual process/geometry.
• Develop and define processes for high fidelity twin screw extruder parallel CFD simulation.

Investigated Tools (to date)
• ACUSIM AcuConsole and AcuSolve, Simmetrix MeshSim, Kitware Paraview

Systems
• CCNI Opterons Cluster

Page 9:


High Aspect Ratio Sheet
• Aspect ratio: 500:1
• Element count: 1.85 million
• 7 mins on 512 cores; 300 mins on 8 cores

Parallel 3D Sheet Flow Simulation

Page 10:


Mesh generation in Simmetrix SimAppS graphical interface.

Gaps that are ~1/180 of large feature dimension.

* http://en.wikipedia.org/wiki/Plastics_extrusion

** https://sites.google.com/site/oscarsalazarcespedescaddesign/project03

Single Screw Extruder CAD**

Conceptual Rendering of Single Screw Extruder Assembly*

Screw Extruder: Simulation Based Design Tools

Page 11:

Modeling Pump Flows

Objectives
• Apply HPC systems and software to set up and run 3D pump flow simulations in hours instead of days.
• Provide automated mesh generation for fluid geometries with rotating components.

Tools
• ACUSIM Suite, PHASTA, ANSYS CFX, FMDB, Simmetrix MeshSim, Kitware Paraview

Systems
• CCNI Opterons Cluster

Page 12:

Modeling Pump Flows
Graphical Interfaces

AcuConsole Interface: problem definition, mesh generation, runtime monitor, and data visualization

Page 13:

Modeling Pump Flows Critical Mesh Regions

Page 14:

Modeling Pump Flows Critical Mesh Regions

Page 15:

Mesh Generation Tools

• Simmetrix provided a customized mesh generation and problem definition GUI after iterating with the industrial partner. Supports automated identification of pump geometric model features and application of attributes.
• Problem definition with support for exporting data for multiple CFD analysis tools.
• Reduced mesh generation time frees engineers to focus on simulation and design optimizations for improved products.

Page 16:

[Placeholder: Neerav's slides]

Page 17:

Scientific Computation Research Center

Page 18:

Scientific Computation Research Center

Goal: Develop simulation technologies that allow practitioners to evaluate systems of interest.

To meet this goal we:
• Develop adaptive methods for reliable simulations
• Develop methods to do all computation on massively parallel computers
• Develop multiscale computational methods
• Develop interoperable technologies that speed simulation system development
• Partner on the construction of simulation systems for specific applications in multiple areas

Page 19:

SCOREC Software Components

Software available (http://www.scorec.rpi.edu/software.php). Some tools not yet linked – email [email protected] with any questions.

Simulation Model and Data Management
• Geometric model interface to interrogate CAD models
• Parallel mesh topological representation
• Representation of tensor fields
• Relationship manager

Parallel Control
• Neighborhood aware message packing - IPComMan
• Iterative mesh partition improvement with multiple criteria - ParMA
• Processor mesh entity reordering to improve cache performance

Page 20:

SCOREC Software Components (Continued)

Adaptive Meshing
• Adaptive mesh modification
• Mesh curving

Adaptive Control
• Support for executing parallel adaptive unstructured mesh flow simulations with PHASTA
• Adaptive multimodel simulation infrastructure

Analysis
• Parallel Hierarchic Adaptive Stabilized Transient Analysis software for compressible or incompressible, laminar or turbulent, steady or unsteady flows on 3D unstructured meshes (with U. Colorado)
• Parallel hierarchic multiscale modeling of soft tissues

Page 21:

Interoperable Technologies for Advanced Petascale Simulations (ITAPS)

[Diagram: ITAPS component stack. Component tools (Mesh Adapt, Interpolation Kernels, Swapping, Dynamic Services, Geom/Mesh Services, Smoothing, Front Tracking) build on the common interfaces (Mesh, Geometry, Relations, Field) and are unified by petascale integrated tools (AMR Front Tracking, Shape Optimization, Solution Adaptive Loop, Solution Transfer, Petascale Mesh Generation).]

Page 22:

PHASTA Scalability (Jansen, Shephard, Sahni, Zhou)

• Excellent strong scaling
• Implicit time integration
• Employs the partitioned mesh for system formulation and solution
• A specific number of ALL-REDUCE communications is also required

105M vertex mesh (CCNI Blue Gene/L):

#Proc.   El./core   t (sec)   Scale
512      204,800    2120      1
1,024    102,400    1052      1.01
2,048    51,200     529       1.00
4,096    25,600     267       0.99
8,192    12,800     131       1.02
16,384   6,400      64.5      1.03
32,768   3,200      35.6      0.93

1 billion element anisotropic mesh on Intrepid Blue Gene/P:

# of cores   Rgn imb   Vtx imb   Time (s)   Scaling
16k          2.03%     7.13%     222.03     1
32k          1.72%     8.11%     112.43     0.987
64k          1.6%      11.18%    57.09      0.972
128k         5.49%     17.85%    31.35      0.885

Page 23:

Strong Scaling – 5B Mesh up to 288k Cores

Without ParMA partition improvement, the strong scaling factor is 0.88 (time is 70.5 secs). This can yield 43 cpu-years of savings for production runs!

AAA 5B elements: full-system scale on Jugene (IBM BG/P system)

Page 24:

Requires functional support for:
• Mesh distribution
• Mesh level inter-processor communications
• Parallel mesh modification
• Dynamic load balancing

Have parallel implementations for each – focusing on increasing scalability

Parallel Adaptive Analysis

Page 25:

Initial mesh: uniform, 17 million mesh regions

Adapted mesh: 160 air bubbles, 2.2 billion mesh regions

Multiple predictive load balance steps used to make the adaptation possible

Larger meshes are possible (memory was not exhausted)

Parallel Mesh Adaptation to 2.2 Billion Elements

Initial and adapted mesh (zoom of a bubble), colored by magnitude of mesh size field

Mesh size field of air bubbles distributing in a tube (segment of the model – 64 bubbles total)

Page 26:

Initial Scaling Studies of Parallel MeshAdapt

Strong scaling test: uniform refinement on Ranger, 4.3M to 2.2B elements

# of Parts   Time (s)   Scaling
2048         21.5       1.0
4096         11.2       0.96
8192         5.67       0.95
16384        2.73       0.99

Nonuniform field driven refinement (with mesh optimization) on Ranger, 4.2M to 730M elements (time for dynamic load balancing not included)

# of Parts   Time (s)   Scaling
2048         110.6      1.0
4096         57.4       0.96
8192         35.4       0.79

Nonuniform field driven refinement (with mesh optimization operations) on Blue Gene/P, 4.2M to 730M elements (time for dynamic load balancing not included)

# of Parts   Time (s)   Scaling
4096         173        1.0
8192         105        0.82
16384        66.1       0.65
32768        36.1       0.60

Page 27:

Tightly coupled
• Adv: Computationally efficient
• Disadv: More complex code development
• Example: Explicit solution of cannon blasts

Loosely coupled
• Adv: Ability to use existing analysis codes
• Disadv: Overhead of multiple structures and data conversion
• Example: Implicit high-order active flow control modeling

[Figure: simulation snapshots at t=0.0, t=2e-4, and t=5e-4]

Adaptive Loop Construction

Page 28:

Adaptive Loop Driver – C++
• Coordinates API calls to execute the solve-adapt loop (see the sketch at the end of this slide)

phSolver – Fortran 90
• Flow solver scalable to 288k cores of BG/P, Field API

phParAdapt – C++
• Invokes parallel mesh adaptation
▪ SCOREC FMDB and MeshAdapt, Simmetrix MeshSim and MeshSimAdapt

[Diagram: file-free parallel-adaptive loop. The Adaptive Loop Driver exchanges control with phSolver and phParAdapt; compact mesh and solution data are held in the Mesh Data Base and Solution Fields, and field data moves through the Field API rather than through files.]

File Free Parallel-Adaptive Loop
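A minimal sketch of the file-free solve-adapt loop that such a driver coordinates; every type and function below is a hypothetical stand-in, not the actual phSolver/phParAdapt API:

    // Sketch of a solve-adapt loop driver; all types and functions are
    // hypothetical stand-ins for the real solver and adaptation interfaces.
    #include <cstdio>

    struct Mesh {};    // placeholder for the parallel mesh (e.g., FMDB/MeshSim)
    struct Fields {};  // placeholder for the solution fields kept in memory

    static void runFlowSolver(Mesh&, Fields&) {}                 // advance flow solution (phSolver)
    static double estimateError(const Mesh&, const Fields&) { return 0.0; } // error/size field
    static void adaptMesh(Mesh&, const Fields&) {}               // parallel mesh adaptation (phParAdapt)
    static void transferFields(const Mesh&, Fields&) {}          // in-memory solution transfer

    void solveAdaptLoop(Mesh& mesh, Fields& fields, int maxCycles, double tol) {
      for (int cycle = 0; cycle < maxCycles; ++cycle) {
        runFlowSolver(mesh, fields);
        const double err = estimateError(mesh, fields);
        std::printf("cycle %d: error indicator %g\n", cycle, err);
        if (err < tol) break;           // solution acceptable, stop the loop
        adaptMesh(mesh, fields);        // modify the mesh toward the new size field
        transferFields(mesh, fields);   // carry fields to the adapted mesh, no files
      }
    }

    int main() {
      Mesh m; Fields f;
      solveAdaptLoop(m, f, 5, 1e-3);
      return 0;
    }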

Page 29:

IPComMan

General-purpose communication package built on top of MPI. Architecture independent, neighborhood based inter-processor communications.

Neighborhood in parallel applications:
• Subset of processors exchanging messages during a specific communication round.
• Bounded by a constant, typically under 40, independent of the total number of processors.

Several useful features of the library:
• Automatic message packing.
• Management of sends and receives with non-blocking MPI functions.
• Asynchronous behavior unless otherwise specified.
• Support for a dynamically changing neighborhood during communication steps.

Page 30:

IPComMan Implementation

Buffer Memory Management
• Assemble messages in pre-allocated buffers for each destination (see the sketch at the end of this slide).
• Send each package out when its buffer size is reached.
• Provide memory allocation for both sending and receiving buffers.
• Deal with constant or arbitrary message sizes.

Processor-Neighborhood-Domain Concept
• Support efficient communication to processor neighbors based on knowledge of neighborhoods.
• No collective call verifications if neighbors are fixed.
• If new neighbors are encountered, perform a collective call to verify the correctness of communication.

Communication Paradigm
• No need to verify and send the number of packages to neighbors; it is wrapped in the last buffer.
• If there is nothing to send to a neighbor, a constant is sent notifying that the communication is done.
• No message order rule, which saves communication time by processing the first available buffer.
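A minimal sketch of the per-neighbor message packing that IPComMan automates, written directly against MPI; the class interface, 4 KB buffer size, and integer payload are illustrative assumptions, not IPComMan's actual API, and the receive side is omitted for brevity:

    // Sketch of per-neighbor message packing with non-blocking MPI sends.
    #include <mpi.h>
    #include <cstddef>
    #include <deque>
    #include <map>
    #include <vector>

    class PackedSender {
    public:
      PackedSender(MPI_Comm comm, int tag, std::size_t maxBytes = 4096)
          : comm_(comm), tag_(tag), maxInts_(maxBytes / sizeof(int)) {}

      // Append one value to the buffer for 'neighbor'; flush when the buffer fills.
      void pack(int neighbor, int value) {
        std::vector<int>& buf = buffers_[neighbor];
        buf.push_back(value);
        if (buf.size() >= maxInts_) flush(neighbor);
      }

      // Send whatever remains for each neighbor, then wait for all sends to finish.
      void finish() {
        for (auto& kv : buffers_)
          if (!kv.second.empty()) flush(kv.first);
        MPI_Waitall(static_cast<int>(requests_.size()), requests_.data(),
                    MPI_STATUSES_IGNORE);
        requests_.clear();
        inFlight_.clear();
      }

    private:
      void flush(int neighbor) {
        inFlight_.push_back(std::vector<int>());   // deque keeps the data in place
        inFlight_.back().swap(buffers_[neighbor]); // hand the packed data to MPI
        requests_.push_back(MPI_Request());
        MPI_Isend(inFlight_.back().data(),
                  static_cast<int>(inFlight_.back().size()), MPI_INT,
                  neighbor, tag_, comm_, &requests_.back());
      }

      MPI_Comm comm_;
      int tag_;
      std::size_t maxInts_;
      std::map<int, std::vector<int>> buffers_;   // one packing buffer per neighbor
      std::deque<std::vector<int>> inFlight_;     // buffers already given to MPI_Isend
      std::vector<MPI_Request> requests_;
    };

Typical use is one packing round per communication step: call pack() for every value destined for each neighbor during the round, then call finish() once before the next round.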

Page 31:

IPComMan vs. MPI Implementation

Tiling patterns to test the message flow control in a pseudo-unstructured neighborhood environment on 1024 cores.

N/4 processors have 2 neighbors, N/8 processors have 3 neighbors, N/4 processors have 4 neighbors, 3N/16 processors have 5 neighbors, N/16 processors have 9 neighbors, N/16 processors have 14 neighbors, and N/16 processors have 36 neighbors.

Sending and receiving 8 byte messages without buffering.

Page 32:

Mesh modification before load balancing can lead to memory problems – predictive load balancing performs a weighted dynamic load balance.

The mesh metric field at any point P is decomposed into three unit directions (e1, e2, e3) and a desired length (h1, h2, h3) in each corresponding direction.

The volume of the desired element (tetrahedron): h1h2h3/6. Estimate the number of elements to be generated (see the sketch below).

Predictive Load Balancing
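Presumably each element contributes (current volume) / (desired tetrahedron volume h1·h2·h3/6) new elements to the estimate; a minimal sketch under that assumption, with an illustrative per-element metric lookup:

    // Sketch: estimate how many elements mesh adaptation will create in a part,
    // assuming each current element of volume V contributes V / (h1*h2*h3/6)
    // new elements. The Elem struct and the metric values are illustrative only.
    #include <cstdio>
    #include <vector>

    struct Elem {
      double volume;          // current element volume
      double h1, h2, h3;      // desired edge lengths from the metric at the element
    };

    double estimateNewElements(const std::vector<Elem>& part) {
      double count = 0.0;
      for (const Elem& e : part) {
        const double desiredVol = e.h1 * e.h2 * e.h3 / 6.0; // desired tet volume
        count += e.volume / desiredVol;                     // predicted refinement factor
      }
      return count;  // used as the part weight for the weighted dynamic load balance
    }

    int main() {
      std::vector<Elem> part = {{1.0, 0.5, 0.5, 0.5}, {2.0, 0.25, 0.5, 1.0}};
      std::printf("predicted elements: %.1f\n", estimateNewElements(part));
      return 0;
    }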

Page 33:

ParMA - Partition Improvement Procedures

Incremental redistribution of mesh entities to improve overall balance – Partitioning using Mesh Adjacencies (ParMA)
• Designed to improve balance for multiple entity types
• Uses mesh adjacencies directly to determine the best candidates for movement
• Current implementation based on neighborhood diffusion

Table: region and vertex imbalance for an 8.8 million region uniform mesh on a bifurcation pipe model partitioned to different numbers of parts.

Page 34:

Selection of vertices to be migrated: ones bounding a small number of elements (see the sketch at the end of this slide).

Only vertices with one remote copy are considered, to avoid creating poor part boundaries.

Vertex imbalance: from 14.3% to 5%

Region imbalance: from 2.1% to 5%

ParMA - Partition Improvement Procedures
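A minimal sketch of the vertex selection step described above; the data structure and the element-count threshold are illustrative assumptions, not ParMA's actual implementation:

    // Sketch of diffusive partition improvement: from a heavily loaded part,
    // pick boundary vertices whose migration (with the elements they bound)
    // is cheap and unlikely to create jagged part boundaries.
    #include <cstdio>
    #include <vector>

    struct BoundaryVertex {
      int numRemoteCopies;   // parts (other than this one) holding a copy
      int numBoundedElems;   // elements on this part bounded by the vertex
      int neighborPart;      // the single neighbor part when numRemoteCopies == 1
    };

    std::vector<int> selectForMigration(const std::vector<BoundaryVertex>& bdry,
                                        int maxBoundedElems) {
      std::vector<int> selected;
      for (std::size_t i = 0; i < bdry.size(); ++i)
        if (bdry[i].numRemoteCopies == 1 &&          // one remote copy only
            bdry[i].numBoundedElems <= maxBoundedElems)
          selected.push_back(static_cast<int>(i));
      return selected;
    }

    int main() {
      std::vector<BoundaryVertex> bdry = {{1, 2, 7}, {2, 3, 5}, {1, 6, 7}};
      std::vector<int> sel = selectForMigration(bdry, 3);
      std::printf("selected %zu of %zu boundary vertices\n", sel.size(), bdry.size());
      return 0;
    }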

Page 35:

Enabling Co-Design of Multi-Layer Exascale Storage Architectures

The CODES Project

Page 36:

The CODES Project

Using the Rensselaer Optimistic Simulation System (ROSS) as a parallel simulation framework, we are building a highly detailed and accurate model of the BG/L Torus network, enabling us to investigate contention of I/O and compute network traffic in potential exascale architectures.

Comparison of Network Torus Latency: Blue Gene/L versus Simulation

Do our models accurately reflect behavior of existing hardware?

Event Rate Scalability: Event rate as a function of BG/L processors

Do our simulations scale on today’s leadership-class systems?

Page 37:

Mesh Curving for COMPASS Analyses


Mesh close-up before and after correcting invalid mesh regions (marked in yellow)

• Mesh curving applied to 8-cavity cryomodule simulations
• 2.97 million curved regions
• 1,583 invalid elements corrected – leads to stable simulation and executes 30% faster

Page 38:

Moving Mesh Adaptation

• FETD for short-range wakefield calculations
▪ Adaptively refined meshes have 1~1.5 million curved regions
▪ Uniform refined mesh using small mesh size has 6 million curved regions

Electric fields on the three refined curved meshes

Page 39:

Patient Specific Vascular Surgical Planning

• Initial mesh has 7.1 million regions
• Initial mesh is isotropic outside boundary layer
• The adapted mesh: 42.8 million regions (7.1M -> 10.8M -> 21.2M -> 33.0M -> 42.8M)
• Boundary layer based mesh adaptation
• Mesh is anisotropic

Page 40:

Multiscale Simulations for Collagen Indentation

• Multiscale simulation linking microscale network model to a macroscale finite element continuum model.

• Collaborating with experimentalists at the University of Minnesota

Macroscale Model Microscale Model

Page 41:

Concurrent Multiscale: Atomistic-to-Continuum

Nano-indentation of a thin film. Concurrent model configuration at the 60th load step (3 Å indentation displacement). Colors represent the sub-domains in which various models are used.

Nano-void subjected to hydrostatic tension. Finite element discretization of the problem domain and dislocation structures.

Page 42:


Parallel Computing Methods

Fab-Aware High-Performance Chip Design

[Diagram: size scale spanning circuits, devices, and atoms/carriers, mapped against design, manufacture, and use/performance. Simulation automation components: device simulation, super-resolution lithography tools, reactive ion etching, variation-aware circuit design, 1st principles CMOS modeling, modeling/simulation development, technology development, and mechanics of damage nucleation in devices.]

Page 43:


First-Principles Modeling for Nanoelectronic CMOS (Nayak)

[Diagram: iterative coupling of the Poisson and Schrödinger equations around the Fermi level – input to circuit level from atomic level physics.]

• As Si CMOS devices shrink, nanoelectronic effects emerge. Fermi-function based analysis gives way to quantum energy-level analysis.
• The Poisson and Schrödinger equations are reconciled iteratively, allowing for current predictions (see the sketch below).
• Carrier dynamics respond to strain in increasingly complex ways, from mobility changes to tunneling effects.
• New functionalities might be exploited:
▪ Single-electron transistors
▪ Graphene semiconductors
▪ Carbon nanotube conductors
▪ Spintronics – encoding information into the charge carrier's spin
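The iterative reconciliation mentioned above is commonly structured as a self-consistent loop: solve Schrödinger for a charge density in the current potential, solve Poisson for an updated potential, mix, and repeat until the potential stops changing. A minimal sketch with stubbed 1D solvers (the stubs and the mixing parameter are illustrative only, not the actual device model):

    // Sketch of a self-consistent Poisson-Schrodinger loop; the two solver
    // functions are trivial stubs so that only the loop structure matters.
    #include <algorithm>
    #include <cmath>
    #include <cstdio>
    #include <vector>

    using Vec = std::vector<double>;

    static Vec chargeDensityFromSchrodinger(const Vec& potential) {
      Vec rho(potential.size(), 0.0);           // stub: real code solves eigenstates
      for (std::size_t i = 0; i < potential.size(); ++i) rho[i] = -0.5 * potential[i];
      return rho;
    }
    static Vec potentialFromPoisson(const Vec& rho) {
      Vec v(rho.size(), 0.0);                   // stub: real code solves Poisson
      for (std::size_t i = 0; i < rho.size(); ++i) v[i] = -rho[i];
      return v;
    }

    int main() {
      Vec potential(64, 1.0);                   // initial guess
      const double mixing = 0.3;                // under-relaxation for stability
      for (int iter = 0; iter < 100; ++iter) {
        Vec rho = chargeDensityFromSchrodinger(potential);
        Vec vNew = potentialFromPoisson(rho);
        double change = 0.0;
        for (std::size_t i = 0; i < potential.size(); ++i) {
          const double mixed = (1.0 - mixing) * potential[i] + mixing * vNew[i];
          change = std::max(change, std::fabs(mixed - potential[i]));
          potential[i] = mixed;
        }
        std::printf("iter %d: max potential change %g\n", iter, change);
        if (change < 1e-8) break;               // self-consistency reached
      }
      return 0;
    }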

Page 44:


Super-Resolution Lithography Analysis (Oberai)

Motivation:
• Reducing feature size has made the modeling of the underlying physics critical.
• In projective lithography simple biases are not adequate.
• In holographic lithography near-field phenomena are predominant.
• The modeling approach must be based on Maxwell's equations.

Goal:
• Develop unified computational algorithms for the design and analysis of super-resolution lithographic processes that model the underlying physics with high fidelity.

Projective Lithography

Holographic Lithography

Page 45:


Virtual Nanofabrication: Reactive-Ion Etching Simulation (Bloomfield)

To handle SRAM-scale systems, we expect much larger computational systems, e.g., 10^5 - 10^6 surface elements. Transport tracking scales O(n^2) with the number of surface elements n.

▪ Parallelizes well – every view factor can be computed completely independently of every other view factor, giving almost linear speedup (see the sketch below).

The computational complexity of the chemistry solver depends upon the particular chemical mechanisms associated with the etch recipe. These tend to be O(n^2).

Cut-away view of a reactive ion etch simulation of an aspect ratio 1.4 via into a dielectric substrate with 7% porosity, and complete selectivity with respect to the underlying etch stop. A generic ion-radical etch model was used. ~10^3 surface elements. [Bloomfield et al., SISPAD 2003, IEEE.]
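Because every view factor is independent, the O(n^2) transport step distributes trivially; a minimal sketch using MPI, where viewFactor() and the block-row distribution are illustrative assumptions standing in for the real geometric kernel:

    // Sketch: embarrassingly parallel view-factor assembly. Each rank computes
    // a block of rows of the n x n view-factor matrix; viewFactor() is a
    // placeholder for the real kernel (occlusion tests, cosine terms, areas).
    #include <mpi.h>
    #include <algorithm>
    #include <cstdio>

    static double viewFactor(int i, int j) {
      if (i == j) return 0.0;                        // an element does not see itself
      const int d = (i > j) ? i - j : j - i;         // placeholder "geometry"
      return 1.0 / (1.0 + d);
    }

    int main(int argc, char** argv) {
      MPI_Init(&argc, &argv);
      int rank, size;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      const int n = 1000;                            // number of surface elements
      const int rowsPerRank = (n + size - 1) / size; // simple block row distribution
      const int rowBegin = rank * rowsPerRank;
      const int rowEnd = std::min(n, rowBegin + rowsPerRank);

      double localSum = 0.0;                         // just to show the work happened
      for (int i = rowBegin; i < rowEnd; ++i)
        for (int j = 0; j < n; ++j)
          localSum += viewFactor(i, j);

      double totalSum = 0.0;
      MPI_Reduce(&localSum, &totalSum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
      if (rank == 0)
        std::printf("sum of all view factors: %g\n", totalSum);
      MPI_Finalize();
      return 0;
    }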

Page 46:


Stress-induced Dislocation Formation in Silicon Devices (Picu)
• At 90 nm and below, devices have come to rely on increased carrier mobility produced by strained silicon.
• As devices scale down, the relative importance of scattering centers increases.
• Can we have our cake and eat it too? How much strain can be built into a given device before processing variations and thermo-mechanical load during use cause critical dislocation shedding?
• Continuum FEM calculations automatically identify critical high-stress regions.
• A local atomistic problem is constructed and an MD simulation is run, looking for criticality. Results feed back to the continuum (see the sketch below).
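A minimal sketch of the continuum-to-atomistic handoff described above; all functions, structures, and thresholds are hypothetical placeholders, not the project's actual code:

    // Sketch of the concurrent FEM/MD workflow: identify high-stress regions
    // from a continuum solve, run a local atomistic study on each, and feed
    // the outcome back before the next continuum solve.
    #include <cstdio>
    #include <vector>

    struct Region { double peakStress; };            // Pa, from the FEM solution

    static std::vector<Region> runContinuumFEM() {
      // Placeholder: a real solver returns stress fields; here, three regions.
      return { {1.2e9}, {3.5e9}, {0.8e9} };
    }
    static bool runLocalMD(const Region& r) {
      // Placeholder criticality check: did the atomistic region shed dislocations?
      return r.peakStress > 2.0e9;
    }

    int main() {
      const double stressThreshold = 1.0e9;           // assumed trigger for MD study
      std::vector<Region> regions = runContinuumFEM();
      for (std::size_t i = 0; i < regions.size(); ++i) {
        if (regions[i].peakStress < stressThreshold) continue;  // not critical
        const bool dislocations = runLocalMD(regions[i]);       // local MD study
        std::printf("region %zu: peak stress %.2e Pa, dislocations: %s\n",
                    i, regions[i].peakStress, dislocations ? "yes" : "no");
        // The MD result would feed back into the continuum model (e.g., as a
        // modified material response) before the next FEM solve.
      }
      return 0;
    }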

Page 47:


Advanced Meshing Tools for Nanoelectronic Design (Shephard)

• Advanced meshing tools and expertise exist at RPI and an associated spin-off.
• Leverage these tools to support CCNI projects such as advanced device modeling.
• Local refinement and adaptivity can help carry the computational resources further. “More bang for the buck.”

