CCS Cray Biology Workshop - csm.ornl.gov

CCS The Center for Computational Sciences

CCS Cray Biology Workshop

Mark R. Fahey, May 9, 2003

Goals

• Computational Sciences:
  − Identify one or two important biology codes that are expected to benefit from vectorization
  − Identify people who will work on the optimization of these codes on the X1
• Life Sciences:
  − Hear what Cray has done with informatics algorithms to date
  − Find out what non-traditional facilities can be accessed (i.e., all integer and word manipulation)
• Cray:
  − Understand ORNL's usage of applications in the life sciences; interested in hearing about genomics, proteomics, or systems biology
  − Work with scientists who have bio applications believed to be well suited to the X1 system, and see how Cray might assist in porting and optimizing those codes

Agenda

1060 Commerce Park, Conference Room 176

8:00-9:00 Cray X1 Overview, Mark Fahey
9:00-10:00 Biology and Informatics at Cray, Jim Maltby
10:00-10:30 Break
10:30-11:00 Genomes to Life, Al Geist
11:00-11:45 Cross-cutting needs: GTL Facilities and Transparent Access to Supercomputers, Phil Locascio
11:45-12:15 PROSPECT and pathway codes, Ying Xu
12:15-1:15 Lunch

Agenda (cont.)

6010 Conference Room

1:15-1:40 PICUPP and gene function codes, Nagiza Samatova
1:40-2:05 Protein complex using Rosetta, Andrey Gorin
2:05-2:20 MD simulation of biological molecules, Pratul Agarwal
2:20-3:30 Discussion
3:30-4:00 Break
4:00-4:30 Biomechanics, Richard Ward
4:30-5:00 Discussion and wrap-up, Mark Fahey

CCS Cray Biology Workshop: CCS X1

Mark R. Fahey

Center for Computational Sciences

Acknowledgement

Research sponsored by the Mathematical, Information, and Computational Sciences Division, Office of Advanced Scientific Computing Research, U.S. Department of Energy, under Contract No. DE-AC05-00OR22725 with UT-Battelle, LLC.

Phase 1 – March 2003

• 32 vector processors
  − 8 nodes, each with 4 processors
• 128 GB shared memory
• 8 TB of disk space
• 400 GigaFLOP/s peak

Phase 2 – September 2003

• 256 vector processors
  − 64 nodes
• 1 TB shared memory
• 20 TB of disk space
• 3.2 TeraFLOP/s peak
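The peak figures on the two phase slides follow from the 12.8 GF per-MSP peak given on the MSP slide. The short Python sketch below reproduces that arithmetic; the function name `system_peak_gflops` is hypothetical, and 1 TB is taken as 1024 GB. It also shows that both phases keep the same ratios of 4 MSPs per node and 4 GB of memory per MSP.

```python
# Per-MSP peak in GFLOP/s (64-bit), from the "X1: MSP and SSP" slide.
MSP_PEAK_GF = 12.8

def system_peak_gflops(n_msps: int) -> float:
    """Theoretical peak (GFLOP/s) of a configuration with n_msps MSPs."""
    return n_msps * MSP_PEAK_GF

# name: (MSPs, nodes, memory in GB); 1 TB taken as 1024 GB (assumption)
phases = {
    "Phase 1": (32, 8, 128),
    "Phase 2": (256, 64, 1024),
}

for name, (msps, nodes, mem_gb) in phases.items():
    print(f"{name}: {system_peak_gflops(msps):.1f} GF peak, "
          f"{msps // nodes} MSPs/node, {mem_gb / msps:.0f} GB/MSP")
```

The exact peaks are 409.6 GF and 3276.8 GF, which the slides round to 400 GigaFLOP/s and 3.2 TeraFLOP/s. The same arithmetic gives 4096 × 12.8 = 52,428.8 GF for a 4096-processor system.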

Outline

• Evaluation Plan
• X1 architecture
• CCS X1 status

Evaluation Plan

Evaluate entire system

• Compare performance with other systems
  − Applications Performance Matrix
• Determine most-effective usage
• Evaluate system-software reliability and performance
• Predict scalability
• Collaborate with Cray on next generation

Hierarchical approach

• System software
• Microbenchmarks
• Parallel-paradigm evaluation
• Full applications
• Scalability evaluation

System-software evaluation

• Job-management systems
• Mean time between failures
• Mean time to repair
• All problems, with Cray responses
• Scalability and fault tolerance of OS
• Filesystem performance & scalability
• Tuning for HPSS, NFS, and wide-area high-bandwidth networks
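Mean time between failures (MTBF) and mean time to repair (MTTR) are often combined into a steady-state availability, A = MTBF / (MTBF + MTTR). The slide does not say how CCS would combine them, so this Python sketch is a generic illustration; the function names and the sample interval log are hypothetical, not measured X1 data.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the system is up.

    Assumes failures and repairs alternate, so an average cycle lasts
    MTBF + MTTR hours, of which MTBF hours are uptime.
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)

def mtbf_mttr(uptimes, downtimes):
    """Estimate MTBF and MTTR (hours) from logged up and down intervals."""
    return sum(uptimes) / len(uptimes), sum(downtimes) / len(downtimes)

# Hypothetical log: three uptime stretches and three repairs, in hours.
mtbf, mttr = mtbf_mttr([100.0, 150.0, 50.0], [2.0, 4.0, 6.0])
print(f"MTBF={mtbf:.1f} h, MTTR={mttr:.1f} h, A={availability(mtbf, mttr):.4f}")
```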

Microbenchmarking

• Results of standard benchmarks
• Performance metrics of components
  − Vector & scalar arithmetic
  − Memory hierarchy
  − Message passing
  − Process & thread management
  − I/O primitives
• Models of component performance
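One common way to model a communication component is the latency-bandwidth model, T(n) = α + n/β, for a message of n bytes. The slide does not name the models used, so this is an assumed example: β is set to the 12.8 GB/s per-MSP figure from the interconnect slide, while the 10 µs latency α is a hypothetical placeholder, not a measured X1 value. The sketch also computes n_1/2, the message size at which half the asymptotic bandwidth is reached.

```python
def transfer_time(n_bytes: float, latency_s: float, bandwidth_bps: float) -> float:
    """Latency-bandwidth model: startup cost plus bytes divided by bandwidth."""
    return latency_s + n_bytes / bandwidth_bps

def effective_bandwidth(n_bytes: float, latency_s: float, bandwidth_bps: float) -> float:
    """Bandwidth actually achieved for an n_bytes message under the model."""
    return n_bytes / transfer_time(n_bytes, latency_s, bandwidth_bps)

def half_bandwidth_size(latency_s: float, bandwidth_bps: float) -> float:
    """Message size n_1/2 at which half the asymptotic bandwidth is reached."""
    return latency_s * bandwidth_bps

ALPHA = 10e-6   # hypothetical latency in seconds (assumption)
BETA = 12.8e9   # bytes/s, per-MSP figure from the interconnect slide

for n in (1_000, 128_000, 10_000_000):
    print(f"{n:>10} bytes: {effective_bandwidth(n, ALPHA, BETA) / 1e9:.2f} GB/s")
print(f"n_1/2 = {half_bandwidth_size(ALPHA, BETA):.0f} bytes")
```

Fitting α and β to microbenchmark timings turns the measured numbers into a model that can be reused in the scalability predictions.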

Programming-paradigm evaluation

• MPI-1, MPI-2 one-sided, SHMEM, Global Arrays, Co-Array Fortran, UPC, OpenMP, MLP, …
• Identify best techniques for X1
• Develop optimization strategies for applications

Scalability evaluation

• Hot-spot analysis
  − Inter- and intra-node communication
  − Memory contention
  − Parallel I/O
• Trend analysis for selected communication and I/O patterns
• Trend analysis for kernel benchmarks
• Scalability predictions from performance models and bounds
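As one example of a scalability bound (the slides do not say which bounds CCS used), Amdahl's law limits the speedup on p processors when a fraction s of the work is serial: S(p) = 1 / (s + (1 − s)/p). The Python sketch below evaluates that bound at the Phase 1, Phase 2, and 4096-processor sizes; the serial fractions are hypothetical.

```python
def amdahl_speedup(p: int, serial_fraction: float) -> float:
    """Upper bound on speedup with p processors under Amdahl's law."""
    if p < 1 or not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("need p >= 1 and 0 <= serial_fraction <= 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

def parallel_efficiency(p: int, serial_fraction: float) -> float:
    """Speedup bound divided by the processor count."""
    return amdahl_speedup(p, serial_fraction) / p

# Hypothetical serial fractions; processor counts from the phase and goal slides.
for s in (0.01, 0.001):
    for p in (32, 256, 4096):
        print(f"s={s}, p={p}: speedup <= {amdahl_speedup(p, s):.1f}, "
              f"efficiency <= {parallel_efficiency(p, s):.2f}")
```

However many processors are added, the bound never exceeds 1/s, which is why even a 1% serial fraction matters at the 4096-processor scale.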

Application tuning & benchmarking

• Full applications of interest to DOE Office of Science
  − Scientific goals require multi-terascale resources
• Evaluation of performance, scaling, and efficiency
• Evaluation of ease/effectiveness of targeted tuning

Identifying applications

• Draft evaluation plan
• Prototype workshop, Nov. 5-6
• Feb. 3-5, 2003: Fusion
• Feb. 6, 2003: Climate
• March 2, 2003: Materials
• Current audience
  − This workshop…

Identifying applications

• Priorities
  − Potential performance and science payoff
• Schedule the pipeline
  − Porting, tuning, production runs
  − Small number of applications in each stage
• Potential application
  − Important to DOE Office of Science
  − Scientific goals require multi-terascale resources
• Potential user
  − Willing and able to learn the X1
  − Knows the application
  − Motivated to tune the application, not just recompile it

X1 Architecture

Cray X1 – Eventual goal

• World’s biggest single system
• 4096 processors, one system image
• 50+ TF
• 4-way SMP nodes
• Globally addressable memory among nodes

X1 node

• 4 multi-streaming processors (MSPs)
• 51 GF
• 16 GB shared memory (CCS)
• 200 GB/s bandwidth
  − 150 GB/s to local MSPs
  − Extra for remote access

[Figure: X1 node diagram showing four MSPs connected through memory controllers to local shared memory, with a link to remote memory]

X1: MSP and SSP

• MSP
  − 4 single-streaming processors (SSPs)
    • 12.8 GF (25.6 GF single precision)
  − 2 MB shared cache
    • 51 GB/s load, 26 GB/s store
    • 4-word cache line
• SSP
  − Decoupled scalar and vector units
  − 400 MHz 2-way superscalar unit
    • 16 kB each data & instruction caches
  − 800 MHz 2-pipe vector unit
    • 32 registers, 64 values (64-bit) each
    • 3 functional unit groups: add, multiply, divide/sqrt

[Figure: MSP diagram showing four SSPs sharing a 2 MB cache]
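The 12.8 GF MSP figure and the 51 GF node figure can be reproduced from the SSP description, assuming (as is usual for peak ratings) that each vector pipe completes one add and one multiply per clock; the slide lists the functional unit groups but does not state this explicitly. The Python sketch also converts the 4-word cache line to bytes, assuming 64-bit words.

```python
VECTOR_CLOCK_GHZ = 0.8    # 800 MHz vector unit
PIPES_PER_SSP = 2         # 2-pipe vector unit
FLOPS_PER_PIPE_CLOCK = 2  # assumption: one add + one multiply per pipe per clock
SSPS_PER_MSP = 4
MSPS_PER_NODE = 4
WORD_BYTES = 8            # 64-bit words
CACHE_LINE_WORDS = 4

ssp_peak = VECTOR_CLOCK_GHZ * PIPES_PER_SSP * FLOPS_PER_PIPE_CLOCK  # GF
msp_peak = ssp_peak * SSPS_PER_MSP
node_peak = msp_peak * MSPS_PER_NODE

print(f"SSP  {ssp_peak:.1f} GF")
print(f"MSP  {msp_peak:.1f} GF")
print(f"node {node_peak:.1f} GF")
print(f"cache line {CACHE_LINE_WORDS * WORD_BYTES} bytes")
# 4 SSPs x 2 pipes: the "one 8-pipe processor" view of an MSP
print(f"vector pipes per MSP: {PIPES_PER_SSP * SSPS_PER_MSP}")
```

This gives 3.2 GF per SSP, 12.8 GF per MSP, and 51.2 GF per node (the "51 GF" on the node slide), with a 32-byte cache line.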

X1: MSP

• Is an MSP one or four processors?
• One!
  − Fast synchronization
  − Shared cache
  − Can act like one 8-pipe processor
• Four!
  − Each SSP can operate independently
  − OpenMP-like directives
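The "one or four" question comes down to how a loop is split. The Python sketch below mimics what a compiler does when it multistreams and vectorizes a simple loop: the iteration space is divided among the 4 SSPs, and each SSP's share is processed in strips of at most 64 elements, the length of a vector register. All names are hypothetical, and on the X1 the decomposition is chosen by the compiler rather than written by hand; the innermost Python loop stands in for work the vector pipes would do in hardware.

```python
VECTOR_LENGTH = 64   # elements per vector register (64-bit values)
N_SSPS = 4           # SSPs per MSP

def ssp_ranges(n, n_ssps=N_SSPS):
    """Split range(n) into n_ssps contiguous blocks (multistreaming)."""
    base, extra = divmod(n, n_ssps)
    start = 0
    for k in range(n_ssps):
        stop = start + base + (1 if k < extra else 0)
        yield start, stop
        start = stop

def strips(start, stop, vl=VECTOR_LENGTH):
    """Strip-mine [start, stop) into chunks of at most vl elements."""
    for lo in range(start, stop, vl):
        yield lo, min(lo + vl, stop)

def daxpy_msp(a, x, y):
    """y[i] += a * x[i], organized as SSP blocks of vector strips.

    Returns the number of strips, i.e. vector operations issued.
    """
    n = len(x)
    vector_ops = 0
    for ssp_start, ssp_stop in ssp_ranges(n):          # one block per SSP
        for lo, hi in strips(ssp_start, ssp_stop):     # one vector strip
            for i in range(lo, hi):                    # vector pipes' work
                y[i] += a * x[i]
            vector_ops += 1
    return vector_ops

x = [1.0] * 1000
y = [2.0] * 1000
print(daxpy_msp(3.0, x, y), y[0], y[-1])
```

With 1000 elements each SSP gets 250, handled as strips of 64, 64, 64, and 58, so the loop becomes 16 vector operations spread over four SSPs.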

X1 interconnect

• 3D torus
  − Not full bisection bandwidth
  − Process topology matters
• 12.8 GB/s/MSP
• Memory to memory
• Globally addressable memory
  − Load/store memory on any node
  − Remote address translation
    • On memory’s node, not at processor
    • Avoids TLB misses
    • Requires contiguous processors
  − Cache coherent
    • Only cache local memory
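Because the interconnect is a 3D torus without full bisection bandwidth, the number of hops between communicating processes depends on where they are placed, which is why process topology matters. This Python sketch computes the minimal hop count between two nodes on a torus, where each dimension wraps around; the 4×4×4 shape is a hypothetical example, not the actual CCS wiring.

```python
from itertools import product

def torus_hops(a, b, dims):
    """Minimal hop count between coordinates a and b on a torus of shape dims.

    In each dimension the route may go either way around the ring,
    so the per-dimension distance is min(d, size - d).
    """
    total = 0
    for ai, bi, size in zip(a, b, dims):
        d = abs(ai - bi) % size
        total += min(d, size - d)
    return total

def average_hops(dims):
    """Average distance from the origin to every node (a torus is symmetric)."""
    origin = (0,) * len(dims)
    nodes = list(product(*(range(s) for s in dims)))
    return sum(torus_hops(origin, n, dims) for n in nodes) / len(nodes)

DIMS = (4, 4, 4)  # hypothetical 64-node torus
print(torus_hops((0, 0, 0), (3, 0, 0), DIMS))  # wraps around: 1 hop, not 3
print(torus_hops((0, 0, 0), (2, 2, 2), DIMS))  # farthest node: 6 hops
print(average_hops(DIMS))
```

Mapping heavily communicating processes to neighboring coordinates keeps their traffic at one hop instead of the average of three.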

Parallel models

• Multistreaming within MSP
• OpenMP within node (late 2003)
• Between nodes (or processors)
  − MPI-1 two-sided message passing
  − MPI-2 one-sided communication
  − SHMEM one-sided communication
  − Co-Array Fortran remote memory

CCS X1: Status

• Current state
• What’s there, what’s not
• Latest performance numbers

Status: Current State

• UNICOS/mp 2.1.09
  − Relatively stable
  − Tuesday afternoon was memorable, though
• PE 4.3.0.0
  − C, C++, Co-Array Fortran, and Fortran compilers
• Some Open Software
• LIBSCI, MPI, SHMEM
• gdb, totalviewcli, PAT performance tool
• DFS-NFS gateway to home directories
• /scratch scratch space

Status: What’s there

• LIBSCI
  − Optimized BLAS (some issues)
  − LAPACK
  − BLACS/ScaLAPACK (not yet optimized)
  − FFTs
  − NetCDF
    • Unofficially obtained from Cray
• Open Software
  − Perl 5.6.1 provided by Cray
  − Tcl/Tk 8.4 built locally

Status: What’s not there

• Passwords exported from DCE
  − chsh disabled
• Loopmark listings for C, coming in the fall
• Integration between psched and PBS (July)
• gcc
• OpenMP not supported yet
• MPI SSP mode
  − Not officially supported yet; due in PE 5.2 (Sept.)
• No way to effectively trap exceptions in a code
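Where floating-point exceptions cannot be trapped (assuming that is the kind of exception the slide means; it does not say), a common fallback is to check results for non-finite values after each phase of a computation. The hypothetical helpers below do that with Python's standard `math.isfinite`; in an X1 application the same check would be written in the code's own Fortran or C.

```python
import math

def first_nonfinite(values):
    """Return (index, value) of the first NaN or infinity, or None if all finite."""
    for i, v in enumerate(values):
        if not math.isfinite(v):
            return i, v
    return None

def check_phase(name, values):
    """Raise if a computation phase produced NaN or Inf (a stand-in for a trap)."""
    bad = first_nonfinite(values)
    if bad is not None:
        i, v = bad
        raise FloatingPointError(f"{name}: non-finite value {v!r} at index {i}")

check_phase("setup", [1.0, 2.5, -3.0])  # all finite: passes silently
try:
    check_phase("solve", [1.0, float("inf") - float("inf"), 2.0])
except FloatingPointError as err:
    print(err)
```

Unlike a hardware trap, this only reports that something went wrong somewhere in the phase, not which operation produced the bad value.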

Status: problems

• Single precision BLAS
  − CGEMM performance
    • Poor with leading dimension of 2^n (like 1024)
• Jobs migrating
• NFS performance issues
• CPES occasionally loses NFS mounts
• Network performance issues
• Compiler (cc and ftn) core dumping
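With a leading dimension of 2^n, walking along a row of a column-major matrix steps through memory with a power-of-two stride, which on many memory systems lands repeatedly on the same banks or cache sets. A common workaround, not one the slide prescribes, is to allocate the matrix with a padded leading dimension and pass that value as `lda` to the BLAS call. This Python sketch only computes the padding and the column-major indexing; the one-element pad and the helper names are assumptions, and whether padding cures the CGEMM slowdown on the X1 would have to be measured.

```python
def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0

def padded_ld(m: int, pad: int = 1) -> int:
    """Leading dimension for an m-row column-major matrix.

    If m is a power of two, add `pad` elements so that a row access
    (stride = leading dimension) no longer steps by a power of two.
    """
    return m + pad if is_power_of_two(m) else m

def index(i: int, j: int, ld: int) -> int:
    """Offset of element (i, j) in column-major storage with leading dimension ld."""
    return i + j * ld

m = n = 1024
lda = padded_ld(m)                   # 1025 instead of 1024
storage = [0j] * (lda * n)           # complex storage, as CGEMM operates on
storage[index(5, 3, lda)] = 1 + 2j   # element (5, 3); rows m..lda-1 are padding
print(lda, index(5, 3, lda))
```

Only the first m rows of each column hold data; the extra row is never read by a BLAS call that is given m as the row count and lda as the leading dimension.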

Status: Performance

Problems: Pivoting, matrix scaling

Conclusion

• Early results are promising
• Many issues and problems remain to be resolved
• Definitely not ready for the general user population
• Evaluation continues

Conclusion: Goals

• Computational Sciences:
  − Identify one or two important biology codes that are expected to benefit from vectorization
  − Identify people who will work on the optimization of these codes on the X1
• Life Sciences:
  − Hear what Cray has done with informatics algorithms to date
  − Demonstrate what non-traditional facilities can be accessed (i.e., all integer and word manipulation)
• Cray:
  − Understand ORNL's usage of applications in the life sciences; interested in hearing about genomics, proteomics, or systems biology
  − Work with scientists who have bio applications believed to be well suited to the X1 system, and see how Cray might assist in porting and optimizing those codes

References

• James L. Schwarzmeier. “Cray X1 Architecture and Hardware Overview.” ORNL Cray X1 Tutorial, November 2002.

• Nathan Wichmann. “Coding for Performance on the Cray X1.” ORNL Cray X1 Tutorial, November 2002.

• John M. Levesque. “Message-Passing Paradigms.” ORNL Cray X1 Tutorial, November 2002.

• Arthur S. Bland. “Cray X1 at ORNL.” ORNL Cray X1 Tutorial, November 2002.

• James B. White III. “DOE Evaluation of the Cray X1.” ORNL Cray X1 Tutorial, November 2002.

• James B. White III. “Modern Vector Systems.” ORNL Cray Fusion Workshop, February 2003.

• Plots taken from Tom Dunigan’s early evaluation web page http://www.csm.ornl.gov/~dunigan

