HPC’s Pivot to Data Suzanne Parete-Koon (OLCF)
Page 1: HPC’s Pivot to Data Suzanne Parete-Koon (OLCF)

HPC’s Pivot to Data Suzanne Parete-Koon (OLCF)

Page 2: HPC’s Pivot to Data Suzanne Parete-Koon (OLCF)

What is HPC’s Pivot to Data?

•  The boundaries of the scientific computing ecosystem are being pushed beyond single centers providing HPC resources or any single experimental facility.

• Increasing data volume, variety, and velocity require additional resources to foster a productive environment for scientific discovery. This focus on data does not imply a reduction in the importance of large-scale simulation capabilities.

• Predicting how and why scientific workflows achieve their observed end-to-end performance is a growing challenge for scientists.

• This talk will focus on data movement because it is an excellent example of a task that requires multi-facility collaboration to enable multi-facility scientific workflows.

Page 3: HPC’s Pivot to Data Suzanne Parete-Koon (OLCF)

Data Growth

[Charts: NERSC Storage Usage Projection (log scale), petabytes from roughly 0.1 to 1000 over 2000-2022, showing HPSS Total PB and NGF Total PB; NCCS/OLCF HPSS Usage since 2010, petabytes from roughly 10 to 35 over Jan 2010 - Mar 2014.]

Page 4: HPC’s Pivot to Data Suzanne Parete-Koon (OLCF)

Data Movement

[Chart: ESnet Accepted Traffic, Jan 1990 - Mar 2014, monthly accepted traffic in petabytes on a log scale from about 0.0001 to 100.]

Page 5: HPC’s Pivot to Data Suzanne Parete-Koon (OLCF)

Inter-Workflow: One HPC Center

Even projects with workflows designed to utilize only one HPC center need efficient data transfer, because data is typically moved to more permanent storage at the end of a project.
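At OLCF and NERSC that end-of-project move typically targets the HPSS archive. As a hedged illustration only (not a step from this talk), a minimal sketch that bundles a results directory into an HPSS-resident tar archive with the htar client; the project and archive paths are placeholders:

```python
import subprocess

def archive_to_hpss(local_dir: str, hpss_tar_path: str) -> None:
    """Bundle local_dir into a tar archive written directly into HPSS via htar.

    Assumes the htar client is installed on the login or data transfer node and
    that the caller is already authenticated to HPSS. Paths are placeholders.
    """
    # -c create an archive, -v verbose listing, -f path of the archive in HPSS
    subprocess.run(["htar", "-cvf", hpss_tar_path, local_dir], check=True)

# Placeholder paths, for illustration only
archive_to_hpss("run42_results", "/proj/abc123/archives/run42_results.tar")
```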

Page 6: HPC’s Pivot to Data Suzanne Parete-Koon (OLCF)

Intra-workflows: Simulation

Lingerfelt et al., "Near real-time data analysis of core-collapse supernova simulations with Bellerophon," in Proceedings of the International Conference on Computational Science, in press, 2014.

Page 7: HPC’s Pivot to Data Suzanne Parete-Koon (OLCF)

Intra-workflows: Experiment

Single-particle X-ray crystallography: roughly 10^6 diffractive images are reconstructed to reproduce an angstrom-scale 3-D structure.

Page 8: HPC’s Pivot to Data Suzanne Parete-Koon (OLCF)

Enabling Intra-workflow: Remote Transfers

Centers:
• How can we enable secure data movement?
• What transfer tools are best for the system?
• How can we efficiently monitor our network?

Users:
• Can I script the transfer? Can it be "click and forget"?
• How long will the transfer take? (A rough estimate is sketched below.)
• What tools do you recommend?
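A first-order answer to "how long will the transfer take?" is the data volume divided by the sustained end-to-end rate (from perfSONAR or a short test transfer), not the peak link speed. A minimal sketch of that arithmetic; the 10 TB volume and 4000 Mbit/s rate are illustrative assumptions, not measurements from these centers:

```python
def transfer_time_hours(data_tb: float, rate_mbit_s: float) -> float:
    """Estimate wall-clock transfer time from data volume and sustained rate.

    data_tb     : data volume in terabytes (decimal; 1 TB = 8e6 megabits)
    rate_mbit_s : sustained end-to-end rate in Mbit/s, e.g. from perfSONAR tests
    """
    megabits = data_tb * 8e6            # 1 TB = 1e12 bytes = 8e6 megabits
    return megabits / rate_mbit_s / 3600.0

# Illustrative example: 10 TB at a sustained 4000 Mbit/s (about 4 Gbit/s)
print(f"{transfer_time_hours(10, 4000):.1f} hours")    # roughly 5.6 hours
```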

Page 9: HPC’s Pivot to Data Suzanne Parete-Koon (OLCF)

The Science DMZ

• A network architecture explicitly designed for high-performance applications, where the science network is distinct from the general-purpose network

• The use of dedicated systems for data transfer
• Performance measurement and network testing systems that are regularly used to characterize the network and are available for troubleshooting

• Security policies and enforcement mechanisms that are tailored for high performance science environments

Page 10: HPC’s Pivot to Data Suzanne Parete-Koon (OLCF)

OLCF Infrastructure

[Diagram: Titan (27 PF, 299,008 cores, 18,688 Cray XK7 nodes), Eos (200 TF, 11,904 cores, 744 Cray XC30 nodes), Rhea, Focus, and EVEREST connect over the InfiniBand core (SION) to the Atlas1 and Atlas2 Lustre file systems (18 SFA12K controllers and 500 GB/s each) and to HPSS (35 PB, 5 GB/s ingest). The data transfer nodes bridge SION to the Ethernet core (2 x 10 GbE and 1 x 100 GbE) over FDR InfiniBand links.]

Page 11: HPC’s Pivot to Data Suzanne Parete-Koon (OLCF)

NERSC Infrastructure

Page 12: HPC’s Pivot to Data Suzanne Parete-Koon (OLCF)

perfSONAR: How well is the transfer performing?

http://fasterdata.es.net/performance-testing/perfsonar/perfsonar-dashboard/

Page 13: HPC’s Pivot to Data Suzanne Parete-Koon (OLCF)

perfSONAR: How is my transfer performing?

• Open to the public.

Page 14: HPC’s Pivot to Data Suzanne Parete-Koon (OLCF)

The Pros and Cons of SCP and Rsync

Cons: single stream, therefore slow, and they take poor advantage of the WAN. The common versions do not allow much control over buffer size or fault checking.

Pros: SCP is common on all Unix-like systems and can be scripted into a workflow when the transfer destination allows passwordless SSH logins (see the sketch below).
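As an illustration of that scripting point, a minimal sketch that wraps rsync (or scp) in a workflow step from Python, assuming passwordless SSH to the destination is already configured; the host and paths are placeholders, not real endpoints:

```python
import subprocess
import sys

def scripted_copy(src: str, dest: str, use_rsync: bool = True) -> None:
    """Copy src to dest over SSH; rsync can resume an interrupted transfer, scp cannot.

    Assumes passwordless SSH to the destination host is already configured.
    src and dest use the usual user@host:/path syntax and are placeholders here.
    """
    if use_rsync:
        # -a preserve attributes, --partial keep partial files so a rerun can resume
        cmd = ["rsync", "-a", "--partial", src, dest]
    else:
        cmd = ["scp", "-rp", src, dest]
    result = subprocess.run(cmd)
    if result.returncode != 0:
        sys.exit(f"transfer failed with exit code {result.returncode}")

# Placeholder endpoints, for illustration only
scripted_copy("/lustre/atlas/scratch/abc123/run42/",
              "user@dtn.example.gov:/project/run42/")
```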

Page 15: HPC’s Pivot to Data Suzanne Parete-Koon (OLCF)

The Pros and Cons of Parallel Streams: GridFTP, Globus Online, bbcp

Cons: the software must be available at both ends of the transfer, and availability varies. Setup is required, and security policies impact ease of use.

Pros: multiple streams allow fast transfers over the WAN, with many options for customization. Globus is extremely user-friendly once it is set up.

The setup effort is the "activation energy" of these tools.
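For comparison with the single-stream tools, a minimal sketch of launching a multi-stream GridFTP transfer with globus-url-copy from Python. It assumes the GridFTP client is installed and valid credentials already exist (the setup "activation energy" noted above); the endpoint URLs and stream count are illustrative assumptions, not a recipe from this talk:

```python
import subprocess

def guc_parallel(src_url: str, dest_url: str, streams: int = 8) -> None:
    """Run a GridFTP transfer with several parallel TCP streams via globus-url-copy.

    Assumes globus-url-copy is installed and credentials are already in place;
    the gsiftp:// URLs passed in are placeholders, not real endpoints.
    """
    cmd = [
        "globus-url-copy",
        "-p", str(streams),   # number of parallel data streams
        src_url,
        dest_url,
    ]
    subprocess.run(cmd, check=True)

# Placeholder GridFTP endpoints, for illustration only
guc_parallel("gsiftp://dtn.source.example.gov/lustre/atlas/run42/data.h5",
             "gsiftp://dtn.dest.example.gov/project/run42/data.h5")
```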

Page 16: HPC’s Pivot to Data Suzanne Parete-Koon (OLCF)

What rates can I expect?

0"

1000"

2000"

3000"

4000"

GUC" GUC"(stripe)" Globus" BBCP" RSYNC"

!Rate!Mbits!

1"TB" 10"100"GB"files"

0"20"40"60"80"100"120"

GUC" GUC"(stripe)" Globus" BBCP"

Average'Time'

1"TB" 10"100"GB"files"

Page 17: HPC’s Pivot to Data Suzanne Parete-Koon (OLCF)

Best tool features for the system?

Filesystem benchmarking with fio shows that the performance improvement from plain GUC to striped GUC comes from parallelism at the Lustre client.
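The fio results are not reproduced here, but the effect they point to, that a single Lustre client moves more data when the same volume is split across several concurrent writers, can be probed with a small script. A hedged sketch under assumed sizes and paths, not the benchmark behind the slide; point TARGET_DIR at a Lustre directory to exercise a real client:

```python
import multiprocessing as mp
import os
import time

CHUNK = 4 * 1024 * 1024           # 4 MiB per write call
FILE_SIZE = 512 * 1024 * 1024     # 512 MiB per writer (illustrative)
TARGET_DIR = "/tmp/io_probe"      # set to a Lustre directory to test a real client

def writer(path: str) -> None:
    """Write FILE_SIZE bytes to path in CHUNK-sized writes, then fsync."""
    buf = b"\0" * CHUNK
    with open(path, "wb") as f:
        for _ in range(FILE_SIZE // CHUNK):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())

def run(n_writers: int) -> float:
    """Return aggregate write bandwidth in MB/s with n_writers concurrent writers."""
    os.makedirs(TARGET_DIR, exist_ok=True)
    paths = [os.path.join(TARGET_DIR, f"probe_{i}.dat") for i in range(n_writers)]
    start = time.time()
    procs = [mp.Process(target=writer, args=(p,)) for p in paths]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    elapsed = time.time() - start
    for p in paths:
        os.remove(p)
    return n_writers * FILE_SIZE / elapsed / 1e6

if __name__ == "__main__":
    # More writers should raise aggregate bandwidth until the client saturates.
    for n in (1, 2, 4, 8):
        print(f"{n} writer(s): {run(n):.0f} MB/s")
```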

Page 18: HPC’s Pivot to Data Suzanne Parete-Koon (OLCF)

Case Studies

Project                                                Year        Methods                        Data     Transfer
Combustion research challenge for multi-lab workflow   2007-2009   DTNs / SCP / Kepler workflow   10 TB    OLCF → NERSC
20th Century Climate Reanalysis                        2011        HPSS direct                    40 TB    OLCF → NERSC
DNS combustion                                         2013        DTNs / Globus                  80 TB    OLCF → ALCF
LSCS                                                   2013-14     DTNs / bbcp, Globus            130 TB   SLAC → NERSC

Page 19: HPC’s Pivot to Data Suzanne Parete-Koon (OLCF)

Road Maps: Data-Centric Services

• ORNL is establishing a Compute & Data Environment for Science (CADES). OLCF is a key partner.
• CADES will be a hub to share data infrastructure and compute & data science capabilities with and among many projects.

Page 20: HPC’s Pivot to Data Suzanne Parete-Koon (OLCF)

Conclusion

• Success for workflows with large data movement depends critically on a "Science DMZ"-like approach to connectivity.

• HPC centers and their users benefit from collaboration between centers and facilities, and between centers and projects.

• Computing centers traditionally focused on large-scale simulation are expanding their repertoire to include user-facing data services.

• Data services are built around a long-lived resource: the data itself. Both HPC architecture and resource allocation will need to adapt to this longevity.

Page 21: HPC’s Pivot to Data Suzanne Parete-Koon (OLCF)

Data Transfer Working Group

This manuscript has been authored by an author at Lawrence Berkeley National Laboratory under Contract No. DE-AC02-05CH11231 with the U.S. Department of Energy. This work used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

The U.S. Government retains, and the publisher, by accepting the article for publication, acknowledges that the U.S. Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for U.S. Government purposes.

OLCF: B. Caldwell, J. Hill, C. Layton, S. Parete-Koon, D. Pelfrey, H. Nam, J. Wells

CADES: G. Shipman

NERSC: S. Cannon, D. Hazen, J. Hick, D. Skinner

ESnet: E. Dart, J. Zurawski

