Page 1

Science in the Clouds and Beyond

Lavanya Ramakrishnan

Computational Research Division (CRD) &

National Energy Research Scientific Computing Center (NERSC)

Lawrence Berkeley National Lab

Page 2

The goal of Magellan was to determine the appropriate role for cloud computing for science

[Chart: breakdown by Program Office: Advanced Scientific Computing Research 17%, Biological and Environmental Research 9%, Basic Energy Sciences - Chemical Sciences 10%, Fusion Energy Sciences 10%, High Energy Physics 20%, Nuclear Physics 13%, Advanced Networking Initiative (ANI) Project 3%, Other 14%]

[Chart: reasons users cited for interest in cloud computing (share of respondents, axis 0% to 90%)]
•  Access to additional resources
•  Access to on-demand (commercial) paid resources closer to deadlines
•  Ability to control software environments specific to my application
•  Ability to share setup of software or experiments with collaborators
•  Ability to control groups/users
•  Exclusive access to the computing resources / ability to schedule independently of other groups/users
•  Easier to acquire/operate than a local cluster
•  Cost associativity? (i.e., I can get 10 CPUs for 1 hr now or 2 CPUs for 5 hrs at the same cost; see the one-line check below)
•  MapReduce programming model / Hadoop
•  Hadoop File System
•  User interfaces / science gateways: use of clouds to host science gateways and/or access to cloud resources through science gateways
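The cost-associativity item above is just the observation that pay-per-use pricing makes cost a function of total core-hours consumed, not of the shape of the allocation. A one-line check, under an assumed illustrative hourly rate:

```python
rate = 0.10                 # assumed illustrative price per CPU-hour (not a quoted price)
print(rate * 10 * 1)        # 10 CPUs for 1 hour  -> 1.0
print(rate * 2 * 5)         # 2 CPUs for 5 hours  -> 1.0 (same cost)
```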

Page 3

Magellan was architected for flexibility and to support research

[Diagram: Magellan system configuration]
•  Compute Servers: 720 Nehalem servers, dual quad-core 2.66 GHz, 24 GB RAM, 500 GB disk each; totals: 5760 cores, 40 TF peak, 21 TB memory, 400 TB disk
•  Flash Storage Servers: 10 compute/storage nodes, 8 TB high-performance flash, 20 GB/s bandwidth
•  Big Memory Servers: 2 servers with 2 TB memory
•  Global Storage (GPFS): 1 PB
•  QDR InfiniBand interconnect; aggregation switch and router
•  Management Nodes (2), Gateway Nodes (27), I/O Nodes (9)
•  External connectivity: ESNet 10 Gb/s, ANI 100 Gb/s (future)
•  Archival storage

Page 4

Science + Clouds = ?


Data Intensive Science

Technologies from Cloud

Business model for Science

Performance and Cost

Page 5

Scientific applications with minimal communication are best suited for clouds

[Figure: Runtime relative to Magellan (non-VM) for GAMESS, GTC, IMPACT, fvCAM, and MAESTRO256 on Carver, Franklin, Lawrencium, EC2-Beta-Opt, Amazon EC2, and Amazon CC; lower is better]

Performance Analysis of High Performance Computing Applications on the Amazon Web Services Cloud, CloudCom 2010

Page 6

Scientific applications with minimal communication are best suited for clouds

[Figure: Runtime relative to Magellan for MILC and PARATEC on Carver, Franklin, Lawrencium, EC2-Beta-Opt, and Amazon EC2; lower is better]

Page 7

The principal decrease in bandwidth occurs when switching to TCP over IB.

HPCC: PingPong Bandwidth
[Figure: PingPong bandwidth (GB/s) vs. number of cores (32 to 1024) for IB, TCPoIB, 10G-TCPoEth, and Amazon CC; higher is better; annotations mark roughly 2x and 5x gaps relative to the best result (native IB)]

Evaluating Interconnect and Virtualization Performance for High Performance Computing, ACM Perf Review 2012
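To make concrete what the HPCC PingPong figures measure, here is a minimal two-rank ping-pong bandwidth sketch using mpi4py. It is an illustrative stand-in, not the HPCC benchmark itself; the message size, repetition count, and file name are arbitrary choices for this example.

```python
# pingpong.py: a minimal ping-pong bandwidth sketch (not the HPCC benchmark itself).
# Run with two ranks, e.g.: mpirun -n 2 python pingpong.py
from mpi4py import MPI
import numpy as np
import time

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

nbytes = 1 << 20                      # 1 MiB message; arbitrary illustrative size
reps = 100
buf = np.zeros(nbytes, dtype=np.uint8)

comm.Barrier()
start = time.perf_counter()
for _ in range(reps):
    if rank == 0:
        comm.Send(buf, dest=1)        # ping
        comm.Recv(buf, source=1)      # pong
    elif rank == 1:
        comm.Recv(buf, source=0)
        comm.Send(buf, dest=0)
elapsed = time.perf_counter() - start

if rank == 0:
    # Two messages of nbytes cross the link per iteration.
    print(f"PingPong bandwidth: {2 * reps * nbytes / elapsed / 1e9:.2f} GB/s")
```

The same exchange, timed per message instead of in aggregate, gives the PingPong latency shown on the later slides.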

Page 8

Ethernet connections are unable to cope with significant amounts of network contention

[Figure: Random order bandwidth (GB/s) vs. number of cores (32 to 1024) for IB, TCPoIB, 10G-TCPoEth, Amazon CC, 10G-TCPoEth VM, and 1G-TCPoEth; higher is better; native IB is best]

Page 9

The principal increase in latency occurs for TCP over IB even at mid-range concurrency

HPCC: PingPong Latency
[Figure: PingPong latency (us) vs. number of cores (32 to 1024) for IB, TCPoIB, 10G-TCPoEth, Amazon CC, 10G-TCPoEth VM, and 1G-TCPoEth; lower is better; the worst case is roughly 40x native IB]

Page 10

Latency is affected by contention to a greater degree than bandwidth

HPCC: RandomRing Latency
[Figure: Random order latency (us) vs. number of cores (32 to 1024) for IB, TCPoIB, 10G-TCPoEth, Amazon CC, 10G-TCPoEth VM, and 1G-TCPoEth; lower is better; the 10G VM case shows roughly a 6x increase]

Page 11

Clouds require significant programming and system administration support


•  STAR performed real-time analysis of data coming from Brookhaven National Lab

•  First time the data had been analyzed in real time to this degree

•  Leveraged existing OS image from NERSC system

•  Started out with 20 VMs at NERSC and expanded to ANL.

Page 12

Providing on-demand access for scientific applications might be difficult, if not impossible

Number of cores required to run a job immediately upon submission to Franklin


Page 13

Public clouds can be more expensive than large in-house systems


Component                        Cost
Compute Systems (1.38B hours)    $180,900,000
HPSS (17 PB)                     $12,200,000
File Systems (2 PB)              $2,500,000
Total (Annual Cost)              $195,600,000

Assumes 85% utilization and zero growth in HPSS and File System data. Doesn’t include the 2x-10x performance impact that has been measured.

These components still cover only about 65% of what NERSC's $55M annual budget provides: no consulting staff, no administration, no support.
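A quick back-of-the-envelope check of the slide's arithmetic, sketched in Python; the implied per-core-hour rate is derived from the slide's own totals rather than from any quoted cloud price list.

```python
# Recompute the slide's annual cost comparison from its own numbers (illustrative only).
compute_hours = 1.38e9          # annual compute hours (from the slide)
compute_cost  = 180_900_000     # cloud cost of those hours (from the slide)
hpss_cost     = 12_200_000      # 17 PB archival storage (from the slide)
filesys_cost  = 2_500_000       # 2 PB file systems (from the slide)
nersc_budget  = 55_000_000      # NERSC annual budget (from the slide)

total_cloud = compute_cost + hpss_cost + filesys_cost
print(f"Implied rate:     ${compute_cost / compute_hours:.3f} per core-hour")
print(f"Total cloud cost: ${total_cloud / 1e6:.1f}M per year")            # ~$195.6M
print(f"Cloud vs. NERSC:  {total_cloud / nersc_budget:.1f}x the budget")  # ~3.6x
```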

Page 14

Cloud is a business model and can be applied to HPC centers


Traditional Enterprise IT vs. HPC Centers

•  Typical Load Average: ~30% (enterprise IT) vs. ~90% (HPC centers)

•  Computational Needs: bounded requirements, sufficient to meet customer demand or transaction rates (enterprise IT) vs. virtually unbounded requirements; scientists always have larger, more complicated problems to simulate or analyze (HPC centers)

•  Scaling Approach: scale-in, consolidating onto a node using virtualization (enterprise IT) vs. scale-out, with applications running in parallel across multiple nodes (HPC centers)

Cloud vs. HPC Centers

•  NIST Definition: resource pooling, broad network access, measured service, rapid elasticity, on-demand self-service (cloud) vs. resource pooling, broad network access, measured service, with limited rapid elasticity and on-demand self-service (HPC centers)

•  Workloads: high-throughput, modest-data workloads (cloud) vs. highly synchronous, large-concurrency parallel codes with significant I/O and communication (HPC centers)

•  Software Stack: flexible, user-managed custom software stacks (cloud) vs. access to parallel file systems and low-latency, high-bandwidth interconnects, with preinstalled, pre-tuned application software stacks for performance (HPC centers)

Page 15

Science + Clouds = ?


Data Intensive Science

Technologies from Cloud

Business model for Science

Performance and Cost

Page 16

MapReduce shows promise but current implementations have gaps for scientific applications


High throughput workflows

Scaling up from desktops

Gaps in current implementations:
•  File system: non-POSIX
•  Language: Java
•  Input and output formats: mostly line-oriented text
•  Streaming mode: restrictive input/output model
•  Data locality: what happens when there are multiple inputs?
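To make the line-oriented streaming constraint concrete, below is a minimal, hypothetical Hadoop Streaming word-count script (mapper and reducer in one file). Streaming hands each input record to user code as a text line on stdin and expects tab-separated key/value lines on stdout, which is exactly what makes binary formats and multi-file scientific inputs awkward. The file name and the map/reduce argument convention are invented for illustration.

```python
#!/usr/bin/env python3
# streaming_wordcount.py: a hypothetical, minimal Hadoop Streaming example.
# Streaming passes records as text lines on stdin and reads "key<TAB>value"
# lines on stdout, i.e., the model is strictly line-oriented text.
import sys

def mapper():
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")                    # emit key<TAB>value per word

def reducer():
    # The framework sorts map output by key before it reaches the reducer.
    current, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t", 1)
        if key == current:
            count += int(value)
        else:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = key, int(value)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":
    role = sys.argv[1] if len(sys.argv) > 1 else "map"
    mapper() if role == "map" else reducer()
```

Such a pair is typically launched through the hadoop-streaming jar with its -input, -output, -mapper, and -reducer options; every record must survive a round trip through this text pipe, which is one source of the overhead shown on the next slide.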

Page 17

Streaming adds a performance overhead


Evaluating Hadoop for Science, In submission

Page 18

High-performance file systems can be used with MapReduce at lower concurrency

[Figure: Teragen (1 TB) completion time (minutes) vs. number of maps (up to 3000) on HDFS and GPFS, with linear and exponential trend lines; lower is better]

Page 19

Data operations impact the performance differences


Page 20

Schemaless databases show promise for scientific applications

[Diagram: multiple manager.x processes feeding a schemaless database (the "brain") behind www.materialsproject.org. Source: Michael Kocher, Daniel Gunter]
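The slide does not name the underlying system, so as a purely illustrative sketch, here is how heterogeneous calculation records might sit side by side in a schemaless document store. MongoDB via pymongo is assumed, and the database, collection, field names, and values are all invented for this example.

```python
# Hypothetical sketch: heterogeneous scientific records in a schemaless store.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
tasks = client["materials_demo"]["tasks"]          # invented database/collection names

# Two documents with different shapes coexist in the same collection;
# no schema migration is needed when a new kind of calculation appears.
tasks.insert_one({"formula": "LiFePO4", "task": "relaxation",
                  "energy_eV": -191.3, "k_points": [4, 4, 4]})
tasks.insert_one({"formula": "Si", "task": "band_structure",
                  "band_gap_eV": 1.1, "parser_version": "0.3"})

# Query across both shapes without a fixed schema.
for doc in tasks.find({"formula": {"$exists": True}}, {"_id": 0}):
    print(doc)
```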

Page 21

Data-centric infrastructure will need to evolve to handle large scientific data volumes


Joint Genome Institute, Advanced Light Source, etc. are all facing a data tsunami

Page 22

Cloud is a business model and can be applied at DOE supercomputing centers

•  Current day cloud computing solutions have gaps for science
   –  performance, reliability, stability
   –  programming models are difficult for legacy apps
   –  security mechanisms and policies

•  HPC centers can adopt some of the technologies and mechanisms
   –  support for data-intensive workloads
   –  allow custom software environments
   –  provide different levels of service


Page 23

Acknowledgements

•  US Department of Energy DE-AC02-05CH11232
•  Magellan
   –  Shane Canon, Tina Declerck, Iwona Sakrejda, Scott Campbell, Brent Draney
•  Amazon Benchmarking
   –  Krishna Muriki, Nick Wright, John Shalf, Keith Jackson, Harvey Wasserman, Shreyas Cholia
•  Magellan/ANL
   –  Susan Coghlan, Piotr T Zbiegiel, Narayan Desai, Rick Bradshaw, Anping Liu
•  NERSC
   –  Jeff Broughton, Kathy Yelick
•  Applications
   –  Jared Wilkening, Gabe West, Ed Holohan, Doug Olson, Jan Balewski, STAR collaboration, K. John Wu, Alex Sim, Prabhat, Suren Byna, Victor Markowitz
