Michael L. Norman Principal Investigator Interim Director, SDSC

Date post: 02-Jan-2016
Page 1: Michael L. Norman Principal Investigator Interim Director, SDSC

SAN DIEGO SUPERCOMPUTER CENTER

at the UNIVERSITY OF CALIFORNIA, SAN DIEGO

Michael L. Norman
Principal Investigator
Interim Director, SDSC

Allan Snavely
Co-Principal Investigator
Project Scientist

Page 2: Michael L. Norman Principal Investigator Interim Director, SDSC

What is Gordon?

• A “data-intensive” supercomputer based on SSD flash memory and virtual shared memory
  • Emphasizes MEM and IO over FLOPS

• A system designed to accelerate access to massive databases being generated in all fields of science, engineering, medicine, and social science

• The NSF’s most recent Track 2 award to the San Diego Supercomputer Center (SDSC)

• Coming Summer 2011

Page 3: Michael L. Norman Principal Investigator Interim Director, SDSC

Why Gordon?

• Growth of digital data is exponential
  • “data tsunami”

• Driven by advances in digital detectors, networking, and storage technologies

• Making sense of it all is the new imperative
  • data analysis workflows
  • data mining
  • visual analytics
  • multiple-database queries
  • on demand data-driven applications

Page 4: Michael L. Norman Principal Investigator Interim Director, SDSC

The Memory Hierarchy

Flash SSD, O(TB), 1000 cycles

Potential 10x speedup for random I/O to large files and databases

Page 5: Michael L. Norman Principal Investigator Interim Director, SDSC

Gordon Architecture: “Supernode”

• 32 Appro Extreme-X compute nodes
  • Dual processor Intel Sandy Bridge
  • 240 GFLOPS
  • 64 GB

• 2 Appro Extreme-X IO nodes
  • Intel SSD drives
  • 4 TB ea.
  • 560,000 IOPS

• ScaleMP vSMP virtual shared memory
  • 2 TB RAM aggregate
  • 8 TB SSD aggregate

[Figure: supernode diagram. Compute nodes (240 GF, 64 GB RAM each) and a 4 TB SSD I/O node, joined by vSMP memory virtualization]
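The aggregate figures on this slide follow from the per-node figures. A minimal sketch of that arithmetic, assuming 1 TB = 1024 GB for RAM (the slide does not say which convention it uses); the function and constant names are illustrative only:

```python
# Per-node figures as listed on the slide.
COMPUTE_NODES = 32        # compute nodes per supernode
IO_NODES = 2              # I/O nodes per supernode
GFLOPS_PER_NODE = 240     # peak per compute node
RAM_GB_PER_NODE = 64      # DRAM per compute node
SSD_TB_PER_IO_NODE = 4    # flash per I/O node


def supernode_totals():
    """Return (RAM in TB, SSD in TB, peak TFLOPS) for one supernode."""
    ram_tb = COMPUTE_NODES * RAM_GB_PER_NODE / 1024   # assumption: 1 TB = 1024 GB
    ssd_tb = IO_NODES * SSD_TB_PER_IO_NODE
    peak_tflops = COMPUTE_NODES * GFLOPS_PER_NODE / 1000
    return ram_tb, ssd_tb, peak_tflops


ram_tb, ssd_tb, peak_tflops = supernode_totals()
print(f"RAM {ram_tb:g} TB, SSD {ssd_tb} TB, peak {peak_tflops:g} TFLOPS")
```

This reproduces the 2 TB RAM and 8 TB SSD aggregates. The per-supernode peak (7.68 TFLOPS) is not stated on the slide; it is derived here from 32 × 240 GFLOPS.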

Page 6: Michael L. Norman Principal Investigator Interim Director, SDSC

Gordon Architecture: Full Machine

• 32 supernodes = 1024 compute nodes

• Dual rail QDR InfiniBand network
  • 3D torus (4x4x4)

• 4 PB rotating disk parallel file system
  • >100 GB/s

[Figure: full-machine diagram. Boxes labeled SN (supernode) above a row of boxes labeled D]
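To make the "3D torus (4x4x4)" topology concrete, here is a minimal sketch that enumerates the vertices of such a torus and the neighbors of one vertex, with wraparound on every axis. The slide does not say what sits at each vertex, so the coordinates here are purely illustrative, and `torus_neighbors` is a hypothetical helper name:

```python
from itertools import product

DIMS = (4, 4, 4)  # torus shape from the slide


def torus_neighbors(coord, dims=DIMS):
    """Return the distinct neighbors of coord, wrapping around on each axis."""
    neighbors = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size  # modulo gives the wraparound link
            neighbors.add(tuple(n))
    neighbors.discard(tuple(coord))
    return sorted(neighbors)


vertices = list(product(*(range(d) for d in DIMS)))
print(len(vertices))                 # number of torus vertices
print(torus_neighbors((0, 0, 0)))    # each vertex has 6 neighbors in a 4x4x4 torus
```

In a 4x4x4 torus every vertex has six neighbors, so no vertex sits on an "edge" of the network the way it would in a plain 3D mesh.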

Page 7: Michael L. Norman Principal Investigator Interim Director, SDSC

Gordon Peak Capabilities

Speed               245 TFLOPS
Mem (RAM)           64 TB
Mem (SSD)           256 TB
Mem (RAM+SSD)       320 TB
Ratio (MEM/SPEED)   1.31 BYTES/FLOP
IO rate to SSDs     35 Million IOPS
Network bandwidth   16 GB/s bi-directional
Network latency     1 μsec.
Disk storage        4 PB
Disk IO Bandwidth   >100 GB/sec
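Most of these peak figures can be re-derived from the per-node numbers given earlier in the deck (32 supernodes, each with 32 compute nodes at 240 GFLOPS and 64 GB, and 2 I/O nodes at 4 TB and 560,000 IOPS). The sketch below is a consistency check under those inputs, assuming 1 TB = 1024 GB for RAM; `gordon_peaks` is a hypothetical name:

```python
def gordon_peaks(supernodes=32):
    """Derive full-machine peak figures from the per-node numbers in the deck."""
    compute_nodes = supernodes * 32
    io_nodes = supernodes * 2
    ram_tb = compute_nodes * 64 / 1024      # assumption: 1 TB = 1024 GB
    ssd_tb = io_nodes * 4
    return {
        "compute_nodes": compute_nodes,
        "io_nodes": io_nodes,
        "peak_tflops": compute_nodes * 240 / 1000,
        "ram_tb": ram_tb,
        "ssd_tb": ssd_tb,
        "total_mem_tb": ram_tb + ssd_tb,
        "ssd_million_iops": io_nodes * 560_000 / 1e6,
    }


peaks = gordon_peaks()
for name, value in peaks.items():
    print(f"{name}: {value:g}")

# The slide's ratio uses the rounded 245 TFLOPS figure.
print(f"bytes/flop: {peaks['total_mem_tb'] / 245:.2f}")
```

The derived values (245.76 TFLOPS, 35.84 million IOPS) are slightly above the rounded figures in the table, and 320 TB / 245 TFLOPS gives the quoted 1.31 bytes/flop.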

Page 8: Michael L. Norman Principal Investigator Interim Director, SDSC

Gordon is designed specifically for data-intensive HPC applications

• Such applications involve “very large data-sets or very large input-output requirements”

• Two data-intensive application classes are important and growing

Data Mining

“the process of extracting hidden patterns from data… with the amount of data doubling every three years, data mining is becoming an increasingly important tool to transform this data into information.” Wikipedia

Data-Intensive Predictive Science

solution of scientific problems via simulations that generate large amounts of data

Page 9: Michael L. Norman Principal Investigator Interim Director, SDSC

High Performance Computing (HPC) vs High Performance Data (HPD)

Attribute                  HPC                                  HPD
Key HW metric              Peak FLOPS                           Peak IOPS
Architectural features     Many small-memory multicore nodes    Fewer large-memory SMP nodes
Typical application        Numerical simulation                 Database query, data mining
Concurrency                High concurrency                     Low concurrency or serial
Data structures            Data easily partitioned, e.g. grid   Data not easily partitioned, e.g. graph
Typical disk I/O patterns  Large block sequential               Small block random
Typical usage mode         Batch process                        Interactive
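The two disk I/O patterns in the table can be illustrated by the request offsets they generate. This is a minimal sketch, not a benchmark: the file size and block sizes are assumptions chosen for illustration, and the helper names are hypothetical.

```python
import random

FILE_SIZE = 1 << 30      # assumed 1 GiB file
LARGE_BLOCK = 1 << 20    # assumed 1 MiB block for the sequential (HPC) pattern
SMALL_BLOCK = 4 << 10    # assumed 4 KiB block for the random (HPD) pattern


def sequential_offsets(file_size, block_size):
    """Offsets of back-to-back reads covering the file from start to end."""
    return list(range(0, file_size - block_size + 1, block_size))


def random_offsets(file_size, block_size, count, seed=0):
    """Offsets of block-aligned reads at uniformly random positions."""
    rng = random.Random(seed)
    blocks = file_size // block_size
    return [rng.randrange(blocks) * block_size for _ in range(count)]


def count_seeks(offsets, block_size):
    """Count requests that do not start where the previous request ended."""
    return sum(1 for prev, cur in zip(offsets, offsets[1:])
               if cur != prev + block_size)


seq = sequential_offsets(FILE_SIZE, LARGE_BLOCK)
rnd = random_offsets(FILE_SIZE, SMALL_BLOCK, count=1000)
print(count_seeks(seq, LARGE_BLOCK), "seeks in the sequential pattern")
print(count_seeks(rnd, SMALL_BLOCK), "seeks in the random pattern")
```

On rotating disk each such seek costs a head movement, which is why the small-block random pattern is the case where flash is expected to help most.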

Page 10: Michael L. Norman Principal Investigator Interim Director, SDSC

Data mining applications will benefit from Gordon

• De novo genome assembly from sequencer reads & analysis of galaxies from cosmological simulations and observations
  • Will benefit from large shared memory

• Federations of databases and interaction network analysis for drug discovery, social science, biology, epidemiology, etc.
  • Will benefit from low latency I/O from flash

Page 11: Michael L. Norman Principal Investigator Interim Director, SDSC

Data-intensive predictive science will benefit from Gordon

• Solution of inverse problems in oceanography, atmospheric science, & seismology
  • Will benefit from a balanced system, especially large RAM per core & fast I/O

• Modestly scalable codes in quantum chemistry & structural engineering
  • Will benefit from large shared memory

Page 12: Michael L. Norman Principal Investigator Interim Director, SDSC

Dash: towards a supercomputer for data-intensive computing

Page 13: Michael L. Norman Principal Investigator Interim Director, SDSC

Project Timeline

• Phase 1: Dash development (9/09-7/11)
• Phase 2: Gordon build and acceptance (3/11-7/11)
• Phase 3: Gordon operations (7/11-6/14)

Page 14: Michael L. Norman Principal Investigator Interim Director, SDSC

Comparison of the Dash and Gordon systems

Doubling capacity halves accessibility to any random data on a given medium

System Component                               Dash                       Gordon
Node Characteristics (# sockets, cores, DRAM)  2 sockets, 8 cores, 48 GB  2 sockets, TBD cores, 64 GB
Compute Nodes (#)                              64                         1024
Processor Type                                 Nehalem                    Sandy Bridge
Clock Speed (GHz)                              2.4                        TBD
Peak Speed (Tflops)                            4.9                        245
DRAM (TB)                                      3                          64
I/O Nodes (#)                                  2                          64
I/O Controllers per Node                       2 with 8 ports             1 with 16 ports
Flash (TB)                                     2                          256
Total Memory: DRAM + flash (TB)                5                          320
vSMP                                           Yes                        Yes
32-node Supernodes                             2                          32
Interconnect                                   InfiniBand                 InfiniBand
Disk                                           .5 PB                      4.5 PB
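The Dash peak-speed entry can be checked from the node count, cores per node, and clock speed in the table. The sketch below assumes 4 double-precision floating-point operations per core per cycle for Nehalem, a figure that is not on the slide, and also computes the Gordon-to-Dash scale factor for a few rows; all names are illustrative.

```python
def peak_tflops(nodes, cores_per_node, clock_ghz, flops_per_cycle=4):
    """Peak speed in TFLOPS; flops_per_cycle=4 is an assumption for Nehalem."""
    return nodes * cores_per_node * clock_ghz * flops_per_cycle / 1000


dash_peak = peak_tflops(nodes=64, cores_per_node=8, clock_ghz=2.4)
print(f"Dash peak: {dash_peak:.1f} TFLOPS")

# (Dash, Gordon) pairs copied from the comparison table.
table = {
    "Compute nodes": (64, 1024),
    "Peak speed (Tflops)": (4.9, 245),
    "DRAM (TB)": (3, 64),
    "Flash (TB)": (2, 256),
    "Disk (PB)": (0.5, 4.5),
}
scale = {name: gordon / dash for name, (dash, gordon) in table.items()}
for name, factor in scale.items():
    print(f"{name}: Gordon is about {factor:.1f}x Dash")
```

Under that assumption the formula gives 4.9 TFLOPS for Dash, matching the table. The same check cannot be run for Gordon because its core count and clock speed are listed as TBD.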

Page 15: Michael L. Norman Principal Investigator Interim Director, SDSC

Gordon project wins storage challenge at SC09 with Dash

Page 16: Michael L. Norman Principal Investigator Interim Director, SDSC

We won SC09 Data Challenge with Dash!

• With these numbers:
  • IOR 4KB
    • RAMFS 4 Million+ IOPS on up to .750 TB of DRAM (1 supernode’s worth)
    • 88K+ IOPS on up to 1 TB of flash (1 supernode’s worth)
  • Speed up Palomar Transients database searches 10x to 100x
  • Best IOPS per dollar

• Since that time we boosted flash IOPS to 540K (hitting our 2011 performance targets – it is now 2009)
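To put the IOPS figures in bandwidth terms, the sketch below converts them at a 4 KB transfer size. It assumes that "IOR 4KB" means 4096-byte transfers and that every IOPS figure on the slide was measured at that size; neither is stated explicitly, and `iops_to_mb_per_s` is a hypothetical helper.

```python
TRANSFER_BYTES = 4096  # assumption: "4KB" means 4096 bytes per I/O


def iops_to_mb_per_s(iops, transfer_bytes=TRANSFER_BYTES):
    """Convert an I/O rate to throughput in MB/s (1 MB = 10**6 bytes)."""
    return iops * transfer_bytes / 1e6


figures = {
    "RAMFS (4 million IOPS)": 4_000_000,
    "flash at SC09 (88K IOPS)": 88_000,
    "flash after tuning (540K IOPS)": 540_000,
}
for label, iops in figures.items():
    print(f"{label}: {iops_to_mb_per_s(iops):,.1f} MB/s")
```

At this transfer size, even 540K IOPS corresponds to only about 2.2 GB/s, which shows why IOPS rather than bandwidth is the metric of interest for small random accesses.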

Page 17: Michael L. Norman Principal Investigator Interim Director, SDSC

Dash Update – early vSMP test results

Page 18: Michael L. Norman Principal Investigator Interim Director, SDSC

Dash Update – early vSMP test results

Page 19: Michael L. Norman Principal Investigator Interim Director, SDSC

Next Steps

• Continue vSMP and flash SSD assessment and development on Dash
• Prototype Gordon application profiles using Dash
  • New application domains
  • New usage modes and operational support mechanisms
  • New user support requirements
• Work with TRAC to identify candidate apps
• Assemble Gordon User Advisory Committee
• International Data-Intensive Conference Fall 2010

