7/31/2019 Martin Tsenkov;ELFE;221210014;77B
Computing is changing more rapidly than ever before, and scientists have an unprecedented opportunity to change computing directions.
- Largest computer at a given time
- Technical use for science and engineering calculations
- Large government defense, weather, and aero laboratories are the first buyers
- Price is no object
- Market size is 3-5
5/26/2012 Copyright G Bell & TCM History Center 2
Major challenges are ahead for extreme computing:
- Power
- Parallelism
- and many others not discussed here
We will need completely new approaches and technologies to reach the exascale level. This opens up a unique opportunity for science applications to lead extreme-scale systems development.
Commercial Parallel Computer Architecture (from loosely coupled to tightly coupled):
- Commodity processor with commodity interconnect: clusters (Pentium, Itanium, Opteron, Alpha; GigE, Infiniband, Myrinet, Quadrics, SCI)
- Commodity processor with custom interconnect: NEC TX7, HP Alpha, SGI Altix (Intel Itanium 2), Cray Red Storm (AMD Opteron)
- Custom processor with custom interconnect: Cray X1, NEC SX-7, IBM Regatta, IBM Blue Gene/L
SGI Altix: the Columbia supercomputer at NASA's Advanced Supercomputing Facility at Ames Research Center. It is a 10,240-processor SGI Altix system comprising 20 nodes, each with 512 Intel Itanium 2 processors, running a Linux operating system. Used for black hole simulations.
Other machines: Hitachi SR11000, NEC SX-7, Apple, Cray RedStorm, Cray BlackWidow, IBM Blue Gene/L
http://imagine.gsfc.nasa.gov/Images/news/columbia_computer.jpg
Time   $M     Structure                Example
1950   1      mainframes               many...
1960   3      instruction //-ism       IBM / CDC mainframe SMP
1970   10     pipelining               7600 / Cray 1
1980   30     vectors; SCI             Crays
1990   250    MIMDs: mC, SMP, DSM      Crays/MPP
2000   1,000  ASCI, COTS MPP           Grid, Legion
- Intel Pentium Xeon, 3.2 GHz: peak = 6.4 Gflop/s; Linpack 100 = 1.7 Gflop/s; Linpack 1000 = 3.1 Gflop/s
- AMD Opteron, 2.2 GHz: peak = 4.4 Gflop/s; Linpack 100 = 1.3 Gflop/s; Linpack 1000 = 3.1 Gflop/s
- Intel Itanium 2, 1.5 GHz: peak = 6 Gflop/s; Linpack 100 = 1.7 Gflop/s; Linpack 1000 = 5.4 Gflop/s
- HP Alpha EV68, 1.25 GHz: peak = 2.5 Gflop/s
- Others: HP PA RISC, Sun UltraSPARC IV, MIPS R16000
Linpack: a standard benchmark that measures how fast a computer runs by solving a dense system of linear equations.
Gflop/s: one billion floating-point operations per second.
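The gap between peak and Linpack numbers above can be summarized as an efficiency ratio (achieved / peak). A small sketch using the Linpack 1000 figures quoted above:

```python
# Linpack efficiency = achieved Gflop/s divided by theoretical peak,
# using the (peak, Linpack 1000) figures quoted above for each processor.
processors = {
    "Intel Pentium Xeon 3.2 GHz": (6.4, 3.1),
    "AMD Opteron 2.2 GHz":        (4.4, 3.1),
    "Intel Itanium 2 1.5 GHz":    (6.0, 5.4),
}

for name, (peak, achieved) in processors.items():
    print(f"{name}: {achieved / peak:.0%} of peak")
```

Note how the Itanium 2 reaches about 90% of peak on Linpack 1000, while the Xeon stays below 50%.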
Interconnect        Switch topology   NIC $    Node $   MPI Lat (us)   1-way speed (MB/s)   Bi-Dir speed (MB/s)
Gigabit Ethernet    Bus               $50      $100     30             100                  150
SCI                 Torus             $1,600   $1,600   5              300                  400
QsNetII (R)         Fat Tree          $1,200   $2,900   3              880                  900
Myrinet (D card)    Clos              $595     $995     6.5            240                  480
Myrinet (E card)    Clos              $995     $1,395   6              450                  900
InfiniBand 4X       Fat Tree          $1,000   $1,400   6              820                  790
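A common first-order way to read the latency and bandwidth columns together (a textbook approximation, not part of the original table) is to model the cost of one message as latency plus transfer time:

```python
# First-order message-cost model: time = latency + size / bandwidth.
# Latency (us) and one-way bandwidth (MB/s) are taken from the table
# above; the linear model itself is a standard approximation.
def message_time_us(size_bytes, latency_us, bandwidth_mb_s):
    transfer_us = size_bytes / (bandwidth_mb_s * 1e6) * 1e6
    return latency_us + transfer_us

# A 1 KB message over Gigabit Ethernet vs. QsNetII:
gige  = message_time_us(1024, 30.0, 100.0)   # ~40.2 us
qsnet = message_time_us(1024, 3.0, 880.0)    # ~4.2 us
print(f"GigE: {gige:.1f} us, QsNetII: {qsnet:.1f} us")
```

For small messages the latency term dominates, which is why the low-latency interconnects are attractive for MPI traffic even when raw bandwidth differences are modest.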
Tree network: there is only one path between any pair of processors.
Fat tree network: the number of communication links increases close to the root, so the root level has more physical connections.
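The "fatter toward the root" idea can be illustrated with a small sketch (an idealized binary fat tree, assumed here for illustration): each level up, the edge count halves while each edge's relative capacity doubles, so aggregate bandwidth per level stays constant.

```python
# Idealized binary fat tree over 8 leaf processors: an edge one level
# above the leaves carries traffic for 2 processors, the next for 4,
# and so on. Edge count halves per level; relative capacity doubles.
leaves = 8
level = 0
edges = leaves
while edges >= 1:
    capacity = leaves // edges   # relative capacity per edge at this level
    print(f"level {level}: {edges} edges x capacity {capacity} = {edges * capacity}")
    edges //= 2
    level += 1
```

Every level prints the same aggregate (8), which is exactly what a plain tree lacks: in a plain tree the single root link becomes the bottleneck.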
Torus: a.k.a. the wrapped-around mesh topology. A three-dimensional mesh whose wraparound links connect nodes on opposite edges.
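The wraparound means neighbor coordinates are taken modulo the mesh size in each dimension, so even a corner node has the full set of neighbors. A minimal sketch (the 4x4x4 dimensions are made up for illustration):

```python
# Neighbors of a node on a 3-D torus: each coordinate is shifted by +/-1
# modulo the mesh size in that dimension, so edge nodes wrap around to
# the opposite face instead of having fewer links.
def torus_neighbors(node, dims):
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(node)
            n[axis] = (n[axis] + step) % size   # wraparound link
            neighbors.append(tuple(n))
    return neighbors

# A corner node on a 4x4x4 torus still has six neighbors:
print(torus_neighbors((0, 0, 0), (4, 4, 4)))
# [(3, 0, 0), (1, 0, 0), (0, 3, 0), (0, 1, 0), (0, 0, 3), (0, 0, 1)]
```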
A Clos network is a kind of multistage switching network:
- Three stages, each consisting of a number of crossbars
- The middle stage has redundant switching boxes to reduce the blocking probability
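The role of the redundant middle stage is captured by Clos's classical result (not stated on the slide): with n inputs per ingress switch, m middle-stage switches make the network strictly non-blocking when m >= 2n - 1; fewer middle switches only lower, rather than eliminate, the blocking probability.

```python
# Clos's 1953 condition: a 3-stage network whose ingress switches each
# have n inputs is strictly non-blocking when the number of middle-stage
# switches m satisfies m >= 2n - 1.
def strictly_nonblocking(n_inputs_per_ingress, n_middle_switches):
    return n_middle_switches >= 2 * n_inputs_per_ingress - 1

print(strictly_nonblocking(4, 7))   # True:  7 >= 2*4 - 1
print(strictly_nonblocking(4, 5))   # False: blocking is possible
```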
Myrinet:
- By the Myricom company; first Myrinet in 1994
- An alternative to Ethernet for connecting the nodes in a cluster
- Operates entirely in user space: no operating-system delays
- Myrinet switch: 10 Gbps, $12,800; Clos networks up to 128 host ports
- 10G PCI Express NIC with fiber connectors
QsNet:
- By Quadrics (formed in 1996); uses a fat-tree topology
- QsNetII scales up to 4,096 nodes; each node might have multiple CPUs
- Designed for use within SMP systems
- MPI latency on a standard AMD Opteron starts at 1.22 usec; bandwidth on Intel Xeon EM64T is 912 Mbytes/s
- QsNetII E-Series 128-way switch
IBM Blue Gene/L:
- Each chip contains two nodes
- Each node is a PPC440 processor
- Each node has 512 MB of local memory
- Each node runs a lightweight OS with MPI
- Each node runs one user process; no context switching at a node
Blue Gene/L uses five networks:
- GigE for I/O nodes and connections to external systems
- A control network using Fast Ethernet
- A 3-D torus for node-to-node message passing; it handles the majority of application traffic (MPI messaging); longest path: 64 hops
- A collective network for broadcasting
- A barrier network
The MPI software is highly customized to exploit the collective and barrier networks.
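The 64-hop figure follows from the torus geometry: wraparound links cap the distance in each dimension at half the dimension size. Assuming the full 64-rack system's 64 x 32 x 32 torus (an assumption here, not stated on the slide):

```python
# On a torus, the farthest node is size // 2 hops away in each
# dimension, thanks to the wraparound links. For a 64 x 32 x 32 torus
# (assumed dimensions of the full Blue Gene/L system):
def max_hops(dims):
    return sum(size // 2 for size in dims)

print(max_hops((64, 32, 32)))   # 64 hops: 32 + 16 + 16
```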
System attributes            2010        2015                 2018
System peak                  2 Pflop/s   200 Pflop/s          1 Eflop/s
Power                        6 MW        15 MW                20 MW
System memory                0.3 PB      5 PB                 32-64 PB
Node performance             125 GF      0.5 TF / 7 TF        1 TF / 10 TF
Node memory BW               25 GB/s     0.1 / 1 TB/s         0.4 / 4 TB/s
Node concurrency             12          O(100) / O(1,000)    O(1,000) / O(10,000)
System size (nodes)          18,700      50,000 / 5,000       1,000,000 / 100,000
Total node interconnect BW   1.5 GB/s    20 GB/s              200 GB/s
MTTI                         days        O(1 day)             O(1 day)
(Where two values appear, they correspond to two alternative node design points: many thin nodes vs. fewer fat nodes.)
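The peak and power rows together imply an energy-efficiency target, which is the real driver behind the 20 MW cap. A quick calculation straight from the table:

```python
# Energy efficiency implied by the table: peak performance divided by
# power, in Gflop/s per watt. (peak_gflops, power_watts) per column.
systems = {
    "2010": (2e6, 6e6),      # 2 Pflop/s, 6 MW
    "2015": (200e6, 15e6),   # 200 Pflop/s, 15 MW
    "2018": (1e9, 20e6),     # 1 Eflop/s, 20 MW
}
for year, (gflops, watts) in systems.items():
    print(f"{year}: {gflops / watts:.2f} Gflop/s per watt")
```

Hitting exascale within 20 MW requires roughly 50 Gflop/s per watt, about 150x the 2010 figure, which is why power heads the list of challenges above.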