Zellescher Weg 12
Trefftz-Bau/HRSK 151
Phone +49 351 - 463 - 39871
Guido Juckeland ([email protected])
Center for Information Services and High Performance Computing (ZIH)
Introduction to High Performance Computing at ZIH
Architecture of the PC Farm (Deimos)
Slide 2 - Guido Juckeland
Agenda
PC Farm Components
AMD Opteron Processors and Systems
Infiniband Networks
PC Farm Components (Deimos)
Linux Networx PC-Farm (Deimos)
1292 AMD Opteron x85 dual-core CPUs (2.6 GHz)
726 compute nodes with 2, 4 or 8 CPU cores
2 GiByte main memory per core
2 Infiniband interconnects (MPI and I/O fabric)
68 TByte SAN storage
70, 150 or 290 GByte scratch disk per node
OS: SuSE SLES 10
Batch system: LSF
Compilers: Pathscale, PGI, Intel, GNU
3rd party applications: Ansys100, CFX, Fluent, Gaussian, LS-DYNA, Matlab, MSC,…
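The headline figures above imply the aggregate core count and main memory of the machine. A minimal sketch of that arithmetic in Python, using only the numbers from this slide (the variable names are illustrative and not part of any Deimos tooling; nodes with more than 2 GiByte per core are ignored):

```python
# Aggregate figures derived from the numbers on the slide.
CPUS = 1292            # AMD Opteron x85 dual-core CPUs
CORES_PER_CPU = 2      # dual-core
MEM_PER_CORE_GIB = 2   # GiByte main memory per core

total_cores = CPUS * CORES_PER_CPU
total_mem_gib = total_cores * MEM_PER_CORE_GIB

print(f"{total_cores} cores, {total_mem_gib} GiByte "
      f"(about {total_mem_gib / 1024:.2f} TiByte) main memory")
```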
Deimos - Partitions
2 Master Nodes
– Not accessible to users; used for PC farm management
4 Login Nodes
– Nodes with 4 cores each
– Reachable via DNS round robin at deimos.hrsk.tu-dresden.de
Single, Dual and Quad Nodes
– 1, 2 or 4 CPUs
– 4, 8 or 16 GiByte main memory (24 Quads with 32 GiByte)
– 80, 160 or 300 GByte local disks
Nodes installed in two phases (phase 1 and phase 2)
– Identical hardware
– Differences in the connection to the MPI- and the I/O-Fabric (later)
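The three compute node types can be written down as a small table and checked against the "2 GiByte per core" rule. The sketch below assumes that the values on the slide pair up in the order listed (1 CPU with 4 GiByte and 80 GByte, and so on) and leaves out the 24 quad nodes with 32 GiByte; the `NodeType` class is hypothetical and only serves the illustration.

```python
from dataclasses import dataclass

CORES_PER_CPU = 2  # all compute nodes use dual-core Opterons


@dataclass(frozen=True)
class NodeType:
    name: str
    cpus: int
    mem_gib: int
    disk_gbyte: int

    @property
    def cores(self) -> int:
        return self.cpus * CORES_PER_CPU

    @property
    def mem_per_core_gib(self) -> float:
        return self.mem_gib / self.cores


# Assumed pairing of the values listed on the slide.
NODE_TYPES = [
    NodeType("single", cpus=1, mem_gib=4, disk_gbyte=80),
    NodeType("dual", cpus=2, mem_gib=8, disk_gbyte=160),
    NodeType("quad", cpus=4, mem_gib=16, disk_gbyte=300),
]

for nt in NODE_TYPES:
    print(f"{nt.name:>6}: {nt.cores} cores, "
          f"{nt.mem_per_core_gib:.0f} GiByte per core, "
          f"{nt.disk_gbyte} GByte local disk")
```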
AMD Opteron Processors and Systems
AMD Opteron CPU - Design
AMD Opteron x85 (2.6 GHz)
On-chip memory controller (2 memory channels with 3.2 GiByte/s transfer bandwidth each)
Per core: 64 KiByte Level 1 instruction cache and 64 KiByte Level 1 data cache
1 MiByte Level 2 cache
64-bit extension of the IA-32 (x86) architecture (x86-64, x64 or EM64T)
Now also available as quad-core CPUs
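The memory controller figures translate directly into a peak memory bandwidth per CPU and per core. A rough sketch of that calculation, assuming both cores share the two channels evenly and that the quoted per-channel rate is the theoretical peak (sustained bandwidth in real applications will be lower):

```python
# Peak memory bandwidth of one dual-core Opteron x85, from the slide's figures.
CHANNELS = 2
BW_PER_CHANNEL_GIB_S = 3.2   # GiByte/s per memory channel
CORES = 2                    # dual-core CPU
CLOCK_HZ = 2.6e9             # 2.6 GHz

peak_bw_gib_s = CHANNELS * BW_PER_CHANNEL_GIB_S
bw_per_core_gib_s = peak_bw_gib_s / CORES

# Bytes the memory system can deliver per CPU clock cycle at peak.
bytes_per_cycle = peak_bw_gib_s * 2**30 / CLOCK_HZ

print(f"peak per CPU:   {peak_bw_gib_s:.1f} GiByte/s")
print(f"peak per core:  {bw_per_core_gib_s:.1f} GiByte/s")
print(f"bytes per cycle: {bytes_per_cycle:.2f}")
```

The low bytes-per-cycle figure is one reason why cache reuse matters for codes running on these nodes.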
AMD Opteron – Block diagram
[Block diagram. Labelled components: instruction TLB, Level 1 instruction cache, Fetch 2 - transit, Pick, three Decode 1/Decode 2 paths, three Pack and Decode stages, three 8-entry schedulers, three ALU/AGU pairs, one 36-entry scheduler, FADD, FMUL, FMISC, data TLB, Level 1 data cache (ECC), branch prediction (2k branch targets, 16k history counter, RAS and target address), Level 2 cache with L2 ECC, L2 tags and L2 tag ECC, System Request Queue (SRQ), crossbar (XBAR), memory controller and HyperTransport]
Deimos – Layout of a single-CPU node
[Diagram: one AMD Opteron 185 with 4 GiByte memory, connected via HyperTransport to the peripheral devices (Infiniband, Ethernet, disk)]
Deimos – Layout of a dual-CPU node
[Diagram: two AMD Opteron 285 CPUs, each with its own 4 GiByte memory, linked to each other via HyperTransport; a further HyperTransport link leads to the peripheral devices (Infiniband, Ethernet, hard disk)]
Deimos - Layout of a quad-CPU Node
[Diagram: four AMD Opteron 885 CPUs, each with its own 4 GiByte memory, interconnected via HyperTransport links; a further HyperTransport link leads to the peripheral devices (Infiniband, Ethernet, hard disk)]
Infiniband Networks
Basic Layout
More complicated structures
Infiniband-Stack
Consequences for the user
No standard Linux networks (eth0,...)
No IP addresses
No direct traffic monitoring possible
Very low MPI latency (about 5-15 μs)
High MPI bandwidth (up to 900 MiByte/s)
The batch system does not know about the state of the Infiniband fabric
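To get a feeling for what the latency and bandwidth figures above mean for different message sizes, a simple linear cost model (transfer time = latency + size / bandwidth) can be used. This model is an assumption for illustration and not a measurement on Deimos; the defaults take the optimistic end of the quoted ranges (5 μs, 900 MiByte/s), and the function names are hypothetical.

```python
MIB = 2**20  # bytes per MiByte


def transfer_time_s(nbytes: float, latency_s: float = 5e-6,
                    bandwidth_mib_s: float = 900.0) -> float:
    """Estimated time for one message under the linear cost model."""
    return latency_s + nbytes / (bandwidth_mib_s * MIB)


def effective_bandwidth_mib_s(nbytes: float, latency_s: float = 5e-6,
                              bandwidth_mib_s: float = 900.0) -> float:
    """Payload size divided by the estimated transfer time."""
    return nbytes / transfer_time_s(nbytes, latency_s, bandwidth_mib_s) / MIB


def half_bandwidth_size(latency_s: float = 5e-6,
                        bandwidth_mib_s: float = 900.0) -> float:
    """Message size (bytes) at which half the peak bandwidth is reached."""
    return latency_s * bandwidth_mib_s * MIB


for size in (8, 1024, 64 * 1024, MIB, 16 * MIB):
    print(f"{size:>10} bytes: "
          f"{effective_bandwidth_mib_s(size):8.1f} MiByte/s effective")
print(f"half of peak bandwidth at about {half_bandwidth_size():.0f} bytes")
```

Under this model, small messages are dominated by latency, so aggregating many small MPI messages into fewer large ones pays off.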
Deimos Infiniband-Layout (rough sketch)
[Sketch: every node is attached to two separate Infiniband networks, the MPI network and the I/O network]
Deimos MPI-Fabric
+-------------------+       +--------------------+       +-------------------+
|     Switch 1      |       |      Switch 2      |       |     Switch 3      |
|                   |  30x  |                    |  30x  |                   |
|      Rack 05      |-------|      Rack 20       |-------|      Rack 25      |
|                   |       |                    |       |                   |
| all Phase1 Nodes  |       | Phase2 Duals+Quads |       |  Phase 2 Singles  |
+-------------------+       +--------------------+       +-------------------+
Three 288-port Voltaire ISR 9288 IB switches with 4x Infiniband ports
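The chained layout determines how many switch ports remain for nodes. The sketch below assumes that "30x" means 30 cables between neighbouring switches and that each cable occupies one port on either side; both points are interpretations of the diagram, not stated facts.

```python
PORTS_PER_SWITCH = 288
LINKS_PER_TRUNK = 30   # assumed meaning of "30x" in the diagram

# Number of inter-switch trunks attached to each switch in the chain 1 - 2 - 3.
TRUNKS = {"Switch 1": 1, "Switch 2": 2, "Switch 3": 1}

node_ports = {
    name: PORTS_PER_SWITCH - n_trunks * LINKS_PER_TRUNK
    for name, n_trunks in TRUNKS.items()
}
total_node_ports = sum(node_ports.values())

for name, ports in node_ports.items():
    print(f"{name}: {ports} ports left for nodes")
print(f"total: {total_node_ports} node ports")
```

Under these assumptions, traffic between Switch 1 and Switch 3 has to cross Switch 2, and all traffic between two switches shares the 30 links of one trunk, so jobs spanning several switches may see less bandwidth than jobs confined to one.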
Deimos I/O Fabric
Tree structure with
– One 192-port Voltaire ISR 9288 IB switch with 4x Infiniband ports (Rack 07)
– 36 passive 24-port Mellanox IB switches (4x)
[Diagram: the Voltaire core switch at the root of the tree, with the 24-port Mellanox switches below it, grouped into Phase 1 and Phase 2]