17.09.2014, Leibniz-Rechenzentrum
Agenda
• Introduction
• Power Infrastructure
• Cooling Infrastructure
• IT systems
• Issues @lrz.de
• Discussion
Leibniz Supercomputing Centre
Munich, Bavaria, Germany & Europe
• We provide generic IT services to all Munich universities
• We provide special IT services to all universities in Bavaria
  • Network, High Performance and Grid Computing
  • Backup and Archive Services
  • IT Management
• We provide supercomputing resources to scientists in Europe
  • Member of the Gauss Centre for Supercomputing (GCS)
  • Part of the European HPC Infrastructure PRACE
  • Operating a Tier-0 Supercomputing Centre (SuperMUC system)
• Investigations into Future HPC Systems:
• Hardware Architectures
• Programming Models & System Software
• Zero Emission Data Center
• Re-Use of Waste Heat
SuperMUC: IBM System x iDataPlex
With Direct Water Cooling
Torsten Bloth, IBM Lab Services, © IBM Corporation
iDataPlex DWC rack with water-cooled nodes
(rear view of water manifolds)
Data Center Infrastructure
Layout Power Infrastructure
Controls Power Infrastructure
• Equipment:
  - Transformer, switching, ...: Siemens
  - Dynamic UPS: Piller
  - Battery backup: Emerson
  - Diesel generator: MTU
• Metering:
  - SOCOMEC & WinCC (power)
  - Piller & WinCC (power, messaging)
  - JCI Metasys M5 (power)
  - deZem (power)
  - SWM, the utility provider (power)
• Monitoring:
  - Siemens WinCC
Layout Cooling Infrastructure
Controls Cooling Infrastructure
• Equipment:
  - Cooling towers: Gohl, Jäggi
  - Chillers: McQuay, Carrier
  - CRAC/CRAH: GEA, WEISS, STULZ, RC Group
  - Pumps: Grundfos/ABB & KSB
• Metering:
  - Krohne (flow)
  - Calec (heat)
  - WIKA and others (pressure, temperature)
• Monitoring & Operations:
  - JCI Metasys
Monitoring of IT Systems
• SuperMUC
  - Vendor solution: IBM tool set based on Icinga
  - Power and energy readings at server level (PDU & paddle cards & RAPL counters) and at system level
  - Temperature at server level and room level
  - Pressure/heat at system level
• CoolMUC
  - Vendor solution: power/heat/temperature/flow control
• Clusters & servers, NAS systems
  - Nagios-based in-house tools
• Tape libraries
• Networking
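The per-server RAPL energy counters mentioned above are exposed on Linux through the powercap sysfs interface. A minimal sketch of turning two cumulative counter samples into an average power figure; the sysfs path and package-domain index are assumptions to be verified on the target system:

```python
import time
from pathlib import Path

# Path assumed for an Intel system with the intel-rapl powercap
# driver loaded; package domain 0 (illustrative, check locally).
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")

def power_from_samples(e0_uj: int, e1_uj: int,
                       interval_s: float, max_uj: int) -> float:
    """Average power in watts from two cumulative energy samples
    (microjoules), handling counter wrap-around at max_uj."""
    delta_uj = (e1_uj - e0_uj) % (max_uj + 1)
    return delta_uj / interval_s / 1e6

def sample_package_power(interval_s: float = 1.0) -> float:
    """Read the package energy counter twice and derive power."""
    max_uj = int((RAPL_DIR / "max_energy_range_uj").read_text())
    e0 = int((RAPL_DIR / "energy_uj").read_text())
    time.sleep(interval_s)
    e1 = int((RAPL_DIR / "energy_uj").read_text())
    return power_from_samples(e0, e1, interval_s, max_uj)
```

The counter is cumulative and wraps at `max_energy_range_uj`, which is why the modulo step is needed for long-running pollers.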
Issues @lrz.de
• Power infrastructure
- Monitoring, reporting (dashboard)
- Quality of reported measurements
• Cooling infrastructure
- Ops of cooling loops (hydraulics, meta controls)
- Ops of cooling towers
• Information management
- Integration & consolidation of heterogeneous data sources
- Interoperability of differing system controls
• General
- Interaction with vendors/contractors of BMS
- DCIM strategy: in-house, vendor-based, or open source
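The integration & consolidation issue above is often tackled by mapping each source's native payload into one canonical record before storage. A hypothetical sketch; the payload shapes, field names, and units are illustrative, not LRZ's actual schemas:

```python
from dataclasses import dataclass

@dataclass
class Reading:
    source: str       # e.g. "dezem", "metasys", "wincc"
    sensor: str       # canonical sensor name
    quantity: str     # "power", "temperature", "flow", ...
    value: float      # value normalized to a base unit (here: watts)
    timestamp: float  # Unix epoch seconds

# Per-source adapters translate native field names and units
# into the canonical Reading record.
def from_dezem(raw: dict) -> Reading:
    # deZem-style payload (illustrative): {"id": ..., "kw": ..., "ts": ...}
    return Reading(source="dezem", sensor=raw["id"],
                   quantity="power", value=raw["kw"] * 1000.0,
                   timestamp=raw["ts"])

def from_metasys(raw: dict) -> Reading:
    # Metasys-style payload (illustrative): {"point": ..., "watts": ..., "time": ...}
    return Reading(source="metasys", sensor=raw["point"],
                   quantity="power", value=float(raw["watts"]),
                   timestamp=raw["time"])
```

With one adapter per metering source, dashboards and reports only ever see the canonical record, which keeps the heterogeneous BMS backends interchangeable.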
Topics of Interest
• Requirements of liquid cooled systems for BMS
• Requirements of large HPC systems for systems control
• Roadmaps for BMS and DCIM
• Vendors view on status and trends in system controls
• Standardization
• APIs
• Lessons learned and a white paper on
"Best practices in systems controls for HPC data centers"
Zero Emission Supercomputing Centre
Thank You!
Overview Cooling Infrastructure
[Diagram: cross-section of the building, roof to basement, showing the cooling infrastructure. Key elements: water treatment (3x reverse osmosis); cooling towers on the roof (4x Gohl) and on the grounds (2x Gohl + 5x Jäggi), plus an evaporation tower (1x Gohl) and wells; chillers (2x McQuay + 5x Carrier); chilled-water distributors/collectors; air-handling units (2x GEA per floor); the SuperMUC compute section (≈150 racks) with direct warm-water module cooling at 30–60 °C and interconnect rear-door heat exchangers ("RDHX"); SuperMUC storage, disks and tape libraries; network & core servers (≈95 racks; 25x KKT Kraus racks); precision coolers using "free cooling" in winter; transformers (6x+6x), static UPS (3x+3x), dynamic UPS (3x+6x) and emergency diesel generators in the basement.]