Page 1:

2015 LENOVO INTERNAL. ALL RIGHTS RESERVED. Torsten Bloth – [email protected]

Page 2: That's me

Torsten Bloth

• HPC Systems Architect
• Leading architect for the LRZ SuperMUC Phase 2 supercomputer
• Started with IBM in 2005 and was part of the IBM System x transition into Lenovo in 2014
• Lives in beautiful Potsdam, and it took him a while to travel to Lugano ;)

Page 3: There is No Agenda

• Lenovo and HPC
• LRZ SuperMUC Phase 1 Review
• LRZ SuperMUC Phase 2
  – Overview and current status
  – Technology used
  – Advantages of Water Cooling Technology

Page 4: Who we are – Lenovo

Page 5: Who is Lenovo?

• A $39 billion, Fortune 500 technology company
  – Publicly listed/traded on the Hong Kong Stock Exchange
  – 60,000+ employees serving clients in 160+ countries
• A global company
  – Two headquarters: Raleigh, N.C. and Beijing, China
  – Major research centers in the U.S., Japan, and China
  – Manufacturing in the U.S., China, India, Brazil, and Mexico
• Invests in innovation
  – Ranked as one of the top 25 most innovative companies
  – #1 in worldwide PC market share
  – #2 in worldwide PC & tablet market share

Page 6: Lenovo Enterprise System Portfolio

System x portfolio (diagram):
• 1P & 2P rack & tower systems – broad rack and tower portfolio to meet a wide range of client needs, from infrastructure to technical computing
• High-end rack systems – 4/8-socket enterprise-class x86 performance, resiliency, and security
• Converged/blade systems – integration across IBM assets in systems and software for maximum client optimization and value
• Dense systems – optimize space-constrained data centers with extreme performance and energy efficiency
• Solutions – Analytics, Cloud, Technical Computing

Page 7: HPC Storage Portfolio

Home-grown + legacy Lenovo + IBM OEM offerings (diagram):
• Direct attach and network attach options
• IBM Storwize V3700 / V7000 Unified
• GSS 21/22 and GSS 24/26

Page 8: HPC Center Opening

• A permanent HPC benchmarking and R&D center in Stuttgart, Germany
• Technology partners will gain access to a state-of-the-art benchmarking facility
• The center will provide HPC support to Lenovo partners worldwide
• Industry partners – Intel, IBM, Mellanox, NVIDIA
• Client partners – Leibniz-Rechenzentrum (LRZ), Science & Technology Facilities Council: Hartree Centre, Barcelona Supercomputing Centre (BSC), Cineca, Rechenzentrum Garching (RZG), Forschungszentrum Jülich (FZJ), Distributed Research utilizing Advanced Computing (DiRAC)
• ISV partners – ScaleMP, Allinea, PathScale

Page 9: Review – SuperMUC Phase 1

Page 10: LRZ SuperMUC – Introduction

• Two previous HPC AC presentations by Klaus Gottschalk:
  – 2011: SuperMUC HPC System
  – 2013: LRZ SuperMUC – One Year of Operation

Page 11: SuperMUC Phase 1 Review

Page 12: SuperMUC Phase 1 Review

3 PetaFLOP/s peak performance
• 9,288 water-cooled nodes, each with 2 Intel SB-EP (Sandy Bridge-EP) CPUs
• 207 nodes with 4 Intel WSM-EX (Westmere-EX) CPUs
• 324 TB memory
• Mellanox InfiniBand FDR10 interconnect

Large file space for multiple purposes
• 10 PiB GPFS at 200 GiB/s on DDN SFA12K (see the sketch below)
• 2 PiB NAS storage at 10 GiB/s

Innovative technology for energy-efficient computing
• Warm water cooling
• Energy-aware scheduling with xCAT and LoadLeveler (LL)
• Large-scale monitoring

PUE ~1.15 thanks to all-season free-cooling capability

No GPGPUs or other accelerator technology
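The file-system numbers above are easy to sanity-check. A back-of-the-envelope Python sketch (my own arithmetic, using only the capacity and bandwidth figures quoted above) of how long one full streaming pass over the Phase 1 GPFS file system would take at peak bandwidth:

    # Time to stream the 10 PiB Phase 1 GPFS file system once at its
    # 200 GiB/s peak bandwidth (both figures from the bullets above).
    capacity_pib = 10
    bandwidth_gib_s = 200

    capacity_gib = capacity_pib * 1024 ** 2     # 1 PiB = 1024^2 GiB
    seconds = capacity_gib / bandwidth_gib_s
    print(f"~{seconds / 3600:.1f} hours")       # ~14.6 hours

So even at full speed, a single end-to-end pass over the file system takes well over half a day.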

Page 13: What do 10k IB cables look like?

Page 14:

• 9,288 IBM System x iDataPlex dx360 M4 nodes
  – 43,997,256 components
  – 18,576 Intel Sandy Bridge-EP CPUs
  – 148,608 CPU cores
  – 8.08 m² of CMOS
  – 74,304 4 GB DIMMs
• 11,868 optical cables (192,640 m)
• 23,736 QSFP connectors
• 5,040 3 TB disks, RAID 6
• Tubes and hoses
  – 690 m stainless steel
  – 4,039 m EPDM
  – 34,153 m copper
  – 7.9 m³ of water
• Mass: 194,100 kg
• Hardware plus five years of maintenance, energy, and support: €83,000,000

Page 15: Solution Overview – SuperMUC Phase 2

Page 16: Heck! Moore's Law?

Adding ~3 PFLOP/s, direct water cooled
• Island design almost identical to SuperMUC Phase 1
• 6 compute islands + 1 I/O island
• 3,096 nx360 M5 WCT compute nodes
• InfiniBand FDR14
• I/O island with GPFS back end and management servers

$WORK parallel file system on GSS technology
• Adding 6 PB net capacity
• 100 GB/s

$HOME file system
• Adding 2 PB

Page 17: From Chip to System

• Chip: ~493 GF/s
• Server: 2 chips, ~986 GF/s, 64 GB RAM
• Chassis: 12 servers, ~12 TF/s, 768 GB RAM
• Domain: 8 racks, 516 servers, ~508 TF/s, 33.024 TB RAM
• System: 6 domains, 3,096 servers
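The chip-to-system figures are straightforward multiplication. A minimal Python sketch of the scaling; the ~2.2 GHz AVX base clock and 16 double-precision FLOPs per cycle are my assumptions, chosen only because they reproduce the ~493 GF/s per-chip number on the slide:

    # Reproduce the "From Chip to System" peak numbers.
    # Assumption (not on the slide): ~493 GF/s per chip comes from
    # 14 cores x ~2.2 GHz AVX base clock x 16 DP FLOPs per cycle.
    cores, avx_clock_ghz, flops_per_cycle = 14, 2.2, 16

    chip_gf = cores * avx_clock_ghz * flops_per_cycle   # ~493 GF/s
    server_gf = 2 * chip_gf                             # 2 chips      -> ~986 GF/s
    chassis_gf = 12 * server_gf                         # 12 servers   -> ~12 TF/s
    domain_gf = 516 * server_gf                         # 516 servers  -> ~508 TF/s
    system_gf = 3096 * server_gf                        # 3096 servers -> ~3 PF/s
    domain_ram_tb = 516 * 64 / 1000                     # 64 GB/server -> 33.024 TB

    print(f"chip:    {chip_gf:9.1f} GF/s")
    print(f"server:  {server_gf:9.1f} GF/s")
    print(f"chassis: {chassis_gf / 1e3:9.2f} TF/s")
    print(f"domain:  {domain_gf / 1e3:9.1f} TF/s, {domain_ram_tb:.3f} TB RAM")
    print(f"system:  {system_gf / 1e6:9.2f} PF/s")

Running it gives ~508.6 TF/s and 33.024 TB of RAM per domain, and ~3.05 PF/s for the full 3,096-node system, matching the figures above.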

Page 18: Compute Server

3,096 compute nodes
• Lenovo NeXtScale nx360 M5 WCT
• 2x Intel Xeon E5-2697 v3, 2.6 GHz, 14 cores
• 64 GB memory
• Mellanox Connect-IB, single port
• Diskless
• Direct water cooled

Page 19: InfiniBand Concept

[Diagram: core switches #1–#42 at the top; below them, 6 compute islands of 516 nodes each plus the I/O components, with a 4:1 blocking ratio toward the compute islands and 1:1 toward the I/O components.]

Page 20: InfiniBand Concept – Phase 1 + Phase 2

GPFS is available everywhere via multi-homed GPFS

[Diagram: two fabrics (Fabric #1 and Fabric #2). SuperMUC Phase 1: 20 islands (1 + 18 + 1), each behind a thin-compute or I/O edge switch built from 18-port leaf modules serving nodes n01–n516. SuperMUC Phase 2: 6 + 1 islands, each behind an edge switch with up to 36 x 18-port leaf modules. Both sides connect through 36-port core switches – FDR10 on the Phase 1 side, FDR on the Phase 2 side.]
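The 4:1 ratio on the earlier concept slide follows from the port counts in this diagram. A minimal sketch, assuming each compute-island edge switch is a director built from the 36 x 18-port leaf modules drawn above (648 ports in total) – an assumption of mine, not a statement from the deck:

    # Blocking ratio implied by the island wiring: 516 node ports per edge
    # switch, with the remaining ports used as uplinks to the core level.
    # Assumption: a 648-port edge switch (36 leaf modules x 18 ports each).
    leaf_modules, ports_per_leaf = 36, 18
    edge_ports = leaf_modules * ports_per_leaf          # 648

    compute_nodes = 516                                 # nodes per island
    uplinks = edge_ports - compute_nodes                # 132 ports to the core

    print(f"{compute_nodes}:{uplinks}  ->  ~{compute_nodes / uplinks:.1f}:1 blocking")
    # ~3.9:1, consistent with the 4:1 shown for compute islands; the I/O
    # edge switches are drawn with fewer leaf modules, consistent with 1:1.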

Page 21: The Technology – SuperMUC Phase 2

Page 22: Technology Overview – NeXtScale

Building blocks: compute, chassis, storage, acceleration, water-cooled node, standard rack.

Page 23: The Chassis

IBM NeXtScale n1200 WCT Enclosure
[Front view: shown with 12 compute nodes installed (6 trays). Rear view: fan and power controller, 2x 3 power supplies, rear fillers/EMC shields. Sidebar: system infrastructure, simple architecture.]

Water-cooled chassis
• 6U chassis, 6 bays
• Each bay houses a full-wide, 2-node tray (12 nodes per 6U chassis)
• Up to 6x 900 W or 1300 W power supplies in N+N or N+1 configurations
• No fans except in the PSUs
• Fan and power controller
• Drip sensor, error LEDs, and web link for detecting water leaks
• No built-in networking

Page 24: The Node


Water-cooled compute node
• 2 compute nodes per full-wide 1U tray
• Water circulated through cooling tubes for component-level cooling
• Dual-socket Intel Xeon E5-2600 v3 processors (up to 165 W)
• 16x DIMM slots (DDR4, 2133 MHz)
• InfiniBand support:
  – FDR: ConnectX-3 (ML2)
  – FDR: Connect-IB (PCIe)
  – QDR (PCIe)
• Onboard GbE NICs

nx360 M5 WCT compute tray (2 nodes)
[Callouts: power button and LEDs; dual-port ML2 (IB/Ethernet); labeling tag; 1 GbE ports; PCI slots for InfiniBand / Connect-IB; x16 ML2 slot; CPUs with liquid-cooled heat sinks; 16x DIMM slots; cooling tubes; water inlet and outlet.]

Page 25: How it works

Page 26: Why Water Cool Technology – SuperMUC Phase 2

Page 27: Temperature Impact on Processor Performance

• Example: Linpack scores across a range of temperatures
• 12 sample processors running on NeXtScale System WCT
• Linpack scores remain mostly flat for junction temperatures in the range where water cooling operates.
• Linpack scores drop significantly when the junction temperature is in the range where air cooling operates.
• Conclusion: water cooling enables the highest possible performance for each processor SKU at any water inlet temperature under 45 °C.

[Chart: Linpack score (Gflops, ~915–1015) vs. junction temperature Tj (20–100 °C) for nodes n101–n112, with markers for 18 °C inlet, 45 °C inlet, air-cooled operation, and Tj,max. Title: "E5-2697 v3 145W Junction Temp vs Performance on NeXtScale WCT". Source: Vinod Kamath]

Page 28: Chip Junction vs Water Inlet Temperature – nx360 M5 WCT

• Linpack scores remain high and relatively stable for chip junction temperatures below 60 °C
  – Achieved using <45 °C inlet water
• Performance drops off significantly at higher junction temperatures
  – Typical for air cooling

T_water_inlet (°C)   n101 Tj (°C)   n101 Linpack (Gflops)   n102 Tj (°C)   n102 Linpack (Gflops)
 8                   28             934.5                   27             948.4
18                   35             935.2                   33             948.3
24                   41             934.7                   42             948.1
35                   54             931.9                   52             946.3
45                   60             926.7                   60             944.7
55                   68             921.3                   73             938.5
55 (1)               75             918.9                   79             936.1

Note: typical flow is 0.5 lpm per node; (1) = 0.25 lpm per node. Source: Vinod Kamath
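To make "relatively stable" concrete, a minimal sketch that reads the n101 column of the table above and prints how far each inlet temperature falls below the best score (values copied from the table; the reduced-flow 55 °C row is left out):

    # Relative Linpack change vs. water inlet temperature, node n101,
    # Gflops values taken from the table above (standard 0.5 lpm flow).
    n101 = {8: 934.5, 18: 935.2, 24: 934.7, 35: 931.9, 45: 926.7, 55: 921.3}

    best = max(n101.values())
    for inlet_c, gflops in sorted(n101.items()):
        loss_pct = 100 * (best - gflops) / best
        print(f"{inlet_c:2d} °C inlet: {gflops:6.1f} Gflops ({loss_pct:.2f}% below best)")
    # Even with 55 °C inlet water the score is only ~1.5% below the best case.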

Page 29: High Power Processor Support in WCT

• NeXtScale WCT can cool the Xeon E5-2698A v3 processor thanks to its greater thermal capability
  – 165 W, 2.8 GHz, 16 cores
  – Highest Linpack performance per node of any Xeon processor, at 1.083 Tflops
  – Highest Gflops/Watt of any Xeon processor, at 2.45 Gflops/Watt
  – Cannot be cooled by air: the processor throttles at a 65 °C junction temperature (below the air-cooled range)
• NeXtScale WCT cools 165 W processors with inlet temperatures up to 35 °C
  – No chillers required

[Chart: Linpack score (Gflops, ~850–1150) vs. junction temperature Tj (20–100 °C) for nodes n125–n130, with markers for 18 °C inlet, 35 °C inlet, air-cooled operation, and Tj,max. Title: "E5-2698A v3 Junction Temp vs Performance on NeXtScale WCT".]

                    2x E5-2698A v3   2x E5-2697 v3   2x E5-2690 v3
HPL (GF)            1083             907             813
Power (W)           441              402             385
Perf/Watt (MF/W)    2457             2256            2111

Source: Vinod Kamath
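The Perf/Watt row is simply HPL divided by node power. A minimal sketch reproducing it from the two rows above (the small deviations from the slide's 2457/2256/2111 figures are rounding):

    # Gflops/Watt from the HPL and power rows of the comparison table above.
    nodes = {
        "2x E5-2698A v3": (1083, 441),   # (HPL in Gflops, node power in W)
        "2x E5-2697 v3":  (907, 402),
        "2x E5-2690 v3":  (813, 385),
    }
    for name, (hpl_gf, power_w) in nodes.items():
        print(f"{name}: {1000 * hpl_gf / power_w:.0f} MF/W")
    # Prints ~2456, 2256 and 2112 MF/W.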

Page 30: Lower Power Consumption with WCT

Socket power
• Is relatively flat for junction temperatures in the range where water cooling operates (18–45 °C)
• Increases significantly when the junction temperature is in the range where air cooling operates

Result: save about 5% power when cooling with water at up to 45 °C (scaled to the full system in the sketch below).

[Chart: socket power vs. junction temperature for an E5-2695 v2 (115 W) CPU, comparing air cooling with 18 °C and 45 °C water; the power saving is about 5% per node. Source: Vinod Kamath]
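As a rough feel for what "about 5% per node" means at system scale, a minimal sketch. The 3,096-node count comes from the Phase 2 overview and the ~402 W node power from the comparison table two slides back; applying them to this v2-based measurement is my own approximation, not a figure from the deck:

    # Rough system-wide effect of a ~5% per-node power saving.
    # Assumptions: 3,096 nodes and ~402 W per node under HPL (taken from
    # other slides in this deck); the 5% saving is from this slide.
    nodes = 3096
    node_power_w = 402
    saving_fraction = 0.05

    saved_kw = nodes * node_power_w * saving_fraction / 1000
    print(f"~{saved_kw:.0f} kW saved across the system")   # ~62 kW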

Page 31: Hot Water Cooling

How Direct Water Cooling Makes Your Data Center More Efficient and Lowers Costs

• Chillers are not required in most geographies
  – Thanks to inlet water temperatures of 18 °C to 45 °C
  – Reduces CAPEX for new data centers
• ~40% energy savings in the data center because no fans or chillers are needed
• Compute node power consumption reduced by ~10%, due to lower component temperatures (~5%) and no fans (~5%)
• Power Usage Effectiveness (PUE = P_total / P_IT): ~1.1 is possible with NeXtScale WCT (worked out in the sketch below)
  – 1.1 PUE achieved at the LRZ installation
  – 1.5 PUE is typical of a very efficient air-cooled data center
• 85–90% heat recovery is enabled by the compute node design
  – The absorbed heat energy can be reused to heat buildings in winter
  – Energy Reuse Effectiveness (ERE = (P_total – P_reuse) / P_IT): ~0.3

Source: Vinod Kamath
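The PUE and ERE bullets are ratios, so a minimal sketch of the two definitions. The power values below are made-up round numbers chosen only to hit the ~1.1 and ~0.3 targets quoted above; they are not LRZ measurements:

    # Power Usage Effectiveness and Energy Reuse Effectiveness as used above.
    def pue(p_total_kw: float, p_it_kw: float) -> float:
        """PUE = total facility power / IT power."""
        return p_total_kw / p_it_kw

    def ere(p_total_kw: float, p_reuse_kw: float, p_it_kw: float) -> float:
        """ERE = (total facility power - reused heat) / IT power."""
        return (p_total_kw - p_reuse_kw) / p_it_kw

    p_it, p_total, p_reuse = 1000.0, 1100.0, 800.0   # hypothetical kW values
    print(f"PUE = {pue(p_total, p_it):.2f}")          # 1.10
    print(f"ERE = {ere(p_total, p_reuse, p_it):.2f}") # 0.30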

Page 32: 2015 LENOVO INTERNAL. ALL RIGHTS RESERVED. Torsten Bloth ...

342015 LENOVO INTERNAL. ALL RIGHTS RESERVED. Torsten Bloth – [email protected]

Page 33: 2015 LENOVO INTERNAL. ALL RIGHTS RESERVED. Torsten Bloth ...

352015 LENOVO INTERNAL. ALL RIGHTS RESERVED. Torsten Bloth – [email protected]

Page 34: 2015 LENOVO INTERNAL. ALL RIGHTS RESERVED. Torsten Bloth ...

362015 LENOVO INTERNAL. ALL RIGHTS RESERVED. Torsten Bloth – [email protected]

Page 35:
