DDN Users' Group Meeting, Supercomputing 2016
Salt Lake City, 15 November 2016
From Singapore to Warsaw: InfiniCortex and the Renaissance in Polish Supercomputing
Marek Michalewicz
Interdisciplinary Centre for Mathematical and Computational Modelling (ICM), University of Warsaw, Poland; Institute for Advanced Computational Science, Stony Brook University, USA; A*STAR Computational Resource Centre, Singapore
A*CRC Datacenter 1: Level 17 at Fusionopolis, Singapore
A*CRC Datacenter 2: Matrix Building at Biopolis
Mellanox Metro-X testing since early 2013. Goal: to connect HPC resources at Fusionopolis with storage and the genomics pipeline at Biopolis.
A*CRC Metro-X testing team: Stephen Wong, Tay Teck Wee, Steven Chew
InfiniCortex is …
NOT GRID! NOT CLOUD! NOT "Internet"!
InfiniCortex demo at SC16: booth 501
InfiniCortex is like a living global brain
The InfiniCortex uses the metaphor of the human brain's outer layer, the cortex, a highly connected, dense network of neurons that enables thinking …
… to deliver concurrent supercomputing across the globe, utilising trans-continental InfiniBand and a Galaxy of Supercomputers.
InfiniCortex Components
1. Galaxy of Supercomputers
• Supercomputer interconnect topology work by Y. Deng, M. Michalewicz and L. Orlowski
• Obsidian Strategics Crossbow InfiniBand router
2. ACA 100 & ACE 10
• Asia Connects America: 100 Gbps, by November 2014
• Asia Connects Europe: 10 Gbps, established February 2015
3. InfiniBand over trans-continental distances
• Using Obsidian Strategics Longbow range extenders
4. Application layer
• From the simplest file transfer (dsync+) to complex workflows (ADIOS, multi-scale models)
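Range extenders matter on links like these because a trans-continental pipe holds an enormous amount of in-flight data: the extender must buffer roughly one bandwidth-delay product to keep native InfiniBand flow control from stalling. A minimal sketch of that calculation; the ~200 ms round-trip time is an illustrative assumption, not a figure from the slides:

```python
# Bandwidth-delay product: how much data is "in flight" on a long link.
# A range extender must buffer roughly this much to keep the pipe full.
# The RTT value below is an assumption for illustration.

def bandwidth_delay_product(bandwidth_bps: float, rtt_s: float) -> float:
    """Return the bandwidth-delay product in bytes."""
    return bandwidth_bps * rtt_s / 8  # bits -> bytes

# Assumed 100 Gbps trans-Pacific link with ~200 ms round-trip time
bdp = bandwidth_delay_product(100e9, 0.200)
print(f"In-flight data: {bdp / 1e9:.1f} GB")  # In-flight data: 2.5 GB
```

At these scales an un-buffered link would spend most of its time waiting on credits, which is why ordinary campus InfiniBand gear cannot simply be stretched across an ocean.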
InfiniCortex Team
PIs: Dr Marek Michalewicz, A/Prof Tan Tin Wee
A*CRC: Tan Geok Lian (networking), Lim Seng (networking), Dr Jonathan Low (H/W, S/W, applications), Dr Gabriel Noaje (S/W, applications), Lukasz Orlowski (S/W, applications), Dr Dominic Chien (S/W, applications), Dr Liou Sing-Wu (S/W, applications), Yves Poppe (international connectivity)
With help from SingAREN: A/Prof Francis Lee, Prof Lawrence Wong; NTU: Stanley Goh
Also: Prof Yuefan Deng, Dr David Southwell
(most) Project Partners 2014-2016
Huawei, HPE, Fujitsu, Aspera, Bright Cluster, Altair, ByteScale, Arista, FermiLab, George Washington University
Team Europe: GEANT, TEIN; France: University of Reims; Poland: PSNC, ICM
InfiniCortex 2014 (phase 1)
[Map: sites TITECH (Tokyo), A*STAR (Singapore), NCI (Canberra), Seattle, SC14 (New Orleans), GA Tech (Atlanta); 10 Gbps and 100 Gbps InfiniBand links]
Enabling geographically dispersed HPC facilities to collaborate and function as ONE concurrent supercomputer, bringing the capability to address and solve grand challenges to the next level of efficiency and scale.
100 Gbps Bandwidth Utilization
100 Gbps InfiniBand east-ward link: Singapore, trans-Pacific, USA, trans-Atlantic, Europe
10 Gbps InfiniBand west-ward link: Singapore to Europe (via TEIN*CC)
InfiniCortex 2015: InfiniBand ring-around-the-world
[Map: TITECH (Tokyo), NCI (Canberra), Seattle, GA Tech (Atlanta), SC15 (Austin, TX), PSNC (Poznan), A*STAR (Singapore), URCA (Reims); links via ANA200/Internet2/GEANT, ESnet/Internet2, and 100 Gbps shared Internet2-SingAREN]
InfiniCloud 2015: True HPC Cloud around the Globe
[Map: NCI (Canberra), SC15 (Austin), GA Tech (Atlanta), SBU (New York), A*CRC (Singapore), PSNC (Poznań), URCA (Reims), UAlberta (Edmonton)]
InfiniCortex demo, SC15, Austin, TX, USA
• 7 InfiniBand sub-nets
• 7 countries: Singapore, USA, Australia, Japan, Poland, France, Canada
• 100 Gbps Singapore-Austin; 10-30 Gbps rest of network
• ~15 universities and research entities; ~40 partners and growing
• HPC InfiniCloud over 4 continents
The Science, Technology and Research Network (STAR-N) connects all National Supercomputing Centre stakeholders (A*STAR, NUS, NTU and industrial users) with 100 Gbps+ InfiniBand links.
Singapore InfiniBand connectivity
[Map: NUS, NTU, A*STAR Fusionopolis, A*STAR Biopolis and the SingAREN Global Switch, spanning Woodlands, Seletar, Changi, Novena, Outram, one-north and Jurong; links: 500 Gbps Infinera Cloud Express, 100 Gbps InfiniBand, 10/40/100 Gbps InfiniBand/IP]
• A high-bandwidth network to connect the distributed login nodes
• Provides high-speed access to users (both public and private) anywhere
• Supports transfer of large data-sets (both locally and internationally)
• Builds local and international network connectivity (Internet2, TEIN*CC)
• Reaches ASEAN, USA, Europe, Australia, Japan, Middle East
Genome Institute of Singapore - National Supercomputing Centre (GIS-NSCC) Integration
[Diagram: GIS-NSCC data flow. NGSP sequencers at B2 (Illumina + PacBio) and POLARIS, genotyping and other platforms in L4-L8 feed the NSCC gateway at 1 Gbps per sequencer/machine; a 500 Gbps primary link connects GIS and NSCC; NSCC holds the Data Manager, compute and tiered storage.]
STEP 1: Sequencers stream directly to NSCC storage (no footprint in GIS).
STEP 2: Automated pipeline analysis runs once sequencing completes; processed data resides in NSCC.
STEP 3: The Data Manager indexes and annotates processed data and replicates metadata to GIS, allowing data to be searched and retrieved.
[Diagram: Biopolis-Fusionopolis network schematic, 1.18 Tb aggregate. GIS DC (Biopolis) with HPC and Isilon storage; Matrix DC (Biopolis); Network Room and NSCC (Fusionopolis) with the storage system and large-memory nodes; A*CRC (Fusionopolis) HPC; NTU DR site; BMRC and SERC research institutes. Components are keyed to the numbered legend below: Longbow C400 (6), A-CWDM-81 (7), IB EDR/FDR switches (8), 100GE switches (18), Mellanox MTX 6100 (16), Infinera CX-100 (3), Arista 100G core switch (4), Exanet 100G core switches (5), 40GE switches (9), 200G transponders and ROADMs (19). Notable capacities: 8 x 10 Gbps (80 Gbps) to NTU, 400 Gbps over a single dark fibre (15), 240 Gbps used by MTX (+160 Gbps spare), 500 Gbps of dark fibre (2), 1GE sequencer links into the GIS IP network.]
1 – Inter-rack fibers (to be procured)
2 – Available dark fibres
3 – Infinera CX-100: 500 Gbps DWDM switches multiplexing 5 x 100GE of total capacity over a single dark fibre
4 – Arista 100 Gbps Ethernet switch for the core backbone
5 – Exanet 100 Gbps core switches using Cisco Nexus switches
6 – Obsidian Longbow C400 InfiniBand range-extender switch; allows a combined capacity of 40 Gbps of native InfiniBand connectivity over a distance of 10-40 km, depending on the type of transceivers used
7 – A-CWDM81 (Coarse Wavelength Division Multiplexing): performs the optical multiplex/demultiplex functions necessary to carry two 4 x QDR range-extended InfiniBand links (as well as a bonus 10G Ethernet or Fibre Channel circuit) over a single fibre pair across a campus or metro area network
8 – InfiniBand EDR/FDR switch
9 – Exanet 10/40 Gbps Ethernet switch
10 – 4 x 10 Gbps (40 Gbps) InfiniBand link
11 – 40 Gbps QDR link
12 – 9 x 10 Gbps Ethernet links
13 – 100 Gbps Ethernet links
14 – 10 Gbps Ethernet link
15 – 400 Gbps combined capacity over a single dark fibre
16 – Mellanox MTX 6100 InfiniBand switch with up to 240 Gbps of InfiniBand capacity over 6 pairs of dark fibres
17 – 2 x 40 Gbps Ethernet link
18 – 100GE edge switch (to be procured)
19 – Packetlight DWDM switches: 200G transponder, Optical Transport Network (OTN) switches, and an 88-channel ROADM (Reconfigurable Optical Add/Drop Multiplexer)
20 – 10GE Ethernet
21 – 100G EDR link
Polish Academic Supercomputing and Networking Landscape, 2016
• PIONIER Academic Network: a consortium coordinated by PSNC, Poznan; 7,500 km of own fiber
• Five academic supercomputing centers, combined ~6.7 PFLOPS: PSNC 1, Cyfronet 2.4, WCNS 1, ICM 1.3, TASK 1 (PFLOPS each)
[Chart: Number of Polish HPC systems in the Top500 list, 1995-2016; y-axis 0-7]
[Chart: Position of the top Polish HPC system on the Top500 list, 1995-2016 (lower is better); yearly values include 479, 408, 467, 430, 232, 211, 145, 138, 106, 88, 85, 68, 59 and 38]
[Chart: Rmax of the top Polish HPC system on the Top500 list, 1995-2016; log scale from GFLOPS to PFLOPS]
Polish HPC systems in the Top500 list, June 2016: Poznan Supercomputing and Networking Centre; CYFRONET, Krakow
SC16: booth 501
www.icm.edu.pl
Interdisciplinary Centre for Mathematical and Computational Modelling
University of Warsaw, Poland
ICM as an interdisciplinary centre
METEOROLOGICAL ACTIVITY
• Academic (University of Warsaw) service
• Meteorological activity at ICM UW since 1996
• Results are publicly available: http://www.meteo.pl, http://maps.meteo.pl
• Three independent NWP models: UM (Unified Model, UK Met Office), WRF (Weather Research and Forecasting Model), COAMPS (Coupled Ocean/Atmosphere Mesoscale Prediction System)
• Forecasts are used by the public sector, scientists and commercial users
• Flexible approach to meet sophisticated requirements
[Chart: Statistics of the main service for 2016]
UM (Unified Model) – spatial resolution: 4 km and 1.5 km, forecast length: up to 72 hours
WRF (Weather Research and Forecasting Model) – spatial resolution: 3.4 km, length: up to 120 hours.
COAMPS (Coupled Ocean/Atm. Mesoscale Pred. System) – spatial resolution: 13 km, length: up to 108 hours.
Storage Requirements from RFP

                       1PF Config             2PF Config             3PF Config
TIER 0 Scratch FS      2PB, 150GB/s,          4PB, 150GB/s**,        6PB, 150GB/s**,
                       500GB/s Burst Buffer   500GB/s Burst Buffer   500GB/s Burst Buffer
TIER 1 Home FS         2PB, 100GB/s           2PB, 100GB/s           2PB, 100GB/s
TIER 2 Nearline        3PB, 50GB/s            3PB, 50GB/s            3PB, 50GB/s
TIER 3 Archive (HSM)   5PB, 20TB/h            5PB, 20TB/h            5PB, 20TB/h
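The archive tier's rate is quoted in TB/h rather than GB/s, which makes it easy to estimate end-to-end migration time: filling or draining a full 5 PB archive at 20 TB/h takes roughly ten days. A back-of-the-envelope sketch, assuming decimal (SI) units throughout:

```python
# Time to move a full 5 PB archive tier at the RFP's 20 TB/h rate.
# Decimal (SI) units assumed; real HSM throughput would vary.
capacity_tb = 5_000        # 5 PB = 5000 TB
rate_tb_per_h = 20
hours = capacity_tb / rate_tb_per_h
print(f"{hours:.0f} hours (~{hours / 24:.1f} days)")  # 250 hours (~10.4 days)
```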
Singapore

NSCC / A*STAR End-To-End Storage Architecture
• 1PF compute cluster on an EDR InfiniBand network
• 265 TB burst buffer with DDN Infinite Memory Engine (IME) at 500 GB/s
• DDN EXAScaler (Lustre) for scratch: 4 PB at 200 GB/s
• DDN GRIDScaler for home & nearline: 4 PB at 100 GB/s
• 5 PB DDN WOS object-storage archive, reached via a GS-WOS bridge over 10GbE
• PFS stats collection & monitoring; NAS gateways & data-transfer nodes
• Remote login nodes at NUS and NTU over MetroX
Rack Performance: IME
~550 GB/s read and write; ~50 million IOPS
[Charts: IOR file-per-process throughput (GB/s), write and read, scale 0-560,000; 4k random IOPS, write and read, log scale 1-100,000,000]
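One way to read the IME figures of the NSCC architecture (265 TB of burst buffer at 500 GB/s) together: the entire buffer can be flushed in under ten minutes between job phases. An illustrative calculation, assuming decimal units and sustained peak bandwidth:

```python
# How long to drain a full 265 TB IME burst buffer at 500 GB/s?
# Decimal units and sustained peak bandwidth assumed.
capacity_bytes = 265e12     # 265 TB
bandwidth_bps = 500e9       # 500 GB/s
seconds = capacity_bytes / bandwidth_bps
print(f"{seconds:.0f} s (~{seconds / 60:.1f} min)")  # 530 s (~8.8 min)
```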
Poland

ICM Lustre for OKEANOS (CRAY XC40) schematics

Configuration Details
• 5x DDN SFA12K head (HA)
• 25x DDN SS8460 disk shelf
• 2100 6 TB disks (HGST He8) = 12.6 PB total raw
• RAID6 8+2; usable space 10 PB
• 25x OSS with 2x dual-port Mellanox FDR HCA
• 2x MDS + NetApp E2700 for metadata
• 10x Mellanox 6025 36-port FDR InfiniBand switches in a full fat-tree IB topology
• 20x CRAY LNET router (STRIO) with dual-port FDR HCA
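The raw and usable figures in the configuration are consistent with each other: 2100 disks of 6 TB give 12.6 PB raw, and RAID6 8+2 keeps 8 of every 10 disks for data. A quick check of that arithmetic, using decimal units and ignoring filesystem overhead:

```python
# Raw vs usable capacity of the OKEANOS Lustre store.
# Decimal units; filesystem/formatting overhead ignored.
disks = 2100
disk_tb = 6
raw_pb = disks * disk_tb / 1000    # total raw capacity in PB
usable_pb = raw_pb * 8 / 10        # RAID6 8+2 -> 80% of raw for data
print(f"raw: {raw_pb:.1f} PB, usable: {usable_pb:.2f} PB")  # raw: 12.6 PB, usable: 10.08 PB
```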
Benchmark Results
IOR-2.10.3: MPI Coordinated Test of Parallel I/O
Write command: cray-ior -w -t 4m -b 512g -F -k -E -D 240 -i 1 -o /ddn/5sfa/ior_testfile
blocksize = 512 GiB, filesize = 105 TiB, clients = 210 (1 per node), xfersize = 4 MiB
Max Write: 152.0 GB/sec (141.6 GiB/sec); Max Ops: 114,592 (write)
Read command: /ddn/cray/cray-ior -r -t 4m -b 512g -F -k -E -D 240 -i 1 -o /ddn/5sfa/ior_testfile
Max Read: 181.9 GB/sec (168.5 GiB/sec); Max Ops: 114,589 (read)
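The benchmark parameters above are self-consistent: 210 clients each writing a 512 GiB file-per-process gives the quoted 105 TiB aggregate, and 152.0 GB/s converts to the quoted 141.6 GiB/s. A quick check of that arithmetic:

```python
# Sanity-check the IOR run's aggregate file size and unit conversion.
clients = 210
blocksize_gib = 512
aggregate_tib = clients * blocksize_gib / 1024   # GiB -> TiB
print(f"aggregate: {aggregate_tib:.0f} TiB")     # aggregate: 105 TiB

write_gb_s = 152.0
write_gib_s = write_gb_s * 1e9 / 2**30           # decimal GB/s -> binary GiB/s
print(f"write: {write_gib_s:.1f} GiB/s")         # write: 141.6 GiB/s
```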