PDC Center for High Performance Computing
Building Virtual Product Modeling for Scania and Managing Industry Partners
KTH PDC and Industry – joint projects
PDC works with industrial researchers and developers on major international projects that advance high-performance computing.
15-07-13 3 Stockholm
KTH PDC Resources
BESKOW – a Cray XC40 for scalable applications
• 1.97 PF theoretical peak performance and 105 TB RAM from 1,676 dual-socket nodes
• Intel E5-2698v3 16-core Haswell CPUs, Cray Aries interconnect
• Largest HPC resource located in the Nordics
KTH PDC Resources
TEGNER – pre/post-processing cluster
The pre- and post-processing infrastructure supports users with complex workflows and advanced access methods, including graphics rendering and data exploration.
• Mellanox EDR Infiniband interconnect (1:1 fat tree)
• NVIDIA Quadro K420 for hardware-assisted off-screen rendering
• 9 nodes with NVIDIA Tesla K80 for GPU-enabled applications
• 4 nodes with Intel Xeon Phi 7120
• 55 thin nodes with two Intel E5-2670v3 12-core Haswell CPUs and 512 GB RAM
• Five 1 TB RAM nodes (4 Intel E7-8857v2 Ivy Bridge CPUs)
• Five 2 TB RAM nodes (4 Intel E7-8857v2 Ivy Bridge CPUs)
KTH PDC Resources
Klemming – a site-wide high-performance file system
• A 5 PB Lustre file system supplied by DDN
• Based on four DDN SFA12KX systems acting as SRP targets, connected over FDR Infiniband point-to-point links to 16 OSSs; the 16 OSSs act as 4 redundancy groups
• Metadata is stored on a DDN EF3015 FC RAID, connected over FC point-to-point links to a single MDS/MGS fail-over pair
• LNET routers housed in the target systems connect the file system to the various resources at PDC
• An FDR Infiniband 1:1 fat-tree fabric connects the LNET routers to the Lustre servers
• 132 GB/s to Beskow (measured) and 20 GB/s to Tegner (projected)
PDC Resource use for Partners
• KTH PDC can provide a secure compute environment for industry users in collaboration with PDC
– Solutions tailored per customer for long-term collaboration with strategic partners (shared investment)
• Reference customer – Scania Group
– Standard setup for short-term needs, based on PDC standards but with secured nodes and shared data storage (pay-per-use)
To the point - Scania partnership
• Scania is one of the world's leading manufacturers of heavy trucks.
• Scania has in-house computational resources, but PDC provides resources for elastic off-loading as well as the possibility of larger-scale simulations, primarily on PDC's Cray XC40 Beskow.
• KTH and Scania have a long-standing strategic partnership, and the Scania–PDC collaboration is one aspect of this partnership.
Scania partnership - security
• Scania requires a higher level of verifiable confidentiality than most of PDC's academic users.
• Lingering data and files (persistent state) are assumed to be the highest-risk items in the current setup.
• PDC currently employs only stateless, exclusively scheduled compute nodes, which makes the compute platform less of a problem in this respect – putting the focus on the high-performance file system.
• To accommodate these requirements, Scania has a reserved file system which mimics the setup of the main file system, but on a smaller scale.
• The design employs Infiniband partitioning to create a strong separation of data, and to restrict access to the file system to only those LNET routers that are supposed to reach it.
• This allows sharing the Infiniband fabric that connects the Lustre servers to the LNET routers.
• The separation can be extended into the compute resource, depending on the availability of similar mechanisms in its interconnect.
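Infiniband partitioning of this kind is typically configured in the fabric's subnet manager. A minimal sketch of an OpenSM partitions.conf, assuming a hypothetical PKey and port GUIDs (not PDC's actual configuration), might look like:

```
# /etc/opensm/partitions.conf -- illustrative sketch only
# Default partition: all ports, used for general traffic.
Default=0x7fff, ipoib : ALL=full, SELF=full;

# Restricted partition for the reserved file system: only the
# OSS/MDS ports and the permitted LNET routers, identified by
# port GUID, are members. Ports outside this list cannot
# exchange traffic on PKey 0x8002 at all.
scania_fs=0x8002 : 0x0002c903000a0001=full, 0x0002c903000a0002=full;
```

Because the subnet manager programs the PKey tables in the switch and HCA hardware, a host that is not listed simply cannot address the restricted partition, which is what gives the strong separation described above.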
Scania file-system – technical aspects
• Self-encrypting, black-hole-warranty drives in a dedicated DDN SFA7700 SRP solution, connected over Infiniband (point-to-point) to a dedicated fail-over OSS server pair (for file content).
• Dedicated metadata storage in a DDN EF3015 system, connected over FC (point-to-point) to a dedicated fail-over MDS server pair (for file and file-system metadata).
• OSSs and MDSs are connected to a pair of restricted partitions on the shared Infiniband fabric.
• LNET routers are used to export the file system to the systems where it is supposed to be used.
• Only routers allowed into the restricted Infiniband partition can access the OSSs/MDSs; hence the file system is not accessible from systems that are not intended to mount it.
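The LNET routing described above is driven by the lnet kernel module options on each node. A hedged sketch, with hypothetical network names and NIDs (not PDC's actual addressing), could look like:

```
# /etc/modprobe.d/lustre.conf -- illustrative sketch only

# On a permitted LNET router: bridge the restricted server-side
# fabric (o2ib1, inside the restricted IB partition) and the
# client-side fabric (o2ib0), and enable forwarding between them.
options lnet networks="o2ib0(ib0),o2ib1(ib1)" forwarding="enabled"

# On a client system allowed to mount the file system: the
# servers on o2ib1 are reached via the router NID 10.10.0.1@o2ib0.
options lnet networks="o2ib0(ib0)" routes="o2ib1 10.10.0.1@o2ib0"
```

A system without such a route (and without a router admitted to the restricted partition) has no LNET path to the OSSs/MDSs, which is what keeps the file system invisible to unintended clients.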