© Atos

A fully configurable HPC web portal for managing Slurm jobs

Slurm User Group SLUG’19
Salt Lake City, USA - September 18, 2019

Patrice Calegari

| 18-09-2019 | Patrice Calegari | © Atos HPC & AI R&D Software

We will talk about…

Context of the projects

XCS - eXtreme factory Computing Studio

BEM - Bull Efficiency Manager

Conclusion and future work

Context of the projects

Bull/Atos HPC & AI Software R&D

▶ Our division, Atos BDS (Big Data & Security), is in charge of developing supercomputing hardware and middleware.

▶ Our domains of interest: HPC, AI and Quantum simulations.

▶ User experience (UX) is extremely important.

▶ Security is critical in all our activities (and those of our clients).

▶ We have contributed to the Slurm community and integrated Slurm into our HPC stack for more than 10 years.

XCS - eXtreme factory Computing Studio

Extreme factory Computing Studio v3 (XCS3) - Introduction

Latest release: XCS 3.8.0 (April 5, 2019)

▶ Modular HPC, AI & Quantum portal

– as-a-Service cornerstone application

– supports Slurm (and other schedulers)

– Role Based Access Control (RBAC)

– supports AD, LDAP (with Kerberos)

– XCS = REST API service + GUI

▶ Fully customizable user interface

– Responsive Web Design (RWD) GUI

– Single Page Application (SPA) with configurable dashboards: layout, components, languages, themes

XCS REST API https://public.extremefactory.com/demo/api/doc/api-full.html

XCS REST API https://public.extremefactory.com/demo/app/api/doc/api-full.html

XCS user dashboard - Example 1: 8 components

XCS user dashboard - Example 2: 1 component

XCS user dashboard - Example 3: 6 components with edited theme

XCS dashboard main menu - import/export dashboards

XCS dashboard main menu - REST API documentation

XCS fundamental concepts - Key software product for HPCaaS solutions

Give users and admins access to resources through web services

• Use of a GUI in a web browser that relies on a REST API

Be compatible with “all possible” environments

• Software, frameworks, middleware

Never be intrusive

• The solution should be used in existing environments without modifying them

Keep all the intelligence in the REST API server

• The goal of the GUI is only to be the HMI (Human Machine Interface)
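
As a rough illustration of the last principle, the sketch below keeps routing and data decisions in a small REST server while the client only formats what it receives. It uses only the Python standard library. The `/jobs` resource, the `JOBS` table and all function names are hypothetical and are not part of the XCS API.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory job table; a real server would query the scheduler.
JOBS = [
    {"id": 101, "name": "cfd-run", "state": "RUNNING"},
    {"id": 102, "name": "post-process", "state": "PENDING"},
]


class ApiHandler(BaseHTTPRequestHandler):
    """Server side: routing and error handling live here, not in the GUI."""

    def do_GET(self):
        if self.path == "/jobs":
            self._send(200, JOBS)
        else:
            self._send(404, {"error": "unknown resource"})

    def _send(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def fetch_jobs(base_url):
    """Client side: fetch the JSON document and return it unchanged."""
    with urllib.request.urlopen(f"{base_url}/jobs") as resp:
        return json.load(resp)


def render(jobs):
    """The 'GUI': purely presentational formatting of the API response."""
    return [f"{j['id']:>5}  {j['name']:<14} {j['state']}" for j in jobs]


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), ApiHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}"
    print("\n".join(render(fetch_jobs(url))))
    server.shutdown()
    server.server_close()
```

Because `render` contains no business logic, the same server could feed a different front end without any server-side change.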

XCS architecture - current v3

[Architecture diagram. Components shown: XCS web User Interface with XCS DCs (Job submission DC, Data mngmt DC, DC) and API; XCS GUI web server (Dashboards, Web Design); XCS REST API web server; Security service; Directory service; XCS Data base; HPC cluster integration layer (Slurm, HPC applications). Link labels: HTTPS, HTTPS, SSH. DC = Dashboard Component.]

Slurm job submission workflow with XCS

sbatch … Appli.sh $arg1 …
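
The command line above can be pictured as being assembled from web form values. The following is a minimal Python sketch, assuming a hypothetical `JobForm` structure whose field names are illustrative and not the XCS form schema. `--job-name`, `--ntasks` and `--time` are standard sbatch options; the elided parts of the slide's command are not reconstructed.

```python
from __future__ import annotations

import shlex
from dataclasses import dataclass, field


@dataclass
class JobForm:
    """Hypothetical values collected from a web form (not the XCS schema)."""
    script: str                                   # wrapper script, e.g. Appli.sh
    args: list[str] = field(default_factory=list)  # application arguments
    job_name: str = "portal-job"
    ntasks: int = 1
    time_limit: str = "00:10:00"                  # HH:MM:SS


def build_sbatch_argv(form: JobForm) -> list[str]:
    """Return the sbatch argument vector for a submitted form."""
    return [
        "sbatch",
        f"--job-name={form.job_name}",
        f"--ntasks={form.ntasks}",
        f"--time={form.time_limit}",
        form.script,
        *form.args,
    ]


def render_for_ssh(argv: list[str]) -> str:
    """Quote each argument so the command survives a remote shell."""
    return shlex.join(argv)


if __name__ == "__main__":
    form = JobForm(script="Appli.sh", args=["input file.dat"], ntasks=4)
    print(render_for_ssh(build_sbatch_argv(form)))
```

Building an argument list first and quoting it only at the last step keeps user-supplied form values from being interpreted by the shell.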

XCS application administrator dashboard - HPC application general information

XCS application administrator dashboard - HPC application form definition

BEM - Bull Efficiency Manager

Bull Efficiency Manager (BEM) - Introduction

▶ Slurm has been enhanced by Bull/Atos to provide additional functionality, including topology-aware resource allocation and advanced placement policies.

▶ Bull Efficiency Manager (BEM) is a web application that runs on top of the Slurm workload manager and shows cluster details interactively.

▶ BEM dashboards present both current and archived data about cluster resources in graphs and tables.

XCS architecture - current v3

[Architecture diagram. Components shown: BEM web User Interface with BEM DCs (Switch Topology DC, Slurm usage history DC, DC) and API; XCS GUI web server (Dashboards, Web Design); BEM REST API web server; Security service; Directory service; BEM Data base; HPC cluster integration layer (Slurm). Link labels: HTTPS, HTTPS, SSH. DC = Dashboard Component.]

BEM - Login Page

BEM - Current resource usage 1/3

BEM - Current resource usage 2/3

BEM - Current resource usage 3/3

BEM - Historical resource usage

BEM - Topology resource allocation 1/3

BEM - Current resource usage 2/3

BEM - Current resource usage 3/3

Conclusion & Future Work

Conclusions

▶ XCS has been used successfully in production at many sites for several years, and it evolves continuously.

▶ BEM is still under development, and the first Minimum Viable Product (MVP) is very promising.

▶ Mobile devices are becoming a standard way of doing “everything”, so a web portal approach will soon be mandatory for new users (inexperienced users, young scientists of the new generation, non-technical managers, etc.).

Ongoing and future work

▶ Unify both interfaces (XCS & BEM) and share a single security service

▶ Add new features to administer Slurm

▶ We are developing a new web portal framework to federate all our HPC, AI & Quantum tools/microservices. It is an evolution of our current XCS solution with:

– a generic web GUI framework

– a security service (with flexible identity, authentication with SSO, and authorization management)

– global services (reverse proxy, gateway, discovery service, etc.)

XCS and BEM architecture - Complete solution to be developed in 2020

[Architecture diagram. Components shown: NEW unified web User Interface with XCS DCs (Job submission DC, Data mngmt DC, DC, API) and BEM DCs (Slurm usage history DC, Switch Topology DC, DC, API); Unified GUI web server (Dashboards, Web Design); XCS REST API web server; BEM REST API web server; Security service; Directory service; XCS Data base; BEM Data base; HPC cluster integration layer (Slurm, HPC applications); BEM integration layer (Slurm). Link labels: HTTPS (four links), SSH (two links). DC = Dashboard Component.]

XCS and Slurm native REST service architecture - Possible evolution…

[Architecture diagram. Components shown: NEW unified web User Interface with XCS DCs (Job submission DC, Data mngmt DC, DC, API) and Slurm DCs (Slurm admin specific DC, Slurm job specific DC, DC, API); Unified GUI web server (Dashboards, Web Design); XCS REST API web server; Security service; Directory service; XCS Data base; HPC cluster integration layer (Slurm, HPC applications); Slurm REST API daemon (slurmrestd); Slurm server; Slurm Data base. Link labels: HTTPS (four links), SSH (one link). DC = Dashboard Component.]

Atos, the Atos logo, Atos Syntel, Unify, and Worldline are registered trademarks of the Atos group. May 2019. © 2019 Atos. Confidential information owned by Atos, to be used by the recipient only. This document, or any part of it, may not be reproduced, copied, circulated and/or distributed nor quoted without prior written approval from Atos.

Thank you

For more information please contact:
Mathis Clayer for Slurm topics ([email protected])
Patrice Calegari for GUI topics ([email protected])

More on HPC web portals

▶ Web Portals for High-performance Computing: A Survey

– 36-page journal paper published by ACM

– https://dl.acm.org/citation.cfm?id=3197385

▶ Democratization of HPC through the Use of Web Portals: Different Strategies

– Panel at SC’19 in Denver, November 20th, 3:30pm-5pm

– https://sc19.supercomputing.org/presentation/?id=pan102&sess=sess223

