Cray Urika-XC Support of Analytics workflows on Shaheen. HPC 101 Shaheen II Training Workshop. Dr Samuel Kortas, Computational Scientist, KAUST Supercomputing Laboratory. 31 January 2019
Page 1: Cray Urika-XC Support of Analytics workflows on Shaheen · 1/31/2019  · Current status Urika-XC Analytics slack has been installed since November 2018 and a few users already used

Cray Urika-XC
Support of Analytics workflows on Shaheen

HPC 101 Shaheen II Training Workshop

Dr Samuel Kortas
Computational Scientist
KAUST Supercomputing Laboratory

31 January 2019

Outline

● Current status
● What is the Urika-XC Analytics stack?
● Some more detailed use cases
● Q/A

Current status

The Urika-XC Analytics stack has been installed since November 2018, and a few users have already used it successfully.

Urika-XC is a mature and stable environment provided by Cray.

Other modules are also directly available on Shaheen (e.g. tensorflow/1.8), and we are learning to build customized solutions.

We are currently developing a web portal to ease access to these resources.

Live, regularly updated documentation is available at http://hpc.kaust.edu.sa/analytics

Idea: use Shaheen’s resources for workflows other than classical numerical simulation

© CRAY, courtesy of Dr James D. Maltby

The 2 components of Urika-XC

Urika-XC packages

Outline

● Current status
● What is the Urika-XC Analytics stack?
● Some more detailed use cases
– A wide variety of open source software available
– A secured container technology
– Setting up a user interface (Jupyter Notebook)

Software Available

The power of containers with a magical Cray sauce

[Figure not reproduced. Its callouts read: "Interactive session only!", "Use regular Conda!", "Tuned by Cray for XC!" and "= Docker secured for HPC!"]

What is a container?

● Package Software into Standardized Units for Development, Shipment and Deployment

– Standard

– Lightweight

– Secure

● Docker Containers Are Everywhere: Linux, Windows, Data center, Cloud, Serverless, etc.

(definition given at http://docker.com)

● A 10-year-old technology: LXC (Linux Containers), FreeBSD Jails, AIX Workload Partitions and Solaris Containers have been around, but…

● Popularized by docker.com, with a huge repository of images available: more than 3.5 million dockerized applications at http://hub.docker.com

With containers, you are no longer trapped by the OS and software environment of Shaheen.

Download an existing container from hub.docker.com.

Modify it, develop on your workstation or laptop, and deploy immediately to Shaheen, IBEX, Amazon Web Services…

→ the overhead (in memory and performance) is very low compared to virtualization

But… Docker is not well suited to the HPC world...

● Security issue: it is easy to become root and to browse any part of a filesystem

● Performance issue: Docker is hardware agnostic. How can it perform well on an HPC system with a tuned network? How can it detect which GPU or CPU you are running on?

● License issue: Docker's license and business model are not that clear, which makes it risky to base any Open Science project on it

Alternative solutions have been designed

Both Singularity (from LBL) and Shifter (from NERSC) can use Docker containers.

Although a container can be tweaked on your own machine with full permissions, any attempt to become root or to read an unauthorized filesystem is denied inside a container run via Shifter.

Part of the Cray analytics software stack is a single Shifter container tuned by Cray to perform at its best on XC.
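
To make the Docker-to-Shifter path more concrete, here is a minimal sketch of pulling a Docker Hub image into Shifter and running one command inside it. The image name, the use of srun inside an allocation and the dry-run wrapper are assumptions made for illustration; shifterimg pull and shifter --image are standard Shifter commands, but how they are set up on Shaheen is not described in these slides. By default the script only prints the commands it would run.

```shell
#!/bin/bash
# Sketch only: the image name is a placeholder, and the exact Shifter setup
# on Shaheen is an assumption. With DRY_RUN=1 (the default) the commands are
# printed instead of executed, so the script is safe to run anywhere.
DRY_RUN="${DRY_RUN:-1}"
IMAGE="${IMAGE:-ubuntu:18.04}"   # hypothetical image from hub.docker.com

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

container_workflow() {
    # Import the Docker image into Shifter's image store.
    run shifterimg pull "docker:${IMAGE}"
    # Run one command inside the container on an allocated node.
    run srun -N 1 shifter --image="docker:${IMAGE}" cat /etc/os-release
}

container_workflow
```

Setting DRY_RUN=0 would execute the commands for real, which only makes sense after module load shifter and inside a SLURM allocation.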

How does it work? Where does it happen?

[Diagram: the login nodes (cdl1 to cdl4), the gateways (Gateway1 and Gateway2), the compute nodes, and the /home and /lustre file systems, with two callouts reading "Urika-XC analytics available from..."]

So how does it work? Which commands? Where does it happen?

● The Cray Urika-XC Analytics software stack is exclusively available from Shaheen gateway2. Every interactive operation therefore has to be launched from there.

ssh gateway2

● Once on the gateway, the following modules are available:
– analytics: providing Spark, Anaconda Python and R environments, TensorFlow and Jupyter notebooks,
– shifter: providing HPC-tuned and secure support for Docker images,
– cge: enabling the Cray Graph Engine

WARNING!!!!

● Gateway2 is an essential component of Shaheen II, used by SLURM to launch and schedule its jobs.

● DON’T run any code on gateway2. Only use gateway2 to:
– launch an interactive SLURM session with salloc
– launch training with run_training
– forward the ports of any user interface running on a node

● If building your own conda environment, prefer

conda create -p /project/userxxx/.conda/env/my_env

Shaheen’s node and file system environment...

[Diagram: the login nodes (cdl1 to cdl4), the gateways (Gateway1 and Gateway2), the compute nodes, and the /home and /lustre file systems]

● /lustre/project ← /project
● /lustre/scratch ← /scratch
● /lustre/scratch/userxxx ← /home/userxxx
● The Urika-XC stack is available from a gateway and a compute node only
● Every configuration file and conda environment usually lives in your /home directory. Keep in mind that your home is /scratch/userxxx!

You’re almost set!

● As your /home directory is in reality on /scratch, every file left untouched during the last 60 days…

→ prefer building any valuable environment you wish to keep (conda environments, notebooks...) in your project directory, and make a symbolic link to it from your /scratch/$USER

mkdir -p /project/kxxxx/$USER/NOTEBOOKS

mkdir -p /project/kxxxx/$USER/.conda

cd /scratch/$USER

ln -s /project/kxxxx/$USER/NOTEBOOKS /project/kxxxx/$USER/.conda .
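
To see which files are approaching the 60-day limit, something like the helper below could be used. It is only a sketch: the function name is made up for this example, and it approximates "untouched" by modification time, which may not be the criterion actually applied on Shaheen.

```shell
#!/bin/bash
# List regular files under a directory whose modification time is older
# than a given number of days (default: 60).
# Assumption: "untouched" is approximated by modification time; the real
# criterion used on Shaheen may differ (for example, access time).
list_stale_files() {
    local dir="$1"
    local days="${2:-60}"
    find "$dir" -type f -mtime +"$days" -print
}

# Example: files in your scratch directory not modified for over 50 days.
# list_stale_files "/scratch/$USER" 50
```

Anything valuable that shows up in the list is a candidate for moving to the project directory, as in the mkdir and ln commands above.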

● In order to use the tools seamlessly, you also need to set up a passwordless ssh connection between the scheduled nodes.

→ set up passwordless ssh private and public keys as explained at

https://www.hpc.kaust.edu.sa/analytics/ssh_keys
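
The page above is the authoritative procedure. As a rough illustration of what such a setup usually involves, the sketch below creates a key pair with an empty passphrase and authorizes it for the same account. It assumes the home directory is shared by all nodes; the function name and the choice of an RSA key are assumptions for this example, not Shaheen requirements.

```shell
#!/bin/bash
# Sketch only: follow the documented procedure when it differs.
# Assumption: the home directory is shared by all nodes, so authorizing
# your own public key is enough for node-to-node logins.
setup_passwordless_ssh() {
    local ssh_dir="${1:-$HOME/.ssh}"
    local key="$ssh_dir/id_rsa"

    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"

    # Create a key pair with an empty passphrase, unless one already exists.
    if [ ! -f "$key" ]; then
        ssh-keygen -q -t rsa -b 4096 -N "" -f "$key"
    fi

    # Authorize the public key once (no duplicate entries on re-runs).
    touch "$ssh_dir/authorized_keys"
    if ! grep -qxF "$(cat "$key.pub")" "$ssh_dir/authorized_keys"; then
        cat "$key.pub" >> "$ssh_dir/authorized_keys"
    fi
    chmod 600 "$ssh_dir/authorized_keys"
}

# Usage (writes to ~/.ssh): setup_passwordless_ssh
```

Passing a directory as the first argument makes it possible to try the function in a scratch location before touching ~/.ssh.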

So how does it work? Which commands? Where does it happen?

● Go to gateway2
cdl1% ssh gateway2

● Load the modules
gateway2% module load analytics

● Book the nodes (at least 3 are needed for Spark, Dask and distributed TensorFlow)
gateway2% salloc -N 3

● Start the environment
gateway2% start_analytics

● Or do the last 2 steps in one
gateway2% salloc -N 3 start_analytics

You’re all set!

● Spark, conda, R are all available and installed for you

● Do not hesitate to create your own conda environment and add conda packages if needed

conda create -p /scratch/userxxx/.conda/my_env

conda activate /scratch/userxxx/.conda/my_env

conda install pyspark

Why use R from Urika-XC?

Already tuned for XC by Cray

Jupyter notebooks and dashboards made available

Interactive session only!

+ JupyterLab

How to reach your user interface (Jupyter, TensorBoard)?

[Diagram: the login nodes (cdl1 to cdl4), the gateways (Gateway1 and Gateway2), the compute nodes, and the /home and /lustre file systems, annotated with port 8888]

Only the cdl login nodes are visible from your laptop. How to get there?

start_analytics --ssh-tunnel 8080:8080

ssh -L 8888:localhost:8888 cdl2
ssh -L 8888:localhost:8888 gateway2
module load analytics
salloc -N 1 -p debug
start_analytics --login-port 8888 --ui-port 8888
jupyter notebook

Choose one forwarded port per user

ssh -L 20080:localhost:20080 cdl2
ssh -L 20080:localhost:20080 gateway2
module load analytics
salloc -N 1 -p debug
start_analytics --login-port 20080 --ui-port 20080
jupyter notebook
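
One possible way to pick a port that is unlikely to clash with other users is to derive it from the numeric user ID, as sketched below. This scheme and the function name are assumptions for illustration, not a Shaheen convention. Collisions remain possible, so another port may still be needed if the chosen one is already taken.

```shell
#!/bin/bash
# Sketch only: deriving the port from the numeric user ID is an assumption
# made for illustration. The result always falls in the range 20000-29999.
user_port() {
    local uid="${1:-$(id -u)}"
    echo $(( 20000 + uid % 10000 ))
}

PORT="$(user_port)"
# Print the two commands from the slide with the derived port filled in.
echo "ssh -L ${PORT}:localhost:${PORT} cdl2"
echo "start_analytics --login-port ${PORT} --ui-port ${PORT}"
```

The same port number then has to be used at every step of the chain: both ssh -L commands and both start_analytics options.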

Save the forwarding in .ssh/config

[Diagram: login nodes, gateways and compute nodes, annotated with port 20080 and "Port forwarding set in .ssh/config", with the labels "On my laptop" and "On cdl3"]

ssh shaheen (or ssh cdl2)
ssh gateway2
module load analytics
salloc -N 1 -p debug
start_analytics --login-port 20080 --ui-port 20080
jupyter notebook
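
The .ssh/config entries themselves are not shown in the transcript. A minimal sketch of what they could look like is given below, written as a small helper that appends one Host entry with a LocalForward line. The helper name, the host aliases and the login-node address are placeholders, not values taken from the slides.

```shell
#!/bin/bash
# Sketch only: host aliases and addresses are placeholders.
# Appends to the given file one Host entry that forwards a local port to
# the same port on the remote side.
add_forwarding_entry() {
    local config_file="$1" host_alias="$2" host_name="$3" port="$4"
    cat >> "$config_file" <<EOF
Host ${host_alias}
    HostName ${host_name}
    LocalForward ${port} localhost:${port}
EOF
}

# On the laptop (replace the address with a real cdl login node):
# add_forwarding_entry ~/.ssh/config shaheen login-node-address 20080
# On the cdl login node:
# add_forwarding_entry ~/.ssh/config gateway2 gateway2 20080
```

With such entries in place, a plain ssh to the alias sets up the tunnel, which is why the -L options no longer appear in the commands above.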

https://www.hpc.kaust.edu.sa/jupyter

Cray Graph Engine

So how does it work? Which commands? Where does it happen?

● If you aim to use the web interface, you need to build a tunnel just like for Jupyter notebook

● Full instructions are detailed at

https://www.hpc.kaust.edu.sa/cray-graph-engine

Use jupyterlab instead of jupyter notebook

conda install jupyterlab

Questions?

http://hpc.kaust.edu.sa/analytics

https://pubs.cray.com/content/S-2589/1.1.UP00/xctm-series-urika-xc-analytic-applications-guide/about-urika-xc
