Cluster Computing in Frankfurt
Anja Gerbes
Goethe University Frankfurt am Main, Center for Scientific Computing
December 12, 2017
Center for Scientific Computing — What can we provide you?

CSC (Center for Scientific Computing):
- capability computing
- capacity computing
- access to licensed software
- introductory courses

HKHLR (Hessisches Kompetenzzentrum für Hochleistungsrechnen):
- access to the Hessian clusters
- HiPerCH workshops
Capability computing: using the maximum computing power to solve a single large problem in the shortest amount of time.
Capacity computing: in contrast, using efficient, cost-effective computing power to solve a small number of somewhat large problems or a large number of small problems.
Access to licensed software: commercial packages such as the TotalView debugger, the Vampir profiler, and Intel compilers, tools, and libraries.
Access to Hessian clusters: the clusters of the universities of Darmstadt, Frankfurt, Giessen, Kassel, and Marburg.
Introductory courses: UNIX, shell scripting, software tools, cluster computing (for MPI/OpenMP & Matlab users), Python, C++, TotalView, Make (build-management tool).
HiPerCH workshops: offered twice a year, giving users an insight into high-performance computing on a range of HPC topics.
Introduction to LOEWE-CSC & FUCHS
HPC Terminology
Cluster: A group of identical computers connected by a high-speed network, together forming a supercomputer.
Node: A compute node is currently roughly equivalent to a high-end workstation and is part of a cluster: two sockets, each with a single CPU; volatile working memory (RAM); a hard drive.
CPU: A Central Processing Unit (CPU) is a processor which may have one or more cores to perform tasks at a given time.
Core: A core is the basic computation unit of the CPU, with its own computing pipeline, logical units, and memory controller.
Thread: Each CPU core may service a number of CPU threads, each having an independent instruction stream but sharing the core's memory controller & other logical units.
FLOPS: Performance is measured in FLoating-point Operations Per Second (FLOPS).
Formula
The full sample formula, using dimensional analysis:

GFLOPS = #chassis · (#nodes / chassis) · (#sockets / node) · (#cores / socket) · (GHz / core) · (FLOPs / cycle)

TFLOPS = TeraFLOPS = 10^12 FLOPS = 1000 GFLOPS
GFLOPS = GigaFLOPS = 10^9 FLOPS = 1000 MFLOPS
MFLOPS = MegaFLOPS = 10^6 FLOPS
Note: a processor clocked in GHz yields theoretical performance in GFLOPS.
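As a worked example (a sketch only: the node, socket, core, and clock figures come from the LOEWE-CSC hardware table later in this deck, while the 4 FLOPs per cycle per core is an assumed value for illustration):

\[
\mathrm{GFLOPS} = 438\,\text{nodes}\cdot 2\,\tfrac{\text{sockets}}{\text{node}}\cdot 12\,\tfrac{\text{cores}}{\text{socket}}\cdot 2.1\,\tfrac{\text{GHz}}{\text{core}}\cdot 4\,\tfrac{\text{FLOPs}}{\text{cycle}} = 88300.8\ \mathrm{GFLOPS} \approx 88.3\ \mathrm{TFLOPS}
\]

of theoretical peak performance for the AMD part of the cluster.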
The past (but times change . . . ):
1. A chassis contained a single node.
2. A single node contained a single processor.
3. A processor contained a single CPU core and fit into a single socket.

. . . compared with recent computer systems:
1. A single chassis contains multiple nodes.
2. Those nodes contain multiple sockets.
3. The processors in those sockets contain multiple CPU cores.
On current computer systems:
1. A chassis houses one or more compute nodes.
2. A node contains one or more sockets.
3. A socket holds one processor.
4. A processor contains one or more CPU cores.
5. The CPU cores perform the actual mathematical computations.
6. The rate of these computations on floating-point numbers is measured in FLOPS.
7. One or more racks of such computers form the complete computer system.
[Figure: schematic of a dual-core and a quad-core CPU — the cores of each processor share the memory]
[Figure: a compute node — two quad-core processors sharing 24 GB of memory]
Setting of the LOEWE-CSC Cluster
[Figure: LOEWE-CSC cluster layout — AMD nodes, GPU nodes (each with two graphics cards), and Intel nodes; every node has CPU cores, RAM, HDD, and input/output, and all nodes are connected by an interconnect fabric to the storage systems]
Access to the Clusters in Frankfurt

LOEWE-CSC: ssh <username>@loewe-csc.hhlr-gu.de
FUCHS: ssh <username>@hhlr.csc.uni-frankfurt.de

- Go to CSC-Website/Access/LOEWE & CSC-Website/Access/FUCHS to get an account on the clusters.
- The project manager has to send a request to Prof. Lüdde to get CPU time for research projects.
- Please download the file & use a regular PDF viewer to open the forms.
Organization of a Cluster
[Figure: organization of a cluster — your PC connects via ssh (loewe-csc.hhlr-gu.de or hhlr.csc.uni-frankfurt.de) to 2-4 general login nodes, from which batch jobs are passed to 600+ compute nodes connected by an InfiniBand network]
Idea behind Batch Processing
- Whatever you would normally type at the command line goes into your batch script.
- Output that would normally go to the screen goes into a log file.
- The system runs your job when resources become available.
- This is very efficient in terms of resource utilization.
Hardware Resources of the LOEWE-CSC Cluster
#NodeCPU
GHz # CoresCPU
CoresNode
ThreadsNode
RAM[in GB]
GPU
4382xAMD Opteron 6172
2.10 12 24 24 64
1xAMD HD 5800 1GB
198 2xIntel Xeon E5-2670v2 2.50 10 20 40 128
139 2xIntel Xeon E5-2640v4 2.40 10 20 40 128
502xIntel Xeon E5-2630v2
2.60 6 12 24 128
2xAMD FirePro S10000 12GB
Filesystem of the Clusters
Warning: use the /scratch directory instead of /home to write out standard output and error.

LOEWE-CSC:
mountpoint  | /home      | /scratch   | /local  | /data0[1|2]
size        | 10 GB/user | 764 TB     | 1.4 TB  | 500 TB each
access time | slow       | fast       | fast    | slow
system      | NFS        | FhGFS      | ext3    | NFS
network     | Ethernet   | InfiniBand | (local) | Ethernet
FUCHS
Environment Modules

Definition: Environment Modules provide the software environment for specific purposes.

Syntax: module <command> <modulename>

avail — display all available modules
list — display all loaded modules
load | add <module> — load a module
load unstable — load a deprecated or unstable module
unload | rm <module> — unload a module
switch | swap <old-module> <new-module> — first unload the old module, then load the new module
purge — unload all currently loaded modules
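A short example session (a sketch; the module name is taken from this deck's listings, so check module avail for the versions actually installed):

module avail                        # what is installed?
module load mpi/mvapich2/gcc/2.0    # load an MPI flavour (pulls in gcc, see below)
module list                         # verify what is currently loaded
module unload mpi/mvapich2/gcc/2.0  # remove it again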
Syntax: module load <modulename>

If you (un)load an MPI module, the matching compiler module is automatically (un)loaded as well:

No. | MPI module                    | Type
1   | mpi/mvapich2/gcc/2.0          | gcc
2   | mpi/mvapich2/intel-14.0.3/2.0 | intel
3   | mpi/mvapich2/pgi-14.7/2.0     | pgi
1   | openmpi/gcc/1.8.1             | gcc
2   | openmpi/intel-14.0.3/1.8.1    | intel

No. | Compiler module
1   | intel/compiler/64/14.0.3
2   | pgi/14.7

Naming scheme: <flavour of MPI> / <compiler (version) with which it was compiled> / <version of the MPI library>; the 64 in the Intel compiler module name denotes the 64-bit Intel software.
Environment Modules — using custom modules

1. Write a module file in Tcl to set environment variables.
2. module load use.own enables you to load your own modules.
3. module load ~/privatemodules/modulename
4. Use the facilities provided by module.

A minimal sketch of steps 1-3 follows.
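Here, mytool is a hypothetical module name used only for illustration:

mkdir -p ~/privatemodules
# ~/privatemodules/mytool is a Tcl modulefile that, for example,
# prepends $HOME/mytool/bin to PATH
module load use.own   # make ~/privatemodules visible to the module command
module load mytool    # load the private module by name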
Partitions of the Cluster
Cluster | Partition | Run time | MaxNodes | MaxNodesPU | MaxJobsPU | MaxSubmitPU
LOEWE   | parallel  | 30d      | 750      | 150        | 40        | 50
LOEWE   | gpu       | 30d      | 50       | 50         | 40        | 50
LOEWE   | test      | 1h       | 2-12     |            | 10        | 10
FUCHS   | parallel  | 30d      | 60       | 100        | 60        | 100
FUCHS   | test      | 12h      |          |            |           |

(PU = per user.) The maximum array size on the LOEWE-CSC cluster is 1001.
Architecture of the LOEWE-Partitions
partition = parallel:
- constraint = dual (AMD)
- constraint = intel20 (Intel)
- constraint = broadwell (Intel)

partition = gpu: the GPU nodes

[Figure: the cluster layout from before — the AMD and Intel CPU nodes form the partition "parallel"; the GPU nodes form the partition "gpu"]
Architecture of the FUCHS-Partitions
Processor Type | Vendor | #Nodes | Sockets | Cores/Node | RAM [GB]
Magny-Cours    | AMD    | 72     | dual    | 24         | 64
Magny-Cours    | AMD    | 36     | quad    | 48         | 128
Istanbul       | AMD    | 250    | dual    | 12         | 32/64

- The architecture is selected with '--constraint':
  magnycours = 72 dual-socket AMD Magny-Cours nodes
  dual = 250 dual-socket AMD Istanbul nodes
  quad = 36 quad-socket AMD Magny-Cours nodes
- Use '--constraint=magnycours|dual' to avoid the quad nodes.
Batch Usage on LOEWE-CSC & FUCHS
Batch System Concepts
Cluster: a set of tightly connected identical computers that act as a single system & work together to solve computation-intensive problems.
Resource Manager: responsible for managing the resources of a cluster, like tasks, nodes, CPUs, memory & network.
Scheduler: controls users' jobs on a cluster.
Batch System: combines all the features of a scheduler & a resource manager in an efficient way.
SLURM: offers both functions, scheduling & resource management.
Batch Processing: executes programs or jobs without user intervention.
Job: a user-defined work-flow executed by the batch system; it consists of a description of the required resources & the job steps.
Job Steps: describe the tasks that must be done.
Cluster
- consists of a set of tightly connected identical computers
- the computers are presented as a single system & work together to solve computation-intensive problems
- the nodes are connected through a high-speed local network
- the nodes have access to shared resources like shared file systems

Resource Manager
- responsible for managing the resources of a cluster, like tasks, nodes, CPUs, memory & network
- manages the execution of jobs
- makes sure that jobs do not overlap on the resources & also handles their I/O
Scheduler
- receives jobs from the users
- controls users' jobs on a cluster
- controls the resource manager to make sure that the jobs are completed successfully
- handles the job submissions & puts jobs into queues
- offers many features, like:
  - user commands for managing the jobs (start, stop, hold)
  - interfaces for defining work-flows & job dependencies
  - interfaces for job monitoring & profiling (accounting)
  - partitions & queues to control jobs according to policies & limits
  - scheduling mechanisms, like backfilling according to priorities
Batch System
- is the combination of a scheduler & a resource manager
- combines all the features of these two parts in an efficient way
- SLURM offers both functions, scheduling & resource management

Batch Processing
- the composition of programs into so-called jobs is achieved by batch processing & realized by batch systems
- execution of programs or jobs without user intervention
Job
- execution of a user-defined work-flow by the batch system
- a job consists of a description of the required resources & the job steps

Job Steps
- job steps describe the tasks that must be done
- resource requests consist of a number of CPUs, the expected computing duration, and amounts of RAM or disk space
- the script itself is a job step
- other job steps are created with the srun command
- when a job starts, the script itself runs as the first job step
SLURM — Resource Manager on the Cluster

- SLURM stands for Simple Linux Utility for Resource Management.
- The user sends a job via sbatch to SLURM.
- SLURM calculates the priority of each job.
- SLURM starts a job according to its priority & the resource availability.
- Nodes are assigned exclusively, per job.
- SLURM allocates the resources of the jobs.
- SLURM provides a framework for starting & monitoring the jobs.
SLURM Commands
1. Job submission & execution
   salloc — requests interactive jobs/allocations
   sbatch — submits a batch script
   srun — runs jobs interactively (implicit resource allocation)

2. Managing a job
   scancel — cancels a pending or running job
   sinfo — shows information about nodes & partitions
   squeue — queries the list of pending & running jobs
   scontrol — shows detailed information about compute nodes

3. Accounting information
   sacct — displays accounting data for all jobs & job steps
   sacctmgr — shows SLURM account information
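In practice the management commands look like this (a sketch; 123456 stands for a hypothetical job ID, and the output columns depend on the site configuration):

squeue -u $USER                          # my pending & running jobs
sinfo -p parallel                        # state of the nodes in the parallel partition
sacct -j 123456 -o JobID,State,Elapsed   # accounting data for a job
scancel 123456                           # cancel that job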
Backfilling Scheduling
The backfilling scheduling algorithm may schedule jobs with lower priority if they fit into the gap created while resources are being freed for the next highest-priority job.

[Timeline figure: nodes versus time (-1 to 4) on a 2-node cluster, showing jobs A, B, and C]

A is a 1-node job & starts at time -1. Consider a 2-node cluster: job A is running and will take until time point 2.
B starts at time -1: now job B is submitted and scheduled. It will start after A, as it will take both nodes.
C starts at time 0: now job C is submitted. It will start after B if the scheduler has to assume that it will run past time point 2.
However, if C is promised to end before B can start, it will start right away. This is backfilling, simplified; the actual process takes all resources into account.
Job-Submission
1. Log in to the cluster:
   ssh <username>@loewe-csc.hhlr-gu.de (LOEWE)
   ssh <username>@hhlr.csc.uni-frankfurt.de (FUCHS)
2. Create a job script, e.g. with the .slurm extension; example script name: workshop_batch_script.slurm
3. Submit this script to the cluster:
   sbatch workshop_batch_script.slurm (indirect)
   salloc (interactive mode)
4. Use the allocated resources.
Job-Submission
Commands for job allocation:
- sbatch is used to submit batch jobs to the queue:
  sbatch [options] jobscript [args...]
- salloc is used to allocate resources for interactive jobs:
  salloc [options] [<command> [command args]]

Command for job execution:
- with srun, users can spawn any kind of application, process or task inside a job allocation:
  1. inside a job script submitted by sbatch (starts a job step)
  2. after calling salloc (execute programs interactively)
  srun [options] executable [args...]
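A short interactive round trip (a sketch using the test partition from the partition table; adjust the limits to your needs):

salloc --partition=test --nodes=1 --time=00:30:00   # wait for an allocation; opens a shell
srun hostname                                       # runs on the allocated node
exit                                                # release the allocation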
Indirect Job-Submission
sbatch: the job parameters and the user program call are encapsulated in a job script, which is handed to the submit command.

Features:
- prefabricated job scripts with the important parameters eliminate operator errors
- additional functionality is simple to add
- allows passing additional parameters to the submit command
- one-time extra effort for drafting the job scripts
Direct Job-Submission
salloc: the job parameters and the user program are passed directly to the submit command.

Features:
- allows simple, quick and flexible changes of job parameters
- preferred for many similar jobs that differ only in a few parameters (e.g. benchmarks: the same program with different numbers of CPUs)
- prone to operator errors
- additional functionality only via encapsulation in self-written scripts (e.g. loading libraries)
Job Execution
srun
- used to initiate job steps, mainly within a job
- starts interactive jobs
- a job can contain multiple job steps, executing sequentially or in parallel on independent nodes within the job's node allocation

After the modulefiles are loaded and the resources have been allocated, an application can be started on the assigned nodes by prefixing it with:
srun (mvapich) | mpirun (openmpi) | mpiexec (openmpi)

In the same shell window, more applications can be started. Running jobs interactively with srun performs an implicit resource allocation, as with salloc.
Job-Submission
List of the submission/allocation options for sbatch and salloc:

-p, --partition — partition to be used by the job
-c, --cpus-per-task — logical CPUs (hardware threads) per task
-N, --nodes — compute nodes used by the job
-n, --ntasks — total number of processes (MPI processes)
--ntasks-per-node — tasks per compute node
-t, --time — maximum wall-clock time of the job
-J, --job-name — set the name of the job
-o, --output — path to the job's standard output
-e, --error — path to the job's standard error

srun accepts almost all allocation options of sbatch and salloc.

Note: the option --partition has to be set.
Job Scripts Toy Examples
Listing 1: A naive script
#!/bin/bash

#SBATCH --job-name=TestJobSerial
#SBATCH --nodes=1
#SBATCH --output=TestJobSerial-%j.out
#SBATCH --error=TestJobSerial-%j.err
#SBATCH --time=30

hostname
Listing 2: Going parallel
#!/bin/bash

#SBATCH --job-name=TestJobParallel
#SBATCH --nodes=1
#SBATCH --output=TestJobParallel-%j.out
#SBATCH --error=TestJobParallel-%j.err
#SBATCH --time=60

srun --ntasks-per-node=2 hostname
Listing 3: Going parallel across nodes
#!/bin/bash

#SBATCH --job-name=TestJobParallel
#SBATCH --nodes=4
#SBATCH --output=TestJobParallel-%j.out
#SBATCH --error=TestJobParallel-%j.err
#SBATCH --time=60

srun --ntasks-per-node=2 hostname
Creating a Parallel Job in SLURM
There are several ways a parallel job — one whose tasks run simultaneously — can be created:
1. by running several instances of a single-threaded program
2. by running a multi-process program (MPI)
3. by running a multi-threaded program (OpenMP or pthreads)
- In the SLURM context, a task is to be understood as a process.
- A multi-threaded program consists of one task that uses several CPUs.
- The option --cpus-per-task is meant for multi-threaded programs.
- Multi-threaded jobs run on a single node, but use more than one processor on the node.
- A task cannot be split across several compute nodes, so requesting several CPUs with the --cpus-per-task option ensures that all CPUs are allocated on the same compute node.
- A multi-process program is made of several tasks.
- The option --ntasks is meant for multi-process programs.
- By contrast, requesting the same number of CPUs with the --ntasks option may lead to CPUs being allocated on several distinct compute nodes.

The sketch below contrasts the two options.
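Directives only; the program call would follow as in the other listings:

# one task with 8 CPUs: all 8 cores land on one node (OpenMP/pthreads)
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

# versus: 8 tasks with 1 CPU each; they may be spread over several nodes (MPI)
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=1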
Trivial Parallelization — Simple Loop
Listing 4: scriptexp.sh
#!/bin/bash

echo `hostname` $1
exit 0

Listing 5: Serial job, simpleloop.slurm
#!/bin/bash
#SBATCH --job-name=oneforloop
#SBATCH --output=expscript.out
#SBATCH --error=expscript.err
#SBATCH --partition=parallel
#SBATCH --constraint=dual
#SBATCH --ntasks=48
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1200
#SBATCH --mail-type=FAIL

for N in `seq 1 48`; do
  srun -N 1 -n 1 ./scriptexp.sh $N &
done
wait
sleep 300
Trivial Parallelization — Nested Loop
Listing 6: scriptexp.sh
#!/bin/bash

echo `hostname` $1
sleep 2
exit 0

Listing 7: Serial job, nestedloop.slurm
#!/bin/bash
#SBATCH --job-name=twoforloops
#SBATCH --output=expscript.out
#SBATCH --error=expscript.err
#SBATCH --partition=parallel
#SBATCH --constraint=dual
#SBATCH --ntasks=48
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1200
#SBATCH --mail-type=FAIL

for i in `seq 0 3`; do
  for M in `seq 1 48`; do
    let N=$i*48+$M
    srun -N 1 -n 1 ./scriptexp.sh $N &
  done
  wait
done
wait
Job Scripts Toy Examples
Listing 8: Naive OpenMP Job
#!/bin/bash

#SBATCH -J NaiveOMP
#SBATCH -N 1

#SBATCH -o TestOMP-%j.out
#SBATCH -e TestOMP-%j.err
#SBATCH --time=2:00:00

#SBATCH --constraint=dual

# AMD nodes: 24 = maximum number of cores per node
export OMP_NUM_THREADS=24

/home/user/omp-prog
Listing 10: MPI Job
#!/bin/bash

#SBATCH -J TestMPI
#SBATCH --nodes=4
#SBATCH --ntasks=96
#SBATCH -o TestMPI-%j.out
#SBATCH -e TestMPI-%j.err
#SBATCH --time=0:15:00

#SBATCH --partition=parallel

# implied --ntasks-per-node=24 (4 nodes x 24 tasks = 96)
srun ./mpi-prog
Listing 11: Multiple Job Steps
#!/bin/bash

#SBATCH -J TestJobSteps
#SBATCH -N 32
#SBATCH --partition=parallel
#SBATCH -o TestJobSteps-%j.out
#SBATCH -e TestJobSteps-%j.err
#SBATCH --time=6:00:00

srun -N 16 -n 32 -t 00:50:00 ./mpi-prog_1
srun -N 2 -n 4 -t 00:10:00 ./mpi-prog_2
srun -N 32 --ntasks-per-node=2 -t 05:00:00 ./mpi-prog_3
Listing 14: Job Arrays
#!/bin/bash

#SBATCH -J TestJobArrays
#SBATCH --nodes=1
#SBATCH -o TestJobArrays-%A_%a.out
#SBATCH -e TestJobArrays-%A_%a.err
#SBATCH --time=2:00:00
#SBATCH --array=1-20

srun -N 1 --ntasks-per-node=1 ./prog input_${SLURM_ARRAY_TASK_ID}.txt

--array=1-20 will cause 20 array tasks (numbered 1, 2, ..., 20); the array tasks are simply copies of this master script.

- SLURM supports job arrays with the option --array
- SLURM_ARRAY_JOB_ID : %A : base array job ID
- SLURM_ARRAY_TASK_ID : %a : array index
Job Array Support
- Job arrays offer a mechanism for submitting and managing collections of similar jobs quickly and easily.
- Job arrays with many tasks can be submitted in milliseconds.
- All jobs have the same initial options (e.g. size, time limit).
- Users may limit how many such jobs run simultaneously.
- Job arrays are only supported for batch jobs.
- To address a job array, SLURM provides a base array ID & an array index for each job; an individual task is addressed as <base job id>_<array index>.
- SLURM exports environment variables; an example follows.
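For example (a sketch; 1234 stands for a hypothetical base job ID):

sbatch --array=1-20%4 array_job.slurm   # %4 limits the array to 4 simultaneously running tasks
squeue -r -u $USER                      # -r prints one line per array task
scancel 1234_7                          # cancel only array task 7
scancel 1234                            # cancel the whole array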
CPU Management with SLURM
1. Selection of nodes
2. Allocation of CPUs from the selected nodes
3. Distribution of tasks to the selected nodes
4. Optional distribution & binding of tasks to allocated CPUs within a node
CPU Management
Allocation: assignment of a specific set of CPU resources (nodes, sockets, cores and/or threads) to a specific job or step.

Distribution:
1. assignment of a specific task to a specific node
2. assignment of a specific task to a specific set of CPUs within a node (used for optional task-to-CPU binding)

Core binding: confinement/locking of a specific set of tasks to a specific set of CPUs within a node.
CPU Management with SLURM
Selection of resources with sbatch:

#SBATCH --partition=parallel
#SBATCH --nodes=6
#SBATCH --constraint=intel20
#SBATCH --mem=512
#SBATCH --ntasks=12
#SBATCH --cpus-per-task=3

Allocation of the resources also happens with sbatch.

Distribution with srun:

srun --distribution=block:cyclic ./my_program

Core binding (process/task binding) with srun:
srun --cpu_bind=cores ./my_program
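To check where the tasks were actually bound, srun can report its bindings (a sketch; the option is spelled --cpu_bind in the SLURM version of this deck, --cpu-bind in newer releases):

srun --cpu_bind=verbose,cores ./my_program   # prints the CPU mask chosen for each task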
HOW-TO
- First you have to log in to one of the login nodes.
- Prepare a batch script with your requirements.
- Execute the batch script to run your application.
- Monitor the batch script on the terminal.

A minimal end-to-end session is sketched below.
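Reusing the example script name from the job-submission slide:

ssh <username>@loewe-csc.hhlr-gu.de    # 1. log in
sbatch workshop_batch_script.slurm     # 2.+3. submit; prints "Submitted batch job <jobid>"
squeue -u $USER                        # 4. watch the job state (PD = pending, R = running)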
Login to one of the Login Nodes
Executing the Batch Script to run your Application
Monitoring the Batch Script on the Terminal
squeue -u <user>
scancel <jobid>
scontrol show job <jobid>
Setting of the LOEWE-CSC Cluster
[Figure: the LOEWE-CSC cluster layout from the Cluster Facts section, repeated]
Important parameters for sbatch:
- -p, --partition
- -C, --constraint
- -J, --job-name
- -t, --time
- -N, --nodes
- --mem-per-cpu
- -n, --ntasks
- -c, --cpus-per-task
Job Scripts Examples
Listing 15: Parallel MPI Job
#!/bin/bash
#SBATCH --job-name=parallelmpi
#SBATCH --output=expscript-%j.out
#SBATCH --error=expscript-%j.err
#
#SBATCH --partition=parallel
#SBATCH --constraint=dual
#
#SBATCH --ntasks=4
#SBATCH --time=00:10:00
#SBATCH --mem-per-cpu=100

module load mpi/mvapich2/gcc/2.0
mpiexec helloworld.mpi
Listing 16: OpenMP Job
#!/bin/bash
#SBATCH --job-name=parallelopenmp
#SBATCH --output=expscript-%j.out
#SBATCH --error=expscript-%j.err
#
#SBATCH --partition=parallel
#SBATCH --constraint=dual
#
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=00:10:00
#SBATCH --mem-per-cpu=100

export OMP_NUM_THREADS=4
./helloworld.omp
Listing 17: OpenMP Job
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=16

export OMP_NUM_THREADS=16

srun -n 1 --cpus-per-task $OMP_NUM_THREADS ./application

[Figure: physical view — one task using cores 0-15 across both sockets of node 1]
Listing 18: MPI Job
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --tasks-per-node=16
#SBATCH --cpus-per-task=1

srun -n 32 ./application

[Figure: physical view — ranks 0-15 on the 16 cores (two sockets) of node 1, ranks 16-31 on node 2]
Listing 19: Hybrid MPI/OpenMP Job
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --tasks-per-node=4
#SBATCH --cpus-per-task=4

export OMP_NUM_THREADS=4

srun -n 8 --cpus-per-task $OMP_NUM_THREADS ./application

[Figure: physical view — ranks 0-3 on node 1 and ranks 4-7 on node 2, each rank bound to 4 cores of a socket]
Submitting a Batch Script
Suppose you need 16 cores. You control how the cores are allocated using the --cpus-per-task & --ntasks-per-node options. With those options, there are several ways to get the same allocation.

Example for --cpus-per-task & --ntasks-per-node — equivalent in terms of resource allocation:
- --nodes=4 --ntasks=4 --cpus-per-task=4
- --ntasks=16 --ntasks-per-node=4, with which
  - srun launches 4 processes
  - mpirun launches 16 processes
- use MPI & don't care where those cores are distributed: --ntasks=16
- launch 16 independent processes (no communication): --ntasks=16
- want those cores to spread across distinct nodes: --ntasks=16 --ntasks-per-node=1 or --ntasks=16 --nodes=16
- want those cores to spread across distinct nodes without interference from other jobs: --ntasks=16 --nodes=16 --exclusive
- 16 processes spread across 8 nodes, two processes per node: --ntasks=16 --ntasks-per-node=2
- 16 processes all on the same node: --ntasks=16 --ntasks-per-node=16
- one process that can use 16 cores for multithreading: --ntasks=1 --cpus-per-task=16
- 4 processes that can use 4 cores each for multithreading: --ntasks=4 --cpus-per-task=4
Example for --mem & --mem-per-cpu:
- If you request two cores (-n 2) and 4 GB with --mem, each core receives 2 GB of RAM.
- If you instead specify 4 GB with --mem-per-cpu, each core receives 4 GB, for a total of 8 GB.
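As a sketch of the two requests (values in MB, matching the bullets above):

#SBATCH -n 2
#SBATCH --mem=4096           # 4 GB for the whole node: about 2 GB per core
# ... or instead ...
#SBATCH --mem-per-cpu=4096   # 4 GB per core: 8 GB in total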
Different Batch Scripts
OpenMP example
Listing 20: OpenMP
#!/bin/bash
#SBATCH --job-name=openmpexp
#SBATCH --output=expscript.out
#SBATCH --error=expscript.err
#SBATCH --partition=parallel
#SBATCH --constraint=dual
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=24
#SBATCH --mem-per-cpu=200
#SBATCH --time=48:00:00
#SBATCH --mail-type=ALL
#
export OMP_NUM_THREADS=24
./example_program

If your application needs 4800 MB and you want to run 24 threads, set --mem-per-cpu=200 (4800/24 = 200).
MPI example
Listing 21: MPI
#!/bin/bash
#SBATCH --job-name=mpiexp
#SBATCH --output=expscript.out
#SBATCH --error=expscript.err
#SBATCH --partition=parallel
#SBATCH --constraint=dual
#SBATCH --ntasks=96
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1200
#SBATCH --time=48:00:00
#SBATCH --mail-type=ALL
#
module load openmpi/gcc/1.8.1
export OMP_NUM_THREADS=1
mpirun -np 96 ./example_program

SLURM exports the requested task count as SLURM_NTASKS (= the number of OpenMPI ranks), and 1200 MB of RAM are allocated for each rank.
small MPI example
Listing 23: small MPI example
#!/bin/bash
#SBATCH --job-name=smallmpiexp
#SBATCH --output=expscript.out
#SBATCH --error=expscript.err
#SBATCH --partition=parallel
#SBATCH --constraint=dual
#SBATCH --ntasks=24
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=2000
#SBATCH --time=48:00:00
#SBATCH --mail-type=FAIL
#
export OMP_NUM_THREADS=1
mpirun -np 12 ./program input01 >& 01.out &
sleep 3
mpirun -np 12 ./program input02 >& 02.out &
wait

If you have several 12-rank MPI runs, you can start more than one computation inside a single allocation.
Hybrid MPI+OpenMP example
Listing 24: Hybrid MPI+OpenMP
#!/bin/bash
#SBATCH --job-name=hybridexp
#SBATCH --output=expscript.out
#SBATCH --error=expscript.err
#SBATCH --partition=parallel
#SBATCH --constraint=dual
#SBATCH --ntasks=24            # 24 MPI ranks
#SBATCH --cpus-per-task=6      # 6 threads each
#SBATCH --mem-per-cpu=200      # 200 MB per thread
#SBATCH --time=48:00:00
#SBATCH --mail-type=ALL
#
export OMP_NUM_THREADS=6
export MV2_ENABLE_AFFINITY=0
srun -n 24 ./example_program

24 ranks x 6 threads = 144 cores: you will get six 24-core nodes.
Job Scripts Toy Examples
Listing 28: Hybrid Job with Simultaneous Multithreading (SMT)
#!/bin/bash

#SBATCH -J TestHybrid
#SBATCH --ntasks=6
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=24
#SBATCH -o TestMPI-%j.out
#SBATCH -e TestMPI-%j.err
#SBATCH --time=0:20:00
#SBATCH --partition=parallel

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

srun ./hybrid-prog
Closing Remarks
Checklist for successful cluster usage:

☐ Does my account exist?
☐ I know how to access the cluster.
☐ I know the parallel behavior of my software (and whether it is parallel at all).
☐ I can estimate the runtime behavior and memory usage of my software.
☐ I know how to run my software on the operating system of the cluster.
☐ I know where to find help when I have problems → the HKHLR members.
Summary — Resource Allocation Specifications

Syntax: sbatch myBatchScript.sh

Features of a cluster: -C, --constraint
Node count: -N, --nodes
Node restrictions: -w, --nodelist
Task count: -n, --ntasks
Task specifications: --ntasks-per-node, --ntasks-per-socket, --ntasks-per-core, --cpus-per-task
Memory per node: --mem
Memory per CPU: --mem-per-cpu
Cluster Quick Reference Guide
Version 1.0, February 14, 2017 — Cluster quick reference, Frankfurt

1 Cluster Usage

Access — Cluster Frankfurt
ssh <username>@loewe-csc.hhlr-gu.de (LOEWE)
ssh <username>@hhlr.csc.uni-frankfurt.de (FUCHS)
Go to CSC-Website/Access to get an account on the clusters. For LOEWE, the project manager has to send a request to Prof. Lüdde to get CPU time for research projects.
Getting Help — Cluster Frankfurt
You will find further information about the usable commands on the clusters with man <command>.

How-To: execute myBatchScript.sh
1. First you have to log in to one of the login nodes.
2. Prepare a batch script with your requirements.
3. Execute the batch script to run your application.
Module — setting program environments
Syntax: module <command> <modulename>
avail — display all available modules
list — display all loaded modules
load | add <module> — load a module
load unstable — load a deprecated or unstable module
unload | rm <module> — unload a module
switch | swap <old-module> <new-module> — first unloads the old module, then loads the new module
purge — unload all currently loaded modules

How-To: use custom modules
1. Write a module file in Tcl* to set environment variables.
2. module load use.own enables you to load your own modules.
3. module load ~/privatemodules/modulename
4. Use the facilities provided by module.
* look for examples in /cm/shared/modulefiles
Architecture & Constraints — Cluster Frankfurt

LOEWE Cluster Frankfurt:
#Nodes | CPU                                | GHz  | Sockets/Cores | RAM    | GPU
438    | AMD Opteron 6172 (Magny-Cours)     | 2.10 | 2/24          | 64 GB  | 1x ATI Radeon HD5870 1GB
198    | Intel Xeon E5-2670v2 (Ivy Bridge)  | 2.50 | 2/20          | 128 GB | -
139    | Intel Xeon E5-2640v4 (Broadwell)   | 2.40 | 2/20          | 128 GB | -
50     | Intel Xeon E5-2650v2 (Ivy Bridge)  | 2.60 | 2/12          | 128 GB | 2x AMD FirePro S10000 12GB

The architecture is selectable via the '--constraint' option:
dual = dual-socket AMD Magny-Cours CPU/GPU nodes
intel20 = dual-socket Intel Ivy Bridge CPU nodes
broadwell = dual-socket Intel Broadwell CPU nodes
FUCHS Cluster Frankfurt:
Processor Type | #Nodes | Sockets | Cores | RAM [GB]
Magny-Cours    | 72     | dual    | 24    | 64
Magny-Cours    | 36     | quad    | 48    | 128
Istanbul       | 250    | dual    | 12    | 32/64

The architecture is selected with '--constraint':
magnycours = 72 dual-socket AMD Magny-Cours nodes
dual = 250 dual-socket Istanbul nodes
quad = 36 quad-socket AMD Magny-Cours nodes
'--constraint=magnycours|dual' to avoid the quad nodes
Contact — HPC Frankfurt
If you have any HPC questions about SLURM and want help with debugging & optimizing your program, please write to [email protected]. Also, you can contact the system administrators if you need software to be installed: [email protected]. Detailed documentation on using the cluster can be found on the CSC-Website.
Partitions — Cluster Frankfurt

Cluster | Partition | Run time | MaxNodes | MaxNodesPU | MaxJobsPU | MaxSubmitPU
LOEWE   | parallel  | 30d      | 750      | 150        | 40        | 50
LOEWE   | gpu       | 30d      | 50       | 50         | 40        | 50
LOEWE   | test      | 1h       | 2-12     |            | 10        | 10
FUCHS   | parallel  | 30d      | 60       | 100        | 60        | 100
FUCHS   | test      | 12h      |          |            |           |

The maximum array size of the cluster is 1001. To view this information on the cluster, use the commands:

sacctmgr list QOS partition format=maxnodes,maxnodesperuser,maxjobsperuser,maxsubmitjobsperuser
scontrol show partition
sinfo -p <partition>
squeue -p <partition>
Partition descriptions (LOEWE):
parallel — a mix of AMD Magny-Cours, Intel Xeon Ivy Bridge & Broadwell nodes
gpu — dual-socket Intel Xeon Ivy Bridge E5-2650v2 CPU/GPU nodes, each with two AMD FirePro S10000 dual-GPU cards

'--constraint=gpu' has become obsolete; use '--partition=gpu' instead. Mixed node types like 'gpu*3&intel20*2' are possible. Ensure that the number of nodes you request matches the number of nodes in your constraints.
Per-User Resource Limits — Cluster Frankfurt
MaxNodes — max. number of nodes
MaxNodesPU — max. number of nodes in use at the same time per user
MaxJobsPU — max. number of jobs running simultaneously per user
MaxSubmitPU — max. number of jobs in running or pending state per user
MaxArraySize — max. job array size

File Systems — storage systems
mountpoint  | /home      | /scratch   | /local  | /data0[1|2]
size        | 10 GB/user | 764 TB     | 1.4 TB  | 500 TB each
access time | slow       | fast       | fast    | slow
system      | NFS        | FhGFS      | ext3    | NFS
network     | Ethernet   | InfiniBand | (local) | Ethernet
http://csc.uni-frankfurt.de | http://www.hpc-hessen.de
Center for Scientific Computing | Hessisches Kompetenzzentrum für Hochleistungsrechnen
Cluster Quick Reference Guide
Resource Manager — Cluster Frankfurt
On our systems, compute jobs are managed by SLURM. On the clusters, node allocation is exclusive. You can find more examples on our CSC-Website/ClusterUsage. In SlurmCommands there is a detailed summary of the different options.
2 Job Submission & Execution

sbatch — batch mode | salloc — interactive, allocate resources
Syntax: salloc [options] [<command> [command args]]
        sbatch myBatchScript.sh

-a, --array=<indexes> — submit a job array
-C, --constraint=<feature> — specify features of a cluster
-c, --cpus-per-task=<ncpus> — threads: how many threads run on the node? (for OpenMP)
-J, --job-name=<job-name> — specify a name for the allocation
-m, --distribution=<block|cyclic|arbitrary|plane> — mapping of processes
--mem=<MB> — specify the real memory required per node
--mem-per-cpu — minimum memory required per allocated CPU
--mem_bind=<type> — bind tasks to memory
-N, --nodes=<min[-max]> — nodes: how many nodes will be allocated to this job?
-n, --ntasks=<number> — tasks: how many processes are started? (important for MPI)
-p <partition> — request a specific partition for the resources
-t <time> — set a limit on the total run time of the job
-w, --nodelist=<node_name_list> — request a specific list of node names
sbatch — execute myBatchScript.sh | batch mode

#!/bin/bash
#SBATCH -p parallel        # partition (queue)
#SBATCH -C dual|intel20    # class of nodes
#SBATCH -N|-n|-c 1         # number of nodes|processes|cores
#SBATCH --mem 100          # memory pool for all cores
#SBATCH -t 0-2:00          # time (D-HH:MM)
srun helloworld.sh         # start the program
srun — run parallel jobs | interactive mode
After modulefiles are loaded and resources have been allocated, an application on the assigned node can be started with a preceding srun (run parallel jobs) or mpiexec (run an MPI program). In this shell window, more applications can be started.

Process binding — constrains each process to run on specific processors:
--cpu_bind — process binding to cores & CPUs (srun)
--bind-to core|socket|none — binding of MPI processes (mpirun)
--cpus-per-proc <#perproc> — bind each process to the specified number of CPUs (mpirun)
--report-bindings — report any bindings for launched processes (mpirun)
--slot-list <id> — list of processor IDs to be used for binding (mpirun)
3 Accounting

sacct — display accounting data
Syntax: sacct [options]
-b, --brief — display jobid, status, exitcode
-e, --helpformat — print a list of available fields
-o, --format — comma-separated list of fields

sacctmgr — view SLURM account information
Syntax: sacctmgr [options] [command]
list | show — display information about the specified entity
4 Job Management

scancel — cancel a job
Syntax: scancel <jobid>
-u <username> — cancel all jobs of a user
-t PD -u <username> — cancel all pending jobs of a user

sinfo — view information about nodes and partitions
Syntax: sinfo [options]
-i <seconds> — print the state on a periodic basis
-l, --long — print more detailed information
-n <nodes> — print information only about the specified node(s)
-p <partition> — print information about the specified partition
-R, --list-reasons — list reasons why nodes are in the down, drained, fail or failing state
-s, --summarize — list only a partition state summary with no node state details

squeue — view information about jobs in the scheduling queue
Syntax: squeue [options]
-i <seconds> — report the requested information periodically
-j <job_id_list> — print the listed job IDs
-r — print one job array element per line
--start — report the expected start time & resources to be allocated for pending jobs
-t <state_list> — print jobs in the specified states
-u <user_list> — print jobs of the listed users

scontrol — view the state of a specified entity
Syntax: scontrol [options] [command] ENTITY_ID
Options:
-d, --details — make the show command print more details
-o, --oneliner — print the information one line per record
Commands <ENTITY_ID>:
hold <jobid> — pause a particular job
resume <jobid> — resume a particular job
requeue <jobid> — requeue (cancel & rerun) a particular job
suspend <jobid> — suspend a running job
scontrol show:
job <job_id> — print job information
node <name> — print node information
partition <name> — print partition information
reservation — print the list of reservations
http://csc.uni-frankfurt.de | http://www.hpc-hessen.de
Center for Scientific Computing | Hessisches Kompetenzzentrum für Hochleistungsrechnen
ISC STEM STUDENT DAY & STEM GALA
Purpose? HPC skills can positively shape STEM students' future careers: an introduction to the current & predicted HPC job landscape, and what the European HPC workforce will look like in 2020.
Audience? Undergraduate and graduate students pursuing STEM degrees.
When? Wednesday, June 27, 9:30 am – 9:30 pm.
What? | Where? Day program & evening program in Frankfurt.
Fee? Free admission for STEM students.
Registration? Opens in spring 2018; limited to 70 attendees, first come, first served.
Infos? Announcement at www.isc-hpc.com.
Day Program & Evening Program in Frankfurt
- Tutorial on HPC Applications, Systems & Programming Languages — Dr.-Ing. Bernd Mohr
- Tutorial on Machine Learning & Data Analytics — Prof. Dr.-Ing. Morris Riedel
- Guided tour of the ISC exhibition & the Student Cluster Competition
- Keynote by Thomas Sterling, Room Konstant, Forum, Messe Frankfurt
- Welcome by Kim McMahon
- Introduction by Addison Snell
- Job fair & dinner at the Marriott Frankfurt
Why Attend the STEM Day?
- Science, technology, engineering and mathematics rely on HPC.
- STEM degree programs do not usually include HPC courses in their curricula.
- HPC is the technological foundation of machine learning, AI and the Internet of Things.
- Well-paying HPC-related jobs are not being filled due to a shortage of HPC skills.
- Depending on your skills, the salary is between $80K and $150K.
- A free introduction to HPC & its role in STEM careers.
- An introduction to the organizations that offer training in HPC.
Feedback
Questionnaire: Anja Gerbes, Cluster Computing Course, Date: 14.06.2017

Evaluation of the Course                        (scale: very good ... very bad)
Total impression                                             ☐ ☐ ☐ ☐ ☐ ☐
How would you evaluate ...
... the content and target of the workshop?                  ☐ ☐ ☐ ☐ ☐ ☐
      actuality                                              ☐ ☐ ☐ ☐ ☐ ☐
      comprehensibility                                      ☐ ☐ ☐ ☐ ☐ ☐
      relevance of content                                   ☐ ☐ ☐ ☐ ☐ ☐
      practical relevance                                    ☐ ☐ ☐ ☐ ☐ ☐
      handout                                                ☐ ☐ ☐ ☐ ☐ ☐
... the professional competence of the course instructor?    ☐ ☐ ☐ ☐ ☐ ☐
... the presentation?                                        ☐ ☐ ☐ ☐ ☐ ☐
... the methodical-didactic competence with regard to
    the structure of the learning content                    ☐ ☐ ☐ ☐ ☐ ☐
    and its presentation?                                    ☐ ☐ ☐ ☐ ☐ ☐
... the participant orientation?                             ☐ ☐ ☐ ☐ ☐ ☐
... the equipment and environment?                           ☐ ☐ ☐ ☐ ☐ ☐

Which course did/would you join?                (did | would)
  UNIX ☐ ☐    TOOLS ☐ ☐    SHELL ☐ ☐    CLUSTER ☑ ☐    PYTHON ☐ ☐    CPP ☐ ☐
  TOTALVIEW ☐ ☐    MAKE ☐ ☐    HPC ☐ ☐    LIKWID ☐ ☐    VAMPIR ☐ ☐

Is there any topic missing that you are interested in?

Length of the course     ☐ adequate          ☐ too short        ☐ too long
Depth of the content     ☐ adequate          ☐ too superficial  ☐ too profound
Subject of the course    ☐ important for me  ☐ minor for me
Do you wish further courses on this subject?   ☐ Yes ☐ No
Would you recommend this course?               ☐ Yes ☐ No
Will you be using the material later on?       ☐ Yes ☐ No
Follow-up courses in Python
Would you be interested in a follow-up course about ...
... TDD with Python?              ☐ Yes ☐ No
... Python project development?   ☐ Yes ☐ No
... another Python-related topic? Which one?
What did you like most about the course?
What did you like least about the course?
Which ideas and suggestions do you have for this course?
What content would you have additionally preferred?
How did you like the exercises?
How were you informed about this course?
  ☐ WWW   ☐ colleagues   ☐ direct mail   ☐ introductory courses   ☐ other
About You
Affiliation (type):
  ☐ student   ☐ PhD student   ☐ employee (but not PhD student)
Employment:
  ☐ at University of Frankfurt    ☐ at GSI
  ☐ at University of Kassel       ☐ at German federal research labs, MPI, FhG
  ☐ at University of Marburg      ☐ at other university
  ☐ at University of Gießen       ☐ at other institute/company
  ☐ at University of Darmstadt
Faculty:
  ☐ Physics   ☐ Computer Science   ☐ Mathematics   ☐ Chemistry   ☐ Biology   ☐ Engineering   ☐ other
General Contact Information
Anja Gerbes, M.Sc. Computer Science
Physics Building, room 1.127
Tel.: +49 69 798-47356
Fax: +49 69 798-47360
E-Mail: [email protected]
        [email protected]
        [email protected] (software)
        [email protected] (HPC questions)
Website: csc.uni-frankfurt.de

Public CSC meeting: every first Wednesday of the month at 10:00am
in the physics building, room 1.101.

Thank you for your attention. Questions?
SLURM Glossary
Cluster Computing in Frankfurt
Anja Gerbes, Goethe University Frankfurt/Main, Center for Scientific Computing
December 12, 2017
sbatch
Submitting a Batch Script
- Only exclusive nodes are available on the LOEWE-CSC cluster.
- The following options are therefore not relevant here:
    --exclusive    exclusive nodes
    -s, --share    shared nodes
Syntax: sbatch myBatchScript.sh
-C, --constraint=<feature>   request nodes with specific features of the cluster

  Intel   --constraint=intel20     Intel Ivy Bridge
  Intel   --constraint=broadwell   Intel Broadwell
  AMD     --constraint=dual        AMD Magny-Cours with AMD Radeon HD 5800
  GPU     --partition=gpu          Intel with AMD FirePro S10000
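A minimal sketch of a batch script that requests a node feature via --constraint; the application binary ./my_app and the time limit are placeholders:

    #!/bin/bash
    #SBATCH --constraint=broadwell   # run only on Intel Broadwell nodes
    #SBATCH --time=00:10:00          # placeholder 10-minute run-time limit

    srun ./my_app                    # placeholder application

Submit the script with sbatch myBatchScript.sh as shown above.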
-p, --partition=<partition>    specify the partition for the resource allocation
-J, --job-name=<jobname>       specify a name for the allocation
-w, --nodelist=<list>          specify a list of node names
-A, --account=<account>        select a project
-t, --time=<time>              set a limit on the total run time of the job
-a, --array=<indexes>          submit a job array
-o, --output=<name-%j>.out     save the output to a file
-e, --error=<name-%j>.err      save the error log to a file
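Putting these options together, a sketch of a batch-script header; the partition name parallel and the account project42 are hypothetical and must be replaced by values valid on your cluster:

    #!/bin/bash
    #SBATCH --partition=parallel    # hypothetical partition name
    #SBATCH --job-name=myjob
    #SBATCH --account=project42     # hypothetical project account
    #SBATCH --time=01:00:00         # 1-hour run-time limit
    #SBATCH --output=myjob-%j.out   # %j expands to the job ID
    #SBATCH --error=myjob-%j.err

    srun ./my_app                   # placeholder application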
-N, --nodes=<min[-max]>        number of nodes requested
                               (how many nodes will be allocated to this job?)
-n, --ntasks=<number>          total number of processes
    --ntasks-per-node          ... per node
    --ntasks-per-socket        ... per socket
    --ntasks-per-core          ... per core
                               (how many processes are started?)
-c, --cpus-per-task=<ncpus>    number of processors per task; controls the number
                               of CPUs allocated to each task
                               (how many threads run on the node?)
    --tasks-per-node           processes per node
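As a sketch of how these options interact, the following hypothetical hybrid MPI/OpenMP job starts 8 processes on 2 nodes with 6 threads each (./my_hybrid_app is a placeholder):

    #!/bin/bash
    #SBATCH --nodes=2             # 2 nodes
    #SBATCH --ntasks-per-node=4   # 4 processes per node, 8 in total
    #SBATCH --cpus-per-task=6     # 6 CPUs (threads) per process

    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK  # one OpenMP thread per allocated CPU
    srun ./my_hybrid_app          # placeholder hybrid MPI/OpenMP binary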
--mem=<MB>            set the total memory across all cores
--mem-per-cpu=<MB>    set the value for each requested core
--mem_bind=<type>     bind tasks to memory
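Note that --mem and --mem-per-cpu are mutually exclusive. A short sketch with hypothetical values:

    #SBATCH --ntasks=16
    #SBATCH --mem-per-cpu=2000    # 2000 MB for each requested core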
--mail-user=<email>    e-mail address for notifications
--mail-type=<mode>     which status information to send

Choose a mail type: BEGIN, END, FAIL, REQUEUE and ALL.
You will receive an e-mail notification about the corresponding job status changes.
-m, --distribution=<block|cyclic|arbitrary|plane>   mapping of processes

The tasks are distributed as follows:
  block       consecutive tasks share a node
  cyclic      tasks are distributed over consecutive nodes (round-robin concept)
  arbitrary   as specified in the environment variable SLURM_HOSTFILE
  plane       in blocks of a predetermined size
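For example, with 8 tasks on 2 nodes (hypothetical sizes), cyclic distribution places the even-numbered tasks on the first node and the odd-numbered tasks on the second:

    #SBATCH --nodes=2
    #SBATCH --ntasks=8
    #SBATCH --distribution=cyclic   # round-robin over the two nodes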
Process Binding
Constrains each Process to run on specific Processors
srun --cpu_bind                        process binding to cores & CPUs
mpirun --bind-to <core|socket|none>    bind processes to cores, to sockets, or not at all
mpirun --cpus-per-proc <#perproc>      bind each process to the specified number of CPUs
mpirun --report-bindings               report any bindings for launched processes
mpirun --slot-list <id>                list of processor IDs to be used for binding MPI processes
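Two sketches, one for each launcher; ./my_app is a placeholder, and the --bind-to core form assumes a recent Open MPI:

    srun --cpu_bind=cores ./my_app                     # SLURM: bind each task to its cores
    mpirun --bind-to core --report-bindings ./my_app   # Open MPI: bind and report the bindings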
sacct
Displaying Accounting Data for all Jobs and Job Steps
Syntax: sacct [options]
-b, --brief         display jobid, status and exitcode
-e, --helpformat    print a list of available fields
-o, --format        comma-separated list of fields
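For example (job ID 12345 is hypothetical):

    sacct -j 12345 -o JobID,JobName,Elapsed,State,ExitCode   # selected fields for one job
    sacct -b                                                 # brief listing: jobid, status, exitcode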
sacctmgr
Showing Slurm Account Information
Syntax: sacctmgr [options] [command]
list | show display information about the specified entity
sacctmgr list QOS partition format=maxnodes,maxnodesperuser,maxjobsperuser,maxsubmitjobsperuser
In addition to QOS, the following entities exist: Account, Association, Cluster, Configuration, Event, Problem, Transaction, User & WCKey.
scancel
Deleting Your Own Batch Jobs
Syntax: scancel <jobid>
-u <username>           cancel all jobs of a user
-t PD -u <username>     cancel all pending jobs of a user
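Typical usage (job ID 12345 is hypothetical):

    scancel 12345             # cancel a single job
    scancel -u $USER          # cancel all of your jobs
    scancel -t PD -u $USER    # cancel only your pending jobs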
sinfo
Showing Information about Nodes and Partitions
Syntax: sinfo [options]
-l, --long            print more detailed information
-n <nodes>            print info only about the specified node(s)
-p <partition>        print info about the specified partition
-R, --list-reasons    list reasons nodes are in the down, drained, fail or failing state
-s, --summarize       list only a partition state summary with no node state details
sinfo -p partition
squeue
Querying the List of Pending and Running Jobs
Syntax: squeue [options]
-i <seconds>          report the requested information repeatedly, every <seconds>
-j <job_id_list>      print the listed job IDs
--start               report expected start time & resources to be allocated for pending jobs
-t <state_list>       print jobs in the specified states
-u <user_list>        print jobs from the listed users
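Typical usage:

    squeue -u $USER                 # all of your jobs in the queue
    squeue --start -u $USER -t PD   # expected start times of your pending jobs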
scontrol
Showing Detailed Information about Compute Nodes
Syntax: scontrol [options] [command]
-d, --details      print additional information with the show command
-o, --oneliner     print all information of a record on one line
Syntax: scontrol show [options] [command]
job <job_id>         print job information
node <name>          print node information
partition <name>     print partition information
reservation          print list of reservations
scontrol show partition
scontrol -d show job <job_id>
scontrol
Holding, Resuming & Requeuing Jobs
Syntax: scontrol hold|resume|requeue
hold <job_id>        pause a particular job
resume <job_id>      resume a particular job
requeue <job_id>     requeue (cancel & rerun) a particular job
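Typical usage (job ID 12345 is hypothetical):

    scontrol hold 12345       # pause the job
    scontrol resume 12345     # resume the job
    scontrol requeue 12345    # cancel & rerun the job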