
Introduction to MARCC

Page 1: Introduction to MARCC

MARCC

1

Maryland Advanced Research Computing Center

Jaime E. Combariza, PhD

Director

Page 2: Introduction to MARCC

Slides available online

2

•www.marcc.jhu.edu/training

Page 3: Introduction to MARCC

Model & Funding

3

• Grant from the State of Maryland to Johns Hopkins University to build an HPC/big data facility
• Building, IT stack, and networking
• Operational cost covered by 5 schools:
  • Krieger School of Arts & Sciences (JHU)
  • Whiting School of Engineering (JHU)
  • School of Medicine (JHU)
  • Bloomberg School of Public Health (JHU)
  • University of Maryland at College Park (UMCP)

Page 4: Introduction to MARCC

4

Page 5: Introduction to MARCC

Hardware

5

Count Description

676 Compute nodes, 128GB RAM, 24 cores, 2.5GHz Haswell processors

50 Large memory nodes, 1024 GB RAM, 48 cores

48 GPU nodes, 2 Nvidia K80 GPUs/node, 24 CPU cores

2 PBytes High Performance File System (Lustre)

18 PBytes ZFS File System

FDR-14 InfiniBand (compute nodes and storage), 56 Gbps

Rpeak ~900 TFLOPS, Rmax = 476 TFLOPS (top 10 among US universities)

Page 6: Introduction to MARCC

HPC Resources & Model

6

•Approx 19,000 cores and 20 PB storage

KSAS: 10 M/quarter
WSE: 10 M/quarter
SOM: 6 M/quarter
BSPH: 6 M/quarter
UMCP: 6 M/quarter
Reserve: 1.8 M/quarter

Page 7: Introduction to MARCC

Allocations

7

• Deans requested applications from all faculty members
• Allocations granted according to available resources
• Trial period until the end of August

•http://marcc.jhu.edu/request-access/marcc-allocation-

Page 8: Introduction to MARCC

Remarks

8

• MARCC is free of cost to PIs; the schools pay for operations
• Authentication is via two-factor authentication
• At this time only open data (no HIPAA)

Page 9: Introduction to MARCC

Storage

9

Directory | Quota | Backup (cost) | Additional storage
/home | 20 GBytes on ZFS | YES (no cost) | NO
~/scratch | 1 TB per group on Lustre, user access | NO | YES (>10 TB, Vice Dean)
~/work | shared quota with ~/scratch, group access
~/data | 10 TB per group on ZFS, request MARCC | YES (no cost) | N/A
~/work-zfs | 50 TB per group on ZFS, request Vice Dean | YES ($30/TB/yr) | $30/TB/yr + backup cost
~/project | < 6 months, upon request (Vice Dean) | NO | N/A

Page 10: Introduction to MARCC

Connecting

10

• Windows: PuTTY, OpenSSH, Cygwin
• Mac: X11 or XQuartz
• VNC (soon)

• ssh [-Y] gateway2.marcc.jhu.edu -l userid
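
For convenience, the gateway can also be captured in an SSH client configuration so the login shortens to "ssh marcc"; a minimal sketch, where the "marcc" alias is an assumption and only the hostname comes from this slide:

  # ~/.ssh/config (the "marcc" alias is illustrative)
  Host marcc
      HostName gateway2.marcc.jhu.edu
      User userid          # replace with your MARCC userid
      ForwardX11 yes       # same effect as ssh -Y

  ssh marcc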

Page 11: Introduction to MARCC

File transfer

11

1. ssh dtn5.marcc.jhu.edu -l userid
   ssh dtn2.marcc.jhu.edu -l userid
2. Use Globus Connect:
   • Request an account (www.globus.org)
   • Download the client or use the website
   • Create an endpoint (if needed)
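
Command-line transfers through the data transfer nodes work with the usual tools as well; a short sketch with placeholder file and directory names:

  # copy a single file to your MARCC home directory
  scp results.tar.gz userid@dtn5.marcc.jhu.edu:~/
  # sync a local directory into scratch; incremental and restartable
  rsync -avP mydata/ userid@dtn5.marcc.jhu.edu:~/scratch/mydata/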

Page 12: Introduction to MARCC

12

Modules

Page 13: Introduction to MARCC

Software & Modules

13

• MARCC will manage software availability using "environment modules" (Lmod from TACC)
• module --help
• module avail
• module spider python
• module list

Page 14: Introduction to MARCC

More on Modules

14

• "module load" prepends the loaded package to the user's PATH and related environment variables
• Example: python
  > which python
  /usr/bin/python
• module show python  [shows how the user's environment will be modified: PATH, LIBRARY_PATH]

Page 15: Introduction to MARCC

Modules defaults

15

/cm/shared/modulefiles/python/2.7.9.lua:
----------------------------------
whatis("adds Python to your environment variables")
prepend_path("PATH", "/cm/shared/apps/python/2.7.9/bin")
prepend_path("LD_LIBRARY_PATH", "/cm/shared/apps/python/2.7.9/lib")
prepend_path("LIBRARY_PATH", "/cm/shared/apps/python/2.7.9/lib")
prepend_path("MANPATH", "/cm/shared/apps/python/2.7.9/share/man")
prepend_path("CPATH", "/cm/shared/apps/python/2.7.9/include")

Page 16: Introduction to MARCC

Modules Examples

16

[jcombar1@login-node01 ~]$ module load python
[jcombar1@login-node01 ~]$ module list
Currently Loaded Modules:
  1) gcc/4.8.2   2) slurm/14.11.03   3) python/2.7.9
[jcombar1@login-node01 ~]$ which python
/cm/shared/apps/python/2.7.9/bin/python
[jcombar1@login-node01 ~]$ echo $PATH
/cm/shared/apps/python/2.7.9/bin:.:/home/jcombar1/MPI/bin:/home/jcombar1/opt/pb/binutils-2.24/bin:/cm/shared/apps/slurm/current/bin:/cm/shared/apps/slurm/14.11.3/sbin:/cm/shared/apps/slurm/14.11.3/bin:/cm/shared/apps/gcc/4.8.2/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/

Page 17: Introduction to MARCC

module help application

17

• module help gaussian

------------------- Module Specific Help for "gaussian/g09" --------------------
This is a computational chemistry application.

Web site: http://www.gaussian.com

Manual online: http://www.gaussian.com/g_tech/g_ur/g09_help.htm

To run it in batch mode use a script like this one:

#!/bin/bash -l
#SBATCH --time=1:0:0
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --partition=shared
#SBATCH --mem=120000MB
#### THE ABOVE REQUESTS 120GB RAM
module load gaussian
mkdir -p /Scratch/$USER/$SLURM_JOBID
export GAUSS_SCRDIR=/Scratch/$USER/$SLURM_JOBID
date
time g09 water.com
date

Page 18: Introduction to MARCC

18

SLURM

Page 19: Introduction to MARCC

Queuing system

19

• MARCC allocates resources to users through a transparent and fair process by means of a "queueing system"
• SLURM (Simple Linux Utility for Resource Management)
• Open source, adopted at many HPC centers

Page 20: Introduction to MARCC

SLURM Commands

20

• man slurm
• http://marcc.jhu.edu/getting-started/running-jobs/

Page 21: Introduction to MARCC

Queues/Partitions

21

• Link to website

Partition | Default/Max Time | Default/Max Cores | Default/Max Mem | Serial/Parallel | Backfilling
SHARED | 1 hr / 7 d | 1/24 | 5 GB / 128 GB | Serial/Parallel | Shared
UNLIMITED | Unlimited | 1/24 | 5 GB / 128 GB | Serial/Parallel | Shared
PARALLEL | 1 hr / 7 d | | 5 GB / 128 GB | Parallel | Exclusive
GPU | Unlimited | 6/24 | 5 GB / 128 GB | Serial/Parallel | Shared
LRGMEM | Unlimited | 48 | 120 GB / 1024 GB | Serial/Parallel | Shared
Scavenger | Max 12 hr | 1/24 | 5 GB / 128 GB | Serial/Parallel | Shared

Page 22: Introduction to MARCC

Partitions/Shared

22

• Share compute nodes
• Serial or parallel jobs
• Time limits: 1 hr to 7 days
• 1-24 cores
• One node (see the sketch below)
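
A minimal batch header for the shared partition might look like this; the specific values are illustrative and stay within the limits listed in the partition table:

  #!/bin/bash -l
  #SBATCH --partition=shared
  #SBATCH --time=2:00:00          # within the 1 hr default / 7 day maximum
  #SBATCH --nodes=1               # shared jobs run within one node
  #SBATCH --ntasks-per-node=8     # 1-24 cores
  #SBATCH --mem=40G               # illustrative; the node limit is 128 GB

  ./my_program                    # placeholder executable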

Page 23: Introduction to MARCC

Partitions/Unlimited

23

• Unlimited time!!!
• Jobs that need to run for more than 7 days
• If the system/node crashes, the job will be killed
• One to 24 cores, one or multiple nodes
• Serial or parallel

Page 24: Introduction to MARCC

Partitions/Parallel

24

• Dedicated queue
• Exclusive nodes
• Single- and multi-node jobs
• 1 hr to 7 days
• Parallel jobs only (see the sketch below)
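
A sketch of a multi-node job for the parallel partition, assuming an MPI executable built as shown on the MPI slide; node count and names are placeholders:

  #!/bin/bash -l
  #SBATCH --partition=parallel
  #SBATCH --time=24:00:00
  #SBATCH --nodes=4                # nodes are allocated exclusively
  #SBATCH --ntasks-per-node=24     # use all 24 cores per node

  module load mvapich2
  mpiexec ./code.x                 # placeholder MPI executable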

Page 25: Introduction to MARCC

Partitions/scavenger

25

• Must be used with qos=scavenger
• Low-priority jobs
• Maximum time: 12 hours
• Jobs may be killed if higher-priority jobs need the resources
• Use only if your allocation has run out
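
An illustrative header for a scavenger job; per the notes above, both the partition and the qos must be requested (exact spelling of the partition name is assumed to match the table):

  #SBATCH --partition=scavenger
  #SBATCH --qos=scavenger
  #SBATCH --time=12:00:00          # 12-hour maximum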

Page 26: Introduction to MARCC

SLURM Flags

26

Description: Flag
Script directive: #SBATCH
Job name: #SBATCH --job-name=Any-name
Requested time: #SBATCH -t minutes  or  #SBATCH -t [days-hrs:min:sec]
Nodes requested: #SBATCH -N min-Max  or  #SBATCH --nodes=Number
Number of cores per node: #SBATCH --ntasks-per-node=12
Number of cores per task: #SBATCH --cpus-per-task=2
Mail: #SBATCH --mail-type=end
User's email address: #SBATCH --mail-user=[email protected]
Memory size: #SBATCH --mem=[mem|M|G|T]
Job arrays: #SBATCH --array=[array_spec]
Request specific resource: #SBATCH --constraint="XXX"

Page 27: Introduction to MARCC

SLURM Env variables

27

Description: Variable
JobID: $SLURM_JOBID
Submit directory: $SLURM_SUBMIT_DIR
Submit host: $SLURM_SUBMIT_HOST
Node list: $SLURM_JOB_NODELIST
Job array index: $SLURM_ARRAY_TASK_ID

> printenv | grep SLURM
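
Inside a batch script these variables can be used directly, for example to record where a job ran; a minimal sketch:

  #!/bin/bash -l
  #SBATCH --job-name=env-demo
  #SBATCH --time=0:05:00

  echo "Job $SLURM_JOBID submitted from $SLURM_SUBMIT_DIR on $SLURM_SUBMIT_HOST"
  echo "Allocated nodes: $SLURM_JOB_NODELIST"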

Page 28: Introduction to MARCC

SLURM Scripts

28

cp -r /home/jcombar1/scripts .

#!/bin/bash -l
#SBATCH --job-name=MyJob
#SBATCH --time=24:0:0
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=24
#SBATCH --mail-type=end
#SBATCH --mail-user=[email protected]

module load mvapich2/gcc/64/2.0b   #### load mvapich2 module

Page 29: Introduction to MARCC

Running Jobs

29

• sbatch (qsub) script-name
• squeue (qstat -a) -u userid

[[email protected]]$ qstat -a -u [email protected]

login-vmnode01.cm.cluster:
                                                                  Req'd  Req'd   Elap
Job id               Username Queue    Name                 SessID  NDS   TSK  Memory  Time  Use S  Time
-------------------- -------- -------- -------------------- ------ ----- ----- ------ ----- - -----
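
The SLURM-native workflow is simply submit, then check; the script name below is a placeholder:

  sbatch myjob.sh          # submit the batch script
  squeue -u $USER          # list your queued and running jobs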

Page 30: Introduction to MARCC

SLURM commands

30

• scancel (qdel) jobid
• scontrol show job jobID
• sinfo
• sinfo -p shared

Page 31: Introduction to MARCC

Interactive work

31

• interact -usage

usage: interact [-n cores] [-t walltime] [-m memory] [-p queue]
                [-o outfile] [-X] [-f featurelist] [-h hostname] [-g ngpus]
Interactive session on a compute node
options:
  -n cores        (default: 1)
  -t walltime     as hh:mm:ss (default: 30:00)
  -m memory       as #[k|m|g] (default: 5GB)
  -p partition    (default: 'def')
  -o outfile      save a copy of the session's output to outfile (default: off)
  -X              enable X forwarding (default: no)
  -f featurelist  CCV-defined node features (e.g., 'e5-2600'),
                  combined with '&' and '|' (default: none)
  -h hostname     only run on the specific node 'hostname'
                  (default: none, use any available node)
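
For example, a 2-hour interactive session with 4 cores and 8 GB of memory on the shared partition, using the flags documented above:

  interact -n 4 -t 2:00:00 -m 8g -p shared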

Page 32: Introduction to MARCC

Job arrays

32

#!/bin/bash -l
#SBATCH --job-name=job-array
#SBATCH --time=1:0:0
#SBATCH --array=1-240
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --partition=shared
#SBATCH --mem=120000MB
#SBATCH --mail-type=end
#SBATCH --mail-user=[email protected]

# run your job

echo "Start Job $SLURM_ARRAY_TASK_ID on $HOSTNAME"
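
Each array task usually selects its own input from its index; a sketch assuming inputs named input_1.txt through input_240.txt (a hypothetical naming scheme):

  # inside the array script above (hypothetical file naming)
  INPUT=input_${SLURM_ARRAY_TASK_ID}.txt
  ./my_program "$INPUT"            # placeholder executable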

Page 33: Introduction to MARCC

GPUs and Interactive Jobs

33

• #SBATCH -p gpu --gres=gpu:4

•interact -p gpu -g 1
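
A minimal GPU batch header combining the flags above; the GPU count and time are illustrative, and the cuda module name is an assumption:

  #!/bin/bash -l
  #SBATCH --partition=gpu
  #SBATCH --gres=gpu:1             # one of the node's K80 GPUs
  #SBATCH --ntasks-per-node=6      # cores to pair with the GPU (illustrative)
  #SBATCH --time=4:00:00

  module load cuda                 # assumption: a cuda module is available
  ./my_gpu_program                 # placeholder executable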

Page 34: Introduction to MARCC

Compilers/Compiling

34

• Intel compilers
• module list
• ifort -O3 -openmp -o my.exe my.f90
• icc -g -o myc.x myc.c

• GNU
• gfortran -O4 -o myg.x myg.f90
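
A quick OpenMP build-and-run check with the Intel compiler; the module name, source file, and thread count are placeholders:

  module load intel                # assumption: Intel compilers are provided as a module
  ifort -O3 -openmp -o my.exe my.f90
  export OMP_NUM_THREADS=8
  ./my.exe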

Page 35: Introduction to MARCC

MPI jobs

35

• module spider mpi
• module load mvapich2
• mpif90 or mpicc code(.f90 or .c)

• mpiexec code.x (within a compute node)
• Use mpiicc or mpif90 (Intel MPI)
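
Putting the compile and run steps together; file names are placeholders, and the run is assumed to happen inside a job on a compute node:

  module load mvapich2
  mpif90 -O3 -o code.x code.f90    # or: mpicc -O3 -o code.x code.c
  mpiexec ./code.x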

Page 36: Introduction to MARCC

Warning

36

• No refunds, so make sure you are using MARCC resources effectively

Page 37: Introduction to MARCC

Information

37

[email protected]

•Web site marcc.jhu.edu

