Date post: | 15-Jan-2016 |
Critical Flags, Variables, and Other Important ALCF Minutiae
Jini Ramprakash, Technical Support Specialist
Argonne Leadership Computing Facility
Argonne Leadership Computing Facility - supported by the Office of Science of the U.S. Department of Energy
Presentation outline
It’s all about your job!
  – Job management
  – Job basics
    • Submission
    • Queuing
    • Execution
    • Termination
Software environment
Optimization for beginners
ALCF resources, outlined
Job management
Cobalt (the ALCF resource scheduler) is used on all ALCF systems
  – Similar to PBS but not the same
  – Find more information at http://trac.mcs.anl.gov/projects/cobalt
Job management commands:
  – qsub: submit a job
  – qstat: query a job status
  – qdel: delete a job
  – qalter: alter batched job parameters
  – qmove: move job to different queue
  – qhold: place queued (non-running) job on hold
  – qrls: release hold on job
  – showres: show current and future reservations
Job basics – submission
Two modes of submitting jobs
  – Basic
  – Script mode
Get all flags and options by running 'man qsub'
For example:
  qsub -A alchemy -n 40960 --mode c1 -t 720 --env "OMP_NUM_THREADS=4" lead_to_gold
  – In English: Charge project "Alchemy" for this job. Run on 40960 nodes, with one MPI rank per node. Run for 720 minutes. Set the "OMP_NUM_THREADS" environment variable to 4. Run the "lead_to_gold" binary.
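The example submission can also be assembled programmatically, which makes each flag's role explicit. The following is a minimal Python sketch, not an ALCF tool: the helper names `build_qsub_command` and `ranks_per_node` are hypothetical, and reading mode "cN" as N MPI ranks per node is an assumption generalized from the slide's "c1 = one MPI rank per node".

```python
import shlex


def build_qsub_command(project, nodes, mode, minutes, env, binary):
    """Assemble the basic-mode qsub call from the slide as an argument list.

    Hypothetical helper for illustration; only flags shown on the slide are used.
    """
    return [
        "qsub",
        "-A", project,        # project to charge
        "-n", str(nodes),     # node count
        "--mode", mode,       # c1 = one MPI rank per node (per the slide)
        "-t", str(minutes),   # wall-time in minutes
        "--env", env,         # environment variable for the job
        binary,               # executable to run
    ]


def ranks_per_node(mode):
    """Assumption: a mode named 'cN' means N MPI ranks per node."""
    return int(mode[1:])


cmd = build_qsub_command(
    "alchemy", 40960, "c1", 720, "OMP_NUM_THREADS=4", "lead_to_gold"
)
total_ranks = 40960 * ranks_per_node("c1")
walltime_hours = 720 / 60

print(shlex.join(cmd))
print("MPI ranks:", total_ranks, "wall-time hours:", walltime_hours)
```

Keeping the command as a list (rather than one string) avoids quoting mistakes if it is later handed to a process launcher.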
qsub checks your submission for sanity
Did you specify a nodecount and walltime? Are they legal?
Is the mode you specified valid?
Did you ask for more than the minimum runtime?
Are you a member of the project you specified?
Does that project have a usable allocation?
If so … all systems go! Get a JOBID, and put it in the queue
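The checks above can be pictured as a simple validation pass. This is only a sketch of the idea in Python, not Cobalt's actual code: the limits (`MAX_NODES`, `MIN_RUNTIME_MIN`), the mode list, and the lookup sets are all hypothetical stand-ins for site policy.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Hypothetical policy values, for illustration only.
MAX_NODES = 49152                 # Mira's node count, taken from the resources slide
MIN_RUNTIME_MIN = 5               # assumed minimum runtime in minutes
VALID_MODES = {"c1", "c2", "c4", "c8", "c16", "c32", "c64", "script"}  # assumed


@dataclass
class Submission:
    project: str
    nodes: Optional[int]
    walltime_min: Optional[int]
    mode: str


def check_submission(sub: Submission,
                     user_projects: Set[str],
                     usable_allocations: Set[str]) -> List[str]:
    """Return a list of problems; an empty list means 'all systems go'."""
    errors = []
    if sub.nodes is None or sub.walltime_min is None:
        errors.append("nodecount and walltime are required")
    else:
        if not 1 <= sub.nodes <= MAX_NODES:
            errors.append("illegal nodecount")
        if sub.walltime_min < MIN_RUNTIME_MIN:
            errors.append("walltime below minimum runtime")
    if sub.mode not in VALID_MODES:
        errors.append("invalid mode")
    if sub.project not in user_projects:
        errors.append("not a member of project")
    if sub.project not in usable_allocations:
        errors.append("project has no usable allocation")
    return errors


good = check_submission(Submission("alchemy", 40960, 720, "c1"),
                        {"alchemy"}, {"alchemy"})
bad = check_submission(Submission("alchemy", None, 720, "c3"),
                       {"alchemy"}, {"alchemy"})
print(good)
print(bad)
```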
Not there yet!
Job basics - life in the queue
Periodically, your job’s score will increase
Periodically, the scheduler will decide if there are any jobs it wants to run
Check current state with qstat
At some point, your score will be high enough, and it will be YOUR TURN!
Score accrual
Large jobs are prioritized
Jobs that have been waiting long are prioritized
INCITE/ALCC projects are prioritized
Negative allocations have a score cap lower than the starting score of other jobs
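To make the four rules concrete, here is a toy scoring function in Python. It is not the real Cobalt or ALCF scoring policy: every constant and weight below is invented for illustration, and only the qualitative behavior (bigger, older, and INCITE/ALCC jobs score higher; negative allocations are capped below the starting score) comes from the slide.

```python
# All numbers are hypothetical; only the ordering they produce mirrors the slide.
START_SCORE = 100.0         # assumed starting score for a normal job
NEGATIVE_ALLOC_CAP = 50.0   # assumed cap, deliberately below START_SCORE
TOTAL_NODES = 49152         # Mira's node count from the resources slide


def job_score(nodes, hours_waiting, prioritized_project, negative_allocation):
    """Toy score: grows with job size, wait time, and INCITE/ALCC status."""
    score = START_SCORE
    score += 100.0 * nodes / TOTAL_NODES   # large jobs are prioritized
    score += 1.0 * hours_waiting           # long-waiting jobs are prioritized
    if prioritized_project:                # INCITE/ALCC projects are prioritized
        score += 25.0
    if negative_allocation:                # capped below every other job's start
        score = min(score, NEGATIVE_ALLOC_CAP)
    return score


small = job_score(512, 1, False, False)
large = job_score(8192, 1, False, False)
overdrawn = job_score(49152, 100, True, True)
print(small, large, overdrawn)
```

The cap is applied last so that no amount of waiting or job size lifts an overdrawn project above a freshly submitted job from a project in good standing.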
Job basics - execution
Book-keeping
  – Put a start record in the database
  – Output a log file start record
  – Send email of job start if --notify was requested
  – Start job timers
Fire up the job for execution
  – Cobalt boots partition
  – runjob starts executable
Script mode jobs
All jobs launch via runjob on the service nodes
Script mode jobs launch your script on a special login node
That script is responsible for calling runjob to launch the actual compute-node job
You are charged for the duration of the script
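A script-mode job therefore boils down to a script that builds and issues a runjob call. The Python sketch below only constructs and prints that call. The runjob options used (`--np`, `-p`, `--block`, `--envs`, and the `:` separator before the executable) and the `COBALT_PARTNAME` environment variable are assumptions based on common BG/Q usage; check `man runjob` on the system before relying on them. The helper name `build_runjob_command` is hypothetical.

```python
import os


def build_runjob_command(nodes, ranks_per_node, binary, env=None, block=None):
    """Build a runjob argument list for a script-mode job (illustrative only)."""
    if block is None:
        # Assumption: Cobalt exports the booted partition name in this variable.
        block = os.environ.get("COBALT_PARTNAME", "<partition>")
    cmd = [
        "runjob",
        "--np", str(nodes * ranks_per_node),   # total MPI ranks
        "-p", str(ranks_per_node),             # ranks per node
        "--block", block,                      # partition to run on
    ]
    for name, value in (env or {}).items():
        cmd += ["--envs", f"{name}={value}"]
    cmd += [":", binary]                       # executable follows the colon
    return cmd


cmd = build_runjob_command(128, 16, "./lead_to_gold",
                           env={"OMP_NUM_THREADS": 4}, block="EXAMPLE-BLOCK")
print(" ".join(cmd))
# In a real job script one would launch it, e.g. with
# subprocess.run(cmd, check=True); it is left out so the sketch runs anywhere.
```

Because the whole script's duration is charged, any pre- or post-processing placed around the runjob call should be kept short.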
Job basics – termination aka are we there yet?
Your requested wall-time ticks down
Either your runjob returns, or you run out of wall-time and your job is forcibly removed
Job-end cleanup happens
  – If your partition wasn’t cleaned up, that happens now
Job-end book-keeping happens
  – Database, log file, notify if requested
Job basics – Termination, life after your job
If you had a job depending on you, it can be released to run
If you had a non-zero exit code, it moves to dep_fail instead
That night, the log files will be fed into clusterbank (the ALCF accounting system) to create charges
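The dependency rule reduces to a single decision on the finished job's exit code. A minimal Python sketch, with a hypothetical function name and "released" used as a made-up label for the normal case (only "dep_fail" is a state named on the slide):

```python
def dependent_job_state(parent_exit_code):
    """State of a job that was waiting on a parent job that has just ended.

    'released' is an illustrative label; 'dep_fail' is the state named on the slide.
    """
    return "released" if parent_exit_code == 0 else "dep_fail"


print(dependent_job_state(0))
print(dependent_job_state(1))
```

In practice this means a job script should return a non-zero exit code only when downstream jobs really should not run.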
Non-standard job events
Reservations and/or draining
qsub rejection
Job holds
Job redefinition (qalter)
Job removal (qdel)
Abnormal job failure
Why isn’t this job running?
Software environment - SoftEnv
A tool for managing your environment
  – Sets your PATH to access desired front-end tools
  – Your compiler version can be changed here
Settings:
  – Maintained in the file ~/.soft
  – Add/remove keywords from ~/.soft to change environment
  – Make sure @default is at the very end
Commands:
  – softenv
    • a list of all keywords defined on the systems
  – resoft
    • reloads initial environment from ~/.soft file
  – soft add|remove keyword
    • Temporarily modify environment by adding/removing keywords
http://www.mcs.anl.gov/hs/software/systems/softenv/softenv-intro.html
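Since the "@default at the very end" rule is easy to break when editing ~/.soft by hand, a small check can help. The Python sketch below is not part of SoftEnv: the function name is hypothetical, the keyword `+mpiwrapper-xl` is used only as an example entry, and treating lines that start with `#` as comments is an assumption.

```python
def default_is_last(soft_text):
    """Return True if @default is the final keyword line in a ~/.soft file."""
    keys = []
    for line in soft_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # assumption: '#' starts a comment
            continue
        keys.append(line)
    return bool(keys) and keys[-1] == "@default"


good_soft = "+mpiwrapper-xl\n@default\n"       # example keyword, then @default
bad_soft = "@default\n+mpiwrapper-xl\n"        # @default is not last

print(default_is_last(good_soft))
print(default_is_last(bad_soft))
```

After fixing the file, running resoft reloads the environment from it, as listed above.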
Software libraries
ALCF supports two sets of libraries:
  – IBM system and provided libraries: /bgsys/drivers/ppcfloor
    • glibc
    • mpi
  – Site supported libraries and programs: /soft/
    • PETSc
    • ESSL
  – And many others
    • See http://www.alcf.anl.gov/resource-guides/software-and-libraries
Compiler wrappers
MPI wrappers for IBM XL cross-compilers:

  Wrapper      Thread-Safe Wrapper   Underlying Compiler   Description
  mpixlc       mpixlc_r              bgxlc                 IBM BG C Compiler
  mpixlcxx     mpixlcxx_r            bgxlC                 IBM BG C++ Compiler
  mpixlf77     mpixlf77_r            bgxlf                 IBM BG Fortran 77 Compiler
  mpixlf90     mpixlf90_r            bgxlf90               IBM BG Fortran 90 Compiler
  mpixlf95     mpixlf95_r            bgxlf95               IBM BG Fortran 95 Compiler
  mpixlf2003   mpixlf2003_r          bgxlf2003             IBM BG Fortran 2003 Compiler

MPI wrappers for GNU cross-compilers:

  Wrapper   Underlying Compiler          Description
  mpicc     powerpc-bgp-linux-gcc        GNU BG C Compiler
  mpicxx    powerpc-bgp-linux-g++        GNU BG C++ Compiler
  mpif77    powerpc-bgp-linux-gfortran   GNU BG Fortran 77 Compiler
  mpif90    powerpc-bgp-linux-gfortran   GNU BG Fortran 90 Compiler
Optimization for beginners
Suggested set of optimization levels from least to most optimization:
  -O0 # best level for use with a debugger
  -O2 # good level for verifying correctness, baseline perf
  -O2 -qmaxmem=-1 -qhot=level=0
  -O3 -qstrict (preserves program semantics)
  -O3
  -O3 -qhot=level=1
  -O4
  -O5
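One way to work through this ladder is to build the same source once per level and compare correctness and timing across the resulting binaries. The Python sketch below only generates the compile command lines: the source file `app.c`, the output names, and the choice of the `mpixlc_r` wrapper are placeholders, and the flag sets are copied from the list above.

```python
# Flag sets in the slide's order, least to most optimization.
LEVELS = [
    "-O0",
    "-O2",
    "-O2 -qmaxmem=-1 -qhot=level=0",
    "-O3 -qstrict",
    "-O3",
    "-O3 -qhot=level=1",
    "-O4",
    "-O5",
]


def compile_commands(source, compiler="mpixlc_r"):
    """One compile command per optimization level; names are placeholders."""
    commands = []
    for index, flags in enumerate(LEVELS):
        output = f"app_opt{index}"             # hypothetical binary name
        commands.append([compiler, *flags.split(), "-o", output, source])
    return commands


commands = compile_commands("app.c")
for cmd in commands:
    print(" ".join(cmd))
```

Validating results at -O2 first gives a baseline against which the more aggressive levels can be checked.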
Optimization tips
-qlistopt generates a listing with all flags used in compilation
-qreport produces a listing, shows how code was optimized
Performance can decrease at higher levels of optimization, especially at -O4 or -O5
May specify different optimization levels for different routines/files
ALCF Resources – BG/Q systems
Mira – BG/Q system
  – 49,152 nodes / 786,432 cores
  – 786 TB of memory
  – Peak flop rate: 10 PF
  – Linpack flop rate: 8.1 PF
Cetus (T&D) – BG/Q system
  – 1024 nodes / 16,384 cores
  – 16 TB of memory
  – Peak flop rate: 208 TF
Vesta (T&D) – BG/Q system
  – 2,048 nodes / 32,768 cores
  – 32 TB of memory
  – Peak flop rate: 416 TF
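The three systems share the same per-node and per-rack shape, which a quick calculation confirms. This Python sketch uses only the node and core counts listed above, plus the rack counts from the summary slide in this deck (Mira 48 racks, Cetus 1, Vesta 2); the dictionary layout is just for illustration.

```python
# (nodes, cores, racks) per system, taken from the slides in this deck.
SYSTEMS = {
    "Mira": (49152, 786432, 48),
    "Cetus": (1024, 16384, 1),
    "Vesta": (2048, 32768, 2),
}

shape = {}
for name, (nodes, cores, racks) in SYSTEMS.items():
    cores_per_node = cores // nodes
    nodes_per_rack = nodes // racks
    shape[name] = (cores_per_node, nodes_per_rack)
    print(f"{name}: {cores_per_node} cores/node, {nodes_per_rack} nodes/rack")
```

Knowing the cores-per-node figure helps when choosing how many MPI ranks and threads to place on each node.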
ALCF Resources – supporting systems
Tukey – Nvidia system
  – 100 nodes / 1600 x86 cores / 200 M2070 GPUs
  – 6.4 TB x86 memory / 1.2 TB GPU memory
  – Peak flop rate: 220 TF
Storage
  – Scratch: 28.8 PB raw capacity, 240 GB/s bw (GPFS)
  – Home: 1.8 PB raw capacity, 45 GB/s bw (GPFS)
  – Storage upgrade planned in 2015
ALCF Resources
Mira: 48 racks / 768K cores / 10 PF
Cetus (Dev): 1 rack / 16K cores / 208 TF
Vesta (Dev): 2 racks / 32K cores / 416 TF
Tukey (Viz): 100 nodes / 1600 cores / 200 NVIDIA GPUs / 220 TF
Networks: 100Gb (via ESnet, Internet2 UltraScienceNet)
Coming up next…
Data Transfers in the ALCF - Robert Scott, ALCF
Thank You!
Questions?