Batch Systems
• In a number of scientific computing environments, multiple users must share a compute resource:
  – research clusters
  – supercomputing centers
• On multi-user HPC clusters, the batch system is a key component for aggregating compute nodes into a single, sharable computing resource
• The batch system becomes the “nerve center” for coordinating the use of resources and controlling the state of the system in a way that must be “fair” to its users
• As current and future expert users of large-scale compute resources, you need to be familiar with the basics of a batch system
Batch Systems
• The core functionality of all batch systems is essentially the same, regardless of the size or specific configuration of the compute hardware:
  – Multiple Job Queues:
    • queues provide an orderly environment for managing a large number of jobs
    • queues are defined with a variety of limits for maximum run times, memory usage, and processor counts; they are often assigned different priority levels as well
    • may be interactive or non-interactive
  – Job Control:
    • submission of individual jobs to do some work (e.g., serial or parallel HPC applications)
    • simple monitoring and manipulation of individual jobs, and collection of resource usage statistics (e.g., memory usage, CPU usage, and elapsed wall-clock time per job)
  – Job Scheduling:
    • policy which decides priority between individual user jobs
    • allocates resources to scheduled jobs
Batch Systems
• Job Scheduling Policies:
  – the scheduler must decide how to prioritize all the jobs on the system and allocate the necessary resources for each job (processors, memory, file systems, etc.)
  – the scheduling process can be easy or non-trivial depending on the size of the system and the desired functionality:
    • first in, first out (FIFO) scheduling: jobs are simply scheduled in the order in which they are submitted
    • political scheduling: enables some users to have higher priority than others
    • fairshare scheduling: the scheduler ensures users have equal access over time
  – Additional features may also impact scheduling order:
    • advanced reservations: resources can be reserved in advance for a particular user or job
    • backfill: can be combined with any of the scheduling paradigms to allow smaller jobs to run while waiting for enough resources to become available for larger jobs
      – backfill of smaller jobs helps maximize overall resource utilization
      – backfill can be your friend for short-duration jobs
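The effect of backfill on a FIFO queue can be sketched with a toy scheduler. This is a hypothetical simulation for illustration only, not LSF code; a real scheduler also weighs queue priorities, fairshare, reservations, and job runtimes:

```python
def start_eligible(jobs, total_procs, backfill=True):
    """jobs: list of (name, procs) in submission order.
    Returns the names of jobs that can start right now.
    Runtimes are ignored for simplicity; real backfill also checks
    that a backfilled job will finish before the blocked job's
    earliest possible start time."""
    free = total_procs
    started = []
    for name, procs in jobs:
        if procs <= free:
            started.append(name)   # job fits in the free processors
            free -= procs
        elif not backfill:
            break                  # strict FIFO: nothing may pass a blocked job
        # with backfill, keep scanning for smaller jobs that fit
    return started
```

With 12 free processors and jobs A (8 procs), B (16 procs), C (4 procs), strict FIFO starts only A because B blocks the queue, while backfill lets C run in the leftover gap while B waits.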
Batch Systems
• Common batch systems you may encounter in scientific computing:
  – Platform LSF
  – PBS
  – Loadleveler (IBM)
  – SGE
• All have similar functionality but different syntax
• Reasonably straightforward to convert your job scripts from one system to another
• All of the above include specific batch-system directives which can be placed in a shell script to request certain resources (processors, queues, etc.)
• We will focus on LSF primarily since it is the system running on Lonestar
Batch Submission Process
[Diagram: job submission flow from the internet through the head node (server) to the compute nodes C1..C4]
• Submission: bsub < job sends the job script to the server
• Queue: the job script waits for resources on the server
• Master: the compute node that executes the job script and launches ALL MPI processes (mpirun -np # ./a.out, or ibrun ./a.out)
• Launch: ssh to each compute node to start the executable (e.g., a.out)
LSF Batch System
• Lonestar uses Platform LSF for both the batch queuing system and the scheduling mechanism (provides similar functionality to PBS, but requires different commands for job submission and monitoring)
• LSF includes global fairshare, a mechanism for ensuring no one user monopolizes the computing resources
• Batch jobs are submitted on the front end and are subsequently executed on compute nodes as resources become available
• Order of job execution depends on a variety of parameters:
  – Submission Time
  – Queue Priority: some queues have higher priorities than others
  – Backfill Opportunities: small jobs may be back-filled while waiting for bigger jobs to complete
  – Fairshare Priority: users who have recently used a lot of compute resources will have a lower priority than those who are submitting new jobs
  – Advanced Reservations: jobs may be blocked in order to accommodate advanced reservations (for example, during maintenance windows)
  – Number of Actively Scheduled Jobs: there are limits on the maximum number of concurrent processors used by each user
Lonestar Queue Definitions
Queue Name   Max Runtime  Min/Max Procs  SU Charge Rate  Use
normal       24 hours     2/256          1.0             Normal usage
high         24 hours     2/256          1.8             Higher priority usage
development  15 min       1/32           1.0             Debugging and development; allows interactive jobs
hero         24 hours     >256           1.0             Large job submission; requires special permission
serial       12 hours     1/1            1.0             For serial jobs; no more than 4 jobs/user
request                                                  Special requests
spruce                                                   Debugging & development, special priority, urgent comp. env.
systest                                                  System use (TACC staff only)
Lonestar Queue Definitions
• Additional Queue Limits
  – In the normal and high queues, a maximum of 512 processors can be used at one time. Jobs requiring more processors are deferred for possible scheduling until running jobs complete. For example, a single user can have the following job combinations eligible for scheduling:
    • 2 jobs requiring 256 procs each
    • 4 jobs requiring 128 procs each
    • 8 jobs requiring 64 procs each
    • 16 jobs requiring 32 procs each
  – A maximum of 25 queued jobs per user is allowed at one time
LSF Fairshare
• A global fairshare mechanism is implemented on Lonestar to provide fair access to its substantial compute resources
• Fairshare computes a dynamic priority for each user and uses this priority in making scheduling decisions
• Dynamic priority is based on the following criteria:
  – Number of shares assigned
  – Resources used by jobs belonging to the user:
    • Number of job slots reserved
    • Run time of running jobs
    • Cumulative actual CPU time (not normalized), adjusted so that recently used CPU time is weighted more heavily than CPU time used in the distant past
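The criteria above can be combined into a single number, in the spirit of LSF's dynamic-priority calculation. The sketch below is illustrative only: the weight values are made up, and in LSF the actual weights (e.g. CPU_TIME_FACTOR, RUN_TIME_FACTOR) are configured by the site administrator:

```python
def dynamic_priority(shares, slots_reserved, run_time, decayed_cpu_time,
                     cpu_factor=0.7, run_factor=0.7, slot_factor=3.0):
    """Fairshare-style dynamic priority: assigned shares divided by a
    weighted sum of recent usage. decayed_cpu_time is assumed to be
    cumulative CPU time already decayed so that recent use weighs more.
    All factor defaults are arbitrary illustration values."""
    usage = (cpu_factor * decayed_cpu_time
             + run_factor * run_time
             + slot_factor * (1 + slots_reserved))
    return shares / usage
```

A user who has recently burned a lot of CPU time ends up with a lower priority than one who has barely used the system, which is exactly the ordering visible in the bhpart output on the next slide.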
LSF Fairshare
• bhpart: Command to see current fairshare priority. For example:
lslogin1--> bhpart -r
HOST_PARTITION_NAME: GlobalPartition
HOSTS: all

SHARE_INFO_FOR: GlobalPartition/
USER/GROUP  SHARES  PRIORITY  STARTED  RESERVED  CPU_TIME   RUN_TIME
avijit      1       0.333     0        0         0.0        0
chona       1       0.333     0        0         0.0        0
ewalker     1       0.333     0        0         0.0        0
minyard     1       0.333     0        0         0.0        0
phaa406     1       0.333     0        0         0.0        0
bbarth      1       0.333     0        0         0.0        0
milfeld     1       0.333     0        0         2.9        0
karl        1       0.077     0        0         51203.4    0
vmcalo      1       0.000     320      0         2816754.8  7194752

(note the PRIORITY column: it drops as CPU_TIME and RUN_TIME accumulate)
Commonly Used LSF Commands

bhosts    Displays configured compute nodes and their static and dynamic resources (including job slot limits)
lsload    Displays dynamic load information for compute nodes (avg CPU usage, memory usage, available /tmp space)
bsub      Submits a batch job to LSF
bqueues   Displays information about available queues
bjobs     Displays information about running and queued jobs
bhist     Displays historical information about jobs
bstop     Suspends unfinished jobs
bresume   Resumes one or more suspended jobs
bkill     Sends a signal to kill, suspend, or resume unfinished jobs
bhpart    Displays global fairshare priority
lshosts   Displays hosts and their static resource configuration
lsuser    Shows user job information
Note: most of these commands support a "-l" argument for long listings. For example, bhist -l <jobID> will give a detailed history of a specific job. Consult the man pages for each of these commands for more information.
LSF Batch System
• LSF-defined environment variables:

LSB_ERRORFILE    name of the error file
LSB_JOBID        batch job id
LS_JOBPID        process id of the job
LSB_HOSTS        list of hosts assigned to the job; multi-CPU hosts will appear more than once (may get truncated)
LSB_QUEUE        batch queue to which the job was submitted
LSB_JOBNAME      name the user assigned to the job
LS_SUBCWD        directory of submission, i.e., this variable is set equal to $cwd when the job is submitted
LSB_INTERACTIVE  set to 'y' when the -I option is used with bsub
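Because LSB_HOSTS repeats a hostname once per allocated job slot, a job script sometimes needs to collapse it into per-host slot counts. A minimal sketch (a hypothetical helper, not part of LSF):

```python
from collections import Counter

def slots_per_host(lsb_hosts):
    """Collapse an LSB_HOSTS-style string into {host: slot count}.
    A host with multiple CPUs allocated appears once per slot,
    e.g. "compute-1-0 compute-1-0 compute-2-5"."""
    return dict(Counter(lsb_hosts.split()))
```

For example, slots_per_host("compute-1-0 compute-1-0 compute-2-5") yields two slots on compute-1-0 and one on compute-2-5.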
LSF Batch System
• Comparison of LSF, PBS and Loadleveler commands that provide similar functionality
LSF       PBS         Loadleveler
bresume   qrls | qsit llhold -r
bsub      qsub        llsubmit
bqueues   qstat       llclass
bjobs     qstat       llq
bstop     qhold       llhold
bkill     qdel        llcancel
Batch System Concerns
• Submission (need to know):
  – Required Resources
  – Run-time Environment
  – Directory of Submission
  – Directory of Execution
  – Files for stdout/stderr Return
  – Email Notification
• Job Monitoring
• Job Deletion:
  – Queued Jobs
  – Running Jobs
LSF: Basic MPI Job Script
#!/bin/csh
#BSUB -n 32          # total number of processes
#BSUB -J hello       # job name
#BSUB -o %J.out      # stdout output file name (%J = jobID)
#BSUB -e %J.err      # stderr output file name
#BSUB -q normal      # submission queue
#BSUB -P A-ccsc      # your project name
#BSUB -W 0:15        # max run time (15 minutes)

# echo pertinent environment info
echo "Master Host = "`hostname`
echo "LSF_SUBMIT_DIR: $LS_SUBCWD"
echo "PWD_DIR: "`pwd`

# execution command: ibrun is the parallel application manager
# and mpirun wrapper script for the executable
ibrun ./hello
LSF: Extended MPI Job Script

#!/bin/csh
#BSUB -n 32                # total number of processes
#BSUB -J hello             # job name
#BSUB -o %J.out            # stdout output file name (%J = jobID)
#BSUB -e %J.err            # stderr output file name
#BSUB -q normal            # submission queue
#BSUB -P A-ccsc            # your project name
#BSUB -W 0:15              # max run time (15 minutes)
#BSUB -w 'ended(1123)'     # dependency on job <1123>
#BSUB -u [email protected]    # email address
#BSUB -B                   # email when job begins execution
#BSUB -N                   # email job report information upon completion

echo "Master Host = "`hostname`
echo "LSF_SUBMIT_DIR: $LS_SUBCWD"

ibrun ./hello
LSF: Job Script Submission
• When submitting jobs to LSF using a job script, redirection is required for bsub to read the commands. Consider the following script:
lslogin1> cat job.script
#!/bin/csh
#BSUB -n 32
#BSUB -J hello
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -q normal
#BSUB -W 0:15
echo "Master Host = "`hostname`
echo "LSF_SUBMIT_DIR: $LS_SUBCWD"
echo "PWD_DIR: "`pwd`
ibrun ./hello
• To submit the job:
lslogin1% bsub < job.script
Re-direction is required!
LSF: Interactive Execution
• Several ways to run interactively
– Submit entire command to bsub directly:
> bsub -q development -I -n 2 -W 0:15 ibrun ./hello
Your job is being routed to the development queue
Job <11822> is submitted to queue <development>.
<<Waiting for dispatch ...>>
<<Starting on compute-1-0>>
 Hello, world!
 --> Process # 0 of 2 is alive. ->compute-1-0
 --> Process # 1 of 2 is alive. ->compute-1-0
– Submit using normal job script and include additional -I directive:
> bsub -I < job.script
Batch Script Suggestions
• Echo issued commands ("set -x" for ksh, "set echo" for csh)
• Avoid absolute pathnames
  – Use relative path names or environment variables ($HOME, $WORK)
• Abort the job when a critical command fails
• Print the environment
  – Include the "env" command if your batch job doesn't execute the same as in an interactive execution
• Use the "./" prefix for executing commands in the current directory
  – The dot means to look for commands in the present working directory. Not all systems include "." in your $PATH variable. (usage: ./a.out)
• Track your CPU time
LSF Job Monitoring (showq utility)

lslogin1% showq
ACTIVE JOBS--------------------
JOBID  JOBNAME       USERNAME  STATE    PROC  REMAINING  STARTTIME
11318  1024_90_96x6  vmcalo    Running  64    18:09:19   Fri Jan  9 10:43:53
11352  naf           phaa406   Running  16    17:51:15   Fri Jan  9 10:25:49
11357  24N           phaa406   Running  16    18:19:12   Fri Jan  9 10:53:46
    23 Active jobs        504 of 556 Processors Active (90.65%)

IDLE JOBS----------------------
JOBID  JOBNAME       USERNAME  STATE  PROC  WCLIMIT   QUEUETIME
11169  poroe8        xgai      Idle   128   10:00:00  Thu Jan  8 10:17:06
11645  meshconv019   bbarth    Idle   16    24:00:00  Fri Jan  9 16:24:18
     3 Idle jobs

BLOCKED JOBS-------------------
JOBID  JOBNAME       USERNAME  STATE     PROC  WCLIMIT   QUEUETIME
11319  1024_90_96x6  vmcalo    Deferred  64    24:00:00  Thu Jan  8 18:09:11
11320  1024_90_96x6  vmcalo    Deferred  64    24:00:00  Thu Jan  8 18:09:11
    17 Blocked jobs

Total Jobs: 43   Active Jobs: 23   Idle Jobs: 3   Blocked Jobs: 17
LSF Job Monitoring (bjobs command)

lslogin1% bjobs
JOBID  USER    STAT  QUEUE   FROM_HOST  EXEC_HOST       JOB_NAME    SUBMIT_TIME
11635  bbarth  RUN   normal  lonestar   2*compute-8     *shconv009  Jan  9 16:24
                                        2*compute-9-22
                                        2*compute-3-25
                                        2*compute-8-30
                                        2*compute-1-27
                                        2*compute-4-2
                                        2*compute-3-9
                                        2*compute-6-13
11640  bbarth  RUN   normal  lonestar   2*compute-3     *shconv014  Jan  9 16:24
                                        2*compute-6-2
                                        2*compute-6-5
                                        2*compute-3-12
                                        2*compute-4-27
                                        2*compute-7-28
                                        2*compute-3-5
                                        2*compute-7-5
11657  bbarth  PEND  normal  lonestar                   *shconv028  Jan  9 16:38
11658  bbarth  PEND  normal  lonestar                   *shconv029  Jan  9 16:38
11662  bbarth  PEND  normal  lonestar                   *shconv033  Jan  9 16:38
11663  bbarth  PEND  normal  lonestar                   *shconv034  Jan  9 16:38
11667  bbarth  PEND  normal  lonestar                   *shconv038  Jan  9 16:38
11668  bbarth  PEND  normal  lonestar                   *shconv039  Jan  9 16:38

Note: use "bjobs -u all" to see jobs from all users.
LSF Job Monitoring (lsuser utility)
lslogin1$ lsuser -u vap
JOBID   QUEUE   USER  NAME           PROCS  SUBMITTED
547741  normal  vap   vap_hd_sh_p96  14     Tue Jun  7 10:37:01 2005

HOST           R15s  R1m  R15m  PAGES   MEM    SWAP   TEMP
compute-11-11  2.0   2.0  1.4   4.9P/s  1840M  2038M  24320M
compute-8-3    2.0   2.0  2.0   1.9P/s  1839M  2041M  23712M
compute-7-23   2.0   2.0  1.9   2.3P/s  1838M  2038M  24752M
compute-3-19   2.0   2.0  2.0   2.6P/s  1847M  2041M  23216M
compute-14-19  2.0   2.0  2.0   2.1P/s  1851M  2040M  24752M
compute-3-21   2.0   2.0  1.7   2.0P/s  1845M  2038M  24432M
compute-13-11  2.0   2.0  1.5   1.8P/s  1841M  2040M  24752M
LSF Job Manipulation/Monitoring
• To kill a running or queued job (takes ~30 seconds to complete):
  bkill <jobID>
  bkill -r <jobID>   (use when bkill alone won't delete the job)
• To suspend a queued job:
  bstop <jobID>
• To resume a suspended job:
  bresume <jobID>
• To see more information on why a job is pending:
  bjobs -p <jobID>
• To see a historical summary of a job:
  bhist <jobID>
lslogin1> bhist 11821
Summary of time in seconds spent in various states:
JOBID  USER  JOB_NAME  PEND  PSUSP  RUN  USUSP  SSUSP  UNKWN  TOTAL
11821  karl  hello     131   0      127  0      0      0      258