Post on 12-Mar-2020
© 2007 IBM Corporation
IBM Global Engineering Solutions
IBM Blue Gene/P
LoadLeveler Blue Gene Support
June 2010
IBM Blue Gene/P System Administration
Interaction with Blue Gene
[Diagram: jobs are submitted to LoadLeveler, which uses the Blue Gene Bridge API to get resources and jobs data and to find resources for jobs and define partitions, then uses Blue Gene mpirun to run a job on Blue Gene.]
For LoadLeveler documentation: http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.infocenter.doc/library.html
Starting LoadLeveler
• llctl start on both the FEN (front-end node) and the SN (service node)
• llstatus: look for “Blue Gene is present”
• llstatus -b
  Name  Base Partitions  c-nodes   InQ  Run
  BGP   4x4x2            32x32x16  0    0
• llstatus -B all: show all base partitions
• llstatus -P <partition_name>
• llstatus -b -l: show more Blue Gene resource details
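The two llstatus checks above can be scripted. A minimal sketch, assuming the marker text "Blue Gene is present" and the `llstatus -b` column layout shown on this slide; `bg_present` and `bg_dimensions` are hypothetical helper names, and both read standard input so they can be tried on saved output.

```shell
# Hypothetical helper: succeed if llstatus output (on stdin) reports Blue Gene.
bg_present() {
  # grep -q exits 0 on the first match, non-zero if the marker is absent
  grep -q "Blue Gene is present"
}

# Hypothetical helper: summarise "llstatus -b" output (on stdin).
# Assumes the columns Name, Base Partitions, c-nodes, InQ, Run; skips the header.
bg_dimensions() {
  awk 'NR > 1 && NF >= 5 { print $1 ": base partitions " $2 ", c-nodes " $3 }'
}

# Usage on the FEN (requires LoadLeveler):
#   llstatus | bg_present && echo "Blue Gene support is active"
#   llstatus -b | bg_dimensions
```

For the output above, `bg_dimensions` prints `BGP: base partitions 4x4x2, c-nodes 32x32x16`.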
LoadLeveler Job Command File
# @ job_name = myjob
# @ comment = "BG Job by Size"
# @ error = $(home)/output/$(job_name).$(jobid).err
# @ output = $(home)/output/$(job_name).$(jobid).out
# @ environment = COPY_ALL;
# @ wall_clock_limit = 00:20:00
# @ notification = error
# @ notify_user = $(user)@us.ibm.com
# @ job_type = bluegene
# @ bg_size = 32
# @ queue
/usr/bin/mpirun -exe /bgtest/hello.rts -verbose 1
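When many jobs differ only in size or executable, the job command file above can be generated rather than copied by hand. A sketch: `make_bg_jcf` is a hypothetical helper, the keywords are exactly those on this slide, and LoadLeveler's own `$(...)` variables are escaped so the shell leaves them for LoadLeveler to expand.

```shell
# Hypothetical helper: print a Blue Gene job command file to stdout.
#   $1 = bg_size (number of compute nodes), $2 = executable passed to mpirun
make_bg_jcf() {
  size=$1
  exe=$2
  # Unquoted here-document: $size and $exe expand, \$(...) stays literal
  cat <<EOF
# @ job_name = myjob
# @ comment = "BG Job by Size"
# @ error = \$(home)/output/\$(job_name).\$(jobid).err
# @ output = \$(home)/output/\$(job_name).\$(jobid).out
# @ environment = COPY_ALL;
# @ wall_clock_limit = 00:20:00
# @ notification = error
# @ notify_user = \$(user)@us.ibm.com
# @ job_type = bluegene
# @ bg_size = $size
# @ queue
/usr/bin/mpirun -exe $exe -verbose 1
EOF
}
```

For example, `make_bg_jcf 32 /bgtest/hello.rts > myjob.jcf` reproduces the file above, ready for `llsubmit myjob.jcf`.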
Parallel Job command file on standard system
#!/bin/bash
#@ job_type = BLUEGENE
#@ job_name = my_job_name
#@ output = $(home)/$(job_name).$(jobid).out
#@ error = $(home)/$(job_name).$(jobid).err
#@ executable = mpirun
#@ arguments = -verbose 1 -exe /gpfs2/IBM/tototest/toto -cwd /gpfs2/IBM/tototest/ -mode VN -np number_of_mpi_tasks
#@ class = BGL128_1H
#@ bg_size = <number_of_compute_nodes>
# or, instead of bg_size: #@ bg_partition = <partition_name>
#@ queue
Or, in place of the executable and arguments keywords, call mpirun in the script body after #@ queue:
mpirun -exe /gpfs2/IBM/tototest/toto -cwd /gpfs2/IBM/tototest/ -mode VN -np number_of_mpi_tasks
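The value passed to `-np` depends on the block size and the `-mode`. A sketch, assuming four cores per Blue Gene/P compute node, so VN mode runs up to four MPI tasks per node, DUAL two and SMP one; `max_np` is a hypothetical helper, not an mpirun option.

```shell
# Hypothetical helper: upper bound for mpirun -np.
#   $1 = number of compute nodes (bg_size), $2 = mpirun -mode (VN, DUAL or SMP)
# Assumes 4 cores per Blue Gene/P compute node.
max_np() {
  nodes=$1
  mode=$2
  case $mode in
    VN)   per_node=4 ;;   # virtual node mode: one task per core
    DUAL) per_node=2 ;;   # two tasks per node
    SMP)  per_node=1 ;;   # one task per node
    *)    echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  echo $((nodes * per_node))
}
```

With `bg_size = 32` in VN mode, `max_np 32 VN` gives 128.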
Blue Gene Job Keywords
• Mutually exclusive (exactly one must be specified):
  - bg_size: number of compute nodes
  - bg_shape: number of base partitions (BPs) in the x, y, z directions, e.g. 1x2x4
  - bg_partition: name of a predefined partition
• Optional:
  - bg_connection: MESH, TORUS or PREFER_TORUS
  - bg_rotate: True or False
  - bg_requirements: compute node (c-node) requirements, e.g. memory
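Because bg_size, bg_shape and bg_partition are mutually exclusive, a job command file can be checked before it is submitted. A sketch: `check_bg_keywords` is a hypothetical helper that only looks at `# @ keyword = value` and `#@ keyword = value` lines, compares keyword names case-insensitively (an assumption), and treats an empty value as unset.

```shell
# Hypothetical helper: succeed only if the job command file in $1 sets
# exactly one of bg_size, bg_shape, bg_partition to a non-empty value.
check_bg_keywords() {
  count=$(awk '
    /^#[ \t]*@/ {
      line = $0
      sub(/^#[ \t]*@[ \t]*/, "", line)      # strip the "# @" prefix
      split(line, kv, "=")
      key = tolower(kv[1]); val = kv[2]
      gsub(/[ \t]/, "", key); gsub(/[ \t]/, "", val)
      if ((key == "bg_size" || key == "bg_shape" || key == "bg_partition") && val != "")
        n++
    }
    END { print n + 0 }' "$1")
  if [ "$count" -ne 1 ]; then
    echo "$1: expected exactly one of bg_size, bg_shape, bg_partition (found $count)" >&2
    return 1
  fi
}
```

Usage: `check_bg_keywords my_job_command_file && llsubmit my_job_command_file`.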
Job command file keywords
account_no, arguments, class, comment, dependency, error, executable, group,
hold, initialdir, input, job_name, job_type, notification, notify_user, output,
preferences, requirements, start_date, step_name, user_priority, wall_clock_limit, bg_size, bg_partition
A number of environment variables are set by LoadLeveler for the job. These include:
LOADL_JOB_NAME
LOADL_STEP_NAME
LOADL_STEP_INITDIR
LOADL_PROCESSOR_LIST
LOADL_STEP_CLASS
etc
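A job step script reads these variables like any other environment variable. A minimal sketch; `show_step_env` is a hypothetical name, and the `:-unset` defaults exist only so the function can also be tried outside LoadLeveler. The function body is a subshell, so its `cd` does not change the caller's directory.

```shell
# Hypothetical helper: report a few LoadLeveler step variables.
show_step_env() (
  echo "job=${LOADL_JOB_NAME:-unset} step=${LOADL_STEP_NAME:-unset} class=${LOADL_STEP_CLASS:-unset}"
  # Move to the step's initial directory when LoadLeveler provides one
  cd "${LOADL_STEP_INITDIR:-.}" || exit 1
  echo "initdir=$(pwd)"
)
```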
Runtime Environment: Available to Prologs and Epilogs
• In LoadL_config, add JOB_PROLOG = /bgtest/bg_job_prolog.sh, where the script is:
#!/bin/ksh
# Log the Blue Gene partition and step details of each job step
name=$(basename "$0" .sh)
echo "$LOADL_BG_PARTITION $LOADL_BG_SIZE $LOADL_BG_CONNECTION $LOADL_BG_BPS $LOADL_BG_IONODES $(date) $LOADL_STEP_OWNER $LOADL_STEP_ID $LOADL_STEP_CLASS" > "/tmp/$name.$LOADL_STEP_ID.log"
• cat /tmp/bg_job_prolog.bgpdd1sys1.rchland.ibm.com.2.0.log
LL07111910011602 512 MESH R20-M1 N00-J00,N04-J00,N08-J00,N12-J00 Mon Nov 19 10:01:16 CST 2007 ezhong bgpdd1sys1.rchland.ibm.com.2.0 high
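An epilog can use the same variables to close the record the prolog opened. A sketch, assuming LoadL_config also sets the JOB_EPILOG keyword to a script (its path is site-specific) whose body calls `bg_job_epilog`; the function name and the `LOG_DIR` override are hypothetical, and the log name matches the prolog's `/tmp/bg_job_prolog.<step_id>.log`.

```shell
# Hypothetical helper for an epilog script: append an end record to the
# log written by bg_job_prolog.sh. LOG_DIR (default /tmp) is a test override.
bg_job_epilog() {
  log=${LOG_DIR:-/tmp}/bg_job_prolog.$LOADL_STEP_ID.log
  echo "END $LOADL_BG_PARTITION $LOADL_STEP_ID $(date)" >> "$log"
}
```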
Submit a Job
• llsubmit <my_job_command_file>
• llq
  - llq -b: show Blue Gene specific info
  - llq -s <step_id>: show why the job step remains idle
  - -l: long listing
  - …
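To follow a submitted job with llq, its job ID can be captured from llsubmit. This sketch ASSUMES llsubmit reports success as `llsubmit: The job "<job_id>" has been submitted.`; check the exact message on your installation. `job_id_from_llsubmit` is a hypothetical helper.

```shell
# Hypothetical helper: extract the quoted job ID from llsubmit output on stdin.
# Assumes the success message format described above.
job_id_from_llsubmit() {
  sed -n 's/.*The job "\([^"]*\)" has been submitted.*/\1/p'
}

# Usage (requires LoadLeveler):
#   id=$(llsubmit my_job_command_file 2>&1 | job_id_from_llsubmit)
#   llq -s "$id.0"    # why step 0 of that job is still idle
```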
Blue Gene Job Info from llq
# llq -l
=============== Job Step bgpdd1sys1.rchland.ibm.com.9.0 ===============
...
Step Type: Blue Gene
Size Requested: 512
Size Allocated: 512
Shape Requested:
Shape Allocated: 1x1x1
Wiring Requested: MESH
Wiring Allocated: MESH
Rotate: True
Blue Gene Status:
Blue Gene Job Id:
Partition Requested:
Partition Allocated: LL07112110294409
BG Partition State: FREE
BG Requirements:
...
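Scripts often need a single field from this long listing, such as the allocated partition. A sketch: `llq_field` is a hypothetical helper that assumes one `Field: value` pair per line, as in the listing above, and matches the field name at the start of the line, so asking for `Partition State` does not pick up `BG Partition State`.

```shell
# Hypothetical helper: print the value of field $1 from llq -l output on stdin.
llq_field() {
  awk -v f="$1" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)                  # ignore indentation
      if (substr(line, 1, length(f) + 1) == f ":") {
        val = substr(line, length(f) + 2)
        sub(/^[ \t]+/, "", val)                 # drop spaces after the colon
        print val
        exit
      }
    }'
}

# Usage (requires LoadLeveler): llq -l <step_id> | llq_field "Partition Allocated"
```

For the step shown above this would print `LL07112110294409`.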
Blue Gene Job Info from llq
# llq
Id                       Owner      Submitted   ST PRI Class        Running On
------------------------ ---------- ----------- -- --- ------------ -----------
bgpdd1sys1.9.0           ezhong     11/21 10:29 R  50  high         bgpdd1sys1
1 job step(s) in queue, 0 waiting, 0 pending, 1 running, 0 held, 0 preempted
# llq -b
Id                       Owner      Submitted   LL BG PT Partition        Size
________________________ __________ ___________ __ __ __ ________________ ______
bgpdd1sys1.9.0           ezhong     11/21 10:29 R     FR LL07112110294409 512
1 job step(s) in queue, 0 waiting, 0 pending, 1 running, 0 held, 0 preempted
# llq -f %id %BB %BS %PT %BG %dd %st
Step Id                  Partition        Size   PT BG Disp. Date  ST
------------------------ ---------------- ------ -- -- ----------- --
bgpdd1sys1.9.0           LL07112110294409 512    FR    11/21 10:29 R
1 job step(s) in queue, 0 waiting, 0 pending, 1 running, 0 held, 0 preempted
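The `-f` form is the easiest to parse because the columns are chosen explicitly. A sketch, assuming the column order `%id %BB %BS %PT %BG %dd %st` used here; `running_bg_steps` is a hypothetical helper. It splits on whitespace, so a blank column (such as BG above) shifts the fields after it; only the first three columns and the last one (ST) are used, and those stay in place for the row shown.

```shell
# Hypothetical helper: from "llq -f %id %BB %BS %PT %BG %dd %st" output on
# stdin, print step ID, partition and size for steps whose state is R.
# Skips the header and rule lines; the summary line never ends in "R".
running_bg_steps() {
  awk 'NR > 2 && NF >= 4 && $NF == "R" { print $1, $2, $3 }'
}

# Usage (requires LoadLeveler):
#   llq -f %id %BB %BS %PT %BG %dd %st | running_bg_steps
```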
Multi-step jobs
• LoadLeveler's basic unit of work is the "job step"
• A single job can contain multiple job steps
• There can be no dependencies between jobs
• There can be dependencies between steps of a job
Example multi-step job control file
# @ step_name = step_1
# @ queue
# @ step_name = step_2
# @ dependency = ( step_1 == 0 )
# @ queue
# @ step_name = step_3
# @ dependency = ( step_1 != 0 )
# @ queue
# @ step_name = step_4
# @ dependency = ( step_2 == CC_NOTRUN )
# @ queue
Example multi-step job control file
case $LOADL_STEP_NAME in
  step_1)
    echo "I am failing in step 1"
    exit 1
    ;;
  step_2)
    echo "I am step 2, step 1 was fine"
    ;;
  step_3)
    echo "I am step 3, step 1 failed dismally"
    ;;
  step_4)
    echo "I am step 4, why didn't step 2 run?"
    ;;
esac
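The dependency logic can be checked locally before submitting. This is a dry run in plain shell, not LoadLeveler: `run_step` and `simulate_job` are hypothetical helpers, `run_step` holds the same case statement but uses `return` instead of `exit` because it is a function, and `simulate_job` evaluates the four `dependency` expressions by hand, recording CC_NOTRUN for a step whose dependency was not met.

```shell
# Same per-step logic as the job command file, wrapped in a function.
run_step() {
  LOADL_STEP_NAME=$1
  case $LOADL_STEP_NAME in
    step_1) echo "I am failing in step 1"; return 1 ;;
    step_2) echo "I am step 2, step 1 was fine" ;;
    step_3) echo "I am step 3, step 1 failed dismally" ;;
    step_4) echo "I am step 4, why didn't step 2 run?" ;;
  esac
}

# Hand-written stand-in for LoadLeveler's dependency evaluation.
simulate_job() {
  if run_step step_1; then rc1=0; else rc1=$?; fi
  rc2=CC_NOTRUN
  if [ "$rc1" -eq 0 ]; then            # step_2: ( step_1 == 0 )
    if run_step step_2; then rc2=0; else rc2=$?; fi
  fi
  if [ "$rc1" -ne 0 ]; then            # step_3: ( step_1 != 0 )
    run_step step_3
  fi
  if [ "$rc2" = CC_NOTRUN ]; then      # step_4: ( step_2 == CC_NOTRUN )
    run_step step_4
  fi
}
```

Running `simulate_job` prints the step 1, step 3 and step 4 messages: step 1 exits with 1, so step 2 never runs and step 4's condition `step_2 == CC_NOTRUN` holds.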
Example of a multi-step job
Id           Owner     Submitted  ST  PRI Class    Running on
------------ --------- ---------- --- --- -------- ----------
node09.486.0 peterm    6/6 01:14  R   50  express  node15
node09.486.1 peterm    6/6 01:14  NQ  50  express
node09.486.2 peterm    6/6 01:14  NQ  50  express
node09.486.3 peterm    6/6 01:14  NQ  50  express
4 job steps in queue, 0 waiting, 0 pending, 1 running, 3 held
Advance Reservation
• Creating reservations: llmkres (-t start time, -d duration in minutes, -c number of compute nodes, -f job command file)
  - llmkres -t 14:00 -d 300 -c 1024
  - llmkres -t 12/18 08:00 -d 60 -f my_jcf
• Reservation IDs: llqres -l
• Specify the reservation ID through the LL_RES_ID environment variable
• Use the llsubmit command to submit the job
• Other commands: llchres, llrmres, llbind, …
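Submitting into a reservation combines LL_RES_ID with llsubmit. A sketch: `submit_to_reservation` is a hypothetical helper, and the `LLSUBMIT` variable is a hypothetical override that lets the function be tried with a stand-in command instead of the real llsubmit.

```shell
# Hypothetical helper: submit job command file $2 into reservation $1.
#   $1 = reservation ID as listed by llqres -l
#   $2 = job command file
submit_to_reservation() {
  res_id=$1
  jcf=$2
  # LL_RES_ID is set only for this one command
  LL_RES_ID=$res_id ${LLSUBMIT:-llsubmit} "$jcf"
}

# Usage (requires LoadLeveler):
#   submit_to_reservation <reservation_id> my_job_command_file
```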
Advance Reservation
• Reserve for maintenance
• Reserve for a special workload
• Allow other users or groups to use the reservation
• Allow a reservation to be cancelled automatically when no more jobs can run in it
• Allow extra resources to be shared once all special jobs for the reservation have started to run
More about job priority
• q_sysprio in the llq -l output is used by the LoadLeveler Central Manager for scheduling
• Set in LoadL_config: SYSPRIO_THRESHOLD_TO_IGNORE_STEP = integer
  - Job steps whose q_sysprio is below this threshold won't be scheduled to run
• llmodify -s <q_sysprio> <step_id>: administrator-only command option
  - Assigns a fixed priority that won't be changed by priority recalculation
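Whether a step falls under SYSPRIO_THRESHOLD_TO_IGNORE_STEP can be checked from llq -l. A sketch, ASSUMING llq -l prints the value on a line of the form `q_sysprio: <integer>` and that "below" means strictly less than the threshold; `below_threshold` is a hypothetical helper and only approximates what the Central Manager does.

```shell
# Hypothetical helper: succeed if the q_sysprio in llq -l output (stdin)
# is strictly less than threshold $1. Fails if no q_sysprio line is found.
below_threshold() {
  threshold=$1
  prio=$(awk -F: '$1 ~ /q_sysprio[ \t]*$/ { gsub(/[ \t]/, "", $2); print $2; exit }')
  [ -n "$prio" ] && [ "$prio" -lt "$threshold" ]
}

# Usage (requires LoadLeveler), with the threshold copied from LoadL_config:
#   llq -l <step_id> | below_threshold 0 && echo "step will be ignored"
```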