
IBM System Blue Gene-P - Load Leveler - 2-1-0 System Blue Gene-P - Load... · IBM Blue Gene/P...

Date post: 12-Mar-2020
© 2007 IBM Corporation

IBM Global Engineering Solutions

IBM Blue Gene/P

LoadLeveler Blue Gene Support

June 2010

IBM Blue Gene/P System Administration

Interaction with Blue Gene

[Diagram: LoadLeveler and Blue Gene. Labels: Jobs (submitted); Blue Gene Bridge API (get resources and jobs data; find resources for jobs and define partitions); Blue Gene mpirun (run a job).]

For LoadLeveler documentation: http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.infocenter.doc/library.html

Starting LoadLeveler

• llctl start on both the FEN and SN
• llstatus → look for "Blue Gene is present"
• llstatus -b

  Name  Base Partitions  c-nodes   InQ  Run
  BGP   4x4x2            32x32x16  0    0

• llstatus -B all → show all base partitions
• llstatus -P <partition_name>
• llstatus -b -l → show more BG resources

LoadLeveler Job Command File

# @ job_name = myjob
# @ comment = "BG Job by Size"
# @ error = $(home)/output/$(job_name).$(jobid).err
# @ output = $(home)/output/$(job_name).$(jobid).out
# @ environment = COPY_ALL;
# @ wall_clock_limit = 00:20:00
# @ notification = error
# @ notify_user = $(user)@us.ibm.com
# @ job_type = bluegene
# @ bg_size = 32
# @ queue
/usr/bin/mpirun -exe /bgtest/hello.rts -verbose 1

Parallel Job command file on standard system

#!/bin/bash
#@ job_type = BLUEGENE
#@ job_name = my_job_name
#@ output = $(home)/$(job_name).$(jobid).out
#@ error = $(home)/$(job_name).$(jobid).err
#@ executable = mpirun
#@ arguments = -verbose 1 -exe /gpfs2/IBM/tototest/toto -cwd /gpfs2/IBM/tototest/ -mode VN -np number_of_mpi_tasks
#@ class = BGL128_1H
#@ bg_size =
#@ bg_partition =
#@ queue

Or

mpirun -exe /gpfs2/IBM/tototest/toto -cwd /gpfs2/IBM/tototest/ -mode VN -np number_of_mpi_tasks

Blue Gene Job Keywords

• Mutually exclusive (one must be specified)
  - bg_size → number of compute nodes
  - bg_shape → number of BPs in the x, y, z directions, e.g. 1x2x4
  - bg_partition → specify a predefined partition

• Optional
  - bg_connection → MESH, TORUS, PREFER_TORUS
  - bg_rotate → True or False
  - bg_requirements → c-node memory
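
As a rough illustration of how these keywords combine, the job command file below requests a partition by shape rather than by size. It is only a sketch: the keyword values come from the list above, the mpirun path and arguments are reused from the earlier bg_size example, and the job name and comment are hypothetical.

```shell
# @ job_name = myjob_by_shape
# @ comment = "BG Job by Shape"
# @ job_type = bluegene
# Only one of bg_size, bg_shape, bg_partition is given; here it is bg_shape.
# @ bg_shape = 1x2x4
# @ bg_connection = TORUS
# @ bg_rotate = True
# @ executable = /usr/bin/mpirun
# @ arguments = -exe /bgtest/hello.rts -verbose 1
# @ queue
```

Whether a given shape and wiring can actually be allocated depends on the machine; llstatus -b shows the base partitions available.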

Job command file keywords

account_no, arguments, class, comment, dependency, error, executable, group

hold, initialdir, input, job_name, job_type, notification, notify_user, output

preferences, requirements, start_date, step_name, user_priority, wall_clock_limit, bg_size, bg_partition

A number of environment variables are set by LoadLeveler for the job. These include:

LOADL_JOB_NAME

LOADL_STEP_NAME

LOADL_STEP_INITDIR

LOADL_PROCESSOR_LIST

LOADL_STEP_CLASS

etc
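
A job step script can inspect these variables at run time. The sketch below simply prints the five variables named above; the function name show_loadl_env is illustrative, and it assumes that outside a LoadLeveler job the variables are not set, in which case each falls back to the literal text "unset".

```shell
#!/bin/sh
# Print the LoadLeveler-provided variables listed above, one per line.
# Any variable that is not set (e.g. when run outside LoadLeveler)
# is reported as "unset".
show_loadl_env() {
    for var in LOADL_JOB_NAME LOADL_STEP_NAME LOADL_STEP_INITDIR \
               LOADL_PROCESSOR_LIST LOADL_STEP_CLASS; do
        # Indirect lookup of the variable whose name is stored in $var.
        eval "value=\${$var:-unset}"
        echo "$var=$value"
    done
}

show_loadl_env
```

Placing such a call at the top of a job script leaves a record in the job's output file of the step name, class and working directory it ran with.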

Runtime Environment

• Available to Prologs and Epilogs

• In LoadL_config, add JOB_PROLOG = /bgtest/bg_job_prolog.sh

#!/bin/ksh
name=`basename $0 .sh`
echo "$LOADL_BG_PARTITION $LOADL_BG_SIZE $LOADL_BG_CONNECTION $LOADL_BG_BPS $LOADL_BG_IONODES
`date` $LOADL_STEP_OWNER $LOADL_STEP_ID
$LOADL_STEP_CLASS " > /tmp/$name.$LOADL_STEP_ID.log

• cat /tmp/bg_job_prolog.bgpdd1sys1.rchland.ibm.com.2.0.log

LL07111910011602 512 MESH R20-M1 N00-J00,N04-J00,N08-J00,N12-J00
Mon Nov 19 10:01:16 CST 2007 ezhong bgpdd1sys1.rchland.ibm.com.2.0
high

Submit a Job

• llsubmit <my_job_command_file>
• llq
  - llq -b → show Blue Gene specific info
  - llq -s <step_id> → show why the job step remains idle
  - -l : long listing
  - …

Blue Gene Job Info from llq

# llq -l

=============== Job Step bgpdd1sys1.rchland.ibm.com.9.0 ===============
...
Step Type: Blue Gene
Size Requested: 512
Size Allocated: 512
Shape Requested:
Shape Allocated: 1x1x1
Wiring Requested: MESH
Wiring Allocated: MESH
Rotate: True
Blue Gene Status:
Blue Gene Job Id:
Partition Requested:
Partition Allocated: LL07112110294409
BG Partition State: FREE
BG Requirements:
...

Blue Gene Job Info from llq

# llq

Id                       Owner      Submitted   ST PRI Class        Running On
------------------------ ---------- ----------- -- --- ------------ -----------
bgpdd1sys1.9.0           ezhong     11/21 10:29 R  50  high         bgpdd1sys1

1 job step(s) in queue, 0 waiting, 0 pending, 1 running, 0 held, 0 preempted

# llq -b

Id                       Owner      Submitted   LL BG PT Partition        Size
________________________ __________ ___________ __ __ __ ________________ ______
bgpdd1sys1.9.0           ezhong     11/21 10:29 R     FR LL07112110294409 512

1 job step(s) in queue, 0 waiting, 0 pending, 1 running, 0 held, 0 preempted

# llq -f %id %BB %BS %PT %BG %dd %st

Step Id                  Partition        Size   PT BG Disp. Date  ST
------------------------ ---------------- ------ -- -- ----------- --
bgpdd1sys1.9.0           LL07112110294409 512    FR    11/21 10:29 R

1 job step(s) in queue, 0 waiting, 0 pending, 1 running, 0 held, 0 preempted

Multi-step jobs

• LoadLeveler's basic unit of work is the "job step"
• A single job can contain multiple job steps
• There can be no dependencies between jobs
• There can be dependencies between steps of a job

Example multi-step job control file

# @ step_name = step_1
# @ queue
# @ step_name = step_2
# @ dependency = ( step_1 == 0 )
# @ queue
# @ step_name = step_3
# @ dependency = ( step_1 != 0 )
# @ queue
# @ step_name = step_4
# @ dependency = ( step_2 == CC_NOTRUN )
# @ queue

Example multi-step job control file

case $LOADL_STEP_NAME in
step_1)
    echo "I am failing in step 1"
    exit 1
    ;;
step_2)
    echo "I am step 2, step 1 was fine"
    ;;
step_3)
    echo "I am step 3, step 1 failed dismally"
    ;;
step_4)
    echo "I am step 4, why didn't step 2 run?"
    ;;
esac

Example of a multi-step job

Id           Owner     Submitted  ST  PRI Class    Running on
------------ --------- ---------- --- --- -------- ----------
node09.486.0 peterm    6/6 01:14  R   50  express  node15
node09.486.1 peterm    6/6 01:14  NQ  50  express
node09.486.2 peterm    6/6 01:14  NQ  50  express
node09.486.3 peterm    6/6 01:14  NQ  50  express

4 job steps in queue, 0 waiting, 0 pending, 1 running, 3 held

Advance Reservation

• Creating reservations: llmkres
  - llmkres -t 14:00 -d 300 -c 1024
  - llmkres -t 12/18 08:00 -d 60 -f my_jcf
• Reservation IDs: llqres -l
• Specify the reservation ID through the LL_RES_ID environment variable
  - Use the llsubmit command to submit the job
• Other commands: llchres, llrmres, llbind, …
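
The last two steps above (set LL_RES_ID, then run llsubmit) can be sketched as a small wrapper. This is an illustration under stated assumptions, not a documented procedure: it assumes llsubmit reads the reservation ID from LL_RES_ID in its environment, and the function name submit_in_reservation and its two arguments are hypothetical. The reservation ID itself should be taken from llqres -l.

```shell
#!/bin/sh
# Submit a job command file so that it runs under an existing reservation.
# Assumption: llsubmit picks up the reservation ID from LL_RES_ID.
submit_in_reservation() {
    res_id=$1   # reservation ID as reported by llqres -l
    jcf=$2      # path to the job command file
    if [ -z "$res_id" ] || [ -z "$jcf" ]; then
        echo "usage: submit_in_reservation <reservation_id> <job_command_file>" >&2
        return 2
    fi
    # Set LL_RES_ID only for this one llsubmit invocation.
    LL_RES_ID=$res_id llsubmit "$jcf"
}

# Example call (placeholders, not real values):
#   submit_in_reservation <reservation_id> my_jcf
```

Setting the variable only on the llsubmit command line keeps later submissions from the same shell from being bound to the reservation by accident.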

Advance Reservation

• Reserve for maintenance
• Reserve for special workload
• Allow other users or groups to use
• Allow a reservation to be automatically cancelled if no more jobs can run
• Allow extra resources to be shared when all special jobs for the reservation start to run

More about job priority

• q_sysprio in the llq -l output is used by the LoadLeveler Central Manager for scheduling

• Set in LoadL_config: SYSPRIO_THRESHOLD_TO_IGNORE_STEP = integer
  - Jobs with lower q_sysprio won't be scheduled to run

• llmodify -s <q_sysprio> <step_id> (admin-only command option)
  - Assign a fixed priority, which won't be changed by priority recalculation
