
Altix UV HW/SW · TG11 - SGI Altix UV Tutorial NCSA - PSC - RDAV Cache Affinity • Affinity...

Date post: 17-Feb-2020

TG11 - SGI Altix UV Tutorial NCSA - PSC - RDAV

Altix UV HW/SW

•  SGI Altix UV uses an array of advanced hardware and software features to offload the following from the CPUs:

  thread synchronization

  data sharing

  message passing overhead

•  This system has a rich set of hardware features that enable scalable programming models to be implemented with high efficiency and performance.

  SGI MPI

•  The SGI MPI software stack includes a number of software components.


SGI MPI Software Stack

•  MPI

•  XPMEM (cross-process memory mapping)

•  GRU development kit

•  NUMA tools

•  Perfboost

•  Perfcatcher

•  MPInside


UV HUB

The UV_HUB is a custom ASIC developed by SGI. It implements the NUMAlink5 protocol, memory operations, and associated atomic operations. It provides the following capabilities:

  Cache-coherent global shared memory.

  Offloading time-sensitive and data-intensive operations from processors to increase processing efficiency and scaling.

  Scalable, reliable, fair interconnect with other blades via NUMAlink5.


Altix UV Blade and HUB

Source : SGI Altix UV 1000 System User’s Guide


UV HUB in detail

•  SI (socket interface): provides a bridge between the Hub’s LH and RH chip sets and the Intel sockets.

•  To communicate with the Intel sockets, the SI implements an Intel proprietary interconnect called CSI (Common System Interface).

Source : SGI Altix UV admin manual


UV HUB in detail

•  LH (local home)

  manages directory operations associated with remote memory requests. The LH has a single external memory channel.

•  RH (remote home)

  processes coherent and non-coherent CSI transactions that are initiated by a local socket to a remote system address.

  processes NUMAlink intervention and invalidate requests when remote memory is locally cached by a socket.

•  LB (local block)

  provides system software the ability to select, configure and control various functionalities of the UV hub chip.

  provides facilities to monitor, diagnose, and debug hardware states and operations on live systems.


UV HUB Units

•  The NUMAlink interconnect

•  The global reference unit (GRU)

•  The processor interconnect


NUMAlink Interconnect

•  Shared-memory, globally addressable system interconnect.

•  All physically distributed system memory is mapped into one global address space.

•  Peak aggregate bidirectional bandwidth of 15 GB/s.

•  2-3x MPI latency improvement.

•  Special support for block transfers and global operations.

•  NUMAlink is connected into the memory infrastructure of the system, versus being indirectly connected through an IO subsystem chip.


Fetch-Op in HUB

•  Fetch-Op variables on the Hub provide fast synchronization.

•  The Fetch-Op AMO helped reduce MPI send/recv latency from 12 to 8 microseconds.

•  Used by MPI_Barrier, MPI_Win_fence, and shmem_barrier_all.

[Figure: CPU, HUB, ROUTER, and fetch-op variable]
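
To illustrate why a single fetch-op variable is enough to build a barrier, here is a minimal software sketch in Python. It is not SGI's implementation: the class names `FetchOpVariable` and `CounterBarrier` are hypothetical, a lock stands in for the atomicity that the Hub provides in hardware, and the sense-reversing scheme is one common textbook design chosen for the example.

```python
import threading
import time


class FetchOpVariable:
    """Software stand-in for a hub fetch-op variable (hypothetical name)."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()  # emulates the atomicity of the hardware

    def fetch_and_increment(self):
        """Atomically return the old value and add one."""
        with self._lock:
            old = self._value
            self._value += 1
            return old

    def store(self, value):
        with self._lock:
            self._value = value

    def load(self):
        with self._lock:
            return self._value


class CounterBarrier:
    """Sense-reversing barrier built on one fetch-and-increment counter."""

    def __init__(self, parties):
        self.parties = parties
        self.count = FetchOpVariable(0)
        self.sense = False               # shared release flag
        self._local = threading.local()  # per-thread expected sense

    def wait(self):
        my_sense = not getattr(self._local, "sense", False)
        self._local.sense = my_sense
        if self.count.fetch_and_increment() == self.parties - 1:
            self.count.store(0)          # last arrival resets the counter,
            self.sense = my_sense        # then releases the waiting threads
        else:
            while self.sense != my_sense:
                time.sleep(0)            # spin, yielding to other threads


def run_phases(parties, phases):
    """Run the barrier repeatedly; return the phases where a thread left early."""
    barrier = CounterBarrier(parties)
    arrivals = [FetchOpVariable(0) for _ in range(phases)]
    violations = []

    def worker():
        for phase in range(phases):
            arrivals[phase].fetch_and_increment()
            barrier.wait()
            seen = arrivals[phase].load()
            if seen != parties:          # someone had not arrived yet
                violations.append((phase, seen))

    threads = [threading.Thread(target=worker) for _ in range(parties)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return violations


if __name__ == "__main__":
    print("violations:", run_phases(parties=4, phases=25))
```

Each arriving thread performs exactly one atomic operation on the shared counter, which is the part the slide attributes to the Hub rather than to cache-coherent CPU traffic.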


GRU

•  Hardware in the Hub for memory-to-memory block transfers and CPU synchronization events.

•  It is used by MPI, SHMEM, and UPC.

•  External TLB with large page support

•  Page initialization

•  Scatter/Gather operations

•  Update cache for AMOs


GRU API Components

• GRU resource allocators

• GRU memory access functions

•  XPMEM address mapping functions

• MPT address mapping functions


MOE

•  The MPI Offload Engine (MOE) is a set of functionality that offloads MPI communication workload from the CPUs to the Altix UV_HUB ASIC, accelerating common MPI tasks such as barriers and reductions across GSM (global shared memory).

•  Similar in concept to a TCP/IP offload engine (TOE), which offloads TCP/IP protocol processing from system CPUs.

•  Frees CPU from MPI activity.

•  Faster reduction operations.

•  Faster barriers and random access.


MPI and MOE

•  Accessing the MOE.

• MOE implements atomic memory operations in conjunction with a hardware multicast facility that helps to accelerate MPI_Barrier, MPI_Bcast, and MPI_Allreduce.

•  Accelerates MPI point-to-point and collective communication.


MOE Advantages

• MOE provides:

  MPI message queues

  synchronization primitives

  Advanced RDMA capabilities such as strided and indexed global memory updates.

  Hardware multicast.


Determining System Configuration


topology

•  topology: displays general information about an SGI Altix system, with a focus on node information.

•  It includes node counts for blades, node IDs, NASIDs, memory per node, UV hub and partition number.


cpumap

•  cpumap: displays logical CPUs and shows the relationships between them.

•  Aspects displayed include hyper-threading, last-level cache sharing, and topology placement.

•  It gets information from /proc/cpuinfo, /sys/devices/system and /proc/sgi_uv/topology
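
As a rough illustration of the kind of relationship cpumap reports, the following Python sketch groups logical CPUs that share a physical core using only `/proc/cpuinfo`. It is not cpumap itself: the function name `group_hyperthreads` and the `SAMPLE` text are made up for the example, and it assumes the x86 `processor`, `physical id`, and `core id` fields are present.

```python
from collections import defaultdict


def group_hyperthreads(cpuinfo_text):
    """Map (physical id, core id) to the sorted logical CPUs on that core."""
    cores = defaultdict(list)
    # /proc/cpuinfo separates logical CPUs with a blank line
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        core = (int(fields.get("physical id", 0)), int(fields.get("core id", 0)))
        cores[core].append(int(fields["processor"]))
    return {core: sorted(cpus) for core, cpus in cores.items()}


# Made-up sample: logical CPUs 0 and 2 are hyper-thread siblings on core 0
SAMPLE = """\
processor\t: 0
physical id\t: 0
core id\t: 0

processor\t: 1
physical id\t: 0
core id\t: 1

processor\t: 2
physical id\t: 0
core id\t: 0
"""

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # fall back to the sample on non-Linux systems
    for (socket, core), cpus in sorted(group_hyperthreads(text).items()):
        print(f"socket {socket} core {core}: logical CPUs {cpus}")
```

A core that lists more than one logical CPU has hyper-threading enabled; the real tool additionally consults `/sys/devices/system` and `/proc/sgi_uv/topology`, which this sketch does not read.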


cpumap


nodeinfo


nodeinfo

•  Hit: page was allocated on the preferred node.

•  Miss: preferred node was full. The allocation occurred on this node for a process running on another node that was full.

•  Foreign: preferred node was full. The page had to be allocated somewhere else.

•  Interleave: allocation was made under the interleaved policy (numactl -i).

•  Local: page allocated on this node by a process running on this node.

•  Remote: page allocated on this node by a process running on another node.
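
The Linux kernel exposes per-node counters of the same kind in `/sys/devices/system/node/node*/numastat`. The sketch below parses that file and computes the share of allocations on a node that were hits. The mapping between these kernel counters (`numa_hit`, `numa_miss`, and so on) and the nodeinfo columns is an assumption, and the function names and sample values are invented for the example.

```python
import glob


def parse_numastat(text):
    """Parse 'name value' lines from a per-node numastat file into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def hit_ratio(counters):
    """Fraction of pages allocated on this node that were intended for it."""
    hits = counters.get("numa_hit", 0)
    total = hits + counters.get("numa_miss", 0)
    return hits / total if total else None


# Made-up sample values in the kernel's numastat layout
SAMPLE = """\
numa_hit 900
numa_miss 100
numa_foreign 40
interleave_hit 12
local_node 880
other_node 120
"""

if __name__ == "__main__":
    paths = sorted(glob.glob("/sys/devices/system/node/node*/numastat"))
    if not paths:
        print("sample:", hit_ratio(parse_numastat(SAMPLE)))
    for path in paths:
        with open(path) as f:
            print(path, hit_ratio(parse_numastat(f.read())))
```

A ratio well below 1.0 on a node suggests that processes elsewhere ran out of local memory and spilled onto it, which is a cue to revisit placement.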


x86info


pmchart •  Put figure here



HW Summary

•  /proc/cpuinfo

•  /proc/meminfo

•  /sys/devices/system/node

•  /dev/cpuset/torque/job#


Data Placement Tool


CPU Scheduling

•  In a single-processor system, only one process can run at a time.

•  CPU scheduling controls how the OS switches access to the CPU between processes.

•  The kernel provides a mechanism called time slicing.

•  A time slice is the maximum length of time that a process owns its CPU resource and executes at its current policy.

•  Each CPU has its own run queue.


Cache Affinity

•  Affinity scheduling is a special scheduling discipline used in multiprocessor systems.

•  As a process executes, it causes more and more data and instruction text to be loaded into the processor cache. This creates an “affinity” between the process and the CPU.


Data Placement Tool

•  NUMA machines have a shared address space. There is a single shared memory space and a single operating system instance.

•  There is a performance penalty for accessing remote memory versus local memory.

•  Memory access time varies across physical address ranges and between processing elements. NUMAlink is used to access memory on other blades/nodes.

•  Memory latency is lowest when a processor accesses local memory.

•  NUMA tools also help run multiple instances of a serial program in a single job script with better process placement.


NUMA API

•  The API is called from libcpuset

  cpuset: creates, modifies, and destroys cpusets.

  taskset: runs a process on specific physical CPUs.

  numactl: controls NUMA policy for processes or shared memory.

  dplace: binds processes to specific logical CPUs.

  omplace: controls the placement of MPI processes and OpenMP threads.

  Batch systems: LSF, PBSPro, Torque, SGE

  dlook, dlook-summary, pidstat, cpuset-Q


cpuset

•  cpusets build on sched_setaffinity for CPU binding and on mbind/set_mempolicy for memory binding.

•  Each task has a link to a cpuset structure that specifies the CPUs and memory nodes available for its use.

•  All tasks sharing the same placement constraints reference the same cpuset.
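
The CPU-binding primitive that cpusets build on can be tried directly from Python on Linux through `os.sched_getaffinity` and `os.sched_setaffinity`. The sketch below is only an illustration of that system call, not of the cpuset tooling: the helper name `restricted_to` is hypothetical, and memory binding is not covered because the standard library does not expose it.

```python
import os
from contextlib import contextmanager


@contextmanager
def restricted_to(cpus):
    """Temporarily restrict the caller to the given set of CPU numbers."""
    previous = os.sched_getaffinity(0)  # 0 means "the caller"
    os.sched_setaffinity(0, cpus)
    try:
        yield os.sched_getaffinity(0)
    finally:
        os.sched_setaffinity(0, previous)  # always restore the old mask


if __name__ == "__main__":
    allowed = sorted(os.sched_getaffinity(0))
    print("allowed CPUs:", allowed)
    with restricted_to({allowed[0]}) as now:
        print("restricted to:", sorted(now))
    print("restored:", sorted(os.sched_getaffinity(0)))
```

The CPUs requested must already be permitted by the enclosing cpuset; asking for a CPU outside it fails, which is how a cpuset constrains every task attached to it.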


Why Use a cpuset?

•  Restricts consumption of designated resources (CPUs) to specified processes/threads.

•  Limits run-time variability.

•  Provides memory affinity.

•  Isolates the I/O.


How Are cpusets Used

•  Static cpusets (batch calls shared by queue)

  Cpusets are defined by the administrator after system startup.

  Users attach processes to the existing cpusets.

  Cpusets continue to exist after jobs finish executing.

•  Dynamic cpusets

  The workload management system (WMS) creates a cpuset when it is required by a job.

  The WMS attaches the job to the newly created cpuset.

  The WMS destroys the cpuset at the end of the job.


cpuset Command Line Options

•  cpuset

-c cpuset_name Create CPUSET

-m cpuset_name Modify CPUSET

-x cpuset_name Destroy CPUSET

-d cpuset_name Dump CPUSET attributes

-i csname -I script Run command

-p cpuset_name List all procs in CPUSET

-a cpuset_name Attach pids to CPUSET

-w pid List CPUSET the PID is attached to

-f filename input config file


Advantage of Cpuset?

•  It improves cache locality and memory access time.

•  Facilitates providing equal resources to each thread in a job.

  Results in both optimum and repeatable performance.


taskset

•  taskset: restricts execution to the listed set of CPUs. However, processes are still free to move among the listed CPUs.

•  It is used to set or retrieve the CPU affinity of a running process given its PID, or to launch a new command with a given CPU affinity.

•  The CPU affinity is represented as a hexadecimal bitmask, with the lowest-order bit corresponding to the first logical CPU.

  0x00000001 is processor number 0.


taskset

•  taskset does not pin a task to a specific CPU. It only restricts a task so that it does not run on any CPU that is not in the cpulist.

•  If you are running an MPI application, do not use the taskset command. Use dplace instead, or set MPI_DSM_CPULIST:

mpirun -np 8 dplace -s1 -c 10,11,16-21 ./a.out

export MPI_DSM_CPULIST=10,11,16-21

mpirun -np 8 ./a.out


taskset examples

•  taskset 0x1 ./a.out #executes on physical CPU 0

•  taskset 0x00131 ./a.out #executes on physical CPUs 0, 4, 5, and 8

•  taskset -p 0xa8 14386 #restricts PID 14386 to physical CPUs 3, 5, and 7

•  taskset -c 5 ./a.out #executes a.out on physical CPU 5

•  taskset -p 14386 #returns the affinity mask of PID 14386
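
The hexadecimal masks in these examples are easy to misread, so a small converter helps when checking them. This Python sketch is only arithmetic on the bitmask convention described above (bit 0 is the first logical CPU); the function names are made up for the example.

```python
def mask_to_cpus(mask):
    """Return the CPU numbers whose bits are set in an affinity mask."""
    cpus = []
    bit = 0
    while mask:
        if mask & 1:
            cpus.append(bit)
        mask >>= 1
        bit += 1
    return cpus


def cpus_to_mask(cpus):
    """Build an affinity mask with one bit set per CPU number."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


if __name__ == "__main__":
    for text in ("0x1", "0x00131", "0xa8"):
        print(text, "->", mask_to_cpus(int(text, 16)))
    print("CPU 5 ->", hex(cpus_to_mask([5])))
```

For instance, 0x00131 is binary 1 0011 0001, so bits 0, 4, 5, and 8 are set, and a mask for CPU 5 alone is 0x20.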


numactl

•  Runs processes with a specific NUMA scheduling or memory placement policy.

•  Control memory placement

  Interleave node (round robin)

  Membind (allocate from specified node pool)

  Preferred node

  Local allocation (first touch)

•  Each task has a link to a cpuset structure that specifies the CPUs and memory node available for its use.

•  All tasks sharing the same placement constraints reference the same cpuset.


numactl Command Line Options

•  numactl

--interleave Set a memory interleave policy.

--membind Only allocate memory from nodes.

--cpunodebind Only execute command on the CPUs of nodes.

--physcpubind Only execute process on CPUs.


numactl examples

•  numactl --physcpubind=+0-4,8-12 myapplic arguments #Run myapplic on cpus 0-4 and 8-12 of the current cpuset.

•  numactl --interleave=all bigdatabase arguments #Run bigdatabase with its memory interleaved on all nodes.

•  numactl --cpubind=0 --membind=0,1 process #Run process on node 0 with memory allocated on nodes 0 and 1.
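
The list syntax in the first example (comma-separated numbers and ranges, with a leading `+` meaning "relative to the current cpuset") can be expanded with a few lines of Python. This is a sketch of the notation only: `parse_cpulist` is a hypothetical helper, and it deliberately handles just plain numbers, ranges, and the `+` prefix, not keywords such as `all` or negation.

```python
def parse_cpulist(spec):
    """Expand a list such as '+0-4,8-12' into (relative, [cpu numbers])."""
    relative = spec.startswith("+")  # '+' means relative to the current cpuset
    cpus = []
    for part in spec.lstrip("+").split(","):
        part = part.strip()
        if not part:
            continue
        low, sep, high = part.partition("-")
        if sep:
            cpus.extend(range(int(low), int(high) + 1))  # inclusive range
        else:
            cpus.append(int(low))
    return relative, cpus


if __name__ == "__main__":
    print(parse_cpulist("+0-4,8-12"))
    print(parse_cpulist("10,11,16-21"))
```

The second call expands to eight CPUs, which is a quick way to confirm that a list has one CPU for each process you intend to place.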


numactl --hardware


dplace

•  dplace ensures the Linux kernel “pins” a thread (or series of threads) to a specific CPU core within a container. Once pinned, they do not migrate.

• By default, binds processes sequentially in a round-robin fashion against logical CPUs in the current cpuset.

•  Integrates with MPT (via omplace and environment variables).

•  It understands fork, exec, threads, etc.

• Helps to ensure optimal performance and to minimize runtime variability.


dplace Feature

•  Default memory allocation policy is node-local (first touch).

•  dplace allows processes to be bound to specific logical (within cpuset) CPUs.

•  Prevents migration (thread hopping).

•  May require knowledge of application.

•  Global load balancing.


dplace Command Line Options

•  dplace

-c CPU list

-e exact placement

-s skip n CPUs before starting placement

-n only processes with name

-x skip mask

-p placement file

-r replicate shared text to each node

-q list global count


dplace examples

•  dplace -c 0-3 ./a.out # places threads on the first four CPUs, beginning with core 0.

•  dplace -c 0-7 -x2 ./a.out # places threads on the first 8 CPUs, but uses the skip mask (-x2) to skip the second thread (which in the case of Intel OpenMP is the lightweight monitor thread).

•  mpirun -np 8 dplace -s1 -c 0-7 ./a.out # skips the first process, as this process is essentially the MPI shepherd. dplace handles the placement of the MPI ranks.
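
To make the effect of the skip count and skip mask concrete, the following Python sketch simulates the placement order described above. It is a toy model, not dplace: `simulate_placement` is a hypothetical function, and it assumes that processes are considered in creation order, that `skip` leaves the first n processes unplaced, that bit N of `skip_mask` leaves process N+1 unplaced, and that placement wraps around the CPU list.

```python
def simulate_placement(cpus, n_processes, skip=0, skip_mask=0):
    """Return one entry per process in creation order: a CPU, or None if unplaced."""
    placement = []
    placed = 0
    for index in range(n_processes):
        skipped = index < skip or (skip_mask >> index) & 1
        if skipped:
            placement.append(None)  # left to the normal scheduler
        else:
            placement.append(cpus[placed % len(cpus)])  # round-robin over the list
            placed += 1
    return placement


if __name__ == "__main__":
    # like "dplace -c 0-3": four threads on CPUs 0 to 3
    print(simulate_placement([0, 1, 2, 3], 4))
    # like "-x2": the second thread (bit 1) is not placed
    print(simulate_placement(list(range(8)), 4, skip_mask=2))
    # like "-s1" under mpirun -np 8: shepherd skipped, 8 ranks placed
    print(simulate_placement(list(range(8)), 9, skip=1))
```

In the last case the shepherd stays unpinned while each of the eight ranks lands on its own CPU, which is the outcome the mpirun example is after.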


numactl and dplace

•  Consider a code that runs with 4 threads.

• What is the difference between

numactl --physcpubind=0-3 a.out

dplace -c 0-3 a.out

• With dplace, each thread is bound to a particular CPU. With numactl, the threads are bound to the range of CPUs 0-3 and are free to migrate within that range.

•  numactl does have memory binding options.


omplace

•  Tool for controlling the placement of MPI processes and OpenMP threads.

-c cpulist: specifies the effective CPU list.

-nt threads: specifies the number of threads per MPI process.

-s skip: the number of processes to skip before placement starts.

-vv verbose: the automatically generated placement file is displayed in its entirety.


omplace examples

•  mpirun -np 2 omplace -nt 4 -vv ./a.out # runs 2 MPI processes with 4 threads per process and displays the generated placement file.


dlook

•  Tool for showing process memory maps and CPU usage.

•  View address space and page placement.

•  Two forms

dlook [options] pid

dlook [options] <command> [command-args]

•  Run an MPI job using mpirun and print the memory map for each thread:

mpirun -np 8 dlook a.out


Summary

•  Use cpumap to determine partitioning and placement.

•  Use taskset to lock a process or process group onto a CPU or group of CPUs.

•  Use dplace to place a process group into the system topology.

•  When running an MPI/OpenMP hybrid, use omplace for pinning.

•  Use numactl to control memory placement.


Tips!!!

•  Use dplace, numactl, or cpuset to lock down processes, preventing thread hopping/migration.

•  Strong cache affinity reduces cache misses and instruction pipeline flushes.

•  Keep processes close to their node-local memory.

•  Be aware of data placement.


Heisenberg Principle

•  Looking at the system will impact the system.

•  Tracing tools have the highest impact: strace, gprof.

•  PCP and sar have the lowest impact.

•  You cannot measure a system without affecting it. top will show up in the top display.

•  PCP uses less than 1% of a CPU.


sar

•  sar indicates normal/abnormal behavior of the system. sar can point to performance problems and bottlenecks.

•  Many people treat sar as a set of performance metrics, which it is not. It is an indicator of what a system is doing!

•  PCP and sar simply tell you what to look for.


sar

•  sar -vq to check kernel table sizes.

•  sar -W to check swapping activity.

•  sar -rsW to see what memory and swap are left.

•  sar -u reports CPU utilization, including the amount of time spent executing kernel code.


top, ps, pstree

•  top provides a dynamic real-time view of a running system.

•  top with H provides thread information.

•  ps: reports a snapshot of the currently running processes on the system. Use with grep <username> to get user-specific information.

•  pstree: display a tree of processes.


vmstat, mpstat

•  vmstat reports information about processes, memory, paging, block IO, traps, and CPU activity.

•  mpstat writes to standard output activities for each available processor, processor 0 being the first one. Global average activities among all processors are also reported.


mpvis

•  mpvis displays a three-dimensional bar chart of CPU utilization. The display is updated with new values retrieved from the target host or archive every interval seconds.


pidstat

•  pidstat is used for monitoring individual tasks currently being managed by the Linux kernel.

-r report page faults and memory utilization

-d report I/O statistics

-u report CPU utilization

-p select tasks for which statistics are to be reported

-t display statistics for threads associated with selected tasks

•  pidstat -t -p 14374


cpuset-Q

•  It gives information about allocated CPUs, nodes, PID, WCHAN, command name, etc.



References

•  UV System Analysis Manual

•  UV System Administration Manual

•  Technical Advances in the SGI Altix UV Architecture (white paper)

•  A Hardware-Accelerated MPI Implementation on SGI Altix UV Systems (white paper)

•  Linux Application Tuning Guide for SGI X86_64 Based Systems

•  SGI Message Passing Toolkit (MPT) User’s Guide

•  SGI NUMAlink white paper

