CPU Monitoring and Tuning SMT

Date posted: 02-Apr-2018

    Introduction

    AIX 5L Version 5.3 is the latest version of the AIX operating system that offers

    simultaneous multi-threading (SMT) on eServer p5 systems to deliver industry leading

    throughput and performance levels. With support for advanced virtualization, AIX 5L

    Version 5.3 helps you to dramatically increase your server utilization and consolidate workloads for more efficient management.

    A review of computing history and operating systems shows that computer scientists have

    developed many CPU scheduling policies. First-in, first-out (FIFO), shortest job first, and

    round robin are just a few. Scheduling policies are important because a single policy might

    not be best suited to all applications. Some applications in certain workloads can run well in a

    default scheduling policy. However, the same applications with a different workload might

    require a scheduling policy adjustment in order to achieve the optimal performance.

    Note: This article is an update for AIX 5.3 performance; advanced virtualization is not

    discussed here. The article has enhancements and updates that emphasize AIX 5L Version 5.3 features, tools, and capabilities.


    What is SMT?

    SMT is the ability of a single physical processor to concurrently dispatch instructions from

    more than one hardware thread. In AIX 5L Version 5.3, a dedicated partition created with one

    physical processor is configured as a logical two-way by default. Two hardware threads can

    run on one physical processor at the same time. SMT is a good choice when overall

    throughput is more important than the throughput of an individual thread. For example, Web

    servers and database servers are good candidates for SMT.

    Viewing processor and attribute information

    By default, SMT is enabled, as shown in Listing 1 below.

    Listing 1. SMT status

    # smtctl

    This system is SMT capable.

    SMT is currently enabled.

    SMT threads are bound to the same physical processor.

    Proc0 has 2 SMT threads

    Bind processor 0 is bound with proc0

    Bind processor 2 is bound with proc0

    Proc2 has 2 SMT threads


    Bind processor 1 is bound with proc2

    Bind processor 3 is bound with proc2

    # lsattr -El proc0

    frequency 1656376000 Processor Speed False

    smt_enabled true Processor SMT enabled False

    smt_threads 2 Processor SMT threads False

    state enable Processor state False

    type PowerPC_POWER5 Processor type False

    The smtctl command gives privileged users and applications the ability to control the

    utilization of processors with SMT support. With this command, you can turn SMT on or off.

    The smtctl command syntax is:

    smtctl [ -m off | on [ -w boot | now ] ]

    What are shared processors?

    Shared processors are physical processors that are allocated to partitions on a timeslice basis.

    You can use any physical processor in the shared processor pool to meet the execution needs

    of any partition using the shared processor pool. An eServer p5 system can contain a mix of

    shared and dedicated partitions. A partition must be all shared or all dedicated, and you can

    not use dynamic LPAR (DLPAR) commands to change between the two. You need to bring

    down the partition and switch it from using dedicated to shared, or vice versa.

    Processing units

    After a partition is configured, you can assign it a number of processing units. A partition

    must have a minimum of 1/10 of a processor. After that requirement has been met, you

    can configure processing units at a granularity of 1/100 of a processor. A partition that uses

    shared processors is often called a shared partition. A dedicated partition is one that uses dedicated processors.

    Each partition is configured with a percentage of execution dispatch time for each 10

    milliseconds (ms) timeslice. For example:

    A partition with 0.2 processing units is entitled to 20 percent capacity during each timeslice.

    A partition with 1.8 processing units is entitled to 18 ms of processing time for each 10 ms timeslice (using multiple processors).

    There is no accumulation of unused cycles. If a partition does not use its entitled processing capacity, the excess processing time is ceded back to the shared processing pool.
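    The entitlement arithmetic above can be sketched in a few lines. This is a hypothetical illustration of the dispatch-window math, not an AIX API:

```python
def entitled_ms_per_timeslice(processing_units, timeslice_ms=10):
    """Processing time a partition is entitled to in each dispatch window.

    With more than 1.0 processing units, the entitlement is spread across
    multiple virtual processors within the same 10 ms window.
    """
    return processing_units * timeslice_ms

print(entitled_ms_per_timeslice(0.2))  # 2.0 ms -> 20 percent of one timeslice
print(entitled_ms_per_timeslice(1.8))  # 18.0 ms spread over multiple processors
```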


    Partitions with shared processors are either capped or uncapped. A capped partition is

    assigned a hard limit on capacity. An uncapped partition that needs extra CPU cycles (more

    than its total processing units) can utilize unused capacity in the shared pool.


    Scheduling algorithms

    AIX 5 implements the following scheduling policies: FIFO, round robin, and a fair round

    robin. The FIFO policy has three different implementations: FIFO, FIFO2, and FIFO3. The

    round robin policy is named SCHED_RR in AIX, and the fair round robin is called

    SCHED_OTHER. We discuss these policies in greater detail in the upcoming sections.

    Scheduling policies can have a major impact on system performance (response time and

    throughput), depending on how you assign and manage them. For example, FIFO is a good choice for a job that uses a lot of CPU, but it also can choke out all of the other jobs waiting

    in line. A basic round robin gives a "timeslice" or "quantum" to each job in a time-shared

    manner. As a result, it tends to discriminate against I/O-intensive tasks, since those tasks

    often give up CPU voluntarily due to I/O wait. The fair round robin is "fair" because

    scheduling priorities change as the jobs accumulate quantums of CPU time during execution.

    This allows the operating system to demote a CPU hugger so that an I/O bound job has a fair

    chance to use the CPU resource.

    Let's go over two important concepts before getting into the scheduling details: the nice value

    and the AIX priority and run queue structure.

    The nice and renice commands

    AIX has two important scheduling commands: nice and renice. A user job in AIX carries a

    base priority level of 40 and a default nice value of 20. Together, these two numbers form

    the default priority level of 60. This value applies to most of the jobs you see in a system.

    When you start a job with the nice command, such as nice -n 10 myjob, the number 10

    becomes the delta_NICE. This number is added to the default 20 to create a new nice value of 30. In AIX, the higher this number, the lower the priority. Using this example, your

    job now starts with a priority of 70, which is 10 levels worse in priority than the default.

    The renice command applies to a job that has already started. For example, the renice -n

    5 -p 2345 command causes process 2345 to have a nice value of 25. Note that the renice

    value is always applied to a base nice of 20, regardless of the current nice value of the

    process.
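    The nice and renice arithmetic above can be summarized in a small sketch. The function names are illustrative, not AIX interfaces:

```python
BASE_PRIORITY = 40
DEFAULT_NICE = 20

def start_priority(delta_nice=0):
    """Initial priority of a job started with `nice -n delta_nice` (no CPU charge yet)."""
    return BASE_PRIORITY + DEFAULT_NICE + delta_nice

def renice_value(delta_nice):
    """renice always applies its delta to the base nice of 20,
    regardless of the process's current nice value."""
    return DEFAULT_NICE + delta_nice

print(start_priority())    # 60: the default priority level
print(start_priority(10))  # 70: `nice -n 10 myjob`
print(renice_value(5))     # 25: `renice -n 5 -p 2345`
```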

    AIX priority and run queue structure

    A thread carries a priority range from 0 to 255 (the range is from 0 to 127 on systems prior to

    AIX 5). Priority 0 is the highest or the most favorable, and 255 is the lowest or least

    favorable. AIX maintains a run queue in the form of a 256-level priority queue to efficiently support the 256 priority levels of threads.


    AIX also implements a 256-bit array to map to the 256 levels of the queue. If a particular

    queue level is empty, the corresponding bit is set to 0. This design allows the AIX scheduler

    to quickly identify the first non-empty level and start the first ready-to-run job in that level.

    See the AIX run queue structure in Figure 1 below.

    Figure 1. Scheduler run queue

    In Figure 1, the scheduler maintains a run queue of all the threads that are ready to be dispatched. All dispatchable threads of a given priority occupy consecutive positions in the

    run queue.
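    The bitmap-assisted lookup described above can be modeled in a short sketch. This is a simplified, hypothetical model of the idea, not the actual kernel structure:

```python
from collections import deque

class RunQueue:
    """Toy 256-level priority run queue with a bitmap of non-empty levels."""
    LEVELS = 256

    def __init__(self):
        self.queues = [deque() for _ in range(self.LEVELS)]
        self.bitmap = 0  # bit i is set when priority level i is non-empty

    def enqueue(self, priority, thread):
        self.queues[priority].append(thread)
        self.bitmap |= 1 << priority

    def dispatch(self):
        """Pop the first ready thread at the best (lowest-numbered) level."""
        if self.bitmap == 0:
            return None
        level = (self.bitmap & -self.bitmap).bit_length() - 1  # lowest set bit
        thread = self.queues[level].popleft()
        if not self.queues[level]:
            self.bitmap &= ~(1 << level)
        return thread

rq = RunQueue()
rq.enqueue(60, "default-priority job")
rq.enqueue(0, "most-favored job")
print(rq.dispatch())  # the bitmap finds level 0 first: "most-favored job"
```

The bitmap lets the dispatcher find the first non-empty level with one bit operation instead of scanning all 256 levels.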

    AIX 5L implements one run queue for each CPU and a global queue. For example, there are

    32 run queues and one global queue in an eServer pSeries p590 machine. With a per-CPU

    run queue, a thread has better chance to go back to the same CPU after a preemption, which

    is an affinity enhancement. Also, the contention among CPUs to lock the run queue structure

    is much reduced with multiple run queues.

    However, for some situations, a multiple run queue structure might not be desirable.

    Exporting a system environment variable RT_GRQ=ON can cause a thread to be placed on

    the global run queue when it becomes runnable. This can improve performance for threads

    that are interrupt-driven and running SCHED_OTHER. If schedo -o fixed_pri_global=1 is run on AIX 5L Version 5.2 and later, threads running at fixed priority are placed on the global run queue.

    For local run queues, the dispatcher picks the best priority thread in the run queue when a

    CPU is available. When a thread has been running on a CPU, it tends to stay on that CPU's

    run queue. If that CPU is busy, then the thread can be dispatched to another idle CPU and

    assigned to that CPU's run queue.

    FIFO


    Although the FIFO policy is the simplest, it is rarely used because of its non-preemptive

    nature. A thread with this scheduling policy runs all the way to completion, unless one of the

    following happens:

    It gives up the CPU voluntarily by executing a function that puts the thread to sleep, such as sleep() or select().

    It gets blocked due to resource contention.

    It has to wait for I/O completion.

    The checkout lane at a grocery store uses a typical FIFO policy. Imagine yourself in the

    checkout lane with only one TV dinner (and you're hungry), but the person in front has a full

    load in his cart. What can you do? Not much. Since this is a FIFO, you must wait patiently

    for your turn.

    Similarly, it is obvious that job response time can suffer severely if several tasks are running

    in FIFO mode. Consequently, FIFO is rarely used in AIX. Only a process owned by root

    can set itself or another thread to FIFO with the thread_setsched() system call.

    There are two variations of the FIFO policy: FIFO2 and FIFO3. With FIFO2, a thread is

    put at the head of its run queue if it was asleep for only a short period of time, less than a

    predefined number of ticks (affinity_lim ticks, tunable with the schedo -p command).

    This allows a thread to have a good chance to reuse the cache content. For FIFO3, a thread is

    always put at the head of the queue when it becomes runnable.

    Round robin

    The well-known round robin scheduling policy is even older than UNIX itself. AIX 5L implements round robin on top of its multilevel priority queue of 256 levels. At a given

    priority level, a round robin thread shares the CPU timeslices with all other entries of the

    same priority. A thread is scheduled to run until one of the following occurs:

    It yields the CPU to other tasks.

    It is blocked for I/O.

    It uses up its timeslice.

    When the timeslice is exhausted, if a thread of equal or better priority is available to run on

    that CPU, the thread that is currently running is then placed at the end of the queue for the

    next turn to own the processor. A thread can be preempted because a higher-priority job wakes up or a device interrupt occurs (for example, after an I/O completes).

    For a round robin task only, this preempted thread is placed at the beginning of its queue

    level, because AIX wants to ensure that a round robin job has a full timeslice before it is

    moved to the end of the round robin chain. It is important to note that the priority of a round

    robin thread is fixed and does not change over time. This makes the priority of a round robin

    task persistent (as opposed to the changing priorities in fair round robin) and more

    predictable.

    Since a round robin thread has special status, only root can set a thread to run with the round

    robin scheduling policy. To set SCHED_RR for a thread, use one of the following application programming interfaces (APIs): thread_setsched() or setpri().


    SCHED_OTHER

    This last scheduling policy is also the default. While trying to establish the fairest policy

    among tasks, this innovative SCHED_OTHER algorithm was created with a not so innovative

    POSIX-defined name. The AIX SCHED_OTHER is a priority-queue round robin design at the

    core, with one major difference: the priority is no longer fixed. If a task is using an excessive amount of CPU time, its priority level should be downgraded to allow other jobs an

    opportunity to access the CPU.

    If a task is at a priority level so low (a high number) that it does not have an opportunity to

    run, then its priority should be upgraded to a higher level (a lower number) so it can run to

    finish. A new concept was also implemented to further enhance the effectiveness of the nice

    value: If a task is nice (the UNIX nice value) at the beginning, the system will then force it

    to be nice all the time. I discuss this feature later.

    Traditional CPU utilization

    Prior to AIX 5.3 or with SMT disabled, AIX processor utilization uses a sample-based

    approach to approximate:

    The percentage of processor time spent executing user programs

    The percentage spent executing system code

    The percentage spent waiting for disk I/O

    Idle time

    AIX produces 100 interrupts per second to take samples. At each interrupt, a local timer tick

    (10 ms) is charged to the thread that was running when the timer interrupt occurred. One of the following utilization categories is chosen based on the state of the interrupted thread:

    If the thread was executing kernel code through a system call, the entire tick is charged to the process's system time.

    If the thread was executing application code, the entire tick is charged to the process's user time.

    Otherwise, if the current running thread was the operating system's idle process, the tick is charged to a separate variable.

    The problem with this method is that the process receiving the tick most likely did not run

    for the entire timer period; it merely happened to be executing when the timer expired. With

    SMT enabled on AIX 5.3, the traditional utilization metrics are also misleading because of

    the two logical processors. If one logical thread is 100 percent busy and the other is idle,

    sample-based accounting would report 50 percent utilization. But in reality, if one SMT

    thread is using all the CPU resources, that CPU is 100 percent busy, as reported by the new

    Processor Utilization Resource Register (PURR)-based method.

    PURR

    Beginning in AIX 5.3, the number of dispatch cycles for each thread can be measured using a

    new register called the PURR. Each physical processor has two PURR registers (one for each

    hardware thread). The PURR is a new register provided by the POWER5 processor, which is

    used to provide an actual count of physical processing time units that a logical processor has used. All performance tools and APIs utilize this PURR value to report CPU utilization

  • 7/27/2019 Cpu Monitoring and Tunig SMIT

    7/26

    metrics for SMT systems. This register is a special-purpose register that can be read or

    written by the POWER Hypervisor; however, it is read-only by the operating system.

    The hardware increments each PURR based on how its thread is using the resources of

    the processor, including the dispatch cycles that are allocated to each thread. For a cycle in

    which no instructions are dispatched, the PURR of the thread that last dispatched an

    instruction is incremented. The register advances automatically, so the operating system can always get the current, up-to-date value.

    When the processor is in single-thread mode, the PURR increments by one every eight

    processor clock cycles. When the processor is in SMT mode, the thread that dispatches a

    group of instructions in a cycle increments the counter by 1/8 in that cycle. If no group

    dispatch occurs in a given cycle, both threads increment their PURR by 1/16. Over a period

    of time, the sum of the two PURR registers, when running in SMT mode, should be very

    close to, but not greater than, the number of timebase ticks.
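    The increment rules just described can be checked with a toy model. It assumes, as a simplification, at most one group dispatch per cycle and a timebase that advances one tick per eight clock cycles; it is a sketch of the accounting, not the hardware:

```python
def simulate_purr(dispatches, cycles):
    """Toy model of the SMT-mode PURR increment rule.

    dispatches[c] is 0 or 1 for the thread that dispatched a group in
    cycle c, or None if no group dispatch occurred in that cycle.
    Returns (purr0, purr1, timebase), all in timebase-tick units.
    """
    purr = [0.0, 0.0]
    for c in range(cycles):
        d = dispatches[c]
        if d is None:
            purr[0] += 1 / 16  # no dispatch: both threads get 1/16
            purr[1] += 1 / 16
        else:
            purr[d] += 1 / 8   # dispatching thread gets 1/8
    timebase = cycles / 8      # timebase advances one tick per 8 cycles
    return purr[0], purr[1], timebase

# Thread 0 dispatches in 6 of every 8 cycles, thread 1 in 2 of every 8:
pattern = [0, 0, 0, 1, 0, 0, 0, 1] * 100
p0, p1, tb = simulate_purr(pattern, len(pattern))
print(p0, p1, tb)
assert abs((p0 + p1) - tb) < 1e-9  # the PURR sum tracks the timebase ticks
```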

    AIX 5.3 CPU utilization

    In AIX 5L V5.3, new metrics collected by the kernel are state-based rather than

    sample-based. State-based collection accumulates information on PURR increments

    rather than at a fixed 10 ms interval. AIX 5.3 uses the PURR for process accounting.

    Instead of charging the entire 10ms clock tick to the interrupted process as before, processes

    are charged on the PURR delta for the hardware thread since the last interval. At each

    interrupt:

    The elapsed PURR is calculated for the current sample period.

    This value is added to the appropriate utilization category (user, sys, iowait, or idle), instead of the fixed-size increment (10 ms) that was previously added.

    There are two different quantities to measure: the thread's processor time and the elapsed

    time. To measure the elapsed time, the timebase register (TB) is still used. The physical

    resource utilization metrics for a logical processor are:

    (delta PURR / delta TB) represents the fraction of the physical processor consumed by a logical processor.

    (delta PURR / delta TB) * 100 over an interval represents the percentage of dispatch cycles given to a logical processor.

    CPU utilization example

    Assume two threads are running on one physical processor with SMT enabled, and both

    SMT threads are busy. Using the old tick-based method, both SMT threads would be

    reported as 100 percent busy, but in reality they are sharing the CPU resources evenly.

    The new PURR-based method therefore shows each SMT thread as 50 percent busy.

    Using the PURR methods, each logical processor reports a utilization of 50 percent

    representing the proportion of physical processor resources that it used, assuming equal

    distribution of physical processor resources to both the hardware threads.
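    The (delta PURR / delta TB) calculation behind this example is trivial to sketch; the numbers here are illustrative:

```python
def logical_utilization(delta_purr, delta_tb):
    """Fraction of the physical processor consumed by one logical processor,
    per the (delta PURR / delta TB) metric described above."""
    return delta_purr / delta_tb

# Two busy SMT threads sharing one physical processor evenly over an
# interval of 1000 timebase ticks: each accumulates ~500 PURR ticks.
for purr in (500, 500):
    print(f"{logical_utilization(purr, 1000):.0%}")  # 50% for each thread
```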

    Additional CPU utilization metrics


    The following metrics use the per-thread PURR method to measure each thread's processor

    time and the TB register to measure the elapsed time.

    Table 1. Per-thread PURR method

    Metric: %sys = (delta PURR in system mode / entitled PURR) * 100, where entitled PURR = ENT * delta TB, and ENT is the entitlement in number of processors (entitlement / 100)
    Information provided: Physical CPU utilization, calculated using the PURR-based samples and entitlement.

    Metric: sum(delta PURR / delta TB) for each logical processor in a partition
    Information provided: The Physical Processors Consumed (PPC) over an interval.

    Metric: (PPC / ENT) * 100
    Information provided: The percentage of entitlement consumed.

    Metric: (delta PIC / delta TB), where PIC is the Pool Idle Count, which represents the clock ticks during which the POWER Hypervisor was idle
    Information provided: The available pool of processors.

    Metric: Sum of the traditional 10 ms tick-based %sys and %user
    Information provided: Logical processor utilization, which helps you determine whether more virtual processors should be added to a partition.


    AIX 5.3 command changes

    When AIX is running with SMT enabled, commands that display CPU information, such as

    vmstat, iostat, topas, and sar, display the PURR-based statistics, rather than the

    traditional sample-based statistics. In SMT mode, additional columns of information are

    displayed, as shown in Table 2 below.

    Table 2. SMT mode

    pc or physc: Physical Processors Consumed by the partition

    pec or %entc: Percentage of Entitlement Consumed by the partition

    Another tool that needed modification was trace/trcrpt, along with several other tools that are based on the trace utility. In an SMT environment, trace can optionally collect PURR register

    values at each trace hook, and trcrpt can display the elapsed PURR.

    Table 3 below shows the arguments to use for SMT.

    Table 3. Arguments for SMT

    trace -r PURR: Collects the PURR register values. Only valid for a trace run on a 64-bit kernel.

    trcrpt -O PURR=[on|off]: Tells trcrpt to show the PURR, along with any timestamps.


    netpmon -r PURR: Uses the PURR time instead of timebase in percent and CPU calculations. Elapsed time calculations are unaffected.

    pprof -r PURR: Uses the PURR time instead of timebase in percent and CPU calculations. Elapsed time calculations are unaffected.

    gprof: GPROF is the new environment variable to support SMT.

    curt -r PURR: Specifies the use of the PURR register to calculate CPU times.

    splat -p: Specifies the use of the PURR register to calculate CPU times.


    Thread priority formulas

    You can calculate the priority of a thread using the formulas shown in Listing 2 below. The

    priority is a function of the nice value, the CPU usage C, and a tuning factor r.


    How AIX calculates the new priority

    The clock timer interrupt occurs every 10ms or 1 tick on each CPU. The timers are staggered

    so that a CPU's clock timer does not go off at the same time as another CPU's clock timer.

    When the CPU clock timer interrupt occurs (even before the thread has run for a full 10ms),

    the thread has its CPU usage value (the CPU charge) incremented by one, up to a maximum

    of 120. If a job does not get a full 10 ms slice and is running the RR policy, the system dispatcher changes the thread's priority in the run queue to allow it to run again soon.

    The priority of most user processes varies with the amount of CPU time the process has used

    recently. The CPU scheduler's priority calculations are based on two parameters that are set

    with schedo: sched_R and sched_D. The sched_R and sched_D values are in units of 1/32 seconds.

    The scheduler uses this formula to calculate the amount to add to a process's priority value as

    a penalty for recent CPU use. For example:

    CPU penalty = (recently used CPU value of the process) * (r/32)

    The recalculation (once per second) of the recently used CPU value of each process is:

    New recently used CPU value = (old recently used CPU value of the process) * (d/32)

    Both r (sched_R parameter) and d (sched_D parameter) have default values of 16.

    The recent CPU charge C is then used to determine the priority penalty and to recalculate the new thread priority. Using the first formula as a reference (see Listing 2), you know that a


    newly started user task, which carries a base priority of 40, a default nice value of 20, and no

    CPU charge so far (C=0), begins with a priority level 60.

    Also, in the first formula, the value r determines the penalty ratio with a range from zero to

    32. An r value of zero means no CPU penalty charge, since the penalty term is always zero

    (C*r/32). If r = 32, it yields the highest possible penalty charge for CPU use: each tick (10 ms) of CPU usage translates to one priority-level downgrade.

    In most cases, the value of r lies near the middle between zero and 32. AIX defaults r to 16;

    that is, every two ticks of CPU charge become one level of priority penalty. When the r value

    is high, the impact of the nice value becomes less important, since the CPU usage penalty

    prevails. A smaller r, on the contrary, makes the effect of the nice value more obvious.

    Based on this discussion, the effectiveness of the nice value diminishes after a while. The

    reason for this is because the CPU charge grows in time and gradually becomes the main

    factor in determining the new priority.

    This formula has been modified in AIX 5L to increase the weight of the nice value in

    calculating the priority level. In AIX 5L, two new factors have

    been introduced: x_nice and x_nice_factor ("extra nice" and "extra nice factor"). See the

    second formula in Listing 2 below.

    Listing 2. Thread priority formulas

    Priority = p_nice + (C * r/32)                    (1)
    Priority = x_nice + (C * r/32 * x_nice_factor)    (2)

    Where:
      p_nice = base_PRIORITY + NICE
      base_PRIORITY = 40
      NICE = 20 + delta_NICE  (20 is the default nice value)
      That is, p_nice = 60 + delta_NICE
      C is the CPU usage charge; the maximum value of C is 120

    If NICE > 20, then:
      x_nice = p_nice * 2 - 60, or
      x_nice = p_nice + delta_NICE, or                (3)
      x_nice = 60 + (2 * delta_NICE)                  (3a)
      x_nice_factor = (x_nice + 4)/64                 (4)

    Priority has a maximum value of 255.

    As you can see from Formulas 2 and 3, x_nice doubles the increase in the

    nice value. The x_nice_factor further strengthens the r ratio. For example, an initial nice of

    16, which gives a nice value of 36, results in an x_nice_factor of 1.5. That is a 50

    percent higher penalty for the CPU usage part over the lifetime of the thread.
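    The formulas in Listing 2 can be checked with a short sketch. It is illustrative only (the kernel computes these values internally, with its own integer arithmetic):

```python
def x_nice(delta_nice):
    # Formula (3a): applies when NICE exceeds the default of 20
    return 60 + 2 * delta_nice

def x_nice_factor(delta_nice):
    # Formula (4)
    return (x_nice(delta_nice) + 4) / 64

def priority(c, delta_nice=0, r=16):
    """Formulas (1)/(2): priority from CPU charge C, nice delta, and schedo's r."""
    if delta_nice > 0:
        p = x_nice(delta_nice) + c * (r / 32) * x_nice_factor(delta_nice)
    else:
        p = 60 + delta_nice + c * (r / 32)
    return min(int(p), 255)  # priority caps at 255

print(x_nice_factor(16))  # 1.5: the 50 percent extra penalty from the text
print(priority(0))        # 60: a fresh default-nice thread
```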


    Decaying the CPU usage

    It is possible that a thread can get a priority so low that it never has a chance to run. This

    would occur if you use only Formulas 1 and 2 without a mechanism to push a thread's

    priority level back up.

    When a thread runs with SCHED_OTHER, its priority is degraded for its use of CPU time. When

    it is not running and is waiting for its turn, AIX helps it regain priority by "decaying" its

    CPU charge, about once a second. The rule is simple: a CPU-bound job should be assigned

    a lower priority to allow other jobs to run, but it should not be discriminated against to the

    point that it cannot finish. The CPU charge of every thread is decayed once per second by a

    predefined factor, as follows:

    New Charge C = (Old Charge C) * d / 32 (5)

    A kernel process, swapper, does this job. Once every second, swapper wakes up and decays the CPU charge of all threads. The default decay factor is 0.5 (d = 16), which

    "discounts" or "waives" half of the CPU charge.

    With this mechanism, a CPU-intensive job accumulates CPU charge, gets to a lower priority

    level, and then advances to a much higher level at the end of a second. On the other hand, an

    I/O-intensive job does not vary its priority up and down as much, since it generally

    accumulates less CPU time.


    Have you exhausted your CPU?

    Now that you understand how the AIX scheduler prioritizes the workload, let's look at several

    commonly used commands. If AIX seems to take too long to finish your workload or it does

    not respond quickly enough, try these commands to investigate whether your system is CPU-

    bound: vmstat, iostat, and sar.

    We do not discuss all the possible ways to use these commands, but instead emphasize the

    information they convey to you. For a detailed description of these commands, see your AIXpublications or visit the IBM System p and AIX Information Center at

    http://publib16.boulder.ibm.com/pseries/index.htm. Scroll down, if necessary, and click AIX 5L Version 5.3 information center to start using the AIX 5 publications.

    The priority change history of a thread

    Listing 3 shows how the CPU charge can change the priority of a thread:

    Listing 3. Change of CPU charge and the priority of a thread

    Base priority is 40Default NICE value is 20, assume task was run using thedefault nice value


p_nice = base_priority + NICE = 40 + 20 = 60
    Assume r = 2 to slow down the penalty increase (default r value is 16)
    Priority = p_nice + C*r/32 = 60 + C * r / 32

    Tick 0    P = 60 + 0 * 2 / 32 = 60
    Tick 1    P = 60 + 1 * 2 / 32 = 60
    Tick 2    P = 60 + 2 * 2 / 32 = 60
    ...
    Tick 15   P = 60 + 15 * 2 / 32 = 60
    Tick 16   P = 60 + 16 * 2 / 32 = 61
    Tick 17   P = 60 + 17 * 2 / 32 = 61
    ...
    Tick 100  P = 60 + 100 * 2 / 32 = 66
    Tick 100  Swapper decays the CPU usage charges of all threads:
              New CPU charge C = (Current CPU charge) * d / 32
              Assume d = 16 (the default)
              For the test thread, new C = 100 * 16 / 32 = 50
    Tick 101  P = 60 + 51 * 2 / 32 = 63

Listing 4 contrasts a typical CPU-bound job (fast) with a mostly sleeping one (slow):

Listing 4. Priority change of a typical CPU-bound job (fast versus slow)

    fast.c:

    main()
    {
        for (;;)
            ;
    }

slow.c:

    main()
    {
        sleep(80);
    }


    Common commands

    The vmstat, iostat, and sar commands are used frequently for CPU monitoring. You

    should be familiar with the usage and the meaning of the reports each command generates.

    vmstat

    The vmstat command provides an overview of resource utilization through a report of CPU,

disk, and memory activity in a one-line-per-report format. The sample output in Listing 5 is

    generated on an AIX 5L Version 5.3 system running "vmstat 1 6". This report was

    generated every second, as requested. Since a count of six was specified following the

    interval, reporting stops after the sixth report. One popular way to run the vmstat command

    is to leave out the count parameter; vmstat then generates reports continuously until the

    command terminates.


    Except for the avm and fre columns, the first report contains average statistics per second

    since system startup. Subsequent reports contain statistics collected during the interval since

    the previous report.

    Beginning with AIX 5L Version 5.3, the vmstat command reports the number of physical

    processors consumed (pc) and the percentage of entitlement consumed (ec) in the Micro-Partitioning and SMT environments. These metrics only display on Micro-Partitioning and

    SMT environments.

    AIX 5L adds a useful new option "-I" to vmstat that shows the number of threads waiting

    for the raw I/O to complete (p column) and the number of file pages paged in/out per second

    (fi/fo columns).

The following detailed descriptions of the columns convey useful information about CPU

    utilization. Listing 5 shows the output of the vmstat 1 6 command:

Listing 5. Output of the vmstat 1 6 command from a p520 system (two CPUs)

    vmstat 1 6

    System configuration: lcpu=4 mem=15808MB

    kthr    memory              page                    faults         cpu
    ----- ------------- ------------------------- --------------- -----------
     r  b    avm    fre  re  pi  po  fr  sr  cy   in    sy   cs   us sy id wa
     1  1 110996 763741   0   0   0   0   0   0  231    96   91    0  0 99  0
     0  0 111002 763734   0   0   0   0   0   0  332  2365  179    0  1 99  0
     0  0 111002 763734   0   0   0   0   0   0  330  2283  139    0  5 93  1
     0  0 111002 763734   0   0   0   0   0   0  310  2212  153    0  0 99  0
     1  0 111002 763734   0   0   0   0   0   0  314  2259  173    0  0 99  0
     0  0 111002 763734   0   0   0   0   0   0  321  2261  177    0  1 99  0

    Figure 2shows the output of the command vmstat -I 1 (issued during a software

    installation):

    Figure 2. Output of the vmstat -I 1 command


See Table 4 below for a listing of relevant columns with descriptions.

    Table 4. Description of relevant columns

Column  Description

    kthr    Kernel thread state changes per second over the sampling interval.

    r       Number of kernel threads placed in the run queue.

    b       Number of kernel threads placed in the Virtual Memory Manager (VMM) wait queue (awaiting resource, awaiting input/output).


p       The number of threads waiting on raw I/O (bypassing the journaled file system (JFS)) to complete. This is only available on AIX 5 and later.

    fi/fo   Number of file pages paged in/out per second. Note: This column is available only on AIX 5 and later systems.

    cpu     Breakdown of percentage usage of CPU time. For multiprocessor systems, CPU values are global averages among all processors. Also, the I/O wait state is defined system-wide and not per processor.

    us      Average percentage of CPU time executing in user mode.

    sy      Average percentage of CPU time executing in system mode.

    id      Average percentage of time that CPUs were idle and the system did not have an outstanding disk I/O request.

    wa      CPU idle time during which the system had outstanding disk/NFS I/O request(s). If there is at least one outstanding I/O to a disk when wait is running, the time is classified as waiting for I/O. Unless asynchronous I/O is being used by the process, an I/O request to disk causes the calling process to block (or sleep) until the request has been completed. Once an I/O request for a process completes, it is placed on the run queue. If the I/Os were completing faster, more CPU time could be used.

    pc      Number of physical processors consumed. Displayed only if the partition is running with shared processors.

    ec      The percentage of entitled capacity consumed. Displayed only if the partition is running with shared processors.

A CPU is marked wio at the time of a clock interrupt (every 1/100 of a second) if the CPU is idling

    and an outstanding I/O was initiated on that CPU. If a CPU is only idling, with no outstanding

    I/O from that CPU, it is marked as id instead of wa. For example, a system with four CPUs

    and one thread doing I/O reports a maximum of 25 percent wio time. A system with 12 CPUs

    and one thread doing I/O reports a maximum of 8.3 percent wio time. To be precise, wio measures the percentage of time the CPU is idle while waiting for an I/O to complete.

These four columns should total 100 percent, or very close. If the sum of the user and system (us

    and sy) CPU-utilization percentages consistently approaches 100 percent, the system might

    be encountering a CPU bottleneck.

    iostat

    The iostat command is used primarily to monitor system input and output devices, but it

can also provide CPU utilization data. Beginning with AIX 5.3, the iostat command reports

    the number of physical processors consumed (physc) and the percentage of entitlement consumed (%entc) in Micro-Partitioning and SMT environments. These metrics are only

    displayed in Micro-Partitioning/SMT environments. When SMT is enabled, iostat

    automatically uses new PURR-based data and formulas for:

    %user %sys %wait %idle

Listing 6 is generated on an AIX 5L Version 5.3 system by entering "iostat 5 3", as

    follows:


Listing 6. iostat report

    System configuration: lcpu=4 drives=9

    tty:  tin   tout   avg-cpu:  %user  %sys  %idle  %iowait
          0.0   4.3              0.2    0.6   98.8   0.4

    Disks:    %tm_act  Kbps   tps   Kb_read  Kb_wrtn
    hdisk0    0.0      0.2    0.0   7993     4408
    hdisk1    0.0      0.0    0.0   2179     1692
    hdisk2    0.4      1.5    0.3   67548    59151
    cd0       0.0      0.0    0.0   0        0

    tty:  tin   tout   avg-cpu:  %user  %sys  %idle  %iowait
          0.0   30.3             8.8    7.2   83.9   0.2

    Disks:    %tm_act  Kbps   tps   Kb_read  Kb_wrtn
    hdisk0    0.2      0.8    0.2   4        0
    hdisk1    0.0      0.0    0.0   0        0
    hdisk2    0.0      0.0    0.0   0        0
    cd0       0.0      0.0    0.0   0        0

    tty:  tin   tout   avg-cpu:  %user  %sys  %idle  %iowait
          0.0   8.4              0.2    5.8   0.0    93.8

    Disks:    %tm_act  Kbps   tps   Kb_read  Kb_wrtn
    hdisk0    0.0      0.0    0.0   0        0
    hdisk1    0.0      0.0    0.0   0        0
    hdisk2    98.4     75.6   61.9  396      2488
    cd0       0.0      0.0    0.0   0        0

    Example iostat with SPLPAR configuration

    # iostat -t 2 3

    System configuration: lcpu=4 ent=0.80

    avg-cpu:  %user  %sys  %idle  %iowait  physc  %entc
              0.1    0.2   99.7   0.0      0.0    0.9
              0.1    0.4   99.5   0.0      0.0    1.1
              0.1    0.2   99.7   0.0      0.0    0.9

Just like the vmstat command report, the first report contains statistics averaged since the system started up. Subsequent reports contain statistics collected during the interval since the

    previous report.

    The four columns that show the breakdown of CPU usage time convey the same information

    as the vmstat command. The columns should total approximately 100 percent. If the sum of

the user and system (us and sy) CPU-utilization percentages consistently approaches 100 percent,

    the system might be encountering a CPU bottleneck.

    On systems running one application, a high I/O wait percentage might be related to the

    workload. On systems with many processes, some will be running while others wait for I/O.

    In this case, the %iowait can be small or zero because running processes "hide" some wait

    time. Although %iowait is low, a bottleneck can still limit application performance. If the

    iostat command indicates that a CPU-bound situation does not exist and %iowait time is

    greater than 20 percent, you might have an I/O or disk-bound situation.

    sar

    The sar command has two forms: The first form samples, displays, and/or saves systemstatistics and the second form processes and displays previously captured data. The sar


    command can provide queue and processor statistics just like the vmstat and iostat

    commands. However, it has two additional features:

Each sample has a leading time stamp, and an overall average appears at the end of the samples.

    The -P option can be used to generate per-processor statistics, in addition to the global averages among all processors. The sample output below came from a four-way symmetric multiprocessor (SMP) system after entering two commands:

    sar -o savefile 5 3 > /dev/null &

    Note: This command collects the data three times at five-second intervals, saves the collected data in savefile, and redirects the report to /dev/null so that no report is written to the terminal.

    sar -P ALL -u -f savefile

    Note: The -P ALL flag is specified to get per-processor statistics for each individual processor, and -u requests CPU usage data. In addition, -f savefile tells sar to generate the report from the data saved in savefile. The sar -P ALL output for all logical processors with SMT enabled shows the physical processor consumed, physc (delta PURR/delta TB). This column shows the relative SMT split between processors -- in other words, it measures the fraction of time a logical processor was getting physical processor cycles. Whenever the percentage of entitled capacity consumed is under 100 percent, a line beginning with U is added to represent the unused capacity. When running in shared mode, sar displays the percentage of entitlement consumed, %entc, which is ((PPC/ENT)*100).

Listing 7. A typical sar report from a 2-way p520 system with dedicated LPAR configuration

    AIX nutmeg 3 5 00CD241F4C00    06/14/05

    System configuration: lcpu=4

    11:51:33  cpu  %usr  %sys  %wio  %idle  physc
    11:51:34    0     0     0     0    100   0.30
                1     1     1     1     98   0.69
                2     2     1     0     96   0.69
                3     0     0     0    100   0.31
                -     1     1     0     98   1.99
    11:51:35    0     0     0     0    100   0.31
                1     0     0     0    100   0.69
                2     0     0     0    100   0.73
                3     0     0     0    100   0.31
                -     0     0     0    100   2.04
    11:51:36    0     0     0     0    100   0.31
                1     0     0     0    100   0.69
                2     0     0     0    100   0.70
                3     0     0     0    100   0.31
                -     0     0     0    100   2.01
    11:51:37    0     0     0     0    100   0.31
                1     0     0     0    100   0.69
                2     0     0     0    100   0.69
                3     0     0     0    100   0.31
                -     0     0     0    100   2.00

    Average     0     0     0     0    100   0.31
                1     0     0     0     99   0.69
                2     1     0     0     99   0.70
                3     0     0     0    100   0.31
                -     0     0     0     99   2.01

    mpstat

The mpstat command collects and displays performance statistics for all logical CPUs in the

    system. If SMT is enabled, the mpstat -s command displays physical as well as logical

    processor usage, as shown in Listing 8 below.

Listing 8. A typical mpstat report from a 2-way p520 system with SPLPAR configuration

    System configuration: lcpu=4

     Proc0              Proc1
    63.65%             63.65%

      cpu2    cpu0      cpu1    cpu3
    58.15%   5.50%    61.43%   2.22%

    lparstat

    The lparstat command provides a report of LPAR-related information and utilization

    statistics. This command provides a display of current LPAR-related parameters and

hypervisor information, as well as utilization statistics for the LPAR. An interval mechanism can be used to retrieve a number of reports at a given interval.

    The following statistics are displayed only when the partition type is shared:

physc  Shows the number of physical processors consumed.

    %entc  Shows the percentage of the entitled capacity consumed.

    lbusy  Shows the percentage of logical processor utilization that occurred while executing at the user and system level.

    app    Shows the available physical processors in the shared pool.

    phint  Shows the number of phantom (targeted to another shared partition in this pool) interrupts received.


    The following statistics are displayed only when the -h flag is specified:

%hypv  Shows the percentage of time spent in the hypervisor.

    hcalls Shows the number of hypervisor calls executed.

Listing 9. A typical lparstat report from a 2-way p520 machine

    System configuration: type=Dedicated mode=Capped smt=On lcpu=4 mem=15808

    %user  %sys  %wait  %idle
    -----  ----  -----  -----
      0.0   0.1    0.0   99.9
      0.0   0.1    0.0   99.9
      0.4   0.2    0.1   99.3

    # lparstat 1 3

    System configuration: type=Shared mode=Uncapped smt=On lcpu=2 mem=2560 ent=0.50

    %user  %sys  %wait  %idle  physc  %entc  lbusy  app  vcsw  phint
    -----  ----  -----  -----  -----  -----  -----  ---  ----  -----
      0.3   0.4    0.0   99.3   0.01    1.1    0.0    -   346      0
     43.2   6.9    0.0   49.9   0.29   58.4   12.7    -   389      0
      0.1   0.4    0.0   99.5   0.00    0.9    0.0    -   312      0


    Improving system performance

    For a CPU-bound system, you can improve the system performance by manipulating thread

    and process priorities of a specific process or tuning the scheduler algorithm to set a different

    system-wide scheduling policy.

    Changing user-process priority

The commands to change or set user task priority include the nice and renice commands and two system calls that allow thread priority and scheduling policy to be changed through

    API calls.

    Using the nice command

    The standard nice value of a foreground process is 20; the standard nice value of a

background process is 24, if started from ksh or csh (20, if started by tcsh and bsh). The

    system uses the nice value to calculate the priority of all threads associated with the process.

    Using the nice command, a user can specify an increment or decrement to the standard nice

    value so that a process can be started with a different priority. The thread priority is still non-

    fixed and gets different values based on the thread's CPU usage.

By using nice, any user can run a command at a lower priority than normal. Only root can

    use nice to run commands at a priority higher than normal. For example, the command nice -5 iostat 10 3 > iostat.out causes the iostat command to start with a nice value of 25


(instead of 20), resulting in a lower starting priority. The values of nice and priority can be

    viewed using the ps command with the -l flag. Listing 10 shows a typical output using the

    ps -l command:

Listing 10. Using ps -l to observe process priority

         F S UID   PID  PPID C PRI NI  ADDR  SZ WCHAN    TTY  TIME CMD
    240001 A   0 15396  5746 1  60 20 393ce 732        pts/3  0:00 ksh
    200001 A   0 15810 15396 3  70 25 793fe 524        pts/3  0:00 iostat

As root, you can run iostat at a higher priority with # nice --5 iostat 10 3 > io.out.

    The iostat command then runs with a nice value of 15, resulting in a higher starting priority.

    Using the renice command

    If a process is already running, you can use the renice command to alter the nice value, and

    thus the priority. The processes are identified by process ID, process group ID, or the name of

    the user who owns the processes. The renice command cannot be used on fixed priority

    processes.

    Using the setpri() and thread_setsched() subroutines

There are two system calls that allow individual processes or threads to be

    scheduled with a fixed priority. The setpri() system call is process-oriented and

    thread_setsched() is thread-oriented. Use caution when calling these two subroutines,

    since improper use might cause the system to hang.

    An application that runs under the root user ID can invoke the setpri() subroutine to set its

    own priority or the priority of another process. The target process is scheduled using the

    SCHED_RR scheduling policy with a fixed priority. The change is applied to all the threads in

    the process. Note the following two examples:

    retcode = setpri(0, 45);

    Gives the calling process a fixed priority of 45.

    retcode = setpri(1234, 35);

    Gives the process with PID of 1234 a fixed priority of 35.

    If the change is intended for a specific thread, the thread_setsched() subroutine can be

    used:

retcode = thread_setsched(thread_id, priority_value, scheduling_policy)

    The parameter scheduling_policy can be one of the following:


    SCHED_OTHER, SCHED_FIFO, or SCHED_RR.

    When SCHED_OTHER is specified as the scheduling policy, the second parameter

    (priority_value) is ignored.

    Changing the scheduling algorithm globally

    AIX allows users to make changes to the priority calculation formula using the schedo

    command.

    Adjusting r and d

    As mentioned earlier, the formula for calculating the priority value is as follows:

    Priority = x_nice + (C * r/32 * x_nice_factor)

    The recent CPU usage value is displayed as the C column in the ps command output. Themaximum value of recent CPU usage is 120. Once every second, the CPU usage value for

    each thread is degraded using the following formula:

    New Charge C = (Old Charge C) * d / 32

The default value of r is 16; therefore, the thread priority is penalized by recent CPU usage

    * 0.5. The d value also has a default of 16, which means the recent CPU usage value of every process is reduced to half of its original value once every second. For some users, the

    default values of sched_R and sched_D do not allow enough distinction between foreground

    and background processes. These two values can be tuned using the sched_R and sched_D

    options of the schedo command. Note the following two examples:

    # schedo -o sched_R=0

    (R=0, D=.5) indicates that the CPU penalty was always 0. The priority value of the

    process would effectively be fixed, although it is not treated like an RR process.

    # schedo -o sched_D=32

    (R=0.5, D=1) indicates that long-running processes would reach a C value of 120 and

    stay there. The recent CPU usage value does not get reduced once every second and


    the priority of long-running processes would not fluctuate back to low numbers

    (higher importance) to compete with new processes.

    Changing the timeslice

Although the schedo command can modify the length of the scheduler timeslice, the timeslice change only applies to RR threads. It does not affect threads running with other

    scheduling policies. The syntax for this command is:

    schedo -o timeslice=n

    n is the number of 10 ms clock ticks to be used as the timeslice. For example, schedo -p -o timeslice=2 would set the timeslice length to 20 ms.

    You must log on as root to make changes using the schedo command.


    Using additional techniques

    Other techniques that can help a CPU-bound system include the following.

    Scheduling

    Depending on the relative importance of applications, you could schedule less important ones

for off-shift hours using the at, cron, or batch commands.

    Using the mkpasswd command

If your system has thousands of entries in the /etc/passwd file, you could use the mkpasswd

    command to create a hashed or indexed version of the /etc/passwd file to save CPU time

    spent in looking up a user ID.


    Tuning individual applications

    The following techniques can help you diagnose and improve the performance of specific

    applications running under AIX.

    Using the ps command

    The ps command or profiling can identify an application that is consuming large fractions of

    CPU time. This information can then be used to narrow the search for a CPU bottleneck.


    After you find the problem area, you can tune up or improve the application. You might need

    to recompile the application or change the source code.

    Using the schedo command

The schedo command is used to set or display current or next-boot values for all CPU scheduler tuning parameters. This command can only be executed by the root user. The

    schedo command can also make permanent changes or defer changes until the next reboot. Beginning with AIX 5L Version 5.3, several tuning parameters have been added to the

    schedo command. Listing 11 shows all the CPU scheduler parameters.

Listing 11. CPU scheduler parameters

    # schedo -a
    %usDelta = 100
    affinity_lim = 7
    big_tick_size = 1
    fixed_pri_global = 0
    force_grq = 0
    hotlocks_enable = 0
    idle_migration_barrier = 4
    krlock_confer2self = n/a
    krlock_conferb4alloc = n/a
    krlock_enable = n/a
    krlock_spinb4alloc = n/a
    krlock_spinb4confer = n/a
    maxspin = 16384
    n_idle_loop_vlopri = 100
    pacefork = 10
    sched_D = 16
    sched_R = 16
    search_globalrq_mload = 256
    search_smtrunq_mload = 256
    setnewrq_sidle_mload = 384
    shed_primrunq_mload = 64
    sidle_S1runq_mload = 64
    sidle_S2runq_mload = 134
    sidle_S3runq_mload = 134
    sidle_S4runq_mload = 4294967040
    slock_spinb4confer = 1024
    smt_snooze_delay = 0
    smtrunq_load_diff = 2
    timeslice = 1
    unboost_inflih = 1
    v_exempt_secs = 2
    v_min_process = 2
    v_repage_hi = 0
    v_repage_proc = 4
    v_sec_wait = 1

    Upgrading

Upgrading the system to a faster CPU or more CPUs might be necessary if tuning does not improve the performance.



    Case studies

    Two real-world examples show how the performance experts from IBM implemented these

    theories and techniques.

    Case 1

    Symptoms: The user has a batch script that starts up 500 other batch scripts, and each of

    these scripts queries and updates a database. Each script also starts as a client request from

    another machine. Each client request creates a database user thread on the database server

    machine. The response time began at less than 10 seconds for a period of time. Then the

    response time gradually became worse. At times it was more than a minute -- sometimes two

    minutes.

    Diagnosis: The run queue began growing until it reached into the hundreds. Another

    symptom included the CPU being 100 percent utilized (this was an eight-way SMP system),

    with 99 percent in user mode. By examining an AIX trace sample collected for a few

    seconds, we saw a pattern emerge. While a thread was using the CPU, a network packet

    would arrive and cause a network adapter interrupt. This would take the currently running

    thread off its CPU so the interrupt could be serviced.

    After servicing the interrupt, the scheduler verifies if any other threads are runnable and have

    a better priority than the currently running thread. Since the currently running thread had run

    for a few timeslices already, its CPU priority had increased as it accumulated CPU ticks.

Each of the 500 scripts began with priority 60. When runnable, they would preempt any

    currently running thread whose priority value had risen above 60. The preempted thread would

    then be put at the end of the run queue and have to wait for the CPU until its priority rose

    again.

    One effect of this preemption was that sometimes a thread would be preempted while holding

    a database lock. Since this type of lock is implemented at the application layer within the

    database software, the kernel does not know that the thread is holding a lock. If the lock was

    a kernel-level lock or a pthread library mutex lock, then the kernel could perform priority

    boosting and boost a thread's priority to the same level as that of a running thread that isrequesting the lock. This way, the requesting thread does not have to wait long for the lock

    holder to get the CPU again and release the lock.

    Since the lock in this scenario was a user lock, the database thread would spin on the lock

    until it exhausted its spin count (a tunable database parameter), and then go to sleep. So the

    99 percent used CPU was mostly due to the threads spinning on database locks.

    Prescription: After determining that priority preemption was having a negative effect, we

    tuned the scheduler formula, which calculates the thread priority. This particular formula is:

    pri = base_pri + NICE + (C * r/32)


    pri is the new priority, base_pri is 40, NICE is the nice value (20 in this case), C is the CPU

    usage in ticks, and r is 16.

    As a thread accumulates CPU ticks, its priority value becomes larger, thereby making its

    priority lower.

    The schedo command provides a way to change the value ofr by using the sched_R option.

    Running schedo -p -o sched_R=0 causes r to be 0, which then causes the CPU penalty

    factor (C * r/32) to be 0. This prevents priorities from changing, unless the nice value is

    changed. If the nice value is the same for all threads, then threads can complete their

    timeslices without being preempted due to priority changes. This allows the thread that is

    currently running and holding the database lock to keep running and then release the lock.

    Results: These changes had an instantaneous impact on the performance. The response time,

    which was over two minutes by this time, started getting better until all of the scripts were

    completing in just a few seconds. The C value in the priority formula is recalculated once a

    second by a CPU usage decay factor (C = C*d/32). Setting the d value to 0 when using the

schedo command would have accomplished the same result. In this case, if d=0, then C*d/32

    = 0. Since the CPU penalty factor is C*r/32, this also becomes 0 so that the priority will be

    just 40 + NICE.

    Case 2

    Symptoms: A pSeries machine was used as both a database and an application server. Users

    would input requests into a forms-based application and then submit the transactions. They

    noticed that at certain times the forms would take longer to get updated on their screens and

    their usual short-running queries would return in a longer time period.

    Diagnosis: When this slowness was observed, there were also some long-running database

    batch jobs that were submitted to the system. Normally, such batch jobs would be run at

    night, but near the end of the month additional batch jobs were run during the day while the

    users were on the system. The batch jobs were CPU-intensive and constantly on the run

    queue. Therefore, users' threads had to compete with the threads of the batch jobs for the

    CPU.

    With priorities degrading as CPU usage increased, the batch jobs' priorities became worse

    and allowed the users' threads to run. However, the kernel decays the CPU usage value C by

    half once a second. This allowed the priorities of the batch jobs to improve in a short timeperiod. So the batch jobs would again compete for the CPU with the users' threads.

    Prescription: By changing the decay factor (d/32) used to reduce CPU usage once a second,

we improved performance for the users. We used the schedo command to set the d value to

    31. The higher the value of d, the higher the value of C (C = C*d/32).

    Since C is used to calculate priorities (pri=40+NICE+C*r/32), the priority would get worse as

    C became larger. By setting the d value to a higher number, the C value is reduced at a slower

    than usual rate.


    Results: The users' threads get the CPU more often than the batch threads. As a result, the

    users saw an immediate improvement in performance. Of course, the batch jobs would be

    slowed down somewhat, but these jobs would get the CPU whenever the users had any

    "think" time or had to wait on I/O. The impact was minimal on the batch jobs, but

    performance improvement for the users was dramatic.
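
    The arithmetic behind this case can be sketched in a few lines of C. This is a simplified model of the formulas quoted above, not AIX kernel code; the cap of 120 on C, the base NICE of 20, and the defaults r = d = 16 are assumptions taken from AIX documentation.

    ```c
    #include <stdio.h>

    /* Once-a-second decay applied by the scheduler: C = C*d/32 (integer math). */
    static int decay(int c, int d) { return c * d / 32; }

    /* pri = 40 + NICE + C*r/32; a larger value means a worse priority. */
    static int priority(int nice, int c, int r) { return 40 + nice + c * r / 32; }

    int main(void)
    {
        int c_default = 120, c_tuned = 120;   /* C is capped at 120 */
        for (int s = 0; s < 3; s++) {         /* three seconds of decay */
            c_default = decay(c_default, 16); /* default d = 16 halves C */
            c_tuned   = decay(c_tuned, 31);   /* after schedo -o sched_D=31 */
        }
        printf("default: pri=%d\n", priority(20, c_default, 16)); /* 67 */
        printf("tuned:   pri=%d\n", priority(20, c_tuned, 16));   /* 114 */
        return 0;
    }
    ```

    After three seconds, a CPU-bound thread's priority recovers to 67 with the default decay but stays at a much worse 114 with d = 31, which is exactly why the batch jobs stopped crowding out the users' threads.
    
    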

    Case study notes: Tracing a pattern

    A final tip describes some odd things that impact performance. During one of our

    benchmarks, we noticed that the CPU usage reached 100 percent, with most of the time being

    charged to "system". At that time, the application performance degraded noticeably.

    After we collected an AIX trace, we noticed a repeating pattern. One application process

    would encounter a page fault on an address. That page fault caused a protection exception in

    the VMM, which in turn caused the kernel to send this process a SIGSEGV (segmentation violation) signal. When the process resumed, it page faulted on the same address again,

    which then caused yet another protection exception and another SIGSEGV signal to be sent to

    the process. The default signal disposition for SIGSEGV is to kill the process and generate a core dump, but in this case the application continued on and stayed in this loop.

    Most of the CPU time was spent in this loop.

    After investigation, we discovered the problem: A developer for another component had

    installed a signal handler to catch the SIGSEGV signal in the code during the test process.

    After the testing was completed, the developer had forgotten to remove the signal handler.

    That component then linked with the rest of the application and, during the benchmark,

    another unrelated component of the application caused a segmentation fault. This old signal

    handler caught the exception, ignored it, and caused the process to resume. The current instruction (the one that caused the exception) was then restarted, causing an infinite loop

    to occur.
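
    The loop described above is easy to reproduce. The small C program below is a hypothetical reconstruction, not the application's actual code: it installs a SIGSEGV handler that "handles" the fault without fixing anything, so returning from the handler re-executes the faulting store and the fault repeats. A counter breaks the otherwise-infinite loop so the demo terminates.

    ```c
    #include <signal.h>
    #include <unistd.h>

    static volatile sig_atomic_t faults = 0;

    /* Like the forgotten test handler: it catches SIGSEGV but repairs
     * nothing, so the kernel restarts the same faulting instruction. */
    static void ignore_segv(int sig)
    {
        (void)sig;
        if (++faults >= 3) {
            /* Break out so the demo terminates instead of looping forever. */
            static const char msg[] = "caught SIGSEGV 3 times; exiting\n";
            write(STDOUT_FILENO, msg, sizeof msg - 1);
            _exit(0);
        }
        /* Returning resumes the process at the faulting instruction. */
    }

    int main(void)
    {
        struct sigaction sa = {0};
        sa.sa_handler = ignore_segv;      /* handler stays installed */
        sigaction(SIGSEGV, &sa, NULL);

        int *volatile p = 0;  /* volatile keeps the compiler from removing the store */
        *p = 42;              /* faults, is restarted, faults again, ... */
        return 1;             /* never reached */
    }
    ```

    Because the handler remains installed and the faulting instruction is restarted each time, all the CPU time is burned in the fault-and-signal path in the kernel, which matches the 100 percent "system" time seen in the benchmark.
    
    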

    Resources

    The AIX 5L Support for Micro-Partitioning and Simultaneous Multi-threading whitepaper describes the new simultaneous multi-threading and Micro-Partitioning

    technologies and the AIX 5L support for them.

    The article Operating system exploitation of the POWER5 system discusses how new performance features deliver improved system scalability and performance.

    The AIX 5L Differences Guide Version 5.3 Edition Redbook focuses on the differences introduced in AIX 5L Version 5.3 when compared to AIX 5L Version 5.2.

    The Capped and Uncapped Partitions in IBM POWER5 whitepaper introduces and explains the concepts of capped and uncapped partitions and discusses priority

    weighting and CPU utilization by memory pools.


    The AIX 5L Practical Performance Tools and Tuning Guide Redbook is a comprehensive guide to the performance monitoring and tuning tools that are

    provided with AIX 5L Version 5.3.

    Want more? The developerWorks AIX and UNIX zone hosts hundreds of informative articles and introductory, intermediate, and advanced tutorials.

    Get involved in the developerWorks community by participating in developerWorks blogs.
