Why did my job run so long? Speeding Performance by Understanding the Cause
John Baker, MVS Solutions
Agenda
• Where is my application spending its time?
  – CPU time, I/O time, wait (queue) times
• What am I waiting for?
  – Various flavors of queue time
  – What can/should I do about delays?
• Real world comparison
  – Stay tuned!
• Q/A
• Conclusions and wrap up
Distribution of Elapsed Time
Elapsed time = CPU time + I/O time + wait times
• CPU time = TCB + SRB
• I/O = IOSQ + PEND + CON + DISC
• Wait (queue) times
  – Initiator
  – Allocation (ENQ contention)
  – System services (HSM recall)
  – CPU delay
  – LPAR dispatch
  – …
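As a sketch of the decomposition above (all numbers invented, in seconds):

```python
# Decompose a job's elapsed time into the components named on the slide:
# CPU = TCB + SRB; I/O = IOSQ + PEND + CON + DISC; the rest is queue time.
# Every value below is illustrative only.

cpu = {"TCB": 2600.0, "SRB": 301.0}
io = {"IOSQ": 40.0, "PEND": 35.0, "CON": 300.0, "DISC": 177.0}
waits = {"initiator": 4000.0, "CPU delay": 1200.0, "other": 180.0}

cpu_time = sum(cpu.values())
io_time = sum(io.values())
wait_time = sum(waits.values())
elapsed = cpu_time + io_time + wait_time

for name, secs in (("CPU", cpu_time), ("I/O", io_time), ("Wait", wait_time)):
    print(f"{name:5s} {secs:8.0f}s  {100 * secs / elapsed:5.1f}% of elapsed")
```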
Sample Job A
• Elapsed time over 4 hours
• CPU time almost 1 hour
• I/O time under 10 minutes
= Focus on CPU time
JOB RUNTM CPUTM IOTIME
JOBA 4:23:53 0:48:21 0:09:12
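A quick way to see where Job A's elapsed time goes, straight from the table values (h:mm:ss):

```python
def to_seconds(hms: str) -> int:
    """Convert an h:mm:ss string, as shown in the job table, to seconds."""
    h, m, s = (int(p) for p in hms.split(":"))
    return h * 3600 + m * 60 + s

runtm, cputm, iotime = map(to_seconds, ("4:23:53", "0:48:21", "0:09:12"))
wait = runtm - cputm - iotime  # everything not CPU or I/O is queue time

print(f"CPU  {100 * cputm / runtm:4.1f}% of elapsed")
print(f"I/O  {100 * iotime / runtm:4.1f}% of elapsed")
print(f"Wait {100 * wait / runtm:4.1f}% of elapsed")
```

CPU dwarfs I/O here (almost an hour vs. nine minutes), so CPU is the tuning target of the two. Note, though, that most of the elapsed time is queue time, which the later slides address.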
Reducing CPU time
• Recompile
  – Many improvements in OS updates
• Tune application
  – Application Performance Tools (e.g. Strobe, FreezeFrame)
  – Identify CPU use by area of source code
  – Make friends with your developer
Sample Job B
• Elapsed time under 3 hours
• CPU time 20 minutes
• I/O time over 1.5 hours
= Focus on I/O time
JOB RUNTM CPUTM IOTIME
JOBB 2:41:30 0:21:20 1:37:44
Reducing I/O time
• Identify patterns
  – Sequential vs random; read vs write
• Buffers
  – For VSAM consider NSR vs LSR
  – Give SORT memory – but not too much!
• Block size
  – System-determined generally works well – but check!
  – Half track for sequential; no smaller than 2K for random
• Compression
  – zEDC sounds promising!
• Include Storage Subsystem in your capacity planning
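A back-of-the-envelope sketch of why block size matters for sequential work. The record count, LRECL, and buffer count are invented; 27,998 bytes is the commonly cited half-track block size for 3390 geometry, and 27,920 is the largest multiple of an 80-byte LRECL that fits within it.

```python
# Illustrative only: how block size (and buffering) drives the number of
# channel programs needed to read a sequential data set.

records = 5_000_000
lrecl = 80  # bytes per logical record (assumed)

def excps(blksize: int, bufno: int = 1) -> int:
    """Blocks (hence channel programs) needed, fetching bufno blocks per I/O."""
    recs_per_block = blksize // lrecl
    blocks = -(-records // recs_per_block)  # ceiling division
    return -(-blocks // bufno)

print(excps(80))               # unblocked: one I/O per record
print(excps(27_920))           # half-track blocking: orders of magnitude fewer
print(excps(27_920, bufno=5))  # plus buffering: fewer channel programs still
```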
Wait/Queue time
[Chart: Time vs Utilization – response/elapsed time (y-axis) vs percent utilization (x-axis, 0–100%), split into Wait and Execution components]
High Utilization = High wait time
• Elapsed time grows exponentially with utilization
• Increasing priority doesn’t make the CPU any faster
• At high utilization levels, wait time is much greater than service time
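The curve on the chart follows the classic single-server queueing result: with service time S and utilization ρ, response time is R = S / (1 − ρ). A minimal sketch (M/M/1 assumed, normalized service time):

```python
# Response time vs utilization for a single server (M/M/1 queueing model).
# This simple model is enough to show why elapsed time explodes near 100% busy.

S = 1.0  # one unit of service time

def response(rho: float) -> float:
    """Expected response time at utilization rho (0 <= rho < 1)."""
    return S / (1.0 - rho)

for rho in (0.50, 0.80, 0.90, 0.95, 0.99):
    r = response(rho)
    print(f"util {rho:4.0%}: response {r:6.1f}x service, wait {r - S:6.1f}x")
```

At 50% busy you wait one service time; at 90% busy you wait nine. More priority just reorders the queue; it does not shorten it.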
Flavors of Queue (wait) Time
• Wait for “server”: initiator / CICS AOR / IMS MPR
• CPU delay (wait for logical CPU)
• I/O delay (iosq, pend, disconnect)
• Capping delay (LPAR capped vs actual delay)
• Resource Group maximum enforced
• Wait for LPAR (logical CPU) to be dispatched
  – PR/SM weight
  – Demand from other LPARs
  – CPC/CEC capacity
Initiator Queue
• SMF: R723CQDT
• TOTAL queue time (divide for average per job)
• Just start more inits?
• Not necessarily a good idea
• “Tuning to reduce the number of simultaneously active address spaces to the proper number needed to support a workload can reduce RNI and improve performance”
https://www-304.ibm.com/servers/resourcelink/lib03060.nsf/pages/lsprwork?OpenDocument&pathID=
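A trivial sketch of the "divide for average per job" step above; the field value and job count are invented, and R723CQDT is assumed here to be already converted to seconds:

```python
# R723CQDT accumulates TOTAL initiator queue time for a service class period;
# divide by the jobs (ended transactions) in the interval for a per-job average.

r723cqdt_secs = 18_450  # total initiator queue time in the interval (invented)
jobs_ended = 615        # jobs ended in the same interval (invented)

avg_queue = r723cqdt_secs / jobs_ended
print(f"average initiator queue time: {avg_queue:.1f}s per job")
```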
Automated Initiators: Less is More
[Chart: Benchmark – TM vs WLM concurrent jobs; active jobs (y-axis, 0–350) vs time in hours (x-axis, 0–9), with TM jobs ahead]
• Concurrency based on performance and utilization
CPU Delay
• Wait for logical CPU
• SMF: R723CCDE
• Work is ready to run but is delayed access to CPU
• Related to Service Class / goal / importance
  – Dispatching priority
• There is almost always some CPU delay
  – Tolerance is subjective
  – Are goals/SLAs being met?
• Priorities are relative – overloading leads to thrashing
• Consider discretionary for MTTW
Utilization vs CPU delays
[Chart: Utilization vs CPU Delay – HIDLY, MDDLY, LODLY and UTIL plotted on a 0–100 scale]
• 50% delay may be acceptable
• At 100% busy, throughput degrades significantly
I/O Delay
• IOSQ:
  – HyperPAV
• Pend:
  – CMR = overloaded controller
  – DB = volume contention (reserve?)
  – Any remaining = likely channels
• Disconnect:
  – Random read misses
  – Synchronous remote copy
Revisit: sample Job B
• Disconnect time of 0:40:31 = 40% of total 1:37:44
• 40:31 (2431 seconds) divided by 9239646 I/O’s…
• = .263 ms average disconnect time
• Likely not unreasonable for random reads (consider SSD)
• Could also be replicated writes
• Become familiar with your typical application response times
JOB RUNTM CPUTM IOTIME SMF30AID SMF30AIW EXCPS
JOBB 2:41:30 0:21:20 1:37:44 0:40:31 0:04:40 9239646
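The slide's arithmetic, reproduced from the table values for JOBB:

```python
# Average disconnect time per I/O: the disconnect total shown for JOBB
# (SMF30AID, 0:40:31) divided by the EXCP count from the same record.

disconnect_secs = 40 * 60 + 31  # 0:40:31 -> 2431 seconds
excp_count = 9_239_646

avg_disc_ms = disconnect_secs / excp_count * 1000
print(f"{avg_disc_ms:.3f} ms average disconnect per I/O")
```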
Capping Delay
• Possible when caps present
• SMF70NSW
  – WLM caps the logical CPUs
  – Delays LPAR dispatch
• SMF70NCA
  – Work is actually delayed for CPU due to capping
• Consider TM automation
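Soft capping keys off the rolling four-hour average (R4HA) of MSU consumption versus the defined capacity. A toy sketch with invented numbers, assuming 15-minute intervals (16 per four hours):

```python
# Rolling 4-hour average (R4HA) of MSU consumption, compared against a cap.
# Demand, cap, and interval length are all invented for illustration.
from collections import deque

def r4ha_series(msu_per_interval, window=16):
    """Yield the rolling average over the last `window` intervals."""
    recent = deque(maxlen=window)
    for msu in msu_per_interval:
        recent.append(msu)
        yield sum(recent) / len(recent)

demand = [400] * 8 + [1000] * 8  # demand spikes halfway through
cap = 600
for i, avg in enumerate(r4ha_series(demand)):
    flag = "  <-- capping eligible" if avg > cap else ""
    print(f"interval {i:2d}: R4HA {avg:6.1f} MSU{flag}")
```

Because the average lags the spike, capping starts well after demand rises and persists after it falls – which is where automation has room to help.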
[Chart: MSU Demand vs R4HA – MSU/HR (y-axis, 0–1200) vs time of day (0:00–8:00, 15-minute intervals) for LPAR_A, LPAR_B and LPAR_C, with CPC R4HA and CAP lines]
Capping vs Delay
• LPAR is capped (SMF70NSW)
• Work is delayed (SMF70NCA)
• Capping can impact all workloads
[Chart: STC_H: Importance 1 – velocity (0–100) vs goal over time; capping begins when machine CPU reaches 100%]
Batch is not the only workload suffering. Even the most critical workloads are unable to meet their goal.
Resource Group (max)
• Also a form of capping (same WLM algorithms)
• Pro: Useful to control “problem” applications
• Con: Static. Not flexible
• R723CCCA
  – Resource Group maximum enforced
  – Will override Service Class goals
LPAR Dispatch Delay
• Ratio of logical processor busy to physical processor busy
• Not always as obvious but very common!
• Term “Short CPs” introduced by Kathy Walsh (IBM WSC)
  – Share, Aug. 2004
  – https://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/PRS1077
• MXG: PLCPRDYQ
• Improved with Hiperdispatch and IRD
LPAR Dispatch Delay
• More initiators and/or higher dispatching priority will not resolve this problem
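A minimal illustration of the logical-vs-physical ratio described above; the busy figures are invented:

```python
# "Short CPs": z/OS sees its logical CPs as busy (MVSBSY), but PR/SM only
# grants them physical CP time part of the interval (CPUBSY). The gap is
# LPAR dispatch delay, which no initiator count or priority tweak can fix.

mvs_busy = 0.90  # logical CP busy as seen by z/OS (invented)
cpu_busy = 0.50  # physical CP time actually granted by PR/SM (invented)

dispatch_delay = mvs_busy - cpu_busy
print(f"LPAR dispatch delay: {dispatch_delay:.0%} of the interval")
```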
[Chart: LPAR Dispatch Delay – INITS, CPUBSY and MVSBSY plotted on a 0–100 scale; the gap between MVSBSY and CPUBSY shows 40% delay]
What does this look like in the real world?
Let’s take a trip to the deli
How many in the store at one time?
INITIATORS
Who’s next in line?
Dispatching Priority (Service Class)
How long ’til I can give my order?
Logical Processor (CP) Busy
How long ’til I’m done?
Physical Processor (CP) Busy
Forcing one step only stresses the next
Balance
About MVS Solutions
• MVS Solutions Inc.
  – Installed in over 200 datacenters worldwide
  – IBM Partner in Development
• ThruPut Manager
  – Automated Workload Balancing
  – Automated Batch Prioritization
  – Automated Capacity Management
Contact me: [email protected]
Join our Blog at www.thruputmanager.com