Elementary Performance Modelling as Applied to a Large System Benchmark
Sep 2014
22
Copyright (C) 2014 Rajesh Mansharamani
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the section entitled "GNU
Free Documentation License".
3
SECTION -1: ORGANIZATIONAL AWARENESS in PERFORMANCE ENGINEERING
44
Is there a single contact point for Performance Engineering in TCS?
Depending on who is asking the question and who is replying to it, the answer can be any of the following:
a) Yes
b) No
c) I don’t know
d) Maybe
e) It depends …
55
Whom Should You Reach Out to for your System Performance Needs?
Happy Phase of your Project
CEG
CTG
Performance Tools Group
During the Happy Phase you can reach out to:
CTG
PERC
During the Happy Phase one often does not worry about what’s going to come.
Quite often client will want a fire drill. So where do you go?
Assurance Practice
PT Practice
How many people does it take to light a fire?
When the fire gets to you where do you go?
Infrastructure Practice
How many people does it take to fix a fire?
When the fire gets to you where do you go?
GCP
How much does it cost to hire a fire consultant?
Perf Engg
When the fire gets to you where do you go?
DEG
How many firefighters are available?
SK, SB
When the fire gets to you where do you go?
Phone a friend – nothing official about it!
How many ‘friends’, good samaritans are available in a company with 3L employees?
The 7 Pillars of
TCS Performance Engineering
CTG CTO Assurance
Infra GCP DEG
You
1414
What Happens When You Are Left to Yourself, With Nowhere to Go?
Dawn of Common Sense & Guts
1515
Common Sense and Guts: Performance Lifecycle
Requirement Analysis
Architecture & Design
Coding
Testing
Production
Common Sense
Guts
16
SECTION 0: DEFINITIONS
1717
Background: Enterprise Systems Performance
Business Processing System
End User
Response Time (R)
Throughput (X) = no. of completions per unit time
Business Workload
N concurrent users
Capacity: CPU, RAM, Storage, Network
Utilization (U)
1818
Definitions: R
System
Response Time = Exit Time – Entry Time
R1, R2, ..., Rn
R = average system response time = (Σ Ri) / n
For example: average response time for a web page < 2 seconds, average response time in network < 1 second, average response time per SQL < 200 ms, average response time per IO < 10 ms
It all depends on where you draw the system boundary
1919
Definitions: R, R95, σR
Though average response time is used by default, it is not the only way to characterize response time. Other metrics are percentiles and standard deviation.
Consider 1000 samples of response time as per the following histogram:
Response time (sec) | No. of samples
0.5 | 100
1.0 | 500
1.5 | 300
2.0 | 50
2.5 | 50
R = ΣRi/1000 = (100*0.5 + 500*1.0 + 300*1.5 + 50*2.0 + 50*2.5)/1000 = 1.225
R95 = 95th percentile = value within which 95% of samples fall = ?
R95 = 95th percentile = value within which 95% of samples fall = 2.0
Variance(R) = Σ(Ri – R)²/1000 = 0.212
Standard Deviation σR = sqrt(Variance) = 0.46
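The three statistics above can be reproduced directly from the histogram. The sketch below is only an illustration: the function name `summarize` is made up here, and the percentile is taken as the smallest bucket value within which at least 95% of samples fall, which is the definition used on the slide.

```python
import math

# Histogram from the slide: response time (sec) -> number of samples
histogram = {0.5: 100, 1.0: 500, 1.5: 300, 2.0: 50, 2.5: 50}

def summarize(hist, pct=95):
    """Return (mean, percentile, standard deviation) of a value -> count histogram."""
    total = sum(hist.values())
    mean = sum(value * count for value, count in hist.items()) / total
    variance = sum(count * (value - mean) ** 2 for value, count in hist.items()) / total
    # Percentile: smallest value within which at least pct% of the samples fall.
    # Integer arithmetic avoids floating-point trouble exactly at the boundary.
    cumulative = 0
    percentile = None
    for value in sorted(hist):
        cumulative += hist[value]
        if cumulative * 100 >= pct * total:
            percentile = value
            break
    return mean, percentile, math.sqrt(variance)

mean_r, r95, sigma_r = summarize(histogram)
print(f"R = {mean_r:.3f}, R95 = {r95}, sigma_R = {sigma_r:.2f}")
```

Note how far R95 (2.0) sits from the average (1.225): a target stated on the average alone says little about the slowest 5% of requests.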
2121
Definitions: X
System
Throughput = Number of Completions per Unit Time
1, 2, ..., n
Measurement interval T
X = system throughput = n / T
For example: business throughput = 20 orders/sec, Web server throughput = 200 pages/sec, DB server throughput = 250 SQLs/sec, IO subsystem throughput = 1500 IOPS
It all depends on where you draw the system boundary
2222
Definitions: N
System
N = average number in the system = (1/T) ∫ N(t) dt, integrated over the measurement interval T
For example: average of 20 orders being processed in OMS, average of 500 concurrent sessions at web server, average of 10 orders in dispatch queue, average of 50 SQLs concurrent in DB server
It all depends on where you draw the system boundary
[Figure: N(t), the number in the system, plotted against time t]
2323
Definitions: Concurrent Users
[Figure: N concurrent users, each with average think time Z, reach the Web, App and DB tiers over the Network; the subsystems are labelled S1 to S4]
Business Processing System S
By default, concurrent users refers to users doing business processing, which is N. The rate of submission depends upon think time and system response time.
What is think time?
Time taken for any action outside of waiting for a response from the system, that is, time spent at the user terminal such as data entry, review of a response, waiting for the next transaction, ‘doing nothing’
2727
Definitions: S
Service Time S
Waiting Time W
Response Time R = Waiting Time W + Service Time S
Average Service Time S = Average Response Time at the Resource excluding queueing/waiting = Single User Response Time at Resource
28
SECTION 1: CASE STUDY RFP
29
Oracle Forms
Oracle Reports
Oracle database
RCC City A
RCC City B
RCC City C
RCC – Regional Computing Centre
Total 36 RCC
Client server architecture
Each RCC works in isolation
Background:
Replace with one single centralized system for the country
3030
Database
VSAT
Dialup
WAN
HTTP server App Server Oracle DB
NCC - National Computing Centre
RCC City C
RCC City A
RCC City B
LBS’sLBS’s
Forms & Report server
Background (Proposed Architecture)
Benchmark to determine if this is technically feasible.
Risk Mitigation Exercise.
31
System Integration RFP
• Application Benchmarking a pre-condition in RFP
• 4000 concurrent users
• Average Server Side Response Time per Screen < 1 sec
• Server Utilization < 50%
32
Objectives of the Benchmark
• To verify server side performance
• To evaluate scalability targets
• Recommend hardware configuration for the application
33
Rules of the Benchmark
• Application/Database/Load Runner scripts frozen
• Deterministic Think Time (see next slide)
• No reorganization of database permitted
• No application code optimization permitted
• Configuration parameter tuning of web, app, DB servers permitted
• Tests to be executed:
• 1000, 2000, 4000 users for 1 to 4 million transactions
34
Main Transaction To Be Benchmarked
Average Response Time Per Screen < 1 sec
Think Time for entire transaction: Z = 3 sec!!
Observed Cycle Time was 2 minutes!!!
35
Background to Z=3
• Small Scale Benchmarking done by proposal team with Z = 0
• Application crashed
• Small Scale Benchmark repeated with Z = 3
• Application did not crash but response time was high
• Technical Committee decided that proper capacity planning had to be done and this was left to the vendors who bid for the RFP
36
Problem Statement
• Clearly a think time of 3 doesn’t make sense
• But how to convince the client, who happens to be the income tax department?
37
SECTION 2: A ‘LITTLE’ OF ELEMENTARY PERFORMANCE MODELLING THEORY
38
Let's Do Some Operational Analysis
SYSTEM (Work Conserving)
External Observer
Work Conserving: No work is created or destroyed within the system
As an External Observer what events can you observe?
39
Operational Analysis
A(t) = Total #arrivals up to time t
D(t) = Total #departures up to time t
N(t) = A(t) – D(t) = number in the system at time t
Zero in the system: N(t) = 0, i.e. A(t) = D(t)
[Figure: A(t) and D(t) plotted as step functions of t; the area between the two curves is made up of horizontal strips of height 1]
We would like to find the average of number in the system over an observation interval of length T:
$$N = \frac{1}{T}\int_0^T N(t)\,dt$$
Assuming FCFS, the strip for job i has height 1 and width R_i = Response Time of Job i, so
$$N = \frac{1}{T}\sum_{i=1}^{D(T)} (1 \times R_i) = \frac{D(T)}{T} \times \frac{1}{D(T)}\sum_{i=1}^{D(T)} R_i = X \times R$$
where
Throughput X = D(T)/T = #completions per unit of time
Avg Response Time R = (1/D(T)) Σ R_i
Therefore N = X R
52
What if we don't have FCFS?
[Figure: arrivals A1, A2, A3, A4 and departures D1, D2, D3, D4 along the time axis t; the departures are relabelled D2', D4', D1', D3' so that each is paired with its own arrival]
The strip of height 1 between the i-th arrival and the i-th departure has width D_i – A_i, so
$$N = \frac{1}{T}\int_0^T N(t)\,dt = \frac{1}{T}\sum_{i=1}^{D(T)} \left[1 \times (D_i - A_i)\right] = \frac{1}{T}\left(\sum_{i=1}^{D(T)} D_i - \sum_{i=1}^{D(T)} A_i\right)$$
1-1 mapping or pairing: relabel the departure sequence to correspond to the arrival sequence, so that D_i' is the departure time of the job that arrived at A_i. The relabelling only reorders the departures, so the sum of the D_i' equals the sum of the D_i:
$$N = \frac{1}{T}\left(\sum_{i=1}^{D(T)} D_i' - \sum_{i=1}^{D(T)} A_i\right) = \frac{1}{T}\sum_{i=1}^{D(T)} (D_i' - A_i) = \frac{1}{T}\sum_{i=1}^{D(T)} R_i = \frac{D(T)}{T} \times \frac{1}{D(T)}\sum_{i=1}^{D(T)} R_i = X R$$
58
Little's Law
For Any Work Conserving System
Average Number in the System N
= System Throughput X x Average Response Time in System R
All depends upon how you mark your system boundary
If X is business tps, then R is average completion time for a business txn
and
N is average number of business txns in the system
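Little's Law can be checked numerically on any trace of entry and exit times. The sketch below is an illustration under stated assumptions: the trace is hypothetical (not benchmark data), the helper name `littles_law_check` is made up here, and the system is assumed empty at time 0 and again at the last departure. The second job overtakes the first, so the service order is not FCFS.

```python
def littles_law_check(arrivals, departures):
    """arrivals[i] and departures[i] are the entry and exit times of job i.

    Assumes the system is empty at time 0 and again at T = last departure.
    Returns (time-averaged N, throughput X, average response time R).
    """
    n_jobs = len(arrivals)
    T = max(departures)
    # Time-average of N(t): sweep the +1 (arrival) and -1 (departure) events
    events = sorted([(t, +1) for t in arrivals] + [(t, -1) for t in departures])
    area, in_system, last_t = 0.0, 0, 0.0
    for t, delta in events:
        area += in_system * (t - last_t)   # area under N(t) since the last event
        in_system += delta
        last_t = t
    n_avg = area / T
    x = n_jobs / T
    r = sum(d - a for a, d in zip(arrivals, departures)) / n_jobs
    return n_avg, x, r

# Hypothetical trace: job 2 arrives at t=1 and leaves at t=3, before job 1 leaves at t=5
arrivals = [0.0, 1.0, 2.0, 6.0]
departures = [5.0, 3.0, 7.0, 8.0]
n_avg, x, r = littles_law_check(arrivals, departures)
print(f"N = {n_avg}, X = {x}, R = {r}, X*R = {x * r}")
```

The time-averaged N and the product X R agree even though jobs leave out of order, which is the point of the 1-1 pairing argument.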
62
Little's Law for Closed Systems:
Average Number in Overall System (Red Box) = N
Overall System Throughput = X
Avg Response Time or Cycle in Overall System = Z+R
N = X (R+Z)
63
Little's Law for Closed Systems: Simpler Derivation
Cycle Time Per User = Z + R
Throughput Per User = 1/(Z + R)
Throughput For N Users = X = N/(Z + R)
N = X (R+Z)
64
SECTION 3: APPLYING THEORY TO PRACTICE
65
Recall Little’s Law for Closed Systems
N = X (R + Z)
Given:
• N = 4000
• R = 6.25 sec (for business transaction)
• Z = 3 sec for business transaction
We understood that the customer wanted to do 52 million main transactions per year
How do we get expected X from the client?
a) What is your throughput?
b) How many returns/sec?
c) How many returns/hour?
d) How many returns/day?
e) How many returns/month?
f) How many returns/year?
66
RFP: Little’s Law Validation
N = X (R + Z)
Given:
• N = 4000
• R = 6.25 sec (for business transaction)
• Z = 3 sec for business transaction
• X = 52 million/year
Now check if these 4 balance with each other
67
RFP: Little’s Law Validation
X = N/(R+Z)
X = 4000/(6.25 + 3) = 432 returns/sec
Now compute how many returns per day?
How many working hours do we assume per day?
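The balance check can be sketched in a few lines. The RFP values N, R, Z and the 52 million/year volume come from the slides; the 8 working hours per day and 250 working days per year are assumptions made only for this illustration, and the conclusion is not sensitive to reasonable changes in them.

```python
def closed_system_throughput(n_users, response_time, think_time):
    """Little's Law for closed systems: X = N / (R + Z)."""
    return n_users / (response_time + think_time)

x_per_sec = closed_system_throughput(4000, 6.25, 3)   # RFP values

# ASSUMPTION (not from the RFP): working hours per day and working days per year
HOURS_PER_DAY = 8
DAYS_PER_YEAR = 250
seconds_per_year = 3600 * HOURS_PER_DAY * DAYS_PER_YEAR

implied_per_year = x_per_sec * seconds_per_year      # volume implied by N, R, Z
required_per_sec = 52_000_000 / seconds_per_year     # rate implied by 52 million/year

print(f"X implied by N, R, Z  : {x_per_sec:.0f} returns/sec")
print(f"Implied yearly volume : {implied_per_year / 1e6:.0f} million returns")
print(f"Rate for 52 million/yr: {required_per_sec:.1f} returns/sec")
```

Under these assumed working hours the four RFP figures are out of balance by a factor of about 60, which is the kind of argument that can be put to the client.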
RFP: Little’s Law Validation
Based on this analysis the think time was increased, though not to the extent we anticipated
• RFP think time was revised to Z = 39 sec from the original Z = 3 sec
74
SECTION 4: BENCHMARK & RESULTS
7575
Benchmark Environment Capacity Estimation
How to plan on number of CPUs required for Performance Testing
N = 4000
Z = 39
R = 6.2
X = N/(R+Z) = 4000/45.2 ≈ 88 tps
To determine number of CPUs required to support 88 tps, we ran pilot tests on a 4 CPU box based on which we could extrapolate to the desired capacity for the test environment
7676
Benchmark Environment Capacity Estimation
Pilot Performance Tests on 4 CPU
[Chart: Business Txns per sec (axis 0 to 2) against Number of concurrent users (1, 5, 10, 100)]
Benchmark Target: 88 tps
4 CPUs for 2 tps ⇒ How many for 88 tps?
4/2 * 88 = 176 CPUs
7878
Benchmark Environment Capacity Estimation
Pilot Performance Tests on 4 CPU
The real picture
[Chart: Business Txns per sec (axis 0 to 10) against Number of concurrent users (1, 20, 40, 60, 80, 100)]
4 CPUs for 10 tps ⇒ How many for 88 tps?
4/10 * 88 = 35.2, rounded up to 36 CPUs
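The extrapolation is a simple proportion, sketched below. It assumes throughput scales linearly with the number of CPUs, which is the assumption behind the slide's arithmetic and is not guaranteed in practice; the function name `cpus_needed` is made up for this illustration.

```python
import math

def cpus_needed(pilot_cpus, pilot_tps, target_tps):
    """Linear extrapolation from a pilot run, rounded up to whole CPUs."""
    return math.ceil(pilot_cpus / pilot_tps * target_tps)

print(cpus_needed(4, 2, 88))    # pilot reading of 2 tps on 4 CPUs
print(cpus_needed(4, 10, 88))   # pilot reading of 10 tps on 4 CPUs
```

The fivefold difference between the two answers comes entirely from which pilot throughput is taken as the capacity of the 4 CPU box, so the pilot has to be driven hard enough to show where throughput actually levels off.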
7979
8080
Sequence Number Generation Problem
8181
Sequence Number Generation Problem
Seq No.
10000
10001
10002
10003
…
B-Tree Index in DB
10000
10001
10002
8282
8383
To get rid of this hot spot use ‘Reverse Key Index’
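Why reversing the key removes the hot spot can be seen with a toy example. This is only an illustration: it reverses decimal digits, whereas an Oracle reverse key index reverses the bytes of the stored key, and the helper `reverse_digits` is made up here.

```python
def reverse_digits(seq_no, width=5):
    """Reverse the zero-padded decimal digits of a sequence number."""
    return str(seq_no).zfill(width)[::-1]

# Consecutive sequence numbers end up far apart once reversed
for seq_no in range(10000, 10004):
    print(seq_no, "->", reverse_digits(seq_no))
```

Consecutive sequence numbers differ in their last digit, so after reversal they differ in their first digit and land in different parts of the B-tree instead of all being inserted at its right-most edge.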
8484
Sequence Number Generation Problem
Seq No.
10000
10001
10002
10003
…
B-Tree Index in DB
10001
20001
Reverse Seq No.
8585
10001
20001
8686
Test Results @ Z = 3
N = 500 << 4000
X = 48 returns/sec
R = 7 seconds (> RFP requirement of 6.25 sec)
8787
Test Results: 1000 & 2000 users @ Z=39
Test Type (users / returns) | 1000 / 1 million | 2000 / 1 million
Throughput
Business TPS | 25 | 48
Completion Time (Hrs) | 11:09 | 05:43
Returns processed | 1 M | 1 M
CPU Used (DB) | 16 | 32
Utilization
Web CPU % | ~15% | < 10%
Apps CPU % | ~40% | ~40%
DB CPU % | ~45% | ~50%
Response Times (sec) | 1000 users Avg | 1000 users 95th pct | 2000 users Avg | 2000 users 95th pct
User Login | 0.4 | 0.5 | 0.4 | 0.5
Main Screen | 0.3 | 0.4 | 0.3 | 0.4
Compute | 0.1 | 0.1 | 0.2 | 0.3
Refund Details | 0.2 | 0.2 | 0.2 | 0.3
Print Result | 0.3 | 0.3 | 0.4 | 0.4
User Exit | 1.1 | 1.2 | 1.0 | 1.1
Meets RFP Criteria
Throughput scales almost linearly – no apparent bottleneck
8888
Almost There
• At this point in time there was one week left for the exercise to complete
• Up to 2000 users results looked good and for the final test of 4000 users, extra capacity was also kept available on standby
• Auditors to visit benchmark lab for two days, towards end of the week
8989
And Now for the Grand Finale: N=4000 @ Z=39
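Test Type (users / returns) | 2000 / 1 million | 4000 / 1 million
Throughput
Business TPS | 48 | 48
Completion Time (Hrs) | 05:43 | 06:03
Returns processed | 1 M | 1 M
CPU Used (DB) | 32 | 32
Utilization
Web CPU % | < 10% | ~5%
Apps CPU % | ~40% | ~45%
DB CPU % | ~50% | ~50%
Response Time (sec) | 2000 users Avg | 4000 users Avg | 4000 users 95th pct
User Login | 0.4 | 0.6 | 1.6
Main Screen | 0.3 | 0.7 | 2.1
Compute | 0.2 | 13.7 | 17.1
Refund Details | 0.2 | 2.1 | 4.5
Print Result | 0.4 | 24.0 | 29.3
User Exit | 1.0 | 1.1 | 1.2
R2000 = 0.3 + 0.2 + 0.2 + 0.4 = 1.1 (average, Main Screen through Print Result)
R4000 = 0.7 + 13.7 + 2.1 + 24.0 = 40.5 (average, Main Screen through Print Result)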
9191
Let’s Add More CPUs
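No. of Users | 2000 | 4000 | 4000 | 4000
No. of DB CPU | 32 | 32 | 48 | 56
No. of App Server | 4 | 4 | 5 | 5
Business Throughput | 48 tps | 48 tps | 48 tps | 48 tps
Response Times (seconds)
User Login | 0.5 | 2 | same as for 32 CPUs | same as for 32 CPUs
Main Screen | 0.4 | 2 | same as for 32 CPUs | same as for 32 CPUs
Compute | 0.3 | 17 | same as for 32 CPUs | same as for 32 CPUs
Refund Details | 0.3 | 5 | same as for 32 CPUs | same as for 32 CPUs
Print Result | 0.4 | 29 | same as for 32 CPUs | same as for 32 CPUs
User Exit | 1 | 1 | same as for 32 CPUs | same as for 32 CPUs
Utilization
DB CPU % | ~50% | ~50% | ~45% | ~40%
Apps CPU % | ~40% | ~45% | ~40% | ~40%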
9292
Nothing Wrong with Capacity Planning
Nothing to be alarmed about
No disk, memory, network bottlenecks
Database does not scale with CPUs
< 192Kbps on 1 Gbps lan
< 45%
Constant at 45%
93
And The Panic Button Has Been Pressed!!!
• Response time jumped 40 fold when moving from N=2000 users to N=4000 users
• 1 day went in multiple people pointing fingers at each other over whether anyone had touched a configuration file
• Another day went in various attempts to tune, but the result was always the same
• X = 48 tps, R = 40.5 seconds
• Now there was just one day left before the auditors would come and disqualify us
What to do? Try more options or sit back and analyze?
94
SECTION 5: SOME MORE ELEMENTARY PERFORMANCE MODELLING
95
Bottleneck Law: Background - Pipelining
Car assembly stages: Chassis (2 min), Door (2 min), Window (2 min), Paint (2 min)
Serial Mode: 1 car per 8 min
Pipelined Mode? 4 cars per 8 min
If door takes 4 min what is the throughput?
1 car per 4 min
The slowest stage or bottleneck limits overall throughput
100
But Every Flow is Not a Pipeline
101
Visit Counts & Demand
• Service time Si: Average Time Spent in Servicing Request at resource i (no contention)
• Visit count Vi: Average Number of Times Resource i is Visited per Transaction
To reduce visit count:
a) removal of redundant calls
b) caching at calling tier
c) increase capacity
To reduce service time:
a) optimize code
b) tune platform
c) get a faster resource
104
Effectively it is a Pipeline of Demands
Demand Di = Vi Si
Dmax = max { Di }
Max throughput = 1 / Dmax
Bottleneck Law: X ≤ 1 / Dmax
This bound becomes an equality once the bottleneck is reached (and the system doesn't crash or break down thereafter)
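The Bottleneck Law can be tried out on the car pipeline from the earlier slides, treating each stage time as the demand at that stage. This is a minimal sketch; the function names are made up for the illustration.

```python
def bottleneck(demands):
    """Return the name of the resource with the largest demand Dmax."""
    return max(demands, key=demands.get)

def max_throughput(demands):
    """Bottleneck Law: X <= 1 / Dmax."""
    return 1.0 / max(demands.values())

# Demand per car at each stage, in minutes
balanced = {"chassis": 2, "door": 2, "window": 2, "paint": 2}
slow_door = {"chassis": 2, "door": 4, "window": 2, "paint": 2}

print(max_throughput(balanced) * 8)    # cars per 8 minutes with all stages at 2 min
print(bottleneck(slow_door), max_throughput(slow_door) * 4)   # cars per 4 minutes
```

For a software system the same function applies with Di = Vi Si computed per resource (CPU, disk, latch and so on) in seconds per business transaction.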
105
Elementary Performance Modelling: Summary
Little’s Law for Closed Systems: N = X (R+Z)
Bottleneck Law: X = 1/Dmax at saturation
Therefore at saturation:
R = N/(1/Dmax) – Z
R = N Dmax – Z
106
SECTION 6: APPLIED MODELLING & ANALYSIS
107
Modelling the 40-fold Increase in R
Application Test Results | 500 users, Z=3 sec | 2000 users, Z=39 sec | 4000 users, Z=39 sec
Average Rsp Time of Entire Business Txn | 7 sec | 1.1 sec | 40.5 sec
Business Throughput | 48/sec | 48/sec | 48/sec
Clearly throughput saturates at X = 48/sec
Recall X = 1/Dmax when the bottleneck is hit
Dmax = 1/48 sec ≈ 20 ms
Recall at saturation: R = N Dmax – Z
Model: R = N Dmax – Z, Dmax = 20 ms
N = 500, Z = 3 ⇒ R = 500*20ms – 3 = 10 – 3 = 7 sec
N = 2000, Z = 39 ⇒ R = 2000*20ms – 39 = 40 – 39 = 1 sec
N = 4000, Z = 39 ⇒ R = 4000*20ms – 39 = 80 – 39 = 41 sec
The 40 fold increase in R is therefore exactly what the model predicts.
Now find out where Dmax is.
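The three model predictions can be set against the measured values in a few lines. The sketch uses Dmax rounded to 20 ms as on the slide; the measured figures are the ones reported in the test results, and the function name is made up for this illustration.

```python
def saturated_response_time(n_users, d_max, think_time):
    """R = N * Dmax - Z, valid once throughput has saturated at 1/Dmax."""
    return n_users * d_max - think_time

D_MAX = 0.020   # seconds, rounded from 1/48 as on the slide

# (N users, think time Z in sec, measured R in sec) from the test results
tests = [(500, 3, 7.0), (2000, 39, 1.1), (4000, 39, 40.5)]
for n, z, measured in tests:
    predicted = saturated_response_time(n, D_MAX, z)
    print(f"N={n:4d} Z={z:2d}  model R = {predicted:5.1f} sec  measured R = {measured} sec")
```

The formula only holds at saturation; below it, R is governed by the unloaded service demands rather than by N Dmax – Z.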
109
SECTION 7: IN SEARCH OF THE BOTTLENECK
Dmax = 20ms
110
Let’s Look at the Usual Suspects
111
Oracle Trace File Analysis: No. of SQLs per BT
1 session in tracefile.
435 user SQL statements in trace file.
70 internal SQL statements in trace file.
505 SQL statements in trace file.
162 unique SQL statements in trace file.
145 SQL statements EXPLAINed using schema: AST.prof$plan_table
Default table was used. Table was created. Table was dropped.
5295 lines in trace file.
WinterCorp Report At That Time: Peak OLTP worldwide on Unix = 8.6 million SQL calls per hour!!!!
435 Database user calls per business transaction = 435 * 48 = 20,880/second ≈ 75.2 million per hour!!
112
Let’s Look at the Usual Suspects: Bottleneck SQLs
Extract of 30min Oracle Statspack report for 4000 user test
SQL ordered by Gets for DB: RCC Instance: RCC
Buffer Gets | Executions | Gets per Exec | %Total | CPU Time (s) | Elapsd Time (s) | Hash Value
454,871,006 | 87,688 | 5,187.4 | 46.8 | 2778.07 | 20102.85 | 3184176672
Module: f90runm@rp84201 (TNS V1-V3)
SELECT ROWID,SEQ_NO,IND_STAT,BNDL_AREA_CD,BNDL_AO_TYP,BNDL_RANGE_CD,BNDL_AO_NO,BNDL_FIN_YR,BNDL_CNTR_NO,BNDL_SEQ_NO,ACK_NO,AST_YR,PAN,DT_FILED,NAME,RET_INC FROM SS_RETURN WHERE (SEQ_NO IN (SELECT a.SEQ_NO FROM ss_return a WHERE A.RANGE_CD = :1 AND A.AO_NO= :2 AND A.AO_TYP = :3 AND A.area_cd = :4)) and (AST_YR=:5) and
SQL ordered by Reads for DB: RCC Instance: RCC
Physical Reads | Executions | Reads per Exec | %Total | CPU Time (s) | Elapsd Time (s) | Hash Value
14,416,210 | 87,688 | 164.4 | 83.6 | 2778.07 | 20102.85 | 3184176672
Module: f90runm@rp84201 (TNS V1-V3)
Same SELECT on SS_RETURN as listed under SQL ordered by Gets (hash value 3184176672)
High physical and logical reads
Avg. Response time/Execution = 20102.85/87688 = 0.23 sec
No. of executions/sec = 87688/1800 (30 min report) ≈ 48 = business tps
Therefore we have only one execution of this SQL per business transaction.
If this SQL takes up 0.23 sec of response time and is 46.8% of all SQLs, then we have a max contribution of 0.23/0.468 ≈ 0.5 sec from SQL Execution Time per Business Transaction.
Therefore the problem lies somewhere else.
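The same arithmetic, written out so the figures can be swapped for another Statspack extract. It follows the slide in treating the %Total column as this statement's share of all SQL work, which is a simplification; the variable names are made up for this sketch.

```python
# Figures from the Statspack extract for SQL hash value 3184176672
elapsed_s = 20102.85       # total elapsed time of the statement (sec)
executions = 87_688
pct_total = 46.8           # %Total column, used as on the slide as this SQL's share
snapshot_s = 30 * 60       # 30 minute report
business_tps = 48

avg_elapsed = elapsed_s / executions            # sec per execution
execs_per_sec = executions / snapshot_s
execs_per_bt = execs_per_sec / business_tps     # executions per business transaction
sql_time_bound = avg_elapsed * execs_per_bt / (pct_total / 100)

print(f"Avg elapsed per execution : {avg_elapsed:.2f} sec")
print(f"Executions per business txn: {execs_per_bt:.2f}")
print(f"Upper bound on SQL time/BT : {sql_time_bound:.1f} sec")
```

Half a second out of a 40.5 sec response time rules out SQL execution as the main contributor.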
115
Statspack Top Events
STATSPACK report for
Snap | Snap Id | Snap Time | Sessions | Curs/Sess
Begin Snap: | 2 | 31-Oct-05 18:36:38 | 4,009 | 118.7
End Snap: | 3 | 31-Oct-05 19:06:39 | 4,006 | 120.7
Elapsed: 30.02 (mins)
Top 5 Timed Events
Event | Waits | Time (s) | % Total Ela Time
latch free | 1,539,663 | 3,587,743 | 98.96
CPU time | | 28,487 | .79
db file sequential read | 17,221,454 | 7,500 | .21
log file sync | 46,102 | 773 | .02
enqueue | 5,299 | 680 | .02
Extract of Oracle Statspack report for 4000 user test: 30 min
Excessively high Latch contention (99% of total wait time)
116
What is a Latch in Oracle?
• Low level ‘lock’ that co-ordinates access to shared data structures
• Protect data structures from corruption when accessed by multiple processes
• First session to get the latch obtains exclusive access
• Otherwise: latch spinning → latch sleeping → retry
• Latches are commonly used during SQL parsing
118
Stages of SQL Processing in Oracle
SQL Stmt → Syntax Check → Semantic Check → Shared Pool Check
• Syntax Check: catches e.g. SLECT * FROM xyz;
• Semantic Check: catches e.g. SELECT * FROM non-existent table;
• Shared Pool Check: each statement is hashed to generate a hash value
• Hash value found: Library Cache Hit → Soft Parse → Execution
• Hash value not found: Library Cache Miss → Hard Parse → Generation of Multiple Execution Plans → Generation of Query Plan → Execution
Latch is held on library cache during hard/soft parse
119
Statspack Wait Events
Wait Events for DB: RCC Instance: RCC
Event | Waits | Total Wait Time (s) | Avg wait (ms) | Waits/txn
latch free | 1,539,663 | 3,587,743 | 2330 | 5.2
db file sequential read | 17,221,454 | 7,500 | 0 | 60.6
log file sync | 246,102 | 773 | 3 | 0.9
enqueue | 5,299 | 680 | 128 | 0.0
Extract of Oracle Statspack report for 4000 user test: 30 min
Latch Free wait during 2000 user test was 8.31% and Avg. wait time was 2 ms
Total Wait Time per Business Transaction
= No. of Waits/BusinessTxn × Average Time per Wait
= No. of Waits/DBTxn × No. of DBTxn/BusinessTxn × Average Time per Wait
= 5.2 × No. of DBTxn/BusinessTxn × 2.33 sec
How to determine No. of DBTxn per BusinessTxn?
121121
Statspack Load Profile
Snap | Snap Id | Snap Time | Sessions | Curs/Sess
Begin Snap: | 2 | 31-Oct-05 18:36:38 | 4,009 | 118.7
End Snap: | 3 | 31-Oct-05 19:06:39 | 4,006 | 120.7
Elapsed: 30.02 (mins)
Load Profile | Per Second | Per Transaction
Redo size: | 1,537,284.62 | 9,748.08
Logical reads: | 539,210.76 | 3,419.19
Block changes: | 8,615.05 | 54.63
Physical reads: | 9,570.96 | 60.69
Physical writes: | 982.97 | 6.23
User calls: | 22,725.02 | 144.10
Parses: | 14,450.63 | 91.63
Hard parses: | 0.00 | 0.00
Sorts: | 3,699.62 | 23.46
Logons: | 0.00 | 0.00
Executes: | 38,242.79 | 242.50
Transactions: | 157.70 |
- DBTxns/sec = 157.7
- BusinessTxns/sec = 48
DBTxns/BusinessTxn = 157.7/48 = 3.29
122
Total Wait Time
Latch free: 5.2 waits per DB txn, average wait 2330 ms = 2.33 sec
Total Wait Time per Business Txn
= No. of Waits/DBTxn × No. of DBTxn/BusinessTxn × Average Time per Wait
= 5.2 × 3.29 × 2.33 sec = 39.86 sec
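The chain of ratios is easy to get wrong by hand, so here it is as a short sketch using the Statspack figures quoted above. The DB transactions per business transaction ratio is rounded to two decimals to match the slide's 3.29; the variable names are made up for this illustration.

```python
waits_per_db_txn = 5.2      # latch free, Waits/txn column
avg_wait_s = 2.33           # latch free, average wait of 2330 ms
db_tps = 157.70             # Transactions per second from the Load Profile
business_tps = 48           # measured business throughput
measured_r = 40.5           # measured average response time at 4000 users (sec)

db_txn_per_bt = round(db_tps / business_tps, 2)   # rounded as on the slide
latch_wait_per_bt = waits_per_db_txn * db_txn_per_bt * avg_wait_s

print(f"DB txns per business txn : {db_txn_per_bt}")
print(f"Latch wait per business txn: {latch_wait_per_bt:.2f} sec")
print(f"Share of measured R        : {latch_wait_per_bt / measured_r:.0%}")
```

Almost all of the measured response time is accounted for by waiting on latches.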
123
Response Time Breakdown
Average Response Time for 4000 users = 40.5 seconds
• Latch Wait Time per Business Transaction = 39.9 sec
• Estimated SQL execution time per BT = 0.5 sec
• Miscellaneous = 0.1 sec
Total 40.5 sec
Clearly the only bottleneck is the latching.
What is causing so much of latching? As discussed earlier, SQL parses cause latching.
124124
Statspack Load Profile (same 30 min snapshot, 31-Oct-05 18:36:38 to 19:06:39)
Load Profile | Per Second | Per Transaction
Parses: | 14,450.63 | 91.63
Hard parses: | 0.00 | 0.00
We only have soft parses
It is common to have 100 to 1000 parses/sec in large systems. Here we have > 10,000/sec
126
Database Tuning that did not Help !
• Increasing spin count for acquiring latches
• Increasing session_cached_cursors
• Forcing cursor sharing for all sqls
• Setting cursor_space_for_time
• Increasing SGA size (shared pool, buffer cache)
• Creating buffer pools
127127
Parsing Analysis
SQL ordered by Parse Calls for DB: RCC Instance: RCC
Parse Calls | Executions | % Total Parses | Hash Value
14,285,993 | 14,286,265 | 54.89 | 2588670467
Module: f90runm@rp84201 (TNS V1-V3)
declare p varchar2(32767); begin p := GF_GLOBAL_POLICY(:sn, :on); :v1 := substr(p,1,4000); :v2 := substr(p,4001,4000); :v3 := substr(p,8001,4000); :v4 := substr(p,12001,4000); :v5 := substr(p,16001,4000); :v6 := substr(p,20001,4000);
2,939,514 | 2,939,618 | 11.29 | 2294365478
Module: f90runm@rp84201 (TNS V1-V3)
declare p varchar2(32767); begin p := GF_MASTER_POLICY(:sn, :on); :v1 := substr(p,1,4000); :v2 := substr(p,4001,4000); :v3 := substr(p,8001,4000); :v4 := substr(p,12001,4000); :v5 := substr(p,16001,4000); :v6 := substr(p,20001,4000);
Extract of Oracle Statspack report for 4000 user test
Most of the parse calls are for two functions : GF_GLOBAL_POLICY & GF_MASTER_POLICY
128
Function Call Analysis

GF_GLOBAL_POLICY:
call     count      cpu    elapsed
------- ------ -------- ----------
Parse      207     0.01       0.02
Execute    207     0.03       0.04
Fetch        0     0.00       0.00
------- ------ -------- ----------
total      414     0.04       0.07

GF_MASTER_POLICY:
call     count      cpu    elapsed
------- ------ -------- ----------
Parse       40     0.01       0.00
Execute     40     0.01       0.01
Fetch        0     0.00       0.00
------- ------ -------- ----------
total       80     0.02       0.02

Extract of Oracle TKPROF report of test
Per business transaction, these two functions are called 207 + 40 = 247 times and have a total parse CPU time of 0.01 + 0.01 = 0.02 sec
Note: Our Performance Model had estimated bottleneck demand Dmax = 20ms = 0.02 sec!!
SECTION 8: THE LAST NAIL IN THE COFFIN
The Story Thus Far
• RFP for System Integration of Centralized Application
• Benchmark of application is mandatory
• Think Time specified as 3 sec
• Realistic estimate proposed as 105 sec
• Technical Committee revises think time to 39 sec
• Last week of benchmark shows acceptable performance for 2000 users, but complete disaster for 4000 users
• The application cannot be touched; all attempts at configuration tuning and capacity management lead to a dead end
• Performance model validates that problem will occur at 4000 users
• Bottleneck identified to be excessive parsing
One day left before the auditor arrives. What to do next?
Recall Bottleneck
GF_GLOBAL_POLICY:
call     count      cpu    elapsed
------- ------ -------- ----------
Parse      207     0.01       0.02
Execute    207     0.03       0.04
Fetch        0     0.00       0.00
------- ------ -------- ----------
total      414     0.04       0.07

GF_MASTER_POLICY:
call     count      cpu    elapsed
------- ------ -------- ----------
Parse       40     0.01       0.00
Execute     40     0.01       0.01
Fetch        0     0.00       0.00
------- ------ -------- ----------
total       80     0.02       0.02
Per business transaction, these two functions are called 207 + 40 = 247 times and have a total parse CPU time of 0.01 + 0.01 = 0.02 sec
Note: Our Performance Model had estimated bottleneck demand Dmax = 20ms = 0.02 sec!!
These two functions are invoked because of Oracle VPD (Virtual Private Database)
Oracle VPD
• Virtual Private Database to enforce security to a fine level of granularity
• For example, a Regional Officer must see only his own region
• For a database with 52 million people, SELECT COUNT(*) FROM PEOPLE will return 52 million in general
• But suppose Regional Officer A is allowed to see only his 10,000 people, then when he fires the query he should see only 10,000
• VPD solves this problem by having the administrator specify policies, which append predicates to the WHERE clause
• Thus when Regional Officer A fires the query: SELECT COUNT(*) FROM PEOPLE
the policy function appends to it, making it effectively: SELECT COUNT(*) FROM PEOPLE WHERE REGIONAL_OFFICER = 'A';
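The predicate-appending idea can be imitated in a few lines. The sketch below is a toy emulation using Python's built-in sqlite3 module, not Oracle's actual VPD mechanism: the functions `regional_policy` and `run_with_policy` and the tiny PEOPLE table are hypothetical, introduced only to show a policy function being evaluated on every query.

```python
import sqlite3

def regional_policy(officer):
    """Hypothetical policy function: returns the predicate to append."""
    return "REGIONAL_OFFICER = ?", (officer,)

def run_with_policy(conn, base_query, officer):
    """Evaluate the policy, append its predicate, then run the query."""
    predicate, params = regional_policy(officer)
    sql = f"{base_query} WHERE {predicate}"
    return conn.execute(sql, params).fetchone()[0]

# Toy data set: 10 people, 3 of whom belong to Regional Officer A.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PEOPLE (NAME TEXT, REGIONAL_OFFICER TEXT)")
rows = [(f"person{i}", "A" if i < 3 else "B") for i in range(10)]
conn.executemany("INSERT INTO PEOPLE VALUES (?, ?)", rows)

total = conn.execute("SELECT COUNT(*) FROM PEOPLE").fetchone()[0]
visible_to_a = run_with_policy(conn, "SELECT COUNT(*) FROM PEOPLE", "A")

print(total, visible_to_a)
```

The point to notice is that the policy function runs once per query. In the benchmark, this per-query evaluation is what shows up as the 247 policy-function calls per business transaction.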
The Final Countdown
• Excessive latching was caused by VPD
• The application could not be touched and hence visit counts could not be reduced
• Recall: Demand = Visit Count x Service Time
• To speed up parsing we needed a faster CPU, but we were already using the best in class
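To make the relation Demand = Visit Count x Service Time concrete, the sketch below plugs in the figures quoted earlier (247 policy-function calls and 0.02 sec of CPU per business transaction). The what-if at the end is purely hypothetical, since the application could not be changed; the names are illustrative.

```python
def demand(visit_count, service_time_sec):
    """Service demand per business transaction = visits x time per visit."""
    return visit_count * service_time_sec

visit_count = 207 + 40     # policy-function calls per business transaction
demand_sec = 0.02          # CPU demand per business transaction (Dmax)

# Implied CPU time per policy-function call.
service_time_sec = demand_sec / visit_count
print(f"service time per call: {service_time_sec * 1e6:.1f} microseconds")

# Hypothetical what-if: halving the visit count would halve the demand,
# but this lever was unavailable because the application was off limits.
halved_demand_sec = demand(visit_count / 2, service_time_sec)
print(f"demand with half the visits: {halved_demand_sec * 1e3:.1f} ms")
```

With the visit count fixed, the only remaining lever is the service time per call, which is why the discussion turns to CPU speed.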
The Final Countdown
Theorem: If you cannot do it, prove that others cannot do it
Proof:
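• At saturation X = 1/Dmax, so Little's Law gives R = N x Dmax – Z = 4000 x Dmax – 39
• Acceptable R ≤ 6.25 sec, therefore acceptable Dmax ≤ (R+Z)/4000 = (6.25+39)/4000 = 11.31 ms
• For Dmax to reduce from 20 ms to 11.31 ms, the CPU should be about 45% faster (a cut of about 43% in CPU time per transaction)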
• At that point in time the competitor’s CPU was 25% faster
• This gave us relief that others would also have failed in this benchmark
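The arithmetic of the proof can be replayed as follows. This is a sketch under one stated assumption: "25% faster" is read as 25% less CPU time per transaction, which is the reading most favourable to the competitor. The function and variable names are illustrative.

```python
def max_bottleneck_demand(users, response_sec, think_sec):
    """Largest Dmax (sec) for which R = N*Dmax - Z stays within response_sec."""
    return (response_sec + think_sec) / users

def saturated_response(users, dmax_sec, think_sec):
    """Response time at saturation: R = N*Dmax - Z."""
    return users * dmax_sec - think_sec

N, Z, R_MAX = 4000, 39, 6.25
current_dmax = 0.020                                   # 20 ms, from the model

required_dmax = max_bottleneck_demand(N, R_MAX, Z)     # about 11.31 ms
required_cut = 1 - required_dmax / current_dmax        # fraction of CPU time to shed

# Assumption: "25% faster" means 25% less CPU time per transaction.
competitor_dmax = current_dmax * (1 - 0.25)
competitor_response = saturated_response(N, competitor_dmax, Z)

print(f"required Dmax: {required_dmax * 1e3:.2f} ms")
print(f"required cut in CPU time: {required_cut:.1%}")
print(f"competitor response time: {competitor_response:.1f} sec")
```

Under this reading the competitor's Dmax would be 15 ms, still above the 11.31 ms limit, so its response time at 4000 users would also exceed 6.25 sec.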
The Final Countdown
• Last Day Options Available
• Option 1: Do nothing, and tell the customer that they will have to wait for 2 years for CPUs to get as fast as their application needs
• Option 2: Run benchmark with realistic think time and force the auditor to include it in the audit report
We opted for Option 2
The Final Countdown
Recall that we had proposed Z = 105 sec, in response to which the customer increased Z from 3 to 39 sec
Now, even 39 sec was proving to be a bottleneck
We decided to benchmark with Z=93 sec instead of 39 sec
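A quick way to see why a longer think time helps is to evaluate the saturation bound R ≥ N x Dmax – Z for each think time discussed so far. The sketch below uses only the bottleneck term and clamps it at zero, so it is a lower bound and not a prediction; the function name is illustrative.

```python
def response_lower_bound(users, dmax_sec, think_sec):
    """Bottleneck-only lower bound on response time: max(0, N*Dmax - Z)."""
    return max(0.0, users * dmax_sec - think_sec)

N, DMAX = 4000, 0.020     # users and bottleneck demand (20 ms)

# Think times from the story: original RFP, revised RFP, benchmarked, proposed.
bounds = {z: response_lower_bound(N, DMAX, z) for z in (3, 39, 93, 105)}

for z, r in bounds.items():
    status = "saturated" if r > 0 else "below saturation"
    print(f"Z = {z:>3} sec: R >= {r:5.1f} sec ({status})")
```

At Z = 3 and Z = 39 the bound is far above the acceptable 6.25 sec, while at Z = 93 and Z = 105 the bottleneck is no longer saturated (N x Dmax = 80 sec is less than Z), so sub-second response times become possible.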
The Final Countdown
Test Type (users/returns): 4000/½ million, 4000/1 million, 2000/1 million, 1000/1 million
Think time: Z = 39 sec, Z = 93 sec
Returns processed: ½ M, 1 M, 1 M, 1 M

Throughput
• Business TPS: 38, 47, 48, 25
• Completion Time (Hrs): 03:36, 06:03, 05:43, 11:09

Response Times, sec (95th percentile)
• User Login: 0.52, 0.5, 0.5
• Main Screen: 0.42, 0.4, 0.4
• Compute: 0.217, 0.3, 0.1
• Refund Details: 0.25, 0.3, 0.2
• Print Result: 0.429, 0.4, 0.3
• User Exit: 1.01, 1.1, 1.2

Utilization
• DB CPU %: ~45%, ~50%, ~50%, ~45%
• Apps CPU %: ~35%, ~45%, ~45%, ~40%
• Web CPU %: ~5%, ~5%, ~10%, ~15%
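The throughput and completion-time figures in the results can be checked against each other, since completion time should be roughly returns processed divided by business TPS. The sketch below assumes that "Completion Time (Hrs)" is in HH:MM form and that the values pair up with the test types in the order listed; both are assumptions about the slide, and the helper names are illustrative.

```python
def completion_hours(returns, tps):
    """Hours needed to process `returns` at a steady `tps`."""
    return returns / tps / 3600

def hhmm_to_hours(text):
    """Convert an 'HH:MM' string to fractional hours."""
    hours, minutes = text.split(":")
    return int(hours) + int(minutes) / 60

# (test type, returns processed, business TPS, reported completion time)
runs = [
    ("4000/half million", 500_000, 38, "03:36"),
    ("4000/1 million", 1_000_000, 47, "06:03"),
    ("2000/1 million", 1_000_000, 48, "05:43"),
    ("1000/1 million", 1_000_000, 25, "11:09"),
]

# Relative gap between the computed and the reported completion time.
gaps = []
for label, returns, tps, reported in runs:
    computed = completion_hours(returns, tps)
    gap = abs(computed - hhmm_to_hours(reported)) / hhmm_to_hours(reported)
    gaps.append(gap)
    print(f"{label}: computed {computed:.2f} h, reported {reported}, gap {gap:.1%}")
```

Under these assumptions all four runs agree to within a few percent, which suggests the reported TPS figures are averages over the whole run.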
The End
Net Result: RFP was scrapped and a new RFP released with think time = 2 minutes
Summary of this Session
Elementary Performance Modelling:
• Little’s Law: N = X (R+Z) for any work conserving system
• Bottleneck Law: X ≤ 1/Dmax
Application Centralization RFP:
• Proving infeasibility of Z=3 well before the benchmark
• Revision to Z=39 and subsequent failure at 4000 users
• Use of simple modelling to analyze bottleneck and prove that nothing needed to be done
• Use of simple modelling to arrive at realistic think time estimate and benchmark run to prove feasibility of Centralization despite stringent security checks in application
• New RFP which led to successful implementation of Centralized Application