
Elementary Performance Modelling as Applied to a Large System Benchmark

[email protected]

Sep 2014


Copyright (C) 2014 Rajesh Mansharamani

Permission is granted to copy, distribute and/or modify this document

under the terms of the GNU Free Documentation License, Version 1.3

or any later version published by the Free Software Foundation;

with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.

A copy of the license is included in the section entitled "GNU

Free Documentation License".


SECTION -1: ORGANIZATIONAL AWARENESS in PERFORMANCE ENGINEERING


Is there a single contact point for Performance Engineering in TCS?

Depending on who is asking the question and who is replying to it, the answer can be any of the following:

a) Yes
b) No
c) I don’t know
d) Maybe
e) It depends …


Whom Should You Reach Out to for your System Performance Needs?



Happy Phase of your Project



CEG

CTG

Performance Tools Group

During the Happy Phase you can reach out to:

CTG

PERC

During the Happy Phase one often does not worry about what’s going to come.



Quite often client will want a fire drill. So where do you go?

Assurance Practice

PT Practice

How many people does it take to light a fire?



When the fire gets to you where do you go?

Infrastructure Practice

How many people does it take to fix a fire?




GCP

How much does it cost to hire a fire consultant?

Perf Engg




DEG

How many firefighters are available?

SK, SB




Phone a friend – nothing official about it!

How many ‘friends’ and good Samaritans are available in a company with 3 lakh (300,000) employees?



The 7 Pillars of TCS Performance Engineering

CTG, CTO, Assurance, Infra, GCP, DEG, You


What Happens When You Are Left to Yourself, With Nowhere to Go?

Dawn of Common Sense & Guts


Common Sense and Guts: Performance Lifecycle

Requirement Analysis

Architecture & Design

Coding

Testing

Production

Common Sense

Guts


SECTION 0: DEFINITIONS


Background: Enterprise Systems Performance

Business Processing System

End User

Response Time (R)

Throughput (X) = no. of completions per unit time

Business Workload

N concurrent users

Capacity: CPU, RAM, Storage, Network

Utilization (U)


Definitions: R

System

Response Time = Exit Time – Entry Time

R1, R2, ..., Rn

R = average system response time = ΣRi / n = (R1 + R2 + ... + Rn) / n

For example:
• average response time for a web page < 2 seconds
• average response time in network < 1 second
• average response time per SQL < 200ms
• average response time per IO < 10ms

It all depends on where you draw the system boundary

Definitions: R, R95, σR

Though average response time is used by default, it is not the only way to characterize response time. Other metrics are percentiles and standard deviation.

Consider 1000 samples of response time as per the following histogram:

Response time (sec)    0.5    1.0    1.5    2.0    2.5
No. of samples         100    500    300     50     50

R = ΣRi/1000

= (100*0.5 + 500*1.0 + 300*1.5 + 50*2.0 + 50*2.5)/1000 = 1.225

R95 = 95th percentile

= value within which 95% of samples fall = 2.0

Variance(R) = Σ(Ri – R)²/1000 = 0.212

Standard Deviation σR = sqrt(Variance) = 0.46
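The three statistics on this slide can be recomputed directly from the histogram. The following is a minimal sketch in Python; the function name `summarize` and the rule used for the percentile (smallest value whose cumulative count covers 95% of the samples) are choices made for this illustration, not something the slides prescribe.

```python
import math

# Histogram from the slide: response time (sec) -> number of samples.
histogram = {0.5: 100, 1.0: 500, 1.5: 300, 2.0: 50, 2.5: 50}


def summarize(hist, pct=95.0):
    """Return (mean, percentile, variance, std deviation) of a histogram."""
    total = sum(hist.values())
    mean = sum(value * count for value, count in hist.items()) / total
    variance = sum(count * (value - mean) ** 2 for value, count in hist.items()) / total
    # Percentile: smallest value whose cumulative count covers pct% of samples.
    needed = pct * total / 100
    cumulative = 0
    percentile = None
    for value in sorted(hist):
        cumulative += hist[value]
        if cumulative >= needed:
            percentile = value
            break
    return mean, percentile, variance, math.sqrt(variance)


mean, r95, variance, sigma = summarize(histogram)
print(f"R = {mean:.3f}, R95 = {r95}, Variance = {variance:.3f}, sigma = {sigma:.2f}")
```

Note how far the 95th percentile sits from the average here: a target stated on the average alone says little about the slowest 5% of requests.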


Definitions: X

System

Throughput = Number of Completions per Unit Time

1, 2, ..., n

Measurement interval T

X = system throughput = n / T

For example:
• business throughput = 20 orders/sec
• Web server throughput = 200 pages/sec
• DB server throughput = 250 SQLs/sec
• IO subsystem throughput = 1500 IOPS

Definitions: N

System

N = average number in the system = $\frac{1}{T}\int_0^T N(t)\,dt$

For example:
• average of 20 orders being processed in OMS
• average of 500 concurrent sessions at web server
• average of 10 orders in dispatch queue
• average of 50 SQLs concurrent in DB server

[Figure: step plot of N(t) against time t, with levels 1, 2 and 3]

Definitions: Concurrent Users

[Figure: N users, with average think time Z, connected through a Network to the Web, App and DB tiers; the subsystems are labelled S1 to S4]

[Figure: the same diagram, with the Network, Web, App and DB tiers enclosed as the Business Processing System S]

By default, concurrent users refers to users doing business processing, which is N. The rate of submission depends upon think time and system response time.

What is think time?

Time taken for any action outside of waiting for a response from the system, that is, time spent at the user terminal such as data entry, review of a response, waiting for the next transaction, or ‘doing nothing’.

Definitions: S

[Figure: Response Time R at a resource, made up of Waiting Time W followed by Service Time S]

Average Service Time S = Average Response Time in Resource outside of queueing/waiting = Single User Response Time at Resource


SECTION 1: CASE STUDY RFP

Background:

[Figure: RCC City A, RCC City B and RCC City C, each running Oracle Forms, Oracle Reports and an Oracle database]

• RCC – Regional Computing Centre
• Total 36 RCC
• Client server architecture
• Each RCC works in isolation

Replace with one single centralized system for the country

Background (Proposed Architecture)

[Figure: RCC City A, RCC City B, RCC City C and the LBS’s connect over VSAT, Dialup and WAN links to the NCC (National Computing Centre), which hosts the HTTP server, App Server, Forms & Report server and Oracle DB database]

Benchmark to determine if this is technically feasible. Risk Mitigation Exercise.


System Integration RFP

• Application Benchmarking a pre-condition in RFP

• 4000 concurrent users

• Average Server Side Response Time per Screen < 1 sec

• Server Utilization < 50%


Objectives of the Benchmark

• To verify server side performance

• To evaluate scalability targets

• Recommend hardware configuration for the application


Rules of the Benchmark

• Application/Database/Load Runner scripts frozen

• Deterministic Think Time (see next slide)

• No reorganization of database permitted

• No application code optimization permitted

• Configuration parameter tuning of web, app, DB servers permitted

• Tests to be executed:

• 1000, 2000, 4000 users for 1 to 4 million transactions


Main Transaction To Be Benchmarked

Average Response Time Per Screen < 1 sec

Think Time for entire transaction: Z = 3 sec!!

Observed Cycle Time was 2 minutes!!!


Background to Z=3

• Small Scale Benchmarking done by proposal team with Z = 0

• Application crashed

• Small Scale Benchmarking repeated with Z = 3

• Application did not crash but response time was high

• Technical Committee decided that proper capacity planning had to be done and this was left to the vendors who bid for the RFP


Problem Statement

• Clearly a think time of 3 doesn’t make sense

• But how to convince the client, who happens to be the income tax department?


SECTION 2: A ‘LITTLE’ OF ELEMENTARY PERFORMANCE MODELLING THEORY


Let's Do Some Operational Analysis

SYSTEM(Work Conserving)

External Observer

Work Conserving: No work is created or destroyed within the system

As an External Observer what events can you observe?

Operational Analysis

[Figure: cumulative arrivals A(t) and cumulative departures D(t) plotted against time t]

A(t) = Total #arrivals up to time t

D(t) = Total #departures up to time t

N(t) = A(t) – D(t) = number in the system at time t. With zero in the system, N(t) = 0.

We would like to find the average of number in the system:

$$N = \frac{1}{T}\int_0^T N(t)\,dt$$

The area under N(t) can be sliced into horizontal strips of height 1. Assuming FCFS, strip i has width $R_i$, the response time of job i:

$$N = \frac{1}{T}\sum_{i=1}^{D(T)} (1 \times R_i) = \frac{D(T)}{T} \cdot \frac{1}{D(T)}\sum_{i=1}^{D(T)} R_i$$

where

$$X = \frac{D(T)}{T} \quad \text{(throughput, \#completions per unit of time)}, \qquad R = \frac{1}{D(T)}\sum_{i=1}^{D(T)} R_i \quad \text{(avg response time)}$$

Therefore N = X R


What if we don't have FCFS?

[Figure: A(t) and D(t) against time t, with arrival instants A1 to A4 and departure instants D1 to D4; strip i has height 1 and width Di – Ai]

Let Ai be the time of the i-th arrival and Di the time of the i-th departure. Without FCFS the i-th departure need not belong to the job that arrived i-th, but the area under N(t) can still be sliced into strips of height 1 and width Di – Ai:

$$N = \frac{1}{T}\int_0^T N(t)\,dt = \frac{1}{T}\sum_{i=1}^{D(T)} [1 \times (D_i - A_i)] = \frac{1}{T}\left(\sum_{i=1}^{D(T)} D_i - \sum_{i=1}^{D(T)} A_i\right)$$

1-1 mapping or pairing: relabel the departure sequence to correspond to the arrival sequence, so that Di' is the departure time of the job that arrived at Ai (in the figure the relabelled departures appear as D2', D4', D1', D3'). Relabelling only reorders the terms of the first sum, so

$$N = \frac{1}{T}\left(\sum_{i=1}^{D(T)} D'_i - \sum_{i=1}^{D(T)} A_i\right) = \frac{1}{T}\sum_{i=1}^{D(T)} (D'_i - A_i) = \frac{1}{T}\sum_{i=1}^{D(T)} R_i = \frac{D(T)}{T} \cdot \frac{1}{D(T)}\sum_{i=1}^{D(T)} R_i = X R$$

Little's Law

For Any Work Conserving System

Average Number in the System N

= System Throughput X × Average Response Time in System R

It all depends upon how you mark your system boundary

If X is business tps, then R is average completion time for a business txn

and

N is average number of business txns in the system
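Little's Law can be checked numerically on a small trace. The sketch below is illustrative only: the four (arrival, departure) pairs are made-up values, chosen so that one job overtakes another (so the order is not FCFS) and so that the system is empty at the start and end of the observation interval.

```python
def time_average_in_system(jobs, horizon):
    """Time-average number in system, given (arrival, departure) pairs."""
    events = []
    for arrival, departure in jobs:
        events.append((arrival, +1))
        events.append((departure, -1))
    events.sort()

    area = 0.0        # running integral of N(t) dt
    in_system = 0     # current N(t)
    last_time = 0.0
    for time, delta in events:
        area += in_system * (time - last_time)
        in_system += delta
        last_time = time
    return area / horizon


# Hypothetical trace: the job arriving at t=1 departs before the job
# that arrived at t=0, so service is not FCFS.
jobs = [(0.0, 5.0), (1.0, 3.0), (4.0, 9.0), (6.0, 8.0)]
T = 10.0

N = time_average_in_system(jobs, T)              # left-hand side
X = len(jobs) / T                                # throughput
R = sum(d - a for a, d in jobs) / len(jobs)      # average response time
print(f"N = {N:.2f}, X * R = {X * R:.2f}")
```

Both sides come out equal even though the jobs do not depart in arrival order, which is the point of the relabelling argument.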


Little's Law for Closed Systems:

[Figure: closed system, with the overall system marked by a red box]

Average Number in Overall System (Red Box) = N

Overall System Throughput = X

Avg Response Time or Cycle Time in Overall System = Z + R

N = X (R+Z)

Little's Law for Closed Systems: Simpler Derivation

Cycle Time Per User = Z + R

Throughput Per User = 1/(Z + R)

Throughput For N Users = X = N/(Z + R)

N = X (R+Z)


SECTION 3: APPLYING THEORY TO PRACTICE


Recall Little’s Law for Closed Systems

N = X (R + Z)

Given:
• N = 4000
• R = 6.25 sec (for business transaction)
• Z = 3 sec for business transaction

We understood that the customer wanted to do 52 million main transactions per year

How do we get expected X from the client?

a) What is your throughput?

b) How many returns/sec?

c) How many returns/hour?

d) How many returns/day?

e) How many returns/month?

f) How many returns/year?


RFP: Little’s Law Validation

N = X (R + Z)

Given:
• N = 4000
• R = 6.25 sec (for business transaction)
• Z = 3 sec for business transaction
• X = 52 million/year

Now check if these 4 balance with each other


X = N/(R+Z)

X = 4000/(6.25 + 3) = 432 returns/sec

Now compute how many returns per day?

How many working hours do we assume per day?
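The balance check can be sketched in a few lines of Python. The 8 working hours per day used below is an assumption made purely for illustration; the slides pose it as an open question and the figure the team actually used is not shown.

```python
SECONDS_PER_HOUR = 3600

n_users = 4000
response_time_s = 6.25          # R for the business transaction
think_time_s = 3.0              # Z as specified in the RFP
annual_returns = 52_000_000     # volume the customer wanted per year

# Assumption for illustration only: length of the working day.
working_hours_per_day = 8

# Little's Law for closed systems: X = N / (R + Z)
throughput_per_s = n_users / (response_time_s + think_time_s)
returns_per_day = throughput_per_s * SECONDS_PER_HOUR * working_hours_per_day
days_for_annual_volume = annual_returns / returns_per_day

print(f"Implied throughput: {throughput_per_s:.0f} returns/sec")
print(f"Implied volume: {returns_per_day / 1e6:.1f} million returns/day")
print(f"Days needed for the annual volume: {days_for_annual_volume:.1f}")
```

Under that assumption, N, R and Z together imply that a full year's volume would be processed in a handful of days, so the four numbers do not balance.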


Based on this analysis the think time was increased, though not to the extent we anticipated

• RFP think time was revised to Z=39 sec from the original Z=3 sec

SECTION 4: BENCHMARK & RESULTS


Benchmark Environment Capacity Estimation

How to plan on number of CPUs required for Performance Testing

N = 4000
Z = 39
R = 6.2

X = N/(R+Z) = 4000/45.2 ≈ 88 tps

To determine the number of CPUs required to support 88 tps, we ran pilot tests on a 4 CPU box, based on which we could extrapolate to the desired capacity for the test environment

Pilot Performance Tests on 4 CPU

[Figure: Business Txns per sec (y-axis, 0 to 2) against Number of concurrent users (x-axis: 1, 5, 10, 100)]

Benchmark Target: 88 tps

[Figure: the same chart, Business Txns per sec (y-axis, 0 to 2) against Number of concurrent users (x-axis: 1, 5, 10, 100)]

4 CPUs for 2 tps ⇒ How many for 88 tps?

Benchmark Target: 88 tps

4/2 * 88 = 176 CPUs

The real picture

[Figure: Business Txns per sec (y-axis, 0 to 10) against Number of concurrent users (x-axis: 1, 20, 40, 60, 80, 100)]

4 CPUs for 10 tps ⇒ How many for 88 tps?

4/10 * 88 = 36 CPUs
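The extrapolation on these two slides is a straight proportion, rounded up to whole CPUs. A minimal sketch follows; the helper name `cpus_required` is made up for this example, and linear scaling with CPU count is the same simplifying assumption the slides make.

```python
import math


def cpus_required(pilot_cpus, pilot_tps, target_tps):
    """Linear extrapolation from a pilot box, rounded up to whole CPUs."""
    return math.ceil(pilot_cpus / pilot_tps * target_tps)


# First pilot reading: throughput appeared to flatten at 2 tps on 4 CPUs.
first_estimate = cpus_required(pilot_cpus=4, pilot_tps=2, target_tps=88)

# "The real picture": the same box actually sustained 10 tps.
real_estimate = cpus_required(pilot_cpus=4, pilot_tps=10, target_tps=88)

print(first_estimate, real_estimate)
```

The five-fold difference between the two estimates comes entirely from which pilot throughput is taken as the box's true maximum, so the pilot has to be pushed to genuine saturation before extrapolating.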

Sequence Number Generation Problem

[Figure: Seq No. 10000, 10001, 10002, 10003, … alongside the B-Tree Index in DB, which holds 10000, 10001, 10002]

To get rid of this hot spot use ‘Reverse Key Index’
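The idea behind a reverse key index can be illustrated with digit reversal. This is only a sketch of the principle: Oracle reverses the bytes of the stored key, whereas the hypothetical helper `reverse_key` below reverses decimal digits so that the effect is easy to read.

```python
def reverse_key(seq_no, width=5):
    """Reverse the decimal digits of a sequence number (illustration only)."""
    return str(seq_no).zfill(width)[::-1]


sequence_numbers = [10000, 10001, 10002, 10003]
reversed_keys = [reverse_key(n) for n in sequence_numbers]

# Consecutive sequence numbers differ in their last digit, so after
# reversal they differ in their first digit and sort far apart.
for seq_no, key in zip(sequence_numbers, reversed_keys):
    print(seq_no, "->", key)
```

Keys that would have been inserted side by side now sort far apart, so concurrent inserts are spread across the index instead of all contending for the same right-most part of it.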

[Figure: Seq No. 10000, 10001, 10002, 10003, … alongside the B-Tree Index in DB, which now holds the reversed sequence numbers, e.g. 10001 and 20001]

Test Results @ Z = 3

• N = 500 << 4000
• X = 48 returns/sec
• R = 7 seconds (> RFP requirement of 6.25 sec)

Test Results: 1000 & 2000 users @ Z=39

Test Type (users / returns)    1000 / 1 million      2000 / 1 million

Throughput
  CPU Used (DB)                16                    32
  Returns processed            1 M                   1 M
  Business TPS                 25                    48
  Completion Time (Hrs)        11:09                 05:43

Utilization
  Web CPU %                    ~15%                  < 10%
  Apps CPU %                   ~40%                  ~40%
  DB CPU %                     ~45%                  ~50%

Response Times (sec)           Avg    95th pct       Avg    95th pct
  User Login                   0.4    0.5            0.4    0.5
  Main Screen                  0.3    0.4            0.3    0.4
  Compute                      0.1    0.1            0.2    0.3
  Refund Details               0.2    0.2            0.2    0.3
  Print Result                 0.3    0.3            0.4    0.4
  User Exit                    1.1    1.2            1.0    1.1

Meets RFP Criteria

Throughput scales almost linearly – no apparent bottleneck

Almost There

• At this point in time there was one week left for the exercise to complete

• Up to 2000 users results looked good and for the final test of 4000 users, extra capacity was also kept available on standby

• Auditors to visit benchmark lab for two days, towards end of the week

And Now for the Grand Finale: N=4000 @ Z=39

Test Type (users / returns)    2000 / 1 million      4000 / 1 million

Throughput
  CPU Used (DB)                32                    32
  Returns processed            1 M                   1 M
  Business TPS                 48                    48
  Completion Time (Hrs)        05:43                 06:03

Utilization
  Web CPU %                    < 10%                 ~5%
  Apps CPU %                   ~40%                  ~45%
  DB CPU %                     ~50%                  ~50%

Response Time (sec)            Avg                   Avg     95th pct
  User Login                   0.4                   0.6     1.6
  Main Screen                  0.3                   0.7     2.1
  Compute                      0.2                   13.7    17.1
  Refund Details               0.2                   2.1     4.5
  Print Result                 0.4                   24.0    29.3
  User Exit                    1.0                   1.1     1.2

R2000 = 1.1 sec, R4000 = 40.5 sec (each equals the sum of the average times for Main Screen, Compute, Refund Details and Print Result)

Let’s Add More CPUs

No. of Users                 2000      4000      4000      4000
No. of DB CPU                32        32        48        56
No. of App Server            4         4         5         5
Business Throughput          48 tps    48 tps    48 tps    48 tps

Response Times (seconds)
  User Login                 0.5       2         (same as for 32 CPUs in both the 48 and 56 CPU runs)
  Main Screen                0.4       2
  Compute                    0.3       17
  Refund Details             0.3       5
  Print Result               0.4       29
  User Exit                  1         1

Utilization
  DB CPU %                   ~50%      ~50%      ~45%      ~40%
  Apps CPU %                 ~40%      ~45%      ~40%      ~40%

Nothing Wrong with Capacity Planning

Nothing to be alarmed about

No disk, memory, network bottlenecks

Database does not scale with CPUs

< 192Kbps on 1 Gbps lan

< 45%

Constant at 45%


And The Panic Button Has Been Pressed!!!

• Response time jumped 40-fold when moving from N=2000 users to N=4000 users

• One day went in people pointing fingers at each other over whether anyone had touched any configuration file

• Another day went in various attempts to tune, but the result was always the same

• X = 48 tps, R = 40.5 seconds

• Now there was just one day left before the auditors would come and disqualify us

What to do? Try more options or sit back and analyze?


SECTION 5: SOME MORE ELEMENTARY PERFORMANCE MODELLING

Bottleneck Law: Background - Pipelining

[Figure: a car is assembled in four stages: Chassis (2 min), Door (2 min), Window (2 min), Paint (2 min)]

Serial Mode: 1 car per 8 min

Pipelined Mode? 4 cars per 8 min

If door takes 4 min what is the throughput?

1 car per 4 min

The slowest stage or bottleneck limits overall throughput

But Every Flow is Not a Pipeline

Visit Counts

• Service Time Si: Average Time Spent in Servicing a Request at resource i (no contention)
• Visit Count Vi: Average Number of Times Resource i is Visited per Transaction

Visit Counts & Demand

To reduce visit count:
a) removal of redundant calls
b) caching at calling tier
c) increase capacity

To reduce service time:
a) optimize code
b) tune platform
c) get a faster resource

Effectively it is a Pipeline of Demands

Demand Di = Vi Si

Dmax = max { Di }

Max throughput = 1 / Dmax

Bottleneck Law: X ≤ 1 / Dmax

This bound becomes an equality once the bottleneck is reached (and the system doesn't crash or break down thereafter)
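The Bottleneck Law can be worked through on a small example. All visit counts and service times below are hypothetical values chosen for illustration; none of them come from the benchmark.

```python
# Hypothetical resources: name -> (visit count Vi per transaction,
#                                  service time Si in seconds per visit)
resources = {
    "web cpu": (1, 0.004),
    "app cpu": (2, 0.005),
    "db cpu": (4, 0.005),
    "disk": (10, 0.001),
}

# Demand Di = Vi * Si: total service a transaction needs at resource i.
demands = {name: visits * service for name, (visits, service) in resources.items()}

# The bottleneck is the resource with the largest demand.
bottleneck = max(demands, key=demands.get)
d_max = demands[bottleneck]
max_throughput = 1.0 / d_max   # Bottleneck Law: X <= 1 / Dmax

for name, demand in demands.items():
    print(f"{name:8s} D = {demand * 1000:.0f} ms")
print(f"Bottleneck: {bottleneck}, max throughput = {max_throughput:.0f} tps")
```

In this made-up case the DB CPU limits the system to 50 tps however much capacity is added elsewhere; only reducing its visit count or its service time raises the ceiling.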


Elementary Performance Modelling: Summary

Little’s Law for Closed Systems: N = X (R+Z)

Bottleneck Law: X = 1/Dmax at saturation

Therefore at saturation:
R = N/X – Z = N/(1/Dmax) – Z
R = N Dmax – Z

SECTION 6: APPLIED MODELLING & ANALYSIS


Modelling the 40-fold Increase in R

Application Test Results                   500 users    2000 users   4000 users
                                           Z=3 sec      Z=39 sec     Z=39 sec
Average Rsp Time of Entire Business Txn    7 sec        1.1 sec      40.5 sec
Business Throughput                        48/sec       48/sec       48/sec

Clearly throughput saturates at X = 48/sec

Recall X = 1/Dmax when the bottleneck is hit

Dmax = 1/48 sec ≈ 20ms

Recall at saturation: R = N Dmax – Z

Model: R = N Dmax – Z, with Dmax = 20ms

N = 500, Z = 3 ⇒ R = 500*20ms – 3 = 10 – 3 = 7 sec

N = 2000, Z = 39 ⇒ R = 2000*20ms – 39 = 40 – 39 = 1 sec

N = 4000, Z = 39 ⇒ R = 4000*20ms – 39 = 80 – 39 = 41 sec

It is natural to have the 40 fold increase in R.

Now find out where is Dmax
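The model can be compared against the three measured results in a few lines. This sketch uses the rounded Dmax = 20 ms from the slides; the function name `saturated_response_time` is chosen for this example.

```python
D_MAX_MS = 20  # bottleneck demand estimated from X = 48/sec, rounded as on the slides


def saturated_response_time(n_users, think_time_s, d_max_ms=D_MAX_MS):
    """R = N * Dmax - Z, valid only once the bottleneck is saturated."""
    return n_users * d_max_ms / 1000 - think_time_s


# (users N, think time Z in sec, measured R in sec) from the test results
tests = [(500, 3, 7.0), (2000, 39, 1.1), (4000, 39, 40.5)]

predictions = []
for n_users, think_time_s, measured_s in tests:
    predicted_s = saturated_response_time(n_users, think_time_s)
    predictions.append(predicted_s)
    print(f"N={n_users:4d} Z={think_time_s:2d}: model {predicted_s:5.1f} sec, measured {measured_s:5.1f} sec")
```

A single parameter, Dmax, reproduces all three measurements to within about half a second, which is strong evidence that one saturated resource explains the behaviour.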


SECTION 7: IN SEARCH OF THE BOTTLENECK

Dmax = 20ms

Let’s Look at the Usual Suspects


Oracle Trace File Analysis: No. of SQLs per BT

    1 session in tracefile.
  435 user SQL statements in trace file.
   70 internal SQL statements in trace file.
  505 SQL statements in trace file.
  162 unique SQL statements in trace file.
  145 SQL statements EXPLAINed using schema: AST.prof$plan_table
        Default table was used.
        Table was created.
        Table was dropped.
 5295 lines in trace file.

WinterCorp Report At That Time: Peak OLTP worldwide on Unix = 8.6 million SQL calls per hour!!!!

435 Database user calls per business transaction
= 435 * 48
≈ 21,000/second
≈ 75 million per hour!!

Let’s Look at the Usual Suspects: Bottleneck SQLs

Extract of 30min Oracle Statspack report for 4000 user test

SQL ordered by Gets for DB: RCC  Instance: RCC

                                                     CPU      Elapsd
  Buffer Gets    Executions  Gets per Exec  %Total Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
    454,871,006       87,688        5,187.4   46.8  2778.07  20102.85 3184176672
Module: f90runm@rp84201 (TNS V1-V3)
SELECT ROWID,SEQ_NO,IND_STAT,BNDL_AREA_CD,BNDL_AO_TYP,BNDL_RANGE_CD,
BNDL_AO_NO,BNDL_FIN_YR,BNDL_CNTR_NO,BNDL_SEQ_NO,ACK_NO,AST_YR,PAN,
DT_FILED,NAME,RET_INC FROM SS_RETURN WHERE (SEQ_NO IN (SELECT a.SEQ_NO
FROM ss_return a WHERE A.RANGE_CD = :1 AND A.AO_NO= :2 AND A.AO_TYP = :3
AND A.area_cd = :4)) and (AST_YR=:5) and

SQL ordered by Reads for DB: RCC  Instance: RCC

                                                     CPU      Elapsd
 Physical Reads  Executions  Reads per Exec %Total Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
     14,416,210       87,688          164.4   83.6  2778.07  20102.85





High physical and logical reads






Avg. Response time/Execution = 20102.85/87688 = 0.23 sec

No. of executions/sec = 87688/1800 (30 min report) ≈ 48 = business tps

Therefore we have only one execution of this SQL per business transaction.

If this SQL takes up 0.23 sec of response time and is 46.8% of all SQLs, then we have a max contribution of 0.5 sec from SQL Execution Time per Business Transaction.

Therefore the problem lies somewhere else.
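The arithmetic behind this conclusion can be reproduced from the Statspack figures. A minimal sketch, with variable names chosen for this example; treating the 46.8% share as this statement's share of total SQL time follows the reasoning on the slide.

```python
# Figures for the top SQL from the 30-minute Statspack extract.
elapsed_time_s = 20102.85       # total elapsed time across all executions
executions = 87688
report_interval_s = 30 * 60
share_of_total_pct = 46.8       # %Total reported for this statement

avg_time_per_execution_s = elapsed_time_s / executions
executions_per_s = executions / report_interval_s

# If this statement accounts for 46.8% of SQL work, scale its time up to 100%.
max_sql_time_per_txn_s = avg_time_per_execution_s / (share_of_total_pct / 100)

print(f"Avg time per execution: {avg_time_per_execution_s:.2f} sec")
print(f"Executions per second:  {executions_per_s:.1f}")
print(f"Max SQL time per business txn: {max_sql_time_per_txn_s:.2f} sec")
```

About half a second of SQL execution cannot account for a 40-second response time, which is why the search moves on to wait events.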


Statspack Top Events

STATSPACK report for

            Snap Id     Snap Time      Sessions Curs/Sess Comment
            ------- ------------------ -------- --------- -------------------
Begin Snap:       2 31-Oct-05 18:36:38    4,009     118.7
  End Snap:       3 31-Oct-05 19:06:39    4,006     120.7
   Elapsed:               30.02 (mins)

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                        % Total
Event                                Waits    Time (s)   Ela Time
------------------------------ ----------- ----------- ----------
latch free                       1,539,663   3,587,743      98.96
CPU time                                        28,487        .79
db file sequential read         17,221,454       7,500        .21
log file sync                       46,102         773        .02
enqueue                              5,299         680        .02
-----------------------------------------------------------------

Extract of Oracle Statspack report for 4000 user test: 30 min

Excessively high Latch contention (99% of total wait time)

What is a Latch in Oracle?

• Low level ‘lock’ that co-ordinates access to shared data structures

• Protects data structures from corruption when accessed by multiple processes

• First session to get the latch obtains exclusive access

• Otherwise: latch spinning → latch sleeping → retry

• Latches are commonly used during SQL parsing

Stages of SQL Processing in Oracle

SQL Stmt
→ Syntax Check (e.g. SLECT * FROM xyz;)
→ Semantic Check (e.g. SELECT * FROM non-existent table;)
→ Shared Pool Check: each statement is hashed to generate a hash value
   • Hash value found: Library Cache Hit → Soft Parse → Execution
   • Library Cache Miss → Hard Parse: Generation of Multiple Execution Plans → Generation of Query Plan → Execution

Latch is held on library cache during hard/soft parse

Statspack Wait Events

Wait Events for DB: RCC  Instance: RCC

                                                       Avg
                                         Total Wait   wait    Waits
Event                           Waits      Time (s)   (ms)     /txn
------------------------- ----------- ------------- ------ --------
latch free                  1,539,663     3,587,743   2330      5.2
db file sequential read    17,221,454         7,500      0     60.6
log file sync                 246,102           773      3      0.9
enqueue                         5,299           680    128      0.0


Latch Free wait during 2000 user test was 8.31% and Avg. wait time was 2 ms



Total Wait Time per Business Transaction
= No. of Waits/BusinessTxn × Average Time per Wait
= No. of Waits/DBTxn × No. of DBTxn/BusinessTxn × Average Time per Wait
= 5.2 × No. of DBTxn/BusinessTxn × 2.33 sec

How to determine No. of DBTxn per BusinessTxn?

Statspack Load Profile

            Snap Id     Snap Time      Sessions Curs/Sess Comment
            ------- ------------------ -------- --------- --------------
Begin Snap:       2 31-Oct-05 18:36:38    4,009     118.7
  End Snap:       3 31-Oct-05 19:06:39    4,006     120.7
   Elapsed:               30.02 (mins)

Load Profile
~~~~~~~~~~~~                       Per Second    Per Transaction
                              ---------------    ---------------
             Redo size:          1,537,284.62           9,748.08
         Logical reads:            539,210.76           3,419.19
         Block changes:              8,615.05              54.63
        Physical reads:              9,570.96              60.69
       Physical writes:                982.97               6.23
            User calls:             22,725.02             144.10
                Parses:             14,450.63              91.63
           Hard parses:                  0.00               0.00
                 Sorts:              3,699.62              23.46
                Logons:                  0.00               0.00
              Executes:             38,242.79             242.50
          Transactions:                157.70

- DBTxns/sec = 157.7

- BusinessTxns/sec = 48

DBTxns/BusinessTxn = 157.7/48 = 3.29

Total Wait Time

Recall, for the latch free event: 5.2 waits per DB transaction, with an average wait of 2330 ms

Total Wait Time per Business Transaction
= No. of Waits/DBTxn × No. of DBTxn/BusinessTxn × Average Time per Wait
= 5.2 × No. of DBTxn/BusinessTxn × 2.33 sec

Total Wait Time Per Business Txn= 5.2 * 3.29 * 2.33 = 39.86 sec
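The same calculation as a short Python sketch, using the Statspack figures quoted above. The variable names are chosen for this example; without the intermediate rounding of 3.29 the result comes out at about 39.8 sec instead of 39.86 sec.

```python
# Figures from the Statspack wait events and load profile for the 4000 user test.
latch_waits_per_db_txn = 5.2
avg_latch_wait_s = 2.33          # 2330 ms
db_txns_per_s = 157.7
business_txns_per_s = 48

db_txns_per_business_txn = db_txns_per_s / business_txns_per_s
latch_wait_per_business_txn_s = (
    latch_waits_per_db_txn * db_txns_per_business_txn * avg_latch_wait_s
)

print(f"DB txns per business txn: {db_txns_per_business_txn:.2f}")
print(f"Latch wait per business txn: {latch_wait_per_business_txn_s:.1f} sec")
```

Nearly all of the measured 40.5 sec response time is therefore spent waiting for latches.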


Response Time Breakdown

Average Response Time for 4000 users = 40.5 seconds

• Latch Wait Time per Business Transaction = 39.9 sec
• Estimated SQL execution time per BT = 0.5 sec
• Miscellaneous = 0.1 sec

Total = 40.5 sec

Clearly the only bottleneck is the latching.

What is causing so much of latching? As discussed earlier, SQL parses cause latching.





We only have soft parses





It is common to have 100 to 1000 parses/sec in large systems. Here we have > 10,000/sec


Database Tuning that did not Help!

• Increasing spin count for acquiring latches

• Increasing session_cached_cursors

• Forcing cursor sharing for all SQLs

• Setting cursor_space_for_time

• Increasing SGA size (shared pool, buffer cache)

• Creating buffer pools

Parsing Analysis

Extract of Oracle Statspack report for 4000 user test

SQL ordered by Parse Calls for DB: RCC  Instance: RCC

                           % Total
 Parse Calls  Executions    Parses  Hash Value
------------ ------------ -------- ----------
  14,285,993   14,286,265    54.89 2588670467
Module: f90runm@rp84201 (TNS V1-V3)
declare p varchar2(32767); begin p := GF_GLOBAL_POLICY(:sn, :on);
:v1 := substr(p,1,4000); :v2 := substr(p,4001,4000);
:v3 := substr(p,8001,4000); :v4 := substr(p,12001,4000);
:v5 := substr(p,16001,4000); :v6 := substr(p,20001,4000);

   2,939,514    2,939,618    11.29 2294365478
Module: f90runm@rp84201 (TNS V1-V3)
declare p varchar2(32767); begin p := GF_MASTER_POLICY(:sn, :on);
:v1 := substr(p,1,4000); :v2 := substr(p,4001,4000);
:v3 := substr(p,8001,4000); :v4 := substr(p,12001,4000);
:v5 := substr(p,16001,4000); :v6 := substr(p,20001,4000);

Most of the parse calls are for two functions: GF_GLOBAL_POLICY & GF_MASTER_POLICY

Function Call Analysis: Extract of Oracle TKPROF report of test

GF_GLOBAL_POLICY:
call     count       cpu    elapsed
------- ------  -------- ----------
Parse      207      0.01       0.02
Execute    207      0.03       0.04
Fetch        0      0.00       0.00
------- ------  -------- ----------
total      414      0.04       0.07

GF_MASTER_POLICY:
call     count       cpu    elapsed
------- ------  -------- ----------
Parse       40      0.01       0.00
Execute     40      0.01       0.01
Fetch        0      0.00       0.00
------- ------  -------- ----------
total       80      0.02       0.02

Per Business Transaction these two functions are called 207+40 = 247 times and have a total parse CPU time of 0.02 sec

Note: Our Performance Model had estimated bottleneck demand Dmax = 20ms = 0.02 sec!!

SECTION 8: THE LAST NAIL IN THE COFFIN

The Story Thus Far

• RFP for System Integration of Centralized Application

• Benchmark of application is mandatory

• Think Time specified as 3 sec

• Realistic estimate proposed as 105 sec

• Technical Committee revises think time to 39 sec

• Last week of benchmark shows acceptable performance for 2000 users, but complete disaster for 4000 users

• The application cannot be touched, all attempts at configuration tuning and capacity management are leading to a dead end

• Performance model validates that problem will occur at 4000 users

• Bottleneck identified to be excessive parsing

1 day left for auditor to arrive. What to do next?


Recall Bottleneck

GF_GLOBAL_POLICY:
call     count       cpu    elapsed
------- ------  -------- ----------
Parse      207      0.01       0.02
Execute    207      0.03       0.04
Fetch        0      0.00       0.00
------- ------  -------- ----------
total      414      0.04       0.07

GF_MASTER_POLICY:
call     count       cpu    elapsed
------- ------  -------- ----------
Parse       40      0.01       0.00
Execute     40      0.01       0.01
Fetch        0      0.00       0.00
------- ------  -------- ----------
total       80      0.02       0.02

Per Business Transaction these two functions are called 207+40 = 247 times and have a total parse CPU time of 0.02 sec

Note: Our Performance Model had estimated bottleneck demand Dmax = 20ms = 0.02 sec!!

These two functions are called because of Oracle VPD


Oracle VPD

• Virtual Private Database to enforce security to a fine level of granularity

• For example, a Regional Officer must see only his own region

• For a database with 52 million people, SELECT COUNT(*) FROM PEOPLE will return 52 million in general

• But suppose Regional Officer A is allowed to see only his 10,000 people, then when he fires the query he should see only 10,000

• VPD solves this problem by having the administrator specify policies which will append to the WHERE clause

• Thus when Regional Officer A fires the query: SELECT COUNT(*) FROM PEOPLE

the policy function appends to it, to make it effectively: SELECT COUNT(*) FROM PEOPLE WHERE REGIONAL_OFFICER='A';

The Final Countdown

• Excessive latching was being caused by VPD

• The application could not be touched and hence visit counts could not be reduced

• Recall: Demand = Visit Count x Service Time

• To speed up parsing we needed a faster CPU, but we were already using the best in class

The Final Countdown

Theorem: If you cannot do it, prove that others cannot do it

Proof:

• R = N Dmax – Z = 4000 Dmax – 39

• Acceptable R ≤ 6.25, therefore acceptable Dmax ≤ (R+Z)/4000 = (6.25+39)/4000 = 11.31ms

• For Dmax to reduce from 20ms to 11.31ms the CPU would have to be about 45% faster

• At that point in time the competitor’s CPU was 25% faster

• This gave us relief that others would also have failed in this benchmark
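The proof can be checked numerically. The sketch below takes the measured Dmax as the rounded 20 ms used on the slides, and reports the required improvement in two ways, since "faster" can be read either as a reduction in demand or as a speed-up ratio; which reading the slide intends is not stated.

```python
n_users = 4000
think_time_s = 39.0
target_response_s = 6.25
measured_d_max_ms = 20.0   # rounded value used on the slides (1/48 sec is nearer 20.8 ms)

# At saturation R = N * Dmax - Z, so the largest acceptable Dmax is (R + Z) / N.
allowed_d_max_ms = (target_response_s + think_time_s) / n_users * 1000

# Two ways of expressing the gap between measured and allowed demand.
demand_reduction = 1 - allowed_d_max_ms / measured_d_max_ms
speedup_factor = measured_d_max_ms / allowed_d_max_ms

print(f"Allowed Dmax: {allowed_d_max_ms:.2f} ms")
print(f"Required reduction in demand: {demand_reduction:.0%}")
print(f"Equivalent speed-up factor: {speedup_factor:.2f}x")
```

On either reading, the required gain is well beyond the 25% advantage attributed to the competitor's CPU.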


The Final Countdown

• Last Day Options Available

• Option 1: Do nothing, and tell the customer that they will have to wait for 2 years for CPUs to get as fast as their application needs

• Option 2: Run benchmark with realistic think time and force auditor to include it in the audit report

We opted for Option 2


The Final Countdown

Recall, we had proposed Z=105 in response to which the customer increased Z from 3 to 39 sec

Now, even 39 sec was proving to be a bottleneck

We decided to benchmark with Z=93 sec instead of 39 sec

The Final Countdown

                               Z = 39 sec                                                Z = 93 sec
Test Type (users / returns)    1000 / 1 million   2000 / 1 million   4000 / 1 million   4000 / ½ million

Throughput
  Returns processed            1 M                1 M                1 M                ½ M
  Business TPS                 25                 48                 47                 38
  Completion Time (Hrs)        11:09              05:43              06:03              03:36

Utilization
  Web CPU %                    ~15%               ~10%               ~5%                ~5%
  Apps CPU %                   ~40%               ~45%               ~45%               ~35%
  DB CPU %                     ~45%               ~50%               ~50%               ~45%

Response Times sec – 95th percentile
  User Login                   0.5                0.5                2                  0.5
  Main Screen                  0.4                0.4                2                  0.4
  Compute                      0.1                0.3                17                 0.2
  Refund Details               0.2                0.3                5                  0.2
  Print Result                 0.3                0.4                29                 0.4
  User Exit                    1.2                1.1                1                  1.0

The End

Net Result: RFP was scrapped and a new RFP released with think time = 2 minutes

Summary of this Session

Elementary Performance Modelling:

• Little’s Law: N = X (R+Z) for any work conserving system

• Bottleneck Law: X ≤ 1/Dmax

Application Centralization RFP:

• Proving infeasibility of Z=3 well before the benchmark

• Revision to Z=39 and subsequent failure at 4000 users

• Use of simple modelling to analyze bottleneck and prove that nothing needed to be done

• Use of simple modelling to arrive at realistic think time estimate and benchmark run to prove feasibility of Centralization despite stringent security checks in application

• New RFP which led to successful implementation of Centralized Application

References

• Lazowska et al: Quantitative System Performance

• V Jain, J Murty: Centralized Tax Processing Performance, ROSETEA 2007

• R. Mansharamani et al. Performance Testing: Far From Steady State. IEEE COMPSAC, Seoul, July 2010

• www.SoftwarePerformanceEngineering.com
