Date post: 26-Dec-2015
Page 1: 1 Elementary Performance Modelling as Applied to a Large System Benchmark Rajesh.Mansharamani@gmail.com Sep 2014.


Elementary Performance Modelling as Applied to a Large System Benchmark

Rajesh.Mansharamani@gmail.com

Sep 2014


Copyright (C) 2014 Rajesh Mansharamani

Permission is granted to copy, distribute and/or modify this document

under the terms of the GNU Free Documentation License, Version 1.3

or any later version published by the Free Software Foundation;

with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.

A copy of the license is included in the section entitled "GNU

Free Documentation License".


SECTION -1: ORGANIZATIONAL AWARENESS in PERFORMANCE ENGINEERING


Is there a single contact point for Performance Engineering in TCS?

Depending on who is asking the question and who is replying to it, the answer can be any of the following:

a) Yes
b) No
c) I don’t know
d) Maybe
e) It depends …


Whom Should You Reach Out to for your System Performance Needs?


Happy Phase of your Project


CEG

CTG

Performance Tools Group

During the Happy Phase you can reach out to:

CTG

PERC

During the Happy Phase one often does not worry about what’s going to come.


Quite often the client will want a fire drill. So where do you go?

Assurance Practice

PT Practice

How many people does it take to light a fire?


When the fire gets to you where do you go?

Infrastructure Practice

How many people does it take to fix a fire?


When the fire gets to you where do you go?

GCP

How much does it cost to hire a fire consultant?

Perf Engg


When the fire gets to you where do you go?

DEG

How many firefighters are available?

SK, SB


When the fire gets to you where do you go?

Phone a friend – nothing official about it!

How many ‘friends’, good Samaritans, are available in a company with 3L (3 lakh, i.e. 300,000) employees?


The 7 Pillars of

TCS Performance Engineering

CTG CTO Assurance

Infra GCP DEG

You


What Happens When You Are Left to Yourself, With Nowhere to Go?

Dawn of Common Sense & Guts


Common Sense and Guts: Performance Lifecycle

Requirement Analysis

Architecture & Design

Coding

Testing

Production

Common Sense

Guts


SECTION 0: DEFINITIONS


Background: Enterprise Systems Performance

Business Processing System

End User

Response Time (R)

Throughput (X) = no. of completions per unit time

Business Workload

N concurrent users

Capacity: CPU, RAM, Storage, Network

Utilization (U)


Definitions: R

System

Response Time = Exit Time – Entry Time

R1, R2, ..., Rn

R = average system response time: $R = \frac{1}{n}\sum_{i=1}^{n} R_i$

For example:
• average response time for a web page < 2 seconds
• average response time in network < 1 second
• average response time per SQL < 200 ms
• average response time per IO < 10 ms

It all depends on where you draw the system boundary


Definitions: R, R95, σR

Though average response time is used by default, it is not the only way to characterize response time. Other metrics are percentiles and standard deviation.

Consider 1000 samples of response time as per the following histogram:

Response time (sec)    0.5    1.0    1.5    2.0    2.5
No. of samples         100    500    300     50     50

$R = \frac{1}{1000}\sum_i R_i = \frac{100 \times 0.5 + 500 \times 1.0 + 300 \times 1.5 + 50 \times 2.0 + 50 \times 2.5}{1000} = 1.225$

R95 = 95th percentile = value within which 95% of samples fall = 2.0

$\mathrm{Variance}(R) = \frac{1}{1000}\sum_i (R_i - R)^2 = 0.212$

Standard Deviation $\sigma_R = \sqrt{\mathrm{Variance}(R)} = 0.46$
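The three statistics can be recomputed directly from the histogram. The sketch below is illustrative only; the function name `summarize` and the rule "smallest value whose cumulative count covers 95% of samples" are assumptions chosen to match the slide's answer.

```python
import math

# Histogram from the slide: response time (sec) -> number of samples.
histogram = {0.5: 100, 1.0: 500, 1.5: 300, 2.0: 50, 2.5: 50}


def summarize(hist, pct=95):
    """Return (mean, percentile, variance, std dev) of a value->count histogram."""
    total = sum(hist.values())
    mean = sum(value * count for value, count in hist.items()) / total
    variance = sum(count * (value - mean) ** 2 for value, count in hist.items()) / total
    # Percentile: smallest value whose cumulative count covers pct% of samples.
    cumulative = 0
    percentile = None
    for value in sorted(hist):
        cumulative += hist[value]
        if cumulative * 100 >= pct * total:
            percentile = value
            break
    return mean, percentile, variance, math.sqrt(variance)


mean, p95, variance, std_dev = summarize(histogram)
print(f"R = {mean:.3f}, R95 = {p95}, Variance = {variance:.3f}, sigma = {std_dev:.2f}")
```

Note how the 95th percentile (2.0) sits well above the average (1.225): a few slow samples barely move the mean but dominate the tail.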


Definitions: X

System

Throughput = Number of Completions per Unit Time

1, 2, ..., n

Measurement interval T

X = system throughput = n / T

For example:
• business throughput = 20 orders/sec
• Web server throughput = 200 pages/sec
• DB server throughput = 250 SQLs/sec
• IO subsystem throughput = 1500 IOPS

It all depends on where you draw the system boundary


Definitions: N

System

N = average number in the system: $N = \frac{1}{T}\int_0^T N(t)\,dt$

For example:
• average of 20 orders being processed in OMS
• average of 500 concurrent sessions at web server
• average of 10 orders in dispatch queue
• average of 50 SQLs concurrent in DB server

It all depends on where you draw the system boundary

[Figure: N(t) plotted against time t, stepping between the levels 1, 2 and 3]


Definitions: Concurrent Users

[Diagram: N users with Avg think time Z connect over the Network to the Web, App and DB tiers; the blocks carry the labels S1, S2, S3 and S4 and together form the Business Processing System S]

By default, concurrent users refers to users doing business processing, which is N. The rate of submission depends upon think time and system response time.


What is think time?

Time taken for any action outside of waiting for a response from the system, that is, time spent at the user terminal such as data entry, review of a response, waiting for the next transaction, or ‘doing nothing’.


Definitions: S

[Diagram: Response Time R at a resource, made up of Waiting Time W and Service Time S]

Average Service Time S = Average Response Time in Resource outside of queueing/waiting = Single User Response Time at Resource


SECTION 1: CASE STUDY RFP


Background:

[Diagram labels: RCC City A, RCC City B, RCC City C; Oracle Forms, Oracle Reports, Oracle database]

• RCC – Regional Computing Centre
• Total 36 RCC
• Client server architecture
• Each RCC works in isolation

Replace with one single centralized system for the country


Background (Proposed Architecture)

[Diagram labels: RCC City A, RCC City B, RCC City C; LBS’s; VSAT, Dialup, WAN; NCC - National Computing Centre with HTTP server, App Server, Forms & Report server, Oracle DB, Database]

Benchmark to determine if this is technically feasible. Risk Mitigation Exercise.


System Integration RFP

• Application Benchmarking a pre-condition in RFP

• 4000 concurrent users

• Average Server Side Response Time per Screen < 1 sec

• Server Utilization < 50%


Objectives of the Benchmark

• To verify server side performance

• To evaluate scalability targets

• Recommend hardware configuration for the application


Rules of the Benchmark

• Application/Database/Load Runner scripts frozen

• Deterministic Think Time (see next slide)

• No reorganization of database permitted

• No application code optimization permitted

• Configuration parameter tuning of web, app, DB servers permitted

• Tests to be executed:

• 1000, 2000, 4000 users for 1 to 4 million transactions


Main Transaction To Be Benchmarked

Average Response Time Per Screen < 1 sec

Think Time for entire transaction: Z = 3 sec!!

Observed Cycle Time was 2 minutes!!!


Background to Z=3

• Small Scale Benchmarking done by proposal team with Z = 0

• Application crashed

• Small Scale Benchmarking repeated with Z = 3

• Application did not crash but response time was high

• Technical Committee decided that proper capacity planning had to be done and this was left to the vendors who bid for the RFP


Problem Statement

• Clearly a think time of 3 sec doesn’t make sense

• But how to convince the client, who happens to be the income tax department?


SECTION 2: A ‘LITTLE’ OF ELEMENTARY PERFORMANCE MODELLING THEORY


Let's Do Some Operational Analysis

SYSTEM (Work Conserving)

External Observer

Work Conserving: No work is created or destroyed within the system

As an External Observer what events can you observe?


Operational Analysis

[Figure: A(t) and D(t) plotted against time t]

A(t) = Total #arrivals up to time t

D(t) = Total #departures up to time t

N(t) = A(t) – D(t)

Zero in the system: N(t) = 0

We would like to find the average of number in the system:

$$N = \frac{1}{T}\int_0^T N(t)\,dt$$

[Figure: the area under N(t) divided into horizontal strips of height 1; the strip for job i has width Ri = Response Time of Job i (assuming FCFS)]

$$N = \frac{1}{T}\sum_{i=1}^{D(T)} (1 \times R_i) = \frac{D(T)}{T} \times \frac{1}{D(T)}\sum_{i=1}^{D(T)} R_i$$

Here $\frac{D(T)}{T}$ is the Throughput X = #completions per unit of time, and $\frac{1}{D(T)}\sum_{i=1}^{D(T)} R_i$ is the Avg Response Time R.

Therefore N = X R


What if we don't have FCFS?

Let Ai be the i-th arrival time and Di the i-th departure time. Each job still contributes a strip of height 1 to the area under N(t):

$$N = \frac{1}{T}\int_0^T N(t)\,dt = \frac{1}{T}\sum_{i=1}^{D(T)} [1 \times (D_i - A_i)] = \frac{1}{T}\left(\sum_{i=1}^{D(T)} D_i - \sum_{i=1}^{D(T)} A_i\right)$$

[Figure: 1-1 pairing of arrivals A1, A2, A3, A4 with departures D1, D2, D3, D4, which also carry the relabelled names D2', D4', D1', D3']

1-1 mapping or pairing: Relabel departure sequence to correspond to arrival sequence, so that Di' is the departure time of the job that arrived at Ai. Relabelling does not change the sum of the departure times, hence:

$$N = \frac{1}{T}\left(\sum_{i=1}^{D(T)} D_i' - \sum_{i=1}^{D(T)} A_i\right) = \frac{1}{T}\sum_{i=1}^{D(T)} (D_i' - A_i) = \frac{1}{T}\sum_{i=1}^{D(T)} R_i = \frac{D(T)}{T} \times \frac{1}{D(T)}\sum_{i=1}^{D(T)} R_i = X R$$


Little's Law

For Any Work Conserving System

Average Number in the System N

= System Throughput X x Average Response Time in System R

All depends upon how you mark your system boundary

If X is business tps, then R is average completion time for a business txn

and

N is average number of business txns in the system
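Little's Law can be checked numerically without assuming FCFS. The sketch below is a toy experiment, not part of the original benchmark: arrival gaps and response times are random numbers picked for illustration, so jobs leave out of order. It compares the time-average of N(t) with X times R over an interval that starts and ends with an empty system.

```python
import random


def littles_law_check(num_jobs=1000, seed=1):
    """Return (time-average N, X * R) for randomly generated jobs."""
    rng = random.Random(seed)
    t = 0.0
    jobs = []  # (arrival time, departure time) per job
    for _ in range(num_jobs):
        t += rng.expovariate(1.0)          # next arrival
        response = rng.uniform(0.1, 5.0)   # arbitrary, so departures are out of order
        jobs.append((t, t + response))
    T = max(departure for _, departure in jobs)  # system is empty again at T

    # Time-average of N(t): sweep arrival (+1) and departure (-1) events.
    events = sorted([(a, +1) for a, _ in jobs] + [(d, -1) for _, d in jobs])
    area, in_system, last = 0.0, 0, 0.0
    for when, delta in events:
        area += in_system * (when - last)
        in_system += delta
        last = when
    avg_n = area / T

    x = len(jobs) / T                                   # throughput
    r = sum(d - a for a, d in jobs) / len(jobs)         # average response time
    return avg_n, x * r


avg_n, x_times_r = littles_law_check()
print(f"N = {avg_n:.4f}, X * R = {x_times_r:.4f}")
```

The two numbers agree up to floating-point rounding, whatever order the jobs depart in.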


Little's Law for Closed Systems:

Average Number in Overall System (Red Box) = N

Overall System Throughput = X

Avg Response Time or Cycle in Overall System = Z+R

N = X (R+Z)


Little's Law for Closed Systems: Simpler Derivation

Cycle Time Per User = Z + R

Throughput Per User = 1/(Z + R)

Throughput For N Users = X = N/(Z + R)

N = X (R+Z)
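The closed-system form is easy to wrap in two helper functions, one solving for X and one for R. The function names and the sample figures (100 users, 2 sec response time, 8 sec think time) are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def closed_throughput(n_users, response_time, think_time):
    """X = N / (R + Z): throughput of a closed system with N users."""
    return n_users / (response_time + think_time)


def closed_response_time(n_users, throughput, think_time):
    """R = N / X - Z: the same law solved for response time."""
    return n_users / throughput - think_time


# Hypothetical example: 100 users, R = 2 sec, Z = 8 sec.
x = closed_throughput(100, 2.0, 8.0)
print(x)                                   # transactions per second
print(closed_response_time(100, x, 8.0))   # recovers R
```

Given any three of N, X, R and Z, the fourth is fixed, which is what makes the law useful for sanity-checking requirements.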


SECTION 3: APPLYING THEORY TO PRACTICE


Recall Little’s Law for Closed Systems

N = X (R + Z)

Given:
• N = 4000
• R = 6.25 sec (for business transaction)
• Z = 3 sec for business transaction

We understood that the customer wanted to do 52 million main transactions per year

How do we get expected X from the client?

a) What is your throughput?

b) How many returns/sec?

c) How many returns/hour?

d) How many returns/day?

e) How many returns/month?

f) How many returns/year?


RFP: Little’s Law Validation

N = X (R + Z)

Given:
• N = 4000
• R = 6.25 sec (for business transaction)
• Z = 3 sec for business transaction
• X = 52 million/year

Now check if these 4 balance with each other


RFP: Little’s Law Validation

X = N/(R+Z)

X = 4000/(6.25 + 3) = 432 returns/sec

Now compute how many returns per day?

How many working hours do we assume per day?
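One way to run the balance check is sketched below. The 8 working hours per day and 250 working days per year are assumptions made purely for illustration; they are not figures from the RFP, and different choices change the numbers but not the conclusion.

```python
# RFP figures from the slides.
N = 4000                  # concurrent users
R = 6.25                  # sec, response time per business transaction
Z = 3.0                   # sec, think time per business transaction
ANNUAL_RETURNS = 52_000_000

# Assumptions for illustration only (not from the RFP).
HOURS_PER_DAY = 8
WORKING_DAYS_PER_YEAR = 250

# Throughput implied by Little's Law for closed systems.
x_per_sec = N / (R + Z)
returns_per_day = x_per_sec * 3600 * HOURS_PER_DAY
days_for_annual_volume = ANNUAL_RETURNS / returns_per_day

# Throughput actually needed, and the think time that would balance it.
required_x = ANNUAL_RETURNS / (WORKING_DAYS_PER_YEAR * HOURS_PER_DAY * 3600)
balancing_z = N / required_x - R

print(f"Implied X = {x_per_sec:.0f} returns/sec")
print(f"Annual volume would be done in {days_for_annual_volume:.1f} working days")
print(f"Required X = {required_x:.1f} returns/sec, balancing Z = {balancing_z:.0f} sec")
```

Under these assumptions, Z = 3 sec implies a whole year's returns processed in a few working days, so the four RFP figures cannot all hold at once.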


Based on this analysis the think time was increased, though not to the extent we anticipated

• RFP think time was revised to Z=39 sec from the original Z=3 sec


SECTION 4: BENCHMARK & RESULTS


Benchmark Environment Capacity Estimation

How to plan on number of CPUs required for Performance Testing

N = 4000
Z = 39
R = 6.2

X = N/(R+Z) = 4000/45.2 = 88 tps

To determine number of CPUs required to support 88 tps, we ran pilot tests on a 4 CPU box based on which we could extrapolate to the desired capacity for the test environment


Benchmark Environment Capacity Estimation

Pilot Performance Tests on 4 CPU

[Chart: Business Txns per sec (0 to 2) against Number of concurrent users (1, 5, 10, 100), with the Benchmark Target of 88 tps marked]

4 CPUs for 2 tps ⇒ How many for 88 tps?

4/2 * 88 = 176 CPUs


Pilot Performance Tests on 4 CPU: The real picture

[Chart: Business Txns per sec (0 to 10) against Number of concurrent users (1, 20, 40, 60, 80, 100)]

4 CPUs for 10 tps ⇒ How many for 88 tps?

4/10 * 88 = 35.2, rounded up to 36 CPUs
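The extrapolation on these slides is a straight proportion, sketched below. The function name is hypothetical, and linear scaling from the 4 CPU pilot to the full target is itself an assumption; real systems often scale worse than linearly.

```python
import math


def cpus_needed(pilot_cpus, pilot_tps, target_tps):
    """Scale the pilot CPU count linearly to the target throughput, rounding up."""
    return math.ceil(pilot_cpus / pilot_tps * target_tps)


target_tps = 4000 / (6.2 + 39)          # Little's Law: about 88 tps
print(f"Target throughput: {target_tps:.1f} tps")

# Pilot driven with too few users to saturate the box: 4 CPUs looked like 2 tps.
print(cpus_needed(4, 2, 88))            # 176

# Pilot driven to saturation: the same 4 CPUs actually sustain 10 tps.
print(cpus_needed(4, 10, 88))           # 36
```

The same formula gives a fivefold difference in hardware depending on whether the pilot was run to saturation, which is why the peak throughput of the pilot box must be measured with enough concurrent users.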


Sequence Number Generation Problem

[Figure: Seq No. values 10000, 10001, 10002, 10003 generated in order; in the B-Tree Index in DB the consecutive keys 10000, 10001, 10002 sit next to each other]


To get rid of this hot spot use ‘Reverse Key Index’


[Figure: Seq No. values 10000, 10001, 10002, 10003 stored in the B-Tree Index in DB under their Reverse Seq No., for example 10001 and 20001]
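The effect of key reversal can be illustrated with a few lines of code. This sketch mimics the slide's picture by reversing decimal digits; Oracle's reverse key index reverses the bytes of the key rather than its decimal digits, so treat this as an analogy and not as the database's actual implementation. The function name is hypothetical.

```python
def reverse_key(seq_no: int) -> str:
    """Reverse the decimal digits of a sequence number, keeping its width."""
    return str(seq_no)[::-1]


consecutive = range(10000, 10010)

# Plain keys: every new key shares the prefix '1000' and lands at the
# right-hand edge of the index, so all inserts contend for the same block.
print([str(n) for n in consecutive])

# Reversed keys: the fastest-changing digit now leads, so consecutive
# inserts are scattered across the key range.
print([reverse_key(n) for n in consecutive])
```

The price of this scattering is that range scans on the original key order can no longer use the index efficiently, so the technique suits keys that are only looked up by exact value.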


Test Results @ Z = 3

N = 500 << 4000
X = 48 returns/sec
R = 7 seconds (> RFP requirement of 6.25 sec)


Test Results: 1000 & 2000 users @ Z=39

Test Type (users / returns)    1000 / 1 million      2000 / 1 million

Throughput
  Business TPS                 25                    48
  Completion Time (Hrs)        11:09                 05:43
  Returns processed            1 M                   1 M
  CPU Used (DB)                16                    32

Response Times sec             Avg    95th pct       Avg    95th pct
  User Login                   0.4    0.5            0.4    0.5
  Main Screen                  0.3    0.4            0.3    0.4
  Compute                      0.1    0.1            0.2    0.3
  Refund Details               0.2    0.2            0.2    0.3
  Print Result                 0.3    0.3            0.4    0.4
  User Exit                    1.1    1.2            1.0    1.1

Utilization
  Web CPU %                    ~15%                  < 10%
  Apps CPU %                   ~40%                  ~40%
  DB CPU %                     ~45%                  ~50%

Meets RFP Criteria

Throughput scales almost linearly – no apparent bottleneck
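The Business TPS figures can be cross-checked against the completion times, since throughput is simply completions divided by elapsed time. The sketch below assumes the completion times are in hours:minutes, which is an interpretation of the "Completion Time (Hrs)" label; the function name is hypothetical.

```python
def business_tps(returns_processed, completion_hh_mm):
    """X = n / T, with T given as an 'HH:MM' string (assumed hours:minutes)."""
    hours, minutes = map(int, completion_hh_mm.split(":"))
    return returns_processed / (hours * 3600 + minutes * 60)


x_1000 = business_tps(1_000_000, "11:09")
x_2000 = business_tps(1_000_000, "05:43")

print(f"1000 users: {x_1000:.1f} tps")
print(f"2000 users: {x_2000:.1f} tps")
print(f"Scaling factor: {x_2000 / x_1000:.2f}x for 2x users")
```

Both values land close to the reported 25 and 48 tps, and the ratio is just under 2, consistent with near-linear scaling up to 2000 users.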


Almost There

• At this point in time there was one week left for the exercise to complete

• Up to 2000 users results looked good and for the final test of 4000 users, extra capacity was also kept available on standby

• Auditors to visit benchmark lab for two days, towards end of the week


And Now for the Grand Finale: N=4000 @ Z=39

Test Type (users / returns)    2000 / 1 million      4000 / 1 million

Throughput
  Business TPS                 48                    48
  Completion Time (Hrs)        05:43                 06:03
  Returns processed            1 M                   1 M
  CPU Used (DB)                32                    32

Response Time (sec)            Avg                   Avg    95th pct
  User Login                   0.4                   0.6    1.6
  Main Screen                  0.3                   0.7    2.1
  Compute                      0.2                   13.7   17.1
  Refund Details               0.2                   2.1    4.5
  Print Result                 0.4                   24.0   29.3
  User Exit                    1.0                   1.1    1.2

Utilization
  Web CPU %                    < 10%                 ~5%
  Apps CPU %                   ~40%                  ~45%
  DB CPU %                     ~50%                  ~50%

R2000 = 1.1
R4000 = 40.5


Let’s Add More CPUs

No. of Users               2000      4000      4000      4000
No. of DB CPU              32        32        48        56
No. of App Server          4         4         5         5
Business Throughput        48 tps    48 tps    48 tps    48 tps

Response Times (seconds); for the 48 and 56 DB CPU runs: same as for 32 CPUs
  User Login               0.5       2
  Main Screen              0.4       2
  Compute                  0.3       17
  Refund Details           0.3       5
  Print Result             0.4       29
  User Exit                1         1

Utilization
  DB CPU %                 ~50%      ~50%      ~45%      ~40%
  Apps CPU %               ~40%      ~45%      ~40%      ~40%


Nothing Wrong with Capacity Planning

Nothing to be alarmed about

No disk, memory, network bottlenecks

Database does not scale with CPUs

< 192 Kbps on 1 Gbps LAN

< 45%

Constant at 45%


And The Panic Button Has Been Pressed!!!

• Response time jumped 40-fold when moving from N=2000 users to N=4000 users

• 1 day went in multiple people pointing fingers at each other over whether anyone had touched any configuration file

• Another day went in various attempts to tune, but the result was always the same

• X = 48 tps, R = 40.5 seconds

• Now there was just one day left before the auditors would come and disqualify us

What to do? Try more options or sit back and analyze?


SECTION 5: SOME MORE ELEMENTARY PERFORMANCE MODELLING


Bottleneck Law: Background - Pipelining

Car assembly line: Chassis → Door → Window → Paint, each stage taking 2 min

Serial Mode: 1 car per 8 min

Pipelined Mode? 4 cars per 8 min

If door takes 4 min what is the throughput? 1 car per 4 min

The slowest stage or bottleneck limits overall throughput
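The car example reduces to two one-line formulas, sketched below with hypothetical function names: in serial mode a car takes the sum of the stage times, while in a full pipeline a car rolls off every time the slowest stage finishes.

```python
def serial_throughput(stage_minutes):
    """Cars per minute when one car is built start to finish before the next begins."""
    return 1 / sum(stage_minutes)


def pipelined_throughput(stage_minutes):
    """Cars per minute in steady state: limited by the slowest stage."""
    return 1 / max(stage_minutes)


balanced = [2, 2, 2, 2]      # chassis, door, window, paint (minutes)
slow_door = [2, 4, 2, 2]     # door now takes 4 minutes

print(serial_throughput(balanced) * 8)      # 1 car per 8 min
print(pipelined_throughput(balanced) * 8)   # 4 cars per 8 min
print(pipelined_throughput(slow_door) * 4)  # 1 car per 4 min
```

Speeding up any stage other than the door leaves the pipelined throughput unchanged, which is the intuition behind the bottleneck law.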


But Every Flow is Not a Pipeline


Visit Counts & Demand

Average Time Spent in Servicing Request (no contention)

Average Number of Times Resource is Visited per Transaction

To reduce visit count:
a) removal of redundant calls
b) caching at calling tier
c) increase capacity

To reduce service time:
a) optimize code
b) tune platform
c) get a faster resource


104

Effectively it is a Pipeline of Demands

Demand Di = Vi Si

Dmax = max { Di }

Max throughput = 1 / Dmax

Bottleneck Law: X ≤ 1 / Dmax

This bound becomes an equality once the bottleneck is reached (and the system doesn't crash or break down thereafter)
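
As a rough illustration of the Bottleneck Law, the sketch below computes each demand as visit count times service time and takes the largest. The resource names, visit counts and service times are hypothetical numbers chosen for the example; they are not measurements from the benchmark.

```python
# Hypothetical per-transaction visit counts and service times (seconds).
resources = {
    "web cpu": (2, 0.002),
    "app cpu": (5, 0.003),
    "db cpu": (10, 0.002),
}

# Demand Di = Vi * Si for every resource.
demands = {name: visits * service for name, (visits, service) in resources.items()}

# The bottleneck is the resource with the largest demand.
bottleneck = max(demands, key=demands.get)
d_max = demands[bottleneck]
max_throughput = 1.0 / d_max  # upper bound on transactions per second

print(f"bottleneck: {bottleneck}, Dmax = {d_max * 1000:.1f} ms")
print(f"throughput bound: X <= {max_throughput:.1f} per sec")
```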


105

Elementary Performance Modelling: Summary

Little’s Law for Closed Systems: N = X (R+Z)

Bottleneck Law: X = 1/Dmax at saturation

Therefore at saturation:
R = N/(1/Dmax) – Z
R = N Dmax – Z


106

SECTION 6: APPLIED MODELLING & ANALYSIS


107

Modelling the 40-fold Increase in R

Application Test Results                   500 users    2000 users   4000 users
                                           Z=3 sec      Z=39 sec     Z=39 sec
Average Rsp Time of Entire Business Txn    7 sec        1.1 sec      40.5 sec
Business Throughput                        48/sec       48/sec       48/sec

Clearly throughput saturates at X = 48/sec

Recall X = 1/Dmax when the bottleneck is hit

Dmax = 1/48 sec ≈ 20ms

Recall at saturation: R = N Dmax – Z


108

Modelling the 40-fold Increase in R


Model: R = N Dmax – Z, Dmax = 20ms

N = 500, Z = 3 ⇒ R = 500*20ms – 3 = 10 – 3 = 7 sec

N = 2000, Z = 39 ⇒ R = 2000*20ms – 39 = 40 – 39 = 1 sec

N = 4000, Z = 39 ⇒ R = 4000*20ms – 39 = 80 – 39 = 41 sec

It is natural to have the 40-fold increase in R.

Now find out where Dmax is.
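
The model can be replayed against the three test results in a few lines. This is a sketch only: the function name is illustrative, and the formula holds only when the system is saturated, as it was in all three tests (throughput was pinned at 48/sec).

```python
def saturated_response_time(n_users, d_max_sec, think_sec):
    """R = N * Dmax - Z, valid only once the bottleneck is saturated."""
    return n_users * d_max_sec - think_sec


D_MAX = 0.020  # seconds, rounded from 1/48 as on the slide

# (users, think time in sec, measured average response time in sec)
tests = [(500, 3, 7.0), (2000, 39, 1.1), (4000, 39, 40.5)]

for n, z, measured in tests:
    predicted = saturated_response_time(n, D_MAX, z)
    print(f"N={n:5d} Z={z:3d}  predicted R={predicted:5.1f}  measured R={measured:5.1f}")
```

Doubling N from 2000 to 4000 adds 2000 x 20ms = 40 sec to R, which is why the jump looks so dramatic when R started near 1 sec.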


109

SECTION 7: IN SEARCH OF THE BOTTLENECK

Dmax = 20ms


110

Let’s Look at the Usual Suspects


111

Oracle Trace File Analysis: No. of SQLs per BT

       1  session in tracefile.
     435  user SQL statements in trace file.
      70  internal SQL statements in trace file.
     505  SQL statements in trace file.
     162  unique SQL statements in trace file.
     145  SQL statements EXPLAINed using schema:
           AST.prof$plan_table
             Default table was used.
             Table was created.
             Table was dropped.
    5295  lines in trace file.

WinterCorp Report At That Time: Peak OLTP worldwide on Unix = 8.6 million SQL calls per hour!!!!

435 Database user calls per business transaction
= 435 * 48
= 20,880/second
= 75.2 million per hour!!
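
The call-rate arithmetic is easy to reproduce. The sketch below uses only the figures quoted on this slide (435 user SQL calls per business transaction, 48 business transactions per second, and the 8.6 million calls per hour attributed to the WinterCorp report); the variable names are illustrative.

```python
sql_calls_per_bt = 435        # user SQL statements per business transaction
business_tps = 48             # business transactions per second
reported_peak_per_hour = 8.6e6

calls_per_sec = sql_calls_per_bt * business_tps
calls_per_hour = calls_per_sec * 3600

print(f"{calls_per_sec:,} SQL calls per second")
print(f"{calls_per_hour / 1e6:.1f} million SQL calls per hour")
print(f"{calls_per_hour / reported_peak_per_hour:.1f} times the reported peak")
```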


112

SQL ordered by Gets for DB: RCC  Instance: RCC

                                                     CPU      Elapsd
  Buffer Gets    Executions  Gets per Exec  %Total Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
    454,871,006       87,688        5,187.4   46.8  2778.07  20102.85 3184176672
Module: f90runm@rp84201 (TNS V1-V3)
SELECT ROWID,SEQ_NO,IND_STAT,BNDL_AREA_CD,BNDL_AO_TYP,BNDL_RANGE_CD,BNDL_AO_NO,BNDL_FIN_YR,BNDL_CNTR_NO,BNDL_SEQ_NO,ACK_NO,AST_YR,PAN,DT_FILED,NAME,RET_INC FROM SS_RETURN WHERE (SEQ_NO IN (SELECT a.SEQ_NO FROM ss_return a WHERE A.RANGE_CD = :1 AND A.AO_NO= :2 AND A.AO_TYP = :3 AND A.area_cd = :4)) and (AST_YR=:5) and

SQL ordered by Reads for DB: RCC  Instance: RCC

                                                     CPU      Elapsd
 Physical Reads  Executions  Reads per Exec %Total Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
     14,416,210       87,688          164.4   83.6  2778.07  20102.85 3184176672
Module: f90runm@rp84201 (TNS V1-V3)
SELECT ROWID,SEQ_NO,IND_STAT,BNDL_AREA_CD,BNDL_AO_TYP,BNDL_RANGE_CD,BNDL_AO_NO,BNDL_FIN_YR,BNDL_CNTR_NO,BNDL_SEQ_NO,ACK_NO,AST_YR,PAN,DT_FILED,NAME,RET_INC FROM SS_RETURN WHERE (SEQ_NO IN (SELECT a.SEQ_NO FROM ss_return a WHERE A.RANGE_CD = :1 AND A.AO_NO= :2 AND A.AO_TYP = :3 AND A.area_cd = :4)) and (AST_YR=:5) and

Extract of 30min Oracle Statspack report for 4000 user test

Let’s Look at the Usual Suspects: Bottleneck SQLs


113

High physical and logical reads

Let’s Look at the Usual Suspects


114

Let’s Look at the Usual Suspects

Avg. Response time/Execution = 20102.85/87688 = 0.23 sec

No. of executions/sec = 87688/1800 (30 min report) ≈ 48 = business tps

Therefore we have only one execution of this SQL per business transaction.

If this SQL takes up 0.23 sec of response time and accounts for 46.8% of all buffer gets, then SQL execution time contributes at most about 0.23/0.468 ≈ 0.5 sec per Business Transaction.

Therefore the problem lies somewhere else.
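
The estimate above can be reproduced from the Statspack figures. This is a sketch under one stated assumption: that elapsed time scales roughly with the share of buffer gets, so dividing by 46.8% gives a ceiling for all SQL together. The variable names are illustrative.

```python
elapsed_sec = 20102.85      # total elapsed time of the top SQL in the report
executions = 87688          # executions of that SQL in the report window
report_window_sec = 30 * 60
share_of_buffer_gets = 0.468

per_execution_sec = elapsed_sec / executions
executions_per_sec = executions / report_window_sec

# Assumption: elapsed time is roughly proportional to buffer gets.
all_sql_per_bt_sec = per_execution_sec / share_of_buffer_gets

print(f"elapsed per execution: {per_execution_sec:.2f} sec")
print(f"executions per second: {executions_per_sec:.1f}")
print(f"ceiling for all SQL per business txn: {all_sql_per_bt_sec:.2f} sec")
```

Half a second is far short of the 40.5 sec observed, so SQL execution time cannot explain the response time.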


115

Statspack Top Events

STATSPACK report for

            Snap Id     Snap Time      Sessions Curs/Sess Comment
            ------- ------------------ -------- --------- -------------------
Begin Snap:       2 31-Oct-05 18:36:38    4,009     118.7
  End Snap:       3 31-Oct-05 19:06:39    4,006     120.7
   Elapsed:               30.02 (mins)

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                           % Total
Event                                   Waits    Time (s)   Ela Time
------------------------------ -------------- ----------- ----------
latch free                          1,539,663   3,587,743      98.96
CPU time                                           28,487        .79
db file sequential read            17,221,454       7,500        .21
log file sync                          46,102         773        .02
enqueue                                 5,299         680        .02

Extract of Oracle Statspack report for 4000 user test: 30 min

Excessively high Latch contention (99% of total wait time)


116

What is a Latch in Oracle?

• Low level ‘lock’ that co-ordinates access to shared data structures

• Protect data structures from corruption when accessed by multiple processes

• First session to get the latch obtains exclusive access

• Otherwise: Latch spinning → Latch sleeping → Retry

• Latches are commonly used during SQL parsing
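
The spin, sleep, retry cycle can be illustrated with a toy lock. This is a conceptual sketch only, not Oracle's implementation: the class name, spin count and sleep interval are made up for the example.

```python
import threading
import time


class SpinThenSleepLatch:
    """Toy latch: try a bounded number of spins, then sleep and retry."""

    def __init__(self, spin_count=200, sleep_sec=0.001):
        self._lock = threading.Lock()
        self.spin_count = spin_count
        self.sleep_sec = sleep_sec

    def acquire(self):
        while True:
            # Spin: poll the latch without giving up the CPU voluntarily.
            for _ in range(self.spin_count):
                if self._lock.acquire(blocking=False):
                    return
            # Spinning failed: sleep, then retry from the top.
            time.sleep(self.sleep_sec)

    def release(self):
        self._lock.release()


latch = SpinThenSleepLatch()
counter = 0


def worker(iterations):
    global counter
    for _ in range(iterations):
        latch.acquire()
        try:
            counter += 1  # the shared structure protected by the latch
        finally:
            latch.release()


threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)
```

With many sessions competing for one latch, more and more of them end up in the sleep branch, which is what shows up as "latch free" wait time.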


118

Stages of SQL Processing in Oracle

SQL Stmt → Syntax Check → Semantic Check → Shared Pool Check

• Syntax Check, e.g. SLECT * FROM xyz;

• Semantic Check, e.g. SELECT * FROM non-existent table;

• Shared Pool Check: each statement is hashed to generate a hash value

• Hash value found (Library Cache Hit): Soft Parse → Execution

• Library Cache Miss: Hard Parse → Generation of Multiple Execution Plans → Generation of Query Plan → Execution

Latch is held on library cache during hard/soft parse
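
The difference between a hard parse and a soft parse can be sketched as a cache lookup keyed on a hash of the statement text. This is a toy model with made-up names; it leaves out the syntax and semantic checks, plan generation and the latch itself, and it does not use Oracle's actual hash function.

```python
import hashlib


class LibraryCache:
    """Toy shared pool: statement text is hashed and looked up."""

    def __init__(self):
        self._plans = {}
        self.hard_parses = 0
        self.soft_parses = 0

    def parse(self, sql_text):
        key = hashlib.sha256(sql_text.encode()).hexdigest()
        plan = self._plans.get(key)
        if plan is None:
            # Library cache miss: build and store a plan (hard parse).
            self.hard_parses += 1
            plan = f"plan for: {sql_text}"
            self._plans[key] = plan
        else:
            # Library cache hit: reuse the stored plan (soft parse).
            self.soft_parses += 1
        return plan


cache = LibraryCache()
# The same text with a bind variable is hard parsed once, then soft parsed.
for _ in range(3):
    cache.parse("SELECT COUNT(*) FROM people WHERE regional_officer = :1")
# Different text (a literal instead of a bind) misses the cache.
cache.parse("SELECT COUNT(*) FROM people WHERE regional_officer = 'A'")

print("hard parses:", cache.hard_parses, "soft parses:", cache.soft_parses)
```

Even a soft parse has to consult the shared structure, so a very high soft-parse rate can still produce heavy latch contention.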


119

Statspack Wait Events

Wait Events for DB: RCC Instance: RCC

                                                                Avg
                                               Total Wait      wait    Waits
Event                                 Waits      Time (s)      (ms)     /txn
------------------------------ ------------ ------------- --------- --------
latch free                        1,539,663     3,587,743      2330      5.2
db file sequential read          17,221,454         7,500         0     60.6
log file sync                       246,102           773         3      0.9
enqueue                               5,299           680       128      0.0

Extract of Oracle Statspack report for 4000 user test: 30 min

Latch Free wait during 2000 user test was 8.31% and Avg. wait time was 2 ms


120

Statspack Wait Events

Total Wait Time per Business Transaction
= No. of Waits/BusinessTxn x Average Time per Wait
= No. of Waits/DBTxn x No. of DBTxn/BusinessTxn x Average Time per Wait
= 5.2 x No. of DBTxn/BusinessTxn x 2.33 sec

How to determine No. of DBTxn per BusinessTxn?


121121

            Snap Id     Snap Time      Sessions Curs/Sess Comment
            ------- ------------------ -------- --------- --------------
Begin Snap:       2 31-Oct-05 18:36:38    4,009     118.7
  End Snap:       3 31-Oct-05 19:06:39    4,006     120.7
   Elapsed:               30.02 (mins)

Load Profile
~~~~~~~~~~~~                        Per Second    Per Transaction
                               ---------------    ---------------
              Redo size:          1,537,284.62           9,748.08
          Logical reads:            539,210.76           3,419.19
          Block changes:              8,615.05              54.63
         Physical reads:              9,570.96              60.69
        Physical writes:                982.97               6.23
             User calls:             22,725.02             144.10
                 Parses:             14,450.63              91.63
            Hard parses:                  0.00               0.00
                  Sorts:              3,699.62              23.46
                 Logons:                  0.00               0.00
               Executes:             38,242.79             242.50
           Transactions:                157.70

Statspack Load Profile

- DBTxns/sec = 157.7

- BusinessTxns/sec = 48

DBTxns/BusinessTxn = 157.7/48 = 3.29


122

Total Wait Time

                                                                Avg
                                               Total Wait      wait    Waits
Event                                 Waits      Time (s)      (ms)     /txn
------------------------------ ------------ ------------- --------- --------
latch free                        1,539,663     3,587,743      2330      5.2
db file sequential read          17,221,454         7,500         0     60.6
log file sync                       246,102           773         3      0.9
enqueue                               5,299           680       128      0.0

Total Wait Time per Business Transaction
= No. of Waits/BusinessTxn x Average Time per Wait
= No. of Waits/DBTxn x No. of DBTxn/BusinessTxn x Average Time per Wait
= 5.2 x No. of DBTxn/BusinessTxn x 2.33 sec

Total Wait Time Per Business Txn= 5.2 * 3.29 * 2.33 = 39.86 sec
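
The same calculation in Python, using only the Statspack figures quoted on these slides. The variable names are illustrative. The result differs slightly from the slide's 39.86 sec because the slide rounds the DB-to-business transaction ratio to 3.29 before multiplying.

```python
waits_per_db_txn = 5.2   # "latch free" waits per DB transaction
avg_wait_sec = 2.33      # average "latch free" wait (2330 ms)
db_tps = 157.7           # DB transactions per second (Load Profile)
business_tps = 48        # business transactions per second

db_txn_per_business_txn = db_tps / business_tps
latch_wait_per_bt = waits_per_db_txn * db_txn_per_business_txn * avg_wait_sec

print(f"DB txns per business txn: {db_txn_per_business_txn:.2f}")
print(f"latch wait per business txn: {latch_wait_per_bt:.1f} sec")
```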


123

Response Time Breakdown

Average Response Time for 4000 users = 40.5 seconds

• Latch Wait Time per Business Transaction = 39.9 sec
• Estimated SQL execution time per BT = 0.5 sec
• Miscellaneous = 0.1 sec

Total 40.5 sec

Clearly the only bottleneck is the latching.

What is causing so much latching? As discussed earlier, SQL parses cause latching.


124124

            Snap Id     Snap Time      Sessions Curs/Sess Comment
            ------- ------------------ -------- --------- --------------
Begin Snap:       2 31-Oct-05 18:36:38    4,009     118.7
  End Snap:       3 31-Oct-05 19:06:39    4,006     120.7
   Elapsed:               30.02 (mins)

Load Profile
~~~~~~~~~~~~                        Per Second    Per Transaction
                               ---------------    ---------------
              Redo size:          1,537,284.62           9,748.08
          Logical reads:            539,210.76           3,419.19
          Block changes:              8,615.05              54.63
         Physical reads:              9,570.96              60.69
        Physical writes:                982.97               6.23
             User calls:             22,725.02             144.10
                 Parses:             14,450.63              91.63
            Hard parses:                  0.00               0.00
                  Sorts:              3,699.62              23.46
                 Logons:                  0.00               0.00
               Executes:             38,242.79             242.50
           Transactions:                157.70

Statspack Load Profile

We only have soft parses


125125

Statspack Load Profile

It is common to have 100 to 1000 parses/sec in large systems. Here we have > 10,000/sec


126

Database Tuning that did not Help !

• Increasing spin count for acquiring latches

• Increasing session_cached_cursors

• Forcing cursor sharing for all SQLs

• Setting cursor_space_for_time

• Increasing SGA size (shared pool, buffer cache)

• Creating buffer pools


127127

Parsing Analysis

SQL ordered by Parse Calls for DB: RCC Instance: RCC

                           % Total
 Parse Calls   Executions   Parses  Hash Value
------------ ------------ -------- ----------
  14,285,993   14,286,265    54.89 2588670467
Module: f90runm@rp84201 (TNS V1-V3)
declare p varchar2(32767); begin p := GF_GLOBAL_POLICY(:sn, :on); :v1 := substr(p,1,4000); :v2 := substr(p,4001,4000); :v3 := substr(p,8001,4000); :v4 := substr(p,12001,4000); :v5 := substr(p,16001,4000); :v6 := substr(p,20001,4000);

   2,939,514    2,939,618    11.29 2294365478
Module: f90runm@rp84201 (TNS V1-V3)
declare p varchar2(32767); begin p := GF_MASTER_POLICY(:sn, :on); :v1 := substr(p,1,4000); :v2 := substr(p,4001,4000); :v3 := substr(p,8001,4000); :v4 := substr(p,12001,4000); :v5 := substr(p,16001,4000); :v6 := substr(p,20001,4000);

Extract of Oracle Statspack report for 4000 user test

Most of the parse calls are for two functions : GF_GLOBAL_POLICY & GF_MASTER_POLICY


128

Function Call Analysis

GF_GLOBAL_POLICY:
call     count       cpu    elapsed
------- ------  -------- ----------
Parse      207      0.01       0.02
Execute    207      0.03       0.04
Fetch        0      0.00       0.00
------- ------  -------- ----------
total      414      0.04       0.07

GF_MASTER_POLICY:
call     count       cpu    elapsed
------- ------  -------- ----------
Parse       40      0.01       0.00
Execute     40      0.01       0.01
Fetch        0      0.00       0.00
------- ------  -------- ----------
total       80      0.02       0.02

Extract of Oracle TKPROF report of test

Per Business Transaction these two functions are called 207+40 = 247 times and have a total parse CPU time of 0.01 + 0.01 = 0.02 sec

Note: Our Performance Model had estimated bottleneck demand Dmax = 20ms = 0.02 sec!!


129

SECTION 8: THE LAST NAIL IN THE COFFIN


130

The Story Thus Far

• RFP for System Integration of Centralized Application

• Benchmark of application is mandatory

• Think Time specified as 3 sec

• Realistic estimate proposed as 105 sec

• Technical Committee revises think time to 39 sec

• Last week of benchmark shows acceptable performance for 2000 users, but complete disaster for 4000 users

• The application cannot be touched, all attempts at configuration tuning and capacity management are leading to a dead end

• Performance model validates that problem will occur at 4000 users

• Bottleneck identified to be excessive parsing

1 day left for auditor to arrive. What to do next?


131

Recall Bottleneck

GF_GLOBAL_POLICY:
call     count       cpu    elapsed
------- ------  -------- ----------
Parse      207      0.01       0.02
Execute    207      0.03       0.04
Fetch        0      0.00       0.00
------- ------  -------- ----------
total      414      0.04       0.07

GF_MASTER_POLICY:
call     count       cpu    elapsed
------- ------  -------- ----------
Parse       40      0.01       0.00
Execute     40      0.01       0.01
Fetch        0      0.00       0.00
------- ------  -------- ----------
total       80      0.02       0.02

Per Business Transaction these two functions are called 207+40 = 247 times and have a total parse CPU time of 0.01 + 0.01 = 0.02 sec

Note: Our Performance Model had estimated bottleneck demand Dmax = 20ms = 0.02 sec!!

These two functions are invoked because of Oracle VPD


132

Oracle VPD

• Virtual Private Database to enforce security to a fine level of granularity

• For example a Regional Officer must see only his own region

• For a database with 52 million people, SELECT COUNT(*) FROM PEOPLE will return 52 million in general

• But suppose Regional Officer A is allowed to see only his 10,000 people, then when he fires the query he should see only 10,000

• VPD solves this problem by having the administrator specify policies which will append to the WHERE clause

• Thus when Regional Officer A fires the query:
SELECT COUNT(*) FROM PEOPLE

the policy function appends to it, to make it effectively:
SELECT COUNT(*) FROM PEOPLE WHERE REGIONAL_OFFICER='A';
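
The idea of a policy function that appends a predicate can be imitated outside Oracle. The sketch below uses Python's built-in sqlite3 module purely as a stand-in; it is not Oracle's VPD mechanism, and the function names, table contents and column names are hypothetical.

```python
import sqlite3


def regional_policy(user):
    """Hypothetical policy function: returns a predicate and its bind values."""
    return "regional_officer = ?", (user,)


def count_people(conn, user):
    """Run the user's query with the policy predicate appended to it."""
    predicate, binds = regional_policy(user)
    sql = f"SELECT COUNT(*) FROM people WHERE {predicate}"
    return conn.execute(sql, binds).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, regional_officer TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("p1", "A"), ("p2", "A"), ("p3", "B")],
)

total = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
visible_to_a = count_people(conn, "A")
print("all rows:", total, "rows visible to officer A:", visible_to_a)
```

The point relevant to this benchmark is that the policy function runs as part of processing each protected statement, so it adds to the visit count of the parsing path.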


133

The Final Countdown

• Excessive latching was being caused by VPD

• The application could not be touched and hence visit counts could not be reduced

• Recall: Demand = Visit Count x Service Time

• To speed up parsing we needed a faster CPU, but we were already using the best in class


134

The Final Countdown

Theorem: If you cannot do it, prove that others cannot do it

Proof:

• R = N Dmax – Z = 4000 Dmax – 39

• Acceptable R ≤ 6.25, therefore acceptable Dmax ≤ (R+Z)/4000 = (6.25+39)/4000 = 11.31ms

• For Dmax to reduce from 20ms to 11.31ms the CPU should be 45% faster

• At that point in time the competitor’s CPU was 25% faster

• This gave us relief that others would also have failed in this benchmark
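
The required bottleneck demand follows from rearranging R = N Dmax – Z. The sketch below reproduces the 11.31ms figure and shows the size of the gap in two ways; the function name is illustrative. Our reading is that the slide's 45% corresponds roughly to the required cut in demand (about 43%); expressed as a raw speed ratio the requirement is larger still, which only strengthens the conclusion.

```python
def required_d_max(n_users, target_response_sec, think_sec):
    """Largest bottleneck demand that still meets the response time target."""
    return (target_response_sec + think_sec) / n_users


current_d_max = 0.020                      # seconds, from the model
needed = required_d_max(4000, 6.25, 39)    # seconds

reduction = 1 - needed / current_d_max     # fraction by which demand must drop
speed_ratio = current_d_max / needed       # how many times faster the resource must be

print(f"required Dmax: {needed * 1000:.2f} ms")
print(f"demand must drop by {reduction:.1%}")
print(f"equivalent speed ratio: {speed_ratio:.2f}x")
```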


135

The Final Countdown

• Last Day Options Available

• Option 1: Do nothing, and tell the customer that they will have to wait for 2 years for CPUs to get as fast as their application needs

• Option 2: Run benchmark with realistic think time and force auditor to include it in the audit report

We opted for Option 2


136

The Final Countdown

Recall, we had proposed Z=105 in response to which the customer increased Z from 3 to 39 sec

Now, even 39 sec was proving to be a bottleneck

We decided to benchmark with Z=93 sec instead of 39 sec


137

The Final Countdown

Test Type                        1000/1 million  2000/1 million  4000/1 million  4000/½ million
Think time                       Z = 39 sec      Z = 39 sec      Z = 39 sec      Z = 93 sec
Returns processed                1 M             1 M             1 M             ½ M

Throughput
Business TPS                     25              48              47              38
Completion Time (Hrs)            11:09           05:43           06:03           03:36

Response Times, sec (95th percentile)
User Login                       0.5             0.5             2               0.5
Main Screen                      0.4             0.4             2               0.4
Compute                          0.1             0.3             17              0.2
Refund Details                   0.2             0.3             5               0.2
Print Result                     0.3             0.4             29              0.4
User Exit                        1.2             1.1             1               1.0

Utilization
Web CPU %                        ~15%            ~10%            ~5%             ~5%
Apps CPU %                       ~40%            ~45%            ~45%            ~35%
DB CPU %                         ~45%            ~50%            ~50%            ~45%


138138

The End

Net Result: RFP was scrapped and a new RFP released with think time = 2 minutes


139139

Summary of this Session

Elementary Performance Modelling:

• Little’s Law: N = X (R+Z) for any work conserving system

• Bottleneck Law: X ≤ 1/Dmax

Application Centralization RFP:

• Proving infeasibility of Z=3 well before the benchmark

• Revision to Z=39 and subsequent failure at 4000 users

• Use of simple modelling to analyze bottleneck and prove that nothing needed to be done

• Use of simple modelling to arrive at realistic think time estimate and benchmark run to prove feasibility of Centralization despite stringent security checks in application

• New RFP which led to successful implementation of Centralized Application

