Automated Discovery of Performance Regressions in Enterprise Applications
King Chun (Derek) Foo
Supervisors: Dr. Jenny Zou and Dr. Ahmed E. Hassan
Department of Electrical and Computer Engineering
Performance Regression
• Software changes over time
  – Bug fixes
  – Feature enhancements
  – Execution environments
• Performance regressions describe situations where performance degrades compared to previous releases
Example of Performance Regression
[Diagram: Load Generator → Application Server → Data Store, before and after applying Service Pack 1 (SP 1) to the data store]
SP 1 introduces a new default policy to throttle the "# of RPC/min"
• Significant increase in job queue size and response time
• CPU utilization decreases
• Certification of a 3rd-party component
Current Practice of Performance Verification
[Diagram: development lifecycle: Requirements → Design → Implementation → Verification → Maintenance]
Four Challenges in the Current Practice of Performance Verification
1. Too Late in the Development Lifecycle
• Design changes are not evaluated until after code is written
– Verification happens at the last stage of an already-delayed schedule
2. Lots of Data
• Industrial case studies have over 2,000 counters
• Time consuming to analyze
• Hard to compare more than 2 tests at once
3. No Documented Behavior
• Analysts have different perceptions of performance regressions
• Analysis may be influenced by
  – Analyst’s knowledge
  – Deadline
4. Heterogeneous Environments
• Multiple labs to parallelize test executions
  – Hardware and software may differ
  – Tests from one lab may not be used to analyze tests from another lab
Categorize Each Challenge
[Diagram: mapping of the four challenges to levels: challenge 1 at the design level; challenges 2–4 at the implementation level]
Performance Verification at the Design Level
Evaluate Design Changes through Performance Modeling
• Analytical models are often not suitable for all stakeholders
  – Abstract mathematical and statistical concepts
• Simulation models can be implemented with the support of existing frameworks
  – Visualization
  – No systematic approach to construct models that can be used by different stakeholders
Layered Simulation Model
[Diagram: the three layers of the simulation model and the concerns each addresses]
• World view layer: Can the current infrastructure support the projected growth of users?
• Component layer: Investigate the threading model
• Physical layer: Hardware resource utilization
Case Studies
• We conducted two case studies
  – RSS Cloud
    • Show the process of constructing the model
    • Derive the bottleneck of the application
  – Performance monitor for ULS systems
    • Evaluate whether or not an organization should re-architect the software
• Our model can be used to extract important information and aid in decision making
Performance Verification at the Implementation Level
Challenges with Analyzing Performance Tests
• Lots of data
  – Industrial case studies have over 2,000 counters
  – Time consuming to analyze
  – Hard to compare more than 2 tests at once
• No documented behavior
  – Analyst’s subjectivity
Performance Signatures
Intuition: Counter correlations are the same across tests
[Diagram: performance signatures mined from the repository of prior tests]
Arrival Rate: Medium
CPU Utilization: Medium
Throughput: Medium
RAM Utilization: Medium
Job Queue Size: Low
…
Approach Overview
Case Studies
• 2 Open Source Applications
  – Dell DVD Store and JPetStore
  – Manually injected bugs to simulate performance regressions
• Enterprise Application
  – Compare counters flagged by our technique against analyst’s reports
Case Studies Result
• Open source applications:
  – Precision: 75% - 100%
  – Recall: 52% - 67%
• Enterprise application:
  – Precision: 93%
  – Recall: 100% (relative to the organization’s report)
  – Discovered new regressions that were not included in the analysis reports
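For reference, precision and recall above follow their standard definitions over the set of counters flagged by the approach (a reminder, not an addition from the thesis):

```latex
\text{Precision} = \frac{\#\,\text{flagged counters that are true regressions}}{\#\,\text{flagged counters}},
\qquad
\text{Recall} = \frac{\#\,\text{true regressions that are flagged}}{\#\,\text{true regressions}}
```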
Analyzing Performance Tests Conducted in Heterogeneous Environments
Heterogeneous Environments
• Different hardware and software configurations
• Performance tests conducted in different labs exhibit different behaviors
• Must distinguish performance regressions from performance differences caused by heterogeneous environments
Ensemble-based Approach
• Build a collection of models from the repository
  – Each model specializes in detecting performance regressions in a specific environment
• Reduces the risk of following a single model, which may contain conflicting behaviors
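As a rough illustration of how per-environment models could be combined, the sketch below shows a bagging-style majority vote over the counters each model flags; the rule-set and counter names echo Tables 6-1 and 6-2, but the combination logic is a simplified assumption, not the exact algorithm from the thesis.

```python
from collections import Counter

# Hypothetical per-environment rule sets (models) and the counters each one
# flags as violations in a new test; illustrative values, not thesis data.
flagged_by_model = {
    "R1": {"CPU utilization", "throughput"},
    "R2": {"memory utilization", "throughput"},
    "R3": {"memory utilization", "throughput"},
    "R4": {"database transactions/second"},
}

def bagging_vote(flagged_by_model, min_votes=2):
    """Bagging-style combiner: keep counters flagged by at least min_votes models."""
    votes = Counter()
    for flagged in flagged_by_model.values():
        votes.update(flagged)
    return sorted(counter for counter, n in votes.items() if n >= min_votes)

print(bagging_vote(flagged_by_model))  # ['memory utilization', 'throughput']
```

A stacking-style combiner would instead train a meta-model on the individual models' outputs; in the case study results below, stacking produced the best precision.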
Case Studies
• 2 Open Source Applications
  – Dell DVD Store and JPetStore
  – Manually injected bugs and varied hardware/software resources
• Enterprise Application
  – Use existing tests conducted in different labs
Case Studies Result
• Original approach:
  – Precision: 80%
  – Recall: 50% (3-level discretization) - 60% (EW)
• Ensemble-based approach:
  – Precision: 80% (Bagging) - 100% (Stacking)
  – Recall: 80%
• Ensemble-based approach with stacking produces the best result in our experiments
Major Contributions
• An approach to build layered simulation models to evaluate design changes early
• An automated approach to detect performance regressions, allowing analysts to analyze large amounts of performance data while limiting subjectivity
• An ensemble-based approach to deal with performance tests conducted in heterogeneous environments, which is common in practice
Conclusion
Publication
K. C. Foo, Z. M. Jiang, B. Adams, A. E. Hassan, Y. Zou, P. Flora, "Mining Performance Regression Testing Repositories for Automated Performance Analysis," Proc. Int’l Conf. on Quality Softw. (QSIC), 2010
Future Work
• Online analysis of performance tests
• Compacting the performance regression report
• Maintaining the training data for our automated analysis approach
• Using performance signatures to build performance models
Figure 2‑1: The "4+1" view model (logical, development, process, physical, and scenarios views)
Figure 2‑2: The process of performance verification (execution of performance regression test → threshold-based analysis of test results → manual analysis of test results → report generation)
QN model: Types of application suitable to be modeled
Open QN: Applications with jobs arriving externally; these jobs eventually depart from the application.
Closed QN: Applications with a fixed number of jobs circulating within the application.
Mixed QN: Applications with jobs that arrive externally and jobs that circulate within the application.
SQN-HQN
SRN: Distributed applications with synchronous communication.
LQN: Distributed applications with synchronous or asynchronous communication.
Table 3‑1: Summary of approaches based on QN models
Figure 3‑1: Open queueing network model
Figure 3‑2: Closed queueing network model
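As a concrete example of the kind of analytic result an open queueing network supports, the single-server M/M/1 queue (Poisson arrivals at rate λ, exponential service at rate μ) has closed-form metrics; this is a standard queueing-theory result, not a formula taken from the thesis:

```latex
\rho = \frac{\lambda}{\mu}, \qquad
\bar{N} = \frac{\rho}{1 - \rho}, \qquad
\bar{R} = \frac{1}{\mu - \lambda} \quad (\text{for } \lambda < \mu)
```

where ρ is the server utilization, N̄ is the mean number of jobs in the system, and R̄ is the mean response time.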
Stakeholder: Performance concerns
End user: Overall system performance for various deployment scenarios
Programmer: Organization and performance of system modules
System Engineer: Hardware resource utilization of the running application
System Integrator: Performance of each high-level component in the application
Table 4‑1: Performance concerns of stakeholders
Stakeholder: Layer in our simulation model (corresponding 4+1 view)
Architects, managers, end users, sales representatives: World view layer (Logical view)
Programmers, system integrators: Component layer (Development view, Process view)
System engineers: Physical layer (Physical view)
All stakeholders: Scenario (Scenario view)
Table 4‑2: Mapping of our simulation models to the 4+1 view model
Figure 4‑1: Example of layered simulation model for an RSS cloud (world view layer, component layer, and physical layer)
Layer: Component → Has connection to
World view layer: Users, blogs → RSS server
World view layer: RSS server → Users, blogs
Component layer: Input queues, output queues → Application logic
Component layer: Application logic → Input queues, output queues, hardware
Component layer: Hardware → Application logic
Physical layer: Hardware allocator → CPU, RAM, disk
Physical layer: CPU, RAM, disk → Hardware allocator
Table 4‑3: Components and connections in Figure 4‑1
Resource: Requirement
CPU: 2 units
RAM: 5 KB
Thread: 1
Processing time: 2 seconds
Table 4‑4: Processing requirement for an RSS notification
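To illustrate how these per-notification requirements feed the layered model, the sketch below performs a back-of-the-envelope utilization check at several arrival rates, in the spirit of Figures 4-2 to 4-4. The thread-pool size and hardware capacities are hypothetical assumptions, not values from the case study.

```python
# Per-notification requirements from Table 4-4.
CPU_PER_REQUEST = 2        # CPU units consumed per notification
RAM_PER_REQUEST = 5        # KB held while a notification is processed
PROCESSING_TIME = 2.0      # seconds per notification

# Assumed (hypothetical) capacities, not taken from the thesis.
THREAD_POOL_SIZE = 10      # component-layer thread pool
CPU_CAPACITY = 100         # CPU units available per second
RAM_CAPACITY = 1024        # KB of RAM available

def utilization(arrival_rate):
    """Offered utilization of each resource at a given arrival rate (requests/s)."""
    in_service = arrival_rate * PROCESSING_TIME          # Little's law: concurrent jobs
    return {
        "threads": in_service / THREAD_POOL_SIZE,
        "cpu": arrival_rate * CPU_PER_REQUEST / CPU_CAPACITY,
        "ram": in_service * RAM_PER_REQUEST / RAM_CAPACITY,
    }

for rate in (1, 2, 4, 5):
    util = utilization(rate)
    bottleneck = max(util, key=util.get)
    print(f"{rate} req/s -> "
          + ", ".join(f"{name}: {value:.0%}" for name, value in util.items())
          + f" (bottleneck: {bottleneck})")
```

Under these assumed capacities the thread pool saturates first, which matches the component-layer question of investigating the threading model.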
Figure 4‑2: Plot of the throughput of the RSS server at various request arrival rates
Figure 4‑3: Plot of the response time of the RSS server at various request arrival rates
Figure 4‑4: Plot of the hardware utilization of the RSS server at various request arrival rates
Figure 4-5: World view layer of the performance monitor for ULS applications
Layer: Performance data collected
World view layer: Response time, transmission cost
Component layer: Thread utilization
Physical layer: CPU and RAM utilization
Table 4‑5: Performance data collected per layer
CPU Util.: Low | OK | High | Very High
Range (%): < 30 | 30 – 60 | 60 – 75 | > 75
Discretization: 0.25 | 0.5 | 0.75 | 1
Table 4‑6: Categorization of CPU utilization
RAM Util.: Low | OK | High | Very High
Range (%): < 25 | 25 – 50 | 50 – 60 | > 60
Discretization: 0.25 | 0.5 | 0.75 | 1
Table 4‑7: Categorization of RAM utilization
Data collection freq. (Hz) | Layer | Data broadcast period (s) | Response time (s) | Cost ($) | Central monitor thread util. (%) | Central monitor CPU util. (%) | Central monitor RAM util. (%)
0.1 | World view | 1 | 6.8 | 5.0 | 1.6 | 15.6 | 6.1
0.1 | Component | 1 | 6.8 | 5.0 | 1.6 | 15.6 | 6.1
0.1 | Physical | 1 | 6.8 | 5.0 | 1.6 | 15.6 | 6.1
0.2 | World view | 1 | 7.7 | 5.0 | 4.0 | 40.3 | 15.7
0.2 | Component | 1 | 7.7 | 5.0 | 4.0 | 40.3 | 15.7
0.2 | Physical | 7 | 8.9 | 5.3 | 2.3 | 23.4 | 9.2
0.3 | World view | 1 | 8.9 | 5.0 | 6.4 | 64.4 | 25.3
0.3 | Component | 1 | 8.9 | 5.0 | 6.4 | 64.4 | 25.3
0.3 | Physical | 3 | 9.2 | 5.0 | 5.6 | 56.0 | 21.9
Table 4‑8: Simulation result for the performance monitor case study
(a) Overview of problematic regressions: counters with performance regressions (underlined) are annotated with expected counter correlations.
(b) Details of performance regressions: time series plots show the periods where performance regressions are detected; box plots give a quick visual comparison between prior tests and the new test.
Figure 5‑2: Overview of performance regression analysis approach
(a) Original counter data
(b) Counter discretization (shaded area corresponds to the medium discretization level)
Figure 5‑3: Counter normalization and discretization
For each counter,
High = All values above the medium level
Medium = Median +/- 1 standard deviation
Low = All values below the medium level
Figure 5‑4: Definition of counter discretization levels
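A minimal sketch of this discretization rule in Python; the sample values are made up, and the counter normalization step mentioned in Figure 5-3 is omitted here:

```python
import statistics

def discretize_counter(values):
    """Discretize a counter's observations into low/medium/high levels following
    Figure 5-4: the medium band is the median +/- one standard deviation;
    values above the band are high, values below it are low."""
    median = statistics.median(values)
    stdev = statistics.stdev(values)
    low_cut, high_cut = median - stdev, median + stdev

    def level(v):
        if v > high_cut:
            return "high"
        if v < low_cut:
            return "low"
        return "medium"

    return [level(v) for v in values]

# Example with made-up CPU utilization samples (%)
print(discretize_counter([40, 42, 41, 43, 39, 70, 44, 41]))
```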
Figure 5‑5: Example of an association rule
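Since the figure itself is not reproduced here, the following is an illustrative association rule over discretized counter levels together with a violation check; the counter names and levels are hypothetical examples in the spirit of the signature shown earlier, not the actual rule from the figure:

```python
# A performance signature expressed as an association rule over discretized
# counter levels: if the premise holds in a test period, the consequent is
# expected to hold as well. Counter names and levels are illustrative.
rule = {
    "premise": {"arrival_rate": "medium", "cpu_utilization": "medium"},
    "consequent": {"throughput": "medium"},
}

def violates(rule, observation):
    """Return True if the observation matches the premise but not the consequent."""
    premise_holds = all(observation.get(c) == lvl for c, lvl in rule["premise"].items())
    consequent_holds = all(observation.get(c) == lvl for c, lvl in rule["consequent"].items())
    return premise_holds and not consequent_holds

new_test_period = {"arrival_rate": "medium", "cpu_utilization": "medium", "throughput": "low"}
print(violates(rule, new_test_period))  # True -> flag throughput as a possible regression
```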
Application | # of test scenarios | Duration per test (hours) | Average precision | Average recall
DS2 | 4 | 1 | 100% | 52%
JPetStore | 2 | 0.5 | 75% | 67%
Enterprise Application | 13 | 8 | 93% | 100% (relative to organization’s original analysis)
Table 5‑1: Average precision and recall
Load generator
% Processor Time
# Orders/minute
# Network Bytes Sent/second
# Network Bytes Received/Second
Tomcat
% Processor Time
# Threads
# Virtual Bytes
# Private Bytes
MySQL
% Processor Time
# Private Bytes
# Bytes written to disk/second
# Context Switches/second
# Page Reads/second
# Page Writes/second
% Committed Bytes In Use
# Disk Reads/second
# Disk Writes/second
# I/O Reads Bytes/second
# I/O Writes Bytes/second
Table 5‑2: Summary of counters collected for DS2
Figure 5‑6: Performance Regression Report for DS2 test D_4 (Increased Load)
Test E_1
  Report submitted by the performance analyst: No performance problem found.
  Our findings: Our approach identified abnormal behaviors in system arrival rate and throughput counters.
Test E_2
  Report submitted by the performance analyst: Arrival rates from two load generators differ significantly. Abnormally high database transaction rate. High spikes in job queue.
  Our findings: Our approach flagged the same counters as the performance analyst’s analysis with one false positive.
Test E_3
  Report submitted by the performance analyst: Slight elevation of # database transactions/second.
  Our findings: No counter flagged.
Table 5‑4: Summary of analysis for the enterprise application
Model: Counters flagged as violations
R1: CPU utilization, throughput
R2: Memory utilization, throughput
R3: Memory utilization, throughput
R4: Database transactions/second
Table 6‑1: Counters flagged in T5 by multiple rule sets
Counter flagged as violation: # of times flagged
Throughput: 3
Memory utilization: 2
CPU utilization: 1
# Database transactions/second: 1
Table 6‑2: Count of counters flagged as violations by individual rule set
Configuration | Performance testing repository: T1 | Performance testing repository: T2 | New test: T5
CPU | 2 GHz, 2 cores | 2 GHz, 2 cores | 2 GHz, 2 cores
Memory | 2 GB | 1 GB | 2 GB
Database version | 1 | 2 | 1
OS architecture | 32 bit | 64 bit | 64 bit
Table 6‑3: Test configurations
Table 6‑4: Summary of performance of our approaches. P represents precision, R represents recall, and F represents F-measure (values are rounded up to 1 significant digit).