Date posted: 07-Feb-2017
An Automated Approach for Recommending When to Stop Performance Tests
Hammam AlGhamdi
Weiyi Shang
Mark D. Syer
Ahmed E. Hassan
1
Failures in ultra-large-scale systems are often due to performance issues rather than functional issues
2
A 25-minute service outage in 2013 cost Amazon approximately $1.7M
3
4
Performance testing is essential to prevent these failures
[Diagram: in the performance testing environment, a pre-defined workload sends requests to the system under test, while performance counters, e.g., CPU, memory, I/O and response time, are recorded.]
5
Determining the length of a performance test is challenging
[Timeline: as the test runs, repetitive data is generated; there is an optimal stopping time.]
6
Determining the length of a performance test is challenging
[Timeline: stopping too early misses performance issues; stopping too late delays the release and wastes testing resources. The optimal stopping time lies in between.]
7
Our approach for recommending when to stop a performance test
1) Collect the already-generated data →
2) Measure the likelihood of repetitiveness →
3) Extrapolate the likelihood of repetitiveness (first derivatives) →
4) Determine whether to stop the test: if yes, STOP; if no, return to step 1.
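The four-step loop above can be sketched in code. The following is a minimal, illustrative Python sketch, not the authors' implementation; `stopping_loop`, `measure_likelihood`, and the stabilization threshold are all assumed names and values.

```python
def stopping_loop(batches, measure_likelihood, threshold=0.01):
    """Drive steps 1-4: consume data batches (e.g., one per measurement
    interval), track the likelihood-of-repetitiveness curve, and stop
    once the curve has stabilized."""
    data, curve = [], []
    for batch in batches:
        data.extend(batch)                      # Step 1: collect generated data
        curve.append(measure_likelihood(data))  # Step 2: likelihood of repetitiveness
        # Steps 3-4: stop when the first derivative of the curve is near zero
        if len(curve) >= 2 and abs(curve[-1] - curve[-2]) <= threshold:
            break
    return curve
```

Here `batches` stands for the performance-counter samples arriving during each interval, and `measure_likelihood` is a placeholder for Step 2.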
11
Step 1: Collect the data that the test has generated so far: performance counters, e.g., CPU, memory, I/O and response time.
12
Step 2: Measure the likelihood of repetitiveness. Select a random time period A (e.g., 30 min) from the collected data.
13
Step 2 (cont.): Search the collected data for another, non-overlapping time period B that is NOT statistically significantly different from A.
14
Step 2 (cont.): Apply the Wilcoxon test between the distributions of every performance counter across both periods.
15
Step 2 (cont.): Example outcome of the Wilcoxon tests between periods A and B:

Counter: Response time | CPU | Memory | I/O
p-value: 0.0258 | 0.313 | 0.687 | 0.645

The periods are statistically significantly different in response time, so this pair does not count as repetitive.
16
Step 2 (cont.): Keep searching until a time period that is NOT statistically significantly different from A in ALL performance counters is found:

Counter: Response time | CPU | Memory | I/O
p-value: 0.67 | 0.313 | 0.687 | 0.645
17
Step 2 (cont.): Is there a period that is NOT statistically significantly different from A? Yes: period A is repetitive. No: period A is not repetitive.
18
Step 2 (cont.): Repeat this process a large number of times (e.g., 1,000) to calculate the likelihood of repetitiveness.
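Step 2 can be sketched as follows. This is an illustrative Python sketch rather than the authors' implementation: it hand-rolls the two-sided Wilcoxon rank-sum (Mann-Whitney) test with a normal approximation, and the function names, candidate-search budget (`tries`), and significance level (`alpha`) are assumptions.

```python
import math
import random

def rank_sum_p(a, b):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney) p-value via the normal
    approximation (adequate for periods with roughly 20+ samples)."""
    pooled = sorted(a + b)
    rank = {}
    i = 0
    while i < len(pooled):                      # average 1-based ranks over ties
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + j + 1) / 2
        i = j
    n1, n2 = len(a), len(b)
    u = sum(rank[v] for v in a) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    if sigma == 0:
        return 1.0
    z = (u - mu) / sigma
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def is_repetitive(counters, start, length, alpha=0.05, tries=50):
    """Does some non-overlapping period match A = [start, start+length)
    on ALL performance counters (no counter significantly different)?"""
    n = len(next(iter(counters.values())))
    for _ in range(tries):
        b = random.randrange(0, n - length + 1)
        if b + length > start and b < start + length:
            continue                            # overlaps period A; skip
        if all(rank_sum_p(c[start:start + length], c[b:b + length]) >= alpha
               for c in counters.values()):
            return True
    return False

def likelihood_of_repetitiveness(counters, length, samples=1000):
    """Fraction of randomly chosen periods that have a repetitive match."""
    n = len(next(iter(counters.values())))
    hits = sum(is_repetitive(counters, random.randrange(0, n - length + 1), length)
               for _ in range(samples))
    return hits / samples
```

On a strongly periodic counter the likelihood approaches 100%; on a counter that keeps trending it stays low.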
19
Step 2 (cont.): A new likelihood of repetitiveness is measured periodically (e.g., every 10 min: at 30 min, 40 min, ..., 1 h 10 min, ...) in order to get frequent feedback on the repetitiveness.
20
Step 2 (cont.): The likelihood of repetitiveness eventually starts stabilizing (little new information is generated). [Plot: likelihood of repetitiveness, 1%-100%, over test time, 00:00-24:00, rising and then flattening.]
21
Step 3: Extrapolate the likelihood of repetitiveness. To know when the repetitiveness stabilizes, we calculate the first derivative of the likelihood curve.
22
Step 4: Determine whether to stop the test. Stop the test if the first derivative is close to 0.
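Steps 3 and 4 can be sketched as follows. Note the simplification: the approach extrapolates the likelihood curve before differentiating, while this illustrative sketch takes finite differences of the measured curve directly; the `threshold` and `window` values are assumptions, not values from the paper.

```python
def first_derivative(likelihoods):
    """Differences between consecutive likelihood measurements."""
    return [b - a for a, b in zip(likelihoods, likelihoods[1:])]

def should_stop(likelihoods, threshold=0.01, window=3):
    """Recommend stopping once the likelihood of repetitiveness has
    stabilized: the last `window` derivatives are all close to 0."""
    d = first_derivative(likelihoods)
    if len(d) < window:
        return False
    return all(abs(x) <= threshold for x in d[-window:])
```

A curve that is still climbing keeps the test running; a flat tail triggers the stop recommendation.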
23
Case study: We conduct 24-hour performance tests on three systems: PetClinic, Dell DVD Store, and CloudStore.
25
We evaluate whether our approach (RQ1) stops the test too early or (RQ2) stops the test too late, relative to the optimal stopping time.
26
RQ1: Does our approach stop the test too early? [Timeline 00:00-24:00, split at the STOP point into pre-stopping and post-stopping data.]
1) Select a random time period from the post-stopping data.
2) Check if the random time period has a repetitive counterpart in the pre-stopping data.
Repeat 1,000 times.
Result: the test is likely to generate little new data after the stopping times (preserving more than 91.9% of the information).
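The RQ1 evaluation loop can be sketched as follows. For brevity this illustrative sketch substitutes a simple mean-similarity check for the Wilcoxon-based repetitiveness matching used in the approach; `fraction_preserved`, `tol`, and the sampling budget are assumed names and values.

```python
import random
from statistics import mean

def fraction_preserved(pre, post, length, samples=1000, tol=0.05):
    """Estimate how much of the post-stopping data is already represented
    in the pre-stopping data: sample random post-stopping periods and
    look for a similar pre-stopping period."""
    hits = 0
    for _ in range(samples):
        s = random.randrange(0, len(post) - length + 1)
        target = mean(post[s:s + length])
        # NOTE: mean-similarity stands in for the Wilcoxon-based match.
        if any(abs(mean(pre[b:b + length]) - target) <= tol
               for b in range(0, len(pre) - length + 1)):
            hits += 1
    return hits / samples
```

A high fraction means the post-stopping data would have added little new information, i.e., the test did not stop too early.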
27
RQ2: Does our approach stop the test too late? We apply our RQ1 evaluation approach at the end of every hour during the test (1h, 2h, ..., 10h, ..., 20h, ..., 24h) to find the most cost-effective stopping time: the hour with a big difference to the previous hour and a small difference to the next hour.
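One illustrative way to score "big difference to the previous hour, small difference to the next hour" in code (the exact criterion used in the paper may differ):

```python
def most_cost_effective_hour(info_by_hour):
    """info_by_hour[h]: information preserved when stopping after hour h.
    Score each candidate hour by (gain over the previous hour) minus
    (gain to the next hour); the highest score is the elbow of the curve."""
    best_h, best_score = None, float("-inf")
    for h in range(1, len(info_by_hour) - 1):
        gain_prev = info_by_hour[h] - info_by_hour[h - 1]
        gain_next = info_by_hour[h + 1] - info_by_hour[h]
        score = gain_prev - gain_next
        if score > best_score:
            best_h, best_score = h, score
    return best_h
```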
28
RQ2 (cont.): [Plot: likelihood of repetitiveness, 1%-100%, over test time, with markers at 04:00, 05:00, and 06:00.]
29
RQ2 result: There is a short delay between the recommended stopping times and the most cost-effective stopping times (the majority are under a 4-hour delay).
30
Conclusion: Determining the length of a performance test is challenging: stopping too early misses performance issues; stopping too late delays the release and wastes testing resources.
31
Takeaway: Our approach recommends when to stop a performance test while preserving more than 91.9% of the information, with mostly under a 4-hour delay from the most cost-effective stopping times.