• Philip Tellis
• .com
• [email protected]
• @bluesmoon
• geek paranoid speedfreak
• http://bluesmoon.info/
Boston #WebPerf Meetup / 2012-08-14 The Statistics of Web Performance Analysis 1
I’m a Web Speedfreak
We measure real user website performance
This talk is about the Statistics we learned while building it
The Statistics of Web Performance Analysis
Philip Tellis / [email protected]
Boston #WebPerf Meetup / 2012-08-14
0 Numbers
Accurately measure page performance∗
Be unintrusive
If you try to measure something accurately, you will change something related
– Heisenberg’s uncertainty principle
And one number to rule them all
What do we measure?
• Network Throughput
• Network Latency
• User perceived page load time
We measure real user data
Which is noisy
1 Statistics - 1
Disclaimer
I am not a statistician
1-1 Random Sampling
Population
All possible users of your system
Sample
Representative subset of the population
Bad sample
Sometimes it’s not
How to randomize?
http://xkcd.com/221/
How to randomize?
• Pick 10% of users at random and always test them
OR
• For each user, decide at random if they should be tested
http://tech.bluesmoon.info/2010/01/statistics-of-performance-measurement.html
Select 10% of users - I
if ($sessionid % 10 === 0) {
    // instrument code for measurement
}
• Once a user enters the measurement bucket, they stay there until they log out
• Fixed set of users, so tests may be more consistent
• Error in the sample results in positive feedback
Select 10% of users - II
if (rand() < 0.1 * getrandmax()) {
    // instrument code for measurement
}
• For every request, a user has a 10% chance of being tested
• Gets rid of positive feedback errors, but sample size != 10% of population
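The two strategies can be compared side by side. The following is a minimal Python sketch rather than the PHP used on the slides. The function names are hypothetical, and hashing the session id with CRC32 is an assumption made here so that string ids work with the modulo test.

```python
import random
import zlib
from typing import Optional


def in_fixed_bucket(session_id: str, percent: int = 10) -> bool:
    """Strategy I: a given session is either always measured or never measured."""
    # Assumption: session ids are strings, so map them to an integer first.
    # crc32 is stable across processes, unlike Python's built-in hash().
    bucket = zlib.crc32(session_id.encode("utf-8")) % 100
    return bucket < percent


def in_random_bucket(rate: float = 0.1,
                     rng: Optional[random.Random] = None) -> bool:
    """Strategy II: every request has an independent chance of being measured."""
    source = rng if rng is not None else random
    return source.random() < rate
```

With the first function the same session always gets the same answer. With the second, the share of measured requests only approaches the configured rate as the number of requests grows.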
How big a sample is representative?
Select n such that $\left| 1.96 \, \frac{\sigma}{\sqrt{n}} \right| \le 5\% \, \mu$
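Solving the inequality for n gives n ≥ (1.96 σ / (0.05 µ))². A small Python sketch of that rearrangement follows. The function name and the example figures are hypothetical, and in practice µ and σ would have to be estimated from an initial batch of measurements.

```python
import math


def min_sample_size(mean: float, stddev: float,
                    rel_error: float = 0.05, z: float = 1.96) -> int:
    """Smallest n for which z * stddev / sqrt(n) <= rel_error * mean."""
    if mean <= 0 or stddev < 0:
        raise ValueError("mean must be positive and stddev non-negative")
    n = (z * stddev / (rel_error * mean)) ** 2
    return max(1, math.ceil(n))


# Hypothetical page load data: mean 2000 ms, standard deviation 1500 ms.
print(min_sample_size(2000, 1500))
```

The noisier the data is relative to its mean, the more samples are needed to stay within the same 5% bound.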
1-2 Margin of Error
Standard Deviation
• Standard deviation tells you the spread of the curve
• The narrower the curve, the more confident you can be
MoE at 95% confidence
$\pm 1.96 \, \frac{\sigma}{\sqrt{n}}$
MoE & Sample size
The margin of error is inversely proportional to the square root of the sample size: to halve the margin of error, you need four times as many samples.
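As a rough illustration, the margin of error at 95% confidence could be computed like this in Python. The function name is hypothetical, and using the sample standard deviation (n - 1 in the denominator) as the estimate of σ is an assumption, since the slides only give the formula.

```python
import math
import statistics


def margin_of_error(samples, z=1.96):
    """Half-width of the confidence interval around the arithmetic mean."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    # Assumption: sigma is estimated with the sample standard deviation.
    return z * statistics.stdev(samples) / math.sqrt(n)


load_times = [10, 11, 12, 11, 109]
print(statistics.mean(load_times), "+/-", margin_of_error(load_times))
```

With only five points and one extreme value, the margin of error is larger than the mean itself, which is a sign that the sample is too small or needs filtering.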
1-3 Central Tendency
One number
• Mean (Arithmetic)
  • Good for symmetric curves
  • Affected by outliers
Mean(10, 11, 12, 11, 109) = 30.6
One number
• Median
  • Middle value measures central tendency well
  • Not trivial to pull out of a DB
Median(10, 11, 12, 11, 109) = 11
One number
• Mode
  • Not often used
  • Multi-modal distributions suggest problems
Mode(10, 11, 12, 11, 109) = 11
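The three measures can be checked against the same five values with Python's standard `statistics` module. This is only a sketch for comparison; the variable names are illustrative.

```python
import statistics

load_times = [10, 11, 12, 11, 109]

mean = statistics.mean(load_times)      # pulled upwards by the 109 outlier
median = statistics.median(load_times)  # middle value of the sorted data
mode = statistics.mode(load_times)      # most frequently occurring value

print(mean, median, mode)
```

The median and mode agree here and sit close to the bulk of the data, while the single outlier drags the mean far away from it.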
Other numbers
• A percentile point in the distribution: 95th, 98.5th or 99th
  • Used to find out the worst user experience
  • Makes more sense if you filter data first
P95th(10, 11, 12, 11, 109) = 12
Other means
• Geometric mean
  • Good if your data is exponential in nature (with the tail on the right)
GMean(10, 11, 12, 11, 109) = 17.37
Wait... how did I get that?
$\sqrt[N]{\prod_{i=1}^{N} x_i}$ (could lead to overflow)
$e^{\left( \frac{\sum_{i=1}^{N} \log_e(x_i)}{N} \right)}$ (computationally simpler)
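The log-based form translates directly into code. Here is a minimal Python sketch; the function name is hypothetical and the input checks are an addition of this example.

```python
import math


def geometric_mean(values):
    """Geometric mean computed as exp(mean(log(x))) to avoid overflow."""
    if not values:
        raise ValueError("values must not be empty")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


print(geometric_mean([10, 11, 12, 11, 109]))
```

Summing logarithms keeps the intermediate values small, whereas the running product in the first formula grows with every data point. Python 3.8 and later also ship `statistics.geometric_mean`.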
Other means
And there is also the Harmonic mean, but forget about that
...though consequently
We have other margins of error
• Geometric margin of error
  • Uses geometric standard deviation
• Median margin of error
  • Uses ranges of actual values from data set
• Stick to the arithmetic MoE
  – simpler to calculate, simpler to read and not incorrect
2 Statistics - 2
2-1 Distributions
Let’s look at some real charts
Sparse Distribution
Log-normal distribution
Bimodal distribution
What does all of this mean?
Distributions
• Sparse distribution suggests that you don’t have enough data points
• Log-normal distribution is typical
• Bi-modal distribution suggests two (or more) distributions combined
In practice, a bi-modal distribution is not uncommon
Hint: Does your site do a lot of back-end caching?
2-2 Filtering
Outliers
• Out of range data points
• Nothing you can fix here
• There’s even a book about them
DNS problems can cause outliers
• 2 or 3 DNS servers for an ISP
• 30 second timeout if first fails
• ... 30 second increase in page load time
• Maybe measure both and fix what you can
• http://nms.lcs.mit.edu/papers/dns-ton2002.pdf
Band-pass filtering
• Strip everything outside a reasonable range
  • Bandwidth range: 4kbps - 4Gbps
  • Page load time: 50ms - 120s
• You may need to revisit the ranges regularly
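A band-pass filter over page load times is a one-line comparison against fixed bounds. The Python sketch below uses the 50ms and 120s limits from the slide. The names are hypothetical, and treating the bounds as inclusive is an assumption.

```python
MIN_LOAD_MS = 50        # lower bound from the slide: 50 ms
MAX_LOAD_MS = 120_000   # upper bound from the slide: 120 s


def band_pass(samples, low=MIN_LOAD_MS, high=MAX_LOAD_MS):
    """Keep only the samples that fall inside [low, high] (inclusive)."""
    return [s for s in samples if low <= s <= high]


# Hypothetical page load times in milliseconds.
print(band_pass([3, 50, 800, 120_000, 450_000]))
```

Because the bounds are hard-coded, they encode an opinion about what is plausible and need to be reviewed as the site and its users change.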
IQR filtering
Here, we derive the range from the data
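One common way to derive the range is Tukey's rule: keep everything within 1.5 times the interquartile range of the first and third quartiles. The slides do not state the multiplier or the quartile method, so both are assumptions in this Python sketch, and the function name is hypothetical.

```python
import statistics


def iqr_filter(samples, k=1.5):
    """Drop samples outside [Q1 - k*IQR, Q3 + k*IQR]."""
    if len(samples) < 2:
        return list(samples)
    # Assumption: quartiles use the "inclusive" method, which treats the
    # data as the whole population rather than a sample of a larger one.
    q1, _, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [s for s in samples if low <= s <= high]


print(iqr_filter([10, 11, 12, 11, 109]))
```

Unlike the band-pass filter, nothing here is hard-coded: the cut-offs move with the data, so the same code works for bandwidth, latency and page load time.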
Further Reading
lognormal.com/blog/2012/08/13/analysing-performance-data/
Summary
• Choose a reasonable sample size and sampling factor
• Tune sample size for minimal margin of error
• Decide based on your data whether to use mode, median or one of the means
• Figure out whether your data is Normal, Log-Normal or something else
• Filter out anomalous outliers
Thank you
Photo credits
• http://www.flickr.com/photos/leoffreitas/332360959/ by leoffreitas
• http://www.flickr.com/photos/cobalt/56500295/ by cobalt123
• http://www.flickr.com/photos/sophistechate/4264466015/ by LisaBrewster
List of figures
• http://en.wikipedia.org/wiki/File:Standard_deviation_diagram.svg
• http://en.wikipedia.org/wiki/File:Normal_Distribution_PDF.svg
• http://en.wikipedia.org/wiki/File:KilroySchematic.svg
• http://en.wikipedia.org/wiki/File:Boxplot_vs_PDF.png