Model Fitting
Jean-Yves Le Boudec
Contents
1. What is model fitting ?
2. Linear Regression
3. Linear regression with L1 norm minimization
4. Choosing a distribution
5. Heavy Tail
Virus Infection Data
We would like to capture the growth of infected hosts (explanatory model)
An exponential model seems appropriate: the number of infected hosts grows as $e^{\alpha t}$
How can we fit the model, in particular, what is the value of the growth rate $\alpha$ ?
Least Square Fit of Virus Infection Data
Least square fit: growth rate $\alpha = 0.5173$
Mean doubling time 1.34 hours
Prediction at +6 hours: 100 000 hosts
Least Square Fit of Virus Infection Data In Log Scale
Least square fit: growth rate $\alpha = 0.39$
Mean doubling time 1.77 hours
Prediction at +6 hours: 39 000 hosts
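The log-scale fit can be sketched as an ordinary straight-line least-squares fit of log(count) against time. The block below is a minimal sketch under stated assumptions: the model is count = a * exp(alpha * t), the function names are hypothetical, and the data are synthetic and noise free, not the virus data set of the slides. The doubling time is log(2) / alpha, which matches the values quoted above up to rounding.

```python
import math

def fit_exponential_log_scale(times, counts):
    """Least-squares fit of log(count) = log(a) + alpha * t; returns (a, alpha)."""
    logs = [math.log(y) for y in counts]
    n = len(times)
    t_mean = sum(times) / n
    l_mean = sum(logs) / n
    s_tt = sum((t - t_mean) ** 2 for t in times)
    s_tl = sum((t - t_mean) * (l - l_mean) for t, l in zip(times, logs))
    alpha = s_tl / s_tt                      # slope of the fitted line
    a = math.exp(l_mean - alpha * t_mean)    # intercept, back in natural scale
    return a, alpha

def doubling_time(alpha):
    """Time needed for exp(alpha * t) to double."""
    return math.log(2) / alpha

# Hypothetical noise-free data (NOT the data set shown in the slides).
times = [0, 1, 2, 3, 4, 5]
counts = [10 * math.exp(0.5 * t) for t in times]
a_hat, alpha_hat = fit_exponential_log_scale(times, counts)
print(a_hat, alpha_hat)
print(doubling_time(0.5173), doubling_time(0.39))
```

A fit in natural scale would instead minimize the squared differences between the counts and a * exp(alpha * t), which needs a numerical optimizer and is not shown here.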
Compare the Two
[Figure: LS fit in natural scale and LS fit in log scale]
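Which Fitting Method should I use ?
Which optimization criterion should I use ?
The answer is in a statistical model. Model not only the interesting part, but also the noise
For example:
Least squares in natural scale (growth rate 0.5173) corresponds to additive iid Gaussian noise on the number of infected hosts
Least squares in log scale (growth rate 0.39) corresponds to additive iid Gaussian noise on the logarithm of the number of infected hosts
How can I tell which is correct ?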
Look at Residuals = validate model
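Least Square Fit = Gaussian iid Noise
Assume the model: response = deterministic function of the parameter + iid Gaussian noise with the same variance for every data point (homoscedasticity)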
The theorem says: minimize least squares = compute MLE for this model
This is how we computed the estimates for the virus example
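A short derivation sketch of why the two coincide. The notation is an assumption, not taken from the slides: $y_i$ are the data, $f_i(\beta)$ is the noise-free response for parameter $\beta$, and $\sigma^2$ is the common noise variance.

```latex
% Assumed model: y_i = f_i(\beta) + \epsilon_i, with \epsilon_i iid N(0, \sigma^2), i = 1, ..., I
% Log-likelihood of the data:
\ell(\beta, \sigma) = -\frac{I}{2} \log(2 \pi \sigma^2)
                      - \frac{1}{2 \sigma^2} \sum_{i=1}^{I} \bigl( y_i - f_i(\beta) \bigr)^2
% For any fixed \sigma, maximizing \ell over \beta is the same as minimizing
\sum_{i=1}^{I} \bigl( y_i - f_i(\beta) \bigr)^2
% and, at the optimum \hat{\beta}, the MLE of the variance is
\hat{\sigma}^2 = \frac{1}{I} \sum_{i=1}^{I} \bigl( y_i - f_i(\hat{\beta}) \bigr)^2
```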
Least Square and Projection
Write down what the data point, predicted response and estimated parameter are for the virus example
[Figure labels: data point; predicted response; estimated parameter]
Manifold: where the data point would lie if there were no noise
Confidence Intervals
Robustness to « Outliers »
A Simple Example
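Least Square
Model: $Y_i = m + \epsilon_i$, with iid Gaussian noise $\epsilon_i$
What is m ?
Confidence interval ?
L1 Norm Minimization
Model: $Y_i = m + \epsilon_i$, with iid Laplace noise $\epsilon_i$
What is m ?
Confidence interval ?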
Mean Versus Median
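For a model with a single constant m plus noise, the least-squares estimate of m is the sample mean and the L1 estimate is a sample median. A minimal sketch of how differently the two react to one outlier; the data and function names are hypothetical.

```python
import statistics

def least_squares_estimate(ys):
    """The m minimizing sum (y - m)^2 is the sample mean."""
    return statistics.mean(ys)

def l1_estimate(ys):
    """A value m minimizing sum |y - m| is the sample median."""
    return statistics.median(ys)

# Hypothetical data: nine values near 10, then the same data plus one gross outlier.
clean = [9.8, 10.1, 9.9, 10.2, 10.0, 9.7, 10.3, 10.0, 9.9]
with_outlier = clean + [1000.0]

print(least_squares_estimate(clean), l1_estimate(clean))
print(least_squares_estimate(with_outlier), l1_estimate(with_outlier))
```

The mean is dragged far from 10 by the single outlier, whereas the median stays where it was.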
2. Linear Regression
Also called « ANOVA » (Analysis of Variance)
= least square + linear dependence on parameter
A special case where computations are easy
Example 4.3
What is the parameter ?
Is it a linear model ?
How many degrees of freedom ?
What do we assume on $\epsilon_i$ ?
What is the matrix X ?
Does this model have full rank ?
Some Terminology
$x_i$ are called explanatory variables
Assumed fixed and known
$y_i$ are called response variables
They are « the data »
Assumed to be one sample output of the model
Least Square and Projection
[Figure labels: data point; predicted response; estimated parameter]
Manifold: where the data point would lie if there were no noise
Solution of the Linear Regression Model
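The least-squares solution of a full-rank linear model can be sketched with the normal equations. This is a sketch under stated assumptions, not the theorem of the slides verbatim: K is taken to be (X^T X)^{-1} X^T and H to be X K, so that the estimated parameter is K y and the predicted response is H y. The helper functions are hypothetical and written in pure Python for small matrices only. The synthetic example is quadratic in x, which is still a linear model because it is linear in the parameter.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def inverse(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X does not have full rank")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def linear_regression(X, y):
    """Return (estimated parameter, predicted response, residuals, H)."""
    Xt = transpose(X)
    K = matmul(inverse(matmul(Xt, X)), Xt)   # assumed K = (X^T X)^{-1} X^T
    H = matmul(X, K)                         # assumed H = X K (projection matrix)
    ycol = [[v] for v in y]
    beta = [r[0] for r in matmul(K, ycol)]
    predicted = [r[0] for r in matmul(H, ycol)]
    residuals = [yi - pi for yi, pi in zip(y, predicted)]
    return beta, predicted, residuals, H

# Hypothetical noise-free data: y = 1 + 2 x + 0.5 x^2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
X = [[1.0, x, x * x] for x in xs]
y = [1.0 + 2.0 * x + 0.5 * x * x for x in xs]
beta, predicted, residuals, H = linear_regression(X, y)
print(beta)
```

The trace of H equals the number of parameters (here 3), which is a quick sanity check on the projection.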
Least Square and Projection
The theorem gives H and K
[Figure labels: data; residuals; predicted response; estimated parameter]
Manifold: where the data point would lie if there were no noise
The Theorem Gives the Estimated Parameter with Confidence Interval
SSR
Confidence Intervals use the quantity s
$s^2$ is called « Sum of Squared Residuals »
[Figure labels: data; residuals; predicted response]
Validate the Assumptions with Residuals
Residuals
Residuals are given by the theorem
[Figure labels: data; residuals; predicted response]
Standardized Residuals
The residuals $e_i$ are an estimate of the noise terms $\epsilon_i$
They are not (exactly) normal iid
The variance of $e_i$ is ????
A: $\sigma^2 (1 - H_{i,i})$
Standardized residuals are not exactly normal iid either but their variance is 1
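A minimal sketch of standardized residuals for the special case of a straight-line fit y = b0 + b1 x, where the diagonal of H has the closed form 1/n + (x_i - mean(x))^2 / sum_j (x_j - mean(x))^2. Assumptions: the function name and data are hypothetical, and the unknown noise standard deviation is replaced by an estimate, so the variance of the result is only approximately 1.

```python
import math

def standardized_residuals(xs, ys):
    """Fit y = b0 + b1 x by least squares; return (e_i / (sigma_hat * sqrt(1 - H_ii)), H_ii)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / s_xx
    b0 = y_mean - b1 * x_mean
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    # Diagonal of the projection matrix H for a straight-line fit.
    leverages = [1.0 / n + (x - x_mean) ** 2 / s_xx for x in xs]
    # Estimate of sigma: sum of squared residuals divided by (n - 2 parameters).
    sigma_hat = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    std = [e / (sigma_hat * math.sqrt(1.0 - h)) for e, h in zip(residuals, leverages)]
    return std, leverages

# Hypothetical data, roughly on the line y = x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0]
std, leverages = standardized_residuals(xs, ys)
print(std)
print(leverages)
```

Points far from the mean of x have a larger H_ii, so their raw residuals have a smaller variance; dividing by sqrt(1 - H_ii) puts all residuals on the same scale before they are plotted.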
Which of these two models could be a linear regression model ?
A: both
Linear regression does not mean that $y_i$ is a linear function of $x_i$
Caution: there is a hidden assumption
Noise is iid Gaussian -> homoscedasticity
3. Linear Regression with L1 norm minimization
= L1 norm minimization + linear dependency on parameter
More robust
Less traditional
This is convex programming
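In practice the L1 fit is handed to a convex or linear programming solver. The sketch below swaps in a different, much simpler technique that is only usable for tiny data sets: for a straight line y = a + b x, some optimal L1 line passes through two of the data points, so trying every pair and keeping the cheapest is exact. Function names and data are hypothetical.

```python
from itertools import combinations

def l1_cost(points, a, b):
    """Sum of absolute residuals of the line y = a + b x."""
    return sum(abs(y - (a + b * x)) for x, y in points)

def l1_line_fit(points):
    """Exact L1 fit of y = a + b x by brute force over all lines through two data points."""
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # a vertical line is not of the form y = a + b x
        b = (y2 - y1) / (x2 - x1)
        a = y1 - b * x1
        cost = l1_cost(points, a, b)
        if best is None or cost < best[0]:
            best = (cost, a, b)
    if best is None:
        raise ValueError("need at least two points with distinct x")
    return best[1], best[2]

# Hypothetical data on the line y = 1 + 2 x, with the last point replaced by an outlier.
points = [(float(x), 1.0 + 2.0 * x) for x in range(6)]
points[-1] = (5.0, 100.0)
a_hat, b_hat = l1_line_fit(points)
print(a_hat, b_hat)
```

The fitted line ignores the outlier and goes through the five remaining points, which illustrates the robustness claimed for L1 norm minimization. The cost of this enumeration grows as the cube of the number of points, which is why a solver is preferred beyond toy examples.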
Confidence Intervals
No closed form
Compare to median !
Bootstrap: how ?
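One possible answer, sketched for the simplest L1 estimate (the median): resample the data with replacement many times, recompute the estimate on each resample, and read the confidence interval off the percentiles of the recomputed values. This is the percentile bootstrap, one of several bootstrap variants; the function name, the number of resamples and the data are assumptions. For an L1 regression one would resample the (x_i, y_i) pairs and refit each time.

```python
import random
import statistics

def bootstrap_median_ci(data, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the median."""
    rng = random.Random(seed)
    n = len(data)
    medians = sorted(
        statistics.median(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    tail = (1.0 - level) / 2.0
    lo = medians[int(tail * n_resamples)]
    hi = medians[int((1.0 - tail) * n_resamples) - 1]
    return lo, hi

# Hypothetical data: 101 draws from an exponential distribution with rate 1.
rng = random.Random(1)
data = [rng.expovariate(1.0) for _ in range(101)]
lo, hi = bootstrap_median_ci(data)
print(statistics.median(data), (lo, hi))
```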
4. Choosing a Distribution
Know a catalog of distributions, guess a fit
Shape: kurtosis, skewness, power laws, hazard rate
Fit
Verify the fit visually or with a test (see later)
Distribution Shape
Distributions have a shape
By definition: the shape is what remains the same when we shift or rescale
Example: normal distribution: what is the shape parameter ?
Example: exponential distribution: what is the shape parameter ?
Standard Distributions
In a given catalog of distributions, we give only the distributions with different shapes. For each shape, we pick one particular distribution, which we call standard.
Standard normal: N(0,1)
Standard exponential: Exp(1)
Standard Uniform: U(0,1)
Log-Normal Distribution
Skewness and Kurtosis
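Both indices can be estimated from a sample with central moments. A minimal sketch, assuming the usual definitions: skewness is the third central moment divided by the cube of the standard deviation, and the kurtosis index is the fourth central moment divided by the squared variance, minus 3 (so that it is 0 for a normal distribution). The function name is hypothetical and no small-sample bias correction is applied.

```python
def skewness_and_excess_kurtosis(xs):
    """Moment estimates of skewness and of kurtosis minus 3."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
right_skewed = [1.0, 1.0, 1.0, 10.0]
print(skewness_and_excess_kurtosis(symmetric))
print(skewness_and_excess_kurtosis(right_skewed))
```

Both indices are unchanged by shifting or rescaling the data, which is why they describe the shape of a distribution.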
Power Laws and Pareto Distribution
Complementary Distribution Functions
Log-log Scales
[Figure panels: Pareto; Lognormal; Normal]
Zipf’s Law
Hazard Rate
Interpretation: probability that a flow dies in next dt seconds given still alive
Used to classify distributions:
Aging (hazard rate increasing)
Memoryless (hazard rate constant)
Fat tail (hazard rate decreasing)
Ex: normal ? Exponential ? Pareto ? Log Normal ?
The Weibull Distribution
Standard Weibull CDF: $F(x) = 1 - e^{-x^c}$ for $x \ge 0$
Aging for c > 1
Memoryless for c = 1
Fat tailed for c < 1
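The three regimes can be read off the hazard rate. A minimal sketch, assuming the standard Weibull CDF is F(x) = 1 - exp(-x^c): the hazard rate f(x) / (1 - F(x)) then works out to c * x^(c - 1), which increases with x for c > 1, is constant for c = 1 and decreases for c < 1. Function names are hypothetical.

```python
import math

def weibull_cdf(x, c):
    """Assumed standard Weibull CDF, for x >= 0."""
    return 1.0 - math.exp(-(x ** c))

def weibull_hazard(x, c):
    """Hazard rate f(x) / (1 - F(x)) = c * x^(c - 1), for x > 0."""
    return c * x ** (c - 1.0)

for c in (0.5, 1.0, 2.0):
    print(c, [round(weibull_hazard(x, c), 3) for x in (0.5, 1.0, 2.0)])
```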
Fitting A Distribution
Assume iid
Use maximum likelihood
Ex: assume Gaussian; what are the parameters ?
Frequent issues
Censoring
Combinations
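For the Gaussian example, the maximum likelihood estimates have a closed form: the sample mean, and the average squared deviation from it (dividing by n, not n - 1). A minimal sketch with hypothetical function names and data; the log-likelihood is included only to check that the closed form is indeed a maximum.

```python
import math

def gaussian_mle(xs):
    """Maximum likelihood estimates (mean, variance) for iid Gaussian data."""
    n = len(xs)
    mu = sum(xs) / n
    sigma2 = sum((x - mu) ** 2 for x in xs) / n   # divides by n, not n - 1
    return mu, sigma2

def gaussian_log_likelihood(xs, mu, sigma2):
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma2))

data = [2.1, 1.9, 2.4, 2.0, 1.6]   # hypothetical sample
mu_hat, sigma2_hat = gaussian_mle(data)
print(mu_hat, sigma2_hat)
print(gaussian_log_likelihood(data, mu_hat, sigma2_hat))
```

For distributions without a closed form, the same log-likelihood would be maximized numerically.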
Censored Data
We want to fit a log-normal distribution, but we have only data samples with values less than some maximum
Lognormal is fat tailed so we cannot ignore the tail
Idea: use the model of a distribution $F_0$ truncated at $a$: $F(x) = F_0(x) / F_0(a)$ for $x \le a$
and estimate $F_0$ and $a$ (truncation threshold)
Combinations
We want to fit a log-normal distribution to the body and a Pareto distribution to the tail
Model:
MLE satisfies
5. Heavy Tails
Recall what fat tail is
Heavier than fat:
Heavy Tail means Central Limit does not hold
Central limit theorem:
a sum of n independent random variables with finite second moment tends to have a normal distribution, when n is large
explains why we can often use normal assumption
But it does not always hold. It does not hold if random variables have infinite second moment.
Central Limit Theorem for Heavy Tails
One Sample of 10000 points
Pareto p = 1
[Figure panels: normal qqplot; histogram; complementary d.f., log-log]
Convergence for heavy tailed distributions
[Figure: one sample of 10000 points and the average of 1000 samples, for p = 1, 1.5, 2, 2.5 and 3]
Importance of Second Moment
RWP with Heavy Tail
Stationary ?
Evidence of Heavy Tail
Testing Heavy Tail
Assume you have a very large data set
Else no statement can be made
One can look at empirical cdf in log scale
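A minimal sketch of this visual check done numerically, on synthetic data: draw a Pareto sample with P(X > x) = x^(-p) for x >= 1, compute the empirical complementary distribution function, and fit a straight line to it in log-log scale. The slope should be close to -p. Function names, the seed and the sample are assumptions, and this crude slope fit is only an illustration of the log-log plot, not a recommended estimator of the tail index.

```python
import math
import random

def pareto_sample(p, n, rng):
    """Inverse-transform sampling of a Pareto law with P(X > x) = x^(-p), x >= 1."""
    return [(1.0 - rng.random()) ** (-1.0 / p) for _ in range(n)]

def loglog_ccdf_slope(xs):
    """Least-squares slope of log P(X >= x) against log x, over the sample points."""
    xs = sorted(xs)
    n = len(xs)
    u = [math.log(x) for x in xs]
    v = [math.log((n - i) / n) for i in range(n)]   # empirical P(X >= x_(i))
    u_mean, v_mean = sum(u) / n, sum(v) / n
    s_uu = sum((a - u_mean) ** 2 for a in u)
    s_uv = sum((a - u_mean) * (b - v_mean) for a, b in zip(u, v))
    return s_uv / s_uu

rng = random.Random(42)
sample = pareto_sample(1.5, 10000, rng)
p_est = -loglog_ccdf_slope(sample)
print(p_est)
```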
Taqqu’s method
A better method (numerically safer) is as follows.
Aggregate data multiple times
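We should have, between the log-log complementary distribution plots at aggregation levels $m_1$ and $m_2$, a horizontal shift $\delta = \frac{1}{p} \log(m_2 / m_1)$
and a vertical shift $\tau = \log(m_2 / m_1)$
If $\tau \approx \log(m_2 / m_1)$ then measure $p = \tau / \delta$
$p_{est}$ = average of all p’s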
Example
[Figure annotations: log(2) / p and log(2)]
Evidence of Heavy Tail
p = 1.08 ± 0.1
A Load Generator: Surge
Designed to create load for a web server
Used in next lab
Sophisticated load model
It is an example of a benchmark; there are many others (see lecture)
User Equivalent Model
Idea: find a stochastic model that represents the user well
User modelled as a sequence of downloads, followed by “think time”
Tool can implement several “user equivalents”
Used to generate real work over TCP connections
Characterization of UE
Weibull distributions
Successive file requests are not independent
Q: What would be the distribution if they were independent ?
A: geometric
Fitting the distributions
Done by Surge authors with aest tool + ad-hoc (least square fit of histogram)
What other method could one use ?
A: maximum likelihood with numerical optimization; the issue is non iid-ness