
Model Fitting

Date post: 24-Feb-2016
Page 1: Model Fitting

Model Fitting

Jean-Yves Le Boudec


Page 2: Model Fitting

Contents

1. What is model fitting ?

2. Linear Regression

3. Linear regression with norm minimization

4. Choosing a distribution

5. Heavy Tail

Page 3: Model Fitting

Virus Infection Data

We would like to capture the growth of infected hosts (explanatory model)

An exponential model seems appropriate: Y(t) = a exp(αt)

How can we fit the model ? In particular, what is the value of the growth rate α ?

Page 4: Model Fitting

Least Square Fit of Virus Infection Data

Least square fit

Growth rate α = 0.5173

Mean doubling time 1.34 hours

Prediction at +6 hours: 100 000 hosts

Page 5: Model Fitting

Least Square Fit of Virus Infection Data In Log Scale

Least square fit

Growth rate α = 0.39

Mean doubling time 1.77 hours

Prediction at +6 hours: 39 000 hosts
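The two doubling times quoted above are consistent with an exponential growth rate (written α here), for which the doubling time is ln 2 / α. The sketch below is a minimal illustration, assuming a model of the form y = a exp(αt) with noise that is additive on the log scale. The function names and the sample data are hypothetical; this is not the virus data set. The fit is an ordinary least-squares line through the points (t, ln y), which is what a least square fit in log scale amounts to under these assumptions.

```python
import math

def doubling_time(alpha):
    """Time needed for a*exp(alpha*t) to double: ln(2) / alpha."""
    return math.log(2.0) / alpha

def fit_exponential_log_scale(times, counts):
    """Least-squares fit of ln(y) = ln(a) + alpha * t; returns (a, alpha)."""
    n = len(times)
    logs = [math.log(y) for y in counts]
    t_mean = sum(times) / n
    l_mean = sum(logs) / n
    s_tt = sum((t - t_mean) ** 2 for t in times)
    s_tl = sum((t - t_mean) * (l - l_mean) for t, l in zip(times, logs))
    alpha = s_tl / s_tt                       # slope of the fitted line
    a = math.exp(l_mean - alpha * t_mean)     # exp of the intercept
    return a, alpha

# Hypothetical noise-free data generated with a = 10 and alpha = 0.39.
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
counts = [10.0 * math.exp(0.39 * t) for t in times]
a_hat, alpha_hat = fit_exponential_log_scale(times, counts)
print(a_hat, alpha_hat, doubling_time(alpha_hat))

# Doubling times for the two growth rates quoted on the slides.
print(doubling_time(0.5173), doubling_time(0.39))
```

A fit in natural scale would instead minimize the squared errors on y itself, which needs a numerical optimizer and gives more weight to the largest counts.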

Page 6: Model Fitting

Compare the Two


LS fit in natural scale

LS fit in log scale

Page 7: Model Fitting

Which Fitting Method should I use ?

Which optimization criterion should I use ?

The answer is in a statistical model.

Model not only the interesting part, but also the noise

For example

LS fit in natural scale: α = 0.5173

Page 8: Model Fitting

How can I tell which is correct ?

LS fit in log scale: α = 0.39

Page 9: Model Fitting

Look at Residuals = validate model

Page 10: Model Fitting


Page 11: Model Fitting

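Least Square Fit = Gaussian iid Noise

Assume the model Y_i = f_i(β) + ε_i, where the noise terms ε_i are iid Gaussian with a common variance σ² (homoscedasticity)

The theorem says: minimizing least squares = computing the MLE for this model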

This is how we computed the estimates for the virus example
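The equivalence between least squares and maximum likelihood can be checked directly from the log-likelihood. The following is a sketch, assuming the model is written y_i = f_i(β) + ε_i with ε_i iid N(0, σ²); the symbols f_i, β, σ and n (number of data points) are notation chosen here for illustration.

```latex
\begin{align*}
\ell(\beta, \sigma)
  &= -\frac{n}{2}\ln\left(2\pi\sigma^{2}\right)
     - \frac{1}{2\sigma^{2}}\sum_{i=1}^{n}\bigl(y_i - f_i(\beta)\bigr)^{2} \\
\hat{\beta}
  &= \arg\max_{\beta}\,\ell(\beta, \sigma)
   = \arg\min_{\beta}\sum_{i=1}^{n}\bigl(y_i - f_i(\beta)\bigr)^{2}
   \quad\text{for every fixed } \sigma > 0 \\
\hat{\sigma}^{2}
  &= \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - f_i(\hat{\beta})\bigr)^{2}
\end{align*}
```

Only the second term of the log-likelihood depends on β, so the maximum likelihood estimate of β does not depend on σ and coincides with the least-squares estimate.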


Page 12: Model Fitting

Least Square and Projection

Write down what each of these is for the virus example: data point, predicted response and estimated parameter

[Figure: data point, predicted response and estimated parameter. The manifold is where the data point would lie if there were no noise.]

Page 13: Model Fitting

Confidence Intervals


Page 14: Model Fitting


Page 15: Model Fitting

Robustness to « Outliers »


Page 16: Model Fitting

A Simple Example

Least Square

Model: Y_i = m + ε_i, with Gaussian iid noise ε_i

What is m ?

Confidence interval ?

L1 Norm Minimization

Model: Y_i = m + ε_i, with Laplace iid noise ε_i

What is m ?

Confidence interval ?
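For a model with a single location parameter m, the least square estimate of m is the sample mean and the L1 norm estimate is a sample median. The sketch below illustrates this on hypothetical measurements (the data values and function names are invented for the illustration) and shows how one gross outlier moves the mean much more than the median.

```python
import statistics

def sum_squares(m, data):
    """Least square criterion for a constant model y_i = m + noise."""
    return sum((y - m) ** 2 for y in data)

def sum_abs(m, data):
    """L1 norm criterion for the same constant model."""
    return sum(abs(y - m) for y in data)

data = [9.8, 10.1, 10.0, 9.9, 10.2]      # hypothetical measurements
with_outlier = data + [100.0]             # the same data plus one gross outlier

for sample in (data, with_outlier):
    m_ls = statistics.fmean(sample)       # minimizes sum_squares
    m_l1 = statistics.median(sample)      # minimizes sum_abs
    print(m_ls, m_l1)
```

Evaluating `sum_squares` and `sum_abs` at nearby values of m is a quick way to confirm that the mean and the median are the respective minimizers.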


Page 17: Model Fitting

Mean Versus Median


Page 18: Model Fitting

2. Linear Regression

Also called « ANOVA » (« Analysis of Variance »)

= least square + linear dependence on parameter

A special case where computations are easy

Page 19: Model Fitting

Example 4.3

What is the parameter ?

Is it a linear model ?

How many degrees of freedom ?

What do we assume on the noise terms ε_i ?

What is the matrix X ?

Page 20: Model Fitting


Page 21: Model Fitting

Does this model have full rank ?


Page 22: Model Fitting

Some Terminology

x_i are called explanatory variables

Assumed fixed and known

y_i are called response variables

They are « the data »

Assumed to be one sample output of the model

Page 23: Model Fitting

Least Square and Projection

[Figure: data point, predicted response and estimated parameter. The manifold is where the data point would lie if there were no noise.]

Page 24: Model Fitting

Solution of the Linear Regression Model


Page 25: Model Fitting

Least Square and Projection

The theorem gives H and K

[Figure: data, predicted response, residuals and estimated parameter. The manifold is where the data point would lie if there were no noise.]
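For reference, the standard least-squares solution of a linear model can be written with two matrices. The block below is a sketch, assuming the model Y = Xβ + ε with X of full rank, and assuming that H and K denote the usual matrices of that solution; the slide itself does not spell them out in this transcript.

```latex
\begin{align*}
K &= (X^{T}X)^{-1}X^{T}, & \hat{\beta} &= K\,y
  && \text{(estimated parameter)} \\
H &= X K = X(X^{T}X)^{-1}X^{T}, & \hat{y} &= H\,y
  && \text{(predicted response)} \\
  & & e &= y - \hat{y} = (I - H)\,y
  && \text{(residuals)}
\end{align*}
```

H is the orthogonal projection onto the column space of X, which is why the predicted response is the point of the manifold closest to the data.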

Page 26: Model Fitting

The Theorem Gives the Estimated Parameter with a Confidence Interval

Page 27: Model Fitting

SSR

Confidence intervals use the quantity s

s² is called the « Sum of Squared Residuals »

[Figure: data, predicted response and residuals]

Page 28: Model Fitting

Validate the Assumptions with Residuals


Page 29: Model Fitting

Residuals

Residuals are given by the theorem

[Figure: data, predicted response and residuals]

Page 30: Model Fitting

Standardized Residuals

The residuals e_i are an estimate of the noise terms ε_i

They are not (exactly) normal iid

The variance of e_i is ????

A: σ²(1 − H_{i,i})

Standardized residuals are not exactly normal iid either, but their variance is 1
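The steps from fit to standardized residuals can be sketched for the simplest case, a straight line y = b0 + b1 x. This is a minimal illustration with hypothetical data and names. It assumes the noise standard deviation is estimated as the square root of SSR / (n − 2), and it uses the closed form of the diagonal of the projection matrix H that holds for this two-parameter model.

```python
import math

def simple_regression(xs, ys):
    """Fit y = b0 + b1*x by least squares; return estimates and residual diagnostics."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / s_xx
    b0 = y_mean - b1 * x_mean
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    # Diagonal terms H_ii of the projection matrix for this two-parameter model.
    leverages = [1.0 / n + (x - x_mean) ** 2 / s_xx for x in xs]
    ssr = sum(e * e for e in residuals)           # sum of squared residuals
    sigma_hat = math.sqrt(ssr / (n - 2))          # assumed estimator, n - 2 degrees of freedom
    # Divide each residual by its estimated standard deviation sigma_hat * sqrt(1 - H_ii).
    standardized = [e / (sigma_hat * math.sqrt(1.0 - h))
                    for e, h in zip(residuals, leverages)]
    return b0, b1, residuals, leverages, standardized

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]               # hypothetical data
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 10.9]
b0, b1, residuals, leverages, standardized = simple_regression(xs, ys)
print(b0, b1)
print(standardized)
```

Plotting the standardized residuals against the predicted response, or in a normal qq-plot, is the usual way to check the Gaussian iid assumption.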

Page 31: Model Fitting

Which of these two models could be a linear regression model ?

A: both

Linear regression does not mean that y_i is a linear function of x_i

Warning: there is a hidden assumption

Noise is iid Gaussian -> homoscedasticity
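A polynomial model is a simple illustration of this point: the response is not a linear function of the explanatory variable, but it is linear in the parameter. The example below is hypothetical and is not one of the two models shown on the slide; n denotes the number of data points.

```latex
\begin{align*}
y_i &= \beta_0 + \beta_1 x_i + \beta_2 x_i^{2} + \epsilon_i,
  \qquad \epsilon_i \ \text{iid}\ N(0, \sigma^{2}) \\
y &= X\beta + \epsilon,
  \qquad
  X = \begin{pmatrix}
        1 & x_1 & x_1^{2} \\
        \vdots & \vdots & \vdots \\
        1 & x_n & x_n^{2}
      \end{pmatrix},
  \qquad
  \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix}
\end{align*}
```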

Page 32: Model Fitting


Page 33: Model Fitting

3. Linear Regression with L1 norm minimization

= L1 norm minimization + linear dependency on parameter

More robust

Less traditional

Page 34: Model Fitting

This is convex programming
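One common way to see this is to rewrite the L1 norm problem as a linear program, which is a special case of convex programming. The formulation below is a sketch, assuming a model y = Xβ + noise with n data points; the auxiliary variables u_i are introduced here for the illustration.

```latex
\begin{align*}
\min_{\beta}\ \sum_{i=1}^{n}\bigl|\,y_i - (X\beta)_i\,\bigr|
\quad\Longleftrightarrow\quad
& \min_{\beta,\,u}\ \sum_{i=1}^{n} u_i \\
& \text{subject to}\quad -u_i \le y_i - (X\beta)_i \le u_i,
  \qquad i = 1, \dots, n
\end{align*}
```

At the optimum each u_i equals the absolute residual, so any linear programming solver can be used to compute the estimate.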


Page 35: Model Fitting


Page 36: Model Fitting

Confidence Intervals

No closed form

Compare to median !

Bootstrap: How ?
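One possible answer is the percentile bootstrap: resample the data with replacement, recompute the estimate on each resample, and read the interval off the sorted estimates. The sketch below applies it to the median, which is the L1 norm estimate for a constant model. The function name, the number of resamples and the data are assumptions made for the illustration; for a regression model one would resample (x_i, y_i) pairs and refit each time.

```python
import random
import statistics

def bootstrap_ci(data, estimator, n_resamples=999, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for estimator(data)."""
    rng = random.Random(seed)
    n = len(data)
    # Re-estimate on n_resamples samples drawn from the data with replacement.
    estimates = sorted(
        estimator([rng.choice(data) for _ in range(n)])
        for _ in range(n_resamples)
    )
    tail = (1.0 - level) / 2.0
    lo_index = max(round(tail * (n_resamples + 1)) - 1, 0)
    hi_index = min(round((1.0 - tail) * (n_resamples + 1)) - 1, n_resamples - 1)
    return estimates[lo_index], estimates[hi_index]

# Hypothetical measurements with one outlier.
data = [9.1, 9.8, 10.4, 10.0, 9.6, 10.9, 10.2, 9.9, 10.1, 9.7, 10.3, 55.0, 10.6]
low, high = bootstrap_ci(data, statistics.median)
print(statistics.median(data), low, high)
```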

Page 37: Model Fitting


Page 38: Model Fitting

4. Choosing a Distribution

Know a catalog of distributions, guess a fit

Shape

Kurtosis, Skewness

Power laws

Hazard Rate

Fit

Verify the fit visually or with a test (see later)

Page 39: Model Fitting

Distribution Shape

Distributions have a shape

By definition: the shape is what remains the same when we shift or rescale

Example: normal distribution: what is the shape parameter ?

Example: exponential distribution: what is the shape parameter ?

Page 40: Model Fitting

Standard Distributions

In a given catalog of distributions, we give only the distributions with different shapes. For each shape, we pick one particular distribution, which we call standard.

Standard normal: N(0,1)

Standard exponential: Exp(1)

Standard Uniform: U(0,1)


Page 41: Model Fitting

Log-Normal Distribution


Page 42: Model Fitting


Page 43: Model Fitting

Skewness and Kurtosis
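Skewness and kurtosis describe the shape of a distribution, so they should not change when the data is shifted or rescaled by a positive factor. The sketch below uses the plain moment estimators and the excess kurtosis convention (kurtosis minus 3, so that a normal distribution gives 0). That convention, the function name and the data are assumptions made for the illustration.

```python
def skewness_and_kurtosis(data):
    """Sample skewness and excess kurtosis from central moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    return skew, excess_kurt

data = [1.0, 2.0, 3.0, 4.0, 10.0]                 # hypothetical sample
shifted_rescaled = [3.0 * x + 7.0 for x in data]   # same shape, other location and scale

print(skewness_and_kurtosis(data))
print(skewness_and_kurtosis(shifted_rescaled))
```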

Page 44: Model Fitting

Power Laws and Pareto Distribution


Page 45: Model Fitting

Complementary Distribution Functions, Log-log Scales

[Figure: complementary distribution functions in log-log scales for Pareto, lognormal and normal distributions]

Page 46: Model Fitting

Zipf’s Law


Page 47: Model Fitting


Page 48: Model Fitting

Hazard Rate

Interpretation: probability that a flow dies in the next dt seconds given that it is still alive

Used to classify distributions

Aging

Memoryless

Fat tail

Ex: normal ? Exponential ? Pareto ? Log Normal ?

Page 49: Model Fitting

The Weibull Distribution

Standard Weibull CDF: F(x) = 1 − exp(−x^c) for x ≥ 0

Aging for c > 1

Memoryless for c = 1

Fat tailed for c < 1
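The three cases follow from the hazard rate, defined as the density divided by the complementary distribution function. For a standard Weibull distribution with CDF 1 − exp(−x^c), this gives c x^(c−1), which increases with x for c > 1, is constant for c = 1 and decreases for c < 1. The sketch below is a minimal illustration with hypothetical function names.

```python
import math

def weibull_cdf(x, c):
    """Standard Weibull CDF with shape parameter c, for x >= 0."""
    return 1.0 - math.exp(-(x ** c))

def weibull_hazard(x, c):
    """Hazard rate f(x) / (1 - F(x)) = c * x**(c - 1), for x > 0."""
    return c * x ** (c - 1.0)

for c in (2.0, 1.0, 0.5):
    # Increasing for c > 1, constant for c = 1, decreasing for c < 1.
    print(c, [weibull_hazard(x, c) for x in (0.5, 1.0, 2.0, 4.0)])
```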

Page 50: Model Fitting

Fitting A Distribution

Assume iid

Use maximum likelihood

Ex: assume Gaussian; what are the parameters ?

Frequent issues

Censoring

Combinations
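For the Gaussian example, the maximum likelihood estimates have a closed form: the sample mean, and the variance computed with a 1/n factor rather than 1/(n − 1). The sketch below is a minimal illustration; the function name and the data are hypothetical.

```python
import statistics

def gaussian_mle(data):
    """Maximum likelihood estimates (mean, variance) for iid Gaussian data."""
    mu_hat = statistics.fmean(data)
    # pvariance divides by n, which is the maximum likelihood estimate.
    sigma2_hat = statistics.pvariance(data, mu=mu_hat)
    return mu_hat, sigma2_hat

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical sample
mu_hat, sigma2_hat = gaussian_mle(data)
print(mu_hat, sigma2_hat)
```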

Page 51: Model Fitting

Censored Data

We want to fit a log normal distribution, but we have only data samples with values less than some max

Lognormal is fat tailed so we cannot ignore the tail

Idea: use the model F(x) = F0(x) / F0(a) for x ≤ a

and estimate F0 and a (truncation threshold)
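The idea can be sketched as a log-likelihood. This is an illustration under assumptions that are not spelled out in the transcript: F0 is taken to be a lognormal distribution with parameters mu and sigma, and the observed data is modelled as F0 conditioned on being at most the truncation threshold a. All function names are hypothetical.

```python
import math

def normal_cdf(z):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lognormal_logpdf(x, mu, sigma):
    """Log-density of a lognormal distribution at x > 0."""
    z = (math.log(x) - mu) / sigma
    return -math.log(x * sigma * math.sqrt(2.0 * math.pi)) - 0.5 * z * z

def truncated_lognormal_loglik(data, mu, sigma, a):
    """Log-likelihood of data under a lognormal truncated to values <= a."""
    if a < max(data):
        return -math.inf                      # the threshold cannot be below an observation
    log_f0_a = math.log(normal_cdf((math.log(a) - mu) / sigma))
    # Each density is renormalized by F0(a), the probability of being observed.
    return sum(lognormal_logpdf(x, mu, sigma) for x in data) - len(data) * log_f0_a

data = [0.5, 1.0, 2.0]                        # hypothetical observations
print(truncated_lognormal_loglik(data, 0.0, 1.0, 2.0))
```

Maximizing this function over mu, sigma and a would require a numerical optimizer, which is left out of the sketch.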

Page 52: Model Fitting


Page 53: Model Fitting

Combinations

We want to fit a log normal distribution to the body and a Pareto distribution to the tail

Model:

MLE satisfies

Page 54: Model Fitting


Page 55: Model Fitting

5. Heavy Tails

Recall what fat tail is

Heavier than fat:

Page 56: Model Fitting

Heavy Tail means Central Limit does not hold

Central limit theorem:

a sum of n independent random variables with finite second moment tends to have a normal distribution, when n is large

explains why we can often use normal assumption

But it does not always hold. It does not hold if random variables have infinite second moment.
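A Pareto distribution makes the condition concrete. With P(X > x) = (x_min / x)^p for x ≥ x_min, the second moment is finite only when p > 2. The sketch below is a minimal illustration with hypothetical function names; it samples by inverting the distribution function and compares the theoretical second moment with the largest value seen in a sample.

```python
import math
import random

def pareto_sample(p, n, seed=0, x_min=1.0):
    """Inverse-transform sampling from P(X > x) = (x_min / x)**p, x >= x_min."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so the power is always defined.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / p) for _ in range(n)]

def pareto_second_moment(p, x_min=1.0):
    """E[X^2] = p * x_min^2 / (p - 2), finite only when p > 2."""
    return p * x_min ** 2 / (p - 2.0) if p > 2.0 else math.inf

for p in (1.0, 1.5, 2.0, 2.5, 3.0):
    sample = pareto_sample(p, 10000)
    print(p, pareto_second_moment(p), max(sample))
```

For small p a single sample value can dominate the whole sum, which is the intuition behind the failure of the normal approximation.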


Page 57: Model Fitting

Central Limit Theorem for Heavy Tails

[Figure: one sample of 10000 points, Pareto p = 1. Panels: normal qqplot, histogram, complementary d.f. in log-log scale]

Page 58: Model Fitting

[Figure: one sample of 10000 points and the average of 1000 samples, for p = 1, 1.5, 2, 2.5 and 3]

Page 59: Model Fitting

Convergence for heavy tailed distributions


Page 60: Model Fitting

Importance of Second Moment


Page 61: Model Fitting

RWP with Heavy Tail

Stationary ?

Page 62: Model Fitting

Evidence of Heavy Tail


Page 63: Model Fitting

Testing Heavy Tail

Assume you have a very large data set

Else no statement can be made

One can look at empirical cdf in log scale
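In log-log scales, a power law tail P(X > x) proportional to x^(−p) appears as a straight line of slope −p, so the slope of the empirical complementary distribution function gives a rough estimate of p. The sketch below is a minimal illustration with hypothetical function names. It assumes distinct positive data values and fits the whole range, whereas in practice only the tail would be fitted.

```python
import math

def empirical_ccdf(data):
    """Pairs (x, fraction of points strictly greater than x), without the maximum."""
    xs = sorted(data)
    n = len(xs)
    return [(x, (n - i - 1) / n) for i, x in enumerate(xs[:-1])]

def loglog_slope(points):
    """Least-squares slope of log(ccdf) against log(x)."""
    lx = [math.log(x) for x, _ in points]
    ly = [math.log(c) for _, c in points]
    n = len(points)
    x_mean = sum(lx) / n
    y_mean = sum(ly) / n
    s_xx = sum((u - x_mean) ** 2 for u in lx)
    s_xy = sum((u - x_mean) * (v - y_mean) for u, v in zip(lx, ly))
    return s_xy / s_xx

# Hypothetical data: exact quantiles of a Pareto distribution with p = 1.5.
n, p = 1000, 1.5
data = [(1.0 - (i + 0.5) / n) ** (-1.0 / p) for i in range(n)]
slope = loglog_slope(empirical_ccdf(data))
print(-slope)   # rough estimate of p
```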


Page 64: Model Fitting

Taqqu’s method

A better method (numerically safer) is as follows.

Aggregate data multiple times

Page 65: Model Fitting

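We should have, between the log-log complementary distribution plots at aggregation levels m1 and m2, a horizontal shift δ ≈ (1/p) log(m2 / m1)

and a vertical shift τ ≈ log(m2 / m1)

If τ ≈ log(m2 / m1) then measure p = τ / δ

p_est = average of all p’s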
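The reasoning behind comparing aggregation levels can be sketched as follows. This assumes iid data with a power law tail of index p, and it assumes that aggregating at level m means summing m consecutive values; the symbols c, δ and τ are notation chosen here for the horizontal and vertical shifts between two log-log plots.

```latex
\begin{align*}
P(X > x) &\approx c\,x^{-p}
  \quad\text{for large } x \\
P\bigl(X^{(m)} > x\bigr) &\approx m\,P(X > x)
  \quad\text{where } X^{(m)} = X_1 + \dots + X_m \\
\tau &= \log P\bigl(X^{(m_2)} > x\bigr) - \log P\bigl(X^{(m_1)} > x\bigr)
  \approx \log\frac{m_2}{m_1} \\
\delta &= \log x_2 - \log x_1
  \approx \frac{1}{p}\log\frac{m_2}{m_1}
  \quad\text{when } P\bigl(X^{(m_2)} > x_2\bigr) = P\bigl(X^{(m_1)} > x_1\bigr) \\
p &\approx \frac{\tau}{\delta}
\end{align*}
```

The second line is the property that, for heavy tails, a large sum is typically caused by a single large term.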

Page 66: Model Fitting

Example

[Figure: example of the method, with shifts labelled log(2) / p and log(2)]

Page 67: Model Fitting

Evidence of Heavy Tail


p = 1.08 ± 0.1

Page 68: Model Fitting

A Load Generator: Surge

Designed to create load for a web server

Used in next lab

Sophisticated load model

It is an example of a benchmark; there are many others (see lecture)

Page 69: Model Fitting

User Equivalent Model

Idea: find a stochastic model that represents the user well

User modelled as a sequence of downloads, followed by “think time”

Tool can implement several “user equivalents”

Used to generate real work over TCP connections

Page 70: Model Fitting

Characterization of UE

Weibull distributions

Page 71: Model Fitting

Successive file requests are not independent

Q: What would be the distribution if they were independent ?

A: geometric

Page 72: Model Fitting

Fitting the distributions

Done by Surge authors with aest tool + ad-hoc (least square fit of histogram)

What other method could one use ?

A: maximum likelihood with numerical optimization; the issue is non-iid-ness

