

CH1. What is what CH2. A simple SPF CH3. EDA CH4. Curve fitting CH5. A first SPF CH6: Which fit is fitter CH7: Choosing the objective function CH8: Theoretical stuff Ch9: Adding variables CH11. Choosing a model equation

6. Which fit is fitter

In this session:
1. What makes for a good fit
2. Introducing the CURE plot
3. Eliminating ‘overall bias’
4. The bias of a fit
5. Using the CURE plot


What makes for a good fit?

Common ‘goodness-of-fit’ measures: R², χ², AIC, ... These are ‘overall’ (single-number) measures. For SPF applications they are insufficient. Recall…

Two perspectives on SPF

E{μ} and σ{μ} = f(Traits, parameters)

Applications centered perspective

Cause and effect centered perspective

SPF Workshop February 2014, UBCO

• One judges the fit of a model by its residuals.
• In SPFs for applications a fit is thought good only if the residuals are closely packed around 0 everywhere.

The main figure of merit for SPFs: Unbiased Everywhere

[Figure: two residual plots. Horizontal axis: Variable value (0 to 100). Vertical axis: Residual: Observed - Fitted (-40 to 40). Panel captions: ‘Perhaps acceptable’ and ‘But this one is not!’. Annotations: ‘Fitted is too large’, ‘Fitted is too small’.]


Informative?

The usual residual plot


But, when the same residuals are cumulated

From spreadsheet

Compute Residual → Cumulate → Plot
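The three spreadsheet steps can be sketched in Python. This is a minimal illustration rather than the workshop spreadsheet: the function name `cure_points` and the five data rows are hypothetical, and it is assumed that the rows are sorted by the plotting variable before the residuals are cumulated.

```python
from itertools import accumulate

def cure_points(x, observed, fitted):
    """Return (x, cumulated residual) pairs, ordered by ascending x."""
    # Sort the rows by the variable that will be on the horizontal axis.
    rows = sorted(zip(x, observed, fitted), key=lambda row: row[0])
    # Step 1: residual = observed - fitted.
    residuals = [obs - fit for _, obs, fit in rows]
    # Step 2: cumulate the residuals in sorted order.
    cumulated = list(accumulate(residuals))
    # Step 3: these pairs are what gets plotted.
    return [(row[0], cum) for row, cum in zip(rows, cumulated)]

# Hypothetical data, for illustration only.
lengths = [0.5, 0.1, 0.3, 0.2, 0.4]
observed = [3, 0, 2, 1, 1]
fitted = [2.0, 0.5, 1.5, 1.0, 2.0]

points = cure_points(lengths, observed, fitted)
for length, cum in points:
    print(f"{length:4.1f}  {cum:6.2f}")
```

The printed pairs can be passed to any plotting tool; the horizontal axis is the sorted variable and the vertical axis is the cumulated residual.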


The CURE Plot

Now one can see! 0-A, B-C, E-F: Observed > Fitted, not good; A-B, D-E: Fitted > Observed, bad. Where the drop is precipitous there may be outliers.

[Figure: CURE plot. Vertical axis: Residual: Observed - Fitted]


Benefits:
1. Chaos is replaced by clarity.
2. We can recognize a good model.
3. The cost of parameterization is clear.

(2) What should a good CURE plot look like?
• Should not have long up or down runs
• Should not have vertical drops
• Should meander around the horizontal axis


(3) The cost of parametric curve fitting is now manifest

Imposing the function 1.675×(Segment Length)^0.866 on the data causes bias almost everywhere!

No bias

Biased estimates

Bad decisions

Real costs


How much bias is there?

Range         Accumulated Accidents   Fitted Accidents   Bias   Bias/Fitted Accidents
Origin to A   1899                    1596               303    0.19
A to B        854                     1532               -678   -0.44
B to C
...           ...                     ...                ...    ...

TAB = Total Accumulated Bias = 303 + |-678| + ...
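The bias columns and the Total Accumulated Bias follow from simple arithmetic, sketched below. The function names are hypothetical, and only the two ranges whose counts appear on the slide are used; the remaining ranges are elided there, so the total computed here is a partial sum.

```python
def bias_rows(ranges):
    """ranges: list of (label, accumulated accidents, fitted accidents)."""
    rows = []
    for label, accumulated, fitted in ranges:
        bias = accumulated - fitted          # Bias = Accumulated - Fitted
        rows.append((label, bias, bias / fitted))
    return rows

def total_accumulated_bias(rows):
    """Sum of the absolute biases over all ranges."""
    return sum(abs(bias) for _, bias, _ in rows)

# Only the ranges with counts shown on the slide; the rest are elided.
ranges = [("Origin to A", 1899, 1596), ("A to B", 854, 1532)]

rows = bias_rows(ranges)
partial_tab = total_accumulated_bias(rows)
for label, bias, ratio in rows:
    print(f"{label:12s} bias={bias:5d}  bias/fitted={ratio:5.2f}")
print("Partial TAB:", partial_tab)
```

Taking absolute values keeps positive and negative runs from cancelling, which is why TAB can be large even when the overall sum of residuals is zero.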


When the scale parameter is determined by ‘Solver’ the sum of fitted values is usually not the same as the sum of crash counts. This is a blemish.

To remove this blemish, add the constraint that the sum of fitted values equals the sum of crash counts.

Levelling the playing field

Open spreadsheet #7. OLS with constraint
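Outside the spreadsheet, the same idea can be sketched in Python. This is an illustration under stated assumptions, not the Solver procedure itself: the model is taken to be b0 × (Segment Length)^b1, the constraint is used to fix b0 for any trial b1, and a golden-section search over b1 (assumed to lie between 0.1 and 2.0, with a single minimum there) stands in for Solver. The function name and the data are hypothetical.

```python
import math

def constrained_power_fit(lengths, counts, lo=0.1, hi=2.0, tol=1e-8):
    """Least-squares fit of b0 * L**b1 with sum(fitted) == sum(counts)."""
    total = sum(counts)

    def scale_for(b1):
        # The constraint determines b0 once b1 is chosen.
        return total / sum(L ** b1 for L in lengths)

    def ssd(b1):
        b0 = scale_for(b1)
        return sum((y - b0 * L ** b1) ** 2 for L, y in zip(lengths, counts))

    # Golden-section search for the b1 that minimizes the SSD.
    ratio = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if ssd(c) < ssd(d):
            b = d
        else:
            a = c
    b1 = (a + b) / 2
    return scale_for(b1), b1

# Hypothetical segment lengths (miles) and crash counts.
lengths = [0.2, 0.5, 1.0, 1.5, 2.0, 3.0]
counts = [0, 2, 1, 3, 5, 4]

b0, b1 = constrained_power_fit(lengths, counts)
fitted = [b0 * L ** b1 for L in lengths]
print(f"b0={b0:.3f}  b1={b1:.3f}")
print("sum fitted:", round(sum(fitted), 6), " sum counts:", sum(counts))
```

Because b0 is computed from the constraint, the two sums agree for every trial b1, so the search only has to deal with the shape parameter.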


click

How to add constraints


With constraint

Now click ‘Solve’ to get


When is a CURE plot good enough?

Open (again): #7 OLS with constraint

Open: #8 CURE computations

After SOLVER with constraint was used you should now see:

Copy values in columns A, B, D and E into CURE spreadsheet


Copied

Important step: On ‘DATA’ tab choose ‘Sort’ and sort in ascending order by ‘miles’


Now add columns E, F, and G. Note that for the last row (n=5323) the Cumulated Residuals = 0. Why?

Column E, row 4: C4-D4

Column F, row 4: F3+E4
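The zero in the last row can be traced in one line. The derivation assumes that column F is the running sum of the residuals and that the fit was obtained with the constraint that the fitted values and the crash counts have equal sums; the symbols y_i (observed) and \hat{y}_i (fitted) are notation introduced here for illustration.

```latex
F_n = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)
    = \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} \hat{y}_i
    = 0
```

Sorting the rows changes the order of the terms but not their total, so the result holds whatever variable the data are sorted by.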


Below is a plot of segment length (column B) against cumulative residuals (column F)

[Figure: CURE plot. Horizontal axis: Segment Length (0 to 3, truncated at 3 miles). Vertical axis: Cumulative residuals (-500 to 400).]

Upward drift means that in this range ‘observed’ tends to be consistently larger than ‘fitted’.

A vertical gap is a possible ‘outlier’.

The question was when a CURE plot is good enough.


Computing the limits which a random walk should seldom exceed. Details in text.

[Figure: CURE plot with +2s’ and -2s’ limits. Annotation: The last ‘cumulated squared residual’]
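The slides defer the details of the limits to the text. The sketch below assumes the form commonly attributed to Hauer and Bamfo: at each point, s’ squared equals the cumulated squared residuals up to that point multiplied by one minus their ratio to the last cumulated squared residual. The function name and the residuals are hypothetical, and the residuals are assumed to be already sorted by the plotting variable.

```python
import math
from itertools import accumulate

def cure_limits(residuals, k=2.0):
    """Return k*s' at each point for residuals in sorted order."""
    cum_sq = list(accumulate(r * r for r in residuals))
    last = cum_sq[-1]  # the last cumulated squared residual
    if last == 0:
        return [0.0] * len(cum_sq)
    # Assumed form: s'^2 = cum_sq * (1 - cum_sq / last).
    return [k * math.sqrt(max(0.0, s * (1.0 - s / last))) for s in cum_sq]

# Hypothetical residuals, for illustration only.
residuals = [-0.5, 0.0, 0.5, -1.0, 1.0]
limits = cure_limits(residuals)
for r, lim in zip(residuals, limits):
    print(f"residual={r:5.2f}  +/-2s'={lim:5.3f}")
```

Under this form the limits shrink to zero at the last point, which matches a cumulated residual that is forced back to zero by the constraint.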


Guidance:

Rule of thumb: 95% within ±2s’. This fit does not pass muster.

40% within ±0.5s’: stop, you are in danger of overfitting.
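The share of points inside a band can be counted directly. In this sketch the function name, the cumulated residuals and the s’ values are hypothetical, and reading the two rules as "at least 95%" and "at least 40%" is an assumption about how the thresholds are meant to be applied.

```python
def share_within(cumulated, s_prime, scale):
    """Share of points with |cumulated residual| <= scale * s'."""
    inside = sum(1 for c, s in zip(cumulated, s_prime) if abs(c) <= scale * s)
    return inside / len(cumulated)

# Hypothetical cumulated residuals and s' values, for illustration only.
cumulated = [-0.5, -0.5, 0.0, -1.0, 0.0]
s_prime = [0.47, 0.47, 0.63, 0.77, 0.0]

share_2s = share_within(cumulated, s_prime, 2.0)
share_half_s = share_within(cumulated, s_prime, 0.5)

# Assumed reading of the guidance thresholds.
passes_rule_of_thumb = share_2s >= 0.95
overfitting_warning = share_half_s >= 0.40

print(f"within 2s':   {share_2s:.0%}  passes: {passes_rule_of_thumb}")
print(f"within 0.5s': {share_half_s:.0%}  overfitting warning: {overfitting_warning}")
```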


Which fit is better?

Objective Function        b0      b1
∑ squared differences     1.656   0.870
∑ absolute differences    1.618   0.911

The steeper the run the larger the bias. Red increased A to B bias. Black is better.
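To see where the two parameter sets differ, the fitted values can be evaluated side by side. The sketch assumes both fits use the form b0 × (Segment Length)^b1 seen earlier in the session; the function name and the chosen segment lengths are hypothetical, and the code does not say which fit is drawn in red or black.

```python
# Parameter estimates taken from the table; the model form is assumed.
fits = {
    "sum of squared differences": (1.656, 0.870),
    "sum of absolute differences": (1.618, 0.911),
}

def fitted_value(b0, b1, length):
    """Fitted value for a segment of the given length (miles)."""
    return b0 * length ** b1

for length in (0.5, 1.0, 2.0, 3.0):
    values = [fitted_value(b0, b1, length) for b0, b1 in fits.values()]
    print(f"length={length:3.1f}  squared={values[0]:.3f}  absolute={values[1]:.3f}")

# Length at which the two curves give the same fitted value.
(b0_sq, b1_sq), (b0_abs, b1_abs) = fits.values()
crossover = (b0_sq / b0_abs) ** (1.0 / (b1_abs - b1_sq))
print(f"curves cross near {crossover:.2f} miles")
```

Below the crossover the squared-differences fit gives the larger fitted value and above it the absolute-differences fit does, so the two fits shift bias between short and long segments.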


Summary for section 6. (Which fit is fitter?)

1. For SPFs the main figure of merit is that the fit be unbiased everywhere;
2. For applications R², χ², AIC, ... ‘overall’ measures are of limited use;
3. The usual plot of residuals is not informative; the CURE plot opens one’s eyes;
4. We showed how to compute bias and Total Accumulated Bias. The cost of parametric curve fitting was manifest;
5. It is clear what a good CURE plot should look like;
6. By adding a constraint we eliminated overall bias;
7. We computed ±2s’ limits and provided guidance on when a CURE plot is acceptable and when overfitting is a danger;
8. We showed how to decide which of two CURE plots is better;
9. All fits were bad, perhaps partly because minimizing the sum of squared differences (SSD) is not appropriate when crash count distributions are not symmetrical. What should be optimized? That is next.
