Analysing the development of road safety using demographic data

Last changed by: bijleveld, svn revision: 1055, date: 2013-12-16 17:07:41 +0100 (Mon, 16 Dec 2013)

Large scale disaggregation by age and conflict type to assess the impact of past and future demographic changes on road safety in the Netherlands

Frits Bijleveld (a, b, ∗), Yvette van Norden (c), Henk Stipdonk (a)

(a) SWOV Institute for Road Safety Research, The Hague, The Netherlands
(b) VU University Amsterdam, The Netherlands
(c) Erasmus MC-Daniel den Hoed Cancer Center, P.O. box 5201, 3008 AE Rotterdam, The Netherlands

Abstract

In this paper ...

Keywords: Disaggregation, Road safety, Exposure to risk, Injury risk, the Netherlands, Age, Crash type, Distance travelled, Population

1. Introduction

This paper presents details of an approach to large scale disaggregation of road crash data by age and conflict type that is implemented in a longer term road safety forecasting study in the Netherlands (van Norden et al., 2010).

One of the presumptions of the approach is that, as a result of changes over time in the demographics of a country, the age distribution of traffic participants (slowly) changes over time (Stipdonk et al., 2013). As the population in a country ages, relatively more elderly individuals may take part in travel. This in turn may result in relatively more elderly drivers, which is likely to be associated with more travel by the demographic group of elderly drivers. By the same token it is likely that the share of travel by younger drivers decreases over time.

As younger drivers are considered to be more dangerous than older (assumed to be more experienced) drivers (Maycock et al., 1991; Vlakveld, 2011, Chapter 1, Figure 1.3), a future reduction of the share of young drivers in travel may result in an improvement of safety for all traffic participants.

∗Corresponding author

Conversely, as elderly drivers are more vulnerable to harm than younger drivers once involved in a crash (Davidse, 2007, Chapter 1), an increase in the share of travel by elderly drivers may result in an increase in the share of elderly victims among all victims.

When longer term forecasts are to be made, one aspect among others to consider is whether such changes in demographics may affect such forecasts.

The model presented in this paper at its core decomposes the development of the expected number of victims of each crash type into the product of two developments: (1) one dependent on the age of the driver of the victim (thus not necessarily the age of the victim) and his or her mode of transportation, and (2) one dependent on the age of the driver of the vehicle the victim initially collides with and his or her mode of transportation. In the process, the model allows for flexible options to assess the road safety development of groups of road users of interest to policy makers.

In their study van Norden et al. (2010) present a forecasting approach for two types of victims: the number of fatalities and the number of victims in road crashes requiring inpatient care having a maximum score on the Abbreviated Injury Scale (Gennarelli and Wodzin, 2005) of 2 or above. For both fatalities and the thus defined seriously injured, the victim counts are further disaggregated by crash type. The forecast horizon in van Norden et al. (2010) was 2020.

Preprint submitted to Elsevier December 17, 2013

This paper presents details of that approach applied to the seriously injured involved in crashes with cars where the victim was traveling by car or by bicycle. A less extensive disaggregation is used in van Norden et al. (2010) for other crash types and where fatalities are considered, results of which are not reported here.

Age, expressed in age year (denoting the age of a driver at the time of the crash), and time, expressed in calendar year (denoting the year in which a crash took place), are distinguished. Both quantities are rounded down to the nearest integer. As the ages in years of both the driver of the victim and the driver of the colliding vehicle are considered separately, and for each combination the development of the number of victims is to be modeled, a substantial number of sub-models is to be identified, involving many parameters. It is therefore of paramount importance to control the (equivalent, effective) degrees of freedom in the model. To do this, the methods used in this paper are smoothing techniques based on penalized likelihood using (k-fold) cross-validation (see, for instance, Green and Silverman, 1994; Hastie et al., 2008; Efron and Tibshirani, 1993; Hastie and Tibshirani, 1990).

This paper is organized as follows...

2. Method

2.1. Introduction

This paper presents the details of models for the number of inpatient victims resulting from car-car and bicycle-car crashes used in van Norden et al. (2010). A model is developed separately for both crash types that decomposes the predicted number of victims

• by the age i of the driver of the victim and

• the age j of the driver of the colliding vehicle,

into two components depending either on the age of the driver of the victim or on the age of the driver of the vehicle the victim collides with. These components are in principle not considered constant over time.

Assuming the disaggregate number of victims of a certain crash type in year t, where the age of the driver of the victim is i and the age of the colliding driver is j, to be denoted by $n_{tij}$, we loosely obtain:

$$n_{tij} = A^{(1)}_{ti} \times A^{(2)}_{tj} \times E_{tij}, \tag{1}$$

where $A^{(1)}_{ti}$ is defined as the contribution due to the driver of the victim (with age i) in year t and $A^{(2)}_{tj}$ the contribution due to the driver of the vehicle collided with (with age j). $E_{tij}$ is an appropriately defined (multiplicative) disturbance term. In our application t in (1) ranges from 1995 to 2009, while 0 ≤ i, j < 100. A log-linear approach is taken for (1), further decomposing $A^{(1)}_{ti}$ and $A^{(2)}_{tj}$ into a part ($M^{(1)}_{ti}$, respectively $M^{(2)}_{tj}$) related to traffic volume (distance travelled) and a part ($\log(\rho^{(1)}_{ti})$, respectively $\log(\rho^{(2)}_{tj})$) related to a generalized risk. We thus obtain

$$\log(n_{tij}) = c_1 \log\left(M^{(1)}_{ti}\right) + \log\left(\rho^{(1)}_{ti}\right) + c_2 \log\left(M^{(2)}_{tj}\right) + \log\left(\rho^{(2)}_{tj}\right) + \varepsilon_{tij}, \tag{2}$$

where $\varepsilon_{tij}$ is an appropriately defined disturbance term, which is further omitted where possible for brevity. $c_1$ and $c_2$ are regression coefficients. The data used for $M^{(1)}_{ti}$ and $M^{(2)}_{tj}$ are obtained from [cite: ‘van-norden-yrs-of-zo-anders-safetynet-d710?’ here], where a smoothing technique based on travel survey data is used. The original travel survey data are obtained from the Dutch national travel survey collected by the ministry of transport and Statistics Netherlands ([cite: ‘SWOV/NTS’ here][cite: ‘SWOVwebsite’ here]). The estimates $M^{(1)}_{ti}$ and $M^{(2)}_{tj}$ are assumed fixed and known in this study.

The central task in this project is to identify and quantify the logarithms of $\rho^{(1)}_{ti}$ and $\rho^{(2)}_{tj}$, both for bicycle-car crashes and car-car crashes. The logarithms are modeled in order to maintain positiveness of the resulting estimates of $\rho^{(1)}_{ti}$ and $\rho^{(2)}_{tj}$.
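As an illustration of how the right-hand side of (2) combines the two components, the following is a minimal sketch in Python with NumPy. The function name, the array layout (year × age) and the toy numbers are assumptions made for illustration only; they are not taken from the study's implementation.

```python
import numpy as np

def log_expected_victims(M1, M2, log_rho1, log_rho2, c1=1.0, c2=1.0):
    """Right-hand side of (2) without the disturbance term.

    M1[t, i], log_rho1[t, i]: traffic volume and log risk, driver of the victim.
    M2[t, j], log_rho2[t, j]: the same for the colliding driver.
    Returns an array of shape (T, I, J) holding log(n_tij).
    """
    part1 = c1 * np.log(M1) + log_rho1      # depends on (t, i) only
    part2 = c2 * np.log(M2) + log_rho2      # depends on (t, j) only
    # Broadcasting adds the two components for every (t, i, j) combination.
    return part1[:, :, None] + part2[:, None, :]

# Toy illustration: 2 years, 3 victim-driver ages, 2 colliding-driver ages.
M1 = np.full((2, 3), 10.0)
M2 = np.full((2, 2), 20.0)
log_rho1 = np.zeros((2, 3))                 # risk 1 on the original scale
log_rho2 = np.zeros((2, 2))
log_n = log_expected_victims(M1, M2, log_rho1, log_rho2)
n = np.exp(log_n)                           # every entry equals 10 * 20
```

Because each component depends on only one of the two ages, the full year × age × age array is described by two year × age surfaces, which is what keeps the number of parameters manageable.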


$\rho^{(1)}_{ti}$ can roughly be considered the function that describes the age dependency of the risk that the driver of the victim ‘produces’ a victim, while $\rho^{(2)}_{tj}$ can roughly be considered the function that describes the age dependency of the risk to become involved in a crash as the colliding party.

The main result presented in this paper is how smooth estimates of $\rho^{(1)}_{ti}$ and $\rho^{(2)}_{tj}$ were obtained for victims of bicycle-car and car-car crashes, and what the estimates of $\rho^{(1)}_{ti}$ and $\rho^{(2)}_{tj}$ are in these cases. It is of particular interest to compare the shape of $\rho^{(2)}_{tj}$ for bicycle-car crashes to its counterpart for car-car crashes, as both could be interpreted as the harmfulness of (in a sense the same) colliding driver.

To our knowledge there is no well-known candidate parametric function (of time and age) for either $\rho^{(1)}_{ti}$ or $\rho^{(2)}_{tj}$. Therefore a non-parametric regression approach is taken. The approach taken effectively means that one parameter for each $\rho^{(1)}_{ti}$ and one for each $\rho^{(2)}_{tj}$, t = 1995, ..., 2009, i, j = 0, ..., 99, needs to be estimated. We however assume the shapes of $\rho^{(1)}_{ti}$ and $\rho^{(2)}_{tj}$ to be generally smooth, as values for (calendar and age) years close to each other are likely to be similar.

Figure 1 presents a diagram with potential directions along which it can be argued that values of $\rho^{(1)}_{ti}$ and $\rho^{(2)}_{tj}$ might be close to each other. The vertical arrows (constant calendar year) suggest a tendency that the values should not differ much between ages within each calendar year. It is also possible that, if two neighboring values differ, the next two values differ by approximately the same amount. It is therefore assumed that for both $\rho^{(1)}_{ti}$ and $\rho^{(2)}_{tj}$ a smooth line can be drawn for each t through the age years. One plausible exception might be a break at the (licensing) age of 18 for car drivers. Such breaks however can be modeled explicitly, which is done in this study where relevant. The horizontal lines (constant age year) indicate potential similarity between age groups over subsequent years. The dashed diagonal arrows (constant year of birth) indicate a possible similarity within cohorts. It is assumed that $\rho^{(1)}$ and $\rho^{(2)}$ are smooth in these directions.

[Figure 1 diagram: nodes “Age 18”, “Age 19” and “Age 20”, each shown in 1999 and in 2000, connected by arrows.]

Figure 1: Schematic diagram of relations between values of $\rho^{(1)}_{ti}$ or $\rho^{(2)}_{tj}$. Only a small part is shown. Vertical arrows symbolize relations between age groups. Horizontal arrows symbolize relations between subsequent calendar years. Diagonal (dashed) arrows symbolize relations between groups with the same year of birth (birth cohort).

In this study, the smoothness of $\rho^{(1)}$ and $\rho^{(2)}$ is measured separately using the smoothness along the calendar year (time) direction and in one case along the (vertical) age year direction and in another case along the (diagonal) cohort direction.

Taking a non-parametric approach however has substantial bearing on estimating the coefficients $c_1$ and $c_2$ in (2). Without any criteria in addition to the fit to the data, for each $\rho^{(1)}_{..}$ and $c_1$ a complementary $\tilde{\rho}^{(1)}_{..}$ (with $\tilde{c}_1 \equiv 0$) exists that satisfies

$$\log\left(\tilde{\rho}^{(1)}_{ti}\right) = c_1 \log\left(M^{(1)}_{ti}\right) + \log\left(\rho^{(1)}_{ti}\right),$$

which fits the data equally well without the need for additional traffic volume data. A similar argument holds for $\rho^{(2)}_{..}$ and $c_2$. If relevant, use of traffic volume data however will affect the smoothness of both $\rho^{(1)}_{..}$ and $\rho^{(2)}_{..}$, and will affect further properties. We will return to this issue below[where?].
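The argument above can be checked numerically. The sketch below is illustrative only: the traffic volumes and log risks are randomly generated (hypothetical) values, and the roughness measure is a simple sum of squared second differences along the age axis, chosen here as an assumption. It shows that the fitted contribution is identical with and without volume data, while the smoothness of the risk surface is not.

```python
import numpy as np

rng = np.random.default_rng(0)
T, I = 15, 100                                 # calendar years and age years
M1 = rng.uniform(1.0, 50.0, size=(T, I))       # hypothetical traffic volumes
log_rho1 = rng.normal(size=(T, I))             # hypothetical log risks
c1 = 0.8

# Contribution of the victim-driver component when volume data are used ...
with_volume = c1 * np.log(M1) + log_rho1

# ... and when c1 is fixed at 0 but the complementary risk is used instead.
log_rho1_tilde = c1 * np.log(M1) + log_rho1
without_volume = 0.0 * np.log(M1) + log_rho1_tilde

same_fit = np.allclose(with_volume, without_volume)

def roughness_age(x):
    """Sum of squared second differences along the age axis."""
    return float(np.sum(np.diff(x, n=2, axis=1) ** 2))

rough_original = roughness_age(log_rho1)
rough_tilde = roughness_age(log_rho1_tilde)    # generally differs
```

The fit alone therefore cannot distinguish the two parameterizations; only an additional criterion such as smoothness can.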

2.2. Estimation approach

Many parameters are needed for a completely free-form shape of $\rho^{(1)}_{..}$ and $\rho^{(2)}_{..}$: one for each value. It is therefore essential to control the (equivalent, effective) degrees of freedom in the model. In this study this is attempted using a heuristic approach along the lines of penalized smoothing techniques combined with (k-fold) cross-validation (see Hastie et al., 2008; Green and Silverman, 1994).

The (k-fold) approach taken in this paper is as follows: instead of optimizing the likelihood based on (2) only, the likelihood of (2) is penalized for the roughness of both $\rho^{(1)}_{..}$ and $\rho^{(2)}_{..}$ using roughness penalty functions (schematically):

$$f(\theta) = -\text{log likelihood of (2)} + \mathrm{penalty}\left(\rho^{(1)}_{..}\right) + \mathrm{penalty}\left(\rho^{(2)}_{..}\right), \tag{3}$$

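where it is assumed that θ contains all parameters in the right hand side of the equation. The roughness penalties $\mathrm{penalty}(\rho^{(1)}_{..})$ and $\mathrm{penalty}(\rho^{(2)}_{..})$ are decomposed into two further components: one penalizing the roughness along the ‘year’ axis ($p_{\text{year}}$), and one along the age (or birth year) axis ($p_{\text{age}}$). This can be formulated as follows (assuming the coefficients $\lambda_{11}, \lambda_{12}, \lambda_{21}, \lambda_{22} > 0$, detailed below):

$$\mathrm{penalty}\left(\rho^{(1)}\right) = \lambda_{11} \sum_{t \in \text{years}} \sum_{i \in \text{ages}} p_{\text{year}}\left(\rho^{(1)}_{ti}\right) + \lambda_{12} \sum_{t \in \text{years}} \sum_{i \in \text{ages}} p_{\text{age}}\left(\rho^{(1)}_{ti}\right), \tag{4}$$

$$\mathrm{penalty}\left(\rho^{(2)}\right) = \lambda_{21} \sum_{t \in \text{years}} \sum_{j \in \text{ages}} p_{\text{year}}\left(\rho^{(2)}_{tj}\right) + \lambda_{22} \sum_{t \in \text{years}} \sum_{j \in \text{ages}} p_{\text{age}}\left(\rho^{(2)}_{tj}\right). \tag{5}$$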

Both $p_{\text{year}}$ and $p_{\text{age}}$ are defined as the square of the unit-length (h = 1 in (25.3.23) of Abramowitz and Stegun (1968)) numerical estimate of the second order derivative of $\rho^{(1)}_{..}$ and $\rho^{(2)}_{..}$ along the year and age axis respectively, and (apart from a scale factor, as actually h > 1) along the diagonal in the cohort case. $p_{\text{age}}$ is thus defined as:

$$p_{\text{age}}(\rho_{tk}) = \left(\left(\rho_{t(k+1)} - \rho_{tk}\right) - \left(\rho_{tk} - \rho_{t(k-1)}\right)\right)^2 = \left(\rho_{t(k+1)} - 2\rho_{tk} + \rho_{t(k-1)}\right)^2.$$

In case of the birth year cohort approach, $p_{\text{age}}$ is defined as

$$p_{\text{age}}(\rho_{tk}) = \left(\rho_{(t+1)(k+1)} - 2\rho_{tk} + \rho_{(t-1)(k-1)}\right)^2,$$

while for the calendar year case, $p_{\text{year}}$ is defined as

$$p_{\text{year}}(\rho_{tk}) = \left(\rho_{(t+1)k} - 2\rho_{tk} + \rho_{(t-1)k}\right)^2.$$
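The three second-difference penalties can be sketched in a few lines of NumPy. This is a minimal illustration, not the study's code: the function names are hypothetical, the surface is stored as an array indexed by [year, age], and the penalties are evaluated at interior points only (the text does not state how boundary points are treated, so that choice is an assumption).

```python
import numpy as np

def p_age(rho):
    """Squared second differences along the age axis (axis 1)."""
    d2 = rho[:, 2:] - 2.0 * rho[:, 1:-1] + rho[:, :-2]
    return d2 ** 2

def p_year(rho):
    """Squared second differences along the calendar-year axis (axis 0)."""
    d2 = rho[2:, :] - 2.0 * rho[1:-1, :] + rho[:-2, :]
    return d2 ** 2

def p_cohort(rho):
    """Squared second differences along the diagonal (constant birth year)."""
    d2 = rho[2:, 2:] - 2.0 * rho[1:-1, 1:-1] + rho[:-2, :-2]
    return d2 ** 2

years = np.arange(1995, 2010)            # t = 1995, ..., 2009
ages = np.arange(100)                    # k = 0, ..., 99
t, k = np.meshgrid(years, ages, indexing="ij")

plane = 0.01 * t - 0.02 * k + 3.0        # flat plane: no penalty in any direction
curved = 0.001 * (k - 40.0) ** 2         # quadratic in age, constant over time
```

For the plane all three penalties vanish (up to rounding), whereas the quadratic surface has a constant second difference of 0.002 along both the age axis and the cohort diagonal and none along the calendar-year axis.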

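For brevity, using

$$p_1(\rho) = \sum_{t \in \text{years}} \sum_{i \in \text{ages}} p_{\text{year}}(\rho_{ti})$$

and

$$p_2(\rho) = \sum_{t \in \text{years}} \sum_{i \in \text{ages}} p_{\text{age}}(\rho_{ti})$$

we summarize (3), (4) and (5) as:

$$f(\theta) = -\text{log likelihood of (2)} + \lambda_{11}\, p_1\left(\rho^{(1)}_{..}\right) + \lambda_{12}\, p_2\left(\rho^{(1)}_{..}\right) + \lambda_{21}\, p_1\left(\rho^{(2)}_{..}\right) + \lambda_{22}\, p_2\left(\rho^{(2)}_{..}\right). \tag{6}$$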
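To make the structure of (6) concrete, the sketch below evaluates a penalized objective for given parameter values. It assumes a Poisson likelihood for the counts and applies the penalties to the modeled (logarithmic) values; the function names, argument layout and toy data are illustrative assumptions, and no optimizer is included.

```python
import numpy as np

def roughness(x, axis):
    """Sum of squared second differences of x along the given axis."""
    return float(np.sum(np.diff(x, n=2, axis=axis) ** 2))

def objective(counts, log_mu, rho1, rho2, lam11, lam12, lam21, lam22):
    """Penalized negative log-likelihood in the spirit of (6).

    counts, log_mu: arrays of shape (T, I, J); rho1: (T, I); rho2: (T, J).
    """
    # Poisson negative log-likelihood, dropping the constant sum(log(n!)).
    neg_loglik = float(np.sum(np.exp(log_mu) - counts * log_mu))
    penalty = (lam11 * roughness(rho1, axis=0)     # p1: calendar-year direction
               + lam12 * roughness(rho1, axis=1)   # p2: age direction
               + lam21 * roughness(rho2, axis=0)
               + lam22 * roughness(rho2, axis=1))
    return neg_loglik + penalty

# Toy data: 5 years, 6 victim-driver ages, 4 colliding-driver ages.
T, I, J = 5, 6, 4
t = np.arange(T)[:, None]
rho1 = 0.1 * t + 0.05 * np.arange(I)[None, :]         # a plane: zero roughness
rho2 = -0.1 * t + 0.02 * np.arange(J)[None, :] ** 2   # curved along age
log_mu = rho1[:, :, None] + rho2[:, None, :]          # c1 = c2 = 0 for brevity
counts = np.round(np.exp(log_mu))

f_unpenalized = objective(counts, log_mu, rho1, rho2, 0.0, 0.0, 0.0, 0.0)
f_penalized = objective(counts, log_mu, rho1, rho2, 1.0, 1.0, 1.0, 1.0)
```

In this toy case only the age-direction roughness of `rho2` is non-zero, so the penalized and unpenalized values differ by exactly that term.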

Note that if $\lambda_{11} = \lambda_{12} = \lambda_{21} = \lambda_{22} = 0$, minimizing (6) for θ yields an overfitted, almost saturated model. On the other hand, minimizing (6) for θ with $\lambda_{11}$, $\lambda_{12}$, $\lambda_{21}$ and $\lambda_{22}$ sufficiently large yields an approach similar to a classic approach to solving a constrained minimization problem (Luenberger, 1984, Chapter 12). That approach can be used to maximize the log likelihood of (2) while satisfying the constraints $p_1(\rho^{(1)}_{..}) = 0$, $p_2(\rho^{(1)}_{..}) = 0$, $p_1(\rho^{(2)}_{..}) = 0$ and $p_2(\rho^{(2)}_{..}) = 0$ (Luenberger, 1984, Chapter 12). The relevance of this finding for the current analysis is that, if these constraints are met, $\rho^{(1)}_{..}$ and $\rho^{(2)}_{..}$ are effectively two dimensional (flat) planes. Thus, although many parameters are used, the model degrees of freedom are equivalent to the degrees of freedom of a model that explicitly models a plane for each. Practically, with finite $\lambda_{11}$, $\lambda_{12}$, $\lambda_{21}$ and $\lambda_{22}$, the equivalent degrees of freedom lie somewhere in between. [We have not developed an estimate of the effective degrees of freedom yet.]

A further consequence is that this approach will not penalize the addition of a linear function to either $\rho^{(1)}_{..}$ or $\rho^{(2)}_{..}$. As a result, the penalty on $\rho^{(\cdot)}_{..}$ is identical to the penalty on $\rho^{(\cdot)}_{..} + (a \times t + b)$. It is therefore immediately clear from (2) that some standardization is required, as $\rho^{(1)}_{ti} + \rho^{(2)}_{tj} = (\rho^{(1)}_{ti} - (a \times t + b)) + (\rho^{(2)}_{tj} + (a \times t + b))$ for any linear function of time $a \times t + b$. This means that only relative tendencies can be identified within $\rho^{(1)}_{ti}$ and separately within $\rho^{(2)}_{tj}$. To solve this identification problem, $\rho^{(2)}_{..}$ for the forty year old age group is fixed: $\rho^{(2)}_{t\,40} \equiv 0$.
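The identification problem and the chosen standardization can be illustrated numerically. In the sketch below the two surfaces are randomly generated (hypothetical) values on the logarithmic scale, and `standardize` is an illustrative helper, not a function from the study: it moves the year-specific value at age 40 from the second surface to the first.

```python
import numpy as np

years = np.arange(1995, 2010, dtype=float)
rng = np.random.default_rng(1)
rho1 = rng.normal(size=(15, 100))          # hypothetical surface, [year, age i]
rho2 = rng.normal(size=(15, 100))          # hypothetical surface, [year, age j]

# Shift a linear function of time from one surface to the other.
a, b = 0.05, -100.0
shift = (a * years + b)[:, None]
rho1_alt = rho1 - shift
rho2_alt = rho2 + shift

# The sum entering (2) is unchanged for every (t, i, j) ...
total = rho1[:, :, None] + rho2[:, None, :]
total_alt = rho1_alt[:, :, None] + rho2_alt[:, None, :]

# ... and so are the second-difference penalties in both directions.
def roughness(x):
    return (np.sum(np.diff(x, n=2, axis=0) ** 2),
            np.sum(np.diff(x, n=2, axis=1) ** 2))

def standardize(rho1, rho2, ref_age=40):
    """Fix rho2 at the reference age to 0 in every year, keeping the sum."""
    ref = rho2[:, [ref_age]]               # shape (T, 1)
    return rho1 + ref, rho2 - ref

std1, std2 = standardize(rho1_alt, rho2_alt)
```

After standardization the reference column of the second surface is exactly zero, and the sum of the two components is the same as before.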

2.3. Estimation of λ11, λ12, λ21 and λ22

In Section 2.2 it is described how the log likelihood of (2) can be optimized while to some extent controlling the smoothness of $\rho^{(1)}_{..}$ and $\rho^{(2)}_{..}$ using the parameters $\lambda_{11}$, $\lambda_{12}$, $\lambda_{21}$ and $\lambda_{22}$. This section describes how values for these parameters are estimated. Estimating $\lambda_{11}$, $\lambda_{12}$, $\lambda_{21}$ and $\lambda_{22}$ alongside the parameters in $\rho^{(1)}_{..}$ and $\rho^{(2)}_{..}$ will result in an overfitted model. The remedy chosen in this study is to fit the model (with given $\lambda_{11}$, $\lambda_{12}$, $\lambda_{21}$ and $\lambda_{22}$) to part of the data, and judge the fit based on how well the model explains the rest of the data.

Practically, the data are ordered by time and age. All combinations of time and age are considered and are randomly partitioned into 4 partitions¹, which were then kept throughout the project. Each time and age combination (pair) appears in exactly one partition.

An example of a random partition is given in Figure 2. Figure 2 contains 8 panes in two columns. All 8 panes have 10 × 10 squares. Concentrating on the right hand column, we can assume the color of each square to represent the presence of a year × age pair included in that pane: if the square is black the pair is present, if the square is white the pair is not present. The partition can be created by randomly distributing the pairs over the four panes.

On the left hand side of Figure 2 the complement of the partition on the right hand side is given. For each partition (row) the model is fitted to the data associated with the black pixels on the left hand side using given (thus fixed) values for $\lambda_{11}$, $\lambda_{12}$, $\lambda_{21}$ and $\lambda_{22}$. The results are then used to predict the data associated with the black pixels on the right hand side. As all data are included in exactly one partition, a prediction of all data points is thus achieved (the 4 partitions are mutually exclusive and collectively exhaustive). The overall likelihood of all victim counts, based on the Poisson likelihood given these combined predictions, can then be calculated. This likelihood is considered a function of $\lambda_{11}$, $\lambda_{12}$, $\lambda_{21}$ and $\lambda_{22}$. It is maximized for these parameters, and the maximizing values of $\lambda_{11}$, $\lambda_{12}$, $\lambda_{21}$ and $\lambda_{22}$ are thereafter used in the same procedure detailed in Section 2.2, but now with all data (no partitions), to obtain estimates of $\rho^{(1)}_{..}$ and $\rho^{(2)}_{..}$. Results of the values obtained for the models considered can be found in Table 6. The logarithms of the parameters $\lambda_{11}$, $\lambda_{12}$, $\lambda_{21}$ and $\lambda_{22}$ obtained through this procedure can be found in Table 8.

¹ Also 20, 100 and in some cases 200 partitions have been used to assess the effect on the solution of the choice of the number of partitions. Results of these analyses are not reported here.

Figure 2: Estimation scheme using the random partition. The four rows are associated with one random partition (right hand panel). In the left hand panel the complement of that partition is displayed. For each partition the model is fitted to the data points marked black on the left hand side. The results are used to predict the data marked black on the right hand side.
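The partitioning and out-of-sample scoring described in this section can be sketched as follows. This is a simplified illustration under stated assumptions: counts are stored per year × age pair only (the study works with two ages per record), the helper names are hypothetical, and `mean_model` is a deliberately trivial stand-in for the penalized model of Section 2.2.

```python
import numpy as np
from math import lgamma

def random_partition(n_years, n_ages, k, rng):
    """Assign every (year, age) pair to exactly one of k partitions."""
    labels = rng.permutation(n_years * n_ages) % k
    return labels.reshape(n_years, n_ages)

def poisson_loglik(counts, mu):
    """Poisson log-likelihood of the counts given predicted means mu."""
    log_fact = np.vectorize(lgamma)(counts + 1.0)
    return float(np.sum(counts * np.log(mu) - mu - log_fact))

def cross_validated_loglik(counts, labels, fit_predict):
    """Sum of out-of-sample log-likelihoods over all partitions."""
    total = 0.0
    for part in np.unique(labels):
        held_out = labels == part
        mu = fit_predict(counts, ~held_out)     # fitted on the complement
        total += poisson_loglik(counts[held_out], mu[held_out])
    return total

def mean_model(counts, train_mask):
    """Placeholder model: predicts the mean of the training counts."""
    return np.full(counts.shape, counts[train_mask].mean())

rng = np.random.default_rng(3)
counts = rng.poisson(5.0, size=(15, 100))        # toy victim counts
labels = random_partition(15, 100, k=4, rng=rng)
cv_ll = cross_validated_loglik(counts, labels, mean_model)
```

In the procedure of this section, `cv_ll` would be evaluated for given penalty parameters and then maximized over them; that outer optimization is omitted here.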

3. Application

3.1. Models considered

Eight different models are distinguished in this study, based on how traffic volume is considered, on the manner in which age dependency is modeled and on whether or not a log-linear time trend is considered in the risk components:

1. Log-linear: Smooth along the age axis. Log-linear along the time axis. Coefficients for traffic volume c1 and c2 (in (2)) fixed at 1.

2. Log-linear-C: Smooth along the age axis. Log-linear along the time axis. Coefficients for traffic volume c1 and c2 estimated.

3. Smooth: Smooth along the age axis and the time axis. Coefficients for traffic volume c1 and c2 fixed at 1.

4. Smooth-C: Smooth along the age axis and the time axis. Coefficients for traffic volume c1 and c2 estimated.

5. Log-linear No Mobility: Smooth along the age axis. Log-linear along the time axis. Coefficients for traffic volume c1 and c2 fixed at 0.

6. Smooth No Mobility: Smooth along the age axis and the time axis. Coefficients for traffic volume c1 and c2 fixed at 0.

Initially the cohort models were also considered:

a. Smooth Cohort: Smooth along the birth year cohort axis and the time axis. Coefficients for traffic volume fixed at 1.

b. Smooth Cohort-C: Smooth along the birth year cohort axis and the time axis. Coefficients for traffic volume estimated.

Although it is possible in practice to obtain the log-linear models by implementing a very large value for $\lambda_{12}$ and $\lambda_{22}$, the linear form is explicitly implemented in $\rho^{(1)}$ and $\rho^{(2)}$.
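The effect of a very large penalty parameter can be demonstrated on a one-dimensional series. The sketch below is a simplification: it uses penalized least squares rather than the penalized Poisson likelihood of this paper, and the data, the function name and the penalty value are hypothetical. It shows that a large second-difference penalty reproduces an ordinary straight-line fit.

```python
import numpy as np

def smooth_second_diff(y, lam):
    """Minimize ||y - x||^2 + lam * (sum of squared second differences of x)."""
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)    # (n - 2) x n second-difference matrix
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

x = np.arange(1995, 2010, dtype=float) - 2002.0   # centered calendar years
rng = np.random.default_rng(2)
y = 0.1 * x + rng.normal(scale=0.5, size=x.size)  # toy series with a trend

fit_large = smooth_second_diff(y, lam=1e8)        # very large penalty
line = np.polyval(np.polyfit(x, y, 1), x)         # ordinary least-squares line
max_gap = float(np.max(np.abs(fit_large - line)))

fit_zero = smooth_second_diff(y, lam=0.0)         # no penalty: reproduces y
```

Implementing the linear form explicitly, as done in this study, avoids the numerical difficulties that such extreme penalty values can introduce.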

Visual inspection of the $\rho^{(1)}_{..}$ and $\rho^{(2)}_{..}$ surfaces (Figures 3, 4, 5 and 6) reveals that, at least for car × car and bicycle × car crashes, not much can be gained from the cohort approach (items a and b above). It appears that the age of the driver determines the ‘risk’ to a greater extent than the year of birth of the driver. Therefore, the analysis is focussed on the remaining models.

3.2. Assessment of the relevance of traffic volumedata

3.2.1. Visual inspection

Inspecting Figures 3, 4, 5 and 6 reveals that, as far as the shapes of $\rho^{(1)}$ and $\rho^{(2)}$ are concerned, the solutions with c1 and c2 estimated are very similar to those with c1 ≡ 1 and c2 ≡ 1, whereas c1 ≡ 0 and c2 ≡ 0 yield substantially different results.

It is clear from Figures 3(b), 3(d), 4(b), 4(d), 5(b) and 5(d) (right hand side panels) that young drivers pose a risk, regardless of whether they collide with bicycles or cars. Figures 3(f), 4(f) and 5(f) reveal that, when traffic volume data are not used, older drivers appear to have a greater share. This is caused by the fact that in these models no correction for exposure is made. Similar phenomena can be found by inspecting Figures 3(a), 3(c), 3(e), 4(a), 4(c) and 4(e). The fact that in Figures 3(e) and 4(e) the younger bicyclists have such a high peak, where this effect is not present in the other figures, is most likely also due to exposure. Finally, it seems visible that the elderly are also at risk, both as a victim and as a colliding driver. However, data are sparse for elderly drivers, in particular for ages 80+, results for which are therefore suppressed in the figures.


(a) Joint ρ(1), c1 estimated. (b) Joint ρ(2), c2 estimated.

(c) Joint ρ(1), c1 ≡ 1. (d) Joint ρ(2), c2 ≡ 1.

(e) Joint ρ(1), c1 ≡ 0. (f) Joint ρ(2), c2 ≡ 0.

Figure 3: Joint surface plots (100 samples) for crash type “bicycle × car” (smooth time).





Figure 4: Joint surface plots (100 samples) for crash type “bicycle × car” (log-linear time).





Figure 5: Joint surface plots (100 samples) for crash type “car × car” (smooth time).




(e) Joint ρ(1) , c1 ≡ 0. (f) Joint ρ(2), c2 ≡ 0.

Figure 6: Joint surface plots (100 samples) for crash type “car × car” (log-linear time).


3.2.2. Quantitative measures

The general approach taken in this study is to partition the data into (n = 4) partitions (see Figure 2), where each partition is predicted out-of-sample by a model fitted to the rest of the data. In order to assess the relative merits of the models considered, the predictive performance (in terms of likelihood for these partitions) is compared between models. Because the random nature of the partitioning scheme used potentially affects the results, comparisons are made based on 100 reruns of the partitioning scheme. This results in 100 × 4 = 400 partitions based on which predictive performance is compared.

To compare two models, pairwise comparisons of the log-likelihood of the partition given the predictions are made. The partitioning approach was applied a hundred times (a hundred samples): each time, all data were partitioned into four partitions. For each sample, the data of the four partitions are predicted out of sample using the remaining data. The relative performance of models is determined by comparing the predictive performance on each of the 100 × 4 = 400 partitions. The models based on estimated c1 and c2, on c1 ≡ 1 and c2 ≡ 1, and on c1 ≡ 0 and c2 ≡ 0 are compared pairwise by calculating the difference in log-likelihood (based on the Poisson distribution) between the out-of-sample predictions of the models to be compared. Table 1 gives the results of this pairwise comparison of the cross-validated fit; each comparison is listed in the left-hand column.
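The kind of summary reported in Table 1 (mean, median and selected percentiles of pairwise differences) and the share of negative differences can be computed as sketched below. The function name and the toy log-likelihood values are hypothetical; in particular, the second model is constructed to be exactly one log-likelihood unit better on every partition, purely so that the outcome is easy to check.

```python
import numpy as np

def summarize_differences(loglik_a, loglik_b):
    """Summary statistics of per-partition differences loglik_a - loglik_b."""
    diff = np.asarray(loglik_a, dtype=float) - np.asarray(loglik_b, dtype=float)
    q025, q25, q75, q975 = np.percentile(diff, [2.5, 25.0, 75.0, 97.5])
    return {
        "mean": float(diff.mean()),
        "median": float(np.median(diff)),
        "2.5%": float(q025),
        "25%": float(q25),
        "75%": float(q75),
        "97.5%": float(q975),
        "pct_negative": float(100.0 * np.mean(diff < 0.0)),
    }

# Toy example: 100 samples x 4 partitions = 400 out-of-sample log-likelihoods.
rng = np.random.default_rng(4)
loglik_b = rng.normal(-90.0, 5.0, size=400)
loglik_a = loglik_b + 1.0
summary = summarize_differences(loglik_a, loglik_b)
```

A positive mean with few negative differences indicates that the first model of the pair predicts the held-out partitions better on most partitions.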

3.3. Discussion

One property of the method used that needs to be considered is the fact that the partition is randomly chosen. This means that the results found in Table 6 and Table 8 may be different when a different but equally possible partition is used. For that reason part of the analysis is performed using nine other random partitions. See Section ??.


(a) bicycle × car

Comparison                             Mean   Median     2.5%      25%      75%    97.5%
“C estimated” and “C fixed at 0”      16.31     8.85    −0.66     8.85    23.97    36.12
“C fixed at 1” and “C fixed at 0”     11.23     0.53   −13.93     0.53    22.34    42.26
“C estimated” and “C fixed at 1”       5.09     1.53    −5.68     1.53     9.32    14.21

(b) bicycle × car (smoothed time)

Comparison                             Mean   Median     2.5%      25%      75%    97.5%
“C estimated” and “C fixed at 0”      23.33    13.78    −0.59    13.78    33.26    53.05
“C fixed at 1” and “C fixed at 0”     22.31    11.43    −6.06    11.43    32.50    59.24
“C estimated” and “C fixed at 1”       1.02    −2.44   −14.08    −2.44     4.70    20.50

(c) car × car

Comparison                             Mean   Median     2.5%      25%      75%    97.5%
“C estimated” and “C fixed at 0”      25.54    15.41    −3.28    15.41    37.02    55.11
“C fixed at 1” and “C fixed at 0”     26.91    14.47    −1.91    14.47    38.83    63.69
“C estimated” and “C fixed at 1”      −1.37    −2.52    −6.91    −2.52     0.36     2.13

(d) car × car (smoothed time)

Comparison                             Mean   Median     2.5%      25%      75%    97.5%
“C estimated” and “C fixed at 0”      18.17     9.67    −5.54     9.67    27.53    43.63
“C fixed at 1” and “C fixed at 0”     19.72     9.55    −5.07     9.55    29.94    50.38
“C estimated” and “C fixed at 1”      −1.55    −3.84   −11.32    −3.84    −0.20    17.20

Table 1: Statistics on pairwise differences in cross-validated log-likelihood over all partitions and all samples, using penalty values optimized for each estimate.

[Figure 7 panels: (a) vicCarCarplot.pdf; (b) vicBicycleCarplotSamples.pdf. Both panels plot the number of victims against time (1995 to 2010).]

Figure 7: Aggregate development


4. Discussion

With respect to the usability, the following results have emerged. Firstly, the whole estimation procedure is computationally very intensive. With current computer systems computation may take a prolonged time. Future computer technology is likely to lessen this issue.

Since the models have been implemented, some limited further analysis has been performed. This analysis does not constitute a complete statistical analysis, which would require further research and implementation efforts. The current analysis is intended as an exploration into the possibilities of further steps in this direction: stop here, develop a reduced model based on off-the-shelf techniques, or perform a complete statistical analysis.

Based on the results found in Tables 2–5 it is concluded that the car × car model appears to be the best among the ones considered. This however need not be the case for the bicycle × car model. From Tables 2–3 it can be inferred that the model [...] regardless of whether a smooth time development is assumed.

The reader should be warned that this discussion concerns the presence of differences in fit values, not considering the magnitude of these differences nor whether these values indicate significant differences in fit of the data to the respective models. It is assumed that finding similar fit values likely translates into not finding significant differences. The reverse however need not be true: finding a model that systematically fits better than another model need [...]

5. Tables


Comparison                           Partition 1   Partition 2   Partition 3   Partition 4
“C estimated” and “C fixed at 0”            7.00          8.00          4.00         10.00
“C fixed at 1” and “C fixed at 0”           6.00          7.00          4.00         11.00
“C estimated” and “C fixed at 1”           65.00         63.00         67.00         66.00

Table 2: Percentage negative values of differences between cross-validated fit of individually optimal models (partitions 1–4) car × car


Table 3: Percentage negative values of differences between cross-validated fit of individually optimal models (partitions1–4) car × car (smoothed time)


Table 4: Percentage negative values of differences between cross-validated fit of individually optimal models (partitions1–4) bicycle × car


Table 5: Percentage negative values of differences between cross-validated fit of individually optimal models (partitions1–4) bicycle × car (smoothed time)


(a) Transport mode victim driver “Bicycle”

Model                          1            2            3          4
Log-linear            −35799.940   −35679.299   280023.213    274.616
Log-linear-C          −35794.431   −35666.450   252362.521    246.576
Smooth                −35736.805   −35526.798    75726.310     72.011
Smooth-C              −35736.932   −35531.810    74813.500     71.156
Smooth Cohort         −35745.389   −35524.692    90206.552     85.876
Smooth Cohort-C       −35746.228   −35528.655    90344.299     86.032
Smooth No Mobility    −35754.935   −35502.050    72289.671     68.719

(b) Transport mode victim driver “Car”

Model                          1            2            3          4
Log-linear            −30593.271   −30498.505    94334.309     93.544
Log-linear-C          −30595.298   −30497.543    92288.859     90.866
Smooth                −30584.287   −30467.916    73135.764     68.491
Smooth-C              −30586.193   −30468.409    72772.415     68.168
Smooth Cohort         −30589.505   −30512.050    94490.698     93.784
Smooth Cohort-C       −30594.576   −30518.321    92823.005     91.675
Smooth No Mobility    −30608.460   −30436.176    73487.170     66.668

Table 6: Statistical fit information for various models, 1 = Cross-validated log-likelihood, 2 = Overall log-likelihood, 3 = Ordinary least squares, 4 = Weighted least squares


Model                 p1(ρ(1))   p1(ρ(2))   p2(ρ(1))   p2(ρ(2))
Log-linear               2.745      2.188      0.000      0.000
Log-linear-C             2.614      3.654      0.000      0.000
Smooth                   2.806      2.999      0.000      1.068
Smooth-C                 2.554      2.380      0.000      1.108
Smooth Cohort            2.835     10.459      0.000      0.844
Smooth Cohort-C          2.602      9.512      0.000      0.835
Smooth No Mobility       3.934      6.158      0.010      1.218

Model                 p1(ρ(1))   p1(ρ(2))   p2(ρ(1))   p2(ρ(2))
Log-linear               0.253      0.651      0.000      0.000
Log-linear-C             0.285      0.605      0.000      0.000
Smooth                   0.162      0.470      0.058      0.000
Smooth-C                 0.184      0.341      0.064      0.000
Smooth Cohort            0.185      0.489      0.000      0.000
Smooth Cohort-C          0.202      0.234      0.000      0.000
Smooth No Mobility       0.792      2.405      0.034      0.007

Table 7: Penalty information for various models, p1(ρ(1)) = Age year penalty victim driver, p1(ρ(2)) = Age year penalty opponent driver, p2(ρ(1)) = Calendar year penalty victim driver, p2(ρ(2)) = Calendar year penalty opponent driver



Model                    λ11      λ21      λ12      λ22
Log-linear             1.971    1.891        -        -
Log-linear-C           1.975    1.506        -        -
Smooth                 1.967    2.203   28.849    3.116
Smooth-C               2.011    2.412   23.112    3.051
Smooth Cohort          2.287    2.171   25.156    3.435
Smooth Cohort-C        2.296    2.302   23.695    3.426
Smooth No Mobility     1.963    1.673    6.626    3.112

Model                    λ11      λ21      λ12      λ22
Log-linear             3.828    2.968        -        -
Log-linear-C           3.709    3.030        -        -
Smooth                 4.682    3.242    4.998   23.642
Smooth-C               4.518    3.514    4.945   23.061
Smooth Cohort          5.317    3.520   23.941   26.001
Smooth Cohort-C        5.069    4.398   25.463   20.836
Smooth No Mobility     3.295    2.189    5.774    6.989

Table 8: Logarithms of penalty parameters for various models, λ11: Age year penalty victim driver, λ21: Age year penalty opponent driver, λ12: Calendar year penalty victim driver, λ22: Calendar year penalty opponent driver

(a) Parameter “Coefficient mobility victim driver”

Model                 Bicycle      Car
Log-linear                  -        -
Log-linear-C            0.836    0.862
Smooth                      -        -
Smooth-C                0.804    0.939
Smooth Cohort               -        -
Smooth Cohort-C         0.804    0.896
Smooth No Mobility          -        -

(b) Parameter “Coefficient mobility opponent driver”

Model                 Bicycle      Car
Log-linear                  -        -
Log-linear-C            0.497    1.002
Smooth                      -        -
Smooth-C                1.359    1.084
Smooth Cohort               -        -
Smooth Cohort-C         1.243    0.992
Smooth No Mobility          -        -

Table 9: Cvalues

Model                    Bicycle          Car
Log-linear            −35799.940   −30593.271
Log-linear-C          −35794.431   −30595.298
Smooth                −35736.805   −30584.287
Smooth-C              −35736.932   −30586.193
Smooth Cohort         −35745.389   −30589.505
Smooth Cohort-C       −35746.228   −30594.576
Smooth No Mobility    −35754.935   −30608.460

Table 10: PenaltyFit


Model                 Bicycle      Car
Log-linear              2.745    0.253
Log-linear-C            2.614    0.285
Smooth                  2.806    0.162
Smooth-C                2.554    0.184
Smooth Cohort           2.835    0.185
Smooth Cohort-C         2.602    0.202
Smooth No Mobility      3.934    0.792

Table 11: agePenalty1


Table 12: agePenalty2


Table 13: riskFit


Table 14: ssq



Table 15: wsq


Table 16: yearPenalty1


Table 17: yearPenalty2


6. Conclusion

References

Abramowitz, M., Stegun, I.A., 1968. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover, New York. Fifth Dover printing.

Davidse, R.J., 2007. Assisting the older driver. Ph.D. thesis. Rijksuniversiteit Groningen. Groningen, the Netherlands.

Efron, B., Tibshirani, R., 1993. An Introduction to the Bootstrap. Monographs on statistics and applied probability, Chapman & Hall.

Gennarelli, T.A., Wodzin, E. (Eds.), 2005. Abbreviated Injury Scale (AIS) 2005. Association for the Advancement of Automotive Medicine AAAM, Barrington, IL. URL: http://www.aaam.org/.

Green, P.J., Silverman, B.W., 1994. Nonparametric regression and generalized linear models: a roughness penalty approach. Chapman & Hall/CRC, Boca Raton, Fl.

Hastie, T., Tibshirani, R., Friedman, J., 2008. The elements of statistical learning: data mining, inference and prediction. 2 ed., Springer. URL: http://www-stat.stanford.edu/~tibs/ElemStatLearn/.

Hastie, T.J., Tibshirani, R.J., 1990. Generalized Additive Models. Monographs on Statistics & Applied Probability, Chapman & Hall/CRC, Boca Raton, Fl.

Luenberger, D.G., 1984. Linear and nonlinear programming. Second ed., Addison-Wesley, Reading, Massachusetts.

Maycock, G., Lockwood, C.R., Lester, J.F., 1991. The accident liability of car drivers. Technical Report RR315. Transport Research Laboratory. Wokingham, United Kingdom. URL: http://www.trl.co.uk/online_store/reports_publications/trl_reports/cat_road_user_safety/report_the_accident_liability_of_car_drivers.htm.

van Norden, Y., Bijleveld, F.D., Stipdonk, H.L., 2010. Beschrijving van een verkennend model voor de verkeersveiligheid. Technical Report R-2010-34. SWOV. Leidschendam, the Netherlands. In Dutch.

Stipdonk, H., Bijleveld, F., van Norden, Y., Commandeur, J., 2013. Analysing the development of road safety using demographic data. Accident Analysis & Prevention 60, 435-444. URL: http://www.sciencedirect.com/science/article/pii/S0001457512002928, doi:10.1016/j.aap.2012.08.005.

Vlakveld, W.P., 2011. Hazard anticipation of young novice drivers. Ph.D. thesis. Rijksuniversiteit Groningen. Groningen, the Netherlands.
