Last changed by: bijleveld, svn revision: 1055, date: 2013-12-16 17:07:41 +0100 (Mon, 16 Dec 2013)
Large scale disaggregation by age and conflict type to assess the impact of
past and future demographic changes on road safety in the Netherlands
Frits Bijleveld (a, b, ∗), Yvette van Norden (c), Henk Stipdonk (a)
(a) SWOV Institute for Road Safety Research, The Hague, The Netherlands
(b) VU University Amsterdam, The Netherlands
(c) Erasmus MC-Daniel den Hoed Cancer Center, P.O. box 5201, 3008 AE Rotterdam, The Netherlands
Abstract
In this paper ...
Keywords: Disaggregation, Road safety, Exposure to risk, Injury risk, the Netherlands, Age, Crash type, Distance travelled, Population
1. Introduction
This paper presents details of an approach to large-scale disaggregation of road crash data by age and conflict type that is implemented in a longer-term road safety forecasting study in the Netherlands (van Norden et al., 2010).
One of the presumptions of the approach is that, as a result of changes over time in the demographics of a country, the age distribution of traffic participants (slowly) changes over time (Stipdonk et al., 2013). As the population in a country ages, relatively more elderly individuals may take part in travel. This in turn may result in relatively more elderly drivers, which is likely to be associated with more travel by the demographic group of elderly drivers. By the same token, it is likely that the share of travel by younger drivers decreases over time.
As younger drivers are considered to be more dangerous than older (assumed to be more experienced) drivers (Maycock et al., 1991; Vlakveld, 2011, Chapter 1, Figure 1.3), a future reduction of the share of young drivers in travel may result in an improvement of safety for all traffic participants.
∗Corresponding author
Conversely, as elderly drivers are more vulnerable to harm than younger drivers once involved in a crash (Davidse, 2007, Chapter 1), an increase in the share of travel by elderly drivers may result in an increase in the share of elderly victims among all victims.
When longer-term forecasts are to be made, one aspect among others to consider is whether such changes in demographics may affect these forecasts.
At its core, the model presented in this paper decomposes the development of the expected number of victims of each crash type into the product of two developments: (1) one dependent on the age of the driver of the victim (thus not necessarily the age of the victim) and his or her mode of transportation, and (2) one dependent on the age of the driver of the vehicle the victim initially collides with and his or her mode of transportation. In the process, the model allows for flexible options to assess the road safety development of groups of road users of interest to policy makers.
In their study, van Norden et al. (2010) present a forecasting approach for two types of victims: fatalities, and victims of road crashes requiring inpatient care with a maximum score on the Abbreviated Injury Scale (Gennarelli and Wodzin, 2005) of 2 or above. For both fatalities and the thus defined seriously injured, the victim counts are further disaggregated by crash type. The forecast horizon in van Norden et al. (2010) was 2020.
Preprint submitted to Elsevier December 17, 2013
This paper presents details of that approach applied to seriously injured victims of crashes with cars where the victim was traveling by car or by bicycle. A less extensive disaggregation is used in van Norden et al. (2010) for other crash types and for fatalities; those results are not reported here.
Age, expressed in age years (denoting the age of a driver at the time of the crash), and time, expressed in calendar years (denoting the year in which a crash took place), are distinguished. Both quantities are rounded down to the nearest integer. As the ages in years of both the driver of the victim and the driver of the colliding vehicle are considered separately, and the development of the number of victims is to be modeled for each combination, a substantial number of submodels has to be identified, involving many parameters. It is therefore of paramount importance to control the (equivalent, effective) degrees of freedom in the model. To do this, the methods used in this paper are smoothing techniques based on penalized likelihood using (k-fold) cross-validation (see, for instance, Green and Silverman, 1994; Hastie et al., 2008; Efron and Tibshirani, 1993; Hastie and Tibshirani, 1990).
This paper is organized as follows...
2. Method
2.1. Introduction
This paper presents the details of models for the number of inpatient victims resulting from car-car and bicycle-car crashes used in van Norden et al. (2010). A model is developed separately for both crash types that decomposes the predicted number of victims
• by the age i of the driver of the victim and
• the age j of the driver of the colliding vehicle,
into two components depending either on the age of the driver of the victim or on the age of the driver of the vehicle the victim collides with. These components are in principle not considered constant over time.
Assuming the disaggregate number of victims of a certain crash type in year t, where the age of the driver of the victim is i and the age of the colliding driver is j, to be denoted by $n_{tij}$, we loosely obtain:
\[ n_{tij} = A^{(1)}_{ti} \times A^{(2)}_{tj} \times E_{tij}, \tag{1} \]
where $A^{(1)}_{ti}$ is defined as the contribution due to the driver of the victim (with age i) in year t and $A^{(2)}_{tj}$ the contribution due to the driver of the vehicle collided with (with age j). $E_{tij}$ is an appropriately defined (multiplicative) disturbance term. In our application t in (1) ranges from 1995 to 2009, while $0 \le i, j < 100$. A log-linear approach is taken for (1), further decomposing $A^{(1)}_{ti}$ and $A^{(2)}_{tj}$ into a part ($M^{(1)}_{ti}$, respectively $M^{(2)}_{tj}$) related to traffic volume (distance travelled) and a part ($\log(\rho^{(1)}_{ti})$, $\log(\rho^{(2)}_{tj})$) related to a generalized risk. We thus obtain
\[ \log(n_{tij}) = c_1 \times \log\big(M^{(1)}_{ti}\big) + \log\big(\rho^{(1)}_{ti}\big) + c_2 \times \log\big(M^{(2)}_{tj}\big) + \log\big(\rho^{(2)}_{tj}\big) + \varepsilon_{tij} \tag{2} \]
where $\varepsilon_{tij}$ is an appropriately defined disturbance term, which is further omitted where possible for brevity. $c_1$ and $c_2$ are regression coefficients. The data used for $M^{(1)}_{ti}$ and $M^{(2)}_{tj}$ are obtained from [cite: ‘van-norden-yrs-of-zo-anders-safetynet-d710?’ here], where a smoothing technique based on travel survey data is used. The original travel survey data are obtained from the Dutch national travel survey collected by the ministry of transport and Statistics Netherlands ([cite: ‘SWOV/NTS’ here][cite: ‘SWOVwebsite’ here]). The estimates $M^{(1)}_{ti}$ and $M^{(2)}_{tj}$ are assumed fixed and known in this study.
The central task in this project is to identify and quantify logarithms of $\rho^{(1)}_{ti}$ and $\rho^{(2)}_{tj}$, both for bicycle-car crashes and car-car crashes. The logarithms are modeled in order to maintain positiveness of the resulting estimates of $\rho^{(1)}_{ti}$ and $\rho^{(2)}_{tj}$.
$\rho^{(1)}_{ti}$ can roughly be considered the function that describes the age dependency of the risk of the driver of the victim to ‘produce’ a victim, while $\rho^{(2)}_{tj}$ could roughly be considered the function that describes the age dependency of the risk to become involved in a crash as the colliding party.
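As a minimal sketch of how the right-hand side of (2) combines traffic volume and generalized risk into an expected victim count for a single (t, i, j) cell, the following Python fragment may help. All numerical values and the function name `log_expected_victims` are hypothetical and chosen for illustration only; they are not taken from the study.

```python
import math

def log_expected_victims(m1, m2, log_rho1, log_rho2, c1=1.0, c2=1.0):
    """Right-hand side of (2), omitting the disturbance term."""
    return c1 * math.log(m1) + log_rho1 + c2 * math.log(m2) + log_rho2

# Hypothetical inputs for one (t, i, j) cell; none of these come from the paper.
m1 = 2.0e3        # traffic volume of victim-side drivers of age i in year t
m2 = 5.0e3        # traffic volume of colliding drivers of age j in year t
log_rho1 = -9.0   # log generalized risk, victim side
log_rho2 = -8.0   # log generalized risk, colliding side

log_n = log_expected_victims(m1, m2, log_rho1, log_rho2)
expected_n = math.exp(log_n)

# With c1 = c2 = 0 the traffic volume terms drop out entirely.
log_n_no_mobility = log_expected_victims(m1, m2, log_rho1, log_rho2, c1=0.0, c2=0.0)
```

With c1 = c2 = 1 the expected count is simply the product of both volumes and both risks; fixing both coefficients at 0 leaves only the two risk terms.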
The main result presented in this paper is how smooth estimates of $\rho^{(1)}_{ti}$ and $\rho^{(2)}_{tj}$ were obtained for victims of bicycle-car and car-car crashes, and what these estimates are in these cases. It is of particular interest to compare the shape of $\rho^{(2)}_{tj}$ for bicycle-car crashes to its counterpart for car-car crashes, as both could be interpreted as the harmfulness of (in a sense the same) colliding driver.
To our knowledge there is no well-known candidate parametric function (of time and age) for either $\rho^{(1)}_{ti}$ or $\rho^{(2)}_{tj}$. Therefore a nonparametric regression approach is taken. The approach taken effectively means that one parameter for each $\rho^{(1)}_{ti}$ and one for each $\rho^{(2)}_{tj}$, $t = 1995, \ldots, 2009$, $i, j = 0, \ldots, 99$, needs to be estimated. We however assume the shapes of $\rho^{(1)}_{ti}$ and $\rho^{(2)}_{tj}$ to be generally smooth, as values for (calendar and age) years close to each other are likely to be similar.
Figure 1 presents a diagram with potential directions along which it can be argued that values of $\rho^{(1)}_{ti}$ and $\rho^{(2)}_{tj}$ might be close to each other. The vertical arrows (constant calendar year) suggest a tendency that the values should not differ much between ages for each calendar year. It is also possible that, if two neighboring values differ, the next two values differ by approximately the same amount. It is therefore assumed that for both $\rho^{(1)}_{ti}$ and $\rho^{(2)}_{tj}$ a smooth line can be drawn for each t through the age years. One plausible exception might be a break at the (licensing) age of 18 for car drivers. Such breaks however can be modeled explicitly, which is done in this study where relevant. The horizontal arrows (constant age year) indicate potential similarity between age groups over subsequent years. The dashed diagonal arrows (constant year of birth) indicate a possible similarity within cohorts. It is assumed that $\rho^{(1)}$ and $\rho^{(2)}$ are smooth in these directions.
[Figure 1: grid of nodes labelled "Age 18 (in 1999)", "Age 19 (in 1999)", "Age 20 (in 1999)", "Age 18 (in 2000)", "Age 19 (in 2000)" and "Age 20 (in 2000)", connected by vertical, horizontal and dashed diagonal arrows.]
Figure 1: Schematic diagram of relations between values of $\rho^{(1)}_{ti}$ or $\rho^{(2)}_{tj}$. Only a small part is shown. Vertical arrows symbolize relations between age groups. Horizontal arrows symbolize relations between subsequent calendar years. Diagonal (dashed) arrows symbolize relations between groups with the same year of birth (birth cohort).
In this study, the smoothness of $\rho^{(1)}$ and $\rho^{(2)}$ is measured separately using the smoothness along the calendar year (time) direction and, in one case, along the (vertical) age year direction and, in another case, along the (diagonal) cohort direction.
Taking a nonparametric approach however has substantial bearing on estimating the coefficients $c_1$ and $c_2$ in (2). Without any criteria in addition to the fit to the data, for each $\rho^{(1)}_{\cdot\cdot}$ and $c_1$, a complementary $\tilde{\rho}^{(1)}_{\cdot\cdot}$ (and $\tilde{c}_1 \equiv 0$) exists that satisfies
\[ \tilde{\rho}^{(1)}_{ti} = c_1 \times \log\big(M^{(1)}_{ti}\big) + \rho^{(1)}_{ti}, \]
which fits the data equally well without the need for additional traffic volume data. A similar argument can be made for $\rho^{(2)}_{\cdot\cdot}$ and $c_2$. If relevant, use of traffic volume data however will affect the smoothness of both $\rho^{(1)}_{\cdot\cdot}$ and $\rho^{(2)}_{\cdot\cdot}$, and will affect further properties. We will return to this issue below[where?].
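The argument can be checked numerically. The sketch below assumes, as the displayed relation suggests, that the risk terms are expressed on the log scale; the traffic volumes, risk terms and coefficient are hypothetical values chosen for illustration only.

```python
import math

# Hypothetical log traffic volumes and log-scale risk terms for three cells.
log_m = [math.log(v) for v in (120.0, 340.0, 560.0)]
rho = [-3.0, -2.5, -2.8]
c1 = 0.8

# Contribution to the linear predictor when traffic volume data are used.
with_volume = [c1 * lm + r for lm, r in zip(log_m, rho)]

# Complementary solution: absorb the volume term into the risk term
# and set the coefficient to zero.
rho_tilde = [c1 * lm + r for lm, r in zip(log_m, rho)]
c1_tilde = 0.0
without_volume = [c1_tilde * lm + rt for lm, rt in zip(log_m, rho_tilde)]

same_fit = all(math.isclose(a, b) for a, b in zip(with_volume, without_volume))
```

Both parameterizations give the same predictor in every cell, so the fit to the data alone cannot distinguish them; only an additional criterion, such as the smoothness of the risk terms, can.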
2.2. Estimation approach
Many parameters are needed for a completely free-form shape of $\rho^{(1)}_{\cdot\cdot}$ and $\rho^{(2)}_{\cdot\cdot}$, one for each value. It is therefore essential to control the (equivalent, effective) degrees of freedom in the model. In this study this is attempted using a heuristic approach along the lines of penalized smoothing techniques combined with (k-fold) cross-validation (see Hastie et al., 2008; Green and Silverman, 1994).
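The (k-fold) approach taken in this paper is as follows: instead of optimizing the likelihood based on (2) only, the likelihood of (2) is penalized for the roughness of both $\rho^{(1)}_{\cdot\cdot}$ and $\rho^{(2)}_{\cdot\cdot}$ using roughness penalty functions (schematically):
\[ f(\theta) = -\text{log likelihood of (2)} + \mathrm{penalty}\big(\rho^{(1)}_{\cdot\cdot}\big) + \mathrm{penalty}\big(\rho^{(2)}_{\cdot\cdot}\big), \tag{3} \]
where it is assumed that $\theta$ contains all parameters in the right-hand side of the equation. The roughness penalties $\mathrm{penalty}(\rho^{(1)}_{\cdot\cdot})$ and $\mathrm{penalty}(\rho^{(2)}_{\cdot\cdot})$ are decomposed into two further components: one penalizing the roughness along the ‘year’ axis ($p_{\text{year}}$), and one along the age (or birth year) axis ($p_{\text{age}}$). This can be formulated as follows (assuming the coefficients $\lambda_{ij} > 0$, detailed below):
\[ \mathrm{penalty}\big(\rho^{(1)}\big) = \lambda_{11} \sum_{t \in \text{years}} \sum_{i \in \text{ages}} p_{\text{year}}\big(\rho^{(1)}_{ti}\big) + \lambda_{12} \sum_{t \in \text{years}} \sum_{i \in \text{ages}} p_{\text{age}}\big(\rho^{(1)}_{ti}\big), \tag{4} \]
\[ \mathrm{penalty}\big(\rho^{(2)}\big) = \lambda_{21} \sum_{t \in \text{years}} \sum_{j \in \text{ages}} p_{\text{year}}\big(\rho^{(2)}_{tj}\big) + \lambda_{22} \sum_{t \in \text{years}} \sum_{j \in \text{ages}} p_{\text{age}}\big(\rho^{(2)}_{tj}\big). \tag{5} \]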
Both $p_{\text{year}}$ and $p_{\text{age}}$ are defined as the square of the unit-length (h = 1 in (25.3.23) of Abramowitz and Stegun (1968)) numerical estimate of the second-order derivative of $\rho^{(1)}_{\cdot\cdot}$ and $\rho^{(2)}_{\cdot\cdot}$ along the year and age axis respectively, and (apart from a scale factor, as actually h > 1) along the diagonal in the cohort case. $p_{\text{age}}$ is thus defined as:
\[ p_{\text{age}}(\rho_{tk}) = \big( (\rho_{t(k+1)} - \rho_{tk}) - (\rho_{tk} - \rho_{t(k-1)}) \big)^2 = \big( \rho_{t(k+1)} - 2\rho_{tk} + \rho_{t(k-1)} \big)^2. \]
In case of the birth year cohort approach, $p_{\text{age}}$ is defined as
\[ p_{\text{age}}(\rho_{tk}) = \big( \rho_{(t+1)(k+1)} - 2\rho_{tk} + \rho_{(t-1)(k-1)} \big)^2, \]
while for the calendar year case, $p_{\text{year}}$ is defined as
\[ p_{\text{year}}(\rho_{tk}) = \big( \rho_{(t+1)k} - 2\rho_{tk} + \rho_{(t-1)k} \big)^2. \]
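The three pointwise penalties translate directly into code. The following sketch stores a risk surface as a nested list `rho[t][k]` and, as a simplifying assumption that is not taken from the paper, sums each penalty over interior cells only, so that all neighbors exist. The function names and the test surface are hypothetical.

```python
def p_age(rho, t, k):
    """Squared second difference along the age axis at cell (t, k)."""
    return (rho[t][k + 1] - 2.0 * rho[t][k] + rho[t][k - 1]) ** 2

def p_cohort(rho, t, k):
    """Squared second difference along the birth-cohort diagonal at (t, k)."""
    return (rho[t + 1][k + 1] - 2.0 * rho[t][k] + rho[t - 1][k - 1]) ** 2

def p_year(rho, t, k):
    """Squared second difference along the calendar-year axis at (t, k)."""
    return (rho[t + 1][k] - 2.0 * rho[t][k] + rho[t - 1][k]) ** 2

def total(penalty, rho):
    """Sum a pointwise penalty over interior cells (boundary handling is an assumption)."""
    n_t, n_k = len(rho), len(rho[0])
    return sum(penalty(rho, t, k)
               for t in range(1, n_t - 1)
               for k in range(1, n_k - 1))

# Hypothetical surface: quadratic in age, constant over calendar years.
rho = [[float(k * k) for k in range(5)] for t in range(4)]

age_total = total(p_age, rho)        # second difference of k^2 is 2, squared is 4
year_total = total(p_year, rho)      # constant over time, so no year roughness
cohort_total = total(p_cohort, rho)  # along the diagonal k also changes by one
```

A surface that is quadratic in age is penalized along the age and cohort directions but not along the calendar year direction, which is the behavior the definitions are meant to capture.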
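For brevity, using
\[ p_1(\rho) = \sum_{t \in \text{years}} \sum_{i \in \text{ages}} p_{\text{year}}(\rho_{ti}) \]
and
\[ p_2(\rho) = \sum_{t \in \text{years}} \sum_{i \in \text{ages}} p_{\text{age}}(\rho_{ti}) \]
we summarize (3), (4) and (5) as:
\[ f(\theta) = -\text{log likelihood of (2)} + \lambda_{11}\, p_1\big(\rho^{(1)}_{\cdot\cdot}\big) + \lambda_{12}\, p_2\big(\rho^{(1)}_{\cdot\cdot}\big) + \lambda_{21}\, p_1\big(\rho^{(2)}_{\cdot\cdot}\big) + \lambda_{22}\, p_2\big(\rho^{(2)}_{\cdot\cdot}\big). \tag{6} \]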
Note that if $\lambda_{11} = \lambda_{12} = \lambda_{21} = \lambda_{22} = 0$, minimizing (6) for $\theta$ yields an overfitted, almost saturated model. On the other hand, minimizing (6) for $\theta$ with $\lambda_{11}$, $\lambda_{12}$, $\lambda_{21}$ and $\lambda_{22}$ sufficiently large yields an approach similar to a classic approach to solving a constrained minimization problem (Luenberger, 1984, Chapter 12). That approach can be used to maximize the log likelihood of (2) while satisfying the constraints $p_1(\rho^{(1)}_{\cdot\cdot}) = 0$, $p_2(\rho^{(1)}_{\cdot\cdot}) = 0$, $p_1(\rho^{(2)}_{\cdot\cdot}) = 0$ and $p_2(\rho^{(2)}_{\cdot\cdot}) = 0$. The relevance of this finding for the current analysis is that, if these constraints are met, $\rho^{(1)}_{\cdot\cdot}$ and $\rho^{(2)}_{\cdot\cdot}$ are effectively two-dimensional (flat) planes. Thus, although many parameters are used, the model degrees of freedom are equivalent to the degrees of freedom of a model that explicitly models a plane for each. Practically, with finite $\lambda_{11}$, $\lambda_{12}$, $\lambda_{21}$ and $\lambda_{22}$, the equivalent degrees of freedom are somewhere in between. [We have not developed an estimate of the effective degrees of freedom yet.]
A further consequence is that this approach will not penalize the addition of a linear function to either $\rho^{(1)}_{\cdot\cdot}$ or $\rho^{(2)}_{\cdot\cdot}$. As a result, the penalty on $\rho^{(\cdot)}_{\cdot\cdot}$ is identical to the penalty on $\rho^{(\cdot)}_{\cdot\cdot} + (a \times t + b)$. It is therefore immediately clear from (2) that some standardization is required, as $\rho^{(1)}_{ti} + \rho^{(2)}_{tj} = \big(\rho^{(1)}_{ti} - (a \times t + b)\big) + \big(\rho^{(2)}_{tj} + (a \times t + b)\big)$ for any linear function of time $a \times t + b$. This means that only relative tendencies can be identified within $\rho^{(1)}_{ti}$ and separately within $\rho^{(2)}_{tj}$. To solve this identification problem, $\rho^{(2)}_{\cdot\cdot}$ for the forty-year-old age group is fixed: $\rho^{(2)}_{t\,40} \equiv 0$.
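The identification problem can be illustrated with a small numerical experiment. The sketch below builds two hypothetical risk surfaces (stored as `rho[t][k]`, with integer-valued entries so that the comparisons are exact), shifts a linear function of time from one to the other, and checks that neither the roughness penalties nor the sum entering (2) change. Grid sizes, surfaces and the values of a and b are assumptions made for illustration.

```python
def year_penalty(rho):
    """Sum of squared second differences along the calendar-year axis."""
    return sum((rho[t + 1][k] - 2.0 * rho[t][k] + rho[t - 1][k]) ** 2
               for t in range(1, len(rho) - 1)
               for k in range(len(rho[0])))

def age_penalty(rho):
    """Sum of squared second differences along the age axis."""
    return sum((rho[t][k + 1] - 2.0 * rho[t][k] + rho[t][k - 1]) ** 2
               for t in range(len(rho))
               for k in range(1, len(rho[0]) - 1))

N_YEARS, N_AGES = 5, 6  # hypothetical, much smaller than the 15 x 100 grid of the study
rho1 = [[float(t * t + k) for k in range(N_AGES)] for t in range(N_YEARS)]
rho2 = [[float(k * k + t ** 3) for k in range(N_AGES)] for t in range(N_YEARS)]

# Move an arbitrary linear function of time a*t + b from rho1 to rho2.
a, b = 3.0, -7.0
rho1_shifted = [[rho1[t][k] - (a * t + b) for k in range(N_AGES)] for t in range(N_YEARS)]
rho2_shifted = [[rho2[t][k] + (a * t + b) for k in range(N_AGES)] for t in range(N_YEARS)]

penalties_unchanged = (
    year_penalty(rho1) == year_penalty(rho1_shifted)
    and age_penalty(rho1) == age_penalty(rho1_shifted)
    and year_penalty(rho2) == year_penalty(rho2_shifted)
    and age_penalty(rho2) == age_penalty(rho2_shifted)
)
sums_unchanged = all(
    rho1[t][i] + rho2[t][j] == rho1_shifted[t][i] + rho2_shifted[t][j]
    for t in range(N_YEARS) for i in range(N_AGES) for j in range(N_AGES)
)
```

Because both the fit and the penalties are invariant under this shift, the penalized objective cannot choose between the two solutions; a constraint such as fixing one age group of the second surface at zero removes the ambiguity.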
2.3. Estimation of λ11, λ12, λ21 and λ22
In Section 2.2 it is described how the log likelihood of (2) can be optimized while to some extent controlling the smoothness of $\rho^{(1)}_{\cdot\cdot}$ and $\rho^{(2)}_{\cdot\cdot}$ using the parameters $\lambda_{11}$, $\lambda_{12}$, $\lambda_{21}$ and $\lambda_{22}$. This section describes how values for these parameters are estimated. Estimating $\lambda_{11}$, $\lambda_{12}$, $\lambda_{21}$ and $\lambda_{22}$ alongside the parameters in $\rho^{(1)}_{\cdot\cdot}$ and $\rho^{(2)}_{\cdot\cdot}$ will result in an overfitted model. The remedy chosen in this study is to fit the model (with given $\lambda_{11}$, $\lambda_{12}$, $\lambda_{21}$ and $\lambda_{22}$) to part of the data, and judge the fit based on how well the model explains the rest of the data.
Practically, the data are ordered by time and age. All combinations of time and age are considered and are randomly partitioned into 4 partitions¹, which were then kept throughout the project. Each time and age combination (pair) appears in exactly one partition.
¹ Also 20, 100 and in some cases 200 partitions have been used to assess the effect of the choice of the number of partitions on the solution. Results of these analyses are not reported here.
An example of a random partition is given in Figure 2. Figure 2 contains 8 panes in two columns. All 8 panes have 10 × 10 squares. Concentrating on the right-hand column, we can assume the color of each square to represent the presence of a year × age pair in that pane: if the square is black the pair is present, if the square is white the pair is not present. The partition can be created by randomly distributing the pairs over the four panes.
Figure 2: Estimation scheme using the random partition. The four rows are associated with one random partition (right-hand panel). In the left-hand panel the complement of that partition is displayed. For each partition the model is fitted to the data points marked black on the left-hand side. The results are used to predict the data marked black on the right-hand side.
On the left-hand side of Figure 2 the complement of the partition on the right-hand side is given. For each partition (row) the model is fitted to the data associated with the black pixels on the left-hand side using given (thus fixed) values for $\lambda_{11}$, $\lambda_{12}$, $\lambda_{21}$ and $\lambda_{22}$. The results are then used to predict the data associated with the black pixels on the right-hand side. As all data are included in exactly one partition, a prediction of all data points is thus achieved (the 4 partitions are mutually exclusive and collectively exhaustive). The overall likelihood of all victim counts, based on the Poisson likelihood given these combined predictions, can then be calculated. This likelihood is considered a function of $\lambda_{11}$, $\lambda_{12}$, $\lambda_{21}$ and $\lambda_{22}$. It is maximized for these parameters, and the maximizing values of $\lambda_{11}$, $\lambda_{12}$, $\lambda_{21}$ and $\lambda_{22}$ are thereafter used in the same procedure detailed in Section 2.2, but now with all data (no partitions), to obtain estimates of $\rho^{(1)}_{\cdot\cdot}$ and $\rho^{(2)}_{\cdot\cdot}$. Results of the values obtained for the models considered can be found in Table 6. The logarithms of the parameters $\lambda_{11}$, $\lambda_{12}$, $\lambda_{21}$ and $\lambda_{22}$ obtained through this procedure can be found in Table 8.
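The partitioning and scoring steps can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the study: the counts are made up, and the penalized model is replaced by a deliberately trivial stand-in (`fit_mean`, `predict_mean`) that predicts the mean of the training counts. In the actual procedure the fitting step would minimize (6) for given penalty parameters, and the cross-validated log-likelihood would be maximized over those parameters.

```python
import math
import random

def random_partition(cells, n_parts, seed):
    """Assign every (year, age) cell to exactly one of n_parts partitions."""
    rng = random.Random(seed)
    shuffled = list(cells)
    rng.shuffle(shuffled)
    return [shuffled[p::n_parts] for p in range(n_parts)]

def poisson_loglik(count, mean):
    """Log of the Poisson probability of `count` given `mean` > 0."""
    return count * math.log(mean) - mean - math.lgamma(count + 1)

def cross_validated_loglik(counts, parts, fit, predict):
    """Predict each partition from a model fitted to the other partitions."""
    total = 0.0
    for held_out in parts:
        held = set(held_out)
        training = {cell: n for cell, n in counts.items() if cell not in held}
        model = fit(training)
        total += sum(poisson_loglik(counts[cell], predict(model, cell))
                     for cell in held_out)
    return total

# Hypothetical victim counts per (year, age) cell; not data from the study.
counts = {(t, a): 1 + (t + a) % 3 for t in range(1995, 1999) for a in range(18, 23)}
parts = random_partition(sorted(counts), n_parts=4, seed=0)

def fit_mean(training):
    """Stand-in for the penalized fit: the mean of the training counts."""
    return sum(training.values()) / len(training)

def predict_mean(model, cell):
    """Stand-in prediction: the same mean for every held-out cell."""
    return model

cv_loglik = cross_validated_loglik(counts, parts, fit_mean, predict_mean)
```

Shuffling once and slicing with a stride guarantees that the partitions are mutually exclusive and collectively exhaustive; fixing the seed corresponds to keeping the same partition throughout the project.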
3. Application
3.1. Models considered
Eight different models are distinguished in this study, based on how traffic volume is considered, on the manner in which age dependency is modeled, and on whether or not a log-linear time trend is considered in the risk components:
1. Log-linear: Smooth along the age axis. Log-linear along the time axis. Coefficients for traffic volume c1 and c2 (in (2)) fixed at 1.
2. Log-linear-C: Smooth along the age axis. Log-linear along the time axis. Coefficients for traffic volume c1 and c2 estimated.
3. Smooth: Smooth along the age axis and the time axis. Coefficients for traffic volume c1 and c2 fixed at 1.
4. Smooth-C: Smooth along the age axis and the time axis. Coefficients for traffic volume c1 and c2 estimated.
5. Log-linear No Mobility: Smooth along the age axis. Log-linear along the time axis. Coefficients for traffic volume c1 and c2 fixed at 0.
6. Smooth No Mobility: Smooth along the age axis and the time axis. Coefficients for traffic volume c1 and c2 fixed at 0.
Initially the cohort models were also considered:
a. Smooth Cohort: Smooth along the birth year cohort axis and the time axis. Coefficients for traffic volume fixed at 1.
b. Smooth Cohort-C: Smooth along the birth year cohort axis and the time axis. Coefficients for traffic volume estimated.
Although it is possible in practice to obtain the log-linear models by using a very large value for $\lambda_{12}$ and $\lambda_{22}$, the linear form is explicitly implemented in $\rho^{(1)}$ and $\rho^{(2)}$.
Visual inspection of the $\rho^{(1)}_{\cdot\cdot}$ and $\rho^{(2)}_{\cdot\cdot}$ surfaces (Figures 3, 4, 5 and 6) reveals that, at least for car × car and bicycle × car crashes, not much can be gained from the cohort approach (items a and b above). It appears that the age of the driver determines the ‘risk’ to a greater extent than the year of birth of the driver. Therefore, the analysis is focussed on the remaining models.
3.2. Assessment of the relevance of traffic volumedata
3.2.1. Visual inspection
Inspecting Figures 3, 4, 5 and 6 reveals that, as far as the shapes of $\rho^{(1)}$ and $\rho^{(2)}$ are concerned, the solutions for c1 and c2 estimated are very similar to those for c1 ≡ 1 and c2 ≡ 1, whereas c1 ≡ 0 and c2 ≡ 0 yield substantially different results.
It is clear from Figures 3(b), 3(d), 4(b), 4(d), 5(b) and 5(d) (right-hand side panels) that young drivers pose a risk, regardless of whether they collide with bicycles or cars. Figures 3(f), 4(f) and 5(f) reveal that, when traffic volume data are not used, older drivers appear to have a greater share. This is caused by the fact that in these models no correction for exposure is made. Similar phenomena can be found by inspecting Figures 3(a), 3(c), 3(e), 4(a), 4(c) and 4(e). The fact that the younger bicyclists show such a high peak in Figures 3(e) and 4(e), whereas this effect is not present in the other figures, is most likely also due to exposure. Finally, it seems that the elderly are also at risk, both as a victim and as a colliding driver. However, data are sparse for elderly drivers, in particular for ages 80+; results for these ages are therefore suppressed in the figures.
[Figure 3, panels: (a) Joint ρ(1), c1 estimated. (b) Joint ρ(2), c2 estimated. (c) Joint ρ(1), c1 ≡ 1. (d) Joint ρ(2), c2 ≡ 1. (e) Joint ρ(1), c1 ≡ 0. (f) Joint ρ(2), c2 ≡ 0.]
Figure 3: Joint surface plots (100 samples) for crash type “bicycle × car” (smooth time).
[Figure 4, panels: (a) Joint ρ(1), c1 estimated. (b) Joint ρ(2), c2 estimated. (c) Joint ρ(1), c1 ≡ 1. (d) Joint ρ(2), c2 ≡ 1. (e) Joint ρ(1), c1 ≡ 0. (f) Joint ρ(2), c2 ≡ 0.]
Figure 4: Joint surface plots (100 samples) for crash type “bicycle × car” (log-linear time).
[Figure 5, panels: (a) Joint ρ(1), c1 estimated. (b) Joint ρ(2), c2 estimated. (c) Joint ρ(1), c1 ≡ 1. (d) Joint ρ(2), c2 ≡ 1. (e) Joint ρ(1), c1 ≡ 0. (f) Joint ρ(2), c2 ≡ 0.]
Figure 5: Joint surface plots (100 samples) for crash type “car × car” (smooth time).
[Figure 6, panels: (a) Joint ρ(1), c1 estimated. (b) Joint ρ(2), c2 estimated. (c) Joint ρ(1), c1 ≡ 1. (d) Joint ρ(2), c2 ≡ 1. (e) Joint ρ(1), c1 ≡ 0. (f) Joint ρ(2), c2 ≡ 0.]
Figure 6: Joint surface plots (100 samples) for crash type “car × car” (log-linear time).
3.2.2. Quantitative measures
The general approach taken in this study is to partition the data into n = 4 partitions (see Figure 2), where each partition is predicted out-of-sample by a model fitted to the rest of the data. In order to assess the relative merits of the models considered, the predictive performance (in terms of likelihood for these partitions) is compared between models. Because the random nature of the partitioning scheme potentially affects the results, comparisons are made based on 100 reruns of the partitioning scheme. This results in 100 × 4 = 400 partitions on which the comparison of predictive performance is based.
To compare two models, pairwise comparisons of the log-likelihood of each partition given the predictions are made. Table 1 gives the results of these pairwise comparisons of the cross-validated fit; the models compared are listed in the left-hand column.
The partitioning approach was applied a hundred times (a hundred samples): a hundred times, all data were partitioned into four partitions. The relative performance of models is determined by comparing the predictive performance on each of the 100 × 4 = 400 partitions.
For each sample, the data of the four partitions are predicted out of sample using the remaining data. The models based on estimated c1 and c2, on c1 ≡ 1 and c2 ≡ 1, and on c1 ≡ 0 and c2 ≡ 0 are compared pairwise by calculating the difference in log-likelihood (based on the Poisson distribution) between the out-of-sample predictions of the models being compared.
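A pairwise comparison of this kind reduces to summarizing the per-partition differences in out-of-sample log-likelihood. The sketch below uses four made-up values per model instead of the 400 partitions of the study; the function name `summarize_differences` and the choice of summary statistics are assumptions made for illustration.

```python
import statistics

def summarize_differences(loglik_a, loglik_b):
    """Summarize per-partition differences in out-of-sample log-likelihood (A minus B)."""
    diffs = [a - b for a, b in zip(loglik_a, loglik_b)]
    return {
        "mean": statistics.mean(diffs),
        "median": statistics.median(diffs),
        "share_negative": sum(d < 0 for d in diffs) / len(diffs),
    }

# Hypothetical out-of-sample log-likelihoods of two models on four partitions.
loglik_model_a = [-10.0, -12.0, -9.0, -11.0]
loglik_model_b = [-11.0, -11.0, -12.0, -13.0]

summary = summarize_differences(loglik_model_a, loglik_model_b)
```

A positive mean or median favors the first model of the pair, while the share of negative differences indicates how often the second model predicted a partition better.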
3.3. Discussion
One property of the method to consider is the fact that the partition is randomly chosen. This means that the results found in Table 6 and Table 8 may be different when a different but equally possible partition is used. For that reason part of the analysis is performed using nine other random partitions. See Section ??.
(a) bicycle × car
Comparison                            Mean   Median     2.5%      25%      75%    97.5%
“C estimated” and “C fixed at 0”     16.31     8.85    −0.66     8.85    23.97    36.12
“C fixed at 1” and “C fixed at 0”    11.23     0.53   −13.93     0.53    22.34    42.26
“C estimated” and “C fixed at 1”      5.09     1.53    −5.68     1.53     9.32    14.21

(b) bicycle × car (smoothed time)
Comparison                            Mean   Median     2.5%      25%      75%    97.5%
“C estimated” and “C fixed at 0”     23.33    13.78    −0.59    13.78    33.26    53.05
“C fixed at 1” and “C fixed at 0”    22.31    11.43    −6.06    11.43    32.50    59.24
“C estimated” and “C fixed at 1”      1.02    −2.44   −14.08    −2.44     4.70    20.50

(c) car × car
Comparison                            Mean   Median     2.5%      25%      75%    97.5%
“C estimated” and “C fixed at 0”     25.54    15.41    −3.28    15.41    37.02    55.11
“C fixed at 1” and “C fixed at 0”    26.91    14.47    −1.91    14.47    38.83    63.69
“C estimated” and “C fixed at 1”     −1.37    −2.52    −6.91    −2.52     0.36     2.13

(d) car × car (smoothed time)
Comparison                            Mean   Median     2.5%      25%      75%    97.5%
“C estimated” and “C fixed at 0”     18.17     9.67    −5.54     9.67    27.53    43.63
“C fixed at 1” and “C fixed at 0”    19.72     9.55    −5.07     9.55    29.94    50.38
“C estimated” and “C fixed at 1”     −1.55    −3.84   −11.32    −3.84    −0.20    17.20

Table 1: Statistics on pairwise differences in cross-validated log-likelihood over all partitions and all samples, using penalty values optimized for each estimate.
[Figure 7, panels: (a) vicCarCarplot.pdf. (b) vicBicycleCarplotSamples.pdf. Horizontal axis: Time (1995 to 2010). Vertical axis: Number of victims.]
Figure 7: Aggregate development
4. Discussion
With respect to the usability, the following results have emerged. Firstly, the whole estimation procedure is computationally very intensive. With current computer systems computation may take a prolonged time. Future computer technology is likely to lessen this issue.
Since the models have been implemented, some limited further analysis has been performed. This analysis does not constitute a complete statistical analysis, which would require further research and implementation efforts. The current analysis is intended as an exploration into the possibilities of further steps in this direction: stop here, develop a reduced model based on off-the-shelf techniques, or perform a complete statistical analysis.
Based on the results found in Tables 2–5 it is concluded that the car × car model appears to be the best among the ones considered. This however need not be the case for the bicycle × car model. From Tables 2–3 it can be inferred that the model
regardless of whether a smooth time development is assumed.
The reader should be warned that this discussion concerns the presence of differences in fit values, not considering the magnitude of these differences nor whether these values indicate significant differences in fit of the data to the respective models. It is assumed that finding similar fit values likely translates into not finding significant differences. The reverse however need not be true: finding a model that systematically fits better than another model need
5. Tables
Comparison                           Partition 1   Partition 2   Partition 3   Partition 4
“C estimated” and “C fixed at 0”            7.00          8.00          4.00         10.00
“C fixed at 1” and “C fixed at 0”           6.00          7.00          4.00         11.00
“C estimated” and “C fixed at 1”           65.00         63.00         67.00         66.00
Table 2: Percentage negative values of differences between cross-validated fit of individually optimal models (partitions 1–4), car × car

Comparison                           Partition 1   Partition 2   Partition 3   Partition 4
“C estimated” and “C fixed at 0”           11.00         14.00          4.00         13.00
“C fixed at 1” and “C fixed at 0”          12.00         11.00          4.00         11.00
“C estimated” and “C fixed at 1”           78.00         82.00         78.00         74.00
Table 3: Percentage negative values of differences between cross-validated fit of individually optimal models (partitions 1–4), car × car (smoothed time)

Comparison                           Partition 1   Partition 2   Partition 3   Partition 4
“C estimated” and “C fixed at 0”            5.00          5.00          6.00          6.00
“C fixed at 1” and “C fixed at 0”          18.00         19.00         28.00         28.00
“C estimated” and “C fixed at 1”           26.00         19.00         17.00         16.00
Table 4: Percentage negative values of differences between cross-validated fit of individually optimal models (partitions 1–4), bicycle × car

Comparison                           Partition 1   Partition 2   Partition 3   Partition 4
“C estimated” and “C fixed at 0”            5.00          4.00          5.00          7.00
“C fixed at 1” and “C fixed at 0”           8.00          5.00          7.00         16.00
“C estimated” and “C fixed at 1”           42.00         37.00         40.00         39.00
Table 5: Percentage negative values of differences between cross-validated fit of individually optimal models (partitions 1–4), bicycle × car (smoothed time)
(a) Transport mode victim driver “Bicycle”
Model                          1             2             3          4
Log-linear            −35799.940    −35679.299    280023.213    274.616
Log-linear-C          −35794.431    −35666.450    252362.521    246.576
Smooth                −35736.805    −35526.798     75726.310     72.011
Smooth-C              −35736.932    −35531.810     74813.500     71.156
Smooth Cohort         −35745.389    −35524.692     90206.552     85.876
Smooth Cohort-C       −35746.228    −35528.655     90344.299     86.032
Smooth No Mobility    −35754.935    −35502.050     72289.671     68.719

(b) Transport mode victim driver “Car”
Model                          1             2             3          4
Log-linear            −30593.271    −30498.505     94334.309     93.544
Log-linear-C          −30595.298    −30497.543     92288.859     90.866
Smooth                −30584.287    −30467.916     73135.764     68.491
Smooth-C              −30586.193    −30468.409     72772.415     68.168
Smooth Cohort         −30589.505    −30512.050     94490.698     93.784
Smooth Cohort-C       −30594.576    −30518.321     92823.005     91.675
Smooth No Mobility    −30608.460    −30436.176     73487.170     66.668

Table 6: Statistical fit information for various models, 1 = Cross-validated log-likelihood, 2 = Overall log-likelihood, 3 = Ordinary least squares, 4 = Weighted least squares
(a) Transport mode victim driver “Bicycle”
Model                 p1(ρ(1))   p1(ρ(2))   p2(ρ(1))   p2(ρ(2))
Log-linear               2.745      2.188      0.000      0.000
Log-linear-C             2.614      3.654      0.000      0.000
Smooth                   2.806      2.999      0.000      1.068
Smooth-C                 2.554      2.380      0.000      1.108
Smooth Cohort            2.835     10.459      0.000      0.844
Smooth Cohort-C          2.602      9.512      0.000      0.835
Smooth No Mobility       3.934      6.158      0.010      1.218

(b) Transport mode victim driver “Car”
Model                 p1(ρ(1))   p1(ρ(2))   p2(ρ(1))   p2(ρ(2))
Log-linear               0.253      0.651      0.000      0.000
Log-linear-C             0.285      0.605      0.000      0.000
Smooth                   0.162      0.470      0.058      0.000
Smooth-C                 0.184      0.341      0.064      0.000
Smooth Cohort            0.185      0.489      0.000      0.000
Smooth Cohort-C          0.202      0.234      0.000      0.000
Smooth No Mobility       0.792      2.405      0.034      0.007

Table 7: Penalty information for various models, p1(ρ(1)) = Age year penalty victim driver, p1(ρ(2)) = Age year penalty opponent driver, p2(ρ(1)) = Calendar year penalty victim driver, p2(ρ(2)) = Calendar year penalty opponent driver
(a) Transport mode victim driver “Bicycle”
Model                    λ11      λ21      λ12      λ22
Log-linear             1.971    1.891        -        -
Log-linear-C           1.975    1.506        -        -
Smooth                 1.967    2.203   28.849    3.116
Smooth-C               2.011    2.412   23.112    3.051
Smooth Cohort          2.287    2.171   25.156    3.435
Smooth Cohort-C        2.296    2.302   23.695    3.426
Smooth No Mobility     1.963    1.673    6.626    3.112

(b) Transport mode victim driver “Car”
Model                    λ11      λ21      λ12      λ22
Log-linear             3.828    2.968        -        -
Log-linear-C           3.709    3.030        -        -
Smooth                 4.682    3.242    4.998   23.642
Smooth-C               4.518    3.514    4.945   23.061
Smooth Cohort          5.317    3.520   23.941   26.001
Smooth Cohort-C        5.069    4.398   25.463   20.836
Smooth No Mobility     3.295    2.189    5.774    6.989

Table 8: Logarithms of penalty parameters for various models, λ11: Age year penalty victim driver, λ21: Age year penalty opponent driver, λ12: Calendar year penalty victim driver, λ22: Calendar year penalty opponent driver
(a) Parameter “Coefficient mobility victim driver”
Model                 Bicycle     Car
Log-linear                  -       -
Log-linear-C            0.836   0.862
Smooth                      -       -
Smooth-C                0.804   0.939
Smooth Cohort               -       -
Smooth Cohort-C         0.804   0.896
Smooth No Mobility          -       -

(b) Parameter “Coefficient mobility opponent driver”
Model                 Bicycle     Car
Log-linear                  -       -
Log-linear-C            0.497   1.002
Smooth                      -       -
Smooth-C                1.359   1.084
Smooth Cohort               -       -
Smooth Cohort-C         1.243   0.992
Smooth No Mobility          -       -

Table 9: Cvalues
Model                    Bicycle          Car
Log-linear            −35799.940   −30593.271
Log-linear-C          −35794.431   −30595.298
Smooth                −35736.805   −30584.287
Smooth-C              −35736.932   −30586.193
Smooth Cohort         −35745.389   −30589.505
Smooth Cohort-C       −35746.228   −30594.576
Smooth No Mobility    −35754.935   −30608.460
Table 10: PenaltyFit

Model                 Bicycle     Car
Log-linear              2.745   0.253
Log-linear-C            2.614   0.285
Smooth                  2.806   0.162
Smooth-C                2.554   0.184
Smooth Cohort           2.835   0.185
Smooth Cohort-C         2.602   0.202
Smooth No Mobility      3.934   0.792
Table 11: agePenalty1

Model                 Bicycle     Car
Log-linear              2.188   0.651
Log-linear-C            3.654   0.605
Smooth                  2.999   0.470
Smooth-C                2.380   0.341
Smooth Cohort          10.459   0.489
Smooth Cohort-C         9.512   0.234
Smooth No Mobility      6.158   2.405
Table 12: agePenalty2

Model                   Bicycle         Car
Log-linear            35679.299   30498.505
Log-linear-C          35666.450   30497.543
Smooth                35526.798   30467.916
Smooth-C              35531.810   30468.409
Smooth Cohort         35524.692   30512.050
Smooth Cohort-C       35528.655   30518.321
Smooth No Mobility    35502.050   30436.176
Table 13: riskFit

Model                    Bicycle         Car
Log-linear            280023.213   94334.309
Log-linear-C          252362.521   92288.859
Smooth                 75726.310   73135.764
Smooth-C               74813.500   72772.415
Smooth Cohort          90206.552   94490.698
Smooth Cohort-C        90344.299   92823.005
Smooth No Mobility     72289.671   73487.170
Table 14: ssq

Model                 Bicycle      Car
Log-linear            274.616   93.544
Log-linear-C          246.576   90.866
Smooth                 72.011   68.491
Smooth-C               71.156   68.168
Smooth Cohort          85.876   93.784
Smooth Cohort-C        86.032   91.675
Smooth No Mobility     68.719   66.668
Table 15: wsq

Model                 Bicycle     Car
Log-linear              0.000   0.000
Log-linear-C            0.000   0.000
Smooth                  0.000   0.058
Smooth-C                0.000   0.064
Smooth Cohort           0.000   0.000
Smooth Cohort-C         0.000   0.000
Smooth No Mobility      0.010   0.034
Table 16: yearPenalty1

Model                 Bicycle     Car
Log-linear              0.000   0.000
Log-linear-C            0.000   0.000
Smooth                  1.068   0.000
Smooth-C                1.108   0.000
Smooth Cohort           0.844   0.000
Smooth Cohort-C         0.835   0.000
Smooth No Mobility      1.218   0.007
Table 17: yearPenalty2
6. Conclusion
References
Abramowitz, M., Stegun, I.A., 1968. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover, New York. Fifth Dover printing.
Davidse, R.J., 2007. Assisting the older driver. Ph.D. thesis. Rijksuniversiteit Groningen. Groningen, the Netherlands.
Efron, B., Tibshirani, R., 1993. An Introduction to the Bootstrap. Monographs on statistics and applied probability, Chapman & Hall.
Gennarelli, T.A., Wodzin, E. (Eds.), 2005. Abbreviated Injury Scale (AIS) 2005. Association for the Advancement of Automotive Medicine AAAM, Barrington, IL. URL: http://www.aaam.org/.
Green, P.J., Silverman, B.W., 1994. Nonparametric regression and generalized linear models: a roughness penalty approach. Chapman & Hall/CRC, Boca Raton, Fl.
Hastie, T., Tibshirani, R., Friedman, J., 2008. The elements of statistical learning: data mining, inference and prediction. 2 ed., Springer. URL: http://www-stat.stanford.edu/~tibs/ElemStatLearn/.
Hastie, T.J., Tibshirani, R.J., 1990. Generalized Additive Models. Monographs on Statistics & Applied Probability, Chapman & Hall/CRC, Boca Raton, Fl.
Luenberger, D.G., 1984. Linear and nonlinear programming. Second ed., Addison-Wesley, Reading, Massachusetts.
Maycock, G., Lockwood, C.R., Lester, J.F., 1991. The accident liability of car drivers. Technical Report RR315. Transport Research Laboratory. Wokingham, United Kingdom. URL: http://www.trl.co.uk/online_store/reports_publications/trl_reports/cat_road_user_safety/report_the_accident_liability_of_car_drivers.htm.
van Norden, Y., Bijleveld, F.D., Stipdonk, H.L., 2010. Beschrijving van een verkennend model voor de verkeersveiligheid. Technical Report R-2010-34. SWOV. Leidschendam, the Netherlands. In Dutch.
Stipdonk, H., Bijleveld, F., van Norden, Y., Commandeur, J., 2013. Analysing the development of road safety using demographic data. Accident Analysis & Prevention 60, 435–444. URL: http://www.sciencedirect.com/science/article/pii/S0001457512002928, doi:10.1016/j.aap.2012.08.005.
Vlakveld, W.P., 2011. Hazard anticipation of young novice drivers. Ph.D. thesis. Rijksuniversiteit Groningen. Groningen, the Netherlands.