Choosing Optimal Reliability Measures for Passenger Railways: Different Measures for Different Purposes

Niovi Karathodorou† and Ben Condry

Railway and Transport Strategy Centre, Centre for Transport Studies, Department of Civil and Environmental Engineering, Imperial College London, London, United Kingdom SW7 2AZ

ABSTRACT

Reliability is one of the top factors influencing customer satisfaction with passenger rail services. It affects the level of demand for the service as passengers place a large negative value on delays. This matters to service providers, as it drives the fare revenue, and to policy makers as it influences mode share. This paper comprises a review of literature on reliability measurement in public transport, the results of a global survey of suburban rail operators and an assessment of the value of specific reliability measures. Reliability measures are typically required for three distinct purposes: internal measurements to manage the service, reporting to governments/authorities or franchisors for regulatory purposes and external reporting to customers and the media. Different measures may be optimal for each of these purposes and careful consideration is required for their definition and use. However, most railways surveyed chose their reliability measures either based on regulatory obligations or simply because these were used elsewhere.

SUBMISSION DATE: 27th July 2015 (Revised 15th November 2015)

WORD COUNT:

Tables: (1) 250; Figures: (2) 500; Abstract: 156; Text: 6,068; References: 461; Total: 7,435

†Author for correspondence: Centre for Transport Studies, Imperial College London, London SW7 2AZ, UK, Tel: +44 (0)20 7594 6088, Fax: +44 (0)20 7594 5681, Email: [email protected].

INTRODUCTION

Reliability is consistently rated as a top factor affecting customer satisfaction for railway passengers (Rehnström et al (1), Lombart et al (2), Morpace International Inc (3)). This is corroborated by studies that estimate the value of travel time reliability both for road and public transport travel, which generally show that travel time reliability is valued more by travellers than travel time itself (Li et al (4), Bates et al (5), De Jong et al (6), Preston et al (7), Vincent et al (8)). The review by Preston et al (7), which focusses on railway transport, found that one minute of delay is valued 1.25 to 3 times as much as one minute of journey time, and that the standard deviation of travel time is valued at up to 2.9 times as much as journey time. Hence poor reliability can significantly increase the generalised cost of travel to users, leading to reduced demand for travel. This in turn reduces fare revenue as well as lowering the mode share, especially where there is strong competition from other modes including the private car.

Improving service reliability (punctuality) should thus be a key goal for railway operators, and effective measurement of punctuality is a key first step towards improving it. However, despite a wide literature on the measurement of reliability for road travel, reliability measurement in relation to public transport, and in particular railways, has received little attention in the academic literature. The measures used for road reliability may not be directly applicable to the railway industry for a number of reasons. Travel time reliability for road travel considers the individual traveller’s perspective, whereas in the railway sector aggregate measures (e.g. for the entire network or for specific lines) may be more suitable to enable railways to understand issues and identify where efforts for improvement should be concentrated. Car travellers can continuously adjust their departure time, whereas for railway passengers possible departure times only occur in discrete steps. Car travel only involves in-vehicle delays, whereas railway passengers can be late even if the actual journey time is not delayed (e.g. due to a late departure). Lastly, trains follow fixed routes, whereas cars can amend their route if required.

Several studies exist that look at the suitability of different measures (e.g. Börjesson and Eliasson (9)) or propose new measures not currently employed in the public transport sector (e.g. Moussa and Stephan (10)). Few studies compare and evaluate different measures, and these focus on urban bus services (e.g. Currie et al (11)), most often on high frequency services in particular (e.g. Trompet et al (12), Mazloumi et al (13)). As with car travel, results regarding reliability measurement for bus systems may not be applicable to railways due to the different service characteristics of these two types of transport systems. As an example, planning for high frequency bus services relies on headways rather than a fixed timetable (this study focuses on timetable based railway operation), something that differentiates the notion of delay in the two systems. Surveys of how public transport operators, and in particular railways, are actually measuring reliability are also lacking. Barron et al (14) offer an interesting study of reliability measurement in metro systems, but metros also often have headway based operations and hence may differ from railways that run timetable based services. Even where urban bus and metro services are operated according to timetables, frequencies are typically higher than on railway systems.

Previous studies have also overlooked that reliability measurements are not used only for internal management purposes. In addition to internal management, reliability measures are required to report to franchisors or authorities (including for receiving penalties or bonuses related to reliability) and to inform the public on service performance. They are also frequently referenced by the media, so are important in terms of the perception of the operator. Effective measurement of reliability is also valuable to support the financial case for initiatives to improve performance.

This paper examines railways’ experience with reliability measurement and investigates how railways should measure reliability taking into account the various purposes for which these measurements are used: internal management, reporting to regulators and sharing information with the public. In particular, the objectives of the paper are:

- To understand how railways measure reliability
- To understand the reasons for railways’ choice of reliability measures
- To provide a critical review of measures used by the railways, based both on railways’ experience and the literature
- To provide recommendations on how reliability should be measured for different purposes

The paper focuses on suburban railways with a timetable based operation. It is based on a survey of 13 international suburban railway operators. To our knowledge, this is the first paper to assess a range of reliability measures which focuses exclusively on the railway sector and also takes into account the purpose for which reliability measurements are being used. The term reliability has been used to refer to different aspects of service performance. In this paper, we use the term reliability or alternatively journey time reliability to incorporate all aspects relating to on-time performance of trains. However, our use of the term does not include incident occurrence and cancellations.

RELEVANT LITERATURE

This review focuses on studies that assess various reliability measures in a comparative context specifically for the public transport sector. Research looking at specific measures is referred to when discussing the results of the study later in the paper. Only a single study was identified that evaluated measures in a comparative context for the railway industry and this focuses on metro systems (Barron et al (14)). Three papers were identified examining reliability measurement for bus services. As discussed in the introduction, the context of reliability measurement in timetable based railway operations may be different to metro systems and buses and therefore not all results reviewed in this section are directly applicable to railway systems.

Barron et al (14) investigate reliability measurement in metro systems based on a survey of 22 international metro systems. The survey finds that Mean Distance Between Failures (MDBF) is the most frequently used measure; the data most commonly recorded is the number of incidents by cause. However, as the authors point out, the measure only reveals the frequency of incidents and not their impact on the service and on passengers. Using data from a specific metro, the authors illustrate that although rolling stock incidents were the most frequently occurring type of incident, their impact on delays was minimal compared to other types of incidents. In such cases, MDBF can wrongly lead metros to target rolling stock incidents. The authors endorse total hours of train delay and passenger delay due to incidents as alternative measures to MDBF, but also point out that, out of the 22 metros surveyed, only two were able to provide sufficient data to estimate the number of passengers affected by incidents.

Mazloumi et al (13) evaluate various measures of travel time reliability and variability for use in the public transport industry by applying them in a database of automatic vehicle location data from buses. For travel time reliability, the study considers the following measures:

- Buffer Index = (95th percentile TT − mean TT) / mean TT × 100%, where TT denotes Travel Time

- Florida Reliability Index = 100% − % of trips with travel time greater than the average

- Misery Index = (mean TT of the longest 20% of trips − mean TT of all trips) / mean TT of all trips

For travel time variability, the study considers the coefficient of variation and the difference between the 90th travel time percentile and the 10th travel time percentile.
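
To make these definitions concrete, the sketch below computes the reliability and variability measures considered by Mazloumi et al (13) for a small sample of observed travel times. The example data, variable names and percentile conventions are assumptions for illustration only, not the original study's implementation.

```python
import numpy as np

# Hypothetical observed travel times for one route segment, in minutes.
tt = np.array([22, 23, 24, 24, 25, 25, 26, 27, 29, 33, 38, 45], dtype=float)

mean_tt = tt.mean()
p95 = np.percentile(tt, 95)

# Buffer Index: extra time (relative to the mean) needed to cover the 95th percentile trip.
buffer_index = (p95 - mean_tt) / mean_tt * 100

# Florida Reliability Index: 100% minus the share of trips slower than the average.
florida_index = 100 - (tt > mean_tt).mean() * 100

# Misery Index: how much worse the slowest 20% of trips are than the overall mean.
worst_20pct = np.sort(tt)[-max(1, int(round(0.2 * len(tt)))):]
misery_index = (worst_20pct.mean() - mean_tt) / mean_tt

# Variability measures: coefficient of variation and the 90th-10th percentile spread.
cov = tt.std(ddof=1) / mean_tt
p90_p10_spread = np.percentile(tt, 90) - np.percentile(tt, 10)

print(buffer_index, florida_index, misery_index, cov, p90_p10_spread)
```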

The study finds no considerable difference between measures in terms of results. However, measures are found to be sensitive to the extent to which they consider outliers (e.g. if the 99th instead of the 90th percentile is used in the travel time variability measure discussed above). The time interval over which measurements were undertaken was found to have a strong impact on results. The authors propose the use of the Buffer Index to measure travel time reliability and the use of the coefficient of variation to measure travel time variability.

Trompet et al (12) review the suitability of various regularity measures for benchmarking analyses of high frequency urban bus services. The measures are evaluated based on four criteria: Ease of communication, objectivity, how closely they represent the passenger experience (including if very long headway gaps are penalised) and their ability to meet data condition requirements. The analysis is illustrated using data from 12 international operators that were part of the International Bus Benchmarking Group (IBBG) at the time. The following four measures are considered:

- Excess Wait Time (EWT), defined as EWT = Σ(AH²) / (2 × ΣAH) − Σ(SH²) / (2 × ΣSH), where AH is the actual interval between trains (in minutes) and SH is the scheduled interval between services (in minutes)

- Standard deviation of the difference between scheduled and actual headways

- % of headways that deviate by no more than a fixed % threshold

- % of headways that deviate by no more than a fixed absolute threshold (all four measures are illustrated in the sketch below)
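
As referenced above, the sketch computes the four regularity measures from matched actual and scheduled headways. The example data, the 50% relative tolerance and the 2 minute absolute tolerance are illustrative assumptions, not the IBBG specification.

```python
# Illustrative actual and scheduled headways (minutes) for a high frequency service.
actual = [4.0, 7.5, 2.0, 6.0, 5.5, 3.0, 9.0, 5.0]
scheduled = [5.0] * len(actual)

# Excess Wait Time: expected waiting time implied by the actual headways minus
# the waiting time implied by the scheduled headways.
awt = sum(h * h for h in actual) / (2 * sum(actual))
swt = sum(h * h for h in scheduled) / (2 * sum(scheduled))
ewt = awt - swt

# Standard deviation of the difference between scheduled and actual headways.
diffs = [a - s for a, s in zip(actual, scheduled)]
mean_diff = sum(diffs) / len(diffs)
sd_diff = (sum((d - mean_diff) ** 2 for d in diffs) / (len(diffs) - 1)) ** 0.5

# Share of headways within a relative tolerance and within an absolute tolerance.
pct_within_relative = sum(abs(a - s) <= 0.5 * s for a, s in zip(actual, scheduled)) / len(actual)
pct_within_absolute = sum(abs(a - s) <= 2.0 for a, s in zip(actual, scheduled)) / len(actual)

print(ewt, sd_diff, pct_within_relative, pct_within_absolute)
```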

The paper illustrates that the relative performance of peers depends on the indicator chosen for the analysis. The study chooses Excess Wait Time for benchmarking purposes for IBBG members for a number of reasons. Firstly, performance rankings based on the measure are close to the mean value over all rankings from the four measures considered. The measure penalises long headways, which can be a major source of dissatisfaction for bus users, and was also found to most closely represent the experience of all customers. Importantly for benchmarking purposes, comparable data can be obtained from the different bus operators to estimate the measure.

Currie et al (11) also critically review measures of bus service reliability. The authors use similar criteria to Trompet et al (12), but their evaluation includes a larger number of measures, considers both low frequency and high frequency bus services and does not focus solely on benchmarking.

The paper considers the following measures:

- % of services cancelled, % of services departing on-time and % of services arriving on-time
- Excess Wait Time (EWT)
- Mean delay
- Statistical measures of variability: the coefficient of variation of headways proposed by the ‘US Transit Capacity and Quality of Service Manual’ for high frequency services and the coefficient of variation of delays used by the UK Passenger Demand Forecasting Handbook (ATOC 2002)
- The reliability buffer index used by the US Department of Transport Federal Highway Administration (FHA 2004), defined as (95th percentile − average time) / average time
- Customer Journey Time Delay (CJTD), defined as the difference between expected and actual travel time from start to finish of the bus trip
- Passenger satisfaction surveys and customer complaints

The study advocates the use of EWT and CJTD. Both measures are easy to communicate and have a high customer focus. CJTD has the advantage that it takes into account both waiting time and on-board delays whereas EWT only captures delays in waiting time. CJTD is also applicable to both high and low frequency services, whereas EWT is only suitable for high frequency services. On the other hand, data collection is significantly less demanding for EWT.
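
To make the contrast between a waiting-time measure such as EWT and a door-to-door measure such as CJTD concrete, the short sketch below computes a customer journey time delay for a handful of hypothetical passenger trips. The record format, the flooring of early arrivals at zero and the simple averaging are illustrative assumptions, not the exact formulation of Currie et al (11).

```python
# Hypothetical passenger trips: (expected travel time, actual travel time) in minutes,
# measured from arrival at the origin stop to alighting at the destination.
trips = [(20, 24), (20, 20), (35, 41), (15, 15), (25, 33)]

# Customer Journey Time Delay: actual minus expected time, floored at zero so that
# early arrivals do not offset delays (an assumption; the original definition may differ).
cjtd_per_trip = [max(0, actual - expected) for expected, actual in trips]
mean_cjtd = sum(cjtd_per_trip) / len(cjtd_per_trip)

print(cjtd_per_trip, mean_cjtd)
```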

METHODOLOGY

The research is based on a survey of 13 suburban railway operators from Europe, Asia, America and Australia that are part of the ISBeRG benchmarking group, which is managed by the Railway and Transport Strategy Centre (RTSC) at Imperial College London. ISBeRG is an international benchmarking group for suburban rail operators that aims to identify and share best practices in a confidential environment. At the time of the study it consisted of 15 operators from the following cities around the world: Barcelona, Brisbane, Cape Town, Copenhagen, Hong Kong, London, Melbourne, Munich, New York (two operators), Oslo, San Francisco, São Paulo, Sydney and Tokyo. Information sharing within ISBeRG is covered by a confidentiality agreement and therefore study results can only be published in an anonymised form. As a result, participating railways will not be identified by name throughout the paper.

A questionnaire was sent out to all ISBeRG members in January 2015. The survey looked at railways’ experience with reliability measurement. It included questions on what measures are used by railways to measure reliability for different purposes, on the reasons for choosing the specific measures and on details regarding the specification of each measure (e.g. the threshold for defining a train as late). Responses were received from 14 operators. Follow-up questions were sent to clarify ambiguous items in responses. One of the operators participating in the survey has a headway based operation, whereas the remaining 13 operators have timetable based operations. The final paper is based on the responses of the 13 operators that run a service based on timetables. The headway based operator was excluded from the analysis presented in the paper because reliability measures for headway based operation are inherently different and cannot be directly compared to measures for timetable based operation. Suburban railways usually run a timetable based service. Railways running a frequency based operation are closer to metros and hence the reliability measures they use should be compared to those used by metro operators. To provide context, mean train trip distance in the railways considered in the paper ranges from 19 to 78 km; mean train trip duration ranges from 31 to 81 minutes. The mean passenger trip ranges in distance from 6 to 47 km and in duration from 14 to 52 minutes.

The next section presents and discusses the results of the survey.

FINDINGS AND DISCUSSION

To properly evaluate reliability measures, it is important to understand the context in which they are used. Naturally, reliability measurements are primarily used for internal management to enable improvements in service performance. The extent to which different staff within the company have access to reliability results varies between railways. On one hand, some railways aim to restrict access to a few staff due to the fear of media leaks; on the other, some railways are moving towards open access to detailed reliability results for all staff to enable use of the data by local managers. All railways participating in the study reported having an obligation to report on reliability to a franchisor, transport authority or government agency, although reporting requirements vary. 54% of railways also report receiving bonuses or penalties related to reliability. Most, but not all, railways share reliability results with the public on a regular basis. The frequency of reporting ranges from detailed daily updates to yearly summary reports.

A key finding of the study is that most railways choose reliability measures either based on their reporting obligations to regulators (62% of railways) or simply because these are typically used within the railway industry (38% of railways). Few railways provided a concrete reasoning for choosing specific measures. Reasons reported include:

- The need for different measures for different audiences/purposes, to justify the use of multiple measures (15% of railways)

- The need for measures to reflect the service as experienced by passengers, the ultimate users of the service, to justify measures estimating the impact on passengers (23% of railways)

- The need for measures that are easy to calculate and simple to understand; the need to strike a balance between the work involved and results (8% of railways)

Although few railways can adequately justify the use of specific measures, the railways that do so touch on some very important issues that should be considered when measuring reliability in the railway industry context:

- The need for measures to describe the true passenger experience
- The need for measures to be easily communicated
- The cost involved in calculating measurements
- The trade-off between complexity and ease of communication

These issues will be considered when discussing the measures used by railways in the remainder of this section.

The study questionnaire asked railways to list all measures used to measure and report on journey time reliability. Results are summarised in FIGURE 1.

FIGURE 1 Journey time reliability measures used by railways.

The most frequently used measures are the % of actual trains that ran on-time (62% of railways) and the % of scheduled trains that ran on-time or were late (52% of railways). Only one railway does not use at least one of the two measures. The key difference between these two measures is that the former (% of actual trains) excludes cancelled trains from the denominator, whereas the latter (% of scheduled trains) includes these. Railways that use the % of actual rather than scheduled trains that ran on-time typically use another measure capturing cancellations such as the % of trains that were cancelled.

The key advantage of using the % of trains (either scheduled or actual) that ran on-time is the measure’s simplicity. It is easy to estimate – the data required are typically collected by railways – and also easy to communicate, including to the public and the media. However, the measure only captures how often delays occur – it does not capture the magnitude of delays or their impact (i.e. by how much trains are actually delayed and how many passengers are affected). It is recommended that the measure is used, but that its use is complemented by other measures capturing the magnitude of delays and their impact. Alternatively, the impact in terms of affected passengers may also be implicitly captured by using separate estimates for the peak and off-peak periods, as well as weekends. Disaggregating results by time period is good practice for all reliability measures, particularly for internal management, as it allows railways to identify when issues occur and to understand the impact on different market segments (e.g. commuters, leisure travellers). Disaggregation of results by time period will be discussed in more detail later in the section.
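
The denominator difference between these two headline measures can be made explicit in a small sketch. The train records below and the 5 minute lateness threshold are illustrative assumptions; each railway applies its own threshold, as discussed later in the paper.

```python
# Hypothetical daily train records: (arrival delay in minutes, cancelled flag).
# Cancelled trains have no recorded arrival delay.
trains = [(0.0, False), (2.5, False), (6.0, False), (1.0, False),
          (12.0, False), (None, True), (0.5, False), (None, True)]

LATE_THRESHOLD = 5.0  # minutes; a train arriving 5:00 or more after schedule counts as late

scheduled = len(trains)
run = [delay for delay, cancelled in trains if not cancelled]
on_time = sum(delay < LATE_THRESHOLD for delay in run)

pct_actual_on_time = on_time / len(run) * 100       # cancellations excluded from the denominator
pct_scheduled_on_time = on_time / scheduled * 100   # cancellations count against the measure
pct_cancelled = (scheduled - len(run)) / scheduled * 100

print(pct_actual_on_time, pct_scheduled_on_time, pct_cancelled)
```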

Total hours of train delay and mean train delay also appear to be relatively common measures in the railway industry; they are used by 23% and 15% of participating railways respectively. Mean train delay does capture the mean magnitude of delays in addition to how often delays occur, but has been criticised in the literature for not capturing the distribution of delays (e.g. Börjesson and Eliasson (9)). For instance, mean delay is 1 minute both if 2% of trains are 50 min late and if 20% of trains are 5 min late. However, passengers may perceive few long delays differently to many short delays. Börjesson and Eliasson (9) use stated-preference data from Sweden to show that many short delays are preferable to fewer long delays for passengers on long-distance trains. On the other hand, the Passenger Demand Forecasting Handbook (PDFH) (15), which is used by all major organisations in the UK railway sector, reviews a wide range of studies conducted in the UK and concludes that passengers respond according to the average magnitude of lateness, irrespective of the distribution. It can be concluded that whether mean delay can adequately describe the actual passenger experience may depend on local factors and hence may vary between railways. It is recommended that railways conduct their own research to understand what matters to their passengers and adopt measures that capture the distribution of delays in more detail if necessary.
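
The point about mean delay masking the distribution can be reproduced numerically: the two hypothetical services below have the same mean delay of 1 minute but very different delay profiles, following the worked example in the text.

```python
# Service A: 2% of trains 50 minutes late, the rest on time.
# Service B: 20% of trains 5 minutes late, the rest on time.
service_a = [50.0] * 2 + [0.0] * 98
service_b = [5.0] * 20 + [0.0] * 80

mean_a = sum(service_a) / len(service_a)   # 1.0 minute
mean_b = sum(service_b) / len(service_b)   # 1.0 minute

# The share of trains delayed, and the worst observed delay, differ sharply.
pct_late_a = sum(d > 0 for d in service_a) / len(service_a) * 100   # 2%
pct_late_b = sum(d > 0 for d in service_b) / len(service_b) * 100   # 20%

print(mean_a, mean_b, pct_late_a, pct_late_b, max(service_a), max(service_b))
```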

Considering the minimum and maximum delay alongside mean delay, as one railway reported doing, can add to the understanding of the distribution of delays. In this study, 31% of railways also reported distinguishing between short and long delays when estimating reliability measures (e.g. estimating the % of trains that arrived within different time thresholds). Similarly, the UK rail industry uses a ‘Cancellation and Significant Lateness’ (CaSL) measure, defined as the % of scheduled trains which are either cancelled (including cancelled en-route) or arrive at their destination more than 30 minutes late (16).

Moussa and Stephan (10) propose a new measure of the risk of train delay, called Delay-at-Risk, which is based on the Value-at-Risk measure employed in the financial risk management literature. Value-at-Risk is defined as the maximum potential change in the value of a portfolio of financial instruments with a given probability over a certain horizon. Delay-at-Risk summarises the train delay risk and allows the estimation of a probability distribution for different levels of delay. Although such sophisticated measures can provide a more accurate summary of delays in a railway network, their complexity makes them impractical to use in the railway industry. Their estimation requires a high level of statistical expertise and can also be costly; moreover, results are difficult to communicate, possibly not only to the public but also internally to staff as well as to railway regulators. As such, they may be more suitable for use in academic or other research rather than everyday railway management.

As an alternative to mean delay, one participating railway reported using Excess Wait Time (defined earlier in the paper). A disadvantage of the measure is that it only captures delays in waiting time. Its definition is also more suitable for headway based operations and/or very high frequency services.

All measures discussed to this point are operation focused in the sense that they measure the network’s performance but not the impact on passengers. Barron et al (14) distinguish between operation-oriented and passenger-oriented measures. Operation-oriented measures assess the network’s performance compared to the scheduled level of service, whereas passenger-oriented measures aim to capture what passengers experience. Passenger-oriented measures employed by the railways surveyed for this study include the % of passenger journeys that are on-time (15% of railways) and total or mean passenger delay (15% of railways). The % of passenger journeys that are on-time is a good example of a measure that is simple to understand, meaningful to passengers and, if correctly estimated, representative of the true passenger experience. It is thus a useful measure both for internal management and external reporting. However, collecting data and accurately estimating the measure can be difficult and costly. Use of this measure, and of all passenger-oriented measures, is recommended, but only if these can be accurately measured.
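
Where train loadings are available, passenger-oriented measures can be approximated from train-level data, as in the hedged sketch below. The per-train passenger loads, the 5 minute threshold and the assignment to each passenger of the delay of their train are illustrative assumptions; accurate estimates require journey-level data (e.g. from ticketing or passenger counts).

```python
# Hypothetical train records: (arrival delay in minutes, passengers on board).
trains = [(0.0, 600), (3.0, 850), (7.0, 900), (1.0, 300), (15.0, 750)]

LATE_THRESHOLD = 5.0  # minutes

# Total and mean passenger delay: each train's delay weighted by its load.
total_passenger_delay_hours = sum(delay * pax for delay, pax in trains) / 60
mean_passenger_delay = sum(delay * pax for delay, pax in trains) / sum(pax for _, pax in trains)

# % of passenger journeys on-time, approximated by assigning each passenger
# the delay of the train they travelled on (ignores missed connections etc.).
on_time_pax = sum(pax for delay, pax in trains if delay < LATE_THRESHOLD)
pct_passenger_journeys_on_time = on_time_pax / sum(pax for _, pax in trains) * 100

print(total_passenger_delay_hours, mean_passenger_delay, pct_passenger_journeys_on_time)
```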

Some railways use indirect passenger-oriented measures that weight delays or incident occurrence based on factors such as the time of day they occur, the location or the cause of the incident/delay. Such measures can provide a useful approximation of the impact of delays on passengers, but they are not an accurate representation of the passenger impact and this should always be kept in mind when they are used. Such measures may also not be suitable for public reporting as they can be difficult to explain to passengers.

Several measures typically used for highway performance, and not used by participating railways, such as the additional travel time that a traveller needs to take into account due to travel time variability to ensure they will arrive on-time (schedule delay), the probability of arrival within a given time threshold and the buffer index (see Li et al (4), Lomax (17) and FHA (18)) are also representative of the passenger perspective. Although these more sophisticated measures reflect the user experience, they are not recommended for use in the railway industry context for a number of reasons. Firstly, they may not be useful for internal management as they cannot necessarily convey where issues occur and where efforts to improve service performance should be concentrated. Moreover, they are difficult to communicate not only to the public but also to regulators and even internally to staff. The high cost of implementation is an additional drawback.

The study questionnaire specifically asked railways whether they consider any measure of travel time variability such as the standard deviation/coefficient of variation of travel time or delays. Such measures have been used to measure travel time variability in highways (see Li et al (4)) and considered for use for bus services (see Currie et al (11), Mazloumi et al (13)). Travel time variability measures capture how journey times or delays vary with time, i.e. how predictable the service is. No railway reported using a measure of variability of travel time or delays. Such measures could be useful for internal management; measures such as the coefficient of variation should not be difficult to estimate. They are not suitable for reporting to the public or even to regulators as they are not intuitive to individuals with no statistical training. Journey times and delays vary less in railway services that run to fixed timetables than for buses or cars, so such measures may not be very informative/useful in the railway industry context. This could explain why they appear not to be used.

The study questionnaire also asked railways whether they use the same metrics to report to different audiences. It considered three types of audiences: internal staff; the franchisor, transport authority or government agency; and the public. 28% of railways use the same measures to report to all audiences considered in the study. 38% of railways report fewer measures to the public than those used for other purposes, or do not report at all to the public. 15% of railways report the same measures to all external audiences (both authorities and the public) but use additional measures for internal management. Finally, in 8% of railways different measures are used internally than those reported to the franchisor, transport authority or government agency, although some common measures are also used for the two purposes. A subset of the common measures is also reported to the public.

In addition to the type of metric, railways should also consider where and when reliability should be measured. The survey showed that most railways consider on-time arrival at the final destination or a central station only, despite recording data for all stations. When on-time arrival at specific stations only is considered, monitoring stations which are important for passengers should be chosen. If this is not the final destination of the train, alternative monitoring stations should be chosen, such as central or interchange stations. If a train is late at various busy stations en route but arrives on-time at the final destination, it will be counted as on-time although many passengers may perceive it as late. This can be common as slack time may be scheduled for the last part of the journey, which may be less busy if it is, for instance, in an outlying suburban section. From the passengers’ perspective, it is more important that trains are on-time at important stations rather than at outlying suburban stations. An interesting alternative reported by a participating railway is using the % of train stops that are on-time rather than the % of trains that are on-time. Although the measure requires more data to be collected, most railways reported collecting data on on-time arrival at all stations, so the measure should not be difficult to estimate. Since most railways collect data at all stations, reliability could also be monitored separately at each station.
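
A small sketch of the difference between measuring on-time arrival only at the final destination and measuring it at every stop is given below. The stop-level delays are invented; the example simply shows how a train that recovers late running before its terminus counts as on-time under the train-level measure while most of its stops count as late under the stop-level measure.

```python
# Hypothetical trains: list of (station, arrival delay in minutes) along the route.
# The last entry of each list is the final destination.
trains = [
    [("Central", 6.0), ("Junction", 7.0), ("Suburb A", 5.0), ("Terminus", 2.0)],  # recovers late running
    [("Central", 0.0), ("Junction", 1.0), ("Suburb A", 0.0), ("Terminus", 0.0)],
]

LATE_THRESHOLD = 5.0  # minutes

# Measure 1: % of trains on-time at the final destination only.
on_time_at_terminus = sum(stops[-1][1] < LATE_THRESHOLD for stops in trains)
pct_trains_on_time = on_time_at_terminus / len(trains) * 100                # 100%

# Measure 2: % of all train stops that are on-time.
all_stops = [delay for stops in trains for _, delay in stops]
pct_stops_on_time = sum(d < LATE_THRESHOLD for d in all_stops) / len(all_stops) * 100  # 62.5%

print(pct_trains_on_time, pct_stops_on_time)
```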

Measuring reliability at important/busy stations is good practice for internal reporting. It should, however, be noted that measured performance may appear worse this way, so this may not necessarily be good practice when reporting reliability to a franchisor, authority or government agency. For reporting to the public, reporting results from key stations is good practice even if results are worse; reporting better results for less important stations may make the railway appear non-transparent.

As mentioned earlier when discussing the % of trains on-time, estimating reliability measurements for different time periods can provide an insight into when most delays occur and consequently some understanding of how many passengers are affected and which market segments in particular. 77% of surveyed railways consider some disaggregation by time period (e.g. separate results for the peak and off-peak periods) for figures used in internal management. 54% also report disaggregate figures to the regulator (i.e. franchisor, transport authority or government agency). 31% of railways estimate figures specifically for weekends for internal purposes.

Some railways also report reliability results for different time periods to the public. Statistics for peak periods are typically reported separately. This can appear a natural choice as peak periods are when most passengers travel and when most passengers are interested in arriving on-time. Moreover, peak periods are often when most delays occur due to both the high utilisation of network capacity and the high passenger volumes. However, off-peak periods should not be ignored. Off-peak demand is typically more elastic, with a greater proportion of leisure and other more discretionary travellers who may choose to use another mode, or not travel at all, if the service is considered unreliable. During the peak periods, most passengers are typically commuters, who may have little alternative but to use the train service, at least in the short term. Moreover, works, which can disrupt the network and cause delay, are typically scheduled for off-peak periods and especially weekends.

Lastly, the study examined railways’ time thresholds for defining a late train. FIGURE 2 shows the frequency of use of thresholds used by railways (for railways running both suburban and intercity services, thresholds for suburban services are included). Thresholds vary from 2:59 minutes to 5:59 minutes after the scheduled arrival time. Some railways also register delays internally after 0:59 or 1:59 minutes (not included in diagram) to enable more effective management of reliability.

FIGURE 2 Thresholds for considering a train to be late.

Different time thresholds may be suitable for different railways. Thresholds should be chosen based on journey time length for routes on railways’ networks, the average passenger journey length, service frequency, and local culture (this may affect what a passenger considers as late). Thresholds need to reflect what passengers are likely to consider as late. It is useful to choose a threshold that is also used by some other railways (or that is lower than commonly used thresholds) so the choice can be supported if challenged by the public or other organisations (e.g. regulators or the media).

It may also be appropriate to use different thresholds for different purposes. In Great Britain, the Public Performance Measure (PPM), which records the % of trains delayed by 5 minutes or more (10 minutes or more for long distance services), is the normal measure for reporting, including to the public. Regulatory targets are also set for operators in terms of PPM. However, “Right-time” data has also been recorded since 2012. “Right-time” performance measures the percentage of trains arriving at their terminating station early or within 59 seconds of schedule (16). “Right-time” data is made available to the public but is not included within formal reporting or regulatory targets (19). Recording data at a lower threshold can help operators identify problems with reliability before these become more severe, for example by identifying where amendments to the timetable may be required. When selecting appropriate thresholds for measurement it is therefore important to consider the purpose of the measurement and how results can be used and acted upon.
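
To illustrate how the chosen threshold changes the headline figure, the sketch below evaluates the same arrival data against a “Right-time” style threshold (within 59 seconds) and PPM-style thresholds (within 5 or 10 minutes). The lateness values are hypothetical.

```python
# Hypothetical arrival lateness at the terminating station, in seconds.
lateness = [0, 30, 75, 150, 290, 310, 480, 620, 900, 45]

# On-time percentage under a "right-time" style threshold and PPM-style thresholds.
thresholds = {
    "right-time (<60 s)": 60,
    "PPM suburban (<5 min)": 300,
    "PPM long distance (<10 min)": 600,
}

for label, limit in thresholds.items():
    pct = sum(sec < limit for sec in lateness) / len(lateness) * 100
    print(f"{label}: {pct:.0f}% on-time")
```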

RECOMMENDATIONS

TABLE 1 summarises the advantages and disadvantages of all measures discussed in the previous section. It also provides recommendations for each measure regarding potential use to report both internally and externally. Some key recommendations for reporting on reliability to different audiences are presented below.

Internal Reporting

- When measuring on-time arrival, in addition to the externally used delay threshold, railways should also use a lower and a higher threshold to better understand problems in their networks.

- Delay thresholds for internal measures may depend on what magnitude of delay will typically cause knock-on delays.

- Reliability should be measured at key stations in the network, such as main destination stations.

- Disaggregating reliability measurements by time of day can help railways understand when issues occur. The same is true of disaggregation by day of the week (weekdays versus weekends).

- The following measures are recommended for use for internal management purposes:
  o % of trains arriving on-time at key stations
  o % of stops on-time
  o Mean/total hours of train delay, if passengers are not sensitive to the distribution of delays and if passenger delay can be accurately measured
  o % of passenger journeys on-time, if it can be accurately measured
  o Weighted delays or incident occurrence (weights based on location, time of day etc.), if carefully constructed
  o Mean/total hours of passenger delay, if passengers are not sensitive to the distribution of delays

Reporting to a franchisor, transport authority or government agency

Unavoidably, specification of measures for reporting to a franchisor, transport authority or government agency does not generally fall within the remit of railways. However, railways can negotiate for the use of meaningful measures that they are likely to want to improve (e.g. measures aligned with the railway’s internal goals).

Reporting to the public

- Choosing a delay threshold should be based on local conditions (e.g. what passengers are likely to perceive as late), but also ensure that the railway is not unnecessarily portrayed in a negative way. Thresholds can be longer than those used internally. Using a threshold that other railways also use can help defend a railway’s choice if challenged by the public.

- If separate results for the peak are shared with the public, it is recommended that results for the off-peak are also reported.

- Reliability at stations that are important to passengers, or results that take into account all stations in the network (e.g. % of stops that are on-time), should be reported. However, if all stations are taken into account when measuring reliability, railways should avoid skewing results towards minor stations.

- The following measures are recommended for use to report on reliability to the public:
  o % of trains arriving on-time at stations that matter to the public (e.g. main destination stations)
  o % of stops on-time
  o % of passenger journeys on-time, if it can be accurately measured

TABLE 1 Summary of recommendations for different reliability measures

% of trains on-time
- Advantages: Easy to estimate; railways typically already collect the required data; easy to communicate (e.g. to passengers and media)
- Disadvantages: Does not capture how many passengers are delayed or how long delays are; may not be representative of the true passenger experience
- Recommendations for internal reporting: Report for several key stations to understand where issues occur; use several delay thresholds to get an understanding of the magnitude of delays
- Recommendations for reporting to the public: Report for stations that are important to passengers (e.g. busy stations); report separately for the peak and off-peak
- Used by railways: Yes

% of stops on-time
- Advantages: Easy to estimate; most railways reported collecting the required data; easy to communicate
- Disadvantages: May mask reliability results at key stations (if all stations are weighted equally)
- Recommendations for internal reporting: Recommended; reliability at key stations should also be estimated separately
- Recommendations for reporting to the public: Recommended
- Used by railways: Yes

Mean delay per train / total hours of train delay
- Advantages: Easy to estimate; provides some understanding of the magnitude of delays in the network
- Disadvantages: Does not capture the distribution of delays and hence may not be representative of the passenger experience; studies are inconclusive regarding whether passengers perceive few long delays the same as many short delays
- Recommendations for internal reporting: Conduct research on whether your passengers perceive few long delays the same as many short delays before using the measure
- Recommendations for reporting to the public: Not recommended for reporting to the public as it highlights delays; % of trains on-time should be preferred
- Used by railways: Yes

% of passenger journeys on-time
- Advantages: Captures the passenger experience; easy to communicate; meaningful to passengers
- Disadvantages: Difficult and potentially costly to estimate accurately
- Recommendations for internal reporting: Recommended for use if it can be accurately measured
- Recommendations for reporting to the public: Recommended for use if it can be accurately measured
- Used by railways: Yes

Weighted delays or incident occurrence*
- Advantages: Relatively easy to estimate; attempts to capture the passenger experience
- Disadvantages: Not an accurate measure of the number of affected passengers and should not be treated as such; may be difficult to communicate to some audiences
- Recommendations for internal reporting: A carefully constructed measure could be useful for internal management
- Recommendations for reporting to the public: Not readily understandable by the public; reporting delays/incident occurrence by time of day/location may be preferable
- Used by railways: Yes

Total/mean passenger delay
- Advantages: Captures the passenger experience
- Disadvantages: Difficult and potentially costly to estimate accurately
- Recommendations for internal reporting: Recommended for use if it can be accurately measured
- Recommendations for reporting to the public: Not recommended for reporting to the public as it highlights delays; % of passengers on-time should be preferred
- Used by railways: Yes

Excess wait time
- Advantages: Relatively easy to estimate
- Disadvantages: Somewhat difficult to communicate; more suitable for measuring delays in waiting time (does not capture delays which occur en-route)
- Recommendations for internal reporting: Not recommended for use in timetable based operation; more suitable for headway based operations or very high frequency services
- Recommendations for reporting to the public: Not recommended for reporting to the public; may not be readily understandable by the public
- Used by railways: Yes

Min, max delay
- Advantages: Easy to estimate; easy to communicate
- Disadvantages: Sensitive to outliers; hence not particularly informative
- Recommendations for internal reporting: Not recommended for internal use as it is likely to only capture extreme cases; the % of trains arriving within different delay thresholds is recommended as an alternative
- Recommendations for reporting to the public: Not recommended for external reporting
- Used by railways: Yes

Delay-at-Risk (Moussa and Stephan (10))
- Advantages: Accurately describes the distribution of delays, so captures both the frequency and the magnitude of delays
- Disadvantages: Too complex to estimate and communicate to be effective for day-to-day management purposes
- Recommendations for internal reporting: Not recommended
- Recommendations for reporting to the public: Not recommended
- Used by railways: No

Schedule delay
- Advantages: Reflects the passenger experience
- Disadvantages: High cost of implementation; difficult to communicate
- Recommendations for internal reporting: Not recommended; high cost of implementation and does not convey where issues occur
- Recommendations for reporting to the public: Not recommended; difficult to communicate
- Used by railways: No

Probability of arrival within a given time threshold
- Advantages: Reflects the passenger experience
- Disadvantages: High cost of implementation; difficult to communicate
- Recommendations for internal reporting: Not recommended; high cost of implementation and does not convey where issues occur
- Recommendations for reporting to the public: Not recommended; difficult to communicate
- Used by railways: No

Buffer index
- Advantages: Reflects the passenger experience
- Disadvantages: High cost of implementation; difficult to communicate
- Recommendations for internal reporting: Not recommended; high cost of implementation and does not convey where issues occur
- Recommendations for reporting to the public: Not recommended; difficult to communicate
- Used by railways: No

Standard deviation or coefficient of variation of travel time/delays
- Advantages: Easy to estimate
- Disadvantages: Not intuitive to individuals with no statistical training; may not be informative if journey times and delays do not vary substantially
- Recommendations for internal reporting: Only useful for railways with a large variation in travel times/delays; such variation is not typical in railways
- Recommendations for reporting to the public: Not recommended
- Used by railways: No

*Weights based on factors such as the time of day delays/incidents occur, the location or the cause of the incident/delay.

CONCLUSIONS

This paper includes a review of literature on reliability measurement in public transport, the results of a global survey of suburban rail operators and an assessment of the value of specific reliability measures. Reliability is one of the top factors influencing customer satisfaction with passenger rail services. It affects the level of demand for the service as passengers place a large negative value on delays. This matters to service providers, as it drives the fare revenue, and to policy makers as it influences mode share.

A key element identified in this research was the need to ensure that measures are defined appropriately for the different purposes for which they are required. These typically fall into three categories: internal measurements to manage the service, reporting to governments/authorities or franchisors for regulatory purposes and external reporting to customers and the media. Different types of measures, definitions, degrees of aggregation, measurement thresholds and levels of complexity will be optimal for each of these purposes. However, only 15% of railways surveyed reported specifically considering the need for different measures for different audiences/purposes when choosing the reliability measures they currently use.

Most railways choose reliability measures either based on their reporting obligations to regulators (62% of railways) or simply because these are typically used within the railway industry (38% of railways). Few railways provided a clear reasoning for choosing specific measures. Those that did highlighted the need for measures to describe the true passenger experience, to be easily communicated, to be cost effective to measure and to provide an appropriate trade-off between complexity and ease of communication.

The trade-off between accuracy and ease of communication is a theme that runs through the discussion above. Many sophisticated measures may be more accurate, but ultimately measures are used to communicate results to others and so must be easily understandable. Simple measures such as the % of trains arriving on-time may not be an accurate representation of the true experience, but they have the advantage that they can be easily understood by the public (e.g. passengers, journalists). Similarly, measurement thresholds should be considered in relation to the purpose of the measure; this may be different for public reporting and for internal management of service operation.

Complex measures can be more informative for understanding reliability issues internally, as well as passengers’ experience, but may not be suitable for public reporting. Such measures can be used for internal reporting and reporting to regulators/authorities, while simpler ones might be more appropriate for public reporting.

Disaggregating results by time period is also valuable for both internal and external purposes. 77% of the railways surveyed consider some disaggregation by time period (e.g. separate results for the peak and off-peak periods). Delays during the peak periods typically affect most passengers; however, off peak demand may be more discretionary with more alternatives available to passengers if the rail service is not perceived as being satisfactory.

This paper offered several recommendations for reporting on reliability to different audiences. It should be noted that a single measure for each audience cannot be recommended. The best measures to use will depend on the specific characteristics of each railway. Such characteristics include train frequency (high versus low frequency), type of operation (e.g. terminus operation versus cross-city) and passengers’ usual travel purpose (e.g. mainly commuter versus mixed passenger traffic).

REFERENCES

1. Rehnström, G., Increasing the service quality of metropolitan railways. Report 3. 49th International UITP Congress, Stockholm, 1993.

2. Lombart, A. and M. Favre. Global quality of metros. Report 2, 51st International UITP Congress, Paris, 1995.

3. Morpace International Inc. A Handbook for Measuring Customer Satisfaction and Service Quality. Transit Cooperative Research Program Report 47, National Research Council, Washington, D.C., 1999.

4. Li Z., Hensher D. & Rose J.M., Willingness to pay for travel time reliability in passenger transport: A review and some new empirical evidence. Transportation Research Part E, Vol. 46, 2010, pp. 384–403.

5. Bates, J., Polak, J., Jones, P. and Cook, A., The valuation of reliability for personal travel, Transportation Research Part E , Vol. 37, 2001, pp. 191–229.

6. De Jong, G., E. Kroes, R. Plasmeijer, P. Sanders, and P. Warffemius. The value of reliability. Applied Methods Stream, European Transport Conference, Strasbourg, 4-6 October 2004.

7. Preston, J., G. Wall, R. Batley, J.N. Ibanez, and J. Shires. Impact of delays on passenger train services: Evidence from Great Britain. Transportation Research Record: Journal of the Transportation Research Board, No. 2117, 2009, pp. 14-23.

8. Vincent, M. and B.A. Hamilton. Measurement valuation of public transport reliability. Land Transport New Zealand Research Report 339, 2008.

9. Börjesson, M. and Eliasson, J. On the use of ‘‘average delay’’ as a measure of train reliability, Transportation Research Part A, Vol. 45, 2011, pp. 171–184.

10. Moussa, M. A., Stephan, M. Extreme risk measures for train delay time. Working paper. LAMETA. Universite de Montpellier I, 2014.

11. Currie, G., Douglas, N. J., & Kearns, I. An Assessment of Alternative Bus Reliability Indicators. Paper presented at the Australasian Transport Research Forum 2012 Proceedings, Perth, Australia, 2012.

12. Trompet, M., Xiang, L., Graham, D.J. Development of a Key Performance Indicator to Compare Regularity of Service Between Urban Bus Operators. Transportation Research Record, Vol. 2216, 2011, pp. 33-41.

13. Mazloumi, E., Currie, G., Sarvi, M. Assessing Measures of Transit Travel Time Variability and Reliability Using AVL Data. Presented at the 87th TRB Annual Meeting, Washington, DC, 2008.

14. Barron, A., P.C. Melo, J.M. Cohen, and R.J. Anderson. A Passenger-Focused Management Approach to the Measurement of Train Delay Impacts. Transportation Research Record, Vol. 2351, 2013, pp. 46-53.

15. Association of Train Operating Companies, Passenger Demand Forecasting Handbook v5.1. London: ATOC, 2013.

16. Network Rail. Performance: Cancellation and significant lateness (CaSL). www.networkrail.co.uk/about/performance/. Accessed Jul. 21st, 2015.

17. Lomax T., Schrank D., Turner S. Selecting travel reliability measures. Texas Transportation Institute and Cambridge Systematics, Inc. 2013.

18. FHA. Traffic Congestion & Reliability – Trends and Advanced Strategies for Congestion Mitigation. US Department of Transportation Federal Highway Administration, 2004.

19. Office of Rail and Road (ORR). Passenger & Freight Rail Performance Quality Report. London, May 2015.
