Unravelling Urban Pedestrian Trips

Developing a new pedestrian route choice model estimated from revealed preference GPS data

By

R.E. Hintaran

in partial fulfilment of the requirements for the degree of

Master of Science in Transport, Infrastructure and Logistics

at the Delft University of Technology,

to be defended publicly on Monday January 4th, 2016 at 14:00.

Graduation committee:
Chair: Prof. dr. ir. S.P. Hoogendoorn, TU Delft
Dr. ir. W. Daamen, TU Delft
Dr. J.A. Annema, TU Delft

External supervisors:
Prof. dr. K.W. Axhausen, ETH Zürich
L. Montini, MSc, ETH Zürich

An electronic version of this thesis is available at http://repository.tudelft.nl/

Preface

This thesis presents the results of my graduation project on pedestrian route choice behaviour in urban areas. It was carried out as part of the master programme Transport, Infrastructure and Logistics at Delft University of Technology, in cooperation with ETH Zürich. However, it would not have been possible to accomplish this without the guidance and support of several people. Therefore, I would like to thank all the people who have supported me during my graduation project.

First of all, I would like to thank my daily supervisor at TU Delft, Winnie Daamen, for her scientific and personal support throughout the entire project. She was always critical and realistic, but also understanding when it was needed. It often happened that I was lost in my thesis work, but Winnie always knew how to motivate me with sharp feedback. I enjoyed the many thesis and off-thesis discussions that we had, and it was very inspiring to work with such a committed scientist. Secondly, I would like to thank my second daily supervisor at TU Delft, Jan Anne Annema, for his scientific support and his infinite optimism and enthusiasm. His encouraging feedback and positive spirit helped me to structure my work and to see the bigger picture. Thirdly, I would like to thank Serge Hoogendoorn for his encouragement, support and his natural enthusiasm for this topic. His contagious enthusiasm and bright ideas motivated me to find new solutions.

I would also like to thank Prof. Kay Axhausen for inviting me to his institute, for his support during my thesis and for giving me the freedom to define my own research project at the institute. Furthermore, I would like to thank Lara Montini, my daily supervisor at ETH Zürich, for her guidance during my time in Zürich and for her patience in teaching me to program in Java. She was always very patient and helpful, even when I was already back in the Netherlands. Also, I would like to thank my colleagues at ETH Zürich for the great and unforgettable time. In the four months that I spent there I learned a lot, and I got the opportunity to join my colleagues at the yearly Swiss Transport Research Conference.

Lastly, I would like to thank my friends for the fun times during my graduation project, and all the other graduating students of the famous Afstudeerhok for the fun times and fruitful discussions during coffee and lunch breaks and also outside the university. Special thanks go out to my parents and my sister, who were always supportive and always there for me. And also very special thanks to Eelco, for his unconditional support at all times and for always cheering me up with his positive perspective on life.

Delft, December 2015

Eka Hintaran 1383876

Contents

Introduction
1.1 Problem analysis
1.2 Conceptual framework and research objective
1.3 Contribution to practice
1.4 Contribution to science
1.5 Research approach
1.6 Scope and research limitations
1.7 Thesis outline

State-of-the-art on Pedestrian Route Choice Behaviour
2.1 Pedestrian route choice behaviour
2.1.1 Route choice decision-making
2.1.2 Environmental street characteristics influencing route choice behaviour
2.2 Conclusion

State-of-the-art on Pedestrian Route Choice modelling
3.3 Observed routes
3.3.1 Data collection methods
3.3.2 RP studies in pedestrian research
3.4 Generation of alternative routes
3.4.1 Choice Set Generation in modelling process
3.4.2 Requirements for the choice sets and the method
3.4.3 Evaluation methods
3.4.4 Different procedures
3.5 Formulation of correlation structure
3.6 Conclusion

Case study Zürich
4.1 Used data
4.1.1 Street network
4.1.2 Observed routes
4.1.3 GPS data collection and post-processing
4.2 Processing of GPS data
4.3 Map-matching procedure
4.4 Generation of alternative non-chosen routes
4.5 Calculation of route characteristics and Path-Sizes
4.5.1 Environmental street characteristics
4.5.2 Path-Size factors (overlap)
4.5.3 Writing final results for choice modelling

Analysis of GPS and generated data
5.1 Research plan
5.2 Descriptive analysis of results
5.3 Comparing the chosen routes with the alternative non-chosen routes in the choice set
5.4 Conclusion

Estimation of route choice models
6.1 Research plan
6.2 Model specification
6.3 Basic Model
6.3.1 Independent estimation of parameters
6.3.2 Basic model results
6.3.3 Conclusion and next steps
6.4 Sampling of alternatives
6.4.1 Samples
6.4.2 Sample of longest routes
6.4.3 Random sample
6.4.4 Importance Sampling 1
6.4.5 Importance Sampling 2
6.4.6 Conclusion
6.5 21 data set
6.5.1 Basic model
6.5.2 Importance Sampling
6.5.3 Conclusion
6.6 Final Conclusion

Conclusions and recommendations
7.1 Findings
7.2 Conclusions
7.3 Recommendations for science and further research
7.4 Recommendations for practice
7.5 Discussion

Bibliography

Appendix 1 Study area in MATSim format and in OpenStreetMap
Appendix 2 Example of Travel Diary
Appendix 3 Descriptive analysis chosen routes
Appendix 4 Model estimation results of sample of longest routes

List of Tables

Table 1: Overview of route attributes that form the route characteristics
Table 2: Overview of model formulations applied to slow modes
Table 3: Overview of RP studies in pedestrians' research
Table 4: Calculated Route Attributes
Table 5: Characteristics of chosen routes
Table 6: Descriptive analysis of all chosen routes
Table 7: Descriptive analysis of non-chosen routes
Table 8: Chosen route compared with alternative routes
Table 9: Shortest routes and detours
Table 10: Correlations between attributes for 20 and 21-data sets
Table 11: Route class definition
Table 12: Basic model with 20 alternatives, attributes independently estimated
Table 13: Basic model PSL results, trip length in Distance (km)
Table 14: Basic model PSL results, trip length in Route Classes
Table 15: Random sample, attributes independently estimated
Table 16: Random sample PSL results, trip length in Distance (km)
Table 17: Random sample PSL results, trip length in Route Classes
Table 18: Importance Sampling 1, attributes independently estimated
Table 19: Importance Sampling 1 PSL results, trip length in Distance (km)
Table 20: Importance Sampling 1 PSL results, trip length in Route Classes
Table 21: Importance Sampling 2, attributes independently estimated
Table 22: Importance Sampling 2 PSL results, trip length in Distance (km)
Table 23: Importance Sampling 2 PSL results, trip length in Route Classes
Table 24: Basic model 21-data set, attributes independently estimated
Table 25: Basic model 21-data set PSL results, trip length in Distance (km)
Table 26: Basic model 21-data set PSL results, trip length in Route Classes
Table 27: Importance sampling 21-data set, attributes independently estimated
Table 28: Importance Sampling 21 PSL results, trip length in Distance (km)
Table 29: Importance Sampling 21 PSL results, trip length in Route Classes
Table 30: Basic model and Importance Sampling 1 PSL results, trip length in Route Classes

List of Figures

Figure 1: Route selection scheme
Figure 2: Basic Conceptual Framework
Figure 3: Research approach and thesis outline
Figure 4: Examples of overlapping and crossing routes (Bovy & Stern, 1990)
Figure 5: Three different choice situations (Bovy & Stern, 1990)
Figure 6: From objective to subjective factors
Figure 7: Overview of the Route Choice Modelling process
Figure 8: The overlapping Path problem (Ramming, 2002)
Figure 9: Hierarchy in choice sets, from the pedestrian's and the researcher's perspective (Hoogendoorn-Lanser & van Nes, 2004)
Figure 10: Overview of Choice Set Generation Methods
Figure 11: Updated Conceptual Framework
Figure 12: Extensive public transport network of Zürich (www.stadt-zuerich.ch)
Figure 13: Study Area (left: www.openstreetmap.org; right: constructed network (MATSim, visualised in VIA))
Figure 14: Example of observed routes of one person (ArcGIS, using OSM network)
Figure 15: Example GPS tracks and GPS device
Figure 16: Comparison of GPS data with data from Mikrozensus 2010 (mode share and trip purpose)
Figure 17: Visualisation of observed routes by one person before processing of GPS data (ArcGIS, using OpenStreetMap network)
Figure 18: Processing of GPS data
Figure 19: Map-matching of GPS points
Figure 20: GPS points (red) and walking trips after Map-Matching (green)
Figure 21: Order in which the nodes are explored (stackoverflow.com)
Figure 22: BFS-LE algorithm: d = depth; Sn = additional alternatives found at depth n; S = size of the choice set; b(d) = number of candidate networks at depth d (Rieser-Schüssler, 2012)
Figure 23: Road types in the street network (visualisation in VIA)
Figure 24: Overview of route attributes calculation
Figure 25: Histogram of trip lengths in km of chosen routes
Figure 26: Distribution of walking trips by activity type
Figure 27: Route from tram station to viewpoint in OpenStreetMap
Figure 28: Route from tram station to viewpoint in VIA (left) and links used by alternative routes (right)
Figure 29: Trip from the Polybahn to the Main station in OpenStreetMap
Figure 30: Chosen trip in VIA
Figure 31: Links used by alternative routes, in VIA
Figure 32: Distribution of chosen routes ranked by distance (in percentage and counts)
Figure 33: Route classes grouped by distance
Figure 34: Frequency tables of 20-data set (left) and 21-data set (right); distribution of chosen routes ranked by distance
Figure 35: Route classes grouped by distance (20-data set)
Figure 36: Route classes grouped by distance (21-data set)
Figure 37: Histogram of distances (20-data set)
Figure 38: Histogram of distances (21-data set)
Figure 39: Trip distances of two choice sets of 20-data set (left chosen is 0,09; right 0,16)
Figure 40: Trip distances of two choice sets of 21-data set (left chosen is 0,11; right 0,08)
Figure 41: Overview of model estimations
Figure 42: Central (www.central.ch)

Executive summary

Walking is very important in our lives: for millions of years, walking has been the most basic mode of transport. However, much less research has been done on walking and pedestrians than on motorised modes. Pedestrians' route choice behaviour in particular is an interesting research topic. Knowledge about it is sparse, although it is very relevant for planning and designing public spaces (rail stations, airports) and pedestrian facilities in cities. Theory on pedestrian route choice behaviour could also support the planning and management of large events. Current trends and challenges, such as the growing world population and increasing urbanisation (both resulting in increasing pressure on urban space and its infrastructure), make this topic more and more important. Therefore, the objective of this thesis is to determine which environmental street characteristics have an influence on the route choice process. This choice process is influenced by various factors, but this thesis focuses on environmental street characteristics. The aim of this thesis is reflected in the main research question:

“Which environmental street characteristics have an influence on pedestrian route choice behaviour in urban areas?”

A literature review and a case study have been carried out to answer this main research question. The city of Zürich is taken as a case study for a revealed preference experiment and the data are collected by GPS trackers. The purpose of this thesis is to estimate a pedestrian route choice model based on revealed preference GPS data.

Literature shows that pedestrians make choices on three levels: the strategic level (departure time choice and activity pattern choice), the tactical level (activity scheduling, activity area choice and route choice to reach activity areas) and the operational level (walking behaviour). The focus in this thesis is on the tactical level: route choices from origin to destination. It is assumed that pedestrians mainly make their route choices simultaneously: they choose the entire route before departing and do not change it on the way. Which route is chosen depends on their perception of the transport network and on personal characteristics. When utility maximization is assumed, individuals choose, or intend to choose, the alternative with the highest perceived utility. Route choice behaviour of pedestrians is influenced by various factors: network characteristics, route characteristics, personal characteristics and trip characteristics. A fifth category that also influences route choices is circumstances, such as weather conditions and traffic information. Environmental street characteristics belong to the route characteristics category.

The literature study on pedestrian route choice behaviour in urban areas shows that trip length is in most cases the dominant factor in route choices. Other reported influential factors are scenery and safety, but these are not directly measurable from the network and are thus not taken into account in this thesis. The other selected attributes are road type and gradient: road type relates to safety and comfort, and gradient is related to physical effort, which is especially important in a hilly city such as Zürich. Both environmental street characteristics are measurable from the available network.

Pedestrian route choice modelling

As the aim of this thesis is to report a pedestrian route choice model estimated on the basis of revealed preference data, first a suitable method was selected for each step in the route choice modelling process. This process consists of three main steps: obtaining trip observations, generating alternative non-chosen routes and defining the correlation structure between the alternatives in the choice set. These steps are essential before the actual estimation process can start.

In this thesis, utility maximization is assumed, thus route choice behaviour is described within the discrete choice modelling framework. The main idea of utility maximization is that individuals make a subjectively rational choice between a finite number of choice options and select the alternative with the highest utility. As revealed route choices of pedestrians in an urban area are modelled, the selected model formulation needs to be able to work with a dense real-size network, to handle the extensive data set and to account for similarities between alternatives (overlap). The best option for the situation in this thesis turned out to be the Path-Size Logit model: it is able to capture overlap among routes, it is known to be sufficiently robust, it has the relatively simple MNL structure and it has been shown to perform well relative to more complex model forms in real-size networks.

For route choice modelling, both observed trips and non-chosen alternative trips are required. Together these form the choice set. In this revealed preference study, trip observations are collected using GPS technology. The non-chosen alternative routes are generated using the Breadth First Search on Link Elimination (BFS-LE) method. This algorithm has proven to be efficient and consistent in bicycle route choice studies using large urban networks, and it is computationally fast. Also, the BFS-LE method allows any (multi-attribute) cost function to be used, so environmental factors can be taken into account when generating the routes. Furthermore, the method has been shown to be able to generate heterogeneous routes. Choice set generation is a very complex task, as the analyst lacks information about the exact alternatives that are known and considered by the traveller.
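The idea behind breadth-first search on link elimination can be illustrated with a minimal sketch. It is not the implementation used in this thesis: it leaves out the topologically equivalent network reduction, any randomisation and any time limits, and the function names, the dictionary-based network and the link costs are hypothetical. The sketch computes the least-cost route, then removes each link of that route in turn, recomputes the least-cost route on every reduced network, and continues level by level until enough unique routes are found.

```python
import heapq
from collections import deque


def shortest_path(graph, origin, destination, removed=frozenset()):
    """Dijkstra on a dict-of-dicts network, ignoring the links in `removed`."""
    queue = [(0.0, origin, (origin,))]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == destination:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for nxt, link_cost in graph.get(node, {}).items():
            if nxt not in settled and (node, nxt) not in removed:
                heapq.heappush(queue, (cost + link_cost, nxt, path + (nxt,)))
    return None  # destination not reachable on this reduced network


def bfs_le(graph, origin, destination, max_routes):
    """Collect up to `max_routes` unique routes by breadth-first link elimination."""
    first = shortest_path(graph, origin, destination)
    if first is None:
        return []
    routes = [first]
    seen_routes = {first[1]}
    seen_networks = {frozenset()}
    # Each queue entry: (set of eliminated links, least-cost path on that network)
    frontier = deque([(frozenset(), first[1])])
    while frontier and len(routes) < max_routes:
        removed, path = frontier.popleft()
        for link in zip(path, path[1:]):
            child = removed | {link}
            if child in seen_networks:
                continue  # this reduced network was already examined
            seen_networks.add(child)
            result = shortest_path(graph, origin, destination, child)
            if result is None:
                continue
            frontier.append((child, result[1]))
            if result[1] not in seen_routes:
                seen_routes.add(result[1])
                routes.append(result)
                if len(routes) == max_routes:
                    break
    return routes


# Hypothetical toy network: link costs could be any generalised cost.
network = {
    "A": {"B": 1.0, "C": 2.0},
    "B": {"D": 1.0, "C": 0.5},
    "C": {"D": 1.5},
    "D": {},
}
routes = bfs_le(network, "A", "D", max_routes=3)
```

Because the link cost is just a number per link, a multi-attribute cost (for example a weighted sum of length, road type and gradient) could be substituted without changing the search itself. Routes are returned in the order in which they are found, which is not necessarily the order of increasing cost.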

The last step was to define the correlation structure between the alternatives in the choice set. As mentioned earlier, the Path-Size Logit model is selected to describe pedestrians' route choices. In order to use a Path-Size Logit model, the adjustment term (Path-Size factor) needs to be defined and calculated for each choice set. Several Path-Size factor formulations have been proposed; the challenge is to select the one which best represents the travellers' perceptions of overlapping routes. In this thesis, the two traditional formulations of Ben-Akiva & Bierlaire (1999) and the Path-Size correction term of Bovy et al. (2008) are selected.
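As an illustration of how such an adjustment term enters the model, the sketch below computes the basic Path-Size factor commonly attributed to Ben-Akiva & Bierlaire (1999), in which each link of a route contributes its share of the route length divided by the number of routes in the choice set that use that link, and then adds the logarithm of this factor to the systematic utility of a logit model. The function names, the toy choice set and the value of `beta_ps` are assumptions made for the example; the other formulations mentioned above are not shown.

```python
import math
from collections import Counter


def path_size_factors(routes, link_length):
    """Basic Path-Size factor for every route in one choice set.

    routes: list of routes, each a list of link ids
    link_length: dict mapping link id -> link length
    """
    # Number of routes in the choice set that use each link
    usage = Counter(link for route in routes for link in set(route))
    factors = []
    for route in routes:
        route_length = sum(link_length[link] for link in route)
        factors.append(
            sum(link_length[link] / route_length / usage[link] for link in route)
        )
    return factors


def psl_probabilities(systematic_utilities, ps_factors, beta_ps=1.0):
    """Logit probabilities with ln(PS) added to the systematic utility."""
    v = [u + beta_ps * math.log(ps)
         for u, ps in zip(systematic_utilities, ps_factors)]
    v_max = max(v)  # subtract the maximum for numerical stability
    exp_v = [math.exp(x - v_max) for x in v]
    total = sum(exp_v)
    return [e / total for e in exp_v]


# Hypothetical choice set: routes 1 and 2 share link "a", route 3 is independent.
lengths = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 2.0}
choice_set = [["a", "b"], ["a", "c"], ["d"]]
ps = path_size_factors(choice_set, lengths)
probabilities = psl_probabilities([0.0, 0.0, 0.0], ps, beta_ps=1.0)
```

In this toy example the two overlapping routes each get a Path-Size factor of 0.75 and the independent route gets 1.0, so with equal systematic utilities the independent route receives the largest choice probability (about 0.4 against about 0.3 for each overlapping route). In an estimated model, `beta_ps` would be a parameter to be estimated rather than fixed at 1.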

Case study: Zürich

To answer the main research question of this thesis, the city of Zürich is taken as a case study. Observed routes were collected in the city of Zürich using GPS technology. 159 participants collected approximately one week of travel data using a mobile GPS device, which resulted in 7233 stages. After extensive post-processing of the raw GPS data (filtering, smoothing, cleaning), selection of the relevant participants (those who actually made walking trips within Zürich) and the map-matching procedure, only 51 participants were left, together making 580 trips. For the map-matching procedure, a street network based on OpenStreetMap data and an elevation model (heights) are used. The results of the map-matching procedure (the observed routes) and the street network are used to generate the non-chosen alternative routes.

As mentioned before, the BFS-LE method was used to generate alternative routes. The algorithm combines a Breadth First Search with topologically equivalent network reduction (link elimination). One advantage of this method is that it can use any cost function specified by the researcher. In this thesis, a multi-attribute cost function is used, including four attributes: trip length, path (foot path or no foot path), road type (walk only, walk and bike, and all modes) and gradient. When generating the alternative routes, the algorithm is driven by these attributes and tries to vary them. The algorithm generates choice sets of 20 alternatives; when the chosen route was not generated by the algorithm, it was added to the choice set afterwards (which results in a choice set of 21 alternatives). The total data set thus consists of choice sets of 20 and 21 alternatives. The choice set generation method was able to reproduce 67% of the chosen routes, which is a good score. In order to use the choice sets for route choice modelling, the route characteristics were calculated. The calculated attributes are trip length, gradient characteristics, road type fraction, fall and rise characteristics and the Path-Size factors. The final output is a data file with all the observed and generated non-chosen routes and their calculated attributes.

Descriptive analysis of observed and generated routes

Results of descriptive analyses form the basis of further quantitative research (in this thesis, the estimation process). The main conclusion of the descriptive analyses is that people in Zürich mainly walk short distances (on average 0,13 km). Many of these trips turned out to be transfers between modes or lines, or trips inside or around the house. On average, the generated non-chosen routes are shorter than the observed routes and have a higher maximum rise and average rise. The PS factors of the generated routes are on average lower, which means that the observed routes overlap less. Furthermore, the GPS data show that pedestrians do not always choose the shortest route available (in normal situations), but they mainly choose one of the shortest routes. When the chosen route was not generated by the algorithm, it is mostly one of the longest routes in the choice set. This leads to the assumption that when the choice set consists of 21 alternatives, the chosen route is influenced by factors other than trip length (for example trip purpose, such as shopping). If so, the travel behaviour in the 20-choice sets and the 21-choice sets cannot be explained by the same model; for model estimation the total data set was therefore split into two subsets. Other conclusions from the descriptive analysis are that maximum rise appears to be considered more important than average rise and that the differences in distance between route alternatives can be very small. Therefore, the full choice set was taken into account for estimation.

Model estimation

In the model estimation process, the attributes that influence the route choice process of pedestrians

are identified. In the estimation process, the total data set is split into two data subsets: one subset

containing all data with choice sets of 20 alternatives (20-data set) and the other containing all data

with choice sets of 21 alternatives (21-data set). For both data sets, the same estimation

procedure is carried out: first the parameters are estimated independently, then two basic models are

estimated with 20 or 21 alternatives and finally, samples of alternatives are tested, to see what happens

with the model results when the size and composition of the choice set are changed. The attributes that

were taken into the estimation process are trip length, gradient, road type and the Path-Sizes. The

following samples are used in estimation: a sample of the longest routes (20 alternatives), a random sample of

six alternatives and two samples of six alternatives using importance sampling (first and second method).

The conclusion is that importance sampling according to the first method resulted in the best model

results: most parameters were significant, the goodness of fit was the best, and the parameter values for trip

length matched the expectation, based on findings from literature and the descriptive analysis,

that trip length has a negative effect on route choices. The other significant attributes, maximum rise,

road type allowed for walk and bike and Path Size factors, were consistent in all model results.

Maximum rise seems to be the dominant factor in route choices of pedestrians in Zürich.
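
Drawing a random sample of six alternatives can be sketched as follows. This is a minimal illustration, assuming that the chosen route is always kept and counts towards the six alternatives. The function name and the route labels are hypothetical, and the importance sampling methods used in this thesis are not reproduced here.

```python
import random


def sample_choice_set(chosen, non_chosen, size=6, seed=None):
    """Return a reduced choice set: the chosen route plus random non-chosen routes.

    Assumption for illustration: the chosen route is always retained and the
    remaining size - 1 alternatives are drawn uniformly without replacement.
    """
    if size - 1 > len(non_chosen):
        raise ValueError("not enough non-chosen alternatives to sample from")
    rng = random.Random(seed)
    return [chosen] + rng.sample(non_chosen, size - 1)


# Hypothetical full choice set of 20 alternatives: 1 chosen and 19 non-chosen routes.
non_chosen_routes = [f"route_{i}" for i in range(1, 20)]
reduced = sample_choice_set("observed_route", non_chosen_routes, size=6, seed=1)
```

Fixing the seed makes the sample reproducible, which helps when model results for different sample compositions are compared.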

The 21-data set did not provide much information about route choices regarding trip length, as trip

length was never significant in the different model estimations. Remarkable in these results is the

adjusted rho-square of approximately 0,5, which is very high, especially for a revealed preference

study. Apparently, the model fits the 21-data set very well, which is surprising because the 21-data

were seen as the exceptions in the total data set. The high value suggests that the generated

choice set may contain too few reasonable alternatives, biasing the parameter estimates.
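
For reference, the adjusted rho-square can be computed as in the sketch below. It assumes the usual definition, one minus the ratio of the final log-likelihood (penalised by the number of estimated parameters) to the null log-likelihood, with a null model that assigns equal probability to every alternative. All numbers are invented for illustration and are not results from this thesis.

```python
import math


def null_log_likelihood(choice_set_sizes):
    """Log-likelihood of a model that gives every alternative equal probability."""
    return sum(math.log(1.0 / size) for size in choice_set_sizes)


def adjusted_rho_square(final_ll, null_ll, n_params):
    """Adjusted rho-square: 1 - (final_ll - n_params) / null_ll."""
    return 1.0 - (final_ll - n_params) / null_ll


# Invented example: 100 observations, each with a choice set of 21 alternatives.
null_ll = null_log_likelihood([21] * 100)
rho_bar_sq = adjusted_rho_square(final_ll=-150.0, null_ll=null_ll, n_params=4)
```

With these invented values the result is just below 0,5, which shows how much the final log-likelihood has to improve on the equal-probability model to reach the level reported above.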

Conclusions and recommendations

The main finding is that it is possible to estimate route choice models and to obtain significant results

from GPS data collected by pedestrians. Therefore, it is realistic to treat walking behaviour as utility

maximizing behaviour, and it can be concluded that route choice behaviour of pedestrians can be

described in the discrete choice modelling framework.

In this case study, all significant attributes (maximum rise, road type allowed for walk and bike and

Path Size factors) were found to be consistent in all model estimation results. Maximum rise was found

to be the dominant negative factor in pedestrians’ route choices. The fraction of Walk and Bike

roads was also found to be significant (positive influence). All Path-Size factors were found to have a negative

influence. The relative influence of Walk and Bike roads and of the Path-Size factors was smaller than the

influence of maximum rise. The results on the influence of trip length are not consistent, but it is clear

that trip length is not the dominant factor in pedestrians’ route choices in Zürich. This is the opposite of

what is found in literature and partly in the descriptive analysis (people mainly choose one of the shortest

routes). The best model results were obtained by using importance sampling of alternatives for the

20-data set: most parameters were significant and the model had the best fit.

To answer the main research question, maximum rise, road type (walk and bike roads), overlap and trip

length all have an influence on pedestrian route choices in urban areas. Their relative influence on

pedestrian route choices differs in this case study from that in other case studies. In a hilly city such as

Zürich, maximum rise is dominant, while in any city in the Netherlands this is probably not the case.

Therefore, the results of this case study are not useful for other cities. Also, the data sample used in this

case study contains very short walking trips, which is not representative of actual pedestrian behaviour in

cities. Therefore, results based on this data sample are not valid for, or scalable to, other case studies. However,

this thesis shows that a GPS-based route choice model for pedestrians could support policy-making:

the case study shows that it is possible to estimate a pedestrian route choice model from GPS data, and

therefore the methodology could be adopted to support policy-making in other cities. Results from

GPS based route choice studies could support local governments in pedestrian planning and in the

management of pedestrian flows. When governments know which street characteristics are preferred

by pedestrians, governments could plan and design public places accordingly. Lastly, there are also

some recommendations for science and further research, as there are still many topics that were

not covered in this thesis. Firstly, pedestrian route choice modelling in general needs more attention:

research is needed into advanced data collection and processing methods (virtual and augmented

reality, automated processing of GPS data), new choice set generation methods especially developed

for pedestrians, advanced model formulations which could better represent the complex behaviour of

pedestrians and advanced methods to account for similarities between alternatives (as perceived by

pedestrians). Also useful for pedestrian route choice modelling is to find out how pedestrians gain

knowledge about the network and how they form their choice set.

1 Introduction

Walking is very important in our lives: people have been walking for millions of years. Nowadays,

the demand for walking is still growing since it is a very practical and sustainable mode of transport.

In cities it is the most important mode of transportation: walking connects activities within a certain

range very easily, without interchange or using a vehicle. Yet, there is still a lot we do not know

about walking, which makes pedestrian research very important. In particular, little is known

about pedestrian route choice behaviour. This knowledge is relevant for designing cities and large

public spaces, planning large events and managing pedestrian flows. Moreover, the trends and challenges

described below make pedestrian research even more important today and in the future.

The world population is rapidly growing: it will grow from seven billion today to over nine billion by

2050 (United Nations, 2013). Furthermore, more and more people will live in cities; from 50% to

over 70% of the world population by 2050 (United Nations, 2013). When space is lacking and

the infrastructure cannot change accordingly, more people in the cities mean higher densities in

the infrastructure: crowded streets (more cars and pedestrians), crowded transport systems, dense

housing and high-rise buildings. The challenge is not only to serve the people, but also to manage the

related risks. There will be more pedestrians in the cities, so to serve them and to manage the risks

they face, it is important to have a good understanding of their behaviour and needs. In addition,

not only the number of people will increase, but also their average age (United Nations, 2013). Due to

better living conditions, the number of elderly people will increase. Travel behaviour of older people might

not differ that much from young people, but they have other needs: they move slower and they

cannot walk long distances. So there will be more and more a need for accessible infrastructure,

reduced distances, simple paths and clear signs. To meet those needs, it is important to understand

the physical requirements of walking.

Another trend is that mass gatherings, both organised and spontaneous, have become increasingly

popular. In organised gatherings, such as music festivals and religious festivals, the

organisation is prepared for the crowd, but in spontaneous gatherings the preparation time for

crowd management is limited. Spontaneous gatherings could be organised within very short time

due to social media. The popularity of these events also leads to serious problems, such as human

stampedes due to high densities. When we have a better understanding of the behaviour of

pedestrians and crowds, these dangerous situations can be managed and prevented.

However, there are more situations that require a safe and efficient management of large numbers

of people in regular and emergency conditions, for example large public spaces, such as airports and

rail stations. In the future there will be more large public spaces and they will also increase in size.

It should be noted that pedestrians behave differently in regular and in emergency situations, so it is

necessary to have a good understanding of their behaviour in both situations. By understanding

pedestrian behaviour in different situations, these behaviours can be predicted and simulated in

advance. This information can be used for designing new pedestrian facilities, for avoiding

dangerous situations, for planning adequately for large events and emergencies and for making

walking more attractive in general.

Another reason why pedestrian research is important is because walking as a transportation mode

offers a lot of benefits. It is not only an environmental-friendly mode of transport, but it also offers

benefits for public health and social life. Examples of benefits are a decrease in congestion,

reduction of greenhouse gases, safer and cleaner cities, more social interactions in cities and a

reduced risk for several cardio-related diseases. Therefore, promoting walking is often one of the

goals in local policies: in almost all current plans for urban and suburban travel behaviour change,

the encouragement of using slow modes is a central element. Walking could be encouraged by

providing well-designed and safe pedestrian networks, but their design requires a good

understanding of pedestrians’ route choice behaviour and preferences. Based on this knowledge,

policymakers and urban planners could improve urban facilities for pedestrians and hence, increase

the percentage of people who choose to walk.

1.1 Problem analysis

Bovy & Stern (1990) defined the route choice problem as follows: the choice of a route for a

particular trip from a set of given route alternatives. The search for new routes and information

about new routes is defined as the route search problem. Both topics concerning route choices are

heavily studied in research. The study of travellers’ route choice behaviour in networks is primarily

focused on gaining knowledge about their spatial choice behaviour. Researchers within this field try

to find out how people choose routes in a network, what their knowledge about the network is, how

they gain knowledge and which factors play a role in the route choice decision-making. Knowledge

about route choice behaviour could be used to design quantitative models aimed at predicting and

forecasting network usage dependent on the routes’ and travellers’ characteristics. Practical

applications are infrastructure planning, network performance evaluation, traffic control, policy-

making and designing new infrastructures and facilities.

In this thesis, focusing on pedestrian route choice behaviour, we look into these topics as well: we

want to know how pedestrians choose their routes and which factors have an influence on this

process. The problem is that the route choice process of pedestrians is very complex, as it is not

always clear what the drivers are in the decision-making process. Do pedestrians choose their routes

based on utility maximization, or do they choose their routes randomly or out of habit? This

uncertainty makes modelling pedestrian behaviour, and individual human behaviour in general, very

complex. Another problem is that we do not know how pedestrians gain and process information

about the network. Network knowledge and the processing of information might have a large

influence on the route choices, but until now this relationship is still an important topic for research.

This problem about network knowledge and information processing leads to the next problem: the

choice set formation process. Even when we exactly know which routes are known to the traveller,

we still don’t know which routes are actually considered by the traveller. Lack of information about

the choice set formation process and about considered non-chosen alternative routes (the true

choice set of the traveller), is a major problem in the field of route choice modelling. This problem

makes the generation of non-chosen alternative routes very complex. Also, we will never know if the

generated choice set is the actual choice set considered by the traveller.

In the route selection scheme illustrated in Figure 1, these three problems of (pedestrian) route

choice behaviour can be found in the Black Box in the middle. The challenge is to make this Black

Box clearer and to find out what its relations are with the other two boxes. A clearer Black Box could

lead to better route choice model results and to a better understanding of pedestrian route choices.

Figure 1: Route selection scheme

In this thesis, we look specifically into the different network factors that influence the route choice

process. Network and route characteristics (network factors) are defined by route attributes, which

can be measured in the given network. However, the fact that these route attributes can be

measured, does not say anything about their significance. A larger route attribute from class one is

not always valued as more important than a smaller route attribute from class two. The explanation

is that route attributes are not perceived as equally important, and their significance varies

according to the person, the kind of trip and to occasionally changing circumstances (Bovy & Stern,

1990). The problem is that the general ranking of attributes in terms of importance is

unknown (part of the pedestrian route choice process problem in the Black Box). In this thesis, we

want to know what the relative influence is of different route attributes on the route choice process.

1.2 Conceptual framework and research objective

The purpose of this thesis is to estimate a pedestrian route choice model from revealed preference

GPS data. As the number of revealed preference studies on this topic is very limited, it would be

interesting to look into this problem from this perspective. By using different techniques for choice

modelling and data collection, new insights can be gained. The aim of this thesis is to better

understand pedestrian route choice behaviour in regional urban areas.

The route choice decision making process can be influenced by different internal and external

factors (Daamen, 2004). Route characteristics are one of the main external factors. The proposed

conceptual framework (see Figure 2) primarily focuses on this relationship between pedestrian route

choice behaviour and route characteristics (red arrow). The yellow boxes represent all the different

factors that influence the route choice process. From the rich list of route characteristics, only the

quantitative environmental street characteristics will be taken into account in this study. The aim of

this thesis can be translated into a main and sub-research questions, as stated below. To answer the

research questions, the city of Zürich is taken as a case study in this thesis.

“Which environmental street characteristics have an influence on pedestrian route choice behaviour

in urban areas?”

The main research question can be answered by answering the following sub-questions:

• How do pedestrians make their route choice decisions according to literature?

• Which quantitative environmental street characteristics have an influence on pedestrian

route choice behaviour according to literature?

• Which type of choice model, which data collection techniques and modelling techniques are

suitable to model pedestrians’ route choices in a revealed preference study?

• What do the GPS data reveal about the choice behaviour of pedestrians in Zürich and which

hypotheses based on literature are confirmed?

• What is the influence of the size and the composition of the choice set on the quality of the

model results?

• Is it realistic to treat walking behaviour as utility maximizing behaviour?

[Figure 1 elements. Input: pedestrian, network, non-network factors. Black Box: gain and process information, choice set formation, pedestrian route choice process. Output: selected route.]

Figure 2: Basic Conceptual Framework

1.3 Contribution to practice

In practice, this thesis may be useful for local governments that aim at improving pedestrian

facilities and infrastructures. They could take the recommendations regarding designing pedestrian-

friendly environments into their new policies and urban plans. Also, the newly developed pedestrian

route choice model based on GPS data could support local governments in their decision-making.

The results of this case study might not be useful, but the methods used in this thesis might be.

Furthermore, design and consultancy firms can use methods and results of this thesis as well, as

support in their problem analysis, design process, planning practice and decision-making.

1.4 Contribution to science

For science, this thesis offers complementary evidence to existing experiments and theories. As

there are not many revealed preference studies about pedestrian route choice behaviour using

tracking systems, this thesis can provide new insights in this field.

In general, the interest in pedestrian research has increased in recent years and will only grow

further in the future. Many experiments with pedestrians, focusing on different aspects, have

been conducted in both controlled and real situations. Also, several studies have been done in the

field of pedestrian route choice behaviour. However, there is still a need for case studies because

results can be very different, depending on the environment and the situation. Moreover, most of

these studies on pedestrian route choice behaviour are based on stated preference data. As this

thesis uses revealed preference data, results and used methods could complement existing

knowledge, mainly based on stated preference studies. Also, (revealed preference) studies focusing

on pedestrian route choice behaviour in urban areas are rare. Many pedestrian route choice studies

were found on local level, such as in stations, airports or on events, but only a few on regional level.

There are a few pedestrian route choice studies found on a regional, urban level, but they are

mainly based on stated preference data or revealed preference data using self-reported trips only

(also collected through surveys). The only research found by the author on pedestrian route choice behaviour on an

urban level using a tracking system for data collection is the recent work of Broach & Dill (2015).

This shows how rare these studies are, and that (preliminary) results and methods

used in this thesis could be very useful for further research on this topic. As Broach & Dill (2015)

also used GPS data for estimation, this thesis could also offer useful material for a comparative

study. The study of Broach & Dill (2015) was conducted in Portland (Oregon), a city with a very

different network topology than Zürich, so a comparative study could provide interesting insights.

More specifically, this thesis focuses on the relationship between pedestrian route choice behaviour and

route characteristics. This causal relationship between the built environment and travel behaviour

has been an interesting and heated topic for research and discussion for a long time. In general,

scientists agree that there is a correlation between the built environment and travel behaviour

(Boarnet & Crane, 2001), but a causal relationship is difficult to prove (Oakes, 2004). This thesis

could not describe this causal relationship, but it could provide some new insights into pedestrian

preferences towards different attributes from the built environment.

1.5 Research approach

In this thesis, the city of Zürich is taken as a case study to answer the research questions. To make

this project more valuable for science and practice, the author has chosen to look at this topic in

general and to take Zürich as a case study within this research. This way, recommendations based

on findings of this thesis, the methods used and possibly also the results might be useful for

other cities as well. Zürich was chosen

because the city has a policy that aims to increase the amount and length of slow traffic (cycling and

walking). Currently, no route choice model is used in its policy-making, so the results and methods

used in this project are very useful for the city. A similar study has already been done for cyclists

(Menghini, Carrasco, Schüssler, & Axhausen, 2010).

After defining the objectives and the research questions, a literature review will be conducted. The

aim of the literature review is to know what the state of the art is on this topic and what the

conclusions are of existing similar studies. The literature review consists of two parts: State-of-the-

Art on pedestrian route choice behaviour and State-of-the-Art on Pedestrian route choice modelling.

These two topics were separated because the aims of both literature studies are different. The first

gives a general idea about the whole topic: conclusions from existing studies on this topic could give

an idea of the expected results of this research, and they could support in selecting relevant route

attributes. The relevant route attributes will be brought into the estimation process. The second part

of the literature review gives an overview of the whole route choice modelling process and its

different techniques. Aim of this literature study is to provide guidance in selecting the most suitable

model formulation and modelling techniques for each step in pedestrian route choice modelling. This

process of exploration and selection is necessary as there are no modelling techniques available

(yet) that are especially developed for pedestrians. Findings from both literature studies will be used

to update the conceptual framework and to design and guide the revealed preference study and

model estimation process. Selected route attributes and selected modelling techniques will be

applied in the case study.

In this thesis, the city of Zürich is taken as a case study: the observed routes were collected in this

city and the street network of Zürich is used in the modelling process. The constructed street

network is based on OpenStreetMap data (OpenStreetMap, 2015) and on the Elevation Model of

SwissTopo (Federal Office of Topography SwissTopo, 2015). The observed routes were collected by

our colleagues of ETH Zürich, as part of a larger travel behaviour study in Switzerland. The GPS data

was collected by person-based GPS loggers and the trips took place anywhere in Switzerland, using

any kind of travel mode. The original GPS data set was collected by 159 participants (Zürich

residents), who all collected one week of travel data between August 2011 and December 2012. In

addition, they were asked to fill in daily travel diaries as well, to correct their trips and add missing

trips. Personal characteristics were not asked, so no socio-economic data were available of the

participants. The original GPS data set consists of 7233 stages, making 5284 trips. As we are only

interested in trips taking place in the city of Zürich, only these trips were extracted from the full data

set. This resulted in a data set of 3053 stages collected by 59 participants (all travel modes). This

raw GPS data set is extensively processed and filtered. After the last filtering and map-matching,

only clean GPS data of walking trips taking place in Zürich were left (51 participants, 580 stages).

This data set forms the actual observed routes. The next step in the modelling process is to

generate matching alternative non-chosen routes, using the observed routes from the previous step

and the given network. The resulting choice sets and the network will be used to calculate the route

characteristics and the overlap of the choice sets. These results, choice sets with calculated route

characteristics and overlap, could be used for choice modelling.
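
The calculation of route characteristics from the network can be sketched as follows. This is only an illustration: the `Link` structure, the attribute names and the definitions of rise (positive elevation gain per metre of link length, with the route average weighted by link length) are assumptions made for this example and not necessarily the definitions used in the case study.

```python
from dataclasses import dataclass


@dataclass
class Link:
    length_m: float      # link length in metres
    start_elev_m: float  # elevation at the start node in metres
    end_elev_m: float    # elevation at the end node in metres


def route_attributes(links):
    """Compute trip length, maximum rise and average rise for one route.

    Assumed definitions (hypothetical): the rise of a link is its positive
    elevation gain divided by its length; descents count as zero rise.
    """
    total_length = sum(link.length_m for link in links)
    rises = [max(0.0, link.end_elev_m - link.start_elev_m) / link.length_m
             for link in links]
    # Length-weighted average of the link rises over the whole route.
    average_rise = sum(r * link.length_m for r, link in zip(rises, links)) / total_length
    return {
        "length_m": total_length,
        "max_rise": max(rises),
        "average_rise": average_rise,
    }


# Hypothetical route of three links: uphill, downhill, steeper uphill.
route = [Link(100.0, 400.0, 405.0), Link(200.0, 405.0, 401.0), Link(100.0, 401.0, 411.0)]
attributes = route_attributes(route)
```

Applying such a function to the observed route and to every generated alternative gives one row of attributes per alternative, which is the form needed for the descriptive analysis and the estimation.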

Before estimation of the models, a descriptive statistical analysis will be conducted on the chosen

and the generated non-chosen routes (the choice sets with calculated characteristics). A research

plan and hypotheses for the descriptive analysis will be formulated using findings from the literature

study (part 1). Based on these descriptive results, a research plan and hypotheses could be

formulated for the model estimation process. For model estimation, the software package BIOGEME

(Bierlaire, 2003) will be used. The choice modelling results can be used to answer the main research

question and to give recommendations for science and practice. Figure 3 shows the schematic

research approach and the thesis outline.

1.6 Scope and research limitations

This thesis focuses on pedestrian route choice behaviour under normal conditions in an urban area.

Only the influence of selected environmental street characteristics on route choices will be

investigated, other factors will be mentioned, but will not be taken into account in this study.

Unfortunately, socio-demographic data and information about traffic volumes were not available for

analysis in this research. For the case study, the scope of this project will be the city of Zürich, so

only trips that took place in the city of Zürich will be taken into account. The GPS data from person-

based GPS loggers were collected by our colleagues of ETH Zürich, as part of a larger study in

Switzerland. From this full GPS data set, including all trips throughout Switzerland and trips made by

all modes, only walking trips taking place in the city of Zürich will be extracted.

A limitation in this project is the available GPS data. The person-based data were collected by a

group of 159 Zürich residents. The question is how representative this group of

people is of the population of Zürich. Since personal characteristics were not made available, it is

not possible to verify how representative the sample is. From experience we have learned that older

people are more willing to participate in travel studies than younger people. The participants

collected one week of GPS data each. Another question is whether this set of data is representative

for a regular week in Zürich (special events, weather, holiday period). For example, if this week fell

in a holiday period, the participant may have made different trips than in a regular working week.

Other limitations are the available skills, software and time.

1.7 Thesis outline

The outline of this thesis is illustrated in Figure 3. The green boxes represent the chapters and the

blue boxes represent the specific outcomes that will be used in the next chapters. Chapter 4 (the

case study) is in the figure below divided into three sub boxes, as this chapter covers three main

steps of the route choice modelling process, all three using different outcomes. The thesis can be

divided in two parts: a literature study and a case study. Findings from the literature study will be

applied in the case study. Thus, both studies will lead to the final results. Based on the final results,

recommendations and conclusions will be formulated.

Figure 3: Research approach and thesis outline

2 State-of-the-art on Pedestrian Route Choice Behaviour

This chapter gives an overview of existing literature, aimed at gaining knowledge and identifying

gaps in order to develop a pedestrian route choice model. The purpose of this literature review is to

get an insight into the different aspects concerning pedestrian route choice behaviour and to

understand how travellers choose their route, and how this process is influenced by different factors.

The following sub-questions will be answered in this literature study:




Conclusions from this study will be used to design the revealed preference study and to specify the

route choice model. The environmental street characteristics that will be taken into account in the

route choice model will be selected and discussed.

2.1 Pedestrian route choice behaviour

A trip is an action resulting from several individual choices made by the trip-maker. These choices

depend on several factors, such as the available transport network, available services and personal

characteristics. The five main trip making choices, hierarchically related to each other are: (i)

whether to leave home to engage in an activity (activity choice), (ii) where to perform the activity

(destination choice), (iii) how to reach the destination (mode choice), (iv) when to depart (departure

time choice) and (v) which route to take, i.e. route choice (Bovy, Bliemer, & van Nes, 2006). When

we look at pedestrians, three levels can be distinguished in pedestrian behaviour. According to

Hoogendoorn & Bovy (2004) these levels of pedestrian behaviour are:

1. Strategic level: Departure time choice, and activity pattern choice

2. Tactical level: Activity scheduling, activity area choice, and route-choice to reach

activity areas

3. Operational level: Walking behaviour

This thesis focuses on the tactical level, namely on the route-choice to reach activity areas. First, the

route choice decision-making process will be briefly described. This is a complex process, influenced

by different factors. An overview of these factors is given in section 2.2.2. In Figure 1, this

process is illustrated in the second box.

2.1.1 Route choice decision-making

The decision-making process consists of two main sequential activities: finding the alternatives

(route search) and making a choice based on available information and experience (route choice).

Route search is the process of finding possible routes to reach the destination (choice set

formation). Route choice is the process of choosing a route from this set of known alternatives. A

basic assumption here is that the decision-maker chooses from a finite non-empty set of available

alternatives known to him (Fiorenzo-Catalano, 2007). According to Bovy & Stern (1990), this finite

set of available alternatives considered by the trip-maker is about 6 alternatives. The actual set of

alternatives is usually too large for the traveller: our brains are not able to compare all of them. The

available set of considered alternatives is a result of a filtering process by (significant) aspects. This

filtering process and an elaborated description of the rest of the route selection process can be

found in Bovy & Stern (1990). Their main conclusion is that travellers choose their routes on the

basis of their perceptions of the transport network. When utility maximization is assumed, individuals

choose, or intend to choose, the alternative with the highest perceived utility.
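
The link between perceived utility and choice can be illustrated with a minimal sketch of a multinomial logit model, a common random utility model in which the alternative with the highest systematic utility is the most likely to be chosen, but not certain to be. The linear utility function, the two attributes and the coefficient values are hypothetical and serve only as an example. They are not the specification or the estimates of this thesis.

```python
import math


def utility(length_km, max_rise, beta_length=-2.0, beta_rise=-10.0):
    """Systematic utility of a route (hypothetical linear form and coefficients)."""
    return beta_length * length_km + beta_rise * max_rise


def logit_probabilities(utilities):
    """Multinomial logit choice probabilities for one choice set."""
    # Subtract the maximum utility before exponentiating for numerical stability.
    max_u = max(utilities)
    exp_u = [math.exp(u - max_u) for u in utilities]
    total = sum(exp_u)
    return [e / total for e in exp_u]


# Hypothetical choice set of three routes: (trip length in km, maximum rise).
alternatives = [(0.50, 0.02), (0.45, 0.10), (0.70, 0.01)]
utilities = [utility(length, rise) for length, rise in alternatives]
probabilities = logit_probabilities(utilities)
```

In this example the second route is the shortest, but its steep section lowers its utility, so the first route receives the highest choice probability.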

In contrast to other travel choices, such as mode or destination choice, analysing route choice is

more complex due to overlap and crossings in route alternatives (see Figure 4). This makes both the

route search problem (generation of alternative routes) and the route choice problem more complex.

Figure 4: Examples of overlapping and crossing routes (Bovy & Stern, 1990)

When it comes to making the actual choice, there are three situations possible. Often, especially in

complex transport networks, alternative routes can overlap or cross each other. This means that it is

possible that there are more decision points along the route. In Figure 5, the nodes between origin

O and destination D are new decision points.

Figure 5: Three different choice situations (Bovy & Stern, 1990)


Bovy & Stern (1990) describe the three possible choice situations as follows: in the first, the traveller

makes a simultaneous choice, which means that he makes his choice for the entire route before

starting the trip and he does not change it on the way. The second situation is when the traveller

makes a sequential choice: at each decision point along the way the traveller chooses once again

from among the sub-routes to his next decision point. An alternative route consists of a sequence of

independent choices. The third option is a compromise and is called hierarchical choice: the traveller

makes his choice at the decision points, but the choices are dependent upon previous choices. These

three situations can be illustrated with a decision tree (Figure 5). Studies have shown that all three

situations of route choice behaviour occur in reality.

2.1.2 Environmental street characteristics influencing route choice behaviour

According to Daamen (2004), the factors influencing route choice through a horizontal network can

be divided into four categories:

• Network characteristics, such as the number of available routes and overlapping routes

• Route characteristics: here a distinction can be made between link-additive and non-link

additive attributes. Quantitative attributes such as travel time and distance are link-

additive while qualitative attributes such as scenic characteristics are non link-additive

attributes (Ben-Akiva & Bierlaire, 1999). Other important factors in this category are

directness, crowdedness, safety factors, weather protection, road type and gradient

• Personal characteristics, such as age and gender

• Trip characteristics, such as trip purpose, time budget, mode used and departure time

Next to these four categories, there is another category of factors that could have an influence on

the route choices:

• Circumstances, such as weather conditions, road and traffic information, road works,

accidents on the route, day or night
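The distinction between link-additive and non-link-additive route attributes can be made concrete with a minimal Python sketch. It is an illustration only: the `Link` class, the attribute names and the rule used for the qualitative attribute (a route counts as scenic when at least half of its length is scenic) are assumptions made for this example and are not taken from the cited literature.

```python
from dataclasses import dataclass


@dataclass
class Link:
    length_m: float      # link length in metres
    walk_time_s: float   # walking time on the link in seconds
    scenic: bool         # hypothetical qualitative label for the link


def route_length(route):
    """Link-additive attribute: the route value is the sum of the link values."""
    return sum(link.length_m for link in route)


def route_walk_time(route):
    """Link-additive attribute: total walking time over all links."""
    return sum(link.walk_time_s for link in route)


def route_is_scenic(route, min_share=0.5):
    """Non-link-additive attribute: judged for the route as a whole.

    Hypothetical rule: the route is scenic when at least `min_share`
    of its length runs over links labelled as scenic.
    """
    scenic_length = sum(link.length_m for link in route if link.scenic)
    return scenic_length / route_length(route) >= min_share


# A hypothetical route of three links
route = [
    Link(120.0, 90.0, True),
    Link(80.0, 60.0, False),
    Link(200.0, 150.0, True),
]
```

Length and walking time are obtained by summing over links, whereas the scenic judgement cannot be written as a sum of independent link contributions because it depends on a threshold over the whole route.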

According to Bovy & Stern (1990), the individual traveller chooses his path on the basis of route

characteristics. The other four groups of characteristics influence only the relative

importance and perception attached to those route characteristics. Route characteristics could be

derived from measurable route attributes, such as distance and the number of crossings. Route

attributes are objective, but they are not perceived as equally important and their significance varies

according to the person, the kind of trip and to occasionally changing circumstances (Bovy & Stern,

1990). It is clear that the relative importance of choice attributes for pedestrians is different than for

car-users. This relation is illustrated in Figure 6.

Figure 6: From objective to subjective factors

The route attributes, which possibly have an influence in route choice, can be divided into three

categories: attributes that concern the roads of the routes, attributes of the traffic encountered on

the way and attributes of the road environment. These categories can be further divided into four

classes: general, effort-related, comfort-related and other attributes (see Table 1).

[Figure 6 content: Route attributes (objective) → Perception model (individual) → Route choice factors (subjective)]


Attributes | General | Effort-related | Comfort-related | Others

Road | Road type, width, length, number of lanes, bridges | Intersections, number of turns, slopes, traffic lights | Road surface, road lights, dedicated roads, signposting | Speed limits

Traffic | Traffic composition, traffic density, speed | Congestion, waiting time | Noise, parking opportunities, crowdedness | Toll, safety, reliability in travel time

Environment | Building types, scenery, land use, visible landmarks | Environmental obstacles | Weather protection, road lights, noise/air pollution | Safety

Table 1: Overview of route attributes that form the route characteristics

In this thesis, we only focus on the first two categories: network characteristics and route

characteristics. For pedestrians, different factors are important than for car-users or public transport

travellers: route choice of motorized mode users is mainly driven by travel time while pedestrians

mainly choose their routes based on physical effort. It is also likely that weather protection and

safety factors (exposure to motorized traffic) only influence non-motorized travellers. Also scenic

characteristics are more influential on slow traffic users, as they interact more with the environment.

Another difference with motorized travellers is that pedestrians have greater manoeuvrability than

any other mode and they face fewer constraints in their movements: they do not need to move with

other traffic, do not need to follow lanes, face fewer traffic regulations and can stop whenever

desired. This high degree of freedom results in more alternatives from which a pedestrian can select a route.

To find out which factors are the most influential, we look into several studies on pedestrian route

choice behaviour that have been carried out in the past. Only studies carried out on an urban network

are taken into account. Most of them are based on the results of a survey; only a few have used

tracking systems such as GPS. As surveys were the main data collection method, the results of most of the

studies are quite similar: trip length (shortest distance) appears in general to be the most important factor

in all survey-based pedestrian route choice studies. The preference for short trips is related to physical

effort rather than travel time. There are differences between trip purposes:

someone who goes shopping is likely to take longer and less direct routes, while someone who walks

to the station every day to catch the train mainly chooses the fastest route.

Seneviratne & Morrall (1985), Borgers & Timmermans (1986), Verlander & Heydecker (1997),

Agrawal Weinstein, Schlossberg, & Irvin (2008) and Guo & Loo (2013) all found that trip length is

the dominant factor for pedestrians when they choose a route. These studies on pedestrian route

choices were all based on a survey where the participants were asked to report their walked trip and

to indicate which factor is the most dominant in their route choice. They mainly give distance as

their most important factor. However, this could differ when the survey results are compared with

the alternative routes in the available network, as people mostly choose their perceived shortest

route. The perceived shortest route could be a different route than the shortest route available in

the network, as pedestrians might not know all available routes in the network. Also, people could

report a shorter route in a survey than their actual chosen route. This gives distance as most

dominant factor in the survey, while the actual behaviour could show different results. Other

significant factors that were reported earlier in literature are the built environment and safety

factors. Brown, Werner, Amburgey & Szalay (2007), Borst, de Vries, Graham et al. (2009) and

Agrawal Weinstein, Schlossberg & Irvin (2008) found that street environment and safety factors are

also important in route choices, next to the trip length. Brown, Werner, Amburgey, & Szalay (2007)

also mentioned the building attractiveness to be important. Guo & Loo (2013) and Rodriguez, Merlin


& Prato (2014) concluded that people are more likely to choose routes with footpaths, mainly for

safety reasons. Lastly, Broach & Dill (2015) found that, next to trip length, the number of turns

and the gradient are important in route choices.

2.2 Conclusion

This chapter aims at answering the following sub-questions:




According to Hoogendoorn & Bovy (2004) pedestrians make choices on three levels: strategic level

(departure time choice and activity pattern choice), tactical level (activity scheduling, activity area

choice and route-choice to reach activity areas) and operational level (walking behaviour). The focus

in this thesis is on the tactical level: route-choices to reach activity areas. According to Bovy & Stern

(1990), there are three situations on how travellers make their route choices: simultaneous,

sequential or hierarchical. It is assumed that pedestrians mainly make their route choices

simultaneously: the pedestrian makes a choice for the entire route before departing and does not change

it on the way. Which route is chosen is based on their perceptions of the transport network and on

personal characteristics. When utility maximization is assumed, individuals choose, or intend to

choose, the alternative with the highest perceived utility. Concerning the second sub-question,

factors influencing route choice through a horizontal network can be divided into four categories

(Daamen, 2004): network characteristics, route characteristics, personal characteristics and trip

characteristics. Then there is a fifth category that also influences route choices: circumstances, such

as weather conditions and traffic information. Environmental street characteristics belong to the

category route characteristics. In this group a distinction can be made between link-additive and

non-link additive attributes. Quantitative attributes such as travel time and distance are link-additive

attributes while qualitative attributes (scenic routes) are non-link additive attributes (Ben-Akiva &

Bierlaire, 1999). To limit the number of link attributes, only quantitative environmental street

characteristics will be taken into account. It seems wise to start with quantitative attributes, as these

attributes are measurable (some from GPS data). At a later stage, once it has been shown that formal

pedestrian route choice models can be estimated from GPS data, qualitative attributes can be taken

into account.

To discover which quantitative attributes are most influential, several studies on pedestrian route

choice behaviour in urban areas are consulted. Most of them used surveys for data collection; only a

few have used tracking systems such as GPS. The general trend in survey outcomes is that

pedestrians choose their route based on trip length. Apparently, when pedestrians are asked to

report their main reason for choosing a route, trip length is the most dominant factor. For

pedestrians, trip length is related to physical effort rather than to travel time. Other reported

important factors are scenery and safety factors, but these are not directly measurable and are thus not

taken into account in this thesis. Other selected attributes are road type and gradient. Road type

relates partly to safety factors and partly to comfort. Gradient is very important, especially in a city such as Zürich,

as it is strongly related to physical effort. Both environmental street characteristics

are measurable from the available network.


3 State-of-the-art on Pedestrian Route Choice modelling

Modelling route choice behaviour is essential to forecast travellers’ behaviour under hypothetical

scenarios, to predict future traffic flows on transportation networks, to understand travellers’

reaction and adaptation to facilities and information, and to evaluate travellers’ perceptions of route

characteristics (Prato, 2009). Modelling route choice behaviour is not an easy task, since one needs

to deal with the complexity of representing human behaviour, the uncertainty about travellers’

perceptions of route characteristics, the high level of correlation among routes that share a large

number of links (overlap) and the lack of precise information about travellers’ preferences and about

the alternatives actually considered by the traveller.

Representing route choice behaviour consists in modelling the choice of a certain route within a set

of alternative routes. A route choice model predicts the probability that any given path between

Origin and Destination is selected to perform a trip, given a transportation network and an OD-pair

(Bierlaire & Frejinger, 2008).

The difference between route choice modelling and mode choice or destination choice modelling is

that there are usually more available alternatives. In mode or destination choice, the number of

alternatives is clear and they are easy to identify and visualize. In route choice, it is more difficult to

define realistic alternative routes. If the available routes need to be extracted from a very dense

urban network, hundreds of alternatives can be extracted. In this case of pedestrians, we need to

deal with this problem of dense networks and finding routes that are relevant to the traveller.

This literature review aims at gaining knowledge about the state-of-the-art on pedestrian route

choice modelling. Conclusions will be used to develop the revealed preference study and the route

choice model. The following research question will guide this chapter:

• Which type of choice model and which data collection techniques and modelling techniques

are suitable to model pedestrian route choices, concerning a revealed preference study?

An overview of the route choice modelling process can be found in Figure 7. Route choice modelling

is complex and involves several critical steps before the actual model estimation. Route choice

modelling requires both observed trips and alternative non-chosen trips. The first step in the

modelling process is to obtain trip observations. The second step is to generate alternative non-

chosen routes. For both challenging processes (data collection and choice set generation) there exist

different methods. The results of these two processes form the choice set. Within this choice set,


alternatives can be highly correlated due to overlap between routes. The last step before estimating

the route choice model involves an appropriate description of the correlation among alternatives.

Figure 7: Overview of the Route Choice Modelling process

The aim of this chapter is to find the most suitable methods for each step in the modelling process.

First, different approaches for route choice modelling will be discussed, aimed at finding a suitable

approach to model pedestrian route choices. Then, different methods for the three main steps in the

modelling process will be discussed in this chapter.

3.1 Modelling approaches to pedestrian behaviour

There are different modelling approaches to describe pedestrian behaviour at different levels.

Among these models are regression models to predict pedestrian flow operations under specific

circumstances, queuing models to describe pedestrian movements between nodes, macroscopic

models, which see pedestrians as a flow (a crowd with properties of a fluid or gas), and microscopic

models, in which pedestrians are seen as individuals or agents (Hoogendoorn, 2001). Route choice

behaviour is often described in discrete choice models within the Random Utility Maximization (RUM)

framework, which describe route choice of pedestrians based on the concept of utility maximization.

The main assumption here is that pedestrians make a subjective rational choice between a finite

number of choice options. Another approach in discrete choice modelling is the Random Regret

Minimization-approach (RRM), which is based on the concept of minimizing regret in choice

situations rather than maximizing utility (Chorus, 2010). This concept might be suitable as well, but

will not be discussed in this thesis. Alternative approaches for route choice modelling other than

discrete choice modelling, such as approaches based on fuzzy logic, artificial neural networks or

approaches using decision trees will also not be discussed here.

Discrete choice models (DCM), and random utility models (RUM) in particular, are disaggregate

behavioural models used to predict the behaviour of individuals in choice situations. Application of

these models can be found in econometrics and transportation science. These models assume that

each alternative in a choice experiment can be associated with a latent quantity, a utility. The utility

of each alternative is based on the attributes of the alternative, the socio-economic characteristics of

the decision-maker (individual preferences), the choice situation and its similarities with the other

available alternatives (Schüssler, 2010). Based on the concept of utility-maximization, the individual

is assumed to select the alternative with the highest utility, given constraints from his or her activity


agenda and risks involved in their decisions, while taking into account the uncertainty in the

expected traffic conditions (Hoogendoorn, 2003).

3.2 Discrete Choice Models

Discrete choice models (DCMs) are widely used in transport research, as they could be applied to all

aspects of travel behaviour, such as destination choice, mode choice and household activity

scheduling. As concluded in the previous section, they could also be well applied to route choices

and therefore DCMs are adopted here to represent pedestrian route choices. These models are

designed to describe and predict choices of individuals among a finite set of distinct alternatives,

and they are based on utility maximization, which is consistent with the rational behaviour

assumption. Moreover, DCMs are disaggregate behavioural models, hence suitable for a microscopic

approach for pedestrian behaviour (individual choice behaviour), as in this thesis. Therefore, in this

thesis pedestrian route choice modelling will be reviewed within this DCM framework.

A Discrete Choice Model has four aspects: a choice set, a list of attributes describing the

alternatives, a list of socio-economic characteristics describing the decision-maker and a random

term capturing unobserved errors and uncertainties regarding the choice process (Antonini,

2005). The decision-maker could represent a single individual, a household, a firm or an

organization. He uses decision rules to process the available information in order to make a choice.

In this thesis, the decision-maker is a single individual. The choice set represents the set of available

alternatives that are known to the decision-maker. The alternatives have link-additive and

non-link-additive attributes. Quantitative attributes are link-additive attributes, such as length and travel

time, and qualitative attributes are in general non-link-additive attributes, such as scenic

characteristics (Ben-Akiva & Bierlaire, 1999). In this thesis, both types are important as pedestrian

route choices are influenced by both types of attributes. This is not always the case, as route

choices of car-users are highly dependent on quantitative attributes. As a decision rule, the decision-

maker is assumed to maximize his utility that he perceives from each of the alternatives. His

behaviour is rational and consistent, thus he is assumed to choose the alternative with the highest

utility. Inconsistencies in choice experiments can be related to the analyst’s lack of knowledge.

Because analysts do not know with certainty the utility values, these are treated as random

variables. Manski (1977) formalized the random utility approach, which identifies four sources of

randomness: unobserved alternative attributes, unobserved socio-economic characteristics,

measurement errors and instrumental variables. Given a choice set $C_n$ consisting of $j$ alternatives

and a specific population of $N$ individuals, the (random) utility function $U_{in}$ perceived by individual $n$

for alternative $i$ could be defined as follows:

$U_{in} = V_{in} + \varepsilon_{in}$    (1)

where $i = 1, \dots, j$ and $n = 1, \dots, N$. $V_{in}$ represents the deterministic part of the utility, based on

the alternatives’ attributes and the socio-economic characteristics of the decision-maker and being

defined as $V_{in} = f(\beta, x_{in})$, where $\beta$ is a vector of taste coefficients, and $x_{in}$ is a vector of the

attributes of alternative $i$ as faced by individual $n$ in the specific choice situation (Schüssler, 2010).

The $\varepsilon_{in}$ term is a random variable, which captures the uncertainty and unobserved errors.
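The decomposition of utility into a deterministic and a random part can be illustrated with a small simulation. The sketch below is an illustration under stated assumptions, not the estimation procedure of this thesis: the deterministic part is taken as linear in a single attribute (route length), the taste coefficient `beta_length = -2.0` and the three route lengths are hypothetical, and the random term is drawn from a standard Gumbel distribution, which is the assumption behind the logit models discussed later in this chapter.

```python
import math
import random


def gumbel_draw(rng):
    """Draw from a standard Gumbel distribution by inverse transform sampling."""
    u = rng.random()
    while u == 0.0:          # avoid log(0); random() returns values in [0, 1)
        u = rng.random()
    return -math.log(-math.log(u))


def simulate_choice(systematic_utilities, rng):
    """Return the index of the alternative with the highest U = V + epsilon."""
    utilities = [v + gumbel_draw(rng) for v in systematic_utilities]
    return max(range(len(utilities)), key=utilities.__getitem__)


def choice_shares(systematic_utilities, n_draws=20000, seed=1):
    """Share of simulated individuals choosing each alternative."""
    rng = random.Random(seed)
    counts = [0] * len(systematic_utilities)
    for _ in range(n_draws):
        counts[simulate_choice(systematic_utilities, rng)] += 1
    return [count / n_draws for count in counts]


# Hypothetical example: V = beta * x with route length (km) as the only attribute
beta_length = -2.0
lengths_km = [1.0, 1.2, 1.5]
V = [beta_length * x for x in lengths_km]
shares = choice_shares(V)
```

Although the shortest route has the highest deterministic utility, it is not chosen in every draw: the random term makes the choice probabilistic from the analyst's point of view.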


In general, there are two types of route choice models, based on random utility theory: deterministic

and stochastic route choice models. Deterministic route choice models assume unrealistically that

travellers have perfect knowledge about path costs and choose the route that minimizes their travel

costs. Stochastic route choice models are probabilistic models that assume reasonably that travellers

have imperfect information about path costs and choose the route that minimizes their perceived

travel costs (Prato, 2009). Since choice behaviour can be very complex, probability is used to take

stochasticity of decisions into account (Train, 2003). In this thesis, the focus will be on the last

category, as in a revealed preference study it is impossible that the decision-makers have perfect

knowledge about all available alternatives and their path costs.

In probabilistic models within the random utility model framework, travellers are assumed to

maximize utility. The discrete choice model estimates the probability for each alternative i of being

chosen by individual $n$ from a choice set $C_n$:

$P(i \mid C_n) = P(U_{in} \geq U_{jn}, \ \forall j \in C_n) = P(U_{in} = \max_{j \in C_n} U_{jn})$    (2)

Within the Discrete Choice Modelling framework, there are several types of model formulations to

model pedestrian route choice. Different assumptions on the random terms lead to different model

formulations. However, not all of them are suitable to model individual pedestrian route choices.

Some of them suit better to pedestrian route choices; others better to other transport modes. The

aim of this section is to find out which model structure is most suitable to model specifically

pedestrian route choices in real networks in a revealed preference study.

3.2.1 Multinomial Logit Model and its limitations

The Multinomial Logit Model (MNL) is the simplest and the most used discrete choice model.

The model has a logit structure, which assumes that the perceived attractiveness values of the alternatives

are mutually independent and that the random variables are identically Gumbel distributed (Bovy, Bliemer, &

van Nes, 2006). However, despite its widespread use in the literature, it shows some important limitations,

especially for application in route choice modelling. The most important one is that in the MNL it is

assumed that error terms are independent and identically Gumbel distributed, which results in the

Independence from Irrelevant Alternatives (IIA) property. Antonini (2005) formulated this property

as follows: the ratio of the choice probabilities for two alternatives is not affected by the systematic

utilities of the other alternatives. For route choice modelling, this property is a

limitation when two or more alternatives share common (un)observed attributes (overlap). Since the

error terms in the MNL model are independently distributed, no (un)observed correlations are

included in the model. Due to the IIA property, the MNL fails to account for similarities between

alternatives. As it is very likely to have overlap in real networks, the MNL model is not suitable to

model route choices in real networks. This problem can be illustrated by the well-known red

bus/blue bus paradox (Debreu, 1960) or by the example illustrated in Figure 8. Here, Path 1, 2 and

3 all have the same distance (T). However, there is an overlap in Path 1 and 2. When route utility is

based on distance only, the MNL would predict in this case a share of one-third for each of the

routes. In reality, the traveller is more likely to see only two options here: Path 1 and 2 (as one

option) together would have a share of one-half and Path 3 also one-half. This is more likely when

the overlap between Path 1 and 2 approaches the length of the whole route.
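The one-third shares and the IIA property can be checked numerically. The following Python sketch uses the standard closed-form MNL probability; the coefficient `beta_distance = -1.0` and the route length `T = 1.0` are hypothetical values chosen for this example.

```python
import math


def mnl_probabilities(systematic_utilities):
    """Multinomial logit: P(i) = exp(V_i) / sum_j exp(V_j)."""
    v_max = max(systematic_utilities)   # subtract the maximum for numerical stability
    exp_v = [math.exp(v - v_max) for v in systematic_utilities]
    total = sum(exp_v)
    return [e / total for e in exp_v]


beta_distance = -1.0   # hypothetical taste coefficient for distance
T = 1.0                # common length of Path 1, 2 and 3

# Three paths of equal length: the MNL ignores the overlap between Path 1 and 2
probs = mnl_probabilities([beta_distance * T] * 3)

# IIA: removing Path 2 leaves the ratio P(Path 1) / P(Path 3) unchanged
probs_without_path_2 = mnl_probabilities([beta_distance * T] * 2)
ratio_with = probs[0] / probs[2]
ratio_without = probs_without_path_2[0] / probs_without_path_2[1]
```

Because only the deterministic utilities enter the formula, the MNL assigns equal probabilities to all three paths regardless of how much Path 1 and Path 2 overlap.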


Figure 8: The overlapping Path problem (Ramming, 2002)

Another limitation of the MNL model, relevant in this study, concerns deterministic taste

variations. MNL models can only capture deterministic taste variations, while it seems plausible that

different agents have heterogeneous preferences (Bliemer & Rose, 2010). In this thesis, no relevant

data is available to divide the population into different segments, so this assumption of the MNL

model forms a limitation here. In case of homogeneous agents, this would not cause any problem.

Given these two limitations, the MNL model does not seem to be a suitable model to represent

individual pedestrian route choice behaviour in real size networks. The model structure is robust

(irrelevant routes in the route choice set do not bias the route choice probabilities), but it does not

take route overlap into account (Bliemer & Bovy, 2008). Moreover, it does not reflect the individual

preferences of the pedestrians. The last issue could be resolved by deterministically identifying

segments in the population. The first issue may only be addressed by using alternative model

structures. More literature about addressing these two limitations of the MNL model can be found in

Hess et al. (2005) and Train (2003).

3.2.2 Accounting for overlap between alternatives

In real size networks, the overlap problem cannot be avoided. The question is not if there is an

overlap problem, but whether overlap between alternatives has positive or negative effects on their

choice probabilities. Some studies have shown that similarities reduce the probability to be chosen,

but other studies (such as Hoogendoorn-Lanser and Bovy (2007)) suggest that this assumption does

not hold for all choice contexts. It could have a positive effect as it could give the possibility to

switch routes or connections while traveling.

Overcoming the IIA property is a major research issue in the field of discrete choice modelling.

There are different alternative model structures in use to overcome the overlap problem. According

to Schüssler (2010) these model structures belong to one of the following approaches:

• introducing adjustment terms in the deterministic part of the utility function (group 1)

• imposing a nesting structure (group 2)

• explicitly modelling the correlation using multivariate error terms (group 3)

The first group of models consists of modifications of the Logit structure. These models are based

on the assumption that the utility of an alternative is influenced by its level of similarity with other

alternatives and that it can be corrected accordingly (Schüssler, 2010). They aim to capture


similarities by correcting the systematic component of the utility function, by adding a deterministic

adjustment term that measures the similarity (similarity attribute) to the utility function. This means

that the utility consists of two parts: the first depends only on the attributes of the alternative itself

and a second part that depends on the attributes of other alternatives. The utility function for these

models could be defined as follows:

$U_{in} = V_{in} + f(A_{in}) + \varepsilon_{in}$    (3)

where $A_{in}$ is the adjustment term that measures the similarity between alternative $i$ and all other

alternatives $j \neq i$ and $f()$ is the transformation of $A_{in}$.

The advantage of these models is that they maintain the simple MNL model structure (the error

terms remain i.i.d. Gumbel distributed). The challenge of this approach is to choose the appropriate

adjustment term. Examples of these models are C-Logit and Path-size Logit (PSL). These model

formulations follow the generally made assumption that the similarity of an alternative with other,

competing alternatives decreases its utility and, thus, its probability to be chosen (Ben-Akiva &

Bierlaire, 1999).
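As an illustration of such an adjustment term, the sketch below computes a path-size term for a three-path network with two overlapping paths. It assumes the basic path-size formulation, in which the path size of route i is the sum over its links of (link length / route length) divided by the number of routes in the choice set that use the link, and in which the logarithm of the path size enters the utility with a coefficient `beta_ps`. The link names, the lengths `T` and `d`, the distance coefficient and `beta_ps = 1.0` are hypothetical values for this example, not estimates from this thesis.

```python
import math


def path_sizes(routes, link_lengths):
    """Path size per route: sum over links of (l_a / L_i) / (number of routes using a)."""
    usage = {}
    for route in routes:
        for link in set(route):
            usage[link] = usage.get(link, 0) + 1
    sizes = []
    for route in routes:
        route_length = sum(link_lengths[link] for link in route)
        sizes.append(sum(link_lengths[link] / route_length / usage[link] for link in route))
    return sizes


def psl_probabilities(systematic_utilities, sizes, beta_ps=1.0):
    """Logit probabilities with the adjustment term beta_ps * ln(PS_i) added to V_i."""
    adjusted = [v + beta_ps * math.log(ps) for v, ps in zip(systematic_utilities, sizes)]
    u_max = max(adjusted)
    exp_u = [math.exp(u - u_max) for u in adjusted]
    total = sum(exp_u)
    return [e / total for e in exp_u]


# Hypothetical network: Path 1 and 2 share one long link and differ only in a short end link
T = 1.0    # length of every path
d = 0.01   # length of the non-shared end link of Path 1 and Path 2
link_lengths = {"shared": T - d, "end_1": d, "end_2": d, "direct": T}
routes = [["shared", "end_1"], ["shared", "end_2"], ["direct"]]

sizes = path_sizes(routes, link_lengths)
probs = psl_probabilities([-1.0 * T] * 3, sizes)
```

With almost complete overlap, the two overlapping paths together receive roughly one half of the probability and the independent path the other half, while the error terms remain i.i.d. Gumbel distributed.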

The second group consists of generalizations of the Logit structure. Generalizations of the Logit

structure have a more complex error structure and are members of the Generalized Extreme

Value (GEV) model family, introduced by McFadden (1978). Models of the GEV family allow taking

correlation patterns in the choice set into account. The unobserved portions of utility for all

alternatives are jointly distributed as a generalized extreme value. This distribution allows for

correlations over alternatives (Train, 2003). Detailed theory about GEV models can be found in

McFadden (1978). Several models can be derived from the GEV formulation, such as the MNL (when

all correlations are zero), the Nested Logit (NL), Cross Nested Logit (CNL) model and the

Paired Combinatorial Logit (PCL). In these models, alternatives of the choice set are subdivided

into nests. Alternatives belonging to the same nest are correlated to each other.
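How a nesting structure changes the choice probabilities can be sketched for the Nested Logit model. The code below assumes the common textbook parameterisation with one dissimilarity parameter per nest in the interval (0, 1], where a value of 1 means no correlation within the nest; the nest names, path names and parameter values are hypothetical and serve only as an illustration.

```python
import math


def nested_logit_probabilities(V, nests, scale):
    """Nested logit probabilities.

    V:     dict mapping alternative -> systematic utility
    nests: dict mapping nest -> list of alternatives (each alternative in one nest)
    scale: dict mapping nest -> dissimilarity parameter in (0, 1]
    """
    # Logsum (inclusive value) of each nest
    logsum = {
        m: math.log(sum(math.exp(V[i] / scale[m]) for i in alts))
        for m, alts in nests.items()
    }
    denominator = sum(math.exp(scale[m] * logsum[m]) for m in nests)
    probs = {}
    for m, alts in nests.items():
        p_nest = math.exp(scale[m] * logsum[m]) / denominator
        within_total = sum(math.exp(V[i] / scale[m]) for i in alts)
        for i in alts:
            # P(i) = P(nest) * P(i | nest)
            probs[i] = p_nest * math.exp(V[i] / scale[m]) / within_total
    return probs


# Hypothetical example: three paths with equal systematic utility
V = {"path_1": -1.0, "path_2": -1.0, "path_3": -1.0}
nests = {"overlapping": ["path_1", "path_2"], "separate": ["path_3"]}

# No correlation within the nest: the model collapses to the MNL
probs_mnl = nested_logit_probabilities(V, nests, {"overlapping": 1.0, "separate": 1.0})

# Strong correlation between the two overlapping paths
probs_nl = nested_logit_probabilities(V, nests, {"overlapping": 0.1, "separate": 1.0})
```

With strongly correlated alternatives in the nest, the probability of the separate path moves from one-third towards one-half, which is the behaviour the MNL cannot reproduce.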

Modifications and generalizations of the Logit structure could deal with overlap, but they could not

incorporate random taste heterogeneity appropriately. The last group of models could deal with both

limitations of the MNL model. The Probit model is based on the assumption that the unobserved

attributes are multivariate normally distributed (Bovy, Bliemer, & van Nes, 2006). In the MNL model

these error terms are assumed to be independently and identically Gumbel distributed; in other GEV models they are jointly generalized extreme value distributed.

This assumption of the Probit model is a limitation as well, since in some situations normal

distributions are inappropriate. This is, for example, the case for a price coefficient, which is rarely

positive for people, whereas a normal distribution has support on both sides of zero. The Mixed Logit (Logit Kernel) model has properties of both the Logit model

and the Probit model. It is a model in which the error terms consist of both a probit-like portion

(unobserved attributes are multivariate randomly distributed) and a logit-like portion, an additive

i.i.d. Gumbel distributed portion (Walker, 2001). The probit-portion in the utility function captures

the correlation between alternatives and allows for flexibility while the logit-portion aids in

estimation. When the cross-alternative correlations in these models are estimated to be zero, the

model reduces to MNL (Bekhor, Ben-Akiva, & Ramming, 2006). Advantages of these models are

their flexibility in handling correlations over alternatives and time and their ability to incorporate

random taste variation appropriately. A disadvantage is that the choice probabilities of these models cannot be computed

analytically, so simulation is required. An overview of the main model formulations, with a short

description and their pros and cons can be found in Table 2.
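The need for simulation can be illustrated with a minimal sketch of a simulated Mixed Logit probability: the logit probability is averaged over random draws of a taste coefficient. The sketch covers only random taste variation (not an error-component structure for overlap), and it assumes a single normally distributed coefficient for route length; the mean, standard deviation, route lengths and number of draws are hypothetical.

```python
import math
import random


def mnl_probabilities(systematic_utilities):
    """Closed-form logit probabilities for given systematic utilities."""
    v_max = max(systematic_utilities)
    exp_v = [math.exp(v - v_max) for v in systematic_utilities]
    total = sum(exp_v)
    return [e / total for e in exp_v]


def simulated_mixed_logit(lengths, mean_beta, sd_beta, n_draws=2000, seed=7):
    """Average the logit probabilities over draws of a normally distributed coefficient."""
    rng = random.Random(seed)
    totals = [0.0] * len(lengths)
    for _ in range(n_draws):
        beta = rng.gauss(mean_beta, sd_beta)    # one draw of the taste coefficient
        probs = mnl_probabilities([beta * x for x in lengths])
        totals = [t + p for t, p in zip(totals, probs)]
    return [t / n_draws for t in totals]


lengths_km = [1.0, 1.2, 1.5]   # hypothetical route lengths
mixed = simulated_mixed_logit(lengths_km, mean_beta=-2.0, sd_beta=1.0)

# With zero taste variation the simulated probabilities equal the MNL probabilities
degenerate = simulated_mixed_logit(lengths_km, mean_beta=-2.0, sd_beta=0.0, n_draws=50)
mnl = mnl_probabilities([-2.0 * x for x in lengths_km])
```

The number of draws determines the simulation error, which is one reason why estimation of these models becomes computationally demanding for large choice sets.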


3.2.3 Models suitable for pedestrian route choices

Advanced models of the GEV family and the Mixed Logit Model are promising within the field of

pedestrian route choice modelling, but they significantly increase the model complexity and they

bring difficulties in the estimation, especially for large networks and data sets as in this research. An

overview of route choice models can be found in Table 2. For this research, especially the ones

already used successfully for pedestrians or cyclists are interesting. Route choice behaviour of

cyclists is comparable to route choice behaviour of pedestrians since their behaviour is also

influenced by non-link-additive attributes and characteristics. This is different for car-users, where

route choice behaviour is mainly driven by link-additive attributes such as travel time.

Type of Route Choice Model | Pros | Cons | Computational effort required | Introduction in Route Choices | Applied to Pedestrians or Cyclists?

Binomial Logit | Simple model structure | Only 2 alternatives available | Low | (Cheung & Lam, 1998) |

Multinomial Logit | Simple model structure | No overlap, no taste variations | Low | (McFadden, 1973) | (Borgers & Timmermans, 1986); (van der Waerden, Borgers, & Timmermans, 2004)

C-Logit | Simple model structure, commonality factor for overlap | Several formulations of commonality factor, but lack of theory or guidance on which to use | Medium | (Cascetta, Nuzzolo, Russo, & Vitetta, 1996) |

Path-size Logit | Simple model structure, path-size term for overlap, theoretical foundation available | Several formulations proposed, correlated with observed and unobserved attributes | Medium | (Ben-Akiva & Bierlaire, 1999) | (Daamen & Hoogendoorn, 2004); (Menghini et al., 2010)

Nested Logit | Correlated alternatives in one nest | Each alternative belongs exclusively to one nest | Medium | (Ben-Akiva, 1973) | (Liu, Usher, & Strawderman, 2009)

Cross-Nested Logit | Each alternative may belong to more than one nest | Complex for realistic size network | High | (Vovsha, 1997) | (Antonini, Bierlaire, & Weber, 2006)

Paired Combinatorial Logit | Creates a nest for each pair of alternatives and estimates a dissimilarity parameter for each nest | Complex for realistic size network | High | (Chu, 1989) |

Multinomial Probit model | Captures correlation among all alternatives, captures random taste variation | Simulation required, error terms are multivariate normally distributed | High | (Daganzo & Sheffi, 1977) | (Hofmann, 2000); (Guo & Loo, 2013)

Mixed Logit (Logit Kernel) | Captures correlation among all alternatives, captures random taste variation | Simulation required, complex for realistic size networks | High | (Ben-Akiva & Bolduc, 1996); (McFadden & Train, 2000) | (Antonini, Bierlaire, & Weber, 2006); (Srikukenthiran, Shalaby, & Morrow, 2014)

Table 2: Overview of model formulations applied to slow modes

To assess the different model formulations, their pros and cons and the computational effort they

require are summarised in Table 2. When a model has already been applied successfully to pedestrians or

cyclists, this can also be seen as an advantage. The chosen route choice model needs to meet the

following criteria: it should be applicable to real size and detailed networks, it should be able to

capture correlation among alternatives and it should be able to manage the extensive data set.

Preferably, it also takes random taste variations into account. Based on literature research, briefly

summarised in Table 1, we could conclude that at least three models are inappropriate to model

route choice behaviour in real size networks in general. Binomial logit is not useful because in route

choice modelling there are usually more than two alternatives. Multinomial Logit is not useful

because it does not take overlap of routes into account, and Nested Logit is not useful because in this model

each alternative belongs exclusively to one nest. However, these models could be used in other

pedestrian choice studies, where only distinct alternatives are considered. This is for example the

case in a study when only a few distinct route options are available in a specific area or in a study

where an elevator or stairs are considered.

In this research, the Path-Size Logit model (PSL) will be adopted. The PSL is chosen because

this model type can capture overlapping among routes and it is known to be sufficiently robust to

cope with the necessary simplifying assumptions (Daamen & Hoogendoorn, 2004). Moreover, the

model has the relatively simple MNL structure. The PSL model is preferred to the other model

structure of this group, the C-Logit, because Cascetta et al. (1996) propose several different

formulations for adjusting for overlap, but they do not offer any guidance or theoretical basis for the

selection of which one to use. The lack of theoretical guidance for the C-Logit model and the

availability of a theoretical foundation for the PSL model were the motivation to choose the PSL model.

Also, Ramming (2002) showed that the PS Logit outperforms the C-Logit in all cases and indicated

that C-Logit is not recommended in large urban networks. In addition, in real size networks the

relatively simple PSL model has been shown to perform well relative to more complex model forms

(Broach, Gliebe, & Dill, 2011). Although nested logit models should outperform the PSL model, they

are limited in real size and detailed networks (Bekhor, Ben-Akiva, & Ramming, 2006). The downside

of this model is that several formulations for the Path-Size factor (adjustment term) have been

proposed, so the challenge is to select the most suitable one. This issue will be discussed in section

3.6. When this model shows satisfactory results, other, more complex model structures can be

considered, such as Cross-Nested Logit and Mixed logit.

3.3 Observed routes

Route choice modelling requires both observed routes and matching non-chosen routes. The quality

of the findings of the route choice models depends on both observed and non-chosen routes, so the

processes of obtaining them are both very important. This section will focus on the observed

choices. According to Guo & Loo (2013) and Broach & Dill (2015), there are only a few studies focusing on

developing a formal pedestrian route choice model on real street networks. Most studies on

pedestrian route choices have focused on pedestrian movements at small scales, on networks inside

buildings such as stations or airports or in evacuation scenarios. These studies often require micro

simulation techniques. It is clear that such modelling is quite different from modelling route choice at

the regional level. Different methods are also used for data collection.

3.3.1 Data collection methods

The dominant data collection method in route choice studies on an urban level has been stated

preference surveys (Broach, Gliebe, & Dill, 2011). Stated preference methods are preferred for

several reasons: data collection is easier, less time consuming and less expensive, compared to

other data collection methods. In addition, no detailed travel network data is needed and the

challenge to generate alternative non-chosen routes based on a real network can be avoided. Also

model specification and estimation are easier, as the data is “clean” and the size and the

composition of the choice set is controlled. But SP methods have drawbacks as well. One of the

disadvantages is that it is difficult to predefine what travellers consider when choosing a route

(Halldórsdóttir, Rieser-Schüssler, Axhausen, Prato, & Nielsen, 2014). It is also difficult to know how

well a participant can map textual or pictorial representations to her or his preferences for real

facilities (Broach, Gliebe, & Dill, 2011). Moreover, it is very possible that salient features of routes

are not captured in text or in a picture. Another issue in surveys is the response burden: the effort

required by the participant to answer and complete the survey. The survey mode (written, face-to-

face, computer), length of the survey, complexity of questionnaire and similarities in the choice set

could influence the response rate and the trustworthiness of the results. However, this does not

mean that stated preference studies are not useful. For example in policymaking or in transport

planning, surveys are a powerful tool for testing rare or non-existent scenarios.

The counterpart of stated preference studies is revealed preference studies. Whereas stated preference

studies take place in a laboratory-like setting, revealed preference studies deal with real-life

situations. Revealed data give information about choices that people actually made. There are

different methods to collect data about revealed trips. Some are very useful on smaller scales, such

as stations (direct observations, video cameras, smart card data, Bluetooth tracking, Wi-Fi sensors)

while other methods are more useful on regional scale. One of the methods useful on a regional

scale is to collect data about actual trips via a survey or a travel diary. Participants are asked to

report the trips they actually made and to indicate which route they have chosen. The advantage is

that the data requires less post-processing. The drawback is again the response burden.

Another data collection method in revealed preference studies is to collect GPS data using special

devices or smartphones. The reason why there are not many revealed preference studies reported

using tracking systems is that this used to be very time-consuming and costly. In addition, the data

collected was not very accurate, so very extensive post-processing was required. New techniques

and developments in GPS technology have changed the situation substantially: today it is possible to

trace the route choice of travellers in detail across all modes, by using lightweight and cheap devices

over multiple days (Menghini, Carrasco, Schüssler, & Axhausen, 2010). Also new possibilities in

(automatic) processing of GPS points make revealed preference studies less time-consuming. But

still, GPS studies require extensive post-processing (filtering and smoothing) of the GPS points and

they also require having a detailed digital network to map the routes. An advantage is that it

reduces the response burden, so this could lead to more participants in the study. It also eliminates

the problem of the underreported trips, as all trips will be tracked by the devices.

New techniques to collect, process and analyse rich data sets are still under development.

Innovative data collection methods such as tracking via smartphones and dedicated apps, social

media, Bluetooth/Wi-Fi sensors, data collection using augmented reality and experiments in virtual

reality will be tested in the near future. Also, new techniques for (automated) processing and

analysis (big data analytics, data fusion, linguistic data analysis applied to social media messages,

advanced GIS analysis) will be developed (Hoogendoorn, 2015).

3.3.2 RP studies in pedestrian research

The lack of rich data sets and techniques to collect, process and analyse large amounts of data may

be the main reason why there are only a few revealed preference studies using tracking systems on

pedestrians (Hoogendoorn, 2015). However, they are widely used in bicycle research. Much of the

evidence about relative preferences of pedestrians is based upon (stated preference) survey

techniques, rather than revealed preference (tracking) techniques.

| Authors | Data collection method | Important factors | Route choice model |
|---|---|---|---|
| Hill (1982) | Stalking | Trip length | No |
| Seneviratne & Morrall (1985) | Survey (on-street) | Trip length | No |
| Borgers & Timmermans (1986) | Survey (on-street) | Trip length | Yes, MN Logit |
| Verlander & Heydecker (1997) | Survey (travel diary at home) | Trip length | No |
| Brown, Werner, Amburgey, & Szalay (2007) | | Social milieu, building attractiveness, personal safety | |
| Agrawal Weinstein, Schlossberg, & Irvin (2008) | Survey (on-street) | Trip length, but also safety factors | No |
| Borst, de Vries, Graham et al. (2009) | Survey (home) | Street environment (for elderly) | |
| Guo & Loo (2013) | Survey (on-street) | Trip length, retail, foot path | Yes, Probit |
| Rodriguez, Merlin, & Prato (2014) | GPS + travel diary | Trip length, safety factors, foot path, green (for girls) | Yes, PS Logit |
| Broach & Dill (2015) | GPS | Trip length, turns, gradient | Yes, PS Logit |

Table 3: Overview of RP studies in pedestrians' research

Most of the revealed preference studies on pedestrians used surveys to gain information about

walked trips. Participants are for example asked to report their walked trips by drawing their trips

and by selecting the factor that would best describe the reason for selecting the walked route

(Seneviratne & Morrall, 1985). On-street surveys are preferred to at-home surveys, as many

pedestrian route decisions may not be recursive and are thus subject to quick memory loss (Guo & Loo,

2013). A limitation here is that reported trips could differ from actual trips. An unconventional method

for collecting data is ‘stalking’, used by Hill (1982). In an urban area, this method

requires that the observer actually follows the subject on foot. To gain personal information about the

participants, it is necessary to hand over a questionnaire at the end. An overview of RP studies in

pedestrian research can be found in Table 3. Only studies of pedestrian route choices in urban

networks are taken into account. Of these studies, only a few have estimated formal pedestrian

route choice models. Of those, only two studies used GPS data to estimate a pedestrian route choice

model. In contrast, many studies have used GPS data to

estimate bicycle route choice models (Menghini, Carrasco, Schüssler, & Axhausen, 2010; Hood,

Sall, & Charlton, 2011; Broach, Gliebe, & Dill, 2011). The conclusion is that it is useful to estimate a

pedestrian route choice model based on revealed preference GPS data, because only a few

studies of this kind have been done before. Therefore, a revealed preference study is adopted here.

3.4 Generation of alternative routes

As stated before, both the collection of observed choices and the generation of non-chosen

alternatives are challenging processes. The first challenge has greatly benefitted from new

technologies and software for data processing. The second, which concerns the generation of

realistic and heterogeneous alternative choices and the composition of the choice sets, is still

challenging and a topic for future research. In this section, different choice set generation

procedures will be evaluated using evaluation methods derived from literature. These specific

procedures are selected because they are likely to be suitable and efficient for highly detailed

pedestrian networks. As most studies on choice set generation have focused on implementing choice

set generation procedures for cars or public transport, which normally use a simplified network, it

can be a difficult task to select and implement a suitable procedure for pedestrians in real size

networks.

3.4.1 Choice Set Generation in modelling process

Route choice modelling is typically divided into a two-stage process: first, the generation of plausible

and heterogeneous alternative routes that are relevant to the particular trip maker, to form the

choice set, and second, the calculation of the probability that a given route is chosen from a

specified choice set (Bekhor, Ben-Akiva, & Ramming, 2006). Choice sets are defined as the collection

of travel options perceived available (actual subjective choice set), out of all alternatives that exist

(universal choice set), to an individual in satisfying his travel demand (Bovy & Fiorenzo-Catalano,

2007). As the traveller chooses one of the feasible routes and the researcher does not know

which alternatives are actually considered, the actual subjective

choice set or the estimated objective choice set is relevant in route choice modelling (see Figure 9).

Choice set generation is especially complex in a pedestrian network, since there are even more

alternative routes available than in a car or public transport network. However, many of these

possible routes are not useful in route choice modelling, as many are unlikely to be considered by

the particular traveller. These irrelevant routes are routes that have a significantly lower utility than

the best route alternative (Bliemer & Bovy, 2008). Moreover, as mentioned earlier, the traveller is

only able to consider about 6 alternatives (Bovy & Stern, 1990), so it makes no sense to take all

possible routes into consideration for estimation. Travellers often limit the availability to attractive

routes on the basis of their constraints, preferences and experiences. This can be very different for

every traveller. Also, some routes may not be perceived as distinct alternatives, because of high

overlap with other routes.

Figure 9: Hierarchy in choice sets, from the pedestrian's and the researcher's perspective (Hoogendoorn-

Lanser & van Nes, 2004)

In route choice modelling, the task is to predict route choice among the routes that any traveller

might consider (feasible routes). This process is very complex, as the analyst lacks information

about exactly which alternatives are known to and considered by the traveller (the

composition of the choice set) and the analyst also lacks information about the actual size of the

choice set. Moreover, composition and size could be very different for every traveller.

3.4.2 Requirements for the choice sets and the method

Various studies have shown that the size and composition of choice sets influence both

estimation and prediction (see van der Waerden et al. (2004); Prato & Bekhor (2007); Bliemer &

Bovy (2008)). This means that the quality and correctness of the choice set parameter estimates

and of demand predictions depend on the quality, size and composition of the adopted choice sets.

The requirements that the choice sets must meet

in terms of size, composition and variety depend on their purpose. There are three major purposes for choice set

generation: (1) analysis of travel alternatives to determine their availability, number, characteristics,

variety and composition; (2) estimation of disaggregate demand models to uncover behavioural

parameters of utility functions at the individual level, using observations of individual route choices

and (3) prediction of choice probabilities to determine route and link flow levels in networks, using

route choice models with estimated parameters (Prato, 2009). In this thesis, the second purpose

applies for choice set generation (choice model estimation). The main requirement for this purpose

is that the generated choice sets should include the observed chosen alternative. The requirements

on the quality of the choice sets are less strict, as not all relevant alternatives have to be included

(Bovy, 2009). Satisfactory estimation results can also be obtained for small well-sampled choice

sets. However, this only applies to MNL or its modifications. For several other model specifications, it is

shown that choice set size and composition affect model estimates and choice probabilities (Prato &

Bekhor, 2007). Hoogendoorn-Lanser (2005) proposed a few other requirements regarding generated

choice sets for estimation of route choice models: the choice sets should not include dominant

alternatives (that are better or worse than other alternatives in all aspects), the choice sets should

contain a sufficient variety of alternatives and lastly, the choice sets should show sufficient

overlap among alternatives in order to be able to estimate the related parameter.

She also stated that choice sets need not be exhaustive for estimation purposes,

but they should be representative subsets of all available alternatives. A more detailed elaboration

on different purposes and requirements on size and composition of the choice sets to be used can

be found in Hoogendoorn-Lanser (2005) and Bovy (2009).

Besides these general requirements for choice sets, there are a few other requirements for the

choice set and choice set generation method, posed by the author. These requirements apply in

pedestrians’ research. The choice set generation method should be able to efficiently handle large

detailed networks, as the networks that pedestrians use and consider are more detailed than the

networks of car-users or public transport users. Only repeated shortest path searches have been

proven to be efficient in large networks. Also, stochastic path generation and link elimination

methods of this class were also successfully applied in large networks (Halldórsdóttir, Rieser-

Schüssler, Axhausen, Prato, & Nielsen, 2014). Second, the choice set generation method should be

able to generate heterogeneous alternatives, while also taking environmental variables into account.

For pedestrians, it is desirable that the choice set is heterogeneous in environmental variables as

well, as this influences the route choices. While route choices of car-users heavily depend on a

single attribute (travel time), pedestrian route choices depend on various environmental variables

(such as distance, gradient and scenery) as walking requires physical effort and pedestrians are

more sensitive to influences of the built environment (weather, safety, other traffic) than car users.

3.4.3 Evaluation methods

Not only the size and the composition of the choice sets have influence on the results, but also the

choice set generation method. The effectiveness of different choice set generation methods is

defined in terms of the generated routes’ consistency and coverage of the observed routes (Bekhor,

Ben-Akiva, & Ramming, 2006). The choice set is considered consistent with the observed behaviour

when the choice set generation algorithm has replicated the observed route. The consistency is

evaluated by considering the length of the links that the generated route shares in common with the

observed route for each choice set. This overlap is typically expressed as a percentage of the

observed route distance (Halldórsdóttir, Rieser-Schüssler, Axhausen, Prato, & Nielsen, 2014):

$$O_{nr} = \frac{L_{nr}}{L_n} \qquad (4)$$

where $O_{nr}$ is the overlap measure, $L_{nr}$ is the overlapping length between the path generated by

choice set generation method r and the observed route for pedestrian n, and $L_n$ is the length of the

observed route for pedestrian n.
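As a minimal sketch of equation (4), the overlap measure could be computed as shown below. The sketch assumes that each route is stored as a list of directed links and that link lengths are available in a lookup table; this data layout and the names `overlap`, `route_length` and `link_length` are illustrative assumptions and not taken from the thesis implementation. Routes are assumed to contain no loops.

```python
from typing import Dict, Hashable, Sequence, Tuple

# Hypothetical representation: a directed link is a (from_node, to_node) pair.
Link = Tuple[Hashable, Hashable]


def route_length(route: Sequence[Link], link_length: Dict[Link, float]) -> float:
    """Total length of a route given as a sequence of links."""
    return sum(link_length[a] for a in route)


def overlap(generated: Sequence[Link], observed: Sequence[Link],
            link_length: Dict[Link, float]) -> float:
    """Overlap O_nr = L_nr / L_n: shared link length over observed route length."""
    shared = set(generated) & set(observed)        # links used by both routes
    l_nr = sum(link_length[a] for a in shared)     # overlapping length L_nr
    l_n = route_length(observed, link_length)      # observed route length L_n
    return l_nr / l_n


# Hypothetical toy network with link lengths in metres.
link_length = {("A", "B"): 100.0, ("B", "C"): 50.0,
               ("B", "D"): 70.0, ("D", "C"): 40.0}
observed = [("A", "B"), ("B", "C")]
generated = [("A", "B"), ("B", "D"), ("D", "C")]
print(overlap(generated, observed, link_length))
```

In this toy case the generated route shares only the first link with the observed route, so two thirds of the observed distance is replicated.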

Coverage is defined as the percentage of observations for which an algorithm or set of algorithms

has generated a route that satisfies a particular threshold for the overlap measure (Bekhor, Ben-

Akiva, & Ramming, 2006). This is formulated by Halldórsdóttir et al. (2014) as follows:

$$\max_r \sum_{n=1}^{N} I(O_{nr} \geq \delta) \qquad (5)$$

where $I(\cdot)$ is the coverage function, which equals one when its argument is true and zero

otherwise, and $\delta$ is the threshold for the overlap measure.
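Equation (5) can be illustrated with a short sketch. It assumes that the overlap values have already been computed per method and per observation; the function name `coverage`, the method labels and the numbers are hypothetical. The count of covered observations is divided by N so that the result reads as a share of the observations.

```python
from typing import Dict, List, Tuple


def coverage(overlaps: Dict[str, List[float]], delta: float) -> Tuple[str, float]:
    """Return the best-performing method and its coverage share.

    overlaps maps a (hypothetical) method name r to its overlap values O_nr,
    one per observation n. For each method the indicator I(O_nr >= delta) is
    summed over the observations; the maximum over the methods is returned
    as a share of the N observations.
    """
    best_method, best_share = "", -1.0
    for method, values in overlaps.items():
        count = sum(1 for o in values if o >= delta)  # sum of I(O_nr >= delta)
        share = count / len(values)
        if share > best_share:
            best_method, best_share = method, share
    return best_method, best_share


# Hypothetical overlap values for four observations and two methods.
example = {
    "method_1": [1.0, 0.8, 0.4, 0.95],
    "method_2": [0.7, 1.0, 0.9, 0.6],
}
print(coverage(example, delta=0.8))
```

With a threshold of 0.8, the first hypothetical method covers three of the four observations and the second covers two, so the first method determines the coverage.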

The effectiveness can also be evaluated by investigating the heterogeneity of the choice set

composition. Heterogeneity can be explored by calculating the Path-Size factor for each route in

each choice set. The calculation of the different Path-Size factors is discussed in section 3.6. These

Path-Size factors represent the average degree of independence of the routes and indicate whether

the choice set contains heterogeneous routes.

A note here is that formal evaluation of the relevance and realism of the generated choice sets is

difficult in practice, as the actual choice sets in general are unknown to the analyst. Moreover,

empirical analysis has shown that no choice set generation method is able to fully reproduce the

observed routes. The best results were found by Ramming (2002) and Prato and Bekhor (2006):

both found 91% of the observed routes were fully reproduced. Ramming (2002) combined various

algorithms while Prato and Bekhor (2006) used their branch-and-bound algorithm.

3.4.4 Different procedures

Choice set generation methods can be classified into four categories: deterministic shortest path-

based methods, stochastic shortest path-based methods, constrained enumeration algorithms and

probabilistic approaches (Prato, 2009). An overview of the methods can be found in Figure 10.

Deterministic shortest path-based methods are based on repeated shortest path searches in the

network, where the computation of optimal paths follows the modification of one or more input

variables such as link impedances, route constraints and search criteria (Prato, 2009). Most of the

path generation methods belong to this category. Solutions are often deterministic, and origin-

destination pairs are processed sequentially. These methods are computationally attractive due to

the efficiency of shortest path algorithms.

The second category is formed by stochastic methods: methods that generate an individual specific

subset. In general, there are three approaches in this group: simulation, Doubly Stochastic Route

Choice Set Generation and the importance sampling approach. The simulation approach generates

alternative feasible routes by drawing link costs from different probability distributions. The Doubly

Stochastic Route Choice Set Generation approach proposed by Bovy and Fiorenzo-Catalano (2007) is

similar to the simulation approach but it accounts for variation in travellers’ link costs and differences

in travellers’ attribute preferences by drawing random costs and random parameters from probability

distributions. In the importance sampling approach, the choice set generation method generates

suitable subsets of routes for model estimation. Using only a subset of alternatives in estimation, it

is required to calculate and add a sampling correction to the path utilities, in order to get unbiased

estimation results. The result is a choice set of which all alternatives belong to the true (actual)

choice set of the traveller (all alternatives are actually considered). Most choice set generation

approaches aim at generating universal choice sets. In importance sampling, alternatives which are

expected to have high choice probabilities (attractive routes) have a higher probability of being

sampled (generated) than unattractive routes (Frejinger, Bierlaire, & Ben-Akiva, 2009). Importance

sampling is preferred to random sampling of alternatives, as a random sample is likely to contain

alternatives that a traveller would never consider. When a chosen route is compared to a set of very

unattractive routes, it will not reveal much information on the route choices. A new method within the

importance sampling approach is Metropolis-Hastings sampling of paths, which samples paths

according to a given distribution from a general network. It generates a Markov chain with a

stationary distribution that coincides with an arbitrary, pre-specified distribution (Flötteröd &

Bierlaire, 2013).
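The simulation approach described above can be sketched in a few lines: link costs are redrawn at random and a shortest path is computed for every draw, and the distinct paths found form the choice set. This is only an illustration under stated assumptions. The multiplicative Gaussian noise, the parameter `sigma`, the function names and the toy network are all hypothetical choices; the literature cited above uses various probability distributions.

```python
import heapq
import random
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[str, Dict[str, float]]  # node -> {neighbour: link cost}


def shortest_path(graph: Graph, origin: str, destination: str) -> Optional[List[str]]:
    """Dijkstra's algorithm; returns the node sequence or None if unreachable."""
    dist = {origin: 0.0}
    prev: Dict[str, str] = {}
    queue = [(0.0, origin)]
    done: Set[str] = set()
    while queue:
        d, node = heapq.heappop(queue)
        if node in done:
            continue
        done.add(node)
        for nxt, cost in graph.get(node, {}).items():
            if d + cost < dist.get(nxt, float("inf")):
                dist[nxt] = d + cost
                prev[nxt] = node
                heapq.heappush(queue, (d + cost, nxt))
    if destination not in dist:
        return None
    path = [destination]
    while path[-1] != origin:
        path.append(prev[path[-1]])
    return path[::-1]


def simulate_choice_set(graph: Graph, origin: str, destination: str,
                        draws: int = 50, sigma: float = 0.3,
                        seed: int = 1) -> Set[Tuple[str, ...]]:
    """Collect the distinct shortest paths found over repeated random cost draws."""
    rng = random.Random(seed)
    routes: Set[Tuple[str, ...]] = set()
    for _ in range(draws):
        # Assumption: each link cost is scaled by a Gaussian factor, floored
        # at 0.05 so that all costs stay positive for Dijkstra's algorithm.
        perturbed = {u: {v: c * max(0.05, rng.gauss(1.0, sigma))
                         for v, c in nbrs.items()}
                     for u, nbrs in graph.items()}
        path = shortest_path(perturbed, origin, destination)
        if path is not None:
            routes.add(tuple(path))
    return routes


# Hypothetical toy network with two nearly equal routes from A to D.
network = {"A": {"B": 100.0, "C": 110.0}, "B": {"D": 100.0},
           "C": {"D": 95.0}, "D": {}}
print(simulate_choice_set(network, "A", "D"))
```

Because the draws are random, the number of distinct routes depends on the spread of the cost distribution and on the number of draws.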

Constrained enumeration methods form the third category. The Branch & Bound method was

proposed by Hoogendoorn-Lanser (2005) for multi-modal networks and by Prato and Bekhor (2006)

for route networks. This method constructs a connection tree between origin and destination of a

trip by processing link sequences according to a branching rule, while accounting for logical

constraints in order to increase route heterogeneity. This algorithm generates very realistic and

heterogeneous routes, but the computation time in a detailed network is very high.

The last group consists of probabilistic methods. Using these methods is complex in real size

applications. The Random Walk algorithm developed by Frejinger (2009) is promising in pedestrian

research. Broach (2015) used this algorithm successfully for generating pedestrian trips. This

method is currently being updated and could be promising for future pedestrian research.

Figure 10: Overview of Choice Set Generation Methods

Choice set generation is a heavily studied area within route choice modelling, but literature on

generating choice sets for pedestrians in a regional network is very sparse. As a reference, we could

use studies that focus on route choices of cyclists, as cyclists also use a detailed regional real size

network and their route choices are also influenced by various environmental factors (distance,

gradient, scenery, etc.). In all these selected studies bicycle route choice models are estimated from

revealed preference GPS data. Menghini et al. (2010) applied a Breadth First Search on Link

Elimination (BFS-LE) method (Schüssler, 2010) with a single attribute cost function (only route

length); Broach et al. (2011) compared a modified route labelling method to a K-shortest path link

penalty, a simulated shortest paths and a labelled routes method; and Hood et al. (2011) implemented a

Doubly Stochastic Generation method (Bovy & Fiorenzo-Catalano, 2007) with a multi-attribute cost function. The last one

showed the best performance. The reason might be that the first two researchers used only one

attribute or only travel time and distance in their cost function while the last one used a multi-

attribute cost function. Hood et al. (2011) managed to reproduce one-third of the observed routes.

Halldórsdóttir et al. (2014) evaluated the efficiency of three choice set generation methods in a

bicycle route choice context. She evaluated the Breadth First Search on Link Elimination (BFS-LE),

the Doubly Stochastic Generation (DSG) method and the Branch and Bound method. These methods

were chosen because they proved to successfully reproduce observed car choices. In her evaluation

she used a detailed bicycle network and she used multi-attribute cost functions to take the various

environmental factors into account that are relevant in bicycle route choices. The BFS-LE method

turns out to be the most efficient in high-resolution networks. The method outperforms the other

two when it comes to replication of the observed route (62% to 68% of the chosen routes were

reproduced, percentages of other two were lower). BFS-LE and DSG both performed well in

consistency and in generating heterogeneous routes, and both algorithms managed to generate

alternatives for all or almost all of the observations. In computation time, the BFS-LE algorithm

clearly outperforms the other two, as BFS-LE needed 4 minutes for each observation while DSG

needed almost 39 hours and B&B almost 33.5 hours in detailed networks.

As Menghini et al. (2010), Schüssler (2010) and Halldórsdóttir et al. (2014) showed that the BFS-LE

procedure gives satisfactory results in high-resolution networks, this algorithm is adopted in

this research as well. The reasons are the significant level of diversity it ensures between

routes, its high level of consistency with the observed routes, its high computational speed, its

efficiency in real size networks and its flexibility to use any given link cost function.
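The core idea behind a breadth-first search on link elimination can be sketched as follows: compute the least-cost path, remove each of its links in turn, recompute the least-cost path on every reduced network, and continue breadth-first with deeper combinations of removed links until enough distinct routes are found. The code below is a simplified illustration only. It is not the implementation used in this research or in the cited studies, it omits the refinements of the published algorithm, and the function names and toy network are hypothetical.

```python
import heapq
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Graph = Dict[str, Dict[str, float]]  # node -> {neighbour: link cost}
Link = Tuple[str, str]


def shortest_path(graph: Graph, origin: str, destination: str,
                  removed: FrozenSet[Link]) -> Optional[List[str]]:
    """Dijkstra on the network without the removed links; None if unreachable."""
    dist = {origin: 0.0}
    prev: Dict[str, str] = {}
    queue = [(0.0, origin)]
    done: Set[str] = set()
    while queue:
        d, node = heapq.heappop(queue)
        if node in done:
            continue
        done.add(node)
        for nxt, cost in graph.get(node, {}).items():
            if (node, nxt) in removed:
                continue
            if d + cost < dist.get(nxt, float("inf")):
                dist[nxt] = d + cost
                prev[nxt] = node
                heapq.heappush(queue, (d + cost, nxt))
    if destination not in dist:
        return None
    path = [destination]
    while path[-1] != origin:
        path.append(prev[path[-1]])
    return path[::-1]


def bfs_le(graph: Graph, origin: str, destination: str,
           max_routes: int) -> List[List[str]]:
    """Breadth-first search over sets of eliminated links (simplified sketch)."""
    routes: List[List[str]] = []
    seen_routes: Set[Tuple[str, ...]] = set()
    seen_networks: Set[FrozenSet[Link]] = {frozenset()}
    queue = deque([frozenset()])  # each entry is a set of removed links
    while queue and len(routes) < max_routes:
        removed = queue.popleft()
        path = shortest_path(graph, origin, destination, removed)
        if path is None:
            continue  # the reduced network no longer connects the OD pair
        if tuple(path) not in seen_routes:
            seen_routes.add(tuple(path))
            routes.append(path)
        # Next tree level: additionally remove one link of the path just found.
        for link in zip(path, path[1:]):
            child = removed | {link}
            if child not in seen_networks:
                seen_networks.add(child)
                queue.append(child)
    return routes


# Hypothetical toy network.
network = {"A": {"B": 1.0, "C": 1.5}, "B": {"D": 1.0, "C": 0.2},
           "C": {"D": 1.0}, "D": {}}
print(bfs_le(network, "A", "D", max_routes=3))
```

Any link cost function can be plugged in by changing the values stored in the network, which reflects the flexibility mentioned above.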

3.5 Formulation of correlation structure

In the route choice context, it is assumed that an overlapping path may not be perceived as a

distinct alternative (Ben-Akiva & Bierlaire, 1999). To account for overlapping paths, the Path-Size

Logit model will be used in this thesis, as stated in section 3.2.3. In this model, the utilities are

corrected to account for the correlation, using a Path-Size factor (adjustment term) that needs to be

calculated for each choice set. The Path-Size factor imbeds travellers’ perceptions of alternative

paths in a measure of the “significance” or “relevance” of a path relative to others in the choice set

(Ramming, 2002). As mentioned earlier, there are many different Path-Size formulations possible, so

the challenge is to select the Path-Size formulation that best represents travellers’ perceptions of

overlapping paths. It is important that the Path-Size factor is robust, even when the choice set

generation method is not efficient and produces questionable routes in the choice set. Distinct

paths should always have the maximum path size of one and overlapping paths should have a size

between zero and one. This range between zero and one indicates the portion of the route that

constitutes a completely independent alternative. Thus, unique routes have a path size of one, while

two duplicate routes will each have a path size factor of ½ and so on.

Ben-Akiva & Bierlaire (1999) introduced the Path-Size Logit model as follows and proposed two

different formulations for the Path-Size factor:

$$P(i \mid C_n) = \frac{e^{\mu (V_{in} + \ln PS_{in})}}{\sum_{j \in C_n} e^{\mu (V_{jn} + \ln PS_{jn})}} \qquad (6)$$

where the Path-Size factor is defined by

$$PS_{in} = \sum_{a \in \Gamma_i} \frac{l_a}{L_i} \, \frac{1}{\sum_{j \in C_n} \delta_{aj}} \qquad (7)$$

and $\Gamma_i$ is the set of all links of route i, $l_a$ is the length of link a, and $L_i$ the length of route i;

$\delta_{aj}$ is the link-path incidence variable which equals one if link a is on route j and zero otherwise.

$\sum_{j \in C_n} \delta_{aj}$ in the second part of the formulation can be seen as the number of routes in $C_n$ using link a.
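A minimal sketch of the first Path-Size formulation and the resulting choice probabilities is given below. It assumes that each route in the choice set is a list of link identifiers without loops and that the systematic utilities are already known; the function names, the scale parameter default of one and the toy data are illustrative assumptions.

```python
import math
from typing import Dict, Hashable, List, Sequence


def path_size_factors(choice_set: List[Sequence[Hashable]],
                      link_length: Dict[Hashable, float]) -> List[float]:
    """Path-Size factor per route: sum over links of (l_a / L_i) / (routes using a)."""
    use_count: Dict[Hashable, int] = {}
    for route in choice_set:
        for a in set(route):  # number of routes in the choice set using link a
            use_count[a] = use_count.get(a, 0) + 1
    factors = []
    for route in choice_set:
        length_i = sum(link_length[a] for a in route)  # route length L_i
        factors.append(sum(link_length[a] / length_i / use_count[a] for a in route))
    return factors


def psl_probabilities(utilities: Sequence[float], path_sizes: Sequence[float],
                      mu: float = 1.0) -> List[float]:
    """Path-Size Logit probabilities with utilities corrected by ln(PS)."""
    expo = [mu * (v + math.log(ps)) for v, ps in zip(utilities, path_sizes)]
    top = max(expo)  # subtract the maximum for numerical stability
    weights = [math.exp(e - top) for e in expo]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical choice set: routes 1 and 2 share link "s", route 3 is distinct.
link_length = {"s": 100.0, "x": 100.0, "y": 100.0, "z": 200.0}
choice_set = [["s", "x"], ["s", "y"], ["z"]]
ps = path_size_factors(choice_set, link_length)
print(ps)
print(psl_probabilities([0.0, 0.0, 0.0], ps))
```

In this toy choice set the two overlapping routes receive a Path-Size factor below one while the distinct route keeps a factor of one, so with equal utilities the distinct route obtains the largest probability.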

The second formulation below additionally accounts for the relative ratio between the length of the

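shortest route $L^*_{C_n}$ in $C_n$ using link a and the length $L_j$ of each route j using link a.

\[
PS_{in} = \sum_{a \in \Gamma_i} \frac{l_a}{L_i} \, \frac{1}{\sum_{j \in C_n} \frac{L^*_{C_n}}{L_j} \, \delta_{aj}} \tag{8}
\]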
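A minimal Python sketch of formulation (8) is given below. It assumes that $L^*_{C_n}$ is the length of the shortest route in the whole choice set; the function name and the list-of-link-ids representation are illustrative only.

```python
def path_size_length_corrected(routes, link_length):
    """Path-Size factor per route, following formulation (8).

    routes      -- list of routes, each a list of link ids (hypothetical layout)
    link_length -- dict mapping link id to its length l_a
    """
    route_lengths = [sum(link_length[a] for a in route) for route in routes]  # L_j
    # Assumption: L*_Cn is read as the shortest route length in the choice set.
    shortest = min(route_lengths)
    sizes = []
    for route, length_i in zip(routes, route_lengths):
        size = 0.0
        for a in route:
            # Sum of L*_Cn / L_j over all routes j that use link a.
            weighted_usage = sum(
                shortest / length_j
                for other, length_j in zip(routes, route_lengths)
                if a in other
            )
            size += link_length[a] / length_i / weighted_usage
        sizes.append(size)
    return sizes
```

For routes of equal length the ratio $L^*_{C_n} / L_j$ equals one, so the sketch then returns the same values as formulation (7).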

These two formulations for the adjustment term were the original formulations by Ben-Akiva &

Bierlaire (1999). Many alternative formulations were proposed afterwards. Ramming (2002) stated

that the limitation of these formulations is that they are not affected by the length of other routes

than the shortest route, if a link is used by more than one route. To account for the contribution of

the individual links, he formulated the General Path Size Factor (GPS). The GPS factor was

introduced in order to decrease the influence of unrealistically long paths on the utility of shorter

paths in the choice set. However, Hoogendoorn-Lanser et al. (2005), who applied the GPS factor to

multi-modal route choices, as well as Frejinger and Bierlaire (2007), found this approach difficult to

interpret, and the formulation considerably increases the model’s complexity. Frejinger and Bierlaire

(2007) also found that the GPS factor may produce counter-intuitive results and therefore preferred

the original PS formulation.

Hoogendoorn-Lanser & Bovy (2007) also proposed an alternative formulation of the Path Size factor

for route choice modelling in multi-modal networks. They introduced the trip part specific Path Size

Factor, which enables the modeller to account for varying valuations of overlap between different

(multi-modal) parts of the trip. This formulation is based on stages (part of trip covered by one

transport mode) and not on links. When estimating the models, they found that overlap in access

and egress parts of the trip is valued negatively while overlap in the train part had a positive

influence on the route choice. Apparently, redundancy in the train part makes a route more

attractive, as it could give the possibility to switch routes or connections while travelling.

Frejinger et al. (2009) proposed the Expanded Path-Size term, which is based on the idea that the

Path-Size factor should be computed based on the full (true) choice set, and not only on the

generated choice set. They argue that unbiased estimation results are obtained if the PS attribute

reflects the correlation among all paths. The traditional PS attributes are derived from the physical

overlapping of paths in the generated choice set only, and they ignore correlation with other non-

generated alternative routes. The Expanded PS formulation of Frejinger et al. (2009) is derived from

their Importance Sampling approach as discussed in section 3.5.4. Since it is not possible to

calculate PS attributes on all paths when using a real network, their formulation introduces an

expansion factor that corrects for the sampling. The application of the Expanded PS term is very

promising, as their experiments show that the models using the Expanded PS factor outperform the

models using the traditional PS terms.

Bovy et al. (2008) proposed the Path Size Correction (PSC) term, another approach for the Path Size

Factor. The PSC term depends on the number of shared links, the lengths of these common links

and the number of distinct routes using each common link. A completely independent route gets a

PSC of 0. The absolute value of the PSC has no upper bound. The utility of a route decreases with

an increasing number of common links on a route, increasing lengths of the common links and

increasing number of other routes of the choice set that uses one or more links of the route (Bovy,

Bekhor, & Prato, 2008). The PSC term is defined as follows:

\[
PSC_{in} = - \sum_{a \in \Gamma_i} \frac{l_a}{L_i} \, \ln \sum_{j \in C_n} \delta_{aj} \tag{9}
\]
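The PSC term of formulation (9) can be sketched in Python as follows. The function name and the data layout (routes as lists of link ids, a dictionary of link lengths) are assumptions made for this illustration.

```python
import math
from collections import Counter


def path_size_correction(routes, link_length):
    """PSC term per route, following formulation (9).

    routes      -- list of routes, each a list of link ids (hypothetical layout)
    link_length -- dict mapping link id to its length l_a
    """
    # Number of routes in the choice set that use each link (sum over j of delta_aj).
    usage = Counter(link for route in routes for link in set(route))
    terms = []
    for route in routes:
        route_length = sum(link_length[a] for a in route)  # L_i
        terms.append(
            -sum(link_length[a] / route_length * math.log(usage[a]) for a in route)
        )
    return terms
```

A route that shares no links has ln(1) = 0 for every link and therefore a PSC of zero, while two duplicate routes each obtain a PSC of -ln 2.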

The two original formulations (7 & 8) by Ben-Akiva & Bierlaire (1999) and the formulation (9) by

Bovy et al. (2008) were selected to calculate the Path-Size terms. Three formulations were selected

in order to compare the results in the estimation process. The different formulations should give

similar results, as the Path-Sizes are calculated from the same choice sets. The formulation that

gives the best model results will be selected in the final estimation process.
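To show how a Path-Size term enters the choice probabilities of equation (6), the following Python sketch evaluates the Path-Size Logit probabilities for one choice set. The function name, the default scale of one and the example utilities are assumptions for illustration only; in the estimation the utilities follow from the estimated parameters.

```python
import math


def psl_probabilities(utilities, path_sizes, mu=1.0):
    """Path-Size Logit probabilities of equation (6) for one choice set.

    utilities  -- systematic utilities V_in of the routes
    path_sizes -- Path-Size factors PS_in of the routes (all positive)
    mu         -- scale parameter (hypothetical default of 1.0)
    """
    scores = [mu * (v + math.log(ps)) for v, ps in zip(utilities, path_sizes)]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Three routes with equal utility: two duplicates (PS = 0.5) and one distinct route (PS = 1).
example = psl_probabilities([0.0, 0.0, 0.0], [0.5, 0.5, 1.0])
```

With equal utilities, the two duplicate routes together receive the same probability as the single distinct route, which is the behaviour the adjustment term is meant to produce.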

The General Path Size Factor of Ramming (2002) was not selected as several researchers have indicated

that the results are difficult to interpret. Frejinger and Bierlaire (2007) preferred the original

formulations to the GPS term as these formulations have theoretical support and they have shown

intuitive results. Moreover, in their research they presented estimation results that suggest a

behavioural interpretation of the Path Size attribute, as the formulations show that overlap could be

both attractive and unattractive for travellers. The formulation by Hoogendoorn-Lanser & Bovy

(2007) was not selected, because the formulation was developed for multi-modal networks. The

formulation by Frejinger et al. (2009) was not selected because it requires an Importance Sampling

approach in the choice set generation process. In further research, it would be interesting to use the

Importance Sampling approach for choice set generation and the Expanded Path-Size to calculate

the corresponding Path-Sizes, as this approach showed promising results.

3.6 Conclusion

This chapter aimed at finding the most suitable methods for each step in the route choice modelling

process to model pedestrian route choices. The route choice modelling process is visualised in Figure

5 and consists of three main steps: obtaining trip observations, generating alternative non-chosen

routes and defining the correlation structure between the alternatives in the choice set. These steps

are essential before the estimation of the route choice model could start. The first two steps were

discussed in this chapter; the last step is discussed in the next chapter.

The conclusions of this chapter will be used to develop the revealed preference study and the route

choice model. The following research question will be answered in the conclusions:

• Which type of choice model and which data collection techniques and modelling techniques

are suitable to model pedestrians route choices, concerning a revealed preference study?

The first part of the research question concerns the type of route choice model. A route choice

model predicts the probability that any given path between Origin and Destination is selected to

perform a trip, given a transportation network and an OD-pair (Bierlaire & Frejinger, 2008). Route

choice behaviour is often described in discrete choice models within the Random Utility Maximization

(RUM) framework, which describe route choice of pedestrians based on the concept of utility

maximization. Discrete choice models assume that each alternative in a choice experiment can be

associated with a latent quantity (a utility) which is based on the attributes of the alternative, the

socio-economic characteristics of the decision-maker (individual preferences), the choice situation

and its similarities with the other available alternatives (Schüssler, 2010). The main assumption of

this framework is that individuals make a subjective rational choice between a finite number of

choice options and select the alternative with the highest utility. Discrete choice models are widely

used in transport research. They are also adopted in this thesis, because they have been well

applied in route choice modelling before and they are disaggregate behavioural models, thus

suitable for a microscopic approach for pedestrian behaviour.

In this thesis, route choices of pedestrians will be modelled within a real size urban area. This means

that a complex and dense network will be used in the modelling process. In a complex and dense

network it is inevitable that alternatives show similarities with other alternatives (overlap).

Therefore, the simplest model structure, the Multinomial Logit model, cannot be used as this

model formulation is not suitable to model choices with overlapping alternatives. There exist various

other models that are suitable to account for overlap between alternatives. These models can be

sorted into three groups: models introducing an adjustment term, models imposing a nesting

structure and models using multivariate error terms. An overview of all these models can be found in

Table 2. The selected model formulation for this thesis should meet the following criteria: it should

be applicable to real size and detailed networks, it should be able to capture correlation among

alternatives and it should be able to manage the extensive data set. The Path-Size Logit model

turned out to be the best option for the situation in this thesis: it could capture overlap among

routes, it is known to be sufficiently robust, it has the relatively simple MNL structure and it has

been shown to perform well relative to more complex model forms in real size networks.

The second part of the research question concerns the collection of data about observed

routes. The dominant data collection method in route choice studies on a regional level has been

stated preference surveys (Broach, Gliebe, & Dill, 2011). Stated preference experiments have a lot

of advantages, because they can be controlled by the analyst, which could make the whole process

less complex. These methods are especially powerful tools for testing non-existent scenarios.

In this thesis, revealed preference methods are used to model route choices. Whereas stated

preference studies resemble a laboratory setting, revealed preference studies deal with

real life situations based on real data. The observed choices are actually made by the participants.

Revealed preference data can be collected using various methods. Here, GPS data is used to obtain

observed trips. In recent years, new techniques and developments in (automatic) post-processing of GPS

data have made working with GPS data a bit easier, but it remains a complex task. New techniques to

collect, process and analyse rich data sets are still under development.

The last part of the research question concerns the generation of realistic and heterogeneous

non-chosen alternatives and with the formulation of the correlation structure. Both the observed

routes and the non-chosen routes form the choice sets. Forming the choice set is a complex task, as

the analyst lacks information about the exact alternatives that are known and considered by the

traveller. Choice set generation is still a heated topic in literature and many choice set generation

methods have been proposed in the past. So far, no choice set generation method has been

developed especially for pedestrians in real urban areas, so the method that suits best in this

situation is selected to generate the alternative routes. Requirements for the chosen method are

that the method should be able to efficiently handle large detailed networks and it should be able to

generate heterogeneous alternatives while also taking environmental factors into account. An

overview of choice set generation methods can be found in Figure 7. The Breadth First Search on

Link Elimination (BFS-LE) method (Schüssler, 2010) is selected because it has been proven to be

efficient and consistent in bicycle route choice studies using large urban networks, and because of

its computational speed. Also, the BFS-LE method allows any (multi-attribute) cost function to be used,

so environmental factors can be taken into account when generating the routes. Lastly, the method

has been shown to be able to generate heterogeneous routes.
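The basic idea of breadth first search on link elimination can be sketched in Python as follows. This is a simplified illustration and not the implementation of Schüssler (2010), which contains further refinements; the graph layout (a dictionary of directed links with costs) and all names are assumptions. The root of the search tree is the least-cost route on the full network; every child network removes one additional link of the route found on its parent, and the tree is explored level by level.

```python
import heapq
from collections import deque


def shortest_path(graph, origin, destination, removed):
    """Dijkstra on graph {node: {neighbour: cost}}, ignoring links in `removed`."""
    queue = [(0.0, origin, (origin,))]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == destination:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, link_cost in graph.get(node, {}).items():
            if (node, neighbour) not in removed and neighbour not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbour, path + (neighbour,)))
    return None


def bfs_le(graph, origin, destination, max_routes):
    """Collect up to max_routes unique routes by breadth first link elimination."""
    routes = []
    seen_networks = {frozenset()}
    queue = deque([frozenset()])  # each entry is the set of links removed so far
    while queue and len(routes) < max_routes:
        removed = queue.popleft()
        path = shortest_path(graph, origin, destination, removed)
        if path is None:
            continue
        if path not in routes:
            routes.append(path)
        # Child networks: remove one more link of the route just found.
        for link in zip(path, path[1:]):
            child = removed | {link}
            if child not in seen_networks:
                seen_networks.add(child)
                queue.append(child)
    return routes
```

Because the link costs are passed in through the graph, any generalised cost function can be used in such a scheme, which is the property referred to above.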

To formulate the correlation structure, the two original formulations by Ben-Akiva & Bierlaire (1999)

and the formulation by Bovy et al. (2008) were selected to calculate the Path-Size terms. The

calculation of the Path-Size term is required to use the Path-Size Logit model. Three formulations were

selected in order to compare the results in the estimation process. The formulation that gives the

best model results will be selected in the final estimation process.

To conclude, the following methods will be used to design the revealed preference study and the route

choice model for pedestrians: observed routes will be obtained from GPS data, non-chosen

alternative routes will be generated using the BFS-LE choice set generation method, the model that

will be estimated will be a Path-Size Logit model, to account for similarities between alternatives,

and for the calculation of the Path-Size terms the formulations by Ben-Akiva & Bierlaire (1999) and

Bovy et al. (2008) will be used.

With the findings from literature (both chapter 2 and 3), the basic conceptual framework of Figure 2

could be updated to the version of Figure 11. The red arrow shows the main relationship on which

this thesis is focused. The yellow boxes represent the factors influencing the route choice process,

mainly discussed in chapter 2; the blue boxes form the choice set formation process.

Figure 11: Updated Conceptual Framework

4 Case study Zürich

The case study is set in Zürich. Zürich is the largest city of Switzerland with a

population of approximately 400,000 inhabitants. More than 1 million people live in the Zürich

agglomeration. The scope of this case study is the Zürich agglomeration, which consists of the city of

Zürich and 130 other neighbouring municipalities. The city is located in north-central Switzerland, at

the northern side of the Zürichsee. The lowest point of the city is at 392 metres above sea level and

the highest point, the peak of the Uetliberg, is at 871 metres. The Old Town lies on both sides of the

Limmat river, which flows from the Zürichsee.

The city is Switzerland’s hub for railways and air traffic: the central station is one of Europe’s main

railway intersections, with between 350,000 and 500,000 commuters every day, and Zürich airport is

the largest and busiest international airport in the country, serving more than 25 million passengers a

year. The airport is also the principal hub of Swiss International Air Lines. The city is also a hub for

road traffic, as the A1, A3 and A4 motorways pass close to the city. For transportation within the city

and the agglomeration, public transport is very popular due to an extensive network of S-Bahn,

trams, buses, cable cars and boats on the lake, and due to its high frequency of service (Figure 12).

Figure 12: Extensive public transport network of Zürich (www.stadt-zuerich.ch)

4.1 Used data

For choice modelling, both observed choices and matching sets of non-chosen alternatives are

required. In order to construct the set of non-chosen alternatives, a suitable and detailed street

network model is required. This study is restricted to the area around Zürich, as shown in Figure 13.

This area was chosen such that most everyday trips of participants were included.

4.1.1 Street network

For constructing the street network, all map data was extracted from OpenStreetMap data

(OpenStreetMap, 2015). The network is mainly based on the OSM highway attributes (tags). See

Figure 13 for the study area in OpenStreetMap (left) and the corresponding constructed street

network based on OSM highway attributes (right, in MATSim format). Because only pedestrians are

considered in this study, the network includes all links except motorway and trunk links, which

resulted in a network with approximately 3 million links. Three road types for pedestrians can be

distinguished: WalkOnly (only for pedestrians, in green), WalkSafe (for pedestrians and cyclists, in

purple) and WalkAll (all modes allowed, in white). See Appendix 1 for the larger maps.

Figure 13: Study Area (left: www.openstreetmap.org; right: constructed network (MATSim, visualised in VIA))

For walking, the gradient of a link is also a relevant attribute for route choice. Especially in a hilly

city such as Zürich, this attribute should be taken into account when analysing the route choices. The

elevations for the canton of Zürich are openly available under the GIS-ZH licence (Office for

Spatial Development of the Canton of Zurich, 2015). The Digital Terrain Model (DTM ZH) is

represented as a raster and is available with a resolution of 0.5 meters and in the scale 1:1000. The

elevation data is obtained by using high precision laser scanning (Lidar). Each node of the

network is assigned the elevation of the measurement point nearest to it. With this data the

maximum and average rise as well as maximum and average fall can be calculated per route. If a

link of a route is longer than 20 meters, the slope is calculated directly. If a link is shorter, it is

joined with the next links until the total length of the joined links is longer than 20 meters. The slope

is then calculated for the joined links together. The maximum rise or fall is the absolute value of the

most positive or most negative slope. The average rise or fall is calculated as the average of all

positive or all negative slopes.
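The slope calculation described above can be sketched in Python as follows. The function name and the tuple layout of a link are assumptions for this illustration. How links shorter than 20 meters at the very end of a route are treated is not stated in the text; in this sketch they are simply ignored.

```python
def route_slopes(links, min_length=20.0):
    """Maximum and average rise and fall of one route.

    links -- list of (length_m, start_elevation_m, end_elevation_m) tuples
             in travel order (hypothetical layout)
    """
    slopes = []
    joined_length = 0.0
    joined_rise = 0.0
    for length, z_start, z_end in links:
        # Short links are joined with the following links until the
        # joined length exceeds the threshold.
        joined_length += length
        joined_rise += z_end - z_start
        if joined_length > min_length:
            slopes.append(joined_rise / joined_length)
            joined_length = 0.0
            joined_rise = 0.0
    # Assumption: a short remainder at the end of the route is dropped.
    rises = [s for s in slopes if s > 0]
    falls = [-s for s in slopes if s < 0]
    return {
        "max_rise": max(rises, default=0.0),
        "avg_rise": sum(rises) / len(rises) if rises else 0.0,
        "max_fall": max(falls, default=0.0),
        "avg_fall": sum(falls) / len(falls) if falls else 0.0,
    }
```

For a route with links of 50, 10, 15 and 40 meters, the two short links are joined into one 25 meter segment, so three slopes are evaluated for that route.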

4.1.2 Observed routes

The observed routes are extracted from a data set collected in and around Zürich between August

2011 and December 2012 (Montini, Rieser-Schüssler, & Axhausen, 2013). Within this period 159

participants collected approximately one week of data. The participants collected the data by

person-based mobile GPS-trackers and they corrected the processed travel diaries afterwards in the

dedicated prompted recall web-interface survey. Figure 15 shows an example of visualised GPS

tracks and, on the right, the device used to collect the data (MobiTest GSL, 2012). An

example of a travel diary can be found in Appendix 2. In the travel diaries, participants could correct

and add trips, and add locations, activities and used travel modes. The corrected travel diaries (the

original data set) consist of 7233 stages, which made up 5284 trips. A stage is defined as a

movement between two consecutive stop points, covered by one mode of transport. A stop point is

defined as a location where the person performed an activity or where the person changed the

mode of transport. Stages are linked into trips, connected by mode transfers. A trip is defined as a

movement between two consecutive activities.
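The relation between stages and trips can be illustrated with a small Python sketch. The class and field names are hypothetical; the flag that marks whether a stage ends at an activity or at a mode transfer is assumed to be known from the travel diary.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    mode: str
    ends_at_activity: bool  # False when the stage ends at a mode transfer


def group_into_trips(stages):
    """Link consecutive stages into trips; a trip ends at the next activity."""
    trips, current = [], []
    for stage in stages:
        current.append(stage)
        if stage.ends_at_activity:
            trips.append(current)
            current = []
    if current:
        # Trailing stages without a closing activity are kept as one trip.
        trips.append(current)
    return trips
```

A walk to the tram stop, a tram ride and a final walk to the office form three stages but a single trip in this representation.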

The stages in the original data set took place everywhere in Switzerland. In this thesis, only the area

around Zürich is considered (Figure 13), so the data needs to be filtered first. The filtering selects

the relevant participants: the aim is to retain only participants who made most of their trips in Zürich.

The first filtering is done by visualising the GPS data in ArcGIS (ArcGIS,

2015). Figure 14 shows what the GPS data look like in GIS software. As can be seen in Figure 14,

this person made trips outside Zürich, but most of his or her trips were within Zürich. After the first

filtering, only the people making trips in Zürich were left (134 participants making 4380 stages).

The conclusion after the first filtering was that many of these participants only made a few trips in Zürich

(they might work or live outside the area), so these participants were less relevant in this study.

Most of their trips in the study area were trips to the station or to their car, so these trips were

excluded. After the second filtering, only the participants were left who made most of their trips in

the Zürich area. The final data set, which will be used in this thesis, consists of 59 participants,

making 3053 stages (by any transport mode, mainly in Zürich).

Figure 14: Example of observed routes of one person (ArcGIS, using OSM network)

4.1.3 GPS data collection and post-processing

The devices for data collection were equipped with a SIM-card to make it possible to send the data

over the GSM network. The participants were instructed to carry the mobile device every day for one

week and to charge it every night. While charging, the device sends the data to the FTP server (every

night). Alternatively, the data can be downloaded from the device directly and then

uploaded to the server. The raw GPS data is then automatically post-processed using available

routines, and the results are stored in a central MySQL database. The automated post-processing

routines are available as open source (POSDAP, 2012) and are described in detail in Rieser-Schüssler

et al. (2011) and Schüssler and Axhausen (2009). The three main sequential steps in post-processing

of GPS data are:

• filtering and smoothing of the raw data,

• the detection of stop points, stages, trips and activities,

• mode identification.

Filtering and smoothing of the data are done automatically in this phase, and the results are stored in the

MySQL database. Both processes are essential for reliable results as there can be various errors in

the GPS measurements. The most commonly used filtering criterion is the number of satellites in

view (Rieser-Schüssler, Montini, & Dobler, 2011). The written results of this phase of post-

processing contain GPS and accelerometer data, saved as .mbt files in the database. The data

available in the .mbt files are longitude, latitude, height, date, time, number of satellites in view and

acceleration characteristics of the GPS points. As the author did not collect the GPS data herself,

these .mbt files with filtered and smoothed GPS and accelerometer data were the files that the

author got from the institute. These files are used as input for further processing. Mode

identification and the detection of stop points, stages, trips and activities will be done in a later

phase of post-processing. To detect stop points and stages, speed and acceleration characteristics

and positions of the recorded GPS points are used. Stop points with changes in speed and

acceleration could for example be linked to mode transfer. A long time gap between two consecutive

GPS points could be linked to signal loss. For mode detection, criteria are used such as average or

maximum speed, duration of the stage, data quality or proximity to certain network elements

(roads, stations) to derive deterministically the best fitting mode (Rieser-Schüssler et al., 2011).

The GPS and accelerometer data stored in the MySQL database will be used to generate travel

diaries. Generation of travel diaries was done once a night and it is ultimately presented as a diary

to the respondents via a prompted recall web-interface survey (see Appendix 2). The addition of this

survey has three purposes: first, the respondents could correct and validate the results of the post-

processing procedures, second, they are often asked to add information that cannot be derived from

the GPS data (mode, trip purpose, destination type) and third, the survey can deliver input for the

processing procedures (Rieser-Schüssler, Montini, & Dobler, 2011). Results of this survey are

summarised in an Activities file, which is in this thesis also used as input for further processing

procedures (together with the .mbt files).

Figure 15: Example GPS tracks and GPS device

To find out whether the GPS data is representative for travel behaviour of the population, the GPS

data set is compared with data from the Mikrozensus Verkehr 2010 (Swiss Federal Statistical Office,

2010). Figure 16 shows that the mode shares are comparable, so in this respect the GPS study

is representative. For trip purpose, more work trips are reported in

the GPS study than in the Mikrozensus, whereas in the Mikrozensus

more shopping and education trips are reported than in the GPS study. The reason for this

difference is that older, working people were more willing to participate in the GPS study than

school-going young people. No personal characteristics were made available for the observed

routes, so this cannot be taken into account in model estimation.

Figure 16: Comparison of GPS data with data from Mikrozensus 2010 (mode share and trip purpose)

4.2 Processing of GPS data

Before the GPS data can be used for analysis, it requires extensive processing, so that the data are

useful for the next step of route choice modelling. Before processing, the data look as shown in

Figure 17: the data look very messy, there are no stop points and stages defined, and the trips are

not aligned to the street network. The crowdedness of data in the first picture reveals the

respondent’s work place or home. The processing procedure includes filtering and smoothing of the

data (cleaning), obtaining stop points and stages and aligning the GPS data to the street network

(map-matching). The desired results are the chosen walking routes for each respondent, and

characteristics of these routes. The following characteristics for all stages need to be obtained from

the data: the start and end time, start and end coordinates, start and end nodes in the network,

used links in the network, and all coordinates of the GPS points of the stage with their times,

average speed and transport mode. Figure 18 shows the whole procedure of GPS processing for

route choice modelling. The author started the process with the data from the MySQL

database, which had already been filtered and smoothed automatically once. The whole

procedure is written in one program. The map-matching results (chosen routes aligned to the

network) will be used in the next step of route choice modelling (Choice Set Generation).

Figure 17: Visualisation of observed routes by one person before processing of GPS data (ArcGIS, using

OpenStreetMap network)

Figure 18: Processing of GPS data

The GPS and accelerometer data are saved in .mbt format, as a result of the first phase of post-

processing. The data available in the .mbt files are longitude, latitude, height, date, time, number of

satellites in view and acceleration characteristics of the GPS points. No stages, trips or modes are

identified in the raw GPS data. Next to the .mbt files, there is also an Activities file, which is the

result of the travel diaries, filled in and corrected by the respondents. In this file all the activities of

the respondents can be found, with their start and end time, activity type, location description,

location coordinates, duration of activity in seconds, the mode used to get to the location and the

mode used to leave the location of the activity. When no trip purpose is assigned to an activity and

the activity lasts less than three minutes, it is assumed to be a mode transfer. Both the .mbt

files and the Activities file are used for GPS processing.
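The mode transfer rule stated above can be written as a short Python sketch. The function and parameter names are hypothetical; only the three minute threshold and the missing trip purpose come from the text.

```python
from typing import Optional


def is_mode_transfer(trip_purpose: Optional[str], duration_s: float,
                     threshold_s: float = 180.0) -> bool:
    """An activity without trip purpose that lasts less than three minutes
    is treated as a mode transfer."""
    return not trip_purpose and duration_s < threshold_s
```

An activity of 90 seconds without a trip purpose would thus be treated as a mode transfer, whereas a 90 second activity with the purpose "shopping" would remain an activity.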

For the processing procedure, the programming language Java is used in the Integrated

Development Environment Eclipse (Eclipse, 2015). To make the data useful for route choice

modelling, a program is written (main method) which implements existing routines for GPS

processing. These existing algorithms are available as open source in POSDAP (POSDAP, 2012) and are

described in detail in Rieser-Schüssler et al. (2011) and Schüssler and Axhausen (2009).

In the main method, first the environment will be prepared (load config.xml file with parameters,

load street network, define type of GPS and accelerometer data, define time format for written files,

load Activities file and GPS data files, create output files to write results, define headers of output

files, tell the program to process each person separately and to create separate files for each

person); then the processing can start. For all the data used in the program (GPS points,

Accelerometer data, stop points and stages of Activities file) a Java class is created that holds all the

data. The first step is to further filter and smooth (clean) the GPS data (the .mbt files of 59

interesting participants, result of first filtering using GIS software). The filtering and smoothing

criteria (parameters) are defined in the config.xml file. The program filters out GPS points that have

unrealistic altitude values (values below 200 or above 4200 meters above sea level, as these

do not exist in Switzerland), GPS points that have fewer than three satellites in view, and GPS points

that make unrealistic jumps in the stream of GPS coordinates. In addition, the program applies an HDOP and

VDOP filter. The HDOP and VDOP are measures of the best possible horizontal or vertical position accuracy

for a given configuration of GPS satellites. Even if there are enough satellites in view, they might not

be ideally positioned (Schüssler & Axhausen, 2009). The Dilution of Precision (DOP) expresses the

quality of the positioning and is an indication of the accuracy of the GPS points, solely based on the

geometry of the satellites. As the satellites move, the geometry varies with time, but it is very

predictable. The maximum HDOP and VDOP values are also set in the config.xml file. After filtering,

the GPS coordinates are smoothed using the parameters defined in the config.xml file. Data filtering

removes systematic errors while data smoothing removes random errors. Random errors are for

example caused by satellite or receiver issues and signal blocking, and could lead to missing GPS

points. In the config.xml file, the smoothing technique for position (set to Gauss kernel, as

recommended in POSDAP) and the smoothing range are defined. The result of filtering and

smoothing is a clean GPS data set that is ready to be map-matched to the given network.
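The filtering and smoothing steps can be sketched in Python as follows. This is an illustration under stated assumptions and not the POSDAP implementation: the class and function names are hypothetical, positions are assumed to be available in metres, and the maximum HDOP and VDOP, the jump threshold (expressed here as a maximum speed), the kernel bandwidth and the smoothing range are parameters whose actual values are set in the config.xml file and are not given in the text.

```python
import math
from dataclasses import dataclass


@dataclass
class GpsPoint:
    time_s: float
    x_m: float
    y_m: float
    altitude_m: float
    satellites: int
    hdop: float
    vdop: float


def filter_points(points, max_hdop, max_vdop, max_speed_mps):
    """Drop points that fail the altitude, satellite, DOP or jump criteria."""
    kept = []
    for p in points:
        if not 200.0 <= p.altitude_m <= 4200.0:
            continue
        if p.satellites < 3:
            continue
        if p.hdop > max_hdop or p.vdop > max_vdop:
            continue
        if kept:
            prev = kept[-1]
            dt = p.time_s - prev.time_s
            jump = math.hypot(p.x_m - prev.x_m, p.y_m - prev.y_m)
            # Assumption: a jump is "unrealistic" if it implies too high a speed.
            if dt <= 0 or jump / dt > max_speed_mps:
                continue
        kept.append(p)
    return kept


def gauss_smooth(times, values, sigma_s, window_s):
    """Gauss kernel smoothing of one coordinate over time."""
    smoothed = []
    for t in times:
        num = den = 0.0
        for t_j, v_j in zip(times, values):
            if abs(t_j - t) <= window_s:
                w = math.exp(-((t_j - t) ** 2) / (2.0 * sigma_s ** 2))
                num += w * v_j
                den += w
        smoothed.append(num / den)
    return smoothed
```

The smoothing would be applied to each coordinate separately; a single outlying value is pulled towards its neighbours, while a constant series is left unchanged.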

After filtering and smoothing, the speed and acceleration characteristics are calculated. The

speed and acceleration are calculated directly from the position and the timestamp of the GPS

points. Finally, coordinates are converted into the Swiss coordinate system (X and Y coordinates).
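Deriving speed and acceleration from positions and timestamps can be sketched in Python with simple finite differences. The function names are hypothetical, the great-circle (haversine) distance on a spherical Earth is an assumption made here to obtain distances from longitude and latitude, and the actual POSDAP routines may differ.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 positions."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speeds_and_accelerations(points):
    """points -- list of (time_s, lat, lon) with strictly increasing times.

    speeds[k] is the speed between point k and point k + 1;
    accelerations[k] is the change between two consecutive speeds
    divided by the time between the points they end at.
    """
    speeds = []
    for (t1, lat1, lon1), (t2, lat2, lon2) in zip(points, points[1:]):
        speeds.append(haversine_m(lat1, lon1, lat2, lon2) / (t2 - t1))
    accelerations = []
    for k in range(1, len(speeds)):
        dt = points[k + 1][0] - points[k][0]
        accelerations.append((speeds[k] - speeds[k - 1]) / dt)
    return speeds, accelerations
```

For pedestrians, the resulting speeds would typically be in the order of one to two metres per second, which is one of the characteristics used for mode identification.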

As there is an Activities file that defines activities (and thus stop points) there is no need to detect

stop points and stages from the GPS data. The used modes are also defined in this file. The data

from the Activities file is simply loaded into the program and is used to obtain stop points and

stages. This means that the two remaining steps of GPS post-processing (mode identification, and

the detection of stop points, stages, trips and activities, see section 4.1.3) were done using the

Activities file, which saved a lot of programming work.

4.3 Map-matching procedure

Map-matching is the process of aligning a sequence of observed user positions with the road

network on a digital map (Lou, et al., 2009). The purpose of this process is to establish routes

travelled by the participants. It is one of the key post-processing steps in a GPS study and a

fundamental step for many applications, such as traffic flow analysis. Efficient map-matching

algorithms are required to handle large GPS data sets in reasonable computation times. Schüssler

and Axhausen (2009) developed an algorithm that has proved to be efficient in handling large GPS

data sets. This map-matching procedure is implemented in this program and is described in detail in

Schüssler and Axhausen (2009) and Marchal et al. (2005).

While the three steps in post-processing of GPS data did not use any information other than the GPS

points, the map-matching procedure requires a network. The network was loaded into the

main method at the beginning of the procedure (part of environment preparation). The street

network used is the OSM-based network with the elevation model, as described in section 4.1.1.

The cleaned GPS points, and the obtained stop points and stages from the Activities file, were

matched to the given network using the algorithm of Schüssler and Axhausen (2009), implemented

in the main method. Figures 19 and 20 show the results of map-matching (in green, only walking

trips), where each GPS point is assigned to a link of a given network. The parameters for map-

matching and the directories for the output files are defined in the config.xml.

Figure 19: Map-matching of GPS points

The procedure of Schüssler and Axhausen (2009) was originally developed for navigation networks. The

network used in this thesis is based on OSM and is more detailed than navigation networks. For

example, a very curvy street in OSM is represented by many small links, and not by one link as in

navigation networks. For this reason, some of the criteria in the map-matching procedure had to be

adjusted to the OSM network, such as the minimum number of GPS points per link required for a valid

match. This adjustment is essential when map-matching pedestrian trips; otherwise far fewer valid

results are obtained. These criteria and other parameters can be found in the config.xml file.

Since map-matching requires long computation times, only the stages of interest to this

thesis are map-matched to the network. The map-matching procedure goes

through all trips, but only runs for stages that have a valid start and end time and for which

walking is the mode used. The map-matching results for all walking stages (the chosen routes) are

written in separate output files for each person, and there is one file with all results of all

participants. These output files contain the trip id, the number of GPS points per trip, start time,

start and end node in the network and used route links in the network. This map-matching

procedure also forms the last filter in the GPS processing procedure: the results of map-matching

contain only walking stages that meet all criteria for map-matching, so these results can be used

for further analysis. After this phase, only 580 walking stages (trips) made by 51 participants remain.

Fewer participants were left for analysis because apparently a few participants did not make walking

trips, or their trips did not meet the criteria for map-matching.
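The selection rule described above (valid start and end time, walking as the mode used) can be sketched as a simple predicate. This is a minimal illustration only: the class names, the field names, the mode label "walk" and the definition of a "valid" time (positive start, end after start) are assumptions and are not taken from the thesis code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stage record; the field names are assumptions for illustration. */
final class StageSketch {
    final String mode;     // travel mode taken from the Activities file, e.g. "walk"
    final long startTime;  // seconds; a value <= 0 stands for "missing" in this sketch
    final long endTime;    // seconds

    StageSketch(String mode, long startTime, long endTime) {
        this.mode = mode;
        this.startTime = startTime;
        this.endTime = endTime;
    }
}

final class StageFilterSketch {
    /** True if the stage has a valid start and end time and walking is the mode used. */
    static boolean isMatchable(StageSketch s) {
        boolean validTimes = s.startTime > 0 && s.endTime > s.startTime; // assumed validity rule
        return validTimes && "walk".equals(s.mode);
    }

    /** Keeps only the stages that would be passed on to map-matching. */
    static List<StageSketch> select(List<StageSketch> stages) {
        List<StageSketch> selected = new ArrayList<>();
        for (StageSketch s : stages) {
            if (isMatchable(s)) {
                selected.add(s);
            }
        }
        return selected;
    }
}
```

Filtering before map-matching keeps the expensive matching step limited to the stages that can actually end up in the route choice data set.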

After the map-matching procedure, the output files of the whole processing procedure are written.

Here too, results are only written when a stage has a valid start and end

time. The first output files contain information about the stages (Stage files): a list of all stages and

their characteristics (user, start and end time, start and end coordinates, start and end nodes in the

network when map-matched). One Stage file is created for all stages (any mode) and one is created

for only walking stages. Also, output files of all the GPS points are created (GPS files). These GPS

files contain a list of all GPS points with their coordinates, stage id, user id, time, speed and mode.

Three kinds of files are created; each of them can be useful for different purposes. The first contains

all GPS points of all stages of all participants, the second contains all GPS points of only walking

stages of all participants and the last kind are separate files with GPS points for each person (also

only walk stages). The last output file to be written is a network file with the chosen routes (results

of map-matching procedure). This network with routes can be used for analysis in GIS software.

Figure 20: GPS points (red) and walking trips after Map-Matching (green)

4.4 Generation of alternative non-chosen routes

The next step in route choice modelling is to generate alternative non-chosen routes. The non-

chosen routes will be generated using the results from Map-matching (chosen routes) and the street

network. As argued in section 3.5, for choice set generation the Breadth First Search on Link

Elimination (BFS-LE) method developed by Rieser-Schüssler (2012) will be used. The procedure

combines a Breadth First Search with topologically equivalent network reduction. Breadth First

Search is an algorithm for searching tree data structures, developed by Moore (1959). It starts at a

tree root (source) and it first visits neighbouring nodes before moving to the next level nodes.

The general goal of choice set generation is to produce a route choice set of diverse, feasible and

least cost routes. A feasible route is continuous, contains no loops and has low travel costs. The

Breadth First Search algorithm processes nodes for short routes earlier than long ones, so the

algorithm is more likely than other search algorithms (such as Depth-First, Best-First or Multiway Tree

Search) to generate least cost routes.

Figure 21: Order in which the nodes are explored (stackoverflow.com)

Given a cost function, the BFS-LE method repeatedly calculates least cost (shortest) paths for a given

origin-destination pair in a given network, removing links in turn (network reduction).

Once a shortest path has been calculated, its links are removed one by one, each removal producing a subnetwork.

For each resulting subnetwork, the algorithm searches for the next shortest path. The algorithm proceeds to

the next level (depth) when all links of the original shortest path have been processed. The

calculated shortest paths become the starting points for the next iteration of link elimination. The

algorithm monitors the generated networks and retains only unique and connected routes and

shortest paths for the choice set. The algorithm stops when the desired number of unique routes

in the choice set has been generated, when the time abort threshold is met or when the original

shortest path is exhausted. The BFS-LE method, its development and its performance are described in detail

in Rieser-Schüssler (2012), and the method is illustrated with an example in Figure 22.
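The link elimination loop described above can be sketched in a few lines of Java. This is a simplified illustration under stated assumptions, not the implementation used in this thesis or in Rieser-Schüssler (2012): it omits the topologically equivalent network reduction, uses a plain Dijkstra search with a linear scan over the links, assumes that link ids equal array indices, and stops when the queue of reduced networks is empty. All class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

final class BfsLeSketch {
    /** Directed link with a generalised cost; id must equal its index in the link array. */
    static final class Link {
        final int id, from, to;
        final double cost;
        Link(int id, int from, int to, double cost) {
            this.id = id; this.from = from; this.to = to; this.cost = cost;
        }
    }

    /** Dijkstra: least cost path as a list of link ids, ignoring removed links; null if unreachable. */
    static List<Integer> leastCostPath(Link[] links, int nNodes, int origin, int dest,
                                       Set<Integer> removed) {
        double[] dist = new double[nNodes];
        int[] viaLink = new int[nNodes];
        Arrays.fill(dist, Double.POSITIVE_INFINITY);
        Arrays.fill(viaLink, -1);
        dist[origin] = 0.0;
        PriorityQueue<double[]> pq = new PriorityQueue<>((x, y) -> Double.compare(x[0], y[0]));
        pq.add(new double[] {0.0, origin});
        while (!pq.isEmpty()) {
            double[] top = pq.poll();
            int u = (int) top[1];
            if (top[0] > dist[u]) continue;              // stale queue entry
            for (Link l : links) {                       // linear scan keeps the sketch short
                if (l.from != u || removed.contains(l.id)) continue;
                double candidate = dist[u] + l.cost;
                if (candidate < dist[l.to]) {
                    dist[l.to] = candidate;
                    viaLink[l.to] = l.id;
                    pq.add(new double[] {candidate, l.to});
                }
            }
        }
        if (Double.isInfinite(dist[dest])) return null;
        LinkedList<Integer> path = new LinkedList<>();
        for (int v = dest; v != origin; v = links[viaLink[v]].from) {
            path.addFirst(viaLink[v]);                   // walk back from destination to origin
        }
        return path;
    }

    /** Breadth first search over reduced networks, each identified by its set of removed links. */
    static List<List<Integer>> generate(Link[] links, int nNodes, int origin, int dest,
                                        int maxRoutes, long timeLimitMillis) {
        long deadline = System.currentTimeMillis() + timeLimitMillis;
        List<List<Integer>> choiceSet = new ArrayList<>();
        Set<Set<Integer>> seenNetworks = new HashSet<>();
        Deque<Set<Integer>> queue = new ArrayDeque<>();
        Set<Integer> root = new HashSet<>();             // full network: nothing removed
        queue.add(root);
        seenNetworks.add(root);
        while (!queue.isEmpty() && choiceSet.size() < maxRoutes
                && System.currentTimeMillis() < deadline) {
            Set<Integer> removed = queue.poll();
            List<Integer> path = leastCostPath(links, nNodes, origin, dest, removed);
            if (path == null) continue;                  // reduced network is disconnected
            if (!choiceSet.contains(path)) choiceSet.add(path);   // keep unique routes only
            for (int linkId : path) {                    // eliminate each link of this path in turn
                Set<Integer> child = new HashSet<>(removed);
                child.add(linkId);
                if (seenNetworks.add(child)) queue.add(child);    // skip networks already processed
            }
        }
        return choiceSet;
    }
}
```

Because the queue is first-in-first-out, all networks with one removed link are processed before any network with two removed links, which is what gives the method its breadth first character. The two stopping limits in the sketch correspond to the choice set size and the time abort threshold mentioned in the text.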

Menghini et al. (2010) implemented the BFS-LE method in the bicycle route choice context using a

single-attribute cost function that considers only the length of the link. Halldórsdóttir (2014) also

implemented BFS-LE for bicycle route choices, but used a multi-attribute cost function that takes into

account the length of the link, road type, cycle lanes and land use. This thesis also uses a

multi-attribute cost function, which includes the pedestrian-oriented cost attributes length, path,

road type and gradient. A multi-attribute cost function is used to obtain realistic, diverse route

alternatives and to account for heterogeneous preferences across pedestrians. For car users,

travel time is most relevant in route choices, so a single-attribute cost function would be

sufficient. For pedestrians and cyclists, however, other attributes are relevant for route choices as well,

and every individual has their own preferences, so a multi-attribute cost function is required to obtain a

heterogeneous choice set and to estimate route choice models. As the quality of parameter

estimates depends on the quality of the choice sets, it is important to include these

pedestrian-oriented factors in order to understand pedestrians' preferences. An advantage of BFS-LE is that

it can use any cost function specified by the analyst without changing the algorithm

structure or computational performance (Rieser-Schüssler, Balmer, & Axhausen, 2012). The cost

function can take any form and depends only on the available network information.

Figure 22: BFS-LE algorithm: d = depth; Sn = additional alternatives found at depth n; S = size of the choice

set; b(d) = Number of candidate networks at depth d; (Rieser-Schüssler (2012))

For the choice set generation, a new Java program was written. The main method reads the OD pairs

of the chosen routes (map-matching results), reads the given OSM-based network, generates choice

sets using both data sources, the given cost function and the specified choice set generation

algorithm (implemented in the main method), adds the chosen routes to the choice set when these are

not generated by the algorithm, and writes the output files with the choice sets.

The main method starts by defining the location of the parameters used (config.xml file) and

reading the network, the elevation model and the chosen routes. Then, the attributes (gradient,

road type) of the links are set. The gradient of each link is calculated using the heights of the

nodes (from the elevation model) and the distances from the street network. The road types are set

using only the OSM street network. The different road types are based on the OSM tags (roads

for walk only, for walk and cycle, and for all modes). Next, the cost function and limits need to

be defined. The multi-attribute cost function used here includes four attributes: length (distance),

path, road type and gradient. There are three parameters for road types (walk only, walk and cycle,

all modes) and two for path (foot path or no foot path). The following cost function is used:

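$$
\begin{aligned}
C_a ={}& \sum_k \big( (\beta_{\mathrm{RoadType}_k} + \xi_{\mathrm{RoadType}_{ak}}) \cdot \mathrm{RoadType}_{ak} \big) \cdot \mathrm{Length}_a \\
&+ \sum_k \big( (\beta_{\mathrm{Path}_k} + \xi_{\mathrm{Path}_{ak}}) \cdot \mathrm{Path}_{ak} \big) \cdot \mathrm{Length}_a \\
&+ \sum_k \big( (\beta_{\mathrm{Gradient}_k} + \xi_{\mathrm{Gradient}_{ak}}) \cdot \mathrm{Gradient}_{ak} \big) \cdot \mathrm{Length}_a + \varepsilon_a
\end{aligned}
\tag{10}
$$

where $C_a$ is the random cost of Link $a$, $\mathrm{Length}_a$ is the length (distance) of Link $a$,

$\mathrm{RoadType}_{ak}$, $\mathrm{Path}_{ak}$ and $\mathrm{Gradient}_{ak}$ are the Road Type $k$, Path $k$ and Gradient $k$ that Link $a$ belongs to,

$\xi_{\mathrm{RoadType}_{ak}}$, $\xi_{\mathrm{Path}_{ak}}$ and $\xi_{\mathrm{Gradient}_{ak}}$ are error components related to Road Type $k$, Path $k$ and Gradient $k$ of Link $a$,

$\beta_{\mathrm{RoadType}_k}$, $\beta_{\mathrm{Path}_k}$ and $\beta_{\mathrm{Gradient}_k}$ are coefficients related to Road Type $k$, Path $k$ and Gradient $k$,

and $\varepsilon_a$ is the random error term for Link $a$. Here, each error term $\varepsilon_a$ was equal to zero for every Link $a$

and each error component $\xi_{\mathrm{RoadType}_{ak}}$, $\xi_{\mathrm{Path}_{ak}}$, $\xi_{\mathrm{Gradient}_{ak}}$ was equal to one.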
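As an illustration of how equation (10) can be evaluated for a single link, the following sketch computes the cost from arrays of coefficients, error components and attribute values per attribute group (road type, path, gradient). The class name, the method names and all numerical values are hypothetical; the coefficient values in particular are invented for the example and are not the values used in this thesis.

```java
final class LinkCostSketch {
    /** One attribute group of equation (10): sum over k of (beta_k + xi_k) * x_ak. */
    static double groupTerm(double[] beta, double[] xi, double[] x) {
        double sum = 0.0;
        for (int k = 0; k < beta.length; k++) {
            sum += (beta[k] + xi[k]) * x[k];
        }
        return sum;
    }

    /**
     * Cost of one link: every group term is multiplied by the link length,
     * and the random error term epsilon is added once.
     * beta[g], xi[g] and x[g] hold the values of attribute group g.
     */
    static double linkCost(double length, double[][] beta, double[][] xi, double[][] x,
                           double epsilon) {
        double cost = epsilon;
        for (int g = 0; g < beta.length; g++) {
            cost += groupTerm(beta[g], xi[g], x[g]) * length;
        }
        return cost;
    }

    public static void main(String[] args) {
        // Hypothetical example: a 100 m link of the second road type, on a foot path.
        double[][] beta = {{0.0, 0.5, 1.0}, {0.0, 0.2}};   // invented coefficients
        double[][] xi = {{1.0, 1.0, 1.0}, {1.0, 1.0}};     // error components fixed to one
        double[][] x = {{0.0, 1.0, 0.0}, {1.0, 0.0}};      // category indicators of the link
        System.out.println(linkCost(100.0, beta, xi, x, 0.0));
    }
}
```

With the error components fixed to one and the error term fixed to zero, as stated above, the cost function is deterministic, so repeated runs for the same OD pair produce the same choice set.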

Before starting the choice set generation, the choice set size and the time abort threshold (limits)

need to be defined (set in the config.xml file). The choice set size is set to twenty alternatives and

the time abort threshold is set to 300 seconds per OD pair. A size of twenty was chosen because

it leaves room to vary the size and composition of the choice set when estimating the

route choice model. A choice set size of six would be consistent with the finding that an individual

can only consider about six alternatives (Bovy & Stern, 1990), but this would make the estimation

process less flexible. A few test runs indicated that a time abort threshold of 300 seconds is

sufficient for generating twenty feasible alternatives. Rieser-Schüssler (2012) tested the algorithm

for 100 alternatives, and the average computation time per OD pair did not exceed 10 minutes.

Then, the choice set generation algorithm will run for the OD pairs. The OD pairs will only be

processed when they have a valid start node and end node. The algorithm creates alternative routes

for the processed OD pairs using the network, the cost function and the conditions set in the

config.xml file. If the chosen route is not generated by the algorithm itself, it is

added to the choice set at the end. In this case, the choice set size is 21 instead of 20. The

results of choice set generation will be written to files, for each person separately. The choice set

writer is implemented in the main method as well. For each alternative in the choice sets, the start

time, the start and end node and the used links in the network will be written. Also, the chosen

route from the choice set will be indicated. The results will also be written in a format that could be

analysed in GIS software. The choice set writer for GIS is also implemented in the main method and

will write GIS results for each person separately. The written results for each alternative (route) are

the used links and the coordinates of their start and end node.
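The rule for adding the chosen route to the choice set can be sketched as follows. Routes are represented as lists of link ids, which is an assumption made for this illustration; the class and method names are hypothetical as well.

```java
import java.util.ArrayList;
import java.util.List;

final class ChoiceSetAssemblerSketch {
    /**
     * Makes sure the chosen route is part of the choice set and returns its index.
     * If the generation algorithm did not reproduce the chosen route, the route is
     * appended at the end, so a set of 20 generated routes grows to 21.
     */
    static int ensureChosenIncluded(List<List<Integer>> choiceSet, List<Integer> chosenRoute) {
        int index = choiceSet.indexOf(chosenRoute);   // routes are equal if their link sequences match
        if (index >= 0) {
            return index;
        }
        choiceSet.add(new ArrayList<>(chosenRoute));
        return choiceSet.size() - 1;
    }
}
```

The returned index can be used by the choice set writer to mark which alternative was chosen.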

The choice set generation method was able to reproduce 67% of the chosen routes. So for 33% of

the OD pairs the chosen route was not reproduced and therefore added in the end to the choice set

(resulting in a choice set of 21 alternatives). This result is satisfactory as Halldórsdóttir (2014) found

in her choice set generation methods study that the BFS-LE method reproduced 62% to 68% of the

chosen routes and the Doubly Stochastic Generation method replicated 59% to 64% of the chosen

routes in a detailed network. The high percentage could be explained by the fact that pedestrians

generally make short trips. Halldórsdóttir (2014) found that the algorithms showed less consistent

results in longer trips: the average coverage decreases with the increasing trip length, especially

when the observed trip is longer than 10 km.

4.5 Calculation of route characteristics and Path-Sizes

In order to estimate the route choice models and to find out which characteristics have an influence

on the route choices of pedestrians, the route characteristics of the chosen and the non-chosen

alternative routes need to be calculated. For the calculation of route characteristics, a new Java

program was also written. The main method reads the given network, reads the link attributes (road

types and node heights) and sets these link attributes to the links in the program, reads the choice

sets (results from previous step), calculates the route attributes for the alternatives, calculates the

path size factor for each choice set and writes the results for choice modelling. The final output is a

data file with all the observed and generated non-chosen routes with their calculated route

characteristics and Path-Sizes. This file can be used for choice modelling. Figure 24 shows an

overview of the route attribute calculation process.

4.5.1 Environmental street characteristics

The inputs for the main method will be the OSM network, the elevation model and the choice sets.

These sources are needed for calculating the route attributes and Path-Sizes of the choice sets.

First, the network that will be used for route attribute calculation will be prepared: the link attributes

will be set in the main method (to the links of the network). Attributes will be calculated per link

using data taken from the OSM network and the elevation model.

Figure 23: Road types in the street network (visualisation in VIA)

The main method starts with reading the given network and the elevation model. In order to

calculate the link attributes for each link in the given network, a public class is created which holds

all the links in the network. General member variables in this class are length (distance), free flow

travel time, capacity and number of lanes. Additional variables for this class are gradient and road

type. Road type is based on OSM tags (see Figure 23), which can be WalkOnly (only pedestrians),

WalkSafe (pedestrians and cyclists) or WalkAllmodes (all modes allowed). Variables such as distance and

road type can be taken from the OSM-based network itself, but the gradient cannot. The gradient is

set to the links later in the main method, using the elevation model. The result is a public class (link

class), which holds all the links of the network with the variables set to the links.

Returning to the main method, all link attributes will be taken from the link class or calculated (the

gradient), and set in the main method. The gradient is calculated by reading the node heights of the

elevation model and calculating the gradient of the links between the nodes. When gradient is found

for a link, this will be set in the main method. The road types for each link, taken from the links

class, will also be set in the main method.

Figure 24: Overview of route attributes calculation

After setting all link attributes to the links in the main method, the choice sets

generated in the previous step are read. The choice sets consist of routes, so a public class is created which

holds all routes of the choice sets and the variables that will be calculated. Then, a route attributes

calculator is prepared, to calculate and set the route variables of the routes in the route class. These

variables will be calculated using the network that is set in the previous step. For the route

attributes calculator, a public class is created which holds the methods for route attribute

calculation. This class is called in the main method. The calculation methods in this class will run for

the choice set routes in the route class. For these routes, the distance, gradient, rise and fall

characteristics, road type fractions and Path-Size factors will be calculated. The calculator will set

these attributes to the routes as well. The calculation of the Path-Size factors will be discussed in the

next section.

Gradient is calculated as the height difference between the start and end node of the link, divided by

the length of the link. Rise and fall characteristics for the routes are maximum, minimum and

average rise and fall, and rising and falling altitude difference. Also, the proportion of each route

that is flat, rising or falling is calculated (gradient proportions). For road type, the fractions of

WalkOnly, WalkSafe and WalkAllmodes of the routes are calculated. This method of assigning road type

to the routes was chosen because routes usually do not consist of one road type. Long

routes in particular could cover different road types. The total of all road type fractions is always one, because

every link belongs to exactly one of the three road type categories. This method is preferred to a method

where road type is expressed in distance (meters or kilometres) as different routes within a choice

set could have different distances.
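The link gradient and some of the route attributes described above can be sketched as follows. The sketch assumes that fractions are weighted by link length and that heights and lengths are given in metres; the thesis text does not state the weighting explicitly. All class, field and method names are hypothetical.

```java
import java.util.List;

final class RouteAttributesSketch {
    enum RoadType { WALK_ONLY, WALK_SAFE, WALK_ALLMODES }

    /** Link with the attributes needed for the route characteristics (metres). */
    static final class Link {
        final double length, startHeight, endHeight;
        final RoadType roadType;
        Link(double length, double startHeight, double endHeight, RoadType roadType) {
            this.length = length;
            this.startHeight = startHeight;
            this.endHeight = endHeight;
            this.roadType = roadType;
        }
        /** Height difference between end and start node, divided by the link length. */
        double gradient() {
            return (endHeight - startHeight) / length;
        }
    }

    /** Total route length. */
    static double distance(List<Link> route) {
        double total = 0.0;
        for (Link l : route) total += l.length;
        return total;
    }

    /** Length-weighted share of the route on the given road type (assumed weighting). */
    static double roadTypeFraction(List<Link> route, RoadType type) {
        double onType = 0.0;
        for (Link l : route) {
            if (l.roadType == type) onType += l.length;
        }
        return onType / distance(route);
    }

    /** Steepest rising gradient on the route; zero if no link rises. */
    static double maxRise(List<Link> route) {
        double max = 0.0;
        for (Link l : route) max = Math.max(max, l.gradient());
        return max;
    }

    /** Length-weighted share of the route that is rising (assumed weighting). */
    static double risingFraction(List<Link> route) {
        double rising = 0.0;
        for (Link l : route) {
            if (l.gradient() > 0.0) rising += l.length;
        }
        return rising / distance(route);
    }
}
```

Because every link carries exactly one road type, the three road type fractions computed this way always add up to one, as noted in the text.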

4.5.2 Path-Size factors (overlap)

As discussed in section 3.3.3, the Path-Size Logit model will be adopted in this research to overcome

the overlapping problem between routes in a choice set. The two original formulations and the PSC

term of Bovy et al. (2008) were implemented in the route attributes calculator class, which is called

in the main method. Formulations of these Path-Size attributes and motivation to select these three

formulations can be found in section 3.6. Three methods were implemented in order to compare the

results; finally, the formulation that shows the best model results will be selected in the final

estimation process. In the main method, also another class is called (Path Size Calculation Helper)

which helps the route attributes calculator in the Path Size calculation process. The variables used

for the calculation of the Path Sizes are defined in this class. As Path Size Factors depend on the

other routes in the choice set, the Path Size calculation method runs for each choice set. When

this is all calculated, the different Path Size Factors will be set to the routes.
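To illustrate why the Path Size calculation has to run per choice set, the sketch below computes the basic Path Size factor as it is commonly stated for the Path-Size Logit model, $PS_i = \sum_{a \in \Gamma_i} \frac{l_a}{L_i} \cdot \frac{1}{\sum_{j \in C} \delta_{aj}}$, where $l_a$ is the length of link $a$, $L_i$ the length of route $i$ and $\delta_{aj}$ equals one if route $j$ uses link $a$. Whether this corresponds exactly to one of the formulations selected in section 3.6 is an assumption; the second formulation and the PSC term are not shown. Class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PathSizeSketch {
    /**
     * Basic Path Size factor for every route in one choice set.
     * Routes are lists of link ids; linkLength maps a link id to its length.
     */
    static double[] pathSizes(List<List<Integer>> routes, Map<Integer, Double> linkLength) {
        // Count how many routes in the choice set use each link.
        Map<Integer, Integer> usage = new HashMap<>();
        for (List<Integer> route : routes) {
            Set<Integer> unique = new HashSet<>(route);
            for (Integer a : unique) {
                usage.merge(a, 1, Integer::sum);
            }
        }
        double[] ps = new double[routes.size()];
        for (int i = 0; i < routes.size(); i++) {
            double routeLength = 0.0;
            for (Integer a : routes.get(i)) {
                routeLength += linkLength.get(a);
            }
            for (Integer a : routes.get(i)) {
                // Share of the route on link a, discounted by the number of routes sharing it.
                ps[i] += (linkLength.get(a) / routeLength) / usage.get(a);
            }
        }
        return ps;
    }
}
```

A route that shares no link with any other route in its choice set gets a Path Size of one; the more of its length is shared, the closer the value moves towards zero.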

4.5.3 Writing final results for choice modelling

When all route attributes are calculated and set to the routes, the results will be written. The output

will be used for choice modelling, so the output files are written accordingly. To write the final

results, a writer class is created which is called in the main method. The writer will run for all choice

sets with the calculated route attributes. Before writing the results, the location and the format of

the output file are defined in the writer class. Then, the header of the output file with the desired

route data and the route attributes data are defined in the writer. When this is all prepared, the

constant route data (such as person id and route id) and calculated results (such as distance,

gradient, Path Sizes) are written to the file. In every choice set, the chosen route gets a ‘1’ in the

CHOICE column and every non-chosen route a ‘0’. When all results are written (choice

sets including route characteristics and path sizes), the writer and the main method can be closed.

5 Analysis of GPS and generated data

Once all the data required for choice modelling (the observed routes, the non-chosen alternatives and

their route attributes) have been collected, the model estimation process can start. As guidance for

the model estimation process, a descriptive analysis is first carried out on the data using SPSS.

It is important to know what the statistics say about the data before starting the estimation

process. This way, relevant attributes can be selected for the estimation process.

Furthermore, results from the descriptive analysis can be used to formulate hypotheses and to make a

research plan for the estimation process. The following research question guides this chapter:

• What does the GPS data reveal about the choice behaviour of pedestrians in Zürich, and which

hypotheses based on literature are confirmed?

First, a research plan will be presented in this chapter. Then, descriptive analyses will be conducted

to test the hypotheses that are formulated in the research plan. Conclusions of the descriptive

analysis could be used to design the estimation process.

5.1 Research plan

As numerous statistical analyses could be carried out on the data, it is wise to design a research

plan and to formulate objectives and hypotheses for the descriptive analysis. The main objective of

the descriptive analysis is to find out what the basic features are of the data used in this study.

Descriptive statistics are used to describe and summarise data in a meaningful way such that, for

example, patterns might be observed from the data. Results of descriptive analysis form the basis of

further quantitative research. In this thesis, they are used to design the model estimation process.

In order to obtain a clear picture of the data used in this study, first descriptive analyses will be

carried out on the observed routes and the non-chosen generated routes. These two data sets will

be described and summarised by looking into their distribution (frequency table), central tendency

(mean and median) and dispersion, which refers to the spread of the values around the central

tendency (range and standard deviation). These analyses will give a first idea about the data and

could give an idea about how the chosen routes differ from the non-chosen routes. When something

strange or unexpected is observed from the results, this requires further analysis.
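The descriptive measures named above (mean, median, range and standard deviation) were obtained with SPSS in this thesis. Purely as an illustration of what these measures compute, a minimal Java sketch is given below; it assumes the sample standard deviation (division by n - 1), and the class and method names are hypothetical.

```java
import java.util.Arrays;

final class DescriptivesSketch {
    /** Arithmetic mean. */
    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) sum += v;
        return sum / x.length;
    }

    /** Median: middle value, or the average of the two middle values for even n. */
    static double median(double[] x) {
        double[] sorted = x.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : 0.5 * (sorted[n / 2 - 1] + sorted[n / 2]);
    }

    /** Range: difference between the largest and the smallest value. */
    static double range(double[] x) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : x) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return max - min;
    }

    /** Sample standard deviation (divides by n - 1; assumes at least two values). */
    static double stdDev(double[] x) {
        double m = mean(x);
        double squares = 0.0;
        for (double v : x) squares += (v - m) * (v - m);
        return Math.sqrt(squares / (x.length - 1));
    }
}
```

Comparing mean and median is a quick check for skewness: for strongly right-skewed trip lengths the median lies well below the mean.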

Next, it is useful to see how the observed routes relate to the non-chosen generated routes.

The literature study in chapter 2 tells us why pedestrians choose certain routes and which route

preferences they have. The main conclusions about route choice behaviour of pedestrians from

literature (chapter 2) and first observations of the GPS data are used to formulate hypotheses about

the data. These were:

1. People always choose the shortest route (main conclusion from literature)

2. People clearly prefer WalkOnly roads (largest fraction WalkOnly; a preference for pedestrian

paths and safety factors is found in literature)

3. Maximum rise has more influence on pedestrian route choices than average rise (Menghini

et al. (2010), conclusion for cyclists, but likely to be applicable for pedestrians as well)

4. Most distinct routes (PS1/2 close to 1; PSC to 0) are clearly preferred to overlapping routes

(overlap has a negative effect on route choices (Ben-Akiva & Bierlaire, 1999))

The first hypothesis is based on the main finding of the literature study: trip length is the most

dominant factor in pedestrian route choices. This conclusion is found in revealed preference studies

about pedestrian route choices of Hill (1982), Seneviratne & Morrall (1985), Borgers & Timmermans

(1986), Verlander & Heydecker (1997), Agrawal Weinstein, Schlossberg, & Irvin (2008), Guo & Loo

(2013), Rodriguez, Merlin, & Prato (2014) and Broach & Dill (2015). To find out if people really

choose the shortest route available in the network, we will find out what percentage of the chosen

routes is the shortest route available in their choice set. If the data show that people do not

choose the shortest route, further analysis is required to find out why, since

almost all studies on pedestrian route choice behaviour state that they do. It is unlikely

that trip length has no influence at all on route choices of pedestrians, so in this case

further data analysis is needed.
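The test of the first hypothesis amounts to counting, over all choice sets, how often the chosen route is the shortest alternative. A minimal sketch of this calculation is given below; the data layout (one array of route lengths per choice set plus the position of the chosen route) and the handling of ties (a chosen route that is equally short as the shortest alternative counts as shortest) are assumptions for illustration.

```java
import java.util.List;

final class ShortestShareSketch {
    /**
     * Share of choice sets in which the chosen route is the shortest alternative.
     *
     * @param distances   one array of route lengths per choice set
     * @param chosenIndex position of the chosen route within each array
     */
    static double shareChosenIsShortest(List<double[]> distances, int[] chosenIndex) {
        int hits = 0;
        for (int s = 0; s < distances.size(); s++) {
            double[] d = distances.get(s);
            double min = Double.POSITIVE_INFINITY;
            for (double v : d) min = Math.min(min, v);
            if (d[chosenIndex[s]] <= min + 1e-9) {   // ties count as shortest (assumption)
                hits++;
            }
        }
        return (double) hits / distances.size();
    }
}
```

The same pattern, with the minimum replaced by the appropriate extreme of another attribute, applies to the tests of the other hypotheses, for example the largest WalkOnly fraction or the smallest maximum rise.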

The second hypothesis is also based on literature, which says that pedestrians prefer pedestrian

paths for safety reasons. Brown, Werner, Amburgey, & Szalay (2007), Agrawal Weinstein,

Schlossberg, & Irvin (2008), Guo & Loo (2013) and Rodriguez, Merlin, & Prato (2014) give this as a

conclusion of their studies about pedestrian route choices. This will be researched by determining

the percentage of the chosen routes that has the largest fraction of WalkOnly roads.

The third hypothesis is also based on literature, but on a study about cyclists by Menghini et al.

(2010), also performed in Zürich. As individual pedestrians’ behaviour shows similarities with travel

behaviour of cyclists (both driven by physical effort), and both studies are conducted in the same

city, the conclusion about cyclists in Zürich is likely to apply to pedestrians in Zürich as well. The

hypothesis will be tested by comparing the percentage of chosen routes with the smallest average

rise of the choice set with the percentage of chosen routes with the smallest maximum rise of the

choice set. When the hypothesis is true, the maximum rise will be taken into the estimation process.

The fourth hypothesis is based on the conclusion from literature that overlap has a negative

influence on route choices (Ben-Akiva & Bierlaire, 1999). To find out if pedestrians prefer most

distinct routes and do not like overlapping routes, we will determine what percentage of the chosen

routes has the least overlap of their choice set (largest PS1 and PS2, smallest PSC).

The last hypothesis is not based on the literature study but on an assumption and on a first observation

of the GPS data. Apparently, the algorithm was not able to generate all observed routes. The

algorithm was mainly driven by finding shortest routes, so an explanation for not generating the

observed route is that the observed route is not one of the shortest routes between a

given origin and destination. This observation and assumption lead to the following hypothesis.

5. When the chosen route is not generated by the algorithm, the chosen route is mainly one of the

longest routes of the choice set

For testing the last hypothesis, only the choice sets having 21 alternatives will be taken into the

analysis. Of this data set, consisting of choice sets of 21 alternatives, we will find out if the chosen

route belongs to one of the longest routes in distance of the choice set.

Lastly, it is useful to know the composition of the choice set and to look into the correlations

between different attributes. Knowledge about the composition of the choice set can support the

sampling of alternatives for estimation. Using a well-chosen sample of alternatives could lead to

better model results than using the full choice set. The results of the correlation analysis can

support the interpretation of the model estimation results: when a variable turns out to be

insignificant, it may be strongly correlated with one of the other variables.

5.2 Descriptive analysis of results

The total data set that will be used for estimation consists of 579 valid trips made by 51 individuals.

Table 4 below shows an overview of all the calculated route attributes and their descriptions. As

seen in the table, several gradient attributes were calculated. Not all of them will be taken into

the estimation process, as that would lead to correlated parameter estimates. By the end of the

descriptive analysis, the gradient attribute with the largest expected impact on route choices will be

selected to take into the estimation process.

| Route attribute | Description and unit |
| --- | --- |
| Distance | Trip length [km] |
| RiseAverage | Average absolute rise [m/100 m] |
| RiseMax | Maximum rise [m/100 m] |
| FallMax | Maximum fall [m/100 m] |
| Rise Fraction | Fraction of route which is rising [0-1] |
| Flat Fraction | Fraction of route which is flat [0-1] |
| Fall Fraction | Fraction of route which is falling [0-1] |
| WalkOnlyFraction | Fraction of route which is only for pedestrians [0-1] |
| WalkSafeFraction | Fraction of route which is for pedestrians and cyclists [0-1] |
| WalkAllFraction | Fraction of route which is used by all traffic modes [0-1] |
| PS1 | Path Size Factor; Ben-Akiva & Bierlaire (1999) first formulation [0-1] |
| PS2 | Path Size Factor; Ben-Akiva & Bierlaire (1999) second formulation [0-1] |
| PSC | Path Size Correction Factor; Bovy et al. (2008) [0-1] |

Table 4: Calculated Route Attributes

Table 5 shows the characteristics of the chosen routes. Apparently, people in Zürich mainly walk

short distances (average of 0,13 km). The extensive public transport network in Zürich, and the fact

that most people possess an unlimited travel card for travel zone 1, could explain why people do not

walk long distances. Another conclusion is that people prefer flat routes: in 1/3 of the cases people

choose a route that is not rising. As also seen in the table, apparently people choose mainly routes

with mixed road types, as the percentages for homogeneous road type routes are small. The Path

Sizes show remarkable results: as all three Path Size factors are calculated on the same choice sets,

they are expected to show the same percentages for choosing distinct routes. A distinct route should

be recognized as such by all three PS factors. PS1 and PS2 show indeed the same percentage, but

the PSC shows a lower percentage. The results of PSC (and its implementation in general) are

therefore questionable. All percentages for PS factors are low, so it appears that distinct routes are

not clearly more attractive.

| Walk trips data characteristics | Value |
| --- | --- |
| Number of all walk trips (GPS data) | 579 |
| Number of individuals | 51 |
| Mean distance (all walked trips) | 0,134 km |
| Trips on non-rising routes (Rise = 0) | 33% |
| Trips on WalkOnly routes (>95% WalkOnly fraction) | 2% |
| Trips on WalkSafe routes (>95% WalkSafe fraction) | 2% |
| Trips on only WalkOnly or WalkSafe routes (WalkAll fraction is 0) | 3% |
| Trips on distinct routes (PS1 & PS2 = 1) | 7% |
| Trips on distinct routes (PSC = 0) | 3% |

Table 5: Characteristics of chosen routes

Figure 25 below shows the distribution of the observed trips over the different distances. There were

no trips above 1,0 km and most of the trips were under 0,1 km. The short distances suggest that

people in Zürich in general do not use walking as their main transport mode. The short walking trips

could be a part of a longer multi-modal trip.

| Distance [km] | Frequency |
| --- | --- |
| 0,1 | 328 |
| 0,2 | 107 |
| 0,3 | 60 |
| 0,4 | 46 |
| 0,5 | 31 |
| 0,6 | 5 |
| 0,7 | 2 |
| 0,8 | 0 |
| 0,9 | 0 |
| 1 | 0 |
| More | 0 |

Table 6 shows the results of the descriptive analysis of the chosen routes and Table 7 the results of

the non-chosen alternatives. The trip lengths of the generated non-chosen routes are on average

shorter than the observed routes but the rise (maximum rise and average rise) in the non-chosen

routes is on average slightly higher. Also the fraction of WalkAll for the non-chosen routes is higher,

which means that the non-chosen routes consist of more WalkAll links (routes on mixed traffic roads).

The PS1 and PS2 factors of the observed routes are higher than the non-chosen routes, so the

observed routes have on average less overlap than the generated routes. The results of the PSC

factors are comparable.

Figure 25: Histogram of trip lengths in km of chosen routes (x-axis: distance in km; y-axis: frequency)

The median of 0,08 km (80 meters) of the chosen routes and the median of 0,067 km of the

non-chosen routes are remarkably low. This raises questions about the validity of the data set: do the

collected walking trips correspond to regular walking trips made in the real world? The question is

whether the data set used is able to answer the research questions scientifically, as the data might

not represent the actual behaviour of pedestrians. Reference studies with revealed

choices of pedestrians report average distances of 630 meters in New York

City and 244 meters in Hong Kong (Guo & Loo, 2013) and a mean distance of 876 meters in

Portland, Oregon (Broach & Dill, 2015). In general, the most commonly cited standard of 400 meters is

used as average distance that people are willing to walk. This standard is also used by the public

transit industry as radius around bus stops, to identify the area from which most transit users will

access the system by foot (El-Geneidy, Grimsrud, Wasfi, Tétreault, & Surprenant-Legault, 2014).

The median of 0,08 km and the mean of 0,134 km are far below the distances found in the reference studies, and also far below the average distance to a public transit facility. This leads to the assumption that many walking trips were not access or egress trips to or from a public transport facility. The question is what the trip purpose was of the people who made the short walking trips. To find out which activities were linked to the short walking trips, the Activities file needs further analysis. When we filter for walking trips in the Activities file, we find the following numbers:

Number  Activity type                              Number of trips  Percentage
1       Work                                       64               11%
2       Going home, trips inside/around house      116              20%
3       School                                     11               2%
4       Business trip                              12               2%
5       Daily shopping                             29               5%
6       Shopping (recreational)                    6                1%
7       Medical services (doctor, hospital, etc)   11               2%
8       Recreational trips                         58               10%
9       Bring or pick up someone                   6                1%
10      Access/Egress PT and transfer              191              33%
11      Other                                      17               3%
12      No insert                                  58               10%

Figure 26: Distribution of walking trips by activity type

In Figure 26 we observe that the largest share of the trips (33%) were transfers between modes (including access to and egress from a public transport facility). Access or egress trips to and from a public transport facility can cover a significant distance, but transfers between modes (walking from house to car, walking from station to car, changing lines on a tram platform) can be very short. As the trip purpose does not distinguish between access/egress and transfers between modes, it is likely that most of these walking trips belong to the second category. As trips of the second category are mostly very short, this could explain the large number of very short trips in the data sample. The number of home trips (going home, trips inside or around the house) is also large (20%). The duration of many of these activities (often only a couple of minutes) and the fact that many of them took place right after each other at the same location lead to the assumption that many of these home activities took place inside or around the house. As these house trips are often very short, this could explain the very short trips as well. Another observation is the large number of trips for which no activity type was entered (10%). The reason why participants did not enter an activity type is uncertain, but the absence of an activity type could indicate that these trips were disturbances in the GPS traces: trips that were not actually made by the participants, and for which therefore no activity type was given. Such disturbances (lost signal, errors) are another possible explanation for the very short trips.

The very short mean and median distances of the chosen routes lead to the conclusion that the data sample used in this research is not representative of normal pedestrian behaviour and is therefore not valid for scientifically answering the research questions. Many of the trips in the data sample are assumed to be transfers between modes, trips inside or around the house or disturbances in the GPS traces. As the mean and median trip lengths are much smaller than the averages found in reference studies, and than the standard for walking distances used by the industry, we cannot say that the data truly represent the actual behaviour of pedestrians in normal situations in cities. The results based on this data sample can therefore not be generalised to larger data samples. This was the risk taken by using revealed preference data: we could control neither the data that were collected nor the behaviour of the participants. Furthermore, the author did not collect the data herself, so her control over the data collection process was minimal.

              Mean      Median    Standard   Minimum    Maximum   Confidence
                                  Deviation                       Level (95%)
Distance      0,134 km  0,080 km  0,132      0,0005 km  0,618 km  0,011
RiseMax       0,027     0,009     0,049      0          0,493     0,004
RiseAverage   0,008     0,003     0,015      0          0,122     0,001
FallMax       0,033     0,012     0,052      0          0,482     0,004
WalkOnly      0,141     0         0,225      0          1         0,018
WalkSafe      0,128     0         0,237      0          1         0,019
WalkAll       0,730     0,861     0,312      0          1         0,025
PS1DIST       0,328     0,214     0,269      0          1         0,022
PS2DIST       0,315     0,200     0,274      0          1         0,022
PSCDIST       0,187     0,162     0,215      0          1         0,018

Table 6: Descriptive analysis of all chosen routes
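The "Confidence Level (95%)" column of Table 6 can be read as the half-width of a 95% confidence interval around the mean. The sketch below is a plausibility check under two assumptions that are not stated in the text: a normal approximation (z = 1.96) and a sample size of 579 observed routes. The function name `half_width_95` is hypothetical.

```python
import math

# Assumption: Table 6 is based on all 579 observed routes.
N_CHOSEN = 579


def half_width_95(std_dev, n, z=1.96):
    """Half-width of a 95% confidence interval for a mean (normal approximation)."""
    return z * std_dev / math.sqrt(n)


# Standard deviations copied from Table 6.
std_devs = {"Distance": 0.132, "RiseMax": 0.049, "WalkOnly": 0.225, "WalkAll": 0.312}

for name, sd in std_devs.items():
    print(name, round(half_width_95(sd, N_CHOSEN), 3))
```

For these four attributes, the rounded half-widths agree with the values reported in the table, which suggests that the column was computed in this or a very similar way.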

              Mean      Median    Standard   Minimum    Maximum   Confidence
                                  Deviation                       Level (95%)
Distance      0,120 km  0,067 km  0,129      0 km       0,539 km  0,002
RiseMax       0,040     0,018     0,059      0          0,605     0,001
RiseAverage   0,010     0,005     0,014      0          0,175     0,000
FallMax       0,042     0,020     0,057      0          0,650     0,001
WalkOnly      0,106     0,035     0,157      0          1         0,003
WalkSafe      0,068     0,000     0,152      0          1         0,003
WalkAll       0,825     0,903     0,218      0          1         0,004
PS1DIST       0,257     0,214     0,150      0,054      1         0,003
PS2DIST       0,254     0,211     0,151      0,018      1         0,003
PSCDIST       0,177     0,161     0,188      0,000      0,999     0,004

Table 7: Descriptive analysis of non-chosen routes

Despite the very short distance routes, the choice set generation algorithm was surprisingly able to

generate 20 alternatives for most of the routes. This needs further analysis, as it is at least

remarkable that there are 20 alternative routes available for routes of on average 0,134 km. For

further analysis, the choice sets are visualised in the software program VIA (Senozon, 2015). Figure

27 shows a trip from the tram station close to the lake (Bürkliplatz) to the viewpoint that gives a

nice view over the lake. This route is selected because it is often made by pedestrians, as the route

via the viewpoint also leads to the ferries and to the park alongside the lake (west side of the lake).

This trip is also often found in our data sample and has a trip length of 107 meters. As the mean trip

length of our data sample is 0,134 km and the median is 0,08 km, and this trip is often found in the

data sample, this trip of 0,107 km can be seen as a regular trip in our data sample. The left picture

of Figure 28 shows the trip visualised in VIA, the right picture shows the visualised choice set for this

trip. The longest trip in the choice set has a distance of 203 meters, which is almost twice as long as

the chosen route. As seen in the right picture, there is a lot of overlap between the generated

alternatives. This explains how 20 alternatives can be generated for short routes: most of the generated

alternative routes have a lot of overlap and some routes are much longer than the chosen route (as

in this example, where the longest route in the choice set was almost twice as long).

Figure 27: Route from tram station to viewpoint in Open Street Map

Figure 28: Route from tram station to viewpoint in VIA (left) and links used by alternative routes (right)

To find out if this also happens with longer chosen routes, we analyse another trip in VIA. Another

trip that is often taken by pedestrians is the trip from the main train station to the Polybahn. The

Polybahn offers a fast connection between the city centre and the university campus: walking to the

university from the city centre takes about 10 minutes (uphill) while the Polybahn takes passengers

to the university in 100 seconds. The Polybahn also runs every 2.5 minutes, so the waiting times are

very short. This trip has a trip length of 298 meters and is visualised in Figure 29 and Figure 30.

Figure 29: Trip from the Polybahn to the Main station in Open Street Map

Figure 30: Chosen trip in VIA

Figure 31: Links used by alternative routes, in VIA

The route of Figure 29 and Figure 30 is one of the possible routes between the main station and the

Polybahn, and is also often found in the data sample. Figure 31 shows the links used by alternative

routes. With a trip length of 298 meters, this trip is one of the longer trips of the data sample. The

longest trip of this choice set (shown in Figure 31) has a trip length of 376 meters. This is a detour,

compared to the chosen route, but it is not twice as long (as in the previous example). The chosen

route was the shortest route, but the difference with the second shortest route is very small (only 5

meters). In this example, we observe the same as in the previous example: as shown in Figure 31,

many of the generated routes have a lot of overlap with other routes in the choice sets.

5.3 Comparing the chosen routes with the alternative non-chosen routes in the choice set

The next task is to evaluate how the chosen routes relate to the alternative non-chosen routes in the choice set. Out of 579 observed routes, alternative routes could be generated for only 554. The 25 remaining observed routes (for which no alternatives were generated) were possibly invalid for choice set generation (missing data), or it was impossible to find 20 alternative routes for the observed route. For the observed routes for which choice set generation was successful, a choice set of 20 alternatives was generated, one of which is the chosen route. When the chosen route was not generated by the algorithm, it was added to the choice set afterwards, which resulted in a choice set of 21 alternatives. For some analyses, the total data set is split into two subsets: one containing the choice sets of 20 alternatives (365 routes) and one containing the choice sets of 21 alternatives (189 routes). The reason for this distinction is the presumption that when the chosen route is not generated by the algorithm, it must be a long route or a route that is unattractive for another reason. Choosing a presumably unattractive route could be explained by trip purpose (for example leisure or shopping) or by attributes along the route that are not captured in the model. This travel behaviour is expected to differ from the behaviour of people making daily walking trips, which is why the data set is split into two subsets for some analyses.
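The construction of the 20- and 21-alternative choice sets described above can be sketched as follows. This is an illustration only and not the code used in this research: routes are represented as tuples of hypothetical link ids, and the function names are assumptions.

```python
def build_choice_set(generated_routes, chosen_route):
    """Return the choice set and whether the chosen route was generated.

    If the chosen route is not among the generated routes, it is appended,
    which gives a choice set with one extra alternative.
    """
    choice_set = list(generated_routes)
    chosen_was_generated = chosen_route in choice_set
    if not chosen_was_generated:
        choice_set.append(chosen_route)
    return choice_set, chosen_was_generated


def split_by_choice_set_size(choice_sets):
    """Group choice sets into the 20-data set and the 21-data set."""
    by_size = {20: [], 21: []}
    for choice_set in choice_sets:
        if len(choice_set) in by_size:
            by_size[len(choice_set)].append(choice_set)
    return by_size


# Hypothetical example: 20 generated routes between the same origin and destination.
generated = [("o", f"link{i}", "d") for i in range(20)]
set_a, found_a = build_choice_set(generated, ("o", "link3", "d"))
set_b, found_b = build_choice_set(generated, ("o", "detour", "d"))
subsets = split_by_choice_set_size([set_a, set_b])
print(len(set_a), found_a, len(set_b), found_b)
```

In this sketch, the size of the choice set alone tells whether the chosen route was reproduced by the generation step, which is the property the split into two subsets relies on.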

Chosen route compared with alternative routes
Number of walk trips for which choice set generation was successful   554
Chosen route was shortest route of choice set                         7%
Chosen route was on average flattest route of choice set              20%
Chosen route had smallest maximum rise in the choice set              42%
Chosen route had largest Flat fraction in the choice set              31%
Chosen route had largest fraction of WalkOnly in choice set           35%
Chosen route had largest fraction of WalkSafe in choice set           37%
Chosen route had largest fraction of WalkAll in the choice set        35%
Chosen route had smallest fraction of WalkAll in choice set           29%
Chosen route had largest PS1 (least overlap with other routes)        18%
Chosen route had largest PS2 (least overlap with other routes)        17.5%
Chosen route had smallest PSC (least overlap with other routes)       5%

Table 8: Chosen route compared with alternative routes

In this section, the hypotheses as formulated in 5.2 will be tested. These were:

1. People always choose the shortest route (main conclusion from literature)

2. People clearly prefer WalkOnly roads (largest fraction WalkOnly; a preference for pedestrian paths and safety factors is found in literature)

3. Maximum rise has more influence on pedestrian route choices than average rise (Menghini et al. (2010); a conclusion for cyclists, but likely to be applicable to pedestrians as well)

4. Most distinct routes (PS1/2 close to 1; PSC close to 0) are clearly preferred to overlapping routes (overlap has a negative effect on route choices (Ben-Akiva & Bierlaire, 1999))

5. When the chosen route is not generated by the algorithm, the chosen route is mainly one of the longest routes of the choice set

In Table 8, the chosen routes are compared against their alternatives within their choice set.

Surprisingly, we observe that in only 7% of the cases the chosen route was the shortest route of the

choice set, so the first hypothesis can be rejected: people do not always choose the shortest

route. This contradicts the findings in the literature on pedestrian route choice: almost all

studies conclude that distance is the dominant factor in pedestrian route choices (see section

3.4.2 for an overview). An explanation for this very low percentage could be that people mostly

choose one of the shortest routes, and not always the shortest route in absolute distance. Also, in

this analysis all chosen routes were taken into account, so the total data set is not yet split into the

two subsets: the chosen routes of the 21-data set might have longer distances than their

alternatives in the choice set for other reasons. If people really choose their routes based on

shortest distance (as they say in surveys: see Table 3), an explanation for the results presented here

could be that people’s perceived shortest route is actually not the real shortest route. Or, the chosen

route is not the shortest in absolute value, but the difference in distance with the shortest route is

actually very small. For a pedestrian, walking five meters further is not recognized as a longer route,

while in the data analysis there can only be one shortest route. If people choose a route that is not

the shortest, but the chosen route is still one of the shortest out of the choice set, distance has an

influence on the route choices. As it is unlikely that trip length has no influence on the route choices

at all, the new hypothesis will be: people choose one of the shortest routes of their choice set. This

new hypothesis will be tested further below.

Before we test the new hypothesis, we want to find out why people do not choose the shortest route of the choice set. A closer look at the shortest routes of the choice sets shows that they are on average 0,01 km long, while the mean of the observed routes is 0,14 km (see Table 9). This mean of 0,01 km is very small, especially when compared to the mean trip length of the chosen routes. Therefore, it might be better not to use the shortest route for comparison, as this route could be unrealistically short (and therefore probably not a serious alternative). Instead, we could find out whether people choose one of the shortest routes rather than the absolute shortest. According to these numbers, the average detour length would be 0,13 km, which is almost the same as the mean trip length of the observed routes. This number is a result of some very short routes generated by the algorithm.

                     N     Minimum   Maximum   Mean    Std. Dev   Variance
Detour in KM         554   0         0,61      0,126   0,132      0,017
Trip in KM           554   0         0,62      0,136   0,132      0,018
Shortest Walk        554   0         0,09      0,01    0,0132     0,000
Valid N (listwise)   554   0

Table 9: Shortest routes and detours

To find out if people choose one of the shortest routes between origin and destination, the distribution of chosen routes ranked by distance is visualised in Figure 32. As can be seen in the graph, the third shortest route in a choice set has the highest percentage of chosen routes (9%). In 7 + 7 + 9 = 23% of the cases, one of the three shortest routes is chosen. Based on these numbers, a new graph is plotted which divides the routes of the choice set (20 or 21 in total) into four categories (Figure 33). When a chosen route is one of the five shortest routes, it belongs to the first category, and when it is one of the five longest routes, it belongs to the last category. As seen in Figure 33, the first category is the largest with almost 35%. This means that almost 35% of the chosen routes belong to the five shortest routes of the choice set.
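The ranking and grouping described above can be sketched as follows. The function names are hypothetical, and two details are assumptions rather than taken from the text: ties in distance receive the better (lower) rank, and a 21st alternative is placed in the last category together with ranks 16 to 20.

```python
def distance_rank(chosen_km, choice_set_km):
    """1-based rank of the chosen route by length (1 = shortest).

    choice_set_km contains the lengths of all alternatives, including the
    chosen route. Assumption: ties receive the better (lower) rank.
    """
    return 1 + sum(1 for length in choice_set_km if length < chosen_km)


def distance_category(rank):
    """Map a rank to one of four categories of five routes each.

    Ranks 1-5 -> 1, 6-10 -> 2, 11-15 -> 3, 16 and above -> 4
    (so the last category holds six routes when there are 21 alternatives).
    """
    return min((rank - 1) // 5 + 1, 4)


# Hypothetical choice set of three routes (lengths in km).
example_lengths = [0.107, 0.203, 0.090]
rank = distance_rank(0.107, example_lengths)
print(rank, distance_category(rank))
```

Counting how often each rank or category occurs over all choice sets would give distributions of the kind shown in the ranked-distance and route-class figures.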

Figure 32: Distribution of chosen routes ranked by distance (in percentage and counts)

This result meets our expectation that route choice is influenced by trip length, and therefore the

new hypothesis is supported. People might not always choose the shortest route (as generated by the

algorithm), but they mainly choose one of the shortest trips. Note that the last category is large as

well with 31%. This could be explained by the fact that some choice sets consist of 21 alternatives.

When this is the case, the last category is bigger than the other three categories: the first three

consist each of five alternatives in total, the last one in case of 21 alternatives, consists of 6

alternatives (number 16 to 21 of the choice set). So when the observed route is not generated by

the algorithm, and it is also longer than the 20 other generated routes, it belongs to the last

category as number 21. The size of the last category could also be explained by the fact that people

sometimes make round trips (for example as leisure activities). Then, the generated non-chosen

routes are likely to be much shorter for a given OD pair.

Figure 33: Route classes grouped by distance

As Figure 33 gives a biased picture, because the 21-data set was included in this figure as well

(which resulted in a larger last category), the same analysis will be done for the two subsets of data

separately. For the subset of 20 alternatives, it is expected that the first category (five shortest

routes) is even larger than the 35% as shown before. For the subset of 21 alternatives, a large part

of the chosen routes is expected to be in last category (longest routes). This is also the fifth

hypothesis of our list. The results of these analyses are shown in Figures 34, 35 and 36.

Figure 34: Frequency tables of 20-data set (left) and 21-data set (right); distribution of chosen routes ranked by distance

Figure 35: Route classes grouped by distance (20-data set)

Figure 36: Route classes grouped by distance (21-data set)

The data set consisting of choice sets of 20 alternatives has 365 routes, which means that the other data set of 21 alternatives consists of 189 routes. As expected for the 20-data set, the first category is larger than the first category shown in Figure 33 for the total data set (40,5% instead of 35%). Also, the last category is smaller (26% instead of 31%), because the non-generated chosen routes (which are now shown to be mainly among the longer routes of their choice set) are excluded from this data set. In the frequency table of the 20-data set we observe the same trend as before: the chosen route is most often the third shortest route of the choice set. These results confirm our expectation: people mainly choose one of the shortest routes, thus trip length has an influence on the route choices.

Our expectation about the 21-data set, which is also the fifth hypothesis, is confirmed as well: a large part of the chosen routes of the 21-data set (40,7%) belongs to the category of longest routes (see Figure 36). The bar of the 21st route (the longest route) is by far the highest bar of the frequency table (see Figure 34). This means that if the chosen route was not generated by the algorithm, it is mainly one of the longest routes in the choice set.

However, when looking into the absolute values of the distances of the 20 and the 21-data sets

(Figures 37 and 38), we observe that the chosen routes of the 21-data set are not longer in absolute

distance (they are on average even shorter). The trip lengths are comparable, with a mean of 0,14

km (20) and 0,12 km (21). This means that the chosen routes of the 21-set are mainly one of the

longest in their choice set, but they are in absolute value not clearly longer than the chosen routes

of the 20-data set. Most of the routes of both data sets are below 0,1 km.

Distances of chosen routes, 20-data set

Distance (km)   Frequency
0,1             191
0,2             61
0,3             51
0,4             37
0,5             20
0,6             4
0,7             1
More            0
Total           365
Mean            0,144

Distances of chosen routes, 21-data set

Distance (km)   Frequency
0,1             117
0,2             42
0,3             9
0,4             9
0,5             11
0,6             1
More            0
Total           189
Mean            0,120

The other three hypotheses concern route attributes other than trip length. The second hypothesis is about people’s preference for WalkOnly roads. Table 8 shows that 35% of the chosen routes had the largest fraction of WalkOnly in the choice set. It also shows that 37% of the chosen routes had the largest fraction of WalkSafe and 35% had the largest WalkAll fraction. Furthermore, 29% of the chosen routes had the smallest fraction of WalkAll in the choice set, and thus the largest fraction of WalkOnly and WalkSafe together. As these numbers are very similar, there is no evidence that WalkOnly roads are clearly preferred to other road types. Therefore, the hypothesis is rejected. An explanation could be that people are likely to take routes that combine different road types.

The third hypothesis is true, as the percentage that the chosen route had the smallest maximum rise

(smallest RiseMax) is larger than the percentage that the chosen route had the smallest average rise

(smallest RiseAverage). See Table 8 for the results. As expected, pedestrian route choices are more

influenced by the maximum rise on a route than by the average rise of the total route. Apparently, a

very steep short route is less attractive than a longer route that gradually rises.

Figure 37: Histogram of distances (20-data set) (horizontal axis: length in km; vertical axis: frequency)

Figure 38: Histogram of distances (21-data set) (horizontal axis: length in km; vertical axis: frequency)

The fourth hypothesis is about the overlap of routes with other routes in the choice set. The literature on overlapping routes tells us that routes having a lot of overlap with other routes are less likely to be chosen (Ben-Akiva & Bierlaire, 1999). The utility of a route decreases when it shares links with other routes. A fully distinct route has the highest Path Size Factor of 1. According to both PS1 and PS2, approximately 18% of the chosen routes had the largest Path Size Factor in their choice set (thus the least overlap). The PSC factor shows that only 5% of the chosen routes had the least overlap. The PSC results are not consistent with the other two PS factors, so they are not taken into account. If the most distinct route (with the least overlap: highest PS1 and/or PS2) was chosen in only 18% of the cases, then 82% of the chosen routes were not the most distinct. Thus the most distinct routes are not clearly preferred to overlapping routes, and the hypothesis about a general preference for the most distinct routes can be rejected.

An explanation could be found in the trip lengths. It is likely that many of the generated non-chosen routes show a lot of overlap with the chosen route, as seen in the examples of Figures 28 and 31. The more alternative routes there are, the bigger the chance that the chosen route becomes less distinct. Because the distances between origin and destination are short, the chance of generating overlapping alternative routes is bigger.
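To illustrate why overlap lowers the Path Size Factor, the sketch below computes the basic path size formulation, in which each link of a route contributes its share of the route length divided by the number of routes in the choice set that use that link. This is an assumption for illustration: it need not coincide with the exact PS1, PS2 or PSC definitions used in this thesis, and the routes and link lengths are hypothetical.

```python
from collections import Counter


def path_size_factors(routes, link_lengths):
    """Basic path size factor per route.

    routes: list of routes, each a list of link ids (assumed without repeated links).
    link_lengths: dict mapping link id to link length.
    A route that shares no links with other routes gets a factor of 1.
    """
    # Number of routes in the choice set that use each link.
    usage = Counter(link for route in routes for link in set(route))
    factors = []
    for route in routes:
        route_length = sum(link_lengths[link] for link in route)
        factor = sum(
            (link_lengths[link] / route_length) / usage[link] for link in route
        )
        factors.append(factor)
    return factors


# Hypothetical choice set: the first two routes share link "a", the third is distinct.
link_lengths = {"a": 50, "b": 50, "c": 100, "d": 120}
routes = [["a", "b"], ["a", "c"], ["d"]]
factors = path_size_factors(routes, link_lengths)
print(factors)
```

With short origin-destination distances, most generated routes reuse the same few links, so the usage counts grow and the factors of all routes, including the chosen one, move away from 1.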

The last two analyses concern the composition of the choice set and the correlations between attributes. As using well-sampled choice sets could lead to better model estimates, it is useful to know how routes are distributed within one choice set. This knowledge could help in sampling alternatives, or in deciding not to use samples. In this analysis, only the trip length is taken into account. From each of the two data sets, two choice sets are randomly selected and visualised in Figures 39 and 40. In the choice sets of both data sets, differences in trip length are observed over the full choice set, but the differences from one alternative to the next are very small. When, for example, only a sample of the full choice set is taken into account in the estimation process (as shown by the red areas in the graphs), this would yield no significant results for trip length, as there are only small differences in trip length among the sampled alternatives. Therefore, to obtain better estimation results, the full choice set needs to be taken into account for estimation. Alternatively, a well-sampled choice set could be used, sampled from the total choice set of 20 or 21. A choice set is well sampled when differences in attribute values are observed: when there are hardly any differences, it is hard to determine which attributes influence the route choices. This would lead to insignificant results, while there might be significant results when using different (in composition and size) and better samples.
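The effect of the sampling strategy on the variation in trip length can be illustrated with a small sketch. The trip lengths below are hypothetical, and the evenly spaced sample is only one possible way of obtaining a "well-sampled" choice set; the text does not prescribe a specific sampling method.

```python
def spread(lengths):
    """Difference between the longest and the shortest route in a set."""
    return max(lengths) - min(lengths)


def contiguous_sample(sorted_lengths, start, size):
    """Take neighbouring alternatives from the sorted choice set."""
    return sorted_lengths[start:start + size]


def evenly_spaced_sample(sorted_lengths, size):
    """Take alternatives spread over the whole sorted choice set (size >= 2)."""
    n = len(sorted_lengths)
    indices = [round(i * (n - 1) / (size - 1)) for i in range(size)]
    return [sorted_lengths[i] for i in indices]


# Hypothetical choice set of 20 routes, sorted by length in meters.
lengths_m = [90 + 5 * i for i in range(20)]

narrow = contiguous_sample(lengths_m, start=8, size=5)
wide = evenly_spaced_sample(lengths_m, size=5)
print(spread(lengths_m), spread(narrow), spread(wide))
```

In this example, the contiguous sample covers only a small part of the range of trip lengths, whereas the evenly spaced sample preserves the full range with the same number of alternatives.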

Figure 39: Trip distances of two choice sets of 20-data set (left chosen is 0,09; right 0,16)

Figure 40: Trip distances of two choice sets of 21-data set (left chosen is 0,11; right 0,08)

Lastly, we analysed the correlations between attributes, as these results could help in interpreting the model estimation results. As some of the attributes clearly correlate with each other (RiseMax and RiseAverage; WalkOnly, WalkSafe and WalkAll; PS1, PS2 and PSC), we only look into the correlations between the attributes and the trip length. With 19 degrees of freedom (for the 20-data set) and 20 (for the 21-data set) and a significance level of 0.05, the Pearson Chi-Square value should exceed 30.14 (for 20) or 31.41 (for 21) to show a clear correlation. For the 20-data set, only Min RiseMax, Max WalkOnly fraction and Max PS1 and PS2 (least overlap) show a correlation with the trip length. For the 21-data set, none of the attributes shows a clear correlation with the trip length.

Attributes        Correlation with   Pearson Chi-Square   Correlation with   Pearson Chi-Square
                  Distance (20)?     for 20-data set      Distance (21)?     for 21-data set
Min RiseMax       V                  33,3                 -                  21,3
Min RiseAverage   -                  21,2                 -                  28,3
Max WalkOnly      V                  31,7                 -                  14,4
Min WalkOnly      -                  20,8                 -                  28,5
Max WalkSafe      -                  19,4                 -                  24,0
Min WalkSafe      -                  19,9                 -                  24,4
Max WalkAll       -                  19,5                 -                  20,4
Min WalkAll       -                  30,1                 -                  25,5
Max PS1           V                  35,4                 -                  28,8
Max PS2           V                  37,1                 -                  26,1
Min PSC           -                  15,9                 -                  21,1

Table 10: Correlations between attributes for 20 and 21-data sets
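The decision rule behind Table 10 can be reproduced with a short sketch that compares each Pearson Chi-Square value with the critical values stated in the text (30.14 for the 20-data set and 31.41 for the 21-data set). The data structure and the function name are illustrative choices, not part of the original analysis.

```python
# Critical chi-square values at the 0.05 level, as stated in the text.
CRITICAL_20 = 30.14  # 19 degrees of freedom
CRITICAL_21 = 31.41  # 20 degrees of freedom

# Pearson Chi-Square values from Table 10: (20-data set, 21-data set).
chi_square = {
    "Min RiseMax": (33.3, 21.3),
    "Min RiseAverage": (21.2, 28.3),
    "Max WalkOnly": (31.7, 14.4),
    "Min WalkOnly": (20.8, 28.5),
    "Max WalkSafe": (19.4, 24.0),
    "Min WalkSafe": (19.9, 24.4),
    "Max WalkAll": (19.5, 20.4),
    "Min WalkAll": (30.1, 25.5),
    "Max PS1": (35.4, 28.8),
    "Max PS2": (37.1, 26.1),
    "Min PSC": (15.9, 21.1),
}


def correlated_attributes(values, column, critical):
    """Return the attributes whose statistic exceeds the critical value."""
    return [name for name, stats in values.items() if stats[column] > critical]


correlated_20 = correlated_attributes(chi_square, 0, CRITICAL_20)
correlated_21 = correlated_attributes(chi_square, 1, CRITICAL_21)
print(correlated_20)
print(correlated_21)
```

Note that Min WalkAll in the 20-data set (30,1) falls just below the critical value of 30.14, which is why it is not marked as correlated.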

5.4 Conclusion

This chapter aimed at answering the following sub-question: What does the GPS data reveal about the choice behaviour of pedestrians? This question was answered by conducting descriptive analyses on the GPS data and the generated data. Hypotheses about the data, based on findings from the literature, guided the descriptive analyses. The results of this chapter can be used to guide and design the estimation process.

First, descriptive analyses were conducted on the observed and the non-chosen routes, and their results were compared. When analysing the observed trips, we conclude that people in Zürich mainly walk short distances: the mean trip length is 0,13 km and the median is 0,08 km. Most of the trips were below 0,1 km and no trips above 1,0 km were observed. When comparing the observed routes with the non-chosen generated routes, we observe that the non-chosen routes are on average shorter than the observed routes, but have on average a higher maximum rise, average rise and WalkAll fraction. The PS factors are on average higher for the observed routes, which means that the observed routes overlap less. The median of 0,08 km and the mean trip length of 0,13 km raise questions about the validity of the data sample. In reference studies, the means and medians of observed routes are clearly larger. The numbers found in this study are also far below the standard for walking distances used by the industry. Further analysis revealed that most of the short trips were probably transfers between modes or lines, trips in or around the house or disturbances in the GPS data. The mean and median of the observed trips are very small for normal pedestrian behaviour, and therefore the data sample cannot be seen as representative of normal pedestrian behaviour in cities. This makes the data sample used in this study invalid for scientifically answering the research questions, and the results based on it cannot be generalised to larger data samples. Surprisingly, the choice set generation algorithm was able to generate 20 alternatives for most of the routes. Further analysis in visualisation software showed that most of the generated routes have a lot of overlap and that some routes are much longer than the chosen route.

To test the hypotheses, we analysed how the chosen routes relate to the non-chosen alternative routes. The first hypothesis, which states that pedestrians always choose the shortest route, is rejected: the data showed that people chose the shortest route of the choice set in only 7% of the cases. As it is unlikely that trip length has no influence on the route choices, this was analysed further. For this purpose, the total data set was split into two subsets: the 20-data set consisting of choice sets with 20 alternatives and the 21-data set consisting of choice sets with 21 alternatives. The reason for this is the assumption that the people who made the trips in the 20-data set had different trip purposes than the people who made the trips in the 21-data set, which results in different travel behaviour. When we divide the routes into categories based on trip length, we observe that in normal conditions (20-data set) people mainly choose one of the shortest routes (40,5% of the chosen routes were among the five shortest routes). As expected, the people from the 21-data set show different travel behaviour and mainly choose one of the longest routes of the choice set. This confirms our fifth hypothesis: if the chosen route was not generated by the algorithm (resulting in 21 routes), it is mainly one of the longest routes in the choice set.

The other hypotheses concern route attributes other than trip length. The second, which says that people have a preference for WalkOnly roads, is rejected: 35% of the chosen routes had the largest fraction of WalkOnly in the choice set, 37% the largest fraction of WalkSafe and 35% the largest fraction of WalkAll. As these numbers are very similar, there is no evidence that WalkOnly roads are clearly preferred to other road types. An explanation could be that people are likely to take routes that combine different road types. The third, which states that maximum rise is more important for route choices than average rise, is confirmed, as the percentage of cases in which the chosen route had the smallest maximum rise is larger than the percentage in which it had the smallest average rise. The fourth hypothesis, about a general preference for the most distinct routes, is rejected: 18% of the chosen routes were the most distinct route of the choice set, thus the most distinct routes are not clearly preferred to overlapping routes. An explanation is the length of the trips: for short trips it is more likely that the generated alternative routes show overlap with the chosen route. The more alternative routes there are, the bigger the chance that the chosen route becomes less distinct.

The last two analyses concern the composition of the choice set and the correlations between

attributes. As using well-sampled choice sets could lead to better model estimates, it is useful to

know how routes are distributed within one choice set. When analysing the trip lengths of the routes

within one choice set, we observe that differences between alternatives could be very small. This

should be taken into account when composing a sample for model estimation: a sample with similar

trip lengths would result in no significant results for trip length. Therefore, in the estimation process

the full choice set needs to be taken into account, or a well-sampled choice set which shows

significant differences in attributes.

The results of correlation analysis could help in interpreting estimation results: when attributes

correlate, one of them could show insignificant results in estimation. There are a few correlations

observed among route attributes: in the 20-data set the trip length shows correlation with Min

RiseMax, Max Walkonly and Max PS1 and PS2 (least overlap).


6 Estimation of route choice models

In this chapter the route choice models will be estimated, using the software BIOGEME (Bierlaire,

2003). In this thesis an unlabelled experiment will be adopted, as the alternatives are unlabelled. An

experiment is unlabelled when the names of the alternatives (for example alternative A and B) do

not convey meaning to the respondent on what the alternatives represent in reality and do not

provide any useful information to suggest that there are unobserved influences that are

systematically different for alternatives A and B (Hensher, Rose, & Greene, 2005). In this experiment

it means that alternative 1 of Origin-Destination A for person X is different from alternative 1 of

Origin-Destination A for person Y and that Origin-Destination A for person X is different from Origin-

Destination A for person Y. All alternatives have the same attributes. The experiment is unlabelled as

all pedestrians walked different routes between different origins and destinations. The implication of

using an unlabelled experiment is that no Alternative Specific Constants will be estimated.

In this section, different models will be estimated. The reason is that several researchers have

found that the size and composition of the choice set have an influence on the estimation results

(Prato & Bekhor (2007); van der Waerden et al. (2004); Bliemer & Bovy (2008)). Different

intermediate models using different choice set compositions and sizes will be estimated, in order to find the

model with the best results. The following research question will be answered in this section:


model results?

The second research question of this section concerns the approach that we are using in this thesis:

pedestrian behaviour could be seen as utility maximizing behaviour. If this is true, it should be

possible to successfully estimate a pedestrian route choice model, and to obtain significant

estimation results. The second research question of this section is:


To obtain better estimation results, the total data set is split from the outset into two data

subsets: one data set consisting of choice sets of 20 alternatives and the other data set consisting of

choice sets of 21 alternatives. The reason for this is given in the previous chapter: travel behaviour

of the pedestrians in the 20-data set is assumed to be significantly different from the travel

behaviour of the pedestrians in the 21-data set, and therefore the route choice behaviour of both

groups cannot be explained by the same model. Results of descriptive analysis in the previous

chapter have shown that this assumption about the travel behaviour is true. First, the models for the

20-data set (behaviour under normal conditions) will be estimated, followed by the models for the 21-data

set. Intermediate conclusions will be formulated after each section.


6.1 Research plan

The main conclusions from the previous chapter could be used as guidance in the estimation

process. The first conclusion was already mentioned in the introduction of this chapter: the travel

behaviour of the pedestrians in the 20-data set is assumed to be significantly different from the

travel behaviour of the pedestrians in the 21-data set. Pedestrians from the 20-data set mainly

choose one of the shortest routes while pedestrians from the 21-data set mainly choose one of the

longest routes. It turned out that when the chosen route was not generated by the algorithm (which

results in a choice set of 21 alternatives), the chosen route mainly belongs to one of the longest

routes of the choice set. Therefore, to obtain better estimation results, the total data set was split

into two subsets, as seen in Figure 41.

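Figure 41: Overview of model estimations. All choice sets are split into a 20-alternative set (Basic Model, Longest routes, Random sample, Importance Sampling 1, Importance Sampling 2) and a 21-alternative set (Basic Model, Best Sample); each model is estimated independently, with distance and with route classes.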

As several researchers have found that size and composition of the choice set have an influence on

the estimation results, different models will be estimated. In the end, results of different models

could be compared, and the model with the best results could be selected as final results. Frejinger

et al. (2009) showed that better model estimation could be obtained by using relevant samples as

choice sets for estimation. Therefore, we expect that well-sampled choice sets would result in better

estimation results.

The first models that will be estimated are the basic models for the 20-data set and the 21-data set.

These basic models include all alternatives in the choice set (so 20 and 21) and could be used as

reference results for other model estimations. First, the parameters of the basic models are

estimated independently, to find out whether they actually have an influence on the route choices. In this

estimation process, each attribute is estimated on its own, so its estimated effect

is not measured relative to other attributes. Then, two models will be estimated,

both including all attributes. The difference between the two models is the definition of the trip

lengths: in the first, the trip length is expressed in distance (km) and in the second model, the trip

length is expressed as a route class. The reason to use these two expressions was given in the

previous chapter: apparently, people do not always choose the shortest route, but they mainly

choose one of the shortest routes in normal conditions. When using only distance for trip length, this

would lead to insignificant results for trip length, as people do not always choose the absolute

shortest route. Such a conclusion would be incorrect: when people choose one of the shortest routes, trip length actually

has an influence on route choices, but maybe their perceived shortest route is not the actual

shortest route. However, the models are also estimated with distance (km) as trip length, to be able

to compare the results between the two models. To find out if people really choose one of the

shortest routes, the trip lengths are represented as four route classes based on trip length: the first

route class contains the shortest routes, while the last route class contains the longest routes. Based

on findings from the previous chapter (Figure 35 and 36), we expect that the first route class (class A) is

significant and most positive for the 20-data set and the last route class (class D) is significant and most positive

for the 21-data set.

To capture this behaviour in the model, dummy variables are proposed to represent the route

classes, such that the system recognizes certain routes as ‘one of the shortest routes’ or ‘one of the

longest routes’. For the data sets, 4 route classes are defined as shown in Table 11. Every route of

the choice set belongs to one of these route classes.

Route class                Boundaries            Definition
A  Shortest routes         (Min + B2)/2 = B1     Min ≤ X ≤ B1
B  2nd shortest routes     (Min + Max)/2 = B2    B1 < X ≤ B2
C  3rd shortest routes     (B2 + Max)/2 = B3     B2 < X ≤ B3
D  Longest routes          > B3                  B3 < X ≤ Max

Table 11: Route class definition

As every choice set has different values and different ranges of values, this method was chosen

to define the route classes. It defines the route classes for every choice set in

a consistent way. Moreover, it yields four route classes of the same range within each

choice set. The range of the route classes depends on the range of the distances (minimum and

maximum distance), so the ranges of the route classes could differ between choice sets.
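The route class definition of Table 11 can be illustrated with a short sketch. It assumes that a choice set is given as a plain list of trip lengths; the function names are hypothetical and are not taken from the thesis software.

```python
from typing import Dict, List, Tuple


def route_class_boundaries(lengths: List[float]) -> Tuple[float, float, float]:
    """Return the boundaries B1, B2 and B3 of Table 11 for one choice set."""
    shortest, longest = min(lengths), max(lengths)
    b2 = (shortest + longest) / 2  # midpoint of the full range
    b1 = (shortest + b2) / 2       # midpoint of the lower half
    b3 = (b2 + longest) / 2        # midpoint of the upper half
    return b1, b2, b3


def route_class(length: float, b1: float, b2: float, b3: float) -> str:
    """Assign a trip length to route class A (shortest) to D (longest)."""
    if length <= b1:
        return "A"
    if length <= b2:
        return "B"
    if length <= b3:
        return "C"
    return "D"


def class_dummies(lengths: List[float]) -> List[Dict[str, int]]:
    """Return one row of 0/1 dummy variables per alternative in the choice set."""
    b1, b2, b3 = route_class_boundaries(lengths)
    rows = []
    for length in lengths:
        label = route_class(length, b1, b2, b3)
        rows.append({name: int(name == label) for name in "ABCD"})
    return rows
```

For a choice set with trip lengths of 100, 120, 150, 180 and 200 meters, the boundaries are 125, 150 and 175 meters, so the first two routes fall in class A, the third in class B and the last two in class D.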

Once the basic models have been estimated, samples of alternatives will be used for estimation. For the

20-data set, four samples will be used:

• Longest routes (20 alternatives)

• 6 randomly chosen alternatives from a total set of 20 alternatives

• 6 alternatives selected based on importance sampling on trip length (1)

• 6 alternatives selected based on importance sampling on trip length (2)

A sample of longest routes will be used for estimation to see what the influence is of trip length on

longer routes. As concluded from the previous chapter, the mean and median lengths of the observed

routes are very small. Therefore, it might be interesting to only look into the longer routes, as these

routes might have longer and more heterogeneous alternative routes. So an assumption here is that

longer routes have more heterogeneous alternative routes in their choice set. These routes could

possibly provide more insights into pedestrian route choice behaviour, as route attributes (effort and

comfort) are more important on longer routes than on short routes. For a pedestrian, there may be

no difference in effort between a route of 50 or 55 meters, while there may be a difference

between a route of 500 or 550 meters.

The three other samples all have the same choice set size (6 alternatives), but their compositions

are different. These compositions were chosen to find out what the differences in results are

between random sampling and importance sampling. Based on findings from literature (Frejinger,

Bierlaire, & Ben-Akiva, 2009), we expect that importance sampling would result in better model

results. In the previous chapter we also observed that differences between trip lengths in one choice

set could be very small (Figure 39 and 40), so this observation also leads to the expectation that

importance sampling would lead to better model results than random sampling. In random sampling


the chance is higher that alternatives with similar trip lengths are sampled. For all these four

samples, all three model estimations will be conducted as for the basic models: first estimating the

parameters independently, then estimating two models with all attributes using two different

expressions for trip length.

Once all the samples have been used for estimation, we know which sampling method leads to the best

model results. Only this best sampling method will be used for the 21-data set for estimation, next

to the basic model. Also for this model, first the parameters will be estimated independently, then

two models will be estimated using the two expressions for trip length.

The route attributes that will be taken into the model estimation are trip length (distance in km or

route class), risemax (maximum rise), road types (walkonly, walksafe and walkall fractions) and

Path-Size factors. Description and units for these attributes can be found in Table 4. Maximum rise is

preferred to average rise because descriptive analysis in the previous chapter has shown that

maximum rise is perceived as more important in route choices than average rise.

Other expectations, based on findings of the previous chapter, are that there is no strong preference

for a specific road type (parameter values of road types are not extremely high, or not even

significant) and that the Path-Sizes have a negative influence, but their parameter values are not

extremely high either (no strong preference for most distinct routes). Lastly, for the 20-data set we

found correlations between trip length and Min RiseMax, Max Walkonly and Max PS1 and PS2 (least

overlap). When one of these attributes is insignificant in the 20-data set model estimation, an

explanation could be that the attribute correlates with another attribute.

6.2 Model specification

In this thesis it is assumed that pedestrians, like other travellers, choose a route before traveling by

selecting the alternative with the highest utility. Having this in mind, a discrete choice modelling

framework is adopted in which each pedestrian chooses one alternative from a discrete set of

alternatives known to them. Pedestrians are assumed to take the whole range of attributes into

account when maximizing their utility. Route choice is assumed to be a simultaneous choice: the

pedestrian chooses the entire route before starting the trip and does not change the

route on the way. Pedestrians are also likely to make trade-offs between attributes: a very steep but short

trip or a longer trip that gradually rises.

Panel data was used for estimation, as each participant has multiple observations. Panel data could

provide evidence of the preferences of each individual in different circumstances. When using panel

data, responses from the same individual are not ‘independent’, while the general discrete choice

modelling framework was based on the assumption of the independence of the observations. This

complication with panel data and methods to correct for correlated observations are discussed in

Daly & Hess (2010).

The adopted model formulation is the Path-Size Logit model from Ben-Akiva & Bierlaire (1999). As

discussed in chapter 3, this model is selected because it could take overlap between alternatives into

account while retaining the simple MNL structure. To use this model, it is required to calculate Path-

Size factors for each alternative in the choice set. The different methods to calculate the adjustment

term (Path-Size factor) are discussed in section 3.6. The model formulation of Ben-Akiva

& Bierlaire (1999) is shown below (11):

$$P(i \mid C_n) = \frac{e^{\mu (V_{in} + \ln PS_{in})}}{\sum_{j \in C_n} e^{\mu (V_{jn} + \ln PS_{jn})}} \qquad (11)$$
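As an illustration of equation (11), the following minimal sketch computes Path-Size Logit probabilities for one choice set. The function name and the `beta_ps` argument are assumptions made for this example: equation (11) corresponds to `beta_ps = 1`, while a different value mimics the case in which a parameter is estimated for the Path-Size term.

```python
import math
from typing import List


def psl_probabilities(utilities: List[float], path_sizes: List[float],
                      mu: float = 1.0, beta_ps: float = 1.0) -> List[float]:
    """Path-Size Logit choice probabilities for the alternatives of one choice set.

    utilities  -- systematic utilities V_in of the alternatives
    path_sizes -- Path-Size factors PS_in, each in the interval (0, 1]
    mu         -- scale parameter
    beta_ps    -- hypothetical multiplier of ln(PS); 1.0 reproduces equation (11)
    """
    scores = [mu * (v + beta_ps * math.log(ps))
              for v, ps in zip(utilities, path_sizes)]
    highest = max(scores)  # subtracted before exponentiation for numerical stability
    weights = [math.exp(s - highest) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]
```

With three alternatives of equal utility and Path-Size factors of 1.0, 0.5 and 0.5, the distinct route receives a probability of 0.5 and each of the two overlapping routes 0.25.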

The Path-Size factors are attributes in utility functions, thus also for these terms the parameters are

estimated. In this thesis, the Path-Size factors are calculated over the full set of 20 or 21

alternatives, thus according to the original statement of Ben-Akiva & Bierlaire (1999), the PSL model

can only be estimated with the full choice set of 20 or 21 alternatives. When using a smaller sample

of alternatives, the Path-Size factors need to be calculated again for these choice sets. This

statement is based on the idea that Path-Sizes need to be calculated based on the physical

overlapping of paths in the generated choice set only, and that they ignore correlation with other routes

from the universal choice set. However, Frejinger (2009) showed that the best estimation results

can be achieved by calculating correlations based on full (true) choice sets, and not only on the

generated choice set. Therefore, she argued that unbiased estimation results can only be obtained if

the Path-Sizes reflect the correlation among all possible paths. The more paths are included in the

Path-Size calculation, the better the representation of the correlation structure. Having this in mind,

the calculated Path-Sizes based on the full choice set of 20 or 21 alternatives will also be used to

estimate the Path-Size Logit models for the samples (6 alternatives).

6.3 Basic Model

In the Basic model, all alternatives (20) were used to estimate the model. Almost all calculated

attributes were taken into the utility function (see Table 4 for an overview and description of the

attributes). All attribute values were normalised to lie

between 0 and 1. The sum of the three road type fractions is always 1: each part of each route

always belongs to one of these road types. The calculated Path-Size factors are also values between

0 and 1. However, the Path-Size factors (PS1 and PS2) of Ben-Akiva & Bierlaire (1999) are used in

two forms: in regular form (value between 0 and 1, a distinct route has a PS1 and PS2 factor of 1)

and in logarithmic form, as recommended by Ben-Akiva & Bierlaire (1999), such that the PS factors

are very negative for overlapping routes and 0 for distinct routes. The PSC factor is only used in

regular form (PSC is 0 for distinct routes).
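The behaviour of the regular and logarithmic forms can be illustrated with a small sketch. It assumes the standard Path-Size formulation of Ben-Akiva & Bierlaire (1999), in which each link contributes its share of the route length divided by the number of routes in the choice set that use it. The exact definitions of PS1 and PS2 used in this thesis are given in section 3.6 and may differ from this sketch; the function name and data layout are hypothetical.

```python
import math
from typing import Dict, List


def path_size_factors(routes: List[List[str]],
                      link_length: Dict[str, float]) -> List[float]:
    """Path-Size factor per route, assuming the standard formulation.

    routes      -- each route is a list of link identifiers (no repeated links)
    link_length -- length of every link that appears in the routes
    """
    # Count how many routes of the choice set use each link.
    use_count: Dict[str, int] = {}
    for route in routes:
        for link in set(route):
            use_count[link] = use_count.get(link, 0) + 1

    factors = []
    for route in routes:
        route_length = sum(link_length[link] for link in route)
        factor = sum((link_length[link] / route_length) / use_count[link]
                     for link in route)
        factors.append(factor)
    return factors


# Two routes share link "a"; the third route is fully distinct.
example_routes = [["a", "b"], ["a", "c"], ["d"]]
example_lengths = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 2.0}
example_ps = path_size_factors(example_routes, example_lengths)
example_log_ps = [math.log(ps) for ps in example_ps]
```

In this example the two overlapping routes have a Path-Size factor of 0.75 and the distinct route a factor of 1, so the logarithmic form is negative for the overlapping routes and 0 for the distinct route.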

For estimation, these two utility functions are used:

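$$U = \beta_{\text{DISTANCE}} \cdot \text{DISTANCE} + \beta_{\text{RiseMax}} \cdot \text{RiseMax} + \beta_{\text{WalkOnly}} \cdot \text{WalkOnly} + \beta_{\text{WalkSafe}} \cdot \text{WalkSafe} + \beta_{\text{WalkAll}} \cdot \text{WalkAll} + \beta_{\text{PS1}} \cdot \text{PS1} + \beta_{\text{PS2}} \cdot \text{PS2} + \beta_{\text{PSC}} \cdot \text{PSC} \qquad (12)$$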

In the utility function above (12), the trip length is expressed as travelled distance in kilometres.

In the second utility function (13) the trip lengths

are categorized in route classes (as defined in Table 11). The route classes are dummy variables,

which means that the value is 1 when the route belongs to the route class and 0 otherwise.

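$$U = \beta_{\text{AClass}} \cdot \text{AClass} + \beta_{\text{BClass}} \cdot \text{BClass} + \beta_{\text{CClass}} \cdot \text{CClass} + \beta_{\text{DClass}} \cdot \text{DClass} + \beta_{\text{RiseMax}} \cdot \text{RiseMax} + \beta_{\text{WalkOnly}} \cdot \text{WalkOnly} + \beta_{\text{WalkSafe}} \cdot \text{WalkSafe} + \beta_{\text{WalkAll}} \cdot \text{WalkAll} + \beta_{\text{PS1}} \cdot \text{PS1} + \beta_{\text{PS2}} \cdot \text{PS2} + \beta_{\text{PSC}} \cdot \text{PSC} \qquad (13)$$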


6.3.1 Independent estimation of parameters

First, the parameters are estimated independently to see what the estimation results are of the

different attributes without being influenced by other attributes. The results of the 20-data set Basic

model are shown in Table 12. This data set has 365 observations each having 20 alternatives. The

distance is expressed in kilometres.

Parameters           Value    Rob. St err   Rob. t-test   Rob. p-val   Rho-square   Adjusted Rho-square   Significant
BETA_DISTANCE        1.05     0.435         2.41          0.02         0.003        0.003                 V
BETA_ACLASS          -0.807   0.157         -5.15         0.00         0.020        0.020                 V
BETA_BCLASS          1.63     0.246         6.64          0.00         0.036        0.036                 V
BETA_CCLASS          0.795    0.192         4.13          0.00         0.011        0.010                 V
BETA_DCLASS          -0.448   0.210         -2.14         0.03         0.004        0.003                 V
BETA_RISEMAX         -37.6    8.05          -4.67         0.00         0.085        0.085                 V
BETA_WALKONLY        -0.475   0.763         -0.62         0.53         0.001        -0.000                -
BETA_WALKSAFE        2.09     0.671         3.12          0.00         0.008        0.007                 V
BETA_WALKALL         -0.648   0.522         -1.24         0.21         0.002        0.001                 -
BETA_PS1DIST         -4.04    1.51          -2.67         0.01         0.036        0.035                 V
BETA_Log(PS1DIST)    -4.09    0.719         -5.69         0.00         0.092        0.091                 V
BETA_PS2DIST         -5.07    1.65          -3.08         0.00         0.052        0.051                 V
BETA_Log(PS2DIST)    -4.53    0.565         -8.01         0.00         0.135        0.134                 V
BETA_PSCDIST         -1.51    0.366         -4.14         0.00         0.009        0.009                 V

Table 12: Basic model with 20 alternatives, attributes independently estimated

When estimating the parameters independently, almost all of them seem to be significant (at the 5%

level, standard t-tests, the absolute t-value should be larger than 1.96). Only the parameters for WalkOnly

and WalkAll are insignificant. This means that each of the remaining attributes has an influence on the route

choices of the pedestrians. Goodness of fit (how well the model fits the data) is represented as the

adjusted rho-square. This value normally lies between 0 and 1, and the closer to 1, the better the

fit. For discrete choice models, values between 0.2 and 0.4 already indicate a very good fit.
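The significance check described above can be reproduced from the reported values. The sketch below assumes that the robust t-test is the parameter value divided by its robust standard error and that the p-value is two-sided under a normal approximation; the function names are hypothetical. Because the reported values are rounded, the recomputed statistics can differ in the last digit.

```python
import math


def robust_t(value: float, robust_std_err: float) -> float:
    """t-statistic as the ratio of the estimate to its robust standard error."""
    return value / robust_std_err


def two_sided_p(t_stat: float) -> float:
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(t_stat) / math.sqrt(2.0))


def is_significant(value: float, robust_std_err: float,
                   critical: float = 1.96) -> bool:
    """True when the absolute t-statistic exceeds the 5% critical value."""
    return abs(robust_t(value, robust_std_err)) > critical


# Values taken from Table 12.
t_distance = robust_t(1.05, 0.435)               # BETA_DISTANCE
walkonly_significant = is_significant(-0.475, 0.763)  # BETA_WALKONLY
```

For BETA_DISTANCE this gives a t-statistic of about 2.41 and a p-value of about 0.02, in line with Table 12, while BETA_WALKONLY stays well below the critical value of 1.96.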

Remarkably, Distance has a positive influence on the route choices, while pedestrians are

assumed to minimize their trip length. The Route classes show a similar pattern, as

AClass (the shortest routes) has a negative influence on the route choices. The AClass was expected

to have a positive influence, as statistics have shown that people mainly choose one of the

shortest routes (see Figure 35). It is also remarkable that BClass is significantly positive in these

results, while statistics showed that this was the least chosen route class. The CClass also has a

positive effect, while the DClass has a negative effect. Except for the DClass, the estimation results

do not confirm our expectations based on descriptive analysis: the two smallest groups (B and C)

have a positive effect on the route choices, while these groups were least chosen. Further analysis is

needed in order to explain the surprising results.

Maximum Rise seems to be the most dominant factor in route choices, as its parameter value is

by far the largest in absolute terms (strongly negative). Pedestrians seem to have a large aversion

to very steep routes. Regarding the road types, only WalkSafe (pedestrians and bikes) is significant

and positive. This means that paths shared only by pedestrians and cyclists are preferred by

pedestrians, or that they are more available in the network. Lastly, all forms of Path-Size factors are

significant and they show consistent results (all have a negative value). The absolute values for the

regular forms and the logarithmic forms of the PS factors are close, but their adjusted rho-squares

differ in values. From all the Path-Size factors, the Adjusted rho-square of LogPS2 is the highest: the

difference with the other adjusted rho-squares is quite big. As LogPS2 shows the best model fit, this

Path-Size factor will be used in the estimation of the Path-Size Logit model.

6.3.2 Basic model results

When estimating the model with all parameters, the relative influence of the attributes could be

determined. In these results we could see which attributes have the most influence and which the

least. The models are either estimated using the distance for trip length or the route classes (see

utility functions (12) and (13) above). As the sum of all road types is always 1, only the parameters

for WalkOnly and WalkSafe are estimated to avoid correlated results. WalkAll is fixed in this

estimation, as its effect follows from the outcomes of the other two parameters. For the

same correlation reasons, only one of the Path-Size factors is estimated at a time. LogPS2 is

selected for inclusion in the model, because this form showed the best model results in the

independent estimation. To find out if this was a good choice, both PS factors were tested in

estimation. When using LogPS1 in the estimation with all parameters, the adjusted rho-squares were

0.174 (with Distance) and 0.191 (with Route classes), and when using LogPS2 the adjusted rho-

squares were 0.206 (Distance) and 0.219 (Route classes), so LogPS2 is indeed the better choice.

When estimating the model with all parameters, the Distance is not significant anymore. The

correlation matrix (Table 10) shows that Distance has a significant correlation with PS1, PS2,

RiseMax and WalkOnly. This correlation between Distance and PS2 and Distance and RiseMax could

explain why the Distance parameter is not significant anymore. The correlation between Distance

and PS2 could be explained by the fact that short distances have a higher chance to show overlap

with other routes. The correlation between Distance and RiseMax could be explained by the

fact that steep routes are often short routes. RiseMax, WalkSafe and LogPS2 remain

significant in this model and show the same trend in results as in Table 12. The model fit is

quite good (adjusted rho-square of 0.206).

When using Route classes as trip length for estimation, none of the Route classes is

significant (see Table 14). Apparently, these class parameters correlate with other parameters or

with each other, as they were significant in the independent estimation. Classes

correlate because they are all based on trip length. RiseMax, WalkSafe and LogPS2 remain

significant in this model and show the same trend in results as in previous estimations.


Model: Path-Size Logit for panel data (Distance)

Number of estimated parameters    5
Number of observations            365
Null log-likelihood               -1093.442
Cte log-likelihood                -729.495
Init log-likelihood               -1093.442
Final log-likelihood              -863.318
Likelihood ratio test             460.249
Rho-square                        0.210
Adjusted rho-square               0.206

Parameters           Value    Rob. St err   Rob. t-test   Rob. p-val   Significant
BETA_DISTANCE        0.443    0.410         1.08          0.28         -
BETA_RISEMAX         -32.6    7.65          -4.26         0.00         V
BETA_WALKONLY        0.217    0.836         0.26          0.79         -
BETA_WALKSAFE        3.37     0.807         4.17          0.00         V
BETA_WALKALL         -        -             -             -            -
BETA_Log(PS2DIST)    -4.19    0.569         -7.35         0.00         V

Table 13: Basic model PSL results, trip length in Distance (km)
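The summary statistics of Table 13 are related in a simple way, which the following sketch makes explicit. It assumes the usual definitions: the null log-likelihood corresponds to equal choice probabilities for all alternatives, rho-square is one minus the ratio of the final to the null log-likelihood, and the adjusted rho-square penalises the number of estimated parameters. The function names are hypothetical.

```python
import math


def null_log_likelihood(n_observations: int, n_alternatives: int) -> float:
    """Log-likelihood when every alternative has probability 1 / n_alternatives."""
    return n_observations * math.log(1.0 / n_alternatives)


def rho_square(ll_final: float, ll_null: float) -> float:
    """Likelihood ratio index: 1 - LL(final) / LL(null)."""
    return 1.0 - ll_final / ll_null


def adjusted_rho_square(ll_final: float, ll_null: float, n_parameters: int) -> float:
    """Rho-square penalised for the number of estimated parameters."""
    return 1.0 - (ll_final - n_parameters) / ll_null


def likelihood_ratio_test(ll_final: float, ll_null: float) -> float:
    """Likelihood ratio statistic: -2 * (LL(null) - LL(final))."""
    return -2.0 * (ll_null - ll_final)


# Values taken from Table 13: 365 observations, 20 alternatives, 5 parameters.
ll_null = null_log_likelihood(365, 20)
ll_final = -863.318
```

With these inputs the null log-likelihood is about -1093.442, the rho-square about 0.210, the adjusted rho-square about 0.206 and the likelihood ratio statistic about 460.25, matching Table 13 up to rounding of the reported log-likelihoods.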

Model: Path-Size Logit for panel data (Route classes)

Rho-square                        0.227

Parameters           Value    Rob. St err   Rob. t-test   Rob. p-val   Significant
BETA_ACLASS          -0.390   0.346         -1.13         0.26         -
BETA_BCLASS          0.675    0.459         1.47          0.14         -
BETA_CCLASS          0.218    0.420         0.52          0.60         -
BETA_DCLASS          -0.503   0.397         -1.27         0.20         -
BETA_RISEMAX         -31.9    7.43          -4.30         0.00         V
BETA_WALKONLY        0.0286   0.868         0.03          0.97         -
BETA_WALKSAFE        3.06     0.815         3.75          0.00         V
BETA_Log(PS2DIST)    -3.78    0.575         -6.58         0.00         V

Table 14: Basic model PSL results, trip length in Route Classes


6.3.3 Conclusion and next steps

The first conclusion is that it is possible to estimate route choice models from the GPS data. When

estimating the parameters independently, almost all of them are significant.

Surprisingly, distance has a positive influence on the route choices, and the Route class of shortest

routes is not the preferred class when choosing routes. These results match neither our expectations

based on descriptive analysis nor the findings in literature, even though the descriptive

analysis was carried out with exactly the same data set. As the classes are estimated independently,

correlation could not be an explanation. This observation needs further research.

When the parameters are combined in one model for estimation, the distance and the route classes

are not significant anymore. Correlation with other attributes, or between classes, could be an

explanation for the insignificant results. The results of the other parameters in these combined

models do meet our expectations: Path-Size factor has a negative influence, WalkSafe positive and

RiseMax negative (however, it was not expected to be this negative). It was expected that WalkOnly

paths are most preferred by pedestrians. The insignificance of WalkOnly paths could be explained if

there is a low number of WalkOnly paths available in the network. In that case, pedestrians do not have

the option to choose WalkOnly paths, which results in a significant and positive result for WalkSafe.

In intermediate models, the Path-Sizes were estimated in normal form and in logarithmic form. The

logarithmic forms resulted in better model fit, and therefore only the logarithmic forms will be used

in further estimations. The values for Goodness of Fit are very satisfactory, especially for a

revealed preference study. As size and composition of the choice set influence model estimates

(Prato & Bekhor (2007); van der Waerden et al. (2004); Bliemer & Bovy (2008)), the next step is to

experiment with different sizes and compositions of the choice set.

6.4 Sampling of alternatives

One way to vary the size and composition of choice sets is to sample alternatives. As there is a full

choice set of 20 or 21 alternatives available, it is possible to create different subsets for estimation.

Samples could be randomly selected, or importance sampling can be used. The importance sampling

approach proposed by Frejinger et al. (2009) is described in section 3.5.4. They introduced an

importance sampling approach for choice set generation, which aims at defining a choice set

allowing for unbiased estimation and prediction results using samples of alternatives. The reason for

developing this approach is that it is impossible to generate complete choice sets, required for

avoiding bias in the model. Moreover, complete choice sets are also behaviourally not realistic. In

this section, different models will be estimated, using different samples of alternatives.

6.4.1 Samples

As mentioned in the research plan, four subsets will be used for model estimation. The first subset

consists of a sample of choice sets from the total amount of choice sets: all alternatives are taken

into account, but not all choice sets. The subset consists of the longest routes from the total 20-data

set: only the trips with a trip distance longer than 450 meters are selected. The three other subsets

consist of samples of alternatives: all choice sets are taken into account, but not all alternatives.

These samples of alternatives consist of six alternatives. Six alternatives are chosen

because people can in general only consider about six alternatives for each route (Bovy & Stern,

1990). The four subsets are:


• Longest routes (20 alternatives)

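• 6 randomly chosen alternatives from a total set of 20 alternatives

• 6 alternatives selected based on importance sampling on trip length (1)

• 6 alternatives selected based on importance sampling on trip length (2)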

The second subset is randomly chosen, which means that there is a chance that only very

unattractive routes are selected, or that all alternatives are very similar. The latter problem is also

visualised in Figure 39 and 40. When the differences in distances between the alternatives within a

choice set are very small, no meaningful results can be obtained, because all routes are considered

as similar (concerning the trip length).

Due to the non-linear nature of the estimated models, and to avoid the problem described above for

the randomly chosen alternatives, importance sampling is used for the third and fourth subset to

select alternatives. The idea is to have a broad variation in routes, to better understand why certain

routes are chosen and why other routes are not chosen. This is easier to understand when the

differences between alternatives are clear. For these samples, importance sampling is based on trip

length only, as there seem to be small differences between the trip lengths. Two methods are used

to form the samples:

- The alternatives within a choice set are ranked from small to large; the chosen route is

ranked as the first in the choice set. The sample consists of the first (the chosen route),

second, third, 11th, 19th and 20th of the choice set (Importance Sampling 1)

- The alternatives within a choice set are ranked from small to large; the chosen route is

ranked as the first in the choice set. The sample consists of the first (the chosen route),

second, 7th, 11th, 15th and 20th of the choice set (Importance Sampling 2)
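The two importance samples, together with the random sample for comparison, can be sketched as follows. The sketch assumes that the chosen route is placed first and that the remaining alternatives are ranked by trip length from short to long, which is one reading of the description above. It also assumes that the random sample always contains the chosen route. All names are hypothetical.

```python
import random
from typing import List, Optional, Sequence

IS1_POSITIONS = (1, 2, 3, 11, 19, 20)   # Importance Sampling 1
IS2_POSITIONS = (1, 2, 7, 11, 15, 20)   # Importance Sampling 2


def rank_choice_set(lengths: Sequence[float], chosen_index: int) -> List[int]:
    """Indices of the alternatives with the chosen route first, the rest by length."""
    others = sorted((i for i in range(len(lengths)) if i != chosen_index),
                    key=lambda i: lengths[i])
    return [chosen_index] + others


def importance_sample(lengths: Sequence[float], chosen_index: int,
                      positions: Sequence[int]) -> List[int]:
    """Select the alternatives at the given 1-based rank positions."""
    ranked = rank_choice_set(lengths, chosen_index)
    return [ranked[position - 1] for position in positions]


def random_sample(n_alternatives: int, chosen_index: int, size: int = 6,
                  rng: Optional[random.Random] = None) -> List[int]:
    """Chosen route plus (size - 1) alternatives drawn without replacement."""
    rng = rng or random.Random()
    others = [i for i in range(n_alternatives) if i != chosen_index]
    return [chosen_index] + rng.sample(others, size - 1)


# Example: 20 alternatives of 100, 110, ..., 290 meters; alternative 5 was chosen.
example_lengths = [100.0 + 10.0 * i for i in range(20)]
sample_is1 = importance_sample(example_lengths, 5, IS1_POSITIONS)
sample_is2 = importance_sample(example_lengths, 5, IS2_POSITIONS)
```

In this example Importance Sampling 1 concentrates on the extremes of the length ranking, while Importance Sampling 2 spreads the sampled alternatives more evenly over the ranking.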

However, the route utilities in this thesis are not corrected by a sampling correction. When using

only a sample of the choice set, a sampling correction is required to obtain unbiased estimation

results. Frejinger et al. (2009) found that using a sampling correction in estimation leads to better

model results than estimation without sampling correction. For further research, it would be

interesting to compare the results with and without sampling correction.

6.4.2 Sample of longest routes

In the 20-data set, there are only 15 chosen routes that are 450 meters or longer. A data set of 15

entries is very small, and therefore the sample is not representative. This means that model

estimates are not reliable and are therefore not used further in this research. For the interested reader, the

model estimation results can be found in Appendix 4. Models are estimated in the same way as in

the previous section: independent parameter estimation, and two route choice models.

6.4.3 Random sample

This sample consists of choice sets, each formed by six randomly chosen alternatives (out

of 20). The largest difference from the independent estimation of parameters of the full choice set is that

not all the Route classes are significant. Values for Distance, RiseMax and WalkSafe are similar. The

following results were obtained when the parameters were estimated independently:


• Distance is significant and positive (1,09)

• Route classes: only the first class is significant, with a negative value (-0,284)

• RiseMax is significant and strongly negative (-34,0)

• Road types: only WalkSafe is significant (2,55); the other road types are not

• Path-Sizes are all significant and negative


Parameter          Value     Rob. St err  Rob. t-test  Rob. p-val  Rho-square  Adjusted Rho-square  Significant
BETA_DISTANCE      1.09      0.487        2.24         0.02        0.005       0.004                V
BETA_ACLASS        -0.284    0.115        -2.48        0.01        0.007       0.001                V
BETA_BCLASS        0.139     0.265        0.52         0.60        0.007       0.001                -
BETA_CCLASS        0.243     0.183        1.33         0.18        0.007       0.001                -
BETA_DCLASS        -0.0982   0.0920       -1.07        0.29        0.007       0.001                -
BETA_RISEMAX       -34.0     6.73         -5.06        0.00        0.113       0.111                V
BETA_WALKONLY      -0.322    0.741        -0.44        0.66        0.000       -0.001               -
BETA_WALKSAFE      2.57      0.651        3.94         0.00        0.017       0.016                V
BETA_WALKALL       -0.883    0.519        -1.70        0.09        0.005       0.003                -
BETA_Log(PS1DIST)  -2.56     1.16         -2.21        0.03        0.028       0.026                V
BETA_Log(PS2DIST)  -2.98     1.14         -2.62        0.01        0.039       0.037                V
BETA_PSCDIST       -1.44     0.358        -4.01        0.00        0.012       0.011                V

Table 15: Random sample, attributes independently estimated
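The relation between the columns of Table 15 can be made explicit with a small sketch. It assumes that the robust t-test is the estimate divided by its robust standard error, that the p-value is two-sided under a normal approximation, and that "Significant" means p < 0.05; the threshold is inferred from the V marks in the table and is not stated in the text. The function names are hypothetical. Because the inputs are the rounded table values, recomputed p-values can differ from the reported ones in the last digit.

```python
import math


def robust_t_and_p(value, robust_se):
    """Return (t, p): t = value / robust_se and the two-sided normal p-value."""
    t = value / robust_se
    # For a standard normal Z: P(|Z| > |t|) = erfc(|t| / sqrt(2)).
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p


def is_significant(value, robust_se, alpha=0.05):
    """True when the two-sided p-value is below alpha (assumed 5% level)."""
    _, p = robust_t_and_p(value, robust_se)
    return p < alpha


# Rounded estimates and robust standard errors taken from Table 15.
t_dist, p_dist = robust_t_and_p(1.09, 0.487)          # BETA_DISTANCE
t_walkall, p_walkall = robust_t_and_p(-0.883, 0.519)  # BETA_WALKALL
```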

When using Distance as trip length in the full model estimation, RiseMax, WalkSafe and the
Path-Size factor are significant (see Table 16). When Route classes are used, the model shows very
similar results: none of the classes is significant, but RiseMax, WalkSafe and the Path-Size factor
are significant (similar values as in Table 16). The adjusted rho-squares of both models are also
quite similar. In the combined estimation, neither Distance nor any of the Route classes is
significant, while Distance and Route class A were significant in the independent estimation. This
suggests that these parameters correlate with other parameters in the model.


Model: Path-Size Logit for random sample (Distance)
Cte log-likelihood  0.000
Rho-square          0.160

Parameter          Value     Rob. St err  Rob. t-test  Rob. p-val  Significant
BETA_DISTANCE      0.759     0.503        1.51         0.13        -
BETA_RISEMAX       -32.4     6.92         -4.68        0.00        V
BETA_WALKONLY      -0.00138  0.762        -0.00        1.00        -
BETA_WALKSAFE      3.05      0.724        4.22         0.00        V
BETA_Log(PS2DIST)  -2.45     0.995        -2.46        0.01        V

Table 16: Random sample PSL results, trip length in Distance (km)

Model: Path-Size Logit for random sample (Route Classes)
Rho-square          0.160

Parameter          Value     Rob. St err  Rob. t-test  Rob. p-val  Significant
BETA_ACLASS        -0.168    1.09         -0.15        0.88        -
BETA_BCLASS        0.0423    1.08         0.04         0.97        -
BETA_CCLASS        0.238     1.11         0.21         0.83        -
BETA_DCLASS        -0.112    1.09         -0.10        0.92        -
BETA_RISEMAX       -32.7     6.83         -4.79        0.00        V
BETA_WALKONLY      -0.00657  0.790        -0.01        0.99        -
BETA_WALKSAFE      3.00      0.725        4.13         0.00        V
BETA_Log(PS2DIST)  -2.40     0.980        -2.45        0.01        V

Table 17: Random sample PSL results, trip length in Route Classes


6.4.4 Importance Sampling 1

Independent parameter estimation for this sample gives better results than for the random sample
above (see Table 18). Almost all parameters are significant, except for WalkOnly and WalkAll.
Distance shows a negative effect for the first time, which matches our expectations based on the
descriptive analysis and literature findings. Surprisingly, Route class A (shortest routes) is
negative as well, while this parameter is expected to be positive when Distance has a negative
effect. Route class D has a negative effect, which also meets our expectations. Apparently,
pedestrians aim to minimize trip length, but they do not have a preference for one of the shortest
routes. Their preference seems to go to Route classes B and C (both positive, B is highest). The
results for RiseMax, WalkSafe and the Path-Size factors are in line with previous model results.


Parameter          Value     Rob. St err  Rob. t-test  Rob. p-val  Rho-square  Adjusted Rho-square  Significant
BETA_DISTANCE      -0.627    0.257        -2.44        0.01        0.003       0.002                V
BETA_ACLASS        -0.826    0.109        -7.58        0.00        0.180       0.174                V
BETA_BCLASS        1.38      0.236        5.85         0.00        0.180       0.174                V
BETA_CCLASS        0.925     0.210        4.42         0.00        0.180       0.174                V
BETA_DCLASS        -1.48     0.109        -13.64       0.00        0.180       0.174                V
BETA_RISEMAX       -32.3     8.04         -4.02        0.00        0.101       0.100                V
BETA_WALKONLY      -0.344    0.753        -0.46        0.65        0.000       -0.001               -
BETA_WALKSAFE      1.63      0.569        2.87         0.00        0.008       0.006                V
BETA_WALKALL       -0.577    0.529        -1.09        0.27        0.002       0.000                -
BETA_Log(PS1DIST)  -3.50     1.28         -2.73        0.01        0.050       0.048                V
BETA_Log(PS2DIST)  -4.11     1.32         -3.11        0.00        0.068       0.066                V
BETA_PSCDIST       -1.69     0.394        -4.30        0.00        0.018       0.016                V

Table 18: Importance Sampling 1, attributes independently estimated

In the combined models, Distance and all Route classes are still significant (see Tables 19 and 20).
Distance is also still negative, which is in line with our expectation and with what we have found in
literature. For both models, the results for RiseMax and WalkSafe are comparable. LogPS2 has a
larger influence in the Distance model than in the Route Class model. The goodness of fit is better
for the Route Class model (adjusted rho-square of 0.285) than for the Distance model (adjusted
rho-square of 0.158), which means that the Route Class model is preferred to the Distance model.


Model: Path-Size Logit for Imp. Sampling 1 (Distance)
Rho-square          0.166

Parameter          Value     Rob. St err  Rob. t-test  Rob. p-val  Significant
BETA_DISTANCE      -0.625    0.243        -2.57        0.01        V
BETA_RISEMAX       -31.1     8.15         -3.81        0.00        V
BETA_WALKONLY      0.210     0.734        0.29         0.77        -
BETA_WALKSAFE      2.19      0.718        3.05         0.00        V
BETA_Log(PS2DIST)  -3.49     1.20         -2.91        0.00        V

Table 19: Importance Sampling 1 PSL results, trip length in Distance (km)

Model: Path-Size Logit for Imp. Sampling 1 (Route Classes)
Rho-square          0.297

Parameter          Value     Rob. St err  Rob. t-test  Rob. p-val  Significant
BETA_ACLASS        -0.749    0.314        -2.38        0.02        V
BETA_BCLASS        1.26      0.339        3.73         0.00        V
BETA_CCLASS        0.909     0.367        2.48         0.01        V
BETA_DCLASS        -1.42     0.327        -4.35        0.00        V
BETA_RISEMAX       -30.6     7.81         -3.91        0.00        V
BETA_WALKONLY      -0.446    0.844        -0.53        0.60        -
BETA_WALKSAFE      1.82      0.811        2.24         0.03        V
BETA_Log(PS2DIST)  -2.37     0.829        -2.86        0.00        V

Table 20: Importance Sampling 1 PSL results, trip length in Route Classes


6.4.5 Importance Sampling 2

This section uses the sample formed according to the second method of importance sampling. The
results of the independent estimation, shown in Table 21, do not look very promising: neither
Distance nor any of the Route classes is found to be significant. The only significant parameters
are RiseMax, WalkSafe and the Path-Sizes. Their values are similar to previous results.


Parameter          Value     Rob. St err  Rob. t-test  Rob. p-val  Rho-square  Adjusted Rho-square  Significant
BETA_DISTANCE      0.0546    0.304        0.18         0.86        0.000       -0.002               -
BETA_ACLASS        -0.646    1.80e+308    -0.00        1.00        0.096       0.090                -
BETA_BCLASS        1.12      1.80e+308    0.00         1.00        0.096       0.090                -
BETA_CCLASS        0.447     1.80e+308    0.00         1.00        0.096       0.090                -
BETA_DCLASS        -0.925    1.80e+308    -0.00        1.00        0.096       0.090                -
BETA_RISEMAX       -33.3     8.30         -4.01        0.00        0.105       0.103                V
BETA_WALKONLY      -0.347    0.736        -0.47        0.64        0.000       -0.001               -
BETA_WALKSAFE      1.49      0.579        2.58         0.01        0.007       0.005                V
BETA_WALKALL       -0.504    0.512        -0.98        0.32        0.002       0.000                -
BETA_Log(PS1DIST)  -3.38     1.29         -2.61        0.01        0.046       0.044                V
BETA_Log(PS2DIST)  -4.03     1.34         -3.01        0.00        0.063       0.062                V
BETA_PSCDIST       -1.40     0.382        -3.68        0.00        0.012       0.011                V

Table 21: Importance Sampling 2, attributes independently estimated

Estimation of the combined models does not result in satisfactory parameter estimates. As expected
from the independent parameter estimation, only RiseMax, WalkSafe and LogPS2 are significant
in both models. These variables show similar results as in previous estimations. However, these
variables alone say little about pedestrian route choices, and therefore this sampling method is not
preferred for model estimation.


Model: Path-Size Logit for Imp. Sampling 2 (Distance)
Rho-square          0.159

Parameter          Value     Rob. St err  Rob. t-test  Rob. p-val  Significant
BETA_DISTANCE      -0.0485   0.276        -0.18        0.86        -
BETA_RISEMAX       -31.6     9.05         -3.49        0.00        V
BETA_WALKONLY      0.110     0.764        0.14         0.89        -
BETA_WALKSAFE      1.91      0.720        2.66         0.01        V
BETA_Log(PS2DIST)  -3.41     1.18         -2.90        0.00        V


Model: Path-Size Logit for Imp. Sampling 2 (Route classes)
Rho-square          0.222

Parameter          Value     Rob. St err  Rob. t-test  Rob. p-val  Significant
BETA_ACLASS        -0.535    0.557        -0.96        0.34        -
BETA_BCLASS        0.942     0.447        1.11         0.04        -
BETA_CCLASS        0.451     0.447        1.01         0.31        -
BETA_DCLASS        -0.859    0.573        -1.50        0.13        -
BETA_RISEMAX       -30.5     8.58         -3.56        0.00        V
BETA_WALKONLY      -0.289    0.825        -0.35        0.73        -
BETA_WALKSAFE      1.58      0.751        2.11         0.03        V
BETA_Log(PS2DIST)  -2.66     0.970        -2.75        0.01        V



6.4.6 Conclusion

The size and composition of the choice set indeed have a large influence on the estimation results.
Across the samples, not only the parameter values differ considerably, but also whether a
parameter is significant or not. RiseMax, WalkSafe and the Path-Size factor show

consistent results across all samples and the full choice set of 20: they are always significant and

they show comparable results. The sample of Importance Sampling 1 using Route classes shows the

best results in terms of adjusted rho-square (0,285) and significance of parameters. Also, when

estimating the parameters independently, most parameters were significant and the values for trip

lengths were close to our expectations. Therefore, the first method for Importance Sampling is

recommended for future use.

6.5 21 data set

As mentioned in the introduction, the total data set is split into two subsets: one with choice
sets consisting of 20 alternatives and one with choice sets consisting of 21 alternatives. The
21-data set consists of 189 chosen routes. In these choice sets, the chosen route was not generated
by the algorithm and was therefore added to the choice set at the end of the choice set generation
process. Our expectation based on the descriptive analysis is that the last route class (longest
routes) has the largest (positive) influence on route choice, as the chosen routes of this data set
mainly belong to the longest routes of the choice set.

6.5.1 Basic model

For the 21-data set, the same estimation process is followed as for the previous data sets.
Table 24 shows the results of the independent parameter estimation. Distance is not significant,
but three Route classes are: classes A and D have a positive effect and class C a negative effect.
The positive effect of Route class D matches our expectation from the descriptive analysis, as the
chosen routes of this data set are mainly among the longest routes within the choice set.

A remarkable result is that RiseMax is not significant, while this parameter has always been

significant in previous models (with large values). Note that all road types are significant: as

expected, WalkOnly and WalkSafe are positive while WalkAll is a negative factor. Also, all Path-Size

factors are significant but, in contrast to previous results, they are all positive. Of the
Path-Size factors, LogPS1 gives the best model fit, and therefore this Path-Size factor is
used in further estimation.



Parameter          Value     Rob. St err  Rob. t-test  Rob. p-val  Rho-square  Adjusted Rho-square  Significant
BETA_DISTANCE      0.687     0.679        1.01         0.31        0.001       -0.001               -
BETA_ACLASS        0.203     0.0961       2.11         0.03        0.015       0.008                V
BETA_BCLASS        0.269     0.392        0.69         0.49        0.015       0.008                -
BETA_CCLASS        -0.827    0.352        -2.35        0.02        0.015       0.008                V
BETA_DCLASS        0.355     0.112        3.18         0.00        0.015       0.008                V
BETA_RISEMAX       2.53      2.09         1.21         0.23        0.002       0.000                -
BETA_WALKONLY      9.28      1.12         8.26         0.00        0.180       0.178                V
BETA_WALKSAFE      11.1      1.19         9.36         0.00        0.238       0.236                V
BETA_WALKALL       -12.2     1.15         -10.65       0.00        0.441       0.439                V
BETA_Log(PS1DIST)  7.20      0.445        16.19        0.00        0.294       0.292                V
BETA_Log(PS2DIST)  7.07      0.439        16.10        0.00        0.284       0.283                V
BETA_PSCDIST       2.96      0.352        8.41         0.00        0.074       0.072                V

Table 24: Basic model 21-data set, attributes independently estimated

Tables 25 and 26 show the results of the combined models. In both models, the trip length
parameters (Distance and all Route classes) are not significant. Correlation with other attributes
or between route classes could explain this. All other attributes (RiseMax, WalkOnly, WalkSafe and
PS1) are significant, and both models show comparable results. Note that RiseMax is significant in
these models, while it was not in the independent estimation (Table 24). This is remarkable,
because in the independent estimation RiseMax could not correlate with other attributes. WalkOnly
and WalkSafe both have a positive effect, as does the Path-Size factor. Apparently, pedestrians of
the 21-data set have a strong preference for WalkOnly and WalkSafe roads and do not mind choosing
overlapping routes (LogPS1 is positive). If the chosen routes are among the longest routes of the
choice set, they have a higher chance of showing overlap than the shorter routes of the choice set.
This could explain the positive Path-Size parameters.

Note that the adjusted rho-squares of both models are remarkably high, especially for Revealed

Preference data: 0.500 and 0.514. Even for Stated Preference data these numbers would have been

very high. These numbers are also much higher than the adjusted rho-squares of the models

estimated with the other data set. Apparently, the 21-data set fits the model very well, which is

very remarkable because the 21-data were seen as the exceptions of the total data set. The high

value suggests that the generated choice set may contain too few reasonable alternatives, biasing

the parameter estimates.


Model: Path-Size Logit for panel data (Distance)
Rho-square          0.517

Parameter          Value     Rob. St err  Rob. t-test  Rob. p-val  Significant
BETA_DISTANCE      0.626     0.856        0.73         0.46        -
BETA_RISEMAX       -5.81     2.88         -2.02        0.04        V
BETA_WALKONLY      10.0      1.50         6.68         0.00        V
BETA_WALKSAFE      9.68      1.73         5.58         0.00        V
BETA_Log(PS1DIST)  5.30      0.433        12.24        0.00        V

Table 25: Basic model 21-data set PSL results, trip length in Distance (km)

Model: Path-Size Logit for panel data (Route Classes)
Rho-square          0.528

Parameter          Value     Rob. St err  Rob. t-test  Rob. p-val  Significant
BETA_ACLASS        0.313     0.719        0.44         0.66        -
BETA_BCLASS        0.147     0.766        0.19         0.85        -
BETA_CCLASS        -0.959    0.899        -1.07        0.29        -
BETA_DCLASS        0.498     0.782        0.64         0.52        -
BETA_RISEMAX       -6.78     2.80         -2.42        0.02        V
BETA_WALKONLY      10.3      1.45         7.09         0.00        V
BETA_WALKSAFE      10.1      1.70         5.99         0.00        V
BETA_Log(PS1DIST)  4.98      0.438        11.37        0.00        V

Table 26: Basic model 21-data set PSL results, trip length in Route Classes


6.5.2 Importance Sampling

As Importance Sampling method 1 gave the best results in the previous section, this method is
adopted here as well. Again, six alternatives are selected for each choice set (for a total of 189
observations) according to Importance Sampling method 1.


Parameter          Value     Rob. St err  Rob. t-test  Rob. p-val  Rho-square  Adjusted Rho-square  Significant
BETA_DISTANCE      -0.239    0.432        -0.55        0.58        0.000       -0.003               -
BETA_ACLASS        -0.490    0.0421       -11.65       0.00        0.016       0.004                V
BETA_BCLASS        1.09      0.375        2.90         0.00        0.016       0.004                V
BETA_CCLASS        -0.267    0.338        -0.79        0.43        0.016       0.004                -
BETA_DCLASS        0.329     0.0929       -3.54        0.00        0.016       0.004                V
BETA_RISEMAX       0.803     1.81         0.44         0.66        0.000       -0.003               -
BETA_WALKONLY      9.78      1.31         7.46         0.00        0.237       0.234                V
BETA_WALKSAFE      9.37      1.45         6.48         0.00        0.252       0.249                V
BETA_WALKALL       -10.9     1.25         -8.69        0.00        0.487       0.484                V
BETA_Log(PS1DIST)  6.33      0.543        11.66        0.00        0.301       0.298                V
BETA_Log(PS2DIST)  6.22      0.530        11.74        0.00        0.292       0.289                V
BETA_PSCDIST       2.48      0.383        6.48         0.00        0.076       0.073                V

Table 27: Importance sampling 21-data set, attributes independently estimated

Independent parameter estimation shows similar results to the independent parameter estimation for
the total 21-data set (Table 24). In terms of significance, the only difference is that here Route
class C is not significant, while in the previous independent estimation Route class B was not
significant.

Both combined models for Importance Sampling also show similar results: Route classes and Distance
are not significant; RiseMax, WalkSafe, WalkOnly and LogPS1 are significant and have similar values
in both models. The values of these significant parameters and of the adjusted rho-square are in
line with the results of the basic model.


Model: Path-Size Logit for Imp. Sampling 1 (Distance)
Rho-square          0.558

Parameter          Value     Rob. St err  Rob. t-test  Rob. p-val  Significant
BETA_DISTANCE      -0.850    0.684        -1.24        0.21        -
BETA_RISEMAX       -9.82     1.80         -5.44        0.00        V
BETA_WALKONLY      11.0      1.53         7.24         0.00        V
BETA_WALKSAFE      8.66      1.82         4.76         0.00        V
BETA_Log(PS1DIST)  4.58      0.637        7.18         0.00        V


Model: Path-Size Logit for Imp. Sampling 1 (Route Classes)
Rho-square          0.561

Parameter          Value     Rob. St err  Rob. t-test  Rob. p-val  Significant
BETA_ACLASS        -0.143    0.182        -0.78        0.43        -
BETA_BCLASS        0.789     0.703        1.12         0.26        -
BETA_CCLASS        -0.476    0.632        -0.75        0.45        -
BETA_DCLASS        -0.170    0.271        -0.63        0.53        -
BETA_RISEMAX       -9.73     1.85         -5.27        0.00        V
BETA_WALKONLY      10.9      1.59         6.88         0.00        V
BETA_WALKSAFE      8.69      1.87         4.66         0.00        V
BETA_Log(PS1DIST)  4.46      0.656        6.80         0.00        V



6.5.3 Conclusion

The results of the basic model and of importance sampling are very similar. In all combined
models estimated for the 21-data set, trip length is not significant. The conclusion is that the
composition and size of the choice set do not substantially influence the model results for the
21-data set. Only when the Route classes are estimated independently are a few classes
significant. As expected, there is a preference for class D (longest routes), as the descriptive
analysis has shown that most of the chosen routes belong to the longest routes. Another conclusion
is that pedestrians of the 21-data set have a strong preference for WalkOnly and WalkSafe routes
(shown in all models) and do not mind choosing overlapping routes (LogPS1 is positive in all
models).

The adjusted rho-squares of all models of the 21-data set are remarkably high, especially for

Revealed Preference data. These numbers are also much higher than the adjusted rho-squares of

the models estimated with the 20-data set. Apparently, the 21-data set fits the model very well,

which is very remarkable because the 21-data were seen as the exceptions of the total data set. The

high value suggests that the generated choice set may contain too few reasonable alternatives,

biasing the parameter estimates.

6.6 Final Conclusion

This section aims at answering the following research questions:


model results?


The main conclusion is that it is possible to estimate route choice models and to obtain significant

results from the GPS data set. This gives an answer to the second research question: yes, our

successful estimation of a pedestrian route choice model suggests that it is realistic to treat walking

behaviour as utility maximizing behaviour. Apparently, people do not choose their routes randomly,

and it is possible to partly explain their behaviour with a discrete choice model.

Estimation of the basic models for the 20-data set does not give satisfactory results. Based on

literature and findings of the descriptive analysis, it was expected that pedestrians aim at minimizing

trip length, and that they mainly choose one of the shortest routes. The results of the independent
parameter estimation indicate otherwise: Distance has a positive effect on route choice and the
Route class of the shortest routes is not preferred by pedestrians (significant, but negative). The
positive parameter for Distance could be explained by the fact that trip lengths of alternative
routes can be very similar (as shown in Figures 39 and 40). The negative parameter for Route class
A could be explained by the difference in the methods used to define the route classes. In the
descriptive analysis, the routes within a choice set were ranked from 1 to 20 (or 21) and each
route class was defined by four routes: the four shortest routes, the four second shortest routes,
the four second longest routes and the four longest routes. This way, each class had the same
number of routes. In the estimation

process, the route classes were defined according to the method shown in Table 11. This last

method is more systematic, and it would have been better to use this method for descriptive

analysis as well.


Estimation of the basic models (with all parameters) results in no significant parameters for
Distance or for Route classes. An explanation could be that trip length correlates with other
attributes, or that the classes correlate with each other. As seen in the correlation table
(Table 10), trip length shows correlation with Min RiseMax, Max Walkonly and Max PS1 and PS2,
which could explain this.

To find out how the size and composition of the choice set affect the model results, different
samples of alternatives are tested. In total, four samples are tested: the longest routes, six
alternatives chosen randomly from the choice set, and six alternatives selected by importance
sampling (first and second method). The sample of longest routes was too small, and therefore
these model results were not used in the analysis. The conclusion is that size and composition of
the choice set have an

influence on the model estimates and that using well-sampled choice sets could lead to better model

results than using the full choice set. Using importance sampling according to the first method

resulted in the best model results: most parameters were significant, best Goodness of fit (adjusted

rho-square of 0.285 when using Route classes for trip length) and the parameter values for trip

lengths were mainly according to our expectations (in independent parameter estimation). Only in

these model results the Distance was significant and negative and Route class D (longest routes)

was also significant and negative. Table 30 shows the results of the basic model and of the model
using Importance Sampling method 1. As the results show, the goodness of fit is better for the
Importance Sampling model, and more parameters are significant in this model. The relative
importance of the attributes is similar in both models: RiseMax is most important, followed by the
Path-Size factor and then road type (WalkSafe).

Model: Path-Size Logit for panel data (Route classes)

                                Basic Model  Imp Sampl 1
Number of estimated parameters  8            8
Number of observations          365          365
Number of individuals           49           49
Null log-likelihood             -1093.442    -653.992
Cte log-likelihood              -729.495     0.000
Init log-likelihood             -1093.442    -653.992
Final log-likelihood            -845.491     -459.752
Likelihood ratio test           495.903      388.481
Rho-square                      0.227        0.297
Adjusted rho-square             0.219        0.285

                   Basic                                Imp Sampl 1
Parameter          Value     Rob. St err  Rob. t-test  Value     Rob. St err  Rob. t-test
BETA_ACLASS        -         0.346        -1.13        -0.749    0.314        -2.38
BETA_BCLASS        -         0.459        1.47         1.26      0.339        3.73
BETA_CCLASS        -         0.420        0.52         0.909     0.367        2.48
BETA_DCLASS        -         0.397        -1.27        -1.42     0.327        -4.35
BETA_RISEMAX       -31.9     7.43         -4.30        -30.6     7.81         -3.91
BETA_WALKONLY      -         0.868        0.03         -         0.844        -0.53
BETA_WALKSAFE      3.06      0.815        3.75         1.82      0.811        2.24
BETA_WALKALL       -         -            -            -         -            -
BETA_Log(PS2DIST)  -3.78     0.575        -6.58        -2.37     0.829        -2.86

Table 30: Basic model and Importance Sampling 1 PSL results, trip length in Route Classes
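The goodness-of-fit statistics in Table 30 can be reproduced from the log-likelihoods with the standard definitions, as the sketch below shows. It assumes that the null model assigns equal probability to every alternative (20 alternatives for the basic model, 6 for the sample) and that the adjusted rho-square penalises the final log-likelihood by the number of estimated parameters. The function names are hypothetical.

```python
import math


def null_loglik(n_obs, n_alts):
    """Log-likelihood when each of n_alts alternatives is equally likely."""
    return -n_obs * math.log(n_alts)


def rho_squares(null_ll, final_ll, n_params):
    """Return (rho-square, adjusted rho-square)."""
    rho = 1.0 - final_ll / null_ll
    adjusted = 1.0 - (final_ll - n_params) / null_ll
    return rho, adjusted


def likelihood_ratio(null_ll, final_ll):
    """Likelihood ratio test statistic: -2 (LL_null - LL_final)."""
    return -2.0 * (null_ll - final_ll)


# Values from Table 30 (365 observations, 8 estimated parameters).
basic_rho, basic_adj = rho_squares(-1093.442, -845.491, 8)
sample_rho, sample_adj = rho_squares(-653.992, -459.752, 8)
```

Because the null log-likelihood depends on the number of alternatives, the rho-squares of the full choice set and of the six-alternative sample are measured against different baselines, which is worth keeping in mind when comparing them.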


The attributes RiseMax, WalkSafe and the Path-Size factors were very consistent in all model
results: they were always significant and they had comparable values in all models of the 20-data
set. RiseMax seems to be the most dominant factor in the route choices of pedestrians in Zürich, as
the value of this parameter is in all model results much larger in magnitude than the values of the
other parameters (between -30 and -35).

The total data set was split into two data sets because the behaviour of the pedestrians of the one

data set was expected to be different than the behaviour of the pedestrians of the other data set. In

the basic models and the models using importance sampling for the 21-data set, Route classes and

Distance are never significant. Unfortunately, this data set does not provide much information about

route choices regarding trip length of this group. Model results of both models are very similar. A

note is that the adjusted rho-squares of all models of the 21-data set are remarkably high, much

higher than the adjusted rho-squares of the models estimated with the 20-data set. Apparently, the

21-data set fits the model very well, which is very remarkable because the 21-data were seen as the

exceptions of the total data set. The high value suggests that the generated choice set may contain

too few reasonable alternatives, biasing the parameter estimates.

When comparing the results for independent parameter estimation of the 21-data set with the

results of the 20-data set (basic models), we would expect that different route classes are significant

and positive (A for 20-data set and D for 21-data set). This is observed when the parameters are

estimated independently for the 21-data set, but not for the 20-data set as route class A is negative.

In the basic models (with combined parameters), trip length does not play a big role in either data
set: the trip length parameters are never significant. Importance Sampling method 1 shows
significant results for trip length for the 20-data set, but not for the 21-data set. An
explanation for the insignificant trip length parameters could be that trip length correlates with
other attributes, or that the classes correlate with each other.

Another substantial difference is that the Path-Size factors are negative in the 20-results and
positive in the 21-results, which suggests that pedestrians of the 21-data set find overlapping
routes attractive. An explanation is that longer routes have a higher chance of containing
overlapping paths.

Another difference is that pedestrians of the 20-data set have a very strong aversion to steep

routes, while this is less observed in the 21-data set. Also, people of the 21-data set have a stronger

preference for WalkOnly and WalkSafe routes than the people of the 20-data set.

To answer the first research question of this chapter: the size and composition of the choice set
can have a positive effect on the quality of the model results, but this depends on the choice set
sample. Some samples

have shown to result in better model results than the basic model, such as the model which uses

Importance Sampling method 1 for the 20 data set. But this method does not guarantee better

model results, as when applying this method to the 21-data set, the results are very similar to the

results of the basic model.

The main conclusion of the estimation process is that it is possible to estimate a route choice model from

GPS data, but that the estimates do not always correspond to our expectations, based on descriptive

analysis and findings from literature. The reason why the expected results for trip length in the

independent estimation were not found could be that the differences between distances are very

small (for Distance) and difference in methodology in defining the route classes (for route classes).

In combined estimation, this could be the explanation as well, or it could be explained by correlation

between route classes or between trip lengths and other attributes (see Table 10). Another

expectation was that there is no strong preference for a specific road type, but estimates show that


there is a preference for WalkSafe roads. This is in line with findings from literature, so the
result is not very surprising; it was just not clearly visible in the descriptive analysis. An
explanation could be that the descriptive analysis only looked at the effect of the largest
fraction of WalkSafe routes on route choices, and not at the fraction of WalkSafe roads in general.
The expectation of negative Path-Size factors was confirmed by the model estimates.


7 Conclusions and recommendations

In the last chapter, main findings will be discussed and the final conclusions will be drawn by

answering the research questions. Based on these conclusions, recommendations will be given for

science and practice. Lastly, in the discussion the author will critically reflect on the work.

7.1 Findings

This thesis consists of a literature study and a case study. Findings from literature were applied in

the case study and findings from the case study are used to answer the main research question. The

main findings from literature on pedestrian route choice behaviour are that pedestrians mainly make

their route choices simultaneously and that trip length is found to be the most dominant factor in

pedestrian route choices. Other influential quantitative factors are road type and gradient (especially

in hilly cities). Therefore, these three attributes are selected for the route choice model. The main

finding from literature on pedestrian route choice modelling is that there are no modelling

techniques yet especially developed for pedestrians. The most suitable methods found for pedestrian

route choice modelling are the BFS-LE choice set generation method for generating non-chosen

routes and using the Path-Size Logit model to account for similarities between alternatives.
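To make the role of the Path-Size factor concrete, the sketch below computes a basic path-size term and the resulting Path-Size Logit probabilities for a toy choice set. It uses the common textbook formulation, in which each link contributes its share of the route length divided by the number of routes in the choice set that use it. The PS1, PS2 and PSC variants estimated in this thesis may be defined differently, and all names, the toy network and the value of `beta_ps` are assumptions for illustration only.

```python
import math


def path_sizes(routes, link_length):
    """Basic path-size factor per route.

    routes      -- dict: route id -> list of link ids
    link_length -- dict: link id -> length
    """
    # Count how many routes in the choice set use each link.
    use_count = {}
    for links in routes.values():
        for link in set(links):
            use_count[link] = use_count.get(link, 0) + 1
    sizes = {}
    for route_id, links in routes.items():
        total = sum(link_length[link] for link in links)
        sizes[route_id] = sum(
            (link_length[link] / total) / use_count[link] for link in links
        )
    return sizes


def psl_probabilities(utilities, sizes, beta_ps):
    """Logit probabilities with utility V_i + beta_ps * ln(PS_i)."""
    v = {r: utilities[r] + beta_ps * math.log(sizes[r]) for r in utilities}
    v_max = max(v.values())  # subtract the maximum for numerical stability
    exp_v = {r: math.exp(x - v_max) for r, x in v.items()}
    total = sum(exp_v.values())
    return {r: e / total for r, e in exp_v.items()}


# Toy choice set: r1 and r2 share link "a"; r3 is fully independent.
toy_routes = {"r1": ["a", "b"], "r2": ["a", "c"], "r3": ["d"]}
toy_lengths = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 2.0}
toy_sizes = path_sizes(toy_routes, toy_lengths)
toy_probs = psl_probabilities({"r1": 0.0, "r2": 0.0, "r3": 0.0}, toy_sizes, beta_ps=1.0)
```

With equal systematic utilities and a positive `beta_ps`, the two overlapping routes each receive a lower probability than the independent route, which is the similarity correction the model is meant to provide.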

When all choice sets are prepared and all route attributes are calculated, the results are analysed

descriptively. The main findings of the descriptive analysis are that people in Zürich mainly walk short

distances (median of 0,08 km and mean of 0,134 km for chosen routes) and that people mainly

choose one of the shortest routes of the choice set (in normal conditions). Other findings are that

pedestrians consider Maximum rise as more important than Average Rise and that differences in trip

length between alternatives could be very small.

The main finding from the estimation process is that it is possible to estimate a pedestrian route

choice model from revealed preference GPS data. Several significant parameters were found in

different model estimations. However, the estimation results did not always correspond to

expectations based on descriptive analysis and findings from literature. The attributes RiseMax,

WalkSafe and the Path-Size factors were found to be very consistent in all model results: they were
all significant and they showed comparable results. The influence of trip length is found to be
inconsistent across the model estimations.

7.2 Conclusions


The purpose of this thesis is to estimate a pedestrian route choice model on the basis of

revealed preference GPS data. So far, there have only been a few pedestrian route choice models

estimated from GPS data in a real size urban area. The aim of these models is to understand how

pedestrians really choose their route within the city.

The answer to the main research question is based on findings of the estimation process. The

following environmental street characteristics have an influence on pedestrian route choice

behaviour: maximum rise, road type “Walk Safe” (allowed for pedestrians and cyclists) and the Path

Size factor. Trip length also has an influence on pedestrian route choices, but the estimates for
trip length are not consistent across the model estimations: they are significant in some models
and not in others, and their sign varies between negative and positive. The estimates for RiseMax,
WalkSafe and Path Size

factor are very consistent in all model results: they were always significant and they had comparable

values in all models using the same data set. RiseMax seems to be the most dominant factor in the
route choices of pedestrians in Zürich, as the value of this parameter is in all model results
much larger in magnitude than the values of the other parameters (between -30 and -35). The
relative importance of the attributes is also similar in all models (RiseMax is most important,
then the Path-Size factor and then road type WalkSafe).

As the results for trip length are inconsistent, the results suggest that trip length is not the
dominant factor in pedestrian route choices in Zürich. This goes against the literature on
pedestrian route choices in urban areas (mainly based on surveys) and against the assumption that
pedestrians aim at minimizing trip length. This difference between result and expectation could be
explained by the data sample used in this case study: the walk trips are very short (median of
0,08 km and mean of 0,134 km

for chosen routes). However, in independent parameter estimations, trip lengths often show to have

a significant influence. In the best model results, obtained by using Importance sampling according

to the first method for the 20-data set, the parameter values for trip lengths were also significant:

distance has a negative influence and the last Route class (with the longest route) has also a

negative influence. Therefore, we could conclude that trip length has an influence on the pedestrian

route choices in urban areas, but the estimation results are not as consistent as the other attributes.

The main conclusion of the estimation process is that it is possible to estimate a pedestrian route

choice model based on revealed preference GPS data. Several significant parameters are found, and

most of the findings make sense. The successful estimation of a pedestrian route choice model

suggests that it is realistic to treat walking behaviour as utility maximizing behaviour. Apparently,

people do not choose their routes randomly, and it is possible to partly explain pedestrian behaviour

in a discrete choice modelling framework.

7.3 Recommendations for science and further research

Modelling of pedestrian behaviour in regional urban areas is an interesting topic for research. To start with the data collection, using GPS data in a revealed preference study is still very time-consuming.

Advances and automation in GPS data collection, post-processing, map-matching and analysis would

make this work a lot easier and faster. Also, more accurate GPS devices would help to make the

work less time-consuming: the smoothing and filtering process would be less extensive and map-matching to the network would be easier. Innovations in data collection methods, such as using Virtual Reality, Augmented Reality and Social Media, are very promising but need to be developed further for use in route choice modelling. New algorithms need to be developed to extract the observed behaviour from the data and to post-process this data for further analysis.


New methods to handle large and rich data sets are also in development, and could also help to do

research on route choices on a larger scale.

To give concrete recommendations for the data collection part: it is highly recommended to ask the participants to report their trip purpose and activities in the travel diaries. This is very useful for estimation, as pedestrians with different trip purposes show different route choice behaviours. This way, the trips could be categorized by trip purpose for estimation. It is also recommended to ask for

basic socio-demographic characteristics, as this can be used in the estimation process to account for

heterogeneous tastes between participants. For the GPS processing, it is useful to link the GPS

tracks with trip purpose, for the same reasons as mentioned earlier. Stop points could be classified as well: is it a transfer to another mode, an activity during a round trip (going to the supermarket and pharmacy combined in one round trip) or was the GPS signal lost? Improvements in the filtering and detection of stop points could support this process.

The main gaps were found in the route choice modelling part. There is no choice set generation method that was developed especially for pedestrians and that is in line with pedestrian behaviour. In this

thesis, the selected method was assumed to be the best method to generate routes for pedestrians.

The future choice set generation method will need to take taste variation and environmental street

characteristics into account, and it should be able to handle dense and large urban networks as

pedestrians use dense networks. A promising method for pedestrian behaviour is to use Importance

sampling for choice set generation. This is not yet applied to pedestrians, so this could be an

interesting topic for further research.
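When alternatives are drawn by importance sampling, the estimation has to correct for the unequal sampling probabilities of the routes. The sketch below illustrates the correction term ln(k/q) in the spirit of Frejinger et al. (2009), assuming sampling with replacement, where k is the number of times a route was drawn and q is its sampling probability. The function name and all numbers are hypothetical and only serve as an illustration.

```python
import math


def corrected_probabilities(utilities, draw_counts, sampling_probs):
    """Logit probabilities with the sampling correction ln(k_i / q_i) added to each utility."""
    corrected = [v + math.log(k / q)
                 for v, k, q in zip(utilities, draw_counts, sampling_probs)]
    c_max = max(corrected)
    exp_c = [math.exp(c - c_max) for c in corrected]
    total = sum(exp_c)
    return [e / total for e in exp_c]


# Made-up example: three routes with equal utility, each drawn once,
# but the first route was twice as likely to be sampled as the others.
utilities = [-1.0, -1.0, -1.0]
draw_counts = [1, 1, 1]
sampling_probs = [0.5, 0.25, 0.25]
probs = corrected_probabilities(utilities, draw_counts, sampling_probs)
```

In this example the route that was more likely to be sampled is weighted down, so that the over-representation caused by the sampling protocol does not bias the estimated probabilities.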

Another interesting topic for research is to account for similarities between alternatives. There are

several methods to account for similarities, but which one represents the correlation structure best?

And when does it have a positive effect and when a negative effect? How is correlation perceived by pedestrians? Do they know that there is overlap between routes and, if so, how do they react to it? These questions could not be answered by the author and therefore assumptions were made about how pedestrians perceive overlap between routes. The pedestrians in this case study were assumed to have good knowledge about the overlap between routes, which is actually unrealistic. This was assumed because the author lacks knowledge about what pedestrians know about overlapping routes. For the calculation of the Path-Size factors, the question is whether they should be calculated based on the true choice set or based on the generated choice set. It sounds logical that they should be calculated based on the true choice set (so as large as possible), in order to approximate the true correlation structure. But the author lacks information about methods to calculate path sizes based on the true choice set; therefore it was assumed that calculation based on the generated choice set also represents the correlation structure between overlapping routes.
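As an illustration of what calculating path sizes on the generated choice set means, the sketch below implements the basic Path Size formulation (Ben-Akiva & Bierlaire, 1999): each link contributes its share of the route length, divided by the number of routes in the choice set that use that link. This basic form may differ from the Path Size variants used in this thesis; the function name, link ids and lengths are hypothetical.

```python
def path_size_factors(choice_set, link_lengths):
    """Basic Path Size factor per route, computed on the given (generated) choice set.

    choice_set: list of routes, each a list of link ids (assumed loop-free).
    link_lengths: dict mapping link id -> link length.
    """
    # Count how many routes in the choice set use each link.
    usage = {}
    for route in choice_set:
        for link in set(route):
            usage[link] = usage.get(link, 0) + 1

    factors = []
    for route in choice_set:
        route_length = sum(link_lengths[link] for link in route)
        # Length share of each link, discounted by the number of routes sharing it.
        ps = sum((link_lengths[link] / route_length) / usage[link] for link in route)
        factors.append(ps)
    return factors


# Made-up example: two routes that share link "a" (lengths in metres).
link_lengths = {"a": 50.0, "b": 50.0, "c": 100.0}
choice_set = [["a", "b"], ["a", "c"]]
factors = path_size_factors(choice_set, link_lengths)
```

A route without any overlap has a Path Size of 1. Because the link usage counts depend on which routes are in the choice set, adding or removing generated routes changes the factors, which is exactly why the choice between the true and the generated choice set matters.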

Another assumption made in this thesis is that the Path-Size Logit model is the best model to explain

pedestrian route choice behaviour in cities, concerning a revealed preference study. In this model,

heterogeneous preferences of individuals are not captured in the route choice model. Further research is needed on how to capture heterogeneous preferences of pedestrians in route choice models. The use of advanced model structures (such as Mixed Logit) for pedestrian route choices, or the use of interaction factors to account for heterogeneity, needs further research, as this could fill the research gap of capturing heterogeneity in pedestrian route choice models.
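A Mixed Logit model could capture such taste variation by letting a coefficient vary over individuals. The sketch below shows the idea with simulated probabilities: logit probabilities are averaged over random draws of the RiseMax coefficient. The normal distribution, its mean and standard deviation, the distance coefficient and the route attributes are all assumptions for illustration and were not estimated in this thesis.

```python
import math
import random


def logit_probabilities(utilities):
    """Standard logit probabilities (max-shifted for numerical stability)."""
    v_max = max(utilities)
    exp_v = [math.exp(v - v_max) for v in utilities]
    total = sum(exp_v)
    return [e / total for e in exp_v]


def simulated_mixed_logit(rise_max, distance, mean_rise, sd_rise, beta_distance,
                          n_draws=500, seed=1):
    """Average logit probabilities over random draws of the RiseMax coefficient."""
    rng = random.Random(seed)
    sums = [0.0] * len(rise_max)
    for _ in range(n_draws):
        # One draw represents the taste of one hypothetical individual.
        beta_rise = rng.gauss(mean_rise, sd_rise)
        utilities = [beta_rise * r + beta_distance * d
                     for r, d in zip(rise_max, distance)]
        for i, p in enumerate(logit_probabilities(utilities)):
            sums[i] += p
    return [s / n_draws for s in sums]


# Made-up example: a flat but longer route versus a steep but shorter route.
probs = simulated_mixed_logit(
    rise_max=[0.01, 0.05], distance=[0.15, 0.12],
    mean_rise=-30.0, sd_rise=10.0, beta_distance=-2.0)
```

In an actual estimation, the mean and the standard deviation of the random coefficient would be estimated by simulated maximum likelihood instead of being fixed in advance.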

In general, it would be interesting to do a similar study for another city and with a larger data

sample. One of the limitations in this thesis was the data sample: it contained very short walking

trips, which are not representative for actual pedestrian behaviour in cities. Because of this, it was


not possible to obtain results that are useful for other cities or that can be used as a standard for the planning and design of pedestrian places.

7.4 Recommendations for practice

For practice, the results of this thesis are only useful for policy-making in Zürich or in other hilly

cities. The main conclusion of this research is that maximum rise of a route, overlapping routes and

Walk and Bike roads have a clear influence on pedestrian route choices. One of the main mobility goals of Zürich is that the share of public transport and slow traffic should be increased by at least 10% within 10 years (Stadt-Zuerich, 2015). Another goal of the city of Zürich is to improve pedestrian and bicycle facilities and to make travelling by active modes more attractive (Stadt-Zuerich, 2015). A recommendation for policy-making is to plan more Walk only or Walk and Bike roads in the city, especially outside the city centre area (which is already very pedestrian friendly).

Especially the area around Hönggerberg (also where the ETH campus is located) needs attention. As

the campus is located on a hill, outside the centre area and not close to a train or tram station, most

of the students and staff come by bus or by car. The roads leading to the campus are all main roads

for mixed traffic, mainly used by motorized traffic. From own experience, the author knows that it is not comfortable to cycle 3 kilometres uphill in the morning while many cars and buses pass by during peak hours. This trip would be much more attractive if there were dedicated Walk and Bike roads (which would also be more attractive for pedestrians). There is also an alternative walk and bike route to the campus, which is less steep and which goes partly through the forest. Dedicated Walk and Bike roads could also make this route more attractive, especially for people who do not like the very steep main route.

Figure 42: Central (www.central.ch)


Another example, also known from own experience, is the place called Central in the city centre

(Figure 42). The author has lived at this place for a couple of months, and knows how chaotic this place is for pedestrians and cyclists: pedestrians at least have the pedestrian crossings, whereas cyclists seem to have no rights here. Several main roads come together at Central and at most crossings there are no traffic lights. During peak hours, there are traffic controllers who regulate the

traffic. Many tramlines pass this place, so most of the time the tram has priority. The situation for

pedestrians and cyclists could be improved here by placing crossings or pedestrian-only zones at critical places. As can be seen in Figure 42, the tram station is located in the middle of Central. From

certain locations, you have to walk around to reach the tram station safely (via pedestrian

crossings). Pedestrian only or pedestrian priority zones and strategically placed crossings could

improve the situation for pedestrians at Central.

The results of this research are mainly applicable to Zürich, but the methods used to develop the

route choice model are useful for all local governments to support policy-making for pedestrian planning and the management of pedestrian flows. As we successfully estimated a pedestrian route choice model based on GPS data, we could assume that walking behaviour can be seen as utility maximizing behaviour. This would allow pedestrians to be included in regional travel demand models, as it should be possible to predict walk routes based on models. This way, predicted routes could be used in planning scenarios. Returning to the Central example, predicted routes could support the impact assessment of, for example, a large project such as a major crossing improvement. These predicted routes could be used in addition to the walkability measures that are currently mainly used.

In a conversation with the Gemeente Amsterdam, the author has learned that they currently do not use any route choice model for pedestrian planning, so any model that provides information about pedestrians’ preferences is useful. For cyclists, the Gemeente Amsterdam currently uses All-Or-Nothing assignment. The Gemeente Amsterdam was taken as a reference because Amsterdam is the biggest city in the Netherlands, it receives a lot of tourists throughout the year and it hosts the largest city events in the Netherlands (such as King’s Day, Gay Pride, SAIL). Therefore, it was assumed that Amsterdam was the most likely city to use a route choice model for pedestrian planning. So far, it was not necessary to use a pedestrian route choice model in policy-making.

However, a GPS-based model could be used for various applications:

• GPS data collected at a large city event (for example King’s day) could be used to develop a

route choice model for visitors during an event. Findings of this study could be used to plan

and organize the next large event, for example by planning more exit routes and toilet

facilities in crowded areas

• Predicted routes, based on GPS data, could be used in the planning and design of (large)

infrastructures (pedestrian bridge, crossing improvements, train station)

• Predicted routes, based on GPS data, could be used in impact assessment of new projects

• Predicted routes, based on GPS data, could be used in capacity planning (size of public

places, dimensions of pedestrian paths and areas)

• A GPS-based route choice model could help to determine optimal pedestrian environments.

With this knowledge, new walkability measures and design standards for urban planning

could be developed, which could be used by urban planners and policy-makers


7.5 Discussion

As the network topology of Zürich is very specific (lots of height differences), the results of this case study and the main conclusions of this thesis are not applicable to other cities. Maximum rise is found here to be most dominant in pedestrian route choices, but this result is unlikely to be valid for any city in the Netherlands. However, the other significant factors could be relevant for other cities: the preference for Walk and Bike roads is likely to be valid in other situations as well.

Next to that, there are certain important limitations in the data sample used in this thesis, which

affect our ability to generalize to other situations. First of all, our sample is too small to scientifically

answer the research questions. Second, our sample is not representative for the population, thus

results cannot be generalized to other situations. As personal characteristics are not available, it is

actually not possible to verify whether the sample is representative for the population of Zürich.

Experience from the institute showed that especially elderly are willing to participate in travel

behaviour studies, so they are assumed to be well represented in our data sample as well. Third, the

collected data was not a fully random sample: not every inhabitant of Zürich had the same chance of being asked to participate. Addresses and telephone numbers of potential participants were bought from an address dealer and not every inhabitant of Zürich was included in this database.

The data sample is also not representative for normal pedestrian behaviour in cities, as the mean and the median length of the chosen routes are very small. Therefore, the data sample and thus the results are not valid for scientifically answering the research questions. Results cannot be generalized to other cities or to larger data sets.

However, the methods used in this thesis are valid and reliable. The pedestrian route choice model

measures what we want to know: it measures the relative influence of different environmental street

attributes. The methods for route choice modelling are also reliable: the same results will be

obtained when the same data sample is used in estimation. Only the data collection part might not

be reliable: collected trips could be very different and could lead to very different model results. For a large part of this research, algorithms and software are used, so when using exactly the same procedures, the same results will be obtained.

If the author could redo this research, knowing that the collected data sample is not representative

for several reasons, the author would reformulate the main research questions. The main research question would be focused more on the methods used and their application in practice. In this framework, the data sample would only be used as test data to find out whether the methods work as they should and whether they are reliable.


Bibliography

Swiss Federal Statistical Office (BFS). (2010). Retrieved from Mikrozensus Verkehr 2010: http://www.bfs.admin.ch/bfs/portal/de/index/infothek/erhebungen__quellen/blank/blank/mz/01.html

MobiTest GSL. (2012). Retrieved from http://www.mgedata.com/de/hw-und-sw-produkte/custom-produkte/mobitest

POSDAP. (2012). Retrieved from Position Data Processing: http://sourceforge.net/projects/posdap

ArcGIS. (2015). Retrieved from http://www.esri.com/

Eclipse. (2015). Retrieved from https://eclipse.org/

Federal Office of Topography SwissTopo. (2015). Retrieved from http://www.swisstopo.admin.ch/internet/swisstopo/de/home/products/height/dhm25.html

MATSim. (2015). Retrieved from Multi-Agent Transportation Simulation: http://www.matsim.org

Office for Spatial Development of the Canton of Zurich. (2015). Retrieved from http://maps.zh.ch/?topic=LidarZH&offlayers=dom2014hillshade&over=UpBackGroundZH

OpenStreetMap. (2015). Retrieved from http://www.openstreetmap.org

Agrawal Weinstein, A., Schlossberg, M., & Irvin, K. (2008). How Far, by Which Route and Why? A Spatial Analysis of Pedestrian Preference. Journal of Urban Design, 13(1), 81–98.

Antonini, G. (2005). A discrete choice modeling framework for pedestrian walking behavior with application to human tracking in video sequences; PhD thesis. Lausanne: EPFL Lausanne.

Antonini, G., Bierlaire, M., & Weber, M. (2006). Discrete choice models of pedestrian walking behavior. Transportation Research Part B: Methodological, 40(8), pp. 667–687.

Bekhor, S., Ben-Akiva, M., & Ramming, S. (2006). Evaluation of choice set generation algorithms for route choice models. Annals of Operations Research, 144, 235–247.

Ben-Akiva, M. E. (1973). Structure of passenger travel demand models; PhD thesis. Cambridge, MA: MIT.

Ben-Akiva, M. E., & Bierlaire, M. (1999). Discrete choice methods and their applications to short-term travel decisions. In R. W. Hall, Handbook of Transportation Science (pp. 5-34). Dordrecht, The Netherlands: Kluwer Academic Publishers.

Ben-Akiva, M. E., & Bolduc, D. (1996). Multinomial probit with a logit kernel and a general parametric specification of the covariance structure. Working Paper. Cambridge, USA: Massachusetts Institute of Technology.

Ben-Akiva, M. E., Bergman, M. J., Daly, A. J., & Ramaswamy, R. (1984). Modelling inter urban route choice behavior. Proceeding of the Ninth International Symposium on Transportation and Traffic Theory (pp. 299-330). Delft, Netherlands: VNU Science Press.

Bierlaire, M. (2003). BIOGEME: A free package for the estimation of discrete choice models. Proceedings of the 3rd Swiss Transportation Research Conference. Ascona, Switzerland.

Bierlaire, M., & Frejinger, E. (2008). Route choice modeling with network-free data. Transportation Research Part C 16(2), 187-198.


Bliemer, M. C., & Bovy, P. H. (2008). Impact of route choice set on route choice probabilities. Transportation Research Record 2076, 10-19.

Bliemer, M. C., & Rose, J. M. (2010). Construction of experimental designs for mixed logit models allowing for correlation across choice observations. Transportation Research Part B 44, 720-734.

Boarnet, M., & Crane, R. (2001). Travel by Design: The Influence of Urban Form on Travel. New York, NY: Oxford University Press.

Borgers, A. W., & Timmermans, H. J. (1986). City centre entry points, store location patterns and pedestrian route choice behaviour: a microlevel simulation model. Socio-Economic Planning Sciences, 20, pp. 25-31.

Borst, H. C., de Vries, S. I., Graham, J. M., van Dongen, J. E., Bakker, I., & Miedema, H. M. (2009). Influence of environmental street characteristics on walking route choice of elderly people. Journal of Environmental Psychology, 29, 477–484.

Bovy, P. H. (2009). On modelling route choice sets in transportation networks: a synthesis. Transport Reviews 29(1), 43-68.

Bovy, P. H., & Fiorenzo-Catalano, S. (2007). Stochastic route choice set generation: behavioral and probabilistic foundations. Transportmetrica, 3, 173-189.

Bovy, P. H., & Stern, E. (1990). Route choice: wayfinding in transport networks. Dordrecht, The Netherlands: Kluwer Academic Publishers.

Bovy, P. H., Bekhor, S., & Prato, C. G. (2008). The factor of revisited path size: alternative derivation. Transportation Research Record 2076, 132-140.

Bovy, P. H., Bliemer, M. C., & van Nes, R. (2006). CT4801 Transportation Modeling. Lecture notes, Delft University of Technology, Delft.

Bovy, P. H., Bliemer, M. C., & van Nes, R. (2006). Transportation modeling: lecture notes CT4801. Delft, The Netherlands: Delft University of Technology.

Broach, J., & Dill, J. (2015). Pedestrian Route Choice Model Estimated from Revealed Preference GPS Data. Transportation Research Board 94th Annual Meeting.

Broach, J., Gliebe, J. G., & Dill, J. L. (2011). Bicycle route choice model developed using revealed preference GPS data. Proceedings of the 90th Annual Meeting of the Transportation Research Board. Washington, D.C.

Brown, B. B., Werner, C. M., Amburgey, J. W., & Szalay, C. (2007). Walkable route perceptions and physical features: converging evidence for en route walking experiences. Environment Behavior 39, 34–61.

Cascetta, E., Nuzzolo, A., Russo, F., & Vitetta, A. (1996). A modified logit route choice model overcoming path overlapping problems: specification and some calibration results for interurban networks. Proceedings of the 13th International Symposium on Transportation and Traffic Theory, (pp. 697–711). Lyon, France.

Cheung, C. Y., & Lam, W. H. (1998). Pedestrian route choices between escalator and stairway in MTR Stations. Journal of Transportation Engineering, 124, 277-285.

Chorus, C. G. (2010). A new model of random regret minimization. EJTIR 2(10), 181-196.

Chu, C. (1989). A paired combinatorial logit model for travel demand analysis. Proceedings of the 5th World Conference on Transportation Research, (pp. 295-309). Ventura, USA.

Daamen, W. (2004). Modelling Passenger Flows in Public Transport Facilities; PhD thesis. Delft University of Technology. Delft: DUP Science.

Daamen, W., & Hoogendoorn, S. P. (2004). Level difference impacts in passenger route choice modelling. TRAIL conference proceedings 2004: A world of transport, infrastructure and logistics (pp. 103-127). Delft: DUP Science.

Daganzo, C. F., & Sheffi, Y. (1977). On stochastic models of traffic assignment. Transportation Science 11, 253-274.


Daly, A. J., & Hess, S. (2010). Simple approaches for random utility modelling with panel data. European Transport Conference 2010 Proceedings. Glasgow.

de la Barra, T., Perez, B., & Anez, J. (1993). Multidimensional path search and assignment. Proceedings of the 21st PTRC Summer Meeting, (pp. 307-319). Manchester, UK.

de Moraes Ramos, G. (2015). Dynamic Route Choice Modelling of the Effects of Travel Information using RP Data; PhD thesis. Delft: Delft University of Technology.

Debreu, G. (1960). Review of R.D. Luce individual choice behavior. American Economic Review, 50 (1), 186-188.

El-Geneidy, A., Grimsrud, M., Wasfi, R., Tétreault, P., & Surprenant-Legault, J. (2014). New evidence on walking distances to transit stops: Identifying redundancies and gaps using variable service areas. Transportation, 41(1), pp. 193-210.

Fiorenzo-Catalano, M. S. (2007). Choice Set Generation in Multi-Modal Transportation Networks; PhD thesis. Delft: Delft University of Technology.

Flotterod, G., & Bierlaire, M. (2013). Metropolis-Hastings sampling of paths. Transportation Research Part B: Methodological, 48, pp. 53-66.

Frejinger, E., & Bierlaire, M. (2007). Capturing correlation with subnetworks in route choice models. Transportation Research Part B: Methodological, 41 (3), pp. 363–378.

Frejinger, E., Bierlaire, M., & Ben-Akiva, M. (2009). Sampling of alternatives for route choice modeling. Transportation Research Part B: Methodological, 43 (10), pp. 984-994.

Guo, Z., & Loo, B. P. (2013). Pedestrian environment and route choice: evidence from New York City and Hong Kong. Journal of Transport Geography 28, 124–136.

Halldórsdóttir, K., Rieser-Schüssler, N., Axhausen, K. W., Prato, C. G., & Nielsen, O. A. (2014). Efficiency of Choice Set Generation Methods for Bicycle Routes. European Journal of Transport and Infrastructure Research, 14 (4), 332-348.

Hensher, D. A., Rose, J. M., & Greene, W. H. (2005). Applied choice analysis: a primer. Cambridge University Press.

Hess, S. (2015, February). DAS Module: Discrete Choice Modelling. Zurich.

Hess, S., Bierlaire, M., & Polak, J. W. (2005). Capturing taste heterogeneity and correlation structure with mixed GEV models. In A. Alberini, & R. Scarpa, Applications of Simulation Methods in Environmental and Resource Economics (pp. 55–76). Boston, MA: Kluwer Academic Publisher.

Hill, M. R. (1982). Spatial Structure and Decision-Making of Pedestrian Route Selection Through an Urban Environment; Phd thesis. University of Nebraska.

Hofmann, N. (2000). The Capacity Restraint Vine: a powerful framework for modelling individual travellers dynamic decision making in a network at micro-level. Proceedings of PTRC Seminar, 445, pp. 55-67.

Hood, J., Sall, E., & Charlton, B. (2011). A GPS-based bicycle route choice model for San Francisco, California. Transportation Letters: The International Journal of Transport Research, 3, 63-75.

Hoogendoorn, S. P. (2001). Normative Pedestrian Flow Behavior: Theory and Applications. Research Report Vk2001.002, Delft University of Technology, Transportation and Traffic Engineering Section.

Hoogendoorn, S. P. (2003). Pedestrian travel behavior modeling. 10th International Conference on Travel Behavior Research. Lucerne.

Hoogendoorn, S. P. (2015). Allegro: Annex 1 to the grant agreement Part B. Delft.

Hoogendoorn, S. P., & Bovy, P. H. (2004). Pedestrian route-choice and activity scheduling theory and models. Transportation Research Part B 38, 169-190.

Hoogendoorn-Lanser, S. (2005). Modelling travel behaviour in multi-modal networks; PhD thesis. Delft: Delft University of Technology.


Hoogendoorn-Lanser, S., & Bovy, P. H. (2007). Modeling overlap in multi-modal route choice by inclusion of trip part specific path size factors. Transportation Research Record, 74-83.

Hoogendoorn-Lanser, S., & van Nes, R. (2004). Multi-modal choice set composition: Analysis of reported and generated choice sets. Proceedings Transportation Research Board, Washington.

Hoogendoorn-Lanser, S., van Nes, R., & Bovy, P. H. (2005). Path-size and overlap in multimodal transport networks. Flow, Dynamics and Human Interaction - Proceedings of the 16th International Symposium on Transportation and Traffic Theory (pp. 63-83). Oxford: Elsevier.

Liu, X., Usher, J. M., & Strawderman, L. (2009). Nested logit model of airport pedestrians’ activity scheduling patterns. Symposium on Human Computer Interaction with Complex Systems (HICS).

Lou, Y., Zhang, C., Zheng, Y., Xie, X., Wang, W., & Huang, Y. (2009). Map-matching for low-sampling-rate GPS trajectories. Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, (pp. 352-361).

Manski, C. (1977). The Structure of Random Utility Models. Theory and Decision 8, 229-254.

Marchal, F., Hackney, J. K., & Axhausen, K. W. (2005). Efficient map matching of large Global Positioning System data sets: Tests on speed-monitoring experiment in Zurich. Transportation Research Record 1935, 93-100.

McFadden, D. (1973). Conditional Logit Analysis of Qualitative Choice Behavior. In P. Zarembka, Frontiers in Econometrics (pp. 105-142). New York City: Academic Press.

McFadden, D. (1978). Modelling the choice of residential location. In A. Karlquist, L. Lundquist, F. Snickars, & J. Weibull, Spatial Interaction Theory and Planning Models (pp. 75-96). Amsterdam, The Netherlands: North-Holland Publishing Company.

McFadden, D., & Train, K. (2000). Mixed MNL Models for Discrete Response. Journal of Applied Econometrics, 15(5), 447-470.

Menghini, G., Carrasco, N., Schüssler, N., & Axhausen, K. W. (2010). Route choice of cyclists in Zurich. Transportation Research Part A, 44, pp. 754-765.

Montini, L., Rieser-Schüssler, N., & Axhausen, K. W. (2013). Field Report: One-Week GPS-based Travel Survey in the Greater Zurich Area. 13th Swiss Transport Research Conference. Ascona.

Moore, E. F. (1959). The shortest path through a maze. Proceedings of the International Symposium on the Theory of Switching (pp. 285–292). Harvard University Press.

Nielsen, O. A. (2000). A stochastic transit assignment model considering differences in passengers utility functions. Transportation Research Part B, 34, 377-402.

Oakes, J. (2004). The (mis)estimation of neighborhood effects: causal inference for a practicable social epidemiology. Social Science and Medicine 58 (10), pp. 1929–1952.

Prato, C. G. (2009). Route choice modeling: past, present and future research directions. Journal of Choice Modelling 2(1), 65-100.

Prato, C. G., & Bekhor, S. (2006). Applying branch and bound technique to route choice set generation. Transportation Research Record 1985, 19–28.

Prato, C. G., & Bekhor, S. (2007). Modeling route choice behavior: how relevant is the composition of choice set? Transportation Research Record 2003, 64–73.

Ramming, M. S. (2002). Network Knowledge and Route Choice, PhD thesis. Massachusetts Institute of Technology, Cambridge, MA.

Rieser-Schüssler, N., Balmer, M., & Axhausen, K. W. (2012). Route choice sets for very high-resolution data. Transportmetrica A: Transport Science 9:9, 825-845.


Rieser-Schüssler, N., Montini, L., & Dobler, C. (2011). Improving post-processing routines for GPS observations using prompted-recall data. 9th International Conference on Survey Methods in Transport. Termas de Puyehue, Chile.

Rodriguez, D. A., Merlin, L., & Prato, C. G. (2014). Influence of the Built Environment on Pedestrian Route Choices of Adolescent Girls. Environment and Behavior, 47(4), 359–394.

Schüssler, N. (2010). Accounting for similarities between alternatives in discrete choice models based on high-resolution observations of transport behaviour; PhD thesis. Zürich: ETH Zürich.

Schüssler, N., & Axhausen, K. W. (2009). Map-matching of GPS traces on high-resolution navigation networks using the Multiple Hypothesis Technique (MHT). Arbeitsberichte Verkehrs- und Raumplanung 568.

Schüssler, N., & Axhausen, K. W. (2009). Processing Raw Data from Global Positioning Systems Without Additional Information. Transportation Research Record 2105, 28-36.

Seneviratne, P. N., & Morrall, J. F. (1985). Analysis of factors affecting the choice of route of pedestrians. Transportation Planning and Technology, 10(2), 147–159.

Senozon. (2015). Senozon AG, VIA. Retrieved September 2015, from www.senozon.com

Srikukenthiran, S., Shalaby, A., & Morrow, E. (2014). Mixed Logit Model of Vertical Transport Choice in Toronto Subway Stations and Application within Pedestrian Simulation. Transportation Research Procedia: The Conference on Pedestrian and Evacuation Dynamics 2014, (pp. 624–629). Delft.

Stadt-Zuerich. (2015). stadt-zuerich.ch. Retrieved from https://www.stadt-zuerich.ch/ted/de/index/stadtverkehr2025/programm_stadtverkehr_2025.html

Train, K. (2003). Discrete Choice Methods with Simulation. University of California, Berkeley: Cambridge University Press.

United Nations, D. (2013). World Population Prospects: The 2012 Revision. New York: United Nations.

van der Waerden, P., Borgers, A., & Timmermans, H. (2004). Choice Set Composition in the Context of Pedestrians’ Route Choice Modeling. Proceedings TRB 2004 Annual Meeting. Washington, D.C.

Verlander, N. Q., & Heydecker, B. G. (1997). Pedestrian route choice: an empirical study. Transportation Planning Methods: Proceedings of European Transport Forum Annual Meeting, Brunel University, England, (pp. 39–49).

Vovsha, P. (1997). The cross-nested logit model: application to mode choice in the Tel Aviv metropolitan area. Transportation Research Record 1607, 13-20.

Walker, J. L. (2001). Extended Discrete Choice Models: Integrated Framework, Flexible Error Structures, and Latent Variables; PhD thesis. Massachusetts Institute of Technology, Department of Civil and Environmental Engineering, Boston, MA.


Appendix 1

Study area in MATSim format and in OpenStreetMap

Legend:

Green: Only Pedestrians (WalkOnly)

Purple: Pedestrians and Bikes (WalkSafe)

White: All modes


Appendix 2

Example of Travel Diary


Appendix 3 Descriptive analysis chosen routes

Statistic                  Distance   RiseMax
Mean                       0,134      0,027
Standard Error             0,005      0,002
Median                     0,080      0,009
Mode                       0,059      0,000
Standard Deviation         0,132      0,049
Sample Variance            0,017      0,002
Kurtosis                   0,950      20,221
Skewness                   1,340      3,727
Range                      0,617      0,493
Minimum                    0,001      0,000
Maximum                    0,618      0,493
Sum                        77,513     15,541
Count                      579        579
Largest(1)                 0,618      0,493
Smallest(1)                0,001      0,000
Confidence Level(95,0%)    0,011      0,004
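The Standard Error and Confidence Level(95,0%) rows for Distance and RiseMax can be cross-checked from the Standard Deviation and Count rows. The sketch below is only an illustration, not the procedure used in the thesis: it assumes the usual definition SE = SD / sqrt(n) and a normal approximation (z = 1.96) for the 95% half-width, which is close to the t-based value for n = 579. The function names are hypothetical, and decimal commas are written as points.

```python
import math

# Values copied from the Distance and RiseMax columns of the
# chosen-routes table (decimal commas converted to points).
COUNT = 579
STD_DEV = {"Distance": 0.132, "RiseMax": 0.049}


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)


def confidence_half_width(sd: float, n: int, z: float = 1.96) -> float:
    """Half-width of a 95% interval, assuming a normal approximation."""
    return z * standard_error(sd, n)


for name, sd in STD_DEV.items():
    se = standard_error(sd, COUNT)
    hw = confidence_half_width(sd, COUNT)
    print(f"{name}: SE = {se:.3f}, 95% half-width = {hw:.3f}")
```

Rounded to three decimals, both quantities agree with the tabulated values for Distance (0,005 and 0,011) and RiseMax (0,002 and 0,004).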

Statistic                  RiseAverage   FallMax
Mean                       0,008         0,033
Mode                       0,000         0,000
Sum                        4,730         19,099
Count                      579           579
Confidence Level(95,0%)    0,001         0,004


Statistic                  WalkOnly   WalkSafe
Mean                       0,141      0,128
Mode                       0,000      0,000
Sum                        81,700     74,393
Count                      579        579
Confidence Level(95,0%)    0,018      0,019

Statistic                  WalkAll    PS1DIST
Mean                       0,730      0,328
Mode                       1,000      1,000
Kurtosis                   -0,309     0,676
Skewness                   -0,957     1,326
Sum                        422,908    190,022
Count                      579        579





Statistic                  PS2DIST    PSCDIST
Mean                       0,315      0,187
Mode                       1,000      0,000
Sum                        182,488    108,496
Count                      579        579




Descriptive analysis of non-chosen routes

Statistic                  Distance    RiseMax
Mean                       0,120       0,040
Mode                       0,155       0,000
Sum                        1282,631    432,921
Count                      10705,000   10705,000





Statistic                  RiseAverage   FallMax
Mean                       0,010         0,042
Mode                       0,000         0,000
Sum                        108,355       448,243
Count                      10705,000     10705,000




Statistic                  WalkOnly    WalkSafe
Mean                       0,106       0,068
Mode                       0,000       0,000
Sum                        1136,490    731,589
Count                      10705,000   10705,000





Statistic                  WalkAll     PS1DIST
Mean                       0,825       0,257
Mode                       1,000       0,500
Skewness                   -1,774      1,387
Sum                        8836,920    2751,986
Count                      10705,000   10705,000




Statistic                  PS2DIST     PSCDIST
Mean                       0,254       0,177
Mode                       0,184       0,693
Sum                        2715,329    1892,925
Count                      10705,000   10705,000





Appendix 4

Model estimation results of sample of longest routes

Parameter            Value     St err   Rob. t-test   Rob. p-val   Rho-square   Adjusted Rho-square   Significant
BETA_DISTANCE        18.2      1.67     10.88         0.00         0.803        0.780                 V
BETA_ACLASS          -8.77     0.626    -14.00        0.00         0.823        0.734                 V
BETA_BCLASS          -5.42     0.941    -5.76         0.00         0.823        0.734                 V
BETA_CCLASS          -5.45     0.916    -5.94         0.00         0.823        0.734                 V
BETA_DCLASS          19.6      0.705    27.85         0.00         0.823        0.734                 V
BETA_RISEMAX         -100      37.6     -2.66         0.01         0.369        0.346                 V
BETA_WALKONLY        1.51      2.90     0.52          0.60         0.006        -0.017                -
BETA_WALKSAFE        2.49      2.06     1.21          0.23         0.008        -0.014                -
BETA_WALKALL         -2.00     2.15     -0.93         0.35         0.014        -0.008                -
BETA_Log(PS1DIST)    -7.57     1.89     -4.01         0.00         0.254        0.231                 V
BETA_Log(PS2DIST)    -9.14     2.65     -3.45         0.00         0.398        0.376                 V
BETA_PSCDIST         -0.762    0.566    -1.35         0.18         0.004        -0.019                -
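The t-test and p-val columns above follow from the estimate and its standard error. As a hedged illustration only, the sketch below assumes that the t-ratio is the estimate divided by the reported standard error and that the p-value is two-sided under a normal approximation; the estimation software used for the thesis may differ in detail. Small deviations from the table come from the rounding of the reported inputs. The function names are hypothetical.

```python
import math


def t_ratio(estimate: float, std_err: float) -> float:
    """t-ratio of a coefficient: estimate divided by its standard error."""
    return estimate / std_err


def two_sided_p(t: float) -> float:
    """Two-sided p-value under a standard normal approximation.

    Uses P(|Z| > |t|) = erfc(|t| / sqrt(2)).
    """
    return math.erfc(abs(t) / math.sqrt(2.0))


# (estimate, standard error) pairs copied from the table above.
rows = {
    "BETA_DISTANCE": (18.2, 1.67),
    "BETA_RISEMAX": (-100.0, 37.6),
    "BETA_WALKONLY": (1.51, 2.90),
}

for name, (beta, se) in rows.items():
    t = t_ratio(beta, se)
    print(f"{name}: t = {t:.2f}, p = {two_sided_p(t):.2f}")
```

Under these assumptions BETA_RISEMAX gives t of about -2.66 with p of about 0.01, and BETA_WALKONLY gives t of about 0.52 with p of about 0.60, in line with the tabulated values. The V marks appear to coincide with p-values of at most 0.05.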


Model: Path-Size Logit for Longest routes

Distance as Trip Length

Rho-square 0.867

Parameter            Value    St err   Rob. t-test   Rob. p-val   Significant
BETA_DISTANCE        30.4     12.6     2.41          0.02         V
BETA_RISEMAX         -100     53.8     -1.86         0.06         -
BETA_WALKONLY        -6.14    5.21     -1.18         0.24         -
BETA_WALKSAFE        -3.59    3.87     -0.93         0.35         -
BETA_Log(PS1DIST)    2.94     4.81     0.61          0.54         -

Model: Path-Size Logit for Longest routes

Route classes as Trip Length

Rho-square 0.885

Parameter            Value    St err   Rob. t-test   Rob. p-val   Significant
BETA_ACLASS          -8.11    2.46     -3.29         0.00         V
BETA_BCLASS          -2.69    5.10     -0.53         0.60         -
BETA_CCLASS          -2.03    4.66     -0.44         0.66         -
BETA_DCLASS          12.8     6.22     2.06          0.04         V
BETA_RISEMAX         -100     51.0     -1.96         0.05         V
BETA_WALKONLY        -4.05    4.65     -0.87         0.38         -
BETA_WALKSAFE        -3.36    3.67     -0.92         0.36         -
BETA_Log(PS2DIST)    1.69     3.83     0.44          0.66         -
