GY460 Techniques of Spatial Analysis
Steve Gibbons
Lecture 6: Probabilistic choice models
Introduction
• It is sometimes useful to model the choices of individuals, firms, or other agents over discrete alternatives
– Choice of transport mode
– Choice of firm location amongst regions
– Choice of cities or country to migrate to
• Theoretical framework
– Random utility model
• Empirical methods:
– Micro: Probit, logit, multinomial logit
– Aggregate: Poisson, OLS, gravity
The “Random Utility” choice model
Random Utility Model
• RUM underlies economic interpretation of discrete choice models. Developed by Daniel McFadden for econometric applications
– see JoEL January 2001 for Nobel lecture; also Manski (2001) Daniel McFadden and the Econometric Analysis of Discrete Choice, Scandinavian Journal of Economics, 103(2), 217-229
• Preferences are functions of biological taste templates, experiences, other personal characteristics
– Some of these are observed, others unobserved
– Allows for taste heterogeneity
• Discussion below is in terms of individual utility (e.g. migration, transport mode choice) but similar reasoning applies to firm choices
Random Utility Model
• Individual i’s utility from a choice j can be decomposed into two components:
$$U_{ij} = V_{ij} + \varepsilon_{ij}$$
• $V_{ij}$ is deterministic – common to everyone, given the same characteristics and constraints
– representative tastes of the population e.g. effects of time and cost on travel mode choice
• $\varepsilon_{ij}$ is random
– reflects idiosyncratic tastes of i and unobserved attributes of choice j
Random Utility Model
• $V_{ij}$ is a function of attributes of alternative j (e.g. price and time) and observed consumer and choice characteristics.
$$V_{ij} = \beta t_{ij} + \gamma p_{ij} + \delta z_{ij}$$
• We are interested in finding $\beta$, $\gamma$, $\delta$
• Let’s forget about z now for simplicity
RUM and binary choices
• Consider two choices e.g. bus or car
• We observe whether an individual uses one or the other
• Define
$$y_i = \begin{cases} 1 & \text{if } i \text{ chooses bus} \\ 0 & \text{if } i \text{ chooses car} \end{cases}$$
• What is the probability that we observe an individual choosing to travel by bus?
• Assume utility maximisation
• Individual chooses bus (y=1) rather than car (y=0) if utility of commuting by bus exceeds utility of commuting by car
RUM and binary choices
• So choose bus if $U_{i1} > U_{i0}$
$$V_{i1} + \varepsilon_{i1} > V_{i0} + \varepsilon_{i0}$$
$$V_{i1} - V_{i0} > \varepsilon_{i0} - \varepsilon_{i1}$$
• So the probability that we observe an individual choosing bus travel is
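$$\text{Prob}\left(\varepsilon_{i0} - \varepsilon_{i1} < V_{i1} - V_{i0}\right)$$
$$= \text{Prob}\left(\varepsilon_{i0} - \varepsilon_{i1} < \beta (t_{i1} - t_{i0}) + \gamma (p_{i1} - p_{i0})\right)$$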
The linear probability model
• Assume probability depends linearly on observed characteristics (price and time)
$$\text{Prob}(i \text{ chooses bus}) = \beta (t_{i1} - t_{i0}) + \gamma (p_{i1} - p_{i0})$$
• Then you can estimate by linear regression
$$y_{i1} = \beta (t_{i1} - t_{i0}) + \gamma (p_{i1} - p_{i0}) + \varepsilon_{i1}$$
• Where $y_{i1}$ is the “dummy variable” for mode choice (1 if bus, 0 if car)
• Other consumer and choice characteristics can be included (the zs in the first slide in this section)
The linear probability model
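• Unfortunately this has some undesirable properties: a fitted straight line can predict probabilities below 0 or above 1
[Figure: Prob(bus) plotted against $V_i$, with the linear regression line]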
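A minimal sketch of the problem, using made-up data (the values of `v` and `y` below are illustrative assumptions, not taken from the lecture): fit a straight line by OLS to a 0/1 outcome and look at the range of the fitted "probabilities".

```python
# Illustrative data only: v is a hypothetical utility difference (bus minus car),
# y is 1 if the commuter chose bus and 0 if the commuter chose car.
v = [-3.0, -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0, 4.0]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

n = len(v)
v_bar = sum(v) / n
y_bar = sum(y) / n

# OLS slope and intercept for the linear probability model y = a + b*v + error
b = sum((vi - v_bar) * (yi - y_bar) for vi, yi in zip(v, y)) / sum(
    (vi - v_bar) ** 2 for vi in v
)
a = y_bar - b * v_bar

# Fitted values are meant to be probabilities, but nothing keeps them in [0, 1]
fitted = [a + b * vi for vi in v]
print("lowest fitted probability: ", round(min(fitted), 3))
print("highest fitted probability:", round(max(fitted), 3))
```

With these data the fitted line leaves the unit interval at both ends, which is the motivation for the non-linear probability functions that follow.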
Non-linear probability model
• Better for probability function to have a shape something like:
[Figure: Prob(bus) plotted against $V_i$, with the curve bounded between 0 and 1]
Probits and logits
• Common assumptions:
– Cumulative normal distribution function – “Probit”
– Logistic function – “Logit”
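$$\text{Prob}(i \text{ chooses bus}) = \frac{\exp(V_i)}{1 + \exp(V_i)}$$
• Estimation by maximum likelihood
$$\text{Prob}(y_i = 1) = F(\mathbf{x}_i \boldsymbol{\beta})$$
$$\text{Prob}(y_i = 0) = 1 - F(\mathbf{x}_i \boldsymbol{\beta})$$
$$\ln L = \sum_{i=1}^{n} \left[ y_i \ln F(\mathbf{x}_i \boldsymbol{\beta}) + (1 - y_i) \ln\left(1 - F(\mathbf{x}_i \boldsymbol{\beta})\right) \right]$$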
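A minimal sketch of logit estimation by maximum likelihood, assuming simulated data with one regressor and a constant. The true parameter values, the sample size and the function names are assumptions made for illustration; in practice a statistics package would be used rather than hand-coded Newton-Raphson.

```python
import math
import random

def logistic(v):
    """Logistic function F(v) = exp(v) / (1 + exp(v))."""
    return 1.0 / (1.0 + math.exp(-v))

def log_likelihood(a, b, x, y):
    """ln L = sum of y*ln F + (1 - y)*ln(1 - F), with index a + b*x."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = logistic(a + b * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

def fit_logit(x, y, n_iter=25):
    """Newton-Raphson maximisation of the logit log-likelihood; returns (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = logistic(a + b * xi)
            g0 += yi - p                 # score for the constant
            g1 += (yi - p) * xi          # score for the slope
            w = p * (1.0 - p)
            h00 += w                     # elements of minus the Hessian
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (minus Hessian) * step = score
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Simulated data under assumed true values (illustration only)
random.seed(0)
TRUE_A, TRUE_B = 0.5, 1.0
x = [random.uniform(-2.0, 2.0) for _ in range(2000)]
y = [1 if random.random() < logistic(TRUE_A + TRUE_B * xi) else 0 for xi in x]

a_hat, b_hat = fit_logit(x, y)
print("estimated constant and slope:", round(a_hat, 3), round(b_hat, 3))
```

The estimates should be close to, but not exactly equal to, the assumed true values, and the log-likelihood at the estimates is at least as high as at the true values.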
Example
• McFadden, D. (1974) The Measurement of Urban Travel Demand, Journal of Public Economics, 3
• Methods of commuting in San Francisco Bay area
Example 1
Characteristics                                        Coefficient   (t-statistic)
Family income ($)                                       0.000095     (0.774)
Car-bus cost, cents per round trip                     -0.01022*     (3.726)
Car-bus vehicle time costs (one way minutes x wage)    -0.01479      (2.460)
Bus total access time costs (one way minutes x wage)   -0.00314      (0.818)
Constant                                                0.3832       (0.428)
McFadden (1974) car versus bus commute modes in SF Bay area
Multiple choices and the “multinomial logit”
Multiple choices
• We often want to think about many more than two choices
– Choice of regional location
– Choice of transport mode with many alternatives
– Choice amongst a sample of schools
• How can we extend the binary choice logit model?
• Random Utility model extends to many choices
$$U_{ij} = V_{ij} + \varepsilon_{ij}$$
• Choose choice k if utility higher than for all other choices
$$V_{ik} + \varepsilon_{ik} > V_{ij} + \varepsilon_{ij} \quad \text{for all } j \neq k$$
Multinomial logit (1)
• Again we need to assume some distribution for the unobserved factor
• One type of distribution (extreme value) gives a simple solution for the probability that choice k is made:
$$\text{Prob}(i \text{ chooses } k) = \frac{\exp(V_{ik})}{\sum_j \exp(V_{ij})}$$
• This is a generalisation of the logit model with many alternatives = “multinomial logit” or “conditional logit”
$$\ln L = \sum_{i=1}^{n} \sum_{j=1}^{J} y_{ij} \ln \text{Prob}(i \text{ chooses } j)$$
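A minimal sketch linking the random utility model to the multinomial logit formula, assuming three alternatives with made-up deterministic utilities. It draws type I extreme value errors, lets each simulated individual pick the alternative with the highest utility, and compares the simulated shares with the closed-form probabilities. The utilities, the number of draws and the function names are illustrative assumptions.

```python
import math
import random

def mnl_probs(v):
    """Multinomial logit probabilities exp(V_k) / sum_j exp(V_j)."""
    m = max(v)  # subtracting the maximum avoids overflow and leaves probabilities unchanged
    expv = [math.exp(vj - m) for vj in v]
    total = sum(expv)
    return [e / total for e in expv]

def simulate_shares(v, n_draws=100_000, seed=1):
    """Shares choosing each alternative when U_j = V_j + e_j and
    e_j is standard type I extreme value (Gumbel)."""
    rng = random.Random(seed)
    counts = [0] * len(v)
    for _ in range(n_draws):
        utilities = []
        for vj in v:
            u = max(rng.random(), 1e-300)      # guard against log(0)
            e = -math.log(-math.log(u))        # inverse-CDF draw of the error
            utilities.append(vj + e)
        counts[utilities.index(max(utilities))] += 1
    return [c / n_draws for c in counts]

V = [0.0, 0.5, 1.0]   # hypothetical deterministic utilities of three choices
probs = mnl_probs(V)
shares = simulate_shares(V)
for k in range(len(V)):
    print(k, round(probs[k], 3), round(shares[k], 3))
```

The simulated shares should agree with the formula up to sampling noise, which is the sense in which the extreme value assumption "gives a simple solution".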
Multinomial logit (2)
• Recall: $V_{ij}$ is a linear function of observed characteristics of the individuals and their choices, e.g. for travel mode choice
$$V_{ij} = \beta t_{ij} + \gamma p_{ij} + \delta_j z_{ij}$$
• Parameters estimated:
• For an individual characteristic that is common across choices (e.g. income, gender): one parameter per choice
– For at least one choice this is zero (base case).
• For a characteristic which varies only across choices e.g. price of transport: one parameter common across choices
Example: Value of time
• MNL models are used to estimate the “value of travel time” from observed commuter behaviour
• Three transport choices: bus (0), train (1), car (2)
• Choosing bus as the base case:
$$V_{i1} - V_{i0} = \gamma (price_1 - price_0) + \beta (time_1 - time_0) + (\delta_1 - \delta_0)\, sex_i + (\theta_1 - \theta_0)\, companycar_i$$
$$V_{i2} - V_{i0} = \gamma (price_2 - price_0) + \beta (time_2 - time_0) + (\delta_2 - \delta_0)\, sex_i + (\theta_2 - \theta_0)\, companycar_i$$
Example 1: Value of time
• For example, from Truong and Hensher, Economic Journal, 95 (1985) p. 15 for bus/train/car choices in Sydney 1982
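A sketch of how a value of time comes out of such a model, writing $\beta$ for the coefficient on travel time and $\gamma$ for the coefficient on price (these symbols are a notational assumption here). The value of travel time is the amount of money that compensates for one extra unit of time while holding utility constant, which is the ratio of the two coefficients:

$$\text{Value of time} = \frac{\partial V_{ij} / \partial time_j}{\partial V_{ij} / \partial price_j} = \frac{\beta}{\gamma}$$

If time is measured in minutes and price in cents, the ratio is in cents per minute.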
Example 2: immigration
• Scott, Coomes and Izyumov (2005) The Location Choice of Employment-Based Immigrants among U.S. Metro Areas, Journal of Regional Science, 45(1), 113-145
• Estimate the impact of metropolitan area characteristics on destination choice for US migrants in 1995
• 298 destination MSAs
Example 2: immigration
Source: Scott, Coomes et al (note: they also report models which include individual Xs)
The independence of irrelevant alternatives problem (IIA) and the nested logit model
Multinomial logit and “IIA”
• Many applications in economic and geographical journals (and other research areas)
• The multinomial logit model is the workhorse of multiple choice modelling in all disciplines. Easy to compute
• But it has a drawback
Independence of Irrelevant Alternatives
• Consider market shares
– Red bus 20%
– Blue bus 20%
– Train 60%
• IIA implies that if the red bus company shuts down, the market shares become
– Blue bus 20% + 5% = 25%
– Train 60% + 15% = 75%
• Because the ratio of blue bus trips to train trips must stay at 1:3
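The arithmetic above can be sketched as a renormalisation of the remaining shares. The function name is hypothetical; the shares are the ones in the example.

```python
def shares_after_removal(shares, removed):
    """Renormalise shares after dropping one alternative.
    Under IIA the remaining shares keep their original ratios."""
    remaining = {k: s for k, s in shares.items() if k != removed}
    total = sum(remaining.values())
    return {k: s / total for k, s in remaining.items()}

shares = {"red bus": 0.20, "blue bus": 0.20, "train": 0.60}
new_shares = shares_after_removal(shares, "red bus")

for mode, share in new_shares.items():
    print(mode, round(share, 2))
print("train : blue bus ratio", round(new_shares["train"] / new_shares["blue bus"], 2))
```

The red bus share is split between blue bus and train in proportion to their existing shares, so the train to blue bus ratio stays at 3 to 1.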
Independence of Irrelevant Alternatives
• Model assumes that ‘unobserved’ attributes of all alternatives are perceived as equally similar
• But will people unable to travel by red bus really switch to travelling by train?
• Most likely outcome is (assuming supply of bus seats is elastic)
– Blue bus: 40%
– Train: 60%
• This limitation of multinomial/conditional logit models arises from the Independence of Irrelevant Alternatives (IIA) assumption
Independence of Irrelevant Alternatives
• It is easy to see why this is:
• Ratio of probabilities of choosing k (e.g. red bus) and another choice l (e.g. train) is just
$$\frac{\text{Prob}(i \text{ chooses } k)}{\text{Prob}(i \text{ chooses } l)} = \frac{\exp(V_{ik})}{\exp(V_{il})}$$
• All other choices drop out of this odds ratio
• There are models that overcome this, e.g…
Nested Logit Model
• Multinomial logit model can be generalised to relax IIA assumption
– Nested Logit (Nested Multinomial Logit)
Choice tree:
  Car (1)
  Public transport (2)
    Bus (3)
    Train (4)
• Characteristics of Bus and Train affect decision of whether to use Car or Public Transport
• Estimate by sequential logits…
Nested Logit Model
• The value placed on the choices available in the second stage (3,4) enters into the calculation of choice probabilities in the first stage (2)…
• Logit for bus versus train to estimate V3 and V4
• Define the ‘Inclusive Value’ of public transport as
$$I_2 = \ln\left(\exp(V_3) + \exp(V_4)\right)$$
• Estimate logit model for Car (1) versus Public (2) using:
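$$\text{Prob}(\text{Public}) = \frac{\exp(V_2 + \tau I_2)}{\exp(V_2 + \tau I_2) + \exp(V_1)}$$
– $\tau$ is the coefficient on the inclusive value, estimated in this first-stage logit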
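A minimal sketch of the two-stage calculation, assuming given values for the utilities and for the coefficient on the inclusive value (called `tau` here, a symbol chosen for illustration). The function name and all numbers are hypothetical.

```python
import math

def nested_logit_probs(v_car, v_public, v_bus, v_train, tau):
    """Two-level nested logit: Car versus Public transport {Bus, Train}.
    tau is the coefficient on the inclusive value of the public transport nest."""
    # Second stage: bus versus train, conditional on using public transport
    denom = math.exp(v_bus) + math.exp(v_train)
    p_bus_given_public = math.exp(v_bus) / denom
    # Inclusive value of the public transport nest
    inclusive = math.log(denom)
    # First stage: car versus public transport
    e_public = math.exp(v_public + tau * inclusive)
    p_public = e_public / (e_public + math.exp(v_car))
    return {
        "car": 1.0 - p_public,
        "bus": p_public * p_bus_given_public,
        "train": p_public * (1.0 - p_bus_given_public),
    }

# Hypothetical utilities; tau below 1 makes bus and train closer substitutes
probs = nested_logit_probs(v_car=0.3, v_public=0.0, v_bus=0.2, v_train=0.5, tau=0.5)
print({k: round(p, 3) for k, p in probs.items()})

# With tau = 1 and v_public = 0 the model collapses to the multinomial logit
mnl = nested_logit_probs(v_car=0.3, v_public=0.0, v_bus=0.2, v_train=0.5, tau=1.0)
print({k: round(p, 3) for k, p in mnl.items()})
```

The second call is a useful check: when the inclusive value coefficient equals 1 (and there is no separate nest-level utility), the probabilities are the same as a three-way multinomial logit, so IIA is back.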
Example: Transport mode choice
• Asensio, J., Transport Mode Choice by Commuters to Barcelona’s CBD, Urban Studies, 39(10), 2002
• Travel mode for suburban commuters
• Sample of 1381 commuters from a travel survey
• Records mode of transport and other individual characteristics
Choice tree:
  Private car
  Public transport
    Train
    Bus
Example: Transport mode choice
• Asensio, J., Transport Mode Choice by Commuters to Barcelona’s CBD, Urban Studies, 39(10), 2002
– Some selected coefficients
Variable                           Parameter
Cost                               -0.002
Travel time by car                 -0.054
Travel time by public transport    -0.018
Sex (car)                           0.889
Sex (bus)                          -1.001
• We don’t know the units of measurement, but how much more valuable is time saved by car than time saved by public transport?
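One way to approach the question is through ratios of coefficients, sketched below using the selected coefficients listed above. The sketch assumes both travel times are measured in the same units and that the cost coefficient applies to both modes; the variable names are illustrative.

```python
# Selected coefficients from the table (units of measurement not given)
cost = -0.002
time_car = -0.054
time_public = -0.018

# Relative value of time: ratio of the marginal disutilities of travel time
relative_value = time_car / time_public

# Implied values of time in cost units per unit of time (time coefficient / cost coefficient)
vot_car = time_car / cost
vot_public = time_public / cost

print("time by car relative to public transport:", round(relative_value, 2))
print("value of time, car:", round(vot_car, 1))
print("value of time, public transport:", round(vot_public, 1))
```

Because the units cancel in the first ratio, the comparison between modes does not depend on knowing them.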
Other discrete choice applications
• Firm location choices, e.g. Head, K. and T. Mayer (2004), Market Potential and the Location of Japanese Investment in the European Union, Review of Economics and Statistics, 86(4), 959-972 (seminar reading)
• School choice, e.g. Barrow, L. (2002) School choice through relocation: evidence from the Washington, D.C. area, Journal of Public Economics, 86, 155-189
• Migration destinations
• Residential choice
Aggregate choice models
Micro and aggregated choice models
• Micro level logit choice models often have aggregated equivalents
• i.e. if you only have choice characteristics, you could use a choice-level regression of the proportion of individuals making each choice on the choice characteristics
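$$\text{Prob}(i \text{ chooses } k) = \frac{\exp(V_k)}{\sum_j \exp(V_j)}$$
$$\ln \text{Prob}(i \text{ chooses } k) = V_k - \ln \sum_j \exp(V_j)$$
$$\ln(n_k / N) = \alpha + \beta x_k + \varepsilon_k$$
• Obviously log(n_k) would work too (why?)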
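A sketch of one answer to the "why?", writing $\alpha$ for the constant and $\beta$ for the slope of the choice-level regression (symbols assumed here). The total number of individuals $N$ is the same for every choice, so $\ln N$ only shifts the constant:

$$\ln n_k = \ln N + \ln(n_k / N) = (\ln N + \alpha) + \beta x_k + \varepsilon_k$$

The slope on $x_k$ is unchanged; only the intercept differs by $\ln N$.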
Micro and aggregated choice models
• In fact, a Poisson model on aggregated data gives exactly the same coefficient estimates as the conditional logit model
• Which is based on ML estimation of
$$\text{Prob}(\text{number choosing } k = n_k) = \frac{\exp(-\lambda_k)\, \lambda_k^{n_k}}{n_k!}$$
$$\ln \lambda_k = \alpha + \beta x_k$$
• See Guimaraes et al, Review of Economics and Statistics (2003)
– though this equivalence was known before this ‘discovery’
• Here’s an example…
Data (295 i’s 3 j’s)
id choice d x
1 American 0 18.97627
1 Japan 0 7.542373
1 Europe 1 3.461017
2 American 1 18.97627
2 Japan 0 7.542373
2 Europe 0 3.461017
3 American 1 18.97627
3 Japan 0 7.542373
3 Europe 0 3.461017
4 American 0 18.97627
4 Japan 1 7.542373
4 Europe 0 3.461017
5 American 1 18.97627
5 Japan 0 7.542373
5 Europe 0 3.461017
Conditional logit
Conditional (fixed-effects) logistic regression Number of obs = 885
LR chi2(1) = 129.65
Prob > chi2 = 0.0000
Log likelihood = -259.26785 Pseudo R2 = 0.2000
------------------------------------------------------------------------------
choice | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x | .0999331 .0091997 10.86 0.000 .081902 .1179642
------------------------------------------------------------------------------
Simpler data
choice n x p
American 192 18.97627 0.650847
Japan 64 7.542373 0.216949
Europe 39 3.461017 0.132203
Poisson
Poisson regression Number of obs = 3
LR chi2(1) = 129.65
Prob > chi2 = 0.0000
Log likelihood = -9.3973119 Pseudo R2 = 0.8734
------------------------------------------------------------------------------
n | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x | .0999331 .0091997 10.86 0.000 .081902 .1179642
_cons | 3.364614 .1450806 23.19 0.000 3.080262 3.648967
------------------------------------------------------------------------------
OLS
. reg lnp x
Source | SS df MS Number of obs = 3
-------------+------------------------------ F( 1, 1) = 370.23
Model | 1.32738687 1 1.32738687 Prob > F = 0.0331
Residual | .003585331 1 .003585331 R-squared = 0.9973
-------------+------------------------------ Adj R-squared = 0.9946
Total | 1.3309722 2 .665486102 Root MSE = .05988
------------------------------------------------------------------------------
lnp | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x | .101293 .0052644 19.24 0.033 .034403 .168183
_cons | -2.339238 .06295 -37.16 0.017 -3.139094 -1.539383
------------------------------------------------------------------------------
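The three sets of estimates above can be checked with a short sketch that uses only the aggregated data from the "Simpler data" table. The Newton-Raphson routines and function names below are illustrative assumptions, not the Stata commands used to produce the output.

```python
import math

# Aggregated data from the "Simpler data" table
x = [18.97627, 7.542373, 3.461017]   # choice characteristic
n = [192, 64, 39]                     # number of individuals making each choice
N = sum(n)

def clogit_beta(x, n, n_iter=50):
    """Conditional logit slope by Newton-Raphson on grouped data."""
    beta = 0.0
    for _ in range(n_iter):
        w = [math.exp(beta * xk) for xk in x]
        p = [wk / sum(w) for wk in w]
        mean_x = sum(pk * xk for pk, xk in zip(p, x))
        var_x = sum(pk * (xk - mean_x) ** 2 for pk, xk in zip(p, x))
        score = sum(nk * (xk - mean_x) for nk, xk in zip(n, x))
        beta += score / (sum(n) * var_x)
    return beta

def poisson_fit(x, n, n_iter=50):
    """Poisson regression ln(lambda_k) = a + b*x_k by Newton-Raphson; returns (a, b)."""
    a, b = math.log(sum(n) / len(n)), 0.0      # start at the mean count
    for _ in range(n_iter):
        lam = [math.exp(a + b * xk) for xk in x]
        g0 = sum(nk - lk for nk, lk in zip(n, lam))
        g1 = sum((nk - lk) * xk for nk, lk, xk in zip(n, lam, x))
        h00 = sum(lam)
        h01 = sum(lk * xk for lk, xk in zip(lam, x))
        h11 = sum(lk * xk * xk for lk, xk in zip(lam, x))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def ols(x, y):
    """Simple OLS of y on x with a constant; returns (intercept, slope)."""
    xb, yb = sum(x) / len(x), sum(y) / len(y)
    slope = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / sum(
        (xi - xb) ** 2 for xi in x
    )
    return yb - slope * xb, slope

b_clogit = clogit_beta(x, n)
a_pois, b_pois = poisson_fit(x, n)
a_ols, b_ols = ols(x, [math.log(nk / N) for nk in n])

print("conditional logit slope:", round(b_clogit, 7))
print("Poisson constant, slope:", round(a_pois, 6), round(b_pois, 7))
print("OLS constant, slope:    ", round(a_ols, 6), round(b_ols, 6))
```

The conditional logit and Poisson slopes coincide, while the OLS slope on log shares is close but not identical because it minimises a different criterion.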
Aggregate v micro choice models
• Hence, there’s little point in using conditional logit if you only have choice-characteristics
• Conditional/multinomial logit is good if you have individual and group-level characteristics
• The aggregated OLS version gives rise to “Spatial interaction” models of flows between origins and destinations
• = Gravity models
• Widely applied (generally a-theoretically) in migration, trade and commuting applications
– e.g. See Head (2003) Gravity for beginners
Gravity/spatial interaction/migration/trade models
• Flow from place j to place k modelled as
$$\ln(n_{jk}) = \beta x_{jk} + \mu_j + \eta_k + \varepsilon_{jk}$$
• Typically characteristics of destination and source include some measure of “attraction”, e.g. population mass (or “market potential” in trade models), wages (endogenous)
• And a measure of the cost of moving between place j and place k (e.g. log distance)
$$\ln(n_{jk}) = \gamma \ln d_{jk} + \beta x_{jk} + \mu_j + \eta_k + \varepsilon_{jk}$$
• Hence gravity – after Newton
$$\ln(Force_{jk}) = \ln G + \ln mass_j + \ln mass_k - 2 \ln dist_{jk}$$
• Strong distance decay effects
– Typical elasticities -0.5 to -2.0
• Even for internet site visits!: see Blum and Goldfarb (2006) Journal of International Economics
• Trade literature has many examples
• Disdier and Head (2003) The Puzzling Persistence Of The Distance Effect On Bilateral Trade, Review of Economics and Statistics
– Finds mean distance elasticity of -0.9 from about 1500 estimates
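To give a feel for what these elasticities mean, the sketch below computes the proportional change in a flow when distance doubles, holding everything else fixed in a log-linear gravity equation. The function name is hypothetical; the elasticities are the ones quoted above.

```python
def flow_ratio(distance_multiplier, elasticity):
    """Ratio of new flow to old flow when distance is multiplied by a factor,
    given a constant distance elasticity."""
    return distance_multiplier ** elasticity

for elasticity in (-0.5, -0.9, -2.0):
    ratio = flow_ratio(2.0, elasticity)
    print("elasticity", elasticity, "-> flow falls to", round(100 * ratio, 1), "% when distance doubles")
```

With an elasticity of -0.9, doubling distance roughly halves the flow.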
Conclusion
• Generally possible to model ‘choices’ as discrete, or as flows
• Discrete choice models offer the advantage of
– Including micro-level (individual/firm) characteristics
– An underlying structural model (RUM)
• Aggregate flow models
– Simpler to compute
– No need for distributional assumptions necessary for maximum likelihood (nonlinear) methods
– But cannot separate individual from aggregate factors