Fred Mannering, Statistical and Econometric Methods – Course Notes, Purdue University © 2007
Course Notes:
Statistical and Econometric Methods
Professor Fred Mannering Purdue University – Spring 2007
Table of Contents

Review of Statistical Methods (Estimators and their statistical properties) 1
  Model Estimation 1
  Properties of Estimators 1
    Bias 1
    Efficiency 2
    Consistency 3
    Other Asymptotic Properties 4
  Least Squares and Maximum Likelihood Estimation 5
    Properties of Least Squares Estimators 5
    Maximum Likelihood Estimation 6
  Specification Issues and Least Squares 9
    Specification Error 9
    Non-zero Disturbance Mean 10
    Errors in Variables 10
    Correlation Between Explanatory Variables and Disturbances 11
    Selectivity Bias 11
    Non-normality of Disturbances 12
    Heteroskedasticity 12
    Serial Correlation 12
    Multicollinearity 13
  Simultaneous Equation Models 13
    Reduced Form and the Identification Problem 14
    The Identification Problem 16
      Order Condition 16
    Simultaneous Equation Estimation 16
      Single equation methods 17
      System equation methods 18
    A note on generalized least squares estimation 18
  Hypothesis Testing and Diagnostics for Continuous Dependent Variable Models 20
    Assessment of Estimated Coefficients 20
    Overall Model Assessment 21
Count Data Models 24
  Poisson Regression Model Goodness of Fit Measures 26
  Truncated Poisson Regression Model 27
  Negative Binomial Regression Model 28
  Zero-Inflated Poisson and Negative Binomial Regression Models 29
Discrete Outcome Models (Models of Discrete Data) 32
  Binary and Multinomial Probit Models 34
  Multinomial Logit Model 36
    Indirect Utility 42
    Properties and Estimation of Multinomial Logit Models 43
    Statistical Evaluation 45
    Interpretation of Findings 47
      Elasticity 47
      Cross-elasticity 48
      Marginal Rates of Substitution (MRS) 49
    Specification Errors 49
      Independence of Irrelevant Alternatives (IIA) property 49
      Other Specification Errors 51
    Endogeneity in Discrete Outcome Models 53
    Data Sampling 53
    Forecasting and Aggregation Bias 56
    Transferability 58
  The Nested Logit Model (Generalized Extreme Value Models) 59
  Special Properties of Logit Models 63
    Sub-sampling of alternate outcomes for model estimation 63
    Compensating Variation 63
Models of Ordered Discrete Data 64
Discrete/Continuous Models 69
  The Discrete/Continuous Modeling Problem 69
  Econometric Corrections: Instrumental Variables and Expected Value Method 70
  Econometric Corrections: Selectivity-Bias Correction Term 72
  Discrete/continuous Model Structures 74
    Reduced form approach 74
    Economic consistency approach 76
Duration Models 78
  Hazard-Based Duration Models 78
    Proportional hazards 81
    Accelerated Lifetime 82
  Characteristics of Duration Data 82
    Tied Data 84
  Non-Parametric Models 84
  Semi-Parametric Models 84
  Fully-Parametric Models 85
    Exponential 85
    Weibull 86
    Log-logistic 86
  Comparisons of Non-Parametric, Semi-Parametric, and Fully-Parametric Models 87
  State Dependence 90
  Time-Varying Covariates 91
  Discrete-Time Hazard Models 91
Course Assignments 93
Review of Statistical Methods
Estimators and their statistical properties

Model Estimation
Consider a model of household vehicle miles of travel – Household Miles Driven Over
Some Time Period:
yt = β0 + β1 xt + εt
where: yt Dependent Variable,
xt Independent Variable
β0, β1 Estimable Parameters
εt Disturbance or Error Term
Estimation problem is one of finding values for β0 and β1.
Properties of Estimators
Classes of properties:
• Small sample - Hold for any size sample • Asymptotic - Hold only as the limit of n → ∞
Bias
- Desirable to have the estimator distribution have a mean value equal to the true parameter.
- Define unbiasedness as E(β̂) = β
- For small sample unbiasedness: E(β̂) = β for all n
- For asymptotic unbiasedness: lim(n→∞) E(β̂) = β
In general, bias is defined as: Bias = E(β̂) − β
Illustration of biased estimators.
Efficiency
Efficiency is a small sample property.
One estimator is more efficient than another if it has smaller variance: (Both estimators
must be unbiased).
e.g., β̂1 is more efficient than β̂2 if VAR(β̂1) < VAR(β̂2)
The best unbiased estimator is the most efficient among all unbiased estimators.
The most efficient estimator is defined as having a smaller variance than any other
unbiased estimator.
[Figure: sampling distributions X̄ ~ N(μ, σ²/n) and a single observation X1 ~ N(μ, σ²); both have expected value μ, but X̄ has the smaller variance.]

Illustration of efficient estimators.
Identification of the best unbiased (most efficient) estimator is achieved by the Cramer-Rao theorem. Under a number of assumptions, it can be shown that for all estimators:

VAR(β̂) ≥ 1 / ( −E[ ∂²LL / ∂β² ] )

One can prove an estimator is most efficient if VAR(β̂) is equal to this Cramer-Rao bound.
Consistency
Consistency is an asymptotic property.
Definition: A consistent estimator has a distribution which collapses on the true
parameter value as the sample size increases.
β̂ converges to β in the probability limit if, for any δ > 0,

lim(n→∞) Prob( |β̂ − β| > δ ) = 0

Also, lim(n→∞) VAR(β̂) = 0
[Figure: probability densities f(X*) for increasing sample sizes n1 < n2 < n3 < n4; as n grows, the density collapses on μ.]
Note: Consistent estimators can be biased and inefficient; therefore, consistency is not a
strong property.
Other Asymptotic Properties
Desire to show that estimator's distribution can be approximated better and better as sample
size increases.
1. Asymptotically Normal - The estimator's distribution converges to a normal
distribution.
2. Asymptotic Efficiency - β̂n is asymptotically efficient if:

β̂n is consistent, and

β̂n's asymptotic variance is smaller than the asymptotic variance of all other consistent estimators
Models with Continuous Dependent Variables

Least Squares and Maximum Likelihood Estimation
Least Squares Estimation
The object of least squares is to fit an equation that minimizes the squared differences between equation predicted values and observed values (i.e., data).
The objective function is:

Min Σ (Yi − Ŷi)²

where: Yi - actual observations; Ŷi - fitted values

The term Yi − Ŷi is referred to as the residual and is denoted as εi.

For the case Yi = β0 + β1 xi it can be shown that:

β̂1 = [n Σ xi Yi − Σ xi Σ Yi] / [n Σ xi² − (Σ xi)²] = Σ (xi − x̄)(Yi − Ȳ) / Σ (xi − x̄)²

and β̂0 = Ȳ − β̂1 x̄
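As an illustration, the closed-form expressions above can be computed directly; a minimal sketch with made-up data values:

```python
# Sketch: closed-form least squares for Y = b0 + b1*x (illustrative data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)
xbar = sum(x) / n
Ybar = sum(Y) / n

# b1 = sum((x_i - xbar)(Y_i - Ybar)) / sum((x_i - xbar)^2)
b1 = sum((xi - xbar) * (Yi - Ybar) for xi, Yi in zip(x, Y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = Ybar - b1 * xbar  # intercept from the sample means

residuals = [Yi - (b0 + b1 * xi) for xi, Yi in zip(x, Y)]
print(b0, b1, sum(residuals))  # residuals sum to ~0 when an intercept is included
```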
For the case of many independent variables, least squares estimation can be represented in matrix form as

β̂ = (X′X)⁻¹ X′Y

where:

β̂ = [β̂1, β̂2, β̂3, ..., β̂K]′, the vector of estimated parameters;
K = number of independent variables (x's);
′ = indicates transposed matrix;
⁻¹ = indicates matrix inversion;
X = matrix of independent variables - N × K;
Y = vector of dependent variable - N × 1

Properties of Least Squares Estimators
Under very general assumptions, the Gauss-Markov theorem demonstrates the least squares estimators (OLS - Ordinary Least Squares) are BLUE.
BLUE - Best Linear Unbiased Estimator
Implies: Unbiased, Efficient
Assumptions required to prove OLS is BLUE:
A1. Normality: The disturbance term εi is normally distributed.
A2. Zero Mean:
E( iε ) = 0
A3. Homoskedasticity: Disturbance terms have the same variance.
( )2 2iE ε σ=
A4. Serial Independence: Disturbance terms are not correlated.
E(εiεj) = 0 ∀ i ≠ j
A5. Non-stochastic X: X is not random and has fixed values in repeated samples.
Maximum Likelihood Estimation
Principle: Different statistical populations generate different samples; any one sample is
more likely to come from some populations rather than others.
Example: If we have a sample of Y1, Y2, ... , Yn, we want to find the value of β most
likely to generate this sample.
Consider the simple model, Yi = β0 + β1 xi + εi. Assume (as in OLS) that Yi is normally distributed with mean β0 + β1 xi and variance σ²; therefore, the probability distribution can be written as:

P(Yi) = (1/(σ√(2π))) EXP[ −(1/(2σ²)) (Yi − β0 − β1 xi)² ]
The likelihood function is:
L(Y1, Y2, ..., YN, β0, β1, σ²) = P(Y1) P(Y2) ... P(YN) = Π(i=1 to N) (1/(σ√(2π))) EXP[ −(1/(2σ²)) (Yi − β0 − β1 xi)² ]
where Π is the product of N factors.
For simplicity, work is done with the logarithm of L rather than L itself. This is acceptable since L is always non-negative and the logarithmic function is monotonic (preserves ordering). Maximizing LN(L), denoted LL, with respect to β0, β1, and σ² gives:

∂LL/∂β0 = (1/σ²) Σ (Yi − β0 − β1 xi) = 0

∂LL/∂β1 = (1/σ²) Σ xi (Yi − β0 − β1 xi) = 0

∂LL/∂σ² = −N/(2σ²) + (1/(2σ⁴)) Σ (Yi − β0 − β1 xi)² = 0
Solving these equations gives:

β̂0 = Ȳ − β̂1 x̄ and β̂1 = Σ (xi − x̄)(Yi − Ȳ) / Σ (xi − x̄)²

which is equivalent to the OLS estimators. However, in general, MLEs are not necessarily BLUE.

Properties of MLEs (Maximum Likelihood Estimators):

1) They are consistent.
2) They are asymptotically normal.
3) They are asymptotically efficient (i.e., asymptotic variance = Cramer-Rao Bound)
Note: Maximum Likelihood Estimators are not generally unbiased or efficient.
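As a check on the equivalence claimed above, the sketch below (made-up data) evaluates the normal log-likelihood, with σ² set to its conditional MLE at each trial (β0, β1), at the closed-form estimates and at a nearby alternative; the closed-form values give the larger log-likelihood:

```python
import math

# Sketch: for the normal linear model the MLE of (b0, b1) coincides with OLS.
# Made-up data; sigma^2 is set to its conditional MLE, SSE/n, at each (b0, b1).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.1, 5.9, 8.2, 9.9]
n = len(x)
xbar, Ybar = sum(x) / n, sum(Y) / n
b1 = sum((xi - xbar) * (Yi - Ybar) for xi, Yi in zip(x, Y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = Ybar - b1 * xbar          # closed-form (OLS) estimates

def loglik(beta0, beta1):
    sse = sum((Yi - beta0 - beta1 * xi) ** 2 for xi, Yi in zip(x, Y))
    sigma2 = sse / n           # MLE of sigma^2 given (beta0, beta1)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - sse / (2 * sigma2)

ll_at_ols = loglik(b0, b1)
ll_nearby = loglik(b0 + 0.3, b1 - 0.2)
print(ll_at_ols > ll_nearby)   # True: the log-likelihood peaks at the OLS values
```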
Specification Issues and Least Squares

A. Specification Error

Refers to errors resulting from a misspecified model (i.e., functional form).

1) Omitted Variables
Suppose the true model is: yi = β1 x1i + β2 x2i + εi and we estimate: yi = β2* x2i + εi*

It can be shown by substitution that

E(β̂2*) = β2 + β1 [ COV(x1, x2) / VAR(x2) ]

Because there is no guarantee that the second term is equal to zero, the estimate of β2* in the misspecified equation will be biased.

Because this bias does not disappear as n → ∞, the parameter estimate will be inconsistent as well. However, if COV(x2, x1) = 0 (i.e., x2 and x1 are not correlated) then the estimators will be BLUE except for the intercept.
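A quick simulation (made-up coefficients and a chosen COV(x1, x2)) illustrates the omitted-variable bias: regressing y on x2 alone recovers β2 + β1·COV(x1, x2)/VAR(x2) rather than β2:

```python
import random

# Sketch: omitted-variable bias. True model y = 2*x1 + 3*x2 + e, but we regress
# y on x2 alone. x1 and x2 are correlated by construction (all values made up).
random.seed(1)
n = 20000
x1, x2, y = [], [], []
for _ in range(n):
    x2i = random.gauss(0.0, 1.0)                 # VAR(x2) = 1
    x1i = 0.5 * x2i + random.gauss(0.0, 1.0)     # COV(x1, x2) = 0.5
    x1.append(x1i); x2.append(x2i)
    y.append(2.0 * x1i + 3.0 * x2i + random.gauss(0.0, 1.0))

x2bar, ybar = sum(x2) / n, sum(y) / n
b2_star = sum((x2[i] - x2bar) * (y[i] - ybar) for i in range(n)) / \
          sum((x2i - x2bar) ** 2 for x2i in x2)

# Expected value: b2 + b1*COV(x1, x2)/VAR(x2) = 3 + 2*0.5 = 4
print(b2_star)  # close to 4.0, not the true 3.0
```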
2) Presence of Irrelevant Variables

Suppose the true model is yi = β2 x2i + εi and we estimate yi = β2* x2i + β3* x3i + εi*

The irrelevant variable x3 implies we are not accounting for the parameter restriction β3* = 0.

In general, not accounting for all available information leads to loss of efficiency, but no loss of consistency or bias.

So, β2* is unbiased and consistent: E(β̂2*) = β2
but it is not efficient since VAR(β̂2*) > VAR(β̂2)

The exception is when COV(x2, x3) = 0, in which case the estimators are again BLUE except for the intercept.
3) Nonlinearities
Suppose the true model is yi = β2 x2i + β3 x2i² + β4 x2i³ + εi

and we estimate yi = β2* x2i + εi*
This results in the same consequences as omitted variables (i.e., biased and inconsistent
parameter estimates).
B. Non-zero Disturbance Mean (violation of assumption A2)
i.e., E(εi) ≠ 0

Cause: Can result from consistent positive or negative errors of measurement in Y.

If an intercept (β0) is excluded, the parameter estimates will be biased and inconsistent.

If an intercept is included, it will be a biased estimate of the true intercept, but all other parameters will be BLUE.
C. Errors in Variables (violation of assumption A5)

If we have yi = β xi + εi and:

1) yi is measured with error, i.e., we use yi* = yi + μi (μi is the error)

If COV(μi, xi) = 0 then β̂ is unbiased and consistent

If COV(μi, xi) ≠ 0 then β̂ is biased and inconsistent
Statistical and Econometric Methods
Fred Mannering, Statistical and Econometric Methods – Course Notes, Purdue University © 2007
11
2) xi is measured with error

Then the parameters will be biased and inconsistent

3) yi and xi are measured with error

Then the parameters will be biased and inconsistent
D. Correlation Between Explanatory Variables and Disturbances (violation of assumption A5)
Implies x does not have fixed values in repeated samples (A5)
If x and εi are correlated, β will be a biased and inconsistent estimator of β.
This correlation problem is the same problem that results from endogenous variables,
and leads to simultaneous equation estimation techniques. E. Selectivity Bias
Arises when the available data sample is not representative of the entire population because of some selection process.

For example: an estimated VMT equation for new cars will be biased because households that drive more tend to buy new cars (i.e., we do not know how much people owning used cars would drive if they had new cars).
[Figure: scatter of observations from two groups (+ and −) with two fitted regression lines, illustrating how estimating on a selected sub-sample shifts the fitted line.]
Results in biased parameter estimates.
F. Non-normality of Disturbances (violation of assumption A1)
Causes: 1. Measurement errors 2. Unobserved parameter variations
Results in hypothesis testing problems (i.e., hypothesis testing depends crucially on the normality assumption). With failure of normality, OLS is inefficient but still consistent.

Diagnostics: 1. Specification tests 2. Plot residuals and see if they are normal
G. Heteroskedasticity (violation of assumption A3)
Results when the disturbance terms have variances that are not equal:

E(ε1²) ≠ E(ε2²) ≠ … ≠ E(εN²)

Causes: 1. Unequally sized observation units 2. Aggregation

Heteroskedasticity results in OLS estimates that are unbiased and consistent but not efficient.

Diagnostics: 1. Plot of squared residuals versus independent variable 2. Split sample regressions
H. Serial Correlation (violation of assumption A4)
Results when E(εiεj) ≠ 0 for some i ≠ j

Causes: 1. Persistent disturbances 2. Omitted smoothly changing variables 3. Time averaged data

Serial correlation results in OLS estimators that are generally unbiased and consistent but not efficient.
If lagged dependent variables are in a model that has serial correlation, the problems are
much more severe.
Diagnostic: 1. Durbin-Watson statistic
I. Multicollinearity
Results when independent variables are highly correlated.

Cause: 1. Lack of variation among data

OLS estimators in the presence of multicollinearity remain BLUE. However, the standard errors of the estimated coefficients can be quite large.

Diagnostic: 1. Condition number of X′X
Simultaneous Equation Models
Interrelated equations with continuous dependent variables:
Utilization of individual vehicles (measured in kilometers driven) in multivehicle
households
Interrelation between travel time from home to an activity and the duration of the
activity
Interrelation of average vehicle speeds by lane with the vehicle speeds in adjacent
lanes.
Problem:
Estimation of equation systems by ordinary least squares (OLS) violates a key OLS assumption in that a correlation between regressors and disturbances will be present, because not all independent variables are fixed in repeated samples (violation of A5).
Overview of the simultaneous equations problem
Consider annual vehicle utilization equations (one for each vehicle) in two-vehicle
households of the following linear form:
u1 = β1 Z1 + α1 X + λ1 u2 + ε1

u2 = β2 Z2 + α2 X + λ2 u1 + ε2
Where:
u1 is the kilometers per year that vehicle 1 is driven,
u2 is the kilometers per year that vehicle 2 is driven,
Z1 and Z2 are vectors of vehicle attributes (for vehicles 1 and 2 respectively),
X is a vector of household characteristics,
β's and α's are vectors of estimable parameters, λ's are estimable scalars, and ε's are disturbance terms.
To satisfy regression assumption A5, the value of the dependent variable (left-hand
side variable) must not influence the value of an independent variable (right-hand
side).
This is not the case in these equations because in the first equation the independent
variable u2 varies as the dependent variable u1 varies, and in the second equation, the
independent variable u1 varies as the dependent variable u2 varies.
Thus, u2 and u1 are said to be endogenous variables in Equations 5.1 and 5.2
respectively.
Reduced Form and the Identification Problem
Reduced form solution: solving two equations and two unknowns to arrive at reduced forms.
Substituting second equation into the first in the previous example:
u1 = β1 Z1 + α1 X + λ1 [β2 Z2 + α2 X + λ2 u1 + ε2] + ε1
rearranging,
u1 = [β1/(1 − λ1λ2)] Z1 + [(α1 + λ1α2)/(1 − λ1λ2)] X + [λ1β2/(1 − λ1λ2)] Z2 + (ε1 + λ1ε2)/(1 − λ1λ2)

and similarly, substituting the first equation for u1 in the second equation gives,

u2 = [β2/(1 − λ2λ1)] Z2 + [(α2 + λ2α1)/(1 − λ2λ1)] X + [λ2β1/(1 − λ2λ1)] Z1 + (ε2 + λ2ε1)/(1 − λ2λ1)
Because the endogenous variables u1 and u2 are replaced by their exogenous determinants, the equations can be estimated using ordinary least squares (OLS) as,
u1 = a1 Z1 + b1 X + c1 Z2 + ξ1, and

u2 = a2 Z2 + b2 X + c2 Z1 + ξ2,
where,
a1 = β1/(1 − λ1λ2); b1 = (α1 + λ1α2)/(1 − λ1λ2); c1 = λ1β2/(1 − λ1λ2); ξ1 = (ε1 + λ1ε2)/(1 − λ1λ2)

a2 = β2/(1 − λ2λ1); b2 = (α2 + λ2α1)/(1 − λ2λ1); c2 = λ2β1/(1 − λ2λ1); ξ2 = (ε2 + λ2ε1)/(1 − λ2λ1).
OLS estimation of these reduced form models (Equations 5.6 and 5.7) is called indirect least
squares (ILS).
Problem: While estimated reduced form models are readily used for forecasting purposes, if
inferences are to be drawn from the model system, the underlying parameters need to be
determined.
Unfortunately, uncovering the underlying parameters, (the β's, α's, and λ's) in reduced form
models is problematic because either too little or too much information is often available.
For example, note that the above equations provide two possible solutions for β1,

β1 = a1(1 − λ1λ2) and β1 = c2(1 − λ2λ1)/λ2.
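Plugging made-up structural values into the reduced-form coefficient definitions shows how β1 can be recovered two different ways (the source of over identification once the a's and c's are estimated rather than known):

```python
# Sketch: reduced-form coefficients for the two-equation utilization system,
# with made-up scalar parameter values (beta, alpha, lambda chosen arbitrarily).
beta1, alpha1, lam1 = 1.5, 0.8, 0.4
beta2, alpha2, lam2 = 2.0, 0.6, 0.3
d = 1.0 - lam1 * lam2          # common denominator 1 - lambda1*lambda2

a1 = beta1 / d                 # coefficient on Z1 in the u1 reduced form
b1 = (alpha1 + lam1 * alpha2) / d
c1 = (lam1 * beta2) / d
a2 = beta2 / d
c2 = (lam2 * beta1) / d

# Two ways to recover beta1 from reduced-form coefficients:
beta1_from_a1 = a1 * d
beta1_from_c2 = c2 * d / lam2
print(beta1_from_a1, beta1_from_c2)  # both recover beta1 = 1.5 (up to floating
# point); with *estimated* reduced-form coefficients the two generally differ.
```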
The Identification Problem
In some instances, it may be impossible to determine the underlying parameters. In
these cases, the modeling system is said to be unidentified.
In cases where exactly one equation solves the underlying parameters, the model system
is said to be exactly identified.
When more than one equation solves the underlying parameters (as shown in Equation
5.10), the model system is said to be over identified.
Order Condition
Determines an equation to be identified if the number of all variables excluded from an
equation in an equation system is greater than or equal to the number of endogenous
variables in the equation system minus one.
For example, in the first equation in the original equation system above, the number of
elements in the vector Z2, which is an exogenous vector excluded from the equation,
must be greater than or equal to one because there are two endogenous variables in the
equation system (u1 and u2).
Simultaneous Equation Estimation
1) Two modeling alternatives: single-equations estimation methods and systems
estimation methods.
2) The distinction between the two is that systems methods consider all of the
parameter restrictions (caused by over identification) in the entire equation system
and account for possible contemporaneous (cross-equation) correlation of
disturbance terms.
3) Because system estimation approaches are able to utilize more information
(parameter restrictions and contemporaneous correlation), they produce variance-
covariance matrices that are at worst equal to, and in most cases smaller than those
produced by single-equation methods (resulting in lower standard errors and higher
t-statistics for estimated model parameters).
Single equation methods
1) Indirect least squares (ILS)
Applies ordinary least squares to the reduced form models.
Consistent but not unbiased
2) Instrumental variables (IV)
1) Uses an instrument (a variable that is highly correlated with the endogenous variable it replaces, but is not correlated to the disturbance term) to estimate individual equations
2) Consistent but not unbiased.
3) Two-stage least squares (2SLS)
Approach finds the best instrument for endogenous variables.
Stage 1 regresses each endogenous variable on all exogenous variables.
Stage 2 uses regression-estimated values from stage 1 as instruments, and
estimates equations with ordinary least squares.
Consistent but not unbiased. Generally better small sample properties than ILS
or IV.
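A minimal 2SLS sketch for one equation of a two-equation system (simulated data with made-up coefficients; the two-regressor helper solves the normal equations directly):

```python
import random

# Sketch: two-stage least squares for u1 = lam1*u2 + alpha1*x + e1, where u2
# is endogenous because it depends on e1. z2 is exogenous and excluded from
# the u1 equation, so it supplies the instrument. All values are made up.
random.seed(7)

def ols2(w, v, y):
    # OLS of y on two regressors [w, v] (no intercept), via normal equations.
    sww = sum(a * a for a in w); svv = sum(a * a for a in v)
    swv = sum(a * b for a, b in zip(w, v))
    swy = sum(a * b for a, b in zip(w, y))
    svy = sum(a * b for a, b in zip(v, y))
    det = sww * svv - swv * swv
    return (swy * svv - svy * swv) / det, (sww * svy - swv * swy) / det

n = 20000
x  = [random.gauss(0, 1) for _ in range(n)]
z2 = [random.gauss(0, 1) for _ in range(n)]
e1 = [random.gauss(0, 1) for _ in range(n)]
u2 = [z2[i] + 0.5 * x[i] + 0.8 * e1[i] + random.gauss(0, 1) for i in range(n)]
u1 = [0.6 * u2[i] + 0.7 * x[i] + e1[i] for i in range(n)]  # lam1 = 0.6

# Stage 1: regress the endogenous u2 on all exogenous variables (z2 and x).
g1, g2 = ols2(z2, x, u2)
u2_hat = [g1 * z2[i] + g2 * x[i] for i in range(n)]

# Stage 2: use the stage-1 fitted values as the instrument and apply OLS.
lam1_2sls, alpha1_2sls = ols2(u2_hat, x, u1)
lam1_ols, alpha1_ols = ols2(u2, x, u1)  # naive OLS, biased since COV(u2, e1) > 0

print(lam1_2sls, lam1_ols)  # 2SLS near 0.6; OLS pushed above it
```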
4) Limited Information Maximum Likelihood (LIML)
Uses maximum likelihood to estimate reduced form models. Can incorporate
parameter restrictions in over identified equations.
Consistent but not unbiased. Has same asymptotic variance-covariance matrix
as 2SLS.
System equation methods
1) Three Stage Least Squares (3SLS)
Stage 1 gets 2SLS estimates of the model system.
Stage 2 uses the 2SLS estimates to compute residuals to determine cross-equation correlations.
Stage 3 uses generalized least squares (GLS) to estimate model parameters.
Consistent and more efficient than single-equation estimation methods.
2) Full Information Maximum Likelihood (FIML)
Similar to LIML but accounts for contemporaneous correlation of disturbances in the likelihood function.
Consistent and more efficient than single-equation estimation methods. Has same asymptotic variance-covariance matrix as 3SLS.
A note on generalized least squares estimation
Ordinary least squares (OLS) assumptions are that disturbance terms have equal variances
and are not correlated. Generalized least squares (GLS) is used to relax these OLS
assumptions. Under OLS assumptions, in matrix notation,
E(εεᵀ) = σ² I
where:
E(.) denotes expected value,
ε is an n × 1 column vector of equation disturbance terms (where n is the total number of observations in the data), εᵀ is the 1 × n transpose of ε,
σ 2 is the disturbance term variance, and
I is the n × n identity matrix,
    | 1 0 . 0 |
I = | 0 1 . 0 |
    | . . . . |
    | 0 0 . 1 | .
When heteroskedasticity is present, E(εεᵀ) = Ω, where Ω is the n × n matrix,

    | σ1²  0    .  0   |
Ω = | 0    σ2²  .  0   |
    | .    .    .  .   |
    | 0    0    .  σn² | .
For disturbance-term correlation, E(εεᵀ) = σ² Ω, where

    | 1     ρ1    .  ρN−1 |
Ω = | ρ1    1     .  ρN−2 |
    | .     .     .  .    |
    | ρN−1  ρN−2  .  1    |
Recall that in ordinary least squares, parameters are estimated from,
β̂ = (XᵀX)⁻¹ XᵀY,
where:
β̂ is a p × 1 column vector (where p is the number of parameters),
X is an n × p matrix of data,
Xᵀ is the transpose of X, and
Y is an n × 1 column vector.
Using Ω, Equation 5A.5 is rewritten as,
β̂ = (XᵀΩ⁻¹X)⁻¹ XᵀΩ⁻¹Y.
The most difficult aspect of GLS estimation is obtaining an estimate of the Ω matrix. In
3SLS, it is estimated using the initial 2SLS parameter estimates.
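For the special case of a diagonal Ω (heteroskedasticity only), the GLS estimator reduces to weighted least squares. A sketch for a single-regressor model with made-up data and disturbance variances:

```python
# Sketch: GLS with a diagonal Omega. For y_i = beta*x_i + e_i with
# VAR(e_i) = sig2_i, the estimator (X' Omega^-1 X)^-1 X' Omega^-1 Y reduces to
#   beta_hat = sum(x_i*y_i/sig2_i) / sum(x_i^2/sig2_i)  (weighted least squares)
x    = [1.0, 2.0, 3.0, 4.0]
y    = [2.2, 3.8, 6.1, 8.3]
sig2 = [0.5, 1.0, 2.0, 4.0]  # made-up disturbance variances (Omega diagonal)

num = sum(xi * yi / s for xi, yi, s in zip(x, y, sig2))
den = sum(xi * xi / s for xi, s in zip(x, sig2))
beta_gls = num / den

beta_ols = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
print(beta_gls, beta_ols)  # both consistent; GLS downweights the noisier points
```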
Hypothesis Testing and Diagnostics for Continuous Dependent Variable Models
The objective of hypothesis testing and diagnostics is to determine the "best" model fit to a specified data set.

A. Assessment of Estimated Coefficients
The most commonly used statistic to evaluate coefficients is the t-statistic, defined as:

tDF = (β̂ − β) / Sβ̂

where: tDF is the t-statistic with DF = N − K degrees of freedom (N minus the number of coefficients in the model); β̂ is the estimated parameter; β is the value of the parameter being tested against (usually zero); Sβ̂ is the standard error of β̂ (i.e., the square root of VAR(β̂)).
Example: Suppose we estimate the model Y = A + Bx with 30 observations and find:

Coeff.   Value    Standard Error   t-Stat*
A         2.47     1.92             1.29
B        −3.13     1.33            −2.35

* t-Stat is calculated with β = 0
Wish to test whether A and B are significantly different from zero.
For both DF = N-2 = 28
Wish to test that A > 0 and B < 0 use a one-tailed t-test.
From tables we find the critical values for: t0.90, 28 = 1.313 90% confidence level
t0.99, 28 = 2.467 99% confidence level
The hypotheses are: HO : A, B = 0
HA : A > 0, B < 0
For A, 1.29 < 1.313 so we can only be about 89% confident that A > 0
For B, |−2.35| > 1.313 but |−2.35| < 2.467, so we can be more than 90% (but less than 99%) confident that B < 0
If we want to test A ≠ 0 and B ≠ 0 we use a two-tailed test:
From tables we find the critical values for:
t0.90, 28 = 1.701 at 90% confidence level
t0.99, 28 = 2.763 at 99% confidence level
The hypotheses are: HO : A, B = 0

HA : A ≠ 0, B ≠ 0
We will be less confident since critical t-values are larger for the two-tailed test.
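The t-statistics from the example above can be reproduced directly:

```python
# Sketch: the t-tests from the example above (A = 2.47, SE = 1.92;
# B = -3.13, SE = 1.33; N = 30, so DF = 28), testing against beta = 0.
t_A = (2.47 - 0.0) / 1.92
t_B = (-3.13 - 0.0) / 1.33

t_90_one_tail = 1.313  # critical values for DF = 28, from t tables
t_99_one_tail = 2.467

print(round(t_A, 2), round(t_B, 2))  # 1.29 and -2.35, as in the table
print(t_A > t_90_one_tail)           # False: cannot reject A = 0 at 90%
print(abs(t_B) > t_90_one_tail)      # True: reject B = 0 at 90% (one-tailed)
```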
B. Overall Model Assessment
1) R-Squared
The most commonly used statistic is the R-squared.
R-squared is the ratio of data variance explained by the model to total data variance:

R² = Σ (Ŷi − Ȳ)² / Σ (Yi − Ȳ)² = explained variation / total variation in Y
or

R² = 1 − Σ ei² / Σ (Yi − Ȳ)² = 1 − Residual Variation (SSR) / Total Variation in Y
Generally, the higher the R-squared value, the better. However, it is important to
consider:
a) The Amount of Variance in the Data
Data with little variance may produce high R2's, but the model is not explaining
much.
Conversely, data with much variance may produce low R2's, but may still be
explaining much of the underlying process.
As a rule: It may be better to explain a little of a lot of variance rather than a lot of a
little variance.
b) The Number of Independent Variables in the Model
The R2 statistic will always increase as more variables are added.
To resolve this problem, the corrected R-squared statistic is used:
R̄² = 1 − (1 − R²) (N − 1)/(N − K)
where: N = number of observations
K = number of parameters in the model
The corrected R̄² accounts for the number of variables in the model and therefore can decline when additional variables are added.
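A sketch computing R² and the corrected R̄² from made-up observed and fitted values:

```python
# Sketch: R-squared and corrected (adjusted) R-squared for a fitted model,
# using made-up observed and fitted values; K counts all estimated parameters.
Y     = [3.0, 5.0, 7.0, 6.0, 9.0]
Y_hat = [3.4, 4.6, 6.8, 6.4, 8.8]
N, K = len(Y), 2

Ybar = sum(Y) / N
ss_total    = sum((yi - Ybar) ** 2 for yi in Y)
ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(Y, Y_hat))

r2 = 1.0 - ss_residual / ss_total
r2_adj = 1.0 - (1.0 - r2) * (N - 1) / (N - K)  # penalizes extra variables
print(round(r2, 3), round(r2_adj, 3))
```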
2) F-Statistic
The F-statistic is used to test whether the model is significantly different from zero (i.e.,
if a relation exists or not).
The F-statistic tests the joint hypothesis that all parameters are equal to zero
For finding critical values of F (i.e., from tables), the degrees of freedom are K − 1 and N − K
where: N = number of observations
K = number of parameters in the model
Generally, if t-stats and R2's are good, F-stat will be OK.
3) Durbin-Watson Statistic
This statistic is used to test for the presence of serial correlation (auto correlation) of
disturbances.
The further away the statistic is from 2.0, the less confident we can be about the
absence of serial correlation.
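The Durbin-Watson statistic is simple to compute from a residual series (made-up residuals here):

```python
# Sketch: Durbin-Watson statistic from an ordered residual series (made up).
# DW = sum((e_t - e_{t-1})^2) / sum(e_t^2); values near 2 suggest no serial
# correlation, near 0 positive correlation, near 4 negative correlation.
e = [0.5, 0.6, 0.4, -0.2, -0.5, -0.3, 0.1, 0.4]

dw = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e))) / \
     sum(et ** 2 for et in e)
print(round(dw, 3))  # well below 2: suggests positive serial correlation
```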
4) Condition Number
Is used to determine the extent of multicollinearity.
It is derived from the characteristic roots of the X'X matrix.
Condition number = largest characteristic root / smallest characteristic root
CN < 10 No multicollinearity
10 < CN < 100 Some Problems
CN > 100 Serious multicollinearity
Count Data Models
Count data consist of non-negative integer values
Examples:
number of driver route changes per week,
the number of trip departure changes per week,
drivers' frequency-of-use of ITS technologies over some time period,
the number of accidents observed on road segments per year.
Count data can be properly modeled by using a number of methods, the most popular of
which are Poisson and negative binomial regression models.
Poisson Regression Model
Consider the number of accidents occurring per year at various intersections in a city.
In a Poisson regression model, the probability of intersection i having yi accidents per year
(where yi is a non-negative integer) is given by:
P(yi) = [ EXP(−λi) λi^yi ] / yi!
Where:
P(yi) is the probability of intersection i having yi accidents per year
λi is the Poisson parameter for intersection i, which is equal to
intersection i's expected number of accidents per year, E[yi].
Poisson regression models are estimated by specifying the Poisson parameter λi (the
expected number of events per period) as a function of explanatory variables.
The most common relationship between explanatory variables and the Poisson parameter is
the log-linear model,
λi = EXP(βXi) or, equivalently, LN(λi) = βXi
Where:
Xi is a vector of explanatory variables and
β is a vector of estimable coefficients.
In this formulation, the expected number of events per period is given by
E[yi] = λi = EXP(βXi)
For model estimation, note the likelihood function is:
L(β) = Πi P(yi)
So, with the Poisson equation,
L(β) = Πi [ EXP(−λi) λi^yi ] / yi!
Since λi = EXP(βXi),

L(β) = Πi EXP[ −EXP(βXi) ] [ EXP(βXi) ]^yi / yi!
Which gives the log-likelihood,
LL(β) = Σ(i=1 to n) [ −EXP(βXi) + yi βXi − LN(yi!) ].
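The log-likelihood above can be evaluated directly; a sketch with made-up data and a trial β:

```python
import math

# Sketch: Poisson regression log-likelihood LL(beta) for a single explanatory
# variable plus an intercept, at a trial beta (data and beta are made up).
X = [0.5, 1.0, 1.5, 2.0]  # one explanatory variable per intersection
y = [1, 2, 2, 4]          # observed accident counts
beta = [0.2, 0.9]         # trial coefficients: intercept and slope

def loglik(b):
    ll = 0.0
    for xi, yi in zip(X, y):
        bx = b[0] + b[1] * xi  # beta'X_i
        # LL contribution: -EXP(beta'X_i) + y_i*beta'X_i - LN(y_i!)
        ll += -math.exp(bx) + yi * bx - math.log(math.factorial(yi))
    return ll

print(loglik(beta))
```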
Poisson Regression Model Goodness of Fit Measures
The likelihood ratio test is a common test used to assess two competing models. It provides evidence in support of one model over the other.
The likelihood ratio test statistic is,
-2[LL(βR) – LL (βU)]
where
LL(βR) is the log-likelihood at convergence of the "restricted" model (sometimes
considered to have all coefficients in β equal to 0, or just to include the constant
term, to test overall fit of the model)
LL(βU) is the log-likelihood at convergence of the unrestricted model.
This statistic is χ2 distributed with degrees of freedom equal to the difference in the
numbers of coefficients in the restricted and unrestricted models (the difference in the
number of coefficients in the βR and βU coefficient vectors).
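As a quick numerical sketch of the test (the two log-likelihood values and the coefficient counts below are hypothetical, chosen only to illustrate the arithmetic):

```python
from scipy.stats import chi2

# Hypothetical log-likelihoods at convergence: a constant-only restricted model
# versus an unrestricted model with three additional coefficients.
LL_restricted, LL_unrestricted = -412.7, -398.2
df = 3  # difference in the number of estimated coefficients

lr_statistic = -2.0 * (LL_restricted - LL_unrestricted)   # here: 29.0
p_value = chi2.sf(lr_statistic, df)
```

A statistic of 29.0 on 3 degrees of freedom is far beyond the usual χ2 critical values, so the restriction would be rejected.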
Another measure of overall model fit is the ρ2 statistic. The ρ2 statistic is,
\rho^2 = 1 - \frac{LL(\beta)}{LL(0)}
Where:
LL(β) is the log-likelihood at convergence with coefficient vector β and
LL(0) is the initial log-likelihood (with all coefficients set to zero).
The perfect model would have a likelihood function equal to one (all selected alternative
outcomes would be predicted by the model with probability one, and the product of these
across the observations would also be one), so the log-likelihood would be zero, giving a
ρ2 of one.
The ρ2 statistic will be between zero and one; the closer it is to one, the more
variance the estimated model explains.
Truncated Poisson Regression Model
Truncation of data can occur in the routine collection of transportation data.
For example, if the data are the number of times per week an in-vehicle navigation system is
used on the morning commute to work (weekdays only), the data are right truncated at 5,
which is the maximum number of uses in any given week.
Estimating a Poisson regression model without accounting for this truncation will result in
biased estimates of the parameter vector β, and erroneous inferences will be drawn.
Fortunately, the Poisson model is adapted easily to account for such truncation. The right-
truncated Poisson model is written as:
P(y_i) = \frac{\lambda_i^{y_i} / y_i!}{\sum_{m=0}^{r}\left[\lambda_i^{m} / m!\right]} ,
Where:
P(yi) is the probability of commuter i using the system yi times per week,
λi is the Poisson parameter for commuter i;
mi is the number of uses per week;
and r is the right truncation (in this case, 5 times per week).
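A direct sketch of this probability mass function (the Poisson parameter λ = 2.0 below is a hypothetical value for illustration):

```python
import math

def right_truncated_poisson_pmf(y, lam, r):
    """P(y) = (lam^y / y!) / sum_{m=0}^{r} [lam^m / m!], right truncated at r."""
    if not 0 <= y <= r:
        return 0.0
    denom = sum(lam**m / math.factorial(m) for m in range(r + 1))
    return (lam**y / math.factorial(y)) / denom

# All probability mass is redistributed over 0..5 uses per week.
probs = [right_truncated_poisson_pmf(y, 2.0, 5) for y in range(6)]
```

The truncation simply renormalizes the Poisson mass over the observable range, so every probability is inflated relative to the untruncated Poisson.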
Negative Binomial Regression Model
The Poisson distribution restricts the mean and variance to be equal:
E[yi] = VAR[yi].
If this equality does not hold, the data are said to be underdispersed (E[yi] > VAR[yi]) or
overdispersed (E[yi] < VAR[yi]), and the coefficient vector will be biased if corrective
measures are not taken.
To account for cases when E[yi] ≠ VAR[yi], a negative binomial model is used.
The negative binomial model is derived by rewriting the λi equation such that,
λi = EXP(βXi + εi)
where EXP(εi) is a Gamma-distributed error term with mean 1 and variance α2.
The addition of this term allows the variance to differ from the mean as below,
VAR[y_i] = E[y_i]\left[1 + \alpha E[y_i]\right] = E[y_i] + \alpha E[y_i]^2
The Poisson regression model is regarded as a limiting model of the negative binomial
regression model as α approaches zero, which means that the selection between these two
models is dependent upon the value of α.
The parameter α is referred to as the overdispersion parameter.
The negative binomial distribution has the form,
P(y_i) = \frac{\Gamma\left((1/\alpha) + y_i\right)}{\Gamma(1/\alpha)\, y_i!}\left(\frac{1/\alpha}{(1/\alpha) + \lambda_i}\right)^{1/\alpha}\left(\frac{\lambda_i}{(1/\alpha) + \lambda_i}\right)^{y_i}
where Γ(.) is a gamma function. This results in the likelihood function,
L(\lambda_i) = \prod_i \frac{\Gamma\left((1/\alpha) + y_i\right)}{\Gamma(1/\alpha)\, y_i!}\left(\frac{1/\alpha}{(1/\alpha) + \lambda_i}\right)^{1/\alpha}\left(\frac{\lambda_i}{(1/\alpha) + \lambda_i}\right)^{y_i}
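The distribution above corresponds to a standard negative binomial parameterization; a sketch of the mean–variance relation and of the Poisson limit as α → 0, using scipy's negative binomial (with the mapping size n = 1/α, success probability p = n/(n + λ) — a standard translation, stated here as an assumption about notation rather than taken from the notes):

```python
import numpy as np
from scipy.stats import nbinom, poisson

def neg_binomial_pmf(y, lam, alpha):
    # Mean lam, variance lam + alpha*lam^2, via scipy's (n, p) parameterization.
    n = 1.0 / alpha
    return nbinom.pmf(y, n, n / (n + lam))

lam = 3.0
ys = np.arange(15)
# With a tiny overdispersion parameter the pmf should be indistinguishable from Poisson.
pmf_small_alpha = neg_binomial_pmf(ys, lam, 1e-6)
```

With λ = 3 and α = 0.5, for instance, the implied variance is λ + αλ² = 7.5, versus 3 under the Poisson restriction.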
Zero-Inflated Poisson and Negative Binomial Regression Models
Zero events can arise from two qualitatively different conditions.
1. One condition may result from simply failing to observe an event during the observation
period.
2. Another qualitatively different condition may result from an inability to ever experience
an event.
Two states can be present, one being a normal count-process state and the other being a zero-
count state.
A zero-count state may refer to situations where the likelihood of an event occurring is
extremely rare in comparison to the normal-count state, where event occurrence is inevitable
and follows some known count process.
Two aspects of this non-qualitative distinction of the zero state are noteworthy:
1. There is a preponderance of zeroes in the data—more than would be expected under a
Poisson process.
2. A sampling unit is not required to be in the zero or near zero state into perpetuity, and
can move from the zero or near zero state to the normal count state with positive
probability.
Data obtained from two-state regimes (normal-count and zero-count states) often suffer from
overdispersion if considered as part of a single, normal-count state because the number of zeroes
is inflated by the zero-count state.
Zero-inflated Poisson (ZIP)
Assumes that the events, Y = (y1, y2,……,yn), are independent and the model is
y_i = 0 \quad \text{with probability} \quad p_i + (1 - p_i)\, EXP(-\lambda_i)

y_i = y \quad \text{with probability} \quad \frac{(1 - p_i)\, EXP(-\lambda_i)\, \lambda_i^{y}}{y!}, \quad y = 1, 2, 3, \ldots
where y is the number of events per period.
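A minimal sketch of the ZIP probability mass function (the parameter values λ = 2.0 and p = 0.3 are hypothetical, chosen only for illustration):

```python
import math

def zip_pmf(y, lam, p):
    """Zero-inflated Poisson: extra mass p from the zero state,
    Poisson counts with probability 1 - p."""
    poisson_term = math.exp(-lam) * lam**y / math.factorial(y)
    if y == 0:
        return p + (1.0 - p) * poisson_term
    return (1.0 - p) * poisson_term

pmf = [zip_pmf(y, 2.0, 0.3) for y in range(60)]
```

Relative to a plain Poisson with the same λ, the mass at zero is inflated while the probabilities of positive counts are scaled down by (1 − p).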
Zero-inflated negative binomial (ZINB)
regression model follows a similar formulation with events, Y = (y1, y2,……, yn ), being
independent and,
y_i = 0 \quad \text{with probability} \quad p_i + (1 - p_i)\, u_i^{1/\alpha}

y_i = y \quad \text{with probability} \quad (1 - p_i)\, \frac{\Gamma\left((1/\alpha) + y\right)}{\Gamma(1/\alpha)\, y!}\, u_i^{1/\alpha}\left(1 - u_i\right)^{y}, \quad y = 1, 2, 3, \ldots

where u_i = (1/\alpha)\big/\left[(1/\alpha) + \lambda_i\right].
Zero-inflated models imply that the underlying data-generating process has a splitting
regime that provides for two types of zeros.
The splitting process can be assumed to follow a logit (logistic) or probit (normal)
probability process, or other probability processes.
A point to remember is that there must be underlying justification to believe the splitting
process exists (resulting in two distinct states) prior to fitting this type of statistical
model. There should be a basis for believing that part of the process is in a zero-count
state.
To test the appropriateness of using a zero-inflated model rather than a traditional model, Vuong
(1989) proposed a test statistic for non-nested models that is well suited for situations where the
distributions (Poisson or negative binomial) are specified. The statistic is calculated as (for each
observation i),
m_i = LN\left[\frac{f_1\left(y_i \mid X_i\right)}{f_2\left(y_i \mid X_i\right)}\right]
where:
f1(yi|Xi) is the probability density function of model 1, and
f2(yi|Xi) is the probability density function of model 2.
Using this, Vuong's statistic for testing the non-nested hypothesis of model 1 versus model 2
is (Greene, 2000; Shankar et al., 1997),
V = \frac{\sqrt{n}\left[\left(1/n\right)\sum_{i=1}^{n} m_i\right]}{\sqrt{\left(1/n\right)\sum_{i=1}^{n}\left(m_i - \bar{m}\right)^2}} = \frac{\sqrt{n}\,\bar{m}}{S_m}

Where: \bar{m} is the mean of the m_i, and S_m is the standard deviation of the m_i.
Vuong's value is asymptotically standard normal distributed (it is compared with z-values),
and
if |V| is less than Vcritical (1.96 for a 95% confidence level), the test does not support the
selection of one model over the other.
Large positive values of V greater than Vcritical favor model 1 over model 2, whereas large
negative values support model 2.
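The statistic is simple to compute once the fitted densities are in hand. A sketch on synthetic values (the fitted densities below are fabricated so that model 1 consistently outperforms model 2; real values would come from the two estimated models):

```python
import numpy as np

def vuong_statistic(f1, f2):
    """V = sqrt(n) * mean(m) / std(m), with m_i = LN(f1_i / f2_i)."""
    m = np.log(np.asarray(f1)) - np.log(np.asarray(f2))
    return np.sqrt(m.size) * m.mean() / m.std()

# Hypothetical fitted densities for 200 observations, built so model 1 dominates.
rng = np.random.default_rng(7)
f1 = rng.uniform(0.2, 0.4, size=200)
f2 = f1 * np.exp(-rng.normal(0.5, 0.1, size=200))
V = vuong_statistic(f1, f2)   # large and positive, favoring model 1
```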
Because overdispersion will almost always include excess zeros, it is not always easy to
determine whether excess zeros arise from true overdispersion or from an underlying
splitting regime.
This could lead one to erroneously choose a negative binomial model when the correct
model may be a zero-inflated Poisson.
The use of a zero-inflated model may simply be capturing model misspecification that could
result from factors such as unobserved effects (heterogeneity) in the data.
Discrete Outcome Models
Examples of discrete data (unordered):
Mode of travel (automobile, bus, rail transit),
Type or class of vehicle owned, and
Type of a vehicular accident (run-off-road, rear-end, head-on, etc.).
Examples of discrete data (ordered):
telecommuting-frequency data that have outcomes of never, sometimes, and
frequently
In contrast to data that are not ordered, ordinal discrete data possess additional
information on the ordering of responses that can be used to improve the
efficiency of the model’s parameter estimates
Models of Discrete Data
For unordered discrete outcomes, start with a linear function of covariates that influences
specific discrete outcomes.
For example, in the event of a vehicular accident, possible discrete crash outcomes are rear-
end, sideswipe, run-off-road, head-on, turning, and other.
Let Tin be a linear function that determines discrete outcome i for observation n such that,
Tin = βi Xin ,
Where:
βi is a vector of estimable parameters for discrete outcome i,
Xin is a vector of the observable characteristics (covariates) that determine discrete
outcomes for observation n.
To arrive at a statistically estimable probabilistic model, a disturbance term εin is
added, giving
Tin = βi Xin + εin .
Reasons for adding a disturbance term:
1. variables have been omitted from the function (some important data may not be
available),
2. the functional form may be incorrectly specified (it may not be linear),
3. proxy variables may be used (variables that approximate missing variables in the
database),
4. variations in βi that are not accounted for (βi may vary across observations).
To derive an estimable model of discrete outcomes with I denoting all possible outcomes for
observation n, and Pn(i) being the probability of observation n having discrete outcome i (i
∈ I)
Pn(i) = P(Tin ≥ TIn) ∀ I ≠ i .
By substituting for Tin,
Pn(i) = P(βi Xin + εin ≥ βI XIn + εIn) ∀ I ≠ i
or,
Pn(i) = P(βi Xin − βI XIn ≥ εIn − εin) ∀ I ≠ i .
Estimable models are developed by assuming a distribution of the random disturbance
term, ε′s.
Binary and Multinomial Probit Models
Probit models arise when the disturbance term εIn is assumed to be normally distributed.
In the binary case (two outcomes, denoted 1 or 2)
Pn(1) = P(β1 X1n − β2X2n ≥ ε2n − ε1n)
This equation estimates the probability of outcome 1 occurring for observation n, where
ε1n and ε2n are normally distributed with mean = 0, variances σ1² and σ2² respectively,
and covariance σ12.
An attractive feature of normally distributed variates is that the addition or
subtraction of two normal variates also produces a normally distributed variate.
In this case ε2n − ε1n is normally distributed with mean zero and variance
σ1² + σ2² − 2σ12. The resulting cumulative normal function is
P_n(1) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\beta_1 X_{1n} - \beta_2 X_{2n}} EXP\left[-\frac{1}{2}\left(\frac{w}{\sigma}\right)^2\right] dw
If Φ ( ) is the standardized cumulative normal distribution, then
P_n(1) = \Phi\left(\frac{\beta_1 X_{1n} - \beta_2 X_{2n}}{\sigma}\right)

where \sigma = \left(\sigma_1^2 + \sigma_2^2 - 2\sigma_{12}\right)^{0.5}.
The term 1/σ is a scaling of the function determining the discrete outcome and can be set to
any positive value, although σ = 1 is typically used.
General shape of probit outcome probabilities.
The parameter vector (β) is readily estimated using standard maximum likelihood methods.
If δin is defined as being equal to 1 if the observed discrete outcome for observation n is i
and zero otherwise, the likelihood function is
L = \prod_{n=1}^{N} \prod_{i=1}^{I} \left[P_n(i)\right]^{\delta_{in}}
where N is the total number of observations. In the binary case with i = 1 or 2, the log-
likelihood is,
LL = \sum_{n=1}^{N} \left[\delta_{1n}\, LN\, \Phi\left(\frac{\beta_1 X_{1n} - \beta_2 X_{2n}}{\sigma}\right) + \left(1 - \delta_{1n}\right) LN\left(1 - \Phi\left(\frac{\beta_1 X_{1n} - \beta_2 X_{2n}}{\sigma}\right)\right)\right]
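A minimal sketch of maximizing this binary probit log-likelihood on simulated data (the sample, covariates, and "true" parameters are all hypothetical; σ is normalized to 1 as discussed above):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

# Simulated binary outcomes; X holds the differenced covariates.
rng = np.random.default_rng(3)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.2, 0.8])                      # assumed for the simulation
delta = (X @ beta_true + rng.normal(size=n) > 0).astype(float)   # delta_1n

def neg_log_likelihood(beta):
    z = norm.cdf(X @ beta)                 # Phi(beta X / sigma) with sigma = 1
    z = np.clip(z, 1e-12, 1 - 1e-12)       # guard against LN(0)
    return -np.sum(delta * np.log(z) + (1 - delta) * np.log(1 - z))

result = minimize(neg_log_likelihood, np.zeros(2), method="BFGS")
beta_hat = result.x
```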
The problem with the multinomial probit is that the outcome probabilities are not closed
form and estimation of the likelihood functions requires numerical integration.
The difficulties of extending the probit formulation to more than two discrete outcomes have
led researchers to consider other disturbance-term distributions.
Multinomial Logit Model
From a model estimation perspective, a desirable property of an assumed distribution of
disturbances (ε′s) is that the maximums of randomly drawn values from the distribution
have the same distribution as the values from which they were drawn.
The normal distribution does not possess this property (the maximums of randomly drawn
values from the normal distribution are not normally distributed).
A disturbance-term distribution with such a property greatly simplifies model
estimation because it could be applied to the multinomial case by replacing β2X2n with
the highest value (maximum) of all other βI XIn, I ≠ 1.
Distributions of the maximums of randomly drawn values from some underlying distribution
are referred to as extreme value distributions (Gumbel, 1958).
Extreme value distributions are categorized into three families: Type 1, Type 2, and Type 3
(see Johnson and Kotz, 1970).
The most common extreme value distribution is the Type 1 distribution (sometimes
referred to as the Gumbel distribution). It has the desirable property that maximums
of randomly drawn values from the extreme value Type 1 distribution are also extreme
value Type 1 distributed.
The probability density function of the extreme value Type 1 distribution is,
f(\varepsilon) = \eta\, EXP\left[-\eta\left(\varepsilon - \omega\right)\right] EXP\left[-EXP\left(-\eta\left(\varepsilon - \omega\right)\right)\right]
with corresponding distribution function
F(\varepsilon) = EXP\left[-EXP\left(-\eta\left(\varepsilon - \omega\right)\right)\right]
where: η is a positive scale parameter, ω is a location parameter (mode), and the mean is
ω + 0.5772/η.
To derive an estimable model based on the extreme value Type 1 distribution, a revised
version of the probability equation is
P_n(i) = P\left(\beta_i X_{in} + \varepsilon_{in} \geq \max_{\forall I \neq i}\left(\beta_I X_{In} + \varepsilon_{In}\right)\right)
For the extreme value Type 1 distribution, if all εIn are independently and identically
(same variances) distributed random variates with modes ωIn and a common scale
parameter η (which implies equal variances), then the maximum of βI XIn + εIn’s is
extreme value Type 1 distributed with mode
\frac{1}{\eta}\, LN\!\left[\sum_{\forall I \neq i} EXP\left(\eta\, \beta_I X_{In}\right)\right]
and scale parameter η (see Gumbel 1958).
Illustration of an extreme value Type 1 distribution (densities shown for ω = 0 with η = 0.5, 1, and 2).
If εn' is a disturbance term associated with the maximum of all possible discrete
outcomes ≠ i with mode equal to zero and scale parameter η, and
β 'Xn' is the parameter and covariate product associated with the maximum of all possible
discrete outcomes ≠ i, then it can be shown that
\beta' X_n' = \frac{1}{\eta}\, LN\!\left[\sum_{\forall I \neq i} EXP\left(\eta\, \beta_I X_{In}\right)\right]
This result arises because for extreme value Type 1 distributed variates, ε, the addition
of a positive scalar constant say, a, changes the mode from ω to ω + a without affecting
the scale parameter η.
So, if εn' has mode equal to zero and scale parameter η, adding the scalar
(1/η) LN[ Σ∀I≠i EXP(η βI XIn) ] gives an extreme value distributed variate with mode
(β′Xn′) equal to (1/η) LN[ Σ∀I≠i EXP(η βI XIn) ] and scale parameter η.
Using these results, the probability equation is written as,
P_n(i) = P\left(\beta_i X_{in} + \varepsilon_{in} \geq \beta' X_n' + \varepsilon_n'\right)

or,

P_n(i) = P\left(\beta' X_n' - \beta_i X_{in} + \varepsilon_n' - \varepsilon_{in} \leq 0\right)
And, because the difference between two independently distributed extreme value Type 1
variates with common scale parameter η is logistic distributed,
P_n(i) = \frac{1}{1 + EXP\left[\eta\left(\beta' X_n' - \beta_i X_{in}\right)\right]}
rearranging terms,
P_n(i) = \frac{EXP\left(\eta\, \beta_i X_{in}\right)}{EXP\left(\eta\, \beta_i X_{in}\right) + EXP\left(\eta\, \beta' X_n'\right)}
Substituting β′Xn′ = (1/η) LN[ Σ∀I≠i EXP(η βI XIn) ] and setting η = 1 (there is no loss of
generality), the equation becomes

P_n(i) = \frac{EXP\left(\beta_i X_{in}\right)}{EXP\left(\beta_i X_{in}\right) + EXP\left[LN \sum_{\forall I \neq i} EXP\left(\beta_I X_{In}\right)\right]}
or,
P_n(i) = \frac{EXP\left(\beta_i X_{in}\right)}{\sum_{\forall I} EXP\left(\beta_I X_{In}\right)}
which is the standard multinomial logit formulation. For estimation of the parameter vectors
(β’s) by maximum likelihood, the log-likelihood function is,
LL = \sum_{n=1}^{N} \sum_{i=1}^{I} \delta_{in}\left[\beta_i X_{in} - LN \sum_{\forall I} EXP\left(\beta_I X_{In}\right)\right]
where I is the total number of outcomes and δ and all other variables are as previously
defined.
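The multinomial logit probabilities are simple to compute; a sketch with hypothetical βi Xin values for three outcomes (subtracting the maximum before exponentiating is a standard numerical-stability device, not part of the formula itself):

```python
import numpy as np

def mnl_probabilities(V):
    """P(i) = EXP(V_i) / sum_I EXP(V_I); the max is subtracted for numerical stability."""
    e = np.exp(V - np.max(V))
    return e / e.sum()

# Hypothetical beta_i X_in values for three outcomes.
V = np.array([-0.5, -1.2, -0.9])
P = mnl_probabilities(V)   # sums to 1; the outcome with the largest V gets the largest P
```

Note that adding any constant to all of the V's leaves the probabilities unchanged, which is why variables that do not vary across outcomes can enter at most I − 1 of the functions.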
When applying the multinomial logit model it is important to realize that the choice of the
extreme value Type 1 distribution is made on the basis of computational convenience,
although this distribution is similar to the normal distribution.
Figure 11-3: Comparison of binary logit and probit outcome probabilities.
Discrete Data and Utility Theory
From economics, utility (satisfaction) is maximized subject to the prices of the alternatives
and an income constraint.
Because utility theory consists of decision-makers selecting a utility maximizing alternative
based on prices of alternatives and an income constraint, any purchase affects the remaining
income and thus all purchases are interrelated.
Problem: one theoretically cannot isolate specific choice situations.
Restrictions must be placed on utility functions.
To illustrate these, a utility function is defined that is determined by the consumption of m
goods (y1, y2,…, ym) such that
u = f(y1, y2,…, ym)
As an extremely restrictive case it is assumed that the consumption of one good is
independent of the consumption of all other goods. The utility function is then written as
u = f1(y1) + f2(y2) +…..+ fm(ym)
This is referred to as an additive utility function and, in nearly all applications, it is
unrealistically restrictive.
Example: the application of such an assumption implies that the acquisition of two types of
breakfast cereal are independent although it is clear that the purchase of one will affect the
purchase of the other.
A more realistic restriction on the utility function is to separate decisions into groups and to
assume that consumption of goods within the groups is independent of those goods in other
groups.
This is referred to as separability and is an important construct in applied economic theory.
It is this property that permits the focus on specific choice groups such as the choices of
travel mode to work.
Indirect Utility
Normal or direct utility has utility that is maximized subject to an income constraint and this
maximization produces a demand for goods y1, y2,…, ym.
When applying discrete outcome models, the utility function is typically written with prices
and incomes as arguments.
When the utility function is written in this way, the utility function is indirect, and it can be
shown that the relationship between this indirect utility and the resulting demand equation
for some good m is given by Roy's identity
y_m^0 = -\frac{\partial V / \partial p_m}{\partial V / \partial Inc}
Where:
V is the indirect utility,
pm is the price of good m,
Inc is the decision-maker's income, and
ym0 is the utility maximizing demand for good m.
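As a small worked check of Roy's identity, consider a hypothetical linear indirect utility V = b0 + b1·pm + b2·Inc (the coefficient values below are invented for illustration); the identity then gives the demand y_m0 = −b1/b2:

```python
def indirect_utility(p_m, inc, b=(2.0, -0.5, 0.04)):
    # Hypothetical linear indirect utility: V = b0 + b1*p_m + b2*Inc.
    b0, b1, b2 = b
    return b0 + b1 * p_m + b2 * inc

# Numerical partial derivatives at an arbitrary point.
h = 1e-4
p, inc = 3.0, 50000.0
dV_dp = (indirect_utility(p + h, inc) - indirect_utility(p - h, inc)) / (2 * h)
dV_dinc = (indirect_utility(p, inc + h) - indirect_utility(p, inc - h)) / (2 * h)

# Roy's identity: y_m0 = -(dV/dp_m) / (dV/dInc) = -(-0.5)/0.04 = 12.5
demand = -dV_dp / dV_dinc
```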
Applying the utility framework within discrete outcome models is straightforward.
Using the notation above, T becomes the utility determining the choice (as opposed to a
function determining the outcome).
But the derivations of discrete outcome models imply that the model is compensatory.
Changes in factors that determine the function Tin for each discrete outcome do not
matter as long as the total value of the function remains the same.
This is potentially problematic in some utility-maximizing choice situations.
Properties and Estimation of Multinomial Logit Models
Consider a commuter's choice of route from home to work where the choices are to take an
arterial, a two-lane road, or a freeway.
P(a) = \frac{e^{V_a}}{e^{V_a} + e^{V_t} + e^{V_f}}, \quad P(t) = \frac{e^{V_t}}{e^{V_a} + e^{V_t} + e^{V_f}}, \quad P(f) = \frac{e^{V_f}}{e^{V_a} + e^{V_t} + e^{V_f}}
where P(a), P(t) and P(f), are the probabilities that commuter n selects the
arterial, two-lane road and freeway respectively and Va, Vt and Vf are
corresponding indirect utility functions.
Variables defining these functions are classified into two groups:
1. those that vary across outcome alternatives (in route choice, distance
and number of traffic signals)
2. those that do not vary across outcome alternatives (commuter income
and other commuter-specific characteristics such as number of children,
number of vehicles, and age of commuting vehicle).
The distinction between these two sets of variables is important, because the
MNL model is derived using the difference in utilities.
Because of this differencing, estimable parameters relating to variables that do
not vary across outcome alternatives can, at most, be estimated in I-1 of the
functions determining the discrete outcome (I is the total number of discrete
outcomes).
The parameter of at least one of the discrete outcomes must be normalized to
zero to make parameter estimation possible (this is illustrated in a forthcoming
example).
Given these two variables types, the utility functions for Equation 11.26 are
defined as
Va = β1a + β2a Xa + β3a Z
Vt = β1t + β2t Xt + β3t Z ,
Vf = β1f + β2f Xf + β3f Z
Where:
Xa, Xt and Xf are vectors of variables that vary across arterial, two-
lane, and freeway choice outcomes respectively, as experienced by
commuter n,
Z is a vector of characteristics specific to commuter n,
β1's are constant terms,
β2's are vectors of estimable parameters corresponding to outcome-
specific variables in X vectors, and
β3's are vectors corresponding to variables that do not vary across
outcome alternatives.
Note that the constant terms are effectively the same as variables that do not vary
across alternate outcomes and at most are estimated for I-1 of the outcomes.
Statistical Evaluation
To determine if the estimated parameter is significantly different from zero, the t-
statistic is:
t = \frac{\beta - 0}{S.E.(\beta)}
where S.E.(β) is the standard error of the parameter.
Note that because the MNL is derived from an extreme value distribution and not
a normal distribution, the use of t-statistics is not strictly correct although in
practice it is a reliable approximation of the true significance.
A more general and appropriate test is the likelihood ratio test.
The likelihood ratio test statistic is
-2[LL(βR) – LL(βU)]
where LL(βR) is the log-likelihood at convergence of the "restricted" model and
LL(βU) is the log-likelihood at convergence of the "unrestricted" model.
This statistic is χ2 distributed with degrees of freedom equal to the difference in
the numbers of parameters between the restricted and unrestricted models (the
difference in the number of parameters in the βR and βU parameter vectors).
A measure of overall model fit is the ρ2 statistic (similar in purpose to R2 in
regression models). The ρ2 statistic is:
\rho^2 = 1 - \frac{LL(\beta)}{LL(0)}
where LL(β) is the log-likelihood at convergence with parameter vector β and
LL(0) is the initial log-likelihood (with all parameters set to zero).
As is the case with R2 in regression analysis, the disadvantage of the ρ2 statistic is
that it will always improve as additional parameters are estimated even though
the additional parameters may be statistically insignificant.
To account for the estimation of potentially insignificant parameters a corrected
ρ2 is estimated as
\rho^2_{\text{corrected}} = 1 - \frac{LL(\beta) - K}{LL(0)}
where K is the number of parameters estimated in the model.
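The arithmetic of both statistics in one place (the log-likelihood values and parameter count below are hypothetical, chosen only to illustrate the correction):

```python
# Hypothetical values from an MNL estimation.
LL_0, LL_beta, K = -1050.0, -880.0, 7

rho_sq = 1.0 - LL_beta / LL_0                    # uncorrected fit measure
rho_sq_corrected = 1.0 - (LL_beta - K) / LL_0    # penalizes the K estimated parameters
```

Since LL(0) is negative, subtracting K in the numerator always makes the corrected value smaller than the uncorrected one, reflecting the penalty for added parameters.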
Interpretation of Findings
Elasticity
Elasticity is computed from the partial derivative for each observation n (n
subscripting omitted):
E^{P(i)}_{x_{ki}} = \frac{\partial P(i)}{\partial x_{ki}} \times \frac{x_{ki}}{P(i)}
Where:
P(i) is the probability of outcome i
xki is the value of variable k for outcome i.
It is readily shown by taking the partial derivative of the MNL model that
Equation 11.33 becomes
E^{P(i)}_{x_{ki}} = \left[1 - P(i)\right] x_{ki}\, \beta_{ki}
Elasticity values are interpreted as the percent effect that a 1% change in xki has
on the outcome probability P(i).
If the computed elasticity value is less than one, the variable xki is said to be
inelastic and a 1% change in xki will have less than a 1% change in outcome i's
selection probability.
If the computed elasticity is greater than one it is said to be elastic and a 1%
change in xki will have more than a 1% change in outcome i's selection
probability.
Key points:
1. The values are point elasticities and as such are valid only for small changes in xki;
considerable error may be introduced when an elasticity is used to estimate the probability
change caused by a doubling of xki.
2. Elasticities are not applicable to indicator variables
Some measure of the sensitivity of indicator variables is obtained by computing a
pseudo-elasticity. The equation is
E^{P_n(i)}_{x_{ki}} = \frac{EXP\left[\Delta\left(\beta_{ki} x_{ki}\right)\right] \sum_{\forall I} EXP\left(\beta_I X_{In}\right)}{EXP\left[\Delta\left(\beta_{ki} x_{ki}\right)\right] \sum_{I_n} EXP\left(\beta_I X_{In}\right) + \sum_{I \neq I_n} EXP\left(\beta_I X_{In}\right)} - 1 ,
where In is the set of alternate outcomes with xk in the function determining the outcome,
and I is the set of all possible outcomes.
Cross-elasticity
It may be of interest to determine the effect that a variable influencing the probability of
outcome j has on the probability of outcome i.
Example, how a change in the distance on the arterial affects the probability of the two-lane
road being chosen.
Known as a cross-elasticity, this value is computed using the equation
E^{P(i)}_{x_{kj}} = -P(j)\, x_{kj}\, \beta_{kj}
Note that this equation implies that there is one cross-elasticity for all i (i ≠ j). This means
that an increase in distance on the arterial results in an equal percentage increase in the
likelihood that the two-lane and freeway alternatives will be chosen.
This property of uniform cross elasticities is an artifact of the error term independence
assumed in deriving the multinomial logit model.
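The uniform cross-elasticity property can be verified numerically on a hypothetical three-outcome MNL (all utility values, the distance x = 2.0, and its parameter −0.3 below are invented for illustration):

```python
import numpy as np

# Hypothetical beta_I X_In values for arterial, two-lane, and freeway.
bx = np.array([1.0, 0.4, -0.2])
P = np.exp(bx) / np.exp(bx).sum()

# Raise the arterial's distance (hypothetical value 2.0, parameter -0.3) by 1%.
x_arterial, beta_dist = 2.0, -0.3
bx_new = bx + np.array([beta_dist * x_arterial * 0.01, 0.0, 0.0])
P_new = np.exp(bx_new) / np.exp(bx_new).sum()

# Finite-difference elasticities of the OTHER two outcomes' probabilities.
pct_change = (P_new - P) / P / 0.01
# Closed-form cross-elasticity: -P(arterial) * x * beta.
cross_elasticity = -P[0] * x_arterial * beta_dist
```

The two non-arterial outcomes experience exactly the same proportional change, matching the closed-form value: this is the IIA artifact described above.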
Marginal Rates of Substitution (MRS)
Because logit models are compensatory, marginal rates of substitution are computed to
determine the relative magnitude of any two parameters estimated in the model.
In MNL models, this rate is computed simply as the ratio of parameters for any two
variables in question (in this case a and b)
MRS_{ab}(i) = \frac{\beta_{ia}}{\beta_{ib}}
Specification Errors
Independence of Irrelevant Alternatives (IIA) property.
Recall that a critical assumption in the derivation of the multinomial logit model
is that the disturbances (ε′s) are independently and identically distributed.
When this assumption does not hold, a major specification error results.
This problem arises when only some of the functions determining possible
outcomes share unobserved elements (which show up in the disturbances).
If all outcomes shared the same unobserved effects, the problem would self
correct because in the differencing of outcome functions common unobserved
effects would cancel out.
Because the common elements cancel in the differencing, a logit model with only
two outcomes can never have an IIA violation.
To illustrate the IIA problem, note the ratio of any two-outcome probabilities is
independent of the functions determining any other outcome since
\frac{P_1}{P_2} = \frac{EXP\left(\beta_1 X_1\right)}{EXP\left(\beta_2 X_2\right)}
Problem: consider the estimation of a model of choice of travel mode to work where the
alternatives are to take a personal vehicle, a red transit bus, or a blue transit bus.
The red and blue transit buses clearly share unobserved effects that will appear in
their disturbance terms and they will have exactly the same functions (βrbXrb =
βbbXbb) if the only difference in their observable characteristics is their color.
For illustrative purposes, assume that, for a sample commuter, all three modes
have the same value of βiXi (the red bus and blue bus will by construction; assume
also that the costs, times, and other factors that determine the likelihood of the
personal vehicle being chosen work out to the same value as for the buses).
Then the predicted probabilities will give each mode a 33% chance of being
selected.
This is unrealistic since the correct answer should be a 50% chance of taking a
personal vehicle and a 50% chance of taking a bus (both red and blue bus
combined) and not 33.33% and 66.67% respectively as the MNL would predict.
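The red bus/blue bus failure is easy to see numerically (the common utility value 1.0 is arbitrary; any equal values give the same result):

```python
import numpy as np

# Car, red bus, and blue bus with identical utility-function values.
V = np.array([1.0, 1.0, 1.0])
P = np.exp(V) / np.exp(V).sum()   # MNL mechanically assigns each mode 1/3
```

The MNL gives each alternative 1/3, whereas the intuitively correct split is 1/2 for the car and 1/4 for each bus color.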
In most applications the IIA violation is more subtle than in the previous example.
There are a number of statistical tests that can be conducted to test for IIA violations.
One of the more common of these tests was developed by Small and Hsiao (1985). The
procedure is to first split the data randomly into two samples (NA and NB) containing the
same number of observations. Two separate models are then estimated producing
parameter estimates βA and βB. A weighted average of these parameters is obtained from
\beta_{AB} = \left(\frac{1}{\sqrt{2}}\right)\beta_A + \left(1 - \frac{1}{\sqrt{2}}\right)\beta_B
Then, a restricted set of outcomes, D, is created as a sub-sample from the full set of
outcomes. The sample NB is then reduced to include only those observations in which the
observed outcome lies in D.
Two models are estimated with the reduced sample (NB') using D as if it were the entire
outcome set ( B' in superscripting denotes the sample reduced to observations with outcomes
in D).
One model is estimated by constraining the parameter vector to be equal to βAB as computed
above. The second model estimates the unconstrained parameter vector βB'.
The resulting log-likelihoods are used to evaluate the suitability of the model structure by
creating a chi squared statistic with the number of degrees of freedom equal to the number
of parameters in βAB (also the same number as in βB'). This statistic is
χ2 = -2[LLB'(βAB) – LLB'(β B')]
The test is then repeated by interchanging the roles of the NA and NB sub-samples (reducing
the NA sample to observations where the observed outcomes lie in D and proceeding as
before). Using the same notation, Equation 11.39 becomes
βBA = (1/√2)βB + (1 − 1/√2)βA
and the chi-squared statistic is
χ2 = -2[LLA'(βBA) – LLA'(βA')]
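The Small-Hsiao computation can be sketched in code. This is an illustrative sketch, not the original authors' implementation: the parameter vectors and log-likelihood values passed in below are hypothetical, and only the weighting and test-statistic steps given above are implemented.

```python
import numpy as np
from scipy.stats import chi2

def small_hsiao_weight(beta_A, beta_B):
    # Weighted average: beta_AB = (1/sqrt(2))*beta_A + (1 - 1/sqrt(2))*beta_B
    w = 1.0 / np.sqrt(2.0)
    return w * np.asarray(beta_A) + (1.0 - w) * np.asarray(beta_B)

def small_hsiao_stat(ll_restricted, ll_unrestricted, n_params):
    # chi2 = -2*[LL_B'(beta_AB) - LL_B'(beta_B')], df = number of parameters
    stat = -2.0 * (ll_restricted - ll_unrestricted)
    p_value = 1.0 - chi2.cdf(stat, df=n_params)
    return stat, p_value

# Hypothetical parameter vectors and log-likelihood values:
beta_AB = small_hsiao_weight([1.0, -2.0], [0.5, -1.0])
stat, p = small_hsiao_stat(ll_restricted=-520.7, ll_unrestricted=-515.2, n_params=2)
```

A small p-value rejects the hypothesis that the two estimates are the same, suggesting an IIA violation.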
Other Specification Errors
• Omitted variables.
Results in inconsistent estimates of logit model parameters and choice probabilities if
any of the following hold:
1) the omitted variable is correlated with other variables included in the model,
2) the mean values of the omitted variable vary across alternate outcomes and
outcome specific constants are not included in the model, or
3) the omitted variable is correlated across alternate outcomes or has a different
variance in different outcomes.
Because one or more of these conditions are likely to hold, omitting relevant variables is
a serious specification problem.
• Presence of an irrelevant variable.
Estimates of parameter and choice probabilities remain consistent in the presence of an
irrelevant variable but the standard errors of the parameter estimates will increase (loss
of efficiency).
• Disturbances that are not independently and identically distributed (IID).
Dependence among a subset of possible outcomes causes the IIA problem resulting in
inconsistent parameter estimates and outcome probabilities.
Having disturbances with different variances (not identically distributed) also results in
inconsistent parameter estimates and outcome probabilities.
• Random parameter variations.
Standard MNL estimation assumes that the estimated parameters are the same for all
observations.
If parameters actually vary randomly across observations, this assumption gives
inconsistent estimates of parameters and outcome probabilities.
• Correlation between explanatory variables and disturbances and endogenous
variables.
If a correlation exists between X and ε then parameter estimates will be inconsistent.
• Erroneous data.
If erroneous data are used, parameter and outcome probabilities will be incorrectly
estimated (also erroneous).
• State dependence and heterogeneity.
A potential estimation problem can arise in discrete outcome models if information on
previous outcomes is used to determine current outcome probabilities. This may be
capturing:
1. important habitual behavior (state dependence), or
2. residual heterogeneity, which would lead one to observe spurious state
dependence.
Endogeneity in Discrete Outcome Models
One can argue that price is endogenous in a vehicle choice model, for example (as it would
surely be in an aggregate regression model of vehicle demand).
Logic: because the effect of any single observation on vehicle price is infinitesimal, many
have contended that price is exogenous for estimation purposes.
Still, if one were to forecast used-vehicle market shares using the model (using the
summation of individual outcome probabilities as a basis), vehicle prices would need to
be forecasted internally as a function of total vehicle demand.
Authors of recent work have argued that variables such as price still present a
problem in individual discrete outcome models. This is because prices tend to be
higher for products that have attributes that are observed by the consumer and not
by the analyst.
Data Sampling
There are two general types of sampling strategies for collecting data to estimate discrete
outcome models, random and stratified random sampling.
All standard MNL model derivations assume that the data used to estimate the model are
drawn randomly from a population of observations.
Stratified sampling refers to a host of non-random sampling alternatives. The idea of
stratified sampling is that some known population of observations is partitioned into
subgroups (strata) and random sampling is conducted in each of these subgroups.
This type of sampling is particularly useful when one wants to gain information on a
specific group that is a small percentage of the total population of observations (such as
transit riders in the choice of transportation mode or households with incomes exceeding
$200,000 per year).
Note that random sampling is a special case of stratified sampling in which the number of
observations chosen from each stratum is in exact proportion to the size of the stratum in the
population of observations.
Four special cases of stratified sampling
1. Exogenous sampling refers to sampling in which selection of the strata is based on
values of the X (right-hand side, such as income) variables. In such cases, standard
maximum likelihood estimation (treating the sample as though it were a random sample)
is appropriate.
2. Outcome-based sampling may be used to get a sufficient representation of a specific
outcome or may be an artifact of the data-gathering process.
If the proportions of outcomes in the sample are not equal to the proportions of
outcomes in the overall population, an estimation correction must be made.
The correction is straightforward providing that a full set of outcome-specific
constants is specified in the model (because only differences across alternate
outcomes matter, I − 1 constants must be specified, where I is the number of
outcomes).
Under these conditions, standard MNL estimation correctly estimates all
parameters except for the outcome-specific constants. To correct the constant
estimates, each constant must have the following subtracted from it:
LN(SFi / PFi)
Where:
SFi is the fraction of observations having outcome i in the sample
PFi is the fraction of observations having outcome i in the total
population.
3. Enriched sampling is the merging of a random (or random stratified) sample with a
sample of another type.
Example: a random sample of route choices may be merged with a sample of
commuters observed taking one of the routes such as the freeway.
Some types of enriched sampling problems reduce to the same correction used
for the outcome-based samples. Others may result in estimation complications.
4. Double sampling usually refers to the process where information from a random sample
is used to direct the gathering of additional data often targeted at oversampling
underrepresented components of the population.
Estimation of MNL models with double sampling complicates the likelihood
function. The reader is referred to Manski and Lerman (1977), Manski and
McFadden (1981), and Cosslett (1981) for details on estimation alternatives in
sampling and additional sampling strategies.
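The constant correction for outcome-based sampling (case 2 above) is easy to compute once the sample and population outcome fractions are known. A minimal sketch; the mode names and fractions below are hypothetical:

```python
import math

def constant_correction(sample_fracs, population_fracs):
    # LN(SF_i / PF_i): subtract this from each estimated outcome-specific
    # constant after fitting an MNL on an outcome-based sample.
    return {i: math.log(sample_fracs[i] / population_fracs[i])
            for i in sample_fracs}

# Hypothetical case: transit riders are 50% of the sample but 10% of the population.
corr = constant_correction({"transit": 0.5, "car": 0.5},
                           {"transit": 0.1, "car": 0.9})
```

An oversampled outcome gets a positive correction (its constant was inflated), an undersampled outcome a negative one.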
Forecasting and Aggregation Bias
When using logit models for forecasting, there is a potential for bias if averages of X are
used, because of the nonlinearity of the model.
Example of population aggregation bias: the outcome probability evaluated at the
average of x, P(i|xavg), differs from the average of the individual probabilities, P(i|xab).

All bias in estimating outcome shares in a population is eliminated by the equation
Si = ∫X gi(X) h(X) dX
where:
Si is the population share of outcome i,
gi(X) is the functional representation of the model,
h(X) is the distribution of model variables over the population.
The problem in applying this equation is that h(X) is not completely known.
However, there are four common approximations to this equation:
1. Sample enumeration.
This procedure involves using the same sample that was used to estimate
the model to predict the population probabilities.
Outcome probabilities for all observations in the sample are computed
and these probabilities are averaged to approximate the true population
outcome probabilities for each alternative.
2. Density functions.
Density functions fitted to the x's can be used to approximate the
equation.
The advantage of this approach is that it has great flexibility in applying
results to different populations.
The disadvantages are that the functions are often difficult to construct on
theoretical or empirical grounds and it is difficult to capture covariances
among x's.
3. Distribution moments.
This approach attempts to describe h(X) by considering moments and
cross moments to represent the spread and shape of variable distributions
and their interactions in the population.
The major limitation of this approach is the gathering of theoretical
and/or empirical information to support the representation of moments
and cross moments.
4. Classification.
The classification approach attempts to categorize the population into
nearly homogeneous groups and use averages of x's from these groups.
This approach is easy to apply but the assumption of homogeneity among
population groups is often dubious and can introduce considerable error.
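Sample enumeration (approximation 1 above) can be sketched directly. This is an illustrative sketch; the sample data and parameter values below are hypothetical:

```python
import numpy as np

def mnl_probs(X, betas):
    # MNL probabilities: X is (n_obs, n_vars), betas is (n_outcomes, n_vars).
    v = X @ betas.T
    ev = np.exp(v - v.max(axis=1, keepdims=True))  # numerically stabilized
    return ev / ev.sum(axis=1, keepdims=True)

def sample_enumeration_shares(X, betas):
    # Approximate population shares S_i by averaging the predicted
    # probabilities over the estimation sample.
    return mnl_probs(X, betas).mean(axis=0)

# Hypothetical sample and parameter values:
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
betas = np.array([[0.5, -1.0, 0.2],
                  [0.0, 0.3, -0.4],
                  [-0.2, 0.1, 0.6]])
shares = sample_enumeration_shares(X, betas)
```

Averaging individual probabilities (rather than evaluating the model at average X) is exactly what avoids the aggregation bias discussed above.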
Transferability
A concern with all models is whether or not their estimated parameters are transferable
spatially (among regions or cities) or temporally (over time).
From a spatial perspective, transferability is desirable because it means that parameters of
models estimated in other places can be used, thus saving the cost of additional data
collection and estimation.
Temporal transferability ensures that forecasts made with the model have some validity in
that the estimated parameters are stable over time.
When testing spatial and temporal transferability, likelihood ratio tests are applied,
χ2 = −2[LL(βT) − LL(βa) − LL(βb)]
where:
LL(βT) is the log-likelihood at convergence of the model estimated
with the data from both regions (or both time periods),
LL(βa) is the log-likelihood at convergence of the model using region
a data, and
LL(βb) is the log-likelihood at convergence of the model using region
b data.
In this test the same variables are used in all three models (total model, region a model,
and region b model). This statistic is χ2 distributed with degrees of freedom equal to the
summation of the number of estimated parameters in all regional models (a and b in this
case but additional regions can be added to this test) minus the number of estimated
parameters in the overall model.
The resulting χ2 statistic provides the probability that the models have different
parameters. Alternatively, one could conduct the following test,
χ2 = −2[LL(βba) − LL(βa)]
where
LL(βba) is the log-likelihood at convergence of a model using the
converged parameters from region b (using only region b's data) on
region a's data (restricting the parameters to be region b's estimated
parameters),
LL(βa) is the log-likelihood at convergence of the model using region
a data.
This test can also be reversed using LL(βab) and LL(βb).
The statistic is χ2 distributed with the degrees of freedom equal to the number of
estimated parameters in βba and the resulting χ2 statistic provides the probability
that the models have different parameters.
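The first transferability test can be sketched as a simple likelihood-ratio calculation. The log-likelihood values and parameter counts below are hypothetical:

```python
from scipy.stats import chi2

def transferability_test(ll_total, ll_regions, k_total, k_regions):
    # chi2 = -2*[LL(beta_T) - sum_r LL(beta_r)]
    # df = (sum of regional parameter counts) - (pooled parameter count)
    stat = -2.0 * (ll_total - sum(ll_regions))
    df = sum(k_regions) - k_total
    return stat, df, 1.0 - chi2.cdf(stat, df)

# Hypothetical log-likelihoods for a pooled model and two regional models,
# each estimated with 6 parameters:
stat, df, p = transferability_test(-1520.0, [-740.0, -770.0], 6, [6, 6])
```

A small p-value indicates the regional models have different parameters, i.e., the model is not transferable.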
The Nested Logit Model (Generalized Extreme Value Models)
To overcome the IIA problem, the idea behind a nested logit model is to group alternate
outcomes suspected of sharing unobserved effects into nests (this sharing sets up the
disturbance term correlation that violates the derivation assumption).
Because the outcome probabilities are determined by differences in the functions
determining these probabilities (both observed and unobserved), shared unobserved effects
will cancel out in each nest providing that all alternatives in the nest share the same
unobserved effects.
This canceling out will not occur if a nest (group of alternatives) contains some alternative
outcomes that share unobserved effects and others that do not (this sets up an IIA violation
in the nest).
Suppose it is suspected that the arterial and two-lane road share unobserved elements (both
being lower-level roads relative to the freeway, with no access control and lower design
speeds). When developing a nested structure to deal with the suspected disturbance-term
correlation, the structure shown visually in the Figure is used.
By grouping the arterial and two-lane road in the same nest their shared unobserved
elements cancel.
[Figure: nested structure. The upper level splits into freeway and non-freeway; the
non-freeway nest splits into arterial and two-lane.]
Mathematically, McFadden (1981) has shown the GEV disturbance assumption leads to the
following model structure for observation n choosing outcome i
Pn(i) = EXP[βi Xin + φi LSin] / Σ∀I EXP[βI XIn + φI LSIn]

Pn(j|i) = EXP[βj|i Xjn] / Σ∀J EXP[βJ|i XJn]

LSin = LN[ Σ∀J EXP(βJ|i XJn) ]
where
Pn(i) is the unconditional probability of observation n having discrete
outcome i,
Χ's are vectors of measurable characteristics that determine the
probability of discrete outcomes,
β's are vectors of estimable parameters,
Pn(j|i) is the probability of observation n having discrete outcome j
conditioned on the outcome being in outcome category i (for the nested
structure shown in the Figure, the outcome category i would be
non-freeway, and Pn(j|i) would be the binary logit model of the choice
between the arterial and two-lane road),
J is the conditional set of outcomes (conditioned on i),
I is the unconditional set of outcome categories (the upper two branches
of the Figure),
LSin is the inclusive value (logsum), and
φi is an estimable parameter.
Note that this equation system implies that the unconditional probability of having outcome j is,
Pn(j) = Pn(i) × Pn(j|i)
Estimation of a nested model is usually done in a sequential fashion.
1. Estimate the conditional model using only the observations in the sample that are
observed to have discrete outcomes in J. In the example illustrated in the Figure,
this is a binary model of commuters observed taking the arterial or the two-lane road.
2. Once these estimation results are obtained, the logsum (the natural log of the
denominator of one or more of the conditional models) is calculated for all
observations, both those selecting J and those not (for all commuters in our example case).
3. These computed logsums (in our example there is just one logsum) are used as
independent variables in the upper-level (unconditional) functions. Note that not all
unconditional outcomes need to have a logsum in their respective functions (the
example shown in the Figure would only have a logsum present in the function for
the non-freeway choice).
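The equation system above can be sketched numerically for the two-level route example. The function name, utilities, and logsum parameter below are hypothetical:

```python
import numpy as np

def nested_logit_probs(v_upper, v_lower, phi):
    # Two-level nested logit. v_upper[i]: utility of nest i's own variables;
    # v_lower[i]: array of lower-level utilities (empty for a degenerate nest);
    # phi[i]: logsum parameter. Returns P(j) = P(i) * P(j|i) for each nest.
    logsums = {i: (np.log(np.exp(v_lower[i]).sum()) if len(v_lower[i]) else 0.0)
               for i in v_upper}
    num = {i: np.exp(v_upper[i] + phi[i] * logsums[i]) for i in v_upper}
    denom = sum(num.values())
    p_nest = {i: num[i] / denom for i in v_upper}
    p_cond = {i: (np.exp(v_lower[i]) / np.exp(v_lower[i]).sum()
                  if len(v_lower[i]) else np.array([1.0]))
              for i in v_upper}
    return {i: p_nest[i] * p_cond[i] for i in v_upper}

# Hypothetical utilities: freeway vs. non-freeway, with arterial and
# two-lane road nested under non-freeway.
probs = nested_logit_probs(
    v_upper={"freeway": 0.4, "non-freeway": 0.0},
    v_lower={"freeway": np.array([]), "non-freeway": np.array([-0.2, -0.5])},
    phi={"freeway": 1.0, "non-freeway": 0.6})
```

The unconditional probabilities over all three routes sum to one, and the freeway nest is degenerate (its conditional probability is one).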
Caution needs to be exercised when using the sequential estimation procedure described
above because it results in variance-covariance matrices that are too small and thus
t-statistics that are inflated (typically by about 10-15%). This problem is resolved by
estimating the entire model at once using full information maximum likelihood.
It is important to note that the interpretation of the estimated parameter associated with
logsums (φi's) has the following important elements:
1. φi's must be greater than 0 and less than 1 in magnitude to be consistent with the nested
logit derivation.
2. If φi = 1, the assumed shared unobserved effects in the nest are not significant and the
nested model reduces to a simple MNL. Test with:
t = (1 − φi) / S.E.(φi)
3. If φi is less than zero then factors increasing the likelihood of an outcome being chosen
in the lower nest will decrease the likelihood of the nest being chosen.
4. If φi is equal to zero then changes in nest outcome probabilities will not affect the
probability of nest selection and the correct model is recursive.
Special Properties of Logit Models
1. Sub-sampling of alternate outcomes for model estimation
There are many discrete outcome situations where the number of alternate
outcomes is very large.
The independently and identically distributed extreme value distribution used in the
derivation of the multinomial logit model permits consistent estimation of model
parameters using a sub-sample of the available outcome set.
The estimation procedure is one of reducing the set of outcomes available to each
observation to a manageable size.
In doing this, one must include the outcome observed for each observation in the
estimation sample and this is supplemented with additional outcome possibilities
that are selected randomly from the complete outcome set (a different set of
randomly chosen outcomes is generated for each observation).
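The choice-set reduction described above can be sketched as follows; the alternative names and set sizes are hypothetical:

```python
import random

def estimation_choice_set(chosen, full_outcome_set, n_extra, seed=None):
    # Keep the observed outcome and add n_extra alternatives drawn randomly
    # (without replacement) from the remaining outcomes.
    rng = random.Random(seed)
    others = [a for a in full_outcome_set if a != chosen]
    subset = [chosen] + rng.sample(others, n_extra)
    rng.shuffle(subset)
    return subset

# Hypothetical 500-alternative outcome set (e.g., used-vehicle makes/models):
full_set = [f"vehicle_{k}" for k in range(500)]
cs = estimation_choice_set("vehicle_42", full_set, n_extra=9, seed=1)
```

A fresh draw would be made for each observation, always retaining that observation's observed outcome.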
2. Compensating Variation
When logit models are coupled with the theory of utility maximization, the
denominator is used to compute important welfare effects.
The basis for this calculation is the concept of compensating variation (CV), which is
the (hypothetical) amount of money that individuals would be paid (or debited)
to make them as well off after a change in X as they were prior to the change in
X.
The compensating variation for each observation n is
CVn = (1/λ)[ LN( Σ∀I EXP(βI XfIn) ) − LN( Σ∀I EXP(βI XoIn) ) ]
where
λ is the marginal utility of income,
Xo refers to initial values of X,
Xf refers to final values of X (after a policy change),
all other terms are as defined previously.
In most applications the marginal utility of income is equal in magnitude but
opposite in sign to the cost parameter associated with alternate outcomes—the
cost parameter estimated in the discrete outcome models.
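The CV calculation is a difference of logsums scaled by the marginal utility of income. A minimal sketch; the utilities and the 0.05/$ marginal utility below are hypothetical:

```python
import numpy as np

def compensating_variation(v_initial, v_final, marginal_utility_income):
    # CV_n = (1/lambda) * [LN(sum_I EXP(V_I^f)) - LN(sum_I EXP(V_I^o))]
    logsum_f = np.log(np.exp(v_final).sum())
    logsum_o = np.log(np.exp(v_initial).sum())
    return (logsum_f - logsum_o) / marginal_utility_income

# Hypothetical two-outcome example: the second outcome's utility improves;
# lambda is taken as the magnitude of an assumed cost parameter (0.05 per $).
cv = compensating_variation(np.array([-1.0, -1.5]),
                            np.array([-1.0, -1.0]), 0.05)
```

A positive CV means the individual is better off after the change in X.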
Models of Ordered Discrete Data
In many transportation applications discrete data are ordered.
Examples:
quantitative ratings (on a scale from 1 to 10 rate the following),
ordered opinions (do you disagree, are neutral, or agree), or
categorical data (property damage only crash, injury crash, and fatal crash).
While these response data are discrete, the standard or nested multinomial discrete
models presented earlier do not account for the ordinal nature of the discrete data, and
thus the ordering information is lost.
Ordered probability models are derived by defining an unobserved variable, z, that is
used as a basis for modeling the ordinal ranking of data.
z = βX + ε
where:
X is a vector of variables determining the discrete ordering for
observation n,
β is a vector of estimable parameters,
ε is a random disturbance.
Using this equation, observed ordinal data, y, for each observation are defined as,
y = 1 if z ≤ μ0
y = 2 if μ0 < z ≤ μ1
y = 3 if μ1 < z ≤ μ2
y = ...
y = I if z ≥ μI -1 ,
where μ's are estimable parameters (referred to as thresholds) that define y,
which corresponds to integer ordering, and I is the highest integer ordered
response.
Note that during estimation, non-numerical orderings such as never, sometimes,
and frequently are converted to integers (for example, 1, 2, and 3) without loss of
generality.
The μ's are parameters that are estimated jointly with the model parameters (β).
The estimation problem then becomes one of determining the probability of I
specific ordered responses for each observation n.
This determination is accomplished by making an assumption on the distribution
of ε.
If ε is assumed to be normally distributed across observations with mean = 0 and
variance = 1, an ordered probit model results with the ordered selection
probabilities being
P(y = 1) = Φ(−βX)

P(y = 2) = Φ(μ1 − βX) − Φ(−βX)

P(y = 3) = Φ(μ2 − βX) − Φ(μ1 − βX)

...

P(y = I) = 1 − Φ(μI−1 − βX)
where Φ(.) is the cumulative normal distribution,
Φ(u) = (1/√(2π)) ∫−∞u EXP(−w2/2) dw
The threshold μ0 is set equal to zero without loss of generality (this implies that one need
only estimate I − 2 thresholds).

[Figure: illustration of an ordered probability model with μ0 = 0; the density f(ε) is
partitioned at −βX, μ1 − βX, μ2 − βX, and μ3 − βX into regions corresponding to y = 1
through y = 5.]
Note that the figure shows that a positive value of βk implies that an increase in xk will
unambiguously increase the probability that the highest ordered discrete category results (y
= 5 in the Figure) and unambiguously decrease the probability that the lowest ordered
discrete category results (y = 1 in the Figure).
[Figure: illustration of an ordered probability model with an increase in βX (with μ0 = 0);
the thresholds −βX, μ1 − βX, μ2 − βX, and μ3 − βX shift, changing the areas for y = 1
through y = 5.]
The problem with ordered probability models is associated with the interpretation of
intermediate categories (y = 2, y = 3, and y = 4 in the Figure).
Depending on the location of the thresholds, it is not necessarily clear what effect a positive
or negative βk has on the probabilities of these "interior" categories.
This difficulty arises because the areas between the shifted thresholds may yield increasing
or decreasing probabilities after shifts to the left or right.
Discrete/Continuous Models
Interrelated discrete and continuous data. Examples:

1. consumers’ choice of the type of vehicle to own (discrete) and the number of
kilometers to drive it (continuous),
2. choice of route (discrete) and driving speed (continuous), and
3. choice of trip-generating activity (discrete) and duration in the activity
(continuous).
Interrelated discrete/continuous data can easily be overlooked and are sometimes
difficult to identify.
The Discrete/Continuous Modeling Problem
Interrelated discrete/continuous data present a problem of selectivity, with observed data
being an outcome of a selection process that results in a non-random sample of data
in observed discrete categories.
[Figure: scatter of commute speeds sn against βf Xn. '+' points are observed freeway
users; '−' points are the unobserved speeds of non-freeway users had they driven the
freeway. 'line 1' is the biased fit through the '+' points only; 'line 2' is the true
relationship through all points.]
Example: Consider the estimation of a regression model of average travel speed on the freeway
route from work to home,
sfn = βf Xn + ξfn
where
sfn is the average speed of commuter n on the freeway,
Xn is a vector of commuter n characteristics influencing average travel speed,
βf is a vector of estimable coefficients and
ξfn represents unobserved characteristics influencing travel speed.
In the figure, commuter data indicated by a ‘+’ represents the data of observed freeway users
and commuter data indicated by a ‘-’ represents the unobserved speeds of non-freeway users
had they chosen to drive on the freeway.
Because freeway users are a self-selected group (for whatever reasons) of faster drivers (with
faster drivers being more likely to choose the freeway), they are underrepresented at lower
values of βf Xn and overrepresented at higher values of βf Xn.
If the speed equation is estimated on the observed data only (observed freeway users), the
resulting estimates are biased, as indicated by “line 1” in the figure. The true equation
of freeway speed with all data (observed and unobserved) is given by “line 2”.
Econometric Corrections: Instrumental Variables and Expected Value Method
Revise the speed equation such that the average travel speed to work on commuter n’s
chosen route is,
sn = β Xn + α Zn + ξn
where
ns is the average speed of commuter n on commuter n’s chosen route,
Xn is a vector of commuter n characteristics influencing average travel speed that
are not a function of the route chosen (e.g. such as income, driver age),
Zn is a vector of characteristics commuter n faces that influence average travel
speed that are a function of the route chosen (such as number of traffic signals
and travel distance),
β and α are corresponding vectors of estimable coefficients and
ξn represents unobserved characteristics influencing travel speed.
Direct estimation of this equation would result in biased and inefficient coefficient
estimates because Zn is endogenous due to the discrete/continuous interrelation between
travel speed and route choice.
This is because, as a commuter’s preferred speed increases, elements in the Zn vector
will change, since the likelihood of selecting a specific route is interrelated with speed
preferences.
As was the case in simultaneous equations models that involve all continuous variables
(see Chapter 5), one could replace the elements of Zn with estimates derived from
regressing Zn against all exogenous variables.
The procedure consists of estimating regression equations for all elements of Zn and
using the regression-predicted values, Ẑn, to estimate the speed equation such that,
sn = β Xn + α Ẑn + ξn
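The first stage of this instrumental variables procedure can be sketched with ordinary least squares; the data below are hypothetical:

```python
import numpy as np

def iv_predicted_z(exog, Z):
    # First stage: regress each endogenous column of Z on all exogenous
    # variables; return the fitted values Z_hat for use in the speed equation.
    coef, *_ = np.linalg.lstsq(exog, Z, rcond=None)
    return exog @ coef

# Hypothetical data: one endogenous route attribute, two exogenous variables.
rng = np.random.default_rng(3)
exog = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
Z = exog @ np.array([[1.0], [0.5], [-0.3]]) + rng.normal(scale=0.1, size=(100, 1))
Z_hat = iv_predicted_z(exog, Z)
```

By construction the fitted values Ẑn are functions of exogenous variables only, and the first-stage residuals are orthogonal to them.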
An alternative to the instrumental variables approach is to interact the endogenous variables
directly with the corresponding discrete-outcome model (the route-choice model in this case).
The most common approach is to replace the elements in the endogenous variable vector Zn with
their expected values.
E(zjn) = Σ∀i Pn(i) zjn
where Pn(i) is the predicted probability of commuter n selecting discrete outcome i, as
determined by a discrete outcome model.
The Equation then becomes,
sn = β Xn + α Σ∀j Σ∀i Pn(i) zjn + ξn
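The expected-value replacement is a probability-weighted average of the route attributes. A minimal sketch; the routes, attributes, and probabilities below are hypothetical:

```python
import numpy as np

def expected_route_attributes(probs, z_by_route):
    # E(z_n) = sum_i P_n(i) * z_i: replace endogenous route attributes with
    # their expectation over the route-choice probabilities.
    return np.asarray(probs) @ np.asarray(z_by_route)

# Hypothetical three routes; columns are (number of signals, distance in km):
z = [[12.0, 8.0],
     [4.0, 10.0],
     [0.0, 14.0]]
ez = expected_route_attributes([0.5, 0.3, 0.2], z)
```

The probabilities Pn(i) would come from an estimated discrete outcome (route choice) model.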
Econometric Corrections: Selectivity-Bias Correction Term
Another popular approach to resolve selectivity bias problems and arrive at unbiased coefficient
estimates is to develop an expression for a selectivity-bias correction term.
In the context of the example problem, this is done by noting that average travel speed, s, for
commuter, n, from home to work can be written as,
E(sn | i) = βi Xn + E(ξn | i)
where
E(sn|i) is the average commute speed of commuter n conditional on the chosen route
i,
Xn is a vector of commuter n characteristics influencing average travel speed,
βi is a vector of estimable coefficients and
E(ξn|i) is the conditional expectation of the unobserved characteristics.
Note that variables specific to i (Zn in the speed equation above) are omitted from the
right-hand side of this equation.
Application of this equation provides bias-corrected and consistent estimates of βi because the
conditional expectation of ξn, E(ξn|i), accounts for the nonrandom observed commute speeds
that are selectively biased by commuters’ self-selected choice of route.
The problem then becomes one of deriving a closed-form representation of E(ξn|i) that can be
used for equation estimation.
With a multinomial logit model, let γ denote a vector of discrete-outcome disturbance terms (ε1,
ε2, ε3, …εJ) where J is the total number of discrete outcomes.
The conditional expectation of ξ (conditioned on discrete outcome i) is written as,
E(ξ | i) = (1/Pi) ∫γ E(ξ | γ) [ Πj=1J f(εj) ] dγ
where Pi is the probability of discrete outcome i.
If it is assumed that γ is generalized extreme value distributed, with σ2 being the unconditional
variance of ξ and ρi being the correlation of ξ and the resulting discrete-outcome logistic error
terms (resulting from the differencing of εi − εj), then
E(ξn | i) = −(σ√6 ρi / π) [ Σj≠iJ ( Pj LN(Pj) / (1 − Pj) ) + LN(Pi) ]
Using this equation, selectivity bias in discrete/continuous models is corrected by undertaking
the following 3 steps:
1. Estimate a multinomial logit model to predict the probabilities of discrete outcomes i for
each observation.
2. Use the logit-predicted outcome probabilities to compute the portion of the
conditional-expectation equation in large brackets ([.]) for each observation.
3. Use the values computed in step 2 to estimate the conditional speed equation using
standard least-squares regression methods. The term −σ√6ρi/π becomes a single
estimable parameter.
The speed equation is estimated for each route i as:

sin = βi Xn + αi λn + ηn

where αi = −σ√6ρi/π, λn = Σj≠iJ [ Pj LN(Pj) / (1 − Pj) ] + LN(Pi), and ηn is a
disturbance term. It is common practice to not include ρi in the summation over discrete
outcomes J.
By doing this, an equality restriction on the correlations between ξn and the εi − εj’s is
imposed. This restriction is relaxed by moving ρi within the summation, making it necessary
to estimate a total of J − 1 selectivity-bias coefficients (α’s) for each continuous equation
corresponding to discrete outcome i.
However, it has been shown empirically by Hay (1980) and Mannering (1986a) that this
restriction on ρi is reasonable.
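The selectivity-bias correction term λn (step 2 of the procedure) can be sketched directly from logit-predicted probabilities; the probabilities below are hypothetical:

```python
import numpy as np

def selectivity_correction(probs, i):
    # lambda_n = sum_{j != i} [P_j * LN(P_j) / (1 - P_j)] + LN(P_i),
    # computed from the MNL-predicted outcome probabilities.
    probs = np.asarray(probs, dtype=float)
    lam = np.log(probs[i])
    for j, pj in enumerate(probs):
        if j != i:
            lam += pj * np.log(pj) / (1.0 - pj)
    return lam

# Hypothetical predicted probabilities for three routes; chosen route i = 0:
lam = selectivity_correction([0.5, 0.3, 0.2], i=0)
```

This λn enters the continuous (speed) equation as a regressor whose coefficient αi absorbs −σ√6ρi/π.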
Discrete/continuous Model Structures
Reduced form approach
A common way to implement a reduced form is to start with the discrete model.
Let Tin be a linear function that determines discrete outcome i for observation n and
let yin be the corresponding continuous variable in the discrete/continuous modeling
system.
Then,
Tin = βi Xin + φi yin + εin
where βi is a vector of estimable coefficients for discrete outcome i,
Xin is a vector of the observable characteristics (covariates) that determines
discrete outcomes for observation n, and
εin is a disturbance term.
Let the corresponding continuous equation be the linear function,
yin = θi Win + νin
where θi is a vector of estimable coefficients for the continuous variable observed
for discrete outcome i,
Win is a vector of the observable characteristics (covariates) that determine yin,
and
νin is a disturbance term.
This Equation is estimated using ordinary least squares with appropriate selectivity
bias correction (such as adding a selectivity bias correction term).
For estimation of the discrete outcome portion of the discrete/continuous process,
note that yin is endogenous because yin changes with changing Tin due to the
interrelated discrete/continuous structure. So, substituting,
Tin = βi Xin + φiθi Win + φiνin + εin
With this Equation, a discrete outcome model is derived readily.
For example, if εin’s are assumed to be generalized extreme value distributed, a
multinomial logit model results with the probability of observation n having outcome
i as,
Pn(i) = EXP(βi Xin + φi θi Win) / Σ∀I EXP(βI XIn + φI θI WIn)
Note that because the term φiνin does not vary across outcomes i, it cancels out and
does not enter the logit model structure.
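As a quick numerical sketch of this reduced-form logit (all coefficient and covariate values below are illustrative, not from the notes), the choice probabilities are built from the systematic utilities βi Xin + φiθi Win:

```python
import math

# Hypothetical values of beta_i*X_in, phi_i, and theta_i*W_in for a
# three-alternative example (illustrative numbers only).
beta_X  = [0.5, -0.2, 0.1]
phi     = [0.3, 0.3, 0.3]
theta_W = [1.0, 2.0, 0.5]

# Reduced-form systematic utility: beta_i X_in + phi_i theta_i W_in
v = [bx + p * tw for bx, p, tw in zip(beta_X, phi, theta_W)]

# Multinomial logit probabilities
denom = sum(math.exp(vi) for vi in v)
probs = [math.exp(vi) / denom for vi in v]
print(probs)  # the three probabilities sum to one
```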
Economic consistency approach
Using utility maximization, note that,
yin0 = -(∂Vin/∂pin) / (∂Vin/∂Incn)
where
Vin is the indirect utility of discrete alternative i to consumer n,
pin is the consumer’s unit price of consuming i,
Incn is the decision-maker's income, and
yin0 is the utility maximizing demand for i.
The discrete/continuous link is made by either specifying the indirect utility function, Vin, or the commodity demand yin.
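Roy's identity can be illustrated numerically with a simple hypothetical indirect utility function V(Inc, p) = a·LN(Inc) − b·LN(p) (not the one used later in these notes), whose implied demand is y = (b/a)(Inc/p); finite differences recover the same demand:

```python
import math

# Hypothetical indirect utility: V(Inc, p) = a*LN(Inc) - b*LN(p)
a, b = 1.0, 2.0
V = lambda inc, p: a * math.log(inc) - b * math.log(p)

inc, p = 50000.0, 2.0
hp, hi = 1e-6, 1.0                      # finite-difference step sizes

# Numerical partial derivatives of V
dV_dp   = (V(inc, p + hp) - V(inc, p - hp)) / (2 * hp)
dV_dinc = (V(inc + hi, p) - V(inc - hi, p)) / (2 * hi)

# Roy's identity: y0 = -(dV/dp)/(dV/dInc)
y_roy    = -dV_dp / dV_dinc
y_closed = (b / a) * inc / p            # closed-form demand for this V
print(y_roy, y_closed)
```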
To develop an economically consistent model structure, for vehicle type choice (discrete)
and vehicle utilization (continuous), a utilization equation is specified as (alternatively an
indirect utility function could be specified first),
yin = βi Xin + αi Zin + κ(Incn − πrnpin) + νin
where
yin is the annual utilization (for example, kilometers per year) of vehicle i,
Xin is a vector of consumer characteristics that determine vehicle utilization,
Zin is a vector of vehicle characteristics that determine vehicle utilization,
Incn is consumer n's annual income,
rn is the expected annual vehicle utilization for consumer n,
pin is the consumer’s unit price of utilization (for example, dollars per
kilometer driven),
βi, αi, κ, and π are estimable coefficients, and νin is a disturbance term.
The expected utilization, rn, is needed to capture the income effect of consumption (Incn −
πrnpin) and is determined exogenously by regressing observed utilization against exogenous variables.
With this equation, Roy's identity is applied as a partial differential equation,
(∂Vin/∂Incn) yin0 + ∂Vin/∂pin = 0

and solving for Vin gives,

Vin = [βi Xin + αi Zin + κ(Incn − πrnpin) + νin]EXP(-κpin) + εin
where εin is a disturbance term added for discrete-outcome model estimation.
If εin’s are assumed to be generalized extreme value distributed, a logit model for discrete
outcomes results.
The major drawback with the economically consistent approach is that a nonlinear form of
either the indirect utility function or continuous equation results.
In choosing between the reduced-form and economically consistent approaches, many
applications involve a trade-off between ease of estimation and theoretical consistency.
Duration Models
Duration models are used for data such as the elapsed time until the occurrence of an event or the duration of an event itself.
Examples include:
time until a vehicle accident occurs,
time between vehicle purchases,
the time devoted to an activity (shopping, recreational, etc.),
the time until the adoption of new transportation technologies.
Duration data are usually continuous and can be modeled using least-squares regression.
The use of estimation techniques that are based on hazard functions, however, can often
provide additional insights into the underlying duration problem.
Hazard-Based Duration Models
Hazard-based models are applied to study the conditional probability of a time duration ending
at some time t, given that the duration has continued until time t.
Developing hazard-based duration models begins with the cumulative distribution function,
F(t) = P(T < t)
Where:
P denotes probability,
T is a random time variable, and
t is some specified time.
The density function corresponding to this distribution function (the first derivative of the
cumulative distribution with respect to time) is
f(t) = dF(t)/dt
and the hazard function is
h(t) = f(t)/[1 - F(t)]
where: h(t) is the conditional probability that an event will occur between time t and t
+ dt, given that the event has not occurred up to time t.
h(t) gives the rate at which event durations are ending at time t, given that the event
duration has not ended up to time t.
The survivor function, which provides the probability that a duration will be greater than or
equal to some specified time, t, is also frequently used in hazard analyses for interpretation of
results. The survivor function is
S(t) = P(T ≥ t)
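The relationships among F(t), f(t), h(t), and S(t) can be checked directly for the exponential distribution (an illustrative rate value):

```python
import math

# Exponential duration distribution with rate lam (illustrative value)
lam = 0.5
t = 2.0

F = 1 - math.exp(-lam * t)    # cumulative distribution F(t) = P(T < t)
f = lam * math.exp(-lam * t)  # density f(t) = dF(t)/dt
S = 1 - F                     # survivor function S(t) = P(T >= t)
h = f / (1 - F)               # hazard h(t) = f(t)/[1 - F(t)]

print(h)  # constant and equal to lam for the exponential
```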
Graphically, hazard, density, cumulative distribution, and survivor functions are illustrated in the
Figure.
(Figure: the cumulative distribution F(t), density f(t), hazard h(t), and survivor S(t) functions plotted against time t.)
The slope of the hazard function (the first derivative with respect to time) captures Duration
Dependence:
(Figure: four hazard functions, h1(t) through h4(t), plotted against duration time t.)
Consider a driver’s probability of having an accident and the length of time without having an
accident.
1. The first hazard function, h1(t), has dh1(t)/dt < 0 for all t. This hazard is monotonically
decreasing in duration, implying that the longer drivers go without having an accident, the less
likely they are to have one soon.
2. The second hazard function is nonmonotonic, with dh2(t)/dt > 0 or
dh2(t)/dt < 0 depending on the length of duration t. In this case accident probabilities
first increase and then decrease with duration.
3. The third hazard function has dh3(t)/dt > 0 for all t and is monotonically increasing in duration.
This implies that the longer drivers go without having an accident the more likely they are to
have an accident soon.
4. Finally, the fourth hazard function has dh4(t)/dt = 0, which means that accident probabilities are
independent of duration and no duration dependence exists.
In addition to duration dependence, hazard-based duration models account for the effect of
covariates on probabilities.
Proportional hazards
The proportional-hazards approach assumes that the covariates, which are factors that affect
accident probabilities, act multiplicatively on some underlying hazard function.
The baseline hazard function, denoted ho(t), assumes that all elements of the covariate vector, X,
are zero.
For simplicity, covariates are assumed to influence the baseline hazard through the function EXP(βX),
where β is a vector of estimable coefficients.
The hazard rate with covariates is
h(t|X) = ho(t)EXP(βX)
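A small sketch of the proportionality property, using an assumed Weibull baseline hazard and a hypothetical value of βX — the ratio h(t|X)/ho(t) is the same at every t:

```python
import math

# Assumed Weibull baseline hazard h0(t) (illustrative lam and P values)
lam, P = 1.0, 1.5
h0 = lambda t: lam * P * (lam * t) ** (P - 1)

beta_X = 0.4                        # hypothetical beta'X for one observation
h = lambda t: h0(t) * math.exp(beta_X)

# The ratio h(t|X)/h0(t) is constant over t -- hence "proportional" hazards
ratios = [h(t) / h0(t) for t in (0.5, 1.0, 2.0, 5.0)]
print(ratios)
```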
This proportional-hazards approach is illustrated:
(Figure: the baseline hazard ho(t) and the proportionally shifted hazard h(t|X) = ho(t)EXP(βX).)
Accelerated Lifetime
The accelerated lifetime approach assumes that the covariates rescale (accelerate) time directly in a
baseline survivor function, which is the survivor function when all covariates are zero.
The accelerated lifetime model is written as
S(t|X) = So[t EXP(βX)]
which leads to the conditional hazard function
h(t|X) = ho[t EXP(βX)]EXP(βX)
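This hazard formula can be verified numerically for an assumed Weibull baseline (illustrative parameter values): the hazard implied by S(t|X) = So[t EXP(βX)], computed as −d LN S/dt by finite differences, matches ho[t EXP(βX)]EXP(βX):

```python
import math

lam, P, beta_X = 1.0, 2.0, 0.3    # illustrative parameter values
S0 = lambda t: math.exp(-(lam * t) ** P)       # assumed Weibull baseline survivor
h0 = lambda t: lam * P * (lam * t) ** (P - 1)  # corresponding baseline hazard

S = lambda t: S0(t * math.exp(beta_X))         # accelerated lifetime survivor

# Hazard implied by S: h(t) = -d LN S(t)/dt, by central difference
t, eps = 1.2, 1e-6
h_num = -(math.log(S(t + eps)) - math.log(S(t - eps))) / (2 * eps)

# Closed form from the notes: h(t|X) = h0[t EXP(beta X)] EXP(beta X)
h_closed = h0(t * math.exp(beta_X)) * math.exp(beta_X)
print(h_num, h_closed)
```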
Characteristics of Duration Data
Duration data are often left or right censored.
For example, consider the time of driving a vehicle until a driver’s first accident.
Suppose data are only available for reported accidents over a specified time period
beginning at time a in the Figure and ending at time b.
Observation 1 is not observed, since it does not fall within the time period of observation.
Observation 2 is left and right censored because it is not known when driving began and the
first accident is not observed in the a to b time interval.
Observation 3 is complete with both start and ending times in the observed period.
Observations 4 and 6 are left censored and observation 5 is right censored.
Hazard-based models can readily account for right-censored data.
(Figure: six duration observations, with start and end times t1 through t6, plotted against time; the observation window runs from time a to time b.)
With left-censored data the problem becomes one of determining the distribution of
duration start times so that they can be used to determine the contribution of the left-
censored data to the model’s likelihood function.
Left-censored data create a far more difficult problem because of the additional complexity
added to the likelihood function.
Tied Data
Tied data occur when a number of observations end their durations at the same time.
Tied data can arise when data collection is not precise enough to identify exact duration-
ending times.
When duration exits are grouped at specific times, the likelihood function for proportional
and accelerated lifetime models becomes increasingly complex.
Non-Parametric Models
Non-parametric models are rare because of the predominant use of semi-parametric and parametric methods.
There are two popular approaches for generating survival functions for non-parametric
methods:
1. The product-limit (PL) method developed by Kaplan and Meier (1958)
2. Life table method (groups survival times into intervals).
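A minimal sketch of the product-limit (Kaplan-Meier) method on hypothetical right-censored data:

```python
# Product-limit (Kaplan-Meier) estimate on a tiny hypothetical sample.
# Each pair is (time, flag): flag = 1 if the duration ended (an exit),
# flag = 0 if the observation is right censored at that time.
data = [(2, 1), (3, 0), (4, 1), (4, 1), (6, 0), (7, 1)]

times = sorted({t for t, d in data if d == 1})  # distinct exit times
S, surv = 1.0, {}
for t in times:
    at_risk = sum(1 for ti, _ in data if ti >= t)          # still in duration
    exits   = sum(1 for ti, d in data if ti == t and d == 1)
    S *= 1 - exits / at_risk                                # product-limit step
    surv[t] = S

print(surv)  # estimated survivor function at each exit time
```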
Semi-Parametric Models
Semi-parametric models do not assume a duration-time distribution, although they do retain
the parametric assumption of the covariate influence.
A nonparametric approach for modeling the hazard function is convenient when little or no
knowledge of the functional form of the hazard is available.
The Cox proportional-hazards model is semi-parametric because EXP(βX) is still used as the
functional form of the covariate influence.
The model is based on the ratio of hazards—so that the probability of an observation i
exiting a duration at time ti, given that at least one observation exits at time ti, is given as
EXP(βXi) / Σj∈Ri EXP(βXj)
Where: Ri denotes the set of observations, j, with durations greater than or equal to ti.
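One term of this partial likelihood can be computed directly (the βX values below are illustrative):

```python
import math

# One partial-likelihood term: observation i exits at time t_i; the risk set
# R_i holds everyone whose duration is >= t_i (values are hypothetical).
beta_X_i = 0.8                        # beta'X for the exiting observation i
beta_X_risk = [0.8, 0.1, -0.5, 0.3]   # beta'X for all j in R_i (i included)

p_i = math.exp(beta_X_i) / sum(math.exp(v) for v in beta_X_risk)
print(p_i)  # probability that i is the one exiting, given one exit at t_i
```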
Fully-Parametric Models
With fully-parametric models, typical distributions of the hazard function include
gamma, exponential, Weibull, log-logistic, log-normal and Gompertz distributions,
among others.
The choice of a specific distribution has important implications relating not only to the
shape of the underlying hazard, but also to the efficiency and potential bias of the
estimated parameters.
Exponential
f(t) = λEXP(-λt)
with hazard,
h(t) = λ
This distribution's hazard is constant, as illustrated by h4(t) in the previous figure.
This means that the probability of a duration ending is independent of time and there is
no duration dependence.
Weibull
A more generalized form of the exponential.
It allows for positive duration dependence (hazard is monotonic increasing in duration and
the probability of the duration ending increases over time), negative duration dependence
(hazard is monotonic decreasing in duration and the probability of the duration ending
decreases over time) or no duration dependence (hazard is constant in duration and the
probability of the duration ending is unchanged over time).
With parameters λ > 0 and P > 0, the Weibull distribution has the density function,
f(t) = λP(λt)P-1EXP[-(λt)P]
with hazard
h(t) = λP(λt)P-1
If the Weibull parameter P is greater than one, the hazard is monotone increasing in duration
(see h3(t) in Figure);
If P is less than one, it is monotone decreasing in duration (see h1(t) in Figure);
If P equals one, the hazard is constant in duration and reduces to the exponential
distribution's hazard with h(t) = λ (see h4(t) in Figure).
Because the Weibull distribution is a more generalized form of the exponential distribution,
it provides a more flexible means of capturing duration dependence. However, it is still
limited because it requires the hazard to be monotonic over time. In many applications, a
nonmonotonic hazard is theoretically justified.
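The three cases can be seen by evaluating the Weibull hazard h(t) = λP(λt)P-1 at a few time points for illustrative values of P:

```python
# Weibull hazard h(t) = lam*P*(lam*t)**(P-1) for three illustrative shapes
lam = 1.0
haz = lambda t, P: lam * P * (lam * t) ** (P - 1)

ts = [0.5, 1.0, 2.0, 4.0]
inc  = [haz(t, 2.0) for t in ts]   # P > 1: monotone increasing hazard
dec  = [haz(t, 0.5) for t in ts]   # P < 1: monotone decreasing hazard
flat = [haz(t, 1.0) for t in ts]   # P = 1: constant hazard (exponential)

print(inc, dec, flat)
```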
Log-logistic
The log-logistic distribution allows for nonmonotonic hazard functions and is often used as
an approximation of the more computationally cumbersome lognormal distribution.
The log-logistic with parameters λ > 0 and P > 0 has the density function,
f(t) = λP(λt)P-1[1+(λt)P]-2
and hazard function
h(t) = λP(λt)P-1 / [1 + (λt)P]
The log-logistic's hazard is identical to the Weibull's except for the denominator.
If P < 1, the hazard is monotone decreasing in duration (see h1(t) in the Figure);
if P = 1, the hazard is monotone decreasing in duration from the value λ; and
if P > 1, the hazard increases in duration from zero to an inflection point, ti = (P − 1)1/P/λ,
and decreases toward zero thereafter (see h2(t) in the Figure).
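The nonmonotonic case can be checked numerically for illustrative parameter values — the hazard is higher at the inflection point than on either side of it:

```python
# Log-logistic hazard, which peaks at t = (P-1)**(1/P)/lam when P > 1
lam, P = 1.0, 2.0                  # illustrative values
h = lambda t: (lam * P * (lam * t) ** (P - 1)) / (1 + (lam * t) ** P)

t_peak = (P - 1) ** (1 / P) / lam  # equals 1.0 for these values

# Hazard rises before the peak and falls after it
print(h(0.5 * t_peak), h(t_peak), h(2.0 * t_peak))
```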
Comparisons of Non-Parametric, Semi-Parametric, and Fully-Parametric Models
The choice among non-parametric, semi-parametric, and fully-parametric methods
for estimating survival or duration models can be complicated.
When there is little information about the underlying distribution due to the small
size of the sample or the lack of a theory that would suggest a specific distribution, a
non-parametric approach may be appropriate.
Parametric methods are more suitable when underlying distributions are known or
can be theoretically justified.
Semi-parametric models may also be a good choice when little is known about the
underlying hazard distribution. Problems:
1. Duration effects can be difficult to track.
2. A potential loss in efficiency may result. It can be shown that in data where
censoring exists and the underlying survival distribution is known, the Cox
semi-parametric proportional hazards model does not produce efficient
coefficient estimates.
Comparing various hazard distributional assumptions for fully-parametric models can also
be difficult.
Determining the relative difference between a Weibull and exponential model
could be approximated by the significance of the Weibull’s P parameter,
which represents the difference between the two distributions.
The difference between the Weibull and log-logistic models and other
distributions is more difficult to test because the models may not be nested.
One possible comparison for distributional models that are not nested is to
compare likelihood ratio statistics
–2(LL(0)- LL(βc))
where LL(0) is the initial log-likelihood (with all coefficients equal to
zero) and LL(βc) is the log-likelihood at convergence.
This statistic is χ2 distributed with the degrees of freedom equal to the
number of estimated coefficients included in the model.
One could select the distribution that provided the highest level of
significance for this statistic to determine the best-fit distribution.
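Using the Poisson output reproduced later in these notes (LL(0) = −160.5608, LL(βc) = −151.3086, 7 estimated coefficients), the statistic works out as follows; the 0.05 critical value of 14.07 for 7 degrees of freedom is taken from standard χ2 tables:

```python
# Likelihood ratio statistic -2(LL(0) - LL(beta_c)), with log-likelihoods
# taken from the Poisson regression output later in these notes.
ll0, llb = -160.5608, -151.3086
lr = -2 * (ll0 - llb)
print(lr)  # ~18.50, chi-squared distributed with 7 degrees of freedom

# 14.07 is the 0.05 chi-squared critical value for 7 d.f. (from tables),
# so the covariates significantly improve on the restricted model.
print(lr > 14.07)
```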
Heterogeneity
While formulating proportional-hazard models an implicit assumption made is that the
survival function (see Equation 9.6) is homogenous across observations.
All of the variation in durations is assumed to be captured by the covariate vector X.
A problem arises when some unobserved factors, not included in X, influence durations.
This is referred to as unobserved heterogeneity and can result in a major specification error
that can lead one to draw erroneous inferences on the shape of the hazard function, in
addition to producing inconsistent coefficient estimates.
In fully-parametric models the most common approach to account for heterogeneity is to
introduce a heterogeneity term designed to capture unobserved effects across the population
and to work with the resulting conditional survival function.
With a heterogeneity term, w, having a distribution over the population g(w), along with a
conditional survival function, S(t|w), the unconditional survival function becomes
S(t) = ∫ S(t|w)g(w)dw
For a Weibull distribution with gamma heterogeneity, without loss of generality, w is
assumed to be gamma distributed with mean 1 and variance = 1/k. So,
g(w) = [kk/Γ(k)]EXP(-kw)wk-1
With the Weibull distribution and S(t) = f(t)/h(t), so,

S(t|w) = EXP[-w(λt)P]
The unconditional survival distribution can then be written as (with θ = 1/k):
S(t) = ∫0∞ S(t|w)g(w)dw = [1 + θ(λt)P]-1/θ
resulting in the hazard function
h(t) = λP(λt)P-1[S(t)]θ
Note that if θ = 0, heterogeneity is not present because the hazard reduces to a simple
Weibull and the variance of the heterogeneity term w is zero.
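The closed form can be checked by numerically integrating S(t|w) against the gamma density (illustrative parameter values; simple rectangle-rule quadrature):

```python
import math

# Check numerically that integrating S(t|w) = EXP[-w(lam*t)**P] against the
# gamma density g(w) reproduces S(t) = [1 + theta*(lam*t)**P]**(-1/theta).
lam, P, k = 1.0, 1.5, 2.0          # illustrative values; theta = 1/k
theta = 1 / k
t = 0.8

g = lambda w: (k ** k / math.gamma(k)) * math.exp(-k * w) * w ** (k - 1)
S_cond = lambda w: math.exp(-w * (lam * t) ** P)

# Rectangle-rule integration of S(t|w)*g(w) over w on [0, 40]
n, hi = 20000, 40.0
dw = hi / n
S_num = sum(S_cond(i * dw) * g(i * dw) * dw for i in range(1, n))

S_closed = (1 + theta * (lam * t) ** P) ** (-1 / theta)
print(S_num, S_closed)
```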
The selection of a heterogeneity distribution should not be taken lightly. The consequences
of incorrectly specifying g(w) are potentially severe and can result in inconsistent estimates.
State Dependence
In duration modeling, state dependence refers to a number of processes that seek to
establish a relationship between past duration experiences and current durations.
Conceptually, the understanding of how past experience affects current behavior is a
key component in modeling that captures important habitual behavior.
State dependence can be classified into three types:
1. Type I state dependence is duration dependence. As discussed above,
duration dependence is the conditional probability of a duration ending soon,
given that it has lasted to some known time. Type I state dependence is
captured in the shape of the hazard function.
2. Type II state dependence is occurrence dependence. This refers to the effect
that the number of previous durations has on a current duration. It is
accounted for in the model by including the number of previous durations
(for example, previous shopping durations) as a variable in the covariate vector.
3. Type III state dependence, lagged duration dependence, accounts for the
effect that lengths of previous durations have on a current duration. This
would be accounted for in the model’s covariate vector by including the
length of previous durations as a variable.
When including variables that account for Type II or Type III state duration
dependence, the findings can easily be misinterpreted. This arises because
unobserved heterogeneity can be captured in the coefficient estimates of the state
dependence variables.
Time-varying covariates
Covariates that change over a duration are problematic.
If the covariate vector, X, changes over the duration being studied, coefficient
estimates may be biased.
Time-varying covariates are difficult to account for, but can be incorporated in
hazard models by allowing the covariate vector to be a function of time.
The hazard and likelihood functions are then appropriately re-written. The
likelihood function becomes more complex, but estimation is typically simplified
because time-varying covariates usually make only a few discrete changes over the
duration being studied.
Discrete-time hazard models
An alternative to standard continuous-time hazard models is to use a discrete-time
approach.
In discrete-time models, time is segmented into uniform discrete categories and exit
probabilities in each time period are estimated using a logistic regression or other
discrete outcome modeling approach.
This discrete approach allows for a very general form of the hazard function (and
duration effects) that can change from one time interval to the next as shown in the
Figure. However, the hazard is implicitly assumed to be constant during each of the
discrete time intervals.
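A minimal empirical version of the idea, using hypothetical grouped duration data (a full discrete-time model would estimate these interval exit probabilities with logistic regression on covariates):

```python
# Empirical discrete-time hazard: for each interval, the share of
# observations exiting in that interval among those entering it
# (hypothetical data: the interval in which each duration ended).
durations = [1, 1, 2, 2, 2, 3, 4, 4, 5]

hazard = {}
for interval in range(1, 6):
    at_risk = sum(1 for d in durations if d >= interval)  # still at risk
    exits = durations.count(interval)                      # exits this interval
    if at_risk:
        hazard[interval] = exits / at_risk

print(hazard)  # a step-function hazard, constant within each interval
```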
Discrete hazard models have the obvious drawback of inefficient coefficient estimators because of the information lost in discretizing continuous data.
Although discrete-time hazard models are inefficient, they provide at least two
advantages over their continuous-time counterparts.
1. Time-varying covariates can be easily handled by incorporating new, time-
varying values in the discrete time intervals (there is still the assumption that
they will be constant over the discrete-time interval).
2. Tied data (groups of observations exiting a duration at the same
time), which, as discussed above, are problematic when using continuous-data
approaches, are not a problem with discrete hazard models. This is because the
grouped data fall within a single discrete time interval and the subsequent
lack of information on "exact" exit times does not complicate the estimation.
(Figure: a discrete-time hazard h(t) that steps to a different constant level in each time interval.)
Assignment #1 (Continuous Data - Regression Analysis)
You are given 151 observations of a travel survey collected in State College, Pennsylvania. All of the households in the sample are making the morning commute to work. They are all departing from the same origin (a large residential complex in the suburbs) and going to work in the Central Business District. They have the choice of three alternate routes: 1) a four-lane arterial (speed limit = 35 mph, 2 lanes each direction), 2) a two-lane rural road (speed limit = 35 mph, 1 lane each direction), and 3) a limited-access four-lane freeway (speed limit = 55 mph, 2 lanes each direction).

Your task is to estimate a model of individual average travel speed to work using standard regression techniques. Your solution to this problem should include:

1. The results of your best model specification.
2. A discussion of the logical process that led you to the selection of your final specification (e.g., discuss the theory behind the inclusion of your selected variables). Include t-statistics and justify the sign of your variables.
Variables available for your specification are: (file trt.out)
Variable Explanation
x1 Actual in-vehicle travel time in minutes
x2 Route chosen: 1 - arterial, 2 - rural road, 3 - freeway
x3 Traffic flow rate at time of departure in vehicles per hour
x4 Number of traffic signals on the selected route
x5 Distance along the selected route in tenths of miles
x6 Seat belts: 1 - if wear, 0 - if not
x7 Number of passengers in car
x8 Driver age in years: 1 - 18 to 23, 2 - 24 to 29, 3 - 30 to 39, 4 - 40 to 49, 5 - 50 and above
x9 Gender: 1 - male, 0 - female
x10 Marital status: 1 - single, 0 - married
x11 Number of children
x12 Annual income: 1 - less than 20000, 2 - 20000 to 29999, 3 - 30000 to 39999, 4 - 40000 to 49999, 5 - more than 50000
x13 Model year of car (e.g. 86 = 1986)
x14 Origin of car: 1 - domestic, 0 - foreign
--> sample;1-151$
--> read;nvar=14;nobs=151;file=D:\old_drive_d\new_laptop\CE697N-disk\trt.out....
--> create;speed=(x5/10)/(x1/60)$
--> dstat;rhs=speed$

Descriptive Statistics
All results based on nonmissing observations.
Variable       Mean           Std.Dev.       Minimum        Maximum      Cases
-------------------------------------------------------------------------------
SPEED    .263726534D+02 .854564197D+01 .113684211D+02 .622500000D+02      151

--> create;if(x2=3) frwy=1$
--> create;if(x2=1) art=1$
--> create;cage=86-x13$
--> regress;lhs=speed;rhs=one,frwy,art,cage,x6,x3$

+-----------------------------------------------------------------------+
| Ordinary least squares regression     Weighting variable = none       |
| Dep. var. = SPEED     Mean= 26.37265340    , S.D.= 8.545641969        |
| Model size: Observations = 151, Parameters = 6, Deg.Fr.= 145          |
| Residuals: Sum of squares= .8103322769D+04, Std.Dev.= 7.47563         |
| Fit: R-squared= .260254, Adjusted R-squared = .23475                  |
| Model test: F[ 5, 145] = 10.20, Prob value = .00000                   |
| Diagnostic: Log-L = -514.9573, Restricted(b=0) Log-L = -537.7167      |
| LogAmemiyaPrCrt.= 4.062, Akaike Info. Crt.= 6.900                     |
| Autocorrel: Durbin-Watson Statistic = 1.93247, Rho = .03376           |
+-----------------------------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
Constant   27.50201489      2.5979595      10.586   .0000
FRWY       11.93363775      2.3534831       5.071   .0000   .99337748E-01
ART        3.401468353      1.7110414       1.988   .0487   .21854305
CAGE      -.2423910189      .15614209      -1.552   .1228   4.0927152
X6         2.056499369      1.3548641       1.518   .1312   .70860927
X3        -.7138526638E-02  .42918384E-02  -1.663   .0984   493.57616

--> create;if(x2=3)frwytl=x4$
--> create;if(x2=1)arttl=x4$
--> dstat;rhs=frwytl,arttl$

Descriptive Statistics
All results based on nonmissing observations.
Variable       Mean           Std.Dev.       Minimum        Maximum      Cases
-------------------------------------------------------------------------------
FRWYTL   .523178808D+00 .165261182D+01 .000000000D+00 .700000000D+01      151
ARTTL    .322516556D+01 .620233524D+01 .000000000D+00 .230000000D+02      151
--> regress;lhs=speed;rhs=one,frwy,art,cage,x6,x3,x5,frwytl,arttl$

+-----------------------------------------------------------------------+
| Ordinary least squares regression     Weighting variable = none       |
| Dep. var. = SPEED     Mean= 26.37265340    , S.D.= 8.545641969        |
| Model size: Observations = 151, Parameters = 9, Deg.Fr.= 142          |
| Residuals: Sum of squares= .7009953198D+04, Std.Dev.= 7.02608         |
| Fit: R-squared= .360067, Adjusted R-squared = .32401                  |
| Model test: F[ 8, 142] = 9.99, Prob value = .00000                    |
| Diagnostic: Log-L = -504.0141, Restricted(b=0) Log-L = -537.7167      |
| LogAmemiyaPrCrt.= 3.957, Akaike Info. Crt.= 6.795                     |
| Autocorrel: Durbin-Watson Statistic = 1.91592, Rho = .04204           |
+-----------------------------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
Constant   16.16325632      3.9928669       4.048   .0001
FRWY       16.94939297      7.6315911       2.221   .0279   .99337748E-01
ART        1.988634670      8.6623071        .230   .8188   .21854305
CAGE      -.2328219473      .14912800      -1.561   .1207   4.0927152
X6         1.992892731      1.2752154       1.563   .1203   .70860927
X3        -.8198678320E-02  .40677382E-02  -2.016   .0457   493.57616
X5         .2661454594      .74404251E-01   3.577   .0005   48.099338
FRWYTL    -2.301061742      1.2457614      -1.847   .0668   .52317881
ARTTL      .2508103259E-01  .58844954        .043   .9661   3.2251656

--> create;if(x2=2)rurtl=x4$
--> create;if(x8<3&x9=1)youngm=1$
--> regress;lhs=speed;rhs=one,frwy,art,cage,x6,x3,x5,frwytl,rurtl,youngm$

+-----------------------------------------------------------------------+
| Ordinary least squares regression     Weighting variable = none       |
| Dep. var. = SPEED     Mean= 26.37265340    , S.D.= 8.545641969        |
| Model size: Observations = 151, Parameters = 10, Deg.Fr.= 141         |
| Residuals: Sum of squares= .6727183008D+04, Std.Dev.= 6.90728         |
| Fit: R-squared= .385881, Adjusted R-squared = .34668                  |
| Model test: F[ 9, 141] = 9.84, Prob value = .00000                    |
| Diagnostic: Log-L = -500.9054, Restricted(b=0) Log-L = -537.7167      |
| LogAmemiyaPrCrt.= 3.929, Akaike Info. Crt.= 6.767                     |
| Autocorrel: Durbin-Watson Statistic = 1.91023, Rho = .04489           |
+-----------------------------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
Constant   22.90059405      5.1196606       4.473   .0000
FRWY      -.4626342414E-02  10.374523        .000   .9996   .99337748E-01
ART       -9.052728265      5.2732767      -1.717   .0882   .21854305
CAGE      -.2187957231      .14657167      -1.493   .1377   4.0927152
X6         1.545666270      1.2703105       1.217   .2257   .70860927
X3        -.9457421614E-02  .40102172E-02  -2.358   .0197   493.57616
X5         .3757958689      .84435209E-01   4.451   .0000   48.099338
FRWYTL    -1.821566472      1.2374165      -1.472   .1432   .52317881
RURTL     -1.468789815      .64959125      -2.261   .0253   5.1523179
YOUNGM     1.309483785      1.2864788       1.018   .3105   .27814570
--> create;if(x9=1)maleage=x8$
--> histogram;rhs=maleage$
(Figure: histogram of frequency by bin for variable MALEAGE, bins 0 through 5.)
Assignment #2 (Count Data - Poisson Regression)
You are given 204 observations from a travel survey conducted in the Seattle metropolitan area. The purpose of the survey was to study the number of times (per week) commuters changed their departure time on their work-to-home trip to avoid traffic congestion. The data are non-negative integers and are thus well suited to the Poisson regression approach.

Remember in a Poisson regression, you are estimating a parameter vector β such that:

λ = EXP(βX)

where λ is the Poisson parameter that in this case is the expected number of departure changes per week.

In your analysis include:

1. The results of your best model specification.
2. A discussion of the logical process that led you to the selection of your final specification (e.g., discuss the theory behind the inclusion of your selected variables). Include t-statistics and justify the sign of your variables.
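A sketch of the Poisson log-likelihood being maximized (β and the data below are illustrative, not estimates from the assignment data):

```python
import math

# Poisson regression ingredients: lam_i = EXP(beta'X_i), log-likelihood
# summed over observations (beta and the data here are hypothetical).
beta = [0.2, -0.1]
X = [[1.0, 3.0], [1.0, 5.0], [1.0, 1.0]]  # column of ones plus one covariate
y = [1, 0, 2]                              # departure-time changes per week

def loglik(beta, X, y):
    ll = 0.0
    for xi, yi in zip(X, y):
        lam = math.exp(sum(b * x for b, x in zip(beta, xi)))
        # Poisson log-likelihood term: -lam + y*LN(lam) - LN(y!)
        ll += -lam + yi * math.log(lam) - math.log(math.factorial(yi))
    return ll

print(loglik(beta, X, y))
```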
Variables available for your specification are: (file tobit.dat)

Variable Explanation
x1 Household number
x2 Do you ever delay work-to-home departure to avoid traffic congestion? 1-yes, 0-no
x3 If sometimes delay, on average how many minutes do you delay?
x4 If sometimes delay, do you 1-perform additional work, 2-engage in non-work activities, or 3-do both?
x5 If sometimes delay, how many times have you delayed in the past week?
x6 Mode of transportation used work-to-home: 1-car SOV, 2-carpool, 3-vanpool, 4-bus, 5 other.
x7 Primary route (work-to-home): 1-I90, 2-I5, 3-SR520, 4-I405, 5-other
x8 Do you generally encounter traffic congestion on your work-to-home trip? 1-yes, 2-no
x9 Age: 1-(<25), 2-(26-30), 3-(31-35), 4-(36-40), 5-(41-45), 6-(46-50), 7-(>50)
x10 Gender: 1-male, 0-female
x11 Number of cars in household
x12 Number of children in household
x13 Income: 1 - less than 20000, 2 - 20000 to 29999, 3 - 30000 to 39999, 4 - 40000 to 49999, 5 - 50000 to 59999, 6 - >60000
x14 Do you have flexible work hours? 1-yes, 0-no
x15 Distance from work to home (in miles)
x16 Face LOS D or worse? 1-yes, 0-no
x17 Ratio of actual travel time to free-flow travel time
x18 Population of work zone
x19 Retail employment in work zone
x20 Service employment in work zone
x21 Size of work zone (in acres)
--> read;nvar=21;nobs=204;file=D:\old_drive_d\new_laptop\CE697N-disk\TOBIT.DAT$
--> reject;x2=0$
--> create;if(x7=3)sr520=1$
--> create;if(x7=2)I5=1$
--> dstat;rhs=x5$

Descriptive Statistics
All results based on nonmissing observations.
Variable       Mean           Std.Dev.       Minimum        Maximum      Cases
-------------------------------------------------------------------------------
X5       .183333333D+01 .137394297D+01 .000000000D+00 .500000000D+01       96

--> histogram;rhs=x5$

Histogram for X5   NOBS= 96, Too low: 0, Too high: 0
Bin   Lower limit   Upper limit   Frequency      Cumulative Frequency
========================================================================
 0       .000          1.000      18 ( .1875)      18( .1875)
 1      1.000          2.000      23 ( .2396)      41( .4271)
 2      2.000          3.000      27 ( .2813)      68( .7083)
 3      3.000          4.000      20 ( .2083)      88( .9167)
 4      4.000          5.000       1 ( .0104)      89( .9271)
 5      5.000          6.000       7 ( .0729)      96(1.0000)
[Figure: histogram of X5 (frequency by bin, bins 0 through 5)]
--> poisson;lhs=x5;rhs=one,sr520,i5,x10,x11,x14,x15,x17;
    limit=6;truncation;upper$
+-----------------------------------------------------------------------+
| Poisson Regression Model - OLS Results                                |
| Ordinary least squares regression   Weighting variable = none         |
| Dep. var. = X5   Mean= 1.556818182 , S.D.= 1.059797796                |
| Model size: Observations = 88, Parameters = 8, Deg.Fr.= 80            |
| Residuals: Sum of squares= .9111512339D+02, Std.Dev.= 1.06721         |
| Fit: R-squared= .067551, Adjusted R-squared = -.01404                 |
| Model test: F[ 7, 80] = .83, Prob value = .56715                      |
| Diagnostic: Log-L = -126.3972, Restricted(b=0) Log-L = -129.4746      |
| LogAmemiyaPrCrt.= .217, Akaike Info. Crt.= 3.054                      |
+-----------------------------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
Constant    2.124169713      1.1147607        1.905   .0567
SR520      -.1788795348E-05  .52109802E-03    -.003   .9973   .17045455
I5          .4994266029E-01  .24964851         .200   .8414   .36363636
X10         .8129313858E-03  .25291341         .003   .9974   .70454545
X11        -.6078538464E-02  .11441429        -.053   .9576   1.9431818
X14        -.4891679372      .24426295       -2.003   .0452   .64772727
X15        -.3586809102E-01  .26568623E-01   -1.350   .1770   7.8750000
X17         .1280136445E-01  .44047325         .029   .9768   1.9556818
+---------------------------------------------+
| Poisson Regression                          |
| Maximum Likelihood Estimates                |
| Dependent variable              X5          |
| Weighting variable             ONE          |
| Number of observations          96          |
| Iterations completed             6          |
| Log likelihood function  -151.3086          |
| Restricted log likelihood -160.5608         |
| Chi-squared               18.50428          |
| Degrees of freedom               7          |
| Significance level        .9890572E-02      |
| RIGHT Truncated data, at Y = 5.             |
| Chi- squared =  90.40988   RsqP= .0757      |
| G - squared  = 103.41426   RsqD= .1168      |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
Constant    1.821166565      .75186875        2.422   .0154
SR520      -.5376391967      .25629632       -2.098   .0359   .16666667
I5         -.3251529733      .18668242       -1.742   .0816   .34375000
X10        -.4854679467E-01  .17966580        -.270   .7870   .69791667
X11        -.1210107221      .87127154E-01   -1.389   .1649   1.8854167
X14        -.3929141620      .17063688       -2.303   .0213   .63541667
X15        -.2822605374E-01  .20647975E-01   -1.367   .1716   7.7083333
X17        -.1453901223      .30066940        -.484   .6287   1.9593750
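The right-truncated Poisson probabilities used above (truncation at Y = 5) are ordinary Poisson probabilities rescaled by the probability mass at or below the truncation point. A minimal sketch in plain Python; the λ value here is hypothetical, chosen only for illustration:

```python
import math

def truncated_poisson_pmf(y, lam, r):
    """P(Y = y | Y <= r): Poisson(lam) probabilities rescaled by the
    total mass at or below the truncation point r."""
    if y > r:
        return 0.0
    pois = lambda m: lam ** m * math.exp(-lam) / math.factorial(m)
    return pois(y) / sum(pois(m) for m in range(r + 1))

# With a hypothetical lam = 2 and right truncation at 5, the
# probabilities over y = 0..5 sum to one:
probs = [truncated_poisson_pmf(y, 2.0, 5) for y in range(6)]
```

Because the denominator renormalizes the distribution, outcomes above the truncation point get zero probability and the remaining probabilities sum to one.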
Assignment #3 (Discrete Data - Logit Analysis)
You are given 151 observations from a travel survey collected in State College, Pennsylvania (the same data as in assignment #1). All of the households in the sample are making the morning commute to work. They all depart from the same origin (a large residential complex in the suburbs) and go to work in the Central Business District. They have the choice of three alternative routes: 1) a four-lane arterial (speed limit = 35 mph, 2 lanes each direction), 2) a two-lane rural road (speed limit = 35 mph, 1 lane each direction), and 3) a limited-access four-lane freeway (speed limit = 55 mph, 2 lanes each direction). Your task is to estimate a model of route choice (i.e., the likelihood of an individual traveler taking one of the three routes). Your solution to this problem should include:
1. The results of your best model specification.
2. A discussion of the logical process that led you to the selection of your final specification (e.g., discuss the theory behind the inclusion of your selected variables). Include t-statistics and justify the signs of your variables.
For reference, see Example 11.1 on page 267 of Washington, S., M. Karlaftis and F. Mannering (2003) Statistical and econometric methods for transportation data analysis, Chapman & Hall/CRC, Boca Raton, FL, 425 pages.
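The multinomial logit model you will estimate gives the probability of route i as P(i) = EXP(βXi) / Σj EXP(βXj). A minimal sketch of this calculation in plain Python; the utility values are hypothetical:

```python
import math

def mnl_probabilities(utilities):
    """Multinomial logit choice probabilities: P(i) = exp(V_i) / sum_j exp(V_j)."""
    exps = [math.exp(v) for v in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical deterministic utilities for arterial, rural road, freeway:
probs = mnl_probabilities([-1.2, -0.5, -2.0])
```

The probabilities always sum to one, and the alternative with the highest utility receives the highest probability.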
Variables available for your specification are (in file LOGIT-A1.txt):
Variable Number Explanation
x1 Route chosen, rows: 1 - arterial, 2 - rural road, 3 - freeway
x2 Arterial row indicator; 1 for arterial row, 0 for others
x3 Rural row indicator; 1 for rural row, 0 for others
x4 Freeway row indicator; 1 for freeway row, 0 for others
x5 Traffic flow rate
x6 Number of traffic signals
x7 Distance in tenths of miles
x8 Seat belts: 1 - if wear, 0 - if not
x9 Number of passengers in car
x10 Driver age in years: 1 - 18 to 23, 2 - 24 to 29, 3 - 30 to 39, 4 - 40 to 49, 5 - 50 and above
x11 Gender: 1 - male, 0 - female
x12 Marital status: 1 - single, 0 - married
x13 Number of children
x14 Annual income: 1 - less than 20000, 2 - 20000 to 29999, 3 - 30000 to 39999, 4 - 40000 to 49999, 5 - more than 50000
x15 Model year of car (e.g. 86 = 1986)
x16 Origin of car: 1 - domestic, 0 - foreign
x17 Fuel efficiency in miles per gallon
--> read;nvar=17;nobs=453;file=D:\old_drive_d\new_laptop\CE697N-disk\LOGIT-A1...
--> create;cage=86-x15$
--> nlogit;lhs=x1;choices=arterial,rural,freeway;model:
    u(arterial)=dist*x7/
    u(rural)=rural*one+dist*x7+cager*cage/
    u(freeway)=freeway*one+dist*x7+malef*x11+cagef*cage$
Normal exit from iterations. Exit status=0.
+---------------------------------------------+
| Discrete choice (multinomial logit) model   |
| Maximum Likelihood Estimates                |
| Dependent variable          Choice          |
| Weighting variable             ONE          |
| Number of observations         151          |
| Iterations completed             7          |
| Log likelihood function  -97.32659          |
| Log-L for Choice model = -97.3266           |
| R2=1-LogL/LogL*  Log-L fncn  R-sqrd  RsqAdj |
| No coefficients  -165.8905   .41331  .40142 |
| Constants only   -124.2267   .21654  .20066 |
| Chi-squared[ 4] = 53.80016                  |
| Significance for chi-squared = 1.00000      |
| Response data are given as ind. choice.     |
| Number of obs.= 151, skipped 0 bad obs.     |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
DIST       -.1663035351     .29951458E-01   -5.552   .0000
RURAL       .1598214542     .33221388         .481   .6305
CAGER       .1280857447     .67891270E-01    1.887   .0592
FREEWAY    -.1641800393     .73082884        -.225   .8223
MALEF       .6608161130     .59869845        1.104   .2697
CAGEF       .2353363437     .84583760E-01    2.782   .0054
--> nlogit;lhs=x1;choices=arterial,rural,freeway;model:
    u(arterial)=dista*x7/
    u(rural)=rural*one+distr*x7+cager*cage/
    u(freeway)=freeway*one+distf*x7+malef*x11+cagef*cage
    ;prob=proute$
Normal exit from iterations. Exit status=0.
+---------------------------------------------+
| Discrete choice (multinomial logit) model   |
| Maximum Likelihood Estimates                |
| Dependent variable          Choice          |
| Weighting variable             ONE          |
| Number of observations         151          |
| Iterations completed             6          |
| Log likelihood function  -94.27486          |
| Log-L for Choice model = -94.2749           |
| R2=1-LogL/LogL*  Log-L fncn  R-sqrd  RsqAdj |
| No coefficients  -165.8905   .43170  .41624 |
| Constants only   -124.2267   .24111  .22046 |
| Chi-squared[ 6] = 59.90361                  |
| Significance for chi-squared = 1.00000      |
| Response data are given as ind. choice.     |
| Number of obs.= 151, skipped 0 bad obs.     |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
DISTA      -.1224245143     .30195296E-01   -4.054   .0001
RURAL      2.789895508      1.3999591        1.993   .0463
DISTR      -.1763266850     .30647778E-01   -5.753   .0000
CAGER       .1234459887     .68604008E-01    1.799   .0720
FREEWAY   -2.711745018      2.7256129        -.995   .3198
DISTF      -.9566266147E-01 .47393754E-01   -2.018   .0435
MALEF       .6645967680     .62917721        1.056   .2908
CAGEF       .2272410439     .84670428E-01    2.684   .0073
--> reject;x3=1$
--> reject;x4=1$
--> dstat;rhs=proute$
Descriptive Statistics
All results based on nonmissing observations.
Variable        Mean            Std.Dev.        Minimum         Maximum        Cases
-------------------------------------------------------------------------------
PROUTE     .218543047D+00  .191798823D+00  .136479546D-02  .893318771D+00      151
--> include;x3=1$
--> reject;x2=1$
--> dstat;rhs=proute$
Descriptive Statistics
All results based on nonmissing observations.
Variable        Mean            Std.Dev.        Minimum         Maximum        Cases
-------------------------------------------------------------------------------
PROUTE     .682119196D+00  .225544682D+00  .781963709D-01  .987070066D+00      151
--> include;x4=1$
--> reject;x3=1$
--> dstat;rhs=proute$
Descriptive Statistics
All results based on nonmissing observations.
Variable        Mean            Std.Dev.        Minimum         Maximum        Cases
-------------------------------------------------------------------------------
PROUTE     .993377575D-01  .155630369D+00  .909963465D-03  .800433079D+00      151
Assignment #4
(Discrete Data - Logit Analysis)
Using the information from assignment #3, perform the following:
1. Develop a new model with a price variable in all three choice alternatives. The price variable is created as: set price = ((distance/10)/mpg)*1.05
2. Calculate direct elasticities for all continuous variables using the Limdep "effects" command (see software command-file downloads for assignment #3). Briefly comment on your findings.
3. Perform a likelihood ratio test to determine if men and women should be modeled separately. The test statistic is (see page 282 in the text): -2[LL(βT) – LL(βM) – LL(βF)], where LL(βT) is the log-likelihood at convergence of the model estimated with all of the data (males and females), LL(βM) is the log-likelihood at convergence of the model using only male data (use the Limdep "reject" command), and LL(βF) is the log-likelihood at convergence of the model using only female data. This statistic is χ2 distributed with degrees of freedom equal to the sum of the number of estimated parameters in the individual male and female models minus the number of estimated parameters in the overall model. The resulting χ2 statistic gives the confidence level at which the hypothesis that males and females share the same parameters can be rejected. Confidence levels can be read from Table C.3 on page 379 of the text. Briefly comment on your findings.
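The test statistic above is a simple combination of three converged log-likelihoods. A sketch in plain Python; the log-likelihood values and degrees of freedom are hypothetical, and the χ2 tail probability uses the closed-form series that holds for even degrees of freedom:

```python
import math

def lr_statistic(ll_total, ll_male, ll_female):
    """Likelihood ratio statistic -2[LL(bT) - LL(bM) - LL(bF)]."""
    return -2.0 * (ll_total - ll_male - ll_female)

def chi2_sf(x, df):
    """Upper-tail chi-squared probability P(X > x); closed form for
    even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)**i / i!."""
    assert df % 2 == 0, "this closed form requires even df"
    half = x / 2.0
    term = math.exp(-half)
    total = term
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return total

# Hypothetical log-likelihoods at convergence; df is the sum of the
# parameters in the male and female models minus those in the pooled model:
stat = lr_statistic(-93.08, -55.10, -30.20)
p_value = chi2_sf(stat, 8)
```

A small tail probability means the male-only and female-only models fit significantly better than the pooled model, so separate models are warranted.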
--> read;nvar=17;nobs=453;file=D:\old_drive_d\new_laptop\CE697N-disk\LOGIT-A1.txt$
--> create;cage=86-x15$
--> create;price=(x7/10)/x17*1.05$
--> nlogit;lhs=x1;choices=arterial,rural,freeway;model:
    u(arterial)=pricea*price/
    u(rural)=rural*one+pricer*price+cager*cage/
    u(freeway)=freeway*one+pricef*price+malef*x11+cagef*cage
    ;effects:price(arterial,rural,freeway)$
Normal exit from iterations. Exit status=0.
+---------------------------------------------+
| Discrete choice (multinomial logit) model   |
| Maximum Likelihood Estimates                |
| Dependent variable          Choice          |
| Weighting variable             ONE          |
| Number of observations         151          |
| Iterations completed             7          |
| Log likelihood function  -93.08420          |
| Log-L for Choice model = -93.0842           |
| R2=1-LogL/LogL*  Log-L fncn  R-sqrd  RsqAdj |
| No coefficients  -165.8905   .43888  .42361 |
| Constants only   -124.2267   .25069  .23030 |
| Chi-squared[ 6] = 62.28493                  |
| Significance for chi-squared = 1.00000      |
| Response data are given as ind. choice.     |
| Number of obs.= 151, skipped 0 bad obs.     |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
PRICEA    -27.59844150     5.9577684       -4.632   .0000
RURAL      1.955464369      .96933973       2.017   .0437
PRICER    -36.12375952     5.9886626       -6.032   .0000
CAGER       .2050962666     .79571545E-01   2.578   .0100
FREEWAY   -2.680384373     1.4163297       -1.892   .0584
PRICEF    -21.41617741     5.8457065       -3.664   .0002
MALEF       .4913420715     .66134595        .743   .4575
CAGEF       .2518958209     .97354563E-01   2.587   .0097
+-----------------------------------------------------------------+
| Elasticity  Averaged over observations.                         |
| Attribute is PRICE in choice ARTERIAL                           |
| Effects on probabilities of all choices in the model:           |
| * indicates direct Elasticity effect of the attribute.          |
|                   Decomposition of Effect            Total      |
|                   Trunk   Limb   Branch   Choice    Effect      |
| Trunk=Trunk{1}                                                  |
| Limb=Lmb[1:1]                                                   |
| Branch=B(1:1,1)                                                 |
| * Choice=ARTERIAL  .000   .000    .000    -6.035    -6.035      |
|   Choice=RURAL     .000   .000    .000     1.517     1.517      |
|   Choice=FREEWAY   .000   .000    .000     1.517     1.517      |
+-----------------------------------------------------------------+
+-----------------------------------------------------------------+
| Elasticity  Averaged over observations.                         |
| Attribute is PRICE in choice RURAL                              |
| Effects on probabilities of all choices in the model:           |
| * indicates direct Elasticity effect of the attribute.          |
|                   Decomposition of Effect            Total      |
|                   Trunk   Limb   Branch   Choice    Effect      |
| Trunk=Trunk{1}                                                  |
| Limb=Lmb[1:1]                                                   |
| Branch=B(1:1,1)                                                 |
|   Choice=ARTERIAL  .000   .000    .000     5.538     5.538      |
| * Choice=RURAL     .000   .000    .000    -3.181    -3.181      |
|   Choice=FREEWAY   .000   .000    .000     5.538     5.538      |
+-----------------------------------------------------------------+
+-----------------------------------------------------------------+
| Elasticity  Averaged over observations.                         |
| Attribute is PRICE in choice FREEWAY                            |
| Effects on probabilities of all choices in the model:           |
| * indicates direct Elasticity effect of the attribute.          |
|                   Decomposition of Effect            Total      |
|                   Trunk   Limb   Branch   Choice    Effect      |
| Trunk=Trunk{1}                                                  |
| Limb=Lmb[1:1]                                                   |
| Branch=B(1:1,1)                                                 |
|   Choice=ARTERIAL  .000   .000    .000      .830      .830      |
|   Choice=RURAL     .000   .000    .000      .830      .830      |
| * Choice=FREEWAY   .000   .000    .000    -6.356    -6.356      |
+-----------------------------------------------------------------+
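The identical cross elasticities in these tables (e.g., 1.517 for both RURAL and FREEWAY when the arterial price changes) are a consequence of the logit model's IIA property: for a continuous attribute x with coefficient β in alternative i, the direct elasticity is βx(1 − Pi) and the cross elasticity is −βxPi for every other alternative. A minimal sketch; the coefficient, attribute level, and probability below are hypothetical:

```python
def logit_elasticities(beta, x, p_i):
    """Point elasticities of logit probabilities with respect to a
    continuous attribute x (coefficient beta) of alternative i.
    Direct: beta*x*(1 - P_i). Cross (identical for every other
    alternative, by IIA): -beta*x*P_i."""
    direct = beta * x * (1.0 - p_i)
    cross = -beta * x * p_i
    return direct, cross

# Hypothetical price coefficient and price level for the arterial,
# with P(arterial) = 0.22:
direct, cross = logit_elasticities(-27.6, 0.25, 0.22)
```

Because the cross elasticity does not depend on which other alternative is considered, every non-chosen alternative shifts by the same percentage, which is the pattern visible in the output.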
Assignment #5
(Discrete Data – Ordered Probit)
A survey of 322 commuters was conducted in the Seattle metropolitan area. The survey's intent was to gather information on commuters' opinions of high-occupancy vehicle (HOV) lanes (lanes that are restricted for use by vehicles with two or more occupants). The variables available from this survey are given in the attached table.
Among the questions asked, commuters were asked whether they agreed with the statement "HOV lanes should be open to all vehicles, regardless of vehicle occupancy level" (variable x29 in the table). The question provided ordered responses of strongly disagree, disagree, neutral, agree, and strongly agree, and the observed percentage frequencies of response in these five categories were 32.74, 21.71, 8.54, 12.10, and 24.91, respectively. To understand the factors determining commuter opinions, an ordered probit model of this survey question is appropriate. Your task is to estimate a model of the ordered response of whether commuters believe HOV lanes should be open to all vehicles, regardless of vehicle occupancy level. Your solution to this problem should include:
1. The results of your best model specification.
2. A discussion of the logical process that led you to the selection of your final specification (e.g., discuss the theory behind the inclusion of your selected variables). Include t-statistics and justify the signs of your variables.
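In the ordered probit model, the response probabilities are differences of standard normal CDF values at the threshold points: P(y = 0) = Φ(−βX), P(y = k) = Φ(μk − βX) − Φ(μk−1 − βX) for interior categories, and P(y = K) = 1 − Φ(μK−1 − βX), with the first threshold normalized to zero. A minimal sketch; the index value and thresholds below are hypothetical:

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordered_probit_probs(bx, thresholds):
    """Ordered probit response probabilities given the index bx = beta'X
    and estimated thresholds mu_1 < mu_2 < ... (first cut normalized to 0)."""
    cuts = [0.0] + list(thresholds)
    probs = [norm_cdf(cuts[0] - bx)]               # lowest category
    for lo, hi in zip(cuts, cuts[1:]):             # interior categories
        probs.append(norm_cdf(hi - bx) - norm_cdf(lo - bx))
    probs.append(1.0 - norm_cdf(cuts[-1] - bx))    # highest category
    return probs

# Hypothetical index value with three thresholds (five response categories):
probs = ordered_probit_probs(0.4, [0.623, 0.866, 1.240])
```

Since the probabilities telescope across adjacent thresholds, they always sum to one, with one probability per ordered response category.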
Variables available for your specification are (in file surveys.csv):
Variable Number Explanation
x1 Usual mode of travel: 0 if drive alone, 1 if two person carpool, 2 if three or more person carpool, 3 if vanpool, 4 if bus, 5 if bicycle or walk, 6 if motorcycle, 7 if other
x2 Have used HOV lanes: 1 if yes, 0 if no
x3 If used HOV lanes, what mode is most often used: 0 in a bus, 1 in two person carpool, 2 in three or more person carpool, 3 in vanpool, 4 alone in vehicle, 5 on motorcycle
x4 Sometimes eligible for HOV lane use but do not use: 1 if yes, 0 if no
x5 Reason for not using HOV lanes when eligible: 0 if slower than regular lanes, 1 if too much trouble to change lanes, 2 if HOV lanes are not safe, 3 if traffic moves fast enough, 4 if forget to use HOV lanes, 5 if other
x6 Usual mode of travel one year ago: 0 if drive alone, 1 if two person carpool, 2 if three or more person carpool, 3 if vanpool, 4 if bus, 5 if bicycle or walk, 6 if motorcycle, 7 if other
x7 Commuted to work in Seattle a year ago: 1 if yes, 0 if no
x8 Have flexible work start times: 1 if yes, 0 if no
x9 Changed departure times to work in the last year: 1 if yes, 0 if no
x10 On average, number of minutes leaving earlier for work relative to last year
x11 On average, number of minutes leaving later for work relative to last year
x12 If changed departure times to work in the last year, reason why: 0 if change in travel mode, 1 if increasing traffic congestion, 2 if change in work start time, 3 if presence of HOV lanes, 4 if change in residence, 5 if change in lifestyle, 6 if other
x13 Changed route to work in the last year: 1 if yes, 0 if no
x14 If changed route to work in the last year, reason why: 0 if change in travel mode, 1 if increasing traffic congestion, 2 if change in work start time, 3 if presence of HOV lanes, 4 if change in residence, 5 if change in lifestyle, 6 if other
x15 Usually commute to or from work on Interstate 90: 1 if yes, 0 if no
x16 Usually commuted to or from work on Interstate 90 last year: 1 if yes, 0 if no
x17 On your past five commutes to work, how often have you used HOV lanes
x18 On your past five commutes to work, how often did you drive alone
x19 On your past five commutes to work, how often did you carpool with one other person
x20 On your past five commutes to work, how often did you carpool with two or more people
x21 On your past five commutes to work, how often did you take a vanpool
x22 On your past five commutes to work, how often did you take a bus
x23 On your past five commutes to work, how often did you bicycle or walk
x24 On your past five commutes to work, how often did you take a motorcycle
x25 On your past five commutes to work, how often did you take a mode other than those listed in variables 18 through 24
x26 On your past five commutes to work, how often have you changed route or departure time
x27 HOV lanes save all commuters time: 0 if strongly disagree, 1 if disagree, 2 if neutral, 3 if agree, 4 if agree strongly
x28 Existing HOV lanes are being adequately used: 0 if strongly disagree, 1 if disagree, 2 if neutral, 3 if agree, 4 if agree strongly
x29 HOV lanes should be open to all traffic: 0 if strongly disagree, 1 if disagree, 2 if neutral, 3 if agree, 4 if agree strongly
x30 Converting some regular lanes to HOV lanes is a good idea: 0 if strongly disagree, 1 if disagree, 2 if neutral, 3 if agree, 4 if agree strongly
x31 Converting some regular lanes to HOV lanes is a good idea only if it is done before traffic congestion becomes serious: 0 if strongly disagree, 1 if disagree, 2 if neutral, 3 if agree, 4 if agree strongly
x32 Gender: 1 if male, 0 if female
x33 Age in years: 0 if under 21, 1 if 22 to 30, 2 if 31 to 40, 3 if 41 to 50, 4 if 51 to 64, 5 if 65 or greater
x34 Annual household income (US dollars per year): 0 if no income, 1 if 1 to 9,999, 2 if 10,000 to 19,999, 3 if 20,000 to 29,999, 4 if 30,000 to 39,999, 5 if 40,000 to 49,999, 6 if 50,000 to 74,999, 7 if 75,000 to 100,000, 8 if over 100,000
x35 Highest level of education: 0 if did not finish high school, 1 if high school, 2 if community college or trade school, 3 if college/university, 4 if post college graduate degree
x36 Number of household members
x37 Number of adults in household (aged 16 or more)
x38 Number of household members working outside the home
x39 Number of licensed motor vehicles in the household
x40 Postal zip code of work place
x41 Postal zip code of home
x42 Type of survey comment left by respondent regarding opinions on HOV lanes: 0 if no comment on HOV lanes, 1 if comment not in favor of HOV lanes, 2 if comment positive toward HOV lanes but critical of HOV lane policies, 3 if comment positive toward HOV lanes, 4 if neutral HOV lane comment
--> read;nvar=42;nobs=322;file=D:\old_drive_d\new_laptop\CE697N-disk\SURVEYS-L.CSV$
--> create;if(x1=0)dalone=1$
--> create;if(x33>3&x32=1)oldmen=1$
--> create;if(x35>2)college=1$
--> histogram;rhs=x29$
Histogram for X29    NOBS= 314, Too low: 0, Too high: 0
Bin   Lower limit   Upper limit   Frequency      Cumulative Frequency
========================================================================
 0       .000          1.000      99 ( .3153)     99( .3153)
 1      1.000          2.000      77 ( .2452)    176( .5605)
 2      2.000          3.000      26 ( .0828)    202( .6433)
 3      3.000          4.000      36 ( .1146)    238( .7580)
 4      4.000          5.000      76 ( .2420)    314(1.0000)
[Figure: histogram of X29 (frequency by bin, bins 0 through 4)]
--> skip$
--> ordered;lhs=x29;rhs=one,dalone,x8,oldmen,college,x36;marginal effects$
+-----------------------------------------------------------------------+
| Dependent variable is binary, y=0 or y not equal 0                    |
| Ordinary least squares regression   Weighting variable = none         |
| Dep. var. = Y=0/Not0   Mean= .6738351254 , S.D.= .4696508592          |
| Model size: Observations = 279, Parameters = 6, Deg.Fr.= 273          |
| Residuals: Sum of squares= .9972582530D+03, Std.Dev.= 1.91127         |
| Fit: R-squared=*********, Adjusted R-squared = -15.56131              |
| Diagnostic: Log-L = -573.5787, Restricted(b=0) Log-L = -184.5243      |
| LogAmemiyaPrCrt.= 1.317, Akaike Info. Crt.= 4.155                     |
+-----------------------------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
Constant    .1296715074      .44220253        .293   .7693
DALONE      .4388033502      .27449960       1.599   .1099   .77060932
X8          .4781738604E-01  .23267286        .206   .8372   .48028674
OLDMEN      .6020054050E-01  .35308446        .170   .8646   .12903226
COLLEGE     .9724386887E-01  .28975461        .336   .7372   .79211470
X36         .3343083528E-01  .97510448E-01    .343   .7317   2.9390681
Normal exit from iterations. Exit status=0.
+---------------------------------------------+
| Ordered Probit Model                        |
| Maximum Likelihood Estimates                |
| Dependent variable             X29          |
| Weighting variable             ONE          |
| Number of observations         279          |
| Iterations completed            14          |
| Log likelihood function  -397.2770          |
| Restricted log likelihood -421.3950         |
| Chi-squared               48.23599          |
| Degrees of freedom               5          |
| Significance level        .0000000          |
| Cell frequencies for outcomes               |
|  Y Count Freq  Y Count Freq  Y Count Freq   |
|  0    91 .326  1    60 .215  2    24 .086   |
|  3    34 .121  4    70 .250                 |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
           Index function for probability
Constant   -.5807798304     .27813128       -2.088   .0368
DALONE     1.136565726      .16430272        6.918   .0000   .77060932
X8          .2301353655     .13561406        1.697   .0897   .48028674
OLDMEN      .1968407034     .20635114         .954   .3401   .12903226
COLLEGE     .1996976747E-01 .15582658         .128   .8980   .79211470
X36         .1178065062E-01 .60437908E-01     .195   .8455   2.9390681
           Threshold parameters for index
Mu( 1)      .6231650207     .73062591E-01    8.529   .0000
Mu( 2)      .8657954320     .83104656E-01   10.418   .0000
Mu( 3)     1.240495241      .95160650E-01   13.036   .0000
+------------------------------------------------------+
| Marginal Effects for OrdProbt                        |
+----------+----------+----------+----------+----------+
| Variable |  X29=0   |  X29=1   |  X29=2   |  X29=3   |
+----------+----------+----------+----------+----------+
| ONE      |   .2063  |   .0230  |  -.0142  |  -.0415  |
| DALONE   |  -.4038  |  -.0451  |   .0278  |   .0812  |
| X8       |  -.0818  |  -.0091  |   .0056  |   .0164  |
| OLDMEN   |  -.0699  |  -.0078  |   .0048  |   .0141  |
| COLLEGE  |  -.0071  |  -.0008  |   .0005  |   .0014  |
| X36      |  -.0042  |  -.0005  |   .0003  |   .0008  |
+----------+----------+----------+----------+----------+
Frequencies of actual & predicted outcomes
Predicted outcome has maximum probability.
             Predicted
------ ------------------------- + -----
Actual    0    1    2    3    4  | Total
------ ------------------------- + -----
  0      59    0    0    0   32  |    91
  1      23    0    0    0   37  |    60
  2       9    0    0    0   15  |    24
  3       9    0    0    0   25  |    34
  4      25    0    0    0   45  |    70
------ ------------------------- + -----
Total   125    0    0    0  154  |   279
Assignment #6 (Duration Models)
You are given 204 observations from a travel survey conducted in the spring of 1988 in the Seattle area (the same data used for assignment #2). While the purpose of the survey was to study the number of times per week commuters changed their departure time on their work-to-home trip to avoid traffic congestion, we also have information on the length of time that they delay their trips to avoid congestion. The length of time commuters delay is ideally suited to duration models. Your task is to estimate Weibull, Weibull with gamma heterogeneity, and log-logistic hazard models using the software package LIMDEP version 7.0. Please note that LIMDEP actually estimates the parameter vector -β instead of β, so the effect of the covariates on the hazard is EXP(-βX). This means that a negative parameter in LIMDEP increases the hazard and thus decreases the duration; the sign gives the effect on duration instead of on the hazard. In your analysis include:
1. The results of your best model specification.
2. The shape of the hazard function of your best specifications, shown and discussed.
3. A discussion of the logical process that led you to the selection of your final specification (e.g., discuss the theory behind the inclusion of your selected variables). Include t-statistics and justify the signs of your variables.
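The Weibull hazard is h(t) = λP(λt)^(P−1): it rises with t when P > 1, falls when P < 1, and is constant (the exponential case) when P = 1. A minimal sketch; the λ and P values below are hypothetical, chosen only for illustration:

```python
def weibull_hazard(t, lam, p):
    """Weibull hazard h(t) = lam * p * (lam * t)**(p - 1); increasing in t
    for p > 1, decreasing for p < 1, constant (exponential) for p = 1."""
    return lam * p * (lam * t) ** (p - 1.0)

# Hypothetical lam and p with p > 1: the hazard rises with elapsed delay
# time, so the conditional probability of ending the delay soon grows the
# longer the delay has lasted (positive duration dependence).
h_early = weibull_hazard(10.0, 0.018, 1.70)
h_late = weibull_hazard(60.0, 0.018, 1.70)
```

Plotting h(t) over the observed range of delay durations is one way to satisfy item 2 above.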
Variables available for your specification are (in file TOBIT.DAT):
Variable Number Explanation
x1 Household number
x2 Do you ever delay work-to-home departure to avoid traffic congestion? 1-yes, 0-no
x3 If sometimes delay, on average how many minutes do you delay?
x4 If sometimes delay, do you 1-perform additional work, 2-engage in non-work activities, or 3-do both?
x5 If sometimes delay, how many times have you delayed in the past week?
x6 Mode of transportation used work-to-home: 1-car SOV, 2-carpool, 3-vanpool, 4-bus, 5-other
x7 Primary route (work-to-home): 1-I90, 2-I5, 3-SR520, 4-I405, 5-other
x8 Do you generally encounter traffic congestion on your work-to-home trip? 1-yes, 2-no
x9 Age: 1-(<25), 2-(26-30), 3-(31-35), 4-(36-40), 5-(41-45), 6-(46-50), 7-(>50)
x10 Gender: 1-male, 0-female
x11 Number of cars in household
x12 Number of children in household
x13 Annual income: 1 - less than 20000, 2 - 20000 to 29999, 3 - 30000 to 39999, 4 - 40000 to 49999, 5 - 50000 to 59999, 6 - >60000
x14 Do you have flexible work hours? 1-yes, 0-no
x15 Distance from work to home (in miles)
x16 Face LOS D or worse? 1-yes, 0-no
x17 Ratio of actual travel time to free-flow travel time
x18 Population of work zone
x19 Retail employment in work zone
x20 Service employment in work zone
x21 Size of work zone (in acres)
--> RESET
--> sample;1-204$
--> read;nvar=21;nobs=204;file=D:\new_laptop\CE697N-disk\tobit.dat$
--> reject;x3=0$
--> dstat;rhs=x3$
Descriptive Statistics
All results based on nonmissing observations.
Variable        Mean            Std.Dev.       Minimum        Maximum       Cases
-------------------------------------------------------------------------------
X3         51.2916667      37.4671552     4.00000000     240.000000         96
--> create;if(x6=1)car=1$
--> create;ltime=log(x3)$
--> create;if(x9>6)old=1$
--> dstat;rhs=car$
Descriptive Statistics
All results based on nonmissing observations.
Variable        Mean            Std.Dev.       Minimum        Maximum       Cases
-------------------------------------------------------------------------------
CAR         .718750000      .451969375     .000000000     1.00000000         96
--> survival;lhs=ltime;rhs=one,x15,x17,x12;model=weibull$
+-----------------------------------------------------------------------+
| Log-linear survival regression model: WEIBULL                         |
| Least squares is used to obtain starting values for MLE.              |
| Ordinary least squares regression   Weighting variable = none         |
| Dep. var. = LTIME   Mean= 3.706843804 , S.D.= .6997312277             |
| Model size: Observations = 96, Parameters = 4, Deg.Fr.= 92            |
| Residuals: Sum of squares= 39.91702844 , Std.Dev.= .65870             |
| Fit: R-squared= .141832, Adjusted R-squared = .11385                  |
| Model test: F[ 3, 92] = 5.07, Prob value = .00271                     |
| Diagnostic: Log-L = -94.0959, Restricted(b=0) Log-L = -101.4378       |
| LogAmemiyaPrCrt.= -.794, Akaike Info. Crt.= 2.044                     |
+-----------------------------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
Constant   1.619000121      .54359827        2.978   .0029
X15         .4225283769E-01 .15618491E-01    2.705   .0068   7.7083333
X17         .9020960070     .24142168        3.737   .0002   1.9593750
X12        -.6645707142E-02 .60944914E-01    -.109   .9132   .81250000
Normal exit from iterations. Exit status=0.
+---------------------------------------------+
| Loglinear survival model:           WEIBULL |
| Maximum Likelihood Estimates                |
| Dependent variable                    LTIME |
| Weighting variable                      ONE |
| Number of observations                   96 |
| Iterations completed                     11 |
| Log likelihood function           -96.28262 |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
          RHS of hazard model
 Constant  1.732270225      .65862735        2.630   .0085
 X15        .3273725360E-01 .19402531E-01    1.687   .0916    7.7083333
 X17       1.055416934      .27856852        3.789   .0002    1.9593750
 X12       -.3865858378E-01 .57807767E-01    -.669   .5037    .81250000
          Ancillary parameters for survival
 Sigma      .5872525538     .55008811E-01   10.676   .0000
+----------------------------------------------------------------+
| Parameters of underlying density at data means:                |
| Parameter    Estimate    Std. Error   Confidence Interval      |
| ------------------------------------------------------------  |
| Lambda       .01793      .00121       .0156   to  .0203        |
| P            1.70284     .15951       1.3902  to  2.0155       |
| Median       44.96713    3.03851      39.0116 to  50.9226      |
| Percentiles of survival distribution:                          |
| Survival     .25       .50       .75       .95                 |
| Time         67.56     44.97     26.83     9.75                |
+----------------------------------------------------------------+
--> survival;lhs=ltime;rhs=one,x15,x17,x12;model=weibull;heterogeneity$
+-----------------------------------------------------------------------+
| Log-linear survival regression model:  WEIBULL                        |
| Least squares is used to obtain starting values for MLE.              |
| Weibull Model with Gamma Heterogeneity                                |
| Ordinary least squares regression      Weighting variable = none      |
| Dep. var. = LTIME    Mean= 3.706843804    , S.D.= .6997312277         |
| Model size: Observations = 96, Parameters = 4, Deg.Fr.= 92            |
| Residuals:  Sum of squares= 39.91702844   , Std.Dev.= .65870          |
| Fit:        R-squared= .141832, Adjusted R-squared = .11385           |
| Model test: F[ 3, 92] = 5.07, Prob value = .00271                     |
| Diagnostic: Log-L = -94.0959, Restricted(b=0) Log-L = -101.4378       |
|             LogAmemiyaPrCrt.= -.794, Akaike Info. Crt.= 2.044         |
+-----------------------------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 Constant  1.619000121      .54359827        2.978   .0029
 X15        .4225283769E-01 .15618491E-01    2.705   .0068    7.7083333
 X17        .9020960070     .24142168        3.737   .0002    1.9593750
 X12       -.6645707142E-02 .60944914E-01    -.109   .9132    .81250000
Normal exit from iterations. Exit status=0.
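The "parameters of underlying density at data means" in the Weibull MLE output follow directly from the log-linear coefficients: λ = EXP(-βX evaluated at the variable means), P = 1/σ, and the duration at which survival equals s is t = (-LN s)^(1/P)/λ. A quick Python check of the reported values (coefficients and means copied from the output above; a sketch, not LIMDEP's internal computation):

```python
import math

# Coefficients and variable means from the Weibull MLE output
beta = [1.732270225, 0.03273725360, 1.055416934, -0.03865858378]
xbar = [1.0, 7.7083333, 1.9593750, 0.81250000]
sigma = 0.5872525538

bx = sum(b * x for b, x in zip(beta, xbar))
lam = math.exp(-bx)   # reported as Lambda = .01793
p = 1.0 / sigma       # reported as P = 1.70284

def weibull_time_at_survival(s, lam, p):
    """Duration t at which S(t) = exp(-(lam*t)**p) equals s."""
    return (-math.log(s)) ** (1.0 / p) / lam

print(round(lam, 5), round(p, 5))                        # 0.01793 1.70284
print(round(weibull_time_at_survival(0.50, lam, p), 2))  # 44.97 (the median)
print(round(weibull_time_at_survival(0.25, lam, p), 2))  # 67.56
```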
+---------------------------------------------+
| Loglinear survival model:           WEIBULL |
| Maximum Likelihood Estimates                |
| Dependent variable                    LTIME |
| Weighting variable                      ONE |
| Number of observations                   96 |
| Iterations completed                     16 |
| Log likelihood function           -93.88402 |
| Weibull Model with Gamma Heterogeneity      |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
          RHS of hazard model
 Constant  1.870386758      .58870206        3.177   .0015
 X15        .3375074414E-01 .17842561E-01    1.892   .0585    7.7083333
 X17        .8579132493     .25730277        3.334   .0009    1.9593750
 X12       -.1044830246E-01 .57608312E-01    -.181   .8561    .81250000
          Ancillary parameters for survival
 Theta      .6141476031     .39135931        1.569   .1166
 Sigma      .4212482203     .71720253E-01    5.873   .0000
+----------------------------------------------------------------+
| Parameters of underlying density at data means:                |
| Parameter    Estimate    Std. Error   Confidence Interval      |
| ------------------------------------------------------------  |
| Lambda       .02230      .00226       .0179   to  .0267        |
| P            2.37390     .40417       1.5817  to  3.1661       |
| Median       42.16025    4.27718      33.7770 to  50.5435      |
| Percentiles of survival distribution:                          |
| Survival     .25       .50       .75       .95                 |
| Time         62.34     42.16     27.55     12.92               |
+----------------------------------------------------------------+
--> survival;lhs=ltime;rhs=one,x15,x17,x12;model=logistic;plot$
+-----------------------------------------------------------------------+
| Log-linear survival regression model:  LOGISTIC                       |
| Least squares is used to obtain starting values for MLE.              |
| Ordinary least squares regression      Weighting variable = none      |
| Dep. var. = LTIME    Mean= 3.706843804    , S.D.= .6997312277         |
| Model size: Observations = 96, Parameters = 4, Deg.Fr.= 92            |
| Residuals:  Sum of squares= 39.91702844   , Std.Dev.= .65870          |
| Fit:        R-squared= .141832, Adjusted R-squared = .11385           |
| Model test: F[ 3, 92] = 5.07, Prob value = .00271                     |
| Diagnostic: Log-L = -94.0959, Restricted(b=0) Log-L = -101.4378       |
|             LogAmemiyaPrCrt.= -.794, Akaike Info. Crt.= 2.044         |
+-----------------------------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 Constant  1.619000121      .54359827        2.978   .0029
 X15        .4225283769E-01 .15618491E-01    2.705   .0068    7.7083333
 X17        .9020960070     .24142168        3.737   .0002    1.9593750
 X12       -.6645707142E-02 .60944914E-01    -.109   .9132    .81250000
Normal exit from iterations. Exit status=0.
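With gamma-distributed heterogeneity (Theta above), the unconditional survival function becomes S(t) = [1 + θ(λt)^P]^(-1/θ), which collapses to the ordinary Weibull survival EXP[-(λt)^P] as θ → 0. A sketch that inverts this survival function and reproduces the reported median (the small discrepancy comes from rounding in the printed λ and P):

```python
import math

# Estimates from the Weibull-with-gamma-heterogeneity MLE output
lam, p, theta = 0.02230, 2.37390, 0.6141476031

def survival(t):
    """S(t) = [1 + theta*(lam*t)**p]**(-1/theta): Weibull hazard
    with multiplicative gamma-distributed heterogeneity."""
    return (1.0 + theta * (lam * t) ** p) ** (-1.0 / theta)

def time_at_survival(s):
    """Invert S(t) = s: (lam*t)**p = (s**(-theta) - 1)/theta."""
    return ((s ** -theta - 1.0) / theta) ** (1.0 / p) / lam

print(round(time_at_survival(0.50), 1))  # about 42.2; reported median 42.16025
```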
+---------------------------------------------+
| Loglinear survival model:          LOGISTIC |
| Maximum Likelihood Estimates                |
| Dependent variable                    LTIME |
| Weighting variable                      ONE |
| Number of observations                   96 |
| Iterations completed                      9 |
| Log likelihood function           -94.28102 |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
          RHS of hazard model
 Constant  1.859264488      .56577702        3.286   .0010
 X15        .3536846032E-01 .16964884E-01    2.085   .0371    7.7083333
 X17        .8117401209     .24761640        3.278   .0010    1.9593750
 X12       -.6399300532E-02 .55823521E-01    -.115   .9087    .81250000
          Ancillary parameters for survival
 Sigma      .3648248813     .34783222E-01   10.489   .0000
+----------------------------------------------------------------+
| Parameters of underlying density at data means:                |
| Parameter    Estimate    Std. Error   Confidence Interval      |
| ------------------------------------------------------------  |
| Lambda       .02430      .00157       .0212   to  .0274        |
| P            2.74104     .26134       2.2288  to  3.2533       |
| Median       41.14903    2.65177      35.9516 to  46.3465      |
| Percentiles of survival distribution:                          |
| Survival     .25       .50       .75       .95                 |
| Time         61.44     41.15     27.56     14.06               |
+----------------------------------------------------------------+
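The LOGISTIC model is a log-logistic duration model with survival S(t) = 1/[1 + (λt)^P], so the median is simply 1/λ and the duration at survival s is [(1 - s)/s]^(1/P)/λ. A sketch checking the reported percentiles against the printed λ and P:

```python
import math

# Estimates from the log-logistic (LOGISTIC) MLE output
lam, p = 0.02430, 2.74104

def survival(t):
    """Log-logistic survival S(t) = 1 / (1 + (lam*t)**p)."""
    return 1.0 / (1.0 + (lam * t) ** p)

def time_at_survival(s):
    """Duration at which survival equals s: ((1-s)/s)**(1/p) / lam."""
    return ((1.0 - s) / s) ** (1.0 / p) / lam

print(round(time_at_survival(0.50), 2))  # 41.15; reported median 41.14903
print(round(time_at_survival(0.25), 2))  # 61.44
print(round(time_at_survival(0.95), 2))  # about 14.06, as reported
```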
[Figure: Estimated hazard function for the log-logistic model. HazardFn (.000 to .040) plotted against Duration (0 to 240).]
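Unlike the Weibull, the log-logistic hazard h(t) = λP(λt)^(P-1)/[1 + (λt)^P] is non-monotonic when P > 1: it rises to a single peak at t* = (P - 1)^(1/P)/λ and then declines, which is the shape the estimated hazard traces over the 0 to 240 duration range. A sketch using the estimated λ and P:

```python
import math

# Estimates from the log-logistic MLE output
lam, p = 0.02430, 2.74104

def hazard(t):
    """Log-logistic hazard h(t) = lam*p*(lam*t)**(p-1) / (1 + (lam*t)**p)."""
    u = lam * t
    return lam * p * u ** (p - 1.0) / (1.0 + u ** p)

# For p > 1 the hazard peaks at t* = (p - 1)**(1/p) / lam, then declines
t_star = (p - 1.0) ** (1.0 / p) / lam
print(round(t_star, 1))                                            # about 50.4
print(hazard(10.0) < hazard(t_star) and hazard(200.0) < hazard(t_star))  # True
```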