Fred Mannering, Statistical and Econometric Methods – Course Notes, Purdue University © 2007
Course Notes:
Statistical and Econometric Methods
Professor Fred Mannering Purdue University – Spring 2007
Table of Contents

Review of Statistical Methods (Estimators and their statistical properties) 1
  Model Estimation 1
  Properties of Estimators 1
    Bias 1
    Efficiency 2
    Consistency 3
    Other Asymptotic Properties 4
  Least Squares and Maximum Likelihood Estimation 5
    Properties of Least Squares Estimators 5
    Maximum Likelihood Estimation 6
  Specification Issues and Least Squares 9
    Specification Error 9
    Non-zero Disturbance Mean 10
    Errors in Variables 10
    Correlation Between Explanatory Variables and Disturbances 11
    Selectivity Bias 11
    Non-normality of Disturbances 12
    Heteroskedasticity 12
    Serial Correlation 12
    Multicollinearity 13
  Simultaneous Equation Models 13
    Reduced Form and the Identification Problem 14
    The Identification Problem 16
      Order Condition 16
    Simultaneous Equation Estimation 16
      Single equation methods 17
      System equation methods 18
    A note on generalized least squares estimation 18
  Hypothesis Testing and Diagnostics for Continuous Dependent Variable Models 20
    Assessment of Estimated Coefficients 20
    Overall Model Assessment 21
Count Data Models 24
  Poisson Regression Model Goodness of Fit Measures 26
  Truncated Poisson Regression Model 27
  Negative Binomial Regression Model 28
  Zero-Inflated Poisson and Negative Binomial Regression Models 29
Discrete Outcome Models (Models of Discrete Data) 32
  Binary and Multinomial Probit Models 34
  Multinomial Logit Model 36
    Indirect Utility 42
    Properties and Estimation of Multinomial Logit Models 43
    Statistical Evaluation 45
    Interpretation of Findings 47
      Elasticity 47
      Cross-elasticity 48
      Marginal Rates of Substitution (MRS) 49
    Specification Errors 49
      Independence of Irrelevant Alternatives (IIA) property 49
      Other Specification Errors 51
    Endogeneity in Discrete Outcome Models 53
    Data Sampling 53
    Forecasting and Aggregation Bias 56
    Transferability 58
  The Nested Logit Model (Generalized Extreme Value Models) 59
  Special Properties of Logit Models 63
    Sub-sampling of alternate outcomes for model estimation 63
    Compensating Variation 63
Models of Ordered Discrete Data 64
Discrete/Continuous Models 69
  The Discrete/Continuous Modeling Problem 69
  Econometric Corrections: Instrumental Variables and Expected Value Method 70
  Econometric Corrections: Selectivity-Bias Correction Term 72
  Discrete/continuous Model Structures 74
    Reduced form approach 74
    Economic consistency approach 76
Duration Models 78
  Hazard-Based Duration Models 78
    Proportional hazards 81
    Accelerated Lifetime 82
  Characteristics of Duration Data 82
    Tied Data 84
  Non-Parametric Models 84
  Semi-Parametric Models 84
  Fully-Parametric Models 85
    Exponential 85
    Weibull 86
    Log-logistic 86
  Comparisons of Non-Parametric, Semi-Parametric, and Fully-Parametric Models 87
  State Dependence 90
  Time-Varying Covariates 91
  Discrete-Time Hazard Models 91
Course Assignments 93
Review of Statistical Methods
Estimators and their statistical properties

Model Estimation
Consider a model of household vehicle miles of travel – Household Miles Driven Over
Some Time Period:
yt = β0 + β1 xt + εt
where: yt Dependent Variable,
xt Independent Variable
β0, β1 Estimable Parameters
εt Disturbance or Error Term
Estimation problem is one of finding values for β0 and β1.
Properties of Estimators
Classes of properties:
• Small sample - Hold for any size sample • Asymptotic - Hold only as the limit of n → ∞
Bias
- Desirable to have the estimator distribution have a mean value equal to the true parameter.
- Define unbiasedness as E(β̂) = β
- For small sample unbiasedness: E(β̂) = β for all n
- For asymptotic unbiasedness: lim(n→∞) E(β̂) = β
In general, bias is defined as: Bias = E(β̂) − β
Illustration of biased estimators.
Efficiency
Efficiency is a small sample property.
One estimator is more efficient than another if it has smaller variance: (Both estimators
must be unbiased).
e.g., β̂1 is more efficient than β̂2 if VAR(β̂1) < VAR(β̂2)
The best unbiased estimator is the most efficient among all unbiased estimators.
The most efficient estimator is defined as having a smaller variance than any other
unbiased estimator.
[Figure: sampling distributions X̄ ~ N(μ, σ²/n) and a single observation X1 ~ N(μ, σ²); both have expected value μ, but X̄ has the smaller variance.]

Illustration of efficient estimators.
Identification of the best unbiased (most efficient) estimator is achieved by the Cramer-Rao theorem. Under a number of assumptions, it can be shown that for all estimators:

VAR(β̂) ≥ 1 / ( −E[ ∂²LL / ∂β² ] )

One can prove an estimator is most efficient if VAR(β̂) is equal to this Cramer-Rao bound.
Consistency
Consistency is an asymptotic property.
Definition: A consistent estimator has a distribution which collapses on the true
parameter value as the sample size increases.
β̂ converges to β in the probability limit if, for any δ > 0,

lim(n→∞) Prob( |β̂ − β| > δ ) = 0

Also, lim(n→∞) VAR(β̂) = 0
[Figure: probability densities f(X*) for increasing sample sizes n1 < n2 < n3 < n4; as n grows, the density collapses on μ.]
Note: Consistent estimators can be biased and inefficient; therefore, consistency is not a
strong property.
Other Asymptotic Properties
Desire to show that estimator's distribution can be approximated better and better as sample
size increases.
1. Asymptotically Normal - The estimator's distribution converges to a normal
distribution.
2. Asymptotic Efficiency - β̂n is asymptotically efficient if:

β̂n is consistent, and

β̂n's asymptotic variance is smaller than the asymptotic variance of all other consistent estimators
Models with Continuous Dependent Variables

Least Squares and Maximum Likelihood Estimation
Least Squares Estimation
The object of least squares is to fit an equation that minimizes the squared differences between equation predicted values and observed values (i.e., data).
The objective function is:

Min Σ (Yi − Ŷi)²

where: Yi - actual observations; Ŷi - fitted values

The term Yi − Ŷi is referred to as the residual and is denoted as εi.

For the case Yi = β0 + β1 xi it can be shown that:

β̂1 = [n Σ xi Yi − Σ xi Σ Yi] / [n Σ xi² − (Σ xi)²] = Σ (xi − x̄)(Yi − Ȳ) / Σ (xi − x̄)²

and β̂0 = Ȳ − β̂1 x̄
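As an illustration, the closed-form expressions above can be computed directly; a minimal sketch with made-up data values:

```python
# Sketch: closed-form least squares for Y = b0 + b1*x (illustrative data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)
xbar = sum(x) / n
Ybar = sum(Y) / n

# b1 = sum((x_i - xbar)(Y_i - Ybar)) / sum((x_i - xbar)^2)
b1 = sum((xi - xbar) * (Yi - Ybar) for xi, Yi in zip(x, Y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = Ybar - b1 * xbar  # intercept from the sample means

residuals = [Yi - (b0 + b1 * xi) for xi, Yi in zip(x, Y)]
print(b0, b1, sum(residuals))  # residuals sum to ~0 when an intercept is included
```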
For the case of many independent variables, least squares estimation can be represented in matrix form as

β̂ = (X′X)⁻¹ X′Y

where:

β̂ = [β̂1, β̂2, β̂3, ..., β̂K]′, the vector of estimated parameters;
K = number of independent variables (x's);
′ = indicates transposed matrix;
⁻¹ = indicates matrix inversion;
X = matrix of independent variables - N × K;
Y = vector of dependent variable - N × 1

Properties of Least Squares Estimators
Under very general assumptions, the Gauss-Markov theorem demonstrates the least squares estimators (OLS - Ordinary Least Squares) are BLUE.
BLUE - Best Linear Unbiased Estimator
Implies: Unbiased, Efficient
Assumptions required to prove OLS is BLUE:
A1. Normality: The disturbance term εi is normally distributed.
A2. Zero Mean:
E( iε ) = 0
A3. Homoskedasticity: Disturbance terms have the same variance.
( )2 2iE ε σ=
A4. Serial Independence: Disturbance terms are not correlated.
E(εiεj) = 0 ∀ i ≠ j
A5. Non-stochastic X: X is not random and has fixed values in repeated samples.
Maximum Likelihood Estimation
Principle: Different statistical populations generate different samples; any one sample is
more likely to come from some populations rather than others.
Example: If we have a sample of Y1, Y2, ... , Yn, we want to find the value of β most
likely to generate this sample.
Consider the simple model, Yi = β0 + β1 xi + εi. Assume (as in OLS) that Yi is normally distributed with mean β0 + β1 xi and variance σ²; therefore, the probability distribution can be written as:

P(Yi) = (1/(σ√(2π))) EXP[ −(1/(2σ²)) (Yi − β0 − β1 xi)² ]
The likelihood function is:
L(Y1, Y2, ..., YN, β0, β1, σ²) = P(Y1) P(Y2) ... P(YN) = Π(i=1 to N) (1/(σ√(2π))) EXP[ −(1/(2σ²)) (Yi − β0 − β1 xi)² ]
where Π is the product of N factors.
For simplicity, work is done with the logarithm of L rather than L itself. This is acceptable since L is always non-negative and the logarithmic function is monotonic (preserves ordering). Maximizing LN(L), denoted LL, with respect to β0, β1, and σ² gives:

∂LL/∂β0 = (1/σ²) Σ (Yi − β0 − β1 xi) = 0

∂LL/∂β1 = (1/σ²) Σ xi (Yi − β0 − β1 xi) = 0

∂LL/∂σ² = −N/(2σ²) + (1/(2σ⁴)) Σ (Yi − β0 − β1 xi)² = 0
Solving these equations gives:

β̂0 = Ȳ − β̂1 x̄ and β̂1 = Σ (xi − x̄)(Yi − Ȳ) / Σ (xi − x̄)²

which is equivalent to the OLS estimators. However, in general, MLEs are not necessarily BLUE.

Properties of MLEs (Maximum Likelihood Estimators):

1) They are consistent.
2) They are asymptotically normal.
3) They are asymptotically efficient (i.e., asymptotic variance = Cramer-Rao Bound)
Note: Maximum Likelihood Estimators are not generally unbiased or efficient.
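As a check on the equivalence claimed above, the sketch below (made-up data) evaluates the normal log-likelihood, with σ² set to its conditional MLE at each trial (β0, β1), at the closed-form estimates and at a nearby alternative; the closed-form values give the larger log-likelihood:

```python
import math

# Sketch: for the normal linear model the MLE of (b0, b1) coincides with OLS.
# Made-up data; sigma^2 is set to its conditional MLE, SSE/n, at each (b0, b1).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.1, 5.9, 8.2, 9.9]
n = len(x)
xbar, Ybar = sum(x) / n, sum(Y) / n
b1 = sum((xi - xbar) * (Yi - Ybar) for xi, Yi in zip(x, Y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = Ybar - b1 * xbar          # closed-form (OLS) estimates

def loglik(beta0, beta1):
    sse = sum((Yi - beta0 - beta1 * xi) ** 2 for xi, Yi in zip(x, Y))
    sigma2 = sse / n           # MLE of sigma^2 given (beta0, beta1)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - sse / (2 * sigma2)

ll_at_ols = loglik(b0, b1)
ll_nearby = loglik(b0 + 0.3, b1 - 0.2)
print(ll_at_ols > ll_nearby)   # True: the log-likelihood peaks at the OLS values
```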
Specification Issues and Least Squares

A. Specification Error

Refers to errors resulting from a misspecified model (i.e., functional form).

1) Omitted Variables
Suppose the true model is: yi = β1 x1i + β2 x2i + εi and we estimate: yi = β2* x2i + εi*

It can be shown by substitution that

E(β̂2*) = β2 + β1 [ COV(x1, x2) / VAR(x2) ]

Because there is no guarantee that the second term is equal to zero, the estimate of β2* in the misspecified equation will be biased.

Because this bias does not disappear as n → ∞, the parameter estimate will be inconsistent as well. However, if COV(x2, x1) = 0 (i.e., x2 and x1 are not correlated) then the estimators will be BLUE except for the intercept.
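A quick simulation (made-up coefficients and a chosen COV(x1, x2)) illustrates the omitted-variable bias: regressing y on x2 alone recovers β2 + β1·COV(x1, x2)/VAR(x2) rather than β2:

```python
import random

# Sketch: omitted-variable bias. True model y = 2*x1 + 3*x2 + e, but we regress
# y on x2 alone. x1 and x2 are correlated by construction (all values made up).
random.seed(1)
n = 20000
x1, x2, y = [], [], []
for _ in range(n):
    x2i = random.gauss(0.0, 1.0)                 # VAR(x2) = 1
    x1i = 0.5 * x2i + random.gauss(0.0, 1.0)     # COV(x1, x2) = 0.5
    x1.append(x1i); x2.append(x2i)
    y.append(2.0 * x1i + 3.0 * x2i + random.gauss(0.0, 1.0))

x2bar, ybar = sum(x2) / n, sum(y) / n
b2_star = sum((x2[i] - x2bar) * (y[i] - ybar) for i in range(n)) / \
          sum((x2i - x2bar) ** 2 for x2i in x2)

# Expected value: b2 + b1*COV(x1, x2)/VAR(x2) = 3 + 2*0.5 = 4
print(b2_star)  # close to 4.0, not the true 3.0
```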
2) Presence of Irrelevant Variables

Suppose the true model is yi = β2 x2i + εi and we estimate yi = β2* x2i + β3* x3i + εi*

The irrelevant variable x3 implies we are not accounting for the parameter restriction β3* = 0.

In general, not accounting for all available information leads to loss of efficiency, but no loss of consistency or bias.

So, β2* is unbiased and consistent: E(β̂2*) = β2
but it is not efficient since VAR(β̂2*) > VAR(β̂2)

The exception is when COV(x2, x3) = 0, in which case the estimators are again BLUE except for the intercept.
3) Nonlinearities
Suppose the true model is yi = β2 x2i + β3 x2i² + β4 x2i³ + εi

and we estimate yi = β2* x2i + εi*
This results in the same consequences as omitted variables (i.e., biased and inconsistent
parameter estimates).
B. Non-zero Disturbance Mean (violation of assumption A2)
i.e., E(εi) ≠ 0

Cause: Can result from consistent positive or negative errors of measurement in Y.

If an intercept (β0) is excluded, the parameter estimates will be biased and inconsistent.

If an intercept is included, it will be a biased estimate of the true intercept, but all other parameters will be BLUE.
C. Errors in Variables (violation of assumption A5)

If we have yi = β xi + εi and:

1) yi is measured with error, i.e., we use yi* = yi + μi (μi is the error)

If COV(μi, xi) = 0 then β̂ is unbiased and consistent

If COV(μi, xi) ≠ 0 then β̂ is biased and inconsistent
Statistical and Econometric Methods
Fred Mannering, Statistical and Econometric Methods – Course Notes, Purdue University © 2007
11
2) xi is measured with error

Then the parameters will be biased and inconsistent

3) yi and xi are measured with error

Then the parameters will be biased and inconsistent
D. Correlation Between Explanatory Variables and Disturbances (violation of assumption A5)
Implies x does not have fixed values in repeated samples (A5)
If x and εi are correlated, β will be a biased and inconsistent estimator of β.
This correlation problem is the same problem that results from endogenous variables,
and leads to simultaneous equation estimation techniques. E. Selectivity Bias
Arises when the available data sample is not representative of the entire population because of some selection process.

For example: an estimated VMT equation for new cars will be biased because households that drive more tend to buy new cars (i.e., we do not know how much people owning used cars would drive if they had new cars).
[Figure: scatter of observations from two groups (+ and −) with two fitted regression lines, illustrating how estimating on a selected sub-sample shifts the fitted line.]
Results in biased parameter estimates.
F. Non-normality of Disturbances (violation of assumption A1)
Causes: 1. Measurement errors 2. Unobserved parameter variations
Results in hypothesis testing problems (i.e., hypothesis testing depends crucially on the normality assumption). With failure of normality, OLS is inefficient but still consistent.

Diagnostics: 1. Specification tests 2. Plot residuals and see if they are normal
G. Heteroskedasticity (violation of assumption A3)
Results when the disturbance terms have variances that are not equal:

E(ε1²) ≠ E(ε2²) ≠ … ≠ E(εN²)

Causes: 1. Unequally sized observation units 2. Aggregation

Heteroskedasticity results in OLS estimates that are unbiased and consistent but not efficient.

Diagnostics: 1. Plot of squared residuals versus independent variable 2. Split sample regressions
H. Serial Correlation (violation of assumption A4)
Results when E(εiεj) ≠ 0 for some i ≠ j

Causes: 1. Persistent disturbances 2. Omitted smoothly changing variables 3. Time averaged data

Serial correlation results in OLS estimators that are generally unbiased and consistent but not efficient.
If lagged dependent variables are in a model that has serial correlation, the problems are
much more severe.
Diagnostic: 1. Durbin-Watson statistic
I. Multicollinearity
Results when independent variables are highly correlated.

Cause: 1. Lack of variation among data

OLS estimators in the presence of multicollinearity remain BLUE. However, the standard errors of the estimated coefficients can be quite large.

Diagnostic: 1. Condition number of X′X
Simultaneous Equation Models
Interrelated equations with continuous dependent variables:
Utilization of individual vehicles (measured in kilometers driven) in multivehicle
households
Interrelation between travel time from home to an activity and the duration of the
activity
Interrelation of average vehicle speeds by lane with the vehicle speeds in adjacent
lanes.
Problem:
Estimation of equation systems by ordinary least squares (OLS) violates a key OLS assumption in that a correlation between regressors and disturbances will be present, because not all independent variables are fixed in repeated samples (violation of A5).
Overview of the simultaneous equations problem
Consider annual vehicle utilization equations (one for each vehicle) in two-vehicle
households of the following linear form:
u1 = β1 Z1 + α1 X + λ1 u2 + ε1

u2 = β2 Z2 + α2 X + λ2 u1 + ε2
Where:
u1 is the kilometers per year that vehicle 1 is driven,
u2 is the kilometers per year that vehicle 2 is driven,
Z1 and Z2 are vectors of vehicle attributes (for vehicles 1 and 2 respectively),
X is a vector of household characteristics,
β's and α's are vectors of estimable parameters, λ's are estimable scalars, and ε's are disturbance terms.
To satisfy regression assumption A5, the value of the dependent variable (left-hand
side variable) must not influence the value of an independent variable (right-hand
side).
This is not the case in these equations because in the first equation the independent
variable u2 varies as the dependent variable u1 varies, and in the second equation, the
independent variable u1 varies as the dependent variable u2 varies.
Thus, u2 and u1 are said to be endogenous variables in Equations 5.1 and 5.2
respectively.
Reduced Form and the Identification Problem
Reduced form solution: solving two equations and two unknowns to arrive at reduced forms.
Substituting second equation into the first in the previous example:
u1 = β1 Z1 + α1 X + λ1 [β2 Z2 + α2 X + λ2 u1 + ε2] + ε1
rearranging,
u1 = [β1/(1 − λ1λ2)] Z1 + [(α1 + λ1α2)/(1 − λ1λ2)] X + [λ1β2/(1 − λ1λ2)] Z2 + (ε1 + λ1ε2)/(1 − λ1λ2)

and similarly, substituting the first equation for u1 in the second equation gives,

u2 = [β2/(1 − λ2λ1)] Z2 + [(α2 + λ2α1)/(1 − λ2λ1)] X + [λ2β1/(1 − λ2λ1)] Z1 + (ε2 + λ2ε1)/(1 − λ2λ1)
Because the endogenous variables u1 and u2 are replaced by their exogenous determinants, the equations can be estimated using ordinary least squares (OLS) as,
u1 = a1 Z1 + b1 X + c1 Z2 + ξ1, and

u2 = a2 Z2 + b2 X + c2 Z1 + ξ2,
where,
a1 = β1/(1 − λ1λ2); b1 = (α1 + λ1α2)/(1 − λ1λ2); c1 = λ1β2/(1 − λ1λ2); ξ1 = (ε1 + λ1ε2)/(1 − λ1λ2)

a2 = β2/(1 − λ2λ1); b2 = (α2 + λ2α1)/(1 − λ2λ1); c2 = λ2β1/(1 − λ2λ1); ξ2 = (ε2 + λ2ε1)/(1 − λ2λ1).
OLS estimation of these reduced form models (Equations 5.6 and 5.7) is called indirect least
squares (ILS).
Problem: While estimated reduced form models are readily used for forecasting purposes, if
inferences are to be drawn from the model system, the underlying parameters need to be
determined.
Unfortunately, uncovering the underlying parameters, (the β's, α's, and λ's) in reduced form
models is problematic because either too little or too much information is often available.
For example, note that the above equations provide two possible solutions for β1,

β1 = a1(1 − λ1λ2) and β1 = c2(1 − λ2λ1)/λ2.
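Plugging made-up structural values into the reduced-form coefficient definitions shows how β1 can be recovered two different ways (the source of over identification once the a's and c's are estimated rather than known):

```python
# Sketch: reduced-form coefficients for the two-equation utilization system,
# with made-up scalar parameter values (beta, alpha, lambda chosen arbitrarily).
beta1, alpha1, lam1 = 1.5, 0.8, 0.4
beta2, alpha2, lam2 = 2.0, 0.6, 0.3
d = 1.0 - lam1 * lam2          # common denominator 1 - lambda1*lambda2

a1 = beta1 / d                 # coefficient on Z1 in the u1 reduced form
b1 = (alpha1 + lam1 * alpha2) / d
c1 = (lam1 * beta2) / d
a2 = beta2 / d
c2 = (lam2 * beta1) / d

# Two ways to recover beta1 from reduced-form coefficients:
beta1_from_a1 = a1 * d
beta1_from_c2 = c2 * d / lam2
print(beta1_from_a1, beta1_from_c2)  # both recover beta1 = 1.5 (up to floating
# point); with *estimated* reduced-form coefficients the two generally differ.
```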
The Identification Problem
In some instances, it may be impossible to determine the underlying parameters. In
these cases, the modeling system is said to be unidentified.
In cases where exactly one equation solves the underlying parameters, the model system
is said to be exactly identified.
When more than one equation solves the underlying parameters (as shown in Equation
5.10), the model system is said to be over identified.
Order Condition
Determines an equation to be identified if the number of all variables excluded from an
equation in an equation system is greater than or equal to the number of endogenous
variables in the equation system minus one.
For example, in the first equation in the original equation system above, the number of
elements in the vector Z2, which is an exogenous vector excluded from the equation,
must be greater than or equal to one because there are two endogenous variables in the
equation system (u1 and u2).
Simultaneous Equation Estimation
1) Two modeling alternatives: single-equations estimation methods and systems
estimation methods.
2) The distinction between the two is that systems methods consider all of the
parameter restrictions (caused by over identification) in the entire equation system
and account for possible contemporaneous (cross-equation) correlation of
disturbance terms.
3) Because system estimation approaches are able to utilize more information
(parameter restrictions and contemporaneous correlation), they produce variance-
covariance matrices that are at worst equal to, and in most cases smaller than those
produced by single-equation methods (resulting in lower standard errors and higher
t-statistics for estimated model parameters).
Single equation methods
1) Indirect least squares (ILS)
Applies ordinary least squares to the reduced form models.
Consistent but not unbiased
2) Instrumental variables (IV)
1) Uses an instrument (a variable that is highly correlated with the endogenous variable it replaces, but is not correlated to the disturbance term) to estimate individual equations
2) Consistent but not unbiased.
3) Two-stage least squares (2SLS)
Approach finds the best instrument for endogenous variables.
Stage 1 regresses each endogenous variable on all exogenous variables.
Stage 2 uses regression-estimated values from stage 1 as instruments, and
estimates equations with ordinary least squares.
Consistent but not unbiased. Generally better small sample properties than ILS
or IV.
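A minimal 2SLS sketch for one equation of a two-equation system (simulated data with made-up coefficients; the two-regressor helper solves the normal equations directly):

```python
import random

# Sketch: two-stage least squares for u1 = lam1*u2 + alpha1*x + e1, where u2
# is endogenous because it depends on e1. z2 is exogenous and excluded from
# the u1 equation, so it supplies the instrument. All values are made up.
random.seed(7)

def ols2(w, v, y):
    # OLS of y on two regressors [w, v] (no intercept), via normal equations.
    sww = sum(a * a for a in w); svv = sum(a * a for a in v)
    swv = sum(a * b for a, b in zip(w, v))
    swy = sum(a * b for a, b in zip(w, y))
    svy = sum(a * b for a, b in zip(v, y))
    det = sww * svv - swv * swv
    return (swy * svv - svy * swv) / det, (sww * svy - swv * swy) / det

n = 20000
x  = [random.gauss(0, 1) for _ in range(n)]
z2 = [random.gauss(0, 1) for _ in range(n)]
e1 = [random.gauss(0, 1) for _ in range(n)]
u2 = [z2[i] + 0.5 * x[i] + 0.8 * e1[i] + random.gauss(0, 1) for i in range(n)]
u1 = [0.6 * u2[i] + 0.7 * x[i] + e1[i] for i in range(n)]  # lam1 = 0.6

# Stage 1: regress the endogenous u2 on all exogenous variables (z2 and x).
g1, g2 = ols2(z2, x, u2)
u2_hat = [g1 * z2[i] + g2 * x[i] for i in range(n)]

# Stage 2: use the stage-1 fitted values as the instrument and apply OLS.
lam1_2sls, alpha1_2sls = ols2(u2_hat, x, u1)
lam1_ols, alpha1_ols = ols2(u2, x, u1)  # naive OLS, biased since COV(u2, e1) > 0

print(lam1_2sls, lam1_ols)  # 2SLS near 0.6; OLS pushed above it
```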
4) Limited Information Maximum Likelihood (LIML)
Uses maximum likelihood to estimate reduced form models. Can incorporate
parameter restrictions in over identified equations.
Consistent but not unbiased. Has same asymptotic variance-covariance matrix
as 2SLS.
System equation methods
1) Three Stage Least Squares (3SLS)
Stage 1 gets 2SLS estimates of the model system.
Stage 2 uses the 2SLS estimates to compute residuals to determine cross-equation correlations.
Stage 3 uses generalized least squares (GLS) to estimate model parameters.
Consistent and more efficient than single-equation estimation methods.
2) Full Information Maximum Likelihood (FIML)
Similar to LIML but accounts for contemporaneous correlation of disturbances in the likelihood function.
Consistent and more efficient than single-equation estimation methods. Has same asymptotic variance-covariance matrix as 3SLS.
A note on generalized least squares estimation
Ordinary least squares (OLS) assumptions are that disturbance terms have equal variances
and are not correlated. Generalized least squares (GLS) is used to relax these OLS
assumptions. Under OLS assumptions, in matrix notation,
E(εεᵀ) = σ² I
where:
E(.) denotes expected value,
ε is an n × 1 column vector of equation disturbance terms (where n is the total number of observations in the data), εᵀ is the 1 × n transpose of ε,
σ 2 is the disturbance term variance, and
I is the n × n identity matrix,
    | 1 0 . 0 |
I = | 0 1 . 0 |
    | . . . . |
    | 0 0 . 1 | .
When heteroskedasticity is present, E(εεᵀ) = Ω, where Ω is the n × n matrix,

    | σ1²  0    .  0   |
Ω = | 0    σ2²  .  0   |
    | .    .    .  .   |
    | 0    0    .  σn² | .
For disturbance-term correlation, E(εεᵀ) = σ² Ω, where

    | 1     ρ1    .  ρN−1 |
Ω = | ρ1    1     .  ρN−2 |
    | .     .     .  .    |
    | ρN−1  ρN−2  .  1    |
Recall that in ordinary least squares, parameters are estimated from,
β̂ = (XᵀX)⁻¹ XᵀY,
where:
β̂ is a p × 1 column vector (where p is the number of parameters),
X is an n × p matrix of data,
Xᵀ is the transpose of X, and
Y is an n × 1 column vector.
Using Ω, Equation 5A.5 is rewritten as,
β̂ = (XᵀΩ⁻¹X)⁻¹ XᵀΩ⁻¹Y.
The most difficult aspect of GLS estimation is obtaining an estimate of the Ω matrix. In
3SLS, it is estimated using the initial 2SLS parameter estimates.
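For the special case of a diagonal Ω (heteroskedasticity only), the GLS estimator reduces to weighted least squares. A sketch for a single-regressor model with made-up data and disturbance variances:

```python
# Sketch: GLS with a diagonal Omega. For y_i = beta*x_i + e_i with
# VAR(e_i) = sig2_i, the estimator (X' Omega^-1 X)^-1 X' Omega^-1 Y reduces to
#   beta_hat = sum(x_i*y_i/sig2_i) / sum(x_i^2/sig2_i)  (weighted least squares)
x    = [1.0, 2.0, 3.0, 4.0]
y    = [2.2, 3.8, 6.1, 8.3]
sig2 = [0.5, 1.0, 2.0, 4.0]  # made-up disturbance variances (Omega diagonal)

num = sum(xi * yi / s for xi, yi, s in zip(x, y, sig2))
den = sum(xi * xi / s for xi, s in zip(x, sig2))
beta_gls = num / den

beta_ols = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
print(beta_gls, beta_ols)  # both consistent; GLS downweights the noisier points
```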
Hypothesis Testing and Diagnostics for Continuous Dependent Variable Models
The objective of hypothesis testing and diagnostics is to determine the "best" model fit to a specified data set.

A. Assessment of Estimated Coefficients
The most commonly used statistic to evaluate coefficients is the t-statistic, defined as:

tDF = (β̂ − β) / Sβ̂

where: tDF is the t-statistic with DF = N − K degrees of freedom (N minus the number of coefficients in the model); β̂ is the estimated parameter; β is the value of the parameter being tested against (usually zero); Sβ̂ is the standard error of β̂ (i.e., the square root of VAR(β̂)).
Example: Suppose we estimate the model Y = A + Bx with 30 observations and find:

Coeff.   Value    Standard Error   t-Stat*
A         2.47     1.92             1.29
B        −3.13     1.33            −2.35

* t-Stat is calculated with β = 0
Wish to test whether A and B are significantly different from zero.
For both DF = N-2 = 28
Wish to test that A > 0 and B < 0 use a one-tailed t-test.
From tables we find the critical values for: t0.90, 28 = 1.313 90% confidence level
t0.99, 28 = 2.467 99% confidence level
The hypotheses are: HO : A, B = 0
HA : A > 0, B < 0
For A, 1.29 < 1.313 so we can only be about 89% confident that A > 0
For B, |−2.35| > 1.313 but |−2.35| < 2.467, so we can be more than 90% (but less than 99%) confident that B < 0
If we want to test A ≠ 0 and B ≠ 0 we use a two-tailed test:
From tables we find the critical values for:
t0.90, 28 = 1.701 at 90% confidence level
t0.99, 28 = 2.763 at 99% confidence level
The hypotheses are: HO : A, B = 0

HA : A ≠ 0, B ≠ 0
We will be less confident since critical t-values are larger for the two-tailed test.
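The t-statistics from the example above can be reproduced directly:

```python
# Sketch: the t-tests from the example above (A = 2.47, SE = 1.92;
# B = -3.13, SE = 1.33; N = 30, so DF = 28), testing against beta = 0.
t_A = (2.47 - 0.0) / 1.92
t_B = (-3.13 - 0.0) / 1.33

t_90_one_tail = 1.313  # critical values for DF = 28, from t tables
t_99_one_tail = 2.467

print(round(t_A, 2), round(t_B, 2))  # 1.29 and -2.35, as in the table
print(t_A > t_90_one_tail)           # False: cannot reject A = 0 at 90%
print(abs(t_B) > t_90_one_tail)      # True: reject B = 0 at 90% (one-tailed)
```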
B. Overall Model Assessment
1) R-Squared
The most commonly used statistic is the R-squared.
R-squared is the ratio of data variance explained by the model to total data variance:

R² = Σ (Ŷi − Ȳ)² / Σ (Yi − Ȳ)² = explained variation / total variation in Y
or

R² = 1 − Σ ei² / Σ (Yi − Ȳ)² = 1 − Residual Variation (SSR) / Total Variation in Y
Generally, the higher the R-squared value, the better. However, it is important to
consider:
a) The Amount of Variance in the Data
Data with little variance may produce high R2's, but the model is not explaining
much.
Conversely, data with much variance may produce low R2's, but may still be
explaining much of the underlying process.
As a rule: It may be better to explain a little of a lot of variance rather than a lot of a
little variance.
b) The Number of Independent Variables in the Model
The R2 statistic will always increase as more variables are added.
To resolve this problem, the corrected R-squared statistic is used:
R̄² = 1 − (1 − R²) (N − 1)/(N − K)
where: N = number of observations
K = number of parameters in the model
The corrected R̄² accounts for the number of variables in the model and therefore can decline when additional variables are added.
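A sketch computing R² and the corrected R̄² from made-up observed and fitted values:

```python
# Sketch: R-squared and corrected (adjusted) R-squared for a fitted model,
# using made-up observed and fitted values; K counts all estimated parameters.
Y     = [3.0, 5.0, 7.0, 6.0, 9.0]
Y_hat = [3.4, 4.6, 6.8, 6.4, 8.8]
N, K = len(Y), 2

Ybar = sum(Y) / N
ss_total    = sum((yi - Ybar) ** 2 for yi in Y)
ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(Y, Y_hat))

r2 = 1.0 - ss_residual / ss_total
r2_adj = 1.0 - (1.0 - r2) * (N - 1) / (N - K)  # penalizes extra variables
print(round(r2, 3), round(r2_adj, 3))
```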
2) F-Statistic
The F-statistic is used to test whether the model is significantly different from zero (i.e.,
if a relation exists or not).
The F-statistic tests the joint hypothesis that all parameters are equal to zero
For finding critical values of F (i.e., from tables), the degrees of freedom are K − 1 and N − K
where: N = number of observations
K = number of parameters in the model
Generally, if t-stats and R2's are good, F-stat will be OK.
3) Durbin-Watson Statistic
This statistic is used to test for the presence of serial correlation (auto correlation) of
disturbances.
The further away the statistic is from 2.0, the less confident we can be about the
absence of serial correlation.
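The Durbin-Watson statistic is simple to compute from a residual series (made-up residuals here):

```python
# Sketch: Durbin-Watson statistic from an ordered residual series (made up).
# DW = sum((e_t - e_{t-1})^2) / sum(e_t^2); values near 2 suggest no serial
# correlation, near 0 positive correlation, near 4 negative correlation.
e = [0.5, 0.6, 0.4, -0.2, -0.5, -0.3, 0.1, 0.4]

dw = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e))) / \
     sum(et ** 2 for et in e)
print(round(dw, 3))  # well below 2: suggests positive serial correlation
```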
4) Condition Number
Is used to determine the extent of multicollinearity.
It is derived from the characteristic roots of the X'X matrix.
Condition number = largest characteristic root / smallest characteristic root
CN < 10 No multicollinearity
10 < CN < 100 Some Problems
CN > 100 Serious multicollinearity
Count Data Models
Count data consist of non-negative integer values
Examples:
number of driver route changes per week,
the number of trip departure changes per week,
drivers' frequency-of-use of ITS technologies over some time period,
the number of accidents observed on road segments per year.
Count data can be properly modeled by using a number of methods, the most popular of
which are Poisson and negative binomial regression models.
Poisson Regression Model
Consider the number of accidents occurring per year at various intersections in a city.
In a Poisson regression model, the probability of intersection i having yi accidents per year
(where yi is a non-negative integer) is given by:
P(yi) = [ EXP(−λi) λi^yi ] / yi!
Where:
P(yi) is the probability of intersection i having yi accidents per year
λi is the Poisson parameter for intersection i, which is equal to
intersection i's expected number of accidents per year, E[yi].
Poisson regression models are estimated by specifying the Poisson parameter λi (the
expected number of events per period) as a function of explanatory variables.
The most common relationship between explanatory variables and the Poisson parameter is
the log-linear model,
λi = EXP(βXi) or, equivalently, LN(λi) = βXi
Where:
Xi is a vector of explanatory variables and
β is a vector of estimable coefficients.
In this formulation, the expected number of events per period is given by
E[yi] = λi = EXP(βXi)
For model estimation, note the likelihood function is:
L(β) = Πi P(yi)
So, with the Poisson equation,
L(β) = Πi [ EXP(−λi) λi^yi ] / yi!
Since λi = EXP(βXi),

L(β) = Πi EXP[ −EXP(βXi) ] [ EXP(βXi) ]^yi / yi!
Which gives the log-likelihood,
LL(β) = Σ(i=1 to n) [ −EXP(βXi) + yi βXi − LN(yi!) ].
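The log-likelihood above can be evaluated directly; a sketch with made-up data and a trial β:

```python
import math

# Sketch: Poisson regression log-likelihood LL(beta) for a single explanatory
# variable plus an intercept, at a trial beta (data and beta are made up).
X = [0.5, 1.0, 1.5, 2.0]  # one explanatory variable per intersection
y = [1, 2, 2, 4]          # observed accident counts
beta = [0.2, 0.9]         # trial coefficients: intercept and slope

def loglik(b):
    ll = 0.0
    for xi, yi in zip(X, y):
        bx = b[0] + b[1] * xi  # beta'X_i
        # LL contribution: -EXP(beta'X_i) + y_i*beta'X_i - LN(y_i!)
        ll += -math.exp(bx) + yi * bx - math.log(math.factorial(yi))
    return ll

print(loglik(beta))
```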
Poisson Regression Model Goodness of Fit Measures
The likelihood ratio test is a common test used to assess two competing models. It provides evidence in support of one model over the other.
The likelihood ratio test statistic is,
-2[LL(βR) – LL (βU)]
where
LL(βR) is the log-likelihood at convergence of the "restricted" model (sometimes
considered to have all coefficients in β equal to 0, or just to include the constant
term, to test overall fit of the model)
LL(βU) is the log-likelihood at convergence of the unrestricted model.
This statistic is χ2 distributed with degrees of freedom equal to the difference in the
numbers of coefficients in the restricted and unrestricted models (the difference in the
number of coefficients in the βR and βU coefficient vectors).
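As a quick numerical sketch of the test (the two log-likelihood values and the coefficient counts below are hypothetical, chosen only to illustrate the arithmetic):

```python
from scipy.stats import chi2

# Hypothetical log-likelihoods at convergence: a constant-only restricted model
# versus an unrestricted model with three additional coefficients.
LL_restricted, LL_unrestricted = -412.7, -398.2
df = 3  # difference in the number of estimated coefficients

lr_statistic = -2.0 * (LL_restricted - LL_unrestricted)   # here: 29.0
p_value = chi2.sf(lr_statistic, df)
```

A statistic of 29.0 on 3 degrees of freedom is far beyond the usual χ2 critical values, so the restriction would be rejected.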
Another measure of overall model fit is the ρ2 statistic. The ρ2 statistic is,
\rho^2 = 1 - \frac{LL(\beta)}{LL(0)}
Where:
LL(β) is the log-likelihood at convergence with coefficient vector β and
LL(0) is the initial log-likelihood (with all coefficients set to zero).
The perfect model would have a likelihood function equal to one (all selected alternative
outcomes would be predicted by the model with probability one, and the product of these
across the observations would also be one), so the log-likelihood would be zero, giving a
ρ2 of one.
The ρ2 statistic will be between zero and one; the closer it is to one, the more
variance the estimated model explains.
Truncated Poisson Regression Model
Truncation of data can occur in the routine collection of transportation data.
For example, if the data are the number of times per week an in-vehicle navigation system is
used on the morning commute to work (weekdays only), the data are right truncated at 5,
which is the maximum number of uses in any given week.
Estimating a Poisson regression model without accounting for this truncation will result in
biased estimates of the parameter vector β, and erroneous inferences will be drawn.
Fortunately, the Poisson model is adapted easily to account for such truncation. The right-
truncated Poisson model is written as:
P(y_i) = \frac{\lambda_i^{y_i} / y_i!}{\sum_{m=0}^{r}\left[\lambda_i^{m} / m!\right]} ,
Where:
P(yi) is the probability of commuter i using the system yi times per week,
λi is the Poisson parameter for commuter i;
mi is the number of uses per week;
and r is the right truncation (in this case, 5 times per week).
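A direct sketch of this probability mass function (the Poisson parameter λ = 2.0 below is a hypothetical value for illustration):

```python
import math

def right_truncated_poisson_pmf(y, lam, r):
    """P(y) = (lam^y / y!) / sum_{m=0}^{r} [lam^m / m!], right truncated at r."""
    if not 0 <= y <= r:
        return 0.0
    denom = sum(lam**m / math.factorial(m) for m in range(r + 1))
    return (lam**y / math.factorial(y)) / denom

# All probability mass is redistributed over 0..5 uses per week.
probs = [right_truncated_poisson_pmf(y, 2.0, 5) for y in range(6)]
```

The truncation simply renormalizes the Poisson mass over the observable range, so every probability is inflated relative to the untruncated Poisson.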
Negative Binomial Regression Model
The Poisson distribution restricts the mean and variance to be equal:
E[yi] = VAR[yi].
If this equality does not hold, the data are said to be underdispersed (E[yi] > VAR[yi]) or
overdispersed (E[yi] < VAR[yi]), and the coefficient vector will be biased if corrective
measures are not taken.
To account for cases when E[yi] ≠ VAR[yi], a negative binomial model is used.
The negative binomial model is derived by rewriting the λi equation such that,
λi = EXP(βXi + εi)
where EXP(εi) is a Gamma-distributed error term with mean 1 and variance α2.
The addition of this term allows the variance to differ from the mean as below,
VAR[y_i] = E[y_i]\left[1 + \alpha E[y_i]\right] = E[y_i] + \alpha E[y_i]^2
The Poisson regression model is regarded as a limiting model of the negative binomial
regression model as α approaches zero, which means that the selection between these two
models is dependent upon the value of α.
The parameter α is referred to as the overdispersion parameter.
The negative binomial distribution has the form,
P(y_i) = \frac{\Gamma\left((1/\alpha) + y_i\right)}{\Gamma(1/\alpha)\, y_i!}\left(\frac{1/\alpha}{(1/\alpha) + \lambda_i}\right)^{1/\alpha}\left(\frac{\lambda_i}{(1/\alpha) + \lambda_i}\right)^{y_i}
where Γ(.) is a gamma function. This results in the likelihood function,
L(\lambda_i) = \prod_i \frac{\Gamma\left((1/\alpha) + y_i\right)}{\Gamma(1/\alpha)\, y_i!}\left(\frac{1/\alpha}{(1/\alpha) + \lambda_i}\right)^{1/\alpha}\left(\frac{\lambda_i}{(1/\alpha) + \lambda_i}\right)^{y_i}
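The distribution above corresponds to a standard negative binomial parameterization; a sketch of the mean–variance relation and of the Poisson limit as α → 0, using scipy's negative binomial (with the mapping size n = 1/α, success probability p = n/(n + λ) — a standard translation, stated here as an assumption about notation rather than taken from the notes):

```python
import numpy as np
from scipy.stats import nbinom, poisson

def neg_binomial_pmf(y, lam, alpha):
    # Mean lam, variance lam + alpha*lam^2, via scipy's (n, p) parameterization.
    n = 1.0 / alpha
    return nbinom.pmf(y, n, n / (n + lam))

lam = 3.0
ys = np.arange(15)
# With a tiny overdispersion parameter the pmf should be indistinguishable from Poisson.
pmf_small_alpha = neg_binomial_pmf(ys, lam, 1e-6)
```

With λ = 3 and α = 0.5, for instance, the implied variance is λ + αλ² = 7.5, versus 3 under the Poisson restriction.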
Zero-Inflated Poisson and Negative Binomial Regression Models
Zero events can arise from two qualitatively different conditions.
1. One condition may result from simply failing to observe an event during the observation
period.
2. Another qualitatively different condition may result from an inability to ever experience
an event.
Two states can be present, one being a normal count-process state and the other being a zero-
count state.
A zero-count state may refer to situations where the likelihood of an event occurring is
extremely rare in comparison to the normal-count state, where event occurrence is inevitable
and follows some known count process.
Two aspects of this non-qualitative distinction of the zero state are noteworthy:
1. There is a preponderance of zeroes in the data—more than would be expected under a
Poisson process.
2. A sampling unit is not required to be in the zero or near zero state into perpetuity, and
can move from the zero or near zero state to the normal count state with positive
probability.
Data obtained from two-state regimes (normal-count and zero-count states) often suffer from
overdispersion if considered as part of a single, normal-count state because the number of zeroes
is inflated by the zero-count state.
Zero-inflated Poisson (ZIP)
Assumes that the events, Y = (y1, y2,……,yn), are independent and the model is
y_i = 0 \quad \text{with probability} \quad p_i + (1 - p_i)\, EXP(-\lambda_i)

y_i = y \quad \text{with probability} \quad \frac{(1 - p_i)\, EXP(-\lambda_i)\, \lambda_i^{y}}{y!}, \quad y = 1, 2, 3, \ldots
where y is the number of events per period.
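A minimal sketch of the ZIP probability mass function (the parameter values λ = 2.0 and p = 0.3 are hypothetical, chosen only for illustration):

```python
import math

def zip_pmf(y, lam, p):
    """Zero-inflated Poisson: extra mass p from the zero state,
    Poisson counts with probability 1 - p."""
    poisson_term = math.exp(-lam) * lam**y / math.factorial(y)
    if y == 0:
        return p + (1.0 - p) * poisson_term
    return (1.0 - p) * poisson_term

pmf = [zip_pmf(y, 2.0, 0.3) for y in range(60)]
```

Relative to a plain Poisson with the same λ, the mass at zero is inflated while the probabilities of positive counts are scaled down by (1 − p).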
Zero-inflated negative binomial (ZINB)
regression model follows a similar formulation with events, Y = (y1, y2,……, yn ), being
independent and,
y_i = 0 \quad \text{with probability} \quad p_i + (1 - p_i)\, u_i^{1/\alpha}

y_i = y \quad \text{with probability} \quad (1 - p_i)\, \frac{\Gamma\left((1/\alpha) + y\right)}{\Gamma(1/\alpha)\, y!}\, u_i^{1/\alpha}\left(1 - u_i\right)^{y}, \quad y = 1, 2, 3, \ldots

where u_i = (1/\alpha)\big/\left[(1/\alpha) + \lambda_i\right].
Zero-inflated models imply that the underlying data-generating process has a splitting
regime that provides for two types of zeros.
The splitting process can be assumed to follow a logit (logistic) or probit (normal)
probability process, or other probability processes.
A point to remember is that there must be underlying justification to believe the splitting
process exists (resulting in two distinct states) prior to fitting this type of statistical
model. There should be a basis for believing that part of the process is in a zero-count
state.
To test the appropriateness of using a zero-inflated model rather than a traditional model, Vuong
(1989) proposed a test statistic for non-nested models that is well suited for situations where the
distributions (Poisson or negative binomial) are specified. The statistic is calculated as (for each
observation i),
m_i = LN\left[\frac{f_1\left(y_i \mid X_i\right)}{f_2\left(y_i \mid X_i\right)}\right]
where:
f1(yi|Xi) is the probability density function of model 1, and
f2(yi|Xi) is the probability density function of model 2.
Using this, Vuong's statistic for testing the non-nested hypothesis of model 1 versus model 2
is (Greene, 2000; Shankar et al., 1997),
V = \frac{\sqrt{n}\left[\left(1/n\right)\sum_{i=1}^{n} m_i\right]}{\sqrt{\left(1/n\right)\sum_{i=1}^{n}\left(m_i - \bar{m}\right)^2}} = \frac{\sqrt{n}\,\bar{m}}{S_m}

Where: \bar{m} is the mean of the m_i, and S_m is the standard deviation of the m_i.
Vuong's value is asymptotically standard normal distributed (it is compared with z-values),
and
if |V| is less than Vcritical (1.96 for a 95% confidence level), the test does not support the
selection of one model over the other.
Large positive values of V greater than Vcritical favor model 1 over model 2, whereas large
negative values support model 2.
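The statistic is simple to compute once the fitted densities are in hand. A sketch on synthetic values (the fitted densities below are fabricated so that model 1 consistently outperforms model 2; real values would come from the two estimated models):

```python
import numpy as np

def vuong_statistic(f1, f2):
    """V = sqrt(n) * mean(m) / std(m), with m_i = LN(f1_i / f2_i)."""
    m = np.log(np.asarray(f1)) - np.log(np.asarray(f2))
    return np.sqrt(m.size) * m.mean() / m.std()

# Hypothetical fitted densities for 200 observations, built so model 1 dominates.
rng = np.random.default_rng(7)
f1 = rng.uniform(0.2, 0.4, size=200)
f2 = f1 * np.exp(-rng.normal(0.5, 0.1, size=200))
V = vuong_statistic(f1, f2)   # large and positive, favoring model 1
```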
Because overdispersion will almost always include excess zeros, it is not always easy to
determine whether excess zeros arise from true overdispersion or from an underlying
splitting regime.
This could lead one to erroneously choose a negative binomial model when the correct
model may be a zero-inflated Poisson.
The use of a zero-inflated model may simply be capturing model misspecification that could
result from factors such as unobserved effects (heterogeneity) in the data.
Discrete Outcome Models
Examples of discrete data (unordered):
Mode of travel (automobile, bus, rail transit),
Type or class of vehicle owned, and
Type of a vehicular accident (run-off-road, rear-end, head-on, etc.).
Examples of discrete data (ordered):
telecommuting-frequency data that have outcomes of never, sometimes, and
frequently
In contrast to data that are not ordered, ordinal discrete data possess additional
information on the ordering of responses that can be used to improve the
efficiency of the model’s parameter estimates
Models of Discrete Data
For unordered discrete outcomes, start with a linear function of covariates that influences
specific discrete outcomes.
For example, in the event of a vehicular accident, possible discrete crash outcomes are rear-
end, sideswipe, run-off-road, head-on, turning, and other.
Let Tin be a linear function that determines discrete outcome i for observation n such that,
Tin = βi Xin ,
Where:
βi is a vector of estimable parameters for discrete outcome i,
Xin is a vector of the observable characteristics (covariates) that determine discrete
outcomes for observation n.
To arrive at a statistically estimable probabilistic model, a disturbance term εin is
added, giving
Tin = βi Xin + εin .
Reasons for adding a disturbance term:
1. variables have been omitted from the function (some important data may not be
available),
2. the functional form may be incorrectly specified (it may not be linear),
3. proxy variables may be used (variables that approximate missing variables in the
database),
4. variations in βi that are not accounted for (βi may vary across observations).
To derive an estimable model of discrete outcomes with I denoting all possible outcomes for
observation n, and Pn(i) being the probability of observation n having discrete outcome i (i
∈ I)
Pn(i) = P(Tin ≥ TIn) ∀ I ≠ i .
By substituting for Tin,
Pn(i) = P(βi Xin + εin ≥ βI XIn + εIn) ∀ I ≠ i
or,
Pn(i) = P(βi Xin − βI XIn ≥ εIn − εin) ∀ I ≠ i .
Estimable models are developed by assuming a distribution of the random disturbance
term, ε′s.
Binary and Multinomial Probit Models
Probit models arise when the disturbance term εIn is assumed to be normally distributed.
In the binary case (two outcomes, denoted 1 or 2)
Pn(1) = P(β1 X1n − β2X2n ≥ ε2n − ε1n)
This equation estimates the probability of outcome 1 occurring for observation n, where
ε1n and ε2n are normally distributed with mean = 0, variances σ1² and σ2² respectively,
and covariance σ12.
An attractive feature of normally distributed variates is that the addition or
subtraction of two normal variates also produces a normally distributed variate.
In this case ε2n − ε1n is normally distributed with mean zero and variance
σ1² + σ2² − 2σ12. The resulting cumulative normal function is
P_n(1) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\beta_1 X_{1n} - \beta_2 X_{2n}} EXP\left[-\frac{1}{2}\left(\frac{w}{\sigma}\right)^2\right] dw
If Φ ( ) is the standardized cumulative normal distribution, then
P_n(1) = \Phi\left(\frac{\beta_1 X_{1n} - \beta_2 X_{2n}}{\sigma}\right)

where \sigma = \left(\sigma_1^2 + \sigma_2^2 - 2\sigma_{12}\right)^{0.5}.
The term 1/σ is a scaling of the function determining the discrete outcome and can be set to
any positive value, although σ = 1 is typically used.
General shape of probit outcome probabilities.
The parameter vector (β) is readily estimated using standard maximum likelihood methods.
If δin is defined as being equal to 1 if the observed discrete outcome for observation n is i
and zero otherwise, the likelihood function is
L = \prod_{n=1}^{N} \prod_{i=1}^{I} \left[P_n(i)\right]^{\delta_{in}}
where N is the total number of observations. In the binary case with i = 1 or 2, the log-
likelihood is,
LL = \sum_{n=1}^{N} \left[\delta_{1n}\, LN\, \Phi\left(\frac{\beta_1 X_{1n} - \beta_2 X_{2n}}{\sigma}\right) + \left(1 - \delta_{1n}\right) LN\left(1 - \Phi\left(\frac{\beta_1 X_{1n} - \beta_2 X_{2n}}{\sigma}\right)\right)\right]
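A minimal sketch of maximizing this binary probit log-likelihood on simulated data (the sample, covariates, and "true" parameters are all hypothetical; σ is normalized to 1 as discussed above):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

# Simulated binary outcomes; X holds the differenced covariates.
rng = np.random.default_rng(3)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.2, 0.8])                      # assumed for the simulation
delta = (X @ beta_true + rng.normal(size=n) > 0).astype(float)   # delta_1n

def neg_log_likelihood(beta):
    z = norm.cdf(X @ beta)                 # Phi(beta X / sigma) with sigma = 1
    z = np.clip(z, 1e-12, 1 - 1e-12)       # guard against LN(0)
    return -np.sum(delta * np.log(z) + (1 - delta) * np.log(1 - z))

result = minimize(neg_log_likelihood, np.zeros(2), method="BFGS")
beta_hat = result.x
```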
The problem with the multinomial probit is that the outcome probabilities are not closed
form and estimation of the likelihood functions requires numerical integration.
The difficulties of extending the probit formulation to more than two discrete outcomes have
led researchers to consider other disturbance-term distributions.
Multinomial Logit Model
From a model estimation perspective, a desirable property of an assumed distribution of
disturbances (ε′s) is that the maximums of randomly drawn values from the distribution
have the same distribution as the values from which they were drawn.
The normal distribution does not possess this property (the maximums of randomly drawn
values from the normal distribution are not normally distributed).
A disturbance-term distribution with such a property greatly simplifies model
estimation because it could be applied to the multinomial case by replacing β2X2n with
the highest value (maximum) of all other βI XIn, I ≠ 1.
Distributions of the maximums of randomly drawn values from some underlying distribution
are referred to as extreme value distributions (Gumbel, 1958).
Extreme value distributions are categorized into three families: Type 1, Type 2, and Type 3
(see Johnson and Kotz, 1970).
The most common extreme value distribution is the Type 1 distribution (sometimes
referred to as the Gumbel distribution). It has the desirable property that maximums
of randomly drawn values from the extreme value Type 1 distribution are also extreme
value Type 1 distributed.
The probability density function of the extreme value Type 1 distribution is,
f(\varepsilon) = \eta\, EXP\left[-\eta\left(\varepsilon - \omega\right)\right] EXP\left[-EXP\left(-\eta\left(\varepsilon - \omega\right)\right)\right]
with corresponding distribution function
F(\varepsilon) = EXP\left[-EXP\left(-\eta\left(\varepsilon - \omega\right)\right)\right]
where: η is a positive scale parameter, ω is a location parameter (mode), and the mean is
ω + 0.5772/η.
To derive an estimable model based on the extreme value Type 1 distribution, a revised
version of the probability equation is
P_n(i) = P\left(\beta_i X_{in} + \varepsilon_{in} \geq \max_{\forall I \neq i}\left(\beta_I X_{In} + \varepsilon_{In}\right)\right)
For the extreme value Type 1 distribution, if all εIn are independently and identically
(same variances) distributed random variates with modes ωIn and a common scale
parameter η (which implies equal variances), then the maximum of βI XIn + εIn’s is
extreme value Type 1 distributed with mode
\frac{1}{\eta}\, LN\!\left[\sum_{\forall I \neq i} EXP\left(\eta\, \beta_I X_{In}\right)\right]
and scale parameter η (see Gumbel 1958).
Illustration of an extreme value Type 1 distribution (densities shown for ω = 0 with η = 0.5, 1, and 2).
If εn' is a disturbance term associated with the maximum of all possible discrete
outcomes ≠ i with mode equal to zero and scale parameter η, and
β 'Xn' is the parameter and covariate product associated with the maximum of all possible
discrete outcomes ≠ i, then it can be shown that
\beta' X_n' = \frac{1}{\eta}\, LN\!\left[\sum_{\forall I \neq i} EXP\left(\eta\, \beta_I X_{In}\right)\right]
This result arises because for extreme value Type 1 distributed variates, ε, the addition
of a positive scalar constant say, a, changes the mode from ω to ω + a without affecting
the scale parameter η.
So, if εn' has mode equal to zero and scale parameter η, adding the scalar
(1/η) LN[ Σ∀I≠i EXP(η βI XIn) ] gives an extreme value distributed variate with mode
(β′Xn′) equal to (1/η) LN[ Σ∀I≠i EXP(η βI XIn) ] and scale parameter η.
Using these results, the probability equation is written as,
P_n(i) = P\left(\beta_i X_{in} + \varepsilon_{in} \geq \beta' X_n' + \varepsilon_n'\right)

or,

P_n(i) = P\left(\beta' X_n' - \beta_i X_{in} + \varepsilon_n' - \varepsilon_{in} \leq 0\right)
And, because the difference between two independently distributed extreme value Type 1
variates with common scale parameter η is logistic distributed,
P_n(i) = \frac{1}{1 + EXP\left[\eta\left(\beta' X_n' - \beta_i X_{in}\right)\right]}
rearranging terms,
P_n(i) = \frac{EXP\left(\eta\, \beta_i X_{in}\right)}{EXP\left(\eta\, \beta_i X_{in}\right) + EXP\left(\eta\, \beta' X_n'\right)}
Substituting β′Xn′ = (1/η) LN[ Σ∀I≠i EXP(η βI XIn) ] and setting η = 1 (there is no loss of
generality), the equation becomes

P_n(i) = \frac{EXP\left(\beta_i X_{in}\right)}{EXP\left(\beta_i X_{in}\right) + EXP\left[LN \sum_{\forall I \neq i} EXP\left(\beta_I X_{In}\right)\right]}
or,
P_n(i) = \frac{EXP\left(\beta_i X_{in}\right)}{\sum_{\forall I} EXP\left(\beta_I X_{In}\right)}
which is the standard multinomial logit formulation. For estimation of the parameter vectors
(β’s) by maximum likelihood, the log-likelihood function is,
LL = \sum_{n=1}^{N} \sum_{i=1}^{I} \delta_{in}\left[\beta_i X_{in} - LN \sum_{\forall I} EXP\left(\beta_I X_{In}\right)\right]
where I is the total number of outcomes and δ and all other variables are as previously
defined.
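The multinomial logit probabilities are simple to compute; a sketch with hypothetical βi Xin values for three outcomes (subtracting the maximum before exponentiating is a standard numerical-stability device, not part of the formula itself):

```python
import numpy as np

def mnl_probabilities(V):
    """P(i) = EXP(V_i) / sum_I EXP(V_I); the max is subtracted for numerical stability."""
    e = np.exp(V - np.max(V))
    return e / e.sum()

# Hypothetical beta_i X_in values for three outcomes.
V = np.array([-0.5, -1.2, -0.9])
P = mnl_probabilities(V)   # sums to 1; the outcome with the largest V gets the largest P
```

Note that adding any constant to all of the V's leaves the probabilities unchanged, which is why variables that do not vary across outcomes can enter at most I − 1 of the functions.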
When applying the multinomial logit model it is important to realize that the choice of the
extreme value Type 1 distribution is made on the basis of computational convenience,
although this distribution is similar to the normal distribution.
Figure 11-3: Comparison of binary logit and probit outcome probabilities.
Discrete Data and Utility Theory
From economics, utility (satisfaction) is maximized subject to the prices of the alternatives
and an income constraint.
Because utility theory consists of decision-makers selecting a utility maximizing alternative
based on prices of alternatives and an income constraint, any purchase affects the remaining
income and thus all purchases are interrelated.
Problem: one theoretically cannot isolate specific choice situations.
Restrictions must be placed on utility functions.
To illustrate these, a utility function is defined that is determined by the consumption of m
goods (y1, y2,…, ym) such that
u = f(y1, y2,…, ym)
As an extremely restrictive case it is assumed that the consumption of one good is
independent of the consumption of all other goods. The utility function is then written as
u = f1(y1) + f2(y2) +…..+ fm(ym)
This is referred to as an additive utility function and, in nearly all applications, it is
unrealistically restrictive.
Example: the application of such an assumption implies that the acquisition of two types of
breakfast cereal are independent although it is clear that the purchase of one will affect the
purchase of the other.
A more realistic restriction on the utility function is to separate decisions into groups and to
assume that consumption of goods within the groups is independent of those goods in other
groups.
This is referred to as separability and is an important construct in applied economic theory.
It is this property that permits the focus on specific choice groups such as the choices of
travel mode to work.
Indirect Utility
Normal or direct utility has utility that is maximized subject to an income constraint and this
maximization produces a demand for goods y1, y2,…, ym.
When applying discrete outcome models, the utility function is typically written with prices
and incomes as arguments.
When the utility function is written in this way, the utility function is indirect, and it can be
shown that the relationship between this indirect utility and the resulting demand equation
for some good m is given by Roy's identity
y_m^0 = -\frac{\partial V / \partial p_m}{\partial V / \partial Inc}
Where:
V is the indirect utility,
pm is the price of good m,
Inc is the decision-maker's income, and
ym0 is the utility maximizing demand for good m.
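As a small worked check of Roy's identity, consider a hypothetical linear indirect utility V = b0 + b1·pm + b2·Inc (the coefficient values below are invented for illustration); the identity then gives the demand y_m0 = −b1/b2:

```python
def indirect_utility(p_m, inc, b=(2.0, -0.5, 0.04)):
    # Hypothetical linear indirect utility: V = b0 + b1*p_m + b2*Inc.
    b0, b1, b2 = b
    return b0 + b1 * p_m + b2 * inc

# Numerical partial derivatives at an arbitrary point.
h = 1e-4
p, inc = 3.0, 50000.0
dV_dp = (indirect_utility(p + h, inc) - indirect_utility(p - h, inc)) / (2 * h)
dV_dinc = (indirect_utility(p, inc + h) - indirect_utility(p, inc - h)) / (2 * h)

# Roy's identity: y_m0 = -(dV/dp_m) / (dV/dInc) = -(-0.5)/0.04 = 12.5
demand = -dV_dp / dV_dinc
```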
Applying the utility framework within discrete outcome models is straightforward.
Using the notation above, T becomes the utility determining the choice (as opposed to a
function determining the outcome).
But the derivations of discrete outcome models imply that the model is compensatory.
Changes in factors that determine the function Tin for each discrete outcome do not
matter as long as the total value of the function remains the same.
This is potentially problematic in some utility-maximizing choice situations.
Properties and Estimation of Multinomial Logit Models
Consider a commuter's choice of route from home to work where the choices are to take an
arterial, a two-lane road, or a freeway.
P(a) = \frac{e^{V_a}}{e^{V_a} + e^{V_t} + e^{V_f}}, \quad P(t) = \frac{e^{V_t}}{e^{V_a} + e^{V_t} + e^{V_f}}, \quad P(f) = \frac{e^{V_f}}{e^{V_a} + e^{V_t} + e^{V_f}}
where P(a), P(t) and P(f), are the probabilities that commuter n selects the
arterial, two-lane road and freeway respectively and Va, Vt and Vf are
corresponding indirect utility functions.
Variables defining these functions are classified into two groups:
1. those that vary across outcome alternatives (in route choice, distance
and number of traffic signals)
2. those that do not vary across outcome alternatives (commuter income
and other commuter-specific characteristics such as number of children,
number of vehicles, and age of commuting vehicle).
The distinction between these two sets of variables is important, because the
MNL model is derived using the difference in utilities.
Because of this differencing, estimable parameters relating to variables that do
not vary across outcome alternatives can, at most, be estimated in I-1 of the
functions determining the discrete outcome (I is the total number of discrete
outcomes).
The parameter of at least one of the discrete outcomes must be normalized to
zero to make parameter estimation possible (this is illustrated in a forthcoming
example).
Given these two variables types, the utility functions for Equation 11.26 are
defined as
Va = β1a + β2a Xa + β3a Z
Vt = β1t + β2t Xt + β3t Z ,
Vf = β1f + β2f Xf + β3f Z
Where:
Xa, Xt and Xf are vectors of variables that vary across arterial, two-
lane, and freeway choice outcomes respectively, as experienced by
commuter n,
Z is a vector of characteristics specific to commuter n,
β1's are constant terms,
β2's are vectors of estimable parameters corresponding to outcome-
specific variables in X vectors, and
β3's are vectors corresponding to variables that do not vary across
outcome alternatives.
Note that the constant terms are effectively the same as variables that do not vary
across alternate outcomes and at most are estimated for I-1 of the outcomes.
Statistical Evaluation
To determine if the estimated parameter is significantly different from zero, the t-
statistic is:
t = \frac{\beta - 0}{S.E.(\beta)}
where S.E.(β) is the standard error of the parameter.
Note that because the MNL is derived from an extreme value distribution and not
a normal distribution, the use of t-statistics is not strictly correct although in
practice it is a reliable approximation of the true significance.
A more general and appropriate test is the likelihood ratio test.
The likelihood ratio test statistic is
-2[LL(βR) – LL(βU)]
where LL(βR) is the log-likelihood at convergence of the "restricted" model and
LL(βU) is the log-likelihood at convergence of the "unrestricted" model.
This statistic is χ2 distributed with degrees of freedom equal to the difference in
the numbers of parameters between the restricted and unrestricted models (the
difference in the number of parameters in the βR and βU parameter vectors).
A measure of overall model fit is the ρ2 statistic (similar in purpose to R2 in
regression models). The ρ2 statistic is:
\rho^2 = 1 - \frac{LL(\beta)}{LL(0)}
where LL(β) is the log-likelihood at convergence with parameter vector β and
LL(0) is the initial log-likelihood (with all parameters set to zero).
As is the case with R2 in regression analysis, the disadvantage of the ρ2 statistic is
that it will always improve as additional parameters are estimated even though
the additional parameters may be statistically insignificant.
To account for the estimation of potentially insignificant parameters a corrected
ρ2 is estimated as
\rho^2_{\text{corrected}} = 1 - \frac{LL(\beta) - K}{LL(0)}
where K is the number of parameters estimated in the model.
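The arithmetic of both statistics in one place (the log-likelihood values and parameter count below are hypothetical, chosen only to illustrate the correction):

```python
# Hypothetical values from an MNL estimation.
LL_0, LL_beta, K = -1050.0, -880.0, 7

rho_sq = 1.0 - LL_beta / LL_0                    # uncorrected fit measure
rho_sq_corrected = 1.0 - (LL_beta - K) / LL_0    # penalizes the K estimated parameters
```

Since LL(0) is negative, subtracting K in the numerator always makes the corrected value smaller than the uncorrected one, reflecting the penalty for added parameters.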
Interpretation of Findings
Elasticity
Elasticity is computed from the partial derivative for each observation n (n
subscripting omitted):
E^{P(i)}_{x_{ki}} = \frac{\partial P(i)}{\partial x_{ki}} \times \frac{x_{ki}}{P(i)}
Where:
P(i) is the probability of outcome i
xki is the value of variable k for outcome i.
It is readily shown by taking the partial derivative of the MNL model that
Equation 11.33 becomes
E^{P(i)}_{x_{ki}} = \left[1 - P(i)\right] x_{ki}\, \beta_{ki}
Elasticity values are interpreted as the percent effect that a 1% change in xki has
on the outcome probability P(i).
If the computed elasticity value is less than one, the variable xki is said to be
inelastic and a 1% change in xki will have less than a 1% change in outcome i's
selection probability.
If the computed elasticity is greater than one it is said to be elastic and a 1%
change in xki will have more than a 1% change in outcome i's selection
probability.
Key points:
1. The values are point elasticities and as such are valid only for small changes in xki;
considerable error may be introduced when an elasticity is used to estimate the probability
change caused by a doubling of xki.
2. Elasticities are not applicable to indicator variables
Some measure of the sensitivity of indicator variables is obtained by computing a
pseudo-elasticity. The equation is
E^{P_n(i)}_{x_{ki}} = \frac{EXP\left[\Delta\left(\beta_{ki} x_{ki}\right)\right] \sum_{\forall I} EXP\left(\beta_I X_{In}\right)}{EXP\left[\Delta\left(\beta_{ki} x_{ki}\right)\right] \sum_{I_n} EXP\left(\beta_I X_{In}\right) + \sum_{I \neq I_n} EXP\left(\beta_I X_{In}\right)} - 1 ,
where In is the set of alternate outcomes with xk in the function determining the outcome,
and I is the set of all possible outcomes.
Cross-elasticity
It may be of interest to determine the effect that a variable influencing the probability of
outcome j has on the probability of outcome i.
Example, how a change in the distance on the arterial affects the probability of the two-lane
road being chosen.
Known as a cross-elasticity, this value is computed using the equation
E^{P(i)}_{x_{kj}} = -P(j)\, x_{kj}\, \beta_{kj}
Note that this equation implies that there is one cross-elasticity for all i (i ≠ j). This means
that an increase in distance on the arterial results in an equal percentage increase in the
likelihood that the two-lane and freeway alternatives will be chosen.
This property of uniform cross elasticities is an artifact of the error term independence
assumed in deriving the multinomial logit model.
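The uniform cross-elasticity property can be verified numerically on a hypothetical three-outcome MNL (all utility values, the distance x = 2.0, and its parameter −0.3 below are invented for illustration):

```python
import numpy as np

# Hypothetical beta_I X_In values for arterial, two-lane, and freeway.
bx = np.array([1.0, 0.4, -0.2])
P = np.exp(bx) / np.exp(bx).sum()

# Raise the arterial's distance (hypothetical value 2.0, parameter -0.3) by 1%.
x_arterial, beta_dist = 2.0, -0.3
bx_new = bx + np.array([beta_dist * x_arterial * 0.01, 0.0, 0.0])
P_new = np.exp(bx_new) / np.exp(bx_new).sum()

# Finite-difference elasticities of the OTHER two outcomes' probabilities.
pct_change = (P_new - P) / P / 0.01
# Closed-form cross-elasticity: -P(arterial) * x * beta.
cross_elasticity = -P[0] * x_arterial * beta_dist
```

The two non-arterial outcomes experience exactly the same proportional change, matching the closed-form value: this is the IIA artifact described above.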
Marginal Rates of Substitution (MRS)
Because logit models are compensatory, marginal rates of substitution are computed to
determine the relative magnitude of any two parameters estimated in the model.
In MNL models, this rate is computed simply as the ratio of parameters for any two
variables in question (in this case a and b)
MRS_{ab}(i) = \frac{\beta_{ia}}{\beta_{ib}}
Specification Errors
Independence of Irrelevant Alternatives (IIA) property.
Recall that a critical assumption in the derivation of the multinomial logit model
is that the disturbances (ε′s) are independently and identically distributed.
When this assumption does not hold, a major specification error results.
This problem arises when only some of the functions determining possible
outcomes share unobserved elements (which show up in the disturbances).
If all outcomes shared the same unobserved effects, the problem would self
correct because in the differencing of outcome functions common unobserved
effects would cancel out.
Because the common elements cancel in the differencing, a logit model with only
two outcomes can never have an IIA violation.
To illustrate the IIA problem, note the ratio of any two-outcome probabilities is
independent of the functions determining any other outcome since
\frac{P_1}{P_2} = \frac{EXP\left(\beta_1 X_1\right)}{EXP\left(\beta_2 X_2\right)}
Problem: consider the estimation of a model of choice of travel mode to work where the
alternatives are to take a personal vehicle, a red transit bus, or a blue transit bus.
The red and blue transit buses clearly share unobserved effects that will appear in
their disturbance terms and they will have exactly the same functions (βrbXrb =
βbbXbb) if the only difference in their observable characteristics is their color.
For illustrative purposes, assume that, for a sample commuter, all three modes
have the same value of βiXi (the red bus and blue bus will by construction; assume
also that the costs, times, and other factors that determine the likelihood of the
personal vehicle being chosen work out to the same value as for the buses).
Then the predicted probabilities will give each mode a 33% chance of being
selected.
This is unrealistic since the correct answer should be a 50% chance of taking a
personal vehicle and a 50% chance of taking a bus (both red and blue bus
combined) and not 33.33% and 66.67% respectively as the MNL would predict.
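The red bus/blue bus failure is easy to see numerically (the common utility value 1.0 is arbitrary; any equal values give the same result):

```python
import numpy as np

# Car, red bus, and blue bus with identical utility-function values.
V = np.array([1.0, 1.0, 1.0])
P = np.exp(V) / np.exp(V).sum()   # MNL mechanically assigns each mode 1/3
```

The MNL gives each alternative 1/3, whereas the intuitively correct split is 1/2 for the car and 1/4 for each bus color.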
In most applications the IIA violation is more subtle than in the previous example.
There are a number of statistical tests that can be conducted to test for IIA violations.
One of the more common of these tests was developed by Small and Hsiao (1985). The
procedure is to first split the data randomly into two samples (NA and NB) containing the
same number of observations. Two separate models are then estimated producing
parameter estimates βA and βB. A weighted average of these parameters is obtained from
\beta_{AB} = \left(\frac{1}{\sqrt{2}}\right)\beta_A + \left(1 - \frac{1}{\sqrt{2}}\right)\beta_B
Then, a restricted set of outcomes, D, is created as a sub-sample from the full set of
outcomes. The sample NB is then reduced to include only those observations in which the
observed outcome lies in D.
Two models are estimated with the reduced sample (NB') using D as if it were the entire
outcome set ( B' in superscripting denotes the sample reduced to observations with outcomes
in D).
One model is estimated by constraining the parameter vector to be equal to βAB as computed
above. The second model estimates the unconstrained parameter vector βB'.
The resulting log-likelihoods are used to evaluate the suitability of the model structure by
creating a chi squared statistic with the number of degrees of freedom equal to the number
of parameters in βAB (also the same number as in βB'). This statistic is
χ2 = -2[LLB'(βAB) – LLB'(β B')]
The test is then repeated by interchanging the roles of the NA and NB sub-samples (reducing
the NA sample to observations where the observed outcomes lie in D and proceeding as
before). Using the same notation, Equation 11.39 becomes
βBA = (1/√2)βB + (1 − 1/√2)βA
and the chi-squared statistic is
χ2 = -2[LLA'(βBA) – LLA'(βA')]
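The Small-Hsiao computation can be sketched in code. This is an illustrative sketch, not the original authors' implementation: the parameter vectors and log-likelihood values passed in below are hypothetical, and only the weighting and test-statistic steps given above are implemented.

```python
import numpy as np
from scipy.stats import chi2

def small_hsiao_weight(beta_A, beta_B):
    # Weighted average: beta_AB = (1/sqrt(2))*beta_A + (1 - 1/sqrt(2))*beta_B
    w = 1.0 / np.sqrt(2.0)
    return w * np.asarray(beta_A) + (1.0 - w) * np.asarray(beta_B)

def small_hsiao_stat(ll_restricted, ll_unrestricted, n_params):
    # chi2 = -2*[LL_B'(beta_AB) - LL_B'(beta_B')], df = number of parameters
    stat = -2.0 * (ll_restricted - ll_unrestricted)
    p_value = 1.0 - chi2.cdf(stat, df=n_params)
    return stat, p_value

# Hypothetical parameter vectors and log-likelihood values:
beta_AB = small_hsiao_weight([1.0, -2.0], [0.5, -1.0])
stat, p = small_hsiao_stat(ll_restricted=-520.7, ll_unrestricted=-515.2, n_params=2)
```

A small p-value rejects the hypothesis that the two estimates are the same, suggesting an IIA violation.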
Other Specification Errors
• Omitted variables.
Results in inconsistent estimates of logit model parameters and choice probabilities if
any of the following hold:
1) the omitted variable is correlated with other variables included in the model,
2) the mean values of the omitted variable vary across alternate outcomes and
outcome specific constants are not included in the model, or
3) the omitted variable is correlated across alternate outcomes or has a different
variance in different outcomes.
Because one or more of these conditions are likely to hold, omitting relevant variables is
a serious specification problem.
• Presence of an irrelevant variable.
Estimates of parameter and choice probabilities remain consistent in the presence of an
irrelevant variable but the standard errors of the parameter estimates will increase (loss
of efficiency).
• Disturbances that are not independently and identically distributed (IID).
Dependence among a subset of possible outcomes causes the IIA problem resulting in
inconsistent parameter estimates and outcome probabilities.
Having disturbances with different variances (not identically distributed) also results in
inconsistent parameter estimates and outcome probabilities.
• Random parameter variations.
Standard MNL estimation assumes that the estimated parameters are the same for all
observations.
If parameters actually vary randomly across observations, this assumption gives
inconsistent estimates of parameters and outcome probabilities.
• Correlation between explanatory variables and disturbances and endogenous
variables.
If a correlation exists between X and ε then parameter estimates will be inconsistent.
• Erroneous data.
If erroneous data are used, parameter and outcome probabilities will be incorrectly
estimated (also erroneous).
• State dependence and heterogeneity.
A potential estimation problem can arise in discrete outcome models if information on
previous outcomes is used to determine current outcome probabilities. This may be
capturing:
1. important habitual behavior (state dependence), or
2. residual heterogeneity, which would lead one to observe spurious state
dependence.
Endogeneity in Discrete Outcome Models
One can argue that price is endogenous in a vehicle choice model, for example (as it would
surely be in an aggregate regression model of vehicle demand).
Logic: because the effect of any single observation on vehicle price is infinitesimal, many
have contended that price is exogenous for estimation purposes.
Still, if one were to forecast used-vehicle market shares using the model (using the
summation of individual outcome probabilities as a basis), vehicle prices would need to
be forecasted internally as a function of total vehicle demand.
Authors of recent work have argued that variables such as price still present a
problem in individual discrete outcome models. This is because prices tend to be
higher for products that have attributes that are observed by the consumer and not
by the analyst.
Data Sampling
There are two general types of sampling strategies for collecting data to estimate discrete
outcome models, random and stratified random sampling.
All standard MNL model derivations assume that the data used to estimate the model are
drawn randomly from a population of observations.
Stratified sampling refers to a host of non-random sampling alternatives. The idea of
stratified sampling is that some known population of observations is partitioned into
subgroups (strata) and random sampling is conducted in each of these subgroups.
This type of sampling is particularly useful when one wants to gain information on a
specific group that is a small percentage of the total population of observations (such as
transit riders in the choice of transportation mode or households with incomes exceeding
$200,000 per year).
Note that random sampling is a special case of stratified sampling in which the number of
observations chosen from each stratum is in exact proportion to the size of the stratum in the
population of observations.
Four special cases of stratified sampling
1. Exogenous sampling refers to sampling in which selection of the strata is based on
values of the X (right-hand side, such as income) variables. In such cases, standard
maximum likelihood estimation (treating the sample as though it were a random sample)
is appropriate.
2. Outcome-based sampling may be used to get a sufficient representation of a specific
outcome or may be an artifact of the data-gathering process.
If the proportions of outcomes in the sample are not equal to the proportions of
outcomes in the overall population, an estimation correction must be made.
The correction is straightforward providing that a full set of outcome-specific
constants is specified in the model (because only differences across alternate
outcomes matter, I − 1 constants must be specified, where I is the number of
outcomes).
Under these conditions, standard MNL estimation correctly estimates all
parameters except for the outcome-specific constants. To correct the constant
estimates, each constant must have the following subtracted from it:
LN(SFi / PFi)
Where:
SFi is the fraction of observations having outcome i in the sample
PFi is the fraction of observations having outcome i in the total
population.
3. Enriched sampling is the merging of a random (or random stratified) sample with a
sample of another type.
Example: a random sample of route choices may be merged with a sample of
commuters observed taking one of the routes such as the freeway.
Some types of enriched sampling problems reduce to the same correction used
for the outcome-based samples. Others may result in estimation complications.
4. Double sampling usually refers to the process where information from a random sample
is used to direct the gathering of additional data often targeted at oversampling
underrepresented components of the population.
Estimation of MNL models with double sampling complicates the likelihood
function. The reader is referred to Manski and Lerman (1977), Manski and
McFadden (1981), and Cosslett (1981) for details on estimation alternatives in
sampling and additional sampling strategies.
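The constant correction for outcome-based sampling (case 2 above) is easy to compute once the sample and population outcome fractions are known. A minimal sketch; the mode names and fractions below are hypothetical:

```python
import math

def constant_correction(sample_fracs, population_fracs):
    # LN(SF_i / PF_i): subtract this from each estimated outcome-specific
    # constant after fitting an MNL on an outcome-based sample.
    return {i: math.log(sample_fracs[i] / population_fracs[i])
            for i in sample_fracs}

# Hypothetical case: transit riders are 50% of the sample but 10% of the population.
corr = constant_correction({"transit": 0.5, "car": 0.5},
                           {"transit": 0.1, "car": 0.9})
```

An oversampled outcome gets a positive correction (its constant was inflated), an undersampled outcome a negative one.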
Forecasting and Aggregation Bias
When using logit models for forecasting, there is a potential for bias if averages of X are
used, because of the nonlinearity of the model.
Example of population aggregation bias: the outcome probability evaluated at the
average of x, P(i|xavg), differs from the average of the individual probabilities, P(i|xab).

All bias in estimating outcome shares in a population is eliminated by the equation
Si = ∫X gi(X) h(X) dX
where:
Si is the population share of outcome i,
gi(X) is the functional representation of the model,
h(X) is the distribution of model variables over the population.
The problem in applying this equation is that h(X) is not completely known.
However, there are four common approximations to this equation:
1. Sample enumeration.
This procedure involves using the same sample that was used to estimate
the model to predict the population probabilities.
Outcome probabilities for all observations in the sample are computed
and these probabilities are averaged to approximate the true population
outcome probabilities for each alternative.
2. Density functions.
Density functions fitted to the x's can be used to approximate the
equation.
The advantage of this approach is that it has great flexibility in applying
results to different populations.
The disadvantages are that the functions are often difficult to construct on
theoretical or empirical grounds and it is difficult to capture covariances
among x's.
3. Distribution moments.
This approach attempts to describe h(X) by considering moments and
cross moments to represent the spread and shape of variable distributions
and their interactions in the population.
The major limitation of this approach is the gathering of theoretical
and/or empirical information to support the representation of moments
and cross moments.
4. Classification.
The classification approach attempts to categorize the population into
nearly homogeneous groups and use averages of x's from these groups.
This approach is easy to apply but the assumption of homogeneity among
population groups is often dubious and can introduce considerable error.
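Sample enumeration (approximation 1 above) can be sketched directly. This is an illustrative sketch; the sample data and parameter values below are hypothetical:

```python
import numpy as np

def mnl_probs(X, betas):
    # MNL probabilities: X is (n_obs, n_vars), betas is (n_outcomes, n_vars).
    v = X @ betas.T
    ev = np.exp(v - v.max(axis=1, keepdims=True))  # numerically stabilized
    return ev / ev.sum(axis=1, keepdims=True)

def sample_enumeration_shares(X, betas):
    # Approximate population shares S_i by averaging the predicted
    # probabilities over the estimation sample.
    return mnl_probs(X, betas).mean(axis=0)

# Hypothetical sample and parameter values:
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
betas = np.array([[0.5, -1.0, 0.2],
                  [0.0, 0.3, -0.4],
                  [-0.2, 0.1, 0.6]])
shares = sample_enumeration_shares(X, betas)
```

Averaging individual probabilities (rather than evaluating the model at average X) is exactly what avoids the aggregation bias discussed above.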
Transferability
A concern with all models is whether or not their estimated parameters are transferable
spatially (among regions or cities) or temporally (over time).
From a spatial perspective, transferability is desirable because it means that parameters of
models estimated in other places can be used, thus saving the cost of additional data
collection and estimation.
Temporal transferability ensures that forecasts made with the model have some validity in
that the estimated parameters are stable over time.
When testing spatial and temporal transferability, likelihood ratio tests are applied,
χ2 = −2[LL(βT) − LL(βa) − LL(βb)]
where:
LL(βT) is the log-likelihood at convergence of the model estimated
with the data from both regions (or both time periods),
LL(βa) is the log-likelihood at convergence of the model using region
a data, and
LL(βb) is the log-likelihood at convergence of the model using region
b data.
In this test the same variables are used in all three models (total model, region a model,
and region b model). This statistic is χ2 distributed with degrees of freedom equal to the
summation of the number of estimated parameters in all regional models (a and b in this
case but additional regions can be added to this test) minus the number of estimated
parameters in the overall model.
The resulting χ2 statistic provides the probability that the models have different
parameters. Alternatively, one could conduct the following test,
χ2 = −2[LL(βba) − LL(βa)]
where
LL(βba) is the log-likelihood at convergence of a model using the
converged parameters from region b (using only region b's data) on
region a's data (restricting the parameters to be region b's estimated
parameters),
LL(βa) is the log-likelihood at convergence of the model using region
a data.
This test can also be reversed using LL(βab) and LL(βb).
The statistic is χ2 distributed with the degrees of freedom equal to the number of
estimated parameters in βba and the resulting χ2 statistic provides the probability
that the models have different parameters.
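The first transferability test can be sketched as a simple likelihood-ratio calculation. The log-likelihood values and parameter counts below are hypothetical:

```python
from scipy.stats import chi2

def transferability_test(ll_total, ll_regions, k_total, k_regions):
    # chi2 = -2*[LL(beta_T) - sum_r LL(beta_r)]
    # df = (sum of regional parameter counts) - (pooled parameter count)
    stat = -2.0 * (ll_total - sum(ll_regions))
    df = sum(k_regions) - k_total
    return stat, df, 1.0 - chi2.cdf(stat, df)

# Hypothetical log-likelihoods for a pooled model and two regional models,
# each estimated with 6 parameters:
stat, df, p = transferability_test(-1520.0, [-740.0, -770.0], 6, [6, 6])
```

A small p-value indicates the regional models have different parameters, i.e., the model is not transferable.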
The Nested Logit Model (Generalized Extreme Value Models)
To overcome the IIA problem, the idea behind a nested logit model is to group alternate
outcomes suspected of sharing unobserved effects into nests (this sharing sets up the
disturbance term correlation that violates the derivation assumption).
Because the outcome probabilities are determined by differences in the functions
determining these probabilities (both observed and unobserved), shared unobserved effects
will cancel out in each nest providing that all alternatives in the nest share the same
unobserved effects.
This canceling out will not occur if a nest (group of alternatives) contains some alternative
outcomes that share unobserved effects and others that do not (this sets up an IIA violation
in the nest).
Suppose it is suspected that the arterial and two-lane road share unobserved elements (both
being lower-level roads relative to the freeway, with no access control and lower design
speeds). When developing a nested structure to deal with the suspected disturbance-term
correlation, the structure shown visually in the Figure is used.
By grouping the arterial and two-lane road in the same nest their shared unobserved
elements cancel.
[Figure: nested structure. The upper level splits into freeway and non-freeway; the
non-freeway nest splits into arterial and two-lane.]
Mathematically, McFadden (1981) has shown the GEV disturbance assumption leads to the
following model structure for observation n choosing outcome i
Pn(i) = EXP[βi Xin + φi LSin] / Σ∀I EXP[βI XIn + φI LSIn]

Pn(j|i) = EXP[βj|i Xjn] / Σ∀J EXP[βJ|i XJn]

LSin = LN[ Σ∀J EXP(βJ|i XJn) ]
where
Pn(i) is the unconditional probability of observation n having discrete
outcome i,
Χ's are vectors of measurable characteristics that determine the
probability of discrete outcomes,
β's are vectors of estimable parameters,
Pn(j|i) is the probability of observation n having discrete outcome j
conditioned on the outcome being in outcome category i (for the nested
structure shown in the Figure, the outcome category i would be
non-freeway, and Pn(j|i) would be the binary logit model of the choice
between the arterial and two-lane road),
J is the conditional set of outcomes (conditioned on i),
I is the unconditional set of outcome categories (the upper two branches
of the Figure),
LSin is the inclusive value (logsum), and
φi is an estimable parameter.
Note that this equation system implies that the unconditional probability of having outcome j is,
Pn(j) = Pn(i) × Pn(j|i)
Estimation of a nested model is usually done in a sequential fashion.
1. Estimate the conditional model using only the observations in the sample that are
observed to have discrete outcomes in J. In the example illustrated in the Figure,
this is a binary model of commuters observed taking the arterial or the two-lane road.
2. Once these estimation results are obtained, the logsum (the natural log of the
denominator of one or more of the conditional models) is calculated for all
observations, both those selecting J and those not (for all commuters in our example case).
3. These computed logsums (in our example there is just one logsum) are used as
independent variables in the upper-level (unconditional) functions. Note that not all
unconditional outcomes need to have a logsum in their respective functions (the
example shown in the Figure would only have a logsum present in the function for
the non-freeway choice).
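The equation system above can be sketched numerically for the two-level route example. The function name, utilities, and logsum parameter below are hypothetical:

```python
import numpy as np

def nested_logit_probs(v_upper, v_lower, phi):
    # Two-level nested logit. v_upper[i]: utility of nest i's own variables;
    # v_lower[i]: array of lower-level utilities (empty for a degenerate nest);
    # phi[i]: logsum parameter. Returns P(j) = P(i) * P(j|i) for each nest.
    logsums = {i: (np.log(np.exp(v_lower[i]).sum()) if len(v_lower[i]) else 0.0)
               for i in v_upper}
    num = {i: np.exp(v_upper[i] + phi[i] * logsums[i]) for i in v_upper}
    denom = sum(num.values())
    p_nest = {i: num[i] / denom for i in v_upper}
    p_cond = {i: (np.exp(v_lower[i]) / np.exp(v_lower[i]).sum()
                  if len(v_lower[i]) else np.array([1.0]))
              for i in v_upper}
    return {i: p_nest[i] * p_cond[i] for i in v_upper}

# Hypothetical utilities: freeway vs. non-freeway, with arterial and
# two-lane road nested under non-freeway.
probs = nested_logit_probs(
    v_upper={"freeway": 0.4, "non-freeway": 0.0},
    v_lower={"freeway": np.array([]), "non-freeway": np.array([-0.2, -0.5])},
    phi={"freeway": 1.0, "non-freeway": 0.6})
```

The unconditional probabilities over all three routes sum to one, and the freeway nest is degenerate (its conditional probability is one).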
Caution needs to be exercised when using the sequential estimation procedure described
above because it results in variance-covariance matrices that are too small and thus
t-statistics that are inflated (typically by about 10-15%). This problem is resolved by
estimating the entire model at once using full information maximum likelihood.
It is important to note that the interpretation of the estimated parameter associated with
logsums (φi's) has the following important elements:
1. φi's must be greater than 0 and less than 1 in magnitude to be consistent with the nested
logit derivation.
2. If φi = 1, the assumed shared unobserved effects in the nest are not significant and the
nested model reduces to a simple MNL. Test with:
t = (1 − φi) / S.E.(φi)
3. If φi is less than zero then factors increasing the likelihood of an outcome being chosen
in the lower nest will decrease the likelihood of the nest being chosen.
4. If φi is equal to zero then changes in nest outcome probabilities will not affect the
probability of nest selection and the correct model is recursive.
Special Properties of Logit Models
1. Sub-sampling of alternate outcomes for model estimation
There are many discrete outcome situations where the number of alternate
outcomes is very large.
The independently and identically distributed extreme value distribution used in the
derivation of the multinomial logit model permits consistent estimation of model
parameters using a sub-sample of the available outcome set.
The estimation procedure is one of reducing the set of outcomes available to each
observation to a manageable size.
In doing this, one must include the outcome observed for each observation in the
estimation sample and this is supplemented with additional outcome possibilities
that are selected randomly from the complete outcome set (a different set of
randomly chosen outcomes is generated for each observation).
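The choice-set reduction described above can be sketched as follows; the alternative names and set sizes are hypothetical:

```python
import random

def estimation_choice_set(chosen, full_outcome_set, n_extra, seed=None):
    # Keep the observed outcome and add n_extra alternatives drawn randomly
    # (without replacement) from the remaining outcomes.
    rng = random.Random(seed)
    others = [a for a in full_outcome_set if a != chosen]
    subset = [chosen] + rng.sample(others, n_extra)
    rng.shuffle(subset)
    return subset

# Hypothetical 500-alternative outcome set (e.g., used-vehicle makes/models):
full_set = [f"vehicle_{k}" for k in range(500)]
cs = estimation_choice_set("vehicle_42", full_set, n_extra=9, seed=1)
```

A fresh draw would be made for each observation, always retaining that observation's observed outcome.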
2. Compensating Variation
When logit models are coupled with the theory of utility maximization, the
denominator is used to compute important welfare effects.
The basis for this calculation is the concept of compensating variation (CV), which is
the (hypothetical) amount of money that individuals would be paid (or debited)
to make them as well off after a change in X as they were prior to the change in
X.
The compensating variation for each observation n is
CVn = (1/λ)[ LN( Σ∀I EXP(βI XfIn) ) − LN( Σ∀I EXP(βI XoIn) ) ]
where
λ is the marginal utility of income,
Xo refers to initial values of X,
Xf refers to final values of X (after a policy change),
all other terms are as defined previously.
In most applications the marginal utility of income is equal in magnitude but
opposite in sign to the cost parameter associated with alternate outcomes—the
cost parameter estimated in the discrete outcome models.
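The CV calculation is a difference of logsums scaled by the marginal utility of income. A minimal sketch; the utilities and the 0.05/$ marginal utility below are hypothetical:

```python
import numpy as np

def compensating_variation(v_initial, v_final, marginal_utility_income):
    # CV_n = (1/lambda) * [LN(sum_I EXP(V_I^f)) - LN(sum_I EXP(V_I^o))]
    logsum_f = np.log(np.exp(v_final).sum())
    logsum_o = np.log(np.exp(v_initial).sum())
    return (logsum_f - logsum_o) / marginal_utility_income

# Hypothetical two-outcome example: the second outcome's utility improves;
# lambda is taken as the magnitude of an assumed cost parameter (0.05 per $).
cv = compensating_variation(np.array([-1.0, -1.5]),
                            np.array([-1.0, -1.0]), 0.05)
```

A positive CV means the individual is better off after the change in X.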
Models of Ordered Discrete Data
In many transportation applications discrete data are ordered.
Examples:
quantitative ratings (on a scale from 1 to 10 rate the following),
ordered opinions (do you disagree, are neutral, or agree), or
categorical data (property damage only crash, injury crash, and fatal crash).
While these response data are discrete, the standard or nested multinomial discrete
models presented earlier do not account for the ordinal nature of the discrete data, and
thus the ordering information is lost.
Ordered probability models are derived by defining an unobserved variable, z, that is
used as a basis for modeling the ordinal ranking of data.
z = βX + ε
where:
X is a vector of variables determining the discrete ordering for
observation n,
β is a vector of estimable parameters,
ε is a random disturbance.
Using this equation, observed ordinal data, y, for each observation are defined as,
y = 1 if z ≤ μ0
y = 2 if μ0 < z ≤ μ1
y = 3 if μ1 < z ≤ μ2
y = ...
y = I if z ≥ μI -1 ,
where μ's are estimable parameters (referred to as thresholds) that define y,
which corresponds to integer ordering, and I is the highest integer ordered
response.
Note that during estimation, non-numerical orderings such as never, sometimes,
and frequently are converted to integers (for example, 1, 2, and 3) without loss of
generality.
The μ's are parameters that are estimated jointly with the model parameters (β).
The estimation problem then becomes one of determining the probability of I
specific ordered responses for each observation n.
This determination is accomplished by making an assumption on the distribution
of ε.
If ε is assumed to be normally distributed across observations with mean = 0 and
variance = 1, an ordered probit model results with the ordered selection
probabilities being
P(y = 1) = Φ(−βX)

P(y = 2) = Φ(μ1 − βX) − Φ(−βX)

P(y = 3) = Φ(μ2 − βX) − Φ(μ1 − βX)

...

P(y = I) = 1 − Φ(μI−1 − βX)
where Φ(.) is the cumulative normal distribution,
Φ(u) = (1/√(2π)) ∫−∞u EXP(−w2/2) dw
The threshold μ0 is set equal to zero without loss of generality (this implies that one need
only estimate I − 2 thresholds).

[Figure: illustration of an ordered probability model with μ0 = 0; the density f(ε) is
partitioned at −βX, μ1 − βX, μ2 − βX, and μ3 − βX into regions corresponding to y = 1
through y = 5.]
Note that the figure shows that a positive value of βk implies that an increase in xk will
unambiguously increase the probability that the highest ordered discrete category results (y
= 5 in the Figure) and unambiguously decrease the probability that the lowest ordered
discrete category results (y = 1 in the Figure).
[Figure: illustration of an ordered probability model with an increase in βX (with μ0 = 0);
the thresholds −βX, μ1 − βX, μ2 − βX, and μ3 − βX shift, changing the areas for y = 1
through y = 5.]
The problem with ordered probability models is associated with the interpretation of
intermediate categories (y = 2, y = 3, and y = 4 in the Figure).
Depending on the location of the thresholds, it is not necessarily clear what effect a positive
or negative βk has on the probabilities of these "interior" categories.
This difficulty arises because the areas between the shifted thresholds may yield increasing
or decreasing probabilities after shifts to the left or right.
Discrete/Continuous Models
Interrelated discrete and continuous data. Examples:

1. consumers’ choice of the type of vehicle to own (discrete) and the number of
kilometers to drive it (continuous),
2. choice of route (discrete) and driving speed (continuous), and
3. choice of trip-generating activity (discrete) and duration in the activity
(continuous).
Interrelated discrete/continuous data can easily be overlooked and are sometimes
difficult to identify.
The Discrete/Continuous Modeling Problem
Interrelated discrete/continuous data present a problem of selectivity, with observed data
being an outcome of a selection process that results in a non-random sample of data
in observed discrete categories.
[Figure: scatter of commute speeds sn against βf Xn. '+' points are observed freeway
users; '−' points are the unobserved speeds of non-freeway users had they driven the
freeway. 'line 1' is the biased fit through the '+' points only; 'line 2' is the true
relationship through all points.]
Example: Consider the estimation of a regression model of average travel speed on the freeway
route from work to home,
sfn = βf Xn + ξfn
where
sfn is the average speed of commuter n on the freeway,
Xn is a vector of commuter n characteristics influencing average travel speed,
βf is a vector of estimable coefficients and
ξfn represents unobserved characteristics influencing travel speed.
In the figure, commuter data indicated by a ‘+’ represents the data of observed freeway users
and commuter data indicated by a ‘-’ represents the unobserved speeds of non-freeway users
had they chosen to drive on the freeway.
Because freeway users are a self-selected group (for whatever reasons) of faster drivers (with
faster drivers being more likely to choose the freeway), they are underrepresented at lower
values of βf Xn and overrepresented at higher values of βf Xn.
If the speed equation is estimated on the observed data only (observed freeway users), the
resulting estimates are biased, as indicated by “line 1” in the figure. The true equation
of freeway speed with all data (observed and unobserved) is given by “line 2”.
Econometric Corrections: Instrumental Variables and Expected Value Method
Revise the speed equation such that the average travel speed to work on commuter n’s
chosen route is,
sn = β Xn + α Zn + ξn
where
ns is the average speed of commuter n on commuter n’s chosen route,
Xn is a vector of commuter n characteristics influencing average travel speed that
are not a function of the route chosen (e.g. such as income, driver age),
Zn is a vector of characteristics commuter n faces that influence average travel
speed that are a function of the route chosen (such as number of traffic signals
and travel distance),
β and α are corresponding vectors of estimable coefficients and
ξn represents unobserved characteristics influencing travel speed.
Direct estimation of this equation would result in biased and inefficient coefficient
estimates because Zn is endogenous due to the discrete/continuous interrelation between
travel speed and route choice.
This is because, as a commuter’s preferred speed increases, elements in the Zn vector
will change, since the likelihood of selecting a specific route is interrelated with speed
preferences.
As was the case in simultaneous equations models that involve all continuous variables
(see Chapter 5), one could replace the elements of Zn with estimates derived from
regressing Zn against all exogenous variables.
The procedure consists of estimating regression equations for all elements of Zn and
using the regression-predicted values, Ẑn, to estimate the speed equation such that,
sn = β Xn + α Ẑn + ξn
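The first stage of this instrumental variables procedure can be sketched with ordinary least squares; the data below are hypothetical:

```python
import numpy as np

def iv_predicted_z(exog, Z):
    # First stage: regress each endogenous column of Z on all exogenous
    # variables; return the fitted values Z_hat for use in the speed equation.
    coef, *_ = np.linalg.lstsq(exog, Z, rcond=None)
    return exog @ coef

# Hypothetical data: one endogenous route attribute, two exogenous variables.
rng = np.random.default_rng(3)
exog = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
Z = exog @ np.array([[1.0], [0.5], [-0.3]]) + rng.normal(scale=0.1, size=(100, 1))
Z_hat = iv_predicted_z(exog, Z)
```

By construction the fitted values Ẑn are functions of exogenous variables only, and the first-stage residuals are orthogonal to them.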
An alternative to the instrumental variables approach is to interact the endogenous variables
directly with the corresponding discrete-outcome model (the route-choice model in this case).
The most common approach is to replace the elements in the endogenous variable vector Zn with
their expected values.
E(zjn) = Σ∀i Pn(i) zjn
where Pn(i) is the predicted probability of commuter n selecting discrete outcome i, as
determined by a discrete outcome model.
The Equation then becomes,
sn = β Xn + α Σ∀j Σ∀i Pn(i) zjn + ξn
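The expected-value replacement is a probability-weighted average of the route attributes. A minimal sketch; the routes, attributes, and probabilities below are hypothetical:

```python
import numpy as np

def expected_route_attributes(probs, z_by_route):
    # E(z_n) = sum_i P_n(i) * z_i: replace endogenous route attributes with
    # their expectation over the route-choice probabilities.
    return np.asarray(probs) @ np.asarray(z_by_route)

# Hypothetical three routes; columns are (number of signals, distance in km):
z = [[12.0, 8.0],
     [4.0, 10.0],
     [0.0, 14.0]]
ez = expected_route_attributes([0.5, 0.3, 0.2], z)
```

The probabilities Pn(i) would come from an estimated discrete outcome (route choice) model.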
Econometric Corrections: Selectivity-Bias Correction Term
Another popular approach to resolve selectivity bias problems and arrive at unbiased coefficient
estimates is to develop an expression for a selectivity-bias correction term.
In the context of the example problem, this is done by noting that average travel speed, s, for
commuter, n, from home to work can be written as,
E(sn | i) = βi Xn + E(ξn | i)
where
E(sn|i) is the average commute speed of commuter n conditional on the chosen route
i,
Xn is a vector of commuter n characteristics influencing average travel speed,
βi is a vector of estimable coefficients and
E(ξn|i) is the conditional expectation of the unobserved characteristics.
Note that variables specific to i (Zn in the speed equation above) are omitted from the
right-hand side of this equation.
Application of this equation provides bias-corrected and consistent estimates of βi because the
conditional expectation of ξn, E(ξn|i), accounts for the nonrandom observed commute speeds
that are selectively biased by commuters’ self-selected choice of route.
The problem then becomes one of deriving a closed-form representation of E(ξn|i) that can be
used for equation estimation.
With a multinomial logit model, let γ denote a vector of discrete-outcome disturbance terms (ε1,
ε2, ε3, …εJ) where J is the total number of discrete outcomes.
The conditional expectation of ξ (conditioned on discrete outcome i) is written as,
E(ξ | i) = (1/Pi) ∫γ E(ξ | γ) [ Πj=1J f(εj) ] dγ
where Pi is the probability of discrete outcome i.
If it is assumed that γ is generalized extreme value distributed, with σ2 being the unconditional
variance of ξ and ρi being the correlation of ξ and the resulting discrete-outcome logistic error
terms (resulting from the differencing of εi − εj), then
E(ξn | i) = −(σ√6 ρi / π) [ Σj≠iJ ( Pj LN(Pj) / (1 − Pj) ) + LN(Pi) ]
Using this equation, selectivity bias in discrete/continuous models is corrected by undertaking
the following 3 steps:
1. Estimate a multinomial logit model to predict the probabilities of discrete outcomes i for
each observation.
2. Use the logit-predicted outcome probabilities to compute the portion of the
conditional-expectation equation in large brackets ([.]) for each observation.
3. Use the values computed in step 2 to estimate the conditional speed equation using
standard least-squares regression methods. The term −σ√6ρi/π becomes a single
estimable parameter.
The speed equation is estimated for each route i as:

sin = βi Xn + αi λn + ηn

where αi = −σ√6ρi/π, λn = Σj≠iJ [ Pj LN(Pj) / (1 − Pj) ] + LN(Pi), and ηn is a
disturbance term. It is common practice to not include ρi in the summation over discrete
outcomes J.
By doing this, an equality restriction on the correlations between ξn and the εi − εj’s is
imposed. This restriction is relaxed by moving ρi within the summation, making it necessary
to estimate a total of J − 1 selectivity-bias coefficients (α’s) for each continuous equation
corresponding to discrete outcome i.
However, it has been shown empirically by Hay (1980) and Mannering (1986a) that this
restriction on ρi is reasonable.
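The selectivity-bias correction term λn (step 2 of the procedure) can be sketched directly from logit-predicted probabilities; the probabilities below are hypothetical:

```python
import numpy as np

def selectivity_correction(probs, i):
    # lambda_n = sum_{j != i} [P_j * LN(P_j) / (1 - P_j)] + LN(P_i),
    # computed from the MNL-predicted outcome probabilities.
    probs = np.asarray(probs, dtype=float)
    lam = np.log(probs[i])
    for j, pj in enumerate(probs):
        if j != i:
            lam += pj * np.log(pj) / (1.0 - pj)
    return lam

# Hypothetical predicted probabilities for three routes; chosen route i = 0:
lam = selectivity_correction([0.5, 0.3, 0.2], i=0)
```

This λn enters the continuous (speed) equation as a regressor whose coefficient αi absorbs −σ√6ρi/π.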
Discrete/continuous Model Structures
Reduced form approach
A common way to implement a reduced form is to start with the discrete model.
Let Tin be a linear function that determines discrete outcome i for observation n and
let yin be the corresponding continuous variable in the discrete/continuous modeling
system.
Then,
Tin = βi Xin + φi yin + εin
where βi is a vector of estimable coefficients for discrete outcome i,
Xin is a vector of the observable characteristics (covariates) that determines
discrete outcomes for observation n, and
εin is a disturbance term.
Let the corresponding continuous equation be the linear function,
yin = θi Win + νin
where θi is a vector of estimable coefficients for the continuous variable observed
for discrete outcome i,
Win is a vector of the observable characteristics (covariates) that determine yin,
and
νin is a disturbance term.
This Equation is estimated using ordinary least squares with appropriate selectivity
bias correction (such as adding a selectivity bias correction term).
For estimation of the discrete outcome portion of the discrete/continuous process,
note that yin is endogenous because yin changes with changing Tin due to the
interrelated discrete/continuous structure. So, substituting,
Tin = βi Xin + φiθi Win + φiνin + εin
With this Equation, a discrete outcome model is derived readily.
For example, if εin’s are assumed to be generalized extreme value distributed, a
multinomial logit model results with the probability of observation n having outcome
i as,
Pn(i) = EXP(βi Xin + φi θi Win) / Σ∀I EXP(βI XIn + φI θI WIn)
Note that because the term φiνin does not vary across outcomes i, it cancels out and
does not enter the logit model structure.
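As a quick numerical sketch of this reduced-form logit (all coefficient and covariate values below are illustrative, not from the notes), the choice probabilities are built from the systematic utilities βi Xin + φiθi Win:

```python
import math

# Hypothetical values of beta_i*X_in, phi_i, and theta_i*W_in for a
# three-alternative example (illustrative numbers only).
beta_X  = [0.5, -0.2, 0.1]
phi     = [0.3, 0.3, 0.3]
theta_W = [1.0, 2.0, 0.5]

# Reduced-form systematic utility: beta_i X_in + phi_i theta_i W_in
v = [bx + p * tw for bx, p, tw in zip(beta_X, phi, theta_W)]

# Multinomial logit probabilities
denom = sum(math.exp(vi) for vi in v)
probs = [math.exp(vi) / denom for vi in v]
print(probs)  # the three probabilities sum to one
```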
Economic consistency approach
Using utility maximization, note that,
yin0 = -(∂Vin/∂pin) / (∂Vin/∂Incn)
where
Vin is the indirect utility of discrete alternative i to consumer n,
pin is the consumer’s unit price of consuming i,
Incn is the decision-maker's income, and
yin0 is the utility maximizing demand for i.
The discrete/continuous link is made by either specifying the indirect utility function, Vin, or the commodity demand yin.
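Roy's identity can be illustrated numerically with a simple hypothetical indirect utility function V(Inc, p) = a·LN(Inc) − b·LN(p) (not the one used later in these notes), whose implied demand is y = (b/a)(Inc/p); finite differences recover the same demand:

```python
import math

# Hypothetical indirect utility: V(Inc, p) = a*LN(Inc) - b*LN(p)
a, b = 1.0, 2.0
V = lambda inc, p: a * math.log(inc) - b * math.log(p)

inc, p = 50000.0, 2.0
hp, hi = 1e-6, 1.0                      # finite-difference step sizes

# Numerical partial derivatives of V
dV_dp   = (V(inc, p + hp) - V(inc, p - hp)) / (2 * hp)
dV_dinc = (V(inc + hi, p) - V(inc - hi, p)) / (2 * hi)

# Roy's identity: y0 = -(dV/dp)/(dV/dInc)
y_roy    = -dV_dp / dV_dinc
y_closed = (b / a) * inc / p            # closed-form demand for this V
print(y_roy, y_closed)
```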
To develop an economically consistent model structure, for vehicle type choice (discrete)
and vehicle utilization (continuous), a utilization equation is specified as (alternatively an
indirect utility function could be specified first),
yin = βi Xin + αi Zin + κ(Incn − πrnpin) + νin
where
yin is the annual utilization (for example, kilometers per year) of vehicle i,
Xin is a vector of consumer characteristics that determine vehicle utilization,
Zin is a vector of vehicle characteristics that determine vehicle utilization,
Incn is consumer n's annual income,
rn is the expected annual vehicle utilization for consumer n,
pin is the consumer’s unit price of utilization (for example, dollars per
kilometer driven),
βi, αi, κ, and π are estimable coefficients, and νin is a disturbance term.
The expected utilization, rn, is needed to capture the income effect of consumption (Incn −
πrnpin) and is determined exogenously by regressing observed utilization against exogenous variables.
With this equation, Roy's identity is applied as a partial differential equation,
(∂Vin/∂Incn) yin0 + ∂Vin/∂pin = 0

and solving for Vin gives,

Vin = [βi Xin + αi Zin + κ(Incn − πrnpin) + νin]EXP(-κpin) + εin
where εin is a disturbance term added for discrete-outcome model estimation.
If εin’s are assumed to be generalized extreme value distributed, a logit model for discrete
outcomes results.
The major drawback with the economically consistent approach is that a nonlinear form of
either the indirect utility function or continuous equation results.
In choosing between the reduced-form and economically consistent approaches, many
applications involve a trade-off between ease of estimation and theoretical consistency.
Duration Models
Duration models are used for data such as the elapsed time until the occurrence of an event or the duration of an event itself.
Examples include:
time until a vehicle accident occurs,
time between vehicle purchases,
the time devoted to an activity (shopping, recreational, etc.),
the time until the adoption of new transportation technologies.
Duration data are usually continuous and can be modeled using least-squares regression.
The use of estimation techniques that are based on hazard functions, however, can often
provide additional insights into the underlying duration problem.
Hazard-Based Duration Models
Hazard-based models are applied to study the conditional probability of a time duration ending
at some time t, given that the duration has continued until time t.
Developing hazard-based duration models begins with the cumulative distribution function,
F(t) = P(T < t)
Where:
P denotes probability,
T is a random time variable, and
t is some specified time.
The density function corresponding to this distribution function (the first derivative of the
cumulative distribution with respect to time) is
f(t) = dF(t)/dt
and the hazard function is
h(t) = f(t)/[1 - F(t)]
where: h(t) is the conditional probability that an event will occur between time t and t
+ dt, given that the event has not occurred up to time t.
h(t) gives the rate at which event durations are ending at time t, given that the event
duration has not ended up to time t.
The survivor function, which provides the probability that a duration will be greater than or
equal to some specified time, t, is also frequently used in hazard analyses for interpretation of
results. The survivor function is
S(t) = P(T ≥ t)
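The relationships among F(t), f(t), h(t), and S(t) can be checked directly for the exponential distribution (an illustrative rate value):

```python
import math

# Exponential duration distribution with rate lam (illustrative value)
lam = 0.5
t = 2.0

F = 1 - math.exp(-lam * t)    # cumulative distribution F(t) = P(T < t)
f = lam * math.exp(-lam * t)  # density f(t) = dF(t)/dt
S = 1 - F                     # survivor function S(t) = P(T >= t)
h = f / (1 - F)               # hazard h(t) = f(t)/[1 - F(t)]

print(h)  # constant and equal to lam for the exponential
```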
Graphically, hazard, density, cumulative distribution, and survivor functions are illustrated in the
Figure.
(Figure: the cumulative distribution F(t), density f(t), hazard h(t), and survivor S(t) functions plotted against time t.)
The slope of the hazard function (the first derivative with respect to time) captures Duration
Dependence:
(Figure: four hazard functions, h1(t) through h4(t), plotted against duration time t.)
Consider a driver’s probability of having an accident and the length of time without having an
accident.
1. The first hazard function, h1(t), has dh1(t)/dt < 0 for all t. This hazard is monotonically
decreasing in duration, implying that the longer drivers go without having an accident, the less
likely they are to have one soon.
2. The second hazard function is nonmonotonic, with dh2(t)/dt > 0 or
dh2(t)/dt < 0 depending on the length of duration t. In this case accident probabilities
first increase and then decrease with duration.
3. The third hazard function has dh3(t)/dt > 0 for all t and is monotonically increasing in duration.
This implies that the longer drivers go without having an accident the more likely they are to
have an accident soon.
4. Finally, the fourth hazard function has dh4(t)/dt = 0, which means that accident probabilities are
independent of duration and no duration dependence exists.
In addition to duration dependence, hazard-based duration models account for the effect of
covariates on probabilities.
Proportional hazards
The proportional-hazards approach assumes that the covariates, which are factors that affect
accident probabilities, act multiplicatively on some underlying hazard function.
The baseline hazard function, denoted ho(t), assumes that all elements of the covariate vector, X,
are zero.
For simplicity, covariates are assumed to influence the baseline hazard through the function EXP(βX),
where β is a vector of estimable coefficients.
The hazard rate with covariates is
h(t|X) = ho(t)EXP(βX)
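A small sketch of the proportionality property, using an assumed Weibull baseline hazard and a hypothetical value of βX — the ratio h(t|X)/ho(t) is the same at every t:

```python
import math

# Assumed Weibull baseline hazard h0(t) (illustrative lam and P values)
lam, P = 1.0, 1.5
h0 = lambda t: lam * P * (lam * t) ** (P - 1)

beta_X = 0.4                        # hypothetical beta'X for one observation
h = lambda t: h0(t) * math.exp(beta_X)

# The ratio h(t|X)/h0(t) is constant over t -- hence "proportional" hazards
ratios = [h(t) / h0(t) for t in (0.5, 1.0, 2.0, 5.0)]
print(ratios)
```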
This proportional-hazards approach is illustrated:
(Figure: the baseline hazard ho(t) and the proportionally shifted hazard h(t|X) = ho(t)EXP(βX).)
Accelerated Lifetime
The accelerated lifetime approach assumes that the covariates rescale (accelerate) time directly in a
baseline survivor function, which is the survivor function when all covariates are zero.
The accelerated lifetime model is written as
S(t|X) = So[t EXP(βX)]
which leads to the conditional hazard function
h(t|X) = ho[t EXP(βX)]EXP(βX)
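This hazard formula can be verified numerically for an assumed Weibull baseline (illustrative parameter values): the hazard implied by S(t|X) = So[t EXP(βX)], computed as −d LN S/dt by finite differences, matches ho[t EXP(βX)]EXP(βX):

```python
import math

lam, P, beta_X = 1.0, 2.0, 0.3    # illustrative parameter values
S0 = lambda t: math.exp(-(lam * t) ** P)       # assumed Weibull baseline survivor
h0 = lambda t: lam * P * (lam * t) ** (P - 1)  # corresponding baseline hazard

S = lambda t: S0(t * math.exp(beta_X))         # accelerated lifetime survivor

# Hazard implied by S: h(t) = -d LN S(t)/dt, by central difference
t, eps = 1.2, 1e-6
h_num = -(math.log(S(t + eps)) - math.log(S(t - eps))) / (2 * eps)

# Closed form from the notes: h(t|X) = h0[t EXP(beta X)] EXP(beta X)
h_closed = h0(t * math.exp(beta_X)) * math.exp(beta_X)
print(h_num, h_closed)
```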
Characteristics of Duration Data
Duration data are often left or right censored.
For example, consider the time of driving a vehicle until a driver’s first accident.
Suppose data are only available for reported accidents over a specified time period
beginning at time a in the Figure and ending at time b.
Observation 1 is not observed, since it does not fall within the time period of observation.
Observation 2 is left and right censored because it is not known when driving began and the
first accident is not observed in the a to b time interval.
Observation 3 is complete with both start and ending times in the observed period.
Observations 4 and 6 are left censored and observation 5 is right censored.
Hazard-based models can readily account for right-censored data.
(Figure: six duration observations, with start and end times t1 through t6, plotted against time; the observation window runs from time a to time b.)
With left-censored data the problem becomes one of determining the distribution of
duration start times so that they can be used to determine the contribution of the left-
censored data to the model’s likelihood function.
Left-censored data create a far more difficult problem because of the additional complexity
added to the likelihood function.
Tied Data
Tied data occur when a number of observations end their durations at the same time.
Tied data can arise when data collection is not precise enough to identify exact duration-
ending times.
When duration exits are grouped at specific times, the likelihood function for proportional
and accelerated lifetime models becomes increasingly complex.
Non-Parametric Models
Non-parametric models are rare because of the predominant use of semi-parametric and parametric methods.
There are two popular approaches for generating survival functions for non-parametric
methods:
1. The product-limit (PL) method developed by Kaplan and Meier (1958)
2. Life table method (groups survival times into intervals).
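A minimal sketch of the product-limit (Kaplan-Meier) method on hypothetical right-censored data:

```python
# Product-limit (Kaplan-Meier) estimate on a tiny hypothetical sample.
# Each pair is (time, flag): flag = 1 if the duration ended (an exit),
# flag = 0 if the observation is right censored at that time.
data = [(2, 1), (3, 0), (4, 1), (4, 1), (6, 0), (7, 1)]

times = sorted({t for t, d in data if d == 1})  # distinct exit times
S, surv = 1.0, {}
for t in times:
    at_risk = sum(1 for ti, _ in data if ti >= t)          # still in duration
    exits   = sum(1 for ti, d in data if ti == t and d == 1)
    S *= 1 - exits / at_risk                                # product-limit step
    surv[t] = S

print(surv)  # estimated survivor function at each exit time
```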
Semi-Parametric Models
Semi-parametric models do not assume a duration-time distribution, although they do retain
the parametric assumption of the covariate influence.
A nonparametric approach for modeling the hazard function is convenient when little or no
knowledge of the functional form of the hazard is available.
The Cox proportional-hazards model is semi-parametric because EXP(βX) is still used as the
functional form of the covariate influence.
The model is based on the ratio of hazards—so that the probability of an observation i
exiting a duration at time ti, given that at least one observation exits at time ti, is given as
EXP(βXi) / Σj∈Ri EXP(βXj)
Where: Ri denotes the set of observations, j, with durations greater than or equal to ti.
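One term of this partial likelihood can be computed directly (the βX values below are illustrative):

```python
import math

# One partial-likelihood term: observation i exits at time t_i; the risk set
# R_i holds everyone whose duration is >= t_i (values are hypothetical).
beta_X_i = 0.8                        # beta'X for the exiting observation i
beta_X_risk = [0.8, 0.1, -0.5, 0.3]   # beta'X for all j in R_i (i included)

p_i = math.exp(beta_X_i) / sum(math.exp(v) for v in beta_X_risk)
print(p_i)  # probability that i is the one exiting, given one exit at t_i
```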
Fully-Parametric Models
With fully-parametric models, typical distributions of the hazard function include
gamma, exponential, Weibull, log-logistic, log-normal and Gompertz distributions,
among others.
The choice of a specific distribution has important implications relating not only to the
shape of the underlying hazard, but also to the efficiency and potential bias of the
estimated parameters.
Exponential
f(t) = λEXP(-λt)
with hazard,
h(t) = λ
This distribution's hazard is constant, as illustrated by h4(t) in the previous figure.
This means that the probability of a duration ending is independent of time and there is
no duration dependence.
Weibull
A more generalized form of the exponential.
It allows for positive duration dependence (hazard is monotonic increasing in duration and
the probability of the duration ending increases over time), negative duration dependence
(hazard is monotonic decreasing in duration and the probability of the duration ending
decreases over time) or no duration dependence (hazard is constant in duration and the
probability of the duration ending is unchanged over time).
With parameters λ > 0 and P > 0, the Weibull distribution has the density function,
f(t) = λP(λt)P-1EXP[-(λt)P]
with hazard
h(t) = λP(λt)P-1
If the Weibull parameter P is greater than one, the hazard is monotone increasing in duration
(see h3(t) in Figure);
If P is less than one, it is monotone decreasing in duration (see h1(t) in Figure);
If P equals one, the hazard is constant in duration and reduces to the exponential
distribution's hazard with h(t) = λ (see h4(t) in Figure).
Because the Weibull distribution is a more generalized form of the exponential distribution,
it provides a more flexible means of capturing duration dependence. However, it is still
limited because it requires the hazard to be monotonic over time. In many applications, a
nonmonotonic hazard is theoretically justified.
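The three cases can be seen by evaluating the Weibull hazard h(t) = λP(λt)P-1 at a few time points for illustrative values of P:

```python
# Weibull hazard h(t) = lam*P*(lam*t)**(P-1) for three illustrative shapes
lam = 1.0
haz = lambda t, P: lam * P * (lam * t) ** (P - 1)

ts = [0.5, 1.0, 2.0, 4.0]
inc  = [haz(t, 2.0) for t in ts]   # P > 1: monotone increasing hazard
dec  = [haz(t, 0.5) for t in ts]   # P < 1: monotone decreasing hazard
flat = [haz(t, 1.0) for t in ts]   # P = 1: constant hazard (exponential)

print(inc, dec, flat)
```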
Log-logistic
The log-logistic distribution allows for nonmonotonic hazard functions and is often used as
an approximation of the more computationally cumbersome lognormal distribution.
The log-logistic with parameters λ > 0 and P > 0 has the density function,
f(t) = λP(λt)P-1[1+(λt)P]-2
and hazard function
h(t) = λP(λt)P-1 / [1 + (λt)P]
The log-logistic's hazard is identical to the Weibull's except for the denominator.
If P < 1, the hazard is monotone decreasing in duration (see h1(t) in the Figure);
if P = 1, the hazard is monotone decreasing in duration from the value λ; and
if P > 1, the hazard increases in duration from zero to an inflection point, ti = (P − 1)1/P/λ,
and decreases toward zero thereafter (see h2(t) in the Figure).
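The nonmonotonic case can be checked numerically for illustrative parameter values — the hazard is higher at the inflection point than on either side of it:

```python
# Log-logistic hazard, which peaks at t = (P-1)**(1/P)/lam when P > 1
lam, P = 1.0, 2.0                  # illustrative values
h = lambda t: (lam * P * (lam * t) ** (P - 1)) / (1 + (lam * t) ** P)

t_peak = (P - 1) ** (1 / P) / lam  # equals 1.0 for these values

# Hazard rises before the peak and falls after it
print(h(0.5 * t_peak), h(t_peak), h(2.0 * t_peak))
```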
Comparisons of Non-Parametric, Semi-Parametric, and Fully-Parametric Models
The choice among non-parametric, semi-parametric, and fully-parametric methods
for estimating survival or duration models can be complicated.
When there is little information about the underlying distribution due to the small
size of the sample or the lack of a theory that would suggest a specific distribution, a
non-parametric approach may be appropriate.
Parametric methods are more suitable when underlying distributions are known or
can be theoretically justified.
Semi-parametric models may also be a good choice when little is known about the
underlying hazard distribution. Problems:
1. Duration effects can be difficult to track.
2. A potential loss in efficiency may result. It can be shown that in data where
censoring exists and the underlying survival distribution is known, the Cox
semi-parametric proportional hazards model does not produce efficient
coefficient estimates.
Comparing various hazard distributional assumptions for fully-parametric models can also
be difficult.
Determining the relative difference between a Weibull and exponential model
could be approximated by the significance of the Weibull’s P parameter,
which represents the difference between the two distributions.
The difference between the Weibull and log-logistic models and other
distributions is more difficult to test because the models may not be nested.
One possible comparison for distributional models that are not nested is to
compare likelihood ratio statistics
–2(LL(0)- LL(βc))
where LL(0) is the initial log-likelihood (with all coefficients equal to
zero) and LL(βc) is the log-likelihood at convergence.
This statistic is χ2 distributed with the degrees of freedom equal to the
number of estimated coefficients included in the model.
One could select the distribution that provided the highest level of
significance for this statistic to determine the best-fit distribution.
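Using the Poisson output reproduced later in these notes (LL(0) = −160.5608, LL(βc) = −151.3086, 7 estimated coefficients), the statistic works out as follows; the 0.05 critical value of 14.07 for 7 degrees of freedom is taken from standard χ2 tables:

```python
# Likelihood ratio statistic -2(LL(0) - LL(beta_c)), with log-likelihoods
# taken from the Poisson regression output later in these notes.
ll0, llb = -160.5608, -151.3086
lr = -2 * (ll0 - llb)
print(lr)  # ~18.50, chi-squared distributed with 7 degrees of freedom

# 14.07 is the 0.05 chi-squared critical value for 7 d.f. (from tables),
# so the covariates significantly improve on the restricted model.
print(lr > 14.07)
```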
Heterogeneity
While formulating proportional-hazard models an implicit assumption made is that the
survival function (see Equation 9.6) is homogenous across observations.
All of the variation in durations is assumed to be captured by the covariate vector X.
A problem arises when some unobserved factors, not included in X, influence durations.
This is referred to as unobserved heterogeneity and can result in a major specification error
that can lead one to draw erroneous inferences on the shape of the hazard function, in
addition to producing inconsistent coefficient estimates.
In fully-parametric models the most common approach to account for heterogeneity is to
introduce a heterogeneity term designed to capture unobserved effects across the population
and to work with the resulting conditional survival function.
With a heterogeneity term, w, having a distribution over the population g(w), along with a
conditional survival function, S(t|w), the unconditional survival function becomes
S(t) = ∫ S(t|w)g(w)dw
For a Weibull distribution with gamma heterogeneity, without loss of generality, w is
assumed to be gamma distributed with mean 1 and variance = 1/k. So,
g(w) = [kk/Γ(k)]EXP(-kw)wk-1
With the Weibull distribution and S(t) = f(t)/h(t), so,

S(t|w) = EXP[-w(λt)P]
The unconditional survival distribution can then be written as (with θ = 1/k):
S(t) = ∫0∞ S(t|w)g(w)dw = [1 + θ(λt)P]-1/θ
resulting in the hazard function
h(t) = λP(λt)P-1[S(t)]θ
Note that if θ = 0, heterogeneity is not present because the hazard reduces to a simple
Weibull and the variance of the heterogeneity term w is zero.
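The closed form can be checked by numerically integrating S(t|w) against the gamma density (illustrative parameter values; simple rectangle-rule quadrature):

```python
import math

# Check numerically that integrating S(t|w) = EXP[-w(lam*t)**P] against the
# gamma density g(w) reproduces S(t) = [1 + theta*(lam*t)**P]**(-1/theta).
lam, P, k = 1.0, 1.5, 2.0          # illustrative values; theta = 1/k
theta = 1 / k
t = 0.8

g = lambda w: (k ** k / math.gamma(k)) * math.exp(-k * w) * w ** (k - 1)
S_cond = lambda w: math.exp(-w * (lam * t) ** P)

# Rectangle-rule integration of S(t|w)*g(w) over w on [0, 40]
n, hi = 20000, 40.0
dw = hi / n
S_num = sum(S_cond(i * dw) * g(i * dw) * dw for i in range(1, n))

S_closed = (1 + theta * (lam * t) ** P) ** (-1 / theta)
print(S_num, S_closed)
```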
The selection of a heterogeneity distribution should not be taken lightly. The consequences
of incorrectly specifying g(w) are potentially severe and can result in inconsistent estimates.
State Dependence
In duration modeling, state dependence refers to a number of processes that seek to
establish a relationship between past duration experiences and current durations.
Conceptually, the understanding of how past experience affects current behavior is a
key component in modeling that captures important habitual behavior.
State dependence can be classified into three types:
1. Type I state dependence is duration dependence. As discussed above,
duration dependence is the conditional probability of a duration ending soon,
given that it has lasted to some known time. Type I state dependence is
captured in the shape of the hazard function.
2. Type II state dependence is occurrence dependence. This refers to the effect
that the number of previous durations has on a current duration. It is
accounted for in the model by including the number of previous durations
(for example, previous shopping durations) as a variable in the covariate vector.
3. Type III state dependence, lagged duration dependence, accounts for the
effect that lengths of previous durations have on a current duration. This
would be accounted for in the model’s covariate vector by including the
length of previous durations as a variable.
When including variables that account for Type II or Type III state duration
dependence, the findings can easily be misinterpreted. This arises because
unobserved heterogeneity can be captured in the coefficient estimates of the state
dependence variables.
Time-varying covariates
Covariates that change over a duration are problematic.
If the covariate vector, X, changes over the duration being studied, coefficient
estimates may be biased.
Time-varying covariates are difficult to account for, but can be incorporated in
hazard models by allowing the covariate vector to be a function of time.
The hazard and likelihood functions are then appropriately re-written. The
likelihood function becomes more complex, but estimation is typically simplified
because time-varying covariates usually make only a few discrete changes over the
duration being studied.
Discrete-time hazard models
An alternative to standard continuous-time hazard models is to use a discrete-time
approach.
In discrete-time models, time is segmented into uniform discrete categories and exit
probabilities in each time period are estimated using a logistic regression or other
discrete outcome modeling approach.
This discrete approach allows for a very general form of the hazard function (and
duration effects) that can change from one time interval to the next as shown in the
Figure. However, the hazard is implicitly assumed to be constant during each of the
discrete time intervals.
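A minimal empirical version of the idea, using hypothetical grouped duration data (a full discrete-time model would estimate these interval exit probabilities with logistic regression on covariates):

```python
# Empirical discrete-time hazard: for each interval, the share of
# observations exiting in that interval among those entering it
# (hypothetical data: the interval in which each duration ended).
durations = [1, 1, 2, 2, 2, 3, 4, 4, 5]

hazard = {}
for interval in range(1, 6):
    at_risk = sum(1 for d in durations if d >= interval)  # still at risk
    exits = durations.count(interval)                      # exits this interval
    if at_risk:
        hazard[interval] = exits / at_risk

print(hazard)  # a step-function hazard, constant within each interval
```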
Discrete hazard models have the obvious drawback of inefficient coefficient estimators because of the information lost in discretizing continuous data.
Although discrete-time hazard models are inefficient, they provide at least two
advantages over their continuous-time counterparts.
1. Time-varying covariates can be easily handled by incorporating new, time-
varying values in the discrete time intervals (there is still the assumption that
they will be constant over the discrete-time interval).
2. Tied data (groups of observations exiting a duration at the same
time), which, as discussed above, are problematic when using continuous-data
approaches, are not a problem with discrete hazard models. This is because the
grouped data fall within a single discrete time interval and the subsequent
lack of information on "exact" exit times does not complicate the estimation.
(Figure: a discrete-time hazard h(t) that steps to a different constant level in each time interval.)
Assignment #1 (Continuous Data - Regression Analysis)
You are given 151 observations of a travel survey collected in State College, Pennsylvania. All of the households in the sample are making the morning commute to work. They are all departing from the same origin (a large residential complex in the suburbs) and going to work in the Central Business District. They have the choice of three alternate routes: 1) a four-lane arterial (speed limit = 35 mph, 2 lanes each direction), 2) a two-lane rural road (speed limit = 35 mph, 1 lane each direction), and 3) a limited-access four-lane freeway (speed limit = 55 mph, 2 lanes each direction).

Your task is to estimate a model of individual average travel speed to work using standard regression techniques. Your solution to this problem should include:

1. The results of your best model specification.
2. A discussion of the logical process that led you to the selection of your final specification (e.g., discuss the theory behind the inclusion of your selected variables). Include t-statistics and justify the sign of your variables.
Variables available for your specification are: (file trt.out)
Variable Explanation
x1 Actual in-vehicle travel time in minutes
x2 Route chosen: 1 - arterial, 2 - rural road, 3 - freeway
x3 Traffic flow rate at time of departure in vehicles per hour
x4 Number of traffic signals on the selected route
x5 Distance along the selected route in tenths of miles
x6 Seat belts: 1 - if wear, 0 - if not
x7 Number of passengers in car
x8 Driver age in years: 1 - 18 to 23, 2 - 24 to 29, 3 - 30 to 39, 4 - 40 to 49, 5 - 50 and above
x9 Gender: 1 - male, 0 - female
x10 Marital status: 1 - single, 0 - married
x11 Number of children
x12 Annual income: 1 - less than 20000, 2 - 20000 to 29999, 3 - 30000 to 39999, 4 - 40000 to 49999, 5 - more than 50000
x13 Model year of car (e.g. 86 = 1986)
x14 Origin of car: 1 - domestic, 0 - foreign
--> sample;1-151$
--> read;nvar=14;nobs=151;file=D:\old_drive_d\new_laptop\CE697N-disk\trt.out....
--> create;speed=(x5/10)/(x1/60)$
--> dstat;rhs=speed$

Descriptive Statistics
All results based on nonmissing observations.
Variable       Mean           Std.Dev.       Minimum        Maximum      Cases
-------------------------------------------------------------------------------
SPEED    .263726534D+02 .854564197D+01 .113684211D+02 .622500000D+02      151

--> create;if(x2=3) frwy=1$
--> create;if(x2=1) art=1$
--> create;cage=86-x13$
--> regress;lhs=speed;rhs=one,frwy,art,cage,x6,x3$

+-----------------------------------------------------------------------+
| Ordinary least squares regression     Weighting variable = none       |
| Dep. var. = SPEED     Mean= 26.37265340    , S.D.= 8.545641969        |
| Model size: Observations = 151, Parameters = 6, Deg.Fr.= 145          |
| Residuals: Sum of squares= .8103322769D+04, Std.Dev.= 7.47563         |
| Fit: R-squared= .260254, Adjusted R-squared = .23475                  |
| Model test: F[ 5, 145] = 10.20, Prob value = .00000                   |
| Diagnostic: Log-L = -514.9573, Restricted(b=0) Log-L = -537.7167      |
| LogAmemiyaPrCrt.= 4.062, Akaike Info. Crt.= 6.900                     |
| Autocorrel: Durbin-Watson Statistic = 1.93247, Rho = .03376           |
+-----------------------------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
Constant   27.50201489      2.5979595      10.586   .0000
FRWY       11.93363775      2.3534831       5.071   .0000   .99337748E-01
ART        3.401468353      1.7110414       1.988   .0487   .21854305
CAGE      -.2423910189      .15614209      -1.552   .1228   4.0927152
X6         2.056499369      1.3548641       1.518   .1312   .70860927
X3        -.7138526638E-02  .42918384E-02  -1.663   .0984   493.57616

--> create;if(x2=3)frwytl=x4$
--> create;if(x2=1)arttl=x4$
--> dstat;rhs=frwytl,arttl$

Descriptive Statistics
All results based on nonmissing observations.
Variable       Mean           Std.Dev.       Minimum        Maximum      Cases
-------------------------------------------------------------------------------
FRWYTL   .523178808D+00 .165261182D+01 .000000000D+00 .700000000D+01      151
ARTTL    .322516556D+01 .620233524D+01 .000000000D+00 .230000000D+02      151
--> regress;lhs=speed;rhs=one,frwy,art,cage,x6,x3,x5,frwytl,arttl$

+-----------------------------------------------------------------------+
| Ordinary least squares regression     Weighting variable = none       |
| Dep. var. = SPEED     Mean= 26.37265340    , S.D.= 8.545641969        |
| Model size: Observations = 151, Parameters = 9, Deg.Fr.= 142          |
| Residuals: Sum of squares= .7009953198D+04, Std.Dev.= 7.02608         |
| Fit: R-squared= .360067, Adjusted R-squared = .32401                  |
| Model test: F[ 8, 142] = 9.99, Prob value = .00000                    |
| Diagnostic: Log-L = -504.0141, Restricted(b=0) Log-L = -537.7167      |
| LogAmemiyaPrCrt.= 3.957, Akaike Info. Crt.= 6.795                     |
| Autocorrel: Durbin-Watson Statistic = 1.91592, Rho = .04204           |
+-----------------------------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
Constant   16.16325632      3.9928669       4.048   .0001
FRWY       16.94939297      7.6315911       2.221   .0279   .99337748E-01
ART        1.988634670      8.6623071        .230   .8188   .21854305
CAGE      -.2328219473      .14912800      -1.561   .1207   4.0927152
X6         1.992892731      1.2752154       1.563   .1203   .70860927
X3        -.8198678320E-02  .40677382E-02  -2.016   .0457   493.57616
X5         .2661454594      .74404251E-01   3.577   .0005   48.099338
FRWYTL    -2.301061742      1.2457614      -1.847   .0668   .52317881
ARTTL      .2508103259E-01  .58844954        .043   .9661   3.2251656

--> create;if(x2=2)rurtl=x4$
--> create;if(x8<3&x9=1)youngm=1$
--> regress;lhs=speed;rhs=one,frwy,art,cage,x6,x3,x5,frwytl,rurtl,youngm$

+-----------------------------------------------------------------------+
| Ordinary least squares regression     Weighting variable = none       |
| Dep. var. = SPEED     Mean= 26.37265340    , S.D.= 8.545641969        |
| Model size: Observations = 151, Parameters = 10, Deg.Fr.= 141         |
| Residuals: Sum of squares= .6727183008D+04, Std.Dev.= 6.90728         |
| Fit: R-squared= .385881, Adjusted R-squared = .34668                  |
| Model test: F[ 9, 141] = 9.84, Prob value = .00000                    |
| Diagnostic: Log-L = -500.9054, Restricted(b=0) Log-L = -537.7167      |
| LogAmemiyaPrCrt.= 3.929, Akaike Info. Crt.= 6.767                     |
| Autocorrel: Durbin-Watson Statistic = 1.91023, Rho = .04489           |
+-----------------------------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
Constant   22.90059405      5.1196606       4.473   .0000
FRWY      -.4626342414E-02  10.374523        .000   .9996   .99337748E-01
ART       -9.052728265      5.2732767      -1.717   .0882   .21854305
CAGE      -.2187957231      .14657167      -1.493   .1377   4.0927152
X6         1.545666270      1.2703105       1.217   .2257   .70860927
X3        -.9457421614E-02  .40102172E-02  -2.358   .0197   493.57616
X5         .3757958689      .84435209E-01   4.451   .0000   48.099338
FRWYTL    -1.821566472      1.2374165      -1.472   .1432   .52317881
RURTL     -1.468789815      .64959125      -2.261   .0253   5.1523179
YOUNGM     1.309483785      1.2864788       1.018   .3105   .27814570
--> create;if(x9=1)maleage=x8$
--> histogram;rhs=maleage$
(Figure: histogram of frequency by bin for variable MALEAGE, bins 0 through 5.)
Assignment #2 (Count Data - Poisson Regression)
You are given 204 observations from a travel survey conducted in the Seattle metropolitan area. The purpose of the survey was to study the number of times (per week) commuters changed their departure time on their work-to-home trip to avoid traffic congestion. The data are non-negative integers and are thus well suited to the Poisson regression approach.

Remember in a Poisson regression, you are estimating a parameter vector β such that:

λ = EXP(βX)

where λ is the Poisson parameter that in this case is the expected number of departure changes per week.

In your analysis include:

1. The results of your best model specification.
2. A discussion of the logical process that led you to the selection of your final specification (e.g., discuss the theory behind the inclusion of your selected variables). Include t-statistics and justify the sign of your variables.
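A sketch of the Poisson log-likelihood being maximized (β and the data below are illustrative, not estimates from the assignment data):

```python
import math

# Poisson regression ingredients: lam_i = EXP(beta'X_i), log-likelihood
# summed over observations (beta and the data here are hypothetical).
beta = [0.2, -0.1]
X = [[1.0, 3.0], [1.0, 5.0], [1.0, 1.0]]  # column of ones plus one covariate
y = [1, 0, 2]                              # departure-time changes per week

def loglik(beta, X, y):
    ll = 0.0
    for xi, yi in zip(X, y):
        lam = math.exp(sum(b * x for b, x in zip(beta, xi)))
        # Poisson log-likelihood term: -lam + y*LN(lam) - LN(y!)
        ll += -lam + yi * math.log(lam) - math.log(math.factorial(yi))
    return ll

print(loglik(beta, X, y))
```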
Variables available for your specification are: (file tobit.dat)

Variable Explanation
x1 Household number
x2 Do you ever delay work-to-home departure to avoid traffic congestion? 1-yes, 0-no
x3 If sometimes delay, on average how many minutes do you delay?
x4 If sometimes delay, do you 1-perform additional work, 2-engage in non-work activities, or 3-do both?
x5 If sometimes delay, how many times have you delayed in the past week?
x6 Mode of transportation used work-to-home: 1-car SOV, 2-carpool, 3-vanpool, 4-bus, 5 other.
x7 Primary route (work-to-home): 1-I90, 2-I5, 3-SR520, 4-I405, 5-other
x8 Do you generally encounter traffic congestion on your work-to-home trip? 1-yes, 2-no
x9 Age: 1-(<25), 2-(26-30), 3-(31-35), 4-(36-40), 5-(41-45), 6-(46-50), 7-(>50)
x10 Gender: 1-male, 0-female
x11 Number of cars in household
x12 Number of children in household
x13 Income: 1 - less than 20000, 2 - 20000 to 29999, 3 - 30000 to 39999, 4 - 40000 to 49999, 5 - 50000 to 59999, 6 - >60000
x14 Do you have flexible work hours? 1-yes, 0-no
x15 Distance from work to home (in miles)
x16 Face LOS D or worse? 1-yes, 0-no
x17 Ratio of actual travel time to free-flow travel time
x18 Population of work zone
x19 Retail employment in work zone
x20 Service employment in work zone
x21 Size of work zone (in acres)
--> read;nvar=21;nobs=204;file=D:\old_drive_d\new_laptop\CE697N-disk\TOBIT.DAT$
--> reject;x2=0$
--> create;if(x7=3)sr520=1$
--> create;if(x7=2)I5=1$
--> dstat;rhs=x5$

Descriptive Statistics
All results based on nonmissing observations.
Variable       Mean           Std.Dev.       Minimum        Maximum      Cases
-------------------------------------------------------------------------------
X5       .183333333D+01 .137394297D+01 .000000000D+00 .500000000D+01       96

--> histogram;rhs=x5$

Histogram for X5   NOBS= 96, Too low: 0, Too high: 0
Bin   Lower limit   Upper limit   Frequency      Cumulative Frequency
========================================================================
 0       .000          1.000      18 ( .1875)      18( .1875)
 1      1.000          2.000      23 ( .2396)      41( .4271)
 2      2.000          3.000      27 ( .2813)      68( .7083)
 3      3.000          4.000      20 ( .2083)      88( .9167)
 4      4.000          5.000       1 ( .0104)      89( .9271)
 5      5.000          6.000       7 ( .0729)      96(1.0000)
[Figure: histogram of X5 (frequency by bin, bins 0 through 5)]
--> poisson;lhs=x5;rhs=one,sr520,i5,x10,x11,x14,x15,x17;
    limit=6;truncation;upper$
+-----------------------------------------------------------------------+
| Poisson Regression Model - OLS Results                                |
| Ordinary least squares regression   Weighting variable = none         |
| Dep. var. = X5   Mean= 1.556818182 , S.D.= 1.059797796                |
| Model size: Observations = 88, Parameters = 8, Deg.Fr.= 80            |
| Residuals: Sum of squares= .9111512339D+02, Std.Dev.= 1.06721         |
| Fit: R-squared= .067551, Adjusted R-squared = -.01404                 |
| Model test: F[ 7, 80] = .83, Prob value = .56715                      |
| Diagnostic: Log-L = -126.3972, Restricted(b=0) Log-L = -129.4746      |
| LogAmemiyaPrCrt.= .217, Akaike Info. Crt.= 3.054                      |
+-----------------------------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
Constant    2.124169713      1.1147607        1.905   .0567
SR520      -.1788795348E-05  .52109802E-03    -.003   .9973   .17045455
I5          .4994266029E-01  .24964851         .200   .8414   .36363636
X10         .8129313858E-03  .25291341         .003   .9974   .70454545
X11        -.6078538464E-02  .11441429        -.053   .9576   1.9431818
X14        -.4891679372      .24426295       -2.003   .0452   .64772727
X15        -.3586809102E-01  .26568623E-01   -1.350   .1770   7.8750000
X17         .1280136445E-01  .44047325         .029   .9768   1.9556818
+---------------------------------------------+
| Poisson Regression                          |
| Maximum Likelihood Estimates                |
| Dependent variable              X5          |
| Weighting variable             ONE          |
| Number of observations          96          |
| Iterations completed             6          |
| Log likelihood function  -151.3086          |
| Restricted log likelihood -160.5608         |
| Chi-squared               18.50428          |
| Degrees of freedom               7          |
| Significance level        .9890572E-02      |
| RIGHT Truncated data, at Y = 5.             |
| Chi- squared =  90.40988   RsqP= .0757      |
| G - squared  = 103.41426   RsqD= .1168      |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
Constant    1.821166565      .75186875        2.422   .0154
SR520      -.5376391967      .25629632       -2.098   .0359   .16666667
I5         -.3251529733      .18668242       -1.742   .0816   .34375000
X10        -.4854679467E-01  .17966580        -.270   .7870   .69791667
X11        -.1210107221      .87127154E-01   -1.389   .1649   1.8854167
X14        -.3929141620      .17063688       -2.303   .0213   .63541667
X15        -.2822605374E-01  .20647975E-01   -1.367   .1716   7.7083333
X17        -.1453901223      .30066940        -.484   .6287   1.9593750
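The right-truncated Poisson probabilities used above (truncation at Y = 5) are ordinary Poisson probabilities rescaled by the probability mass at or below the truncation point. A minimal sketch in plain Python; the λ value here is hypothetical, chosen only for illustration:

```python
import math

def truncated_poisson_pmf(y, lam, r):
    """P(Y = y | Y <= r): Poisson(lam) probabilities rescaled by the
    total mass at or below the truncation point r."""
    if y > r:
        return 0.0
    pois = lambda m: lam ** m * math.exp(-lam) / math.factorial(m)
    return pois(y) / sum(pois(m) for m in range(r + 1))

# With a hypothetical lam = 2 and right truncation at 5, the
# probabilities over y = 0..5 sum to one:
probs = [truncated_poisson_pmf(y, 2.0, 5) for y in range(6)]
```

Because the denominator renormalizes the distribution, outcomes above the truncation point get zero probability and the remaining probabilities sum to one.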
Assignment #3 (Discrete Data - Logit Analysis)
You are given 151 observations from a travel survey collected in State College, Pennsylvania (the same data as in assignment #1). All of the households in the sample are making the morning commute to work. They all depart from the same origin (a large residential complex in the suburbs) and go to work in the Central Business District. They have the choice of three alternative routes: 1) a four-lane arterial (speed limit = 35 mph, 2 lanes each direction), 2) a two-lane rural road (speed limit = 35 mph, 1 lane each direction), and 3) a limited-access four-lane freeway (speed limit = 55 mph, 2 lanes each direction). Your task is to estimate a model of route choice (i.e., the likelihood of an individual traveler taking one of the three routes). Your solution to this problem should include:
1. The results of your best model specification.
2. A discussion of the logical process that led you to the selection of your final specification (e.g., discuss the theory behind the inclusion of your selected variables). Include t-statistics and justify the signs of your variables.
For reference, see Example 11.1 on page 267 of Washington, S., M. Karlaftis and F. Mannering (2003) Statistical and econometric methods for transportation data analysis, Chapman & Hall/CRC, Boca Raton, FL, 425 pages.
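The multinomial logit model you will estimate gives the probability of route i as P(i) = EXP(βXi) / Σj EXP(βXj). A minimal sketch of this calculation in plain Python; the utility values are hypothetical:

```python
import math

def mnl_probabilities(utilities):
    """Multinomial logit choice probabilities: P(i) = exp(V_i) / sum_j exp(V_j)."""
    exps = [math.exp(v) for v in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical deterministic utilities for arterial, rural road, freeway:
probs = mnl_probabilities([-1.2, -0.5, -2.0])
```

The probabilities always sum to one, and the alternative with the highest utility receives the highest probability.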
Variables available for your specification are (in file LOGIT-A1.txt):
Variable Number Explanation
x1 Route chosen, rows: 1 - arterial, 2 - rural road, 3 - freeway
x2 Arterial row indicator; 1 for arterial row, 0 for others
x3 Rural row indicator; 1 for rural row, 0 for others
x4 Freeway row indicator; 1 for freeway row, 0 for others
x5 Traffic flow rate
x6 Number of traffic signals
x7 Distance in tenths of miles
x8 Seat belts: 1 - if wear, 0 - if not
x9 Number of passengers in car
x10 Driver age in years: 1 - 18 to 23, 2 - 24 to 29, 3 - 30 to 39, 4 - 40 to 49, 5 - 50 and above
x11 Gender: 1 - male, 0 - female
x12 Marital status: 1 - single, 0 - married
x13 Number of children
x14 Annual income: 1 - less than 20000, 2 - 20000 to 29999, 3 - 30000 to 39999, 4 - 40000 to 49999, 5 - more than 50000
x15 Model year of car (e.g. 86 = 1986)
x16 Origin of car: 1 - domestic, 0 - foreign
x17 Fuel efficiency in miles per gallon
--> read;nvar=17;nobs=453;file=D:\old_drive_d\new_laptop\CE697N-disk\LOGIT-A1...
--> create;cage=86-x15$
--> nlogit;lhs=x1;choices=arterial,rural,freeway;model:
    u(arterial)=dist*x7/
    u(rural)=rural*one+dist*x7+cager*cage/
    u(freeway)=freeway*one+dist*x7+malef*x11+cagef*cage$
Normal exit from iterations. Exit status=0.
+---------------------------------------------+
| Discrete choice (multinomial logit) model   |
| Maximum Likelihood Estimates                |
| Dependent variable          Choice          |
| Weighting variable             ONE          |
| Number of observations         151          |
| Iterations completed             7          |
| Log likelihood function  -97.32659          |
| Log-L for Choice model = -97.3266           |
| R2=1-LogL/LogL*  Log-L fncn  R-sqrd  RsqAdj |
| No coefficients  -165.8905   .41331  .40142 |
| Constants only   -124.2267   .21654  .20066 |
| Chi-squared[ 4] = 53.80016                  |
| Significance for chi-squared = 1.00000      |
| Response data are given as ind. choice.     |
| Number of obs.= 151, skipped 0 bad obs.     |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
DIST       -.1663035351     .29951458E-01   -5.552   .0000
RURAL       .1598214542     .33221388         .481   .6305
CAGER       .1280857447     .67891270E-01    1.887   .0592
FREEWAY    -.1641800393     .73082884        -.225   .8223
MALEF       .6608161130     .59869845        1.104   .2697
CAGEF       .2353363437     .84583760E-01    2.782   .0054
--> nlogit;lhs=x1;choices=arterial,rural,freeway;model:
    u(arterial)=dista*x7/
    u(rural)=rural*one+distr*x7+cager*cage/
    u(freeway)=freeway*one+distf*x7+malef*x11+cagef*cage
    ;prob=proute$
Normal exit from iterations. Exit status=0.
+---------------------------------------------+
| Discrete choice (multinomial logit) model   |
| Maximum Likelihood Estimates                |
| Dependent variable          Choice          |
| Weighting variable             ONE          |
| Number of observations         151          |
| Iterations completed             6          |
| Log likelihood function  -94.27486          |
| Log-L for Choice model = -94.2749           |
| R2=1-LogL/LogL*  Log-L fncn  R-sqrd  RsqAdj |
| No coefficients  -165.8905   .43170  .41624 |
| Constants only   -124.2267   .24111  .22046 |
| Chi-squared[ 6] = 59.90361                  |
| Significance for chi-squared = 1.00000      |
| Response data are given as ind. choice.     |
| Number of obs.= 151, skipped 0 bad obs.     |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
DISTA      -.1224245143     .30195296E-01   -4.054   .0001
RURAL      2.789895508      1.3999591        1.993   .0463
DISTR      -.1763266850     .30647778E-01   -5.753   .0000
CAGER       .1234459887     .68604008E-01    1.799   .0720
FREEWAY   -2.711745018      2.7256129        -.995   .3198
DISTF      -.9566266147E-01 .47393754E-01   -2.018   .0435
MALEF       .6645967680     .62917721        1.056   .2908
CAGEF       .2272410439     .84670428E-01    2.684   .0073
--> reject;x3=1$
--> reject;x4=1$
--> dstat;rhs=proute$
Descriptive Statistics
All results based on nonmissing observations.
Variable        Mean            Std.Dev.        Minimum         Maximum        Cases
-------------------------------------------------------------------------------
PROUTE     .218543047D+00  .191798823D+00  .136479546D-02  .893318771D+00      151
--> include;x3=1$
--> reject;x2=1$
--> dstat;rhs=proute$
Descriptive Statistics
All results based on nonmissing observations.
Variable        Mean            Std.Dev.        Minimum         Maximum        Cases
-------------------------------------------------------------------------------
PROUTE     .682119196D+00  .225544682D+00  .781963709D-01  .987070066D+00      151
--> include;x4=1$
--> reject;x3=1$
--> dstat;rhs=proute$
Descriptive Statistics
All results based on nonmissing observations.
Variable        Mean            Std.Dev.        Minimum         Maximum        Cases
-------------------------------------------------------------------------------
PROUTE     .993377575D-01  .155630369D+00  .909963465D-03  .800433079D+00      151
Assignment #4
(Discrete Data - Logit Analysis)
Using the information from assignment #3, perform the following:
1. Develop a new model with a price variable in all three choice alternatives. The price variable is created as: set price = ((distance/10)/mpg)*1.05
2. Calculate direct elasticities for all continuous variables using the Limdep "effects" command (see software command-file downloads for assignment #3). Briefly comment on your findings.
3. Perform a likelihood ratio test to determine if men and women should be modeled separately. The test statistic is (see page 282 in the text): -2[LL(βT) – LL(βM) – LL(βF)], where LL(βT) is the log-likelihood at convergence of the model estimated with all of the data (males and females), LL(βM) is the log-likelihood at convergence of the model using only male data (use the Limdep "reject" command), and LL(βF) is the log-likelihood at convergence of the model using only female data. This statistic is χ2 distributed with degrees of freedom equal to the sum of the number of estimated parameters in the individual male and female models minus the number of estimated parameters in the overall model. The resulting χ2 statistic gives the confidence level at which the hypothesis that males and females share the same parameters can be rejected. Confidence levels can be read from Table C.3 on page 379 of the text. Briefly comment on your findings.
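The test statistic above is a simple combination of three converged log-likelihoods. A sketch in plain Python; the log-likelihood values and degrees of freedom are hypothetical, and the χ2 tail probability uses the closed-form series that holds for even degrees of freedom:

```python
import math

def lr_statistic(ll_total, ll_male, ll_female):
    """Likelihood ratio statistic -2[LL(bT) - LL(bM) - LL(bF)]."""
    return -2.0 * (ll_total - ll_male - ll_female)

def chi2_sf(x, df):
    """Upper-tail chi-squared probability P(X > x); closed form for
    even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)**i / i!."""
    assert df % 2 == 0, "this closed form requires even df"
    half = x / 2.0
    term = math.exp(-half)
    total = term
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return total

# Hypothetical log-likelihoods at convergence; df is the sum of the
# parameters in the male and female models minus those in the pooled model:
stat = lr_statistic(-93.08, -55.10, -30.20)
p_value = chi2_sf(stat, 8)
```

A small tail probability means the male-only and female-only models fit significantly better than the pooled model, so separate models are warranted.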
--> read;nvar=17;nobs=453;file=D:\old_drive_d\new_laptop\CE697N-disk\LOGIT-A1.txt$
--> create;cage=86-x15$
--> create;price=(x7/10)/x17*1.05$
--> nlogit;lhs=x1;choices=arterial,rural,freeway;model:
    u(arterial)=pricea*price/
    u(rural)=rural*one+pricer*price+cager*cage/
    u(freeway)=freeway*one+pricef*price+malef*x11+cagef*cage
    ;effects:price(arterial,rural,freeway)$
Normal exit from iterations. Exit status=0.
+---------------------------------------------+
| Discrete choice (multinomial logit) model   |
| Maximum Likelihood Estimates                |
| Dependent variable          Choice          |
| Weighting variable             ONE          |
| Number of observations         151          |
| Iterations completed             7          |
| Log likelihood function  -93.08420          |
| Log-L for Choice model = -93.0842           |
| R2=1-LogL/LogL*  Log-L fncn  R-sqrd  RsqAdj |
| No coefficients  -165.8905   .43888  .42361 |
| Constants only   -124.2267   .25069  .23030 |
| Chi-squared[ 6] = 62.28493                  |
| Significance for chi-squared = 1.00000      |
| Response data are given as ind. choice.     |
| Number of obs.= 151, skipped 0 bad obs.     |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
PRICEA    -27.59844150     5.9577684       -4.632   .0000
RURAL      1.955464369      .96933973       2.017   .0437
PRICER    -36.12375952     5.9886626       -6.032   .0000
CAGER       .2050962666     .79571545E-01   2.578   .0100
FREEWAY   -2.680384373     1.4163297       -1.892   .0584
PRICEF    -21.41617741     5.8457065       -3.664   .0002
MALEF       .4913420715     .66134595        .743   .4575
CAGEF       .2518958209     .97354563E-01   2.587   .0097
+-----------------------------------------------------------------+
| Elasticity  Averaged over observations.                         |
| Attribute is PRICE in choice ARTERIAL                           |
| Effects on probabilities of all choices in the model:           |
| * indicates direct Elasticity effect of the attribute.          |
|                   Decomposition of Effect            Total      |
|                   Trunk   Limb   Branch   Choice    Effect      |
| Trunk=Trunk{1}                                                  |
| Limb=Lmb[1:1]                                                   |
| Branch=B(1:1,1)                                                 |
| * Choice=ARTERIAL  .000   .000    .000    -6.035    -6.035      |
|   Choice=RURAL     .000   .000    .000     1.517     1.517      |
|   Choice=FREEWAY   .000   .000    .000     1.517     1.517      |
+-----------------------------------------------------------------+
+-----------------------------------------------------------------+
| Elasticity  Averaged over observations.                         |
| Attribute is PRICE in choice RURAL                              |
| Effects on probabilities of all choices in the model:           |
| * indicates direct Elasticity effect of the attribute.          |
|                   Decomposition of Effect            Total      |
|                   Trunk   Limb   Branch   Choice    Effect      |
| Trunk=Trunk{1}                                                  |
| Limb=Lmb[1:1]                                                   |
| Branch=B(1:1,1)                                                 |
|   Choice=ARTERIAL  .000   .000    .000     5.538     5.538      |
| * Choice=RURAL     .000   .000    .000    -3.181    -3.181      |
|   Choice=FREEWAY   .000   .000    .000     5.538     5.538      |
+-----------------------------------------------------------------+
+-----------------------------------------------------------------+
| Elasticity  Averaged over observations.                         |
| Attribute is PRICE in choice FREEWAY                            |
| Effects on probabilities of all choices in the model:           |
| * indicates direct Elasticity effect of the attribute.          |
|                   Decomposition of Effect            Total      |
|                   Trunk   Limb   Branch   Choice    Effect      |
| Trunk=Trunk{1}                                                  |
| Limb=Lmb[1:1]                                                   |
| Branch=B(1:1,1)                                                 |
|   Choice=ARTERIAL  .000   .000    .000      .830      .830      |
|   Choice=RURAL     .000   .000    .000      .830      .830      |
| * Choice=FREEWAY   .000   .000    .000    -6.356    -6.356      |
+-----------------------------------------------------------------+
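The identical cross elasticities in these tables (e.g., 1.517 for both RURAL and FREEWAY when the arterial price changes) are a consequence of the logit model's IIA property: for a continuous attribute x with coefficient β in alternative i, the direct elasticity is βx(1 − Pi) and the cross elasticity is −βxPi for every other alternative. A minimal sketch; the coefficient, attribute level, and probability below are hypothetical:

```python
def logit_elasticities(beta, x, p_i):
    """Point elasticities of logit probabilities with respect to a
    continuous attribute x (coefficient beta) of alternative i.
    Direct: beta*x*(1 - P_i). Cross (identical for every other
    alternative, by IIA): -beta*x*P_i."""
    direct = beta * x * (1.0 - p_i)
    cross = -beta * x * p_i
    return direct, cross

# Hypothetical price coefficient and price level for the arterial,
# with P(arterial) = 0.22:
direct, cross = logit_elasticities(-27.6, 0.25, 0.22)
```

Because the cross elasticity does not depend on which other alternative is considered, every non-chosen alternative shifts by the same percentage, which is the pattern visible in the output.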
Assignment #5
(Discrete Data – Ordered Probit)
A survey of 322 commuters was conducted in the Seattle metropolitan area. The survey's intent was to gather information on commuters' opinions of high-occupancy vehicle (HOV) lanes (lanes that are restricted for use by vehicles with two or more occupants). The variables available from this survey are given in the attached table.
Among the questions asked, commuters were asked whether they agreed with the statement "HOV lanes should be open to all vehicles, regardless of vehicle occupancy level" (variable x29 in the table). The question provided ordered responses of strongly disagree, disagree, neutral, agree, and strongly agree, and the observed percentage frequencies of response in these five categories were 32.74, 21.71, 8.54, 12.10, and 24.91, respectively. To understand the factors determining commuter opinions, an ordered probit model of this survey question is appropriate. Your task is to estimate a model of the ordered response of whether commuters believe HOV lanes should be open to all vehicles, regardless of vehicle occupancy level. Your solution to this problem should include:
1. The results of your best model specification.
2. A discussion of the logical process that led you to the selection of your final specification (e.g., discuss the theory behind the inclusion of your selected variables). Include t-statistics and justify the signs of your variables.
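In the ordered probit model, the response probabilities are differences of standard normal CDF values at the threshold points: P(y = 0) = Φ(−βX), P(y = k) = Φ(μk − βX) − Φ(μk−1 − βX) for interior categories, and P(y = K) = 1 − Φ(μK−1 − βX), with the first threshold normalized to zero. A minimal sketch; the index value and thresholds below are hypothetical:

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordered_probit_probs(bx, thresholds):
    """Ordered probit response probabilities given the index bx = beta'X
    and estimated thresholds mu_1 < mu_2 < ... (first cut normalized to 0)."""
    cuts = [0.0] + list(thresholds)
    probs = [norm_cdf(cuts[0] - bx)]               # lowest category
    for lo, hi in zip(cuts, cuts[1:]):             # interior categories
        probs.append(norm_cdf(hi - bx) - norm_cdf(lo - bx))
    probs.append(1.0 - norm_cdf(cuts[-1] - bx))    # highest category
    return probs

# Hypothetical index value with three thresholds (five response categories):
probs = ordered_probit_probs(0.4, [0.623, 0.866, 1.240])
```

Since the probabilities telescope across adjacent thresholds, they always sum to one, with one probability per ordered response category.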
Variables available for your specification are (in file surveys.csv):
Variable Number Explanation
x1 Usual mode of travel: 0 if drive alone, 1 if two person carpool, 2 if three or more person carpool, 3 if vanpool, 4 if bus, 5 if bicycle or walk, 6 if motorcycle, 7 if other
x2 Have used HOV lanes: 1 if yes, 0 if no
x3 If used HOV lanes, what mode is most often used: 0 in a bus, 1 in two person carpool, 2 in three or more person carpool, 3 in vanpool, 4 alone in vehicle, 5 on motorcycle
x4 Sometimes eligible for HOV lane use but do not use: 1 if yes, 0 if no
x5 Reason for not using HOV lanes when eligible: 0 if slower than regular lanes, 1 if too much trouble to change lanes, 2 if HOV lanes are not safe, 3 if traffic moves fast enough, 4 if forget to use HOV lanes, 5 if other
x6 Usual mode of travel one year ago: 0 if drive alone, 1 if two person carpool, 2 if three or more person carpool, 3 if vanpool, 4 if bus, 5 if bicycle or walk, 6 if motorcycle, 7 if other
x7 Commuted to work in Seattle a year ago: 1 if yes, 0 if no
x8 Have flexible work start times: 1 if yes, 0 if no
x9 Changed departure times to work in the last year: 1 if yes, 0 if no
x10 On average, number of minutes leaving earlier for work relative to last year
x11 On average, number of minutes leaving later for work relative to last year
x12 If changed departure times to work in the last year, reason why: 0 if change in travel mode, 1 if increasing traffic congestion, 2 if change in work start time, 3 if presence of HOV lanes, 4 if change in residence, 5 if change in lifestyle, 6 if other
x13 Changed route to work in the last year: 1 if yes, 0 if no
x14 If changed route to work in the last year, reason why: 0 if change in travel mode, 1 if increasing traffic congestion, 2 if change in work start time, 3 if presence of HOV lanes, 4 if change in residence, 5 if change in lifestyle, 6 if other
x15 Usually commute to or from work on Interstate 90: 1 if yes, 0 if no
x16 Usually commuted to or from work on Interstate 90 last year: 1 if yes, 0 if no
x17 On your past five commutes to work, how often have you used HOV lanes
x18 On your past five commutes to work, how often did you drive alone
x19 On your past five commutes to work, how often did you carpool with one other person
x20 On your past five commutes to work, how often did you carpool with two or more people
x21 On your past five commutes to work, how often did you take a vanpool
x22 On your past five commutes to work, how often did you take a bus
x23 On your past five commutes to work, how often did you bicycle or walk
x24 On your past five commutes to work, how often did you take a motorcycle
x25 On your past five commutes to work, how often did you take a mode other than those listed in variables 18 through 24
x26 On your past five commutes to work, how often have you changed route or departure time
x27 HOV lanes save all commuters time: 0 if strongly disagree, 1 if disagree, 2 if neutral, 3 if agree, 4 if agree strongly
x28 Existing HOV lanes are being adequately used: 0 if strongly disagree, 1 if disagree, 2 if neutral, 3 if agree, 4 if agree strongly
x29 HOV lanes should be open to all traffic: 0 if strongly disagree, 1 if disagree, 2 if neutral, 3 if agree, 4 if agree strongly
x30 Converting some regular lanes to HOV lanes is a good idea: 0 if strongly disagree, 1 if disagree, 2 if neutral, 3 if agree, 4 if agree strongly
x31 Converting some regular lanes to HOV lanes is a good idea only if it is done before traffic congestion becomes serious: 0 if strongly disagree, 1 if disagree, 2 if neutral, 3 if agree, 4 if agree strongly
x32 Gender: 1 if male, 0 if female
x33 Age in years: 0 if under 21, 1 if 22 to 30, 2 if 31 to 40, 3 if 41 to 50, 4 if 51 to 64, 5 if 65 or greater
x34 Annual household income (US dollars per year): 0 if no income, 1 if 1 to 9,999, 2 if 10,000 to 19,999, 3 if 20,000 to 29,999, 4 if 30,000 to 39,999, 5 if 40,000 to 49,999, 6 if 50,000 to 74,999, 7 if 75,000 to 100,000, 8 if over 100,000
x35 Highest level of education: 0 if did not finish high school, 1 if high school, 2 if community college or trade school, 3 if college/university, 4 if post college graduate degree
x36 Number of household members
x37 Number of adults in household (aged 16 or more)
x38 Number of household members working outside the home
x39 Number of licensed motor vehicles in the household
x40 Postal zip code of work place
x41 Postal zip code of home
x42 Type of survey comment left by respondent regarding opinions on HOV lanes: 0 if no comment on HOV lanes, 1 if comment not in favor of HOV lanes, 2 if comment positive toward HOV lanes but critical of HOV lane policies, 3 if comment positive toward HOV lanes, 4 if neutral HOV lane comment
--> read;nvar=42;nobs=322;file=D:\old_drive_d\new_laptop\CE697N-disk\SURVEYS-L.CSV$
--> create;if(x1=0)dalone=1$
--> create;if(x33>3&x32=1)oldmen=1$
--> create;if(x35>2)college=1$
--> histogram;rhs=x29$
Histogram for X29    NOBS= 314, Too low: 0, Too high: 0
Bin   Lower limit   Upper limit   Frequency      Cumulative Frequency
========================================================================
 0       .000          1.000      99 ( .3153)     99( .3153)
 1      1.000          2.000      77 ( .2452)    176( .5605)
 2      2.000          3.000      26 ( .0828)    202( .6433)
 3      3.000          4.000      36 ( .1146)    238( .7580)
 4      4.000          5.000      76 ( .2420)    314(1.0000)
[Figure: histogram of X29 (frequency by bin, bins 0 through 4)]
--> skip$
--> ordered;lhs=x29;rhs=one,dalone,x8,oldmen,college,x36;marginal effects$
+-----------------------------------------------------------------------+
| Dependent variable is binary, y=0 or y not equal 0                    |
| Ordinary least squares regression   Weighting variable = none         |
| Dep. var. = Y=0/Not0   Mean= .6738351254 , S.D.= .4696508592          |
| Model size: Observations = 279, Parameters = 6, Deg.Fr.= 273          |
| Residuals: Sum of squares= .9972582530D+03, Std.Dev.= 1.91127         |
| Fit: R-squared=*********, Adjusted R-squared = -15.56131              |
| Diagnostic: Log-L = -573.5787, Restricted(b=0) Log-L = -184.5243      |
| LogAmemiyaPrCrt.= 1.317, Akaike Info. Crt.= 4.155                     |
+-----------------------------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
Constant    .1296715074      .44220253        .293   .7693
DALONE      .4388033502      .27449960       1.599   .1099   .77060932
X8          .4781738604E-01  .23267286        .206   .8372   .48028674
OLDMEN      .6020054050E-01  .35308446        .170   .8646   .12903226
COLLEGE     .9724386887E-01  .28975461        .336   .7372   .79211470
X36         .3343083528E-01  .97510448E-01    .343   .7317   2.9390681
Normal exit from iterations. Exit status=0.
+---------------------------------------------+
| Ordered Probit Model                        |
| Maximum Likelihood Estimates                |
| Dependent variable             X29          |
| Weighting variable             ONE          |
| Number of observations         279          |
| Iterations completed            14          |
| Log likelihood function  -397.2770          |
| Restricted log likelihood -421.3950         |
| Chi-squared               48.23599          |
| Degrees of freedom               5          |
| Significance level        .0000000          |
| Cell frequencies for outcomes               |
|  Y Count Freq  Y Count Freq  Y Count Freq   |
|  0    91 .326  1    60 .215  2    24 .086   |
|  3    34 .121  4    70 .250                 |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
           Index function for probability
Constant   -.5807798304     .27813128       -2.088   .0368
DALONE     1.136565726      .16430272        6.918   .0000   .77060932
X8          .2301353655     .13561406        1.697   .0897   .48028674
OLDMEN      .1968407034     .20635114         .954   .3401   .12903226
COLLEGE     .1996976747E-01 .15582658         .128   .8980   .79211470
X36         .1178065062E-01 .60437908E-01     .195   .8455   2.9390681
           Threshold parameters for index
Mu( 1)      .6231650207     .73062591E-01    8.529   .0000
Mu( 2)      .8657954320     .83104656E-01   10.418   .0000
Mu( 3)     1.240495241      .95160650E-01   13.036   .0000
+------------------------------------------------------+
| Marginal Effects for OrdProbt                        |
+----------+----------+----------+----------+----------+
| Variable |  X29=0   |  X29=1   |  X29=2   |  X29=3   |
+----------+----------+----------+----------+----------+
| ONE      |   .2063  |   .0230  |  -.0142  |  -.0415  |
| DALONE   |  -.4038  |  -.0451  |   .0278  |   .0812  |
| X8       |  -.0818  |  -.0091  |   .0056  |   .0164  |
| OLDMEN   |  -.0699  |  -.0078  |   .0048  |   .0141  |
| COLLEGE  |  -.0071  |  -.0008  |   .0005  |   .0014  |
| X36      |  -.0042  |  -.0005  |   .0003  |   .0008  |
+----------+----------+----------+----------+----------+
Frequencies of actual & predicted outcomes
Predicted outcome has maximum probability.
             Predicted
------ ------------------------- + -----
Actual    0    1    2    3    4  | Total
------ ------------------------- + -----
  0      59    0    0    0   32  |    91
  1      23    0    0    0   37  |    60
  2       9    0    0    0   15  |    24
  3       9    0    0    0   25  |    34
  4      25    0    0    0   45  |    70
------ ------------------------- + -----
Total   125    0    0    0  154  |   279
Assignment #6 (Duration Models)
You are given 204 observations from a travel survey conducted in the spring of 1988 in the Seattle area (the same data used for assignment #2). While the purpose of the survey was to study the number of times per week commuters changed their departure time on their work-to-home trip to avoid traffic congestion, we also have information on the length of time that they delay their trips to avoid congestion. The length of time commuters delay is ideally suited to duration models. Your task is to estimate Weibull, Weibull with gamma heterogeneity, and log-logistic hazard models using the software package LIMDEP version 7.0. Please note that LIMDEP actually estimates the parameter vector -β instead of β, so the effect of the covariates on the hazard is EXP(-βX). This means that a negative parameter in LIMDEP increases the hazard and thus decreases the duration; the sign gives the effect on duration instead of on the hazard. In your analysis include:
1. The results of your best model specification.
2. The shape of the hazard function of your best specifications, shown and discussed.
3. A discussion of the logical process that led you to the selection of your final specification (e.g., discuss the theory behind the inclusion of your selected variables). Include t-statistics and justify the signs of your variables.
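The Weibull hazard is h(t) = λP(λt)^(P−1): it rises with t when P > 1, falls when P < 1, and is constant (the exponential case) when P = 1. A minimal sketch; the λ and P values below are hypothetical, chosen only for illustration:

```python
def weibull_hazard(t, lam, p):
    """Weibull hazard h(t) = lam * p * (lam * t)**(p - 1); increasing in t
    for p > 1, decreasing for p < 1, constant (exponential) for p = 1."""
    return lam * p * (lam * t) ** (p - 1.0)

# Hypothetical lam and p with p > 1: the hazard rises with elapsed delay
# time, so the conditional probability of ending the delay soon grows the
# longer the delay has lasted (positive duration dependence).
h_early = weibull_hazard(10.0, 0.018, 1.70)
h_late = weibull_hazard(60.0, 0.018, 1.70)
```

Plotting h(t) over the observed range of delay durations is one way to satisfy item 2 above.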
Variables available for your specification are (in file TOBIT.DAT):
Variable Number Explanation
x1 Household number
x2 Do you ever delay work-to-home departure to avoid traffic congestion? 1-yes, 0-no
x3 If sometimes delay, on average how many minutes do you delay?
x4 If sometimes delay, do you 1-perform additional work, 2-engage in non-work activities, or 3-do both?
x5 If sometimes delay, how many times have you delayed in the past week?
x6 Mode of transportation used work-to-home: 1-car SOV, 2-carpool, 3-vanpool, 4-bus, 5-other
x7 Primary route (work-to-home): 1-I90, 2-I5, 3-SR520, 4-I405, 5-other
x8 Do you generally encounter traffic congestion on your work-to-home trip? 1-yes, 2-no
x9 Age: 1-(<25), 2-(26-30), 3-(31-35), 4-(36-40), 5-(41-45), 6-(46-50), 7-(>50)
x10 Gender: 1-male, 0-female
x11 Number of cars in household
x12 Number of children in household
x13 Annual income: 1 - less than 20000, 2 - 20000 to 29999, 3 - 30000 to 39999, 4 - 40000 to 49999, 5 - 50000 to 59999, 6 - >60000
x14 Do you have flexible work hours? 1-yes, 0-no
x15 Distance from work to home (in miles)
x16 Face LOS D or worse? 1-yes, 0-no
x17 Ratio of actual travel time to free-flow travel time
x18 Population of work zone
x19 Retail employment in work zone
x20 Service employment in work zone
x21 Size of work zone (in acres)
--> RESET
--> sample;1-204$
--> read;nvar=21;nobs=204;file=D:\new_laptop\CE697N-disk\tobit.dat$
--> reject;x3=0$
--> dstat;rhs=x3$
Descriptive Statistics
All results based on nonmissing observations.
Variable        Mean            Std.Dev.       Minimum        Maximum       Cases
-------------------------------------------------------------------------------
X3         51.2916667      37.4671552     4.00000000     240.000000         96
--> create;if(x6=1)car=1$
--> create;ltime=log(x3)$
--> create;if(x9>6)old=1$
--> dstat;rhs=car$
Descriptive Statistics
All results based on nonmissing observations.
Variable        Mean            Std.Dev.       Minimum        Maximum       Cases
-------------------------------------------------------------------------------
CAR         .718750000      .451969375     .000000000     1.00000000         96
--> survival;lhs=ltime;rhs=one,x15,x17,x12;model=weibull$
+-----------------------------------------------------------------------+
| Log-linear survival regression model: WEIBULL                         |
| Least squares is used to obtain starting values for MLE.              |
| Ordinary least squares regression   Weighting variable = none         |
| Dep. var. = LTIME   Mean= 3.706843804 , S.D.= .6997312277             |
| Model size: Observations = 96, Parameters = 4, Deg.Fr.= 92            |
| Residuals: Sum of squares= 39.91702844 , Std.Dev.= .65870             |
| Fit: R-squared= .141832, Adjusted R-squared = .11385                  |
| Model test: F[ 3, 92] = 5.07, Prob value = .00271                     |
| Diagnostic: Log-L = -94.0959, Restricted(b=0) Log-L = -101.4378       |
| LogAmemiyaPrCrt.= -.794, Akaike Info. Crt.= 2.044                     |
+-----------------------------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
Constant   1.619000121      .54359827        2.978   .0029
X15         .4225283769E-01 .15618491E-01    2.705   .0068   7.7083333
X17         .9020960070     .24142168        3.737   .0002   1.9593750
X12        -.6645707142E-02 .60944914E-01    -.109   .9132   .81250000
Normal exit from iterations. Exit status=0.
+---------------------------------------------+
| Loglinear survival model:           WEIBULL |
| Maximum Likelihood Estimates                |
| Dependent variable                    LTIME |
| Weighting variable                      ONE |
| Number of observations                   96 |
| Iterations completed                     11 |
| Log likelihood function           -96.28262 |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
          RHS of hazard model
 Constant  1.732270225      .65862735        2.630   .0085
 X15        .3273725360E-01 .19402531E-01    1.687   .0916    7.7083333
 X17       1.055416934      .27856852        3.789   .0002    1.9593750
 X12       -.3865858378E-01 .57807767E-01    -.669   .5037    .81250000
          Ancillary parameters for survival
 Sigma      .5872525538     .55008811E-01   10.676   .0000
+----------------------------------------------------------------+
| Parameters of underlying density at data means:                |
| Parameter    Estimate    Std. Error   Confidence Interval      |
| ------------------------------------------------------------  |
| Lambda       .01793      .00121       .0156   to  .0203        |
| P            1.70284     .15951       1.3902  to  2.0155       |
| Median       44.96713    3.03851      39.0116 to  50.9226      |
| Percentiles of survival distribution:                          |
| Survival     .25       .50       .75       .95                 |
| Time         67.56     44.97     26.83     9.75                |
+----------------------------------------------------------------+
--> survival;lhs=ltime;rhs=one,x15,x17,x12;model=weibull;heterogeneity$
+-----------------------------------------------------------------------+
| Log-linear survival regression model:  WEIBULL                        |
| Least squares is used to obtain starting values for MLE.              |
| Weibull Model with Gamma Heterogeneity                                |
| Ordinary least squares regression      Weighting variable = none      |
| Dep. var. = LTIME    Mean= 3.706843804    , S.D.= .6997312277         |
| Model size: Observations = 96, Parameters = 4, Deg.Fr.= 92            |
| Residuals:  Sum of squares= 39.91702844   , Std.Dev.= .65870          |
| Fit:        R-squared= .141832, Adjusted R-squared = .11385           |
| Model test: F[ 3, 92] = 5.07, Prob value = .00271                     |
| Diagnostic: Log-L = -94.0959, Restricted(b=0) Log-L = -101.4378       |
|             LogAmemiyaPrCrt.= -.794, Akaike Info. Crt.= 2.044         |
+-----------------------------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 Constant  1.619000121      .54359827        2.978   .0029
 X15        .4225283769E-01 .15618491E-01    2.705   .0068    7.7083333
 X17        .9020960070     .24142168        3.737   .0002    1.9593750
 X12       -.6645707142E-02 .60944914E-01    -.109   .9132    .81250000
Normal exit from iterations. Exit status=0.
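The "parameters of underlying density at data means" in the Weibull MLE output follow directly from the log-linear coefficients: λ = EXP(-βX evaluated at the variable means), P = 1/σ, and the duration at which survival equals s is t = (-LN s)^(1/P)/λ. A quick Python check of the reported values (coefficients and means copied from the output above; a sketch, not LIMDEP's internal computation):

```python
import math

# Coefficients and variable means from the Weibull MLE output
beta = [1.732270225, 0.03273725360, 1.055416934, -0.03865858378]
xbar = [1.0, 7.7083333, 1.9593750, 0.81250000]
sigma = 0.5872525538

bx = sum(b * x for b, x in zip(beta, xbar))
lam = math.exp(-bx)   # reported as Lambda = .01793
p = 1.0 / sigma       # reported as P = 1.70284

def weibull_time_at_survival(s, lam, p):
    """Duration t at which S(t) = exp(-(lam*t)**p) equals s."""
    return (-math.log(s)) ** (1.0 / p) / lam

print(round(lam, 5), round(p, 5))                        # 0.01793 1.70284
print(round(weibull_time_at_survival(0.50, lam, p), 2))  # 44.97 (the median)
print(round(weibull_time_at_survival(0.25, lam, p), 2))  # 67.56
```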
+---------------------------------------------+
| Loglinear survival model:           WEIBULL |
| Maximum Likelihood Estimates                |
| Dependent variable                    LTIME |
| Weighting variable                      ONE |
| Number of observations                   96 |
| Iterations completed                     16 |
| Log likelihood function           -93.88402 |
| Weibull Model with Gamma Heterogeneity      |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
          RHS of hazard model
 Constant  1.870386758      .58870206        3.177   .0015
 X15        .3375074414E-01 .17842561E-01    1.892   .0585    7.7083333
 X17        .8579132493     .25730277        3.334   .0009    1.9593750
 X12       -.1044830246E-01 .57608312E-01    -.181   .8561    .81250000
          Ancillary parameters for survival
 Theta      .6141476031     .39135931        1.569   .1166
 Sigma      .4212482203     .71720253E-01    5.873   .0000
+----------------------------------------------------------------+
| Parameters of underlying density at data means:                |
| Parameter    Estimate    Std. Error   Confidence Interval      |
| ------------------------------------------------------------  |
| Lambda       .02230      .00226       .0179   to  .0267        |
| P            2.37390     .40417       1.5817  to  3.1661       |
| Median       42.16025    4.27718      33.7770 to  50.5435      |
| Percentiles of survival distribution:                          |
| Survival     .25       .50       .75       .95                 |
| Time         62.34     42.16     27.55     12.92               |
+----------------------------------------------------------------+
--> survival;lhs=ltime;rhs=one,x15,x17,x12;model=logistic;plot$
+-----------------------------------------------------------------------+
| Log-linear survival regression model:  LOGISTIC                       |
| Least squares is used to obtain starting values for MLE.              |
| Ordinary least squares regression      Weighting variable = none      |
| Dep. var. = LTIME    Mean= 3.706843804    , S.D.= .6997312277         |
| Model size: Observations = 96, Parameters = 4, Deg.Fr.= 92            |
| Residuals:  Sum of squares= 39.91702844   , Std.Dev.= .65870          |
| Fit:        R-squared= .141832, Adjusted R-squared = .11385           |
| Model test: F[ 3, 92] = 5.07, Prob value = .00271                     |
| Diagnostic: Log-L = -94.0959, Restricted(b=0) Log-L = -101.4378       |
|             LogAmemiyaPrCrt.= -.794, Akaike Info. Crt.= 2.044         |
+-----------------------------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 Constant  1.619000121      .54359827        2.978   .0029
 X15        .4225283769E-01 .15618491E-01    2.705   .0068    7.7083333
 X17        .9020960070     .24142168        3.737   .0002    1.9593750
 X12       -.6645707142E-02 .60944914E-01    -.109   .9132    .81250000
Normal exit from iterations. Exit status=0.
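With gamma-distributed heterogeneity (Theta above), the unconditional survival function becomes S(t) = [1 + θ(λt)^P]^(-1/θ), which collapses to the ordinary Weibull survival EXP[-(λt)^P] as θ → 0. A sketch that inverts this survival function and reproduces the reported median (the small discrepancy comes from rounding in the printed λ and P):

```python
import math

# Estimates from the Weibull-with-gamma-heterogeneity MLE output
lam, p, theta = 0.02230, 2.37390, 0.6141476031

def survival(t):
    """S(t) = [1 + theta*(lam*t)**p]**(-1/theta): Weibull hazard
    with multiplicative gamma-distributed heterogeneity."""
    return (1.0 + theta * (lam * t) ** p) ** (-1.0 / theta)

def time_at_survival(s):
    """Invert S(t) = s: (lam*t)**p = (s**(-theta) - 1)/theta."""
    return ((s ** -theta - 1.0) / theta) ** (1.0 / p) / lam

print(round(time_at_survival(0.50), 1))  # about 42.2; reported median 42.16025
```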
+---------------------------------------------+
| Loglinear survival model:          LOGISTIC |
| Maximum Likelihood Estimates                |
| Dependent variable                    LTIME |
| Weighting variable                      ONE |
| Number of observations                   96 |
| Iterations completed                      9 |
| Log likelihood function           -94.28102 |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
          RHS of hazard model
 Constant  1.859264488      .56577702        3.286   .0010
 X15        .3536846032E-01 .16964884E-01    2.085   .0371    7.7083333
 X17        .8117401209     .24761640        3.278   .0010    1.9593750
 X12       -.6399300532E-02 .55823521E-01    -.115   .9087    .81250000
          Ancillary parameters for survival
 Sigma      .3648248813     .34783222E-01   10.489   .0000
+----------------------------------------------------------------+
| Parameters of underlying density at data means:                |
| Parameter    Estimate    Std. Error   Confidence Interval      |
| ------------------------------------------------------------  |
| Lambda       .02430      .00157       .0212   to  .0274        |
| P            2.74104     .26134       2.2288  to  3.2533       |
| Median       41.14903    2.65177      35.9516 to  46.3465      |
| Percentiles of survival distribution:                          |
| Survival     .25       .50       .75       .95                 |
| Time         61.44     41.15     27.56     14.06               |
+----------------------------------------------------------------+
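The LOGISTIC model is a log-logistic duration model with survival S(t) = 1/[1 + (λt)^P], so the median is simply 1/λ and the duration at survival s is [(1 - s)/s]^(1/P)/λ. A sketch checking the reported percentiles against the printed λ and P:

```python
import math

# Estimates from the log-logistic (LOGISTIC) MLE output
lam, p = 0.02430, 2.74104

def survival(t):
    """Log-logistic survival S(t) = 1 / (1 + (lam*t)**p)."""
    return 1.0 / (1.0 + (lam * t) ** p)

def time_at_survival(s):
    """Duration at which survival equals s: ((1-s)/s)**(1/p) / lam."""
    return ((1.0 - s) / s) ** (1.0 / p) / lam

print(round(time_at_survival(0.50), 2))  # 41.15; reported median 41.14903
print(round(time_at_survival(0.25), 2))  # 61.44
print(round(time_at_survival(0.95), 2))  # about 14.06, as reported
```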
[Figure: Estimated hazard function for the log-logistic model. HazardFn (.000 to .040) plotted against Duration (0 to 240).]
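Unlike the Weibull, the log-logistic hazard h(t) = λP(λt)^(P-1)/[1 + (λt)^P] is non-monotonic when P > 1: it rises to a single peak at t* = (P - 1)^(1/P)/λ and then declines, which is the shape the estimated hazard traces over the 0 to 240 duration range. A sketch using the estimated λ and P:

```python
import math

# Estimates from the log-logistic MLE output
lam, p = 0.02430, 2.74104

def hazard(t):
    """Log-logistic hazard h(t) = lam*p*(lam*t)**(p-1) / (1 + (lam*t)**p)."""
    u = lam * t
    return lam * p * u ** (p - 1.0) / (1.0 + u ** p)

# For p > 1 the hazard peaks at t* = (p - 1)**(1/p) / lam, then declines
t_star = (p - 1.0) ** (1.0 / p) / lam
print(round(t_star, 1))                                            # about 50.4
print(hazard(10.0) < hazard(t_star) and hazard(200.0) < hazard(t_star))  # True
```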