
Articles in Advance, pp. 1–20
issn 0041-1655 | eissn 1526-5447

informs ®

doi 10.1287/trsc.1080.0238
© 2008 INFORMS

An Approximate Dynamic Programming Algorithm for Large-Scale Fleet Management: A Case Application

Hugo P. Simão
Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08544, [email protected]

Jeff Day
Schneider National, Green Bay, Wisconsin 54306, [email protected]

Abraham P. George
Department of Operations Research and Financial Engineering, Princeton University

Ted Gifford, John Nienow
Schneider National, Green Bay, Wisconsin 54306 {[email protected], [email protected]}

Warren B. Powell
Department of Operations Research and Financial Engineering, Princeton University


We addressed the problem of developing a model to simulate at a high level of detail the movements of over 6,000 drivers for Schneider National, the largest truckload motor carrier in the United States. The goal of the model was not to obtain a better solution but rather to closely match a number of operational statistics. In addition to the need to capture a wide range of operational issues, the model had to match the performance of a highly skilled group of dispatchers while also returning the marginal value of drivers domiciled at different locations. These requirements dictated that it was not enough to optimize at each point in time (something that could be easily handled by a simulation model) but also over time. The project required bringing together years of research in approximate dynamic programming, merging math programming with machine learning, to solve dynamic programs with extremely high-dimensional state variables. The result was a model that closely calibrated against real-world operations and produced accurate estimates of the marginal value of 300 different types of drivers.

Key words: fleet management; truckload trucking; approximate dynamic programming; driver management
History: Received: February 2007; revision received: August 2007; accepted: April 2008. Published online in Articles in Advance.

In 2003, Schneider National, the largest truckload motor carrier in the United States, contracted with CASTLE Laboratory at Princeton University, Princeton, New Jersey, to develop a model that would simulate its long-haul truckload operations to perform analyses to answer questions ranging from the size and mix of its driver pool to questions about valuing contracts and getting drivers home. The requirements for the simulator seemed quite simple: it had to capture the dynamics of the real problem, producing behaviors that closely matched corporate performance along several dimensions, and it had to provide estimates of the marginal value of different types of drivers. If the model accurately matched historical performance, the company would be able to use the system to test changes in the mix of drivers, the mix of freight, and other operating policies.

The major challenge we faced was that these requirements meant that we had to do much more than just develop a classical simulator. It was not enough to optimize decisions (in the form of matching drivers to loads) at a point in time. The model had to optimize decisions over time to take into account downstream impacts. Formulating the problem as a deterministic, time-space network problem was both computationally intractable (the problem is huge) and too limiting (we needed to model different forms of uncertainty as well as a high degree of realism that


Copyright: INFORMS holds copyright to this Articles in Advance version, which is made available to institutional subscribers. The file may not be posted on any other website, including the author's site. Please send any questions regarding this policy to permissions@informs.org. Published online ahead of print August 15, 2008.

Simão et al.: An Approximate Dynamic Programming Algorithm for Large-Scale Fleet Management
Transportation Science, Articles in Advance, pp. 1–20, © 2008 INFORMS

was beyond the capabilities of classical math programs). Classical techniques from Markov decision processes applied to this setting are limited to problems with only a small number of identical trucks moving between a few locations (see Powell 1988 or Kleywegt, Nori, and Savelsbergh 2004). Our problem involved modeling thousands of drivers at a high level of detail.

We solved the problem using approximate dynamic programming (ADP), but even classical ADP techniques (Bertsekas and Tsitsiklis 1996; Sutton and Barto 1998) would not handle the requirements of this project. Three years of development produced a model that closely matches a range of historical metrics. Achieving this goal required drawing on the research of three Ph.D. dissertations (Spivey 2001; Marar 2002; George 2005) and depended on the extensive participation of the sponsor to produce a model that accurately simulated operations. The model is able to handle a host of engineering details to allow the sponsor to run a broad range of simulations. To establish credibility, the model had to match the historical performance of a dozen major operating statistics. Two of particular importance to our presentation included matching the average length of haul for different types of drivers and getting drivers home with the same frequency as the company. A central hypothesis of the research, which is supported by the evidence we present in this paper, was that the behavior of a group of dispatchers could be described by an optimization model using a suitably designed objective function.

The contributions of this paper include:

(1) We show, for the first time in a production setting for a truckload motor carrier, that approximate dynamic programming can provide high-quality solutions while capturing operational issues at a high level of detail, including all business rules such as hours of service, returning drivers home, and operational restrictions on the use of specific driver types. This appears to be the first optimization model of any form that captures the complex dynamics of a truckload motor carrier where decisions produce behavior that optimizes over time.

(2) We demonstrate that the framework of approximate dynamic programming, with methods adapted to this problem class, produces a model that accurately captures the performance of a well-run company based on comparisons with historical metrics. This appears to be the first demonstrated calibration of an optimization model for truckload trucking for planning purposes.

(3) We show that the value function approximations used in the dynamic programming formulation produce accurate estimates of the marginal value of particular driver types (for example, the value of adding additional team drivers domiciled in a particular region) over the entire simulation when compared against brute-force derivatives computed using the model (adding additional drivers and running the simulation again). These marginal values would not be available from a traditional simulator (which does not use the framework of dynamic programming to capture the value of a driver over the entire simulation). They mimic dual variables from a linear program (which is not able to handle the complex dynamics of this system).

The presentation begins in §1 with a general description of the problem. Section 2 provides a formal model of the problem. Section 3 describes the algorithmic strategies that are used, focusing primarily on the use of approximate dynamic programming to solve the problem of optimizing over time. Section 4 describes the results of calibration experiments that show that the model closely matches historical performance, which required using recent research describing how to make cost-based models match rule-based patterns. Then, §5 shows that the model can be used to estimate the value of particular types of drivers, which is then used to change the mix of drivers. The value of a particular type of driver, which requires estimating a derivative of the simulation, can only be achieved using the approximate dynamic programming strategies that were used to optimize over time. Section 6 concludes the paper.

1. Problem Description and Literature Review

On the surface, truckload trucking can appear to be a relatively simple operational problem. At any point in time, there will be a set of drivers available to be dispatched and a set of loads that need to be moved (typically from one city to another). The loads in this industry are typically quite long, generally requiring anywhere from one to four days to complete. As a result, at a point in time we will assign a driver to at most one load. This can easily be modeled as an assignment problem, where the cost of assigning a driver to a load includes both the cost of moving empty to pick up the load and the net revenue from moving the load.

In real applications, the problem is much richer. Whereas dispatchers do their best to minimize the empty miles and move the most profitable loads, real decisions have to balance profits now and in the future as well as accomplish objectives such as getting drivers home in a reasonable amount of time. An important issue in this project was matching historical behavior in terms of the average length of loads handled by different types of drivers. We modeled three “capacity types” (using the terminology of the


carrier): teams (two drivers in the same tractor who could trade off driving and resting), solos (a single driver who had to rest according to a schedule determined by federal law), and ICs (independent contractors who owned the tractors they drove). Drivers in each of these three fleets had different expectations regarding the lengths of the loads to which they were assigned. Teams were generally given the longest loads so that their total revenue per week would reasonably compensate two people. Solos exhibited the shortest average length of haul. Getting the model to match historical performance for length of haul for each of the three driver classes required special algorithmic measures.

The standard approach for modeling such large-scale problems (we worked with over 6,000 drivers) at a high level of detail would be to simply simulate decisions over time. In this setting, this would involve solving a series of network problems to assign drivers to loads at a point in time. Whereas such an approach would handle a high level of detail, the decisions would not be able to reflect the future impact of decisions made now. For example, this logic would not take into account that sending a driver whose home is in Dallas on loads to Chicago is a good way of getting him home. It is also unable to realize that a long (and high revenue) load from Maryland to Idaho is not as good as a shorter load from Maryland to Cleveland (which offers more opportunities for drivers once they unload).

In addition to producing an accurate simulation of the company, we also wanted to produce estimates of the marginal value of different types of drivers distinguished by their home domicile and capacity type. For example, we would like to know the marginal value of adding 10 teams with home domiciles in central Illinois. It is not practical to run a simulation, add 10 drivers of a particular type (there were 300 types), and simulate again. If this were repeated 10 times (to reduce statistical error), we would have to run 3,000 simulations.

There is fairly extensive literature on models and algorithms for the full truckload problem and, in particular, dynamic versions of the problem. Much of this work has solved sequences of deterministic problems that reflect only what is known at a point in time (for reviews, see Psaraftis 1995; Powell, Jaillet, and Odoni 1995; Gendreau and Potvin 1998; Larsen, Madsen, and Solomon 2002). This work has often focused on the algorithmic challenge of solving problems in real time (e.g., Gendreau et al. 1999; Taylor et al. 1999). A number of papers simulated dynamic operations to study questions such as the value of real time information or other dynamic operating policies (Tjokroamidjojo and Kutanoglu 2001; Regan, Mahmassani, and Jaillet 1998; Chiu and Mahmassani 2002; Yang, Jaillet, and Mahmassani 2004). Ichoua, Gendreau, and Potvin (2006) also propose a policy for dynamically routing vehicles with the intent of optimizing over time. Their research focuses on myopic policies that adjust behavior now based on probabilistic estimates of future demands. Secomandi (2000, 2001) provides a more formal treatment of policies for solving stochastic vehicle routing problems. This line of research, however, is limited to single-vehicle routing problems.

The general problem of routing drivers so they return home on time has received very little attention. Caliskan and Hall (2003) propose a deterministic model for routing drivers in trucking, but this model does not capture either the complexity of drivers or the challenge of getting drivers home in the presence of the type of uncertainty that characterizes truckload trucking. There is a rich literature on planning pilot schedules capturing all the attributes of a pilot and a full set of work rules (see Desrosiers, Solomon, and Soumis 1995; Desaulniers et al. 1998). However, these problems are deterministic and benefit from the highly scheduled nature of airline operations. Also, these problems are much smaller than the problem we address here.

A separate line of research has focused on developing models that produce solutions that optimize over an entire planning horizon. A summary of different modeling and algorithmic strategies for dynamic fleet management problems is given in Powell (1988) and Powell, Jaillet, and Odoni (1995). Early work in this area focused on managing large fleets of relatively similar vehicles such as would arise in the optimization of empty freight cars for railroads or in aggregate models of fleets for truckload motor carriers. Such problems could be formulated as space-time models (where each node represented a point in space and time) and solved as a network problem if there was a single vehicle type (see, for example, White 1972) or as a multicommodity flow problem if there were multiple vehicle types and substitution (Tapiero and Soliman 1972; Crainic, Ferland, and Rousseau 1984). These models do not allow us to model drivers with any of the richness needed for our project.

The research closest to this project is given in Spivey and Powell (2004), which provides a formal model of the stochastic dynamic driver management problem. We build on this model but introduce a number of new strategies to overcome challenges that arose when we made the transition from a laboratory experiment to a production application.

2. Problem Formulation

We model the problem using the language of dynamic resource management (see Powell, Shapiro, and



Simão 2001), where drivers are “resources” and loads are “tasks.” The state of a single resource is defined by an attribute vector $a$, composed of multiple attributes that may be numerical or categorical. For our model, we used

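$$
a = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \\ a_6 \\ a_7 \\ a_8 \\ a_9 \\ a_{10} \end{pmatrix}
= \begin{pmatrix}
\text{Location} \\
\text{Domicile} \\
\text{Capacity type} \\
\text{Scheduled time at home} \\
\text{Days away from home} \\
\text{Available time} \\
\text{Geographical constraints} \\
\text{DOT road hours} \\
\text{DOT duty hours} \\
\text{Eight-day duty hours}
\end{pmatrix},
$$

$\mathcal{A}$ = Set of all possible driver attribute vectors $a$.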

A brief discussion of the driver attributes (and the load attributes below) provides an appreciation of some of the complexities in an industrial-strength system. Driver locations were captured at a level that produced 400 locations around the country. Driver domiciles were also captured at a level that divided the country into 100 regions. As discussed earlier, there were three capacity types: team, solo, and IC (independent contractor). The three attributes (location, domicile, and capacity type) were particularly important and will play a major role throughout our analysis. Field $a_4$ is the time by which we would like to get the driver back home (e.g., next Saturday), but the cost of not doing this is also influenced by the number of days the driver has been away from home ($a_5$). Our ability to get drivers home on time was one of the major metrics to which we had to calibrate.

The remaining attributes were needed to produce an accurate simulation. For example, $a_6$ (available time) captured the fact that a driver might be headed to Chicago ($a_1$ = Chicago) but would not arrive until 3:17 p.m. tomorrow (all activities were modeled in continuous time). Field $a_7$ captured constraints such as the fact that Canadian drivers in the United States had to return to Canada, or that other drivers had to stay within 500 miles of their homes. Fields $a_8$ and $a_9$ (Department of Transportation (DOT) road hours and DOT duty hours) captured how many hours a driver had been behind the wheel (road hours) or on duty (duty hours) on a given day. Field $a_{10}$ is actually an eight-element vector, capturing the number of hours a driver had worked on each of the last eight days.

Similarly, we let $b$ be the vector of attributes of a load, including elements such as origin, destination, appointment time and type, priority, revenue, and delivery window. Some windows are tight but many are fairly loose, providing some flexibility in when a load is served. We let $\mathcal{B}$ be the space of all load types.

We can think of $a_t$, the attribute vector of a driver at time $t$, as the state of the driver. We model the state of all the drivers using the resource state vector, which is defined using

$R_{ta}$ = The number of resources with attribute vector $a$ at time $t$.

$R_t$ = The resource state vector at time $t$, $R_t = (R_{ta})_{a \in \mathcal{A}}$.
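As a minimal sketch (not the authors' implementation), the resource state vector can be stored as a sparse count over attribute vectors. The `Driver` tuple below keeps only three of the ten attributes, and every field name and sample value is an illustrative assumption.

```python
from collections import Counter
from typing import NamedTuple


class Driver(NamedTuple):
    """Hypothetical, truncated attribute vector a = (a1, a2, a3)."""
    location: str       # a1
    domicile: str       # a2
    capacity_type: str  # a3: "team", "solo", or "IC"


def resource_state(drivers):
    """Return R_t as a sparse map: attribute vector a -> R_ta."""
    return Counter(drivers)


# Made-up fleet of three drivers; two share the same attribute vector.
drivers = [
    Driver("Chicago", "Dallas", "solo"),
    Driver("Chicago", "Dallas", "solo"),
    Driver("Maryland", "Central Illinois", "team"),
]
R_t = resource_state(drivers)
```

Attribute vectors that no driver currently holds are simply absent from the map, and a `Counter` reports them as zero. With the full ten-attribute vector, most nonzero entries would be expected to equal one.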

We then let $D_{tb}$ be the number of loads with attribute $b$, and let $D_t = (D_{tb})_{b \in \mathcal{B}}$. Our system state vector is then given by

$$
S_t = (R_t, D_t).
$$

We measure the state $S_t$ just before we make a decision. These decision epochs are modeled in discrete time $t = 0, 1, 2, \ldots, T$, but the physical process occurs in continuous time. For example, the available time of a driver $a_6$ and the “ready time” (time at which it is available for pickup) of a load $b_6$ are both continuous.

There are two types of exogenous information processes: updates to the attributes of a driver and new customer demands. We let

$\hat{R}_{ta}$ = The change in the number of drivers with attribute $a$ due to information arriving between time $t-1$ and $t$.

$\hat{D}_{tb}$ = The number of new loads that first became known to the system with attribute $b$ between time $t-1$ and $t$.

For example, $\hat{D}_{tb} = +1$ if we have a new customer order with attribute vector $b$. If a driver attribute randomly changed from $a$ to $a'$ (arising, for example, from a delay), we would have $\hat{R}_{ta} = -1$ and $\hat{R}_{ta'} = +1$. We let $W_t = (\hat{R}_t, \hat{D}_t)$ be our generic variable for new information. We view information as arriving continuously in time, where the interval between time instant $t-1$ and $t$ is labeled as time interval $t$. Thus, $W_1$ is the information that arrives between now ($t = 0$) and the first decision epoch ($t = 1$).

The major decision classes for this problem include whether a truck is to be used to move a load to a particular destination or whether it needs to move empty to another location in the anticipation of future loads with better rewards. When a decision is applied to a resource, it produces a contribution. A loaded move would generate revenue, whereas an empty move would incur some cost. Decisions are described using

$d$ = An elementary decision,



$\mathcal{D}^L$ = The set of all decisions to cover a type of load, where an element $d \in \mathcal{D}^L$ represents a decision to cover a load of type $b_d \in \mathcal{B}$,

$d^\phi$ = The decision to hold a driver,

$\mathcal{D} = \mathcal{D}^L \cup d^\phi$,

$x_{tad}$ = The number of times decision $d$ is applied to resource with attribute vector $a$ at time $t$,

$x_t = (x_{tad})_{a \in \mathcal{A},\, d \in \mathcal{D}}$.

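The decision variables $x_{tad}$ have to satisfy the following constraints:

$$
\sum_{d \in \mathcal{D}} x_{tad} = R_{ta} \quad \forall a \in \mathcal{A}, \tag{1}
$$

$$
\sum_{a \in \mathcal{A}} x_{tad} \le D_{t b_d} \quad \forall d \in \mathcal{D}^L, \tag{2}
$$

$$
x_{tad} \ge 0 \quad a \in \mathcal{A},\ d \in \mathcal{D}. \tag{3}
$$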

Equation (1) captures flow conservation for drivers (we cannot assign more than we have of a particular type) and Equation (2) is flow conservation on loads (we cannot assign more drivers to loads of type $b_d$ than there are loads of this type). We let $\mathcal{X}_t$ be the set of all $x_t$ that satisfy Equations (1)–(3). The feasible region $\mathcal{X}_t$ depends on $S_t$. Rather than write $\mathcal{X}(S_t)$, we let the subscript $t$ in $\mathcal{X}_t$ indicate the dependence on the information available at time $t$. Finally, we assume that decisions are determined by a decision function denoted

$X^\pi(S_t)$ = A function that determines $x_t \in \mathcal{X}_t$ given $S_t$, where $\pi \in \Pi$,

$\Pi$ = A set of decision functions (or policies).
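The feasibility conditions in Equations (1)–(3) can be checked directly for a small instance. The following is a minimal sketch under assumed data structures (dictionaries keyed by hypothetical attribute and decision labels), not the network code used in the project.

```python
def is_feasible(x, R, D, load_of):
    """Check constraints (1)-(3) for a decision vector x.

    x       : dict mapping (a, d) -> number of times d is applied to a
    R       : dict mapping a -> R_ta (drivers available)
    D       : dict mapping b -> D_tb (loads available)
    load_of : dict mapping each load-covering decision d -> its load type b_d
              (any decision not listed, such as a hold, is uncapped)
    """
    # (3) nonnegativity
    if any(v < 0 for v in x.values()):
        return False
    # (1) every driver with attribute a receives exactly one decision
    attrs = set(R) | {a for a, _ in x}
    for a in attrs:
        assigned = sum(v for (a2, _), v in x.items() if a2 == a)
        if assigned != R.get(a, 0):
            return False
    # (2) no more drivers sent to loads of type b_d than loads exist
    for d, b in load_of.items():
        covering = sum(v for (_, d2), v in x.items() if d2 == d)
        if covering > D.get(b, 0):
            return False
    return True


# Hypothetical instance: three drivers, one load of type b1.
R = {"a1": 2, "a2": 1}
D = {"b1": 1}
load_of = {"d1": "b1"}
x_ok = {("a1", "d1"): 1, ("a1", "hold"): 1, ("a2", "hold"): 1}
x_two_on_one_load = {("a1", "d1"): 1, ("a1", "hold"): 1, ("a2", "d1"): 1}
x_driver_unassigned = {("a1", "d1"): 1, ("a2", "hold"): 1}
```

Here `x_ok` satisfies all three constraints, `x_two_on_one_load` violates Equation (2), and `x_driver_unassigned` violates Equation (1) because one of the two `a1` drivers receives no decision.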

We next need to model the dynamics of the system. Both $R_t$ and $D_t$ evolve over time, but for the moment we focus purely on the evolution of $R_t$. If we act on a driver with attribute $a$ using decision $d$, we represent the change in the attribute vector using

$$
a' = a^M(a, d).
$$

We model the transition function deterministically, which means that $a'$ is the attribute vector that we think results from a decision but before any new information has arrived. So, if we decide to move a truck from Dallas to Chicago leaving at time 12.2 with an expected travel time of 17.5, then immediately after the assignment, this would be a truck with the attribute that we expect it to be in Chicago at time 29.7 (later information may change this). For algebraic purposes, define

$$
\delta_{a'}(a, d) =
\begin{cases}
1 & \text{if } a^M(a, d) = a', \\
0 & \text{otherwise.}
\end{cases}
$$

We now define the post-decision resource vector, which is the resource vector after we make a decision but before any new information arrives. This can be written as:

$$
R^x_{ta'} = \sum_{a \in \mathcal{A}} \sum_{d \in \mathcal{D}} \delta_{a'}(a, d)\, x_{tad}. \tag{4}
$$

Finally, our next predecision resource vector would be given by

$$
R_{t+1,a} = R^x_{ta} + \hat{R}_{t+1,a}. \tag{5}
$$

It is more conventional in stochastic dynamic systems to write the transition from $R_t$ to $R_{t+1}$. Explicitly capturing the post-decision resource vector provides significant computational advantages, as we illustrate later.
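The two-step transition in Equations (4) and (5) can be sketched as follows, using the Dallas-to-Chicago example from the text. The attribute vector is reduced to (location, available time), and the function names and the delayed arrival time of 31.0 are illustrative assumptions.

```python
from collections import Counter


def a_M(a, d):
    """Hypothetical transition function a' = a^M(a, d).

    a = (location, available_time); d = (destination, expected_travel_time),
    or None for the hold decision.
    """
    if d is None:
        return a
    location, available_time = a
    destination, travel_time = d
    return (destination, round(available_time + travel_time, 1))


def post_decision(x):
    """Equation (4): R^x_{ta'} = sum_a sum_d delta_{a'}(a, d) x_tad."""
    R_x = Counter()
    for (a, d), count in x.items():
        R_x[a_M(a, d)] += count
    return R_x


def next_pre_decision(R_x, R_hat):
    """Equation (5): R_{t+1,a} = R^x_{ta} + R_hat_{t+1,a}."""
    R_next = Counter(R_x)
    for a, change in R_hat.items():
        R_next[a] += change
    return +R_next  # unary plus drops attribute vectors whose count is zero


# One truck leaves Dallas at time 12.2 with an expected travel time of 17.5.
x = {(("Dallas", 12.2), ("Chicago", 17.5)): 1}
R_x = post_decision(x)

# Assumed exogenous update: a delay moves the expected arrival to 31.0.
R_hat = {("Chicago", 29.7): -1, ("Chicago", 31.0): +1}
R_next = next_pre_decision(R_x, R_hat)
```

The post-decision vector `R_x` holds what we expect immediately after the decision (Chicago at 29.7), while `R_next` reflects the same truck after the new information has arrived.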

The transition function for the demands is symmetrical. In addition to the state variable $D_t$, we would define the post-decision demand vector $D^x_t$ along with an indicator function similar to $\delta$ to describe how decisions change the attributes of a load. In the simplest model, a demand is either moved (in which case it leaves the system) or it waits until the next time period. In our project, it was possible to have a driver move to pick up a load, move the load to an intermediate location, and then drop it off so that a different driver could finish the move (this is known as a relay). Whereas such strategies are used for only a small percentage of the total demand, trucking companies will use such strategies to help get drivers home. A driver may pick up a load that takes him too far from his home. Instead, he may move the load part way so that a different driver can pick up the load and complete the trip.

We define the objective function using

$c_{tad}$ = The contribution generated by applying decision $d$ to resource with attribute vector $a$ at time $t$.

The contributions were divided between “hard dollar” and “soft dollar” contributions. Hard dollar contributions include the revenue generated from moving a load minus the cost of actually moving the truck (first moving empty to pick up the load, followed by the actual cost of moving the load). The soft dollar costs capture bonuses for getting the driver home, penalties for early or late pick-up of a load, and penalties for getting a driver home before or after the time that he was scheduled to get home.

If we assume that the contributions are linear, the contribution function for period $t$ would be given by

$$
C_t(S_t, x_t) = \sum_{a \in \mathcal{A}} \sum_{d \in \mathcal{D}} c_{tad}\, x_{tad}. \tag{6}
$$

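The optimal policy maximizes the expected sum of contributions, discounted by a factor $\gamma$, over all the time periods:

$$
F^*_0(S_0) = \max_{\pi \in \Pi} \mathbb{E} \left\{ \sum_{t=0}^{T} \gamma^t C\bigl(S_t, X^\pi_t(S_t)\bigr) \,\middle|\, S_0 \right\}. \tag{7}
$$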



One policy for solving this problem is the myopic policy, given by

$$
X^M_t(S_t) = \arg\max_{x_t \in \mathcal{X}_t} \sum_{a \in \mathcal{A}} \sum_{d \in \mathcal{D}} c_{tad}\, x_{tad},
$$

which involves assigning known drivers to known loads at each point in time. This is a straightforward assignment problem, involving the assigning of drivers (that is, attributes $a$ where $R_{ta} > 0$) to loads (attributes $b$ where $D_{tb} > 0$). One of the biggest challenges we faced was the sheer size of this problem, which involved over 2,000 available drivers and loads at each time period. Using careful engineering, we limited the number of links per driver (or load) to approximately 10, which still required generating about 20,000 links for each time period (the costing of each link required considerable calculations to enforce driver work rules and to handle the service constraints on each load). Given a solution $x_t = X^M_t(S_t)$, we would then use our transition functions to compute $(R^x_t, D^x_t)$ and then find $(R_{t+1}, D_{t+1})$ by sampling $\hat{R}_{t+1}(\omega)$ and $\hat{D}_{t+1}(\omega)$.

A central hypothesis of this research is that an algorithm that does a better job of solving Equation (7) will do a better job of matching the historical performance of the company. Although we use approximations, our strategy works from a formal statement of the objective function (something that is typically missing from most simulation papers) rather than heuristic policies. As we show, a by-product of this strategy is that we also obtain estimates of the derivative of $F^*_0(S_0)$ with respect to $R_{0a}$ (for $a$ at some level of aggregation) that would tell us the value of hiring additional drivers in a particular domicile.
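For intuition, the myopic policy described above can be sketched on a tiny instance by brute-force enumeration. This is an illustration only: at the scale reported in the text (about 20,000 links per period) an assignment or network solver would be needed, and the driver names, load names, and contributions below are made up.

```python
from itertools import product


def myopic_assignment(drivers, loads, contribution, hold_value=0.0):
    """Brute-force arg max of sum c_tad x_tad for a tiny instance.

    Each driver is assigned to at most one load (or held), and each load
    is covered by at most one driver. Only pairs present in `contribution`
    are allowed, mirroring a pruned set of driver-load links.
    """
    options = [[None] + [load for load in loads if (drv, load) in contribution]
               for drv in drivers]
    best_value, best_plan = float("-inf"), None
    for plan in product(*options):
        used = [load for load in plan if load is not None]
        if len(used) != len(set(used)):  # a load was assigned twice
            continue
        value = sum(hold_value if load is None else contribution[(drv, load)]
                    for drv, load in zip(drivers, plan))
        if value > best_value:
            best_value, best_plan = value, dict(zip(drivers, plan))
    return best_value, best_plan


# Hypothetical contributions c_tad (revenue minus empty and loaded costs).
drivers = ["solo_1", "team_1"]
loads = ["L1", "L2"]
contribution = {
    ("solo_1", "L1"): 300, ("solo_1", "L2"): 500,
    ("team_1", "L1"): 350, ("team_1", "L2"): 900,
}
value, plan = myopic_assignment(drivers, loads, contribution)
```

In this instance the best plan gives the longer load `L2` to the team and `L1` to the solo driver. The policy looks only at contributions available now, which is exactly the limitation the value function approximations are meant to address.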

In the next section, we describe the strategies we tested for solving Equation (7).

3. Algorithmic Strategies

In dynamic programming, instead of solving Equation (7) in its entirety, we divide the problem into time stages. At each time period, depending on our current state, we can search over the set of available actions to identify a subset that is optimal. The value associated with each state can be computed using Bellman’s optimality equations, which are typically written as

$$
V_t(S_t) = \max_{x_t \in \mathcal{X}_t} \Bigl( C_t(S_t, x_t) + \gamma \sum_{s' \in \mathcal{S}} p(s' \mid S_t, x_t)\, V_{t+1}(s') \Bigr), \tag{8}
$$

where $p(s' \mid S_t, x_t)$ is the one-step transition matrix giving the probability that $S_{t+1} = s'$, and $\mathcal{S}$ is the state space. Solving Equation (8) encounters three curses of dimensionality: the state vector $S_t$ (with dimensionality $|\mathcal{A}| + |\mathcal{B}|$, which can be extremely large), the outcome space (the expectation is over a vector of random variables measuring $|\mathcal{A}| + |\mathcal{B}|$), and the action space (the vector $x_t$ is dimensioned $|\mathcal{A}| \times |\mathcal{D}|$).
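To get a feel for the first curse of dimensionality, the sketch below counts how many distinct resource vectors $R_t$ exist when a fixed number of drivers is spread over a given number of attribute vectors. The count uses the standard stars-and-bars formula; the function name and the small sample sizes are illustrative assumptions, while the figures 400, 100, and 3 are the location, domicile, and capacity-type counts from §2.

```python
from math import comb


def num_resource_states(num_drivers, num_attributes):
    """Number of distinct vectors R_t with nonnegative integer entries
    that sum to num_drivers (stars and bars)."""
    return comb(num_drivers + num_attributes - 1, num_attributes - 1)


# Tiny cases that can be checked by hand.
small = num_resource_states(3, 2)    # (3,0), (2,1), (1,2), (0,3)
medium = num_resource_states(10, 5)

# Attribute vectors generated by just the first three attributes.
three_attribute_vectors = 400 * 100 * 3
```

Even 10 drivers over 5 attribute vectors already give 1,001 possible resource vectors, and the first three attributes alone generate 120,000 attribute vectors, so enumerating the states in Equation (8) is out of reach for a fleet of thousands.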

Section 3.1 provides a sketch of a basic approximate dynamic programming algorithm for approximating the solution of Equation (7). Section 3.2 describes how we update the value function. Section 3.3 shows how we solve the statistical problem of estimating the value of drivers with hundreds of thousands of attribute vectors. Section 3.4 briefly describes research on stepsizes that was motivated by this project. In §3.5, we describe how we implemented a backward pass to accelerate the rate of convergence. Finally, §3.6 reports on a series of comparisons of different algorithmic choices we had to make.

3.1. An Approximate Dynamic Programming Algorithm

Approximate dynamic programming has been emerging as a powerful technique for solving dynamic programs that would otherwise be computationally intractable. Our approach requires merging math programming with the techniques of machine learning used within approximate dynamic programming. Our algorithmic strategy differs markedly from what is presented in classic texts on approximate dynamic programming, particularly in our use of the post-decision state variable. A comprehensive treatment of our algorithmic strategy is contained in Powell (2007).

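We solve Equation (7) by breaking the dynamic programming recursions into two steps:

$$
V^x_{t-1}(S^x_{t-1}) = \mathbb{E}\bigl\{ V_t(S_t) \mid S^x_{t-1} \bigr\}, \tag{9}
$$

$$
V_t(S_t) = \max_{x_t \in \mathcal{X}_t} \bigl( C(S_t, x_t) + \gamma V^x_t(S^x_t) \bigr), \tag{10}
$$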

where $S_t = S^{M,W}(S^x_{t-1}, W_t)$ and $S^x_t = S^{M,x}(S_t, x_t)$. The basic algorithmic strategy works as follows: At iteration $n$, assume we are following sample path $\omega^n$ and that we find ourselves at the post-decision state $S^{x,n}_{t-1}$ after making the decision $x^n_{t-1}$. Now, compute the next predecision state $S^n_t$ using

$$
S^n_t = S^{M,W}\bigl(S^{x,n}_{t-1}, W_t(\omega^n)\bigr).
$$

From state $S^n_t$, we compute our feasible region $\mathcal{X}^n_t$ (which depends on information such as $\hat{R}^n_t$ and $\hat{D}^n_t$). Next, solve the optimization problem

$$
\hat{v}^n_t = \max_{x_t \in \mathcal{X}^n_t} \Bigl( C_t(S^n_t, x_t) + \gamma \bar{V}^{n-1}_t\bigl(S^{M,x}(S^n_t, x_t)\bigr) \Bigr) \tag{11}
$$

and assume that $x^n_t$ is the value of $x_t$ that solves Equation (11). We then compute the post-decision state $S^{x,n}_t = S^{M,x}(S^n_t, x^n_t)$ to continue the process.

We next wish to use the solution of Equation (11) to update our value function approximation. With traditional approximate dynamic programming (Bertsekas and Tsitsiklis 1996; Sutton and Barto 1998), we would use $\hat{v}^n_t$ to update a value function approximation around $S^n_t$. Using the post-decision state variable,



we use $\hat{v}^n_t$ to update $\bar{V}^{n-1}_{t-1}(S^{x,n}_{t-1})$ around $S^{x,n}_{t-1}$. The updating strategy depends on the specific structure of $\bar{V}^{n-1}_{t-1}(S^x_{t-1})$.
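The forward pass described above can be sketched for a single driver, where the post-decision state is simply the location the driver is headed to. This is a toy illustration under several assumptions that are not taken from the paper: one driver, decisions that are always available (no sampled exogenous information), a lookup-table approximation, and a constant stepsize `alpha` that smooths each new observation into the old estimate. All locations and contributions are made up.

```python
def forward_pass(v_bar, start, T, options, gamma=1.0, alpha=1.0):
    """One iteration n of the forward pass for a single driver.

    v_bar   : dict (t, post_decision_location) -> value estimate, updated in place
    options : function (t, location) -> dict {next_location: contribution}
              describing the decisions available in the state S_t^n
    alpha   : stepsize used to smooth v_hat into v_bar (an assumption here)
    """
    location, previous_post, total = start, None, 0.0
    for t in range(T + 1):
        # Equation (11): contribution plus estimated downstream value.
        choices = options(t, location)
        scores = {nxt: c + gamma * v_bar.get((t, nxt), 0.0)
                  for nxt, c in choices.items()}
        best = max(scores, key=scores.get)
        v_hat = scores[best]
        # Use v_hat_t to update the estimate at the previous post-decision state.
        if previous_post is not None:
            old = v_bar.get(previous_post, 0.0)
            v_bar[previous_post] = (1 - alpha) * old + alpha * v_hat
        total += choices[best]
        previous_post = (t, best)  # post-decision state S_t^{x,n}
        location = best            # no exogenous change to the driver here
    return total


# Hypothetical loads; staying in place (a hold) earns nothing.
LOADS = {
    "MD": {"MD": 0.0, "ID": 1000.0, "OH": 400.0},
    "ID": {"ID": 0.0, "MD": 100.0},
    "OH": {"OH": 0.0, "MD": 500.0},
}


def options(t, location):
    return LOADS[location]


v_bar = {}
first = forward_pass(v_bar, "MD", T=2, options=options)
second = forward_pass(v_bar, "MD", T=2, options=options)
```

After two passes, `v_bar[(0, "ID")]` holds the value of being headed to Idaho after the first decision, learned from what the driver could earn afterward. The sketch only ever evaluates states it actually visits; with many drivers and a linear approximation, the quantities being smoothed would be marginal values rather than the total objective.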

To design our value function approximation, we took advantage of two properties of our problem. First, most loads are served at a given point in time. If we were to define a post-decision demand vector $D^x_t$ (comparable to the post-decision resource vector $R^x_t$) that gives the number of loads left over after assignment decisions have been made, we would find that most of the elements of $D^x_t$ were zero. Second, given the complexity of the attribute vector, $R_{ta}$ was typically zero or one. For this reason, we used a value function approximation that was linear in $R_{ta}$, given by

$$
\bar{V}^{n-1}_t(S^x_t) = \bar{V}^{n-1}_t(R^x_t) = \sum_{a' \in \mathcal{A}} \bar{v}_{ta'}\, R^x_{ta'}. \tag{12}
$$

We have worked extensively with nonlinear (piecewise linear) approximations of the value function to capture nonlinear behavior such as “the fifth truck in a region is not as useful as the first” (see Topaloglu and Powell 2006, for example), but in this project the focus was less on determining how many drivers to move and more on what type of driver to use.

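It is easy to rewrite Equation (12) using

$$
\bar{V}_t(R^x_t) = \sum_{a' \in \mathcal{A}} \bar{v}_{ta'} \sum_{a \in \mathcal{A}} \sum_{d \in \mathcal{D}} \delta_{a'}(a, d)\, x_{tad}, \tag{13}
$$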

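where Equation (13) is obtained by using the state transition equation (4). This enables us to write the problem of finding the optimal decision function using

$$
\begin{aligned}
X^\pi_t(S_t) &= \arg\max_{x_t \in \mathcal{X}_t} \Bigl( \sum_{a \in \mathcal{A}} \sum_{d \in \mathcal{D}} c_{tad}\, x_{tad} + \gamma \sum_{a' \in \mathcal{A}} \bar{v}_{ta'} \cdot \sum_{a \in \mathcal{A}} \sum_{d \in \mathcal{D}} \delta_{a'}(a, d)\, x_{tad} \Bigr) \\
&= \arg\max_{x_t \in \mathcal{X}_t(\omega)} \sum_{a \in \mathcal{A}} \sum_{d \in \mathcal{D}} \Bigl( c_{tad} + \gamma \sum_{a' \in \mathcal{A}} \bar{v}_{ta'}\, \delta_{a'}(a, d) \Bigr) x_{tad}.
\end{aligned}
\tag{14}
$$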

Recognizing that∑

a′∈� �a′�ad� = �aM�atdt ��ad� = 1,

we can write Equation (14) as

X�t �St�= arg max

xt∈�t ��

∑a∈�

∑d∈�

(ctad +�vn−1

t aM �ad�

)xtad (15)

Clearly, Equation (15) is no more difficult than solving the original myopic problem, with the only difference being that we have to solve it iteratively in order to estimate the value function approximation. Fortunately, it is neither necessary nor desirable to reestimate the value functions each time we undertake a policy study.
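As a minimal sketch of the decision problem in Equation (15), the Python fragment below adds the discounted value of the future driver attribute to each assignment contribution and then picks the best one-to-one assignment by brute-force enumeration. The function names, the toy data, the (domicile, destination) attribute encoding, and the enumeration itself are illustrative assumptions; the production model solves a large assignment problem with a math programming solver, which this sketch does not attempt to reproduce.

```python
from itertools import permutations

def modified_contribution(c, v_bar, a_future, gamma=1.0):
    """Return c[a][d] + gamma * v_bar[a_future[a][d]] for every driver a and load d."""
    return [[c[a][d] + gamma * v_bar.get(a_future[a][d], 0.0)
             for d in range(len(c[a]))]
            for a in range(len(c))]

def best_assignment(contrib):
    """Brute-force the one-to-one driver-to-load assignment with the largest total."""
    n = len(contrib)
    best_total, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        total = sum(contrib[a][perm[a]] for a in range(n))
        if total > best_total:
            best_total, best_perm = total, perm
    return best_perm, best_total

# Hypothetical data: two drivers, two loads.
# Future attributes are (domicile, destination) pairs; a driver who ends up
# at his or her domicile is assumed to be worth more downstream.
c = [[10.0, 8.0], [9.0, 8.0]]
a_future = [[("TX", "NY"), ("TX", "TX")],
            [("NY", "NY"), ("NY", "TX")]]
v_bar = {("TX", "TX"): 6.0, ("NY", "NY"): 6.0}

myopic = best_assignment(modified_contribution(c, v_bar, a_future, gamma=0.0))
lookahead = best_assignment(modified_contribution(c, v_bar, a_future, gamma=1.0))
```

With these made-up numbers the myopic solution sends driver 0 to load 0, whereas the solution that includes the value function swaps the two drivers so that both finish at home. The optimization problem has the same structure in both cases; only the arc contributions change.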

[Figure 1 diagram: drivers $a_1$ through $a_5$ on the left and loads on the right; assigning driver $a_3$ to loads $d_1, \ldots, d_5$ produces the future attributes $a^M(a_3, d_1), \ldots, a^M(a_3, d_5)$.]

Figure 1 Driver Assignment Problem, Illustrating the Different Future Driver Attributes that Have to be Evaluated

We face two challenges at this stage. First, we have to find a way to update the values of $\bar{v}^{n-1}_{t-1,a}$ using information derived from solving the decision problems. Section 3.2 describes a Monte Carlo-based approach, but this introduces a statistical problem. As illustrated in Figure 1, in order to decide which load a driver should move, we have to know the value of the driver at the end of each load. This means it is not enough to know the value of drivers with attributes that actually occur (that is, $R_{ta} > 0$); we must also know the value of attributes that we might visit.

3.2. Value Function Updates
Once we have settled on a value function approximation, we face the challenge of estimating it. The general idea in approximate dynamic programming is that we iteratively simulate the system forward in time. At iteration $n$, we follow a sample path $\omega^n$ that determines $\hat{R}^n_t = \hat{R}_t(\omega^n)$ and $\hat{D}^n_t = \hat{D}_t(\omega^n)$. The decision function in Equation (15) is computed using value functions $\bar{v}^{n-1}_{ta'}$, computed using information from iteration $n-1$. We then use information from iteration $n$ to update $\bar{v}^{n-1}_{t-1,a}$, giving us $\bar{v}^n_{t-1,a}$. This section describes how the updating is accomplished.

Assume that our previous decision (at time $t-1$) left us in the post-decision state $S^{x,n}_{t-1}$. Following the sample path $\omega^n$ then puts us in state $S^n_t = S^{M,W}(S^{x,n}_{t-1}, W_t(\omega^n))$, which determines our feasible region $\mathcal{X}^n_t$. We then make decisions at time $t$ by solving

$$F_t(S^n_t) = \max_{x_t \in \mathcal{X}^n_t} \Big( C_t(S^n_t, x_t) + \gamma \bar{V}^{n-1}_t(S^x_t) \Big), \qquad (16)$$

where $S^x_t = S^{M,x}(S^n_t, x_t)$. We let $x^n_t$ be the value of $x_t$ that solves Equation (16). Note that $R^{x,n}_{t-1}$ affects Equation (16) through the flow conservation constraint (1).



Keep in mind that $R^n_{ta} = R^{x,n}_{t-1,a} + \hat{R}_{ta}(\omega^n)$. $\hat{R}_{ta}$ may be the random arrival of a new driver but, for our work, it primarily captures random changes in the status of a driver (e.g., travel delays or equipment failures). If these transitions change the status of a driver from $a$ to $a'$, then we would have $\hat{R}_{ta} = -1$ and $\hat{R}_{ta'} = +1$. If there are no random changes of this sort (which means that $\hat{R}_{ta} = 0$), then it is easy to see that

$$\hat{v}^n_{t-1,a} = \frac{\partial F(S_t)}{\partial R^x_{t-1,a}} = \frac{\partial F(S_t)}{\partial R_{ta}} = \hat{\nu}^n_{ta}, \qquad (17)$$

where $\hat{\nu}^n_{ta}$ is the dual variable for the flow conservation constraint (1). If we do allow random changes (say, from $a$ to $a'$), we would use $\hat{\nu}^n_{ta'}$ to update $\hat{v}^n_{t-1,a}$.

We want to use information from Equation (16) to update the value functions used at time $t-1$, given by $\bar{v}^{n-1}_{t-1,a}$. Keeping in mind that these are estimates of slopes, what we need is the derivative of $F_t(S^n_t)$ with respect to each $R^{x,n}_{t-1,a}$, where $a = a^x_{t-1}$, which we compute using

$$\hat{v}^n_{t-1,a} = \frac{\partial F(S_t)}{\partial R^x_{t-1,a}} = \sum_{a' \in \mathcal{A}} \frac{\partial F(S_t)}{\partial R_{ta'}} \frac{\partial R_{ta'}}{\partial R^x_{t-1,a}} \Bigg|_{\omega = \omega^n}. \qquad (18)$$

$\partial F(S_t)/\partial R_{ta'}$ is just the dual of the optimization problem (15) associated with the flow conservation constraint (1), which we denote by $\hat{\nu}^n_{ta'}$. For the second part of the derivative, we have

$$\frac{\partial R_{ta'}}{\partial R^x_{t-1,a}} = \begin{cases} 1 & \text{if } a' = a^{M,W}(a^x_{t-1}, W_t(\omega^n)), \\ 0 & \text{otherwise.} \end{cases}$$

This simply means that if we had a truck with attribute $a^x_{t-1}$, which then evolves (due to exogenous information) into a truck with attribute $a' = a_t = a^{M,W}(a^x_{t-1}, W_t(\omega^n))$, then

$$\hat{v}^n_{t-1,a} = \hat{\nu}^n_{ta'}.$$

We do not have to execute the summation in Equation (18). We just need to keep track of the transition from $a^x_{t-1}$ to $a_t$. We note, however, that we are unable to compute $\hat{\nu}^n_{ta'}$ for each attribute $a' \in \mathcal{A}$ (the attribute space is too large). Instead, for each $a_{t-1}$ where $R^{x,n}_{t-1,a_{t-1}} > 0$, we found $a' = a_t = a^{M,W}(a_{t-1}, W_t(\omega^n))$ and computed $\hat{\nu}_{ta'}$. We then found $\hat{v}^n_{t-1,a_{t-1}}$ from Equation (18).

Once we have computed $\hat{v}^n_{t-1,a}$, we update the value function approximation using

$$\bar{v}^n_{t-1,a_{t-1}} = (1-\alpha_{n-1})\, \bar{v}^{n-1}_{t-1,a_{t-1}} + \alpha_{n-1}\, \hat{v}^n_{t-1,a}, \qquad (19)$$

where $\alpha_{n-1}$ is a stepsize between zero and one (discussed in greater detail in §3.4).
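The smoothing step in Equation (19) can be sketched in a few lines of Python. The dictionary layout, the key format, and the constant stepsize of 0.5 are assumptions made for illustration only; the stepsize rules actually used are the subject of §3.4.

```python
def smooth_update(v_bar, key, v_hat, alpha):
    """Equation (19): blend the previous estimate with the new observation v_hat."""
    old = v_bar.get(key, 0.0)          # estimates are assumed to start at zero
    v_bar[key] = (1.0 - alpha) * old + alpha * v_hat
    return v_bar[key]

v_bar = {}
key = (3, "region_17")                 # hypothetical (time, attribute) index
trace = [smooth_update(v_bar, key, v_hat, alpha=0.5)
         for v_hat in [100.0, 100.0, 100.0]]
```

Even when every observation equals 100, the estimate passes through 50, 75, and 87.5, which shows how an estimate initialized to zero approaches the observed value only gradually.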

Step 0: Initialization:
    Step 0a: Initialize $\bar{V}^0_t$, $t \in \mathcal{T}$.
    Step 0b: Initialize the state $S^1_0$.
    Step 0c: Set $n = 1$.
Step 1: Choose a sample path $\omega^n$.
Step 2: Do for $t = 0, 1, \ldots, T$:
    Step 2a: Solve the optimization problem:

        $$\max_{x_t \in \mathcal{X}^n_t} \Big( C_t(S^n_t, x_t) + \gamma \bar{V}^{n-1}_t\big(S^{M,x}(S^n_t, x_t)\big) \Big). \qquad (20)$$

        Let $x^n_t$ be the value of $x_t$ that solves Equation (20), and let $\hat{\nu}^n_{ta_t}$ be the dual corresponding to the resource conservation constraint for each $R_{ta_t}$ where $R_{ta_t} > 0$.
    Step 2b: Update the value function using

        $$\bar{v}^n_{t-1,a} = (1-\alpha_{n-1})\, \bar{v}^{n-1}_{t-1,a} + \alpha_{n-1}\, \hat{\nu}^n_{ta}.$$

        Do this for each attribute $a$ for which we have computed $\hat{\nu}^n_{ta}$.
    Step 2c: Update the state:

        $$S^{x,n}_t = S^{M,x}(S^n_t, x^n_t), \qquad S^n_t = S^{M,W}(S^{x,n}_{t-1}, W_t(\omega^n)).$$

Step 3: Increment $n$. If $n \le N$, then set $S^{x,n}_0 = S^{x,n-1}_T$ and go to Step 1.
Step 4: Return the value functions, $\{\bar{v}^n_{ta},\ t = 1, \ldots, T,\ a \in \mathcal{A}\}$.

Figure 2 An Approximate Dynamic Programming Algorithm to Solve the Driver Assignment Problem

We outline the steps of a typical approximate dynamic programming algorithm for solving the fleet management problem in Figure 2. This algorithm uses a single pass to simulate a sample trajectory using the current estimates of the value functions. We start from an initial state $S^1_0 = (R_0, D_0)$ of drivers and loads with a value function approximation $\bar{V}^0_t(S^x_t)$. From this, we determine an assignment of drivers to loads $x^1_0$. We then find the post-decision state $S^{x,1}_0$ and simulate our way to the next state $S^1_1 = S^{M,W}(S^{x,1}_0, W_1(\omega^1))$. This simulation includes new customer orders as well as random changes to the status of the drivers. All of the complexity of the physics of the problem is captured in the transition functions, which impose virtually no limits on our ability to handle the realism of the problem.

At an early stage of the project, the company expressed concern that the results might be overfitted to a particular set of drivers (input to the model as $R^x_0$) and loads. We took two steps in response to this concern. First, we randomized the loads, choosing a subset from a larger set of loads at each iteration. Second, we took the final resource state vector ($R^x_T$) and used this as the new initial resource state vector (see Step 3).

A major technical challenge in the algorithm is computing the value function approximation $\bar{V}_t = (\bar{v}_{ta})_{a \in \mathcal{A}}$. Even if the attribute vector $a$ has only a few dimensions, the attribute space is too large to update using Equation (19). Furthermore, we only obtain updates $\hat{v}^n_{ta}$ for a subset of attributes $a$ at each iteration.



In principle, we could have solved our decision problem for a resource vector $R_{ta}$ using all the attributes in $\mathcal{A}$. This is completely impractical. For our simulation, we only generated nodes for attributes $a$ where $R^n_{ta} > 0$ (as a rule, we generated a unique node for each driver), which means we obtain $\hat{v}^n_{ta}$ only for a subset of attributes. We need an estimate $\bar{v}_{ta}$ not just for where we have drivers (that is, $R_{ta} > 0$) but where we might want to send drivers. We address this problem in the next section.

3.3. Approximating the Value Function
The full-attribute vector $a$ that is needed to completely capture the important characteristics of the driver produces an attribute space that is far too large to enumerate. Fortunately, it is not necessary to use all these attributes for the purpose of approximating the value function. In addition to time (we have a finite horizon model, so all value functions are indexed by time), three attributes were considered essential: the location of the driver, the home domicile of the driver, and his capacity type (team, solo, or independent contractor). The company divided the country into 100 regions for the purpose of representing location and domicile (this is only for the value function). Combined with three capacity types and 20 time periods, this produced a total of 600,000 attributes for which we would need an estimated value. Although dramatically smaller than the original attribute space, this is still extremely large. Most of these attributes will never be visited, and many will be visited only a few times. As a result, we have serious statistical issues in our ability to estimate $\bar{v}_{ta}$.

The standard approach to overcoming large state spaces is to use aggregation. We can use aggregation to create a hierarchy of state spaces $\{\mathcal{A}^{(g)},\ g = 0, 1, 2, \ldots, |\mathcal{G}|\}$ with successively fewer elements. We illustrate four levels of aggregations in Table 1. At level 0, we have 20 time periods, 100 regions for location and domicile, and 3 capacity types, producing 600,000 attributes. At aggregation level 1, we ignored the driver domicile; at aggregation level 2, we ignored the capacity type; and at aggregation level 3, we represented location as one of 10 areas, which had the effect of insuring that we always had some type of estimate for any attribute.

Table 1 Levels of Aggregation Used to Approximate Value Functions

g    Time    Location    Domicile    Capacity type    |A|
0    ∗       Region      Region      ∗                600,000
1    ∗       Region      —           ∗                6,000
2    ∗       Region      —           —                2,000
3    ∗       Area        —           —                200

Note. A “∗” corresponding to a particular attribute indicates that the attribute is included in the attribute vector, and a “—” indicates that it is aggregated out.
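The aggregation hierarchy in Table 1 can be sketched as a mapping from a level-0 attribute to a coarser key. The tuple layout `(time, location, domicile, capacity)`, the region-to-area map, and all names below are assumptions for illustration; only the sizes (20 time periods, 100 regions, 10 areas, 3 capacity types) come from the text.

```python
from math import prod

# Sizes taken from the text; the dictionary keys are illustrative names.
SIZES = {"time": 20, "location_region": 100, "location_area": 10,
         "domicile_region": 100, "capacity": 3}

# Which dimensions are retained at each aggregation level g (Table 1).
LEVELS = {
    0: ("time", "location_region", "domicile_region", "capacity"),
    1: ("time", "location_region", "capacity"),
    2: ("time", "location_region"),
    3: ("time", "location_area"),
}

def aggregate(attribute, g, region_to_area):
    """Map a level-0 attribute (time, location, domicile, capacity) to its level-g key."""
    t, loc, dom, cap = attribute
    if g == 0:
        return (t, loc, dom, cap)
    if g == 1:
        return (t, loc, cap)
    if g == 2:
        return (t, loc)
    if g == 3:
        return (t, region_to_area[loc])
    raise ValueError(f"unknown aggregation level {g}")

def space_size(g):
    """Number of distinct attributes at aggregation level g."""
    return prod(SIZES[name] for name in LEVELS[g])

# Hypothetical map: regions 0..99 grouped into 10 areas of 10 regions each.
region_to_area = {r: r // 10 for r in range(100)}
sizes = [space_size(g) for g in range(4)]
coarse_key = aggregate((5, 42, 7, "team"), 3, region_to_area)
```

The computed sizes reproduce the last column of Table 1, and any level-0 attribute maps to exactly one key at each coarser level, so an observation for one driver can update the statistics at all four levels.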

Choosing the right level of aggregation to approximate the value functions involves a trade-off between statistical and structural errors. If $\{\bar{v}^{(g)}_{ta},\ g \in \mathcal{G}\}$ denotes estimates of a value $v_{ta}$ at different levels of aggregation, we can compute an improved estimate as a weighted combination of estimates of the values at different levels of aggregation using

$$\bar{v}_{ta} = \sum_{g \in \mathcal{G}} w^{(g)}_{ta} \cdot \bar{v}^{(g)}_{ta}, \qquad (21)$$

where $\{w^{(g)}_{ta}\}_{g \in \mathcal{G}}$ is a set of appropriately chosen weights. George, Powell, and Kulkarni (2005) show that good results can be achieved using a simple formula, called WIMSE, that weights the estimates at different levels of aggregation by the inverse of the estimates of their mean squared deviations (obtained as the sum of the variances and the biases) from the true value. These weights are easily computed from a series of simple calculations. We briefly summarize the equations without derivation. We first compute

$\bar{\beta}^{(g,n)}_{ta}$ = Estimate of bias due to smoothing a transient data series,

$$\bar{\beta}^{(g,n)}_{ta} = (1-\eta_{n-1})\, \bar{\beta}^{(g,n-1)}_{ta} + \eta_{n-1} \big( \hat{v}^n_{ta} - \bar{v}^{(g,n-1)}_{ta} \big), \qquad (22)$$

$\bar{\mu}^{(g,n)}_{ta}$ = Estimate of bias due to aggregation error,

$$\bar{\mu}^{(g,n)}_{ta} = \bar{v}^{(g,n)}_{ta} - \bar{v}^{(0,n)}_{ta},$$

$\bar{\bar{\beta}}^{(g,n)}_{ta}$ = Estimate of total squared variation,

$$\bar{\bar{\beta}}^{(g,n)}_{ta} = (1-\eta_{n-1})\, \bar{\bar{\beta}}^{(g,n-1)}_{ta} + \eta_{n-1} \big( \hat{v}^n_{ta} - \bar{v}^{(g,n-1)}_{ta} \big)^2.$$

We are using two stepsize formulas here. $\alpha^{(g,n-1)}_{ta}$ is the stepsize used in Equation (19) to update $\bar{v}^{n-1}_{ta}$. This is discussed in more detail in §3.4. $\eta_n$ is typically a deterministic stepsize that might be a constant such as 0.1, although we used McClain’s stepsize rule:

$$\eta_n = \frac{\eta_{n-1}}{1 + \eta_{n-1} - \bar{\eta}}, \qquad (23)$$

where $\bar{\eta} = 0.10$ has been found to be very robust (George, Powell, and Kulkarni 2005).
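A short sketch of the McClain rule in Equation (23) makes its behavior easy to see: starting from a large stepsize, the sequence declines and settles at the target $\bar{\eta}$. The starting value of 1.0 and the number of iterations are assumptions chosen for illustration.

```python
def mcclain(eta_prev, eta_bar=0.10):
    """Equation (23): next stepsize from the previous one and the target eta_bar."""
    return eta_prev / (1.0 + eta_prev - eta_bar)

eta = 1.0                      # assumed initial stepsize
history = [eta]
for _ in range(200):
    eta = mcclain(eta)
    history.append(eta)
```

The fixed point of the recursion is $\eta = \bar{\eta}$, since $\eta = \eta/(1 + \eta - \bar{\eta})$ holds only when $\eta = \bar{\eta}$ (for $\eta \neq 0$). The sequence therefore behaves like a declining rule early on and like a constant stepsize later.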

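We estimate the variance of the observations at a particular level of aggregation using

$$(s^2_{ta})^{(g,n)} = \frac{\bar{\bar{\beta}}^{(g,n)}_{ta} - \big(\bar{\beta}^{(g,n)}_{ta}\big)^2}{1 + \lambda^{(g,n)}_{ta}}, \qquad (24)$$

where $\lambda^{(g,n)}_{ta}$ is computed using

$$\lambda^{(g,n)}_{ta} = \begin{cases} \big(\alpha^{(g,n-1)}_{ta}\big)^2, & n = 1, \\ \big(1 - \alpha^{(g,n-1)}_{ta}\big)^2 \lambda^{(g,n-1)}_{ta} + \big(\alpha^{(g,n-1)}_{ta}\big)^2, & n > 1. \end{cases}$$

This allows us to compute an estimate of the variance of $\bar{v}^{(g,n)}_{ta}$ using

$$(\bar{\sigma}^2_{ta})^{(g,n)} = \mathrm{Var}\big[\bar{v}^{(g,n)}_{ta}\big] = \lambda^{(g,n)}_{ta}\, (s^2_{ta})^{(g,n)}. \qquad (25)$$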
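The recursions in Equations (22), (24), and (25) can be collected into a small bookkeeping class, one instance per attribute and aggregation level. The class name, field names, and the choice to pass both stepsizes in from the caller are assumptions for this sketch; it is not the project's implementation.

```python
class LevelStats:
    """Running statistics for one attribute at one aggregation level."""

    def __init__(self):
        self.n = 0
        self.v_bar = 0.0      # smoothed value estimate
        self.beta = 0.0       # smoothed bias, Equation (22)
        self.beta_sq = 0.0    # smoothed total squared variation
        self.lam = 0.0        # lambda coefficient

    def update(self, v_hat, alpha, eta):
        """Fold in observation v_hat using stepsizes alpha (value) and eta (statistics)."""
        self.n += 1
        error = v_hat - self.v_bar            # uses the estimate from iteration n - 1
        self.beta = (1 - eta) * self.beta + eta * error
        self.beta_sq = (1 - eta) * self.beta_sq + eta * error ** 2
        if self.n == 1:
            self.lam = alpha ** 2
        else:
            self.lam = (1 - alpha) ** 2 * self.lam + alpha ** 2
        self.v_bar = (1 - alpha) * self.v_bar + alpha * v_hat

    def sample_variance(self):
        """Equation (24): variance of the observations."""
        return (self.beta_sq - self.beta ** 2) / (1 + self.lam)

    def estimate_variance(self):
        """Equation (25): variance of the smoothed estimate v_bar."""
        return self.lam * self.sample_variance()

# Hypothetical observations, smoothed with 1/n stepsizes for both alpha and eta.
stats = LevelStats()
for n, obs in enumerate([8.0, 12.0, 10.0], start=1):
    stats.update(obs, alpha=1.0 / n, eta=1.0 / n)
```

With a $1/n$ stepsize the smoothed estimate is the sample mean and $\lambda$ reduces to $1/n$, which matches the familiar result that the variance of a mean of $n$ observations is the observation variance divided by $n$.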



[Figure 3 plot: average weight (vertical axis, 0 to 0.7) versus iterations (horizontal axis, 0 to 450), with one curve for each aggregation level 0, 1, 2, and 3.]

Figure 3 Average Weight Put on Each Level of Aggregation by Iteration

The weight to be used at each level of aggregation is given by

$$w^{(g,n)}_{ta} \propto \Big( (\bar{\sigma}^2_{ta})^{(g,n)} + \big(\bar{\mu}^{(g,n)}_{ta}\big)^2 \Big)^{-1}, \qquad (26)$$

where the weights are normalized so they sum to one. This formula is easy to compute even for very large-scale applications such as this. All the statistics have to be computed for each attribute $a$, for all levels of aggregation, that is actually visited. From this, we can compute an estimate of the value of any attribute regardless of whether we visited it or not. Figure 3 shows the average weight put on each level of aggregation from one run of the model. As is apparent from the figure, higher weights are put on the more aggregate estimates, with the weight shifting to the more disaggregate estimates as the algorithm progresses. It is very important that the weights be adjusted as the algorithm progresses; using the final set of weights at the beginning produces very poor results.
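Equations (21) and (26) amount to an inverse mean-squared-error weighting, sketched below. The variances, biases, and estimates are made-up numbers for a single attribute at the four aggregation levels, and the function names are assumptions.

```python
def wimse_weights(variances, biases):
    """Equation (26): weights proportional to 1 / (variance + bias**2), summing to one.

    Assumes variance + bias**2 is strictly positive at every level.
    """
    raw = [1.0 / (var + bias ** 2) for var, bias in zip(variances, biases)]
    total = sum(raw)
    return [r / total for r in raw]

def combined_estimate(estimates, weights):
    """Equation (21): weighted combination of the level estimates."""
    return sum(w * v for w, v in zip(weights, estimates))

# Hypothetical statistics for levels g = 0, 1, 2, 3.
variances = [4.0, 1.0, 0.5, 0.25]     # noisier at the disaggregate levels
biases = [0.0, 1.0, 1.5, 2.0]         # aggregation bias is zero at level 0 by definition
estimates = [120.0, 110.0, 105.0, 100.0]

weights = wimse_weights(variances, biases)
v_bar = combined_estimate(estimates, weights)
```

In this example level 1 receives the largest weight because it has the smallest sum of variance and squared bias. As more observations arrive and the level-0 variance falls, the same formula shifts weight toward the disaggregate estimate.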

3.4. Stepsizes
Stepsizes are often treated as the soft science of approximate dynamic programming, with people using simple formulas such as a constant (0.1 or 0.05 is typical) or a declining stepsize rule such as $a/(a+n)$ for some $a$. A popular rule is McClain’s formula, given by Equation (23), which provides $1/n$ behavior initially and quickly converges to the constant $\bar{\alpha}$ (we used 0.10, which is typical).

We genuinely struggled with stepsizes for this problem. If the stepsize was too small, the rate of convergence was much too slow. If the stepsize was too large, the performance was unstable and the variance of the estimates $\bar{v}_{ta}$ was too large (later, we show that we use $\bar{v}_{ta}$ in our policy studies).

As a by-product of this research, we developed a new stepsize formula that significantly improved the performance of the algorithm (faster initial convergence, with better stability in the limit). The stepsize rule is developed in George and Powell (2006), where it was named the optimal stepsize algorithm (OSA) and is given by

$$\alpha_n = 1 - \frac{(\bar{\sigma}^2)^n}{(1 + \lambda^{n-1})(\bar{\sigma}^2)^n + (\bar{\beta}^n)^2}, \qquad (27)$$

where $(\bar{\sigma}^2)^n$ is computed using Equation (25) and $\bar{\beta}^n$ is given by Equation (22) (we have dropped the indexing by aggregation level $g$ and attribute $a$ for simplicity). The stepsize rule balances the estimate of the noise $(\bar{\sigma}^2)^n$ and the estimate of the bias $\bar{\beta}^n$ that is attributable to the transient nature of the data. If the data are found to be relatively stationary (low bias), then we want a smaller stepsize; as the estimate of the noise variance decreases, we want a larger stepsize.
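A direct transcription of Equation (27) shows the balance between noise and bias. The function name, the argument names, and the fallback when the denominator is zero are assumptions made for this sketch.

```python
def osa_stepsize(sigma2, beta, lam_prev):
    """Equation (27): stepsize from the noise estimate, bias estimate, and lambda^{n-1}."""
    denom = (1.0 + lam_prev) * sigma2 + beta ** 2
    if denom == 0.0:
        return 1.0        # assumption: with no noise and no bias, take the full step
    return 1.0 - sigma2 / denom

stationary = osa_stepsize(sigma2=1.0, beta=0.0, lam_prev=1.0)   # noise only
transient = osa_stepsize(sigma2=1.0, beta=3.0, lam_prev=1.0)    # large bias
low_noise = osa_stepsize(sigma2=0.1, beta=1.0, lam_prev=1.0)
high_noise = osa_stepsize(sigma2=1.0, beta=1.0, lam_prev=1.0)
```

With zero bias the rule reduces to $\lambda^{n-1}/(1+\lambda^{n-1})$, which is the averaging behavior one would want for stationary data; a large bias pushes the stepsize toward one so that the estimate can track a moving signal.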

3.5. ADP Using a Double-Pass Algorithm
The steps in Figure 2 describe the simplest implementation of an approximate dynamic programming algorithm that steps forward in time, updating value functions as we proceed. This is also known as a TD(0) algorithm (Bertsekas and Tsitsiklis 1996). Although easy to implement, this algorithm can suffer from slow convergence because $\hat{v}^n_{ta}$ depends on $\bar{v}^{n-1}_{ta}$, which is typically initialized to zero and slowly rises, producing a downward bias in all the value function estimates. This does not necessarily produce poor decisions, but it does mean that $\bar{v}^n_{ta}$ underestimates the value of a driver with attribute $a$ at time $t$.

A strategy for overcoming this slow convergence, which proved to be particularly valuable for this project, involves using a two-pass procedure (also known as TD(1)). In this procedure, we simulate decisions forward in time without updating the value functions. The derivative $\hat{v}^n_{ta}$ is then computed in a backward pass. In the forward pass implementation, $\hat{v}^n_{ta}$ depends on $\bar{v}^{n-1}_{ta}$. With the backward pass, $\hat{v}^n_{ta}$ depends on $\hat{v}^n_{t+1,a}$.

In classical discrete dynamic programs, implementing a backward pass (or backward traversal, as it is often referred to) is fairly straightforward (see Bertsekas and Tsitsiklis 1996; Sutton and Barto 1998). If we are in state $S^n_t$, we choose an action $x^n_t$ according to some policy, compute a contribution $C(S^n_t, x^n_t)$, then observe information $W_{t+1}(\omega^n)$, which leads us to state $S^n_{t+1}$. After following a path $(S^n_t, x^n_t, S^n_{t+1}, x^n_{t+1}, \ldots)$, we can compute $\hat{v}^n_t = C(S^n_t, x^n_t) + \hat{v}^n_{t+1}$ recursively by stepping backward through time.

This logic is completely intractable for the problem class that we are considering. Instead, we perform a numerical derivative for each driver, which means that after solving the original assignment problem (at time $t$), we loop over all the drivers (that is, all the attributes $a$ where $R_{ta} > 0$) and set $R_{ta} = 0$ and reoptimize. The process is illustrated in Figure 4, where 4(a) shows the initial assignment of four



drivers. Because the downstream value from assigning a driver to a load in the future depends on the driver-load combination, we have duplicated each load for each driver, using an ellipse to indicate which driver-load combinations represent the same load. If the driver with attribute $a_{t,1}$ is assigned to the second load, then this creates a driver in the future with attribute $a''_{t+1,12}$ and value $v(a''_{t+1,12})$.

[Figure 4 diagram, three panels: (a) Initial solution, (b) Without driver $a_1$, (c) Difference. Panels (a) and (b) show drivers $a_{t,1}$ through $a_{t,4}$ assigned to loads, with downstream values such as $v(a''_{t+1,12})$; panel (c) marks the $+1$ and $-1$ changes that make up $\Delta C_t$ and $\Delta R^x_t$.]

Figure 4 Illustration of Numerical Derivative
Note. (a) The base solution with four drivers, (b) the solution with driver $a_1$ dropped out, and (c) the difference in assignment costs and post-decision resource vector are shown.

In Figure 4(b), we show the solution without driver $a_1$. Because driver $a_2$ shifts up to cover load 2, we no longer have a driver in the future with attribute $a''_{t+1,12}$ but instead we have a driver with attribute $a''_{t+1,22}$. Figure 4(c) shows the difference, where we are interested in the change in the immediate contribution, and the change in the future availability of drivers. To represent these quantities, let $X_t(R_t)$ be the initial driver-load assignments and let $X_t(R_t - e_a)$ be the perturbed solution, where $e_a$ is a vector of 0s with a 1 in the element corresponding to $R_{ta}$. Now let

$$\Delta C_t(a) = C(S_t, X_t(R_t)) - C(S_t, X_t(R_t - e_a))$$

be the change in costs due to changes in flows over the driver to load assignment arcs as a result of the perturbation. Next, let $R^x_t(R_t)$ be the post-decision resource vector given $R_t$ and let

$$\Delta R^x_t(a) = R^x_t(R_t) - R^x_t(R_t - e_a)$$

be the change in the post-decision state vector due to the perturbation. Figure 4(c) indicates the change in flows that drive $\Delta C_t(a)$, and the vector $\Delta R^x_t(a)$, where $\Delta R^x_{ta'}(a) = 1$ if including a driver with attribute $a$ produces an additional driver of type $a'$, or $\Delta R^x_{ta'}(a) = -1$ if the change takes away a driver of type $a'$.

In the double-pass algorithm, we compute $\Delta C_t(a)$ (which is a scalar) and $\Delta R^x_t(a)$ (which is a vector of $+1$s and $-1$s) for each attribute $a$ (which we have chosen to represent). After we have completed the forward pass, we obtain $\hat{v}^n_{ta}$ in a backward pass using

$$\hat{v}^n_{ta} = \Delta C_t(a) + \sum_{a' \in \mathcal{A}} \Delta R^x_{ta'}(a)\, \hat{v}^n_{t+1,a'},$$

where we have made a slight notational simplification by assuming that $a^x_t = a_{t+1}$ (that is, there is no noise in the attribute transition function), which means that $R^x_t = R_{t+1}$.
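The backward recursion above can be sketched as follows, assuming the forward pass has already stored $\Delta C_t(a)$ and the nonzero entries of $\Delta R^x_t(a)$ for every represented attribute. The data layout, the attribute labels, and the choice to treat missing downstream values as zero are assumptions of this sketch.

```python
def backward_pass(delta_c, delta_r, horizon):
    """Compute v_hat[t][a] = delta_c[t][a] + sum over a' of delta_r[t][a][a'] * v_hat[t+1][a'].

    delta_c[t][a] is a scalar; delta_r[t][a] maps a future attribute a' to +1 or -1.
    Values beyond the horizon, or for attributes with no stored derivative,
    are taken as zero (an assumption made here for simplicity).
    """
    v_hat = {horizon + 1: {}}
    for t in range(horizon, -1, -1):          # step backward through time
        v_hat[t] = {}
        for a, dc in delta_c[t].items():
            downstream = sum(sign * v_hat[t + 1].get(a_next, 0.0)
                             for a_next, sign in delta_r[t][a].items())
            v_hat[t][a] = dc + downstream
    return v_hat

# Hypothetical two-period example: including driver "a1" at t = 0 adds a
# future driver of type "b1" and removes one of type "b2".
delta_c = {0: {"a1": 5.0}, 1: {"b1": 7.0, "b2": 3.0}}
delta_r = {0: {"a1": {"b1": +1, "b2": -1}}, 1: {"b1": {}, "b2": {}}}
v_hat = backward_pass(delta_c, delta_r, horizon=1)
```

Here the value of driver "a1" at time 0 is the immediate change of 5 plus the downstream effect of gaining a "b1" driver (worth 7) and losing a "b2" driver (worth 3), for a total of 9.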

3.6. Comparisons
This section has produced a number of algorithmic choices: Should we use the forward pass (Figure 2) or backward pass (§3.5)? Should we compute $\hat{v}^n$ using numerical derivatives or dual variables? And should we perform smoothing using the OSA stepsize (Equation (27)) or a deterministic formula such as McClain (Equation (23))?

Figure 5 compares all of these strategies. We can draw several conclusions from this figure. First, it is apparent that the value functions computed from the backward pass show much faster convergence in the early iterations than those computed from using a forward pass. This is a well-known property of dynamic programming when we start with initial value function approximations equal to zero. However, the difference between these approaches disappears after 50 iterations. We also have to consider that the backward pass is much harder to implement. The real value of the backward pass is that we appear to be obtaining good value functions after as few as 25 iterations (a result supported by other experiments reported below). For very large-scale applications such as this (where each iteration requires almost 10 minutes of CPU time on a 3 GHz Pentium processor), reducing the number of iterations needed from 50 to 25 is a significant benefit.



[Figure 5 plot: average value function (vertical axis, 0 to 5,000) versus iterations (horizontal axis, 0 to 100). Labels in the plot: OSA stepsize, McClain stepsize, Forward pass, Backward pass, Forward pass duals (OSA stepsize), Numerical derivatives.]

Figure 5 Average Value Function When We Use Forward and Backward Passes, Numerical Derivatives and Dual Variables, and the OSA Stepsize or the McClain Stepsize

The figure also compares value functions computed using the OSA stepsize versus the McClain stepsize. The OSA stepsize produces faster convergence (this is particularly noticeable when using a forward pass) as well as more stable estimates (this is primarily apparent when using gradients computed using a backward pass).

Finally, we also see that there is a significant difference between value functions computed using dual variables versus numerical derivatives. It is easy to verify that the numerical derivative is greater than or equal to the dual variable, but it is not at all obvious that the difference would be as large as that shown in Figure 5. Of course, this comes at a significant price computationally. Run times using numerical derivatives are 30%–40% greater than if we used dual variables. We have found, however, that although numerical derivatives produce much more accurate value functions (important in our study), they do not produce better dispatching decisions. If the interest is in a realistic simulation of the fleet (and not the value functions themselves), then we have found that dual variables work fine. In this paper, we wish to use the value functions to estimate the value of different types of drivers.

4. Model Calibration
Before the model could be used for policy analyses, the company insisted that it closely replicate a number of operating statistics including the average length of haul (the length of a load to which a driver is assigned), the average revenue per truck per day, equipment utilization (miles per day), and the percentage of drivers who were sent home on a weekend. These statistics had to fall between historical minimums and maximums for each of the three capacity types. Model calibration meant matching the performance of the collective decisions made by the company’s dispatchers (see Figure 6). Perhaps one of the surprising (and significant) outcomes of the research is that a properly calibrated optimization model was required to closely match the performance of an experienced group of dispatchers.

Average length of haul is particularly important because drivers are only paid while they are driving and longer loads mean less idle time. For this application, it was important to match the average length of haul for each of the three types of drivers (known as “capacity types”). Of the three capacity types, teams (drivers that work in pairs) prefer the longest loads because they pay the most. The company was not willing to consider the results of a simulation that produced an average length of haul that was significantly different (for each capacity type) from historical performance. This could have an impact on driver turnover, which was not captured in the objective function.

When we look at the historical pattern of loads for a particular driver class, we obtain a distribution such as that shown in Table 2. Thus, whereas this driver class may have an 800-mile average length of haul, this average will include a number of loads that are significantly longer or shorter. Using penalties to discourage assignments to loads that differ from the average would seriously distort the model.

Section 4.1 describes an algorithmic strategy to produce assignments that match historical patterns of behavior. Section 4.2 then describes how well the model matched the historical metrics, where we depend on both the contribution of the value functions as well as the pattern-matching logic described



in the next section. Section 4.3 compares the contribution of value functions against the pattern-matching logic.

Figure 6 The Schneider Dispatch Center in Green Bay, Wisconsin

4.1. Pattern Matching
The problem of matching historical averages for length of haul (LOH) by capacity type can be viewed as an example where the results of a model need to match exogenous patterns of behavior. Our presentation follows the work in Marar, Powell, and Kulkarni (2006) and Marar and Powell (2004). In this work, we assume that we are given a pattern vector $\rho^e$, where

$\rho^e = (\rho^e_{ad})_{a \in \mathcal{A},\, d \in \mathcal{D}}$,

$\rho^e_{ad}$ = The exogenous pattern, representing the percentage of time that resources with attribute $a$ are acted on by decisions of type $d$ based on historical data.

Table 2 Illustrative Length-of-Haul (LOH) Distribution for a Single Driver Type

LOH (miles)      Relative frequency (%)
0–390            8.3
390–689          33.9
690–1,089        36.6
1,090–1,589      15.6
1,590–           5.4

We refer to $\rho^e$ as an exogenous pattern because it describes desired behaviors rather than a specific cost for a decision. In most applications, the indices $a$ and $d$ for $\rho^e_{ad}$ are aggregations of the original attribute $a$ and decision $d$. For the purpose of matching the length of haul, $a$ consists only of the capacity type and $d$ represents a decision to assign a driver of type $a$ to a load whose length is within some range.

We next have to determine the degree to which the model is matching the exogenous pattern. Let $\rho_{ad}(x)$ be the average pattern flow from a solution $X(R)$ of the model corresponding to the attribute-decision pair $(a, d)$. Also, let $R_a$ be the total number of resources with attribute $a$ over the entire horizon. The goal is for the model pattern flow $\rho_{ad}(x)$ to closely match the desired exogenous pattern flows $\rho^e_{ad}$.

The deviation from the desired frequency is captured in the objective function using a penalty function. The actual term included in the objective function for each pattern is denoted as $H(\rho(x), \rho^e)$ where we used the square of the difference of



the two, given by

$$H(\rho(x), \rho^e) = \sum_a \sum_d R_a \big( \rho_{ad}(x) - \rho^e_{ad} \big)^2. \qquad (28)$$

The aim is to penalize the square of the deviations of the observed frequencies from the desired frequencies. In practice, a quadratic approximation of $H$ is used. The pattern matching term $H$ is multiplied by a weighting parameter $\theta$ and subtracted from the standard net revenue function. The objective function that incorporates the patterns is written as follows:

$$x_t(\theta) = \arg\max_{x_t \in \mathcal{X}_t} \Bigg[ \sum_{a \in \mathcal{A}} \sum_{d \in \mathcal{D}} c_{tad} x_{tad} - \theta H(\rho(x), \rho^e) \Bigg]. \qquad (29)$$

$\theta$ permits control over how much emphasis is put on the patterns relative to the remainder of the objective function. Setting $\theta$ to zero turns the patterns off.
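The penalty in Equation (28) and its role in Equation (29) can be sketched as below. The count-based way of forming $\rho_{ad}(x)$, the use of the total count as $R_a$, the two length-of-haul buckets, and the target shares are all assumptions for illustration; the production algorithm follows Marar and Powell (2004) and is not reproduced here.

```python
def pattern_flows(counts):
    """Turn counts[a][d] (assignments of type d for attribute a) into shares per attribute."""
    flows, totals = {}, {}
    for a, by_decision in counts.items():
        totals[a] = sum(by_decision.values())
        flows[a] = {d: (n / totals[a] if totals[a] else 0.0)
                    for d, n in by_decision.items()}
    return flows, totals

def pattern_penalty(flows, totals, target):
    """Equation (28): sum over a and d of R_a * (rho_ad(x) - rho_e_ad) ** 2."""
    penalty = 0.0
    for a, by_decision in target.items():
        for d, rho_e in by_decision.items():
            rho_x = flows.get(a, {}).get(d, 0.0)
            penalty += totals.get(a, 0) * (rho_x - rho_e) ** 2
    return penalty

def penalized_objective(net_revenue, penalty, theta):
    """Objective of Equation (29) for a given solution: revenue minus theta * H."""
    return net_revenue - theta * penalty

# Hypothetical model output and historical target for one capacity type.
counts = {"team": {"short": 50, "long": 50}}
target = {"team": {"short": 0.4, "long": 0.6}}

flows, totals = pattern_flows(counts)
H = pattern_penalty(flows, totals, target)
```

In this example the model assigns teams to long loads half the time against a target of 60%, giving $H = 100 \times (0.1^2 + 0.1^2) = 2$. Setting `theta` to zero leaves the net revenue unchanged, which corresponds to turning the patterns off.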

We use an algorithm proposed by Marar and Powell (2004) (and modified by Powell, Wu, and Whisman 2004) that incorporates this feature.

4.2. Comparison to History
We are finally ready to compare the model to historical measures. We have 4 types of statistics that are measured for each of the 3 capacity types, giving us 12 statistics altogether. The company derived what it considered to be acceptable ranges for each statistic. Figures 7(a) to 7(d) give the length of haul, revenue per driver, utilization (miles per driver per day), and the percentage of drivers who are sent home on a

weekend. The last statistic reflects the ability of the model to get drivers home on weekends, which was viewed as being important to the drivers.

[Figure 7 plots, four panels, each comparing the historical minimum, the simulation, and the historical maximum for capacity categories Type 1, Type 2, and Type 3: (a) LOH, (b) Revenue per driver, (c) Utilization, (d) % drivers home on weekends.]

Figure 7 Simulation Results Compared Against Historical Extremes for Various Patterns

All the results of the model closely matched historical averages. The units of the vertical axis have been eliminated due to the confidentiality of the data, but the graphs accurately show the relative error (the bottom of the vertical axis is zero in all the plots). The bands were developed by company management before the model was run. It is easy to see that three of the four sets of max/min bands are quite tight. We also note that although we used specific pattern logic to match the length of haul statistics, the other statistics were a natural output of the model, calibrated through the use of cost-based rules.

At this point, company management felt comfortable concluding that the model was well calibrated and could be used for policy studies. Although the model has many applications, in §5 we focus specifically on the ability of the model to evaluate the value of drivers by capacity type and domicile. This study required that the value functions do more than simply produce good driver assignment decisions; the value functions themselves had to accurately estimate the value of each driver type.

4.3. Value Function Approximations vs. Patterns
We have introduced two major algorithmic strategies for improving the performance of the model: value function approximations (VFAs), which produce the behavior of optimizing over time, and pattern



matching, which is specifically designed to help themodel match the length of haul for each driver class.These strategies introduce two questions: How dovalue function approximations and patterns each con-tribute to the ability of the model to match historicalperformance? And how do they individually affectthe quality of the solution as measured by the objec-tive function?

Figure 8 shows the average length of haul as a function of the number of iterations for four configurations: (i) with patterns and VFAs, (ii) with patterns and without VFAs, (iii) without patterns and with VFAs, and (iv) without patterns or VFAs. We show the results for two different driver classes, one per panel, because the behavior is somewhat different. In both panels, we show upper and lower bounds specified by management as the limit of what they consider acceptable (the middle of this range is considered the best). Both panels show that we obtain the worst results when we do not use VFAs or patterns, and we obtain the best results with patterns and VFAs. Of interest is the individual contribution of VFAs versus patterns. In Figure 8(a), the use of VFAs alone improves the ability of the model to match history, whereas in Figure 8(b) VFAs actually make the match worse. Even in Figure 8(b), VFAs and patterns together outperform either alone.

[Figure 8: two panels, (a) Driver type 1 and (b) Driver type 2, plotting length of haul in miles (80 to 130) against iterations (1 to 49), with UB and LB bands and four series: patterns only, both patterns and VFAs, VFAs only, and no patterns or VFAs.]

Figure 8 Length of Haul for Two Driver Classes With Patterns and VFAs, With Patterns and Without VFAs, Without Patterns and With VFAs, and Without Patterns or VFAs

Note. Upper and lower bounds (UB and LB, respectively) represent the acceptable range set by management.

[Figure 9: optimization objective function (0 to 50,000,000) against iterations (1 to 46), with four series: both patterns and VFAs, patterns only, VFAs only, and no patterns or VFAs.]

Figure 9 Objective Function Without Patterns and VFAs, With Patterns and Without VFAs, Without Patterns and With VFAs, and With Patterns and VFAs

We next examine the effect of VFAs and patterns on the objective function. We define the objective function as the total contribution earned by following the policy determined by using patterns and value functions. The contributions include the revenues from covering loads minus the cost of moving the truck and any penalties for activities such as arriving late to a service appointment or allowing a driver to be away from home for too long (the “soft costs”). Figure 9 shows the objective function for the same four combinations (with and without patterns, with and without value functions). The figure shows that the results using the value functions significantly outperform the results without the value functions. Including the patterns with the value function does not seem to change the objective function (although it obviously improves our ability to match historic performance measures). Interestingly, using patterns without the value functions produces a noticeable improvement over the results without the patterns (or value functions), suggesting that the patterns do, in fact, contribute to the overall objective function. However, the point of the patterns is to achieve goals that are not captured by the objective function, so this benefit appears to be incidental.
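The definition of the objective function above can be illustrated with a small sketch. This is only an illustration under stated assumptions: the `Assignment` record, its field names, and the two penalty categories are hypothetical and are not taken from the model's actual data structures.

```python
from dataclasses import dataclass


@dataclass
class Assignment:
    """One driver-to-load decision (hypothetical record, for illustration only)."""
    revenue: float                        # revenue from covering the load
    move_cost: float                      # cost of moving the truck
    late_penalty: float = 0.0             # soft cost: late arrival at an appointment
    away_from_home_penalty: float = 0.0   # soft cost: driver away from home too long


def contribution(a: Assignment) -> float:
    """Revenue minus the move cost and the soft-cost penalties."""
    return a.revenue - a.move_cost - a.late_penalty - a.away_from_home_penalty


def objective(assignments) -> float:
    """Total contribution earned over all decisions made by a policy."""
    return sum(contribution(a) for a in assignments)
```

Summing `contribution` over every decision made during a simulated horizon gives one point on a curve such as those in Figure 9.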

5. Fleet Mix Studies

All truckload motor carriers are continuously hiring drivers just to maintain their fleet size. It is not unusual for companies to experience over 100% turnover (that is, if the fleet has 1,000 drivers, they have to hire 1,000 drivers a year to maintain the fleet). Because companies are constantly advertising and processing applications, it is necessary to decide each week how many jobs to offer drivers based on their home domicile and which of the three capacity types they would belong to. We studied the ability of the model to help guide the driver hiring process.

We divided the study into two parts. In §5.1, we assessed our ability to estimate the marginal value of a driver type (defined by the driver domicile and capacity type) using the value function approximations. Then, we report in §5.2 on the results of simulations where we used the value functions to change the mix of drivers (while holding the fleet size constant).

5.1. Driver Valuations

For our project, the 100 domicile regions and 3 capacity types produced 300 driver types. If this were a traditional simulator, we could estimate the value of each of these driver types by starting from a single base run, then incrementing the number of drivers of a particular type and running the simulation again. There is a fair amount of noise inherent in the results of a single simulation run, so it might be reasonable to replicate this 10 times and take an average. With 300 driver types, this implies 3,000 runs of the simulation model.

We can avoid this by simply using the value functions. If we run $N$ iterations of our ADP algorithm, we might expect that the final value functions for time $t = 0$, given by $\bar{v}^N_0 = (\bar{v}^N_{0a})_{a \in \mathcal{A}}$, could be used to estimate the value of each driver type. The value functions are indexed by three attributes (location, domicile, and capacity type); however, we only need values indexed by domicile and capacity type. The value indexed by domicile and capacity type is nothing more than an aggregation on location and was estimated using the same methods we used to estimate the value functions at different levels of aggregation. For the remainder of this section, we use the attribute $a$ to represent driver domicile and capacity type only. In addition, we will let $\bar{v}_{0a} = \bar{v}^N_{0a}$ be our final estimate of the marginal value of a driver (at time zero) with attribute $a$.
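As a rough illustration of what "aggregation on location" means, the sketch below collapses values indexed by (location, domicile, capacity type) to values indexed by (domicile, capacity type). The aggregation rule used here, a mean weighted by observation counts, is a stand-in assumption; this section does not spell out the estimator, which it describes only as the same method used for the value functions at different levels of aggregation. The function name and dictionary layout are likewise hypothetical.

```python
from collections import defaultdict


def aggregate_over_location(values, counts):
    """Collapse values keyed by (location, domicile, capacity) to (domicile, capacity).

    values: {(location, domicile, capacity): estimated marginal value}
    counts: {(location, domicile, capacity): number of observations}
    Assumption: a count-weighted mean stands in for the paper's estimator.
    """
    weighted_sum = defaultdict(float)
    total_weight = defaultdict(float)
    for (location, domicile, capacity), value in values.items():
        weight = counts.get((location, domicile, capacity), 0)
        weighted_sum[(domicile, capacity)] += weight * value
        total_weight[(domicile, capacity)] += weight
    # Driver types that were never observed are left out rather than guessed.
    return {
        key: weighted_sum[key] / total_weight[key]
        for key in weighted_sum
        if total_weight[key] > 0
    }
```

With 100 domiciles and 3 capacity types, the result would hold at most 300 entries, one per driver type.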

Whereas it is certainly reasonable to expect $\bar{v}_{0a}$ to be the marginal value of a driver of type $a$, we needed to verify that this was, in fact, an accurate estimate. We ran experiments adding 10, 20, 30, 40, and 50 drivers for four different driver classes. These experiments convinced us that the model produced relatively linear behavior when we added up to 20 drivers.

We then estimated the marginal value of 20 different driver types (different domiciles and capacity types) by adding 20 drivers of each type in turn and averaging the marginal value over 10 repetitions of the experiment. In each case, we computed a 95% confidence interval for the slope (based on the estimated mean and standard deviation of both the base case and the results of the 10 iterations with 20 additional drivers). Figure 10 shows the confidence intervals for the slope estimated from adding 20 additional drivers and the point estimate from the value function for the 20 different driver types. For 18 driver types, the value function estimate fell within the confidence interval (with a 95% confidence interval, we would expect 19 of the driver types to fall within the confidence interval).

[Figure 10: marginal value of a driver (vertical axis, -500 to 3,000) by attribute vector (driver type, 1 to 20).]

Figure 10 Predicted Values Compared Against Observed Values from Actual Scenarios

Note. The columns represent the approximations of the marginal values for different driver types. The error bars denote a 95% confidence interval around the mean marginal value, computed from observed scenarios.
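The validation procedure can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the base case was also replicated several times, it combines the two sample variances into a standard error for the difference in means, and it uses the normal quantile 1.96 (with only 10 repetitions, a t quantile would give a somewhat wider interval). The function names are hypothetical.

```python
import math
import statistics


def marginal_value_ci(base_runs, plus_runs, added_drivers=20, z=1.96):
    """Return (slope, lower, upper) for the per-driver marginal value.

    base_runs: objective values from replications of the base case
    plus_runs: objective values from replications with extra drivers added
    Assumption: normal approximation with independent samples.
    """
    difference = statistics.mean(plus_runs) - statistics.mean(base_runs)
    standard_error = math.sqrt(
        statistics.variance(base_runs) / len(base_runs)
        + statistics.variance(plus_runs) / len(plus_runs)
    )
    slope = difference / added_drivers
    half_width = z * standard_error / added_drivers
    return slope, slope - half_width, slope + half_width


def within_interval(vfa_estimate, interval):
    """Check whether the value function estimate lies inside the interval."""
    _, lower, upper = interval
    return lower <= vfa_estimate <= upper
```

Applying `within_interval` to each of the 20 driver types and counting the successes gives the figure that the text compares against the expected 19 out of 20.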

5.2. Driver Remix Experiments

In this section, we attempt to optimize the number of drivers belonging to each class so that there is an increase in the objective function. The method that we adopt for this purpose is to redistribute the drivers between the various driver types such that there are more drivers of types with higher marginal values as compared with the ones with lower values.

To find the number of drivers to be added or removed from each class, we apply a stochastic gradient algorithm where we use a correction term to smooth the original number of drivers of each class. The correction term is a function of the difference in the marginal value from the mean marginal value of all the driver classes. We define the following:

$\bar{v}^n_a$ = The marginal value of a driver with attribute $a$ at iteration $n$.

$R^n_a$ = The number of drivers with attribute $a$ at iteration $n$.

$\bar{v}^n_*$ = $\bar{v}^n_a$ averaged over all attribute vectors $a$.

The algorithm for computing the new number of drivers of class $a$ consists of the following step:

$$R^{n+1}_a = \max\{0,\; R^n_a + \beta(\bar{v}^n_a - \bar{v}^n_*)\} \qquad (30)$$

where $\beta$ is a scaling factor that we set, after some experimentation, to 0.10. After the update, we then rescale $R^{n+1}$ so that $\sum_a R^{n+1}_a = \sum_a R^n_a$.
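One step of the update in Equation (30), followed by the rescaling that holds the fleet size constant, might look like the sketch below. The function name, the dictionary layout, and the name `beta` for the scaling factor are illustrative assumptions. The sketch also leaves the driver counts fractional, because this section does not say how they are rounded to whole drivers.

```python
def remix_step(drivers, marginal_values, beta=0.10):
    """Apply the Equation (30) update, then rescale to the original fleet size.

    drivers:         {attribute: number of drivers at iteration n}
    marginal_values: {attribute: marginal value of a driver at iteration n}
    beta:            scaling factor (0.10 in the text)
    """
    # Mean marginal value, averaged over all attribute vectors.
    mean_value = sum(marginal_values[a] for a in drivers) / len(drivers)

    # Shift drivers toward types with above-average value; never go below zero.
    updated = {
        a: max(0.0, drivers[a] + beta * (marginal_values[a] - mean_value))
        for a in drivers
    }

    # Rescale so that the total number of drivers is unchanged.
    total_before = sum(drivers.values())
    total_after = sum(updated.values())
    if total_after == 0:
        return dict(drivers)  # degenerate case: keep the current mix
    scale = total_before / total_after
    return {a: count * scale for a, count in updated.items()}
```

The rescaling matters mainly when the `max` with zero clips some class: without it, the clipped drivers would be added to the fleet instead of being moved within it.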

[Figure 11: two panels comparing the original driver mix with the remix against the number of iterations: (a) Objective function (millions) and (b) Percentage of drivers not getting home.]

Figure 11 Result of Driver Remix Experiments

Note. (a) The change in the objective function, and (b) change in the percentage of drivers not getting home are shown.

In Figure 11, we show the effect of shifting to a new mix of drivers. Figure 11(a) shows the improvement in the objective function when we used value functions to adjust the mix of drivers. We did not adjust the driver mix until iteration 400 so that the value functions had a chance to stabilize. Figure 11(b) shows the percentage of drivers who did not get home within the simulation. This figure shows a significant improvement in our ability to get drivers home when we shift the fleet based on the value function approximations.

6. Conclusions

This paper has demonstrated that approximate dynamic programming allows us to produce an accurate simulation of a large-scale fleet that (a) captured real-world operations at a very high level of detail, (b) produced operating statistics that closely matched historical performance, and (c) provided accurate estimates of the marginal value of 300 different driver types from a single simulation. The technology of approximate dynamic programming allows us to capture all the relevant features of drivers and loads to produce a very realistic simulation, including decisions that balance immediate contributions against downstream impacts. The logic is able to handle different types of uncertainty including random customer demands and travel times. Value function approximations produced not only more realistic behaviors (measured in terms of our ability to match historical performance) but also the marginal value of different types of drivers from a single run of the model.

This project motivated other important results. Although the value functions were approximated in terms of only four driver attributes (location, driver type, domicile, and time), this still produced 600,000 parameters to be estimated, creating a significant statistical problem. A new approach for estimating parameters using a weighted average of estimates at different levels of aggregation was developed specifically for this project. This method was shown to produce better, more stable estimates. This project also motivated the development of a new stepsize formula that eliminated the need to constantly tune parameters in deterministic formulas. Finally, we used novel pattern-matching logic to produce behaviors (the average length of a load for different driver types) that matched historical performance.

The simulation has been adopted at Schneider National as a planning tool that, as of this writing, is used continually to perform studies of policies that affect the performance of the network. A partial list of benefits from studies that have been undertaken using the simulation follows:

• Getting drivers home: A major component for retaining drivers in a long-haul carrier is the ability to return them home in a predictable way. Schneider had developed a plan to make stronger commitments to drivers, but the simulation showed that the plan would have cost the company $30 million per year. Using the model, an alternative strategy was developed that provided 93% of the proposed self-scheduling flexibility for only $6 million per year.

• Quantifying the cost of hours-of-service rules: Using the model, Schneider has been able to quantify the cost of changes in the hours-of-service rules set by the Department of Transportation. With this information, we are able to effectively negotiate adjustments in customer billing rates and freight tendering/handling procedures, leading to margin improvements of 2% to 3%.

• Setting appointments: The model has been used to evaluate the value of new policies for setting appointments. Preliminary results suggest margin impacts from improved utilization are in the range of 4%–10%, and the number of late deliveries was reduced by half.

• Cross-border driver management: With recent changes in security and border policies, it is necessary to maintain a pool of drivers who are trained in these policies. Using the model, Schneider was able to reduce the number of drivers engaged in border crossing by 91% and restrict relays to three designated points. This has resulted in an initial avoidance of $3.8 million in training/identification/certification costs and ongoing annual cost avoidance of $2.3 million.

• Hiring drivers: The home location of long-haul truck drivers has a significant impact on network operating efficiency. Schneider is continually hiring drivers and can control the number of drivers hired in each region. Using the model, Schneider has been able to quantify the marginal contribution of changes in regional driver populations, leading to an estimated annual profit improvement of $5 million.

The next step with the model is to focus on the loads. The model currently does not distinguish among tendered loads (loads offered to the company that may be refused), committed loads (tendered loads that the carrier has made a commitment to move), and contracted loads (loads that are offered to the carrier under a standing contract). Our goal is to use the model to identify good policies for making commitments to loads as they are tendered, taking into consideration the state of the system. Once this policy is in place, the next goal would be to determine how to evaluate customer contracts as a foundation for determining contractual commitments. We are not aware of any existing technology that can evaluate loads in the presence of driver management issues.

This project has offered an important insight into the process of implementing optimization models for operational problems. The research community has traditionally focused on developing optimization models that produce the best possible solutions, presumably better than what can be achieved by a company. Our experience with this and other similar projects is that the first and most important goal is to produce a model that calibrates against history. Of particular importance was the ability of the model to handle a high level of detail, allowing the model to accurately represent hours-of-service rules, detailed service commitments, and complex rules governing driver relays and foreign drivers. Only after the model proved to be realistic did the carrier begin to believe the results. Perhaps the most remarkable conclusion was that an optimization model that used optimal solutions at a point in time and near-optimal solutions over time accurately reproduced (at an aggregate level) the performance of a well-run company.

ReferencesBertsekas, D., J. Tsitsiklis. 1996. Neuro-Dynamic Programming.

Athena Scientific, Belmont, MA.

Caliskan, C., R. W. Hall. 2003. A dynamic empty equipment andcrew allocation model for long-haul networks. TransportationRes. Part A 5 405–418.

Chiu, Y., H. S. Mahmassani. 2002. Hybrid real-time dynamic trafficassignment approach for robust network performance. Trans-portation Res. Record 1783 89–97.

Crainic, T., J. Ferland, J.-M. Rousseau. 1984. A tactical planningmodel for rail freight transportation. Transportation Sci. 18165–184.

Desaulniers, G., J. Desrosiers, M. Gamache, F. Soumis. 1998. Crewscheduling in air transportation. T. G. Crainic, G. Laporte, eds.Fleet Management and Logistics. Kluwer Academic Publishers,Norwell, MA, 169–185.

Desrosiers, J., M. Solomon, F. Soumis. 1995. Time constrainedrouting and scheduling. C. Monma, T. Magnanti, M. Ball,eds. Handbook in Operations Research and Management Science,Volume on Networks. North Holland, Amsterdam, 35–139.

Gendreau, M., J. Y. Potvin. 1998. Dynamic vehicle routing anddispatching. T. Crainic, G. Laporte, eds. Fleet Managementand Logistics. Kluwer Academic Publishers, Norwell, MA,115–126.

Gendreau, M., F. Guertin, J. Potvin, E. Taillard. 1999. Parallel tabusearch for real-time vehicle routing and dispatching. Trans-portation Sci. 33 381–390.

George, A. 2005. Optimal learning strategies for multi-attributeresource allocation problems. Ph.D. thesis, Princeton Univer-sity, Princeton, NJ.

George, A., W. B. Powell. 2006. Adaptive stepsizes for recursiveestimation with applications in approximate dynamic pro-gramming. Machine Learn. 65 167–198.

George, A., W. B. Powell, S. Kulkarni. 2005. Value func-tion approximation using hierarchical aggregation for multi-attribute resource management. Technical report, Departmentof Operations Research and Financial Engineering, PrincetonUniversity, Princeton, NJ.

Ichoua, S., M. Gendreau, J.-Y. Potvin. 2006. Exploiting knowledgeabout future demands for real-time vehicle dispatching. Trans-portation Sci. 40 211–225.

Kleywegt, A., V. S. Nori, M. W. P. Savelsbergh. 2004. Dynamic pro-gramming approximations for a stochastic inventory routingproblem. Transporation Sci. 38 42–70.

Larsen, A., O. B. G. Madsen, M. M. Solomon. 2002. Partiallydynamic vehicle routing—Models and algorithms. J. Oper. Res.Soc. 53 637–646.

Marar, A. 2002. Information representation in large-scale resourceallocation problems: Theory, algorithms and applications.Ph.D. thesis, Princeton University, Princeton, NJ.

Marar, A., W. B. Powell. 2004. Using static flow patterns intime-staged resource allocation problems. Technical report,Department of Operations Research and Financial Engineering,Princeton University, Princeton, NJ.

Marar, A., W. B. Powell, S. Kulkarni. 2006. Capturing expert knowl-edge in resource allocation problems through low-dimensionalpatterns. IIE Trans. 38 159–172.

Powell, W. B. 1988. A comparative review of alternative algo-rithms for the dynamic vehicle allocation problem. B. Golden,A. Assad, eds. Vehicle Routing6 Methods and Studies. NorthHolland, Amsterdam, 249–292.

Powell, W. B. 2007. Approximate Dynamic Programming6 Solving theCurses of Dimensionality. John Wiley & Sons, New York.

Powell, W. B., P. Jaillet, A. Odoni. 1995. Stochastic and dynamic net-works and routing. C. Monma, T. Magnanti, M. Ball, eds. Hand-book in Operations Research and Management Science, Volume onNetworks. North Holland, Amsterdam, 141–295.

Powell, W. B., J. A. Shapiro, H. P. Simão. 2001. A representationalparadigm for dynamic resource transformation problems.

Copyright:

INF

OR

MS

hold

sco

pyrig

htto

this

Articlesin

Adv

ance

vers

ion,

whi

chis

mad

eav

aila

ble

toin

stitu

tiona

lsub

scrib

ers.

The

file

may

notb

epo

sted

onan

yot

her

web

site

,inc

ludi

ngth

eau

thor

’ssi

te.

Ple

ase

send

any

ques

tions

rega

rdin

gth

ispo

licy

tope

rmis

sion

s@in

form

s.or

g.


R. F. C. Coullard, J. H. Owens, eds. Annals of OperationsResearch. J. C. Baltzer AG, Basel, Switzerland, 231–279.

Powell, W. B., T. T. Wu, A. Whisman. 2004. Using low dimensionalpatterns in optimizing simulators: An illustration for the airliftmobility problem. Math. Comput. Model. 29 657–2004.

Psaraftis, H. 1995. Dynamic vehicle routing: Status and prospects.Ann. Oper. Res. 61 143–164.

Regan, A., H. S. Mahmassani, P. Jaillet. 1998. Evaluation of dynamicfleet management systems—Simulation framework. Transporta-tion Res. Record 1648 176–184.

Secomandi, N. 2000. Comparing neuro-dynamic programmingalgorithms for the vehicle routing problem with stochasticdemands. Comput. Oper. Res. 27 1201–1225.

Secomandi, N. 2001. A rollout policy for the vehicle routing prob-lem with stochastic demands. Oper. Res. 49 796–802.

Spivey, M. J. 2001. The dynamic assignment problem. Ph.D. thesis,Princeton University, Princeton, NJ.

Spivey, M., W. B. Powell. 2004. The dynamic assignment problem.Transportation Sci. 38 399–419.

Sutton, R., A. Barto. 1998. Reinforcement Learning. The MIT Press,Cambridge, MA.

Tapiero, C., M. Soliman. 1972. Multicommodities transportationschedules over time. Networks 2 311–327.

Taylor, G., T. S. Meinert, R. C. Killian, G. L. Whicker. 1999. Develop-ment and analysis of alternative dispatching methods in truck-load trucking. Transportation Res. Part E 35 191–205.

Tjokroamidjojo, E., G. T. Kutanoglu. 2001. Quantifying the valueof advance load information in truckload trucking. Technicalreport, University of Arkansas, Fayetteville.

Topaloglu, H., W. B. Powell. 2006. Dynamic programming approx-imations for stochastic, time-staged integer multicommodityflow problems. INFORMS J. Comput. 18 31–42.

White, W. 1972. Dynamic transshipment networks: An algorithmand its application to the distribution of empty containers.Networks 2 211–236.

Yang, J., P. Jaillet, H. Mahmassani. 2004. Real-time multivehicletruckload pick-up and delivery problems. Transportation Sci. 38135–148.
