
Ann Oper Res
DOI 10.1007/s10479-014-1647-x

Decentralized optimization of last-mile delivery services with non-cooperative bounded rational customers

Yezekael Hayel · Dominique Quadri · Tania Jiménez · Luce Brotcorne

© Springer Science+Business Media New York 2014

Abstract The goal of this paper is to introduce bounded rational behaviors in a competitive queuing system. Furthermore, we propose a realistic queuing model for two last-mile delivery services in which consumers are in competition. This work is derived from a real-world e-commerce application. We study the problem using a game theoretical point of view: the e-consumers interact through the last-mile delivery service system, creating congestion for each other. Specifically, we focus our analysis on several equilibrium concepts from congestion/routing games: Wardrop and Logit equilibria. The difference between these equilibrium concepts lies in the rationality level of the players in the game. We are able to prove the existence and uniqueness of both equilibria. We compare them through a new metric called the Price of Rationality and we also compare each one to the social optimum solution through the Price of Anarchy. Some numerical results are presented in order to illustrate the theoretical results obtained.

Keywords Queueing systems · Game theory · Rationality

Y. Hayel (B) · T. Jimenez
LIA/CERI, Avignon, France
e-mail: [email protected]

T. Jimenez
e-mail: [email protected]

D. Quadri
LRI, Paris, France
e-mail: [email protected]

L. Brotcorne
INRIA, Lille, France
e-mail: [email protected]


1 Introduction

Nowadays, commerce on the Internet, also called e-commerce, has become more and more attractive and is now widely used. According to JP Morgan’s annual Internet Investment guide, the annual growth rate of e-commerce sales is about 20 %, to reach $963 billion in 2013. This calls for important new logistic centers as well as tools for managing all the purchases made on the web.

A key issue for internet sellers is the delivery service design (Agatz et al. 2008). Indeed, the choice of delivery service sharply influences customer satisfaction (Boyer and Hult 2005). Two main services are defined: home delivery and customer pickup (Rabinovich and Bailey 2004). The home delivery service consists in delivering the parcels directly to the customer’s home [we call this service Delivery at Home (DaH)], whereas customer pickup consists in going to pick up the parcels at a site (also called Relay Station) close to the consumer’s home or work [we call this service Relay Station Service (RSS)]. In this case the last mile is performed by the consumer. The RSS choice is constrained by the capacity, in terms of number of packets, of the parcel outlet: when parcel outlets are full they must refuse packages. This last-mile delivery option is usually free, but its delivery time is longer than that of the DaH option. The DaH choice can be more convenient for most consumers since the package is delivered in less time, but it is not free.

In this paper we consider a last-mile delivery problem in which consumers can be delivered using DaH or RSS. We model the tradeoff between these two last-mile delivery services with two theoretical frameworks:

The first one is queuing models (Cooper 1981). On the one hand, we model the RSS by a limited-capacity queue, known as the Erlang-B model, which is well adapted to this situation. On the other hand, the delivery at home service (DaH) is modeled through an M/D/1 queue with a simple First-In-First-Out discipline and a constant service rate. Indeed, the DaH service is built to deliver a given number of packages per unit of time.

The second framework used in this paper is derived from Game Theory. Each consumer chooses his last-mile delivery service by optimizing his own objective function. The decision of each consumer indirectly impacts the decisions of the others, as the average cost depends on the demand. We thus consider a non-cooperative framework with a very large number of players. We then use the concept of Wardrop equilibrium (Wardrop 1952), which is better suited to routing games than standard equilibrium concepts such as that of Nash (Fudenberg and Tirole 1991). The main difference with a standard non-cooperative game with a finite number of players is the impact of an individual on the other players: in the Wardrop context, a single individual has a negligible impact on the other players. More specifically, only a deviation of a large number of players influences the objective functions of the others. The Wardrop equilibrium was originally proposed for transportation networks (see the book on traffic assignment problems, Patriksson 1994), where the players are vehicles that interact through their travel paths (see Dafermos and Sparrow 1969). Wardrop’s main assumption is that each player (vehicle) is a decision maker and minimizes his travel cost.

The Wardrop equilibrium is based on the following two principles: (1) “The journey times in all routes actually used are equal and less than those which would be experienced by a single vehicle on any unused route.” and (2) “At equilibrium the average journey time is a minimum.” (see Patriksson 1994)

The first one implies that each user non-cooperatively seeks to minimize his cost of transportation. The second principle implies that no user has any benefit to individually deviate from the equilibrium.


A user optimal situation is an equilibrium situation in the sense that no traveler can reduce his travel cost by unilaterally choosing another route. A user optimal situation is thus characterized by the same average travel cost on all routes actually used. In this situation, however, the total travel time is generally worse than when a centralized entity controls the traffic. The latter leads to the notion of system optimal situation, in which society, by imposing route choices or by charging tolls, guides users towards an optimal utilization of the traffic network, i.e. one that minimizes the total travel time.

In the first part of our study, we show the existence and uniqueness of a Wardrop equilibrium for our non-cooperative delivery game.

The main assumption in non-cooperative game models, including Wardrop, is that players are totally rational. This means that they react to the actions of the others by applying their best response, i.e. by choosing the action which optimizes their own objective function. This assumption is very strong and not realistic in several contexts. For example, it does not take into account errors or non-rationality of the players. This rationality characteristic determines how consumers make their decisions based on their information on the system. The main information observed by players is the expected utility, defined as the usefulness of the consequences of their actions. One main source of non-rationality which contradicts the assumption is errors in the expected utility.

There exists a concept of equilibrium that takes into account errors in the users’ expected utility: the quantal response equilibrium (QRE) (Palfrey and McKelvey 1995). With this concept, we consider random errors and we assume that the players choose their action based on this biased expected utility; we then say that the players are bounded rational (Gigerenzer and Selten 2002). The most famous QRE is the Logit equilibrium, which is obtained when the errors follow a Gumbel distribution (Anderson et al. 2002).

In a second part of this paper, we assume that the consumers may experience errors in their expected utility, which can be due to external factors, and we then study the Logit equilibrium. We also show the existence and uniqueness of this Logit equilibrium.

We then theoretically compare both equilibria. In addition, in routing games it is interesting to use a metric that shows the gap in performance of the system relative to the equilibrium employed. Usually, the price-of-anarchy is used. Nevertheless, as discussed later in this paper, this metric is not realistic in our context. We then propose a better adapted metric, the price-of-rationality, which has never been applied to routing games, to the best of our knowledge.

The contributions of this paper are the following:

– A simple model based on queuing theory and game theory that introduces the concept of bounded rational behavior in strategic queuing systems.

– A theoretical study of Wardrop equilibrium and Logit equilibrium: proof of both existence and uniqueness.

– A comparison of the two equilibria above, in which the Wardrop equilibrium serves as a benchmark.

– The use of the price of rationality to measure the gap in the performance of the system when using the Wardrop equilibrium or the Logit equilibrium.

– A numerical illustration.

The paper is organized as follows: In Sect. 2 we describe the problem and the mathematical frameworks we used to model the last-mile delivery service network. Then, the standard concept of Wardrop equilibrium is studied as a benchmark in Sect. 3. Thereafter, we introduce bounded rational behaviors of the users through the Logit equilibrium model in Sect. 4. We compare the equilibria by introducing the notion of Price of Rationality (PoR) in Sect. 5. We show in Sect. 6 some numerical illustrations and finally we conclude and give some perspectives of our work in Sect. 7.

2 Statement of problem

We consider a last-mile delivery service network problem. The consumer decides how to be served between two last-mile delivery service systems. The first option is to let the transportation company deposit the packet at a parcel outlet that serves as a relay. The end user can then decide when he wants to collect the packet himself. We call this option RSS hereafter. The second option is to ask the transportation company to deliver the packet directly to the home. This second solution is more efficient in terms of delay but more expensive. We use the name “delivery at home” (DaH) when referring to this service.

We model the system as follows. The customers’ demand per time unit (day or week) follows a Poisson process with rate λ. The relay station is modeled through an M/M/K/K queue, where K is the capacity of the relay, i.e. the number of packets that can be stored while waiting to be picked up by consumers. We assume that each packet occupies one storage unit (before being picked up by the consumer) for a random duration which follows an exponential distribution with parameter μ. All these durations are independent and identically distributed. We do not consider the time it takes to deliver the packet to the relay station: we consider that what matters most to a customer who chooses this option is to be delivered at the chosen relay (which is usually close to his/her house or office). If a consumer decides that his/her packet has to be delivered directly at home, he/she has to pay a fixed charge q. Moreover, the transportation company can handle at most 1/D packets per unit of time. In other words, D is the time the transportation company needs to deliver one packet. We consider that this time is constant, as it is related to the delivery capacity of the vehicles used by the transportation company. We could consider that this time also depends on exogenous random conditions (traffic density, drivers, etc.), in which case we would use an M/G/1 queue. In order to keep the analysis as simple and clear as possible, we keep the M/D/1 model. Indeed, the main concepts and results of this paper concern the introduction of bounded rational behaviors in strategic queuing systems. In particular, for our application to last-mile delivery services, we are interested in understanding the impact of the delivery capacity of the transportation company on the performance of the system. That is why a deterministic service time is well adapted to our model. Thus, this second option is modeled using an M/D/1 queue with a First-In-First-Out (FIFO) service discipline.

We study our non-cooperative game by considering that the decision process of each user is stochastic, meaning that we assume mixed strategies (Fudenberg and Tirole 1991) played by each user. In this context, we look for a mixed equilibrium. In fact, each user is faced with two possible actions: RSS and DaH. We denote by p the probability that each end user decides to use the RSS for last-mile delivery service. Note that a mixed equilibrium $p^*$, meaning that each individual chooses action RSS with probability $p^*$, is exactly equivalent to a system in which a proportion $p^*$ of individuals uses action RSS and all the others DaH. A mixed equilibrium thus has a realistic meaning because it corresponds to a deterministic decision process.

Then, considering this mixed strategy with probability p, the arrivals at the RSS (resp. the DaH) follow a Poisson process with rate λp [resp. λ(1 − p)]. If the end user decides to use the action RSS, the cost he/she has to pay is C if the packet is rejected because the queue is full; otherwise the cost is zero. This cost may come from a situation in which the packet is rerouted to another relay station far away from the location of the customer. In this case, the customer may incur travel expenses or a frustration cost, and we measure this by the blocking cost C. Then the cost, denoted by $C_r(p)$, for a user that chooses action RSS is a discrete random variable in the set {0, C}. We have:

\[
C_r(p) =
\begin{cases}
C & \text{if the packet is rejected,} \\
0 & \text{otherwise.}
\end{cases}
\]

As we model the relay station as an Erlang-B model, a packet is rejected with probability Π(λ, p, K, μ). Then the average cost $\bar{C}_r(p)$ incurred by an end user that chooses action RSS is:

\[
\bar{C}_r(p) = \Pi(\lambda, p, K, \mu)\, C,
\]

where

\[
\Pi(\lambda, p, K, \mu) = \frac{(\rho p)^K / K!}{\sum_{i=0}^{K} (\rho p)^i / i!}
\]

is the blocking probability, also known as the Erlang-B formula, and $\rho = \lambda/\mu$. Then, we have:

\[
\bar{C}_r(p) = C\, \frac{\left(\frac{\lambda p}{\mu}\right)^K / K!}{\sum_{i=0}^{K} \left(\frac{\lambda p}{\mu}\right)^i / i!}.
\]
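The average RSS cost above can be evaluated numerically. The following is a minimal Python sketch, not taken from the paper: it relies on the standard Erlang-B recursion B(a, 0) = 1, B(a, k) = a B(a, k − 1) / (k + a B(a, k − 1)), which gives the same value as the closed form while avoiding large factorials. The function names `erlang_b` and `rss_cost` and the numerical values are illustrative assumptions.

```python
def erlang_b(load: float, capacity: int) -> float:
    """Erlang-B blocking probability for offered load `load` and `capacity` slots.

    Uses the recursion B(a, 0) = 1, B(a, k) = a*B(a, k-1) / (k + a*B(a, k-1)),
    which equals (a^K / K!) / sum_{i=0}^{K} a^i / i! without computing factorials.
    """
    if load < 0 or capacity < 0:
        raise ValueError("load and capacity must be non-negative")
    blocking = 1.0
    for k in range(1, capacity + 1):
        blocking = load * blocking / (k + load * blocking)
    return blocking


def rss_cost(p: float, lam: float, mu: float, K: int, C: float) -> float:
    """Average cost of the RSS action: C times the blocking probability at load lam*p/mu."""
    return C * erlang_b(lam * p / mu, K)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    print(rss_cost(p=0.5, lam=10.0, mu=1.0, K=5, C=20.0))
```

With K ≥ 1 the cost vanishes at p = 0 and increases with p, which is the monotonicity used later for the equilibrium analysis.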

If the end user decides to choose DaH, he/she has to pay a fixed charge q and incurs a cost which is proportional to the waiting time W(p) for the delivery of his/her packet. Then the cost $C_l(p)$ for choosing this action is given by:

\[
C_l(p) = q + A \times W(p),
\]

where A is the fixed charge per unit of waiting time. This second action is modeled through an M/D/1 queue so that, when the queue is stable [when λ(1 − p)D < 1], the average waiting time $\bar{W}(p)$ is:

\[
\bar{W}(p) = D + \frac{\lambda(1 - p)D^2}{2\,(1 - \lambda(1 - p)D)}.
\]

Finally, this yields the following average cost for the action DaH:

\[
\bar{C}_l(p) = q + A \times \bar{W}(p) = q + A \times \left( D + \frac{\lambda(1 - p)D^2}{2\,(1 - \lambda(1 - p)D)} \right).
\]
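As a companion to the formula above, here is a small Python sketch of the DaH cost. It is only an illustration under the stability assumption λ(1 − p)D < 1; the function names `md1_sojourn_time` and `dah_cost` and the numerical values are hypothetical.

```python
def md1_sojourn_time(arrival_rate: float, D: float) -> float:
    """Mean time in an M/D/1 queue (waiting plus service) with service time D."""
    utilisation = arrival_rate * D
    if not 0.0 <= utilisation < 1.0:
        raise ValueError("the queue is stable only if arrival_rate * D < 1")
    return D + arrival_rate * D ** 2 / (2.0 * (1.0 - utilisation))


def dah_cost(p: float, lam: float, D: float, q: float, A: float) -> float:
    """Average cost of the DaH action when a proportion p of users chooses RSS."""
    return q + A * md1_sojourn_time(lam * (1.0 - p), D)


if __name__ == "__main__":
    # Hypothetical values: lam * D = 0.5, so the queue is stable for every p.
    for p in (0.0, 0.5, 1.0):
        print(p, dah_cost(p, lam=10.0, D=0.05, q=1.0, A=1.0))
```

When everybody uses the relay (p = 1) the home delivery queue is empty and the cost reduces to q + AD, the value that appears in the equilibrium condition of Theorem 1.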

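Then, the average cost for a consumer that chooses to be served at a relay station with probability p is given by:

\[
\begin{aligned}
\bar{C}(p) &= p\,\bar{C}_r(p) + (1 - p)\,\bar{C}_l(p) \\
&= p\,C\, \frac{\left(\frac{\lambda p}{\mu}\right)^K / K!}{\sum_{i=0}^{K} \left(\frac{\lambda p}{\mu}\right)^i / i!}
 + (1 - p) \left( q + A \left( D + \frac{\lambda(1 - p)D^2}{2\,(1 - \lambda(1 - p)D)} \right) \right).
\end{aligned}
\tag{1}
\]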

From a centralized point of view, it is possible to control the system such that only a proportion $p^*$ of individuals chooses the delivery to the relay station. The value $p^*$ is defined such that:

\[
p^* = \arg\min_{p \in [0,1]} \bar{C}(p). \tag{2}
\]


The solution $p^*$ is called the system optimal flow in traffic assignment problems (Patriksson 1994). This solution is generally different from the equilibrium, which is the user optimal flow. Under this system optimal flow, the system does not let each individual choose his delivery option.
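The system optimal flow of (2) can be approximated numerically. The sketch below uses a plain grid search over [0, 1], chosen here because the average cost is not claimed to be convex; the function names, the grid size and the parameter values are assumptions made for illustration only.

```python
def erlang_b(load: float, capacity: int) -> float:
    """Erlang-B blocking probability via the standard stable recursion."""
    blocking = 1.0
    for k in range(1, capacity + 1):
        blocking = load * blocking / (k + load * blocking)
    return blocking


def average_cost(p, lam, mu, K, C, q, A, D):
    """Average cost (1) when a proportion p of the population uses the relay."""
    rss = C * erlang_b(lam * p / mu, K)
    rate = lam * (1.0 - p)  # arrival rate of the home delivery queue
    dah = q + A * (D + rate * D ** 2 / (2.0 * (1.0 - rate * D)))
    return p * rss + (1.0 - p) * dah


def system_optimum(lam, mu, K, C, q, A, D, grid_size=10001):
    """Grid approximation of the minimiser of the average cost over [0, 1]."""
    if lam * D >= 1.0:
        raise ValueError("this sketch assumes a stable M/D/1 queue for every p")
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return min(grid, key=lambda p: average_cost(p, lam, mu, K, C, q, A, D))


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    params = dict(lam=10.0, mu=1.0, K=5, C=20.0, q=1.0, A=1.0, D=0.05)
    p_star = system_optimum(**params)
    print(p_star, average_cost(p_star, **params))
```

The accuracy is limited by the grid step; a finer grid or a local refinement around the returned point can be used if more digits are needed.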

3 A standard equilibrium concept in queuing systems: the Wardrop equilibrium

In a non-cooperative setting, we look for an equilibrium between the consumers. As the number of consumers can be very large, we consider the concept of Wardrop equilibrium (Dafermos and Sparrow 1969). The Wardrop equilibrium is a state-of-the-art concept in queuing systems. Consequently, we start by studying the Wardrop equilibrium in order to provide a benchmark.

Based on the first principle of Wardrop, the decision of a single user has no influence on the total population of players (customers). Moreover, we can define here the average cost of an individual who decides to be served at a relay station with probability p for a population profile p′. A population profile determines the proportion of individuals inside a population of players that decide to be served at a relay station. This average cost is given by

\[
\bar{C}(p, p') = p\,\bar{C}_r(p') + (1 - p)\,\bar{C}_l(p').
\]

We denote the best-response function as:

\[
p(p') = \arg\min_{p} \bar{C}(p, p').
\]

This function is important because it determines whether our system has the Avoid the Crowd (AtC) or the Follow the Crowd (FtC) property, both of which are important in competitive queuing systems. In fact, if a model is AtC (resp. FtC) then it has at most one equilibrium (resp. may have multiple equilibria) (Hassin and Haviv 2003). A system has the AtC (resp. FtC) property if the best-response function is monotone decreasing (resp. increasing). A strategy $p_w$ is a Wardrop equilibrium in our setting if and only if:

\[
p_w = \arg\min_{p} \bar{C}(p, p_w).
\]

Given the expression of the average cost, an interior strategy $p_w \in\, ]0, 1[$ is an equilibrium if and only if:

\[
\bar{C}_r(p_w) = \bar{C}_l(p_w).
\]

This last equality is in accordance with the first Wardrop principle, which says that the costs on each link used by travelers are equal at equilibrium. We have the following theorem, which determines the equilibrium of the system depending on the price q.

Theorem 1 For any K, C, A, D, λ and μ, the Wardrop equilibrium $p_w$ exists and is unique. It is given by:

– if $q > C\, \dfrac{\rho^K / K!}{\sum_{i=0}^{K} \rho^i / i!} - AD$, then $p_w = 1$,

– else $p_w$ is the unique solution of the following equation:

\[
C\, \frac{(\rho p)^K / K!}{\sum_{i=0}^{K} (\rho p)^i / i!} = q + A \left( D + \frac{\lambda(1 - p)D^2}{2\,(1 - \lambda(1 - p)D)} \right),
\]

with $\rho = \lambda/\mu$.


Before proving this theorem, let us make the following two remarks.

Remark 1 It is possible to obtain a trivial equilibrium ($p_w = 1$) where all the players decide to use the delivery relay. Note that the situation in which all users choose to receive delivery at home is never an equilibrium, whatever the parameters of the system are (charges, cost, etc.). This result comes from the fact that the relay station is the best action if it is not saturated, meaning if packets are not blocked. Then, if all users decide to be served at home, a new user has an interest in choosing the relay, because his packet will not be blocked and his cost is therefore the minimum one.

Remark 2 If a non-trivial equilibrium $p_w$ exists, i.e. $p_w \in\, ]0, 1[$, then it can be found by solving a polynomial equation of degree K + 1 written in the following general form: $a_{K+1} p^{K+1} + a_K p^K + \cdots + a_1 p + a_0 = 0$, where $a_{K+1}, a_K, \ldots, a_1, a_0$ depend on λ, D, C, K, ρ. This polynomial comes from the denominator of the Erlang-B formula Π(λ, p, K, μ) after some algebra.
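To illustrate Remark 2, the following sketch works out the smallest case K = 1, an assumption made here only to keep the algebra short. The shorthand $u(p) = \lambda D (1 - p)$ is introduced for this example and is not part of the original notation.

```latex
% Case K = 1: the Erlang-B formula reduces to \rho p / (1 + \rho p).
\begin{align*}
\frac{C \rho p}{1 + \rho p}
  &= q + A D + \frac{A D\, u(p)}{2\,\bigl(1 - u(p)\bigr)},
  \qquad u(p) = \lambda D (1 - p), \\
% Multiply both sides by 2 (1 + \rho p)(1 - u(p)):
2 C \rho p\, \bigl(1 - u(p)\bigr)
  &= (1 + \rho p)\, \Bigl[\, 2 (q + A D) \bigl(1 - u(p)\bigr) + A D\, u(p) \Bigr].
\end{align*}
```

Since $u(p)$ is affine in p, both sides of the last line are polynomials of degree 2 = K + 1 in p, in line with the remark.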

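Proof of Theorem 1 We denote the following $C^\infty$-functions:

\[
F(p) = C\, \frac{\left(\frac{\lambda p}{\mu}\right)^K / K!}{\sum_{i=0}^{K} \left(\frac{\lambda p}{\mu}\right)^i / i!}
\quad \text{and} \quad
G(p) = q + A \left( D + \frac{\lambda(1 - p)D^2}{2\,(1 - \lambda(1 - p)D)} \right).
\]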

Note that, in order to compute the best-response function, we need the following partial derivative:

\[
\frac{\partial \bar{C}}{\partial p}(p, p') = F(p') - G(p').
\]

We have the particular values:

\[
F(0) = 0 \quad \text{and} \quad G(0) = q + A \left( D + \frac{\lambda D^2}{2\,(1 - \lambda D)} \right) > 0.
\]

Moreover, the derivatives of the two functions are expressed by:

\[
F'(p) = C\, \frac{\partial \Pi}{\partial p}(\lambda, p, K, \mu) > 0,
\]

because the Erlang blocking probability is increasing with the load ρp, and

\[
G'(p) = -\frac{A \lambda D^2}{2\,(1 - \lambda(1 - p)D)^2} < 0.
\]

As the two functions are strictly monotone, there exists at most one intersection, denoted $p_0$, inside the interval [0, 1]. Since F(0) < G(0), the necessary and sufficient condition for such an intersection to exist is:

\[
F(1) \ge G(1),
\]

which is equivalent to:

\[
C\, \frac{\rho^K / K!}{\sum_{i=0}^{K} \rho^i / i!} \ge q + AD. \tag{3}
\]

If this condition is true, then the unique equilibrium is $p_w$, the unique solution of F(p) = G(p). In fact, the other possible equilibria are {0, 1}, so let us check that none of them is an equilibrium. Our system has the AtC property, because if $p' < p_0$ then $F(p') - G(p') < 0$ and the best response is $p(p') = 1$. If $p' > p_0$, then $F(p') - G(p') > 0$ and the best response is $p(p') = 0$. Thus the best-response function is the following step function:

\[
p(p') =
\begin{cases}
1 & \text{if } p' < p_0, \\
0 & \text{otherwise.}
\end{cases}
\]

Then, as our system has the AtC property, neither of the strategies 0 and 1 can be an equilibrium, and our system has a unique mixed equilibrium if the condition is satisfied.

Finally, if condition (3) is not satisfied, namely $q + AD > C\, \dfrac{\rho^K / K!}{\sum_{i=0}^{K} \rho^i / i!}$, then for all p ∈ [0, 1] we have that F(p) < G(p). Thus, if the condition is not satisfied, the equilibrium is p = 1, i.e. all the users decide to join the first system, i.e. to be served through the relay station. □

We have proved with the previous theorem that, if each end user plays a best-response strategy in this delivery system, there always exists a unique equilibrium. We have shown moreover that this equilibrium can be trivial, depending on the parameters of the system, and that the system verifies the AtC property. Precisely, the trivial equilibrium in which all the end users decide to be delivered at the relay station is obtained if the price q of the home delivery service is too large. This result will be observed in the numerical illustrations in this article.
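The case distinction of Theorem 1 translates directly into a small numerical procedure. The following Python sketch is one possible way to compute the Wardrop equilibrium, using bisection on F(p) − G(p), which the proof shows to be increasing; it is not the authors' code. It assumes K ≥ 1 and λD < 1, and the function names and parameter values are hypothetical.

```python
def erlang_b(load: float, capacity: int) -> float:
    """Erlang-B blocking probability via the standard stable recursion."""
    blocking = 1.0
    for k in range(1, capacity + 1):
        blocking = load * blocking / (k + load * blocking)
    return blocking


def cost_gap(p, lam, mu, K, C, q, A, D):
    """F(p) - G(p): average RSS cost minus average DaH cost at profile p."""
    rate = lam * (1.0 - p)
    dah = q + A * (D + rate * D ** 2 / (2.0 * (1.0 - rate * D)))
    return C * erlang_b(lam * p / mu, K) - dah


def wardrop_equilibrium(lam, mu, K, C, q, A, D, tol=1e-12):
    """Wardrop equilibrium: p = 1 if F(1) < G(1), else the root of F - G."""
    if lam * D >= 1.0:
        raise ValueError("this sketch assumes a stable M/D/1 queue for every p")
    if cost_gap(1.0, lam, mu, K, C, q, A, D) < 0.0:
        return 1.0  # the relay stays cheaper even when everybody uses it
    lo, hi = 0.0, 1.0  # the gap is negative at 0 and non-negative at 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cost_gap(mid, lam, mu, K, C, q, A, D) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    params = dict(lam=10.0, mu=1.0, K=5, C=20.0, A=1.0, D=0.05)
    print(wardrop_equilibrium(q=1.0, **params))    # interior equilibrium
    print(wardrop_equilibrium(q=100.0, **params))  # trivial equilibrium p = 1
```

Bisection is used rather than the polynomial form of Remark 2 because it only needs the monotonicity of F − G and works for any K.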

4 Discrete choice model: bounded rationality in queueing systems

In this section, we assume another behavior for the end users. Since the assumption that each consumer is rational appears very strong in the real world, we propose to study a second equilibrium concept that allows the rationality of the consumers to be biased. Indeed, this kind of error can be due to distractions, perception biases of each individual, etc. It is more accurate to consider that individuals can make errors in their decisions. They are then bounded rational. Mathematically speaking, in order to incorporate such a bias in the users’ perception, we add a disturbance (also called error) ε to the average costs, i.e. $\bar{C}_r + \varepsilon$ and $\bar{C}_l + \varepsilon$. Then, the decision process of each individual takes into account a realization of the random variable ε. This has a big impact on the optimal decision of each user, and also on the underlying equilibrium concept. Depending on the nature of the error ε, which is assumed to be independent and identically distributed for all users, we are faced with different discrete choice models (DCM) (Ben-Akiva and Lerman 1985). Each user determines the best alternative for himself (the one which minimizes his perceived average cost) depending on the average costs of each of the alternatives (which are constants) and also on the error, which is a random variable. Thus, knowing the error distribution, it is sometimes possible to determine explicitly the probabilities associated with each alternative, depending only on the average costs.

In the literature, and specifically in transport, DCM are widely used to model the choice of the travelers (MacFadden and Domencich 1996). The two following models are mainly proposed:

– Probit: If the error follows a Gaussian distribution, the DCM is called the Probit model. In this case, it is difficult to obtain explicit expressions for the probability that each alternative is chosen by the decision maker.


– Logit: If the error follows a Gumbel distribution, the DCM is called the Logit model (MacFadden 1974). A Gumbel distribution is characterized by a location parameter η and a scale parameter γ > 0. The cumulative distribution function $F_g$ of the Gumbel distribution is:

\[
F_g(\varepsilon) = e^{-e^{-\gamma(\varepsilon - \eta)}}.
\]

In this case, the probability p for any user to choose the RSS action is explicitly known (the result is based on two properties: (1) the maximum of n independent Gumbel variables is also a Gumbel variable with known parameters and (2) the difference between two independent Gumbel distributed variables is logistically distributed). This probability depends explicitly on the average costs $\bar{C}_r(p)$ and $\bar{C}_l(p)$ as follows:

\[
p = \frac{e^{-\gamma \bar{C}_r(p)}}{e^{-\gamma \bar{C}_r(p)} + e^{-\gamma \bar{C}_l(p)}}. \tag{4}
\]

The scale parameter γ can represent the agents’ level of rationality. If this parameter tends to infinity, the agents respond perfectly. On the contrary, if γ tends to 0, the agents play the uniform mixed strategy without regard to their average costs. In fact, the probability p is then equal to 1/2 whatever the average costs are. Thus, when γ is very small, the behavior of the users is almost non-rational. We can therefore model different levels of rationality by choosing a particular scale parameter γ, from non-rational users (γ = 0) to fully rational users (γ tends to infinity, which corresponds to the Wardrop equilibrium). Finally, in general, the Logit DCM has the important property that the probability distribution over the alternatives is the only probability distribution satisfying the efficiency principle. This principle ensures that the probability distribution over the alternatives is such that, for every number of decision makers, samples (outcomes of a decision of all the players) with lower average cost are more probable (Erlander and Stewart 1075).
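The role of γ in (4) can be checked with a few lines of Python. This is only an illustrative sketch for fixed, hypothetical cost values; the function name `logit_choice_probability` is an assumption. The right-hand side of (4) is rewritten as 1 / (1 + exp(γ(C̄r − C̄l))) and evaluated in a way that avoids overflow for large γ.

```python
import math


def logit_choice_probability(cost_rss: float, cost_dah: float, gamma: float) -> float:
    """Probability of choosing RSS under the Logit model, for given average costs."""
    z = gamma * (cost_rss - cost_dah)
    e = math.exp(-abs(z))  # always in ]0, 1], so no overflow
    # 1 / (1 + exp(z)) written with exp(-|z|) only.
    return e / (1.0 + e) if z >= 0.0 else 1.0 / (1.0 + e)


if __name__ == "__main__":
    # Hypothetical costs: RSS is the more expensive option here.
    for gamma in (0.0, 0.1, 1.0, 10.0, 100.0):
        print(gamma, logit_choice_probability(3.0, 1.0, gamma))
```

For γ = 0 the probability is exactly 1/2, and as γ grows the probability of the more expensive option goes to 0, which matches the two limiting behaviors described above.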

We now define the equilibrium concept that is used in this situation of bounded rationality. In a general framework, that is, when the error follows a general distribution, the solution concept is called a QRE. This notion of equilibrium has been proposed in (MacFadden 1976). When the error follows a Gumbel distribution, the equilibrium concept is called a Logit equilibrium. It has been shown in the seminal paper (Palfrey and McKelvey 1995), which introduces the Logit equilibrium, that this concept fits experimental data better than the Nash equilibrium model of perfectly rational players.

Definition 1 A Logit equilibrium with any given γ > 0 is a mixed strategy vector $p_l$ that solves:

\[
p_l = \frac{e^{-\gamma \bar{C}_r(p_l)}}{e^{-\gamma \bar{C}_r(p_l)} + e^{-\gamma \bar{C}_l(p_l)}}. \tag{5}
\]

Based on Definition 1, we observe that the Logit equilibrium $p_l$ is the solution of the equation $p_l = Br(p_l)$. The function Br(·), given by the right-hand side of (5), is called in the literature the Logit best-response function. The problem of existence and uniqueness of the Logit equilibrium is very important for our problem. The following theorems establish the existence and the uniqueness of the Logit equilibrium in our context.

Theorem 2 For any K, C, A, D, q, λ, γ and μ, the Logit equilibrium $p_l$ exists.


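Proof of Theorem 2 The Logit equilibrium $p_l$ is given by the solution of the fixed point equation:

\[
p = \frac{e^{-\gamma \bar{C}_r(p)}}{e^{-\gamma \bar{C}_r(p)} + e^{-\gamma \bar{C}_l(p)}}.
\]

This equation is equivalent to:

\[
H(p) := p \left( e^{-\gamma \bar{C}_r(p)} + e^{-\gamma \bar{C}_l(p)} \right) = e^{-\gamma \bar{C}_r(p)} =: I(p).
\]

These two functions are $C^\infty$ and we have:

\[
H(0) = 0 \quad \text{and} \quad I(0) = e^{-\gamma \bar{C}_r(0)} > 0.
\]

Moreover, we have:

\[
H(1) = e^{-\gamma \bar{C}_r(1)} + e^{-\gamma \bar{C}_l(1)} \quad \text{and} \quad I(1) = e^{-\gamma \bar{C}_r(1)} < H(1).
\]

Hence H − I is continuous, negative at 0 and positive at 1, so by the intermediate value theorem it vanishes somewhere in ]0, 1[: the Logit equilibrium exists. This result can also be proved directly using Brouwer’s fixed point theorem (Brouwer 1911), as the Logit response function is continuous and maps the compact interval [0, 1] into itself. □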

We prove in the next theorem that the Logit equilibrium is unique.

Theorem 3 For any K, C, A, D, q, λ, γ and μ, the Logit equilibrium $p_l$ is unique.

Proof of Theorem 3 The Logit equilibrium is the probability $p_l$ which verifies:

\[
p_l = \frac{e^{-\gamma \bar{C}_r(p_l)}}{e^{-\gamma \bar{C}_r(p_l)} + e^{-\gamma \bar{C}_l(p_l)}}.
\]

After some manipulations, this equality becomes:

\[
\bar{C}_r(p_l) + \frac{1}{\gamma} \log\left( \frac{p_l}{1 - p_l} \right) = \bar{C}_l(p_l).
\]

We have proved in the previous section that the function $\bar{C}_r(\cdot)$ is strictly increasing and the function $\bar{C}_l(\cdot)$ is strictly decreasing. Moreover, we have the following derivative:

\[
\left( \log\left( \frac{p}{1 - p} \right) \right)' = \frac{1}{p(1 - p)} > 0.
\]

Then, the left-hand side of the equation is strictly increasing and the right-hand side is strictly decreasing. Thus, if the two curves cross, they cross only at a single point, which means that the Logit equilibrium is unique. □
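The monotonicity argument of this proof also suggests a simple way to compute the Logit equilibrium: bisection on the increasing function C̄r(p) + (1/γ) log(p/(1 − p)) − C̄l(p), which tends to −∞ at 0 and to +∞ at 1. The Python sketch below illustrates this under the assumptions γ > 0 and λD < 1; it is not the authors' implementation, and all names and parameter values are hypothetical.

```python
import math


def erlang_b(load: float, capacity: int) -> float:
    """Erlang-B blocking probability via the standard stable recursion."""
    blocking = 1.0
    for k in range(1, capacity + 1):
        blocking = load * blocking / (k + load * blocking)
    return blocking


def costs(p, lam, mu, K, C, q, A, D):
    """Return the pair of average costs (RSS, DaH) at population profile p."""
    rate = lam * (1.0 - p)
    dah = q + A * (D + rate * D ** 2 / (2.0 * (1.0 - rate * D)))
    return C * erlang_b(lam * p / mu, K), dah


def logit_response(p, lam, mu, K, C, q, A, D, gamma):
    """Right-hand side of the fixed-point equation (5)."""
    rss, dah = costs(p, lam, mu, K, C, q, A, D)
    z = gamma * (rss - dah)
    e = math.exp(-abs(z))  # overflow-free evaluation of 1 / (1 + exp(z))
    return e / (1.0 + e) if z >= 0.0 else 1.0 / (1.0 + e)


def logit_equilibrium(lam, mu, K, C, q, A, D, gamma, tol=1e-12):
    """Unique root in ]0, 1[ of C_r(p) + log(p / (1 - p)) / gamma - C_l(p)."""
    if gamma <= 0.0 or lam * D >= 1.0:
        raise ValueError("this sketch assumes gamma > 0 and lam * D < 1")
    lo, hi = 0.0, 1.0  # the endpoints themselves are never evaluated
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        rss, dah = costs(mid, lam, mu, K, C, q, A, D)
        if rss + math.log(mid / (1.0 - mid)) / gamma - dah < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    params = dict(lam=10.0, mu=1.0, K=5, C=20.0, q=1.0, A=1.0, D=0.05)
    for gamma in (0.01, 1.0, 100.0):
        print(gamma, logit_equilibrium(gamma=gamma, **params))
```

Iterating the map p ↦ Br(p) directly would be an alternative, but its convergence is not guaranteed by the results above, whereas bisection only needs the monotonicity established in the proof.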

5 Comparison of the equilibria: the price of rationality

At this stage of the reasoning we have proved, in our application context, that both the Wardrop equilibrium (assumption of total rationality) and the Logit equilibrium (assumption of bounded rationality) exist and are unique. Now, let us compare both equilibria theoretically. To do so, let us establish the following proposition:


Proposition 1 Considering our delivery system, we have the following relation between the proportion of consumers that decide to be served at the relay station when the consumers are fully rational and the same proportion when the consumers have bounded rationality:

– if $p_w \le 1/2$ then $p_w \le p_l \le 1/2$,

– else $p_l < p_w$.

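Proof We first determine the relation between the following functions: $\bar{C}_r(p)$, $\bar{C}_l(p)$ and $\bar{C}_r(p) + \frac{1}{\gamma} \log\left( \frac{p}{1 - p} \right)$. We have proved in the previous section that the interior Wardrop equilibrium $p_w \in\, ]0, 1[$, when it exists, is the unique solution of the equation:

\[
\bar{C}_r(p) = \bar{C}_l(p).
\]

The interior Logit equilibrium $p_l \in\, ]0, 1[$ is the unique solution of the following equation:

\[
\bar{C}_r(p) + \frac{1}{\gamma} \log\left( \frac{p}{1 - p} \right) = \bar{C}_l(p).
\]

Moreover, if $p \in\, ]0, \frac{1}{2}[$ we have that $\frac{1}{\gamma} \log\left( \frac{p}{1 - p} \right) < 0$, which implies, geometrically, that if the Wardrop equilibrium satisfies $p_w < \frac{1}{2}$ then $p_l > p_w$, because $\bar{C}_r(p) + \frac{1}{\gamma} \log\left( \frac{p}{1 - p} \right) < \bar{C}_r(p)$: the increasing left-hand side of the Logit equation lies below $\bar{C}_r$ on $]0, \frac{1}{2}[$ and therefore crosses the decreasing function $\bar{C}_l$ to the right of $p_w$. Since the logarithmic term vanishes at p = 1/2, where $\bar{C}_r(\frac{1}{2}) > \bar{C}_l(\frac{1}{2})$, this crossing occurs before 1/2. With the same analysis, we have that if $p_w > \frac{1}{2}$ then $p_l < p_w$. □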

In other words, we have proved that the bounded rationality assumption on the behavior of the consumers pulls the equilibrium towards the uniform split p = 1/2: the Logit equilibrium lies between the Wardrop equilibrium and 1/2. More specifically, this result shows that the more uncertainty is introduced in the system, the more evenly the consumers spread over the two delivery options.

There exists a metric in routing games which measures the performance gap of a system due to decentralization. This metric is called the price-of-anarchy (PoA) and has been defined in (Roughgarden 2005). It is well known that the non-cooperative behavior of individuals induces a loss of efficiency in a decentralized system. The efficiency is measured in terms of average cost, meaning that when each individual decides by himself, minimizing his own average cost, the equilibrium reached gives a higher cost compared to the centralized point of view. In the latter case, a controller determines the proportion of individuals who use the DaH or the other option in order to minimize the average cost of one individual. This remark is counterintuitive in general game theoretical problems, because it means in a way that giving individuals freedom of choice reduces the performance of the system. But the same type of phenomenon is observed in strategic queuing problems.

In order to measure this gap of performance, i.e. the improvement of average cost, the “Price of Anarchy” has been defined as:

\[
\mathrm{PoA} = \frac{\max_{p \in Eq} \bar{C}(p)}{\min_{p \in [0,1]} \bar{C}(p)},
\]

where Eq is the set of equilibria. As in our setting we consider two rationality models, full and partial rationality, and the equilibrium concepts they induce, we define two types of price of anarchy depending on this level of rationality. We denote by $\mathrm{PoA}_w$ (resp. $\mathrm{PoA}_l$) the price of anarchy considering the full rationality model and thus the Wardrop equilibrium concept (resp. partial rationality and the Logit equilibrium concept). Then we get:

\[
\mathrm{PoA}_w = \frac{\max_{p \in W} \bar{C}(p)}{\min_{p \in [0,1]} \bar{C}(p)}
\quad \left( \text{resp. } \mathrm{PoA}_l = \frac{\max_{p \in L} \bar{C}(p)}{\min_{p \in [0,1]} \bar{C}(p)} \right), \tag{6}
\]


where W is the set of Wardrop equilibria (resp. L is the set of Logit equilibria). We have proved the uniqueness of the equilibrium, hence we are able to simplify the expression of the PoA as follows:

\[
\mathrm{PoA}_w = \frac{\bar{C}(p_w)}{\min_{p \in [0,1]} \bar{C}(p)}
\quad \text{and} \quad
\mathrm{PoA}_l = \frac{\bar{C}(p_l)}{\min_{p \in [0,1]} \bar{C}(p)}.
\]

However, the centralized point of view is hardly applicable in our setting. Indeed, we assume that it is very important for consumers to have the possibility to choose their delivery option. Instead of comparing the equilibrium solution to the centralized one, we compare the two equilibria, in order to evaluate the gap of performance assuming full or partial rationality. In this way, we introduce a new metric: the “Price of Rationality”, a metric which has never been applied to routing games, to the best of our knowledge. The Price of Rationality is better adapted to our context because it does not constrain consumers to choose a service. This ratio measures the gap, in terms of average cost, depending on the level of rationality, i.e. on whether the Wardrop or the Logit equilibrium is considered.

Definition 2 The “Price of Rationality” is defined by:

$$\mathrm{PoR} = \frac{\max_{p \in L} \bar{C}(p)}{\max_{p \in W} \bar{C}(p)}. \tag{7}$$

The Price of Rationality is related to the Price of Anarchy, as it is the ratio of the PoA under partial rationality to the PoA under full rationality, i.e. $\mathrm{PoR} = \mathrm{PoA}_l / \mathrm{PoA}_w$. In our setting, we have proved that both equilibria, Wardrop and Logit, are unique, so that the expression of the PoR simplifies to:

$$\mathrm{PoR} = \frac{\bar{C}(p_l)}{\bar{C}(p_w)}.$$
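The following Python sketch illustrates how this ratio could be evaluated numerically. It solves the Wardrop condition and the Logit fixed-point relation by bisection and then divides the two average costs. The linear cost functions and every function name are assumptions made for the example, not the model's actual costs.

```python
import math
from typing import Callable


def cost_relay(p: float) -> float:
    """Hypothetical cost of the relay option (increasing in its share p)."""
    return 4.0 + 6.0 * p


def cost_delivery(p: float) -> float:
    """Hypothetical cost of the other option (decreasing in p)."""
    return 8.0 - 3.0 * p


def average_cost(p: float) -> float:
    """Average cost of one individual: p * C_r(p) + (1 - p) * C_l(p)."""
    return p * cost_relay(p) + (1.0 - p) * cost_delivery(p)


def bisect_root(f: Callable[[float], float], lo: float, hi: float, tol: float = 1e-12) -> float:
    """Root of an increasing function f on [lo, hi], assuming f(lo) < 0 < f(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def wardrop_equilibrium() -> float:
    """Solve C_r(p) = C_l(p)."""
    return bisect_root(lambda p: cost_relay(p) - cost_delivery(p), 0.0, 1.0)


def logit_equilibrium(gamma: float) -> float:
    """Solve C_r(p) + log(p / (1 - p)) / gamma = C_l(p) for p in (0, 1), gamma > 0."""
    eps = 1e-12  # keep the logarithm finite at the ends of the bracket

    def gap(p: float) -> float:
        return cost_relay(p) + math.log(p / (1.0 - p)) / gamma - cost_delivery(p)

    return bisect_root(gap, eps, 1.0 - eps)


def price_of_rationality(gamma: float) -> float:
    """Average cost at the Logit equilibrium over average cost at the Wardrop equilibrium."""
    return average_cost(logit_equilibrium(gamma)) / average_cost(wardrop_equilibrium())


for gamma in (1.0, 10.0, 50.0):
    print(gamma, logit_equilibrium(gamma), price_of_rationality(gamma))
```

In this toy instance the Wardrop equilibrium lies below 1/2, the Logit equilibrium falls between it and 1/2, and the ratio exceeds 1; other cost functions may of course behave differently.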

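Proposition 2 In our setting, the Price of Rationality satisfies the following bounds:

– if $p_w < 1/2$, then $\mathrm{PoR} \in \left]\frac{\bar{C}_l(p_l)}{\bar{C}_l(p_w)}, \infty\right[$,

– if $p_w > 1/2$, then $\mathrm{PoR} \in \left[0, \frac{\bar{C}_l(p_l)}{\bar{C}_l(p_w)}\right[$,

– otherwise, if $p_w = 1/2$, then $\mathrm{PoR} = 1$.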

Proof The Price of Rationality is expressed by:

$$\mathrm{PoR} = \frac{\bar{C}(p_l)}{\bar{C}(p_w)} = \frac{p_l \bar{C}_r(p_l) + (1 - p_l)\bar{C}_l(p_l)}{p_w \bar{C}_r(p_w) + (1 - p_w)\bar{C}_l(p_w)}.$$

But we proved the following relation for the Logit equilibrium:

$$\bar{C}_r(p_l) + \frac{1}{\gamma}\log\left(\frac{p_l}{1 - p_l}\right) = \bar{C}_l(p_l).$$

Also, we have for the Wardrop equilibrium:

$$\bar{C}_r(p_w) = \bar{C}_l(p_w).$$

Then, these two equations lead to the following expression for the PoR:

$$\mathrm{PoR} = \frac{\gamma \bar{C}_l(p_l) - p_l \log\left(\frac{p_l}{1 - p_l}\right)}{\gamma \bar{C}_l(p_w)}.$$
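For readers who want the intermediate step, the substitution behind this expression can be spelled out as follows. This is only a worked restatement of the two equilibrium relations used in the proof, not an additional result.

```latex
\begin{align*}
p_l \bar{C}_r(p_l) + (1 - p_l)\bar{C}_l(p_l)
  &= p_l \left[\bar{C}_l(p_l) - \frac{1}{\gamma}\log\left(\frac{p_l}{1 - p_l}\right)\right]
     + (1 - p_l)\bar{C}_l(p_l) \\
  &= \bar{C}_l(p_l) - \frac{p_l}{\gamma}\log\left(\frac{p_l}{1 - p_l}\right), \\
p_w \bar{C}_r(p_w) + (1 - p_w)\bar{C}_l(p_w)
  &= p_w \bar{C}_l(p_w) + (1 - p_w)\bar{C}_l(p_w) = \bar{C}_l(p_w).
\end{align*}
```

Dividing the first quantity by the second and multiplying numerator and denominator by γ gives the stated form of the PoR.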

The derivative of the PoR with respect to γ is equal to:

$$\mathrm{PoR}'(\gamma) = \frac{p_l \log\left(\frac{p_l}{1 - p_l}\right)}{\gamma^2 \bar{C}_l(p_w)}.$$

Now, we separate the analysis depending on the equilibrium $p_w$.

– If $p_w < 1/2$, we proved in Proposition 1 that $p_w < p_l < 1/2$ and thus $p_l \log\left(\frac{p_l}{1 - p_l}\right) < 0$. Then, the PoR is decreasing in γ and we have $\lim_{\gamma \to 0} \mathrm{PoR}(\gamma) = \infty$ and $\lim_{\gamma \to \infty} \mathrm{PoR}(\gamma) = \frac{\bar{C}_l(p_l)}{\bar{C}_l(p_w)}$. As this function is strictly decreasing, we have $\mathrm{PoR} \in \left]\frac{\bar{C}_l(p_l)}{\bar{C}_l(p_w)}, \infty\right[$ if $p_w < 1/2$.

– If $p_w > 1/2$, we obtain the relation $p_w > p_l > 1/2$ and thus $p_l \log\left(\frac{p_l}{1 - p_l}\right) > 0$. By the same argument as in the previous case, the PoR is then strictly increasing in γ and we have $\lim_{\gamma \to 0} \mathrm{PoR}(\gamma) = -\infty$; however, as the PoR is a ratio of average costs, it is always positive, so its lower bound is 0. Moreover, $\lim_{\gamma \to \infty} \mathrm{PoR}(\gamma) = \frac{\bar{C}_l(p_l)}{\bar{C}_l(p_w)}$. Due to the monotonicity of the PoR function, we have $\mathrm{PoR} \in \left[0, \frac{\bar{C}_l(p_l)}{\bar{C}_l(p_w)}\right[$ if $p_w > 1/2$.

– Finally, if $p_w = 1/2$ then $p_l = p_w = 1/2$ and thus $\bar{C}(p_l) = \bar{C}(p_w)$, which leads to $\mathrm{PoR} = 1$.

□

6 Numerical illustrations

In this section we illustrate the different theoretical results obtained in the previous sections. Specifically, we study the impact of the following two parameters, which can be controlled to some extent by a system designer:

1. the maximum number of packages K which can be in the stock of the relay station; and
2. the last-mile delivery service capacity 1/D.

In addition, we consider the following data for our system: C = 8, q = 5, A = 5, λ = 40. We use three values for the level of rationality γ in all our illustrations (γ ∈ {1; 10; 50}). When γ is large, the Logit response function approximates the best response function. On the contrary, a small γ induces non-rational behavior, meaning that the decision follows a uniform distribution that does not take into account the average cost incurred.
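The role of γ can be illustrated with a short Python sketch of a binary Logit choice rule. The closed form used here is obtained by solving the Logit relation from the proof of Proposition 2 for p with the two costs held fixed; the function name and the cost values 6 and 7 are hypothetical.

```python
import math


def logit_relay_probability(cost_relay: float, cost_other: float, gamma: float) -> float:
    """Probability of picking the relay option under a binary Logit rule.

    Solves cost_relay + log(p / (1 - p)) / gamma = cost_other for p,
    i.e. p = 1 / (1 + exp(gamma * (cost_relay - cost_other))).
    """
    x = gamma * (cost_relay - cost_other)
    # Evaluate 1 / (1 + exp(x)) without overflow for large |x|.
    if x >= 0.0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


# Hypothetical costs: the relay option is cheaper by one unit.
for gamma in (0.0, 1.0, 10.0, 50.0):
    print(gamma, logit_relay_probability(6.0, 7.0, gamma))
```

With γ = 0 the rule returns 0.5 regardless of the costs (a uniform choice), and as γ grows the probability of the cheaper option approaches 1, which is the best response.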

6.1 Relay capacity

In this first set of numerical illustrations, we study the impact of the relay capacity K on the performance of the system. Specifically, we focus on the equilibria, the Price of Anarchy (PoA) and the Price of Rationality (PoR). We first compare in Fig. 1 the two equilibria (Wardrop and Logit) for three levels of rationality (γ = 1: low rationality, γ = 10: partial rationality and γ = 50: high rationality). We assume that these levels represent the main cases found in practice. We also plot on this figure the optimal probability $p^*$, i.e. the probability that minimizes the function $\bar{C}(p)$. First, an intuitive result: as the level of rationality increases, the Logit response function behaves like the best response function and the Logit equilibrium converges to the Wardrop equilibrium. Second, we observe that all the equilibria and also the optimal repartition increase with K. Indeed, if the capacity of the relay increases, the reject probability decreases and more consumers take this option, as it is free of charge compared to the other. In terms of rationality, we observe that the low rational behavior yields a smoother curve than the other rationality levels, and that all the equilibrium curves cross around almost the same capacity point.

Fig. 1 The Wardrop equilibrium, the Logit equilibria and the optimal repartition depending on the relay capacity K (x-axis: K; y-axis: equilibrium; curves: $p_w$, $p_l$ with γ = 1, γ = 10 and γ = 50, and $p^*$)

The Price of Anarchy is plotted in Fig. 2 for the three levels of rationality and also for the full rationality model, which corresponds to the Wardrop equilibrium. A very interesting property is observed when the level of rationality goes to 0, meaning that the consumers are less rational. In this case the PoA has two local maxima, implying that there exists a value $K^*$ such that the PoA is equal to one (the equilibrium is equal to the social optimum). This result comes from the shape of the Logit equilibrium (Fig. 1). We observe that when K is higher than $K^*$, the proportion of consumers that choose the RSS option from the centralized point of view, namely $p^*$, is higher than the Logit equilibrium $p_l$ when γ = 1. This result is intuitive because as consumers become less and less rational, they make their decisions in a totally random manner (with full non-rationality, each individual chooses a last-mile delivery service option with probability 0.5). Another important remark is that the Price of Anarchy is bounded: in our setting, the maximum loss in average cost is 25 % when we let consumers choose their last-mile delivery service option by themselves.

In terms of Price of Rationality, we observe in Fig. 3 that there is also a gap between fully rational behavior and non-rational behavior. Very interestingly, we observe that non-rational behavior may lead to a lower average cost at equilibrium compared to a fully rational model. This observation is very difficult to explain and is closely related to well-known paradoxes found in game-theoretical problems. For example, the most famous one is the Braess paradox, which shows that in some routing games, adding links/capacity to the system yields worse performance at equilibrium (Braess 1968). We can also cite problems where giving too much information to players is not efficient in terms of individual utility at equilibrium (like observable vs. non-observable queues depicted in Hassin et al. 2006). On one hand, we can say that this type of paradox has already been observed in game-theoretical problems, because the difference between bounded rational users and fully rational users can be viewed as giving more accurate (error-free) information to players. On the other hand, allowing errors in optimization techniques in order to reach a global maximum is the basic concept of simulated annealing techniques, for example. Thus, our result is not so surprising, as it is commonly observed in competitive systems.

Fig. 2 The price of anarchy for the different levels of rationality (x-axis: K; y-axis: Price of Anarchy; curves: $\mathrm{PoA}_w$, $\mathrm{PoA}_l$ with γ = 1, γ = 10 and γ = 50)

Fig. 3 The price of rationality with respect to the relay capacity K with different levels of rationality (x-axis: K; y-axis: Price of Rationality; curves: γ = 1, γ = 10 and γ = 50)

Fig. 4 The Wardrop equilibrium, the Logit equilibria and the optimal repartition depending on the last-mile delivery service capacity 1/D (x-axis: 1/D; y-axis: equilibrium; curves: $p_w$, $p_l$ with γ = 1, γ = 10 and γ = 50, and $p^*$)

6.2 Last-mile delivery service capacity

We now fix the capacity of the relay station to K = 6. We study the impact of the last-mile delivery service capacity 1/D on the performance of the system, namely the equilibria, the Price of Anarchy and the Price of Rationality. This parameter takes values in the interval [45, 95] in order to guarantee the stability constraint of the M/D/1 queue, i.e. λ < 1/D. We first observe in Fig. 4 a strict convergence of the equilibrium to a limit. In this stationary regime, around 33 % of the consumers choose the RSS option, whereas at the optimum it should be 12 %. The gap is close to a ratio of 3. The same phenomenon is observed for the PoA in Fig. 5 and the PoR in Fig. 6. Finally, we also observe that the PoA is bounded and that the PoR is always higher than 1, meaning that the average cost is always higher with low rationality than with full rationality of the consumers.
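To make the stability constraint concrete, the Python sketch below checks λ < 1/D and evaluates the standard Pollaczek-Khinchine mean waiting time of an M/D/1 queue, W = ρ / (2μ(1 − ρ)) with μ = 1/D and ρ = λ/μ. It assumes, as a worst case, that the whole arrival stream λ = 40 enters the home-delivery queue; the congestion cost actually used in the model is defined in an earlier section and may differ. The function name is hypothetical.

```python
def md1_mean_wait(arrival_rate: float, service_rate: float) -> float:
    """Mean waiting time in queue of an M/D/1 system (Pollaczek-Khinchine formula)."""
    rho = arrival_rate / service_rate  # load of the queue
    if rho >= 1.0:
        raise ValueError("unstable queue: the arrival rate must be below the service rate")
    return rho / (2.0 * service_rate * (1.0 - rho))


ARRIVAL_RATE = 40.0  # lambda from the numerical setup

# Sample values of the capacity 1/D inside the interval [45, 95].
for capacity in (45.0, 60.0, 75.0, 95.0):
    print(capacity, md1_mean_wait(ARRIVAL_RATE, capacity))
```

Under this assumption the waiting time decreases as the capacity grows, and any capacity at or below λ is rejected as unstable.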

Fig. 5 The price of anarchy for the different levels of rationality depending on the last-mile delivery service capacity (x-axis: 1/D; y-axis: Price of Anarchy; curves: $\mathrm{PoA}_w$, $\mathrm{PoA}_l$ with γ = 1, γ = 10 and γ = 50)

Fig. 6 The price of rationality with respect to the last-mile delivery service capacity 1/D with different levels of rationality (x-axis: 1/D; y-axis: Price of Rationality; curves: γ = 1, γ = 10 and γ = 50)

7 Conclusions

In this paper we proposed a simple queuing model that describes a last-mile delivery service system with two options for the consumer: to be delivered directly at home or to have the product stored in a pick-up station, called a relay station. These two options have different costs, one monetary, due to the service provided, and another related to a congestion effect. Consumers interact in this system, and we used a game-theoretical approach to determine the best action a consumer should choose in order to minimize his/her average cost. We considered different levels of rationality, i.e. how players perceive their average costs, and we have shown the existence and uniqueness of the equilibrium concepts in our setting. Furthermore, we compared the efficiency, in terms of average cost, of the game setting in which consumers can choose with that of a centralized setting in which the choice is not offered. We have also compared the different equilibria with each other. As future work, we will consider providing information to help the consumer make the best decision. Such information can be the level of congestion or the current occupation of the system. Such models are called observable cases in strategic queuing problems. Moreover, we plan to consider a mathematical programming with equilibrium constraints (MPEC) approach, which can help to determine bounds on the PoA and the PoR. Finally, a very interesting generalization of our framework would be to consider the last-mile delivery service provider also as a decision maker; the system then becomes hierarchical, with the provider setting its parameters and the consumers reacting to them in an optimal manner. For example, the delivery service provider may control the cost q of the DaH option and/or the delivery service capacity 1/D in order to optimize its profit.

References

Agatz, N. A. H., Fleischmann, M., & van Nunen, J. A. E. E. (2008). E-fulfillment and multi-channel distribution: A review. European Journal of Operational Research, 187(2), 339–356.

Anderson, S. P., Goeree, J. K., & Holt, C. A. (2002). The logit equilibrium: A perspective on intuitive behavioral anomalies. Southern Economic Journal, 69(1), 21–47.

Ben-Akiva, M., & Lerman, S. (1985). Discrete choice analysis: Theory and application to travel demand. Cambridge, MA: MIT Press.

Boyer, K. K., & Hult, G. T. M. (2005). Extending the supply chain: Integrating operations and marketing in the online grocery industry. Journal of Operations Management, 23(6), 642–661.

Braess, D. (1968). Über ein Paradoxon aus der Verkehrsplanung, Unternehmensforschung (Vol. 12).

Brouwer, L. E. J. (1911). Mannigfaltigkeiten Mathematische Annalen (Vol. 71). Berlin: Springer.

Cooper, R. B. (1981). Introduction to queueing theory (2nd ed.). Amsterdam: North-Holland.

Dafermos, S. C., & Sparrow, F. T. (1969). The traffic assignment problem for a general network. Journal of Research of the U.S. National Bureau of Standards, 91–118.

Erlander, S., & Stewart, N. F. (1990). The gravity model in transportation analysis: Theory and extensions. Utrecht: VSP.

Fudenberg, D., & Tirole, J. (1991). Game theory. Cambridge, MA: MIT Press.

Gigerenzer, G., & Selten, R. (2002). Bounded rationality. Cambridge, MA: MIT Press.

Hassin, R., & Haviv, M. (2003). To queue or not to queue. Kluwer’s international series.

Hassin, R., Haviv, M. & Hassin, S. (2006). To queue or not to queue: Equilibrium behavior in queueing systems. International Series in Operations Research and Management Science.

MacFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. Frontier of Econometrics.

MacFadden, D. (1976). Quantal choice analysis: A survey. Annals of Economic and Social Measurement, 5, 363–390.

MacFadden, D., & Domencich, T. (1996). Urban travel demand: A behavioral analysis. Mount Pleasant, MI: The Blackstone Company.

Palfrey, T., & McKelvey, R. (1995). Quantal response equilibria for normal form games. Games and Economic Behavior, 10(1), 6–38.

Patriksson, M. (1994). The traffic assignment problem: Models and methods. Topics in transportation, VS.

Rabinovich, E., & Bailey, J. P. (2004). Physical distribution service quality in Internet retailing: Service pricing, transaction attributes, and firm attributes. Journal of Operations Management, 21(6), 651–672.

Roughgarden, T. (2005). The price of anarchy. Cambridge, MA: MIT Press.

Wardrop, J. (1952). Some theoretical aspects of road traffic research. In Proceedings of the institution of civil engineers (Vol. 1).
