Dynamic pricing and inventory control with large replenishment lead … · 2018-10-04 · Dynamic...

Dynamic pricing and inventory control with large replenishment lead times

Xin Chen

University of Illinois at Urbana-Champaign

Joint work with Sasha Stolyar and Linwei Xin

IMA Workshop on Data-Driven Supply Chain Management

October 4, 2018

Funding support: NSF, JD.com, UIUC-ZJU Institute


Outline

1 Model description

2 Prior work

3 Main result
   Asymptotic optimality of constant-order list-price policies

4 Proof sketch
   Three steps in the proof

5 Conclusion


Model and notation

Single item, periodic-review, backorder model, long-run average profit

Unit ordering, holding and backorder costs: c, h and b

Demand $D_t \triangleq \gamma_t D(p_t) + \beta_t$, with $D(p)$ strictly decreasing in $p$

$\{\gamma_t\}$ i.i.d. with mean one

$\{\beta_t\}$ i.i.d. with zero mean

$p_t \in [p_{\min}, p_{\max}]$, where $p_{\min} < p_{\max}$

Deterministic lead time L > 0



Inventory dynamics (in period t)

Sequence of events: inventory review $(I_t, x_t)$ → items delivered $(x_{1,t})$ → pricing decision $(p_t)$ → new order placed $(q_t)$ → demand realized $(D_t)$

On-hand inventory $I_t$

Pipeline vector $x_t = (x_{1,t}, x_{2,t}, \ldots, x_{L,t})$: orders already placed but not yet received

Decision variables $p_t, q_t$
   $p_t$: pricing decision
   $q_t$: new order placed

Inventory update: $I_{t+1} = I_t + x_{1,t} - D_t$, $\quad x_{t+1} = (x_{2,t}, \ldots, x_{L,t}, q_t)$
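The one-period transition can be written as a short function. This is a minimal sketch, assuming the realized demand $D_t$ is passed in directly; the helper name `step` and its signature are illustrative, not from the talk.

```python
from typing import Tuple


def step(I: float, pipeline: Tuple[float, ...], q: float,
         demand: float) -> Tuple[float, Tuple[float, ...]]:
    """Advance the backorder system by one period (hypothetical helper).

    I        -- on-hand (net) inventory I_t; negative values are backorders
    pipeline -- x_t = (x_{1,t}, ..., x_{L,t}), so L = len(pipeline) >= 1
    q        -- new order q_t, which arrives L periods later
    demand   -- realized demand D_t
    """
    arriving = pipeline[0]                 # x_{1,t} is delivered this period
    I_next = I + arriving - demand         # I_{t+1} = I_t + x_{1,t} - D_t
    pipeline_next = pipeline[1:] + (q,)    # x_{t+1} = (x_{2,t}, ..., x_{L,t}, q_t)
    return I_next, pipeline_next


# Lead time L = 3: five units on hand, orders of 2, 3 and 4 in transit.
print(step(5.0, (2.0, 3.0, 4.0), q=1.0, demand=6.0))  # prints (1.0, (3.0, 4.0, 1.0))
```

Storing the pipeline as a tuple makes the shift $x_{t+1} = (x_{2,t}, \ldots, x_{L,t}, q_t)$ a single slice-and-append.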



Performance measure and optimal policy

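$G(x) \triangleq h x^+ + b x^-$

Profit in period t:

$C_t \triangleq p_t D_t - \big[c\,x_{1,t} + G(I_t + x_{1,t} - D_t)\big]$

Long-run average profit of policy π:

$C(\pi) \triangleq \liminf_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T}\mathbb{E}[C^{\pi}_t]$

Optimal long-run average profit:

$\mathrm{OPT}(L) \triangleq \sup_{\pi} C(\pi)$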
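To make $C_t$ and $C(\pi)$ concrete, the sketch below estimates the long-run average profit of a policy by the time average $\frac{1}{T}\sum_{t=1}^{T} C_t$ along one long simulated path. Everything model-specific is a hypothetical assumption for illustration: the demand curve $D(p) = 10 - p$, uniform $\gamma_t$ on $[0.5, 1.5]$, Gaussian $\beta_t$, the cost parameters, and the names `average_profit` and `constant_order_list_price`. A finite-$T$ average only approximates the $\liminf$, and only for a policy under which inventory stays stable.

```python
import random
from typing import Callable, List, Tuple

# A policy maps the review-time state (I_t, x_t) to decisions (p_t, q_t).
Policy = Callable[[float, List[float]], Tuple[float, float]]


def G(x: float, h: float = 0.5, b: float = 2.0) -> float:
    """Holding/backorder cost G(x) = h x^+ + b x^- (hypothetical h, b)."""
    return h * max(x, 0.0) + b * max(-x, 0.0)


def average_profit(policy: Policy, L: int, T: int, c: float = 1.0,
                   seed: int = 0) -> float:
    """Estimate (1/T) * sum_{t=1}^T C_t along one simulated path."""
    rng = random.Random(seed)
    I, pipeline = 0.0, [0.0] * L            # nothing on hand or in transit at t = 1
    total = 0.0
    for _ in range(T):
        p, q = policy(I, pipeline)          # decisions based on (I_t, x_t)
        x1 = pipeline[0]                    # x_{1,t} is delivered
        gamma = rng.uniform(0.5, 1.5)       # mean-one multiplicative noise (hypothetical)
        beta = rng.gauss(0.0, 0.5)          # zero-mean additive noise (hypothetical)
        demand = gamma * (10.0 - p) + beta  # D_t with hypothetical D(p) = 10 - p
        total += p * demand - (c * x1 + G(I + x1 - demand))  # C_t
        I = I + x1 - demand                 # I_{t+1}
        pipeline = pipeline[1:] + [q]       # x_{t+1}
    return total / T


def constant_order_list_price(I: float, pipeline: List[float],
                              x: float = 5.0) -> Tuple[float, float]:
    """Hypothetical rule: always order x; lower the price when stock is high."""
    p = min(max((10.0 - x) - 0.2 * I, 1.0), 9.0)  # D^{-1}(x) shifted by inventory, kept in [1, 9]
    return p, x


print(average_profit(constant_order_list_price, L=4, T=10_000))
```

The pricing rule pulls expected demand above $x$ when inventory is positive and below it when inventory is negative, which keeps the simulated inventory from drifting.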




Literature review

First studied in Whitin (1955)

Optimality of a base-stock list price policy in the zero lead-time setting (Federgruen and Heching (1999))

Extension to the setting with setup costs (e.g., Chen and Simchi-Levi (2004a, 2004b), Yao et al. (2007), Huh and Janakiraman (2008))

Setting with lead times
   Base-stock list price policy is no longer optimal in general
   Curse of dimensionality

“…it remains a significant challenge to incorporate lead time into stochastic models. Indeed, the zero lead time assumption is required for all the multi-period models reviewed here…” (Chen and Simchi-Levi 2012)



Literature review cont.

Selected literature: Thomas (1974), Petruzzi and Dada (1999), Agrawal and Seshadri (2000), Elmaghraby and Keskinocak (2003), Chen, Xu and Zhang (2009), Li, Lim and Rodrigues (2009), Chen, Pang and Pan (2014), Chen, Chao and Ahn (2015), Chen, Chao and Shi (2016)

Bernstein, Li and Shang (2015): positive lead time, focusing on designing effective heuristics



Constant-order policies

First studied in a lost-sales inventory model [Reiman (2004)]

Always order the same quantity, regardless of the on-hand and in-transit inventory

Example: constant-order quantity 100

If on-hand = 0, order 100

If on-hand = 1000, order 100
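The example can be contrasted with a base-stock rule in a few lines. This is an illustrative sketch: the function names, the base-stock level `S = 1000` and the in-transit quantities are hypothetical. The constant-order rule ignores the state entirely, while the base-stock rule reacts to the inventory position (on-hand plus in-transit).

```python
from typing import Sequence


def constant_order(on_hand: float, pipeline: Sequence[float], r: float = 100.0) -> float:
    """Constant-order policy: order r every period, whatever the state."""
    return r


def base_stock(on_hand: float, pipeline: Sequence[float], S: float = 1000.0) -> float:
    """Base-stock policy, for contrast: raise the inventory position up to S."""
    position = on_hand + sum(pipeline)     # on-hand plus in-transit inventory
    return max(S - position, 0.0)


in_transit = [200.0, 200.0]                # hypothetical pipeline, lead time L = 2
for on_hand in (0.0, 1000.0):
    print(on_hand, constant_order(on_hand, in_transit), base_stock(on_hand, in_transit))
# 0.0 100.0 600.0
# 1000.0 100.0 0.0
```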


Performance in a lost-sales model

Can beat base-stock as the lead time grows [Reiman (2004)]

Surprising computational results of [Zipkin (2008)]
   Compared with several heuristics
   Constant-order policy did surprisingly well even when L = 4


Asymptotic optimality

Lost-sales model
   Constant-order is asymptotically optimal as the lead time grows [Goldberg, Katz-Rogozhnikov, Lu, Sharma, Squillante (2016)]
   Exponential convergence [Xin and Goldberg (2016)]

Dual-sourcing model
   Tailored Base-Surge policy (constant-order + base-stock) is asymptotically optimal as the lead-time difference grows [Xin and Goldberg (2017)]




Assumptions

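Assumption

The inverse function $D^{-1}$ of $D$ is continuous and strictly decreasing.

The revenue $d\,D^{-1}(d)$ is a concave function of the expected demand $d$ (writing $d = D(p)$, so that $d\,D^{-1}(d) = p\,D(p)$ is the expected revenue at price $p$).

$d\,D^{-1}(d)$ is Lipschitz continuous with a constant $\kappa > 0$.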
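As a sanity check, consider a hypothetical linear demand curve $D(p) = a - \text{slope}\cdot p$. Its inverse is $D^{-1}(d) = (a - d)/\text{slope}$ and the revenue $d(a - d)/\text{slope}$ is a concave quadratic, so all three assumptions hold with $\kappa = \max_{d \in [d_{\min}, d_{\max}]} |a - 2d|/\text{slope}$. The sketch below verifies this numerically on a grid. The parameters and the function name `check_assumptions` are assumptions for illustration only.

```python
def check_assumptions(a: float = 10.0, slope: float = 1.0, p_min: float = 1.0,
                      p_max: float = 9.0, n: int = 1000):
    """Grid check of the assumptions for a hypothetical D(p) = a - slope * p."""
    D = lambda p: a - slope * p
    D_inv = lambda d: (a - d) / slope          # inverse demand D^{-1}(d)
    revenue = lambda d: d * D_inv(d)           # d * D^{-1}(d) = p * D(p)
    d_min, d_max = D(p_max), D(p_min)
    dx = (d_max - d_min) / n
    grid = [d_min + k * dx for k in range(n + 1)]
    R = [revenue(d) for d in grid]
    # D^{-1} is strictly decreasing on the grid
    decreasing = all(D_inv(grid[k + 1]) < D_inv(grid[k]) for k in range(n))
    # Concavity: every second difference is non-positive
    concave = all(R[k - 1] - 2 * R[k] + R[k + 1] <= 1e-12 for k in range(1, n))
    # Largest secant slope: an empirical lower estimate of the Lipschitz constant
    kappa_hat = max(abs(R[k + 1] - R[k]) / dx for k in range(n))
    return decreasing, concave, kappa_hat


print(check_assumptions())
```

With the default parameters, $[d_{\min}, d_{\max}] = [1, 9]$ and the exact constant is $\kappa = 8$; the grid estimate approaches it from below.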


Constant-order list-price policy

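$d_{\min} \triangleq D(p_{\max})$, $d_{\max} \triangleq D(p_{\min})$

Compute the best constant-order policy, where $\pi_x$ orders the constant quantity $x$ every period and $\pi_p$ is a pricing policy:

$$\max_{x\in[d_{\min},d_{\max}]}\ \max_{\pi_p}\ \underbrace{C(\pi_x,\pi_p)}_{\text{concave in } x}$$

The best constant $x^* \in (d_{\min}, d_{\max})$

Theorem

$$\lim_{L\to\infty}\mathrm{OPT}(L) = \max_{x\in[d_{\min},d_{\max}]}\ \max_{\pi_p} C(\pi_x,\pi_p)$$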







Proof overview

Step I: existence of a steady state (perturbative approaches)

Step II: an upper bound on the optimal value (concavity argument)

Step III: match constant-order to the upper bound (vanishing discount approach)



Step I: existence of a steady-state

Lemma

Without loss of generality, there exists a stationary measure $(I^{L,*}, \chi^{L,*}_1, \ldots, \chi^{L,*}_L)$ of the Markov chain under an optimal stationary policy, and it satisfies

$$\mathrm{OPT}(L) = \mathbb{E}\big[d^{L,*}_1 D^{-1}(d^{L,*}_1)\big] - c\,\mathbb{E}[\chi^{L,*}_1] - \mathbb{E}[G(I^{L,*})].$$


Step II: upper bound of OPT (L)

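Use concavity to obtain an upper bound:

$$(I^{L,*}, \chi^{L,*}_1, \ldots, \chi^{L,*}_L) \;\longrightarrow\; \big(\mathbb{E}[I^{L,*}], \mathbb{E}[\chi^{L,*}_1], \ldots, \mathbb{E}[\chi^{L,*}_L]\big)$$

constant-order!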






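OPT(L) is at most

$$\mathbb{E}\big[d^{L,*}_1 D^{-1}(d^{L,*}_1)\big] - c\,\mathbb{E}[\chi^{L,*}_1] - \mathbb{E}[G(I^{L,*})]
= \frac{1-\alpha}{1-\alpha^L}\sum_{k=1}^{L}\alpha^{k-1}\,\mathbb{E}\Big[d^{L,*}_k D^{-1}(d^{L,*}_k) - c\,\chi^{L,*}_1 - G\Big(I^{L,*} + \sum_{t=1}^{k}\big(\chi^{L,*}_t - \gamma_t d^{L,*}_t - \beta_t\big)\Big)\Big].$$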
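As we read the slide, the equality uses two facts. First, the geometric weights sum to one. Second, under the stationary measure each bracketed term has the same expectation as the right-hand side of the lemma, because the state $k$ periods later is again stationary. The first fact is the finite geometric series:

$$\frac{1-\alpha}{1-\alpha^L}\sum_{k=1}^{L}\alpha^{k-1} = \frac{1-\alpha}{1-\alpha^L}\cdot\frac{1-\alpha^L}{1-\alpha} = 1, \qquad \alpha\in(0,1).$$

Introducing $\alpha$ therefore leaves the value unchanged while producing a discounted sum, which the next step compares with the $\alpha$-discounted value functions $V^\alpha_L$.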


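Applying Jensen’s inequality, OPT(L) is at most

$$\frac{1-\alpha}{1-\alpha^L}\sum_{k=1}^{L}\alpha^{k-1}\,\mathbb{E}\Big[\mathbb{E}[d^{L,*}_k\mid\varepsilon_{[k-1]}]\,D^{-1}\big(\mathbb{E}[d^{L,*}_k\mid\varepsilon_{[k-1]}]\big) - c\,\mathbb{E}[\chi^{L,*}_1] - G\Big(\mathbb{E}[I^{L,*}] + \sum_{t=1}^{k}\big(\mathbb{E}[\chi^{L,*}_t] - \gamma_t\,\mathbb{E}[d^{L,*}_t\mid\varepsilon_{[t-1]}] - \beta_t\big)\Big)\Big].$$

Thus,

$$\mathrm{OPT}(L) \le \frac{1-\alpha}{1-\alpha^L}\max_{S\in[-\bar S,\bar S]} V^\alpha_L(x_L,S) \quad\text{for each } \alpha\in(0,1),$$

and

$$\liminf_{L\to\infty}\mathrm{OPT}(L) \le (1-\alpha)\limsup_{L\to\infty}\max_{S\in[-\bar S,\bar S]}V^\alpha_L(x_L,S) \le (1-\alpha)\max_{S\in[-\bar S,\bar S]}V^\alpha_\infty(x_\infty,S).$$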
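The direction of both Jensen steps can be checked on a toy example. For a concave revenue, averaging the argument first can only increase revenue. For the convex cost $G$, averaging first can only decrease the cost. The mean-state profit is therefore an upper bound on the expected profit. The revenue curve $d(10 - d)$, the cost parameters and the three-point distributions below are hypothetical.

```python
from statistics import mean


def revenue(d: float) -> float:
    """Hypothetical concave revenue d * D^{-1}(d) for D(p) = 10 - p."""
    return d * (10.0 - d)


def G(x: float, h: float = 0.5, b: float = 2.0) -> float:
    """Convex holding/backorder cost h x^+ + b x^-."""
    return h * max(x, 0.0) + b * max(-x, 0.0)


d_vals = [3.0, 5.0, 7.0]    # equally likely expected-demand outcomes (hypothetical)
I_vals = [-2.0, 1.0, 4.0]   # equally likely inventory outcomes (hypothetical)

random_state = mean(revenue(d) for d in d_vals) - mean(G(x) for x in I_vals)
mean_state = revenue(mean(d_vals)) - G(mean(I_vals))   # 25.0 - 0.5 = 24.5
print(random_state, mean_state)  # random_state is the smaller of the two
```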


Step III: match constant-order to the upper bound

Upper bound
   Total discounted profit over an infinite horizon with initial on-hand inventory $S$ and constant order $x_\infty$

Constant-order policy
   Long-run average profit under the best constant-order policy

Connecting the two: convergence of a discounted problem to its long-run average counterpart




Schäl’s conditions

Consider the following MDP with
   state space $S$,
   action spaces $A(s)$ for each $s \in S$,
   probability transition function $q(\cdot \mid s, a)$,
   deterministic and nonnegative single-period cost function $c(s,a)$.

Given a feasible policy $\pi$, a discount factor $\alpha \in (0,1)$, and an initial state $s$, the expected long-run average cost and total discounted cost are denoted by $J^\pi(s)$ and $J^\pi_\alpha(s)$, respectively.






1 $S$ is a locally compact space with a countable base, i.e., there exists a countable collection $\mathcal{B}$ of open sets such that, for every $x \in S$, any open set containing $x$ contains some member of $\mathcal{B}$ that also contains $x$.

2 For each $s \in S$, $A(s)$ is nonempty and compact. Furthermore, $A(\cdot)$ is upper semicontinuous, i.e., for every open set $B \subseteq \mathbb{R}$, the set $\{s : A(s) \subseteq B\}$ is open in $S$.

3 The probability transition function $q : \{(s,a) : a \in A(s)\} \to \mathcal{P}(S)$ is continuous with respect to weak convergence on $\mathcal{P}(S)$, where $\mathcal{P}(S)$ denotes the set of all probability measures on $S$.

4 The single-period cost function $c$ is lower semicontinuous, i.e., $\{(s,a) : c(s,a) > \gamma\}$ is an open set for all $\gamma \in \mathbb{R}$.

5 There exist a policy $\pi$ and an initial state $s \in S$ such that $J^\pi(s) < \infty$.

6 $\sup_{\alpha<1}\big(\inf_\pi J^\pi_\alpha(s) - \inf_{s'\in S}\inf_\pi J^\pi_\alpha(s')\big) < \infty$ for all $s \in S$.





Vanishing discount approach

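Theorem (Schäl 1993)

Under the above conditions, there exists an optimal stationary policy $\pi^*$ such that for all $s \in S$,

$$J^{\pi^*}(s) = \inf_{s'\in S}\inf_{\pi} J^{\pi}(s') = \lim_{\alpha\uparrow 1}\Big[(1-\alpha)\inf_{s'\in S}\inf_{\pi} J^{\pi}_\alpha(s')\Big].$$

In our setting, we need to prove

$$\liminf_{\alpha\uparrow 1}\Big[(1-\alpha)\max_{S\in[-\bar S,\bar S]} V^\alpha_\infty(x_\infty,S)\Big] = C(\pi_{x_\infty}).$$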
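Schäl's theorem generalizes a classical fact about finite Markov chains. For a fixed stationary policy, $(1-\alpha)V_\alpha(s)$ converges to the long-run average $g$ as $\alpha \uparrow 1$, for every starting state $s$. The sketch below checks this numerically for a hypothetical three-state chain whose transition matrix and rewards are made up for illustration. It involves no optimization, so it illustrates only the limit, not the existence of an optimal stationary policy.

```python
import numpy as np

# Hypothetical 3-state chain induced by some fixed stationary policy.
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])
r = np.array([1.0, 3.0, -2.0])   # one-period profit in each state


def discounted_value(alpha: float) -> np.ndarray:
    """V_alpha solves V = r + alpha * P V, i.e. (I - alpha P) V = r."""
    return np.linalg.solve(np.eye(len(r)) - alpha * P, r)


def stationary_distribution(P: np.ndarray) -> np.ndarray:
    """Solve pi P = pi together with sum(pi) = 1."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return pi


g = stationary_distribution(P) @ r   # long-run average profit, same for every start state
for alpha in (0.9, 0.99, 0.999):
    # (1 - alpha) V_alpha(s) approaches g in every state s as alpha -> 1
    print(alpha, (1 - alpha) * discounted_value(alpha), g)
```

For this chain the stationary distribution is $(8/15, 4/15, 1/5)$, so $g = 14/15$. The gap $(1-\alpha)V_\alpha - g$ shrinks roughly in proportion to $1-\alpha$.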


Verifying Conditions

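It suffices to verify the condition

$$\sup_{\alpha\in(0,1)}\Big[\max_{S'\in\mathbb{R}} V^\alpha_\infty(x_\infty,S') - V^\alpha_\infty(x_\infty,S)\Big] < \infty.$$

Assume $S^*_\alpha$ solves $\max_{S'\in\mathbb{R}} V^\alpha_\infty(x_\infty,S')$ with an optimal policy $\pi^*$. In the inventory system starting with $S^*_\alpha$ and following policy $\pi^*$,

$$I^*_n = I^*_{n-1} + x_\infty - \gamma_n d^*_n - \beta_n.$$

For the system starting with $S$, we want to construct a policy $\pi$ that tracks $I^*$ so that the profit difference is bounded (independently of $\alpha$):

$$I_n = I_{n-1} + x_\infty - \gamma_n d_n - \beta_n.$$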


Mapping to Random Yield Model with Capacity Constraint

By treating supply as “demand” and demand as “supply”, the pricing and inventory control problem becomes a random yield model with a capacity constraint on orders:

$$I_n = I_{n-1} + \gamma_n d_n - (x_\infty - \beta_n).$$

Federgruen and Yang (2014) address the infinite-horizon random yield model without a capacity constraint using the vanishing discount approach
   There, $|I_n - I^*_n|$ decreases geometrically
   The idea cannot be extended to the case with a capacity constraint

For the infinite-horizon random yield model with a capacity constraint, we
   order to capacity when inventory is too low (uniformly in $\alpha$)
   carefully bound the cost difference
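One way to see the mapping is the relabeling $Y_n = -I_n$. This is our reading; the slide does not spell it out. Negating the inventory turns the constant inflow into a "demand" $x_\infty - \beta_n$. It also turns the price-controlled outflow $\gamma_n d_n$ into a random-yield "order" with yield $\gamma_n$ and order size $d_n \in [d_{\min}, d_{\max}]$, so $d_{\max}$ acts as the order capacity. The sketch runs both recursions on the same random draws. The expected-demand levels, the noise distributions and the name `coupled_paths` are hypothetical.

```python
import random
from typing import List, Tuple


def coupled_paths(x_inf: float = 5.0, T: int = 50,
                  seed: int = 0) -> Tuple[List[float], List[float]]:
    """Run both recursions on the same noise; Y starts at -I (hypothetical parameters)."""
    rng = random.Random(seed)
    I, Y = 3.0, -3.0
    I_path, Y_path = [I], [Y]
    for _ in range(T):
        gamma = rng.uniform(0.5, 1.5)       # multiplicative noise plays the role of random yield
        beta = rng.gauss(0.0, 0.5)          # additive noise moves to the "demand" side
        d = rng.uniform(2.0, 8.0)           # expected demand = order size in [d_min, d_max]
        I = I + x_inf - gamma * d - beta    # pricing and inventory model
        Y = Y + gamma * d - (x_inf - beta)  # random yield model with order capacity d_max
        I_path.append(I)
        Y_path.append(Y)
    return I_path, Y_path


I_path, Y_path = coupled_paths()
print(max(abs(i + y) for i, y in zip(I_path, Y_path)))  # zero up to rounding, since Y_n = -I_n
```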





Conclusion

Main contribution
   Establish asymptotic optimality of constant-order policies for joint pricing and inventory control with large replenishment lead times




Future research directions

Other inventory models
   Asymptotic optimality of constant-order policies does not hold universally
   Counterexample: single-sourcing backlogged model

Open problems
   Fixed ordering costs
   General MDPs
   …














Extension and Implications

Finite-state, finite-action MDPs
   First-type decisions: take effect right away
   Second-type decisions: take effect after a long lead time

Claim: open-loop control is asymptotically optimal for the second-type decisions

Implications for data-driven models
   Ignore real-time information when making the second-type decisions
   Other examples?




















