Mathematical Institute, University of Oxford
Stochastic optimal control for a pricing problem
Asbjørn Nilsen Riseth
Supervisors: Jeff Dewynne, Chris Farmer
November 25, 2016
EPSRC Centre for Doctoral Training in Industrially Focused Mathematical Modelling
Pricing challenge
Given
- some initial stock of different products,
- a termination time,
maximise revenue and minimise the cost of unsold items.
Overview
Formulate mathematically: stochastic optimal control
- Computationally intractable in practice
Solution for a one-product system
Investigate approximation technique
- Tractable
- Better more than half the time
Problem formulation
Dynamical system
- Remaining stock: $S_t \ge 0$
- Price process: $\alpha_t \in A$
- Demand forecast: $q(a) \ge 0$
- Exogenous information: (i.i.d.) $W_t \ge 0$
Realised sales over a given period:
$$Q(s, a, w) = \min(q(a)\,w,\; s) \tag{1}$$
Evolution of stock:
$$S^{\alpha}_{t+1} = S^{\alpha}_{t} - Q(S^{\alpha}_{t}, \alpha_t, W_{t+1}), \quad t = 0, \dots, T-1. \tag{2}$$
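Equations (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration, not code from the project: the function names and the linear demand forecast used in the usage lines are hypothetical.

```python
def realised_sales(s, a, w, q):
    """Equation (1): sales are forecast demand q(a) scaled by w, capped by stock s."""
    return min(q(a) * w, s)


def step_stock(s, a, w, q):
    """Equation (2): next-period stock after the realised sales are removed."""
    return s - realised_sales(s, a, w, q)


# Hypothetical demand forecast, used only for this illustration.
def linear_demand(a):
    return 1.0 - a


# Demand 0.5 at price 0.5; stock 0.2 caps the sales, stock 1.0 does not.
capped = realised_sales(0.2, 0.5, 1.0, linear_demand)
remaining = step_stock(1.0, 0.5, 1.0, linear_demand)
```

The cap in `realised_sales` is what keeps the stock non-negative along any price path.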
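Objective
Revenue from time $t \to t+1$:
$$\alpha_t \cdot Q(S^{\alpha}_{t}, \alpha_t, W_{t+1}) \tag{3}$$
Cost to handle unsold stock:
$$C \cdot S^{\alpha}_{T}, \quad C \ge 0. \tag{4}$$
Find pricing strategy $\alpha$ to maximise profit:
$$P^{\alpha} = \sum_{t=0}^{T-1} \left[\alpha_t \cdot Q(S^{\alpha}_{t}, \alpha_t, W_{t+1})\right] - C S^{\alpha}_{T} \tag{5}$$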
$P^{\alpha}$ is a random variable
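Definition (Control problem)
Given initial stock $s > 0$ and a cost per unit of unsold stock $C \ge 0$, find the pricing strategy $\alpha \in A$ that maximises the expected profit,
$$\max_{\alpha \in A} \mathbb{E}_W\left[ \sum_{t=0}^{T-1} \left[\alpha_t \cdot Q(S^{\alpha}_{t}, \alpha_t, W_{t+1})\right] - C \cdot S^{\alpha}_{T} \,\middle|\, S^{\alpha}_{0} = s \right] \tag{6}$$
Infinite-dimensional optimisation problem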
Dynamic programming [1]
Bellman equation
Solve the problem recursively, backwards in time from $t = T$.
Finds the optimal function $a^B(t, s)$:
- What price to set at time $t$, given remaining stock $s$
Call the optimal pricing process $\alpha^B$. Given an event $\omega$ from the underlying probability space,
$$\alpha^B_t(\omega) = a^B\bigl(t, S^{\alpha^B}_t(\omega)\bigr). \tag{7}$$
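For the profit in (5), the backward recursion can be written out as below. The value function symbol $V$ is notation introduced here for illustration and is not used elsewhere in the slides; $V(t, s)$ is the best expected profit obtainable from time $t$ onwards with remaining stock $s$.

```latex
\begin{align*}
V(T, s) &= -C\,s, \\
V(t, s) &= \max_{a \in A} \mathbb{E}_W\Bigl[\, a\,Q(s, a, W_{t+1})
           + V\bigl(t+1,\; s - Q(s, a, W_{t+1})\bigr) \Bigr],
           \qquad t = T-1, \dots, 0, \\
a^B(t, s) &\in \operatorname*{arg\,max}_{a \in A} \mathbb{E}_W\Bigl[\, a\,Q(s, a, W_{t+1})
           + V\bigl(t+1,\; s - Q(s, a, W_{t+1})\bigr) \Bigr].
\end{align*}
```

The terminal condition is the handling cost (4), and each step adds the one-period revenue (3) to the value of the stock left over after (2).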
Example system
$$A = [0, 1], \qquad q(a) = \tfrac{1}{3} e^{2 - 3a}, \qquad C = 1 \tag{8}$$
$$T = 3, \qquad W_t \sim N(1, \gamma^2), \qquad \gamma = 0.05 \tag{9}$$
1. Solved Bellman equation recursively
2. Let's plot $a^B(t, s)$
3. Then investigate the distribution of $P^{\alpha^B}$
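One way to carry out step 1 numerically is sketched below for the parameters in (8) and (9). The discretisation is an assumption of this sketch, not the method used for the slides: stock and price are put on uniform grids, the expectation over $W$ uses Gauss-Hermite quadrature, and the next-step value is linearly interpolated. The function name and grid sizes are hypothetical.

```python
import numpy as np


def q(a):
    """Demand forecast from (8)."""
    return np.exp(2.0 - 3.0 * a) / 3.0


def solve_bellman(T=3, C=1.0, gamma=0.05, n_s=101, n_a=201, n_w=9):
    """Backward recursion on a grid; returns stock grid, policy table and V(0, s)."""
    s_grid = np.linspace(0.0, 1.0, n_s)   # stock levels (assumed range [0, 1])
    a_grid = np.linspace(0.0, 1.0, n_a)   # prices in A = [0, 1]
    # Quadrature nodes and weights for a standard normal, mapped to N(1, gamma^2).
    x, wts = np.polynomial.hermite_e.hermegauss(n_w)
    w_nodes = np.maximum(1.0 + gamma * x, 0.0)
    w_prob = wts / wts.sum()

    s = s_grid[:, None, None]
    a = a_grid[None, :, None]
    w = w_nodes[None, None, :]
    sales = np.minimum(q(a) * w, s)        # equation (1), shape (n_s, n_a, n_w)
    s_next = s - sales                     # equation (2)

    V = -C * s_grid                        # terminal value: handling cost (4)
    policy = np.zeros((T, n_s))
    for t in range(T - 1, -1, -1):
        # Interpolate the next-step value at every (stock, price, noise) triple.
        cont = np.interp(s_next.ravel(), s_grid, V).reshape(s_next.shape)
        value = ((a * sales + cont) * w_prob).sum(axis=2)
        best = value.argmax(axis=1)
        policy[t] = a_grid[best]           # at s = 0 every price ties; index 0 is kept
        V = value[np.arange(n_s), best]
    return s_grid, policy, V


s_grid, policy, V0 = solve_bellman()
```

Row `policy[t]` approximates $a^B(t, \cdot)$ on `s_grid`, so plotting each row against `s_grid` gives the kind of policy plot described in step 2.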
[Figure: Policy function. $a^B(t, s)$ plotted against remaining stock $s$, for $t = 0$, $t = 1$ and $t = 2$.]
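Initial stock $S_0 = 1$.
The profit following policy $\alpha$ is a random variable
$$P^{\alpha} = \sum_{t=0}^{T-1} \left[\alpha_t \cdot Q(S^{\alpha}_{t}, \alpha_t, W_{t+1})\right] - C \cdot S^{\alpha}_{T} \tag{10}$$
Simulate the system when using the Bellman policy $\alpha^B$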
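A simulation of (10) under a feedback policy might look like the sketch below, using the example parameters (8) and (9). The function names are hypothetical, and truncating $W$ at zero is an assumption made to respect $W_t \ge 0$. The constant-price policy in the usage line is only a stand-in; a Bellman policy table would be plugged in as `policy(t, s)` instead.

```python
import math

import numpy as np


def q(a):
    """Demand forecast from (8)."""
    return math.exp(2.0 - 3.0 * a) / 3.0


def simulate_profit(policy, w_path, s0=1.0, C=1.0):
    """One realisation of the profit (10) for a feedback policy a = policy(t, s)."""
    s, profit = s0, 0.0
    for t, w in enumerate(w_path):
        a = policy(t, s)
        sales = min(q(a) * w, s)   # equation (1)
        profit += a * sales        # revenue (3)
        s -= sales                 # stock update (2)
    return profit - C * s          # subtract handling cost (4)


def sample_profits(policy, n_sim=1000, T=3, gamma=0.05, seed=0):
    """Monte Carlo sample of the profit, with W_t ~ N(1, gamma^2) truncated at 0."""
    rng = np.random.default_rng(seed)
    w = np.maximum(rng.normal(1.0, gamma, size=(n_sim, T)), 0.0)
    return np.array([simulate_profit(policy, row) for row in w])


# Stand-in policy for illustration only: charge 2/3 regardless of time and stock.
profits = sample_profits(lambda t, s: 2.0 / 3.0)
```

A histogram of `profits` then shows the spread of the realised profit for whichever policy was supplied.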
[Figure: Realised profit. Histogram (count) of $P^{\alpha^B}$, with values between roughly 0.58 and 0.68.]
Suboptimal policies
Bellman computationally intractable
Practical applications, curse of dimensionality:
- Hundreds of products
- Unobserved parameters
- Business goals and constraints change over time
Tractable, suboptimal approximations
Certainty Equivalent Control policy:
- Classic, constrained optimisation problem
- Separate parameter estimation and optimisation
- Easier software development
Certainty Equivalent Control policy
Assume the system is deterministic
Algorithm
For each decision point $t = 0, \dots, T-1$:
1. Observe remaining stock $s$.
2. Create a point estimate $w_{t+1}, \dots, w_T$ of $(W)_{t+1}^{T}$.
3. Solve the optimisation problem
$$\max_{a \in A^{T-t}} \left\{ \sum_{\tau=t}^{T-1} a_\tau\, Q(S^{a}_{\tau}, a_\tau, w_{\tau+1}) - C S^{a}_{T} \right\}, \quad \text{s.t. } S^{a}_{t} = s. \tag{11}$$
4. Implement the price $a_t \in A$ given by the maximiser above.
5. Discard the decisions $a_{t+1}, \dots, a_{T-1}$.
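The steps above can be sketched as follows for the example system (8) and (9). Two choices here are assumptions of the sketch rather than part of the algorithm: the point estimate is taken to be the mean of $W$ (equal to 1), and problem (11) is solved by brute-force search over a price grid, which is only feasible because $T - t \le 3$. The function names are hypothetical.

```python
import itertools
import math


def q(a):
    """Demand forecast from (8)."""
    return math.exp(2.0 - 3.0 * a) / 3.0


def deterministic_profit(prices, s, w_hat, C):
    """Objective of (11): profit along the path driven by the point estimates."""
    profit = 0.0
    for a, w in zip(prices, w_hat):
        sales = min(q(a) * w, s)
        profit += a * sales
        s -= sales
    return profit - C * s


def cec_price(t, s, T=3, C=1.0, w_mean=1.0, n_a=41):
    """Price to charge at time t with stock s under certainty equivalent control."""
    a_grid = [i / (n_a - 1) for i in range(n_a)]   # price grid on A = [0, 1]
    w_hat = [w_mean] * (T - t)                     # step 2: point estimate (assumed mean)
    # Step 3: exhaustive search over all price sequences of length T - t.
    best = max(
        itertools.product(a_grid, repeat=T - t),
        key=lambda prices: deterministic_profit(prices, s, w_hat, C),
    )
    return best[0]                                 # steps 4 and 5: keep a_t only


price_now = cec_price(t=0, s=1.0)
```

Because the function is called again at every decision point with the newly observed stock, the resulting policy still reacts to the realised noise even though each optimisation ignores it.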
Performance comparison
Two policies
- $\alpha^B$: Decides prices based on $a^B(t, S^{\alpha^B}_t)$
- $\alpha^C$: Decides prices based on deterministic optimisation
Two profit outcomes
Random variables:
- $P^{\alpha^B}$: Profit using optimal pricing policy
- $P^{\alpha^C}$: Profit using suboptimal pricing policy
Simulate strategies 1000 times
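Given the simulated profits, the comparison reduces to two summary numbers: the mean of $P^{\alpha^B} - P^{\alpha^C}$ and the share of runs in which the suboptimal policy comes out ahead. The sketch below assumes the two arrays are paired, that is, entry $i$ of each was produced with the same realisation of $W$; the function name and the toy input values are hypothetical and are not simulation results.

```python
import numpy as np


def compare_profits(p_b, p_c):
    """Summarise paired profit samples from the two policies."""
    diff = np.asarray(p_b, dtype=float) - np.asarray(p_c, dtype=float)
    return {
        "mean_diff": float(diff.mean()),             # average of P^B - P^C
        "frac_c_better": float((diff < 0).mean()),   # share of runs where C wins
    }


# Toy numbers purely to show the call; not output from the pricing model.
summary = compare_profits([1.0, 2.0, 3.0], [0.5, 2.5, 3.0])
```

Pairing the samples matters: comparing run by run removes the noise that is common to both policies, so small differences in profit become visible.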
[Figure: Simulations of Bellman and CEC policies. Histogram (count) of $P^{\alpha^B} - P^{\alpha^C}$, with the horizontal axis running from $-0.5 \cdot 10^{-2}$ to $2 \cdot 10^{-2}$.]
Bellman policy has best average outcome
Certainty Equivalent policy is better 50% of the time
Conclusion
Project introduction
- Simple, model pricing problem
- Can find optimal policy by solving Bellman equation
My work
- Industry setting: Bellman intractable
- Looking at algorithms to find suboptimal policies. Trade-off:
  - Computational time
  - Software cost
  - Degree of suboptimality
References
[1] D. P. Bertsekas. Dynamic Programming and Optimal Control, volume 1. Athena Scientific, Belmont, MA, third edition, 2005.