Mathematical Institute, University of Oxford

Stochastic optimal control for a pricing problem

Asbjørn Nilsen Riseth
Supervisors: Jeff Dewynne, Chris Farmer

November 25, 2016

EPSRC Centre for Doctoral Training in Industrially Focused Mathematical Modelling


Pricing challenge

Given
- some initial stock of different products,
- a termination time.

Maximise revenue and minimise cost of unsold items.



Overview

Formulate mathematically: stochastic optimal control
- Computationally intractable in practice
- Solution for a one-product system

Investigate approximation technique
- Tractable
- Better more than half the time



Problem formulation

Dynamical system
- Remaining stock: $S_t \ge 0$
- Price process: $\alpha_t \in A$
- Demand forecast: $q(a) \ge 0$
- Exogenous information: (i.i.d.) $W_t \ge 0$

Realised sales over a given period:

$$ Q(s, a, w) = \min(q(a)\, w,\; s) \tag{1} $$

Evolution of stock:

$$ S^\alpha_{t+1} = S^\alpha_t - Q(S^\alpha_t, \alpha_t, W_{t+1}), \qquad t = 0, \dots, T - 1. \tag{2} $$




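Objective

Revenue from time $t \to t + 1$:

$$ \alpha_t \cdot Q(S^\alpha_t, \alpha_t, W_{t+1}) \tag{3} $$

Cost to handle unsold stock:

$$ C \cdot S^\alpha_T, \qquad C \ge 0. \tag{4} $$

Find pricing strategy $\alpha$ to maximise profit:

$$ P^\alpha = \sum_{t=0}^{T-1} \left[ \alpha_t \cdot Q(S^\alpha_t, \alpha_t, W_{t+1}) \right] - C S^\alpha_T \tag{5} $$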
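Equations (1) to (5) can be traced for a single scenario with a few lines of code. The sketch below is illustrative only: the function names, the linear demand forecast and the uniform shocks are assumptions made for the example and are not taken from the slides.

```python
import random


def realised_sales(s, a, w, q):
    """Equation (1): Q(s, a, w) = min(q(a) * w, s)."""
    return min(q(a) * w, s)


def realised_profit(prices, shocks, s0, q, C):
    """Profit (5) for one price sequence and one draw of W_1, ..., W_T."""
    s, revenue = s0, 0.0
    for a, w in zip(prices, shocks):
        sold = realised_sales(s, a, w, q)
        revenue += a * sold   # revenue over one period, equation (3)
        s -= sold             # stock evolution, equation (2)
    return revenue - C * s    # subtract the cost of unsold stock, equation (4)


# Hypothetical inputs, chosen only to make the example run.
demand = lambda a: 1.0 - a                            # assumed demand forecast q(a)
rng = random.Random(0)
shocks = [rng.uniform(0.9, 1.1) for _ in range(3)]    # assumed draws of W
print(realised_profit([0.5, 0.5, 0.5], shocks, s0=1.0, q=demand, C=1.0))
```

Calling `realised_profit` again with a fresh draw of `shocks` generally returns a different number for the same prices, so the profit of a fixed strategy is itself random.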



$P^\alpha$ is a random variable




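Definition (Control problem)
Given initial stock $s > 0$ and a cost per unit of unsold stock $C \ge 0$, find the pricing strategy $\alpha \in A$ that maximises the expected profit,

$$ \max_{\alpha \in A} \; \mathbb{E}_W \left[ \sum_{t=0}^{T-1} \left[ \alpha_t \cdot Q(S^\alpha_t, \alpha_t, W_{t+1}) \right] - C \cdot S^\alpha_T \;\middle|\; S^\alpha_0 = s \right] \tag{6} $$

Infinite-dimensional optimisation problem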





Dynamic programming [1]

Bellman equation
Solve the problem recursively, backwards in time from $t = T$.
Finds the optimal function $a^B(t, s)$:
- What price to set at time $t$, given remaining stock $s$

Call the optimal pricing process $\alpha^B$. Given an event $\omega$ from the underlying probability space,

$$ \alpha^B_t(\omega) = a^B\bigl(t, S^{\alpha^B}_t(\omega)\bigr). \tag{7} $$
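The backward recursion can be written out explicitly for this model. The following is a sketch of the standard finite-horizon Bellman equation applied to (1), (2) and (6). The value function symbol $V$ is notation introduced here and does not appear on the slides.

```latex
\begin{align*}
V(T, s) &= -C s, \\
V(t, s) &= \max_{a \in A} \; \mathbb{E}_W \Bigl[ a \, Q(s, a, W_{t+1})
           + V\bigl(t + 1,\; s - Q(s, a, W_{t+1})\bigr) \Bigr],
           \qquad t = T - 1, \dots, 0, \\
a^B(t, s) &\in \operatorname*{arg\,max}_{a \in A} \; \mathbb{E}_W \Bigl[ a \, Q(s, a, W_{t+1})
           + V\bigl(t + 1,\; s - Q(s, a, W_{t+1})\bigr) \Bigr].
\end{align*}
```

Under this reading, $V(0, s)$ is the optimal expected profit in (6) for initial stock $s$.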



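Example system

$$ A = [0, 1], \qquad q(a) = \tfrac{1}{3} e^{2 - 3a}, \qquad C = 1 \tag{8} $$

$$ T = 3, \qquad W_t \sim \mathcal{N}(1, \gamma^2), \qquad \gamma = 0.05 \tag{9} $$

1. Solved Bellman equation recursively
2. Let's plot $a^B(t, s)$
3. Then investigate the distribution of $P^{\alpha^B}$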
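Step 1 can be sketched on a grid. The code below is a minimal illustration, not the solver used for the talk. It assumes the demand forecast in (8) reads $q(a) = e^{2-3a}/3$, discretises stock and price on uniform grids, replaces the expectation over $W$ with an average over equal-probability quantiles of $\mathcal{N}(1, \gamma^2)$, and interpolates the value function linearly. The grid sizes and all function names are choices made for this example.

```python
import math
from statistics import NormalDist

T, C, GAMMA = 3, 1.0, 0.05   # parameters from (8) and (9)


def q(a):
    """Demand forecast, read here as q(a) = exp(2 - 3a) / 3 (assumption)."""
    return math.exp(2.0 - 3.0 * a) / 3.0


def sales(s, a, w):
    """Realised sales Q(s, a, w) = min(q(a) * w, s), equation (1)."""
    return min(q(a) * w, s)


def interp(s, values):
    """Linear interpolation of grid values on a uniform grid over [0, 1]."""
    x = s * (len(values) - 1)
    i = min(int(x), len(values) - 2)
    frac = x - i
    return (1.0 - frac) * values[i] + frac * values[i + 1]


def solve_bellman(n_s=51, n_a=101, n_w=9):
    """Backward recursion from t = T; returns stock grid, policy and V(0, .)."""
    s_grid = [i / (n_s - 1) for i in range(n_s)]
    a_grid = [j / (n_a - 1) for j in range(n_a)]
    dist = NormalDist(mu=1.0, sigma=GAMMA)
    # Equal-probability quantile nodes stand in for the distribution of W.
    w_nodes = [dist.inv_cdf((k + 0.5) / n_w) for k in range(n_w)]

    value = [-C * s for s in s_grid]   # terminal condition: cost of unsold stock
    policy = [None] * T                # policy[t][i] approximates a^B(t, s_grid[i])
    for t in reversed(range(T)):
        new_value, prices = [], []
        for s in s_grid:
            best_v, best_a = -math.inf, a_grid[0]
            for a in a_grid:
                v = 0.0
                for w in w_nodes:
                    sold = sales(s, a, w)
                    v += a * sold + interp(s - sold, value)
                v /= n_w
                if v > best_v:
                    best_v, best_a = v, a
            new_value.append(best_v)
            prices.append(best_a)
        value, policy[t] = new_value, prices
    return s_grid, policy, value


s_grid, policy, value = solve_bellman()
print("approximate price at t = 0, s = 1:", policy[0][-1])
print("approximate expected profit from s = 1:", value[-1])
```

The triple loop is affordable only because there is one product and one state variable. With many products the stock grid grows exponentially in the number of products.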



[Figure: Policy function. $a^B(t, s)$ plotted against remaining stock $s \in [0, 1]$, one curve each for $t = 0$, $t = 1$, $t = 2$.]


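Initial stock $S_0 = 1$.
The profit following policy $\alpha$ is a random variable

$$ P^\alpha = \sum_{t=0}^{T-1} \left[ \alpha_t \cdot Q(S^\alpha_t, \alpha_t, W_{t+1}) \right] - C S^\alpha_T \tag{10} $$

Simulate the system when using the Bellman policy $\alpha^B$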



[Figure: Realised profit. Histogram of $P^{\alpha^B}$ (horizontal axis, roughly 0.58 to 0.68) against count.]


Suboptimal policies

Bellman computationally intractable
Practical applications, curse of dimensionality:
- Hundreds of products
- Unobserved parameters
- Business goals and constraints change over time

Tractable, suboptimal approximations
Certainty Equivalent Control policy:
- Classic, constrained optimisation problem
- Separate parameter estimation and optimisation
- Easier software development



Certainty Equivalent Control policy

Assume the system is deterministic

Algorithm
For each decision point $t = 0, \dots, T - 1$:

1. Observe remaining stock $s$.
2. Create a point estimate $w_{t+1}, \dots, w_T$ of $(W)_{t+1}^{T}$.
3. Solve the optimisation problem

$$ \max_{a \in A^{T-t}} \left\{ \sum_{\tau=t}^{T-1} a_\tau Q(S^a_\tau, a_\tau, w_{\tau+1}) - C S^a_T \right\}, \qquad \text{s.t. } S^a_t = s. \tag{11} $$

4. Implement the price corresponding to the maximiser $a_t \in A$ above.
5. Discard the decisions $a_{t+1}, \dots, a_{T-1}$.
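The five steps can be sketched for the one-product example. This is an illustration under stated assumptions rather than the implementation behind the results: the point estimate is taken to be the mean $w = 1$, the demand forecast is read as $q(a) = e^{2-3a}/3$, and a brute-force search over a coarse price grid stands in for a proper constrained optimiser. The names `cec_price` and `simulate_cec` are hypothetical.

```python
import math
import random
from itertools import product

T, C, GAMMA = 3, 1.0, 0.05
A_GRID = [j / 40 for j in range(41)]   # coarse price grid on A = [0, 1] (assumption)


def q(a):
    """Demand forecast, read here as q(a) = exp(2 - 3a) / 3 (assumption)."""
    return math.exp(2.0 - 3.0 * a) / 3.0


def deterministic_profit(s, prices, w_hat=1.0):
    """Objective of (11) with every future W replaced by the estimate w_hat."""
    profit = 0.0
    for a in prices:
        sold = min(q(a) * w_hat, s)
        profit += a * sold
        s -= sold
    return profit - C * s


def cec_price(t, s, w_hat=1.0):
    """Steps 2 to 5: optimise prices for periods t, ..., T-1, keep only the first."""
    best = max(product(A_GRID, repeat=T - t),
               key=lambda prices: deterministic_profit(s, prices, w_hat))
    return best[0]


def simulate_cec(s0=1.0, rng=None):
    """One realisation of the profit when prices are re-optimised every period."""
    rng = rng or random.Random(0)
    s, profit = s0, 0.0
    for t in range(T):
        a = cec_price(t, s)            # step 1: observe s, then steps 2 to 5
        w = rng.gauss(1.0, GAMMA)      # the realised shock differs from w_hat
        sold = min(q(a) * w, s)
        profit += a * sold
        s -= sold
    return profit - C * s


print("first-period CEC price from s = 1:", cec_price(0, 1.0))
print("one simulated CEC profit:", simulate_cec())
```

Re-solving at every decision point is what lets the policy react to the realised stock, even though each individual optimisation ignores uncertainty.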



Performance comparison

Two policies
- $\alpha^B$: Decides prices based on $a^B(t, S^{\alpha^B}_t)$
- $\alpha^C$: Decides prices based on deterministic optimisation

Two profit outcomes
Random variables:
- $P^{\alpha^B}$: Profit using optimal pricing policy
- $P^{\alpha^C}$: Profit using suboptimal pricing policy

Simulate strategies 1000 times




[Figure: Simulations of Bellman and CEC policies. Histogram of $P^{\alpha^B} - P^{\alpha^C}$ (horizontal axis, roughly $-0.5 \cdot 10^{-2}$ to $2 \cdot 10^{-2}$) against count.]



Bellman policy has best average outcome

Certainty Equivalent policy is better 50% of the time



Conclusion

Project introduction
- Simple, model pricing problem
- Can find optimal policy by solving Bellman equation

My work
- Industry setting: Bellman intractable
- Looking at algorithms to find suboptimal policies. Trade-off:
  - Computational time
  - Software cost
  - Degree of suboptimality



References

[1] D. P. Bertsekas. Dynamic Programming and Optimal Control, volume 1. Athena Scientific, Belmont, MA, third edition, 2005.

