CMAP, Ecole Polytechnique

C. B.

Control problems, PDEs and BSDEs

Stochastic optimal control problems

Non-linear Feynman-Kac formula and BSDEs

Approximation of the solution with neural networks

Control problems in discrete time

Semilinear parabolic PDEs and FBSDEs

Numerical applications

Ongoing work

Deep neural networks for BSDEs and Control problems

Cyril Bénézet

Based on works by Han, Jentzen and E and by Han and E, and on work in progress with M. Allouche

February 27, Machine Learning Journal Club @ CMAP, Ecole Polytechnique



Outline

• Motivation: stochastic optimal control problems and PDEs

• Approximation with deep neural networks

• Application and numerical results

• Ongoing work on new applications



Control problems, PDEs and BSDEs



Control problems in discrete time

• Setup of "Deep Learning Approximation for Stochastic Control Problems" by Han and E.

• Let $(\Omega, \mathcal{F}, \mathbb{F} = (\mathcal{F}_t)_{t=0}^{T})$ be a filtered probability space with $\mathcal{F} = \mathcal{F}_T$.

• For $t = 1, \dots, T$, let $\xi_t$ be an $\mathcal{F}_t$-measurable random variable.

• At each time $t = 0, \dots, T-1$, the agent chooses an $\mathcal{F}_t$-measurable action $a_t$ (potentially subject to constraints).

• The state process evolves according to

$$s_{t+1} = s_t + b(s_t, a_t) + \xi_{t+1}, \quad t = 0, \dots, T-1.$$

• The agent aims for minimal expected cost:

$$V(0, x) = \inf_{a = (a_0, \dots, a_{T-1})} \mathbb{E}_{(0,x)}\left[\sum_{t=0}^{T-1} c_t(s_t, a_t) + c_T(s_T)\right], \tag{1}$$

where $\mathbb{E}_{(0,x)}$ means that $s_0 = x$ a.s.

• The functions $c_t$, $t = 0, \dots, T-1$, are the running costs and $c_T$ is the terminal cost.
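The state recursion and the expected cost (1) can be estimated by plain Monte Carlo once a feedback rule is fixed. The following is a minimal sketch under stated assumptions: the scalar state, the drift `b(s, a) = a`, the quadratic costs, and the Gaussian noise are illustrative choices, and the name `expected_cost` is hypothetical (none of these come from Han and E).

```python
import numpy as np

def expected_cost(policy, b, running_cost, terminal_cost, x0, T, noise, rng,
                  n_paths=10_000):
    """Monte Carlo estimate of E[sum_t c_t(s_t, a_t) + c_T(s_T)] with s_0 = x0."""
    s = np.full(n_paths, float(x0))
    total = np.zeros(n_paths)
    for t in range(T):
        a = policy(t, s)                       # feedback action a_t = policy(t, s_t)
        total += running_cost(t, s, a)         # accumulate c_t(s_t, a_t)
        s = s + b(s, a) + noise(rng, n_paths)  # s_{t+1} = s_t + b(s_t, a_t) + xi_{t+1}
    return float(np.mean(total + terminal_cost(s)))

# Toy model (assumption): b(s, a) = a, c_t = a^2, c_T = s^2, xi ~ N(0, 0.1^2).
toy = dict(
    b=lambda s, a: a,
    running_cost=lambda t, s, a: a**2,
    terminal_cost=lambda s: s**2,
    x0=1.0,
    T=5,
    noise=lambda rng, n: 0.1 * rng.standard_normal(n),
)

# Doing nothing versus pulling the state halfway towards 0 at every step.
cost_zero = expected_cost(lambda t, s: np.zeros_like(s),
                          rng=np.random.default_rng(0), **toy)
cost_feedback = expected_cost(lambda t, s: -0.5 * s,
                              rng=np.random.default_rng(0), **toy)
```

Comparing `cost_zero` and `cost_feedback` shows how the choice of control changes the objective; the infimum in (1) is taken over all admissible controls, not just these two.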



Usual approach: DPP

• For $t_0 = 0, \dots, T$, consider the problem starting at time $t_0$:

$$V(t_0, x) = \inf_{a = (a_{t_0}, \dots, a_{T-1})} \mathbb{E}_{(t_0,x)}\left[\sum_{t=t_0}^{T-1} c_t(s_t, a_t) + c_T(s_T)\right],$$

where $\mathbb{E}_{(t,x)}$ means that $s_t = x$ a.s.

• Dynamic programming backward algorithm:

$$V(T, x) = c_T(x), \tag{2}$$

$$V(t-1, x) = \inf_{a_{t-1}} \mathbb{E}_{(t-1,x)}\left[c_{t-1}(s_{t-1}, a_{t-1}) + V(t, s_t)\right], \quad t = 1, \dots, T. \tag{3}$$

• This traditional approach runs into the curse of dimensionality.

• However, some works use the dynamic programming principle (2)-(3) together with a neural network approximation of $V$ (see for example the work of Huré, Pham and Warin).

• We discuss here a more direct approach, where we approximate the optimal control in (1) with neural networks.
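To make the backward recursion (2)-(3) concrete, here is a small tabular sketch on a one-dimensional integer grid. Everything about the toy model is an assumption made for illustration: the grid, the three actions, the three-point noise, the clipping of the state at the grid boundary, and the function name `backward_dp`.

```python
import numpy as np

def backward_dp(T, states, actions, noise_vals, noise_probs, b, c, c_T):
    """Tabular backward induction: V[T] = c_T, then V[t] from V[t+1]."""
    V = np.zeros((T + 1, len(states)))
    policy = np.zeros((T, len(states)), dtype=int)
    V[T] = [c_T(s) for s in states]
    lo, hi = states[0], states[-1]
    for t in range(T - 1, -1, -1):
        for i, s in enumerate(states):
            q = []
            for a in actions:
                # Next states for every noise value, clipped to the grid (assumption).
                nxt = np.clip(s + b(s, a) + noise_vals, lo, hi)
                q.append(c(t, s, a) + np.dot(noise_probs, V[t + 1, nxt - lo]))
            policy[t, i] = int(np.argmin(q))   # index of the minimizing action
            V[t, i] = min(q)
    return V, policy

states = np.arange(-5, 6)                      # integer grid {-5, ..., 5}
V, policy = backward_dp(
    T=4,
    states=states,
    actions=[-1, 0, 1],
    noise_vals=np.array([-1, 0, 1]),
    noise_probs=np.array([0.25, 0.5, 0.25]),
    b=lambda s, a: a,
    c=lambda t, s, a: 0.1 * a**2,
    c_T=lambda s: float(s**2),
)
```

The table `V` has one entry per time step and grid point. With $K$ grid points per coordinate and a $d$-dimensional state, the table would need $K^d$ entries per time step, which is the curse of dimensionality mentioned above.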



Example: trading costs

• Consider a market with $n$ assets whose prices follow a GBM diffusion and are impacted by market conditions and by the strategy of the trader.

• A trader has zero inventory at time $t = 0$ and wants to hold $\bar{a}_i \in \mathbb{R}$ units of stock $i$ ($i = 1, \dots, n$) at time $t = T > 0$. She can trade at each time $t = 0, \dots, T-1$.

• At each time $t$, the vector of prices is $p_t = \tilde{p}_t + \delta_t$, where $\tilde{p}_t$ is the no-impact price (a discretized GBM) and $\delta_t$ is the price impact, modeled by

$$\delta_t = P_t (A P_t a_t + B x_t)$$

with $P_t = \mathrm{diag}[\tilde{p}_t]$, $A \in \mathbb{R}^{n \times n}$ and $B \in \mathbb{R}^{n \times m}$; $a_t \in \mathbb{R}^n$ is the trading strategy and $x_t \in \mathbb{R}^m$ describes the market conditions at time $t$.

• Goal:

$$\min_a \mathbb{E}\left[\sum_{t=0}^{T-1} p_t^\top a_t\right] \quad \text{under the constraint} \quad \sum_{t=0}^{T-1} a_t = \bar{a}.$$



Control problems in continuous time

• Consider a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F} = (\mathcal{F}_t)_{t \ge 0})$ carrying a Wiener process $W$.

• Let $X$ be a controlled diffusion

$$dX_t = b(t, X_t, \alpha_t)\,dt + \sigma(t, X_t, \alpha_t)\,dW_t,$$

where $b, \sigma$ are Lipschitz coefficients and $\alpha$ is the control process.

• For $T > 0$, a running cost $f$ and a terminal cost $g$, if the agent chooses the control $\alpha$ starting from time $t$ and initial condition $x$, her expected cost is given by

$$J(t, x, \alpha) = \mathbb{E}_{(t,x)}\left[\int_t^T f(s, X_s, \alpha_s)\,ds + g(X_T)\right].$$

• The agent aims to compute

$$V(t, x) = \inf_{\alpha} J(t, x, \alpha),$$

and to find an optimal control $\alpha$.
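For a given feedback control, the cost functional $J$ can be approximated with an Euler-Maruyama discretization of $X$ and a Monte Carlo average. This is a rough sketch under assumptions: scalar state, a left-point Riemann sum for the running cost, and a hypothetical helper name `estimate_J`; the toy coefficients in the usage lines are chosen only so that the answer is known.

```python
import numpy as np

def estimate_J(x0, control, b, sigma, f, g, T=1.0, n_steps=50,
               n_paths=20_000, seed=0):
    """Euler-Maruyama / Monte Carlo estimate of J(0, x0, alpha)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    X = np.full(n_paths, float(x0))
    running = np.zeros(n_paths)
    for k in range(n_steps):
        t = k * dt
        alpha = control(t, X)                  # feedback control alpha_t = control(t, X_t)
        running += f(t, X, alpha) * dt         # left-point rule for the integral of f
        dW = np.sqrt(dt) * rng.standard_normal(n_paths)
        X = X + b(t, X, alpha) * dt + sigma(t, X, alpha) * dW
    return float(np.mean(running + g(X)))

# Toy check (assumption): b = alpha, sigma = 1, f = alpha^2, g(x) = x^2.
# With the zero control, X_T = W_T and J(0, 0, 0) = E[W_T^2] = T.
J_zero = estimate_J(
    x0=0.0,
    control=lambda t, x: np.zeros_like(x),
    b=lambda t, x, a: a,
    sigma=lambda t, x, a: 1.0,
    f=lambda t, x, a: a**2,
    g=lambda x: x**2,
)
```

Minimizing such an estimate over a parametrized family of controls is the idea behind the neural network approach discussed later in the talk.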



Dynamic programming equation

• As in the discrete case, there is a dynamic programming principle for the continuous-time problem. If $V$ is known to be continuous, then for any $(t, x)$ and any stopping time $\tau$ with values in $[t, T]$, the DPP reads

$$V(t, x) = \inf_{\alpha} \mathbb{E}_{(t,x)}\left[\int_t^{\tau} f(s, X_s, \alpha_s)\,ds + V(\tau, X_\tau)\right].$$

• If $V$ is not a priori continuous, one can state a similar DPP, but its proof is more involved due to measurability arguments. However, by Bouchard and Touzi, a weak DPP, stated with the lower- and upper-semicontinuous envelopes of $V$, is more accessible and is enough.

• In this continuous-time setting, the DPP implies that $V$ is a solution (in the viscosity sense) to the following PDE:

$$-\partial_t \varphi + \sup_a \left( -b(\cdot, a) \cdot D_x \varphi - \tfrac{1}{2} \mathrm{Tr}\left[\sigma \sigma^\top(\cdot, a) D^2_{xx} \varphi\right] - f(\cdot, a) \right) = 0, \qquad \varphi(T, \cdot) = g(\cdot).$$

• One can solve control problems by PDE methods, but classical methods also run into the curse of dimensionality.



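Example: LQ control problem

• Example taken from "Solving High-Dimensional Partial Differential Equations Using Deep Learning" by Han, Jentzen and E.

• We consider a control $m = (m_t)_{t \ge 0}$ and the controlled process $X = (X_t)_{t \ge 0}$, both $\mathbb{R}^d$-valued, with dynamics

$$dX_t = 2\sqrt{\lambda}\, m_t\,dt + \sqrt{2}\,dW_t.$$

• The control problem is

$$v(t, x) = \inf_m \mathbb{E}_{(t,x)}\left[\int_t^T \|m_s\|^2\,ds + g(X_T)\right].$$

• From the previous slide, $v$ is a solution to

$$-\partial_t \varphi + \sup_{m \in \mathbb{R}^d} \left( -2\sqrt{\lambda}\, m \cdot D_x \varphi - \mathrm{Tr}\left[D^2_{xx} \varphi\right] - \|m\|^2 \right) = 0, \qquad \varphi(T, \cdot) = g.$$

• One can easily compute the $m$ achieving the sup, and the PDE becomes

$$-\partial_t \varphi + \lambda \|D_x \varphi\|^2 - \Delta \varphi = 0, \qquad \varphi(T, \cdot) = g.$$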
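The computation of the supremum can be spelled out in a few lines. The derivation below is a worked sketch that assumes the general HJB form of the previous slide with $b = 2\sqrt{\lambda}\,m$, $\sigma = \sqrt{2}\,I_d$ and $f = \|m\|^2$; the symbol $m^*$ for the maximizer is introduced here for illustration.

```latex
\begin{align*}
m^* &= \operatorname*{arg\,max}_{m \in \mathbb{R}^d}
       \left( -2\sqrt{\lambda}\, m \cdot D_x\varphi - \|m\|^2 \right)
     = -\sqrt{\lambda}\, D_x\varphi
     && \text{(first-order condition } -2\sqrt{\lambda}\, D_x\varphi - 2m = 0\text{)}, \\
-2\sqrt{\lambda}\, m^* \cdot D_x\varphi - \|m^*\|^2
    &= 2\lambda \|D_x\varphi\|^2 - \lambda \|D_x\varphi\|^2
     = \lambda \|D_x\varphi\|^2, \\
\tfrac{1}{2} \mathrm{Tr}\left[\sigma\sigma^\top D^2_{xx}\varphi\right]
    &= \tfrac{1}{2} \mathrm{Tr}\left[2\, D^2_{xx}\varphi\right]
     = \Delta\varphi .
\end{align*}
```

Since the expression is strictly concave in $m$, the critical point is the maximizer, and the candidate optimal feedback control is $m^*_t = -\sqrt{\lambda}\, D_x v(t, X_t)$.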



PDEs and BSDEs

• One can use PDE methods to solve control problems. Conversely, one can also be interested in solving PDEs using probabilistic tools.

• One may be interested in the solution $u$ to the following semilinear parabolic PDE:

$$-\partial_t \varphi - b \cdot D_x \varphi - \tfrac{1}{2} \mathrm{Tr}\left[\sigma \sigma^\top D^2_{xx} \varphi\right] - f(\cdot, \varphi, \sigma^\top D_x \varphi) = 0, \qquad \varphi(T, \cdot) = g(\cdot).$$

• Under some technical assumptions, a non-linear Feynman-Kac formula is available: if $(X, Y, Z)$ is the solution, on $[t, T]$, to the following Markovian forward-backward system

$$X_s = x + \int_t^s b(u, X_u)\,du + \int_t^s \sigma(u, X_u)\,dW_u,$$

$$Y_s = g(X_T) + \int_s^T f(u, X_u, Y_u, Z_u)\,du - \int_s^T Z_u\,dW_u,$$

then $Y_s = u(s, X_s)$ and $Z_s = \sigma^\top(s, X_s) D_x u(s, X_s)$, and in particular $Y_t = u(t, x)$ and $Z_t = \sigma^\top(t, x) D_x u(t, x)$.



Example: non-linear Black&Scholes

• Example taken from "Solving High-Dimensional Partial Differential Equations Using Deep Learning" by Han, Jentzen and E.

• The classical Black&Scholes equation is a linear parabolic PDE.

• Introducing important factors from real markets translates into nonlinearities in the function $f$. We introduce default risk in the valuation of a European claim. When default occurs, the holder receives only a fraction $\delta \in [0, 1)$ of the current value.

• The default time is modeled by the first jump time of a Poisson process with intensity $Q$, where $Q$ is a decreasing function of the claim's value: default is more likely when the claim's value is low.

• In this setting, the function $f$ is given by

$$f(t, x, y, z) = -(1 - \delta) Q(y)\, y - r y,$$

where $r$ is the interest rate (and we work under the risk-neutral measure $\mathbb{Q}$).



Approximation of the solution with neural networks



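Control problems in discrete time

• Let's recall the setup. Controlled state process and goal:

$$s_{t+1} = s_t + b(s_t, a_t) + \xi_{t+1},$$

$$\inf_{a = (a_0, \dots, a_{T-1})} \mathbb{E}_{(0,x)}\left[\sum_{t=0}^{T-1} c_t(s_t, a_t) + c_T(s_T)\right].$$

• We assume that the optimal control can be found in closed-loop (feedback) form, i.e. $a_t = \alpha_t(s_t)$ for each $t$.

• For each $t$, we approximate $\alpha_t$ by a neural network with parameters $\theta_t$:

$$\alpha_t \approx \beta_t(\cdot \,|\, \theta_t).$$

• The new optimization problem is

$$\inf_{\theta = (\theta_0, \dots, \theta_{T-1})} \mathbb{E}_{(0,x)}\left[\sum_{t=0}^{T-1} c_t(s_t, \beta_t(s_t | \theta_t)) + c_T(s_T)\right] =: \mathbb{E}_{(0,x)}[C_T].$$

• We also define, for $t < T$, the cumulative cost at time $t$:

$$C_t = \sum_{\ell=0}^{t} c_\ell(s_\ell, \beta_\ell(s_\ell | \theta_\ell)).$$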



The full network

Diagram from "Deep Learning Approximation for Stochastic Control Problems" by Han and E.



Comments on the architecture

There are 3 types of connections in this architecture:

• $s_t \to h_t^1 \to \dots \to h_t^N \to a_t$ is the multilayer feedforward neural network representing $a_t \approx \beta_t(s_t | \theta_t)$. The parameters $\theta_t$ of this network are the parameters we aim to optimize.

• $(s_t, a_t, C_{t-1}) \to C_t$ is the contribution to the final output of the network. It is simply $C_t = C_{t-1} + c_t(s_t, a_t)$, and there are no parameters to optimize.

• $(a_{t-1}, s_{t-1}, \xi_t) \to s_t$ is the evolution of the state process, given by $s_t = s_{t-1} + b(s_{t-1}, a_{t-1}) + \xi_t$. There are no parameters to optimize.
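The three connection types translate into a short forward pass. The sketch below uses plain NumPy with small ReLU networks; the layer sizes, the initialization, the toy drift and costs, and the helper names (`init_mlp`, `mlp`, `forward`) are assumptions made for illustration and are not taken from Han and E.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Random weights for a feedforward network with the given layer sizes."""
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, s):
    """s_t -> h^1_t -> ... -> h^N_t -> a_t, with ReLU hidden layers."""
    h = s
    for W, bias in params[:-1]:
        h = np.maximum(h @ W + bias, 0.0)
    W, bias = params[-1]
    return h @ W + bias                     # a_t = beta_t(s_t | theta_t)

def forward(thetas, s0, xi, b, c, c_T):
    """Chain the T sub-networks and return C_T for every sample."""
    s = s0
    C = np.zeros(s0.shape[0])
    for t, theta in enumerate(thetas):
        a = mlp(theta, s)                   # trainable connection
        C = C + c(t, s, a)                  # C_t = C_{t-1} + c_t(s_t, a_t), no parameters
        s = s + b(s, a) + xi[t]             # state transition, no parameters
    return C + c_T(s)

rng = np.random.default_rng(0)
T, M, d = 3, 16, 2
thetas = [init_mlp(rng, [d, 8, 8, d]) for _ in range(T)]   # one network per time step
xi = 0.1 * rng.standard_normal((T, M, d))
C_T = forward(
    thetas,
    s0=np.ones((M, d)),
    xi=xi,
    b=lambda s, a: a,
    c=lambda t, s, a: np.sum(a**2, axis=1),
    c_T=lambda s: np.sum(s**2, axis=1),
)
```

Only `thetas` carries trainable parameters; the cost accumulation and the state transition are fixed functions, as stated in the list above.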



Training the network

• We sample $M$ realizations $\xi^1, \dots, \xi^M$ of $\xi = (\xi_t)_{t=1}^{T}$ as input data.

• We compute $C_T = \sum_{t=0}^{T-1} c_t(s_t, \beta_t(s_t | \theta_t)) + c_T(s_T)$ using the full network for each input $\xi^i$, $i = 1, \dots, M$.

• For each batch, we compute the loss function, which is the empirical estimator of the objective function to minimize:

$$L = \frac{1}{M} \sum_{i=1}^{M} C_T^i.$$

• A backpropagation step then updates the parameters by gradient descent.
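The training loop can be illustrated on a toy problem small enough to avoid any deep learning library. In the sketch below, each network $\beta_t$ is replaced by a linear feedback rule $a_t = \theta_t s_t$ and backpropagation is replaced by central finite differences; both are deliberate simplifications, not the method of Han and E, and the toy costs, step size, and function names are assumptions.

```python
import numpy as np

def batch_loss(theta, xi, x0=1.0):
    """Empirical loss L = (1/M) sum_i C_T^i for the toy problem
    s_{t+1} = s_t + a_t + xi_{t+1}, c_t = a_t^2, c_T = s_T^2, a_t = theta_t * s_t."""
    s = np.full(xi.shape[0], x0)
    C = np.zeros(xi.shape[0])
    for t in range(theta.size):
        a = theta[t] * s
        C += a**2
        s = s + a + xi[:, t]
    return float(np.mean(C + s**2))

def fd_grad(theta, xi, eps=1e-5):
    """Central finite-difference gradient (stand-in for backpropagation)."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        e = np.zeros_like(theta)
        e[j] = eps
        grad[j] = (batch_loss(theta + e, xi) - batch_loss(theta - e, xi)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
T, M, lr = 5, 256, 0.02
theta = np.zeros(T)
initial_loss = batch_loss(theta, 0.1 * rng.standard_normal((M, T)))
for step in range(500):
    xi = 0.1 * rng.standard_normal((M, T))   # fresh batch of noise samples
    theta -= lr * fd_grad(theta, xi)         # gradient descent update
final_loss = batch_loss(theta, 0.1 * rng.standard_normal((M, T)))
```

In practice the gradient would come from automatic differentiation through the whole network, and a fresh batch is drawn at each iteration as above.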



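PDEs and FBSDEs

• We want to compute $u(0, x)$ for fixed $x \in \mathbb{R}^d$, where $u$ is the solution to

$$-\partial_t \varphi - b \cdot D_x \varphi - \tfrac{1}{2} \mathrm{Tr}\left[\sigma \sigma^\top D^2_{xx} \varphi\right] - f(\cdot, \varphi, \sigma^\top D_x \varphi) = 0, \qquad \varphi(T, \cdot) = g(\cdot).$$

• Recall that $u(0, x) = Y_0$, where $(X, Y, Z)$ solves

$$X_t = x + \int_0^t b(u, X_u)\,du + \int_0^t \sigma(u, X_u)\,dW_u,$$

$$Y_t = g(X_T) + \int_t^T f(u, X_u, Y_u, Z_u)\,du - \int_t^T Z_u\,dW_u.$$

• We consider a time grid $\pi = (t_n)_{n=0}^{N}$ of $[0, T]$ and the Euler scheme associated to $X$ and $Y$:

$$X_{n+1} = X_n + \Delta_n b(t_n, X_n) + \sqrt{\Delta_n}\, \sigma(t_n, X_n)\, \xi_{n+1},$$

$$Y_{n+1} = Y_n - f(t_n, X_n, Y_n, Z_n) \Delta_n + \sqrt{\Delta_n}\, Z_n \cdot \xi_{n+1},$$

where $\Delta_n := t_{n+1} - t_n$ and $\xi_n = (\xi_n^1, \dots, \xi_n^d)$ for each $n$, with $\xi_n^i$ i.i.d. $\mathcal{N}(0, 1)$.

• Question: since $Y_0$ and the $Z_n$ are unknown, how can we use the scheme for $Y$?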



The associated control problem

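• To use the previous scheme, we treat $Y_0$ and $Z_0$ as parameters $\theta_{u_0}$ and $\theta_{\nabla u_0}$ of our model, and each $Z_n$, $n \ge 1$, is replaced by a neural network $\alpha_n(X_n | \theta_n)$ (recall that $Z_n$ is the Euler approximation of $Z_{t_n}$, and that $Z_{t_n} = \sigma^\top(t_n, X_{t_n}) D_x u(t_n, X_{t_n})$).

• Given parameters $\Theta := (\theta_{u_0}, \theta_{\nabla u_0}, \theta_1, \dots, \theta_{N-1})$, we thus consider the following dynamics for $Y$:

$$Y_1^\Theta = \theta_{u_0} - f(0, x, \theta_{u_0}, \theta_{\nabla u_0}) \Delta_0 + \sqrt{\Delta_0}\, \theta_{\nabla u_0} \cdot \xi_1,$$

$$Y_{n+1}^\Theta = Y_n^\Theta - f(t_n, X_n, Y_n^\Theta, \alpha_n(X_n | \theta_n)) \Delta_n + \sqrt{\Delta_n}\, \alpha_n(X_n | \theta_n) \cdot \xi_{n+1}.$$

• The goal is now to find parameters such that $Y_T^\Theta$ matches $g(X_T)$, where $X_T$ and $Y_T^\Theta$ denote the terminal values $X_N$ and $Y_N^\Theta$ of the two schemes.

• The loss function is then

$$L(\Theta) = \mathbb{E}\left[\left| g(X_T) - Y_T^\Theta \right|^2\right],$$

or more precisely the empirical mean associated to a sample $(\xi^i)_{i=1,\dots,M}$ of the Gaussian increments.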
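The forward scheme for $Y$ and the terminal mismatch loss can be sketched in a few lines. To keep the example checkable, the code below takes $b = 0$, $\sigma = I$, and passes the function $z$ explicitly instead of training a network; these choices, and the name `bsde_loss`, are assumptions for illustration. In the toy case $f = 0$, $g(x) = \|x\|^2$, the exact solution is $u(t, x) = \|x\|^2 + d(T - t)$ with $Z_t = 2 X_t$.

```python
import numpy as np

def bsde_loss(y0, z_fn, f, g, x0, T=1.0, n_steps=50, n_paths=2000, seed=0):
    """Run the Euler schemes for X and Y forward and return the empirical
    mean of |g(X_T) - Y_T|^2 (toy forward dynamics: b = 0, sigma = identity)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    d = x0.size
    X = np.tile(x0, (n_paths, 1))
    Y = np.full(n_paths, float(y0))            # plays the role of theta_{u_0}
    for n in range(n_steps):
        t = n * dt
        Z = z_fn(t, X)                         # stand-in for alpha_n(X_n | theta_n)
        dW = np.sqrt(dt) * rng.standard_normal((n_paths, d))
        Y = Y - f(t, X, Y, Z) * dt + np.sum(Z * dW, axis=1)
        X = X + dW                             # same increments drive X and Y
    return float(np.mean((g(X) - Y) ** 2))

x0 = np.zeros(2)
toy = dict(
    z_fn=lambda t, X: 2.0 * X,                 # exact Z for g(x) = |x|^2
    f=lambda t, X, Y, Z: 0.0,
    g=lambda X: np.sum(X**2, axis=1),
    x0=x0,
)
loss_true = bsde_loss(y0=2.0, **toy)           # y0 = u(0, 0) = d * T = 2
loss_wrong = bsde_loss(y0=3.0, **toy)          # wrong initial value
```

The loss is small for the correct pair $(Y_0, Z)$ and grows when $Y_0$ is off, which is the signal the optimizer exploits. The residual value of `loss_true` comes from the time discretization.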



The full network

Diagram from "Solving High-Dimensional Partial Differential Equations Using Deep Learning" by Han, Jentzen and E.



Numerical applications



Trading costs

• We consider the model introduced above for trading costs.

• The state process is $s_t := (\tilde{p}_t, x_t, w_t)$, $t = 0, \dots, T$, with $\tilde{p}_t$ the impact-free prices, $x_t$ the market conditions and $w_t$ the remaining assets to be bought at time $t$ (so $w_0 = \bar{a}$, the target position, and $w_T = 0$).

• We consider $n = 10$ assets, and the market conditions are represented by 3 parameters. The state process is of dimension $10 + 10 + 3 = 23$.

• The dynamics of $s_{t+1}$ are controlled only through $w$, by

$$w_{t+1} = w_t - a_t \approx w_t - \beta_t(s_t | \theta_t),$$

where $\theta_t$ parametrises a neural network $\beta_t(\cdot | \theta_t) : \mathbb{R}^{23} \to \mathbb{R}^{10}$.

• Each $\beta_t(\cdot | \theta_t)$ is a neural network with 2 hidden layers and ReLU activation function for the hidden variables.

• Analytical formulae for the optimal trading costs and optimal controls are available in this model.



Results

Relative trading cost (the dashed line is the optimal trading cost, rescaled to 1) and relative error for the controls (compared to the exact solution) as a function of the number of iterations, on validation samples. The shaded area depicts the mean ± the standard deviation over five different random seeds.

The average relative trading cost and relative error for the controls on test samples are 1.001, 1.002, 1.009 and 3.7%, 3.7%, 8.6% for $T = 20, 25, 30$. The average running time is 605 s, 778 s, 905 s respectively.



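HJB Equation

• We consider the following PDE:

$$-\partial_t \varphi + \lambda \|D_x \varphi\|^2 - \Delta \varphi = 0 \ \text{ on } [0, T) \times \mathbb{R}^d, \qquad \varphi(T, \cdot) = g.$$

• Analytical solution:

$$u(t, x) = -\frac{1}{\lambda} \ln\left( \mathbb{E}\left[ \exp\left( -\lambda\, g\left(x + \sqrt{2}\, W_{T-t}\right) \right) \right] \right),$$

so we can compute $u(t, x)$ with a Monte-Carlo procedure.

• We consider the PDE in dimension $d = 100$ with terminal condition $g(x) = \ln\left( (1 + \|x\|^2)/2 \right)$.

• We compute $u(t = 0, x = (0, \dots, 0))$ using the non-linear Feynman-Kac representation and the neural network approximation of the FBSDE.

• Associated FBSDE via the non-linear Feynman-Kac representation (with $\sigma = \sqrt{2}\, I_d$, so that $Z = \sqrt{2}\, D_x u$):

$$X_t = \sqrt{2}\, W_t,$$

$$Y_t = g(X_T) - \int_t^T \frac{\lambda}{2} \|Z_s\|^2\,ds - \int_t^T Z_s\,dW_s.$$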
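The analytical solution above lends itself to a direct Monte Carlo evaluation at $(t, x) = (0, 0)$. The sketch below assumes $T = 1$ (the horizon is not stated on the slide) and uses a hypothetical function name `hjb_reference`; the sample size is an arbitrary choice.

```python
import numpy as np

def hjb_reference(lam=1.0, d=100, T=1.0, n_samples=20_000, seed=0):
    """Monte Carlo estimate of u(0, 0) = -(1/lam) * ln E[exp(-lam * g(sqrt(2) W_T))]
    with g(x) = ln((1 + |x|^2) / 2)."""
    rng = np.random.default_rng(seed)
    W_T = np.sqrt(T) * rng.standard_normal((n_samples, d))
    X_T = np.sqrt(2.0) * W_T                       # starting point x = 0
    g = np.log((1.0 + np.sum(X_T**2, axis=1)) / 2.0)
    return float(-np.log(np.mean(np.exp(-lam * g))) / lam)

u00 = hjb_reference()
```

Such a reference value is what the neural network approximation of $Y_0$ is compared against when relative errors are reported.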



Results

We use an Euler scheme with the time grid $\pi = \{kT/20\}_{k=0}^{20}$. At each time step: neural network with 2 hidden layers with $d + 10$ neurons each and ReLU activation function. Learning rate of 0.01.

Left: the mean and the standard deviation of the relative error for $u(t = 0, x = (0, \dots, 0))$ in the case $\lambda = 1$. The deep BSDE method achieves a relative error of 0.17% in a runtime of 330 seconds on a MacBook Pro.

Right: comparison, for different values of $\lambda$, of $u(t = 0, x = (0, \dots, 0))$ using the Monte-Carlo approximation of the analytical solution and the neural network approximation of the solution of the BSDE.



Allen-Cahn equation

• The PDE we want to solve here is, in dimension $d = 100$,

$$-\partial_t \varphi + \Delta \varphi + \varphi - \varphi^3 = 0 \ \text{ on } (0, T] \times \mathbb{R}^d, \qquad \varphi(0, x) = g(x) = \frac{1}{2 + 0.4 \|x\|^2}.$$

• Here there is no explicitly known exact solution. However, we can approximate $u(t = 0.3, x = (0, \dots, 0)) \approx 0.0528$ by a branching diffusion method.

• By considering $v_T(t, \cdot) = u(T - t, \cdot)$, one obtains a PDE with terminal condition $v_T(T, \cdot) = g$, and the BSDE method can be used.

• One then computes $u(T, x = (0, \dots, 0))$ by computing $v_T(0, x = (0, \dots, 0))$.

• We use the neural network approximation with 20 equidistant time steps for the Euler scheme, and learning rate 0.0005. At each time step: neural network with 2 hidden layers with $d + 10$ neurons each and ReLU activation function.



Results

Left: relative error of the deep BSDE method for $u(t = 0.3, x = (0, \dots, 0))$ against the number of iteration steps. The shaded area depicts the mean ± the standard deviation of the relative error for 5 different runs. The deep BSDE method achieves a relative error of size 0.30% in a runtime of 647 seconds.

Right: time evolution of $u(t, x = (0, \dots, 0))$ for $t \in [0, 0.3]$, computed by means of the deep BSDE method for several $v_T$.



Non-linear Black&Scholes equation

• Recall that we consider a Black&Scholes market where we introduce default risk. When default occurs, the holder of a European claim receives only a fraction $\delta \in [0, 1)$ of the current value.

• The driver $f$ of the BSDE is thus given by

$$f(y) = -(1 - \delta) Q(y)\, y - r y,$$

where $r$ is the risk-free rate and $Q$ is the intensity of the Poisson process whose first jump models the default time.

• The FBSDE to solve is then (under the risk-neutral measure $\mathbb{Q}$)

$$X_t = x + \int_0^t r X_s\,ds + \int_0^t \sigma\, \mathrm{diag}[X_s]\,dW_s,$$

$$Y_t = g(X_T) - \int_t^T \left( (1 - \delta) Q(Y_s) Y_s + r Y_s \right) ds - \int_t^T Z_s\,dW_s.$$

• We consider an option on an underlying of dimension 100, with payoff

$$g(x) = \min_{i = 1, \dots, 100} x_i.$$



Results

• There is no explicit solution known for this problem, but a multilevel Picard method gives $u(t = 0, x = (100, \dots, 100)) \approx 57.3$ (in comparison, the usual Black&Scholes price is about 60.8, which shows that important pricing errors appear when some risks are ignored).

• Using the neural network approach to solve the BSDE, we observe convergence.

• Here we use 40 time steps and the learning rate is 0.008.



Ongoing work



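Quantile hedging

• The usual super-replication price can be too high to be of interest to banks and insurers. Alternative: the quantile hedging price. Given $p \in [0, 1]$, one looks for the smallest initial wealth such that there is a strategy which hedges successfully with probability at least $p$.

• If $Y^{y,\nu}$ is the wealth process associated to initial wealth $y$ and strategy $\nu$,

$$V_p = \inf\left\{ y \ge 0 : \exists \nu,\ \mathbb{P}\left[ Y_T^{y,\nu} \ge g(X_T) \right] \ge p \right\}.$$

• When the market is linear, one can prove that

$$V_p = \inf_{\alpha \in \mathcal{A}_p} \mathbb{E}^{\mathbb{Q}}\left[ e^{-rT} g(X_T) P_T^{p,\alpha} \right],$$

where, if $\mathbb{E}\left[\int_0^T |\alpha_t|^2\,dt\right] < +\infty$, $P^{p,\alpha}$ is the martingale defined by

$$P_t^{p,\alpha} = p + \int_0^t \alpha_s\,dW_s,$$

and $\mathcal{A}_p$ is defined by

$$\mathcal{A}_p = \left\{ \alpha : \mathbb{E}\left[\int_0^T |\alpha_t|^2\,dt\right] < +\infty \ \text{and} \ P^{p,\alpha} \in [0, 1],\ (dt \otimes d\mathbb{P})\text{-a.s.} \right\}.$$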



Quantile hedging ctd.

• This is a control problem in continuous time, and we want to use the methods developed before to approximate its solution.

• First problem: the volatility is controlled, which is outside the framework of Han and E.

• Second problem: it is a priori not clear what a discrete version of this problem is, because of the constraint $P \in [0, 1]$.

• Let $\pi = \{t_k\}_{k=0}^{N}$ be a discrete grid of $[0, T]$. The Euler scheme for $P$ is

$$P_{k+1} = P_k + a_k \xi_{k+1},$$

where $\xi_{k+1}$ is the Brownian increment. Since it is normally distributed, it is not possible to guarantee $P_{k+1} \in [0, 1]$ unless $a_k = 0$.

• To discretise the problem, we consider the modified dynamics:

$$P_{k+1} = \begin{cases} P_k & \text{if } P_k \le 0 \text{ or } P_k \ge 1, \\ P_k + a_k \xi_{k+1} & \text{otherwise.} \end{cases}$$

However, notice that P is not a martingale anymore.



Explicit solution

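• In the Black&Scholes model, i.e. when $X$ is a geometric Brownian motion, and when $g$ is a vanilla call or put, one obtains an explicit formula for $V_p$, $p \in [0, 1]$.

• Moreover, consider $\alpha : [0, T) \times [0, 1] \to \mathbb{R}$ defined by

$$\alpha(t, p) = \frac{1}{\sqrt{2\pi (T - t)}} \exp\left( -\frac{N^{-1}(p)^2}{2} \right),$$

where $N$ is the c.d.f. of the standard Gaussian distribution.

• Then $V_p = \mathbb{E}^{\mathbb{Q}}\left[ e^{-rT} g(X_T) P_T^{\alpha} \right]$, where

$$P_t^{\alpha} = p + \int_0^t \alpha(s, P_s^{\alpha})\,dW_s,$$

i.e. $\alpha_t := \alpha(t, P_t^{\alpha})$ is an optimal control.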



Using the optimal control

• To test whether the modified dynamics still allow us to observe convergence, we first compute a Monte-Carlo approximation of $\mathbb{E}^{\mathbb{Q}}\left[ e^{-rT} g(X_T) P_T^{\alpha} \right]$.

• Given a time grid $\pi$, $X$ is an Euler scheme approximation of the GBM, and $P^{\alpha}$ is computed using the optimal control and the modified dynamics

$$P_{k+1} = \begin{cases} P_k & \text{if } P_k \le 0 \text{ or } P_k \ge 1, \\ P_k + a_k \xi_{k+1} & \text{otherwise.} \end{cases}$$

• We observe convergence of $p \mapsto V_p$ for an ATM put option, using a time grid $\pi = \{kT/N\}_{k=0}^{N}$ with $N = 20, 100, 1000$.
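The modified dynamics driven by the explicit control can be simulated directly. The sketch below is an illustration under assumptions: it uses the standard library `statistics.NormalDist` for $N^{-1}$, takes $\xi_{k+1}$ to be a Brownian increment with variance equal to the time step, and the names `alpha` and `simulate_frozen_P` are hypothetical. It simulates only $P$, not the discounted payoff.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def alpha(t, p, T):
    """Explicit control alpha(t, p) = exp(-N^{-1}(p)^2 / 2) / sqrt(2 pi (T - t))."""
    q = _STD_NORMAL.inv_cdf(p)                 # requires 0 < p < 1
    return np.exp(-0.5 * q * q) / np.sqrt(2.0 * np.pi * (T - t))

def simulate_frozen_P(p0, T=1.0, n_steps=20, n_paths=500, seed=0):
    """Euler scheme for P with the modified dynamics: a path is frozen
    as soon as it is observed at or outside the boundary of [0, 1]."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    P = np.empty((n_paths, n_steps + 1))
    P[:, 0] = p0
    for k in range(n_steps):
        dW = np.sqrt(dt) * rng.standard_normal(n_paths)   # Brownian increments
        for i in range(n_paths):
            p = P[i, k]
            if p <= 0.0 or p >= 1.0:
                P[i, k + 1] = p                           # frozen path
            else:
                P[i, k + 1] = p + alpha(k * dt, p, T) * dW[i]
    return P

P = simulate_frozen_P(p0=0.5)
```

Note that a path can overshoot the interval $[0, 1]$ at the step where it exits, since freezing only takes effect at the next step; refining the grid reduces the size of this overshoot.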



Using neural networks

• We replace the explicit optimal control in the dynamics of $P$ by a neural network to be trained, following the ideas of Han and E.

• While the behaviour close to $p = 0$ and $p = 1$ improves when the time step of our Euler scheme goes to 0, it is not clear that convergence to the true function occurs when $0 \ll p \ll 1$.

• We use a time grid $\pi = \{kT/N\}_{k=0}^{N}$ with $N = 20, 50, 100$.

• The architecture of the neural networks at each time step is similar to what was used before.



Thank you for your attention!
