Markov decision processes: dynamic programming and applications
Marianne AKIAN, INRIA Saclay - Ile-de-France and CMAP, Ecole polytechnique, CNRS, [email protected]
and Jean-Philippe CHANCELIER, CERMICS, Ecole des Ponts, [email protected]
M2 Optimization, University Paris Saclay, 2017
Several real-life problems can be modeled as Markov decision processes (MDPs) or stochastic control problems:
• Airline revenue management
• Portfolio selection
• Dam management
• Stock management
• Transportation or web PageRank optimisation
• Divorce of birds...
Aim of the course
• model the problem;
• apply the dynamic programming approach;
• solve dynamic programming equations:
  • with analytical tools (convexity, ...);
  • with numerical algorithms (value and policy iteration, linear programming); a minimal sketch of value iteration follows.
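As a taste of the numerical algorithms mentioned above, here is a minimal value iteration sketch for a discounted finite MDP in Python; the transition probabilities P, rewards r and discount factor gamma are illustrative assumptions, not course data.

    import numpy as np

    # Value iteration for a discounted finite MDP (illustrative random data).
    # States s = 0..S-1, actions a = 0..A-1.
    # P[a, s, s'] = transition probability, r[a, s] = expected reward.
    S, A, gamma = 3, 2, 0.9
    rng = np.random.default_rng(0)
    P = rng.random((A, S, S))
    P /= P.sum(axis=2, keepdims=True)       # normalize rows to probabilities
    r = rng.random((A, S))

    v = np.zeros(S)
    for _ in range(1000):
        # Dynamic programming operator:
        # T(v)[s] = max_a ( r[a,s] + gamma * sum_{s'} P[a,s,s'] v[s'] )
        v_new = (r + gamma * P @ v).max(axis=0)
        if np.abs(v_new - v).max() < 1e-8:  # T is a gamma-contraction, so iterates converge
            v = v_new
            break
        v = v_new

    policy = (r + gamma * P @ v).argmax(axis=0)  # greedy policy at the fixed point
    print("value:", v, "policy:", policy)

Policy iteration instead alternates exact policy evaluation (solving a linear system) with greedy improvement; both algorithms are treated in Lectures 3 and 4.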
We shall essentially consider stochastic dynamical systems with discrete time and finite state space, that is, finite Markov chains, and go from simple to more sophisticated models/problems:
• deterministic and stochastic problems;
• complete and incomplete observation problems;
• criteria with finite horizon, discounted infinite horizon, stopping time, ergodic criteria, risk-sensitive criteria;
• unconstrained and constrained problems.
Possible schedule
Lectures 1-7 are common to the ENSTA course and the M2 course.
Lectures 1 and 2: Dynamic programming equation of deterministic control problems. Problems with Markov chains (without control).
Lectures 3 and 4: Markov decision processes (MDPs) with complete state observation. Finite horizon problems. Infinite horizon problems: contraction of the dynamic programming operator, value iteration and policy iteration algorithms.
Lecture 5: Long-term behaviour of Markov chains. Evaluation of mean-payoff/ergodic criteria.
Lecture 6: Practical work on PageRank optimization (a plain PageRank sketch follows this schedule).
Lecture 7: Modelling and solution of several problems.
Lecture 8: MDPs with mean-payoff/ergodic criteria or long-term risk-sensitive criteria.
Lectures 9 and 10: Constrained MDPs.
Lectures 11 and 12: Problems with partial observation. Gittins index for multi-armed bandits.
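As background for the practical work of Lecture 6, a minimal power-iteration sketch computing the (uncontrolled) PageRank vector; the 4-page link graph and the damping factor 0.85 are illustrative assumptions. The optimization version, where the ranking is controlled, is the subject of the practical work itself.

    import numpy as np

    # Power iteration for PageRank (illustrative 4-page link graph, not course data).
    # L[i, j] = 1 if page j links to page i; no dangling pages in this example.
    L = np.array([[0, 0, 1, 0],
                  [1, 0, 0, 0],
                  [1, 1, 0, 1],
                  [0, 1, 1, 0]], dtype=float)
    P = L / L.sum(axis=0)            # column-stochastic transition matrix
    d, n = 0.85, L.shape[1]          # commonly used damping factor
    G = d * P + (1 - d) / n          # "Google matrix": irreducible and aperiodic

    pi = np.full(n, 1.0 / n)
    for _ in range(100):
        pi_new = G @ pi
        if np.abs(pi_new - pi).max() < 1e-10:
            pi = pi_new
            break
        pi = pi_new
    print("PageRank vector:", pi)    # stationary distribution of G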