Markov decision processes: dynamic programming and applications
Marianne AKIAN, INRIA Saclay - Ile-de-France and CMAP, Ecole polytechnique, CNRS, [email protected]
and Jean-Philippe CHANCELIER, CERMICS, Ecole des Ponts, [email protected]
M2 Optimization, University Paris Saclay, 2017
Several real-life problems can be modeled as Markov decision processes (MDPs) or stochastic control problems:
• Airline revenue management
• Portfolio selection
• Dam management
• Stock management
• Transportation or web PageRank optimisation
• Divorce of birds...
Aim of the course
• model the problem;
• apply the dynamic programming approach;
• solve dynamic programming equations:
  • with analytical tools (convexity, ...);
  • with numerical algorithms (value and policy iteration, linear programming); a minimal sketch of value iteration follows.
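As a taste of the numerical algorithms mentioned above, here is a minimal value iteration sketch for a discounted finite MDP in Python; the transition probabilities P, rewards r and discount factor gamma are illustrative assumptions, not course data.

    import numpy as np

    # Value iteration for a discounted finite MDP (illustrative random data).
    # States s = 0..S-1, actions a = 0..A-1.
    # P[a, s, s'] = transition probability, r[a, s] = expected reward.
    S, A, gamma = 3, 2, 0.9
    rng = np.random.default_rng(0)
    P = rng.random((A, S, S))
    P /= P.sum(axis=2, keepdims=True)       # normalize rows to probabilities
    r = rng.random((A, S))

    v = np.zeros(S)
    for _ in range(1000):
        # Dynamic programming operator:
        # T(v)[s] = max_a ( r[a,s] + gamma * sum_{s'} P[a,s,s'] v[s'] )
        v_new = (r + gamma * P @ v).max(axis=0)
        if np.abs(v_new - v).max() < 1e-8:  # T is a gamma-contraction, so iterates converge
            v = v_new
            break
        v = v_new

    policy = (r + gamma * P @ v).argmax(axis=0)  # greedy policy at the fixed point
    print("value:", v, "policy:", policy)

Policy iteration instead alternates exact policy evaluation (solving a linear system) with greedy improvement; both algorithms are treated in Lectures 3 and 4.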
We shall essentially consider stochastic dynamical systems with discrete time and finite state space, that is, finite Markov chains, and go from simple to more sophisticated models/problems:
• deterministic and stochastic problems;
• complete and incomplete observation problems;
• criteria with finite horizon, discounted infinite horizon, stopping time, ergodic criteria, risk-sensitive criteria;
• unconstrained and constrained problems.
Possible schedule
Lectures 1-7 are common to the ENSTA course and the M2 course.
Lectures 1 and 2: Dynamic programming equation of deterministic control problems. Problems with Markov chains (without control).
Lectures 3 and 4: Markov decision processes (MDPs) with complete state observation. Finite horizon problems. Infinite horizon problems: contraction of the dynamic programming operator, value iteration and policy iteration algorithms.
Lecture 5: Long-term behaviour of Markov chains. Evaluation of mean-payoff/ergodic criteria.
Lecture 6: Practical work on PageRank optimization (a plain PageRank sketch follows this schedule).
Lecture 7: Modelling and solution of several problems.
Lecture 8: MDPs with mean-payoff/ergodic criteria or long-term risk-sensitive criteria.
Lectures 9 and 10: Constrained MDPs.
Lectures 11 and 12: Problems with partial observation. Gittins index for multi-armed bandits.
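As background for the practical work of Lecture 6, a minimal power-iteration sketch computing the (uncontrolled) PageRank vector; the 4-page link graph and the damping factor 0.85 are illustrative assumptions. The optimization version, where the ranking is controlled, is the subject of the practical work itself.

    import numpy as np

    # Power iteration for PageRank (illustrative 4-page link graph, not course data).
    # L[i, j] = 1 if page j links to page i; no dangling pages in this example.
    L = np.array([[0, 0, 1, 0],
                  [1, 0, 0, 0],
                  [1, 1, 0, 1],
                  [0, 1, 1, 0]], dtype=float)
    P = L / L.sum(axis=0)            # column-stochastic transition matrix
    d, n = 0.85, L.shape[1]          # commonly used damping factor
    G = d * P + (1 - d) / n          # "Google matrix": irreducible and aperiodic

    pi = np.full(n, 1.0 / n)
    for _ in range(100):
        pi_new = G @ pi
        if np.abs(pi_new - pi).max() < 1e-10:
            pi = pi_new
            break
        pi = pi_new
    print("PageRank vector:", pi)    # stationary distribution of G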