Markov decision processes: dynamic programming and applications. Marianne AKIAN, INRIA Saclay - Île-de-France and CMAP, École polytechnique, CNRS, [email protected], and Jean-Philippe CHANCELIER, CERMICS, École des Ponts ParisTech, [email protected]. M2 Optimization, University Paris Saclay, 2017.
Transcript
Page 1: Markov decision processes: dynamic programming and applications

Markov decision processes: dynamic programming and applications

Marianne AKIAN
INRIA Saclay - Île-de-France and CMAP, École polytechnique, CNRS
[email protected]

and Jean-Philippe CHANCELIER
CERMICS, École des Ponts ParisTech
[email protected]

M2 Optimization, University Paris Saclay, 2017

Page 2:

Several real-life problems can be modeled as Markov decision processes
(MDP) or stochastic control problems:

• Airline revenue management
• Portfolio selection
• Dam management
• Stock management
• Transportation
• Web PageRank optimisation
• Divorce of birds...

Page 3:

Aim of the course

• model
• apply the dynamic programming approach
• solve dynamic programming equations:
  • with analytical tools (convexity, ...)
  • with numerical algorithms (value and policy iterations, linear programming)

One shall consider essentially stochastic dynamical systems with discrete time
and finite state space, or finite Markov chains,

and go from simple to more sophisticated models/problems:

• deterministic and stochastic problems;
• complete and incomplete observation problems;
• criteria with finite horizon, discounted infinite horizon, stopping time, ergodic criteria, risk-sensitive criteria;
• unconstrained and constrained problems.
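As a preview of the numerical algorithms mentioned above, here is a minimal value-iteration sketch for a finite MDP with discounted infinite-horizon criterion. The 2-state, 2-action data below is a hypothetical toy example made up for illustration, not taken from the course:

```python
import numpy as np

# Hypothetical toy MDP: P[a][s, s'] = transition probability under action a,
# r[a, s] = expected reward for playing action a in state s.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # action 0
              [[0.5, 0.5], [0.1, 0.9]]])  # action 1
r = np.array([[1.0, 0.0],                 # rewards of action 0 per state
              [2.0, -1.0]])               # rewards of action 1 per state
gamma = 0.9  # discount factor in (0, 1)

def value_iteration(P, r, gamma, tol=1e-8):
    """Iterate the dynamic programming (Bellman) operator, a gamma-contraction
    in sup-norm, until the value vector stabilizes."""
    v = np.zeros(P.shape[1])
    while True:
        # q[a, s] = r[a, s] + gamma * sum_{s'} P[a][s, s'] * v[s']
        q = r + gamma * (P @ v)
        v_new = q.max(axis=0)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q.argmax(axis=0)  # value and greedy policy
        v = v_new

v, policy = value_iteration(P, r, gamma)
print("value:", v, "policy:", policy)
```

Convergence follows from the contraction property of the dynamic programming operator (the topic of Lectures 3 and 4 below); the greedy policy read off the fixed point is optimal.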

Page 4:

Possible schedule

Lectures 1-7 are common to the ENSTA course and the M2 course.

Lectures 1 and 2: Dynamic programming equation of deterministic control problems. Problems with Markov chains (without control).

Lectures 3 and 4: Markov decision processes (MDP) with complete state observation. Finite horizon problems. Infinite horizon problems: contraction of the dynamic programming operator, value iteration and policy iteration algorithms.

Lecture 5: Long-term behaviour of Markov chains. Evaluation of mean-payoff/ergodic criteria.

Lecture 6: Practical work on PageRank optimization.

Lecture 7: Modelling and solution of several problems.

Lecture 8: MDP with mean-payoff/ergodic criteria or long-term risk-sensitive criteria.

Lectures 9 and 10: Constrained MDP.

Lectures 11 and 12: Problems with partial observation. Gittins index for multi-armed bandits.
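The policy iteration algorithm of Lectures 3 and 4 alternates exact policy evaluation (a linear solve) with greedy improvement. A minimal sketch on a hypothetical 2-state toy MDP (the numbers are illustrative, not from the course):

```python
import numpy as np

# Hypothetical toy MDP: P[a][s, s'] transition probabilities, r[a, s] rewards.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
r = np.array([[1.0, 0.0],
              [2.0, -1.0]])
gamma = 0.9
n_states = P.shape[1]

def policy_iteration(P, r, gamma):
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly,
        # where P_pi and r_pi are induced by the current policy.
        P_pi = P[policy, np.arange(n_states)]  # row s is P[policy[s], s, :]
        r_pi = r[policy, np.arange(n_states)]
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Policy improvement: act greedily with respect to v.
        q = r + gamma * (P @ v)
        new_policy = q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return v, policy  # policy is stable, hence optimal
        policy = new_policy

v, policy = policy_iteration(P, r, gamma)
print("value:", v, "policy:", policy)
```

Since there are finitely many policies and each iteration improves (or stabilizes) the value, the loop terminates in finitely many steps, typically far fewer than value iteration needs for the same accuracy.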

