
Module Reinforcement Learning, Freek Stulp and Michele Sebag


The objectives of this course are to understand and acquire practical experience with:
  • The definition of Reinforcement Learning (RL) problems, i.e. Markov Decision Processes.
  • Solution methods for model-based/model-free discrete RL problems.
  • Value function approximation for continuous RL problems.
  • Direct policy search with parameterized policies.
  • Applications of RL to robotic tasks.
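
As an illustration of the first two objectives, here is a minimal sketch of a Markov Decision Process solved with value iteration (a model-based method). The three-state chain, the array layout `P[a, s, s']` and `R[s, a]`, and the function name `value_iteration` are assumptions made for this example; they are not taken from the course material.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Value iteration for a finite MDP.

    P[a, s, t] is the probability of moving from state s to state t
    under action a; R[s, a] is the expected immediate reward.
    Returns the state values and a greedy policy.
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Bellman backup: Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

# Hypothetical 3-state chain: action 0 = left, action 1 = right.
# State 2 is absorbing; entering it from state 1 yields reward 1.
P = np.zeros((2, 3, 3))
P[0, 0, 0] = P[0, 1, 0] = P[0, 2, 2] = 1.0  # left
P[1, 0, 1] = P[1, 1, 2] = P[1, 2, 2] = 1.0  # right
R = np.zeros((3, 2))
R[1, 1] = 1.0

V, policy = value_iteration(P, R)
print(V, policy)
```

With a discount factor of 0.9, the values converge to 0.9, 1.0 and 0.0 for states 0, 1 and 2, and the greedy policy moves right in states 0 and 1.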

Apart from the theoretical background provided in the lectures (cours magistraux), students will acquire hands-on experience by implementing a variety of discrete/continuous RL algorithms in Python during the lab hours (travaux pratiques).

Prerequisite: a basic knowledge of Python.
Students who do not know Python well (e.g. matrices) have the option to do the labs in Matlab as a fall-back solution.


7 modules of 3 hours each; slides in English; exam: 3 hours (written exam: questions, problems) + oral: presentation of an article.
Location: ENSTA (or PUIO).
Each course: 1 hour lecture + 2 hours programming.
Book: Reinforcement Learning: An Introduction. R. Sutton and A. Barto.


Introduction (1 module)

Model-based, discrete search space (2 modules)

Model-free, discrete (1 module)

Model-free, continuous (2 modules)

Extensions (1 module, not taken into account for the exam)


Implementation Language

  • Matlab
    • more students know it
    • integrated development environment
    • easy to visualize values etc.
    • we don't need advanced Python features anyway
  • Python
    • fits better in a big data context


  • Dynamic programming
  • Discrete Q-Value estimation with Monte Carlo and TD methods
  • Value function approximation
  • Direct policy search
  • Multi-Armed Bandits
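
To make the "Discrete Q-Value estimation with ... TD methods" topic concrete, here is a minimal sketch of tabular Q-learning, one common TD method, used in a model-free setting. The chain environment, the `step` and `q_learning` names, and all hyperparameter values are assumptions chosen for illustration, not the lab assignment itself.

```python
import numpy as np

N_STATES, N_ACTIONS, GOAL = 3, 2, 2

def step(s, a):
    """Hypothetical chain: action 0 moves left, action 1 moves right.
    Reaching GOAL gives reward 1 and ends the episode."""
    s_next = max(s - 1, 0) if a == 0 else s + 1
    return s_next, float(s_next == GOAL)

def q_learning(n_episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(n_episodes):
        s = 0
        while s != GOAL:
            # Explore with probability epsilon, otherwise act greedily.
            if rng.random() < epsilon:
                a = int(rng.integers(N_ACTIONS))
            else:
                a = int(Q[s].argmax())
            s_next, r = step(s, a)
            # TD update towards r + gamma * max_a' Q(s', a').
            # Q[GOAL] is never updated, so it stays at zero.
            target = r + gamma * Q[s_next].max()
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q

Q = q_learning()
print(Q)
```

Unlike dynamic programming, this sketch never reads the transition model directly: it learns only from sampled transitions returned by `step`.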

Contributor(s) to this page: sebag.
Page last modified on Thursday 06 August 2015 10:33:30 CEST by sebag.