Objectives

The objectives of this course are to understand and acquire practical experience with:

The definition of Reinforcement Learning (RL) problems, i.e. Markov Decision Processes.
Solution methods for model-based/model-free discrete RL problems.
Value function approximation for continuous RL problems.
Direct policy search with parameterized policies.
Applications of RL to robotic tasks.

Apart from the theoretical background provided in the lectures (cours magistraux), students will acquire hands-on experience by implementing a variety of discrete/continuous RL algorithms in Python during the lab hours (travaux pratiques).

Prerequisite: basic knowledge of Python.
Students who do not know Python well (matrices) have the option of doing the lab work in Matlab as a fall-back solution.

Modalities

7 modules of 3 hours each; slides in English; exam: 3 hours (written exam: questions, problems) + oral: presentation of an article.
Location: ENSTA (or PUIO).
Each module: 1 hour of lecture + 2 hours of programming.
Book: Reinforcement Learning: An Introduction. R. Sutton and A. Barto.

Lectures

Introduction (1 module)

Model-based, discrete search space (2 modules)

Model-free, discrete (1 module)

Model-free, continuous (2 modules)

Extensions (1 module, not taken into account for the exam)

Programming

Implementation Language

Matlab
- more students know it
- integrated development environment
- easy to visualize values etc.
- we don't need advanced Python features anyway
Python
- fits better in a big data context

Exercises

Dynamic programming
Discrete Q-Value estimation with Monte Carlo and TD methods
Value function approximation
Direct policy search
Multi-Armed Bandits
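
To give a flavour of the first exercise, here is a minimal sketch of dynamic programming (value iteration) in plain Python. It is not the actual lab material: the two-state MDP, the names TRANSITIONS, value_iteration and greedy_policy, and the discount factor 0.9 are all assumptions chosen for illustration.

```python
# Hypothetical 2-state MDP, invented for this illustration only.
# TRANSITIONS[state][action] = list of (probability, next_state, reward)
TRANSITIONS = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 1.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "go": [(1.0, 0, 0.0)]},
}


def q_value(transitions, values, state, action, gamma):
    """Expected return of taking `action` in `state`, then following `values`."""
    return sum(
        p * (r + gamma * values[s2]) for p, s2, r in transitions[state][action]
    )


def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Apply the Bellman optimality backup until the values stop changing."""
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s in transitions:
            best = max(
                q_value(transitions, values, s, a, gamma) for a in transitions[s]
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best  # in-place update (values are reused immediately)
        if delta < tol:
            return values


def greedy_policy(transitions, values, gamma=0.9):
    """Pick, in each state, the action with the highest one-step lookahead value."""
    return {
        s: max(
            transitions[s],
            key=lambda a: q_value(transitions, values, s, a, gamma),
        )
        for s in transitions
    }


if __name__ == "__main__":
    v = value_iteration(TRANSITIONS)
    print(v)
    print(greedy_policy(TRANSITIONS, v))
```

The sketch is model-based: it assumes the transition probabilities and rewards are known, which is what distinguishes dynamic programming from the Monte Carlo and TD exercises that follow it in the list.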

Module Reinforcement Learning, Freek Stulp and Michele Sebag