Objectives
The objectives of this course are to understand and acquire practical experience with:
- The definition of Reinforcement Learning (RL) problems, i.e. Markov Decision Processes.
- Solution methods for model-based/model-free discrete RL problems.
- Value function approximation for continuous RL problems.
- Direct policy search with parameterized policies.
- Applications of RL to robotic tasks.
Apart from the theoretical background provided in the lectures (cours magistraux), students will acquire hands-on experience by implementing a variety of discrete/continuous RL algorithms in Python during the lab hours (travaux pratiques).
Prerequisite: a basic knowledge of Python.
Students who are not comfortable with Python (matrices in particular) may do the lab work in Matlab as a fall-back solution.
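To give a rough idea of the level of matrix handling involved, here is a minimal sketch of value iteration on a tiny Markov Decision Process. It is illustrative only and not taken from the course material: the two-state MDP, its numbers, the function name `value_iteration`, and the use of NumPy are all assumptions.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (illustrative numbers only).
# P[a, s, s2] = probability of moving from state s to state s2 under action a.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],   # action 0: stay in the current state
    [[0.0, 1.0], [1.0, 0.0]],   # action 1: switch to the other state
])
# R[s, a] = immediate reward for taking action a in state s.
R = np.array([
    [0.0, 1.0],
    [2.0, 0.0],
])
gamma = 0.9  # discount factor


def value_iteration(P, R, gamma, tol=1e-8, max_iter=10_000):
    """Return the (approximately) optimal state values and a greedy policy."""
    V = np.zeros(R.shape[0])
    for _ in range(max_iter):
        # (P @ V)[a, s] is the expected next-state value; transpose to [s, a].
        Q = R + gamma * (P @ V).T
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return V, Q.argmax(axis=1)


V, policy = value_iteration(P, R, gamma)
print(V, policy)
```

Students who can read this snippet (array indexing, matrix products, reductions along an axis) should have enough Python for the lab sessions; the same operations have direct Matlab equivalents.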
Modalities
7 modules of 3 hours each; slides in English; exam: 3 hours (written exam: questions, problems) + oral presentation of an article.
Location: ENSTA (or PUIO).
Each module: 1 hour of lecture + 2 hours of programming.
Book: Reinforcement Learning: An Introduction. R. Sutton and A. Barto.
Lectures
Introduction (1 module)
Model-based, discrete search space (2 modules)
Model-free, discrete (1 module)
Model-free, continuous (2 modules)
Extensions (1 module, not taken into account for the exam)
Programming
Implementation Language
- Matlab
  - more students know it
  - integrated development environment
  - easy to visualize values etc.
  - we don't need advanced Python features anyway
- Python
  - fits better in a big data context
Exercises
- Dynamic programming
- Discrete Q-Value estimation with Monte Carlo and TD methods
- Value function approximation
- Direct policy search
- Multi-Armed Bandits
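As a flavor of the discrete Q-value exercise listed above, the following is a minimal sketch of one temporal-difference update on a Q-table. It uses the Q-learning rule as one possible TD method; the function name `q_learning_update`, the table size, and the step-size and discount values are assumptions made for illustration, not the course's reference solution.

```python
import numpy as np


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9, done=False):
    """Apply one Q-learning (TD) update to the table Q in place and return it."""
    # TD target: immediate reward plus the discounted best next-state value,
    # or just the reward if the episode ended on this transition.
    target = r if done else r + gamma * np.max(Q[s_next])
    # Move the current estimate a fraction alpha towards the target.
    Q[s, a] += alpha * (target - Q[s, a])
    return Q


# Hypothetical table with 3 states and 2 actions, initialised to zero.
Q = np.zeros((3, 2))
# One observed transition: state 0, action 1, reward 1.0, next state 2.
Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=2)
print(Q)
```

A Monte Carlo estimate would instead wait until the end of an episode and update towards the full observed return rather than a bootstrapped target.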