2016 Module Reinforcement Learning, Michele Sebag, Diviyan Kalainathan


  1. Video of Richard Sutton's tutorial,
  2. Some videos of the Boston Dynamics group

Cours (M. Sebag)

Presentations 1/2

TD1 : Markov decision processes [ Exercises here ]

Skeleton code here

Mail : diviyan (at) lri (dot) fr
Here are a few notes to clarify the definitions : ComplementsTD1.pdf
The solutions are here :
Note : Some code may fail because of an error in the string formatting in mdp_grid.valuesString(); remove the calls to this function.
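As a reminder of the dynamic programming at the heart of this TD, here is a minimal value-iteration sketch on a toy 1D grid. The states, rewards, and `step` helper are invented for this illustration and are not the skeleton code's API:

```python
import numpy as np

# Hypothetical 1D grid-world: states 0..4, actions 0 (left) and 1 (right).
# Reaching state 4 (terminal) gives reward 1; every other step costs -0.04.
N_STATES, GAMMA, THETA = 5, 0.9, 1e-6

def step(s, a):
    """Deterministic transition: returns (next_state, reward)."""
    if s == N_STATES - 1:          # terminal state absorbs with zero reward
        return s, 0.0
    s2 = max(0, s - 1) if a == 0 else s + 1
    return s2, (1.0 if s2 == N_STATES - 1 else -0.04)

V = np.zeros(N_STATES)
while True:                        # value iteration: V(s) <- max_a [r + gamma V(s')]
    delta = 0.0
    for s in range(N_STATES):
        best = max(step(s, a)[1] + GAMMA * V[step(s, a)[0]] for a in (0, 1))
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < THETA:              # stop when the value function is stable
        break

policy = [max((0, 1), key=lambda a: step(s, a)[1] + GAMMA * V[step(s, a)[0]])
          for s in range(N_STATES)]
print(policy)   # greedy action in each state
```

On this chain the greedy policy moves right in every non-terminal state, since the only positive reward lies at the right end.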

TD2 : Monte Carlo Methods [ Exercises here ]

Solutions :
1 : State-values agent
2 : State-action values agent
3 : Epsilon-greedy policy
4 : TD-learning Sarsa

For TD learning : FromMC_to_TD.pdf
For next time : experiment with the agents, implement a decaying (evolutive) epsilon, and optionally add stochasticity (4.). As an exercise, you can also implement a state-action agent with eligibility traces (cf. 1., where it is implemented for state values).
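The Sarsa agent (4.) with the decaying epsilon asked for above can be sketched as follows; the toy chain environment, constants, and function names are illustrative, not the course skeleton's API:

```python
import random

# Tabular Sarsa with a decaying epsilon-greedy policy on a hypothetical chain:
# states 0..5, action 0 = left, 1 = right; reaching state 5 gives reward 1.
N_STATES, ACTIONS = 6, (0, 1)
ALPHA, GAMMA = 0.1, 0.95
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else s + 1
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done

def eps_greedy(s, eps):
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(s, a)])

random.seed(0)
for episode in range(500):
    eps = max(0.05, 1.0 / (1 + episode))   # decaying exploration rate, floored
    s, a, done = 0, None, False
    a = eps_greedy(s, eps)
    while not done:
        s2, r, done = step(s, a)
        a2 = eps_greedy(s2, eps)
        # Sarsa update: Q(s,a) <- Q(s,a) + alpha [r + gamma Q(s',a') - Q(s,a)]
        target = r + (0.0 if done else GAMMA * Q[(s2, a2)])
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s, a = s2, a2

greedy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
print(greedy)   # greedy policy extracted from the learned Q
```

Unlike Q-learning, Sarsa bootstraps on the action actually taken (`a2`), so it learns the value of the epsilon-greedy policy itself; the epsilon floor keeps a minimum of exploration.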

TD3 : Function approximation [ Exercises here ]

Part 1 :

Pybrain library documentation :
use this version :

Install :

Part 2 :

Solution : Q-Learning with Value Function Approximation Solution.ipynb
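To give the idea behind the solution notebook, here is a minimal sketch of semi-gradient Q-learning with a linear value-function approximation, Q(s,a) = w_a · φ(s), on a toy chain. The environment, feature map, and constants are assumptions for this example, not the notebook's code:

```python
import numpy as np

# Linear VFA with one weight vector per action; a one-hot phi makes this
# reduce exactly to tabular Q-learning, but any feature map would fit here.
rng = np.random.default_rng(0)
N_STATES, N_ACTIONS = 6, 2
ALPHA, GAMMA = 0.05, 0.95

def phi(s):
    """One-hot feature vector for state s."""
    x = np.zeros(N_STATES)
    x[s] = 1.0
    return x

W = np.zeros((N_ACTIONS, N_STATES))

def step(s, a):
    """Toy chain: reaching state 5 gives reward 1 and ends the episode."""
    s2 = max(0, s - 1) if a == 0 else s + 1
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done

for episode in range(500):
    eps = max(0.05, 1.0 / (1 + episode))
    s, done = 0, False
    while not done:
        a = int(rng.integers(N_ACTIONS)) if rng.random() < eps \
            else int(np.argmax(W @ phi(s)))
        s2, r, done = step(s, a)
        # semi-gradient Q-learning update:
        # w_a <- w_a + alpha [r + gamma max_a' Q(s',a') - Q(s,a)] grad_w Q(s,a)
        target = r + (0.0 if done else GAMMA * np.max(W @ phi(s2)))
        td_error = target - W[a] @ phi(s)
        W[a] += ALPHA * td_error * phi(s)
        s = s2

greedy = [int(np.argmax(W @ phi(s))) for s in range(N_STATES - 1)]
print(greedy)   # greedy policy under the learned linear Q
```

The gradient of a linear Q with respect to w_a is just φ(s), which is why the update multiplies the TD error by the feature vector.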


(A 2-page report is required, along with the code. Plagiarizing existing programs will receive a very poor grade.)

Projects are done in groups of at most 3 students, except for the two last subjects (Halite & Alesia), where the limit is 4.
Projects are due on February 24th, 23:59 GMT+1.

Each group must produce :
  1. A brief report of ~2 pages (max 3 pages excluding references), as TeX and .pdf files, including a description of the approach, results, and a comparison with other algorithms / the state of the art (when possible), using the ICML 2017 format (whose deadline is also on the 24th, fun fact). People unable to write TeX can produce a .doc(x) document along with its .pdf. ( Description | ICML2017 TeX package )
  2. The code of your implemented approach. The code should work "out of the box"; add a notice/readme listing the required packages/libraries and any special notes. Submitting code taken from the internet with no or minimal modifications will lead to unwanted consequences.

You can discuss your project's problems/ideas and ask for more information at : diviyan (at) lri (dot) fr

The subjects are the following (in order of increasing difficulty):
  1. Mountain car problem (compare two approaches)
    1. Belkham Fella, Medjkoune Nawel et Sorostinean Mihaela
    2. Mohamed Abdelkhalek
  2. Inverted pendulum (compare two representations of the problem)
  3. The acrobot
    1. Jonathan Crouzet
  4. Octopus
    1. Laurent Cetinsoy et Clément Thierry
  5. Td-gammon
    1. Xiaoxiao CHEN/Yuxiang WANG/Honglin LI/Dong FEI
    2. Ahmed MAZARI & Divya GROVER
    3. Aris TRITAS & Hafed RHOUMA
    4. Gabriel Quéré, Florence Carton, Alvaro Correia
    5. FATHALLAH Mohamed Ali, Amal TARGHI, Katia SANA
  6. bicycle: equilibrium + advancing
    1. Abdelhadi Temmar, Stephen Batifol, Nicolas Bougie
  7. Anti-Imitation Policy learning: reproduce an experiment from mainDIVA.pdf
    1. Force Fidele KIEN, XIyu ZHANG, Yaohui WANG, Herilalaina RAKOTOARISON.
    2. Guillaume Lorre, Gabriel Bellard, Karim Kouki et Lambert Alwine
  8. Halite
  9. Alesia game (see Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games, ICML 15)

Contributor(s) to this page: Diviyan and sebag.
Page last modified on Sunday, 19 November 2017 21:18:48 CET by Diviyan.