Pointers
- Video of Richard Sutton's tutorial, https://www.microsoft.com/en-us/research/video/tutorial-introduction-to-reinforcement-learning-with-function-approximation/
- Some videos from the Boston Dynamics group
Lectures (M. Sebag)
- 24/11 RL_2016_Cours1.pdf
- 9/12 RL_2016_Cours2.pdf
- 18/1 RL_2016_Cours3.pdf
- 23/1, 26/1 revised slides: RL_2016_Cours4.pdf
- 23/1 Lecture by Mehdi Khamassi: Mehdi_Khamassi_2017_RL.pdf
- 1/2 RL_2016_Cours5.pdf
Presentations, 1/2
- Deep Reinforcement Learning for Simulated Autonomous Vehicle Control
- Karim Kouki, Alwine Lambert, Guillaume Lorre
- Mastering the game of Go with Deep Neural Networks & Tree Search
- Aris Tritas, Divya Gover, Ahmed Mazari, Hafed Rhouma
TD1: Markov decision processes [ Exercises here ]
Skeleton code here. Mail: diviyan (at) lri (dot) fr
Here are a few notes to clarify the definitions: ComplementsTD1.pdf
The solutions are here:
Note: some of the code may fail because of a string-formatting error in mdp_grid.valuesString(); remove the calls to this function.
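For orientation on TD1, here is a minimal value-iteration sketch on a toy grid MDP. The states, transitions and rewards below are illustrative assumptions, not the skeleton code linked above.

```python
import numpy as np

# Value iteration on a toy 1-D grid MDP (illustrative assumption, not the TD1 skeleton).
# States 0..4; actions 0 = left, 1 = right; reaching state 4 yields reward +1; state 4 is terminal.
n_states, n_actions, gamma, theta = 5, 2, 0.9, 1e-6

def step(s, a):
    """Deterministic transition model: returns (next_state, reward)."""
    s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    reward = 1.0 if (s_next == n_states - 1 and s != n_states - 1) else 0.0
    return s_next, reward

V = np.zeros(n_states)
while True:
    delta = 0.0
    for s in range(n_states - 1):               # the last state is terminal
        q = [r + gamma * V[s2] for s2, r in (step(s, a) for a in range(n_actions))]
        delta = max(delta, abs(max(q) - V[s]))
        V[s] = max(q)
    if delta < theta:
        break

# Greedy policy with respect to the converged values.
policy = [max(range(n_actions), key=lambda a: step(s, a)[1] + gamma * V[step(s, a)[0]])
          for s in range(n_states - 1)]
print("V =", np.round(V, 3), "policy =", policy)
```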
TD2: Monte Carlo Methods [ Exercises here ]
Solutions:
1: State-values agent
2: State-action values agent
3: Epsilon-greedy policy
4: TD-learning Sarsa
For TD learning: FromMC_to_TD.pdf
For next time: experiment with the agents, implement a decaying ("evolutive") epsilon, and optionally add stochasticity (4.). As an exercise, you can also implement a state/action agent with an eligibility trace (cf. 1., where it is implemented for state values); see the sketch below.
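As a starting point for the decaying-epsilon exercise, here is a minimal tabular Sarsa sketch with an epsilon-greedy policy whose epsilon decays over episodes. The environment interface (reset()/step()) and the decay schedule are assumptions for illustration; this is not the TD2 skeleton or solution code.

```python
import random
from collections import defaultdict

# Tabular Sarsa with a decaying ("evolutive") epsilon-greedy policy.
# `env` is assumed to expose reset() -> state and step(a) -> (state, reward, done),
# e.g. any small grid world; illustrative sketch only.
def epsilon_greedy(Q, state, n_actions, epsilon):
    if random.random() < epsilon:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[(state, a)])

def sarsa(env, n_actions, episodes=500, alpha=0.1, gamma=0.99,
          eps_start=1.0, eps_end=0.05, eps_decay=0.99):
    Q = defaultdict(float)
    epsilon = eps_start
    for _ in range(episodes):
        state = env.reset()
        action = epsilon_greedy(Q, state, n_actions, epsilon)
        done = False
        while not done:
            next_state, reward, done = env.step(action)
            next_action = epsilon_greedy(Q, next_state, n_actions, epsilon)
            # Sarsa backup: bootstrap on the action actually selected next.
            target = reward + (0.0 if done else gamma * Q[(next_state, next_action)])
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state, action = next_state, next_action
        epsilon = max(eps_end, epsilon * eps_decay)   # decaying epsilon schedule
    return Q
```

With eps_decay close to 1 the agent explores for many episodes before becoming mostly greedy; making the schedule too aggressive is a common reason an agent gets stuck on a poor policy.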
TD3: Function approximation [ Exercises here ]
Part 1:
Pybrain library documentation: http://pybrain.org/docs/
+ http://simontechblog.blogspot.fr/2010/08/pybrain-reinforcement-learning-tutorial_15.html
Use this version: pybrain.zip
Installation: https://github.com/pybrain/pybrain/wiki/installation
Part 2: FA.zip
Solution: Q-Learning with Value Function Approximation Solution.ipynb
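To give an idea of what Part 2 involves, here is a minimal sketch of Q-learning with a linear value function approximation (semi-gradient update). The feature map phi and the environment interface are illustrative assumptions and do not come from FA.zip or the solution notebook.

```python
import numpy as np

# Q-learning with a linear approximator: Q(s, a) = w[a] . phi(s).
# phi and the env interface (reset/step) are assumed for illustration.
def q_learning_fa(env, phi, n_actions, n_features, episodes=200,
                  alpha=0.01, gamma=0.99, epsilon=0.1):
    w = np.zeros((n_actions, n_features))          # one weight vector per action
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            feats = phi(state)
            q_vals = w @ feats
            action = (np.random.randint(n_actions) if np.random.rand() < epsilon
                      else int(np.argmax(q_vals)))
            next_state, reward, done = env.step(action)
            # TD target bootstraps on the greedy value of the next state.
            target = reward + (0.0 if done else gamma * np.max(w @ phi(next_state)))
            # Semi-gradient update of the chosen action's weights.
            w[action] += alpha * (target - q_vals[action]) * feats
            state = next_state
    return w
```

For continuous-state tasks such as the mountain car or the inverted pendulum, phi is typically a tile coding or RBF featurisation of the state.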
Projects
(A report of ~2 pages is required, along with the code. Copying existing programs will receive a very poor grade.) Projects are done in groups of at most 3 students, except for the last two subjects (Halite & Alesia), where the limit is 4.
The projects are due on February 24th, 23:59 GMT+1.
Each group must produce:
- A brief report of ~2 pages (3 pages maximum, not counting references), as TeX and .pdf files, including a description of the approach, the results, and a comparison with other algorithms/the state of the art (when possible), using the ICML 2017 format (whose deadline is also on the 24th, fun fact). Those who cannot write TeX can produce a .doc(x) document together with its .pdf. ( Description | ICML2017 TeX package )
- The code of your implemented approach. The code should work "out of the box"; add a notice/readme listing the required packages/libraries and any special notes if needed. Submitting code taken from the internet with little or no modification will have unwanted consequences.
You can discuss your project's problems/ideas and ask for more information at: diviyan (at) lri (dot) fr
The subjects are the following (in increasing order of difficulty):
- Mountain car problem (compare two approaches)
- Belkham Fella, Medjkoune Nawel and Sorostinean Mihaela
- Mohamed Abdelkhalek
- Inverted pendulum (compare two representations of the problem)
- The acrobot
- Jonathan Crouzet
- Octopus
- Laurent Cetinsoy and Clément Thierry
- TD-Gammon
- Xiaoxiao CHEN/Yuxiang WANG/Honglin LI/Dong FEI
- Ahmed MAZARI & Divya GROVER
- Aris TRITAS & Hafed RHOUMA
- Gabriel Quéré, Florence Carton, Alvaro Correia
- FATHALLAH Mohamed Ali, Amal TARGHI, Katia SANA
- Bicycle: balancing + moving forward
- Abdelhadi Temmar, Stephen Batifol, Nicolas Bougie
- Anti-Imitation Policy learning: reproduce an experiment from mainDIVA.pdf
- halite.io
- Force Fidele KIEN, XIyu ZHANG, Yaohui WANG, Herilalaina RAKOTOARISON.
- Guillaume Lorre, Gabriel Bellard, Karim Kouki and Lambert Alwine
- Alesia game (see Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games, ICML 15) Alesia_game.zip