Tao

Seminar02032017

March 2nd

14:30, Shannon amphitheatre, building 660 (see location):

Marta Soare (Aalto University)


Title: Sequential Decision Making in Linear Bandit Setting


Abstract:


When making a decision in an unknown environment, a learning agent decides at every step whether to gather more information on the environment (explore), or to choose what seems to be the best action given the current information (exploit). The multi-armed bandit setting is a simple framework that captures this exploration-exploitation trade-off and offers efficient solutions for sequential decision making. In this talk, I will review a particular multi-armed bandit setting, where there is a global linear structure in the environment. I will then show how this structure can be exploited for finding the best action using a minimal number of steps and for deciding when to transfer samples to improve the performance in other similar environments.
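For readers unfamiliar with the setting described above, the following short sketch illustrates one standard optimism-based selection rule for linear bandits (a LinUCB/OFUL-style rule, given here only as background and not as the algorithm presented in the talk). All quantities in it, such as the arm set, noise level, and exploration width alpha, are assumptions chosen for the example.

import numpy as np

# Illustrative linear bandit sketch (not the speaker's method).
# Each arm is a feature vector x in R^d; the expected reward is <theta*, x>
# for an unknown parameter theta*. At each step the learner picks the arm
# maximizing an optimistic upper bound on its estimated reward, which
# balances exploitation (estimated reward) and exploration (uncertainty).

rng = np.random.default_rng(0)
d, n_arms, horizon = 5, 20, 1000
arms = rng.normal(size=(n_arms, d))      # hypothetical finite arm set
theta_star = rng.normal(size=d)          # unknown true parameter

lam, alpha = 1.0, 1.0                    # ridge regularization, exploration width
A = lam * np.eye(d)                      # regularized design matrix: sum of x x^T + lam I
b = np.zeros(d)                          # sum of reward-weighted features

for t in range(horizon):
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b                # regularized least-squares estimate of theta*
    # Optimistic score: estimated reward plus a confidence width per arm
    widths = np.sqrt(np.einsum('ij,jk,ik->i', arms, A_inv, arms))
    scores = arms @ theta_hat + alpha * widths
    x = arms[np.argmax(scores)]          # exploration-exploitation handled by one rule
    reward = x @ theta_star + rng.normal(scale=0.1)   # noisy linear reward
    A += np.outer(x, x)
    b += reward * x

print("estimation error:", np.linalg.norm(theta_hat - theta_star))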



Contact: guillaume.charpiat at inria.fr

