Seminar02032017

February 14th, 14:30 (building 660) (see location):

Marta Soare (Aalto University)


Abstract:


When making a decision in an unknown environment, a learning agent decides at every step whether to gather more information on the environment (explore), or to choose what seems to be the best action given the current information (exploit). The multi-armed bandit setting is a simple framework that captures this exploration-exploitation trade-off and offers efficient solutions for sequential decision making. In this talk, I will review a particular multi-armed bandit setting, where there is a global linear structure in the environment. I will then show how this structure can be exploited for finding the best action using a minimal number of steps and for deciding when to transfer samples to improve the performance in other similar environments.
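To make the exploration-exploitation trade-off concrete, here is a minimal sketch of a classic baseline strategy, epsilon-greedy, on a stochastic multi-armed bandit. This is an illustrative example only, not the linear-bandit or transfer methods presented in the talk; the arm means, noise model, and parameter values are assumptions chosen for the demonstration.

```python
import random

def epsilon_greedy_bandit(true_means, n_steps=10000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy agent on a stochastic multi-armed bandit.

    With probability `epsilon` the agent explores (pulls a random arm);
    otherwise it exploits the arm with the highest empirical mean.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # empirical mean reward per arm

    for _ in range(n_steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                           # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = true_means[arm] + rng.gauss(0, 1)   # noisy reward draw
        counts[arm] += 1
        # incremental update of the running mean for the pulled arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts

# Hypothetical 3-armed bandit; arm 2 has the highest true mean reward.
estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
```

After enough steps the empirical means concentrate around the true means and the agent pulls the best arm most of the time. The linear setting discussed in the talk goes further: rewards share a global linear structure, so a pull of one arm is also informative about the others, which is what allows best-arm identification in fewer steps.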



Contact: guillaume.charpiat at inria.fr
