Tuesday, 2nd of July

14h30 (room R2014, 660 building)

Reda Alami

(Orange / LRI)

Memory bandits for decision-making in dynamic environments. Application to 5G optimization.

In this talk, we build the next generation of multi-armed bandits for non-stationary environments. We call them Memory Bandits. They combine a MAB solver (Thompson Sampling, KL-UCB, Bayes-UCB, ...) with the Bayesian online change-point detector. We also present a modified version of this detector that is easier to analyze in terms of false alarms and detection delay. Then, we present two industrial applications of multi-armed bandits in the context of 5G optimization. Finally, we introduce the decentralized exploration problem in the multi-armed bandit paradigm, with a first generic solution called decentralized elimination.
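
As a rough illustration of the kind of combination the abstract describes, the sketch below pairs Thompson Sampling on Bernoulli arms with a Bayesian online change-point detector kept per arm. It is not the speaker's algorithm: the constant hazard rate, the Beta(1, 1) prior, the absence of run-length pruning, and the names BernoulliBOCPD, sample_mean and run are all assumptions made for this example.

```python
import random


class BernoulliBOCPD:
    """Bayesian online change-point detector for a 0/1 stream.

    Assumptions of this sketch: constant hazard rate, Beta(1, 1) prior,
    and no pruning of unlikely run lengths (memory grows with time).
    """

    def __init__(self, hazard=0.01):
        self.hazard = hazard      # assumed per-step probability of a change
        self.run_probs = [1.0]    # posterior over the current run length
        self.alphas = [1.0]       # Beta parameters for each run-length hypothesis
        self.betas = [1.0]

    def update(self, x):
        # Predictive probability of x under each run-length hypothesis.
        preds = [(a if x == 1 else b) / (a + b)
                 for a, b in zip(self.alphas, self.betas)]
        joint = [p * q for p, q in zip(self.run_probs, preds)]
        change = self.hazard * sum(joint)                 # run length resets to 0
        growth = [(1 - self.hazard) * j for j in joint]   # run length grows by 1
        new_probs = [change] + growth
        total = sum(new_probs)
        self.run_probs = [p / total for p in new_probs]
        # Index 0 restarts from the prior; the others absorb the observation.
        self.alphas = [1.0] + [a + x for a in self.alphas]
        self.betas = [1.0] + [b + (1 - x) for b in self.betas]

    def sample_mean(self, rng):
        # Thompson step: draw a run length, then a mean from its Beta posterior.
        i = rng.choices(range(len(self.run_probs)), weights=self.run_probs)[0]
        return rng.betavariate(self.alphas[i], self.betas[i])


def run(means_before, means_after, change_at, horizon, seed=0):
    """Play a Bernoulli bandit whose arm means switch once at `change_at`.

    Returns how many times each arm was pulled after the switch.
    """
    rng = random.Random(seed)
    arms = [BernoulliBOCPD() for _ in means_before]
    pulls_after = [0] * len(arms)
    for t in range(horizon):
        means = means_before if t < change_at else means_after
        k = max(range(len(arms)), key=lambda i: arms[i].sample_mean(rng))
        reward = 1 if rng.random() < means[k] else 0
        arms[k].update(reward)
        if t >= change_at:
            pulls_after[k] += 1
    return pulls_after


if __name__ == "__main__":
    print(run([0.9, 0.1], [0.1, 0.9], change_at=500, horizon=1000))
```

Because each arm samples its mean from a posterior that is weighted by run length, observations made before a likely change carry less weight, which is the "memory" aspect this sketch tries to convey. The printed pull counts depend on the seed and on the assumed hazard rate.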

Contact: guillaume.charpiat at
