March 31th

14:30 , R2014 Digiteo Shannon (660) (see location):

Bruno Scherrer (Inria Nancy)

Title : Non-Stationary Modified Policy Iteration

(joint work with Boris Lesner)

Abstract :

I will consider the infinite-horizon $\gamma$-discounted optimal control
problem formalized by Markov Decision Processes. Running any instance
of Modified Policy Iteration---a family of algorithms that can
interpolate between Value and Policy Iteration---with an error
$\epsilon$ at each iteration is known to lead to stationary policies
that are at least $\frac{2\gamma\epsilon}{(1-\gamma)

2}$-optimal.
Variations of Value and Policy Iteration, that build $\ell$-periodic
non-stationary policies, have recently been shown to display a better
$\frac{2\gamma\epsilon}{(1-\gamma)(1-\gamma

\ell)}$-optimality
guarantee. I will describe a new algorithmic
scheme, Non-Stationary Modified Policy Iteration, a family of
algorithms parameterized by two integers $m \ge 0$ and $\ell \ge 1$
that generalizes all the above mentionned algorithms. While $m$ allows
to interpolate between Value-Iteration-style and Policy-Iteration-style updates,
$\ell$ specifies the period of the non-stationary policy that is
output.

I will present results that show that this new family of algorithms also enjoys the
improved $\frac{2\gamma\epsilon}{(1-\gamma)(1-\gamma^\ell)}$-optimality
guarantee. Perhaps more importantly, I will show, by exhibiting an
original problem instance, that this guarantee is tight for all $m$
and $\ell$; this tightness was to our knowledge only known in two
specific cases, Value Iteration $(m=0,\ell=1)$ and Policy Iteration
$(m=\infty,\ell=1)$. This constitutes a unification of
many dynamic programming algorithms and their analysis
of sensitivity to noise.

Contact: cyril.furtlehner@inria.fr