  1. Competitive experience replay
  2. Episodic Curiosity through Reachability
  3. Diversity is All You Need: Learning Skills without a Reward Function

Value functions (3)

  1. Value Propagation Networks
  2. Recall Traces: Backtracking Models for Efficient Reinforcement Learning
  3. Soft Q-Learning with Mutual-Information Regularization

Imitation learning, RL from observational data (4)

  1. Model-Predictive Policy Learning with Uncertainty Regularization for Driving in Dense Traffic
  2. Learning what you can do before doing anything
  3. Intrinsically motivated reinforcement learning for human-robot interaction in the real-world
  4. Learning driving styles for autonomous vehicles from demonstration

Multi-Armed Bandits

  1. The K-armed Dueling Bandits Problem
  2. Restless bandits: indexability and computation of Whittle index
  3. Risk-Aversion in Multi-armed Bandits

Reinforcement Learning for Games (6)

  1. Human-level control through deep reinforcement learning
  2. Deep Reinforcement Learning with Double Q-learning
  3. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
  4. The Predictron: End-To-End Learning and Planning
  5. ELF OpenGo: an analysis and open reimplementation of AlphaZero
  6. Solving the Rubik's Cube with Approximate Policy Iteration

RL for Robotics

  1. Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection
  2. Sim-to-Real Robot Learning from Pixels with Progressive Nets

Exploration / Exploitation (3)

  1. Plan Online, Learn Offline: Efficient Learning and Exploration via Model-Based Control
  2. Information-Directed Exploration for Deep Reinforcement Learning
  3. Off-Policy Deep Reinforcement Learning without Exploration


  1. Reward Constrained Policy Optimization
  2. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor