Rewards
- Competitive experience replay
- Episodic Curiosity through Reachability
- Diversity is All You Need: Learning Skills without a Reward Function
Value functions (3)
- Value Propagation Networks
- Recall Traces: Backtracking Models for Efficient Reinforcement Learning
- Soft Q-Learning with Mutual-Information Regularization
Imitation learning, RL from observational data (4)
- Model-Predictive Policy Learning with Uncertainty Regularization for Driving in Dense Traffic
- Learning what you can do before doing anything
- Intrinsically motivated reinforcement learning for human-robot interaction in the real-world
- Learning driving styles for autonomous vehicles from demonstration
Multi-Armed Bandits
- The K-armed Dueling Bandits Problem
- Restless bandits: indexability and computation of Whittle index
- Risk-Aversion in Multi-armed Bandits
Reinforcement Learning for Games (6)
- Human-level control through deep reinforcement learning
- Deep Reinforcement Learning with Double Q-learning
- Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
- The Predictron: End-To-End Learning and Planning
- ELF OpenGo: an analysis and open reimplementation of AlphaZero
- Solving the Rubik's Cube with Approximate Policy Iteration
RL for Robotics
- Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection
- Sim-to-Real Robot Learning from Pixels with Progressive Nets
Exploration / Exploitation (3)
- Plan Online, Learn Offline: Efficient Learning and Exploration via Model-Based Control
- Information-Directed Exploration for Deep Reinforcement Learning
- Off-Policy Deep Reinforcement Learning without Exploration
Actor-Critic
- Reward Constrained Policy Optimization
- Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor