NeurIPS 2019
The paper considers regret minimization in infinite-horizon undiscounted MDPs using the idea of an "exploration bonus": the environment is explored by planning over an MDP whose rewards are perturbed by a bonus that scales inversely with the number of times each state-action pair has been visited. Based on this idea, new online algorithms are developed; while their regret guarantees do not improve over previous work, the algorithms are computationally efficient, in contrast to existing methods. The paper received solid support from all three reviewers, who appreciated the technical quality of the work and its advance over previous work (in particular, Bartlett & Tewari '09) in terms of computational tractability.
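The count-based bonus mechanism described above can be illustrated with a minimal sketch (assumptions: NumPy arrays, a bonus constant `c`, and the inverse-count scaling stated in the summary; the paper's actual bonus, constants, and confidence terms differ, and related algorithms often scale the bonus with the inverse square root of the count instead):

```python
import numpy as np

def bonus_perturbed_rewards(R, N, c=1.0):
    """Rewards perturbed by a count-based exploration bonus.

    R: (S, A) array of empirical mean rewards.
    N: (S, A) array of visit counts per state-action pair.
    c: bonus scale (illustrative constant, not from the paper).
    """
    # The bonus shrinks as a state-action pair accumulates visits,
    # so planning on the perturbed rewards steers the agent toward
    # under-explored pairs; max(N, 1) avoids division by zero.
    return R + c / np.maximum(N, 1)
```

Planning (e.g., value iteration) is then run on the single MDP with these perturbed rewards, which is roughly what makes such bonus-based approaches cheaper than methods that must optimize over an entire confidence set of plausible MDPs.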