NeurIPS 2019
Sun, Dec 8th through Sat, Dec 14th, 2019, at the Vancouver Convention Center
The paper contributes significant results that quantify the impact, and the optimal use, of partial information due to sub-sampling in stochastic multi-armed bandits, an important class of online learning problems. In a sense, this extends partial information from the "space" domain (bandit feedback on the arms) to the "time" domain, where feedback cannot be collected in every round. The submission was appreciated unanimously by the reviewers, and this view carried through the post-response discussion among them. [This meta-review was reviewed and revised by the Program Chairs]