On the Minimax Regret for Contextual Linear Bandits and Multi-Armed Bandits with Expert Advice

Part of Advances in Neural Information Processing Systems 37 (NeurIPS 2024) Main Conference Track

Bibtex Paper

Authors

Shinji Ito

Abstract

This paper examines two extensions of multi-armed bandit problems: multi-armed bandits with expert advice and contextual linear bandits. For the former problem, multi-armed bandits with expert advice, the previously known best upper and lower bounds have been $O(\sqrt{KT \log \frac{N}{K} })$ and $\Omega( \sqrt{KT \frac{ \log N }{\log K }} )$, respectively. Here, $K$, $N$, and $T$ represent the numbers of arms, experts, and rounds, respectively. We provide a lower bound of $\Omega( \sqrt{KT \log \frac{N}{K}} )$ for the setup in which the player chooses an expert before observing the advices in each round. For the latter problem, contextual linear bandits, we provide an algorithm that achieves $O ( \sqrt{d T \log ( K \min\{ 1, \frac{S}{d} \} )} )$ together with a matching lower bound, where $d$ and $S$ represent the dimensionality of feature vectors and the size of the context space, respectively.