Part of Advances in Neural Information Processing Systems 37 (NeurIPS 2024) Main Conference Track
Sergey Samsonov, Eric Moulines, Qi-Man Shao, Zhuo-Song Zhang, Alexey Naumov
In this paper, we obtain the Berry–Esseen bound for multivariate normal approximation for the Polyak-Ruppert averaged iterates of the linear stochastic approximation (LSA) algorithm with decreasing step size. Moreover, we prove the non-asymptotic validity of the confidence intervals for parameter estimation with LSA based on multiplier bootstrap. This procedure updates the LSA estimate together with a set of randomly perturbed LSA estimates upon the arrival of subsequent observations. We illustrate our findings in the setting of temporal difference learning with linear function approximation.