ROIDICE: Offline Return on Investment Maximization for Efficient Decision Making

Part of Advances in Neural Information Processing Systems 37 (NeurIPS 2024) Main Conference Track

Bibtex Paper

Authors

Woosung Kim, Hayeong Lee, Jongmin Lee, Byung-Jun Lee

Abstract

In this paper, we propose a novel policy optimization framework that maximizes Return on Investment (ROI) of a policy using a fixed dataset within a Markov Decision Process (MDP) equipped with a cost function. ROI, defined as the ratio between the return and the accumulated cost of a policy, serves as a measure of efficiency of the policy. Despite the importance of maximizing ROI in various applications, it remains a challenging problem due to its nature as a ratio of two long-term values: return and accumulated cost. To address this, we formulate the ROI maximizing reinforcement learning problem as a linear fractional programming. We then incorporate the stationary distribution correction (DICE) framework to develop a practical offline ROI maximization algorithm.Our proposed algorithm, ROIDICE, yields an efficient policy that offers a superior trade-off between return and accumulated cost compared to policies trained using existing frameworks.