Inverse Factorized Soft Q-Learning for Cooperative Multi-agent Imitation Learning

Part of Advances in Neural Information Processing Systems 37 (NeurIPS 2024) Main Conference Track


Authors

The Viet Bui, Tien Mai, Thanh Nguyen

Abstract

This paper concerns imitation learning (IL) in cooperative multi-agent systems. The learning problem under consideration poses several challenges, characterized by high-dimensional state and action spaces and intricate inter-agent dependencies. In the single-agent setting, IL can be performed efficiently via an inverse soft-Q learning process. However, extending this framework to a multi-agent context requires simultaneously learning both local value functions, which capture local observations and individual actions, and a joint value function that exploits centralized learning. In this work, we introduce a new multi-agent IL algorithm designed to address these challenges. Our approach enables centralized learning by leveraging mixing networks to aggregate decentralized Q functions. We further establish conditions on the mixing networks under which the multi-agent IL objective function is convex within the Q function space. We present extensive experiments conducted on several challenging multi-agent game environments, including an advanced version of the StarCraft Multi-Agent Challenge (SMACv2), which demonstrate the effectiveness of our algorithm.
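To make the centralized-aggregation idea concrete, below is a minimal, illustrative sketch of a QMIX-style mixing network that combines per-agent Q-values into a joint Q-value conditioned on the global state. This is not the authors' exact architecture or the paper's stated convexity conditions; all names and sizes (e.g., MixingNetwork, embed_dim=32) are assumptions made for illustration. With non-negative mixing weights and a convex, non-decreasing activation, the joint Q-value in this sketch is monotone and convex in the local Q-values for a fixed state, which is the flavor of structural condition the abstract alludes to.

```python
# Illustrative sketch only: a QMIX-style monotonic mixing network.
# Not the paper's exact mixing network; names/sizes are assumptions.
import torch
import torch.nn as nn


class MixingNetwork(nn.Module):
    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.n_agents = n_agents
        self.embed_dim = embed_dim
        # Hypernetworks produce state-dependent mixing weights and biases.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(
            nn.Linear(state_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, 1)
        )

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents) local Q-values; state: (batch, state_dim).
        batch = agent_qs.size(0)
        # Absolute values keep mixing weights non-negative, so the joint Q
        # is non-decreasing (and here convex) in every local Q-value.
        w1 = torch.abs(self.hyper_w1(state)).view(batch, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(batch, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.view(batch, 1, self.n_agents), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(batch, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(batch, 1, 1)
        q_total = torch.bmm(hidden, w2) + b2  # (batch, 1, 1)
        return q_total.view(batch, 1)


if __name__ == "__main__":
    mixer = MixingNetwork(n_agents=3, state_dim=16)
    qs = torch.randn(8, 3)     # local Q-values from 3 decentralized agents
    s = torch.randn(8, 16)     # global state for centralized mixing
    print(mixer(qs, s).shape)  # torch.Size([8, 1])
```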