Part of Advances in Neural Information Processing Systems 37 (NeurIPS 2024) Main Conference Track
Julius Kunze, Daniel Severo, Jan-Willem van de Meent, James Townsend
We present a general method for lossless compression of unordered data structures, including multisets and graphs. It is a variant of shuffle coding that is many orders of magnitude faster than the original and enables 'one-shot' compression of single unordered objects. Our method achieves state-of-the-art compression rates on various large-scale network graphs at speeds of megabytes per second, efficiently handling even a multi-gigabyte plain graph with one billion edges. We release an implementation that can be easily adapted to different data types and statistical models.
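To give a sense of what compressing data as unordered can save, the sketch below computes how many bits are needed to specify the ordering of a sequence once its multiset of elements is known. This is the base-2 logarithm of the number of distinct orderings, n! / (c_1! c_2! ...), where the c_i are the element multiplicities. It is an illustrative calculation only, not the paper's shuffle coding algorithm, and the function name `order_bits` is hypothetical.

```python
import math
from collections import Counter


def order_bits(items):
    """Bits needed to specify the order of `items`, given its multiset.

    Equals log2(n! / prod(c_i!)), the log of the number of distinct
    orderings of the multiset. Illustrative helper, not from the paper.
    """
    counts = Counter(items)
    n = len(items)
    # lgamma(k + 1) == ln(k!), which avoids computing huge factorials.
    ln_orderings = math.lgamma(n + 1) - sum(
        math.lgamma(c + 1) for c in counts.values()
    )
    return ln_orderings / math.log(2)


if __name__ == "__main__":
    # A sequence of one million distinct elements carries this many
    # bits of pure ordering information.
    print(order_bits(range(1_000_000)))
```

If every element is identical, there is only one ordering and the function returns zero, so there is nothing to save. The analogous quantity for graphs also depends on the graph's symmetries, which this sketch does not model.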