$p$-value Adjustment for Monotonous, Unbiased, and Fast Clustering Comparison

Part of Advances in Neural Information Processing Systems 36 (NeurIPS 2023) Main Conference Track

Bibtex Paper Supplemental

Authors

Kai Klede, Thomas Altstidl, Dario Zanca, Bjoern Eskofier

Abstract

Popular metrics for clustering comparison, like the Adjusted Rand Index and the Adjusted Mutual Information, are type II biased. The Standardized Mutual Information removes this bias but suffers from counterintuitive non-monotonicity and poor computational efficiency. We introduce the $p$-value adjusted Rand Index ($\operatorname{PMI}_2$), the first cluster comparison method that is type II unbiased and provably monotonous. The $\operatorname{PMI}_2$ has fast approximations that outperform the Standardized Mutual information. We demonstrate its unbiased clustering selection, approximation quality, and runtime efficiency on synthetic benchmarks. In experiments on image and social network datasets, we show how the $\operatorname{PMI}_2$ can help practitioners choose better clustering and community detection algorithms.