Trading off Consistency and Dimensionality of Convex Surrogates for Multiclass Classification

Part of Advances in Neural Information Processing Systems 37 (NeurIPS 2024) Main Conference Track


Authors

Enrique Nueve, Dhamma Kimpara, Bo Waggoner, Jessica Finocchiaro

Abstract

In multiclass classification over $n$ outcomes, we typically optimize some surrogate loss $L: \mathbb{R}^d \times \mathcal{Y} \to \mathbb{R}$ assigning real-valued error to predictions in $\mathbb{R}^d$. In this paradigm, outcomes must be embedded into the reals with dimension $d \approx n$ in order to design a consistent surrogate loss. Consistent losses are well-motivated theoretically, yet for large $n$, such as in information retrieval and structured prediction tasks, their optimization may be computationally infeasible. In practice, outcomes are typically embedded into some $\mathbb{R}^d$ for $d \ll n$, with little known about the suitability of such embeddings for multiclass classification. We investigate two approaches for trading off consistency and dimensionality in multiclass classification while using a convex surrogate loss. We first formalize partial consistency when the optimized surrogate has dimension $d \ll n$. We then check whether partial consistency holds under a given embedding and low-noise assumption, providing insight into when to use a particular embedding into $\mathbb{R}^d$. Finally, we present a new method to construct (fully) consistent losses with $d \ll n$ out of multiple problem instances. Our practical approach leverages parallelism to sidestep lower bounds on $d$.
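As a minimal illustration of the trade-off described above (an example of ours, not taken from the paper), consider two standard ways of embedding $n$ outcomes. With $d = n$ and one-hot targets, the cross-entropy loss
\[
L_{\mathrm{CE}}(u, y) = \log\Big(\textstyle\sum_{y' \in \mathcal{Y}} e^{u_{y'}}\Big) - u_y
\]
is convex and consistent for 0-1 classification. In contrast, with $d = \lceil \log_2 n \rceil \ll n$ one may encode each outcome as a sign vector $\varphi(y) \in \{-1,+1\}^d$ and minimize the convex surrogate
\[
L_{\mathrm{code}}(u, y) = \lVert u - \varphi(y) \rVert_2^2 ,
\]
which is low-dimensional but not consistent in general; characterizing when such low-dimensional surrogates retain (partial) consistency is the question the abstract describes.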