Optimal Hypothesis Selection in (Almost) Linear Time

Aliakbarpour, Maryam; Bun, Mark; Smith, Adam

Optimal Hypothesis Selection in (Almost) Linear Time

Part of Advances in Neural Information Processing Systems 37 (NeurIPS 2024) Main Conference Track

Bibtex Paper

Authors

Maryam Aliakbarpour, Mark Bun, Adam Smith

Abstract

Hypothesis selection, also known as density estimation, is a fundamental problem in statistics and learning theory. Suppose we are given a sample set from an unknown distribution $P$ and a finite class of candidate distributions (called hypotheses) $\mathcal{H} \coloneqq \{H_1, H_2, \ldots, H_n\}$. The aim is to design an algorithm that selects a distribution $\hat H$ in $\mathcal{H}$ that best fits the data. The algorithm's accuracy is measured based on the distance between $\hat{H}$ and $P$ compared to the distance of the closest distribution in $\mathcal{H}$ to $P$ (denoted by $OPT$). Concretely, we aim for $\|\hat{H} - P\|_{TV}$ to be at most $ \alpha \cdot OPT + \epsilon$ for some small $\epsilon$ and $\alpha$. While it is possible to decrease the value of $\epsilon$ as the number of samples increases, $\alpha$ is an inherent characteristic of the algorithm. In fact, one cannot hope to achieve $\alpha < 3$ even when there are only two candidate hypotheses, unless the number of samples is proportional to the domain size of $P$ [Bousquet, Kane, Moran '19]. Finding the best $\alpha$ has been one of the main focuses of studies of the problem since early work of [Devroye, Lugosi '01]. Prior to our work, no algorithm was known that achieves $\alpha = 3$ in near-linear time. We provide the first algorithm that operates in almost linear time ($\tilde{O}(n/\epsilon^3)$ time) and achieves $\alpha = 3$. This result improves upon a long list of results in hypothesis selection. Previously known algorithms either had worse time complexity, a larger factor $\alpha$, or extra assumptions about the problem setting.In addition to this algorithm, we provide another (almost) linear-time algorithm with better dependency on the additive accuracy parameter $\epsilon$, albeit with a slightly worse accuracy parameter, $\alpha = 4$.

Optimal Hypothesis Selection in (Almost) Linear Time

Authors

Abstract

Name Change Policy