Entropy testing and its application to testing Bayesian networks

Part of Advances in Neural Information Processing Systems 37 (NeurIPS 2024) Main Conference Track

Bibtex Paper

Authors

Clément L. Canonne, Joy Qiping Yang

Abstract

This paper studies the problem of \emph{entropy identity testing}: given sample access to a distribution $p$ and a fully described distribution $q$ (both are discrete distributions over the support of size $k$), and the promise that either $p = q$ or $ | H(p) - H(q) | \geqslant \varepsilon$, where $H(\cdot)$ denotes the Shannon entropy, a tester needs to distinguish between the two cases with high probability. We establish a near-optimal sample complexity bound of $\tilde{\Theta}(\sqrt{k}/\varepsilon + {1}/{\varepsilon^2}$) for this problem, and show how to apply it to the problem of identity testing for in-degree-$d$ $n$-dimensional Bayesian networks, obtaining an upper bound of $\tilde{O}( {2^{d / 2} n^{3/2}}/{\varepsilon^2} + {n^2}/{\varepsilon^4} )$. This improves on the sample complexity bound of $\tilde{O}(2^{d/2}n^2/\varepsilon^4)$ from Canonne, Diakonikolas, Kane, and Stewart (2020), which required an additional assumption on the structure of the (unknown) Bayesian network.