Improving Neural ODE Training with Temporal Adaptive Batch Normalization

Part of Advances in Neural Information Processing Systems 37 (NeurIPS 2024) Main Conference Track

Bibtex Paper Supplemental

Authors

Su Zheng, Zhengqi Gao, Fan-Keng Sun, Duane Boning, Bei Yu, Martin D. Wong

Abstract

Neural ordinary differential equations (Neural ODEs) is a family of continuous-depth neural networks where the evolution of hidden states is governed by learnable temporal derivatives. We identify a significant limitation in applying traditional Batch Normalization (BN) to Neural ODEs, due to a fundamental mismatch --- BN was initially designed for discrete neural networks with no temporal dimension, whereas Neural ODEs operate continuously over time. To bridge this gap, we introduce temporal adaptive Batch Normalization (TA-BN), a novel technique that acts as the continuous-time analog to traditional BN. Our empirical findings reveal that TA-BN enables the stacking of more layers within Neural ODEs, enhancing their performance. Moreover, when confined to a model architecture consisting of a single Neural ODE followed by a linear layer, TA-BN achieves 91.1\% test accuracy on CIFAR-10 with 2.2 million parameters, making it the first \texttt{unmixed} Neural ODE architecture to approach MobileNetV2-level parameter efficiency. Extensive numerical experiments on image classification and physical system modeling substantiate the superiority of TA-BN compared to baseline methods.