AID: Attention Interpolation of Text-to-Image Diffusion

Part of Advances in Neural Information Processing Systems 37 (NeurIPS 2024) Main Conference Track

Bibtex Paper

Authors

He Qiyuan, Jinghao Wang, Ziwei Liu, Angela Yao

Abstract

Conditional diffusion models can create unseen images in various settings, aiding image interpolation. Interpolation in latent spaces is well-studied, but interpolation with specific conditions like text or image is less understood. Common approaches interpolate linearly in the conditioning space but tend to result in inconsistent images with poor fidelity. This work introduces a novel training-free technique named \textbf{Attention Interpolation via Diffusion (AID)}. AID has two key contributions: \textbf{1)} a fused inner/outer interpolated attention layer to boost image consistency and fidelity; and \textbf{2)} selection of interpolation coefficients via a beta distribution to increase smoothness. Additionally, we present an AID variant called \textbf{Prompt-guided Attention Interpolation via Diffusion (PAID)}, which \textbf{3)} treats interpolation as a condition-dependent generative process. Experiments demonstrate that our method achieves greater consistency, smoothness, and efficiency in condition-based interpolation, aligning closely with human preferences. Furthermore, PAID offers substantial benefits for compositional generation, controlled image editing, image morphing and image-controlled generation, all while remaining training-free.