Video Diffusion Models are Training-free Motion Interpreter and Controller

Synthesized Motion Control Results

Camera Motion Control
Object Motion Control

Reference Motion Control Results

Camera Motion Control

Reference

Generated Results

Reference

Generated Results

Object Motion Control

Reference

Generated Results

Point-drag Results

Qualitative Comparison

DragNUWA

Gen-2

Ours

Motion Correspondence

Reference

Correspondence

* We calculate the correspondence for all frames here.

Video Consistency

Origin

Control Signal

Vanilla Control

Shared K&V

+ 8-frame Gradient Clip

+ All-frame Gradient Clip