guandeh17/Self-Forcing
Research codebase for training autoregressive video diffusion models with KV caching to enable real-time streaming video generation.

Self Forcing addresses the train-test distribution mismatch in autoregressive video diffusion by simulating the inference process during training: the model performs an autoregressive rollout with KV caching, so each frame is conditioned on the model's own previously generated frames. It enables real-time streaming video generation on a single RTX 4090 GPU while achieving quality comparable to state-of-the-art diffusion models. The project provides model weights on Hugging Face, along with training and inference code built on the Wan2.1-T2V-1.3B base model.
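
The rollout idea described above can be illustrated with a toy sketch. This is not the repository's code: `toy_denoise_step` and `self_rollout` are hypothetical names, frames are plain floats rather than latent tensors, and a Python list stands in for the transformer's KV cache. It only shows the control flow, in which each chunk is denoised from noise while conditioned on previously self-generated chunks, then appended to the cache.

```python
import random


def toy_denoise_step(noisy, context):
    """Hypothetical stand-in for one denoising step of a video diffusion model.

    Moves each value halfway toward the mean of the cached context
    (or toward 0.0 when the cache is still empty).
    """
    target = sum(context) / len(context) if context else 0.0
    return [x + 0.5 * (target - x) for x in noisy]


def self_rollout(num_chunks, chunk_size, num_steps, seed=0):
    """Generate chunks one at a time, conditioning on the model's own outputs."""
    rng = random.Random(seed)
    kv_cache = []  # assumption: a flat list standing in for cached keys/values
    video = []
    for _ in range(num_chunks):
        # Every chunk starts from pure noise, as at inference time.
        chunk = [rng.gauss(0.0, 1.0) for _ in range(chunk_size)]
        for _ in range(num_steps):
            chunk = toy_denoise_step(chunk, kv_cache)
        # Later chunks see this generated chunk, not ground-truth frames.
        kv_cache.extend(chunk)
        video.append(chunk)
    return video, kv_cache


if __name__ == "__main__":
    video, cache = self_rollout(num_chunks=3, chunk_size=4, num_steps=2)
    print(len(video), len(cache))
```

In a real training setup, a loss would be computed on the rolled-out video so that the model is trained under the same conditions it meets at inference. The sketch omits that step along with everything specific to the Wan2.1-T2V-1.3B architecture.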