Is diffusion-forcing-transformer open source?

Yes — kwsong0113/diffusion-forcing-transformer is an open-source project tracked on heatdrop.

What language is diffusion-forcing-transformer written in?

kwsong0113/diffusion-forcing-transformer is primarily written in Python.

How popular is diffusion-forcing-transformer?

kwsong0113/diffusion-forcing-transformer has 702 stars on GitHub.

Where can I find diffusion-forcing-transformer?

kwsong0113/diffusion-forcing-transformer is on GitHub at https://github.com/kwsong0113/diffusion-forcing-transformer.

← all repositories

kwsong0113/diffusion-forcing-transformer

Video diffusion that remembers what just happened

It generates videos from any number of context frames and uses history-aware guidance to keep long rollouts temporally coherent.

★702 stars Python Image · Video · Audio

View on GitHub ↗ Homepage ↗

Not currently ranked — collecting fresh signals.

star history

What it does

The Diffusion Forcing Transformer (DFoT) generates video conditioned on any number of context frames—one image or many. It pairs this architecture with History Guidance, a family of sampling methods that reuse previously generated frames to improve temporal consistency and motion dynamics. The repository includes pretrained weights, a HuggingFace demo, and training code for several datasets.

The interesting bit

DFoT is built to accept an arbitrary number of conditioning frames by design, not just a fixed context window. That architectural choice is what enables History Guidance: because the model can already ingest its own history at inference time, guiding sampling with past frames is a native operation rather than a post-hoc workaround.

Key highlights

Generates short clips from multiple images or extremely long videos from a single frame
History Guidance explicitly leverages past generated frames during sampling
Interactive browser demo and pretrained checkpoints are available on HuggingFace
Training recipes cover RealEstate10K, Kinetics-600, and Minecraft
ICML 2025 official implementation

Caveats

Training requires 12 GPUs with 80 GB VRAM according to the README
Training assumes Weights & Biases is configured for logging
Minecraft data requires an extra preprocessing step to convert videos into latents

Verdict

Researchers working on long-form or controllable video generation should explore the pretrained demo; anyone hoping to train from scratch needs a cluster and a W&B account.

Frequently asked

What is kwsong0113/diffusion-forcing-transformer?: It generates videos from any number of context frames and uses history-aware guidance to keep long rollouts temporally coherent.
Is diffusion-forcing-transformer open source?: Yes — kwsong0113/diffusion-forcing-transformer is an open-source project tracked on heatdrop.
What language is diffusion-forcing-transformer written in?: kwsong0113/diffusion-forcing-transformer is primarily written in Python.
How popular is diffusion-forcing-transformer?: kwsong0113/diffusion-forcing-transformer has 702 stars on GitHub.
Where can I find diffusion-forcing-transformer?: kwsong0113/diffusion-forcing-transformer is on GitHub at https://github.com/kwsong0113/diffusion-forcing-transformer.