Is distrifuser open source?

Yes — mit-han-lab/distrifuser is open source, released under the MIT license.

What language is distrifuser written in?

mit-han-lab/distrifuser is primarily written in Python.

How popular is distrifuser?

mit-han-lab/distrifuser has 727 stars on GitHub.

Where can I find distrifuser?

mit-han-lab/distrifuser is on GitHub at https://github.com/mit-han-lab/distrifuser.

mit-han-lab/distrifuser

Splitting high-res diffusion across GPUs without the seams

DistriFusion exists to accelerate high-resolution diffusion inference across multiple GPUs without retraining the model or tolerating the seams of naïve patch parallelism.

★727 stars | Python | Inference · Serving | Image · Video · Audio

What it does

DistriFusion is a training-free method for running diffusion model inference (SDXL and Stable Diffusion) across multiple NVIDIA GPUs. It partitions a high-resolution image into spatial patches, assigns each patch to a device, and reassembles the patches into the final output. The distrifuser package wraps Hugging Face diffusers pipelines, so the API stays familiar.
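For orientation, here is a minimal usage sketch modeled on the example in the project's README. The wrapper function `generate_image` and its parameters are hypothetical, and the argument names (`warmup_steps`, `distri_config`, the model ID) should be checked against the current repository before use. Imports are deferred into the function so the file can be loaded on a machine without GPUs or the package installed.

```python
def generate_image(prompt, out_path="output.png", height=1024, width=1024, seed=233):
    """Hypothetical wrapper around the distrifuser SDXL pipeline (unverified sketch)."""
    # Deferred imports: these need a CUDA machine with distrifuser installed.
    import torch
    from distrifuser.pipelines import DistriSDXLPipeline
    from distrifuser.utils import DistriConfig

    # The config carries the image size and the distributed settings.
    distri_config = DistriConfig(height=height, width=width, warmup_steps=4)

    # Assumed to mirror diffusers' from_pretrained, plus the extra config argument.
    pipeline = DistriSDXLPipeline.from_pretrained(
        distri_config=distri_config,
        pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0",
        variant="fp16",
        use_safetensors=True,
    )

    # Only the first process shows a progress bar and writes the result.
    pipeline.set_progress_bar_config(disable=distri_config.rank != 0)
    image = pipeline(
        prompt=prompt,
        generator=torch.Generator(device="cuda").manual_seed(seed),
    ).images[0]
    if distri_config.rank == 0:
        image.save(out_path)
```

A script that calls `generate_image(...)` would be launched once per GPU, for example with `torchrun --nproc_per_node=2 your_script.py`, where the script name is a placeholder.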

The interesting bit

The first diffusion step uses synchronous communication, so every patch starts from consistent, up-to-date activations; without it, patches can drift apart and produce visible seams. In later steps, each device reuses the activations it received during the previous step and exchanges fresh ones asynchronously, so GPU-to-GPU communication overlaps with computation instead of stalling it. That scheduling detail is where the speedup comes from.
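The scheduling idea can be illustrated with a toy simulation. This is not the project's implementation: a 3-tap average over a 1D signal stands in for a network layer, and all function names are hypothetical. Each simulated device updates its own patch using fresh local values and, after the synchronous warm-up, previous-step values from its peers.

```python
def denoise_step(values):
    """Toy stand-in for one denoising step: a 3-tap average with edge replication.

    Each output depends on its neighbors, which is why patches cannot be
    processed in complete isolation without creating seams.
    """
    last = len(values) - 1
    return [
        (values[max(i - 1, 0)] + values[i] + values[min(i + 1, last)]) / 3.0
        for i in range(len(values))
    ]


def split_bounds(length, n_devices):
    """Return (start, end) index pairs for n_devices contiguous patches."""
    base, extra = divmod(length, n_devices)
    bounds, start = [], 0
    for k in range(n_devices):
        end = start + base + (1 if k < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def run_single_device(values, steps):
    """Reference result: the whole signal is processed on one device."""
    current = list(values)
    for _ in range(steps):
        current = denoise_step(current)
    return current


def run_patch_parallel(values, steps, n_devices, sync_steps=1):
    """Simulate patch parallelism with synchronous warm-up, then stale context."""
    bounds = split_bounds(len(values), n_devices)
    current = list(values)  # fresh values, each patch owned by one device
    stale = list(values)    # values the peers sent during the previous step
    for step in range(steps):
        updated = [0.0] * len(values)
        for start, end in bounds:
            if step < sync_steps:
                # Synchronous: wait for every peer's fresh values.
                context = current
            else:
                # Asynchronous: fresh values for the local patch only,
                # previous-step values everywhere else.
                context = stale[:start] + current[start:end] + stale[end:]
            # A real device would compute only its own patch; the toy computes
            # the full signal and keeps the local slice for brevity.
            updated[start:end] = denoise_step(context)[start:end]
        stale = current  # transfers issued this step arrive before the next one
        current = updated
    return current


signal = [0.0] * 8 + [1.0] * 8
reference = run_single_device(signal, steps=10)
approx = run_patch_parallel(signal, steps=10, n_devices=4, sync_steps=1)
max_error = max(abs(a - b) for a, b in zip(reference, approx))
print(f"max deviation from single-device result: {max_error:.4f}")
```

When every step is synchronous (`sync_steps` equal to `steps`), the toy reproduces the single-device result exactly. The stale-context steps trade a small deviation for not having to wait on peers.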

Key highlights

Benchmarked at 3840×3840 on A100s: 1.8×, 3.4×, and 6.1× speedups with 2, 4, and 8 GPUs respectively, using SDXL with a 50-step DDIM sampler.
Visual fidelity is preserved; the project reports FID against ground-truth images and shows no quality degradation.
No model retraining or architecture changes required—it is purely an inference-time patch-parallelism and communication optimization.
Adopted by NVIDIA TensorRT-LLM and ColossalAI, indicating the technique is useful beyond this reference implementation.
Compatible with the diffusers API via DistriSDXLPipeline and DistriConfig.
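One way to read the benchmark figures above is parallel efficiency, the reported speedup divided by the number of GPUs. The short calculation below uses only the numbers quoted in the highlights; the helper name is illustrative.

```python
def parallel_efficiency(speedup, n_gpus):
    """Fraction of ideal linear scaling achieved (1.0 means perfect scaling)."""
    return speedup / n_gpus


# Speedups quoted above for SDXL at 3840x3840 with a 50-step DDIM sampler.
reported_speedups = {2: 1.8, 4: 3.4, 8: 6.1}

for n_gpus, speedup in reported_speedups.items():
    efficiency = parallel_efficiency(speedup, n_gpus)
    print(f"{n_gpus} GPUs: {speedup:.1f}x speedup, {efficiency:.0%} efficiency")
```

Efficiency falls from about 90% on 2 GPUs to about 85% on 4 and about 76% on 8, so each added device contributes somewhat less than the previous one.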

Caveats

All published benchmarks use NVIDIA A100s at very high resolutions; whether the gains translate to consumer GPUs or modest image sizes is not discussed.
Requires PyTorch 2.2 and CUDA ≥ 12.0, so older environments will need upgrading first.
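A small pre-flight check can catch a mismatched environment before a multi-GPU job is launched. The sketch below works on version strings only, and its function names are hypothetical. It treats the PyTorch requirement as an exact 2.2.x match, as stated above; whether newer releases also work is not covered here.

```python
import re


def parse_version(text):
    """Turn a string such as '2.2.1+cu121' into a tuple such as (2, 2, 1)."""
    core = text.split("+", 1)[0]  # drop local build suffixes like '+cu121'
    numbers = []
    for piece in core.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def meets_requirements(torch_version, cuda_version):
    """Check for PyTorch 2.2.x and CUDA major version 12 or newer."""
    if cuda_version is None:  # CPU-only PyTorch builds report no CUDA version
        return False
    torch_ok = parse_version(torch_version)[:2] == (2, 2)
    cuda_ok = parse_version(cuda_version)[:1] >= (12,)
    return torch_ok and cuda_ok


print(meets_requirements("2.2.1+cu121", "12.1"))
print(meets_requirements("2.1.0", "11.8"))
```

With PyTorch installed, the real values can be passed in as `meets_requirements(torch.__version__, torch.version.cuda)`.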

Verdict

A solid pick if you’re serving high-resolution diffusion models and need to squeeze latency out of a multi-GPU rack. If you’re on a single card or generating standard 512×512 images, this isn’t your bottleneck.
