CompVis/stable-diffusion

The 10-gig horse: Stable Diffusion's original research release

The CompVis reference implementation that proved latent diffusion could run on consumer GPUs.

73.1k stars · Jupyter Notebook · Image · Video · Audio · Velocity (7d): +52 ★/day · Trend: steady

What it does Stable Diffusion v1 generates 512×512 images from text prompts using a latent diffusion model: an 860M-parameter UNet and a frozen CLIP ViT-L/14 text encoder, trained on LAION-5B subsets. The repo provides reference sampling scripts (txt2img.py, img2img.py) and links to four progressively refined checkpoints (v1-1 through v1-4).

The interesting bit The “latent” part is the trick: diffusion runs in an autoencoder's compressed latent space, downsampled 8× per side, rather than on raw pixels, which is why a model of this quality can sample on a single GPU with at least 10 GB of VRAM instead of a server farm. The authors explicitly call the weights “research artifacts” and ship them under a use-restricted OpenRAIL license, with a safety checker and invisible watermarking in the reference scripts. That is unusual candor about misuse risk for a release this popular.
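
To make the size of that saving concrete, here is a rough back-of-the-envelope sketch. It assumes the v1 latent has 4 channels and a downsampling factor of 8 per side (the defaults of the reference sampling script); the function names are made up for illustration and are not part of the repo.

```python
def latent_shape(height, width, factor=8, channels=4):
    """Return (channels, h, w) of the latent for an image of the given size.

    factor=8 and channels=4 are assumed v1 defaults, not read from a config.
    """
    if height % factor or width % factor:
        raise ValueError("height and width must be multiples of the factor")
    return (channels, height // factor, width // factor)


def compression_ratio(height, width, factor=8, channels=4, pixel_channels=3):
    """How many times fewer values the latent holds than the RGB image."""
    c, h, w = latent_shape(height, width, factor, channels)
    return (height * width * pixel_channels) / (c * h * w)


print(latent_shape(512, 512))       # (4, 64, 64)
print(compression_ratio(512, 512))  # 48.0
```

Under these assumptions the UNet denoises 16,384 values per image instead of 786,432, which is where most of the memory and compute saving comes from.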

Key highlights

  • Four published checkpoints with documented training curricula (256→512 resolution, aesthetic filtering, 10% text-conditioning dropout to improve classifier-free guidance sampling)
  • Reference scripts include PLMS sampler, safety checker, and invisible watermarking
  • Hugging Face diffusers integration documented as a simple way to download and sample the model
  • img2img.py supports SDEdit-style image-to-image translation and upscaling, with a strength parameter controlling how much noise is added to the input image
  • Builds on OpenAI’s ADM codebase and lucidrains’ diffusion implementations
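
Two of the knobs mentioned above reduce to one-line formulas. The sketch below is a plain-Python illustration, not the repo's code: it shows the standard classifier-free guidance combination of an unconditional and a text-conditioned noise prediction, and the usual SDEdit-style mapping from strength to a number of noising steps. The function names and the default values (scale 7.5, 50 steps) are assumptions for illustration.

```python
def guided_noise(eps_uncond, eps_cond, scale=7.5):
    """Classifier-free guidance: start from the unconditional prediction
    and move `scale` times the distance toward the conditioned one."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def encode_steps(strength, num_steps=50):
    """Number of noising steps applied to the input image in an
    SDEdit-style img2img pass (assumed here to be int(strength * num_steps))."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    return int(strength * num_steps)


print(guided_noise([0.0, 1.0], [1.0, 1.0], scale=2.0))  # [2.0, 1.0]
print(encode_steps(0.75))                               # 37
```

With scale=1.0 the result is just the conditioned prediction; larger scales trade diversity for prompt adherence. A strength of 0.0 leaves the input image untouched, while 1.0 noises it for the full schedule and so largely discards it.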

Caveats

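  • The README recommends against using the provided weights for services or products without additional safety mechanisms
  • EMA-only vs full checkpoints have a footgun: the v1 inference config is designed for EMA-only checkpoints and therefore sets use_ema=False; with a “full” checkpoint, that same setting loads the non-EMA weights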
  • Environment setup is conda-centric with pinned dependency versions (transformers==4.19.2)
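
The EMA caveat is easier to see with a toy model. The following sketch is a simplified illustration only: the checkpoint layout (a dict with "weights" and an optional "ema" entry) and both function names are hypothetical and do not mirror the repo's actual state-dict keys or loading code.

```python
def ema_update(shadow, params, decay=0.9999):
    """One EMA step: shadow <- decay * shadow + (1 - decay) * params."""
    return {k: decay * shadow[k] + (1.0 - decay) * params[k] for k in params}


def pick_weights(checkpoint, use_ema):
    """Choose the weight set to run inference with.

    Hypothetical layout: "weights" is always present; "ema" exists only
    in a "full" checkpoint that stores both sets.
    """
    if use_ema:
        if "ema" not in checkpoint:
            raise KeyError("use_ema=True but no separate EMA weights stored")
        return checkpoint["ema"]
    return checkpoint["weights"]


raw = {"w": 1.0}
ema = ema_update({"w": 0.0}, raw, decay=0.5)  # toy decay for readability

ema_only = {"weights": ema}          # EMA values stored as the main weights
full = {"weights": raw, "ema": ema}  # both sets present

print(pick_weights(ema_only, use_ema=False))  # {'w': 0.5}
print(pick_weights(full, use_ema=False))      # {'w': 1.0}
```

In this toy, the same use_ema=False setting yields EMA weights from the EMA-only file but raw training weights from the full file, which is the mismatch the caveat describes.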

Verdict Worth studying if you want to understand how latent diffusion actually works under the hood, or need the original checkpoints for reproducibility research. Most practitioners should probably use the Hugging Face diffusers pipeline instead; this repo is a paper reference, not a product.
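
For readers taking the diffusers route, a minimal sketch might look like the following. It assumes the torch and diffusers packages are installed, that the v1-4 weights are published under the model id CompVis/stable-diffusion-v1-4 and that you have accepted their license; the generate helper and its defaults are illustrative, not part of either library.

```python
MODEL_ID = "CompVis/stable-diffusion-v1-4"  # assumed Hugging Face model id


def generate(prompt, out_path="out.png", guidance_scale=7.5, steps=50):
    """Generate one image with the diffusers pipeline and save it to out_path."""
    # Imported lazily so this module loads even without torch/diffusers.
    import torch
    from diffusers import StableDiffusionPipeline

    # Half precision on GPU keeps memory use down; fall back to float32 on CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    pipe = StableDiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=dtype)
    pipe = pipe.to(device)

    result = pipe(prompt, guidance_scale=guidance_scale, num_inference_steps=steps)
    result.images[0].save(out_path)
    return out_path
```

Calling generate("a photograph of an astronaut riding a horse") would download the weights on first use and write out.png; expect CPU runs to be very slow.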
