lllyasviel/ControlNet

Taming Stable Diffusion with scribbles, edges, and zero convolutions

ControlNet lets you steer image generation with sketches, poses, or depth maps without retraining the base model from scratch.

Velocity (7d): +28 ★/day · Trend: steady

What it does ControlNet is a neural network structure that plugs into Stable Diffusion and lets you condition generation on extra inputs: Canny edges, HED boundaries, human poses, depth maps, normal maps, semantic segmentation, or even MS Paint scribbles. It copies the weights of each base-model block into a “locked” (frozen) copy and a “trainable” copy that learns your condition. The trick is the 1×1 “zero convolution” that connects the two: its weights and bias are initialized to zero, so before training starts the added branch contributes nothing and can’t distort the original model.
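The locked/trainable wiring described above can be sketched in a few lines of plain Python. This is a toy illustration, not the repository's code: `toy_block`, `controlled_block`, and all the numbers are hypothetical stand-ins, a "block" is reduced to a 1×1 convolution plus ReLU at a single pixel, and the only point is to show that zero-initialized 1×1 convolutions leave the locked output untouched before training.

```python
def conv1x1(x, weight, bias):
    """1x1 convolution at one pixel: a linear mix of the channel values in x."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def zero_conv_params(channels):
    """Weights and bias of a 1x1 'zero convolution': everything starts at 0."""
    return [[0.0] * channels for _ in range(channels)], [0.0] * channels


def toy_block(x, params):
    """Hypothetical stand-in for one Stable Diffusion block (1x1 conv + ReLU)."""
    weight, bias = params
    return [max(0.0, v) for v in conv1x1(x, weight, bias)]


def controlled_block(x, cond, locked_params, trainable_params, zero_in, zero_out):
    """Locked block output plus the trainable copy's output, gated by zero convs."""
    locked_out = toy_block(x, locked_params)
    # The condition enters the trainable copy through a zero convolution.
    hint = conv1x1(cond, *zero_in)
    trainable_out = toy_block([a + h for a, h in zip(x, hint)], trainable_params)
    # The trainable copy's output is added back through a second zero convolution.
    residual = conv1x1(trainable_out, *zero_out)
    return [l + r for l, r in zip(locked_out, residual)]


# Made-up weights; the trainable copy starts as an exact copy of the locked block.
locked_params = ([[0.5, -1.0], [2.0, 0.25]], [0.1, -0.2])
trainable_params = ([row[:] for row in locked_params[0]], locked_params[1][:])

x = [1.0, 3.0]      # features at one pixel
cond = [4.0, -2.0]  # condition features (for example, from an edge map)

baseline = toy_block(x, locked_params)
controlled = controlled_block(
    x, cond, locked_params, trainable_params,
    zero_conv_params(2), zero_conv_params(2),
)
print(controlled == baseline)  # True: at initialization the added branch is a no-op
```

Whatever the condition is, the controlled output matches the locked block's output until the zero convolutions have been trained away from zero.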

The interesting bit The zero convolution is the quiet hack that makes this feasible. Its weights start at zero, yet gradients still flow: for y = wx + b, the gradient with respect to w is the input x, which is generally nonzero, so the first optimizer step moves w off zero (the README defers the full argument to a separate FAQ doc). Because the original SD encoder stays frozen and doesn’t need to store gradients, the required GPU memory is not much larger than for plain SD despite the many added layers. You can train on small datasets or “personal devices” without destroying a production-ready model.
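The gradient argument can be checked by hand on a scalar version of the zero convolution, y = wx + b. The sketch below is an illustration under assumed values: the input, target, learning rate, and squared-error loss are all made up, and the gradients are written out manually rather than taken from any library.

```python
def zero_conv(x, w, b):
    """Scalar stand-in for a 1x1 convolution: y = w * x + b."""
    return w * x + b


def grads(x, w, upstream):
    """Gradients of the loss through y = w * x + b, given dloss/dy = upstream."""
    return {"w": upstream * x, "b": upstream, "x": upstream * w}


# Assumed toy setup: loss = 0.5 * (y - target) ** 2, so dloss/dy = y - target.
w, b = 0.0, 0.0   # zero initialization
x, target, lr = 2.0, 1.0, 0.1

g = grads(x, w, zero_conv(x, w, b) - target)
# At initialization the gradient to the input is zero (it is scaled by w),
# but the gradients to w and b are not, because they depend on x and 1.
print(g["w"] != 0.0, g["b"] != 0.0, g["x"] == 0.0)

# One plain gradient-descent step moves w and b off zero.
w -= lr * g["w"]
b -= lr * g["b"]

g_after = grads(x, w, zero_conv(x, w, b) - target)
# With w nonzero, gradients now also reach whatever feeds the zero convolution.
print(w != 0.0, g_after["x"] != 0.0)
```

So the zero initialization only blocks the gradient to the layer's input for the first step; the layer's own weights receive a useful signal immediately.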

Key highlights

  • Nine Gradio demo apps backed by pretrained models, covering edges, lines, HED, scribbles, pose, segmentation, depth, and normals
  • Reuses the SD encoder as backbone; no layer trained from scratch
  • Low VRAM mode for 8GB GPUs
  • “Guess Mode” generates images from control maps alone—no positive prompt, no negative prompt, no caption detector
  • Depth model ingests full 512×512 maps vs. Stability’s 64×64, preserving more detail
  • ControlNets can be transferred to community models

Caveats

  • The Gradio UI is repeatedly described as “difficult to customize” and “buggy”; some workflows require drawing in external software and importing
  • The interactive scribble canvas was recently broken and the README notes a fix “will update asap”
  • Anime line model exists but is withheld pending risk evaluation

Verdict Anyone who wants Stable Diffusion to actually respect composition, pose, or geometry should look here. If you’re expecting a polished consumer app, you’ll find a research codebase with nine Python scripts and some honest complaints about Gradio.
