pharmapsychotic/clip-interrogator

Reverse-engineering images into Stable Diffusion prompts

A tool that runs an image through CLIP and BLIP to spit back a text prompt that could have generated it.

Velocity (7d): +2.1 ★/day · Trend: steady

What it does

Feed it an image and it returns a prompt string optimized for text-to-image models like Stable Diffusion. It pairs OpenAI’s CLIP (for matching images to text embeddings) with Salesforce’s BLIP (for caption generation), then ranks candidate terms to assemble something coherent.
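
The caption-plus-ranked-terms idea described above can be illustrated with a toy example. This is a minimal sketch, not the project's code: the three-dimensional vectors stand in for real CLIP embeddings, the caption string stands in for BLIP output, and the helper names (`cosine`, `rank_terms`, `build_prompt`) are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_terms(image_vec, term_vecs, top_k=2):
    """Return the top_k terms whose vectors are most similar to the image vector."""
    scored = sorted(
        term_vecs.items(),
        key=lambda item: cosine(image_vec, item[1]),
        reverse=True,
    )
    return [term for term, _ in scored[:top_k]]


def build_prompt(caption, image_vec, term_vecs, top_k=2):
    """Append the best-matching terms to the caption, comma-separated."""
    return ", ".join([caption] + rank_terms(image_vec, term_vecs, top_k))


# Toy stand-ins: a real pipeline would get these from CLIP and BLIP.
image_vec = [0.9, 0.1, 0.3]
term_vecs = {
    "oil painting": [1.0, 0.0, 0.2],
    "pixel art": [0.0, 1.0, 0.0],
    "dramatic lighting": [0.5, 0.1, 0.8],
}

prompt = build_prompt("a castle on a hill", image_vec, term_vecs)
print(prompt)  # a castle on a hill, oil painting, dramatic lighting
```

How the real tool weighs the caption against the ranked terms may differ from this simple concatenation.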

The interesting bit

The project is essentially prompt engineering as infrastructure. It precomputes text embeddings, caches them, and lets you swap CLIP models depending on which Stable Diffusion version you’re targeting: ViT-L-14/openai for SD 1.x, ViT-H-14/laion2b_s32b_b79k for SD 2.0. There’s even a low-VRAM mode that drops usage from ~6.3GB to ~2.7GB, at some cost to speed and quality.
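
The model pairing can be captured in a small lookup. The sketch below is hedged: `pick_clip_model` and `interrogate` are hypothetical helpers, and treating every 2.x release like 2.0 is this sketch's assumption. The `Config`, `Interrogator`, `apply_low_vram_defaults`, and `interrogate` calls follow the project's README as best recalled, so check them against the version you install.

```python
# Pairings quoted in the text above; other Stable Diffusion versions are not covered.
CLIP_MODEL_FOR_SD = {
    "1": "ViT-L-14/openai",
    "2": "ViT-H-14/laion2b_s32b_b79k",  # assumption: all 2.x treated like 2.0
}


def pick_clip_model(sd_version: str) -> str:
    """Map a Stable Diffusion version string such as '1.5' to a CLIP model name."""
    major = sd_version.strip().split(".")[0]
    if major not in CLIP_MODEL_FOR_SD:
        raise ValueError(f"no CLIP pairing known for Stable Diffusion {sd_version!r}")
    return CLIP_MODEL_FOR_SD[major]


def interrogate(image_path: str, sd_version: str, low_vram: bool = False) -> str:
    """Run the library on one image; imports are deferred so the lookup works without it."""
    from PIL import Image
    from clip_interrogator import Config, Interrogator

    config = Config(clip_model_name=pick_clip_model(sd_version))
    if low_vram:
        # Lower memory use, traded against speed and quality.
        config.apply_low_vram_defaults()
    image = Image.open(image_path).convert("RGB")
    return Interrogator(config).interrogate(image)
```

Calling `interrogate("photo.jpg", "1.5", low_vram=True)` would then pick ViT-L-14/openai and apply the low-VRAM settings; the file name is only a placeholder.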

Key highlights

  • Available as a pip-installable library, Colab notebook, HuggingFace Space, Replicate model, and Stable Diffusion Web UI extension
  • Supports OpenCLIP’s model zoo; model choice is directly tied to Stable Diffusion version compatibility
  • Can rank custom term lists against your own images (v0.6.0+)
  • Precomputed embedding cache downloadable from HuggingFace to skip local computation
  • CLI and Gradio reference implementations included
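
The embedding cache mentioned in the list above can be sketched as follows. This shows the general technique only: the JSON file layout, the hash-based file name, and the `cache_path` and `load_or_compute` helpers are assumptions for illustration, not the project's actual cache format, and `fake_embed` stands in for a real CLIP text encoder.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cache_path(cache_dir, model_name, terms):
    """Build a file name that changes whenever the model or the term list changes."""
    digest = hashlib.sha256("\n".join(terms).encode("utf-8")).hexdigest()[:16]
    safe_model = model_name.replace("/", "_")
    return Path(cache_dir) / f"{safe_model}_{digest}.json"


def load_or_compute(cache_dir, model_name, terms, embed_fn):
    """Return term embeddings from disk if cached, otherwise compute and store them."""
    path = cache_path(cache_dir, model_name, terms)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    embeddings = {term: embed_fn(term) for term in terms}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(embeddings), encoding="utf-8")
    return embeddings


calls = []


def fake_embed(term):
    """Hypothetical stand-in for a text encoder; records each call."""
    calls.append(term)
    return [float(len(term)), float(term.count(" "))]


with tempfile.TemporaryDirectory() as tmp:
    terms = ["oil painting", "pixel art"]
    first = load_or_compute(tmp, "ViT-L-14/openai", terms, fake_embed)
    second = load_or_compute(tmp, "ViT-L-14/openai", terms, fake_embed)  # cache hit
    other = load_or_compute(tmp, "ViT-H-14/laion2b_s32b_b79k", terms, fake_embed)

print(len(calls))  # 4: two terms embedded once per model, none on the cache hit
```

Keying the cache on the model name matters because embeddings from different CLIP models are not interchangeable.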

Caveats

  • The README doesn’t explain how the BLIP + CLIP combination actually resolves conflicts between caption and ranked terms; the blending logic is opaque
  • v0.6.0 with BLIP2 support is labeled “WIP”; the stable release is pinned to 0.5.4
  • Output quality depends heavily on picking the right CLIP model for your target diffusion model; wrong pairing means wasted prompts

Verdict

Worth a look if you’re building image-to-prompt pipelines or trying to clone a visual style in Stable Diffusion. Skip it if you need interpretable, fine-grained control over how prompts are constructed — this is a black box that happens to work.
