jina-ai/dalle-flow

DALL·E Flow: an assembly line for picky prompt engineers

An interactive pipeline that chains multiple diffusion models and CLIP ranking so humans can iterate instead of praying to a single prompt.

Velocity (7d): +1.9 ★/day · Trend: steady

What it does

DALL·E Flow is a client-server pipeline that turns a text prompt into a 1024×1024 image in several stages. It generates candidates with DALL·E-Mega, GLID-3 XL, and Stable Diffusion; ranks them with CLIP-as-service; runs the chosen candidate through GLID-3 XL diffusion to enrich its texture; then upscales the result via SwinIR or RealESRGAN. The stages are wired together with Jina, which exposes gRPC, WebSocket, and HTTP interfaces.
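The stage order described above can be sketched as a plain Python pipeline. This is a structural illustration only, not the project's code: every name here (`Candidate`, `generate`, `rank`, `diffuse`, `upscale`, `run_pipeline`) is hypothetical, the "images" are placeholder strings, the 256-pixel starting size is an assumption, and the scoring function is supplied by the caller rather than computed by CLIP. The `pick` callback stands in for the human choosing a favourite at each step.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Candidate:
    source: str      # which generator produced this candidate
    image: str       # placeholder for pixel data
    size: int = 256  # assumed pre-upscale edge length, not taken from the project


Scorer = Callable[[str, Candidate], float]
Picker = Callable[[List[Candidate]], int]


def generate(prompt: str, generators: List[str], per_generator: int) -> List[Candidate]:
    """Stage 1: every generator contributes a few candidates."""
    return [
        Candidate(source=g, image=f"{g}:{prompt}#{i}")
        for g in generators
        for i in range(per_generator)
    ]


def rank(prompt: str, candidates: List[Candidate], score: Scorer) -> List[Candidate]:
    """Stage 2: order candidates by a prompt-to-image score, best first."""
    return sorted(candidates, key=lambda c: score(prompt, c), reverse=True)


def diffuse(chosen: Candidate, variations: int) -> List[Candidate]:
    """Stage 3: produce refined variations of the chosen candidate."""
    return [replace(chosen, image=f"{chosen.image}~v{i}") for i in range(variations)]


def upscale(chosen: Candidate, target: int = 1024) -> Candidate:
    """Stage 4: bring the final pick up to the target resolution."""
    return replace(chosen, size=target)


def run_pipeline(prompt: str, pick: Picker, score: Scorer) -> Candidate:
    generators = ["dalle-mega", "glid3-xl", "stable-diffusion"]
    ranked = rank(prompt, generate(prompt, generators, per_generator=2), score)
    favourite = ranked[pick(ranked)]      # human picks from ranked candidates
    variants = diffuse(favourite, variations=4)
    final = variants[pick(variants)]      # human picks again after diffusion
    return upscale(final)
```

Passing `pick` as a function keeps the human decision points explicit: an interactive client would show the candidates and return the selected index, while a batch job could always return `0` to take the top-ranked result.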

The interesting bit

The project treats generative art as an iterative procedure rather than a lottery. The “human-in-the-loop” angle is the actual product insight: single-prompt-single-output UIs “lock the imagination to a single possibility, which is bad no matter how fine this single result is.” The pipeline formalizes the back-and-forth that artists already want to do.

Key highlights

  • Chains three generators (DALL·E-Mega, GLID-3 XL, Stable Diffusion) plus CLIP ranking and diffusion refinement
  • Upscales output to 1024×1024 via SwinIR; RealESRGAN was added later as an alternative
  • Built on Jina for client-server scaling with non-blocking streaming
  • Prebuilt Docker image for CUDA 11.6; fits on a single GPU with 21GB of memory
  • CLIP-as-service now requires an access token (jina ≥ v3.11.0)
  • Includes CLIP-based segmentation from prompts
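
The CLIP ranking step in the list above comes down to ordering image candidates by how close their embeddings are to the prompt's embedding. A minimal sketch, assuming the embeddings already exist and that cosine similarity is the measure (a common choice for CLIP-style ranking, not confirmed here as the service's exact method). The vectors are made up for illustration, and `cosine` and `rank_by_similarity` are hypothetical helpers, not part of the CLIP-as-service API.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(
    text_vec: Sequence[float],
    image_vecs: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least similar to the text."""
    scored = [(name, cosine(text_vec, vec)) for name, vec in image_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy 3-dimensional embeddings; real CLIP embeddings have hundreds of dimensions.
text_vec = [1.0, 0.0, 1.0]
image_vecs = {
    "candidate_a": [0.9, 0.1, 0.8],
    "candidate_b": [0.0, 1.0, 0.0],
    "candidate_c": [0.5, 0.5, 0.5],
}

ranking = rank_by_similarity(text_vec, image_vecs)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Ranking only narrows the field; the pipeline still leaves the final choice to the person looking at the candidates.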

Caveats

  • The deprecation banner at the top of the README suggests the project may be winding down or superseded; the exact status is unclear from the truncated source
  • Multiple breaking changes and URL migrations documented; Colab notebooks need periodic reopening
  • Stable Diffusion integration requires separate weight download and ToS acceptance

Verdict

Worth a look if you’re building interactive image generation tools and want a reference architecture for chaining models with human feedback loops. Skip if you need a maintained, turnkey API; the deprecation signal and operational churn suggest this is more instructive than dependable.
