gligen/GLIGEN
A text-to-image diffusion model that grounds generation on spatial inputs like bounding boxes, keypoints, and reference images.

Velocity (7d): +1.8 ★/day · Trend: steady
GLIGEN extends pre-trained text-to-image diffusion models, keeping their weights frozen, so that they accept additional spatial conditioning inputs such as bounding boxes, keypoints, and reference images. Published at CVPR 2023, it reports zero-shot results on the COCO and LVIS benchmarks that outperform supervised layout-to-image baselines. The project includes inference code and a demo hosted on Hugging Face Spaces.