nerdyrodent/VQGAN-CLIP
A local implementation of VQGAN+CLIP for generating images from text prompts.

Velocity (7d): +1.5 ★/day · Trend: steady
This repository provides a local setup for running VQGAN+CLIP, a technique for generating images from text descriptions. Based on Katherine Crowson’s work, it pairs VQGAN (a generative adversarial network built around a vector-quantized image representation) with CLIP (a vision-language model that scores how well an image matches a piece of text). Given a text prompt, the system iteratively refines the generated image so that CLIP rates it as a closer match to the prompt.
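The refinement loop described above can be illustrated with a toy sketch. This is not the repository's code and uses no real VQGAN or CLIP model: `toy_text_embedding`, `toy_decode`, `toy_image_embedding`, and `refine` are hypothetical stand-ins invented for illustration, and the gradient is estimated by finite differences rather than backpropagation. It only shows the general shape of the idea: adjust a latent so that the embedding of the decoded "image" moves closer to the embedding of the prompt.

```python
import math
import random

DIM = 8  # toy embedding size (assumption); real embeddings are far larger


def toy_text_embedding(prompt):
    """Hypothetical stand-in for a text encoder: a fixed pseudo-random vector per prompt."""
    rng = random.Random(prompt)
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def toy_decode(latent):
    """Hypothetical stand-in for an image decoder: squashes the latent into a bounded 'image'."""
    return [math.tanh(v) for v in latent]


def toy_image_embedding(image):
    """Hypothetical stand-in for an image encoder (identity in this sketch)."""
    return list(image)


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def score(latent, text_emb):
    """How well the decoded 'image' matches the prompt embedding."""
    return cosine_similarity(toy_image_embedding(toy_decode(latent)), text_emb)


def refine(prompt, steps=300, lr=0.1, eps=1e-5, seed=0):
    """Gradient ascent on the latent; returns the final latent and the score history."""
    rng = random.Random(seed)
    text_emb = toy_text_embedding(prompt)
    latent = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]
    history = [score(latent, text_emb)]
    for _ in range(steps):
        grad = []
        for i in range(DIM):
            # Central finite difference along coordinate i.
            up = latent[:]
            up[i] += eps
            down = latent[:]
            down[i] -= eps
            grad.append((score(up, text_emb) - score(down, text_emb)) / (2 * eps))
        # Step in the direction that increases the match score.
        latent = [v + lr * g for v, g in zip(latent, grad)]
        history.append(score(latent, text_emb))
    return latent, history


if __name__ == "__main__":
    _, history = refine("a painting of a lighthouse")
    print(f"initial score: {history[0]:.3f}, final score: {history[-1]:.3f}")
```

In a real setup the decoder and both encoders would be neural networks, and the gradient would come from automatic differentiation rather than finite differences; the sketch keeps everything in the standard library so the loop itself is easy to follow.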