showlab/Image2Paragraph
An AI pipeline combining vision models and LLMs to convert images into unique descriptive paragraphs.

Velocity (7d): +0.7 ★/day · Trend: steady
The project turns an image into a descriptive paragraph by chaining several AI models: BLIP2 for image captioning, Segment Anything for semantic segmentation, ControlNet for spatial understanding, GRIT for image parsing, and GPT-4 via the ChatGPT API for text generation. It runs on a GPU with under 8 GB of memory and takes under 20 seconds per image.
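The chaining described above can be pictured as a small orchestration function. The sketch below is illustrative only and is not the repository's code: `ImageFacts`, `build_prompt`, `image_to_paragraph`, and the four callables passed in are hypothetical names, and the stand-in lambdas take the place of real models such as BLIP2, GRIT, Segment Anything, and the LLM. The ControlNet stage is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ImageFacts:
    """Intermediate outputs gathered from the vision stages (hypothetical type)."""
    caption: str
    dense_captions: List[str] = field(default_factory=list)
    region_semantics: List[str] = field(default_factory=list)


def build_prompt(facts: ImageFacts) -> str:
    """Merge the vision outputs into one text prompt for the language model."""
    lines = [
        "Write one descriptive paragraph about an image using only these facts.",
        f"Caption: {facts.caption}",
    ]
    # Skip sections for which a stage returned nothing.
    if facts.dense_captions:
        lines.append("Dense captions: " + "; ".join(facts.dense_captions))
    if facts.region_semantics:
        lines.append("Region semantics: " + "; ".join(facts.region_semantics))
    return "\n".join(lines)


def image_to_paragraph(
    image_path: str,
    captioner: Callable[[str], str],              # assumed stand-in for a BLIP2-style captioner
    dense_captioner: Callable[[str], List[str]],  # assumed stand-in for a GRIT-style parser
    segmenter: Callable[[str], List[str]],        # assumed stand-in for segmentation labels
    llm: Callable[[str], str],                    # assumed stand-in for the text-generation call
) -> str:
    """Run each vision stage on the image, then ask the LLM for a paragraph."""
    facts = ImageFacts(
        caption=captioner(image_path),
        dense_captions=dense_captioner(image_path),
        region_semantics=segmenter(image_path),
    )
    return llm(build_prompt(facts))


if __name__ == "__main__":
    # Stub models so the sketch runs without a GPU or an API key.
    paragraph = image_to_paragraph(
        "example.jpg",
        captioner=lambda path: "a dog running on a beach",
        dense_captioner=lambda path: ["a brown dog", "waves in the background"],
        segmenter=lambda path: ["sand", "sea", "sky"],
        llm=lambda prompt: f"[LLM would answer this prompt]\n{prompt}",
    )
    print(paragraph)
```

Passing the stages in as plain callables keeps the orchestration independent of any particular model library, so a stage can be swapped or stubbed without touching the prompt-building step.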