
showlab/Image2Paragraph

An AI pipeline combining vision models and LLMs to convert images into unique descriptive paragraphs.

Velocity (7d): +0.7 ★ / day · Trend: steady

The project turns an image into a paragraph by chaining several AI models: BLIP2 for image captioning, Segment Anything for semantic segmentation, ControlNet for spatial understanding, GRIT for image parsing, and GPT-4 (via the ChatGPT API) for text generation. It runs on a GPU with under 8 GB of memory and processes each image in under 20 seconds.
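To make the chaining concrete, here is a minimal sketch of how the outputs of the vision stages could be merged into a single prompt for the language model. It is not the repository's code: `Region`, `build_prompt`, `image_to_paragraph`, and the four callables are hypothetical names, and each model is passed in as a plain function so that no real model API is assumed. The ControlNet stage is left out because the summary above does not say how its output feeds the text.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Region:
    """One detected region: a text label plus a pixel box (x1, y1, x2, y2)."""
    label: str
    box: Tuple[int, int, int, int]


def build_prompt(caption: str, dense: List[Region], segments: List[Region]) -> str:
    """Merge the vision-stage outputs into one text prompt (hypothetical format)."""
    lines = [f"Image caption: {caption}", "Dense captions:"]
    lines += [f"- {r.label} at {r.box}" for r in dense]
    lines.append("Segmented regions:")
    lines += [f"- {r.label} at {r.box}" for r in segments]
    lines.append("Write one descriptive paragraph based only on the facts above.")
    return "\n".join(lines)


def image_to_paragraph(
    image_path: str,
    captioner: Callable[[str], str],               # stands in for a BLIP2-style captioner
    dense_captioner: Callable[[str], List[Region]],  # stands in for a GRIT-style parser
    segmenter: Callable[[str], List[Region]],      # stands in for a Segment Anything stage
    llm: Callable[[str], str],                     # stands in for the text-generation call
) -> str:
    """Run each vision stage on the image, then hand the merged prompt to the LLM."""
    caption = captioner(image_path)
    dense = dense_captioner(image_path)
    segments = segmenter(image_path)
    return llm(build_prompt(caption, dense, segments))
```

Passing the stages as callables keeps the sketch testable with stubs; in a real setup each callable would wrap the corresponding model's inference code.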
