ttengwang/Caption-Anything
A multi-model image captioning tool that combines Segment Anything, visual captioning, and ChatGPT for controllable text generation.

Velocity (7d): +1.5 ★ / day · Trend: steady
Caption-Anything is an image processing system that leverages Segment Anything for visual segmentation, a captioning model for text generation, and ChatGPT for language-level control. Users can click on image regions to select objects, then generate descriptive captions with customizable style, length, sentiment, and factuality. The system also supports conversational follow-up about selected objects via ChatGPT integration.
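The click-to-caption flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `CaptionControls`, `build_refinement_prompt`, and `caption_region` are hypothetical names, and the segmenter, captioner, and refiner are passed in as plain callables standing in for Segment Anything, the captioning model, and ChatGPT respectively.

```python
from dataclasses import dataclass


@dataclass
class CaptionControls:
    """Hypothetical container for the four user-facing controls."""
    style: str = "neutral"
    length: int = 20            # target caption length, in words
    sentiment: str = "natural"
    factual: bool = True        # False allows imaginative detail


def build_refinement_prompt(raw_caption: str, controls: CaptionControls) -> str:
    """Turn a raw caption plus controls into an instruction for a language model."""
    factual_rule = (
        "Stick to what the caption states."
        if controls.factual
        else "You may add imaginative detail."
    )
    return (
        f"Rewrite this image caption: '{raw_caption}'. "
        f"Style: {controls.style}. "
        f"Sentiment: {controls.sentiment}. "
        f"Length: about {controls.length} words. "
        f"{factual_rule}"
    )


def caption_region(image, click_xy, segmenter, captioner, refiner, controls):
    """Run the three stages in order: segment, caption, refine.

    segmenter(image, click_xy) -> mask      (stand-in for Segment Anything)
    captioner(image, mask)     -> raw text  (stand-in for the captioning model)
    refiner(prompt)            -> text      (stand-in for ChatGPT)
    """
    mask = segmenter(image, click_xy)
    raw_caption = captioner(image, mask)
    return refiner(build_refinement_prompt(raw_caption, controls))
```

Keeping each stage as an injected callable means the control logic can be exercised with simple stubs before any real model is wired in.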