microsoft/SoM
SoM is a visual prompting technique that overlays spatial marks on images to improve GPT-4V's visual grounding capabilities.

The Set-of-Mark (SoM) prompting method overlays visual markers on images to unlock spatial and visual understanding in large multimodal models such as GPT-4V. The project provides a Python toolbox for generating SoM prompts, a GPT-4V demo integration, a vision benchmark for evaluation, and SoM-LLaVA, an open-source implementation that brings this technique to other open-source MLLMs.
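To make the idea concrete, here is a minimal, dependency-free sketch of the two steps SoM combines: assign a numeric mark to each image region and place it on the image, then phrase the prompt so the model answers in terms of those marks. This is not the toolbox's API. Several pieces are assumptions made for illustration. The regions are supplied by hand as boolean grids rather than produced by a segmentation model. The placement rule (the in-region pixel nearest the centroid) is a simple stand-in and is not necessarily what the released toolbox uses. A text grid stands in for drawing on the real image. The function names and prompt wording (`mark_anchor`, `allocate_marks`, `render_marks`, `som_prompt`) are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: each region is a 2-D boolean grid the size of the
# image (True = pixel belongs to the region). In SoM these regions would come from
# a segmentation model; here they are supplied by hand.
Mask = List[List[bool]]
Point = Tuple[int, int]  # (row, col)


def mark_anchor(mask: Mask) -> Point:
    """Pick where to draw a region's mark: the in-region pixel nearest its centroid."""
    pixels = [(r, c) for r, row in enumerate(mask) for c, inside in enumerate(row) if inside]
    if not pixels:
        raise ValueError("cannot place a mark on an empty region")
    cr = sum(r for r, _ in pixels) / len(pixels)
    cc = sum(c for _, c in pixels) / len(pixels)
    # Snapping to a region pixel keeps the mark inside non-convex (e.g. U-shaped)
    # regions whose centroid falls outside the region itself.
    return min(pixels, key=lambda p: (p[0] - cr) ** 2 + (p[1] - cc) ** 2)


def allocate_marks(masks: List[Mask]) -> Dict[int, Point]:
    """Assign numeric marks 1..N to regions and compute where each one is drawn."""
    return {k: mark_anchor(m) for k, m in enumerate(masks, start=1)}


def render_marks(height: int, width: int, anchors: Dict[int, Point]) -> List[str]:
    """Text stand-in for the overlay: '.' is background, the mark id sits at its anchor.

    Assumes single-digit mark ids so every cell stays one character wide.
    """
    canvas = [["."] * width for _ in range(height)]
    for k, (r, c) in anchors.items():
        canvas[r][c] = str(k)
    return ["".join(row) for row in canvas]


def som_prompt(question: str, anchors: Dict[int, Point]) -> str:
    """Wrap a user question so the model is asked to answer in terms of mark ids."""
    ids = ", ".join(str(k) for k in sorted(anchors))
    return (
        f"The image is overlaid with numeric marks ({ids}), one per region. "
        f"Refer to regions by their mark numbers. {question}"
    )


if __name__ == "__main__":
    # Two hand-made regions on a 4x4 "image": a 3x3 block and part of the bottom row.
    masks = [
        [[r < 3 and c < 3 for c in range(4)] for r in range(4)],
        [[r == 3 and c >= 1 for c in range(4)] for r in range(4)],
    ]
    anchors = allocate_marks(masks)
    print("\n".join(render_marks(4, 4, anchors)))
    print(som_prompt("Which region is larger?", anchors))
```

Run directly, the example places mark 1 at the center of the block and mark 2 in the middle of the bottom-row segment, so the text overlay reads `....`, `.1..`, `....`, `..2.`. In a real pipeline the marked image and the prompt are sent to the model together, and the model's answer can be mapped back to regions through the mark ids.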