
microsoft/SoM

SoM is a visual prompting technique that overlays spatial marks on images to improve GPT-4V's visual grounding capabilities.

1.5k stars · Python · Language Models · LLMOps · Eval
Velocity (7d): +1.6 ★/day · Trend: steady

Set-of-Mark (SoM) prompting overlays visual marks on an image to unlock spatial and visual understanding in large multimodal models such as GPT-4V. The project releases a Python toolbox for generating SoM prompts, a GPT-4V demo integration, a vision benchmark for evaluation, and SoM-LLaVA, an open-source implementation that brings the technique to other open-source MLLMs.
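The overlay idea can be illustrated with a minimal, dependency-free sketch. This is not the SoM toolbox's API: the `Region` class, `overlay_marks`, and `build_prompt` are hypothetical names, the "image" is a grid of characters rather than pixels, and the regions are assumed to be given as boxes (a real pipeline would obtain them from some region-proposal step and draw the marks with an imaging library).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical region: a box on the grid, right/bottom exclusive."""
    left: int
    top: int
    right: int
    bottom: int

    def center(self) -> tuple[int, int]:
        # Integer midpoint of the box, used as the mark position.
        return ((self.left + self.right) // 2, (self.top + self.bottom) // 2)


def overlay_marks(canvas: list[str], regions: list[Region]) -> list[str]:
    """Return a copy of the canvas with a numeric mark at each region's center."""
    grid = [list(row) for row in canvas]
    for mark_id, region in enumerate(regions, start=1):
        x, y = region.center()
        # Write each digit of the mark, skipping cells outside the canvas.
        for offset, ch in enumerate(str(mark_id)):
            if 0 <= y < len(grid) and 0 <= x + offset < len(grid[y]):
                grid[y][x + offset] = ch
    return ["".join(row) for row in grid]


def build_prompt(question: str, regions: list[Region]) -> str:
    """Compose a text prompt that refers to the overlaid mark numbers."""
    ids = ", ".join(str(i) for i in range(1, len(regions) + 1))
    return (
        f"The image is annotated with numbered marks ({ids}). "
        f"{question} Answer by mark number."
    )


# Assumed toy input: a 12x4 blank canvas with two side-by-side regions.
canvas = ["." * 12 for _ in range(4)]
regions = [Region(0, 0, 4, 4), Region(6, 0, 12, 4)]
marked = overlay_marks(canvas, regions)
prompt = build_prompt("Which object is on the left?", regions)
```

In this toy setup the third row of `marked` becomes `..1......2..`, and the prompt lets the model answer with a mark number instead of describing a location in words, which is the grounding benefit the description refers to.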
