
hustvl/EVF-SAM

A multimodal model that segments objects in images based on text prompts by fusing vision and language representations early.

Velocity (7d): +0.7 ★/day · Trend: steady

This repository implements EVF-SAM, which extends the Segment Anything Model (SAM) with early vision-language fusion for text-prompted referring image segmentation. Given an image and a text description, the model outputs a segmentation mask for the described region. It builds on both the original SAM and SAM-2 architectures, so users can specify what to segment in natural language rather than with visual prompts such as points or boxes.
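
The idea of early fusion can be illustrated with a toy sketch. It is not the repository's code or API. It assumes that "early fusion" means image and text tokens are encoded jointly in one sequence, and that the pooled result serves as a prompt embedding for a mask decoder. All function names, shapes, and the identity-projection attention layer are hypothetical stand-ins chosen for illustration, and the inputs are random data.

```python
import numpy as np

def self_attention(tokens):
    # Toy single-head self-attention with identity projections
    # (a stand-in for a real transformer layer, not EVF-SAM's encoder).
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ tokens

def early_fusion_prompt(image_tokens, text_tokens):
    # Early fusion: both modalities share one sequence before encoding,
    # so text tokens can attend to image tokens and vice versa.
    joint = np.concatenate([image_tokens, text_tokens], axis=0)
    fused = self_attention(joint)
    return fused.mean(axis=0)  # pooled prompt embedding, shape (dim,)

def decode_mask(pixel_features, prompt_embedding):
    # Hypothetical decoder: score each pixel feature against the prompt
    # embedding and threshold the logits at zero to get a binary mask.
    logits = pixel_features @ prompt_embedding
    return logits > 0

rng = np.random.default_rng(0)
dim = 16
image_tokens = rng.normal(size=(64, dim))        # e.g. an 8x8 grid of patches
text_tokens = rng.normal(size=(5, dim))          # e.g. a 5-token description
pixel_features = rng.normal(size=(32, 32, dim))  # per-pixel features

prompt = early_fusion_prompt(image_tokens, text_tokens)
mask = decode_mask(pixel_features, prompt)
print(mask.shape, mask.dtype)
```

In a late-fusion design, by contrast, the text would be encoded on its own and only combined with image features afterwards. The sketch shows only where the two modalities meet, not how the real model is trained or structured.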
