
NVlabs/describe-anything

A large multimodal model that generates detailed captions for arbitrary regions of images or video frames.

Velocity (7d): +3.5 ★/day · Trend: steady

The Describe Anything Model (DAM) takes region annotations (points, boxes, scribbles, or masks) on images or video frames and outputs detailed textual descriptions of those regions. For videos, annotating the region on a single frame is sufficient. The project also introduces a new benchmark, DLC-Bench, for evaluating models on the detailed localized captioning (DLC) task.
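As a rough illustration of what a "region annotation" is, the sketch below normalizes two of the prompt types (a box and a set of points) into a binary mask over the image grid. The names `Box`, `Point`, `box_to_mask`, and `points_to_mask` are hypothetical and are not part of the describe-anything codebase. The real pipeline may convert prompts differently, for example with a segmentation model.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical region-prompt types for illustration only; not the DAM API.
@dataclass
class Box:
    x0: int
    y0: int
    x1: int  # exclusive right edge
    y1: int  # exclusive bottom edge

@dataclass
class Point:
    x: int
    y: int

# Binary mask indexed as mask[y][x], each value 0 or 1.
Mask = List[List[int]]

def box_to_mask(box: Box, width: int, height: int) -> Mask:
    """Rasterize a box into a binary mask; iterating only over the image grid clips it to the bounds."""
    return [
        [1 if box.x0 <= x < box.x1 and box.y0 <= y < box.y1 else 0 for x in range(width)]
        for y in range(height)
    ]

def points_to_mask(points: List[Point], width: int, height: int) -> Mask:
    """Mark each in-bounds point as a single foreground pixel; out-of-bounds points are ignored."""
    mask = [[0] * width for _ in range(height)]
    for p in points:
        if 0 <= p.x < width and 0 <= p.y < height:
            mask[p.y][p.x] = 1
    return mask
```

For example, `box_to_mask(Box(1, 1, 3, 2), 4, 3)` marks the two pixels in row 1, columns 1 and 2. Reducing every prompt type to a mask gives downstream code a single region representation to work with, which is one plausible design for a model that accepts several annotation styles.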
