
cambrian-mllm/cambrian-s

A multimodal LLM that performs spatial supersensing — interpreting depth, geometry, and spatial relations — in video sequences.

Velocity (7d): +2.3 ★/day · Trend: steady

Cambrian-S is a vision-language model designed to understand spatial relationships and geometry in video. It builds on the Cambrian MLLM family, processing visual input and producing spatially grounded interpretations across video frames. The project includes model weights on HuggingFace, a training dataset (VSI-590K), and an evaluation benchmark (VSI-Super) for spatial video understanding.
