cambrian-mllm/cambrian-s
A multimodal LLM that performs spatial supersensing in video sequences, interpreting depth, geometry, and spatial relations.

Star velocity (7 days): +2.3 ★/day · Trend: steady
Cambrian-S is a vision-language model designed to understand spatial relationships and geometry in video. It builds on the Cambrian MLLM family, processing visual input across video frames to produce spatially grounded understanding. The project provides model weights on HuggingFace, a training dataset (VSI-590K), and an evaluation benchmark for spatial video understanding (VSI-Super).