Is cambrian-s open source?

Yes — cambrian-mllm/cambrian-s is open source, released under the Apache-2.0 license.

What language is cambrian-s written in?

cambrian-mllm/cambrian-s is primarily written in Python.

How popular is cambrian-s?

cambrian-mllm/cambrian-s has 562 stars on GitHub.

Where can I find cambrian-s?

cambrian-mllm/cambrian-s is on GitHub at https://github.com/cambrian-mllm/cambrian-s.


cambrian-mllm/cambrian-s

Teaching video models to know where things are, not just what they are

A family of open video MLLMs and a 590K-sample dataset targeting spatial reasoning—the boring kind of intelligence that most benchmarks accidentally let models cheat on.

★ 562 stars | Python | Language Models | Image · Video · Audio | Computer Vision


What it does

Cambrian-S is a set of video understanding models (0.5B to 7B parameters) built on Qwen2.5 and SigLIP2, plus a curated instruction-tuning dataset called VSI-590K. The project also ships VSI-SUPER, a benchmark designed to test whether models actually understand spatial relationships in video or are just exploiting textual shortcuts.
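One way to check for textual shortcuts on any video QA benchmark is to score the same model twice, once with the video and once with the question text only, and compare both against a majority-answer guess. The sketch below is a minimal illustration of that idea, not the VSI-SUPER evaluation code; the function names and the toy answers are made up for the example.

```python
from collections import Counter


def accuracy(predictions, answers):
    """Fraction of predictions that exactly match the reference answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("need at least one example")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def majority_baseline(answers):
    """Accuracy of always guessing the most common answer (no model at all)."""
    _, count = Counter(answers).most_common(1)[0]
    return count / len(answers)


def shortcut_report(with_video, text_only, answers):
    """Compare accuracy with and without access to the video (hypothetical helper)."""
    acc_video = accuracy(with_video, answers)
    acc_blind = accuracy(text_only, answers)
    return {
        "with_video": acc_video,
        "text_only": acc_blind,
        "majority": majority_baseline(answers),
        # A small gain means the video contributes little to the score.
        "video_gain": acc_video - acc_blind,
    }


# Toy multiple-choice data, invented purely for illustration.
answers = ["A", "B", "A", "C", "A", "B"]
with_video = ["A", "B", "A", "C", "B", "B"]
text_only = ["A", "B", "A", "A", "A", "A"]

report = shortcut_report(with_video, text_only, answers)
```

If `text_only` lands well above `majority` and `video_gain` is small, the benchmark is probably answerable from the question text alone.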

The interesting bit

The authors noticed that standard spatial benchmarks are leaky: models can score well without looking at the video. Their fix is “predictive sensing,” a training objective in which the model learns to predict latent future frames, which pushes it to build genuine spatial representations. One variant, Cambrian-S-7B-LFP, is trained explicitly with this objective.
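To make the idea of predicting latent future frames concrete, here is a toy sketch of a next-latent prediction loss. It assumes each frame has already been encoded as a small vector and uses mean squared error; the actual objective, latents, and predictor used for Cambrian-S-7B-LFP are described in the paper and may differ. All names here are hypothetical.

```python
def mse(u, v):
    """Mean squared error between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def next_latent_loss(latents, predictor):
    """Average error of predicting frame t+1's latent from frame t's latent."""
    if len(latents) < 2:
        raise ValueError("need at least two frames")
    errors = [
        mse(predictor(latents[t]), latents[t + 1])
        for t in range(len(latents) - 1)
    ]
    return sum(errors) / len(errors)


# Toy per-frame latents: every component grows by 1.0 per frame.
latents = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]


def copy_last(z):
    """Predictor that assumes the scene never changes."""
    return list(z)


def shift_by_one(z):
    """Predictor that has captured the (toy) motion pattern."""
    return [x + 1.0 for x in z]


static_loss = next_latent_loss(latents, copy_last)
motion_loss = next_latent_loss(latents, shift_by_one)
```

The predictor that models how the scene evolves gets a lower loss than the one that copies the current frame, which is the pressure the objective is meant to apply.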

Key highlights

Four model sizes (0.5B–7B) plus an LFP variant; all weights and training code are open
VSI-590K: 590K video QA pairs focused on spatial reasoning, with task-type and question-type breakdowns visible in the repo
VSI-SUPER benchmark plus evaluation code in lmms-eval/
Competitive on general video benchmarks (Perception Test, EgoSchema) while outperforming prior work on spatial tasks
Training pipeline is four-staged, from vision-language alignment through spatial video tuning; scripts provided for each stage
TPU-first training stack via TorchXLA

Caveats

Training is TPU-oriented (TorchXLA); GPU support is not mentioned in the README
The “predictive sensing” mechanism is described at a high level; architectural details require reading the paper
Dataset construction diagram suggests heavy curation, but exact filtering logic isn’t specified in the README
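Because training is TPU-oriented, a quick pre-flight check of the local environment can save a failed launch. The sketch below only tests whether the `torch` and `torch_xla` packages are importable, using the standard library; it is not part of the repository, and the helper names and messages are made up for illustration.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(name) is not None


def describe_training_backend():
    """Summarize whether the TorchXLA stack appears to be installed."""
    has_torch = has_module("torch")
    has_xla = has_module("torch_xla")
    if has_torch and has_xla:
        return "torch and torch_xla found: TPU-style training may be possible"
    if has_torch:
        return "torch found, torch_xla missing: TPU training scripts will not run as-is"
    return "torch not found: install PyTorch before anything else"


backend_summary = describe_training_backend()
```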

Verdict

Worth a look if you’re building spatially-aware video agents, or if you’re suspicious that your current MLLM is just reading captions. Skip if you need a drop-in GPU training recipe today.
