cambrian-mllm/cambrian
A family of multimodal large language models that combine vision encoders (DINO, CLIP) with LLMs for vision-language understanding.

Velocity (7d): +2.8 ★/day · Trend: steady
Cambrian-1 is a family of multimodal LLMs designed with a vision-centric approach: it pairs vision encoders with a language model so that the model can understand images as well as text. The project releases model weights, training data (Cambrian-10M), and the CV-Bench evaluation benchmark. Both base and instruction-tuned variants are available in several sizes.