
ByteDance-Seed/Seed1.5-VL

ByteDance's vision-language foundation model, combining a 532M-parameter vision encoder with a 20B-active-parameter MoE LLM for multimodal understanding and reasoning.

1.6k stars · Jupyter Notebook · Language Models · Image · Video · Audio
Velocity (7d): +4.0 ★/day · Trend: steady

Seed1.5-VL is a general-purpose vision-language foundation model for multimodal understanding and reasoning. It pairs a 532M-parameter vision encoder with a mixture-of-experts language model that has 20B active parameters, achieving state-of-the-art performance on benchmarks covering OCR, diagram understanding, visual grounding, 3D spatial reasoning, video comprehension, and agent-centric tasks such as GUI control and gameplay. The repository provides usage cookbooks and best practices for developers.
