AILab-CVC/SEED
Multimodal LLM combining vision and language capabilities, with training code for pretraining and instruction tuning.

Velocity (7d): +0.6 ★/day · Trend: steady
This repository provides the official implementation of SEED-LLaMA, a multimodal large language model that integrates visual and textual understanding. The codebase includes the SEED tokenizer, multimodal LLM pretraining pipeline, and instruction tuning components. It supports large-scale multi-node training with DeepSpeed and provides efficient training data pipelines for vision-language model development.
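As a rough illustration of what multi-node DeepSpeed training involves, the sketch below builds a minimal DeepSpeed-style configuration and computes the resulting global batch size. It is not taken from this repository: the helper names `build_ds_config` and `effective_batch_size`, the batch sizes, the ZeRO stage, and the node counts are all hypothetical. Only the configuration keys themselves are standard DeepSpeed options.

```python
import json


def build_ds_config(micro_batch_per_gpu: int, grad_accum_steps: int, zero_stage: int = 2) -> dict:
    """Return a minimal DeepSpeed-style config dict (all values are illustrative assumptions)."""
    return {
        # Samples processed by each GPU in one forward/backward pass.
        "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
        # Number of micro-batches accumulated before each optimizer step.
        "gradient_accumulation_steps": grad_accum_steps,
        # Mixed-precision training in bfloat16.
        "bf16": {"enabled": True},
        # ZeRO partitioning of optimizer state (and gradients at stage 2).
        "zero_optimization": {"stage": zero_stage},
    }


def effective_batch_size(config: dict, num_nodes: int, gpus_per_node: int) -> int:
    """Global batch size = micro batch * accumulation steps * total number of GPUs."""
    world_size = num_nodes * gpus_per_node
    return (
        config["train_micro_batch_size_per_gpu"]
        * config["gradient_accumulation_steps"]
        * world_size
    )


if __name__ == "__main__":
    cfg = build_ds_config(micro_batch_per_gpu=4, grad_accum_steps=8)
    print(json.dumps(cfg, indent=2))
    # Hypothetical cluster: 2 nodes with 8 GPUs each.
    print(effective_batch_size(cfg, num_nodes=2, gpus_per_node=8))
```

A dict like this would typically be saved as JSON and passed to the DeepSpeed launcher; the repository's actual configuration files and launch scripts may differ.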