
deepglint/unicom

UNICOM is a large-scale vision transformer model designed as a visual backbone for multimodal large language models like LLaVA.
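
As a rough illustration of what "visual backbone" means in a LLaVA-style pipeline, the sketch below takes per-patch features from a vision encoder, maps them to the LLM's hidden width with a small MLP projector (Linear, GELU, Linear), and joins them with text token embeddings. The names `mlp_projector` and `build_llm_inputs`, the toy dimensions, and the random weights are all hypothetical. This is not code from the repository. For simplicity it prepends the visual tokens. Real pipelines usually splice them in at an image placeholder position in the prompt.

```python
import math
import random
from typing import List

Matrix = List[List[float]]

def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), computed with the error function
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def linear(x: List[float], w: Matrix, b: List[float]) -> List[float]:
    # y = W x + b, where w has shape (out_dim, in_dim)
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]

def mlp_projector(patch_features: Matrix, d_llm: int, seed: int = 0) -> Matrix:
    """Hypothetical 2-layer MLP mapping each patch feature from the vision
    width to the LLM hidden width. Weights are random here; a real
    pipeline learns them during multimodal training."""
    rng = random.Random(seed)
    d_vis = len(patch_features[0])
    w1, b1 = random_matrix(d_llm, d_vis, rng), [0.0] * d_llm
    w2, b2 = random_matrix(d_llm, d_llm, rng), [0.0] * d_llm
    return [linear([gelu(h) for h in linear(p, w1, b1)], w2, b2)
            for p in patch_features]

def build_llm_inputs(visual_tokens: Matrix, text_embeddings: Matrix) -> Matrix:
    # Simplification: visual tokens are prepended to the text tokens
    return visual_tokens + text_embeddings

# Toy usage: 4 patches of width 16 projected to an LLM width of 8
rng = random.Random(1)
patches = [[rng.random() for _ in range(16)] for _ in range(4)]
text = [[0.0] * 8 for _ in range(3)]  # 3 placeholder text embeddings
seq = build_llm_inputs(mlp_projector(patches, d_llm=8), text)
print(len(seq), len(seq[0]))  # 7 8
```

The projector is the only component that has to bridge the two widths. This is why the same vision tower can be paired with different language models.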

Velocity (7 days): +0.6 ★/day · Trend: steady

The repository provides foundational visual representation models trained at scale on the LAION400M and COYO700M datasets. It uses sample-to-cluster contrastive learning to train the vision encoders, which then serve as the vision tower in multimodal LLM pipelines such as LLaVA-NeXT with Qwen2.5-7B. The reported benchmarks cover document understanding, chart analysis, and general VQA tasks, where the models show strong results.
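
The general idea of sample-to-cluster contrastive learning can be sketched as a softmax cross-entropy over cosine similarities between one sample embedding and a set of cluster centers. The sample is pulled toward its assigned cluster and pushed away from all others. This is a minimal sketch under stated assumptions. It assumes the centers come from an offline clustering step (for example k-means) and that each sample is pseudo-labeled with its nearest center. The functions and the `scale` parameter are hypothetical. The repository's actual training objective may add refinements that are omitted here.

```python
import math
from typing import List, Sequence

def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))

def assign_cluster(embedding: Sequence[float], centers: List[List[float]]) -> int:
    # Pseudo-label: index of the most similar cluster center (assumption)
    sims = [cosine(embedding, c) for c in centers]
    return max(range(len(centers)), key=sims.__getitem__)

def sample_to_cluster_loss(embedding: Sequence[float], centers: List[List[float]],
                           label: int, scale: float = 16.0) -> float:
    """Cross-entropy over scaled cosine similarities between one sample
    and every cluster center; `label` is the sample's assigned cluster.
    `scale` is a hypothetical temperature-like hyperparameter."""
    logits = [scale * cosine(embedding, c) for c in centers]
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

# Toy usage with three 2-D cluster centers
centers = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
x = [0.9, 0.1]
y = assign_cluster(x, centers)  # nearest center is index 0
loss_correct = sample_to_cluster_loss(x, centers, y)
loss_wrong = sample_to_cluster_loss(x, centers, 2)
print(y, loss_correct < loss_wrong)  # 0 True
```

Comparing each sample against cluster centers rather than against every other sample keeps the number of comparisons tied to the number of clusters instead of the batch or dataset size.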
