AIDC-AI/Ovis
Ovis is a multimodal LLM architecture that structurally aligns visual and textual embeddings, available in multiple model sizes (1B–34B).

Ovis is a multimodal large language model architecture designed to structurally align visual and textual embeddings. It supports vision-language understanding, reasoning, chart analysis, video comprehension, and multilingual OCR. The project provides models in multiple sizes (1B to 34B parameters) across versions including Ovis2 and the newer Ovis2.5, which adds native-resolution visual perception and enhanced reflective reasoning.