AIDC-AI/Ovis

Ovis is a multimodal LLM architecture that structurally aligns visual and textual embeddings, available in multiple model sizes (1B–34B).

★ 1.5k stars · Python · Language Models


Velocity (7d): +2.0 ★ / day · Trend: steady

Ovis is a multimodal large language model (MLLM) architecture designed to structurally align visual and textual embeddings. It supports vision-language understanding, reasoning, chart analysis, video comprehension, and multilingual OCR. The project provides models from 1B to 34B parameters across releases including Ovis2 and the newer Ovis2.5, which adds native-resolution visual perception and enhanced reflective reasoning.