ATH-MaaS/Ovis
Ovis is a multimodal LLM architecture that structurally aligns visual and textual embeddings across multiple model sizes (1B–34B).

Ovis is a multimodal large language model architecture designed to structurally align visual and textual embeddings. It supports vision-language understanding, reasoning, chart analysis, video comprehension, and multilingual OCR. The project provides multiple model sizes (1B to 34B parameters) across versions including Ovis2 and the newer Ovis2.5, which adds native-resolution visual perception and enhanced reflective reasoning.
Frequently asked questions
- What is ATH-MaaS/Ovis?
- Ovis is a multimodal LLM architecture that structurally aligns visual and textual embeddings across multiple model sizes (1B–34B).
- Is Ovis open source?
- Yes. ATH-MaaS/Ovis is open source, released under the Apache-2.0 license.
- What language is Ovis written in?
- ATH-MaaS/Ovis is primarily written in Python.
- How popular is Ovis?
- ATH-MaaS/Ovis has 1.5k stars on GitHub, and its star count is currently holding steady.
- Where can I find Ovis?
- ATH-MaaS/Ovis is on GitHub at https://github.com/ATH-MaaS/Ovis.