NVlabs/OmniVinci
OmniVinci is an NVIDIA research multimodal LLM that jointly processes vision, audio, and language inputs.

Velocity (7d): +2.6 ★/day · Trend: steady
OmniVinci is an omni-modal large language model designed to jointly understand vision, audio, and language inputs. The work is published at ICLR 2026, and the model is available on Hugging Face. The repository includes code, pretrained weights, and training pipelines for this multimodal foundation model.