NVlabs/OmniVinci

OmniVinci is an NVIDIA research multimodal LLM that jointly processes vision, audio, and language inputs.

Velocity (7d): +2.6 ★/day · Trend: steady

OmniVinci is an omni-modal large language model designed to jointly understand vision, audio, and language inputs. The work is published at ICLR 2026, and the model is available on Hugging Face. The repository includes code, pretrained weights, and training pipelines for this multimodal foundation model.
