NVlabs/VILA

VILA is a family of open vision language models optimized for video and multi-image understanding tasks.

Velocity (7d): +4.6 ★ / day
Trend: steady

VILA provides a suite of vision language models designed for efficient multimodal AI across edge, data center, and cloud deployments. The project includes models for video understanding, high-resolution image processing, and long-context video analysis. Recent releases cover OmniVinci for joint visual-audio understanding, LongVILA for million-token context windows, and NVILA for full-stack efficiency optimization of multimodal model design.
