
fixie-ai/ultravox

A multimodal LLM that extends open-weight models (Llama, Mistral, Gemma) with a projector enabling direct audio understanding for real-time voice AI.

Velocity (7d): +6.0 ★/day · Trend: steady

Ultravox is a speech-capable multimodal LLM that processes audio directly, without a separate ASR stage: a projector maps audio into the high-dimensional embedding space used by the underlying language model. Because the language model consumes audio without an intermediate transcript, it can respond faster than cascaded ASR + LLM systems. The model builds on research from AudioLM, SeamlessM4T, and similar work, and versions have been trained on Llama 3, Mistral, and Gemma backbones.
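The projector idea can be illustrated with a minimal sketch. This is not Ultravox's actual code: every name here (`make_projector`, `stack_frames`, `project`, `build_llm_input`) is hypothetical, the weights are random rather than trained, the toy dimensions are arbitrary, and the frame-stacking step is an assumption about how audio frames might be shortened before projection. It only shows the general shape of the technique: audio encoder frames are linearly mapped into the language model's embedding width and placed in the same sequence as text embeddings.

```python
import random

def make_projector(in_dim, out_dim, seed=0):
    """Random (untrained) weight matrix of shape out_dim x in_dim. Hypothetical."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]

def stack_frames(frames, factor):
    """Concatenate every `factor` consecutive frames, dropping a ragged tail.

    Assumption: stacking shortens the audio sequence before projection.
    """
    usable = len(frames) - len(frames) % factor
    return [sum(frames[i:i + factor], []) for i in range(0, usable, factor)]

def project(frames, weights):
    """Apply a linear map to each frame: one output value per weight row."""
    return [[sum(w * x for w, x in zip(row, frame)) for row in weights]
            for frame in frames]

def build_llm_input(text_embeddings, audio_frames, weights, stack_factor):
    """Append projected audio embeddings to text embeddings as one sequence."""
    audio_embeddings = project(stack_frames(audio_frames, stack_factor), weights)
    return text_embeddings + audio_embeddings

# Toy sizes, chosen only for illustration.
AUDIO_DIM, STACK, LLM_DIM = 4, 2, 6
weights = make_projector(AUDIO_DIM * STACK, LLM_DIM)
audio_frames = [[0.1 * (i + j) for j in range(AUDIO_DIM)] for i in range(5)]
text_embeddings = [[0.0] * LLM_DIM for _ in range(3)]

sequence = build_llm_input(text_embeddings, audio_frames, weights, STACK)
print(len(sequence), len(sequence[0]))
```

In this sketch the language model would receive audio and text as one sequence of equal-width vectors, which is why no intermediate transcript is needed.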
