gpt-omni/mini-omni
An open-source multimodal LLM that enables real-time speech-to-speech conversation with streaming audio output and concurrent text/audio generation.

Velocity (7d): +5.5 ★/day · Trend: steady
Mini-Omni is a multimodal large language model designed for real-time voice conversation. It processes speech input directly, without separate ASR or TTS models, for end-to-end speech-to-speech interaction. The model generates text and audio in parallel, so it can speak while it is still producing its response, and it streams audio output so that replies can begin playing before generation finishes.
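
The idea of generating text and audio together and streaming the audio can be illustrated with a toy sketch. This is not Mini-Omni's code or API: `fake_model_steps` and `stream_reply` are hypothetical names, and the "audio" is just placeholder bytes. The sketch only shows the consumption pattern, where each generation step yields a text token and an audio chunk, and the chunk can be handed to playback immediately instead of waiting for the full reply.

```python
from typing import Iterator, List, Tuple


def fake_model_steps(reply: str) -> Iterator[Tuple[str, bytes]]:
    """Hypothetical stand-in for a speech model.

    Each step yields one text token and one audio chunk together.
    The audio here is placeholder silence (4 bytes per character);
    a real model would produce decodable audio instead.
    """
    for word in reply.split():
        yield word, b"\x00" * (4 * len(word))


def stream_reply(reply: str) -> Tuple[str, List[bytes]]:
    """Consume the stream step by step, as a client application might."""
    text_tokens: List[str] = []
    played_chunks: List[bytes] = []
    for token, chunk in fake_model_steps(reply):
        text_tokens.append(token)
        # In a real application the chunk would go to the audio device
        # here, before the remaining steps have been generated.
        played_chunks.append(chunk)
    return " ".join(text_tokens), played_chunks


if __name__ == "__main__":
    text, chunks = stream_reply("hello there world")
    print(text, len(chunks))
```

Because the consumer is driven by a generator, playback of the first chunk does not depend on later steps, which is the property that streaming output relies on.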