gpt-omni/mini-omni

An open-source multimodal LLM that enables real-time speech-to-speech conversation with streaming audio output and concurrent text/audio generation.

Velocity (7d): +5.5 ★ / day · Trend: steady

Mini-Omni is a multimodal large language model designed for real-time voice conversation. It processes speech input directly, without separate ASR or TTS models, which makes the interaction end-to-end speech-to-speech. The model generates text and audio at the same time, speaking while it is still reasoning, and streams its audio output so that conversation feels natural.
