gpt-omni/mini-omni2
An open-source multimodal language model enabling real-time voice conversations with image, audio, and text understanding.

Velocity (7d): +3.2 ★/day · Trend: → steady
Mini-Omni2 is a foundation model that aims to reproduce GPT-4o-style omni-modal capabilities. It accepts image, audio, and text inputs and generates spoken responses end to end, with no separate ASR or TTS models required. The project ships model weights, inference code, and a chat demo for real-time conversation with interruption handling.