
QwenLM/Qwen3-Omni

A natively end-to-end multilingual omni-modal foundation model that processes text, audio, images, and video while generating real-time text and speech responses.

3.8k stars · Jupyter Notebook · Language Models · Image · Video · Audio
Velocity (7d): +15 ★ / day · Trend: steady

Qwen3-Omni is a foundation model that handles multiple modalities in a single unified architecture. It takes text, images, audio, and video as input and generates both text and natural speech in real time. The model takes an end-to-end approach to multimodal understanding and generation. Alibaba Cloud’s Qwen team released it with model weights, demos, and cookbooks.
