QwenLM/Qwen2.5-Omni

A 7B-parameter end-to-end multimodal foundation model by Alibaba's Qwen team that processes text, images, audio, and video while generating both text and speech.

4k stars · Jupyter Notebook · Language Models · Image · Video · Audio
Velocity (7d): +9.1 ★ / day · Trend: steady

Qwen2.5-Omni is a flagship multimodal foundation model from Alibaba Cloud’s Qwen team. It processes text, image, audio, and video inputs end to end, and generates both streaming text responses and natural speech. It is reported to rank first among 7B-parameter multimodal models, and it is available on Hugging Face and ModelScope, with accompanying cookbooks and demos.