stepfun-ai/Step-Audio2
Step-Audio 2 is an open-source multi-modal LLM for audio understanding and speech conversation.

Velocity (7d): +4.5 ★/day · Trend: steady
Step-Audio 2 is an end-to-end multi-modal large language model designed for audio understanding and speech conversation. Its open-source model weights are available on Hugging Face in several versions, including mini, mini-Base, and mini-Think. The project provides vLLM backend integration and example inference scripts, and targets industry-strength speech AI applications.