
stepfun-ai/Step-Audio-EditX

A 3B-parameter LLM-based audio editing model that controls emotion, speaking style, and paralinguistics via reinforcement learning.

926 stars Python Image · Video · Audio
Velocity (7d): +4.2 ★ / day · Trend: steady

Step-Audio-EditX is a large language model for audio editing and synthesis. It edits emotion, speaking style, and paralinguistic features in existing audio, and it also supports zero-shot text-to-speech. Training combines supervised fine-tuning (SFT) with preference-optimization and reinforcement learning methods (DPO and GRPO). The model works across languages, including English, Japanese, and Korean, and can be served with vLLM for efficient inference.
