OpenMOSS/MOSS-TTSD
A multi-speaker text-to-speech model for expressive spoken dialogue generation with zero-shot voice cloning from short audio references.

Velocity (7d): +3.8 ★/day · Trend: steady
MOSS-TTSD is a spoken dialogue generation model for expressive multi-speaker synthesis. It offers long-context modeling, flexible speaker control, multilingual support, and zero-shot voice cloning from short audio references. Built with PyTorch, the model supports fine-tuning for long-form content creation such as podcasts, audiobooks, and entertainment scenarios.