OpenMOSS/MOSS-TTSD

A multi-speaker text-to-speech model for expressive spoken dialogue generation with zero-shot voice cloning from short audio references.

1.3k stars · Python · Image · Video · Audio
Velocity (7d): +3.8 ★/day · Trend: steady

MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis. It features long-context modeling, flexible speaker control, and multilingual support, while enabling zero-shot voice cloning from short audio references. The model is built with PyTorch and supports fine-tuning for real-world long-form content creation including podcasts, audiobooks, and entertainment scenarios.