Is SoulX-Podcast open source?

Yes. Soul-AILab/SoulX-Podcast is open source and released under the Apache-2.0 license.

What language is SoulX-Podcast written in?

Soul-AILab/SoulX-Podcast is primarily written in Python.

How popular is SoulX-Podcast?

Soul-AILab/SoulX-Podcast has 3.5k stars on GitHub.

Where can I find SoulX-Podcast?

Soul-AILab/SoulX-Podcast is on GitHub at https://github.com/Soul-AILab/SoulX-Podcast.

Soul-AILab/SoulX-Podcast

An AI podcast generator that coughs, laughs, and speaks Henanese

SoulX-Podcast generates multi-speaker, multi-turn dialogic speech with paralinguistic events (laughter, coughing, sighs) and cross-dialectal voice cloning for realistic long-form podcasts.

3.5k stars · Python · Image · Video · Audio

What it does

SoulX-Podcast is an inference framework that synthesizes long-form, multi-speaker podcast dialogue from text. It handles both conversational multi-turn speech and conventional monologue TTS, supporting Mandarin, English, and several Chinese dialects including Cantonese, Sichuanese, and Henanese. The system also accepts paralinguistic tags—such as <|laughter|>, <|sigh|>, and <|coughing|>—to inject non-verbal sounds into generated audio.
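To make the idea of tagged, multi-speaker input concrete, here is a minimal sketch of assembling a dialogue script before synthesis. The `[S1]`/`[S2]` speaker labels, the `Turn` class, and `build_script` are hypothetical illustrations, not SoulX-Podcast's documented input format or API; the repository's own example scripts define the real layout. Only the three tags named above are whitelisted.

```python
import re
from dataclasses import dataclass

# Only the tags named in this description; the repository may accept more
# (e.g. breathing, throat clearing) under spellings not listed here.
KNOWN_TAGS = {"<|laughter|>", "<|sigh|>", "<|coughing|>"}
TAG_PATTERN = re.compile(r"<\|[a-z_]+\|>")


@dataclass
class Turn:
    speaker: int  # 1-based speaker index
    text: str     # utterance text, may contain inline paralinguistic tags


def build_script(turns: list[Turn]) -> str:
    """Render turns as one string using HYPOTHETICAL [S<n>] speaker labels."""
    lines = []
    for turn in turns:
        # Reject tags outside the known set so typos are caught early.
        unknown = set(TAG_PATTERN.findall(turn.text)) - KNOWN_TAGS
        if unknown:
            raise ValueError(f"unsupported tags: {sorted(unknown)}")
        lines.append(f"[S{turn.speaker}]{turn.text}")
    return "\n".join(lines)


script = build_script([
    Turn(1, "Welcome back to the show! <|laughter|>"),
    Turn(2, "<|sigh|> It has been a long week."),
])
print(script)
```

Validating tags up front is a choice made for the sketch: a typo such as `<|laugh|>` is caught before synthesis instead of being passed through unnoticed.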

The interesting bit

The unusual part is the cross-dialectal zero-shot voice cloning: you can prompt the model with a Mandarin voice sample and have it generate speech in Cantonese or Sichuanese while preserving the speaker’s characteristics. It also treats paralinguistic events like laughter and coughing as first-class tokens rather than post-processing afterthoughts, which matters more for naturalistic dialogue than for sterile read-aloud TTS.
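What "first-class token" means at the text level can be shown with a toy tokenizer that keeps each `<|...|>` tag whole instead of breaking it into punctuation and a word. This is an illustration of the general idea only, not SoulX-Podcast's tokenizer; the regexes and the `tokenize` function are hypothetical.

```python
import re

# Toy tokenizer for illustration only; NOT SoulX-Podcast's tokenizer.
# The tag alternative is tried first, so "<|sigh|>" becomes one token.
WITH_TAGS = re.compile(r"<\|[a-z_]+\|>|\w+|[^\w\s]")
# Same word/punctuation rules without the tag alternative, for comparison.
PLAIN = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str, pattern: re.Pattern = WITH_TAGS) -> list[str]:
    """Split text into word, punctuation, and (optionally) tag tokens."""
    return pattern.findall(text)


print(tokenize("Well <|sigh|> fine."))         # tag kept as one token
print(tokenize("Well <|sigh|> fine.", PLAIN))  # tag split into pieces
```

With the tag alternative the event survives as a single unit that a model can associate with a specific sound; without it, the event dissolves into `<`, `|`, `sigh`, `|`, `>` and is indistinguishable from ordinary text. In a real model such a tag would typically map to a reserved vocabulary entry.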

Key highlights

Multi-turn, multi-speaker dialogic generation for podcast-style conversations
Zero-shot voice cloning across Chinese dialects (Mandarin, Cantonese, Sichuanese, Henanese)
Paralinguistic control via special tags: laughter, sighing, breathing, coughing, throat clearing
1.7B-parameter models available on Hugging Face, with WebUI and Docker/vLLM support
Apache 2.0 licensed, though the authors include a strict usage disclaimer against impersonation and deepfakes

Caveats

Streaming inference is marked as a pending TODO; current generation is offline only
The repository provides inference scripts and pretrained weights, not training code or datasets

Verdict

Worth a look if you need realistic multi-speaker Chinese podcast synthesis with regional dialect support or want to experiment with controllable paralinguistic events in speech generation. Skip it if you need real-time streaming TTS or are looking for a fully general-purpose voice-cloning toolkit beyond the supported language set.
