A voice cloner that speaks Mandarin, with baggage
Real-time TTS voice cloning forked for Chinese, but the repo is now in maintenance mode while its author builds a commercial successor.

What it does
MockingBird is a PyTorch implementation of SV2TTS (speaker verification to multispeaker text-to-speech) that lets you clone a voice from a short audio sample and generate new speech in that voice. It’s explicitly forked from CorentinJ’s Real-Time-Voice-Cloning project, with the key addition of Mandarin Chinese support through multiple datasets (aidatatang_200zh, magicdata, aishell3, data_aishell). You can run it via web server, desktop toolbox, or command line.
The interesting bit
The project reuses pretrained encoder and vocoder models while requiring you to train or download a new synthesizer specifically for Chinese symbols — a pragmatic split that saves compute but adds friction. The M1 Mac setup instructions are endearingly elaborate, involving Rosetta terminals, manual C header paths, and a custom pythonM1 wrapper script.
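To make that split concrete, here is a minimal sketch of a three-stage SV2TTS-style pipeline. Every name in it (encode_speaker, Synthesizer, vocode, clone, and both symbol lists) is a hypothetical stand-in rather than MockingBird's actual API, and the toy "Chinese" symbol set (letters plus tone digits) is an assumption chosen only for illustration. The point it tries to show is that the encoder and vocoder never touch text, so only the synthesizer is tied to a symbol set.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical symbol sets, not the ones shipped with the repo.
EN_SYMBOLS = list("abcdefghijklmnopqrstuvwxyz ")
ZH_SYMBOLS = EN_SYMBOLS + list("12345")  # assumed: letters plus tone digits


def encode_speaker(waveform: Sequence[float]) -> List[float]:
    """Stand-in speaker encoder: audio in, fixed-size embedding out.
    It never sees text, so it can be reused across languages."""
    n = max(len(waveform), 1)
    mean = sum(waveform) / n
    energy = sum(x * x for x in waveform) / n
    return [mean, energy]


@dataclass
class Synthesizer:
    """Stand-in synthesizer: the only stage bound to a text symbol set."""
    symbols: List[str]

    def text_to_ids(self, text: str) -> List[int]:
        index = {s: i for i, s in enumerate(self.symbols)}
        unknown = sorted({ch for ch in text if ch not in index})
        if unknown:
            raise ValueError(f"symbols not in this model's vocabulary: {unknown}")
        return [index[ch] for ch in text]

    def synthesize(self, text: str, embedding: List[float]) -> List[List[float]]:
        # One fake "mel frame" per symbol, conditioned on the speaker embedding.
        return [[float(i)] + embedding for i in self.text_to_ids(text)]


def vocode(mel: List[List[float]]) -> List[float]:
    """Stand-in vocoder: mel frames in, waveform samples out; text-agnostic."""
    return [sum(frame) for frame in mel]


def clone(reference_audio: Sequence[float], text: str,
          synthesizer: Synthesizer) -> List[float]:
    embedding = encode_speaker(reference_audio)    # reusable stage
    mel = synthesizer.synthesize(text, embedding)  # language-specific stage
    return vocode(mel)                             # reusable stage
```

Calling clone([0.1, -0.2, 0.3], "ni3 hao3", Synthesizer(ZH_SYMBOLS)) runs end to end, while the same call with Synthesizer(EN_SYMBOLS) raises ValueError because the tone digit is outside that vocabulary. That is the toy analogue of why a new synthesizer is needed while the other two stages carry over.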
Key highlights
- Supports Mandarin out of the box, unlike the original English-only upstream
- Community-shared pretrained synthesizer models available (Baidu Pan, Aliyun Drive)
- Web server mode for remote API-style usage
- Multiple vocoder options: WaveRNN, HiFi-GAN, and Fre-GAN
- Tested on Tesla T4 and GTX 2060; Windows, Linux, and M1 macOS supported
Caveats
- Author no longer actively updates the repo; development has moved to commercial product noiz.ai
- requirements.txt is pinned to the August 2021 PyTorch release (1.9.0, CUDA 10.2) and breaks with newer stacks
- demo_cli is non-functional; you must obtain or train a Chinese-compatible synthesizer model
- Several community models only work with repo tag 0.0.1
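Given the dated pin, it can help to check the environment before installing anything else. The following is a minimal sketch, not part of the repo: matches_pin and installed_version are hypothetical helpers, and comparing only major.minor is an assumption about what counts as "close enough" to the 1.9.0 pin mentioned above.

```python
from importlib import metadata
from typing import Optional, Tuple

PINNED_TORCH = "1.9.0"  # version named in the caveat above


def major_minor(version: str) -> Tuple[str, ...]:
    """Return (major, minor), ignoring local build tags such as '+cu102'."""
    core = version.split("+", 1)[0]
    return tuple(core.split(".")[:2])


def matches_pin(installed: str, pinned: str) -> bool:
    """True when the installed major.minor equals the pinned major.minor."""
    return major_minor(installed) == major_minor(pinned)


def installed_version(package: str) -> Optional[str]:
    """Installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("torch")
    if found is None:
        print("torch is not installed")
    elif matches_pin(found, PINNED_TORCH):
        print(f"torch {found} matches the {PINNED_TORCH} pin")
    else:
        print(f"torch {found} differs from the {PINNED_TORCH} pin; expect breakage")
```

Running this inside a fresh virtual environment before installing the rest of the requirements gives an early warning that the stack has drifted from what the repo was tested against.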
Verdict
Worth a look if you specifically need open-source Mandarin voice cloning and can tolerate dated dependencies. Skip it if you want maintained code or a polished English experience; the original upstream or newer alternatives are likely smoother.