fishaudio/Bert-VITS2
Bert-VITS2 is a multilingual text-to-speech model that pairs a VITS2 backbone with multilingual BERT embeddings to improve speech synthesis.

Velocity (7d): +8.4 ★/day · Trend: steady
The project uses VITS2, an end-to-end neural TTS model (text to waveform, not a standalone vocoder), as its backbone and conditions it on multilingual BERT embeddings to improve prosody and pronunciation accuracy. Users preprocess training data with webui_preprocess.py, and the BERT embeddings allow the system to support multiple languages. The architecture builds on prior work from MassTTS and jaywalnut310/vits.
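One practical detail when feeding BERT embeddings to a phoneme-based TTS model is that BERT produces one vector per text token, while the synthesizer consumes one input per phoneme. A minimal sketch of one common way to bridge that gap is shown here: each token vector is repeated once for every phoneme the token maps to. The function name `expand_to_phonemes`, the `phones_per_token` list, and the toy vectors are hypothetical illustrations, not the project's actual API, and the real code may align features differently.

```python
from typing import List, Sequence


def expand_to_phonemes(
    token_features: Sequence[Sequence[float]],
    phones_per_token: Sequence[int],
) -> List[List[float]]:
    """Repeat each token-level vector once per phoneme of that token.

    token_features:   one embedding vector per text token (assumed to come from BERT).
    phones_per_token: how many phonemes each token expands to (hypothetical input).
    """
    if len(token_features) != len(phones_per_token):
        raise ValueError("need exactly one phoneme count per token")

    phone_features: List[List[float]] = []
    for vector, count in zip(token_features, phones_per_token):
        if count < 0:
            raise ValueError("phoneme counts must be non-negative")
        # Copy the vector so later in-place edits do not alias across phonemes.
        for _ in range(count):
            phone_features.append(list(vector))
    return phone_features


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings" for three tokens (illustrative values only).
    toy_features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
    toy_counts = [2, 1, 3]
    aligned = expand_to_phonemes(toy_features, toy_counts)
    print(len(aligned))  # one row per phoneme
```

The output has one row per phoneme, so it can be concatenated with or added to the phoneme embeddings along the sequence axis. In a tensor library the same operation is usually a single repeat call rather than a Python loop.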