
PlayVoice/vits_chinese

A Chinese text-to-speech system built on VITS, using BERT embeddings for natural prosody and techniques from Microsoft's NaturalSpeech to reduce sound errors.

1.2k stars Python Image · Video · Audio
Velocity (7d): +0.7 ★/day · Trend: steady

This project implements a text-to-speech system based on the VITS architecture (Variational Inference with adversarial learning for end-to-end Text-to-Speech). It uses BERT to extract hidden prosody embeddings so that pauses fall at natural grammatical boundaries, incorporates the inference loss from Microsoft's NaturalSpeech to reduce sound errors, and supports ONNX streaming inference for deployment.
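As a rough illustration of the prosody idea, the sketch below shows one plausible way to pair character-level embeddings with a phoneme sequence: each character's vector is repeated once for every phoneme that character produces. The function name `expand_to_phonemes`, the toy vectors, and the phoneme counts are hypothetical and are not taken from the repository; real BERT hidden states would be much wider and would come from a pretrained model.

```python
from typing import List, Sequence


def expand_to_phonemes(
    char_embeds: Sequence[Sequence[float]],
    phones_per_char: Sequence[int],
) -> List[List[float]]:
    """Repeat each character-level vector once per phoneme of that character.

    Hypothetical helper for illustration only; not the repository's API.
    """
    if len(char_embeds) != len(phones_per_char):
        raise ValueError("need exactly one phoneme count per character")

    expanded: List[List[float]] = []
    for vec, count in zip(char_embeds, phones_per_char):
        for _ in range(count):
            # Copy so each phoneme position owns its own list.
            expanded.append(list(vec))
    return expanded


# Toy input: three characters with 2-dimensional stand-in "prosody" vectors.
char_embeds = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
# Assumed phoneme counts per character (e.g. initial + final, or final only).
phones_per_char = [2, 2, 1]

phone_embeds = expand_to_phonemes(char_embeds, phones_per_char)
print(len(phone_embeds))  # one vector per phoneme position
```

The expanded sequence has the same length as the phoneme sequence, so it could be added to or concatenated with phoneme embeddings before a text encoder. Whether the project combines them this way is an assumption here.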
