
Plachtaa/VITS-fast-fine-tuning

A pipeline for fine-tuning VITS text-to-speech models to clone voices and perform cross-lingual voice conversion.

5k stars · Python · Image · Video · Audio
Velocity (7d): +4.1 ★/day · Trend: steady

This repository provides tools for rapid speaker adaptation of VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) models, enabling voice cloning from short audio clips, long recordings, or videos in under an hour of fine-tuning. It supports English, Japanese, and Chinese TTS synthesis, as well as many-to-many voice conversion between newly added characters and the model's preset speakers. Users can fine-tune locally or on Google Colab to create custom voice profiles.
