Is VITS-fast-fine-tuning open source?

Yes — Plachtaa/VITS-fast-fine-tuning is open source, released under the Apache-2.0 license.

What language is VITS-fast-fine-tuning written in?

Plachtaa/VITS-fast-fine-tuning is primarily written in Python.

How popular is VITS-fast-fine-tuning?

Plachtaa/VITS-fast-fine-tuning has 5k stars on GitHub.

Where can I find VITS-fast-fine-tuning?

Plachtaa/VITS-fast-fine-tuning is on GitHub at https://github.com/Plachtaa/VITS-fast-fine-tuning.

Plachtaa/VITS-fast-fine-tuning

Fast speaker adaptation for VITS, no research lab required

This repo exists so you can add any voice—your own, a game character's, or a Bilibili clip—to a pretrained trilingual VITS model for TTS and voice conversion without starting from scratch.

★ 5k stars · Python · Image · Video · Audio

What it does

VITS Fast Fine-tuning is a pipeline that grafts new speakers onto existing VITS text-to-speech models. You supply audio samples—anything from ten short clips to a Bilibili video link—and it fine-tunes the model so the new voice can speak English, Japanese, or Chinese, or be used to morph other modeled voices into that character.

The interesting bit

Instead of treating voice cloning as a weeks-long training project, this frames it as a quick adapter job on top of pretrained checkpoints. The many-to-many voice conversion is also a nice twist: any two speakers already inside the model can be cross-converted, not just mapped to a single target.
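To make the many-to-many idea concrete: if the model holds n speakers, there are n × (n − 1) ordered source-to-target pairs. The sketch below only enumerates those pairs. The speaker names and the `conversion_pairs` helper are hypothetical and are not part of the repository's code.

```python
from itertools import permutations

# Hypothetical speaker names, used only for illustration.
speakers = ["speaker_a", "speaker_b", "speaker_c"]


def conversion_pairs(names):
    """Return every ordered (source, target) pair of distinct speakers."""
    return list(permutations(names, 2))


pairs = conversion_pairs(speakers)
for source, target in pairs:
    print(f"{source} -> {target}")
print(len(pairs), "possible conversions")
```

With three speakers this lists six directions, since converting A to B is a different direction from converting B to A.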

Key highlights

Accepts voice samples from short audio, long recordings, local videos, or Bilibili links
Trilingual TTS output in English, Japanese, and Chinese using custom or preset characters
Many-to-many voice conversion between any two speakers baked into the model
Fine-tuning time ranges from 20 minutes to two hours depending on how many voices you upload
Provides both a Windows inference executable and Hugging Face demo spaces

Caveats

The bundled inference executable is Windows-only; other platforms must use the CLI
The README claims the full workflow takes under an hour, but also quotes fine-tuning alone at up to two hours
Voice conversion requires an external ffmpeg dependency and only works between speakers already inside the model
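
Because voice conversion depends on an external ffmpeg install, it can help to confirm that ffmpeg is reachable before starting. The following is a minimal sketch using only the Python standard library. The helper names `find_ffmpeg` and `require_ffmpeg` are assumptions made for illustration and do not come from the repository.

```python
import shutil
from typing import Optional


def find_ffmpeg(executable: str = "ffmpeg") -> Optional[str]:
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


def require_ffmpeg() -> str:
    """Return the ffmpeg path, or raise if it cannot be found on PATH."""
    path = find_ffmpeg()
    if path is None:
        raise RuntimeError(
            "ffmpeg was not found on PATH; install it before running voice conversion"
        )
    return path


if __name__ == "__main__":
    located = find_ffmpeg()
    if located:
        print(f"ffmpeg found at {located}")
    else:
        print("ffmpeg not found on PATH")
```

Calling `require_ffmpeg()` at the start of a script fails early with a clear message instead of partway through a conversion run.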

Verdict

Good for hobbyists or developers who want to prototype a custom trilingual TTS cast or play with character voice conversion. Look elsewhere if you need to convert arbitrary unseen voices or want a polished native GUI outside Windows.
