PlayVoice/whisper-vits-svc
A deep learning model for end-to-end singing voice conversion using VITS (Variational Inference with adversarial learning).

This project implements a singing voice conversion model based on the VITS architecture, which combines variational inference with adversarial learning. It converts one singer’s voice into another speaker’s voice and supports multiple speakers, speaker mixing, and basic F0 (pitch) editing. Training requires at least 6 GB of VRAM, and the model can also process audio that contains light accompaniment.
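Speaker mixing and F0 editing are listed as features without further detail. The following is a minimal sketch of what these two operations commonly amount to. It is not the project's code, and it rests on two assumptions: that speakers are represented as fixed-length embedding vectors blended by a normalized weighted average, and that F0 editing means transposing a pitch contour in Hz by some number of semitones, with 0 marking unvoiced frames. The helpers `mix_speakers` and `shift_f0` are hypothetical names used only for illustration.

```python
from typing import Sequence


def mix_speakers(embeddings: Sequence[Sequence[float]], weights: Sequence[float]) -> list[float]:
    """Hypothetical helper: blend speaker embeddings as a normalized weighted average."""
    if not embeddings or len(embeddings) != len(weights):
        raise ValueError("need exactly one weight per embedding")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(embeddings[0])
    mixed = [0.0] * dim
    for emb, w in zip(embeddings, weights):
        if len(emb) != dim:
            raise ValueError("all embeddings must have the same dimension")
        # Each speaker contributes in proportion to its normalized weight.
        for i, x in enumerate(emb):
            mixed[i] += (w / total) * x
    return mixed


def shift_f0(f0: Sequence[float], semitones: float) -> list[float]:
    """Hypothetical helper: transpose an F0 contour (Hz) by a number of semitones.

    Frames with F0 <= 0 are treated as unvoiced and left at 0.
    """
    ratio = 2.0 ** (semitones / 12.0)  # 12 semitones = one octave = factor of 2
    return [f * ratio if f > 0 else 0.0 for f in f0]


if __name__ == "__main__":
    print(mix_speakers([[1.0, 0.0], [0.0, 1.0]], [0.75, 0.25]))
    print(shift_f0([220.0, 0.0, 330.0], 12))
```

Normalizing the weights keeps the mixed embedding on the same scale as the inputs regardless of how the weights are specified. Shifting by a semitone ratio rather than a fixed number of Hz preserves musical intervals across the whole contour.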