andabi/deep-voice-conversion

Voice style transfer without the parallel-data headache

It converts any voice into a specific target voice using only unpaired waveforms, skipping the painstaking matched recordings voice cloning usually demands.

★ 3.9k stars · Python · Image · Video · Audio

What it does

This TensorFlow project performs many-to-one voice conversion: it turns arbitrary speech into a single target voice, most notably demonstrated by making inputs sound like Kate Winslet. It sidesteps the usual voice-cloning prerequisite of parallel corpora, so no matched source-target recordings, text transcripts, or phoneme alignments for the target are required. You need only a collection of the target speaker’s untranscribed recordings, plus a phoneme-labeled dataset from other, anonymous speakers to train the phoneme recognizer.

The interesting bit

The system decouples content from identity using two stacked networks. Net1 extracts speaker-independent phoneme posteriors from spectrograms, while Net2 maps those posteriors back to spectrograms in the target speaker’s voice. The model borrows CBHG modules from Tacotron to handle the frame sequence, and because training never compares a source utterance with a target utterance, the two speakers never have to say the same sentence.
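
The two-stage data flow described above can be sketched in a few lines of plain Python. This is only an illustration of the interface between the stages, not the repository's code: the function names, the toy feature sizes, and the random linear weights are all assumptions, and the real networks are sequence models built from CBHG blocks rather than per-frame linear maps. The point it shows is that Net2 receives nothing but phoneme posteriors, so speaker-specific detail in the source frames has no direct path to the output.

```python
import math
import random

N_PHONEMES = 60   # phoneme classes, as stated in the highlights on this page
N_FEATS = 4       # toy input feature size (assumption; real features are larger)
N_BINS = 5        # toy output spectrogram bins (assumption)


def softmax(logits):
    """Numerically stable softmax over one frame's logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(weights, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for trained weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def net1(frames, w1):
    """Stand-in for Net1: acoustic frames -> per-frame phoneme posteriors."""
    return [softmax(matvec(w1, frame)) for frame in frames]


def net2(posteriors, w2):
    """Stand-in for Net2: phoneme posteriors -> target-voice spectrogram frames."""
    return [matvec(w2, post) for post in posteriors]


def convert(frames, w1, w2):
    """Net2 sees only the posteriors, never the source frames themselves."""
    return net2(net1(frames, w1), w2)


rng = random.Random(0)
w1 = random_matrix(N_PHONEMES, N_FEATS, rng)
w2 = random_matrix(N_BINS, N_PHONEMES, rng)
source = [[rng.uniform(-1.0, 1.0) for _ in range(N_FEATS)] for _ in range(3)]

posteriors = net1(source, w1)       # 3 frames x 60 phoneme probabilities
converted = convert(source, w1, w2)  # 3 frames x 5 spectrogram bins
```

In this arrangement only the second stage needs data from the target speaker, which is why unpaired recordings are enough.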

Key highlights

Trains without parallel data—only target-speaker waveforms and anonymous <wav, phone> pairs are needed.
Net1 classifies each spectrogram frame into one of 60 English phonemes; trained on the TIMIT dataset, it achieves over 70% test accuracy.
Net2 synthesizes target spectrograms and reconstructs audio via Griffin-Lim.
Demonstrated on the public Arctic dataset and a private two-hour Kate Winslet audiobook corpus.
Uses CBHG blocks (1-D convolution bank + highway network + bidirectional GRU) adapted from Tacotron.
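
Of the three CBHG parts named in the last highlight, the highway network is the easiest to show in isolation. The sketch below is a generic highway layer in plain Python, not code taken from the repository; the helper names are hypothetical, and the choice of ReLU for the transform branch is an assumption. Each output element is a gated mix of a transformed value and the untouched input: y = T * H(x) + (1 - T) * x.

```python
import math


def sigmoid(x):
    """Logistic function, used for the transform gate."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(weights, bias, vec):
    """Affine map: one output per weight row, plus that row's bias."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def highway(x, w_h, b_h, w_t, b_t):
    """One highway layer: y = T * H(x) + (1 - T) * x, elementwise."""
    h = [max(0.0, v) for v in dense(w_h, b_h, x)]   # candidate transform (ReLU)
    t = [sigmoid(v) for v in dense(w_t, b_t, x)]    # gate values in (0, 1)
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


x = [1.0, -2.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]

# A strongly negative gate bias closes the gate, so the input passes through.
carried = highway(x, identity, [0.0, 0.0], zeros, [-30.0, -30.0])

# A strongly positive gate bias opens the gate, so the output is ReLU(x).
transformed = highway(x, identity, [0.0, 0.0], zeros, [30.0, 30.0])
```

The gate lets the layer fall back to an identity mapping where a transformation does not help, which is the usual motivation for stacking several such layers.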

Caveats

The codebase targets Python 2.7 and TensorFlow ≥ 1.1, so expect a retro dependency stack.
The Kate Winslet dataset is private; reproducing that exact demo requires sourcing your own target audio.
Net1’s phoneme accuracy is modest (over 70%), though the authors note that Net2 still performs well when the classifier is only partly accurate.

Verdict

A solid reference if you’re studying many-to-one voice conversion or want to see how phoneme posteriors can replace painstaking paired data. Avoid it if you need a modern, production-ready voice pipeline—this is a research artifact frozen in a 2017 toolchain.

Frequently asked

What is andabi/deep-voice-conversion?: It converts any voice into a specific target voice using only unpaired waveforms, skipping the painstaking matched recordings voice cloning usually demands.
Is deep-voice-conversion open source?: Yes — andabi/deep-voice-conversion is open source, released under the MIT license.
What language is deep-voice-conversion written in?: andabi/deep-voice-conversion is primarily written in Python.
How popular is deep-voice-conversion?: andabi/deep-voice-conversion has 3.9k stars on GitHub.
Where can I find deep-voice-conversion?: andabi/deep-voice-conversion is on GitHub at https://github.com/andabi/deep-voice-conversion.