
jasonppy/VoiceCraft

A token infilling neural codec language model for zero-shot speech editing and text-to-speech on in-the-wild audio.

8.5k stars · Jupyter Notebook · Image · Video · Audio · Inference · Serving
Star velocity (7 days): +10 ★/day · Trend: steady

VoiceCraft is a generative speech model that performs zero-shot text-to-speech and speech editing from only a few seconds of reference audio. It is a neural codec language model built on token infilling, and its authors report state-of-the-art performance on diverse audio sources, including audiobooks, internet videos, and podcasts. The repository offers several inference options: a Gradio UI, Google Colab notebooks, Docker, and command-line scripts for integration.
