google/voice-builder

Google's TTS lab in a box, minus the official blessing

An unofficial Google project that wraps Festival and Merlin into a web UI so non-specialists can train synthetic voices on GCP.

687 stars · JavaScript · Image · Video · Audio · Data Tooling
Velocity (7d): +0.2 ★/day · Trend: steady

What it does

Voice Builder is a browser-based wrapper around two classic TTS engines, Festival and Merlin, that runs on Google Cloud Platform. You upload audio and text data, click a button, and wait 30–60 minutes for a deployable voice model, which you can then test by typing “hello” and pressing play. The whole stack (Docker, Firebase, App Engine, Cloud Functions, Genomics Pipeline API) deploys via shell scripts.

The interesting bit

The project treats voice building as a batch job rather than a research artifact. It abstracts away the usual Festival/Merlin incantations by standardizing inputs through a JSON VoiceBuildingSpecification (lexicon paths, phonology, wavs, engine params), then hands that spec either to the built-in engines or to a custom “data exporter” you can hook in to munge files first. That makes it feasible for linguists or language-preservation groups to iterate without becoming speech-processing hackers.
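
To make the spec-plus-exporter idea concrete, here is a minimal Python sketch. It is not the project's real schema or hook API: every field name, the bucket paths, and the `export_lexicon` and `missing_fields` helpers are hypothetical, chosen only to mirror the four input groups named above (lexicon, phonology, wavs, engine params).

```python
import json

# Hypothetical spec. Field names and paths are illustrative only and are
# NOT taken from Voice Builder's actual VoiceBuildingSpecification.
SPEC = {
    "voice_name": "si_demo",
    "engine": "festival",  # the source names two engines: Festival and Merlin
    "lexicon_path": "gs://example-bucket/si/lexicon.tsv",
    "phonology_path": "gs://example-bucket/si/phonology.json",
    "wav_path": "gs://example-bucket/si/wavs/",
    "engine_params": {"sample_rate": 22050},
}

# Hypothetical list of keys a build would need before it can start.
REQUIRED_FIELDS = ("engine", "lexicon_path", "phonology_path", "wav_path")


def missing_fields(spec):
    """Return the required keys that are absent from the spec."""
    return [key for key in REQUIRED_FIELDS if key not in spec]


def export_lexicon(lines):
    """Hypothetical data-exporter step: drop malformed lexicon rows.

    A row is kept only if it has exactly two tab-separated columns
    (word, pronunciation) and both are non-empty after trimming.
    """
    cleaned = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue  # wrong column count
        word, pronunciation = parts[0].strip(), parts[1].strip()
        if word and pronunciation:
            cleaned.append(f"{word}\t{pronunciation}")
    return cleaned


if __name__ == "__main__":
    print(json.dumps(SPEC, indent=2))
    print(missing_fields(SPEC))
    print(export_lexicon(["hello\th @ l oU\n", "broken row", "\tx"]))
```

The point of the sketch is the separation of concerns: the spec only describes where inputs live, while the exporter is an ordinary function that cleans data before an engine ever reads it.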

Key highlights

  • Ships with pre-loaded public data from Google’s language-resources repo, including a Sinhala example
  • Custom data exporter hook lets you transform lexicons or filter bad data before the TTS engine sees it
  • All job artifacts land in GCS buckets; the UI polls job status until deployment
  • Explicitly not an official Google product; the disclaimer sits right at the top
  • Published research backing at ai.google/research/pubs/pub46977
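
The status polling mentioned in the highlights can be pictured with a small Python sketch. This is not code from the project: the status strings, the `poll_until_done` helper, and the `get_status` callback are all hypothetical, and the two-hour default timeout is simply an assumed margin over the 30–60 minute build time quoted above.

```python
import time

# Hypothetical terminal states; the real UI's status names are not given here.
TERMINAL_STATES = {"DEPLOYED", "FAILED"}


def poll_until_done(get_status, interval_s=30.0, timeout_s=2 * 60 * 60,
                    sleep=time.sleep, clock=time.monotonic):
    """Call get_status() repeatedly until it returns a terminal state.

    get_status is any zero-argument callable returning a status string,
    for example a function that reads a job record from a datastore.
    Raises TimeoutError if no terminal state is seen within timeout_s.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still in state {status!r} after {timeout_s}s")
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated job that moves through three states without real waiting.
    states = iter(["QUEUED", "RUNNING", "DEPLOYED"])
    print(poll_until_done(lambda: next(states), sleep=lambda _seconds: None))
```

Passing `sleep` and `clock` as parameters keeps the loop testable without waiting in real time.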

Caveats

  • Deployment is a nine-step prerequisite slog across GCP, Firebase, gcloud, and Docker; one typo in deploy.sh and you’re debugging IAM roles
  • The Genomics Pipeline API dependency is a curious choice for TTS training and may date the architecture
  • No screenshots provided in the repo, so you’re flying blind on UI polish

Verdict

Worth a look if you’re building voices for low-resource languages and need a shared web interface for non-technical collaborators. Skip it if you want modern neural TTS or a local, dependency-light setup.
