Speech recognition's missing dictionary generator
Trains models that guess how words sound, because you can't ship a pronunciation dictionary for every proper noun the user will invent.

What it does
Phonetisaurus builds grapheme-to-phoneme (G2P) models: feed it a dictionary of known word-to-pronunciation mappings, and it learns to generate pronunciations for words it has never seen. The output is a weighted finite-state transducer (WFST) in OpenFst format, the same representation used by Kaldi and other speech toolkits. It ships as C++ binaries with optional Python 3 bindings for extracting scores, alignments, and raw lattices.
The interesting bit
The project treats G2P as a joint n-gram modeling problem over aligned grapheme-phoneme sequences, then compiles the result into an FST. This is the old-school, pre-neural approach: fast, compact, and interpretable, with a lineage tracing back to INTERSPEECH papers and the original Google Code era. The README still references git-lfs archives of those historical releases.
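To make the joint n-gram idea concrete, here is a minimal toy sketch in Python. It is not Phonetisaurus code: the hand-written alignments, the `}` joiner, the add-one smoothing, and all function names are assumptions chosen for illustration. Each aligned grapheme-phoneme pair is fused into a single token, and an ordinary bigram model is estimated over those tokens, so scoring a candidate pronunciation reduces to scoring a token sequence.

```python
import math
from collections import Counter

# Toy aligned lexicon (hypothetical, hand-aligned). A real aligner would
# learn these grapheme-chunk / phoneme-chunk pairings from the dictionary.
ALIGNED = [
    [("t", "T"), ("a", "AE"), ("p", "P")],    # tap
    [("t", "T"), ("o", "AA"), ("p", "P")],    # top
    [("p", "P"), ("a", "AE"), ("t", "T")],    # pat
    [("sh", "SH"), ("o", "AA"), ("p", "P")],  # shop
]

BOS, EOS = "<s>", "</s>"


def joint_tokens(pairs):
    """Fuse each (grapheme, phoneme) pair into one joint token, e.g. 'sh}SH'."""
    return [g + "}" + p for g, p in pairs]


def train_bigrams(aligned):
    """Count history tokens and bigrams over the joint-token sequences."""
    history, bigrams, vocab = Counter(), Counter(), set()
    for pairs in aligned:
        seq = [BOS] + joint_tokens(pairs) + [EOS]
        for prev, cur in zip(seq, seq[1:]):
            history[prev] += 1
            bigrams[(prev, cur)] += 1
            vocab.add(cur)
    return history, bigrams, vocab


def log_prob(pairs, history, bigrams, vocab):
    """Add-one-smoothed bigram log-probability of an aligned candidate."""
    seq = [BOS] + joint_tokens(pairs) + [EOS]
    total = 0.0
    for prev, cur in zip(seq, seq[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (history[prev] + len(vocab)))
    return total


history, bigrams, vocab = train_bigrams(ALIGNED)

# Two candidate pronunciations for the unseen word "shap".
seen_pair = [("sh", "SH"), ("a", "AE"), ("p", "P")]    # a -> AE occurs in training
unseen_pair = [("sh", "SH"), ("a", "AA"), ("p", "P")]  # a -> AA never occurs

score_seen = log_prob(seen_pair, history, bigrams, vocab)
score_unseen = log_prob(unseen_pair, history, bigrams, vocab)
```

In this toy setup the candidate built from pairings seen in training scores higher than the one that uses an unseen pairing. A real system would use higher-order n-grams and better smoothing, and would search over all possible segmentations of the input word rather than scoring a fixed alignment.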
Key highlights
- End-to-end training pipeline: align lexicon, estimate n-gram model, convert to WFST
- Wrapper scripts (phonetisaurus-train, phonetisaurus-apply) hide the OpenFst plumbing
- Supports n-best output, probability mass filtering, and greedy decoding
- Optional Python bindings expose per-multigram scores and alignments
- Docker images available; tested build path for Ubuntu 20.04 + OpenFst 1.7.2
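The n-best and probability mass options in the list above can be illustrated with a short Python sketch. This is one plausible reading of how such filtering works, not the tool's actual implementation: the function name `filter_by_mass`, its parameters, and the example hypotheses are all hypothetical. It assumes each hypothesis carries a cost equal to its negative log probability, as is common for weighted FSTs.

```python
import math


def filter_by_mass(hyps, pmass=0.9, nbest=None):
    """Keep the best hypotheses until their normalized probability reaches pmass.

    hyps: list of (pronunciation, cost) pairs, where cost is assumed to be
    a negative log probability (lower cost means more likely).
    """
    probs = [math.exp(-cost) for _, cost in hyps]
    z = sum(probs)  # normalize over the hypotheses we were given
    ranked = sorted(zip(hyps, probs), key=lambda item: -item[1])

    kept, mass = [], 0.0
    for (pron, _cost), p in ranked:
        if nbest is not None and len(kept) >= nbest:
            break  # hard cap on the number of outputs
        kept.append((pron, p / z))
        mass += p / z
        if mass >= pmass:
            break  # enough probability mass has been covered
    return kept


# Hypothetical hypotheses with probabilities of roughly 0.6, 0.3, and 0.1.
hyps = [
    ("T AH M EY T OW", -math.log(0.6)),
    ("T AH M AA T OW", -math.log(0.3)),
    ("T OW M EY T OW", -math.log(0.1)),
]

top_by_mass = filter_by_mass(hyps, pmass=0.8)
top_one = filter_by_mass(hyps, pmass=0.8, nbest=1)
```

With these numbers the mass threshold keeps the two most likely pronunciations, while adding a cap of one keeps only the best. Filtering by mass rather than by a fixed count lets confident words return a single pronunciation while ambiguous ones return several.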
Caveats
- Requires manual OpenFst installation and LD_LIBRARY_PATH wrangling; not a pip install experience
- Python bindings need pybindgen and a manual .so copy step that feels circa 2010
- The phonetisaurus-g2prnn binary exists but the README offers no usage details, so it is unclear if RNN support is first-class or vestigial
Verdict
Worth a look if you’re maintaining a Kaldi-based ASR pipeline or need a lightweight, self-contained G2P module without dragging in PyTorch. Skip it if you want state-of-the-art neural G2P out of the box; this is the reliable sedan, not the self-driving car.