Is Phonetisaurus open source?

Yes — AdolfVonKleist/Phonetisaurus is open source, released under the BSD-3-Clause license.

What language is Phonetisaurus written in?

GitHub classifies AdolfVonKleist/Phonetisaurus as primarily Shell, although the tools themselves ship as C++ binaries with optional Python 3 bindings.

How popular is Phonetisaurus?

AdolfVonKleist/Phonetisaurus has 517 stars on GitHub.

Where can I find Phonetisaurus?

AdolfVonKleist/Phonetisaurus is on GitHub at https://github.com/AdolfVonKleist/Phonetisaurus.

AdolfVonKleist/Phonetisaurus

Speech recognition's missing dictionary generator

Trains models that guess how words sound, because you can't ship a pronunciation dictionary for every proper noun the user will invent.

★517 stars Shell Data Tooling Other AI

What it does

Phonetisaurus builds grapheme-to-phoneme (G2P) models: feed it a dictionary of known word-to-pronunciation mappings, and it learns to generate pronunciations for words it has never seen. The output is a weighted finite-state transducer (WFST) in OpenFst format, the same representation used by Kaldi and other speech toolkits. It ships as C++ binaries with optional Python 3 bindings for extracting scores, alignments, and raw lattices.
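To make the "dictionary of known word-to-pronunciation mappings" concrete, here is a minimal sketch of reading such a lexicon into memory. It assumes a CMUdict-style layout (one entry per line, the word first, then whitespace-separated phoneme symbols); that layout, the function name parse_lexicon, and the sample entries are illustrative assumptions, not taken from the project.

```python
from collections import defaultdict


def parse_lexicon(lines):
    """Map each word to a list of pronunciations (each a list of phonemes).

    Assumed layout (hypothetical): "<word><whitespace><ph1> <ph2> ...".
    """
    lexicon = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # First token is the word; the remaining tokens are phonemes.
        word, *phonemes = line.split()
        if phonemes:
            lexicon[word].append(phonemes)
    return dict(lexicon)


# Illustrative entries only; a word may carry several pronunciations.
sample = [
    "test\tT EH S T",
    "read\tR IY D",
    "read\tR EH D",
]
lexicon = parse_lexicon(sample)
```

A G2P model trained on entries like these is then asked for pronunciations of words that do not appear in the lexicon at all.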

The interesting bit

The project treats G2P as a joint n-gram modeling problem over aligned grapheme-phoneme sequences, then compiles the result into an FST. This is the old-school, pre-neural approach: fast, compact, and interpretable. Its lineage traces back to INTERSPEECH papers and the project's original Google Code era, and the README still references git-lfs archives of those historical releases.
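The joint n-gram idea can be pictured with a toy model in which every token is a (grapheme chunk, phoneme chunk) pair and an ordinary bigram model is estimated over those tokens. This is a sketch under loud assumptions, not the project's code: the alignments are written by hand rather than learned, add-one smoothing stands in for whatever smoothing a real n-gram toolkit would apply, and all names are hypothetical.

```python
import math
from collections import Counter

START = ("<s>", "<s>")
END = ("</s>", "</s>")

# Hypothetical hand-aligned data: each word is a sequence of joint
# (grapheme chunk, phoneme chunk) tokens. A real system learns alignments.
ALIGNED = [
    [("sh", "SH"), ("i", "IH"), ("p", "P")],  # ship
    [("sh", "SH"), ("o", "AA"), ("p", "P")],  # shop
    [("t", "T"), ("i", "IH"), ("p", "P")],    # tip
    [("t", "T"), ("o", "AA"), ("p", "P")],    # top
]


def train_bigram(sequences):
    """Count joint-token contexts and bigrams, with sentence boundaries."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for seq in sequences:
        padded = [START] + list(seq) + [END]
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            contexts[prev] += 1
            bigrams[(prev, cur)] += 1
    return contexts, bigrams, len(vocab)


def log_prob(seq, model):
    """Add-one smoothed log-probability of a joint token sequence."""
    contexts, bigrams, vocab_size = model
    padded = [START] + list(seq) + [END]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        num = bigrams[(prev, cur)] + 1
        den = contexts[prev] + vocab_size
        total += math.log(num / den)
    return total


model = train_bigram(ALIGNED)
plausible = [("sh", "SH"), ("i", "IH"), ("p", "P")]
implausible = [("sh", "P"), ("i", "IH"), ("p", "SH")]
```

Because graphemes and phonemes are scored together, a pairing the model has seen (such as "sh" with SH) outranks one it has not, which is what lets such a model rank candidate pronunciations for a new spelling.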

Key highlights

End-to-end training pipeline: align lexicon, estimate n-gram model, convert to WFST
Wrapper scripts (phonetisaurus-train, phonetisaurus-apply) hide the OpenFst plumbing
Supports n-best output, probability mass filtering, and greedy decoding
Optional Python bindings expose per-multigram scores and alignments
Docker images available; tested build path for Ubuntu 20.04 + OpenFst 1.7.2
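The n-best, probability mass, and greedy options in the list above can be pictured as three ways of trimming a ranked list of candidate pronunciations. The sketch below is one plausible reading, not the tool's actual logic: it assumes probabilities are renormalized over the candidates supplied, and the function name, parameters, and sample scores are all hypothetical.

```python
def select_hypotheses(hypotheses, nbest=None, pmass=None):
    """Trim (pronunciation, probability) candidates.

    nbest: keep at most this many candidates (nbest=1 acts as greedy).
    pmass: stop once the kept candidates cover this share of total mass.
    """
    ranked = sorted(hypotheses, key=lambda h: h[1], reverse=True)
    total = sum(p for _, p in ranked)
    if total <= 0:
        return ranked[:nbest]  # nothing to normalize against
    kept, mass = [], 0.0
    for pron, p in ranked:
        if nbest is not None and len(kept) >= nbest:
            break
        kept.append((pron, p))
        mass += p / total  # normalized cumulative mass (an assumption)
        if pmass is not None and mass >= pmass:
            break
    return kept


# Illustrative scores only, not real model output.
hyps = [
    ("R EH D", 0.30),
    ("R IY D", 0.50),
    ("R AY D", 0.05),
    ("R EY D", 0.15),
]
greedy = select_hypotheses(hyps, nbest=1)
top_mass = select_hypotheses(hyps, pmass=0.75)
```

With these sample scores, the greedy call keeps only the single best candidate, while the mass filter keeps the two candidates that together cover at least 75% of the probability.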

Caveats

Requires manual OpenFst installation and LD_LIBRARY_PATH wrangling; not a pip install experience
Python bindings need pybindgen and a manual .so copy step that feels circa 2010
The phonetisaurus-g2prnn binary exists but the README offers no usage details, so it is unclear whether RNN support is first-class or vestigial

Verdict

Worth a look if you’re maintaining a Kaldi-based ASR pipeline or need a lightweight, self-contained G2P module without dragging in PyTorch. Skip it if you want state-of-the-art neural G2P out of the box; this is the reliable sedan, not the self-driving car.
