A fossil record of how we used to teach machines to listen
An early TensorFlow seq2seq speech project that its own authors now point to as a historical artifact.

What it does
This repo houses Python experiments in speech-to-text using TensorFlow’s now-deprecated seq2seq APIs. It includes toy classifiers for spoken numbers and speakers, plus a densenet variant, all fed by spectrograms and by live audio captured via pyaudio. The stated goal was a standalone Linux speech recognizer built on plentiful public training data.
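For readers who have not met the spectrogram step, here is a minimal sketch of the idea: slice the waveform into overlapping frames, window each one, and take the magnitude of its Fourier transform. This is not the repo's code. The function name `spectrogram`, the `frame_size` and `hop` parameters, and the synthetic test tone are all assumptions for illustration, and a naive standard-library DFT stands in for whatever FFT routine a real pipeline would use.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return one list of magnitudes (bins 0..frame_size//2) per frame.

    Hypothetical helper for illustration; not taken from the repo.
    """
    # Hann window tapers each frame to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        # Naive DFT, kept for clarity; real code would call an FFT library.
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            total = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n in range(frame_size))
            magnitudes.append(abs(total))
        frames.append(magnitudes)
    return frames


# Illustrative input (an assumption): a pure tone that sits exactly on
# bin 8 of a 64-point frame, since 1000 Hz / (8000 Hz / 64) = 8.
SAMPLE_RATE = 8000
TONE_HZ = 1000
signal = [math.sin(2 * math.pi * TONE_HZ * t / SAMPLE_RATE) for t in range(512)]

spec = spectrogram(signal)
# Index of the strongest frequency bin in the first frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

Each row of `spec` is one time slice and each column one frequency bin, which is the 2-D "image" that classifiers like the ones in this repo consume in place of raw audio.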
The interesting bit
The README is unusually honest: the authors have twice updated it to tell you to go elsewhere, first to Mozilla DeepSpeech in 2020, then to OpenAI’s Whisper in 2024. That makes it a rare self-annotated graveyard, useful for tracing how quickly the SOTA treadmill can obsolete a project.
Key highlights
- Built on TensorFlow 1.0 seq2seq, now incompatible with current releases
- Includes toy examples (number_classifier_tflearn.py, speaker_classifier_tflearn.py) and a densenet variant
- Ships with spectrogram visualizations and live recording via record.py
- Proposes extensions that now look prescient: GPU WarpCTC, modular graphs, and “P2P learning” snapshots
- Explicitly maintained “only for educational purposes” since 2020
Caveats
- The installation instructions require building portaudio from source and hand-tweaking LD_LIBRARY_PATH; the README even misspells LIBRARY_PATH as LIDRARY_PATH
- Dependencies (layer, tensorpeers) are separate repos by the same author, so the project is somewhat glued together
- The only training entry point is train.sh, whose contents are undocumented; getting to a working model is left as an exercise
Verdict
Worth a quick browse if you’re writing a history-of-STT talk or want to see how seq2seq was wielded in 2016. Anyone building something today should follow the authors’ own advice and use Whisper instead.