Jakobovski/free-spoken-digit-dataset

MNIST, but you can hear it coming

A tidy, versioned dataset of 3,000 spoken digits for when your model needs to learn what "seven" sounds like at 8kHz.

677 stars · Python · Data Tooling · Velocity (7d): +0.2 ★/day · Trend: steady

What it does

FSDD collects WAV recordings of the English digits 0-9 from six speakers. The clips are trimmed to near-minimal silence at the start and end, and their filenames follow a predictable digit_speaker_index pattern, such as 7_jackson_32.wav. The dataset ships with a prescribed train/test split, in which the first 10% of each speaker's recordings form the test set, plus a small Python API for loading data and generating spectrograms.
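The naming scheme makes the split easy to reproduce. Here is a minimal sketch, assuming each speaker has recording indices 0-49 per digit, so that "first 10%" means indices 0-4. The `Recording` type and `parse_filename` helper are hypothetical and not part of the repo's API.

```python
import re
from typing import NamedTuple

# FSDD-style names such as "7_jackson_32.wav": digit, speaker, recording index.
FILENAME_RE = re.compile(r"^(\d)_([A-Za-z]+)_(\d+)\.wav$")

# Assumption: each speaker has indices 0..49 per digit, so the
# "first 10%" test portion is indices 0..4.
RECORDINGS_PER_DIGIT = 50
TEST_CUTOFF = RECORDINGS_PER_DIGIT // 10  # 5


class Recording(NamedTuple):
    digit: int
    speaker: str
    index: int
    split: str  # "train" or "test"


def parse_filename(name: str) -> Recording:
    """Parse an FSDD-style filename and assign it to train or test (hypothetical helper)."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected filename: {name!r}")
    digit, speaker, index = int(match[1]), match[2], int(match[3])
    split = "test" if index < TEST_CUTOFF else "train"
    return Recording(digit, speaker, index, split)


if __name__ == "__main__":
    print(parse_filename("7_jackson_32.wav"))
    # Recording(digit=7, speaker='jackson', index=32, split='train')
```

Under that assumption, six speakers × 10 digits × 5 test indices gives 300 test files and 2,700 training files.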

The interesting bit

The dataset is deliberately boring in the right ways: fixed 8kHz mono audio, consistent naming, Zenodo DOI versioning for reproducibility, and a metadata.py that tracks speaker gender and accent. That predictability is the point. It's audio MNIST in spirit, a quick sanity-check substrate for speech pipelines.
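Because the format is fixed, a cheap guard before training is to reject any file that isn't 8kHz mono. This sketch uses only the standard library `wave` module. `check_fsdd_format` and `check_directory` are hypothetical helpers. The sketch deliberately leaves bit depth unchecked because the format description above doesn't specify one.

```python
import wave
from pathlib import Path

EXPECTED_RATE = 8000  # Hz
EXPECTED_CHANNELS = 1  # mono


def check_fsdd_format(path) -> None:
    """Raise ValueError unless the WAV file at `path` is 8kHz mono."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != EXPECTED_RATE or channels != EXPECTED_CHANNELS:
        raise ValueError(
            f"{path}: expected {EXPECTED_RATE} Hz mono, "
            f"got {rate} Hz with {channels} channel(s)"
        )


def check_directory(root) -> int:
    """Check every .wav file directly under `root`; return how many passed."""
    files = sorted(Path(root).glob("*.wav"))
    for path in files:
        check_fsdd_format(path)
    return len(files)
```

Point `check_directory` at whatever folder holds your copy of the recordings. The first nonconforming file raises an error that includes its path.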

Key highlights

  • 3,000 recordings: 50 per digit per speaker across 6 speakers
  • Pre-built spectrograms and a trimmer.py utility for trimming silence from your own recordings
  • Direct loaders for PyTorch, TensorFlow, and the Hub ecosystem
  • 50+ scholarly citations, plus wrappers in TensorFlow Datasets and Accord.NET
  • CC BY-SA 4.0, with explicit contribution workflow for growing the corpus
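If you plan to contribute your own recordings, here is a minimal energy-threshold trimmer in the same spirit as trimmer.py. It is not that script's algorithm. The frame length, the RMS threshold, and the 16-bit mono PCM assumption are hypothetical choices made for illustration.

```python
import array
import math
import sys
import wave


def frame_rms(samples, start, length):
    """Root-mean-square amplitude of samples[start:start + length]."""
    chunk = samples[start:start + length]
    if len(chunk) == 0:
        return 0.0
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))


def trim_silence(samples, frame_len=80, threshold=500.0):
    """Drop leading and trailing frames whose RMS falls below `threshold`.

    Hypothetical defaults: 80 samples is 10 ms at 8 kHz, and the
    threshold is in raw 16-bit sample units.
    """
    n_frames = math.ceil(len(samples) / frame_len)
    loud = [
        i for i in range(n_frames)
        if frame_rms(samples, i * frame_len, frame_len) >= threshold
    ]
    if not loud:
        return samples[:0]  # nothing above threshold: return an empty clip
    start = loud[0] * frame_len
    end = min((loud[-1] + 1) * frame_len, len(samples))
    return samples[start:end]


def trim_wav(src, dst, **kwargs):
    """Trim a 16-bit mono WAV file from `src` and write the result to `dst`."""
    with wave.open(src, "rb") as wav:
        params = wav.getparams()
        if params.sampwidth != 2 or params.nchannels != 1:
            raise ValueError("this sketch assumes 16-bit mono PCM")
        samples = array.array("h", wav.readframes(params.nframes))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    trimmed = trim_silence(samples, **kwargs)
    if sys.byteorder == "big":
        trimmed.byteswap()
    with wave.open(dst, "wb") as out:
        out.setparams(params)  # the wave module fixes up the frame count it writes
        out.writeframes(trimmed.tobytes())
```

Trimming works at frame granularity, so up to one frame of quiet audio can survive on each side of the detected speech. Shorter frames tighten the cut but make the RMS estimate noisier.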

Caveats

  • English-only, six speakers — accent and demographic coverage is narrow
  • 8kHz is telephone-grade; don’t expect rich phonetic detail
  • The “Made with FSDD” list is self-reported and not curated
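The Nyquist limit makes "telephone-grade" concrete. A signal sampled at rate f_s can only represent frequencies up to half that rate:

```latex
f_{\max} = \frac{f_s}{2} = \frac{8000\ \text{Hz}}{2} = 4000\ \text{Hz}
```

Everything above 4 kHz is lost. That region is where sibilants, such as the /s/ in "six" and "seven", carry much of their energy.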

Verdict

Grab this if you need a lightweight, reproducible spoken-digit baseline or a teaching dataset. Skip it if you're building a production voice interface, because six speakers and an 8kHz sampling rate won't generalize far.
