A leaderboard for speech recognition that refuses to die
A community-curated table of WER benchmarks, tracking who beat whom and by how much, since 2015.

What it does
wer_are_we is a GitHub repository that collects Word Error Rate (WER) results for speech recognition across standard benchmarks: LibriSpeech, WSJ, and Switchboard/CallHome. Each entry lists the paper, the date, the score, and whatever architectural notes matter, such as self-supervised pre-training, SpecAugment, transformer LMs, or just “humans” as a baseline. It is essentially a markdown table that the community is invited to correct.
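For readers who have not computed the metric by hand: WER is conventionally the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not code from the repository. The function name `word_error_rate` is made up for this example, and it assumes plain whitespace tokenization with no text normalization, which published results usually do apply.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()  # assumption: whitespace tokenization only
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev is the edit-distance row for the reference prefix handled so far;
    # prev[j] is the cost of turning that prefix into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.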
The interesting bit
The value is in the footnotes, not the numbers. You can watch the field’s obsessions shift over time: from HMM-TDNNs and i-vectors through end-to-end attention, to the current regime where everyone seems to be stacking Conformers and pre-training on unlabeled data. The README also quietly flags when results used extra training data, which is the kind of asterisk that actual leaderboards often bury.
Key highlights
- Tracks three major benchmarks with consistent formatting since 2015
- Includes human baselines (5.83% WER on LibriSpeech test-clean, per Deep Speech 2)
- Current best on LibriSpeech: 1.8% / 2.9% on test-clean / test-other (HuBERT, June 2021, self-supervised pre-training on 60K hours of unlabeled audio)
- Explicitly notes data augmentation methods and out-of-corpus training where known
- Openly editable: the README invites corrections
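To put a number on "by how much", a common convention is relative WER reduction: the drop in WER divided by the starting WER. The sketch below applies it to the two test-clean figures quoted above. The helper name `relative_wer_reduction` is hypothetical, and the comparison is only arithmetic, since the two figures come from different sources and setups.

```python
def relative_wer_reduction(baseline: float, new: float) -> float:
    """Fraction of the baseline WER that was removed (both given in percent)."""
    if baseline <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline - new) / baseline


if __name__ == "__main__":
    human_test_clean = 5.83   # human baseline quoted above
    hubert_test_clean = 1.8   # HuBERT test-clean figure quoted above
    reduction = relative_wer_reduction(human_test_clean, hubert_test_clean)
    print(f"{reduction:.1%}")  # prints 69.1%
```

An absolute gap of about four points here is a relative reduction of roughly two thirds, which is why low-WER papers tend to report relative numbers.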
Caveats
- No automation: it is hand-maintained markdown, so freshness depends on pull requests
- Coverage is spotty outside the big three benchmarks; no TTS, no streaming latency numbers, no industrial APIs
- Some entries are truncated or incomplete in the source (the README cuts off mid-table in places)
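Anyone scraping the table should expect both of the last two problems. Below is a minimal sketch of a tolerant parser that separates complete rows from truncated ones instead of failing. The column layout, the `SAMPLE` text, and the function names are assumptions made for illustration; they are not copied from the README.

```python
from typing import List, Optional, Tuple


def parse_percent(cell: str) -> Optional[float]:
    """Turn '1.8%' into 1.8; return None for anything that is not a number."""
    try:
        return float(cell.strip().rstrip("%").strip())
    except ValueError:
        return None


def parse_wer_rows(markdown: str, expected_columns: int = 4) -> Tuple[List[tuple], List[str]]:
    """Split table rows into (complete, incomplete); the first two cells are WER scores."""
    complete, incomplete = [], []
    for raw in markdown.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue  # not a table row
        cells = [c.strip() for c in line.strip("|").split("|")]
        first = parse_percent(cells[0])
        if first is None:
            continue  # header or separator row: first cell is not a score
        second = parse_percent(cells[1]) if len(cells) > 1 else None
        if len(cells) < expected_columns or second is None:
            incomplete.append(line)  # keep truncated rows for manual review
        else:
            complete.append((first, second, *cells[2:]))
    return complete, incomplete


# Hypothetical excerpt: one full row and one row cut off after the first score.
SAMPLE = """
| WER test-clean | WER test-other | Notes | Published |
| -------------- | -------------- | ----- | --------- |
| 1.8% | 2.9% | HuBERT | June 2021 |
| 5.83% |
"""

if __name__ == "__main__":
    rows, broken = parse_wer_rows(SAMPLE)
    print(rows)
    print(broken)
```

Keeping the truncated rows in a separate list, rather than dropping them, makes it easy to see what needs a pull request upstream.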
Verdict
Useful if you are writing a paper and need to cite “state of the art” with actual numbers, or if you want to argue that your 2.1% is competitive. Not useful if you need code, models, or reproducibility details: this is a bibliography with WER columns, not a framework.