githubharald/CTCWordBeamSearch

CTC decoding that knows "ba" from "a ba"

A CTC decoder that constrains output to dictionary words while gracefully handling numbers, punctuation, and other non-word characters that pure token passing chokes on.

Velocity (7d): +0.2 ★/day · Trend: steady

What it does

Word beam search is a CTC decoder for sequence recognition tasks such as handwriting or speech recognition. You feed it RNN outputs (softmax already applied, shape T×B×(C+1): T time steps, B batch elements, C characters plus the CTC blank), and it returns one label sequence (character indices) per batch element, which you map back to text through the chars string. The twist: it only emits words that appear in your dictionary, but unlike stricter decoders, it doesn't break when it encounters digits, punctuation, or other characters between words.
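To make the input/output conventions concrete, here is a minimal sketch of greedy (best-path) CTC decoding over the same T×B×(C+1) layout, with the blank in the last column. It is deliberately not word beam search: it applies no dictionary constraint at all, which is exactly the permissiveness word beam search is meant to fix. The function name best_path_decode and the nested-list matrix are illustrative assumptions, not the library's API.

```python
from typing import List

# Nested lists indexed [t][b][c]: T time steps x B batch elements x (C+1) classes.
Matrix = List[List[List[float]]]


def best_path_decode(mat: Matrix, chars: str) -> List[str]:
    """Greedy (best-path) CTC decoding; the blank is assumed to be the last column."""
    num_classes = len(chars) + 1
    blank = num_classes - 1
    texts = []
    for b in range(len(mat[0])):
        labels = []
        prev = blank
        for frame in mat:
            probs = frame[b]
            if len(probs) != num_classes:
                raise ValueError("class axis must have len(chars) + 1 entries")
            best = max(range(num_classes), key=probs.__getitem__)
            # Collapse repeated labels, then drop blanks (standard CTC rule).
            if best != prev and best != blank:
                labels.append(best)
            prev = best
        # Map label indices back to text through the chars string.
        texts.append("".join(chars[i] for i in labels))
    return texts


if __name__ == "__main__":
    chars = "ab"
    # T=5, B=1: per-frame argmax is a, a, blank, a, b
    mat = [
        [[0.8, 0.1, 0.1]],
        [[0.7, 0.2, 0.1]],
        [[0.1, 0.1, 0.8]],
        [[0.6, 0.2, 0.2]],
        [[0.1, 0.6, 0.3]],
    ]
    print(best_path_decode(mat, chars))  # ['aab']
```

The repeated "a" survives only because a blank frame separates the two runs; that collapse rule is the same one any CTC decoder, including word beam search, has to respect.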

The interesting bit

The algorithm sits in a sweet spot between vanilla beam search (too permissive, so it gets words wrong) and token passing (too rigid, so it fails on non-word characters). It can also score beams with a word-bigram language model. Four modes trade accuracy for speed, ranging from an O(1) dictionary-only constraint ("Words", no LM scoring) up to O(W*log(W)) forecasting of next words, where W is the dictionary size.
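The dictionary constraint is what separates this from vanilla beam search. A minimal sketch of the idea, assuming a character prefix tree over the dictionary and a split of characters into word characters and non-word characters: a beam inside a word may only append characters that continue some dictionary word, and may switch to a non-word character only once the word is complete. PrefixTree and allowed_chars are hypothetical names; this illustrates the constraint, not the library's C++ implementation.

```python
from typing import Dict, Iterable, Optional, Set


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False


class PrefixTree:
    """Character trie over the dictionary (hypothetical helper for illustration)."""

    def __init__(self, words: Iterable[str]) -> None:
        self.root = TrieNode()
        for word in words:
            node = self.root
            for ch in word:
                node = node.children.setdefault(ch, TrieNode())
            node.is_word = True

    def find(self, prefix: str) -> Optional[TrieNode]:
        node: Optional[TrieNode] = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node


def allowed_chars(text: str, tree: PrefixTree, word_chars: str, all_chars: str) -> Set[str]:
    """Characters a beam ending in `text` may append under the dictionary constraint."""
    # The current (possibly partial) word is the trailing run of word characters.
    start = len(text)
    while start > 0 and text[start - 1] in word_chars:
        start -= 1
    prefix = text[start:]
    node = tree.find(prefix)
    if node is None:  # beam already left the dictionary; a constrained search never gets here
        return set()
    # Word state: only characters that continue a dictionary word along the trie.
    allowed = set(node.children)
    # Empty prefix (non-word state) or a completed word: non-word chars may follow.
    if prefix == "" or node.is_word:
        allowed |= set(all_chars) - set(word_chars)
    return allowed
```

With the dictionary ["a", "ba"], word characters "ab" and a space as the only non-word character, a beam ending in "a" may only append the space, so "aba" can never be produced while "a ba" can. A beam ending in "b" may only append "a".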

Key highlights

  • Four LM modes: “Words”, “NGrams”, “NGramsForecast”, “NGramsForecastAndSample” with documented runtime complexity vs. dictionary size W
  • Add-k smoothing for unseen bigrams (smoothing value configurable from 0 to 1)
  • Handles arbitrary non-word characters between words, such as numbers and punctuation, without failing the dictionary lookup
  • C++ core with Python 3.11/3.12 bindings, installable from a source checkout with pip install .
  • Ships with a TensorFlow custom op and a pure-Python prototype in extras/
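The smoothing highlight can be illustrated with the standard add-k bigram estimate P(w | v) = (c(v, w) + k) / (c(v) + k·V), where c(v) counts v as a bigram history and V is the vocabulary size. This textbook formulation is an assumption here; the library's exact normalization may differ, and BigramLM is a hypothetical name.

```python
from collections import Counter
from typing import List


class BigramLM:
    """Word-bigram LM with add-k smoothing (textbook form, assumed for illustration)."""

    def __init__(self, words: List[str], k: float) -> None:
        if not 0.0 <= k <= 1.0:
            raise ValueError("k must lie in [0, 1]")
        self.k = k
        self.vocab = set(words)
        self.bigrams = Counter(zip(words, words[1:]))
        # Count each word only where another word follows it (its use as a history).
        self.histories = Counter(words[:-1])

    def prob(self, prev: str, word: str) -> float:
        # P(word | prev) = (c(prev, word) + k) / (c(prev) + k * V)
        num = self.bigrams[(prev, word)] + self.k
        den = self.histories[prev] + self.k * len(self.vocab)
        return num / den if den > 0 else 0.0


if __name__ == "__main__":
    lm = BigramLM("the cat sat on the mat".split(), k=1.0)
    print(lm.prob("the", "cat"))  # seen bigram: (1 + 1) / (2 + 5) = 2/7
    print(lm.prob("the", "on"))   # unseen bigram still gets (0 + 1) / 7
```

With k = 0 the estimate falls back to raw relative frequencies and unseen word pairs get probability zero, which would let a single rare pair veto an otherwise good beam; raising k toward 1 spreads more mass onto unseen pairs.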

Caveats

  • Requires careful character ordering: the character axis of your RNN output must follow the chars string order exactly, with the CTC blank as the last (C+1-th) entry
  • The constructor expects UTF-8-encoded bytes (.encode('utf8')) rather than plain str, a Python 2-ism that feels dated in 2024
  • No GPU acceleration mentioned; decoding happens on CPU
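A small pre-flight check addresses the ordering caveat. The sketch below assumes a model that puts the blank at index 0 (PyTorch's nn.CTCLoss defaults to blank=0) and whose remaining columns already follow the chars string; move_blank_last and check_layout are hypothetical helpers, not part of the library.

```python
from typing import List

# Nested lists indexed [t][b][c]: T x B x (C+1).
Matrix = List[List[List[float]]]


def move_blank_last(mat: Matrix, blank_index: int = 0) -> Matrix:
    """Return a copy of mat with column blank_index moved to the end of the class axis."""
    out = []
    for frame in mat:
        new_frame = []
        for probs in frame:
            row = list(probs)  # copy so the caller's matrix is left untouched
            blank = row.pop(blank_index)
            row.append(blank)
            new_frame.append(row)
        out.append(new_frame)
    return out


def check_layout(mat: Matrix, chars: str) -> None:
    """Raise if any class axis does not hold exactly len(chars) + 1 entries."""
    expected = len(chars) + 1
    for t, frame in enumerate(mat):
        for b, probs in enumerate(frame):
            if len(probs) != expected:
                raise ValueError(
                    f"mat[{t}][{b}] has {len(probs)} classes, expected {expected}"
                )


if __name__ == "__main__":
    chars = "ab"
    model_out = [[[0.5, 0.3, 0.2]]]  # T=1, B=1, columns: blank, 'a', 'b'
    mat = move_blank_last(model_out)  # columns now: 'a', 'b', blank
    check_layout(mat, chars)
    print(mat)  # [[[0.3, 0.2, 0.5]]]
```

A length check cannot catch two characters that are swapped, so the chars string should still be generated from the same source the training labels came from.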

Verdict

Worth a look if you're doing CTC-based text recognition and need dictionary constraints without the brittleness of pure token passing. Skip it if you're already using a modern end-to-end model with built-in LM fusion (Whisper, etc.); this solves a 2018-era problem with 2018-era ergonomics.
