A tighter CRNN that beats the original on its own turf
A TensorFlow 1.x reimplementation of the classic CRNN text-recognition architecture that trims 15% of parameters while nudging word error rate lower on standard synthetic data.

What it does
Trains a convolutional-recurrent neural network to read words in images end-to-end, with no character segmentation required. The model feeds CNN features into a stacked bidirectional LSTM and learns with CTC loss, the standard recipe for scene-text OCR. It ships with scripts to download the MJSynth synthetic dataset, pack it into TFRecord files, and train via a Makefile-driven pipeline.
The interesting bit
The architecture is deliberately not a revolution. It is a careful refactor of Shi et al.’s CRNN: paired 3×3 convolutions replace single layers early on, the final expensive 2×2×512 conv is dropped in favor of vertical max-pooling, and horizontal downsampling is restrained so narrow fonts survive. Batch normalization is added after every conv pair. The payoff is 15% fewer convolutional parameters and a 1.82% word error rate on case-insensitive closed-vocabulary MJSynth—edging below the original CRNN’s reported numbers.
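A quick back-of-the-envelope calculation shows why dropping that last convolution matters. This sketch assumes the removed layer maps 512 input channels to 512 output channels with a bias term; the input channel count is an assumption, since the text above only gives the 2×2×512 shape.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, out_channels, bias=True):
    """Trainable parameters in one 2-D convolution layer."""
    weights = kernel_h * kernel_w * in_channels * out_channels
    return weights + (out_channels if bias else 0)


# Assumption: the dropped 2x2 conv takes 512 input channels to 512 outputs.
dropped_conv = conv2d_params(2, 2, 512, 512)

# Max-pooling has no trainable weights, so the replacement adds none.
pooling_params = 0

print(dropped_conv)                   # 1049088
print(dropped_conv - pooling_params)  # 1049088 parameters saved by this swap
```

Under that assumption the single layer accounts for roughly a million weights, which is why removing it shows up in the overall parameter count.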
Key highlights
- TensorFlow 1.x implementation using tf.data and custom Estimator APIs for I/O and training
- Supports open, closed, and mixed vocabulary decoding; optional lexicon-constrained beam search via a forked CTCWordBeamSearch module
- Dynamic training data generation supported through MapTextSynthesizer for domain-specific augmentation
- Pre-trained checkpoints published via DOI for reproducibility of ICDAR 2019 historical-map recognition results
- Validation script runs interactively: pipe image paths to validate.py and read decoded text from stdout
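To give a feel for what closed-vocabulary decoding buys you, here is a rough sketch that snaps a raw decoded string to the nearest word in a lexicon by edit distance. This is a simple post-hoc lookup swapped in for illustration; it is not the lexicon-constrained beam search that CTCWordBeamSearch performs, and the word list and function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete ch_a
                               current[j - 1] + 1,       # insert ch_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def snap_to_lexicon(raw, lexicon):
    """Return the lexicon word closest to `raw`, compared case-insensitively."""
    return min(lexicon, key=lambda word: edit_distance(raw.lower(), word.lower()))


LEXICON = ["river", "ridge", "bridge", "road"]  # hypothetical word list
print(snap_to_lexicon("brldge", LEXICON))  # bridge
```

A real constrained decoder applies the lexicon during the search rather than after it, so it can recover words that a greedy pass followed by a lookup would miss.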
Caveats
- Python 2.7 only, with TensorFlow 1.10 or later (newer versions emit deprecation warnings); this is legacy-stack code
- Model parameters are hardcoded in src/model.py, not exposed as command-line flags
- Full MJSynth download takes 4–12 hours; the included 0.1% demo set is only useful for a quick smoke test
Verdict
Worth a look if you need a well-documented, reproducible CRNN baseline in TensorFlow 1.x or want to study architectural tweaks that trade a little compute for cleaner convergence. Skip it if you are already committed to PyTorch, TensorFlow 2.x, or modern transformer-based OCR; this is a research artifact, not a maintained product.