weinman/cnn_lstm_ctc_ocr

A tighter CRNN that beats the original on its own turf

A TensorFlow 1.x reimplementation of the classic CRNN text-recognition architecture that trims convolutional parameters by 15% while nudging word error rate lower on the MJSynth synthetic benchmark.

503 stars · Python · Computer Vision
Velocity (7d): +0.2 ★/day · Trend: steady

What it does

Trains a convolutional-recurrent neural network to read words in images end to end, with no character segmentation required. CNN features feed a stacked bidirectional LSTM, and the whole model is trained with connectionist temporal classification (CTC) loss, the standard recipe for scene-text OCR. The repository ships with scripts to download the MJSynth synthetic dataset, pack it into TensorFlow records, and train via a Makefile-driven pipeline.
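
To illustrate why CTC removes the need for character segmentation, here is a minimal sketch of greedy (best-path) CTC decoding: the network emits one label per horizontal frame, repeated labels are collapsed, and blanks are dropped. The alphabet, the blank index of 0, and the function name `ctc_greedy_decode` are assumptions made for this example, not taken from the repository.

```python
from itertools import groupby

# Assumption: index 0 is the CTC blank. The repository may place it elsewhere.
BLANK = 0


def ctc_greedy_decode(frame_labels, alphabet, blank=BLANK):
    """Turn per-frame argmax labels into text: collapse repeats, then drop blanks."""
    # groupby merges runs of identical consecutive labels into one entry.
    collapsed = [label for label, _ in groupby(frame_labels)]
    return "".join(alphabet[label] for label in collapsed if label != blank)


# Hypothetical four-symbol alphabet; "-" stands in for the blank.
alphabet = ["-", "a", "c", "t"]
frames = [2, 2, 0, 1, 1, 0, 0, 3, 3]
print(ctc_greedy_decode(frames, alphabet))  # cat
```

A blank between two identical labels is what lets doubled letters survive: the frames `[1, 0, 1]` decode to `aa`, while `[1, 1]` decode to `a`.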

The interesting bit

The architecture is deliberately not a revolution. It is a careful refactor of Shi et al.’s CRNN: paired 3×3 convolutions replace single layers early on, the final expensive 2×2×512 convolution is dropped in favor of vertical max-pooling, and horizontal downsampling is restrained so narrow fonts survive. Batch normalization is added after every conv pair. The payoff is 15% fewer convolutional parameters and a 1.82% word error rate on MJSynth (case-insensitive, closed vocabulary), which edges below the original CRNN’s reported numbers.
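
To give a sense of why the final 2×2×512 convolution counts as expensive, the sketch below counts the parameters of a single 2D convolution layer. It assumes the dropped layer takes 512 input channels and has a bias term. Neither detail is stated above, and `conv2d_params` is a helper written for this example.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, out_channels, bias=True):
    """Count trainable parameters in one 2D convolution layer."""
    weights = kernel_h * kernel_w * in_channels * out_channels
    return weights + (out_channels if bias else 0)


# Assumption: 512 input channels feed the 2x2 convolution with 512 output maps.
dropped = conv2d_params(2, 2, 512, 512)
print(dropped)  # 1049088
```

Under those assumptions the layer holds roughly one million parameters, whereas a max-pooling layer has none. This is only a back-of-the-envelope figure and is not meant to reproduce the 15% total.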

Key highlights

  • TensorFlow 1.x implementation using tf.data and custom Estimator APIs for I/O and training
  • Supports open, closed, and mixed vocabulary decoding; optional lexicon-constrained beam search via a forked CTCWordBeamSearch module
  • Dynamic training data generation supported through MapTextSynthesizer for domain-specific augmentation
  • Pre-trained checkpoints published via DOI for reproducibility of ICDAR 2019 historical-map recognition results
  • Validation script runs interactively: pipe image paths to validate.py and read decoded text from stdout
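
The difference between open and closed vocabulary decoding can be sketched in a few lines. Open decoding returns whatever string the network emits, while closed decoding restricts the answer to a lexicon. The example below approximates the closed case by snapping a decoded string to its nearest lexicon entry by edit distance. This is a deliberately simple stand-in, not the lexicon-constrained beam search of CTCWordBeamSearch, and the function names and the lexicon are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def snap_to_lexicon(word, lexicon):
    """Return the lexicon entry closest to `word`, ignoring case."""
    return min(lexicon, key=lambda entry: edit_distance(word.lower(), entry.lower()))


# Hypothetical lexicon and a decoded string with one wrong character.
lexicon = ["street", "river", "bridge"]
print(snap_to_lexicon("stre3t", lexicon))  # street
```

Post-hoc snapping only sees the single best string, whereas a lexicon-constrained beam search uses the lexicon while scoring candidate paths, so the two approaches can return different words.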

Caveats

  • Python 2.7 only; requires TensorFlow ≥1.10 and emits deprecation warnings on newer versions. This is legacy-stack code
  • Model parameters are hardcoded in src/model.py, not exposed as command-line flags
  • Full MJSynth download takes 4–12 hours; the included 0.1% demo set is only useful for a quick smoke test

Verdict

Worth a look if you need a well-documented, reproducible CRNN baseline in TensorFlow 1.x or want to study architectural tweaks that trade a little compute for cleaner convergence. Skip it if you are already committed to PyTorch, TensorFlow 2.x, or modern transformer-based OCR; this is a research artifact, not a maintained product.
