senlinuc/caffe_ocr

Caffe OCR: When BLSTM is optional and CNNs do the heavy lifting

A Windows-first Caffe fork that questions whether you even need recurrent layers for Chinese text recognition.

1.3k stars · C++ · Computer Vision · star velocity over the last 7 days: +0.4 ★/day (steady trend)

What it does

This is a research sandbox for OCR architectures built on a patched version of Caffe. It implements CNN+BLSTM+CTC pipelines for Chinese and English text recognition, with ready-made VS2015 projects, pre-trained models, and ~3.6 million synthetic Chinese training samples (news + classical literature, heavily augmented). The repo also includes evaluation tools and lexicon-assisted prediction for English.

The interesting bit

The author keeps accidentally proving you don’t need the BLSTM. A pure CNN+CTC variant (densenet-no-blstm-vertical-feature) hits 98.16% accuracy on Chinese, slightly ahead of the 98.05% from the best recurrent stack, by preserving vertical stroke detail through the feature maps. The repo documents this as an open question, not a sales pitch.
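
To make the CNN+CTC idea concrete, here is a minimal sketch of greedy (best-path) CTC decoding: each feature-map column yields one score vector, and the text is read off by taking the top class per column, merging repeats, and dropping blanks. This is an illustration, not the repo's code. The function name, the toy scores, and the choice of index 0 for the blank label are all assumptions.

```python
import numpy as np

BLANK = 0  # assumption: class index 0 is the CTC blank


def ctc_greedy_decode(scores, alphabet):
    """Best-path decoding: argmax per step, merge repeats, drop blanks.

    scores: array of shape (T, C), one row of class scores per column.
    alphabet: string whose i-th character maps to class index i + 1.
    """
    best_path = np.argmax(scores, axis=1)
    chars = []
    previous = BLANK
    for label in best_path:
        # Emit a character only when it is not blank and not a repeat.
        if label != BLANK and label != previous:
            chars.append(alphabet[label - 1])
        previous = label
    return "".join(chars)


# Toy input: 6 columns, classes = [blank, 'a', 'b'] (hypothetical values).
toy_scores = np.array([
    [0.10, 0.80, 0.10],  # a
    [0.10, 0.80, 0.10],  # a again, merged with the previous column
    [0.90, 0.05, 0.05],  # blank
    [0.10, 0.80, 0.10],  # a, a new character because a blank came first
    [0.10, 0.10, 0.80],  # b
    [0.90, 0.05, 0.05],  # blank
])
decoded = ctc_greedy_decode(toy_scores, "ab")
print(decoded)  # aab
```

Nothing in this step depends on how the per-column scores were produced, which is why a convolution-only feature extractor can feed CTC directly.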

Key highlights

  • Ships with concrete benchmarks: best recurrent result on Chinese is 98.05% (densenet-sum-blstm-full-res-blstm), narrowly behind the 98.16% CNN-only variant; fastest is densenet-no-blstm at 2.4 ms per prediction on GPU
  • Residual BLSTM connections give measurable gains: 94% → 96.5% → 98.05%
  • Includes memory-efficient DenseNet, custom transpose/reverse layers, and Warp-CTC integration without sequence-indicator layers
  • Synthetic Chinese dataset (about 3.6 million samples) available via Baidu Pan; English uses the VGG Synthetic Word Dataset
  • Pre-trained Chinese model downloadable; English lexicon-assisted decoding included
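
The summary above does not say how the lexicon is applied. One common approach, sketched here purely as an assumption, is to score every lexicon word by its CTC negative log-likelihood under the network output and keep the lowest-scoring word. The function names, the blank index, and the toy probabilities are hypothetical.

```python
import numpy as np

BLANK = 0  # assumption: class index 0 is the CTC blank


def ctc_neg_log_likelihood(log_probs, labels):
    """CTC forward algorithm in log space.

    log_probs: (T, C) per-column log-probabilities.
    labels: class indices of the candidate word, without blanks.
    Returns -log P(labels | log_probs).
    """
    # Extended sequence: blank, l1, blank, l2, ..., blank
    ext = [BLANK]
    for label in labels:
        ext += [label, BLANK]
    size = len(ext)
    alpha = np.full(size, -np.inf)
    alpha[0] = log_probs[0, ext[0]]
    if size > 1:
        alpha[1] = log_probs[0, ext[1]]
    for t in range(1, log_probs.shape[0]):
        new = np.full(size, -np.inf)
        for s in range(size):
            incoming = [alpha[s]]
            if s >= 1:
                incoming.append(alpha[s - 1])
            # A blank may be skipped only between two different labels.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                incoming.append(alpha[s - 2])
            new[s] = np.logaddexp.reduce(incoming) + log_probs[t, ext[s]]
        alpha = new
    if size == 1:
        return -alpha[0]
    # Valid alignments end on the last label or the trailing blank.
    return -np.logaddexp(alpha[-1], alpha[-2])


def lexicon_decode(log_probs, lexicon, alphabet):
    """Return the lexicon word with the lowest CTC negative log-likelihood."""
    index_of = {ch: i + 1 for i, ch in enumerate(alphabet)}

    def score(word):
        return ctc_neg_log_likelihood(log_probs, [index_of[c] for c in word])

    return min(lexicon, key=score)


# Toy output: 3 columns, classes = [blank, 'a', 'b', 'c'] (hypothetical).
toy_log_probs = np.log(np.array([
    [0.1, 0.8, 0.05, 0.05],
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.5, 0.3],
]))
best = lexicon_decode(toy_log_probs, ["ab", "ac", "ba"], "abc")
print(best)  # ab
```

Scoring whole words this way sums over every alignment of the word, so it can recover a lexicon entry even when the single best path decodes to a string that is not in the lexicon. The cost grows linearly with lexicon size, which is a trade-off to weigh for large vocabularies.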

Caveats

  • Windows-only build system (VS2015); Linux requires manual patch merging into upstream Caffe
  • DenseNet CPU path is “very slow”: it runs raw convolution without BLAS and is explicitly noted as unoptimized
  • Adjacent narrow glyphs like “11” or “ll” can lose a character because the CNN receptive field is too large relative to their width
  • All dependencies and models hosted on Baidu Pan, which may be inaccessible outside China
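
The receptive-field caveat can be made tangible with the standard recurrence for stacked convolution and pooling layers: each layer widens the receptive field by (kernel - 1) times the accumulated stride. The sketch below applies it along the horizontal axis. The layer stack is hypothetical and chosen only for illustration; it is not the architecture used in the repo.

```python
def receptive_field(layers):
    """Return (receptive_field, stride) along one axis for a layer stack.

    layers: iterable of (kernel_size, stride) pairs, ordered input to output.
    """
    field, jump = 1, 1
    for kernel, stride in layers:
        field += (kernel - 1) * jump  # growth scales with accumulated stride
        jump *= stride                # spacing between output columns, in pixels
    return field, jump


# Hypothetical stack: conv3, pool2, conv3, pool2, conv3, conv3
stack = [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1), (3, 1)]
field, step = receptive_field(stack)
print(field, step)  # 26 4
```

In this toy stack each output column sees 26 input pixels while columns advance only 4 pixels at a time. Two glyphs a few pixels wide can sit entirely inside one column's view, so there may be no column where a blank between them scores highest, and CTC's repeat-merging then collapses the pair into a single character.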

Verdict

Worth a look if you’re reproducing 2017-era OCR baselines or studying whether recurrent layers earn their compute cost. Skip if you need production-ready code, modern frameworks, or straightforward Linux builds.
