emedvedev/attention-ocr

TensorFlow 1.x OCR with attention, still waiting for its sequel

A packaged-up attention OCR model that trains on GCP but hasn't made the jump to TensorFlow 2.

1.1k stars · Python · Computer Vision · ML Frameworks
Velocity (7d): +0.3 ★/day · Trend: steady

What it does

attention-ocr is a Python package (aocr) that trains a visual attention-based OCR model: sliding CNN → LSTM → attention decoder. It handles the full pipeline from TFRecords dataset creation to training, testing with attention visualization, and exporting as SavedModel or frozen graph for TensorFlow Serving.
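
To make the attention decoder stage more concrete, here is a minimal pure-Python sketch of a single decoding step using dot-product attention. It is a generic illustration only, not the scoring function or code that aocr uses; the names `softmax`, `attend`, `encoder_states`, and `decoder_state` are hypothetical, and the tiny vectors are made up.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """One attention step: weight each encoder state by its similarity
    to the current decoder state and return (weights, context)."""
    # Dot-product score between the decoder state and each encoder state.
    scores = [sum(d * e for d, e in zip(decoder_state, state))
              for state in encoder_states]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the encoder states.
    dim = len(encoder_states[0])
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

# Three made-up feature vectors, standing in for encoded image columns.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, context = attend([2.0, 0.0], encoder_states)
```

The weights, one per image column, are the kind of values an attention map visualizes: the largest weight marks the column the decoder focuses on for the current character.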

The interesting bit

The project packages a research model into something deployable. The CLI wraps dataset building, training, and export; the Google Cloud ML Engine integration means you can spin up GPU training jobs without writing your own plumbing. The attention visualization during testing is a nice diagnostic: you can see where the model is looking as it decodes each character.
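
As a rough sketch of how the CLI stages chain together, the snippet below assembles the command lines for a dataset, train, test, and export run without executing them. The subcommand names come from the project; the positional argument order and every file path here are assumptions for illustration, so check `aocr --help` before relying on them. `aocr_pipeline` is a hypothetical helper.

```python
import shlex

def aocr_pipeline(train_annotations, train_records,
                  test_annotations, test_records, export_dir):
    """Return the aocr commands for one end-to-end run.
    Argument order is assumed, and all paths are caller-supplied placeholders."""
    return [
        ["aocr", "dataset", train_annotations, train_records],
        ["aocr", "dataset", test_annotations, test_records],
        ["aocr", "train", train_records],
        ["aocr", "test", test_records],
        ["aocr", "export", export_dir],
    ]

commands = aocr_pipeline(
    "./datasets/annotations-training.txt",  # hypothetical path
    "./datasets/training.tfrecords",        # hypothetical path
    "./datasets/annotations-testing.txt",   # hypothetical path
    "./datasets/testing.tfrecords",         # hypothetical path
    "./exported-model",                     # hypothetical path
)
for cmd in commands:
    # Print only; to execute, pass cmd to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Keeping each command as a list rather than a shell string avoids quoting problems if a path contains spaces.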

Key highlights

  • Installable via pip install aocr with CLI for dataset, train, test, and export commands
  • Exports to SavedModel (default) or frozen graph for serving
  • Includes TensorFlow Serving REST API setup with base64-encoded image input
  • Google Cloud ML Engine training job support documented with gcloud examples
  • Attention map visualization during testing, saved to out/ by default
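
The base64-encoded image input for the REST API could be prepared along these lines. This is a minimal sketch using only the standard library: TensorFlow Serving's REST API accepts binary values wrapped as `{"b64": ...}`, but the tensor name `input` and the helper `build_predict_request` are assumptions, so verify the name against the exported model's signature.

```python
import base64
import json

def build_predict_request(image_bytes, input_name="input"):
    """Build the JSON body for a TensorFlow Serving REST predict call.
    `input_name` is an assumed tensor name; check the exported signature."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    # TensorFlow Serving decodes {"b64": ...} objects back into raw bytes.
    return json.dumps({"inputs": {input_name: {"b64": encoded}}})

# Placeholder bytes; in practice, read them from an image file.
body = build_predict_request(b"\x89PNG fake image bytes")
```

The resulting string would be sent as the POST body to the server's `/v1/models/<model_name>:predict` endpoint, where the host, port, and model name depend on how the server was started.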

Caveats

  • Stuck on TensorFlow 1.x; TF2 upgrade is “planned” but not done, and the README invites PRs
  • Training “takes quite a long time to reach convergence” since CNN and attention train simultaneously
  • Export requires manually moving files into a version-numbered subdirectory for TensorFlow Serving
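
The manual move into a version-numbered subdirectory can be scripted. The sketch below assumes the export directory holds only one fresh export (any existing version folders would be moved too); `move_into_version_dir` is a hypothetical helper, and the demo files mimic the standard SavedModel layout rather than a real aocr export.

```python
import shutil
import tempfile
from pathlib import Path

def move_into_version_dir(export_dir, version=1):
    """Move everything in export_dir into export_dir/<version>/ so that
    TensorFlow Serving can discover it as a model version."""
    export_path = Path(export_dir)
    version_path = export_path / str(version)
    # Snapshot the listing first so the new directory is not moved into itself.
    entries = [p for p in export_path.iterdir() if p != version_path]
    version_path.mkdir(exist_ok=True)
    for entry in entries:
        shutil.move(str(entry), str(version_path / entry.name))
    return version_path

# Demo on a throwaway directory shaped like a SavedModel export.
demo_root = Path(tempfile.mkdtemp())
(demo_root / "saved_model.pb").write_bytes(b"")
(demo_root / "variables").mkdir()
versioned = move_into_version_dir(demo_root)
moved = sorted(p.name for p in versioned.iterdir())
```

After the move, the export directory itself is what you would pass as the model base path, since TensorFlow Serving looks for numeric version folders beneath it.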

Verdict

Worth a look if you need an attention-based OCR model you can train on GCP and serve via TensorFlow Serving, but only if you’re willing to work in the TensorFlow 1.x era. Everyone else should probably wait for the TF2 migration or look elsewhere.
