Bartzi/see

Scene text recognition without bounding box babysitting

A 2018 Chainer implementation that learns to find and read text in images even when you only give it transcriptions, not coordinates.

577 stars · Python · Computer Vision · ML Frameworks
Velocity (7d): +0.2 ★ / day · Trend: steady

What it does

SEE trains a single network to both localize and recognize text in scene images: think street signs, house numbers, storefronts. The twist: it can learn from images where you only know what text appears, not where it is. The authors tested this on SVHN (house numbers) and FSNS (French street signs), the latter being especially apt since FSNS has no localization labels at all.

The interesting bit

The network uses a curriculum learning strategy for FSNS, gradually increasing the maximum number of words per image (2, then 3, then 4+) during training. The README notes that the authors found this necessary, not optional, to get the model to converge. There is also a hardcoded loss_weights assumption that breaks if you train on images with more than 4 words; you must edit or delete it.
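To make the curriculum idea concrete, here is a minimal sketch in plain Python. It is not taken from the repository: the function names, the sample format, and the weight values are all hypothetical, and the stage caps (2, 3, 4) are only one reading of the "2, then 3, then 4+" schedule. The second function illustrates why a weight list with a fixed number of entries stops working once an image has more words than the list has slots.

```python
from typing import Iterator, List, Optional, Sequence, Tuple

# Hypothetical sample format: (image path, list of words in the transcription).
Sample = Tuple[str, List[str]]


def curriculum_stages(
    samples: Sequence[Sample],
    max_words_per_stage: Sequence[int] = (2, 3, 4),  # assumed caps, not the repo's
) -> Iterator[Tuple[int, List[Sample]]]:
    """Yield (cap, subset) pairs, starting with the easiest stage.

    Each stage keeps only the samples whose transcription has at most
    `cap` words, so the training set grows as the cap is raised.
    """
    for cap in max_words_per_stage:
        yield cap, [s for s in samples if len(s[1]) <= cap]


def weighted_loss(
    per_word_losses: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Combine per-word losses with one weight per word slot.

    A fixed-length `weights` list (for example four entries) fails as soon
    as an image has more words than that, which is the kind of mismatch a
    hardcoded weight list produces.
    """
    if weights is None:
        # Derive the weights from the input instead of hardcoding a length.
        weights = [1.0] * len(per_word_losses)
    if len(weights) != len(per_word_losses):
        raise ValueError(
            f"{len(weights)} weights for {len(per_word_losses)} word losses"
        )
    return sum(w * l for w, l in zip(weights, per_word_losses))
```

In a real training loop each stage would typically run for some number of epochs, continuing from the weights learned in the previous stage, before the cap is raised.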

Key highlights

  • Semi-supervised setup: learns text localization without bounding-box ground truth
  • End-to-end training with Chainer, CUDA 8+, CUDNN 6+, NCCL 2+
  • Docker support with nvidia-docker for GPU training
  • Includes demo script (fsns_demo.py) for inference on single images
  • Pre-built SVHN and FSNS dataset preparation pipelines with download links

Caveats

  • Built on Chainer, which is now effectively a legacy framework (it moved to maintenance-only mode in December 2019)
  • CUDA/CUDNN requirements are stated as 2017-era minimums (CUDA 8+, CUDNN 6+); modern GPUs may need workarounds
  • The README is truncated mid-sentence for the FSNS demo section, so full inference details are unclear

Verdict

Worth a look if you’re researching weakly-supervised text detection or need to reproduce the AAAI 2018 paper exactly. Skip it if you want production-ready scene text recognition: modern frameworks and newer architectures have superseded this implementation.
