Finding text in the wild, one kernel at a time
A PyTorch implementation of PSENet, a CVPR 2019 method for detecting arbitrarily shaped text in images.

What it does
PSENet detects text in natural scene images, such as curved signs, rotated labels, or wavy book spines. It outputs bounding regions by progressively expanding small “kernel” predictions until they meet neighboring text instances, which keeps closely spaced words from merging into one blob.
The interesting bit
The “progressive scale expansion” trick: instead of predicting full text masks directly, the network predicts several shrunken versions of each text region, and post-processing grows the smallest one outward through the larger ones. It’s a bit like growing crystals in a dish: each word gets its own territory without invading the neighbor’s.
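To make the expansion step concrete, here is a minimal pure-Python sketch. It is not the repository's code: the function names, the list-of-lists mask format, and the 4-connected neighborhood are all assumptions chosen for illustration. It labels connected components in the smallest kernel, then grows those labels breadth-first into each larger kernel, with the first label to reach a pixel keeping it.

```python
from collections import deque

# 4-connected neighborhood (an assumption for this sketch).
NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def label_components(mask):
    """Label 4-connected components of a binary mask; 0 means background."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and labels[y][x] == 0:
                next_label += 1
                labels[y][x] = next_label
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in NEIGHBORS:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels


def expand(labels, kernel):
    """Grow existing labels into the pixels of a larger kernel, breadth-first."""
    h, w = len(kernel), len(kernel[0])
    queue = deque((y, x) for y in range(h) for x in range(w) if labels[y][x])
    while queue:
        cy, cx = queue.popleft()
        for dy, dx in NEIGHBORS:
            ny, nx = cy + dy, cx + dx
            if (0 <= ny < h and 0 <= nx < w
                    and kernel[ny][nx] and labels[ny][nx] == 0):
                labels[ny][nx] = labels[cy][cx]  # first label to arrive wins
                queue.append((ny, nx))
    return labels


def progressive_scale_expansion(kernels):
    """kernels: binary masks of equal size, ordered smallest to largest."""
    labels = label_components(kernels[0])
    for kernel in kernels[1:]:
        labels = expand(labels, kernel)
    return labels


# Two words whose full masks touch, but whose shrunken kernels do not.
small = [[0, 1, 0, 0, 0, 1, 0]]
full = [[1, 1, 1, 1, 1, 1, 1]]
result = progressive_scale_expansion([small, full])
```

Labeling the full mask directly would give a single component here, which is the "one blob" failure. Seeding from the smallest kernel leaves two instances, one per word, each claiming the pixels it reaches first.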
Key highlights
- Official PyTorch implementation of a CVPR 2019 paper; now upgraded from Python 2 to Python 3
- Pre-trained ResNet50 checkpoints provided for three standard benchmarks: ICDAR 2015, CTW1500, and Total-Text
- Evaluation scripts and speed reporting built in (--report_speed flag)
- Also integrated into MMOCR and ported to Paddle by third parties
- Apache 2.0 license
Caveats
- Dependencies are pinned to fairly old versions (PyTorch 1.1.0, mmcv 0.2.12, OpenCV 3.4.2.17); expect some archaeology to get it running on modern stacks
- The README doesn’t explain the progressive expansion algorithm in any detail—you’ll need the paper for that
- No training data or data preparation scripts shown; just config files and command templates
Verdict
Worth a look if you’re reproducing scene-text detection baselines or need a reference PSENet implementation. Skip it if you want a maintained, batteries-included OCR pipeline; MMOCR is probably the smoother path now.