Sanster/text_renderer

Synthetic text images for OCR training, with a font-strictness fix

A Python tool that renders text images with configurable distortions to feed data-hungry OCR models like CRNN.

1.5k stars · Python · Data Tooling · Computer Vision
Velocity · 7d: +0.5 ★ / day · Trend: steady

What it does

Text Renderer generates labeled synthetic images for training OCR models. You provide fonts, a corpus, and a YAML config; it spits out distorted text images plus a labels file. Supports Latin and non-Latin scripts, with a small zoo of built-in effects: perspective transforms, curves, blur, emboss, color shifts, random spacing, and various line overlays.

The interesting bit

“Strict mode” solves a genuinely annoying problem: many fonts cover only a subset of characters, which silently corrupts non-Latin training data with garbage glyphs. With --strict, the renderer keeps re-sampling text until the string and the font match, that is, until the font supports every character in the string. There’s also a check_font.py tool to audit font coverage before you waste a training run.
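
To make the idea concrete, here is a minimal sketch of a strict-mode style check. It is not the repository's code: the function names, the `max_retries` parameter, and the choice of fontTools for reading a font's character map are all assumptions made for illustration.

```python
import random


def load_supported_codepoints(font_path):
    """Return the set of Unicode code points that a font file maps to glyphs.

    Assumption: fontTools (a third-party package) is installed. The import is
    lazy so the rest of the sketch runs without it.
    """
    from fontTools.ttLib import TTFont

    cmap = TTFont(font_path)["cmap"].getBestCmap() or {}
    return set(cmap)


def font_supports(text, supported):
    """True if every character of `text` has a code point in `supported`."""
    return all(ord(ch) in supported for ch in text)


def sample_supported_text(corpus, supported, rng, max_retries=100):
    """Re-sample from `corpus` until the font covers the whole string.

    Returns None if no fully supported string turns up within `max_retries`
    draws (a hypothetical cap, so the loop cannot run forever).
    """
    for _ in range(max_retries):
        text = rng.choice(corpus)
        if font_supports(text, supported):
            return text
    return None
```

In use, you would call `load_supported_codepoints` once per font, then `sample_supported_text(corpus_lines, supported, random.Random())` for each image. A code point appearing in the character map does not guarantee that the glyph is visually correct, so a check like this catches missing characters but not badly drawn ones.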

Key highlights

  • 15+ configurable effects (distortion, color, spacing, lines, blur) with per-effect probability in YAML
  • GPU acceleration path via OpenCV CUDA and a small Cython extension
  • Debug mode exports bounding boxes and transform visualizations
  • Explicitly targets CRNN-style models; its 1,465 stars suggest it found its audience
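
Per-effect probability can be pictured as one independent coin flip per effect for each image. The sketch below illustrates that idea only: the effect names and the dictionary layout are hypothetical stand-ins for whatever the parsed YAML config actually contains.

```python
import random

# Hypothetical parsed config: effect name -> probability of applying it.
# The real YAML keys and structure in text_renderer may differ.
EFFECT_PROBS = {"blur": 0.3, "emboss": 0.1, "curve": 0.2}


def choose_effects(effect_probs, rng):
    """Pick the effects to apply to one image, flipping one coin per effect."""
    return [name for name, p in effect_probs.items() if rng.random() < p]


if __name__ == "__main__":
    rng = random.Random(42)
    for i in range(3):
        print(i, choose_effects(EFFECT_PROBS, rng))
```

Because the draws are independent, one image may get several effects and another none, which is usually what you want for varied training data.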

Caveats

  • The README points to a “new version” at oh-my-ocr/text_renderer; this repo appears to be in maintenance mode
  • GPU setup requires manually compiling OpenCV with CUDA support; it is not a quick toggle
  • The setup section's references to Ubuntu 16.04 and Python 3.5+ date the tooling somewhat

Verdict

Worth a look if you need synthetic OCR training data and want the font-strictness guardrails. If you’re starting fresh, check whether the newer version at oh-my-ocr/text_renderer has superseded this one.
