
Belval/TextRecognitionDataGenerator

A Python tool that generates synthetic text images with configurable fonts, backgrounds, and distortions to create training data for OCR systems.

3.7k stars · Python · Data Tooling · Computer Vision
Star velocity (7d): +1.1 ★ / day · Trend: steady

TextRecognitionDataGenerator creates synthetic text images to train optical character recognition models. It supports multiple languages, including non-Latin scripts, and lets you configure fonts, backgrounds, and text distortions to produce realistic training data. The tool is available as a CLI, a pip package, or a Docker image, so datasets can be generated at scale for training custom OCR systems.
