sparkfish/augraphy
Python library that generates synthetic distorted document images for training machine learning models.

Augraphy is an augmentation pipeline library that creates realistic synthetic documents reproducing the artifacts of printing, scanning, faxing, and copying. It applies configurable transformations to clean documents to produce degraded versions, generating large volumes of paired training data for document-processing neural networks. This inverts the usual data problem: instead of collecting degraded real-world documents and then reconstructing clean versions of them by hand, it starts from known-good originals and degrades them, so every degraded image comes with an exact ground-truth counterpart for training models that remove document distortions.
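The "start clean, then degrade" idea can be sketched in a few lines. The following is a minimal, hypothetical illustration in plain NumPy, not Augraphy's API: the transforms (`fade_ink`, `add_gaussian_noise`, `salt_and_pepper`) and helpers (`degrade`, `make_pairs`) are names invented here, used only to show how a chain of configurable transforms turns one clean page into a (degraded, clean) training pair.

```python
import numpy as np
from typing import Callable, List, Tuple

# Hypothetical transform signature: (grayscale uint8 image, RNG) -> new image.
Transform = Callable[[np.ndarray, np.random.Generator], np.ndarray]


def fade_ink(img: np.ndarray, rng: np.random.Generator, strength: float = 0.4) -> np.ndarray:
    """Lighten dark pixels toward white paper, loosely mimicking low toner."""
    f = img.astype(np.float64)
    faded = f + (255.0 - f) * strength
    return np.clip(faded, 0, 255).astype(np.uint8)


def add_gaussian_noise(img: np.ndarray, rng: np.random.Generator, sigma: float = 12.0) -> np.ndarray:
    """Add zero-mean Gaussian noise, loosely mimicking scanner sensor noise."""
    noisy = img.astype(np.float64) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)


def salt_and_pepper(img: np.ndarray, rng: np.random.Generator, amount: float = 0.02) -> np.ndarray:
    """Flip a small fraction of pixels to pure black or white (dust, specks)."""
    out = img.copy()  # never modify the caller's array
    draw = rng.random(img.shape)
    out[draw < amount / 2] = 0
    out[draw > 1 - amount / 2] = 255
    return out


def degrade(clean: np.ndarray, transforms: List[Transform], rng: np.random.Generator) -> np.ndarray:
    """Apply each transform in order; the clean input is left untouched."""
    out = clean
    for transform in transforms:
        out = transform(out, rng)
    return out


def make_pairs(
    clean_images: List[np.ndarray], transforms: List[Transform], seed: int = 0
) -> List[Tuple[np.ndarray, np.ndarray]]:
    """Return (degraded, clean) pairs; a fixed seed makes the run reproducible."""
    rng = np.random.default_rng(seed)
    return [(degrade(img, transforms, rng), img) for img in clean_images]


# Usage: a white 64x64 "page" with one black bar standing in for a line of text.
page = np.full((64, 64), 255, dtype=np.uint8)
page[20:24, 8:56] = 0
pairs = make_pairs([page], [fade_ink, add_gaussian_noise, salt_and_pepper], seed=42)
degraded, clean = pairs[0]
```

Because each degraded image is stored next to the untouched original, the pair gives a denoising or restoration model an exact target. Augraphy itself ships its own, more realistic augmentations; consult its documentation for the actual pipeline classes and parameters.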