clovaai/donut
OCR-free document understanding transformer that directly processes visual documents without external OCR engines.

Velocity (7d): +4.8 ★/day · Trend: steady
Donut is an end-to-end transformer for visual document understanding that reads document images directly, with no external OCR engine. It handles tasks such as document classification and information extraction with a vision encoder paired with a text decoder that generates the structured output. The accompanying SynthDoG generator produces synthetic documents for pretraining the model across languages and domains.
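
For information extraction, Donut's decoder emits the fields as a tagged token sequence that is then converted to JSON. The sketch below illustrates that conversion in simplified form, assuming tags shaped like `<s_key>…</s_key>`. The function name `tokens_to_dict` and the sample fields are hypothetical, and this is not the repository's own converter, which handles more cases such as repeated groups.

```python
import re

# Assumed tag shape: <s_key>value</s_key>, possibly nested.
FIELD = re.compile(r"<s_(\w+)>(.*?)</s_\1>", re.DOTALL)


def tokens_to_dict(seq: str) -> dict:
    """Turn a tagged output sequence into a nested dict (simplified sketch)."""
    result = {}
    for match in FIELD.finditer(seq):
        key, inner = match.group(1), match.group(2).strip()
        # Recurse when the value itself contains tagged fields.
        result[key] = tokens_to_dict(inner) if "<s_" in inner else inner
    return result


# Hypothetical decoder output for a receipt image.
sample = "<s_menu><s_nm>Latte</s_nm><s_price>4.50</s_price></s_menu><s_total>4.50</s_total>"
parsed = tokens_to_dict(sample)
print(parsed)
```

The sketch keeps only the last value when a key repeats at the same level, and it does not handle a field nested inside another field of the same name, so it suits flat or shallow schemas only.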