
clovaai/donut

OCR-free document understanding transformer that directly processes visual documents without external OCR engines.

6.9k stars · Python · Computer Vision · Language Models
Velocity (7d): +4.8 ★/day · Trend: steady

Donut is an end-to-end transformer for visual document understanding that reads document images directly, removing the need for an external OCR engine. A vision encoder turns the page image into embeddings and a text decoder generates the result as a token sequence, so a single model can handle tasks such as document classification and information extraction. The accompanying SynthDoG generator creates synthetic documents for pretraining the model across diverse languages and domains.
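For information extraction, the decoder's output is a token sequence in which field names appear as paired tags, roughly of the form `<s_key>value</s_key>`, and that sequence is then converted to JSON. The snippet below is a minimal sketch of that conversion step under stated assumptions. It is not the repository's own code: the names `tokens_to_json` and `parse_value` are hypothetical, and treating `<sep/>` as the separator between repeated items is an assumption made for illustration.

```python
import re
from typing import Any, Dict

# Opening field tag such as "<s_menu>"; group(1) captures the field name.
# Closing tags ("</s_menu>") do not match because "<" is followed by "/".
OPEN_TAG = re.compile(r"<s_([^>]+)>")
SEP = "<sep/>"  # assumed separator between repeated items (illustrative)


def parse_value(content: str) -> Any:
    """Turn the text between a field's open/close tags into a Python value."""
    # Split repeated items on SEP. Assumption: SEP only appears at this
    # nesting level; deeper separators would be mis-split in this sketch.
    parts = [p.strip() for p in content.split(SEP)]
    values = [tokens_to_json(p) if OPEN_TAG.search(p) else p for p in parts]
    return values if len(values) > 1 else values[0]


def tokens_to_json(seq: str) -> Dict[str, Any]:
    """Hypothetical, simplified parser for Donut-style tagged output."""
    result: Dict[str, Any] = {}
    while True:
        match = OPEN_TAG.search(seq)
        if match is None:
            return result
        key = match.group(1)
        close = f"</s_{key}>"
        end = seq.find(close, match.end())
        if end == -1:
            # Unclosed field (e.g. a truncated generation): drop the open tag
            # and keep scanning the remainder of the sequence.
            seq = seq[match.end():]
            continue
        # Recurse into nested fields; plain text becomes a string value.
        result[key] = parse_value(seq[match.end():end])
        seq = seq[end + len(close):]


if __name__ == "__main__":
    generated = (
        "<s_menu><s_nm>Latte</s_nm><s_price>4.50</s_price><sep/>"
        "<s_nm>Tea</s_nm><s_price>3.00</s_price></s_menu>"
        "<s_total>7.50</s_total>"
    )
    print(tokens_to_json(generated))
    # {'menu': [{'nm': 'Latte', 'price': '4.50'}, {'nm': 'Tea', 'price': '3.00'}], 'total': '7.50'}
```

The parser matches each opening tag with the first closing tag of the same name and recurses into its contents, which keeps it short but means nested fields that reuse a parent's name would be cut off early. A production parser would need to track tag depth.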
