opendatalab/DocLayout-YOLO
A real-time document layout detection model based on YOLOv10 and trained on a 300K-document synthetic dataset.

DocLayout-YOLO is a document layout detection system that identifies and localizes elements such as text blocks, images, tables, and figures across diverse document types. It introduces Mesh-candidate BestFit, which frames document synthesis as a two-dimensional bin-packing problem in order to generate large-scale labeled layout data, and a Global-to-Local Controllable Receptive Module for detecting elements at multiple scales. The model is pretrained on DocSynth-300K, a diverse dataset of 300,000 synthetic documents produced with Mesh-candidate BestFit, and achieves real-time inference while maintaining accuracy across varied document layouts.
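Mesh-candidate BestFit is described here only at a high level. The Python sketch that follows illustrates the general bin-packing idea under simplifying assumptions that do not come from the source. Candidate top-left corners are taken from a "mesh" built from the page origin and the right and bottom edges of boxes already placed. "Best fit" is taken to mean choosing the element/position pair that maximizes the fill rate, defined as covered area divided by the area of the tight bounding box around all placed elements. The names `Box`, `mesh_candidates`, `fill_rate`, and `best_fit_layout` are hypothetical and are not part of the DocLayout-YOLO codebase.

```python
# Hypothetical, simplified sketch of greedy best-fit layout packing.
# This is NOT the DocLayout-YOLO implementation of Mesh-candidate BestFit.
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Box:
    x: int
    y: int
    w: int
    h: int
    label: str = ""

    def overlaps(self, other: "Box") -> bool:
        # Axis-aligned rectangles overlap iff they intersect on both axes
        # (touching edges do not count as overlap).
        return (self.x < other.x + other.w and other.x < self.x + self.w
                and self.y < other.y + other.h and other.y < self.y + self.h)


def mesh_candidates(placed):
    # Assumed "mesh": every (x, y) pair formed from the page origin and the
    # right/bottom edges of boxes that are already on the page.
    xs = {0} | {b.x + b.w for b in placed}
    ys = {0} | {b.y + b.h for b in placed}
    return sorted(product(xs, ys))


def fill_rate(boxes):
    # Covered area / area of the tight bounding box (boxes assumed disjoint).
    x0 = min(b.x for b in boxes)
    y0 = min(b.y for b in boxes)
    x1 = max(b.x + b.w for b in boxes)
    y1 = max(b.y + b.h for b in boxes)
    return sum(b.w * b.h for b in boxes) / ((x1 - x0) * (y1 - y0))


def best_fit_layout(elements, page_w, page_h):
    """Greedily place (label, w, h) elements on a page_w x page_h page.

    At each step, try every remaining element at every mesh candidate and
    keep the in-bounds, non-overlapping placement with the highest fill rate.
    """
    placed, remaining = [], list(elements)
    while remaining:
        best = None  # (score, index into remaining, Box)
        for i, (label, w, h) in enumerate(remaining):
            for x, y in mesh_candidates(placed):
                if x + w > page_w or y + h > page_h:
                    continue  # would spill off the page
                cand = Box(x, y, w, h, label)
                if any(cand.overlaps(b) for b in placed):
                    continue
                score = fill_rate(placed + [cand])
                if best is None or score > best[0]:
                    best = (score, i, cand)
        if best is None:
            break  # no remaining element fits anywhere
        _, i, cand = best
        placed.append(cand)
        remaining.pop(i)
    return placed


if __name__ == "__main__":
    page = best_fit_layout(
        [("title", 100, 20), ("text", 50, 40), ("figure", 50, 40), ("table", 100, 50)],
        page_w=100, page_h=110,
    )
    for box in page:
        print(box.label, box.x, box.y, box.w, box.h)
    print("fill rate:", fill_rate(page))
```

In this example, the 100×20 title, the 100×50 table, and the two 50×40 blocks all fit on the 100×110 page without overlapping, and the final fill rate is 1.0. Greedy selection by fill rate favors compact, page-filling arrangements. The actual method may also sample elements from a labeled pool and add randomness to diversify layouts, and this deterministic sketch omits both.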