79k stars, one 0.9B model: the OCR toolkit China built
PaddleOCR turns scans and PDFs into structured Markdown or JSON using a tiny vision-language model that punches above its weight class.

What it does
PaddleOCR extracts text, tables, formulas, and charts from images and PDFs, then outputs structured Markdown or JSON ready for LLM pipelines. It handles 100+ languages, including mixed multilingual documents, and runs on everything from NVIDIA GPUs to Kunlunxin XPUs and plain Intel CPUs.
The interesting bit
The flagship PaddleOCR-VL-1.6 model has only 0.9B parameters, small enough to deploy at the edge, yet it claims 96.3% accuracy on OmniDocBench v1.6, ahead of both open-source and proprietary alternatives. The trick is a NaViT-style dynamic-resolution visual encoder paired with a lightweight ERNIE language model, trading brute scale for architectural efficiency.
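To make "NaViT-style dynamic resolution" concrete, here is a minimal sketch of the general idea: each image is cut into patches at its own aspect ratio instead of being squashed to a fixed square, and the resulting variable-length token runs are packed into shared sequences. The patch size of 14, the 1024-token budget, and the function names are illustrative assumptions, not PaddleOCR-VL's published settings or API.

```python
import math


def patch_grid(width, height, patch=14, max_tokens=1024):
    """Return (rows, cols) of patches for an image kept at its own aspect ratio.

    patch and max_tokens are assumed values chosen for illustration.
    """
    rows = max(1, math.ceil(height / patch))
    cols = max(1, math.ceil(width / patch))
    if rows * cols > max_tokens:
        # Shrink both sides by the same factor so the aspect ratio survives.
        scale = math.sqrt(max_tokens / (rows * cols))
        rows = max(1, math.floor(rows * scale))
        cols = max(1, math.floor(cols * scale))
    return rows, cols


def pack_sequences(token_counts, seq_len=1024):
    """Greedy first-fit packing of per-image token counts into shared sequences."""
    bins = []  # each bin holds the indices of the images packed together
    free = []  # remaining token capacity of each bin
    for idx, n in enumerate(token_counts):
        if n > seq_len:
            raise ValueError(f"image {idx} needs {n} tokens, budget is {seq_len}")
        for b, cap in enumerate(free):
            if n <= cap:
                bins[b].append(idx)
                free[b] -= n
                break
        else:
            # No existing sequence has room, so open a new one.
            bins.append([idx])
            free.append(seq_len - n)
    return bins


# A square thumbnail, a wide receipt-like strip, and a landscape page crop.
sizes = [(224, 224), (1400, 280), (448, 224)]
grids = [patch_grid(w, h) for w, h in sizes]
tokens = [r * c for r, c in grids]
packed = pack_sequences(tokens)
print(grids)   # [(16, 16), (14, 71), (16, 32)]
print(packed)  # [[0, 2], [1]]
```

The wide strip keeps its 5:1 shape (14 rows by 71 columns) rather than being resized to a square, which is the property that matters for dense document text.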
Key highlights
- PP-OCRv5: single-model multilingual recognition with a 13% accuracy boost over prior versions
- PP-StructureV3: layout-aware parsing with fine-grained coordinate data (table cells, text boxes)
- PaddleOCR.js: official browser SDK for running OCR without a backend
- DOCX export: parsed results editable in Word, not just raw text dumps
- Deep ecosystem integration: used by Dify, RAGFlow, Pathway, and Cherry Studio
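As a rough illustration of why coordinate-level output matters downstream, the sketch below turns a list of layout blocks into Markdown. The block schema (`type`, `bbox`, `text`, `cells`) is hypothetical and invented for this example; it is not PP-StructureV3's actual JSON format, so check the project's documentation for the real field names.

```python
def render_table(cells):
    """Render cells ({'row', 'col', 'text'}) as a Markdown table, first row as header."""
    n_rows = max(c["row"] for c in cells) + 1
    n_cols = max(c["col"] for c in cells) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for c in cells:
        grid[c["row"]][c["col"]] = c["text"]
    lines = [
        "| " + " | ".join(grid[0]) + " |",
        "| " + " | ".join("---" for _ in range(n_cols)) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in grid[1:]]
    return "\n".join(lines)


def blocks_to_markdown(blocks):
    """Order blocks top-to-bottom, then left-to-right, and emit Markdown.

    bbox is assumed to be [x0, y0, x1, y1] in pixels. This naive sort
    only suits single-column pages; multi-column layouts need real
    reading-order detection.
    """
    ordered = sorted(blocks, key=lambda b: (b["bbox"][1], b["bbox"][0]))
    parts = []
    for b in ordered:
        if b["type"] == "title":
            parts.append("# " + b["text"])
        elif b["type"] == "table":
            parts.append(render_table(b["cells"]))
        else:
            parts.append(b["text"])
    return "\n\n".join(parts)


# Hypothetical parser output, deliberately out of reading order.
blocks = [
    {"type": "text", "bbox": [40, 120, 560, 160], "text": "Quarterly totals follow."},
    {"type": "title", "bbox": [40, 40, 560, 90], "text": "Report"},
    {
        "type": "table",
        "bbox": [40, 200, 560, 320],
        "cells": [
            {"row": 0, "col": 0, "text": "Item"},
            {"row": 0, "col": 1, "text": "Qty"},
            {"row": 1, "col": 0, "text": "Pens"},
            {"row": 1, "col": 1, "text": "12"},
        ],
    },
]
markdown = blocks_to_markdown(blocks)
print(markdown)
```

Because each block carries its own bounding box, the same data could also drive highlighting in a viewer or citation back to a page region in a RAG pipeline.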
Caveats
- The README is heavy on benchmark claims (“SOTA,” “industry-leading”) but light on reproducible methodology or latency numbers for specific hardware
- “Seamless migration” between versions is promised, but the 3.5.0 notes mention switching inference backends (Paddle static/dynamic graph, Transformers), so actual migration friction is unclear
- Ancient documents, rare characters, and seals are highlighted as new capabilities, but no public test sets are named for these niche cases
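Given the missing latency numbers, one option is to measure on your own hardware. The harness below is a minimal sketch that times any callable over a set of inputs and reports median and 95th-percentile latency. `fake_ocr` is a stand-in defined here so the example runs without any model installed; swap in whatever function wraps your actual OCR call.

```python
import statistics
import time


def measure_latency(fn, inputs, warmup=1, repeats=5):
    """Time fn(item) for every item in the list, repeated `repeats` times.

    The first `warmup` items are run once beforehand and not recorded,
    so one-off costs such as model loading do not skew the samples.
    """
    for item in inputs[:warmup]:
        fn(item)
    samples = []
    for _ in range(repeats):
        for item in inputs:
            start = time.perf_counter()
            fn(item)
            samples.append(time.perf_counter() - start)
    samples.sort()
    p95_index = min(len(samples) - 1, int(0.95 * len(samples)))
    return {
        "n": len(samples),
        "median_s": statistics.median(samples),
        "p95_s": samples[p95_index],
    }


def fake_ocr(path):
    """Placeholder for a real OCR call; it only transforms the file name."""
    return path.upper()


stats = measure_latency(fake_ocr, ["page1.png", "page2.png"], repeats=3)
print(stats["n"])  # 6
```

Reporting the hardware, input resolution, and batch size alongside these numbers is what makes them comparable across tools.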
Verdict
Worth evaluating if you’re building RAG pipelines or document agents and need a battle-tested OCR layer that won’t demand a GPU farm. Skip it if you need transparent benchmarking, or if your documents are mostly clean, single-column English text where simpler tools suffice.