chatdoc-com/OCRFlux

A 3B model that actually reads your PDFs, not just scrapes them

OCRFlux converts messy PDFs and images into clean Markdown by treating layout as a visual reasoning problem, not a pipeline of brittle heuristics.

★ 2.5k stars · Python · Data Tooling · Language Models

What it does

OCRFlux is a Python toolkit that turns PDFs and images into structured Markdown. It handles multi-column layouts, figures, equations, and tables, then stitches together paragraphs and tables that span page breaks. The authors claim this cross-page merging is the first open-source implementation of its kind.

Under the hood it runs a 3B-parameter vision-language model through vLLM, so inference needs a recent NVIDIA GPU with at least 12 GB of VRAM.
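
One way to check the 12 GB requirement before installing is to read the total memory that `nvidia-smi` reports. The sketch below is not part of OCRFlux: the helper names are hypothetical, it assumes `nvidia-smi` is on the PATH, and the default threshold of 12,000 MiB is deliberately loose because reported totals can differ slightly from a card's nominal size.

```python
import subprocess


def parse_vram_mib(smi_output: str) -> list[int]:
    """Parse one integer (MiB) per GPU from nvidia-smi CSV output."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def gpus_meeting_vram(smi_output: str, min_mib: int = 12_000) -> list[int]:
    """Return the indices of GPUs whose total memory is at least min_mib.

    min_mib is an assumed reading of "at least 12 GB"; adjust as needed.
    """
    return [i for i, mib in enumerate(parse_vram_mib(smi_output)) if mib >= min_mib]


def query_nvidia_smi() -> str:
    """Ask nvidia-smi for total memory per GPU in MiB, without header or units."""
    completed = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


# On a machine with an NVIDIA driver installed:
#   print(gpus_meeting_vram(query_nvidia_smi()))
```

Keeping the parsing separate from the `nvidia-smi` call means the threshold logic can be exercised on a machine without a GPU.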

The interesting bit

Most OCR tools treat each page as an isolated image and pray the layout is simple. OCRFlux’s unusual angle is explicitly modeling cross-page structure: it detects when a table or paragraph continues on the next page, then merges fragments even when headers repeat or cells split mid-row. The README documents genuinely gnarly cases—vertical table splits, multi-line cells broken across pages—that most pipelines simply garble.
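
To make the repeated-header case concrete, here is a toy sketch of merging two fragments of one table. It is not how OCRFlux works: the project uses its vision-language model to detect and merge continuations, whereas this hypothetical `merge_table_fragments` helper applies a single hand-written rule and assumes the caller already knows the two fragments belong together.

```python
Row = list[str]


def merge_table_fragments(first: list[Row], second: list[Row]) -> list[Row]:
    """Merge two fragments of one table, dropping a header repeated on page two."""
    if not first:
        return [row[:] for row in second]
    header = first[0]
    # If the continuation starts by repeating the header, skip that row.
    body = second[1:] if second and second[0] == header else second
    return [row[:] for row in first] + [row[:] for row in body]


page_1 = [["Item", "Qty"], ["Bolts", "40"], ["Nuts", "35"]]
page_2 = [["Item", "Qty"], ["Washers", "90"]]
merged = merge_table_fragments(page_1, page_2)
```

A rule like this covers only the simplest case. It cannot handle cells split mid-row or vertical table splits, which are the cases the README highlights.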

Key highlights

3B-parameter VLM runs on an RTX 3090; no 70B model required
Benchmarks against olmOCR-7B, Nanonets-OCR-s, and MonkeyOCR on manually labeled English and Chinese data
Claims 0.967 average Edit Distance Similarity on single-page parsing versus 0.872 for olmOCR-7B
Cross-page table/paragraph detection scores 0.986 F1 on held-out test data
Ships four evaluation datasets on Hugging Face, including a 9K-sample table-merging benchmark
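
For readers unfamiliar with the two metrics quoted above, the sketch below shows how they are commonly computed. It assumes Edit Distance Similarity means one minus the Levenshtein distance divided by the longer string's length; the project's exact normalization may differ, and the function names are illustrative only.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_distance_similarity(pred: str, ref: str) -> float:
    """1.0 for identical strings, approaching 0.0 as they diverge (assumed definition)."""
    longest = max(len(pred), len(ref))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(pred, ref) / longest


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, from raw counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

For example, `edit_distance_similarity("kitten", "sitting")` is 1 - 3/7, and `f1_score(8, 2, 2)` is 0.8.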

Caveats

Complex tables (rowspan/colspan) underperform simpler ones: 0.807 TEDS vs. 0.912 on the PubTabNet-derived benchmark, and OCRFlux trails MonkeyOCR on that specific split
Installation is finicky: requires poppler-utils, specific Microsoft and Crosextra fonts, and a clean conda environment; the README warns against installing into existing Python environments
Launched only in June 2025; long-term maintenance trajectory unclear
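
Given the installation caveat, a quick preflight check for the poppler-utils dependency can save a failed run. This is a sketch, not something OCRFlux ships: `missing_executables` is a hypothetical helper, and which poppler binaries OCRFlux actually calls is an assumption (`pdftoppm` and `pdfinfo` are both part of poppler-utils).

```python
import shutil
from typing import Callable, Iterable, Optional


def missing_executables(
    names: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the names that cannot be found on the PATH."""
    return [name for name in names if which(name) is None]


# Assumed set of binaries to look for; both are installed by poppler-utils.
POPPLER_TOOLS = ("pdftoppm", "pdfinfo")

# Typical use before running the pipeline:
#   missing = missing_executables(POPPLER_TOOLS)
#   if missing:
#       raise SystemExit(f"Install poppler-utils first; missing: {missing}")
```

The `which` parameter exists only so the lookup can be swapped out in tests. This check does not cover the font packages the README also requires.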

Verdict

Worth a look if you regularly ingest academic papers, financial reports, or scanned documents where table continuity matters. Skip it if you need CPU-only inference or your PDFs are already clean single-page images.

Frequently asked

What is chatdoc-com/OCRFlux?: OCRFlux converts messy PDFs and images into clean Markdown by treating layout as a visual reasoning problem, not a pipeline of brittle heuristics.
Is OCRFlux open source?: Yes — chatdoc-com/OCRFlux is open source, released under the Apache-2.0 license.
What language is OCRFlux written in?: chatdoc-com/OCRFlux is primarily written in Python.
How popular is OCRFlux?: chatdoc-com/OCRFlux has 2.5k stars on GitHub.
Where can I find OCRFlux?: chatdoc-com/OCRFlux is on GitHub at https://github.com/chatdoc-com/OCRFlux.