Is ParseBench open source?

Yes — run-llama/ParseBench is open source, released under the Apache-2.0 license.

What language is ParseBench written in?

run-llama/ParseBench is primarily written in Python.

How popular is ParseBench?

run-llama/ParseBench has 532 stars on GitHub.

Where can I find ParseBench?

run-llama/ParseBench is on GitHub at https://github.com/run-llama/ParseBench.

← all repositories

run-llama/ParseBench

Parsing PDFs is solved; parsing them for agents is not

ParseBench scores document parsers on whether their output preserves the structure and meaning AI agents need to make autonomous decisions.

★532 stars Python LLMOps · Eval Agents

View on GitHub ↗ Homepage ↗

Not currently ranked — collecting fresh signals.

star history

What it does

ParseBench runs PDF parsing tools against roughly 2,000 human-verified pages from real enterprise documents—insurance, finance, and government records—and scores them on five capability dimensions. It does not measure textual similarity to a reference; it checks whether extracted tables, charts, formatting, and reading order remain faithful enough for downstream agents to reason over without misreading values or hallucinating context. The benchmark ships with more than 90 pre-configured pipelines spanning proprietary APIs, open-weight vision models, and commercial document services.

The interesting bit

The evaluation is deliberately deterministic and rule-based—no LLM-as-a-judge—because agent workflows need auditable, reproducible failure modes rather than vibes-based scores. Each dimension targets a specific production failure: merged table headers, omitted strikethrough text, chart data points stripped of axis labels, or extracted claims that cannot be traced back to a coordinate on the page.

Key highlights

Five stratified dimensions—Tables, Charts, Content Faithfulness, Semantic Formatting, and Visual Grounding—each with its own ground-truth format and metric.
Real-world dataset drawn from production domains, not synthetic academic PDFs.
Deterministic scoring using custom metrics like GTRM and ChartDataPointMatch rather than fuzzy LLM evaluation.
Public leaderboard tracking more than 90 pipelines, from LlamaParse and Gemini to AWS Textract and Docling.
Full dataset hosted on HuggingFace and automatically fetched during evaluation runs.

Caveats

The leaderboard shows “—” for cost per page on several open-weight entries, so cost comparisons against proprietary APIs are incomplete.

Verdict

Teams building agentic workflows over enterprise PDFs should treat this as a sanity check for their parsing layer; if you are only doing occasional human-in-the-loop extraction, the granularity is probably overkill.

Frequently asked

What is run-llama/ParseBench?: ParseBench scores document parsers on whether their output preserves the structure and meaning AI agents need to make autonomous decisions.
Is ParseBench open source?: Yes — run-llama/ParseBench is open source, released under the Apache-2.0 license.
What language is ParseBench written in?: run-llama/ParseBench is primarily written in Python.
How popular is ParseBench?: run-llama/ParseBench has 532 stars on GitHub.
Where can I find ParseBench?: run-llama/ParseBench is on GitHub at https://github.com/run-llama/ParseBench.