run-llama/ParseBench

A benchmark evaluating document parsing tools for AI agents, covering 2000 pages across five capability dimensions with a public leaderboard.

480 stars · Python · LLMOps · Eval · Agents
Velocity (7d): +8.2 ★ / day · Trend: steady

ParseBench assesses how well document parsing systems convert PDFs into structured output that AI agents can use. It tests five capability dimensions, each targeting a failure mode that breaks production agent workflows: tables, charts, content faithfulness, semantic formatting, and visual grounding. The repository includes a leaderboard comparing commercial and open-source parsers, including VLMs, LlamaIndex parsers, and commercial APIs.
