run-llama/ParseBench
A benchmark that evaluates document parsing tools for AI agents, covering 2,000 pages across five capability dimensions, with a public leaderboard.

ParseBench assesses how well document parsing systems convert PDFs into structured output that AI agents can use. It tests five capability dimensions, each targeting a failure mode that breaks production agent workflows: tables, charts, content faithfulness, semantic formatting, and visual grounding. The repository includes a leaderboard comparing open-source and commercial document parsing systems, including VLMs, LlamaIndex parsers, and commercial APIs.