An awesome list that actually does its homework
Because most "awesome" lists are link dumps, and measuring AI agents is already messy enough.

What it does
A curated collection of papers, blogs, talks, tools, and benchmarks for building and evaluating AI agents. Unlike typical awesome-lists, every entry is annotated with what it is and why it belongs, URLs are checked, and dead tools are pruned rather than silently listed. It also ships with 146 deep reading notes and a runnable playbook (PATTERNS.md) covering LLM-as-judge, pass@k, trajectory grading, and CI gating.
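For readers new to the terms, pass@k and CI gating can be sketched in a few lines. The block below is a minimal illustration and is not code taken from PATTERNS.md: it assumes the widely used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), and the names pass_at_k and ci_gate, the sample counts, and the threshold are all hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n sampled attempts, of which c passed.

    Uses the unbiased estimator 1 - C(n - c, k) / C(n, k): the probability
    that at least one of k attempts drawn without replacement is a pass.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k attempts include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def ci_gate(results, k: int, threshold: float):
    """Return (passed, mean_score) for a list of (n, c) pairs, one per task.

    Hypothetical gate: the build passes when the mean pass@k across tasks
    is at least `threshold`.
    """
    if not results:
        raise ValueError("results must contain at least one task")
    scores = [pass_at_k(n, c, k) for n, c in results]
    mean_score = sum(scores) / len(scores)
    return mean_score >= threshold, mean_score


if __name__ == "__main__":
    # Illustrative numbers only: three tasks, ten attempts each.
    per_task = [(10, 3), (10, 9), (10, 10)]
    passed, score = ci_gate(per_task, k=1, threshold=0.7)
    print(f"mean pass@1 = {score:.3f}, gate passed = {passed}")
```

In a real pipeline the gate would typically exit non-zero on failure so the CI job fails; the threshold and aggregation rule are design choices the playbook may handle differently.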
The interesting bit
The maintainers ran a depth-4 recursive citation crawl across 11.6k papers, transcribed 47 talks with timestamps, and conducted adversarial gap audits per section. The result treats evaluation as infrastructure engineering, not a vibe check.
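A depth-limited citation crawl ranked by in-degree can be pictured roughly as follows. This is a sketch under stated assumptions, not the maintainers' actual pipeline: crawl_and_rank and get_references are hypothetical names, the toy graph is made up, and a real crawl would call a citation API rather than a dictionary lookup.

```python
from collections import Counter, deque


def crawl_and_rank(seeds, get_references, max_depth=4):
    """Breadth-first crawl of references up to max_depth hops from the seeds.

    Returns (paper, in_degree) pairs sorted by in-degree, where in-degree
    counts how many crawled papers cite that paper.
    """
    depth = {seed: 0 for seed in dict.fromkeys(seeds)}
    in_degree = Counter()
    queue = deque(depth)
    while queue:
        paper = queue.popleft()
        if depth[paper] >= max_depth:
            continue  # do not expand papers at the depth limit
        # dict.fromkeys removes duplicate references while keeping order
        for ref in dict.fromkeys(get_references(paper)):
            in_degree[ref] += 1
            if ref not in depth:
                depth[ref] = depth[paper] + 1
                queue.append(ref)
    return in_degree.most_common()


if __name__ == "__main__":
    # Toy citation graph (hypothetical): paper -> papers it cites.
    graph = {
        "A": ["B", "C"],
        "B": ["C", "D"],
        "C": ["D"],
        "D": ["E"],
        "E": ["F"],
    }
    ranked = crawl_and_rank(["A"], lambda p: graph.get(p, []), max_depth=2)
    for paper, count in ranked:
        print(paper, count)
```

Ranking by in-degree surfaces papers that many others in the crawl cite, which is presumably why the list supplements it with practitioner discovery: newer or less-cited work would otherwise rank low.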
Key highlights
- 443+ annotated links and 146 deep reading notes in notes/
- Runnable playbook (PATTERNS.md) with worked examples for LLM-as-judge, pass@k, error analysis, and CI gating
- Depth-4 recursive citation crawl of 11.6k papers ranked by in-degree, supplemented by targeted practitioner discovery
- 47 talks and podcasts transcribed with verbatim quotes and timestamps
- Explicit pruning of dead or abandoned tools; entries marked with 🆕 (2025–2026) or ⚠️ (caveat)
Caveats
- At least one high-profile link is already flagged with an ⚠️ for an unverified URL.
- The curation is aggressively opinionated (“non-BS”), so popular resources can be excluded if the maintainers disagree with their value.
Verdict
Read this if you build or evaluate AI agents and are tired of stale link dumps that treat evaluation as an afterthought. Skip it if you want a neutral, exhaustive bibliography without editorial judgment.
Frequently asked
- What is benchflow-ai/awesome-evals?
- A curated, annotated collection of papers, blogs, talks, tools, and benchmarks for building and evaluating AI agents.
- Is awesome-evals open source?
- Yes. benchflow-ai/awesome-evals is an open-source project tracked on heatdrop.
- How popular is awesome-evals?
- benchflow-ai/awesome-evals has 532 stars on GitHub.
- Where can I find awesome-evals?
- benchflow-ai/awesome-evals is on GitHub at https://github.com/benchflow-ai/awesome-evals.