
onejune2018/Awesome-LLM-Eval

A curated list of tools, benchmarks, datasets, leaderboards, and papers for evaluating large language models and exploring the boundaries of generative AI.

Velocity (7d): +0.6 ★/day · Trend: steady

This repository aggregates resources for LLM evaluation, including benchmark datasets across domains such as RAG, agents, coding, and multimodal capabilities. It organizes tools for inference-speed and quantization testing, leaderboards for model comparison, and academic papers on evaluation methodologies. The project serves as the official companion to a survey paper on value-oriented LLM evaluation roadmaps.
