huggingface/lighteval
A benchmarking and evaluation framework from Hugging Face for assessing LLM performance on standard benchmarks.

Velocity (7d): +2.8 ★/day · Trend: → steady
Lighteval is a comprehensive evaluation toolkit designed by Hugging Face’s Evals Team to benchmark LLMs across diverse backends. It enables standardized performance measurement using existing tasks and metrics, with support for custom evaluation scenarios. Results are saved with detailed, sample-level granularity to support debugging and comparative analysis across model runs.
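To illustrate why sample-level results help with comparative analysis, here is a minimal sketch that finds the samples on which two model runs disagree. The record schema (`sample_id`, `correct`) and the helper names are hypothetical, chosen only for illustration; they are not Lighteval's actual output format or API.

```python
from typing import Dict, List

# Hypothetical per-sample record: {"sample_id": str, "correct": bool}.
# This schema is an assumption for illustration, not Lighteval's real format.
Sample = Dict[str, object]


def index_by_id(samples: List[Sample]) -> Dict[str, Sample]:
    """Map each record's sample_id to the record itself."""
    return {str(s["sample_id"]): s for s in samples}


def diff_runs(run_a: List[Sample], run_b: List[Sample]) -> List[str]:
    """Return sorted IDs of samples present in both runs whose
    correctness differs between the two runs."""
    a, b = index_by_id(run_a), index_by_id(run_b)
    shared = sorted(a.keys() & b.keys())
    return [sid for sid in shared if a[sid]["correct"] != b[sid]["correct"]]


# Toy data standing in for two saved evaluation runs.
run_a = [
    {"sample_id": "q1", "correct": True},
    {"sample_id": "q2", "correct": True},
    {"sample_id": "q3", "correct": False},
]
run_b = [
    {"sample_id": "q1", "correct": True},
    {"sample_id": "q2", "correct": False},
    {"sample_id": "q3", "correct": False},
]

changed = diff_runs(run_a, run_b)
print(changed)
```

Aggregate scores alone would only show that the second run scored lower; per-sample records make it possible to point at the specific items that changed.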