
huggingface/lighteval

A benchmarking and evaluation framework from Hugging Face for assessing LLM performance on standard benchmarks.

2.4k stars · Python · LLMOps · Eval
Velocity (7d): +2.8 ★ / day · Trend: steady

Lighteval is a comprehensive evaluation toolkit designed by Hugging Face’s Evals Team to benchmark LLMs across diverse backends. It enables standardized performance measurement using existing tasks and metrics, with support for custom evaluation scenarios. Results are saved with detailed, sample-level granularity to support debugging and comparative analysis across model runs.
