huggingface/evaluation-guidebook
A comprehensive guidebook on evaluating large language models, covering automatic benchmarks, evaluation design, and practical tips from the Open LLM Leaderboard.

This repository provides practical insights and theoretical knowledge for evaluating LLMs. It covers automatic benchmarks, the design of custom evaluations, and troubleshooting of common issues. The guide targets readers ranging from beginners to advanced practitioners and draws on the authors' experience managing the Open LLM Leaderboard and developing the lighteval evaluation framework.