huggingface/evaluation-guidebook

A comprehensive guidebook on evaluating large language models, covering automatic benchmarks, evaluation design, and practical tips from the Open LLM Leaderboard.

2.1k stars · Jupyter Notebook · LLMOps · Eval · Learning
Velocity (7d): +3.5 ★ / day · Trend: steady

This repository provides practical insights and theoretical knowledge for evaluating LLMs. It covers automatic benchmarks, designing custom evaluations, and troubleshooting common issues. The guide targets users ranging from beginners to advanced practitioners, drawing from experience managing the Open LLM Leaderboard and developing the lighteval evaluation framework.