
EleutherAI/lm-evaluation-harness

A Python framework for few-shot evaluation of language models across standard benchmarks.

12.9k stars · Python · LLMOps · Eval
Velocity (7d): +6.1 ★ / day
Trend: steady

The lm-evaluation-harness is a standardized framework for evaluating language models with few-shot prompting. It runs standard benchmarks and leaderboard task suites, and supports Hugging Face Transformers, vLLM, and SGLang as model backends. Its aim is reproducible measurement of model capabilities across tasks such as reasoning, question answering, and multimodal understanding.
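To make the idea of few-shot evaluation concrete, here is a minimal, self-contained sketch in plain Python. It is not the harness's API: the names `build_few_shot_prompt`, `exact_match_accuracy`, and `toy_model` are hypothetical, and the "Question:/Answer:" prompt layout and exact-match metric are assumptions chosen for illustration.

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[str, str]  # (question, reference answer)


def build_few_shot_prompt(shots: Sequence[Example], question: str) -> str:
    """Concatenate the solved examples, then the unanswered target question."""
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in shots]
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(blocks)


def exact_match_accuracy(
    model: Callable[[str], str],
    shots: Sequence[Example],
    eval_set: Sequence[Example],
) -> float:
    """Fraction of eval questions whose completion equals the reference answer."""
    if not eval_set:
        return 0.0
    correct = sum(
        model(build_few_shot_prompt(shots, q)).strip() == a for q, a in eval_set
    )
    return correct / len(eval_set)


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for a real LM: solves the final 'a + b' question."""
    last_question = prompt.rsplit("Question: ", 1)[1].split("\n", 1)[0]
    left, right = last_question.rstrip("?").split("+")
    return str(int(left.split()[-1]) + int(right))


shots = [("What is 1 + 1?", "2")]
eval_set = [("What is 2 + 3?", "5"), ("What is 4 + 4?", "8")]
accuracy = exact_match_accuracy(toy_model, shots, eval_set)
```

Using the same shots, prompt template, and metric for every model is what makes scores comparable across runs. A real harness would swap `toy_model` for a call into a model backend.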
