
LiveBench/LiveBench

LiveBench is an LLM benchmark that releases new evaluation questions monthly to test model capabilities without data contamination.

1.2k stars · Python · LLMOps · Eval · Language Models
Velocity (7d): +1.6 ★ / day · Trend: steady

LiveBench is a benchmark suite for evaluating large language models with contamination-resistant test sets. It releases new questions monthly, drawn from recent datasets, arXiv papers, news articles, and IMDb synopses. Each task has verifiable, objective ground-truth answers, so responses can be scored automatically and accurately without LLM judges. The framework covers 18 diverse tasks across 6 categories and maintains a public leaderboard for model comparison.
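To make the idea of judge-free scoring concrete, here is a minimal sketch of comparing model outputs against ground-truth answers and averaging per category. It is not LiveBench's actual scoring code: the record fields (`category`, `prediction`, `ground_truth`), the helper names, and the exact-match rule are all assumptions chosen for illustration, and real tasks may need task-specific comparison logic.

```python
from collections import defaultdict


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences do not count as errors."""
    return " ".join(answer.strip().lower().split())


def score_by_category(records):
    """Return the exact-match accuracy per category.

    `records` is an iterable of dicts with the hypothetical keys
    'category', 'prediction', and 'ground_truth'.
    """
    totals = defaultdict(lambda: [0, 0])  # category -> [correct, seen]
    for record in records:
        hit = normalize(record["prediction"]) == normalize(record["ground_truth"])
        totals[record["category"]][0] += int(hit)
        totals[record["category"]][1] += 1
    return {category: correct / seen for category, (correct, seen) in totals.items()}


# Illustrative records only; these are not real benchmark questions or results.
records = [
    {"category": "math", "prediction": " 42 ", "ground_truth": "42"},
    {"category": "math", "prediction": "7", "ground_truth": "8"},
    {"category": "reasoning", "prediction": "Yes", "ground_truth": "yes"},
]

print(score_by_category(records))
```

Because every answer is checked against a fixed reference, the same inputs always produce the same score, which is the property that lets a benchmark avoid relying on an LLM judge.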
