LiveBench/LiveBench
LiveBench is an LLM benchmark that releases new evaluation questions monthly to test model capabilities without data contamination.

LiveBench is a benchmark suite for evaluating large language models with contamination-resistant test sets. New questions are released monthly and are drawn from recently released datasets, arXiv papers, news articles, and IMDb synopses. Every question has a verifiable, objective ground-truth answer, so responses can be scored automatically and accurately without an LLM judge. The benchmark covers 18 diverse tasks across 6 categories and maintains a public leaderboard for model comparison.
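
The judge-free scoring described above can be illustrated with a minimal Python sketch. This is not LiveBench's actual scoring code: the function names (`normalize`, `score_answer`, `category_averages`), the exact-match rule, and the choice to average per task and then per category are all assumptions made for illustration.

```python
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.strip().lower().split())


def score_answer(model_answer: str, ground_truth: str) -> float:
    """Hypothetical exact-match scorer: 1.0 if the normalized strings agree, else 0.0."""
    return 1.0 if normalize(model_answer) == normalize(ground_truth) else 0.0


def category_averages(results):
    """Aggregate (category, task, score) triples.

    Assumed aggregation: average the scores within each task, then average
    the task means within each category.
    """
    by_task = defaultdict(list)
    for category, task, score in results:
        by_task[(category, task)].append(score)

    by_category = defaultdict(list)
    for (category, _task), scores in by_task.items():
        by_category[category].append(sum(scores) / len(scores))

    return {c: sum(means) / len(means) for c, means in by_category.items()}
```

Because the comparison is deterministic, the same model output always receives the same score, which is the property that makes scoring reproducible without a second model acting as judge. Real tasks may need task-specific matching (for example numeric tolerance or structured-output parsing) rather than plain string equality.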