
open-compass/opencompass

A Python-based evaluation platform for benchmarking large language models across 100+ datasets.

7.1k stars · Python · LLMOps · Eval · Language Models
Velocity (7d): +6.5 ★ / day · Trend: steady

OpenCompass is an evaluation platform for large language models (LLMs). It supports a wide range of models, including Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, and Claude, and benchmarks them across more than 100 datasets so that model capabilities can be assessed on a standardized basis.
