open-compass/opencompass
A Python-based evaluation platform for benchmarking large language models across 100+ datasets.

Velocity (7d): +6.5 ★ / day · Trend: steady
OpenCompass is an evaluation platform for large language models (LLMs). It supports a wide range of models, including Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, and Claude, and benchmarks them across more than 100 datasets, giving a standardized assessment of model capabilities.