Is opencompass open source?

Yes — open-compass/opencompass is open source, released under the Apache-2.0 license.

What language is opencompass written in?

open-compass/opencompass is primarily written in Python.

How popular is opencompass?

open-compass/opencompass has 7.2k stars on GitHub.

Where can I find opencompass?

open-compass/opencompass is on GitHub at https://github.com/open-compass/opencompass.

open-compass/opencompass

The Assembly Line for LLM Report Cards

OpenCompass orchestrates standardized testing across dozens of models and inference backends so teams can stop rewriting evaluation scripts.

★7.2k stars Python LLMOps · Eval Language Models

What it does

OpenCompass is a Python evaluation platform that runs large language models through a battery of standardized benchmarks—over 100 datasets covering reasoning, math, coding, long-context retrieval, and multilingual QA. It supports everything from local HuggingFace weights to API-only models like GPT-4 and Claude, and can swap inference backends (vLLM, LMDeploy) without rewriting test code. The project also maintains public leaderboards and distributes pre-packaged dataset collections so teams can reproduce academic rankings without hunting down test files.
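To make the backend-swapping idea concrete, here is a minimal sketch in plain Python. It does not use OpenCompass itself: the `Backend` type, the two toy backends, and `run_benchmark` are hypothetical names invented for illustration. The point is only that the scoring loop stays the same while the thing that generates text is looked up by name.

```python
from typing import Callable, Dict, List

# A backend is modelled here as any callable mapping a prompt to a completion.
# This is an illustrative stand-in, not OpenCompass's real backend interface.
Backend = Callable[[str], str]


def terse_backend(prompt: str) -> str:
    """Toy backend standing in for a locally hosted model."""
    return "4"


def verbose_backend(prompt: str) -> str:
    """Toy backend standing in for an API-served model."""
    return "The answer is 4."


# Hypothetical registry: swapping backends means changing only the key.
BACKENDS: Dict[str, Backend] = {"local": terse_backend, "api": verbose_backend}


def run_benchmark(backend_name: str, dataset: List[dict]) -> float:
    """Score one backend on a dataset; the test logic is backend-agnostic."""
    generate = BACKENDS[backend_name]
    correct = sum(
        1 for item in dataset if item["answer"] in generate(item["prompt"])
    )
    return correct / len(dataset)


dataset = [
    {"prompt": "What is 2 + 2?", "answer": "4"},
    {"prompt": "What is 3 + 3?", "answer": "6"},
]
scores = {name: run_benchmark(name, dataset) for name in BACKENDS}
print(scores)
```

Both toy backends answer only the first question correctly, so each scores 0.5. A real setup would replace the toy callables with wrappers around actual inference engines.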

The interesting bit

The framework treats evaluation as a pipeline rather than a single score: its CascadeEvaluator chains multiple judges in sequence, and built-in post-processing tools like XFinder attempt to extract answers from messy model outputs before grading. That focus on the unglamorous plumbing—answer extraction, backend abstraction, and LLM-as-judge scoring—is where most of the value hides.
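The pipeline idea can be sketched in a few lines of Python. This is not the CascadeEvaluator or XFinder API: every function name below is hypothetical, and the "LLM judge" is a stub doing a substring check so the example runs offline. It assumes a cascade in which a cheap rule-based stage decides what it can and passes undecided samples to the next judge.

```python
import re
from typing import Callable, List, Optional

# A judge returns True/False when it can decide, or None to defer.
Judge = Callable[[str, str], Optional[bool]]


def extract_answer(raw_output: str) -> Optional[str]:
    """Pull the last number out of a free-form completion, if there is one."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", raw_output)
    return numbers[-1] if numbers else None


def rule_judge(raw_output: str, reference: str) -> Optional[bool]:
    """Cheap first stage: accept on an exact match of the extracted answer,
    otherwise defer (None) so a later stage can take over."""
    return True if extract_answer(raw_output) == reference else None


def stub_llm_judge(raw_output: str, reference: str) -> Optional[bool]:
    """Placeholder for an LLM-as-judge call (assumption: a real judge would
    query a model); here it is a case-insensitive substring check."""
    return reference.lower() in raw_output.lower()


def cascade(judges: List[Judge], raw_output: str, reference: str) -> bool:
    """Run judges in order and return the first definite verdict."""
    for judge in judges:
        verdict = judge(raw_output, reference)
        if verdict is not None:
            return verdict
    return False  # no judge could decide


samples = [
    ("So the total is 42.", "42"),                # settled by the rule stage
    ("The answer is forty-two", "forty-two"),     # deferred to the stub judge
    ("I think it is 7", "8"),                     # rejected by the stub judge
]
results = [cascade([rule_judge, stub_llm_judge], out, ref) for out, ref in samples]
print(results)
```

Ordering the judges from cheapest to most expensive means the costly stage only sees the samples the rule-based stage could not settle.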

Key highlights

Supports 100+ datasets, including academic benchmarks such as MuSR, BABILong, RULER, SciCode, and SuperGPQA, and can evaluate API-served models such as OpenAI o1 and DeepSeek-R1
One-click switching between HuggingFace, vLLM, and LMDeploy inference backends
CascadeEvaluator and GenericLLMEvaluator enable multi-stage and LLM-as-judge scoring pipelines
Automatic dataset downloads from OpenCompass or on-demand loading via ModelScope
Meta AI recommends it as a validation tool for Llama models

Caveats

Version 0.4.0 introduced a breaking change that consolidated configuration files into the package itself, so existing setups may need their configuration references updated
Acceleration backends can have conflicting dependencies, and the docs explicitly recommend installing them in separate virtual environments
The README is heavy on quickstart material and light on architectural detail

Verdict

Teams training or fine-tuning LLMs who need reproducible, multi-benchmark scores across different hardware and APIs will find this saves weeks of scripting. If you only need to run a single custom eval once, it is probably overkill.
