jeinlee1991/chinese-llm-benchmark
A Chinese LLM evaluation benchmark system that assesses 383+ models across 7 domains and ~300 capability dimensions
★ 6.1k stars · LLMOps · Eval

Velocity (7d): +5.6 ★/day · Trend: steady
ReLE (Really Reliable Live Evaluation for LLM) is a structured benchmarking system for Chinese large language models. It evaluates 383+ models in seven domains (education, healthcare, finance, law, reasoning, language, and agent capabilities) spanning approximately 300 capability dimensions. Beyond rankings, it maintains a defect database of more than 2 million entries to support community research and model improvement.