jeinlee1991/chinese-llm-benchmark

A Chinese LLM evaluation benchmark system that assesses 383+ models across 7 domains and ~300 capability dimensions

6.1k stars · LLMOps · Eval
Velocity (7d): +5.6 ★ / day · Trend: steady

ReLE (Really Reliable Live Evaluation for LLM) is a structured benchmarking system for Chinese large language models. It evaluates 383+ models in seven domains (education, healthcare, finance, law, reasoning, language, and agent capabilities) across approximately 300 capability dimensions. Beyond rankings, it maintains a defect database of more than 2 million entries to support community research and model improvement.
