
CLUEbenchmark/SuperCLUE

A benchmark for evaluating Chinese foundation models and large language models across multiple capability dimensions including agent performance.

Velocity (7d): +2.9 ★ / day · Trend: steady

SuperCLUE is a comprehensive benchmark for evaluating Chinese foundation models and LLMs. It assesses models across four primary capability quadrants: language understanding and generation, professional skills and knowledge, AI agents, and safety. The benchmark includes specific sub-evaluations such as SuperCLUE-Agent for agent task performance and SuperCLUE-Safety for adversarial safety testing. It provides monthly leaderboards and annual reports tracking the progress of Chinese AI models.
