CLUEbenchmark/SuperCLUE
A benchmark for evaluating Chinese foundation models and large language models across multiple capability dimensions, including agent performance.

SuperCLUE assesses Chinese foundation models and LLMs across four capability quadrants: language understanding and generation, professional skills and knowledge, AI agents, and safety. Its sub-evaluations include SuperCLUE-Agent, which measures agent task performance, and SuperCLUE-Safety, which covers adversarial safety testing. The project publishes monthly leaderboards and annual reports that track the progress of Chinese AI models.