
THUDM/AlignBench

A comprehensive multi-dimensional benchmark for evaluating Chinese large language model alignment using LLM-as-Judge methodology.

429 stars · Python · LLMOps · Eval
Velocity (7d): +0.5 ★/day · Trend: steady

AlignBench is a benchmark for evaluating how well Chinese large language models are aligned. It uses a multi-dimensional, rule-calibrated LLM-as-Judge approach in which the judge model applies Chain-of-Thought reasoning to produce a written analysis followed by a final score. A human-in-the-loop data construction pipeline keeps the evaluation data updated, and the benchmark covers multiple evaluation dimensions to assess real-world model performance.
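
To make the LLM-as-Judge idea above more concrete, here is a minimal sketch of the general pattern: build a grading prompt that asks for an analysis first and per-dimension scores last, then parse the scores from the judge's reply. Everything here is an assumption for illustration, not AlignBench's actual code: the dimension names, the prompt wording, the 1 to 10 scale, the trailing-dict output format, and the helper names `build_judge_prompt` and `parse_scores` are all hypothetical. The call to the judge model itself is left out.

```python
import ast
import re

# Hypothetical dimensions; the real benchmark defines its own.
DIMENSIONS = ["factuality", "clarity", "completeness"]


def build_judge_prompt(question, reference, answer, dimensions=DIMENSIONS):
    """Assemble a grading prompt that requests analysis first, scores last."""
    dims = ", ".join(dimensions)
    return (
        "You are grading an AI assistant's answer.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Assistant answer: {answer}\n"
        "First write a step-by-step analysis, then rate each of these "
        f"dimensions from 1 to 10: {dims}.\n"
        "Finish with a Python dict of scores that includes an 'overall' key."
    )


def parse_scores(judge_output):
    """Return the last {...} dict in the judge's reply as floats, or None."""
    candidates = re.findall(r"\{[^{}]*\}", judge_output)
    if not candidates:
        return None
    try:
        scores = ast.literal_eval(candidates[-1])
    except (ValueError, SyntaxError):
        return None
    if not isinstance(scores, dict):
        return None
    try:
        return {str(k): float(v) for k, v in scores.items()}
    except (TypeError, ValueError):
        return None


if __name__ == "__main__":
    reply = (
        "Analysis: the answer is correct but terse.\n"
        "{'factuality': 9, 'clarity': 6, 'overall': 7}"
    )
    print(parse_scores(reply))
```

Asking for the analysis before the scores is what lets the judge's reasoning inform the final number; parsing only the last dict in the reply keeps any braces in the analysis text from being mistaken for the scores.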
