THU-KEG/EvaluationPapers4ChatGPT
A curated collection of evaluation papers, datasets, and benchmarking tools for assessing ChatGPT and large language model performance.

Velocity (7d): +0.4 ★/day · Trend: steady
This repository aggregates research resources for evaluating ChatGPT and similar LLMs. It maintains continuously updated datasets such as ChatLog, which tracks LLM responses over time, and introduces evaluation frameworks such as Language-Model-as-an-Examiner and the KoLA knowledge evaluation platform. The project also catalogs tools for detecting LLM-generated content and serves as a reference hub for the LLM evaluation community.