
THU-KEG/EvaluationPapers4ChatGPT

A curated collection of evaluation papers, datasets, and benchmarking tools for assessing ChatGPT and large language model performance.

Velocity (7d): +0.4 ★/day · Trend: steady

This repository aggregates research resources for evaluating ChatGPT and similar LLMs. It maintains continuously updated datasets such as ChatLog, which tracks LLM responses over time, and introduces evaluation frameworks such as Language-Model-as-an-Examiner and the KoLA knowledge evaluation platform. The project also catalogs tools for detecting LLM-generated content and serves as a reference hub for the LLM evaluation community.
